Annotation of mandoc/mandoc_escape.3, Revision 1.1
1.1 ! schwarze 1: .\" $Id$
! 2: .\"
! 3: .\" Copyright (c) 2014 Ingo Schwarze <schwarze@openbsd.org>
! 4: .\"
! 5: .\" Permission to use, copy, modify, and distribute this software for any
! 6: .\" purpose with or without fee is hereby granted, provided that the above
! 7: .\" copyright notice and this permission notice appear in all copies.
! 8: .\"
! 9: .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
! 10: .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
! 11: .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
! 12: .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
! 13: .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
! 14: .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
! 15: .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
! 16: .\"
! 17: .Dd $Mdocdate$
! 18: .Dt MANDOC_ESCAPE 3
! 19: .Os
! 20: .Sh NAME
! 21: .Nm mandoc_escape
! 22: .Nd parse roff escape sequences
! 23: .Sh LIBRARY
! 24: .Lb libmandoc
! 25: .Sh SYNOPSIS
! 26: .In sys/types.h
! 27: .In mandoc.h
! 28: .Ft "enum mandoc_esc"
! 29: .Fo mandoc_escape
! 30: .Fa "const char **end"
! 31: .Fa "const char **start"
! 32: .Fa "int *sz"
! 33: .Fc
! 34: .Sh DESCRIPTION
! 35: This function scans a
! 36: .Xr roff 7
! 37: escape sequence.
! 38: .Pp
! 39: An escape sequence consists of
! 40: .Bl -dash -compact -width 2n
! 41: .It
! 42: an initial backslash character
! 43: .Pq Sq \e ,
! 44: .It
! 45: a single ASCII character called the escape sequence identifier,
! 46: .It
! 47: and, with only a few exceptions, an argument.
! 48: .El
! 49: .Pp
! 50: Arguments can be given in the following forms; some escape sequence
! 51: identifiers only accept some of these forms as specified below.
! 52: The first three forms are called the standard forms.
! 53: .Bl -tag -width 2n
! 54: .It \&In brackets: Ic \&[ Ns Ar argument Ns Ic \&]
! 55: The argument starts after the initial
! 56: .Sq \&[ ,
! 57: ends before the final
! 58: .Sq \&] ,
! 59: and the escape sequence ends with the final
! 60: .Sq \&] .
! 61: .It Two-character argument short form: Ic \&( Ns Ar ar
! 62: This form can only be used for arguments
! 63: consisting of exactly two characters.
! 64: It has the same effect as
! 65: .Ic \&[ Ns Ar ar Ns Ic \&] .
! 66: .It One-character argument short form: Ar a
! 67: This form can only be used for arguments
! 68: consisting of exactly one character.
! 69: It has the same effect as
! 70: .Ic \&[ Ns Ar a Ns Ic \&] .
! 71: .It Delimited form: Ar C Ns Ar argument Ns Ar C
! 72: The argument starts after the initial delimiter character
! 73: .Ar C ,
! 74: ends before the next occurrence of the delimiter character
! 75: .Ar C ,
! 76: and the escape sequence ends with that second
! 77: .Ar C .
! 78: Some escape sequences allow arbitrary characters
! 79: .Ar C
! 80: as quoting characters, while others restrict the range of characters
! 81: that can be used as quoting characters.
! 82: .El
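The four argument forms listed above can be illustrated with a small stand-alone scanner. This is only a sketch under simplifying assumptions, not the code in mandoc.c: the names `arg_form` and `scan_arg` are hypothetical, the caller states whether the delimited form applies (in the real function that depends on the escape sequence identifier), and escape sequences nested inside an argument are not handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for this sketch; none of them exist in mandoc.h. */
enum arg_form { FORM_BAD, FORM_BRACKET, FORM_TWO, FORM_ONE, FORM_DELIM };

/*
 * Scan one argument beginning at *pos.
 * If delimited is nonzero, the first character is used as delimiter C.
 * On success, *arg and *len describe the argument and *pos is moved
 * to the character after the end of the escape sequence.
 * On failure, FORM_BAD is returned and nothing is modified.
 */
static enum arg_form
scan_arg(const char **pos, int delimited, const char **arg, size_t *len)
{
	const char *p = *pos;
	const char *close;

	if (*p == '\0')
		return FORM_BAD;

	if (delimited) {
		/* Delimited form: C argument C */
		close = strchr(p + 1, *p);
		if (close == NULL)
			return FORM_BAD;
		*arg = p + 1;
		*len = (size_t)(close - (p + 1));
		*pos = close + 1;
		return FORM_DELIM;
	}
	if (*p == '[') {
		/* In brackets: [argument] */
		close = strchr(p + 1, ']');
		if (close == NULL)
			return FORM_BAD;
		*arg = p + 1;
		*len = (size_t)(close - (p + 1));
		*pos = close + 1;
		return FORM_BRACKET;
	}
	if (*p == '(') {
		/* Two-character short form: (ar */
		if (p[1] == '\0' || p[2] == '\0')
			return FORM_BAD;
		*arg = p + 1;
		*len = 2;
		*pos = p + 3;
		return FORM_TWO;
	}
	/* One-character short form: a */
	*arg = p;
	*len = 1;
	*pos = p + 1;
	return FORM_ONE;
}
```

In this sketch, scanning `[rq]x`, `(rqx`, and `'rq'x` (the last one with `delimited` set) all yield the two-character argument `rq` and leave `*pos` at the `x`, which mirrors the statement that the short forms have the same effect as the bracketed form.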
! 83: .Pp
! 84: Upon function entry,
! 85: .Fa end
! 86: is expected to point to the escape sequence identifier.
! 87: The values passed in as
! 88: .Fa start
! 89: and
! 90: .Fa sz
! 91: are ignored and overwritten.
! 92: .Pp
! 93: By design, this function cannot handle those
! 94: .Xr roff 7
! 95: escape sequences that require in-place expansion, in particular
! 96: user-defined strings
! 97: .Ic \e* ,
! 98: number registers
! 99: .Ic \en ,
! 100: width measurements
! 101: .Ic \ew ,
! 102: and numerical expression control
! 103: .Ic \eB .
! 104: These are handled by
! 105: .Fn roff_res ,
! 106: a private preprocessor function called from
! 107: .Fn roff_parseln ;
! 108: see the file
! 109: .Pa roff.c .
! 110: .Pp
! 111: The function
! 112: .Fn mandoc_escape
! 113: is used
! 114: .Bl -dash -compact -width 2n
! 115: .It
! 116: recursively by itself, because some escape sequence arguments can
! 117: in turn contain other escape sequences,
! 118: .It
! 119: for error detection internally by the
! 120: .Xr roff 7
! 121: parser part of the
! 122: .Lb libmandoc ,
! 123: see the file
! 124: .Pa roff.c ,
! 125: .It
! 126: above all externally by the
! 127: .Xr mandoc 1
! 128: formatting modules, in particular
! 129: .Fl Tascii
! 130: and
! 131: .Fl Thtml ,
! 132: for formatting purposes, see the files
! 133: .Pa term.c
! 134: and
! 135: .Pa html.c ,
! 136: .It
! 137: and rarely externally by high-level utilities using the mandoc library,
! 138: for example
! 139: .Xr makewhatis 8 ,
! 140: to purge escape sequences from text.
! 141: .El
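The last use listed above, purging escape sequences from text, might look roughly like the following loop. This is a minimal sketch and not the code used by makewhatis: `purge_escapes`, `scan_fn`, and `toy_scan` are hypothetical names, and `toy_scan` is a simplified stand-in that only understands the three standard argument forms directly after the backslash. It follows the calling convention documented here: the pointer is positioned after the backslash on entry and is moved past the sequence on return. A wrapper around the real function could presumably be passed in its place, but that is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical scanner type: returns nonzero on success, 0 on error. */
typedef int (*scan_fn)(const char **end, const char **start, int *sz);

/* Simplified stand-in scanner; NOT the real mandoc_escape(). */
static int
toy_scan(const char **end, const char **start, int *sz)
{
	const char *p = *end;
	const char *arg, *close;
	int len;

	if (*p == '\0') {
		return 0;			/* nothing after the backslash */
	} else if (*p == '[') {
		close = strchr(p + 1, ']');
		if (close == NULL)
			return 0;		/* unterminated argument */
		arg = p + 1;
		len = (int)(close - arg);
		p = close + 1;
	} else if (*p == '(') {
		if (p[1] == '\0' || p[2] == '\0')
			return 0;
		arg = p + 1;
		len = 2;
		p += 3;
	} else {
		arg = p;
		len = 1;
		p++;
	}
	if (start != NULL)
		*start = arg;
	if (sz != NULL)
		*sz = len;
	*end = p;
	return 1;
}

/*
 * Copy in to out, dropping every escape sequence.
 * out must have room for strlen(in) + 1 bytes.
 * Returns 1 on success, 0 if the scanner reported an error;
 * in the latter case, out holds the text copied so far.
 */
static int
purge_escapes(const char *in, char *out, scan_fn scan)
{
	while (*in != '\0') {
		if (*in != '\\') {
			*out++ = *in++;
			continue;
		}
		in++;				/* step past the backslash */
		if (!scan(&in, NULL, NULL)) {
			*out = '\0';
			return 0;
		}
	}
	*out = '\0';
	return 1;
}
```

Stopping at the first error is a design choice of this sketch; a caller could equally decide to skip the offending character and continue.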
! 142: .Sh RETURN VALUES
! 143: Upon function return, the pointer
! 144: .Fa end
! 145: is set to the character after the end of the escape sequence,
! 146: such that the calling higher-level parser can easily continue.
! 147: .Pp
! 148: For escape sequences taking an argument, the pointer
! 149: .Fa start
! 150: is set to the beginning of the argument and
! 151: .Fa sz
! 152: is set to the length of the argument.
! 153: For escape sequences not taking an argument,
! 154: .Fa start
! 155: is set to the character after the end of the sequence and
! 156: .Fa sz
! 157: is set to 0.
! 158: Both
! 159: .Fa start
! 160: and
! 161: .Fa sz
! 162: may be
! 163: .Dv NULL ;
! 164: in that case, the argument and the length are not returned.
! 165: .Pp
! 166: For sequences taking an argument, the function
! 167: .Fn mandoc_escape
! 168: returns one of the following values:
! 169: .Bl -tag -width 2n
! 170: .It Dv ESCAPE_FONT
! 171: The escape sequence
! 172: .Ic \ef
! 173: taking an argument in standard form:
! 174: .Ic \ef[ , \ef( , \ef Ns Ar a .
! 175: Two-character arguments starting with the character
! 176: .Sq C
! 177: are reduced to one-character arguments by skipping the
! 178: .Sq C .
! 179: More specific values are returned for the most commonly used arguments:
! 180: .Bl -column "argument" "ESCAPE_FONTITALIC"
! 181: .It argument Ta return value
! 182: .It Cm R No or Cm 1 Ta Dv ESCAPE_FONTROMAN
! 183: .It Cm I No or Cm 2 Ta Dv ESCAPE_FONTITALIC
! 184: .It Cm B No or Cm 3 Ta Dv ESCAPE_FONTBOLD
! 185: .It Cm P Ta Dv ESCAPE_FONTPREV
! 186: .It Cm BI Ta Dv ESCAPE_FONTBI
! 187: .El
! 188: .It Dv ESCAPE_SPECIAL
! 189: The escape sequence
! 190: .Ic \eC
! 191: taking an argument delimited with the single quote character
! 192: and, as a special exception, the escape sequences
! 193: .Em not
! 194: having an identifier, that is, those where the argument, in standard
! 195: form, directly follows the initial backslash:
! 196: .Ic \eC' , \e[ , \e( , \e Ns Ar a .
! 197: Note that the one-character argument short form can only be used for
! 198: argument characters that do not clash with escape sequence identifiers.
! 199: .Pp
! 200: If the argument consists of more than one character
! 201: and starts with the character
! 202: .Sq u ,
! 203: .Dv ESCAPE_UNICODE
! 204: is returned as described below.
! 205: If the argument is just the single character
! 206: .Sq u ,
! 207: .Dv ESCAPE_ERROR
! 208: is returned.
! 209: .Pp
! 210: The
! 211: .Dv ESCAPE_SPECIAL
! 212: special character escape sequences can be rendered using the functions
! 213: .Fn mchars_spec2cp
! 214: and
! 215: .Fn mchars_spec2str
! 216: described in the
! 217: .Xr mchars_alloc 3
! 218: manual.
! 219: .It Dv ESCAPE_UNICODE
! 220: Escape sequences of the same format as described above under
! 221: .Dv ESCAPE_SPECIAL ,
! 222: but with an argument starting with the character
! 223: .Sq u :
! 224: .Ic \eC'u , \e[u .
! 225: As a special exception,
! 226: .Fa start
! 227: is set to the character after the
! 228: .Sq u ,
! 229: and the
! 230: .Fa sz
! 231: return value does not include the
! 232: .Sq u
! 233: either.
! 234: .Pp
! 235: Such Unicode character escape sequences can be rendered using the function
! 236: .Fn mchars_num2uc
! 237: described in the
! 238: .Xr mchars_alloc 3
! 239: manual.
! 240: .It Dv ESCAPE_NUMBERED
! 241: The escape sequence
! 242: .Ic \eN
! 243: followed by a delimited argument.
! 244: The delimiter character is arbitrary except that digits cannot be used.
! 245: If a digit is encountered instead of the opening delimiter, that
! 246: digit is considered to be the argument and the end of the sequence, and
! 247: .Dv ESCAPE_IGNORE
! 248: is returned.
! 249: .Pp
! 250: Such ASCII character escape sequences can be rendered using the function
! 251: .Fn mchars_num2char
! 252: described in the
! 253: .Xr mchars_alloc 3
! 254: manual.
! 255: .It Dv ESCAPE_IGNORE
! 256: .Bl -bullet -width 2n
! 257: .It
! 258: The escape sequence
! 259: .Ic \es
! 260: followed by an argument in standard form or by an argument delimited
! 261: by the single quote character:
! 262: .Ic \es' , \es[ , \es( , \es Ns Ar a .
! 263: As a special exception, an optional
! 264: .Sq +
! 265: or
! 266: .Sq \-
! 267: character is allowed after the
! 268: .Sq s
! 269: for all forms.
! 270: .It
! 271: The escape sequences
! 272: .Ic \eF ,
! 273: .Ic \eg ,
! 274: .Ic \ek ,
! 275: .Ic \eM ,
! 276: .Ic \em ,
! 277: .Ic \en ,
! 278: .Ic \eV ,
! 279: and
! 280: .Ic \eY
! 281: followed by an argument in standard form.
! 282: .It
! 283: The escape sequences
! 284: .Ic \eA ,
! 285: .Ic \eb ,
! 286: .Ic \eD ,
! 287: .Ic \eo ,
! 288: .Ic \eR ,
! 289: .Ic \eX ,
! 290: and
! 291: .Ic \eZ
! 292: followed by an argument delimited by an arbitrary character.
! 293: .It
! 294: The escape sequences
! 295: .Ic \eH ,
! 296: .Ic \eh ,
! 297: .Ic \eL ,
! 298: .Ic \el ,
! 299: .Ic \eS ,
! 300: .Ic \ev ,
! 301: and
! 302: .Ic \ex
! 303: followed by an argument delimited by a character that cannot occur
! 304: in numerical expressions.
! 305: However, if any character that can occur in numerical expressions
! 306: is found instead of a delimiter, the sequence is considered to end
! 307: with that character, and
! 308: .Dv ESCAPE_ERROR
! 309: is returned.
! 310: .El
! 311: .It Dv ESCAPE_ERROR
! 312: Escape sequences taking an argument but not matching any of the above patterns.
! 313: In particular, that happens if the end of the logical input line
! 314: is reached before the end of the argument.
! 315: .El
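The font argument table given under ESCAPE_FONT above can be restated as a small lookup function. This is a sketch for illustration only: `font_kind` and `classify_font` are hypothetical names, the enum deliberately does not reuse the `ESCAPE_FONT*` constants from mandoc.h, and the function takes an already extracted argument and its length rather than parsing the escape sequence itself.

```c
#include <stddef.h>

/* Hypothetical result type; stands in for the ESCAPE_FONT* values. */
enum font_kind {
	FONT_OTHER,	/* generic case, corresponds to ESCAPE_FONT */
	FONT_ROMAN,
	FONT_ITALIC,
	FONT_BOLD,
	FONT_PREV,
	FONT_BI
};

static enum font_kind
classify_font(const char *arg, size_t len)
{
	/*
	 * Two-character arguments starting with 'C' are reduced
	 * to one-character arguments by skipping the 'C'.
	 */
	if (len == 2 && arg[0] == 'C') {
		arg++;
		len--;
	}

	if (len == 2 && arg[0] == 'B' && arg[1] == 'I')
		return FONT_BI;
	if (len != 1)
		return FONT_OTHER;

	switch (arg[0]) {
	case 'R':
	case '1':
		return FONT_ROMAN;
	case 'I':
	case '2':
		return FONT_ITALIC;
	case 'B':
	case '3':
		return FONT_BOLD;
	case 'P':
		return FONT_PREV;
	default:
		return FONT_OTHER;
	}
}
```

Under these assumptions, the arguments `B`, `3`, and `CB` all map to the bold case, while an argument such as `CW` falls through to the generic case.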
! 316: .Pp
! 317: For sequences that do not take an argument, the function
! 318: .Fn mandoc_escape
! 319: returns one of the following values:
! 320: .Bl -tag -width 2n
! 321: .It Dv ESCAPE_SKIPCHAR
! 322: The escape sequence
! 323: .Qq \ez .
! 324: .It Dv ESCAPE_NOSPACE
! 325: The escape sequence
! 326: .Qq \ec .
! 327: .It Dv ESCAPE_IGNORE
! 328: The escape sequences
! 329: .Qq \ed
! 330: and
! 331: .Qq \eu .
! 332: .El
! 333: .Sh FILES
! 334: This function is implemented in
! 335: .Pa mandoc.c .
! 336: .Sh SEE ALSO
! 337: .Xr mchars_alloc 3 ,
! 338: .Xr mandoc_char 7 ,
! 339: .Xr roff 7
! 340: .Sh HISTORY
! 341: This function has been available since mandoc 1.11.2.
! 342: .Sh AUTHORS
! 343: .An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
! 344: .An Ingo Schwarze Aq Mt schwarze@openbsd.org
! 345: .Sh BUGS
! 346: The function doesn't cleanly distinguish between sequences that are
! 347: valid and supported, valid and ignored, valid and unsupported,
! 348: syntactically invalid, or undefined.
! 349: For sequences that are ignored or unsupported, it doesn't tell
! 350: whether that deficiency is likely to cause major formatting problems
! 351: and/or loss of document content.
! 352: The function is already rather complicated and still parses some
! 353: sequences incorrectly.
! 354: .
! 355: .ig
! 356: For these sequences, the list given below specifies a starting string
! 357: and either the length of the argument or an ending character.
! 358: The argument starts after the starting string.
! 359: In the former case, the sequence ends with the end of the argument.
! 360: In the latter case, the argument ends before the ending character,
! 361: and the sequence ends with the ending character.
! 362: ..