
Annotation of mandoc/mandoc_escape.3, Revision 1.2

1.2     ! schwarze    1: .\"    $Id: mandoc_escape.3,v 1.1 2014/08/05 05:48:56 schwarze Exp $
1.1       schwarze    2: .\"
                      3: .\" Copyright (c) 2014 Ingo Schwarze <schwarze@openbsd.org>
                      4: .\"
                      5: .\" Permission to use, copy, modify, and distribute this software for any
                      6: .\" purpose with or without fee is hereby granted, provided that the above
                      7: .\" copyright notice and this permission notice appear in all copies.
                      8: .\"
                      9: .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
                     10: .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
                     11: .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
                     12: .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
                     13: .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
                     14: .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
                     15: .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
                     16: .\"
1.2     ! schwarze   17: .Dd $Mdocdate: August 5 2014 $
1.1       schwarze   18: .Dt MANDOC_ESCAPE 3
                     19: .Os
                     20: .Sh NAME
                     21: .Nm mandoc_escape
                     22: .Nd parse roff escape sequences
                     23: .Sh LIBRARY
                     24: .Lb libmandoc
                     25: .Sh SYNOPSIS
                     26: .In sys/types.h
                     27: .In mandoc.h
                     28: .Ft "enum mandoc_esc"
                     29: .Fo mandoc_escape
                     30: .Fa "const char **end"
                     31: .Fa "const char **start"
                     32: .Fa "int *sz"
                     33: .Fc
                     34: .Sh DESCRIPTION
                     35: This function scans a
                     36: .Xr roff 7
                     37: escape sequence.
                     38: .Pp
                     39: An escape sequence consists of
                     40: .Bl -dash -compact -width 2n
                     41: .It
                     42: an initial backslash character
                     43: .Pq Sq \e ,
                     44: .It
                     45: a single ASCII character called the escape sequence identifier,
                     46: .It
                     47: and, with only a few exceptions, an argument.
                     48: .El
                     49: .Pp
                     50: Arguments can be given in the following forms; some escape sequence
                     51: identifiers only accept some of these forms as specified below.
                     52: The first three forms are called the standard forms.
                     53: .Bl -tag -width 2n
                     54: .It \&In brackets: Ic \&[ Ns Ar argument Ns Ic \&]
                     55: The argument starts after the initial
                     56: .Sq \&[ ,
                     57: ends before the final
                     58: .Sq \&] ,
                     59: and the escape sequence ends with the final
                     60: .Sq \&] .
                     61: .It Two-character argument short form: Ic \&( Ns Ar ar
                     62: This form can only be used for arguments
                     63: consisting of exactly two characters.
                     64: It has the same effect as
                     65: .Ic \&[ Ns Ar ar Ns Ic \&] .
                     66: .It One-character argument short form: Ar a
                     67: This form can only be used for arguments
                     68: consisting of exactly one character.
                     69: It has the same effect as
                     70: .Ic \&[ Ns Ar a Ns Ic \&] .
                     71: .It Delimited form: Ar C Ns Ar argument Ns Ar C
                     72: The argument starts after the initial delimiter character
                     73: .Ar C ,
                     74: ends before the next occurrence of the delimiter character
                     75: .Ar C ,
                     76: and the escape sequence ends with that second
                     77: .Ar C .
                     78: Some escape sequences allow arbitrary characters
                     79: .Ar C
                     80: as quoting characters; others restrict the range of characters
                     81: that can be used as quoting characters.
                     82: .El
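The three standard forms listed above can be sketched in C as follows.
This is only an illustration under stated assumptions: the helper name
scan_standard_arg and its return convention are hypothetical and are not
part of libmandoc, and nested escape sequences inside the argument are
not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libmandoc: scan an argument given
 * in one of the three standard forms.  On entry, p points at the first
 * byte after the escape sequence identifier.  On success, *arg and *len
 * describe the argument and the return value points after the end of
 * the sequence.  If the input ends inside the argument, NULL is returned.
 */
static const char *
scan_standard_arg(const char *p, const char **arg, int *len)
{
	const char *rb;

	assert(p != NULL && arg != NULL && len != NULL);

	switch (*p) {
	case '[':		/* bracketed form: [argument] */
		rb = strchr(p + 1, ']');
		if (rb == NULL)
			return NULL;
		*arg = p + 1;
		*len = (int)(rb - (p + 1));
		return rb + 1;
	case '(':		/* two-character short form: (ar */
		if (p[1] == '\0' || p[2] == '\0')
			return NULL;
		*arg = p + 1;
		*len = 2;
		return p + 3;
	case '\0':		/* input ended before the argument */
		return NULL;
	default:		/* one-character short form: a */
		*arg = p;
		*len = 1;
		return p + 1;
	}
}
```

In this sketch, the bracketed form and the two short forms all yield the
same (arg, len) pair for equivalent input, which mirrors the statement
that the short forms have the same effect as the bracketed form.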
                     83: .Pp
                     84: Upon function entry,
                     85: .Fa end
                     86: is expected to point to the escape sequence identifier.
                     87: The values passed in as
                     88: .Fa start
                     89: and
                     90: .Fa sz
                     91: are ignored and overwritten.
                     92: .Pp
                     93: By design, this function cannot handle those
                     94: .Xr roff 7
                     95: escape sequences that require in-place expansion, in particular
                     96: user-defined strings
                     97: .Ic \e* ,
                     98: number registers
                     99: .Ic \en ,
                    100: width measurements
                    101: .Ic \ew ,
                    102: and numerical expression control
                    103: .Ic \eB .
                    104: These are handled by
                    105: .Fn roff_res ,
                    106: a private preprocessor function called from
                    107: .Fn roff_parseln ,
                    108: see the file
                    109: .Pa roff.c .
                    110: .Pp
                    111: The function
                    112: .Fn mandoc_escape
                    113: is used
                    114: .Bl -dash -compact -width 2n
                    115: .It
                    116: recursively by itself, because some escape sequence arguments can
                    117: in turn contain other escape sequences,
                    118: .It
                    119: for error detection internally by the
                    120: .Xr roff 7
                    121: parser part of the
                    122: .Lb libmandoc ,
                    123: see the file
                    124: .Pa roff.c ,
                    125: .It
                    126: above all externally by the
                    127: .Xr mandoc 1
                    128: formatting modules, in particular
                    129: .Fl Tascii
                    130: and
                    131: .Fl Thtml ,
                    132: for formatting purposes, see the files
                    133: .Pa term.c
                    134: and
                    135: .Pa html.c ,
                    136: .It
                    137: and rarely externally by high-level utilities using the mandoc library,
                    138: for example
                    139: .Xr makewhatis 8 ,
                    140: to purge escape sequences from text.
                    141: .El
                    142: .Sh RETURN VALUES
                    143: Upon function return, the pointer
                    144: .Fa end
                    145: is set to the character after the end of the escape sequence,
                    146: such that the calling higher-level parser can easily continue.
                    147: .Pp
                    148: For escape sequences taking an argument, the pointer
                    149: .Fa start
                    150: is set to the beginning of the argument and
                    151: .Fa sz
                    152: is set to the length of the argument.
                    153: For escape sequences not taking an argument,
                    154: .Fa start
                    155: is set to the character after the end of the sequence and
                    156: .Fa sz
                    157: is set to 0.
                    158: Both
                    159: .Fa start
                    160: and
                    161: .Fa sz
                    162: may be
                    163: .Dv NULL ;
                    164: in that case, the argument and the length are not returned.
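The pointer conventions described above (end is advanced past the
sequence, start and sz are optional outputs) can be illustrated with a
self-contained toy.
The function toy_escape below is a hypothetical stand-in that only
mimics the calling convention; it is not mandoc_escape(), it knows only
the identifier f, and it treats every other sequence as a special
character without an identifier.
The caller purge_escapes is likewise hypothetical and shows how a
higher-level parser might skip sequences while copying text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy stand-in for the calling convention only; NOT mandoc_escape().
 * On entry, *end points at the byte after the backslash.  On return,
 * *end points after the end of the sequence.  start and sz may be NULL.
 * Returns 0 on success and -1 if the input ends inside the sequence.
 */
static int
toy_escape(const char **end, const char **start, int *sz)
{
	const char *p, *arg;
	int len;

	assert(end != NULL && *end != NULL);
	p = *end;
	if (*p == 'f')		/* the only identifier this toy knows */
		p++;

	switch (*p) {
	case '\0':
		*end = p;
		return -1;
	case '[':		/* bracketed form */
		arg = p + 1;
		p = strchr(arg, ']');
		if (p == NULL) {
			*end = arg + strlen(arg);
			return -1;
		}
		len = (int)(p - arg);
		p++;
		break;
	case '(':		/* two-character short form */
		arg = p + 1;
		if (arg[0] == '\0' || arg[1] == '\0') {
			*end = arg + strlen(arg);
			return -1;
		}
		len = 2;
		p = arg + 2;
		break;
	default:		/* one-character short form */
		arg = p;
		len = 1;
		p++;
		break;
	}

	*end = p;
	if (start != NULL)
		*start = arg;
	if (sz != NULL)
		*sz = len;
	return 0;
}

/*
 * Copy in to out, dropping every escape sequence.
 * The buffer out must hold at least strlen(in) + 1 bytes.
 */
static void
purge_escapes(const char *in, char *out)
{
	const char *p = in;

	while (*p != '\0') {
		if (*p != '\\') {
			*out++ = *p++;
			continue;
		}
		p++;	/* step onto the identifier */
		(void)toy_escape(&p, NULL, NULL);
		/* p now points after the sequence, or at the final NUL */
	}
	*out = '\0';
}
```

Passing NULL for start and sz, as purge_escapes does, suits callers that
only need to skip a sequence and have no use for its argument.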
                    165: .Pp
                    166: For sequences taking an argument, the function
                    167: .Fn mandoc_escape
                    168: returns one of the following values:
                    169: .Bl -tag -width 2n
                    170: .It Dv ESCAPE_FONT
                    171: The escape sequence
                    172: .Ic \ef
                    173: taking an argument in standard form:
                    174: .Ic \ef[ , \ef( , \ef Ns Ar a .
                    175: Two-character arguments starting with the character
                    176: .Sq C
                    177: are reduced to one-character arguments by skipping the
                    178: .Sq C .
                    179: More specific values are returned for the most commonly used arguments:
                    180: .Bl -column "argument" "ESCAPE_FONTITALIC"
                    181: .It argument Ta return value
                    182: .It Cm R No or Cm 1 Ta Dv ESCAPE_FONTROMAN
                    183: .It Cm I No or Cm 2 Ta Dv ESCAPE_FONTITALIC
                    184: .It Cm B No or Cm 3 Ta Dv ESCAPE_FONTBOLD
                    185: .It Cm P Ta Dv ESCAPE_FONTPREV
                    186: .It Cm BI Ta Dv ESCAPE_FONTBI
                    187: .El
                    188: .It Dv ESCAPE_SPECIAL
                    189: The escape sequence
                    190: .Ic \eC
                    191: taking an argument delimited with the single quote character
                    192: and, as a special exception, the escape sequences
                    193: .Em not
                    194: having an identifier, that is, those where the argument, in standard
                    195: form, directly follows the initial backslash:
                    196: .Ic \eC' , \e[ , \e( , \e Ns Ar a .
                    197: Note that the one-character argument short form can only be used for
                    198: argument characters that do not clash with escape sequence identifiers.
                    199: .Pp
1.2     ! schwarze  200: If the argument matches one of the forms described below under
        !           201: .Dv ESCAPE_UNICODE ,
        !           202: that value is returned instead.
1.1       schwarze  203: .Pp
                    204: The
                    205: .Dv ESCAPE_SPECIAL
                    206: special character escape sequences can be rendered using the functions
                    207: .Fn mchars_spec2cp
                    208: and
                    209: .Fn mchars_spec2str
                    210: described in the
                    211: .Xr mchars_alloc 3
                    212: manual.
                    213: .It Dv ESCAPE_UNICODE
                    214: Escape sequences of the same format as described above under
                    215: .Dv ESCAPE_SPECIAL ,
1.2     ! schwarze  216: but with an argument of the forms
        !           217: .Ic u Ns Ar XXXX ,
        !           218: .Ic u Ns Ar YXXXX ,
        !           219: or
        !           220: .Ic u10 Ns Ar XXXX
        !           221: where
        !           222: .Ar X
        !           223: and
        !           224: .Ar Y
        !           225: are hexadecimal digits and
        !           226: .Ar Y
        !           227: is not zero:
1.1       schwarze  228: .Ic \eC'u , \e[u .
                    229: As a special exception,
                    230: .Fa start
                    231: is set to the character after the
1.2     ! schwarze  232: .Ic u ,
1.1       schwarze  233: and the
                    234: .Fa sz
                    235: return value does not include the
1.2     ! schwarze  236: .Ic u
1.1       schwarze  237: either.
                    238: .Pp
                    239: Such Unicode character escape sequences can be rendered using the function
                    240: .Fn mchars_num2uc
                    241: described in the
                    242: .Xr mchars_alloc 3
                    243: manual.
                    244: .It Dv ESCAPE_NUMBERED
                    245: The escape sequence
                    246: .Ic \eN
                    247: followed by a delimited argument.
                    248: The delimiter character is arbitrary except that digits cannot be used.
                    249: If a digit is encountered instead of the opening delimiter, that
                    250: digit is considered to be the argument and the end of the sequence, and
                    251: .Dv ESCAPE_IGNORE
                    252: is returned.
                    253: .Pp
                    254: Such ASCII character escape sequences can be rendered using the function
                    255: .Fn mchars_num2char
                    256: described in the
                    257: .Xr mchars_alloc 3
                    258: manual.
                    259: .It Dv ESCAPE_IGNORE
                    260: .Bl -bullet -width 2n
                    261: .It
                    262: The escape sequence
                    263: .Ic \es
                    264: followed by an argument in standard form or by an argument delimited
                    265: by the single quote character:
                    266: .Ic \es' , \es[ , \es( , \es Ns Ar a .
                    267: As a special exception, an optional
                    268: .Sq +
                    269: or
                    270: .Sq \-
                    271: character is allowed after the
                    272: .Sq s
                    273: for all forms.
                    274: .It
                    275: The escape sequences
                    276: .Ic \eF ,
                    277: .Ic \eg ,
                    278: .Ic \ek ,
                    279: .Ic \eM ,
                    280: .Ic \em ,
                    281: .Ic \en ,
                    282: .Ic \eV ,
                    283: and
                    284: .Ic \eY
                    285: followed by an argument in standard form.
                    286: .It
                    287: The escape sequences
                    288: .Ic \eA ,
                    289: .Ic \eb ,
                    290: .Ic \eD ,
                    291: .Ic \eo ,
                    292: .Ic \eR ,
                    293: .Ic \eX ,
                    294: and
                    295: .Ic \eZ
                    296: followed by an argument delimited by an arbitrary character.
                    297: .It
                    298: The escape sequences
                    299: .Ic \eH ,
                    300: .Ic \eh ,
                    301: .Ic \eL ,
                    302: .Ic \el ,
                    303: .Ic \eS ,
                    304: .Ic \ev ,
                    305: and
                    306: .Ic \ex
                    307: followed by an argument delimited by a character that cannot occur
                    308: in numerical expressions.
                    309: However, if any character that can occur in numerical expressions
                    310: is found instead of a delimiter, the sequence is considered to end
                    311: with that character, and
                    312: .Dv ESCAPE_ERROR
                    313: is returned.
                    314: .El
                    315: .It Dv ESCAPE_ERROR
                    316: Escape sequences taking an argument but not matching any of the above patterns.
                    317: In particular, that happens if the end of the logical input line
                    318: is reached before the end of the argument.
                    319: .El
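The argument forms listed above under ESCAPE_UNICODE can be checked
along the following lines.
This is a minimal sketch, not the code used in mandoc.c: the function
name is_unicode_arg is hypothetical, and the sketch assumes that both
uppercase and lowercase hexadecimal digits are acceptable, which the
text above does not state.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical check, not the libmandoc code: does the argument
 * arg[0..len-1] have one of the forms uXXXX, uYXXXX with Y not zero,
 * or u10XXXX, where X and Y are hexadecimal digits?
 * Returns 1 if so and 0 otherwise.
 */
static int
is_unicode_arg(const char *arg, int len)
{
	int i;

	assert(arg != NULL);

	/* 'u' followed by four, five, or six digits */
	if (len < 5 || len > 7 || arg[0] != 'u')
		return 0;
	for (i = 1; i < len; i++)
		if (!isxdigit((unsigned char)arg[i]))
			return 0;

	/* uYXXXX: the leading digit Y must not be zero */
	if (len == 6 && arg[1] == '0')
		return 0;

	/* six digits are only allowed in the form u10XXXX */
	if (len == 7 && (arg[1] != '1' || arg[2] != '0'))
		return 0;

	return 1;
}
```

A caller following the special exception described for ESCAPE_UNICODE
would then report the argument starting after the u, with a length that
is one less than len.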
                    320: .Pp
                    321: For sequences that do not take an argument, the function
                    322: .Fn mandoc_escape
                    323: returns one of the following values:
                    324: .Bl -tag -width 2n
                    325: .It Dv ESCAPE_SKIPCHAR
                    326: The escape sequence
                    327: .Qq \ez .
                    328: .It Dv ESCAPE_NOSPACE
                    329: The escape sequence
                    330: .Qq \ec .
                    331: .It Dv ESCAPE_IGNORE
                    332: The escape sequences
                    333: .Qq \ed
                    334: and
                    335: .Qq \eu .
                    336: .El
                    337: .Sh FILES
                    338: This function is implemented in
                    339: .Pa mandoc.c .
                    340: .Sh SEE ALSO
                    341: .Xr mchars_alloc 3 ,
                    342: .Xr mandoc_char 7 ,
                    343: .Xr roff 7
                    344: .Sh HISTORY
                    345: This function has been available since mandoc 1.11.2.
                    346: .Sh AUTHORS
                    347: .An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
                    348: .An Ingo Schwarze Aq Mt schwarze@openbsd.org
                    349: .Sh BUGS
                    350: The function doesn't cleanly distinguish between sequences that are
                    351: valid and supported, valid and ignored, valid and unsupported,
                    352: syntactically invalid, or undefined.
                    353: For sequences that are ignored or unsupported, it doesn't tell
                    354: whether that deficiency is likely to cause major formatting problems
                    355: and/or loss of document content.
                    356: The function is already rather complicated and still parses some
                    357: sequences incorrectly.
                    358: .
                    359: .ig
                    360: For these sequences, the list given below specifies a starting string
                    361: and either the length of the argument or an ending character.
                    362: The argument starts after the starting string.
                    363: In the former case, the sequence ends with the end of the argument.
                    364: In the latter case, the argument ends before the ending character,
                    365: and the sequence ends with the ending character.
                    366: ..
