Annotation of mandoc/mandoc_escape.3, Revision 1.3

.\"    $Id: mandoc_escape.3,v 1.2 2014/10/28 14:06:31 schwarze Exp $
.\"
.\" Copyright (c) 2014 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: October 28 2014 $
.Dt MANDOC_ESCAPE 3
.Os
.Sh NAME
.Nm mandoc_escape
.Nd parse roff escape sequences
.Sh SYNOPSIS
.In sys/types.h
.In mandoc.h
.Ft "enum mandoc_esc"
.Fo mandoc_escape
.Fa "const char **end"
.Fa "const char **start"
.Fa "int *sz"
.Fc
.Sh DESCRIPTION
This function scans a
.Xr roff 7
escape sequence.
.Pp
An escape sequence consists of
.Bl -dash -compact -width 2n
.It
an initial backslash character
.Pq Sq \e ,
.It
a single ASCII character called the escape sequence identifier,
.It
and, with only a few exceptions, an argument.
.El
.Pp
Arguments can be given in the following forms; some escape sequence
identifiers accept only some of these forms, as specified below.
The first three forms are called the standard forms.
.Bl -tag -width 2n
.It \&In brackets: Ic \&[ Ns Ar argument Ns Ic \&]
The argument starts after the initial
.Sq \&[ ,
ends before the final
.Sq \&] ,
and the escape sequence ends with the final
.Sq \&] .
.It Two-character argument short form: Ic \&( Ns Ar ar
This form can only be used for arguments
consisting of exactly two characters.
It has the same effect as
.Ic \&[ Ns Ar ar Ns Ic \&] .
.It One-character argument short form: Ar a
This form can only be used for arguments
consisting of exactly one character.
It has the same effect as
.Ic \&[ Ns Ar a Ns Ic \&] .
.It Delimited form: Ar C Ns Ar argument Ns Ar C
The argument starts after the initial delimiter character
.Ar C ,
ends before the next occurrence of the delimiter character
.Ar C ,
and the escape sequence ends with that second
.Ar C .
Some escape sequences allow arbitrary delimiter characters
.Ar C ;
others restrict the range of characters that can be used as delimiters.
.El
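.Pp
The standard and delimited forms above can be illustrated with a short
C sketch.
The helper functions
.Fn scan_standard
and
.Fn scan_delimited
are hypothetical and are not part of the mandoc library.
They assume that the caller has already consumed the backslash and the
identifier.
For simplicity, they do not skip nested escape sequences inside
bracketed arguments.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical sketch, not mandoc code.  On entry, *end points to the
 * first character after the identifier.  On success, *start and *sz
 * describe the argument, *end points after the sequence, and 0 is
 * returned.  If the input ends early, -1 is returned.
 */

/* Standard forms: [argument], (ar, or a single character a. */
static int
scan_standard(const char **end, const char **start, int *sz)
{
	const char	*p, *last;

	assert(end != NULL && *end != NULL && start != NULL && sz != NULL);
	p = *end;
	switch (*p) {
	case '\0':
		return -1;
	case '[':
		/* Bracket form: the argument runs up to the next ']'. */
		if ((last = strchr(p + 1, ']')) == NULL)
			return -1;
		*start = p + 1;
		*sz = (int)(last - *start);
		*end = last + 1;
		return 0;
	case '(':
		/* Short form: exactly two argument characters follow. */
		if (p[1] == '\0' || p[2] == '\0')
			return -1;
		*start = p + 1;
		*sz = 2;
		*end = p + 3;
		return 0;
	default:
		/* Short form: the character itself is the argument. */
		*start = p;
		*sz = 1;
		*end = p + 1;
		return 0;
	}
}

/* Delimited form: C argument C, where C is the first character. */
static int
scan_delimited(const char **end, const char **start, int *sz)
{
	const char	*p, *last;

	assert(end != NULL && *end != NULL && start != NULL && sz != NULL);
	p = *end;
	if (*p == '\0')
		return -1;
	/* The argument ends before the next occurrence of the delimiter. */
	if ((last = strchr(p + 1, *p)) == NULL)
		return -1;
	*start = p + 1;
	*sz = (int)(last - *start);
	*end = last + 1;
	return 0;
}
```
.Ed
For example, given the input
.Qq [BI]x ,
.Fn scan_standard
reports the two-character argument
.Qq BI
and leaves
.Fa end
pointing to the final
.Sq x .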
.Pp
Upon function entry,
.Fa end
is expected to point to the character following the initial backslash,
which is the escape sequence identifier if the sequence has one.
The values passed in as
.Fa start
and
.Fa sz
are ignored and overwritten.
.Pp
By design, this function cannot handle those
.Xr roff 7
escape sequences that require in-place expansion, in particular
user-defined strings
.Ic \e* ,
number registers
.Ic \en ,
width measurements
.Ic \ew ,
and numerical expression validation
.Ic \eB .
These are handled by
.Fn roff_res ,
a private preprocessor function called from
.Fn roff_parseln ,
see the file
.Pa roff.c .
.Pp
The function
.Fn mandoc_escape
is used
.Bl -dash -compact -width 2n
.It
recursively by itself, because some escape sequence arguments can
in turn contain other escape sequences,
.It
for error detection internally by the
.Xr roff 7
parser part of the
.Xr mandoc 3
library, see the file
.Pa roff.c ,
.It
above all externally by the
.Xr mandoc 1
formatting modules, in particular
.Fl Tascii
and
.Fl Thtml ,
for formatting purposes, see the files
.Pa term.c
and
.Pa html.c ,
.It
and rarely externally by high-level utilities using the mandoc library,
for example
.Xr makewhatis 8 ,
to purge escape sequences from text.
.El
.Sh RETURN VALUES
Upon function return, the pointer
.Fa end
is set to the character after the end of the escape sequence,
so that the calling higher-level parser can continue from there.
.Pp
For escape sequences taking an argument, the pointer
.Fa start
is set to the beginning of the argument and
.Fa sz
is set to the length of the argument.
For escape sequences not taking an argument,
.Fa start
is set to the character after the end of the sequence and
.Fa sz
is set to 0.
Both
.Fa start
and
.Fa sz
may be
.Dv NULL ;
in that case, the argument and the length are not returned.
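.Pp
The calling convention can be sketched in C as a loop that purges
escape sequences from text, roughly the task mentioned above for
.Xr makewhatis 8 .
The real
.Fn mandoc_escape
requires the mandoc library.
For that reason, the sketch substitutes a hypothetical, greatly
simplified
.Fn toy_escape
that follows the same argument conventions.
It understands only
.Ic \ef ,
.Ic \ec ,
and special characters in standard form.
Like the real function, it accepts
.Dv NULL
for
.Fa start
and
.Fa sz .
.Fn strip_escapes
is likewise hypothetical.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ROFF_ESC '\\'	/* the roff escape character */

/* Hypothetical result values; the real ones are in mandoc.h. */
enum toy_esc { TOY_ERROR, TOY_FONT, TOY_SPECIAL, TOY_NOSPACE };

/* Hypothetical stand-in; on entry, *end points after the backslash. */
static enum toy_esc
toy_escape(const char **end, const char **start, int *sz)
{
	const char	*local_start, *p, *last;
	int		 local_sz;
	enum toy_esc	 kind;

	assert(end != NULL && *end != NULL);
	if (start == NULL)		/* NULL means "not wanted" */
		start = &local_start;
	if (sz == NULL)
		sz = &local_sz;

	p = *end;
	switch (*p) {
	case '\0':
		return TOY_ERROR;
	case 'c':
		/* No argument: start points after the sequence, sz is 0. */
		*end = *start = p + 1;
		*sz = 0;
		return TOY_NOSPACE;
	case 'f':
		kind = TOY_FONT;
		p++;			/* the argument follows the identifier */
		break;
	default:
		kind = TOY_SPECIAL;	/* no identifier: argument starts here */
		break;
	}

	switch (*p) {			/* standard forms only */
	case '\0':
		return TOY_ERROR;
	case '[':
		if ((last = strchr(p + 1, ']')) == NULL)
			return TOY_ERROR;
		*start = p + 1;
		*sz = (int)(last - *start);
		*end = last + 1;
		return kind;
	case '(':
		if (p[1] == '\0' || p[2] == '\0')
			return TOY_ERROR;
		*start = p + 1;
		*sz = 2;
		*end = p + 3;
		return kind;
	default:
		*start = p;
		*sz = 1;
		*end = p + 1;
		return kind;
	}
}

/* Hypothetical: copy in to out without escapes; -1 on error/overflow. */
static int
strip_escapes(const char *in, char *out, size_t outsz)
{
	const char	*p = in;
	size_t		 len = 0;

	if (outsz == 0)
		return -1;
	while (*p != '\0') {
		if (*p != ROFF_ESC) {
			if (len + 1 >= outsz)
				return -1;
			out[len++] = *p++;
			continue;
		}
		p++;			/* now at the identifier */
		if (toy_escape(&p, NULL, NULL) == TOY_ERROR)
			return -1;
		/* p now points after the sequence; keep scanning. */
	}
	out[len] = '\0';
	return 0;
}
```
.Ed
For example,
.Fn strip_escapes
turns the input
.Ql \efBbold\efR and \e(emdash\ec
into
.Ql bold and dash .
The caller never needs to know how long a sequence was; it simply
resumes at the updated end pointer.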
.Pp
For sequences taking an argument, the function
.Fn mandoc_escape
returns one of the following values:
.Bl -tag -width 2n
.It Dv ESCAPE_FONT
The escape sequence
.Ic \ef
taking an argument in standard form:
.Ic \ef[ , \ef( , \ef Ns Ar a .
Two-character arguments starting with the character
.Sq C
are reduced to one-character arguments by skipping the
.Sq C .
More specific values are returned for the most commonly used arguments:
.Bl -column "argument" "ESCAPE_FONTITALIC"
.It argument Ta return value
.It Cm R No or Cm 1 Ta Dv ESCAPE_FONTROMAN
.It Cm I No or Cm 2 Ta Dv ESCAPE_FONTITALIC
.It Cm B No or Cm 3 Ta Dv ESCAPE_FONTBOLD
.It Cm P Ta Dv ESCAPE_FONTPREV
.It Cm BI Ta Dv ESCAPE_FONTBI
.El
.It Dv ESCAPE_SPECIAL
The escape sequence
.Ic \eC
taking an argument delimited by the single quote character
and, as a special exception, the escape sequences
.Em not
having an identifier, that is, those where the argument, in standard
form, directly follows the initial backslash:
.Ic \eC' , \e[ , \e( , \e Ns Ar a .
Note that the one-character argument short form can only be used for
argument characters that do not clash with escape sequence identifiers.
.Pp
If the argument matches one of the forms described below under
.Dv ESCAPE_UNICODE ,
that value is returned instead.
.Pp
The
.Dv ESCAPE_SPECIAL
special character escape sequences can be rendered using the functions
.Fn mchars_spec2cp
and
.Fn mchars_spec2str
described in the
.Xr mchars_alloc 3
manual.
.It Dv ESCAPE_UNICODE
Escape sequences of the same format as described above under
.Dv ESCAPE_SPECIAL ,
but with an argument of the forms
.Ic u Ns Ar XXXX ,
.Ic u Ns Ar YXXXX ,
or
.Ic u10 Ns Ar XXXX ,
where
.Ar X
and
.Ar Y
are hexadecimal digits and
.Ar Y
is not zero:
.Ic \eC'u , \e[u .
As a special exception,
.Fa start
is set to the character after the
.Ic u ,
and the
.Fa sz
return value does not include the
.Ic u
either.
.Pp
Such Unicode character escape sequences can be rendered using the function
.Fn mchars_num2uc
described in the
.Xr mchars_alloc 3
manual.
.It Dv ESCAPE_NUMBERED
The escape sequence
.Ic \eN
followed by a delimited argument.
The delimiter character is arbitrary except that digits cannot be used.
If a digit is encountered instead of the opening delimiter, that
digit is considered to be the argument and the end of the sequence, and
.Dv ESCAPE_IGNORE
is returned.
.Pp
Such ASCII character escape sequences can be rendered using the function
.Fn mchars_num2char
described in the
.Xr mchars_alloc 3
manual.
.It Dv ESCAPE_OVERSTRIKE
The escape sequence
.Ic \eo
followed by an argument delimited by an arbitrary character.
.It Dv ESCAPE_IGNORE
.Bl -bullet -width 2n
.It
The escape sequence
.Ic \es
followed by an argument in standard form or by an argument delimited
by the single quote character:
.Ic \es' , \es[ , \es( , \es Ns Ar a .
As a special exception, an optional
.Sq +
or
.Sq \-
character is allowed after the
.Sq s
for all forms.
.It
The escape sequences
.Ic \eF ,
.Ic \eg ,
.Ic \ek ,
.Ic \eM ,
.Ic \em ,
.Ic \en ,
.Ic \eV ,
and
.Ic \eY
followed by an argument in standard form.
.It
The escape sequences
.Ic \eA ,
.Ic \eb ,
.Ic \eD ,
.Ic \eR ,
.Ic \eX ,
and
.Ic \eZ
followed by an argument delimited by an arbitrary character.
.It
The escape sequences
.Ic \eH ,
.Ic \eh ,
.Ic \eL ,
.Ic \el ,
.Ic \eS ,
.Ic \ev ,
and
.Ic \ex
followed by an argument delimited by a character that cannot occur
in numerical expressions.
However, if any character that can occur in numerical expressions
is found instead of a delimiter, the sequence is considered to end
with that character, and
.Dv ESCAPE_ERROR
is returned.
.El
.It Dv ESCAPE_ERROR
Escape sequences taking an argument but not matching any of the above patterns.
In particular, that happens if the end of the logical input line
is reached before the end of the argument.
.El
.Pp
For sequences that do not take an argument, the function
.Fn mandoc_escape
returns one of the following values:
.Bl -tag -width 2n
.It Dv ESCAPE_SKIPCHAR
The escape sequence
.Ic \ez .
.It Dv ESCAPE_NOSPACE
The escape sequence
.Ic \ec .
.It Dv ESCAPE_IGNORE
The escape sequences
.Ic \ed
and
.Ic \eu .
.El
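.Pp
Taken together, the two lists above amount to a dispatch on the
character following the backslash.
The hypothetical C function
.Fn classify_ident
only transcribes them.
It reports which argument form is expected and which value a
well-formed sequence yields.
It does not parse the argument.
It also ignores the cases where the argument itself changes the result,
such as Unicode arguments or a digit following
.Ic \eN .
Its enumerations are illustrative stand-ins for the ones declared in
.In mandoc.h .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <string.h>

/* Hypothetical labels for the argument forms described above. */
enum arg_form {
	FORM_ERROR,		/* input ended instead of an identifier */
	FORM_NONE,		/* no argument */
	FORM_NO_IDENT,		/* no identifier, e.g. [ or ( follows */
	FORM_STANDARD,		/* [argument], (ar, or a */
	FORM_QUOTE,		/* delimited by the single quote */
	FORM_STD_OR_QUOTE,	/* standard or quoted, optional sign */
	FORM_DELIM_ANY,		/* arbitrary delimiter */
	FORM_DELIM_NONDIGIT,	/* any delimiter except a digit */
	FORM_DELIM_NONNUMERIC	/* delimiter not valid in expressions */
};

/* Hypothetical stand-ins for the ESCAPE_* values. */
enum esc_kind {
	KIND_ERROR, KIND_FONT, KIND_SPECIAL, KIND_NUMBERED,
	KIND_OVERSTRIKE, KIND_IGNORE, KIND_SKIPCHAR, KIND_NOSPACE
};

/* True if c is a member of set; NUL is never a member. */
static int
in_set(const char *set, char c)
{
	return c != '\0' && strchr(set, c) != NULL;
}

/* Hypothetical: map the identifier to its argument form and result. */
static enum arg_form
classify_ident(char id, enum esc_kind *kind)
{
	assert(kind != NULL);
	switch (id) {
	case '\0':
		*kind = KIND_ERROR;
		return FORM_ERROR;
	case 'f':
		*kind = KIND_FONT;
		return FORM_STANDARD;
	case 'C':
		*kind = KIND_SPECIAL;
		return FORM_QUOTE;
	case 'N':
		*kind = KIND_NUMBERED;
		return FORM_DELIM_NONDIGIT;
	case 'o':
		*kind = KIND_OVERSTRIKE;
		return FORM_DELIM_ANY;
	case 'z':
		*kind = KIND_SKIPCHAR;
		return FORM_NONE;
	case 'c':
		*kind = KIND_NOSPACE;
		return FORM_NONE;
	case 's':
		*kind = KIND_IGNORE;
		return FORM_STD_OR_QUOTE;
	default:
		break;
	}
	*kind = KIND_IGNORE;
	if (in_set("FgkMmnVY", id))
		return FORM_STANDARD;
	if (in_set("AbDRXZ", id))
		return FORM_DELIM_ANY;
	if (in_set("HhLlSvx", id))
		return FORM_DELIM_NONNUMERIC;
	if (in_set("du", id))
		return FORM_NONE;
	/* Anything else is a special character without an identifier. */
	*kind = KIND_SPECIAL;
	return FORM_NO_IDENT;
}
```
.Ed
For example, the identifier
.Sq h
yields
.Dv FORM_DELIM_NONNUMERIC
with
.Dv KIND_IGNORE ,
matching the
.Ic \eh
entry above.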
.Sh FILES
This function is implemented in
.Pa mandoc.c .
.Sh SEE ALSO
.Xr mchars_alloc 3 ,
.Xr mandoc_char 7 ,
.Xr roff 7
.Sh HISTORY
This function has been available since mandoc 1.11.2.
.Sh AUTHORS
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
.An Ingo Schwarze Aq Mt schwarze@openbsd.org
.Sh BUGS
The function doesn't cleanly distinguish between sequences that are
valid and supported, valid and ignored, valid and unsupported,
syntactically invalid, or undefined.
For sequences that are ignored or unsupported, it doesn't tell
whether that deficiency is likely to cause major formatting problems
and/or loss of document content.
The function is already rather complicated and still parses some
sequences incorrectly.
.
.ig
For these sequences, the list given below specifies a starting string
and either the length of the argument or an ending character.
The argument starts after the starting string.
In the former case, the sequence ends with the end of the argument.
In the latter case, the argument ends before the ending character,
and the sequence ends with the ending character.
..
