Annotation of mandoc/mandoc_escape.3, Revision 1.2
1.2 ! schwarze 1: .\" $Id: mandoc_escape.3,v 1.1 2014/08/05 05:48:56 schwarze Exp $
1.1 schwarze 2: .\"
3: .\" Copyright (c) 2014 Ingo Schwarze <schwarze@openbsd.org>
4: .\"
5: .\" Permission to use, copy, modify, and distribute this software for any
6: .\" purpose with or without fee is hereby granted, provided that the above
7: .\" copyright notice and this permission notice appear in all copies.
8: .\"
9: .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
10: .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
11: .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
12: .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
13: .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
14: .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
15: .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
16: .\"
1.2 ! schwarze 17: .Dd $Mdocdate: August 5 2014 $
1.1 schwarze 18: .Dt MANDOC_ESCAPE 3
19: .Os
20: .Sh NAME
21: .Nm mandoc_escape
22: .Nd parse roff escape sequences
23: .Sh LIBRARY
24: .Lb libmandoc
25: .Sh SYNOPSIS
26: .In sys/types.h
27: .In mandoc.h
28: .Ft "enum mandoc_esc"
29: .Fo mandoc_escape
30: .Fa "const char **end"
31: .Fa "const char **start"
32: .Fa "int *sz"
33: .Fc
34: .Sh DESCRIPTION
35: This function scans a
36: .Xr roff 7
37: escape sequence.
38: .Pp
39: An escape sequence consists of
40: .Bl -dash -compact -width 2n
41: .It
42: an initial backslash character
43: .Pq Sq \e ,
44: .It
45: a single ASCII character called the escape sequence identifier,
46: .It
47: and, with only a few exceptions, an argument.
48: .El
49: .Pp
50: Arguments can be given in the following forms; some escape sequence
51: identifiers only accept some of these forms as specified below.
52: The first three forms are called the standard forms.
53: .Bl -tag -width 2n
54: .It \&In brackets: Ic \&[ Ns Ar argument Ns Ic \&]
55: The argument starts after the initial
56: .Sq \&[ ,
57: ends before the final
58: .Sq \&] ,
59: and the escape sequence ends with the final
60: .Sq \&] .
61: .It Two-character argument short form: Ic \&( Ns Ar ar
62: This form can only be used for arguments
63: consisting of exactly two characters.
64: It has the same effect as
65: .Ic \&[ Ns Ar ar Ns Ic \&] .
66: .It One-character argument short form: Ar a
67: This form can only be used for arguments
68: consisting of exactly one character.
69: It has the same effect as
70: .Ic \&[ Ns Ar a Ns Ic \&] .
71: .It Delimited form: Ar C Ns Ar argument Ns Ar C
72: The argument starts after the initial delimiter character
73: .Ar C ,
74: ends before the next occurrence of the delimiter character
75: .Ar C ,
76: and the escape sequence ends with that second
77: .Ar C .
78: Some escape sequences allow arbitrary characters
79: .Ar C
80: as quoting characters; others restrict the range of characters
81: that can be used as quoting characters.
82: .El
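The four argument forms listed above can be illustrated with a small standalone scanner. This is only a sketch under stated assumptions, not the libmandoc implementation: the names toy_form and toy_scan_arg are hypothetical, the caller rather than the escape sequence identifier decides whether the delimited form applies, and escape sequences nested inside an argument are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration only; not part of libmandoc. */
enum toy_form { TOY_BRACKET, TOY_TWO, TOY_ONE, TOY_DELIM, TOY_BAD };

/*
 * Scan one argument.  p points to the character following the
 * escape sequence identifier.  If allow_delim is non-zero, the
 * delimited form is expected; otherwise one of the standard forms.
 * On success, *arg and *len describe the argument and *next points
 * to the character after the end of the sequence.
 */
enum toy_form
toy_scan_arg(const char *p, int allow_delim,
    const char **arg, size_t *len, const char **next)
{
	const char *q;

	assert(p != NULL && arg != NULL && len != NULL && next != NULL);

	if (*p == '\0')
		return TOY_BAD;

	if (allow_delim) {
		/* Delimited form: the first character is the delimiter C. */
		if ((q = strchr(p + 1, *p)) == NULL)
			return TOY_BAD;
		*arg = p + 1;
		*len = (size_t)(q - (p + 1));
		*next = q + 1;
		return TOY_DELIM;
	}

	switch (*p) {
	case '[':
		/* In brackets: the argument ends before the final ']'. */
		if ((q = strchr(p + 1, ']')) == NULL)
			return TOY_BAD;
		*arg = p + 1;
		*len = (size_t)(q - (p + 1));
		*next = q + 1;
		return TOY_BRACKET;
	case '(':
		/* Two-character short form: exactly two characters follow. */
		if (p[1] == '\0' || p[2] == '\0')
			return TOY_BAD;
		*arg = p + 1;
		*len = 2;
		*next = p + 3;
		return TOY_TWO;
	default:
		/* One-character short form. */
		*arg = p;
		*len = 1;
		*next = p + 1;
		return TOY_ONE;
	}
}
```

In this sketch, an argument that is cut short by the end of the string yields TOY_BAD, loosely mirroring the case where the end of the logical input line is reached before the end of the argument.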
83: .Pp
84: Upon function entry,
85: .Fa end
86: is expected to point to the escape sequence identifier.
87: The values passed in as
88: .Fa start
89: and
90: .Fa sz
91: are ignored and overwritten.
92: .Pp
93: By design, this function cannot handle those
94: .Xr roff 7
95: escape sequences that require in-place expansion, in particular
96: user-defined strings
97: .Ic \e* ,
98: number registers
99: .Ic \en ,
100: width measurements
101: .Ic \ew ,
102: and numerical expression control
103: .Ic \eB .
104: These are handled by
105: .Fn roff_res ,
106: a private preprocessor function called from
107: .Fn roff_parseln ,
108: see the file
109: .Pa roff.c .
110: .Pp
111: The function
112: .Fn mandoc_escape
113: is used
114: .Bl -dash -compact -width 2n
115: .It
116: recursively by itself, because some escape sequence arguments can
117: in turn contain other escape sequences,
118: .It
119: for error detection internally by the
120: .Xr roff 7
121: parser part of the
122: .Lb libmandoc ,
123: see the file
124: .Pa roff.c ,
125: .It
126: above all externally by the
127: .Xr mandoc 1
128: formatting modules, in particular
129: .Fl Tascii
130: and
131: .Fl Thtml ,
132: for formatting purposes, see the files
133: .Pa term.c
134: and
135: .Pa html.c ,
136: .It
137: and rarely externally by high-level utilities using the mandoc library,
138: for example
139: .Xr makewhatis 8 ,
140: to purge escape sequences from text.
141: .El
142: .Sh RETURN VALUES
143: Upon function return, the pointer
144: .Fa end
145: is set to the character after the end of the escape sequence,
146: such that the calling higher-level parser can easily continue.
147: .Pp
148: For escape sequences taking an argument, the pointer
149: .Fa start
150: is set to the beginning of the argument and
151: .Fa sz
152: is set to the length of the argument.
153: For escape sequences not taking an argument,
154: .Fa start
155: is set to the character after the end of the sequence and
156: .Fa sz
157: is set to 0.
158: Both
159: .Fa start
160: and
161: .Fa sz
162: may be
163: .Dv NULL ;
164: in that case, the argument and the length are not returned.
165: .Pp
166: For sequences taking an argument, the function
167: .Fn mandoc_escape
168: returns one of the following values:
169: .Bl -tag -width 2n
170: .It Dv ESCAPE_FONT
171: The escape sequence
172: .Ic \ef
173: taking an argument in standard form:
174: .Ic \ef[ , \ef( , \ef Ns Ar a .
175: Two-character arguments starting with the character
176: .Sq C
177: are reduced to one-character arguments by skipping the
178: .Sq C .
179: More specific values are returned for the most commonly used arguments:
180: .Bl -column "argument" "ESCAPE_FONTITALIC"
181: .It argument Ta return value
182: .It Cm R No or Cm 1 Ta Dv ESCAPE_FONTROMAN
183: .It Cm I No or Cm 2 Ta Dv ESCAPE_FONTITALIC
184: .It Cm B No or Cm 3 Ta Dv ESCAPE_FONTBOLD
185: .It Cm P Ta Dv ESCAPE_FONTPREV
186: .It Cm BI Ta Dv ESCAPE_FONTBI
187: .El
188: .It Dv ESCAPE_SPECIAL
189: The escape sequence
190: .Ic \eC
191: taking an argument delimited with the single quote character
192: and, as a special exception, the escape sequences
193: .Em not
194: having an identifier, that is, those where the argument, in standard
195: form, directly follows the initial backslash:
196: .Ic \eC' , \e[ , \e( , \e Ns Ar a .
197: Note that the one-character argument short form can only be used for
198: argument characters that do not clash with escape sequence identifiers.
199: .Pp
1.2 ! schwarze 200: If the argument matches one of the forms described below under
! 201: .Dv ESCAPE_UNICODE ,
! 202: that value is returned instead.
1.1 schwarze 203: .Pp
204: The
205: .Dv ESCAPE_SPECIAL
206: special character escape sequences can be rendered using the functions
207: .Fn mchars_spec2cp
208: and
209: .Fn mchars_spec2str
210: described in the
211: .Xr mchars_alloc 3
212: manual.
213: .It Dv ESCAPE_UNICODE
214: Escape sequences of the same format as described above under
215: .Dv ESCAPE_SPECIAL ,
1.2 ! schwarze 216: but with an argument of the forms
! 217: .Ic u Ns Ar XXXX ,
! 218: .Ic u Ns Ar YXXXX ,
! 219: or
! 220: .Ic u10 Ns Ar XXXX
! 221: where
! 222: .Ar X
! 223: and
! 224: .Ar Y
! 225: are hexadecimal digits and
! 226: .Ar Y
! 227: is not zero:
1.1 schwarze 228: .Ic \eC'u , \e[u .
229: As a special exception,
230: .Fa start
231: is set to the character after the
1.2 ! schwarze 232: .Ic u ,
1.1 schwarze 233: and the
234: .Fa sz
235: return value does not include the
1.2 ! schwarze 236: .Ic u
1.1 schwarze 237: either.
238: .Pp
239: Such Unicode character escape sequences can be rendered using the function
240: .Fn mchars_num2uc
241: described in the
242: .Xr mchars_alloc 3
243: manual.
244: .It Dv ESCAPE_NUMBERED
245: The escape sequence
246: .Ic \eN
247: followed by a delimited argument.
248: The delimiter character is arbitrary except that digits cannot be used.
249: If a digit is encountered instead of the opening delimiter, that
250: digit is considered to be the argument and the end of the sequence, and
251: .Dv ESCAPE_IGNORE
252: is returned.
253: .Pp
254: Such ASCII character escape sequences can be rendered using the function
255: .Fn mchars_num2char
256: described in the
257: .Xr mchars_alloc 3
258: manual.
259: .It Dv ESCAPE_IGNORE
260: .Bl -bullet -width 2n
261: .It
262: The escape sequence
263: .Ic \es
264: followed by an argument in standard form or by an argument delimited
265: by the single quote character:
266: .Ic \es' , \es[ , \es( , \es Ns Ar a .
267: As a special exception, an optional
268: .Sq +
269: or
270: .Sq \-
271: character is allowed after the
272: .Sq s
273: for all forms.
274: .It
275: The escape sequences
276: .Ic \eF ,
277: .Ic \eg ,
278: .Ic \ek ,
279: .Ic \eM ,
280: .Ic \em ,
281: .Ic \en ,
282: .Ic \eV ,
283: and
284: .Ic \eY
285: followed by an argument in standard form.
286: .It
287: The escape sequences
288: .Ic \eA ,
289: .Ic \eb ,
290: .Ic \eD ,
291: .Ic \eo ,
292: .Ic \eR ,
293: .Ic \eX ,
294: and
295: .Ic \eZ
296: followed by an argument delimited by an arbitrary character.
297: .It
298: The escape sequences
299: .Ic \eH ,
300: .Ic \eh ,
301: .Ic \eL ,
302: .Ic \el ,
303: .Ic \eS ,
304: .Ic \ev ,
305: and
306: .Ic \ex
307: followed by an argument delimited by a character that cannot occur
308: in numerical expressions.
309: However, if any character that can occur in numerical expressions
310: is found instead of a delimiter, the sequence is considered to end
311: with that character, and
312: .Dv ESCAPE_ERROR
313: is returned.
314: .El
315: .It Dv ESCAPE_ERROR
316: Escape sequences taking an argument but not matching any of the above patterns.
317: In particular, that happens if the end of the logical input line
318: is reached before the end of the argument.
319: .El
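The argument forms described above under ESCAPE_UNICODE (u followed by XXXX, YXXXX with Y not zero, or 10XXXX) can be checked as in the following standalone sketch. The function name toy_unicode_len is hypothetical and not part of libmandoc. Accepting lowercase hexadecimal digits is an assumption of this sketch; the text above does not say whether the library does so.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper for illustration only; not part of libmandoc.
 * arg points to the argument including the leading 'u' and sz is
 * its length.  Returns the number of hexadecimal digits following
 * the 'u' if the argument has one of the three documented forms,
 * or 0 if it does not.
 */
size_t
toy_unicode_len(const char *arg, size_t sz)
{
	size_t i, ndigits;

	assert(arg != NULL);

	/* 'u' plus four, five, or six digits. */
	if (sz < 5 || sz > 7 || arg[0] != 'u')
		return 0;
	ndigits = sz - 1;

	/* Assumption: both upper and lower case digits are accepted. */
	for (i = 1; i < sz; i++)
		if (!isxdigit((unsigned char)arg[i]))
			return 0;

	/* Form uYXXXX: the leading digit Y must not be zero. */
	if (ndigits == 5 && arg[1] == '0')
		return 0;

	/* Form u10XXXX: six digits are only valid with the prefix 10. */
	if (ndigits == 6 && (arg[1] != '1' || arg[2] != '0'))
		return 0;

	return ndigits;
}
```

The returned count excludes the leading u, in the same spirit as the sz value documented above for ESCAPE_UNICODE.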
320: .Pp
321: For sequences that do not take an argument, the function
322: .Fn mandoc_escape
323: returns one of the following values:
324: .Bl -tag -width 2n
325: .It Dv ESCAPE_SKIPCHAR
326: The escape sequence
327: .Qq \ez .
328: .It Dv ESCAPE_NOSPACE
329: The escape sequence
330: .Qq \ec .
331: .It Dv ESCAPE_IGNORE
332: The escape sequences
333: .Qq \ed
334: and
335: .Qq \eu .
336: .El
337: .Sh FILES
338: This function is implemented in
339: .Pa mandoc.c .
340: .Sh SEE ALSO
341: .Xr mchars_alloc 3 ,
342: .Xr mandoc_char 7 ,
343: .Xr roff 7
344: .Sh HISTORY
345: This function has been available since mandoc 1.11.2.
346: .Sh AUTHORS
347: .An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
348: .An Ingo Schwarze Aq Mt schwarze@openbsd.org
349: .Sh BUGS
350: The function does not cleanly distinguish between sequences that are
351: valid and supported, valid and ignored, valid and unsupported,
352: syntactically invalid, or undefined.
353: For sequences that are ignored or unsupported, it does not tell
354: whether that deficiency is likely to cause major formatting problems
355: or loss of document content.
356: The function is already rather complicated and still parses some
357: sequences incorrectly.
358: .
359: .ig
360: For these sequences, the list given below specifies a starting string
361: and either the length of the argument or an ending character.
362: The argument starts after the starting string.
363: In the former case, the sequence ends with the end of the argument.
364: In the latter case, the argument ends before the ending character,
365: and the sequence ends with the ending character.
366: ..