Annotation of mandoc/mandoc_escape.3, Revision 1.5
1.5 ! schwarze 1: .\" $Id: mandoc_escape.3,v 1.4 2017/07/04 23:40:01 schwarze Exp $
1.1 schwarze 2: .\"
3: .\" Copyright (c) 2014 Ingo Schwarze <schwarze@openbsd.org>
4: .\"
5: .\" Permission to use, copy, modify, and distribute this software for any
6: .\" purpose with or without fee is hereby granted, provided that the above
7: .\" copyright notice and this permission notice appear in all copies.
8: .\"
9: .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
10: .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
11: .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
12: .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
13: .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
14: .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
15: .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
16: .\"
1.5 ! schwarze 17: .Dd $Mdocdate: July 4 2017 $
1.1 schwarze 18: .Dt MANDOC_ESCAPE 3
19: .Os
20: .Sh NAME
21: .Nm mandoc_escape
22: .Nd parse roff escape sequences
23: .Sh SYNOPSIS
24: .In sys/types.h
25: .In mandoc.h
26: .Ft "enum mandoc_esc"
27: .Fo mandoc_escape
28: .Fa "const char **end"
29: .Fa "const char **start"
30: .Fa "int *sz"
31: .Fc
32: .Sh DESCRIPTION
33: This function scans a
34: .Xr roff 7
35: escape sequence.
36: .Pp
37: An escape sequence consists of
38: .Bl -dash -compact -width 2n
39: .It
40: an initial backslash character
41: .Pq Sq \e ,
42: .It
43: a single ASCII character called the escape sequence identifier,
44: .It
45: and, with only a few exceptions, an argument.
46: .El
47: .Pp
48: Arguments can be given in the following forms; some escape sequence
49: identifiers only accept some of these forms as specified below.
50: The first three forms are called the standard forms.
51: .Bl -tag -width 2n
52: .It \&In brackets: Ic \&[ Ns Ar argument Ns Ic \&]
53: The argument starts after the initial
54: .Sq \&[ ,
55: ends before the final
56: .Sq \&] ,
57: and the escape sequence ends with the final
58: .Sq \&] .
59: .It Two-character argument short form: Ic \&( Ns Ar ar
60: This form can only be used for arguments
61: consisting of exactly two characters.
62: It has the same effect as
63: .Ic \&[ Ns Ar ar Ns Ic \&] .
64: .It One-character argument short form: Ar a
65: This form can only be used for arguments
66: consisting of exactly one character.
67: It has the same effect as
68: .Ic \&[ Ns Ar a Ns Ic \&] .
69: .It Delimited form: Ar C Ns Ar argument Ns Ar C
70: The argument starts after the initial delimiter character
71: .Ar C ,
72: ends before the next occurrence of the delimiter character
73: .Ar C ,
74: and the escape sequence ends with that second
75: .Ar C .
76: Some escape sequences allow arbitrary characters
77: .Ar C
78: as delimiters; others restrict the range of characters
79: that can be used as delimiters.
80: .El
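.Pp
The three standard forms and the delimited form can be illustrated
with the following minimal C sketch.
It is not the code used by mandoc_escape: the helper names
scan_std_arg and scan_delim_arg are hypothetical, nested escape
sequences are not handled, and no restriction on delimiter
characters is enforced.

```c
#include <stddef.h>
#include <string.h>

/*
 * Illustrative sketch only, NOT the mandoc implementation.
 * Both helpers are hypothetical.  On entry, *p points just after
 * the escape sequence identifier.  On success, *start and *sz
 * describe the argument, *p points to the character after the end
 * of the sequence, and 0 is returned.  If the input ends before
 * the argument is complete, -1 is returned and nothing is changed.
 */
int
scan_std_arg(const char **p, const char **start, int *sz)
{
	const char *cp = *p;
	const char *close;

	switch (*cp) {
	case '\0':		/* nothing follows the identifier */
		return -1;
	case '[':		/* in brackets: [argument] */
		close = strchr(cp + 1, ']');
		if (close == NULL)
			return -1;
		*start = cp + 1;
		*sz = (int)(close - *start);
		*p = close + 1;
		return 0;
	case '(':		/* two-character short form: (ar */
		if (cp[1] == '\0' || cp[2] == '\0')
			return -1;
		*start = cp + 1;
		*sz = 2;
		*p = cp + 3;
		return 0;
	default:		/* one-character short form: a */
		*start = cp;
		*sz = 1;
		*p = cp + 1;
		return 0;
	}
}

/* Delimited form: the first character C is the delimiter. */
int
scan_delim_arg(const char **p, const char **start, int *sz)
{
	const char *cp = *p;
	const char *close;

	if (*cp == '\0')
		return -1;
	close = strchr(cp + 1, *cp);	/* next occurrence of C */
	if (close == NULL)
		return -1;
	*start = cp + 1;
	*sz = (int)(close - *start);
	*p = close + 1;
	return 0;
}
```

For example, given the text [rs]x, scan_std_arg reports the argument
rs with a length of 2 and leaves the position pointer at the x.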
81: .Pp
82: Upon function entry,
83: .Fa end
84: is expected to point to the escape sequence identifier.
85: The values passed in as
86: .Fa start
87: and
88: .Fa sz
89: are ignored and overwritten.
90: .Pp
91: By design, this function cannot handle those
92: .Xr roff 7
93: escape sequences that require in-place expansion, in particular
94: user-defined strings
95: .Ic \e* ,
96: number registers
97: .Ic \en ,
98: width measurements
99: .Ic \ew ,
100: and numerical expression control
101: .Ic \eB .
102: These are handled by
1.5 ! schwarze 103: .Fn roff_expand ,
1.1 schwarze 104: a private preprocessor function called from
105: .Fn roff_parseln ,
106: see the file
107: .Pa roff.c .
108: .Pp
109: The function
110: .Fn mandoc_escape
111: is used
112: .Bl -dash -compact -width 2n
113: .It
114: recursively by itself, because some escape sequence arguments can
115: in turn contain other escape sequences,
116: .It
117: for error detection internally by the
118: .Xr roff 7
119: parser part of the
1.3 schwarze 120: .Xr mandoc 3
121: library, see the file
1.1 schwarze 122: .Pa roff.c ,
123: .It
124: above all externally by the
1.4 schwarze 125: .Xr mandoc 1
1.1 schwarze 126: formatting modules, in particular
127: .Fl Tascii
128: and
129: .Fl Thtml ,
130: for formatting purposes, see the files
131: .Pa term.c
132: and
133: .Pa html.c ,
134: .It
135: and rarely externally by high-level utilities using the mandoc library,
136: for example
137: .Xr makewhatis 8 ,
138: to purge escape sequences from text.
139: .El
140: .Sh RETURN VALUES
141: Upon function return, the pointer
142: .Fa end
143: is set to the character after the end of the escape sequence,
144: such that the calling higher-level parser can easily continue.
145: .Pp
146: For escape sequences taking an argument, the pointer
147: .Fa start
148: is set to the beginning of the argument and
149: .Fa sz
150: is set to the length of the argument.
151: For escape sequences not taking an argument,
152: .Fa start
153: is set to the character after the end of the sequence and
154: .Fa sz
155: is set to 0.
156: Both
157: .Fa start
158: and
159: .Fa sz
160: may be
161: .Dv NULL ;
162: in that case, the argument and the length are not returned.
163: .Pp
164: For sequences taking an argument, the function
165: .Fn mandoc_escape
166: returns one of the following values:
167: .Bl -tag -width 2n
168: .It Dv ESCAPE_FONT
169: The escape sequence
170: .Ic \ef
171: taking an argument in standard form:
172: .Ic \ef[ , \ef( , \ef Ns Ar a .
173: Two-character arguments starting with the character
174: .Sq C
175: are reduced to one-character arguments by skipping the
176: .Sq C .
177: More specific values are returned for the most commonly used arguments:
178: .Bl -column "argument" "ESCAPE_FONTITALIC"
179: .It argument Ta return value
180: .It Cm R No or Cm 1 Ta Dv ESCAPE_FONTROMAN
181: .It Cm I No or Cm 2 Ta Dv ESCAPE_FONTITALIC
182: .It Cm B No or Cm 3 Ta Dv ESCAPE_FONTBOLD
183: .It Cm P Ta Dv ESCAPE_FONTPREV
184: .It Cm BI Ta Dv ESCAPE_FONTBI
185: .El
186: .It Dv ESCAPE_SPECIAL
187: The escape sequence
188: .Ic \eC
189: taking an argument delimited with the single quote character
190: and, as a special exception, the escape sequences
191: .Em not
192: having an identifier, that is, those where the argument, in standard
193: form, directly follows the initial backslash:
194: .Ic \eC' , \e[ , \e( , \e Ns Ar a .
195: Note that the one-character argument short form can only be used for
196: argument characters that do not clash with escape sequence identifiers.
197: .Pp
1.2 schwarze 198: If the argument matches one of the forms described below under
199: .Dv ESCAPE_UNICODE ,
200: that value is returned instead.
1.1 schwarze 201: .Pp
202: The
203: .Dv ESCAPE_SPECIAL
204: special character escape sequences can be rendered using the functions
205: .Fn mchars_spec2cp
206: and
207: .Fn mchars_spec2str
208: described in the
209: .Xr mchars_alloc 3
210: manual.
211: .It Dv ESCAPE_UNICODE
212: Escape sequences of the same format as described above under
213: .Dv ESCAPE_SPECIAL ,
1.2 schwarze 214: but with an argument of the forms
215: .Ic u Ns Ar XXXX ,
216: .Ic u Ns Ar YXXXX ,
217: or
218: .Ic u10 Ns Ar XXXX
219: where
220: .Ar X
221: and
222: .Ar Y
223: are hexadecimal digits and
224: .Ar Y
225: is not zero:
1.1 schwarze 226: .Ic \eC'u , \e[u .
227: As a special exception,
228: .Fa start
229: is set to the character after the
1.2 schwarze 230: .Ic u ,
1.1 schwarze 231: and the
232: .Fa sz
233: return value does not include the
1.2 schwarze 234: .Ic u
1.1 schwarze 235: either.
236: .Pp
237: Such Unicode character escape sequences can be rendered using the function
238: .Fn mchars_num2uc
239: described in the
240: .Xr mchars_alloc 3
241: manual.
242: .It Dv ESCAPE_NUMBERED
243: The escape sequence
244: .Ic \eN
245: followed by a delimited argument.
246: The delimiter character is arbitrary except that digits cannot be used.
247: If a digit is encountered instead of the opening delimiter, that
248: digit is considered to be the argument and the end of the sequence, and
249: .Dv ESCAPE_IGNORE
250: is returned.
251: .Pp
252: Such ASCII character escape sequences can be rendered using the function
253: .Fn mchars_num2char
254: described in the
255: .Xr mchars_alloc 3
256: manual.
1.3 schwarze 257: .It Dv ESCAPE_OVERSTRIKE
258: The escape sequence
259: .Ic \eo
260: followed by an argument delimited by an arbitrary character.
1.1 schwarze 261: .It Dv ESCAPE_IGNORE
262: .Bl -bullet -width 2n
263: .It
264: The escape sequence
265: .Ic \es
266: followed by an argument in standard form or by an argument delimited
267: by the single quote character:
268: .Ic \es' , \es[ , \es( , \es Ns Ar a .
269: As a special exception, an optional
270: .Sq +
271: or
272: .Sq \-
273: character is allowed after the
274: .Sq s
275: for all forms.
276: .It
277: The escape sequences
278: .Ic \eF ,
279: .Ic \eg ,
280: .Ic \ek ,
281: .Ic \eM ,
282: .Ic \em ,
283: .Ic \en ,
284: .Ic \eV ,
285: and
286: .Ic \eY
287: followed by an argument in standard form.
288: .It
289: The escape sequences
290: .Ic \eA ,
291: .Ic \eb ,
292: .Ic \eD ,
293: .Ic \eR ,
294: .Ic \eX ,
295: and
296: .Ic \eZ
297: followed by an argument delimited by an arbitrary character.
298: .It
299: The escape sequences
300: .Ic \eH ,
301: .Ic \eh ,
302: .Ic \eL ,
303: .Ic \el ,
304: .Ic \eS ,
305: .Ic \ev ,
306: and
307: .Ic \ex
308: followed by an argument delimited by a character that cannot occur
309: in numerical expressions.
310: However, if any character that can occur in numerical expressions
311: is found instead of a delimiter, the sequence is considered to end
312: with that character, and
313: .Dv ESCAPE_ERROR
314: is returned.
315: .El
316: .It Dv ESCAPE_ERROR
317: Escape sequences taking an argument but not matching any of the above patterns.
318: In particular, that happens if the end of the logical input line
319: is reached before the end of the argument.
320: .El
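.Pp
The argument forms listed above under ESCAPE_UNICODE can be checked
along the lines of the following C sketch.
The helper name is_unicode_arg is hypothetical, and the sketch only
tests the length and digit rules stated in this manual; accepting
lower-case hexadecimal digits is an assumption, and the real function
may apply further checks.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Illustrative sketch only, NOT the mandoc implementation.
 * Return 1 if the sz bytes at arg have one of the forms
 * uXXXX, uYXXXX (Y not zero), or u10XXXX, and 0 otherwise.
 * Assumption: both upper- and lower-case hex digits are accepted.
 */
int
is_unicode_arg(const char *arg, size_t sz)
{
	size_t i;

	/* A leading 'u' followed by four to six digits. */
	if (sz < 5 || sz > 7 || arg[0] != 'u')
		return 0;

	for (i = 1; i < sz; i++)
		if (!isxdigit((unsigned char)arg[i]))
			return 0;

	/* uYXXXX: the leading digit Y must not be zero. */
	if (sz == 6 && arg[1] == '0')
		return 0;

	/* Six digits are only allowed in the form u10XXXX. */
	if (sz == 7 && (arg[1] != '1' || arg[2] != '0'))
		return 0;

	return 1;
}
```

With these rules, u2014 and u10FFFF are accepted, while u02014 and
u110000 are rejected.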
321: .Pp
322: For sequences that do not take an argument, the function
323: .Fn mandoc_escape
324: returns one of the following values:
325: .Bl -tag -width 2n
326: .It Dv ESCAPE_SKIPCHAR
327: The escape sequence
328: .Qq \ez .
329: .It Dv ESCAPE_NOSPACE
330: The escape sequence
331: .Qq \ec .
332: .It Dv ESCAPE_IGNORE
333: The escape sequences
334: .Qq \ed
335: and
336: .Qq \eu .
337: .El
338: .Sh FILES
339: This function is implemented in
340: .Pa mandoc.c .
341: .Sh SEE ALSO
342: .Xr mchars_alloc 3 ,
343: .Xr mandoc_char 7 ,
344: .Xr roff 7
345: .Sh HISTORY
346: This function has been available since mandoc 1.11.2.
347: .Sh AUTHORS
348: .An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
349: .An Ingo Schwarze Aq Mt schwarze@openbsd.org
350: .Sh BUGS
351: The function doesn't cleanly distinguish between sequences that are
352: valid and supported, valid and ignored, valid and unsupported,
353: syntactically invalid, or undefined.
354: For sequences that are ignored or unsupported, it doesn't tell
355: whether that deficiency is likely to cause major formatting problems
356: and/or loss of document content.
357: The function is already rather complicated and still parses some
358: sequences incorrectly.
359: .
360: .ig
361: For these sequences, the list given below specifies a starting string
362: and either the length of the argument or an ending character.
363: The argument starts after the starting string.
364: In the former case, the sequence ends with the end of the argument.
365: In the latter case, the argument ends before the ending character,
366: and the sequence ends with the ending character.
367: ..