Annotation of mandoc/mandoc.3, Revision 1.9
1.9 ! kristaps 1: .\" $Id: mandoc.3,v 1.8 2011/05/17 12:22:15 kristaps Exp $
1.1 kristaps 2: .\"
3: .\" Copyright (c) 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
4: .\" Copyright (c) 2010 Ingo Schwarze <schwarze@openbsd.org>
5: .\"
6: .\" Permission to use, copy, modify, and distribute this software for any
7: .\" purpose with or without fee is hereby granted, provided that the above
8: .\" copyright notice and this permission notice appear in all copies.
9: .\"
10: .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
11: .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
12: .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
13: .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
14: .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
15: .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
16: .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
17: .\"
1.8 kristaps 18: .Dd $Mdocdate: May 17 2011 $
1.1 kristaps 19: .Dt MANDOC 3
20: .Os
21: .Sh NAME
22: .Nm mandoc ,
1.3 kristaps 23: .Nm mandoc_escape ,
1.1 kristaps 24: .Nm man_meta ,
25: .Nm man_node ,
1.6 kristaps 26: .Nm mchars_alloc ,
27: .Nm mchars_free ,
28: .Nm mchars_num2char ,
1.7 kristaps 29: .Nm mchars_num2uc ,
1.6 kristaps 30: .Nm mchars_spec2cp ,
31: .Nm mchars_spec2str ,
1.1 kristaps 32: .Nm mdoc_meta ,
33: .Nm mdoc_node ,
34: .Nm mparse_alloc ,
35: .Nm mparse_free ,
36: .Nm mparse_readfd ,
37: .Nm mparse_reset ,
1.2 kristaps 38: .Nm mparse_result ,
39: .Nm mparse_strerror ,
40: .Nm mparse_strlevel
1.1 kristaps 41: .Nd mandoc macro compiler library
1.8 kristaps 42: .Sh LIBRARY
43: .Lb mandoc
1.1 kristaps 44: .Sh SYNOPSIS
45: .In man.h
46: .In mdoc.h
47: .In mandoc.h
1.3 kristaps 48: .Ft "enum mandoc_esc"
49: .Fo mandoc_escape
50: .Fa "const char **in"
51: .Fa "const char **seq"
52: .Fa "int *len"
53: .Fc
1.1 kristaps 54: .Ft "const struct man_meta *"
55: .Fo man_meta
56: .Fa "const struct man *man"
57: .Fc
58: .Ft "const struct man_node *"
59: .Fo man_node
60: .Fa "const struct man *man"
61: .Fc
1.6 kristaps 62: .Ft "struct mchars *"
63: .Fn mchars_alloc
64: .Ft void
65: .Fn mchars_free "struct mchars *p"
66: .Ft char
67: .Fn mchars_num2char "const char *cp" "size_t sz"
1.7 kristaps 68: .Ft int
69: .Fn mchars_num2uc "const char *cp" "size_t sz"
1.6 kristaps 70: .Ft "const char *"
71: .Fo mchars_spec2str
72: .Fa "struct mchars *p"
73: .Fa "const char *cp"
74: .Fa "size_t sz"
75: .Fa "size_t *rsz"
76: .Fc
77: .Ft int
78: .Fo mchars_spec2cp
79: .Fa "struct mchars *p"
80: .Fa "const char *cp"
81: .Fa "size_t sz"
83: .Fc
1.1 kristaps 84: .Ft "const struct mdoc_meta *"
85: .Fo mdoc_meta
86: .Fa "const struct mdoc *mdoc"
87: .Fc
88: .Ft "const struct mdoc_node *"
89: .Fo mdoc_node
90: .Fa "const struct mdoc *mdoc"
91: .Fc
92: .Ft "struct mparse *"
93: .Fo mparse_alloc
94: .Fa "enum mparset type"
95: .Fa "enum mandoclevel wlevel"
96: .Fa "mandocmsg msg"
97: .Fa "void *msgarg"
98: .Fc
99: .Ft void
100: .Fo mparse_free
101: .Fa "struct mparse *parse"
102: .Fc
103: .Ft "enum mandoclevel"
104: .Fo mparse_readfd
105: .Fa "struct mparse *parse"
106: .Fa "int fd"
107: .Fa "const char *fname"
108: .Fc
109: .Ft void
110: .Fo mparse_reset
111: .Fa "struct mparse *parse"
112: .Fc
113: .Ft void
114: .Fo mparse_result
115: .Fa "struct mparse *parse"
116: .Fa "struct mdoc **mdoc"
117: .Fa "struct man **man"
1.2 kristaps 118: .Fc
119: .Ft "const char *"
120: .Fo mparse_strerror
121: .Fa "enum mandocerr"
122: .Fc
123: .Ft "const char *"
124: .Fo mparse_strlevel
125: .Fa "enum mandoclevel"
1.1 kristaps 126: .Fc
127: .Vt extern const char * const * man_macronames;
128: .Vt extern const char * const * mdoc_argnames;
129: .Vt extern const char * const * mdoc_macronames;
1.4 kristaps 130: .Fd "#define ASCII_NBRSP"
131: .Fd "#define ASCII_HYPH"
1.1 kristaps 132: .Sh DESCRIPTION
133: The
134: .Nm mandoc
135: library parses a
136: .Ux
137: manual into an abstract syntax tree (AST).
138: .Ux
139: manuals are composed of
140: .Xr mdoc 7
141: or
142: .Xr man 7 ,
143: and may be mixed with
144: .Xr roff 7 ,
145: .Xr tbl 7 ,
146: and
147: .Xr eqn 7
148: invocations.
149: .Pp
150: The following describes a general parse sequence:
151: .Bl -enum
152: .It
153: initiate a parsing sequence with
154: .Fn mparse_alloc ;
155: .It
156: parse files or file descriptors with
157: .Fn mparse_readfd ;
158: .It
159: retrieve a parsed syntax tree, if the parse was successful, with
160: .Fn mparse_result ;
161: .It
162: iterate over parse nodes with
163: .Fn mdoc_node
164: or
165: .Fn man_node ;
166: .It
167: free all allocated memory with
168: .Fn mparse_free ,
169: or invoke
170: .Fn mparse_reset
171: and parse new files.
1.3 kristaps 172: .El
1.6 kristaps 173: .Pp
174: The
175: .Nm
176: library also contains routines for translating character strings into glyphs
177: .Pq see Fn mchars_alloc
178: and parsing escape sequences from strings
179: .Pq see Fn mandoc_escape .
1.7 kristaps 180: .Pp
181: This library is
182: .Ud
1.3 kristaps 183: .Sh REFERENCE
184: This section documents the functions, types, and variables available
185: via
186: .In mandoc.h .
187: .Ss Types
188: .Bl -ohang
189: .It Vt "enum mandoc_esc"
190: .It Vt "enum mandocerr"
191: .It Vt "enum mandoclevel"
1.6 kristaps 192: .It Vt "struct mchars"
193: An opaque pointer to an object allowing for translation between
194: character strings and glyphs.
195: See
196: .Fn mchars_alloc .
1.3 kristaps 197: .It Vt "enum mparset"
198: .It Vt "struct mparse"
199: .It Vt "mandocmsg"
200: .El
201: .Ss Functions
202: .Bl -ohang
203: .It Fn mandoc_escape
1.4 kristaps 204: Scan an escape sequence, i.e., a character string beginning with
205: .Sq \e .
206: Pass a pointer to this string as
207: .Va in ;
208: on return it is set to the end of the parsed escape sequence, unless
209: the return value is ESCAPE_ERROR, in which case the string is malformed
210: and should be discarded.
211: If the return value is neither ESCAPE_ERROR nor ESCAPE_IGNORE,
212: .Va seq
213: is set to the first relevant character of the substring (for example, a
214: font or glyph name) of length
215: .Va len .
216: Both
217: .Va seq
218: and
219: .Va len
220: may be NULL.
1.3 kristaps 221: .It Fn man_meta
1.4 kristaps 222: Obtain the meta-data of a successful parse.
223: This may only be used on a pointer returned by
224: .Fn mparse_result .
1.3 kristaps 225: .It Fn man_node
1.4 kristaps 226: Obtain the root node of a successful parse.
227: This may only be used on a pointer returned by
228: .Fn mparse_result .
1.6 kristaps 229: .It Fn mchars_alloc
230: Allocate a
231: .Vt "struct mchars *"
232: object for translating special characters into glyphs.
233: See
234: .Xr mandoc_char 7
235: for an overview of special characters.
236: The object must be freed with
237: .Fn mchars_free .
238: .It Fn mchars_free
239: Free an object created with
240: .Fn mchars_alloc .
241: .It Fn mchars_num2char
1.7 kristaps 242: Convert a character index (e.g., the \eN\(aq\(aq escape) into a
243: printable ASCII character.
244: Returns \e0 (the nil character) if the input sequence is malformed.
245: .It Fn mchars_num2uc
246: Convert a hexadecimal character index (e.g., the \e[uNNNN] escape) into
247: a Unicode codepoint.
1.6 kristaps 248: Returns \e0 (the nil character) if the input sequence is malformed.
249: .It Fn mchars_spec2cp
250: Convert a special character into a valid Unicode codepoint.
251: Returns \-1 on failure and 0 if no codepoint exists (if this occurs,
252: the caller should fall back to
253: .Fn mchars_spec2str ) .
254: .It Fn mchars_spec2str
255: Convert a special character into an ASCII string.
256: Returns NULL on failure.
1.3 kristaps 257: .It Fn mdoc_meta
1.4 kristaps 258: Obtain the meta-data of a successful parse.
259: This may only be used on a pointer returned by
260: .Fn mparse_result .
1.3 kristaps 261: .It Fn mdoc_node
1.4 kristaps 262: Obtain the root node of a successful parse.
263: This may only be used on a pointer returned by
264: .Fn mparse_result .
1.3 kristaps 265: .It Fn mparse_alloc
1.4 kristaps 266: Allocate a parser.
267: The same parser may be used for multiple files so long as
268: .Fn mparse_reset
269: is called between parses.
270: .Fn mparse_free
271: must be called to free the memory allocated by this function.
1.3 kristaps 272: .It Fn mparse_free
1.4 kristaps 273: Free all memory allocated by
274: .Fn mparse_alloc .
1.3 kristaps 275: .It Fn mparse_readfd
1.4 kristaps 276: Parse a file or file descriptor.
277: If
278: .Va fd
279: is -1,
280: .Va fname
281: is opened for reading.
282: Otherwise,
283: .Va fname
284: is assumed to be the name associated with
285: .Va fd .
286: This may be called multiple times with different parameters; however,
287: .Fn mparse_reset
288: should be invoked between parses.
1.3 kristaps 289: .It Fn mparse_reset
1.4 kristaps 290: Reset a parser so that
291: .Fn mparse_readfd
292: may be used again.
1.3 kristaps 293: .It Fn mparse_result
1.4 kristaps 294: Obtain the result of a parse.
295: Call this function only after a successful parse
296: .Po
297: i.e., one where
298: .Fn mparse_readfd
299: returned less than MANDOCLEVEL_FATAL
300: .Pc ,
301: in which case one of the two pointers will
302: be filled in.
1.3 kristaps 303: .It Fn mparse_strerror
1.4 kristaps 304: Return a statically-allocated string representation of an error code.
1.3 kristaps 305: .It Fn mparse_strlevel
1.4 kristaps 306: Return a statically-allocated string representation of a level code.
1.3 kristaps 307: .El
308: .Ss Variables
309: .Bl -ohang
310: .It Va man_macronames
1.4 kristaps 311: The string representation of a man macro as indexed by
312: .Vt "enum mant" .
1.3 kristaps 313: .It Va mdoc_argnames
1.4 kristaps 314: The string representation of an mdoc macro argument as indexed by
315: .Vt "enum mdocargt" .
1.3 kristaps 316: .It Va mdoc_macronames
1.4 kristaps 317: The string representation of an mdoc macro as indexed by
318: .Vt "enum mdoct" .
1.1 kristaps 319: .El
320: .Sh IMPLEMENTATION NOTES
321: This section consists of structural documentation for
322: .Xr mdoc 7
323: and
324: .Xr man 7
325: syntax trees.
326: .Ss Man Abstract Syntax Tree
327: This AST is governed by the ontological rules dictated in
328: .Xr man 7
329: and derives its terminology accordingly.
330: .Pp
331: The AST is composed of
332: .Vt struct man_node
333: nodes with element, root and text types as declared by the
334: .Va type
335: field.
336: Each node also provides its parse point (the
337: .Va line ,
338: .Va sec ,
339: and
340: .Va pos
341: fields), its position in the tree (the
342: .Va parent ,
343: .Va child ,
344: .Va next
345: and
346: .Va prev
347: fields) and some type-specific data.
348: .Pp
349: The tree itself is arranged according to the following normal form,
350: where capitalised non-terminals represent nodes.
351: .Pp
352: .Bl -tag -width "ELEMENTXX" -compact
353: .It ROOT
354: \(<- mnode+
355: .It mnode
356: \(<- ELEMENT | TEXT | BLOCK
357: .It BLOCK
358: \(<- HEAD BODY
359: .It HEAD
360: \(<- mnode*
361: .It BODY
362: \(<- mnode*
363: .It ELEMENT
364: \(<- ELEMENT | TEXT*
365: .It TEXT
366: \(<- [[:alpha:]]*
367: .El
368: .Pp
369: The only elements capable of nesting other elements are those with
370: next-line scope as documented in
371: .Xr man 7 .
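.Pp
The normal form above can be walked with a short recursive routine.
The following is a minimal sketch, not library code: the sk_ types are stand-ins that copy only the field names listed above, and it assumes that
child points to the first child and next to the following sibling.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in types: NOT the library's definitions, only the fields named above. */
enum sk_type { SK_ROOT, SK_BLOCK, SK_HEAD, SK_BODY, SK_ELEM, SK_TEXT };

struct sk_node {
	enum sk_type	 type;
	struct sk_node	*parent;
	struct sk_node	*child;	/* assumed: first child */
	struct sk_node	*next;	/* assumed: next sibling */
	struct sk_node	*prev;	/* assumed: previous sibling */
};

static const char *
sk_name(enum sk_type t)
{
	switch (t) {
	case SK_ROOT:	return "ROOT";
	case SK_BLOCK:	return "BLOCK";
	case SK_HEAD:	return "HEAD";
	case SK_BODY:	return "BODY";
	case SK_ELEM:	return "ELEMENT";
	case SK_TEXT:	return "TEXT";
	}
	return "?";
}

/*
 * Pre-order walk over n and all of its following siblings.
 * Prints one indented line per node when out is non-NULL and
 * returns the number of nodes visited.
 */
static int
sk_walk(const struct sk_node *n, int depth, FILE *out)
{
	int	 count = 0;

	for ( ; n != NULL; n = n->next) {
		if (out != NULL)
			fprintf(out, "%*s%s\n", depth * 2, "", sk_name(n->type));
		count += 1 + sk_walk(n->child, depth + 1, out);
	}
	return count;
}
```

Calling sk_walk with the root node, a depth of 0 and stdout prints the tree with two spaces of indentation per level; passing NULL as the stream only counts nodes.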
372: .Ss Mdoc Abstract Syntax Tree
373: This AST is governed by the ontological
374: rules dictated in
375: .Xr mdoc 7
376: and derives its terminology accordingly.
377: .Qq In-line
378: elements described in
379: .Xr mdoc 7
380: are described simply as
381: .Qq elements .
382: .Pp
383: The AST is composed of
384: .Vt struct mdoc_node
385: nodes with block, head, body, element, root and text types as declared
386: by the
387: .Va type
388: field.
389: Each node also provides its parse point (the
390: .Va line ,
391: .Va sec ,
392: and
393: .Va pos
394: fields), its position in the tree (the
395: .Va parent ,
396: .Va child ,
397: .Va nchild ,
398: .Va next
399: and
400: .Va prev
401: fields) and some type-specific data, in particular, for nodes generated
402: from macros, the generating macro in the
403: .Va tok
404: field.
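.Pp
The position fields can be cross-checked against one another.
The sketch below uses a stand-in structure, not the library's own, and assumes that child is the first child, that next and prev link siblings, and that nchild counts direct children; none of these meanings is spelled out above.

```c
#include <stddef.h>

/* Stand-in for the position fields only; not the library's struct mdoc_node. */
struct sk_pos {
	struct sk_pos	*parent;
	struct sk_pos	*child;	 /* assumed: first child */
	struct sk_pos	*next;	 /* assumed: next sibling */
	struct sk_pos	*prev;	 /* assumed: previous sibling */
	int		 nchild; /* assumed: number of direct children */
};

/* Return 1 if n's direct children are linked consistently, else 0. */
static int
sk_links_ok(const struct sk_pos *n)
{
	const struct sk_pos	*c, *last = NULL;
	int			 count = 0;

	for (c = n->child; c != NULL; last = c, c = c->next) {
		/* Each child must point back at n and at its left sibling. */
		if (c->parent != n || c->prev != last)
			return 0;
		count++;
	}
	return count == n->nchild;
}
```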
405: .Pp
406: The tree itself is arranged according to the following normal form,
407: where capitalised non-terminals represent nodes.
408: .Pp
409: .Bl -tag -width "ELEMENTXX" -compact
410: .It ROOT
411: \(<- mnode+
412: .It mnode
413: \(<- BLOCK | ELEMENT | TEXT
414: .It BLOCK
415: \(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
416: .It ELEMENT
417: \(<- TEXT*
418: .It HEAD
419: \(<- mnode*
420: .It BODY
421: \(<- mnode* [ENDBODY mnode*]
422: .It TAIL
423: \(<- mnode*
424: .It TEXT
425: \(<- [[:printable:],0x1e]*
426: .El
427: .Pp
428: Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
429: the BLOCK production: these refer to punctuation marks.
430: Furthermore, although a TEXT node will generally have a non-zero-length
431: string, in the specific case of
432: .Sq \&.Bd \-literal ,
433: an empty line will produce a zero-length string.
434: Multiple body parts are only found in invocations of
435: .Sq \&.Bl \-column ,
436: where a new body introduces a new phrase.
437: .Pp
438: The
439: .Xr mdoc 7
1.5 kristaps 440: syntax tree also accommodates broken block structures.
1.1 kristaps 441: The ENDBODY node is available to end the formatting associated
442: with a given block before the physical end of that block.
443: It has a non-null
444: .Va end
445: field, is of the BODY
446: .Va type ,
447: has the same
448: .Va tok
449: as the BLOCK it is ending, and has a
450: .Va pending
451: field pointing to that BLOCK's BODY node.
452: It is an indirect child of that BODY node
453: and has no children of its own.
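.Pp
The properties just listed suggest a simple predicate for recognising such a node.
This is a sketch over a stand-in structure rather than the library's own, and it assumes that a zero
end
value means the node is an ordinary BODY.

```c
#include <stddef.h>

/* Stand-in types, not the library's own definitions. */
enum sk_mtype { SK_MROOT, SK_MBLOCK, SK_MHEAD, SK_MBODY, SK_MTAIL,
	SK_MELEM, SK_MTEXT };

struct sk_mdoc_node {
	enum sk_mtype		 type;
	int			 tok;	  /* generating macro */
	int			 end;	  /* assumed: zero means "not an ENDBODY" */
	struct sk_mdoc_node	*pending; /* BODY whose formatting is ended */
	struct sk_mdoc_node	*child;
};

/*
 * Return 1 if n looks like an ENDBODY node as described above:
 * BODY type, non-null end, a pending BODY, and no children.
 */
static int
sk_is_endbody(const struct sk_mdoc_node *n)
{
	return n != NULL && n->type == SK_MBODY && n->end != 0 &&
	    n->pending != NULL && n->child == NULL;
}
```

A front-end could use such a test while walking a BODY to decide where the enclosing block's formatting stops.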
454: .Pp
455: An ENDBODY node is generated when a block ends while one of its child
456: blocks is still open, as in the following example:
457: .Bd -literal -offset indent
458: \&.Ao ao
459: \&.Bo bo ac
460: \&.Ac bc
461: \&.Bc end
462: .Ed
463: .Pp
464: This example results in the following block structure:
465: .Bd -literal -offset indent
466: BLOCK Ao
467:     HEAD Ao
468:     BODY Ao
469:         TEXT ao
470:         BLOCK Bo, pending -> Ao
471:             HEAD Bo
472:             BODY Bo
473:                 TEXT bo
474:                 TEXT ac
475:                 ENDBODY Ao, pending -> Ao
476:                 TEXT bc
477: TEXT end
478: .Ed
479: .Pp
480: Here, the formatting of the
481: .Sq \&Ao
482: block extends from TEXT ao to TEXT ac,
483: while the formatting of the
484: .Sq \&Bo
485: block extends from TEXT bo to TEXT bc.
486: It renders as follows in
487: .Fl T Ns Cm ascii
488: mode:
489: .Pp
490: .Dl <ao [bo ac> bc] end
491: .Pp
492: Support for badly-nested blocks is only provided for backward
493: compatibility with some older
494: .Xr mdoc 7
495: implementations.
496: Using badly-nested blocks is
497: .Em strongly discouraged ;
498: for example, the
499: .Fl T Ns Cm html
500: and
501: .Fl T Ns Cm xhtml
502: front-ends to
503: .Xr mandoc 1
504: are unable to render them in any meaningful way.
505: Furthermore, behaviour when encountering badly-nested blocks is not
506: consistent across troff implementations, especially when using multiple
507: levels of badly-nested blocks.
508: .Sh SEE ALSO
509: .Xr mandoc 1 ,
510: .Xr eqn 7 ,
511: .Xr man 7 ,
1.6 kristaps 512: .Xr mandoc_char 7 ,
1.1 kristaps 513: .Xr mdoc 7 ,
514: .Xr roff 7 ,
515: .Xr tbl 7
516: .Sh AUTHORS
517: The
518: .Nm
519: library was written by
520: .An Kristaps Dzonsons Aq kristaps@bsd.lv .