Annotation of mandoc/mandoc.3, Revision 1.8
1.8 ! kristaps 1: .\" $Id: mandoc.3,v 1.7 2011/05/17 11:50:20 kristaps Exp $
1.1 kristaps 2: .\"
3: .\" Copyright (c) 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
4: .\" Copyright (c) 2010 Ingo Schwarze <schwarze@openbsd.org>
5: .\"
6: .\" Permission to use, copy, modify, and distribute this software for any
7: .\" purpose with or without fee is hereby granted, provided that the above
8: .\" copyright notice and this permission notice appear in all copies.
9: .\"
10: .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
11: .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
12: .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
13: .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
14: .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
15: .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
16: .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
17: .\"
1.8 ! kristaps 18: .Dd $Mdocdate: May 17 2011 $
1.1 kristaps 19: .Dt MANDOC 3
20: .Os
21: .Sh NAME
22: .Nm mandoc ,
1.3 kristaps 23: .Nm mandoc_escape ,
1.1 kristaps 24: .Nm man_meta ,
25: .Nm man_node ,
1.6 kristaps 26: .Nm mchars_alloc ,
27: .Nm mchars_free ,
28: .Nm mchars_num2char ,
1.7 kristaps 29: .Nm mchars_num2uc ,
1.6 kristaps 30: .Nm mchars_res2cp ,
31: .Nm mchars_res2str ,
32: .Nm mchars_spec2cp ,
33: .Nm mchars_spec2str ,
1.1 kristaps 34: .Nm mdoc_meta ,
35: .Nm mdoc_node ,
36: .Nm mparse_alloc ,
37: .Nm mparse_free ,
38: .Nm mparse_readfd ,
39: .Nm mparse_reset ,
1.2 kristaps 40: .Nm mparse_result ,
41: .Nm mparse_strerror ,
42: .Nm mparse_strlevel
1.1 kristaps 43: .Nd mandoc macro compiler library
1.8 ! kristaps 44: .Sh LIBRARY
! 45: .Lb mandoc
1.1 kristaps 46: .Sh SYNOPSIS
47: .In man.h
48: .In mdoc.h
49: .In mandoc.h
1.3 kristaps 50: .Ft "enum mandoc_esc"
51: .Fo mandoc_escape
52: .Fa "const char **end"
53: .Fa "const char **start"
54: .Fa "int *sz"
55: .Fc
1.1 kristaps 56: .Ft "const struct man_meta *"
57: .Fo man_meta
58: .Fa "const struct man *man"
59: .Fc
60: .Ft "const struct man_node *"
61: .Fo man_node
62: .Fa "const struct man *man"
63: .Fc
1.6 kristaps 64: .Ft "struct mchars *"
65: .Fn mchars_alloc
66: .Ft void
67: .Fn mchars_free "struct mchars *p"
68: .Ft char
69: .Fn mchars_num2char "const char *cp" "size_t sz"
1.7 kristaps 70: .Ft int
71: .Fn mchars_num2uc "const char *cp" "size_t sz"
1.6 kristaps 72: .Ft "const char *"
73: .Fo mchars_res2str
74: .Fa "struct mchars *p"
75: .Fa "const char *cp"
76: .Fa "size_t sz"
77: .Fa "size_t *rsz"
78: .Fc
79: .Ft int
80: .Fo mchars_res2cp
81: .Fa "struct mchars *p"
82: .Fa "const char *cp"
83: .Fa "size_t sz"
85: .Fc
86: .Ft "const char *"
87: .Fo mchars_spec2str
88: .Fa "struct mchars *p"
89: .Fa "const char *cp"
90: .Fa "size_t sz"
91: .Fa "size_t *rsz"
92: .Fc
93: .Ft int
94: .Fo mchars_spec2cp
95: .Fa "struct mchars *p"
96: .Fa "const char *cp"
97: .Fa "size_t sz"
99: .Fc
1.1 kristaps 100: .Ft "const struct mdoc_meta *"
101: .Fo mdoc_meta
102: .Fa "const struct mdoc *mdoc"
103: .Fc
104: .Ft "const struct mdoc_node *"
105: .Fo mdoc_node
106: .Fa "const struct mdoc *mdoc"
107: .Fc
108: .Ft "struct mparse *"
109: .Fo mparse_alloc
110: .Fa "enum mparset type"
111: .Fa "enum mandoclevel wlevel"
112: .Fa "mandocmsg msg"
113: .Fa "void *msgarg"
114: .Fc
115: .Ft void
116: .Fo mparse_free
117: .Fa "struct mparse *parse"
118: .Fc
119: .Ft "enum mandoclevel"
120: .Fo mparse_readfd
121: .Fa "struct mparse *parse"
122: .Fa "int fd"
123: .Fa "const char *fname"
124: .Fc
125: .Ft void
126: .Fo mparse_reset
127: .Fa "struct mparse *parse"
128: .Fc
129: .Ft void
130: .Fo mparse_result
131: .Fa "struct mparse *parse"
132: .Fa "struct mdoc **mdoc"
133: .Fa "struct man **man"
1.2 kristaps 134: .Fc
135: .Ft "const char *"
136: .Fo mparse_strerror
137: .Fa "enum mandocerr"
138: .Fc
139: .Ft "const char *"
140: .Fo mparse_strlevel
141: .Fa "enum mandoclevel"
1.1 kristaps 142: .Fc
143: .Vt extern const char * const * man_macronames;
144: .Vt extern const char * const * mdoc_argnames;
145: .Vt extern const char * const * mdoc_macronames;
1.4 kristaps 146: .Fd "#define ASCII_NBRSP"
147: .Fd "#define ASCII_HYPH"
1.1 kristaps 148: .Sh DESCRIPTION
149: The
150: .Nm mandoc
151: library parses a
152: .Ux
153: manual into an abstract syntax tree (AST).
154: .Ux
155: manuals are composed of
156: .Xr mdoc 7
157: or
158: .Xr man 7 ,
159: and may be mixed with
160: .Xr roff 7 ,
161: .Xr tbl 7 ,
162: and
163: .Xr eqn 7
164: invocations.
165: .Pp
166: The following describes a general parse sequence:
167: .Bl -enum
168: .It
169: initiate a parsing sequence with
170: .Fn mparse_alloc ;
171: .It
172: parse files or file descriptors with
173: .Fn mparse_readfd ;
174: .It
175: retrieve a parsed syntax tree, if the parse was successful, with
176: .Fn mparse_result ;
177: .It
178: iterate over parse nodes with
179: .Fn mdoc_node
180: or
181: .Fn man_node ;
182: .It
183: free all allocated memory with
184: .Fn mparse_free ,
185: or invoke
186: .Fn mparse_reset
187: and parse new files.
1.3 kristaps 188: .El
1.6 kristaps 189: .Pp
190: The
191: .Nm
192: library also contains routines for translating character strings into glyphs
193: .Pq see Fn mchars_alloc
194: and parsing escape sequences from strings
195: .Pq see Fn mandoc_escape .
1.7 kristaps 196: .Pp
197: This library is
198: .Ud
1.3 kristaps 199: .Sh REFERENCE
200: This section documents the functions, types, and variables available
201: via
202: .In mandoc.h .
203: .Ss Types
204: .Bl -ohang
205: .It Vt "enum mandoc_esc"
206: .It Vt "enum mandocerr"
207: .It Vt "enum mandoclevel"
1.6 kristaps 208: .It Vt "struct mchars"
209: An opaque pointer to an object allowing for translation between
210: character strings and glyphs.
211: See
212: .Fn mchars_alloc .
1.3 kristaps 213: .It Vt "enum mparset"
214: .It Vt "struct mparse"
215: .It Vt "mandocmsg"
216: .El
217: .Ss Functions
218: .Bl -ohang
219: .It Fn mandoc_escape
1.4 kristaps 220: Scan an escape sequence, i.e., a character string beginning with
221: .Sq \e .
222: Pass a pointer to this string as
223: .Va end ;
224: it will be set to the supremum of the parsed escape sequence unless
225: returning ESCAPE_ERROR, in which case the string is bogus and should be
226: thrown away.
227: If not ESCAPE_ERROR or ESCAPE_IGNORE,
228: .Va start
229: is set to the first relevant character of the substring (font, glyph,
230: whatever) of length
231: .Va sz .
232: Both
233: .Va start
234: and
235: .Va sz
236: may be NULL.
1.3 kristaps 237: .It Fn man_meta
1.4 kristaps 238: Obtain the meta-data of a successful parse.
239: This may only be used on a pointer returned by
240: .Fn mparse_result .
1.3 kristaps 241: .It Fn man_node
1.4 kristaps 242: Obtain the root node of a successful parse.
243: This may only be used on a pointer returned by
244: .Fn mparse_result .
1.6 kristaps 245: .It Fn mchars_alloc
246: Allocate a
247: .Vt "struct mchars *"
248: object for translating special characters into glyphs.
249: See
250: .Xr mandoc_char 7
251: for an overview of special characters.
252: The object must be freed with
253: .Fn mchars_free .
254: .It Fn mchars_free
255: Free an object created with
256: .Fn mchars_alloc .
257: .It Fn mchars_num2char
1.7 kristaps 258: Convert a character index (e.g., the \eN\(aq\(aq escape) into a
259: printable ASCII character.
260: Returns \e0 (the nil character) if the input sequence is malformed.
261: .It Fn mchars_num2uc
262: Convert a hexadecimal character index (e.g., the \e[uNNNN] escape) into
263: a Unicode codepoint.
1.6 kristaps 264: Returns \e0 (the nil character) if the input sequence is malformed.
265: .It Fn mchars_res2cp
266: Convert a predefined character into a valid Unicode codepoint.
267: Returns \-1 on failure and 0 if no codepoint exists (if this occurs,
268: the caller should fall back to
269: .Fn mchars_res2str ) .
270: .It Fn mchars_res2str
271: Convert a predefined character into an ASCII string.
272: Returns NULL on failure.
273: .It Fn mchars_spec2cp
274: Convert a special character into a valid Unicode codepoint.
275: Returns \-1 on failure and 0 if no codepoint exists (if this occurs,
276: the caller should fall back to
277: .Fn mchars_spec2str ) .
278: .It Fn mchars_spec2str
279: Convert a special character into an ASCII string.
280: Returns NULL on failure.
1.3 kristaps 281: .It Fn mdoc_meta
1.4 kristaps 282: Obtain the meta-data of a successful parse.
283: This may only be used on a pointer returned by
284: .Fn mparse_result .
1.3 kristaps 285: .It Fn mdoc_node
1.4 kristaps 286: Obtain the root node of a successful parse.
287: This may only be used on a pointer returned by
288: .Fn mparse_result .
1.3 kristaps 289: .It Fn mparse_alloc
1.4 kristaps 290: Allocate a parser.
291: The same parser may be used for multiple files so long as
292: .Fn mparse_reset
293: is called between parses.
294: .Fn mparse_free
295: must be called to free the memory allocated by this function.
1.3 kristaps 296: .It Fn mparse_free
1.4 kristaps 297: Free all memory allocated by
298: .Fn mparse_alloc .
1.3 kristaps 299: .It Fn mparse_readfd
1.4 kristaps 300: Parse a file or file descriptor.
301: If
302: .Va fd
303: is -1,
304: .Va fname
305: is opened for reading.
306: Otherwise,
307: .Va fname
308: is assumed to be the name associated with
309: .Va fd .
310: This may be called multiple times with different parameters; however,
311: .Fn mparse_reset
312: should be invoked between parses.
1.3 kristaps 313: .It Fn mparse_reset
1.4 kristaps 314: Reset a parser so that
315: .Fn mparse_readfd
316: may be used again.
1.3 kristaps 317: .It Fn mparse_result
1.4 kristaps 318: Obtain the result of a parse.
319: This function should only be invoked after a successful parse
320: .Po
321: i.e., one where
322: .Fn mparse_readfd
323: returned less than MANDOCLEVEL_FATAL
324: .Pc ,
325: in which case one of the two pointers will
326: be filled in.
1.3 kristaps 327: .It Fn mparse_strerror
1.4 kristaps 328: Return a statically-allocated string representation of an error code.
1.3 kristaps 329: .It Fn mparse_strlevel
1.4 kristaps 330: Return a statically-allocated string representation of a level code.
1.3 kristaps 331: .El
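.Pp
As a rough illustration of the kind of conversion
.Fn mchars_num2uc
is described as performing, the following sketch turns the hexadecimal
digits of a \e[uNNNN] escape into a codepoint and returns 0 for
malformed input.
It is not the library's implementation: the name
.Fn hex_to_codepoint ,
the six-digit limit and the 0x10FFFF range check are assumptions made
for this example only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of libmandoc.
 * Convert sz hexadecimal digits starting at cp into a codepoint.
 * Returns 0 if the input is empty, too long, out of range,
 * or contains a non-hexadecimal character.
 */
int
hex_to_codepoint(const char *cp, size_t sz)
{
	int	 val = 0;
	size_t	 i;

	assert(cp != NULL || sz == 0);

	/* Assumed limit: six digits cover the whole Unicode range. */
	if (sz == 0 || sz > 6)
		return 0;

	for (i = 0; i < sz; i++) {
		int digit;

		if (cp[i] >= '0' && cp[i] <= '9')
			digit = cp[i] - '0';
		else if (cp[i] >= 'A' && cp[i] <= 'F')
			digit = cp[i] - 'A' + 10;
		else if (cp[i] >= 'a' && cp[i] <= 'f')
			digit = cp[i] - 'a' + 10;
		else
			return 0;	/* malformed input */
		val = val * 16 + digit;
	}

	/* Assumed range check: reject values beyond U+10FFFF. */
	return val > 0x10FFFF ? 0 : val;
}
```

Only the first sz bytes are read, so the digits need not be
NUL-terminated; a caller holding a pointer just past the leading u of
the escape can pass the digit count directly.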
332: .Ss Variables
333: .Bl -ohang
334: .It Va man_macronames
1.4 kristaps 335: The string representation of a man macro as indexed by
336: .Vt "enum mant" .
1.3 kristaps 337: .It Va mdoc_argnames
1.4 kristaps 338: The string representation of an mdoc macro argument as indexed by
339: .Vt "enum mdocargt" .
1.3 kristaps 340: .It Va mdoc_macronames
1.4 kristaps 341: The string representation of an mdoc macro as indexed by
342: .Vt "enum mdoct" .
1.1 kristaps 343: .El
344: .Sh IMPLEMENTATION NOTES
345: This section consists of structural documentation for
346: .Xr mdoc 7
347: and
348: .Xr man 7
349: syntax trees.
350: .Ss Man Abstract Syntax Tree
351: This AST is governed by the ontological rules dictated in
352: .Xr man 7
353: and derives its terminology accordingly.
354: .Pp
355: The AST is composed of
356: .Vt struct man_node
357: nodes with block, head, body, element, root and text types as declared by the
358: .Va type
359: field.
360: Each node also provides its parse point (the
361: .Va line ,
362: .Va sec ,
363: and
364: .Va pos
365: fields), its position in the tree (the
366: .Va parent ,
367: .Va child ,
368: .Va next
369: and
370: .Va prev
371: fields) and some type-specific data.
372: .Pp
373: The tree itself is arranged according to the following normal form,
374: where capitalised non-terminals represent nodes.
375: .Pp
376: .Bl -tag -width "ELEMENTXX" -compact
377: .It ROOT
378: \(<- mnode+
379: .It mnode
380: \(<- ELEMENT | TEXT | BLOCK
381: .It BLOCK
382: \(<- HEAD BODY
383: .It HEAD
384: \(<- mnode*
385: .It BODY
386: \(<- mnode*
387: .It ELEMENT
388: \(<- ELEMENT | TEXT*
389: .It TEXT
390: \(<- [[:alpha:]]*
391: .El
392: .Pp
393: The only elements capable of nesting other elements are those with
394: next-line scope as documented in
395: .Xr man 7 .
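.Pp
The tree linkage described above (the
.Va child
and
.Va next
fields) lends itself to a simple pre-order walk.
The following is a minimal sketch using a stand-in structure rather than
the real
.Vt struct man_node :
the names
.Vt struct sk_node ,
.Fn sk_walk
and
.Fn sk_count_text
are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in node types mirroring the non-terminals of the normal form. */
enum sk_type { SK_ROOT, SK_BLOCK, SK_HEAD, SK_BODY, SK_ELEM, SK_TEXT };

/* Stand-in for a tree node; only the linkage fields named in the text. */
struct sk_node {
	enum sk_type	 type;
	struct sk_node	*parent;
	struct sk_node	*child;	/* first child, or NULL */
	struct sk_node	*next;	/* next sibling, or NULL */
	struct sk_node	*prev;	/* previous sibling, or NULL */
};

/*
 * Visit n and all of its following siblings in pre-order,
 * descending into each node's children before moving on.
 */
void
sk_walk(const struct sk_node *n, int depth,
    void (*visit)(const struct sk_node *, int, void *), void *arg)
{
	assert(visit != NULL);

	for (; n != NULL; n = n->next) {
		visit(n, depth, arg);
		sk_walk(n->child, depth + 1, visit, arg);
	}
}

/* Example visitor: count TEXT nodes into the size_t that arg points to. */
void
sk_count_text(const struct sk_node *n, int depth, void *arg)
{
	(void)depth;	/* unused by this visitor */
	if (n->type == SK_TEXT)
		(*(size_t *)arg)++;
}
```

Passing the visitor as a callback keeps the traversal separate from
whatever a front-end does with each node; with the real library the
starting node would presumably be the one returned by
.Fn man_node .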
396: .Ss Mdoc Abstract Syntax Tree
397: This AST is governed by the ontological
398: rules dictated in
399: .Xr mdoc 7
400: and derives its terminology accordingly.
401: .Qq In-line
402: elements described in
403: .Xr mdoc 7
404: are described simply as
405: .Qq elements .
406: .Pp
407: The AST is composed of
408: .Vt struct mdoc_node
409: nodes with block, head, body, element, root and text types as declared
410: by the
411: .Va type
412: field.
413: Each node also provides its parse point (the
414: .Va line ,
415: .Va sec ,
416: and
417: .Va pos
418: fields), its position in the tree (the
419: .Va parent ,
420: .Va child ,
421: .Va nchild ,
422: .Va next
423: and
424: .Va prev
425: fields) and some type-specific data, in particular, for nodes generated
426: from macros, the generating macro in the
427: .Va tok
428: field.
429: .Pp
430: The tree itself is arranged according to the following normal form,
431: where capitalised non-terminals represent nodes.
432: .Pp
433: .Bl -tag -width "ELEMENTXX" -compact
434: .It ROOT
435: \(<- mnode+
436: .It mnode
437: \(<- BLOCK | ELEMENT | TEXT
438: .It BLOCK
439: \(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
440: .It ELEMENT
441: \(<- TEXT*
442: .It HEAD
443: \(<- mnode*
444: .It BODY
445: \(<- mnode* [ENDBODY mnode*]
446: .It TAIL
447: \(<- mnode*
448: .It TEXT
449: \(<- [[:printable:],0x1e]*
450: .El
451: .Pp
452: Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
453: the BLOCK production: these refer to punctuation marks.
454: Furthermore, although a TEXT node will generally have a non-zero-length
455: string, in the specific case of
456: .Sq \&.Bd \-literal ,
457: an empty line will produce a zero-length string.
458: Multiple body parts are only found in invocations of
459: .Sq \&.Bl \-column ,
460: where a new body introduces a new phrase.
461: .Pp
462: The
463: .Xr mdoc 7
1.5 kristaps 464: syntax tree accommodates broken block structures as well.
1.1 kristaps 465: The ENDBODY node is available to end the formatting associated
466: with a given block before the physical end of that block.
467: It has a non-null
468: .Va end
469: field, is of the BODY
470: .Va type ,
471: has the same
472: .Va tok
473: as the BLOCK it is ending, and has a
474: .Va pending
475: field pointing to that BLOCK's BODY node.
476: It is an indirect child of that BODY node
477: and has no children of its own.
478: .Pp
479: An ENDBODY node is generated when a block ends while one of its child
480: blocks is still open, as in the following example:
481: .Bd -literal -offset indent
482: \&.Ao ao
483: \&.Bo bo ac
484: \&.Ac bc
485: \&.Bc end
486: .Ed
487: .Pp
488: This example results in the following block structure:
489: .Bd -literal -offset indent
490: BLOCK Ao
491:     HEAD Ao
492:     BODY Ao
493:         TEXT ao
494:         BLOCK Bo, pending -> Ao
495:             HEAD Bo
496:             BODY Bo
497:                 TEXT bo
498:                 TEXT ac
499:                 ENDBODY Ao, pending -> Ao
500:                 TEXT bc
501: TEXT end
502: .Ed
503: .Pp
504: Here, the formatting of the
505: .Sq \&Ao
506: block extends from TEXT ao to TEXT ac,
507: while the formatting of the
508: .Sq \&Bo
509: block extends from TEXT bo to TEXT bc.
510: It renders as follows in
511: .Fl T Ns Cm ascii
512: mode:
513: .Pp
514: .Dl <ao [bo ac> bc] end
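.Pp
The properties of an ENDBODY node listed above (BODY type, a set
.Va end
field, and a
.Va pending
pointer to the BODY being ended) can be checked mechanically.
The sketch below does so on a stand-in structure, not on the real
.Vt struct mdoc_node :
the type
.Vt struct eb_node ,
the functions
.Fn eb_is_endbody
and
.Fn eb_find_end ,
and the modelling of
.Va end
and
.Va tok
as plain integers are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum eb_type { EB_ROOT, EB_BLOCK, EB_HEAD, EB_BODY, EB_TAIL, EB_ELEM, EB_TEXT };

/* Stand-in node carrying only the fields this sketch needs. */
struct eb_node {
	enum eb_type	 type;
	int		 tok;		/* stand-in for the generating macro */
	int		 end;		/* assumed: non-zero only on ENDBODY */
	struct eb_node	*pending;	/* BODY whose formatting ends here */
	struct eb_node	*child;		/* first child, or NULL */
	struct eb_node	*next;		/* next sibling, or NULL */
};

/* An ENDBODY node is of the BODY type and has its end field set. */
int
eb_is_endbody(const struct eb_node *n)
{
	assert(n != NULL);
	return n->type == EB_BODY && n->end != 0;
}

/*
 * Search n, its siblings and their descendants in pre-order for the
 * first ENDBODY node whose pending field points to body.
 * Returns NULL if there is none.
 */
const struct eb_node *
eb_find_end(const struct eb_node *n, const struct eb_node *body)
{
	const struct eb_node *hit;

	for (; n != NULL; n = n->next) {
		if (eb_is_endbody(n) && n->pending == body)
			return n;
		if ((hit = eb_find_end(n->child, body)) != NULL)
			return hit;
	}
	return NULL;
}
```

A formatter could use such a lookup to decide where to close the
formatting of a block whose BODY is still physically open.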
515: .Pp
516: Support for badly-nested blocks is only provided for backward
517: compatibility with some older
518: .Xr mdoc 7
519: implementations.
520: Using badly-nested blocks is
521: .Em strongly discouraged ;
522: for example, the
523: .Fl T Ns Cm html
524: and
525: .Fl T Ns Cm xhtml
526: front-ends to
527: .Xr mandoc 1
528: are unable to render them in any meaningful way.
529: Furthermore, behaviour when encountering badly-nested blocks is not
530: consistent across troff implementations, especially when using multiple
531: levels of badly-nested blocks.
532: .Sh SEE ALSO
533: .Xr mandoc 1 ,
534: .Xr eqn 7 ,
535: .Xr man 7 ,
1.6 kristaps 536: .Xr mandoc_char 7 ,
1.1 kristaps 537: .Xr mdoc 7 ,
538: .Xr roff 7 ,
539: .Xr tbl 7
540: .Sh AUTHORS
541: The
542: .Nm
543: library was written by
544: .An Kristaps Dzonsons Aq kristaps@bsd.lv .