Annotation of mandoc/mandoc.3, Revision 1.7
1.7 ! kristaps 1: .\" $Id: mandoc.3,v 1.6 2011/05/01 10:40:52 kristaps Exp $
1.1 kristaps 2: .\"
3: .\" Copyright (c) 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
4: .\" Copyright (c) 2010 Ingo Schwarze <schwarze@openbsd.org>
5: .\"
6: .\" Permission to use, copy, modify, and distribute this software for any
7: .\" purpose with or without fee is hereby granted, provided that the above
8: .\" copyright notice and this permission notice appear in all copies.
9: .\"
10: .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
11: .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
12: .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
13: .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
14: .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
15: .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
16: .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
17: .\"
1.7 ! kristaps 18: .Dd $Mdocdate: May 1 2011 $
1.1 kristaps 19: .Dt MANDOC 3
20: .Os
21: .Sh NAME
22: .Nm mandoc ,
1.3 kristaps 23: .Nm mandoc_escape ,
1.1 kristaps 24: .Nm man_meta ,
25: .Nm man_node ,
1.6 kristaps 26: .Nm mchars_alloc ,
27: .Nm mchars_free ,
28: .Nm mchars_num2char ,
1.7 ! kristaps 29: .Nm mchars_num2uc ,
1.6 kristaps 30: .Nm mchars_res2cp ,
31: .Nm mchars_res2str ,
32: .Nm mchars_spec2cp ,
33: .Nm mchars_spec2str ,
1.1 kristaps 34: .Nm mdoc_meta ,
35: .Nm mdoc_node ,
36: .Nm mparse_alloc ,
37: .Nm mparse_free ,
38: .Nm mparse_readfd ,
39: .Nm mparse_reset ,
1.2 kristaps 40: .Nm mparse_result ,
41: .Nm mparse_strerror ,
42: .Nm mparse_strlevel
1.1 kristaps 43: .Nd mandoc macro compiler library
44: .Sh SYNOPSIS
45: .In man.h
46: .In mdoc.h
47: .In mandoc.h
1.3 kristaps 48: .Ft "enum mandoc_esc"
49: .Fo mandoc_escape
50: .Fa "const char **end"
51: .Fa "const char **start"
52: .Fa "int *sz"
53: .Fc
1.1 kristaps 54: .Ft "const struct man_meta *"
55: .Fo man_meta
56: .Fa "const struct man *man"
57: .Fc
58: .Ft "const struct man_node *"
59: .Fo man_node
60: .Fa "const struct man *man"
61: .Fc
1.6 kristaps 62: .Ft "struct mchars *"
63: .Fn mchars_alloc
64: .Ft void
65: .Fn mchars_free "struct mchars *p"
66: .Ft char
67: .Fn mchars_num2char "const char *cp" "size_t sz"
1.7 ! kristaps 68: .Ft int
! 69: .Fn mchars_num2uc "const char *cp" "size_t sz"
1.6 kristaps 70: .Ft "const char *"
71: .Fo mchars_res2str
72: .Fa "struct mchars *p"
73: .Fa "const char *cp"
74: .Fa "size_t sz"
75: .Fa "size_t *rsz"
76: .Fc
77: .Ft int
78: .Fo mchars_res2cp
79: .Fa "struct mchars *p"
80: .Fa "const char *cp"
81: .Fa "size_t sz"
83: .Fc
84: .Ft "const char *"
85: .Fo mchars_spec2str
86: .Fa "struct mchars *p"
87: .Fa "const char *cp"
88: .Fa "size_t sz"
89: .Fa "size_t *rsz"
90: .Fc
91: .Ft int
92: .Fo mchars_spec2cp
93: .Fa "struct mchars *p"
94: .Fa "const char *cp"
95: .Fa "size_t sz"
97: .Fc
1.1 kristaps 98: .Ft "const struct mdoc_meta *"
99: .Fo mdoc_meta
100: .Fa "const struct mdoc *mdoc"
101: .Fc
102: .Ft "const struct mdoc_node *"
103: .Fo mdoc_node
104: .Fa "const struct mdoc *mdoc"
105: .Fc
106: .Ft "struct mparse *"
107: .Fo mparse_alloc
108: .Fa "enum mparset type"
109: .Fa "enum mandoclevel wlevel"
110: .Fa "mandocmsg msg"
111: .Fa "void *msgarg"
112: .Fc
113: .Ft void
114: .Fo mparse_free
115: .Fa "struct mparse *parse"
116: .Fc
117: .Ft "enum mandoclevel"
118: .Fo mparse_readfd
119: .Fa "struct mparse *parse"
120: .Fa "int fd"
121: .Fa "const char *fname"
122: .Fc
123: .Ft void
124: .Fo mparse_reset
125: .Fa "struct mparse *parse"
126: .Fc
127: .Ft void
128: .Fo mparse_result
129: .Fa "struct mparse *parse"
130: .Fa "struct mdoc **mdoc"
131: .Fa "struct man **man"
1.2 kristaps 132: .Fc
133: .Ft "const char *"
134: .Fo mparse_strerror
135: .Fa "enum mandocerr"
136: .Fc
137: .Ft "const char *"
138: .Fo mparse_strlevel
139: .Fa "enum mandoclevel"
1.1 kristaps 140: .Fc
141: .Vt extern const char * const * man_macronames;
142: .Vt extern const char * const * mdoc_argnames;
143: .Vt extern const char * const * mdoc_macronames;
1.4 kristaps 144: .Fd "#define ASCII_NBRSP"
145: .Fd "#define ASCII_HYPH"
1.1 kristaps 146: .Sh DESCRIPTION
147: The
148: .Nm mandoc
149: library parses a
150: .Ux
151: manual into an abstract syntax tree (AST).
152: .Ux
153: manuals are composed of
154: .Xr mdoc 7
155: or
156: .Xr man 7 ,
157: and may be mixed with
158: .Xr roff 7 ,
159: .Xr tbl 7 ,
160: and
161: .Xr eqn 7
162: invocations.
163: .Pp
164: The following describes a general parse sequence:
165: .Bl -enum
166: .It
167: initiate a parsing sequence with
168: .Fn mparse_alloc ;
169: .It
170: parse files or file descriptors with
171: .Fn mparse_readfd ;
172: .It
173: retrieve a parsed syntax tree, if the parse was successful, with
174: .Fn mparse_result ;
175: .It
176: iterate over parse nodes with
177: .Fn mdoc_node
178: or
179: .Fn man_node ;
180: .It
181: free all allocated memory with
182: .Fn mparse_free ,
183: or invoke
184: .Fn mparse_reset
185: and parse new files.
1.3 kristaps 186: .El
1.6 kristaps 187: .Pp
188: The
189: .Nm
190: library also contains routines for translating character strings into glyphs
191: .Pq see Fn mchars_alloc
192: and parsing escape sequences from strings
193: .Pq see Fn mandoc_escape .
1.7 ! kristaps 194: .Pp
! 195: This library is
! 196: .Ud
1.3 kristaps 197: .Sh REFERENCE
198: This section documents the functions, types, and variables available
199: via
200: .In mandoc.h .
201: .Ss Types
202: .Bl -ohang
203: .It Vt "enum mandoc_esc"
204: .It Vt "enum mandocerr"
205: .It Vt "enum mandoclevel"
1.6 kristaps 206: .It Vt "struct mchars"
207: An opaque pointer to an object allowing for translation between
208: character strings and glyphs.
209: See
210: .Fn mchars_alloc .
1.3 kristaps 211: .It Vt "enum mparset"
212: .It Vt "struct mparse"
213: .It Vt "mandocmsg"
214: .El
215: .Ss Functions
216: .Bl -ohang
217: .It Fn mandoc_escape
1.4 kristaps 218: Scan an escape sequence, i.e., a character string beginning with
219: .Sq \e .
220: Pass a pointer to this string as
221: .Va end ;
222: it will be set to the supremum of the parsed escape sequence unless
223: returning ESCAPE_ERROR, in which case the string is bogus and should be
224: thrown away.
225: If not ESCAPE_ERROR or ESCAPE_IGNORE,
226: .Va start
227: is set to the first relevant character of the substring (font, glyph,
228: whatever) of length
229: .Va sz .
230: Both
231: .Va start
232: and
233: .Va sz
234: may be NULL.
1.3 kristaps 235: .It Fn man_meta
1.4 kristaps 236: Obtain the meta-data of a successful parse.
237: This may only be used on a pointer returned by
238: .Fn mparse_result .
1.3 kristaps 239: .It Fn man_node
1.4 kristaps 240: Obtain the root node of a successful parse.
241: This may only be used on a pointer returned by
242: .Fn mparse_result .
1.6 kristaps 243: .It Fn mchars_alloc
244: Allocate an
245: .Vt "struct mchars *"
246: object for translating special characters into glyphs.
247: See
248: .Xr mandoc_char 7
249: for an overview of special characters.
250: The object must be freed with
251: .Fn mchars_free .
252: .It Fn mchars_free
253: Free an object created with
254: .Fn mchars_alloc .
255: .It Fn mchars_num2char
1.7 ! kristaps 256: Convert a character index (e.g., the \eN\(aq\(aq escape) into a
! 257: printable ASCII character.
! 258: Returns \e0 (the nil character) if the input sequence is malformed.
! 259: .It Fn mchars_num2uc
! 260: Convert a hexadecimal character index (e.g., the \e[uNNNN] escape) into
! 261: a Unicode codepoint.
1.6 kristaps 262: Returns \e0 (the nil character) if the input sequence is malformed.
263: .It Fn mchars_res2cp
264: Convert a predefined character into a valid Unicode codepoint.
265: Returns \-1 on failure and 0 if no code-point exists (if this occurs,
266: the caller should fall back to
267: .Fn mchars_res2str ) .
268: .It Fn mchars_res2str
269: Convert a predefined character into an ASCII string.
270: Returns NULL on failure.
271: .It Fn mchars_spec2cp
272: Convert a special character into a valid Unicode codepoint.
273: Returns \-1 on failure and 0 if no code-point exists (if this occurs,
274: the caller should fall back to
275: .Fn mchars_spec2str ) .
276: .It Fn mchars_spec2str
277: Convert a special character into an ASCII string.
278: Returns NULL on failure.
1.3 kristaps 279: .It Fn mdoc_meta
1.4 kristaps 280: Obtain the meta-data of a successful parse.
281: This may only be used on a pointer returned by
282: .Fn mparse_result .
1.3 kristaps 283: .It Fn mdoc_node
1.4 kristaps 284: Obtain the root node of a successful parse.
285: This may only be used on a pointer returned by
286: .Fn mparse_result .
1.3 kristaps 287: .It Fn mparse_alloc
1.4 kristaps 288: Allocate a parser.
289: The same parser may be used for multiple files so long as
290: .Fn mparse_reset
291: is called between parses.
292: .Fn mparse_free
293: must be called to free the memory allocated by this function.
1.3 kristaps 294: .It Fn mparse_free
1.4 kristaps 295: Free all memory allocated by
296: .Fn mparse_alloc .
1.3 kristaps 297: .It Fn mparse_readfd
1.4 kristaps 298: Parse a file or file descriptor.
299: If
300: .Va fd
301: is \-1,
302: .Va fname
303: is opened for reading.
304: Otherwise,
305: .Va fname
306: is assumed to be the name associated with
307: .Va fd .
308: This may be called multiple times with different parameters; however,
309: .Fn mparse_reset
310: should be invoked between parses.
1.3 kristaps 311: .It Fn mparse_reset
1.4 kristaps 312: Reset a parser so that
313: .Fn mparse_readfd
314: may be used again.
1.3 kristaps 315: .It Fn mparse_result
1.4 kristaps 316: Obtain the result of a parse.
317: Only successful parses
318: .Po
319: i.e., those where
320: .Fn mparse_readfd
321: returned less than MANDOCLEVEL_FATAL
322: .Pc
323: should invoke this function, in which case one of the two pointers will
324: be filled in.
1.3 kristaps 325: .It Fn mparse_strerror
1.4 kristaps 326: Return a statically-allocated string representation of an error code.
1.3 kristaps 327: .It Fn mparse_strlevel
1.4 kristaps 328: Return a statically-allocated string representation of a level code.
1.3 kristaps 329: .El
330: .Ss Variables
331: .Bl -ohang
332: .It Va man_macronames
1.4 kristaps 333: The string representation of a man macro as indexed by
334: .Vt "enum mant" .
1.3 kristaps 335: .It Va mdoc_argnames
1.4 kristaps 336: The string representation of an mdoc macro argument as indexed by
337: .Vt "enum mdocargt" .
1.3 kristaps 338: .It Va mdoc_macronames
1.4 kristaps 339: The string representation of an mdoc macro as indexed by
340: .Vt "enum mdoct" .
1.1 kristaps 341: .El
342: .Sh IMPLEMENTATION NOTES
343: This section consists of structural documentation for
344: .Xr mdoc 7
345: and
346: .Xr man 7
347: syntax trees.
348: .Ss Man Abstract Syntax Tree
349: This AST is governed by the ontological rules dictated in
350: .Xr man 7
351: and derives its terminology accordingly.
352: .Pp
353: The AST is composed of
354: .Vt struct man_node
355: nodes with element, root and text types as declared by the
356: .Va type
357: field.
358: Each node also provides its parse point (the
359: .Va line ,
360: .Va sec ,
361: and
362: .Va pos
363: fields), its position in the tree (the
364: .Va parent ,
365: .Va child ,
366: .Va next
367: and
368: .Va prev
369: fields) and some type-specific data.
370: .Pp
371: The tree itself is arranged according to the following normal form,
372: where capitalised non-terminals represent nodes.
373: .Pp
374: .Bl -tag -width "ELEMENTXX" -compact
375: .It ROOT
376: \(<- mnode+
377: .It mnode
378: \(<- ELEMENT | TEXT | BLOCK
379: .It BLOCK
380: \(<- HEAD BODY
381: .It HEAD
382: \(<- mnode*
383: .It BODY
384: \(<- mnode*
385: .It ELEMENT
386: \(<- ELEMENT | TEXT*
387: .It TEXT
388: \(<- [[:alpha:]]*
389: .El
390: .Pp
391: The only elements capable of nesting other elements are those with
392: next-line scope as documented in
393: .Xr man 7 .
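.Pp
As a minimal sketch of walking a tree of this shape, assume a
hypothetical
.Vt struct demo_node
that carries only the position fields named above; it is a stand-in for
illustration, not the library's
.Vt struct man_node .
A pre-order traversal descends through
.Va child
before advancing through
.Va next :
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in types; not the library's own definitions. */
enum demo_type { DEMO_ROOT, DEMO_ELEM, DEMO_TEXT };

struct demo_node {
	struct demo_node *parent;
	struct demo_node *child;	/* first child */
	struct demo_node *next;		/* next sibling */
	struct demo_node *prev;		/* previous sibling */
	enum demo_type	  type;
	const char	 *string;	/* TEXT nodes only */
};

/* Append c as the last child of p. */
void
demo_append(struct demo_node *p, struct demo_node *c)
{
	struct demo_node *n;

	assert(p != NULL && c != NULL);
	c->parent = p;
	c->next = NULL;
	c->prev = NULL;
	if (p->child == NULL) {
		p->child = c;
		return;
	}
	for (n = p->child; n->next != NULL; n = n->next)
		continue;
	n->next = c;
	c->prev = n;
}

/* Pre-order walk: count TEXT nodes and record the deepest level seen. */
int
demo_walk(const struct demo_node *n, int depth, int *maxdepth)
{
	int count = 0;

	for ( ; n != NULL; n = n->next) {
		if (depth > *maxdepth)
			*maxdepth = depth;
		if (n->type == DEMO_TEXT)
			count++;
		count += demo_walk(n->child, depth + 1, maxdepth);
	}
	return count;
}

/* Build ROOT <- ELEMENT(TEXT TEXT) TEXT and return its TEXT count. */
int
demo_count(int *maxdepth)
{
	struct demo_node root = { .type = DEMO_ROOT };
	struct demo_node elem = { .type = DEMO_ELEM };
	struct demo_node t1 = { .type = DEMO_TEXT, .string = "foo" };
	struct demo_node t2 = { .type = DEMO_TEXT, .string = "bar" };
	struct demo_node t3 = { .type = DEMO_TEXT, .string = "baz" };

	demo_append(&root, &elem);
	demo_append(&elem, &t1);
	demo_append(&elem, &t2);
	demo_append(&root, &t3);
	*maxdepth = 0;
	return demo_walk(&root, 0, maxdepth);
}
```
.Ed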
394: .Ss Mdoc Abstract Syntax Tree
395: This AST is governed by the ontological
396: rules dictated in
397: .Xr mdoc 7
398: and derives its terminology accordingly.
399: .Qq In-line
400: elements described in
401: .Xr mdoc 7
402: are described simply as
403: .Qq elements .
404: .Pp
405: The AST is composed of
406: .Vt struct mdoc_node
407: nodes with block, head, body, element, root and text types as declared
408: by the
409: .Va type
410: field.
411: Each node also provides its parse point (the
412: .Va line ,
413: .Va sec ,
414: and
415: .Va pos
416: fields), its position in the tree (the
417: .Va parent ,
418: .Va child ,
419: .Va nchild ,
420: .Va next
421: and
422: .Va prev
423: fields) and some type-specific data, in particular, for nodes generated
424: from macros, the generating macro in the
425: .Va tok
426: field.
427: .Pp
428: The tree itself is arranged according to the following normal form,
429: where capitalised non-terminals represent nodes.
430: .Pp
431: .Bl -tag -width "ELEMENTXX" -compact
432: .It ROOT
433: \(<- mnode+
434: .It mnode
435: \(<- BLOCK | ELEMENT | TEXT
436: .It BLOCK
437: \(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
438: .It ELEMENT
439: \(<- TEXT*
440: .It HEAD
441: \(<- mnode*
442: .It BODY
443: \(<- mnode* [ENDBODY mnode*]
444: .It TAIL
445: \(<- mnode*
446: .It TEXT
447: \(<- [[:printable:],0x1e]*
448: .El
449: .Pp
450: Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
451: the BLOCK production: these refer to punctuation marks.
452: Furthermore, although a TEXT node will generally have a non-zero-length
453: string, in the specific case of
454: .Sq \&.Bd \-literal ,
455: an empty line will produce a zero-length string.
456: Multiple body parts are only found in invocations of
457: .Sq \&.Bl \-column ,
458: where a new body introduces a new phrase.
459: .Pp
460: The
461: .Xr mdoc 7
1.5 kristaps 462: syntax tree accommodates broken block structures as well.
1.1 kristaps 463: The ENDBODY node is available to end the formatting associated
464: with a given block before the physical end of that block.
465: It has a non-null
466: .Va end
467: field, is of the BODY
468: .Va type ,
469: has the same
470: .Va tok
471: as the BLOCK it is ending, and has a
472: .Va pending
473: field pointing to that BLOCK's BODY node.
474: It is an indirect child of that BODY node
475: and has no children of its own.
476: .Pp
477: An ENDBODY node is generated when a block ends while one of its child
478: blocks is still open, as in the following example:
479: .Bd -literal -offset indent
480: \&.Ao ao
481: \&.Bo bo ac
482: \&.Ac bc
483: \&.Bc end
484: .Ed
485: .Pp
486: This example results in the following block structure:
487: .Bd -literal -offset indent
488: BLOCK Ao
489:     HEAD Ao
490:     BODY Ao
491:         TEXT ao
492:         BLOCK Bo, pending -> Ao
493:             HEAD Bo
494:             BODY Bo
495:                 TEXT bo
496:                 TEXT ac
497:                 ENDBODY Ao, pending -> Ao
498:                 TEXT bc
499: TEXT end
500: .Ed
501: .Pp
502: Here, the formatting of the
503: .Sq \&Ao
504: block extends from TEXT ao to TEXT ac,
505: while the formatting of the
506: .Sq \&Bo
507: block extends from TEXT bo to TEXT bc.
508: It renders as follows in
509: .Fl T Ns Cm ascii
510: mode:
511: .Pp
512: .Dl <ao [bo ac> bc] end
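.Pp
The rendering above can be reproduced with a small sketch.
It assumes a hypothetical
.Vt struct demo_node
whose
.Va open ,
.Va close ,
and
.Va ended
members are inventions for illustration and are not part of
.Vt struct mdoc_node ;
it only shows one way a front-end might close the pending BODY when it
meets an ENDBODY node and then skip the regular close:
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in types; not the library's own definitions. */
enum demo_type { DEMO_BLOCK, DEMO_HEAD, DEMO_BODY, DEMO_TEXT };

struct demo_node {
	struct demo_node *child;	/* first child */
	struct demo_node *next;		/* next sibling */
	struct demo_node *pending;	/* ENDBODY: the BODY being ended */
	enum demo_type	  type;
	int		  end;		/* non-zero marks an ENDBODY */
	int		  ended;	/* an ENDBODY already closed this BODY */
	const char	 *open;		/* assumed opening delimiter */
	const char	 *close;	/* assumed closing delimiter */
	const char	 *string;	/* TEXT nodes only */
};

struct demo_out {
	char	 buf[64];
	size_t	 len;
	int	 needspace;
};

/* Append s, optionally preceded by a space; drop it if it cannot fit. */
void
demo_put(struct demo_out *o, const char *s, int before, int after)
{
	size_t n = strlen(s);

	if (before && o->needspace && o->len + 1 < sizeof(o->buf))
		o->buf[o->len++] = ' ';
	if (o->len + n < sizeof(o->buf)) {
		memcpy(o->buf + o->len, s, n);
		o->len += n;
	}
	o->buf[o->len] = 0;
	o->needspace = after;
}

void
demo_render(struct demo_node *n, struct demo_out *o)
{
	for ( ; n != NULL; n = n->next) {
		if (n->type == DEMO_TEXT) {
			demo_put(o, n->string, 1, 1);
			continue;
		}
		if (n->type == DEMO_BODY && n->end) {
			/* ENDBODY: close the pending BODY early. */
			assert(n->pending != NULL);
			demo_put(o, n->close, 0, 1);
			n->pending->ended = 1;
			continue;
		}
		if (n->type == DEMO_BODY)
			demo_put(o, n->open, 1, 0);
		demo_render(n->child, o);
		/* Regular close, unless an ENDBODY got there first. */
		if (n->type == DEMO_BODY && !n->ended)
			demo_put(o, n->close, 0, 1);
	}
}

/* Build the block structure of the example and render it into o. */
const char *
demo_example(struct demo_out *o)
{
	struct demo_node end = { .type = DEMO_TEXT, .string = "end" };
	struct demo_node bc = { .type = DEMO_TEXT, .string = "bc" };
	struct demo_node eb = { .next = &bc, .type = DEMO_BODY,
	    .end = 1, .close = ">" };
	struct demo_node ac = { .next = &eb, .type = DEMO_TEXT,
	    .string = "ac" };
	struct demo_node bo = { .next = &ac, .type = DEMO_TEXT,
	    .string = "bo" };
	struct demo_node bbo = { .child = &bo, .type = DEMO_BODY,
	    .open = "[", .close = "]" };
	struct demo_node hbo = { .next = &bbo, .type = DEMO_HEAD };
	struct demo_node blbo = { .child = &hbo, .type = DEMO_BLOCK };
	struct demo_node ao = { .next = &blbo, .type = DEMO_TEXT,
	    .string = "ao" };
	struct demo_node bao = { .child = &ao, .type = DEMO_BODY,
	    .open = "<", .close = ">" };
	struct demo_node hao = { .next = &bao, .type = DEMO_HEAD };
	struct demo_node blao = { .child = &hao, .next = &end,
	    .type = DEMO_BLOCK };

	eb.pending = &bao;	/* ENDBODY Ao, pending -> BODY Ao */
	o->len = 0;
	o->needspace = 0;
	o->buf[0] = 0;
	demo_render(&blao, o);
	return o->buf;
}
```
.Ed
.Pp
Under these assumptions,
.Fn demo_example
yields the same string as the rendering shown above.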
513: .Pp
514: Support for badly-nested blocks is only provided for backward
515: compatibility with some older
516: .Xr mdoc 7
517: implementations.
518: Using badly-nested blocks is
519: .Em strongly discouraged ;
520: for example, the
521: .Fl T Ns Cm html
522: and
523: .Fl T Ns Cm xhtml
524: front-ends to
525: .Xr mandoc 1
526: are unable to render them in any meaningful way.
527: Furthermore, behaviour when encountering badly-nested blocks is not
528: consistent across troff implementations, especially when using multiple
529: levels of badly-nested blocks.
530: .Sh SEE ALSO
531: .Xr mandoc 1 ,
532: .Xr eqn 7 ,
533: .Xr man 7 ,
1.6 kristaps 534: .Xr mandoc_char 7 ,
1.1 kristaps 535: .Xr mdoc 7 ,
536: .Xr roff 7 ,
537: .Xr tbl 7
538: .Sh AUTHORS
539: The
540: .Nm
541: library was written by
542: .An Kristaps Dzonsons Aq kristaps@bsd.lv .