Annotation of mandoc/mandoc.3, Revision 1.17
1.17 ! joerg 1: .\" $Id: mandoc.3,v 1.16 2011/11/08 00:15:23 kristaps Exp $
1.1 kristaps 2: .\"
3: .\" Copyright (c) 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
4: .\" Copyright (c) 2010 Ingo Schwarze <schwarze@openbsd.org>
5: .\"
6: .\" Permission to use, copy, modify, and distribute this software for any
7: .\" purpose with or without fee is hereby granted, provided that the above
8: .\" copyright notice and this permission notice appear in all copies.
9: .\"
10: .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
11: .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
12: .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
13: .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
14: .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
15: .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
16: .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
17: .\"
1.17 ! joerg 18: .Dd $Mdocdate: November 8 2011 $
1.1 kristaps 19: .Dt MANDOC 3
20: .Os
21: .Sh NAME
22: .Nm mandoc ,
1.3 kristaps 23: .Nm mandoc_escape ,
1.1 kristaps 24: .Nm man_meta ,
1.14 kristaps 25: .Nm man_mparse ,
1.1 kristaps 26: .Nm man_node ,
1.6 kristaps 27: .Nm mchars_alloc ,
28: .Nm mchars_free ,
29: .Nm mchars_num2char ,
1.7 kristaps 30: .Nm mchars_num2uc ,
1.6 kristaps 31: .Nm mchars_spec2cp ,
32: .Nm mchars_spec2str ,
1.1 kristaps 33: .Nm mdoc_meta ,
34: .Nm mdoc_node ,
35: .Nm mparse_alloc ,
36: .Nm mparse_free ,
1.14 kristaps 37: .Nm mparse_getkeep ,
38: .Nm mparse_keep ,
1.1 kristaps 39: .Nm mparse_readfd ,
40: .Nm mparse_reset ,
1.2 kristaps 41: .Nm mparse_result ,
42: .Nm mparse_strerror ,
43: .Nm mparse_strlevel
1.1 kristaps 44: .Nd mandoc macro compiler library
1.8 kristaps 45: .Sh LIBRARY
46: .Lb mandoc
1.1 kristaps 47: .Sh SYNOPSIS
48: .In man.h
49: .In mdoc.h
50: .In mandoc.h
1.3 kristaps 51: .Ft "enum mandoc_esc"
52: .Fo mandoc_escape
1.15 kristaps 53: .Fa "const char **end"
54: .Fa "const char **start"
55: .Fa "int *sz"
1.3 kristaps 56: .Fc
1.1 kristaps 57: .Ft "const struct man_meta *"
58: .Fo man_meta
59: .Fa "const struct man *man"
60: .Fc
1.14 kristaps 61: .Ft "const struct mparse *"
62: .Fo man_mparse
63: .Fa "const struct man *man"
64: .Fc
1.1 kristaps 65: .Ft "const struct man_node *"
66: .Fo man_node
67: .Fa "const struct man *man"
68: .Fc
1.6 kristaps 69: .Ft "struct mchars *"
70: .Fn mchars_alloc
71: .Ft void
72: .Fn mchars_free "struct mchars *p"
73: .Ft char
74: .Fn mchars_num2char "const char *cp" "size_t sz"
1.7 kristaps 75: .Ft int
76: .Fn mchars_num2uc "const char *cp" "size_t sz"
1.6 kristaps 77: .Ft "const char *"
78: .Fo mchars_spec2str
1.16 kristaps 79: .Fa "const struct mchars *p"
1.6 kristaps 80: .Fa "const char *cp"
81: .Fa "size_t sz"
82: .Fa "size_t *rsz"
83: .Fc
84: .Ft int
85: .Fo mchars_spec2cp
1.16 kristaps 86: .Fa "const struct mchars *p"
1.6 kristaps 87: .Fa "const char *cp"
88: .Fa "size_t sz"
90: .Fc
1.1 kristaps 91: .Ft "const struct mdoc_meta *"
92: .Fo mdoc_meta
93: .Fa "const struct mdoc *mdoc"
94: .Fc
95: .Ft "const struct mdoc_node *"
96: .Fo mdoc_node
97: .Fa "const struct mdoc *mdoc"
98: .Fc
99: .Ft "struct mparse *"
100: .Fo mparse_alloc
101: .Fa "enum mparset type"
102: .Fa "enum mandoclevel wlevel"
103: .Fa "mandocmsg msg"
104: .Fa "void *msgarg"
105: .Fc
106: .Ft void
107: .Fo mparse_free
108: .Fa "struct mparse *parse"
109: .Fc
1.14 kristaps 110: .Ft "const char *"
111: .Fo mparse_getkeep
112: .Fa "const struct mparse *parse"
113: .Fc
114: .Ft void
115: .Fo mparse_keep
116: .Fa "struct mparse *parse"
117: .Fc
1.1 kristaps 118: .Ft "enum mandoclevel"
119: .Fo mparse_readfd
120: .Fa "struct mparse *parse"
121: .Fa "int fd"
122: .Fa "const char *fname"
123: .Fc
124: .Ft void
125: .Fo mparse_reset
126: .Fa "struct mparse *parse"
127: .Fc
128: .Ft void
129: .Fo mparse_result
130: .Fa "struct mparse *parse"
131: .Fa "struct mdoc **mdoc"
132: .Fa "struct man **man"
1.2 kristaps 133: .Fc
134: .Ft "const char *"
135: .Fo mparse_strerror
136: .Fa "enum mandocerr"
137: .Fc
138: .Ft "const char *"
139: .Fo mparse_strlevel
140: .Fa "enum mandoclevel"
1.1 kristaps 141: .Fc
142: .Vt extern const char * const * man_macronames;
143: .Vt extern const char * const * mdoc_argnames;
144: .Vt extern const char * const * mdoc_macronames;
1.4 kristaps 145: .Fd "#define ASCII_NBRSP"
146: .Fd "#define ASCII_HYPH"
1.1 kristaps 147: .Sh DESCRIPTION
148: The
149: .Nm mandoc
150: library parses a
151: .Ux
152: manual into an abstract syntax tree (AST).
153: .Ux
154: manuals are composed of
155: .Xr mdoc 7
156: or
157: .Xr man 7 ,
158: and may be mixed with
159: .Xr roff 7 ,
160: .Xr tbl 7 ,
161: and
162: .Xr eqn 7
163: invocations.
164: .Pp
165: The following describes a general parse sequence:
166: .Bl -enum
167: .It
168: initiate a parsing sequence with
169: .Fn mparse_alloc ;
170: .It
171: parse files or file descriptors with
172: .Fn mparse_readfd ;
173: .It
174: retrieve a parsed syntax tree, if the parse was successful, with
175: .Fn mparse_result ;
176: .It
177: iterate over parse nodes with
178: .Fn mdoc_node
179: or
180: .Fn man_node ;
181: .It
182: free all allocated memory with
183: .Fn mparse_free ,
184: or invoke
185: .Fn mparse_reset
186: and parse new files.
1.3 kristaps 187: .El
1.6 kristaps 188: .Pp
189: The
190: .Nm
191: library also contains routines for translating character strings into glyphs
192: .Pq see Fn mchars_alloc
193: and parsing escape sequences from strings
194: .Pq see Fn mandoc_escape .
1.3 kristaps 195: .Sh REFERENCE
196: This section documents the functions, types, and variables available
197: via
198: .In mandoc.h .
199: .Ss Types
200: .Bl -ohang
201: .It Vt "enum mandoc_esc"
1.11 kristaps 202: An escape sequence classification.
1.3 kristaps 203: .It Vt "enum mandocerr"
1.11 kristaps 204: A fatal error, error, or warning message during parsing.
1.3 kristaps 205: .It Vt "enum mandoclevel"
1.11 kristaps 206: A classification of an
207: .Vt "enum mandocerr"
208: as regards system operation.
1.6 kristaps 209: .It Vt "struct mchars"
210: An opaque pointer to an object allowing for translation between
211: character strings and glyphs.
212: See
213: .Fn mchars_alloc .
1.3 kristaps 214: .It Vt "enum mparset"
1.11 kristaps 215: The type of parser when reading input.
216: This should usually be
1.12 kristaps 217: .Dv MPARSE_AUTO
1.11 kristaps 218: for auto-detection.
1.3 kristaps 219: .It Vt "struct mparse"
1.11 kristaps 220: An opaque pointer to a running parse sequence.
221: Created with
222: .Fn mparse_alloc
223: and freed with
224: .Fn mparse_free .
225: This may be used across parsed input if
226: .Fn mparse_reset
227: is called between parses.
1.3 kristaps 228: .It Vt "mandocmsg"
1.11 kristaps 229: A prototype for a function to handle fatal error, error, and warning
230: messages emitted by the parser.
1.3 kristaps 231: .El
232: .Ss Functions
233: .Bl -ohang
234: .It Fn mandoc_escape
1.4 kristaps 235: Scan an escape sequence, i.e., a character string beginning with
236: .Sq \e .
1.17 ! joerg 237: Pass a pointer to the character after the
! 238: .Sq \e
! 239: as
1.4 kristaps 240: .Va end ;
241: it will be set to the supremum of the parsed escape sequence unless
1.12 kristaps 242: returning
243: .Dv ESCAPE_ERROR ,
244: in which case the string is bogus and should be
1.4 kristaps 245: thrown away.
1.12 kristaps 246: If not
247: .Dv ESCAPE_ERROR
248: or
249: .Dv ESCAPE_IGNORE ,
1.4 kristaps 250: .Va start
251: is set to the first relevant character of the substring (font, glyph,
252: whatever) of length
253: .Va sz .
254: Both
255: .Va start
256: and
257: .Va sz
1.12 kristaps 258: may be
259: .Dv NULL .
1.3 kristaps 260: .It Fn man_meta
1.4 kristaps 261: Obtain the meta-data of a successful parse.
262: This may only be used on a pointer returned by
263: .Fn mparse_result .
1.14 kristaps 264: .It Fn man_mparse
265: Get the parser used for the current output.
1.3 kristaps 266: .It Fn man_node
1.4 kristaps 267: Obtain the root node of a successful parse.
268: This may only be used on a pointer returned by
269: .Fn mparse_result .
1.6 kristaps 270: .It Fn mchars_alloc
271: Allocate an
272: .Vt "struct mchars *"
273: object for translating special characters into glyphs.
274: See
275: .Xr mandoc_char 7
276: for an overview of special characters.
277: The object must be freed with
278: .Fn mchars_free .
279: .It Fn mchars_free
280: Free an object created with
281: .Fn mchars_alloc .
282: .It Fn mchars_num2char
1.7 kristaps 283: Convert a character index (e.g., the \eN\(aq\(aq escape) into a
284: printable ASCII character.
285: Returns \e0 (the nil character) if the input sequence is malformed.
286: .It Fn mchars_num2uc
287: Convert a hexadecimal character index (e.g., the \e[uNNNN] escape) into
288: a Unicode codepoint.
1.6 kristaps 289: Returns \e0 (the nil character) if the input sequence is malformed.
290: .It Fn mchars_spec2cp
291: Convert a special character into a valid Unicode codepoint.
1.10 kristaps 292: Returns \-1 on failure or a non-zero Unicode codepoint on success.
1.6 kristaps 293: .It Fn mchars_spec2str
294: Convert a special character into an ASCII string.
1.12 kristaps 295: Returns
296: .Dv NULL
297: on failure.
1.3 kristaps 298: .It Fn mdoc_meta
1.4 kristaps 299: Obtain the meta-data of a successful parse.
300: This may only be used on a pointer returned by
301: .Fn mparse_result .
1.3 kristaps 302: .It Fn mdoc_node
1.4 kristaps 303: Obtain the root node of a successful parse.
304: This may only be used on a pointer returned by
305: .Fn mparse_result .
1.3 kristaps 306: .It Fn mparse_alloc
1.4 kristaps 307: Allocate a parser.
308: The same parser may be used for multiple files so long as
309: .Fn mparse_reset
310: is called between parses.
311: .Fn mparse_free
312: must be called to free the memory allocated by this function.
1.3 kristaps 313: .It Fn mparse_free
1.4 kristaps 314: Free all memory allocated by
315: .Fn mparse_alloc .
1.14 kristaps 316: .It Fn mparse_getkeep
317: Acquire the keep buffer.
318: Must follow a call of
319: .Fn mparse_keep .
320: .It Fn mparse_keep
321: Instruct the parser to retain a copy of its parsed input.
322: This can be acquired with subsequent
323: .Fn mparse_getkeep
324: calls.
1.3 kristaps 325: .It Fn mparse_readfd
1.4 kristaps 326: Parse a file or file descriptor.
327: If
328: .Va fd
329: is -1,
330: .Va fname
331: is opened for reading.
332: Otherwise,
333: .Va fname
334: is assumed to be the name associated with
335: .Va fd .
336: This may be called multiple times with different parameters; however,
337: .Fn mparse_reset
338: should be invoked between parses.
1.3 kristaps 339: .It Fn mparse_reset
1.4 kristaps 340: Reset a parser so that
341: .Fn mparse_readfd
342: may be used again.
1.3 kristaps 343: .It Fn mparse_result
1.4 kristaps 344: Obtain the result of a parse.
345: Only successful parses
346: .Po
347: i.e., those where
348: .Fn mparse_readfd
349: returned less than MANDOCLEVEL_FATAL
350: .Pc
351: should invoke this function, in which case one of the two pointers will
352: be filled in.
1.3 kristaps 353: .It Fn mparse_strerror
1.4 kristaps 354: Return a statically-allocated string representation of an error code.
1.3 kristaps 355: .It Fn mparse_strlevel
1.4 kristaps 356: Return a statically-allocated string representation of a level code.
1.3 kristaps 357: .El
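The numeric conversions described above can be illustrated with a small, self-contained sketch.
It does not reproduce the library's code: the names sketch_num2char and sketch_num2uc are hypothetical, and the sketch assumes that the caller passes only the digits of the escape (decimal for the first function, hexadecimal for the second).
Both return zero for malformed input, mirroring the documented behaviour.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the documented behaviour of
 * mchars_num2char(): parse sz decimal digits and return the
 * character if it is printable ASCII, else the nil character.
 */
static char
sketch_num2char(const char *cp, size_t sz)
{
	int	 n = 0;
	size_t	 i;

	if (sz == 0 || sz > 3)
		return '\0';
	for (i = 0; i < sz; i++) {
		if (!isdigit((unsigned char)cp[i]))
			return '\0';
		n = n * 10 + (cp[i] - '0');
	}
	return (n < 128 && isprint(n)) ? (char)n : '\0';
}

/*
 * Hypothetical stand-in for mchars_num2uc(): parse sz hexadecimal
 * digits and return the codepoint, or 0 if the input is malformed
 * or beyond the Unicode range.
 */
static int
sketch_num2uc(const char *cp, size_t sz)
{
	int	 n = 0, d;
	size_t	 i;

	if (sz == 0 || sz > 6)
		return 0;
	for (i = 0; i < sz; i++) {
		unsigned char c = (unsigned char)cp[i];

		if (isdigit(c))
			d = c - '0';
		else if (c >= 'A' && c <= 'F')
			d = c - 'A' + 10;
		else if (c >= 'a' && c <= 'f')
			d = c - 'a' + 10;
		else
			return 0;
		n = n * 16 + d;
	}
	return n <= 0x10FFFF ? n : 0;
}
```

Under these assumptions, the digits "65" convert to the character A, and the digits "00E9" convert to the codepoint 0xE9.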
358: .Ss Variables
359: .Bl -ohang
360: .It Va man_macronames
1.4 kristaps 361: The string representation of a man macro as indexed by
362: .Vt "enum mant" .
1.3 kristaps 363: .It Va mdoc_argnames
1.4 kristaps 364: The string representation of a mdoc macro argument as indexed by
365: .Vt "enum mdocargt" .
1.3 kristaps 366: .It Va mdoc_macronames
1.4 kristaps 367: The string representation of a mdoc macro as indexed by
368: .Vt "enum mdoct" .
1.1 kristaps 369: .El
370: .Sh IMPLEMENTATION NOTES
371: This section consists of structural documentation for
372: .Xr mdoc 7
373: and
374: .Xr man 7
1.11 kristaps 375: syntax trees and strings.
376: .Ss Man and Mdoc Strings
377: Strings may be extracted from mdoc and man meta-data, or from text
378: nodes (MDOC_TEXT and MAN_TEXT, respectively).
379: These strings have special non-printing formatting cues embedded in the
380: text itself, as well as
381: .Xr roff 7
382: escapes preserved from input.
383: Implementing systems will need to handle both situations to produce
384: human-readable text.
385: In general, strings may be assumed to consist of 7-bit ASCII characters.
386: .Pp
387: The following non-printing characters may be embedded in text strings:
388: .Bl -tag -width Ds
389: .It Dv ASCII_NBRSP
390: A non-breaking space character.
391: .It Dv ASCII_HYPH
392: A soft hyphen.
393: .El
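As a minimal sketch of handling these cues, a consumer producing plain ASCII output might replace them in place.
The constants SKETCH_NBRSP and SKETCH_HYPH and their values are placeholders invented for this illustration; real code should use ASCII_NBRSP and ASCII_HYPH from the library header, whose values are not assumed here.

```c
#include <stddef.h>

/* Placeholder values for illustration only, not taken from mandoc.h. */
#define SKETCH_NBRSP	31
#define SKETCH_HYPH	30

/*
 * Replace the non-printing cues in a NUL-terminated string with
 * plain ASCII equivalents: a space and a hyphen-minus.
 */
static void
render_cues(char *s)
{
	for ( ; *s != '\0'; s++) {
		if (*s == SKETCH_NBRSP)
			*s = ' ';
		else if (*s == SKETCH_HYPH)
			*s = '-';
	}
}
```

A formatter that breaks lines would treat the two cues differently (no break at the space, an optional break at the hyphen); this sketch only makes the text printable.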
394: .Pp
395: Escape sequences are also passed verbatim into text strings.
396: An escape sequence is a sequence of characters beginning with the
397: backslash
398: .Pq Sq \e .
399: To construct human-readable text, these should be intercepted with
400: .Fn mandoc_escape
401: and converted with one of
402: .Fn mchars_num2char ,
403: .Fn mchars_spec2str ,
404: and so on.
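The calling convention of such a scanner can be sketched without the library.
The function sketch_escape below is hypothetical and far simpler than mandoc_escape: it recognises only the one-character, two-character "(xx" and bracketed "[name]" forms, and it classifies everything as a special character.
It follows the same contract for its arguments: end points just past the backslash on entry and just past the whole escape on success, while start and sz, which may be NULL, describe the name inside the escape.

```c
#include <stddef.h>
#include <string.h>

enum sketch_esc {
	SKETCH_ERROR,	/* malformed escape, discard the string */
	SKETCH_SPECIAL	/* a named special character */
};

/* Simplified, hypothetical scanner; not the library's algorithm. */
static enum sketch_esc
sketch_escape(const char **end, const char **start, int *sz)
{
	const char	*cp = *end;	/* first character after the backslash */
	const char	*s, *close;
	int		 len;

	switch (*cp) {
	case '\0':
		return SKETCH_ERROR;
	case '(':			/* two-character name */
		s = cp + 1;
		if (s[0] == '\0' || s[1] == '\0')
			return SKETCH_ERROR;
		len = 2;
		*end = s + 2;
		break;
	case '[':			/* bracketed name of any length */
		s = cp + 1;
		if ((close = strchr(s, ']')) == NULL)
			return SKETCH_ERROR;
		len = (int)(close - s);
		*end = close + 1;
		break;
	default:			/* one-character name */
		s = cp;
		len = 1;
		*end = cp + 1;
		break;
	}
	if (start != NULL)
		*start = s;
	if (sz != NULL)
		*sz = len;
	return SKETCH_SPECIAL;
}
```

A caller would copy ordinary characters until it meets a backslash, pass a pointer to the following character, convert the substring described by start and sz, and resume copying at end.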
1.1 kristaps 405: .Ss Man Abstract Syntax Tree
406: This AST is governed by the ontological rules dictated in
407: .Xr man 7
408: and derives its terminology accordingly.
409: .Pp
410: The AST is composed of
411: .Vt struct man_node
412: nodes with element, root and text types as declared by the
413: .Va type
414: field.
415: Each node also provides its parse point (the
416: .Va line ,
417: .Va sec ,
418: and
419: .Va pos
420: fields), its position in the tree (the
421: .Va parent ,
422: .Va child ,
423: .Va next
424: and
425: .Va prev
426: fields) and some type-specific data.
427: .Pp
428: The tree itself is arranged according to the following normal form,
429: where capitalised non-terminals represent nodes.
430: .Pp
431: .Bl -tag -width "ELEMENTXX" -compact
432: .It ROOT
433: \(<- mnode+
434: .It mnode
435: \(<- ELEMENT | TEXT | BLOCK
436: .It BLOCK
437: \(<- HEAD BODY
438: .It HEAD
439: \(<- mnode*
440: .It BODY
441: \(<- mnode*
442: .It ELEMENT
443: \(<- ELEMENT | TEXT*
444: .It TEXT
1.11 kristaps 445: \(<- [[:ascii:]]*
1.1 kristaps 446: .El
447: .Pp
448: The only elements capable of nesting other elements are those with
449: next-line scope as documented in
450: .Xr man 7 .
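A depth-first walk over a tree of this shape can be sketched as follows.
The structure sketch_node is a hypothetical stand-in that carries only the linkage fields named above plus a string; it is not the definition of struct man_node, and it assumes that child points at the first child and next at the following sibling.

```c
#include <stddef.h>
#include <string.h>

enum sketch_type { SKETCH_ROOT, SKETCH_ELEM, SKETCH_TEXT };

/* Hypothetical stand-in for a syntax tree node. */
struct sketch_node {
	enum sketch_type	 type;
	const char		*string;	/* used by SKETCH_TEXT only */
	struct sketch_node	*parent;
	struct sketch_node	*child;		/* assumed: first child */
	struct sketch_node	*next;		/* assumed: next sibling */
	struct sketch_node	*prev;
};

/*
 * Visit n, its children and its following siblings in document
 * order, appending each text string to buf separated by spaces.
 * buf must hold a NUL-terminated string and bufsz must be non-zero;
 * output that does not fit is silently truncated.
 */
static void
append_text(const struct sketch_node *n, char *buf, size_t bufsz)
{
	size_t	 used;

	for ( ; n != NULL; n = n->next) {
		if (n->type == SKETCH_TEXT && n->string != NULL) {
			used = strlen(buf);
			if (used > 0 && used + 1 < bufsz) {
				buf[used++] = ' ';
				buf[used] = '\0';
			}
			strncat(buf, n->string, bufsz - used - 1);
		}
		append_text(n->child, buf, bufsz);
	}
}
```

With the real library, the starting node would come from man_node and the strings would still need the escape and cue handling described under Man and Mdoc Strings.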
451: .Ss Mdoc Abstract Syntax Tree
452: This AST is governed by the ontological
453: rules dictated in
454: .Xr mdoc 7
455: and derives its terminology accordingly.
456: .Qq In-line
457: elements described in
458: .Xr mdoc 7
459: are described simply as
460: .Qq elements .
461: .Pp
462: The AST is composed of
463: .Vt struct mdoc_node
464: nodes with block, head, body, element, root and text types as declared
465: by the
466: .Va type
467: field.
468: Each node also provides its parse point (the
469: .Va line ,
470: .Va sec ,
471: and
472: .Va pos
473: fields), its position in the tree (the
474: .Va parent ,
475: .Va child ,
476: .Va nchild ,
477: .Va next
478: and
479: .Va prev
480: fields) and some type-specific data, in particular, for nodes generated
481: from macros, the generating macro in the
482: .Va tok
483: field.
484: .Pp
485: The tree itself is arranged according to the following normal form,
486: where capitalised non-terminals represent nodes.
487: .Pp
488: .Bl -tag -width "ELEMENTXX" -compact
489: .It ROOT
490: \(<- mnode+
491: .It mnode
492: \(<- BLOCK | ELEMENT | TEXT
493: .It BLOCK
494: \(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
495: .It ELEMENT
496: \(<- TEXT*
497: .It HEAD
498: \(<- mnode*
499: .It BODY
500: \(<- mnode* [ENDBODY mnode*]
501: .It TAIL
502: \(<- mnode*
503: .It TEXT
1.11 kristaps 504: \(<- [[:ascii:]]*
1.1 kristaps 505: .El
506: .Pp
507: Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
508: the BLOCK production: these refer to punctuation marks.
509: Furthermore, although a TEXT node will generally have a non-zero-length
510: string, in the specific case of
511: .Sq \&.Bd \-literal ,
512: an empty line will produce a zero-length string.
513: Multiple body parts are only found in invocations of
514: .Sq \&.Bl \-column ,
515: where a new body introduces a new phrase.
516: .Pp
517: The
518: .Xr mdoc 7
1.5 kristaps 519: syntax tree accommodates broken block structures as well.
1.1 kristaps 520: The ENDBODY node is available to end the formatting associated
521: with a given block before the physical end of that block.
522: It has a non-null
523: .Va end
524: field, is of the BODY
525: .Va type ,
526: has the same
527: .Va tok
528: as the BLOCK it is ending, and has a
529: .Va pending
530: field pointing to that BLOCK's BODY node.
531: It is an indirect child of that BODY node
532: and has no children of its own.
533: .Pp
534: An ENDBODY node is generated when a block ends while one of its child
535: blocks is still open, as in the following example:
536: .Bd -literal -offset indent
537: \&.Ao ao
538: \&.Bo bo ac
539: \&.Ac bc
540: \&.Bc end
541: .Ed
542: .Pp
543: This example results in the following block structure:
544: .Bd -literal -offset indent
545: BLOCK Ao
546:     HEAD Ao
547:     BODY Ao
548:         TEXT ao
549:         BLOCK Bo, pending -> Ao
550:             HEAD Bo
551:             BODY Bo
552:                 TEXT bo
553:                 TEXT ac
554:                 ENDBODY Ao, pending -> Ao
555:                 TEXT bc
556: TEXT end
557: .Ed
558: .Pp
559: Here, the formatting of the
560: .Sq \&Ao
561: block extends from TEXT ao to TEXT ac,
562: while the formatting of the
563: .Sq \&Bo
564: block extends from TEXT bo to TEXT bc.
565: It renders as follows in
566: .Fl T Ns Cm ascii
567: mode:
568: .Pp
569: .Dl <ao [bo ac> bc] end
570: .Pp
571: Support for badly-nested blocks is only provided for backward
572: compatibility with some older
573: .Xr mdoc 7
574: implementations.
575: Using badly-nested blocks is
576: .Em strongly discouraged ;
577: for example, the
578: .Fl T Ns Cm html
579: and
580: .Fl T Ns Cm xhtml
581: front-ends to
582: .Xr mandoc 1
583: are unable to render them in any meaningful way.
584: Furthermore, behaviour when encountering badly-nested blocks is not
585: consistent across troff implementations, especially when using multiple
586: levels of badly-nested blocks.
587: .Sh SEE ALSO
588: .Xr mandoc 1 ,
589: .Xr eqn 7 ,
590: .Xr man 7 ,
1.6 kristaps 591: .Xr mandoc_char 7 ,
1.1 kristaps 592: .Xr mdoc 7 ,
593: .Xr roff 7 ,
594: .Xr tbl 7
595: .Sh AUTHORS
596: The
597: .Nm
598: library was written by
1.13 kristaps 599: .An Kristaps Dzonsons ,
600: .Mt kristaps@bsd.lv .