Annotation of mandoc/mandoc.3, Revision 1.43
1.43 ! schwarze 1: .\" $Id: mandoc.3,v 1.42 2018/08/23 19:33:27 schwarze Exp $
1.1 kristaps 2: .\"
3: .\" Copyright (c) 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
1.38 schwarze 4: .\" Copyright (c) 2010-2017 Ingo Schwarze <schwarze@openbsd.org>
1.1 kristaps 5: .\"
6: .\" Permission to use, copy, modify, and distribute this software for any
7: .\" purpose with or without fee is hereby granted, provided that the above
8: .\" copyright notice and this permission notice appear in all copies.
9: .\"
10: .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
11: .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
12: .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
13: .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
14: .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
15: .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
16: .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
17: .\"
1.43 ! schwarze 18: .Dd $Mdocdate: August 23 2018 $
1.1 kristaps 19: .Dt MANDOC 3
20: .Os
21: .Sh NAME
22: .Nm mandoc ,
1.37 schwarze 23: .Nm deroff ,
24: .Nm man_validate ,
25: .Nm mdoc_validate ,
1.1 kristaps 26: .Nm mparse_alloc ,
1.42 schwarze 27: .Nm mparse_copy ,
1.1 kristaps 28: .Nm mparse_free ,
1.26 schwarze 29: .Nm mparse_open ,
1.1 kristaps 30: .Nm mparse_readfd ,
31: .Nm mparse_reset ,
1.43 ! schwarze 32: .Nm mparse_result
1.1 kristaps 33: .Nd mandoc macro compiler library
34: .Sh SYNOPSIS
1.25 schwarze 35: .In sys/types.h
1.43 ! schwarze 36: .In stdio.h
1.1 kristaps 37: .In mandoc.h
1.31 schwarze 38: .Pp
1.24 schwarze 39: .Fd "#define ASCII_NBRSP"
40: .Fd "#define ASCII_HYPH"
41: .Fd "#define ASCII_BREAK"
1.25 schwarze 42: .Ft struct mparse *
1.1 kristaps 43: .Fo mparse_alloc
1.25 schwarze 44: .Fa "int options"
1.40 schwarze 45: .Fa "enum mandoc_os os_e"
46: .Fa "char *os_s"
1.1 kristaps 47: .Fc
48: .Ft void
49: .Fo mparse_free
50: .Fa "struct mparse *parse"
51: .Fc
1.42 schwarze 52: .Ft void
53: .Fo mparse_copy
1.14 kristaps 54: .Fa "const struct mparse *parse"
55: .Fc
1.35 schwarze 56: .Ft int
1.26 schwarze 57: .Fo mparse_open
58: .Fa "struct mparse *parse"
59: .Fa "const char *fname"
60: .Fc
1.43 ! schwarze 61: .Ft void
1.1 kristaps 62: .Fo mparse_readfd
63: .Fa "struct mparse *parse"
64: .Fa "int fd"
65: .Fa "const char *fname"
66: .Fc
67: .Ft void
68: .Fo mparse_reset
69: .Fa "struct mparse *parse"
70: .Fc
71: .Ft void
72: .Fo mparse_result
73: .Fa "struct mparse *parse"
1.37 schwarze 74: .Fa "struct roff_man **man"
1.25 schwarze 75: .Fa "char **sodest"
1.2 kristaps 76: .Fc
1.37 schwarze 77: .In roff.h
78: .Ft void
79: .Fo deroff
80: .Fa "char **dest"
81: .Fa "const struct roff_node *node"
82: .Fc
1.25 schwarze 83: .In sys/types.h
1.24 schwarze 84: .In mandoc.h
85: .In mdoc.h
1.37 schwarze 86: .Vt extern const char * const * mdoc_argnames;
87: .Vt extern const char * const * mdoc_macronames;
1.25 schwarze 88: .Ft void
1.37 schwarze 89: .Fo mdoc_validate
90: .Fa "struct roff_man *mdoc"
1.24 schwarze 91: .Fc
1.25 schwarze 92: .In sys/types.h
1.24 schwarze 93: .In mandoc.h
94: .In man.h
1.37 schwarze 95: .Vt extern const char * const * man_macronames;
96: .Ft void
97: .Fo man_validate
98: .Fa "struct roff_man *man"
1.24 schwarze 99: .Fc
1.1 kristaps 100: .Sh DESCRIPTION
101: The
102: .Nm mandoc
103: library parses a
104: .Ux
105: manual into an abstract syntax tree (AST).
106: .Ux
107: manuals are composed of
108: .Xr mdoc 7
109: or
110: .Xr man 7 ,
111: and may be mixed with
112: .Xr roff 7 ,
113: .Xr tbl 7 ,
114: and
115: .Xr eqn 7
116: invocations.
117: .Pp
118: The following describes a general parse sequence:
119: .Bl -enum
120: .It
121: initiate a parsing sequence with
1.27 schwarze 122: .Xr mchars_alloc 3
123: and
1.1 kristaps 124: .Fn mparse_alloc ;
125: .It
1.31 schwarze 126: open a file with
127: .Xr open 2
128: or
129: .Fn mparse_open ;
130: .It
131: parse it with
1.1 kristaps 132: .Fn mparse_readfd ;
133: .It
1.34 schwarze 134: close it with
135: .Xr close 2 ;
136: .It
1.31 schwarze 137: retrieve the syntax tree with
1.1 kristaps 138: .Fn mparse_result ;
139: .It
1.37 schwarze 140: depending on whether the
141: .Fa macroset
142: member of the returned
143: .Vt struct roff_man
144: is
145: .Dv MACROSET_MDOC
146: or
147: .Dv MACROSET_MAN ,
148: validate it with
149: .Fn mdoc_validate
1.1 kristaps 150: or
1.37 schwarze 151: .Fn man_validate ,
152: respectively;
153: .It
1.38 schwarze 154: if information about the validity of the input is needed, fetch it with
155: .Fn mparse_updaterc ;
156: .It
1.37 schwarze 157: iterate over parse nodes, starting from the
158: .Fa first
159: member of the returned
160: .Vt struct roff_man ;
1.1 kristaps 161: .It
162: free all allocated memory with
1.27 schwarze 163: .Fn mparse_free
164: and
165: .Xr mchars_free 3 ,
1.1 kristaps 166: or invoke
167: .Fn mparse_reset
1.37 schwarze 168: and go back to step 2 to parse new files.
1.3 kristaps 169: .El
170: .Sh REFERENCE
171: This section documents the functions, types, and variables available
172: via
1.25 schwarze 173: .In mandoc.h ,
174: with the exception of those documented in
175: .Xr mandoc_escape 3
176: and
177: .Xr mchars_alloc 3 .
1.3 kristaps 178: .Ss Types
179: .Bl -ohang
180: .It Vt "enum mandocerr"
1.31 schwarze 181: An error or warning message during parsing.
1.3 kristaps 182: .It Vt "enum mandoclevel"
1.11 kristaps 183: A classification of an
1.23 schwarze 184: .Vt "enum mandocerr"
1.11 kristaps 185: as regards system operation.
1.37 schwarze 186: See the DIAGNOSTICS section in
187: .Xr mandoc 1
188: regarding the meanings of the levels.
1.3 kristaps 189: .It Vt "struct mparse"
1.11 kristaps 190: An opaque pointer to a running parse sequence.
191: Created with
192: .Fn mparse_alloc
193: and freed with
194: .Fn mparse_free .
195: This may be used across parsed input if
196: .Fn mparse_reset
197: is called between parses.
1.3 kristaps 198: .El
199: .Ss Functions
200: .Bl -ohang
1.37 schwarze 201: .It Fn deroff
1.25 schwarze 202: Obtain a text-only representation of a
1.37 schwarze 203: .Vt struct roff_node ,
1.25 schwarze 204: including text contained in its child nodes.
1.37 schwarze 205: To be used on children of the
206: .Fa first
207: member of
208: .Vt struct roff_man .
1.25 schwarze 209: When it is no longer needed, the pointer returned from
1.37 schwarze 210: .Fn deroff
1.25 schwarze 211: can be passed to
212: .Xr free 3 .
1.37 schwarze 213: .It Fn man_validate
214: Validate the
215: .Dv MACROSET_MAN
216: parse tree obtained with
1.4 kristaps 217: .Fn mparse_result .
1.18 schwarze 218: Declared in
219: .In man.h ,
220: implemented in
221: .Pa man.c .
1.37 schwarze 222: .It Fn mdoc_validate
223: Validate the
224: .Dv MACROSET_MDOC
225: parse tree obtained with
1.4 kristaps 226: .Fn mparse_result .
1.18 schwarze 227: Declared in
228: .In mdoc.h ,
229: implemented in
230: .Pa mdoc.c .
1.3 kristaps 231: .It Fn mparse_alloc
1.4 kristaps 232: Allocate a parser.
1.23 schwarze 233: The arguments have the following effect:
234: .Bl -tag -offset 5n -width inttype
1.25 schwarze 235: .It Ar options
236: When the
1.23 schwarze 237: .Dv MPARSE_MDOC
238: or
1.25 schwarze 239: .Dv MPARSE_MAN
240: bit is set, only that parser is used.
241: Otherwise, the document type is automatically detected.
242: .Pp
243: When the
244: .Dv MPARSE_SO
245: bit is set,
246: .Xr roff 7
247: .Ic \&so
248: file inclusion requests are always honoured.
249: Otherwise, if the request is the only content in an input file,
250: only the file name is remembered, to be returned in the
251: .Fa sodest
252: argument of
253: .Fn mparse_result .
254: .Pp
255: When the
256: .Dv MPARSE_QUICK
257: bit is set, parsing is aborted after the NAME section.
258: This is useful, for example, in
259: .Xr makewhatis 8
260: .Fl Q
261: to quickly build minimal databases.
1.40 schwarze 262: .It Ar os_e
263: Operating system to check base system conventions for.
264: If
265: .Dv MANDOC_OS_OTHER ,
266: the system is automatically detected from
267: .Ic \&Os ,
268: .Fl Ios ,
269: or
270: .Xr uname 3 .
271: .It Ar os_s
1.23 schwarze 272: A default string for the
273: .Xr mdoc 7
1.40 schwarze 274: .Ic \&Os
1.23 schwarze 275: macro, overriding the
276: .Dv OSNAME
277: preprocessor definition and the results of
278: .Xr uname 3 .
1.37 schwarze 279: Passing
280: .Dv NULL
281: sets no default.
1.23 schwarze 282: .El
283: .Pp
1.4 kristaps 284: The same parser may be used for multiple files so long as
285: .Fn mparse_reset
286: is called between parses.
287: .Fn mparse_free
288: must be called to free the memory allocated by this function.
1.18 schwarze 289: Declared in
290: .In mandoc.h ,
291: implemented in
292: .Pa read.c .
1.3 kristaps 293: .It Fn mparse_free
1.4 kristaps 294: Free all memory allocated by
295: .Fn mparse_alloc .
1.18 schwarze 296: Declared in
297: .In mandoc.h ,
298: implemented in
299: .Pa read.c .
1.42 schwarze 300: .It Fn mparse_copy
301: Dump a copy of the input to the standard output; used for
302: .Fl man T Ns Cm man .
1.18 schwarze 303: Declared in
304: .In mandoc.h ,
305: implemented in
306: .Pa read.c .
1.26 schwarze 307: .It Fn mparse_open
1.32 schwarze 308: Open the file for reading.
309: If that fails and
1.26 schwarze 310: .Fa fname
1.32 schwarze 311: does not already end in
312: .Ql .gz ,
313: try again after appending
314: .Ql .gz .
315: Remember whether the file is gzip-compressed.
1.35 schwarze 316: Return a file descriptor open for reading or -1 on failure.
1.26 schwarze 317: It can be passed to
318: .Fn mparse_readfd
319: or used directly.
320: Declared in
321: .In mandoc.h ,
322: implemented in
323: .Pa read.c .
1.3 kristaps 324: .It Fn mparse_readfd
1.30 schwarze 325: Parse a file descriptor opened with
326: .Xr open 2
327: or
1.29 schwarze 328: .Fn mparse_open .
1.30 schwarze 329: Pass the associated filename in
330: .Va fname .
1.29 schwarze 331: This function may be called multiple times with different parameters; however,
1.34 schwarze 332: .Xr close 2
333: and
1.4 kristaps 334: .Fn mparse_reset
335: should be invoked between parses.
1.18 schwarze 336: Declared in
337: .In mandoc.h ,
338: implemented in
339: .Pa read.c .
1.3 kristaps 340: .It Fn mparse_reset
1.4 kristaps 341: Reset a parser so that
342: .Fn mparse_readfd
343: may be used again.
1.18 schwarze 344: Declared in
345: .In mandoc.h ,
346: implemented in
347: .Pa read.c .
1.3 kristaps 348: .It Fn mparse_result
1.4 kristaps 349: Obtain the result of a parse.
1.37 schwarze 350: One of the two pointers will be filled in.
1.18 schwarze 351: Declared in
352: .In mandoc.h ,
353: implemented in
354: .Pa read.c .
1.3 kristaps 355: .El
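The fallback described under mparse_open above (try the name as given, then retry with ".gz" appended) can be sketched roughly as follows. This is an illustrative stand-in built on open(2), not the library's implementation; the function name open_gz_fallback, the gz output parameter, and the fixed-size buffer are assumptions made for this example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of mandoc.
 * Open fname read-only; if that fails and fname does not already
 * end in ".gz", retry with ".gz" appended.  *gz is set to 1 when
 * the file that was opened has a ".gz" name, else 0.
 * Returns a file descriptor, or -1 on failure.
 */
static int
open_gz_fallback(const char *fname, int *gz)
{
	char	 buf[1024];
	size_t	 len = strlen(fname);
	int	 fd;

	*gz = len > 3 && strcmp(fname + len - 3, ".gz") == 0;
	if ((fd = open(fname, O_RDONLY)) != -1 || *gz)
		return fd;
	if (snprintf(buf, sizeof(buf), "%s.gz", fname) >= (int)sizeof(buf))
		return -1;	/* name too long for this sketch */
	if ((fd = open(buf, O_RDONLY)) != -1)
		*gz = 1;
	return fd;
}
```

A caller would pass the returned descriptor to a reader that decompresses when gz is set, and close it with close(2) afterwards.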
356: .Ss Variables
357: .Bl -ohang
358: .It Va man_macronames
1.37 schwarze 359: The string representation of a
360: .Xr man 7
361: macro as indexed by
1.4 kristaps 362: .Vt "enum mant" .
1.3 kristaps 363: .It Va mdoc_argnames
1.37 schwarze 364: The string representation of an
365: .Xr mdoc 7
366: macro argument as indexed by
1.4 kristaps 367: .Vt "enum mdocargt" .
1.3 kristaps 368: .It Va mdoc_macronames
1.37 schwarze 369: The string representation of an
370: .Xr mdoc 7
371: macro as indexed by
1.4 kristaps 372: .Vt "enum mdoct" .
1.1 kristaps 373: .El
374: .Sh IMPLEMENTATION NOTES
375: This section consists of structural documentation for
376: .Xr mdoc 7
377: and
378: .Xr man 7
1.11 kristaps 379: syntax trees and strings.
380: .Ss Man and Mdoc Strings
381: Strings may be extracted from mdoc and man meta-data, or from text
382: nodes (MDOC_TEXT and MAN_TEXT, respectively).
383: These strings have special non-printing formatting cues embedded in the
384: text itself, as well as
385: .Xr roff 7
386: escapes preserved from input.
387: Implementing systems will need to handle both situations to produce
388: human-readable text.
389: In general, strings may be assumed to consist of 7-bit ASCII characters.
390: .Pp
391: The following non-printing characters may be embedded in text strings:
392: .Bl -tag -width Ds
393: .It Dv ASCII_NBRSP
394: A non-breaking space character.
395: .It Dv ASCII_HYPH
396: A soft hyphen.
1.25 schwarze 397: .It Dv ASCII_BREAK
398: A breakable zero-width space.
1.11 kristaps 399: .El
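As a rough illustration of handling these cues, the following sketch copies a string while replacing them with plain ASCII. The numeric values in the fallback definitions are assumptions for builds without mandoc.h; the function name plain_text and the choice to render ASCII_HYPH as a literal hyphen are likewise choices of this example, not something the library prescribes.

```c
#include <stddef.h>

/* Assumed fallback values; the real definitions come from <mandoc.h>. */
#ifndef ASCII_NBRSP
#define ASCII_NBRSP 31	/* non-breaking space */
#define ASCII_HYPH  30	/* soft hyphen */
#define ASCII_BREAK 29	/* breakable zero-width space */
#endif

/*
 * Hypothetical helper, not part of mandoc.
 * Copy src to dst (capacity dstsz, including the terminating NUL),
 * replacing formatting cues with plain ASCII.
 * Returns the number of bytes written, excluding the NUL.
 */
static size_t
plain_text(char *dst, size_t dstsz, const char *src)
{
	size_t n = 0;

	if (dstsz == 0)
		return 0;
	for (; *src != '\0' && n + 1 < dstsz; src++) {
		switch (*src) {
		case ASCII_NBRSP:
			dst[n++] = ' ';
			break;
		case ASCII_HYPH:
			dst[n++] = '-';
			break;
		case ASCII_BREAK:
			break;	/* zero width: emit nothing */
		default:
			dst[n++] = *src;
			break;
		}
	}
	dst[n] = '\0';
	return n;
}
```

A real formatter would instead use these cues to decide where lines may or may not be broken; the sketch only shows that they must not be written to the output verbatim.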
400: .Pp
401: Escape characters are also passed verbatim into text strings.
402: An escape character is a sequence of characters beginning with the
403: backslash
404: .Pq Sq \e .
405: To construct human-readable text, these should be intercepted with
1.25 schwarze 406: .Xr mandoc_escape 3
407: and converted with one of the functions described in
408: .Xr mchars_alloc 3 .
1.1 kristaps 409: .Ss Man Abstract Syntax Tree
410: This AST is governed by the ontological rules dictated in
411: .Xr man 7
412: and derives its terminology accordingly.
413: .Pp
414: The AST is composed of
1.37 schwarze 415: .Vt struct roff_node
1.1 kristaps 416: nodes with element, root and text types as declared by the
417: .Va type
418: field.
419: Each node also provides its parse point (the
420: .Va line ,
1.37 schwarze 421: .Va pos ,
1.1 kristaps 422: and
1.37 schwarze 423: .Va sec
1.1 kristaps 424: fields), its position in the tree (the
425: .Va parent ,
426: .Va child ,
427: .Va next
428: and
429: .Va prev
430: fields) and some type-specific data.
431: .Pp
432: The tree itself is arranged according to the following normal form,
433: where capitalised non-terminals represent nodes.
434: .Pp
435: .Bl -tag -width "ELEMENTXX" -compact
436: .It ROOT
437: \(<- mnode+
438: .It mnode
439: \(<- ELEMENT | TEXT | BLOCK
440: .It BLOCK
441: \(<- HEAD BODY
442: .It HEAD
443: \(<- mnode*
444: .It BODY
445: \(<- mnode*
446: .It ELEMENT
447: \(<- ELEMENT | TEXT*
448: .It TEXT
1.11 kristaps 449: \(<- [[:ascii:]]*
1.1 kristaps 450: .El
451: .Pp
452: The only elements capable of nesting other elements are those with
1.25 schwarze 453: next-line scope as documented in
1.1 kristaps 454: .Xr man 7 .
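Walking such a tree amounts to a depth-first traversal over the child and next links. The sketch below uses a hypothetical stand-in structure that carries only the fields needed for the walk; it is not the definition of struct roff_node, and the enum values and the function name walk are invented for this example.

```c
#include <stdio.h>

/* Hypothetical stand-in for struct roff_node: only the fields used here. */
enum node_type {
	NODE_ROOT, NODE_BLOCK, NODE_HEAD, NODE_BODY, NODE_ELEM, NODE_TEXT
};

struct node {
	enum node_type	 type;
	const char	*string;	/* text content, TEXT nodes only */
	struct node	*child;		/* first child */
	struct node	*next;		/* next sibling */
};

/*
 * Depth-first walk over n and its following siblings.
 * Prints each TEXT node indented by its depth and
 * returns the number of TEXT nodes visited.
 */
static int
walk(const struct node *n, int depth, FILE *out)
{
	int count = 0;

	for (; n != NULL; n = n->next) {
		if (n->type == NODE_TEXT) {
			fprintf(out, "%*s%s\n", depth * 2, "", n->string);
			count++;
		}
		count += walk(n->child, depth + 1, out);
	}
	return count;
}
```

With the real library, the starting point would be the first member of the struct roff_man obtained from the parser.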
455: .Ss Mdoc Abstract Syntax Tree
456: This AST is governed by the ontological
457: rules dictated in
458: .Xr mdoc 7
459: and derives its terminology accordingly.
460: .Qq In-line
461: elements described in
462: .Xr mdoc 7
463: are described simply as
464: .Qq elements .
465: .Pp
466: The AST is composed of
1.37 schwarze 467: .Vt struct roff_node
1.1 kristaps 468: nodes with block, head, body, element, root and text types as declared
469: by the
470: .Va type
471: field.
472: Each node also provides its parse point (the
473: .Va line ,
1.37 schwarze 474: .Va pos ,
1.1 kristaps 475: and
1.37 schwarze 476: .Va sec
1.1 kristaps 477: fields), its position in the tree (the
478: .Va parent ,
479: .Va child ,
1.36 schwarze 480: .Va last ,
1.1 kristaps 481: .Va next
482: and
483: .Va prev
484: fields) and some type-specific data, in particular, for nodes generated
485: from macros, the generating macro in the
486: .Va tok
487: field.
488: .Pp
489: The tree itself is arranged according to the following normal form,
490: where capitalised non-terminals represent nodes.
491: .Pp
492: .Bl -tag -width "ELEMENTXX" -compact
493: .It ROOT
494: \(<- mnode+
495: .It mnode
496: \(<- BLOCK | ELEMENT | TEXT
497: .It BLOCK
498: \(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
499: .It ELEMENT
500: \(<- TEXT*
501: .It HEAD
502: \(<- mnode*
503: .It BODY
504: \(<- mnode* [ENDBODY mnode*]
505: .It TAIL
506: \(<- mnode*
507: .It TEXT
1.11 kristaps 508: \(<- [[:ascii:]]*
1.1 kristaps 509: .El
510: .Pp
511: Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
512: the BLOCK production: these refer to punctuation marks.
513: Furthermore, although a TEXT node will generally have a non-zero-length
514: string, in the specific case of
515: .Sq \&.Bd \-literal ,
516: an empty line will produce a zero-length string.
517: Multiple body parts are only found in invocations of
518: .Sq \&.Bl \-column ,
519: where a new body introduces a new phrase.
520: .Pp
521: The
522: .Xr mdoc 7
1.5 kristaps 523: syntax tree accommodates broken block structures as well.
1.1 kristaps 524: The ENDBODY node is available to end the formatting associated
525: with a given block before the physical end of that block.
526: It has a non-null
527: .Va end
528: field, is of the BODY
529: .Va type ,
530: has the same
531: .Va tok
532: as the BLOCK it is ending, and has a
533: .Va pending
534: field pointing to that BLOCK's BODY node.
535: It is an indirect child of that BODY node
536: and has no children of its own.
537: .Pp
538: An ENDBODY node is generated when a block ends while one of its child
539: blocks is still open, as in the following example:
540: .Bd -literal -offset indent
541: \&.Ao ao
542: \&.Bo bo ac
543: \&.Ac bc
544: \&.Bc end
545: .Ed
546: .Pp
547: This example results in the following block structure:
548: .Bd -literal -offset indent
549: BLOCK Ao
550:     HEAD Ao
551:     BODY Ao
552:         TEXT ao
553:         BLOCK Bo, pending -> Ao
554:             HEAD Bo
555:             BODY Bo
556:                 TEXT bo
557:                 TEXT ac
558:                 ENDBODY Ao, pending -> Ao
559:                 TEXT bc
560: TEXT end
561: .Ed
562: .Pp
563: Here, the formatting of the
1.40 schwarze 564: .Ic \&Ao
1.1 kristaps 565: block extends from TEXT ao to TEXT ac,
566: while the formatting of the
1.40 schwarze 567: .Ic \&Bo
1.1 kristaps 568: block extends from TEXT bo to TEXT bc.
569: It renders as follows in
570: .Fl T Ns Cm ascii
571: mode:
572: .Pp
573: .Dl <ao [bo ac> bc] end
574: .Pp
575: Support for badly-nested blocks is only provided for backward
576: compatibility with some older
577: .Xr mdoc 7
578: implementations.
579: Using badly-nested blocks is
580: .Em strongly discouraged ;
581: for example, the
582: .Fl T Ns Cm html
1.39 schwarze 583: front-end to
1.1 kristaps 584: .Xr mandoc 1
1.39 schwarze 585: is unable to render them in any meaningful way.
1.1 kristaps 586: Furthermore, behaviour when encountering badly-nested blocks is not
1.25 schwarze 587: consistent across troff implementations, especially when using multiple
1.1 kristaps 588: levels of badly-nested blocks.
589: .Sh SEE ALSO
590: .Xr mandoc 1 ,
1.37 schwarze 591: .Xr man.cgi 3 ,
1.25 schwarze 592: .Xr mandoc_escape 3 ,
1.37 schwarze 593: .Xr mandoc_headers 3 ,
1.25 schwarze 594: .Xr mandoc_malloc 3 ,
1.37 schwarze 595: .Xr mansearch 3 ,
1.25 schwarze 596: .Xr mchars_alloc 3 ,
1.37 schwarze 597: .Xr tbl 3 ,
1.1 kristaps 598: .Xr eqn 7 ,
599: .Xr man 7 ,
1.6 kristaps 600: .Xr mandoc_char 7 ,
1.1 kristaps 601: .Xr mdoc 7 ,
602: .Xr roff 7 ,
603: .Xr tbl 7
604: .Sh AUTHORS
1.37 schwarze 605: .An -nosplit
1.1 kristaps 606: The
607: .Nm
608: library was written by
1.37 schwarze 609: .An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
610: and is maintained by
611: .An Ingo Schwarze Aq Mt schwarze@openbsd.org .