Annotation of mandoc/mandoc.3, Revision 1.46
1.46 ! schwarze 1: .\" $Id: mandoc.3,v 1.45 2025/02/25 16:17:09 schwarze Exp $
1.1 kristaps 2: .\"
3: .\" Copyright (c) 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
1.38 schwarze 4: .\" Copyright (c) 2010-2017 Ingo Schwarze <schwarze@openbsd.org>
1.1 kristaps 5: .\"
6: .\" Permission to use, copy, modify, and distribute this software for any
7: .\" purpose with or without fee is hereby granted, provided that the above
8: .\" copyright notice and this permission notice appear in all copies.
9: .\"
10: .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
11: .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
12: .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
13: .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
14: .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
15: .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
16: .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
17: .\"
1.46 ! schwarze 18: .Dd $Mdocdate: February 25 2025 $
1.1 kristaps 19: .Dt MANDOC 3
20: .Os
21: .Sh NAME
22: .Nm mandoc ,
1.37 schwarze 23: .Nm deroff ,
1.1 kristaps 24: .Nm mparse_alloc ,
1.42 schwarze 25: .Nm mparse_copy ,
1.1 kristaps 26: .Nm mparse_free ,
1.26 schwarze 27: .Nm mparse_open ,
1.1 kristaps 28: .Nm mparse_readfd ,
29: .Nm mparse_reset ,
1.43 schwarze 30: .Nm mparse_result
1.1 kristaps 31: .Nd mandoc macro compiler library
32: .Sh SYNOPSIS
1.25 schwarze 33: .In sys/types.h
1.43 schwarze 34: .In stdio.h
1.1 kristaps 35: .In mandoc.h
1.45 schwarze 36: .In roff.h
37: .In mandoc_parse.h
1.31 schwarze 38: .Pp
1.24 schwarze 39: .Fd "#define ASCII_NBRSP"
40: .Fd "#define ASCII_HYPH"
41: .Fd "#define ASCII_BREAK"
1.25 schwarze 42: .Ft struct mparse *
1.1 kristaps 43: .Fo mparse_alloc
1.25 schwarze 44: .Fa "int options"
1.40 schwarze 45: .Fa "enum mandoc_os os_e"
46: .Fa "char *os_s"
1.1 kristaps 47: .Fc
48: .Ft void
49: .Fo mparse_free
50: .Fa "struct mparse *parse"
51: .Fc
1.42 schwarze 52: .Ft void
53: .Fo mparse_copy
1.14 kristaps 54: .Fa "const struct mparse *parse"
55: .Fc
1.35 schwarze 56: .Ft int
1.26 schwarze 57: .Fo mparse_open
58: .Fa "struct mparse *parse"
59: .Fa "const char *fname"
60: .Fc
1.43 schwarze 61: .Ft void
1.1 kristaps 62: .Fo mparse_readfd
63: .Fa "struct mparse *parse"
64: .Fa "int fd"
65: .Fa "const char *fname"
66: .Fc
67: .Ft void
68: .Fo mparse_reset
69: .Fa "struct mparse *parse"
70: .Fc
1.44 schwarze 71: .Ft struct roff_meta *
1.1 kristaps 72: .Fo mparse_result
73: .Fa "struct mparse *parse"
1.2 kristaps 74: .Fc
1.37 schwarze 75: .In roff.h
76: .Ft void
77: .Fo deroff
78: .Fa "char **dest"
79: .Fa "const struct roff_node *node"
80: .Fc
1.25 schwarze 81: .In sys/types.h
1.24 schwarze 82: .In mandoc.h
83: .In mdoc.h
1.37 schwarze 84: .Vt extern const char * const * mdoc_argnames;
85: .Vt extern const char * const * mdoc_macronames;
1.25 schwarze 86: .In sys/types.h
1.24 schwarze 87: .In mandoc.h
88: .In man.h
1.37 schwarze 89: .Vt extern const char * const * man_macronames;
1.1 kristaps 90: .Sh DESCRIPTION
91: The
92: .Nm mandoc
93: library parses a
94: .Ux
95: manual into an abstract syntax tree (AST).
96: .Ux
97: manuals are composed of
98: .Xr mdoc 7
99: or
100: .Xr man 7 ,
101: and may be mixed with
102: .Xr roff 7 ,
103: .Xr tbl 7 ,
104: and
105: .Xr eqn 7
106: invocations.
107: .Pp
108: The following describes a general parse sequence:
109: .Bl -enum
110: .It
111: initiate a parsing sequence with
1.27 schwarze 112: .Xr mchars_alloc 3
113: and
1.1 kristaps 114: .Fn mparse_alloc ;
115: .It
1.31 schwarze 116: open a file with
117: .Xr open 2
118: or
119: .Fn mparse_open ;
120: .It
121: parse it with
1.1 kristaps 122: .Fn mparse_readfd ;
123: .It
1.34 schwarze 124: close it with
125: .Xr close 2 ;
126: .It
1.31 schwarze 127: retrieve the syntax tree with
1.1 kristaps 128: .Fn mparse_result ;
129: .It
1.38 schwarze 130: if information about the validity of the input is needed, fetch it with
131: .Fn mparse_updaterc ;
132: .It
1.37 schwarze 133: iterate over parse nodes starting from the
134: .Fa first
135: member of the returned
1.44 schwarze 136: .Vt struct roff_meta ;
1.1 kristaps 137: .It
138: free all allocated memory with
1.27 schwarze 139: .Fn mparse_free
140: and
141: .Xr mchars_free 3 ,
1.1 kristaps 142: or invoke
143: .Fn mparse_reset
1.37 schwarze 144: and go back to step 2 to parse new files.
1.3 kristaps 145: .El
1.46 ! schwarze 146: .Pp
! 147: The design goals of the
! 148: .Nm mandoc
! 149: library are limited to providing the functionality required by the
! 150: .Xr mandoc 1
! 151: program.
! 152: Consequently, the functions documented in the present manual page
! 153: do not aim for API stability.
! 154: Any third-party program using them typically requires adjustments after every
! 155: .Nm mandoc
! 156: release.
! 157: Linking such a program requires
! 158: .Fl lz
! 159: because
! 160: .Fn mparse_readfd
! 161: calls
! 162: .Xr gzdopen 3 ,
! 163: .Xr gzread 3 ,
! 164: .Xr gzerror 3 ,
! 165: and
! 166: .Xr gzclose 3 .
! 167: For
! 168: .Xr mandoc 1
! 169: itself, the
! 170: .Pa ./configure
! 171: script automatically adds
! 172: .Fl lz
! 173: to the
! 174: .Ev LDADD
! 175: .Xr make 1
! 176: variable.
1.3 kristaps 177: .Sh REFERENCE
178: This section documents the functions, types, and variables available
179: via
1.25 schwarze 180: .In mandoc.h ,
181: with the exception of those documented in
182: .Xr mandoc_escape 3
183: and
184: .Xr mchars_alloc 3 .
1.3 kristaps 185: .Ss Types
186: .Bl -ohang
187: .It Vt "enum mandocerr"
1.31 schwarze 188: An error or warning message during parsing.
1.3 kristaps 189: .It Vt "enum mandoclevel"
1.11 kristaps 190: A classification of an
1.23 schwarze 191: .Vt "enum mandocerr"
1.11 kristaps 192: as regards system operation.
1.37 schwarze 193: See the DIAGNOSTICS section in
194: .Xr mandoc 1
195: regarding the meanings of the levels.
1.3 kristaps 196: .It Vt "struct mparse"
1.11 kristaps 197: An opaque pointer to a running parse sequence.
198: Created with
199: .Fn mparse_alloc
200: and freed with
201: .Fn mparse_free .
202: This may be used across parsed input if
203: .Fn mparse_reset
204: is called between parses.
1.3 kristaps 205: .El
206: .Ss Functions
207: .Bl -ohang
1.37 schwarze 208: .It Fn deroff
1.25 schwarze 209: Obtain a text-only representation of a
1.37 schwarze 210: .Vt struct roff_node ,
1.25 schwarze 211: including text contained in its child nodes.
1.37 schwarze 212: To be used on children of the
213: .Fa first
214: member of
1.44 schwarze 215: .Vt struct roff_meta .
1.25 schwarze 216: When it is no longer needed, the pointer returned from
1.37 schwarze 217: .Fn deroff
1.25 schwarze 218: can be passed to
219: .Xr free 3 .
1.3 kristaps 220: .It Fn mparse_alloc
1.4 kristaps 221: Allocate a parser.
1.23 schwarze 222: The arguments have the following effect:
223: .Bl -tag -offset 5n -width inttype
1.25 schwarze 224: .It Ar options
225: When the
1.23 schwarze 226: .Dv MPARSE_MDOC
227: or
1.25 schwarze 228: .Dv MPARSE_MAN
229: bit is set, only that parser is used.
230: Otherwise, the document type is automatically detected.
231: .Pp
232: When the
233: .Dv MPARSE_SO
234: bit is set,
235: .Xr roff 7
236: .Ic \&so
237: file inclusion requests are always honoured.
238: Otherwise, if the request is the only content in an input file,
239: only the file name is remembered, to be returned in the
240: .Fa sodest
1.44 schwarze 241: field of
242: .Vt struct roff_meta .
1.25 schwarze 243: .Pp
244: When the
245: .Dv MPARSE_QUICK
246: bit is set, parsing is aborted after the NAME section.
247: This is useful, for example, in
248: .Xr makewhatis 8
249: .Fl Q
250: to quickly build minimal databases.
1.44 schwarze 251: .Pp
252: When the
253: .Dv MPARSE_VALIDATE
254: bit is set,
255: .Fn mparse_result
256: runs the validation functions before returning the syntax tree.
257: This is almost always required, except in certain debugging scenarios,
258: for example to dump unvalidated syntax trees.
1.40 schwarze 259: .It Ar os_e
260: Operating system to check base system conventions for.
261: If
262: .Dv MANDOC_OS_OTHER ,
263: the system is automatically detected from
264: .Ic \&Os ,
265: .Fl Ios ,
266: or
267: .Xr uname 3 .
268: .It Ar os_s
1.23 schwarze 269: A default string for the
270: .Xr mdoc 7
1.40 schwarze 271: .Ic \&Os
1.23 schwarze 272: macro, overriding the
273: .Dv OSNAME
274: preprocessor definition and the results of
275: .Xr uname 3 .
1.37 schwarze 276: Passing
277: .Dv NULL
278: sets no default.
1.23 schwarze 279: .El
280: .Pp
1.4 kristaps 281: The same parser may be used for multiple files so long as
282: .Fn mparse_reset
283: is called between parses.
284: .Fn mparse_free
285: must be called to free the memory allocated by this function.
1.18 schwarze 286: Declared in
287: .In mandoc.h ,
288: implemented in
289: .Pa read.c .
1.3 kristaps 290: .It Fn mparse_free
1.4 kristaps 291: Free all memory allocated by
292: .Fn mparse_alloc .
1.18 schwarze 293: Declared in
294: .In mandoc.h ,
295: implemented in
296: .Pa read.c .
1.42 schwarze 297: .It Fn mparse_copy
298: Dump a copy of the input to the standard output; used for
299: .Fl man T Ns Cm man .
1.18 schwarze 300: Declared in
301: .In mandoc.h ,
302: implemented in
303: .Pa read.c .
1.26 schwarze 304: .It Fn mparse_open
1.32 schwarze 305: Open the file for reading.
306: If that fails and
1.26 schwarze 307: .Fa fname
1.32 schwarze 308: does not already end in
309: .Ql .gz ,
310: try again after appending
311: .Ql .gz .
312: Remember whether the file is gzip-compressed.
1.35 schwarze 313: Return a file descriptor open for reading or -1 on failure.
1.26 schwarze 314: It can be passed to
315: .Fn mparse_readfd
316: or used directly.
317: Declared in
318: .In mandoc.h ,
319: implemented in
320: .Pa read.c .
1.3 kristaps 321: .It Fn mparse_readfd
1.30 schwarze 322: Parse a file descriptor opened with
323: .Xr open 2
324: or
1.29 schwarze 325: .Fn mparse_open .
1.30 schwarze 326: Pass the associated filename in
327: .Va fname .
1.29 schwarze 328: This function may be called multiple times with different parameters; however,
1.34 schwarze 329: .Xr close 2
330: and
1.4 kristaps 331: .Fn mparse_reset
332: should be invoked between parses.
1.18 schwarze 333: Declared in
334: .In mandoc.h ,
335: implemented in
336: .Pa read.c .
1.3 kristaps 337: .It Fn mparse_reset
1.4 kristaps 338: Reset a parser so that
339: .Fn mparse_readfd
340: may be used again.
1.18 schwarze 341: Declared in
342: .In mandoc.h ,
343: implemented in
344: .Pa read.c .
1.3 kristaps 345: .It Fn mparse_result
1.4 kristaps 346: Obtain the result of a parse.
1.18 schwarze 347: Declared in
348: .In mandoc.h ,
349: implemented in
350: .Pa read.c .
1.3 kristaps 351: .El
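.Pp
The file name fallback described for
.Fn mparse_open
above can be sketched in plain C as follows.
This is only an illustration of the documented behaviour, not the code in
.Pa read.c ;
the function name
.Fn open_gz_fallback
and the fixed buffer size are assumptions made for this sketch.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for the fallback described for mparse_open();
 * it is not the library's implementation.  On return, *gz tells
 * whether the name that was finally tried ends in ".gz".
 */
int
open_gz_fallback(const char *fname, int *gz)
{
	char	 alt[1024];	/* arbitrary limit chosen for the sketch */
	size_t	 len;
	int	 fd, n;

	assert(fname != NULL && gz != NULL);
	len = strlen(fname);
	*gz = len >= 3 && strcmp(fname + len - 3, ".gz") == 0;

	/* First attempt: the name exactly as given. */
	if ((fd = open(fname, O_RDONLY)) != -1 || *gz)
		return fd;

	/* Second attempt: the same name with ".gz" appended. */
	n = snprintf(alt, sizeof(alt), "%s.gz", fname);
	if (n < 0 || (size_t)n >= sizeof(alt))
		return -1;
	if ((fd = open(alt, O_RDONLY)) != -1)
		*gz = 1;
	return fd;
}
```

A caller would check the result against -1 before reading from the
descriptor, and would eventually release it with
.Xr close 2 .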
352: .Ss Variables
353: .Bl -ohang
354: .It Va man_macronames
1.37 schwarze 355: The string representation of a
356: .Xr man 7
357: macro as indexed by
1.4 kristaps 358: .Vt "enum mant" .
1.3 kristaps 359: .It Va mdoc_argnames
1.37 schwarze 360: The string representation of an
361: .Xr mdoc 7
362: macro argument as indexed by
1.4 kristaps 363: .Vt "enum mdocargt" .
1.3 kristaps 364: .It Va mdoc_macronames
1.37 schwarze 365: The string representation of an
366: .Xr mdoc 7
367: macro as indexed by
1.4 kristaps 368: .Vt "enum mdoct" .
1.1 kristaps 369: .El
370: .Sh IMPLEMENTATION NOTES
371: This section consists of structural documentation for
372: .Xr mdoc 7
373: and
374: .Xr man 7
1.11 kristaps 375: syntax trees and strings.
376: .Ss Man and Mdoc Strings
377: Strings may be extracted from mdoc and man meta-data, or from text
378: nodes (MDOC_TEXT and MAN_TEXT, respectively).
379: These strings have special non-printing formatting cues embedded in the
380: text itself, as well as
381: .Xr roff 7
382: escapes preserved from input.
383: Implementing systems will need to handle both situations to produce
384: human-readable text.
385: In general, strings may be assumed to consist of 7-bit ASCII characters.
386: .Pp
387: The following non-printing characters may be embedded in text strings:
388: .Bl -tag -width Ds
389: .It Dv ASCII_NBRSP
390: A non-breaking space character.
391: .It Dv ASCII_HYPH
392: A soft hyphen.
1.25 schwarze 393: .It Dv ASCII_BREAK
394: A breakable zero-width space.
1.11 kristaps 395: .El
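.Pp
A minimal sketch of how a consumer might replace these three cues
with printable equivalents is shown below.
The numeric values are assumptions used only if the constants are not
already defined; real code should take them from
.In mandoc.h .
The helper name
.Fn plain_cues
is hypothetical, and
.Xr roff 7
escapes are deliberately left untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Assumed values for illustration; the authoritative definitions
 * are the ones in <mandoc.h>, which this sketch does not include.
 */
#ifndef ASCII_NBRSP
#define ASCII_NBRSP	31	/* non-breaking space */
#endif
#ifndef ASCII_HYPH
#define ASCII_HYPH	30	/* soft hyphen */
#endif
#ifndef ASCII_BREAK
#define ASCII_BREAK	29	/* breakable zero-width space */
#endif

/*
 * Hypothetical helper, not part of the library: rewrite the
 * formatting cues in s in place and return the new length.
 */
size_t
plain_cues(char *s)
{
	char	*src, *dst;

	assert(s != NULL);
	for (src = dst = s; *src != '\0'; src++) {
		switch (*src) {
		case ASCII_NBRSP:
			*dst++ = ' ';
			break;
		case ASCII_HYPH:
			*dst++ = '-';
			break;
		case ASCII_BREAK:
			break;	/* zero width: emit nothing */
		default:
			*dst++ = *src;
			break;
		}
	}
	*dst = '\0';
	return (size_t)(dst - s);
}
```

The string can only shrink, so the conversion is safe to do in place
on a private copy of the node text.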
396: .Pp
397: Escape characters are also passed verbatim into text strings.
398: An escape character is a sequence of characters beginning with the
399: backslash
400: .Pq Sq \e .
401: To construct human-readable text, these should be intercepted with
1.25 schwarze 402: .Xr mandoc_escape 3
403: and converted with one of the functions described in
404: .Xr mchars_alloc 3 .
1.1 kristaps 405: .Ss Man Abstract Syntax Tree
406: This AST is governed by the ontological rules dictated in
407: .Xr man 7
408: and derives its terminology accordingly.
409: .Pp
410: The AST is composed of
1.37 schwarze 411: .Vt struct roff_node
1.1 kristaps 412: nodes with element, root and text types as declared by the
413: .Va type
414: field.
415: Each node also provides its parse point (the
416: .Va line ,
1.37 schwarze 417: .Va pos ,
1.1 kristaps 418: and
1.37 schwarze 419: .Va sec
1.1 kristaps 420: fields), its position in the tree (the
421: .Va parent ,
422: .Va child ,
423: .Va next
424: and
425: .Va prev
426: fields) and some type-specific data.
427: .Pp
428: The tree itself is arranged according to the following normal form,
429: where capitalised non-terminals represent nodes.
430: .Pp
431: .Bl -tag -width "ELEMENTXX" -compact
432: .It ROOT
433: \(<- mnode+
434: .It mnode
435: \(<- ELEMENT | TEXT | BLOCK
436: .It BLOCK
437: \(<- HEAD BODY
438: .It HEAD
439: \(<- mnode*
440: .It BODY
441: \(<- mnode*
442: .It ELEMENT
443: \(<- ELEMENT | TEXT*
444: .It TEXT
1.11 kristaps 445: \(<- [[:ascii:]]*
1.1 kristaps 446: .El
447: .Pp
448: The only elements capable of nesting other elements are those with
1.25 schwarze 449: next-line scope as documented in
1.1 kristaps 450: .Xr man 7 .
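.Pp
The tree links described above lend themselves to a simple
depth-first walk.
The following sketch uses a simplified, hypothetical
.Vt struct demo_node
that mirrors only the
.Va type ,
.Va child ,
and
.Va next
fields plus a text string; it is not the real
.Vt struct roff_node ,
and the enum names are likewise invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for this sketch; the real ones live in roff.h. */
enum demo_type { DEMO_ROOT, DEMO_BLOCK, DEMO_HEAD, DEMO_BODY,
	DEMO_ELEM, DEMO_TEXT };

struct demo_node {
	struct demo_node *child;	/* first child, or NULL */
	struct demo_node *next;		/* next sibling, or NULL */
	const char	 *string;	/* text, for DEMO_TEXT only */
	enum demo_type	  type;
};

/*
 * Visit n, its later siblings and all their descendants in document
 * order, appending the text of TEXT nodes to buf, separated by single
 * spaces.  buf must hold a NUL-terminated string on entry; text that
 * does not fit into sz bytes is silently truncated.
 */
void
demo_collect(const struct demo_node *n, char *buf, size_t sz)
{
	assert(buf != NULL && sz > 0);
	for (; n != NULL; n = n->next) {
		if (n->type == DEMO_TEXT && n->string != NULL) {
			if (buf[0] != '\0')
				strncat(buf, " ", sz - strlen(buf) - 1);
			strncat(buf, n->string, sz - strlen(buf) - 1);
		}
		/* Children precede the next sibling in document order. */
		demo_collect(n->child, buf, sz);
	}
}
```

Recursion keeps the sketch short; a walk that must cope with very deep
trees could instead follow the
.Va parent
links back up after reaching a leaf.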
451: .Ss Mdoc Abstract Syntax Tree
452: This AST is governed by the ontological
453: rules dictated in
454: .Xr mdoc 7
455: and derives its terminology accordingly.
456: .Qq In-line
457: elements described in
458: .Xr mdoc 7
459: are described simply as
460: .Qq elements .
461: .Pp
462: The AST is composed of
1.37 schwarze 463: .Vt struct roff_node
1.1 kristaps 464: nodes with block, head, body, element, root and text types as declared
465: by the
466: .Va type
467: field.
468: Each node also provides its parse point (the
469: .Va line ,
1.37 schwarze 470: .Va pos ,
1.1 kristaps 471: and
1.37 schwarze 472: .Va sec
1.1 kristaps 473: fields), its position in the tree (the
474: .Va parent ,
475: .Va child ,
1.36 schwarze 476: .Va last ,
1.1 kristaps 477: .Va next
478: and
479: .Va prev
480: fields) and some type-specific data, in particular, for nodes generated
481: from macros, the generating macro in the
482: .Va tok
483: field.
484: .Pp
485: The tree itself is arranged according to the following normal form,
486: where capitalised non-terminals represent nodes.
487: .Pp
488: .Bl -tag -width "ELEMENTXX" -compact
489: .It ROOT
490: \(<- mnode+
491: .It mnode
492: \(<- BLOCK | ELEMENT | TEXT
493: .It BLOCK
494: \(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
495: .It ELEMENT
496: \(<- TEXT*
497: .It HEAD
498: \(<- mnode*
499: .It BODY
500: \(<- mnode* [ENDBODY mnode*]
501: .It TAIL
502: \(<- mnode*
503: .It TEXT
1.11 kristaps 504: \(<- [[:ascii:]]*
1.1 kristaps 505: .El
506: .Pp
507: Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
508: the BLOCK production: these refer to punctuation marks.
509: Furthermore, although a TEXT node will generally have a non-zero-length
510: string, in the specific case of
511: .Sq \&.Bd \-literal ,
512: an empty line will produce a zero-length string.
513: Multiple body parts are only found in invocations of
514: .Sq \&.Bl \-column ,
515: where a new body introduces a new phrase.
516: .Pp
517: The
518: .Xr mdoc 7
1.5 kristaps 519: syntax tree accommodates broken block structures as well.
1.1 kristaps 520: The ENDBODY node is available to end the formatting associated
521: with a given block before the physical end of that block.
522: It has a non-null
523: .Va end
524: field, is of the BODY
525: .Va type ,
526: has the same
527: .Va tok
528: as the BLOCK it is ending, and has a
529: .Va pending
530: field pointing to that BLOCK's BODY node.
531: It is an indirect child of that BODY node
532: and has no children of its own.
533: .Pp
534: An ENDBODY node is generated when a block ends while one of its child
535: blocks is still open, as in the following example:
536: .Bd -literal -offset indent
537: \&.Ao ao
538: \&.Bo bo ac
539: \&.Ac bc
540: \&.Bc end
541: .Ed
542: .Pp
543: This example results in the following block structure:
544: .Bd -literal -offset indent
545: BLOCK Ao
546: HEAD Ao
547: BODY Ao
548: TEXT ao
549: BLOCK Bo, pending -> Ao
550: HEAD Bo
551: BODY Bo
552: TEXT bo
553: TEXT ac
554: ENDBODY Ao, pending -> Ao
555: TEXT bc
556: TEXT end
557: .Ed
558: .Pp
559: Here, the formatting of the
1.40 schwarze 560: .Ic \&Ao
1.1 kristaps 561: block extends from TEXT ao to TEXT ac,
562: while the formatting of the
1.40 schwarze 563: .Ic \&Bo
1.1 kristaps 564: block extends from TEXT bo to TEXT bc.
565: It renders as follows in
566: .Fl T Ns Cm ascii
567: mode:
568: .Pp
569: .Dl <ao [bo ac> bc] end
570: .Pp
571: Support for badly-nested blocks is only provided for backward
572: compatibility with some older
573: .Xr mdoc 7
574: implementations.
575: Using badly-nested blocks is
576: .Em strongly discouraged ;
577: for example, the
578: .Fl T Ns Cm html
1.39 schwarze 579: front-end to
1.1 kristaps 580: .Xr mandoc 1
1.39 schwarze 581: is unable to render them in any meaningful way.
1.1 kristaps 582: Furthermore, behaviour when encountering badly-nested blocks is not
1.25 schwarze 583: consistent across troff implementations, especially when using multiple
1.1 kristaps 584: levels of badly-nested blocks.
585: .Sh SEE ALSO
586: .Xr mandoc 1 ,
1.37 schwarze 587: .Xr man.cgi 3 ,
1.25 schwarze 588: .Xr mandoc_escape 3 ,
1.37 schwarze 589: .Xr mandoc_headers 3 ,
1.25 schwarze 590: .Xr mandoc_malloc 3 ,
1.37 schwarze 591: .Xr mansearch 3 ,
1.25 schwarze 592: .Xr mchars_alloc 3 ,
1.37 schwarze 593: .Xr tbl 3 ,
1.1 kristaps 594: .Xr eqn 7 ,
595: .Xr man 7 ,
1.6 kristaps 596: .Xr mandoc_char 7 ,
1.1 kristaps 597: .Xr mdoc 7 ,
598: .Xr roff 7 ,
599: .Xr tbl 7
600: .Sh AUTHORS
1.37 schwarze 601: .An -nosplit
1.1 kristaps 602: The
603: .Nm
604: library was written by
1.37 schwarze 605: .An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
606: and is maintained by
607: .An Ingo Schwarze Aq Mt schwarze@openbsd.org .