Annotation of mandoc/mdoc.3, Revision 1.46
1.46 ! kristaps 1: .\" $Id: mdoc.3,v 1.45 2010/06/29 19:20:38 schwarze Exp $
1.6 kristaps 2: .\"
1.37 kristaps 3: .\" Copyright (c) 2009-2010 Kristaps Dzonsons <kristaps@bsd.lv>
1.6 kristaps 4: .\"
5: .\" Permission to use, copy, modify, and distribute this software for any
1.28 kristaps 6: .\" purpose with or without fee is hereby granted, provided that the above
7: .\" copyright notice and this permission notice appear in all copies.
1.6 kristaps 8: .\"
1.28 kristaps 9: .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
10: .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
11: .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
12: .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
13: .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
14: .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
15: .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
1.33 kristaps 16: .\"
1.46 ! kristaps 17: .Dd $Mdocdate: June 29 2010 $
1.27 kristaps 18: .Dt MDOC 3
1.1 kristaps 19: .Os
20: .Sh NAME
1.39 kristaps 21: .Nm mdoc ,
1.1 kristaps 22: .Nm mdoc_alloc ,
23: .Nm mdoc_endparse ,
1.38 kristaps 24: .Nm mdoc_free ,
25: .Nm mdoc_meta ,
1.4 kristaps 26: .Nm mdoc_node ,
1.38 kristaps 27: .Nm mdoc_parseln ,
1.20 kristaps 28: .Nm mdoc_reset
1.2 kristaps 29: .Nd mdoc macro compiler library
1.1 kristaps 30: .Sh SYNOPSIS
1.38 kristaps 31: .In mandoc.h
1.42 kristaps 32: .In regs.h
1.35 kristaps 33: .In mdoc.h
1.4 kristaps 34: .Vt extern const char * const * mdoc_macronames;
35: .Vt extern const char * const * mdoc_argnames;
1.1 kristaps 36: .Ft "struct mdoc *"
1.43 kristaps 37: .Fo mdoc_alloc
1.44 kristaps 38: .Fa "struct regset *regs"
1.43 kristaps 39: .Fa "void *data"
40: .Fa "int pflags"
41: .Fa "mandocmsg msgs"
42: .Fc
1.26 kristaps 43: .Ft int
1.38 kristaps 44: .Fn mdoc_endparse "struct mdoc *mdoc"
1.1 kristaps 45: .Ft void
1.2 kristaps 46: .Fn mdoc_free "struct mdoc *mdoc"
1.38 kristaps 47: .Ft "const struct mdoc_meta *"
48: .Fn mdoc_meta "const struct mdoc *mdoc"
49: .Ft "const struct mdoc_node *"
50: .Fn mdoc_node "const struct mdoc *mdoc"
1.1 kristaps 51: .Ft int
1.42 kristaps 52: .Fo mdoc_parseln
53: .Fa "struct mdoc *mdoc"
54: .Fa "int line"
55: .Fa "char *buf"
56: .Fc
1.1 kristaps 57: .Ft int
1.38 kristaps 58: .Fn mdoc_reset "struct mdoc *mdoc"
1.1 kristaps 59: .Sh DESCRIPTION
60: The
61: .Nm mdoc
1.33 kristaps 62: library parses lines of
1.17 kristaps 63: .Xr mdoc 7
1.38 kristaps 64: input
65: into an abstract syntax tree (AST).
1.6 kristaps 66: .Pp
1.1 kristaps 67: In general, applications initiate a parsing sequence with
68: .Fn mdoc_alloc ,
1.33 kristaps 69: parse each line in a document with
1.1 kristaps 70: .Fn mdoc_parseln ,
71: close the parsing session with
72: .Fn mdoc_endparse ,
73: operate over the syntax tree returned by
1.33 kristaps 74: .Fn mdoc_node
1.4 kristaps 75: and
76: .Fn mdoc_meta ,
1.1 kristaps 77: then free all allocated memory with
78: .Fn mdoc_free .
1.20 kristaps 79: The
80: .Fn mdoc_reset
81: function may be used in order to reset the parser for another input
1.38 kristaps 82: sequence.
83: See the
1.1 kristaps 84: .Sx EXAMPLES
1.38 kristaps 85: section for a simple example.
1.2 kristaps 86: .Pp
1.33 kristaps 87: This section further defines the
1.6 kristaps 88: .Sx Types ,
1.33 kristaps 89: .Sx Functions
1.6 kristaps 90: and
91: .Sx Variables
1.38 kristaps 92: available to programmers.
93: Following that, the
1.33 kristaps 94: .Sx Abstract Syntax Tree
1.17 kristaps 95: section documents the output tree.
1.6 kristaps 96: .Ss Types
97: Both functions (see
98: .Sx Functions )
99: and variables (see
100: .Sx Variables )
101: may use the following types:
1.37 kristaps 102: .Bl -ohang
1.6 kristaps 103: .It Vt struct mdoc
104: An opaque type defined in
105: .Pa mdoc.c .
106: Its values are only used privately within the library.
107: .It Vt struct mdoc_node
1.38 kristaps 108: A parsed node.
109: Defined in
1.6 kristaps 110: .Pa mdoc.h .
1.33 kristaps 111: See
1.6 kristaps 112: .Sx Abstract Syntax Tree
113: for details.
1.38 kristaps 114: .It Vt mandocmsg
115: A function callback type defined in
116: .Pa mandoc.h .
1.6 kristaps 117: .El
118: .Ss Functions
1.2 kristaps 119: Function descriptions follow:
1.37 kristaps 120: .Bl -ohang
1.2 kristaps 121: .It Fn mdoc_alloc
1.38 kristaps 122: Allocates a parsing structure.
123: The
1.2 kristaps 124: .Fa data
1.40 kristaps 125: pointer is passed to
126: .Fa msgs .
1.20 kristaps 127: The
128: .Fa pflags
129: arguments are defined in
130: .Pa mdoc.h .
1.38 kristaps 131: Returns NULL on failure.
132: If non-NULL, the pointer must be freed with
1.2 kristaps 133: .Fn mdoc_free .
1.20 kristaps 134: .It Fn mdoc_reset
1.38 kristaps 135: Reset the parser for another parse routine.
136: After its use,
1.20 kristaps 137: .Fn mdoc_parseln
1.38 kristaps 138: behaves as if invoked for the first time.
139: If it returns 0, memory could not be allocated.
1.2 kristaps 140: .It Fn mdoc_free
1.38 kristaps 141: Free all resources of a parser.
142: The pointer is no longer valid after invocation.
1.2 kristaps 143: .It Fn mdoc_parseln
1.38 kristaps 144: Parse a NUL-terminated line of input.
145: This line should not contain the trailing newline.
146: Returns 0 on failure, 1 on success.
147: The input buffer
1.2 kristaps 148: .Fa buf
149: is modified by this function.
150: .It Fn mdoc_endparse
1.38 kristaps 151: Signals that the parse is complete.
152: Note that if
1.2 kristaps 153: .Fn mdoc_endparse
154: is called subsequent to
1.4 kristaps 155: .Fn mdoc_node ,
1.38 kristaps 156: the resulting tree is incomplete.
157: Returns 0 on failure, 1 on success.
1.4 kristaps 158: .It Fn mdoc_node
1.38 kristaps 159: Returns the first node of the parse.
160: Note that if
1.2 kristaps 161: .Fn mdoc_parseln
162: or
163: .Fn mdoc_endparse
164: returns 0, the tree will be incomplete.
1.4 kristaps 165: .It Fn mdoc_meta
1.38 kristaps 166: Returns the document's parsed meta-data.
167: If this information has not yet been supplied or
1.4 kristaps 168: .Fn mdoc_parseln
169: or
170: .Fn mdoc_endparse
171: returns 0, the data will be incomplete.
172: .El
1.6 kristaps 173: .Ss Variables
1.4 kristaps 174: The following variables are also defined:
1.37 kristaps 175: .Bl -ohang
1.4 kristaps 176: .It Va mdoc_macronames
177: An array of string-ified token names.
178: .It Va mdoc_argnames
179: An array of string-ified token argument names.
1.2 kristaps 180: .El
1.6 kristaps 181: .Ss Abstract Syntax Tree
1.33 kristaps 182: The
1.6 kristaps 183: .Nm
1.17 kristaps 184: functions produce an abstract syntax tree (AST) describing input in a
1.38 kristaps 185: regular form.
186: It may be reviewed at any time with
1.6 kristaps 187: .Fn mdoc_node ;
188: however, if called before
189: .Fn mdoc_endparse ,
190: or after
1.33 kristaps 191: .Fn mdoc_endparse
1.6 kristaps 192: or
193: .Fn mdoc_parseln
1.33 kristaps 194: fails, it may be incomplete.
1.18 kristaps 195: .Pp
196: This AST is governed by the ontological
1.17 kristaps 197: rules dictated in
198: .Xr mdoc 7
1.33 kristaps 199: and derives its terminology accordingly.
1.17 kristaps 200: .Qq In-line
201: elements described in
202: .Xr mdoc 7
1.33 kristaps 203: are described simply as
1.17 kristaps 204: .Qq elements .
1.6 kristaps 205: .Pp
1.33 kristaps 206: The AST is composed of
1.6 kristaps 207: .Vt struct mdoc_node
208: nodes with block, head, body, element, root and text types as declared
209: by the
210: .Va type
1.38 kristaps 211: field.
212: Each node also provides its parse point (the
1.6 kristaps 213: .Va line ,
214: .Va sec ,
215: and
216: .Va pos
217: fields), its position in the tree (the
218: .Va parent ,
219: .Va child ,
1.45 schwarze 220: .Va nchild ,
1.33 kristaps 221: .Va next
1.6 kristaps 222: and
1.33 kristaps 223: .Va prev
1.45 schwarze 224: fields) and some type-specific data, in particular, for nodes generated
225: from macros, the generating macro in the
226: .Va tok
227: field.
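.Pp
The tree links described above are enough to visit every node.
The following is a minimal sketch only: it assumes a hypothetical stand-in type, struct node, that carries just the parent, child and next links, and it is not the real struct mdoc_node declaration from mdoc.h.
The helper add_child() is likewise hypothetical and exists only to build small trees by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mdoc_node: only the tree links. */
struct node {
	struct node	*parent;	/* enclosing node, NULL for the root */
	struct node	*child;		/* first child, NULL for a leaf */
	struct node	*next;		/* next sibling, NULL for the last */
};

/* Append c as the last child of p (hypothetical helper, not library API). */
static void
add_child(struct node *p, struct node *c)
{
	struct node	*last;

	assert(p != NULL && c != NULL);
	c->parent = p;
	c->next = NULL;
	if (p->child == NULL) {
		p->child = c;
		return;
	}
	for (last = p->child; last->next != NULL; last = last->next)
		continue;
	last->next = c;
}

/*
 * Depth-first count of n, its following siblings,
 * and all of their descendants.
 */
static size_t
count_nodes(const struct node *n)
{
	size_t	 total = 0;

	for ( ; n != NULL; n = n->next)
		total += 1 + count_nodes(n->child);
	return total;
}
```

A walk over a real tree would follow the same pattern, starting from the node returned by mdoc_node() and descending through child before advancing to next.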
1.6 kristaps 228: .Pp
229: The tree itself is arranged according to the following normal form,
230: where capitalised non-terminals represent nodes.
231: .Pp
1.37 kristaps 232: .Bl -tag -width "ELEMENTXX" -compact
1.6 kristaps 233: .It ROOT
234: \(<- mnode+
235: .It mnode
236: \(<- BLOCK | ELEMENT | TEXT
237: .It BLOCK
1.41 kristaps 238: \(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
1.6 kristaps 239: .It ELEMENT
240: \(<- TEXT*
241: .It HEAD
1.45 schwarze 242: \(<- mnode*
1.6 kristaps 243: .It BODY
1.45 schwarze 244: \(<- mnode* [ENDBODY mnode*]
1.6 kristaps 245: .It TAIL
1.45 schwarze 246: \(<- mnode*
1.6 kristaps 247: .It TEXT
1.38 kristaps 248: \(<- [[:printable:],0x1e]*
1.6 kristaps 249: .El
1.2 kristaps 250: .Pp
1.6 kristaps 251: Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
1.41 kristaps 252: the BLOCK production: these refer to punctuation marks.
1.38 kristaps 253: Furthermore, although a TEXT node will generally have a non-zero-length
254: string, in the specific case of
1.8 kristaps 255: .Sq \&.Bd \-literal ,
1.6 kristaps 256: an empty line will produce a zero-length string.
1.41 kristaps 257: Multiple body parts are only found in invocations of
258: .Sq \&Bl \-column ,
259: where a new body introduces a new phrase.
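.Pp
The BLOCK production can be read as a simple sequence check over a block's direct children.
The sketch below is illustrative only: enum kind and block_children_ok() are hypothetical names introduced here, not part of the library, and the check looks only at the order of child kinds, not at their contents.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds mirroring non-terminals of the normal form. */
enum kind { K_HEAD, K_BODY, K_TAIL, K_TEXT };

/*
 * Return 1 if kinds[0..n-1] matches
 *	HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
 * and 0 otherwise.
 */
static int
block_children_ok(const enum kind *kinds, size_t n)
{
	size_t	 i = 0, bodies = 0;

	assert(kinds != NULL || n == 0);

	/* Exactly one leading HEAD, optionally followed by TEXT. */
	if (i >= n || kinds[i] != K_HEAD)
		return 0;
	i++;
	if (i < n && kinds[i] == K_TEXT)
		i++;

	/* One or more BODY nodes, each optionally followed by TEXT. */
	while (i < n && kinds[i] == K_BODY) {
		i++;
		bodies++;
		if (i < n && kinds[i] == K_TEXT)
			i++;
	}
	if (bodies == 0)
		return 0;

	/* Optional TAIL, optionally followed by TEXT. */
	if (i < n && kinds[i] == K_TAIL) {
		i++;
		if (i < n && kinds[i] == K_TEXT)
			i++;
	}

	/* Anything left over violates the production. */
	return i == n;
}
```

For instance, the sequence HEAD BODY is accepted, while a lone HEAD or a BODY following a TAIL is rejected.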
1.46 ! kristaps 260: .Ss Badly-nested Blocks
! 261: The ENDBODY node is available to end the formatting associated
! 262: with a given block before the physical end of that block.
! 263: It has a non-null
1.45 schwarze 264: .Va end
265: field, is of the BODY
266: .Va type ,
267: has the same
268: .Va tok
269: as the BLOCK it is ending, and has a
270: .Va pending
271: field pointing to that BLOCK's BODY node.
272: It is an indirect child of that BODY node
273: and has no children of its own.
274: .Pp
275: An ENDBODY node is generated when a block ends while one of its child
276: blocks is still open, as in the following example:
277: .Bd -literal -offset indent
278: \&.Ao ao
279: \&.Bo bo ac
280: \&.Ac bc
281: \&.Bc end
282: .Ed
283: .Pp
284: This example results in the following block structure:
285: .Bd -literal -offset indent
286: BLOCK Ao
287:     HEAD Ao
288:     BODY Ao
289:         TEXT ao
290:         BLOCK Bo, pending -> Ao
291:             HEAD Bo
292:             BODY Bo
293:                 TEXT bo
294:                 TEXT ac
295:                 ENDBODY Ao, pending -> Ao
296:                 TEXT bc
297: TEXT end
298: .Ed
299: .Pp
1.46 ! kristaps 300: Here, the formatting of the
! 301: .Sq \&Ao
! 302: block extends from TEXT ao to TEXT ac,
! 303: while the formatting of the
! 304: .Sq \&Bo
! 305: block extends from TEXT bo to TEXT bc.
! 306: It renders as follows in
1.45 schwarze 307: .Fl T Ns Cm ascii
308: mode:
1.46 ! kristaps 309: .Pp
1.45 schwarze 310: .Dl <ao [bo ac> bc] end
1.46 ! kristaps 311: .Pp
! 312: Support for badly-nested blocks is only provided for backward
1.45 schwarze 313: compatibility with some older
314: .Xr mdoc 7
315: implementations.
1.46 ! kristaps 316: Using badly-nested blocks is
! 317: .Em strongly discouraged :
! 318: the
! 319: .Fl T Ns Cm html
! 320: and
! 321: .Fl T Ns Cm xhtml
! 322: front-ends are unable to render them in any meaningful way.
! 323: Furthermore, behaviour when encountering badly-nested blocks is not
! 324: consistent across troff implementations, especially when using multiple
! 325: levels of badly-nested blocks.
1.2 kristaps 326: .Sh EXAMPLES
327: The following example reads lines from stdin and parses them, operating
1.33 kristaps 328: on the finished parse tree with
1.2 kristaps 329: .Fn parsed .
1.37 kristaps 330: This example performs only minimal error checking and does not free memory upon failure.
331: .Bd -literal -offset indent
1.44 kristaps 332: struct regset regs;
1.2 kristaps 333: struct mdoc *mdoc;
1.31 kristaps 334: const struct mdoc_node *node;
1.2 kristaps 335: char *buf;
336: size_t alloc_len;
337: ssize_t len;
338: int line;
1.44 kristaps 339: bzero(&regs, sizeof(struct regset));
1.2 kristaps 340: line = 1;
1.44 kristaps 341: mdoc = mdoc_alloc(&regs, NULL, 0, NULL);
1.37 kristaps 342: buf = NULL;
343: alloc_len = 0;
1.2 kristaps 344:
1.37 kristaps 345: while ((len = getline(&buf, &alloc_len, stdin)) != -1) {
346:     if (len > 0 && buf[len - 1] == '\en')
347:         buf[len - 1] = '\e0';
348:     if ( ! mdoc_parseln(mdoc, line, buf))
349:         errx(1, "mdoc_parseln");
350:     line++;
1.2 kristaps 351: }
352:
353: if ( ! mdoc_endparse(mdoc))
1.37 kristaps 354:     errx(1, "mdoc_endparse");
1.4 kristaps 355: if (NULL == (node = mdoc_node(mdoc)))
1.37 kristaps 356:     errx(1, "mdoc_node");
1.2 kristaps 357:
358: parsed(mdoc, node);
359: mdoc_free(mdoc);
360: .Ed
1.38 kristaps 361: .Pp
362: Please see
363: .Pa main.c
364: in the source archive for a rigorous reference.
1.17 kristaps 365: .Sh SEE ALSO
1.20 kristaps 366: .Xr mandoc 1 ,
1.14 kristaps 367: .Xr mdoc 7
1.2 kristaps 368: .Sh AUTHORS
369: The
370: .Nm
1.38 kristaps 371: library was written by
1.37 kristaps 372: .An Kristaps Dzonsons Aq kristaps@bsd.lv .