Annotation of mandoc/mdoc.3, Revision 1.47
1.47 ! schwarze 1: .\" $Id: mdoc.3,v 1.46 2010/07/01 09:33:39 kristaps Exp $
1.6 kristaps 2: .\"
1.47 ! schwarze 3: .\" Copyright (c) 2009, 2010 Kristaps Dzonsons <kristaps@bsd.lv>
! 4: .\" Copyright (c) 2010 Ingo Schwarze <schwarze@openbsd.org>
1.6 kristaps 5: .\"
6: .\" Permission to use, copy, modify, and distribute this software for any
1.28 kristaps 7: .\" purpose with or without fee is hereby granted, provided that the above
8: .\" copyright notice and this permission notice appear in all copies.
1.6 kristaps 9: .\"
1.28 kristaps 10: .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
11: .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
12: .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
13: .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
14: .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
15: .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
16: .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
1.33 kristaps 17: .\"
1.47 ! schwarze 18: .Dd $Mdocdate: July 1 2010 $
1.27 kristaps 19: .Dt MDOC 3
1.1 kristaps 20: .Os
21: .Sh NAME
1.39 kristaps 22: .Nm mdoc ,
1.1 kristaps 23: .Nm mdoc_alloc ,
24: .Nm mdoc_endparse ,
1.38 kristaps 25: .Nm mdoc_free ,
26: .Nm mdoc_meta ,
1.4 kristaps 27: .Nm mdoc_node ,
1.38 kristaps 28: .Nm mdoc_parseln ,
1.20 kristaps 29: .Nm mdoc_reset
1.2 kristaps 30: .Nd mdoc macro compiler library
1.1 kristaps 31: .Sh SYNOPSIS
1.38 kristaps 32: .In mandoc.h
1.42 kristaps 33: .In regs.h
1.35 kristaps 34: .In mdoc.h
1.4 kristaps 35: .Vt extern const char * const * mdoc_macronames;
36: .Vt extern const char * const * mdoc_argnames;
1.1 kristaps 37: .Ft "struct mdoc *"
1.43 kristaps 38: .Fo mdoc_alloc
1.44 kristaps 39: .Fa "struct regset *regs"
1.43 kristaps 40: .Fa "void *data"
41: .Fa "int pflags"
42: .Fa "mandocmsg msgs"
43: .Fc
1.26 kristaps 44: .Ft int
1.38 kristaps 45: .Fn mdoc_endparse "struct mdoc *mdoc"
1.1 kristaps 46: .Ft void
1.2 kristaps 47: .Fn mdoc_free "struct mdoc *mdoc"
1.38 kristaps 48: .Ft "const struct mdoc_meta *"
49: .Fn mdoc_meta "const struct mdoc *mdoc"
50: .Ft "const struct mdoc_node *"
51: .Fn mdoc_node "const struct mdoc *mdoc"
1.1 kristaps 52: .Ft int
1.42 kristaps 53: .Fo mdoc_parseln
54: .Fa "struct mdoc *mdoc"
55: .Fa "int line"
56: .Fa "char *buf"
57: .Fc
1.1 kristaps 58: .Ft int
1.38 kristaps 59: .Fn mdoc_reset "struct mdoc *mdoc"
1.1 kristaps 60: .Sh DESCRIPTION
61: The
62: .Nm mdoc
1.33 kristaps 63: library parses lines of
1.17 kristaps 64: .Xr mdoc 7
1.38 kristaps 65: input
66: into an abstract syntax tree (AST).
1.6 kristaps 67: .Pp
1.1 kristaps 68: In general, applications initiate a parsing sequence with
69: .Fn mdoc_alloc ,
1.33 kristaps 70: parse each line in a document with
1.1 kristaps 71: .Fn mdoc_parseln ,
72: close the parsing session with
73: .Fn mdoc_endparse ,
74: operate over the syntax tree returned by
1.33 kristaps 75: .Fn mdoc_node
1.4 kristaps 76: and
77: .Fn mdoc_meta ,
1.1 kristaps 78: then free all allocated memory with
79: .Fn mdoc_free .
1.20 kristaps 80: The
81: .Fn mdoc_reset
82: function may be used in order to reset the parser for another input
1.38 kristaps 83: sequence.
84: See the
1.1 kristaps 85: .Sx EXAMPLES
1.38 kristaps 86: section for a simple example.
1.2 kristaps 87: .Pp
1.33 kristaps 88: This section further defines the
1.6 kristaps 89: .Sx Types ,
1.33 kristaps 90: .Sx Functions
1.6 kristaps 91: and
92: .Sx Variables
1.38 kristaps 93: available to programmers.
94: Following that, the
1.33 kristaps 95: .Sx Abstract Syntax Tree
1.17 kristaps 96: section documents the output tree.
1.6 kristaps 97: .Ss Types
98: Both functions (see
99: .Sx Functions )
100: and variables (see
101: .Sx Variables )
102: may use the following types:
1.37 kristaps 103: .Bl -ohang
1.6 kristaps 104: .It Vt struct mdoc
105: An opaque type defined in
106: .Pa mdoc.c .
107: Its values are only used privately within the library.
108: .It Vt struct mdoc_node
1.38 kristaps 109: A parsed node.
110: Defined in
1.6 kristaps 111: .Pa mdoc.h .
1.33 kristaps 112: See
1.6 kristaps 113: .Sx Abstract Syntax Tree
114: for details.
1.38 kristaps 115: .It Vt mandocmsg
116: A function callback type defined in
117: .Pa mandoc.h .
1.6 kristaps 118: .El
119: .Ss Functions
1.2 kristaps 120: Function descriptions follow:
1.37 kristaps 121: .Bl -ohang
1.2 kristaps 122: .It Fn mdoc_alloc
1.38 kristaps 123: Allocates a parsing structure.
124: The
1.2 kristaps 125: .Fa data
1.40 kristaps 126: pointer is passed to
127: .Fa msgs .
1.20 kristaps 128: The
129: .Fa pflags
130: arguments are defined in
131: .Pa mdoc.h .
1.38 kristaps 132: Returns NULL on failure.
133: If non-NULL, the pointer must be freed with
1.2 kristaps 134: .Fn mdoc_free .
1.20 kristaps 135: .It Fn mdoc_reset
1.38 kristaps 136: Reset the parser for another parse routine.
137: After its use,
1.20 kristaps 138: .Fn mdoc_parseln
1.38 kristaps 139: behaves as if invoked for the first time.
140: If it returns 0, memory could not be allocated.
1.2 kristaps 141: .It Fn mdoc_free
1.38 kristaps 142: Free all resources of a parser.
143: The pointer is no longer valid after invocation.
1.2 kristaps 144: .It Fn mdoc_parseln
1.38 kristaps 145: Parse a NUL-terminated line of input.
146: This line should not contain the trailing newline.
147: Returns 0 on failure, 1 on success.
148: The input buffer
1.2 kristaps 149: .Fa buf
150: is modified by this function.
151: .It Fn mdoc_endparse
1.38 kristaps 152: Signals that the parse is complete.
153: Note that if
1.2 kristaps 154: .Fn mdoc_endparse
155: is called subsequent to
1.4 kristaps 156: .Fn mdoc_node ,
1.38 kristaps 157: the resulting tree is incomplete.
158: Returns 0 on failure, 1 on success.
1.4 kristaps 159: .It Fn mdoc_node
1.38 kristaps 160: Returns the first node of the parse.
161: Note that if
1.2 kristaps 162: .Fn mdoc_parseln
163: or
164: .Fn mdoc_endparse
165: return 0, the tree will be incomplete.
1.4 kristaps 166: .It Fn mdoc_meta
1.38 kristaps 167: Returns the document's parsed meta-data.
168: If this information has not yet been supplied or
1.4 kristaps 169: .Fn mdoc_parseln
170: or
171: .Fn mdoc_endparse
172: return 0, the data will be incomplete.
173: .El
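.Pp
As a minimal sketch of the input preparation described for
.Fn mdoc_parseln ,
the helper below strips one trailing newline from a buffer of known length,
such as one filled by getline(3).
The name sk_chomp is hypothetical and is not part of the library.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the mdoc library: remove a single
 * trailing newline so the buffer can be handed to a line parser.
 * Returns the length of the string after stripping.
 */
static size_t
sk_chomp(char *buf, size_t len)
{
	assert(buf != NULL);
	if (len > 0 && buf[len - 1] == '\n') {
		buf[len - 1] = '\0';	/* overwrite the newline with NUL */
		len--;
	}
	return len;
}
```

The buffer is modified in place, which matches the expectation that the
caller owns a writable line buffer.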
1.6 kristaps 174: .Ss Variables
1.4 kristaps 175: The following variables are also defined:
1.37 kristaps 176: .Bl -ohang
1.4 kristaps 177: .It Va mdoc_macronames
178: An array of string-ified token names.
179: .It Va mdoc_argnames
180: An array of string-ified token argument names.
1.2 kristaps 181: .El
1.6 kristaps 182: .Ss Abstract Syntax Tree
1.33 kristaps 183: The
1.6 kristaps 184: .Nm
1.17 kristaps 185: functions produce an abstract syntax tree (AST) describing input in a
1.38 kristaps 186: regular form.
187: It may be reviewed at any time with
1.6 kristaps 188: .Fn mdoc_node ;
189: however, if called before
190: .Fn mdoc_endparse ,
191: or after
1.33 kristaps 192: .Fn mdoc_endparse
1.6 kristaps 193: or
194: .Fn mdoc_parseln
1.33 kristaps 195: fail, it may be incomplete.
1.18 kristaps 196: .Pp
197: This AST is governed by the ontological
1.17 kristaps 198: rules dictated in
199: .Xr mdoc 7
1.33 kristaps 200: and derives its terminology accordingly.
1.17 kristaps 201: .Qq In-line
202: elements described in
203: .Xr mdoc 7
1.33 kristaps 204: are described simply as
1.17 kristaps 205: .Qq elements .
1.6 kristaps 206: .Pp
1.33 kristaps 207: The AST is composed of
1.6 kristaps 208: .Vt struct mdoc_node
209: nodes with block, head, body, element, root and text types as declared
210: by the
211: .Va type
1.38 kristaps 212: field.
213: Each node also provides its parse point (the
1.6 kristaps 214: .Va line ,
215: .Va sec ,
216: and
217: .Va pos
218: fields), its position in the tree (the
219: .Va parent ,
220: .Va child ,
1.45 schwarze 221: .Va nchild ,
1.33 kristaps 222: .Va next
1.6 kristaps 223: and
1.33 kristaps 224: .Va prev
1.45 schwarze 225: fields) and some type-specific data, in particular, for nodes generated
226: from macros, the generating macro in the
227: .Va tok
228: field.
1.6 kristaps 229: .Pp
230: The tree itself is arranged according to the following normal form,
231: where capitalised non-terminals represent nodes.
232: .Pp
1.37 kristaps 233: .Bl -tag -width "ELEMENTXX" -compact
1.6 kristaps 234: .It ROOT
235: \(<- mnode+
236: .It mnode
237: \(<- BLOCK | ELEMENT | TEXT
238: .It BLOCK
1.41 kristaps 239: \(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
1.6 kristaps 240: .It ELEMENT
241: \(<- TEXT*
242: .It HEAD
1.45 schwarze 243: \(<- mnode*
1.6 kristaps 244: .It BODY
1.45 schwarze 245: \(<- mnode* [ENDBODY mnode*]
1.6 kristaps 246: .It TAIL
1.45 schwarze 247: \(<- mnode*
1.6 kristaps 248: .It TEXT
1.38 kristaps 249: \(<- [[:printable:],0x1e]*
1.6 kristaps 250: .El
1.2 kristaps 251: .Pp
1.6 kristaps 252: Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
1.41 kristaps 253: the BLOCK production: these refer to punctuation marks.
1.38 kristaps 254: Furthermore, although a TEXT node will generally have a non-zero-length
255: string, in the specific case of
1.8 kristaps 256: .Sq \&.Bd \-literal ,
1.6 kristaps 257: an empty line will produce a zero-length string.
1.41 kristaps 258: Multiple body parts are only found in invocations of
259: .Sq \&Bl \-column ,
260: where a new body introduces a new phrase.
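.Pp
The normal form above lends itself to a pre-order walk over the child and
sibling links.
The following is a sketch only: struct sk_node is a hypothetical stand-in
that copies just the
.Va type ,
.Va child
and
.Va next
fields named above, plus a display label, and is not the declaration found in
.Pa mdoc.h .

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a parsed node; not the real struct mdoc_node. */
enum sk_type { SK_ROOT, SK_BLOCK, SK_HEAD, SK_BODY, SK_ELEM, SK_TEXT };

struct sk_node {
	enum sk_type	 type;
	const char	*label;	/* macro name or text, for display only */
	struct sk_node	*child;	/* first child, or NULL */
	struct sk_node	*next;	/* next sibling, or NULL */
};

static const char *const sk_names[] = {
	"ROOT", "BLOCK", "HEAD", "BODY", "ELEMENT", "TEXT"
};

/*
 * Print each node in pre-order, indented by depth, following the
 * sibling chain starting at n.  Returns the number of nodes visited.
 */
static int
sk_walk(const struct sk_node *n, int depth)
{
	int count = 0;

	assert(depth >= 0);
	for (; n != NULL; n = n->next) {
		printf("%*s%s %s\n", depth * 2, "",
		    sk_names[n->type], n->label);
		count += 1 + sk_walk(n->child, depth + 1);
	}
	return count;
}
```

Calling sk_walk on the root with a depth of 0 prints one line per node.
With the real library, the starting point would be the node returned by
.Fn mdoc_node .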
1.46 kristaps 261: .Ss Badly-nested Blocks
262: The ENDBODY node is available to end the formatting associated
263: with a given block before the physical end of that block.
264: It has a non-null
1.45 schwarze 265: .Va end
266: field, is of the BODY
267: .Va type ,
268: has the same
269: .Va tok
270: as the BLOCK it is ending, and has a
271: .Va pending
272: field pointing to that BLOCK's BODY node.
273: It is an indirect child of that BODY node
274: and has no children of its own.
275: .Pp
276: An ENDBODY node is generated when a block ends while one of its child
277: blocks is still open, like in the following example:
278: .Bd -literal -offset indent
279: \&.Ao ao
280: \&.Bo bo ac
281: \&.Ac bc
282: \&.Bc end
283: .Ed
284: .Pp
285: This example results in the following block structure:
286: .Bd -literal -offset indent
287: BLOCK Ao
288:     HEAD Ao
289:     BODY Ao
290:         TEXT ao
291:         BLOCK Bo, pending -> Ao
292:             HEAD Bo
293:             BODY Bo
294:                 TEXT bo
295:                 TEXT ac
296:                 ENDBODY Ao, pending -> Ao
297:                 TEXT bc
298: TEXT end
299: .Ed
300: .Pp
1.46 kristaps 301: Here, the formatting of the
302: .Sq \&Ao
303: block extends from TEXT ao to TEXT ac,
304: while the formatting of the
305: .Sq \&Bo
306: block extends from TEXT bo to TEXT bc.
307: It renders as follows in
1.45 schwarze 308: .Fl T Ns Cm ascii
309: mode:
1.46 kristaps 310: .Pp
1.45 schwarze 311: .Dl <ao [bo ac> bc] end
1.46 kristaps 312: .Pp
313: Support for badly-nested blocks is only provided for backward
1.45 schwarze 314: compatibility with some older
315: .Xr mdoc 7
316: implementations.
1.46 kristaps 317: Using badly-nested blocks is
318: .Em strongly discouraged :
319: the
320: .Fl T Ns Cm html
321: and
322: .Fl T Ns Cm xhtml
323: front-ends are unable to render them in any meaningful way.
324: Furthermore, behaviour when encountering badly-nested blocks is not
325: consistent across troff implementations, especially when using multiple
326: levels of badly-nested blocks.
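.Pp
A front-end walking the tree has to tell an ENDBODY marker apart from an
ordinary BODY node.
The sketch below tests the two properties stated above: the BODY
.Va type
and a non-null
.Va end
field.
The struct eb_node layout is an assumption made for illustration; in
particular,
.Va end
is modelled as an int where zero means
.Qq not an end marker .

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a parsed node; not the real struct mdoc_node. */
enum eb_type { EB_BLOCK, EB_HEAD, EB_BODY, EB_TEXT };

struct eb_node {
	enum eb_type	 type;
	int		 tok;		/* generating macro */
	int		 end;		/* assumed: non-zero marks ENDBODY */
	struct eb_node	*pending;	/* BODY whose formatting is ended */
	struct eb_node	*child;		/* first child, or NULL */
};

/* Return 1 if n looks like an ENDBODY marker, 0 otherwise. */
static int
eb_is_endbody(const struct eb_node *n)
{
	return n != NULL && n->type == EB_BODY && n->end != 0;
}

/* Return the BODY node that an ENDBODY marker closes. */
static struct eb_node *
eb_ended_body(const struct eb_node *n)
{
	assert(eb_is_endbody(n));
	return n->pending;
}
```

A renderer could call eb_is_endbody on each BODY-typed node and, when it
returns 1, stop the formatting of the block found through eb_ended_body.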
1.2 kristaps 327: .Sh EXAMPLES
328: The following example reads lines from stdin and parses them, operating
1.33 kristaps 329: on the finished parse tree with
1.2 kristaps 330: .Fn parsed .
1.37 kristaps 331: This example does not error-check nor free memory upon failure.
332: .Bd -literal -offset indent
1.44 kristaps 333: struct regset regs;
1.2 kristaps 334: struct mdoc *mdoc;
1.31 kristaps 335: const struct mdoc_node *node;
1.2 kristaps 336: char *buf;
337: ssize_t len; size_t alloc_len;
338: int line;
339:
1.44 kristaps 340: bzero(&regs, sizeof(struct regset));
1.2 kristaps 341: line = 1;
1.44 kristaps 342: mdoc = mdoc_alloc(&regs, NULL, 0, NULL);
1.37 kristaps 343: buf = NULL;
344: alloc_len = 0;
1.2 kristaps 345:
1.37 kristaps 346: while ((len = getline(&buf, &alloc_len, stdin)) != -1) {
347:     if (len && buf[len - 1] == '\en')
348:         buf[len - 1] = '\e0';
349:     if ( ! mdoc_parseln(mdoc, line, buf))
350:         errx(1, "mdoc_parseln");
351:     line++;
1.2 kristaps 352: }
353:
354: if ( ! mdoc_endparse(mdoc))
1.37 kristaps 355:     errx(1, "mdoc_endparse");
1.4 kristaps 356: if (NULL == (node = mdoc_node(mdoc)))
1.37 kristaps 357:     errx(1, "mdoc_node");
1.2 kristaps 358:
359: parsed(mdoc, node);
360: mdoc_free(mdoc);
361: .Ed
1.38 kristaps 362: .Pp
363: Please see
364: .Pa main.c
365: in the source archive for a rigorous reference.
1.17 kristaps 366: .Sh SEE ALSO
1.20 kristaps 367: .Xr mandoc 1 ,
1.14 kristaps 368: .Xr mdoc 7
1.2 kristaps 369: .Sh AUTHORS
370: The
371: .Nm
1.38 kristaps 372: library was written by
1.37 kristaps 373: .An Kristaps Dzonsons Aq kristaps@bsd.lv .