Annotation of mandoc/mdoc.3, Revision 1.55
1.55 ! kristaps 1: .\" $Id: mdoc.3,v 1.54 2011/01/03 13:55:26 kristaps Exp $
1.6 kristaps 2: .\"
1.47 schwarze 3: .\" Copyright (c) 2009, 2010 Kristaps Dzonsons <kristaps@bsd.lv>
4: .\" Copyright (c) 2010 Ingo Schwarze <schwarze@openbsd.org>
1.6 kristaps 5: .\"
6: .\" Permission to use, copy, modify, and distribute this software for any
1.28 kristaps 7: .\" purpose with or without fee is hereby granted, provided that the above
8: .\" copyright notice and this permission notice appear in all copies.
1.6 kristaps 9: .\"
1.28 kristaps 10: .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
11: .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
12: .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
13: .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
14: .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
15: .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
16: .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
1.33 kristaps 17: .\"
1.54 kristaps 18: .Dd $Mdocdate: January 3 2011 $
1.27 kristaps 19: .Dt MDOC 3
1.1 kristaps 20: .Os
21: .Sh NAME
1.39 kristaps 22: .Nm mdoc ,
1.1 kristaps 23: .Nm mdoc_alloc ,
24: .Nm mdoc_endparse ,
1.38 kristaps 25: .Nm mdoc_free ,
26: .Nm mdoc_meta ,
1.4 kristaps 27: .Nm mdoc_node ,
1.38 kristaps 28: .Nm mdoc_parseln ,
1.20 kristaps 29: .Nm mdoc_reset
1.2 kristaps 30: .Nd mdoc macro compiler library
1.1 kristaps 31: .Sh SYNOPSIS
1.38 kristaps 32: .In mandoc.h
1.35 kristaps 33: .In mdoc.h
1.4 kristaps 34: .Vt extern const char * const * mdoc_macronames;
35: .Vt extern const char * const * mdoc_argnames;
1.52 kristaps 36: .Ft int
1.55 ! kristaps 37: .Fo mdoc_addspan
1.52 kristaps 38: .Fa "struct mdoc *mdoc"
39: .Fa "const struct tbl_span *span"
40: .Fc
1.1 kristaps 41: .Ft "struct mdoc *"
1.43 kristaps 42: .Fo mdoc_alloc
1.44 kristaps 43: .Fa "struct regset *regs"
1.43 kristaps 44: .Fa "void *data"
45: .Fa "mandocmsg msgs"
46: .Fc
1.26 kristaps 47: .Ft int
1.38 kristaps 48: .Fn mdoc_endparse "struct mdoc *mdoc"
1.1 kristaps 49: .Ft void
1.2 kristaps 50: .Fn mdoc_free "struct mdoc *mdoc"
1.38 kristaps 51: .Ft "const struct mdoc_meta *"
52: .Fn mdoc_meta "const struct mdoc *mdoc"
53: .Ft "const struct mdoc_node *"
54: .Fn mdoc_node "const struct mdoc *mdoc"
1.1 kristaps 55: .Ft int
1.42 kristaps 56: .Fo mdoc_parseln
57: .Fa "struct mdoc *mdoc"
58: .Fa "int line"
59: .Fa "char *buf"
60: .Fc
1.1 kristaps 61: .Ft int
1.38 kristaps 62: .Fn mdoc_reset "struct mdoc *mdoc"
1.1 kristaps 63: .Sh DESCRIPTION
64: The
65: .Nm mdoc
1.33 kristaps 66: library parses lines of
1.17 kristaps 67: .Xr mdoc 7
1.38 kristaps 68: input
69: into an abstract syntax tree (AST).
1.6 kristaps 70: .Pp
1.1 kristaps 71: In general, applications initiate a parsing sequence with
72: .Fn mdoc_alloc ,
1.33 kristaps 73: parse each line in a document with
1.1 kristaps 74: .Fn mdoc_parseln ,
75: close the parsing session with
76: .Fn mdoc_endparse ,
77: operate over the syntax tree returned by
1.33 kristaps 78: .Fn mdoc_node
1.4 kristaps 79: and
80: .Fn mdoc_meta ,
1.1 kristaps 81: then free all allocated memory with
82: .Fn mdoc_free .
1.20 kristaps 83: The
84: .Fn mdoc_reset
85: function may be used in order to reset the parser for another input
1.38 kristaps 86: sequence.
1.6 kristaps 87: .Ss Types
1.37 kristaps 88: .Bl -ohang
1.6 kristaps 89: .It Vt struct mdoc
1.50 kristaps 90: An opaque type.
1.6 kristaps 91: Its values are only used privately within the library.
92: .It Vt struct mdoc_node
1.38 kristaps 93: A parsed node.
1.33 kristaps 94: See
1.6 kristaps 95: .Sx Abstract Syntax Tree
96: for details.
97: .El
98: .Ss Functions
1.53 kristaps 99: If
100: .Fn mdoc_addspan ,
101: .Fn mdoc_parseln ,
102: or
103: .Fn mdoc_endparse
104: returns 0, calls to any function but
105: .Fn mdoc_reset
106: or
107: .Fn mdoc_free
108: will raise an assertion.
1.37 kristaps 109: .Bl -ohang
1.52 kristaps 110: .It Fn mdoc_addspan
111: Add a table span to the parsing stream.
112: Returns 0 on failure, 1 on success.
1.2 kristaps 113: .It Fn mdoc_alloc
1.38 kristaps 114: Allocates a parsing structure.
115: The
1.2 kristaps 116: .Fa data
1.40 kristaps 117: pointer is passed to
118: .Fa msgs .
1.53 kristaps 119: Always returns a valid pointer.
120: The pointer must be freed with
1.2 kristaps 121: .Fn mdoc_free .
1.20 kristaps 122: .It Fn mdoc_reset
1.38 kristaps 123: Reset the parser for another parse routine.
124: After its use,
1.20 kristaps 125: .Fn mdoc_parseln
1.38 kristaps 126: behaves as if invoked for the first time.
127: If it returns 0, memory could not be allocated.
1.2 kristaps 128: .It Fn mdoc_free
1.38 kristaps 129: Free all resources of a parser.
130: The pointer is no longer valid after invocation.
1.2 kristaps 131: .It Fn mdoc_parseln
1.38 kristaps 132: Parse a NUL-terminated line of input.
133: This line should not contain the trailing newline.
134: Returns 0 on failure, 1 on success.
135: The input buffer
1.2 kristaps 136: .Fa buf
137: is modified by this function.
138: .It Fn mdoc_endparse
1.38 kristaps 139: Signals that the parse is complete.
140: Returns 0 on failure, 1 on success.
1.4 kristaps 141: .It Fn mdoc_node
1.38 kristaps 142: Returns the first node of the parse.
1.4 kristaps 143: .It Fn mdoc_meta
1.38 kristaps 144: Returns the document's parsed meta-data.
1.4 kristaps 145: .El
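.Pp
As a minimal sketch of the input preparation that
.Fn mdoc_parseln
expects, the helper below strips one trailing newline from a writable buffer.
The name
.Fn demo_chomp
is hypothetical and is not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a library function): remove a single
 * trailing newline in place and return the resulting length.
 * The buffer must be writable, as it would be for mdoc_parseln().
 */
size_t
demo_chomp(char *buf, size_t len)
{
	assert(buf != NULL);
	if (len > 0 && buf[len - 1] == '\n')
		buf[--len] = '\0';
	return len;
}
```

A caller would typically apply this to each line it has read before handing
the buffer to the parser.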
1.6 kristaps 146: .Ss Variables
1.37 kristaps 147: .Bl -ohang
1.4 kristaps 148: .It Va mdoc_macronames
149: An array of string-ified token names.
150: .It Va mdoc_argnames
151: An array of string-ified token argument names.
1.2 kristaps 152: .El
1.6 kristaps 153: .Ss Abstract Syntax Tree
1.33 kristaps 154: The
1.6 kristaps 155: .Nm
1.17 kristaps 156: functions produce an abstract syntax tree (AST) describing input in a
1.38 kristaps 157: regular form.
158: It may be reviewed at any time with
1.6 kristaps 159: .Fn mdoc_node ;
160: however, if called before
161: .Fn mdoc_endparse ,
162: or after
1.33 kristaps 163: .Fn mdoc_endparse
1.6 kristaps 164: or
165: .Fn mdoc_parseln
1.33 kristaps 166: fails, it may be incomplete.
1.18 kristaps 167: .Pp
168: This AST is governed by the ontological
1.17 kristaps 169: rules dictated in
170: .Xr mdoc 7
1.33 kristaps 171: and derives its terminology accordingly.
1.17 kristaps 172: .Qq In-line
173: elements described in
174: .Xr mdoc 7
1.33 kristaps 175: are described simply as
1.17 kristaps 176: .Qq elements .
1.6 kristaps 177: .Pp
1.33 kristaps 178: The AST is composed of
1.6 kristaps 179: .Vt struct mdoc_node
180: nodes with block, head, body, tail, element, root and text types as declared
181: by the
182: .Va type
1.38 kristaps 183: field.
184: Each node also provides its parse point (the
1.6 kristaps 185: .Va line ,
186: .Va sec ,
187: and
188: .Va pos
189: fields), its position in the tree (the
190: .Va parent ,
191: .Va child ,
1.45 schwarze 192: .Va nchild ,
1.33 kristaps 193: .Va next
1.6 kristaps 194: and
1.33 kristaps 195: .Va prev
1.45 schwarze 196: fields) and some type-specific data, in particular, for nodes generated
197: from macros, the generating macro in the
198: .Va tok
199: field.
1.6 kristaps 200: .Pp
201: The tree itself is arranged according to the following normal form,
202: where capitalised non-terminals represent nodes.
203: .Pp
1.37 kristaps 204: .Bl -tag -width "ELEMENTXX" -compact
1.6 kristaps 205: .It ROOT
206: \(<- mnode+
207: .It mnode
208: \(<- BLOCK | ELEMENT | TEXT
209: .It BLOCK
1.41 kristaps 210: \(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
1.6 kristaps 211: .It ELEMENT
212: \(<- TEXT*
213: .It HEAD
1.45 schwarze 214: \(<- mnode*
1.6 kristaps 215: .It BODY
1.45 schwarze 216: \(<- mnode* [ENDBODY mnode*]
1.6 kristaps 217: .It TAIL
1.45 schwarze 218: \(<- mnode*
1.6 kristaps 219: .It TEXT
1.38 kristaps 220: \(<- [[:printable:],0x1e]*
1.6 kristaps 221: .El
1.2 kristaps 222: .Pp
1.6 kristaps 223: Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
1.41 kristaps 224: the BLOCK production: these refer to punctuation marks.
1.38 kristaps 225: Furthermore, although a TEXT node will generally have a non-zero-length
226: string, in the specific case of
1.8 kristaps 227: .Sq \&.Bd \-literal ,
1.6 kristaps 228: an empty line will produce a zero-length string.
1.41 kristaps 229: Multiple body parts are only found in invocations of
230: .Sq \&Bl \-column ,
231: where a new body introduces a new phrase.
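.Pp
The child and sibling links described above can be followed with a simple
depth-first walk.
The sketch below uses a hypothetical
.Vt struct demo_node
that carries only a few of the fields named above; it is not the library's
.Vt struct mdoc_node ,
and every demo_ name is an assumption made for illustration.

```c
#include <assert.h>
#include <stdio.h>

enum demo_type {
	DEMO_ROOT, DEMO_BLOCK, DEMO_HEAD, DEMO_BODY, DEMO_ELEM, DEMO_TEXT
};

/* Hypothetical stand-in for struct mdoc_node. */
struct demo_node {
	enum demo_type	  type;
	const char	 *label;	/* macro name or text string */
	struct demo_node *parent;
	struct demo_node *child;	/* first child */
	struct demo_node *next;		/* next sibling */
};

static const char *const demo_names[] = {
	"ROOT", "BLOCK", "HEAD", "BODY", "ELEMENT", "TEXT"
};

/* Attach c as the last child of p. */
void
demo_append(struct demo_node *p, struct demo_node *c)
{
	struct demo_node *n;

	assert(p != NULL && c != NULL);
	c->parent = p;
	c->next = NULL;
	if (p->child == NULL) {
		p->child = c;
		return;
	}
	for (n = p->child; n->next != NULL; n = n->next)
		continue;
	n->next = c;
}

/* Print each node, indented by depth; return the number visited. */
int
demo_walk(const struct demo_node *n, int depth)
{
	int count = 0;

	for (; n != NULL; n = n->next) {
		printf("%*s%s %s\n", depth * 2, "",
		    demo_names[n->type], n->label);
		count += 1 + demo_walk(n->child, depth + 1);
	}
	return count;
}

/* Build ROOT <- BLOCK(HEAD(TEXT) BODY(ELEMENT(TEXT))) and walk it. */
int
demo_walk_example(void)
{
	struct demo_node root = { DEMO_ROOT, "", NULL, NULL, NULL };
	struct demo_node blk = { DEMO_BLOCK, "Sh", NULL, NULL, NULL };
	struct demo_node head = { DEMO_HEAD, "Sh", NULL, NULL, NULL };
	struct demo_node name = { DEMO_TEXT, "NAME", NULL, NULL, NULL };
	struct demo_node body = { DEMO_BODY, "Sh", NULL, NULL, NULL };
	struct demo_node elem = { DEMO_ELEM, "Nm", NULL, NULL, NULL };
	struct demo_node text = { DEMO_TEXT, "mdoc", NULL, NULL, NULL };

	demo_append(&root, &blk);
	demo_append(&blk, &head);
	demo_append(&head, &name);
	demo_append(&blk, &body);
	demo_append(&body, &elem);
	demo_append(&elem, &text);
	return demo_walk(&root, 0);
}
```

The same recursion shape should carry over to a real tree obtained from
.Fn mdoc_node ,
with the type and macro lookups replaced by the library's own fields.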
1.46 kristaps 232: .Ss Badly-nested Blocks
233: The ENDBODY node is available to end the formatting associated
234: with a given block before the physical end of that block.
235: It has a non-null
1.45 schwarze 236: .Va end
237: field, is of the BODY
238: .Va type ,
239: has the same
240: .Va tok
241: as the BLOCK it is ending, and has a
242: .Va pending
243: field pointing to that BLOCK's BODY node.
244: It is an indirect child of that BODY node
245: and has no children of its own.
246: .Pp
247: An ENDBODY node is generated when a block ends while one of its child
248: blocks is still open, as in the following example:
249: .Bd -literal -offset indent
\&.Ao ao
\&.Bo bo ac
\&.Ac bc
\&.Bc end
254: .Ed
255: .Pp
256: This example results in the following block structure:
257: .Bd -literal -offset indent
BLOCK Ao
        HEAD Ao
        BODY Ao
                TEXT ao
                BLOCK Bo, pending -> Ao
                        HEAD Bo
                        BODY Bo
                                TEXT bo
                                TEXT ac
                                ENDBODY Ao, pending -> Ao
                                TEXT bc
TEXT end
270: .Ed
271: .Pp
1.46 kristaps 272: Here, the formatting of the
273: .Sq \&Ao
274: block extends from TEXT ao to TEXT ac,
275: while the formatting of the
276: .Sq \&Bo
277: block extends from TEXT bo to TEXT bc.
278: It renders as follows in
1.45 schwarze 279: .Fl T Ns Cm ascii
280: mode:
1.46 kristaps 281: .Pp
1.45 schwarze 282: .Dl <ao [bo ac> bc] end
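.Pp
One way to picture how an ENDBODY node leads to this output is the sketch
below.
It builds the block structure shown above by hand and renders it, letting the
ENDBODY node write the closing delimiter of the block it ends.
The
.Vt struct demo_node
type, the
.Va closed
flag and all demo_ functions are hypothetical simplifications, not the
library's data structures or the actual front-end code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum demo_type { DEMO_BLOCK, DEMO_HEAD, DEMO_BODY, DEMO_TEXT };

/* Hypothetical, simplified stand-in for struct mdoc_node. */
struct demo_node {
	enum demo_type	  type;
	const char	 *tok;		/* "Ao" or "Bo"; NULL for TEXT */
	const char	 *text;		/* TEXT string; NULL otherwise */
	int		  end;		/* non-zero marks an ENDBODY */
	int		  closed;	/* demo-only: delimiter already written */
	struct demo_node *pending;	/* ENDBODY: the BODY being ended */
	struct demo_node *child;
	struct demo_node *next;
};

struct demo_out {
	char	buf[128];
	int	need_space;
};

/* Append s, optionally preceded by a space; record spacing for later. */
static void
demo_put(struct demo_out *o, const char *s, int space_before, int space_after)
{
	if (space_before && o->need_space)
		strncat(o->buf, " ", sizeof(o->buf) - strlen(o->buf) - 1);
	strncat(o->buf, s, sizeof(o->buf) - strlen(o->buf) - 1);
	o->need_space = space_after;
}

/* Delimiters for the two enclosure macros used in the example. */
static const char *
demo_delim(const char *tok, int closing)
{
	if (strcmp(tok, "Ao") == 0)
		return closing ? ">" : "<";
	return closing ? "]" : "[";
}

void
demo_render(struct demo_node *n, struct demo_out *o)
{
	for (; n != NULL; n = n->next) {
		if (n->type == DEMO_TEXT) {
			demo_put(o, n->text, 1, 1);
		} else if (n->type == DEMO_BODY && n->end) {
			/* ENDBODY: close the pending block early. */
			demo_put(o, demo_delim(n->tok, 1), 0, 1);
			n->pending->closed = 1;
		} else if (n->type == DEMO_BODY) {
			demo_put(o, demo_delim(n->tok, 0), 1, 0);
			demo_render(n->child, o);
			if (!n->closed)
				demo_put(o, demo_delim(n->tok, 1), 0, 1);
		} else {
			demo_render(n->child, o);	/* BLOCK, HEAD */
		}
	}
}

/* Build the Ao/Bo tree from the example and render it into o. */
const char *
demo_render_example(struct demo_out *o)
{
	struct demo_node end = { DEMO_TEXT, NULL, "end", 0, 0, NULL, NULL, NULL };
	struct demo_node bc = { DEMO_TEXT, NULL, "bc", 0, 0, NULL, NULL, NULL };
	struct demo_node eb = { DEMO_BODY, "Ao", NULL, 1, 0, NULL, NULL, &bc };
	struct demo_node ac = { DEMO_TEXT, NULL, "ac", 0, 0, NULL, NULL, &eb };
	struct demo_node bo = { DEMO_TEXT, NULL, "bo", 0, 0, NULL, NULL, &ac };
	struct demo_node bobd = { DEMO_BODY, "Bo", NULL, 0, 0, NULL, &bo, NULL };
	struct demo_node bohd = { DEMO_HEAD, "Bo", NULL, 0, 0, NULL, NULL, &bobd };
	struct demo_node bobl = { DEMO_BLOCK, "Bo", NULL, 0, 0, NULL, &bohd, NULL };
	struct demo_node ao = { DEMO_TEXT, NULL, "ao", 0, 0, NULL, NULL, &bobl };
	struct demo_node aobd = { DEMO_BODY, "Ao", NULL, 0, 0, NULL, &ao, NULL };
	struct demo_node aohd = { DEMO_HEAD, "Ao", NULL, 0, 0, NULL, NULL, &aobd };
	struct demo_node aobl = { DEMO_BLOCK, "Ao", NULL, 0, 0, NULL, &aohd, &end };

	eb.pending = &aobd;	/* ENDBODY points at the Ao BODY it ends */
	memset(o, 0, sizeof(*o));
	demo_render(&aobl, o);
	return o->buf;
}
```

Because the Ao BODY is marked as closed when the ENDBODY node is reached, its
closing delimiter is not written a second time at the physical end of the
block.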
1.46 kristaps 283: .Pp
284: Support for badly-nested blocks is only provided for backward
1.45 schwarze 285: compatibility with some older
286: .Xr mdoc 7
287: implementations.
1.46 kristaps 288: Using badly-nested blocks is
289: .Em strongly discouraged :
290: the
291: .Fl T Ns Cm html
292: and
293: .Fl T Ns Cm xhtml
294: front-ends are unable to render them in any meaningful way.
295: Furthermore, behaviour when encountering badly-nested blocks is not
296: consistent across troff implementations, especially when using multiple
297: levels of badly-nested blocks.
1.2 kristaps 298: .Sh EXAMPLES
299: The following example reads lines from stdin and parses them, operating
1.33 kristaps 300: on the finished parse tree with
1.2 kristaps 301: .Fn parsed .
1.37 kristaps 302: This example performs only minimal error checking and does not free memory upon failure.
303: .Bd -literal -offset indent
struct regset regs;
struct mdoc *mdoc;
const struct mdoc_node *node;
char *buf;
size_t alloc_len;
ssize_t len;
int line;

bzero(&regs, sizeof(struct regset));
line = 1;
mdoc = mdoc_alloc(&regs, NULL, NULL);
buf = NULL;
alloc_len = 0;

/* getline() returns -1 at end of input, so len must be signed. */
while ((len = getline(&buf, &alloc_len, stdin)) >= 0) {
	/* Strip the trailing newline, if present. */
	if (len && buf[len - 1] == '\en')
		buf[len - 1] = '\e0';
	if ( ! mdoc_parseln(mdoc, line, buf))
		errx(1, "mdoc_parseln");
	line++;
}

if ( ! mdoc_endparse(mdoc))
	errx(1, "mdoc_endparse");
if (NULL == (node = mdoc_node(mdoc)))
	errx(1, "mdoc_node");

parsed(mdoc, node);
mdoc_free(mdoc);
free(buf);
332: .Ed
1.38 kristaps 333: .Pp
1.50 kristaps 334: To compile this, execute
335: .Pp
1.51 kristaps 336: .Dl % cc main.c libmdoc.a libmandoc.a
1.50 kristaps 337: .Pp
338: where
1.38 kristaps 339: .Pa main.c
1.50 kristaps 340: is the example file.
1.17 kristaps 341: .Sh SEE ALSO
1.20 kristaps 342: .Xr mandoc 1 ,
1.14 kristaps 343: .Xr mdoc 7
1.2 kristaps 344: .Sh AUTHORS
345: The
346: .Nm
1.38 kristaps 347: library was written by
1.37 kristaps 348: .An Kristaps Dzonsons Aq kristaps@bsd.lv .