Annotation of mandoc/mdoc.3, Revision 1.12
1.12 ! kristaps 1: .\" $Id: mdoc.3,v 1.11 2009/02/25 17:02:47 kristaps Exp $
1.6 kristaps 2: .\"
3: .\" Copyright (c) 2009 Kristaps Dzonsons <kristaps@kth.se>
4: .\"
5: .\" Permission to use, copy, modify, and distribute this software for any
6: .\" purpose with or without fee is hereby granted, provided that the
7: .\" above copyright notice and this permission notice appear in all
8: .\" copies.
9: .\"
10: .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL
11: .\" WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED
12: .\" WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE
13: .\" AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL
14: .\" DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR
15: .\" PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
16: .\" TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
17: .\" PERFORMANCE OF THIS SOFTWARE.
1.1 kristaps 18: .\"
19: .Dd $Mdocdate$
20: .Dt mdoc 3
21: .Os
1.6 kristaps 22: .\" SECTION
1.1 kristaps 23: .Sh NAME
24: .Nm mdoc_alloc ,
25: .Nm mdoc_parseln ,
26: .Nm mdoc_endparse ,
1.4 kristaps 27: .Nm mdoc_node ,
28: .Nm mdoc_meta ,
1.1 kristaps 29: .Nm mdoc_free
1.2 kristaps 30: .Nd mdoc macro compiler library
1.6 kristaps 31: .\" SECTION
1.1 kristaps 32: .Sh SYNOPSIS
1.4 kristaps 33: .Fd #include <mdoc.h>
34: .Vt extern const char * const * mdoc_macronames;
35: .Vt extern const char * const * mdoc_argnames;
1.1 kristaps 36: .Ft "struct mdoc *"
37: .Fn mdoc_alloc "void *data" "const struct mdoc_cb *cb"
38: .Ft void
1.2 kristaps 39: .Fn mdoc_free "struct mdoc *mdoc"
1.1 kristaps 40: .Ft int
1.2 kristaps 41: .Fn mdoc_parseln "struct mdoc *mdoc" "int line" "char *buf"
1.1 kristaps 42: .Ft "const struct mdoc_node *"
1.4 kristaps 43: .Fn mdoc_node "struct mdoc *mdoc"
44: .Ft "const struct mdoc_meta *"
45: .Fn mdoc_meta "struct mdoc *mdoc"
1.1 kristaps 46: .Ft int
1.2 kristaps 47: .Fn mdoc_endparse "struct mdoc *mdoc"
1.6 kristaps 48: .\" SECTION
1.1 kristaps 49: .Sh DESCRIPTION
50: The
51: .Nm mdoc
1.6 kristaps 52: library parses lines of mdoc input into an abstract syntax tree.
1.7 kristaps 53: .Dq mdoc ,
54: which is used to format BSD manual pages, is a macro package of the
1.6 kristaps 55: .Dq roff
1.7 kristaps 56: language. The
1.6 kristaps 57: .Nm
58: library implements only those macros documented in the
59: .Xr mdoc 7
60: and
61: .Xr mdoc.samples 7
1.11 kristaps 62: manuals. Documents with
63: .Xr refer 1 ,
64: .Xr eqn 1
65: and other pre-processor sections aren't accommodated.
1.6 kristaps 66: .\" PARAGRAPH
67: .Pp
68: .Nm
69: is
70: .Ud
71: .\" PARAGRAPH
72: .Pp
1.1 kristaps 73: In general, applications initiate a parsing sequence with
74: .Fn mdoc_alloc ,
75: parse each line in a document with
76: .Fn mdoc_parseln ,
77: close the parsing session with
78: .Fn mdoc_endparse ,
79: operate over the syntax tree returned by
1.4 kristaps 80: .Fn mdoc_node
81: and
82: .Fn mdoc_meta ,
1.1 kristaps 83: then free all allocated memory with
84: .Fn mdoc_free .
85: See the
86: .Sx EXAMPLES
87: section for a full example.
1.6 kristaps 88: .\" PARAGRAPH
1.2 kristaps 89: .Pp
1.6 kristaps 90: This section further defines the
91: .Sx Types ,
92: .Sx Functions
93: and
94: .Sx Variables
1.10 kristaps 95: available to programmers. Following that,
96: .Sx Character Encoding
97: describes input format. Lastly,
1.6 kristaps 98: .Sx Abstract Syntax Tree
99: documents the output tree.
100: .\" SUBSECTION
101: .Ss Types
102: Both functions (see
103: .Sx Functions )
104: and variables (see
105: .Sx Variables )
106: may use the following types:
1.9 kristaps 107: .Bl -ohang -offset "XXXX"
1.6 kristaps 108: .\" LIST-ITEM
109: .It Vt struct mdoc
110: An opaque type defined in
111: .Pa mdoc.c .
112: Its values are only used privately within the library.
113: .\" LIST-ITEM
114: .It Vt struct mdoc_cb
115: A set of message callbacks defined in
116: .Pa mdoc.h .
117: .\" LIST-ITEM
118: .It Vt struct mdoc_node
119: A parsed node. Defined in
120: .Pa mdoc.h .
121: See
122: .Sx Abstract Syntax Tree
123: for details.
124: .El
125: .\" SUBSECTION
126: .Ss Functions
1.2 kristaps 127: Function descriptions follow:
1.9 kristaps 128: .Bl -ohang -offset "XXXX"
1.6 kristaps 129: .\" LIST-ITEM
1.2 kristaps 130: .It Fn mdoc_alloc
131: Allocates a parsing structure. The
132: .Fa data
133: pointer is passed to callbacks in
134: .Fa cb ,
135: which are documented further in the header file. Returns NULL on
136: failure. If non-NULL, the pointer must be freed with
137: .Fn mdoc_free .
1.6 kristaps 138: .\" LIST-ITEM
1.2 kristaps 139: .It Fn mdoc_free
140: Free all resources of a parser. The pointer is no longer valid after
141: invocation.
1.6 kristaps 142: .\" LIST-ITEM
1.2 kristaps 143: .It Fn mdoc_parseln
144: Parse a NUL-terminated line of input. This line should not contain the
145: trailing newline. Returns 0 on failure, 1 on success. The input buffer
146: .Fa buf
147: is modified by this function.
1.6 kristaps 148: .\" LIST-ITEM
1.2 kristaps 149: .It Fn mdoc_endparse
150: Signals that the parse is complete. Note that if
151: .Fn mdoc_endparse
152: is called subsequent to
1.4 kristaps 153: .Fn mdoc_node ,
1.2 kristaps 154: the resulting tree is incomplete. Returns 0 on failure, 1 on success.
1.6 kristaps 155: .\" LIST-ITEM
1.4 kristaps 156: .It Fn mdoc_node
157: Returns the first node of the parse. Note that if
1.2 kristaps 158: .Fn mdoc_parseln
159: or
160: .Fn mdoc_endparse
161: returns 0, the tree will be incomplete.
1.4 kristaps 162: .It Fn mdoc_meta
163: Returns the document's parsed meta-data. If this information has not
164: yet been supplied or
165: .Fn mdoc_parseln
166: or
167: .Fn mdoc_endparse
168: returns 0, the data will be incomplete.
169: .El
1.6 kristaps 170: .\" SUBSECTION
171: .Ss Variables
1.4 kristaps 172: The following variables are also defined:
1.9 kristaps 173: .Bl -ohang -offset "XXXX"
1.6 kristaps 174: .\" LIST-ITEM
1.4 kristaps 175: .It Va mdoc_macronames
176: An array of string-ified token names.
1.6 kristaps 177: .\" LIST-ITEM
1.4 kristaps 178: .It Va mdoc_argnames
179: An array of string-ified token argument names.
1.2 kristaps 180: .El
1.6 kristaps 181: .\" SUBSECTION
1.10 kristaps 182: .Ss Character Encoding
183: The
184: .Xr mdoc 3
185: library accepts only printable ASCII characters as defined by
186: .Xr isprint 3 .
1.12 ! kristaps 187: Non-ASCII character sequences are delimited in various ways. All are
! 188: preceded by an escape character
1.10 kristaps 189: .Sq \\
190: and followed by either an open-parenthesis
191: .Sq \&(
192: for two-character sequences; an open-bracket
193: .Sq \&[
194: for n-character sequences (terminated at a close-bracket
195: .Sq \&] ) ;
1.12 ! kristaps 196: an asterisk and open-parenthesis
! 197: .Sq \&*(
! 198: for two-character sequences;
! 199: an asterisk and non-open-parenthesis
! 200: .Sq \&*
! 201: for single-character sequences; or one of a small set of standalone
! 202: single characters for other escapes.
! 203: .Pp
! 204: Examples:
! 205: .Pp
! 206: .Bl -tag -width "XXXXXXXX" -offset "XXXX" -compact
! 207: .\" LIST-ITEM
! 208: .It \\*(<=
! 209: prints
! 210: .Dq \*(<=
! 211: .Pq less-equal
! 212: .\" LIST-ITEM
! 213: .It \\(<-
! 214: prints
! 215: .Dq \(<-
! 216: .Pq left-arrow
! 217: .\" LIST-ITEM
! 218: .It \\[<-]
! 219: also prints
! 220: .Dq \(<-
! 221: .Pq left-arrow
! 222: .\" LIST-ITEM
! 223: .It \\*(Ba
! 224: prints
! 225: .Dq \*(Ba
! 226: .Pq bar
! 227: .\" LIST-ITEM
! 228: .It \\*q
! 229: prints
! 230: .Dq \*q
! 231: .Pq double-quote
! 232: .El
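As a rough illustration of the delimiting rules above, the following C sketch computes the length of a single escape sequence. The helper name escape_len is hypothetical and is not part of the mdoc library. Treating any other character after the backslash as a standalone two-byte escape is an assumption, since the text does not enumerate that set.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the mdoc library: return the total
 * length in bytes of the escape sequence starting at p (including the
 * leading backslash), or 0 if p does not start a complete sequence.
 */
static size_t
escape_len(const char *p)
{
	const char *end;

	if (p[0] != '\\' || p[1] == '\0')
		return 0;

	switch (p[1]) {
	case '(':
		/* \(xx: two-character name. */
		return (p[2] != '\0' && p[3] != '\0') ? 4 : 0;
	case '[':
		/* \[name]: n-character name, closed by ']'. */
		end = strchr(p + 2, ']');
		return end != NULL ? (size_t)(end - p) + 1 : 0;
	case '*':
		if (p[2] == '(')
			/* \*(xx: two-character name. */
			return (p[3] != '\0' && p[4] != '\0') ? 5 : 0;
		/* \*x: single-character name. */
		return p[2] != '\0' ? 3 : 0;
	default:
		/* Assumed: any other character is a standalone escape. */
		return 2;
	}
}
```

Under these assumptions, escape_len gives 4 for the input \(<-, 5 for \[<-], 5 for \*(Ba and 3 for \*q, while a truncated sequence such as \(< gives 0.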
1.10 kristaps 233: .\" SUBSECTION
1.6 kristaps 234: .Ss Abstract Syntax Tree
235: The
236: .Nm
237: functions produce an abstract syntax tree (AST) describing the input
238: lines in a regular form. It may be reviewed at any time with
239: .Fn mdoc_node ;
240: however, if called before
241: .Fn mdoc_endparse ,
242: or after
243: .Fn mdoc_endparse
244: or
245: .Fn mdoc_parseln
246: fails, it may be incomplete.
247: .\" PARAGRAPH
248: .Pp
249: The AST is composed of
250: .Vt struct mdoc_node
251: nodes with block, head, body, element, root and text types as declared
252: by the
253: .Va type
254: field. Each node also provides its parse point (the
255: .Va line ,
256: .Va sec ,
257: and
258: .Va pos
259: fields), its position in the tree (the
260: .Va parent ,
261: .Va child ,
262: .Va next
263: and
264: .Va prev
265: fields) and type-specific data (the
266: .Va data
267: field).
268: .\" PARAGRAPH
269: .Pp
270: The tree itself is arranged according to the following normal form,
271: where capitalised non-terminals represent nodes.
272: .Pp
1.9 kristaps 273: .Bl -tag -width "ELEMENTXX" -compact -offset "XXXX"
1.6 kristaps 274: .\" LIST-ITEM
275: .It ROOT
276: \(<- mnode+
277: .It mnode
278: \(<- BLOCK | ELEMENT | TEXT
279: .It BLOCK
280: \(<- (HEAD [TEXT])+ [BODY [TEXT]] [TAIL [TEXT]]
281: .It BLOCK
282: \(<- BODY [TEXT] [TAIL [TEXT]]
283: .It ELEMENT
284: \(<- TEXT*
285: .It HEAD
286: \(<- mnode+
287: .It BODY
288: \(<- mnode+
289: .It TAIL
290: \(<- mnode+
291: .It TEXT
292: \(<- [[:alpha:]]*
293: .El
294: .\" PARAGRAPH
1.2 kristaps 295: .Pp
1.6 kristaps 296: Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
297: the BLOCK production. These refer to punctuation marks. Furthermore,
1.8 kristaps 298: although a TEXT node will generally have a non-zero-length string, in
299: the specific case of
300: .Sq \&.Bd \-literal ,
1.6 kristaps 301: an empty line will produce a zero-length string.
302: .\" PARAGRAPH
303: .Pp
1.8 kristaps 304: The rule-of-thumb for mapping node types to macros follows. In-line
1.6 kristaps 305: elements, such as
1.8 kristaps 306: .Sq \&.Em foo ,
1.6 kristaps 307: are classified as ELEMENT nodes, which can only contain text.
1.8 kristaps 308: Multi-line elements, such as
309: .Sq \&.Sh ,
1.6 kristaps 310: are BLOCK elements, where the HEAD constitutes line contents and the
311: BODY constitutes subsequent lines. In-line elements with matching
312: pairs, such as
1.8 kristaps 313: .Sq \&.So
1.6 kristaps 314: and
1.8 kristaps 315: .Sq \&.Sc ,
1.6 kristaps 316: are BLOCK elements with no HEAD tag. The only exception to this is
1.8 kristaps 317: .Sq \&.Eo
1.6 kristaps 318: and
1.8 kristaps 319: .Sq \&.Ec ,
1.6 kristaps 320: which have HEAD and TAIL nodes corresponding to the enclosure strings.
1.8 kristaps 321: TEXT nodes, obviously, constitute text, and the ROOT node is the
322: document's root.
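The tree layout described above can be walked with a short depth-first traversal. The sketch below is only an illustration under stated assumptions: struct node and enum ntype are stand-ins invented here, reduced to the type, parent, child, next and prev fields named in the text, and they are not the declarations from mdoc.h. The function name count_type is likewise hypothetical.

```c
#include <stddef.h>

/* Stand-in for the node types named in the text; the real enum may differ. */
enum ntype { N_ROOT, N_BLOCK, N_HEAD, N_BODY, N_TAIL, N_ELEM, N_TEXT };

/* Stand-in for struct mdoc_node, reduced to the tree-position fields. */
struct node {
	enum ntype	 type;
	struct node	*parent;
	struct node	*child;	/* first child */
	struct node	*next;	/* next sibling */
	struct node	*prev;	/* previous sibling */
};

/* Count nodes of the given type in n, its later siblings and their subtrees. */
static size_t
count_type(const struct node *n, enum ntype type)
{
	size_t	 total = 0;

	for (; n != NULL; n = n->next) {
		if (n->type == type)
			total++;
		/* Recurse into the children; a NULL child adds nothing. */
		total += count_type(n->child, type);
	}
	return total;
}
```

Calling count_type on the root node with N_TEXT would count every TEXT node in the document. With the real library, the same loop shape should apply to the pointer returned by mdoc_node, substituting the actual field and type names from mdoc.h.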
1.6 kristaps 323: .\" SECTION
1.2 kristaps 324: .Sh EXAMPLES
325: The following example reads lines from stdin and parses them, operating
326: on the finished parse tree with
327: .Fn parsed .
328: Note that, if the last line of the file isn't newline-terminated, this
329: will truncate the file's last character (see
330: .Xr fgetln 3 ) .
331: Further, this example neither checks every return value nor frees memory upon failure.
1.9 kristaps 332: .Bd -literal -offset "XXXX"
1.2 kristaps 333: struct mdoc *mdoc;
334: const struct mdoc_node *node;
335: char *buf;
336: size_t len;
337: int line;
338:
339: line = 1;
340: mdoc = mdoc_alloc(NULL, NULL);
341:
342: while ((buf = fgetln(stdin, &len))) {
343: buf[len - 1] = '\\0';
344: if ( ! mdoc_parseln(mdoc, line, buf))
345: errx(1, "mdoc_parseln");
346: line++;
347: }
348:
349: if ( ! mdoc_endparse(mdoc))
350: errx(1, "mdoc_endparse");
1.4 kristaps 351: if (NULL == (node = mdoc_node(mdoc)))
352: errx(1, "mdoc_node");
1.2 kristaps 353:
354: parsed(mdoc, node);
355: mdoc_free(mdoc);
356: .Ed
1.6 kristaps 357: .\" SECTION
1.2 kristaps 358: .Sh SEE ALSO
359: .Xr mdoc 7 ,
360: .Xr mdoc.samples 7 ,
361: .Xr groff 1 ,
362: .Xr mdocml 1
1.6 kristaps 363: .\" SECTION
1.2 kristaps 364: .Sh AUTHORS
365: The
366: .Nm
367: library was written by
368: .An Kristaps Dzonsons Aq kristaps@kth.se .
1.6 kristaps 369: .\" SECTION
1.2 kristaps 370: .Sh BUGS
1.4 kristaps 371: Bugs, un-implemented macros and incompatibilities are documented in this
372: section. The baseline for determining whether macro parsing is
373: .Qq incompatible
374: is the default
1.3 kristaps 375: .Xr groff 1
376: system bundled with
377: .Ox .
1.9 kristaps 378: .\" PARAGRAPH
1.3 kristaps 379: .Pp
1.4 kristaps 380: Un-implemented: the
1.2 kristaps 381: .Sq \&Xc
382: and
383: .Sq \&Xo
384: macros aren't handled when used to span lines for the
385: .Sq \&It
386: macro. Such usage is specifically discouraged in
387: .Xr mdoc.samples 7 .
1.9 kristaps 388: .\" PARAGRAPH
1.2 kristaps 389: .Pp
1.4 kristaps 390: Bugs: when
1.2 kristaps 391: .Sq \&It \-column
392: is invoked, whitespace is not stripped around
393: .Sq \&Ta
394: or tab-character separators.
1.9 kristaps 395: .\" PARAGRAPH
396: .Pp
397: Bugs: elements within columns for
398: .Sq \&It \-column
399: are not yet supported.
400: .\" PARAGRAPH
1.3 kristaps 401: .Pp
1.4 kristaps 402: Incompatible: the
1.3 kristaps 403: .Sq \&At
1.4 kristaps 404: macro only accepts a single parameter. Furthermore, several macros
405: .Pf ( Sq \&Pp ,
406: .Sq \&It ,
407: and possibly others) accept multiple arguments with a warning.
1.9 kristaps 408: .\" PARAGRAPH
1.5 kristaps 409: .Pp
410: Incompatible: only those macros specified by
411: .Xr mdoc.samples 7
412: and
413: .Xr mdoc 7
414: for
415: .Ox
416: are supported; support for
417: .Nx
1.6 kristaps 418: and other
419: .Bx
420: systems is in progress.