Annotation of mandoc/mdoc.3, Revision 1.10
.\" $Id: mdoc.3,v 1.9 2009/02/23 15:19:47 kristaps Exp $
.\"
.\" Copyright (c) 2009 Kristaps Dzonsons <kristaps@kth.se>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the
.\" above copyright notice and this permission notice appear in all
.\" copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL
.\" WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED
.\" WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE
.\" AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL
.\" DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR
.\" PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
.\" TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
.\" PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate$
.Dt mdoc 3
.Os
.\" SECTION
.Sh NAME
.Nm mdoc_alloc ,
.Nm mdoc_parseln ,
.Nm mdoc_endparse ,
.Nm mdoc_node ,
.Nm mdoc_meta ,
.Nm mdoc_free
.Nd mdoc macro compiler library
.\" SECTION
.Sh SYNOPSIS
.Fd #include <mdoc.h>
.Vt extern const char * const * mdoc_macronames;
.Vt extern const char * const * mdoc_argnames;
.Ft "struct mdoc *"
.Fn mdoc_alloc "void *data" "const struct mdoc_cb *cb"
.Ft void
.Fn mdoc_free "struct mdoc *mdoc"
.Ft int
.Fn mdoc_parseln "struct mdoc *mdoc" "int line" "char *buf"
.Ft "const struct mdoc_node *"
.Fn mdoc_node "struct mdoc *mdoc"
.Ft "const struct mdoc_meta *"
.Fn mdoc_meta "struct mdoc *mdoc"
.Ft int
.Fn mdoc_endparse "struct mdoc *mdoc"
.\" SECTION
.Sh DESCRIPTION
The
.Nm mdoc
library parses lines of mdoc input into an abstract syntax tree.
.Dq mdoc ,
which is used to format BSD manual pages, is a macro package of the
.Dq roff
language. The
.Nm
library implements only those macros documented in the
.Xr mdoc 7
and
.Xr mdoc.samples 7
manuals.
.\" PARAGRAPH
.Pp
.Nm
is
.Ud
.\" PARAGRAPH
.Pp
In general, applications initiate a parsing sequence with
.Fn mdoc_alloc ,
parse each line in a document with
.Fn mdoc_parseln ,
close the parsing session with
.Fn mdoc_endparse ,
operate over the syntax tree and meta-data returned by
.Fn mdoc_node
and
.Fn mdoc_meta ,
then free all allocated memory with
.Fn mdoc_free .
See the
.Sx EXAMPLES
section for a full example.
.\" PARAGRAPH
.Pp
This section further defines the
.Sx Types ,
.Sx Functions
and
.Sx Variables
available to programmers. Following that,
.Sx Character Encoding
describes the input format. Lastly,
.Sx Abstract Syntax Tree
documents the output tree.
.\" SUBSECTION
.Ss Types
Both functions (see
.Sx Functions )
and variables (see
.Sx Variables )
may use the following types:
.Bl -ohang -offset "XXXX"
.\" LIST-ITEM
.It Vt struct mdoc
An opaque type defined in
.Pa mdoc.c .
Its fields are only accessed within the library.
.\" LIST-ITEM
.It Vt struct mdoc_cb
A set of message callbacks defined in
.Pa mdoc.h .
.\" LIST-ITEM
.It Vt struct mdoc_node
A parsed node. Defined in
.Pa mdoc.h .
See
.Sx Abstract Syntax Tree
for details.
.\" LIST-ITEM
.It Vt struct mdoc_meta
The document's parsed meta-data, as returned by
.Fn mdoc_meta .
Defined in
.Pa mdoc.h .
.El
.\" SUBSECTION
.Ss Functions
Function descriptions follow:
.Bl -ohang -offset "XXXX"
.\" LIST-ITEM
.It Fn mdoc_alloc
Allocates a parsing structure. The
.Fa data
pointer is passed to callbacks in
.Fa cb ,
which are documented further in the header file. Returns NULL on
failure. If non-NULL, the pointer must be freed with
.Fn mdoc_free .
.\" LIST-ITEM
.It Fn mdoc_free
Frees all resources of a parser. The pointer is no longer valid after
invocation.
.\" LIST-ITEM
.It Fn mdoc_parseln
Parses a NUL-terminated line of input whose line number is given by
.Fa line .
This line should not contain the trailing newline. Returns 0 on
failure, 1 on success. The input buffer
.Fa buf
is modified by this function.
.\" LIST-ITEM
.It Fn mdoc_endparse
Signals that the parse is complete. Returns 0 on failure, 1 on success.
Note that if
.Fn mdoc_node
is called before
.Fn mdoc_endparse ,
the tree it returns is incomplete.
.\" LIST-ITEM
.It Fn mdoc_node
Returns the first node of the parse. Note that if
.Fn mdoc_parseln
or
.Fn mdoc_endparse
has returned 0, the tree will be incomplete.
.\" LIST-ITEM
.It Fn mdoc_meta
Returns the document's parsed meta-data. If this information has not
yet been supplied, or if
.Fn mdoc_parseln
or
.Fn mdoc_endparse
has returned 0, the data will be incomplete.
.El
.\" SUBSECTION
.Ss Variables
The following variables are also defined:
.Bl -ohang -offset "XXXX"
.\" LIST-ITEM
.It Va mdoc_macronames
An array of stringified macro (token) names.
.\" LIST-ITEM
.It Va mdoc_argnames
An array of stringified macro argument names.
.El
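.Pp
Arrays of stringified names are typically used to map a numeric token back
to text, for example when printing diagnostics.
The sketch below assumes, without confirming it against
.Pa mdoc.h ,
that
.Va mdoc_macronames
is indexed by token value.
The
.Vt enum tok
values and the three-entry table are hypothetical stand-ins for the
library's own declarations.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token values; the real enumeration is declared in mdoc.h. */
enum tok { TOK_Sh, TOK_Ss, TOK_Pp, TOK_MAX };

/* Stand-in for mdoc_macronames: one name per token, in token order. */
static const char *const names[TOK_MAX] = { "Sh", "Ss", "Pp" };
static const char *const *macronames = names;

/* Map a token to its printable name, or NULL if it is out of range. */
static const char *
tok2name(enum tok t)
{
	if ((int)t < 0 || (int)t >= TOK_MAX)
		return NULL;
	return macronames[t];
}

/* Reverse lookup by linear search; returns -1 for unknown names. */
static int
name2tok(const char *name)
{
	int	 i;

	for (i = 0; i < TOK_MAX; i++)
		if (0 == strcmp(macronames[i], name))
			return i;
	return -1;
}
```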
.\" SUBSECTION
.Ss Character Encoding
The
.Nm
library accepts only printable ASCII characters as defined by
.Xr isprint 3 .
Non-ASCII characters are written as escape sequences introduced by the
escape character
.Sq \\
and followed by either an open-parenthesis
.Sq \&(
for two-character sequences; an open-bracket
.Sq \&[
for n-character sequences (terminated at a close-bracket
.Sq \&] ) ;
or one of a small set of single characters for other escapes.
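.Pp
The rules above can be sketched in C as follows.
This is not the library's implementation.
The function names are hypothetical, and
.Fn escape_len
does not check whether a single-character escape belongs to the
supported set.

```c
#include <ctype.h>
#include <stddef.h>

/* Return 1 if every byte of the NUL-terminated line satisfies isprint(3). */
static int
line_is_printable(const char *p)
{
	for ( ; *p != '\0'; p++)
		if ( ! isprint((unsigned char)*p))
			return 0;
	return 1;
}

/*
 * Given p pointing at an escape character, return the number of bytes
 * the escape sequence occupies, or 0 if it is malformed or truncated.
 */
static size_t
escape_len(const char *p)
{
	size_t	 i;

	if (p[0] != '\\' || p[1] == '\0')
		return 0;

	switch (p[1]) {
	case '(':
		/* Two-character sequence, e.g. \(lq: four bytes in all. */
		return (p[2] != '\0' && p[3] != '\0') ? 4 : 0;
	case '[':
		/* n-character sequence, e.g. \[bullet], up to the close-bracket. */
		for (i = 2; p[i] != '\0'; i++)
			if (p[i] == ']')
				return i + 1;
		return 0;
	default:
		/* Single-character escape: the escape plus one character. */
		return 2;
	}
}
```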
.\" SUBSECTION
.Ss Abstract Syntax Tree
The
.Nm
functions produce an abstract syntax tree (AST) describing the input
lines in a regular form. It may be reviewed at any time with
.Fn mdoc_node ;
however, if it is retrieved before
.Fn mdoc_endparse
has been called, or after
.Fn mdoc_parseln
or
.Fn mdoc_endparse
has failed, it may be incomplete.
.\" PARAGRAPH
.Pp
The AST is composed of
.Vt struct mdoc_node
nodes with block, head, body, element, root and text types as declared
by the
.Va type
field. Each node also provides its parse point (the
.Va line ,
.Va sec ,
and
.Va pos
fields), its position in the tree (the
.Va parent ,
.Va child ,
.Va next
and
.Va prev
fields) and type-specific data (the
.Va data
field).
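.Pp
The link fields support the usual first-child, next-sibling traversal.
The sketch below uses a simplified, hypothetical
.Vt struct node
in place of
.Vt struct mdoc_node ,
keeping only the
.Va type ,
.Va line ,
.Va pos
and link fields (the real structure also carries
.Va sec
and
.Va data ) .
It appends children while maintaining all four links, then walks the
tree depth-first to count the nodes of each type.

```c
#include <stddef.h>

/* Hypothetical node types mirroring those named above. */
enum ntype { N_ROOT, N_BLOCK, N_HEAD, N_BODY, N_TAIL, N_ELEM, N_TEXT, N_MAX };

/* Simplified, hypothetical stand-in for struct mdoc_node. */
struct node {
	enum ntype	 type;
	int		 line;		/* parse point: input line */
	int		 pos;		/* parse point: offset in line */
	struct node	*parent;
	struct node	*child;		/* first child */
	struct node	*next;		/* next sibling */
	struct node	*prev;		/* previous sibling */
};

/* Append c as the last child of p, keeping parent and sibling links. */
static void
append_child(struct node *p, struct node *c)
{
	struct node	*last;

	c->parent = p;
	c->next = c->prev = NULL;
	if (p->child == NULL) {
		p->child = c;
		return;
	}
	for (last = p->child; last->next != NULL; last = last->next)
		continue;
	last->next = c;
	c->prev = last;
}

/* Pre-order walk over n and its following siblings, tallying types. */
static void
count_types(const struct node *n, size_t counts[N_MAX])
{
	for ( ; n != NULL; n = n->next) {
		counts[n->type]++;
		count_types(n->child, counts);
	}
}
```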
.\" PARAGRAPH
.Pp
The tree itself is arranged according to the following normal form,
where capitalised non-terminals represent nodes.
.Pp
.Bl -tag -width "ELEMENTXX" -compact -offset "XXXX"
.\" LIST-ITEM
.It ROOT
\(<- mnode+
.It mnode
\(<- BLOCK | ELEMENT | TEXT
.It BLOCK
\(<- (HEAD [TEXT])+ [BODY [TEXT]] [TAIL [TEXT]]
.It BLOCK
\(<- BODY [TEXT] [TAIL [TEXT]]
.It ELEMENT
\(<- TEXT*
.It HEAD
\(<- mnode+
.It BODY
\(<- mnode+
.It TAIL
\(<- mnode+
.It TEXT
\(<- [[:print:]]*
.El
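.Pp
Several of these productions can be checked mechanically.
The following sketch uses a minimal hypothetical node type rather than
.Vt struct mdoc_node .
It verifies that ROOT, HEAD, BODY and TAIL hold one or more mnodes, that an
ELEMENT holds only TEXT, and that TEXT is a leaf.
It does not check the ordering rules of the BLOCK productions.

```c
#include <stddef.h>

/* Hypothetical node types mirroring the normal form above. */
enum ntype { N_ROOT, N_BLOCK, N_HEAD, N_BODY, N_TAIL, N_ELEM, N_TEXT };

/* Minimal hypothetical node: type plus first-child/next-sibling links. */
struct node {
	enum ntype	 type;
	struct node	*child;
	struct node	*next;
};

/* mnode <- BLOCK | ELEMENT | TEXT */
static int
is_mnode(enum ntype t)
{
	return t == N_BLOCK || t == N_ELEM || t == N_TEXT;
}

/* Return 1 if n and its descendants satisfy the checked rules, else 0. */
static int
check_node(const struct node *n)
{
	const struct node	*c;

	switch (n->type) {
	case N_ROOT:
	case N_HEAD:
	case N_BODY:
	case N_TAIL:
		/* ROOT, HEAD, BODY, TAIL <- mnode+ */
		if (n->child == NULL)
			return 0;
		for (c = n->child; c != NULL; c = c->next)
			if ( ! is_mnode(c->type))
				return 0;
		break;
	case N_ELEM:
		/* ELEMENT <- TEXT* */
		for (c = n->child; c != NULL; c = c->next)
			if (c->type != N_TEXT)
				return 0;
		break;
	case N_TEXT:
		/* TEXT is a leaf. */
		if (n->child != NULL)
			return 0;
		break;
	default:
		/* BLOCK ordering is not checked here. */
		break;
	}

	/* Recurse into the children. */
	for (c = n->child; c != NULL; c = c->next)
		if ( ! check_node(c))
			return 0;
	return 1;
}
```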
.\" PARAGRAPH
.Pp
Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
the BLOCK production. These contain punctuation marks. Furthermore,
although a TEXT node will generally have a non-zero-length string, in
the specific case of
.Sq \&.Bd \-literal ,
an empty line will produce a zero-length string.
.\" PARAGRAPH
.Pp
The rule of thumb for mapping macros to node types follows. In-line
elements, such as
.Sq \&.Em foo ,
are classified as ELEMENT nodes, which can only contain text.
Multi-line elements, such as
.Sq \&.Sh ,
are BLOCK elements, where the HEAD constitutes line contents and the
BODY constitutes subsequent lines. In-line elements with matching
pairs, such as
.Sq \&.So
and
.Sq \&.Sc ,
are BLOCK elements with no HEAD node. The only exceptions are
.Sq \&.Eo
and
.Sq \&.Ec ,
which have HEAD and TAIL nodes corresponding to the enclosure strings.
TEXT nodes contain text, and the ROOT node is the document's root.
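.Pp
The rule of thumb above amounts to a lookup from macro name to node
shape.
The table below is a hypothetical illustration that covers only the
macros named in this section.
It is not how the library stores this information.

```c
#include <stddef.h>
#include <string.h>

/* Node shapes produced by a macro, following the rule of thumb above. */
enum shape {
	SHAPE_UNKNOWN,
	SHAPE_ELEMENT,		/* in-line: ELEMENT holding only TEXT */
	SHAPE_HEAD_BODY,	/* multi-line: BLOCK with HEAD and BODY */
	SHAPE_NO_HEAD,		/* paired in-line: BLOCK without a HEAD */
	SHAPE_HEAD_TAIL		/* .Eo/.Ec: HEAD and TAIL hold the enclosure */
};

/* Hypothetical table covering only the macros named in the text. */
static const struct {
	const char	*name;
	enum shape	 shape;
} shapes[] = {
	{ "Em", SHAPE_ELEMENT },
	{ "Sh", SHAPE_HEAD_BODY },
	{ "So", SHAPE_NO_HEAD },
	{ "Sc", SHAPE_NO_HEAD },
	{ "Eo", SHAPE_HEAD_TAIL },
	{ "Ec", SHAPE_HEAD_TAIL },
};

/* Return the node shape for a macro name, or SHAPE_UNKNOWN. */
static enum shape
macro_shape(const char *name)
{
	size_t	 i;

	for (i = 0; i < sizeof(shapes) / sizeof(shapes[0]); i++)
		if (0 == strcmp(shapes[i].name, name))
			return shapes[i].shape;
	return SHAPE_UNKNOWN;
}
```

The closing macros map to the same shape as their openers because both
delimit the same BLOCK.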
.\" SECTION
.Sh EXAMPLES
The following example reads lines from standard input and parses them,
operating on the finished parse tree with
.Fn parsed ,
a function supplied by the caller.
Note that, if the last line of the file isn't newline-terminated, this
will truncate the file's last character (see
.Xr fgetln 3 ) .
Further, this example exits on failure without freeing memory.
.Bd -literal -offset "XXXX"
struct mdoc *mdoc;
const struct mdoc_node *node;
char *buf;
size_t len;
int line;

line = 1;
if (NULL == (mdoc = mdoc_alloc(NULL, NULL)))
	errx(1, "mdoc_alloc");

while ((buf = fgetln(stdin, &len))) {
	buf[len - 1] = '\\0';
	if ( ! mdoc_parseln(mdoc, line, buf))
		errx(1, "mdoc_parseln");
	line++;
}

if ( ! mdoc_endparse(mdoc))
	errx(1, "mdoc_endparse");
if (NULL == (node = mdoc_node(mdoc)))
	errx(1, "mdoc_node");

parsed(mdoc, node);
mdoc_free(mdoc);
.Ed
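.Pp
The truncation caveat above comes from overwriting the last byte of
every line.
A sketch of an alternative reader follows.
It uses the POSIX
.Xr getline 3
function, removes a trailing newline only when one is present, and hands
each line to a caller-supplied function.
The
.Vt line_fn
type,
.Fn read_lines
and the example callback
.Fn count_bytes
are hypothetical.
A real caller would pass a wrapper that calls
.Fn mdoc_parseln
with its
.Vt struct mdoc
pointer as
.Fa arg .

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical per-line callback: returns 0 on failure, like mdoc_parseln. */
typedef int (*line_fn)(void *arg, int line, char *buf);

/*
 * Read fp with getline(3), strip a trailing newline only if present (so
 * an unterminated last line keeps its final character), and pass each
 * NUL-terminated line, numbered from 1, to fn.  Returns the number of
 * lines read, or -1 if fn fails.  Read errors simply end the loop.
 */
static int
read_lines(FILE *fp, line_fn fn, void *arg)
{
	char	*buf = NULL;
	size_t	 cap = 0;
	ssize_t	 len;
	int	 line = 0;

	while ((len = getline(&buf, &cap, fp)) != -1) {
		if (len > 0 && buf[len - 1] == '\n')
			buf[len - 1] = '\0';
		if ( ! fn(arg, ++line, buf)) {
			free(buf);
			return -1;
		}
	}
	free(buf);
	return line;
}

/* Example callback: accumulate the total length of all lines in *arg. */
static int
count_bytes(void *arg, int line, char *buf)
{
	(void)line;
	*(size_t *)arg += strlen(buf);
	return 1;
}
```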
.\" SECTION
.Sh SEE ALSO
.Xr mdoc 7 ,
.Xr mdoc.samples 7 ,
.Xr groff 1 ,
.Xr mdocml 1
.\" SECTION
.Sh AUTHORS
The
.Nm
library was written by
.An Kristaps Dzonsons Aq kristaps@kth.se .
.\" SECTION
.Sh BUGS
Bugs, unimplemented macros and incompatibilities are documented in this
section. The baseline for determining whether macro parsing is
.Qq incompatible
is the default
.Xr groff 1
system bundled with
.Ox .
.\" PARAGRAPH
.Pp
Unimplemented: the
.Sq \&Xc
and
.Sq \&Xo
macros aren't handled when used to span lines for the
.Sq \&It
macro. Such usage is specifically discouraged in
.Xr mdoc.samples 7 .
.\" PARAGRAPH
.Pp
Bugs: when
.Sq \&It \-column
is invoked, whitespace is not stripped around
.Sq \&Ta
or tab-character separators.
.\" PARAGRAPH
.Pp
Bugs: elements within columns for
.Sq \&It \-column
are not yet supported.
.\" PARAGRAPH
.Pp
Incompatible: the
.Sq \&At
macro only accepts a single parameter. Furthermore, several macros
.Pf ( Sq \&Pp ,
.Sq \&It ,
and possibly others) accept multiple arguments with a warning.
.\" PARAGRAPH
.Pp
Incompatible: only those macros specified by
.Xr mdoc.samples 7
and
.Xr mdoc 7
for
.Ox
are supported; support for
.Nx
and other
.Bx
systems is in progress.