version 1.6, 2009/02/23 09:33:34 |
version 1.11, 2009/02/25 17:02:47 |
|
|
The |
The |
.Nm mdoc |
.Nm mdoc |
library parses lines of mdoc input into an abstract syntax tree. |
library parses lines of mdoc input into an abstract syntax tree. |
.Dq mdoc |
.Dq mdoc , |
is a macro package of the |
which is used to format BSD manual pages, is a macro package of the |
.Dq roff |
.Dq roff |
language, which is used to format BSD manual pages. The |
language. The |
.Nm |
.Nm |
library implements only those macros documented in the |
library implements only those macros documented in the |
.Xr mdoc 7 |
.Xr mdoc 7 |
and |
and |
.Xr mdoc.samples 7 |
.Xr mdoc.samples 7 |
manuals. |
manuals. Documents with |
|
.Xr refer 1 , |
|
.Xr eqn 1 |
|
and other pre-processor sections aren't accommodated. |
.\" PARAGRAPH |
.\" PARAGRAPH |
.Pp |
.Pp |
.Nm |
.Nm |
Line 89 This section further defines the |
|
Line 92 This section further defines the |
|
.Sx Functions |
.Sx Functions |
and |
and |
.Sx Variables |
.Sx Variables |
available to programmers. The last sub-section, |
available to programmers. Following that, |
|
.Sx Character Encoding |
|
describes input format. Lastly, |
.Sx Abstract Syntax Tree , |
.Sx Abstract Syntax Tree , |
documents the output tree. |
documents the output tree. |
.\" SUBSECTION |
.\" SUBSECTION |
Line 99 Both functions (see |
|
Line 104 Both functions (see |
|
and variables (see |
and variables (see |
.Sx Variables ) |
.Sx Variables ) |
may use the following types: |
may use the following types: |
.Bl -ohang |
.Bl -ohang -offset "XXXX" |
.\" LIST-ITEM |
.\" LIST-ITEM |
.It Vt struct mdoc |
.It Vt struct mdoc |
An opaque type defined in |
An opaque type defined in |
|
|
.\" SUBSECTION |
.\" SUBSECTION |
.Ss Functions |
.Ss Functions |
Function descriptions follow: |
Function descriptions follow: |
.Bl -ohang |
.Bl -ohang -offset "XXXX" |
.\" LIST-ITEM |
.\" LIST-ITEM |
.It Fn mdoc_alloc |
.It Fn mdoc_alloc |
Allocates a parsing structure. The |
Allocates a parsing structure. The |
Line 165 return 0, the data will be incomplete. |
|
Line 170 return 0, the data will be incomplete. |
|
.\" SUBSECTION |
.\" SUBSECTION |
.Ss Variables |
.Ss Variables |
The following variables are also defined: |
The following variables are also defined: |
.Bl -ohang |
.Bl -ohang -offset "XXXX" |
.\" LIST-ITEM |
.\" LIST-ITEM |
.It Va mdoc_macronames |
.It Va mdoc_macronames |
An array of string-ified token names. |
An array of string-ified token names. |
Line 174 An array of string-ified token names. |
|
Line 179 An array of string-ified token names. |
|
An array of string-ified token argument names. |
An array of string-ified token argument names. |
.El |
.El |
.\" SUBSECTION |
.\" SUBSECTION |
|
.Ss Character Encoding |
|
The |
|
.Xr mdoc 3 |
|
library accepts only printable ASCII characters as defined by |
|
.Xr isprint 3 . |
|
Non-ASCII character sequences are escaped with an escape character |
|
.Sq \\ |
|
and followed by either an open-parenthesis |
|
.Sq \&( |
|
for two-character sequences; an open-bracket |
|
.Sq \&[ |
|
for n-character sequences (terminated at a close-bracket |
|
.Sq \&] ) ; |
|
or one of a small set of single characters for other escapes. |
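
The escape rules above can be sketched in C. This is a minimal illustration, not the library's implementation: the names escape_len and line_ok are hypothetical, the C locale is assumed for isprint(3), and the "small set of single characters" is not checked, so any single character after the backslash is accepted.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the mdoc library: return the length
 * in bytes of the escape sequence starting at p (which must point at a
 * backslash), or 0 if the sequence is truncated.
 */
size_t
escape_len(const char *p)
{
	size_t i;

	assert(p != NULL && p[0] == '\\');

	switch (p[1]) {
	case '\0':
		/* Lone backslash at end of input. */
		return 0;
	case '(':
		/* Two-character sequence: backslash, '(', two characters. */
		if (p[2] == '\0' || p[3] == '\0')
			return 0;
		return 4;
	case '[':
		/* n-character sequence, terminated at a close-bracket. */
		for (i = 2; p[i] != '\0'; i++)
			if (p[i] == ']')
				return i + 1;
		return 0;
	default:
		/* Single-character escape; the permitted set is not checked. */
		return 2;
	}
}

/*
 * Hypothetical helper: return 1 if every byte outside an escape
 * sequence is printable per isprint(3) and every escape sequence is
 * complete, 0 otherwise.
 */
int
line_ok(const char *line)
{
	size_t len;

	while (*line != '\0') {
		if (!isprint((unsigned char)*line))
			return 0;
		if (*line == '\\') {
			if ((len = escape_len(line)) == 0)
				return 0;
			line += len;
		} else
			line++;
	}
	return 1;
}
```

A caller might run line_ok over each input line before handing it to the parser. Whether the library itself rejects or warns on such input is not stated here.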
|
.\" SUBSECTION |
.Ss Abstract Syntax Tree |
.Ss Abstract Syntax Tree |
The |
The |
.Nm |
.Nm |
|
|
The tree itself is arranged according to the following normal form, |
The tree itself is arranged according to the following normal form, |
where capitalised non-terminals represent nodes. |
where capitalised non-terminals represent nodes. |
.Pp |
.Pp |
.Bl -tag -width "ELEMENTXX" -compact |
.Bl -tag -width "ELEMENTXX" -compact -offset "XXXX" |
.\" LIST-ITEM |
.\" LIST-ITEM |
.It ROOT |
.It ROOT |
\(<- mnode+ |
\(<- mnode+ |
Line 238 where capitalised non-terminals represent nodes. |
|
Line 258 where capitalised non-terminals represent nodes. |
|
.Pp |
.Pp |
Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of |
Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of |
the BLOCK production. These refer to punctuation marks. Furthermore, |
the BLOCK production. These refer to punctuation marks. Furthermore, |
although a TEXT node will generally have a non-zero-length string, it |
although a TEXT node will generally have a non-zero-length string, in |
certain cases, such as |
the specific case of |
.Dq \&.Bd \-literal , |
.Sq \&.Bd \-literal , |
an empty line will produce a zero-length string. |
an empty line will produce a zero-length string. |
.\" PARAGRAPH |
.\" PARAGRAPH |
.Pp |
.Pp |
The rule-of-thumb for mapping node types to macros follows: in-line |
The rule-of-thumb for mapping node types to macros follows. In-line |
elements, such as |
elements, such as |
.Dq \&.Em foo , |
.Sq \&.Em foo , |
are classified as ELEMENT nodes, which can only contain text. |
are classified as ELEMENT nodes, which can only contain text. |
Multi-line elements such as |
Multi-line elements, such as |
.Dq \&.Sh |
.Sq \&.Sh , |
are BLOCK elements, where the HEAD constitutes line contents and the |
are BLOCK elements, where the HEAD constitutes line contents and the |
BODY constitutes subsequent lines. In-line elements with matching |
BODY constitutes subsequent lines. In-line elements with matching |
pairs, such as |
pairs, such as |
.Dq \&.So |
.Sq \&.So |
and |
and |
.Dq \&.Sc , |
.Sq \&.Sc , |
are BLOCK elements with no HEAD tag. The only exception to this is |
are BLOCK elements with no HEAD tag. The only exception to this is |
.Dq \&.Eo |
.Sq \&.Eo |
and |
and |
.Dq \&.Ec , |
.Sq \&.Ec , |
which has a HEAD and TAIL node corresponding to the enclosure string. |
which has a HEAD and TAIL node corresponding to the enclosure string. |
TEXT nodes, obviously, constitute text; the ROOT node is the document's |
TEXT nodes, obviously, constitute text, and the ROOT node is the |
root. |
document's root. |
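
The rule-of-thumb above can be written down as a small lookup table. The sketch below is illustrative only: the sketch_ and SK_ names are hypothetical and do not come from the library's headers, and only the opening macros named in the text are listed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node classes; the library's real type names may differ. */
enum sketch_type {
	SK_ELEMENT,	/* in-line element, contains only text */
	SK_BLOCK	/* block with optional HEAD, a BODY, optional TAIL */
};

struct sketch_class {
	const char	*macro;		/* opening macro name, without the dot */
	enum sketch_type type;
	int		 has_head;	/* non-zero if a HEAD node is produced */
	int		 has_tail;	/* non-zero if a TAIL node is produced */
};

/* One row per example macro named in the text above. */
static const struct sketch_class sketch_table[] = {
	{ "Em", SK_ELEMENT, 0, 0 },	/* in-line element */
	{ "Sh", SK_BLOCK,   1, 0 },	/* multi-line: HEAD is line contents */
	{ "So", SK_BLOCK,   0, 0 },	/* matching pair So/Sc: no HEAD */
	{ "Eo", SK_BLOCK,   1, 1 },	/* exception Eo/Ec: HEAD and TAIL */
};

/* Return the table row for a macro name, or NULL if it is not listed. */
const struct sketch_class *
sketch_classify(const char *macro)
{
	size_t i;

	assert(macro != NULL);

	for (i = 0; i < sizeof(sketch_table) / sizeof(sketch_table[0]); i++)
		if (strcmp(macro, sketch_table[i].macro) == 0)
			return &sketch_table[i];
	return NULL;
}
```

The table keeps the exception for Eo and Ec as data rather than as a special case in code, which makes it easy to compare against the prose.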
.\" SECTION |
.\" SECTION |
.Sh EXAMPLES |
.Sh EXAMPLES |
The following example reads lines from stdin and parses them, operating |
The following example reads lines from stdin and parses them, operating |
Line 272 Note that, if the last line of the file isn't newline- |
|
Line 292 Note that, if the last line of the file isn't newline- |
|
will truncate the file's last character (see |
will truncate the file's last character (see |
.Xr fgetln 3 ) . |
.Xr fgetln 3 ) . |
Further, this example neither checks for errors nor frees memory upon failure. |
Further, this example neither checks for errors nor frees memory upon failure. |
.Bd -literal |
.Bd -literal -offset "XXXX" |
struct mdoc *mdoc; |
struct mdoc *mdoc; |
struct mdoc_node *node; |
struct mdoc_node *node; |
char *buf; |
char *buf; |
|
|
.Xr groff 1 |
.Xr groff 1 |
system bundled with |
system bundled with |
.Ox . |
.Ox . |
|
.\" PARAGRAPH |
.Pp |
.Pp |
Un-implemented: the |
Un-implemented: the |
.Sq \&Xc |
.Sq \&Xc |
Line 327 macros aren't handled when used to span lines for the |
|
Line 348 macros aren't handled when used to span lines for the |
|
.Sq \&It |
.Sq \&It |
macro. Such usage is specifically discouraged in |
macro. Such usage is specifically discouraged in |
.Xr mdoc.samples 7 . |
.Xr mdoc.samples 7 . |
|
.\" PARAGRAPH |
.Pp |
.Pp |
Bugs: when |
Bugs: when |
.Sq \&It \-column |
.Sq \&It \-column |
is invoked, whitespace is not stripped around |
is invoked, whitespace is not stripped around |
.Sq \&Ta |
.Sq \&Ta |
or tab-character separators. |
or tab-character separators. |
|
.\" PARAGRAPH |
.Pp |
.Pp |
|
Bugs: elements within columns for |
|
.Sq \&It \-column |
|
are not yet supported. |
|
.\" PARAGRAPH |
|
.Pp |
Incompatible: the |
Incompatible: the |
.Sq \&At |
.Sq \&At |
macro only accepts a single parameter. Furthermore, several macros |
macro only accepts a single parameter. Furthermore, several macros |
.Pf ( Sq \&Pp , |
.Pf ( Sq \&Pp , |
.Sq \&It , |
.Sq \&It , |
and possibly others) accept multiple arguments with a warning. |
and possibly others) accept multiple arguments with a warning. |
|
.\" PARAGRAPH |
.Pp |
.Pp |
Incompatible: only those macros specified by |
Incompatible: only those macros specified by |
.Xr mdoc.samples 7 |
.Xr mdoc.samples 7 |