File: mandoc / mandoc_html.3

Revision 1.20, Fri Mar 13 15:32:28 2020 UTC by schwarze
Branch: MAIN
Changes since 1.19: +119 -18 lines

Split tagging into a validation part including prioritization
in tag.{h,c} and {mdoc,man}_validate.c
and into a formatting part including command line argument checking
in term_tag.{h,c}, html.c, and {mdoc,man}_{term,html}.c.

Immediate functional benefits include:
* Improved prioritization of automatic tags for .Em and .Sy.
* Avoiding bogus automatic tags when .Em, .Fn, or .Sy are explicitly tagged.
* Explicit tagging of .Er and .Fl now works in HTML output.
* Automatic tagging of .IP and .TP now works in HTML output.
But mainly, this patch provides clean earth to build further improvements on.

Technical changes:
* Main program: Write a tag file for ASCII and UTF-8 output only.
* All formatters: There is no more need to delay writing the tags.
* mdoc(7)+man(7) formatters: No more need for elaborate syntax tree inspection.
* HTML formatter: If available, use the "string" attribute as the tag.
* HTML formatter: New function to write permalinks, to reduce code duplication.

Style cleanup in the vicinity while here:
* mdoc(7) terminal formatter: To set up bold font for children,
defer to termp_bold_pre() rather than calling term_fontpush() manually.
* mdoc(7) terminal formatter: Garbage collect some duplicate functions.
* mdoc(7) HTML formatter: Unify <code> handling, delete redundant functions.
* Where possible, use switch statements rather than if cascades.
* Get rid of some more Yoda notation.

The necessity for such changes was first discussed with kn@, but I didn't
bother him with a request to review the resulting -673/+782 line patch.

.\"	$Id: mandoc_html.3,v 1.20 2020/03/13 15:32:28 schwarze Exp $
.\"
.\" Copyright (c) 2014, 2017, 2018 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: March 13 2020 $
.Dt MANDOC_HTML 3
.Os
.Sh NAME
.Nm mandoc_html
.Nd internals of the mandoc HTML formatter
.Sh SYNOPSIS
.In "html.h"
.Ft void
.Fn print_gen_decls "struct html *h"
.Ft void
.Fn print_gen_comment "struct html *h" "struct roff_node *n"
.Ft void
.Fn print_gen_head "struct html *h"
.Ft struct tag *
.Fo print_otag
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *fmt"
.Fa ...
.Fc
.Ft void
.Fo print_tagq
.Fa "struct html *h"
.Fa "const struct tag *until"
.Fc
.Ft void
.Fo print_stagq
.Fa "struct html *h"
.Fa "const struct tag *suntil"
.Fc
.Ft void
.Fo print_text
.Fa "struct html *h"
.Fa "const char *word"
.Fc
.Ft char *
.Fo html_make_id
.Fa "const struct roff_node *n"
.Fa "int unique"
.Fc
.Ft struct tag *
.Fo print_otag_id
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *cattr"
.Fa "struct roff_node *n"
.Fc
.Sh DESCRIPTION
The mandoc HTML formatter is not a formal library.
However, as it is compiled into more than one program, in particular
.Xr mandoc 1
and
.Xr man.cgi 8 ,
and because it may be security-critical in some contexts,
some documentation is useful to help use it correctly
and to prevent XSS vulnerabilities.
.Pp
The formatter produces HTML output on the standard output.
Since proper escaping is usually required and best taken care of
at one central place, the language-specific formatters
.Po
.Pa *_html.c ,
see
.Sx FILES
.Pc
are not supposed to print directly to
.Dv stdout
using functions like
.Xr printf 3 ,
.Xr putc 3 ,
.Xr puts 3 ,
or
.Xr write 2 .
Instead, they are expected to use the output functions declared in
.Pa html.h
and implemented as part of the main HTML formatting engine in
.Pa html.c .
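.Pp
The following minimal sketch illustrates why escaping is best done
at one central place.
It is not the code from
.Pa html.c ;
the function name
.Fn sketch_encode
and its buffer-based interface are hypothetical,
and only the four characters that are special in HTML text and in
double-quoted attribute values are handled.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a central encoder: copy src to dst,
 * replacing the characters that are special in HTML.
 * Returns the number of bytes written, not counting the
 * terminating NUL, or (size_t)-1 if dst is too small.
 */
size_t
sketch_encode(char *dst, size_t dstsz, const char *src)
{
	const char	*rep;
	char		 one[2];
	size_t		 len, used;

	assert(dst != NULL && src != NULL);
	one[1] = 0;
	used = 0;
	for (; *src != 0; src++) {
		switch (*src) {
		case '<':
			rep = "&lt;";
			break;
		case '>':
			rep = "&gt;";
			break;
		case '&':
			rep = "&amp;";
			break;
		case '"':
			rep = "&quot;";
			break;
		default:
			one[0] = *src;	/* ordinary byte, copied as is */
			rep = one;
			break;
		}
		len = strlen(rep);
		if (used + len >= dstsz)	/* keep room for the NUL */
			return (size_t)-1;
		memcpy(dst + used, rep, len);
		used += len;
	}
	if (used >= dstsz)
		return (size_t)-1;
	dst[used] = 0;
	return used;
}
```
.Ed
.Pp
If every byte of document content passes through one such function,
a language-specific formatter cannot forget to escape a particular
string, which is the kind of omission that would open up an XSS hole.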
.Ss Data structures
These structures are declared in
.Pa html.h .
.Bl -tag -width Ds
.It Vt struct html
Internal state of the HTML formatter.
.It Vt struct tag
One entry for the LIFO stack of HTML elements.
Members are
.Fa "enum htmltag tag"
and
.Fa "struct tag *next" .
.El
.Ss Private interface functions
The function
.Fn print_gen_decls
prints the opening
.Ao Pf \&? Ic xml ? Ac
and
.Aq Pf \&! Ic DOCTYPE
declarations required for the current document type.
.Pp
The function
.Fn print_gen_comment
prints the leading comments, usually containing a copyright notice
and license, as an HTML comment.
It is intended to be called right after opening the
.Aq Ic HTML
element.
Pass the first
.Dv ROFFT_COMMENT
node in
.Fa n .
.Pp
The function
.Fn print_gen_head
prints the opening
.Aq Ic META
and
.Aq Ic LINK
elements for the document
.Aq Ic HEAD ,
using the
.Fa style
member of
.Fa h
unless that is
.Dv NULL .
It uses
.Fn print_otag ,
which takes care of properly encoding attributes;
this is relevant for the
.Fa style
link in particular.
.Pp
The function
.Fn print_otag
prints the start tag of an HTML element with the name
.Fa tag ,
optionally including the attributes specified by
.Fa fmt .
If
.Fa fmt
is the empty string, no attributes are written.
Each letter of
.Fa fmt
specifies one attribute to write.
Most attributes require one
.Vt char *
argument which becomes the value of the attribute.
The arguments have to be given in the same order as the attribute letters.
If an argument is
.Dv NULL ,
the respective attribute is not written.
.Bl -tag -width 1n -offset indent
.It Cm c
Print a
.Cm class
attribute.
.It Cm h
Print a
.Cm href
attribute.
This attribute letter can optionally be followed by a modifier letter.
If followed by
.Cm R ,
it formats the link as a local one by prefixing a
.Sq #
character.
If followed by
.Cm I ,
it interprets the argument as a header file name
and generates a link using the
.Xr mandoc 1
.Fl O Cm includes
option.
If followed by
.Cm M ,
it takes two arguments instead of one, a manual page name and
section, and formats them as a link to a manual page using the
.Xr mandoc 1
.Fl O Cm man
option.
.It Cm i
Print an
.Cm id
attribute.
.It Cm \&?
Print an arbitrary attribute.
This format letter requires two
.Vt char *
arguments, the attribute name and the value.
The name must not be
.Dv NULL .
.It Cm s
Print a
.Cm style
attribute.
If present, it must be the last format letter.
It requires two
.Vt char *
arguments.
The first is the name of the style property, the second its value.
The name must not be
.Dv NULL .
The
.Cm s
.Fa fmt
letter can be repeated, each repetition requiring an additional pair of
.Vt char *
arguments.
.El
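.Pp
As an illustration of how each
.Fa fmt
letter consumes its arguments, here is a simplified model.
It is a sketch under stated assumptions, not the implementation in
.Pa html.c :
the names
.Fn sketch_cat
and
.Fn sketch_otag
are hypothetical, the element name is passed as a string rather than as an
.Vt enum htmltag ,
the output goes to a caller-supplied buffer,
only the letters
.Cm c ,
.Cm i ,
.Cm h
(optionally followed by
.Cm R ) ,
and
.Cm \&?
are modeled, and no HTML encoding is performed.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Append s to the string in buf unless it would not fit. */
static void
sketch_cat(char *buf, size_t sz, const char *s)
{
	size_t	 used, len;

	used = strlen(buf);
	len = strlen(s);
	if (used + len < sz)
		memcpy(buf + used, s, len + 1);
}

/*
 * Hypothetical, simplified model of the fmt argument.
 * Supported letters: c, i, h (optionally followed by R), and ?.
 * Unlike the real formatter, no HTML encoding is performed.
 */
void
sketch_otag(char *buf, size_t sz, const char *name, const char *fmt, ...)
{
	const char	 quote[] = { '"', 0 };
	const char	*attr, *val, *pre;
	va_list		 ap;

	assert(sz > 0);
	buf[0] = 0;
	sketch_cat(buf, sz, "<");
	sketch_cat(buf, sz, name);
	va_start(ap, fmt);
	while (*fmt != 0) {
		pre = "";
		switch (*fmt++) {
		case 'c':
			attr = "class";
			break;
		case 'i':
			attr = "id";
			break;
		case 'h':
			attr = "href";
			if (*fmt == 'R') {
				pre = "#";	/* local link */
				fmt++;
			}
			break;
		case '?':
			attr = va_arg(ap, char *);	/* name, then value */
			break;
		default:
			attr = NULL;	/* letter not modeled here */
			break;
		}
		assert(attr != NULL);
		val = va_arg(ap, char *);
		if (val == NULL)
			continue;	/* attribute is not written */
		sketch_cat(buf, sz, " ");
		sketch_cat(buf, sz, attr);
		sketch_cat(buf, sz, "=");
		sketch_cat(buf, sz, quote);
		sketch_cat(buf, sz, pre);
		sketch_cat(buf, sz, val);
		sketch_cat(buf, sz, quote);
	}
	va_end(ap);
	sketch_cat(buf, sz, ">");
}
```
.Ed
.Pp
In this model, the format
.Qq chR
consumes two string arguments, a class name and a link target,
while
.Qq \&?
consumes an attribute name followed by its value.
A
.Dv NULL
value has to be passed with a cast to
.Vt char *
because the arguments are variadic.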
.Pp
.Fn print_otag
uses the private function
.Fn print_encode
to take care of HTML encoding.
If required by the element type, it remembers in
.Fa h
that the element is open.
The function
.Fn print_tagq
is used to close out all open elements up to and including
.Fa until ;
.Fn print_stagq
is a variant to close out all open elements up to but excluding
.Fa suntil .
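.Pp
The difference between the two closing functions can be sketched
with a small model of the element stack.
All names starting with
.Sq sketch_
are hypothetical, elements are identified by strings rather than by
.Vt enum htmltag ,
and the markup is collected in a fixed-size buffer instead of being
written to the standard output.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct sketch_tag {
	const char		*name;	/* element name */
	struct sketch_tag	*next;	/* element opened before it */
};

struct sketch_html {
	struct sketch_tag	*top;		/* innermost open element */
	char			 out[256];	/* generated markup */
};

/* Append pre, name, and a closing angle bracket to the output. */
static void
sketch_emit(struct sketch_html *h, const char *pre, const char *name)
{
	size_t	 used;

	used = strlen(h->out);
	assert(used + strlen(pre) + strlen(name) + 2 <= sizeof(h->out));
	strcpy(h->out + used, pre);
	strcat(h->out + used, name);
	strcat(h->out + used, ">");
}

/* Open an element and push it onto the stack. */
struct sketch_tag *
sketch_open(struct sketch_html *h, const char *name)
{
	struct sketch_tag	*t;

	if ((t = malloc(sizeof(*t))) == NULL)
		abort();
	t->name = name;
	t->next = h->top;
	h->top = t;
	sketch_emit(h, "<", name);
	return t;
}

/* Close and free the innermost open element. */
static void
sketch_pop(struct sketch_html *h)
{
	struct sketch_tag	*t;

	t = h->top;
	h->top = t->next;
	sketch_emit(h, "</", t->name);
	free(t);
}

/* Close all open elements up to and including until. */
void
sketch_tagq(struct sketch_html *h, const struct sketch_tag *until)
{
	int	 last;

	while (h->top != NULL) {
		last = h->top == until;
		sketch_pop(h);
		if (last)
			break;
	}
}

/* Close all open elements up to but excluding suntil. */
void
sketch_stagq(struct sketch_html *h, const struct sketch_tag *suntil)
{
	while (h->top != NULL && h->top != suntil)
		sketch_pop(h);
}
```
.Ed
.Pp
Starting from a zero-initialized
.Vt struct sketch_html ,
opening three nested elements and calling
.Fn sketch_stagq
with the middle one closes only the innermost element,
whereas calling
.Fn sketch_tagq
with the outermost one closes everything that is still open.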
.Pp
The function
.Fn print_text
prints HTML element content.
It uses the private function
.Fn print_encode
to take care of HTML encoding.
If the document has requested a non-standard font, for example using a
.Xr roff 7
.Ic \ef
font escape sequence,
.Fn print_text
wraps
.Fa word
in an HTML font selection element using the
.Fn print_otag
and
.Fn print_tagq
functions.
.Pp
The function
.Fn html_make_id
allocates a string to be used for the
.Cm id
attribute of an HTML element and/or as a fragment identifier for a URI in an
.Aq Ic A
element.
If
.Fa n
contains a
.Fa string
attribute, it is used; otherwise, child nodes are used.
If
.Fa n
is an
.Ic \&Sh ,
.Ic \&Ss ,
.Ic \&Sx ,
.Ic SH ,
or
.Ic SS
node, the resulting string is the concatenation of the child strings;
for other node types, only the first child is used.
Bytes not permitted in URI-fragment strings are replaced by underscores.
If any of the children to be used is not a text node,
no string is generated and
.Dv NULL
is returned instead.
If the
.Fa unique
argument is non-zero, deduplication is performed by appending an
underscore and a decimal integer, if necessary.
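.Pp
The byte replacement and the deduplication can be sketched as follows.
This is a model under several assumptions, not the code from
.Pa html.c :
the function
.Fn sketch_make_id
is hypothetical and takes a plain string rather than a
.Vt struct roff_node ,
the set of permitted bytes is taken from the fragment syntax of RFC 3986
without percent-encoding,
a small fixed array stands in for the hash table,
and starting the counter at 2 is a guess.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAXIDS	64

static char	*sketch_ids[SKETCH_MAXIDS];	/* unique ids so far */
static size_t	 sketch_nids;

/* Report whether id has been handed out as a unique id before. */
static int
sketch_seen(const char *id)
{
	size_t	 i;

	for (i = 0; i < sketch_nids; i++)
		if (strcmp(sketch_ids[i], id) == 0)
			return 1;
	return 0;
}

/*
 * Hypothetical model of identifier generation from one string.
 * If unique is non-zero, the table owns the result;
 * otherwise, the caller has to free it.
 */
char *
sketch_make_id(const char *text, int unique)
{
	char	*buf, *cp;
	size_t	 len, sz;
	int	 n;

	assert(text != NULL);
	len = strlen(text);
	sz = len + 16;	/* room for an underscore and a counter */
	if ((buf = malloc(sz)) == NULL)
		abort();
	strcpy(buf, text);

	/* Replace bytes assumed not to be permitted in a fragment. */
	for (cp = buf; *cp != 0; cp++)
		if (isalnum((unsigned char)*cp) == 0 &&
		    strchr("!$&'()*+,-./:;=?@_~", *cp) == NULL)
			*cp = '_';
	if (unique == 0)
		return buf;

	/* Append an underscore and a number until the id is new. */
	for (n = 2; sketch_seen(buf); n++)
		snprintf(buf + len, sz - len, "_%d", n);
	assert(sketch_nids < SKETCH_MAXIDS);
	sketch_ids[sketch_nids++] = buf;
	return buf;
}
```
.Ed
.Pp
In this model, two requests for a unique identifier built from the
same heading text yield two different strings,
while a request with
.Fa unique
set to 0 leaves the table untouched and may return a duplicate.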
.Pp
The function
.Fn print_otag_id
opens a
.Fa tag
element of class
.Fa cattr
for the node
.Fa n .
If the flag
.Dv NODE_ID
is set in
.Fa n ,
it attempts to generate an
.Cm id
attribute with
.Fn html_make_id .
If an
.Cm id
attribute is written,
.Fn print_otag_id
also adds an
.Aq Ic A
element of class
.Qq permalink :
outside if
.Fa n
generates a phrasing element, or inside otherwise.
This function is a wrapper around
.Fn html_make_id
and
.Fn print_otag ,
fixing the
.Fa unique
argument to 1 and the
.Fa fmt
arguments to
.Qq chR
and
.Qq ci ,
respectively.
.Pp
The functions
.Fn print_eqn ,
.Fn print_tbl ,
and
.Fn print_tblclose
are not yet documented.
.Sh RETURN VALUES
The functions
.Fn print_otag
and
.Fn print_otag_id
return a pointer to a new element on the stack of HTML elements.
When
.Fn print_otag_id
opens two elements, a pointer to the outer one is returned.
The memory pointed to is owned by the formatter and is automatically
.Xr free 3 Ns d
when
.Fn print_tagq
is called on it or when
.Fn print_stagq
is called on a parent element.
.Pp
The function
.Fn html_make_id
returns a newly allocated string or
.Dv NULL
if
.Fa n
lacks text data to create the attribute from.
If the
.Fa unique
argument is 0, the caller is responsible for
.Xr free 3 Ns ing
the returned string after using it.
If the
.Fa unique
argument is non-zero, the
.Va id_unique
ohash table is used for de-duplication and owns the returned string.
In this case, it will be freed automatically by
.Fn html_reset
or
.Fn html_free .
.Pp
In case of
.Xr malloc 3
failure, these functions do not return but call
.Xr err 3 .
.Sh FILES
.Bl -tag -width mandoc_aux.c -compact
.It Pa main.h
declarations of public functions for use by the main program,
not yet documented
.It Pa html.h
declarations of data types and private functions
for use by language-specific HTML formatters
.It Pa html.c
main HTML formatting engine and utility functions
.It Pa mdoc_html.c
.Xr mdoc 7
HTML formatter
.It Pa man_html.c
.Xr man 7
HTML formatter
.It Pa tbl_html.c
.Xr tbl 7
HTML formatter
.It Pa eqn_html.c
.Xr eqn 7
HTML formatter
.It Pa out.h
declarations of data types and private functions
for shared use by all mandoc formatters,
not yet documented
.It Pa out.c
private functions for shared use by all mandoc formatters
.It Pa mandoc_aux.h
declarations of common mandoc utility functions, see
.Xr mandoc 3
.It Pa mandoc_aux.c
implementation of common mandoc utility functions
.El
.Sh SEE ALSO
.Xr mandoc 1 ,
.Xr mandoc 3 ,
.Xr man.cgi 8
.Sh AUTHORS
.An -nosplit
The mandoc HTML formatter was written by
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv .
It is maintained by
.An Ingo Schwarze Aq Mt schwarze@openbsd.org ,
who also wrote this manual.