.\" mandoc/mandoc_html.3, CVS revision 1.24
.\" $Id: mandoc_html.3,v 1.23 2020/04/24 13:13:06 schwarze Exp $
.\"
.\" Copyright (c) 2014, 2017, 2018 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: April 24 2020 $
.Dt MANDOC_HTML 3
.Os
.Sh NAME
.Nm mandoc_html
.Nd internals of the mandoc HTML formatter
.Sh SYNOPSIS
.In sys/types.h
.Fd #include """mandoc.h"""
.Fd #include """roff.h"""
.Fd #include """out.h"""
.Fd #include """html.h"""
.Ft void
.Fn print_gen_decls "struct html *h"
.Ft void
.Fn print_gen_comment "struct html *h" "struct roff_node *n"
.Ft void
.Fn print_gen_head "struct html *h"
.Ft struct tag *
.Fo print_otag
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *fmt"
.Fa ...
.Fc
.Ft void
.Fo print_tagq
.Fa "struct html *h"
.Fa "const struct tag *until"
.Fc
.Ft void
.Fo print_stagq
.Fa "struct html *h"
.Fa "const struct tag *suntil"
.Fc
.Ft void
.Fn html_close_paragraph "struct html *h"
.Ft enum roff_tok
.Fo html_fillmode
.Fa "struct html *h"
.Fa "enum roff_tok want"
.Fc
.Ft int
.Fo html_setfont
.Fa "struct html *h"
.Fa "enum mandoc_esc font"
.Fc
.Ft void
.Fo print_text
.Fa "struct html *h"
.Fa "const char *word"
.Fc
.Ft void
.Fo print_tagged_text
.Fa "struct html *h"
.Fa "const char *word"
.Fa "struct roff_node *n"
.Fc
.Ft char *
.Fo html_make_id
.Fa "const struct roff_node *n"
.Fa "int unique"
.Fc
.Ft struct tag *
.Fo print_otag_id
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *cattr"
.Fa "struct roff_node *n"
.Fc
.Ft void
.Fn print_endline "struct html *h"
.Sh DESCRIPTION
The mandoc HTML formatter is not a formal library.
However, it is compiled into more than one program, in particular
.Xr mandoc 1
and
.Xr man.cgi 8 ,
and it may be security-critical in some contexts.
This documentation is meant to help use it correctly and
to prevent cross-site scripting (XSS) vulnerabilities.
.Pp
The formatter produces HTML output on the standard output.
Since proper escaping is usually required and best taken care of
at one central place, the language-specific formatters
.Po
.Pa *_html.c ,
see
.Sx FILES
.Pc
are not supposed to print directly to
.Dv stdout
using functions like
.Xr printf 3 ,
.Xr putc 3 ,
.Xr puts 3 ,
or
.Xr write 2 .
Instead, they are expected to use the output functions declared in
.Pa html.h
and implemented as part of the main HTML formatting engine in
.Pa html.c .
.Ss Data structures
These structures are declared in
.Pa html.h .
.Bl -tag -width Ds
.It Vt struct html
Internal state of the HTML formatter.
.It Vt struct tag
One entry for the LIFO stack of HTML elements.
Members include
.Fa "enum htmltag tag"
and
.Fa "struct tag *next" .
.El
.Ss Private interface functions
The function
.Fn print_gen_decls
prints the opening
.Aq Pf \&! Ic DOCTYPE
declaration.
.Pp
The function
.Fn print_gen_comment
prints the leading comments, usually containing a Copyright notice
and license, as an HTML comment.
It is intended to be called right after opening the
.Aq Ic HTML
element.
Pass the first
.Dv ROFFT_COMMENT
node in
.Fa n .
.Pp
The function
.Fn print_gen_head
prints the opening
.Aq Ic META
and
.Aq Ic LINK
elements for the document
.Aq Ic HEAD ,
using the
.Fa style
member of
.Fa h
unless that is
.Dv NULL .
It uses
.Fn print_otag ,
which takes care of properly encoding attributes;
this matters in particular for the
.Fa style
link.
.Pp
The function
.Fn print_otag
prints the start tag of an HTML element with the name
.Fa tag ,
optionally including the attributes specified by
.Fa fmt .
If
.Fa fmt
is the empty string, no attributes are written.
Each letter of
.Fa fmt
specifies one attribute to write.
Most attributes require one
.Vt char *
argument which becomes the value of the attribute.
The arguments have to be given in the same order as the attribute letters.
If an argument is
.Dv NULL ,
the respective attribute is not written.
.Bl -tag -width 1n -offset indent
.It Cm c
Print a
.Cm class
attribute.
.It Cm h
Print a
.Cm href
attribute.
This attribute letter can optionally be followed by a modifier letter.
If followed by
.Cm R ,
it formats the link as a local one by prefixing a
.Sq #
character.
If followed by
.Cm I ,
it interprets the argument as a header file name
and generates a link using the
.Xr mandoc 1
.Fl O Cm includes
option.
If followed by
.Cm M ,
it takes two arguments instead of one, a manual page name and
section, and formats them as a link to a manual page using the
.Xr mandoc 1
.Fl O Cm man
option.
.It Cm i
Print an
.Cm id
attribute.
.It Cm r
Print an ARIA
.Cm role
attribute.
.It Cm \&?
Print an arbitrary attribute.
This format letter requires two
.Vt char *
arguments, the attribute name and the value.
The name must not be
.Dv NULL .
.It Cm s
Print a
.Cm style
attribute.
If present, it must follow all other format letters.
It requires two
.Vt char *
arguments.
The first is the name of the style property, the second its value.
The name must not be
.Dv NULL .
The
.Cm s
.Fa fmt
letter can be repeated, each repetition requiring an additional pair of
.Vt char *
arguments.
.El
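.Pp
The following C fragment is a minimal sketch of how a format string of this
kind can drive attribute output.
It is not the code in
.Pa html.c .
The names
.Fn sketch_otag
and
.Vt struct obuf
are hypothetical, and a fixed-size buffer stands in for standard output.
The escaping in
.Fn obuf_encode
is an assumption.
Only the
.Cm c ,
.Cm i ,
and
.Cm \&?
letters are modeled.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded buffer standing in for standard output. */
struct obuf {
	char	 data[512];
	size_t	 len;
};

/* Append one byte, silently truncating when the buffer is full. */
static void
obuf_putc(struct obuf *o, char c)
{
	if (o->len + 1 < sizeof(o->data)) {
		o->data[o->len++] = c;
		o->data[o->len] = 0;
	}
}

static void
obuf_puts(struct obuf *o, const char *s)
{
	size_t	 i, n = strlen(s);

	for (i = 0; i < n; i++)
		obuf_putc(o, s[i]);
}

/* Escape the characters that are special inside attribute values. */
static void
obuf_encode(struct obuf *o, const char *s)
{
	for (; *s != 0; s++) {
		switch (*s) {
		case '&':
			obuf_puts(o, "&amp;");
			break;
		case '<':
			obuf_puts(o, "&lt;");
			break;
		case '>':
			obuf_puts(o, "&gt;");
			break;
		case '"':
			obuf_puts(o, "&quot;");
			break;
		default:
			obuf_putc(o, *s);
			break;
		}
	}
}

/*
 * Write the start tag of element elem.  Only the letters c (class),
 * i (id), and ? (name, value) are handled; h, r, and s are omitted.
 */
static void
sketch_otag(struct obuf *o, const char *elem, const char *fmt, ...)
{
	va_list		 ap;
	const char	*name, *value;

	obuf_putc(o, '<');
	obuf_puts(o, elem);
	va_start(ap, fmt);
	for (; *fmt != 0; fmt++) {
		if (*fmt == 'c')
			name = "class";
		else if (*fmt == 'i')
			name = "id";
		else if (*fmt == '?')
			name = va_arg(ap, const char *);
		else
			break;		/* letter not covered by this sketch */
		assert(name != NULL);	/* the ? name must not be NULL */
		value = va_arg(ap, const char *);
		if (value == NULL)
			continue;	/* NULL value: attribute omitted */
		obuf_putc(o, ' ');
		obuf_puts(o, name);
		obuf_putc(o, '=');
		obuf_putc(o, '"');
		obuf_encode(o, value);
		obuf_putc(o, '"');
	}
	va_end(ap);
	obuf_putc(o, '>');
}
```
.Ed
For example, the format
.Qq ci?
with the arguments
.Qq permalink ,
.Dv NULL ,
.Qq title ,
and
.Qq x<y & z
writes a class attribute and an escaped title attribute, but no
.Cm id
attribute.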
.Pp
.Fn print_otag
uses the private function
.Fn print_encode
to take care of HTML encoding.
If required by the element type, it remembers in
.Fa h
that the element is open.
The function
.Fn print_tagq
is used to close out all open elements up to and including
.Fa until ;
.Fn print_stagq
is a variant to close out all open elements up to but excluding
.Fa suntil .
The function
.Fn html_close_paragraph
closes all open elements that establish phrasing context,
thus returning to the innermost flow context.
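.Pp
A minimal sketch of a LIFO element stack shows the difference between
closing up to and including an element and closing up to but excluding it.
The structures and the functions s_open(), s_tagq(), and s_stagq() are
hypothetical simplifications of
.Vt struct tag
and the functions in
.Pa html.c .
End tags are collected in a string instead of being printed.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of struct tag: a LIFO stack of elements. */
struct stag {
	const char	*name;
	struct stag	*next;		/* enclosing element */
};

struct sstate {
	struct stag	*top;		/* innermost open element */
	char		 out[256];	/* end tags written so far */
};

static struct stag *
s_open(struct sstate *h, const char *name)
{
	struct stag	*t;

	if ((t = malloc(sizeof(*t))) == NULL)
		abort();
	t->name = name;
	t->next = h->top;
	h->top = t;
	return t;
}

/* Pop the innermost element, write its end tag, and free it. */
static void
s_close_top(struct sstate *h)
{
	struct stag	*t = h->top;
	size_t		 len = strlen(h->out);

	assert(t != NULL);
	snprintf(h->out + len, sizeof(h->out) - len, "</%s>", t->name);
	h->top = t->next;
	free(t);
}

/* Like print_tagq(): close up to and including until. */
static void
s_tagq(struct sstate *h, const struct stag *until)
{
	int	 last;

	while (h->top != NULL) {
		last = h->top == until;
		s_close_top(h);
		if (last)
			break;
	}
}

/* Like print_stagq(): close up to but excluding suntil. */
static void
s_stagq(struct sstate *h, const struct stag *suntil)
{
	while (h->top != NULL && h->top != suntil)
		s_close_top(h);
}
```
.Ed
With
.Aq Ic DIV ,
.Aq Ic P ,
and
.Aq Ic B
open, s_stagq() on the DIV closes only B and P, and a subsequent
s_tagq() on the same DIV closes it as well.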
.Pp
The function
.Fn html_fillmode
switches to fill mode if
.Fa want
is
.Dv ROFF_fi
or to no-fill mode if
.Fa want
is
.Dv ROFF_nf .
Switching from fill mode to no-fill mode closes the current paragraph
and opens a
.Aq Ic PRE
element.
Switching in the opposite direction closes the
.Aq Ic PRE
element, but does not open a new paragraph.
If
.Fa want
matches the mode that is already active, no elements are closed or opened.
If
.Fa want
is
.Dv TOKEN_NONE ,
the mode remains as it is.
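.Pp
These mode switching rules can be summarized as a small state machine.
This is a minimal sketch, not the implementation.
The enum values are hypothetical stand-ins for
.Dv TOKEN_NONE ,
.Dv ROFF_fi ,
and
.Dv ROFF_nf .
The hypothetical
.Vt struct fstate
records the tags that would be written instead of printing them.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for TOKEN_NONE, ROFF_fi, and ROFF_nf. */
enum mode { MODE_NONE, MODE_FI, MODE_NF };

struct fstate {
	int	 nofill;	/* nonzero while a PRE element is open */
	int	 para;		/* nonzero while a paragraph is open */
	char	 log[128];	/* tags written so far */
};

static void
f_write(struct fstate *h, const char *s)
{
	size_t	 len = strlen(h->log);

	snprintf(h->log + len, sizeof(h->log) - len, "%s", s);
}

/* Switch modes as described for html_fillmode(); return the old mode. */
static enum mode
f_fillmode(struct fstate *h, enum mode want)
{
	enum mode	 had = h->nofill ? MODE_NF : MODE_FI;

	assert(want == MODE_NONE || want == MODE_FI || want == MODE_NF);
	if (want == MODE_NF && !h->nofill) {
		if (h->para) {
			f_write(h, "</p>");	/* close the paragraph */
			h->para = 0;
		}
		f_write(h, "<pre>");
		h->nofill = 1;
	} else if (want == MODE_FI && h->nofill) {
		f_write(h, "</pre>");	/* no new paragraph is opened */
		h->nofill = 0;
	}
	/* MODE_NONE or the mode already active: nothing is written. */
	return had;
}
```
.Ed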
.Pp
The function
.Fn html_setfont
selects the
.Fa font ,
which can be
.Dv ESCAPE_FONTROMAN ,
.Dv ESCAPE_FONTBOLD ,
.Dv ESCAPE_FONTITALIC ,
.Dv ESCAPE_FONTBI ,
or
.Dv ESCAPE_FONTCW ,
for future text output and internally remembers
the font that was active before the change.
If the
.Fa font
argument is
.Dv ESCAPE_FONTPREV ,
the current and the previous font are exchanged.
This function only changes the internal state of the
.Fa h
object; no HTML elements are written yet.
Subsequent text output will write font elements when needed.
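.Pp
A minimal sketch of this font bookkeeping follows.
The enum values are hypothetical stand-ins for the
.Dv ESCAPE_FONT*
constants, and the
.Vt int
return value of
.Fn html_setfont
is not modeled.
The sketch shows that selecting
.Dv ESCAPE_FONTPREV
twice in a row returns to the font selected before.
.Bd -literal -offset indent
```c
#include <assert.h>

/* Hypothetical stand-ins for the ESCAPE_FONT* constants. */
enum font { F_ROMAN, F_BOLD, F_ITALIC, F_BI, F_CW, F_PREV };

struct fontstate {
	enum font	 cur;	/* font for future text output */
	enum font	 prev;	/* font active before the last change */
};

/* Change the state only; writing font elements is left to text output. */
static void
sketch_setfont(struct fontstate *h, enum font font)
{
	if (font == F_PREV)
		font = h->prev;		/* exchange current and previous */
	assert(font != F_PREV);
	h->prev = h->cur;
	h->cur = font;
}
```
.Ed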
.Pp
The function
.Fn print_text
prints HTML element content.
It uses the private function
.Fn print_encode
to take care of HTML encoding.
If the document has requested a non-standard font, for example using a
.Xr roff 7
.Ic \ef
font escape sequence,
.Fn print_text
wraps
.Fa word
in an HTML font selection element using the
.Fn print_otag
and
.Fn print_tagq
functions.
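.Pp
The following sketch shows the general idea of encoding element content
and wrapping it in a font element.
It is hypothetical in several respects.
The mapping of bold to
.Aq Ic B
and italic to
.Aq Ic I ,
the escaping rules, and the function names are assumptions.
For simplicity, the sketch also opens and closes the font element around every word.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical font codes, a subset of those html_setfont() accepts. */
enum font { F_ROMAN, F_BOLD, F_ITALIC };

/* Append s to the NUL-terminated string in buf of size sz. */
static void
append(char *buf, size_t sz, const char *s)
{
	size_t	 len = strlen(buf);

	assert(len < sz);
	snprintf(buf + len, sz - len, "%s", s);
}

/* Escape the characters that are special in HTML element content. */
static void
encode_text(char *buf, size_t sz, const char *s)
{
	char	 c[2] = {0, 0};

	for (; *s != 0; s++) {
		if (*s == '&')
			append(buf, sz, "&amp;");
		else if (*s == '<')
			append(buf, sz, "&lt;");
		else if (*s == '>')
			append(buf, sz, "&gt;");
		else {
			c[0] = *s;
			append(buf, sz, c);
		}
	}
}

/* Wrap the encoded word in a font element unless the font is roman. */
static void
sketch_text(char *buf, size_t sz, enum font font, const char *word)
{
	const char	*elem = NULL;

	if (font == F_BOLD)
		elem = "b";
	else if (font == F_ITALIC)
		elem = "i";
	if (elem != NULL) {
		append(buf, sz, "<");
		append(buf, sz, elem);
		append(buf, sz, ">");
	}
	encode_text(buf, sz, word);
	if (elem != NULL) {
		append(buf, sz, "</");
		append(buf, sz, elem);
		append(buf, sz, ">");
	}
}
```
.Ed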
.Pp
The function
.Fn print_tagged_text
is a variant of
.Fn print_text
that wraps
.Fa word
in an
.Aq Ic A
element of class
.Qq permalink
if
.Fa n
is not
.Dv NULL
and
.Fn html_make_id
yields a segment identifier for it.
.Pp
The function
.Fn html_make_id
allocates a string to be used for the
.Cm id
attribute of an HTML element and/or as a segment identifier for a URI in an
.Aq Ic A
element.
If
.Fa n
contains a
.Fa tag
attribute, it is used; otherwise, child nodes are used.
If
.Fa n
is an
.Ic \&Sh ,
.Ic \&Ss ,
.Ic \&Sx ,
.Ic SH ,
or
.Ic SS
node, the resulting string is the concatenation of the child strings;
for other node types, only the first child is used.
Bytes not permitted in URI-fragment strings are replaced by underscores.
If any of the children to be used is not a text node,
no string is generated and
.Dv NULL
is returned instead.
If the
.Fa unique
argument is non-zero, deduplication is performed by appending an
underscore and a decimal integer, if necessary.
If the
.Fa unique
argument is 1, this is assumed to be the first call for this tag
at this location, typically for use by
.Dv NODE_ID ,
so the integer is incremented before use.
If the
.Fa unique
argument is 2, this is assumed to be the second call for this tag
at this location, typically for use by
.Dv NODE_HREF ,
so the existing integer, if any, is used without incrementing it.
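.Pp
A minimal sketch of these identifier rules follows.
It assumes that the node text has already been collected into a string.
The bytes kept unchanged are assumed to be the RFC 3986 fragment characters,
which may differ from the set used in
.Pa html.c .
The fixed-size deduplication table and the name
.Fn sketch_make_id
are hypothetical.
The sketch shows how a
.Fa unique
value of 1 advances the integer while a value of 2 reuses it, so that an
.Cm id
and the link pointing to it agree.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical deduplication table: base id and its current integer. */
struct idcount {
	char	*base;
	int	 count;
};
static struct idcount	 ids[32];
static size_t		 nids;

/*
 * Sketch of html_make_id() for a node whose text is already known.
 * Returns a malloc(3)ed string the caller must free, or NULL.
 */
static char *
sketch_make_id(const char *text, int unique)
{
	struct idcount	*ic = NULL;
	char		*id;
	size_t		 i, sz;

	if (text == NULL)
		return NULL;		/* no text data to build an id from */
	sz = strlen(text) + 16;		/* room for "_" and an integer */
	if ((id = malloc(sz)) == NULL)
		abort();
	snprintf(id, sz, "%s", text);
	for (i = 0; id[i] != 0; i++)	/* assumed RFC 3986 fragment set */
		if (!isalnum((unsigned char)id[i]) &&
		    strchr("-._~!$&'()*+,;=:@/?", id[i]) == NULL)
			id[i] = '_';
	if (unique == 0)
		return id;

	for (i = 0; i < nids; i++)
		if (strcmp(ids[i].base, id) == 0)
			ic = ids + i;
	if (ic == NULL) {		/* first occurrence: no suffix */
		assert(nids < sizeof(ids) / sizeof(ids[0]));
		ic = ids + nids++;
		if ((ic->base = malloc(strlen(id) + 1)) == NULL)
			abort();
		strcpy(ic->base, id);
		ic->count = 1;
		return id;
	}
	if (unique == 1)
		ic->count++;		/* first call at this location */
	if (ic->count > 1)
		snprintf(id + strlen(id), sz - strlen(id), "_%d", ic->count);
	return id;
}
```
.Ed
As with
.Fn html_make_id ,
the caller is responsible for freeing each returned string.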
.Pp
The function
.Fn print_otag_id
opens a
.Fa tag
element of class
.Fa cattr
for the node
.Fa n .
If the flag
.Dv NODE_ID
is set in
.Fa n ,
it attempts to generate an
.Cm id
attribute with
.Fn html_make_id .
If the flag
.Dv NODE_HREF
is set in
.Fa n ,
an
.Aq Ic A
element of class
.Qq permalink
is added:
outside if
.Fa n
generates an element that can only occur in phrasing context,
or inside otherwise.
This function is a wrapper around
.Fn html_make_id
and
.Fn print_otag .
It automatically chooses the
.Fa unique
argument appropriately and sets the
.Fa fmt
argument to
.Qq chR
for the
.Aq Ic A
element and to
.Qq ci
for the
.Fa tag
element.
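.Pp
The placement of the permalink can be sketched as follows.
The flag macros and
.Fn sketch_otag_id
are hypothetical.
The
.Fa id
argument stands for the string returned by
.Fn html_make_id ,
and class attributes are left out for brevity.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the NODE_ID and NODE_HREF flags. */
#define SK_ID	(1 << 0)
#define SK_HREF	(1 << 1)

/* Append <elem name="value">, or <elem> if value is NULL. */
static void
open_tag(char *buf, size_t sz, const char *elem, const char *name,
    const char *value)
{
	size_t	 len = strlen(buf);

	assert(len < sz);
	if (value == NULL)
		snprintf(buf + len, sz - len, "<%s>", elem);
	else
		snprintf(buf + len, sz - len, "<%s %s=%c%s%c>",
		    elem, name, '"', value, '"');
}

/*
 * Placement logic of print_otag_id(): id stands for the string
 * html_make_id() returned, phrasing is nonzero if elem can only
 * occur in phrasing context.  Class attributes are left out.
 */
static void
sketch_otag_id(char *buf, size_t sz, const char *elem, const char *id,
    int flags, int phrasing)
{
	char	 href[80];
	int	 link = (flags & SK_HREF) && id != NULL;

	if (link)
		snprintf(href, sizeof(href), "#%s", id);
	if (link && phrasing)
		open_tag(buf, sz, "a", "href", href);	/* outside */
	open_tag(buf, sz, elem, "id", flags & SK_ID ? id : NULL);
	if (link && !phrasing)
		open_tag(buf, sz, "a", "href", href);	/* inside */
}
```
.Ed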
.Pp
The function
.Fn print_endline
makes sure subsequent output starts on a new HTML output line.
If nothing was printed on the current output line yet, it has no effect.
Otherwise, it appends any buffered text to the current output line,
ends the line, and updates the internal state of the
.Fa h
object.
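.Pp
A minimal sketch of this behavior follows.
The hypothetical
.Vt struct lstate
records completed output lines instead of writing them to standard output.
The other state updates of the real function are not modeled.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical output state: finished lines plus the current line. */
struct lstate {
	char	 lines[8][160];	/* completed output lines */
	size_t	 nlines;
	char	 line[80];	/* text already printed on the current line */
	char	 buf[80];	/* buffered text not yet printed */
};

/* Sketch of print_endline(): flush the buffer and end the line. */
static void
sketch_endline(struct lstate *h)
{
	if (h->line[0] == 0)
		return;		/* nothing printed on this line yet */
	assert(h->nlines < sizeof(h->lines) / sizeof(h->lines[0]));
	snprintf(h->lines[h->nlines++], sizeof(h->lines[0]), "%s%s",
	    h->line, h->buf);
	h->line[0] = 0;
	h->buf[0] = 0;
}
```
.Ed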
.Pp
The functions
.Fn print_eqn ,
.Fn print_tbl ,
and
.Fn print_tblclose
are not yet documented.
.Sh RETURN VALUES
The functions
.Fn print_otag
and
.Fn print_otag_id
return a pointer to a new element on the stack of HTML elements.
When
.Fn print_otag_id
opens two elements, a pointer to the outer one is returned.
The memory pointed to is owned by the library and is automatically
.Xr free 3 Ns d
when
.Fn print_tagq
is called on it or when
.Fn print_stagq
is called on a parent element.
.Pp
The function
.Fn html_fillmode
returns
.Dv ROFF_fi
if fill mode was active before the call or
.Dv ROFF_nf
otherwise.
.Pp
The function
.Fn html_make_id
returns a newly allocated string or
.Dv NULL
if
.Fa n
lacks text data to create the attribute from.
The caller is responsible for
.Xr free 3 Ns ing
the returned string after using it.
.Pp
In case of
.Xr malloc 3
failure, these functions do not return but call
.Xr err 3 .
.Sh FILES
.Bl -tag -width mandoc_aux.c -compact
.It Pa main.h
declarations of public functions for use by the main program,
not yet documented
.It Pa html.h
declarations of data types and private functions
for use by language-specific HTML formatters
.It Pa html.c
main HTML formatting engine and utility functions
.It Pa mdoc_html.c
.Xr mdoc 7
HTML formatter
.It Pa man_html.c
.Xr man 7
HTML formatter
.It Pa tbl_html.c
.Xr tbl 7
HTML formatter
.It Pa eqn_html.c
.Xr eqn 7
HTML formatter
.It Pa roff_html.c
.Xr roff 7
HTML formatter, handling requests like
.Ic br ,
.Ic ce ,
.Ic fi ,
.Ic ft ,
.Ic nf ,
.Ic rj ,
and
.Ic sp .
.It Pa out.h
declarations of data types and private functions
for shared use by all mandoc formatters,
not yet documented
.It Pa out.c
private functions for shared use by all mandoc formatters
.It Pa mandoc_aux.h
declarations of common mandoc utility functions, see
.Xr mandoc 3
.It Pa mandoc_aux.c
implementation of common mandoc utility functions
.El
.Sh SEE ALSO
.Xr mandoc 1 ,
.Xr mandoc 3 ,
.Xr man.cgi 8
.Sh AUTHORS
.An -nosplit
The mandoc HTML formatter was written by
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv .
It is maintained by
.An Ingo Schwarze Aq Mt schwarze@openbsd.org ,
who also wrote this manual.