===================================================================
RCS file: /cvs/mandoc/Attic/preconv.1,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -p -r1.2 -r1.3
--- mandoc/Attic/preconv.1 2011/05/26 12:14:46 1.2
+++ mandoc/Attic/preconv.1 2011/05/26 14:43:07 1.3
@@ -1,4 +1,4 @@
-.\" $Id: preconv.1,v 1.2 2011/05/26 12:14:46 kristaps Exp $
+.\" $Id: preconv.1,v 1.3 2011/05/26 14:43:07 kristaps Exp $
 .\"
 .\" Copyright (c) 2011 Kristaps Dzonsons
 .\"
@@ -42,18 +42,8 @@ Its arguments are as follows:
 .Bl -tag -width Ds
 .It Fl D Ar enc
 The default encoding.
-This is case-insensitive.
-See
-.Sx Algorithm
-and
-.Sx Encodings .
 .It Fl e Ar enc
 The document's encoding.
-This is case-insensitive.
-See
-.Sx Algorithm
-and
-.Sx Encodings .
 .It Ar file
 The input file.
 .El
@@ -63,27 +53,23 @@ If
 is not provided,
 .Nm
 accepts standard input.
-Output is written to standard output.
-Unicode characters in the ASCII range are printed as regular ASCII
-characters; those above this range are printed using the
+See
+.Sx Algorithm
+for encoding choice.
+.Pp
+The recoded input is written to standard output: Unicode characters in
+the ASCII range are printed as regular ASCII characters, while those
+above this range are printed using the
 .Sq \e[uNNNN]
 format documented in
 .Xr mandoc_char 7 .
 .Pp
 If input bytes are improperly formed in the current encoding, they're
 passed unmodified to standard output.
-.Ss Encodings
-The
+For some encodings, such as UTF-8, unrecoverable input sequences will
+cause
 .Nm
-utility accepts the
-.Ar utf\-8 ,
-.Ar us\-ascii ,
-and
-.Ar latin\-1
-encodings as arguments to
-.Fl D Ar enc
-or
-.Fl e Ar enc .
+to stop processing and exit.
 .Ss Algorithm
 An encoding is chosen according to the following steps:
 .Bl -enum
@@ -91,13 +77,41 @@ An encoding is chosen according to the following steps
 From the argument passed to
 .Fl e Ar enc .
 .It
-If a BOM exists, utf\-8 encoding is selected.
+If a BOM exists, UTF\-8 encoding is selected.
 .It
+From the coding tags parsed from
+.Qq File Variables
+on the first two lines of input.
+A file variable is an input line of the form
+.Pp
+.Dl \%.\e\(dq -*- key: val [; key: val ]* -*-
+.Pp
+where
+.Cm key
+is
+.Qq coding
+and
+.Cm val
+is the name of the encoding.
+A typical usage may be
+.Pp
+.Dl \%.\e\(dq -*- mode: troff; coding: utf-8 -*-
+.It
 From the argument passed to
 .Fl D Ar enc .
 .It
 If all else fails, Latin\-1 is used.
 .El
+.Pp
+The
+.Nm
+utility recognises the UTF\-8, us\-ascii, and latin\-1 encodings as
+passed to the
+.Fl e
+and
+.Fl D
+arguments, or as coding tags.
+Encodings are matched case-insensitively.
 .\" .Sh IMPLEMENTATION NOTES
 .\" Not used in OpenBSD.
 .\" .Sh RETURN VALUES
@@ -107,7 +121,12 @@ If all else fails, Latin\-1 is used.
 .\" .Sh FILES
 .Sh EXIT STATUS
 .Ex -std
-.\" .Sh EXAMPLES
+.Sh EXAMPLES
+Explicitly page a UTF\-8 manual
+.Pa foo.1
+in the current locale:
+.Pp
+.Dl $ preconv \-e utf\-8 foo.1 | mandoc -Tlocale | less
 .\" .Sh DIAGNOSTICS
 .\" For sections 1, 4, 6, 7, & 8 only.
 .\" .Sh ERRORS
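
The encoding-selection order documented in the Algorithm section of revision 1.3 can be sketched as follows. This is an illustrative Python sketch, not the preconv source: the names choose_encoding, coding_tag, normalise, e_arg and d_arg are hypothetical, and silently skipping an unrecognised encoding name is an assumption, since the manual does not say how such names are handled.

```python
import re

# Encoding names the manual lists as recognised; matched case-insensitively.
KNOWN = ("utf-8", "us-ascii", "latin-1")
UTF8_BOM = b"\xef\xbb\xbf"
# A file-variable line of the form: .\" -*- key: val [; key: val ]* -*-
FILE_VARS = re.compile(rb'^\.\\"\s*-\*-(.*?)-\*-\s*$')


def normalise(name):
    """Return the lower-cased name if it is recognised, else None."""
    if name is None:
        return None
    name = name.strip().lower()
    # Assumption: unrecognised names are skipped rather than rejected.
    return name if name in KNOWN else None


def coding_tag(data):
    """Look for a 'coding' file variable on the first two input lines."""
    for line in data.splitlines()[:2]:
        match = FILE_VARS.match(line)
        if not match:
            continue
        for pair in match.group(1).split(b";"):
            key, sep, val = pair.partition(b":")
            if sep and key.strip().lower() == b"coding":
                return normalise(val.decode("ascii", "replace"))
    return None


def choose_encoding(data, e_arg=None, d_arg=None):
    """Apply the documented steps in order; the first match wins."""
    return (
        normalise(e_arg)                                     # 1. -e enc
        or ("utf-8" if data.startswith(UTF8_BOM) else None)  # 2. BOM
        or coding_tag(data)                                  # 3. coding tag
        or normalise(d_arg)                                  # 4. -D enc
        or "latin-1"                                         # 5. fallback
    )
```

For example, calling choose_encoding on input whose first line is the "mode: troff; coding: utf-8" file variable shown in the diff selects utf-8 even when d_arg is "latin-1", because the coding tag is consulted before the default encoding.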