File: [cvsweb.bsd.lv] / mandoc / regress / char / unicode / input.in

Revision 1.6, Thu May 16 21:29:21 2024 UTC by schwarze
Branch: MAIN
CVS Tags: HEAD
Changes since 1.5: +3 -3 lines

Check that lower-case variants of UTF-16 surrogate escape sequences
are rejected with the correct error message.
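
As a rough illustration of what the log message above refers to, the following Python sketch finds \[uXXXX] escapes that name UTF-16 surrogates (U+D800 through U+DFFF), whether the hex digits are upper or lower case. It is not mandoc's implementation and makes no claim about which error message mandoc emits; the pattern UNICODE_ESCAPE and the function surrogate_escapes are hypothetical names introduced only for this example.

```python
import re

# Hypothetical helper, not part of mandoc: matches a roff escape of the
# form \[uXXXX] with four to six hex digits in either case.
UNICODE_ESCAPE = re.compile(r"\\\[u([0-9A-Fa-f]{4,6})\]")


def surrogate_escapes(line):
    # Collect every \[uXXXX] escape in the line whose code point lies in
    # the UTF-16 surrogate range U+D800..U+DFFF.
    found = []
    for match in UNICODE_ESCAPE.finditer(line):
        codepoint = int(match.group(1), 16)  # int() accepts both cases
        if 0xD800 <= codepoint <= 0xDFFF:
            found.append(match.group(0))
    return found
```

Applied to the "lowest surrogate" row of the three-byte table, this sketch would report both the upper-case and the lower-case spelling, while the neighbouring rows for U+D7FF and U+E000 would yield nothing.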

.\" $OpenBSD: input.in,v 1.6 2024/05/16 21:27:38 schwarze Exp $
.TH CHAR-UNICODE-INPUT 1 "May 16, 2024"
.SH NAME
char-unicode-input \- Unicode characters in the input file
.SH DESCRIPTION
lowest valid: €
.SS One-byte range
.TS
tab(:);
l l l l.
U+0000:0x00:\[u0000]:lowest ASCII
U+001f:0x1f:\[u001F]:highest ASCII control character
U+007f:0x7f:\[u007F]:highest ASCII
:0x80::leading lowest continuation
:0xbf::leading highest continuation
.TE
.SS Two-byte range
.TS
tab(:);
l l l l.
U+0000:0xc080::lowest obfuscated ASCII
U+007f:0xc1bf::highest obfuscated ASCII
U+0080:0xc280:\[u0080]€:lowest two-byte
U+07FF:0xdfbf:\[u07FF]߿:highest two-byte
:0xc278:x:ASCII instead of continuation
:0xc2c380:À:start byte instead of continuation
.TE
.SS Three-byte range
.TS
tab(:);
l l l l.
U+0000:0xe08080::lowest obfuscated ASCII
U+007f:0xe081bf::highest obfuscated ASCII
U+0080:0xe08280::lowest obfuscated two-byte
U+07FF:0xe09fbf::highest obfuscated two-byte
U+0800:0xe0a080:\[u0800]ࠀ:lowest three-byte
U+0FFF:0xe0bfbf:\[u0FFF]࿿:end of first start byte
U+1000:0xe18080:\[u1000]က:begin of second start byte
U+CFFF:0xecbfbf:\[uCFFF]쿿:end of last normal start byte
U+D000:0xed8080:\[uD000]퀀:begin of last start byte
U+D7FB:0xed9fbb:\[uD7FB]ퟻ:highest valid public three-byte
U+D7FF:0xed9fbf:\[uD7FF]퟿:highest public three-byte
U+D800:0xeda080:\[uD800]\[ud800]:lowest surrogate
U+DFFF:0xedbfbf:\[uDFFF]\[udfff]:highest surrogate
U+E000:0xee8080:\[uE000]:lowest private use
U+F8FF:0xefa3bf:\[uF8FF]:highest private use
U+F900:0xefa480:\[uF900]豈:lowest post-private
U+FEFF:0xefbbbf:\[uFEFF]:byte-order mark
U+FFFC:0xefbfbc:\[uFFFC]:object replacement character
U+FFFD:0xefbfbd:\[uFFFD]�:replacement character
U+FFFE:0xefbfbe:\[uFFFE]￾:reversed byte-order mark
U+FFFF:0xefbfbf:\[uFFFF]￿:highest three-byte
.TE
.SS Four-byte range
.TS
tab(:);
l l l l.
U+0000:0xf0808080::lowest obfuscated ASCII
U+007f:0xf08081bf::highest obfuscated ASCII
U+0080:0xf0808280::lowest obfuscated two-byte
U+07FF:0xf0809fbf::highest obfuscated two-byte
U+0800:0xf080a080::lowest obfuscated three-byte
U+FFFF:0xf08fbfbf::highest obfuscated three-byte
U+10000:0xf0908080:\[u10000]𐀀:lowest four-byte
U+3FFFF:0xf0bfbfbf:\[u3FFFF]𿿿:end of first start byte
U+40000:0xf1808080:\[u40000]񀀀:begin of second start byte
U+EFFFF:0xf2bfbfbf:\[uEFFFF]򿿿:highest public character
U+F0000:0xf3808080:\[uF0000]󀀀:lowest plane 15 private use
U+FFFFF:0xf3bfbfbf:\[uFFFFF]󿿿:highest plane 15 private use
U+100000:0xf4808080:\[u100000]􀀀:lowest plane 16 private use
U+10FFFF:0xf48fbfbf:\[u10FFFF]􏿿:highest valid four-byte
U+110000:0xf4908080:\[u110000]:lowest beyond Unicode
U+13FFFF:0xf4bfbfbf:\[u13FFFF]:end of last start byte
U+140000:0xf5808080:\[u140000]:lowest invalid start byte
U+1FFFFF:0xf7bfbfbf:\[u1FFFFF]:highest invalid four-byte
U+200000:0xf888808080:\[u200000]:lowest five-byte
.TE