
File: mandoc / regress / char / unicode / input.out_ascii

Revision 1.8, Thu May 16 20:37:02 2024 UTC by schwarze
Branch: MAIN
CVS Tags: HEAD
Changes since 1.7: +6 -1 lines

Improve coverage of edge cases for 3-byte UTF-8 sequences.
Coverage for 2-byte and 4-byte sequences was already reasonable.

CHAR-UNICODE-INPUT(1)       General Commands Manual      CHAR-UNICODE-INPUT(1)

NAME
     char-unicode-input - Unicode characters in the input file

DESCRIPTION
     lowest valid: <80>

   One-byte range
     U+0000   0x00   <NUL>?   lowest ASCII
     U+001f   0x1f   <US>?    highest ASCII control character
     U+007f   0x7f   <DEL>?   highest ASCII
              0x80   ?        leading lowest continuation
              0xbf   ?        leading highest continuation

   Two-byte range
     U+0000   0xc080     ??         lowest obfuscated ASCII
     U+007f   0xc1bf     ??         highest obfuscated ASCII
     U+0080   0xc280     <80><80>   lowest two-byte
     U+07FF   0xdfbf     <?><?>     highest two-byte
              0xc278     ?x         ASCII instead of continuation
              0xc2c380   ?`A         start byte instead of continuation
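
The two-byte rows above can be reproduced with a little bit arithmetic. The following is a minimal sketch, not mandoc's actual parser: the function name decode_two_byte is hypothetical, and it assumes the standard UTF-8 layout of a 110xxxxx lead byte followed by one 10xxxxxx continuation byte.

```python
def decode_two_byte(seq: bytes):
    """Decode a 2-byte UTF-8 sequence (illustrative helper, not mandoc code).

    Returns (code_point, is_overlong), or None if the bytes do not have
    the shape "110xxxxx 10xxxxxx".
    """
    if len(seq) != 2:
        return None
    lead, cont = seq
    if lead & 0xE0 != 0xC0:   # lead byte must match 110xxxxx
        return None
    if cont & 0xC0 != 0x80:   # second byte must match 10xxxxxx
        return None
    code_point = ((lead & 0x1F) << 6) | (cont & 0x3F)
    # Anything below U+0080 fits in one byte, so a two-byte form is
    # overlong ("obfuscated ASCII" in the table above).
    return code_point, code_point < 0x80
```

For example, decode_two_byte(bytes.fromhex("c1bf")) yields (0x7F, True), matching the "highest obfuscated ASCII" row, while bytes.fromhex("c278") yields None because 0x78 is an ASCII byte rather than a continuation byte.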

   Three-byte range
     U+0000   0xe08080   ???      lowest obfuscated ASCII
     U+007f   0xe081bf   ???      highest obfuscated ASCII
     U+0080   0xe08280   ???      lowest obfuscated two-byte
     U+07FF   0xe09fbf   ???      highest obfuscated two-byte
     U+0800   0xe0a080   <?><?>   lowest three-byte
     U+0FFF   0xe0bfbf   <?><?>   end of first start byte
     U+1000   0xe18080   <?><?>   begin of second start byte
     U+CFFF   0xecbfbf   <?><?>   end of last normal start byte
     U+D000   0xed8080   <?><?>   begin of last start byte
     U+D7FB   0xed9fbb   <?><?>   highest valid public three-byte
     U+D7FF   0xed9fbf   <?><?>   highest public three-byte
     U+D800   0xeda080   ???      lowest surrogate
     U+DFFF   0xedbfbf   ???      highest surrogate
     U+E000   0xee8080   <?><?>   lowest private use
     U+F8FF   0xefa3bf   <?><?>   highest private use
     U+F900   0xefa480   <?><?>   lowest post-private
     U+FEFF   0xefbbbf   <?><?>   byte-order mark
     U+FFFC   0xefbfbc   <?><?>   object replacement character
     U+FFFD   0xefbfbd   <?><?>   replacement character
     U+FFFE   0xefbfbe   <?><?>   reversed byte-order mark
     U+FFFF   0xefbfbf   <?><?>   highest three-byte
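
The three-byte rows fall into three groups: overlong encodings of code points below U+0800, UTF-16 surrogates, and everything else. The sketch below shows one way to tell them apart. It assumes the usual 1110xxxx 10xxxxxx 10xxxxxx layout; classify_three_byte and its return strings are hypothetical names chosen for illustration and do not come from mandoc.

```python
def classify_three_byte(seq: bytes) -> str:
    """Classify a 3-byte UTF-8 sequence (illustrative helper, not mandoc code)."""
    if len(seq) != 3 or seq[0] & 0xF0 != 0xE0:
        return "malformed"            # lead byte must match 1110xxxx
    if any(b & 0xC0 != 0x80 for b in seq[1:]):
        return "malformed"            # both trailing bytes must match 10xxxxxx
    code_point = ((seq[0] & 0x0F) << 12) | ((seq[1] & 0x3F) << 6) | (seq[2] & 0x3F)
    if code_point < 0x800:
        return "overlong"             # would fit in one or two bytes
    if 0xD800 <= code_point <= 0xDFFF:
        return "surrogate"            # reserved for UTF-16, not valid in UTF-8
    return "valid"
```

Under these assumptions, 0xe09fbf classifies as "overlong", 0xeda080 and 0xedbfbf as "surrogate", and 0xed9fbf and 0xee8080 as "valid", in line with the rows rendered as ??? versus <?><?> above.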

   Four-byte range
     U+0000     0xf0808080     ????     lowest obfuscated ASCII
     U+007f     0xf08081bf     ????     highest obfuscated ASCII
     U+0080     0xf0808280     ????     lowest obfuscated two-byte
     U+07FF     0xf0809fbf     ????     highest obfuscated two-byte
     U+0800     0xf080a080     ????     lowest obfuscated three-byte
     U+FFFF     0xf08fbfbf     ????     highest obfuscated three-byte
     U+10000    0xf0908080     <?><?>   lowest four-byte
     U+3FFFF    0xf0bfbfbf     <?><?>   end of first start byte
     U+40000    0xf1808080     <?><?>   begin of second start byte
     U+EFFFF    0xf2bfbfbf     <?><?>   highest public character
     U+F0000    0xf3808080     <?><?>   lowest plane 15 private use
     U+FFFFF    0xf3bfbfbf     <?><?>   highest plane 15 private use
     U+100000   0xf4808080     <?><?>   lowest plane 16 private use
     U+10FFFF   0xf48fbfbf     <?><?>   highest valid four-byte
     U+110000   0xf4908080     ????     lowest beyond Unicode
     U+13FFFF   0xf4bfbfbf     ????     end of last start byte
     U+140000   0xf5808080     ????     lowest invalid start byte
     U+1FFFFF   0xf7bfbfbf     ????     highest invalid four-byte
     U+200000   0xf888808080   ?????    lowest five-byte
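
For the four-byte rows, the relevant limits are U+10000 at the low end (anything smaller is overlong) and U+10FFFF at the high end (anything larger lies beyond Unicode). The sketch below applies those two checks to the bit pattern 11110xxx followed by three continuation bytes. The name classify_four_byte and its return strings are hypothetical, and the function is only meant to mirror the table, not to describe how mandoc decides.

```python
def classify_four_byte(seq: bytes) -> str:
    """Classify a 4-byte UTF-8 sequence (illustrative helper, not mandoc code)."""
    if len(seq) != 4 or seq[0] & 0xF8 != 0xF0:
        return "malformed"            # lead byte must match 11110xxx
    if any(b & 0xC0 != 0x80 for b in seq[1:]):
        return "malformed"            # trailing bytes must match 10xxxxxx
    code_point = (
        ((seq[0] & 0x07) << 18)
        | ((seq[1] & 0x3F) << 12)
        | ((seq[2] & 0x3F) << 6)
        | (seq[3] & 0x3F)
    )
    if code_point < 0x10000:
        return "overlong"             # would fit in three bytes or fewer
    if code_point > 0x10FFFF:
        return "beyond Unicode"       # includes lead bytes 0xf5 to 0xf7
    return "valid"
```

With this sketch, 0xf08fbfbf is "overlong", 0xf0908080 and 0xf48fbfbf are "valid", and 0xf4908080, 0xf5808080, and 0xf7bfbfbf are all "beyond Unicode". The five-byte sequence 0xf888808080 is reported as "malformed" because it is not four bytes long.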

OpenBSD                          May 16, 2024            CHAR-UNICODE-INPUT(1)