
I'd argue for some standard tests for UTF-8 strings:

- Basic - UTF-8 byte syntax correct.

- Unambiguous - similar to the rules for Unicode domain names. The rules are complicated, but basically they prohibit homoglyphs, mixing glyphs from different scripts, forward and backward direction modifiers in the same string, emoji, other modifiers, etc. Use this where people have to visually compare two strings for identity or retype them, such as file names.

- Unambiguous, light version - as above, but allow emoji and modifiers. Normal form for documents.
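
To make the three tiers concrete, here is a rough Python sketch. It is only an approximation under stated assumptions, not the actual domain-name rules: the function names are made up for illustration, "script" is guessed from the first word of each letter's Unicode name because the standard library has no script property, "modifiers" is taken to mean combining marks and modifier symbols, and any direction control is rejected (stricter than forbidding only a mix). Real homoglyph detection would need confusables data such as the tables in Unicode UTS #39, which this does not attempt.

```python
import unicodedata

# Explicit direction (bidi) control characters: marks, embeddings,
# overrides, and isolates.
BIDI_CONTROLS = set(
    "\u200e\u200f\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"
)


def is_basic(data: bytes) -> bool:
    """Basic tier: the bytes are syntactically valid UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def letter_scripts(text: str) -> set:
    """Crude script guess: first word of each letter's Unicode name.

    Assumption for illustration only; e.g. 'LATIN', 'CYRILLIC', 'GREEK'.
    """
    scripts = set()
    for ch in text:
        if unicodedata.category(ch).startswith("L"):
            name = unicodedata.name(ch, "")
            scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts


def is_unambiguous(text: str, allow_emoji_and_modifiers: bool = False) -> bool:
    """Unambiguous tier (strict by default, light version if the flag is set)."""
    # Simplification: reject any direction control, not just a mix of both.
    if any(ch in BIDI_CONTROLS for ch in text):
        return False
    # Mixed scripts catch the common Latin/Cyrillic homoglyph case only.
    if len(letter_scripts(text)) > 1:
        return False
    if not allow_emoji_and_modifiers:
        for ch in text:
            cat = unicodedata.category(ch)
            # So/Sk cover most emoji and emoji modifiers; M* are combining marks.
            if cat in ("So", "Sk") or cat.startswith("M"):
                return False
    return True
```

With this sketch, a file name check would call is_basic() on the raw bytes and then is_unambiguous() on the decoded string, while a document check would pass allow_emoji_and_modifiers=True.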


