
Avoiding invalid UTF-8 is easy, almost trivial: just make sure you don't truncate in the middle of a code point.

The latter is fiendishly difficult to get right in all cases, the ugliest case being emoji flags. Being all-or-nothing on both sides of a ZWJ will get you most of the way there, however.
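A rough Go sketch of the ZWJ heuristic, under my own assumptions: truncateKeepZWJ is a made-up name, the input is valid UTF-8, and only U+200D is considered. It does not handle flags (regional indicator pairs), skin-tone modifiers, variation selectors, or combining marks, so it is not real grapheme cluster segmentation.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const zwj = '\u200D' // ZERO WIDTH JOINER

// truncateKeepZWJ (hypothetical helper) truncates s to at most maxBytes
// bytes on a code point boundary, then backs up further so that a
// ZWJ-joined sequence is either kept whole or dropped whole.
func truncateKeepZWJ(s string, maxBytes int) string {
	if maxBytes <= 0 {
		return ""
	}
	if len(s) <= maxBytes {
		return s
	}
	cut := maxBytes
	// First, move the cut back to a code point boundary.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	// Then, while the cut touches a ZWJ on either side, drop the
	// preceding code point and check again.
	for cut > 0 {
		prev, size := utf8.DecodeLastRuneInString(s[:cut])
		next, _ := utf8.DecodeRuneInString(s[cut:])
		if prev != zwj && next != zwj {
			break
		}
		cut -= size
	}
	return s[:cut]
}

func main() {
	// "ab" followed by man + ZWJ + woman + ZWJ + girl (one family emoji).
	family := "ab\U0001F468\u200D\U0001F469\u200D\U0001F467"
	fmt.Printf("%q\n", truncateKeepZWJ(family, 10))
	fmt.Printf("%q\n", truncateKeepZWJ(family, len(family)))
}
```

With a 10-byte limit the cut would land inside the family emoji, so the whole sequence is dropped rather than leaving a stray half of it.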



It's not though. Replacing invalid byte sequences is not terribly difficult.

https://golang.org/src/strings/strings.go?s=15854:15900#L627.


We agree. The part I was calling difficult is 'displaying gibberish'.

The rules for what constitutes a grapheme cluster are detailed and change frequently.



