If wchar_t holds the majority of code points for given use, then there are some ...

andrewla · on Sept 9, 2024

UTF-16 is fine so long as you stay in Plane 0 (the Basic Multilingual Plane). Once you have to deal with surrogate pairs, it really is awful. Once you have to deal with byte order marks, you might as well just throw in the towel.
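
To make the surrogate-pair complaint concrete, here is a rough sketch of expanding 16-bit code units into code points. The function name `decode_utf16_units` is made up for illustration, it assumes byte order has already been resolved (the code units arrive as integers), and replacing unpaired surrogates with U+FFFD is just one possible policy.

```python
def decode_utf16_units(units):
    """Expand a list of 16-bit code units into Unicode code points.

    Hypothetical helper: assumes endianness was handled upstream.
    """
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Valid pair: combine 10 bits from each half, then add the
            # 0x10000 offset to land above Plane 0.
            low = units[i + 1]
            out.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Unpaired surrogate: substitute U+FFFD (an assumed policy).
            out.append(0xFFFD)
            i += 1
        else:
            # Plane 0 code unit: it is the code point itself.
            out.append(u)
            i += 1
    return out
```

Every code unit outside the surrogate range takes the trivial branch, which is why the format feels fine until text leaves Plane 0.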

UTF-8 is well-designed and has a consistent mechanism for expanding to the underlying code point; it is easy to resynchronize, and for ASCII-based systems (like most protocols) the parsing can be dead simple.
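
As a minimal sketch of both properties: continuation bytes always look like 10xxxxxx, so a reader dropped into the middle of a stream can skip forward to the next lead byte, and the lead byte alone says how many bytes follow. The names `resync` and `decode_one` are invented for illustration, and this version deliberately does not reject overlong encodings or encoded surrogates, which a real decoder should.

```python
def is_continuation(b):
    # Continuation bytes have the bit pattern 10xxxxxx.
    return (b & 0xC0) == 0x80


def resync(data, pos):
    """Advance pos to the next byte that can start a sequence."""
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos


def decode_one(data, pos):
    """Decode one code point at pos; return (code_point, next_pos).

    Sketch only: no overlong or surrogate checks.
    """
    b0 = data[pos]
    if b0 < 0x80:
        return b0, pos + 1            # ASCII: the byte is the code point
    if (b0 & 0xE0) == 0xC0:
        extra, cp = 1, b0 & 0x1F      # 110xxxxx: 2-byte sequence
    elif (b0 & 0xF0) == 0xE0:
        extra, cp = 2, b0 & 0x0F      # 1110xxxx: 3-byte sequence
    elif (b0 & 0xF8) == 0xF0:
        extra, cp = 3, b0 & 0x07      # 11110xxx: 4-byte sequence
    else:
        raise ValueError("invalid lead byte at offset %d" % pos)
    for k in range(1, extra + 1):
        if pos + k >= len(data) or not is_continuation(data[pos + k]):
            raise ValueError("truncated sequence at offset %d" % pos)
        # Each continuation byte contributes its low 6 bits.
        cp = (cp << 6) | (data[pos + k] & 0x3F)
    return cp, pos + extra + 1
```

For example, with `data = "a€".encode("utf-8")`, calling `resync(data, 2)` skips the remaining continuation bytes of the euro sign and stops at the end of the buffer, while `decode_one(data, 1)` yields the code point U+20AC.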

Dealing with Unicode text and glyph handling is always going to be painful because the problem is intrinsically difficult. But expanding byte strings to Unicode code points should not be as difficult as UTF-16 makes it.

Windows was converted to UCS-2 before the higher code planes were designed, and it never recovered.