So, UTF-8 without the self-synchronization part. It’s a reasonable encoding, sure. I’m not sure what the cost of making the whole thing big-endian is, though (guess I need to look at simdutf8!). At first blush, it feels like making this wholly little-endian-adapted, with the continuation bits (that is, the byte count in unary) located in the low bits, would make decoding very cheap on little-endian machines.
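
To make the little-endian idea concrete, here's a rough sketch in Python. It is not the format under discussion, just my guess at what a low-bits unary layout could look like: an n-byte item stores n-1 one bits followed by a zero bit in its lowest bits, with the payload in the remaining 7n bits. The names `encode`, `decode`, and `decode_all` are made up for illustration, and I've capped items at 8 bytes so everything fits in one 64-bit word. Python obviously doesn't show the speed; the point is that the decode step maps to a word load, a trailing-ones count, a mask, and a shift.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def encode(value):
    """Encode a non-negative integer below 2**56 into 1..8 bytes."""
    if not 0 <= value < 1 << 56:
        raise ValueError("value out of range for this sketch")
    # Each byte carries 7 payload bits; the remaining bit per byte feeds the tag.
    n = max(1, (value.bit_length() + 6) // 7)
    tag = (1 << (n - 1)) - 1          # n-1 one bits; the zero above them ends the tag
    return ((value << n) | tag).to_bytes(n, "little")


def decode(buf, pos=0):
    """Return (value, bytes_consumed) for the item starting at buf[pos]."""
    # Stand-in for a single unaligned 8-byte little-endian load (zero-padded at the end).
    word = int.from_bytes(buf[pos:pos + 8].ljust(8, b"\0"), "little")
    # trailing_ones(word) + 1: isolate the lowest zero bit of word via its complement.
    inv = ~word & MASK64
    n = (inv & -inv).bit_length()
    if n == 0 or n > 8:
        raise ValueError("length tag longer than this sketch supports")
    if pos + n > len(buf):
        raise ValueError("truncated item")
    # Keep the low 8*n bits, then shift the n tag bits away.
    value = (word & ((1 << (8 * n)) - 1)) >> n
    return value, n


def decode_all(buf):
    """Decode a concatenation of items into a list of integers."""
    out, pos = [], 0
    while pos < len(buf):
        value, n = decode(buf, pos)
        out.append(value)
        pos += n
    return out


if __name__ == "__main__":
    text = "h\u00e9llo\u20ac\U0001F600"
    data = b"".join(encode(ord(c)) for c in text)
    assert "".join(map(chr, decode_all(data))) == text
    print(data.hex(" "))
```

With this layout 3 bytes give 21 payload bits, which is enough for every Unicode code point. As with the encoding being discussed, nothing here lets you resynchronize from the middle of a stream.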