Most content is not "translated"; it just needs to be collated differently. If you have a company directory, you have the names of every user in the company in a table, and you need to display that information in a collation based on the locale of the viewing user. Creating a new table for every language, each containing the same data as the main table, is just pointless overhead. Also, while you can choose the collation at query time, the point of per-column collation is to let you have an index over that collation. Can you please demonstrate how the "company directory" use case would be implemented cleanly and efficiently using per-column collation?

EDIT: I've been looking more into this COLLATE keyword that they have added, and I'm actually somewhat curious to see if I can make this work (where the optimizer manages to choose the right index) by something like the following, in which case I'm going to be seriously happy... ;P.

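    CREATE TABLE my_table (a text);
    -- One expression index per collation; each index needs a distinct name.
    CREATE INDEX my_index_en ON my_table ((a COLLATE "en_US"));
    CREATE INDEX my_index_de ON my_table ((a COLLATE "de_DE"));
    -- The ORDER BY collation matches my_index_de, so the planner can use that index.
    SELECT * FROM my_table ORDER BY a COLLATE "de_DE";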
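To try the one-index-per-collation idea without a PostgreSQL install, here is a minimal sketch using Python's built-in sqlite3 module. Note that this is SQLite, not PostgreSQL. The collations en_toy and de_phonebook_toy, the directory table, and all helper functions are hypothetical. They are toy rules written for illustration, not real locale data: "English" ignores accents, and "German phone book" expands ü to ue. The same column gets one index per collation, and each query picks its collation by name.

```python
import sqlite3
import unicodedata


def strip_accents(s: str) -> str:
    # Decompose (e.g. "ü" -> "u" + combining diaeresis), then drop the marks.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def english_key(s: str):
    # Toy "English" rule (hypothetical): accents are ignored, so "Müller"
    # sorts like "Muller". The original string breaks ties deterministically.
    return (strip_accents(s.casefold()), s)


# Toy German phone-book rule (hypothetical): umlauts expand to two letters.
# casefold() already turns "ß" into "ss", so it needs no entry here.
_DE_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue"}


def german_phonebook_key(s: str):
    folded = unicodedata.normalize("NFC", s.casefold())
    for umlaut, expansion in _DE_EXPANSIONS.items():
        folded = folded.replace(umlaut, expansion)
    return (strip_accents(folded), s)


def make_collation(key):
    # SQLite collation callables return <0, 0 or >0, like C's strcmp.
    def compare(a: str, b: str) -> int:
        ka, kb = key(a), key(b)
        return (ka > kb) - (ka < kb)
    return compare


COLLATIONS = {"en_toy": english_key, "de_phonebook_toy": german_phonebook_key}


def build_directory(names):
    con = sqlite3.connect(":memory:")
    # Collations must be registered before an index that uses them is built.
    for name, key in COLLATIONS.items():
        con.create_collation(name, make_collation(key))
    con.execute("CREATE TABLE directory (name TEXT)")
    # One index per collation, both over the same column.
    con.execute("CREATE INDEX directory_name_en ON directory (name COLLATE en_toy)")
    con.execute(
        "CREATE INDEX directory_name_de ON directory (name COLLATE de_phonebook_toy)"
    )
    con.executemany("INSERT INTO directory VALUES (?)", [(n,) for n in names])
    return con


def sorted_directory(con, collation: str):
    # Collation names cannot be bound as query parameters, so whitelist them.
    if collation not in COLLATIONS:
        raise ValueError(f"unknown collation: {collation}")
    rows = con.execute(f"SELECT name FROM directory ORDER BY name COLLATE {collation}")
    return [name for (name,) in rows]
```

For example, `build_directory(["Muller", "Müller", "Mulder", "Mueller"])` sorts "Müller" last under en_toy, but right after "Mueller" under de_phonebook_toy. Running `EXPLAIN QUERY PLAN` on the SELECT is one way to check whether SQLite actually chose the matching index.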


Out of curiosity, do you happen to know of a good resource that covers even just some of the less trivial differences this solves? I'm sure there are plenty, but offhand I don't know them (I'm not all that multilingual).

I understand it'll bring in glyph orderings that don't exist in en_US (or whatever your default is set to), such as 'Ç' in French, among others.


I do not have a good resource; however, I know a few off the top of my head: 1) characters with modifiers, like umlauts, sometimes collate the same as the unmodified character and sometimes collate differently; 2) multiple characters may collate as a single character, such as "ll" (I just did a search to verify that this was the case in Spanish, and found the Collation page on Wikipedia, which you might find interesting); and 3) different locales may collate numbers using different algorithms (in English we usually expect "1,000" to sort after "200", but if "," is a decimal point, then you might not).
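Points 2 and 3 can be sketched in a few lines of Python. The helpers spanish_traditional_key and parse_locale_number are hypothetical toys, not real locale implementations. The first models the traditional Spanish ordering, where "ch" and "ll" each count as one letter (after "c" and "l") and "ñ" sits between "n" and "o". It ignores accents and everything else a real collation handles. The second reads a number using a given decimal separator.

```python
import unicodedata

# Toy traditional Spanish alphabet (hypothetical): "ch" and "ll" are single
# letters, and "ñ" follows "n".
ALPHABET = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l",
            "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w",
            "x", "y", "z"]
WEIGHT = {letter: i for i, letter in enumerate(ALPHABET)}
CONTRACTIONS = {"ch", "ll"}


def collation_elements(word: str) -> list:
    # Split a word into collation elements, treating contractions as one unit.
    s = unicodedata.normalize("NFC", word).casefold()
    out, i = [], 0
    while i < len(s):
        if s[i:i + 2] in CONTRACTIONS:
            out.append(s[i:i + 2])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return out


def spanish_traditional_key(word: str) -> tuple:
    # Characters outside the toy alphabet sort after it, by code point.
    return tuple(WEIGHT.get(el, len(ALPHABET) + ord(el[0]))
                 for el in collation_elements(word))


def parse_locale_number(text: str, decimal_sep: str) -> float:
    # Hypothetical helper: the other of "." / "," is a thousands separator.
    thousands_sep = "." if decimal_sep == "," else ","
    return float(text.replace(thousands_sep, "").replace(decimal_sep, "."))
```

Plain code-point sorting gives `["llama", "lobo", "luz"]`, while `sorted(..., key=spanish_traditional_key)` gives `["lobo", "luz", "llama"]`. Likewise, "1,000" reads as 1000.0 with "." as the decimal point but as 1.0 with ",", which flips its order relative to "200".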
