Tangentially related: it is possible to de-anonymize users in a Kaggle dataset or in the Netflix Prize competition data.

https://medium.com/@EmiLabsTech/data-privacy-the-netflix-pri...

Compared to the example of the medical records, Netflix had been very careful not to add any data that could identify a user, like ZIP code, birthdate, and of course name, personal IDs, etc. Nevertheless, only a couple of weeks after the release, another PhD student, Arvind Narayanan, announced that he (together with his advisor, Vitaly Shmatikov) had been able to connect many of the unique IDs in the Netflix dataset to real people by cross-referencing another publicly available dataset: the movie ratings on the IMDb site, where many users post publicly under their own names.

https://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf

https://courses.csail.mit.edu/6.857/2018/project/Archie-Gers...
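
To make the cross-referencing idea concrete, here is a minimal sketch of a linkage attack on toy data. Everything in it is made up for illustration: the IDs, names, movie titles, thresholds, and the helper functions `similarity` and `link_profiles` are all hypothetical, and this is a deliberately naive matcher, not the algorithm from the paper linked above.

```python
from datetime import date

# Toy, invented data. "anonymized" stands in for a released dataset with
# opaque IDs; "public" stands in for reviews posted under real names.
anonymized = {
    "user_1041": {
        "Movie A": (5, date(2005, 3, 1)),
        "Movie B": (2, date(2005, 3, 9)),
        "Movie C": (4, date(2005, 4, 2)),
    },
    "user_2077": {
        "Movie A": (3, date(2005, 1, 15)),
        "Movie D": (5, date(2005, 2, 20)),
    },
}

public = {
    "Alice Example": {
        "Movie A": (5, date(2005, 3, 2)),
        "Movie B": (2, date(2005, 3, 10)),
        "Movie C": (4, date(2005, 4, 1)),
    },
    "Bob Example": {
        "Movie D": (4, date(2005, 6, 1)),
        "Movie E": (1, date(2005, 6, 3)),
    },
}


def similarity(anon, pub, max_rating_diff=1, max_days=3):
    """Count movies rated in both profiles with a similar rating and date."""
    score = 0
    for title, (rating, when) in pub.items():
        if title not in anon:
            continue
        anon_rating, anon_when = anon[title]
        close_rating = abs(anon_rating - rating) <= max_rating_diff
        close_date = abs((anon_when - when).days) <= max_days
        if close_rating and close_date:
            score += 1
    return score


def link_profiles(anonymized, public, min_score=2):
    """Map each public name to an anonymized ID when one ID clearly wins."""
    matches = {}
    for name, pub in public.items():
        scored = sorted(
            ((similarity(anon, pub), anon_id) for anon_id, anon in anonymized.items()),
            reverse=True,
        )
        best_score, best_id = scored[0]
        runner_up = scored[1][0] if len(scored) > 1 else 0
        # Accept only a strong match that is not tied with the runner-up.
        if best_score >= min_score and best_score > runner_up:
            matches[name] = best_id
    return matches


print(link_profiles(anonymized, public))
```

On this toy data only "Alice Example" is linked (to `user_1041`), because three of her public ratings line up in both score and date; "Bob Example" has no overlapping ratings that are close enough, so he stays unmatched. The point is that no name, ZIP code, or birthdate is needed: a handful of approximately known ratings can be enough to single out one record.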


