Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

That would be an interesting experiment. It might be more useful to make a model with a cut off close to when copyrights expire to be as modern as possible.

Then, we have a model that knows quite a bit in modern English. We also legally have a data set for everything it knows. Then, there's all kinds of experimentation or copyright-safe training strategies we can do.

Project Gutenberg up to the 1920's seems to be the safest bet on that.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: