
I really, really, really hate the tone of this article. What I could have understood in one sentence took 5+ paragraphs to read.

On top of that, the sneering tone comes across as unprofessional and disrespectful, in my opinion.

I'm also pretty sure he’s wrong, or at least that he has to change LayerNorm to make this work. Attention simply takes a weighted average of the value vectors. His change breaks that, because the weights no longer sum to 1, and I think it will push the output closer to 0 as you stack layers (especially considering LayerNorm). He really should run some small experiments to validate his idea first!
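To make the weighted-average point concrete, here is a rough sketch in plain Python. It assumes the change in question amounts to adding 1 to the softmax denominator; that is my reading, not something spelled out above, and `softmax_plus_one` and `attend` are hypothetical names used only for illustration. It only shows what happens to the weights in a single attention step, not what happens across stacked layers or after LayerNorm.

```python
import math

def softmax(scores):
    # Standard softmax: the weights are positive and sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_plus_one(scores):
    # ASSUMPTION: the modified softmax adds 1 to the denominator.
    # After shifting by m for numerical stability, that 1 becomes exp(-m).
    m = max(max(scores), 0.0)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps) + math.exp(-m)
    return [e / total for e in exps]

def attend(weights, values):
    # Weighted sum of the value vectors, one output entry per dimension.
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Made-up attention scores and identical value vectors, so any shrinkage
# in the output comes purely from the weights.
scores = [-2.0, -1.0, -3.0]
values = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]

standard = softmax(scores)
variant = softmax_plus_one(scores)

out_standard = attend(standard, values)
out_variant = attend(variant, values)

print("standard weights sum:", sum(standard))
print("variant weights sum: ", sum(variant))
print("standard output:", out_standard)
print("variant output: ", out_variant)
```

With the standard softmax the output is a true average of the value vectors, so identical values come back unchanged. With the variant the weights sum to less than 1, so the same values come back scaled toward 0. Whether that compounds over many layers is exactly the kind of thing a small experiment would settle.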


