In a sense the model is universal. It's just a 100GB (give or take) neural network.
And apparently (or so I've heard) feeding transformer models training data in Language A can improve their ability to understand Language B, the cross-lingual transfer effect. So maybe there's something truly universal there in some sense.