They say "LLMs are trained on the web." Are the web pages converted from HTML into Markdown before being fed into training?