I’m honestly surprised LLMs are still screwing up citations. It does not feel like a harder task than building software or generating novel math proofs. In both those cases, of course, there is a verifier, but self-verification with “Does this text support this claim?” seems like it ought to be within the capabilities of a good reasoning model.
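To make that concrete, here is a minimal sketch of what such a self-verification pass might look like. The `call_model` helper is a placeholder for whatever LLM client you happen to use, and the prompt and YES/NO format are illustrative assumptions, not how any particular system actually does it:

```python
def call_model(prompt: str) -> str:
    """Placeholder: send `prompt` to an LLM and return its text response."""
    raise NotImplementedError

def citation_supported(claim: str, cited_text: str) -> bool:
    """Ask the model whether the cited passage actually supports the claim."""
    prompt = (
        "Does the following source text support the claim?\n\n"
        f"Claim: {claim}\n\n"
        f"Source text: {cited_text}\n\n"
        "Answer with exactly YES or NO."
    )
    answer = call_model(prompt).strip().upper()
    return answer.startswith("YES")

def verify_citations(claims_with_sources: list[tuple[str, str]]) -> list[str]:
    """Return the claims whose cited text fails the support check."""
    return [
        claim
        for claim, source in claims_with_sources
        if not citation_supported(claim, source)
    ]
```

Nothing here is exotic: it is one extra model call per citation, which is why it is surprising that the check apparently isn't applied (or doesn't work) more often.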
But as I understand the situation, even the major Deep Research systems still have this issue.