
If all your problems with attention are actually just problems with softmax, then that's an easy fix. Delete softmax lmao.

No but seriously, just fix the fucking softmax. Add a dedicated "parking spot" like GPT-OSS does and eat the gradient flow tax on that, or replace softmax with any of the almost-softmax-but-not-really candidates. Plenty of options there.
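Rough sketch of the "parking spot" idea in plain numpy, since it's easier to see in code. The function shape and the fixed sink_logit are my own assumptions for illustration - as I understand it, GPT-OSS's actual sink is a learned per-head parameter baked into the attention kernel, not a fixed scalar:

    import numpy as np

    def softmax(x):
        x = x - x.max()                              # numerical stability
        e = np.exp(x)
        return e / e.sum()

    def attention_with_sink(q, K, V, sink_logit=0.0):
        # Scaled dot-product attention for one query, with one extra
        # "parking spot" logit that competes in the softmax but maps to
        # a zero value vector, so attention can effectively go nowhere.
        d = q.shape[-1]
        logits = K @ q / np.sqrt(d)                  # (T,)
        logits = np.append(logits, sink_logit)       # add the parking spot
        weights = softmax(logits)                    # sums to 1, sink included
        out = weights[:-1] @ V                       # the sink contributes nothing
        return out, weights

    # Toy usage: when no key matches the query, most mass parks on the sink.
    rng = np.random.default_rng(0)
    q = rng.normal(size=8)
    K = rng.normal(size=(4, 8)) * 0.01               # nothing worth attending to
    V = rng.normal(size=(4, 8))
    out, w = attention_with_sink(q, K, V, sink_logit=2.0)
    print(w)                                         # last entry (the sink) soaks up most of the attention

The weights still sum to 1, but one slot points at nothing, which is the whole trick.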

The reason we're "benchmaxxing" is that benchmarks are the metrics we have, and the only way to sift through this gajillion of "revolutionary new architecture ideas" and find the ones that show any promise at all. Of which there are very few, and fewer still that are worth their gains once you account for the fact that compute is not unlimited. Especially not when it comes to frontier training runs.

Memorization vs generalization is a well known idiot trap, and we are all stupid dumb fucks in the face of applied ML. Still, some benchmarks are harder to game than others (guess how we found that out), and there's power in that.





The reason we're benchmaxxing is that there's now a huge monetary incentive to have the best-performing model on these synthetic benchmarks - that status is worth a lot of money.

Literally every release of a something-point-X model from every major player includes some benchmark graphs to show off.


Benchmaxxing has also been identified as one of the causes of hallucination.

Hallucination is just built in - what am I missing?

That LLMs have some basic metaknowledge and metacognitive skills that they can use to reduce the hallucination rate.

Which is what humans do too - it's not magic. Humans just get more metacognitive juice for free. Resulting in a hallucination rate significantly lower than that of LLMs, but significantly higher than zero.

Now, having the skills you need to avoid hallucinations is good, even if they're weak and basic skills. But is an LLM willing to actually put them to use?

OpenAI cooked o3 with reckless RL using hallucination-unaware reward calculation - which punished reluctance to answer and rewarded overconfident guesses. And their benchmark suite didn't catch it, because the benchmarks were hallucination-unaware too.
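Toy illustration of that incentive (reward values and numbers made up, not OpenAI's actual grader): under plain binary grading, a model that always guesses beats one that abstains when unsure, even if most of its guesses are wrong; penalize wrong answers and the ordering flips.

    import random

    # Hypothetical grading schemes; the specific reward values are made up.
    def binary_reward(answer, correct):
        # hallucination-unaware: a wrong guess scores the same as abstaining
        return 1.0 if answer == correct else 0.0

    def calibrated_reward(answer, correct):
        # hallucination-aware: abstaining beats a confident wrong answer
        if answer == "IDK":
            return 0.0
        return 1.0 if answer == correct else -1.0

    # A model that only guesses right 30% of the time:
    random.seed(0)
    trials = [("A" if random.random() < 0.3 else "B", "A") for _ in range(1000)]

    guess_bin   = sum(binary_reward(a, c) for a, c in trials) / len(trials)
    abstain_bin = sum(binary_reward("IDK", c) for _, c in trials) / len(trials)
    print(guess_bin, abstain_bin)    # guessing wins under binary grading

    guess_cal   = sum(calibrated_reward(a, c) for a, c in trials) / len(trials)
    abstain_cal = sum(calibrated_reward("IDK", c) for _, c in trials) / len(trials)
    print(guess_cal, abstain_cal)    # abstaining wins once wrong answers cost something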


> Add a dedicated "parking spot" like GPT-OSS does and eat the gradient flow tax on that

Not familiar with this topic, but intrigued - anywhere I can read more about it?


Looked for it briefly; I think the best I found is this older discussion:

https://news.ycombinator.com/item?id=44834918


OpenAI have talked about it. The architecture needs to let the model handle the case where there's nothing worth attending to: softmax forces the attention weights to sum to 1 across the tokens, so some attention gets allocated even when nothing in the sequence deserves it.
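You can see the constraint in a couple of lines of numpy. The second variant is the "off-by-one" softmax (an implicit extra logit of 0 in the denominator) that usually gets floated as the fix - that's the community's candidate, not something OpenAI has said they ship:

    import numpy as np

    def softmax(logits):
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def softmax_plus_one(logits):
        # Same thing, but with an implicit extra logit of 0 in the
        # denominator, so the weights no longer have to sum to 1.
        e = np.exp(logits - logits.max())
        return e / (e.sum() + np.exp(-logits.max()))

    # All keys are equally irrelevant: plain softmax still hands out the
    # full attention budget; the +1 variant can decline to attend.
    logits = np.array([-6.0, -6.0, -6.0, -6.0])
    print(softmax(logits).sum())            # 1.0 - must attend to something
    print(softmax_plus_one(logits).sum())   # ~0.01 - attention can go nowhere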


