Hacker News

There's a huge disconnect between what the benchmarks show and the day-to-day experience of those of us actually using LLMs. According to SWE-bench, I should be able to outsource a lot of tasks to LLMs by now. But practically speaking, I can't get them to reliably do even the most basic tasks. Benchmaxxing is a real phenomenon. Internal private assessments are the most accurate source of information we have, and those seem to be quite mixed for the most recent models.


How ironic that these LLMs appear to be overfitting to the benchmark scores. Presumably these researchers deal with overfitting every day, yet they can't recognize it right in front of them.
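The analogy is the classic train/test gap. A minimal sketch (purely illustrative, not anyone's actual eval code): a model that just memorizes its training set scores perfectly on that set while doing worse on held-out data, which is the same failure mode as tuning a model against a public benchmark. The `target` function and all names here are hypothetical.

```python
import random

random.seed(0)

def target(x):
    # Hypothetical "true" task: predict x squared.
    return x * x

# Training set plays the role of the public benchmark; test set is held out.
train = [(x, target(x)) for x in (random.uniform(-1, 1) for _ in range(20))]
test = [(x, target(x)) for x in (random.uniform(-1, 1) for _ in range(20))]

def memorizer(x, table):
    # 1-nearest-neighbor lookup: return the label of the closest known x.
    # This "model" has effectively memorized its training data.
    nearest = min(table, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def mse(data, table):
    # Mean squared error of the memorizer over a dataset.
    return sum((memorizer(x, table) - y) ** 2 for x, y in data) / len(data)

print(mse(train, train))  # 0.0 -- perfect score when "benchmark" = training set
print(mse(test, train))   # nonzero error on held-out data
```

A benchmark only measures generalization while models aren't being optimized against it; once they are, the score stops tracking real-world capability.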


I'm sure they all know it's happening. But the incentives are all misaligned: they get promotions and raises for pushing the frontier, which means showing SOTA performance on benchmarks.



