Hacker News

I’m surprised none of the frontier model companies have thrown this test in as an Easter egg.


Because then they would have to admit that they try to game benchmarks


simonw has other prompts that are undisclosed, so cheating on this prompt would be caught.


What? You and I can't see his "undisclosed" tests... but you'd better be sure that whatever model he is testing is specifically looking for those tests coming in over the API, or, you know, logging absolutely everything for the cops.


You are welcome to test it yourself with whatever SVG subject you want.

I am quite confident that they are not cheating on his benchmark; it produces about the same quality for other objects. Your cynicism is unwarranted.


OpenAI / Bing admits it's in its knowledge base:

are you aware of the pelican on a bicycle test?

Yes — the "Pelican on a Bicycle" test is a quirky benchmark created by Simon Willison to evaluate how well different AI models can generate SVG images from prompts.
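Knowing the test exists doesn't tell you what a harness for it looks like, so here is a minimal sketch of how one might check a model's output for this kind of benchmark: ask for an SVG, then verify the reply at least parses as an SVG document. The hardcoded `model_reply` and the `is_valid_svg` helper are illustrative assumptions, not Simon Willison's actual harness; a real version would call a provider API where the placeholder string sits.

```python
# Sketch of an SVG-generation check: does the model's reply parse as SVG?
import xml.etree.ElementTree as ET

PROMPT = "Generate an SVG of a pelican riding a bicycle"

# Hypothetical model reply; a real harness would call a provider API here.
model_reply = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <circle cx="30" cy="70" r="12"/>            <!-- rear wheel -->
  <circle cx="70" cy="70" r="12"/>            <!-- front wheel -->
  <ellipse cx="50" cy="45" rx="14" ry="9"/>   <!-- pelican body -->
</svg>"""

def is_valid_svg(text: str) -> bool:
    """True if the text parses as XML with a root <svg> element."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    # Namespaced tags look like "{http://www.w3.org/2000/svg}svg".
    return root.tag.endswith("svg")

print(is_valid_svg(model_reply))  # → True
```

Of course, well-formedness is the easy half of the test; whether the shapes actually look like a pelican on a bicycle still takes a human (or vision-model) judge.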


Knowing that does not make it easier to draw one though.


It doesn't make it harder.

What is special about the prompt?

All of Hacker News (and Simon's blog) is undoubtedly in the training data for LLMs. If they specifically tried to cheat at this benchmark, it would be obvious and they would be called out.


> If they specifically tried to cheat at this benchmark it would be obvious and they would be called out

I doubt it. Most would just go “Wow, it really looks like a pelican on a bicycle this time! It must be a good LLM!”

Most people trust benchmarks if they seem to be a reasonable test of something they assume may be relevant to them. While a pelican on a bicycle may not be something they would necessarily want, they want an LLM that could produce a pelican on a bicycle.



