> Lily puts her keys in an opaque box with a lid on the top and closes it. She leaves. Bob comes back, opens the box, removes the keys, and closes the box, and places the keys on top of the box. Bob leaves.
> Lily returns, wanting her keys. What does she do?
ChatGPT4:
> Lily, expecting her keys to be inside the opaque box, would likely open the box to retrieve them. Upon discovering that the keys are not inside, she may become confused or concerned. However, she would then probably notice the keys placed on top of the box, pick them up, and proceed with her original intention.
GPT4 cannot (without heavy hinting) infer that Lily would have seen the keys before she even opened the box! What's amusing is that if you change the prompt to "transparent", it understands she sees them on top of the box immediately and never opens it -- more the behavior of a word-probability engine than a "reasoning" system.
That is, it can't really "reason" about the world and doesn't have awareness of what it's even writing. It's just an extremely good pattern matcher.
> Can you give an example of a truly novel problem that it solves worse than a child? How old is the child?
See above; the child is 7. It fails all sorts of custom theory-of-mind problems. For example, it gives a crazy answer to:
> Jane leaves her cat in a box and leaves. Afterwards, Billy moves the cat to the table and leaves. Jane returns and finds her cat in the box. Billy returns. What might Jane say to Billy?
It assumes Jane knows Billy moved the cat (which she doesn't).
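These false-belief prompts all hinge on one piece of bookkeeping: an agent's belief about an object's location updates only on events that agent actually witnesses. As a toy sketch (the names and structure here are entirely my own illustration, not anything about GPT4's internals), the tracking a 7-year-old does implicitly is tiny:

```python
# Toy false-belief tracker: each agent's belief about an object's
# location updates only on the events that agent is present for.

def run_events(events):
    """events: list of (present_agents, object, location) tuples.
    Returns (true_locations, beliefs) where beliefs maps
    agent -> {object: believed location}."""
    beliefs = {}
    true_loc = {}
    for present, obj, loc in events:
        true_loc[obj] = loc          # reality always updates
        for agent in present:        # but only witnesses' beliefs do
            beliefs.setdefault(agent, {})[obj] = loc
    return true_loc, beliefs

# The keys scenario: Lily puts the keys in the box and leaves;
# Bob, alone, moves them on top of the box.
events = [
    (["Lily"], "keys", "inside box"),
    (["Bob"], "keys", "on top of box"),
]
truth, beliefs = run_events(events)
print(truth["keys"])            # "on top of box" -- reality
print(beliefs["Lily"]["keys"])  # "inside box"   -- Lily's false belief
```

Lily's stale belief says "inside box" while reality says "on top of box"; the missing step in GPT4's answer is that her perception on returning (keys visibly on top) should override the stale belief before she ever opens the lid.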
I also had difficulty getting GPT4 to commit to sane answers about mixing different colors of light. It struggles with complex ratios: it doesn't consistently grasp that green + red + blue in equal parts must yield white. Even after one shot of clear explanation, it couldn't generalize that an N:M:M mix of the primaries must keep the hue of the dominant primary (my kid, again, could do that after one shot).
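For reference, the additive-mixing fact at stake is simple arithmetic: any r:g:b intensity ratio splits into a white component (the common minimum) plus a leftover, so N:M:M with N > M is just the red primary tinted toward white. A minimal sketch (my own framing, not anything GPT4 was shown):

```python
# Additive RGB mixing as ratio arithmetic: split a mix into its
# white component plus whatever primary/secondary is left over.

def decompose(r, g, b):
    """Split an additive r:g:b mix into (white_amount, leftover)."""
    white = min(r, g, b)             # equal parts of all three = white
    leftover = (r - white, g - white, b - white)
    return white, leftover

print(decompose(1, 1, 1))  # (1, (0, 0, 0)) -> pure white
print(decompose(3, 1, 1))  # (1, (2, 0, 0)) -> white + red: a red tint
```

The 1:1:1 case is all white, and 3:1:1 is white plus leftover red, so the dominant hue stays red no matter how the equal minor components scale.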
> True, but you can let it use output tokens as scratch space and then only look at the final result. That lets it behave as if it has memory.
Yes, but it has difficulty maintaining a consistent line of thought. I've found that on custom multi-step problems it starts hallucinating.
> To the contrary, the trend of increasingly large transformers seemingly getting qualitatively smarter indicates that maybe the architecture matters less than the scale/training data/cost function.
I think "intelligence" is difficult to define, but there's something to be said about how different transformers are from the human mind. They end up with very different strengths and weaknesses.
Previously here: https://news.ycombinator.com/threads?id=usaar333#35275295