AI only has to be as reliable as a human. The task is ultimately trivial and clearly within GPT-4's capabilities. It is easy to verify statistically whether GPT-4 is at least as accurate as a human. Of course, Tao did not do this here, but it will be done.