Not to denigrate the techniques used here, but as a computer vision researcher (face recognition in my case), I find it interesting how important good labelled training and test sets are. Some of my successes over the years have come more from figuring out where to get good data than from good computer vision techniques.
It's not only large datasets, but also the willingness of a community to revisit old ideas from a new angle. Computer vision has been very good in this regard overall, though there were some dark times (https://www.facebook.com/yann.lecun/posts/10152034328862143).
Can you ever generate artificial training and test data?
For instance, suppose I would like to be able to take a photo of a chess game and turn that into a diagram of the position. I have no idea where I would get natural photos of thousands or millions of chess games to use for training and testing a chess piece identifying vision system.
Could I instead make 3D models of common chess set designs, and then generate and render photorealistic images of chess positions to use for training and test data for the vision system?
Probably, but it might not work well under different conditions, like a picture of a chessboard in Central Park vs. a chessboard in a library. This could probably be solved with more renders, and even if it can't, the renders would probably be a good starting point.
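To make the "more renders" idea concrete, here is a rough sketch of how the labelled samples might be generated before they ever reach a renderer. Everything here is an assumption for illustration: the function names, the scene parameters (camera angle, light intensity, background, piece set) and their ranges are hypothetical, and the boards are random piece placements rather than legal chess positions. The actual rendering step is left out.

```python
import random

PIECES = "PNBRQKpnbrqk"  # FEN letters: uppercase = white, lowercase = black


def random_board(rng, fill=0.3):
    """Return an 8x8 grid (rank 8 first) of FEN piece letters or None.

    Assumption: pieces are placed at random, so the result is usually
    not a legal chess position. That is fine for piece recognition.
    """
    return [[rng.choice(PIECES) if rng.random() < fill else None
             for _ in range(8)] for _ in range(8)]


def board_to_fen(board):
    """Encode the grid as the piece-placement field of a FEN string."""
    ranks = []
    for row in board:
        out, empty = "", 0
        for square in row:
            if square is None:
                empty += 1
            else:
                if empty:
                    out += str(empty)
                    empty = 0
                out += square
        if empty:
            out += str(empty)
        ranks.append(out)
    return "/".join(ranks)


def random_scene(rng):
    """Pick hypothetical render settings to vary from image to image."""
    return {
        "camera_elevation_deg": rng.uniform(25, 90),
        "camera_azimuth_deg": rng.uniform(0, 360),
        "light_intensity": rng.uniform(0.3, 1.5),
        "background": rng.choice(["park", "library", "kitchen_table"]),
        "piece_set": rng.choice(["staunton_wood", "staunton_plastic"]),
    }


def make_sample(rng):
    """One training example: the ground-truth label plus render settings."""
    board = random_board(rng)
    return {"label_fen": board_to_fen(board), "scene": random_scene(rng)}


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(make_sample(rng))
```

The point of drawing the scene settings at random is that the label comes for free and the lighting, viewpoint and background change on every image, which is the cheap way to cover the park vs. library case.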
You still need some natural data, otherwise the network will probably overfit on regularities in your CG engine that don't exist in the real world. It might do that anyway, even with natural data.
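One simple way to act on this, sketched below, is to make sure every training batch contains a fixed share of natural photos alongside the renders. The function name, the 25% share and the batch size are assumptions picked for illustration, not recommended values, and this does not by itself guarantee the network won't latch onto renderer artifacts.

```python
import random


def mixed_batches(real, synthetic, batch_size=8, real_fraction=0.25, seed=0):
    """Yield batches that always contain some natural examples.

    `real` and `synthetic` are lists of examples (any type). Items are
    drawn with replacement, so a small set of real photos gets reused.
    """
    rng = random.Random(seed)
    n_real = max(1, round(batch_size * real_fraction))
    n_synthetic = batch_size - n_real
    while True:
        batch = rng.choices(real, k=n_real) + rng.choices(synthetic, k=n_synthetic)
        rng.shuffle(batch)
        yield batch


if __name__ == "__main__":
    real_photos = ["real_001.jpg", "real_002.jpg"]
    renders = [f"render_{i:04d}.png" for i in range(1000)]
    batches = mixed_batches(real_photos, renders)
    print(next(batches))
```

Whatever the mix used for training, it probably makes sense to keep a held-out set of natural photos for testing only, since accuracy measured on renders says little about real-world performance.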