It's interesting that, in the image showing the classes easiest and hardest for the algorithm, all the easy ones are animals and all the difficult ones are human-made artifacts.
Does nature tend to create forms which are easily detected by relatively simple neural networks? The evolutionary explanation could be that this allows animals with primitive neural systems to more easily distinguish other members of their species.
I think it's because these human-made objects have no single form. For instance, "letter opener" describes the function of the object, not the form. Compare that to "red fox", which is always going to look more or less the same.
Yes, but the network is still differentiating between select breeds, which means it is learning traits that are unique to the breed. And training at a higher level is perfectly doable - say "fox" instead of a specific fox breed.
Ultimately, if you are able to look at an image and say "letter opener", there are features which differentiate it from a knife/can opener/whatever - these are the exact things a convolutional neural network should be able to use (in theory), and this has nothing to do with the label, which is typically unimportant as long as it is unique and accurate.
We could flip all the labels around and still get unique answers - the network is just learning a mapping from input -> some integer, and I would argue the variance in dog breeds and lighting in natural scenes is much trickier than the angle/shape of a letter opener.
I still think it comes down to the composition of this particular dataset. Augmenting this with images scraped from online stores would be very interesting as it is fairly trivial to get huge numbers of images for anything that is typically sold online - I think Google is way ahead on this one!
It's impossible to tell from the examples given in the article, but I wouldn't be surprised if the same classifier that gets 100% on "Blenheim Spaniel" and "Flat-coated Retriever" gets less than 100% on "Dog".
It's a question of how visually coherent the category you're trying to learn is. From a purely visual perspective, the first two categories are relatively tightly bunched in the state space, whereas "dog" covers a diffuse cloud of appearances whose total range might even encompass the area where many non-dog animals also lie. Humans may rely on some additional semantic knowledge about different kinds of animal to produce an accurate classification. It's not entirely unlike how determining the meaning of the words in the phrase "eats shoots and leaves" can't be done reliably without incorporating contextual clues such as whether we were just talking about pandas or a murder in a restaurant.
There may also be issues around how distinct the categories are from each other. A couple years ago yours truly picked up a letter opener off the table and used it to spread butter on his toast, much to the amusement of his hosts.
In practical use, you can simply search for anything in the "dog" subclass using the WordNet hierarchy... so there is no loss in accuracy unless you have confusion across the search groups! We actually support this in sklearn-theano - if you plug in 'cat.n.01' and 'dog.n.01' for an OverfeatLocalizer we return all matched points in that subgroup.
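For anyone curious what the WordNet-hierarchy search looks like in practice, here is a rough sketch using NLTK's WordNet corpus rather than sklearn-theano itself, so treat the names and details as illustrative:

    # Enumerate every synset under 'dog.n.01' so any breed-level prediction
    # (Blenheim spaniel, flat-coated retriever, ...) counts as a "dog" hit.
    from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

    dog = wn.synset('dog.n.01')
    dog_names = {s.name() for s in dog.closure(lambda s: s.hyponyms())} | {dog.name()}

    def in_dog_group(predicted_synset_name):
        # a breed-level name like 'blenheim_spaniel.n.01' should land in the group
        return predicted_synset_name in dog_names

The same idea extends to any node in the hierarchy, which is how searching at the "dog" level can reuse breed-level predictions without retraining anything.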
In general, if you misclassify "dog" for a fixed architecture you will most certainly misclassify "Blenheim Spaniel" and "Flat-coated Retriever" - the two other classes are subsets of the first. The "eats shoots and leaves" sentence is analogous to a "zoomed in" picture of fur - we don't know what it is but we are pretty sure what it isn't! This is still useful, and would already get most of the way there for large numbers of fur colors/patterns.
I think the concerns you have are more important at training time, but I have not seen a scenario where it has mattered very much. In general having good inference about these nets is really hard, but I think your initial thought about "dog space" ties in nicely to a post by Christopher Olah (http://christopherolah.wordpress.com/2014/04/09/neural-netwo...) - maybe you will find it interesting?
And yes, it becomes really fascinating to extend your last thought to "optical illusions" and other tricks of the mind - even our own processing has paths that are easily deceived and sometimes flat-out wrong... so it is no surprise when something far inferior and less powerful also has trouble :)
The tiger [it's a leopard] and stingray [some other ray?] are wrong, but the system is 100% certain they're right; seems quite a big error considering the apparent accuracy of the other labels.
Isn't it contextual? Flat-coated retriever, well done - but how good is it at picking one out of a pile of images of black animals, panthers, and house cats?
It could also be that there are more labeled examples of animals, or that translating/rotating an animal is less disruptive than the same transform applied to a car or other object.
I know for sure that ImageNet has a huge amount of animals in it, down to very select sub-breeds, while the object categories are usually at higher levels.
I'd say it's an entropy thing. The principle of "correlation of parts" means images of animals have low entropy relative to, say, machines. Intuitively a machine can contain an almost arbitrary set of shapes and components, whereas animal forms are more constrained.
See my earlier answer - I think that it is all about the data. And most things humans design have a similar "correlation of parts" due to the human preference for symmetry. Not all the images are face on headshots - lots of running/action, off centered, etc. ImageNet is a very "real" dataset in that sense.
Entropy isn't the right term here - low entropy would imply something about the compressibility of each image based only on whether it was a machine or a natural object, which I don't think is the case.
Well, if the effect is just experimental error, I agree that no explanation is needed.
I was using entropy in the information theoretic sense, as you might assign a measure of entropy in bits to each character in a language. If part of an animal is "less surprising" I'd say it contributed less to the entropy of the whole thing. Maybe that's too woolly.
There is no way you could identify the whole machine based on seeing a small part if an identical part recurs in many different machines. It's not holographic in the way animals are.
Chicken, egg. Maybe these things are easy because they are crucial to survival (and everything that was bad at recognizing these signs died), not due to being easier for simple visual processes.
Both chicken and egg at the same time. Evolution's going to favor the solution that involves the least costly adaptations. That probably means simpler markings will be favored because it's less costly for the prey species to evolve them and it's less costly for the predator species to evolve an instinct to avoid them.
> The evolutionary explanation could be that this allows animals with primitive neural systems to more easily distinguish other members of their species.
It's probably the other way around. Neural systems optimize for natural objects.
But isn't the neural network (like the algorithm described in the article) a mathematical concept? Are the computer-based neural networks so closely modeled after biological equivalents that they would inherit such an optimization?
(As you can tell, I know nearly nothing about neural networks, but curious to learn more...)
Digital circuits? I mean, it is just some matrix multiplies and a nonlinearity (then stack to the moon). No "circuitry" is really involved until you get into recurrent networks, and even then that is just feedback. Not quite sure what you mean here.
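To make that concrete, here's a toy numpy sketch of what "matrix multiplies and a nonlinearity, then stack" means (all of the sizes are made up):

    import numpy as np

    rng = np.random.RandomState(0)
    W1, b1 = 0.01 * rng.randn(784, 128), np.zeros(128)  # first affine layer
    W2, b2 = 0.01 * rng.randn(128, 10), np.zeros(10)     # second affine layer

    def forward(x):
        h = np.maximum(0, x.dot(W1) + b1)  # matrix multiply + nonlinearity (ReLU)
        return h.dot(W2) + b2              # stack another matrix multiply on top

    logits = forward(rng.randn(32, 784))   # fake batch of 32 flattened "images"
    print(logits.shape)                    # (32, 10)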
There have been experiments trying to encode information the way the brain does, they just haven't worked very well (or at least not as well).
There is an equivalence with PCA (well, ZCA, a modified form of PCA) in cat and monkey brains, and likely others as well. See Sejnowski and Bell in [1].
Also, PCA is an affine transform, so there is no reason it couldn't be incorporated/learned by the net itself. In fact, I think most nets these days eschew PCA/ZCA when they have sufficient data support.
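For reference, ZCA whitening really is just an affine transform of the data. A rough numpy sketch (the eps fudge factor is my own addition, to avoid dividing by near-zero eigenvalues):

    import numpy as np

    def zca_whiten(X, eps=1e-5):
        X = X - X.mean(axis=0)                    # center the data
        cov = X.T.dot(X) / X.shape[0]             # empirical covariance
        U, S, _ = np.linalg.svd(cov)              # cov is symmetric, so this is an eigendecomposition
        W = U.dot(np.diag(1.0 / np.sqrt(S + eps))).dot(U.T)  # ZCA whitening matrix
        return X.dot(W)                           # decorrelated, roughly unit-variance features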
To clarify for others, this type of neural network has nothing to do with the brain. A neural network is really a "universal function approximator", and I actually prefer to call it that. Our goal is to learn the best possible mapping of input -> label, through whatever means necessary. It turns out that learning hierarchies of features helps from both a learning standpoint and a computational point of view. But a sufficiently wide single layer could, in theory, do the same thing.
I have no idea. But I'd wager that cause and effect in natural systems has more to do with visual systems adapting to shapes than shapes of entire animals adapting to other species' visual systems.
> But isn't the neural network (like the algorithm described in the article) a mathematical concept?
It is. It's a family of models that (a) can be expressed compactly using linear algebra and (b) can represent a large class of mathematical functions. (In fact, by the universal approximation theorem, neural nets can approximate any continuous function on a compact domain arbitrarily well.) "Learning" is typically some variety of gradient descent (on the "error surface" defined by the training set) in the space of parameters.
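As a minimal sketch of "gradient descent on the error surface", here is the idea for a single linear layer with squared error on made-up data:

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.randn(100, 3)                                   # made-up inputs
    y = X.dot([2.0, -1.0, 0.5]) + 0.1 * rng.randn(100)      # made-up targets

    w, lr = np.zeros(3), 0.1
    for _ in range(200):
        grad = X.T.dot(X.dot(w) - y) / len(y)   # gradient of mean squared error w.r.t. w
        w -= lr * grad                          # step downhill on the error surface
    print(w)                                    # ends up near [2, -1, 0.5]

Real networks just do the same thing with many more parameters and with the gradients computed by backpropagation.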
> Are the computer-based neural networks so closely modeled after biological equivalents that they would inherit such an optimization?
In my opinion, no. Biological neural networks are, in fact, a lot more complicated. They have time behavior, changing network topologies, and chemical influences (neurotransmitters) that play a major role. There's still a lot we don't know about them.
I disagree with GP's contention. While the biological neural network was an inspiration for this class of mathematical models, and something like convolutional behavior (which in ANNs is a regularization - a mechanism for reducing the dimensionality of the parameter space, sacrificing training-set performance but often improving test-set performance) may be present in our visual cortex, artificial neural nets are quite different and, mathematically, most varieties are quite simple.