"Baidu does not show up in this year’s rankings. The company made more submissions than were permitted and ultimately apologized and fired the team leader who directed juniors to make the unacceptable submissions."
Might be worth noting that it is very likely that Google, Facebook, and Baidu did not even participate in this year's ImageNet. It is widely believed in the vision community that this dataset is at the end of its run, i.e. any further improvements in performance are not due to scientific breakthroughs but to hyperparameter optimization and ensembling.
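For context, ensembling here just means combining the predictions of several trained models. A minimal sketch (toy numpy arrays, not any team's actual pipeline) of averaging per-class probabilities across models:

    import numpy as np

    def ensemble_predict(prob_list):
        """Average per-class probabilities from several models and pick
        the highest-scoring class per example.
        prob_list: list of (n, num_classes) arrays, each row summing to 1."""
        avg = np.mean(prob_list, axis=0)   # (n, num_classes) averaged over models
        return avg.argmax(axis=1)          # predicted class per example

    # Toy example: three "models", 4 examples, 5 classes
    rng = np.random.default_rng(0)
    prob_list = []
    for _ in range(3):
        logits = rng.standard_normal((4, 5))
        probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        prob_list.append(probs)
    print(ensemble_predict(prob_list))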
In the classification task, MSRA and ReCeption show very similar performance: 0.03567 vs. 0.03581 top-5 error. The gap is much wider on the localization task: 0.090178 vs. 0.195792.
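For anyone unfamiliar with the metric, here is a minimal sketch of how top-5 error is computed (hypothetical scores and labels, not the official challenge evaluation code):

    import numpy as np

    def top5_error(scores, labels):
        """Fraction of examples whose true label is not among the five
        highest-scoring classes.
        scores: (n, num_classes) array, labels: (n,) integer class ids."""
        top5 = np.argsort(scores, axis=1)[:, -5:]       # indices of the 5 best scores per example
        hit = (top5 == labels[:, None]).any(axis=1)     # True if the true label is in the top 5
        return 1.0 - hit.mean()

    # Toy example: 3 examples, 10 classes
    rng = np.random.default_rng(0)
    scores = rng.random((3, 10))
    labels = np.array([2, 7, 9])
    print(top5_error(scores, labels))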
The residual learning approach presented by Microsoft Research seems to be a breakthrough, based on the early evidence. But yes, ImageNet needs to be updated to stay relevant.
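Roughly, the residual idea is to learn a correction F(x) on top of an identity shortcut rather than the full mapping H(x). A toy numpy sketch of a single residual block (nothing like the actual convolutional MSRA architecture, just the shortcut idea):

    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    def residual_block(x, W1, W2):
        """Two-layer block with an identity shortcut:
        output = relu(F(x) + x), where F(x) = W2 @ relu(W1 @ x).
        The block only has to learn the residual F(x) = H(x) - x."""
        f = W2 @ relu(W1 @ x)   # the learned residual mapping F(x)
        return relu(f + x)      # identity shortcut: add the input back

    # Toy forward pass; dimensions match so x can be added to F(x)
    d = 8
    rng = np.random.default_rng(1)
    x = rng.standard_normal(d)
    W1 = rng.standard_normal((d, d)) * 0.1
    W2 = rng.standard_normal((d, d)) * 0.1
    print(residual_block(x, W1, W2).shape)  # (8,)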
Google entered with ReCeption and finished just after the Microsoft team in the "Ordered by classification error" section (which is the usual "headline" number).
Baidu was banned. I don't think Facebook has ever done ImageNet?
But I agree that most of the interesting work will happen outside ImageNet now that human performance has been comprehensively surpassed.
The only exception is non-deep-learning systems, where some people remain convinced that alternative approaches can match DL systems while having other advantages.
I won't be surprised when human-level performance is surpassed, but isn't that paper based on work that had the advantage of knowing the full test dataset in advance? You can say "yes, but they only trained on the training data," but that doesn't rule out tweaking across several experiments, measuring each against the known test data, and then cherry-picking the experiment that best overfit that data, right? I'm not saying there are shenanigans here, just wondering how you know there are not.
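To make the worry concrete, here is a toy simulation (made-up numbers, nothing to do with the real ImageNet test set): if many tweaked variants are scored against the same fixed test set and only the best one is reported, the reported number overstates the true accuracy purely by chance.

    import numpy as np

    rng = np.random.default_rng(0)
    true_acc = 0.95       # every candidate variant has the same true accuracy
    test_size = 1500      # size of the fixed test set being reused
    n_attempts = 50       # number of tweaked variants measured against it

    # Each attempt's measured accuracy fluctuates around the true accuracy
    measured = rng.binomial(test_size, true_acc, size=n_attempts) / test_size

    print("true accuracy:        %.4f" % true_acc)
    print("mean measured:        %.4f" % measured.mean())
    print("best (cherry-picked): %.4f" % measured.max())
    # The cherry-picked best looks better than the true accuracy by luck alone.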
The test labels are not known to them; they are held by a separate team that reports back the test error rate. Furthermore, multiple separate teams have surpassed human performance.
FWIW, "human performance" is not as meaningful as most people make it out to be, because this is such a narrow task.
The paper mentions it's unfair to humans to have to tell, say, a coucal from an indigo bunting (both blue-coloured birds). I've never heard of either, so I guess I would have got that wrong, but it's not the fault of my visual system.
I am familiar with the ImageNet competition. Let me know if you find the ImageNet test labels. (Hint: they are not released, since they are re-used for competitions).
Interesting, did not know that. Do you think that proves this team was not able to get and exploit some inside knowledge? Judging from recent competitions, ethics does not seem to be a strong point in some cultures.
Can't speak for this particular team, but a submission for a competition is usually done at the last minute, so it's highly unlikely that _this_ algorithm is already part of a product like the Oxford API. But a predecessor might be, and it might get integrated over time. A research prototype usually requires a lot of duct tape to work.
Productization usually requires a different mindset (i.e. an engineering team rather than a research team), one that cares more about things like test coverage, distribution, runtime performance, etc. So by definition MSR is ahead of the curve, and product teams are on the curve.
So you mean there are five entries you can select to be scored at the end, and there is one private leaderboard that no one but the admins can see? Do you know the size of the private leaderboard's test set?
Have they not really added much to it since the Jeopardy stunt? We get periodic rumblings about integrating it with a customer-service tool, but it is starting to sound a little like Duke Nukem Forever.
They have it doing medical analysis (I'm hesitant to say "diagnosis" but I think it's being used to find symptom patterns doctors might otherwise miss).
Watson for Oncology analyzes a patient's medical information against a vast array of data, including ongoing expert training from MSK physicians, cancer case histories, and more, to provide evidence-based treatment options.
Does Chef Watson count? It's actually pretty neat as a consumer tool; it suggests the weirdest things, but somehow they taste fine. No idea about the inner workings, though.
AlchemyVision[1] from IBM Watson's Developer Cloud/BlueMix[2] is part of the Watson stack, and is pretty good. One of the few public APIs that lets you train your own categories.