After trialling several image recognition and categorization API services, I found Watson by far the least impressive when using the default classifier.
For a project I've been building, I have used Clarifai, Google Vision API, Watson, and Imagga. In my empirical tests, Watson has consistently produced poor, if not hilariously nonsensical, results. As an example, I used this image of a baby in a stroller (https://dl.dropboxusercontent.com/u/898689/IMG_2948.jpg). Here are the classification tags returned by the four services:
Clarifai: people, child, vehicle, woman, one, man, outdoors, emergency, accident, protest, adult, transportation system, carriage, portrait, wheelchair, safety, road, bike, leisure, wheel
Google: baby carriage, car seat, child, vehicle, diving equipment
Watson: performing, escalator, repairing, indoors, celebration, dancing, human, amusement arcade, bottle, baggage claim, group of people, appliance, tiger, people, big group, child, mixed color
Imagga: people portraits
These results will obviously vary depending on image subject, composition, etc., but I've basically dismissed Watson as a viable off-the-shelf visual recognition API. That said, if you have a specific dataset of images that positively and negatively identify a concept, Watson's custom classifiers may be of interest, although I haven't tried them.
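For reference, here is a minimal Python sketch of what one of these calls looks like, using the Google Vision REST API's label detection. The API key and image URL are placeholders, and the other services each have their own endpoints and response shapes:

```python
# Minimal sketch: label detection via the Google Vision v1 REST API.
# The API key and image URL below are placeholders.
import requests

GOOGLE_API_KEY = "your-api-key"                       # placeholder
IMAGE_URL = "https://example.com/baby-stroller.jpg"   # placeholder

payload = {
    "requests": [{
        "image": {"source": {"imageUri": IMAGE_URL}},
        "features": [{"type": "LABEL_DETECTION", "maxResults": 10}],
    }]
}
resp = requests.post(
    "https://vision.googleapis.com/v1/images:annotate",
    params={"key": GOOGLE_API_KEY},
    json=payload,
)
resp.raise_for_status()
# Each label comes back with a description and a confidence score.
for annotation in resp.json()["responses"][0].get("labelAnnotations", []):
    print(f'{annotation["description"]}: {annotation["score"]:.2f}')
```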
Chris from Imagga here. Just tested with the provided image and the demo returns the following tags: child, happy, kid, cute, person, caucasian, little, boy, people, happiness, childhood, smile, fun, portrait, smiling, pedestrian, children, outside, etc.
We are less conservative with the keywords we return and usually offer far more than just two words, which is what made me check what the API returns.
By the way, I'm on the lookout for other image classification services or open-source projects. I know I can (and will) dive deep into TensorFlow deep learning, but any other suggestions are very welcome!
If you're looking to train a neural net for a specific narrow category, then you can't beat something like TensorFlow and training it yourself (and it's very easy: https://www.tensorflow.org/versions/master/how_tos/image_ret...). If you want a general image classifier (most people don't), then you'd probably want to use a service.
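For anyone curious what that retraining flow looks like, here is a minimal transfer-learning sketch in recent TensorFlow/Keras form. The linked how-to uses a dedicated retrain script; the directory layout, model choice, and hyperparameters below are illustrative assumptions, not taken from that guide:

```python
# Transfer-learning sketch: reuse a pretrained ImageNet feature extractor
# and train only a small classification head on your narrow category.
import tensorflow as tf

# Assumed layout: data/train/<class_name>/*.jpg
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "data/train", image_size=(224, 224), batch_size=32)
num_classes = len(train_ds.class_names)

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False,
    weights="imagenet", pooling="avg")
base.trainable = False  # keep the pretrained features frozen

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects [-1, 1]
    base,
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```

The point is that only the final dense layer is trained, which is why a narrow classifier like this needs far fewer images than training from scratch.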
I regularly benchmark all the image captioning approaches, but human captioning is still WAY better (see http://www.cloudsightapi.com/api, scroll down for the demo).
Hey folks, I work on the Watson team at IBM, and wanted to clarify a couple of things about this demo. The default classifier set is good for getting started and getting an idea of how the service works, but it's only been trained on a relatively small assortment of images, hence the sometimes bad results for arbitrary images off the internet.
However, the real power of the service is the ability to train it for your specific domain. After setting up one or more custom classifiers, you can get far more accurate results on the tags that matter for your use case. You can see a few examples of this and even try it out yourself on the "Train" tab.
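For the curious, here is a rough Python sketch of classifying an image against a custom classifier via the v3 REST API. The host, endpoint, and parameter names follow the 2016-era docs and may change; the API key, classifier ID, and image URL are placeholders:

```python
# Sketch: classify an image against a custom Watson Visual Recognition
# classifier (v3 API, 2016-era parameters; check current docs).
import requests

resp = requests.get(
    "https://gateway-a.watsonplatform.net/visual-recognition/api/v3/classify",
    params={
        "api_key": "your-api-key",                     # placeholder
        "version": "2016-05-20",
        "url": "https://example.com/photo.jpg",        # placeholder image
        "classifier_ids": "my_custom_classifier_id",   # placeholder ID
    },
)
resp.raise_for_status()
# Walk the nested response: images -> classifiers -> classes with scores.
for image in resp.json()["images"]:
    for clf in image["classifiers"]:
        for cls in clf["classes"]:
            print(cls["class"], cls.get("score"))
```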
I have to say, I tried out several services to train my own image aesthetics classifiers, and for that this service is pretty neat. I trained it on 50 macro images in nature and 50 non-macro images in nature, and 7 out of 10 images served to it were classified correctly. For comparison, with Caffe or other open-source libraries I need to train on more than 10k images to get the same results, plus set up the whole server myself. Assembling 10k samples can be quite tiring.
https://www.dropbox.com/s/5gfhbrftu4z8xr4/Screenshot%202016-...
Can anybody offer an explanation as to why an image of a tiger can have a confidence score of 99% as a tiger but only 84% as an animal? I guess the system has no concept of taxonomy.
The classifiers were probably trained independently and are running independently here as well, not in a hierarchical fashion.
Another way of saying this: while the tiger training data consisted entirely of tigers (for positive examples), the animal training data might not have had any tigers at all (however unlikely). That is, the tiger image could contain features similar enough to the dogs, cats, and lions the animal classifier did see to trigger it, but with only 84% confidence.
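Here is a toy illustration of the effect, plus the simplest post-hoc repair: propagate each score up to its ancestors so a parent is never less confident than its child. The scores and the tiny taxonomy are made up for the example:

```python
# Independent one-vs-rest classifiers can yield non-taxonomic scores
# (tiger 0.99 but animal 0.84). A simple fix is to push each score up
# the taxonomy. Scores and taxonomy here are invented for illustration.
scores = {"tiger": 0.99, "animal": 0.84, "cat": 0.10}
parent = {"tiger": "animal", "cat": "animal"}  # toy taxonomy

def repair(scores, parent):
    fixed = dict(scores)
    for label, score in scores.items():
        node = parent.get(label)
        while node is not None:  # walk up to the root
            fixed[node] = max(fixed.get(node, 0.0), score)
            node = parent.get(node)
    return fixed

print(repair(scores, parent))  # animal is now at least 0.99
```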
This is the biggest gap I find in existing classifiers: the inability to make leaps from trained images (a stack of pancakes) to others (a single pancake) that are trivial for humans. I suspect this is because most scoring algorithms for rating classifiers don't appropriately penalize absurdly wrong answers, so what we wind up with are systems that are simply very good at matching a new image to a similar reference image, rather than really "figuring out" what the picture shows.
I've used this before as an example of a computer vision API when teaching a class. It's important to understand what it does not do: it doesn't have a huge vocabulary of trained words, and just a couple of months ago, what it was trained on was not well suited to showing off the product.
For example, an image of President Obama on the phone at his desk is currently interpreted by Watson's API as "90% person". A month ago, when I was demoing the product, it also returned "person", but a whole variety of other things too, most notably "Flag Burning". It's only when you dive into the API and request the list of default classifiers that you see the API contains a lot of specific terms but is by no means comprehensive: "Flag Burning" had been trained, but not plain "Flag". It seems they've since cut down the vocabulary, which is good, because the service shouldn't be judged against APIs that purport to do face/celebrity recognition and have been trained for that.
I think the key advantage of this API is that it's the only one I had seen that allows you to build your own classifier. Here's an example I found in the wild: someone building a classifier to differentiate between M1A1 Abrams tanks and non-Abrams tanks (you upload a set of positive and negative images to the API).
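As a sketch, creating such a classifier via the v3 REST API looks roughly like this in Python. The host, endpoint, and form-field names follow the 2016-era docs and may have changed; the API key, zip files, and names are placeholders:

```python
# Sketch: create a custom classifier by uploading one zip of positive
# examples per class plus an optional zip of negatives (Watson Visual
# Recognition v3, 2016-era parameters; check current docs).
import requests

with open("abrams.zip", "rb") as pos, open("not_abrams.zip", "rb") as neg:
    resp = requests.post(
        "https://gateway-a.watsonplatform.net/visual-recognition/api/v3/classifiers",
        params={"api_key": "your-api-key", "version": "2016-05-20"},
        files={
            "abrams_positive_examples": pos,  # positive training zip
            "negative_examples": neg,         # negative training zip
            "name": (None, "tanks"),          # plain form field: classifier name
        },
    )
resp.raise_for_status()
print(resp.json()["classifier_id"])  # use this ID in later classify calls
```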
Something like this is inevitable from some system, and the ethics will be interesting. Revenge porn, accidental sharing, "anonymous" sharing, etc. might today have some level of implicit anonymity due to the sheer volume of content, but if a machine can precisely tag media to a person, decisions made before the technology existed will have a retroactive impact on people's reputations, embarrassment, and privacy. It will probably be easy and popular to ridicule someone for poor judgment when this happens, but that attitude looks quaint, with the benefit of hindsight, once it becomes an all-encompassing dragnet.
I've heard from teachers that children these days have an interesting new adversary in social media. Conversely, I've heard from people a bit older than me that they're grateful things like Facebook didn't exist when they were teens. I'm in my mid-20s, so I was perhaps in the first WWW-native cohort (I started using Netscape in early 1995), but social media was text-only (AIM) when I was particularly stupid, and then Myspace and Facebook caught on when I was at high-school age, so I kind of straddled the two generations.
It has been aggressively marketed; does that count? It is essentially, IMHO, a sales funnel into data science consulting services. There are some interactive demos you can play with, but the real deal is a whole suite of ML techniques that they've either already built or can custom-tailor to your needs. I believe it's built on Apache UIMA. But if you're already doing this stuff, you may have a leg up in-house.
Anyway, you could probably think of it the way you might think of web development solutions: there isn't one thing, but a suite of all kinds of things, and that, IMHO, is Watson today. I think they have a big cloud offering as well to run the models they build or custom-tailor, but it isn't as though Watson is the single computer that won Jeopardy and they're now letting everyone use that exact machine... I mean it is, but it also isn't.
Yes. We have a bunch of cloud services (including visual recognition - demonstrated here) that you can embed in your own application. Try them here: http://ibm.com/watsondevelopercloud