After trialling several image recognition and categorization API services, I found Watson by far the least impressive when using the default classifier.
For a project I've been building, I have used Clarifai, Google Vision API, Watson, and Imagga. In my empirical tests, Watson has consistently produced poor, if not hilariously nonsensical, results. As an example, I used this image of a baby in a stroller (https://dl.dropboxusercontent.com/u/898689/IMG_2948.jpg). Here are the classification tags returned by the four services:
Clarifai: people, child, vehicle, woman, one, man, outdoors, emergency, accident, protest, adult, transportation system, carriage, portrait, wheelchair, safety, road, bike, leisure, wheel
Google: baby carriage, car seat, child, vehicle, diving equipment
Watson: performing, escalator, repairing, indoors, celebration, dancing, human, amusement arcade, bottle, baggage claim, group of people, appliance, tiger, people, big group, child, mixed color
Imagga: people portraits
These results will obviously vary depending on image subject, composition, etc., but I've basically dismissed Watson as a viable off-the-shelf visual recognition API. That said, if you have a specific dataset of images that positively and negatively identify a concept, Watson's custom classifiers may be of interest, although I haven't tried them.
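For reference, here is a minimal Python sketch of what one of these calls looks like, using the Google Vision REST API's label detection. The API key and image URL are placeholders, and the other services each have their own endpoints and response shapes:

```python
# Minimal sketch: label detection via the Google Vision v1 REST API.
# The API key and image URL below are placeholders.
import requests

GOOGLE_API_KEY = "your-api-key"                       # placeholder
IMAGE_URL = "https://example.com/baby-stroller.jpg"   # placeholder

payload = {
    "requests": [{
        "image": {"source": {"imageUri": IMAGE_URL}},
        "features": [{"type": "LABEL_DETECTION", "maxResults": 10}],
    }]
}
resp = requests.post(
    "https://vision.googleapis.com/v1/images:annotate",
    params={"key": GOOGLE_API_KEY},
    json=payload,
)
resp.raise_for_status()
# Each label comes back with a description and a confidence score.
for annotation in resp.json()["responses"][0].get("labelAnnotations", []):
    print(f'{annotation["description"]}: {annotation["score"]:.2f}')
```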
Chris from Imagga here. Just tested with the provided image and the demo returns the following tags: child, happy, kid, cute, person, caucasian, little, boy, people, happiness, childhood, smile, fun, portrait, smiling, pedestrian, children, outside, etc.
We are less conservative with the keywords we return and usually offer far more than just two words, which is what made me check what the API returns.
By the way, I'm on the lookout for other image classification services or open-source projects. I know I can (and will) dive deep into TensorFlow deep learning, but any other suggestions are very welcome!
If you're looking to train a neural net for a specific narrow category, then you can't beat something like TensorFlow and training it yourself (and it's very easy: https://www.tensorflow.org/versions/master/how_tos/image_ret...). If you want a general image classifier (most people don't), then you'd probably want to use a service.
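For anyone curious what that retraining flow looks like, here is a minimal transfer-learning sketch in recent TensorFlow/Keras form. The linked how-to uses a dedicated retrain script; the directory layout, model choice, and hyperparameters below are illustrative assumptions, not taken from that guide:

```python
# Transfer-learning sketch: reuse a pretrained ImageNet feature extractor
# and train only a small classification head on your narrow category.
import tensorflow as tf

# Assumed layout: data/train/<class_name>/*.jpg
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "data/train", image_size=(224, 224), batch_size=32)
num_classes = len(train_ds.class_names)

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False,
    weights="imagenet", pooling="avg")
base.trainable = False  # keep the pretrained features frozen

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects [-1, 1]
    base,
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```

The point is that only the final dense layer is trained, which is why a narrow classifier like this needs far fewer images than training from scratch.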
I regularly benchmark all the image captioning approaches, but human captioning is still WAY better (see http://www.cloudsightapi.com/api, scroll down for the demo).
Hey folks, I work on the Watson team at IBM, and wanted to clarify a couple of things about this demo. The default classifier set is good for getting started and getting an idea of how the service works, but it's only been trained on a relatively small assortment of images, hence the sometimes bad results for arbitrary images off the internet.
However, the real power of the service is the ability to train it for your specific domain. After setting up one or more custom classifiers, you can get far more accurate results on the tags that matter for your use case. You can see a few examples of this and even try it out yourself on the "Train" tab.
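For the curious, here is a rough Python sketch of classifying an image against a custom classifier via the v3 REST API. The host, endpoint, and parameter names follow the 2016-era docs and may change; the API key, classifier ID, and image URL are placeholders:

```python
# Sketch: classify an image against a custom Watson Visual Recognition
# classifier (v3 API, 2016-era parameters; check current docs).
import requests

resp = requests.get(
    "https://gateway-a.watsonplatform.net/visual-recognition/api/v3/classify",
    params={
        "api_key": "your-api-key",                     # placeholder
        "version": "2016-05-20",
        "url": "https://example.com/photo.jpg",        # placeholder image
        "classifier_ids": "my_custom_classifier_id",   # placeholder ID
    },
)
resp.raise_for_status()
# Walk the nested response: images -> classifiers -> classes with scores.
for image in resp.json()["images"]:
    for clf in image["classifiers"]:
        for cls in clf["classes"]:
            print(cls["class"], cls.get("score"))
```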
I have to say, I tried out several services to train my own image aesthetics classifiers, and for that this service is pretty neat. I trained it on 50 macro images in nature and 50 non-macro images in nature, and 7 out of 10 images served to it were classified correctly. For comparison, with Caffe or other open-source libraries I need to train on more than 10k images to get the same results, plus set up the whole server myself. Assembling 10k samples can be quite tiring.
https://www.dropbox.com/s/5gfhbrftu4z8xr4/Screenshot%202016-...
Can anybody offer an explanation as to why an image of a tiger can have a confidence score of 99% as a tiger but only 84% as an animal? I guess the system has no concept of taxonomy.
The classifiers were probably trained independently and are running independently here as well, not in a hierarchical fashion.
Another way of saying this: while the tiger training data consisted entirely of tigers (for positive examples), the animal training data might not have had any tigers at all (however unlikely). That is, the tiger image could contain features similar enough to the dogs, cats, and lions the animal classifier did see to trigger it, but with only 84% confidence.
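Here is a toy illustration of the effect, plus the simplest post-hoc repair: propagate each score up to its ancestors so a parent is never less confident than its child. The scores and the tiny taxonomy are made up for the example:

```python
# Independent one-vs-rest classifiers can yield non-taxonomic scores
# (tiger 0.99 but animal 0.84). A simple fix is to push each score up
# the taxonomy. Scores and taxonomy here are invented for illustration.
scores = {"tiger": 0.99, "animal": 0.84, "cat": 0.10}
parent = {"tiger": "animal", "cat": "animal"}  # toy taxonomy

def repair(scores, parent):
    fixed = dict(scores)
    for label, score in scores.items():
        node = parent.get(label)
        while node is not None:  # walk up to the root
            fixed[node] = max(fixed.get(node, 0.0), score)
            node = parent.get(node)
    return fixed

print(repair(scores, parent))  # animal is now at least 0.99
```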
This is the biggest gap I find in existing classifiers: the inability to make leaps from trained images (a stack of pancakes) to others (a single pancake) that are trivial for humans. I suspect this is because most scoring algorithms for rating classifiers don't appropriately penalize absurdly wrong answers, so what we wind up with are systems that are simply very good at matching a new image to a similar reference image, rather than really "figuring out" what the picture shows.
I've used this before as an example of a computer vision API when teaching a class. It's important to understand what it does not do: it doesn't have a huge vocabulary of trained words, and just a couple of months ago, what it was trained on was not well suited to showing off the product.
For example, an image of President Obama on the phone at his desk is currently interpreted by Watson's API as "90% person". A month ago, when I was demoing the product, it also returned "person", but a whole variety of other things too, most notably "Flag Burning". It's only when you dive into the API and request the list of default classifiers that you see the API contains a lot of specific terms but is by no means comprehensive: "Flag Burning" had been trained, but not plain "Flag". It seems they've since cut down the vocabulary, which is good, because the service shouldn't be judged against APIs that purport to do face/celebrity recognition and have been trained for that.
I think the key advantage of this API is that it's the only one I had seen that allows you to build your own classifier. Here's an example I found in the wild: someone building a classifier to differentiate between M1A1 Abrams tanks and non-Abrams tanks (you upload a set of positive and negative images to the API).
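As a sketch, creating such a classifier via the v3 REST API looks roughly like this in Python. The host, endpoint, and form-field names follow the 2016-era docs and may have changed; the API key, zip files, and names are placeholders:

```python
# Sketch: create a custom classifier by uploading one zip of positive
# examples per class plus an optional zip of negatives (Watson Visual
# Recognition v3, 2016-era parameters; check current docs).
import requests

with open("abrams.zip", "rb") as pos, open("not_abrams.zip", "rb") as neg:
    resp = requests.post(
        "https://gateway-a.watsonplatform.net/visual-recognition/api/v3/classifiers",
        params={"api_key": "your-api-key", "version": "2016-05-20"},
        files={
            "abrams_positive_examples": pos,  # positive training zip
            "negative_examples": neg,         # negative training zip
            "name": (None, "tanks"),          # plain form field: classifier name
        },
    )
resp.raise_for_status()
print(resp.json()["classifier_id"])  # use this ID in later classify calls
```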
Something like this is inevitable from some system, and the ethics will be interesting. Revenge porn, accidental sharing, "anonymous" sharing, etc. might today have some level of implicit anonymity due to the sheer volume of content, but if a machine can precisely tag media to a person, decisions made before the technology existed will have a retroactive impact on people's reputations, embarrassment, and privacy. It will probably be easy and popular to ridicule someone for poor judgment when this happens, but that attitude looks quaint, with the benefit of hindsight, once it becomes an all-encompassing dragnet.
I've heard from teachers that children these days have an interesting new adversary in social media. Conversely, I've heard from people a bit older than me that they're grateful things like Facebook didn't exist when they were teens. I'm in my mid-20s, so I was perhaps in the first WWW-native cohort (I started using Netscape in early 1995), but social media was text-only (AIM) when I was particularly stupid, and then Myspace and Facebook caught on when I was at high-school age, so I kind of straddled the two generations.
It has been aggressively marketed; does that count? It is essentially, IMHO, a sales funnel into data science consulting services. There are some interactive demos you can play with, but the real deal is a whole suite of ML techniques that they've either already built or can custom-tailor to your needs. I believe it's built on Apache UIMA. But if you're already doing this stuff, you may have a leg up in-house.
Anyway, you could probably think of it the way you might think of web development solutions: there isn't one thing, but a suite of all kinds of things, and that, IMHO, is Watson today. I think they have a big cloud offering as well to run the models they build or custom-tailor, but it isn't as though Watson is the single computer that won Jeopardy and they're now letting everyone use that exact machine... I mean it is, but it also isn't.
Yes. We have a bunch of cloud services (including visual recognition - demonstrated here) that you can embed in your own application. Try them here: http://ibm.com/watsondevelopercloud