Hacker News

Winograd schemas falling at 10T parameters is interesting. That's probably only 5 years off.

If we can build something capable of passing Winograd schemas, then it can probably write working, non-trivial computer programs from plain text.

Google's PEGASUS summarization model[1] has learned to count up to five (which is amazing!!). That's "only" 568M parameters. It'd be interesting to see GPT-3 fine-tuned against the PEGASUS objective function.

[1] https://ai.googleblog.com/2020/06/pegasus-state-of-art-model...
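For those unfamiliar: the PEGASUS pre-training objective, gap-sentence generation, masks out whole "principal" sentences (chosen by their overlap with the rest of the document) and trains the model to regenerate them. Here's a toy sketch of just the selection step, using simple word overlap in place of the ROUGE scoring the paper uses; the function names are my own:

```python
# Toy sketch of PEGASUS-style gap-sentence selection: score each
# sentence by word overlap with the rest of the document and mask the
# top-scoring one. The real objective uses ROUGE and masks several
# sentences at once; this is a simplified illustration.
def select_gap_sentence(sentences):
    def overlap(i):
        words = set(sentences[i].lower().split())
        rest = set(w for j, s in enumerate(sentences) if j != i
                   for w in s.lower().split())
        return len(words & rest) / max(len(words), 1)
    return max(range(len(sentences)), key=overlap)

def make_training_pair(sentences, mask_token="<mask_1>"):
    # Returns (masked document, generation target) as a training pair.
    i = select_gap_sentence(sentences)
    masked = " ".join(mask_token if j == i else s
                      for j, s in enumerate(sentences))
    return masked, sentences[i]
```

The point of the objective is that regenerating a masked "summary-like" sentence is much closer to summarization than next-token prediction is.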



For those curious, the "counting" is at the end of the article, and it really is quite impressive:

>Following this post is an example article from the XSum dataset along with the model-generated abstractive summary. The model correctly abstracts and paraphrases four named frigates (HMS Cumberland, HMS Campbeltown, HMS Chatham and HMS Cornwall) as “four Royal Navy frigates”, something an extractive approach could not do since “four” is not mentioned anywhere. Was this a fluke or did the model actually count? One way to find out is to add and remove ships to see if the count changes.

>As can be seen below, the model successfully “counts” ships from 2 to 5. However, when we add a sixth ship, the “HMS Alphabet”, it miscounts it as “seven”. So it appears the model has learned to count small numbers of items in a list, but does not yet generalize as elegantly as we would hope. Still, we think this rudimentary counting ability is impressive as it was not explicitly programmed into the model, and it demonstrates a limited amount of “symbolic reasoning” by the model.
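If you want to run the add/remove-ships probe yourself, here's a rough sketch. The helper names and the probe article text are my own inventions, and the commented-out model call follows the Hugging Face transformers pipeline API (treat the exact checkpoint id as an assumption):

```python
import re

# Build a probe article with a variable number of ships, mirroring the
# add/remove-ships experiment in the post. This article text is
# invented for illustration, not the actual XSum article.
def make_article(ships):
    listed = (", ".join(ships[:-1]) + " and " + ships[-1]
              if len(ships) > 1 else ships[0])
    return (f"The Ministry of Defence said {listed} "
            "will be decommissioned under the latest review.")

# Map a generated summary back to an integer, if a number word appears.
WORDS = {"two": 2, "three": 3, "four": 4, "five": 5, "six": 6, "seven": 7}

def extract_count(summary):
    for word, n in WORDS.items():
        if re.search(rf"\b{word}\b", summary.lower()):
            return n
    return None

# With the real model, you would do something like (untested sketch):
#   from transformers import pipeline
#   summarize = pipeline("summarization", model="google/pegasus-xsum")
#   summary = summarize(make_article(ships))[0]["summary_text"]
# and then compare extract_count(summary) against len(ships).
```

Varying the ship list and checking whether extract_count tracks len(ships) is exactly the 2-to-5-then-miscount pattern the article reports.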


It's one of the most amazing and surprising things I've seen in the last 12 months in machine learning (I follow and work in the field).

It surprises me a lot more than GPT-3's excellent performance on text generation, for example. GPT-3 is amazing, but looking at the GPT-1 -> GPT-2 -> GPT-3 progression it isn't surprising. Counting, on the other hand, is something I wouldn't have expected from a summarizer.


Isn't the entire point of OpenAI's claim that GPT-3 is a few-shot learner that it generalizes concepts, not just syntax?


Yes, and that's very impressive.

But to me that isn't as surprising. I'm not claiming I would have thought of it, but if you have a very large, high-dimensional representation space (such as GPT-3's), then giving it a few examples of something pushes it into the right general region of that space.

Generalizing concepts isn't a new thing - one could argue that word2vec, back in 2014, did it pretty well. GPT-3's "concepts" are vastly more complex than the single-word (or maybe two-word) concepts in word2vec, though.
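The classic word2vec demonstration of that is analogy arithmetic (king - man + woman ≈ queen). A toy illustration with made-up 2-D vectors - real learned embeddings have hundreds of dimensions, so treat the numbers as purely illustrative:

```python
import numpy as np

# Made-up 2-D "embeddings" arranged so the classic analogy works.
vecs = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "king":  np.array([2.0, 0.0]),
    "queen": np.array([2.0, 1.0]),
    "apple": np.array([0.0, 3.0]),
}

def nearest(target, exclude):
    # Return the vocabulary word whose vector has the highest cosine
    # similarity to `target`, skipping the words used in the analogy.
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vecs if w not in exclude),
               key=lambda w: cos(vecs[w], target))

result = nearest(vecs["king"] - vecs["man"] + vecs["woman"],
                 exclude={"king", "man", "woman"})
# In this toy space, king - man + woman lands exactly on queen.
```

With real embeddings (e.g. gensim's KeyedVectors.most_similar) the same arithmetic works approximately rather than exactly.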


I mean in that sense, GPT probably just extracts low-n counts as separate concepts.

I'd love to see an architecture that can keep a separate short-term memory to allow it to count with multiple digits and follow algorithms. On the other hand, given what we've seen from GPT, at that point I would actually worry about it becoming a general intelligence...


> low-n counts as separate concepts

But how would that work?

I agree it probably doesn't "understand" math, but it has learned that number words can substitute for each other in a sentence (three ships/four ships/five ships) which isn't surprising.

But it has somehow learned to link that word with the correct length of the sequence of names, which is astonishing. I can't think of obvious "cheats" that make this work.

The best I can think of is that it has learned to count commas even when they are separated by words.
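That cheat would be easy to learn, since in a list like the frigates one the separators alone give away the count. A toy sketch of the heuristic (my own illustration, not anything from the paper):

```python
# "Count the separators" cheat: a list written as "A, B, C and D"
# has N-2 commas plus one "and" for N items.
def separator_count(list_text):
    commas = list_text.count(",")
    return commas + 2 if " and " in list_text else commas + 1

ships = "HMS Cumberland, HMS Campbeltown, HMS Chatham and HMS Cornwall"
```

Here separator_count(ships) gives 4 without ever "counting" the ships themselves - which is why the model failing only at six ships is the interesting data point.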


Sounds like my children. For a long time my now-four-year-old counted like this: "one, two, three, so many!"


Sounds like your four year old was ready to start making inductive proofs!



