Nothing in that tweet is in any way related to anything I've said. It's certainly not about the distinction between how NNs are trained and how they work. The way NNs are trained is part (and obviously only part) of how they work.
I'm well aware that LLMs are not simple. Remember when I said you'd need around two years of a college CS degree to understand it, and, elsewhere, that it could be a whole college course?
The area between "simple" and "beyond human understanding" is pretty large, though.
Your description of "emergent capabilities" so far amounts to pointing at a paper which says that training a model on more data produces different results, which is, you know, obvious. Calling those differences "emergent capabilities" is extraordinarily poor communication.
What I mean when I say "emergent capabilities" is capabilities which aren't explained by the input data and program code, and you have yet to present a single one. Certainly nothing that could be called originality or preference, which were my two original examples of what ChatGPT doesn't have.
Again, what "emergent behavior"? By which I mean, what behaviors do these models have which are not easily explained by the program design and inputs?
The NN weights are not programmed by us, they're programmed by a program which is programmed by us. This level of indirection doesn't suddenly mean we don't understand anything.
Another way of thinking of it: all we have is a program written by us, which takes the training data and a prompt as inputs, and spits out an output. The NN weights are just a compressed cache of the training-data part of those inputs, so that you don't have to retrain the model for every prompt.
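To make that "compressed cache" framing concrete, here's a toy sketch in Python. It's purely illustrative (a bigram counter standing in for a real trained model; the `train`/`generate` functions and the tiny corpus are invented for this example): the "weights" are just a summary of the training data, computed once by code we wrote and then reused for every prompt.

```python
# Toy stand-in: the "weights" here are bigram counts, a compressed summary
# of the training text. Real LLMs learn transformer weights by gradient
# descent, but the program/data/cached-weights relationship has the same shape.
from collections import Counter, defaultdict
import random

def train(corpus: str) -> dict:
    """Program written by us; its output (the "weights") is determined by the data."""
    counts = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(weights: dict, prompt: str, length: int = 10, seed: int = 0) -> str:
    """Program written by us; maps (weights, prompt) to an output."""
    rng = random.Random(seed)
    out = prompt.split()
    for _ in range(length):
        followers = weights.get(out[-1])
        if not followers:
            break
        out.append(rng.choices(list(followers), weights=followers.values())[0])
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the rug"
weights = train(corpus)              # expensive step, done once and cached
print(generate(weights, "the cat"))  # cheap step, done for every prompt
```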
The emergent behavior is much more obvious in GPT-4 than in GPT-3.5. It seems to arise when the data sets get extremely large.
I notice it when the AI conversation is extended over a number of interactions - the AI appears to take the initiative to produce discourse that would not be expected from a plain LLM, and which seems more human. It's hard to put a finger on, but, as a human, "I know it when I see it".
Since injecting noise is part of the algorithm, the AI output is different for each cycle. The weights are partially stochastic and not fully programmed. The feedback weights are likely particularly sensitive to this.
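To make the noise-injection point concrete, here is a minimal sketch of temperature sampling, which is one standard place randomness enters at generation time (the vocabulary and next-token scores below are made up for illustration):

```python
# Minimal sketch of temperature sampling: one standard way noise is injected
# at generation time, so repeated runs over the same prompt can differ.
# The vocabulary and next-token scores are invented for illustration.
import math
import random

def sample_next_token(logits: dict, temperature: float, rng: random.Random) -> str:
    # Softmax over temperature-scaled scores, then a weighted random draw.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    exp = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exp.values())
    probs = {tok: e / total for tok, e in exp.items()}
    return rng.choices(list(probs), weights=list(probs.values()))[0]

logits = {"cat": 2.1, "dog": 1.9, "rug": 0.3}   # fake next-token scores
for run in range(3):
    rng = random.Random()                        # unseeded: runs can differ
    print(run, sample_next_token(logits, temperature=0.8, rng=rng))
```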
In any case, it's early days. Check out the Microsoft paper, "Sparks of Artificial General Intelligence: Early experiments with GPT-4".
> The emergent behavior is much more obvious in GPT-4 than in GPT-3.5.
What emergent behavior?
> I notice it when the AI conversation is extended over a number of interactions - the AI appears to take the initiative to produce discourse that would not be expected from a plain LLM, and which seems more human.
Maybe that's not what you expect, but that's exactly what I would expect. More training data, better trained models. Given they're being trained on human-generated data, they act more like that data. Note that doesn't mean they're acting more human. But it can seem more human in some ways.
> The weights are partially stochastic and not fully programmed.
Right... but by the law of averages the randomness would eventually average out. You might end up with different weights, but that just indicates different means of performing similar tasks. It's always an approximation, but the "error" would decrease over repeated sampling.
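To put a rough number on that averaging intuition, here is a toy sketch (the numbers are made up and have nothing to do with real training runs): the spread of the mean of n independent noisy samples shrinks roughly like 1/sqrt(n), so the residual randomness gets smaller the more you sample.

```python
# Toy illustration of the "randomness averages out" point: the spread of the
# mean of n independent noisy samples shrinks roughly like 1/sqrt(n).
# Made-up numbers only; nothing here models an actual training run.
import random
import statistics

rng = random.Random(42)
true_value = 1.0
noise_sd = 0.5

for n in (10, 100, 1000, 10000):
    # Repeat the noisy "measure and average" experiment 200 times per n
    # and see how much the averaged result still wanders.
    means = [
        statistics.fmean(rng.gauss(true_value, noise_sd) for _ in range(n))
        for _ in range(200)
    ]
    print(f"n={n:>5}  spread of the average ~ {statistics.stdev(means):.4f}")
```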
> In any case, it's early days. Check out the Microsoft paper, "Sparks of Artificial General Intelligence: Early experiments with GPT-4".
This is a good analogy that illustrates the problem with not making that distinction: https://twitter.com/nearcyan/status/1632661647226462211