This exact scenario is what I described to a friend of mine who is an AI researcher.
He was convinced that if we trained the AI on enough data, GPT-x would become sentient.
My opinion was similar to yours. I felt that the hallucinating the AI does falls short of true extrapolative thought.
I said this because humans don’t truly have access to infinite knowledge, and even when knowledge is available, they can’t process all of it. Adding endless information for the AI to feed on doesn’t seem like the path to true intelligence. It’s just more of the same hallucinating.
Yet despite lacking knowledge, we humans still come up with consistently original thoughts and expressions of our intelligence daily. With limited information, our minds create new representations of understanding. This seems to be impossible for ChatGPT.
I could be completely wrong, but that discussion solidified for me that my role as a dev still has at least a couple more decades of shelf life left.
It’s nice to hear that others are reaching similar conclusions.
Current LLMs decode greedily, token by token. In some cases this is good enough, namely for continuous tasks, but in other cases the model would need to backtrack and try another approach, or edit its response. That doesn't work with the way we are using LLMs now, but it could be fixed. Then you'd get a model that can do discontinuous tasks as well.
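For concreteness, here is a minimal sketch of what "greedy, token by token" means. The toy model and its transition table are invented stand-ins for a real LLM's next-token distribution:

```python
# Toy illustration of greedy (token-by-token) decoding. "toy_model"
# stands in for an LLM's next-token distribution; the transition
# table is invented for this sketch.
def toy_model(prefix):
    table = {
        (): {"the": 0.6, "a": 0.4},
        ("the",): {"cat": 0.7, "dog": 0.3},
        ("the", "cat"): {"sat": 0.9, "<eos>": 0.1},
        ("the", "cat", "sat"): {"<eos>": 1.0},
    }
    return table.get(tuple(prefix), {"<eos>": 1.0})

def greedy_decode(model, max_len=10):
    out = []
    for _ in range(max_len):
        scores = model(out)
        tok = max(scores, key=scores.get)  # commit to the best token now
        if tok == "<eos>":
            break
        out.append(tok)  # no backtracking: earlier choices are final
    return out

print(greedy_decode(toy_model))  # ['the', 'cat', 'sat']
```

The key property is in the loop: once a token is appended, it is never revisited, which is exactly why a task that requires revising an earlier choice trips the model up.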
>> Write a response that includes the number of words in your response.
> This response contains exactly sixteen words, including the number of words in the sentence itself.
It contains 15 words.
The model would have to plan the entire response before outputting the first token if it were to solve the task correctly. It works if you follow up with "Explicitly count the words", let it reply, and then say "Rewrite the answer".
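The off-by-one in the quoted response is easy to check mechanically; the sentence below is copied verbatim from the response above:

```python
# Count the words in the model's response by splitting on whitespace.
response = ("This response contains exactly sixteen words, including "
            "the number of words in the sentence itself.")
n = len(response.split())
print(n)  # 15, so the claim of "sixteen" is off by one
```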
How? The problem has been known for a while; for example, this article [0] mentions it (as Chain of Thought reasoning). You could think that just having a scratchpad of tokens is enough, since you can arguably plan, backtrack and rewrite there [1], right? But this doesn't really work, at least yet, maybe because the models weren't trained for that, and maybe ChatGPT's massive logs (probably available only to OpenAI) can help. But the Microsoft report [2] suggests we need a different architecture and/or algorithms: they mention the lack of planning and retrospective thinking as a huge problem for GPT-4. Maybe you know some articles on ideas for how to fix this? Backtracking and trying again seem to be linked to human thought, and could very well give us AGI.
You may be shocked to hear this, but Dijkstra’s shortest-path algorithm is the technical answer to this question. We just don’t use it because it’s expensive.
Language chains, or tool use where the model can also call on itself to solve subproblems. If you aren't limited to a single round of LLM interaction, you can do complex stuff.
Backtracking to edit the response is theoretically easily solved by training on a masked language modeling objective instead of an autoregressive one. Using it to actually generate text is a bit expensive, though, because you can't just generate one token at a time and be done: you might have to re-evaluate every output token each time another token changes. So I expect autoregressive generation to remain the default until the recomputation effort can be significantly reduced or hardware advances make the cost bearable.
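To illustrate the re-evaluation cost, here is a toy sketch of iterative masked refinement. The fill-in rule is a deliberately silly stand-in for a real masked-LM head; the point is only that every position may need to be revisited on every step:

```python
# Toy sketch of iterative masked refinement. "predict" stands in for
# a masked-LM head; here it just copies the nearest non-mask token.
MASK = "_"

def predict(tokens, i):
    # Invented rule: nearest non-mask token to the left, else right.
    for j in range(i - 1, -1, -1):
        if tokens[j] != MASK:
            return tokens[j]
    for j in range(i + 1, len(tokens)):
        if tokens[j] != MASK:
            return tokens[j]
    return "?"

def refine(tokens, steps=5):
    tokens = list(tokens)
    for _ in range(steps):
        # Every position may change, so the whole sequence is
        # re-evaluated on each step (the expense described above).
        new = [predict(tokens, i) if t == MASK else t
               for i, t in enumerate(tokens)]
        if new == tokens:  # fixed point: nothing changed, stop early
            break
        tokens = new
    return tokens

print(refine(["a", MASK, MASK, "b", MASK]))
```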
>> Backtracking to edit the response is theoretically easily solved by training on a masked language modeling objective instead of an autoregressive one, but using it to actually generate text is a bit expensive because you can't just generate one token at a time and be done, you might have to reevaluate each output token every time another token is changed.
I can't imagine how training on masked tokens can "easily" solve backtracking, even in theory. Do you have some literature I could read on this?
Discrete diffusion with rewriting can work well. It feels loosely similar to backtracking, if you assume n_steps is large enough, though you need to be able to rewrite any non-provided position, I think (not all setups do this). The downside is that the noise in discrete diffusion (in the simplest case, randomizing over the whole vocabulary space) is pretty harsh and makes things very difficult in practice. I don't have an exact reference on the relationship, but it feels similar to backtracking-type mechanics in my experience. I found things tend to "lock in" quickly once a good path is found, which feels a lot like pathfinding to me.
Some early personal experiments with adding "prefix-style" context by a cross-attention (in the vein of PerceiverAR) seemed like it really helped things along, which would kind of point to search-like behavior as well.
Probably the closest theory I can think of is orderless NADE, which builds on the "all orders" training of https://arxiv.org/abs/1310.1757 , which in my opinion closely relates to BERT and all kinds of other masked language work. There's a lot of other NAR language work I'm skipping here that may be more relevant...
On discrete diffusion:
Continuous diffusion for categorical data shows some promise "walking the boundary" between discrete and continuous diffusion https://arxiv.org/abs/2211.15089 , personally like this direction a lot.
My own contribution, SUNMASK, worked reasonably well for symbolic music/small datasets (https://openreview.net/forum?id=GIZlheqznkT), but really struggled with anything text or moderately large vocabulary, maybe due to training/compute/arch issues. Personally think large vocabulary discrete diffusion (thinking of the huge vocabs in modern universal LM work) will continue to be a challenge.
Decoding strategies:
As a general aside, I still don't understand how many of the large generative tools aren't exposing more decoding strategies, or hooks to implement them. Beam search with stochastic/diverse group objectives, per-step temperature/top-k/top-p, hooks for things like COLD decoding https://arxiv.org/abs/2202.11705, minimum Bayes risk https://medium.com/mlearning-ai/mbr-decoding-get-better-resu..., check/correct systems during decode based on simple domain rules and previous outputs, etc.
These kinds of decoding tools have always been a huge boost to model performance for me, and having access to add in these hooks to "big API models" would be really nice... though I guess you would need to limit/lock compute use since a full backtracking search would pretty swiftly crash most systems. Maybe the new "plugins" access from OpenAI will allow some of this.
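As one small example of the kind of decoding hook being described, here is a minimal nucleus (top-p) sampler with per-step temperature. The function signature is hypothetical; a real API would expose something like it as a callback or a parameter set:

```python
import math
import random

def top_p_sample(logits, p=0.9, temperature=1.0, rng=random):
    # logits: dict of token -> raw score. A minimal nucleus (top-p)
    # sampler; the hook interface itself is hypothetical.
    probs = {t: math.exp(s / temperature) for t, s in logits.items()}
    z = sum(probs.values())
    probs = {t: q / z for t, q in probs.items()}
    ranked = sorted(probs.items(), key=lambda kv: -kv[1])
    nucleus, total = [], 0.0
    for tok, q in ranked:
        nucleus.append((tok, q))
        total += q
        if total >= p:
            break  # smallest top-ranked set covering probability mass p
    # Sample within the truncated, renormalized nucleus.
    r = rng.uniform(0, total)
    acc = 0.0
    for tok, q in nucleus:
        acc += q
        if r <= acc:
            return tok
    return nucleus[-1][0]
```

With a very small p the nucleus collapses to the single most likely token, so the sampler degenerates to greedy decoding; larger p trades determinism for diversity.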
Backtracking is easily solved with a shortest path algorithm. I don’t see any need for masking if you are simply maximizing likelihood of the entire sequence.
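For what it's worth, the "shortest path" framing can be made concrete: score each candidate sequence by the sum of negative log-probabilities, so the globally most likely sequence is the cheapest path to an end-of-sequence token. Below is a sketch using uniform-cost search (the Dijkstra special case with no heuristic) over an invented toy transition table; the frontier grows with the vocabulary, which is exactly why this is expensive in practice:

```python
import heapq
import math

# Decoding viewed as shortest-path search: edge cost is -log P(token),
# so the cheapest path to <eos> is the most likely whole sequence.
# The toy transition table is invented for illustration.
def next_probs(prefix):
    table = {
        (): {"the": 0.6, "a": 0.4},
        ("the",): {"end": 0.1, "<eos>": 0.9},
        ("a",): {"<eos>": 1.0},
        ("the", "end"): {"<eos>": 1.0},
    }
    return table.get(tuple(prefix), {"<eos>": 1.0})

def best_sequence(max_len=5):
    # Uniform-cost search over the tree of token prefixes.
    heap = [(0.0, ())]
    while heap:
        cost, prefix = heapq.heappop(heap)
        if prefix and prefix[-1] == "<eos>":
            return list(prefix[:-1]), cost  # first finished = cheapest
        if len(prefix) >= max_len:
            continue
        for tok, prob in next_probs(prefix).items():
            heapq.heappush(heap, (cost - math.log(prob), prefix + (tok,)))
    return [], float("inf")

seq, cost = best_sequence()
print(seq)  # ['the'], since 0.6 * 0.9 beats 0.4 for 'a'
```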
> This exact scenario is what I described to a friend of mine who is an AI researcher.
> He was convinced that if we trained the AI on enough data, GPT-x would become sentient.
> My opinion was similar to yours. I felt like the hallucinating the AI does was insufficient in performing true extrapolating thought.
It turns out it isn’t just AIs that hallucinate; AI researchers do as well.
> He was convinced that if we trained the AI on enough data, GPT-x would become sentient.
Is there enough data?
As I understand it, the latest large language models are trained on almost every piece of available text. GPT-4 is multimodal in part because there isn't an easy way to increase its dataset with more text. In the meantime, text is already quite information dense.
I'm not sure that future models will be able to train on an order of magnitude more information, even if the size of their training sets has a few more zeroes added to the end.
I don't think that when people commonly discuss sentience they mean to include goldfish. I don't think the legal definition (which probably exists due to external legal implications) has any bearing on the intellectual debate of AI sentience.
If I were talking about sentience I would definitely be including goldfish. What about them is so different to us that we would have sentience while they would not?
> He was convinced that if we trained the AI on enough data, GPT-x would become sentient.
Not saying your friend is right or wrong, but imagine if civilization gives more information, in real time, to an AI system through sensors: would it be at least as sentient as the civilization? Seems like a sci-fi story, a competitor to G-d.
Isaac Asimov wrote a story along those lines, “The Last Question”, which he described as “by far my favorite story of all those I have written.” Full text here:
Some versions of divinity (both from real-world beliefs and sci-fi/fantasy) have it being essentially a gestalt of either all the souls that have ever died, or all those alive now—a kind of "oversoul" or collective consciousness.
While that's an interesting thought experiment, I don't think it can meaningfully apply to any kind of AI we have the capability to make today, even if we could hook it up directly to all our knowledge. Information alone can't make something sentient; it requires a sufficiently complex and sophisticated information processing system, one that can reason about its knowledge and itself.
I’m not at all an expert on the topic, but from what I gathered LLMs are fundamentally limited in the kind of problems they can approximate. They can approximate any integrable function quite well, but we can only come up with limits on a case-by-case basis for non-integrable ones, and I believe most interesting problems are of this latter kind.
Correct me if I’m wrong, but doesn’t that mean they can’t recursively “think”, on a fundamental level? And sure, I know you can ask GPT to “show your thinking”, but that’s not general recursion, just “hard-coded to N iterations”, basically, isn’t it? And thus, no matter how much hardware we throw at it, it won’t be able to surpass this fundamental limit (and, without proof, I firmly believe that for AGI we do need the ability to follow through a train of thought).
It fundamentally can’t recurse into a thought process. Let’s say I give you a symbol table where each symbol means something and ask you to “evaluate” this list of symbols. You can do that just fine, but even in theory GPT-10384 won’t be able to, without changing the whole underlying model itself.
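To make the symbol-table task concrete, here is a toy version in Python. The symbols and their meanings are invented; the point is that evaluation threads state through an unbounded number of steps:

```python
# A toy version of the "symbol table" task: each symbol maps to an
# operation, and evaluating the list requires carrying state step by
# step, i.e. the kind of unbounded iteration a fixed forward pass
# struggles to perform. Symbols and semantics are invented.
SYMBOLS = {
    "+": lambda x: x + 1,
    "-": lambda x: x - 1,
    "*": lambda x: x * 2,
}

def evaluate(program, start=0):
    value = start
    for sym in program:
        value = SYMBOLS[sym](value)  # state threads through every step
    return value

print(evaluate(["+", "+", "*", "-"]))  # ((0 + 1 + 1) * 2) - 1 = 3
```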
Could you try writing even in this simple language a longer program? Just simply increase the input to 20x or something around that. I’m interested in whether it will break and if it does, at what length.
Interesting, it screwed up at step 160. I think it probably ran out of context, if I explicitly told it to output each step in a more compact way it might do better. Or if I had access to the 32k context length it would probably get 4x further.
Actually it might be worth trying to get it to output the original instructions again every 100 steps, so that the instructions are always available in the context. The ChatGPT UI still wouldn't let you output that much at once but the API would.
If they aren't already, AIs will be posting content on social media apps. These apps measure the amount of attention you pay to each thing presented to you. If it's more than a picture or a video, but something interactive, then the AI could also learn how we interact with things in more complex ways. It also gets feedback from us through the comments section. As with biological mutations, AIs will learn which of their (at first random) novel creations we find utility in. They will then better learn what drives us, and will learn to create and extrapolate at a much faster pace than we do.
> If they aren't already, AIs will be posting content on social media apps.
No, people will be posting content on social media apps that they asked LLMs to write.
It may be done through a script, or API calls, but it's 100% at the instigation, direct or indirect, of a human.
LLMs have no ability to decide independently to post to social media, even if you do write code to give them the technical capability to make such posts.
With the new ChatGPT Plugins, it seems they may actually be able to make POST requests to social media APIs soon. It is likely that an LLM could have "I should post a tweet about this" in its training data.
Granted... currently it is likely humans that have written the code that the new Plugins are allowed to call -- but they have given ChatGPT the ability to execute rudimentary Python scripts and even ffmpeg, so I think it is only a matter of time before one outputs a Tweet written by its own code.
> It is likely that an LLM could have "I should post a tweet about this" in its training data.
That only matters if a human has explicitly hooked it up so that when ChatGPT encounters that set of tokens, it executes the "post to Twitter" scripts.
ChatGPT doesn't comprehend the text it's producing, so without humans making specific links between particular bundles of text and the relevant plugin scripts, it will never "decide" to use them.
At a high level, all that would have to happen is a person gives GPT, or something like it, access to a social media page and tells it to post to it with the objective of getting the highest level of interaction and followers.
...which in no way grants GPT sapience, nor would it prove that it has it.
The human is still providing the capability to post, the timing script to trigger posting, and the specific heuristic to be used in determining how to choose what to post.
More data will only mean more inference. But at some unexpected moment, the newly created "senseBERT" breaks the barrier between intelligence and consciousness.
> He was convinced that if we trained the AI on enough data, GPT-x would become sentient.
It sounds like he doesn't even understand the basics of what GPT is, or what sentience is. GPT is an impressive manipulator/predictor of language, but we have evidence from all sorts of directions that there's more to sentience or consciousness than that.
I would like to propose a thought experiment concerning the realm of knowledge acquisition. Given that the scope of human imagination is inherently limited, it is inevitable that certain information will remain beyond our grasp; these are the so-called "unknown unknowns." In the event that an individual generates a piece of knowledge from this inaccessible domain, how might it manifest in our perception? It is likely that such knowledge would appear incomprehensible to us. Consequently, it is worth considering the possibility that the GPT model is not, in fact, experiencing hallucinations; rather, our human understanding is simply insufficient to fully grasp its output.
Yeah. Maybe when a baby says "gabadigoibygee", he is using an extremely efficient language that is too sophisticated for our adult brains to comprehend.
> In the event that an individual generates a piece of knowledge from this inaccessible domain, how might it manifest in our perception? It is likely that such knowledge would appear incomprehensible to us.
If what a person says cannot be comprehended by any other person, we usually have a special term for it.
This is ridiculously “meta”, but I’ve said the same thing: at some point GPT-x will be useless to us because it will be beyond our comprehension, that is, if it’s actually “smart”.
My honest opinion is that the hallucinations are just gibberish, but are they useful gibberish? Maybe we’re saying the same thing?