My conversation: https://imgur.com/a/KzLKdQF
All the answers in the picture are true.
I like to think that I understand LLMs pretty well, which is why I was so underwhelmed by most of the mainstream "AI" news. But this threw me for a loop.
As a predictor, how can it model base64? It surely can't just be "pretending" like it does with all other stuff.
The precision is what feels most wrong to me: it encodes long random strings perfectly.
Why does it then fail at simple arithmetic?
It's not perfect though. I tested it on a few sentences of text and it made a few mistakes. Due to the way that GPT tokenizes the input text, it can't really generalize the pattern, since the mapping of text to tokens is somewhat arbitrary. It effectively has to learn how to map every unique combination of 3 characters to 4 base64 digits, of which there are up to 2^24 = 16,777,216 distinct mappings. On top of that, the number of characters in each token varies, which shifts the 3-character alignment and can also lead to mistakes.
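To make the 3-to-4 mapping concrete, here's a short sketch using Python's standard `base64` module (the example strings are my own, not from the conversation):

```python
import base64

# Base64 encodes input in 3-byte groups, and each group becomes 4 output
# characters, so there are at most 256**3 = 2**24 distinct group-to-quad
# mappings to learn.
for text in ["abc", "abd", "xabc"]:
    encoded = base64.b64encode(text.encode("ascii")).decode("ascii")
    print(f"{text!r} -> {encoded!r}")

# 'abc'  -> 'YWJj'
# 'abd'  -> 'YWJk'
# 'xabc' -> 'eGFiYw=='   (the 'abc' substring no longer encodes to 'YWJj',
#                         because the single leading byte shifts the 3-byte
#                         alignment of everything after it)
```

The last case is the key point: a one-character prefix changes the encoding of every character that follows, so the model can't just memorize per-character translations.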
You can use this tool to see how GPT3 maps text to tokens and token IDs: https://platform.openai.com/tokenizer
As an example, the alphabet "abcdefghijklmnopqrstuvwxyz" maps to [39305, 4299, 456, 2926, 41582, 10295, 404, 80, 81, 301, 14795, 86, 5431, 89]. This is what I mean by the mapping being fairly arbitrary.
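The alignment problem is easy to demonstrate with the standard library: encoding a string piece by piece and concatenating the results does not reproduce the encoding of the whole string, because base64 groups bytes in threes. A minimal sketch (the split of "hello world" into two pieces stands in for a hypothetical token boundary):

```python
import base64

def b64(s: str) -> str:
    return base64.b64encode(s.encode("ascii")).decode("ascii")

# Pretend the tokenizer splits "hello world" into "hello" and " world".
whole = b64("hello world")
piecewise = b64("hello") + b64(" world")

print(whole)      # 'aGVsbG8gd29ybGQ='
print(piecewise)  # 'aGVsbG8=IHdvcmxk' -- not the same string
```

So the model can't encode each token independently; it has to track how every token boundary lands relative to the 3-byte grouping.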