
LLMs do not model "certainty"; that's illogical. They model the language corpus you feed them.


Essentially all modern machine learning techniques have internal mechanisms that are closely aligned with certainty. For example, the output of a binary classifier is typically a floating-point number in the range [0, 1], with 0 representing one class and 1 the other. A value of 0.5 essentially means "I don't know," and values in between give both an answer (round to the nearest integer) and a sense of certainty (how close the output is to that integer). LLMs offer an analogous set of statistics.
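
As a rough illustration of the kind of statistics being described, here is a minimal PyTorch sketch. The tensors are placeholders rather than real model outputs, and the confidence measures (distance from 0.5, top-token probability, entropy) are common proxies, not anything specific to a particular model:

    import torch
    import torch.nn.functional as F

    # Binary classifier: a sigmoid maps a logit to [0, 1].
    logit = torch.tensor(0.3)              # placeholder logit
    p = torch.sigmoid(logit)
    prediction = int(p.round())            # round to the nearest class
    confidence = abs(p.item() - 0.5) * 2   # 0 = "I don't know", 1 = fully certain

    # LLM analogue: a softmax over the vocabulary for the next token.
    vocab_logits = torch.randn(50_000)     # placeholder next-token logits
    probs = F.softmax(vocab_logits, dim=-1)
    top_p, top_id = probs.max(dim=-1)      # probability mass on the chosen token
    entropy = -(probs * probs.log()).sum() # low entropy roughly means high certainty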

Speaking more abstractly or philosophically, why couldn't a model internalize something read between the lines? Humans do, and we're part of the same physical system: we're already our own kind of computer, taking away more from a text than what is explicitly there. It's possible.


You don't have to teach a transformer model with a language corpus, even if that's how it was pretrained. You can, for example, write algorithms directly and merge them into the model (see the sketch after the links below).

https://github.com/yashbonde/rasp

https://github.com/arcee-ai/mergekit
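
To make the merging idea concrete, here is a naive sketch of linear parameter interpolation between two models with identical architectures. This is not mergekit's actual API (mergekit is config-driven and also implements SLERP, TIES, and other schemes); the function and variable names below are hypothetical:

    import torch

    def linear_merge(state_dict_a, state_dict_b, alpha=0.5):
        """Interpolate two models' parameters tensor-by-tensor.

        Simplest flavor of model merging ("linear" / model soup);
        assumes both state dicts come from the same architecture.
        """
        merged = {}
        for name, tensor_a in state_dict_a.items():
            tensor_b = state_dict_b[name]
            merged[name] = alpha * tensor_a + (1 - alpha) * tensor_b
        return merged

    # Hypothetical usage with two fine-tunes of the same base model:
    # model.load_state_dict(linear_merge(ft1.state_dict(), ft2.state_dict(), alpha=0.3))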


Recent research using SAEs suggests that some neurons regulate confidence/certainty: https://arxiv.org/abs/2406.16254
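
For context, a sparse autoencoder (SAE) of the kind used in this line of interpretability work looks roughly like the sketch below. It is trained to reconstruct a model's residual-stream activations through an overcomplete, sparsity-penalized bottleneck; the dimensions, loss weighting, and names here are assumptions, not the paper's setup:

    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        """Minimal SAE for finding interpretable directions in LLM activations."""

        def __init__(self, d_model: int, d_hidden: int):
            super().__init__()
            self.encoder = nn.Linear(d_model, d_hidden)
            self.decoder = nn.Linear(d_hidden, d_model)

        def forward(self, activations: torch.Tensor):
            features = torch.relu(self.encoder(activations))  # sparse feature codes
            reconstruction = self.decoder(features)
            return reconstruction, features

    def sae_loss(reconstruction, activations, features, l1_coeff=1e-3):
        # Reconstruction error plus an L1 penalty that encourages sparse features.
        mse = (reconstruction - activations).pow(2).mean()
        return mse + l1_coeff * features.abs().sum(dim=-1).mean()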



