
I considered the "pretrained via genetics" possibility but I don't see how that could make much difference. It doesn't look like the genome has enough information to encode really large amounts of brain structure details.


It's not really pretraining, more of an architecture optimized for the problem space the brain deals with. The brain has specialized regions for different tasks, while a transformer is uniform. Uniformity makes it easier to scale, but being generic probably also makes it inefficient, so it needs far more training and parameters than an optimized architecture would.

Brains are also quite energy-efficient, which makes parallel search over architectures much cheaper. A small rodent producing a handful of offspring every year is the equivalent of building new supercomputers with specialized hardware beyond current capabilities on a tiny budget, training a model on each, and seeing which one is better. Researchers don't have that luxury, so they compromise by running a very generic model that is flexible but inefficient.


That makes sense, but it falls under my "brain has a better architecture" possible explanation. What's interesting is that the thrust of the GPT approach is "architecture doesn't matter as long as we can keep building bigger models".

It would be surprising to me if AGI is achievable via two such different pathways. I don't know this area so I'm willing to be surprised.


> That makes sense, but it falls under my "brain has a better architecture" possible explanation. What's interesting is that the thrust of the GPT approach is "architecture doesn't matter as long as we can keep building bigger models".

Those are not mutually exclusive. Architecture doesn't matter asymptotically, but it does matter for task-specific performance at any concrete complexity budget. Compare the Big-O behavior of algorithms (asymptotes) with hardware-specific optimizations (cache-friendliness, hand-rolled assembly, outsourcing specific parts to ASICs...).
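A toy sketch of the asymptotes-vs-constants point (hypothetical Python, not from the thread): both traversals below are O(n²), so Big-O treats them as identical, yet the row-major walk accesses memory contiguously, which on cache-based hardware typically wins by a constant factor. Same asymptote, different "architecture":

```python
# Two ways to sum an n x n matrix: identical O(n^2) asymptotics,
# but different access patterns. The row-major walk touches memory
# contiguously; the column-major walk strides across rows. On real
# cache hierarchies the first is usually faster by a constant
# factor -- architecture matters even when Big-O does not.
def sum_row_major(m):
    total = 0
    for row in m:          # outer loop over rows
        for x in row:      # contiguous inner walk
            total += x
    return total

def sum_col_major(m):
    n = len(m)
    total = 0
    for j in range(n):     # outer loop over columns
        for i in range(n): # strided inner walk
            total += m[i][j]
    return total

matrix = [[i * 4 + j for j in range(4)] for i in range(4)]
assert sum_row_major(matrix) == sum_col_major(matrix) == sum(range(16))
```

Both compute the same result; only the constant factors differ, which is roughly the relationship between a generic scaled-up model and a task-specialized one.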

OA is doing the former, looking at asymptotes. Nature has done that too (mammalian brains scale from rodents to humans) but has also applied all kinds of task-specific optimizations. Compare the cerebral cortex and the cerebellum, for example: the latter acts like an application accelerator, but if it fails, software fallbacks are possible, albeit slower.

> It would be surprising to me if AGI is achievable via two such different pathways.

Have you considered cephalopod or bird brains?



