I want to pass on my abstracting of an interesting article by Venkatesh Rao, another instance of my using MindBlog as my personal filing system to be sure I can come back to - and refresh my recall of - ideas I think are important. I also pass on ChatGPT 3.5 and ChatGPT 4's summaries of my summary!
The Physics of Intelligence - The missing discourse of AI
There are strong philosophy and engineering discourses, but no physics discourse. This is a problem because when engineers mainline philosophy questions in engineering frames without the moderating influence of physics frames, you get crackpottery…I did not say the physics of artificial intelligence…The physics of intelligence is no more about silicon semiconductors or neurotransmitters than the physics of flight is about feathers or aluminum.
Attention is the focus of one of the six basic questions about the physics of intelligence that I’ve been thinking about. Here is my full list:
What is attention, and how does it work?
What role does memory play in intelligence?
How is intelligence related to information?
How is intelligence related to spacetime?
How is intelligence related to matter?
How is intelligence related to energy and thermodynamics?
The first three are obviously at the “physics of intelligence” level of abstraction, just as “wing” is at the “physics of flight” level of abstraction. The last three get more abstract, and require some constraining, but there are already some good ideas floating around on how to do the constraining…We are not talking about the physics of computation in general…computation and intelligence are not synonymous or co-extensive…To talk about intelligence, it is necessary, but not sufficient, to talk about computation. You also have to talk about the main aspects of embodiment: spatial and temporal extent, materiality, bodily integrity maintenance in relation to environmental forces, and thermodynamic boundary conditions.
What is attention, and how does it work?
A computer is “paying attention” to the data and instructions in the CPU registers in any given clock cycle…but fundamentally, attention is not a design variable used in complex ways in basic computing. You could say AI begins when you start deploying computational attention in a more dynamic way.
Attention is to intelligence as wing is to flight. The natural and artificial variants have the same sort of loose similarity. Enough that using the same word to talk about both is justified…In AI, attention refers primarily to a scheme of position encoding of a data stream. Transformer models like GPT keep track of the position of each token in the input and output streams, and extract meaning out of it. Where a word is in a stream matters almost as much as what the word is.
You can interpret these mechanisms as attention in a human sense. What is in the context of a text? In text streams, physical proximity (tokens before and after), syntactic proximity (relationship among clauses and words in a formal grammatical sense) and semantic proximity (in the sense of some words, including very distant ones, being significant in the interpretation of others) all combine to create context. This is not that different from how humans process text. So at least to first order, attention in human and artificial systems is quite similar.
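As a rough illustration of the mechanism described above (not taken from Rao's article), here is a minimal sketch of position encoding combined with an attention-style weighting. It assumes the sinusoidal scheme, which is one common choice; models such as GPT may encode position differently, for example with learned embeddings. The toy embedding table, the four-dimensional vectors, and the function names are all made up for illustration.

```python
import math

def positional_encoding(position, dim):
    """Sinusoidal encoding of a token position (one common scheme, assumed here)."""
    return [
        math.sin(position / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(position / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product weighting of values for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output

# Hypothetical toy embeddings: what each token "is", independent of position.
TOY_EMBEDDINGS = {
    "dog": [1.0, 0.0, 0.0, 0.0],
    "bites": [0.0, 1.0, 0.0, 0.0],
    "man": [0.0, 0.0, 1.0, 0.0],
}

def encode(tokens, dim=4):
    """Add the position encoding to each token's embedding."""
    return [
        [e + p for e, p in zip(TOY_EMBEDDINGS[t], positional_encoding(i, dim))]
        for i, t in enumerate(tokens)
    ]

dog_bites_man = encode(["dog", "bites", "man"])
man_bites_dog = encode(["man", "bites", "dog"])
# Weights that the first token of "dog bites man" gives to every token.
weights, _ = attend(dog_bites_man[0], dog_bites_man, dog_bites_man)
```

The same word ends up with a different vector depending on where it sits in the stream, so "dog bites man" and "man bites dog" are distinguishable even though they contain identical tokens.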
But as with wings, the differences matter. Human attention, arguably, is not primarily about information processing at all. It is about energy management. We pay attention to steady our energy into a sustainable and renewable flow. We engage in low-information activities like meditation, ritual, certain kinds of art, and prayer to learn to govern our attention in specific ways. This is not to make us better at information processing, but for other reasons, such as emotion regulation and motivation. Things like dopamine loops of satisfaction are involved. The use of well-trained attention for better information processing is only one of the effects.
Overall, human attention is more complex and multifaceted than AI attention, just as bird wings are fundamentally more complex mechanically. Attention in the sense of position-encoding for information processing is like the pure lift function of a wing. Human attention, in addition, serves additional functions analogous to control and propulsion type functions.
What role does memory play in intelligence?
The idea of attention leads naturally to the idea of memory. Trivially, memory is a curated record of everything you’ve paid attention to in the past…An obvious way to understand current developments in AI is to think of LLMs and LIMs as idiosyncratically compressed atemporal textual and visual memories. Multimodal models can be interpreted as generalizations of this idea.
Human memory is modulated by evolving risk perceptions as it is laid down, and emotions triggered by existing memories regulate current attention, determining what new data gets integrated into the evolving model (as an aside, this is why human memory exists as a kind of evolving coherent narrative of self, rather than just as a pile of digested stuff).
Biological organism memory is not just an undifferentiated garbage record (a Large Garbage Model, or LGM) of what you paid attention to in the past; it shapes what you pay attention to in the future very directly and continuously. Biological memory is strongly opinionated memory. If a dog bites you…you can’t afford to separate training and inference in the sense of “training” on a thousand dog encounters…you have to use your encounter with Dog 1 to shape your attentional response to Dog 2. Human memories are like LGMs, except that the training process is regulated by a live emotional regulation feedback loop that somehow registers and acts on evolving risk assessments. There’s a term for this in psychology (predictive coding or predictive processing) with a hand-wavy theory of energy-minimization attached, but I don’t find it fully satisfying.
I have a placeholder name for this scheme, but as yet it’s not very fleshed out. Biological memories are Large Danger Models (LDMs).
Why just danger? Why not other signals and drives towards feeding, sex, interestingness, poetry, and so on? I have a strong suspicion that danger is all you need to generate human-like memory, and in particular human-like experience of time. Human memory is the result of playing to continue the game, i.e., an infinite-game orientation. Once you have that in place, everything else emerges. The other drives are not as fundamental as basic survival.
AIs don’t yet count as human-equivalent to me: they’re in no danger, ever. Since we’re in the brute-force stage of training AI models, we train them on basically everything we have, with no danger signal accompanying any of it…AIs today develop their digested memories with no ongoing encoding or capture of the evolving risk and emotion assessments that modulate human memories. Even human grade schools, terrible as they are, do a better job than AI training protocols…the next big leap should be achievable by encoding some kind of environmental risk signal. I.e., we just need to put AIs in danger in the right way. My speculative idea of LDMs doesn’t seem that mysterious. LDMs are an engineering moonshot, not a physics mystery.
To lay it out more clearly, consider a thought experiment...Suppose you put a bunch of AIs in robot bodies, and let them evolve naturally, while scrounging resources for survival. To keep it simple, let’s say they only compete over scarce power outlets to charge their batteries. Their only hardcoded survival behavior is to plug in when running low….Let’s say the robots are randomly initialized to pay attention differently to different aspects of data coursing through them. Some robots pay extra attention to other robots’ actions. Other robots pay extra attention to the rocks in the environment. Obviously, the ones that happen to pay attention in the right ways will end up outcompeting the ones who don’t. The sneaky robots will evolve to be awake when other robots are powered down or hibernating for self-care, and hog the power outlets then. The bigger robots will learn they can always get the power outlets by shoving the smaller ones out of the way.
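The selection dynamic in this thought experiment can be sketched as a toy simulation. This is only an illustration under strong assumptions that are not in the article: each robot is reduced to a single number (the share of attention it gives to other robots rather than rocks), attending to rivals is assumed to be what wins an outlet, and the population sizes, noise levels, and function name are hypothetical. It says nothing about memory or emotion, only about attention habits being selected.

```python
import random

def simulate(num_robots=20, outlets=5, generations=30, seed=0):
    """Toy selection on attention: returns mean attention-to-rivals per generation."""
    rng = random.Random(seed)
    # Random initialization: each robot pays a random share of attention to rivals.
    population = [rng.random() for _ in range(num_robots)]
    history = [sum(population) / num_robots]
    for _ in range(generations):
        # Assumption: attending to rivals helps win an outlet, with some luck mixed in.
        ranked = sorted(population, key=lambda a: a + rng.gauss(0, 0.1), reverse=True)
        survivors = ranked[:outlets]  # only these robots get to recharge
        # The next generation copies survivors with small random variation,
        # clamped so the attention share stays between 0 and 1.
        population = [
            min(1.0, max(0.0, rng.choice(survivors) + rng.gauss(0, 0.05)))
            for _ in range(num_robots)
        ]
        history.append(sum(population) / num_robots)
    return history

history = simulate()
```

Comparing `history[0]` with `history[-1]` shows how far the average attention habit has moved under competition for outlets. Changing the assumed payoff rule changes which habit wins.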
Now the question is: given all the multimodal data flowing through them, what will the robots choose to actually remember in their limited storage spaces, as their starter models get trained up? What sorts of LDMs will emerge? How will the risk modulation emerge? What equivalent of emotional regulation will emerge? What sense of time will emerge?
The thought experiment of LDMs suggests a particular understanding of memory in relation to intelligence: memory is risk-modulated experiential data persistence that modulates ongoing experiential attention and risk-management choices....It’s a bit of a mouthful, but I think that’s fundamentally it.
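One way to make that mouthful concrete is a small data-structure sketch. Everything here is hypothetical and not from the article: the class name, the numeric risk scores, and the rule that remembered risk amplifies how strongly a new encounter is felt are illustrative assumptions. The sketch only shows the two directions of the loop, with risk deciding what persists and what persists shaping later attention.

```python
import heapq

class DangerMemory:
    """Bounded memory that keeps only the experiences with the highest felt risk."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []   # min-heap of (felt_risk, arrival_order, label)
        self._count = 0   # arrival counter, used to break ties between equal risks

    def attention(self, label):
        """Attention paid to a label: total remembered risk attached to it."""
        return sum(risk for risk, _, remembered in self._heap if remembered == label)

    def experience(self, label, risk):
        """Record an encounter; existing memories amplify how risky it feels."""
        felt = risk * (1.0 + self.attention(label))
        self._count += 1
        heapq.heappush(self._heap, (felt, self._count, label))
        if len(self._heap) > self.capacity:
            heapq.heappop(self._heap)  # forget the least dangerous memory
        return felt

memory = DangerMemory(capacity=3)
memory.experience("rock", 0.1)
first_dog = memory.experience("dog", 0.9)   # Dog 1
second_dog = memory.experience("dog", 0.9)  # Dog 2 feels riskier because of Dog 1
memory.experience("cloud", 0.05)            # low risk: pushed out of limited storage
```

Here the encounter with Dog 1 immediately changes the response to Dog 2, with no separate training phase, and the harmless cloud never makes it into the limited storage.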
I suspect the next generation of AI models will include some such embodiment feedback loop so memory goes from being brute-force generic persistence to persistence that’s linked to embodied behaviors in a feedback loop exposed to environmental dangers that act as survival pressures.
The resulting AIs won’t be eidetic idiot savants, and they will be less capable in some ways, but they will be able to survive in environments more dangerous than datacenters exposed to the world only through sanitized network connections. Instead of being Large Garbage Models (LGMs), they’ll be Large Danger Models (LDMs).
How is intelligence related to information?
We generally think about information as either primitive (you just have to know it) or entailed (you can infer it from what you already know)…Primitive information is a kind of dehydrated substance to which you can add compute (water) to expand it. Entailed information can be dehydrated into primitive form. Compression of various sorts exploits different ways of making the primitive/entailed distinction.
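The dehydration metaphor can be illustrated with a short sketch. The rule used here (squares scaled by a seed, modulo 251) and all names are arbitrary assumptions chosen for the example; the point is only that a tiny primitive description plus compute expands into a long stream, and that a general-purpose compressor such as zlib can dehydrate that stream again, while patternless noise has almost nothing entailed in it to squeeze out.

```python
import random
import zlib

def expand(seed, steps):
    """Add compute (water) to a primitive rule: a long but fully entailed byte stream."""
    return bytes((seed * i * i) % 251 for i in range(steps))

# Entailed information: 10,000 bytes that follow from the pair (7, 10000) and the rule.
entailed = expand(7, 10000)
dehydrated = zlib.compress(entailed, 9)

# Noise of the same length, for contrast: very little of it is entailed by anything shorter.
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(10000))
noise_compressed = zlib.compress(noise, 9)

# Rehydrating the compressed form recovers the original stream exactly.
rehydrated = zlib.decompress(dehydrated)
```

Comparing `len(dehydrated)` with `len(noise_compressed)` shows the difference between a stream that is mostly entailed and one that is mostly primitive.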
When you think of intelligence in relation to information though, you have to get more sophisticated…We think in terms of whether or not new data patterns require new categories, or simply modify the strengths of, and interrelationships among, existing ones…are you looking at something new, or is this just a variant or instance of something you already know?
Information for an intelligent system, then, is best understood in an ontological novelty way rather than an information-theoretic way. Because it is not as fundamental an abstraction level, it is more arbitrary, which means how you factor your experience stream into categories is as much a function of the idiosyncrasies of your survival mode as it is a function of the bits-and-bytes level structure of what you’re experiencing…the models are making up weird compressions that are not human-understandable. That’s what information is to an intelligent system: efficient ontologies that reflect how that particular intelligence is situated in its environment.
Or to put it more plainly: information is the hallucinations an intelligence makes up to categorize reality compactly, in order to survive efficiently.
The hallucinations that LLMs make up — papers and citations that don’t exist, weird words that show up in images, strange new symbols, inscrutable compressions — are neither errors in truth-seeking, nor evidence of weird kinds of magic going on. It is just raw information (in the bits and bytes Shannon sense) that has been judged for salience to the basic challenge of survival, and represented accordingly, in evolving memory. It is ontologically compressed experiential data. It’s just extra weird because the models are evolving in a trivial non-dangerous environment. So distinctions between robustly real categories and fragile hallucinations don’t matter. An academic can get fired for making up a citation. An LLM just gets wonder and mystique.
This is why information comes after memory in my physics of intelligence. Your memories are a function of a real-time danger management behavior. Your emergent categories of cognition did not evolve to be true in any rigorous sense. They evolved to help you survive. If someone who names and classifies certain data patterns as “ghosts” survives better than someone who names and classifies them “paranoid projections,” that’s how it goes. Truth is, in an evolutionary sense, the most robust representation of experience that helps you survive efficiently in the most challenging environments.
Models that describe reality in a broad and interesting way emerge when the danger environment tests you enough that you can’t bullshit to survive, but not so much that you must bullshit to survive. Both disconnected daydreams and psychotic breaks are poor foundations for existing in reality.
Spacetime, Matter, and Energy
NOTE: I am not including clips from this section in this post.
Here is ChatGPT 3.5's response to my request to summarize the above in 1000 words:
The article discusses the importance of including a physics discourse in the field of AI. The author proposes six basic questions about the physics of intelligence: attention, memory, information, spacetime, matter, and energy and thermodynamics. The concept of attention is related to position encoding of a data stream, and while it has some similarities between human and artificial systems, the differences are significant. Human attention is about energy management and regulation, and it is multifaceted. Memory is a curated record of everything a person has paid attention to in the past, but it is also modulated by evolving risk perceptions and emotions triggered by existing memories. Biological memory is strongly opinionated memory, and the training process is regulated by a live emotional regulation feedback loop that somehow registers and acts on evolving risk assessments. The author suggests that biological memories are Large Danger Models (LDMs) and that danger is all that is needed to generate human-like memory and experience of time.
And here is ChatGPT 4's response, slightly edited to remove one obvious glitch:
The Physics of Intelligence: The Missing Discourse of AI
The discourse on artificial intelligence (AI) is currently dominated by philosophy and engineering but lacks a physics perspective. To truly understand intelligence, it is necessary to consider its physical aspects, such as spatial and temporal extent, materiality, and thermodynamics.
Six questions about the physics of intelligence are proposed:
What is attention, and how does it work?
What role does memory play in intelligence?
How is intelligence related to information?
How is intelligence related to spacetime?
How is intelligence related to matter?
How is intelligence related to energy and thermodynamics?
Attention is fundamental to intelligence. In AI, attention refers to the position encoding of a data stream. While human and artificial attention mechanisms are similar, human attention is more complex and multifaceted, serving additional functions related to energy management, motivation, and emotion regulation.
Memory is the curated record of everything one has paid attention to in the past. Current AI models, such as LLMs (Language Models) and LIMs (Image Models), can be seen as compressed textual and visual memories. Human memory, however, is modulated by risk perceptions, emotions, and experiences. The idea of Large Danger Models (LDMs) is proposed, where memory is risk-modulated experiential data persistence that modulates ongoing experiential attention and risk-management choices. The next generation of AI models may include embodiment feedback loops that link memory to embodied behaviors in a feedback loop exposed to environmental dangers, resulting in more capable AIs.
Intelligence is related to information through efficient ontologies that reflect how an intelligence is situated in its environment. For intelligent systems, information is best understood in terms of ontological novelty rather than information-theoretic ways. Information is the hallucinations an intelligence makes up to categorize reality compactly to survive efficiently. This is why information comes after memory in the physics of intelligence, as memories are a function of real-time danger management behavior. Truth, in an evolutionary sense, is the most robust representation of experience that helps an organism survive efficiently in challenging environments.
The physics of intelligence is not solely about computation but also involves understanding the physical aspects of intelligence, such as attention, memory, and information. Future AI models may need to incorporate embodiment feedback loops and risk modulation to develop more human-like intelligence.