## Monday, March 11, 2024

### How AI’s GPT engines work - Lanier’s forest and trees metaphor.

Jaron Lanier has a piece in The New Yorker titled "How to Picture A.I." (If you hit the paywall when clicking the link, try opening an "empty tab" in your browser, then copy and paste in the URL that took you to the paywall.) I tried my usual approach of sampling small chunks of text to convey the message, but found that very difficult, so I pass on several early paragraphs and urge you to read the whole article. Lanier's metaphors give me a better sense of what is going on in a GPT engine, but I'm still largely mystified. Anyway, here's some text:
In this piece, I hope to explain how such A.I. works in a way that floats above the often mystifying technical details and instead emphasizes how the technology modifies—and depends on—human input.
Let’s try thinking, in a fanciful way, about distinguishing a picture of a cat from one of a dog. Digital images are made of pixels, and we need to do something to get beyond just a list of them. One approach is to lay a grid over the picture that measures something a little more than mere color. For example, we could start by measuring the degree to which colors change in each grid square—now we have a number in each square that might represent the prominence of sharp edges in that patch of the image. A single layer of such measurements still won’t distinguish cats from dogs. But we can lay down a second grid over the first, measuring something about the first grid, and then another, and another. We can build a tower of layers, the bottommost measuring patches of the image, and each subsequent layer measuring the layer beneath it. This basic idea has been around for half a century, but only recently have we found the right tweaks to get it to work well. No one really knows whether there might be a better way still.
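(An aside that is not part of Lanier's text: the grid-on-grid idea in the paragraph above can be sketched in a few lines of Python. This is only a toy under my own assumptions. The function names `edge_strength` and `build_tower`, the tiny 4x4 grayscale "image", and the choice of "average difference between neighboring pixels" as the measurement are all invented for illustration. A real network learns what each layer measures rather than reusing one hand-written rule.)

```python
def edge_strength(image, cell):
    """Lay a grid of cell-by-cell squares over a 2D list of numbers and
    return one number per square: the mean absolute difference between
    horizontally and vertically adjacent values inside that square."""
    rows, cols = len(image), len(image[0])
    grid = []
    for top in range(0, rows - cell + 1, cell):
        grid_row = []
        for left in range(0, cols - cell + 1, cell):
            diffs = []
            for r in range(top, top + cell):
                for c in range(left, left + cell):
                    if c + 1 < left + cell:  # right-hand neighbor in same square
                        diffs.append(abs(image[r][c] - image[r][c + 1]))
                    if r + 1 < top + cell:   # neighbor below in same square
                        diffs.append(abs(image[r][c] - image[r + 1][c]))
            grid_row.append(sum(diffs) / len(diffs) if diffs else 0.0)
        grid.append(grid_row)
    return grid


def build_tower(image, cell, n_layers):
    """Stack grids: each layer measures the layer directly beneath it.
    Stops early once a layer is too small to hold another square."""
    layers, current = [], image
    for _ in range(n_layers):
        if len(current) < cell or len(current[0]) < cell:
            break
        current = edge_strength(current, cell)
        layers.append(current)
    return layers


# Hypothetical 4x4 grayscale image: a sharp edge on the left, flat on the right.
IMAGE = [
    [0, 9, 5, 5],
    [0, 9, 5, 5],
    [0, 9, 5, 5],
    [0, 9, 5, 5],
]

tower = build_tower(IMAGE, cell=2, n_layers=3)
for depth, layer in enumerate(tower, start=1):
    print("layer", depth, layer)
```

The first layer comes out large where the sharp edge is and zero where the image is flat, and the second layer summarizes the first. The tower stops at two layers here only because the image is so small.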
Here I will make our cartoon almost like an illustration in a children’s book. You can think of a tall structure of these grids as a great tree trunk growing out of the image. (The trunk is probably rectangular instead of round, since most pictures are rectangular.) Inside the tree, each little square on each grid is adorned with a number. Picture yourself climbing the tree and looking inside with an X-ray as you ascend: numbers that you find at the highest reaches depend on numbers lower down.
Alas, what we have so far still won’t be able to tell cats from dogs. But now we can start “training” our tree. (As you know, I dislike the anthropomorphic term “training,” but we’ll let it go.) Imagine that the bottom of our tree is flat, and that you can slide pictures under it. Now take a collection of cat and dog pictures that are clearly and correctly labelled “cat” and “dog,” and slide them, one by one, beneath its lowest layer. Measurements will cascade upward toward the top layer of the tree—the canopy layer, if you like, which might be seen by people in helicopters. At first, the results displayed by the canopy won’t be coherent. But we can dive into the tree—with a magic laser, let’s say—to adjust the numbers in its various layers to get a better result. We can boost the numbers that turn out to be most helpful in distinguishing cats from dogs. The process is not straightforward, since changing a number on one layer might cause a ripple of changes on other layers. Eventually, if we succeed, the numbers on the leaves of the canopy will all be ones when there’s a dog in the photo, and they will all be twos when there’s a cat.
Now, amazingly, we have created a tool—a trained tree—that distinguishes cats from dogs. Computer scientists call the grid elements found at each level “neurons,” in order to suggest a connection with biological brains, but the similarity is limited. While biological neurons are sometimes organized in “layers,” such as in the cortex, they are not always; in fact, there are fewer layers in the cortex than in an artificial neural network. With A.I., however, it’s turned out that adding a lot of layers vastly improves performance, which is why you see the term “deep” so often, as in “deep learning”—it means a lot of layers.
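(Another aside of mine, not Lanier's: the "magic laser" step, nudging numbers until the canopy reads one for dog and two for cat, can be mimicked with a very small Python sketch. Everything here is an assumption for illustration: the made-up two-number "features" for each picture, the single layer of adjustable numbers, and the use of plain gradient descent as the adjustment rule. Real systems adjust many layers at once, which is where the ripple effect Lanier mentions comes in.)

```python
# Made-up feature pairs standing in for measurements taken from each picture.
# Label 1.0 = dog, 2.0 = cat, following the article's ones-and-twos convention.
EXAMPLES = [
    ([1.0, 0.0], 1.0),
    ([0.9, 0.1], 1.0),
    ([0.0, 1.0], 2.0),
    ([0.1, 0.9], 2.0),
]


def predict(weights, bias, features):
    """The 'canopy' reading: a weighted sum of the features plus an offset."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def train(examples, steps=500, rate=0.1):
    """Repeatedly nudge the adjustable numbers in the direction that
    shrinks the average squared gap between reading and label."""
    weights, bias = [0.0, 0.0], 0.0
    n = len(examples)
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for features, label in examples:
            error = predict(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x / n
            grad_b += error / n
        weights = [w - rate * g for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b
    return weights, bias


weights, bias = train(EXAMPLES)
for features, label in EXAMPLES:
    print("label", label, "reading", round(predict(weights, bias, features), 2))
```

Before training every reading is zero. Afterward the readings sit close to one for the "dog" examples and close to two for the "cat" examples, though not exactly on them, since one layer of three numbers cannot fit these four examples perfectly.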