Understanding ChatGPT through what ChatGPT Understands

Abhijit Mahabal
Feb 26, 2023 · 7 min read

(I have moved to Substack: https://amahabal.substack.com/. I will cross-post a few here, but if you want to follow my posts, please subscribe there).

ChatGPT is an idiot savant: on the one hand capable of writing fascinating prose and “passing” “bar exams”, and on the other a believer that, thanks to the division of labor, nine women can produce a baby in one month. On the one hand capable of solving difficult problems in the theory of computation that would stump most undergrads; on the other, unable to figure out in chess how to mate a lone king with a king and a queen, something rank novices have little difficulty with. How can we understand the intellectual capabilities of this system and predict where it could work and where it will fall flat on its face?

I present one explanation for these mixed grades by analyzing ChatGPT’s “knowledge” through the prism of cognitive psychology’s notion of categories or (equivalently) concepts. In this first in a series of three posts, I lay the groundwork:

  • I ask if it even makes sense to talk about ChatGPT’s concepts — isn’t it just a stochastic parrot?
  • If ChatGPT’s failures are caused by deficient concepts, I ask if this can be remedied.
  • Finally, I offer a detailed evolutionary analogy to suggest how ChatGPT may come to have the concepts it does, however limited.

In the remaining two posts, we will walk through five families of concepts of different granularities, starting with short words and phrases and ending with concepts that organize entire documents, exploring ChatGPT’s fluency and error-making. As Hofstadter and Moser put it, to err is human, but to study error-making is cognitive science.

On Having Concepts

Does ChatGPT know anything? Many folks take a hard-line “no” position on this. In brief, the argument is that while ChatGPT may know a lot about how to assemble words, the words themselves are hollow; it can say much about cats, but it has no idea what a cat is. Its hilarious errors certainly support this assertion, such as the one mentioned above of nine women producing a baby in a month. And the other day ChatGPT told me that it is illegal for a person to have two sons-in-law, as that is tantamount to polygamy, though it helpfully added that this is illegal only in certain jurisdictions.

While there are merits to this “no meaning, just a stochastic parrot” position, it is possible that ChatGPT is learning something about words that, in bits and pieces, is locally isomorphic to some things in the real world: tiny islands of meaningfulness that render it a useful tool. Stray too far from these islands of sensibility and ChatGPT wanders into murky silliness.

If this is the case, we can ask whether better technology or better training data will enlarge these islands of sensibility: more things done better, with fewer errors. Or are some areas inherently unreachable without significant additional breakthroughs?

I will argue that things will get better, but that some things are well-nigh unreachable with current methods. I will also argue that some of the things we do will actually make the situation much, much worse, creating artificial islands whose construction drowns several natural ones.

By saying that a system, whether machine or human, “has a concept”, I am not saying that there is a discrete data structure encoding this fact: I am not arguing here for a symbolic representation. I am not even saying that the system has to be aware that it is aware. You, the person reading this, not only can recognize cats in a picture; you also know that you can recognize cats. Google’s cat detector from 2012 can also detect cats, but there is no reason to believe that it knows that it can. The way its brain is wired, so to speak, this just happens, much like a baby’s sucking reflex.

A newborn baby likes to look at faces. In this limited sense, it “knows” faces: it can somewhat tell faces apart from non-faces, although it is tricked by face-shaped things and fails to notice partially occluded faces. Over time, it gets tricked less and also recognizes obscured faces, better adjusting the conceptual boundaries of face-ness to the “real” boundaries of this category. All I am asking is: what abilities analogous to face detection does ChatGPT have?

An Evolutionary Analogy

What a newborn knows innately through evolution having wired its neurons just so, ChatGPT “knows” due to its training, which results in the particular “weights” of its synapses.

The analogy to evolution goes further. Consider the evolution of the labradoodle, which has happened in two phases. For billions of years, natural selection incessantly and unemotionally selected the organisms best fitted to their environment, eventually producing wolves. This was not supervised by any intelligent designer. It was slow, but what it lacked in speed it made up for in the intricacy of its designs, in its remarkably clever and efficient use of limited resources, with complete disdain for clean solutions when messy ones got the job done just fine. A product of evolution, such as the human brain, resembles at first glance the spaghetti code of a bad programmer.

In the second phase of the labradoodle’s evolution, over the last few hundred years, the human hand and artificial selection shaped the wolf into hundreds of dog breeds, including the labradoodle. This phase was rapid and labor-intensive, and it resulted in dogs “better fitted” to human needs: better herders, less aggressive, non-shedding, and so forth. This optimization paid no heed to some things that natural selection holds paramount, such as the ability to have offspring. Some dog breeds with large heads, such as the Bulldog, require caesareans. That wouldn’t fly with natural selection!

ChatGPT is also trained in two phases.

In the first phase of training, ChatGPT is shown large amounts of text and asked to predict the next word; an incorrect prediction results in adjusting the weights so that that particular prediction would have come out better. This produces what is called a language model. The phase is completely unsupervised, requiring only copiously available text. Like blind natural selection, there is no intelligent designer.
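To make the mechanics concrete, here is a minimal sketch, not OpenAI’s actual code, of a single step of this phase: a toy PyTorch model, an invented vocabulary size, and random token ids stand in for a real transformer and a real corpus. The loss rewards predicting each next token correctly, and the weight update nudges those predictions to come out better.

```python
# A minimal, illustrative sketch of next-word-prediction training.
# The tiny model, vocab_size, and random "corpus" are stand-ins.
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32           # toy sizes for illustration
model = nn.Sequential(                    # stand-in for a transformer
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Pretend corpus: each position's target is simply the next token id.
tokens = torch.randint(0, vocab_size, (1, 17))           # (batch, length)
inputs, targets = tokens[:, :-1], tokens[:, 1:]

logits = model(inputs)                                    # (1, 16, vocab_size)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                           # adjust weights so that
optimizer.step()                                          # the prediction comes out better
```

The only “supervision” here is the text itself: the label at every position is simply the word that actually came next.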

The second phase of ChatGPT’s training is rather different and goes by the name Reinforcement Learning from Human Feedback (RLHF). The model is shown a prompt and a pair of passages, where a human has selected which passage is “better” (more informative, less offensive, better structured, and so on), and the weights are adjusted so that the “correct” response is picked. This is analogous to evolution by artificial selection, and it can shape the organism very rapidly, much as the intelligent design of dogs and pigeons has rapidly produced breeds very different from their ancestors.
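Again as a hedged sketch rather than the actual recipe, the preference-learning step at the heart of this phase can be caricatured as a pairwise loss: push the reward of the human-preferred passage above that of the rejected one. The reward_model below and the random feature vectors are invented stand-ins for the full network and for encoded passages.

```python
# An illustrative pairwise-preference (reward model) training step.
import torch
import torch.nn as nn
import torch.nn.functional as F

feature_dim = 64                                  # toy size
reward_model = nn.Linear(feature_dim, 1)          # stand-in for "model + reward head"
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Pretend encodings of the two passages shown to the human rater.
chosen_features = torch.randn(8, feature_dim)     # passages the rater preferred
rejected_features = torch.randn(8, feature_dim)   # passages the rater rejected

r_chosen = reward_model(chosen_features)          # scalar reward per passage
r_rejected = reward_model(rejected_features)

# Bradley-Terry style loss: maximize P(chosen preferred) = sigmoid(r_chosen - r_rejected).
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
optimizer.step()                                  # weights shift toward the "correct" response
```

In the full recipe, a separate reinforcement-learning step then tunes the language model against this learned reward; the sketch above covers only the “which passage did the human prefer?” part.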

The Evolution of Concepts in ChatGPT

One can say that “eyes evolved because they are useful”: organisms that can see are better equipped to deal with threats. The ability to detect faces evolved because being able to tell apart members of your own species is an evolutionary advantage. Such causal talk, as if evolution mindfully picked an alternative, is obviously inaccurate, merely a shorthand for non-mindful and excruciatingly plodding processes; but it is a useful shorthand without which we would have little ability to understand evolution.

Care must be exercised, however: there must exist a step-by-step progression leading to something as complex as the eye, and along the way a partial vision apparatus can be not just useless but harmful. Imagine an intermediate organism that has an eye but not yet its connections to the brain: no vision, and yet an additional damage-prone area is now exposed. Such an organism will perish. There must exist links from blind species to sighted species where each change is at least not harmful.

In the environment ChatGPT grows up in, next-word prediction is the sole selective pressure. Abilities that help in this regard could conceivably develop — provided that the gradual changes leading to such development are at least not harmful.

In this world, where the challenges faced include continuing sentences such as “for breakfast I had __”, it is conceivable that the system picks up both the ability to identify when it is in a situation where some “breakfast food” should be uttered and the ability to utter some breakfast food. Note that I am not saying that it needs to associate the name “breakfast food” with this category, although that too does happen. Many categories, though, are nameless.

It may learn to identify situations where “pirate-speech” is called for, or situations where the next word should refer to a recently seen city. It may even learn the category “bullet list”, whose elements are uttered not all at once but interspersed with some other categories. These and other examples we will explore in depth in the next two posts, with both ChatGPT’s successes and its boo-boos, replete with screenshots.

We will also look at how the heavy-handedness of RLHF’s artificial selection leads to the over-expression of categories such as “the customer is always right”, shown in the sole screenshot in this post:

The next post is here.

Acknowledgements: I’d like to thank Sagar Khare, Tarun Garg, and Shweta Gupta for their suggestions.


Abhijit Mahabal

I do unsupervised concept discovery at Pinterest (and previously at Google). Twitter: @amahabal