Hi AI ethics enthusiasts,
Generative AI can seem almost like magic. The impressive results we get when interacting with chatbots can leave the impression that they are objective, all-knowing, or even sentient. But they are none of those things. Understanding how generative AI works can help us avoid pitfalls and hype.
Therefore, I’ve written a straightforward explanation for people without technical backgrounds. I’ll start with a general explanation of how machine learning works because that is the foundation of generative AI. I will use that to explain why people say that AI algorithms are black boxes, why training data is important, and what causes bias in AI. Then, I will explain how generative AI works.
My explanation is based on two outstanding resources, which I highly recommend:
My explanation of how machine learning works is based on Sam Sartor’s video “Why neural networks aren't neural networks”. The images in that part of the explanation are taken from that video.
My explanation of how generative AI works is based on Beatriz Stollnitz’s wonderful article “How GPT Models Work”.
For dessert, an AI-generated take on this post is at the end!
1. Terminology
Machine Learning (ML) algorithms find patterns in past data and apply them to new data.
Often, people think of Artificial Intelligence (AI) as a broader category, which includes machine learning and other things like robotics (making stuff move).
Generative AI is AI that generates outputs such as text, images, audio, and video. It is contrasted with what people sometimes call “predictive AI,” which produces predictions, such as the likelihood that a house will sell at a certain price. However, this distinction is misleading because, as we will see, generative AI is also all about predictions.
I’ll note that there are many disagreements about these terms, and changes are happening over time as the technologies develop. Therefore, you may encounter other approaches.
2. The Basics of Machine Learning
Machine learning algorithms work in three basic stages: problem definition, training, and inference. Let’s illustrate them with a simple example from Sam Sartor’s video, mentioned above.
Problem definition
Suppose we want an algorithm to sort strawberries and blueberries. Since strawberries are generally heavier than blueberries, our algorithm can sort them by weight. So, the problem the algorithm will solve is sorting berries by weight.
Training
We will first need to know each berry’s weight range. If we don’t already know these ranges, we will start with data collection: we will find the weights of many strawberries and blueberries. We will build a machine learning algorithm to process this data and find a threshold separating them. The process by which the algorithm finds the threshold is called “training.”
Inference
In the inference stage, the algorithm applies what it learned. Given a berry’s weight, it will tell us whether it is a strawberry or a blueberry based on the rule it found. Problem solved!
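For readers who are curious what this looks like in code, here is a minimal sketch in Python. The weights and the midpoint rule are my own illustrative assumptions, not real berry data or the method from the video:

```python
# Training data: weights in grams for berries we already labeled.
# (Made-up numbers, for illustration only.)
blueberry_weights = [0.5, 0.7, 0.9, 1.1, 1.3]
strawberry_weights = [7.0, 9.5, 12.0, 14.5, 18.0]

# "Training": find a threshold separating the two groups.
# Here we simply take the midpoint between the group averages.
def find_threshold(light, heavy):
    mean_light = sum(light) / len(light)
    mean_heavy = sum(heavy) / len(heavy)
    return (mean_light + mean_heavy) / 2

threshold = find_threshold(blueberry_weights, strawberry_weights)

# "Inference": classify a new berry by its weight alone.
def classify(weight_grams):
    return "strawberry" if weight_grams > threshold else "blueberry"

print(classify(0.8))   # → blueberry
print(classify(11.0))  # → strawberry
```

Notice that the engineer writes the *procedure* for finding the threshold, but the threshold itself comes from the data.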
Now you know the basics!
In reality, most problems are much more complex. The brilliance of computer scientists and engineers lies (among other things) in how they design algorithms to handle this complexity. Different algorithms vary in how they do that, but the principle is the same: learn patterns from past data and apply them to new instances. In this way, machine learning algorithms can perform a wide range of tasks, such as sorting images of cats and dogs, analyzing resumes, and predicting housing prices.
If you’d like to learn more, I highly recommend watching Sam Sartor’s video: “Why neural networks aren't neural networks.” It is only 8 minutes long and very illuminating.
Black boxes
Now you can understand why machine learning algorithms are called black boxes. The engineers’ work is to design an algorithm that finds patterns in the training data. The engineers don’t look for the patterns themselves. Moreover, typically, the engineers can’t tell what the patterns are. This is the sense in which machine learning and artificial intelligence are black boxes: we don’t know the reasons for their conclusions.
Training data and bias
Now you can also understand the importance of training data. If the data the algorithm trains on isn’t representative of the items it will encounter in reality, it will not be able to find the correct rules. For example, suppose we only collected weights of berries grown in the US but berries from Mexico have different weights. In that case, our algorithm will fail to sort Mexican berries correctly, no matter how sophisticated it is. This is why people say “garbage in, garbage out” about AI algorithms -- if the training data is no good, the outputs won’t be good either.
This issue is an important source of bias in machine learning and artificial intelligence. For example, the data used for training typically comes from Western countries, especially from the US, and from general sources across the internet. Since this data isn’t fully representative, the algorithms extract biased patterns.
However, providing representative data to the algorithm isn’t sufficient to get good results and avoid bias. This is a very common misconception. Much depends on the design choices engineers make when building the algorithm, and there are ways to adjust even when the data is deficient.
3. The Basics of Generative AI
Now, let's shift gears to generative AI. Generative AI works in the same three stages: problem definition, training, and inference. I will illustrate using an example from Beatriz Stollnitz’s great article “How GPT Models Work.”
Problem definition
In language models, like the ones that power chatbots, the task is to generate text. More precisely, the task is to determine which text is most likely to follow a previous piece of text.
To define this task, the text is divided into smaller units called “tokens,” which we can think of as parts of words or strings of characters.
Now, we can define the task as predicting the next token in the series. We can think of the prompts we send chatbots as chains of tokens. The chatbot’s goal is to determine what should follow our chain.
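To give a rough idea of what tokenization looks like, here is a toy example in Python. Real models use learned subword vocabularies (such as byte-pair encoding), so splitting on words and punctuation, as below, is a simplification of mine:

```python
import re

def toy_tokenize(text):
    # Split into word and punctuation tokens. Real tokenizers use
    # learned subword units, so this is only an approximation.
    return re.findall(r"\w+|[^\w\s]", text)

prompt = "The wheels of the bus go round and"
print(toy_tokenize(prompt))
# → ['The', 'wheels', 'of', 'the', 'bus', 'go', 'round', 'and']
```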
Training
How do generative AI algorithms determine which token should come next? The answer is intuitive. Think of the following sentence:
“The wheels of the bus go round and ________.”
What is the next token? How do you know?
The next token is “round.” And you know this because you’ve heard this sentence many times before. The same is true for generative AI algorithms — they detect patterns in texts in their training sets.
This explanation may seem simplistic, given the impressive outputs from generative AI. This is where the amount of data and the brilliance of engineers come in. Engineers find sophisticated ways to get the algorithms to detect intricate patterns in massive piles of data.
The resulting patterns are statistical, assigning probabilities to possible tokens. These probabilities are used in the inference stage.
Inference
At the inference stage, the algorithm chooses which text to output based on the probabilities it found during training.
In this way, generative AI forms words, sentences, and paragraphs, adding one token at a time.
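As a toy illustration of both stages, the sketch below “trains” on a tiny text by counting which token follows which (a simple bigram model of my own devising, vastly simpler than a real language model), then generates text by repeatedly picking the most likely next token:

```python
from collections import Counter, defaultdict

# "Training": count how often each token follows each other token.
training_text = "the wheels of the bus go round and round"
tokens = training_text.split()

next_counts = defaultdict(Counter)
for current, nxt in zip(tokens, tokens[1:]):
    next_counts[current][nxt] += 1

# Turn the counts into probabilities for each possible next token.
def next_token_probs(token):
    counts = next_counts[token]
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

# "Inference": extend a prompt one token at a time, always choosing
# the most probable next token.
def generate(prompt_tokens, steps):
    out = list(prompt_tokens)
    for _ in range(steps):
        probs = next_token_probs(out[-1])
        if not probs:
            break
        out.append(max(probs, key=probs.get))
    return out

print(generate(["round", "and"], 1))
# → ['round', 'and', 'round']
```

The model “knows” that “round” follows “and” only because that pattern appeared in its training text, which is the whole idea in miniature.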
Wrapping Up
I hope this simplified explanation has shed some light on the inner workings of machine learning and generative AI. By demystifying these technologies, we can have more informed discussions about their impact and potential.
In particular, we can understand why:
AI algorithms are not sentient
AI algorithms are not objective
AI algorithms are not always right
What are your thoughts on this explanation? Did you find any parts particularly illuminating or confusing? I'd love to hear your feedback and continue this conversation.
Remember, as we navigate the AI revolution, knowledge is our best tool for ensuring these technologies are developed and used responsibly.
Dessert
An AI-generated take on this post!
In the words of the “artists”:
“Generative Soundwave Cityscape: A cityscape silhouette formed by sound wave patterns. The buildings are audio waveforms in various shades of cyan (#00e2fa), representing input data. Pink (#e3407a) waveforms weave through and around the cyan forms, depicting the generative AI process. The sky above the city is filled with spectrograms and frequency charts. Dark background. Artistic style: Minimalist vector art combined with audio visualization techniques.”
Ready for More?
Check out our comprehensive resources, workshops, and consulting services at www.techbetter.ai, and follow us on LinkedIn: Ravit's page, TechBetter's page.