Hi AI ethics enthusiasts,
Generative AI can seem almost like magic. The impressive results we get when interacting with chatbots can leave the impression that they are objective, all-knowing, or even sentient. But they are none of those things. Understanding how generative AI works can help us avoid pitfalls and hype.
Therefore, I’ve written a straightforward explanation for people without technical backgrounds. I’ll start with a general explanation of how machine learning works because that is the foundation of generative AI. I will use that to explain why people say that AI algorithms are black boxes, why training data is important, and what causes bias in AI. Then, I will explain how generative AI works.
My explanation is based on two outstanding resources, which I highly recommend:
My explanation of how machine learning works is based on Sam Sartor’s video “Why neural networks aren't neural networks”. The images in that part of the explanation are taken from that video.
My explanation of how generative AI works is based on Beatriz Stollnitz’s wonderful article “How GPT Models Work”.
For dessert, an AI-generated take on this post is at the end!
1. Terminology
Machine Learning (ML) algorithms find patterns in past data and apply them to new data.
Often, people think of Artificial Intelligence (AI) as a broader category, which includes machine learning and other things like robotics (making stuff move).
Generative AI is AI that generates outputs such as text, images, audio, and video. It is contrasted with what people sometimes call “predictive AI,” which produces predictions, such as the likelihood that a house will sell at a certain price. However, this distinction is misleading because, as we will see, generative AI is also all about predictions.
I’ll note that there are many disagreements about these terms, and changes are happening over time as the technologies develop. Therefore, you may encounter other approaches.
2. The Basics of Machine Learning
Machine learning algorithms work in three basic stages: problem definition, training, and inference. Let’s illustrate them with a simple example from Sam Sartor’s video, mentioned above.
Problem definition
Suppose we want an algorithm to sort strawberries and blueberries. Since strawberries are generally heavier than blueberries, our algorithm can sort them by weight. So, the problem the algorithm will solve is sorting berries by weight.
Training
We will first need to know each berry’s weight range. If we don’t already know these ranges, we will start with data collection: we will find the weights of many strawberries and blueberries. We will build a machine learning algorithm to process this data and find a threshold separating them. The process by which the algorithm finds the threshold is called “training.”
Inference
In the inference stage, the algorithm applies what it learned. Given a berry’s weight, it will tell us whether it is a strawberry or a blueberry based on the rule it found. Problem solved!
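For readers who are curious what this looks like in code, here is a minimal sketch in Python. The weights and the midpoint rule are my own illustrative assumptions, not real berry data or the method from the video:

```python
# Training data: weights in grams for berries we already labeled.
# (Made-up numbers, for illustration only.)
blueberry_weights = [0.5, 0.7, 0.9, 1.1, 1.3]
strawberry_weights = [7.0, 9.5, 12.0, 14.5, 18.0]

# "Training": find a threshold separating the two groups.
# Here we simply take the midpoint between the group averages.
def find_threshold(light, heavy):
    mean_light = sum(light) / len(light)
    mean_heavy = sum(heavy) / len(heavy)
    return (mean_light + mean_heavy) / 2

threshold = find_threshold(blueberry_weights, strawberry_weights)

# "Inference": classify a new berry by its weight alone.
def classify(weight_grams):
    return "strawberry" if weight_grams > threshold else "blueberry"

print(classify(0.8))   # → blueberry
print(classify(11.0))  # → strawberry
```

Notice that the engineer writes the *procedure* for finding the threshold, but the threshold itself comes from the data.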
Now you know the basics!
In reality, most problems are much more complex. The brilliance of computer scientists and engineers lies (among other things) in how they design algorithms to handle this complexity. Different algorithms vary in how they do that, but the principle is the same: learn patterns from past data and apply them to new instances. In this way, machine learning algorithms can perform a wide range of tasks, such as sorting images of cats and dogs, analyzing resumes, and predicting housing prices.
If you’d like to learn more, I highly recommend watching Sam Sartor’s video: “Why neural networks aren't neural networks.” It is only 8 minutes long and very illuminating.
Black boxes
Now you can understand why machine learning algorithms are called black boxes. The engineers’ work is to design an algorithm that finds patterns in the training data. The engineers don’t look for the patterns themselves. Moreover, typically, the engineers can’t tell what the patterns are. This is the sense in which machine learning and artificial intelligence are black boxes: we don’t know the reasons for their conclusions.
Training data and bias
Now you can also understand the importance of training data. If the data the algorithm trains on isn’t representative of the items it will encounter in reality, it will not be able to find the correct rules. For example, suppose we only collected weights of berries grown in the US but berries from Mexico have different weights. In that case, our algorithm will fail to sort Mexican berries correctly, no matter how sophisticated it is. This is why people say “garbage in, garbage out” about AI algorithms -- if the training data is no good, the outputs won’t be good either.
This issue is an important source of bias in machine learning and artificial intelligence. For example, the data used for training typically comes from Western countries, especially from the US, and from general sources across the internet. Since this data isn’t fully representative, the algorithms extract biased patterns.
However, providing representative data to the algorithm isn’t sufficient to get good results and avoid bias. This is a very common misconception. Much depends on the design choices engineers make when building the algorithm, and there are ways to adjust even when the data is deficient.
3. The Basics of Generative AI
Now, let's shift gears to generative AI. Generative AI works in the same three stages: problem definition, training, and inference. I will illustrate using an example from Beatriz Stollnitz’s great article “How GPT Models Work.”
Problem definition
In language models, like the ones that power chatbots, the task is to generate text. More precisely, the task is to determine which text is most likely to follow a previous piece of text.
To define this task, the text is divided into smaller units called “tokens,” which we can think of as parts of words or strings of characters.
Now, we can define the task as predicting the next token in the series. We can think of the prompts we send chatbots as chains of tokens. The chatbot’s goal is to determine what should follow our chain.
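To give a rough idea of what tokenization looks like, here is a toy example in Python. Real models use learned subword vocabularies (such as byte-pair encoding), so splitting on words and punctuation, as below, is a simplification of mine:

```python
import re

def toy_tokenize(text):
    # Split into word and punctuation tokens. Real tokenizers use
    # learned subword units, so this is only an approximation.
    return re.findall(r"\w+|[^\w\s]", text)

prompt = "The wheels of the bus go round and"
print(toy_tokenize(prompt))
# → ['The', 'wheels', 'of', 'the', 'bus', 'go', 'round', 'and']
```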
Training
How do generative AI algorithms determine which token should come next? The answer is intuitive. Think of the following sentence:
“The wheels of the bus go round and ________.”
What is the next token? How do you know?
The next token is “round.” And you know this because you’ve heard this sentence many times before. The same is true for generative AI algorithms — they detect patterns in texts in their training sets.
This explanation may seem simplistic, given the impressive outputs from generative AI. This is where the amount of data and the brilliance of engineers come in. Engineers find sophisticated ways to get the algorithms to detect intricate patterns in massive piles of data.
The resulting patterns are statistical, assigning probabilities to possible tokens. These probabilities are used in the inference stage.
Inference
At the inference stage, the algorithm chooses which text to output based on the probabilities it found during training.
In this way, generative AI forms words, sentences, and paragraphs, adding one token at a time.
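As a toy illustration of both stages, the sketch below “trains” on a tiny text by counting which token follows which (a simple bigram model of my own devising, vastly simpler than a real language model), then generates text by repeatedly picking the most likely next token:

```python
from collections import Counter, defaultdict

# "Training": count how often each token follows each other token.
training_text = "the wheels of the bus go round and round"
tokens = training_text.split()

next_counts = defaultdict(Counter)
for current, nxt in zip(tokens, tokens[1:]):
    next_counts[current][nxt] += 1

# Turn the counts into probabilities for each possible next token.
def next_token_probs(token):
    counts = next_counts[token]
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

# "Inference": extend a prompt one token at a time, always choosing
# the most probable next token.
def generate(prompt_tokens, steps):
    out = list(prompt_tokens)
    for _ in range(steps):
        probs = next_token_probs(out[-1])
        if not probs:
            break
        out.append(max(probs, key=probs.get))
    return out

print(generate(["round", "and"], 1))
# → ['round', 'and', 'round']
```

The model “knows” that “round” follows “and” only because that pattern appeared in its training text, which is the whole idea in miniature.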
Wrapping Up
I hope this simplified explanation has shed some light on the inner workings of machine learning and generative AI. By demystifying these technologies, we can have more informed discussions about their impact and potential.
In particular, we can understand why:
AI algorithms are not sentient
AI algorithms are not objective
AI algorithms are not always right
What are your thoughts on this explanation? Did you find any parts particularly illuminating or confusing? I'd love to hear your feedback and continue this conversation.
Remember, as we navigate the AI revolution, knowledge is our best tool for ensuring these technologies are developed and used responsibly.
Dessert
An AI-generated take on this post!
In the words of the “artists”:
“Generative Soundwave Cityscape: A cityscape silhouette formed by sound wave patterns. The buildings are audio waveforms in various shades of cyan (#00e2fa), representing input data. Pink (#e3407a) waveforms weave through and around the cyan forms, depicting the generative AI process. The sky above the city is filled with spectrograms and frequency charts. Dark background. Artistic style: Minimalist vector art combined with audio visualization techniques.”
Ready for More?
Check out our comprehensive resources, workshops, and consulting services at www.techbetter.ai, and follow us on LinkedIn: Ravit's page, TechBetter's page.