# Building an LLM from Zero

## So Simple You Can Teach It to Your Kids

> "Any sufficiently advanced technology is indistinguishable from magic." (Arthur C. Clarke)
>
> "Until you build it yourself, then it's just a lot of matrix multiplications." (This Tutorial)
Welcome! This is a step-by-step guide to building a GPT-style language model from scratch. By the end, you'll have a working model that generates Shakespeare-like text, and you'll understand every single line of code that makes it happen.
No GPU. No cloud. No magic. Just Python, PyTorch, and your laptop.
## Who Is This For?

You, if you:

- Know basic Python (variables, functions, loops, classes)
- Are curious about how ChatGPT actually works under the hood
- Have ever wondered: "Is it really just predicting the next word?" (Spoiler: yes. Mostly.)
You do not need to know calculus, linear algebra, or any machine learning framework. We explain everything from the very beginning.
## What You Will Build
A character-level GPT model (~825,000 parameters) trained on the complete works of Shakespeare. After training, it generates text like this:
```text
ROMEO:
What is a man, if his chief good and market
Of his time be but to sleep and feed?
A beast, no more.
```
(Approximately. Your model's output will vary; it has its own creative flair.)
## Before You Start
Make sure you have completed the setup steps in README.md:
- Python 3.10+
- PyTorch installed (CPU-only is fine)
- Run:

  ```shell
  python src/utils/download_data.py
  ```
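A quick way to sanity-check the setup before diving in (a minimal sketch; it only prints versions and fails nothing if PyTorch is missing):

```python
import sys

# Confirm the Python version meets the tutorial's requirement (3.10+).
major, minor = sys.version_info[:2]
print(f"Python {major}.{minor}")

# Confirm PyTorch imports; the try/except keeps the check friendly
# if it is not installed yet.
try:
    import torch
    print(f"PyTorch {torch.__version__} (CPU-only is fine)")
except ImportError:
    print("PyTorch not installed yet; see README.md")
```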
## Table of Contents
### Part 1: Foundations
Understanding the ingredients before we cook.
| Chapter | Title | Key Idea |
|---|---|---|
| Ch 01 | What Is a Language Model? | Next-token prediction |
| Ch 02 | Tensors and PyTorch Basics | The "smart array" |
| Ch 03 | Tokenization | Text to numbers |
| Ch 04 | Embeddings | Numbers to meaning |
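As a taste of Part 1's key idea, next-token prediction can be sketched in a few lines of plain Python: count which character tends to follow which, then predict the most frequent follower. (A made-up snippet stands in for Shakespeare; the real model in Chapter 1 learns probabilities instead of counting, but the idea is the same.)

```python
from collections import Counter, defaultdict

text = "to be or not to be"

# Count which character follows each character: a bigram model,
# the simplest possible "language model".
follows = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    follows[prev][nxt] += 1

# Predict the most likely next character after "t".
prediction = follows["t"].most_common(1)[0][0]
print(prediction)  # "o": the most common follower of "t" in this text
```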
### Part 2: The Attention Mechanism
The secret ingredient that makes transformers special.
| Chapter | Title | Key Idea |
|---|---|---|
| Ch 05 | Self-Attention | Tokens talking to each other |
| Ch 06 | Multi-Head Attention | Multiple perspectives |
| Ch 07 | Feed-Forward and Layer Norm | Thinking it over |
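"Tokens talking to each other" can be previewed with a toy sketch (made-up 2-d vectors, plain Python): each token scores every other token, the scores become weights via softmax, and the token's output is a weighted average. Here queries, keys, and values are just the token vectors themselves; the real model in Chapters 5-6 learns separate projections for each.

```python
import math

# Three tokens as 2-d vectors (values are made up for illustration).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

outputs = []
for q in tokens:                        # each token asks: who is relevant to me?
    scores = [dot(q, k) for k in tokens]
    weights = softmax(scores)           # attention weights sum to 1
    out = [sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(2)]
    outputs.append(out)
print(outputs)
```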
### Part 3: The Transformer
Assembling the engine.
| Chapter | Title | Key Idea |
|---|---|---|
| Ch 08 | The Transformer Block | One complete unit |
| Ch 09 | The Full GPT Architecture | The whole stack |
| Ch 10 | Causal Language Modeling | No cheating! |
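The "no cheating" rule of Chapter 10 has a one-line core: when predicting token i, the model may only look at tokens 0 through i. A tiny sketch of the resulting lower-triangular mask (plain Python; the real code builds the same shape as a PyTorch tensor):

```python
# 1 means "may attend", 0 means "masked out" (the future is hidden).
T = 4  # a tiny context of four positions
mask = [[1 if j <= i else 0 for j in range(T)] for i in range(T)]
for row in mask:
    print(row)
```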
### Part 4: Training
Teaching the model to write.
| Chapter | Title | Key Idea |
|---|---|---|
| Ch 11 | Dataset and DataLoader | Loading the dishwasher |
| Ch 12 | The Training Loop | Repeat until smart |
| Ch 13 | Checkpointing | Saving your game |
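"Repeat until smart" is less mysterious than it sounds. Here is the shape of a training loop boiled down to one made-up parameter and a toy loss, (w - 3)^2: compute the gradient, step against it, repeat. Chapter 12 does exactly this, just with PyTorch computing the gradients for you.

```python
# One-parameter "training loop": minimize (w - 3)**2 by gradient descent.
# All values here are made up for illustration.
w, lr = 0.0, 0.1
for step in range(100):
    grad = 2 * (w - 3)   # d/dw of (w - 3)**2
    w -= lr * grad       # the gradient descent update
print(round(w, 4))  # approaches 3.0, the minimum of the loss
```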
### Part 5: Generation
Making it write.
| Chapter | Title | Key Idea |
|---|---|---|
| Ch 14 | Greedy and Sampling | Safe vs. risky |
| Ch 15 | Temperature and Top-k | The creativity dial |
| Ch 16 | Putting It All Together | The full pipeline |
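The "creativity dial" of Chapter 15 can be previewed in isolation: divide the model's raw scores (logits) by a temperature before the softmax. Low temperature sharpens the distribution (safe, repetitive); high temperature flattens it (risky, creative). A minimal sketch with made-up logits:

```python
import math

def apply_temperature(logits, temperature):
    """Turn logits into probabilities, reshaped by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(apply_temperature(logits, 0.5))  # sharper: top choice dominates
print(apply_temperature(logits, 2.0))  # flatter: more surprises
```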
Ready? Let's start with Chapter 1.