# Building an LLM from Zero

## So Simple You Can Teach It to Your Kids

> "Any sufficiently advanced technology is indistinguishable from magic." (Arthur C. Clarke)
>
> "Until you build it yourself, then it's just a lot of matrix multiplications." (This Tutorial)
Welcome! This is a step-by-step guide to building a GPT-style language model from scratch. By the end, you'll have a working model that generates Shakespeare-like text, and you'll understand every single line of code that makes it happen.
No GPU. No cloud. No magic. Just Python, PyTorch, and your laptop.
## Who Is This For?

You, if you:

- Know basic Python (variables, functions, loops, classes)
- Are curious about how ChatGPT actually works under the hood
- Have ever wondered: "Is it really just predicting the next word?" (Spoiler: yes. Mostly.)
You do not need to know calculus, linear algebra, or any machine learning framework. We explain everything from the very beginning.
## What You Will Build
A character-level GPT model (~825,000 parameters) trained on the complete works of Shakespeare. After training, it generates text like this:
```text
ROMEO:
What is a man, if his chief good and market
Of his time be but to sleep and feed?
A beast, no more.
```
(Approximately. Your model's output will vary; it has its own creative flair.)
## Before You Start
Make sure you have completed the setup steps in README.md:
- Python 3.10+
- PyTorch installed (CPU-only is fine)
- Run:

  ```shell
  python src/utils/download_data.py
  ```
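A quick way to sanity-check the setup before diving in (a minimal sketch; it only prints versions and fails nothing if PyTorch is missing):

```python
import sys

# Confirm the Python version meets the tutorial's requirement (3.10+).
major, minor = sys.version_info[:2]
print(f"Python {major}.{minor}")

# Confirm PyTorch imports; the try/except keeps the check friendly
# if it is not installed yet.
try:
    import torch
    print(f"PyTorch {torch.__version__} (CPU-only is fine)")
except ImportError:
    print("PyTorch not installed yet; see README.md")
```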
## Table of Contents
### Part 1: Foundations
Understanding the ingredients before we cook.
| Chapter | Title | Key Idea |
|---|---|---|
| Ch 01 | What Is a Language Model? | Next-token prediction |
| Ch 02 | Tensors and PyTorch Basics | The "smart array" |
| Ch 03 | Tokenization | Text to numbers |
| Ch 04 | Embeddings | Numbers to meaning |
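As a taste of Part 1's key idea, next-token prediction can be sketched in a few lines of plain Python: count which character tends to follow which, then predict the most frequent follower. (A made-up snippet stands in for Shakespeare; the real model in Chapter 1 learns probabilities instead of counting, but the idea is the same.)

```python
from collections import Counter, defaultdict

text = "to be or not to be"

# Count which character follows each character: a bigram model,
# the simplest possible "language model".
follows = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    follows[prev][nxt] += 1

# Predict the most likely next character after "t".
prediction = follows["t"].most_common(1)[0][0]
print(prediction)  # "o": the most common follower of "t" in this text
```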
### Part 2: The Attention Mechanism
The secret ingredient that makes transformers special.
| Chapter | Title | Key Idea |
|---|---|---|
| Ch 05 | Self-Attention | Tokens talking to each other |
| Ch 06 | Multi-Head Attention | Multiple perspectives |
| Ch 07 | Feed-Forward and Layer Norm | Thinking it over |
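"Tokens talking to each other" can be previewed with a toy sketch (made-up 2-d vectors, plain Python): each token scores every other token, the scores become weights via softmax, and the token's output is a weighted average. Here queries, keys, and values are just the token vectors themselves; the real model in Chapters 5-6 learns separate projections for each.

```python
import math

# Three tokens as 2-d vectors (values are made up for illustration).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

outputs = []
for q in tokens:                        # each token asks: who is relevant to me?
    scores = [dot(q, k) for k in tokens]
    weights = softmax(scores)           # attention weights sum to 1
    out = [sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(2)]
    outputs.append(out)
print(outputs)
```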
### Part 3: The Transformer
Assembling the engine.
| Chapter | Title | Key Idea |
|---|---|---|
| Ch 08 | The Transformer Block | One complete unit |
| Ch 09 | The Full GPT Architecture | The whole stack |
| Ch 10 | Causal Language Modeling | No cheating! |
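The "no cheating" rule of Chapter 10 has a one-line core: when predicting token i, the model may only look at tokens 0 through i. A tiny sketch of the resulting lower-triangular mask (plain Python; the real code builds the same shape as a PyTorch tensor):

```python
# 1 means "may attend", 0 means "masked out" (the future is hidden).
T = 4  # a tiny context of four positions
mask = [[1 if j <= i else 0 for j in range(T)] for i in range(T)]
for row in mask:
    print(row)
```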
### Part 4: Training
Teaching the model to write.
| Chapter | Title | Key Idea |
|---|---|---|
| Ch 11 | Dataset and DataLoader | Loading the dishwasher |
| Ch 12 | The Training Loop | Repeat until smart |
| Ch 13 | Checkpointing | Saving your game |
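"Repeat until smart" is less mysterious than it sounds. Here is the shape of a training loop boiled down to one made-up parameter and a toy loss, (w - 3)^2: compute the gradient, step against it, repeat. Chapter 12 does exactly this, just with PyTorch computing the gradients for you.

```python
# One-parameter "training loop": minimize (w - 3)**2 by gradient descent.
# All values here are made up for illustration.
w, lr = 0.0, 0.1
for step in range(100):
    grad = 2 * (w - 3)   # d/dw of (w - 3)**2
    w -= lr * grad       # the gradient descent update
print(round(w, 4))  # approaches 3.0, the minimum of the loss
```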
### Part 5: Generation
Making it write.
| Chapter | Title | Key Idea |
|---|---|---|
| Ch 14 | Greedy and Sampling | Safe vs. risky |
| Ch 15 | Temperature and Top-k | The creativity dial |
| Ch 16 | Putting It All Together | The full pipeline |
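The "creativity dial" of Chapter 15 can be previewed in isolation: divide the model's raw scores (logits) by a temperature before the softmax. Low temperature sharpens the distribution (safe, repetitive); high temperature flattens it (risky, creative). A minimal sketch with made-up logits:

```python
import math

def apply_temperature(logits, temperature):
    """Turn logits into probabilities, reshaped by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(apply_temperature(logits, 0.5))  # sharper: top choice dominates
print(apply_temperature(logits, 2.0))  # flatter: more surprises
```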
Ready? Let's start with Chapter 1.