Building an LLM from Zero

So Simple You Can Teach It to Your Kids

"Any sufficiently advanced technology is indistinguishable from magic." (Arthur C. Clarke)

"Until you build it yourself, then it's just a lot of matrix multiplications." (This Tutorial)


Welcome! This is a step-by-step guide to building a GPT-style language model from scratch. By the end, you'll have a working model that generates Shakespeare-like text, and you'll understand every single line of code that makes it happen.

No GPU. No cloud. No magic. Just Python, PyTorch, and your laptop.


Who Is This For?

You, if you:

- Know basic Python (variables, functions, loops, classes)
- Are curious about how ChatGPT actually works under the hood
- Have ever wondered: "Is it really just predicting the next word?" (Spoiler: yes. Mostly.)

You do not need to know calculus, linear algebra, or any machine learning framework. We explain everything from the very beginning.


What You Will Build

A character-level GPT model (~825,000 parameters) trained on the complete works of Shakespeare. After training, it generates text like this:

ROMEO:
What is a man, if his chief good and market
Of his time be but to sleep and feed?
A beast, no more.

(Approximately. Your model's output will vary; it has its own creative flair.)
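Where do roughly 825,000 parameters come from? A back-of-the-envelope count makes the number less mysterious. The hyperparameters below (embedding size 128, 4 layers, context length 256, a ~65-character vocabulary, tied input/output embeddings) are illustrative guesses, not the exact configuration used later in the tutorial — but they land in the same ballpark:

```python
def count_gpt_params(vocab_size=65, block_size=256, n_embd=128,
                     n_layer=4, ffn_mult=4):
    """Rough parameter count for a small GPT with tied embeddings."""
    tok_emb = vocab_size * n_embd                 # token embedding table
    pos_emb = block_size * n_embd                 # learned positional embeddings
    # Per transformer block:
    attn = n_embd * 3 * n_embd + 3 * n_embd       # Q, K, V projections (+ biases)
    attn += n_embd * n_embd + n_embd              # attention output projection
    ffn = n_embd * ffn_mult * n_embd + ffn_mult * n_embd  # feed-forward expand
    ffn += ffn_mult * n_embd * n_embd + n_embd            # feed-forward contract
    norms = 2 * 2 * n_embd                        # two LayerNorms (scale + shift)
    block = attn + ffn + norms
    final_norm = 2 * n_embd                       # LayerNorm before the output head
    return tok_emb + pos_emb + n_layer * block + final_norm

print(f"{count_gpt_params():,}")  # → 834,432
```

Most of the budget goes into the transformer blocks; the embedding tables are a small fraction. Chapter 9 covers the real architecture in detail.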


Before You Start

Make sure you have completed the setup steps in README.md:

  1. Python 3.10+
  2. PyTorch installed (CPU-only is fine)
  3. Run: `python src/utils/download_data.py`
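If you want a quick sanity check before diving in, a snippet like the one below will do. (The function name is ours, not part of the tutorial's code; it only checks the first two setup steps.)

```python
import sys
import importlib.util

def setup_problems():
    """Return a list of setup problems; an empty list means you're ready."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ needed, found {sys.version.split()[0]}")
    # find_spec checks availability without importing (and without crashing)
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch not installed (try: pip install torch)")
    return problems

for p in setup_problems():
    print("MISSING:", p)
```

An empty output means both checks passed.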

Table of Contents

Part 1: Foundations

Understanding the ingredients before we cook.

| Chapter | Title | Key Idea |
| --- | --- | --- |
| Ch 01 | What Is a Language Model? | Next-token prediction |
| Ch 02 | Tensors and PyTorch Basics | The "smart array" |
| Ch 03 | Tokenization | Text to numbers |
| Ch 04 | Embeddings | Numbers to meaning |
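As a taste of Chapter 3's "text to numbers" idea: a character-level tokenizer is just a lookup table built from the distinct characters in the corpus. A minimal sketch (the variable names are our own, though `stoi`/`itos` are a common convention):

```python
text = "To be, or not to be"  # stand-in for the full Shakespeare corpus
chars = sorted(set(text))     # the vocabulary: every distinct character, sorted

stoi = {ch: i for i, ch in enumerate(chars)}  # string -> integer
itos = {i: ch for ch, i in stoi.items()}      # integer -> string

def encode(s):
    """Text to a list of integer token ids."""
    return [stoi[c] for c in s]

def decode(ids):
    """Integer token ids back to text."""
    return "".join(itos[i] for i in ids)

ids = encode("to be")
print(ids)                      # one small integer per character
print(decode(ids))              # → to be (round-trips losslessly)
```

With the real corpus the vocabulary comes out to a few dozen characters, which is why the model's vocabulary is so small compared to word-level tokenizers.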

Part 2: The Attention Mechanism

The secret ingredient that makes transformers special.

| Chapter | Title | Key Idea |
| --- | --- | --- |
| Ch 05 | Self-Attention | Tokens talking to each other |
| Ch 06 | Multi-Head Attention | Multiple perspectives |
| Ch 07 | Feed-Forward and Layer Norm | Thinking it over |

Part 3: The Transformer

Assembling the engine.

| Chapter | Title | Key Idea |
| --- | --- | --- |
| Ch 08 | The Transformer Block | One complete unit |
| Ch 09 | The Full GPT Architecture | The whole stack |
| Ch 10 | Causal Language Modeling | No cheating! |

Part 4: Training

Teaching the model to write.

| Chapter | Title | Key Idea |
| --- | --- | --- |
| Ch 11 | Dataset and DataLoader | Loading the dishwasher |
| Ch 12 | The Training Loop | Repeat until smart |
| Ch 13 | Checkpointing | Saving your game |

Part 5: Generation

Making it write.

| Chapter | Title | Key Idea |
| --- | --- | --- |
| Ch 14 | Greedy and Sampling | Safe vs. risky |
| Ch 15 | Temperature and Top-k | The creativity dial |
| Ch 16 | Putting It All Together | The full pipeline |
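To preview Chapter 15's "creativity dial": temperature simply rescales the model's raw scores (logits) before they are turned into probabilities. Dividing by a temperature below 1 sharpens the distribution toward the top choice (safer, more repetitive text); above 1 flattens it (riskier, more surprising text). A self-contained sketch in plain Python, with made-up logits for three tokens:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities, rescaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                           # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
print([round(p, 3) for p in softmax(logits, temperature=1.0)])
print([round(p, 3) for p in softmax(logits, temperature=0.5)])  # sharper: top token dominates
print([round(p, 3) for p in softmax(logits, temperature=2.0)])  # flatter: more even spread
```

Top-k is the complementary trick: keep only the k highest-scoring tokens and sample among those, so a long tail of unlikely tokens can never be picked.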

Ready? Let's start with Chapter 1.