How LLMs Are Trained

I asked Claude to help me code these steps so I could better understand how LLMs work. This post lays out how I think about LLMs (and immortality). Examples of each step follow below.

Training pipeline


The same model moves through these phases; each stage changes the weights to make the outputs more useful.

Pre-training: Building the Foundation

The model reads billions of text examples drawn from a large share of the public internet: books, websites, articles, conversations. It learns patterns and relationships between words, creating something like a compressed, lossy 'zip file' of human knowledge.

Example

Input:

Knock! Knock! ...

↓

Model Predicts:

Who's there?

The model learns common patterns and predicts the most likely continuation, based on probabilities learned from the training data.
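To make the "predict what comes next" idea concrete, here is a minimal sketch. It is not how a real LLM works internally: real models train a neural network over tokens, while this toy simply counts which word follows each two-word context. The corpus and the names `CORPUS`, `train`, and `predict` are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; real pre-training data is vastly larger.
CORPUS = [
    "knock knock who's there",
    "knock knock who's there",
    "knock knock come in",
]

def train(lines, context_size=2):
    """Count which word follows each `context_size`-word context."""
    counts = defaultdict(Counter)
    for line in lines:
        words = line.lower().split()
        for i in range(context_size, len(words)):
            context = tuple(words[i - context_size:i])
            counts[context][words[i]] += 1
    return counts

def predict(counts, prompt, context_size=2):
    """Return the most frequent next word and its estimated probability."""
    context = tuple(prompt.lower().split()[-context_size:])
    followers = counts.get(context)
    if not followers:
        return None, 0.0  # context never seen in training
    word, n = followers.most_common(1)[0]
    return word, n / sum(followers.values())

model = train(CORPUS)
word, prob = predict(model, "Knock knock")
print(word, round(prob, 2))  # who's 0.67
```

The prediction is "who's" because it followed "knock knock" in two of the three training lines. The same principle, at a vastly larger scale and with learned weights instead of a count table, is what lets a pre-trained model continue a prompt.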

Fine-tuning: Learning Specific Styles

After pre-training, we continue training the model on a smaller, curated set of conversations from experts or specific people. The model learns their style, knowledge, and way of thinking.

Example

You:

Hi, how is your bitcoin going?

Expert Friend:

Yeah bits is going ok, currently tracking the hashrate trends and looking at the correlation with mining difficulty adjustments...

The model learns to respond like your cryptocurrency expert friend, using their knowledge and speaking style.
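A rough sketch of the idea, under heavy assumptions: fine-tuning keeps updating the same model on a small, targeted dataset. Here the "model" is just a table of next-word counts rather than neural network weights, and `weight` is a hypothetical knob standing in for repeated passes over the small dataset. All sentences and function names are invented for illustration.

```python
from collections import Counter, defaultdict

def count_followers(lines, counts=None, weight=1):
    """Add next-word counts (keyed by the previous word) from `lines` into `counts`."""
    if counts is None:
        counts = defaultdict(Counter)
    for line in lines:
        words = line.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += weight
    return counts

def most_likely(counts, prev):
    """Return the most frequent word after `prev`, or None if `prev` is unseen."""
    followers = counts.get(prev)
    return followers.most_common(1)[0][0] if followers else None

# Hypothetical data: broad general text vs. a few lines from the expert friend.
general_text = ["bitcoin is digital money", "bitcoin is volatile", "bitcoin is popular"]
expert_chat = ["bitcoin hashrate is tracking mining difficulty",
               "bitcoin hashrate trends look ok"]

model = count_followers(general_text)                 # "pre-training"
before = most_likely(model, "bitcoin")
count_followers(expert_chat, counts=model, weight=5)  # "fine-tuning" the same table
after = most_likely(model, "bitcoin")
print(before, "->", after)  # is -> hashrate
```

The point is that the same table is updated rather than rebuilt: the general knowledge stays, but the expert's phrasing now dominates for the contexts the expert talked about.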

Reinforcement Learning: Independent Improvement

The model practices solving problems on its own, just like a student doing homework after class. It tries different approaches, learns from mistakes, and gets better over time.

Example

Math Problem Practice

1. Teacher shows: 2 + 3 = 5
2. Student practices: 4 + 7 = ?
3. Student tries: 11
4. Feedback: Correct! ✓
5. Student gets better through practice

The model can improve beyond imitating its training examples by solving new problems independently and learning from feedback, sometimes reaching solutions that never appeared in the data it was trained on.
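The practice loop above can be sketched as a tiny trial-and-error learner. This is an assumption-laden toy, not the method used for real LLMs, which typically apply policy-gradient style updates to a neural network. Here a "student" picks among three hypothetical strategies with an epsilon-greedy rule, gets a reward of 1 only when its answer is correct, and gradually prefers whatever strategy earns reward. The names `STRATEGIES` and `practice` and all parameter values are illustrative.

```python
import random

# Hypothetical strategies the student can try on "a + b = ?" problems.
STRATEGIES = {
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "add": lambda a, b: a + b,
}

def practice(episodes=500, epsilon=0.1, lr=0.1, seed=0):
    """Learn by trial and error which strategy earns reward."""
    rng = random.Random(seed)
    value = {name: 0.0 for name in STRATEGIES}  # estimated reward per strategy
    for _ in range(episodes):
        a, b = rng.randint(1, 9), rng.randint(1, 9)   # a fresh practice problem
        if rng.random() < epsilon:
            name = rng.choice(list(STRATEGIES))       # explore: try something else
        else:
            name = max(value, key=value.get)          # exploit: use the best so far
        answer = STRATEGIES[name](a, b)
        reward = 1.0 if answer == a + b else 0.0      # feedback: correct or not
        value[name] += lr * (reward - value[name])    # nudge estimate toward reward
    return value

value = practice()
best = max(value, key=value.get)
print(best)  # add
```

Nobody tells the student that addition is the right strategy. It only sees "correct" or "incorrect" after each attempt, which mirrors the feedback step in the example.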

💭 A Thought to Consider

These same three steps could be applied to your personal data - your conversations, memories, and characteristics. The result? An AI model that thinks, speaks, and writes exactly like you... potentially living on forever.
