
Basic

I. One Node
II. Four Nodes
III. One Hidden Layer
IV. Three Inputs
V. Seven Layers
Advanced
1. Mixture of Experts (MoE)
2. Recurrent Neural Network (RNN)
3. Mamba
4. Matrix Multiplication
5. LLM Sampling
6. MLP in PyTorch
7. Backpropagation
8. Transformer
9. Batch Normalization
10. Generative Adversarial Network (GAN)
11. Self Attention
12. Dropout
13. Autoencoder
14. Vector Database
15. CLIP
16. Residual Network (ResNet)
17. Graph Convolution Network (GCN)
18. SORA’s Diffusion Transformer (DiT)
19. Gemini 1.5's Switch Transformer
20. Reinforcement Learning from Human Feedback (RLHF)

© 2024 Tom Yeh


Each exercise links to my original LinkedIn post, with animation and explanation, and notes the date it was originally posted.


I. One Node

Worksheet: compute the output of a single node by hand.
Originally posted: 12.5.23
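
To check your hand calculation: a node takes a weighted sum of its inputs, adds a bias, and applies an activation. A minimal sketch in NumPy (the inputs, weights, bias, and the ReLU activation here are illustrative, not the worksheet's values):

import numpy as np

def node(x, w, b):
    # weighted sum of inputs plus bias, then ReLU activation
    return np.maximum(0, w @ x + b)

x = np.array([2.0, 1.0])   # illustrative inputs
w = np.array([1.0, -1.0])  # illustrative weights
b = 1.0                    # illustrative bias
print(node(x, w, b))       # ReLU(2*1 + 1*(-1) + 1) = 2.0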


II. Four Nodes

Worksheet: extend the single-node calculation to a layer of four nodes.
Originally posted: 12.6.23


III. One Hidden Layer

Worksheet: compute the outputs of a network with one hidden layer by hand.
Originally posted: 12.7.23


IV. Three Inputs

Worksheet: compute the outputs by hand for a network with three inputs.
Originally posted: 12.13.23


V. Seven Layers

Worksheet: propagate the inputs by hand through a stack of seven layers.
Originally posted: 12.13.23
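
A deep network is just the same layer calculation repeated. A minimal sketch (the layer count matches the worksheet; the weights are random stand-ins):

import numpy as np

def forward(x, layers):
    # apply each (W, b) pair in turn, ReLU after every layer
    for W, b in layers:
        x = np.maximum(0, W @ x + b)
    return x

rng = np.random.default_rng(0)
# seven illustrative 3x3 layers (not the worksheet's values)
layers = [(rng.integers(-1, 2, (3, 3)).astype(float), np.zeros(3)) for _ in range(7)]
print(forward(np.array([3.0, 4.0, 5.0]), layers))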


1. Mixture of Experts

Worksheet: given the inputs X1 and X2, the gate network scores the experts and the max score selects one; the chosen expert (a small linear layer with ReLU) produces the outputs Y1 and Y2.
Originally posted: 12.15.23
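
A minimal sketch of the routing pattern (all weights illustrative): the gate scores each expert, a hard max picks one, and only that expert runs.

import numpy as np

def moe(x, W_gate, experts):
    scores = W_gate @ x                      # gate network scores each expert
    k = int(np.argmax(scores))               # hard (max) selection, as in the worksheet
    return np.maximum(0, experts[k] @ x), k  # chosen expert's linear + ReLU output

x = np.array([2.0, 1.0, 3.0, 1.0])           # illustrative input
W_gate = np.array([[1.0, 1.0, 0.0, 0.0],
                   [0.0, 1.0, 1.0, 0.0]])    # one gate row per expert
experts = [np.array([[1.0, 0.0, 1.0, 0.0],
                     [1.0, 1.0, 0.0, 0.0]]),
           np.array([[0.0, 1.0, 1.0, 0.0],
                     [1.0, 0.0, 1.0, 0.0]])]
y, k = moe(x, W_gate, experts)
print(k, y)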


2. Recurrent Neural Network (RNN)

Worksheet: given the input sequence X = 3, 4, 5, 6, the parameters A, B, C, the activation function φ = ReLU, and the initial hidden state H0 = 0, unroll the recurrence by hand to produce the output sequence Y.
Originally posted: 12.18.23
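
The recurrence is h_t = φ(A h_{t−1} + B x_t) and y_t = C h_t. A minimal sketch with illustrative parameters (not the worksheet's A, B, C):

import numpy as np

def rnn(xs, A, B, C, h):
    ys = []
    for x in xs:
        h = np.maximum(0, A @ h + B * x)  # h_t = ReLU(A h + B x_t); x_t is a scalar here
        ys.append(C @ h)                  # y_t = C h_t
    return ys

A = np.array([[1.0, -1.0], [0.0, 1.0]])   # illustrative parameters
B = np.array([1.0, 1.0])
C = np.array([[1.0, 2.0]])
h0 = np.zeros(2)                          # H0 = 0, as on the worksheet
print(rnn([3, 4, 5, 6], A, B, C, h0))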


3. Mamba’s S6 Model

Worksheet: run the selective structured state-space scan by hand over the input sequence 3, 4, 5, 6, updating the hidden state with the given parameters at each step to produce the output sequence.
Originally posted: 12.19.23
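
A heavily simplified sketch of the idea: a state-space recurrence whose B and C parameters are "selected" (computed) from each input step. All matrices here are illustrative, and real Mamba additionally discretizes continuous-time A and B per step.

import numpy as np

def selective_scan(xs, A, W_B, W_C, h):
    ys = []
    for x in xs:
        B = W_B * x                        # input-dependent B (illustrative selection)
        C = W_C * x                        # input-dependent C
        h = A @ h + B * x                  # state update: h_t = A h + B x_t
        ys.append(float(C @ h))            # readout: y_t = C h_t
    return ys

A = np.array([[0.5, 0.0], [0.0, 0.9]])     # illustrative diagonal state matrix
W_B = np.array([1.0, 0.0])
W_C = np.array([0.0, 1.0])
print(selective_scan([3, 4, 5, 6], A, W_B, W_C, np.zeros(2)))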


4. Matrix Multiplication

Worksheet: compute the product by hand.

[ 1  1 ]   [ 1  5  2 ]
[-1  1 ] × [ 2  4  2 ] = ?

Originally posted: 1.5.24
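
Each output entry is the dot product of a row of the left matrix with a column of the right matrix. A one-line check (the operands are my reconstruction of the worksheet's layout):

import numpy as np

A = np.array([[1, 1], [-1, 1]])       # left matrix, as reconstructed from the worksheet
B = np.array([[1, 5, 2], [2, 4, 2]])  # right matrix
print(A @ B)                          # [[3 9 4], [1 -1 0]]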


5. How does an LLM sample a sentence?

Worksheet: for the input embeddings, the LLM outputs one probability distribution over the vocabulary per position. Use the given random numbers (.34, .52, .92, .65) to sample one word per position and fill in the blanks.

Vocab   P1   P2   P3
I       .01  .01  .03
you     .01  .01  .50
they    .01  .01  .40
are     .01  .40  .01
am      .01  .40  .01
how     .50  .05  .01
why     .10  .05  .01
where   .10  .05  .01
who     .15  .01  .01
what    .10  .01  .01

______  ______  ______

Originally posted: 1.6.24
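
A sampled word is the one whose slice of the cumulative distribution contains the random number (inverse-CDF sampling). A minimal sketch using the first distribution from the table above:

def sample(vocab, probs, r):
    # walk the cumulative distribution until it passes the random number r
    cum = 0.0
    for word, p in zip(vocab, probs):
        cum += p
        if r < cum:
            return word
    return vocab[-1]

vocab = ["I", "you", "they", "are", "am", "how", "why", "where", "who", "what"]
p1 = [.01, .01, .01, .01, .01, .50, .10, .10, .15, .10]  # distribution P1
print(sample(vocab, p1, 0.34))  # cumulative mass first exceeds .34 inside "how"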


6. Multi-Layer Perceptron in PyTorch

Worksheet: fill in the blanks so that the nn.Sequential model matches the hand-calculated network (three linear layers, ReLU after the first two, sigmoid at the end).

mlp_model = nn.Sequential(
    nn._______( ___, ___, bias = ___ ),
    nn._______(),
    nn._______( ___, ___, bias = ___ ),
    nn._______(),
    nn._______( ___, ___, bias = ___ ),
    nn._______()
)

Hints:
Linear Layer: { Identity | Linear | Bilinear }
Activation Function: { ReLU | Tanh | Sigmoid }
in_features: { int }
out_features: { int }
bias: { T | F }

Originally posted: 1.8.24
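
One plausible completion, for reference after you have tried it yourself. The layer sizes here are my assumptions read off the worksheet's matrices, not an official answer key; the point is the pattern Linear → ReLU repeated, with Sigmoid at the end.

import torch.nn as nn

# One plausible completion (layer sizes are assumptions):
mlp_model = nn.Sequential(
    nn.Linear(4, 4, bias=True),   # in_features=4, out_features=4
    nn.ReLU(),
    nn.Linear(4, 2, bias=True),
    nn.ReLU(),
    nn.Linear(2, 1, bias=True),
    nn.Sigmoid(),                 # squashes the final output into (0, 1)
)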


7. Backpropagation

Worksheet: forward the input X = (2, 1, 3, 1) through three layers (linear + ReLU, linear + ReLU, linear + softmax), compute the cross-entropy loss L between YPred and YTarget, then backpropagate the gradients by hand.
Originally posted: 1.9.24
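
You can check hand-computed gradients against autograd. A minimal two-layer sketch with random stand-in weights (the worksheet has three layers; the shapes here are illustrative):

import torch
import torch.nn.functional as F

x = torch.tensor([2.0, 1.0, 3.0, 1.0])      # the worksheet's input X
W1 = torch.randn(4, 4, requires_grad=True)  # illustrative weights
W2 = torch.randn(2, 4, requires_grad=True)
target = torch.tensor(0)                    # class index for cross-entropy

h = F.relu(W1 @ x)                          # layer 1: linear + ReLU
logits = W2 @ h                             # layer 2: linear (softmax is inside the loss)
loss = F.cross_entropy(logits.unsqueeze(0), target.unsqueeze(0))
loss.backward()                             # backpropagation fills .grad
print(W1.grad, W2.grad, sep="\n")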


8. Transformer

Worksheet: take the features X1..X5 from the previous block, apply the given attention weight matrix (A) to get the attention-weighted features Z1..Z5, then push each position through the position-wise feed-forward network (FFN: linear + ReLU + linear) to produce the features for the next block.
Originally posted: 1.11.24
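
With the attention weights A given, the whole block is a few matrix products. A minimal sketch (X is copied from the worksheet; A and the FFN weights are stand-ins):

import numpy as np

def block(X, A, W1, W2):
    Z = X @ A                     # attention-weighted features Z1..Z5
    H = np.maximum(0, W1 @ Z)     # position-wise FFN: first linear + ReLU
    return W2 @ H                 # second linear -> features for the next block

X = np.array([[5, 6, 0, 7, 0],
              [0, 2, 4, 0, 3],
              [1, 0, 1, 1, 0]], dtype=float)  # X1..X5, as on the worksheet
A = np.eye(5)                                 # illustrative attention weight matrix
W1 = np.ones((4, 3))                          # illustrative FFN weights
W2 = np.ones((3, 4))
print(block(X, A, W1, W2))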


9. Batch Normalization

Worksheet: forward the mini-batch X1..X4 through a linear layer and ReLU, compute the batch statistics (sum Σ, mean µ, variance σ², standard deviation σ), normalize (subtract µ, divide by σ), then apply the trainable scale-and-shift parameters before the next layer.
Originally posted: 1.14.24
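
Batch norm by hand is four small steps: mean, variance, normalize, scale-and-shift. A minimal sketch (H is the worksheet's mini-batch; the trainable γ and β are illustrative):

import numpy as np

def batchnorm(H, gamma, beta, eps=1e-5):
    mu = H.mean(axis=1, keepdims=True)        # mean over the mini-batch
    var = H.var(axis=1, keepdims=True)        # variance over the mini-batch
    H_hat = (H - mu) / np.sqrt(var + eps)     # normalize
    return gamma * H_hat + beta               # scale & shift (trainable parameters)

H = np.array([[1., 0., 3., 0.],
              [0., 3., 1., 1.],
              [2., 1., 0., 2.]])              # columns are X1..X4, as on the worksheet
gamma = np.array([[2.], [3.], [1.]])          # illustrative trainable scale
beta = np.zeros((3, 1))                       # illustrative trainable shift
print(batchnorm(H, gamma, beta))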


10. Generative Adversarial Network (GAN)

Worksheet: forward the noise vectors N1..N4 through the generator (linear + ReLU) to get the fakes F1..F4; forward both fakes and reals X1..X4 through the discriminator (linear + ReLU layers, sigmoid output) to get the predictions Y. To train the discriminator, use targets YD = 0 for fakes and 1 for reals; to train the generator, use targets YG = 1 on the fakes. In both cases compute the loss gradients ∂L/∂Z by hand.
Originally posted: 1.15.24
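
The two training targets from the worksheet correspond to two binary cross-entropy losses. A minimal sketch (the sigmoid outputs are illustrative):

import torch
import torch.nn.functional as F

def d_loss(d_fake, d_real):
    # discriminator: push fakes toward 0, reals toward 1
    zeros, ones = torch.zeros_like(d_fake), torch.ones_like(d_real)
    return F.binary_cross_entropy(d_fake, zeros) + F.binary_cross_entropy(d_real, ones)

def g_loss(d_fake):
    # generator: push the discriminator's output on fakes toward 1
    return F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))

d_fake = torch.tensor([0.3, 0.6, 0.2, 0.4])  # illustrative predictions on fakes
d_real = torch.tensor([0.8, 0.7, 0.9, 0.6])  # illustrative predictions on reals
print(d_loss(d_fake, d_real), g_loss(d_fake))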


11. Self Attention

Worksheet: project the features x1..x4 with WQ, WK, WV to get the queries, keys, and values; compute the scores KᵀQ (MatMul), scale them, and apply softmax (e□, then ÷ Σ) to get the attention weight matrix (A); finally multiply the values by A (MatMul) to get the attention-weighted features z1..z4, which feed the FFN.
Originally posted: 1.16.24
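
A minimal sketch of the full pipeline. The feature matrix is copied from the worksheet (x1..x4 as columns); the projection weights are random stand-ins:

import numpy as np

def self_attention(X, WQ, WK, WV):
    Q, K, V = WQ @ X, WK @ X, WV @ X        # project features into q, k, v
    S = (K.T @ Q) / np.sqrt(Q.shape[0])     # scores K^T Q, scaled by sqrt(d)
    A = np.exp(S) / np.exp(S).sum(axis=0)   # softmax over each column -> matrix A
    return V @ A                            # attention-weighted features z1..z4

X = np.array([[2., 0., 0., 2.],
              [0., 1., 0., 0.],
              [0., 2., 1., 0.],
              [0., 0., 1., 1.]])            # features, as on the worksheet
rng = np.random.default_rng(0)
WQ, WK, WV = (rng.integers(0, 2, (3, 4)).astype(float) for _ in range(3))  # illustrative
print(self_attention(X, WQ, WK, WV))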
12. Dropout

Worksheet: run the same network twice. During training, a random sequence decides which units each dropout layer (p = 0.5, then p = 0.33) zeroes out between the linear + ReLU layers; compare the outputs Y against the targets Y′ with MSE loss and compute the gradients ∂L/∂Y (× 2). At inference on unseen data, dropout is skipped and all units stay active.
Originally posted: 1.19.24
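
In PyTorch the train/inference split is handled by the module mode. A minimal sketch (layer sizes are illustrative; the dropout rates match the worksheet):

import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(2, 4), nn.ReLU(), nn.Dropout(p=0.5),   # first dropout layer
    nn.Linear(4, 3), nn.ReLU(), nn.Dropout(p=0.33),  # second dropout layer
    nn.Linear(3, 2),
)

x = torch.tensor([[3.0, 5.0]])
net.train()    # training: units are zeroed at random (and the rest rescaled)
print(net(x))
net.eval()     # inference: dropout is a no-op, all units stay active
print(net(x))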


13. Autoencoder

Worksheet: forward the inputs X1..X4 through the encoder (linear + ReLU) into the bottleneck, then through the decoder (linear + ReLU) to get the outputs Y; compare against the targets Y′ with the reconstruction (MSE) loss and compute the gradients ∂L/∂Y (× 2).
Originally posted: 1.22.24
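
A minimal sketch: squeeze the features through a narrow bottleneck and train the decoder to reconstruct the input. Sizes are assumptions, not the worksheet's exact dimensions.

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2), nn.ReLU())
decoder = nn.Sequential(nn.Linear(2, 3), nn.ReLU(), nn.Linear(3, 4))

x = torch.tensor([[1., 2., 3., 1.]])         # illustrative input
y = decoder(encoder(x))                      # reconstruction
loss = nn.functional.mse_loss(y, x)          # reconstruction (MSE) loss vs the input
loss.backward()                              # gradients for both encoder and decoder
print(loss.item())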


14. Vector Database

Worksheet: embed the data sentences ("how are you", "who are you", "who am I") and the query ("am I you"). Look up each word in the word-embedding table, encode with a linear + ReLU layer, and mean-pool into one text embedding per sentence. Indexing: project the embeddings and store them. Retrieval: take dot products between the query and the stored vectors and return the nearest neighbor (argmax).
Originally posted: 2.1.24
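
A minimal sketch of embed → index → retrieve, skipping the encoder and projection steps. The 2-d word embeddings here are made up, not the worksheet's table:

import numpy as np

def embed(sentence, vocab):
    # mean-pool word embeddings into one text embedding
    return np.mean([vocab[w] for w in sentence.split()], axis=0)

vocab = {w: np.array(v) for w, v in {
    "how": [0., 1.], "are": [1., 0.], "you": [1., 1.],
    "who": [0., 2.], "am": [2., 0.], "I": [2., 1.]}.items()}  # illustrative embeddings

data = ["how are you", "who are you", "who am I"]
index = np.stack([embed(s, vocab) for s in data])   # vector storage

query = embed("am I you", vocab)
scores = index @ query                              # dot products against the index
print(data[int(np.argmax(scores))])                 # nearest neighbor (argmax)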


15. CLIP

Worksheet: take a mini-batch of text-image pairs (from 400 million more). Text branch: look up word embeddings (word2vec) for the captions ("big table", "mini chair", "top hat"), run the text encoder, mean-pool, round, and project to T1..T3. Image branch: flatten the patches, run the image encoder, mean-pool, round, and project to I1..I3 in the shared embedding space. Compute the pairwise similarities, apply softmax in both directions (image→text and text→image), and compute the cross-entropy loss gradients against the targets, which put the matched pairs on the diagonal.
Originally posted: 2.10.24
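
The symmetric contrastive objective in a few lines. A minimal sketch with random stand-in embeddings (real CLIP also learns a temperature on the similarities):

import torch
import torch.nn.functional as F

def clip_loss(I, T):
    # I, T: rows are image / text embeddings in the shared space
    I = F.normalize(I, dim=1)
    T = F.normalize(T, dim=1)
    sim = I @ T.t()                                  # pairwise similarity matrix
    labels = torch.arange(I.shape[0])                # matched pairs on the diagonal
    return (F.cross_entropy(sim, labels)             # image -> text
            + F.cross_entropy(sim.t(), labels)) / 2  # text -> image

I = torch.randn(3, 4, requires_grad=True)            # illustrative embeddings
T = torch.randn(3, 4)
loss = clip_loss(I, T)
loss.backward()
print(loss.item())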
16. Residual Network

Worksheet, part 1: forward X through a residual block by hand (weight layer + ReLU, weight layer, add the skip connection, ReLU).

Worksheet, part 2 (Transformer's encoder block): forward the input embedding through attention (Q, K) followed by Add & Norm, then through the feed-forward layer followed by Add & Norm, producing the features for the next block.
Originally posted: 2.15.24
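
The residual block in part 1 is one addition away from a plain two-layer network. A minimal sketch (weights illustrative):

import numpy as np

def residual_block(x, W1, W2):
    h = np.maximum(0, W1 @ x)      # weight layer + ReLU
    h = W2 @ h                     # second weight layer
    return np.maximum(0, h + x)    # add the skip connection, then ReLU

x = np.array([2., 0., 1.])         # illustrative input
W1 = np.array([[1., 0., 0.],
               [0., 1., -1.],
               [1., 0., 1.]])      # illustrative weights
W2 = np.eye(3)
print(residual_block(x, W1, W2))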
17. Graph Convolutional Network

Worksheet: given the graph with nodes A–E and its adjacency matrix, compute two rounds of message passing by hand (adjacency × features × weights, then ReLU), then forward the result through a small fully connected network (linear + ReLU layers) to the final output.
Originally posted: 2.17.24
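
One GCN layer: every node sums its neighbors' features (plus its own via a self-loop), then applies shared weights and ReLU. A minimal sketch (the graph and weights are illustrative, not the worksheet's):

import numpy as np

def gcn_layer(A_hat, H, W):
    # messages: aggregate neighbor features, then shared weights + ReLU
    return np.maximum(0, A_hat @ H @ W)

A = np.array([[0, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 0]], dtype=float)  # illustrative adjacency for A..E
A_hat = A + np.eye(5)                          # add self-loops
H = np.ones((5, 2))                            # illustrative node features
W = np.array([[1., -1.], [0., 1.]])            # illustrative layer weights
print(gcn_layer(A_hat, H, W))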


18. SORA’s Diffusion Transformer (DiT)

Worksheet: one training step at diffusion step t = 3. Encode the prompt ("sora is sky") with the text encoder. Split the training video into spacetime patches (pixels) and run the visual encoder to get the latent; add the sampled noise to form the noised latent. Run the DiT block (self-attention, adaptive layer norm conditioned on the prompt and the step, pointwise FFN) to get the predicted noise, and compute the MSE loss gradients against the sampled noise. Subtracting the predicted noise yields the noise-free latent, which the visual decoder turns into the generated video.
Originally posted: 2.19.24
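
A heavily simplified sketch of the training objective only: the model sees the noised latent and must predict the noise that was added. The tiny MLP stands in for the DiT's attention + adaptive-layer-norm + FFN blocks, and conditioning on the prompt and step is omitted.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 4))  # stand-in for DiT

latent = torch.randn(1, 4)                         # visual encoder's latent (illustrative)
noise = torch.randn(1, 4)                          # sampled noise
noised = latent + noise                            # noised latent at some step t

pred_noise = model(noised)                         # predicted noise
loss = nn.functional.mse_loss(pred_noise, noise)   # MSE against the sampled noise
loss.backward()
print(loss.item())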


19. Switch Transformer
(Gemini 1.5’s Sparse Mixture of Experts)

Worksheet: as in exercise 8, apply the attention weight matrix (A) to the features X1..X5 from the previous block to get the attention-weighted features Z1..Z5. Then the switch computes gate values for experts A, B, C; an argmax picks one expert ID per token, and only that expert's position-wise feed-forward network (× 3 experts) processes the token before the next block.
Originally posted: 2.24.24
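
The switch layer is top-1 routing: one expert per token, chosen by argmax over the gate values. A minimal sketch (the switch weights follow the worksheet's rows for A, B, C; the token features and experts are stand-ins):

import numpy as np

def switch_ffn(X, W_gate, experts):
    Y = np.zeros_like(X)
    gate = W_gate @ X                           # gate values for experts A, B, C
    ids = gate.argmax(axis=0)                   # argmax -> one expert ID per token
    for t, k in enumerate(ids):
        Y[:, t] = np.maximum(0, experts[k] @ X[:, t])  # only the chosen expert runs
    return Y, ids

X = np.array([[1., 2., 3., 4., 5.],
              [0., 1., 0., 1., 0.],
              [1., 0., 1., 0., 1.]])            # illustrative token features
W_gate = np.array([[1., -1., 0.],
                   [1., 0., -2.],
                   [-1., 1., 1.]])              # switch weights, as on the worksheet
experts = [np.eye(3), 2 * np.eye(3), -np.eye(3)]  # illustrative experts A, B, C
Y, ids = switch_ffn(X, W_gate, experts)
print(ids, Y, sep="\n")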
20. Reinforcement Learning from Human Feedback (RLHF)

Worksheet: the LLM completes the prompt "[S] CEO is" by sampling (max) over its output distribution, using the word-embedding table (him, her, them, is/are, doc, CEO, [S]). Human preferences label pairs of completions as winner or loser (e.g., "doc is him" vs. "doc is them"). Each completion is mean-pooled and scored by the reward model (RM, linear + ReLU layers) to produce a scalar reward. To train the RM: take winner − loser, pass the difference through σ, and compute the loss gradient. To align the LLM: compute the loss gradient from the predicted reward minus the target.
Originally posted: 3.4.24
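
The reward-model objective from the worksheet in one line: push the winner's reward above the loser's through a sigmoid, i.e. loss = −log σ(r_winner − r_loser). A minimal sketch with illustrative scalar rewards:

import torch
import torch.nn.functional as F

def rm_loss(r_winner, r_loser):
    # pairwise reward-model loss: -log sigmoid(r_winner - r_loser)
    return -F.logsigmoid(r_winner - r_loser)

r_w = torch.tensor(3.0, requires_grad=True)   # illustrative reward for the winner
r_l = torch.tensor(1.0, requires_grad=True)   # illustrative reward for the loser
loss = rm_loss(r_w, r_l)
loss.backward()                               # gradients flow back through sigma
print(loss.item(), r_w.grad, r_l.grad)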
