Introduction
Language Modeling
Mathematics of Transformers
Unbundling the mathematics behind AI
Shyam Mohan
Mohan, Shyam Transformers
Table of Contents
1 Introduction
2 Language Modeling
What is this course about
We all know about the impact of generative AI in the last few
years.
LLM
Text to image
Text to video
Math of generative AI
What math lies behind the success of these models?
Math of transformers
Math of image generation
Math of video generation
Prerequisites
Everything can be learned from scratch, but knowledge of the
following can help:
Probability
Calculus
Optimization
Statistics
Breadth vs Depth
A challenge is the choice of topics to cover. Math of AI can
encompass:
Probability theory
Matrix Calculus and Linear Algebra
Optimization
Statistics
Econometrics and time-series methods
Our top-down approach: start with topical AI models and study
their math in detail.
ChatGPT-like models
Let us recall what models like ChatGPT do.
They take some input (text including questions,
suggestions, etc.) and predict the next output word.
After this, they append this prediction to the input and
generate the next output word, and so on.
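The predict-append-repeat loop above can be sketched in a few lines. This is a minimal illustration, not how ChatGPT is implemented: the lookup table NEXT_WORD and the functions predict_next_word and generate are hypothetical stand-ins for a trained model.

```python
# Toy next-word table: a stand-in (assumption) for a trained language model.
NEXT_WORD = {
    "the": "cat",
    "cat": "sat",
    "sat": "on",
    "on": "the",
}

def predict_next_word(words):
    """Hypothetical predictor: looks only at the last word of the input."""
    return NEXT_WORD.get(words[-1], "<end>")

def generate(prompt, max_new_words):
    """Predict one word, append it to the input, and repeat."""
    words = prompt.split()
    for _ in range(max_new_words):
        next_word = predict_next_word(words)
        if next_word == "<end>":
            break
        words.append(next_word)  # the prediction becomes part of the input
    return " ".join(words)

print(generate("the cat", 3))  # prints "the cat sat on the"
```

A real model conditions on the whole input rather than only the last word, but the outer loop has the same shape.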
What does the output entail
To find the best next word, we need a probability distribution over
the entire vocabulary!
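One standard way to obtain such a distribution is to give every vocabulary word a real-valued score and normalize the scores with a softmax. The sketch below assumes this choice; the four-word vocabulary and the scores are made up for illustration.

```python
import math

def softmax(scores):
    """Turn arbitrary real scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-word vocabulary and made-up scores for the next word.
vocab = ["cat", "dog", "sat", "mat"]
scores = [1.0, 0.5, 3.0, 2.0]

probs = softmax(scores)
best_word = vocab[probs.index(max(probs))]  # word with the highest probability

print(dict(zip(vocab, probs)))
print(best_word)  # prints "sat"
```

Picking the highest-probability word is only one option; the distribution can also be sampled from.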
What does the input entail
Every word (or group of words) must be mapped to a vector of
numbers.
Similar words must be represented by vectors that are close
to each other.
Transformer: a black box that converts the input to the output
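To make "similar words get nearby vectors" concrete, here is a small sketch that measures closeness with cosine similarity. Both the choice of cosine similarity and the 3-dimensional vectors are assumptions for illustration; real embeddings are learned and have many more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: near 1 means similar direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up word vectors (assumption), chosen so that "cat" and "dog" are close.
embedding = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

sim_cat_dog = cosine_similarity(embedding["cat"], embedding["dog"])
sim_cat_car = cosine_similarity(embedding["cat"], embedding["car"])

print(sim_cat_dog, sim_cat_car)  # the first value is the larger one
```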
Idea slide
Goal of "attention"
Q, K, V matrices
Goal
Query:
Similar words must be represented by vectors that are close
to each other.
The vectors must capture the relationships between
the words.
Q, K and V matrices
Query:
Similar words must be represented by vectors that are close
to each other.
The vectors must capture the relationships between
the words.