DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute
these slides for commercial purposes. You may make copies of these slides and use or distribute them for
educational purposes as long as you cite DeepLearning.AI as the source of the slides.
● One-hot vectors
● Word embeddings
Integers

Word      Number
a         1
able      2
about     3
...       ...
hand      615
...       ...
happy     621
...       ...
zebra     1000
Integers
+ Simple

Word
Embeddings
deeplearning.ai

Meaning as vectors
[Figure: word vectors placed on a one-dimensional scale from negative (-2, -1) to positive (1, 2); words such as "excited" sit on the positive side]
● Words as vectors
○ One-hot vectors
○ Word embedding vectors
Word embeddings
Word
Embedding
Methods
deeplearning.ai
Basic word embedding methods
● word2vec (Google, 2013)
○ Continuous bag-of-words (CBOW)
○ Continuous skip-gram / Skip-gram with negative sampling (SGNS)
● ELMo (Allen Institute for AI, 2018): tunable pre-trained models available
Embedding method

[Figure: Corpus → Transformation → CBOW fills in the blank in "I think [???] I am" with "therefore"]

Word embeddings

[Figure: Corpus → Transformation → CBOW produces embeddings in which related words (dog, puppy, hound, terrier, ...) end up close together; the model is trained to predict the center word from its context]
CBOW in a nutshell
Cleaning and tokenization matters
● Letter case: “The” == “the” == “THE” → lowercase / upper case
● Punctuation: , ! . ? → .   “ ‘ « » ’ ” → ∅   … !! ??? → .
● Special characters: ∇ $ € § ¶ ** → ∅
Example in Python: code

import re
import nltk
from nltk.tokenize import word_tokenize
import emoji

corpus = 'Who ❤ "word embeddings" in 2020? I do!!!'
data = re.sub(r'[,!?;-]+', '.', corpus)  # collapse punctuation runs into '.'
data = word_tokenize(data)               # split the string into tokens

→ ['Who', '❤', '``', 'word', 'embeddings', "''", 'in', '2020', '.', 'I',
   'do', '.']
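The token list still contains capitalized words, the number, and NLTK's quote artifacts (`` and ''). A follow-up cleaning pass keeps only alphabetic tokens, periods, and emoji, all lowercased; a minimal sketch, assuming version 2.x of the emoji package (which provides emoji.is_emoji):

data = [ch.lower() for ch in data
        if ch.isalpha()           # keep words, drop '2020', '``', "''"
        or ch == '.'              # keep sentence boundaries
        or emoji.is_emoji(ch)]    # keep '❤' (emoji 2.x API, an assumption)

→ ['who', '❤', 'word', 'embeddings', 'in', '.', 'i', 'do', '.']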
Sliding window of words in Python

for x, y in get_windows(
        ['i', 'am', 'happy', 'because', 'i', 'am', 'learning'],
        2
):
    print(f'{x}\t{y}')
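get_windows is not defined on the slide; a minimal implementation consistent with the call above yields (context words, center word) pairs for a half-window of size C. A sketch (the body is an assumption; only the name and call signature come from the slide):

def get_windows(words, C):
    # walk the positions that have C words on each side
    i = C
    while i < len(words) - C:
        center_word = words[i]
        # C words before the center plus C words after it
        context_words = words[(i - C):i] + words[(i + 1):(i + C + 1)]
        yield context_words, center_word
        i += 1

With the tokens above, the first printed pair is ['i', 'am', 'because', 'i'] followed by happy.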
Context words: "I am because I" → average the four one-hot vectors

word       I   am  because  I    mean
am         0   1   0        0    0.25
because    0   0   1        0    0.25
happy      0   0   0        0    0
I          1   0   0        1    0.5
learning   0   0   0        0    0

Final prepared training set

Context words    Context words vector      Center word   Center word vector
I am because I   [0.25, 0.25, 0, 0.5, 0]   happy         [0, 0, 1, 0, 0]
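These vectors are easy to build with NumPy; a sketch, where word_to_index and both helper names are mine, not from the slides:

import numpy as np

def word_to_one_hot_vector(word, word_to_index, V):
    # one-hot vector with a 1 at the word's vocabulary index
    vec = np.zeros(V)
    vec[word_to_index[word]] = 1
    return vec

def context_words_to_vector(context_words, word_to_index, V):
    # mean of the context words' one-hot vectors
    return np.mean([word_to_one_hot_vector(w, word_to_index, V)
                    for w in context_words], axis=0)

word_to_index = {'am': 0, 'because': 1, 'happy': 2, 'i': 3, 'learning': 4}
x = context_words_to_vector(['i', 'am', 'because', 'i'], word_to_index, 5)
# x → [0.25, 0.25, 0, 0.5, 0]
y = word_to_one_hot_vector('happy', word_to_index, 5)
# y → [0, 0, 1, 0, 0]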
Architecture of the CBOW model:

Column vectors:
z1 = W1 x + b1    W1: NxV,  x: Vx1,  b1: Nx1  →  z1: Nx1
h = ReLU(z1)
z2 = W2 h + b2    W2: VxN,  h: Nx1,  b2: Vx1  →  z2: Vx1
ŷ = softmax(z2)   ŷ: Vx1

Row vectors:
z1 = x W1ᵀ + b1   x: 1xV,  W1: NxV,  b1: 1xN  →  z1: 1xN
Architecture of the CBOW Model: Dimensions
deeplearning.ai

Dimensions (batch input of m examples):
b1 → B1 = [b1 … b1]   (b1 duplicated across the m columns by broadcasting)
X: Vxm  →  H: Nxm  →  Ŷ: Vxm
Architecture
of the CBOW
Model
deeplearning.ai
Activation Functions
Rectified Linear Unit (ReLU)
ReLU(x) = max(0, x)

Input layer → Hidden layer:
z1 = W1 x + b1
h = ReLU(z1)

Example, applied elementwise:
z1 = [5.1, -0.3, -4.6, 0.2]  →  h = ReLU(z1) = [5.1, 0, 0, 0.2]
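In NumPy this activation is a one-liner; a minimal sketch reproducing the example above:

import numpy as np

def relu(z):
    # elementwise max(0, z)
    return np.maximum(0, z)

relu(np.array([5.1, -0.3, -4.6, 0.2]))  # → array([5.1, 0. , 0. , 0.2])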
Softmax
softmax: ℝV → [0, 1]V,  𝝨 ŷi = 1  (~probabilities)

Hidden layer → Output layer:
z = W2 h + b2
ŷ = softmax(z)

Each entry ŷi is the predicted probability that vocabulary word i (a, …, happy, …, zebra) is the center word.
Softmax: example

word       z    exp(z)      ŷ = softmax(z)
am         9    8103        0.083
because    8    2981        0.03
happy      11   59874       0.612  ← predicted center word
I          10   22026       0.225
learning   8.5  4915        0.05
                𝝨 = 97899   𝝨 = 1
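A NumPy sketch that reproduces these numbers (the function name is mine):

import numpy as np

def softmax(z):
    # ŷ_i = exp(z_i) / Σ_j exp(z_j)
    e = np.exp(z)
    return e / e.sum()

z = np.array([9, 8, 11, 10, 8.5])
softmax(z)  # → array([0.083, 0.030, 0.612, 0.225, 0.050]) (rounded)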
Training a
CBOW Model
deeplearning.ai
Cost Function
Loss
[Figure: x (context words vector) → CBOW → ŷ (predicted center word vector), compared against y (actual center word vector)]
Cross-entropy loss: J = -𝝨 y ⊙ log(ŷ)

word       y   ŷ      log(ŷ)   y ⊙ log(ŷ)
am         0   0.083  -2.49    0
because    0   0.03   -3.49    0
happy      1   0.611  -0.49    -0.49
I          0   0.225  -1.49    0
learning   0   0.05   -2.49    0
                       -𝝨  →  J = 0.49
Cross-entropy loss
word       y   ŷ     log(ŷ)   y ⊙ log(ŷ)
am         0   0.96  -0.04    0
because    0   0.01  -4.61    0
happy      1   0.01  -4.61    -4.61
I          0   0.01  -4.61    0
learning   0   0.01  -4.61    0
                      -𝝨  →  J = 4.61

J = 4.61  >  J(correct) = 0.49
Cross-entropy loss
J = -log ŷ_{actual word}

word       y   ŷ
am         0   0.96
because    0   0.01
happy      1   0.01   →  J = 4.61
I          0   0.01
learning   0   0.01

Incorrect predictions: penalty (large J). Correct predictions: reward (small J).
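A NumPy sketch of this loss (names are mine):

import numpy as np

def cross_entropy_loss(y_hat, y):
    # J = -Σ y ⊙ log(ŷ); with one-hot y this reduces to -log ŷ_{actual word}
    return -np.sum(y * np.log(y_hat))

y     = np.array([0, 0, 1, 0, 0])                 # actual center word: happy
y_hat = np.array([0.96, 0.01, 0.01, 0.01, 0.01])  # poor prediction
cross_entropy_loss(y_hat, y)  # → 4.61 (rounded)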
Training a
CBOW Model
deeplearning.ai
Forward Propagation
Training process
● Forward propagation
● Cost

Forward propagation (batch): X → H → Ŷ
Z1 = W1 X + B1,  H = ReLU(Z1),  Z2 = W2 H + B2,  Ŷ = softmax(Z2)

Cost: mean of the losses over the batch, comparing the predicted center word matrix Ŷ against the actual center word matrix Y.
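A compact NumPy sketch of this batch forward pass, with the shapes from the dimensions slide (the toy sizes and random initialization are assumptions):

import numpy as np

V, N, m = 5, 3, 4                     # vocabulary, embedding, batch sizes (toy values)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(N, V)), np.zeros((N, 1))
W2, b2 = rng.normal(size=(V, N)), np.zeros((V, 1))

X = rng.random((V, m))                # columns are context-words vectors
Z1 = W1 @ X + b1                      # b1 broadcasts into B1 across the m columns
H = np.maximum(0, Z1)                 # ReLU; H is N x m
Z2 = W2 @ H + b2
Y_hat = np.exp(Z2) / np.exp(Z2).sum(axis=0)   # column-wise softmax; Ŷ is V x m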
Extracting word embedding vectors: option 1
[Figure: input layer → W1, b1, ReLU → W2, b2, softmax → output layer]

W1 = [w(1) … w(V)]   (NxV)
Each of the V columns of W1 is the N-dimensional embedding vector of one vocabulary word (am, because, happy, I, learning).
Extracting word embedding vectors: option 2
[Figure: input layer → W1, b1, ReLU → W2, b2, softmax → output layer]

W2 = [w(1); …; w(V)]   (VxN)
Each of the V rows of W2 is the N-dimensional embedding vector of one vocabulary word (am, because, happy, I, learning).
Extracting word embedding vectors: option 3

W1 = [w1(1) … w1(V)],  W2ᵀ = [w2(1) … w2(V)]
W3 = 0.5 (W1 + W2ᵀ) = [w3(1) … w3(V)]   (NxV)
Each of the V columns of W3 is the N-dimensional embedding vector of one vocabulary word (am, because, happy, I, learning).
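The three options in NumPy; a sketch with toy weights (the matrices and index choice are assumptions):

import numpy as np

N, V = 3, 5
rng = np.random.default_rng(0)
W1 = rng.normal(size=(N, V))        # input weights, NxV
W2 = rng.normal(size=(V, N))        # output weights, VxN
i = 2                               # index of 'happy' in (am, because, happy, i, learning)

emb_option1 = W1[:, i]              # option 1: column i of W1
emb_option2 = W2[i, :]              # option 2: row i of W2
W3 = 0.5 * (W1 + W2.T)              # option 3: average the two
emb_option3 = W3[:, i]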
Evaluating
Word
Embeddings
deeplearning.ai
Intrinsic Evaluation
Intrinsic evaluation
Test relationships between words
● Analogies
Semantic analogies
“France” is to “Paris” as “Italy” is to <?>
Syntactic analogies
“seen” is to “saw” as “been” is to <?>
⚡ Ambiguity
“wolf” is to “pack” as “bee” is to <?> → swarm? colony?
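Analogy questions are usually answered with vector arithmetic: pick the vocabulary word whose embedding is most similar (e.g., by cosine similarity) to v("Paris") - v("France") + v("Italy"). A sketch with toy 2-d embeddings (all values illustrative):

import numpy as np

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

emb = {'France': np.array([1.0, 0.1]), 'Paris': np.array([1.0, 0.9]),
       'Italy':  np.array([0.8, 0.1]), 'Rome':  np.array([0.8, 0.9])}
target = emb['Paris'] - emb['France'] + emb['Italy']
answer = max((w for w in emb if w not in ('Paris', 'France', 'Italy')),
             key=lambda w: cosine_similarity(emb[w], target))
# answer → 'Rome'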
Intrinsic evaluation
Test relationships between words
● Analogies
● Clustering
● Visualization

[Figure: clusters of related words: village/city/town, gas/oil/petroleum, happy/sad/joyful]
Evaluating
Word
Embeddings
deeplearning.ai
Extrinsic Evaluation
● Test embeddings on an external task, e.g., named entity recognition
- Time-consuming
● Word representations
● Evaluation
Going further
● Advanced language modelling and word embeddings