Bardia Rafieian
Bardia.rafieian@upc.edu
PhD Candidate
Universitat Politècnica de Catalunya
Technical University of Catalonia
Outline
Last session recap
Last session recap: Bag of words
Word Vectors
One-hot
● If you have 10,000 words in your vocabulary, then you can
represent each word as a 1x10,000 vector
● Example: 4 words in our vocabulary →
Mango [1, 0, 0, 0]
Strawberry [0, 1, 0, 0]
City [0, 0, 1, 0]
Barcelona [0, 0, 0, 1]
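A minimal sketch of how such one-hot vectors could be built (the vocabulary and helper names below are illustrative, not from the slides):

```python
import numpy as np

vocab = ["Mango", "Strawberry", "City", "Barcelona"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return a 1 x |vocab| vector with a single 1 at the word's index."""
    vector = np.zeros(len(vocab))
    vector[word_to_index[word]] = 1.0
    return vector

print(one_hot("Mango"))      # [1. 0. 0. 0.]
print(one_hot("Barcelona"))  # [0. 0. 0. 1.]
```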
Some useful definitions
Term Frequency (TF) and Inverse Document Frequency (IDF)
IDF = log[(Number of documents) / (Number of documents containing the word)]
TF-IDF = TF × IDF
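A small sketch of the formulas above in Python (the toy document collection and function names are assumptions for illustration):

```python
import math
from collections import Counter

documents = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]

def tf(word, doc):
    # Term frequency: how often the word appears in this document.
    return Counter(doc)[word] / len(doc)

def idf(word, docs):
    # Inverse document frequency: log(N / number of documents containing the word).
    containing = sum(1 for d in docs if word in d)
    return math.log(len(docs) / containing)

def tf_idf(word, doc, docs):
    return tf(word, doc) * idf(word, docs)

print(tf_idf("cat", documents[0], documents))  # ~0.183
```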
Word Vectors
Jointly Trained Embeddings
● The embedding table works as the input layer of the network.
Jointly Trained Embeddings
● Our Embedding table is the first layer of
our model.
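A minimal sketch, assuming PyTorch, of an embedding table used as the first layer of a model; the layer sizes and the averaging classifier head are illustrative choices, not the slides' architecture:

```python
import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    def __init__(self, vocab_size=10_000, embedding_dim=100, num_classes=2):
        super().__init__()
        # The embedding table: trained jointly with the rest of the network.
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.fc = nn.Linear(embedding_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices
        embedded = self.embedding(token_ids)   # (batch, seq_len, embedding_dim)
        averaged = embedded.mean(dim=1)        # simple average over the sequence
        return self.fc(averaged)

model = TextClassifier()
logits = model(torch.tensor([[1, 5, 42, 7]]))  # one example sequence of word ids
```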
Word Vectors
Word2Vec
Skip-gram and Continuous Bag of Words Model (CBOW)
● The distributed representation of
the input word is used to predict the
context. (Skip-gram)
Figure from: (Mikolov et al., 2013)
Word2Vec: Skip-Gram
Example
Given the sentence: “The future king is the prince”
Example
● The final sizes of these matrices will be n x m, where
n - number of created data points
m - number of unique words
You would expect synonyms like “intelligent” and “smart” to have very similar contexts, and related words like “engine” and “transmission” to have similar contexts as well.
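A sketch of how those data points could be generated from the example sentence, assuming a window size of 2; each (input, context) pair would then be one-hot encoded, giving the n x m matrices described above:

```python
sentence = "the future king is the prince".split()
window = 2

pairs = []
for i, center in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((center, sentence[j]))  # (input word, context word)

print(len(pairs))   # number of created data points (n)
print(pairs[:4])
# [('the', 'future'), ('the', 'king'), ('future', 'the'), ('future', 'king')]
```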
Word2Vec: CBOW
● Predicts the current word from the other words in the window.
● The prediction is based on the average representation of the window’s words.
● As with Skip-Gram, no notion of word order within the window is used.
● Better for common words in the corpus.
Example
Sentence:
‘It is a pleasant day’
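A sketch of the CBOW training examples this sentence would yield, assuming a window size of 2; the context words are the ones whose representations get averaged to predict the target:

```python
sentence = "it is a pleasant day".split()
window = 2

examples = []
for i, target in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    examples.append((context, target))  # (context words to average, word to predict)

for context, target in examples:
    print(context, "->", target)
# ['is', 'a'] -> it
# ['it', 'a', 'pleasant'] -> is
# ...
```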
● Skip-gram: works well with small amounts of training data and represents even rare words or phrases well.
● CBOW: several times faster to train than skip-gram, with slightly better accuracy for frequent words.
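A usage sketch with gensim's Word2Vec, where the sg parameter switches between the two architectures (the toy corpus and hyperparameters are illustrative assumptions):

```python
from gensim.models import Word2Vec

sentences = [
    "the future king is the prince".split(),
    "it is a pleasant day".split(),
]

skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1: skip-gram
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)      # sg=0: CBOW

print(skipgram.wv["king"].shape)  # (50,)
```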
GloVe
Co-occurrence Matrix:
The cat sat on mat
Figure from: (Towards Data Science)
GloVe
Let’s fill a co-occurrence matrix:
Image source: Data Analytics with Hadoop: An Introduction for Data Scientists
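A sketch of filling such a co-occurrence matrix for the example sentence, assuming a symmetric window of size 1:

```python
from collections import defaultdict

sentence = "the cat sat on mat".split()
window = 1

cooc = defaultdict(int)
for i, word in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            cooc[(word, sentence[j])] += 1   # count co-occurrences inside the window

print(cooc[("the", "cat")])  # 1
print(cooc[("sat", "on")])   # 1
```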
GloVe
● Co-occurrence probabilities for ice and steam with probe words
● The ratio is better able to distinguish relevant words (solid and gas) from
irrelevant words (water and fashion)
● It is also better able to discriminate between the two relevant words.
GloVe
Embedding: w_i + w̃_i (the final vector for word i is the sum of its word vector and its context vector)
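For reference, a small sketch of the GloVe weighted least-squares objective for a single word pair (Pennington et al., 2014); the vectors, biases and co-occurrence count here are toy values, and x_max and alpha follow the paper's defaults:

```python
import numpy as np

def weight(x, x_max=100, alpha=0.75):
    # Down-weights rare co-occurrences and caps the weight of very frequent ones.
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_pair_loss(w_i, w_j_tilde, b_i, b_j_tilde, x_ij):
    # f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2
    return weight(x_ij) * (np.dot(w_i, w_j_tilde) + b_i + b_j_tilde - np.log(x_ij)) ** 2

w_i = np.random.rand(50)
w_j = np.random.rand(50)
print(glove_pair_loss(w_i, w_j, 0.0, 0.0, x_ij=3.0))
```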
Cross-lingual Word Embeddings
Contextual Embeddings
ELMo (Embeddings from Language Models)
● Language modelling: the task of predicting the next token of a sequence, or of estimating the probability of a given sequence occurring in the language.
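A toy sketch of the language-modelling task, estimating the probability of the next token from bigram counts (the corpus is a made-up placeholder; ELMo itself uses a bidirectional LSTM language model):

```python
from collections import Counter, defaultdict

corpus = "the future king is the prince . it is a pleasant day .".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_token_prob(prev, nxt):
    # P(next token | previous token), estimated from counts.
    counts = bigrams[prev]
    return counts[nxt] / sum(counts.values()) if counts else 0.0

print(next_token_prob("is", "the"))  # 0.5 ("is" is followed by "the" and "a")
```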
Document Embeddings
Doc2vec
● Represents a document as a vector; it is a generalization of the Word2vec method.
● The inspiration: word vectors are asked to contribute to predicting the next word.
● The paragraph vector is also asked to contribute to the same prediction task, given many contexts sampled from the paragraph.
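A usage sketch of training paragraph vectors with gensim's Doc2Vec (the two toy documents, tags and hyperparameters are illustrative assumptions):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    TaggedDocument(words="the future king is the prince".split(), tags=["doc_0"]),
    TaggedDocument(words="it is a pleasant day".split(), tags=["doc_1"]),
]

model = Doc2Vec(docs, vector_size=50, window=2, min_count=1, epochs=20)

# The learned paragraph vector for a training document...
print(model.dv["doc_0"].shape)                        # (50,)
# ...and an inferred vector for an unseen document.
print(model.infer_vector("a pleasant prince".split()).shape)
```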
Text Classification
Classification Tasks
NLP Tasks
Recurrent Neural Networks (RNN)
RNNs keep a context of the previous steps in the sequence, which conditions the output at each step.
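A minimal sketch of a vanilla RNN step in NumPy, showing how the hidden state carries context from previous steps (all sizes and weights are illustrative):

```python
import numpy as np

hidden_size, input_size = 8, 4
W_x = np.random.randn(hidden_size, input_size) * 0.1   # input-to-hidden weights
W_h = np.random.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden weights
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    # h_t = tanh(W_x x_t + W_h h_{t-1} + b)
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

h = np.zeros(hidden_size)
for x_t in np.random.randn(5, input_size):  # a sequence of 5 input vectors
    h = rnn_step(x_t, h)                    # the context accumulates in h
print(h.shape)                              # (8,)
```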
Articles:
● Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. P. (2011). Natural Language Processing (almost) from Scratch. Journal of Machine Learning Research, 12, 2493-2537.
● Goldberg, Y. (2016). A Primer on Neural Network Models for Natural Language Processing. Journal of Artificial Intelligence Research, 57, 345-420.
● Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781.
● Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532-1543).
● Le, Q., & Mikolov, T. (2014). Distributed Representations of Sentences and Documents. In Proceedings of the 31st International Conference on Machine Learning (ICML'14) (pp. II-1188-II-1196).
Questions?