
Computational Knowledge Analysis –

Natural Language Processing with Python


Session 8
Deep Learning for NLP
03.07.2023
Dr. Maria Becker
Summer Term 2023

1
Our Plan for this Session

I. Evaluation
II. Introduction to Deep Learning for NLP

2
Evaluation
• 15 minutes for the online in-presence evaluation
• Link: https://evaluation.tu-darmstadt.de/evasys/online.php?pswd=QUTN6 (also on Moodle)
• Or:
• Short URL: https://t1p.de/LVE
• Key: QUTN6

3
4
What is Deep Learning?
• A class of machine learning algorithms
• Most deep learning models are based on artificial neural networks
→ often the terms “deep learning models” and “neural models” are used synonymously
• Deep learning models are widely used in many fields, including image recognition, computer vision, speech recognition, natural language processing, machine translation, bioinformatics, climate science…

5
What are Neural Networks?
• A set of algorithms designed to recognize patterns in data, modeled after the human brain
• Neural networks can cluster and classify: they predict labels based on similarities among the example inputs (labelled training data)

6
A Short Introduction to Neural Networks

https://www.youtube.com/watch?v=aircAruvnKk

0:00-08:39
7
Layers and Nodes
• Neural networks consist of an input layer, one or more hidden layers, and an output layer
• The layers consist of nodes (or artificial neurons); each node connects to one or more others and has an associated weight and threshold
• If the output of any individual node is
above the specified threshold value,
that node is activated, sending data to
the next layer of the network.
Otherwise, no data is passed along to
the next layer of the network.
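A minimal sketch in Python of what happens inside a single node; all inputs, weights, and the threshold are made-up illustrative values:

inputs = [0.5, 0.3, 0.9]        # activations arriving from the previous layer
weights = [0.8, -0.2, 0.4]      # one weight per incoming connection
threshold = 0.5

weighted_sum = sum(x * w for x, w in zip(inputs, weights))   # ≈ 0.7
output = weighted_sum if weighted_sum > threshold else 0.0   # fire or stay silent
print(output)  # 0.7 > 0.5, so the node activates and sends data on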
8
Layer Types
• Input layer
• input data are vectors (into which all real-world data, be it images, sound, text or time series, must
be translated)
• in NLP: e.g. with word vector methods such as word2vec
• Example: The word vectors of the sentence “I liked the movie a lot.”

• Hidden layers
• Input vectors are then transformed through the hidden layers using complex mathematical matrix operations → often the hidden part of neural networks is described as a black box

• Output layer
• Prediction of the probabilities for each option
• E.g. for the task “Is the sentence positive or negative?”
→ 0.7 pos, 0.3 neg
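A minimal sketch of these three layer types in Python/NumPy; the word vectors and weights are random placeholders rather than trained values:

import numpy as np

rng = np.random.default_rng(0)

# Input layer: the 7 tokens of "I liked the movie a lot ." as 50-dim word vectors
word_vectors = rng.normal(size=(7, 50))
x = word_vectors.mean(axis=0)          # one sentence vector as network input

# Hidden layer: a matrix operation plus a nonlinearity transforms the input
W_hidden = rng.normal(size=(50, 16))
h = np.tanh(x @ W_hidden)

# Output layer: one score per option, turned into probabilities via softmax
W_out = rng.normal(size=(16, 2))
scores = h @ W_out
probs = np.exp(scores) / np.exp(scores).sum()
print(dict(zip(["pos", "neg"], probs.round(2))))   # probabilities for the two options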

9
Training Neural Networks
• Neural networks rely on training data to learn and improve their accuracy over time
• Hidden layers are trained on labelled training data (e.g. sentences annotated with sentiment
labels), so they “know” how to transform the input (that is, to adjust the weights and
thresholds in order to get correct predictions of labels)
• Neural networks are usually initialized with random weights and thresholds; they then make their first predictions, which are compared to the gold label (target)
• If the prediction was wrong: Adjustment of the weights and thresholds
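A minimal sketch of this training cycle, here using PyTorch (one possible library; data and labels are random placeholders):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(50, 16), nn.Tanh(), nn.Linear(16, 2))  # random initial weights
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

X = torch.randn(100, 50)            # 100 sentence vectors (dummy training data)
y = torch.randint(0, 2, (100,))     # gold labels: 0 = neg, 1 = pos (dummy)

for epoch in range(10):
    predictions = model(X)          # predict with the current weights
    loss = loss_fn(predictions, y)  # compare predictions to the gold labels
    optimizer.zero_grad()
    loss.backward()                 # compute how each weight should change
    optimizer.step()                # adjust the weights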

10
Why "Deep" Learning?
• Deep Learning architectures use multiple layers to progressively extract
higher-level features from raw input
→ The word "deep" refers to the number of layers through which the data is transformed
• Each layer learns to transform its input data into a slightly more abstract and
composite representation
• In an image recognition application, the raw input may be a matrix of pixels; the
first representational layer may abstract the pixels and encode edges; the second
layer may compose and encode arrangements of edges; the third layer may
encode a nose and eyes; and the fourth layer may recognize that the image
contains a face
• A deep learning process can learn on its own which features belong in which layer (by learning from labelled examples/training data)

11
Different types of neural networks

• There are many different neural architectures for different applications and purposes, such as text classification, sequence labelling, image recognition, image generation, etc.
• Design decisions: size of the input
vectors, number of layers, number of
neurons, loss function, initializing
weights and thresholds…
• The best settings are usually figured out empirically, through preliminary experiments with different configurations!
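A sketch of such a preliminary experiment in Python; train_and_evaluate() is a hypothetical stand-in for actually training and validating a model with the given configuration:

import itertools
import random

def train_and_evaluate(neurons, layers, learning_rate):
    # Hypothetical stand-in: a real version would train a network with these
    # settings and return its accuracy on held-out validation data.
    return random.random()  # dummy score so the sketch runs

configs = itertools.product([16, 64, 256],   # number of neurons per layer
                            [1, 2, 3],       # number of hidden layers
                            [0.1, 0.01])     # learning rate
best = max(configs, key=lambda cfg: train_and_evaluate(*cfg))
print("best (neurons, layers, learning rate):", best)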

12
Deep Learning in NLP:
From Feature Engineering to Neural Networks

• Paradigm shift from traditional task-specific feature engineering (rules) to neural network systems: instead of defining rules, labelled training data is used to train neural networks
• e.g. POS tagging (see the sketch after this list):
• Rule-based: defining rules such as: if a word is capitalized, it is a noun (in German)
• Neural approach: sentences labelled with POS tags (training data) → neural network → prediction of a POS tag for a given word

• Neural architectures have obtained high performance across many different NLP tasks and downstream applications
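A sketch of the two paradigms for the POS tagging example; the neural part assumes spaCy with its pretrained German model de_core_news_sm is installed, so it is left commented out:

# Rule-based: a hand-written rule such as "a capitalized word is a noun"
def rule_based_tag(word):
    return "NOUN" if word[0].isupper() else "OTHER"

print([(w, rule_based_tag(w)) for w in "Der Hund bellt laut".split()])
# Fails for "Der": capitalized, but an article rather than a noun

# Neural: a model trained on POS-labelled sentences predicts the tags instead
# import spacy
# nlp = spacy.load("de_core_news_sm")
# print([(t.text, t.pos_) for t in nlp("Der Hund bellt laut")])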

13
Deep Learning in NLP
https://aclanthology.org/

14
Studies and Applications

15
Types of Neural Networks for NLP Tasks:
RNNs and LSTMs
• Recurrent neural networks (RNNs) and long short-term memory (LSTM) networks are well suited for dealing with text data, as they learn from sequences of data
• Both architectures pass the hidden state from one step in the sequence to the next, combined with the current input
• LSTMs are an improvement over RNNs, as RNNs have a hard time retaining longer-term information → LSTMs are useful when the network needs to remember both recent data and data from longer ago
• RNNs and LSTMs allow information to flow across time,
to enable the comprehension of global semantics
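A minimal NumPy sketch of the recurrent idea: the hidden state from one step is combined with the next input (all weights and word vectors are random placeholders):

import numpy as np

rng = np.random.default_rng(0)
W_in = rng.normal(size=(50, 32))       # weights for the current input
W_rec = rng.normal(size=(32, 32))      # weights for the previous hidden state

hidden = np.zeros(32)                  # initial hidden state
for word_vector in rng.normal(size=(7, 50)):   # one 50-dim vector per word
    hidden = np.tanh(word_vector @ W_in + hidden @ W_rec)
# 'hidden' now summarizes the whole sequence and can feed an output layer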

16
Types of Neural Networks for NLP Tasks:
CNNs
• RNN and LSTM computation is a slow process, as later neurons wait for the information flow from the earlier neurons
• As a faster solution, Convolutional Neural Networks (CNNs) have been
explored for some NLP tasks which have a lower requirement for the
comprehension of global semantics
• The basic idea of CNNs is to apply multiple filters to the input data in parallel, where each filter extracts a certain feature
• When applied to NLP tasks, these filters can be trained to recognize
local patterns across time
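A minimal NumPy sketch of this idea: several filters, each spanning a 3-word window, slide over the word-vector sequence in parallel (all values are random placeholders):

import numpy as np

rng = np.random.default_rng(0)
sentence = rng.normal(size=(7, 50))    # 7 words, 50-dim vectors
filters = rng.normal(size=(4, 3, 50))  # 4 filters, each covering 3 words

features = np.array([
    [(sentence[i:i + 3] * f).sum() for i in range(len(sentence) - 2)]
    for f in filters
])                                     # shape (4 filters, 5 window positions)
pooled = features.max(axis=1)          # max pooling: strongest match per filter
print(pooled.round(2))                 # one feature value per filter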
17
From Recurrent Neural Networks to
Transformer Language Models:
Recent Trends in Deep Learning for NLP

https://www.youtube.com/watch?v=BGKumht1qLA
0:00-10:44
18
Transformer Language Models
• Language models can predict words, sequences or whole texts by learning the
features and characteristics of a language
• She wanted to go to a restaurant because she was... hungry
• She wanted to go to a restaurant because she was… hungry. She ordered pasta and wine.
• She wanted to go to a restaurant because she was… hungry. She ordered pasta and wine.
The food was very delicious. After she left, she called her friend and recommended the
restaurant to her. She asked if they might want to go there together on the weekend.
• Language models are pretrained on huge amounts of data (e.g. RoBERTa
model: trained on 160 GB of news, books, stories, and web texts)
• Pretraining and finetuning of language models:
• Unsupervised pre-training of a model on as much free text as possible
• Then the model is fine-tuned on specific tasks with a small labeled dataset
(supervised) → text classification, text generation, relation prediction…
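A sketch of querying a pretrained transformer language model for the example above, using the Hugging Face transformers library (assuming it is installed; the model is downloaded on first use):

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")   # pretrained RoBERTa
for prediction in fill_mask("She wanted to go to a restaurant because she was <mask>."):
    print(prediction["token_str"], round(prediction["score"], 3))
# Plausible completions include words like "hungry"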
19
Transformer Language Models
Success of transformer language models:
- While “traditional” neural networks (e.g.
RNNs) process inputs sequentially,
transformers process them as a block, all at
once
- A stack of layers captures semantic/syntactic features and attends to important words/sequences
- Enables heavy pretraining
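A minimal NumPy sketch of the attention computation behind this: every word attends to every other word in the block at once (queries, keys, and values are random placeholders):

import numpy as np

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 7, 64))   # queries, keys, values for 7 tokens

scores = Q @ K.T / np.sqrt(64)          # how strongly each word attends to each other word
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
attended = weights @ V                  # each word becomes a weighted mix of all words
print(attended.shape)                   # (7, 64)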

20
21
Models for Deep Learning in NLP

https://huggingface.co/

22
List of Interesting Links
• A Beginner's Guide to Neural Networks and Deep Learning: https://wiki.pathmind.com/neural-
network
• Deep Learning for NLP: An Overview of Recent Trends: https://medium.com/dair-ai/deep-learning-
for-nlp-an-overview-of-recent-trends-d0d8f40a776d
• Introduction to Deep Learning for Natural Language Processing: https://www.mlq.ai/deep-learning-
natural-language-processing/
• Introduction to NLP Deep Learning Theories: https://towardsdatascience.com/introduction-to-nlp-
deep-learning-theories-9d6801e3aa7d
• Deep learning for NLP (Online Book): https://livebook.manning.com/book/deep-learning-for-natural-
language-processing/chapter-1/v-11/
• Deep Learning Architectures for Sequence Processing: https://web.stanford.edu/~jurafsky/slp3/9.pdf
• Simple Deep Neural Networks for Text Classification:
https://www.youtube.com/watch?v=wNBaNhvL4pg
• Training Neural Networks: https://www.ibm.com/cloud/learn/neural-networks

23
Next session
• Next session will take place online:
https://audimax.heiconf.uni-heidelberg.de/7rh9-q3d2-xwdh-wuqe
• There you will get instructions for your term paper
• Please don't forget to submit the homework from last week's session on sentiment classification (deadline: Sunday, 09/07/2023)

24
