
Applications of NLP

Carlos Escolano
carlos.escolano@tsc.upc.edu

PostDoc
Universitat Politècnica de Catalunya
Technical University of Catalonia
Outline

● Machine Translation
● Text Summarization
● Question Answering
● Dialog

Machine Translation

The origins of Machine Translation

Source: https://www.youtube.com/watch?v=K-HfpsHPmvw&feature=youtu.be

Statistical Machine Translation
Given a source sentence x, we want to find the most probable sentence y in the
target language:

y* = argmax_y P(y | x)

Or, applying Bayes' rule:

y* = argmax_y P(x | y) · P(y)

Translation Model * Language Model

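As a toy illustration of this noisy-channel decomposition (all candidates and probabilities below are made up, not from a real system), each candidate translation y is scored by the product of its translation-model probability P(x|y) and its language-model probability P(y):

```python
# Made-up candidate translations of a source sentence x, with toy probabilities.
candidates = {
    # y: (P(x|y) from the translation model, P(y) from the language model)
    "the house is small": (0.30, 0.20),
    "the house is little": (0.35, 0.05),
    "small the house is": (0.30, 0.01),
}

# Noisy-channel decoding: pick argmax_y P(x|y) * P(y).
best = max(candidates, key=lambda y: candidates[y][0] * candidates[y][1])
print(best)  # "the house is small": adequate *and* fluent wins
```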
Statistical Machine Translation

Do we only need to produce the right set of words?

Source: http://web.stanford.edu/class/cs224n/slides/cs224n-2020-lecture08-nmt.pdf

Alignment

Source: https://www.aclweb.org/anthology/J93-2003.pdf
Seq2Seq
● The model acts as a conditional language model.
● It models the translation probability autoregressively, as:

P(y | x) = Π_t P(y_t | y_1, …, y_{t-1}, x)
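A minimal sketch of that factorization, assuming a hypothetical `next_token_probs(prefix, source)` function standing in for a trained encoder-decoder (the name and interface are assumptions, not a real API):

```python
import math

def next_token_probs(prefix, source):
    # Hypothetical model call: returns a {token: probability} distribution
    # over the next target token, conditioned on the prefix and the source.
    raise NotImplementedError

def log_prob(target_tokens, source):
    # log P(y|x) = sum_t log P(y_t | y_<t, x)
    prefix, total = ["<START>"], 0.0
    for token in target_tokens + ["<END>"]:
        total += math.log(next_token_probs(prefix, source)[token])
        prefix.append(token)
    return total
```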
Source: http://web.stanford.edu/class/cs224n/slides/cs224n-2020-lecture08-nmt.pdf
Training

Source: http://web.stanford.edu/class/cs224n/slides/cs224n-2020-lecture08-nmt.pdf
Transformer. Autoregressive inference

Source: https://lena-voita.github.io/nlp_course/seq2seq_and_attention.html
Greedy decoding

Source: http://web.stanford.edu/class/cs224n/slides/cs224n-2020-lecture08-nmt.pdf
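A minimal greedy-decoding sketch under the same hypothetical `next_token_probs` interface: at each step, commit to the single most probable token and feed it back in.

```python
def next_token_probs(prefix, source):
    # Hypothetical model call returning {token: probability}; see the earlier sketch.
    raise NotImplementedError

def greedy_decode(source, max_len=50):
    prefix = ["<START>"]
    for _ in range(max_len):
        probs = next_token_probs(prefix, source)
        token = max(probs, key=probs.get)  # argmax: no lookahead, no backtracking
        if token == "<END>":
            break
        prefix.append(token)
    return prefix[1:]  # drop <START>
```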
Beam Search Decoding
● At each step of the decoder, keep track of the k most probable partial
translations (hypotheses).
● Usually k is between 5 and 10.
● Decoding stops when the maximum length is reached or an <END> token is produced.
● Beam search does not guarantee an optimal decoding.
● Hypotheses are ranked according to their score, commonly the
length-normalized log-probability (a sketch follows below):

score(y_1, …, y_t) = (1/t) Σ_i log P(y_i | y_1, …, y_{i-1}, x)
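A minimal beam-search sketch under the same hypothetical `next_token_probs` interface, keeping the k best partial translations per step and ranking finished hypotheses by length-normalized log-probability:

```python
import math

def next_token_probs(prefix, source):
    # Hypothetical model call returning {token: probability}; see the earlier sketch.
    raise NotImplementedError

def beam_search(source, k=5, max_len=50):
    beams = [(["<START>"], 0.0)]  # (partial translation, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            probs = next_token_probs(prefix, source)
            # Expand each hypothesis with its k most probable continuations.
            for token in sorted(probs, key=probs.get, reverse=True)[:k]:
                candidates.append((prefix + [token], score + math.log(probs[token])))
        beams = []
        for prefix, score in sorted(candidates, key=lambda c: c[1], reverse=True)[:k]:
            (finished if prefix[-1] == "<END>" else beams).append((prefix, score))
        if not beams:  # all surviving hypotheses have produced <END>
            break
    finished.extend(beams)  # hypotheses cut off at max_len
    best, _ = max(finished, key=lambda c: c[1] / len(c[0]))  # length-normalized
    return [t for t in best if t not in ("<START>", "<END>")]
```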
Beam Search Decoding

Source: https://huggingface.co/blog/how-to-generate

BLEU
● Automatic measure of machine translation quality, computed by comparing
the decoded sentences against one or several human-generated references.
● Based on (a minimal sketch follows below):
○ N-gram precision: the fraction of correctly predicted n-grams, for sizes
1 up to a fixed maximum (usually 4).
○ Brevity (length) penalty: penalizes decoded outputs that are shorter than
the reference.

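A from-scratch sketch of sentence-level BLEU, just to show the two ingredients (real evaluations use corpus-level implementations such as sacreBLEU, which also add smoothing):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    # Clipped n-gram precisions for n = 1..max_n.
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(count, ref[g]) for g, count in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    # Geometric mean of the precisions, scaled by the brevity penalty,
    # which is < 1 only when the candidate is shorter than the reference.
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    brevity = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return brevity * geo_mean

# max_n=2 here only because such short toy sentences share no 4-grams.
print(bleu("the cat sat on the mat".split(),
           "the cat is on the mat".split(), max_n=2))  # ~0.71
```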
Text Summarization

Summarization

Given input text x, write a summary y which is shorter and contains the main information of x.

Strategies: Extractive summarization

● Selects parts from the original text.
● Restricted to the original phrasing of the input.
● Easier, as it does not require language generation.

Strategies: Abstractive summarization
● Generates new text using language generation techniques.
● More difficult to implement.
● Not restricted by the input phrasing.

Precision and recall
● Precision is the fraction of retrieved documents that are relevant to the
query.

● Recall is the fraction of the relevant documents that are successfully
retrieved.

● F1 score: the harmonic mean of precision and recall (a worked example
follows below).

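A worked toy example (the document sets are made up), using F1 = 2·P·R / (P + R):

```python
relevant = {"d1", "d2", "d3", "d4"}   # documents actually relevant to the query
retrieved = {"d1", "d2", "d5"}        # documents the system returned

hits = relevant & retrieved           # relevant documents that were retrieved
precision = len(hits) / len(retrieved)              # 2/3 ~ 0.67
recall = len(hits) / len(relevant)                  # 2/4 = 0.50
f1 = 2 * precision * recall / (precision + recall)  # ~ 0.57
print(precision, recall, f1)
```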
Neural Summarization
● Single-document summarization can be treated as a translation task.

● Seq2Seq + attention architecture.

● Easily copies input terms to the output.
Metrics: ROUGE
ROUGE (Recall-Oriented Understudy for Gisting Evaluation)

● Based on recall, while BLEU is based on precision, although an F1 version
is usually reported.

● Based on n-gram overlap.

Metrics: ROUGE

The most commonly reported variants are:

● ROUGE-1: unigram overlap
● ROUGE-2: bigram overlap
● ROUGE-L: longest common subsequence overlap

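A from-scratch sketch of ROUGE-1 recall and the longest-common-subsequence computation behind ROUGE-L (the example sentences echo the classic ones from Lin, 2004; real evaluations use existing packages):

```python
from collections import Counter

def rouge_n_recall(candidate, reference, n=1):
    # ROUGE-N as recall: overlapping n-grams / n-grams in the reference.
    def grams(toks):
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand, ref = grams(candidate), grams(reference)
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    return overlap / sum(ref.values())

def lcs_len(a, b):
    # Dynamic-programming longest common subsequence, used by ROUGE-L.
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            table[i + 1][j + 1] = table[i][j] + 1 if x == y else \
                max(table[i][j + 1], table[i + 1][j])
    return table[-1][-1]

reference = "the police killed the gunman".split()
candidate = "police kill the gunman".split()
print(rouge_n_recall(candidate, reference, n=1))       # ROUGE-1 recall: 3/5 = 0.6
print(lcs_len(candidate, reference) / len(reference))  # ROUGE-L recall: 3/5 = 0.6
```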
Question Answering

● Source paragraph: Text from which the answer will be extracted.
● Question: Question related to the context of the source paragraph.
● Answer: Fragment of the source paragraph answering the question.

Example

Pretrained models for QA

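As an illustration, with a pretrained extractive QA model from the Hugging Face transformers library (the checkpoint below is one common choice; any SQuAD-style model works):

```python
from transformers import pipeline

# Extractive QA: the answer is a span copied out of the source paragraph.
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

paragraph = ("The Universitat Politècnica de Catalunya is a public university "
             "located in Barcelona, Spain.")
result = qa(question="Where is the university located?", context=paragraph)
print(result["answer"], result["score"])  # e.g. "Barcelona, Spain" + confidence
```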
Dialog

End2End Models

● PRO: Able to talk about any topic and mimic different personalities.

● CONS: Limited control over the generated text; requires long context to
generate consistent text.

Prompt Tuning
● Input provided to a language model system to perform a downstream task
(e.g. summarization, dialog, NER, etc.).

● It may include a set of instructions or examples of the task we want to perform.

● No training or fine-tuning involved: a pure inference method.

Reid's resume: [paste full resume here]

Given the above information, write a witty speaker bio about Reid.

Example: Dialog
You are an expert baker answering users' questions. Reply as agent.

Example conversation:

User: Hey can you help me with something

Agent: Sure! What do you need help with?

User: I want to bake a cake but don't know what temperature to set the oven to.

Agent: For most cakes, the oven should be preheated to 350°F (177°C).

Current conversation:

User: [Insert user's question]

Agent:

ChatGPT
Virtual Assistants

● Split the model into smaller tasks inside a pipeline:
○ Automatic Speech Recognition (ASR)
○ Intent Recognition
○ Named Entity Recognition (NER)

Virtual Assistants. Example
● ASR: speech → "Alexa, open Netflix"
● Intent Recognition: "Alexa, open Netflix" → Start an App (Which one?)
● NER: "Alexa, open Netflix" → "Netflix"
● Result: run Netflix

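A minimal sketch of that pipeline with every component stubbed out (all function names and return values here are hypothetical placeholders, not a real assistant API):

```python
def transcribe(audio):
    # Hypothetical ASR component: audio -> text.
    return "Alexa, open Netflix"

def classify_intent(text):
    # Hypothetical intent classifier: text -> intent label.
    return "StartApp" if "open" in text.lower() else "Unknown"

def extract_entities(text):
    # Hypothetical NER component: text -> {entity type: value}.
    return {"APP": "Netflix"} if "netflix" in text.lower() else {}

def handle(audio):
    text = transcribe(audio)
    intent = classify_intent(text)     # what does the user want to do?
    entities = extract_entities(text)  # which app, specifically?
    if intent == "StartApp" and "APP" in entities:
        return f"run {entities['APP'].lower()}"
    return "Sorry, I did not understand."

print(handle(b"<audio bytes>"))  # -> "run netflix"
```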
Questions?
