Language
Shashank Srivastava
UNC Chapel Hill
Agenda
1. What is NLP?
– Some NLP Applications
What is NLP?
• Having computers understand human language and
communication
o Deeper understanding of text beyond string matching
NL ambiguities
Ø Word sense ambiguities
• “Kids make nutritious snacks”
vs
Ø Syntactic ambiguities
• “Complaints about NBA referees growing ugly”
• “Ban on nude dancing on governor’s desk”
Ø Paralinguistics:
• “She said that she loved him” (stressing different words conveys different meanings)
NLP applications: Part-of-speech tagging
NLP applications: Question Answering
NLP applications: Dialog agents
NLP applications: Machine Translation
NLP applications: Summarization
NLP applications: Response generation
NLP applications: Creative Language Generation
Language Models are Unsupervised Multitask Learners. Radford et al., 2019
NLP applications: VQA
What food on the tray is not inside a plastic cylinder ?
https://bringmeaspoon.org/
NLP applications: Text to Scene Generation
There is a table and there are four chairs in the room. There are four
plates with four sandwiches.
Text to 3D Scene Generation with Rich Lexical Grounding. Chang et al., ACL 2015
Can computers efficiently learn new tasks through human language interactions with their users?
Towards Conversational Learning?
Ø ML currently relies on ‘big data’
Ø Inaccessible to non-experts
Ø Theoretical limits on what can be learned
n ≈ log |H|
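The n ≈ log |H| scaling reflects the standard PAC sample-complexity bound for a finite hypothesis class H: to guarantee error below ε with probability at least 1 - δ, a consistent learner needs roughly

```latex
n \;\geq\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)
```

examples. The data requirement grows with the log of the hypothesis-class size, so expressive model classes demand a lot of labeled data.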
1. Training classifiers without labels, from NL explanations only
‘Emails from my boss are usually important’ (labels: {important, not-important})
Zero-shot Training of Classifiers from Natural Language Quantification. Srivastava et al., ACL 2018
NL as feature functions
Semantic parsing maps NL to formal logical forms
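As a toy illustration (not the paper's actual parser), a semantic parse of such an explanation can be viewed as extracting a feature predicate, a quantifier, and a label:

```python
# Toy keyword-based sketch of a semantic parse for explanations like
# "Emails from my boss are usually important". Illustrative only.
QUANTIFIERS = {"always", "usually", "often", "sometimes", "rarely", "never"}

def parse_explanation(text):
    """Map an NL explanation to a (predicate, quantifier, label) triple."""
    tokens = text.lower().split()
    quant = next((t for t in tokens if t in QUANTIFIERS), None)
    label = tokens[-1]                               # e.g. "important"
    subject = " ".join(tokens[:tokens.index("are")]) # phrase before the verb
    return {"predicate": subject, "quantifier": quant, "label": label}

parsed = parse_explanation("Emails from my boss are usually important")
# {'predicate': 'emails from my boss', 'quantifier': 'usually', 'label': 'important'}
```

A real semantic parser would map the subject phrase to a formal predicate such as (email.sender == boss) rather than keeping the surface string.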
NL as model constraints
Sequential Approach
x: (email.replied == true)
y: important: true
Constraint: E_{y|x}[φ(x, y)] = b_usually
Posterior regularization: incorporating constraints in model training
Classifier: f_θ : x → y, trained on unlabeled data
Training classifiers from declarative NL
Ø Explanations encode multiple properties that can aid statistical learning
Semantics of quantifiers
Ø Leverage semantics of linguistic quantifiers
Ø Associate point probability estimates with frequency adverbs and determiners
Pipeline: a semantic parser maps the explanation to x: (email.replied == true) and y: important: true, yielding the constraint E_{y|x}[φ(x, y)] = b_usually, which posterior regularization incorporates into training of the classifier f_θ : x → y on unlabeled data.
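Concretely, the grounding can be thought of as a lookup table from quantifier words to point probability estimates; the numbers below are illustrative, not the paper's calibrated values:

```python
# Each linguistic quantifier is grounded as a point probability, which
# becomes the target b in the constraint E_{y|x}[phi(x, y)] = b.
# Values here are hypothetical placeholders.
QUANTIFIER_PROBS = {
    "always": 0.95,
    "usually": 0.80,
    "often": 0.70,
    "sometimes": 0.40,
    "rarely": 0.10,
    "never": 0.05,
}

def constraint_target(quantifier):
    """Return the expected-frequency target b for a quantifier word."""
    return QUANTIFIER_PROBS[quantifier.lower()]
```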
Posterior Regularization
Ø Use the posterior regularization (PR) principle to imbue human-provided advice in learned models
Ø Unobserved class labels as latent variables
E-step: infer label assignments q_X(Y) for the unlabeled data (y_1 = ?, y_2 = ?, y_3 = ?), regularized to lie in the constraint set Q given by the NL constraints
M-step: update the classifier parameters θ of p_θ(Y | X), treating the inferred labels as given
Posterior Regularization
Ø Train with a modified EM procedure to maximize the PR objective: the data log-likelihood, penalized by the KL divergence of the posterior from the constraint set Q
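For intuition, here is a minimal sketch of a PR-style E-step for a single binary constraint, an assumed simplification rather than the paper's implementation: given model posteriors p(y = 1 | x) for the examples covered by a rule, find the closest distribution in KL whose mean matches the quantifier's target b. With one expectation constraint the solution is an exponential tilt of p, and the dual variable can be found by bisection.

```python
import numpy as np

def pr_e_step(p, b, tol=1e-8):
    """Project posteriors p onto {q : mean(q) = b}, minimizing KL(q || p).

    The minimizer is q_i = p_i e^lam / (p_i e^lam + 1 - p_i) for a dual
    variable lam; mean(q) is monotone in lam, so bisection finds it.
    """
    lo, hi = -50.0, 50.0
    for _ in range(200):
        lam = (lo + hi) / 2
        q = p * np.exp(lam) / (p * np.exp(lam) + (1 - p))
        if q.mean() > b:
            hi = lam
        else:
            lo = lam
        if hi - lo < tol:
            break
    return q

p = np.array([0.3, 0.6, 0.9, 0.5])   # model posteriors p(y=1|x)
q = pr_e_step(p, b=0.8)              # e.g. b = 0.8 for "usually"
# q averages (approximately) 0.8 while staying close to p in KL.
```

The M-step then trains the classifier on these regularized posteriors as soft labels.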
Synthetic shape classification
Ø Turkers observe samples of shapes from synthetically generated
datasets, and describe them through statements.
✓ 50 datasets
✓ 30 workers
✓ 4.3 statements per task on average
LNQ Accuracy
[Results table: classification accuracy across datasets, ordered from harder to easier; the Random baseline scores 0.524]
Real tasks: Bird species detection
Example explanations:
• A specimen that has a striped crown is likely
to be a selected bird
• Birds in the other category rarely ever have dagger-shaped beaks
Real tasks: Email foldering
Ø Emails representing common email categories, collected through AMT
Ø Reminders, meeting invitations, requests from boss, internet humor, going
out with friends, policy announcements, etc.
Example explanations:
Most reminders mention a date and a time in the message of the email
The sender of the email is the same as the recipient
These emails usually close with a name or title
These emails sometimes have jpg attachments
The email likely has words like "policy" or "announcement" in the subject
Emails from a public domain are not office requests
Results
[Figure: classification accuracy on Bird Species Identification and Email Categorization]
Empirical distributions of probability values
“Send me NLP related news everyday at 8am”
“Finally, email me the link to the three most recent articles”
Why is this interesting for NLP?
Framework
Use the Mini World of Bits (MiniWoB) framework (Shi et al., ICML 2017)
Ø Interactive interfaces for web-like tasks
Ø Example tasks: clicking specified buttons, forwarding emails, liking social-media posts, etc.
Explained Demonstration Dataset
Ø 520 demonstrations & stepwise explanations (AMT)
Ø 3.3 explanations/demonstration
Ø Most explanations (97%) follow the sequence of actions in the demonstration
Web DSL
Ø DSL operators for:
Ø Click/Type actions on web-elements
Ø Filter web-elements with specific features & relations
Ø Filter strings based on features & relations
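A hypothetical sketch of what such DSL operators might look like; the names and structure here are illustrative, not the paper's actual grammar:

```python
from dataclasses import dataclass

@dataclass
class Element:
    tag: str
    text: str

@dataclass
class Click:            # click action on a web element
    target: object

@dataclass
class Type:             # type a string into a web element
    target: object
    text: str

@dataclass
class FilterByText:     # filter web elements by a text feature
    substring: str
    def apply(self, elements):
        return [e for e in elements if self.substring in e.text]

# A tiny "program": click the element whose text contains "Submit".
page = [Element("button", "Submit"), Element("button", "Cancel")]
program = Click(FilterByText("Submit"))
matches = program.target.apply(page)
```

A real DSL would also include relational filters (e.g., an element to the left of another) and string-level filters, as the slide lists.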
LED Approach
Infer latent programs (l) in the DSL that are (1) consistent with the demonstration (d), and (2) relevant to the NL explanations (x):

P(x | d, c) = Σ_l P(x | l) · P(l | d, c)

where P(x | l) measures relevance, P(l | d, c) measures consistency, and c denotes the step's context.
Two key ideas
1 Represent the logical form (l) for any step in a context (c) with a
set of latent variables denoting:
Ø Action to perform – click or type
Ø Web-element to act on
Ø Attributes of the web-element relevant for action
Ø Relations between the web-element and other elements relevant for the action
Model Training
Ø Optimization with variational EM
Ø E-step: infer latent-variable assignments (l) for demonstrations (d)
Ø M-step: update parameters of the semantic generation model, P(x | l)
Ø Testing: choose the action that best models the explanation for a step
Ø Since we’re Bayesian, the chosen action may not correspond to any single logical form
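The test-time decision rule can be sketched as scoring each candidate action by marginalizing over the logical forms that realize it; this is an assumed simplification of the above, with made-up scores:

```python
def best_action(candidates, relevance, consistency):
    """candidates: {action: [logical_form, ...]};
    relevance(l) ~ P(x|l), consistency(l) ~ P(l|d,c).
    Score each action by summing over the forms that realize it."""
    scores = {
        a: sum(relevance(l) * consistency(l) for l in forms)
        for a, forms in candidates.items()
    }
    return max(scores, key=scores.get)

# Two logical forms l1, l2 both realize "click_submit"; their combined
# mass can beat a single higher-scoring form for another action.
cands = {"click_submit": ["l1", "l2"], "click_cancel": ["l3"]}
rel = {"l1": 0.4, "l2": 0.3, "l3": 0.6}.get
con = {"l1": 0.5, "l2": 0.5, "l3": 0.4}.get
choice = best_action(cands, rel, con)   # "click_submit" (0.35 vs 0.24)
```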
Evaluation: Task Completion Rates
PET’s Main Ideas
1. Patterns and Verbalizers
2. Train smaller LMs for each pattern on the 32 labeled examples, and
ensemble with model distillation using unlabeled data
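A pattern turns an input into a cloze question for a masked LM, and a verbalizer maps each label to a token. Using the movie-review example from these slides (the verbalizer words are my own illustrative choices):

```python
def pattern(review):
    """PET-style pattern: rewrite the input as a cloze question."""
    return f"{review} All in all, the movie was [MASK]."

# Verbalizer: one token per label; the LM's probabilities for these
# tokens at [MASK] serve as the label scores.
VERBALIZER = {"positive": "great", "negative": "terrible"}

cloze = pattern("The acting was bad and the script was boring.")
```

PET trains a small masked LM per pattern on the 32 labeled examples, then distills the ensemble into a single model using unlabeled data.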
ADAPET: Improve & Simplify PET
Ø Task-specific unlabeled data is often unrealistic in low-data scenarios (e.g., pairs of sentences for NLI tasks)
Ø ADAPET alleviates these issues through more strongly supervised multi-task training
ADAPET: Improve & Simplify PET
1 Training with non-label tokens
“The acting was bad and the script was boring. All in all, the movie was _____”
PET: make gradient updates to improve the likelihood of the correct tokens from the verbalizer (here, “terrible”)
ADAPET: also down-weight the probabilities of all other words in the vocabulary (“bogus”, “movie”, “gorilla”, “boy”, “boating”, “pink”, ...)
2 Label Conditioning
“The acting was bad and the script was <MASK>. All in all, the movie was terrible”
ADAPET: given the right label, what is the context? Predict randomly masked tokens in the context, conditioned on the label.
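A simplified sketch of the first idea, training with non-label tokens (the probabilities are made up, and real implementations operate on logits over the full vocabulary): treating each vocabulary token at the mask as an independent binary decision pushes the label token up and every other token down.

```python
import math

def decoupled_label_loss(token_probs, correct_token):
    """Binary cross-entropy over the vocabulary at the [MASK] position:
    the correct verbalizer token is pushed up, all others pushed down."""
    loss = 0.0
    for tok, p in token_probs.items():
        if tok == correct_token:
            loss += -math.log(p)        # raise P(label token)
        else:
            loss += -math.log(1.0 - p)  # lower P(every other token)
    return loss

# Hypothetical masked-LM probabilities at the [MASK] position.
probs = {"terrible": 0.6, "great": 0.2, "gorilla": 0.1, "pink": 0.1}
loss = decoupled_label_loss(probs, "terrible")
```

By contrast, plain PET's loss touches only the verbalizer tokens, leaving the rest of the vocabulary unconstrained.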
ADAPET Results

Model     Unlabeled data?   Ensemble?   Gradient updates?   SuperGLUE Avg
GPT-3                                                       71.8
PET       ✓                 ✓           ✓                   74.0
iPET      ✓                 ✓           ✓                   75.4
ADAPET                                  ✓                   76.0
Other directions
Ø Learn complex tasks from a mix of
supervision:
Ø Demonstrations, explanations, experimentation,
observation
Questions?
Learning from fewer examples