
Teaching Machines to Ask Useful Clarification Questions

Sudha Rao
PhD Defense Examination
Dept. of Computer Science
University of Maryland, College Park

Committee:
Prof. Hal Daumé III (advisor)
Prof. Philip Resnik
Prof. Marine Carpuat
Prof. Jordan Boyd-Graber
Prof. Lucy Vanderwende
Natural Language Understanding

How long does it take to get a PhD?

Give me a recipe for lasagna

Please bring me my coffee mug from the kitchen
Human Interactions

Please bring me my coffee mug from the kitchen

What color is your coffee mug?
Teach Machines to Ask Clarification Questions

Context-aware questions about missing information

How long does it take to get a PhD? → In which field?

Give me a recipe for lasagna → Any dietary restrictions?

Please bring me my coffee mug from the kitchen → What color is your coffee mug?
PRIOR WORK

Reading Comprehension Question Generation

My class is going to the movies on a field trip next week.
We have to get permission slips signed before we go.
We are going to see a movie that tells the story from a book we read.

Q: What do the students need to do before going to the movies?

o  Rus, et al. INLG 2010
o  Heilman. PhD thesis 2011
o  Olney, Graesser, and Person. Dialogue & Discourse 2012
o  Richardson, et al. EMNLP 2013
o  Chali and Hasan. ACL 2015
o  Serban, et al. ACL 2016
o  Du, Shao, and Cardie. ACL 2017
o  Tang, et al. NAACL 2018
o  Sachan and Xing. NAACL 2018
Question Generation for Slot Filling

USER: I want to go to Melbourne on July 14
SYSTEM: What time do you want to leave?
USER: I must be in Melbourne by 11 am
SYSTEM: Would you like a Delta flight that arrives at 10.15 am?
USER: Sure
SYSTEM: In what name should I make the reservation?

SLOTS: <origin city>, <departure city>, <origin time>, <departure time>, <airline>

o  Goddeau, et al. 1996
o  Bobrow, et al. Artificial Intelligence 1977
o  Lemon, et al. EACL 2006
o  Williams, et al. SIGDIAL 2013
o  Young, et al. IEEE 2013
o  Dhingra, et al. ACL 2017
o  Bordes, et al. ICLR 2017
Visual Question Generation Task

Q: Was anyone injured in the crash?

Q: Is the motorcyclist alive?

Q: What caused the accident?

Mostafazadeh et al. "Generating natural questions about an image." ACL 2016

We consider two scenarios -- First Scenario

StackExchange

Context:
How to set environment variables for installation?
I'm aiming to install ape, a simple code for pseudopotential generation.
I'm having this error message while running ./configure
<error message>
So I have the library but the program installation isn't finding it.
Any help? Thanks in advance!

Shortlist of useful questions:
o  What version of Ubuntu do you have?
o  How are you installing ape?
o  Do you have GSL installed?
We consider two scenarios -- Second Scenario

Amazon

Is this induction safe?

What is the warranty or guarantee on this?

What are the handles made of?

Our Contributions

1.  Question Ranking Model:
    ✓  A good question is one whose answer is useful

2.  Question Generation Model:
    ✓  Generate questions from scratch
    ✓  Sequence-to-sequence model trained using adversarial networks
Talk Outline

o  How do we build the clarification questions dataset?

o  How do we rank clarification questions from an existing set?

o  How do we generate clarification questions from scratch?

o  How do we control the specificity of the generated clarification questions?

o  Future Directions
Clarification Questions Dataset: StackExchange

Initial Post:
How to configure path or set environment variables for installation?
I'm aiming to install ape, a simple code for pseudopotential generation.
I'm having this error message while running ./configure
<error message>
So I have the library but the program installation isn't finding it.
Any help? Thanks in advance!

Finding: Questions go unanswered for a long time if they are not clear enough

Asaduzzaman, Muhammad, et al. "Answering questions about unanswered questions of Stack Overflow." Working Conference on Mining Software Repositories. IEEE Press, 2013.
Clarification Questions Dataset: StackExchange

Initial Post:
How to configure path or set environment variables for installation?
I'm aiming to install ape, a simple code for pseudopotential generation.
I'm having this error message while running ./configure
<error message>
So I have the library but the program installation isn't finding it.
Any help? Thanks in advance!

Question comment:
What version of Ubuntu do you have?

Updated Post (edit as an answer to the question):
I'm aiming to install ape in Ubuntu 14.04 LTS, a simple code for pseudopotential generation.
I'm having this error message while running ./configure
<error message>
So I have the library but the program installation isn't finding it.
Any help? Thanks in advance!
Clarification Questions Dataset: StackExchange

Dataset Creation

(context, question, answer) triples

o  context: Original post
o  question: Clarification question posted in comments
o  answer: Edit made to the post in response to the question, OR the author's reply to the question comment

Dataset Size: ~77K triples
Domains: AskUbuntu, Unix, Superuser
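One way to assemble such triples, sketched in Python under stated assumptions: the initial and updated versions of a post are available as plain strings, and the answer is taken to be the words the edit added (found with a word-level `difflib` diff), falling back to the author's reply to the comment. The helper names `edit_as_answer` and `extract_triple` are hypothetical; this illustrates the idea and is not the pipeline used to build the dataset.

```python
import difflib
from typing import Optional, Tuple


def edit_as_answer(initial_post: str, updated_post: str) -> str:
    """Return the words the edit added to the post (hypothetical helper).

    Runs a word-level diff and keeps every span that appears in the
    updated post but not in the initial one.
    """
    old_words = initial_post.split()
    new_words = updated_post.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(new_words[j1:j2])
    return " ".join(added)


def extract_triple(initial_post: str, question: str,
                   updated_post: Optional[str] = None,
                   author_reply: Optional[str] = None
                   ) -> Optional[Tuple[str, str, str]]:
    """Build a (context, question, answer) triple, or None if no answer exists.

    The answer is the edit made in response to the question when there is
    one; otherwise it falls back to the author's reply to the comment.
    """
    answer = edit_as_answer(initial_post, updated_post) if updated_post else ""
    if not answer and author_reply:
        answer = author_reply
    if not answer:
        return None  # no usable answer: skip this question
    return (initial_post, question, answer)


initial = "I'm aiming to install ape, a simple code for pseudopotential generation."
updated = "I'm aiming to install ape in Ubuntu 14.04 LTS, a simple code for pseudopotential generation."
print(extract_triple(initial, "What version of Ubuntu do you have?", updated))
```

On the running example, the extracted answer is the inserted span "ape in Ubuntu 14.04 LTS,". A real pipeline would also need to decide which edits actually respond to which comment; this sketch assumes that pairing is already known.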
Clarification Questions Dataset: Amazon

(context, question, answer) triples

Dataset Size: ~24K (3-10 questions per context)
Domain: Home & Kitchen

McAuley and Yang. Addressing complex and subjective product-related queries with customer reviews. WWW 2016
Talk Outline

o  How do we build the clarification questions dataset?
   ✓  Two datasets: StackExchange & Amazon

o  How do we rank clarification questions from an existing set?

o  How do we generate clarification questions from scratch?

o  How do we control the specificity of the generated clarification questions?

o  Future Directions

Sudha Rao, Hal Daumé III, "Learning to Ask Good Questions: Ranking Clarification Questions using Neural Expected Value of Perfect Information", ACL 2018
Expected Value of Perfect Information (EVPI) inspired model

o  Use EVPI to identify questions that add the most value to the given post

o  Definition: Value of Perfect Information VPI(x|c)
   How much value does x add to a given information content c?

o  Since we have not acquired x, we define its value in expectation:

   $\mathrm{EVPI}(x \mid c) = \sum_{x \in X} P(x \mid c)\, \mathrm{Utility}(x, c)$

   where P(x|c) is the likelihood of x given c, and Utility(x, c) is the value of updating c with x.

Avriel, Mordecai, and A. C. Williams. "The value of information and stochastic programming." Operations Research 18.5 (1970)
EVPI formulation for our problem

$\mathrm{EVPI}(q_i \mid c) = \sum_{a_j \in A} P(a_j \mid c, q_i)\, U(c + a_j)$

o  P(a_j | c, q_i): likelihood of a_j being the answer to q_i on context c
o  U(c + a_j): utility of updating the context c with answer a_j
o  c: given context
o  q_i: question from set of question candidates Q
o  a_j: answer from set of answer candidates A
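The EVPI sum can be written directly as a loop over the answer candidates. In this sketch, `answer_prob` and `utility` are hypothetical callables standing in for the learned answer model and utility calculator, and "c + a_j" is assumed to mean appending the answer text to the context. Because the slides approximate P(a_j | c, q_i) with a cosine similarity, the scores need not sum to one; the sketch does not normalize them.

```python
from typing import Callable, Sequence


def evpi(context: str, question: str, answers: Sequence[str],
         answer_prob: Callable[[str, str, str], float],
         utility: Callable[[str], float]) -> float:
    """EVPI(q_i | c) = sum over a_j in A of P(a_j | c, q_i) * U(c + a_j)."""
    total = 0.0
    for answer in answers:
        updated_context = context + " " + answer  # assumed meaning of "c + a_j"
        total += answer_prob(context, question, answer) * utility(updated_context)
    return total


# Hypothetical toy models, for illustration only.
TOY_PROBS = {"Ubuntu 14.04": 0.75, "no idea": 0.25}


def toy_answer_prob(context: str, question: str, answer: str) -> float:
    return TOY_PROBS[answer]


def toy_utility(updated_context: str) -> float:
    # Pretend that knowing the Ubuntu version makes the post much more useful.
    return 1.0 if "Ubuntu" in updated_context else 0.2


score = evpi("ape cannot find the library", "What version of Ubuntu do you have?",
             list(TOY_PROBS), toy_answer_prob, toy_utility)
print(score)
```

With these toy values, EVPI = 0.75 * 1.0 + 0.25 * 0.2 = 0.8: the question scores highly because its likely answer makes the post much more useful.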
We rank questions by their EVPI value

$\mathrm{EVPI}(q_i \mid c) = \sum_{a_j \in A} P(a_j \mid c, q_i)\, U(c + a_j)$

Question Candidates                      EVPI value
What is the make of your wifi card?      0.34
What version of Ubuntu do you have?      0.85
What OS are you using?                   0.67

Ranked by EVPI value:
1.  What version of Ubuntu do you have?
2.  What OS are you using?
3.  What is the make of your wifi card?
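Ranking then reduces to a sort on the EVPI values. A minimal sketch (the function name `rank_by_evpi` is hypothetical) that reproduces the ordering of the three candidates above:

```python
from typing import Dict, List


def rank_by_evpi(evpi_values: Dict[str, float]) -> List[str]:
    """Return candidate questions sorted from highest to lowest EVPI value."""
    return sorted(evpi_values, key=lambda q: evpi_values[q], reverse=True)


candidates = {
    "What is the make of your wifi card?": 0.34,
    "What version of Ubuntu do you have?": 0.85,
    "What OS are you using?": 0.67,
}
print(rank_by_evpi(candidates))
# ['What version of Ubuntu do you have?', 'What OS are you using?',
#  'What is the make of your wifi card?']
```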
Three parts of our formulation:

$\mathrm{EVPI}(q_i \mid c) = \sum_{a_j \in A} P(a_j \mid c, q_i)\, U(c + a_j), \quad q_i \in Q$

1.  Question & Answer Candidate Generator: the candidate sets Q and A
2.  Answer Modeling: P(a_j | c, q_i)
3.  Utility Calculator: U(c + a_j)
1. Question & Answer Generator

o  Index the dataset of (post, question, answer) triples in the Lucene search engine, with posts as documents
o  Use the given post p as the query to retrieve ten posts p1, ..., p10 similar to p
o  The questions q1, ..., q10 paired with those posts become the question candidates
o  The answers a1, ..., a10 paired with those posts become the answer candidates
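A rough Python stand-in for this retrieval step. Lucene is a Java library with its own scoring, so this sketch swaps in a plain TF-IDF cosine ranking over tokenized posts; the function names, the tokenizer, and the IDF formula are assumptions for illustration only.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (post, question, answer)


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    counts = Counter(tokens)
    return {t: n * idf.get(t, 0.0) for t, n in counts.items()}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def candidate_sets(post: str, dataset: List[Triple], k: int = 10
                   ) -> Tuple[List[str], List[str]]:
    """Retrieve the k posts most similar to `post` (posts are the documents,
    `post` is the query) and return their paired questions and answers."""
    docs = [tokenize(p) for p, _, _ in dataset]
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(1 + len(docs) / n) for t, n in df.items()}
    doc_vecs = [tfidf(d, idf) for d in docs]
    query = tfidf(tokenize(post), idf)
    top = sorted(range(len(docs)), key=lambda i: cosine(query, doc_vecs[i]),
                 reverse=True)[:k]
    return [dataset[i][1] for i in top], [dataset[i][2] for i in top]
```

The design choice is simply to keep the candidate generator cheap and unsupervised; any retrieval engine that returns similar posts could fill the same role.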
2. Answer Modeling

$P(a_j \mid c, q_i) \approx \mathrm{cosine\_sim}(\mathrm{Emb}_{ans}(c, q_i), a_j)$

A neural embedding network takes c, q_i, and a_j as input.

Training objective:
o  Emb_ans(c, q_i) should be close to a_0, the correct answer
o  Other answers: a_1, ..., a_10
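2. Answer Modeling

$P(a_j \mid c, q_i) \approx \mathrm{cosine\_sim}(\mathrm{Emb}_{ans}(c, q_i), a_j)$

Architecture (bottom to top):
o  Word embedding module over c, q_i, and a_j
o  Context LSTM over c; Question LSTM over q_i
o  Feedforward neural network over the context and question representations, producing Emb_ans(c, q_i)
o  Average of the word embeddings of a_j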
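A small sketch of the scoring side of the answer model, assuming Emb_ans(c, q_i) has already been produced by the LSTM and feedforward layers (not shown) and that a_j is represented by the average of its word embeddings. The toy two-dimensional embeddings and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def average_embedding(words: Sequence[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of the words found in `emb` (zero vector if none)."""
    known = [emb[w] for w in words if w in emb]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_sim(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def answer_score(emb_ans: Vector, answer_words: Sequence[str],
                 emb: Dict[str, Vector]) -> float:
    """Score used in place of P(a_j | c, q_i): cosine between Emb_ans(c, q_i)
    and the averaged word embeddings of a_j."""
    return cosine_sim(emb_ans, average_embedding(answer_words, emb, len(emb_ans)))


# Hypothetical toy embeddings: dimension 0 ~ "Ubuntu-ness", dimension 1 ~ "Windows-ness".
EMB = {"ubuntu": [1.0, 0.0], "14.04": [1.0, 0.0], "windows": [0.0, 1.0]}
emb_ans_cq = [1.0, 0.0]  # pretend output of the network for (c, q_i)
print(answer_score(emb_ans_cq, ["ubuntu", "14.04"], EMB))
```

An answer whose averaged embedding points the same way as Emb_ans(c, q_i) scores 1.0; an orthogonal one scores 0.0.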
3. Utility Calculator

U(c + a_j): a value between 0 and 1, computed by a neural network over c, q_i, and a_j

Training objective:
o  (c, q_0, a_0): original (ques, ans) pair, label y=1
o  (c, q_1, a_1), ..., (c, q_10, a_10): other (ques, ans) pairs, label y=0
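The labeling scheme above can be sketched as follows. The slides give only the labels; the binary cross-entropy loss shown here is an assumption about how a predictor with outputs between 0 and 1 might be trained on them, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Example = Tuple[str, str, str, int]  # (context, question, answer, label y)


def utility_examples(context: str, pairs: List[Tuple[str, str]]) -> List[Example]:
    """Label the original (question, answer) pair y=1 and all others y=0.

    pairs[0] must be the original pair for this context; pairs[1:] are the
    (question, answer) pairs retrieved from other posts.
    """
    return [(context, q, a, 1 if i == 0 else 0) for i, (q, a) in enumerate(pairs)]


def binary_cross_entropy(pred: float, label: int, eps: float = 1e-7) -> float:
    """Loss for a utility prediction in (0, 1) against a label in {0, 1}."""
    p = min(max(pred, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


examples = utility_examples("c", [("q0", "a0"), ("q1", "a1"), ("q2", "a2")])
print([y for *_, y in examples])  # [1, 0, 0]
```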
3. Utility Calculator

U(c + a_j): value between 0 and 1

Architecture (bottom to top):
o  Word embedding module over c, q_i, and a_j
o  Context LSTM, Question LSTM, and Answer LSTM
o  Feedforward neural network over the three LSTM outputs
Our EVPI inspired question ranking model (in summary)

$\mathrm{EVPI}(q_i \mid c) = \sum_{a_j \in A} P(a_j \mid c, q_i)\, U(c + a_j), \quad q_i \in Q$

1.  Question & Answer Candidate Generator
2.  Answer Modeling
3.  Utility Calculator
Human-based Evaluation Design (Union of "best")

Example context: TALK: Teaching Machines to Ask Clarification Questions

Candidate questions:
o  What is going on?
o  What is EVPI?
o  How many candidates do you consider?
o  How is the answer used in selecting useful questions?
o  When is lunch?

Each annotator (Annotator 1, Annotator 2) marks the "best" and the "valid" questions. Here the reference set is the union of the questions marked "best".

Note: We use UpWork to find expert annotators
Human-based Evaluation Design (Intersection of "valid")

Same context, candidate questions, and annotators (Best / Valid per annotator) as in the "Union of best" design; here the reference set is the intersection of the questions marked "valid" by both annotators.

Note: We use UpWork to find expert annotators
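A sketch of how the two reference sets and Precision@1 might be computed from per-annotator judgments. The judgments in the example are invented for illustration (the slides do not say which questions were marked), and the function names are hypothetical.

```python
from typing import Dict, List, Set


def union_of_best(best_sets: List[Set[str]]) -> Set[str]:
    """Questions marked "best" by at least one annotator."""
    return set().union(*best_sets)


def intersection_of_valid(valid_sets: List[Set[str]]) -> Set[str]:
    """Questions marked "valid" by every annotator."""
    return set.intersection(*valid_sets) if valid_sets else set()


def precision_at_1(rankings: Dict[str, List[str]],
                   references: Dict[str, Set[str]]) -> float:
    """Percentage of contexts whose top-ranked question is in the reference set."""
    hits = sum(1 for ctx, ranked in rankings.items()
               if ranked and ranked[0] in references[ctx])
    return 100.0 * hits / len(rankings)


# Hypothetical judgments for the TALK example (not taken from the slides).
best = [{"What is EVPI?"}, {"How many candidates do you consider?"}]
valid = [{"What is EVPI?", "How many candidates do you consider?", "What is going on?"},
         {"What is EVPI?", "How many candidates do you consider?"}]
refs = {"TALK": union_of_best(best)}
print(precision_at_1({"TALK": ["How many candidates do you consider?", "When is lunch?"]}, refs))
```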


Research Questions for Experimentation
1.  Does a neural network architecture improve upon non-neural baselines?

2.  Are answers useful in identifying good questions?

3.  Does the EVPI formalism improve over a traditionally trained neural network?
Neural Baseline Model

o  Neural (c, q, a): a feedforward neural network over Context, Ques, and Ans LSTMs (built on a word embedding module over c_i, q_i, a_i) that outputs a value between 0 and 1

o  Both Neural (c, q, a) and EVPI (q|c, a) have a similar number of parameters
Human-based evaluation results on StackExchange (Union of Best, Precision@1)

Model                        Precision@1
EVPI (q|c, a)                27.7
Neural (c, q, a)             25.2
Neural (c, q)                21.9
Features (c, q)              23.1
Bag-of-ngrams (c, q, a)      19.4
Random                       17.5

o  Neural (c, q, a) vs. Features (c, q): non-linear vs. linear
o  Neural (c, q, a) vs. Neural (c, q): explicitly modeling "answer" is useful
o  EVPI (q|c, a) vs. Neural (c, q, a): the two mainly differ in their loss function

Train: 61,678   Tune: 7,710   Test: 500
Note: The difference between EVPI and all baselines is statistically significant with p < 0.05

Features (c, q): Nandi, Titas, et al. "IIT-UHH at SemEval-2017 Task 3: Exploring multiple features for community question answering and implicit dialogue identification." Workshop on Semantic Evaluation (SemEval-2017), 2017.
Talk Outline

o  How do we build the clarification questions dataset?
   ✓  Two datasets: StackExchange & Amazon

o  How do we rank clarification questions from an existing set?
   ✓  Answers are helpful in identifying useful questions
   ✓  The EVPI formalism outperforms a traditional neural network

o  How do we generate clarification questions from scratch?

o  How do we control the specificity of the generated clarification questions?

o  Future Directions
Talk Outline

o  How do we build the clarification questions dataset?
   ✓  Two datasets: StackExchange & Amazon

o  How do we rank clarification questions from an existing set?
   ✓  Answers are helpful in identifying useful questions
   ✓  The EVPI formalism outperforms a traditional neural network

o  How do we generate clarification questions from scratch?

o  Future Directions

o  Conclusion

Sudha Rao, Hal Daumé III, "Answer-based Adversarial Training for Generating Clarification Questions", In Submission
Issue with the ranking approach

o  It only regurgitates previously seen questions
   Existing contexts (contexts with Ubuntu OS): "What version of Ubuntu do you have?"
   New unseen contexts (contexts with Windows OS) need "What version of Windows do you have?", which a ranker can only offer if it has seen that question before

o  It relies on Lucene to get the initial set of candidate questions
Sequence-to-sequence neural network model

o  Given an input sequence, generate the output sequence one word at a time

o  Trained to maximize the likelihood of input-output pairs in the data

Example: the encoder reads the input A B C <EOS>; the decoder then generates W X Y Z <EOS>, feeding each generated word back in as its next input.

Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. NIPS 2014
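The word-by-word generation loop can be sketched as greedy decoding, where each step keeps the single most likely word. This is the simplest decoding strategy and not necessarily the one used in the cited work; `toy_next_word` is a hypothetical lookup table standing in for a trained decoder so the example runs without a model.

```python
from typing import Callable, List

EOS = "<EOS>"


def greedy_decode(next_word: Callable[[List[str], List[str]], str],
                  source: List[str], max_len: int = 20) -> List[str]:
    """Generate an output sequence one word at a time.

    next_word(source, output_so_far) returns the decoder's most likely next
    word; each chosen word is appended and fed back in on the next step.
    Decoding stops at <EOS> or after max_len words.
    """
    output: List[str] = []
    while len(output) < max_len:
        word = next_word(source, output)
        if word == EOS:
            break
        output.append(word)
    return output


# Hypothetical toy "decoder": the next word depends only on the previous one,
# starting from the <EOS> that ends the input (as in the A B C <EOS> example).
TOY_TRANSITIONS = {EOS: "W", "W": "X", "X": "Y", "Y": "Z", "Z": EOS}


def toy_next_word(source: List[str], output: List[str]) -> str:
    previous = output[-1] if output else EOS
    return TOY_TRANSITIONS[previous]


print(greedy_decode(toy_next_word, ["A", "B", "C", EOS]))  # ['W', 'X', 'Y', 'Z']
```

The `max_len` cap guards against a model that never emits <EOS>.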
Max-likelihood clarification question generation model

Context → Question Generator (Seq2seq) → Question
Loss function: Loss = -log Pr(q|c)

Issues:
o  Maximum-likelihood (MLE) training generates generic questions, e.g., "What are the dimensions?" or "Is this made in China?"
o  MLE relies heavily on the original question, but contexts can have multiple good questions

Li et al. A diversity-promoting objective function for neural conversation models. In NAACL, 2016.
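Under the usual seq2seq factorization, Pr(q|c) is the product of per-word probabilities, so the loss above becomes a sum of per-word negative log-probabilities. A minimal sketch, assuming those per-word probabilities have already been computed by the generator (the function name `mle_loss` is hypothetical):

```python
import math
from typing import Sequence


def mle_loss(word_probs: Sequence[float]) -> float:
    """Loss = -log Pr(q|c), with Pr(q|c) = prod_t Pr(q_t | q_1..q_{t-1}, c).

    word_probs[t] is the probability the generator assigned to the t-th word
    of the reference question (typically including the final <EOS>).
    """
    return -sum(math.log(p) for p in word_probs)


print(mle_loss([0.5, 0.5, 1.0]))  # -log(0.25) = log(4)
```

Because every training question contributes this loss for its own context, the model is pushed toward whatever phrasing is most frequent across contexts, which is one way to see why generic questions emerge.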
Max-utility based clarification question generation model

Context → Question Generator (Seq2seq) → Question
Answer Generator (Seq2seq) → Answer
Utility Calculator → Reward
The Answer Generator and the Utility Calculator together form the Reward Calculator.

Train the Question Generator to maximize this reward
Max-likelihood vs Max-utility

Both use the same Question Generator (Seq2seq), which maps a Context to a Question.

Max-likelihood
o  Objective: Maximize likelihood of (context, question) pairs
o  Loss function: Loss = -log Pr(q|c)
o  Differentiable

Max-utility
o  Objective: Maximize the reward computed by the Reward Calculator
o  Loss function: Loss = -reward(q|c)
o  Non-differentiable, similar to discrete metrics like BLEU & ROUGE

Therefore, we use Reinforcement Learning.

Ranzato, Marc'Aurelio, et al. "Sequence level training with recurrent neural networks." ICLR 2016
Reinforcement Learning for Clarification Question Generation

Context → Question Generator (Seq2seq) → Question → Reward Calculator → Reward

Loss = -reward(q^s|c)

Key Idea:
✓  Estimate loss by drawing samples ("questions")

Paper excerpt shown on the slide:
Similar to optimizing metrics like BLEU and ROUGE, this UTILITY function also operates on discrete text outputs, which makes optimization difficult due to non-differentiability. A successful recent approach dealing with the non-differentiability while also retaining some advantages of maximum-likelihood training is the Mixed Incremental Cross-Entropy Reinforce (Ranzato et al., 2015) algorithm (MIXER). In MIXER, the overall loss L is differentiated as in REINFORCE (Williams):

$L(\theta) = -\mathbb{E}_{q^s \sim p_\theta}\left[r(q^s)\right]; \qquad \nabla_\theta L(\theta) = -\mathbb{E}_{q^s \sim p_\theta}\left[r(q^s)\, \nabla_\theta \log p_\theta(q^s)\right] \qquad (3)$

where q^s is a random output sample drawn according to the model p_θ with parameters θ. We then approximate the expected gradient using a single sample q^s = (q^s_1, q^s_2, ..., q^s_T) from the model distribution (p_θ). In REINFORCE, the policy is initialized randomly, which can cause long convergence times. To solve this, MIXER starts by optimizing maximum likelihood and slowly shifts to optimizing the expected reward from Eq 3: for the initial time steps, MIXER optimizes the maximum-likelihood objective, and for the remaining time steps, it optimizes the external reward.

In our model, we minimize the UTILITY-based loss L_max-utility defined as:

$L_{\text{max-utility}} = -\left(r(q^p) - r(q^b)\right) \sum_{t=1}^{T} \log p(q_t \mid q_1, q_2, \ldots, q_{t-1}, c_t) \qquad (4)$

where r(q^p) is the UTILITY-based reward on the predicted question and r(q^b) is a baseline reward used to reduce the high variance otherwise observed when using REINFORCE.
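Eq. (4) can be evaluated for one sampled question as below, assuming the per-word log-probabilities and both rewards are already available; the function name `max_utility_loss` is hypothetical. Only the scalar loss is computed here; in an autodiff framework the rewards would be treated as constants so that the gradient flows through the log-probabilities, as in REINFORCE.

```python
import math
from typing import Sequence


def max_utility_loss(word_log_probs: Sequence[float],
                     reward_predicted: float, reward_baseline: float) -> float:
    """Eq. (4): -(r(q^p) - r(q^b)) * sum_t log p(q_t | q_1..q_{t-1}, c_t).

    word_log_probs[t] is log p of the t-th word of the predicted question q^p.
    """
    advantage = reward_predicted - reward_baseline
    return -advantage * sum(word_log_probs)


log_probs = [math.log(0.5), math.log(0.5)]
print(max_utility_loss(log_probs, reward_predicted=0.9, reward_baseline=0.4))
```

If the predicted question earns more reward than the baseline, minimizing this loss raises the log-probability of its words; if it earns less, it lowers them.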
Reinforcement Learning for Clarification Question Generation

Paper excerpt shown on the slide:
Recently, Rao & Daumé III (2018) observed that the usefulness of a question can be better measured as the utility that would be obtained if the context were updated with the answer to the proposed question. We use this observation to define a UTILITY-based reward function and train the question generator to optimize this reward. We train the UTILITY reward to predict the likelihood that a question would generate an answer that would increase the utility of the context by adding useful information to it (see §2.3 for details). Similar to optimizing metrics like BLEU and ROUGE, this UTILITY function also operates on discrete text outputs, which makes optimization difficult due to non-differentiability.

Context → Question Generator (Seq2seq) → Question

Loss = -reward(q^s|c)

Key Idea:
✓  Estimate loss by drawing samples ("questions")
✓  Differentiate the
, where
loss Mixed
Reinforce ✓ areIncremental
(Ranzato
the parameters Cross-Entropy
et al.,of2015)
the
1992): algorithm (M ). In M , the overall s loss sL is differentiate
We (Mthen ). In M IXERthe
IXERapproximate , the s overall
expected loss
gradient IXER
L is differentiated
using
s a single
IXER s as
sample in R qEINFORCE
= (q
s 1 s 2s, q (Williams,
s
, ..., q s
T) s
L(✓)
L(✓) = = E E q q ⇠p ✓✓
r(q
r(q s
L(✓)) )
1992): =
;; rrE ✓ q
✓ L(✓)
L(✓)
Loss r(q
odel distribution (p✓ ). In R EINFORCE, the policy is initialized random, which can cause
ss ⇠p s ⇠p ✓ = =
= - )EE qqs;s⇠p
⇠p ✓
r✓
r(q
r(q✓ logs
L(✓) )r
)r Pr(q
✓ =
✓ log
logs|c)Epp q
✓✓ (q
(q⇠p
s
)
reward(q
s )

r(q )r|c) ✓ log (3)
p ✓ (q
gence
sisaarandom times.
random
L(✓) where ToEsolve
output
=Question
output yqss⇠p is
samplethis,
r(q s )M
✓a random
sample ; output
according
IXER
according rstartsto
✓ L(✓) by
the=optimizing
tosample
the
L(✓) model
model
-= Eq s E
accordingpp✓q✓,s✓,⇠p
⇠p maximum
where
r(q s s s|c)
✓to)r
where r(q
reward(q the are;likelihood
model
✓✓✓) are
log the
pthe
✓r
s log and
), where
parameters
(qpparameters
✓✓L(✓) Pr(q= slowly
✓E are
of
of
s|c)
(3)
q sthe
⇠pthe✓ r(
p
imizing
WeWethen thenthe expectedWe
network.
approximate
approximate reward
the then
the froms Eq
approximate
expected
expected 3. For
gradient
gradient the the
using initial
expected
using aa single time
gradient
single sample
sample steps,
using qqssM a=
= single
IXER (q
(q ss ,optimizes
, qqsample
ss , ..., ssq s =
2, ..., qthe
Tp) , w
is a random
r the remaining output sample where
)Intime according y
steps, it(pis a
to random
the
optimizes model output
the p ,
external
✓ sample
where ✓ according
reward. are the to
parametersthe
11 model
2 of T ✓
model
model from the
distribution
distribution (T(p model
(p ✓).). In distribution
RR EINFORCE
EINFORCE
network. We , , ).
the
the
then In R
policy
policy is
is
EINFORCE
approximate initialized
initialized ,
the the policy
random,
random,
expected is initialized
which
which
gradient can
canusingrandom,
cause
cause a sing w
We then approximate ✓ the expected gradient ✓
using a single sample q s
= (q s
, q s
, ..., q s
)
ergence
rgence long
times.
times.
el, we distribution
minimize convergence
To
Reward
To solve
solve
the(pU this,
this, times.
MM IXER
IXER
-based To solve
starts
starts
loss, the by this,
by M
optimizing
optimizing IXER starts maximum
maximum by optimizing
likelihood
likelihood maximum
1 and
and 2 slowly
slowly Tlikeli
model Calculator ✓ ). Infrom
TILITY R EINFORCEthe model policy defined
Ldistribution
max-utility (p✓ ). as:
is initialized In Rrandom,EINFORCE , the can
which policy causeis initi
ptimizingshifts
ptimizing the to optimizing
theexpected
expected reward
rewardlong the from
from expected
Eq
Eq 3.
3. reward
For
For the
the from
initial
initial Eq 3.
time
time For the
steps,
steps, initial
M
M time
optimizes
IXER optimizes
IXER steps, M
ergence times. To solve this, Mconvergence
IXER starts by times.
optimizing To solve maximumthis, M IXER likelihoodstarts by andoptimizing
slowly
forthe
or theremainingLmle and
remaining (Tfor the))shifts
(T remaining
timesteps,
time steps,
to itToptimizes
itX
(T
optimizing optimizes timeexpected
) the thesteps,
the external it reward
external optimizesreward.
reward. from theEq external
3. For reward.
the initial
ptimizing the expected reward p from b Eq 3. For the initial time steps, M IXER optimizes
or
del,
del, the
we L
weremaining
In Reward =
our model,
minimize
max-utility
minimize theU(r(q
the
(T we
UTILITY
TILITY
)minimize
)Ltime r(q
and
-based
mle-based
))loss
for
steps, itthe
the
loss log
optimizes
UL LTILITYp(qt |q
remaining
max-utility
max-utility the
-based ,(T qexternal
defined
1defined 2 , loss
...,as:q)L
ttime
as: 1 , ctsteps,
reward. ) defined
max-utility it optimizesas: (4) the ext
REINFORCE: Ronaldt=1 J Williams. Simple statistical gradient-following algorithms for
del, we minimize the U In our model,
-based weLminimizedefined
loss
TILITY
X
the U as: -based loss L
TT max-utility
connectionist reinforcement
X b
TILITY
learning. Machine defin
max-utility
Tlearning , 8(3-4):229–256, 1992.
) is the ULLTILITY based reward onr(q
the X
max-utility=
max-utility (r(qppL
= (r(q ))predicted
bb ))
))max-utility
r(q = T (r(q
log pquestion
) tt|q
logp(q
p(q |qr(q b and r(q ) is a baseline reward
11,,qq2)) ...,qqtlog
2,,..., t 11,,p(q q2T, ..., qt 1 ,(4)
cctt))t |q1 ,114 ct )
o reduce the high variance otherwise observed X when using R EINFORCE . X
Lmax-utility = (r(q ) r(q )) t=1 log
p b t=1 Lmax-utility
p(qt |q1 ,=q2 , ...,t=1 p
(r(q qt )1 , cr(qt ) ))
b
log p(qt(4)|q1 , q2
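The two ✓ steps of the key idea can be sketched as a single-sample surrogate loss. This is a minimal illustration under two assumptions: the per-token log-probabilities of one sampled question q^s are already available as plain floats, and an autodiff framework takes the gradient in a real model. The function names are hypothetical.

```python
import math
from typing import Sequence


def sequence_log_prob(token_log_probs: Sequence[float]) -> float:
    """log Pr(q^s | c) = sum over t of log p(q^s_t | q^s_<t, c)."""
    return sum(token_log_probs)


def reinforce_loss(reward: float, token_log_probs: Sequence[float]) -> float:
    """Surrogate loss -reward(q^s) * log Pr(q^s | c) for one sampled question.

    The reward is treated as a constant, so the gradient of this value with
    respect to the model parameters is the single-sample REINFORCE estimate
    of grad L(theta) in Eq. (3).
    """
    return -reward * sequence_log_prob(token_log_probs)


# Toy sampled question with three tokens of probability 0.5, 0.25 and 0.5.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5)]
print(reinforce_loss(reward=0.8, token_log_probs=log_probs))
```

If the UTILITY reward is a probability in [0, 1], every sampled question is reinforced to some degree. Subtracting a baseline, as in Eq. (4), lets below-average samples be pushed down instead.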
Reinforcement Learning for Clarification Question Generation

Key Idea:
Context → Question Generator (Seq2seq) → Question → Reward Calculator → Reward
✓  Estimate loss by drawing samples (“questions”)
✓  Differentiate the loss as in REINFORCE
✓  Mixed Incremental Cross-Entropy Reinforce (MIXER)

Loss = − (r(q^s) − r(q^b)) log Pr(q^s | c)

Here r(q^b) is a baseline reward introduced to reduce the high variance otherwise observed when using REINFORCE. In MIXER, the baseline is estimated using a linear regressor that takes the current hidden states of the model as input and is trained to minimize the mean squared error $\|r(q^p) - r(q^b)\|^2$. Instead, we use a self-critical training approach (Rennie et al., 2017). There, the baseline is the reward the current model obtains under greedy decoding, which is the decoding used at test time.

Ranzato et al. Sequence level training with recurrent neural networks. ICLR 2016.

115
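The pieces on this slide fit together for one training question as follows. This is a minimal sketch, assuming per-token log-probabilities are given as floats:

- The first Δ steps use maximum likelihood on the reference question.
- The remaining steps use the sampled continuation, weighted by the self-critical advantage r(q^p) − r(q^b).
- r(q^b) is the reward of the greedy-decoded question.

The function names are illustrative assumptions, and so is the annealing schedule for Δ (start at the full length, then shrink by a fixed step). Neither reflects the exact settings used in this work.

```python
from typing import Sequence


def mixer_loss(
    prefix_ref_log_probs: Sequence[float],
    continuation_log_probs: Sequence[float],
    sampled_reward: float,
    baseline_reward: float,
) -> float:
    """Loss for one example under a MIXER-style split with a self-critical baseline.

    prefix_ref_log_probs   : log p of the reference tokens for the first delta steps
    continuation_log_probs : log p of the tokens sampled for the remaining T - delta steps
    sampled_reward         : r(q^p), UTILITY reward of the sampled question
    baseline_reward        : r(q^b), reward of the greedy-decoded question
    """
    # Maximum-likelihood (cross-entropy) term on the first delta reference tokens.
    mle_term = -sum(prefix_ref_log_probs)
    # REINFORCE term on the sampled continuation, weighted by the advantage (Eq. 4).
    advantage = sampled_reward - baseline_reward
    rl_term = -advantage * sum(continuation_log_probs)
    return mle_term + rl_term


def mixer_delta(epoch: int, max_len: int, xent_epochs: int, step: int, epochs_per_stage: int) -> int:
    """Hypothetical annealing schedule for delta: pure MLE (delta = max_len) for the
    first `xent_epochs`, then shrink delta by `step` every `epochs_per_stage` epochs."""
    if epoch < xent_epochs:
        return max_len
    stages = (epoch - xent_epochs) // epochs_per_stage + 1
    return max(0, max_len - step * stages)


print(mixer_loss([-0.7], [-0.4, -1.2], sampled_reward=0.8, baseline_reward=0.6))
```

When the sampled question scores exactly as well as the greedy one, the advantage is zero and only the maximum-likelihood prefix contributes. A sample that beats greedy decoding is reinforced, and one that does worse is discouraged.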
Max-utility based clarification question generation model

Context → Question Generator (Seq2seq) → Question → Reward Calculator → Reward

Train Question Generator to Maximize this Reward using Reinforcement Learning

116
Max-utility based clarification question generation model

Context → Question Generator (Seq2seq) → Question → Reward Calculator (Trained Offline) → Reward

Train Question Generator to Maximize this Reward using Reinforcement Learning

117
Max-utility based clarification question generation model

Context → Question Generator (Seq2seq) → Question → Reward Calculator (Train it along with Question Generator) → Reward

Train Question Generator to Maximize this Reward using Reinforcement Learning

118
Generative Adversarial Networks (GAN) based training

Generator: Context → Question Generator (Seq2seq) → Question (Model Data)

119
Generative Adversarial Networks (GAN) based training

Generator: Context → Question Generator (Seq2seq) → Question (Model Data)
Discriminator: Reward Calculator

120
Generative Adversarial Networks (GAN) based training
Generator: Context → Question Generator (Seq2seq) → Question (Model Data)
Discriminator: Reward Calculator → Reward
Real Data: (context, question, answer)

✓  Discriminator tries to distinguish between real and model data
✓  Generator tries to fool the discriminator by generating real-looking data

121
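The two ✓ statements on the GAN slide correspond to the standard GAN objectives. This is a minimal sketch, assuming the discriminator outputs a probability that a (context, question, answer) triple is real. In the GAN-Utility model, this role is played by the reward (utility) calculator, and its score on a generated triple serves as the generator's reward. The function names are hypothetical.

```python
import math


def discriminator_loss(p_real: float, p_fake: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one real triple and one model-generated triple.

    p_real: discriminator's probability that a real (context, question, answer) is real
    p_fake: discriminator's probability that a generated triple is real
    The discriminator minimizes this: push p_real toward 1 and p_fake toward 0.
    """
    return -(math.log(p_real + eps) + math.log(1.0 - p_fake + eps))


def generator_reward(p_fake: float) -> float:
    """The generator 'fools' the discriminator by raising p_fake; because text is
    discrete, this score is used as the reward r(q^s) in a REINFORCE/MIXER update."""
    return p_fake


print(discriminator_loss(0.9, 0.1))  # confident, correct discriminator: low loss
print(discriminator_loss(0.5, 0.5))  # undecided discriminator: about 2 log 2
```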
GAN-Utility based Clarification Question Generation Model
Generator: Context → Question Generator (Seq2seq) → Question
Discriminator: Reward Calculator → Reward
Real Data: (context, question, answer)

Train Question Generator to Maximize this Reward using Reinforcement Learning

122
Our clarification question generation model (in summary)

123
Our clarification question generation model (in summary)
Sequence-to-sequence model trained using MLE
Context → Question Generator (Seq2seq) → Question

124
Our clarification question generation model (in summary)
Sequence-to-sequence model trained using RL
Context → Question Generator (Seq2seq) → Question → Answer Generator (Seq2seq) → Answer → Utility Calculator → Reward

Train Question Generator to Maximize this Reward

125
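In the RL summary diagram, the reward comes from chaining three models. The sketch below traces that data flow, with each trained Seq2seq or utility model replaced by a plain callable. The signatures and the toy stand-ins in the usage example are hypothetical, chosen only to make the flow concrete.

```python
from typing import Callable, Tuple

# Hypothetical interfaces for the three trained components.
QuestionGenerator = Callable[[str], str]              # context -> question
AnswerGenerator = Callable[[str, str], str]           # (context, question) -> answer
UtilityCalculator = Callable[[str, str, str], float]  # (context, question, answer) -> reward


def utility_reward(
    context: str,
    question_gen: QuestionGenerator,
    answer_gen: AnswerGenerator,
    utility_calc: UtilityCalculator,
) -> Tuple[str, float]:
    """Generate a question, predict its answer, and score how much that answer
    would add to the context. Returns (question, reward)."""
    question = question_gen(context)
    answer = answer_gen(context, question)
    return question, utility_calc(context, question, answer)


# Toy stand-ins: the "utility" is 1.0 if the answer mentions something the context lacks.
question, reward = utility_reward(
    "Shower curtain, 72 x 72 inches, polyester.",
    question_gen=lambda c: "is this shower curtain mildew resistant ?",
    answer_gen=lambda c, q: "yes , it is treated to resist mildew",
    utility_calc=lambda c, q, a: 1.0 if "mildew" in a and "mildew" not in c.lower() else 0.0,
)
print(question, reward)
```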
Our clarification question generation model (in summary)
Sequence-to-sequence model trained using GAN
Generator: Context → Question Generator (Seq2seq) → Question → Answer Generator (Seq2seq) → Answer
Discriminator: Utility Calculator → Reward

Train Question Generator to Maximize this Reward

126
Example outputs

Original: are these pillows firm and do they keep their shape

Max-Likelihood: what is the size of the pillow ?

GAN-Utility: does this pillow come with a cover or does it have a zipper ?

127
Example outputs

Original: are these pillows firm and do they keep their shape

Max-Likelihood: what is the size of the pillow ?

GAN-Utility: does this pillow come with a cover or does it have a zipper ?

Original: does it come with a shower hook or ring ?

Max-Likelihood: is it waterproof ?

GAN-Utility: is this shower curtain mildew resistant ?

128
Error Analysis of GAN-Utility model

Incompleteness

what is the size of the towel ? i 'm looking for something to be able to use it for

Word repetition

what is the difference between this and the picture of the cuisinart deluxe
deluxe deluxe deluxe deluxe deluxe deluxe

129
Research Questions for Experimentation

1.  Do generation models outperform simpler retrieval baselines?

130
Research Questions for Experimentation

1.  Do generation models outperform simpler retrieval baselines?

2.  Does maximizing reward improve over max-likelihood training?

131
Research Questions for Experimentation

1.  Do generation models outperform simpler retrieval baselines?

2.  Does maximizing reward improve over max-likelihood training?

3.  Does adversarial training improve over pretrained reward calculator?

132
Research Questions for Experimentation

1.  Do generation models outperform simpler retrieval baselines?

2.  Does maximizing reward improve over max-likelihood training?

3.  Does adversarial training improve over pretrained reward calculator?

4.  How do models perform when evaluated for specificity and usefulness?

133
Human-based Evaluation Design

Context
Evaluation set size: 500

Generated Question

•  How relevant is the question?

•  How grammatical is the question?

•  How specific is it to the product?

•  Does this question ask for new information?

•  How useful is this question to a potential buyer?

Note: We use a crowdsourcing platform called Figure-Eight

134


Human-based Evaluation Design

Context
Evaluation set size: 500

Generated Question

•  How relevant is the question?

•  How grammatical is the question?
   (Relevance and grammaticality: all models equal and close to reference)

•  How specific is it to the product?

•  Does this question ask for new information?

•  How useful is this question to a potential buyer?

Note: We use a crowdsourcing platform called Figure-Eight

135


Human-based Evaluation Results on Amazon Dataset

How specific is the question to the given context?

136
Human-based Evaluation Results on Amazon Dataset

How specific is the question to the given context?

Original 3.07

Specificity score

137
Human-based Evaluation Results on Amazon Dataset

How specific is the question to the given context?

Lucene 2.8 (Information Retrieval)

Original 3.07

Specificity score

138
Human-based Evaluation Results on Amazon Dataset

How specific is the question to the given context?

Max-Likelihood 2.84
Lucene 2.8
(Learning vs. Non-learning)

Original 3.07

Specificity score

139
Human-based Evaluation Results on Amazon Dataset

How specific is the question to the given context?

Max-Utility 2.88 (Reinforcement Learning)
Max-Likelihood 2.84

Lucene 2.8

Original 3.07

Specificity score

140
Human-based Evaluation Results on Amazon Dataset

How specific is the question to the given context?

GAN-Utility 2.99 (Adversarial Training)
Max-Utility 2.88

Max-Likelihood 2.84

Lucene 2.8

Original 3.07

Specificity score
Note: Difference between GAN-Utility and all others is statistically significant with p < 0.001

141
Human-based Evaluation Results on Amazon Dataset

Does the question ask for new information?

GAN-Utility 2.51
Max-Utility 2.47
(Difference statistically insignificant)

Max-Likelihood 2.48

Lucene 2.56

Original 2.68

New information score

142
Human-based Evaluation Results on Amazon Dataset

How useful is this question to a potential buyer?

GAN-Utility 0.94
Max-Utility 0.9
(Difference statistically insignificant)

Max-Likelihood 0.93

Lucene 0.77

Original 0.79

Usefulness score

143
Talk Outline

o  How do we build the clarification questions dataset?
✓  Two datasets: StackExchange & Amazon

o  How do we rank clarification questions from an existing set?
✓  Answers are useful in identifying useful questions
✓  EVPI formalism outperforms a traditional neural network

o  How do we generate clarification questions from scratch?
✓  Sequence-to-sequence model generates relevant & useful questions
✓  Adversarial training generates questions more specific to context

o  How do we control specificity of the generated clarification questions?

o  Future Directions

144
Generic versus specific questions

Amazon

Generic questions                 Specific questions
Where was this manufactured?      Is this induction safe?
What is the warranty?             Is ladle included in the set?

146
Sequence-to-sequence model for question generation

Training data (Input → Output):
Context → Question
Context → Question
Context → Question

Context → Question Generator (Seq2seq) → Question

147
Sequence-to-sequence model for controlling specificity

Training data (Input → Output):
Context <specific> → Specific Question
Context <generic> → Generic Question
Context <generic> → Generic Question

Question Generator (Seq2seq)

Sennrich et al. Controlling politeness in neural machine translation via side constraints. NAACL 2016

148
Sequence-to-sequence model for controlling specificity

Training data (Input → Output):
Context <specific> → Specific Question
Context <generic> → Generic Question
Context <generic> → Generic Question

Test: Context <specific> → Question Generator (Seq2seq) → Specific Question

Sennrich et al. Controlling politeness in neural machine translation via side constraints. NAACL 2016

149
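The side-constraint idea from Sennrich et al. (2016) changes only the input text. A tag is attached to each training context according to the specificity of its question, and at test time the desired tag is attached to a new context. This is a minimal sketch, assuming the tag is appended as one extra token at the end of the context (as drawn on the slide) and that specificity labels are already available. The helper names and the toy contexts are hypothetical.

```python
from typing import Iterable, List, Tuple

TAGS = {"specific": "<specific>", "generic": "<generic>"}


def add_side_constraint(context: str, label: str) -> str:
    """Append the specificity tag token to the source context."""
    if label not in TAGS:
        raise ValueError(f"unknown specificity label: {label!r}")
    return f"{context} {TAGS[label]}"


def build_training_pairs(examples: Iterable[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """(context, question, label) triples -> (tagged context, question) pairs
    that an ordinary Seq2seq question generator can be trained on."""
    return [(add_side_constraint(c, label), q) for c, q, label in examples]


pairs = build_training_pairs([
    ("8-piece stainless steel cookware set", "is this induction safe ?", "specific"),
    ("8-piece stainless steel cookware set", "what is the warranty ?", "generic"),
])
# At test time, ask for a specific question by tagging the new context the same way.
test_input = add_side_constraint("digital meat thermometer with probe", "specific")
print(pairs)
print(test_input)
```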
Annotating questions with level of specificity

o  We need annotations on training data
o  Manually annotating is expensive

Training data (Input → Output):
Context <specific> → Specific Question
Context <generic> → Generic Question
Context <generic> → Generic Question

150
Annotating questions with level of specificity

o  We need annotations on training data
o  Manually annotating is expensive
o  Hence
➢  Ask humans¹ to annotate a set of 3000 questions
➢  Train a machine learning model to automatically annotate the rest

Training data (Input → Output):
Context <specific> → Specific Question
Context <generic> → Generic Question
Context <generic> → Generic Question

¹ We use a crowdsourcing platform called Figure-Eight

151
Specificity classifier

Training data (Input → Output):
Context, Question → specific
Context, Question → generic
Context, Question → specific

Specificity Classifier

Louis & Nenkova. "Automatic identification of general and specific sentences by leveraging discourse annotations.” IJCNLP 2011

152
Specificity classifier

Training data (Input → Output):
Context, Question → specific
Context, Question → generic
Context, Question → specific

Test: Input Question → Specificity Classifier → Output: specific OR generic

Louis & Nenkova. "Automatic identification of general and specific sentences by leveraging discourse annotations.” IJCNLP 2011

153
Specificity classifier

Training data (Input → Output):
Context, Question → specific
Context, Question → generic
Context, Question → specific

Test: Input Question → Specificity Classifier → Output: specific OR generic

Features for training logistic regression model:
✓  Question Length
✓  Path of question word in WordNet
✓  Syntax
✓  Polarity
✓  Question bag-of-words
✓  Average word embeddings

Louis & Nenkova. "Automatic identification of general and specific sentences by leveraging discourse annotations.” IJCNLP 2011

154
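The specificity classifier is a logistic regression over features of the question. Below is a toy sketch that implements only two of the listed feature groups (question length and bag-of-words), with plain full-batch gradient descent on the log loss. The omission of the WordNet, syntax, polarity and embedding features, the hyperparameters, and the function names are assumptions for illustration. The four training questions are the generic/specific examples shown earlier in the talk.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def features(question: str) -> Dict[str, float]:
    """Question length and bag-of-words features (other feature groups omitted)."""
    tokens = question.lower().split()
    feats: Dict[str, float] = {"__bias__": 1.0, "__length__": len(tokens) / 10.0}
    for token, count in Counter(tokens).items():
        feats["bow=" + token] = float(count)
    return feats


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_specific(weights: Dict[str, float], question: str) -> float:
    """P(specific | question) under the logistic regression model."""
    feats = features(question)
    return sigmoid(sum(weights.get(name, 0.0) * value for name, value in feats.items()))


def train(data: Sequence[Tuple[str, int]], epochs: int = 300, lr: float = 0.5) -> Dict[str, float]:
    """Full-batch gradient descent on the average log loss (1 = specific, 0 = generic)."""
    weights: Dict[str, float] = {}
    featurized: List[Tuple[Dict[str, float], int]] = [(features(q), y) for q, y in data]
    for _ in range(epochs):
        grad: Dict[str, float] = {}
        for feats, y in featurized:
            z = sum(weights.get(name, 0.0) * value for name, value in feats.items())
            error = sigmoid(z) - y
            for name, value in feats.items():
                grad[name] = grad.get(name, 0.0) + error * value
        for name, g in grad.items():
            weights[name] = weights.get(name, 0.0) - lr * g / len(featurized)
    return weights


toy_data = [
    ("where was this manufactured ?", 0),
    ("what is the warranty ?", 0),
    ("is this induction safe ?", 1),
    ("is ladle included in the set ?", 1),
]
model = train(toy_data)
print(round(predict_specific(model, "is this induction safe ?"), 2))
```

With real data, the classifier trained on the crowdsourced annotations labels the rest of the training questions. Those labels then supply the side-constraint tags for the generator.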
Summary of specificity-controlled question generation model

Specificity Classifier
Training data (Input → Output): Context, Question → specific / generic
Test: Input Question → Output: specific OR generic

Question Generation Model
Training data (Input → Output): Context <specific> → Specific Question; Context <generic> → Generic Question
Test: Input Context <specific> → Output: Specific Question

155
Specificity classifier results (with feature ablation)

Feature set                  Test accuracy   Training accuracy
All features                 0.73            0.79
Question bag-of-words        0.71            0.80
Syntax                       0.70            0.71
Average word embeddings      0.64            0.66
Polarity                     0.65            0.65
Path in WordNet              0.64            0.63

156
Example Outputs

Original: can this thermometer be left inside of a roast as it cooks ?

Max-Likelihood: is this thermometer dishwasher safe ?

GAN-Utility: is this a leave-in ?

Specificity-MLE (g): is it made in the usa ?

Specificity-MLE (s): can you use this thermometer to make a turkey ?

Specificity-GAN (g): is this dishwasher safe ?

Specificity-GAN (s): does this thermometer have a timer ?

157
Automatic metric based evaluation of question generation

Diversity

GAN-Utility 0.13

MLE 0.12

Diversity = Proportion of unique trigrams in the question

158
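The diversity metric on this slide can be computed in a few lines. The slide defines it as the proportion of unique trigrams in the question. This sketch assumes the count is pooled over all generated questions in the evaluation set, so repeated generic questions lower the score. It also assumes whitespace tokenization. Both assumptions, and the function names, are illustrative. The sample outputs are taken from the Example Outputs slide.

```python
from typing import Iterable, List, Tuple


def trigrams(tokens: List[str]) -> List[Tuple[str, ...]]:
    """All contiguous word trigrams of a tokenized question."""
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def trigram_diversity(questions: Iterable[str]) -> float:
    """Proportion of unique trigrams among all trigrams of the generated questions."""
    all_trigrams: List[Tuple[str, ...]] = []
    for question in questions:
        all_trigrams.extend(trigrams(question.split()))
    if not all_trigrams:
        return 0.0
    return len(set(all_trigrams)) / len(all_trigrams)


outputs = [
    "is this dishwasher safe ?",
    "is this dishwasher safe ?",
    "does this thermometer have a timer ?",
]
print(trigram_diversity(outputs))
```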
Automatic metric based evaluation of question generation

Model                      Diversity   Diversity (specific)
Specificity-GAN-Utility    -           0.14
Specificity-MLE            -           0.16
GAN-Utility                0.13        -
MLE                        0.12        -

Diversity = Proportion of unique trigrams in the question

159
Automatic metric based evaluation of question generation

Model                      Diversity   Diversity (specific)   Diversity (generic)
Specificity-GAN-Utility    -           0.14                   0.1
Specificity-MLE            -           0.16                   0.1
GAN-Utility                0.13        -                      -
MLE                        0.12        -                      -

Diversity = Proportion of unique trigrams in the question

160
Automatic metric based evaluation of question generation

BLEU (specific)

Specificity-GAN-Utility 2.95

Specificity-MLE 4.45

GAN-Utility 2.69

MLE 1.41

161
Automatic metric based evaluation of question generation

Model                      BLEU (specific)   BLEU (generic)
Specificity-GAN-Utility    2.95              12.84
Specificity-MLE            4.45              12.61
GAN-Utility                2.69              12.01
MLE                        1.41              12.61

162
Talk Outline

o  How do we build the clarification questions dataset?
✓  Two datasets: StackExchange & Amazon

o  How do we rank clarification questions from an existing set?
✓  Answers are useful in identifying useful questions
✓  EVPI formalism outperforms a traditional neural network

o  How do we generate clarification questions from scratch?
✓  Sequence-to-sequence model generates relevant & useful questions
✓  Adversarial training generates questions more specific to context

o  How do we control specificity of the generated clarification questions?

o  Future Directions

163
1. Using multi-modal context (Text + Image)

MODEL                          Generated Question
Using product description:     Does the set include a ladle?
Using description + image:     Are they induction compatible?
2. Knowledge-grounded question asking

Operating systems Knowledge Base:  ✓ <version>  ✓ <bit>
Toaster Knowledge Base:  ✓ <dimensions>  ✓ <watts>

Post related to Ubuntu Operating System → What version of Ubuntu are you using?
Product description about Toaster → What are the dimensions of the toaster?
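A toy sketch of knowledge-grounded asking: a hypothetical knowledge base maps a category to the attributes worth asking about, each with a question template, and the system asks about the first attribute the post or description does not mention. `KNOWLEDGE_BASE`, the templates and the substring check are all illustrative assumptions; a real system would need entity linking and a learned model rather than string matching.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical knowledge base: category -> [(attribute, question template), ...]
KNOWLEDGE_BASE: Dict[str, List[Tuple[str, str]]] = {
    "operating system": [
        ("version", "What version of {name} are you using?"),
        ("bit", "Are you using the 32-bit or 64-bit version of {name}?"),
    ],
    "toaster": [
        ("dimensions", "What are the dimensions of the {name}?"),
        ("watts", "How many watts does the {name} use?"),
    ],
}


def ask_missing(category: str, name: str, context: str) -> Optional[str]:
    """Ask about the first knowledge-base attribute that the context does not mention."""
    text = context.lower()
    for attribute, template in KNOWLEDGE_BASE.get(category, []):
        if attribute not in text:  # naive substring check for "is this already covered?"
            return template.format(name=name)
    return None  # every known attribute is already covered


print(ask_missing("operating system", "Ubuntu", "My Ubuntu install hangs after an update"))
print(ask_missing("toaster", "toaster", "A 2-slice toaster rated at 800 watts"))
```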
3. Towards more intelligent dialog agents

Please bring me my coffee mug from the kitchen
What color is your coffee mug?
Black
I found two black mugs. Is yours the one with the NFL logo?
CONCLUSION

✓  Identify the importance of teaching machines to ask clarification questions

✓  Create a dataset of clarification questions (StackExchange & Amazon)

✓  Novel model for ranking clarification questions

✓  Novel model for generating clarification questions

✓  Novel model for generating specificity-controlled clarification questions
Collaborators

UMD: Hal Daumé III (my wonderful advisor :)), Philip Resnik, Marine Carpuat, Allyson Ettinger, Yogarshi Vyas, Xing Niu
ISI Internship: Daniel Marcu, Kevin Knight
Grammarly Internship: Joel Tetreault
MSR Internship: Paul Mineiro
Acknowledgements

➢  Thesis committee members:
Hal Daumé III
Philip Resnik
Marine Carpuat
Jordan Boyd-Graber
David Jacobs
Lucy Vanderwende (University of Washington)

➢  CLIP lab members

➢  Friends and family


Publications

o  Clarification Questions
✓  Sudha Rao, Hal Daumé III, "Learning to Ask Good Questions: Ranking Clarification Questions using Neural Expected Value of Perfect Information", ACL 2018 (Best Long Paper Award)
✓  Sudha Rao, Hal Daumé III, "Answer-based Adversarial Training for Generating Clarification Questions", in submission

o  Formality Style Transfer
✓  Sudha Rao, Joel Tetreault, "Dear Sir or Madam, May I introduce the GYAFC Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer", NAACL 2018
✓  Xing Niu, Sudha Rao, Marine Carpuat, "Multi-task Neural Models for Translating Between Styles Within and Across Languages", COLING 2018

o  Semantic Representations
✓  Sudha Rao, Yogarshi Vyas, Hal Daumé III, Philip Resnik, "Parser for Abstract Meaning Representation using Learning to Search", Meaning Representation Parsing, NAACL 2016
✓  Sudha Rao, Daniel Marcu, Kevin Knight, Hal Daumé III, "Biomedical Event Extraction using Abstract Meaning Representation", Biomedical Natural Language Processing, ACL 2017

o  Zero Pronoun Resolution
✓  Sudha Rao, Allyson Ettinger, Hal Daumé III, Philip Resnik, "Dialogue focus tracking for zero pronoun resolution", NAACL 2015
Backup Slides

Generalization beyond large datasets

✓  Bootstrapping process:
1.  Use a template-based approach or human writers to create an initial set of questions
2.  Train a model on this small set of questions and use it to generate more
3.  Add these (noisy) questions to the training data and retrain

✓  Domain adaptation:
1.  Find a similar domain that has a large number of clarification questions
2.  Train neural network parameters on the out-of-domain data and fine-tune them on the in-domain data

✓  Use reading comprehension question data (e.g., SQuAD):
1.  Remove the answer sentence from the passage
2.  The question then becomes a clarification question about the remaining passage

✓  The EVPI idea can be applied to identify "good" questions among several template-based questions
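The SQuAD conversion step can be sketched as follows. It assumes SQuAD-style input where `answer_start` is the character offset of the answer span, and it uses a naive regex sentence splitter; both the function name and the splitting rule are illustrative choices, not part of the original proposal.

```python
import re
from typing import List, Optional, Tuple


def to_clarification_example(passage: str, question: str,
                             answer_start: int) -> Optional[Tuple[str, str]]:
    """Drop the sentence containing the answer; the question now asks for missing information."""
    # Naive sentence splitter: a boundary is whitespace that follows '.', '!' or '?'.
    spans: List[Tuple[int, int]] = []
    start = 0
    for match in re.finditer(r"(?<=[.!?])\s+", passage):
        spans.append((start, match.start()))
        start = match.end()
    spans.append((start, len(passage)))
    kept = [passage[s:e] for s, e in spans if not (s <= answer_start < e)]
    if len(kept) == len(spans):
        return None  # the answer offset did not fall inside any sentence
    return " ".join(kept), question


passage = "The Eiffel Tower is in Paris. It was completed in 1889. It is made of iron."
print(to_clarification_example(passage, "When was it completed?", passage.index("1889")))
```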
StackExchange dataset: Example of comment as answer

Initial Post:
Make install: cannot run strip: No such file or directory
root@server:~/shc-3.8.9# make install
*** Installing shc and shc.1 on /usr/local
*** Do you want to continue? y
install -c -s shc /usr/local/bin/
install: cannot run strip: No such file or directory
install: strip process terminated abnormally
make: *** [install] Error 1
I don't use make install often. Can someone tell me how to fix it? :)

Question comment: what exactly are you trying to install and what version of ubuntu are you on ?

Answer comment: i 'm trying to install shc-3.8.9 and i tried to follow this guide : use ubuntu 14.04
StackExchange dataset: Example of comment as answer

Initial Post:
Not enough space to build proposed filesystem while setting up superblock
Just bought a new external drive. Plugged it in, erased current partition using fdisk and created a new extended partition using fdisk. Used all the defaults for start and end blocks. I then try to format the new partition using the following:
sudo mkfs.ext4 /dev/sdb1
However, I received the following error:
mke2fs 1.42 (29-Nov-2011)
/dev/sdb1: Not enough space to build proposed filesystem while setting up superblock
Any ideas what could be wrong? Should I have created a primary partition? If so, why?

Question comment: are you installing from a bootable thumb drive ?

Answer comment: i am booting from a dvd drive . i created a dvd with ubuntu 12.04 installation iso image on it .
StackExchange dataset: Example of edit as answer

Initial Post:
VM with host communication
i run a program inside a vm which outputs 0 or 1 only . how can i communicate this result from the vm to my host machine ( which is ubuntu 12.04 )

Question comment: guest os ? where does your program output the result to ?

Edit to the post:
use virtualbox
2. virtual machine os : ubuntu 12.04 lts
3. host machine os : ubuntu 12.04 lts .
StackExchange dataset: Example of non-answer

Initial Post:
My Ubunto 12.04 Installation hangs after “Preparing to install Ubuntu”. What can I do to work around the problem?
I did download Ubuntu 12.04LTS. I tried to install - no progress. I tried to remove all partition using a bootable version of GParted. I created one big partition ext4 formatted. It all did not help. The installation stops after "Preparing to install Ubuntu". All three checkmarks are checked an I can click "Continue" but then nothing for hours. What can I do? Please help!

Question comment: why don't you try to create a partition via gparted ?

Answer comment: i already know how to partition it using gparted . i am trying to expand my knowledge .
Human-based Evaluation Results (Specificity)
How specific is the question to the product?

[Figure: stacked bar chart of annotation counts (y-axis 0 to 300) for Original, Lucene, Max-Likelihood, Max-Utility and GAN-Utility. Response categories: This product; Similar Products; Products in Home & Kitchen; N/A.]
Human-based Evaluation Results (Usefulness)
How useful is the question to a potential buyer?

[Figure: stacked bar chart of annotation counts (y-axis 0 to 300) for Original, Lucene, Max-Likelihood, Max-Utility and GAN-Utility. Response categories: Should be in the description; Useful to large no. of users; Useful to small no. of users; Useful only to person asking; N/A.]
Human-based Evaluation Results (Seeking new information)

Does the question ask for new information currently not included in the description?

[Figure: stacked bar chart of annotation counts (y-axis 0 to 450) for Original, Lucene, Max-Likelihood, Max-Utility and GAN-Utility. Response categories: Completely; Somewhat; No; N/A.]
Human-based Evaluation Results (Relevance)

How relevant is the question to the product?

[Figure: stacked bar chart of annotation counts (y-axis 0 to 500) for Original, Lucene, Max-Likelihood, Max-Utility and GAN-Utility. Response categories: Yes; No.]
Human-based Evaluation Results (Grammaticality)

How grammatical is the question?

[Figure: stacked bar chart of annotation counts (y-axis 0 to 500) for Original, Lucene, Max-Likelihood, Max-Utility and GAN-Utility. Response categories: Grammatical; Comprehensible; Incomprehensible.]
Human-based Evaluation Results

Error Analysis of MLE model

Short and Generic questions

dishwasher safe ?

what are the dimensions ?

is this a firm topper ?

where is this product made ?

Error Analysis of Max-Utility model

Incompleteness and repetition

what are the dimensions of this item ? i have a great size of baking pan and pans and pans

what are the dimensions of this topper ? i have a queen size mattress topper topper topper

what is the height of the trash trash trash trash trash

can this be used with the sodastream system system system system

Error Analysis of GAN-Utility model

<unk> tokens and bad long questions

what is the difference between the <unk> and the <unk> ?

what is the size of the towel ? i 'm looking for something to be able to use it for

what is the difference between this and the picture of the cuisinart <unk> deluxe deluxe deluxe deluxe deluxe deluxe deluxe deluxe

Error Analysis of specificity model

Incomplete questions

what are the dimensions of the table ? i 'm looking for something to put it in a suitcase

what is the density of the mattress pad ? i 'm looking for a mattress for a memory foam

does this unit come with a hose ? i need to know if the window window can be mounted

Disconnected multi-sentence questions

can you use this in a conventional oven ? i have a small muffin pan for baking .

what is the height of this unit ? i want to use it in a rental .

what are the dimensions of the basket ? i need to know if the baskets are in the picture

Reward Calculator

Training: (Context, Question, Answer) from real data → Reward Calculator
Testing: (Context, Generated Question, Generated Answer) from model output → Reward Calculator → Reward
Other types of Question Generation

o  Liu et al. "Automatic question generation for literature review writing support." International Conference on Intelligent Tutoring Systems, 2010.
o  Penas and Hovy. "Filling knowledge gaps in text for machine reading." International Conference on Computational Linguistics: Posters, ACL 2010.
o  Artzi and Zettlemoyer. "Bootstrapping semantic parsers from conversations." EMNLP 2011.
o  Labutov et al. "Deep questions without deep understanding." ACL 2015.
o  Mostafazadeh et al. "Generating natural questions about an image." ACL 2016.
o  Mostafazadeh et al. "Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation." IJCNLP 2017.
o  Rothe, Lake and Gureckis. "Question asking as program generation." NIPS 2017.
Key Idea behind Expected Value of Perfect Information (EVPI)

How to configure path or set environment variables for installation?

I'm aiming to install ape, a simple code for pseudopotential generation.
I'm having this error message while running ./configure
<error message>
So I have the library but the program installation isn't finding it.
Any help? Thanks in advance!

Possible questions

(a)  What version of Ubuntu do you have? → Just right

(b)  What is the make of your wifi card? → Not useful

(c)  Are you running Ubuntu 14.10 kernel 4.4.0-59-generic on an x86_64 architecture? → Unlikely to add value

Avriel, Mordecai, and A. C. Williams. "The value of information and stochastic programming." Operations Research 18.5 (1970).
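The EVPI idea scores a candidate question by how useful the post would become once each likely answer is added: EVPI(q | p) = Σ_a P(a | p, q) · U(p + a). The sketch below computes that sum and ranks questions by it. The answer distributions and utility values are hypothetical toy numbers; in the ranking model both P(a | p, q) and U are learned, so this only illustrates the arithmetic.

```python
from typing import Callable, Dict, List, Tuple


def evpi(answer_probs: Dict[str, float], utility: Callable[[str], float]) -> float:
    """Sum over answers a of P(a | p, q) * U(p + a).

    answer_probs maps each possible answer a to P(a | p, q); utility(a) stands in
    for U(p + a), the value of the post after it is updated with answer a."""
    return sum(prob * utility(answer) for answer, prob in answer_probs.items())


def rank_questions(candidates: Dict[str, Dict[str, float]],
                   utility: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Sort candidate questions by expected utility, highest first."""
    scored = [(question, evpi(answers, utility)) for question, answers in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy numbers for illustration only.
candidates = {
    "What version of Ubuntu do you have?": {"14.04": 0.6, "16.04": 0.4},
    "What is the make of your wifi card?": {"intel": 0.5, "broadcom": 0.5},
}
toy_utility = {"14.04": 0.9, "16.04": 0.8, "intel": 0.1, "broadcom": 0.1}
ranking = rank_questions(candidates, toy_utility.get)
for question, score in ranking:
    print(f"{score:.2f}  {question}")
```

Question (c) would also get a low score under this scheme: its answer is almost certainly "yes", so the updated post gains little.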
4. Writing Assistance

Hi Kathy,
We have decided to meet at 10am tomorrow to discuss the next group assignment.

Hey John,
Thanks for letting me know. Where are we meeting though?

Oh right. Forgot to mention that. In the 3rd floor grad lounge.
4. Writing Assistance

Hi Kathy,
We have decided to meet at 10am tomorrow to discuss the next group assignment.

Do you want to include the location?

Hi Kathy,
We have decided to meet at 10am tomorrow in the 3rd floor grad lounge to discuss the next group assignment.

Sounds good!
3. Interactive Search Query

Historical gas prices

Which region?

Which period?
4. Asking questions to help build reasoning

Jack and Jill were running a race. Jack reached the finish line when Jill was still a few steps behind. Jill was quite upset.

Why was Jill upset?

Because she did not win the race.
Generating Natural Questions from Images (+ Text)

Q: Was anyone injured in the crash?

Q: Is the motorcyclist alive?

Q: What caused the accident?

User1: My son is ahead and surprised!

User2: Did he end up winning the race?

User1: Yes he won, he can’t believe it!

o  Mostafazadeh et al. "Generating natural questions about an image." ACL 2016

o  Mostafazadeh et al. "Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation." IJCNLP 2017.
Example outputs

Original: where is the hose attachment hole located ?

Max-Likelihood: does it have a remote control?

GAN-Utility: does this unit have a drain hose on the outside ?

Original: how quickly does it boil water ?

Max-Likelihood: does this kettle have a warranty ?

GAN-Utility: does it come with a cord ?

GAN-based Clarification Question Generation Model

A GAN consists of a generator and a discriminator. The generator is an arbitrary model g ∈ G that produces outputs (in our case, questions). The discriminator is another model d ∈ D that attempts to classify between real outputs and model-generated outputs. The goal of the generator is to generate data such that it can fool the discriminator; the goal of the discriminator is to be able to successfully distinguish between real and generated data. In the process of trying to fool the discriminator, the generator produces data that is as close as possible to the real data distribution.

➢  General GAN Objective

$$L_{GAN}(D, G) = \max_{d \in D} \min_{g \in G} \; \mathbb{E}_{x \sim \hat{p}} \log d(x) + \mathbb{E}_{z \sim p_z} \log\big(1 - d(g(z))\big)$$

where x is sampled from the true data distribution p̂, and z is sampled from a prior p_z defined on input noise variables.

Although GANs have been successfully used for image tasks, training GANs for text generation is challenging due to the discrete nature of outputs in text: the discrete outputs from the generator make it difficult to pass the gradient update from the discriminator to the generator. Recently, a sequence GAN model for text generation (2017) was proposed to overcome this issue: it treats the generator as an agent and uses the discriminator as a reward function to update the generative model using reinforcement learning techniques. Our GAN-based approach is inspired by this sequence GAN model, with two main modifications: a) we use the MIXER algorithm as our generator (§2.2) instead of a policy gradient approach; and b) we use the UTILITY function (§2.3) as our discriminator instead of a convolutional neural network (CNN).

➢  Clarification Question Model GAN Objective

In our model, the answer is a latent variable: we do not actually use it anywhere except to train the discriminator. Because of this, we train our discriminator using (context, true question, generated answer) triples as positive instances and (context, generated question, generated answer) triples as negative instances. Formally, our objective function is:

$$L_{GAN\text{-}U}(U, M) = \max_{u \in U} \min_{m \in M} \; \mathbb{E}_{q \sim \hat{p}} \log u(c, q, A(c, q)) + \mathbb{E}_{c \sim \hat{p}} \log\big(1 - u(c, m(c), A(c, m(c)))\big)$$

where U is the UTILITY discriminator, M is the MIXER generator, p̂ is our data of (context, question, answer) triples and A is our answer generator.
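To make the GAN-Utility objective concrete, the sketch below estimates its value from a batch of discriminator scores: the mean of log u over (context, true question, generated answer) triples plus the mean of log(1 − u) over (context, generated question, generated answer) triples. It only evaluates the objective for a fixed u and m; it is not the training procedure, in which the MIXER generator receives the utility score as a reinforcement-learning reward instead of a backpropagated gradient.

```python
import math
from typing import Sequence


def gan_utility_value(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """Sample estimate of the GAN-Utility objective for a fixed u and m.

    real_scores: u(c, q, A(c, q)) for (context, true question, generated answer) triples.
    fake_scores: u(c, m(c), A(c, m(c))) for (context, generated question, generated answer).
    Every score must lie strictly between 0 and 1."""
    real_term = sum(math.log(s) for s in real_scores) / len(real_scores)
    fake_term = sum(math.log(1.0 - s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


# The discriminator (u) tries to raise this value toward 0; the generator (m) tries to lower it.
print(gan_utility_value([0.9, 0.8], [0.1, 0.2]))  # discriminator separates the two well
print(gan_utility_value([0.5, 0.5], [0.5, 0.5]))  # discriminator cannot tell them apart
```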
Generative Adversarial Networks (GAN)
Goal: Train a model to generate digits

Latent Space + Noise → Generator → Model Data
Real Data, Model Data → Discriminator → 1 (Real) / 0 (Fake)

✓  Discriminator tries to distinguish between real and model data
✓  Generator tries to fool the discriminator by generating real-looking data
✓  Thus, the generator is optimized to produce data that resembles the real data
Style transfer prior work

Informal: Gotta see both sides of the story
Formal: You have to consider both sides of the story

Shakespearean English: I should kill thee straight
Modern English: I ought to kill you right now

Brooke et al. Automatic acquisition of lexical formality. ACL 2010
Niu et al. Controlling the formality of machine translation output. EMNLP 2017
Rao and Tetreault. Corpus, Benchmarks and Metrics for Formality Style Transfer. NAACL 2018
Xu et al. Paraphrasing for style. COLING 2012
Upwork annotation statistics

➢  Agreement on best in ‘strict sense’: 0.15

➢  Agreement on best in ‘relaxed sense’: 0.87 (the question marked best by one annotator is marked valid by the other)

➢  Agreement on valid in ‘strict sense’: 0.58 (binary judgment of whether a question is valid)

➢  Original in union of best: 72%

➢  Original in intersection of best: 20%

➢  Original in intersection of valid: 76%

➢  Original in union of valid: 88%
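Two of these statistics can be sketched directly: strict agreement on "best" (both annotators picked the same question) and whether the original question falls in the union or intersection of the annotators' marked sets. The data layout (post ids mapped to question ids) is a hypothetical choice for illustration; the relaxed agreement is left out because its exact definition (one direction or both) is not spelled out here.

```python
from typing import Dict, Set


def strict_best_agreement(best1: Dict[str, str], best2: Dict[str, str]) -> float:
    """Fraction of posts on which both annotators chose the same 'best' question."""
    posts = best1.keys() & best2.keys()
    if not posts:
        return 0.0
    return sum(best1[p] == best2[p] for p in posts) / len(posts)


def original_coverage(original: Dict[str, str],
                      marked1: Dict[str, Set[str]],
                      marked2: Dict[str, Set[str]],
                      union: bool = True) -> float:
    """Fraction of posts whose original question lies in the union (or, with
    union=False, the intersection) of the questions the two annotators marked."""
    if not original:
        return 0.0
    hits = 0
    for post, question in original.items():
        s1 = marked1.get(post, set())
        s2 = marked2.get(post, set())
        combined = s1 | s2 if union else s1 & s2
        hits += question in combined
    return hits / len(original)


# Hypothetical toy annotations for two posts.
best1 = {"p1": "q1", "p2": "q2"}
best2 = {"p1": "q1", "p2": "q3"}
original = {"p1": "q1", "p2": "q2"}
as_sets = lambda best: {post: {q} for post, q in best.items()}
print(strict_best_agreement(best1, best2))
print(original_coverage(original, as_sets(best1), as_sets(best2), union=True))
print(original_coverage(original, as_sets(best1), as_sets(best2), union=False))
```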
Detailed human evaluation results

                  B1 ∪ B2                    V1 ∩ V2                    Original
Model             p@1   p@3   p@5   MAP      p@1   p@3   p@5   MAP      p@1
Random            17.5  17.5  17.5  35.2     26.4  26.4  26.4  42.1     10.0
Bag-of-ngrams     19.4  19.4  18.7  34.4     25.6  27.6  27.5  42.7     10.7
Community QA      23.1  21.2  20.0  40.2     33.6  30.8  29.1  47.0     18.5
Neural (p, q)     21.9  20.9  19.5  39.2     31.6  30.0  28.9  45.5     15.4
Neural (p, a)     24.1  23.5  20.6  41.4     32.3  31.5  29.0  46.5     18.8
Neural (p, q, a)  25.2  22.7  21.3  42.5     34.4  31.8  30.1  47.7     20.5
EVPI              27.7  23.4  21.5  43.6     36.1  32.2  30.5  49.2     21.4

Table 4.1: Model performances on 500 samples when evaluated against the union of the “best” annotations (B1 ∪ B2), the intersection of the “valid” annotations (V1 ∩ V2) and the original question paired with the post in the dataset. The difference between the bold and the non-bold numbers is statistically significant with p < 0.05 as calculated using a bootstrap test. p@k is the precision of the k questions ranked highest by the model and MAP is the mean average precision of the ranking predicted by the model.
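The p@k and MAP columns can be reproduced from a model's ranked questions and an annotated set of relevant questions per post. The sketch uses the standard definitions (average precision divides by the number of relevant questions); the exact normalization used for the table is an assumption, and the table reports the values multiplied by 100.

```python
from typing import List, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the k highest-ranked questions that are relevant."""
    return sum(q in relevant for q in ranked[:k]) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision@i at each rank i holding a relevant question, divided by
    the number of relevant questions."""
    hits, total = 0, 0.0
    for i, q in enumerate(ranked, start=1):
        if q in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings: List[Sequence[str]], relevant_sets: List[Set[str]]) -> float:
    """Average of the per-post average precisions."""
    aps = [average_precision(r, rel) for r, rel in zip(rankings, relevant_sets)]
    return sum(aps) / len(aps)


ranked = ["q_a", "q_b", "q_c", "q_d"]  # hypothetical model ranking for one post
relevant = {"q_a", "q_c"}               # e.g. B1 ∪ B2 for that post
print(precision_at_k(ranked, relevant, 1), precision_at_k(ranked, relevant, 3))
print(average_precision(ranked, relevant))
```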
Detailed human evaluation results (without original)

                  B1 ∪ B2                    V1 ∩ V2
Model             p@1   p@3   p@5   MAP      p@1   p@3   p@5   MAP
Random            17.4  17.5  17.5  26.7     26.3  26.4  26.4  37.0
Bag-of-ngrams     16.3  18.9  17.5  25.2     26.7  28.3  26.8  37.3
Community QA      22.6  20.6  18.6  29.3     30.2  29.4  27.4  38.5
Neural (p, q)     20.6  20.1  18.7  27.8     29.0  29.0  27.8  38.9
Neural (p, a)     22.6  20.1  18.3  28.9     30.5  28.6  26.3  37.9
Neural (p, q, a)  22.2  21.1  19.9  28.5     29.7  29.7  28.0  38.7
EVPI              23.7  21.2  19.4  29.1     31.0  30.0  28.4  39.6

Table 4.2: Model performances on 500 samples when evaluated against the union of the “best” annotations (B1 ∪ B2) and the intersection of the “valid” annotations (V1 ∩ V2), with the original question excluded. The differences between all numbers, except those of the random and bag-of-ngrams models, are statistically insignificant.
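The significance claims in Tables 4.1 and 4.2 come from a bootstrap test. One common variant, the paired bootstrap, resamples the test posts with replacement and counts how often system A fails to outscore system B. The sketch below is that variant under the assumption of per-post scores (for example 1/0 hits for p@1); the specific variant and number of samples behind the tables are not stated here.

```python
import random
from typing import Sequence


def paired_bootstrap_p_value(scores_a: Sequence[float], scores_b: Sequence[float],
                             n_samples: int = 10000, seed: int = 0) -> float:
    """Fraction of bootstrap resamples in which system A does not beat system B.

    scores_a[i] and scores_b[i] are the two systems' scores on the same test post."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same posts")
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(scores_a)
    not_better = 0
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample posts with replacement
        delta = sum(scores_a[i] - scores_b[i] for i in idx)
        if delta <= 0:
            not_better += 1
    return not_better / n_samples


# Hypothetical per-post p@1 hits for two rankers on ten posts.
evpi_hits = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
baseline_hits = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
print(paired_bootstrap_p_value(evpi_hits, baseline_hits, n_samples=2000))
```

A value below 0.05 would correspond to the "p < 0.05" threshold in the Table 4.1 caption.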

predict the “best” question. The model predicts “why would you need this” with very high probability likely because it is a very generic question, unlike the question marked as “best” by the annotator which is too specific. In the third example, the model again predicts a very generic question which is also marked as “valid” by the
StackExchange example output (ranking)

0.50 define “ frozen ” . did it panic ? or did something else happen ?
0.50 maybe you need to use your ‘fn‘ key when pressing print screen ?
0.50 tried ctrl + alt + f2 ?
0.49 does the script process 1 iteration successfully ?
0.49 laptop or desktop ?

Title: How to flash a USB drive?.
Post: I have a 8 GB Sandisk USB drive. Recently it became write somehow.
So I searched in Google and I tried to remove the write protection
through almost all the methods I found. Unfortunately nothing worked.
So I decided to try some other ways.
Some said that flashing the USB drive will solve the problem.
But I don’t know how. So how can it be done ?
1.01 what file system was the drive using ?
1.00 was it 16gb before or it has been 16mb from the first day you used it ?
0.74 which os are you using ? which file system is used by your pen drive ?
0.64 what operation system you use ?
0.51 can you narrow ’a hp usb down ’ ?
0.50 could the device be simply broken ?
0.50 does it work properly on any other pc ?
0.50 usb is an interface , not a storage device . was it a flash drive or a portable disk ?
0.49 does usb flash drive tester have anything useful to say about the drive ?
0.49 your drive became writeable ? or read-only ?

Table 4.4: Examples of human annotation from the unix and superuser domain of our dataset. The questions are sorted by expected utility, given in the first column. The “best” annotation is marked with black ticks and the “valid” annotations are marked with grey ticks.
StackExchange example output (ranking)

Title: Frozen Linux Recovery Without SysReq


Post: RHEL system has run out of memory and is now frozen.
The SysReq commands are not working, so I am not even sure that
/proc/sys/kernel/sysrq is set to 1.
Is there any other ”safe” way I can reboot w/out power cycling?
0.91 why would you need this ?
0.77 maybe you need to use your ‘fn‘ key when pressing print screen ?
0.59 do you have sudo rights on this computer ?
0.55 are you sure sysrq is enabled on your machine ?
0.52 did you look carefully at the logs when you rebooted after it hung ?
0.51 i assume you have data open which needs to be saved ?
0.50 define “ frozen ” . did it panic ? or did something else happen ?
0.50 maybe you need to use your ‘fn‘ key when pressing print screen ?
0.50 tried ctrl + alt + f2 ?
0.49 does the script process 1 iteration successfully ?
0.49 laptop or desktop ?
Title: How to flash a USB drive?.
Post: I have a 8 GB Sandisk USB drive. Recently it became write somehow.
So I searched in Google and I tried to remove the write protection
through almost all the methods I found. Unfortunately nothing worked.
So I decided to try some other ways.

StackExchange example output (ranking)
Title: Ubuntu 15.10 instant resume from suspend
Post: I have an ASUS desktop PC that I decided to install Ubuntu onto.
I have used Linux before, specifically for 3 years in High School.
I have never encountered suspend resume issues on Linux before until now.
It appears that my PC is instantly resuming from suspend on Ubuntu 15.10
I am not sure what is causing this, but my hardware is as follows:
Intel Core i5 4460 @ 3.2 GHz
2 TB Toshiba 7200 RPM disk
8 GB DDR3 RAM
Corsair CX 500 Power Supply
AMD Radeon R9 270X Graphics - 4 Gigs
ASUS Motherboard for OEM builds
VIA technologies USB 3.0 Hub
Realtek Network Adapter
Any help is greatly appreciated. I haven’t worked with Linux in over a year,
and I am trying to get back into it, as I plan to pursue a career in Comp Science
(specifically through internships and trade school) and this is a problem,
as I don’t want to drive the power bill up.
(Even though I don’t pay it, my parents do.)
0.87 does suspend - resume work as expected ?
0.71 what , specifically , is the problem you want help with ?
0.70 the suspend problem exits only if a virtual machines is running ?
0.67 is the pasted workaround still working for you ?
0.57 just wondering if you got a solution for this ?
0.50 we *could* try a workaround , with a keyboard shortcut . would that interest you ?
0.49 did you restart the systemd daemon after the changes ‘sudo restart systemd-logind‘ ?
0.49 does running ‘sudo modprobe -r psmouse ; sleep 1 ; sudo modprobe psmouse‘ enable the touchpad ?
0.49 2 to 5 minutes ?
0.49 does it work from the menu or not ?

Table 4.3: Example of human annotation from the askubuntu domain of our dataset. The questions are sorted by expected utility, given in the first column. The “best” annotation is marked with black ticks and the “valid” annotations are marked with grey ticks.
Automatic metric based evaluation (question generation)

               Amazon                              StackExchange
Model          Diversity  Bleu   Meteor     Diversity  Bleu   Meteor
Reference      0.6934     —      —          0.7509     —      —
Lucene         0.6289     4.26   10.85      0.7453     1.63   7.96
MLE            0.1059     17.02  12.72      0.2183     3.49   8.49
Max-Utility    0.1214     16.77  12.69      0.2508     3.89   8.79
GAN-Utility    0.1296     15.20  12.82      0.2256     4.26   8.99

Table 5.1: Diversity as measured by the proportion of unique trigrams in model outputs. Bleu and Meteor scores are computed using up to 10 references for the Amazon dataset and up to six references for the StackExchange dataset. Numbers in bold are the highest among the models. All results for Amazon are on the entire test set, whereas for StackExchange they are on the 500 instances of the test set that have multiple references.

5.3.5 Automatic Metric Results

Table 5.1 shows the results on the two datasets when evaluated according to automatic metrics.
Specificity-controlled question generation model results

                          Generic                       Specific
Model                     Diversity  Bleu   Meteor     Diversity  Bleu   Meteor
Reference                 0.6071     —      —          0.7474     —      —
Lucene                    0.6289     2.90   12.04      0.6289     1.76   6.96
MLE                       0.1201     12.61  13.29      0.1201     1.41   5.06
Max-Utility               0.1299     12.17  14.06      0.1299     1.79   5.57
GAN-Utility               0.1304     12.01  14.35      0.1304     2.69   6.12
Specificity-MLE           0.1023     12.61  13.53      0.1640     4.45   7.85
Specificity-GAN-Utility   0.1012     12.84  14.18      0.1357     2.95   6.08

Table 6.2: Diversity as measured by the proportion of unique trigrams in model outputs. Bleu and Meteor scores are calculated using an average of 6 references under the generic setting and an average of 3 references under the specific setting. The highest number within each column is in bold (except for diversity under the generic setting, where the lowest number is in bold).

Our best model is the one that uses all the features and attains an accuracy of 0.73 on the test set. In comparison, a baseline model that predicts the specificity label at random gets an accuracy of 0.58 on the test set.