
Teaching Multiple Concepts to Forgetful Learners

Yuxin Chen (chenyuxin@uchicago.edu)

Anette Hunziker, Yuxin Chen, Oisin Mac Aodha, Manuel Gomez Rodriguez, Andreas Krause, Pietro Perona, Yisong Yue, Adish Singla
Applications: Language learning

• Over 300 million students


• Based on spaced repetition of flash cards
• Can we compute optimal personalized schedule of repetition?

Teaching Interaction Using Flashcards

[Screenshot: flashcard interface, Learning Phase (1)]


Interaction at time 𝑡 = 1, 2, …, 𝑇
1. Teacher displays a flashcard 𝑥𝑡 ∈ {1, 2, …, 𝑛}
2. Learner’s recall is 𝑦𝑡 ∈ {0, 1}

3. Teacher provides the correct answer (e.g., “Spielzeug”)


[Screenshots: Learning Phases (1)–(6) of the flashcard interface. The learner’s submissions (e.g., “jouet”, “nachs”) are marked wrong (✗) and the correct answers (“Spielzeug”, “Nachtisch”, “Buch”) are revealed (✓).]
Background on Teaching Policies

Example setup
- 𝑇 = 20 and 𝑛 = 5 concepts given by 𝑎, 𝑏, 𝑐, 𝑑, 𝑒

Naïve teaching policies


• Random: 𝑎→𝑏→𝑎→𝑒→𝑐→𝑑→𝑎→𝑑→𝑐→𝑎→𝑏→𝑒→𝑎→𝑏→𝑑→𝑒→

• Round-robin: 𝑎→𝑏→𝑐→𝑑→𝑒→𝑎→𝑏→𝑐→𝑑→𝑒→𝑎→𝑏→𝑐→𝑑→𝑒→𝑎→

Key limitation: the schedule is agnostic to the learning process


Background: Pimsleur Method (1967)

Used in mainstream language learning platforms


Based on spaced repetition ideas

𝑎→𝑏→𝑎→𝑏→𝑐→𝑎→𝑐→𝑏→𝑑→𝑐→𝑑→𝑎→𝑏→𝑑→𝑐→𝑒→
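The schedule above follows graduated-interval recall: each concept’s reviews are spaced at (roughly) exponentially growing gaps. A minimal sketch of that idea (the gap and multiplier values are illustrative, not Pimsleur’s exact intervals):

    def pimsleur_schedule(num_reviews, first_gap=1.0, multiplier=5.0):
        # Review times with exponentially growing gaps between repetitions.
        times, gap = [0.0], first_gap
        for _ in range(num_reviews - 1):
            times.append(times[-1] + gap)
            gap *= multiplier
        return times

    # pimsleur_schedule(5) -> [0.0, 1.0, 6.0, 31.0, 156.0]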

Background: Leitner System (1972)

Adaptive spacing intervals


Key limitation: No guarantees on the optimality of the schedule

Student 1: 𝑎→𝑏→𝑎→𝑏→𝑐→𝑎→𝑐→𝑏→𝑑→𝑐→𝑑→𝑎→𝑏→𝑑→𝑐→𝑒→

Student 2: 𝑎→𝑎→𝑏→𝑎→𝑏→𝑐→𝑎→𝑐→𝑎→𝑏→𝑐→𝑎→𝑏→𝑎→𝑑→𝑐→
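The adaptivity above is the classic Leitner box scheme: a recalled card moves to a less frequently reviewed box, a forgotten card moves back to the first box. A minimal sketch, assuming boxes are reviewed at doubling intervals (the exact spacing rule varies by implementation):

    def leitner_update(box, success):
        # Promote the card one box on success; send it back to box 1 on failure.
        return box + 1 if success else 1

    def due_for_review(box, session):
        # Assumed rule: cards in box k are reviewed every 2**(k-1) sessions.
        return session % (2 ** (box - 1)) == 0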

Modeling Forgetfulness

Half-Life Regression (HLR) model [Settles & Meeder, ACL 2016]


Recall probability of concept 𝑖 at time 𝑡:

    𝑝𝑖(𝑡 | history) = 2^(−(𝑡 − 𝑡𝑖) / ℎ𝑖)

where 𝑡 − 𝑡𝑖 is the time since concept 𝑖 was last taught, and ℎ𝑖 is the half-life estimate, which depends on feedback: after each teaching event the half-life grows, with ℎ𝑖 += 𝑎𝑖 after a successful recall and ℎ𝑖 += 𝑏𝑖 after a failed one.
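A minimal sketch of these dynamics (function names are illustrative, not from the paper’s code; the mapping of 𝑎𝑖 to success and 𝑏𝑖 to failure follows the reading above):

    def recall_probability(t, t_i, h_i):
        # HLR recall probability: p_i(t | history) = 2^(-(t - t_i)/h_i)
        return 2.0 ** (-(t - t_i) / h_i)

    def update_half_life(h_i, success, a_i, b_i):
        # Half-life grows with each teaching event, as on the slide:
        # h_i += a_i on a successful recall, h_i += b_i on a failure.
        return h_i + (a_i if success else b_i)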
Interactive Teaching Protocol

• For t = 1…T
- Teacher chooses concept 𝑖 ∈ {1, …, 𝑚} (e.g., a flashcard)
- Learner tries to recall the concept (success or fail)
- Teacher reveals the answer (e.g., “Spielzeug”)

• Goal: maximize the “area under the recall curve”

    𝑓(history) = (1/𝑚𝑇) ∑_{𝑖=1}^{𝑚} ∑_{𝑡=1}^{𝑇} 𝑝𝑖(𝑡 | history_{1:𝑡−1})
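In code, the objective is a straightforward average (a sketch; recall_prob(i, t) stands for 𝑝𝑖(𝑡 | history_{1:𝑡−1}) under the current model):

    def average_recall(m, T, recall_prob):
        # f(history) = (1/(mT)) * sum_i sum_t p_i(t | history_{1:t-1})
        total = sum(recall_prob(i, t)
                    for i in range(1, m + 1)
                    for t in range(1, T + 1))
        return total / (m * T)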
Naive Approaches

• Round Robin
- Doesn’t adapt to new estimates of learner recall probabilities
- Over-teaches easy concepts
- Under-teaches hard concepts

• Lowest Recall Probability


- Generalization of Pimsleur method and Leitner system
- Doesn’t consider how teaching changes the recall probability (both rules are sketched below)
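Both baselines reduce to one-line selection rules (a sketch; p_recall is the model’s current recall-probability estimate):

    def round_robin(t, m):
        # Cycle through the m concepts in fixed order, ignoring the learner.
        return (t - 1) % m + 1

    def lowest_recall(concepts, p_recall):
        # Teach the concept currently hardest to recall; ignores how
        # teaching it would change its future recall probability.
        return min(concepts, key=p_recall)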
Greedy Teaching Algorithm (interactive)

• Choose concept i to maximize


    Δ(𝑖 | history) = 𝔼_{𝑦𝑡}[ 𝑓(history ⊕ (𝑖, 𝑦𝑡)) ] − 𝑓(history)

yt: success or failure of recall at time t


(expectation is over the model’s estimate of 𝑦𝑡)

    𝑝𝑖(𝑡 | history) = 2^(−(𝑡 − 𝑡𝑖) / ℎ𝑖)    (ℎ𝑖 updated after observing 𝑦𝑡)
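A sketch of one greedy step, assuming we can evaluate 𝑓 on hypothetical extended histories and that p_success gives the model’s estimate that the learner recalls concept 𝑖 (all helper names are illustrative):

    def greedy_choice(history, concepts, p_success, f):
        # Delta(i | history) = E_{y_t}[ f(history + (i, y_t)) ] - f(history)
        base = f(history)
        def expected_gain(i):
            p = p_success(i, history)              # estimated P(y_t = 1)
            return (p * f(history + [(i, 1)])      # outcome: successful recall
                    + (1 - p) * f(history + [(i, 0)])  # outcome: failed recall
                    - base)
        return max(concepts, key=expected_gain)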


Characteristics of the Optimization Problem

• Non-submodular
- Gain of a concept 𝑥 can increase given longer history
- Captured by submodularity ratio 𝛾 over sequences

Characteristics of the Optimization Problem (cont.)

• Post-fix non-monotone
- 𝑓(orange ⊕ blue) < 𝑓(blue), where “orange” and “blue” denote two example teaching sequences
- Captured by curvature ω

Theoretical Guarantees: General Case

• Guarantees for the general case (any memory model)


• Utility of 𝜋^gr (greedy policy) compared to the optimal policy 𝜋^opt:

    𝐹(𝜋^gr) ≥ 𝐹(𝜋^opt) · ∑_{𝑡=1}^{𝑇} (𝛾_{𝑇−𝑡} / 𝑇) ∏_{𝜏=𝑡+1}^{𝑇} (1 − 𝜔_𝜏 𝛾_𝜏 / 𝑇)    (Theorem 1)

    𝐹(𝜋^gr) ≥ 𝐹(𝜋^opt) · (1 / 𝜔_max) (1 − 𝑒^{−𝜔_max 𝛾_min})    (Corollary 2)

Theoretical Guarantees: HLR Model

• Consider the task of teaching 𝑛 concepts, where each concept follows an independent HLR model with the same parameters 𝑎_𝑥 = 𝑧, 𝑏_𝑥 = 𝑧 ∀ 𝑥 ∈ {1, 2, …, 𝑛}. A sufficient condition for the algorithm to achieve (1 − 𝜖)-high utility is

    𝑧 ≥ max{ log 𝑇, log 3𝑛, log(12/𝜖) }

Illustration: Simulation Results

[Figure: simulation of the teaching algorithm under the HLR memory model, comparing teaching schedules and objective values across panels: (a) Greedy, (b) Optimal, (c) Round Robin, (d) Objective, (e) Marginal gain, plus additional diagnostic panels.]
User Study

• 150 participants from the Mechanical Turk platform
• 𝑇 = 40, 𝑚 = 15; total study time is about 25 minutes

German dataset

[Figure 6: samples from the German dataset]

            GR      LR      RR      RD
Avg. gain   0.572   0.487   0.462   0.467
p-value     –       0.0652  0.0197  0.0151

(GR = Greedy, LR = Lowest Recall, RR = Round Robin, RD = Random)
Biodiversity dataset

(a) Common: Owl, Cat, Horse, Elephant, Lion, Tiger, Bear
(b) Rare: Angwantibo, Olinguito, Axolotl, Ptarmigan, Patrijshond, Coelacanth, Pyrrhuloxia

[Figure 7: samples from the Biodiversity dataset]

                           GR      LR      RR      RD
All species   Avg. gain   0.475   0.411   0.390   0.251
              p-value     –       0.0017  0.0001  0.0001

Rare species  Avg. gain   0.766   0.668   0.601   0.396
              p-value     –       0.0001  0.0001  0.0001
Summary: Teaching Concepts to People

• Teaching forgetful learners


- Limited memory (modeling forgetfulness)
- Engagement (interface design)

• Challenges not covered in this talk:


- Limited inference power and noise
- Mismatch in representation
- Interpretability (e.g., teaching via labels vs. rich feedback)
- Safety (e.g., when teaching physical tasks)
- Fairness (e.g., when teaching a class)
- …
Questions?
