
Repeat Intro of Self

Speech, Ink, and Slides: The Interaction of Content Channels

Richard Anderson
Crystal Hoyer
Craig Prince
Jonathan Su
Fred Videon
Steve Wolfman
Mention: Richard, Jonathan (In Audience)

1
Background
 Content channels simply refers to the
various sources of information in some
context (e.g. audio, slides, digital ink,
video, etc.)
 Our focus is on the use of digital ink in
the classroom setting
 We want to capture/playback/analyze
these channels intelligently
2
Why do we want to analyze
content channels?
 We want to make it easier to interact
with electronic materials
 Better search and navigation of
presentations
 Accessibility for the
hearing/learning/visually impaired
Conversion to: Braille/Screen Reader
 Generating text transcripts
 Recognizing high level behaviors
3
Distance Learning Classes

4
Classroom Presenter
 General tool for giving presentations on the
Tablet PC
 Many similar systems – our findings
applicable to all such systems
 Enables writing directly on the slides
 Tablet PC enables high-quality digital ink
 Used in over 100 courses so far
 Allows us to collect real usage data
5
Questions We Wanted to Explore
 High Level Question: What is the potential
for automatic analysis of archived content?
 Other Questions:
   How well can digital ink be recognized by itself?
   How closely are different content channels tied
  together?
     Speech and Ink?
     Ink and Slide Content?
   Can we identify high level behaviors by
  analyzing the content channels?

6
Research Methodology

1. We wanted to understand what real presentation data is like
2. We collected several hundred hours of recorded lectures from
distance learning classes
3. Analyzed the data in various ways to help
answer our guiding questions.
• Note: All examples given here are from real
presentations!
7
Outline
 Motivation
 Handwriting Recognition
 Joint Writing and Speech Recognition
 Attentional Mark Identification
 Activity Inference: Recognizing
Corrections

8
Handwriting Recognition
 Classroom lectures on Tablet PC offer
interesting challenges for handwriting
recognition
 Somewhat Awkward
• Small Surface to Write On
• Bad Angle to the Tablet PC
 Hastily Written
• Concentrating on Speaking
• Excited / Nervous
9
Recognition Examples
Mark: Success/Failure
 The Good:

 The Bad:

 The Ugly:

10
Recognition Procedure
 Studied isolated words/phrases written
on slides
 Removed all non-textual ink
 Fed through the Microsoft Handwriting
Recognizer
 No training done!

11
Mention That These Results Are Surprisingly Good!

Handwriting Recog. Results

Lecturer   Exact        Alternate    Close       None
Prof. A     16 (88%)      1 (6%)      0 (0%)       1 (6%)
Prof. B    146 (59%)     26 (10%)     6 (2%)      71 (29%)
Prof. C     18 (42%)      5 (11%)     1 (3%)      19 (44%)
Prof. D    262 (61%)     45 (11%)     9 (2%)     111 (26%)
Prof. E    408 (79%)     46 (9%)      2 (<1%)     58 (11%)
Total      850 (68%)    123 (10%)    18 (1%)     260 (21%)
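
The slides do not define the four outcome columns, so the following is only a minimal sketch of how one recognizer result might be bucketed. It assumes "Exact" means the top result equals the transcribed word, "Alternate" means the word appears among the recognizer's alternates, and "Close" means a near miss. The function name `score_recognition`, the `close_threshold` value, and the use of `difflib` similarity are all hypothetical choices, not the procedure used in the study.

```python
from difflib import SequenceMatcher


def _norm(text):
    """Lowercase and collapse whitespace so comparisons ignore formatting."""
    return " ".join(text.lower().split())


def score_recognition(truth, top_result, alternates, close_threshold=0.8):
    """Bucket one recognizer output as 'Exact', 'Alternate', 'Close', or 'None'.

    truth:       the word or phrase actually written (hand transcribed)
    top_result:  the recognizer's best guess
    alternates:  the recognizer's other guesses, if any
    The threshold is an assumed value, not one taken from the slides.
    """
    target = _norm(truth)
    top = _norm(top_result)
    others = [_norm(a) for a in alternates]

    if top == target:
        return "Exact"
    if target in others:
        return "Alternate"

    # Assumed definition of "Close": best string similarity over all guesses.
    best = max(SequenceMatcher(None, target, guess).ratio()
               for guess in [top] + others)
    return "Close" if best >= close_threshold else "None"
```

Tallying these labels per lecturer would give counts in the same shape as the table above.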
12
Each Row Represents a Different Lecturer
Outline
 Motivation
 Handwriting Recognition
 Joint Writing and Speech Recognition
 Attentional Mark Identification
 Activity Inference: Recognizing
Corrections
Look at Potential

13
Joint Writing and Speech
Recognition
 Co-expression of ink and speech
 Is digital ink spoken as it is written?
 Yes, but how often? How “closely” to the
written text?
In Time/Accuracy, Wanted Empirical Evidence
 Can speech be used to disambiguate
handwriting?
 Can handwriting be used to disambiguate
speech? (incl. deictic references)

14
Examples
Eswaran, Gray, Loric, Traiger
 Difficult for Speech and Ink Recognition
DigiMon

 Difficult Written Abbreviations

Java 2 Enterprise Edition


 Speech/Ink Used to Disambiguate Ink/Speech
corn flakes

15
Experiment
 Examined instances of isolated word writing
 Selected word writing episodes at random
but uniformly from the various instructors
 Generated transcripts manually from the
audio
 Checked whether the instructor spoke the
exact word written
 Measured the time between the written and
spoken word
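
The timing measurement in the last step can be sketched as a small bucketing function. This is an illustration only: it assumes each writing episode has a start and end time, that "Simul" means the word was spoken while the ink was being written, and that the other two buckets measure the gap before or after the writing. The names `timing_bucket` and `tally` are hypothetical.

```python
from collections import Counter


def timing_bucket(write_start, write_end, spoken_at):
    """Return 'Simul', '0-2s', or '>2s' for one episode (times in seconds)."""
    # Assumption: "simultaneous" means spoken during the writing interval.
    if write_start <= spoken_at <= write_end:
        return "Simul"
    if spoken_at < write_start:
        gap = write_start - spoken_at   # word spoken before writing began
    else:
        gap = spoken_at - write_end     # word spoken after writing ended
    return "0-2s" if gap <= 2.0 else ">2s"


def tally(episodes):
    """Count buckets over (write_start, write_end, spoken_at) tuples."""
    return Counter(timing_bucket(*episode) for episode in episodes)
```

For example, `tally([(10, 12, 11), (10, 12, 13.5), (10, 12, 6)])` counts one episode in each of the three buckets.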
16
Speech/Text Co-occurrence
Results

Lecturer   Exact        Approx      None       Simul         0-2s       > 2s
A           1 (100%)     0 (0%)     0 (0%)      1 (100%)     0 (0%)     0 (0%)
B           9 (75%)      3 (25%)    0 (0%)     12 (100%)     0 (0%)     0 (0%)
C           9 (82%)      2 (18%)    0 (0%)     10 (91%)      1 (9%)     0 (0%)
D          12 (86%)      2 (14%)    0 (0%)     10 (71%)      4 (29%)    0 (0%)
E           9 (56%)      7 (44%)    0 (0%)      7 (44%)      4 (25%)    5 (31%)
Total      40 (74%)     14 (26%)    0 (0%)     40 (74%)      9 (17%)    5 (9%)

Each Row Represents a Different Lecturer


17
Outline
 Motivation
 Handwriting Recognition
 Joint Writing and Speech Recognition
 Attentional Mark Identification
 Activity Inference: Recognizing
Corrections

18
Attentional Mark Identification
 Attentional Marks are…
 The first step is to identify a stroke as a
mark
 Tying Attentional Marks to slide
content is important
 Attentional Ink provides a concrete link
between speech and slide content!
19
Example

20
Method
 Segmentation
 Few strokes
 Close spatial and temporal proximity
 Mark Recognition
 Created hand tuned classifiers for:
Circles, Lines, Bullets/Ticks
 Matched with slide content
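
The segmentation and mark recognition steps above can be sketched roughly as follows. This is not the classifier used in the study: the `Stroke` type, every threshold, and the rules for "circle", "line", and "bullet/tick" are assumed values chosen only to show what a hand-tuned approach might look like.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Stroke:
    points: list    # [(x, y), ...] in slide coordinates
    t_start: float  # seconds
    t_end: float


def bbox(points):
    """Bounding box (x0, y0, x1, y1) of a list of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def box_gap(a, b):
    """Distance between two boxes; 0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return hypot(dx, dy)


def segment(strokes, max_gap_s=1.5, max_dist=40.0, max_strokes=3):
    """Group strokes that are few, and close in both time and space."""
    groups = []
    for s in sorted(strokes, key=lambda st: st.t_start):
        if groups:
            last = groups[-1][-1]
            near_in_time = s.t_start - last.t_end <= max_gap_s
            near_in_space = box_gap(bbox(last.points), bbox(s.points)) <= max_dist
            if near_in_time and near_in_space and len(groups[-1]) < max_strokes:
                groups[-1].append(s)
                continue
        groups.append([s])
    return groups


def classify(group):
    """Hand-tuned guess: 'bullet/tick', 'line', 'circle', or 'other'."""
    pts = [p for s in group for p in s.points]
    x0, y0, x1, y1 = bbox(pts)
    w, h = x1 - x0, y1 - y0
    diag = hypot(w, h)
    if diag < 15:                      # very small ink: bullet or tick
        return "bullet/tick"
    if h <= 0.15 * w:                  # flat and wide: horizontal line
        return "line"
    first, last = group[0].points[0], group[-1].points[-1]
    closed = hypot(first[0] - last[0], first[1] - last[1]) <= 0.2 * diag
    if len(group) == 1 and closed:     # single stroke ending near its start
        return "circle"
    return "other"
```

Anything labeled "other" here would be treated as non-attentional ink, for example handwriting.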

21
Experiment

1. Identified and Classified Attention Marks by Hand
 Two different people per slide
 Identified type of mark as well as slide
content mark referred to
2. Identified Attention Marks
Automatically
3. Compared Resulting Identification
22
Content Matching Issues
 Hard to determine exactly what content a
mark refers to

Not just a recognition Issue, but also related to HOW people draw

23
Content Matching Cont.
 Granularity of content parsing can be an
issue
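
To make the matching and granularity issues concrete, here is a minimal sketch of matching a mark to slide content by bounding boxes. It assumes slide content is available as `(text, box)` pairs with boxes given as `(x0, y0, x1, y1)` and y growing downward. The function names and thresholds are hypothetical, not the matching rules used in the study.

```python
def overlap_fraction(mark, item):
    """Fraction of the item's box area that the mark's box covers."""
    ix = min(mark[2], item[2]) - max(mark[0], item[0])
    iy = min(mark[3], item[3]) - max(mark[1], item[1])
    if ix <= 0 or iy <= 0:
        return 0.0
    area = (item[2] - item[0]) * (item[3] - item[1])
    return (ix * iy) / area if area > 0 else 0.0


def match_circle(mark_box, items, min_cover=0.5):
    """Items mostly enclosed by a circle's bounding box."""
    return [text for text, box in items
            if overlap_fraction(mark_box, box) >= min_cover]


def match_underline(mark_box, items, max_drop=20.0):
    """Items sitting just above an underline and overlapping it horizontally."""
    hits = []
    for text, box in items:
        horizontal = min(mark_box[2], box[2]) - max(mark_box[0], box[0])
        drop = mark_box[1] - box[3]  # underline top minus text bottom
        if horizontal > 0 and -5.0 <= drop <= max_drop:
            hits.append(text)
    return hits
```

With word-level items such as `("ink", (60, 0, 80, 10))`, a circle box of `(55, -5, 85, 15)` matches "ink". If the same slide is parsed only at line level, as `("digital ink", (0, 0, 80, 10))`, the same circle covers less than half of the line and matches nothing under these thresholds.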

24
Attentional Ink Recognition
Accuracy

             Exact        Exact to Punctuation   Close       Non-Match    Total
Circles       70 (66%)    13 (12%)                6 (6%)      17 (16%)     106
Underlines   207 (61%)    22 (6%)                44 (13%)     66 (20%)     339
Bullets       52 (60%)     0 (0%)                 0 (0%)      35 (40%)      87
Total        329 (62%)    35 (7%)                50 (9%)     118 (22%)     532

25
Outline
 Motivation
 Handwriting Recognition
 Joint Writing and Speech Recognition
 Attentional Mark Identification
 Activity Inference: Recognizing
Corrections

26
Recognizing Corrections
 Why?
 Want to answer the broad question:
- “Can we recognize patterns of activity by analyzing the ink and
speech channels?”
 Useful for Presenters
- Occurs frequently (about 1-3 per lecture)
 But Non-trivial
Our vision allows false positives

27
Recognizing Corrections
 Identified Six Types of Corrections

28
Looked through large # of lectures, wide range of marks
Example Results

No Table Because:
1. Not a robust experiment
2. Proof of Concept
29
Wrap-up
 We wanted to understand the nature
of real data to direct our focus when
building tools for automatic analysis

 Our studies provided the necessary
understanding to accomplish this

30
Wrap-up (Cont.)
ALL OPEN for Refinement
Specific Results:
 Basic handwriting recognition is
surprisingly good
 Very strong co-occurrence of written and
spoken words
 We were able to identify attentional
marks and the content associated with
them
 Activity Recognition: There are certain
high-level activities that we can identify
31
Questions?

E-mail
cmprince@cs.washington.edu
jonsu@cs.washington.edu

Classroom Presenter Website


http://www.cs.washington.edu/education/dl/presenter/

32