Repeat Intro of Self

Speech, Ink, and Slides: The Interaction of Content Channels
Richard Anderson, Crystal Hoyer, Craig Prince, Jonathan Su, Fred Videon, Steve Wolfman
Mention: Richard and Jonathan are in the audience

1

Background

 

- "Content channels" are the various sources of information in a given context (e.g., audio, slides, digital ink, video)
- Our focus is on the use of digital ink in the classroom setting
- We want to capture, play back, and analyze these channels intelligently
2

Why do we want to analyze content channels?

We want to make it easier to interact with electronic materials

 

- Better search and navigation of presentations
- Conversion to Braille or screen-reader formats
- Accessibility for the hearing, learning, and visually impaired
- Generating text transcripts
- Recognizing high-level behaviors
3

Distance Learning Classes

4

Classroom Presenter

   

- A general tool for giving presentations on the Tablet PC
- Many similar systems exist; our findings should be applicable to all such systems
- Enables writing directly on the slides
- The Tablet PC enables high-quality digital ink
- Used in over 100 courses so far
- Allows us to collect real usage data
5

Questions We Wanted to Explore
 

High-level question: What is the potential for automatic analysis of archived content?

Other questions:
- How well can digital ink be recognized by itself?
- How closely are different content channels tied together?
  - Speech and ink?
  - Ink and slide content?
- Can we identify high-level behaviors by analyzing the content channels?
6

Research Methodology
1. We wanted to understand what real presentation data is like.
2. We collected several hundred hours of recorded lectures from distance learning classes.
3. We analyzed the data in various ways to help answer our guiding questions.
Note: All examples given here are from real presentations!
7

Outline
    

- Motivation
- Handwriting Recognition
- Joint Writing and Speech Recognition
- Attentional Mark Identification
- Activity Inference: Recognizing Corrections
8

Handwriting Recognition

Classroom lectures on the Tablet PC offer interesting challenges for handwriting recognition:
- Somewhat awkward
  - Small surface to write on
  - Bad angle to the Tablet PC
- Hastily written
  - Concentrating on speaking
  - Excited / nervous

9

Recognition Examples
Mark: Success/Failure

The Good:

The Bad:

The Ugly:

10

Recognition Procedure

 

- Studied isolated words and phrases written on slides
- Removed all non-textual ink
- Fed the ink through the Microsoft Handwriting Recognizer
- No training was done!
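The results that follow sort each word into Exact, Alternate, Close, or None, but the slides do not define these categories. The Python sketch below shows one way such scoring could work, assuming that Exact means the recognizer's top guess equals the written word, Alternate means the word appears among the recognizer's other ranked guesses, and Close means the top guess is within one edit of the word, ignoring case. The recognizer output is modeled as a plain ranked list of strings; `score_recognition`, `edit_distance`, and the `close_threshold` value are hypothetical, and this is not the Microsoft recognizer's API.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete a character from a
                curr[j - 1] + 1,           # insert a character into a
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def score_recognition(truth: str, results: List[str], close_threshold: int = 1) -> str:
    """Assign one of the four result categories (category definitions are assumptions).

    `results` is the recognizer's ranked output, best guess first.
    """
    if not results:
        return "None"
    if results[0] == truth:
        return "Exact"
    if truth in results[1:]:
        return "Alternate"
    if edit_distance(results[0].lower(), truth.lower()) <= close_threshold:
        return "Close"
    return "None"


print(score_recognition("cache", ["cache", "caches"]))  # Exact
print(score_recognition("cache", ["cacke", "cache"]))   # Alternate
print(score_recognition("cache", ["cachs"]))            # Close
print(score_recognition("cache", ["tree"]))             # None
```

Ranking the checks from strictest to loosest means each word lands in exactly one category, which is what lets the per-lecturer counts sum to a single row total.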

11

Mention That These Results Are Surprisingly Good!

Handwriting Recognition Results

           Exact       Alternate   Close       None
Prof. A     16 (88%)    1 (6%)      0 (0%)      1 (6%)
Prof. B    146 (59%)   26 (10%)     6 (2%)     71 (29%)
Prof. C     18 (42%)    5 (11%)     1 (3%)     19 (44%)
Prof. D    262 (61%)   45 (11%)     9 (2%)    111 (26%)
Prof. E    408 (79%)   46 (9%)      2 (<1%)    58 (11%)
Total      850 (68%)  123 (10%)    18 (1%)    260 (21%)

Each row represents a different lecturer.

12
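In the handwriting results, each percentage is a category's share of that lecturer's words, so the four categories sum to the row total (for example, Prof. A has 16 + 1 + 0 + 1 = 18 words). The short Python sketch below uses only the counts from the table and recomputes the Total row; it assumes the four categories are exhaustive, which the counts bear out. A few per-lecturer percentages on the slide differ by one point from standard rounding (Prof. A's 16/18 is 88.9%), so the sketch checks only the Total row.

```python
CATEGORIES = ("Exact", "Alternate", "Close", "None")

# Per-lecturer counts from the results table: (Exact, Alternate, Close, None)
counts = {
    "Prof. A": (16, 1, 0, 1),
    "Prof. B": (146, 26, 6, 71),
    "Prof. C": (18, 5, 1, 19),
    "Prof. D": (262, 45, 9, 111),
    "Prof. E": (408, 46, 2, 58),
}


def column_totals(rows):
    """Sum each category across all lecturers."""
    return tuple(sum(col) for col in zip(*rows))


def percentages(row):
    """Each category's share of the row total, as a percentage."""
    n = sum(row)
    return tuple(100.0 * c / n for c in row)


total = column_totals(counts.values())
print("Words per lecturer:", {name: sum(row) for name, row in counts.items()})
print("Total:", total, "of", sum(total), "words")
for cat, pct in zip(CATEGORIES, percentages(total)):
    print(f"{cat:<9} {pct:5.1f}%")
```

Rounded to whole numbers, the computed shares (67.9%, 9.8%, 1.4%, 20.8% of 1,251 words) reproduce the Total row.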

Outline
    

- Motivation
- Handwriting Recognition
- Joint Writing and Speech Recognition
  - Look at potential
- Attentional Mark Identification
- Activity Inference: Recognizing Corrections
13

Joint Writing and Speech Recognition

Co-expression of ink and speech
- Is digital ink spoken as it is written?
  - Yes, but how often?
  - How “closely” does the speech follow the written text, in time and in accuracy? We wanted empirical evidence.
- Can speech be used to disambiguate handwriting?
- Can handwriting be used to disambiguate speech (including deictic references)?
14

Examples
Eswaran, Gray, Loric, Traiger

Difficult for Speech and Ink Recognition
DigiMon

Difficult Written Abbreviations

Java 2 Enterprise Edition

Speech/Ink Used to Disambiguate Ink/Speech
corn flakes

15

Experiment
    

- Examined instances of isolated word writing
- Selected word-writing episodes at random but uniformly from the various instructors
- Generated transcripts manually from the audio
- Checked whether the instructor spoke the exact word written
- Measured the time between the written and the spoken word
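The timing measurement can be sketched in a few lines of Python. This assumes each written word has a (start, end) time interval taken from its ink strokes and each spoken word has one taken from the manually generated transcript; "simultaneous" is assumed to mean the two intervals overlap, and any other pair is bucketed by the gap between them into 0-2 s or more than 2 s. The exact definitions used in the study are not stated, and `Interval`, `gap_seconds`, and `timing_bucket` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """A time span in seconds from the start of the lecture recording."""
    start: float
    end: float


def gap_seconds(written: Interval, spoken: Interval) -> float:
    """Time between the two intervals; 0.0 if they overlap."""
    if written.start <= spoken.end and spoken.start <= written.end:
        return 0.0
    # Disjoint: measure from the end of the earlier interval to the start of the later one.
    if spoken.start > written.end:
        return spoken.start - written.end
    return written.start - spoken.end


def timing_bucket(written: Interval, spoken: Interval) -> str:
    """Place a written/spoken pair into one of the timing columns (assumed definitions)."""
    gap = gap_seconds(written, spoken)
    if gap == 0.0:
        return "Simul"
    if gap <= 2.0:
        return "0-2s"
    return ">2s"


# Hypothetical episode: the word is written between 10.0 s and 11.5 s.
ink = Interval(10.0, 11.5)
print(timing_bucket(ink, Interval(11.0, 11.4)))  # Simul: spoken while writing
print(timing_bucket(ink, Interval(12.5, 13.0)))  # 0-2s: spoken 1.0 s after the ink ends
print(timing_bucket(ink, Interval(6.0, 6.5)))    # >2s: spoken 3.5 s before writing began
```

Measuring the gap in both directions matters because a lecturer may say a word before writing it as well as after.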
16

Speech/Text Co-occurrence Results

        Spoken vs. written word           Time between writing and speaking
        Exact      Approx     None        Simul       0-2 s      > 2 s
A        1 (100%)   0 (0%)    0 (0%)       1 (100%)   0 (0%)     0 (0%)
B        9 (75%)    3 (25%)   0 (0%)      12 (100%)   0 (0%)     0 (0%)
C        9 (82%)    2 (18%)   0 (0%)      10 (91%)    1 (9%)     0 (0%)
D       12 (86%)    2 (14%)   0 (0%)      10 (71%)    4 (29%)    0 (0%)
E        9 (56%)    7 (44%)   0 (0%)       7 (44%)    4 (25%)    5 (31%)
Total   40 (74%)   14 (26%)   0 (0%)      40 (74%)    9 (17%)    5 (9%)

Each row represents a different lecturer.
17

Outline
    

- Motivation
- Handwriting Recognition
- Joint Writing and Speech Recognition
- Attentional Mark Identification
- Activity Inference: Recognizing Corrections
18

Attentional Mark Identification
 

- Attentional marks are…
- The first step is to identify a stroke as a mark
- Tying attentional marks to slide content is important
- Attentional ink provides a concrete link between speech and slide content!
19

Example

20

Method

Segmentation
- Few strokes
- Close spatial and temporal proximity

Mark Recognition
- Created hand-tuned classifiers for circles, lines, and bullets/ticks

Matched with slide content
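The segmentation and mark-recognition steps can be illustrated with a Python sketch under stated assumptions. A stroke is assumed to be a list of (x, y, time) points; a stroke joins the current group when it starts within `max_gap_s` seconds of the previous stroke and lies within `max_dist` pixels of the group's bounding box, and groups of more than `max_strokes` strokes are treated as writing rather than marks. The classifier uses two hand-tuned features: overall size (to catch bullets and ticks) and the ratio of endpoint distance to path length, which is near 1 for a straight line and near 0 for a closed circle. All names and thresholds here are hypothetical, not the classifiers or values used in the study.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]       # (x, y, time in seconds)
Stroke = List[Point]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def bbox(points: List[Point]) -> Box:
    """Axis-aligned bounding box of a set of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def box_distance(a: Box, b: Box) -> float:
    """Gap between two boxes; 0 if they touch or overlap."""
    dx = max(0.0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0.0, max(a[1], b[1]) - min(a[3], b[3]))
    return math.hypot(dx, dy)


def segment(strokes: List[Stroke], max_gap_s: float = 1.0,
            max_dist: float = 30.0, max_strokes: int = 3) -> List[List[Stroke]]:
    """Group time-ordered strokes that are close in time and space; keep small groups."""
    groups: List[List[Stroke]] = []
    for stroke in strokes:
        if groups:
            last = groups[-1]
            gap = stroke[0][2] - last[-1][-1][2]  # this stroke's start minus previous stroke's end
            group_box = bbox([p for s in last for p in s])
            if gap <= max_gap_s and box_distance(group_box, bbox(stroke)) <= max_dist:
                last.append(stroke)
                continue
        groups.append([stroke])
    # Attentional marks consist of only a few strokes; larger groups are likely writing.
    return [g for g in groups if len(g) <= max_strokes]


def path_length(stroke: Stroke) -> float:
    """Total length of the polyline through the stroke's points."""
    return sum(math.dist(p[:2], q[:2]) for p, q in zip(stroke, stroke[1:]))


def classify(mark: List[Stroke], small: float = 15.0) -> str:
    """Hand-tuned rules for single-stroke marks; thresholds are illustrative only."""
    if len(mark) != 1:
        return "Other"
    stroke = mark[0]
    x0, y0, x1, y1 = bbox(stroke)
    if max(x1 - x0, y1 - y0) < small:
        return "Bullet/Tick"  # tiny mark
    closure = math.dist(stroke[0][:2], stroke[-1][:2]) / path_length(stroke)
    if closure > 0.9:
        return "Line"         # endpoints nearly as far apart as the path is long
    if closure < 0.2:
        return "Circle"       # path ends close to where it started
    return "Other"


# Synthetic strokes (pixels, seconds): an underline, a circle, and a small tick.
underline = [(100 + 10 * i, 200, 0.05 * i) for i in range(11)]
circle = [(150 + 40 * math.cos(2 * math.pi * k / 24),
           160 + 40 * math.sin(2 * math.pi * k / 24),
           5.0 + 0.05 * k) for k in range(25)]
tick = [(300, 300, 9.0), (303, 305, 9.1), (310, 295, 9.2)]

marks = segment([underline, circle, tick])
print([classify(m) for m in marks])  # ['Line', 'Circle', 'Bullet/Tick']
```

Because the three strokes are seconds apart, each forms its own candidate mark; drawing them in quick succession at the same spot would merge them into one group that this simple classifier labels "Other".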
21

Experiment
1. Identified and classified attentional marks by hand
   - Two different people per slide
   - Identified the type of mark as well as the slide content the mark referred to
2. Identified attentional marks automatically
3. Compared the resulting identifications
22

Content Matching Issues

Hard to determine exactly what content a mark refers to

Not just a recognition issue; it is also related to HOW people draw
23

Content Matching Cont.

Granularity of content parsing can be an issue
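The granularity problem can be made concrete with a small Python sketch. It assumes each piece of slide content has a bounding box and that a mark refers to the element it overlaps most; both the overlap rule and the slide coordinates are hypothetical. With the same circle drawn around "hash table", line-level parsing returns the whole line and word-level parsing returns only "table", so neither granularity yields the phrase the presenter most likely meant.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (0 if they do not intersect)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def match_content(mark: Box, elements: Dict[str, Box]) -> Optional[str]:
    """Return the content element whose box overlaps the mark most, or None."""
    best, best_area = None, 0.0
    for text, box in elements.items():
        area = overlap_area(mark, box)
        if area > best_area:
            best, best_area = text, area
    return best


# Hypothetical slide line "Use a hash table for lookups", parsed two ways.
line_level = {"Use a hash table for lookups": (100, 100, 400, 120)}
word_level = {
    "Use": (100, 100, 130, 120), "a": (135, 100, 145, 120),
    "hash": (150, 100, 195, 120), "table": (200, 100, 250, 120),
    "for": (255, 100, 285, 120), "lookups": (290, 100, 400, 120),
}
circle = (145, 95, 255, 125)  # a circle drawn around "hash table"

print(match_content(circle, line_level))  # Use a hash table for lookups
print(match_content(circle, word_level))  # table
```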

24

Attentional Ink Recognition Accuracy

             Exact       Exact to Punctuation   Close       Non-Match    Total
Circles       70 (66%)   13 (12%)                6 (6%)     17 (16%)      106
Underlines   207 (61%)   22 (6%)                44 (13%)    66 (20%)      339
Bullets       52 (60%)    0 (0%)                 0 (0%)     35 (40%)       87
Total        329 (62%)   35 (7%)                50 (9%)    118 (22%)      532
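The accuracy categories compare the content found automatically with the content identified by hand, but their definitions are not given. A minimal Python sketch, assuming Exact means identical strings, Exact to Punctuation means identical after removing punctuation (and, as a further assumption, ignoring case and extra whitespace), and Close means the two word sets overlap with a Jaccard similarity of at least 0.5 (a hypothetical threshold); `normalize` and `match_category` are illustrative names.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, remove punctuation, and collapse runs of whitespace."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.lower().split())


def match_category(auto: str, hand: str, close_threshold: float = 0.5) -> str:
    """Compare automatically matched content with the hand-labeled content."""
    if auto == hand:
        return "Exact"
    if normalize(auto) == normalize(hand):
        return "Exact to Punctuation"
    a, h = set(normalize(auto).split()), set(normalize(hand).split())
    # Jaccard similarity of the two word sets (shared words / all distinct words).
    if a and h and len(a & h) / len(a | h) >= close_threshold:
        return "Close"
    return "Non-Match"


print(match_category("hash table", "hash table"))    # Exact
print(match_category("hash table,", "hash table"))   # Exact to Punctuation
print(match_category("a hash table", "hash table"))  # Close
print(match_category("linked list", "hash table"))   # Non-Match
```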

25

Outline
    

- Motivation
- Handwriting Recognition
- Joint Writing and Speech Recognition
- Attentional Mark Identification
- Activity Inference: Recognizing Corrections
26

Recognizing Corrections

Why? We want to answer the broad question:
- “Can we recognize patterns of activity by analyzing the ink and speech channels?”

Useful for presenters
- Corrections occur frequently (about 1-3 per lecture)

Our vision allows for false positives

But recognizing corrections is non-trivial

27

Recognizing Corrections

Identified Six Types of Corrections

Looked through a large number of lectures covering a wide range of marks
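The six correction types are not listed in this text. Purely as a hypothetical illustration of activity inference on the ink channel, the Python sketch below flags a candidate correction when an erase is followed within a few seconds by new ink that overlaps the erased region. The event model (`InkEvent` with "ink" and "erase" kinds), the function names, and the 5-second window are assumptions, not the method used in the study; the rule is deliberately permissive because, as noted above, the envisioned use allows false positives.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass
class InkEvent:
    kind: str    # "ink" for a new stroke, "erase" for removed ink (assumed event model)
    time: float  # seconds from the start of the lecture
    box: Box     # bounding box of the stroke or of the erased region


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes touch or intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_corrections(events: List[InkEvent],
                          window_s: float = 5.0) -> List[Tuple[float, float]]:
    """Return (erase_time, ink_time) pairs where new ink lands on a just-erased region."""
    found = []
    events = sorted(events, key=lambda e: e.time)
    for i, erase in enumerate(events):
        if erase.kind != "erase":
            continue
        for later in events[i + 1:]:
            if later.time - erase.time > window_s:
                break  # events are sorted, so nothing later can qualify
            if later.kind == "ink" and boxes_overlap(erase.box, later.box):
                found.append((erase.time, later.time))
                break  # report at most one correction per erase
    return found


events = [
    InkEvent("ink", 10.0, (100, 100, 160, 120)),    # writes a word
    InkEvent("erase", 14.0, (100, 100, 160, 120)),  # erases it
    InkEvent("ink", 15.5, (105, 98, 170, 122)),     # rewrites in the same place
    InkEvent("erase", 40.0, (300, 300, 350, 320)),  # erases elsewhere
    InkEvent("ink", 41.0, (500, 100, 560, 120)),    # new ink far from the erased region
]
print(candidate_corrections(events))  # [(14.0, 15.5)]
```

A fuller detector would also consult the speech channel, for example spoken cues near the erase, but that is beyond this sketch.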

28

Example Results

No table, because:
1. This was not a robust experiment
2. It was a proof of concept

29

Wrap-up

We wanted to understand the nature of real data to direct our focus when building tools for automatic analysis. Our studies provided the necessary understanding to accomplish this.

30

Wrap-up (Cont.)
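Specific results (all open for refinement):
- Basic handwriting recognition is surprisingly good: 68% of words were recognized exactly, with no training
- Very strong co-occurrence of written and spoken words: every sampled written word was also spoken, 74% of them exactly and 74% simultaneously with the writing
- We were able to identify attentional marks and the content associated with them: 62% of marks were matched to exactly the content identified by hand
- Activity recognition: there are certain high-level activities that we can identify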
31

Questions?
E-mail
cmprince@cs.washington.edu jonsu@cs.washington.edu

Classroom Presenter Website
http://www.cs.washington.edu/education/dl/presenter/

32
