You are on page 1of 27

Gregory Aist Page 1 of 27 11/16/2010

Improving Elementary Students’ Reading Abilities

with Skill-Specific Spoken Dialogs

in a Reading Tutor that Listens

Ph.D. Thesis Proposal

Gregory Aist

Language Technologies Institute

Carnegie Mellon University

213 Cyert Hall, 4910 Forbes Avenue

Pittsburgh, PA 15213-3720

Thesis Committee:

Jack Mostow,, Robotics Institute/LTI, advisor

Albert Corbett,, HCII

Alex Rudnicky,, CSD/LTI

Charles Perfetti,, University of Pittsburgh

Gregory Aist Page 2 of 27 11/16/2010


The Literacy Challenge. Reading is fundamental. If children don't learn to read independently by

the fourth grade, they fall further and further behind in school. One-on-one instruction by trained

human tutors succeeds in helping kids learn to read, but is expensive and sometimes unavailable.

Efforts to duplicate the effects of one-on-one tutoring in large group settings have typically not

matched the performance of human tutors.

The Technology Opportunity. Advances in speech recognition and spoken dialog technology

have made possible computer-based reading tutoring. Intelligent tutoring systems based on

cognitive principles have previously proven successful in such varied domains as algebra and

computer programming. By combining spoken dialogue and intelligent tutoring systems we hope to

come closer to the goal of a "two-sigma" computer tutor for reading -- one that duplicates the two-

standard-deviation gain in reading skill observed for human-human tutoring.

Research Strategy. One methodology in reading research is to study successful human tutors and

identify what makes tutorial dialog effective. Unfortunately, human-human tutorial dialog cannot

be directly imitated by the computer. Errors in speech recognition, combined with the broad range

of discourse, domain and world knowledge used by human tutors, require an indirect approach.

Therefore, we intend to identify a few critical reading skills and explore which features of human-
Gregory Aist Page 3 of 27 11/16/2010

human dialog effectively train those skills. For each skill, by combining a cognitive model of the

skill with the dialog features that effectively train components of that skill, we will design human

computer multimodal dialogues that successfully train the desired skill.

Organization of this Proposal. In this proposal, we briefly describe related research in beginning

reading and in educational software for reading. We then describe Project LISTEN’s Reading

Tutor, including some of the technological and design considerations that will guide our dialog

design for reading tutoring. Next, we suggest several examples of reading skills we might focus on:

word attack, word comprehension, and passage comprehension. For each example skill, we

describe how that skill is learned and taught, discuss hypothetical computer human dialogs designed

to train that skill, propose methods for evaluating the effectiveness of such dialogs, and consider the

expected contributions of developing successful dialogs. (As implemented in the thesis, these

dialogs may be newly designed, or they may be modifications of existing dialogs within the Reading

Tutor.) We claim that skill specific human computer multimodal dialogs, based on cognitive skill

models and successful human tutoring strategies, can improve elementary students' reading abilities.

Related Research

Beginning Reading. Many factors involved in achieving competence in early reading. For poor

readers, word recognition skills are critical (Ehrlich 1993, Stanovich 1991). For good readers, other

factors including metacognitive skills and motivation are also important:

“Basic word decoding and perceptual skills are necessary in order to read; if a child lacks these

cognitive skills, even the most adaptive attribution and self-efficacy beliefs will not magically reveal the

meaning behind the text. Thus for poor readers, word decoding skill is highly related to comprehension
Gregory Aist Page 4 of 27 11/16/2010

ability. In contrast, for good readers who possess adequate decoding skills, motivational variables such

as perceived competence emerge as influential factors determining reading performance.” (Ehrlich

1993). In addition to predicting immediate ability, poor word decoding skills are a good predictor of

long-term reading difficulties (Ehrlich 1993).

Beyond word recognition, fluent reading relies on lower-level cognitive skills such as symbol-naming

ability (Bowers 1993). Phonological awareness is also a factor, but it is not clear whether this is an

result of word-recognition skills or an independent contribution to reading success (Bowers 1993).

Individual differences also play a role in achieving reading fluency. For example, while some poor

readers learn better with instruction including “Listening Previewing”, or hearing a passage read aloud

while following along in the text (Daly and Martens 1994), some learn better without previewing

(Tingstrom 1995).

What role do segmentation skills play in beginning reading? Nation and Hulme (1997) found that

phonemic segmentation predicts early reading and spelling skills more than onset-rime segmentation.

Peterson and Haines (1992) found that training kindergarten children to construct rhyming words from

onsets and rimes improved children's segmentation ability, letter-sound knowledge, and ability to read

words by analogy.

What about selection of material? Rosenhouse et al. (1997) found that interactive reading aloud to first-

graders led to increases in decoding, passage comprehension, and picture storytelling. Rosenhouse et al.

also found that reading serial stories (stories with the same characters and moderately predictable plots

or conflicts) had a positive impact on the number of books bought for pleasure reading.

A review of the literature by Roller (1994) reveals that teachers interrupt poor readers more frequently

than good readers, but which comes first (poor reading or interruption) is not clear. Also, with good

readers, more emphasis is placed on meaning. Again, the reasons and causal relationships are unclear.
Gregory Aist Page 5 of 27 11/16/2010

Because of the importance of word recognition in learning to read, reading software should encourage

the development of word decoding skills and aim to increase the sight vocabulary of the student.

However, since motivational variables become important for good readers, and the goal of the software

is to enable poor readers to become good readers, reading software should ideally also encourage the

development of positive motivational attitudes towards reading and provide experiences that let the

student experience the joy of reading.

Educational software for reading. While the literature on intelligent tutoring systems is quite

substantial, we will focus here on automated reading tutors. Commercial reading systems provide help

on demand, and some (Vanderbilt 1996) even provide the opportunity for students to record their own

readings of the material. Use of speech recognition in reading systems is much more rare and is at the

research stage.

Some reading software systems provide spoken assistance on demand (Discis 1991), (Edmark 1995),

(Learning Company 1995), and the use of synthesized speech has been explored in a research context

(Lundberg and Olofsson1993). The Little Planet Literacy Series (Vanderbilt 1996) provides help on

demand and allows children to record their own voices, but it does not use speech recognition and

therefore cannot judge the quality of the child's reading or offer help based on such a judgement.

Previous research has demonstrated that mouse- or keyboard-oriented computer-assisted instruction can

improve reading skills such as phonological awareness and word identification (e.g. Barker and

Torgesen 1995).

Lewin (1998) in a study of “talking book” software, found that such software was typically used with

students in pairs (82%) or as individuals (29%), or occasionally with students in larger groups (11%).

Pairs or groups were more commonly normally progressing readers, perhaps to ensure that all students
Gregory Aist Page 6 of 27 11/16/2010

were able to use the software while allowing poorer readers more individual time. Teachers requested

additional feedback from the software, such as onset and rime, and “hints”. Teachers also requested

more "reinforcement activities" for particular skills . Most teachers, however, made little use of records

of which words kids clicked on.

As a cautionary note, in software filled with animated or talking characters, children may spend large

amounts of time clicking on the characters to see the animation (Underwood and Underwood 1998), to

the detriment of time spent reading.

What is it that distinguishes human tutoring from most reading software? One fundamental difference is

that in human tutoring, to one extent or another, the student produces the desired sounds (e.g.

pronouncing an unfamiliar word), instead of recognizing them. Not only does the student make the

sound; the student makes the effort to make the sound. By contrast, in reading software that does not

listen, students may be restricted to receptive activities such as matching up rhyming words, or to

constructive activities that redirect what would normally be speech into some other medium, such as

putting blocks together to make words.

Automatic speech recognition (Huang 1993), while it has appeared in other language-related educational

systems (such as single-word foreign language pronunciation training, and speech pathology software),

is still a rarity in reading software (but see Edmark 1997). DRA Malvern has developed a system called

STAR (Speech Training Aid) that listens to isolated words without context (Russell 1996). Russell et al.

(1996) also describe an ongoing research effort with aims similar to those of Project LISTEN, the

Talking and Listening Book project, but they use word spotting techniques to listen for a single word at

a time. They also require the child to decide when to move on to the next word (fully user-initiated) or

completely reserve that choice to the system (fully system-initiated). For other systems using speech

recognition with reading tutoring, see (Edmark 1997, IBM 1998).

Gregory Aist Page 7 of 27 11/16/2010

Research Context: Project LISTEN

Project LISTEN: A Reading Tutor that Listens. Project LISTEN’s Reading Tutor (Mostow and Aist

AAAI 1997, Mostow et al. 1995, Mostow et al. 1994, Mostow et al. 1993) adapts the Sphinx-II speech

recognizer (Huang et al. 1993) to listen to children read aloud. The Reading Tutor runs on a single

stand-alone Pentium™. The child uses a noise-cancelling headset or handset microphone and a mouse,

but not a keyboard. Roughly speaking, the Reading Tutor displays a sentence, listens to the child read it,

Figure 1. Reading Tutor, Fall 1997. (This screen Tutor looks nearly identical in the Fall

1998 Reading Tutor, with Reader relabeled Goodbye.)

Gregory Aist Page 8 of 27 11/16/2010

provides help in response to requests or on its own initiative based on student performance. (Aist 1997)

describes how the Reading Tutor decides when to go on to the next sentence.

The student can read a word aloud, read a sentence aloud, or read part of a sentence aloud. The student

can click on a word to get help on it. The student can click on Back to move to the previous sentence,

Help to request help on the sentence, or Go to move to the next sentence (Figure 1). The student can

click on Story to pick a different story, or on Goodbye to log out.

The Reading Tutor can choose from several communicative actions, involving digitized and synthesized

speech, graphics, and navigation (Aist and Mostow 1997). The Reading Tutor can provide help on a

word (e.g. by speaking the word), provide help on a sentence (e.g. by reading it aloud), backchannel

(“mm-hmm”), provide just-in-time help on using the system, and navigate (e.g. go on to the next

sentence). With speech awareness central to its design, interaction can be natural, compelling, and

effective (Mostow and Aist WPUI 1997).

Writing. The Reading Tutor has the capability, so far used mostly in the laboratory, to allow new stories

to be typed in and then recorded with automatic quality control on the recordings using automatic speech


Taking turns. People in general exhibit a rich variety of turn-taking behavior: interruption,

backchanneling, and multiple turns (Ayres et al. 1994, Duncan 1972, Sacks, Schegloff, and Jefferson

1974, Tannen 1984, Uljin 1995). Turn-taking is important in tutorial dialog as well (Fox 1993).

Humans use different conversational styles when speaking to computers than when speaking to humans.

Shorter sentences, a smaller vocabulary, fewer exchanges, fewer interruptions, and fewer justifications

of requests are all characteristic of human conversational style during human-computer spoken dialogue,

but it is not clear whether the difference is due to the (supposed or actual) identity of the interlocutor, or

the interlocutor's conversational style (Johnstone 1994). Apparently when computers behave similarly
Gregory Aist Page 9 of 27 11/16/2010

to humans, human-computer interaction more closely resembles human-human interaction (Johnstone


In general, spoken language systems follow strict turn-taking behavior, and even Wizard of Oz studies

tend to use a simple “my turn”—“your turn” approach to low-level discourse behavior in terms of their

generation abilities (Johnstone 1994). The Reading Tutor, however, employs a conversational

architecture (Aist 1998) that allows interruption by either the Tutor or the student, overlap,

backchanneling, and multiple turn-taking (cf. Donaldson and Cohen 1997, Keim, Fulkerson, and

Biermann 1997, Ward 1996, Ball 1997).

The nature of feedback. In prior work, the Reading Tutor has been designed to never tell the student

she was right, and to never tell the student she was wrong. Because of error in the speech recognition,

explicit right/wrong feedback would be incorrect sometimes, which might confuse the student.

Therefore, the Reading Tutor generally simply gives the correct answer and leaves the right/wrong

judgment partially as an exercise for the reader. This turns out to match well with the observation of

Weber and Shake (1988) that teachers’ rejoinders to student responses in comprehension discussions

were most frequently null or involved repetition of the student’s answer. Giving the correct answer may

seem like corrective feedback if the student was right, and may seem like confirmation if the student was

wrong. This interpretation depends on the ability of the student to contrast his or her answer with the

answer given by the system.

Visual design. Throughout the visual design of the Reading Tutor, we will continue to use buttons

labeled with both text and pictures, for maximum clarity and ease of use (King et al. 1996).

Skill: Word Attack

Description. For the first example skill, let us consider “word attack”, or decoding skills. Here the goal

is to learn how to take unknown words and turn them into sound. There are many stages in taking a
Gregory Aist Page 10 of 27 11/16/2010

word from printed symbol to speech:

1. Visual stimulus "cat"

2. Visual stimulus "c", "a", "t"

3. Orthographic symbols 'c', 'a', 't'

4. Phonological representation: phonemes /k/, /a/, /t/

5. Phoneme string /kat/

6. Sound of the word 'cat'

This skill concentrates on that part of the reading process that transforms letters into sounds – the

orthographic to phonological mapping (#4 above). One reasonable skill model is thus a probabilistic

context-sensitive unidirectional grapheme-to-phoneme mappings, at various levels of subword detail.

There are several sources for deriving the subword units: inferring them from English text, inferring

them from kids’ performance, or from the literature on subword components in reading.

Human tutoring strategies. What constitutes a successful human-human dialog for teaching word

attack? In a study of 30 college student-elementary student tutoring dyads, Juel (1996) analyzed

videotaped interactions for successful tutoring strategies.

Two activities were found to be particularly important in successful dyads: (a) the use of

texts that gradually and repetitively introduced both high-frequency vocabulary and

words with common spelling patterns, and (b) activities in which children were engaged

in direct letter-sound instruction. Two forms of verbal interactions were found to be

particularly important: (a) scaffolding of reading and writing, and (b) modeling of how
Gregory Aist Page 11 of 27 11/16/2010

to read and spell unknown words. A synergistic relationship was found to exist between

the form and content of instruction. (Juel 1996).

In this study, direct letter sound instruction included making index cards with words on them together.

Most of the children’s responses in cited dialogue are words that are answers to tutor's questions. Juel

also notes that successful pairs share "affection, bonding, and reinforcement."

Why are dialogs like those described by Juel successful? Perhaps, by sounding out real words, kids get

to practice decoding rules in the contexts in which they are used. Perhaps, by providing scaffolding,

human tutors keep kids from mislearning rules and provide correct examples of decoding.

Computer-human dialog. What would a computer-human dialog that successfully taught word attack

skills look like? One possibility is to adapt the current Reading Tutor sentence-reading dialog to a

word-list reading dialog. Using information about what rules are used to pronounce words, and using its

records of student performance, the Reading Tutor would select a grapheme-to-phoneme rule for the

student to practice. The Reading Tutor would present a list of words that involved a specified rule. The

student would then read each of these words, with the Reading Tutor focusing on modeling correct

sounding-out of the words to reinforce the particular rule. By placing the correct stimulus (the words)

on the screen, we would hope to reduce off-task or non-reading speech, and thus make the speech

recognition task feasible. What kind of scaffolding might the Reading Tutor provide during this task?

The Reading Tutor already provides help such as sounding out words and providing rhyming words. In

the future, the Reading Tutor might employ other help, such as orthographic units slightly apart as a

subtle aid to visually grouping letters together: “team” might get redisplayed briefly as “t ea m”.

Another possibility is for the Reading Tutor to dynamically construct short bits of text for the student to

read that, when read, get the student to “practice” sounding out a word: “t ea m. t ea m. team.”

Gregory Aist Page 12 of 27 11/16/2010

How would such a dialog be evaluated? We could test the effectiveness of practice on an individual rule

(say, d → /d/) by looking for improvements on real words or pseudo-words with that mapping. For

example, students could be presented with two pseudo-words to read, where the items are identical

except that one contains the rule used during training and the other contains a different rule. Besides

pedagogical effectiveness, the dialog must also be understandable to students and fun enough to get

them to participate.

Expected Contribution. Successful design of a spoken dialog to teach word attack skills would be an

important achievement in the field of reading education. In addition, such a dialog is expected to raise

important questions about integrating intelligent tutoring systems with speech recognition, two fields

that are ripe for combination.

Skill: Word Comprehension

Description. Once a word has been decoded (or recognized, if it is a familiar word), a student must be

able to access the meaning of the word in order to understand the sentence the word is in. What is the

goal of training word comprehension skills? Essentially, vocabulary growth – kids should learn the

meaning, spelling, pronunciation, and usage of new words.

Human tutoring strategies. What are techniques that work when human tutors help students learn new

words? For beginning readers, many words may be learned through written context, by having stories

read and re-read to them (Eller et al. 1988). In order for children to encounter many new words,

however, they may need to read material hard enough to traditionally be considered at their frustration
Gregory Aist Page 13 of 27 11/16/2010

level (Carver 1994) – and children may not choose material this difficult on their own, either in

traditional free reading time or with computerized instruction.

Human tutors also introduce and explain new vocabulary, and help students practice spellings of words.

When should definitions or other word specific comprehension assistance be presented? For high school

readers, Memory (1990) suggests that the time of instruction (before, during, or after the reading

passage) for teaching technical vocabulary may not matter as much as the manner of instruction. The

implication is that the Reading Tutor may be able to choose when to present a definition or other word

comprehension help at several different times without substantially harming the student’s ability to learn

from the assistance.

What kind of word-specific comprehension should be given? Definitions, in particular context-specific

definitions, are one obvious candidate. What are some other options? Example sentences may be of

some help (Scott and Nagy 1997), but learning new words from definitions is still very hard even with

example sentences.

Which words should be explicitly taught? Zechmeister et al. (1995) suggests that explicit vocabulary

instruction be focused on functionally important words, which they operationalize as main entries in a

medium-sized dictionary.

Human-computer dialog. Here a direct approach to generating a human-computer dialog that captures

the essence of human tutorial strategies – such as constructing a dialog where the Reading Tutor

interactively explains the meaning of new words, or augmenting stories with specially written, context-

sensitive definitions – results in excessive requirements for curriculum design or is beyond the state of

the art in spoken dialog.

How can we design an alternate interaction that places fewer requirements on instructional content and

speech recognition? One possibility is to adapt the Writing capability of the Reading Tutor to allow kids
Gregory Aist Page 14 of 27 11/16/2010

to build up a “My Words” portfolio that serves as an explicit record of the student’s growing vocabulary.

The trick here is to try to get the classroom teacher involved in the process by commenting on, and

indirectly supervising, the student’s production of this portfolio. Words could be selected for inclusion

in the My Words portfolio when the student encountered them in stories read with the Reading Tutor. In

order to get students to encounter new words, the Tutor could influence or partly control the student’s

choice of stories to read in order to steer the student towards harder material. Part of adding a word to

“My Words” might include practice sessions for spelling, or reading sentences that contained that word

drawn from the stories used by the Reading Tutor.

Should the computer or the student take the initiative in selecting word specific comprehension

assistance? A study by Reinking and Rickman (1990) indicated that mandatory computer presentation of

context specific definitions was better than offline access. Students selected computer presentation was

also better than offline access, and mandatory computer presentation was better than student selected

computer presentation but not significantly. Perhaps a mixed-initiative strategy, where the Reading

Tutor selects extra comprehension activities for some "hard" words but students can choose to pursue

them for other words as well, would allow students to feel in control while ensuring some additional

comprehension assistance.

Evaluation. How would such a portfolio-based interaction be evaluated?

As kids read stories, the Reading Tutor would seek to identify new, unusual, or difficult words. Some

percentage of those words might be selected for "my words"; the rest would be used as control

condition. Students would build up a “My Words” list by reading words and drawing pictures of them.

(The resulting objects could be used later in illustrating stories). For each word in “My Words”,

students would write definitions and sample sentences for the words - expository writing, to harness

kids' communicative intent in order to "get words right". The underlying message here is that learning
Gregory Aist Page 15 of 27 11/16/2010

new words is important because they help you read and write interesting stories – and students should

learn the importance of new words through experience.

How could such an interaction be evaluated? One method would be to determine if students are better at

recognizing and understanding the words they've put into “My Words”, as compared to other words that

were in the stories they read, but were not selected for “my words”. Several factors make evaluation of

this sort of interaction difficult, however. Even if spending time on “My Words” led to improvement in

recognizing and understanding the words it contained, it might still be the case that that time might have

been better spent reading (different, and challenging) stories. The evaluation for this interaction will

have to address these issues.

Expected Contribution. A successful human-computer dialog for teaching word comprehension as

described above would “close the gap” between reading, writing, and illustrating in the Reading Tutor,

allowing students to participate in several literacy activities in a coherent instructional situation. One

expected technical contribution is the development of automated techniques for the Reading Tutor to

pick out difficult words – from any material -- for students to spend more time on, thus extending

comprehension assistance to student-authored material. In addition, the presence of “My Words” as a

writing (and therefore narrating) exercise would bring the “authoring” capability of the Reading Tutor to

use in real classrooms by students. Another contribution would be the use of speech recognition –

constrained to be less difficult than full-scale dictation – in a constructive learning activity, another

bridge between spoken dialog systems and instructional tutoring systems.

Gregory Aist Page 16 of 27 11/16/2010

Skill: Passage Comprehension

Description. Being able to read and understand individual words is not by itself sufficient. More is

needed to be a skilled reader. What is the goal of teaching passage comprehension? Reading with

understanding – making meaning out of print.

Human tutoring strategies. While a student is reading, human tutors supply words in response to pauses,

interrupt students to engage in phonologically motivated interventions, complete students' words, and

provide backchannel feedback (e.g. Oops!) (Roller 1994). One of the simplest strategies for improved

comprehension is rereading. Reading a text twice increases retention of facts (Barnett and Seefeldt


Other skills play a role in passage comprehension. For example, word recognition is a good predictor of

reading comprehension, especially for poor readers (Ehrlich et al. 1993). One skill of good readers is

adjusting reading rate to passage difficulty (Freese 1997). Providing prior information about relevant

subject matter may improve comprehension of text about unfamiliar topics (Stahl and Jacobson 1986).

However, improving comprehension on a particular passage is not necessarily the same as helping the

student learn how to comprehend other passages better.

What about pictures as an aid to comprehension? Pictures have appeal for young students, but may not

teach skills that are transferable to text without pictures. For pictures to improve passage

comprehension, O’Keefe and Solman (1987) suggest that pictures must be displayed simultaneously

with the text they depict, rather than prior to reading the text or after reading the text.

Human-computer dialog. Human tutors teach comprehension in a variety of ways, but the current

Reading Tutor interaction is based on one-on-one oral reading tutoring. The resulting dialog is called

“shared reading”, where the student reads wherever possible and the computer reads wherever
Gregory Aist Page 17 of 27 11/16/2010

necessary. How can shared reading be improved? The current Reading Tutor allows kids free choice of

all material on the system. Some kids make what appear to be poor choices -- reading material that's too

easy, or reading the same story over and over again. During a 1996-1997 pilot study, six third-grade

children who started out below grade level gained almost two years in fluency in only eight months of

Reading Tutor use (Mostow and Aist WPUI 1997). The aide (Ms. Brooks) who helped kids pick stories

most likely played a role in the observed two-year fluency gain. In order to improve the shared reading

process, perhaps the Reading Tutor should do better at helping kids pick appropriate material. One

possibility here are "curriculum adjustment" tools, where the teacher or some administrator gets to adjust

an individual kid's reading list. However, we have found it difficult to involve teachers in directly using

the Reading Tutor for such things as entering student data; they seem to prefer interacting with the

Reading Tutor indirectly. For example, one teacher put a list of stories to read on an 3x5 index card and

set the card on top of the Reading Tutor monitor. So, an alternate approach is to have the Reading Tutor

“filter” or “sort” the list of stories that students can access to restrict or guide students to appropriate

material. This approach would have to be balanced to allow the teacher to “indirectly” use the computer

through influencing her students’ choices, if desired.

Evaluation. How should a revised story selection scheme be evaluated? Perhaps the most basic

question is whether kids will tolerate the interaction design that lets the Tutor affect story choice. In

terms of pedagogical evaluation, the question of appropriate material has at least two subcomponents: is

the story of the right difficulty? and, is the story interesting for that kid at that moment?) In addition,

since teachers may affect student choice of stories, teacher choices may affect the results of evaluation

of Tutor-mediated story selection. One possibility is to separate out good story choices from poor

choices, based on records of kids' performance. Note that this is a tricky proposition. What should we

assume makes a “good” choice or a “poor” choice? An acceptable range of fluency on that story?
Gregory Aist Page 18 of 27 11/16/2010

Should we seek an expert consensus on which stories are obviously too hard or obviously too easy? If

so, by watching videotapes of the student reading the story, or by prediction from the material? One

exciting alternate possibility is to test the goodness of the story choice based on the actual learning from

that story. It is an open question as to whether or not we can detect learning at that fine of a level.

Expected Contribution. By modifying the Reading Tutor to better guide the story choices kids make,

we would hope to improve the overall effectiveness of the Reading Tutor. In addition, using the results

of speech recognition – in the form of a student model – to guide students’ selection of stories from an

open-ended set of material forges an important link between automatic speech recognition and

intelligent tutoring systems.


We propose to enhance Project LISTEN’s Reading Tutor by developing skill-specific human-computer

spoken dialogs that train elementary students in fundamental reading skills. Reading is fundamental,

and the time is ripe in intelligent tutoring systems and in spoken dialog systems to address beginning

reading using techniques from both fields and using the issues that arise to build bridges between the

two fields. The major expected scientific contributions are an improved Reading Tutor and a better

understanding of how spoken language technology can be used with intelligent tutoring systems. We

have briefly described relevant research both in reading and in automated reading tutoring. We have

described the work to date on Project LISTEN’s Reading Tutor, which will also serve as the platform for

the proposed work. We have described three example skills to train: word attack, word comprehension,

and passage comprehension. For each skill we have discussed how human tutors train that skill, how a

human-computer spoken dialog might train that skill, and how such a dialog might be evaluated. Finally

we will conclude with a projected timeline for completion.

Gregory Aist Page 19 of 27 11/16/2010

Proposed Schedule

August 1998: Proposal

(September 1998 - December 1998: visiting Macquarie University, Sydney, Australia)

January 1999 - May 1999: Collect and analyze videotapes and transcripts of successful human-human

dialogs, perhaps by courtesy of other researchers; collect examples and descriptions of successful human

tutoring strategies from the literature

June 1999 - December 1999: Design and implement computer-human dialogs that capture the important

characteristics of successful human-human tutoring dialogs

January 2000 - May 2000: Usability testing and code hardening

June 2000 - May 2001: Evaluation and writeup

May 2001: Defense.


This material is based upon work supported in part by the National Science Foundation under Grant No.

IRI-9505156 and CDA-9616546 and by the author's National Science Foundation Graduate Fellowship

and Harvey Fellowship. Any opinions, findings, conclusions, or recommendations expressed in this

publication are those of the author(s) and do not necessarily reflect the views of the National Science

Foundation or the official policies, either expressed or implied, of the sponsors or of the United States

Gregory Aist Page 20 of 27 11/16/2010

We thank first our committee members; the Principal of Fort Pitt Elementary School, Dr. Gayle Griffin,

and the teachers at Fort Pitt for their assistance; Drs. Rollanda O'Connor and Leslie Thyberg for their

expertise on reading; Raj Reddy and the CMU Speech Group (especially Ravi Mosur) for the Sphinx-II

speech recognizer; and the many present and past members of Project LISTEN whose work contributed

to current and previous versions of the Reading Tutor; and many students, educators, and parents for

tests of the Reading Tutor tests in our lab and at Fort Pitt Elementary School.


Aist, G. S. 1998. Expanding a time-sensitive conversational architecture for turn-taking to handle

content-driven interruption. To appear in ICSLP 1998, Sydney, Australia.

Aist, G. S. 1997. Challenges for a Mixed Initiative Spoken Dialog System for Oral Reading Tutoring.

AAAI 1997 Spring Symposium on Computational Models for Mixed Initiative Interaction. AAAI

Technical Report SS-97-04.

Aist, G. S., and Mostow, J. 1997. Adapting Human Tutorial Interventions for a Reading Tutor that

Listens: Using Continuous Speech Recognition in Interactive Educational Multimedia. In Proceedings

of CALL 97: Theory and Practice of Multimedia in Computer Assisted Language Learning. Exeter, UK.

Ayres, J., Hopf, T., Brown, K, and Suek, J. M. 1994. The Impact of Communication Apprehension,

Gender, and Time on Turn-Taking Behavior in Initial Interactions. The Southern Communication

Journal, 59(2):142-152.
Gregory Aist Page 21 of 27 11/16/2010

Ball, G. 1997. Dialogue Initiative in a Web Assistant. AAAI 1997 Spring Symposium on

Computational Models for Mixed Initiative Interaction.

Barker, Theodore Allen, and Torgesen, Joseph K. 1995. An evaluation of computer-assisted instruction

in phonological awareness with below average readers. Journal of Educational Computing Research

13(1), pp. 89-103.

Barnett, Jerrold E., and Seefeldt, Richard W. 1989. Read something once, why read it again?: Repetitive

reading and recall. Journal of Reading Behavior 21(4), pp. 351-360.

Bowers, P. G. 1993. Text reading and rereading: Determinants of fluency beyond word recognition.

Journal of Reading Behavior 25(2) 133-153.

Carver, Ronald P. 1994. Percentage of unknown vocabulary words in text as a function of the relative

difficulty of the text: Implications for instruction. Journal of Reading Behavior 26(4) pp. 413-437.

Daly, E.J., and Martens, B. K. 1994. A comparison of three interventions for increasing oral reading

performance: application of the instructional hierarchy. Journal of Applied Behavior Analysis 27(3)


Discis Knowledge Research Inc.. 1991. DISCIS Books. Macintosh software for computer-assisted

Gregory Aist Page 22 of 27 11/16/2010

Donaldson, T. and Cohen, R. 1997. A Constraint Satisfaction Framework for Managing Mixed-

Initiative Discourse. AAAI 1997 Spring Symposium on Computational Models for Mixed Initiative


Duncan, S. 1972. Some signals and rules for taking speaking turns in conversations. Journal of

Personality and Social Psychology 23(2):283-292.

Edmark. 1995. Bailey's Book House.

Edmark. 1997. Let’s Go Read.

Ehrlich, Marie-France, Kurtz-Costes, Beth, Loridant, Catherine. 1993. Cognitive and motivational

determinants of reading comprehension in good and poor readers. Journal of Reading Behavior 25(4),

pp. 365-381.

Eller, Rebecca G., Pappas, Christine C., and Brown, Elga. 1988. The lexical development of

kindergarteners: Learning from written context. Journal of Reading Behavior 20(1), pp. 5-24.

Fox, B. A. 1993. The Human Tutorial Dialogue Project: Issues in the Design of Instructional Systems.

Hillsdale NJ: Lawrence Erlbaum.

Freese, Anne Reilley. 1997. Reading rate and comprehension: Implications for designing computer

technology to facilitate reading comprehension. Computer Assisted Language Learning 10(4), pp. 311-
Gregory Aist Page 23 of 27 11/16/2010


Huang, X. D., Alleva, F., Hon, H. W., Hwang, M. Y., Lee, K. F., and Rosenfeld, R. 1993. The Sphinx-

II Speech Recognition System: An Overview. Computer Speech and Language 7(2):137-148.

IBM. 1998. Watch Me Read.

Johnstone, A., Berry, U., Nguyen, T., and Asper, A. 1994. There was a long pause: Influencing turn-

taking behaviour in human-human and human-computer spoken dialogues. International Journal of

Human-Computer Studies 41, 383-411.

Juel, Connie. 1996. What makes literacy tutoring effective? Reading Research Quarterly 31(3), pp. 268-


Keim, G. A., Fulkerson, M. S., and Biermann, A. W. 1997. Initiative in Tutorial Dialogue Systems.

AAAI 1997 Spring Symposium on Computational Models for Mixed Initiative Interaction.

King, Kira S., Boling, Elizabeth; Anneli, Janet; Bray, Marty; Cardenas, Dulce; and Theodore Frick.

1996. Relative perceptibility of hypercard buttons using pictorial symbols and text labels. Journal of

Educational Computing Research 14(1), pp. 67-81.

The Learning Company. 1995. Reader Rabbit’s® Interactive Reading Journey™.

Gregory Aist Page 24 of 27 11/16/2010

Lewin, Cathy. 1998. Talking book design: What do practitioners want? Computers in Education

30(1/2), pp. 87-94.

Lundberg, I., and Olofsson, A. 1993. Can computer speech support reading comprehension?

Computers in Human Behavior 9(2-3), 283-293.

Memory, David M. 1990. Teaching technical vocabulary: Before, during or after the reading

assignment? Journal of Reading Behavior 22(1), pp. 39-53.

Mostow, J., Hauptmann, A. G., Chase, L. L., and Roth. S. 1993. Towards a Reading Coach that

Listens: Automatic Detection of Oral Reading Errors. In Proceedings of the Eleventh National

Conference on Artificial Intelligence (AAAI-93), 392-397. Washington DC: American Association for

Artificial Intelligence.

Mostow, J., Roth, S. F., Hauptmann, A. G., and Kane, M. 1994. A Prototype Reading Coach that

Listens. In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), Seattle

WA. Selected as the AAAI-94 Outstanding Paper.

Mostow, J., Hauptmann, A., and Roth, S. F. 1995. Demonstration of a Reading Coach that Listens. In

Proceedings of the Eighth Annual Symposium on User Interface Software and Technology, Pittsburgh

PA. Sponsored by ACM SIGGRAPH and SIGCHI in cooperation with SIGSOFT.

Mostow, J., and Aist, G. S. 1997. The Sounds of Silence: Towards Automatic Evaluation of Student
Gregory Aist Page 25 of 27 11/16/2010

Learning in a Reading Tutor that Listens. In Proceedings of the 1997 National Conference on Artificial

Intelligence (AAAI 97), pages 355-361.

Mostow, J., and Aist, G. S. 1997. When Speech Input is Not an Afterthought: A Reading Tutor that

Listens. Workshop on Perceptual User Interfaces, Banff, Alberta, Canada, October 1997.

Nation, K., and Hulme, C. 1997. Phonemic segmentation, not onset-rime segmentation, predicts early

reading and spelling skills. Reading Research Quarterly 32(2) pp. 154-167.

O'Keefe, Elizabeth J., and Solman, Robert T. 1987. The influence of illustrations on children's

comprehension of written stories. Journal of Reading Behavior 19(4), pp. 353-377.

Peterson, Margareth E. and Haines, Leonard P. 1992. Orthographic analogy training with kindergarten

children: Effects on analogy use, phonemic segmentation, and letter-sound knowledge. pp. 109-127.

Reinking, David, and Rickman, Sharon Salmon. 1990. The effects of computer-mediated texts on the

vocabulary learning and comprehension of intermediate-grade learners. Journal of Reading Behavior

22(4), pp. 395-411.

Roller, C. 1994. Teacher-student interaction during oral reading and rereading. Journal of Reading

Behavior 26(2), pp. 191-209.

Rosenhouse, J., Feitelson, D., Kita, B., and Goldstein, Z. 1997. Interactive reading aloud to Israeli first

graders: Its contribution to literacy development. Reading Research Quarterly 32(2), pp. 168-183.
Gregory Aist Page 26 of 27 11/16/2010

Russell, M., Brown, C., Skilling, A., Series, R., Wallace, J., Bohnam, B., and Barker, P. 1996.

Applications of Automatic Speech Recognition to Speech and Language Development in Young

Children. In Proceedings of the Fourth International Conference on Spoken Language Processing,

Philadelphia PA.

Sacks, H., Schegloff, E. A., and Jefferson, G. 1974. A simplest systematics for the organization of turn-

taking for conversation. Language 50(4): 696-735.

Scott, Judith A., and Nagy, William E. 1997. Understanding the definitions of unfamiliar verbs.

Reading Research Quarterly 32(2), pp. 184-200.

Stahl, Steven A., and Jacobson, Michael G. 1986. Vocabulary difficulty, pior knowledge, and text

comprehension. Journal of Reading Behavior 18(4), pp. 309-323.

Stanovich, K. E. 1991. Word recognition: Changing perspectives. In Handbook of Reading Research

vol. 2, p. 418 – 452.

Tannen, D. 1984. Conversational style: Analyzing talk among friends. Norwood NJ: Ablex.

Tingstrom, D. H., Edwards, R. P., and Olmi, D. J. 1995. Listening previewing in reading to read:

Relative effects on oral reading fluency. Psychology in the Schools 32, 318-327.

Ulijn, J. M., and Li, X. 1995. Is interrupting impolite? Some temporal aspects of turn-taking in
Gregory Aist Page 27 of 27 11/16/2010

Chinese-Western and other intercultural business encounters. Text 15(4): 598-627.

Underwood, Geoffrey, and Underwood, Jean D. M. 1998. Children's interactions and learning outcomes

with interactive talking books. Computers in Education 30(1/2), pp. 95-102.

Vanderbilt (The Cognition and Technology Group at Vanderbilt). 1996. A multimedia literacy series.

Communications of the ACM 39(6), 106-109. Now a commercial product from Little Planet Publishing,

Nashville TN.

Ward, N. 1996. Using Prosodic Clues to Decide When to Produce Back-channel Utterances. In

Proceedings of the 1996 International Symposium on Spoken Dialogue, pages 1728-1731, Philadelphia


Weber, Rose-Marie, and Shake, Mary C. 1988. Teachers' rejoinders to students' responses in reading

lessons. Journal of Reading Behavior 20(4), pp. 285-299.

Zechmeister, E. B., Chronis, A. M., Cull, W. L., D'Anna, C. A., and Healy, N. A. 1995. Growth of a

functionally important lexicon. Journal of Reading Behavior 27(2), pp. 201-212.