


Springer Handbook on Speech Processing and Speech Communication 1

A MACHINE LEARNING FRAMEWORK FOR SPOKEN-DIALOG CLASSIFICATION

Corinna Cortes
Google Research, 76 Ninth Avenue, New York, NY 10011
corinna@google.com

Patrick Haffner
AT&T Labs – Research, 180 Park Avenue, Florham Park, NJ 07932
haffner@research.att.com

Mehryar Mohri
Courant Institute, 251 Mercer Street, New York, NY 10012
mohri@cims.nyu.edu

ABSTRACT

One of the key tasks in the design of large-scale dialog systems is classification. This consists of assigning, out of a finite set, a specific category to each spoken utterance, based on the output of a speech recognizer. Classification in general is a standard machine learning problem, but the objects to classify in this particular case are word lattices, or weighted automata, and not the fixed-size vectors learning algorithms were originally designed for. This chapter presents a general kernel-based learning framework for the design of classification algorithms for weighted automata. It introduces a family of kernels, rational kernels, that, combined with support vector machines, form powerful techniques for spoken-dialog classification and other classification tasks in text and speech processing. It describes efficient algorithms for their computation and reports the results of their use in several difficult spoken-dialog classification tasks based on deployed systems. Our results show that rational kernels are easy to design and implement and lead to substantial improvements of the classification accuracy. The chapter also provides some theoretical results helpful for the design of rational kernels.

1. MOTIVATION

A critical problem for the design of large-scale spoken-dialog systems is to assign a category, out of a finite set, to each spoken utterance. These categories help guide the dialog manager in formulating a response to the speaker. The choice of categories depends on the application; they could be, for example, referral or pre-certification for a health-care company dialog system, or billing services or credit for an operator-service system.

To determine the category of a spoken utterance, one needs to analyze the output of a speech recognizer. Figure 1 is taken from a customer-care application. It illustrates the output of a state-of-the-art speech recognizer in a very simple case where the spoken utterance is "Hi, this is my number". The output is an acyclic weighted automaton called a word lattice. It compactly represents the recognizer's best guesses. Each path is labeled with a sequence of words and has a score obtained by summing the weights of the constituent transitions. The path with the lowest score is the recognizer's best guess, in this case "I'd like my card number".

[Figure 1: Word lattice output of a speech recognition system for the spoken utterance "Hi, this is my number".]

This example makes evident that the error rate of conversational speech recognition systems is still too high in many tasks to rely only on the one-best output of the recognizer. Instead, one can use the full word lattice, which contains the correct transcription in most cases. This is indeed the case in Figure 1, since the top path is labeled with the correct sentence. Thus, in this chapter, spoken-dialog classification is formulated as the problem of assigning a category to each word lattice.

Classification in general is a standard machine learning problem. A classification algorithm receives a finite number of labeled examples which it uses for training, and selects a hypothesis expected to make few errors on future examples. For the design of modern spoken-dialog systems, this training sample is often available. It is the result of careful human labeling of spoken utterances with a finite number of pre-determined categories of the type already mentioned.

But most classification algorithms were originally designed to classify fixed-size vectors. The objects to analyze for spoken-dialog classification are word lattices, each a collection of a large number of sentences with some weight or probability. How can standard classification algorithms such as support vector machines [Cortes and Vapnik, 1995] be extended to handle such objects?

This chapter presents a general framework and solution for this problem, which is based on kernel methods [Boser et al., 1992, Schölkopf and Smola, 2002]. Thus, we shall start with a brief introduction to kernel methods (Section 2). Section 3 will then present a kernel framework, rational kernels, that is appropriate for word lattices and other weighted automata. Efficient algorithms for the computation of these kernels will be described in Section 4. We also report the results of our experiments using these methods in several difficult large-vocabulary spoken-dialog classification tasks based on deployed systems in Section 5. There are several theoretical results that can guide the design of kernels for spoken-dialog classification. These results are discussed in Section 6.

2. INTRODUCTION TO KERNEL METHODS

Let us start with a very simple two-group classification problem, illustrated by Figure 2, where one wishes to distinguish two populations, the blue and red circles. In this very simple example, one can choose a hyperplane to correctly separate the two populations. But there are infinitely many choices for the selection of that hyperplane. There is good theory, though, supporting the choice of the hyperplane that maximizes the margin, that is, the distance between each population and the separating hyperplane. Indeed, let F denote the class of real-valued functions on the ball of radius R in R^N:

\[ F = \{\, x \mapsto w \cdot x : \|w\| \le 1,\ \|x\| \le R \,\}. \tag{1} \]

Then it can be shown [Bartlett and Shawe-Taylor, 1999] that there is a constant c such that, for all distributions D over X, with probability at least 1 − δ, if a classifier sgn(f), with f ∈ F, has margin at least ρ over m independently generated training examples, then the generalization error of sgn(f), or error on any future example, is no more than

\[ \frac{c}{m}\left(\frac{R^2}{\rho^2}\log^2 m + \log\frac{1}{\delta}\right). \tag{2} \]

This bound justifies large-margin classification algorithms such as support vector machines (SVMs). Let w · x + b = 0 be the equation of the hyperplane, where w ∈ R^N is a vector normal to the hyperplane and b ∈ R a scalar offset. The classifier sgn(h) corresponding to this hyperplane is unique and can be defined with respect to the training points x_1, ..., x_m:

\[ h(x) = w \cdot x + b = \sum_{i=1}^{m} \alpha_i \,(x_i \cdot x) + b, \tag{3} \]
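Equation (3) is the key observation behind kernel methods: both training and classification access the data only through dot products x_i · x. A minimal sketch of such a decision function, with toy training points and hypothetical dual coefficients α_i (which an SVM solver would normally produce):

```python
# Decision function of Equation (3): classification uses only dot products
# between examples. The dual coefficients alphas and offset b are assumed
# given (normally output by an SVM solver); the values below are toy
# placeholders for illustration.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def h(x, train_points, alphas, b):
    # h(x) = sum_i alpha_i (x_i . x) + b
    return sum(a * dot(xi, x) for a, xi in zip(alphas, train_points)) + b

train = [(1.0, 1.0), (-1.0, -1.0)]   # two toy training points
alphas = [0.5, -0.5]                 # signed dual coefficients (hypothetical)
b = 0.0
score = h((2.0, 1.0), train, alphas, b)   # 0.5*3 - 0.5*(-3) = 3.0 > 0, class +1
```

The point is that nothing in `h` ever needs the coordinates of a feature-space representation, only pairwise dot products, which is what makes the kernel substitution of the next paragraphs possible.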


[Figure 2: Large-margin linear classification. (a) An arbitrary hyperplane can be chosen to separate the two groups. (b) The maximal-margin hyperplane provides better theoretical guarantees.]

where the α_i's are real-valued coefficients. The main point we are interested in here is that, both for the construction of the hypothesis and the later use of that hypothesis for classification of new examples, one needs only to compute a number of dot products between examples.

In practice, linear separation of the training data is often not possible. Figure 3(a) shows an example where any hyperplane crosses both populations. However, one can use more complex functions to separate the two sets, as in Figure 3(b). One way to do that is to use a non-linear mapping Φ : X → F from the input space X to a higher-dimensional space F where linear separation is possible.

The dimension of F can truly be very large in practice. For example, in the case of document classification, one may use as features sequences of three consecutive words (trigrams). Thus, with a vocabulary of just 100,000 words, the dimension of the feature space F is 10^15. On the positive side, as indicated by the error bound of Equation 2, the generalization ability of large-margin classifiers such as SVMs does not depend on the dimension of the feature space but only on the margin ρ and the number of training examples m. However, taking a large number of dot products in a very high-dimensional space to define the hyperplane may be very costly.

A solution to this problem is to use the so-called "kernel trick" or kernel methods. The idea is to define a function K : X × X → R, called a kernel, such that the kernel function on two examples x and y in input space, K(x, y), is equal to the dot product of two examples Φ(x) and Φ(y) in feature space:

\[ \forall x, y \in X, \quad K(x, y) = \Phi(x) \cdot \Phi(y). \tag{4} \]
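Equation (4) can be checked concretely for a simple kernel. The homogeneous degree-2 polynomial kernel K(x, y) = (x · y)² on R² corresponds to the feature map Φ(x) = (x₁², √2·x₁x₂, x₂²); this kernel is an illustrative choice, not one used in the chapter:

```python
# Kernel trick of Equation (4): K computed in input space equals the dot
# product of explicit feature vectors Phi(x), Phi(y). Illustrative kernel.
import math

def K(x, y):
    # Homogeneous degree-2 polynomial kernel: (x . y)^2
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def phi(x):
    # Explicit feature map for K: a point in R^3
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

x, y = (1.0, 2.0), (3.0, 0.5)
lhs = K(x, y)                                      # kernel in input space
rhs = sum(a * b for a, b in zip(phi(x), phi(y)))   # dot product in feature space
# lhs and rhs agree (up to rounding), as Equation (4) requires.
```

The efficiency gain shows up when the feature space is large: K is evaluated with a handful of multiplications in input space, whereas computing Φ explicitly would require materializing the high-dimensional vectors.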

K is often viewed as a similarity measure. A crucial advantage of K is efficiency: there is no need anymore to define and explicitly compute Φ(x), Φ(y), and Φ(x) · Φ(y). Another benefit of K is flexibility: K can be arbitrarily chosen so long as the existence of Φ is guaranteed, which is called Mercer's condition. This condition is important to guarantee the convergence of training for algorithms such as SVMs.¹

A condition equivalent to Mercer's condition is that the kernel K be positive definite and symmetric, that is, in the discrete case, the matrix (K(x_i, x_j))_{1 ≤ i,j ≤ n} must be symmetric and positive semi-definite for any choice of n points x_1, ..., x_n in X. Said differently, the matrix must be symmetric and its eigenvalues non-negative. Thus, for the problem that we are interested in, the question is how to define positive definite symmetric kernels for word lattices or weighted automata.

3. RATIONAL KERNELS

This section introduces a family of kernels for weighted automata, rational kernels. We will start with some preliminary definitions of automata and

¹ Some standard Mercer kernels over a vector space are the polynomial kernels of degree d ∈ N, K_d(x, y) = (x · y + 1)^d, and the Gaussian kernels K_σ(x, y) = exp(−‖x − y‖² / σ²), σ ∈ R_+.
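Both footnote kernels are straightforward to evaluate directly; the sketch below computes each on toy vectors (values chosen only for illustration):

```python
# The two standard Mercer kernels of the footnote, evaluated on toy vectors.
import math

def poly_kernel(x, y, d):
    # K_d(x, y) = (x . y + 1)^d
    return (sum(a * b for a, b in zip(x, y)) + 1) ** d

def gaussian_kernel(x, y, sigma):
    # K_sigma(x, y) = exp(-||x - y||^2 / sigma^2)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / sigma ** 2)

x, y = (1.0, 2.0), (2.0, 1.0)
pk = poly_kernel(x, y, d=2)             # (1*2 + 2*1 + 1)^2 = 25
gk = gaussian_kernel(x, y, sigma=1.0)   # exp(-(1^2 + 1^2)) = exp(-2)
```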
