
Preprocessing video images for neural learning of lipreading 

K. Venkatesh Prasad, David G. Stork, Gregory J. Wolff


Machine Learning and Perception Group
Ricoh California Research Center
2882 Sand Hill Road, Suite 115
Menlo Park, CA 94025-7022
mlp@crc.ricoh.com

Abstract
W
Ricoh California Research Center Technical Report # 93-26
and vice versa. Thus, for example, /mi/ ↔ /ni/ are highly confusable acoustically but are easily distinguished based
on the visual information of lip closure. Conversely, /bi/ ↔ /pi/ are highly confusable visually ("visemes"), but are
easily distinguished acoustically by the voice-onset time (the delay between the burst sound and the onset of vocal fold
vibration). Th

Preprocessing Video for Lipreading

[Figure: surviving labels "A-B", "Gray Level", "i=1 ...", "v-1"]
[Figure: vertical mouth_gap (pixels) versus time (frame number); vertical axis ticks 5 to 25, horizontal axis ticks 1, 10, 33, 50]

