
An Embedded System of

Speech Recognition

P. Sudhakara Rao
Central Electronics Engineering Research Institute Centre,
CSIR Madras Campus, Tharamani,
Chennai – 600 113, India
Introduction
• Robust Speech Recognition and Synthesis in
Embedded Systems provides a link between
the technology and the application worlds

• This lecture reviews the problems of robust
speech recognition and synthesis,
summarizes the work done at CEERI Delhi
Centre and also gives the details of the current
state of the art of speech technologies
Speech Technology

• Language dependent
• Reverse Engineering not possible
• Interdisciplinary Research
• Technology Matured now
– Some practical Applications available
– Wide range of applications in sight
Speech Technology Helps In
• Communication with Computers
• Information Services
• Language Translation
• Aids for Handicapped
• Recognition of messages
• Recognition of Speakers
• Text to Speech Conversion
• Language Identification
Applied Research Problems
• Fluent Speech recognition
– (All speakers, all accents, environments etc.)
• Natural sounding Synthetic Speech
– (desired accent, language, voice etc.)
• Speaker Recognition
– (Text independent identification and verification of a
speaker)
• Speech Communication / understanding messages in high
ambient noise conditions
Embedded Speech Technology
[Block diagram. Blocks recoverable from the slide, in layout order:]
• Speech side: Speech Input, A/D, Pre-Processing, Speech Analysis (Extraction of Parameters), Data Compression and Representation, Speech Recognition (output: Message), Speaker Recognition (output: Speaker Identity)
• Text side: Text, Text Processing (Linguistic, Phonetics, Parsing), Concatenation of Speech Samples (Rules), Rule of Prosody, High Quality Speech Synthesizer
• Shared knowledge: Knowledge of Parameters Representation, Database, Rules etc.
• Applications: Command & Control, Information Retrieval / Entry, Telecomm. Services, Aids for Handicapped, Machine Translation
Speech: a Natural, Efficient &
Economical Way of Communication
Specific Properties of Indian
Spoken Languages

• Phonetic in Nature
• Better Articulatory discipline
• systematic manner of production
• Very few Flaps/Taps or Trills
• Five or six distinct places of Articulation
• Few fricatives compared to English
Acoustic Phonetic classification
of Hindi and Bengali sounds
Major Achievements in
Speech Recognition

• Different versions of microprocessor
based Isolated Word Recognition
Systems
• Vocabulary Size: 20, 50, 100 and 200
words
• Language independent
• Real Time standalone system
Major Achievements (contd.)

• Application Systems of IWRS
• Voice operated motorized wheel chair
• Voice operated mobile intelligent Robot
• Multilingual Display of Voice commands
Isolated Word (Speech)
recognition System
• Features:
– Vocabulary Size: 200 words (user selectable)
– Language Independent (based on template matching)
– Response Time: 300 ms (vocabulary of 50 words)
– Word Duration: 80-1120 ms
– Speaker Dependent (may be easily trained for a new speaker)
– Accuracy: 91%-100% (depending upon the vocabulary size and
words)
– Input Bandwidth: 200 Hz-7000 Hz
– Standalone and Portable Type
– Easy to Interface with host computers and Easy to operate
ISOLATED WORD (SPEECH)
RECOGNITION SYSTEM

• TECHNICAL SPECIFICATIONS:
INPUT: Close-talking head-worn microphone.
OUTPUT: Parallel and serial (RS-232C) ports;
built-in 7-segment LED display.
PRE-PROCESSOR:
Input Bandwidth: 200 Hz-7000 Hz.
Filter Bank Analyzer: 16 critical bands; lowpass filter 40 Hz.
DATA PROCESSOR: MC68000 based microcomputer.
WORD BOUNDARIES: Based on silence / background noise
level.
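The slide only states that word boundaries are found from the silence / background noise level. A minimal sketch of one common way to do this, assuming a short-time energy threshold derived from the leading (noise-only) frames, is given below. The function names, the frame length, the number of noise frames and the threshold factor are all illustrative assumptions, not the parameters of the CEERI system.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_word_boundaries(samples, frame_len=160, noise_frames=5, factor=4.0):
    """Return (start, end) sample indices of the word, or None for silence.

    Assumption: the first `noise_frames` frames contain only background
    noise. A frame counts as speech when its energy exceeds `factor`
    times the average energy of those frames.
    """
    energies = frame_energies(samples, frame_len)
    if len(energies) <= noise_frames:
        return None
    noise_level = sum(energies[:noise_frames]) / noise_frames
    threshold = factor * max(noise_level, 1e-12)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    # Boundaries are reported at frame resolution.
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

Estimating the threshold from the recording itself lets the detector adapt to the ambient noise of each session, which is the property the specification points at.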
(contd.)
DATA COMPRESSION:
Removal of redundant information.
Variable segment encoding (approx. 100:1).
PATTERN MATCHING:
Template matching (Dynamic Time Warping / Discrete HMM).
AVAILABLE INTERFACES:
Stepper motor controller.
Multilingual (GIST) CRT terminal.
Telephone dialer.
Speech synthesizer (CVSD CODEC).
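Template matching with dynamic time warping can be sketched as follows. This is a textbook DTW with a Euclidean frame distance, not the system's actual implementation; the names `dtw_distance` and `recognize`, and the toy one-dimensional features, are assumptions made for illustration (the real system would compare 16-band filter bank frames).

```python
def dtw_distance(a, b):
    """DTW distance between two sequences of feature vectors."""
    def frame_dist(x, y):
        # Euclidean distance between two feature vectors.
        return sum((p - q) ** 2 for p, q in zip(x, y)) ** 0.5

    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Toy vocabulary: one stored template per word (speaker-dependent training).
templates = {
    "forward": [[0.0], [1.0], [2.0]],
    "stop": [[5.0], [5.0], [5.0]],
}
# The same word spoken more slowly still matches its template.
print(recognize([[0.0], [0.0], [1.0], [2.0], [2.0]], templates))
```

Because the comparison works on acoustic feature sequences only, nothing in it depends on the language of the vocabulary, which is consistent with the "language independent" claim made for the template-matching approach.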
Voice Controlled Wheel Chair
INDIGENOUS TECHNOLOGY

VOICE AND JOYSTICK CONTROLLED
BATTERY OPERATED WHEELCHAIR
• HIGHLY EFFECTIVE MANEUVERABILITY
• STATE OF THE ART SPEECH RECOGNITION BASED TECHNOLOGY
• LANGUAGE INDEPENDENT VOICE COMMANDS
• ON-SPOT CHAIR ROTATION (SPIN MOVEMENT)
• BOON FOR QUADRIPLEGIC AND PARAPLEGIC PATIENTS
• AUTOMATIC AND MANUAL CONTROL OPTIONS

Central Electronics Engineering Research Institute Pilani, Rajasthan - 333031, India


SALIENT FEATURES
Control: Joystick and Voice
Movements: Forward, Reverse, Left, Right and Spin (variable speed)
Voice Commands: Forward, Reverse, Left, Right, Slow, Fast, Spin and Stop
Voice Input: Microphone (Head-Worn / Collar)
Voice Recognition: Pattern Matching
Motors: 24 V / 120 W
Speed: 0-4 km/h
Drive: High Power MOSFET
Wheels: General Purpose Castor
Braking: Electromagnetic
Power: General Purpose Battery (12 V DC x 2)
Structure: Tubular, foldable and adjustable
Weight: 60 kg, including batteries
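How the eight voice commands could drive the chair can be illustrated with a small state holder. This is a hypothetical sketch: the class name, the two speed levels and the rule that "Slow" / "Fast" only change the speed of the current movement are assumptions for illustration; only the command words and the 0-4 km/h range come from the slide.

```python
COMMANDS = {"forward", "reverse", "left", "right", "slow", "fast", "spin", "stop"}
MAX_SPEED_KMH = 4.0      # from the slide: speed range 0-4 km/h
SLOW_KMH = 1.0           # hypothetical speed level
FAST_KMH = 3.0           # hypothetical speed level


class WheelchairState:
    """Tracks the movement requested by recognized voice commands."""

    def __init__(self):
        self.direction = "stop"
        self.speed_kmh = 0.0
        self._level = SLOW_KMH   # start at the slow setting for safety

    def apply(self, word):
        """Apply one recognized word; return False if it is not a command."""
        word = word.lower()
        if word not in COMMANDS:
            return False          # unknown words leave the state unchanged
        if word == "stop":
            self.direction, self.speed_kmh = "stop", 0.0
        elif word in ("slow", "fast"):
            self._level = SLOW_KMH if word == "slow" else FAST_KMH
            if self.direction != "stop":
                self.speed_kmh = self._level
        else:                     # forward, reverse, left, right, spin
            self.direction = word
            self.speed_kmh = min(self._level, MAX_SPEED_KMH)
        return True
```

Ignoring any word outside the vocabulary is a deliberate choice here: for a mobility aid, a rejected command is safer than a misinterpreted one.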


Speaker Recognition
Major Achievements in
Speech Synthesis
• High Quality Speech Synthesizer and
Text to Speech conversion system for
Hindi
• PC based system
• On-line, Unlimited vocabulary Text to
Speech Conversion
• Adaptable to other Indian Languages
Block diagram of Hindi TTS
Block Diagram of Speech
Synthesizer
Hindi-Vani : A Windows Based TTS

• Synthesizes Hindi characters as they are
entered by Keyboard
• Synthesizes Hindi Words
• Synthesizes Text Line
• Synthesizes the entire document
• Saves Hindi text in standard RTF format
• Uses normal sound-blaster card for speech output
• Developed in VC++ using MFC
• Improvement in speech quality is underway
Hindi-Vani : A Windows Based TTS
[Figure: synthetic sentence (copy synthesis) shown with the original sentence]
Major Achievements in
Basic Speech Research

• Speech Processing Tools
– CEERI SPEECH PROCESSOR (CSP)
• A speech analysis tool
• Speech database development

Initial Screen of
CEERI Speech Processor
Main Window of
CEERI Speech Processor
Database Introduction

• Acoustic-phonetic correlates of a language
• To structure the variability that occurs in the speech
signals
• The speech databases are language dependent.
• Each language has different types and number of basic
speech units
• Speech databases for Hindi and other Indian languages.
• R&D efforts at CEERI
Procedure for creating data base
[Flowchart. Steps recoverable from the slide, in layout order:]
• Speech from Shure microphone (Channel 1) and speech from ordinary microphone (Channel 2)
• Computerised Speech Lab (CSL) System
• Setting the recording condition through software (sampling rate: 16,000; duration: 40 sec)
• Recording five sentences (approx.) at a time
• Marking sentence boundaries
• Confirm sentence boundaries by listening back
• Store the data in a file
• Process this data file using semi-automatic labeling tool
• Copy the digital speech database into CD-ROM
• Manual inspection and correction of segment boundaries
• Organisation and distribution of the labeled data
General Purpose Speech Data Base for
Hindi

• A general-purpose database has been created for a vocabulary of the 1000 most frequently used
words.
– The specifications of the database are as follows:

1. Language : Standard Hindi (Khari Boli)
2. Vocabulary Size : A set of 1000 most frequently
occurring Hindi words.
3. Speakers : 50 speakers (30 male and 20 female).
4. Utterances : 2 repetitions each.
5. Audio Recording : Recording on a cassette tape in
studio, S/N > 50 dB, 70 Hz to 7 kHz.
6. Digitisation : 16 kHz sampling, 16 bit
quantization.
7. Storage Media : Floppies, Tape cartridge, CD-ROM.
8. Database Organisation : Words, Speakers and Repetitions.
9. Working Platform : PC/AT 486.
10. Specialised H/W : Ariel DSP-16 card.

• This database will be very useful for developing general-purpose speech recognition systems
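The size of this database follows from the specification with simple arithmetic. The sketch below reads the specification as 50 speakers each saying all 1000 words twice; the one-second average word duration used for the storage estimate is a hypothetical figure for illustration, not a value from the slides.

```python
def utterance_count(words, speakers, repetitions):
    """Total number of recorded word utterances."""
    return words * speakers * repetitions


def bytes_per_second(sample_rate_hz, bits_per_sample, channels=1):
    """Raw PCM data rate for the given digitisation settings."""
    return sample_rate_hz * bits_per_sample // 8 * channels


def storage_bytes(n_utterances, seconds_per_utterance, rate):
    """Raw storage needed, ignoring file headers and silence trimming."""
    return int(n_utterances * seconds_per_utterance * rate)


total = utterance_count(words=1000, speakers=50, repetitions=2)
rate = bytes_per_second(sample_rate_hz=16000, bits_per_sample=16)
AVG_WORD_SECONDS = 1.0   # assumption, not from the specification

print(total)                                         # 100000 utterances
print(rate)                                          # 32000 bytes per second
print(storage_bytes(total, AVG_WORD_SECONDS, rate))  # 3200000000 bytes
```

Under that assumed duration the raw data comes to about 3.2 GB, which gives a sense of why tape cartridge and CD-ROM are listed alongside floppies as storage media.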
Data Base of 1000 sentences
• A phonetically rich database has been created for 1000 sentences
– The specifications of the database are as follows:
1. Language : Standard Hindi
2. Vocabulary Size : A set of 800 phonetically compact
and 2 phonetically rich sentences
3. Speakers : 100 speakers (60 male and 40 female)
4. Utterances : one
5. Audio Recording : 2 channel recording using two different
microphones
6. Microphones : SHURE microphone, an ordinary microphone
7. Signal to Noise ratio : First Channel (50 dB), Second Channel (20 dB)
8. Digitisation : 16 kHz sampling, 16 bit quantization.
9. Storage Media : Floppies, Tape cartridge, CD-ROM.
10. Recording Platform : directly on a Pentium PC
11. Specialised H/W : Kay’s Computerised Speech Lab
12. Labelling : Manual labelling using Sensimetrics
Speech Station Software
A sample set of 10 sentences
Main Window of the Data base
Sensimetrics Speech Station
Software used for Labeling
Challenges in Speech Technology

• For Applications
– Speech to Speech Translation
– Text Reading Machines
– Multi-lingual dialogue in speech mode
– Voice Operated Telecom Services
– Voice interactive (2 mode / 3 mode) communication service
for multimedia, financial transactions, enquiries etc.
– Voice commands/control in noisy and hazardous
environment, security applications
THANK YOU