
‘Human Computer

“BLISS to Blind” using “JAWS” SOFTWARE
PREAMBLE:
This paper is mainly intended for blind people who cannot use a computer efficiently and who can interact with it only through a Braille display or a knowledge of typing. To help them limit the vulnerabilities that surface when dealing with system security and data security (for example, entering wrong passwords) and with other security-critical applications, our paper could serve as an alternative.

ABSTRACT:
JAWS is an acronym for “Job Access With Speech”. The first part of this paper introduces the need for the refreshable Braille display, a device that rests at the keyboard and displays characters by raising dots through holes in a flat surface. The second part focuses on the fundamentals of speech synthesis, which converts text to speech, and on the speech units. The third part describes speech production by the vocal cords and acoustic phonetics. The next part gives a detailed description of the various speech units (words, syllables, demisyllables, diphones, allophones, phonemes) and the way in which these primitive speech units are combined to form full speech with phonetic sounds. It also covers the analysis of speech signals and the various synthesis methods, including the coding of vocal tract parameters using Linear Predictive Coding (LPC) synthesis with its block diagram. The fifth part is about Automatic Speech Recognition (ASR) principles, the stages involved in it, and the speech unit’s synthesis influencing
factors. The later part discusses the signal processing front end, parameterisation and its features, blocking and windowing. The sixth part discusses the various speech recognition techniques, followed finally by the conclusion and the bibliography.

INDEX
• Introduction
• Why JAWS
• Fundamentals of speech synthesis
• Speech units
• Speech production and acoustic units
• Synthesis methods

INTRODUCTION:
The refreshable Braille display is an electro-mechanical device for displaying Braille characters, usually by means of raising dots through holes in a flat surface. The display sits under the computer keyboard.

It is used to present text to computer users who are blind and cannot use a normal computer monitor. Speech synthesizers are also commonly used for the same task, and a blind user may switch between the two systems depending on circumstances. Because of the complexity of producing a reliable display that will cope with daily wear and tear, these displays are expensive. Usually, only 40 or 80 Braille cells are displayed. Models with 18-40 cells exist in some note taker devices. On some models the position of the cursor is represented by vibrating the dots, and some models have a switch associated with each cell to move the cursor to that cell directly.

The mechanism which raises the dots uses the piezo effect: some crystals expand when a voltage is applied to them. Such a crystal is connected to a lever, which in turn raises the dot. There has to be one such crystal for each dot of the display, i.e., eight crystals per character.

The software that controls the display is called a screen reader. It gathers the content of the screen from the operating system, converts it into Braille characters and sends it to the display. Screen readers for graphical operating systems are especially complex, because graphical elements like windows and slide bars have to be interpreted and described in text form.

A new development, called the rotating-wheel Braille display, was developed in 2000 by the National Institute of Standards and Technology and is still in the process of commercialization. Braille dots are put on
the edge of a spinning wheel, which allows the blind user to read continuously with a stationary finger while the wheel spins at a selected speed. The Braille dots are set in a simple scanning-style fashion as the dots on the wheel spin past a stationary actuator that sets the Braille characters. As a result, manufacturing complexity is greatly reduced, and rotating-wheel Braille displays will be much less expensive than traditional Braille displays.

Fig: A Blind Person Communicating With The Computer

Speech synthesis is the automatic generation of speech waveforms based on an input text to synthesize and on previously analyzed digital speech data. The critical issues for current speech synthesizers concern trade-offs among the conflicting demands of maximizing the quality of speech while minimizing memory space, algorithmic complexity, and computation time. The text might be entered by keyboard or optical character recognition, or obtained from a stored database. Speech synthesizers then convert the text into a sequence of speech units by lexical access routines. Using large speech units such as phrases and sentences can give high quality output speech but requires much more memory. A block diagram of the steps in speech synthesis is shown below.

Fig: Block diagram of text-to-speech synthesis

Speech units are retrieved and concatenated to output the synthetic speech. Speech is often modeled as the response of a time-varying linear filter (corresponding to the vocal tract from the glottis to the lips) to an excitation
waveform consisting of broadband noise, a periodic waveform of pulses, or a combination. In summary, the most important components of a synthesizer’s algorithm are:
1. Stored speech units: storage of speech parameters, obtained from natural speech, in terms of speech units.
2. Concatenation routines: program rules to concatenate these units, smoothing the parameters to create time trajectories.

Real-time speech synthesizers have been available for many years. Such speech is generally intelligible but lacks naturalness. Quality inferior to that of human speech is usually due to inadequate modeling of three aspects of human speech production: coarticulation, intonation and vocal-tract excitation. Most synthesizers reproduce speech bandwidth in the range of 0-3 kHz (e.g. for telephone applications) or 0-5 kHz (e.g. for higher quality). Frequencies up to 3 kHz are sufficient for vowel perception since vowels are adequately specified by the first three formants.

SPEECH UNITS:
The choice of speech units determines the storage memory required and the quality of the synthetic speech. Some possible units are described in the table given below. The number in brackets under the column “Quantity” is the number of sub-word units sufficient to describe all English words.

Units | Quantity | Descriptions | Advantages/Disadvantages
Words | 300,000 (50,000) | The fundamental units of a sentence. | Advantages: yield high quality speech; simple concatenation synthesis algorithm. Disadvantages: limited by the memory requirement; need to adjust the duration of words.
Syllables | 20,000 (4,400) | It consists of a nucleus | Disadvantage: syllable boundary is uncertain.
Demisyllables | 4,500 (2,000) | Obtained by dividing syllables in half, with the cut during the vowel, where the effects of coarticulation are minimal. | Advantages: preserve the transition between adjacent phones; simple smoothing rules; produce smooth speech.
Diphones | 1,500 (1,200) | Obtained by dividing a speech waveform into phone-sized units, with the cuts in the middle of each phone. | Advantages: preserve the transition between adjacent phones; simple smoothing rules produce smooth speech.
Allophones | 250 | They are phonemic variants. | Advantage: reduce the complexity of the interpolation algorithm compared to phonemes.
Phonemes | 37 | The fundamental unit of phonology. | Advantage: memory requirement is small. Disadvantages: require complex smoothing rules to represent the coarticulation effect; need to adjust intonation at each context.
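
As a rough illustration of the diphone row in the table, the sketch below turns a phoneme sequence into diphone labels by pairing each phone with its neighbour, so that every unit spans the transition between two phones. The function name `to_diphones`, the silence marker `sil`, and the ARPABET-style transcription of the word "speech" are assumptions made for this example; they are not taken from the paper.

```python
def to_diphones(phonemes, silence="sil"):
    """Pair adjacent phones so each unit runs from mid-phone to mid-phone."""
    # Pad with silence so the first and last phones also get a transition unit.
    padded = [silence] + list(phonemes) + [silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


# Hypothetical ARPABET-style transcription of "speech" (an assumption).
units = to_diphones(["s", "p", "iy", "ch"])
print(units)
```

A word of N phones yields N + 1 diphones under this padding convention, which is why a diphone inventory needs roughly one unit per allowed phone pair rather than one per phone.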

Consider the English word "segmentation". Its representation according to each of the above
sub word unit sets is:

SPEECH PRODUCTION AND ACOUSTIC-PHONETICS:
The English phonemes can be classified according to how they are produced by the vocal organs. An understanding of the speech production mechanism will help us to analyze the speech sounds. Acoustic phonetics is the study of the physics of the speech signal. When sound travels through the air from the speaker’s mouth to the hearer’s ear, it does so in the form of vibrations in the air. It is possible to measure and analyze these vibrations by mathematical techniques studied in the physics of sound.

The listener converts the acoustic signals into a sequence of words and sentences. The most familiar language units are words. They can be thought of as a sequence of smaller linguistic units, phonemes, which are the fundamental units of phonology. We will use ARPABET symbols in the rest of this paper. The easiest way to understand the nature of phonemes is to consider a group of words like “hid”, “head”, “hood”, and “hod”. All these words are made up of an initial, a middle, and a final element. In our examples, the initial and final elements are identical, but the middle elements are different. It is the difference in this middle element that distinguishes the four words. Similarly, we can find those sounds which differentiate one word from another for all the words of a language. Such distinguishing sounds are called phonemes. There are about fifty phoneme units in English. The following figure shows a segment of the vowel /ix/. The quasi-periodicity (almost periodic behaviour) of voiced speech can be observed.
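
The quasi-periodicity of voiced speech can also be checked numerically. The sketch below is only an illustration, not the paper's method: it builds a synthetic stand-in for a voiced vowel (harmonics of an assumed 125 Hz fundamental with a slow amplitude drift) and estimates the pitch period by picking the lag with the largest autocorrelation. The 16 kHz sampling rate, the 50-400 Hz search range, and the names `voiced_like_signal` and `pitch_period` are all assumptions made for this example.

```python
import math


def voiced_like_signal(f0=125.0, fs=16000, n=1024):
    """Synthetic stand-in for a voiced vowel: harmonics of f0 with a slow drift."""
    out = []
    for i in range(n):
        t = i / fs
        # Slow amplitude drift so the signal is only almost periodic.
        drift = 1.0 + 0.1 * math.sin(2 * math.pi * 3.0 * t)
        s = sum(math.sin(2 * math.pi * k * f0 * t) / k for k in range(1, 6))
        out.append(drift * s)
    return out


def pitch_period(x, fs=16000, fmin=50.0, fmax=400.0):
    """Return the lag (in samples) with the largest autocorrelation in the pitch range."""
    lo, hi = int(fs / fmax), int(fs / fmin)

    def acf(lag):
        return sum(x[i] * x[i + lag] for i in range(len(x) - lag))

    return max(range(lo, hi + 1), key=acf)


signal = voiced_like_signal()
lag = pitch_period(signal)
print(lag, "samples ->", 16000 / lag, "Hz")
```

The estimated lag is the pitch period in samples; dividing the sampling rate by it gives the fundamental frequency.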

Fricative sounds are generated by constricting the vocal tract at some point along the vocal tract and forcing the air stream to flow through at a high enough velocity to produce turbulence. This turbulent air sounds like a hiss, e.g. /hh/ or /s/.

Plosive or stop sounds result from blocking the vocal tract by closing the lips and nasal cavity, allowing the air pressure to build up behind the closure, followed by a sudden release. This mechanism produces sounds like /p/ and /g/. The following figure shows the stop /g/. The silence before the burst is the stop closure.

ACOUSTIC PHONETICS ANALYSIS OF SPEECH:
Based on the knowledge of the speech production mechanism and the study of acoustic phonetics, we are able to extract a set of features which can best represent a particular phoneme. One of the popular techniques is the spectrogram, which describes how the frequency content of a speech signal changes with time. Suppose we have a signal which is sampled at 16 kHz; the typical steps to compute the spectrogram are as follows. The speech is blocked into frames by a Hamming window with a duration of 256 samples. A 256-point FFT (fast Fourier transform) is applied to each windowed frame. The power spectra in dB are plotted. Horizontal bars in the spectrogram display the formant frequencies, while the vertical lines indicate the pitch period (i.e. the inverse of the fundamental frequency).
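
The spectrogram steps described above can be sketched as follows, using only the Python standard library. This is a minimal sketch under stated assumptions: the hop size of 128 samples, the 1 kHz test tone, the small constant added before the logarithm, and the names `fft` and `spectrogram_db` are not from the paper. The frame length of 256 samples and the 16 kHz sampling rate follow the text.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def spectrogram_db(signal, frame_len=256, hop=128):
    """Hamming-window each frame, take a frame_len-point FFT, return power in dB."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(frame_len)]
        spec = fft(seg)[: frame_len // 2 + 1]  # keep bins from 0 Hz to fs/2
        # Small constant avoids log10(0) for empty bins.
        frames.append([10 * math.log10(abs(c) ** 2 + 1e-12) for c in spec])
    return frames


fs = 16000  # sampling rate used in the text
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(1024)]  # assumed test signal
spec = spectrogram_db(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), "frames; peak at", peak_bin * fs / 256, "Hz")
```

Each row of `spec` is one time frame and each column one frequency bin of width fs/256 = 62.5 Hz; plotting the rows side by side gives the spectrogram image.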
Synthesis Methods:
Synthesizers can be classified by how they parameterize the speech for storage and synthesis:
1. Waveform Synthesizers
2. Terminal Analog Synthesizers
3. Articulatory Synthesizers
4. Formant Synthesizers
Formant Synthesizers:
A formant synthesizer employs formant resonances and bandwidths to represent the stored spectrum. The vocal tract filter is usually represented by 10-23 spectral parameters. Formant synthesizers have an advantage over LPC systems in that bandwidths can be more easily manipulated and that zeros can be directly introduced into the filter. However, locating the poles and zeros automatically in natural speech is a more difficult task than automatic LPC analysis. Most formant synthesizers use a model similar to that shown below.
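
To make the idea of formant resonances and bandwidths concrete, here is a minimal sketch of a cascade formant synthesizer. It is not the paper's block diagram: it uses the common second-order digital resonator, poles only with no zeros, excited by a periodic pulse train as a stand-in for voiced excitation. The formant frequencies and bandwidths, the 100 Hz pitch, the 10 kHz sampling rate, and the names `resonator` and `synthesize_vowel` are all assumptions chosen for illustration.

```python
import math


def resonator(x, freq, bw, fs):
    """Second-order digital resonator: one formant with centre freq and bandwidth bw (Hz)."""
    c = -math.exp(-2 * math.pi * bw / fs)
    b = 2 * math.exp(-math.pi * bw / fs) * math.cos(2 * math.pi * freq / fs)
    a = 1 - b - c  # normalises the gain to 1 at 0 Hz
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = a * s + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(formants, f0=100.0, fs=10000, n=2000):
    """Excite a cascade of formant resonators with a periodic pulse train."""
    period = int(fs / f0)
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs)
    return signal


# Illustrative (assumed) formant frequencies and bandwidths in Hz.
wave = synthesize_vowel([(500, 60), (1500, 90), (2500, 150)])
print(len(wave), max(abs(v) for v in wave))
```

Changing a vowel then amounts to changing a handful of (frequency, bandwidth) pairs, which is the ease of manipulation the text attributes to formant synthesizers.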

Fig: Block Diagram of Formant Synthesizer

Finally, an effective, interactive communication medium has been devised, with a glimpse of Human Computer Interface using Braille displays. Speaker-independent speech synthesizers may require highly sophisticated algorithms, which are costly in both time and space complexity.

Thus JAWS encourages blind people to make use of the computer just as sighted users do.


The following websites have been taken for reference: