
PERCEPTION OF ISOLATED SPEECH SEGMENTS

LEVELS OF SPEECH PROCESSING

SPEECH AS A MODULAR SYSTEM

A cognitive system is modular if it (1) is domain specific, (2) operates on a mandatory basis, (3) is fast, and (4) is unaffected by feedback.

The fact that there is no one-to-one correspondence between acoustic cues and perceptual events has been termed the lack of invariance.

Speech percepts are based on both invariant and context-conditioned cues. As an example, Cole and Scott point out that the nasal consonants /m/ and /n/ are distinguished from the other consonants by a single bar of low-frequency energy along with a complete lack of high-frequency energy.

A number of experimental findings have been advanced to support the view that speech is perceptually special, but the one that has received the most attention is the phenomenon of categorical perception.

Our discriminative capacity is largely continuous in the sense that we can perceive a series of quantitative changes in a stimulus lying on a continuum, such as a tone of varying intensity. To comprehend speech, however, we must impose an absolute or categorical identification on the incoming speech signal rather than simply a relative determination of the signal's various physical characteristics. That is, our job is to identify whether a sound is a /p/ or a /b/.

On a speech spectrogram, it is possible to identify the difference between the voiced sound [ba] and the voiceless [pa] as due to the time between when the sound is released at the lips and when the vocal cords begin vibrating.

Voice Onset Time (VOT) is an important cue in the perception of the voicing feature.
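The categorical character of VOT perception can be illustrated with a minimal sketch. The 25 ms boundary used here is a hypothetical value chosen purely for illustration; real category boundaries vary with language and place of articulation.

```python
# Illustrative sketch: categorical identification along a VOT continuum.
# The 25 ms boundary is a hypothetical value for illustration only;
# real boundaries vary by language and place of articulation.

def identify_stop(vot_ms, boundary_ms=25.0):
    """Map a voice onset time (in ms) to a categorical percept."""
    return "p" if vot_ms >= boundary_ms else "b"

# Equal acoustic steps along the continuum yield an abrupt category
# change, not a gradual one -- the hallmark of categorical perception.
continuum = [0, 10, 20, 30, 40, 50]
percepts = [identify_stop(v) for v in continuum]
print(percepts)  # ['b', 'b', 'b', 'p', 'p', 'p']
```

The point of the sketch is that listeners report the category, not the underlying millisecond value: two stimuli 10 ms apart on the same side of the boundary sound "the same," while the same 10 ms difference across the boundary sounds like different consonants.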

PERCEPTION OF CONTINUOUS SPEECH

Speech sounds are embedded in a context of fluent speech, and the acoustic structure of a speech sound varies with its immediate phonetic context; adjacent syllables and clauses may play a significant role in our identification of speech.
PROSODIC FACTORS IN SPEECH RECOGNITION

We can detect the mood of a person talking down the hall from the intonational contour of the speech, yet still not be able to identify what they are saying. Two cases of the way prosodic and segmental information interact are stress and rate.

Semantic and Syntactic Factors in Speech perception

Context and Speech Recognition As we have seen, a word isolated from its context becomes less intelligible (Pollack & Pickett, 1964). It follows that if we vary the semantic and syntactic aspects of this context, then we should find changes in the perceptibility of the speech passage.

The role of higher-order contextual factors in speech recognition has been convincingly
demonstrated by George Miller and his associates. Miller, Heise, and Lichten (1951) presented words either in isolation or in five-word sentences in the presence of white noise (a hissing sound). Performance was better in the sentence condition at all levels of noise. Apparently, listeners were able to use the syntactic and semantic constraints of continuous speech to limit the number of possibilities to consider. Further research (Miller & Isard, 1963) isolated the influence of syntactic and semantic information in this process. In this study, three different types of sentences were presented in continuous speech: (1) grammatical strings, (2) anomalous strings that preserved grammatical word order, and (3) ungrammatical strings:

(1) Accidents kill motorists on the highways.

(2) Accidents carry honey between the house.

(3) Around accidents country honey the shoot.

The results indicated that people were most accurate with grammatical strings, somewhat less accurate with anomalous strings, and even less able to recognize ungrammatical strings. It would appear that the more predictable a passage is, the better it is recognized.

These results are consistent with our discussion of top-down processing in chapter 3. Top-down processing proceeds from the semantic level of processing to the sensory levels. Thus, our knowledge of the general organization of the input enables us to predict some of the sensory features that are to follow. Top-down processing of continuous speech seems most likely when the speech context is semantically reasonable and familiar to the listener.

Phonemic Restoration A most dramatic demonstration of the role of top-down processing of speech signals comes from what is called phonemic restoration (Warren, 1970; Warren & Warren, 1970). The first /s/ in the word legislatures in sentence (4) was removed and replaced with a cough.
(4) The state governors met with their respective legislatures convening in the capital city.

(5) It was found that the *eel was on the axle.

(6) It was found that the *eel was on the shoe.

(7) It was found that the *eel was on the orange.

(8) It was found that the *eel was on the table.

In each case, listeners restored the missing phoneme to fit the sentence context, reporting wheel, heel, peel, and meal, respectively.

Mispronunciation Detection What happens when a perfectly ordinary sentence contains a minor phonetic error? For example, if you heard sentence (9), would you have noticed that the first phoneme in the fourth word has been mispronounced? (You might try reading it aloud to a friend.)

(9) It has been zuggested that students be required to preregister.

Marslen-Wilson and Welsh (1978) extended these results by combining the mispronunciation detection task with a shadowing task. A shadowing task is one in which subjects have to repeat immediately what they hear. Marslen-Wilson and Welsh examined the conditions under which listeners would repeat a mispronounced sound exactly, as opposed to restoring the "intended" pronunciation. They found that restorations were associated with greater fluency than were exact repetitions; in particular, less pausing was observed for restorations. Moreover, restorations tended to occur when the context was highly predictable, but reproductions were more likely with low levels of contextual predictability.

It is as if when we "know" what a person is going to say, we barely listen for the actual words and need
only check for broad agreement of sounds with expectations. In contrast, when uncertainty is higher, we
are less likely to have a firm basis on which to make these restorations. Moreover, the fluent nature of
the restorations suggests that semantic and syntactic constraints are naturally integrated with incoming
speech during language processing. These are not guesses but rather are heard, like phonemic
restorations, just as clearly as if they were really there. Our immediate awareness thus seems to be a
combination of an analysis of incoming sounds with an application of semantic and syntactic constraints.

The interactive nature of the perceptual process is revealed in another aspect of Marslen-Wilson and Welsh's study. They examined the relative proportion of restorations in cases in which the target ("intended") phoneme and the presented phoneme differed in one, two, or three distinctive features.
percentage of restorations was far higher (74%) when only one feature differentiated target and
presented phoneme than when three features differentiated them (24%). So bottom-up processing
plays a role here,too. Even if the context strongly implies that a word is appropriate, if the expected
phoneme is not sufficiently similar to the presented one on phonetic grounds, restoration is not likely to
occur. Under these conditions, listeners are prone to pause, as if to make these comparisons, then
repeat the presented word.

PERCEPTION OF WRITTEN LANGUAGE


Our approach will be selective in attempting to identify points of similarity and difference with the early stages of auditory language processing. Visual processing of larger units of language, such as sentences and discourse, will be treated in subsequent chapters.

Different Writing Systems

A writing system is a method of visually representing verbal communication, based on a script and a set of rules regulating its use. While both writing and speech are useful in conveying messages, writing systems require a shared understanding between writers and readers of the meaning behind the sets of characters that make up a script.

Writing systems can be placed into broad categories such as alphabets, syllabaries, or logographies.

1.) Alphabets - An alphabet is a small set of symbols, each of which roughly represents or historically represented a phoneme of the language. In a perfectly phonological alphabet, the phonemes and letters would correspond perfectly in two directions:

1. A writer could predict the spelling of a word given its pronunciation.

2. A speaker could predict the pronunciation of a word given its spelling.

2.) Syllabaries - A syllabary is a set of written symbols that represent (or approximate) syllables. A glyph in a syllabary typically represents a consonant followed by a vowel, or just a vowel alone, though in some scripts more complex syllables (such as consonant-vowel-consonant or consonant-consonant-vowel) may have dedicated glyphs.

Phonetically related syllables are not so indicated in the script. For instance, the syllable "ka" may look nothing like the syllable "ki", nor will syllables with the same vowels be similar.
3.) Logographies - A logogram is a single written character that represents a complete grammatical word; Chinese characters are typical examples of logograms. Many logograms are required to write all the words of a language.

Examples of alphabets:

(A, B, C)

Examples of syllabaries:

Cherokee

Japanese hiragana

Japanese katakana

Yi

Examples of logographies:

Chinese characters

Japanese kanji
Levels of Written Language Processing

Written language - The representation of a language by means of a writing system.

Written language is an invention in that it must be taught to children; children will pick up spoken language (oral or signed) by exposure without being specifically taught.

Natural Language Processing - Also called NLP, natural language processing allows machines to understand the meaning of phrases spoken and written by humans. NLP is a branch of artificial intelligence (AI) that enables computers to comprehend, generate, and manipulate human language. Natural language processing makes it possible to interrogate data with natural language text or voice.

PERCEPTION OF LETTER IN WORD CONTEXT

The word-superiority effect - In an early study of word perception, Cattell (1886) compared performance on individual letters with letters in word context. His results were striking. Whereas people were able to report only about three or four unrelated letters, they could report as many as two short words that were not semantically or syntactically related to one another.

TWO MODELS OF READING

Dual-route model - The dual-route model (Coltheart, Curtis, Atkins, & Haller, 1993; Rastle, Perry, Langdon, & Ziegler, 2001) proposes that we have two different ways of converting print to speech. The lexical route is the process by which a printed set of letters or characters activates the entry for the corresponding word in our internal lexicon. The heart of the dual-route model is the assumption that we have two different systems that enable us to read individual words: a rule system and a memory system.
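The two-systems idea can be sketched in a few lines. Everything here is toy data, not the model's actual machinery: the lexicon entries and the letter-to-sound rules are invented for illustration.

```python
# Minimal sketch of the dual-route idea: a memory (lexical) route for
# known words and a rule (grapheme-to-phoneme) route for novel strings.
# The lexicon and the spelling rules below are hypothetical toy data.

LEXICON = {"yacht": "jɒt", "have": "hæv"}   # stored whole-word pronunciations

RULES = {"a": "æ", "e": "ɛ", "i": "ɪ", "o": "ɒ", "u": "ʌ",
         "b": "b", "d": "d", "g": "g", "h": "h", "m": "m",
         "n": "n", "p": "p", "t": "t", "v": "v", "z": "z"}

def read_aloud(word):
    """Prefer the lexical (memory) route; fall back to the rule route."""
    if word in LEXICON:
        return LEXICON[word]            # lexical route: direct lookup
    # nonlexical (rule) route: assemble phonemes letter by letter
    return "".join(RULES.get(ch, ch) for ch in word)

print(read_aloud("yacht"))  # lexical route: jɒt
print(read_aloud("zat"))    # rule route handles the nonword: zæt
```

The sketch captures why the model posits two routes: an irregular word like "yacht" would be misread by the rules alone, while a novel nonword like "zat" has no lexical entry and can only be read by rule.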

Connectionist model - Assumes a series of layers, with the weights of the connections between layers determined by the reader's experience. Connectionist models attempt to explain the computational mechanisms underlying various psychological processes, such as the acquisition of grammar.
