Professional Documents
Culture Documents
Accent (Linguistics)
Acoustic Phonetics
Belt (Music)
Histology Of Vocal Folds
Intelligibility (Communication)
Lombard Effect
Manner Of Articulation
Paralanguage: Nonverbal Voice Cues In
Communication
Phonation
Phonetics
Voice Change In Boys
Speaker Recognition
Speech Synthesis
Vocal Loading
Vocal Rest
Vocal Range
Vocal Warm Up
Vocology
Voice Analysis
Voice Disorders
Voice Frequency
Voice Organ
Voice Pedagogy
Voice Projection
Voice Synthesis
Voice Types (Singing Voices)
Use Of The Web By People With Disabilities
Human Voice
The human voice consists of sound made by a human being using the vocal
folds for talking, singing, laughing, crying, screaming, etc. The human voice is specifically that
part of human sound production in which the vocal folds (vocal cords) are the primary
sound source. Generally speaking, the mechanism for generating the human voice can be
subdivided into three parts: the lungs, the vocal folds within the larynx, and the articulators.
The lungs (the pump) must produce adequate airflow and air pressure to vibrate the vocal folds
(this air pressure is the fuel of the voice). The vocal folds (vocal cords) are a vibrating valve
that chops up the airflow from the lungs into audible pulses that form the laryngeal sound
source. The muscles of the larynx adjust the length and tension of the vocal folds to ‘fine
tune’ pitch and tone. The articulators (the parts of the vocal tract above the larynx
consisting of tongue, palate, cheek, lips, etc.) articulate and filter the sound emanating from
the larynx and to some degree can interact with the laryngeal airflow to strengthen it or
weaken it as a sound source.
The vocal folds, in combination with the articulators, are capable of producing highly
intricate arrays of sound. The tone of voice may be modulated to suggest emotions such
as anger, surprise, or happiness. Singers use the human voice as an instrument for
creating music.
Voice, Human (Range Of The). The range of the human voice is quite astounding: there
being about 9 perfect tones, but 17,592,186,044,415 different sounds; thus 14 direct
muscles, alone or together, produce 16,383; 30 indirect muscles, ditto, 1,073,741,823; and
all in co-operation produce the number we have named; and these, independently of
different degrees of intensity. A man's voice ranges from bass to tenor, the medium being
what is called a barytone. The female voice ranges from contralto to soprano, the medium
being termed a mezzo-soprano; whereas a boy's voice is alto, or between a tenor and a
treble.
Accent (linguistics)
In linguistics, an accent is a manner of pronunciation of a language. An accent may be
associated with the region in which its speakers reside (a geographical or regional accent),
the socio-economic status of its speakers, their ethnicity, their caste or social class,
their first language (when the language in which the accent is heard is not their native
language), and so on.
Accents can be confused with dialects, which are varieties of language differing
in vocabulary, syntax, and morphology, as well as pronunciation. Dialects are usually
spoken by a group united by geography or social status.
History
As human beings spread out into isolated communities, stresses and peculiarities develop.
Over time these can develop into identifiable accents. In North America, the interaction of
people from many ethnic backgrounds contributed to the formation of the different varieties
of North American accents. It is difficult to measure or predict how long it takes an accent
to form. Accents in the USA, Canada and Australia, for example, developed from the
combinations of different accents and languages in various societies, and the effect of this
on the various pronunciations of the British settlers, yet North American accents remain
more distinct, either as a result of time or of external or "foreign" linguistic interaction, such
as the Italian accent.
In many cases, the accents of non-English settlers from Great Britain and Ireland affected
the accents of the different colonies quite differently. Irish, Scottish and Welsh immigrants
had accents which greatly affected the vowel pronunciation of certain areas of Australia and
Canada.
Development
Children are able to take on accents relatively quickly. Children of immigrant families, for
example, generally have a more native-like pronunciation than their parents, though both
children and parents may have a noticeable non-native accent. Accents seem to remain
relatively malleable until a person's early twenties, after which a person's accent seems to
become more entrenched.
All the same, accents are not fixed even in adulthood. An acoustic analysis by Jonathan
Harrington of Queen Elizabeth II's Royal Christmas Messages revealed that the speech
patterns of even so conservative a figure as a monarch can continue to change over her
lifetime.
Non-native accents
Pronunciation is the most difficult part of a non-native language to learn. Most individuals
who speak a non-native language fluently speak it with an accent of their native tongue.
The most important factor in predicting the degree to which the accent will be noticeable (or
strong) is the age at which the non-native language was learned. The critical period theory
states that if learning takes place after the critical period (usually considered around
puberty) for acquiring native-like pronunciation, an individual is unlikely to acquire a native-
like accent. This theory, however, is quite controversial among researchers. Although many
subscribe to some form of the critical period, they either place it earlier than puberty or
consider it more of a critical “window,” which may vary from one individual to another and
depend on factors other than age, such as length of residence, similarity of the non-native
language to the native language, and the frequency with which both languages are
used. Nevertheless, children as young as 6 at the time of moving to another country often
speak with a noticeable non-native accent as adults. There are also rare instances of
individuals who are able to pass for native speakers even if they learned their non-native
language in early adulthood. However, neurological constraints associated with brain
development appear to limit most non-native speakers' ability to sound native-like. Most
researchers agree that for adults, acquiring a native-like accent in a non-native language is
near impossible.
Social factors
When a group defines a standard pronunciation, speakers who deviate from it are often said
to "speak with an accent". However, everyone speaks with an accent. People from the United
States would "speak with an accent" from the point of view of an Australian, and vice versa.
Accents such as BBC English or General American or Standard American may sometimes be
erroneously designated in their countries of origin as "accentless" to indicate that they offer
no obvious clue to the speaker's regional or social background.
Prestige
Certain accents are perceived to carry more prestige in a society than other accents. This is
often due to their association with the elite part of society. For example, in the United
Kingdom, Received Pronunciation of the English language is associated with the traditional
upper class. However, in linguistics there is no differentiation among accents with regard to
their prestige, aesthetics, or correctness. All languages and accents are linguistically equal.
Accent Stereotyping and Prejudice
Stereotypes refer to specific characteristics, traits, and roles that a group and its members
are believed to possess. Stereotypes can be both positive and negative, although negative
ones are more common.
Stereotypes may result in prejudice, which is defined as having negative attitudes toward a
group and its members. Individuals with non-standard accents often have to deal with both
negative stereotypes and prejudice because of an accent. Researchers consistently show
that people with accents are judged as less intelligent, less competent, less educated,
having poor English/language skills, and unpleasant to listen to. [19][20] Not only do people
with standard accents subscribe to these beliefs and attitudes; individuals with accents also
often stereotype against their own or others' accents.
Accent Discrimination
Discrimination refers to specific behaviors or actions directed at a group or its individual
members based solely on the group membership. In accent discrimination, one's way of
speaking is used as a basis for arbitrary evaluations and judgments. [21] Unlike other forms of
discrimination, there are no strong norms against accent discrimination in the general
society. Rosina Lippi-Green writes,
Accent serves as the first point of gate keeping because we are forbidden, by law and social
custom, and perhaps by a prevailing sense of what is morally and ethically right, from using
race, ethnicity, homeland or economics more directly. We have no such compunctions about
language, however. Thus, accent becomes a litmus test for exclusion, an excuse to turn
away, to recognize the other.
Speakers with accents often experience discrimination in housing and employment. [22][23] For
example, landlords are less likely to call back speakers who have foreign or ethnic accents,
and such speakers are more likely to be assigned by employers to lower-status positions than
are those with standard accents. In business settings, individuals with non-standard accents
are more likely to be evaluated negatively. Accent discrimination is also present in
educational institutions. For example, non-native-speaking graduate students, lecturers, and
professors across college campuses in the US have been targeted as unintelligible because of
their accents. On average, however, students taught by non-native English speakers do not
underperform when compared to those taught by native speakers of English.
Studies have shown that the perception of an accent, not the accent itself, often results in
negative evaluations of speakers. In a study conducted by Rubin (1992), students listened
to a taped lecture recorded by the same native English speaker with a standard accent.
However, they were shown a picture of the lecturer, who was either Caucasian or Asian.
Participants in the study who saw the Asian picture believed that they had heard an
accented lecturer and performed worse on a task measuring lecture comprehension.
Negative evaluations may reflect prejudice rather than real issues with understanding
accents.
Acting and accents
Actors are often called upon to speak varieties of language other than their own. For
example, Missouri-born actor Dick Van Dyke attempted to imitate a cockney accent in the
film Mary Poppins. Similarly, an actor may portray a character of some nationality other
than his or her own by adopting into the native language the phonological profile typical of
the nationality to be portrayed – what is commonly called "speaking with an accent". One
example would be Viggo Mortensen's use of a Russian accent in his portrayal of Nikolai in
the movie Eastern Promises.
The perception or sensitivity of others to accents means that generalizations are passed off
as acceptable, such as Brad Pitt's Jamaican accent in Meet Joe Black. Angelina
Jolie attempted a Greek accent in the film Alexander that was said by critics to be
distracting. Gary Oldman has become known for playing eccentrics and for his mastery of
accents.
Accents may have associations and implications for an audience. For example,
in Disney films from the 1990s onward, English accents are generally employed to serve one
of two purposes: slapstick comedy or evil genius. Examples include Aladdin (the Sultan and
Jafar, respectively), The Lion King (Zazu and Scar, respectively), The Hunchback of Notre
Dame(Victor the Gargoyle and Frollo, respectively), and Pocahontas (Wiggins and Ratcliffe,
respectively - both of whom happen to be played by the same actor, American David Ogden
Stiers).
Legal implications
In the United States, Title VII of the Civil Rights Act of 1964 prohibits discrimination based
on national origin, which implicitly covers accents. However, employers can escape liability by
insisting that a person's accent impairs communication skills that are necessary to effective
business operation. The courts often rely on the employer's claims or use judges'
subjective opinions when deciding whether the (potential) employee’s accent would
interfere with communication or performance, without any objective proof that accent was
or might be a hindrance.
Kentucky's highest court in the case of Clifford vs. Commonwealth held that a white police
officer, who had not seen the black defendant allegedly involved in a drug transaction,
could, nevertheless, identify him as a participant by saying that a voice on an audiotape
"sounded black." The police officer based this "identification" on the fact that the defendant
was the only African American man in the room at the time of the transaction and that an
audio-tape contained the voice of a man the officer said “sounded black” selling crack
cocaine to a white informant planted by the police.
Acoustic phonetics
Acoustic phonetics is a subfield of phonetics which deals with acoustic aspects
of speech sounds. Acoustic phonetics investigates properties like the mean
squared amplitude of a waveform, its duration, its fundamental frequency, or other
properties of its frequency spectrum, and the relationship of these properties to other
branches of phonetics (e.g. articulatory or auditory phonetics), and to abstract linguistic
concepts like phones, phrases, or utterances.
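As a concrete illustration, the properties named above can be computed directly from a list of samples. The Python sketch below synthesizes a simple two-harmonic tone standing in for a voiced sound (the 16 kHz sample rate and 125 Hz fundamental are assumed values chosen for the example, not anything from the text) and estimates its mean squared amplitude and its fundamental frequency from an autocorrelation peak:

```python
import math

SR = 16000  # assumed sample rate in Hz for this sketch

def synth_tone(f0, dur=0.048, sr=SR):
    """A periodic stand-in for a voiced sound: fundamental plus one overtone."""
    n = int(dur * sr)
    return [math.sin(2 * math.pi * f0 * i / sr)
            + 0.5 * math.sin(2 * math.pi * 2 * f0 * i / sr)
            for i in range(n)]

def mean_squared_amplitude(x):
    """Mean squared amplitude of the waveform."""
    return sum(s * s for s in x) / len(x)

def estimate_f0(x, sr=SR, fmin=50, fmax=500):
    """Estimate fundamental frequency from the autocorrelation peak
    inside a plausible pitch range (fmin..fmax Hz)."""
    best_lag, best_r = None, float("-inf")
    for lag in range(sr // fmax, sr // fmin + 1):
        r = sum(x[i] * x[i - lag] for i in range(lag, len(x)))
        if r > best_r:
            best_lag, best_r = lag, r
    return sr / best_lag

tone = synth_tone(125.0)  # 125 Hz, a plausible speaking fundamental
print(estimate_f0(tone))                        # 125.0
print(round(mean_squared_amplitude(tone), 3))   # 0.625
```

Real speech would of course be read from a recording rather than synthesized, and practical pitch trackers are considerably more robust than this single autocorrelation pass.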
The study of acoustic phonetics was greatly enhanced in the late 19th century by the
invention of the Edison phonograph. The phonograph allowed the speech signal to be
recorded and then later processed and analyzed. By replaying the same speech signal from
the phonograph several times, filtering it each time with a different band-pass filter,
a spectrogram of the speech utterance could be built up. A series of papers by Ludimar
Hermann published in Pflüger's Archiv in the last two decades of the 19th century
investigated the spectral properties of vowels and consonants using the Edison phonograph,
and it was in these papers that the term formant was first introduced. Hermann also played
back vowel recordings made with the Edison phonograph at different speeds to distinguish
between Willis' and Wheatstone's theories of vowel production.
Further advances in acoustic phonetics were made possible by the development of
the telephone industry. (Incidentally, Alexander Graham Bell's father, Alexander Melville
Bell, was a phonetician.) During World War II, work at the Bell Telephone
Laboratories (which invented the spectrograph) greatly facilitated the systematic study of
the spectral properties of periodic and aperiodic speech sounds, vocal tract resonances and
vowel formants, voice quality, prosody, etc.
On a theoretical level, acoustic phonetics really took off when it became clear that speech
acoustics could be modeled in a way analogous to electrical circuits. Lord Rayleigh was
among the first to recognize that the new electric theory could be used in acoustics, but it
was not until 1941 that the circuit model was effectively used, in a book by Chiba and
Kajiyama called "The Vowel: Its Nature and Structure". (Interestingly, this book by
Japanese authors working in Japan was published in English at the height of World War II.)
In 1952, Roman Jakobson, Gunnar Fant, and Morris Halle wrote "Preliminaries to Speech
Analysis", a seminal work tying acoustic phonetics and phonological theory together. This
little book was followed in 1960 by Fant's "Acoustic Theory of Speech Production", which has
remained the major theoretical foundation for speech acoustic research in both the academy
and industry. (Fant was himself very involved in the telephone industry.) Other important
framers of the field include Kenneth N. Stevens, Osamu Fujimura, and Peter Ladefoged.
Belt (music)
Belting (or vocal belting) refers to a specific technique of singing by which a singer
produces a loud sound in the upper middle of the pitch range. It is often described as
a vocal register, although some dispute this since technically the larynx is not oscillating in a
unique way. Singers can use belting to convey heightened emotional states.
Technique
The term "belt" is sometimes mistakenly equated with the use of chest voice in the higher
part of the voice. (The chest voice is a very general term for the sound and muscular
functions of the speaking voice, singing in the lower range and the voice used to shout. Still,
all those possibilities require help from the muscles in the vocal folds and a thicker closure
of the vocal folds. The term "chest voice" is therefore often a misunderstanding, as it
describes muscular work in the chest-area of the body, but the "sound" described as
"chest voice" is also produced by work of the vocal folds.) However, the proper production of
the belt voice according to some vocal methods involves minimizing tension in the throat
and change of typical placement of the voice sound in the mouth, bringing it forward into
the hard palate.
It is possible to learn classical vocal methods like bel canto and to also be able to belt; in
fact, many musical roles now require it. The belt sound is easier for some than others, but
the sound is possible for classical singers, too. It requires muscle coordinations not readily
used in classically trained singers, which may be why some opera singers find learning to
belt challenging.
In order to increase the number of high notes one can belt, one must practice. This can be
by repeatedly attempting to hit the note in a melody line, or by using vocalise programs
utilizing scales. Many commercial learn-to-sing packages have a set of scales to sing along to
as their main offering, which the purchaser must practice with often to see improvement.
'Belters' are not exempt from developing a strong head voice, as the more resonant their
higher register in head voice, the better the belted notes in this range will be. Some belters
find that after a period of time focusing on the belt, the head voice will have improved and,
likewise, after a period of time focusing on the head voice, the belt may be found to have
improved.
Physiology
There are many explanations as to how the belting voice quality is produced. When
approaching the matter from the Bel Canto point of view, it is said that the chest voice is
applied to the higher register. However, through studying singers who use a "mixed" sound,
practitioners have defined mixed sound as belting. One researcher, Jo Estill, has conducted
research on the belting voice, and describes the belting voice as an extremely muscular and
physical way of singing. When observing the vocal tract and torso of singers, while belting,
Estill observed:
Minimal airflow (longer closed phase (70% or greater) than in any other type of
phonation)
Maximum muscular engagement of the torso (In Estill terms: Torso anchoring).
Engagement of muscles in the head and neck in order to stabilize the larynx (in
Estill terms: Head and neck anchoring)
A downwards tilt of the cricoid cartilage (An alternative option would be the thyroid
tilting backwards. Observations show a larger CT space).
High positioning of the larynx
Maximum muscular effort of the extrinsic laryngeal muscles, minimum effort at the
level of the true vocal folds.
Narrowing of the aryepiglottic sphincter (the "twanger")
Possible dangers of belting
Use of belting without proper coordination can lead to forcing. Forcing can lead
consequently to vocal deterioration. Moderate use of the technique and, most importantly,
retraction of the ventricular folds while singing is vital to safe belting. Without proper
training in retraction, belting can indeed cause trauma to the vocal folds that requires the
immediate attention of a doctor.
Most tutors and some students of the method known as Speech Level Singing, created and
supported by Seth Riggs, regard belting as damaging to long term vocal health. They may
teach an alternative using a "mixed" or middle voice which can sound almost as strong, as
demonstrated by Aretha Franklin, Patti LaBelle, Celine Dion, Whitney Houston, Mariah
Carey, Lara Fabian, Ziana Zain, and Regine Velasquez. The subject of belting is a matter of
heated controversy among singers, singing teachers and methodologies.
Proponents of belting say that it is a "soft yell," and if produced properly it can be healthy.
It does not require straining and, they say, it is not damaging to the voice. However,
the larynx is higher than in classical technique, and many experts on the singing voice
believe that a high larynx position is both dangerous to vocal health and produces what
many find to be an unpleasant sound. According to master teacher David Jones, "Some of
the dangers are general swelling of the vocal cords, pre-polyp swelling, ballooning of
capillaries on the surface of the vocal cords, or vocal nodules. A high-larynxed approach to
the high voice taught by a speech level singing instructor who does not listen appropriately
can lead to one or ALL of these vocal disorders".
However, it is thought by some that belting will produce vocal nodules. This may be true if
belting is produced incorrectly. If the sound produced is a mixed head and chest sound
that safely approximates a belt, produced well, there may be no damage to the vocal folds.
As for the physiological and acoustical features of the metallic voices, a master's thesis has
drawn the following conclusions:
No significant changes in frequency and amplitude of F1 were observed
Significant increases in amplitudes of F2, F3 and F4 were found
In frequencies for F2, metallic voice perceived as louder was correlated to increase in
amplitude of F3 and F4
Vocal tract adjustments like velar lowering, pharyngeal wall narrowing, laryngeal
raising, aryepiglottic and lateral laryngeal constriction were frequently found.
Intelligibility (communication)
In phonetics, intelligibility is a measure of how comprehensible speech is, or the degree
to which speech can be understood. Intelligibility is affected by spoken clarity, explicitness,
lucidity, comprehensibility, perspicuity, and precision.
Noise levels
For satisfactory communication, the average speech level should exceed that of an
interfering noise by 6 dB; lower speech-to-noise ratios are rarely acceptable (Moore, 1997).
Because it occupies a wide frequency range, speech is quite resistant to many kinds of
frequency cut-off: Moore reports, for example, that a band of frequencies from 1000 Hz to
2000 Hz is sufficient (sentence articulation score of about 90%).
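The 6 dB criterion above can be sketched numerically. In this minimal Python example, the "speech" and "noise" signals are toy sine waves chosen so that their amplitude ratio is exactly 2 (about 6.02 dB); real measurements would of course use recorded sound levels:

```python
import math

def level_db(samples):
    """RMS level of a signal in dB (relative to a full-scale amplitude of 1)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)

def snr_db(speech, noise):
    """Speech-to-noise ratio in dB; Moore's criterion asks for at least 6 dB."""
    return level_db(speech) - level_db(noise)

N = 1000
speech = [0.4 * math.sin(2 * math.pi * i / 100) for i in range(N)]  # 10 full cycles
noise  = [0.2 * math.sin(2 * math.pi * i / 50)  for i in range(N)]  # 20 full cycles

margin = snr_db(speech, noise)
print(round(margin, 2))  # 6.02 dB: an amplitude ratio of 2
print(margin >= 6)       # True, so communication should be satisfactory
```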
Word articulation remains high even when only 1–2% of the wave is unaffected by
distortion:
Unit of measurement   Quantity measured                         Good values
%ALcons               Articulation loss (popular in the USA)    < 10 %
C50                   Clarity index (widespread in Germany)     > 3 dB
STI (RASTI)           Intelligibility (internationally known)   > 0.6
Intelligibility with different types of speech
Lombard speech
The human brain automatically changes speech made in noise through a process called
the Lombard effect. Such speech has increased intelligibility compared to normal speech. It
is not only louder but the frequencies of its phonetic fundamental are increased and the
durations of its vowels are prolonged. People also tend to make more noticeable facial
movements.
Screaming
Shouted speech is less intelligible than Lombard speech because increased vocal energy
produces decreased phonetic information.
Clear speech
Clear speech is used when talking to a person with a hearing impairment. It is characterized
by a slower speaking rate, more and longer pauses, elevated speech intensity, increased
word duration, "targeted" vowel formants, increased consonant intensity compared to
adjacent vowels, and a number of phonological changes (including fewer reduced vowels
and more released stop bursts).
Infant-directed speech
Infant-directed speech—or baby talk—uses a simplified syntax and a smaller, easier-to-
understand vocabulary than speech directed to adults. Compared to adult-directed speech, it
has a higher fundamental frequency, an exaggerated pitch range, and a slower rate.
Citation speech
Citation speech occurs when people engage self-consciously in spoken language research. It
has a slower tempo and fewer connected speech processes (e.g., shortening of nuclear
vowels, devoicing of word-final consonants) than normal speech.
Hyperspace speech
Hyperspace speech, also known as the hyperspace effect, occurs when people are misled
about the presence of environment noise. It involves modifying the F1 and F2 of phonetic
vowel targets to ease perceived difficulties on the part of the listener in recovering
information from the acoustic signal.
Lombard effect
Due to the Lombard effect, great tits sing at a higher frequency in noise-polluted urban
surroundings than in quieter ones, to help overcome the auditory masking that would
otherwise impair other birds' hearing of their song. In humans, the Lombard effect results in
speakers adjusting not only frequency but also the intensity and rate of pronouncing word
syllables.
The Lombard effect or Lombard reflex is the involuntary tendency of speakers to increase
the intensity of their voice when speaking in loud noise to enhance its audibility. This change
includes not only loudness but also other acoustic features such as pitch and the rate and
duration of syllables. This compensation effect results in an increase in the auditory
signal-to-noise ratio of the speaker's spoken words.
The effect links to the needs of effective communication, as there is a reduced effect when
words are repeated or lists are read, where communication intelligibility is not
important. Since the effect is also involuntary, it is used as a means to detect malingering in
those simulating hearing loss. Research upon great tits and beluga whales that live in
environments with noise pollution finds that the effect also occurs in the vocalizations of
nonhuman animals.
The effect was discovered in 1909 by Étienne Lombard, a French otolaryngologist.
Lombard speech
When heard in noise, listeners perceive speech recorded in noise better than the same
speech recorded in quiet and then played back at the same level
of masking noise. Changes between normal and Lombard speech include:
increase in phonetic fundamental frequencies
shift in energy from low frequency bands to middle or high bands,
increase in sound intensity,
increase in vowel duration,
spectral tilting,
shift in formant center frequencies for F1 (mainly) and F2.
the duration of content words is prolonged to a greater degree in noise
than function words.
greater lung volumes are used,
it is accompanied by larger facial movements but these do not aid as much as its
sound changes.
These changes cannot be controlled by instructing a person to speak as they would in
silence, though people can learn control with feedback.
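As a rough illustration of two of the listed changes, the sketch below applies a fixed intensity gain and a duration stretch to a sequence of samples. The gain and stretch values are invented for illustration; measured Lombard speech varies by speaker and noise level, and involves many more changes (spectral tilt, formant shifts) than this toy transform:

```python
def lombard_transform(samples, gain_db=10.0, stretch=1.2):
    """Simulate two Lombard-style changes: higher intensity, longer duration.

    gain_db and stretch are illustrative values, not measured ones.
    """
    gain = 10 ** (gain_db / 20)           # dB gain -> linear amplitude factor
    louder = [gain * s for s in samples]
    # Naive duration stretch by nearest-neighbour resampling.
    n_out = int(len(louder) * stretch)
    return [louder[int(i / stretch)] for i in range(n_out)]

quiet = [0.1, -0.1, 0.2, -0.2, 0.1]
loud = lombard_transform(quiet)
print(len(loud) > len(quiet))                                  # True: longer
print(max(abs(s) for s in loud) > max(abs(s) for s in quiet))  # True: louder
```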
The Lombard effect also occurs after laryngectomy, when people who have had speech
therapy talk with esophageal speech.
Mechanisms
The intelligibility of an individual’s own vocalization can be adjusted with audio-vocal
reflexes using their own hearing (private loop), or it can be adjusted indirectly in terms of
how well listeners can hear the vocalization (public loop). Both processes are involved in the
Lombard effect.
Private loop
A speaker can regulate their vocalizations, particularly their amplitude relative to background
noise, with reflexive auditory feedback. Such auditory feedback is known to maintain the
production of vocalization, since deafness affects the vocal acoustics of both humans and
songbirds. Changing the auditory feedback also changes vocalization in human speech or
bird song. Neural circuits have been found in the brainstem that enable such reflex adjustment.
Public loop
A speaker can regulate their vocalizations at a higher cognitive level in terms of observing
the consequences on their audience's ability to hear them. In this, auditory self-monitoring
adjusts vocalizations in terms of learnt associations of which features of a vocalization, when
made in noise, create effective and efficient communication. The Lombard effect has been
found to be greatest for those words that are important for the listener to understand,
suggesting such cognitive effects are important.
Development
Both private and public loop processes exist in children. There is a developmental shift,
however, from the Lombard effect being linked to acoustic self-monitoring in young children
to the adjustment of vocalizations to aid intelligibility for others in adults.
Neurology
The Lombard effect depends upon audio-vocal neurons in the periolivary region of
the superior olivary complex and the adjacent pontine reticular formation. It has been
suggested that the Lombard effect might also involve the higher cortical areas that control
these lower brainstem areas.
Choral singing
Choral singers experience reduced feedback due to the sound of other singers upon their
own voice. This results in a tendency for people in choruses to sing at a louder level if it is
not controlled by a conductor. Trained soloists can control this effect but it has been
suggested that after a concert they might speak more loudly in noisy surroundings, as at
after-concert parties.
The Lombard effect also occurs in those playing instruments such as the guitar.
Animal vocalization
Noise has been found to affect the vocalizations of animals that vocalize against a
background of human noise pollution. Great tits in Leiden sing with a higher frequency than
do those in quieter areas, to overcome the masking effect of the low-frequency background
noise pollution of cities. Beluga whales in the St. Lawrence River estuary adjust their whale
song so it can be heard against shipping noise.
Experimentally, the Lombard effect has also been found in the vocalization of:
Budgerigars
Cats
Chickens
Common marmosets
Cottontop tamarins
Japanese quail
Nightingales
Rhesus macaques
Squirrel monkeys
Zebra finches
Manner of articulation
A continuum from closed glottis to open. The black triangles represent the arytenoid
cartilages, the sail shapes the vocal cords, and the dotted circle the windpipe.
In linguistic phonetic treatments of phonation, such as those of Peter Ladefoged, phonation
was considered to be a matter of points on a continuum of tension and closure of the vocal
cords. More intricate mechanisms were occasionally described, but they were difficult to
investigate, and until recently the state of the glottis and phonation were considered to be
nearly synonymous.
If the vocal cords are completely relaxed, with the arytenoid cartilages apart for maximum
airflow, the cords do not vibrate. This is voiceless phonation, and is extremely common
with obstruents. If the arytenoids are pressed together for glottal closure, the vocal cords
block the airstream, producing stop sounds such as the glottal stop. In between there is
a sweet spot of maximum vibration. This is modal voice, and is the normal state for vowels
and sonorants in all the world's languages. However, the aperture of the arytenoid
cartilages, and therefore the tension in the vocal cords, is one of degree between the end
points of open and closed, and there are several intermediate situations utilized by various
languages to make contrasting sounds.
For example, Gujarati has vowels with a partially lax phonation called breathy
voice or murmured, while Burmese has vowels with a partially tense phonation
called creaky voice or laryngealized. Both of these phonations have dedicated IPA
diacritics, an under-umlaut and under-tilde. The Jalapa dialect of Mazatec is unusual in
contrasting both with modal voice in a three-way distinction. (Note that Mazatec is a tonal
language, so the glottis is making several tonal distinctions simultaneously with the
phonation distinctions.)
Mazatec
breathy voice [ja̤] 'he wears'
modal voice [já] 'tree'
creaky voice [ja̰] 'he carries'
Note: there was an error in the source of this information; the latter two
translations may have been mixed up.
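The two phonation diacritics just mentioned exist as Unicode combining characters, so transcriptions like those in the table can be built up programmatically. A minimal sketch (the helper names are my own):

```python
# The IPA marks breathy voice with a combining diaeresis below (U+0324)
# and creaky voice with a combining tilde below (U+0330).
BREATHY = "\u0324"  # under-umlaut
CREAKY = "\u0330"   # under-tilde

def breathy(vowel: str) -> str:
    """Attach the breathy-voice diacritic to a vowel."""
    return vowel + BREATHY

def creaky(vowel: str) -> str:
    """Attach the creaky-voice diacritic to a vowel."""
    return vowel + CREAKY

print("j" + breathy("a"))  # ja̤ (breathy voice)
print("j" + creaky("a"))   # ja̰ (creaky voice)
```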
Javanese does not have modal voice in its plosives, but contrasts two other points along the
phonation scale, with more moderate departures from modal voice, called slack
voice and stiff voice. The "muddy" consonants in Shanghainese are slack voice; they
contrast with tenuis and aspirated consonants.
Although each language may be somewhat different, it is convenient to classify these
degrees of phonation into discrete categories. A series of seven alveolar plosives, with
phonation ranging from an open/lax to a closed/tense glottis, is:
Open glottis [t] voiceless (full airstream)
[d̤] breathy voice
Speaker recognition
Speaker recognition is the computing task of validating a user's claimed identity
using characteristics extracted from their voices.
There is a difference between speaker recognition (recognizing who is speaking)
and speech recognition (recognizing what is being said). These two terms are frequently
confused, as is voice recognition. Voice recognition is a combination of the two: it uses
learned aspects of a speaker's voice to determine what is being said. Such a system cannot
recognize speech from random speakers very accurately, but it can reach high accuracy for
individual voices it has been trained with. In addition, there is a difference between the act
of authentication (commonly referred to as speaker verification or speaker
authentication) and identification.
Speaker recognition has a history dating back some four decades and uses the acoustic
features of speech that have been found to differ between individuals. These acoustic
patterns reflect both anatomy (e.g., size and shape of the throat and mouth) and learned
behavioral patterns (e.g., voice pitch, speaking style). Speaker verification has earned
speaker recognition its classification as a "behavioral biometric."
Verification versus identification
There are two major applications of speaker recognition technologies and methodologies. If
the speaker claims to be of a certain identity and the voice is used to verify this claim, this
is called verification or authentication. On the other hand, identification is the task of
determining an unknown speaker's identity. In a sense speaker verification is a 1:1 match
where one speaker's voice is matched to one template (also called a "voice print" or "voice
model") whereas speaker identification is a 1:N match where the voice is compared against
N templates.
From a security perspective, identification is different from verification. For example,
presenting your passport at border control is a verification process - the agent compares
your face to the picture in the document. Conversely, a police officer comparing a sketch of
an assailant against a database of previously documented criminals to find the closest
match(es) is an identification process.
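The 1:1 versus 1:N distinction can be sketched in a few lines of Python. The two-dimensional "voice prints" and the similarity threshold here are hypothetical placeholders for real feature vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(utterance, template, threshold=0.85):
    """1:1 match: does the utterance match the claimed speaker's template?"""
    return cosine(utterance, template) >= threshold

def identify(utterance, templates):
    """1:N match: return the enrolled speaker whose template is most similar."""
    return max(templates, key=lambda name: cosine(utterance, templates[name]))

# Hypothetical two-dimensional voice prints, for illustration only.
templates = {"alice": [0.9, 0.1], "bob": [0.2, 0.8]}
utterance = [0.85, 0.2]

print(verify(utterance, templates["alice"]))  # True: the claim is accepted
print(identify(utterance, templates))         # alice
```

Real systems replace the toy vectors with learned speaker models, but the shape of the two tasks is the same: verification applies a threshold to one comparison, identification ranks N comparisons.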
Speaker verification is usually employed as a "gatekeeper" in order to provide access to a
secure system (e.g.: telephone banking). These systems operate with the user's knowledge
and typically require their cooperation. Speaker identification systems can also be
implemented covertly without the user's knowledge to identify talkers in a discussion, alert
automated systems of speaker changes, check if a user is already enrolled in a system, etc.
In forensic applications, it is common to first perform a speaker identification process to
create a list of "best matches" and then perform a series of verification processes to
determine a conclusive match.
Variants of speaker recognition
Each speaker recognition system has two phases: Enrollment and verification. During
enrollment, the speaker's voice is recorded and typically a number of features are extracted
to form a voice print, template, or model. In the verification phase, a speech sample or
"utterance" is compared against a previously created voice print. For identification systems,
the utterance is compared against multiple voice prints in order to determine the best
match(es) while verification systems compare an utterance against a single voice print.
Because of the process involved, verification is faster than identification.
Speaker recognition systems fall into two categories: text-dependent and text-independent.
If the text must be the same for enrollment and verification this is called text-dependent
recognition. In a text-dependent system, prompts can either be common across all speakers
(e.g.: a common pass phrase) or unique. In addition, the use of shared-secrets (e.g.:
passwords and PINs) or knowledge-based information can be employed in order to create a
multi-factor authentication scenario.
Text-independent systems are most often used for speaker identification as they require
very little if any cooperation by the speaker. In this case the text during enrollment and test
is different. In fact, the enrollment may happen without the user's knowledge, as in the
case for many forensic applications. As text-independent technologies do not compare what
was said at enrollment and verification, verification applications tend to also employ speech
recognition to determine what the user is saying at the point of authentication.
Technology
The various technologies used to process and store voice prints include frequency
estimation, hidden Markov models, Gaussian mixture models, pattern matching
algorithms, neural networks, matrix representation, vector quantization,
and decision trees. Some systems also use "anti-speaker" techniques, such as cohort
models and world models.
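Of the techniques listed, vector quantization is perhaps the easiest to illustrate: each enrolled speaker gets a small codebook of feature vectors, and an utterance is scored by its total quantization distortion against each codebook. The codebooks below are hypothetical stand-ins for trained codewords:

```python
def distortion(frames, codebook):
    """Total quantization error of utterance frames against a speaker codebook:
    each frame is mapped to its nearest codeword (squared Euclidean distance)."""
    total = 0.0
    for frame in frames:
        total += min(sum((f - c) ** 2 for f, c in zip(frame, code))
                     for code in codebook)
    return total

def identify(frames, codebooks):
    """Pick the speaker whose codebook quantizes the utterance with least error."""
    return min(codebooks, key=lambda name: distortion(frames, codebooks[name]))

# Hypothetical 2-D codebooks standing in for trained spectral codewords.
codebooks = {
    "alice": [[0.0, 1.0], [0.2, 0.8]],
    "bob":   [[1.0, 0.0], [0.8, 0.2]],
}
frames = [[0.1, 0.9], [0.15, 0.85]]
print(identify(frames, codebooks))  # alice
```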
Ambient noise levels can impede collection of both the initial and subsequent voice samples.
Noise reduction algorithms can be employed to improve accuracy, but incorrect application
can have the opposite effect. Performance degradation can result from changes in
behavioural attributes of the voice and from enrolment using one telephone and verification
on another telephone ("cross channel"). Integration with two-factor authentication products
is expected to increase. Voice changes due to ageing may impact system performance over
time. Some systems adapt the speaker models after each successful verification to capture
such long-term changes in the voice, though there is debate regarding the overall security
impact imposed by automated adaptation.
Capture of the biometric is seen as non-invasive. The technology traditionally uses existing
microphones and voice transmission technology allowing recognition over long distances via
ordinary telephones (wired or wireless).
Voice identification from digitally or analogue recorded audio uses electronic
measurements as well as critical listening skills, both of which must be applied by a
forensic expert for the identification to be accurate.
Speech synthesis
Speech synthesis is the artificial production of human speech. A computer system used for
this purpose is called a speech synthesizer, and can be implemented
in software or hardware. A text-to-speech (TTS) system converts normal language text
into speech; other systems render symbolic linguistic representations like phonetic
transcriptions into speech.
Synthesized speech can be created by concatenating pieces of recorded speech that are
stored in a database. Systems differ in the size of the stored speech units; a system that
stores phones or diphones provides the largest output range, but may lack clarity. For
specific usage domains, the storage of entire words or sentences allows for high-quality
output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other
human voice characteristics to create a completely "synthetic" voice output.
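The concatenative approach described above can be sketched as a lookup-and-join over a unit database. The unit names and waveform fragments below are hypothetical stand-ins for a recorded diphone inventory:

```python
# Hypothetical diphone database: unit name -> stored waveform samples.
units = {
    "h-e": [0.1, 0.3, 0.2],
    "e-l": [0.2, 0.4, 0.1],
    "l-o": [0.3, 0.2, 0.0],
}

def synthesize(unit_sequence, db):
    """Concatenate stored speech units into one output waveform.
    A real system would also smooth the joins between adjacent units."""
    waveform = []
    for unit in unit_sequence:
        waveform.extend(db[unit])
    return waveform

print(synthesize(["h-e", "e-l", "l-o"], units))
```

The trade-off the text describes falls out of the database design: small units (diphones) cover any text but join poorly; whole words or sentences join cleanly but only cover a fixed domain.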
The quality of a speech synthesizer is judged by its similarity to the human voice and by its
ability to be understood. An intelligible text-to-speech program allows people with visual
impairments or reading disabilities to listen to written works on a home computer. Many
computer operating systems have included speech synthesizers since the early 1980s.
Vocal loading
Vocal loading is the stress inflicted on the speech organs when speaking for long periods.
Background
Of the working population, about 15% have professions where their voice is their primary
tool. That includes professions such as teachers, sales personnel, actors and singers, and TV
and radio reporters. Many of them, especially teachers, suffer from voice-related medical
problems. On a larger scale, this amounts to millions of sick-leave days every year in,
for example, both the US and the European Union. Still, vocal loading research has often
been treated as a minor subject.
Voice organ
Voiced speech is produced by air streaming from the lungs through the vocal cords, setting
them into an oscillating movement. In every oscillation, the vocal folds are closed for a
short period of time. When the folds reopen the pressure under the folds is released. These
changes in pressure form the waves called (voiced) speech.
Loading on tissue in vocal folds
The fundamental frequency of speech for an average male is around 110 Hz and for an
average female around 220 Hz. That means that for voiced sounds the vocal folds will hit
together 110 or 220 times a second, respectively. Suppose then that a female is speaking
continuously for an hour. Of this time perhaps five minutes is voiced speech. The folds will
then hit together more than 30 thousand times an hour. It is intuitively clear that the vocal
fold tissue will experience some tiring due to this large number of hits.
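The collision count in this example is simple arithmetic: the number of glottal closures is roughly the fundamental frequency times the voiced speaking time. A quick check of the figures in the text:

```python
def collisions_per_hour(f0_hz, voiced_seconds_per_hour):
    """Approximate vocal fold collisions per hour of speaking:
    one closure per vibratory cycle during voiced speech."""
    return f0_hz * voiced_seconds_per_hour

voiced = 5 * 60  # five minutes of voiced speech per hour, as in the text
print(collisions_per_hour(110, voiced))  # 33000 (average male)
print(collisions_per_hour(220, voiced))  # 66000 (average female)
```

Both figures comfortably exceed the "more than 30 thousand" the text mentions.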
Vocal loading also includes other kinds of strain on the speech organs, including muscular
strain: just like any other muscles, those of the speech organs will tire if used for an
extended period of time. However, researchers are most interested in the stress exerted
on the vocal folds.
Effect of speaking environment
Several studies in vocal loading show that the speaking environment does have a significant
impact on vocal loading. Still, the exact details are debated. Most scientists agree on the
effect of the following environmental properties:
air humidity - dry air increases stress experienced in the vocal folds
hydration - dehydration increases effects of stress inflicted on the vocal folds
background noise - people tend to speak louder when background noise is present,
even when it isn't necessary. Increasing speaking volume increases stress inflicted on
the vocal folds
pitch - the "normal" speaking style has close to optimal pitch. Using a higher or
lower pitch than normal will also increase stress in the speech organs.
In addition, smoking and other types of air pollution might have a negative effect on voice
production organs.
Symptoms
Objective evaluation or measurement of vocal loading is very difficult due to the tight
coupling of the experienced psychological and physiological stress. However, there are some
typical symptoms that can be objectively measured. Firstly, the pitch range of the voice will
decrease. Pitch range indicates the possible pitches that can be spoken. When a voice is
loaded, the upper pitch limit will decrease and the lower pitch limit will rise. Similarly, the
volume range will decrease.
Secondly, an increase in the hoarseness and strain of a voice can often be heard.
Unfortunately, both properties are difficult to measure objectively, and only perceptual
evaluations can be performed.
Voice care
Regularly, the question arises of how one should use one's voice to minimise tiring in the
vocal organs. This is encompassed in the study of vocology, the science and practice of
voice habilitation. Basically, a normal, relaxed way of speech is the optimal method for voice
production, in both speech and singing. Any excess force used when speaking will increase
tiring. The speaker should drink enough water and the air humidity level should be normal
or higher. No background noise should be present or, if not possible, the voice should be
amplified. Smoking is discouraged.
Vocal rest
Vocal rest is the process of resting the vocal folds by not speaking or singing, which
typically follows vocal disorders or viral infections which cause hoarseness in the voice, such
as the common cold or influenza. The purpose of vocal rest is to hasten recovery time. It is
believed that vocal rest, along with rehydration, will significantly decrease recovery time
after a cold. It is generally believed, however, that if one needs to communicate one should
speak and not whisper. The reasons for this differ; some believe that whispering merely
does not allow the voice to rest and may have a dehydrating effect, while others hold that
whispering can cause additional stress to the larynx.
Vocal range
Vocal range is the measure of the breadth of pitches that a human voice can phonate.
Although the study of vocal range has little practical application in terms of speech, it is a
topic of study within linguistics, phonetics, and speech and language pathology, particularly
in relation to the study of tonal languages and certain types of vocal disorders. However,
the most common application of the term "vocal range" is within the context of singing,
where it is used as one of the major defining characteristics for classifying singing voices
into groups known as voice types.
Singing and the definition of vocal range
While the broadest definition of vocal range is simply the span from the lowest to the
highest note a particular voice can produce, this broad definition is often not what is meant
when "vocal range" is discussed in the context of singing. Vocal pedagogists tend to define
the vocal range as the total span of "musically useful" pitches that a singer can produce.
This is because some of the notes a voice can produce may not be considered usable by the
singer within performance for various reasons. For example, within opera all singers must
project over an orchestra without the aid of a microphone. An opera singer would therefore
only be able to include the notes that they are able to adequately project over an orchestra
within their vocal range. In contrast, a pop artist could include notes that could be heard
with the aid of a microphone.
Another factor to consider is the use of different forms of vocal production. The human voice
is capable of producing sounds using different physiological processes within the larynx.
These different forms of voice production are known as vocal registers. While the exact
number and definition of vocal registers is a controversial topic within the field of singing,
the sciences identify only four registers: the whistle register, the falsetto register, the modal
register, and the vocal fry register. Typically, only the usable range of the modal register,
the register used in normal speech and most singing, is used when determining vocal range.
However, there are some instances where other vocal registers are included. For example,
within opera, countertenors utilize falsetto often and coloratura sopranos utilize the whistle
register frequently. These voice types would therefore include the notes from these other
registers within their vocal range. Another example would be a male doo-wop singer who
might quite regularly deploy his falsetto pitches in performance and thus include them in
determining his range. However, in most cases only the usable pitches within the modal
register are included when determining a singer's vocal range.
Vocal range and voice classification
Vocal range plays such an important role in classifying singing voices into voice types that
sometimes the two terms are confused with one another. A voice type is a particular kind of
human singing voice perceived as having certain identifying qualities or characteristics;
vocal range being only one of those characteristics. Other factors are vocal weight,
vocal tessitura, vocal timbre, vocal transition points, physical characteristics, speech level,
scientific testing, and vocal registration. All of these factors combined are used to categorize
a singer's voice into a particular kind of singing voice or voice type.
There are a plethora of different voice types used by vocal pedagogists today in a variety of
voice classification systems. Most of these types, however, are sub-types that fall under
seven different major voice categories that are for the most part acknowledged across all of
the major voice classification systems. Women are typically divided into three
groups: soprano, mezzo-soprano, and contralto. Men are usually divided into four
groups: countertenor, tenor, baritone, and bass. When considering the pre-pubescent
voices of children an eighth term, treble, can be applied. Within each of these major
categories there are several sub-categories that identify specific vocal qualities
like coloratura facility and vocal weight to differentiate between voices.
Vocal range itself cannot determine a singer's voice type. While each voice type does have
a general vocal range associated with it, human singing voices may possess vocal ranges
that encompass more than one voice type or are in between the typical ranges of two voice
types. Therefore, voice teachers only use vocal range as one factor in classifying a singer's
voice. More important than range in voice classification is tessitura, or where the voice is
most comfortable singing, and vocal timbre, or the characteristic sound of the singing
voice. For example, a female singer may have a vocal range that encompasses the high
notes of a mezzo-soprano and the low notes of a soprano. A voice teacher would therefore
look to see whether the singer is more comfortable singing higher or lower. If the singer is
more comfortable singing higher, the teacher would probably classify her as a soprano; if
lower, as a mezzo-soprano. The teacher would also listen to the sound
of the voice. Sopranos tend to have a lighter and less rich vocal sound than a mezzo-
soprano. A voice teacher, however, would never classify a singer in more than one voice
type, regardless of the size of their vocal range.
The following are the general vocal ranges associated with each voice type using scientific
pitch notation where middle C=C4. Some singers within these voice types may be able to
sing somewhat higher or lower:
Soprano: C4 – C6
Mezzo-soprano: A3 – A5
Contralto: F3 – F5
Tenor: C3 – C5
Baritone: F2 – F4
Bass: E2 – E4
In terms of frequency, human voices are roughly in the range of 80 Hz to 1100 Hz (that is,
E2 to C6) for normal male and female voices together.
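The correspondence between scientific pitch notation and frequency follows from equal temperament: each semitone multiplies frequency by 2^(1/12), with A4 = 440 Hz as the reference. A small sketch (natural notes only) to check the figures above:

```python
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def note_to_hz(name: str, octave: int) -> float:
    """Frequency of a natural note in scientific pitch notation (A4 = 440 Hz)."""
    # Semitone distance from A4; e.g. C4 is 9 semitones below A4.
    semitones = NOTE_OFFSETS[name] + 12 * (octave - 4) - 9
    return 440.0 * 2 ** (semitones / 12)

print(round(note_to_hz("C", 4), 2))  # 261.63  (middle C)
print(round(note_to_hz("E", 2), 2))  # 82.41   (bottom of the bass range)
print(round(note_to_hz("C", 6), 2))  # 1046.5  (top of the soprano range)
```

E2 at about 82 Hz and C6 at about 1047 Hz match the rough 80 Hz to 1100 Hz span quoted in the text.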
World records and extremes of vocal range
The following facts about female and male ranges are known:
Guinness lists the highest demanded note in the classical repertoire as G6 in 'Popoli
di Tessaglia,' a concert aria by W. A. Mozart, composed for Aloysia Weber. Though pitch
standards were not fixed in the eighteenth century, this rare note is also heard in the
opera Esclarmonde by Jules Massenet. The highest note commonly called for is
F6, famously heard in the Queen of the Night's two arias "Der Hölle Rache kocht in
meinem Herzen" and "O zittre nicht, mein lieber Sohn" in Mozart's opera Die
Zauberflöte.
Several little-known works call for pitches higher than G6. For example, the soprano Mado
Robin, who was known for her exceptionally high voice, sang a number of compositions
created especially to exploit her highest notes, reaching C7.
Lowest note in a solo: Guinness lists the lowest demanded note in the classical
repertoire as D2 (almost two octaves below Middle C) in Osmin's second aria in
Mozart's Die Entführung aus dem Serail. Although Osmin's note is the lowest 'demanded'
in the operatic repertoire, lower notes are frequently heard, both written and unwritten,
and it is traditional for basses to interpolate a low C in the duet "Ich gehe doch rathe ich
dir" in the same opera. Leonard Bernstein composed an optional B1 (a minor third below
D2) in a bass aria in the opera house version of Candide. In a Russian
piece combining solo and choral singing, Pavel Chesnokov directs the bass soloist in "Do
not deny me in my old age" to descend even lower, to G1, depending on the
arrangement.
Lowest note for a choir: Mahler's Eighth Symphony (bar 1457 in the "Chorus
mysticus") and Rachmaninoff's Vespers require B♭1. In Russian choirs
the oktavists traditionally sing an octave below the bass part, down to G1.
Vocal warm up
A vocal warm-up is a series of exercises which prepare the voice for singing, acting, or
other use.
Why Warm Up
A study by Elliott, Sundberg, & Gramming emphasized that changing pitch undoubtedly
stretches the muscles, and any singer will tell you that vocal warm-ups make them feel
more prepared.
Physical whole-body warm-ups also help prepare a singer. Muscles all over the body are
used when singing (the diaphragm being one of the most obvious). Stretches of
the abdomen,back, neck, and shoulders are important to avoid stress, which influences the
sound of the voice.
Some warm ups also train your voice. Sometimes called vocalises, these activities
teach breath control, diction, blending, and balance.
How To Warm Up
Breathing
Before you start to actually sing, it is important to start breathing properly and from the
diaphragm. Start with simple exercises such as hissing. Take a deep breath in then make a
hissing sound, breathing outwards until you've expelled as much air as possible from your
lungs. Repeat several times and be sure when you're breathing in to breathe using your
diaphragm, not moving your shoulders up and down. (That is a common sign of an
untrained breather.)
Afterwards, use lip trills and tongue trills to help control your breathing as well. Start on a
steady note, then make a "fire engine" sound that goes up and down. Eventually move to real
notes, starting in the middle of your range, such as middle C.
Range and Tone
Start easy, with light humming. Pick a note in the middle of your range (Middle C is
reasonable) and begin humming. Move between notes, but stay in the middle range.
To start warming up your range, sigh from the top of your range to the bottom, letting the
voice fall in a glissando without much control. Do several of these, working on reaching
the highest and lowest parts of your range.
Next, sing an arpeggio up to the octave and back (1 3 5 8 5 3 1), again starting
from middle C. Use open vowels, like o, ih, ay, and ah, starting with a consonant like B, D,
or P. Repeat the exercise a half-step higher, and continue up to the top of your range, but
don't push too high.
Next, sing down a five note scale, with an open vowel and a sibilant like Z. "Za a a a a" is
reasonable. This time, repeat the exercise a half-step lower, to the bottom of your
comfortable range.
Finally, sing a slightly more difficult phrase, again starting an octave lower than middle
C. Jump first an octave, then down a fourth, then down a third, then another third. (1 8 5 3
1). The phrase "I lo-ove to sing" fits with this exercise. Others choose to sing a few words
over and over to warm up, such as "Me, my, mo, mull."
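The half-step transpositions in these exercises follow the usual equal-temperament ratio: moving an exercise up a half step multiplies every frequency by 2^(1/12). A small sketch of the octave arpeggio pattern, taking middle C as roughly 261.63 Hz (the degree-to-semitone mapping assumes a major key):

```python
SEMITONE = 2 ** (1 / 12)
MIDDLE_C = 261.63  # Hz, approximate

# Scale degrees 1 3 5 8 5 3 1 as semitone offsets in a major key.
ARPEGGIO = [0, 4, 7, 12, 7, 4, 0]

def exercise(start_hz, offsets):
    """Frequencies of one pass of the arpeggio exercise."""
    return [round(start_hz * SEMITONE ** o, 2) for o in offsets]

first_pass = exercise(MIDDLE_C, ARPEGGIO)
second_pass = exercise(MIDDLE_C * SEMITONE, ARPEGGIO)  # a half step higher
print(first_pass)   # peaks at ~523.26 Hz, the octave above middle C
print(second_pass)
```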
Vocology
Vocology is the science of enabling or endowing the human voice with greater ability or
fitness. Its concerns include the nature of speech and language pathology, the defects of
the vocal tract (laryngology), the remediation of speech therapy and the voice
training and voice pedagogy of song and speech for actors and public speakers.
The study of vocology is recognized academically in taught courses and institutes such as
the National Center for Voice and Speech, Westminster Choir College at Rider University,
The Grabscheid Voice Center at Mount Sinai Medical Center, the Vox Humana Laboratory
at St. Luke's-Roosevelt Hospital Center and the Regional Center for Voice and Swallowing,
at Milan's Azienda Ospedaliera Fatebenefratelli e Oftalmico.
Also reflecting this increased recognition, when the Scandinavian Journal of
Logopedics & Phoniatrics and Voice merged in 1996, the new name selected was Logopedics,
Phoniatrics, Vocology.
Meaning and Origin of term
The term vocology was coined (simultaneously, but independently) by Ingo R. Titze and
an otolaryngologist at Washington University, Prof. George Gates. Titze defines vocology as
"the science and practice of voice habilitation, with a strong emphasis on habilitation". To
habilitate means to "enable", to "equip for", to "capacitate"; in other words, to assist in
performing whatever function needs to be performed. He goes on that this "is more
than repairing a voice or bringing it back to a former state ... rather, it is the process of
strengthening and equipping the voice to meet very specific and special demands".
Voice analysis
Voice analysis is the study of speech sounds for purposes other than linguistic content,
such as in speech recognition. Such studies include mostly medical analysis of
the voice, i.e. phoniatrics, but also speaker identification. More controversially, some believe
that the truthfulness or emotional state of speakers can be determined using Voice Stress
Analysis or Layered Voice Analysis.
Typical voice problems
A medical study of the voice can be, for instance, analysis of the voice of patients who have
had a polyp removed from their vocal cords through an operation. In order to
objectively evaluate the improvement in voice quality there has to be some measure of
voice quality. An experienced voice therapist can evaluate the voice quite reliably, but this
requires extensive training and is still always subjective.
Another active research topic in medical voice analysis is vocal loading evaluation. The
process of speaking for an extended period of time exerts a load on the vocal cords, and
the tissue tires. Among professional voice users (e.g. teachers, sales people) this tiring
can cause voice failures and sick leave. To evaluate these problems, vocal loading needs
to be objectively measured.
Analysis methods
Voice problems that require voice analysis most commonly originate from the vocal folds or
the laryngeal musculature that controls them, since the folds are subject to collision forces
with each vibratory cycle and to drying from the air being forced through the small gap
between them, and the laryngeal musculature is intensely active during speech or singing
and is subject to tiring. However, dynamic analysis of the vocal folds and their movement is
physically difficult. The location of the vocal folds effectively prohibits direct, invasive
measurement of movement. Less invasive imaging methods such as X-rays or ultrasound
do not work because the vocal folds are surrounded by cartilage, which distorts image
quality. Movements of the vocal folds are rapid; fundamental frequencies are usually
between 80 and 300 Hz, preventing the use of ordinary video. Stroboscopic and high-speed
video provide an option, but in order to see the vocal folds a fiberoptic probe leading to the
camera has to be positioned in the throat, which makes speaking difficult. In addition,
placing objects in the pharynx usually triggers a gag reflex that stops voicing and closes the
larynx, and stroboscopic imaging is only useful when the vocal fold vibratory pattern is
closely periodic.
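The stroboscopic principle can be illustrated numerically: flashing a light at a rate slightly offset from the vibration frequency makes a near-periodic motion appear in slow motion. A minimal sketch (the function name and numbers are illustrative, not taken from any clinical system):

```python
def apparent_frequency(f_vibration, f_strobe):
    """Apparent (aliased) frequency of a periodic motion at f_vibration Hz
    when it is seen only at strobe flashes occurring f_strobe times/s."""
    cycles_per_flash = f_vibration / f_strobe
    # Only the fractional phase advance between flashes is visible.
    folded = cycles_per_flash - round(cycles_per_flash)
    return abs(folded) * f_strobe

# Vocal folds vibrating at 200 Hz, strobe flashing 198 times per second:
# the folds appear to complete about two slow cycles per second.
slow_motion = apparent_frequency(200.0, 198.0)
```

This is also why stroboscopy fails for aperiodic vibration: when the vibration frequency wanders, the fractional phase advance wanders too, and no stable slow-motion image forms.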
The most important indirect methods are currently inverse filtering of either microphone or
oral airflow recordings and electroglottography (EGG). In inverse filtering, the speech sound
(the radiated acoustic pressure waveform, as obtained from a microphone) or the oral
airflow waveform from a circumferentially vented (CV) mask is recorded outside the mouth
and then filtered by a mathematical method to remove the effects of the vocal tract. This
method produces an estimate of the waveform of the glottal airflow pulses, which in turn
reflect the movements of the vocal folds. The other kind of noninvasive indirect indication of
vocal fold motion is electroglottography, in which electrodes placed on either side of the
subject's throat at the level of the vocal folds record the changes in the conductivity of the
throat according to how large a portion of the vocal folds is touching each other. It thus
yields one-dimensional information about the contact area. Neither inverse filtering nor EGG
is sufficient to completely describe the complex three-dimensional pattern of vocal fold
movement, but both can provide useful indirect evidence of that movement.
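As a rough illustration of the inverse-filtering idea, the vocal tract can be modeled as an all-pole filter whose coefficients are estimated by linear prediction (LPC); passing the recording through the inverse of that model leaves a residual that approximates the glottal source. The sketch below is a simplified toy, assuming a synthetic one-resonance "vocal tract" and autocorrelation LPC, not the clinically used CV-mask airflow procedure:

```python
import numpy as np

def lpc_coefficients(x, order):
    """Linear-prediction coefficients via the autocorrelation method.
    Returns [1, a1, ..., a_order]; this FIR filter approximately
    whitens x, i.e. removes the modeled vocal-tract resonances."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:order + 1])
    return np.concatenate(([1.0], a))

def inverse_filter(x, order=8):
    """Residual after removing the estimated vocal-tract filter;
    it approximates the glottal excitation waveform."""
    return np.convolve(x, lpc_coefficients(x, order), mode="same")

# Toy signal: a 100 Hz glottal pulse train exciting one resonance.
fs = 8000
n = np.arange(2048)
pulses = (n % 80 == 0).astype(float)            # crude glottal source
pole = 0.95 * np.exp(2j * np.pi * 500 / fs)     # resonance near 500 Hz
speech = np.zeros(len(n))
y1 = y2 = 0.0
for i, u in enumerate(pulses):                  # 2nd-order resonator
    y = u + 2 * pole.real * y1 - abs(pole) ** 2 * y2
    speech[i], y1, y2 = y, y, y1
residual = inverse_filter(speech, order=2)
```

With a realistic prediction order (around 10 to 14 at 8 kHz sampling) applied to a real microphone or mask-flow signal, the pulse-like residual is what analysts read glottal timing from.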
List of voice disorders
Voice disorders are medical conditions affecting the production of speech. These include:
Chorditis
Vocal fold nodules
Vocal fold cysts
Vocal cord paresis
Reinke's Edema
Spasmodic dysphonia
Foreign accent syndrome
Bogart-Bacall Syndrome
Laryngeal papillomatosis
Puberphonia
Voice frequency
A voice frequency (VF) or voice band is one of the frequencies, within part of
the audio range, that is used for the transmission of speech.
In telephony, the usable voice frequency band ranges from approximately 300 Hz to
3400 Hz. It is for this reason that the ultra low frequency band of the electromagnetic
spectrum between 300 and 3000 Hz is also referred to as voice frequency (despite the fact
that this is electromagnetic energy, not acoustic energy). The bandwidth allocated for a
single voice-frequency transmission channel is usually 4 kHz, including guard bands,
allowing a sampling rate of 8 kHz to be used as the basis of the pulse code
modulation system used for the digital PSTN.
Fundamental frequency
The voiced speech of a typical adult male will have a fundamental frequency from 85 to
180 Hz, and that of a typical adult female from 165 to 255 Hz. Thus, the fundamental
frequency of most speech falls below the bottom of the "voice frequency" band as defined
above. However, enough of the harmonic series will be present for the missing
fundamental to create the impression of hearing the fundamental tone.
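The missing-fundamental effect can be made concrete with a little arithmetic: a 120 Hz male fundamental is itself filtered out by the telephone band, but the surviving harmonics are still spaced 120 Hz apart, and that common spacing is what the ear hears as pitch. A minimal sketch, with band edges taken from the telephony figures above:

```python
def harmonics_in_band(f0, low=300.0, high=3400.0):
    """Harmonics of f0 that survive an ideal band-pass voice channel."""
    return [k * f0 for k in range(1, int(high / f0) + 1) if k * f0 >= low]

surviving = harmonics_in_band(120.0)
# The 120 Hz fundamental is gone, but 360, 480, 600, ... Hz remain;
# their common 120 Hz spacing still implies the missing fundamental.
spacing = surviving[1] - surviving[0]
```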
Vocal pedagogy
Vocal pedagogy covers a broad range of aspects of singing, ranging from the physiological
process of vocal production to the artistic aspects of interpretation of songs from different
genres or historical eras. Typical areas of study include:
Human anatomy and physiology as it relates to the physical process of singing.
Breathing and air support for singing
Posture for singing
Phonation
Vocal resonation or voice projection
Diction, vowels and articulation
Vocal registration
Sostenuto and legato for singing
Other singing elements, such as range extension, tone quality, vibrato, coloratura
Vocal health and voice disorders related to singing
Vocal styles, such as learning to sing opera, belt, or Art song
Phonetics
Voice classification
All of these different concepts are part of developing proper vocal technique. Not all
vocal teachers share the same opinions on every topic of study, which causes variations in
pedagogical approaches and vocal technique.
History
[Image: Pythagoras, the man in the center with the book, teaching music, in The School of
Athens by Raphael]
Within Western culture, the study of vocal pedagogy began in Ancient Greece. Scholars such
as Alypius and Pythagoras studied and made observations on the art of singing. It is
unclear, however, whether the Greeks ever developed a systematic approach to teaching
singing as little writing on the subject survives today.
The first surviving record of a systematized approach to teaching singing was developed in
the medieval monasteries of the Roman Catholic Church sometime near the beginning of the
13th century. As with other fields of study, the monasteries were the center of musical
intellectual life during the medieval period and many men within the monasteries devoted
their time to the study of music and the art of singing. Highly influential in the development
of a vocal pedagogical system were the monks Johannes de Garlandia and Jerome of
Moravia, who were the first to develop a concept of vocal registers. These men identified
three registers: chest voice, throat voice, and head voice (pectoris, guttoris, and capitis).
Their concept of head voice, however, is much closer to the modern pedagogist's
understanding of the falsetto register. Other concepts discussed in the monastic system
included vocal resonance, voice classification, breath support, diction, and tone quality, to
name a few. The ideas developed within the monastic system highly influenced the
development of vocal pedagogy over the next several centuries including the Bel Canto style
of singing.
With the onset of the Renaissance in the 15th century, the study of singing began to move
outside of the church. The courts of rich patrons, such as the Dukes of Burgundy who
supported the Burgundian School and the Franco-Flemish School, became secular centers of
study for singing and all other areas of musical study. The vocal pedagogical methods
taught in these schools, however, were based on the concepts developed within the
monastic system. Many of the teachers within these schools had their initial musical training
from singing in church choirs as children. The church also remained at the forefront of
musical composition at this time and remained highly influential in shaping musical tastes
and practices both in and outside the church. It was the Catholic Church that first
popularized the use of castrato singers in the 16th century, which ultimately led to the
popularity of castrato voices in Baroque and Classical operas.
It was not until the development of opera in the 17th century that vocal pedagogy began to
break away from some of the established thinking of the monastic writers and develop
deeper understandings of the physical process of singing and its relation to key concepts
like vocal registration and vocal resonation. It was also during this time, that noted voice
teachers began to emerge. Giulio Caccini is an example of an important early Italian voice
teacher. In the late 17th century, the bel canto method of singing began to develop in Italy.
This style of singing had a huge impact on the development of opera and the development
of vocal pedagogy during the Classical and Romantic periods. It was during this time, that
teachers and composers first began to identify singers by and write roles for more
specific voice types. However, it wasn't until the 19th century that more clearly defined
voice classification systems like the German Fach system emerged. Within these systems,
more descriptive terms were used in classifying voices such as coloratura soprano and lyric
soprano.
Vocal resonation is the process by which the basic product of phonation is enhanced in
timbre and/or intensity by the air-filled cavities through which it passes on its way to the
outside air. Various terms related to the resonation process include amplification,
enrichment, enlargement, improvement, intensification, and prolongation, although in
strictly scientific usage acoustic authorities would question most of them. The main point to
be drawn from these terms by a singer or speaker is that the end result of resonation is, or
should be, to make a better sound.
There are seven areas that may be listed as possible vocal resonators. In sequence from the
lowest within the body to the highest, these areas are the chest, the tracheal tree,
the larynx itself, the pharynx, the oral cavity, the nasal cavity, and the sinuses.
Articulation
Places of articulation (passive & active):
1. Exo-labial, 2. Endo-labial, 3. Dental, 4. Alveolar, 5. Post-alveolar, 6. Pre-palatal, 7.
Palatal, 8. Velar, 9. Uvular, 10. Pharyngeal, 11. Glottal, 12. Epiglottal, 13. Radical, 14.
Postero-dorsal, 15. Antero-dorsal, 16. Laminal, 17. Apical, 18. Sub-apical
Articulation is the process by which the joint product of the vibrator and the resonators is
shaped into recognizable speech sounds through the muscular adjustments and movements
of the speech organs. These adjustments and movements of the articulators result in verbal
communication and thus form the essential difference between the human voice and other
musical instruments. Singing without understandable words limits the voice to nonverbal
communication. In relation to the physical process of singing, vocal instructors tend to focus
more on active articulation as opposed to passive articulation. There are five basic active
articulators: the lip ("labial consonants"), the flexible front of the tongue ("coronal
consonants"), the middle/back of the tongue ("dorsal consonants"), the root of the tongue
together with the epiglottis ("radical consonants"), and the larynx ("laryngeal consonants").
These articulators can act independently of each other, and two or more may work together
in what is called coarticulation.
Unlike active articulation, passive articulation is a continuum without many clear-cut
boundaries. The places linguolabial and interdental, interdental and dental, dental and
alveolar, alveolar and palatal, palatal and velar, velar and uvular merge into one another,
and a consonant may be pronounced somewhere between the named places.
In addition, when the front of the tongue is used, it may be the upper surface or blade of
the tongue that makes contact ("laminal consonants"), the tip of the tongue ("apical
consonants"), or the under surface ("sub-apical consonants"). These articulations also
merge into one another without clear boundaries.
Interpretation
Interpretation is sometimes listed by voice teachers as a fifth physical process even though
strictly speaking it is not a physical process. The reason for this is that interpretation does
influence the kind of sound a singer makes which is ultimately achieved through a physical
action the singer is doing. Although teachers may acquaint their students with musical
styles and performance practices and suggest certain interpretive effects, most voice
teachers agree that interpretation cannot be taught. Students who lack a natural creative
imagination and aesthetic sensibility cannot learn it from someone else. Failure to interpret
well is not a vocal fault even though it may affect vocal sound significantly.
Classification of vocal sounds
Vocal sounds are divided into two basic categories, vowels and consonants, with a wide
variety of sub-classifications. Voice teachers and serious voice students spend a great deal
of time studying how the voice forms vowels and consonants, and studying the problems
that certain consonants or vowels may cause while singing. The International Phonetic
Alphabet is used frequently by voice teachers and their students.
Problems in describing vocal sounds
Describing vocal sound is an inexact science largely because the human voice is a self-
contained instrument. Since the vocal instrument is internal, the singer's ability to monitor
the sound produced is complicated by the vibrations carried to the ear through the
Eustachian (auditory) tube and the bony structures of the head and neck. In other words,
most singers hear something different in their ears/head than what a person listening to
them hears. As a result, voice teachers often focus less on how it "sounds" and more on
how it "feels". Vibratory sensations resulting from the closely-related processes of phonation
and resonation, and kinesthetic ones arising from muscle tension, movement, body position,
and weight serve as a guide to the singer on correct vocal production.
Another problem in describing vocal sound lies in the vocal vocabulary itself. There are
many schools of thought within vocal pedagogy and different schools have adopted different
terms, sometimes from other artistic disciplines. This has led to the use of a plethora of
descriptive terms applied to the voice which are not always understood to mean the same
thing. Some terms sometimes used to describe a quality of a voice's sound are: warm,
white, dark, light, round, reedy, spread, focused, covered, swallowed, forward, ringing,
hooty, bleaty, plummy, mellow, pear-shaped, and so forth.
Posture
The singing process functions best when certain physical conditions of the body exist. The
ability to move air in and out of the body freely and to obtain the needed quantity of air can
be seriously affected by the posture of the various parts of the breathing mechanism. A
sunken chest position will limit the capacity of the lungs, and a tense abdominal wall will
inhibit the downward travel of the diaphragm. Good posture allows the breathing
mechanism to fulfill its basic function efficiently without any undue expenditure of energy.
Good posture also makes it easier to initiate phonation and to tune the resonators as proper
alignment prevents unnecessary tension in the body. Voice instructors have also noted that
when singers assume good posture it often provides them with a greater sense of self
assurance and poise while performing. Audiences also tend to respond better to singers with
good posture. Habitual good posture also ultimately improves the overall health of the body
by enabling better blood circulation and preventing fatigue and stress on the body.
Breathing and breath support
In the words of Robert C. White, who paraphrased a "Credo" for singing (no blasphemy
intended):
In the Beginning there was Breath, and Singing was with Breath, and Singing was Breath,
and Singing was Breath. And all singing was made by the Breath, and without Breath was
not any Singing made that was made. (White 1988, p. 26)
All singing begins with breath. All vocal sounds are created by vibrations in
the larynx caused by air from the lungs. Breathing in everyday life is a subconscious bodily
function which occurs naturally; however, the singer must have control of the intake and
exhalation of breath to achieve maximum results from their voice.
Natural breathing has three stages: a breathing-in period, a breathing out period, and a
resting or recovery period; these stages are not usually consciously controlled. Within
singing there are four stages of breathing:
1. a breathing-in period (inhalation)
2. a setting up controls period (suspension)
3. a controlled exhalation period (phonation)
4. a recovery period
These stages must be under conscious control by the singer until they become conditioned
reflexes. Many singers abandon conscious control before their reflexes are fully conditioned,
which ultimately leads to chronic vocal problems.
Voice classification
In European classical music and opera, voices are treated like musical instruments.
Composers who write vocal music must have an understanding of the skills, talents, and
vocal properties of singers. Voice classification is the process by which human singing
voices are evaluated and are thereby designated into voice types. These qualities include
but are not limited to: vocal range, vocal weight, vocal tessitura, vocal timbre, and vocal
transition points such as breaks and lifts within the voice. Other considerations are physical
characteristics, speech level, scientific testing, and vocal registration. The science behind
voice classification developed within European classical music and has been slow in
adapting to more modern forms of singing. Voice classification is often used within opera to
associate possible roles with potential voices. There are currently several different systems
in use within classical music including: the German Fach system and the choral music
system among many others. No system is universally applied or accepted.
However, most classical music systems acknowledge seven different major voice categories.
Women are typically divided into three groups: soprano, mezzo-soprano, and contralto. Men
are usually divided into four groups: countertenor, tenor, baritone, and bass. When
considering children's voices, an eighth term, treble, can be applied. Within each of these
major categories there are several sub-categories that identify specific vocal qualities
like coloratura facility and vocal weight to differentiate between voices.
Within choral music, singers' voices are divided solely on the basis of vocal range. Choral
music most commonly divides vocal parts into high and low voices
within each sex (SATB). As a result, the typical choral situation affords many opportunities
for misclassification to occur. Since most people have medium voices, they must be
assigned to a part that is either too high or too low for them; the mezzo-soprano must sing
soprano or alto and the baritone must sing tenor or bass. Either option can present
problems for the singer, but for most singers there are fewer dangers in singing too low
than in singing too high.
Within contemporary forms of music (sometimes referred to as Contemporary Commercial
Music), singers are classified by the style of music they sing, such as jazz, pop, blues, soul,
country, folk, and rock styles. There is currently no authoritative voice classification system
within non-classical music. Attempts have been made to adopt classical voice type terms to
other forms of singing, but such attempts have been met with controversy. Voice
categorizations were developed with the understanding that the singer would be using
classical vocal technique within a specified range using unamplified (no microphones) vocal
production. Since contemporary musicians use different vocal techniques and microphones,
and are not forced to fit into a specific vocal role, applying such terms as soprano, tenor,
baritone, etc. can be misleading or even inaccurate.
Dangers of quick identification
Many voice teachers warn of the dangers of quick identification. Premature concern with
classification can result in misclassification, with all its attendant dangers. Vennard says:
"I never feel any urgency about classifying a beginning student. So many premature
diagnoses have been proved wrong, and it can be harmful to the student and embarrassing
to the teacher to keep striving for an ill-chosen goal. It is best to begin in the middle part of
the voice and work upward and downward until the voice classifies itself."
Most voice teachers believe that it is essential to establish good vocal habits within a limited
and comfortable range before attempting to classify the voice. When techniques of posture,
breathing, phonation, resonation, and articulation have become established in this
comfortable area, the true quality of the voice will emerge and the upper and lower limits of
the range can be explored safely. Only then can a tentative classification be arrived at, and
it may be adjusted as the voice continues to develop. Many acclaimed voice instructors
suggest that teachers begin by assuming that a voice is of a medium classification until it
proves otherwise. The reason for this is that the majority of individuals possess medium
voices and therefore this approach is less likely to misclassify or damage the voice.
Vocal registration
Vocal registers
Highest
Whistle
Falsetto
Modal
Vocal fry
Lowest
Vocal registration refers to the system of vocal registers within the human voice. A
register in the human voice is a particular series of tones, produced in the same vibratory
pattern of the vocal folds, and possessing the same quality. Registers originate
in laryngeal function. They occur because the vocal folds are capable of producing several
different vibratory patterns. Each of these vibratory patterns appears within a
particular range of pitches and produces certain characteristic sounds. The term register can
be somewhat confusing as it encompasses several aspects of the human voice. The term
register can be used to refer to any of the following:
A particular part of the vocal range such as the upper, middle, or lower registers.
A resonance area such as chest voice or head voice.
A phonatory process
A certain vocal timbre
A region of the voice which is defined or delimited by vocal breaks.
A subset of a language used for a particular purpose or in a particular social setting.
In linguistics, a register language is a language which combines tone and
vowel phonation into a single phonological system.
Within speech pathology the term vocal register has three constituent elements: a certain
vibratory pattern of the vocal folds, a certain series of pitches, and a certain type of sound.
Speech pathologists identify four vocal registers based on the physiology of laryngeal
function: the vocal fry register, the modal register, the falsetto register, and the whistle
register. This view is also adopted by many teachers of singing.
Some voice teachers, however, organize registers differently. There are over a dozen
different constructs of vocal registers in use within the field. The confusion which exists
concerning what a register is, and how many registers there are, is due in part to what
takes place in the modal register when a person sings from the lowest pitches of that
register to the highest pitches. The frequency of vibration of the vocal folds is determined
by their length, tension, and mass. As pitch rises, the vocal folds are lengthened, tension
increases, and their thickness decreases. In other words, all three of these factors are in a
state of flux in the transition from the lowest to the highest tones.
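The length/tension/mass relationship described above is often illustrated with the ideal-string formula f = (1/2L)·sqrt(T/μ). Vocal folds are layered tissue, not ideal strings, so the numbers below are purely illustrative; only the qualitative behavior carries over:

```python
import math

def string_frequency(length_m, tension_n, mass_per_length_kg_m):
    """Fundamental of an ideal string: f = sqrt(T / mu) / (2 * L)."""
    return math.sqrt(tension_n / mass_per_length_kg_m) / (2.0 * length_m)

base = string_frequency(0.016, 0.3, 0.002)    # made-up fold-like numbers
tenser = string_frequency(0.016, 0.6, 0.002)  # tension doubled
longer = string_frequency(0.020, 0.3, 0.002)  # vibrating length increased
# Doubling tension raises pitch only by sqrt(2); lengthening lowers it.
# This is why all three factors must change together as pitch rises.
```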
If a singer holds any of these factors constant and interferes with their progressive state of
change, his laryngeal function tends to become static and eventually breaks occur with
obvious changes of tone quality. These breaks are often identified as register boundaries or
as transition areas between registers. The distinct change or break between registers is
called a passaggio or a ponticello. Vocal instructors teach that with study a singer can move
effortlessly from one register to the other with consistent tone. Registers can even
overlap while singing. Teachers who like to use this theory of "blending registers" usually
help students through the "passage" from one register to another by hiding their "lift"
(where the voice changes).
However, many voice instructors disagree with this distinction of boundaries, blaming such
breaks on vocal problems created by a static laryngeal adjustment that does not permit the
necessary changes to take place. This difference of opinion has affected the different views
on vocal registration.
Coordination
Singing is an integrated and coordinated act and it is difficult to discuss any of the individual
technical areas and processes without relating them to the others. For example, phonation
only comes into perspective when it is connected with respiration; the articulators affect
resonance; the resonators affect the vocal folds; the vocal folds affect breath control; and
so forth. Vocal problems are often a result of a breakdown in one part of this coordinated
process, which causes voice teachers to frequently focus intensively on one area of the
process with their student until that issue is resolved. However, some areas of the art of
singing are so much the result of coordinated functions that it is hard to discuss them under
a traditional heading like phonation, resonation, articulation, or respiration.
Once the voice student has become aware of the physical processes that make up the act of
singing and of how those processes function, the student begins the task of trying to
coordinate them. Inevitably, students and teachers will become more concerned with one
area of the technique than another. The various processes may progress at different rates,
with a resulting imbalance or lack of coordination. The areas of vocal technique which seem
to depend most strongly on the student's ability to coordinate various functions are:
1. Extending the vocal range to its maximum potential
2. Developing consistent vocal production with a consistent tone quality
3. Developing flexibility and agility
4. Achieving a balanced vibrato
Developing the singing voice
Singing is not a natural process but is a skill that requires highly developed muscle reflexes.
Singing does not require much muscle strength but it does require a high degree of muscle
coordination. Individuals can develop their voices further through the careful and systematic
practice of both songs and vocal exercises. Voice teachers instruct their students to
exercise their voices in an intelligent manner. Singers should be thinking constantly about
the kind of sound they are making and the kind of sensations they are feeling while they are
singing.
Exercising the singing voice
There are several purposes for vocal exercises, including:
1. Warming up the voice
2. Extending the vocal range
3. "Lining up" the voice horizontally and vertically
4. Acquiring vocal techniques such as legato, staccato, control of dynamics, rapid
figurations, learning to comfortably sing wide intervals, and correcting vocal faults.
Extending the vocal range
An important goal of vocal development is to learn to sing to the natural limits of one's
vocal range without any obvious or distracting changes of quality or technique. Voice
instructors teach that a singer can only achieve this goal when all of the physical processes
involved in singing (such as laryngeal action, breath support, resonance adjustment, and
articulatory movement) are effectively working together. Most voice teachers believe that
the first step in coordinating these processes is by establishing good vocal habits in the
most comfortable tessitura of the voice first before slowly expanding the range beyond that.
There are three factors which significantly affect the ability to sing higher or lower:
1. The Energy Factor- In this usage the word energy has several connotations. It refers to
the total response of the body to the making of sound. It refers to a dynamic relationship
between the breathing-in muscles and the breathing-out muscles known as the breath
support mechanism. It also refers to the amount of breath pressure delivered to the vocal
folds and their resistance to that pressure, and it refers to the dynamic level of the sound.
2. The Space Factor- Space refers to the amount of space created by the moving of the
mouth and the position of the palate and larynx. Generally speaking, a singer's mouth
should be opened wider the higher they sing. The internal space or position of the soft
palate and larynx can be widened by the relaxing of the throat. Voice teachers often
describe this as feeling like the "beginning of a yawn".
3. The Depth Factor- In this usage the word depth has two connotations. It refers to the
actual physical sensations of depth in the body and vocal mechanism and it refers to mental
concepts of depth as related to tone quality.
McKinney says, "These three factors can be expressed in three basic rules: (1) As you sing
higher, you must use more energy; as you sing lower, you must use less. (2) As you sing
higher, you must use more space; as you sing lower, you must use less. (3) As you sing
higher, you must use more depth; as you sing lower, you must use less."
General music studies
Some voice teachers will spend time working with their students on general music
knowledge and skills, particularly music theory, music history, and musical styles and
practices as it relates to the vocal literature being studied. If required they may also spend
time helping their students become better sight readers, often adopting Solfege which
assigns certain syllables to the notes of the scale.
Performance skills and practices
Since singing is a performing art, voice teachers spend some of their time preparing their
students for performance. This includes teaching their students etiquette of behavior on
stage such as bowing, addressing problems like stage fright or nervous tics, and the use of
equipment such as microphones. Some students may also be preparing for careers in the
fields ofopera or musical theater where acting skills are required. Many voice instructors will
spend time on acting techniques and audience communication with students in these fields
of interest. Students of opera also spend a great deal of time with their voice teachers
learning foreign language pronunciations.
Voice projection
Voice projection is the strength of speaking or singing whereby the voice is used loudly
and clearly. It is a technique which can be employed to demand respect and attention, such
as when a teacher is talking to the class, or simply to be heard clearly, as an actor in
a theatre.
Breath technique is essential for proper voice projection. Whereas in normal talking one
may use air from the top of the lungs, a properly projected voice uses air properly flowing
from the expansion of the diaphragm. In good vocal technique, well-balanced respiration is
especially important to maintaining vocal projection. The goal is to isolate and relax the
muscles controlling the vocal folds, so that they are unimpaired by tension. The
external intercostal muscles are used only to enlarge the chest cavity, whilst the
counterplay between the diaphragm and abdominal muscles is trained to control airflow.
Stance is also important: it is recommended to stand up straight with the feet shoulder-width
apart and the upstage foot (the right foot for a right-handed speaker, etc.) slightly forward.
This improves balance and breathing.
In singing, voice projection is often equated with resonance, the concentrated pressure
through which one produces a focused sound. True resonance will produce the greatest
amount of projection available to a voice by utilizing all the key resonators found in the
vocal cavity. As the sound being produced and these resonators find the same overtones,
the sound will begin to spin as it reaches the ideal singer's formant at about 2800 Hz. The
size, shape, and hardness of the resonators all factor into the production of these overtones
and ultimately determine the projective capacities of the voice.
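The singer's formant figure above invites a quick arithmetic check. As an illustrative sketch (the function name and the ±400 Hz band are assumptions made here for illustration, not taken from the source), one can list which harmonics of a sung pitch land near 2800 Hz:

```python
# Toy illustration: for a sung fundamental f0, find which harmonics fall in a
# band around the singer's formant (~2800 Hz, as mentioned in the text).
# The band half-width of 400 Hz is an arbitrary choice for demonstration.

def harmonics_near_formant(f0, center=2800.0, half_width=400.0):
    """Return the harmonic numbers n for which n*f0 lies within the band."""
    near = []
    n = 1
    while n * f0 <= center + half_width:
        if abs(n * f0 - center) <= half_width:
            near.append(n)
        n += 1
    return near

# For A3 (220 Hz), harmonics 11-14 (2420-3080 Hz) sit in the formant region,
# which is the energy a resonant voice reinforces to project over an orchestra.
print(harmonics_near_formant(220.0))
```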
Voice type
A voice type is a particular kind of human singing voice perceived as having certain
identifying qualities or characteristics. Voice classification is the process by which
human voices are evaluated and thereby designated into voice types. These qualities
include but are not limited to: vocal range, vocal weight, vocal tessitura, vocal timbre,
and vocal transition points such as breaks and lifts within the voice. Other considerations
are physical characteristics, speech level, scientific testing, and vocal registration. The
science behind voice classification developed within European classical music and is not
generally applicable to other forms of singing.
Female voices: Soprano, Mezzo-soprano, Contralto
Male voices: Countertenor, Tenor, Baritone, Bass
Voice classification is often used within opera to associate possible roles with potential
voices. There are currently several different systems in use, including the German Fach
system and the choral music system, among many others. No system is universally applied
or accepted. This article focuses on voice classification within classical music. For other
contemporary styles of singing see: Voice classification in non-classical music.
Voice classification is a tool for singers, composers, venues, and listeners to categorize vocal
properties, and to associate possible roles with potential voices. There have been times
when voice classification systems have been used too rigidly, e.g. an opera house assigning
a singer to a specific type and only casting him or her in roles it considers as belonging to
that category.
A singer will ultimately choose a repertoire that suits their instrument. Some singers such
as Enrico Caruso, Rosa Ponselle, Joan Sutherland, Maria Callas, Ewa Podles, or Plácido
Domingo have voices which allow them to sing roles from a wide variety of types; some
singers such as Shirley Verrett or Grace Bumbry change type, and even voice part, over their
careers; and some singers such as Leonie Rysanek have voices which lower with age,
causing them to cycle through types over their careers. Some roles as well are hard to
classify, having very unusual vocal requirements; Mozart wrote many of his roles for specific
singers who often had remarkable voices, and some of Verdi’s early works make extreme
demands on his singers.
A note on vocal range vs. tessitura: Choral singers are classified into voice parts based
on range; solo singers are classified into voice types based in part on tessitura – where the
voice feels most comfortable for the majority of the time.
(For more information and roles and singers, see the individual voice type pages.)
Number of voice types
There are a plethora of different voice types used by vocal pedagogists today in a variety of
voice classification systems. Most of these types, however, are sub-types that fall under
seven different major voice categories that are for the most part acknowledged across all of
the major voice classification systems. Women are typically divided into three
groups: soprano,mezzo-soprano, and contralto. Men are usually divided into four
groups: countertenor, tenor, baritone, and bass. When considering the pre-pubescent male
voice an eighth term, treble, can be applied. Within each of these major categories there
are several sub-categories that identify specific vocal qualities like coloratura facility
and vocal weight to differentiate between voices.
Female voices
The range specifications given below are based on the American scientific pitch notation.
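Scientific pitch notation maps directly to frequencies under equal temperament with A4 = 440 Hz. Purely as an illustrative sketch (the helper function, its name, and its note-spelling conventions are invented here, not part of any cited standard library), the range endpoints quoted in the sections below can be converted to Hz like this:

```python
# Illustrative sketch: convert scientific pitch notation (e.g. "C4", "Bb5")
# to frequency in Hz, assuming equal temperament with A4 = 440 Hz.

SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def pitch_to_hz(name: str) -> float:
    """'C4' (middle C) -> ~261.6 Hz; accepts '#' for sharp and 'b' for flat."""
    letter, rest = name[0].upper(), name[1:]
    semis = SEMITONES[letter]
    while rest and rest[0] in "#b":
        semis += 1 if rest[0] == "#" else -1
        rest = rest[1:]
    octave = int(rest)
    # MIDI-style numbering: C4 is note 60, A4 is note 69
    midi = (octave + 1) * 12 + semis
    return 440.0 * 2 ** ((midi - 69) / 12)

# Endpoints of the typical soprano range given below
print(round(pitch_to_hz("C4"), 1))  # middle C, 261.6 Hz
print(round(pitch_to_hz("C6"), 1))  # "high C", 1046.5 Hz
```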
Soprano
Soprano range: The soprano is the highest female voice. The typical soprano voice lies
between middle C (C4) and "high C" (C6). The low extreme for sopranos is roughly B3 or A3
(just below middle C). Most soprano roles do not extend above "high C", although there are
several standard soprano roles that call for D6 or D-flat6. At the highest extreme,
some coloratura soprano roles may reach from F6 to A6 (the F to A above "high C").
Soprano tessitura: The tessitura of the soprano voice lies higher than all the other female
voices. In particular, the coloratura soprano has the highest tessitura of all the soprano sub-
types.
Soprano sub-types: As with all voice categories, sopranos are often divided into different
sub-categories based on range, vocal color or timbre, the weight of voice, and dexterity of
the voice. These sub-categories include: Coloratura soprano, Soubrette, Lyric
soprano, Spinto, and Dramatic soprano.
Intermediate voice types
Two types of soprano especially dear to the French are the Dugazon and the Falcon, which
are intermediate voice types between the soprano and the mezzo-soprano: a Dugazon is a
darker-colored soubrette, a Falcon a darker-colored soprano drammatico.
Mezzo-soprano
The mezzo-soprano is the middle-range voice type for females and is the most common
female voice.
Mezzo-soprano range: The mezzo-soprano voice lies between the soprano voice and the
contralto voice, overlapping both of them. The typical range of this voice is from A3
(the A below middle C) to A5 (the A two octaves above A3). In the lower and upper
extremes, some mezzo-sopranos may extend down to the G below middle C (G3) and as
high as "high C" (C6).
Mezzo-soprano tessitura: Although this voice overlaps both
the contralto and soprano voices, the tessitura of the mezzo-soprano is lower than that of
the soprano and higher than that of the contralto.
Mezzo-soprano sub-types: Mezzo-sopranos are often broken down into three
categories: Lyric mezzo-soprano, Coloratura mezzo-soprano and Dramatic mezzo-
soprano.
Alto
Contralto and alto are not the same term. Technically, "alto" is not a voice type but a
designated vocal line in choral music based on vocal range. The range of the alto part in
choral music is usually more similar to that of a mezzo-soprano than a contralto. However,
in many compositions the alto line is split into two parts. The lower part, Alto 2, is usually
more suitable to a contralto voice than a mezzo-soprano voice.
Contralto
Contralto range: The contralto voice is the lowest female voice. A true operatic contralto is
extremely rare, so much so that often roles intended for contraltos are performed by
mezzo-sopranos, as this voice type is difficult to find. The typical contralto range lies
between the F below middle C (F3) and the second F above middle C (F5). In the lower and
upper extremes, some contralto voices can sing from the E below middle C (E3) to the
second B-flat above (B-flat5), which is only one whole step short of the "Soprano C".
Contralto tessitura: The contralto voice has the lowest tessitura of the female voices. In
current operatic practice, female singers with very low vocal tessituras are often included
among mezzo-sopranos.
Contralto sub-types: Contraltos are often broken down into two categories: Lyric
contralto and Dramatic contralto.
Male voices
Countertenor
The term countertenor refers to the highest male voice. Many countertenor singers perform
roles originally written for castrati in baroque operas. Except for a few very rare voices
(such as the American male soprano Michael Maniaci, or singers with a disorder such
as Kallmann syndrome), singers called countertenors generally sing in the falsetto register,
sometimes using their modal register for the lowest notes. Historically, there is much
evidence that "countertenor", in England at least, also designated a very high tenor voice,
the equivalent of the French haute-contre, and something similar to the "leggiero tenor"
or tenor altino. It should be remembered that, until about 1830, all male voices used some
falsetto-type voice production in their upper range.
Countertenor ranges (approximate):
Countertenor: from about G3 to E5 or F5
Sopranist: extends the upper range, usually only to C6, though some sing as high as E6 or F6
Haute-contre: from about D3 or E3 to about D5
Countertenor sub-types: There are several sub-types of countertenors
including Sopranist or male soprano, Haute-contre, and modern castrato.
Tenor
Tenor range: The tenor is the highest male voice within the modal register. The typical
tenor voice lies between the C one octave below middle C (C3) and the C one octave above
middle C (C5). The low extreme for tenors is roughly B-flat2 (the second B-flat below
middle C). At the highest extreme, some tenors can sing up to the second F above middle
C (F5).
Tenor tessitura: The tessitura of the tenor voice lies above the baritone voice and
below the countertenor voice. The Leggiero tenor has the highest tessitura of all the tenor
sub-types.
Tenor sub-types: Tenors are often divided into different sub-categories based on range,
vocal color or timbre, the weight of the voice, and dexterity of the voice. These sub-
categories include: Leggiero tenor, Lyric tenor, Spinto tenor, Dramatic tenor,
and Heldentenor.
Baritone
The baritone is the most common type of male voice.
Baritone range: The vocal range of the baritone lies between the bass and tenor ranges,
overlapping both of them. The typical baritone range is from the second F below middle C
(F2) to the F above middle C (F4), which is exactly two octaves. In the lower and upper
extremes, a baritone's range can be extended at either end.
Baritone tessitura: Although this voice overlaps both the tenor and bass voices,
the tessitura of the baritone is lower than that of the tenor and higher than that of the bass.
Baritone sub-types: Baritones are often divided into different sub-categories based on
range, vocal color or timbre, the weight of the voice, and dexterity of the voice. These sub-
categories include: Lyric baritone, Bel canto (coloratura) baritone, Kavalierbariton, Dramatic
baritone, Verdi baritone, Baryton-noble, and Bariton/Baryton-Martin.
Bass
Bass range: The bass is the lowest male voice. The typical bass range lies between the
second E below "middle C" (E2) to the E above middle C (E4). In the lower and upper
extremes of the bass voice, some basses can sing from the C two octaves below middle C
(C2) to the G above middle C (G4).
Bass tessitura: The bass voice has the lowest tessitura of all the voices.
Bass sub-types: Basses are often divided into different sub-categories based on range, vocal
color or timbre, the weight of the voice, and dexterity of the voice. These sub-categories
include: Basso Profondo, Basso Buffo / Bel Canto Bass, Basso Cantante, Dramatic
Bass, and Bass-baritone.
Children's voices
The voice from childhood to adulthood
The human voice is in a constant state of change and development just as the whole body is
in a state of constant change. A human voice will alter as a person gets older, moving from
immaturity to maturity, to a peak period of prime singing, and then ultimately into a declining
period. The vocal range and timbre of children's voices do not have the variety that
adults' voices have. Both boys and girls prior to puberty have an equivalent vocal range and
timbre. The reason for this is that both groups have a similar laryngeal size and height and
a similar vocal cord structure. With the onset of puberty, both male and female voices alter
as the vocal ligaments become more defined and the laryngeal cartilages harden.
The laryngeal structure of both voices changes, but more so in males: the male larynx grows
considerably longer than the female larynx. The size and development of adult lungs
also changes what the voice is physically capable of doing. From the onset of puberty to
approximately age 22, the human voice is in an in-between phase where it is not quite a
child's voice nor an adult one yet. This is not to suggest that the voice stops changing at
that age. Different singers will reach adult development earlier or later than others, and as
stated above there are continual changes throughout adulthood as well.
Treble
The term treble can refer to either a young female or young male singer with an
unchanged voice in the soprano range. Initially the term was associated with boy
sopranos, but as the inclusion of girls in children's choirs became acceptable in the
twentieth century, the term expanded to refer to all pre-pubescent voices. Grouping
children's voices into one category is also practical, as both boys and girls share a similar
range and timbre.
Treble range: Most trebles have an approximate range from the A below "middle C" (A3) to
the F one and a half octaves above "middle C" (F5). Some trebles, however, can extend
their voices higher in the modal register to "high C" (C6). This ability may be comparatively
rare, but the Anglican church repertory, which many trained trebles sing, frequently
demands G5 and even A5. Many trebles are also able to reach higher notes by use of
the whistle register but this practice is rarely called for in performance.
Classifying singers
Voice classification is important for vocal pedagogists and singers as a guiding tool for the
development of the voice. Misclassification can damage the vocal cords, shorten a singing
career and lead to the loss of both vocal beauty and free vocal production. Some of these
dangers are not immediate ones; the human voice is quite resilient, especially in early
adulthood, and the damage may not make its appearance for months or even years.
Unfortunately, this lack of apparent immediate harm can cause singers to develop bad
habits that will over time cause irreparable damage to the voice. Singing outside the natural
vocal range imposes a serious strain upon the voice. Clinical evidence indicates that singing
at a pitch level that is either too high or too low creates vocal pathology. Noted vocal
pedagogist Margaret Greene says,
"The need for choosing the correct natural range of the voice is of great importance in
singing since the outer ends of the singing range need very careful production and should
not be overworked, even in trained voices."
Singing at either extreme of the range may be damaging, but the possibility of damage
seems to be much more prevalent in too high a classification. A number of medical
authorities have indicated that singing at too high a pitch level may contribute to certain
vocal disorders. Medical evidence indicates that singing at too high of a pitch level may lead
to the development of vocal nodules. Increasing tension on the vocal cords is one of the
means of raising pitch. Singing above an individual's best tessitura keeps the vocal cords
under a great deal of unnecessary tension for long periods of time, and the possibility of
vocal abuse is greatly increased. Singing at too low a pitch level is not as likely to be
damaging unless a singer tries to force the voice down.
In general vocal pedagogists consider four main qualities of a human voice when attempting
to classify it: vocal range, tessitura, timbre, and vocal transition points. However, teachers
may also consider physical characteristics, speech level, scientific testing and other factors.
Dangers of quick identification
Many vocal pedagogists warn of the dangers of quick identification. Premature concern with
classification can result in misclassification, with all its attendant dangers. William
Vennard says:
"I never feel any urgency about classifying a beginning student. So many premature
diagnoses have been proved wrong, and it can be harmful to the student and embarrassing
to the teacher to keep striving for an ill-chosen goal. It is best to begin in the middle part of
the voice and work upward and downward until the voice classifies itself."
Most vocal pedagogists believe that it is essential to establish good vocal habits within a
limited and comfortable range before attempting to classify the voice. When techniques of
posture, breathing, phonation, resonation, and articulation have become established in this
comfortable area, the true quality of the voice will emerge and the upper and lower limits of
the range can be explored safely. Only then can a tentative classification be arrived at, and
it may be adjusted as the voice continues to develop. Many vocal pedagogists suggest that
teachers begin by assuming that a voice is of a medium classification until it proves
otherwise. The reason for this is that the majority of individuals possess medium voices and
therefore this approach is less likely to misclassify or damage the voice.
Choral music classification
Unlike other classification systems, choral music divides voices solely on the basis of vocal
range. Choral music most commonly divides vocal parts into high and low voices within each
sex (SATB). As a result, the typical choral situation affords many opportunities for
misclassification to occur. Since most people have medium voices, they must be assigned to
a part that is either too high or too low for them; the mezzo-soprano must sing soprano or
alto and the baritone must sing tenor or bass. Either option can present problems for the
singer, but for most singers there are fewer dangers in singing too low than in singing too
high.
Speech synthesis - Synthesizer technologies
There are two main technologies used for generating synthetic speech
waveforms: concatenative synthesis and formant synthesis.
Concatenative synthesis
Concatenative synthesis is based on the concatenation (or stringing together) of segments
of recorded speech. Generally, concatenative synthesis gives the most natural-sounding
synthesized speech. However, natural variation in speech and automated techniques for
segmenting the waveforms sometimes result in audible glitches in the output, detracting
from the naturalness.
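Real concatenative synthesizers select diphone or unit segments from large recorded databases. Purely as a toy sketch (the `crossfade_concat` helper and the sample data are invented for illustration, not a real synthesizer's API), joining segments with a short crossfade shows one simple way the audible glitches at segment boundaries can be reduced:

```python
# Minimal sketch of the "stringing together" step of concatenative synthesis.
# Segments stand in for short recorded speech units, represented as lists of
# audio samples; a linear crossfade blends each join instead of butting the
# waveforms together, which would otherwise cause an audible click.

def crossfade_concat(segments, fade=4):
    """Concatenate sample lists, blending `fade` samples at each boundary."""
    out = list(segments[0])
    for seg in segments[1:]:
        n = min(fade, len(out), len(seg))
        tail, head = out[-n:], seg[:n]
        # Ramp the outgoing segment down while ramping the incoming one up
        blend = [tail[i] * (1 - (i + 1) / n) + head[i] * ((i + 1) / n)
                 for i in range(n)]
        out = out[:-n] + blend + list(seg[n:])
    return out

# Two toy "recorded segments"; a hard join would jump abruptly from 1.0 to -1.0
a = [1.0] * 8
b = [-1.0] * 8
y = crossfade_concat([a, b], fade=4)
print(len(y))  # 12: the 4-sample overlap is blended, not duplicated
```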
Use of the Web by people with disabilities
This document provides an introduction to use of the Web by people with disabilities. It
illustrates some of their requirements when using Web sites and Web-based applications,
and provides supporting information for the guidelines and technical work of the World Wide
Web Consortium's (W3C) Web Accessibility Initiative (WAI).
Table of Contents
1. Introduction
2. Scenarios of People with Disabilities Using the Web
3. Different Disabilities That Can Affect Web Accessibility
4. Assistive Technologies and Adaptive Strategies
5. Further Reading
6. Scenario References
7. General References
8. Acknowledgements
1. Introduction
This document provides a general introduction to how people with different kinds of
disabilities use the Web. It provides background to help understand how people with
disabilities benefit from provisions described in the Web Content Accessibility Guidelines
1.0, Authoring Tool Accessibility Guidelines 1.0, and User Agent Accessibility Guidelines 1.0.
It is not a comprehensive or in-depth discussion of disabilities, nor of the assistive
technologies used by people with disabilities. Specifically, this document describes:
scenarios of people with disabilities using accessibility features of Web sites and
Web-based applications;
general requirements for Web access by people with physical, visual, hearing, and
cognitive or neurological disabilities;
some types of assistive technologies and adaptive strategies used by some people
with disabilities when accessing the Web.
This document contains many internal hypertext links between the sections on scenarios,
disability requirements, assistive technologies, and scenario references. The scenario
references and general references sections also include links to external documents.
2. Scenarios of People with Disabilities Using the Web
The following scenarios show people with different kinds of disabilities using assistive
technologies and adaptive strategies to access the Web. In some cases the scenarios show
how the Web can make some tasks easier for people with disabilities.
Please note that the scenarios do not represent actual individuals, but rather individuals
engaging in activities that are possible using today's Web technologies and assistive
technologies. The reader should not assume that everyone with a similar disability to those
portrayed will use the same assistive technologies or have the same level of expertise in
using those technologies. In some cases, browsers, media players, or assistive technologies
with specific features supporting accessibility may not yet be available in an individual's
primary language. Disability terminology varies from one country to another, as do
educational and employment opportunities.
Each scenario contains links to additional information on the specific disability or disabilities
described in more detail in Section 3; to the assistive technology or adaptive strategy
described in Section 4; and to detailed curriculum examples or guideline checkpoints in the
Scenarios References in Section 6.
Online shopper with color blindness
Mr. Lee has difficulty reading the text on many Web sites. When he first started using the Web,
it seemed to him the text and images on a lot of sites used poor color contrast, since they
appeared to use similar shades of brown. He realized that many sites were using colors that
were indistinguishable to him because of his red/green color blindness. In some cases the
site instructions explained that discounted prices were indicated by red text, but all of the
text looked brown to him. In other cases, the required fields on forms were indicated by red
text, but again he could not tell which fields had red text.
Mr. Lee found that he preferred sites that used sufficient color contrast and redundant
information for color. The sites did this by including names of the colors of clothing as well
as showing a sample of the color, and by placing an asterisk (*) in front of the required fields
in addition to indicating them by color.
After additional experimentation, Mr. Lee discovered that on most newer sites the colors
were controlled by style sheets and that he could turn these style sheets off with his
browser or override them with his own style sheets. But on sites that did not use style
sheets he couldn't override the colors.
Eventually Mr. Lee bookmarked a series of online shopping sites where he could get reliable
information on product colors, and not have to guess at which items were discounted.
Reporter with repetitive stress injury
Mr. Jones is a reporter who must submit his articles in HTML for publishing in an on-line
journal. Over his twenty-year career, he has developed repetitive stress injury (RSI) in his
hands and arms, and it has become painful for him to type. He uses a combination
of speech recognition and an alternative keyboard to prepare his articles, but he doesn't use
a mouse. It took him several months to become sufficiently accustomed to using speech
recognition to be comfortable working for many hours at a time. There are some things he
has not worked out yet, such as a sound card conflict that arises whenever he tries to use
speech recognition on Web sites that have streaming audio.
He has not been able to use the same Web authoring software as his colleagues, because
the application that his office chose for a standard is missing many of the keyboard
equivalents that he needs in place of mouse-driven commands. To activate commands that
do not have keyboard equivalents, he would have to use a mouse instead of speech
recognition or typing, and this would further damage his hands. He researched some
of the newer versions of authoring tools and selected one with full keyboard support. Within
a month, he discovered that several of his colleagues had switched to the new product as
well, after they found that the full keyboard support was easier on their own hands.
When browsing other Web sites to research some of his articles, Mr. Jones likes the access
key feature that is implemented on some Web pages. It enables him to shortcut a long list
of links that he would ordinarily have to tab through by voice, and instead go straight to the
link he wants.
Online student who is deaf
Ms. Martinez is taking several distance learning courses in physics. She is deaf. She had
little trouble with the curriculum until the university upgraded their on-line courseware to a
multimedia approach, using an extensive collection of audio lectures. For classroom-based
lectures the university provided interpreters; however for Web-based instruction they
initially did not realize that accessibility was an issue, then said they had no idea how to
provide the material in accessible format. She was able to point out that the University was
clearly covered by a policy requiring accessibility of online instructional material, and then to
point to the Web Content Accessibility Guidelines 1.0 as a resource providing guidance on
how to make Web sites, including those with multimedia, accessible.
The University had the lectures transcribed and made this information available through
their Web site along with audio versions of the lectures. For an introductory multimedia
piece, the university used a SMIL-based multimedia format enabling synchronized
captioning of audio and description of video. The school's information managers quickly
found that it was much easier to comprehensively index the audio resources on the
accessible area of the Web site, once these resources were captioned with text.
The professor for the course also set up a chat area on the Web site where students could
exchange ideas about their coursework. Although she was the only deaf student in the class
and only one other student knew any sign language, she quickly found that the Web-based
chat format, and the opportunity to provide Web-based text comments on classmates' work,
ensured that she could keep up with class progress.
Accountant who is blind
Ms. Laitinen is an accountant at an insurance company that uses Web-based formats over a
corporate intranet. She is blind. She uses a screen reader to interpret what is displayed on
the screen and generate a combination of speech output and refreshable braille output. She
uses the speech output, combined with tabbing through the navigation links on a page, for
rapid scanning of a document, and has become accustomed to listening to speech output at
a speed that her co-workers cannot understand at all. She uses refreshable braille output to
check the exact wording of text, since braille enables her to read the language on a page
more precisely.
Much of the information on the Web documents used at her company is in tables, which can
sometimes be difficult for non-visual users to read. However, since the tables on this
company's documents are marked up clearly with column and row headers which her screen
reader can access, she easily orients herself to the information in the tables. Her screen
reader reads her the alternative text for any images on the site. Since the insurance codes
she must frequently reference include a number of abbreviations and acronyms, she finds
the expansions of abbreviations and acronyms the first time they appear on a page allows
her to better catch the meaning of the short versions of these terms.
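The row and column headers that make these tables navigable for a screen reader are ordinary header-cell markup. As a minimal sketch (the `TableHeaderCheck` class and the sample table are invented for illustration; a real audit tool would also inspect `scope` and `headers` attributes), one can check for such markup with the standard library's HTML parser:

```python
# Sketch: verify that an HTML table declares header cells (<th>), which is
# what lets a screen reader announce column/row context for each data cell.

from html.parser import HTMLParser

class TableHeaderCheck(HTMLParser):
    """Counts <th> start tags seen while parsing a fragment of HTML."""
    def __init__(self):
        super().__init__()
        self.th_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self.th_count += 1

# Hypothetical insurance-code table, marked up with clear column headers
accessible_table = """
<table>
  <tr><th>Code</th><th>Meaning</th></tr>
  <tr><td>A12</td><td>Collision coverage</td></tr>
</table>
"""

checker = TableHeaderCheck()
checker.feed(accessible_table)
print(checker.th_count)  # 2 header cells found
```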
As one of the more senior members of the accounting staff, Ms. Laitinen must frequently
help newer employees with their questions. She has recently upgraded to a browser
that allows better synchronization of the screen display with audio and braille rendering of
that information. This enables her to better help her colleagues, since the screen shows her
colleagues the same part of the document that she is reading with speech or braille output.
Classroom student with dyslexia
Her school recently started to use more online curricula to supplement class textbooks. She
was initially worried about reading load, since she reads slowly. But recently she tried text
to speech software, and found that she was able to read along visually with the text much
more easily when she could hear certain sections of it read to her with the speech synthesis,
instead of struggling over every word.
Her class's recent area of focus is Hans Christian Andersen's writings, and she has to do
some research about the author. When she goes onto the Web, she finds that some sites
are much easier for her to use than others. Some of the pages have a lot of graphics, and
those help her focus in quickly on sections she wants to read. In some cases, though, where
the graphics are animated, it is very hard for her to focus, and so it helps to be able
to freeze the animated graphics.
One of the most important things for her has been the level of accessibility of the Web-
based online library catalogues and the general search functions on the Web. Sometimes
the search options are confusing for her. Her teacher has taught a number of different
search strategies, and she finds that some sites provide options for a variety of searching
strategies and she can more easily select searching options that work well for her.
Retiree with several aging-related conditions
Mr. Yunus uses the Web to manage some of his household services and finances. He has
some central-field vision loss, hand tremor, and a little short-term memory loss.
He uses a screen magnifier to help with his vision and his hand tremor; when the icons and
links on Web pages are bigger, it's easier for him to select them, and so he finds it easier to
use pages with style sheets. When he first started using some of the financial pages, he
found the scrolling stocktickers distracting, and they moved too fast for him to read. In
addition, sometimes the pages would update before he had finished reading them.
Therefore he tends to use Web sites that do not have a lot of movement in the text, and
that do not auto-refresh. He also tended to "get stuck" on sites where new browser windows
would pop open without notifying him, finding that he could not back up to the page he had
come from.
Mr. Yunus has gradually found some sites that work well for him, and developed a
customized profile at some banking, grocery, and clothing sites.
Supermarket assistant with Down syndrome
Mr. Sands has put groceries in bags for customers for the past year at a supermarket. He
has Down syndrome, and has difficulty with abstract concepts, reading, and doing
mathematical calculations. He usually buys his own groceries at this supermarket, but
sometimes finds that there are so many product choices that he becomes confused, and he
finds it difficult to keep track of how much he is spending. He has difficulty re-learning
where his favorite products are each time the supermarket changes the layout of its
products.
Recently, he visited an online grocery service from his computer at home. He explored the
site the first few times with a friend. He found that he could use the Web site without much
difficulty -- it had a lot of pictures, which were helpful in navigating around the site, and in
recognizing his favorite brands.
His friend showed him different search options that were available on the site, making it
easier for him to find items. He can search by brand name or by pictures, but he mostly
uses the option that lets him select from a list of products that he has ordered in the past.
Once he decides what he wants to buy, he selects the item and puts it into his virtual
shopping basket. The Web site gives him an updated total each time he adds an item,
helping him make sure that he does not overspend his budget.
The marketing department of the online grocery wanted their Web site to have a high
degree of usability in order to be competitive with other online stores. They used consistent
design and consistent navigation options so that their customers could learn and remember
their way around the Web site. They also used the clearest and simplest language
appropriate for the site's content so that their customers could quickly understand the
material.
While these features made the site more usable for all of the online grocery's customers,
they are what made it possible for Mr. Sands to use the site at all. Mr. Sands now shops on
the online
grocery site a few times a month, and just buys a few fresh items each day at the
supermarket where he works.
Ms. Kaseem uses the Web to find new restaurants to go to with friends and classmates. She
has low vision and is deaf. She uses a screen magnifier to enlarge the text on Web sites to a
font size that she can read. When screen magnification is not sufficient, she also uses
a screen reader to drive a refreshable braille display, which she reads slowly.
At home, Ms. Kaseem browses local Web sites for new and different restaurants. She uses a
personal style sheet with her browser, which makes all Web pages display according to her
preferences. Her preferences include having background patterns turned off so that there is
enough contrast for her when she uses screen magnification. This is especially helpful when
she reads on-line sample menus of appealing restaurants.
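A personal style sheet like the one Ms. Kaseem uses can be sketched in CSS. The rules below are only an illustration (the colors are invented), but the "!important" flag is what lets rules in a user style sheet override the author's styles under the CSS2 cascade:

```css
/* Illustrative user style sheet. Saved locally and selected in the
   browser's preferences as the user's own style sheet. */
* {
  background-image: none !important;   /* turn off background patterns */
  background-color: #ffffff !important;
  color: #000000 !important;           /* maximize text contrast */
}
```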
A multimedia virtual tour of local entertainment options was recently added to the Web site
of the city in which Ms. Kaseem lives. The tour is captioned and described -- including text
subtitles for the audio, and descriptions of the video -- which allows her to access it using a
combination of screen magnification and braille. The interface used for the virtual tour
is accessible no matter what kind of assistive technology she is using -- screen
magnification, her screen reader with refreshable braille, or her portable braille device. Ms.
Kaseem forwards the Web site address to friends and asks if they are interested in going
with her to some of the restaurants featured on the tour.
She also checks the public transportation sites to find local train or bus stops near the
restaurants. The Web site for the bus schedule has frames without meaningful titles, and
tables without clear column or row headers, so she often gets lost on the site when trying to
find the information she needs. The Web site for the local train schedule, however, is easy
to use because the frames on that Web site have meaningful titles, and the schedules are
laid out as long tables with clear row and column headers that she uses to orient herself
even when she has magnified the screen display.
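The difference between the two schedule sites can be sketched in HTML 4.01 markup. The frame sources and schedule data below are invented for illustration; the title, summary, and scope attributes are the standard mechanisms involved:

```html
<!-- Frames with meaningful titles, so assistive technology can
     announce what each frame contains -->
<frameset cols="30%, 70%">
  <frame src="routes.html" title="List of train routes">
  <frame src="times.html"  title="Timetable for the selected route">
</frameset>

<!-- A schedule table with explicit row and column headers, which a
     screen reader or magnifier user can use for orientation -->
<table summary="Weekday train departures by station">
  <tr>
    <th scope="col">Station</th>
    <th scope="col">First train</th>
    <th scope="col">Last train</th>
  </tr>
  <tr>
    <th scope="row">Central</th>
    <td>5:40</td>
    <td>23:10</td>
  </tr>
</table>
```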
Occasionally she also uses her portable braille device, with an infrared connection, to get
additional information and directions at a publicly-available information kiosk in a shopping
mall downtown; and a few times she has downloaded sample menus into her braille device
so that she has them in an accessible format once she is in the restaurant.
This section describes general kinds of disabilities that can affect access to the Web. There
are as yet no universally accepted categorizations of disability, despite efforts towards that
goal. Commonly used disability terminology varies from country to country and between
different disability communities in the same country. There is a trend in many disability
communities to use functional terminology instead of medical classifications. This document
does not attempt to comprehensively address issues of terminology.
Abilities can vary from person to person, and over time, for different people with the same
type of disability. People can have combinations of different disabilities, and combinations of
varying levels of severity.
The term "disability" is used very generally in this document. Some people with conditions
described below would not consider themselves to have disabilities. They may, however,
have limitations of sensory, physical or cognitive functioning which can affect access to the
Web. These may include injury-related and aging-related conditions, and can be temporary
or chronic.
The number and severity of limitations tend to increase as people age, and may include
changes in vision, hearing, memory, or motor function. Aging-related conditions can be
accommodated on the Web by the same accessibility solutions used to accommodate people
with disabilities.
Sometimes different disabilities require similar accommodations. For instance, someone who
is blind and someone who cannot use his or her hands both require full keyboard
equivalents for mouse commands in browsers and authoring tools, since they both have
difficulty using a mouse but can use assistive technologies to activate commands supported
by a standard keyboard interface.
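In a Web page, the difference often comes down to whether a control is reachable without a mouse. A minimal hypothetical example (the image name and the submitForm() handler are invented):

```html
<!-- Mouse-only: an image with a click handler cannot be reached
     with the Tab key or activated from the keyboard -->
<img src="submit.gif" onclick="submitForm()" alt="Submit">

<!-- Keyboard-accessible: a real form control is focusable with the
     Tab key and can be activated with the Enter key or space bar -->
<input type="submit" value="Submit">
```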
Many accessibility solutions described in this document contribute to "universal design" (also
called "design for all") by benefiting non-disabled users as well as people with disabilities.
For example, support for speech output not only benefits blind users, but also Web users
whose eyes are busy with other tasks; while captions for audio not only benefit deaf users,
but also increase the efficiency of indexing and searching for audio content on Web sites.
Each description of a general type of disability includes several brief examples of the kinds
of barriers someone with that disability might encounter on the Web. These lists of barriers
are illustrative and not intended to be comprehensive. Barrier examples listed here are
representative of accessibility issues that are relatively easy to address with existing
accessibility solutions, except where otherwise noted.
Following is a list of some disabilities and their relation to accessibility issues on the Web.
visual disabilities
o blindness
o low vision
o color blindness
hearing impairments
o deafness
o hard of hearing
physical disabilities
o motor disabilities
speech disabilities
o speech disabilities
cognitive and neurological disabilities
o dyslexia and dyscalculia
o attention deficit disorder
o intellectual disabilities
o memory impairments
o mental health disabilities
o seizure disorders
multiple disabilities
aging-related conditions
Visual disabilities
Blindness (scenario -- "accountant")
To access the Web, many individuals who are blind rely on screen readers -- software that
reads text on the screen (monitor) and outputs this information to a speech
synthesizer and/or refreshable braille display. Some people who are blind use text-based
browsers such as Lynx, or voice browsers, instead of a graphical user interface browser plus
screen reader. They may use rapid navigation strategies such as tabbing through the
headings or links on Web pages rather than reading every word on the page in sequence.
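Rapid navigation of this kind works only when headings are marked up structurally rather than merely styled to look large. A minimal illustration (the heading text is invented):

```html
<!-- A visual-only "heading": bold, large text that a screen
     reader's heading-navigation commands cannot find -->
<p><b><font size="5">Quarterly results</font></b></p>

<!-- A structural heading: announced as a heading, and reachable
     by jumping from heading to heading -->
<h2>Quarterly results</h2>
```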
Examples of barriers that people with blindness may encounter on the Web can include:
Low vision
There are many types of low vision (also known as "partially sighted" in parts of Europe),
for instance poor acuity (vision that is not sharp), tunnel vision (seeing only the middle of
the visual field), central field loss (seeing only the edges of the visual field), and clouded
vision.
To use the Web, some people with low vision use extra-large monitors, and increase the
size of system fonts and images. Others use screen magnifiers or screen enhancement
software. Some individuals use specific combinations of text and background colors, such as
a 24-point bright yellow font on a black background, or choose certain typefaces that are
especially legible for their particular vision requirements.
Barriers that people with low vision may encounter on the Web can include:
Web pages with absolute font sizes that do not change (enlarge or reduce) easily
Web pages that, because of inconsistent layout, are difficult to navigate when
enlarged, due to loss of surrounding context
Web pages, or images on Web pages, that have poor contrast, and whose contrast
cannot be easily changed through user override of author style sheets
text presented as images, which prevents wrapping to the next line when enlarged
also many of the barriers listed for blindness, above, depending on the type and
extent of visual limitation
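The first barrier above, absolute font sizes, can be illustrated in CSS (the class names are invented):

```css
/* Absolute units: many browsers will not let the user resize this */
p.fixed    { font-size: 10px; }

/* Relative units: the text scales with the user's chosen default size */
p.flexible { font-size: 100%; }  /* or an em-based size such as 1em */
```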
Color blindness
Color blindness is a lack of sensitivity to certain colors. Common forms of color blindness
include difficulty distinguishing between red and green, or between yellow and blue.
Sometimes color blindness results in the inability to perceive any color.
To use the Web, some people with color blindness use their own style sheets to override the
font and background color choices of the author.
Barriers that people with color blindness may encounter on the Web can include:
Hearing Impairments
Deafness
To use the Web, many people who are deaf rely on captions for audio content. They may
need to turn on the captions on an audio file as they browse a page; concentrate harder to
read what is on a page; or rely on supplemental images to highlight context.
Barriers that people who are deaf may encounter on the Web can include:
Hard of hearing
A person with a mild to moderate hearing impairment may be considered hard of hearing.
To use the Web, people who are hard of hearing may rely on captions for audio content
and/or amplification of audio. They may need to toggle the captions on an audio file on or
off, or adjust the volume of an audio file.
Physical disabilities
Motor disabilities
Motor disabilities can include weakness, limitations of muscular control (such as involuntary
movements, lack of coordination, or paralysis), limitations of sensation, joint problems, or
missing limbs. Some physical disabilities can include pain that impedes movement. These
conditions can affect the hands and arms as well as other parts of the body.
To use the Web, people with motor disabilities affecting the hands or arms may use a
specialized mouse; a keyboard with a layout of keys that matches their range of hand
motion; a pointing device such as a head-mouse, head-pointer or mouth-stick; voice-
recognition software; an eye-gaze system; or other assistive technologies to access and
interact with the information on Web sites. They may activate commands by typing single
keystrokes in sequence with a head pointer rather than typing simultaneous keystrokes
("chording") to activate commands. They may need more time when filling out interactive
forms on Web sites if they have to concentrate or maneuver carefully to select each
keystroke.
Barriers that people with motor disabilities affecting the hands or arms may encounter
include:
Speech disabilities
Speech disabilities
Speech disabilities can include difficulty producing speech that is recognizable by some voice
recognition software, whether in terms of loudness or clarity.
To use parts of the Web that rely on voice recognition, someone with a speech disability
needs to be able to use an alternate input mode such as text entered via a keyboard.
Barriers that people with speech disabilities encounter on the Web can include:
Web sites that require voice-based interaction and have no alternative input mode
Cognitive and neurological disabilities
Dyslexia and dyscalculia
Individuals with visual and auditory perceptual disabilities, including dyslexia (sometimes
called "learning disabilities" in Australia, Canada, the U.S., and some other countries) and
dyscalculia may have difficulty processing language or numbers. They may have difficulty
processing spoken language when heard ("auditory perceptual disabilities"). They may also
have difficulty with spatial orientation.
To use the Web, people with visual and auditory perceptual disabilities may rely on getting
information through several modalities at the same time. For instance, someone who has
difficulty reading may use a screen reader plus synthesized speech to facilitate
comprehension, while someone with an auditory processing disability may use captions to
help understand an audio track.
Barriers that people with visual and auditory perceptual disabilities may encounter on the
Web can include:
lack of alternative modalities for information on Web sites, for instance lack of
alternative text that can be converted to audio to supplement visuals, or the lack of
captions for audio
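Providing an alternative modality can be as simple as the alt attribute on an image; the file name and text below are invented for illustration:

```html
<!-- The alternative text can be spoken by a screen reader or rendered
     as braille, supplementing the visual chart -->
<img src="sales-chart.gif"
     alt="Bar chart showing sales rising 20 percent from 1999 to 2000">
```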
Attention deficit disorder
Individuals with attention deficit disorder may have difficulty focusing on information.
To use the Web, an individual with an attention deficit disorder may need to turn off
animations on a site in order to be able to focus on the site's content.
Barriers that people with attention deficit disorder may encounter on the Web can include:
Intellectual disabilities
To use the Web, people with intellectual disabilities may take more time on a Web site, may
rely more on graphics to enhance understanding of a site, and may benefit from the level of
language on a site not being unnecessarily complex for the site's intended purpose.
Memory impairments
Individuals with memory impairments may have problems with short-term memory, may be
missing long-term memory, or may have some loss of ability to recall language.
To use the Web, people with memory impairments may rely on a consistent navigational
structure throughout the site.
Mental health disabilities
Individuals with mental health disabilities may have difficulty focusing on information on a
Web site, or difficulty with blurred vision or hand tremors due to side effects from
medications.
To use the Web, people with mental health disabilities may need to turn off distracting
visual or audio elements, or to use screen magnifiers.
Seizure disorders
Some individuals with seizure disorders, including people with some types of epilepsy (such
as photo-sensitive epilepsy), can have seizures triggered by visual flickering or by audio
signals at certain frequencies.
To use the Web, people with seizure disorders may need to turn off animations, blinking
text, or certain frequencies of audio. Avoidance of these visual or audio frequencies in Web
sites helps prevent triggering of seizures.
Barriers can include:
use of visual or audio frequencies that can trigger seizures
Multiple disabilities
For instance, while someone who is blind can benefit from hearing an audio description of a
Web-based video, and someone who is deaf can benefit from seeing the captions
accompanying audio, someone who is both deaf and blind needs access to a text transcript
of the description of the audio and video, which they could access on a refreshable braille
display.
Similarly, someone who is deaf and has low vision might benefit from the captions on audio
files, but only if the captions could be enlarged and the color contrast adjusted.
Someone who cannot move his or her hands, and also cannot see the screen well, might
use a combination of speech input and speech output, and might therefore need to rely on
precise indicators of location and navigation options in a document.
Aging-related conditions
Changes in people's functional ability due to aging can include changes in vision, hearing,
dexterity, and memory, singly or in combination. Barriers can
include any of the issues already mentioned above. Any one of these limitations can affect
an individual's ability to access Web content. Together, these changes can become more
complex to accommodate.
For example, someone with low vision may need screen magnification; however, screen
magnification removes surrounding contextual information, which compounds the difficulty
that a user with short-term memory loss might experience on a Web site.
Assistive technologies are products used by people with disabilities to help accomplish tasks
that they cannot accomplish otherwise or could not do easily otherwise. When used with
computers, assistive technologies are sometimes referred to as adaptive software or
hardware.
Some assistive technologies are used together with graphical desktop browsers, text
browsers, voice browsers, multimedia players, or plug-ins. Some accessibility solutions are
built into the operating system, for instance the ability to change the system font size, or
configure the operating system so that multiple-keystroke commands can be entered with a
sequence of single keystrokes.
Adaptive strategies are techniques that people with disabilities use to assist in using
computers or other devices. For example, someone who cannot see a Web page may tab
through the links on a page as one strategy for skimming the content.
Following is a list of the assistive technologies and adaptive strategies described below. This
is by no means a comprehensive list of all such technologies or strategies, but rather
explanations of examples highlighted in the scenarios above.
Alternate keyboards and switches
Alternate keyboards or switches are hardware or software devices used by people with
physical disabilities that provide an alternate way of creating keystrokes that appear to
come from the standard keyboard. Examples include keyboards with extra-small or extra-
large key spacing, keyguards that allow pressing only one key at a time, on-screen
keyboards, eyegaze keyboards, and sip-and-puff switches. Web-based applications that can
be operated entirely from the keyboard, with no mouse required, support a wide range of
alternative modes of input.
Braille and refreshable braille
Braille is a system using six to eight raised dots in various patterns to represent letters and
numbers that can be read by the fingertips. Braille systems vary greatly around the world.
Some "grades" of braille include additional codes beyond standard alpha-numeric characters
to represent common letter groupings (e.g., "th," "ble" in Grade II American English braille)
in order to make braille more compact. An 8-dot version of braille has been developed to
allow all ASCII characters to be represented. Refreshable or dynamic braille involves the use
of a mechanical display where dots (pins) can be raised and lowered dynamically to allow
any braille characters to be displayed. Refreshable braille displays can be incorporated into
portable braille devices with the capabilities of small computers, which can also be used as
interfaces to devices such as information kiosks.
Scanning software
Scanning software is adaptive software used by individuals with some physical or cognitive
disabilities that highlights or announces selection choices (e.g., menu items, links, phrases)
one at a time. A user selects a desired item by hitting a switch when the desired item is
highlighted or announced.
Screen magnification
Screen magnification is software used primarily by individuals with low vision that magnifies
a portion of the screen for easier viewing. While screen magnifiers make the presentation
larger, they also reduce the area of the document that can be viewed, removing surrounding
context. Some screen magnifiers offer two views of the screen: one
magnified and one default size for navigation.
Screen readers
Software used by individuals who are blind or who have dyslexia that interprets what is
displayed on a screen and directs it either to speech synthesis for audio output, or to
refreshable braille for tactile output. Some screen readers use the document tree (i.e., the
parsed document code) as their input. Older screen readers make use of the rendered
version of a document, so that document order or structure may be lost (e.g., when tables
are used for layout) and their output may be confusing.
Speech recognition
Speech (or voice) recognition is used by people with some physical disabilities or temporary
injuries to hands and forearms as an input method in some voice browsers. Applications
that have full keyboard support can be used with speech recognition.
Speech synthesis
Speech synthesis or speech output can be generated by screen readers or voice browsers,
and involves production of digitized speech from text. People who are used to using speech
output sometimes listen to it at very rapid speeds.
Some accessibility solutions are adaptive strategies rather than specific assistive
technologies such as software or hardware. For instance, for people who cannot use a
mouse, one strategy for rapidly scanning through links, headers, list items, or other
structural items on a Web page is to use the tab key to go through the items in sequence.
People using screen readers -- whether because they are blind or because they have
dyslexia -- may tab through items on a page, as may people using voice recognition.
Text browsers
Text browsers such as Lynx are an alternative to graphical user interface browsers. They
can be used with screen readers for people who are blind. They are also used by many
people who have low bandwidth connections and do not want to wait for images to
download.
Visual notification
Visual notification is an alternative feature of some operating systems that allows deaf or
hard of hearing users to receive a visual alert of a warning or error message that might
otherwise be issued by sound.