
MUSIC 318 MINI-COURSE ON SPEECH AND SINGING

4. RHYTHM, PROSODY, TONE, LANGUAGE

Science of Sound, Chapter 16; Springer Handbook of Acoustics, Chapter 16

RHYTHM
A STRIKING CHARACTERISTIC OF A FOREIGN LANGUAGE IS ITS RHYTHM.
ENGLISH, RUSSIAN, ARABIC AND THAI ARE STRESS-TIMED LANGUAGES. STRESSED SYLLABLES RECUR AT APPROXIMATELY EQUAL INTERVALS. SYLLABLES MOST OFTEN END WITH A CONSONANT.
FRENCH, SPANISH, GREEK, ITALIAN, YORUBA AND TELUGU ARE SYLLABLE-TIMED LANGUAGES. SYLLABLES RECUR AT APPROXIMATELY EQUAL INTERVALS. SYLLABLES OFTEN END WITH A VOWEL.
RHYTHMIC PATTERNS CAN BE USED TO SIGNAL DIFFERENCES IN SYNTACTIC STRUCTURE. COMPARE:
1. The 2000-year-old skeletons
2. The two 1000-year-old skeletons
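The phrase "approximately equal intervals" can be made concrete by measuring how much the intervals between successive onsets vary. The sketch below is only an illustration: the function name and the onset times are hypothetical, not measurements from any language, and the coefficient of variation is just one simple choice of variability measure.

```python
from statistics import mean, pstdev

def interval_variability(onsets_s):
    """Coefficient of variation of the intervals between successive onsets.

    A value near 0 means the events recur at nearly equal intervals.
    """
    intervals = [b - a for a, b in zip(onsets_s, onsets_s[1:])]
    return pstdev(intervals) / mean(intervals)

# Hypothetical onset times in seconds for one utterance (illustrative only).
stressed_onsets = [0.00, 0.52, 1.01, 1.55, 2.04]
syllable_onsets = [0.00, 0.21, 0.52, 0.66, 1.01, 1.30, 1.55, 1.71, 2.04]

print(round(interval_variability(stressed_onsets), 3))
print(round(interval_variability(syllable_onsets), 3))
```

In this made-up utterance the stressed syllables are more evenly spaced than the syllables as a whole, which is the pattern the slide describes for a stress-timed language.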

PROSODY
IN LINGUISTICS, PROSODY IS THE RHYTHM, STRESS, AND INTONATION OF SPEECH. PROSODY MAY REFLECT VARIOUS FEATURES OF THE SPEAKER OR THE UTTERANCE: THE EMOTIONAL STATE OF A SPEAKER; WHETHER THE UTTERANCE IS A STATEMENT, A QUESTION, OR A COMMAND; WHETHER THE SPEAKER IS BEING IRONIC OR SARCASTIC; EMPHASIS, CONTRAST AND FOCUS.
IN TERMS OF ACOUSTICS, THE PROSODICS OF ORAL LANGUAGES INVOLVE VARIATION IN SYLLABLE LENGTH, LOUDNESS, PITCH, AND THE FORMANT FREQUENCIES OF SPEECH SOUNDS.
PROSODY IS OF GREAT INTEREST IN AUTOMATIC SPEECH RECOGNITION.
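Two of the acoustic quantities named here, loudness and pitch, can be approximated from a short frame of samples. The following is a minimal sketch, not a production pitch tracker: RMS amplitude stands in for loudness, a plain autocorrelation peak stands in for pitch, and the function names, the 75-500 Hz search range, and the synthetic 200 Hz test tone are all assumptions made for illustration.

```python
import math

def rms(frame):
    """Root-mean-square amplitude, a rough physical correlate of loudness."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def estimate_f0(frame, sample_rate, f_min=75.0, f_max=500.0):
    """Estimate fundamental frequency from the strongest autocorrelation lag.

    The search range f_min..f_max is an assumed range for speech.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalised autocorrelation of the frame at this lag.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Synthetic test frame: a 200 Hz sine, 50 ms long, sampled at 8 kHz.
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]

print(estimate_f0(tone, sample_rate), rms(tone))
```

Running the two functions frame by frame over an utterance would give a pitch contour and a loudness contour. Real speech needs more care, for example a voicing decision and protection against octave errors.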

DECLARATIVE, INTERROGATIVE, IMPERATIVE
DECLARATIVE: “You are going home”
INTERROGATIVE: “You are going home?” (voice is raised at end of sentence)
IMPERATIVE: “You ARE going home!” (“are” is emphasized)

EMOTIONAL STATE OF THE SPEAKER
PROSODIC FEATURES TEND TO INDICATE THE EMOTIONAL STATE OF THE SPEAKER. “RAISING ONE’S VOICE” IN ANGER, FOR EXAMPLE, INCREASES BOTH LOUDNESS AND PITCH. A STATE OF EXCITEMENT FREQUENTLY CAUSES AN INCREASE IN THE RATE OF SPEAKING.
ATTEMPTS HAVE BEEN MADE TO ACCOMPLISH ACOUSTIC “LIE DETECTION” BY ANALYZING THE PROSODIC FEATURES OF RECORDED SPEECH FOR EVIDENCE OF STRESS.

EFFECT OF EMOTION ON PHONATION FREQUENCY
PHONATION FREQUENCY vs TIME FOR THREE ACTORS SPEAKING THE SAME SENTENCE (“For God’s sake!”) IN FOUR DIFFERENT MODES (Williams and Stevens 1972).

EFFECT OF EMOTION ON PHONATION FREQUENCY
MEDIAN AND RANGE OF THE PHONATION FREQUENCY FOR THREE ACTORS SPEAKING THE SAME SENTENCE: A=ANGER, S=SORROW, F=FEAR, N=NEUTRAL.
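The median and range of a phonation-frequency contour are simple to compute once the contour is available. The sketch below assumes a contour given as a list of values in Hz, with 0 marking unvoiced frames. That convention, the function name, and the two sample contours are hypothetical and are not data from Williams and Stevens.

```python
from statistics import median

def f0_summary(contour_hz):
    """Median and range of the voiced part of a phonation-frequency contour.

    Frames with a value of 0 are treated as unvoiced and ignored (an assumed
    convention for this sketch).
    """
    voiced = [f for f in contour_hz if f > 0]
    return {"median": median(voiced), "low": min(voiced), "high": max(voiced)}

# Hypothetical contours in Hz (illustrative values only, not measured data).
contour_a = [0, 118, 122, 120, 115, 0, 110]
contour_b = [0, 180, 240, 260, 210, 0, 150]

for name, contour in (("contour_a", contour_a), ("contour_b", contour_b)):
    print(name, f0_summary(contour))
```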

RADIO ANNOUNCER SPEAKING BEFORE (top) AND AFTER (bottom) THE CRASH OF THE HINDENBURG DIRIGIBLE (1937) .

STRESS
SPECTROGRAMS OF THE WORD “SQUEAL” SPOKEN WITH FOUR DEGREES OF STRESS IN RESPONSE TO A LIST OF QUESTIONS (Brownlee 1996).

TONE
IN SOME LANGUAGES, SUCH AS CHINESE, A PHONEME CAN TAKE ON DIFFERENT MEANINGS DEPENDING ON ITS TONE. THE FOUR TONES IN MANDARIN CHINESE ARE SHOWN.

VOICE QUALITY
VOICE QUALITY IS A BROAD TERM THAT REFERS TO THE EXTRALINGUISTIC ASPECTS OF A SPEAKER’S VOICE WITH REGARD TO IDENTITY, PERSONALITY, HEALTH, AND EMOTIONAL STATE. VOCAL TRACT LENGTH, JAW AND TONGUE SIZE, VOCAL FOLD MASS, TRACHEAL LENGTH, AND NASAL CAVITY VOLUME MAY INDICATE INFORMATION ABOUT AGE, SEX, PHYSIQUE, AND HEALTH.

“High fidelity on the line: please say ‘ahh’”
THIS IS THE TITLE OF AN INTERESTING ARTICLE BY STEN TERNSTRÖM IN THE FALL 2008 ISSUE OF ECHOES.
SPECTRA OF SPEECH SOUNDS ARE ESPECIALLY RICH UP TO 4000 Hz, AND FALL OFF RAPIDLY ABOVE 5000 Hz, BUT HIGH HARMONICS CAN BE MEASURED UP TO 20 kHz.
VOICES HEARD IN LIVE PERFORMANCE MAY SOUND A LITTLE “DULL” OR “FADED” BEYOND THE 15TH ROW, BECAUSE HIGH FREQUENCIES ARE SLIGHTLY DIMINISHED.
EARLY TELEPHONES TRANSMITTED ONLY 300-3500 Hz WITH LITTLE LOSS IN INTELLIGIBILITY (SEE FILTERED SPEECH IN LESSON 3).
IN 2000, A WIDE-BAND STANDARD FOR TELEPHONY WAS DEFINED UP TO 7000 Hz, A BIG IMPROVEMENT OVER THE OLD “TELEPHONE SOUND.” HOPEFULLY CELL-PHONE SOUND WILL SOON SOUND MUCH BETTER.
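To get a feel for what these bandwidths mean for a voiced sound, one can count how many harmonics of the voice fall inside each passband. The sketch below is only illustrative: the 100 Hz fundamental is an arbitrary example, and the 50 Hz lower edge used for the wide band is an assumption, since the slide gives only the 7000 Hz upper limit.

```python
def harmonics_in_band(f0_hz, low_hz, high_hz, ceiling_hz=20000.0):
    """List the harmonics of f0_hz (up to ceiling_hz) that lie in [low_hz, high_hz]."""
    top = min(high_hz, ceiling_hz)
    n_max = int(ceiling_hz // f0_hz)
    return [k * f0_hz for k in range(1, n_max + 1) if low_hz <= k * f0_hz <= top]

f0 = 100  # arbitrary example fundamental in Hz

narrow = harmonics_in_band(f0, 300, 3500)   # early telephone band from the slide
wide = harmonics_in_band(f0, 50, 7000)      # 50 Hz lower edge is an assumption

print(len(narrow), len(wide))
```

For this example voice the fundamental itself lies below the 300 Hz edge of the old telephone band, so it is not transmitted at all.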

NORMAL, “YAWNY”, AND “TWANGY” VOICE
Story, Titze, and Hoffman (2001) did a 3-dimensional study of the vocal tract using MRI to determine the shape when vowels /i/, /æ/, /ɑ/, and /u/ were spoken with NORMAL, “YAWNY”, and “TWANGY” voice.
Relative to NORMAL speech, the ORAL CAVITY is widened and the TRACT is lengthened for YAWNY vowels. F1 and F2 moved closer together.
TWANGY vowels were characterized by shortened TRACT length, widened LIP OPENING, and a slightly constricted ORAL CAVITY. F1 and F2 moved farther apart.

(Story, Titze and Hoffman, 2001)

(Story, Titze and Hoffman, 2001)

ACCENTS
“TWO COUNTRIES SEPARATED BY A COMMON LANGUAGE”
Have you ever misunderstood someone or been misunderstood by someone who speaks with a different accent? The sounds that an American hears as 'Bob the clerk' may be heard by an Australian as 'barb the clock'.

The two most important parameters in determining different vowel sounds are the first two formants, which are frequency bands with increased power. These are the two axes on the graph. Other important parameters are the length of the vowel and other formants. The axes are traditionally plotted backwards, as here, so that they approximately correspond to the axes long used by phoneticians and linguists: F1 (vertical) approximately corresponds to the jaw height (which correlates negatively with the extent of the mouth opening). F2 (horizontal) approximately corresponds to the position (forward or back) of the constriction of the vocal tract where the tongue is close to the roof of the mouth.
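The idea that F1 and F2 largely determine the vowel can be sketched as a nearest-neighbour lookup in the F1-F2 plane. Everything named below is an assumption for illustration: the vowel centres are rounded placeholder values rather than measurements from the survey, the ASCII labels stand in for phonetic symbols, and plain distance in Hz is a crude similarity measure. The second function shows one way to get the "backwards" axes of the traditional chart.

```python
import math

# Placeholder (F1, F2) centres in Hz, for illustration only.
# They are not measurements from any speaker sample.
VOWEL_CENTRES = {
    "i": (300, 2300),
    "ae": (700, 1700),
    "a": (750, 1100),
    "u": (300, 900),
}

def nearest_vowel(f1, f2, centres=VOWEL_CENTRES):
    """Return the label whose (F1, F2) centre is closest to the measured pair."""
    return min(
        centres,
        key=lambda v: math.hypot(f1 - centres[v][0], f2 - centres[v][1]),
    )

def chart_position(f1, f2):
    """Map (F1, F2) to (x, y) with both axes reversed.

    Higher F2 plots further left and higher F1 plots further down.
    """
    return (-f2, -f1)

print(nearest_vowel(320, 2200), chart_position(320, 2200))
```

A fuller version would also use vowel length and the higher formants, which the text notes are important as well.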

F1 AND F2 FOR ENGLISH VOWEL SOUNDS SPOKEN BY AUSTRALIAN SPEAKERS. F1 CORRELATES WITH MOUTH OPENING; F2 CORRELATES WITH TONGUE PLACEMENT.

AUSTRALIAN SPEAKER, AMERICAN SPEAKER
For the Australians in this sample, the words "hud" and "hard" have a similar sound; the main difference is the length. For an Australian, a long bud is a bard. For this sample of Americans, it is "hud" and "heard" that are distinguished by length: for an American, it's a bird.

TO PARTICIPATE IN THIS SURVEY BY WOLFE, SMITH AND COLLEAGUES, CLICK ON http://project.phys.unsw.edu.au/swe/survey/form.php