
If a voice has a fundamental frequency of 100 Hz, harmonics will occur at 200 Hz, 300 Hz, 400 Hz, etc. A
voice with an F0 of 210 Hz has harmonics at 420 Hz, 630 Hz, 840 Hz, etc. Further, in vocal fold
vibration, as the harmonics increase in frequency they gradually fall off in amplitude.
The loudness of the sound depends on both frequency and amplitude, but for a given F0, the greater the
overall amplitude, the louder the sound. The component frequencies are called harmonics.

fundamental frequency, which defines its pitch, and we know the frequencies and amplitudes of its
components, which define its quality.

1. The difference between periodic, aperiodic, and transient sounds


Actual sounds are combinations of different kinds of sound
Example of a combination of two kinds: [z] (an underlying pattern that repeats, plus a lot of noisy
variation)
Special category of aperiodic sound: transient (instantaneous)
Example of a transient sound: [k’] ← ejective consonant. Silence first, then a sudden burst of
sound lasting about a hundredth of a second, quickly dying down to silence

Periodic, e.g. [a]: a pressure wave of a specific shape is repeated multiple times. Example: musical notes
Aperiodic, e.g. [s]: the pressure changes from positive to negative values quickly and randomly; moment-to-moment pressure variations are more random; no repeating pattern. Example: the scratching of sandpaper
Transient: momentary disturbance (not drawn out or repeated)
Waveform: graphical representation of sound pressure changes over time
Pressure scale: arbitrary
X-axis: time (milliseconds)
Positive values: moments of higher pressure (compression)
Negative values: moments of lower pressure (rarefaction)
Equilibrium: 0

2. How to describe simple harmonic motion in terms of frequency, amplitude, and phase
➢ The amount of time required for each cycle: period
○ The period of a cycle is determined by the length of the string
○ Longer pendulum → longer period → lower frequency
➢ The number of cycles in a given amount of time: frequency
F = 1/P
P = 1/F
➢ The displacement of the pendulum: amplitude (from (A) to (B))
○ Depends on the energy exerted
○ The higher the energy → the greater the amplitude
➢ Loss of energy: caused by gravity, friction, or low energy exerted
➢ Gradual loss of energy from cycle to cycle: damping
a. Lightly-damped system: energy is lost gradually. Oscillation continues for a longer
time
b. Highly-damped system: energy is lost quickly. Oscillation dies down soon.
(where the loss of energy is negligible, a system can give the false impression of
being undamped)
c. Undamped system: no energy is lost at all. Amplitude is constant from cycle to
cycle
➢ Sine wave: moves from 0° to 360° (a cycle)
➢ Sinusoid: any wave that has the shape of a sine wave, regardless of
differences in phase
➢ Cosine wave: starts from the maximum value, 1
➢ A difference in starting point: phase shift
➢ A cosine wave graphs the instantaneous (immediate) velocity of the
pendulum bob
➢ +1: peak velocity in a rightward direction
➢ −1: peak velocity in a leftward direction
➢ Pendulum: peak velocity is reached at 0 displacement
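
The F = 1/P relation and the sine/cosine relationship above can be checked numerically. This is only a minimal sketch: the function names are made up for illustration, and displacement and velocity are normalized to a peak of 1.

```python
import math

def period_from_frequency(frequency_hz):
    """P = 1/F: period in seconds for a frequency in Hz."""
    return 1.0 / frequency_hz

def frequency_from_period(period_s):
    """F = 1/P: frequency in Hz for a period in seconds."""
    return 1.0 / period_s

def displacement(phase_deg, amplitude=1.0):
    """Displacement of the bob, modelled as a sine of the phase angle."""
    return amplitude * math.sin(math.radians(phase_deg))

def velocity(phase_deg, amplitude=1.0):
    """Normalized velocity of the bob: a cosine, i.e. a sine shifted by 90 degrees."""
    return amplitude * math.cos(math.radians(phase_deg))

if __name__ == "__main__":
    print("Period of a 100 Hz oscillation (s):", period_from_frequency(100))
    for phase in (0, 90, 180, 270, 360):
        print(phase, round(displacement(phase), 3), round(velocity(phase), 3))
```

At 0° and 180° the displacement is 0 and the velocity is at its positive or negative peak; at 90° and 270° the displacement peaks and the velocity is 0.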

Why are these important?


➢ An oscillating system in which displacement and velocity have the inverse relationship captured by the
sine and cosine waves shows simple harmonic motion
➢ The motion of the tine of the tuning fork is exactly analogous to the motion of a pendulum
➢ Inherent stiffness of the tine: the tine springs back to the rest position. It does not
stop instantaneously at the rest position. Inertia causes it to overshoot and deform in the
opposite direction → lightly damped system
➢ The oscillation of a tuning fork takes place within the range of frequencies and amplitudes
to which human ears are sensitive
➢ Frequency is proportional to pitch: the higher the frequency → the higher the note
➢ For a given frequency, the amplitude of the vibration (how far the tine is displaced,
which is determined by how hard the fork is struck) will be proportional to its loudness.
➢ The mathematics of sinusoidal motion are well understood
○ the frequency (how fast the system is cycling)
○ the amplitude (how big each movement is)
○ the phase (at what point the movement started) (not that important)
➢ Every kind of vibration can be described as the sum of a set of simple sinusoids of
varying frequencies and amplitudes. It can be very complex and takes a lot of sinusoids.
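
To see how sinusoids add up, here is a minimal sketch that sums three of them. The frequencies and amplitudes in COMPONENTS are made-up illustration values, not taken from the book.

```python
import math

def sinusoid(frequency_hz, amplitude, t, phase_deg=0.0):
    """Value of one sinusoid at time t (seconds)."""
    return amplitude * math.sin(2 * math.pi * frequency_hz * t + math.radians(phase_deg))

def complex_wave(components, t):
    """Sum of sinusoids; components is a list of (frequency_hz, amplitude) pairs."""
    return sum(sinusoid(f, a, t) for f, a in components)

# Hypothetical components: 100 Hz plus two weaker sinusoids at 200 Hz and 300 Hz.
COMPONENTS = [(100, 1.0), (200, 0.5), (300, 0.25)]

if __name__ == "__main__":
    for step in range(5):
        t = step * 0.0025
        print(f"t = {t * 1000:.1f} ms  pressure = {complex_wave(COMPONENTS, t):+.3f}")
```

With these components the summed pattern repeats every 10 ms, so the complex wave is itself periodic even though it is no longer a simple sinusoid.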

3. The relationship between simple and complex waves


➢ There are many cyclic (periodic) patterns in nature, but most are not as
simple as the harmonic motion of a pendulum.
➢ Relationship: More complicated patterns are built up as one kind of
motion is laid on another.
➢ A complex wave is created by adding two simple sinusoids: the result of adding
sinusoids → complex wave
➢ The complex wave itself is periodic
➢ The rate at which the whole complex pattern repeats is called the
fundamental frequency (F0).
➢ Each of these motions is represented as a sinusoid
➢ What is a spectrum? A graph of frequency and amplitude like that in Figure
6.9 is called a spectrum.
➢ F0 determines the pitch (note) of a sound wave
➢ The loudness of the sound depends on both frequency and amplitude,
but for a given F0, the greater the overall amplitude, the louder the
sound. The component frequencies are called harmonics.
➢ The differing frequencies and amplitudes of the component harmonics
give the sound its quality.
➢ The fundamental frequency is always equal to the greatest common
factor of the component frequencies
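
The greatest-common-factor rule can be tried out directly. This is a rough sketch that assumes the component frequencies are whole numbers of Hz; the function names are illustrative only.

```python
from functools import reduce
from math import gcd

def fundamental_frequency(component_hz):
    """F0 as the greatest common factor of whole-number component frequencies (Hz)."""
    return reduce(gcd, component_hz)

def harmonics(f0_hz, count):
    """First `count` harmonics of F0, counting F0 itself as the first harmonic."""
    return [f0_hz * n for n in range(1, count + 1)]

if __name__ == "__main__":
    print(fundamental_frequency([100, 200, 300]))
    print(fundamental_frequency([420, 630, 840]))
    print(harmonics(210, 4))
```

Note that components at 420, 630 and 840 Hz give 210 Hz, matching the 210 Hz voice example, even when no component at 210 Hz is listed.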
4. How to use the basic formulas of acoustics

➢ Sound travels outward from the source as a wave


➢ Speed of sound (in air) = 340 m/sec
➢ A wave is a disturbance that travels through a medium
➢ The movement of any one water particle, bobbing up and down in simple
harmonic motion, can be described with a sinusoid.
➢ A sinusoid can also describe the shape of the surface of the pond after a few
ripples have passed through
➢ The distance between peaks depends on how much time elapses between the
generation of each peak and how fast the disturbance propagates

➢ Time = period
➢ Distance = wavelength (the distance from one peak to the next)
➢ Rate = the speed at which the motion propagates from one particle to the next
➢ Distance = Rate ∗ Time
➢ Wavelength = Speed ∗ Period, or (since P = 1/F)
➢ Wavelength = Speed/Frequency, or
➢ Frequency = Speed/Wavelength
➢ Period (ms) = 1000/Frequency (Hz)
➢ 1 second = 1000 ms
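
The formulas above can be worked through with a few lines of code. This is a minimal sketch using the 340 m/sec speed of sound from these notes; the function names are made up for illustration.

```python
SPEED_OF_SOUND_M_PER_S = 340.0  # speed of sound in air, as given in these notes

def wavelength_from_frequency(frequency_hz, speed=SPEED_OF_SOUND_M_PER_S):
    """Wavelength (m) = Speed / Frequency."""
    return speed / frequency_hz

def frequency_from_wavelength(wavelength_m, speed=SPEED_OF_SOUND_M_PER_S):
    """Frequency (Hz) = Speed / Wavelength."""
    return speed / wavelength_m

def period_ms_from_frequency(frequency_hz):
    """Period (ms) = 1000 / Frequency (Hz), since 1 second = 1000 ms."""
    return 1000.0 / frequency_hz

if __name__ == "__main__":
    print("Wavelength of 100 Hz (m):", wavelength_from_frequency(100))
    print("Frequency of a 0.5 m wave (Hz):", frequency_from_wavelength(0.5))
    print("Period of 200 Hz (ms):", period_ms_from_frequency(200))
```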

For sound waves, the rate at which the energy propagates through
the air is approximately 340 meters per second (or about 760 mph)
➢ In a transverse wave, the motion of individual particles is perpendicular to the
motion of the wave
➢ In contrast, waves of sound are longitudinal waves. The motion of the individual
particles is parallel to the motion of the wave.
➢ This pattern of compression followed by rarefaction, moving down the line, is a
particular kind of longitudinal wave, a pressure wave.
➢ The range of pressures to which the ear is sensitive is quite large
➢ The loudness of a sound is actually proportional to the amount of energy
carried by the wave: its intensity
➢ Intensity is a function of both amplitude and frequency: both how big and how
fast the pressure variations are.
➢ Intensity is measured in watts per m²
➢ The logarithmic scale used for measuring sound intensity is the decibel scale.
➢ The pitch of the pure tone thus produced is proportional to the frequency of the
oscillation.
➢ The loudness of the tone is proportional to its intensity, which is a function of both
frequency and amplitude.
➢ Free vibration
➢ natural resonant frequency:
○ Every object has a basic frequency, or set of frequencies, at which it will
naturally oscillate when energy is applied.
○ A pendulum has a particular frequency depending on its length
➢ Patterns of vibration that are sustained by continual self-reinforcement in an
oscillating system are called standing waves. Any frequency that can be set up as a
standing wave will be a resonant frequency of that object.
➢ The fundamental frequency of the plucked string, as well as its component
harmonic frequencies, is determined by the string length and tension.
➢ Free vibration occurs when energy is applied once and the system is left to
oscillate on its own
➢ It is possible to force an object to oscillate at any frequency at all, but such forced vibration
takes the continued application of energy
➢ The resonating body thus acts as a filter, allowing only some frequencies to get
through: resonant frequencies are amplified, other frequencies are lost.
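
The logarithmic decibel scale mentioned in this section can be sketched as follows. The notes do not give the formula, so this assumes the standard definition, 10 times the base-10 logarithm of an intensity ratio, and a reference intensity of 10⁻¹² watts per m²; both are assumptions, not from the notes.

```python
import math

REFERENCE_INTENSITY_W_PER_M2 = 1e-12  # assumed reference intensity (not from the notes)

def intensity_level_db(intensity_w_per_m2, reference=REFERENCE_INTENSITY_W_PER_M2):
    """Sound intensity level in dB: 10 * log10(I / I_ref)."""
    return 10.0 * math.log10(intensity_w_per_m2 / reference)

if __name__ == "__main__":
    for intensity in (1e-12, 1e-9, 1e-6, 1e-3, 1.0):
        print(f"{intensity:g} W/m2 -> {intensity_level_db(intensity):.0f} dB")
```

Because the scale is logarithmic, each tenfold increase in intensity adds 10 dB, which is how a very large range of pressures fits into a manageable range of numbers.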

1. What kinds of information are represented in a waveform


Waveform:
➢ A waveform is a graph of changes in amplitude (air pressure) over time.
➢ the most straightforward way of looking at speech
➢ a graphical representation of sound pressure changes over time
➢ a representation in the time domain, it is useful for measuring duration
Spectrum:
➢ representation in the frequency domain, useful for determining sound quality
Spectrogram:
➢ combines the two, allowing the phoneticians to view changes in time
---------------------------------------------------------------------------------------------------------------------
Kymograph
➢ The complexity of the vocalic waveform cannot be recorded, yet duration differences
can be observed.
➢ Example: Tense [i] is longer than lax [ɪ], and both vowels are longer preceding voiced
consonants than preceding voiceless ones.
Oscilloscopes and sound spectrographs
○ Like tape recorders, they used a microphone to transfer patterns of vibration in the air
into patterns of variation in electrical current.
➢ In an oscilloscope, the variations were displayed as a waveform on a screen.

➢ An analog signal is a continuously varying wave
➢ The record of measurements (a long string of numbers) is the digital representation of
the speech wave.
➢ Digitizing requires deciding how often to sample (sampling rate) and how precisely (quantization)
➢ High-frequency signals change very quickly (have very short periods), low-frequency
signals change less quickly (have longer periods).
➢ In order to detect the presence of a sinusoidal component in a complex wave, you have to
capture a measurement twice within its period: once in the positive phase and once in the
negative phase.
○ Sample at least twice as fast as the highest frequency you want to measure.
➢ Aliasing: a high frequency masquerades as a low frequency.
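
The sampling rule and aliasing can be illustrated with a small sketch. The folding formula in alias_frequency is the standard one for sampled sinusoids rather than something stated in these notes, and all names and numbers are illustrative.

```python
import math

def nyquist_frequency(sampling_rate_hz):
    """Highest frequency that can be captured at a given sampling rate."""
    return sampling_rate_hz / 2.0

def minimum_sampling_rate(highest_frequency_hz):
    """Sample twice as fast as the highest frequency you want to measure."""
    return 2.0 * highest_frequency_hz

def alias_frequency(signal_hz, sampling_rate_hz):
    """Apparent frequency of a sinusoid after sampling (frequencies fold around the Nyquist frequency)."""
    folded = signal_hz % sampling_rate_hz
    return min(folded, sampling_rate_hz - folded)

def sample_sine(frequency_hz, sampling_rate_hz, n):
    """The n-th sample of a unit-amplitude sine wave."""
    return math.sin(2 * math.pi * frequency_hz * n / sampling_rate_hz)

if __name__ == "__main__":
    rate = 10000
    print("Nyquist frequency (Hz):", nyquist_frequency(rate))
    print("7000 Hz sampled at 10000 Hz looks like (Hz):", alias_frequency(7000, rate))
```

Sampled at 10,000 Hz, a 7,000 Hz sine yields the same sample values as a 3,000 Hz sine with its sign flipped, so the two cannot be told apart after digitization.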
---------------------------------------------------------------------------------------------------------------------
Quantization
➢ Exactly how much space a speech file takes up depends on the precision with which each
sample is measured
Praat:
➢ Aim to maximize the signal-to-noise ratio (SNR)
➢ Take advantage of the system’s dynamic range
➢ There will always be a certain amount of background noise. Some of this noise is
quantization error (or rounding error) in representing the continuous analog signal as a
series of discrete levels.
➢ The amount of quantization noise is constant, depending on the bit rate. The higher the
bit rate, the more levels available, and the lower the quantization error
➢ It is important to maximize the dynamic range and minimize background noise
○ If the amplitude of the incoming signal is greater than the maximum amplitude
the system can measure, the result is clipping:
○ the high amplitude peaks are cut off, resulting in distortion.
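
Quantization error and clipping can be sketched as below. This is a simplified model, not how any particular recording program works: samples are assumed to lie on a scale from −1 to +1, and the number of levels is assumed to be 2 raised to the number of bits.

```python
import math

def quantize(sample, bits, full_scale=1.0):
    """Clip a sample to [-full_scale, +full_scale], then round it to the nearest of 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 * full_scale / (levels - 1)                # distance between adjacent levels
    clipped = max(-full_scale, min(full_scale, sample))   # clipping: peaks beyond the range are cut off
    index = round((clipped + full_scale) / step)
    return -full_scale + index * step

def max_quantization_error(bits, n_samples=1000):
    """Largest rounding error over one cycle of a sine wave with amplitude 0.9."""
    worst = 0.0
    for k in range(n_samples):
        s = 0.9 * math.sin(2 * math.pi * k / n_samples)
        worst = max(worst, abs(s - quantize(s, bits)))
    return worst

if __name__ == "__main__":
    for bits in (4, 8, 16):
        print(bits, "bits: max error", max_quantization_error(bits))
    print("A sample of 1.7 is clipped to", quantize(1.7, 16))
```

More bits give more levels and a smaller rounding error, while any sample beyond the measurable range is flattened to the maximum, which is the distortion described above.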

➢ While it usually is not possible to read the identity of specific segments from
a waveform, different classes of sounds have different defining
characteristics.
---------------------------------------------------------------------------------------------------------------------

2. How to distinguish the major classes of speech sound in a waveform:
● Stops vs. Fricatives vs. Sonorants
➢ Voiced stops:
○ periodic, much lower amplitude.
○ Have a transient burst at the moment when the closure is released into the vowel.
○ In American English, the periodic energy in [b, d, g] will often die down
(devoiced) during the closure, unless the stop is between other voiced sounds.
➢ Voiceless stops: silence during closure phase, flatline in waveforms, followed by a burst
➢ Voiceless fricative:
○ no repeating pattern, appear as random noise
○ Strident: high amplitude
○ Non-strident: very low amplitude
○ Not followed by a burst (to distinguish from voiceless stops)
➢ Voiced fricatives: combine a periodic pattern with noise; the patterns die out towards
the end
● Nasals vs. Vowels
➢ Vowels: Highest relative amplitude (open mouth), complex periodic pattern. Amplitude
depends on loudness of utterance
➢ Sonorant consonants (nasals, laterals): look like vowels, but with lower amplitude and less
complex (less dense) patterns
● Voicing vs. Voicelessness
➢ Voicing: Higher amplitude
➢ Voicelessness: lower amplitude
● Aspirated (long VOT) vs. Unaspirated (short VOT)
➢ Aspiration: random sound, low amplitude, less complex, longer voice onset time

3. How to segment a waveform (Question 5, Ch. 7)


➢ Depends on amplitude, duration, types of sounds (periodic/ aperiodic, burst, transient)

3. Spectra and Spectrograms:


a. What kinds of information are represented in spectra and
spectrograms

➢ to quantify, visualize, and analyze the details of sound quality


➢ Amplitude over frequency
➢ Two dimensional
b. How to estimate formants, harmonic frequencies, and
fundamental frequency from a spectrum (Question 6, Ch. 7)
➢ Formants: peaks in the amplitudes of the harmonics; identify the first three peaks, F1, F2, F3
➢ It is the relationship between the locations of the first three formants that most strongly
determines perceived sound quality
➢ F0: first harmonic
➢ Harmonics: integer multiples of F0, e.g. F0: 100 Hz, harmonics would be at 200 Hz, 300 Hz,
etc.
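
Reading harmonics and F0 off a spectrum can be sketched in code. This is a toy example under stated assumptions: the wave is synthetic (three made-up harmonics of 100 Hz), the spectrum is computed with a plain discrete Fourier transform, and F0 is estimated as the smallest spacing between harmonic peaks. It does not estimate formants.

```python
import math

SAMPLING_RATE_HZ = 8000
N_SAMPLES = 800  # 0.1 s of signal, so analysis frequencies 10 Hz apart line up exactly

def make_wave(components, n_samples=N_SAMPLES, rate=SAMPLING_RATE_HZ):
    """Synthesize a complex wave from (frequency_hz, amplitude) pairs."""
    return [sum(a * math.sin(2 * math.pi * f * n / rate) for f, a in components)
            for n in range(n_samples)]

def dft_amplitude(samples, frequency_hz, rate=SAMPLING_RATE_HZ):
    """Amplitude of one frequency component, from a single discrete Fourier transform bin."""
    re = sum(s * math.cos(2 * math.pi * frequency_hz * n / rate) for n, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * frequency_hz * n / rate) for n, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / len(samples)

def spectrum(samples, max_hz=1000, step_hz=10):
    """Amplitude at each analysis frequency: a spectrum as a dict {frequency_hz: amplitude}."""
    return {f: dft_amplitude(samples, f) for f in range(step_hz, max_hz + 1, step_hz)}

def harmonic_peaks(spec, threshold=0.05):
    """Frequencies whose amplitude exceeds an arbitrary threshold."""
    return sorted(f for f, a in spec.items() if a > threshold)

def estimate_f0(peaks):
    """Estimate F0 as the smallest spacing between adjacent harmonic peaks."""
    if len(peaks) < 2:
        return peaks[0]
    return min(b - a for a, b in zip(peaks, peaks[1:]))

if __name__ == "__main__":
    wave = make_wave([(100, 1.0), (200, 0.6), (300, 0.3)])  # hypothetical harmonics
    peaks = harmonic_peaks(spectrum(wave))
    print("Harmonic peaks (Hz):", peaks)
    print("Estimated F0 (Hz):", estimate_f0(peaks))
```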
c. How to estimate formants from a spectrogram
d. How to identify sequences of speech sounds from a
spectrogram (Question 7, Ch. 7)
e. The relationships between:
• Harmonics and F0
➢ F0: first harmonic
➢ Harmonics: integer multiples of F0, e.g. F0: 100 Hz, harmonics would be at 200 Hz, 300 Hz,
etc.
• Formants and harmonics
➢ Formants: peaks in the amplitudes of the harmonics; identify the first three peaks, F1, F2, F3
• Formants, vocal tract length, and wavelength
• Vowel height/backness and F1/F2
➢ F1 resonance: space behind tongue
➢ F2,F3 resonance: space in front of tongue
➢ The distance between F1 and F2 is inversely proportional to vowel backness.
○ Closer F1, F2: more back vowel, vice versa
■ Moving the tongue body back equalizes the front and back cavity resonances =>
similar F1, F2 => closer peaks, closer F1, F2
➢ F1 is inversely proportional to vowel height.
○ F1 is high for low vowels, and low for high vowels
➢ Lip rounding: lowers formants, especially F2 and F3 (front cavity resonance)
➢ Backness and lip rounding both lower F2 => back vowels tend to be rounded and front
vowels tend to be unrounded.
○ Rounding a front vowel will lower F2, bringing F2 and F1 closer together,
making the vowel sound more back; and unrounding a back vowel will raise F2,
making the vowel sound more front.
■ Rounding and backness of vowels reinforce each other

Spectra Reading:
➢ Center of gravity: the frequency around which the energy is concentrated
○ Wavelength = speed/frequency
○ A long vocal tract amplifies low frequencies => low center of gravity
➢ Kurtosis: negative for /f/: front, short tube => not enough space for sound to be filtered,
not amplified
○ High kurtosis => energy concentrated in a sharp peak; low (negative) kurtosis => flat spectrum
➢ Skewness
○ Positive: long tail to right
○ Negative: long tail to left
➢ SD
○ High SD => energy spread over a wide range of frequencies (flat spectrum), and vice versa
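
The four measures above are the first four moments of the spectrum and can be computed directly from a list of frequency and power values. This is a minimal sketch with made-up spectra; weighting by power and subtracting 3 from the kurtosis (so that a normal-shaped distribution scores 0) are assumptions, and analysis programs may use different conventions.

```python
def spectral_moments(spectrum):
    """Spectral moments of a list of (frequency_hz, power) pairs.

    Assumes at least two different frequencies with non-zero power.
    """
    total = sum(p for _, p in spectrum)
    cog = sum(f * p for f, p in spectrum) / total  # center of gravity: power-weighted mean frequency

    def central(k):
        # k-th central moment of the power-weighted frequency distribution
        return sum(((f - cog) ** k) * p for f, p in spectrum) / total

    variance = central(2)
    return {
        "center_of_gravity": cog,
        "sd": variance ** 0.5,
        "skewness": central(3) / variance ** 1.5,
        "kurtosis": central(4) / variance ** 2 - 3.0,
    }

if __name__ == "__main__":
    flat = [(f, 1.0) for f in range(1000, 6000, 1000)]  # hypothetical flat spectrum
    peaked = [(1000, 0.01), (2000, 0.01), (3000, 1.0), (4000, 0.01), (5000, 0.01)]  # hypothetical peaked spectrum
    print("flat:  ", spectral_moments(flat))
    print("peaked:", spectral_moments(peaked))
```

In this toy comparison the flat spectrum has the larger SD and a negative kurtosis, while the peaked one has a small SD and a positive kurtosis.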

4. Tone and Intonation:


a. The difference between tone and intonation, and what kind of
information each conveys
➢ Tone and intonation are both related to the use of pitch.
➢ Tone and intonation depend on the same variable, pitch. The relationship
between pitch and meaning is linguistically structured and language-specific for
both tone and intonation.
➢ Yet tone uses different pitches to create lexical contrast, while intonation uses pitch
to express discourse-level meaning.
➢ Different tones used in pronouncing a word give different meanings. For
example, as stated in the book, in Thai, if you say [kha:] with rising pitch, it means
leg; but if you say it with falling pitch, it means value.

➢ Intonation is not the direct expression of emotion in the voice. In intonation, the pitch
pattern does not change the lexical meaning of a word; however, the status of the
word in the discourse changes. Intonation distinguishes different kinds of
sentences, such as yes/no questions and wh-questions, or focuses attention on a
particular word. For example, rising pitch indicates a yes/no question, while falling
pitch is used for statements.
➢ Cross-linguistically, low or falling pitch is associated with assertion and finality,
➢ while higher pitch is associated with non-finality, uncertainty, or a topic
➢ Positive tag questions, such as “You’re not a werewolf, are you?” (Figure 17.8),
have rising intonation on the tag; while negative tag questions, such as “You’re
a werewolf, aren’t you?” (Figure 17.9), have falling intonation on the tag.

b. The difference between a pitch accent and a phrase accent/boundary tone
➢ Intonational contours can be broken down into two different kinds of markers:
pitch accents and boundary tones
○ Pitch accents associate to salient syllables in salient words
○ Boundary tones associate to phrase edges
➢ Pitch accent language: a pitch accent is associated with a salient syllable in a word to
create lexical contrast
➢ In intonation, a pitch accent is associated with a salient syllable in a word for the
purpose of conveying discourse information
➢ Intonational pitch accents associate to the primary stressed syllable of words that
are salient in a phrase.

c. What H, L, –, *, and % mean in the ToBI system


➢ Intonational pitch accents are marked with an asterisk (H*, L*)
➢ Boundary tones are marked with a raised dash or percent sign (H−, H%),
depending on the type of phrase edge they attach to.
➢ The final fall of a statement and rise of a question are the result of boundary
tones
➢ words that are asserted in a simple statement receive an H* accent, and words
that are queried in a yes/no question receive a L* accent
➢ “Marion mowed the lawn?” a question with contrastive focus on “Marion” (i.e., “I
thought that it was Kim’s job”). There is a L* pitch accent on the stressed
syllable of Marion, and an H% boundary tone at the end of the utterance
➢ the default accent placement is on the last word of the phrase
➢ Narrow contrastive focus: emphasising the meaning
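
As a study aid, the symbols in this part can be decoded mechanically. This is only a sketch of the notation as summarized in these notes, not an official ToBI tool: the function name and the wording of the descriptions are made up, and only simple two-character labels are handled (no bitonal accents such as H*+L).

```python
TONES = {"H": "high", "L": "low"}
MARKERS = {
    "*": "pitch accent",                       # associates to a salient, stressed syllable
    "-": "intermediate phrase edge tone",      # raised dash
    "%": "intonational phrase boundary tone",  # percent sign
}

def describe_tobi_label(label):
    """Return (pitch level, kind of tone) for a simple label such as 'H*', 'L-' or 'H%'."""
    # Treat the minus sign and the en dash used in print as a plain hyphen.
    normalized = label.strip().replace("\u2212", "-").replace("\u2013", "-")
    if len(normalized) != 2 or normalized[0] not in TONES or normalized[1] not in MARKERS:
        raise ValueError(f"unsupported label: {label!r}")
    return TONES[normalized[0]], MARKERS[normalized[1]]

if __name__ == "__main__":
    for label in ("H*", "L*", "L-", "H%"):
        print(label, "->", describe_tobi_label(label))
```

For the question "Marion mowed the lawn?", the L* on the stressed syllable of Marion decodes as a low pitch accent and the final H% as a high boundary tone.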
d. How English pitch accents are assigned, and the effects of
pitch accent placement
➢ This intermediate phrase break is indicated with the superscripted bar after the
accent, while the full intonational phrase break is indicated with the percent
sign.
➢ bitonal pitch accents: two-way combinations of H and L
➢ The H*+L gives the steep fall at the end of the accented
syllable, and the distinct L− and H% give the low plateau
and steep ending rise.
➢ H of assertion plus the L of uncertainty add up to surprise. There needs to be
room for cross-linguistic variation and conventionalization, however: intonation is
not the direct expression of emotion.

e. The acoustic aspects of pitch accent: pitch, duration, intensity
