ECE 598: Speech Synthesis The Vocal Tract and Lossless Tube Models

Richard Sproat http://www.linguistics.uiuc.edu/rws/ URL for this course: http://catarina.ai.uiuc.edu/ECE598/
Literature
Flanagan, J. 1972. Speech Analysis, Synthesis and Perception. Second Edition. Springer-Verlag.
Rabiner, L. and Schafer, R.W. 1978. Digital Processing of Speech Signals. Prentice-Hall.
ECE 598: Vocal Tract and Lossless Tube Models
The Vocal Mechanism
(Flanagan, Fig. 2.1, p. 10)

Schematic of Vocal Mechanism
(Flanagan, Fig. 3.1, p. 24)

General Discrete-Time Model for Speech Production
(Rabiner & Schafer, Fig. 3.50, p. 105)

Modeling Speech
A realistic model would have to model all of the above, plus loss due to soft tissue. We start with a simpler model: the uniform lossless tube.
(Rabiner & Schafer, Fig. 3.14a, p. 62)
Term Definitions
$p = p(x, t)$ is the variation in sound pressure in the tube at position $x$ and time $t$
$u = u(x, t)$ is the variation in volume velocity flow at position $x$ and time $t$
$\rho$ is the density of air in the tube
$c$ is the velocity of sound
$A = A(x, t)$ is the area function of the tube
Area Function
For an arbitrary tube, $A(x, t)$ would have some varying shape.
(Rabiner & Schafer, Fig. 3.13a, p. 61)
For a uniform tube, $A(x, t) = A(x) = A$ is a constant.
Uniform Tubes & the Wave Equation

With a uniform tube, the wave equations describing the relation between pressure and volume velocity can be expressed as (Rabiner & Schafer, p. 62, Eqn. 3.2):
$$-\frac{\partial p}{\partial x} = \frac{\rho}{A}\,\frac{\partial u}{\partial t} \tag{1}$$
$$-\frac{\partial u}{\partial x} = \frac{A}{\rho c^2}\,\frac{\partial p}{\partial t} \tag{2}$$
For instance, the first equation states that the change in pressure with respect to distance along the tube is related to the change in volume velocity with respect to time, times a constant $\rho/A$.
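As a worked step (not on the original slide), Equations 1 and 2 can be combined into the familiar one-dimensional wave equation for pressure. Differentiating Equation 1 with respect to $x$ and Equation 2 with respect to $t$, with $A$ constant, and eliminating the mixed derivative of $u$ gives:

```latex
\begin{align*}
-\frac{\partial^2 p}{\partial x^2} &= \frac{\rho}{A}\,\frac{\partial^2 u}{\partial x\,\partial t}, &
-\frac{\partial^2 u}{\partial t\,\partial x} &= \frac{A}{\rho c^2}\,\frac{\partial^2 p}{\partial t^2} \\
\Rightarrow\quad \frac{\partial^2 p}{\partial x^2} &= \frac{1}{c^2}\,\frac{\partial^2 p}{\partial t^2}
\end{align*}
```

The same elimination applied to $p$ yields an equation of identical form for $u$.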
Relation to Lossless Electrical Transmission Lines

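The wave equations for a uniform lossless tube are identical in form to the equations for a lossless uniform electrical transmission line (R&S Eqn. 3.4, p. 63):
$$-\frac{\partial v}{\partial x} = L\,\frac{\partial i}{\partial t} \tag{3}$$
$$-\frac{\partial i}{\partial x} = C\,\frac{\partial v}{\partial t} \tag{4}$$
where $v$ is voltage, $i$ is current, $L$ is inductance and $C$ is capacitance.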
Relation to Lossless Electrical Transmission Lines: Analogies

Acoustics                                  Electricity
$p$: pressure                              $v$: voltage
$u$: volume velocity                       $i$: current
$\rho/A$: acoustic inductance              $L$: inductance
$A/(\rho c^2)$: acoustic capacitance       $C$: capacitance
(R&S Table 3.3, p. 63)
Solution of Wave Equation

Returning to the wave equations, it can be shown that the solutions have the form (R&S Eqn. 3.3, p. 62):
$$u(x, t) = \left[u^+(t - x/c) - u^-(t + x/c)\right] \tag{5}$$
$$p(x, t) = \frac{\rho c}{A}\left[u^+(t - x/c) + u^-(t + x/c)\right] \tag{6}$$
where $u^+$ and $u^-$ are, respectively, positive-direction and negative-direction traveling waves.
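The traveling-wave solution can be sanity-checked numerically. The sketch below is an illustration only, not part of the original slides: it picks two arbitrary smooth wave shapes, builds $u$ and $p$ from Equations 5 and 6, and compares both sides of Equations 1 and 2 using central finite differences. The constants `RHO`, `C`, `AREA` and the wave shapes are assumed values chosen for the example.

```python
import math

RHO = 1.2      # air density in kg/m^3 (illustrative value)
C = 343.0      # speed of sound in m/s (illustrative value)
AREA = 5e-4    # tube cross-section in m^2 (illustrative value)

def u_plus(s):
    """Arbitrary smooth forward-travelling wave shape (assumption)."""
    return math.sin(2 * math.pi * 500.0 * s)

def u_minus(s):
    """Arbitrary smooth backward-travelling wave shape (assumption)."""
    return 0.3 * math.cos(2 * math.pi * 800.0 * s)

def u(x, t):
    # Equation 5: volume velocity is the difference of the two waves.
    return u_plus(t - x / C) - u_minus(t + x / C)

def p(x, t):
    # Equation 6: pressure is rho*c/A times the sum of the two waves.
    return (RHO * C / AREA) * (u_plus(t - x / C) + u_minus(t + x / C))

def ddx(f, x, t, h=1e-6):
    """Central-difference estimate of the partial derivative in x."""
    return (f(x + h, t) - f(x - h, t)) / (2 * h)

def ddt(f, x, t, h=1e-9):
    """Central-difference estimate of the partial derivative in t."""
    return (f(x, t + h) - f(x, t - h)) / (2 * h)

def residuals(x, t):
    """Relative mismatch between the two sides of Equations 1 and 2."""
    lhs1, rhs1 = -ddx(p, x, t), (RHO / AREA) * ddt(u, x, t)
    lhs2, rhs2 = -ddx(u, x, t), (AREA / (RHO * C * C)) * ddt(p, x, t)
    return (abs(lhs1 - rhs1) / max(abs(rhs1), 1e-30),
            abs(lhs2 - rhs2) / max(abs(rhs2), 1e-30))
```

Calling `residuals(0.05, 1.234e-3)` should return two values that are small compared with 1, limited only by finite-difference error.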
Concatenated Lossless Tubes


Pressure and volume velocity for the kth tube are related by (R&S Eqn. 3.35, p. 83):
$$p_k(x, t) = \frac{\rho c}{A_k}\left[u_k^+(t - x/c) + u_k^-(t + x/c)\right] \tag{7}$$
$$u_k(x, t) = u_k^+(t - x/c) - u_k^-(t + x/c) \tag{8}$$
What happens at the junction of two tubes?
Continuity
Continuity must hold at each junction: the pressure and the volume velocity at the right edge of the kth tube must equal those at the left edge of the (k + 1)st tube. Thus (R&S Eqn. 3.36, p. 84):
$$p_k(l_k, t) = p_{k+1}(0, t) \tag{9}$$
$$u_k(l_k, t) = u_{k+1}(0, t) \tag{10}$$
Substituting Equations 7 and 8 into Equations 9 and 10 yields (R&S Eqn. 3.37a, p. 84):
$$\frac{A_{k+1}}{A_k}\left[u_k^+(t - \tau_k) + u_k^-(t + \tau_k)\right] = u_{k+1}^+(t) + u_{k+1}^-(t) \tag{11}$$
where $\tau_k = l_k/c$ is the time required to travel the length of the kth tube, and (R&S Eqn. 3.37b, p. 84):
$$u_k^+(t - \tau_k) - u_k^-(t + \tau_k) = u_{k+1}^+(t) - u_{k+1}^-(t) \tag{12}$$
It is important to note that this does not mean that
$$u_k^+(t - \tau_k) = u_{k+1}^+(t) \tag{13}$$
Rather, only the total volume velocity, which combines the forward and backward waves, must be equal at the boundary. In particular, some of the forward wave will be reflected back at the right-hand boundary and some of the backward wave will be reflected forward at the left-hand boundary.
Continuity Continued
From Equations 11 and 12 (solving 12 for $u_k^-(t + \tau_k)$ and substituting the result into 11; and then subtracting 12 from 11) we have that (R&S Eqn. 3.38, p. 84):
$$u_{k+1}^+(t) = \frac{2A_{k+1}}{A_{k+1} + A_k}\,u_k^+(t - \tau_k) + \frac{A_{k+1} - A_k}{A_{k+1} + A_k}\,u_{k+1}^-(t) \tag{14}$$
$$u_k^-(t + \tau_k) = -\frac{A_{k+1} - A_k}{A_{k+1} + A_k}\,u_k^+(t - \tau_k) + \frac{2A_k}{A_{k+1} + A_k}\,u_{k+1}^-(t) \tag{15}$$
Here,
$$r_k = \frac{A_{k+1} - A_k}{A_{k+1} + A_k} \tag{16}$$
is termed the reflection coefficient, since it determines the amount of $u_{k+1}^-(t)$ that is reflected at the junction. Note that $-1 \le r_k \le 1$.
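Equation 16 is straightforward to compute from a sampled area function. The following is a minimal sketch; the function name and the four-section area values are hypothetical, chosen only to show that a widening gives a positive coefficient, equal areas give zero, and a narrowing gives a negative one.

```python
def reflection_coefficients(areas):
    """Return r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k) for each junction (Equation 16)."""
    if any(a <= 0 for a in areas):
        raise ValueError("cross-sectional areas must be positive")
    # Pair each section with its right-hand neighbour.
    return [(a_next - a) / (a_next + a) for a, a_next in zip(areas, areas[1:])]

# Hypothetical four-section area function in cm^2 (illustrative values only).
areas = [1.0, 4.0, 4.0, 0.5]
r = reflection_coefficients(areas)
```

Because all areas are positive, every coefficient lies strictly between -1 and 1, consistent with the bound stated in the text.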
Substituting, we have (R&S Eqn. 3.41, p. 85):
$$u_{k+1}^+(t) = (1 + r_k)\,u_k^+(t - \tau_k) + r_k\,u_{k+1}^-(t) \tag{17}$$
$$u_k^-(t + \tau_k) = -r_k\,u_k^+(t - \tau_k) + (1 - r_k)\,u_{k+1}^-(t) \tag{18}$$
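One way to see that Equations 17 and 18 encode the continuity conditions is to apply them to sample values and check Equations 11 and 12 directly. This is a sketch under assumed numbers: the areas and incoming wave amplitudes are arbitrary, and `scatter` is a hypothetical helper name.

```python
def scatter(a_k, a_k1, fwd_in, bwd_in):
    """Apply Equations 17 and 18 at the junction between tubes k and k+1.

    fwd_in is u_k^+(t - tau_k), the forward wave arriving from tube k.
    bwd_in is u_{k+1}^-(t), the backward wave arriving from tube k+1.
    Returns (u_{k+1}^+(t), u_k^-(t + tau_k)).
    """
    r = (a_k1 - a_k) / (a_k1 + a_k)          # Equation 16
    fwd_out = (1 + r) * fwd_in + r * bwd_in  # Equation 17
    bwd_out = -r * fwd_in + (1 - r) * bwd_in # Equation 18
    return fwd_out, bwd_out

a_k, a_k1 = 2.0, 6.0            # illustrative areas
fwd_in, bwd_in = 1.0, 0.25      # illustrative incoming wave amplitudes
fwd_out, bwd_out = scatter(a_k, a_k1, fwd_in, bwd_in)

# Equation 12: volume velocity is continuous across the junction.
flow_left = fwd_in - bwd_out
flow_right = fwd_out - bwd_in
# Equation 11: pressure is continuous (both sides multiplied by A_{k+1}/(rho*c)).
pressure_left = (a_k1 / a_k) * (fwd_in + bwd_out)
pressure_right = fwd_out + bwd_in
```

With these inputs the two flow values agree and the two pressure values agree, even though `fwd_out` differs from `fwd_in`, which is the point made around Equation 13.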
Signal-Flow Representation
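Thus, for example, $u_k^-(t + \tau_k)$ is obtained by summing two paths in the flow graph: the delayed forward wave $u_k^+(t - \tau_k)$ weighted by $-r_k$, and the backward wave $u_{k+1}^-(t)$ weighted by $1 - r_k$ (Equation 18).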
Boundary Conditions
We have so far ignored the boundary conditions at the lips and glottis. A good model of radiation at the lips is the spherical baffle. This turns out to be hard to model, so another approximation, which assumes that the size of the orifice is relatively small compared to the curvature of the sphere, is the planar baffle.
Boundary Conditions at Lips

$$u_N^-(t + \tau_N) = -r_L\,u_N^+(t - \tau_N) \tag{19}$$
(R&S Eqn. 3.44, p. 86) where the reflection coefficient at the lips $r_L$ is (R&S Eqn. 3.45, p. 86):
$$r_L = \frac{\rho c/A_N - Z_L}{\rho c/A_N + Z_L} \tag{20}$$
($Z_L$ is the radiation impedance at the lips.) The output volume velocity at the lips is (R&S Eqn. 3.46, p. 86):
$$u_N(l_N, t) = u_N^+(t - \tau_N) - u_N^-(t + \tau_N) \tag{21}$$
$$= (1 + r_L)\,u_N^+(t - \tau_N) \tag{22}$$
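Two limiting cases make Equations 19 to 22 concrete. The sketch below assumes, purely for illustration, that $Z_L$ is a real constant (in general the radiation impedance is complex and frequency dependent) and works in arbitrary units where $\rho c/A_N = 1$. The helper names are hypothetical.

```python
def lip_reflection(rho_c_over_area, z_load):
    """Equation 20: r_L for characteristic impedance rho*c/A_N and load Z_L."""
    return (rho_c_over_area - z_load) / (rho_c_over_area + z_load)

def lip_output(fwd_in, r_lip):
    """Return (output volume velocity, reflected wave) at the lips."""
    bwd_out = -r_lip * fwd_in          # Equation 19
    return fwd_in - bwd_out, bwd_out   # Equation 21, equal to (1 + r_L) * fwd_in

z0 = 1.0                           # rho*c/A_N in arbitrary units (assumption)
r_open = lip_reflection(z0, 0.0)   # Z_L = 0: ideal open end, r_L = 1
r_match = lip_reflection(z0, z0)   # Z_L = rho*c/A_N: matched load, r_L = 0
out_open, back_open = lip_output(1.0, r_open)
out_match, back_match = lip_output(1.0, r_match)
```

With $Z_L = 0$ the whole forward wave is reflected with inverted sign and the output volume velocity doubles; with a matched load nothing is reflected and the forward wave passes through unchanged.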
Signal Flow Diagram at Lips
Boundary Conditions at the Glottis
(Rabiner & Schafer, Fig. 3.28, p. 79)
Opening and closing of the glottis is controlled by:
Air pressure in the lungs
Tension/stiffness in the vocal cords
Area of the glottal opening under rest conditions
It is also affected by the vocal tract (coupling), so the system is non-linear. But the coupling is weak, so it is common to ignore the interaction. This allows us to have a glottal impedance of the form:
$$Z_G(\Omega) = R_G + j\Omega L_G \tag{23}$$
($R_G$ is resistance at the glottis, $L_G$ is inductance at the glottis)

(R&S Eqn. 3.49, p. 87)
$$u_1^+(t) = \frac{1 + r_G}{2}\,u_G(t) + r_G\,u_1^-(t) \tag{24}$$
where (R&S Eqn. 3.50, p. 87):
$$r_G = \frac{Z_G - \rho c/A_1}{Z_G + \rho c/A_1} \tag{25}$$
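The glottal boundary can be illustrated the same way as the lip boundary. The sketch below assumes a purely resistive, real-valued $Z_G$ (dropping the inductive term of Equation 23) and arbitrary units with $\rho c/A_1 = 1$; the helper names and numbers are hypothetical.

```python
def glottal_reflection(z_g, rho_c_over_a1):
    """Equation 25: r_G = (Z_G - rho*c/A_1) / (Z_G + rho*c/A_1)."""
    return (z_g - rho_c_over_a1) / (z_g + rho_c_over_a1)

def glottal_forward_wave(u_g, bwd_in, r_g):
    """Equation 24: forward wave entering tube 1.

    u_g is the glottal source u_G(t); bwd_in is the returning wave u_1^-(t).
    """
    return 0.5 * (1 + r_g) * u_g + r_g * bwd_in

z0 = 1.0                                      # rho*c/A_1 in arbitrary units (assumption)
r_stiff = glottal_reflection(1e9 * z0, z0)    # very large Z_G: r_G approaches 1
r_matched = glottal_reflection(z0, z0)        # Z_G = rho*c/A_1: r_G = 0
u1_plus = glottal_forward_wave(u_g=1.0, bwd_in=0.2, r_g=r_stiff)
```

As $Z_G$ grows, $r_G$ tends to 1, so the source passes into the tube at full strength and the returning wave is reflected almost completely.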
Signal Flow Diagram at Glottis
Flow Diagram of Two-Tube Model
Side Note on Nasal Tract

The model we have just developed has only resonances (poles, formants), meaning that there are particular frequencies at which the system responds most strongly to an excitation. For the /uh/ in "but", for a typical male talker, the first two resonances will be at about 500 Hz and 1200 Hz. With nasal sounds the model changes, since we are now dealing with two tubes rather than one. For nasal stops the oral tube is closed, so we have a situation like the following:

The closed oral cavity traps energy at certain frequencies, thus introducing zeroes (antiresonances) into the output from the nostrils. It is common to avoid modeling the nasal tract directly, instead mimicking the presence of zeroes using a large number of tubes in the single tract model, with a resulting large number of resonances.
Transfer Function: Background Notions

Unit impulse sequence (unit sample) (R&S Eqn. 2.1, p. 11):
$$\delta(n) = 1 \quad \text{if } n = 0 \tag{26}$$
$$\delta(n) = 0 \quad \text{otherwise} \tag{27}$$
ECE 598: Transfer Function

Linear shift-invariant systems are completely characterized by their response to a unit sample input. For a linear shift-invariant system, you can compute the output y(n) for a given input x(n) by convolving the latter with the unit sample response (R&S Eqn. 2.5a, p. 13):
$$y(n) = \sum_{k=-\infty}^{\infty} x(k)\,h(n - k) = x(n) * h(n) \tag{28}$$
(where $f * g \equiv \int_0^t f(\tau)\,g(t - \tau)\,d\tau$)
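For finite sequences the convolution sum in Equation 28 reduces to a double loop. The sketch below is illustrative only: it assumes both sequences start at $n = 0$, and the unit sample response `h` is a made-up example. It also shows that feeding in the unit sample returns `h` itself, which is what "completely characterized by the unit sample response" means in practice.

```python
def unit_sample(n):
    """delta(n): 1 at n = 0, 0 elsewhere (Equations 26 and 27)."""
    return 1 if n == 0 else 0

def convolve(x, h):
    """Equation 28 for finite sequences that both start at n = 0."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            y[k + m] += xk * hm   # contributes x(k) * h(n - k) with n = k + m
    return y

h = [1.0, 0.5, 0.25]          # hypothetical unit sample response
delta = [unit_sample(0)]      # the unit sample as a length-1 sequence
x = [1.0, 2.0]                # hypothetical input
y_delta = convolve(delta, h)  # response to the unit sample
y = convolve(x, h)            # response to x
```

Here `y_delta` equals `h`, and `y` is the superposition of `h` and a copy of `h` scaled by 2 and delayed by one sample.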
Frequency domain transforms, z-transform (R&S Eqn. 2.6a, p. 13):
$$X(z) = \sum_{n=-\infty}^{\infty} x(n)\,z^{-n} \tag{29}$$
where $z$ is some complex number. Various properties of z-transforms include that the z-transform of $x(n) * h(n)$ is equal to $X(z)H(z)$. If we set $z = e^{j\omega}$ we get (R&S Eqn. 2.9a, p. 15):
$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)\,e^{-j\omega n} \tag{30}$$
the Fourier transform.
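The convolution property can be checked numerically for short sequences. This is a minimal sketch, assuming finite sequences that start at $n = 0$; the sequences and the evaluation point `z0` are arbitrary choices for the example.

```python
import cmath

def z_transform(x, z):
    """Equation 29 for a finite sequence x(0), ..., x(len(x) - 1)."""
    return sum(xn * z ** (-n) for n, xn in enumerate(x))

def dtft(x, omega):
    """Equation 30: the z-transform evaluated on the unit circle, z = e^{j omega}."""
    return z_transform(x, cmath.exp(1j * omega))

def convolve(x, h):
    """Finite convolution sum (Equation 28) for sequences starting at n = 0."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            y[k + m] += xk * hm
    return y

x = [1.0, 2.0]                   # illustrative input
h = [1.0, 0.5, 0.25]             # illustrative unit sample response
z0 = 0.9 * cmath.exp(0.7j)       # arbitrary complex evaluation point
lhs = z_transform(convolve(x, h), z0)          # transform of x * h
rhs = z_transform(x, z0) * z_transform(h, z0)  # X(z) H(z)
dc = dtft(x, 0.0)                # at omega = 0 the transform is the sum of the samples
```

The two sides `lhs` and `rhs` agree to rounding error, and `dc` equals the sum of the samples of `x`.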
Note that $H(z)$, the z-transform of the unit sample response, is called the system function.
Transfer Function
We seek the transfer function
$$V(z) = \frac{U_L(z)}{U_G(z)}$$
(R&S Eqn. 3.63, p. 92)
Returning to our lossless tube model, suppose we have a model with $N$ equal-length sections, and we sample with period $T = 2\tau$, where $\tau$ is the time required to traverse each section (the delay of each section). Then, with time $t$ measured in units of $T$, the previous equations for the volume velocity at each junction become:
$$u_{k+1}^+(t) = (1 + r_k)\,u_k^+\!\left(t - \tfrac{1}{2}\right) + r_k\,u_{k+1}^-(t) \tag{31}$$
$$u_k^-(t) = -r_k\,u_k^+(t - 1) + (1 - r_k)\,u_{k+1}^-\!\left(t - \tfrac{1}{2}\right) \tag{32}$$
A property of the z-transform is that for $x(n + n_0)$ the z-transform is $z^{n_0} X(z)$ (shift). Also, for $a x_1(n) + b x_2(n)$ we get $a X_1(z) + b X_2(z)$ (linearity).
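The step from Equation 18 to Equation 32 can be spelled out as follows (a worked sketch, not on the original slide). With every section of equal length, $\tau_k = \tau$; replacing $t$ by $t - \tau$ and then measuring time in units of the sampling period $T = 2\tau$ turns a delay of $\tau$ into half a sample:

```latex
\begin{align*}
u_k^-(t + \tau) &= -r_k\,u_k^+(t - \tau) + (1 - r_k)\,u_{k+1}^-(t)
  && \text{(Equation 18 with } \tau_k = \tau\text{)} \\
u_k^-(t) &= -r_k\,u_k^+(t - 2\tau) + (1 - r_k)\,u_{k+1}^-(t - \tau)
  && \text{(replace } t \text{ by } t - \tau\text{)} \\
u_k^-(t) &= -r_k\,u_k^+(t - 1) + (1 - r_k)\,u_{k+1}^-\!\left(t - \tfrac{1}{2}\right)
  && \text{(time in units of } T = 2\tau\text{)}
\end{align*}
```

Equation 31 follows from Equation 17 by the same change of time units, with no shift of $t$ needed.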
Thus the z-transforms of the junction equations are (R&S Eqn. 3.64, p. 92):
$$U_{k+1}^+(z) = (1 + r_k)\,z^{-1/2}\,U_k^+(z) + r_k\,U_{k+1}^-(z) \tag{33}$$
$$U_k^-(z) = -r_k\,z^{-1}\,U_k^+(z) + (1 - r_k)\,z^{-1/2}\,U_{k+1}^-(z) \tag{34}$$
which can be solved to give (R&S Eqn. 3.65, pp. 92-93):
$$U_k^+(z) = \frac{z^{1/2}}{1 + r_k}\,U_{k+1}^+(z) - \frac{r_k z^{1/2}}{1 + r_k}\,U_{k+1}^-(z) \tag{35}$$
$$U_k^-(z) = -\frac{r_k z^{-1/2}}{1 + r_k}\,U_{k+1}^+(z) + \frac{z^{-1/2}}{1 + r_k}\,U_{k+1}^-(z) \tag{36}$$
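That Equations 35 and 36 really are Equations 33 and 34 solved for the tube-$k$ quantities can be checked by a round trip at an arbitrary point. The sketch below is illustrative; the function names, the reflection coefficient, the point `z` and the complex amplitudes are all assumed values.

```python
import cmath

def junction_forward(r, z, uk_plus, uk1_minus):
    """Equations 33 and 34: (U_{k+1}^+, U_k^-) from (U_k^+, U_{k+1}^-)."""
    uk1_plus = (1 + r) * z ** -0.5 * uk_plus + r * uk1_minus
    uk_minus = -r * z ** -1 * uk_plus + (1 - r) * z ** -0.5 * uk1_minus
    return uk1_plus, uk_minus

def junction_solved(r, z, uk1_plus, uk1_minus):
    """Equations 35 and 36: (U_k^+, U_k^-) from (U_{k+1}^+, U_{k+1}^-)."""
    uk_plus = (z ** 0.5 * uk1_plus - r * z ** 0.5 * uk1_minus) / (1 + r)
    uk_minus = (-r * z ** -0.5 * uk1_plus + z ** -0.5 * uk1_minus) / (1 + r)
    return uk_plus, uk_minus

r = 0.4                          # illustrative reflection coefficient
z = cmath.exp(0.3j)              # arbitrary point on the unit circle
uk_plus, uk1_minus = 1 + 2j, 0.5 - 1j   # arbitrary complex amplitudes

# Go forward with 33-34, then recover the tube-k waves with 35-36.
uk1_plus, uk_minus = junction_forward(r, z, uk_plus, uk1_minus)
uk_plus_back, uk_minus_back = junction_solved(r, z, uk1_plus, uk1_minus)
```

The recovered `uk_plus_back` matches the original `uk_plus`, and `uk_minus_back` matches the `uk_minus` computed from Equation 34.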
Transfer Function: Boundary Condition at Lips

Treat the boundary condition at the lips, $U_L(z)$, as the boundary to a fictitious $(N + 1)$st tube of infinite length (so that there is no negative-going wave). Then (R&S, p. 93):
$$U_{N+1}^+(z) = U_L(z) \tag{37}$$
$$U_{N+1}^-(z) = 0 \tag{38}$$
Transfer Function
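Let (R&S Eqn. 3.67-3.68, p. 93):
$$\mathbf{U}_k = \begin{bmatrix} U_k^+(z) \\ U_k^-(z) \end{bmatrix} \tag{39}$$
$$\mathbf{Q}_k = \begin{bmatrix} \dfrac{z^{1/2}}{1 + r_k} & \dfrac{-r_k z^{1/2}}{1 + r_k} \\[2ex] \dfrac{-r_k z^{-1/2}}{1 + r_k} & \dfrac{z^{-1/2}}{1 + r_k} \end{bmatrix} \tag{40}$$
Then we can express Equations 35 and 36 as the system (R&S Eqn. 3.66, p. 93):
$$\mathbf{U}_k = \mathbf{Q}_k \mathbf{U}_{k+1} \tag{41}$$
Applying this equation iteratively yields: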
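$$\mathbf{U}_1 = \mathbf{Q}_1 \mathbf{Q}_2 \cdots \mathbf{Q}_N \mathbf{U}_{N+1} \tag{42}$$
$$= \left(\prod_{k=1}^{N} \mathbf{Q}_k\right) \mathbf{U}_{N+1} \tag{43}$$
From Equation 24 we have that:
$$u_1^+(t) = \frac{1 + r_G}{2}\,u_G(t) + r_G\,u_1^-(t) \tag{44}$$
$$\frac{1 + r_G}{2}\,u_G(t) = u_1^+(t) - r_G\,u_1^-(t) \tag{45}$$
$$u_G(t) = \frac{2}{1 + r_G}\,u_1^+(t) - \frac{2 r_G}{1 + r_G}\,u_1^-(t) \tag{46}$$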
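So (R&S Eqn. 3.70, p. 94):
$$U_G(z) = \frac{2}{1 + r_G}\,U_1^+(z) - \frac{2 r_G}{1 + r_G}\,U_1^-(z) \tag{47}$$
Or (R&S Eqn. 3.71, p. 94):
$$U_G(z) = \left[\frac{2}{1 + r_G},\; -\frac{2 r_G}{1 + r_G}\right] \mathbf{U}_1 \tag{48}$$
Thus, since (R&S Eqn. 3.72, p. 94):
$$\mathbf{U}_{N+1} = \begin{bmatrix} U_L(z) \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} U_L(z) \tag{49}$$
we have (R&S Eqn. 3.73, p. 94):
$$\frac{1}{V(z)} = \frac{U_G(z)}{U_L(z)} = \left[\frac{2}{1 + r_G},\; -\frac{2 r_G}{1 + r_G}\right] \prod_{k=1}^{N} \mathbf{Q}_k \begin{bmatrix} 1 \\ 0 \end{bmatrix} \tag{50}$$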
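If we factor $z^{1/2}$ out of the $z^{1/2}$, $z^{-1/2}$ terms of $\mathbf{Q}_k$ as follows (R&S Eqn. 3.74, p. 94):
$$\mathbf{Q}_k = z^{1/2} \begin{bmatrix} \dfrac{1}{1 + r_k} & \dfrac{-r_k}{1 + r_k} \\[2ex] \dfrac{-r_k z^{-1}}{1 + r_k} & \dfrac{z^{-1}}{1 + r_k} \end{bmatrix} \tag{51}$$
$$= z^{1/2}\,\hat{\mathbf{Q}}_k \tag{52}$$
Then we have (R&S Eqn. 3.75, p. 94):
$$\frac{1}{V(z)} = z^{N/2} \left[\frac{2}{1 + r_G},\; -\frac{2 r_G}{1 + r_G}\right] \prod_{k=1}^{N} \hat{\mathbf{Q}}_k \begin{bmatrix} 1 \\ 0 \end{bmatrix} \tag{53}$$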
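From which we can eventually derive (R&S Eqn. 3.78, p. 95):
$$V(z) = \frac{0.5\,(1 + r_G) \prod_{k=1}^{N} (1 + r_k)\, z^{-N/2}}{[1,\, -r_G] \begin{bmatrix} 1 & -r_1 \\ -r_1 z^{-1} & z^{-1} \end{bmatrix} \cdots \begin{bmatrix} 1 & -r_N \\ -r_N z^{-1} & z^{-1} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix}} \tag{54}$$
$$= \frac{0.5\,(1 + r_G) \prod_{k=1}^{N} (1 + r_k)\, z^{-N/2}}{D(z)} \tag{55}$$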
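Equations 54 and 55 can be evaluated directly at any complex $z$ by multiplying the 2-by-2 matrices from right to left. The sketch below is one possible way to do this with plain complex arithmetic; the function names and the test values of the reflection coefficients are assumptions made for illustration.

```python
import cmath

def d_of_z(reflections, r_g, z):
    """Denominator of Equation 54: [1, -r_G] M_1 ... M_N [1, 0]^T,
    where M_k = [[1, -r_k], [-r_k / z, 1 / z]]."""
    top, bottom = 1.0 + 0j, 0.0 + 0j      # start from the column vector [1, 0]^T
    for r in reversed(reflections):       # apply M_N first and M_1 last
        top, bottom = top - r * bottom, (-r * top + bottom) / z
    return top - r_g * bottom             # left-multiply by the row [1, -r_G]

def v_of_z(reflections, r_g, z):
    """Equation 55: V(z) = 0.5 (1 + r_G) prod(1 + r_k) z^{-N/2} / D(z)."""
    gain = 0.5 * (1 + r_g)
    for r in reflections:
        gain *= 1 + r
    n = len(reflections)
    return gain * z ** (-n / 2) / d_of_z(reflections, r_g, z)

z = cmath.exp(0.4j)                        # arbitrary point on the unit circle
d_two = d_of_z([0.3, -0.2], 1.0, z)        # two sections, r_G = 1
v_one = v_of_z([0.5], 1.0, z)              # one section, r_G = 1
```

For two sections with $r_G = 1$, multiplying the matrices out by hand gives $D(z) = 1 + r_1(1 + r_2)z^{-1} + r_2 z^{-2}$, which `d_two` reproduces; sweeping `z` around the unit circle with `v_of_z` would trace out the frequency response.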
$D(z)$ has the form (R&S Eqn. 3.79, p. 95):
$$D(z) = 1 - \sum_{k=1}^{N} \alpha_k z^{-k} \tag{56}$$
The transfer function of the lossless tube has a delay corresponding to the number of sections and has no zeroes, only poles (resonances).
If we assume $r_G = 1$ (infinite impedance at the glottis), then we can evaluate the following recursion to solve for $D(z)$ (R&S Eqn. 3.89, p. 96):
$$D_0(z) = 1 \tag{57}$$
$$D_k(z) = D_{k-1}(z) + r_k z^{-k} D_{k-1}(z^{-1}), \quad k = 1, 2, \ldots, N \tag{58}$$
$$D(z) = D_N(z) \tag{59}$$
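The recursion in Equations 57 to 59 maps naturally onto coefficient lists. The following is a minimal sketch, assuming $r_G = 1$ as in the text; the function name and the example reflection coefficients are hypothetical. It uses the fact that $z^{-k} D_{k-1}(z^{-1})$ is $D_{k-1}(z)$ with its coefficients reversed and shifted by one position.

```python
def denominator_coefficients(reflections):
    """Return [d_0, d_1, ..., d_N] with D(z) = sum_i d_i z^{-i} (Equations 57-59).

    Assumes r_G = 1. In the notation of Equation 56, alpha_k = -d_k for k >= 1.
    """
    d = [1.0]                                  # D_0(z) = 1
    for r in reflections:
        prev = d + [0.0]                       # D_{k-1}(z), padded to degree k
        rev = [0.0] + d[::-1]                  # z^{-k} D_{k-1}(1/z)
        d = [p + r * q for p, q in zip(prev, rev)]
    return d

# Two sections with illustrative reflection coefficients r_1 = 0.3, r_2 = -0.2.
d = denominator_coefficients([0.3, -0.2])
```

For two sections the result should match the hand expansion $D(z) = 1 + r_1(1 + r_2)z^{-1} + r_2 z^{-2}$, that is, coefficients close to 1, 0.24 and -0.2.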

Concatenated Lossless Tubes

(Rabiner & Schafer, Fig. 3.32, p. 83)


(Rabiner & Schafer, Fig. 3.33, p. 84)


(Rabiner & Schafer, Fig. 3.34, p. 85)


(Rabiner & Schafer, Fig. 3.19, p. 70)


Signal Flow Diagram at Lips

(Rabiner & Schafer, Fig. 3.35, p. 86)


Signal Flow Diagram at Glottis

(Rabiner & Schafer, Fig. 3.36, p. 87)


Flow Diagram of Two-Tube Model

(Rabiner & Schafer, Fig. 3.37, p. 87)


Side Note on Nasal Tract

(Rabiner & Schafer, Fig. 3.27a, p. 78)


Transfer Function: Background Notions

(Rabiner & Schafer, Fig. 2.2a, p. 12)
