
ECE 598: Speech Synthesis
The Vocal Tract and Lossless Tube Models

Richard Sproat
http://www.linguistics.uiuc.edu/rws/
URL for this course: http://catarina.ai.uiuc.edu/ECE598/

Literature
Flanagan, J. 1972. Speech Analysis, Synthesis and Perception. Second Edition. Springer-Verlag.
Rabiner, L. and Schafer, R.W. 1978. Digital Processing of Speech Signals. Prentice-Hall.

ECE 598: Vocal Tract and Lossless Tube Models

The Vocal Mechanism

(Flanagan, Fig. 2.1, p. 10)



Schematic of Vocal Mechanism

(Flanagan, Fig. 3.1, p. 24)



General Discrete-Time Model for Speech Production

(Rabiner & Schafer, Fig. 3.50, p. 105)



Modeling Speech
A realistic model would have to model all of the above, plus loss due to soft tissue.
We start with a simpler model: the uniform lossless tube.

(Rabiner & Schafer, Fig. 3.14a, p. 62)


Term Definitions
$p = p(x, t)$ is the variation in sound pressure in the tube at position $x$ and time $t$
$u = u(x, t)$ is the variation in volume velocity flow at position $x$ and time $t$
$\rho$ is the density of air in the tube
$c$ is the velocity of sound
$A = A(x, t)$ is the area function of the tube


Area Function
For an arbitrary tube, $A(x, t)$ would have some varying shape.

(Rabiner & Schafer, Fig. 3.13a, p. 61)

For a uniform tube, $A(x, t) = A(x) = A$ is a constant.


Uniform Tubes & the Wave Equation


With a uniform tube, the wave equations describing the relation between pressure and volume velocity can be expressed as (Rabiner & Schafer, p. 62, Eqn. 3.2):

\begin{align}
-\frac{\partial p}{\partial x} &= \frac{\rho}{A}\,\frac{\partial u}{\partial t} \tag{1} \\
-\frac{\partial u}{\partial x} &= \frac{A}{\rho c^2}\,\frac{\partial p}{\partial t} \tag{2}
\end{align}

For instance, the first equation states that the change in pressure with respect to distance along the tube is related to the change in volume velocity with respect to time, times a constant $\rho/A$.


Relation to Lossless Electrical Transmission Lines


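The wave equations for a uniform lossless tube are identical in form to the equations for a lossless uniform electrical transmission line (R&S Eqn. 3.4, p. 63):

\begin{align}
-\frac{\partial v}{\partial x} &= L\,\frac{\partial i}{\partial t} \tag{3} \\
-\frac{\partial i}{\partial x} &= C\,\frac{\partial v}{\partial t} \tag{4}
\end{align}

where $v$ is voltage, $i$ is current, $L$ is inductance and $C$ is capacitance.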


Relation to Lossless Electrical Transmission Lines: Analogies


Acoustics                               Electricity
p: pressure                             v: voltage
u: volume velocity                      i: current
$\rho/A$: acoustic inductance           L: inductance
$A/(\rho c^2)$: acoustic capacitance    C: capacitance

(R&S Table 3.3, p. 63)
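
The analogy can be checked numerically. The sketch below is illustrative only: the air density and sound speed are assumed CGS values, not figures from these slides, and the function names are hypothetical. It uses the standard transmission-line results that the characteristic impedance is sqrt(L/C) and the propagation speed is 1/sqrt(LC), which for the acoustic quantities reduce to $\rho c/A$ and $c$.

```python
import math

# Assumed illustrative constants (not taken from the slides), CGS units.
RHO = 1.14e-3  # density of air, g/cm^3
C = 3.5e4      # velocity of sound, cm/s

def acoustic_inductance(area):
    """Acoustic inductance per unit length: rho / A."""
    return RHO / area

def acoustic_capacitance(area):
    """Acoustic capacitance per unit length: A / (rho * c^2)."""
    return area / (RHO * C ** 2)

def characteristic_impedance(area):
    """sqrt(L / C) of the analogous line; reduces to rho * c / A."""
    return math.sqrt(acoustic_inductance(area) / acoustic_capacitance(area))

def propagation_speed(area):
    """1 / sqrt(L * C) of the analogous line; reduces to c."""
    return 1.0 / math.sqrt(acoustic_inductance(area) * acoustic_capacitance(area))

area = 5.0  # cm^2, a hypothetical tube cross-section
print(characteristic_impedance(area), RHO * C / area)
print(propagation_speed(area), C)
```

The two numbers on each printed line should agree up to rounding, whatever area is chosen.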


Solution of Wave Equation


Returning to the wave equations, it can be shown that the solutions have the form (R&S Eqn. 3.3, p. 62):

\begin{align}
u(x, t) &= u^+(t - x/c) - u^-(t + x/c) \tag{5} \\
p(x, t) &= \frac{\rho c}{A}\left[u^+(t - x/c) + u^-(t + x/c)\right] \tag{6}
\end{align}

where $u^+$ and $u^-$ are, respectively, positive direction and negative direction traveling waves.
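
One way to convince oneself that the traveling-wave form satisfies the wave equations is a finite-difference check. The following is a minimal sketch, not part of the original slides: the pulse shapes, the constants (CGS units) and the step sizes are all assumptions chosen for illustration.

```python
import math

# Assumed illustrative constants (CGS units), not from the slides.
RHO, C, A = 1.14e-3, 3.5e4, 5.0

def u_plus(s):
    """Hypothetical forward-wave shape: a smooth pulse of its argument s (seconds)."""
    return math.exp(-(s / 1e-4) ** 2)

def u_minus(s):
    """Hypothetical backward-wave shape: a smaller, later pulse."""
    return 0.3 * math.exp(-((s - 2e-4) / 1e-4) ** 2)

def u(x, t):
    """Volume velocity: forward wave minus backward wave."""
    return u_plus(t - x / C) - u_minus(t + x / C)

def p(x, t):
    """Pressure: (rho*c/A) times forward wave plus backward wave."""
    return (RHO * C / A) * (u_plus(t - x / C) + u_minus(t + x / C))

def residuals(x, t, dx=1e-3, dt=1e-8):
    """Central-difference residuals of the two wave equations at (x, t)."""
    dp_dx = (p(x + dx, t) - p(x - dx, t)) / (2 * dx)
    du_dx = (u(x + dx, t) - u(x - dx, t)) / (2 * dx)
    dp_dt = (p(x, t + dt) - p(x, t - dt)) / (2 * dt)
    du_dt = (u(x, t + dt) - u(x, t - dt)) / (2 * dt)
    res1 = -dp_dx - (RHO / A) * du_dt               # first wave equation
    res2 = -du_dx - (A / (RHO * C ** 2)) * dp_dt    # second wave equation
    return res1, res2

print(residuals(3.0, 1.5e-4))
```

Both residuals should be close to zero relative to the size of the individual terms; any choice of differentiable pulse shapes should behave the same way.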


Concatenated Lossless Tubes

(Rabiner & Schafer, Fig. 3.32, p. 83)



Concatenated Lossless Tubes


Pressure and volume velocity for the kth tube are related by (R&S Eqn. 3.35, p. 83):

\begin{align}
p_k(x, t) &= \frac{\rho c}{A_k}\left[u_k^+(t - x/c) + u_k^-(t + x/c)\right] \tag{7} \\
u_k(x, t) &= u_k^+(t - x/c) - u_k^-(t + x/c) \tag{8}
\end{align}

What happens at the junction of two tubes?


Continuity
Continuity must obtain, so that the pressure and volume velocity at the right edge of the kth tube must be the same as at the left edge of the (k + 1)st tube. Thus (R&S Eqn. 3.36, p. 84):

\begin{align}
p_k(l_k, t) &= p_{k+1}(0, t) \tag{9} \\
u_k(l_k, t) &= u_{k+1}(0, t) \tag{10}
\end{align}

Substitution of Equations 7-8 into 9-10 yields (R&S Eqn. 3.37a, p. 84):

\[ \frac{A_{k+1}}{A_k}\left[u_k^+(t - \tau_k) + u_k^-(t + \tau_k)\right] = u_{k+1}^+(t) + u_{k+1}^-(t) \tag{11} \]

where $\tau_k = l_k/c$ is the time required to travel the kth tube, and (R&S Eqn. 3.37b, p. 84):

\[ u_k^+(t - \tau_k) - u_k^-(t + \tau_k) = u_{k+1}^+(t) - u_{k+1}^-(t) \tag{12} \]


Continuity

(Rabiner & Schafer, Fig. 3.33, p. 84)



Continuity
It is important to note that it is not the case that

\[ u_k^+(t - \tau_k) = u_{k+1}^+(t) \tag{13} \]

Rather, it is only the combination of the forward and backward waves that must be equal at the boundary: their difference for the volume velocity, and their area-scaled sum for the pressure. In particular, some of the forward wave will be reflected back at the right-hand boundary and some of the backward wave will be reflected forward at the left-hand boundary.


Continuity Continued
From Equations 11-12 (solving 12 for $u_k^-(t + \tau_k)$, substituting the result into 11; and then subtracting 12 from 11) we have that (R&S Eqn. 3.38, p. 84):

\begin{align}
u_{k+1}^+(t) &= \frac{2A_{k+1}}{A_{k+1} + A_k}\,u_k^+(t - \tau_k) + \frac{A_{k+1} - A_k}{A_{k+1} + A_k}\,u_{k+1}^-(t) \tag{14} \\
u_k^-(t + \tau_k) &= -\frac{A_{k+1} - A_k}{A_{k+1} + A_k}\,u_k^+(t - \tau_k) + \frac{2A_k}{A_{k+1} + A_k}\,u_{k+1}^-(t) \tag{15}
\end{align}

Here,

\[ r_k = \frac{A_{k+1} - A_k}{A_{k+1} + A_k} \tag{16} \]

is termed the reflection coefficient, since it determines the amount of $u_{k+1}^-(t)$ that is reflected at the junction. Note that $-1 \le r_k \le 1$.
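
As a small illustration of the reflection coefficient formula, the sketch below maps a list of section areas to the coefficients at the interior junctions. The function name and the example areas are hypothetical.

```python
def reflection_coefficients(areas):
    """Return r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k) for each interior junction."""
    if any(a <= 0 for a in areas):
        raise ValueError("areas must be positive")
    return [(a_next - a) / (a_next + a) for a, a_next in zip(areas, areas[1:])]

areas = [1.0, 4.0, 4.0, 2.0]  # hypothetical cross-sections, arbitrary units
print(reflection_coefficients(areas))
```

A widening junction gives a positive coefficient, equal areas give zero (no reflection), and a narrowing junction gives a negative coefficient; because the areas are positive, every value lies between -1 and 1.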

Substituting, we have (R&S Eqn. 3.41, p. 85):

\begin{align}
u_{k+1}^+(t) &= (1 + r_k)\,u_k^+(t - \tau_k) + r_k\,u_{k+1}^-(t) \tag{17} \\
u_k^-(t + \tau_k) &= -r_k\,u_k^+(t - \tau_k) + (1 - r_k)\,u_{k+1}^-(t) \tag{18}
\end{align}
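
The junction equations 17-18 can be sketched as a small scattering function and checked against the continuity conditions 11-12. This is an illustrative sketch: the function name and the numerical values are assumptions, and the common factor $\rho c$ is dropped from the pressure comparison.

```python
def scatter(r_k, fwd_in, bwd_in):
    """Apply Eqs. 17-18 at one junction.

    fwd_in is u_k^+(t - tau_k), the wave arriving from the left.
    bwd_in is u_{k+1}^-(t), the wave arriving from the right.
    Returns (u_{k+1}^+(t), u_k^-(t + tau_k)).
    """
    fwd_out = (1 + r_k) * fwd_in + r_k * bwd_in
    bwd_out = -r_k * fwd_in + (1 - r_k) * bwd_in
    return fwd_out, bwd_out

a_k, a_next = 2.0, 6.0                 # hypothetical areas
r = (a_next - a_k) / (a_next + a_k)    # Eq. 16
fwd_in, bwd_in = 1.0, 0.25             # hypothetical incoming waves
fwd_out, bwd_out = scatter(r, fwd_in, bwd_in)

# Volume velocity continuity (Eq. 12): net flow matches on both sides.
flow_left = fwd_in - bwd_out
flow_right = fwd_out - bwd_in

# Pressure continuity (Eq. 11), with the common factor rho*c omitted.
press_left = (fwd_in + bwd_out) / a_k
press_right = (fwd_out + bwd_in) / a_next

print(flow_left, flow_right, press_left, press_right)
```

Note that `fwd_out` differs from `fwd_in`: only the net flow and the pressure are continuous, not the individual traveling waves.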


Signal-Flow Representation

(Rabiner & Schafer, Fig. 3.34, p. 85)


Signal-Flow Representation
Thus, for example, $u_k^-(t + \tau_k)$ is obtained as follows:


Boundary Conditions
We have so far ignored the boundary conditions at the lips and glottis. A good model of radiation at the lips is the spherical baffle. This turns out to be hard to model, so another approximation is the planar baffle, which assumes that the size of the orifice is relatively small compared to the curvature of the sphere.

(Rabiner & Schafer, Fig. 3.19, p. 70)


Boundary Conditions at Lips


\[ u_N^-(t + \tau_N) = -r_L\,u_N^+(t - \tau_N) \tag{19} \]

(R&S Eqn. 3.44, p. 86) where the reflection coefficient at the lips $r_L$ is (R&S Eqn. 3.45, p. 86):

\[ r_L = \frac{\rho c/A_N - Z_L}{\rho c/A_N + Z_L} \tag{20} \]

($Z_L$ is the radiation impedance at the lips)

The output volume velocity at the lips is (R&S Eqn. 3.46, p. 86):

\begin{align}
u_N(l_N, t) &= u_N^+(t - \tau_N) - u_N^-(t + \tau_N) \tag{21} \\
&= (1 + r_L)\,u_N^+(t - \tau_N) \tag{22}
\end{align}
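
The lip boundary condition can be illustrated with two limiting cases. This is a minimal sketch that assumes a real-valued radiation impedance for simplicity (in general $Z_L$ depends on frequency); the function names and values are hypothetical.

```python
def lip_reflection(z_tube, z_load):
    """r_L = (rho*c/A_N - Z_L) / (rho*c/A_N + Z_L); z_tube stands for rho*c/A_N."""
    return (z_tube - z_load) / (z_tube + z_load)

def lip_output(r_l, fwd_in):
    """Return (reflected wave u_N^-, output volume velocity u_N) for a forward wave."""
    bwd = -r_l * fwd_in          # Eq. 19
    return bwd, fwd_in - bwd     # Eq. 21, which equals (1 + r_L) * fwd_in (Eq. 22)

z_tube = 8.0  # hypothetical value of rho*c/A_N

# Z_L = 0 (acoustic short circuit): r_L = 1, the output flow is doubled.
print(lip_output(lip_reflection(z_tube, 0.0), 1.0))

# Z_L = rho*c/A_N (matched load): r_L = 0, nothing is reflected.
print(lip_output(lip_reflection(z_tube, z_tube), 1.0))
```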

Signal Flow Diagram at Lips

(Rabiner & Schafer, Fig. 3.35, p. 86)


Boundary Conditions at the Glottis

(Rabiner & Schafer, Fig. 3.28, p. 79)

Opening and closing of the glottis is controlled by:
Air pressure in lungs
Tension/stiffness in vocal cords
Area of glottal opening under rest conditions

It is also affected by the vocal tract (coupling), so the system is non-linear.

Boundary Conditions at the Glottis


But the coupling is weak, so it is common to ignore the interaction. This allows us to have a glottal impedance of the form:

\[ Z_G(\Omega) = R_G + j\Omega L_G \tag{23} \]

($R_G$ is resistance at glottis, $L_G$ is inductance at glottis)


Boundary Conditions at the Glottis


(R&S Eqn. 3.49, p. 87)

\[ u_1^+(t) = \frac{1 + r_G}{2}\,u_G(t) + r_G\,u_1^-(t) \tag{24} \]

where (R&S Eqn. 3.50, p. 87):

\[ r_G = \frac{Z_G - \rho c/A_1}{Z_G + \rho c/A_1} \tag{25} \]


Signal Flow Diagram at Glottis

(Rabiner & Schafer, Fig. 3.36, p. 87)


Flow Diagram of Two-Tube Model

(Rabiner & Schafer, Fig. 3.37, p. 87)


Side Note on Nasal Tract


The model we have just developed has only resonances (poles, formants), meaning that there are particular frequencies at which the system responds most strongly to an excitation. For the /uh/ in "but", for a typical male talker, the first two resonances will be at about 500 Hz and 1200 Hz. With nasal sounds the model changes, since we are now dealing with two tubes rather than one. For nasal stops the oral tube is closed, so we have a situation like the following:

(Rabiner & Schafer, Fig. 3.27a, p. 78)



Side Note on Nasal Tract


The closed oral cavity traps energy at certain frequencies thus introducing zeroes (antiresonances) into the output from the nostrils. It is common to avoid modeling the nasal tract directly, instead mimicking the presence of zeroes using a large number of tubes in the single tract model, with a resulting large number of resonances.


Transfer Function: Background Notions


Unit impulse sequence (unit sample) (R&S Eqn. 2.1, p. 11):

\begin{align}
\delta(n) &= 1 \quad \text{if } n = 0 \tag{26} \\
&= 0 \quad \text{otherwise} \tag{27}
\end{align}

(Rabiner & Schafer, Fig. 2.2a, p. 12)

ECE 598: Transfer Function


Transfer Function: Background Notions


Linear shift-invariant systems are completely characterized by their response to a unit sample input. For a linear shift-invariant system, you can compute the output $y(n)$ for a given input $x(n)$ by convolving the latter with the unit sample response $h(n)$ (R&S Eqn. 2.5a, p. 13):

\[ y(n) = \sum_{k=-\infty}^{\infty} x(k)\,h(n - k) = x(n) * h(n) \tag{28} \]

(where $f * g \equiv \int_0^t f(\tau)\,g(t - \tau)\,d\tau$)
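
For finite sequences the convolution sum can be written directly. The sketch below is illustrative: the function name and the example sequences are hypothetical, and both sequences are assumed to start at n = 0.

```python
def convolve(x, h):
    """y(n) = sum over k of x(k) * h(n - k), for finite sequences starting at n = 0."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            y[k + m] += xk * hm   # the term x(k) h(n - k) with n = k + m
    return y

h = [1.0, 0.5, 0.25]   # hypothetical unit sample response
delta = [1.0]          # unit sample
x = [1.0, 2.0]         # hypothetical input

print(convolve(delta, h))  # feeding in the unit sample returns h itself
print(convolve(x, h))
```

The first call shows why the unit sample response characterizes the system: the output for any other input is a weighted sum of shifted copies of it.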


Frequency domain transforms, z-transform (R&S Eqn. 2.6a, p. 13):

\[ X(z) = \sum_{n=-\infty}^{\infty} x(n)\,z^{-n} \tag{29} \]

where $z$ is some complex number. Various properties of z-transforms include that the z-transform of $x(n) * h(n)$ is equal to $X(z)H(z)$.

If we set $z = e^{j\omega}$ we get (R&S Eqn. 2.9a, p. 15):

\[ X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)\,e^{-j\omega n} \tag{30} \]

the Fourier transform.

Note that $H(z)$, the z-transform of the unit sample response, is called the system function.
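
The convolution property can be checked numerically for short sequences. This is a sketch under stated assumptions: the sequences are finite and start at n = 0, the helper names are hypothetical, and the evaluation point is an arbitrary point on the unit circle.

```python
import cmath

def z_transform(x, z):
    """X(z) = sum over n of x(n) z^{-n}, for a finite sequence starting at n = 0."""
    return sum(xn * z ** (-n) for n, xn in enumerate(x))

def convolve(x, h):
    """Direct convolution sum for finite sequences starting at n = 0."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            y[k + m] += xk * hm
    return y

x = [1.0, 2.0]            # hypothetical input
h = [1.0, 0.5, 0.25]      # hypothetical unit sample response
y = convolve(x, h)

z = cmath.exp(1j * 0.3)   # z = e^{j omega} with omega = 0.3 rad/sample
lhs = z_transform(y, z)                       # transform of x * h
rhs = z_transform(x, z) * z_transform(h, z)   # X(z) H(z)
print(abs(lhs - rhs))
```

Because z was chosen on the unit circle, `z_transform(h, z)` here is also the Fourier transform of h evaluated at that frequency.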


Transfer Function
We seek the transfer function

\[ V(z) = \frac{U_L(z)}{U_G(z)} \]

(R&S Eqn. 3.63, p. 92)

Returning to our lossless tube model, suppose we have a model with N equal length sections, and we sample with period $T = 2\tau$, where $\tau$ is the time required to traverse each section (the delay of each section). Measuring time in units of $T$, so that each section contributes a delay of 1/2, the previous equations for the volume velocity at each section become:

\begin{align}
u_{k+1}^+(t) &= (1 + r_k)\,u_k^+\!\left(t - \tfrac{1}{2}\right) + r_k\,u_{k+1}^-(t) \tag{31} \\
u_k^-(t) &= -r_k\,u_k^+(t - 1) + (1 - r_k)\,u_{k+1}^-\!\left(t - \tfrac{1}{2}\right) \tag{32}
\end{align}

A property of the z-transform is that for $x(n + n_0)$ the z-transform is $z^{n_0} X(z)$ (shift). Also, for $a x_1(n) + b x_2(n)$ we get $a X_1(z) + b X_2(z)$ (linearity).

Transfer Function
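Thus the z-transforms of the junction equations are (R&S Eqn. 3.64, p. 92):

\begin{align}
U_{k+1}^+(z) &= (1 + r_k)\,z^{-1/2}\,U_k^+(z) + r_k\,U_{k+1}^-(z) \tag{33} \\
U_k^-(z) &= -r_k\,z^{-1}\,U_k^+(z) + (1 - r_k)\,z^{-1/2}\,U_{k+1}^-(z) \tag{34}
\end{align}

which can be solved to (R&S Eqn. 3.65, pp. 92-93):

\begin{align}
U_k^+(z) &= \frac{z^{1/2}}{1 + r_k}\,U_{k+1}^+(z) - \frac{r_k\,z^{1/2}}{1 + r_k}\,U_{k+1}^-(z) \tag{35} \\
U_k^-(z) &= \frac{-r_k\,z^{-1/2}}{1 + r_k}\,U_{k+1}^+(z) + \frac{z^{-1/2}}{1 + r_k}\,U_{k+1}^-(z) \tag{36}
\end{align}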


Transfer Function: Boundary Condition at Lips


Treat the boundary condition at the lips $U_L(z)$ as the boundary to a fictitious (N + 1)st tube of infinite length (so that there is no negative-going wave). Then (R&S, p. 93):

\begin{align}
U_{N+1}^+(z) &= U_L(z) \tag{37} \\
U_{N+1}^-(z) &= 0 \tag{38}
\end{align}


Transfer Function
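Let (R&S Eqns. 3.67-3.68, p. 93):

\begin{align}
U_k &= \begin{bmatrix} U_k^+(z) \\ U_k^-(z) \end{bmatrix} \tag{39} \\
Q_k &= \begin{bmatrix} \dfrac{z^{1/2}}{1 + r_k} & \dfrac{-r_k\,z^{1/2}}{1 + r_k} \\[2ex] \dfrac{-r_k\,z^{-1/2}}{1 + r_k} & \dfrac{z^{-1/2}}{1 + r_k} \end{bmatrix} \tag{40}
\end{align}

Then we can express equations 35-36 as the system (R&S Eqn. 3.66, p. 93):

\[ U_k = Q_k U_{k+1} \tag{41} \]

Applying this equation iteratively yields:

\begin{align}
U_1 &= Q_1 Q_2 \cdots Q_N\,U_{N+1} \tag{42} \\
&= \left(\prod_{k=1}^{N} Q_k\right) U_{N+1} \tag{43}
\end{align}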

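From 24 we have that:

\begin{align}
u_1^+(t) &= \frac{1 + r_G}{2}\,u_G(t) + r_G\,u_1^-(t) \tag{44} \\
\frac{1 + r_G}{2}\,u_G(t) &= u_1^+(t) - r_G\,u_1^-(t) \tag{45} \\
u_G(t) &= \frac{2}{1 + r_G}\,u_1^+(t) - \frac{2 r_G}{1 + r_G}\,u_1^-(t) \tag{46}
\end{align}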



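So (R&S Eqn. 3.70, p. 94):

\[ U_G(z) = \frac{2}{1 + r_G}\,U_1^+(z) - \frac{2 r_G}{1 + r_G}\,U_1^-(z) \tag{47} \]

Or (R&S Eqn. 3.71, p. 94):

\[ U_G(z) = \left[\frac{2}{1 + r_G},\; -\frac{2 r_G}{1 + r_G}\right] U_1(z) \tag{48} \]

Thus since (R&S Eqn. 3.72, p. 94):

\[ U_{N+1} = \begin{bmatrix} U_L(z) \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} U_L(z) \tag{49} \]

We have (R&S Eqn. 3.73, p. 94):

\[ \frac{1}{V(z)} = \frac{U_G(z)}{U_L(z)} = \left[\frac{2}{1 + r_G},\; -\frac{2 r_G}{1 + r_G}\right] \prod_{k=1}^{N} Q_k \begin{bmatrix} 1 \\ 0 \end{bmatrix} \tag{50} \]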


Transfer Function
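If we factor out the $z^{1/2}$, $z^{-1/2}$ terms for $Q_k$ as follows (R&S Eqn. 3.74, p. 94):

\begin{align}
Q_k &= z^{1/2} \begin{bmatrix} \dfrac{1}{1 + r_k} & \dfrac{-r_k}{1 + r_k} \\[2ex] \dfrac{-r_k\,z^{-1}}{1 + r_k} & \dfrac{z^{-1}}{1 + r_k} \end{bmatrix} \tag{51} \\
&= z^{1/2}\,\hat{Q}_k \tag{52}
\end{align}

Then we have (R&S Eqn. 3.75, p. 94):

\[ \frac{1}{V(z)} = z^{N/2} \left[\frac{2}{1 + r_G},\; -\frac{2 r_G}{1 + r_G}\right] \prod_{k=1}^{N} \hat{Q}_k \begin{bmatrix} 1 \\ 0 \end{bmatrix} \tag{53} \]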

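From which we can eventually derive (R&S Eqn. 3.78, p. 95):

\begin{align}
V(z) &= \frac{0.5\,(1 + r_G) \prod_{k=1}^{N} (1 + r_k)\, z^{-N/2}}{[1,\; -r_G] \begin{bmatrix} 1 & -r_1 \\ -r_1 z^{-1} & z^{-1} \end{bmatrix} \cdots \begin{bmatrix} 1 & -r_N \\ -r_N z^{-1} & z^{-1} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix}} \tag{54} \\
&= \frac{0.5\,(1 + r_G) \prod_{k=1}^{N} (1 + r_k)\, z^{-N/2}}{D(z)} \tag{55}
\end{align}

$D(z)$ has the form (R&S Eqn. 3.79, p. 95):

\[ D(z) = 1 - \sum_{k=1}^{N} \alpha_k z^{-k} \tag{56} \]

The transfer function of the lossless tube model has a delay corresponding to the number of sections and has no zeroes, only poles (resonances).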
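
The denominator D(z) can be computed by carrying out the matrix product symbolically, with each polynomial in $z^{-1}$ stored as a coefficient list. The sketch below is illustrative only: the helper names are hypothetical, the reflection coefficients are made-up values, and the list `refl` is assumed to hold $r_1, \ldots, r_N$, where $r_N$ is the junction with the fictitious (N + 1)st tube at the lips.

```python
import cmath

def poly_add(a, b):
    """Add two polynomials in z^{-1} given as coefficient lists."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [ai + bi for ai, bi in zip(a, b)]

def poly_scale(a, s):
    """Multiply a polynomial by the scalar s."""
    return [s * ai for ai in a]

def denominator(refl, r_g):
    """Coefficients of D(z) = [1, -r_G] M_1 ... M_N [1, 0]^T in powers of z^{-1}."""
    top, bottom = [1.0], [0.0]            # the column vector [1, 0]^T
    for r in reversed(refl):              # apply M_N first, M_1 last
        new_top = poly_add(top, poly_scale(bottom, -r))
        # The second row of M_k carries a factor z^{-1}: prepend a zero coefficient.
        new_bottom = [0.0] + poly_add(poly_scale(top, -r), bottom)
        top, bottom = new_top, new_bottom
    return poly_add(top, poly_scale(bottom, -r_g))

def magnitude(refl, r_g, omega):
    """|V(e^{j omega})| from Eq. 55; the delay z^{-N/2} has unit magnitude."""
    gain = 0.5 * (1 + r_g)
    for r in refl:
        gain *= 1 + r
    z_inv = cmath.exp(-1j * omega)
    d = sum(c * z_inv ** k for k, c in enumerate(denominator(refl, r_g)))
    return abs(gain) / abs(d)

refl = [0.5, 0.25]   # hypothetical r_1, r_2
print(denominator(refl, 1.0))
print(magnitude(refl, 1.0, 0.0), magnitude(refl, 1.0, 1.0))
```

Since the numerator of V(z) is a constant times a pure delay, the shape of the magnitude response is set entirely by D(z), which is the all-pole property stated above.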

If we assume $r_G = 1$ (infinite impedance at the glottis), then we can evaluate the following recursion to solve $D(z)$ (R&S Eqn. 3.89, p. 96):

\begin{align}
D_0(z) &= 1 \tag{57} \\
D_k(z) &= D_{k-1}(z) + r_k\,z^{-k}\,D_{k-1}(z^{-1}), \quad k = 1, 2, \ldots, N \tag{58} \\
D(z) &= D_N(z) \tag{59}
\end{align}
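
The recursion translates almost directly into code when each $D_k(z)$ is stored as a list of coefficients of $z^0, z^{-1}, \ldots, z^{-k}$. The following is a minimal sketch with a hypothetical function name and made-up reflection coefficients; it assumes $r_G = 1$, as the recursion does.

```python
def denominator_recursive(refl):
    """Evaluate D_k(z) = D_{k-1}(z) + r_k z^{-k} D_{k-1}(z^{-1}), assuming r_G = 1.

    refl holds r_1, ..., r_N. Returns the coefficients of D_N(z) in powers of z^{-1}.
    """
    d = [1.0]  # D_0(z) = 1
    for r in refl:
        # z^{-k} D_{k-1}(z^{-1}) has the coefficients of D_{k-1} in reverse order,
        # shifted by one delay, so its z^0 coefficient is zero.
        flipped = [0.0] + d[::-1]
        padded = d + [0.0]        # D_{k-1}(z) extended to degree k
        d = [p + r * f for p, f in zip(padded, flipped)]
    return d

refl = [0.5, 0.25]   # hypothetical r_1, r_2
print(denominator_recursive(refl))
```

For two sections this gives $D(z) = 1 + (r_1 + r_1 r_2) z^{-1} + r_2 z^{-2}$, which can be confirmed by multiplying out the matrix form of $D(z)$ by hand with $r_G = 1$.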
