Final

VOICE MORPHING
By: Aswin.S
S7T
Roll No: 15
WHAT IS VOICE MORPHING?
• Voice morphing is a technique for modifying a source speaker's speech so that it sounds as if it were spoken by a different, target speaker.
• In simpler terms, it changes the speech of one speaker into that of another.
• The technology was developed at the Los Alamos National Laboratory in New Mexico, USA, by George Papcun.
• Applications of voice morphing range from recreational to security uses.
WHAT DOES IT ACTUALLY DO?
• It modifies a source speaker's speech to sound as if it were spoken by a target speaker.
• Voice morphing enables speech patterns to be cloned.
• An accurate copy of a person's voice can be made, which can then be used to say anything in that person's voice.
NEED FOR VOICE MORPHING
• Text-to-speech (TTS) systems.
• Public speech systems.
• Special effects (just as video or image morphing is used).
• Reducing ethnic barriers.
HOW TO MORPH VOICE?
• We need to change the pitch from that of a male speaker to that of a female speaker. Recall that the excitation signal carries information about the speaker.
• We find the LPC (linear predictive coding) coefficients of the source and target signals and use them to interpolate between the two signals.
• This gives the new LPC coefficients:
new lpc coeff = constant * (lpc source) + (1 - constant) * (lpc target), where 0 <= constant <= 1
• The pitch of a female speaker is typically higher than that of a male speaker, often approaching twice as high. In our example, the pitch of the male speaker is 141 Hz and that of the female speaker is 210 Hz, a ratio of about 1.5.
• We therefore need a time-stretching algorithm with which to implement pitch shifting. We obtain the residue of the source signal and stretch it according to the value of the constant, which indicates where the morphed signal lies between the source and the target.
• For example, since the constant weights the source in the formula above, a constant of 0.8 gives a morphed signal closer in pitch to the source, and a constant of 0.2 gives a pitch closer to the target.
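The coefficient interpolation described above can be sketched in a few lines. This is a minimal illustration, assuming both coefficient sets have the same order; the function name `interpolate_lpc` and the sample coefficient values are hypothetical and do not come from the slides.

```python
def interpolate_lpc(lpc_source, lpc_target, constant):
    """Blend two equal-length LPC coefficient lists.

    Implements: constant * lpc_source + (1 - constant) * lpc_target.
    """
    if not 0.0 <= constant <= 1.0:
        raise ValueError("constant must lie in [0, 1]")
    if len(lpc_source) != len(lpc_target):
        raise ValueError("coefficient lists must have the same order")
    return [constant * s + (1.0 - constant) * t
            for s, t in zip(lpc_source, lpc_target)]


# Hypothetical 2nd-order coefficient sets, for illustration only.
source = [1.0, -1.2, 0.5]
target = [1.0, -0.8, 0.3]
blended = interpolate_lpc(source, target, 0.25)
print(blended)
```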
HOW DO WE SHIFT THE PITCH?
We break the residue signal into small windows and apply a fade-in and fade-out to each block. We then recombine the blocks to form the pitch-shifted signal. Based on the value of alpha, we can time-stretch the residue as required.
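One simple way to realize the windowing step is plain overlap-add, sketched below. This is an assumption about the scheme, not necessarily the one used in the project: `alpha` is treated as the stretch factor, a Hann window supplies the fade-in and fade-out, and the block size of 256 samples is arbitrary. Plain overlap-add can introduce phase discontinuities that more refined methods avoid.

```python
import math


def hann(n):
    """Periodic Hann window: fades each block in and out."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def ola_time_stretch(signal, alpha, block=256):
    """Stretch `signal` to roughly `alpha` times its length by overlap-add."""
    hop_out = block // 2                          # 50% overlap when writing
    hop_in = max(1, int(round(hop_out / alpha)))  # smaller read hop = longer output
    window = hann(block)
    n_blocks = max(1, (len(signal) - block) // hop_in + 1)
    out = [0.0] * (hop_out * (n_blocks - 1) + block)
    for k in range(n_blocks):
        start_in, start_out = k * hop_in, k * hop_out
        for i in range(block):
            if start_in + i < len(signal):
                out[start_out + i] += window[i] * signal[start_in + i]
    return out


# Hypothetical test tone: 100 Hz sine sampled at 8000 Hz.
tone = [math.sin(2.0 * math.pi * 100.0 * n / 8000.0) for n in range(4000)]
stretched = ola_time_stretch(tone, alpha=1.5)
print(len(tone), len(stretched))
```

With `alpha = 1` the read and write hops are equal and the overlapping Hann windows sum to one, so the interior of the signal is reproduced unchanged.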
HOW DO WE FINALLY MORPH?

We now have the pitch-shifted residue signal and the new LPC coefficients. We resample the pitch-shifted signal so that it plays at a faster rate (remember that the residue lasts longer after time stretching). Passing the resampled residue through the inverse of the LPC analysis filter, built from the new coefficients, produces the morphed speech.
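The final step can be sketched as a resampler followed by an all-pole filter. This is a minimal sketch under stated assumptions: linear interpolation stands in for a proper resampler, the coefficients follow the convention A(z) = 1 + a1 z^-1 + ... with a leading 1, and the residue and coefficient values are hypothetical.

```python
def resample_linear(signal, factor):
    """Read `signal` `factor` times faster using linear interpolation."""
    if not signal:
        return []
    n_out = int((len(signal) - 1) / factor) + 1
    out = []
    for n in range(n_out):
        pos = n * factor
        i = int(pos)
        frac = pos - i
        nxt = signal[i + 1] if i + 1 < len(signal) else signal[i]
        out.append((1.0 - frac) * signal[i] + frac * nxt)
    return out


def all_pole_filter(residue, lpc):
    """Apply 1/A(z), the inverse of the LPC analysis filter A(z)."""
    out = []
    for n, x in enumerate(residue):
        acc = x
        for k in range(1, len(lpc)):
            if n - k >= 0:
                acc -= lpc[k] * out[n - k]   # feedback from past outputs
        out.append(acc / lpc[0])
    return out


# Hypothetical stretched residue and interpolated coefficients.
residue = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
new_lpc = [1.0, -0.5]
morphed = all_pole_filter(resample_linear(residue, 1.5), new_lpc)
print(morphed)
```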
BLOCK DIAGRAM
[Figure: block diagram]
[Figure: time-domain plots of the source and target signals, showing the pitch]
MATCHING AND WARPING
DTW (Dynamic Time Warping)
- Dynamic time warping (DTW) is used to find the best match between the pitch of the two sounds.
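The textbook form of DTW can be sketched as follows, applied to two pitch contours given as lists of numbers. This is a generic illustration, assuming absolute difference as the local cost and the three standard step directions; the slides do not say which variant was used, and the sample contours are hypothetical.

```python
def dtw(a, b):
    """Return (total cost, alignment path) between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    # Walk back from the end to recover which frames were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


# Hypothetical pitch contours; the second holds its middle value longer.
total, path = dtw([1, 2, 3], [1, 2, 2, 3])
print(total, path)
```

Each pair in `path` gives a frame index in the first contour and the frame index in the second contour it is aligned with.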
SIGNAL RE-ESTIMATION
Loss during signal re-estimation
- Transforming the signals into the cepstral domain involves taking the magnitude of the spectrum. This discards the phase information in the representation of the data.
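The loss of phase can be illustrated with a small experiment: two frames with the same spectral magnitudes but different phases give the same real cepstrum. This is a sketch that assumes the real cepstrum (inverse DFT of the log magnitude spectrum) is the representation meant here; the frame values are hypothetical and a naive DFT is used to keep the example self-contained.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform, adequate for short frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(x):
    """Inverse DFT of the log magnitude spectrum; the phase is discarded."""
    n = len(x)
    log_mag = [math.log(abs(c) + 1e-12) for c in dft(x)]
    return [sum(log_mag[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


# Hypothetical frame and a circularly shifted copy of it:
# same spectral magnitudes, different phases.
frame = [0.0, 1.0, 0.5, -0.3, 0.2, 0.0, -0.1, 0.05]
shifted = frame[3:] + frame[:3]

cep_a = real_cepstrum(frame)
cep_b = real_cepstrum(shifted)
print(max(abs(p - q) for p, q in zip(cep_a, cep_b)))
```

Because the two waveforms differ while their cepstra match to within rounding error, the cepstrum alone cannot tell them apart, which is why a re-estimation step is needed to get back to a time-domain waveform.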
APPLICATIONS
• In public speech systems, the voice can be made to sound like that of a popular public speaker. This can be applied in many areas, such as railway announcements.
• Video and image morphing are used extensively for film and graphical special effects.
• Text-to-speech systems convert normal language text into speech; other systems render symbolic linguistic representations, such as phonetic transcriptions, into speech.
ADVANTAGES
• Allows a speech model to be duplicated, producing an exact copy of a person's voice.
• Powerful combat-zone weapon.

DISADVANTAGES
• Voice detection is done through sophisticated 3D rendering, but there are many normalization problems.
• Some applications require extensive sound libraries.
• Different languages require different phonetics, so updating or extending the system is tedious.
• It is very seldom complete; that is, we may not be able to add every piece of small talk and every phonetic variation to the database.
CONCLUSION
• The approach we have adopted separates the sounds into two forms:
1. Spectral envelope information
2. Pitch and voicing information
• Dynamic time warping
- Aligns the sounds with respect to their pitches.
• Signal re-estimation algorithm
- Converts the frames back into a time-domain waveform.
FUTURE SCOPE
• Extending the functionality of the tool.
- Create a powerful and flexible morphing tool.
• Increased user interaction.
- A graphical user interface could be designed and integrated to make the package more user-friendly.
THANK YOU!!!
