Professional Documents
Culture Documents
Shengwen Yang
Voice Changer is created to modify input voices in real-time 2.2. Pitch Shifting
during a conversation, and it works on every platform. This Pitch scaling or pitch shifting is the opposite to time
type of entertainment device has being amazingly popular stretching, where the process of changing the pitch without
among young people recently. In order to modify, change affecting the speed is used for audio data processing. Similar
and disguise the input human voice in any circumstances by methods can change speed, pitch, or both at the same time,
using a microphone, and to add another dimension of or in a time-varying way. These processes are used to match
creativity with limitless options, voice changers work the pitches and tempos of two pre-recorded clips for mixing
behind the scenes intercepting audio from the microphone when the clips cannot be reperformed or resampled. For
before it goes to the applications, so there’s no need to instance, a drum track containing no pitched instruments
change any configurations or settings in other programs. It’s could be moderately resampled for tempo without adverse
become easier to simply run the program and start creating effects, but a pitched track could not. They are also used to
voice distortions in minutes. generate an effect, such as increasing the range of an
instrument (like pitch shifting a guitar down an octave).
The pitch shifting technique mentioned in this paper is a
2. PITCH SHIFTING sound effects unit that raises or lowers the pitch of an audio
signal by a preset interval. For example, a pitch shifter set to
Pitch shifters are included in most audio processors today, increase the pitch by a fourth will raise each note three
and pitch shifting is provided based on the concepts of diatonic intervals above the notes actually played. Simple
increasing pitch and reducing durations, or reducing pitch pitch shifters raise or lower the pitch by one or two octaves,
and increasing duration. At the far extremes of pitch shifting, while more sophisticated devices offer a range of interval
the resultant sound bears little resemblance to the original. alterations.
Usually there are two synthesis methods. One is based on a
bank of sine wave oscillators, and the other is based on an
inverse FFT. These techniques can also be used to transpose 3. VOCODER
an audio sample while holding speed or duration constant.
This may be accomplished by time stretching and then Vocoder is a category of voice codec that analyzes and
resampling back to the original length. Alternatively, the synthesizes the human voice signal for audio data
frequency of the sinusoids in a sinusoidal model may be compression, multiplexing, voice encryption, voice
altered directly, and the signal might be reconstructed at the transformation, etc.
appropriate time scale. The human voice consists of sounds created by the opening
and closing of the glottis by the vocal cords, which produces
a periodic waveform with many harmonics. This basic
1
sound is then filtered by the nose and throat (a complicated 5. MAX/MSP
resonant piping system) to produce differences in harmonic
content (formants) in a controlled way, generating a wide Max/MSP is a visual programming language written in C++
variety of sounds used in speech and conversation. The and produced by Cycling '74, that helps us build complex,
vocoder examines speech by measuring how its spectral interactive programs without any prior experience writing
characteristics change over time. This results in a series of code. MSP is a DSP plug-in for Max, allowing real-time
signals representing these modified frequencies at any audio synthesis. Max/MSP is especially useful for building
particular time as the user speaks. In simple terms, the signal audio, MIDI, video, and graphics applications where user
is split into a large number of frequency bands (the larger interaction is needed. Max/MSP is split into several parts -
this number, the more accurate the analysis) and the level of "Max" handles discrete operations and MIDI, this is the
signal present at each frequency band gives the easiest place to start getting familiar with the tool, where
instantaneous representation of the spectral energy content. "MSP" deals with signal processing and audio.
Therefore, the vocoder is dramatically reducing the amount
of information needed to store speech, from a complete
recording to a series of numbers. Since the vocoder process
sends only the parameters of the vocal model over the
communication link, instead of a point-by-point recreation
of the waveforms, the bandwidth required to transmit speech
can be reduced significantly.
4. PHASOR
2
6.2. Implement 𝑓𝑜𝑢𝑡 = 𝑓𝑖𝑛 ∗ (1 − 𝑝𝑓𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑦 ∗ (𝑑𝑤)/1000)
In this project, we create several Max patches to implement Where 𝑓𝑜𝑢𝑡 is frequency in, while 𝑓𝑖𝑛 is frequency that
voice changers with diverse functions. For example, a basic comes out. 𝑝𝑓𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑦 refers to the frequency of the
real-time voice changer with an effect of “World of phasor, and 𝑑𝑤 refers to a delay window.
Warcraft”, an advanced one with an effect of “Man, Woman If the input frequency is 100Hz, then the frequency out is
and Child voice converting”, and an ultimate version of 60Hz; if the former is 1000Hz, and the latter is 600hz.
“Robot sound with customized tones” based on vocoder. Therefore, we shift pitches down. Of course, when phasor
number is negative, the opposite happens and the pitches
shift up.
7. CONCLUSION
3
multiple real-time signals with one sound’s harmonics as a
pitch sieve for the other. As this work was concentrated on
developing new and funny applications of pitch shifting and
phase vocoder, improvements such as phase locking and
multi-resolution peak detection would make obvious
progress to the overall quality of processed sounds.
8. REFERENCE