International Journal of Advancements in Research & Technology, Volume 2, Issue 5, May-2013

ISSN 2278-7763

ANALYSIS AND SYNTHESIS OF SPEECH USING MATLAB


Vishv Mohan (State Topper, Himachal Pradesh, 2008, 2009, 2010)
B.E. (Hons.) Electrical and Electronics Engineering; President, NCSTU
Birla Institute of Technology & Science, Pilani-333031 (Rajasthan), India
E-mail: vishv.mohan.1@gmail.com

ABSTRACT

The interval of each sound wave has a different frequency in its sub-sections. This paper presents an analysis of two Matlab functions, GenerateSpectrogram.m and MatrixToSound.m, used to analyze and synthesize speech signals. The first Matlab code section, GenerateSpectrogram.m, records the user input sound (more precisely, sound from the source) for a user-defined duration, asks for the parameters required to compute the spectrogram, and returns a matrix with frequency as its rows, time as its columns, and each matrix element as the amplitude of the corresponding frequency. MatrixToSound.m uses the method of additive synthesis of sound to generate sound from a user-defined matrix with frequencies as its rows and time as its columns. Sound recording is an electrical or mechanical inscription of sound waves, such as spoken voice, singing, instrumental music, or sound effects. The two main classes of sound recording technology are analog recording and digital recording. Acoustic analog recording is achieved by a small microphone diaphragm that detects changes in atmospheric pressure (acoustic sound waves) and records them as a graphic representation of the sound waves on a medium such as a phonograph. Digital recording converts the analog sound signal picked up by the microphone to a digital form by a process of digitization, allowing it to be stored and transmitted by a wider variety of media. Digital recording stores audio as a series of binary numbers representing samples of the amplitude of the audio signal at equal time intervals, at a sample rate high enough to convey all sounds capable of being heard. This analysis-and-synthesis capability is applied to create speech from a matrix whose elements are frequency- or time-domain parameters with specific amplitudes.

Keywords: spectrum, synthesis, simulation, frequency, sound-waves, amplitude, wave sequence.

INTRODUCTION

Speech is an acoustic signal; hence, it is a mechanical wave, an oscillation of pressure transmitted through a solid, liquid, or gas, composed of frequencies within the hearing range. Sound is a sequence of pressure waves that propagates through compressible media such as air or water. The audible range of sound is 20 Hz to 20 kHz at standard temperature and pressure. During propagation, waves can be reflected, refracted, or attenuated by the medium.

Copyright © 2013 SciResPub. IJOART



Recording of Sound

Sound recording is an electrical or mechanical inscription of sound waves, such as spoken voice, singing, instrumental music, or sound effects. The two main classes of sound recording technology are analog recording and digital recording. Acoustic analog recording is achieved by a small microphone diaphragm that detects changes in atmospheric pressure (acoustic sound waves) and records them as a graphic representation of the sound waves on a medium such as a phonograph (in which a stylus senses grooves on a record). In magnetic tape recording, the sound waves vibrate the microphone diaphragm and are converted into a varying electric current, which is then converted into a varying magnetic field by an electromagnet, making a representation of the sound as magnetized areas on a plastic tape with a magnetic coating.

Digital recording converts the analog sound signal picked up by the microphone to a digital form by a process of digitization, allowing it to be stored and transmitted by a wider variety of media. Digital recording stores audio as a series of binary numbers representing samples of the amplitude of the audio signal at equal time intervals, at a sample rate high enough to convey all sounds capable of being heard. Digital recordings are considered higher quality than analog recordings not necessarily because they have higher fidelity (wider frequency response or dynamic range), but because the digital format can prevent much of the loss of quality found in analog recording due to noise and electromagnetic interference in playback, and due to mechanical deterioration or damage of the storage medium. A digital audio signal must be reconverted to analog form during playback before it is applied to a loudspeaker or earphones.

Analysis of Sound Signal

The long-term frequency analysis of speech signals yields good information about the overall frequency spectrum of the signal, but no information about the temporal location of those frequencies. Since speech is a very dynamic signal with a time-varying spectrum, it is often insightful to look at the frequency spectra of short sections of the speech signal.

Long-term frequency analysis

The frequency response of a system is defined as the discrete-time Fourier transform (DTFT) of the system's impulse response h[n]:

H(ω) = Σ_{n=-∞}^{∞} h[n] e^{-jωn}

Similarly, for a sequence x[n], its long-term frequency spectrum is defined as the DTFT of the sequence:

X(ω) = Σ_{n=-∞}^{∞} x[n] e^{-jωn}

Theoretically, we must know the sequence x[n] for all values of n (from n = -∞ to n = ∞) in order to compute its frequency spectrum. Fortunately, all terms where x[n] = 0 do not matter in the sum, and therefore an equivalent expression for the sequence's spectrum is

X(ω) = Σ_{n=0}^{N-1} x[n] e^{-jωn}
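The finite sum above can be evaluated numerically on a grid of frequencies with the FFT. The following Python/NumPy sketch (added here for illustration, separate from the paper's MATLAB listings; the test sinusoid and grid size are arbitrary choices) does exactly that:

```python
import numpy as np

# A finite test sequence x[n], non-zero only for n = 0 .. N-1.
N = 64
n = np.arange(N)
x = np.sin(2 * np.pi * 0.1 * n)     # sinusoid at 0.1 cycles/sample

# Sampling the DTFT X(w) at 512 points on [0, 2*pi) is exactly a
# zero-padded 512-point FFT of the finite sequence.
X = np.fft.fft(x, 512)

# The long-term spectrum peaks at the sinusoid's frequency w = 2*pi*0.1,
# i.e. near bin 0.1 * 512 = 51.2.
peak_bin = np.argmax(np.abs(X[:256]))
print(peak_bin)
```

The zero-padded FFT only samples X(ω) more densely; it adds no information beyond the N non-zero samples.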


Here we have assumed that the sequence starts at 0 and is N samples long. This tells us that we can apply the DTFT to only the non-zero samples of x[n] and still obtain the sequence's true spectrum X(ω). But what is the correct mathematical expression to compute the spectrum over a short section of the sequence, that is, over only part of the non-zero samples of the sequence?

Window sequence

It turns out that the mathematically correct way to do that is to multiply the sequence x[n] by a 'window sequence' w[n] that is non-zero only for n = 0 ... L-1, where L, the length of the window, is smaller than the length N of the sequence x[n]:

x_w[n] = w[n] x[n]

The following figure illustrates how a window sequence w[n] is applied to the sequence x[n]. [Figure omitted.] We then compute the spectrum of the windowed sequence x_w[n] as usual:

X_w(ω) = Σ_{n=0}^{L-1} x_w[n] e^{-jωn}
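As a concrete illustration of selecting and transforming a short section, the Python/NumPy sketch below (not part of the paper's MATLAB code; the 440 Hz test tone, window length, and position are chosen here arbitrarily) windows a long tone and computes its short-term spectrum:

```python
import numpy as np

# A long sequence x[n]: one second of a 440 Hz tone at fs = 8000 Hz.
fs = 8000
n = np.arange(fs)
x = np.sin(2 * np.pi * 440 * n / fs)

# Window w[n], non-zero only for n = 0 .. L-1, with L much smaller than N.
L = 256
w = np.hamming(L)

# Slide the window to sample n0, multiply, and take the L-point spectrum
# X_w(w) of the windowed sequence x_w[n] = w[n] x[n].
n0 = 4000
xw = x[n0:n0 + L] * w
Xw = np.fft.rfft(xw)

# The short-term spectrum peaks near the tone's frequency.
freqs = np.arange(len(Xw)) * fs / L
print(freqs[np.argmax(np.abs(Xw))])   # close to 440 Hz
```

Changing n0 selects a different section of the signal; changing L trades time resolution against frequency resolution, as discussed below.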

As the figure shows, the windowed sequence is shorter than the original sequence, so we can truncate the DTFT of the windowed sequence accordingly. Using this windowing technique, we can select a section of arbitrary length of the input sequence x[n] by choosing the length and location of the window. How does the window sequence w[n] affect the short-term frequency spectrum?

Effect of the window

To answer that question, we need to introduce an important property of the Fourier transform. The diagram below illustrates the property graphically. [Figure omitted.]

I. Implementation of an LTI system in the time domain.


II. Equivalent implementation of an LTI system in the frequency domain.

The two implementations of an LTI system are equivalent: they give the same output for the same input. Hence, convolution in the time domain equals multiplication in the frequency domain:

y[n] = x[n] * h[n]  ⟺  Y(ω) = X(ω) H(ω)

And since the time domain and the frequency domain are duals under the Fourier transform, it is also true that multiplication in the time domain equals convolution in the frequency domain:

y[n] = x[n] w[n]  ⟺  Y(ω) = (1/2π) X(ω) ⊛ W(ω)

This shows that multiplying the sequence x[n] by the window sequence w[n] in the time domain is equivalent to convolving the spectrum of the sequence, X(ω), with the spectrum of the window, W(ω). The result of this convolution of the spectra in the frequency domain is that the spectrum of the sequence is 'smeared' by the spectrum of the window. This is best illustrated by the example in the figure below. [Figure omitted.]

a) Choice of window

Because the window determines the spectrum of the windowed sequence to a great extent, the choice of the window is important. Matlab supports a number of common windows, each with its own strengths and weaknesses. Some common choices of windows are shown below. [Figure omitted.]
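The multiplication-convolution duality stated above can be verified numerically. The sketch below (Python/NumPy, for illustration only; the DFT is used, so the frequency-domain convolution is circular and carries a 1/N factor instead of 1/2π) compares the two sides of the identity:

```python
import numpy as np

# Multiplication in the time domain equals circular convolution in the
# frequency domain, up to a 1/N factor in the DFT convention.
N = 128
rng = np.random.default_rng(0)
x = rng.standard_normal(N)
w = np.hamming(N)

# Left side: DFT of the windowed (pointwise multiplied) sequence.
lhs = np.fft.fft(x * w)

# Right side: circular convolution of the two spectra, divided by N.
# (The circular convolution is itself computed via the FFT.)
X, W = np.fft.fft(x), np.fft.fft(w)
rhs = np.fft.ifft(np.fft.fft(X) * np.fft.fft(W)) / N

print(np.allclose(lhs, rhs))  # True
```

The two sides agree to machine precision, which is exactly the 'smearing by W(ω)' described above.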


All windows share the same characteristics. Their spectrum has a peak, called the main lobe, and ripples to the left and right of the main lobe, called the side lobes. The width of the main lobe and the relative height of the side lobes differ for each window. The main lobe width determines how accurately a window can resolve different frequencies: wider is less accurate. The side lobe height determines how much spectral leakage the window has. An important thing to realize is that we cannot perform short-term frequency analysis without a window; even if we do not explicitly use a window, we are implicitly using a rectangular window.

b) Parameters of the short-term frequency spectrum

Besides the type of window (rectangular, Hamming, etc.), two other factors in Matlab control the short-term frequency spectrum: the window length and the number of frequency sample points.

The window length controls the fundamental trade-off between the time resolution and the frequency resolution of the short-term spectrum, irrespective of the window's shape. A long window gives poor time resolution but good frequency resolution; conversely, a short window gives good time resolution but poor frequency resolution. For example, a 250 millisecond window can, roughly speaking, resolve frequency components when they are 4 Hz or more apart (1/0.250 = 4), but it cannot tell where within those 250 milliseconds the components occurred. On the other hand, a 10 millisecond window can only resolve frequency components when they are 100 Hz or more apart (1/0.010 = 100), but the uncertainty in time about the location of those frequencies is only 10 milliseconds.
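The 1/T rule of thumb can be checked numerically. The Python sketch below (illustrative, not part of the paper's code; the tone frequencies and window lengths are chosen as an example) compares a 500 ms and a 10 ms rectangular window on two tones 4 Hz apart:

```python
import numpy as np

fs = 8000
f1, f2 = 1000.0, 1004.0          # two tones 4 Hz apart (illustrative values)

def dtft_mag(x, f):
    """|X(w)| of a finite sequence, evaluated at a single frequency f in Hz."""
    n = np.arange(len(x))
    return abs(np.sum(x * np.exp(-2j * np.pi * f * n / fs)))

def midpoint_ratio(T):
    """Spectrum at the midpoint between the tones, relative to a tone peak.
    A small value means there is a dip between them: the tones are resolved."""
    n = np.arange(int(fs * T))
    x = np.sin(2 * np.pi * f1 * n / fs) + np.sin(2 * np.pi * f2 * n / fs)
    return dtft_mag(x, (f1 + f2) / 2) / dtft_mag(x, f1)

print(midpoint_ratio(0.500))   # deep dip: a 500 ms window resolves the tones
print(midpoint_ratio(0.010))   # near 1: a 10 ms window cannot separate them
```

With the long window the midpoint value is a small fraction of the peak, while with the short window the two tones merge into one lobe, matching the trade-off described above.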


The result of short-term spectral analysis using a long window is referred to as a narrowband spectrum (because a long window has a narrow main lobe), and the result of short-term spectral analysis using a short window is called a wideband spectrum. In short-term spectral analysis of speech, the window length is often chosen with respect to the fundamental period of the speech signal, i.e., the duration of one period of the fundamental frequency. A common choice for the window length is either less than one fundamental period or greater than 2-3 times the fundamental period. Examples of narrowband and wideband short-term spectral analysis of speech are given in the figures below. [Figures omitted.]

The other factor controlling the short-term spectrum in Matlab is the number of points at which the frequency spectrum H(ω) is evaluated. The number of points is usually equal to the length of the window. Sometimes a greater number of points is chosen to obtain a smoother-looking spectrum; evaluating H(ω) at fewer points than the window length is possible, but very rare.

c) Time-frequency domain: Spectrogram

An important use of short-term spectral analysis is the short-time Fourier transform, or spectrogram, of a signal. The spectrogram of a sequence is constructed by computing the short-term spectrum of a windowed version of the sequence, then shifting the window to a new location and repeating the process until the entire sequence has been analyzed. The whole process is illustrated in the figure below. [Figure omitted.]

Together, these short-term spectra (the bottom row of the figure) make up the spectrogram. They are typically shown in a two-dimensional plot, where the horizontal axis is time, the vertical axis is frequency, and magnitude is the color or intensity of the plot. [Example figure omitted.]
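The construction just described (window, compute a spectrum, shift, repeat) can be written as a short loop. The Python/NumPy sketch below is an illustration, separate from the paper's MATLAB listings; the chirp test signal and window parameters are arbitrary choices:

```python
import numpy as np

# One second of a test signal whose frequency content changes over time.
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * (440 + 200 * t) * t)    # slowly rising chirp

L, hop = 256, 128            # window length and shift (50% overlap)
w = np.hamming(L)

# Slide the window along the sequence; each position gives one short-term
# spectrum, i.e. one column of the spectrogram.
cols = []
for start in range(0, len(x) - L + 1, hop):
    frame = x[start:start + L] * w
    cols.append(np.abs(np.fft.rfft(frame)))
S = np.column_stack(cols)    # rows: frequency bins, columns: time positions

print(S.shape)               # (L//2 + 1) frequency rows, one column per shift
```

Plotting the magnitude of S with time on the horizontal axis and frequency on the vertical axis gives exactly the two-dimensional picture described above.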


The appearance of the spectrogram is controlled by a third parameter: window overlap. Window overlap determines how much the window is shifted between repeated computations of the short-term spectrum. Common choices for window overlap are 50% or 75% of the window length. For example, if the window length is 200 samples and the window overlap is 50%, the window is shifted by 100 samples between each short-term spectrum; if the overlap is 75%, the window is shifted by 50 samples. The choice of window overlap depends on the application. When a temporally smooth spectrogram is desirable, the window overlap should be 75% or more. When computation should be kept to a minimum, no overlap or 50% overlap are good choices. If computation is not an issue, one could even compute a new short-term spectrum for every sample of the sequence; in that case, window overlap = window length - 1, and the window shifts by only 1 sample between spectra. Doing so is wasteful when analyzing speech signals, however, because the spectrum of speech does not change at such a high rate. It is more practical to compute a new spectrum every 20-50 milliseconds, since that is the rate at which the speech spectrum changes.

In a wideband spectrogram (i.e., using a window shorter than the fundamental period), the fundamental frequency of the speech signal resolves in time. That means you cannot really tell what the fundamental frequency is by looking at the frequency axis, but you can see energy fluctuations at the rate of the fundamental frequency along the time axis. In a narrowband spectrogram (i.e., using a window 2-3 times the fundamental period), the fundamental frequency resolves in frequency, i.e., you can see it as an energy peak along the frequency axis.
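The overlap-to-shift arithmetic above is simple enough to state as a one-line helper (a Python illustration; the 200-sample examples mirror the ones in the text):

```python
# Hop (window shift in samples) implied by a window length and an
# overlap expressed as a fraction of the window length.
def hop(window_length, overlap_fraction):
    return int(window_length * (1 - overlap_fraction))

print(hop(200, 0.50))  # 100 samples, as in the 50% example above
print(hop(200, 0.75))  # 50 samples, as in the 75% example above
```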
GenerateTimeVsFreq.m

Duration = input('Enter the time in seconds for which you want to record: ');
samplingRate = input('Enter what sampling rate is required of audio, 8000 or 22050: ');
timeResolution = input('Enter the time resolution desired in milliseconds: ');
frequencyResolution = input('Enter the frequency resolution required: ');
usedWindowLength = ceil(samplingRate/frequencyResolution);
recObj = audiorecorder(samplingRate, 8, 1);
disp('Start speaking.')
recordblocking(recObj, Duration);
disp('End of Recording.');
% Play back the recording.
play(recObj);
% Store data in double-precision array.
myRecordingData = getaudiodata(recObj);
figure(1)
plot(myRecordingData);
% No. of data points = samplingRate*Duration
% No. of columns in spectrogram = (Duration*1000)/timeResolution
%                               = Duration*frequencyResolution
actualWindowLength = ceil((samplingRate*timeResolution)/1000);
overlapLength = usedWindowLength - actualWindowLength + 4;


% Plot the spectrogram
S = spectrogram(myRecordingData, usedWindowLength, overlapLength, samplingRate-1, samplingRate, 'yaxis');
[ar, ac] = size(S);
S1 = imresize(S, [ar (Duration*1000)/timeResolution]);
AbsoluteMagnitude = abs(S1);
figure(2)
spectrogram(myRecordingData, 256, 200, 256, samplingRate-1, 'yaxis');
TimeInterval = input('Enter the time interval, as a multiple of the time resolution, at which to see the frequencies present: ');
figure(3)
plot(AbsoluteMagnitude(:, TimeInterval));
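The window-length arithmetic in the listing can also be mirrored outside MATLAB. The Python sketch below reproduces the same computations with illustrative input values (8 kHz audio, 20 ms time resolution, 50 Hz frequency resolution); the variable names follow the listing:

```python
import math

# Illustrative request: values a user might type into the listing's prompts.
samplingRate = 8000          # Hz
timeResolution = 20          # ms per spectrogram column
frequencyResolution = 50     # Hz per spectrogram row

# Window length required by the frequency resolution: fs / freqRes samples.
usedWindowLength = math.ceil(samplingRate / frequencyResolution)

# Window advance implied by the time resolution: fs * timeRes / 1000 samples.
actualWindowLength = math.ceil(samplingRate * timeResolution / 1000)

# Overlap passed to spectrogram(), as in the listing (note its extra
# 4-sample term).
overlapLength = usedWindowLength - actualWindowLength + 4

print(usedWindowLength, actualWindowLength, overlapLength)  # 160 160 4
```

With these values the window is 160 samples (20 ms at 8 kHz), so the requested time and frequency resolutions coincide and the overlap reduces to the 4-sample term.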
Synthesis of Sound

There are many methods of sound synthesis. Jeff Pressing, in "Synthesizer Performance and Real-Time Techniques", gives this list of approaches to sound synthesis: additive synthesis, subtractive synthesis, frequency modulation synthesis, sampling, composite synthesis, phase distortion, wave shaping, re-synthesis, granular synthesis, linear predictive coding, direct digital synthesis, wave sequencing, vector synthesis, and physical modeling. We use additive synthesis to synthesize sound from a matrix having rows as different frequencies and columns as time intervals.

a) Additive Synthesis

Additive synthesis is a sound synthesis technique that creates timbre by adding sine waves together. In music, timbre (also known as tone color or tone quality, a term from psychoacoustics, the scientific study of sound perception) is the quality of a musical note, sound, or tone that distinguishes different types of sound production, such as voices and musical instruments: string instruments, wind instruments, and percussion instruments.

Additive synthesis generates sound by adding the output of multiple sine wave generators. Harmonic additive synthesis is closely related to the concept of a Fourier series, which is a way of expressing a periodic function as the sum of sinusoidal functions with frequencies equal to integer multiples of a common fundamental frequency. These sinusoids are called harmonics, overtones, or, generally, partials. In general, a Fourier series contains an infinite number of sinusoidal components, with no upper limit to the frequency of the sinusoidal functions, and includes a DC component (one with a frequency of 0 Hz). Frequencies outside the human audible range can be omitted in additive synthesis; as a result, only a finite number of sinusoidal terms with frequencies that lie within the audible range are modeled in additive synthesis.

b) Harmonic form

The simplest harmonic additive synthesis can be mathematically expressed as

y(t) = Σ_{k=1}^{K} r_k sin(2π k f_0 t + φ_k)

where y(t) is the synthesis output; r_k, k f_0, and φ_k are the amplitude, frequency, and phase offset of the k-th harmonic partial out of a total of K harmonic partials; and f_0 is the fundamental frequency of the waveform (and the frequency of the musical note).


c) Time-dependent amplitudes

More generally, the amplitude of each harmonic can be prescribed as a function of time, r_k(t), in which case the synthesis output is

y(t) = Σ_{k=1}^{K} r_k(t) sin(2π k f_0 t + φ_k)
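The harmonic sum is straightforward to implement directly. The following Python/NumPy sketch is an illustration (the fundamental, amplitudes, and duration are arbitrary choices, separate from the paper's MATLAB code):

```python
import numpy as np

fs = 8000
t = np.arange(int(0.5 * fs)) / fs            # half a second of samples

def additive(f0, r, phi=None):
    """y(t) = sum_{k=1..K} r_k * sin(2*pi*k*f0*t + phi_k)."""
    K = len(r)
    phi = np.zeros(K) if phi is None else phi
    y = np.zeros_like(t)
    for k in range(1, K + 1):
        y += r[k - 1] * np.sin(2 * np.pi * k * f0 * t + phi[k - 1])
    return y

# A 220 Hz tone with three harmonic partials of decreasing amplitude.
y = additive(220.0, r=[1.0, 0.5, 0.25])

# Its spectrum has peaks at 220, 440 and 660 Hz; the strongest is at f0.
X = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), 1 / fs)
print(freqs[np.argmax(X)])                   # ~220.0
```

Replacing the constant amplitudes r_k with arrays r_k(t) of the same length as t gives the time-dependent form above.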
d) Matlab Code
MatrixToSound.m

% FUNCTION TO PLAY SOUND FROM THE MATRIX
samplingRate = input('Please enter the sampling rate used: ');
timeResolution = input('Please enter the time resolution in milliseconds: ');
matrix = input('Please enter the matrix for conversion to sound: ');
lowerThreshold = input('Please enter the lower threshold value below which a matrix element should be neglected (a number between 0 and 255): ');
time = 0:1/samplingRate:(timeResolution/1000);
[mrows, mcolumn] = size(matrix);
count = 0;
[timerow, NoOfComponents] = size(time);
InitialSoundMatrix = zeros(NoOfComponents, mcolumn);
for j = 1:mcolumn
    % Reset the sinusoid accumulator for each time frame (column).
    SineVector = zeros(1, NoOfComponents);
    for i = 1:mrows
        if (matrix(i,j) > lowerThreshold)
            t = matrix(i,j)*sin(2*pi*time*i);
            count = count + 1;
            SineVector = SineVector + t;
        end
    end
    InitialSoundMatrix(:,j) = SineVector';
end
SoundMatrix = InitialSoundMatrix./(255*count);
[SMRow, SMColumn] = size(SoundMatrix);
SoundColumn = reshape(SoundMatrix, SMRow*SMColumn, 1);
soundsc(SoundColumn, samplingRate);
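The same matrix-to-sound idea can be sketched in Python for readers without MATLAB. As in the listing, row index i is treated as a frequency of i Hz (via sin(2*pi*time*i)); the toy matrix, threshold, and resolution values below are illustrative choices, not the paper's data:

```python
import numpy as np

samplingRate = 8000      # Hz
timeResolution = 50      # ms of sound per matrix column
lowerThreshold = 10      # entries at or below this are neglected (0..255 scale)

# Toy input matrix: rows are frequencies (row i <-> i Hz), columns are frames.
matrix = np.zeros((500, 4))
matrix[440, :2] = 200    # 440 Hz during the first two frames
matrix[330, 2:] = 200    # 330 Hz during the last two frames

nSamples = samplingRate * timeResolution // 1000
t = np.arange(nSamples) / samplingRate

# For each column, add one sinusoid per above-threshold matrix element.
frames = []
for j in range(matrix.shape[1]):
    frame = np.zeros(nSamples)
    for i in range(matrix.shape[0]):
        if matrix[i, j] > lowerThreshold:
            frame += matrix[i, j] * np.sin(2 * np.pi * i * t)
    frames.append(frame)

# Lay the frames end to end and scale roughly into [-1, 1].
sound = np.concatenate(frames) / 255.0
print(sound.shape)       # 4 frames of 400 samples each
```

The resulting array can be written to a WAV file or played with any audio library, analogous to the listing's soundsc call.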

Conclusion

The spectra of the sound as a function of time can be computed using the GenerateTimeVsFreq.m Matlab file, and its result approximately matches that of Matlab's specgramdemo function. Additive synthesis of sound can be simulated with the MatrixToSound.m Matlab file, which approximates the actual sound.

Acknowledgement

My research paper is dedicated to my parents, Sh. Vasu Dev Sharma, Lecturer in Biology at Government Senior Secondary School, Bilaspur, Himachal Pradesh (India), and Smt. Bandna Sharma, T.G.T. Mathematics at Sarswati Vidya Mandir, Bilaspur, Himachal Pradesh (India), whose blessings and wishes made me capable of completing this paper effectively and efficiently.


References

(a) Textbooks

[1] Oppenheim, A.V., and R.W. Schafer, Discrete-Time Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1989, pp. 713-718.

[2] Rabiner, L.R., and R.W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, Englewood Cliffs, NJ, 1978.

(b) Web sources

1) http://www.mathworks.in/matlabcentral/fileexchange/index?utf8=%E2%9C%93&term=spectrogram
2) http://en.wikipedia.org/wiki/Additive_synthesis#Time-dependent_amplitudes
3) http://isdl.ee.washington.edu/people/stevenschimmel/sphsc503/
4) http://hyperphysics.phy-astr.gsu.edu/hbase/audio/synth.html
