Algorithms in Signal Processors
Audio and Video Applications
2011
DSP Project Course
using
Texas Instruments TMS320C6713 DSK and TMS320DM6437
Dept. of Electrical and Information Technology, Lund University,
Sweden
Contents
I Guitar tuner
P V Soumya, A Norrgren, CJ Waldeck, F Brosjö 1
1 Introduction 2
2 Theory 3
2.1 Harmonics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2.1 Fourier Transform . . . . . . . . . . . . . . . . . . . . 4
2.2.2 Cross Correlation . . . . . . . . . . . . . . . . . . . . . 4
3 Method 5
3.1 Analysis of the guitar sound . . . . . . . . . . . . . . . . . . . 5
3.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2.1 Echo . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2.2 Detection . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2.3 Process . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2.4 DFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2.5 SmallXcorr . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2.6 MatLab GUI . . . . . . . . . . . . . . . . . . . . . . 9
4 Results and Discussion 9
4.1 Matlab testing . . . . . . . . . . . . . . . . . . . . . . . . . 9
4.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.3 Post-testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
5 Conclusion and further development 12
II Pitch Estimation
Jonas Rosenqvist, Kim Smidje, Henrik Nilsson, Johan Mattsson 15
1 Introduction 16
2 Theory 16
3 Methods 18
3.1 Time domain . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2 Frequency domain . . . . . . . . . . . . . . . . . . . . . . . . 19
4 Implementation 19
5 Problems encountered 22
6 Conclusion 23
7 References 23
III Vocoder
Mattias Danielsson, Andre Ericsson, Kujtim Iljazi, Babak Rajabian 25
1 Introduction 26
2 Theory 26
2.1 Overall description of our vocoder model . . . . . . . . . . . . 26
2.2 The high-pass filter . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3 The autocorrelation function . . . . . . . . . . . . . . . . . . 27
2.4 The Levinson-Durbin recursion . . . . . . . . . . . . . . . . . 27
2.5 The IIR lattice filter . . . . . . . . . . . . . . . . . . . . . . . 28
3 Implementation 30
4 Testing and debugging 31
5 Results and conclusions 32
IV Reverberation
R. Tullberg, R. Mittipalli, S. AbduRahman, T. Isacsson 35
1 Introduction 36
1.1 Reverberation . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2 Theory 38
2.1 Reverb Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 38
2.2 Reverberation time . . . . . . . . . . . . . . . . . . . . . . . . 38
2.3 Delay elements z^(−m_i) . . . . . . . . . . . . . . . . . . . . . 39
2.4 Damping Filters h_i(z) . . . . . . . . . . . . . . . . . . . . . . 39
2.5 Diffusion Matrix A . . . . . . . . . . . . . . . . . . . . . . . . 40
2.6 Gains b_i and c_i . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.7 Tonal Correction Filter t(z) . . . . . . . . . . . . . . . . . . . 40
3 Implementation 41
3.1 Real-time versus non-real-time implementation . . . . . . . . 41
3.2 Diffusion Matrix . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.3 Software Optimizations . . . . . . . . . . . . . . . . . . . . . 41
3.3.1 Matrix Multiplication . . . . . . . . . . . . . . . . . . 41
3.3.2 Circular Buffers . . . . . . . . . . . . . . . . . . . . . . 42
3.4 Compiler Optimization . . . . . . . . . . . . . . . . . . . . . . 43
3.5 Hardware Memory Considerations . . . . . . . . . . . . . . . 43
4 Result 45
4.1 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5 Discussion and Conclusion 45
5.1 General . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.2 Non-real-time versus real-time implementation . . . . . . . . 46
5.3 Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
V Speech Recognition Using MFCC
Harshavardhan Kittur, Kaoushik Raj Ramamoorthy,
Manivannan Ethiraj, Mohan Raj Gopal 49
1 Introduction 50
1.1 Why Speech recognition? . . . . . . . . . . . . . . . . . . . . 50
1.2 Common problems found in designing such a system . . . . . 50
1.3 Tools Used . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2 Theory 51
2.1 Speech Recognition Algorithm . . . . . . . . . . . . . . . . . 51
2.1.1 Feature Extraction . . . . . . . . . . . . . . . . . . . . 51
2.1.2 Feature Matching . . . . . . . . . . . . . . . . . . . . . 52
2.2 Mel Frequency Cepstrum Coefficients . . . . . . . . . . . . . . 52
3 Implementation 53
3.1 Level detection . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.2 Frame blocking . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.3 Windowing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.4 Fast Fourier transform . . . . . . . . . . . . . . . . . . . . . . 53
3.5 Power spectrum calculation . . . . . . . . . . . . . . . . . . . 53
3.6 Mel-frequency wrapping . . . . . . . . . . . . . . . . . . . . . 55
3.7 Log-energy spectrum . . . . . . . . . . . . . . . . . . . . . . . 56
3.8 Mel-frequency cepstral coefficients . . . . . . . . . . . . . . . 56
3.9 Comparison in the feature matching phase . . . . . . . . . . . 56
4 Implementation in MATLAB 56
5 Implementation in DSP Board 58
6 Tests and Results 58
7 Conclusion 58
VI Face Detection, Tracking and Recognition
Asheesh Mishra, Mohammed Ibraheem, Shashikant Patil 61
1 Abstract 61
2 Introduction 62
3 YCbCr Color Space Model 63
4 Image Filtering for Noise reduction 63
5 Edge Detection 63
6 Face Detection and Tracking 64
7 Face Recognition 67
8 Problem faced 68
9 Conclusion and Future work 69
VII Circular Object Detection
Ajosh K Jose, Qazi Omar Farooq, Sherine Thomas, Sreejith P Raghavan 71
1 Introduction 72
2 Theory 72
2.1 Edge Detection . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.1.1 Smoothing . . . . . . . . . . . . . . . . . . . . . . . . 74
2.1.2 Finding gradients . . . . . . . . . . . . . . . . . . . . . 75
2.1.3 Nonmaximum suppression . . . . . . . . . . . . . . . 75
2.1.4 Double thresholding . . . . . . . . . . . . . . . . . . . 76
2.1.5 Edge tracking by hysteresis . . . . . . . . . . . . . . . 76
2.2 Circular Object Detection . . . . . . . . . . . . . . . . . . . . 78
3 Implementation 78
4 Conclusion & Future Work 79
Part I
Guitar tuner
P V Soumya, A Norrgren, CJ Waldeck, F Brosjö
Abstract
This report covers the development of a guitar tuner based on the Texas Instruments TMS320C6713 DSK signal processing board. First the theory about guitar strings and their harmonic patterns is covered, along with a description of the different mathematical algorithms used to tune them. This is followed by an analysis of the guitar sound and the method used to implement the tuner on the DSP board. A great deal of the report deals with problems associated with the memory of the board, together with solutions developed to work around them. A working guitar tuner was then made. Suggestions for possible improvements are presented in the final section, which concludes this project.
1 Introduction
When thinking of pitch estimation, one of the project group members came to think of a problem he experiences when tuning his guitars. When a string on some guitars is tuned, the other strings change pitch as well due to the higher stress on the guitar caused by the tension. This makes tuning a tough and time-consuming procedure, since the strings have to be tuned separately and many times to achieve a stable pitch for all of them. A way to address this issue is to have a tuner that allows the user to monitor all strings at the same time, thereby being able to correct changes to the other strings instantaneously. When searching for this type of tuner only one was found on the commercial market, a TC Electronic PolyTune [4]. This was our source of inspiration for this project in pitch estimation. The goal of this project is first to be able to determine the pitch of a single string, and eventually to expand this to estimating the pitch of multiple strings simultaneously and presenting the result to the user.
2 Theory
2.1 Harmonics
A note played on a string instrument consists of a fundamental frequency, called the pitch, and a number of harmonics. The frequencies of these harmonics are multiples of the pitch and are the same for every instrument tuned to the same pitch. What distinguishes the instruments and gives them their unique sound is the amplitude pattern of these harmonics. These patterns depend on many factors such as the length, thickness and material of the string. A guitar has six strings tuned to different pitches. In table 1 all strings are shown with their corresponding pitches and first two harmonics.
String nbr   Octave   f0        2f0       3f0
6            E2        82.407   164.814   247.221
5            A2       110.000   220.000   330.000
4            D3       146.830   293.660   440.490
3            G3       196.000   392.000   588.000
2            B3       246.940   493.880   740.820
1            E4       329.630   659.260   988.890

Table 1: Guitar frequencies (Hz)
The pattern of the harmonic amplitudes is not the same for different strings and guitars, which gives each guitar its specific sound. In figure 1 the frequency pattern of the E4 string on a Hagström Viking is shown.
Figure 1: Spectrum of E4
In the chromatic scale that is used globally, an octave is divided into 12 pitches, which are one semitone apart. Between each semitone there are 100 equally spaced increments called cents [2]. This unit is commonly used to measure the accuracy of instrument tuners. A good-quality commercial tuner usually has an accuracy of between ±1 and ±3 cents.
2.2 Algorithms
2.2.1 Fourier Transform
The discrete Fourier transform converts a signal between the time and frequency domains. A portion of a signal in the time domain is analysed and the frequency components are extracted with their corresponding amplitudes. This is done using equation 1, which has a linear frequency scale.

X_k = Σ_{n=0}^{N−1} x_n e^{−j2πnk/N},   k = 0, 1, ..., N−1   (1)
In this case the frequency scale of octaves is logarithmic, meaning that an increase of one octave corresponds to a doubling of the frequency. To handle this smoothly, a logarithmic frequency scale can be implemented. Since the distance between the octaves increases for higher frequencies, the number of increments between each octave increases in the linear frequency scale. This makes the accuracy better for higher frequencies. To obtain a uniform accuracy over the entire frequency spectrum, the frequency scale is made logarithmic. This is done by replacing k in equation 1 with (2):

k = f_0 · B^{i/N},   i = 0, 1, ..., N−1   (2)

where B is an arbitrary base and f_0 is the starting frequency. The base determines the size of the scale along with the number of points.
The information about the DFT and the logarithmic frequency scale was found in Martin Stridh's doctoral thesis, Signal Characterization of Atrial Arrhythmias using the Surface ECG [3].
2.2.2 Cross Correlation
A cross correlation is used to find similarities between two different discrete signals. This is done by multiplying the signal components at each index with each other and summing them up. This is done repeatedly while one of the signals is shifted in relation to the other. The cross correlation of the signals x and y is given by equation 3.

r_xy(n) = Σ_l x(l) · y(l − n)   (3)
Provided that the guitar is roughly in tune the reference spectrum should
have a high correlation near the centre index. To save time and calculations
there is no need to do a full correlation and so it was limited to 20 steps
around the centre index. The correlation results in an array where the index
with highest value represents the best match between the signals.
The reference spectrum is composed of only three ones, where the ones represent the fundamental and the two harmonics. The correlation speed can then be improved by removing all the multiplications and most of the additions, so all that is left is the sum of the three values in the spectrum where the reference is one. If the spectrum is of length 1500, an ordinary scalar product requires 1500 multiplications and 1499 additions. With the improved correlation it only requires two additions and no multiplications. That means that it saves 2997 operations for every scalar product. Roughly estimated, this saves a total of 700 000 operations with the improved correlation for all six strings.
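A minimal sketch of this simplified correlation (the function and parameter names are our own assumptions, not the project's code): because the reference is 1 at only three indices and 0 elsewhere, each lag costs two additions instead of a full scalar product.

```c
#define SHIFT 20  /* +/-20 lags around the centre, as in the report */

/* Cross correlation of the spectrum against a reference that is 1 at
 * exactly three indices (fundamental and two harmonics): for each lag,
 * simply sum the three spectrum values at the shifted reference indices. */
void sparse_xcorr(const float spectrum[], int spec_len,
                  const int ref_idx[3],          /* indices of the three ones */
                  float corr[2 * SHIFT + 1])
{
    int lag, j;
    for (lag = -SHIFT; lag <= SHIFT; lag++) {
        float sum = 0.0f;
        for (j = 0; j < 3; j++) {
            int idx = ref_idx[j] + lag;
            if (idx >= 0 && idx < spec_len)    /* stay inside the spectrum */
                sum += spectrum[idx];
        }
        corr[lag + SHIFT] = sum;
    }
}
```

The lag with the largest sum then marks the best match; a perfectly tuned string peaks at the centre lag.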
Figure 2: Correlation of the spectra in figure 1 (correlation coefficient versus shift)
3 Method
3.1 Analysis of the guitar sound
The project began with recordings of the guitar strings from a Hagström Viking semi-acoustic electric guitar. The sounds were then analysed using Matlab's built-in FFT function to get an idea of what the frequency spectrum would look like. The spectrum was found to vary a lot between different strings and pitches, and it was quickly concluded that the frequency positions of the harmonic peaks were of higher significance than their amplitudes. Since the goal is to tune multiple strings simultaneously, a normal autocorrelation, or even a cepstrum, that might have been used for one string, could not be used. Instead another method was tested.
By cross correlating the frequency spectrum of all strings played together with the spectra of the individual strings in turn, a separate correlation for each string could be obtained. This method was tested using recorded sounds, which resulted in a very messy graph. The amplitude differences between the harmonics of the reference sounds made it hard to get a clear result. To get rid of this issue, the references were constructed rather than recorded, so as to be quantised and noise free. In this way we could get the exact frequencies for the pitch and harmonics of each string, and the correlation was done against the frequency pattern rather than the amplitudes of the harmonics.
Since it was realised that the frequency distance between the pitch and the harmonics changes when the string is out of tune, a linear frequency scale would be hard to use: the correlation would yield multiple peaks depending on whether the pitch or the harmonics matched perfectly, as shown in figure 3 A. This would compromise the accuracy, since it would be better if both the pitch and the harmonics matched at the same time, as in figure 3 B. To obtain an adequate accuracy, there was a need
for a specialised Fourier transform with a logarithmic frequency scale. This solves the problem because when the cross correlation is made, the pitch and the harmonics will match at the same time.

Figure 3: A Linear correlation, B Logarithmic correlation (amplitude versus shift)
The aim was to construct a tuner with relatively high accuracy, and given the limited memory, an accuracy of ±3 cents was chosen. This affected the choice of resolution and thereby the base of the logarithmic frequency scale, and corresponds to an acceptable error of ±1 step in the correlation.
3.2 Implementation
The program was constructed from a number of different functions: a main function in which the necessary parameters and arrays are initialised and constructed, and a set of interrupt-driven processing functions. Since the DSP uses software interrupts, no loop is needed to run the program. A software interrupt called echo is activated when the input buffer is full. This calls the detection function, which registers the input amplitude of the signal and triggers a software interrupt if the signal exceeds a threshold level. That interrupt runs the process function, which calls the different functions needed to process the signal. Further explanation of these functions follows.
Figure 4: Flow chart of the algorithm
3.2.1 Echo
Echo is called by interrupt when the input buffer is full; it collects the values from the input buffer and passes the buffer on to the detection function. When the DSP is starting up it has a lot of random values on the input that must be ignored, so a counter was used to ignore the first 128 calls to detection. The counter is then reset when the data has been processed, to ignore the end of a signal and prevent misreadings.
3.2.2 Detection
The detection function first calculates the mean power of the input. If the value is higher than a predefined threshold value, the detection function goes into a buffering mode where it samples every package until the sample buffer is filled. When the last step is done the buffering mode is deactivated, the process flag is set and the process interrupt is called.
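A sketch of the level detection described above (function names and the 16-bit sample type are our assumptions, not the project's code):

```c
/* Mean power of one input block: average of the squared samples.
 * Accumulated in float to avoid overflowing a 16-bit accumulator. */
float mean_power(const short buf[], int n)
{
    float acc = 0.0f;
    int i;
    for (i = 0; i < n; i++)
        acc += (float)buf[i] * (float)buf[i];
    return acc / (float)n;
}

/* Detection: trigger buffering when the mean power exceeds the
 * predefined threshold (section 3.2.2). */
int signal_detected(const short buf[], int n, float threshold)
{
    return mean_power(buf, n) > threshold;
}
```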
3.2.3 Process
Process gathers all the functions needed to perform the processing of the signal. The first step is to perform a DFT, which is covered in the section below. The result is then correlated with the reference arrays for the individual strings. The returned values represent the indices of the maximum correlation and the corresponding value in relation to the maximum possible correlation value.
3.2.4 DFT
The DFT function is based on a normal Fourier transform summation using a double for-loop. The frequency array used was explained in the theory section, and this is the only difference from a normal DFT. Since the DSP does not have support for complex numbers, the summation had to be done in two separate variables, one for the real and one for the imaginary part of the complex result. The magnitude of these is normalized to reduce the risk of overflow and then stored in the output array.
3.2.5 SmallXcorr
The small cross correlation function is made so that it only calculates a small part of a normal correlation. It was chosen to only shift 20 steps to the left and to the right around the centre element, in other words 41 steps in total. The small correlation is done using a for-loop which runs from −20 to 20 and sums the elements where the reference frequencies are, which is done for the references of all six strings. When this is done, a second loop goes through the resulting correlation arrays to find the index of the maximum value.
3.2.6 MatLab GUI
A graphical user interface was created using Matlab's GUI guide. The program consists of a table, two buttons and a timer. To access the DSP a built-in function called ccsdsp was used, which makes it possible to load and run the project on the board from within Matlab. The RUN button uploads the program to the DSP and runs it, and the STOP button stops the DSP and closes the program. The table is updated every second using a timer interrupt.
Figure 5: Graphical User Interface
4 Results and Discussion
4.1 Matlab testing
The first thing we did was to record the sound from all strings and take a DFT to get an idea of what the spectra would look like. The result, using Matlab's built-in FFT function, gave us the spectra shown in figure 6 A. As can be seen, there are a lot of different peaks with varying amplitude and it is difficult to distinguish between the fundamentals and the harmonics. In a spectrum for one string this difference, as visible in figure 6, is much clearer. The work continued by implementing a DFT algorithm in Matlab to ensure its functionality. The initial results were good; however, it was soon realized, as mentioned earlier, that the distance between the frequency increments had to be logarithmic to yield appropriate accuracy at lower frequencies. The implementation of this was fairly simple in Matlab and did not generate any problems out of the ordinary. The function was tested against Matlab's built-in FFT function and produced very similar data. Some slight variations were found, but these could very well be attributed to round-off errors.
Figure 6: A Spectrum of all strings B Spectrum of D3
The correlation algorithm was also implemented and tested against Matlab's correlation function. There were some minor differences that most probably are due to round-off errors; Matlab did not use full-length floats as C does, but a user-defined length, in our case the format short, containing four decimals. Since the correlation algorithm handles a discrete frequency array, the best correlation can sometimes fall between two indices and thereby result in a double peak in the correlated data. This could be avoided by using a higher frequency resolution; however, the amount of memory and the time needed for additional calculations set a limit to this resolution. It was also important to get the frequency points in the array as close as possible to the known frequencies used as reference values. Otherwise there would be an error because of displacement from the correct value, and the tuner would always have an offset. The values used for the frequency array, calculated using Stridh's formula described in the theory section, were an initial frequency of 72.1 Hz, 1500 points and a base of 15.
4.2 Implementation
The implementation on the DSP board was straightforward and did not cause many problems at first. As mentioned above, the DSP compiler did not support the complex.h package, but a suitable workaround to this problem has already been covered. After implementing the necessary functions without any major issues, the program was tested, with confusing results. The values were not at all consistent with the expected values generated in Matlab. After many hours spent on error correction it was found that the memory was overwritten in some way, replacing the result values with memory addresses. This turned out to be because there was less internal memory available than we were expecting, and many arrays had to be moved to the external memory. The memory configuration of the board was fairly hard to understand, especially the amount of memory that was available in the different memory banks.
By moving almost all of the arrays to the external memory the functions started to produce correct results, although very slowly. The time to go through one cycle was too long: it took between 10 and 20 seconds. To decrease this time the algorithms were analysed many times in search of improvements. After much testing a new correlation algorithm was created that only used the points of interest in the reference arrays rather than correlating every point. This drastically improved the time to process the signal to a few seconds. This is still a bit too long, but acceptable.
4.3 Post-testing
The system was now complete and some post-testing was done. The Hagström guitar was used to test the tuner, and tuning one string worked well. This is shown in figure 7, where the 5th and 6th strings are tuned individually. The strings had prior to the test been tuned with a TC Electronic PolyTune commercial tuner [4]. As seen, the strings get the tuning value of 1, which implies that their pitch is slightly high. This is due to the low resolution of the tuner, and the guitar should be viewed as tuned for values between −1 and 1.
When tuning all strings the first result is most often wrong, most likely as a result of sampling too early, while the strings are still unstable after the strum; see figure 8 A. The second sampling of the same strum usually has a more accurate shift. The accuracy of the tuning, based on the amplitude of the correlation, is however lower, as visible in figure 8 B. The cause could also be that the different strings have different amplitudes and different sustain, causing the spectrum to be uneven. This is clearly something to continue working on; the timing of the tuning has to be perfected. The data presented after a tuning was also more unstable the more strings that were included. The most likely cause is the enormous amount of frequency data, which results in correlation matches in other places than intended. This could easily be improved with more memory and processing power, which would allow longer reference arrays with more harmonics.
Figure 7: A Sixth string E2, B Fifth string A2
Figure 8: A All strings 1, B All strings 2
5 Conclusion and further development
As it is now, the tuner works quite well for tuning one string at a time, but gets much less accurate when more strings are tuned. This can be due to strong correlation matches at more than one point in the spectrum because of the large number of fundamentals and harmonics in the sampled signal. Another reason may be that the sound level from some of the strings has dropped in amplitude before the sampling has started, thereby lowering the probability that the right string is tuned. This might be counteracted by increasing the frequency resolution of the tuner and the number of harmonics in the reference arrays.
One way to make sure that some noise is suppressed might be to use windowing functions; however, we discovered that a Hamming window did not noticeably improve the frequency spectrum. More testing with windowing functions should be able to clean up the spectra by narrowing the peaks, making the correlation more accurate.
Since we had limited memory and computing power we had to limit the number of frequency points to 1500 to be able to get a result in a reasonable time. This yields an uncertainty in the higher frequencies, where the step size becomes too large. In further development this can be helped by using more frequency points and also by reducing the base, in order to make the points come closer to each other. This would improve the accuracy for the entire spectrum, not just the higher frequencies.
A great deal of time could be spent on optimization for faster performance, allowing longer arrays and higher resolution as well as quicker results. As of now, the result is presented in a few seconds, which is a bit too slow in our opinion. A goal for further development would be to reduce this time to under a second, to make real-time tuning bearable.
References
[1] Vaughn Aubuchon, Vaughn's Music Note Frequency Chart,
http://www.vaughns-1-pagers.com/music/musical-note-frequencies.htm (2011-02-28).
[2] Hyperphysics,
http://hyperphysics.phy-astr.gsu.edu/hbase/music/cents.html (2011-03-03).
[3] Martin Stridh, Signal Characterization of Atrial Arrhythmias using the Surface ECG, Vol. 33, ISSN 1402-8662, 2003.
[4] TC Electronic,
http://www.tcelectronic.com/polytune.asp (2011-03-03).
Part II
Pitch Estimation
Jonas Rosenqvist, Kim Smidje, Henrik Nilsson, Johan Mattsson
Abstract
This report covers the implementation of a digital signal processing algorithm for the TMS320C6713 by Texas Instruments. The algorithm is written in C using the Code Composer Studio IDE and aims to determine the dominant frequency of a given input audio signal. This is achieved by applying the cepstrum transform, which is an extension of the discrete Fourier transform involving additional manipulation of each sample in the frequency domain. This is followed by an inverse Fourier transform, which yields a signal in what is known as the quefrency domain, in which one searches for the highest amplitude, disregarding certain intervals. From here the dominant frequency can be extracted, and the closest pure tone, as well as the distance to it, is presented to the user.
1 Introduction
Pitch estimation is, just as the name says, an estimation of the pitch. There are several techniques that can be used, with different advantages and disadvantages, but what they have in common is that they all require a lot of computation. In this report we focus on the cepstrum algorithm, which is a collection of mathematical tools. The cepstrum algorithm has the advantage that it is faster than, for example, autocorrelation, which makes it possible to compute the frequencies at a higher sample rate. The reason why the cepstrum is fast is that it is computed using only the fast Fourier transform and its inverse, together with the absolute value and logarithm functions, all of which have a fairly low time complexity. The working process we chose was to first solve the problem in a familiar environment, namely Matlab, and after that go deeper into Code Composer.
2 Theory
The Fourier transform transforms a signal in the time domain, i.e. the amplitude as a function of time, into a signal in the frequency domain, i.e. the amplitude as a function of frequency. This is visualised in Figure 1 and Figure 2. In the top plot of Figure 1, two sinusoidal signals with different phase, amplitude and frequency are shown, and the lower plot is the sum of these signals. The result of applying the Fourier transform to the sum of the signals is displayed in Figure 2. One sees here that the value of the Fourier transform for a given value of the variable f corresponds to the amplitude of a sinusoidal component with that frequency in the original signal. Since the original signal is made up only of the sum of two pure sinusoids with different amplitudes, the Fourier transform consists of only two peaks, each of them representing the amplitude of one of the two sinusoids. For all other values of f, F(f) has a value of zero because those frequencies are absent, or to put it differently, have an amplitude of zero in the original signal.
The cepstrum, whose name comes from reversing the first four letters of the word "spectrum", is a method which separates the frequencies in a tone or sound in order to determine which of them is the fundamental frequency. The cepstrum method uses several other operations in order to reach its goal, and its mathematical representation is:

F^{−1}( log_10 ( |F(x)| ) )

That is, the signal is transformed with the discrete Fourier transform, the absolute value of the result is taken and converted to a logarithmic scale, and lastly the samples are transformed back into the time domain.
Figure 1: Top: Two sinusoidal signals with different amplitudes and phases.
Bottom: The sum of the two signals
The reason for using both the absolute value and the logarithm is to emphasize lower frequencies, to make sure that the dominant peak comes from the fundamental frequency and not from one of the overtones. The result after the transformation is called the quefrency, which is measured in seconds, though not in the sense of a signal in the time domain. Because of the convolution property of the FFT, the signals will be additive, which is an important property for the cepstra (the spectra of the cepstrum). The quefrency will therefore be the sum of all the signals which are recorded. After the quefrency has been calculated, the highest peak in the window will correspond to the right frequency. For instance, if a peak in the cepstrum diagram appears at point X (dimensionless), this corresponds to the frequency obtained by dividing the sample rate (measured in Hz) by X. The peaks in the cepstra occur as a result of the periodicity of the signal or the sound.
[1]
Figure 2: The power spectrum of the two sinusoidal signals
3 Methods
There are two types of algorithms that can be used to obtain estimated frequencies: algorithms in the time domain and algorithms in the frequency domain.
3.1 Time domain
One very simple approach that can be used in the time domain is to look at the zero crossings of the signal, i.e. the points where the signal goes from a positive value to a negative value or the other way around. In one period the signal will cross zero two times, and by measuring at what times this happens a rough estimate of the pitch can be calculated. This approach is not very robust, since for signals that consist of multiple sinusoids with different periods the result will not be close to the real frequency. Other methods in the time domain, such as autocorrelation, take a different approach. As the name says, autocorrelation finds the correlation of the input signal with a lagged copy of itself; by computing this correlation one lag at a time we can build another graph, which hopefully will look like a sinusoid. The main problem with autocorrelation is at higher frequencies, where the number of additions becomes overwhelming. This is the reason why autocorrelation is mostly used in the low to mid frequency range.
[1]
autocorrelation[k] = 1/(N-k) * Σ_{n=k}^{N} signal[n] * signal[n-k]
3.2 Frequency domain
Frequency domain methods take an input in the time domain and compute its frequency spectrum. The input signal will cover the whole spectrum, but the dominant frequency will have the highest peak. The main advantage of frequency domain methods is the use of the fast Fourier transform, which makes the computations fast and reliable. There are a number of different algorithms that perform operations in the frequency domain, for example kepstrum, cepstrum, power cepstrum and maximum likelihood. To be able to compute the estimated pitch, the input signal needs to be divided into smaller parts, and the estimate is computed for each part. The disadvantage of dividing the input signal is the loss of resolution, since the estimated frequency depends on the sampling rate and the length of the divided input signal. It is still possible to get a good resolution: if, for example, the sampling rate is 8000 Hz and the length of the divided input signal is 8000 samples, then each integer frequency can be represented.
[1]
estimated frequency = sample rate / index of (maximum value)
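Both points can be made concrete in a couple of lines (the helper names are ours):

```c
#include <assert.h>

/* Frequency resolution of a divided input block: neighbouring DFT bins
   are fs / block_length apart, so fs = 8000 Hz with a block of 8000
   samples gives a 1 Hz grid. */
static double bin_spacing_hz(double fs, int block_len)
{
    return fs / block_len;
}

/* Cepstrum peak index -> estimated pitch, as in the expression above. */
static double index_to_hz(double fs, int peak_index)
{
    return fs / peak_index;
}
```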
4 Implementation
In order to evaluate the algorithm's ability to correctly detect the dominant frequency, we decided to first implement it in Matlab. In addition to the group being more experienced with Matlab than with the C language, it allows for much faster implementation thanks to the high-level development environment, with many of the crucial algorithms, such as the fast Fourier transform, already implemented. It was also at this stage that we estimated the appropriate cutoff level in the quefrency domain, as well as a suitable signal sample size, by experimenting with various audio signals. By not discarding enough initial values in the quefrency domain, one runs the risk of finding a false dominant frequency. On the other hand, if too many values are ignored, one might miss the true dominant frequency. The sample size must cover a large enough time frame to detect the lowest possible frequency, and at the same time not be too big with regard to the limited memory and real-time requirements. Using a sample rate of 32000 samples/second and a vector of length 512, corresponding to a window of 16 ms, we found that we got acceptable results while still being able to detect frequencies as low as approximately 240 Hz. We loaded an audio signal generated by a horn, which had the frequency 123.47 Hz. That frequency corresponds to
Figure 3: Power spectrum from the B2 horn
a B2 note, which means the note B in the second octave. When running our Matlab pitch detection program, we get the plots where the first plot represents the power spectrum of the B2 horn and the second plot shows the cepstrum of the B2 horn. Given the cepstrum, the estimated frequency can be calculated to approximately 126 Hz, which can be considered a reasonable deviation from the correct answer. Since we aimed to design our range for the fourth octave, we did the same with C4 and got the plot shown below. The peak is located at index 124, which gives a frequency of circa 258 Hz.
The implementation in Code Composer Studio was very similar to what we did in Matlab. Through the DSP library, the FFT and IFFT algorithms were made available. Other functions, such as log and abs, were introduced into the program through the math.h library. To increase the accuracy of the program we chose to keep the 10 latest values and only present the median of these, to remove any potential outliers. After some trial and error we chose a cutoff point at 20 samples, as that gave the right results in our chosen octave.
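A median-of-10 selector like the one described could look as follows in C (our sketch, not the report's code):

```c
#include <assert.h>
#include <string.h>

/* Median of the 10 most recent estimates: copy, insertion-sort, and take
   the mean of the two middle values. Occasional outlier estimates are
   discarded this way, unlike with a plain average. */
static float median10(const float *latest)
{
    float s[10];
    int i, j;
    memcpy(s, latest, sizeof s);
    for (i = 1; i < 10; i++) {           /* insertion sort */
        float v = s[i];
        for (j = i; j > 0 && s[j - 1] > v; j--)
            s[j] = s[j - 1];
        s[j] = v;
    }
    return 0.5f * (s[4] + s[5]);
}
```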
Results
Table 1 shows the frequency estimation for 17 different frequencies that cover evenly spaced intervals of the third and fifth octaves, as well as the whole fourth octave, in the frequency range of 260-520 Hz. In addition to this we have added
Figure 4: The cepstrum plot with folding
Figure 5: Highest peak at index 124
all the pure notes in the fourth octave. The fourth column of the table shows that the errors in the fourth octave are very small (an average of 1.5 Hz), and rapidly increase as we move outside it.
Input frequency (Hz)   Output frequency (Hz)   Note   Deviation (Hz)
240                    260                             20
262                    262                     C4      0
280                    280                             0
293                    292                     D4      1
320                    320                             0
330                    328                     E4      2
349                    346                     F4      3
360                    358                             2
392                    390                     G4      2
400                    400                             0
440                    438                     A4      2
480                    476                             4
494                    492                     B4      2
520                    524                             4
523                    524                     C5      1
560                    560                             0
600                    602                             2
640                    640                             0
680                    680                             0
720                    726                             6
760                    760                             0
800                    800                             0
840                    842                             0
880                    888                             8
We could get the right results as low as 180 Hz and as high as 2000 Hz, but at these values the reliability suffers and you sometimes end up with an overtone: the right note but the wrong octave.
5 Problems encountered
The biggest problems we encountered were during the implementation in C. Mainly the DSP library provided by Texas Instruments caused big problems. First, just setting the class path right was tricky, but after looking into the reference guide and getting some help from Frida we got it right. After solving the class path a new problem was introduced: how to use the function DSPF_sp_fftSPxSP provided by the DSP library. Apparently the conversion in C from unsigned short to float does not do what you might expect, which forces you to first cast from unsigned short to short and after that cast to float. If this is not done correctly there will be values, like 0, that are interpreted as the maximum float value.
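The cast chain described above can be sketched like this; note that the unsigned-to-signed step is formally implementation-defined in C, though it behaves as shown on the C6713 and other two's-complement targets:

```c
#include <assert.h>

/* The buffers hold 16-bit two's-complement samples in unsigned shorts.
   A direct (float)raw cast reads e.g. 0xFFFF as 65535.0; going through
   (short) first reinterprets the bit pattern as the signed value -1. */
static float sample_to_float(unsigned short raw)
{
    return (float)(short)raw;
}
```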
The last issue we had to resolve was that some frequencies seemed impossible to get good estimates for, and the estimates fluctuated a lot. What we did was to use the median of the last 10 values instead of the regular average. This cancels out the fluctuation but introduces other disadvantages, for example if the frequency of the input signal varies very fast; in reality, though, this should not be a problem.
6 Conclusion
The estimation was good for frequencies in the range 240-880 Hz; outside this range the errors become too large. The reason for this is that the input signal is divided into smaller parts. Another limitation of our program is that if the input signal changes frequency every 16 ms, the output will just be the median of the last 10 estimated frequencies.
7 References
[1] Roads, Curtis (1996). The Computer Music Tutorial, Part 4: Sound analysis.
[2] Norton, Michael; Karczub, Denis (2003). Fundamentals of Noise and Vibration Analysis for Engineers, Cambridge University Press.
[3] Frequencies of Musical Notes, http://www.phy.mtu.edu/~suits/notefreqs.html
Part III
Vocoder
Mattias Danielsson, Andre Ericsson, Kujtim Iljazi, Babak
Rajabian
Abstract
This report is based on a project by students to become more experienced in programming signal processing algorithms on a Texas Instruments TMS320C6713 DSK. The project was to program a music-oriented LPC vocoder. A highpass FIR filter was used for prefiltering. The Levinson-Durbin recursion was used to model the voice using an IIR lattice filter structure; to do so, the autocorrelation of the voice was needed. A synthesizer was needed as a carrier signal, which was the key to changing the voice. The vocoder was programmed so that the voice controls the level of the carrier signal. It is recommended to take the course Optimum Signal Processing before reading this report.
26 Vocoder
1 Introduction
The purpose of this project was to program a Texas Instruments TMS320C6713 DSK, in C using Code Composer Studio v3.3, as an LPC-based (Linear Predictive Coding) vocoder used as a musical instrument. The vocoder is programmed to be used together with a synthesizer. When the performer presses a key on the synthesizer and speaks into the microphone (both plugged into the vocoder), a synthesized vocal sound is heard from the loudspeakers. The character of the synthesized vocal sound depends on what sound the synthesizer is set to produce. The big advantage is that the vocoder is compatible with basically every musical instrument that can produce a sound with a constant sustain level and rich frequency content.
2 Theory
2.1 Overall description of our vocoder model
Figure 1: Our vocoder model
Our vocoder model is shown in figure 1 above. A sampled speech signal from the microphone is first filtered through a highpass filter, in order to get rid of the low frequency content in the voice (the higher frequency content of the voice spectrum is what defines the vocal tract, as explained in the second lecture of this course). Otherwise the unnecessary low frequency content of the voice would be modeled as well. Thereafter a block of samples from the highpass filter is built, in order to calculate the autocorrelation values. With the autocorrelation values we can estimate the filter coefficients for the all-pole model (IIR filter), which should represent a model of the vocal tract.
From the sample block from the highpass filter, the maximum value can be obtained to represent the amplitude of the speech signal. This value is multiplied with the normalized carrier signal (which is always a signal between -1 and 1). This way the sound level of the carrier signal is controlled by the voice signal. The signal from the IIR model is the modified voice. The last step is to invert the effect of the highpass filter applied at the beginning, by filtering the output signal from the IIR model with a lowpass filter.
2.2 The highpass ﬁlter
An FIR filter is defined by the following equation:

H(z) = Σ_{k=0}^{p} b_p(k) z^(-k)     (1)

An FIR filter structure of this type is used to implement a highpass filter.
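A direct-form implementation of equation (1) might look like this in C (our sketch; the report used a general Texas Instruments library routine instead):

```c
#include <assert.h>

/* Direct-form FIR filter y[n] = sum over k of b[k] * x[n-k], as in (1).
   `hist` holds the p+1 most recent inputs, newest first. */
static float fir_step(const float *b, float *hist, int p, float x)
{
    float y = 0.0f;
    int k;
    for (k = p; k > 0; k--)          /* shift the delay line */
        hist[k] = hist[k - 1];
    hist[0] = x;
    for (k = 0; k <= p; k++)
        y += b[k] * hist[k];
    return y;
}
```

With b = {1, -0.98} and p = 1 this becomes the first-order highpass filter used later in the report.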
2.3 The autocorrelation function
The autocorrelation function determines how much a signal relates to itself at different time lags. The estimate of the autocorrelation function is given below:

r_x(k) = 1/(N-k) * Σ_{n=k}^{N} x(n) x*(n-k)     (2)
The autocorrelation function must be normalized in order to prevent overflow and to work properly with the Levinson-Durbin recursion:

ρ_x(k) = r_x(k) / r_x(0)     (3)
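Equations (2) and (3) together might be sketched in C as follows (our illustration for the real-valued case; array bounds and names are assumptions):

```c
#include <assert.h>

/* Normalized autocorrelation sequence rho[k] = r[k]/r[0], as fed to the
   Levinson-Durbin recursion; rho[0] is always 1. Requires p < 16. */
static void norm_autocorr(const float *x, int N, float *rho, int p)
{
    double r[16];
    int k, n;
    for (k = 0; k <= p; k++) {
        r[k] = 0.0;
        for (n = k; n < N; n++)          /* eq (2), real-valued */
            r[k] += (double)x[n] * x[n - k];
        r[k] /= (N - k);
    }
    for (k = 0; k <= p; k++)             /* eq (3) */
        rho[k] = (float)(r[k] / r[0]);
}
```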
2.4 The Levinson-Durbin recursion

The Levinson-Durbin recursion is an algorithm used to find an all-pole model from a sequence of autocorrelation values. It calculates both the regular IIR filter coefficients, a(j), and the reflection coefficients for an IIR lattice filter, Γ_j. The Levinson-Durbin algorithm is described in [1] and repeated in Table 1 below.
1. Initialize the recursion:
   (a) a_0(0) = 1
   (b) ε_0 = ρ_x(0)
2. For j = 0, 1, ..., p-1:
   (a) γ_j = ρ_x(j+1) + Σ_{i=1}^{j} a_j(i) ρ_x(j-i+1)
   (b) Γ_{j+1} = -γ_j / ε_j
   (c) For i = 1, 2, ..., j:
       a_{j+1}(i) = a_j(i) + Γ_{j+1} a*_j(j-i+1)
   (d) a_{j+1}(j+1) = Γ_{j+1}
   (e) ε_{j+1} = ε_j [1 - |Γ_{j+1}|²]
3. b(0) = √(ε_p)

Table 1: The Levinson-Durbin recursion
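Table 1 translates fairly directly into C for the real-valued case (our sketch; step 3, b(0) = sqrt(ε_p), can be computed from the returned eps_out):

```c
#include <assert.h>
#include <math.h>

#define P 8   /* maximum model order; the report uses order 8 */

/* Levinson-Durbin recursion from Table 1 (real-valued case).
   rho[0..p]: normalized autocorrelation. Outputs: direct-form
   coefficients a[0..p], reflection coefficients refl[1..p], and the
   final prediction error eps_out. Requires p <= P. */
static void levinson_durbin(const double *rho, int p,
                            double *a, double *refl, double *eps_out)
{
    double eps = rho[0];
    double gamma, tmp[P + 1];
    int j, i;
    a[0] = 1.0;
    for (j = 0; j < p; j++) {
        gamma = rho[j + 1];                      /* step 2(a) */
        for (i = 1; i <= j; i++)
            gamma += a[i] * rho[j - i + 1];
        refl[j + 1] = -gamma / eps;              /* step 2(b) */
        for (i = 1; i <= j; i++)                 /* step 2(c) */
            tmp[i] = a[i] + refl[j + 1] * a[j - i + 1];
        for (i = 1; i <= j; i++)
            a[i] = tmp[i];
        a[j + 1] = refl[j + 1];                  /* step 2(d) */
        eps *= 1.0 - refl[j + 1] * refl[j + 1];  /* step 2(e) */
    }
    *eps_out = eps;
}
```

For an AR(1)-like correlation sequence ρ = {1, 0.5, 0.25} this yields a(1) = -0.5 and a(2) = 0, i.e. the model 1/(1 - 0.5 z^(-1)).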
2.5 The IIR lattice ﬁlter
The IIR lattice filter structure is an alternative structure for the IIR filter. Instead of using the regular filter coefficients, a(k), it uses the reflection coefficients Γ_k. This structure has "the same advantages of modularity, simple tests for stability and decreased sensitivity to parameter quantization effects" [1]. A single stage of an IIR lattice filter structure is shown in the figure below, and the difference equations describing it are:

e⁺_j(n) = e⁺_{j+1}(n) - Γ_{j+1} e⁻_j(n-1)     (4)

e⁻_{j+1}(n) = e⁻_j(n-1) + Γ*_{j+1} e⁺_j(n)     (5)
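Solving equation (4) for the forward path and applying (5) stage by stage gives the usual per-sample loop. This C sketch (our own, real-valued) assumes refl[1..p] comes from the Levinson-Durbin recursion and that the states start at zero:

```c
#include <assert.h>

/* One sample through a p:th order all-pole (IIR) lattice built from
   eqs (4) and (5). refl[1..p] are the reflection coefficients Gamma_j;
   g[0..p] holds the backward-path states e_j^-(n-1), initialized to 0. */
static double iir_lattice_step(double in, const double *refl,
                               double *g, int p)
{
    double f = in;                       /* e_p^+(n) */
    int j;
    for (j = p; j >= 1; j--) {
        f = f - refl[j] * g[j - 1];      /* eq (4), solved for e_{j-1}^+ */
        g[j] = g[j - 1] + refl[j] * f;   /* eq (5) */
    }
    g[0] = f;                            /* e_0^-(n) = e_0^+(n) */
    return f;                            /* output y(n) = e_0^+(n) */
}
```

With p = 1 and Γ_1 = -0.5 the impulse response is 1, 0.5, 0.25, ..., i.e. the direct-form filter 1/(1 - 0.5 z^(-1)).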
Figure 2: Single stage of an IIR lattice ﬁlter
A complete p:th order IIR lattice filter can then be derived from the difference equations and the figure above, as shown in the figure below.
Figure 3: p:th order IIR lattice ﬁlter
3 Implementation
The implementation of the vocoder was done in the C programming language using Code Composer Studio. A code template used in the second lab of this course was used to make starting easier. There are a lot of configuration bits that can be set to change the parameters of the AD/DA converter, but these were left at their default values, except for the sampling rate, which was changed to 8 kHz so as not to break any real-time requirements. Using the knowledge that speech is somewhat frequency stationary during 20 ms (which we read in a report from a similar speech modeling project in this course) and the sampling frequency of 8 kHz results in a buffer size of 160 samples. One thing to keep in mind is that the PIP buffers are filled with unsigned shorts that have to be cast to short and then to float to be able to do calculations with increased precision. The input voltage to the AD converter also has to be taken into account: for example, if you want to use an old analogue synthesizer to generate the carrier signal, you have to make sure that the output from the synthesizer is below the reference voltage of the AD converter. To implement the highpass filter (an FIR filter) of any size, a general, already written function by Texas Instruments was used, even though the highpass filter was only of order one.
Our highpass filter is defined by the following equation:

H_HP(z) = 1 - 0.98 z^(-1)     (6)
The order of the filter modeling the vocal tract was chosen to be 8. This makes the size of the normalized autocorrelation vector 9 (to make an all-pole model of order n using the Levinson-Durbin recursion, you need n+1 autocorrelation values). At first a regular IIR filter was used to model the vocal tract, but it was later replaced by an IIR lattice filter structure, as described in the results section. To make the voice control the level of the carrier signal, the absolute largest value is found in the block of 160 samples of highpass filtered speech. This value is multiplied with the normalized carrier signal, which lies in the interval between -1 and 1; the normalization was done by dividing the carrier signal by 32000, because the absolute maximum value our samples can take is 32000.
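The level control described in this paragraph can be sketched as follows (our illustration; the 32000 full-scale value and 160-sample block are the report's):

```c
#include <assert.h>
#include <math.h>

#define BLOCK 160   /* 20 ms at 8 kHz, as in the report */

/* Envelope control: scale the carrier block by the peak magnitude of
   the speech block, after normalizing the carrier into [-1, 1] by
   dividing by the full-scale value 32000. */
static void apply_envelope(const float *speech, float *carrier)
{
    float peak = 0.0f;
    int n;
    for (n = 0; n < BLOCK; n++)
        if (fabsf(speech[n]) > peak)
            peak = fabsf(speech[n]);
    for (n = 0; n < BLOCK; n++)
        carrier[n] = peak * (carrier[n] / 32000.0f);
}
```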
4 Testing and debugging
For testing the different blocks of the vocoder we took a pragmatic approach: knowing the expected behaviour of each block, suitable test signals easily revealed whether the block was working correctly or not. For the highpass filter we used sounds with both high and low frequency content and listened to the filtered signal. Code Composer Studio also provides special commands enabling printouts of internal variables and block outputs while running the code.
For the autocorrelation we used a sine signal and looked at the resulting output vector. The autocorrelation was strictly decreasing in value for increasing lag shifts. In our case there are 9 autocorrelation coefficients. We also tested the autocorrelation with white noise; this resulted in low and unpredictable autocorrelation values for lag shifts greater than zero.
The Levinson-Durbin algorithm was tested by feeding white noise into an IIR filter whose coefficients we had set ourselves. The output of this filter was used as input to our autocorrelation block, and the output of the autocorrelation was sent to the Levinson-Durbin algorithm, whose output should then be estimates of the filter coefficients of the IIR filter. Our Levinson-Durbin algorithm produced estimates that varied around the values preset by us. The reason for the variation around the correct filter coefficient values is the short input block length of 160 samples, which is the result of the 8 kHz sample rate and the 20 ms speech block duration used. To test our complete vocoder system we used a recorded voice sample on the left channel and different square wave audio sources of increasing frequency on the right channel. The different sources on the right channel could be mixed and amplified at will in the audio program Audacity.
5 Results and conclusions
The first test of our complete vocoder system resulted in a low sound level. This was overcome by using amplified speakers (we did not raise the output gain parameter in the program, because we did not know how to change the gain parameter during runtime). The reason for the low sound level is probably that no amplification is done in the AD/DA converter. Using our regular IIR filter for filtering the carrier resulted in loud and painful sound level spikes, later discovered to be caused by unstable filter coefficients. Trying different fixes to the IIR filter resulted in some improvements, but no complete absence of the painful sound spikes. The ordinary IIR filter was therefore replaced by an IIR lattice filter. After altering the Levinson-Durbin algorithm so that the reflection coefficients could not exceed an absolute value of one, there were no spikes in the output signal. The pitch of the generated speech was also tested by changing the frequency of the square wave carrier. The speech pitch changed satisfactorily with the frequency of the carrier, making us happy with the result. As always there are different changes, choices and improvements one can make in system design and implementation, but we are satisfied with our choices. We never had time to test carrier signals from real synthesizers before the deadline for this report, but we will show it in our demonstration.
References
[1] Monson H. Hayes, Statistical Digital Signal Processing and Modeling, John Wiley and Sons, Inc., 1996.
[2] http://en.wikipedia.org/wiki/Vocoder
Part IV
Reverberation
R. Tullberg, R. Mittipalli, S. AbduRahman, T. Isacsson
Abstract
In this project, the challenge was to implement a digital reverb on a Texas Instruments TMS320C6713 DSK development board. Jean-Marc Jot's feedback delay network algorithm was used as the reverberation algorithm. Different parameters of the algorithm had to be identified and tuned experimentally. To meet real-time constraints imposed by CPU and memory speeds, various hardware and software optimizations had to be employed. To aid development, the algorithm was also implemented as a non-realtime version in Matlab. This was beneficial both as a reference design and as a tool for parameter tuning and code analysis. The finished application produces a smooth reverb sound running without glitches at a CPU consumption of approximately 60-65%.
36 Reverberation
1 Introduction
1.1 Reverberation
Sound waves travelling in a room are reflected when they hit walls or other obstacles. The reflections go on to hit walls and obstacles and get reflected again, and so on. This phenomenon is called reverberation. Sounds are enriched and colored by these reflections due to the tendency of air and obstacles to dampen higher frequencies to a greater extent than lower frequencies.

A reverberated sound consists of three main parts. The sound that travels directly from the source to the listener is called the direct sound. Reflected copies of the sound are delayed some time, depending on the physical properties of the surroundings, such as room size, material and wall surface, before reaching the listener. The earliest iterations of these reflections are called early reflections and extend roughly 60 to 100 ms after the initial direct sound, depending on the size of the room[1] (see figure 1).

In time, as reflections are reflected and multiplied again and again, they become indistinguishable as separate echoes to the listener. This last part, called the late reverberation, starts at about 100 ms and can go on for several seconds in a large enough room or concert hall.
Figure 1: Simplified image of the direct sound and 1st and 2nd order early reflections
Figure 2 shows the early part, consisting of several early reflections, followed by the late, decaying reverberated part.
Figure 2: Impulse response of the implemented reverb, showing early reflections and late reverberation
2 Theory
2.1 Reverb Algorithm
Early on in the project we settled on the first of the algorithms developed by Jean-Marc Jot. We had read that, while computationally expensive, it produced an impressive reverberated sound with rich echo density[1]. The defining characteristic of this algorithm is the feedback delay network that Jot introduced to model late reverberation.

Figure 3: Jot's FDN algorithm and how it fits in the overall reverb implementation
2.2 Reverberation time
The time for a sound to attenuate by 60 dB in a reverberant space is called the reverberation time, T_r. Sound is attenuated because of the surfaces in the room, as they absorb the energy of the sound waves and their reflections. The reasoning behind the 60 dB attenuation requirement is that this is the difference between the intensity of a common orchestra, 100 dB, and the background noise of an ordinary room, 40 dB[5]. The two most common formulas for the approximation of T_r are the Eyring formula[1] (0.163 in both formulas corrected to 0.161[2]):

T_r = (0.161 · V) / (-A · ln(1 - s) + 4 · δ_a · V)     (1)
and the Sabine equation[1]:

T_r = (0.161 · V) / (s · A)     (2)

where V is the room volume, A is the room surface area, s is an average absorption coefficient and δ_a is the frequency dependent attenuation constant of air. Once T_r is calculated, the attenuation can be derived and a frequency dependency established as[1]:

T_r(ω) = -3 · T / log10(γ(ω))     (3)

where T is the sampling period and γ(ω) is the attenuation per sample period as a function of the frequency ω.
2.3 Delay elements z^(-m_i)

The z^(-m_i) delays model the time it takes for a reflection to reach the listener and/or another obstacle or wall. The delay in samples is the delay time multiplied by the sampling frequency, for example:

m_16 = 100 ms · 48 kHz = 4800     (4)

The different delay values are recommended to be mutually prime when expressed in sample units. This is to avoid superposition of harmonically related sound waves causing unpleasant resonances, so called flutter echoes[3].
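Both the sample-delay conversion of equation (4) and the mutually prime requirement can be sketched in a few lines of C (our illustration; the delay values in the test are arbitrary):

```c
#include <assert.h>

/* Delay length in samples = delay in seconds * sampling rate,
   e.g. 0.100 s * 48000 Hz = 4800 samples. */
static int delay_samples(double delay_s, double fs)
{
    return (int)(delay_s * fs + 0.5);
}

static int gcd(int a, int b)
{
    while (b != 0) { int t = a % b; a = b; b = t; }
    return a;
}

/* Delay lengths should be pairwise (mutually) prime to avoid flutter
   echoes from harmonically related delay lines. */
static int mutually_prime(const int *m, int count)
{
    int i, j;
    for (i = 0; i < count; i++)
        for (j = i + 1; j < count; j++)
            if (gcd(m[i], m[j]) != 1)
                return 0;
    return 1;
}
```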
2.4 Damping Filters h_i(z)

Starting from the input x(n) in figure 3, the signal is copied into, in our case, 16 different lines, delayed m_i samples, and then filtered by the h_i(z) filters. These lowpass filters model the real world's attenuation due to absorption, reflection and spreading in walls and other obstacles. High frequency components are attenuated to a greater extent than lower frequencies, as described in 2.2. The filters are expressed as follows in the frequency domain[1]:
h_i(z) = g_i · (1 - a_i) / (1 - a_i z^(-1))     (5)

where

g_i = 10^(-3 m_i T / T_r(dc)),

a_i = (ln 10 / 4) · log10(g_i) · (1 - 1/α²),

α = T_r(Nyquist) / T_r(dc),

with T_r(Nyquist) and T_r(dc) being the time it takes for the highest and lowest frequencies respectively to decay by 60 dB.
2.5 Diﬀusion Matrix A
When a sound wave hits an obstacle in a room it is reflected and scattered across the room, hitting other obstacles, which in turn scatter the new reflections across the room, and so on. Each time, the reflections are redistributed among the walls and obstacles in the room. The element responsible for this redistribution in Jot's algorithm is the diffusion matrix. It takes its inputs from the n delay lines and redistributes them back into the same delay lines. Since the damping or attenuation of sound waves is handled by the damping filters h_i(z), the diffusion matrix should only redistribute the energy and neither amplify nor attenuate it. In other words, it should be both stable and lossless, both of which are fulfilled if the matrix is unitary, in the case of a complex valued matrix, or orthogonal, in the case of a matrix containing only real values.
2.6 Gains b_i and c_i

These two vectors are simple gains used to achieve different effects. We simply set all the elements of vector b to 1/16 to make sure the output did not clip, as the input was copied 16 times and then summed. The c vector, often used to achieve stereo spread or other cross channel effects, was left unused, the equivalent of setting all its elements to one.
2.7 Tonal Correction Filter t(z)

As the outputs of the h_i(z) filters have lost some of the higher frequencies, they tend not to be an accurate representation of the original signal. The solution is to place an inverted version of our lowpass filters, called a tonal correction filter, before the output, to equalize the modal energy irrespective of the reverberation time in each filter[4]:

t(z) = (1 - b z^(-1)) / (1 - b)     (6)

where

b = (1 - α) / (1 + α)
3 Implementation
3.1 Real-time versus non-real-time implementation

To gain understanding of the algorithm, a reference prototype was initially developed in Matlab. When this prototype produced satisfying results it was adapted to the real-time environment of the TMS320C6713 DSK, where constraints on CPU and memory usage were the next challenge: to keep up with a sound input sampled at a given frequency, the application has to finish processing one chunk of samples before the next chunk arrives.
3.2 Diﬀusion Matrix
A potentially unlimited number of matrices fulfill the condition of being unitary, when containing complex values, or orthogonal, when only containing real values, so other considerations were taken into account when choosing the diffusion matrix. For example, a better echo density is achieved the more nonzero elements there are in the matrix[1]. However, the more nonzero elements a matrix contains, the more multiplications have to be performed. So naturally a matrix that lends itself to optimization when multiplied with a vector is preferred.

Doing some research, one such matrix[6] was found:
A = 1/2 · [  A_4  -A_4  -A_4  -A_4
            -A_4   A_4  -A_4  -A_4
            -A_4  -A_4   A_4  -A_4
            -A_4  -A_4  -A_4   A_4 ]     (7)

where A_4 is a Hadamard matrix of the 4:th order[7]:

A_4 = 1/2 · [ 1  1  1  1
              1 -1  1 -1
              1  1 -1 -1
              1 -1 -1  1 ]     (8)
This matrix has the triple benefits of being orthogonal (A = A^(-1)), containing only nonzero elements of equal magnitude, and being, as we shall see, easy to optimize.
3.3 Software Optimizations
3.3.1 Matrix Multiplication
Normally, multiplying a vector of size n by a matrix of dimension n requires n² multiplications and n(n-1) additions. However, some matrices have beneficial properties that make multiplying by them easier. The diffusion matrix described in section 3.2 is one such matrix. To start with, the matrix consists of only positive and negative ones, aside from a scalar that can be factored out, 1/4 in this case. Thus, the vector elements only need to be multiplied by the scalar after having been summed according to the signs in each matrix column. This reduces the number of multiplications to n, or 16 in our case. Furthermore, the regularity of the A_4 matrix allows us to calculate intermediate sums that can be reused, instead of having to do each addition separately[6]. For example, when multiplying a vector x of size 4 with an A_4 matrix, the following intermediate values are calculated:
a = x_1 + x_2
b = x_1 - x_2
c = x_3 + x_4
d = x_3 - x_4

and the resulting vector becomes:

Y_{1-4} = [ a + c
            a - c
            b + d
            b - d ]
In our case this was done for the 16 element input vector in groups of four, so that four result vectors were calculated, one for each sub input vector. These were organized in the following manner:

B = [  Y_{1-4}    -Y_{5-8}   -Y_{9-12}  -Y_{13-16}
       Y_{5-8}    -Y_{1-4}   -Y_{9-12}  -Y_{13-16}
       Y_{9-12}   -Y_{5-8}   -Y_{1-4}   -Y_{13-16}
       Y_{13-16}  -Y_{5-8}   -Y_{9-12}  -Y_{1-4}  ]
Finally, each row was summed to get the final result vector B of the matrix multiplication. The resulting operation count is 16 + 16 + 16·3 = 80 additions, instead of the usual 16·15 = 240. So by choosing a certain type of matrix, the number of multiplications could be reduced from 256 to 16 and the, admittedly cheaper, additions from 240 to 80.
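The intermediate-sum trick for a single A_4 block can be sketched in C as follows (our code; the row order follows the matrix in equation (8), while the report lists the same four results in a different order):

```c
#include <assert.h>

/* Fast multiply by the 4x4 Hadamard matrix A_4 of eq (8), including the
   1/2 scalar: four intermediate sums replace the twelve additions of a
   direct multiply. */
static void hadamard4(const float *x, float *y)
{
    float a = x[0] + x[1];
    float b = x[0] - x[1];
    float c = x[2] + x[3];
    float d = x[2] - x[3];
    y[0] = 0.5f * (a + c);   /* row  1  1  1  1 */
    y[1] = 0.5f * (b + d);   /* row  1 -1  1 -1 */
    y[2] = 0.5f * (a - c);   /* row  1  1 -1 -1 */
    y[3] = 0.5f * (b - d);   /* row  1 -1 -1  1 */
}
```

Since this A_4 is symmetric and orthogonal, applying it twice returns the original vector, which makes a handy sanity check.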
3.3.2 Circular Buﬀers
Every input to the filters h_i(n) is delayed m_i samples; the outputs from the filters are then fed to the diffusion matrix and summed before being sent to the tonal correction filter. Because of the long delay between the time a value is calculated and the time when it finally reaches the output and can be discarded, arrays had to be used to store the values. These arrays were implemented as circular buffers, with the size of buffer i equal to the sample delay length m_i.
A circular buffer is governed by a pointer into the array. The pointer's position is incremented each iteration, and when it reaches the end of the array it is reset to the beginning. Values are read from the position in the buffer indicated by the pointer; after the pointer is moved to the next position, a newly calculated value is written to that new position, making sure it won't be read until the pointer has traversed the whole buffer, which happens exactly m_i iterations later.

Using circular buffers reduces CPU load, since we just read or write the array element at the pointer position instead of having to move all the elements of the array one position forward each iteration. In addition to the delay lines, the predelay line was also implemented as a circular buffer.
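A minimal delay line of this kind might look as follows in C (our sketch of one common variant, which reads and writes the same slot before advancing; real delay lengths m_i are in the thousands):

```c
#include <assert.h>

#define DELAY 5   /* toy delay length m_i */

/* Circular delay line: read the sample written DELAY iterations ago,
   overwrite that slot with the new sample, then advance the pointer,
   wrapping at the end of the array. */
typedef struct { float buf[DELAY]; int pos; } delay_line;

static float delay_step(delay_line *d, float in)
{
    float out = d->buf[d->pos];      /* value from DELAY samples back */
    d->buf[d->pos] = in;             /* store the new sample */
    d->pos = (d->pos + 1) % DELAY;   /* wrap around */
    return out;
}
```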
3.4 Compiler Optimization
Compiling the program with default options and running it on the DSP board at 48 kHz resulted in very high CPU utilization, so high in fact that the low priority analysis module had trouble reporting any CPU load information back to the host. To remedy this we tried the different compiler optimization levels and settled on O2, which gave an equally good CPU load as O3, but without any potential increase in program size. This is as expected, since O3 mostly deals with the inlining of functions[8] and, aside from the interrupt triggered process function, our program only calls a function once to set some initial global variables.
Figure 4: CPU load when compiled with optimization level O1
3.5 Hardware Memory Considerations
The predelay buffer and the 16 delay line buffers, duplicated for each of the two stereo channels, need a memory space of roughly 0.5 MB, depending on the configured delay lengths and predelay, where each single sample delay
Figure 5: CPU load when compiled with optimization level O2
Figure 6: CPU load when compiled with optimization level O3
requires 4 bytes (the size of the float data type) per channel. This large amount of data that has to be stored for later, delayed processing prohibited the use of the relatively small internal memory of 256 KB[9]. Instead the onboard SDRAM memory, with its larger capacity of 8 MB[10], was used. An external heap of 983,040 (0xF0000) bytes was declared in the memory configuration utility in Code Composer Studio, with the additional space allowing for some headroom in setting the predelay and m_i delay lengths. The 34 buffer arrays were then allocated on it using MEM_calloc calls.

While the SDRAM memory worked great, we were careful not to allocate anything on it that we didn't have to, as the internal memory is faster than the SDRAM memory[9].
R. Tullberg, R. Mittipalli, S. AbduRahman, T. Isacsson 45
4 Result
4.1 Experimental setup
Experimentation was conducted in part using the prototype in Matlab,
where files of differing sound material were treated with the reverb
algorithm, and sound files of 100% wet and user-defined wet-ratio signals
were generated and compared to the originals, both by ear and by comparing
plots of the dry, wet and mixed signals. The bulk of the algorithm's
parameters, such as the A matrix and the m delay lengths, were chosen and
permanently set at this stage.
Once the real-time implementation was in place and working, we proceeded
to use recordings of anechoic music as input to the DSP board to fine-tune
parameters, such as room dimensions, α, and T_r(Nyquist), to taste.
4.2 Results
We achieved most of the goals that we set. The reverb sounds good, in no
small part due to the large number of delay lines, and as described in
Section 3.4 the CPU load is within a satisfactory range when running at
the maximum sampling frequency of the development board. The goals that
were not achieved were minor ones, like the implementation of a host-based
user interface.
The learning experience has also been substantial, giving us additional
skills in areas such as signal processing, C programming and algorithm
optimization.
5 Discussion and Conclusion
5.1 General
When modelling a reverberation application, in essence trying to replicate
a physical environment as closely as possible, the physical properties have
to be known: the volume of the room, its total surface area, the absorption
of the materials used in the room, and what other elements in the room
could absorb energy and/or reflect the rays. One way of obtaining these is
ray tracing, where an impulse response of the room is measured by setting
up microphones in the room and then firing a starter gun to produce the
impulse. The response could be used to design a system which mimics the
room perfectly, but the response is so big and complex that real-time
processing would prove impossible.
So replicating rooms with an algorithm is the way to go, but with no idea
of what kind of room we were trying to replicate, nor what kind of
absorption the room would present, we had problems knowing what we were
looking for. But as the work progressed, we made these parameters configurable,
and did not bother too much with accurately reproducing real rooms, but
focused on the audible results and on getting the program to actually work.
The hardest part was understanding the limitations of the DSP board's
memory configuration and the related special commands: understanding what
the compiler was trying to tell you, why things did not work at all, why
global variables were not seen by subfunctions, and that Code Composer
Studio's own flavor of the C language seems to have math functions that
are not exact replicas of those in the standard C math library.
The hours spent trying to understand Code Composer Studio and sifting
through the wealth of help information provided were the biggest drawback
of the project.
5.2 Nonrealtime versus realtime implementation
As mentioned earlier, a version of the algorithm was implemented in Matlab
as a non-real-time prototype. Besides the obvious advantage of having a
prototype that all members of the group could reference while working on
the real-time implementation, it was also beneficial not to have to deal
with constraints on memory and CPU while trying to get a grip on the
algorithm itself. However, there were drawbacks. Real-time issues naturally
did not manifest during the prototyping phase, and it was also hard to
measure the benefit of optimizing matrix multiplication in Matlab, since
Matlab is already heavily optimized for that purpose.
The main disadvantage of a real-time implementation is the lack of
infinite, or at least comfortably large, processing power. Because of the
time restrictions on the processing, the calculations had to be optimized
and, where that was not enough, cut down with restricted quality. In
comparison to an offline implementation, a lot more time is spent on making
a real-time application work: allocating memory correctly, optimizing
calculations and using the DSP's built-in functions are just some of the
aspects that had to be addressed. The advantage, though, is real-time
playback: using the reverb in live music rigs and recording software lets
the user hear the effect while playing.
5.3 Improvements
A user interface was on our to-do list but was never implemented due to
lack of time. This interface would contain controls for parameters like
wet/dry ratio, room size, gain, and the like. Ideally, the interface would
be able to communicate with the board in real time, using a subfunction
that calculates all the necessary parameters from the user input.
References
[1] Lilja, Ola
Algorithms for Reverberation – Theory and Implementation
Master's Thesis, LTH
2002
[2] Wikipedia: Reverberation
http://en.wikipedia.org/wiki/Reverberation#Sabine_equation
Visited: 2011-03-01
[3] Rocchesso, Davide
Introduction to Sound Processing, 3.6.2 Reverberation
2003
[4] Tonal Correction Filter
https://ccrma.stanford.edu/~jos/Reverb/Tonal_Correction_Filter.html
[5] Reverberation Time
http://hyperphysics.phy-astr.gsu.edu/hbase/acoustic/revtim.html
[6] Campbell, Spencer
An Implementation of a Feedback Delay Network – Final Project Report
2008-12-09
http://twentyhertz.com/618_FinalProjectReport_SpencerCampbell.pdf
[7] Wikipedia: Hadamard matrix
http://en.wikipedia.org/wiki/Hadamard_matrix
Visited: 2011-02-14
[8] Gough, Brian J.
An Introduction to GCC – for the GNU compilers gcc and g++, 6.4 Optimization levels
2005
http://www.network-theory.co.uk/docs/gccintro/index.html
[9] Chassaing, Rulph; Reay, Donald
Digital Signal Processing and Applications with the TMS320C6713 and TMS320C6416 DSK, 3.2 The TMS320C6x Architecture
2008
[10] Spectrum Digital
TMS320C6713 DSK Technical Reference, 506735-0001 Rev. A
May 2003
Part V
Speech Recognition Using MFCC
Harshavardhan Kittur, Kaoushik Raj Ramamoorthy,
Manivannan Ethiraj, Mohan Raj Gopal
Abstract
In this project we present one of the techniques to extract a feature
set from a speech signal and implement it in a speech recognition
algorithm on the TMS320C6713 DSK board. The key is to convert the speech
waveform into some type of parametric representation for further analysis
and processing. A wide range of techniques exist for parametrically
representing the speech signal for the speech recognition task, such as
Linear Prediction Coding (LPC), Mel-Frequency Cepstrum Coefficients
(MFCC), and others. MFCC is perhaps the best known and most popular, and
is used in this project.
50 Speech Recognition Using MFCC
1 Introduction
1.1 Why Speech recognition?
Speech is the primary means of communication between people. For reasons
ranging from the technological realization of human speech capabilities to
the desire to automate simple tasks that inherently require human-machine
interaction, research in automatic speech recognition has attracted a
great deal of attention over the past few decades. Although there are
numerous ways to model a speech signal and perform speech recognition in
both hardware and software, no such system is stable for all kinds of
speakers in the world. Our interest is to find out the intricacies of
designing such a system by implementing it on a TMS320C6713 DSK board.
1.2 Common problems found in designing such a system
• People from different parts of the world pronounce words differently.
Also, the rate at which they speak affects the implementation of a
speech modelling system.
• Speech is usually continuous in nature and word boundaries are not
clearly defined.
• The rate of error in the recognition system depends on the amount of
data stored in the system by training. When the number of words in
the database is large and consists of similar sounding words (rhyming
words), there is a good probability that one word is recognized as the
other.
• Noise is generally a major factor in speech recognition and has to
be carefully analysed while designing a system. A noisy environment
limits the system performance.
1.3 Tools Used
• MATLAB
• Code Composer Studio
• TMS320C6713 DSK Board
• HiFi Microphone
• Stereo Speakers
Harshavardhan Kittur, Kaoushik Raj Ramamoorthy, Manivannan Ethiraj, Mohan Raj Gopal 51
Figure 1: Feature extraction using MFCC
Figure 2: Feature matching using MFCC
2 Theory
2.1 Speech Recognition Algorithm
At the highest level there are a number of ways to perform the complex
task of speech recognition, but the basic principles are feature extraction
and feature matching. Feature extraction is the process that extracts a
small amount of data from the voice signal that can later be used to
represent each speaker. Feature matching involves the actual procedure of
identifying the unknown speaker by comparing features extracted from the
voice input with the ones from a set of known speakers. The block diagrams
are shown in Figures 1 and 2.
2.1.1 Feature Extraction
In the feature extraction phase the speech can be parameterized by various
methods, such as Linear Prediction Coding (LPC), Mel-Frequency Cepstrum
Coefficients (MFCC), and others. MFCC, which is used in this project, is
perhaps the best known and most popular. MFCC takes human perceptual
sensitivity with respect to frequency into consideration, and is therefore
well suited for speech recognition. MFCCs are based on the known variation
of the human ear's critical bandwidths with frequency: filters spaced
linearly at low frequencies and logarithmically at high frequencies capture
the phonetically important characteristics of speech. This is expressed in
the mel-frequency scale, which has linear frequency spacing below 700 Hz
and logarithmic spacing above 700 Hz.
2.1.2 Feature Matching
The feature matching phase uses the Euclidean distance. In mathematics,
the Euclidean distance, or Euclidean metric, is simply the ordinary
distance between two points that one would measure with a ruler, which can
be derived by repeated application of the Pythagorean theorem. Using this
formula as the distance, Euclidean space becomes a metric space. Here it
serves as a measure of how similar two feature templates are: the smaller
the Euclidean distance between two feature vectors, the more alike they
are, which is why we chose this method of comparison.
2.2 Mel Frequency Cepstrum Coefficients
These are derived from a type of cepstral representation of the audio clip
(a cepstrum is essentially a "spectrum of a spectrum"). The difference
between the cepstrum and the mel-frequency cepstrum (MFC) is that in the
MFC the frequency bands are positioned logarithmically (on the mel scale),
which approximates the human auditory system's response more closely than
the linearly spaced frequency bands obtained directly from the FFT or DCT.
This can allow for better processing of data, for example in audio
compression. However, unlike the sonogram, MFCCs lack an outer-ear model
and hence cannot represent perceived loudness accurately. MFCCs are
commonly derived as follows:
1. Take the Fourier transform of (a windowed excerpt of) a signal.
2. Map the log amplitudes of the spectrum obtained above onto the Mel
scale, using triangular overlapping windows.
3. Take the Discrete Cosine Transform of the list of Mel log-amplitudes,
as if it were a signal.
4. The MFCCs are the amplitudes of the resulting spectrum.
3 Implementation
Both the training and the recognition systems are the same up to the point
where the MFCC coefficients are found. The training phase stores the
coefficients, and the recognition phase compares the currently recorded
coefficients with the stored ones. The steps implemented to complete our
design are listed below, and the block diagram is shown in Figure 3.
3.1 Level detection
When the speaker says a word, the system has to perform silence detection
and capture only the speech signal. The start of an input speech signal is
identified based on a prestored threshold value. Speech is captured from
the point where it exceeds the threshold and is passed on to the framing
stage. The sampling frequency of our system is 8 kHz and the speech is
captured for about 1 s, which leaves us with 8192 samples.
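A minimal sketch of this level detection is given below; the threshold value is illustrative, since the project's actual prestored threshold is not given in the text.

```c
#include <stdlib.h>

#define THRESHOLD 1000   /* illustrative amplitude threshold */

/* Return the index of the first sample whose magnitude exceeds the
   threshold, i.e. the assumed start of speech, or -1 for pure silence. */
static int detect_speech_start(const short *x, int n)
{
    int i;
    for (i = 0; i < n; i++)
        if (abs(x[i]) > THRESHOLD)
            return i;
    return -1;
}
```

Capture would then start from the returned index and continue for the fixed 8192-sample duration.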
3.2 Frame blocking
It is assumed that recorded speech is piecewise stationary, meaning the
signal is stationary over short periods of time. By taking advantage of
this property, we divide the captured signal into a fixed number of
overlapping frames of length 256 samples, with a 156-sample overlap. That
is, each frame consists of 256 samples of the speech signal, and the
subsequent frame starts at the 100th sample of the previous frame. This
technique is called framing.
3.3 Windowing
After framing, windowing is applied to prevent spectral leakage. A Hamming
window with 256 coefficients is used, since the frame length is 256
samples. It is also convenient to combine this step with the frame
blocking step.
3.4 Fast Fourier transform
The FFT converts the time-domain speech signal into the frequency domain,
yielding a complex signal. Speech is a real signal, but its FFT has both
real and imaginary components. We apply a 256-point radix-2 FFT to each
frame, giving a total of 8 FFT stages. The FFT algorithm from Rulph
Chassaing's book [1] is used in our implementation.
3.5 Power spectrum calculation
Figure 3: Our Speech Recognition System
The power in the frequency domain is calculated by summing the squares of
the real and imaginary components of the signal. The second half of the
samples in each frame's spectrum is ignored, since the spectrum of a real
signal is symmetric about its midpoint.
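This power calculation over the lower half of the FFT bins amounts to a short loop:

```c
#define FFT_N 256

/* Power spectrum of one frame: |X[k]|^2 = re^2 + im^2 for the first
   half of the FFT bins; the mirrored upper half is discarded. */
static void power_spectrum(const float *re, const float *im, float *p)
{
    int k;
    for (k = 0; k < FFT_N / 2; k++)
        p[k] = re[k] * re[k] + im[k] * im[k];
}
```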
3.6 Mel-frequency wrapping
Triangular filters are designed on the mel-frequency scale, forming a bank
of filters that approximates the human ear. The power signal is then
applied to this bank of filters to determine the frequency content within
each filter. Twenty filters are chosen, uniformly spaced on the
mel-frequency scale between 0 and 4 kHz. The mel-frequency spectrum is
computed by multiplying the signal spectrum with this set of triangular
filters. For a given frequency f, the corresponding mel value is given by

B(f) = 1125 ∗ ln(1 + f/700) mels    (1)
The frequency edge of each filter is computed by substituting the
corresponding mel value. Once the edge frequencies and the center
frequencies of the filters are found, the boundary points are computed to
determine the transfer function of each filter. The transfer function of
the triangular filters is given below:
H(k, m) = 0                                          if f[k] < f_c[m-1]
          (f[k] − f_c[m-1]) / (f_c[m] − f_c[m-1])    if f_c[m-1] ≤ f[k] < f_c[m]
          (f[k] − f_c[m+1]) / (f_c[m] − f_c[m+1])    if f_c[m] ≤ f[k] < f_c[m+1]
          0                                          if f[k] ≥ f_c[m+1]    (2)
where f[k] is the frequency of the k-th sample, given by f[k] = k ∗ f_s / N,
and N is the number of samples in each frame (256 in our case).
The width (resolution) of each filter is given by:

φ = (φ_max − φ_min) / (M + 1)    (3)

where φ_min is the lowest frequency of the filter bank and φ_max is the
highest frequency of the filter bank. The center frequencies on the mel
scale are given by φ_c[m] = m ∗ φ for m ∈ [1, 20], and the center
frequencies on the frequency scale are given by
f_c[m] = 700 ∗ [10^(φ_c[m]/2595) − 1].
Once the filter transfer functions are obtained, we can apply this filter
bank to the power spectrum to obtain the mel spectrum. This step is
basically a frequency-warping operation, where the frequency content of
the signal is remapped according to the mel scale. This is elaborated in
the equation below:

Mel_spectrum[m] = Σ_{k=0}^{N-1} Power_spectrum[k] ∗ H[k, m]    (4)
3.7 Log-energy spectrum
Once the mel spectrum is obtained, we take its logarithm. The log function
basically performs an amplitude compression, where the lower-energy
components are boosted relative to the higher-energy ones.
This is given by: Log_energy_spectrum[m] = ln(Mel_spectrum[m])
3.8 Mel-frequency cepstral coefficients
The log mel spectrum is converted back to the time domain. The discrete
cosine transform (DCT) of the log mel spectrum yields the MFCCs. We use
the DCT since the power spectrum and the log mel spectrum are real signals.
3.9 Comparison in the feature matching phase
Once we have the MFCCs, they characterize the particular speaker and word,
and are stored during the training phase. During the recognition (feature
matching) phase, the coefficients are determined again for the uttered
word, and recognition is carried out by analyzing the Euclidean distance
to the stored coefficients and applying a threshold, calibrated
appropriately to increase the word recognition rate.
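The matching step can be sketched as a nearest-template search with a rejection threshold; the vector length and threshold handling are illustrative, not taken from the project code.

```c
#define N_CEPS 13   /* illustrative feature vector length */

/* Squared Euclidean distance between a test vector and one template. */
static double dist2(const double *a, const double *b)
{
    double d = 0.0;
    int i;
    for (i = 0; i < N_CEPS; i++) {
        double e = a[i] - b[i];
        d += e * e;
    }
    return d;
}

/* Return the index of the stored template nearest to the test vector,
   or -1 if even the best match exceeds the decision threshold. */
static int match_word(const double *test, const double templates[][N_CEPS],
                      int n_words, double threshold)
{
    int best = -1, w;
    double best_d = threshold;
    for (w = 0; w < n_words; w++) {
        double d = dist2(test, templates[w]);
        if (d < best_d) { best_d = d; best = w; }
    }
    return best;
}
```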
4 Implementation in MATLAB
An initial feasibility version of the algorithm was implemented in MATLAB
to emulate the different steps of the speech recognition that would later
be implemented on the DSP board. We normalized the recorded speech signal
and implemented the MFCC algorithm. Observing the various plots, we came
to the conclusion that the recorded speech signal can be of varying length
(time). Therefore, instead of time-warping the speech signal into a
standard time span, we decided that the speech should be spoken within a
fixed time duration on the DSP board. The maximal duration was set to 100
frames of 256 samples each, after repeated trials by different speakers,
to encompass all speech parameters. The steps that compose our MATLAB
implementation are shown below in Figures 4 and 5.
Figure 4: Recorded signal and Signal after silence detection
Figure 5: Melspectrum and MFCC for all frames
5 Implementation on the DSP Board
The DSP board has limited on-chip memory (192K of internal RAM). We are
required to make the generated opcode fit into this memory along with the
stack and heap space. This posed a difficult situation for us; hence we
used only a minimal set of variables, both global and local. Further, we
made sure that the sequential steps in the algorithm operated on a pointer
to each variable instead of creating copies of it. Important constants
were stored in the program memory (as #define preprocessor directives) and
other variables were directed to the heap or stack (via #pragma
directives).
6 Tests and Results
In the speech training phase, the experiments were performed in a
controlled environment where the noise was minimal and its effects could
be disregarded. The training vectors (time-averaged MFCC coefficients)
were obtained for different words: Cat, Dog, Elephant, Hippopotamus, Mouse
and Tiger. It is also easy to add other words to our training system. The
training vectors are stored in a header file to be compared with the test
vectors. In the speech recognition phase, the training vectors are
compared with the test vector using the Euclidean distance method. The
identified word is displayed using a normal printf statement in Code
Composer Studio. There is a 90% match for certain speakers and less than
50% for some speakers. The words 'Cat' and 'Dog' have a higher recognition
rate than the other four words.
7 Conclusion
We have successfully implemented an MFCC system for extracting features
from speech. We were also able to identify a word uttered by different
speakers using the extracted features. Although the results obtained were
not quite as expected, the amount of knowledge gained during this project
is exceptional. We learned the techniques of implementing a speech
recognition system, using MFCC in particular. We also learned to use Code
Composer Studio and DSP/BIOS. This project broadened our experience of
working in MATLAB and C, and our MATLAB implementation helped us a lot in
completing the final implementation on the DSK board. We also learned to
use LaTeX in the process of writing this report. Apart from the technical
aspects, we learned to manage our time with proper planning. Overall,
this project was challenging and a good experience for us.
References
[1] Rulph Chassaing, Digital Signal Processing and Applications with the
C6713 and C6416 DSK, John Wiley & Sons, 2005.
[2] Sigurdur Sigurdsson, Kaare Brandt Petersen and Tue Lehn-Schiøler, Mel
Frequency Cepstral Coefficients: An Evaluation of Robustness of MP3
Encoded Music, Proceedings of the Seventh International Conference on
Music Information Retrieval (ISMIR), 2006.
[3] Adarsh K.P., A. R. Deepak, Diwakar R., Karthik R., Implementation of
a Voice-Based Biometric System, thesis submitted at R.V. College of
Engineering, India, 2007.
Part VI
Face Detection, Tracking and
Recognition
Asheesh Mishra, Mohammed Ibraheem, Shashikant Patil
1 Abstract
Face detection, tracking and recognition in video is a computationally
intensive procedure; it requires processing of each and every pixel,
depending on the desired final output. We used the capabilities of the
TMS320DM6437 DSP board from Texas Instruments. The board is built around
the DaVinci DSP processor, which performs all computations in fixed-point
arithmetic. To facilitate the implementation of our project we used Code
Composer Studio (CCS), also from Texas Instruments, which comes along with
the board. TI provides various built-in functions to get started with
video projects; for example, the videopreview.c example file contains
various basic functions for processing the pixels of the current frame. We
made use of those functions to understand how the system actually works
and how the pixel values can be manipulated. There are various parameters
on which efficient detection of a human face within a frame of the
incoming video data stream can be based, such as edge detection, skin
detection, etc. We used skin detection as our parameter for implementing
face detection and tracking in video, because it gives better detection
efficiency and allows extraction of features such as eyes, nose, lips and
ears for further processing in our face recognition algorithm.
62 Face Detection, Tracking and Recognition
2 Introduction
Face recognition is becoming increasingly important with the availability
of cameras and the need for automated processing of video for many
purposes. DSPs give the power to process video and extract meaningful
information from it. The TMS320DM6437 together with Code Composer Studio
gives algorithm developers the ability to concentrate on developing
powerful and efficient algorithms in less time, with many helpful
utilities. The system we developed in our project consists of four main
stages:
1. Capture an image.
2. Face detection.
3. Face tracking.
4. Face recognition.
The first stage captures an image frame from the streaming video input to
the TI TMS320DM6437 and processes it to detect the face region in the
captured image; this part of the image is then sent to the PCA module,
which generates the feature vector and compares it with the prestored
vectors in the database to find the nearest match (the face recognition
stage). These stages are shown in the system block diagram in Fig. 1.
Asheesh Mishra, Mohammed Ibraheem, Shashikant Patil 63
3 YCbCr Color Space Model
A color space is simply a format for representing color, brightness,
luminance and saturation in one way or another. A thorough understanding
of YCbCr was mandatory in our video processing project. Here Y is the
luminance component, whereas Cb and Cr are the blue and red chrominance
components. Luma (Y) is basically responsible for the brightness of an
image and greatly influences how the image is perceived, while the
chrominance components are responsible for its color composition. As far
as our project was concerned, we mostly had to work with the chroma
components of an image.
The pixels carry the YCbCr information in 4:2:2 format: every other value
in the video data stream is a Y component, and every fourth value is a Cb
or Cr component.
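One common 4:2:2 packing is UYVY, where each pair of pixels is stored as Cb, Y0, Cr, Y1; the accessor sketch below assumes that byte order, which may differ from the order the DM6437 capture driver actually uses.

```c
typedef unsigned char u8;

/* Illustrative accessors for a packed UYVY 4:2:2 frame buffer:
   bytes are Cb Y0 Cr Y1 for each pair of pixels, so luma appears at
   every other byte and each chroma value is shared by two pixels. */
static u8 get_y (const u8 *frame, int i) { return frame[2 * i + 1]; }
static u8 get_cb(const u8 *frame, int i) { return frame[4 * (i / 2)];     }
static u8 get_cr(const u8 *frame, int i) { return frame[4 * (i / 2) + 2]; }
```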
4 Image Filtering for Noise reduction
Image filters are used to remove undesired image details, e.g. by
smoothing the image, making the frame more suitable for edge detection or
further processing. Various good image filtering algorithms are available
for reducing noise, such as Gaussian noise filters, median image filters
and others; any good book on image processing can be consulted for
reference. We made use of the median image filter to reduce the noise in
our project, because we wanted to suppress undesired pixels whose values
are similar to skin color. The filter works on the principle of the
8-neighborhood: if a pixel differs from its neighboring pixels by more
than a certain threshold, it is set to the average value of those pixels,
so that false edges are not detected during further processing.
The main disadvantage of implementing this was that the process consumes
a lot of time and hence slows down the overall performance of the final
output. To accommodate this feature we had to reduce the actual frame
rate, i.e. process only a subset of the frames, to speed up the entire
process.
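The 8-neighborhood rule as described (a threshold-based variant of the median filter, replacing outliers by the neighbor average) can be sketched as below; the image size and threshold are illustrative.

```c
#define W 8
#define H 8
#define THRESH 40   /* illustrative difference threshold */

typedef unsigned char u8;

/* For each interior pixel, compare it with the mean of its 8 neighbors;
   if it differs by more than THRESH, replace it with that mean. */
static void denoise(const u8 in[H][W], u8 out[H][W])
{
    int x, y;
    for (y = 0; y < H; y++)
        for (x = 0; x < W; x++)
            out[y][x] = in[y][x];
    for (y = 1; y < H - 1; y++) {
        for (x = 1; x < W - 1; x++) {
            int sum = 0, dy, dx, avg, diff;
            for (dy = -1; dy <= 1; dy++)
                for (dx = -1; dx <= 1; dx++)
                    if (dy || dx)
                        sum += in[y + dy][x + dx];
            avg = sum / 8;
            diff = in[y][x] - avg;
            if (diff > THRESH || diff < -THRESH)
                out[y][x] = (u8)avg;
        }
    }
}
```

Writing into a separate output image keeps each decision based on the original neighborhood.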
5 Edge Detection
The simplest approach after detecting the face is to extract some features
from the detected face, so that we can match them against the next face to
be recognized. Edge detection once seemed to us the simplest approach to
feature extraction.
Different algorithms for edge detection include:
• Canny edge detection.
• Other first-order methods.
• Thresholding and linking.
• Edge thinning.
• Phase-congruency-based edge detection.
When implementing, we found that the thresholding-and-linking method could
be implemented successfully both in simulation and on the real-time
TMS320DM6437 system. The main reason for using edge detection was to
extract features from the detected face, such as the eyes, nose and mouth.
Since we implemented Principal Component Analysis (PCA) for the
recognition part, this step may not be of much use in that sense, but it
actually helped the system become more robust in recognizing only faces,
rather than recognizing other parts of the human body as a face.
Basically, each pixel is compared with its neighboring pixels against some
threshold value. The threshold value depends on many factors, so it has to
be adjusted to the ideal setting for successful differentiation and edge
detection.
6 Face Detection and Tracking
Among all the frames, the important task is to detect the face for further
processing; the face needs to be separated from the rest of the
background. There are many algorithms available for this, such as:
• Binary pattern classification.
• Skin color to find face segments, using a static background and lighting
condition.
• Window-sliding technique using a background pattern.
• Eye-blinking pattern detection.
• Appearance, face and movement detection.
While studying the methods in Ref. [1], we found the implementation based
on skin color, detecting face segments using a static background and
appropriate lighting conditions in a lab environment, the most suitable
and successful. It gave the expected results for detecting the face in the
entire frame.
To detect the face in video, we first had to store the current frame in a
temporary array, which we then manipulate. The luminance (Y) component is
set to some fixed value that clearly differs from skin color; setting it
to 0xFF gave comparable results, but the point is that the entire Y
component should have the same value.
Since we are going to separate the face from the background, the chroma
components become much more important. Most skin color falls within a
specific range of the chroma red component; hence, we first need to set a
specific limit on chroma blue, to make the entire procedure more robust.
The skin color then falls in the range 0x8A to 0x8C of chroma red. We set
all pixels falling in this range to a specific color, and all others to
the same value as we had given the luma component.
At this point the image consists of only the face, and this stored and
modified frame is written back to the write cache buffer for display on
the monitor. Some of the images after processing with this function are
shown below.
Fig. 3: After processing the Luma (Y) and Chroma (Cb, Cr) components.
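The chroma-range rule described above can be sketched per pixel as below. The Cr range 0x8A–0x8C and the 0xFF background value are from the text; the skin marker value and the omitted chroma-blue limit are illustrative, since the text does not specify them.

```c
typedef unsigned char u8;

#define Y_BG  0xFF  /* value written to the luma plane, as in the text */
#define CR_LO 0x8A  /* skin-tone chroma-red range from the text */
#define CR_HI 0x8C
#define SKIN  0x00  /* illustrative marker value for skin pixels */

/* Classify one pixel from its chroma-red value: mark it as skin if Cr
   falls within the narrow skin-tone band, else paint it as background.
   (The additional chroma-blue limit mentioned in the text is omitted
   here because its value is not given.) */
static u8 classify_pixel(u8 cr)
{
    return (cr >= CR_LO && cr <= CR_HI) ? SKIN : Y_BG;
}
```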
Next, we applied the filter to remove the undesired noise coming from
reflective surfaces; as discussed above, we used the median filter for
this.
We then had to detect the face within the frame, which is done by scanning
the entire frame from the top-left position until the end of the frame is
reached. During the scanning, we search for runs of adjacent pixels
holding the same value as the skin color; when their number exceeds a
certain threshold, a flag is raised to indicate that a face has been
detected in the frame, and the position is saved in a pointer register.
Then comes the tracking part, which requires the detected face to be
tracked to its new position in each frame. Based on the status of the
raised flag, a box is placed at the position obtained from the face
detection part; the box is drawn by setting the values of all the desired
pixels (to black) using the value in the pointer register.
Fig. 4: After detection and VGA display of the processed frame (Exp. 1).
Fig. 5: After detection and VGA display of the processed frame (Exp. 2).
7 Face Recognition
Common algorithms for face recognition include:
1. Principal Component Analysis (PCA)
2. Independent Component Analysis (ICA)
3. Support Vector Machine (SVM)
4. Hidden Markov Models (HMM)
5. Boosting and Ensemble methods
Among these algorithms, we found the PCA-based eigenface algorithm
[9], [5], [14] the most interesting and the most successful in the
simulated Matlab environment [11]. The PCA also has big implementation
advantages, since it reduces the dimensionality of the images that need to
be stored in the database, which helps us conserve memory in the real-time
system given the hardware limitations [13].
We apply the algorithm in three stages, described as follows.
Creating the database: Before applying the PCA, we have to create the
training database that contains the faces. First we reformat each
two-dimensional image into a single image vector by concatenating its rows
(or columns) into one long vector. Then we combine the image vectors into
one matrix, called the training matrix (T-matrix).
Generating the eigenfaces: Taking the T-matrix as input to this stage,
we calculate the following matrices to be the input to the recognition
stage:
1. M-matrix: the mean values of the T-matrix (training database).
2. A-matrix: the centered images, generated by subtracting the M-matrix
from the T-matrix.
3. Eigenfaces: the so-called eigenvectors; this is the feature matrix
containing the face features. We first calculate the covariance matrix by
multiplying the A-matrix by its transpose, then find the eigenvector
matrix and modify it by sorting the eigenvectors and removing those with
negative values. Finally, by multiplying the A-matrix by the modified
eigenvector matrix we obtain the eigenfaces matrix.
Face Recognition process: In this process we receive the three outputs from the previous stage as inputs, in addition to the input picture whose face we want to recognize.
First we project the centered images into the face space by multiplying each column in the AMatrix, which represents the corresponding image, by the Eigenfaces; this gives us the projected images matrix. After that we project the input image using the same concept (center the image by subtracting the mean of the TMatrix and multiply it by the Eigenfaces).
Having the projected image set and the projected input image, we calculate the Euclidean distance between the projected input image and each projected image in the set. The test image should have the minimum distance to its corresponding image in the database.
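The recognition stage can be sketched the same way (again an illustrative NumPy sketch with our own names, taking the three training outputs as arguments):

```python
import numpy as np

def recognize(test_img, mean_vec, A, eigenfaces):
    """Project the database and a test image into face space and
    return the index of the closest training image by Euclidean
    distance, as described above."""
    # Projected images matrix: each centered column times the Eigenfaces
    projected_db = eigenfaces.T @ A            # shape: (features, num_images)

    # Center the test image with the training mean, project it the same way
    x = test_img.reshape(-1, 1).astype(float) - mean_vec
    projected_test = eigenfaces.T @ x          # shape: (features, 1)

    # Euclidean distance to every projected database image
    dists = np.linalg.norm(projected_db - projected_test, axis=0)
    return int(np.argmin(dists))
```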
The flowchart (Fig. 6) describes the process in detail.
Fig.6 Flowchart of PCA Algorithm for Face Recognition
8 Problems faced
We faced several problems when implementing the reference model on the TMS320DM6437. The limited memory available on board was one of the major bottlenecks: it has only 192K of RAM, so not much data can be stored on it. Moreover, since video data manipulation is very computation intensive, the slow response of the kit was also a problem when processing image data. We tried to overcome this by processing only a smaller number of frames, which improves performance somewhat. The other problem we faced was in the implementation of the recognition part. We initially thought of implementing every part of PCA on the board, including the training set, and creating a database for at least three distinct persons, but that degrades the overall performance of the system drastically, to the point of being unrealistic. So we ran the calculation part of PCA (finding the Eigenfaces and training the database) in Matlab itself, and compared the real-time data with these Eigenfaces directly. The TMS320DM6437 being a fixed-point processor is also a problem, since most calculations in PCA involve divisions, multiplications and square roots. We therefore use a scaling technique: scaling the data up to convert it to fixed point, and scaling it back down after the calculation using shift operations. For the square root calculation we use the Babylonian method [13]. Although we were able to implement most of the recognition phase on the kit, the results were not as expected.
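The scaling plus Babylonian square root mentioned above can be sketched as follows. This is a Python sketch of the integer arithmetic; on the board it would be fixed-point C, and the shift count here is an illustrative choice, not the one used in the project:

```python
def isqrt_babylonian(n):
    """Integer square root by the Babylonian method: repeatedly
    average x and n/x. Needs only adds, shifts and one integer
    division per iteration, which suits a fixed-point DSP."""
    if n < 2:
        return n
    x = n
    y = (x + 1) >> 1            # initial guess
    while y < x:                # converges monotonically from above
        x = y
        y = (x + n // x) >> 1   # average of x and n/x, halved by a shift
    return x

def scaled_sqrt(value, shift=8):
    """Scale up by 2^(2*shift) before taking the root, so the result
    carries `shift` fractional bits (the scaling-up technique above)."""
    return isqrt_babylonian(value << (2 * shift))
```

For example, `scaled_sqrt(2, 8)` returns 362, i.e. 362/256 ≈ 1.414, a Q8 approximation of the square root of 2.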
9 Conclusion and Future work
We were able to finish the face detection and tracking part of our project successfully, and the results were as expected. In a highly noisy environment (for example with an improper background) the performance degrades marginally, but the system is still able to detect faces most of the time. The recognition part was also implemented as a hybrid between the board and Matlab, but the output results were not as expected, and due to lack of time we could do very little to optimize the recognition part. Hence we separated the recognition part out of the project for the time being. The project was much more complex to implement than we initially thought, but the exposure we got during the implementation phase was very satisfying. The project also honed our skills in embedded C and Code Composer Studio.
References
[1] A. N. Rajagopalan, K. Sandeep, "Human Face Detection in Cluttered Color Images using Skin Color and Edge Information", Indian Conference on Computer Vision, Graphics and Image Processing, Dec. 2002.
[2] Sanjay Kr. Singh, D. S. Chauhan, Mayank Vatsa, Richa Singh, "A Robust Skin Color Based Face Detection Algorithm", Tamkang Journal of Science and Engineering, Vol. 6, No. 4, 2003, pp. 227-234.
[3] Minku Kang, "PCA-based Face Recognition in an Embedded Module for Robot Application", ICROS-SICE International Joint Conference 2009.
[4] William H. Press, The Art of Scientific Computing, Cambridge University Press, 3rd edition, ISBN-13: 978-0521880688.
[5] http://www.mathworks.com/matlabcentral/fileexchange/17032pcabasedfacerecognitionsystem
[6] http://www.eit.lth.se/fileadmin/eit/courses/eti121/Seminar/lect1_2011.pdf
[7] http://www.eit.lth.se/fileadmin/eit/courses/eti121/Seminar/lect2_2011.pdf
[8] http://www.csus.edu/indiv/p/pangj/aresearch/video_compression/ref/report_summer09_Shriram%20_face_detection.pdf
[9] http://www.eit.lth.se/fileadmin/eit/courses/eti121/Reports/ASP_Reports_2010.pdf
[10] http://www.facerec.org/algorithms/
[11] http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors
[12] http://cswww.essex.ac.uk/mv/allfaces/faces94.html
[13] http://en.wikipedia.org/wiki/Face_detection
[14] http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
Part VII
Circular Object Detection
Ajosh K Jose, Qazi Omar Farooq, Sherine Thomas, Sreejith
P Raghavan
Abstract
This project deals with the implementation of a circular object detection method on the TMS320DM6437 DSP Evaluation Module. The aim is to detect a circular moving object when the background is kept fixed. The first step is to make the reference frame by capturing the first frame sent from the video camera. Then the successive frames are subtracted from the reference frame to obtain the moving object. Further processing is done only on the area of the moving object, which reduces the processing time required for each successive frame. In the second step, the Canny edge detection algorithm is employed to extract the edges of the moving object. Finally the object is checked for circular shape using a modified circular Hough transform. If a circular object is detected in the frame, it is marked in the video.
1 Introduction
Real-time image processing applications are now widely used due to the very fast advancement in technology. With the introduction of portable devices with stringent resource limitations, the image processing algorithms used in such systems need to be chosen wisely. Image processing algorithms are also widely used in medical imaging, surveillance systems and digital cinema, and the rapid change in video and image processing standards introduces additional complexity and the need for higher throughput. In this project we familiarize ourselves with the various algorithms used in image processing. Detecting moving objects is an important task in video surveillance: if the shape of a moving object can be detected automatically, the burden of manual safety monitoring can be reduced.
Our project implements detection of circular moving objects in real time. Here we study the methods and challenges involved in detecting an object correctly. After successful implementation of the circular object detection system, the algorithm can be improved further to detect objects of more complex shape. We are using the TMS320DM6437 evaluation module for implementing our project.
2 Theory
Figure 1 shows the steps involved in our circular object detection implementation. The circular object detection is implemented in the following steps:
• Moving object detection
• Edge detection
• Circular object detection
In the first step, moving objects in the frame are detected by the background subtraction method. By doing this the processing time required for the next steps can be reduced considerably. To further reduce the processing time, every 15th frame is processed instead of consecutive frames. The background frame is the first captured frame, which is stored in memory. Each new frame is then subtracted from the background frame to detect moving objects; this reduces the area of interest. In the next step, an edge detection algorithm is applied to that particular area to detect the edges of the object. In the final step, the circular object detection algorithm is applied to the edge-detected input to find the circular objects in the region.
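The first step above, background subtraction plus extracting the area of interest, can be sketched as follows (a minimal NumPy sketch; the threshold value and function names are our own illustrative assumptions, not values from the report):

```python
import numpy as np

def moving_object_mask(reference, frame, threshold=30):
    """Background subtraction: the first captured frame is the
    reference; pixels that differ from it by more than a threshold
    are flagged as belonging to the moving object."""
    diff = np.abs(frame.astype(np.int16) - reference.astype(np.int16))
    return diff > threshold

def bounding_box(mask):
    """Area of interest: the smallest rectangle containing all moving
    pixels, so the later stages (edge detection, Hough transform)
    only process this region. Returns None if nothing moved."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return (ys.min(), ys.max(), xs.min(), xs.max())
```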
Figure 1: Block Diagram of the steps
2.1 Edge Detection
Edge detection is an important step in image processing. The main objective of edge detection is to reduce the amount of data to be processed while keeping the structural content intact. There are several algorithms proposed for edge detection, and their processing time and output vary considerably. The main edge detection algorithms are
• Prewitt method
• Canny method
• Sobel method
• Roberts method
• Laplacian of Gaussian method
• Zero-cross method
Please refer to [1] for more details about edge detection algorithms. We did an initial study using Matlab to find the most suitable algorithm for edge detection. The results of this study are shown in figure 2.
From the comparison it was clear that the Canny edge detection algorithm would be most suitable, because of the fine details available in its output, which are needed for tracking a moving object. The actual problem with applying the Canny edge detection algorithm to the complete frame is the large processing time required. To reduce the processing time, we added the background subtraction method, in which we subtract the static background from the current frame to find the actual area of interest. By applying the Canny edge detection algorithm only on that particular area, the processing time can be reduced considerably.
The input to the Canny edge detection algorithm is the gray-scaled image. The Canny edge detection algorithm consists of five steps. They are
Figure 2: Edge detection algorithm outputs
• Smoothing
• Finding gradients
• Nonmaximum suppression
• Double thresholding
• Edge tracking by hysteresis
A brief idea about the different steps of the Canny edge detection algorithm is given below. For a detailed description, please refer to [2] or [3].
2.1.1 Smoothing
Smoothing is done to reduce the noise level in the image. Usually a Gaussian filter is employed for this step; it helps to remove unwanted edges detected due to the noise present in the image. The image is smoothed by applying a Gaussian filter with a standard deviation of 1.4. The Gaussian matrix is shown in figure 3.
Smoothing takes a long processing time due to the matrix multiplication involved. In this project we skipped this step, since we process only the moving object and the effect on the edge-detected image was found to be small.
Figure 3: Gaussian Matrix
Figure 4: Sobel Matrix
2.1.2 Finding gradients
By finding the gradients, the edges where the gray-scale intensity varies the most are determined. This is done by applying the Sobel matrices to each pixel in the image. The Sobel matrices, shown in figure 4, consist of a Gx and a Gy matrix. The gray-scale sum is then calculated by the equation

    sum = Gx · pixelvalue + Gy · pixelvalue    (1)

The gray-scale angle is calculated by the equation

    angle = (Gx · pixelvalue) / (Gy · pixelvalue)    (2)

The output after applying the Sobel matrices is shown in figure 5. It is clearly visible that all the edges in the image are highlighted.
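The gradient step can be sketched as below (an illustrative NumPy sketch, not the project's C code). Equation (1)'s sum is treated as the cheap |Gx| + |Gy| magnitude, and the angle is computed with arctan2 rather than by forming the ratio in equation (2) directly, to avoid division by zero:

```python
import numpy as np

# Sobel matrices Gx and Gy (figure 4)
GX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
GY = np.array([[ 1, 2, 1], [ 0, 0, 0], [-1, -2, -1]])

def sobel_gradients(img):
    """Apply the Sobel matrices to every interior pixel; return the
    gradient magnitude and angle. Border pixels are left at zero
    for simplicity."""
    img = img.astype(float)
    h, w = img.shape
    mag = np.zeros((h, w))
    ang = np.zeros((h, w))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = img[y-1:y+2, x-1:x+2]
            gx = np.sum(GX * patch)
            gy = np.sum(GY * patch)
            mag[y, x] = abs(gx) + abs(gy)   # cheap magnitude, eq. (1)
            ang[y, x] = np.arctan2(gy, gx)  # gradient direction
    return mag, ang
```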
2.1.3 Nonmaximum suppression
For suppressing non-maxima, the angle calculated in the previous step is used. The angle is rounded to the nearest 45 degrees, which determines the gradient direction of each pixel: 0, 45, 90 or 135 degrees.
Figure 5: Image after finding the gradients
The strength of the current pixel is then compared with its neighbours in the positive and negative gradient directions. If the current pixel is stronger than both, it is chosen and the other values are suppressed. Thus in this step all the gradient edges that are local maxima are selected. Figure 6 shows the output image after nonmaximum suppression.
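The suppression step above can be sketched as follows (an illustrative sketch; the angle folding and neighbour offsets follow the four directions named in the text):

```python
import numpy as np

def non_maximum_suppression(mag, ang):
    """Thin edges: round each gradient angle to the nearest 45 degrees
    and keep a pixel only if it is at least as strong as both of its
    neighbours along the positive and negative gradient direction."""
    h, w = mag.shape
    out = np.zeros_like(mag)
    # neighbour offsets for the 0, 45, 90 and 135 degree directions
    offsets = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            deg = (np.degrees(ang[y, x]) + 180) % 180   # fold into [0, 180)
            sector = int(45 * round(deg / 45)) % 180    # nearest 45 degrees
            dy, dx = offsets[sector]
            if mag[y, x] >= mag[y+dy, x+dx] and mag[y, x] >= mag[y-dy, x-dx]:
                out[y, x] = mag[y, x]
    return out
```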
2.1.4 Double thresholding
In double thresholding, the remaining edges are classified into strong and weak edges. Strong edges are retained and will be part of the final edges. Weak edges are checked further in the next step.
2.1.5 Edge tracking by hysteresis
In edge tracking by hysteresis, all weak edges are checked for a connection to strong pixels in their neighbourhood. If a weak pixel is connected to any strong pixel, it is considered part of an edge and is retained; the other pixels are discarded. Figure 7 shows the output image after applying the hysteresis.
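Double thresholding and the hysteresis tracking above can be sketched together (threshold values are the caller's choice; this is an illustrative sketch, not the project's C code):

```python
from collections import deque
import numpy as np

def hysteresis(mag, low, high):
    """Classify pixels above `high` as strong edges and pixels between
    `low` and `high` as weak; keep a weak pixel only if it is connected
    (8-neighbourhood) to a strong pixel, directly or through other
    kept weak pixels."""
    strong = mag >= high
    weak = (mag >= low) & ~strong
    edges = strong.copy()
    # flood outward from every strong pixel into connected weak pixels
    q = deque(zip(*np.nonzero(strong)))
    while q:
        y, x = q.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < mag.shape[0] and 0 <= nx < mag.shape[1]:
                    if weak[ny, nx] and not edges[ny, nx]:
                        edges[ny, nx] = True
                        q.append((ny, nx))
    return edges
```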
Figure 6: Image after non maximum suppression
Figure 7: Image after hysteresis
Figure 8: Circle detection method
2.2 Circular Object Detection
Circular objects are detected by applying the circular Hough transform (CHT) algorithm to the edge-detected frame: the CHT is applied to the output of the Canny edge detection to find the edges of circles in the image. The algorithm is based on the equation

    (x1 - x0)^2 + (y1 - y0)^2 = r^2    (3)

All pixels are searched for the possibility of a circle with a radius within a particular limit. Six pixels are examined to determine the circle: if all six pixels have edge information within the particular radius, a circle is considered detected. Figure 8 shows the method used for determining the circle. For a detailed description of the CHT, please refer to [4], [5] and [6].
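The six-pixel check of figure 8 can be sketched as below. This is an illustrative sketch, not the project's implementation: the number of sample points and the tolerance are assumptions, and a full detector would repeat this test over candidate centres and radii:

```python
import math

def is_circle(edge, cy, cx, r, n_points=6, tol=1):
    """Sample n_points positions on the circle of radius r around
    (cy, cx) using equation (3) and require edge evidence within
    `tol` pixels of every sample position."""
    h, w = len(edge), len(edge[0])
    for k in range(n_points):
        theta = 2 * math.pi * k / n_points
        y = cy + round(r * math.sin(theta))
        x = cx + round(r * math.cos(theta))
        # look for an edge pixel within tol of the sample position
        found = any(
            0 <= y + dy < h and 0 <= x + dx < w and edge[y + dy][x + dx]
            for dy in range(-tol, tol + 1)
            for dx in range(-tol, tol + 1)
        )
        if not found:
            return False
    return True
```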
3 Implementation
The implementation of the project was done in two steps. In the first step the algorithm was tried in Matlab to determine its efficiency. In the second step, the Matlab implementation was converted into a C implementation. Code Composer Studio was used for compiling and downloading the code. The hardware tools used for testing the project included:
• Video camera
• DM6437 evaluation board
• Television
The DM6437 platform offers an interface in the framework through which we can access the input video stream frame by frame. The pixels are in YCbCr format, where Y is the luma component and Cb and Cr are the chroma components, with the ratio 4:2:2. For this project we considered a PAL system, so the frame size is 720x576. The processing consists of reading the frame buffer, updating the frame data and writing it back into the buffer.
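As an illustration of the frame format, extracting the luma plane from a 4:2:2 buffer might look as follows. This is a sketch only: the exact byte ordering of the DM6437 frame buffer is an assumption here (the `luma_first` flag lets it be flipped), and on the board this is done in C directly on the buffer:

```python
import numpy as np

# Frame geometry for the PAL system used in the project
WIDTH, HEIGHT = 720, 576

def extract_luma(frame_bytes, luma_first=True):
    """Pull the Y (luma) plane out of an interleaved 4:2:2 frame buffer
    so the edge- and circle-detection stages can work on a grayscale
    image. 4:2:2 means two bytes per pixel on average: every pixel has
    its own Y byte, while Cb/Cr are shared between pixel pairs."""
    buf = np.frombuffer(frame_bytes, dtype=np.uint8).reshape(HEIGHT, WIDTH * 2)
    offset = 0 if luma_first else 1
    return buf[:, offset::2].copy()
```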
4 Conclusion & Future Work
Circular object detection was successfully implemented and tested. It was a nice experience working with this project, which introduced us to the world of programming DSP processors. We familiarized ourselves with different algorithms used in signal processing in this course, and the two labs done as part of the course were helpful in getting to know the tools and the DSP kit. Working with the DM6437 and Code Composer Studio was a nice experience. We felt the processing power of the DM6437 evaluation kit is not enough for handling the complex algorithms used in video processing.
Due to the lack of processing time, we were not able to implement more reliable algorithms for circular object detection on the DM6437 processor. Also, edge detection was implemented only in the selected area where a moving object is detected. As future work we plan to optimize our current implementation and add more reliable algorithms for circular object detection that can detect circular objects with different radii. If the processing power permits, we would also like to implement more complex algorithms such as human hand detection and tracking of hand movement.
References
[1] Raman Maini, Himanshu Aggarwal, "Study and Comparison of Various Image Edge Detection Techniques", International Journal of Image Processing (IJIP), Volume 3, Issue 1.
[2] John Canny, "A Computational Approach to Edge Detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(6):679-698, Nov. 1986.
[3] "Canny Edge Detection Implementation Tutorial", Laboratory of Computer Vision and Media Technology, Advanced Image Processing, Aalborg University.
[4] Mohamed Rizon, "Object Detection using Circular Hough Transform", American Journal of Applied Sciences, 2005.
[5] Marcin Smereka, Ignacy Duleba, "Circular Object Detection Using a Modified Hough Transform", Int. J. Appl. Math. Comput. Sci., 2008.
[6] Mohamed Roushdy, "Detecting Coins with Different Radii based on Hough Transform in Noisy and Deformed Image", GVIP Journal, Volume 7, Issue 1, April 2007.
[7] Project Reports 2010, ETI121, Algorithms in Signal Processors course.
A working guitar tuner was then made. which concludes this project. A great deal of the report handles problems associated with the memory of the board. Suggestions of possible improvements are presented in the ﬁnal section. First the theory about the guitar strings and it’s harmonic patterns is covered. along with a description of the diﬀerent mathematical algorithms used to tune them. CJ Waldeck. F Brosj¨ o Abstract This report handles the development of a guitar tuner based on the Texas Instrument TMS320C6713DSK signal processing board. together with solutions developed to work around them. 1 .Part I Guitar tuner P V Soumya. This is followed by the analysis of the guitar sound and the method used implement the tuner on the DSPboard. A Norrgren.
. the other strings change pitch as well due to the higher stress on the guitar caused by the tension. When a string on some guitars is tuned. since the strings have to be tuned separately and many times to achieve a stable pitch for all of them. thereby being able to correct changes to the other strings instantaneously. a TC Electronic polytune [4]. When searching for this type of tuner only one was found on the commercial market. This was our source of inspiration for this project in pitch estimation.2 Guitar tuner 1 Introduction When thinking of pitch estimation one of the project group members came to think of a problem he experiences in tuning his guitars. The goal of this project is ﬁrst to be able to determine the pitch of a single string and eventually expanding it to being able to estimate the pitch for multiple strings simultaneously and presenting the result to the user. A way to address this issue is to have a tuner that allows the user to overlook all strings at the same time. This makes tuning a tough and time consuming procedure.
String nbr 6 5 4 3 2 1 Octave E2 A2 D3 G3 B3 E4 f0 82.814 220. In table 1 all strings with their corresponding pitches and ﬁrst two harmonics are shown. In ﬁgure 1 the frequency pattern of the E4 string on a Hagstr¨m Viking is shown.830 196.940 329. F Brosj¨ o 3 2 2.630 2f0 164.890 Table 1: Guitar frequencies The pattern of the harmonic amplitudes are not the same for diﬀerent strings and guitars which gives each guitar its speciﬁc sound.820 988. What diﬀers the instruments and gives them their unique sound is the amplitude pattern for these harmonics.000 293. A Norrgren.000 246.000 146. thickness and material of the string.000 740. o Figure 1: Spectrum of E4 .260 3f0 247. These patterns depends on many factors like the length.660 392. called the pitch.221 330.000 493.P V Soumya. A guitar has six strings tuned to diﬀerent pitches.1 Theory Harmonics A note played on a string instrument consists of a fundamental frequency. The frequencies of these harmonics are multiples of the pitch and are the same for every instrument tuned to the same pitch. CJ Waldeck.407 110.880 659.490 588.000 440. and a number of harmonics.
the frequency scale is made logarithmic. This is done using equation 1 which has a linear frequency scale.2 Cross Correlation A cross correlation is used to ﬁnd similarities between two diﬀerent discrete signals. This is done by replacing k in equation 1 with (2): k = f0 · B i/N i = 0. 2.2 2. The cross correlation of the signals x and y is given by equation 3. . This makes the accuracy better for higher frequencies. which are one semitone apart. To handle this smoothly a logarithmic frequency scale can be implemented. an octave is divided into 12 pitches. N − 1 (1) In this case the frequency scale of octaves is logarithmic. rxy (n) = l x(l) · y(l − n) (3) . Since the distance between the octaves is increasing for higher frequencies the number of increments between each octave is increasing in the linear frequency scale. Xk = N −1 −2πnk/N n=0 xn e k = 0. To obtain a linear accuracy over the entire frequency spectrum. The information about DFT and the logarithmic frequency scale was found in Martin Stridh doctoral thesis.. The base determines the size of the scale along with the number of points. 1. N − 1 (2) Where B is an arbitrary base and f0 is the starting frequency. Between each semitone there are 100 equally spaced increments called cents [2].1 Algorithms Fourier Transform The Fourier transform is a discrete transform between the time and frequency domain. 1.. A commercial good quality tuner usually have an accuracy of between +/1 and +/3 cents. meaning that an increase of one octave corresponds to a doubling of the frequency.. .4 Guitar tuner The chromatic scale that is used globally..2. Signal Characterization of Atrial Arrhythmias using the Surface ECG [3]. This unit is commonly used to measure the accuracy of instrument tuners.2. This is done by multiplying the signal components on each index with each other and summing them up. A portion of a signal in time domain is analysed and the frequency components are extracted with their corresponding amplitude.. 
This is done repeatedly where one of the signals are shifted in relation to the other.. 2.
The correlation speed can then be improved by removing all the multiplications and most of the additions.1 Method Analysis of the guitar sound The project began with recordings of the guitar strings from a Hagstr¨m o Viking semi acoustic electric guitar. The correlation results in an array where the index with highest value represents the best match between the signals. and it was quickly concluded that the frequency positions of the harmonic peaks were of higher signiﬁcance than their amplitude. . A Norrgren.P V Soumya. or even a cepstrum. Roughly estimated this saves a total of 700 000 operations with the improved correlation for all the six strings. could not be used. Instead another method was tested. that might have been used for one string. If the spectrum is of the length 1500 an ordinary scalar product requires 1500 multiplications and 1499 additions. so all that is left is the sum of the three values in the spectrum where the reference is one. a normal auto correlation. The sounds were then analysed using Matlab’s built in FFTfunction to get an idea of how the frequency spectrum would look like. Since the goal is to tune multiple strings simultaneously. CJ Waldeck. 2. where the ones represents the fundamental and the two harmonics. To save time and calculations there is no need to do a full correlation and so it was limited to 20 steps around the centre index.5 0 −40 −30 −20 −10 0 Shift 10 20 30 40 Figure 2: Correlation of the spectra in ﬁgure 1 3 3. The reference spectra is only composed of three ones.5 2 Correlation coefficient 1. F Brosj¨ o 5 Provided that the guitar is roughly in tune the reference spectrum should have a high correlation near the centre index. That means that it saves 2997 operations for every scalar product. The spectrum was found to vary a lot between diﬀerent strings and pitches. With the improved correlation it only requires two additions and no multiplications.5 1 0.
the references were constructed rather than recorded. This would compromise the accuracy since it would be better if both the pitch and the harmonics matched at the same time. This came to eﬀect the choice of resolution and thereby the base to the logarithmic frequency scale. To get rid of this issue. The aim was to construct a tuner with relatively high accuracy.5 2 Amplitude Amplitude −15 −10 −5 0 Shift 5 10 15 20 2 1. In this way we could get the exact frequencies for the pitch and harmonics of each string. an accuracy of ±3 cents was chosen.5 0. the pitch and the harmonics will match at the same time.6 Guitar tuner By cross correlating the frequency spectra from the coincident strings with the spectra from the individual strings in turn. and with the limited memory. To obtain an adequate accuracy.5 1 1 0. This corresponds to an acceptable error in the correlation of ±1 steps in the correlation. as in ﬁgure 3 B. This solves the problem because when the cross correlation is made. as shown in ﬁgure 3 A. the linear frequency scale would be hard to use because the correlation would yield multiple peaks depending on if the pitch or the harmonics matched perfectly. .5 2. and the correlation was done to the frequency pattern rather than the amplitude of the harmonics. This method was tested using recorded sounds. there was a need Cross correlation for linear frequency scale 3 3 Cross correlation for logarithmic frequency scale 2. a separate correlation for each string could be obtained. which resulted in a very messy graph. Since it was realised that the frequency diﬀerence between the pitch and the harmonics would be changed when the string is not tuned.5 0 −20 0 −20 −15 −10 −5 0 Shift 5 10 15 20 Figure 3: A Linear correlation B Logarithmic correlation for a specialised Fourier transform with a logarithmic frequency scale. to be quantised and noise free.5 1. The amplitude diﬀerence of the diﬀerent harmonics of the reference sounds made it hard to get a clear result.
The counter is then reset when the data has been processed. . no loop is needed to run the program. to ignore the end of a signal and prevent misreadings.2 Implementation The program was constructed using a number of diﬀerent functions. A main function where the necessary parameters and arrays were initialized and constructed. A software interrupt called echo is activated when the input buﬀer is full.2. A Norrgren.1 Echo Echo is called by interrupt when the input buﬀer is full and collects the values from the input buﬀer and passes on the buﬀer to the detection function. The interrupt runs the process which calls the diﬀerent functions needed to process the signal. Since the DSP uses software interrupts. F Brosj¨ o 7 3. Figure 4: Flow chart over the algorithm 3. Further explanation of these functions follow. This calls the detection which registers the input amplitude of the signal and triggers a software interrupt if the signal exceeds a threshold level. A counter was then used to ignore the ﬁrst 128 calls to detection.P V Soumya. CJ Waldeck. When the DSP is starting up is has a lot of random values on the input that must be ignored.
3.2.2 Detection

The detection function first calculates the mean power of the input. If the value is higher than a predefined threshold value, the detection function goes into a buffering mode where it samples every package until the sample buffer is filled. When the last step is done, the buffering mode is deactivated, the process flag is set and the interrupt process is called.

3.2.3 Process

Process gathers all the functions needed to perform the processing of the signal. The first step is to perform a DFT, which is covered in the section below. The result is then correlated with the reference array for the individual strings, which is done for the references of all six strings. When this is done, a second loop goes through the resulting correlation arrays to find the index of the maximum value. The returning values represent the indices of the maximum correlation and the corresponding value in relation to the maximum possible correlation value.

3.2.4 DFT

The DiscreteFT is based on a normal Fourier transform summation using a double for-loop. The frequency array used was explained in the theory section, and this is the only difference from a normal DFT. Since the DSP does not have support for complex numbers, the summation had to be done in two separate variables, one for the real and one for the imaginary part of the complex result. The magnitudes of these are normalized to reduce the risk of overflow and then stored in the output array.

3.2.5 SmallXcorr

The small cross correlation function is made so that it only calculates a small part of a normal correlation. It was chosen to only shift 20 steps to the left and to the right around the centre element. The small correlation is done using a for-loop which runs from -20 to 20, in other words 41 steps in total, where it sums the elements where the reference frequencies are.
3.2.6 MatLab GUI

A graphical user interface was made using Matlab's GUI Guide. To access the DSP a built-in function called ccsdsp was used, which makes it possible to load and run the project on the board from within Matlab. The program consists of a table, two buttons and a timer. The RUN button uploads the program to the DSP and runs it, and the STOP button stops the DSP and closes the program. The table is updated every second using a timer interrupt.

Figure 5: Graphical User Interface

4 Results and Discussion

4.1 Matlab testing

The first thing we did was to record the sound from all strings and take a DFT to get an idea of what the spectra would look like. The result, using Matlab's built-in FFT function, gave us the spectra shown in figure 6 A. As can be seen there are a lot of different peaks with varying amplitude, and it is difficult to distinguish between the fundamentals and the harmonics. In a spectrum for one string this difference, as visible in figure 6, is much clearer.

The work continued by implementing a DFT algorithm in Matlab to ensure its functionality. The implementation of this was fairly simple in Matlab and did not generate any problems out of the ordinary. The function was tested against Matlab's built-in FFT function and produced very similar data. Some slight variations were found, but could very well be attributed to round-off errors. The initial results were good; however, it was soon realized, as mentioned prior, that the distance between the frequency increments had to be logarithmic to yield appropriate accuracy at lower frequencies.
Figure 6: A Spectrum of all strings B Spectrum of D3

The correlation algorithm was also implemented and tested against Matlab's correlation function. There were some minor differences, which most probably are because of round-off errors. This is due to the fact that Matlab does not use full-length floats like C, but a user-defined length, in our case the format short, containing four decimals.

The values used for the frequency array, calculated using Stridh's formula described in the theory section, had an initial frequency of 72.1 Hz, 1500 points and a base of 15. It was also important to get the frequency points in the array as close as possible to the known frequencies to be used as reference values; otherwise there would be an error because of displacement from the correct value, and the tuner would always have an offset. Since the correlation algorithm handles a discrete frequency array, the best correlation can sometimes be between two indexes and thereby result in a double peak in the correlated data. This could be avoided by using a higher frequency resolution; however, the amount of memory and the time needed for additional calculations set a limit to this resolution, but a suitable workaround to this problem has already been covered.

4.2 Implementation

The implementation on the DSP board was straightforward and did not generate many problems at first. As mentioned above, the DSP compiler did not support the complex.h package. After implementing the necessary functions without any major issues, the program was tested and resulted in confusion. The values were not at all consistent with the expected values generated in Matlab. After many hours spent on error correction it was found that the memory was overwritten in some way, replacing the result values with memory addresses. This turned out to be because of lacking
internal memory that we were expected to have. The memory configuration of the board was fairly hard to understand, especially the amount of memory that was available in the different memory banks, and many arrays had to be moved to the external memory. By moving almost all of the arrays to the external memory the functions started to work better with correct results, however very slowly. The time to go through one cycle was too long: it took between 10 and 20 seconds. To decrease this time the algorithms were analysed many times to look for means of improvement. After much testing a new correlation algorithm was created that only used the points of interest in the reference arrays rather than correlating every point. This drastically improved the time to process the signal to a few seconds, which is still a bit too long but acceptable.

4.3 Posttesting

The system was now complete and some post testing was done. The Hagström guitar was used to test the tuner, and tuning one string worked well. The strings had prior to the test been tuned with a TC Electronic PolyTune commercial tuner [4]. This is shown in figure 7, where the 5th and 6th strings are tuned individually. As seen, the strings get the tuning value of 1, which implies that their pitch is slightly high. This is due to the low resolution of the tuner, and the guitar should be viewed as tuned for values between -1 and 1.

When tuning all strings the first result is most often wrong, see figure 8 A, most likely as a result of sampling too early, when the strings are still unstable after the strum. The second sampling of the same strum usually has a more accurate shift, visible in figure 8 B. The data presented after a tuning was also more unstable the more strings that were included. The most likely cause is the enormous amount of frequency data, which results in correlation matches in other places than intended. This could easily be improved with more memory and processing power that would allow longer reference arrays with more harmonics. The accuracy of the tuning based on the amplitude of the correlation is, however, lower. The cause could also be that the different strings have different amplitudes and different sustain, causing the spectrum to be uneven. As for now, the timing of the tuning has to be perfected; this is clearly something to continue working on.
Figure 7: A Sixth string E2 B Fifth string A2

Figure 8: A All strings 1 B All strings 2

5 Conclusion and further development

As it is now the tuner works quite well for tuning one string at a time, but gets much less accurate when more strings are tuned. This can be due to too good correlation matches in more than one point in the spectrum, because of the large amount of fundamentals and harmonics in the sampled signal, thereby causing a lower probability that the right string is tuned. This might be counteracted by increasing the frequency resolution of the tuner and the number of harmonics in the reference arrays. Another reason may be that the sound level from some of the strings has dropped in amplitude before the sampling has started.

One way to make sure that some noise is suppressed might be to use windowing functions; however, we discovered that a Hamming window did not actually improve the frequency spectrum in a noticeable way. More testing with windowing functions should be able to clean up the spectra by narrowing the peaks, making correlation more accurate.

Since we have limited memory and computing power, we had to limit the number of frequency points to 1500 to be able to get a result in a reasonable time. This yields an uncertainty in the higher frequencies, where the step size becomes too large. In further development this can be helped by using more frequency points and also by reducing the base, in order to make the points come closer to each other. This would improve the accuracy for the entire spectrum, and not just the higher frequencies. A great deal of time could also be spent on optimization for faster performance, allowing longer arrays and higher resolution as well as quicker results. As for now, the result is presented in a few seconds, which is a bit too slow
in our opinion. A goal for further development would be to reduce this time to under a second to make real-time tuning bearable.

References

[1] Vaughn Aubuchon. Vaughn's Music Note Frequency Chart. http://www.vaughns1pagers.com/music/musicalnotefrequencies.htm (2011-02-28)
[2] Hyperphysics. http://hyperphysics.phy-astr.gsu.edu/hbase/music/cents.html (2011-03-03)
[3] Martin Stridh. Signal Characterization of Atrial Arrhythmias using the Surface ECG. Vol. 33, ISSN 1402-8662, 2003
[4] TC Electronic. PolyTune. http://www.tcelectronic.com/polytune.asp (2011-03-03)
Part II

Pitch Estimation

Jonas Rosenqvist, Kim Smidje, Henrik Nilsson, Johan Mattsson

Abstract

This report covers the implementation of a digital signal processing algorithm for the TMS320C6713 by Texas Instruments. The algorithm is written in C using the Code Composer Studio IDE, and it aims to determine the dominant frequency of a given input audio signal. This is achieved by applying the cepstrum transform, an extension of the Discrete Fourier Transform involving additional manipulation of each sample in the frequency domain, followed by an inverse Fourier transform which yields a signal in what is known as the quefrency domain. There you search for the highest amplitude, disregarding certain intervals. From here the dominant frequency can be extracted, and the closest pure tone, as well as the distance to it, is presented to the user.
1 Introduction

Pitch estimation is, just as the name says, an estimation of the pitch: a method which distinguishes the frequencies in a tone or sound in order to determine which frequency is the ground frequency of them all. There are some techniques that can be used, with different advantages and disadvantages, but the common thing is that they all require a lot of computations. In this report we will focus on the cepstrum algorithm, which is a collection of mathematical tools. The cepstrum algorithm has the advantage that it is faster than for example autocorrelation, which gives the possibility to compute the frequencies from a faster sample rate. The reason why cepstrum is fast comes from the fact that it is computed using only the Fast Fourier Transform, its inverse, and the absolute and logarithmic functions, all of which have a fairly low time complexity. The working process we chose to solve the problem was to first solve the problem in a familiar environment, namely in Matlab, and after that go deeper into Code Composer.

2 Theory

The Fourier transform transforms a signal in the time domain, i.e. the amplitude as a function of time, into a signal in the frequency domain, i.e. the amplitude as a function of frequency. This is visualised in Figure 1 and Figure 2. In Figure 1, the top plot, two sinusoidal signals with different phase, amplitude and frequency are shown, and the lower plot is the sum of these signals. The result of applying the Fourier transform to the sum of the signals is displayed in Figure 2. For all other values of f, F(f) has a value of zero, because those frequencies are absent, or to put it differently, have an amplitude of zero in the original signal.

The cepstrum method uses several other operations in order to reach its goal, and its mathematical representation is

    F^-1( log10( |F(x)| ) )

What it does is that it takes the signal, samples it in the discrete Fourier transform, takes the absolute values of that result, converts it into the logarithmic scale and lastly transforms the samples back into the time domain.
Since the original signal is made up only of the sum of two pure sinusoids with different amplitudes, the Fourier transform consists of only two peaks, each of them representing the amplitude of one of the sinusoids. One sees here that the value of the Fourier transform for a given value of the variable f corresponds to the amplitude of a sinusoid component with that frequency in the original signal. The name cepstrum comes from reversing the first four letters of the word "spectrum".
Figure 1: Top: Two sinusoidal signals with different amplitudes and phases. Bottom: The sum of the two signals

The reason for using both the absolute and the logarithmic functions is that you want to emphasize lower frequencies, to make sure that the dominant peak comes from the ground frequency and not from one of the overtones. The result after the transformation is called quefrency, which is measured in seconds, though not in the sense of a signal in the time domain. The peaks in the cepstra occur as a result of the periodicity of the signal or the sound. For instance, if a peak in the cepstrum diagram appears at point X (dimensionless), this corresponds to the frequency derived by taking the sample rate (measured in Hz) divided by X. Because of the convolution occurring with the FFT, the signals will be additive, which is an important property for the cepstra (the spectra of the cepstrum); the quefrency will therefore be the sum of all the signals which are recorded. After the quefrency has been calculated, the highest peak in the window will correspond to the right frequency. [1]
Figure 2: The power spectrum of the two sinusoidal signals

3 Methods

There are two types of algorithms that can be used to obtain estimated frequencies: algorithms in the time domain and algorithms in the frequency domain.

3.1 Time domain

One very simple approach that could be used in the time domain is to look at the zero crossings of the signal, that is, where the signal goes from a high value to a low value and the other way around. In one period the signal will cross zero two times, and by measuring at what times this happens a rough estimate of the pitch can be calculated. This approach isn't that robust, since for signals that consist of multiple sine components with different periods the result will not be close to the real frequency.

Other methods in the time domain, such as autocorrelation, have a different approach. As the name says, autocorrelation tries to find the correlation of the input signal with itself at some lag; by computing the correlation one point at a time we can start building another graph, which hopefully will look like a sine signal. The main problem with autocorrelation is for higher frequencies, because the number of additions becomes overwhelming. This is the reason why autocorrelation is mostly used in the low to mid frequency range. [1]
    autocorrelation[k] = (1/(N-k)) · Σ_{n=k}^{N} signal[n] · signal[n-k]
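A direct C implementation of this estimator might look as follows. For a 0-indexed buffer of n samples the sum runs to n-1; names are ours:

```c
/* Autocorrelation estimate of an n-sample buffer:
 * r[k] = 1/(n-k) * sum_{i=k}^{n-1} x[i] * x[i-k]
 * for lags k = 0..maxlag. */
void autocorr(const float *x, int n, float *r, int maxlag)
{
    int k, i;
    for (k = 0; k <= maxlag; k++) {
        float s = 0.0f;
        for (i = k; i < n; i++)
            s += x[i] * x[i - k];
        r[k] = s / (float)(n - k);
    }
}
```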
3.2 Frequency domain
Frequency domain methods take an input in the time domain and compute the frequency spectrum. The input signal displayed in the frequency spectrum will cover the whole spectrum, but the dominant frequency will have the highest peak. The main advantage of frequency methods is the use of the fast Fourier transform, which makes the computations fast and reliable. There are a number of different algorithms that perform operations in the frequency domain, for example kepstrum, cepstrum, power cepstrum and maximum likelihood. To be able to compute the estimated pitch, the input signal needs to be divided into smaller parts and the estimate computed for each part. The disadvantage of dividing the input signal is the loss of resolution, since the estimated frequency depends on the sampling rate and the length of the divided input signal. It is still possible to get a good resolution: if for example the sampling rate is 8000 and the length of the divided input signal is 8000, then each frequency can be represented. [1] The estimated frequency is obtained as

    estimated frequency = sample rate / index of maximum value
4 Implementation
In order to evaluate the algorithm's ability to correctly detect the dominant frequency we decided to first implement it in Matlab. In addition to the group being more experienced with Matlab as opposed to the C language, it allows for much faster implementation thanks to the high level development environment, with many of the crucial algorithms, such as the fast Fourier transform, already implemented. It was also at this stage that we estimated the appropriate cutoff level in the quefrency domain, as well as a suitable signal sample size, by experimenting with various audio signals. By not discarding enough initial values in the quefrency domain one runs the risk of finding a false dominant frequency. On the other hand, if too many values are ignored, one might miss the true dominant frequency. The sample size must cover a large enough time frame to detect the lowest possible frequency, and at the same time not be too big with regard to the limited memory and real-time requirements. Using a sample rate of 32000 samples/second and a vector of length 512, corresponding to a window of 16 ms, we found that we got acceptable results while still being able to detect frequencies as low as approximately 240 Hz. We loaded an audio signal generated by a horn, which had the frequency 123.47 Hz. That frequency corresponds to
Figure 3: Power spectrum from the B2 horn

a B2 note, which means the note B in the second octave. When running our Matlab pitch detection program we get these plots, where the first plot represents the power spectrum from the B2 horn and the second plot shows the cepstrum of the B2 horn. Given the cepstrum, the estimated frequency can be calculated to approximately 126 Hz, which can be considered a reasonable deviation from the correct answer. Since we aimed to make our range designed for the fourth octave we did the same with C4, and got the plot shown below. The peak is located at index 124, which gives us a frequency of circa 258 Hz.

The implementation in Code Composer was very similar to what we did in Matlab. With the use of the DSP library the algorithms FFT and IFFT were made available. Other functions such as log and abs were introduced into the program with the math.h library. To increase the accuracy of the program we chose to take the 10 latest values and only present the median of these, to remove any potential outliers. After some trial and error we chose a cut-off point at 20 samples, as that gave us the right results in our chosen octave.
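The median-of-the-last-10 smoothing can be sketched as a ring buffer plus a small insertion sort. This is a sketch with names of our choosing, not the project code:

```c
#include <string.h>

/* Median of the last HISTORY pitch estimates, used to suppress
 * occasional outliers. Returns the lower median for even counts. */
#define HISTORY 10

static float hist[HISTORY];
static int hist_n = 0;            /* number of values seen so far */

float median_update(float v)
{
    float tmp[HISTORY];
    float key;
    int count, i, j;

    hist[hist_n % HISTORY] = v;   /* ring buffer of recent values */
    hist_n++;
    count = hist_n < HISTORY ? hist_n : HISTORY;

    memcpy(tmp, hist, sizeof(float) * (size_t)count);
    for (i = 1; i < count; i++) { /* insertion sort: count <= 10 */
        key = tmp[i];
        for (j = i - 1; j >= 0 && tmp[j] > key; j--)
            tmp[j + 1] = tmp[j];
        tmp[j + 1] = key;
    }
    return tmp[(count - 1) / 2];
}
```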
Results
Table 1 shows the frequency estimation for 17 different frequencies that cover evenly spaced intervals of the third and fifth octaves as well as the whole fourth octave, in the frequency range of 260-520 Hz. In addition to this we have added
Figure 4: The cepstrum plot with folding
Figure 5: Highest peak at index 124
all the pure notes in the fourth octave. The fourth column of the table shows that the errors in the fourth octave are very small (average of 1.5 Hz), and rapidly increase as we move outside it.

Table 1: Frequency estimation results

Input frequency (Hz)   Output frequency (Hz)   Note   Deviation (Hz)
        240                    260                         20
        262                    262              C4          0
        280                    280                          0
        293                    292              D4          1
        320                    320                          0
        330                    328              E4          2
        349                    346              F4          3
        360                    358                          2
        392                    390              G4          2
        400                    400                          0
        440                    438              A4          2
        480                    476                          4
        494                    492              B4          2
        520                    524                          4
        523                    524              C5          1
        560                    560                          0
        600                    602                          2
        640                    640                          0
        680                    680                          0
        720                    726                          6
        760                    760                          0
        800                    800                          0
        840                    842                          2
        880                    888                          8
We could get the right results as low as 180 Hz and as high as 2000 Hz, but at these values the reliability suffers and you sometimes end up with an overtone: the right note but the wrong octave.
5 Problems encountered
The biggest problems we encountered were during the implementation in C. Mainly the DSP library provided by Texas Instruments caused big problems. First, just setting the class path right was kind of tricky, but after looking into the reference guide and some help from Frida we got it right. After solving the class path a new problem was introduced: how to use the function DSPF_sp_fftSPxSP, which is provided by the DSP library. Apparently there is a pitfall in C when converting from unsigned short to float
which forces you to first cast from unsigned short to short and after that cast to float. If this is not done correctly, there will be values that get interpreted as near the maximum value instead of small negative numbers.

The last issue we had to resolve was that some frequencies seemed impossible to get good estimations for, and the estimations fluctuated a lot. What we did was to use the median of the last 10 values instead of just the regular average. This cancels out the fluctuation but introduces other disadvantages, for example if the frequency of the input signal varies very fast. Another limitation of our program is that if the input signal changes frequency every 16 ms, then the output will just be the median frequency of the last 10 frequencies. The reason for this is that the input signal is divided into smaller parts, but on the other hand, in reality this should not be a problem.

6 Conclusion

The estimation was good for frequencies in the range 240-880 Hz; before and after this range the errors become too large.

7 References

[1] Roads, Curtis (1996). The Computer Music Tutorial. Part 4: Sound analysis.
[2] Norton, Michael; Karczub, Denis (2003). Fundamentals of Noise and Vibration Analysis for Engineers. Cambridge University Press.
[3] Frequencies of Musical Notes. http://www.phy.mtu.edu/~suits/notefreqs.html
Part III

Vocoder

Mattias Danielsson, Andre Ericsson, Kujtim Iljazi, Babak Rajabian

Abstract

This report is based on a project by students to become more experienced in programming signal processing algorithms on a Texas Instruments TMS320C6713 DSK. The project was to program a musical LPC vocoder. A highpass FIR filter was used for prefiltering of the voice. The autocorrelation of the voice was then computed, and the Levinson-Durbin recursion was used to model the voice using an IIR lattice filter structure. A synthesizer was needed as a carrier signal, which was the key to changing the voice; the vocoder was programmed so that the voice controls the level of the carrier signal. The course Optimum Signal Processing is recommended to take before reading this report.
1 Introduction

The purpose of this project was to program a Texas Instruments TMS320C6713 DSK in Code Composer Studio v3.3, in C, into an LPC-based (Linear Prediction Coding) vocoder used as a musical instrument. The vocoder is programmed to be used together with a synthesizer. When the performer presses down a key on the synthesizer and speaks into the microphone (both plugged in to the vocoder), a synthesized vocal sound is heard from the loudspeakers. The characteristic of the synthesized vocal sound depends on what sound the synthesizer is set to produce. The big advantage is that the vocoder is compatible with basically every musical instrument that can produce a sound with a constant sustain level and rich frequency content.

2 Theory

2.1 Overall description of our vocoder model

Figure 1: Our vocoder model

Our vocoder model is shown in figure 1 above. A sampled speech signal from the microphone is first filtered through a high pass filter, in order to get rid of the low frequency content in the voice (the higher frequency content in the voice spectrum is what defines the vocal tract, which was explained in the second lecture of this course). Otherwise the unnecessary, low frequency content of the voice would be modeled. Thereafter a block of samples is built from the highpass filter output in order to calculate the autocorrelation values. With the autocorrelation values we can estimate the filter coefficients for
the all-pole model (IIR filter), which should represent a model of the vocal tract. From the sample block from the high pass filter, the maximum value can be obtained to represent the amplitude of the speech signal. This value is multiplied with the normalized carrier signal (which is always a signal between -1 and 1). This way the sound level of the carrier signal is basically controlled by the voice signal. The signal from the IIR model is the modified voice. The last step that needs to be done is to invert the effect of the high pass filter that was implemented at the beginning, by filtering the output signal from the IIR model with a low pass filter.

2.2 The highpass filter

An FIR filter is simply defined by the following equation:

    H(z) = Σ_{k=0}^{p} b_p(k) z^{-k}    (1)

An FIR filter structure of this type is used to implement a highpass filter.

2.3 The autocorrelation function

The autocorrelation function is a function to determine how much a signal relates to itself at different time lags. The estimation of the autocorrelation function is given below:

    r_x(k) = (1/(N-k)) Σ_{n=k}^{N} x(n) x*(n-k)    (2)

The autocorrelation function must be normalized in order to prevent overflow and to work properly with the Levinson-Durbin recursion:

    ρ_x(k) = r_x(k) / r_x(0)    (3)

2.4 The Levinson-Durbin recursion

The Levinson-Durbin recursion is an algorithm used to find an all-pole model by using a sequence of autocorrelation values. It calculates both the regular IIR filter coefficients, a(j), and the reflection coefficients, Γ_j, for an IIR lattice filter. The Levinson-Durbin algorithm is described in [1] and repeated in Table 1.
1. Initialize the recursion:
   (a) a_0(0) = 1
   (b) ε_0 = ρ_x(0)
2. For j = 0, 1, ..., p-1:
   (a) γ_j = ρ_x(j+1) + Σ_{i=1}^{j} a_j(i) ρ_x(j-i+1)
   (b) Γ_{j+1} = -γ_j / ε_j
   (c) For i = 1, ..., j: a_{j+1}(i) = a_j(i) + Γ_{j+1} a_j*(j-i+1)
   (d) a_{j+1}(j+1) = Γ_{j+1}
   (e) ε_{j+1} = ε_j [1 - |Γ_{j+1}|^2]
3. b(0) = sqrt(ε_p)

Table 1: The Levinson-Durbin recursion

2.5 The IIR lattice filter

The IIR lattice filter structure is an alternative structure for the IIR filter. Instead of using the regular filter coefficients, a(k), it uses the reflection coefficients, Γ_k. This structure has "the same advantages of modularity, simple tests for stability and decreased sensitivity to parameter quantization effects". A single stage of an IIR lattice filter structure is shown on the next page, and the difference equations describing it are shown below:

    e_j^+(n) = e_{j+1}^+(n) - Γ_{j+1} e_j^-(n-1)    (4)
    e_{j+1}^-(n) = e_j^-(n-1) + Γ_{j+1}* e_j^+(n)    (5)
Figure 2: Single stage of an IIR lattice filter

A complete p:th order IIR lattice filter can then be derived from the difference equations on the previous page and the figure above, as shown in the figure below.

Figure 3: p:th order IIR lattice filter
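Equations (4) and (5) give one output sample of the all-pole lattice when iterated from stage p-1 down to 0, with e_p^+(n) as the input and e_0^+(n) as the output. A sketch with a state layout of our own choosing, not the project's filter routine:

```c
/* One sample through a p-stage all-pole IIR lattice built from
 * equations (4) and (5), real coefficients. gamma[1..p] holds the
 * reflection coefficients; eminus[0..p-1] holds the delayed
 * backward signals e_j^-(n-1) and must be zeroed before the first
 * call. Returns y(n) = e_0^+(n). */
float lattice_step(float x, const float *gamma, float *eminus, int p)
{
    float eplus = x;                       /* e_p^+(n): the input  */
    int j;

    for (j = p - 1; j >= 0; j--) {
        eplus -= gamma[j + 1] * eminus[j];             /* eq (4) */
        if (j + 1 <= p - 1)                            /* eq (5) */
            eminus[j + 1] = eminus[j] + gamma[j + 1] * eplus;
    }
    eminus[0] = eplus;                     /* e_0^-(n) = e_0^+(n)  */
    return eplus;
}
```

As a check, a 2-stage lattice with Γ1 = -0.5 and Γ2 = 0.25 matches the direct-form all-pole filter 1/(1 - 0.625 z^-1 + 0.25 z^-2) obtained by stepping the reflection coefficients up to the a(k) coefficients.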
3 Implementation

The implementation of the vocoder was done in the C programming language using Code Composer Studio. A code template used in the second lab of this course was used to make starting easier. There are a lot of configuration bits that can be set to change the parameters of the AD/DA converter, but these were left at their default values as found, except for the sampling rate, which was changed to 8 kHz. Also the input voltage to the AD converter has to be taken into account: for example, if you want to use an old analogue synthesizer to generate the carrier signal, you have to make sure that the output from the synthesizer is below the reference voltage for the AD converter.

Our highpass filter is defined by the following equation:

    H_HP(z) = 1 - 0.98 z^{-1}    (6)

To make the implementation of the highpass filter (an FIR filter) general for any size, a general and already written function by Texas Instruments was used, even though the highpass filter was only of order one.

Using the knowledge that speech is somewhat frequency stationary during 20 ms (which was read in a report based on a similar speech modeling project in this course) and the sampling frequency of 8 kHz results in a buffer size of 160 samples. The order of the filter modeling the vocal tract was chosen to be 8, so as not to break any real-time performance. This results in a normalized autocorrelation vector of size 9 (to make an all-pole model using the Levinson-Durbin recursion of order n you need n+1 autocorrelation values). At first a regular IIR filter was used to model the vocal tract, but it was later replaced by an IIR lattice filter structure, which is described later in the results section.

To make the voice control the level of the carrier signal, the absolute largest value is found in the block of 160 samples from the highpass filtered speech signal, because the absolute maximum value that the samples we work with can obtain is 32000. This value is multiplied with the normalized carrier signal, which is in an interval of values between -1 and 1.
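As a concrete illustration, the first-order pre-filter of equation (6) needs only one multiply and one delayed sample per output. This is a sketch, not the TI library routine the project used:

```c
/* First-order FIR highpass of eq (6): y(n) = x(n) - 0.98*x(n-1).
 * *prev holds x(n-1) between calls; initialize it to 0. */
float highpass_step(float x, float *prev)
{
    float y = x - 0.98f * *prev;
    *prev = x;
    return y;
}
```

DC is attenuated to a residual gain of 1 - 0.98 = 0.02, while near the Nyquist frequency, where consecutive samples alternate in sign, the gain approaches 1.98.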
The normalization was done by dividing the carrier signal by 32000. One thing to keep in mind is that the PIP buffers are filled with unsigned shorts, which have to be type cast to short and then to float to be able to do the calculations with increased precision.
4 Testing and debugging

For testing of the different blocks in the vocoder we took a pragmatic approach: knowing the behaviour of the blocks, different signals would easily reveal whether the blocks were working correctly or not. In Code Composer Studio there are also special commands enabling printout of internal variables and output from blocks while running the code.

For the highpass filter we used sounds having both high and low frequency content and listened to the filtered signal. For the autocorrelation we used a sine signal and looked at the resulting output vector. The autocorrelation was strictly descending in value for increasing lag shifts. The maximum lag shift in our case is 9, resulting in 9 autocorrelation coefficients. The value of 160 is the result of the sample rate of 8 kHz and the speech duration block of 20 ms used. We also tested the autocorrelation with white noise; this resulted in a low autocorrelation and not predictable values for lag shifts greater than zero.

The Levinson-Durbin algorithm was tested by feeding white noise into an IIR filter where we ourselves had set the filter coefficients. The output of this filter was used as input to our autocorrelation block, and the output of the autocorrelation was sent to the Levinson-Durbin algorithm. The output of the Levinson-Durbin algorithm should be estimates of the filter coefficients in the IIR filter. Our Levinson-Durbin algorithm produced estimates that varied around the values preset by us. The reason for the variation around the correct filter coefficient values was the short input block length of 160.

To test our complete vocoder system we used a recorded voice sample on the left channel and different square wave audio sources, increasing in frequency, on the right channel. The different sources on the right channel could be mixed and amplified at will in the sound program Audacity.
5 Results and conclusions
The first test of our complete vocoder system resulted in a low sound level. The reason for the low sound level is probably that no amplification is done in the AD/DA converter. This was overcome by using amplified speakers (we didn't apply gain to the output with the parameter in the program, because we didn't know how to change the gain parameter during runtime). When using our regular IIR filter for filtering of the carrier, loud and painful sound level spikes resulted, later discovered to be caused by unstable filter coefficients. Trying different fixes to the IIR filter resulted in some improvements but no complete absence of the painful sound spikes. The ordinary IIR filter was therefore replaced by a lattice IIR filter, and after altering the Levinson-Durbin algorithm so that the reflection coefficients did not exceed an absolute value of one, there were no spikes in the output signal.

The pitch of the generated speech was also tested by changing the frequency of the square wave carrier. The speech pitch changed satisfactorily when changing the frequency of the carrier, making us happy with the result. We never had time to test carrier signals from real synthesizers before the deadline for the report, but we will show it in our demonstration. As always there are different changes, choices and improvements you can make in system design and implementation, but we are satisfied with our choices.
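The stabilizing effect of the lattice structure can be illustrated with a minimal sketch of one all-pole lattice filter step (our own code, with hypothetical names; the project's actual implementation is not shown). With every reflection coefficient clamped to a magnitude below one, each lattice stage is guaranteed stable, which is exactly why clamping removed the spikes:

```c
#define MAX_ORDER 8

/* One output sample of an all-pole (IIR) lattice filter.
 * e : excitation (carrier) sample
 * k : reflection coefficients k[0..p-1], |k[m]| < 1 for stability
 * b : delayed backward residuals (state), zero-initialized */
static float lattice_allpole(float e, const float *k, float *b, int p)
{
    float f[MAX_ORDER + 1];
    f[p] = e;
    for (int m = p; m >= 1; m--)          /* forward path down the stages */
        f[m - 1] = f[m] - k[m - 1] * b[m - 1];

    float y = f[0];                       /* output = zeroth forward residual */
    float bprev = y;                      /* b_0[n] = f_0[n] */
    for (int m = 1; m <= p; m++) {        /* propagate backward residuals */
        float bm = b[m - 1] + k[m - 1] * f[m - 1];
        b[m - 1] = bprev;
        bprev = bm;
    }
    return y;
}
```

For order 1 with k = 0.5 this realizes y[n] = e[n] - 0.5 y[n-1], a stable one-pole filter.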
Part IV
Reverberation
R. AbduRahman, S. Mittipalli, T. Tullberg, R. Isacsson

Abstract
In this project, the challenge was to implement a digital reverb on a Texas Instruments TMS320C6713 DSK development board. Jean-Marc Jot's Feedback Delay Network algorithm was used as the reverberation algorithm. Different parameters in the algorithm had to be identified and tuned experimentally. To meet real-time constraints imposed by CPU and memory speeds, various hardware and software optimizations had to be employed. To aid development, the algorithm was also implemented as a non-real-time version in Matlab. This was beneficial both as a reference design and as a tool for parameter tuning and code analysis. The finished application produces a smooth reverb sound running without glitches at a CPU consumption of approximately 60-65%.
1 Introduction
1.1 Reverberation
Sound waves travelling in a room are reflected when they hit walls or other obstacles. Reflected copies of the sound are delayed some time, depending on the physical properties of the surroundings, such as room size, material and wall surface, before reaching the listener. The reflections go on themselves to hit walls and obstacles and get reflected again, and so on. This phenomenon is called reverberation. Sounds are enriched and colored by these reflections, due to the tendency of air and obstacles to dampen higher frequencies to a greater extent than lower frequencies.

A reverberated sound consists of three main parts. The sound that travels directly from the source to the listener is called the direct sound. The earliest iterations of the reflections are called early reflections and extend roughly 60 to 100 ms after the initial direct sound (see figure 1), depending on the size of the room[1]. In time, as reflections are reflected and multiplied again and again, they become indistinguishable as separate echoes to the listener. This last part, called the late reverberation, starts at around 100 ms and can go on for several seconds in a large enough room or concert hall.

Figure 1: Simplified image of the direct sound and 1st and 2nd order early reflections

Figure 2 shows the early part, consisting of several early delays, followed by the late reverberated decaying part.
Figure 2: Impulse response of the actual implemented reverb, showing early reflections and late reverberations
2 Theory
2.1 Reverb Algorithm
Early on in the project we settled on the first of the algorithms developed by Jean-Marc Jot. The defining characteristic of this algorithm is the feedback delay network that Jot introduced to model late reverberations. We had read that, while computationally expensive, it produced an impressive reverberated sound with rich echo density[1].

Figure 3: Jot's FDN algorithm and how it fits in the overall reverb implementation

2.2 Reverberation time
The time for a sound to attenuate by 60 dB in a reverberant space is called the reverberation time, Tr. The reasoning behind the 60 dB attenuation requirement is that 60 dB is the difference between the intensity of a common orchestra, 100 dB, and the background noise of an ordinary room, 40 dB[5]. Sound is attenuated because of the surfaces in the room, as they absorb the energy of the sound waves and their reflections. The two most common formulas for the approximation of Tr are the Eyring formula[1] (0.163 in both formulas corrected to 0.161[2]):

    Tr = 0.161 * V / (-A * ln(1 - s) + 4 * delta_a * V)    (1)
and the Sabine equation[1]:

    Tr = 0.161 * V / (s * A)    (2)

where V is the room volume, A is the room surface area, s an average absorption coefficient and delta_a is the frequency dependent attenuation constant of air. Once Tr is calculated, the attenuation can be derived and its frequency dependency established as[1]:

    Tr(w) = -3 * T / log10(gamma(w))    (3)

where T is the sampling period and gamma(w) is the attenuation per sample period as a function of the frequency w.

2.3 Delay elements z^(-mi)
The z^(-mi) delays model the time it takes for a reflection to reach the listener and/or another obstacle or wall. The delay in samples is the delay time in milliseconds multiplied by the sampling frequency, for example:

    m16 = 100 ms * 48 kHz = 4800    (4)

The different delay values are recommended to be mutually prime when expressed in sample units. This is to avoid superpositioning of harmonically related sound waves causing unpleasant resonances, so called flutter echoes[3].

2.4 Damping Filters hi(z)
Starting from the input x(n) in figure 3, the signal is copied into, in our case, 16 different lines, delayed mi samples, and then filtered by the hi(z) filters. These lowpass filters model the real world's attenuation due to absorption, reflection and spreading in walls and other obstacles. High frequency components are attenuated to a greater extent than lower frequencies, as described in 2.2. The filters are expressed as follows in the frequency domain[1]:

    hi(z) = gi * (1 - ai) / (1 - ai * z^(-1))    (5)

where

    gi = 10^(-3 * mi * T / Tr(dc)),    ai = (ln 10 / 4) * log10(gi) * (1 - 1/alpha^2)
with

    alpha = Tr(Nyquist) / Tr(dc)

where Tr(Nyquist) and Tr(dc) are the times it takes for the highest and lowest frequencies, respectively, to decay by 60 dB.

2.5 Diffusion Matrix A
When a sound wave is reflected after hitting an obstacle in a room, it is scattered across the room, hitting other obstacles, which in turn scatter the new reflections across the room, hitting other obstacles, and so on. Each time, the reflections are redistributed among the walls and obstacles in the room. The element responsible for this redistribution in Jot's algorithm is the diffusion matrix. It takes its inputs from the n delay lines and redistributes them back into the same delay lines. Since the damping or attenuation of sound waves is handled by the damping filters hi(z), the diffusion matrix should only redistribute the energy and neither amplify nor attenuate it. In other words, it should be both stable and lossless, both of which are fulfilled if the matrix is unitary, in the case of a complex valued matrix, or orthogonal, in the case of a matrix containing only real values.

2.6 Gains bi and ci
These two vectors are simple gains used to achieve different effects. We simply set all the elements of vector b to 1/16 to make sure the output did not clip, as the input was copied 16 times and then summed. The c vector, often used to achieve stereo spread or other cross channel effects, was left unused, the equivalent of setting all its elements to a value of one.

2.7 Tonal Correction Filter t(z)
As the outputs of the hi(z) filters have lost some of the higher frequencies, they tend not to be an accurate representation of the original signal. The solution to this is to place an inverted version of our lowpass filters, called a tonal correction filter, before the output, to equalize the modal energy irrespective of the reverberation time in each filter[4]:

    t(z) = (1 - b * z^(-1)) / (1 - b)    (6)

where

    b = (1 - alpha) / (1 + alpha)
3 Implementation
3.1 Realtime versus non-realtime implementation
To gain understanding of the algorithm, a reference prototype was initially developed in Matlab. When this algorithm produced satisfying results it was adapted to the realtime environment on the TMS320C6713 DSK, where especially constraints on CPU and memory usage were the next challenges, with the obvious requirement that, to be able to keep up with a sound input sampled at a certain frequency, our application had to process a certain amount of samples before the next chunk of samples arrived.

3.2 Diffusion Matrix
A potentially unlimited amount of matrices fulfill the condition of being unitary, when containing complex values, or orthogonal, when only containing real values, so other considerations were taken into account when choosing the diffusion matrix. For example, a better echo density is achieved the more nonzero elements there are in a matrix[1]. However, the more nonzero elements a matrix contains, the more multiplications have to be performed. Still, some matrices have beneficial properties that make multiplying by them easier, so naturally a matrix that lends itself to optimization when multiplied with a vector is preferred. Doing some research, one such matrix[6] was found:

        [  A4  -A4  -A4  -A4 ]
    A = [ -A4   A4  -A4  -A4 ] * 1/2    (7)
        [ -A4  -A4   A4  -A4 ]
        [ -A4  -A4  -A4   A4 ]

where A4 is a Hadamard matrix of the 4th order[7]:

         [ 1   1   1   1 ]
    A4 = [ 1  -1   1  -1 ] * 1/2    (8)
         [ 1   1  -1  -1 ]
         [ 1  -1  -1   1 ]

This matrix has the triple benefits of being orthogonal (A = A^(-1)), containing only nonzero elements of equal magnitude, and being, as we later shall see, easy to optimize.

3.3 Software Optimizations
3.3.1 Matrix Multiplication
Normally, multiplying a vector of size n by a matrix of dimension n requires n^2 multiplications and n(n - 1) additions. The diffusion
matrix described in section 3.2 is one such matrix that allows better. To start with, the matrix consists of only positive and negative ones, aside from a scalar that can be factored out. Thus, the vector elements need only be multiplied with the scalar after having been summed according to the signs in each matrix column. This reduces the number of multiplications to n, or 16 in our case. Furthermore, the regularity of the A4 matrix allows us to calculate intermediate sum values that can be reused instead of having to do each addition separately[6]. For example, when multiplying a vector x of size 4 with an A4 matrix, the following intermediate values are calculated:

    a = x1 + x2
    b = x1 - x2
    c = x3 + x4
    d = x3 - x4

and the resulting vector becomes:

    Y1-4 = (a + c, a - c, b + d, b - d)

In our case this was done for the 16-element input vector in groups of four, so that one such vector was calculated for each sub input vector. These were organized in the following manner:

        [ Y1-4     -Y5-8    -Y9-12   -Y13-16 ]
    B = [ Y5-8     -Y9-12   -Y13-16  -Y1-4   ]
        [ Y9-12    -Y13-16  -Y5-8    -Y1-4   ]
        [ Y13-16   -Y5-8    -Y9-12   -Y1-4   ]

Finally, each row was summed to get the final result vector B of the matrix multiplication. So by choosing this type of matrix, the number of multiplications could be reduced from 256 to 16, and the, admittedly cheaper, additions from 240 to 80: the resulting operation count is 16 + 16 + 16*3 = 80 additions, instead of the usual 16*15 = 240.

3.3.2 Circular Buffers
Every input to the filters hi(n) is delayed mi samples; the outputs from the filters are then fed to the diffusion matrix and summed before being sent to the tonal correction filter. Because of the long delay between the time a value is calculated and the time when it finally reaches the output and can be discarded, arrays had to be used to store the values. These arrays were implemented as circular buffers, with the size of buffer i equal to the sample delay length mi.
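The optimized multiplication of section 3.3.1 can be sketched in C as follows. This is our own illustration of the butterfly scheme, not the project's code; since A is orthogonal, the operation must preserve signal energy, which makes it easy to verify.

```c
/* Fast B = A*x for the 16x16 diffusion matrix of equations (7)-(8):
 * four 4-point butterfly transforms, block-wise sign combination,
 * and a single scaling by 1/4 (the 1/2 from A times the 1/2 from A4). */
static void diffuse16(const float *x, float *B)
{
    float Y[4][4];
    for (int blk = 0; blk < 4; blk++) {
        const float *v = x + 4 * blk;
        float a = v[0] + v[1], b = v[0] - v[1];
        float c = v[2] + v[3], d = v[2] - v[3];
        Y[blk][0] = a + c;     /* the rows of A4, without the scalar */
        Y[blk][1] = b + d;
        Y[blk][2] = a - c;
        Y[blk][3] = b - d;
    }
    for (int blk = 0; blk < 4; blk++)
        for (int i = 0; i < 4; i++) {
            float s = 0.0f;
            for (int j = 0; j < 4; j++)   /* +A4 on the block diagonal */
                s += (j == blk) ? Y[j][i] : -Y[j][i];
            B[4 * blk + i] = 0.25f * s;   /* factored-out scalar */
        }
}
```

Only 16 multiplications (by the factored-out scalar) are performed, matching the count derived above.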
Circular buffers are governed by a pointer into the array. Values are read from the position in the buffer indicated by the pointer, and after the pointer is moved to the next position in the buffer, a newly calculated value is written to that new position, making sure it won't be read until the pointer has traversed the whole buffer, which happens exactly mi iterations later. The pointer's position is incremented each iteration, and when it reaches the end of the array it is reset to the beginning of the array. Using circular buffers reduces CPU load by just reading or writing the array element at the pointer position, instead of having to move all the elements of the array one position forward each iteration. In addition to the delays, the predelay line was also implemented as a circular buffer.

3.4 Compiler Optimization
Compiling the program with default options and running it on the DSP board at 48 kHz resulted in very high CPU utilization. So high, in fact, that the low-priority analysis module had trouble reporting any CPU load information back to the host. To remedy this we tried the different compiler optimization levels and settled on O2, which gave an equally good CPU load as O3 but without any potential increase in program size. This is as expected, since O3 mostly deals with the inlining of functions[8] and, aside from the interrupt-triggered process function, our program only calls a function once to set some initial global variables.

Figure 4: CPU load when compiled with optimization level O1

3.5 Hardware Memory Considerations
The predelay buffer and the 16 delay line buffers, duplicated for each of the two stereo channels, need a memory space of roughly 0.5 MB depending on configured delay lengths and predelay, where each single sample delay
requires 4 bytes (the size of the float data type) per channel. This large amount of data that has to be stored for later, delayed processing prohibited the use of the relatively small internal memory of 256 KB[9]. Instead the onboard SDRAM memory, with its larger capacity of 8 MB[10], was used. An external heap of 983,040 (0xF0000) bytes was declared in the memory configuration utility in Code Composer Studio, the additional space allowing for some headroom in setting the predelay and mi delay lengths. The 34 buffer arrays were then allocated to it using MEM_calloc commands. While the SDRAM memory worked great, we were careful not to allocate anything that we didn't have to, as the internal memory is faster than the SDRAM memory[9].

Figure 5: CPU load when compiled with optimization level O2

Figure 6: CPU load when compiled with optimization level O3
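The circular delay lines of sections 3.3.2 and 3.5 can be sketched as follows. This is our own minimal variant (read the oldest sample, overwrite it, then advance the pointer), which is equivalent to the read/advance/write ordering the report describes.

```c
#define DELAY_LEN 4800          /* e.g. m16 = 100 ms at 48 kHz */

static float delay_buf[DELAY_LEN];   /* zero-initialized delay line */
static int   delay_pos = 0;          /* the governing pointer */

/* Push one sample in; get back the sample written DELAY_LEN calls ago.
 * No elements are ever shifted, only the pointer moves. */
static float delay_tick(float in)
{
    float out = delay_buf[delay_pos];
    delay_buf[delay_pos] = in;       /* not read again until one full lap */
    if (++delay_pos == DELAY_LEN)    /* wrap pointer to start of array */
        delay_pos = 0;
    return out;
}
```

An impulse fed into the line reappears exactly DELAY_LEN iterations later, which is the delay z^(-mi) of section 2.3.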
4 Result
4.1 Experimental setup
Experimentation was conducted in part using the prototype in Matlab, where files of differing sound material were treated with the reverb algorithm and sound files of 100% wet and user-defined wet-ratio signals were generated and compared to the originals, both by ear and by comparing plots of the dry, wet and mixed signals. Once the realtime implementation was in place and working, we proceeded to use recordings of anechoic music as input to the DSP board to finetune parameters to taste. The bulk of the algorithm's parameters, such as the A matrix, the m delay lengths, alpha and Tr(Nyquist), were chosen and permanently set at this stage.

4.2 Results
We achieved most of the goals that we sought; the ones that weren't achieved were minor, like the implementation of a host-based user interface. The reverb is nice sounding, in no small part due to the large amount of delay lines, and as described in 3.4 the CPU load is within a satisfactory range running at the maximum sampling frequency of the development board. Also, the learning experience has been huge and has gained us additional skills in different areas, such as signal processing, C programming and optimized algorithms.

5 Discussion and Conclusion
5.1 General
When modelling a reverberation application, in essence trying to replicate a physical environment as closely as possible, the physical properties have to be known: the volume of the room, its total surface area, the absorption of the materials used in the room, what other elements in the room could absorb energy and/or reflect the rays, and so on. One way of capturing this is ray tracing, where an impulse response of the room is obtained by setting up microphones in the room and then firing a starter gun to produce the impulse. The response could be used to design a system which mimics the room perfectly, but the size of the response is so big and complex that realtime processing would be impossible. So replicating rooms with an algorithm is the way to go, but with no idea of what kind of room we were trying to replicate, nor what kinds of absorption the room would present, we had problems knowing what we were looking for. But as the work progressed, we made these parameters configurable
and didn't bother too much with accurately reproducing real rooms, but focused on the audible results and getting the program to actually work.

5.2 Non-realtime versus realtime implementation
As mentioned earlier, a version of the algorithm was implemented in Matlab as a non-realtime version. Besides the obvious advantage of having a prototype that all members of the group can reference while working on the realtime implementation, it was also beneficial not to have to deal with constraints on memory and CPU while trying to get a grip on the algorithm itself. However, there were drawbacks. Realtime issues did naturally not manifest during the prototyping phase, and it was also hard to measure the benefits of optimizing the matrix multiplication in Matlab, since Matlab is already heavily optimized for that purpose. The main disadvantage of realtime processing is the lack of infinite, or at least comfortably large, processing power; because of the time restrictions when doing the processing, the calculations had to be optimized and, if that was not enough, cut down and quality restricted. The advantage, though, is having playback in realtime. In comparison to an offline implementation, a lot more time is spent on making a realtime application work. Allocating memory right, optimizing calculations and using the DSP's built-in functions are just some of the aspects that had to be addressed. The hardest part was understanding the limitations of the DSP board's memory configuration and the related special commands: understanding what the compiler was trying to tell you, why things didn't work at all, why global variables weren't seen by subfunctions, and that Code Composer Studio's own flavor of the C language seems to have math functions that aren't exact replicas of the math library in C. The hours spent on trying to understand Code Composer Studio and sifting through the wealth of help information provided were the biggest drawback of the project.

5.3 Improvements
A user interface was on our todo list, but never implemented due to lack of time. This interface would contain controls for parameters like wet/dry ratio, gain, room size and the like, using a subfunction that calculates all the necessary parameters from a theoretical user input. The interface would ideally be able to communicate with the board in real time,
as when using a reverb in live music rigs and recording software, to enable the user to hear the effect while playing.
References
[1] Lilja, Ola. Algorithms for Reverberation: Theory and Implementation. Master Thesis, LTH, 2002.
[2] Wikipedia: Reverberation. http://en.wikipedia.org/wiki/Reverberation#Sabine_equation. Visited: 2011-03-01.
[3] Rocchesso, Davide. Introduction to Sound Processing, 3.2 Reverberation. 2003.
[4] Tonal Correction Filter. https://ccrma.stanford.edu/~jos/Reverb/Tonal_Correction_Filter.html
[5] Reverberation Time. http://hyperphysics.phy-astr.gsu.edu/hbase/acoustic/revtim.html
[6] Campbell, Spencer. An Implementation of a Feedback Delay Network: Final Project Report, 2008-12-09. http://twentyhertz.com/618_FinalProjectReport_SpencerCampbell.pdf
[7] Wikipedia: Hadamard matrix. http://en.wikipedia.org/wiki/Hadamard_matrix. Visited: 2011-02-14.
[8] Gough, Brian J. An Introduction to GCC, for the GNU compilers gcc and g++: Optimization levels. 2005. http://www.networktheory.co.uk/docs/gccintro/index.html
[9] Chassaing, Rulph; Reay, Donald. Digital Signal Processing and Applications with the TMS320C6713 and TMS320C6416 DSK: The TMS320C6x Architecture. 2008.
[10] Spectrum Digital. TMS320C6713 DSK Technical Reference, 506735-0001 Rev. A. May 2003.
Part V
Speech Recognition Using MFCC
Harshavardhan Kittur, Manivannan Ethiraj, Kaoushik Raj Ramamoorthy, Mohan Raj Gopal

Abstract
In this project, we present one of the techniques to extract the feature set from a speech signal and implement it in a speech recognition algorithm using the TMS320C6713 DSK board. The key is to convert the speech waveform into some type of parametric representation for further analysis and processing. A wide range of techniques exist for parametrically representing the speech signal for the speech recognition task, such as Linear Prediction Coding (LPC), Mel-Frequency Cepstrum Coefficients (MFCC), and others. MFCC is perhaps the best known and most popular, and is used in this project.
1 Introduction
1.1 Why Speech recognition?
Speech is the primary means of communication between people. For reasons ranging from the realization of human speech capabilities to the desire to automate simple tasks inherently requiring human-machine interactions, research in automatic speech recognition has attracted a great deal of attention over the past few decades. Although there are numerous ways to model a speech signal and perform speech recognition in both hardware and software, no such system is stable for all kinds of speakers in the world. Our interest is to find out the intricacies of designing such a system by implementing it on a TMS320C6713 DSK board.

1.2 Common problems found in designing such a system
• People from different parts of the world pronounce words differently. Also, the rate at which they speak affects the implementation of a speech modelling system.
• Speech is usually continuous in nature and word boundaries are not clearly defined.
• Noise is generally a major factor in speech recognition and has to be carefully analysed while designing a system. A noisy environment limits the system performance.
• The rate of error in the recognition system depends on the amount of data stored in the system by training. When the number of words in the database is large and consists of similar sounding words (rhyming words), there is a good probability that one word is recognized as the other.

1.3 Tools Used
• MATLAB
• Code Composer Studio
• TMS320C6713 DSK Board
• HiFi Microphone
• Stereo Speakers
MelFrequency Cepstrum . Mohan Raj Gopal51 Figure 1: Feature extraction using MFCC Figure 2: Feature matching using MFCC 2 2.Harshavardhan Kittur.1. Feature matching involves the actual procedure to identify the unknown speaker by comparing extracted features from the voice input with the ones from a set of known speakers. Kaoushik Raj Ramamoorthy.1 Theory Speech Recognition Algorithm At the highest level there are a number ways to do this complex task of speech recognition but the basic principles are feature extraction and feature matching. 2. Feature extraction is the process that extracts a small amount of data from the voice signal that can later be used to represent each speaker. The block diagram are shown in Figures 1 and 2.Manivannan Ethiraj.1 Feature Extraction In feature extraction phase the speech can be parameterized by various methods such as Linear Prediction Coding (LPC).
Mel-Frequency Cepstrum Coefficients (MFCC), and others. MFCC, which is used in this project, is perhaps the best known and most popular. MFCCs are based on the known variation of the human ear's critical bandwidths with frequency: filters spaced linearly at low frequencies and logarithmically at high frequencies capture the phonetically important characteristics of speech, and are therefore well suited for speech recognition.

2.1.2 Mel Frequency Cepstrum Coefficients
These are derived from a type of cepstral representation of the audio clip (a cepstrum is nothing but a "spectrum of a spectrum"). The difference between the cepstrum and the Mel-frequency cepstrum (MFC) is that in the MFC, the frequency bands are positioned logarithmically (on the mel scale), which approximates the human auditory system's response more closely than the linearly spaced frequency bands obtained directly from the FFT or DCT. This can allow for better processing of data, for example in audio compression. MFCCs take human perception sensitivity with respect to frequencies into consideration. This is expressed in the mel-frequency scale, which has linear frequency spacing below 700 Hz and logarithmic spacing above 700 Hz. However, unlike the sonogram, MFCCs lack an outer ear model and, hence, cannot represent perceived loudness accurately.

MFCCs are commonly derived as follows:
1. Take the Fourier transform of (a windowed excerpt of) a signal.
2. Map the log amplitudes of the spectrum obtained above onto the Mel scale, using triangular overlapping windows.
3. Take the Discrete Cosine Transform of the list of Mel log-amplitudes, as if it were a signal.
4. The MFCCs are the amplitudes of the resulting spectrum.

2.2 Feature Matching
The feature matching phase involves the use of the Euclidean distance. In mathematics, the Euclidean distance or Euclidean metric is nothing but the ordinary distance between two points that one would measure with a ruler, which can be computed by repeated application of the Pythagorean theorem. By using this formula as distance, Euclidean space becomes a metric space. The Euclidean distance between two feature templates is a measurement of how similar the two user templates are; therefore we chose this method of comparison.
3 Implementation
Both the training and the recognition system are the same up until the MFCC coefficients are found: the training phase stores the coefficients, while the recognition phase compares the currently recorded coefficients with the stored ones. The steps that are implemented to complete our design are listed below, and the block diagram is shown in figure 3.

3.1 Level detection
When the speaker says a word, the system has to do silence detection and capture only the speech signal. The start of an input speech signal is identified based on a prestored threshold value. Speech is captured when it exceeds the threshold, and is passed on to the framing stage.

3.2 Frame blocking
It is assumed that recorded speech is piecewise stationary, which means the signal is stationary for short periods of time. By taking advantage of this property, we divide the captured signal into a fixed number of overlapping frames (156 samples of overlap) of length 256 samples. The sampling frequency for our system is 8 kHz and the speech is captured for 1 sec, which leaves us with 8192 samples. Each frame consists of 256 samples of the speech signal, and each subsequent frame starts from the 100th sample of the previous frame. This technique is called framing.

3.3 Windowing
After framing, windowing is applied to prevent spectral leakage. A Hamming window with 256 coefficients is used. It is easier to combine this step with the frame blocking step.

3.4 Fast Fourier transform
The FFT converts the time-domain speech signal into the frequency domain to yield a complex signal. Speech is a real signal, but its FFT has both real and imaginary components. We apply a 256-point radix-2 FFT to each frame, since the frame length is 256 samples; the total number of stages in the FFT is 8. The FFT algorithm in Rulph Chassaing's book [1] is used in our implementation.

3.5 Power spectrum calculation
The power in the frequency domain is calculated by summing the squares of the real and imaginary components of the signal. The second half of the samples in each frame is ignored, since they are symmetric to the first half (since the speech signal is real and has a linear phase).
Figure 3: Our Speech Recognition System
3.6 Mel-frequency wrapping
Triangular filters are designed using the Mel-frequency scale, with a bank of filters to approximate the human ear. Twenty filters are chosen, uniformly spaced in the Mel-frequency scale between 0 and 4 kHz. For a given frequency f, the mel of the frequency is given by

    B(f) = 1125 * ln(1 + f/700) mels    (1)

The frequency edge of each filter is computed by substituting the corresponding mel. The width (resolution) of each filter is given by:

    dphi = (phi_max - phi_min) / (M + 1)    (3)

where phi_min is the lowest frequency of the filter bank and phi_max is the highest frequency of the filter bank. The center frequencies on the mel scale are given by phi_c[m] = m * dphi for m in [1, 20], and the center frequencies on the frequency scale are given as

    fc[m] = 700 * (10^(phi_c[m]/2595) - 1)

Once the edge frequencies and the center frequencies of the filters are found, boundary points are computed to determine the transfer function of each filter. The transfer function of the triangular filters is given below:

    H(k, m) = 0                                      if f[k] < fc[m-1]
              (f[k] - fc[m-1]) / (fc[m] - fc[m-1])   if fc[m-1] <= f[k] < fc[m]
              (f[k] - fc[m+1]) / (fc[m] - fc[m+1])   if fc[m] <= f[k] < fc[m+1]    (2)
              0                                      if f[k] >= fc[m+1]

where f[k] is the frequency of the k-th sample, given by k*fs/N, and N is the number of samples in each frame (256 in our case). Once the filter transfer functions are obtained, we can apply this filter bank to the power spectrum to obtain the mel spectrum: the Mel-frequency spectrum is computed by multiplying the signal spectrum with the set of triangular filters. This step is basically a frequency-warping operation where we change the frequency of the signal based on the mel scale.
Applying the filter bank to the power spectrum is elaborated in the equation below:

    Mel_spectrum[m] = sum_{k=0}^{N-1} Power_spectrum[k] * H[k, m]    (4)

3.7 Log-energy spectrum
Once the mel spectrum is obtained, we take the log-spectrum of the resulting signal. The log function is basically an amplitude modulation where the lower frequencies are boosted and the higher frequencies are maintained almost constant. This is given by:

    Log_energy_spectrum[m] = ln(Mel_spectrum[m])

3.8 Mel-frequency cepstral coefficients
The log mel spectrum is converted back to time. The discrete cosine transform (DCT) of the log mel spectrum yields the MFCCs, which will be implemented on the DSP board. We take the DCT since the power spectrum and the log mel spectrum are real signals.

3.9 Comparison in the feature matching phase
Once we have the MFCCs, they characterize the particular speaker and word, and are stored during the training phase. During the recognition or feature matching phase, the coefficients are again determined for the uttered word, and recognition is carried out by analyzing the Euclidean distance with respect to the stored coefficients and defining an appropriate threshold, calibrated to increase the word recognition rate.

4 Implementation in MATLAB
An initial feasible algorithm was implemented in MATLAB to emulate the different steps for speech recognition. Observing the various plots, we came to the conclusion that the recorded speech signal can be of varying length (time). Therefore, instead of time-warping the speech signal into a standard time domain, we considered that the speech is to be spoken for a fixed time duration on the DSP board. The maximal duration was considered as 100 frames (no. of samples = 100*256*sampling frequency) after repeated trials by different speakers, to encompass all speech parameters. We normalized the recorded speech signal and implemented the MFCC algorithm. The steps that encompass our MATLAB implementation are shown below in figures 4 and 5.
Kaoushik Raj Ramamoorthy, Manivannan Ethiraj, Harshavardhan Kittur, Mohan Raj Gopal

Figure 4: Recorded signal and signal after silence detection
Figure 5: Mel spectrum and MFCC for all frames
5 Implementation in DSP Board
The DSP board has limited on-chip memory (192K internal RAM), and we are required to generate the opcode to fit into this memory along with the stack and heap space. This poses a difficult situation for us, hence we used only a minimal set of variables, both global and local. Further, we made sure that the sequential steps in the algorithm operated on a pointer to each variable instead of creating copies of it. Important constants were stored in the program memory (as #define preprocessor directives) and other variables were instructed to be stored in the heap or stack (via #pragma preprocessor directives). Our MATLAB implementation helped us a lot to complete the final implementation on the DSK board.

6 Tests and Results
In the speech training phase, the training vectors (time-averaged MFCC coefficients) are obtained for the different words: Cat, Dog, Elephant, Hippopotamus, Mouse and Tiger. These training vectors are stored in a header file to be compared with the test vectors. In the speech recognition phase, the training vectors are compared with the test vector using the Euclidean distance method, and the identified word is displayed using a normal printf statement in Code Composer Studio. The experimental setup was a controlled environment where the noise was minimal and its effects could be disregarded. There is a 90% match for certain speakers and less than 50% for some speakers; the words 'Cat' and 'Dog' have a higher recognition rate than the other four words.

7 Conclusion
We have successfully implemented an MFCC system for extracting a feature from voices, and we were able to identify a word uttered by different speakers using the extracted feature. It is also possible to easily add other words to our training system. Although the results obtained were not as expected, the amount of knowledge obtained during this project is exceptional. We learned the techniques of implementing a speech recognition system, using MFCC in particular, and learned to use Code Composer Studio and DSP/BIOS. This project enhanced our experience of working in MATLAB and C, we learned to use LaTeX in the process of completing our report, and we learned to manage our time with proper planning. Overall, this project was challenging and a good experience for us.
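A minimal sketch of this feature-matching step, assuming the training vectors are already available as time-averaged MFCC vectors (the word list, vector sizes and threshold below are illustrative, not the values used on the board):

```python
def euclidean(a, b):
    # Euclidean distance between two time-averaged MFCC vectors
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def recognize(test_vector, training_vectors, threshold):
    # Compare the test vector against every stored training vector and
    # return the closest word, or None if even the best match exceeds
    # the calibrated threshold.
    best_word, best_dist = None, float("inf")
    for word, vec in training_vectors.items():
        d = euclidean(test_vector, vec)
        if d < best_dist:
            best_word, best_dist = word, d
    return best_word if best_dist <= threshold else None
```

The threshold plays the same role as the calibrated threshold mentioned in the feature matching phase: it rejects utterances that are far from every stored word.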
Part VI
Face Detection, Tracking and Recognition
Asheesh Mishra, Mohammed Ibraheem, Shashikant Patil

1 Abstract
Face detection, tracking and recognition in video is a computationally intensive procedure, because it requires the processing of each and every pixel. There are various parameters depending on which we can efficiently detect a human face within a frame of an incoming video data stream; we used skin detection as our parameter to implement face detection and tracking in video. From the detected face we can extract features like the eyes, lips, nose and ears for further processing in our face recognition algorithm, and thereby achieve better efficiency in detection. We used the exceptional capabilities of the latest DSP board, the TMS320DM6437 from Texas Instruments, USA, which is built around the state-of-the-art DaVinci DSP processor, a fixed-point device that does all its computations on fixed-point numbers. To facilitate the implementation of our project we used Code Composer Studio (CCS). TI provides various built-in functions to get started with video projects, such as edge detection and skin detection; the videopreview.c example file, which comes along with the board, contains various basic functions to process the pixels in the current frame. We made use of those functions to understand how the system actually works and how we can manipulate the pixel values, depending on the desired final output.
2 Introduction
Face recognition is becoming ever more important with the availability of cameras and the need for automated processing of video to serve many purposes. A DSP gives the power to process video and extract meaningful information from it, and the TMS320DM6437 together with Code Composer Studio gives algorithm developers the means to concentrate on developing powerful and efficient algorithms in less time, with many helping utilities. The system that we developed in our project consists of four main stages:
1. Capture an image
2. Face detection
3. Face tracking
4. Face recognition
These stages are shown in the system block diagram in fig. 1. The first stage captures an image frame from the streaming video input to the TI TMS320DM6437 and processes it to detect the face region in the captured image. This part of the image is then sent to the PCA module, which generates the feature vector and compares it with the pre-stored vectors in the database to find the nearest match (the face recognition stage).
3 YCbCr Color Space Model
A color space is simply a format for representing color, luminance, brightness and saturation in one way or another. In YCbCr, Y is the luminance component whereas Cb and Cr are the blue and red chrominance components. Luma (Y) is basically responsible for the brightness of an image and greatly influences how an image is perceived; the chrominance components are responsible for the color composition of the image. Every pixel carries the YCbCr information in the 4:2:2 format, which means that every other value in the video data stream is a Y component and every fourth value is a Cb or a Cr component. A thorough understanding of YCbCr was mandatory in our video processing project; as far as our project was concerned, we typically had to work mostly with the chroma components of an image.

4 Image Filtering for Noise Reduction
Image filters are used to remove undesired image details, for example by smoothing the image so that false edges are not detected during further processing. There are various good image filtering algorithms available for reducing noise, such as Gaussian noise filters and median image filters; one can always consult a good book on image processing for details. We made use of a median-type image filter in our project, since we wanted to suppress undesired pixels whose values are similar to skin color. The filter works on the principle of the 8-neighborhood: if a pixel differs from its neighboring pixels by more than a certain threshold, it is set to the average value of those pixels. The main disadvantage of implementing this was that the process consumes a lot of time and hence slows down the overall performance of the final output. To accommodate this feature we had to reduce the actual frame rate, processing only a few frames to speed up the entire procedure.
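The 8-neighborhood rule described above (a thresholded averaging, rather than a textbook median) could be sketched like this; the function name and the simple border handling are our own illustrative choices, not the project's C code:

```python
def smooth_outliers(img, threshold):
    # img: 2-D list of pixel values. A pixel that differs from the mean of
    # its 8-neighborhood by more than `threshold` is replaced by that mean;
    # border pixels are left untouched for simplicity.
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            neigh = [img[y + dy][x + dx]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                     if (dy, dx) != (0, 0)]
            avg = sum(neigh) / 8.0
            if abs(img[y][x] - avg) > threshold:
                out[y][x] = avg
    return out
```

Writing into a copy (`out`) rather than in place keeps each decision based on the original neighborhood values.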
5 Edge Detection
The simplest approach after detecting the face is to extract some features from it, so that we can match them against the next face to be recognized. The reason for using edge detection was mainly to extract features such as the eyes, nose and mouth from the detected face. Since we implemented Principal Component Analysis (PCA) for the recognition part, this step may not be of much use in that sense, but it actually helped the system become more robust in recognizing only faces, rather than recognizing other parts of the human body as a face. Detecting edges was, we thought, the simplest approach towards feature extraction: basically, we compare each pixel with its neighboring pixels against some threshold value. The threshold value depends on many factors, so it has to be adjusted to the ideal setting for successful differentiation and edge detection. Different algorithms for edge detection are listed below:
• Canny edge detection
• Thresholding and linking
• Edge thinning
• Phase congruency based edge detection
• Other first-order methods
When implementing, we found that the thresholding and linking method could be applied successfully both in simulation and on the real-time TMS320DM6437 system.

6 Face Detection and Tracking
Among all frames, the important task is to detect the face for further processing; the face needs to be separated from the rest of the background. There are many algorithms available for this, such as:
• Binary pattern classification
• Window-sliding technique using a background pattern
• Eye-blinking pattern detection
• Skin color to find face segments, using a static background and lighting condition
• Appearance, face and movement detection
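A minimal sketch of such a thresholded comparison against neighbouring pixels (an illustration only, not the project's C code; restricting the comparison to the four direct neighbours is an assumption):

```python
def threshold_edges(img, threshold):
    # Mark a pixel as an edge when it differs from any 4-neighbour by more
    # than `threshold`; returns a binary map of the same size, with border
    # pixels left as non-edges for simplicity.
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            diffs = (abs(img[y][x] - img[y][x - 1]),
                     abs(img[y][x] - img[y][x + 1]),
                     abs(img[y][x] - img[y - 1][x]),
                     abs(img[y][x] - img[y + 1][x]))
            edges[y][x] = 1 if max(diffs) > threshold else 0
    return edges
```

The linking step would then join adjacent marked pixels into continuous contours; it is omitted here.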
While studying the methods, we found the implementation of Ref [1], skin color to detect face segments using a static background and appropriate lighting condition in a lab environment, most suitable and successful. As discussed above, when we are going to differentiate the face from the background, the chroma components become much more important. Most skin color falls within specific ranges of the chroma red component; here, the skin color falls in the range 0x8A to 0x8C of chroma red. To detect the face in the video, we also first set a specific limit on chroma blue. We then set all the pixels falling in this range to a specific marker color: the luminance (Y) component is set to a particular value that differentiates it from skin color (for example, setting it to 0xFF gives comparable results), and the chroma components are set to the same kind of value, so that in the processed frame the image consists of only the face. Since we are going to manipulate the pixels, we first had to store the current frame in a temporary array; to display the result on the monitor, this stored and modified frame then has to be written back to the write cache buffer. To make the entire procedure more robust, we applied the median filter beforehand to remove the undesired noise coming from reflective surfaces. The detection itself is done by scanning the entire frame from the top left position until the end of the frame is reached. It has given the expected results for detecting the face in the entire frame; some images after processing this function are shown below.

Fig. 3: After processing Luma (Y) and Chroma (Cb, Cr) components

During the scanning we searched for continuous pixels holding the same values
as of skin color. When the count of such pixels satisfies a certain threshold, a flag is raised to indicate that a face has been detected in the frame, and its position is saved in a pointer register. Then comes the tracking part, which requires the detected face to be tracked to its new position in each frame. On the basis of the status of the raised flag, a box is placed at the position captured in the face detection step; the box is drawn by setting the values of all the desired pixels to black, using the value in the pointer register.

Fig. 4: After detection and VGA display of processed frame (Exp. 1)
Fig. 5: After detection and VGA display of processed frame (Exp. 2)
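The skin-range scan and box marking described above can be sketched as follows (a simplified per-pixel Cr plane is assumed instead of the interleaved 4:2:2 buffer, and the function names are hypothetical):

```python
SKIN_CR_MIN, SKIN_CR_MAX = 0x8A, 0x8C   # Cr range quoted in the text

def find_skin_box(cr_plane):
    # Scan the frame top-left to bottom-right and return the bounding box
    # (x0, y0, x1, y1) of all skin-coloured pixels, or None if none found.
    box = None
    for y, row in enumerate(cr_plane):
        for x, cr in enumerate(row):
            if SKIN_CR_MIN <= cr <= SKIN_CR_MAX:
                if box is None:
                    box = [x, y, x, y]
                else:
                    box = [min(box[0], x), min(box[1], y),
                           max(box[2], x), max(box[3], y)]
    return tuple(box) if box else None

def draw_box(y_plane, box, value=0x00):
    # Mark the detected position by painting the box outline black.
    x0, y0, x1, y1 = box
    for x in range(x0, x1 + 1):
        y_plane[y0][x] = value
        y_plane[y1][x] = value
    for y in range(y0, y1 + 1):
        y_plane[y][x0] = value
        y_plane[y][x1] = value
```

A real implementation would also apply the run-length threshold before raising the detection flag; that check is omitted here.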
7 Face Recognition
Common algorithms for face recognition are:
1. Principal Component Analysis (PCA)
2. Independent Component Analysis (ICA)
3. Support Vector Machine (SVM)
4. Hidden Markov Models (HMM)
5. Boosting and Ensemble
Among these algorithms, we found the PCA-based eigenface algorithm [9], [14] the most interesting, and successful in the simulated environment using Matlab. Also (Ref [11]), PCA has big advantages for the implementation, since it reduces the dimensionality of the images that need to be stored in the database, and that of course helps with the memory resources consumed in the real-time system, given the hardware limitations [13]. We apply the algorithm in three stages, described as follows.

1. Creating the database: Before applying PCA, we have to create the training database that contains the faces. First we reformat each image from a two-dimensional image into a single image vector, by concatenating each row (or column) into one long vector. Then we combine the image vectors into one matrix called the trained matrix (T-matrix).

2. Generating the eigenfaces: With the T-matrix as input to this stage, we calculate the following matrices, which become the input to the recognition stage:
1. M-matrix: mean values of the T-matrix (training database).
2. A-matrix: centered images, generated by subtracting the M-matrix from the T-matrix.
3. Eigenfaces: the so-called eigenvectors; this is the feature matrix containing the face features.
We first calculate the covariance matrix by multiplying the A-matrix by its transpose and then find the eigenvector matrix, modifying it by sorting the eigenvectors and removing those with negative values. Finally, by multiplying the A-matrix by the modified eigenvector matrix we get the eigenfaces matrix.

3. Face recognition process: In this process we receive the three outputs from the previous stage as inputs, in addition to the input picture whose face we want to recognize. First we project the centered images into the face space by multiplying each column of the A-matrix, representing the corresponding image, by the eigenfaces; this gives us the projected-images matrix. After that we project the input image using the same concept (center the image by subtracting the mean of the T-matrix and multiply it by the eigenfaces). Having the projected image set and the projected input image, we calculate the Euclidean distance between the projected input image and each projected image in the set.
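A sketch of the projection and matching stage, assuming the mean vector and the eigenfaces have already been computed offline (as we did in Matlab); the eigendecomposition itself is omitted, and the function names are our own:

```python
def project(image, mean, eigenfaces):
    # Centre the image by subtracting the training mean, then take the dot
    # product with each eigenface (one row of `eigenfaces` per face).
    centred = [p - m for p, m in zip(image, mean)]
    return [sum(c * e for c, e in zip(centred, face)) for face in eigenfaces]

def nearest_face(test_image, train_images, mean, eigenfaces):
    # Project the database and the test image into face space and return
    # the index of the training image with minimum Euclidean distance.
    projections = [project(img, mean, eigenfaces) for img in train_images]
    target = project(test_image, mean, eigenfaces)
    dists = [sum((a - b) ** 2 for a, b in zip(p, target)) for p in projections]
    return dists.index(min(dists))
```

Working with the low-dimensional projections rather than the raw image vectors is exactly where PCA saves memory on the board.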
The test image should have the minimum distance to its corresponding image in the database. The flowchart (Fig. 6) describes the algorithm in detail.

Fig. 6: Flowchart of the PCA algorithm for face recognition

8 Problems faced
We faced several problems when implementing the reference model on the TMS320DM6437. The limited memory available on board was one of the major bottlenecks: it has only 192K of RAM available, so there is not much data we can store on it. Moreover, video data manipulation is very computation-exhaustive, and the slow response of the kit during processing of image data was also a problem. We tried to overcome this by processing only a smaller number of frames, which improves the performance a bit but actually degrades the overall quality of the output. In a highly noisy environment (for example, an improper background) the performance degrades marginally, but the system was still able to detect most of the time.

The other problem we faced was in the implementation of the recognition part. We initially thought of implementing every part of PCA on the board, including the training set, but the output results were not as expected. The TMS320DM6437 is a fixed-point processor, which is a problem since most of the calculations in PCA use division, square roots and multiplications; so we have to use the scaling technique (scaling up) to convert the data to fixed point and restore it after the calculation by scaling down, using shift operations, and for the square root calculation we use the Babylonian method [13]. So, for the time being, we separated the recognition part from the project: we implemented the calculation part of PCA (finding the eigenfaces and the training database, for at least three distinct persons) in Matlab itself, and the recognition part as a hybrid between the board and Matlab, comparing the real-time data directly with these eigenfaces. With this approach the results were found to be as expected, but due to lack of time we could do very little to optimize the recognition part.

9 Conclusion and Future work
We were able to finish the face detection and tracking part of our project successfully. Although we were able to implement most of the recognition phase on the kit, the results there were not as expected. The project was much more complex to implement than we initially thought, but the kind of exposure we got during the implementation phase was very satisfying. The project also honed our skills in embedded C and Code Composer Studio.

References
[1] K. Sandeep, A. N. Rajagopalan, "Human Face Detection in Cluttered Color Images using Skin Color and Edge Information", Indian Conference on Computer Vision, Graphics and Image Processing, Dec. 2002.
[2] Sanjay Kr. Singh, D. S. Chauhan, Mayank Vatsa, Richa Singh, "A Robust Skin Color Based Face Detection Algorithm", Tamkang Journal of Science and Engineering, Vol. 6, No. 4, pp. 227-234, 2003.
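The Babylonian square-root iteration with shift-based scaling can be sketched as follows (the Q8.8 format below is an illustrative choice, not necessarily the scaling actually used on the board):

```python
def babylonian_isqrt(n):
    # Babylonian (Newton) iteration using integers only:
    # x_{k+1} = (x_k + n // x_k) // 2, converging to floor(sqrt(n)).
    if n < 2:
        return n
    x = n
    y = (x + 1) // 2
    while y < x:
        x = y
        y = (x + n // x) // 2
    return x

def fixed_point_sqrt(value_q8):
    # For a Q8.8 fixed-point input, scaling up by another 2^8 with a shift
    # makes the integer square root come out in Q8.8 format again:
    # sqrt(v / 256) * 256 == sqrt(v * 256) == sqrt(v << 8).
    return babylonian_isqrt(value_q8 << 8)
```

The scale-up before the operation and the implicit scale-down afterwards are both plain shifts, which is what makes the approach cheap on a fixed-point DSP.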
[3] Minku Kang et al., "PCA-based Face Recognition in an Embedded Module for Robot Application", ICROS-SICE International Joint Conference, 2009.
[4] William H. Press et al., Numerical Recipes: The Art of Scientific Computing, 3rd edition, Cambridge University Press. ISBN 13: 9780521880688.
[5] http://www.mathworks.com/matlabcentral/fileexchange/17032pcabasedfacerecognitionsystem
[6] http://www.eit.lth.se/fileadmin/eit/courses/eti121/Seminar/lect1_2011.pdf
[7] http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
[8] http://www.csus.edu/indiv/p/pangj/aresearch/video_compression/ref/report_summer09_Shriram%20_face_detection.pdf
[9] http://www.eit.lth.se/fileadmin/eit/courses/eti121/Seminar/lect2_2011.pdf
[10] http://www.facerec.org/algorithms/
[11] http://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors
[12] http://cswww.essex.ac.uk/mv/allfaces/faces94.html
[13] http://en.wikipedia.org/wiki/Face_detection
[14] http://www.eit.lth.se/fileadmin/eit/courses/eti121/Reports/ASP_Reports_2010.pdf
Part VII
Circular Object Detection
Ajosh K Jose, Qazi Omar Farooq, Sherine Thomas, Sreejith P Raghavan

Abstract
This project deals with the implementation of a circular object detection method on the DSP TMS320DM6437 Evaluation Module. The aim is to detect a circular moving object while the background is kept fixed. The first step is to create the reference frame by capturing the first frame sent from the video camera. The successive frames are then subtracted from the reference frame to obtain the moving object, and further processing is done only on the area of the moving object. In the second step, the Canny edge detection algorithm is employed to extract the edges of the moving object. Finally, the object is checked for circular shape using a modified Circular Hough Transform; if a circular object is detected in the frame, it is marked in the video.
1 Introduction
Real-time image processing applications are now widely used due to the very fast advancement in technology. Detecting moving objects is an important task in video surveillance: if the shape of a moving object can be detected automatically, the task of manually monitoring safety can be reduced. Image processing algorithms are also widely used in medical imaging, surveillance systems and digital cinema. Due to the introduction of portable devices with stringent resource limitations, the image processing algorithms used in such systems need to be chosen wisely; the rapid change in video and image processing standards also introduces additional complexity and the need for higher throughput.
Our project implementation is to detect circular moving objects in real time, using the TMS320DM6437 evaluation module. Here we study the methods and challenges involved in detecting an object correctly and familiarize ourselves with the various algorithms used in image processing. After a successful implementation of the circular object detection system, the algorithm can be further improved to detect objects of more complex shape.

2 Theory
Figure 1 shows the steps involved in our circular object detection implementation. The circular object detection is implemented in the following steps:
• Moving object detection
• Edge detection
• Circular object detection
In the first step, moving objects in the frame are detected by the background subtraction method. The background frame is the first captured frame, which is stored in memory; each new frame is then subtracted from the background frame to detect moving objects. This reduces the area of interest, so the processing time required for the next steps is reduced very much. To further reduce the processing time, every 15th frame is processed instead of consecutive frames. In the next step, an edge detection algorithm is applied on that particular area to detect the edges of the object. In the final step, the circular object detection algorithm is applied on the edge-detected input to find the circular objects in the region.
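The background-subtraction step above can be sketched as follows (plain Python for illustration; the actual implementation was in C on the board, and the difference threshold is an assumed parameter):

```python
def moving_region(background, frame, threshold):
    # Subtract the stored background frame from the current frame and
    # return the bounding box (x0, y0, x1, y1) of pixels whose absolute
    # difference exceeds `threshold`, or None when nothing moved.
    box = None
    for y, (brow, frow) in enumerate(zip(background, frame)):
        for x, (b, f) in enumerate(zip(brow, frow)):
            if abs(f - b) > threshold:
                if box is None:
                    box = [x, y, x, y]
                else:
                    box = [min(box[0], x), min(box[1], y),
                           max(box[2], x), max(box[3], y)]
    return tuple(box) if box else None
```

Only this bounding box is handed to the edge detection step, and in the real system the function would be invoked only on every 15th frame.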
Figure 1: Block diagram of the steps

2.1 Edge Detection
Edge detection is an important step in image processing. Its main objective is to reduce the amount of data to be processed while keeping the structural content intact. There are several algorithms proposed by different people for edge detection; the main ones are:
• Prewitt method
• Canny method
• Sobel method
• Roberts method
• Laplacian of Gaussian method
• Zero-cross method
Please refer to [1] for more details about edge detection algorithms. The processing time and output of these algorithms vary very much, so we did an initial study using Matlab to find the most suitable algorithm; the results of this study are shown in figure 2. From the comparison it was clear that the Canny edge detection algorithm would be most suitable, because of the fine details available in its output, which are needed for tracking a moving object. The actual problem with applying the Canny edge detection algorithm to the complete frame, however, is the large processing time required. To reduce the processing time, we added the method of background subtraction: we subtract the static background from the current frame to detect the actual area of interest, and apply the Canny edge detection algorithm only on that particular area, which reduces the processing time considerably. The input to the Canny edge detection algorithm is the gray-scaled image. The Canny edge detection algorithm consists of five steps. They are
• Smoothing
• Finding gradients
• Non-maximum suppression
• Double thresholding
• Edge tracking by hysteresis

Figure 2: Edge detection algorithm outputs

A brief idea about the different steps of the Canny edge detection algorithm is given below; for a detailed description, please refer to [2] or [3].

2.1.1 Smoothing
Smoothing is done to reduce the noise level in the image; this step helps remove unwanted edges detected due to the noise present in the image. Usually a Gaussian filter is employed for this step: the image is smoothened by applying a Gaussian filter with a standard deviation of 1.4. The Gaussian matrix is shown in figure 3. Smoothing takes long processing time due to the matrix multiplication involved. In this project we skipped this step, since we are processing only the moving object and the effect on the edge-detected image was found to be small.
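Although we skipped this step on the board, smoothing can be sketched as a small convolution (illustrative only; the 3x3 kernel below is a common coarse Gaussian approximation, not the larger kernel a standard deviation of 1.4 would normally require):

```python
def convolve(img, kernel):
    # 2-D convolution with an odd-sized square kernel; border pixels are
    # copied through unchanged, as a simple way to keep the frame size.
    h, w = len(img), len(img[0])
    k = len(kernel) // 2
    out = [row[:] for row in img]
    for y in range(k, h - k):
        for x in range(k, w - k):
            out[y][x] = sum(kernel[j + k][i + k] * img[y + j][x + i]
                            for j in range(-k, k + 1)
                            for i in range(-k, k + 1))
    return out

# A 3x3 Gaussian-like kernel whose weights sum to 1, so flat regions
# keep their brightness.
GAUSS_3x3 = [[1 / 16, 2 / 16, 1 / 16],
             [2 / 16, 4 / 16, 2 / 16],
             [1 / 16, 2 / 16, 1 / 16]]
```

Because the weights sum to one, a uniform region passes through unchanged while isolated noise spikes are averaged down.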
Figure 3: Gaussian matrix
Figure 4: Sobel matrices

2.1.2 Finding gradients
By finding the gradients, the edges where the gray-scale intensity varies most are determined. This is done by applying the Sobel matrices to each pixel in the image; the matrices, consisting of a Gx and a Gy matrix, are shown in figure 4. The gray-scale gradient sum is calculated by the equation

sum = |Gx · pixelvalue| + |Gy · pixelvalue|    (1)

and the gray-scale angle is calculated by the equation

angle = arctan( (Gy · pixelvalue) / (Gx · pixelvalue) )    (2)

The output after applying the Sobel matrices is shown in figure 5; it is clearly visible that all the edges in the image are highlighted. The angle obtained is rounded to the nearest 45 degrees, with which the gradient direction of every pixel is determined.

2.1.3 Non-maximum suppression
For suppressing the non-maxima, the angle calculated in the previous step is used. It will be either
0, 45, 90 or 135 degrees. The current pixel's strength is then compared with the pixels in the positive and negative gradient directions; if the current pixel has more strength than both, it is chosen and the other values are suppressed. Thus in this step all the gradient edges that are local maxima are selected, and the other pixels are cancelled. Figure 6 shows the output image after non-maximum suppression.

Figure 5: Image after finding the gradients

2.1.4 Double thresholding
With double thresholding, the remaining edge pixels are classified into strong and weak edges. Strong edges are retained and will be part of the final edges; weak edges are checked further in the next step.

2.1.5 Edge tracking by hysteresis
For edge tracking by hysteresis, each weak edge is checked for a connection with a strong pixel in its neighbourhood. If it is connected to any of the strong pixels, it is considered part of the edge and is retained. Figure 7 shows the output image after applying the hysteresis.
Figure 6: Image after non-maximum suppression
Figure 7: Image after hysteresis
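The gradient computation of Section 2.1.2 and the 45-degree rounding can be sketched as follows (an illustrative Python version, not the project's C code):

```python
import math

GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # Sobel horizontal kernel
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # Sobel vertical kernel

def sobel(img, y, x):
    # Gradient sum |Gx * A| + |Gy * A| as in equation (1), plus the
    # direction from equation (2) rounded to the nearest 45 degrees,
    # ready for the non-maximum suppression step.
    gx = sum(GX[j][i] * img[y + j - 1][x + i - 1]
             for j in range(3) for i in range(3))
    gy = sum(GY[j][i] * img[y + j - 1][x + i - 1]
             for j in range(3) for i in range(3))
    magnitude = abs(gx) + abs(gy)
    angle = math.degrees(math.atan2(gy, gx)) % 180.0
    direction = round(angle / 45.0) * 45 % 180
    return magnitude, direction
```

For a vertical step edge, the gradient points horizontally, so the rounded direction comes out as 0 degrees.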
Figure 8: Circle detection method

2.2 Circular Object Detection
Circular objects are detected by applying the circular Hough transform (CHT) algorithm to the edge-detected frame. This algorithm is based on the equation

(x1 − x0)^2 + (y1 − y0)^2 = r^2    (3)

All pixels are searched for the possibility of being part of a circle with a radius within a particular limit. Figure 8 shows the method used for determining the circle: six pixels are searched, and if all six pixels have edge information at the particular radius, the object is considered a circle. For a detailed description of the CHT, please refer to [4], [5] and [6].

3 Implementation
The implementation of the project was done in two steps. In the first step the algorithm was tried in Matlab to determine its efficiency; in the second step, the Matlab implementation was converted into a C implementation. Code Composer Studio was used for compiling and downloading the code. The hardware tools used for testing the project included:
• Video camera
• DM6437 evaluation board
• Television
The DM6437 platform offers an interface in the framework from which we can access the input video stream frame by frame. The pixels are in
Allborg University [4] Mohamed Rizon. So the frame size is 720x576. Nov. American Journal of Applied Sciences. Sreejith P Raghavan79 YCbCr format. Diﬀerent algorithms used in signal processing were familiarized in this course. For this project we took into consideration a PAL system. Due to the lack of enough processing time. where Y is the luma component and Cb& Cr are the chroma components with the ratio 4:2:2. International Journal of Image Processing (IJIP). References [1] Raman Maini. Also if the processing power permits. 2005. PAMI8(6):679698. The processing consists of reading the frame buﬀer and updating the frame data and writing it back into the buﬀer. Also edge detection was implemented only in the selected area where a moving object is detected. We felt the processing power of DM6437 Evaluation kit is not enough for handling complex algorithms used in video processing. Volume (3): Issue (1) [2] John Canny. Himanshu Aggarwal. Advanced image processing. It was a nice experience working with this project which introduced us to the world of programming DSP processors. Study and Comparison of Various Image Edge Detection Techniques. Object Detection using Circular Hough Transform.Ajosh K Jose. Labortary of computer vision and media technology. So as future work we are planning to optimize our current implementation and add more reliable algorithms for circular object detection which can detect circular objects with diﬀerent radius. Working with DM6437 and code composer studio was a nice experience. we were not able to implement more reliable algorithms for circular object detection in the DM6437 processor. Pattern Analysis and Machine Intelligence. Dr. . we would like to implement more complex algorithms like human hand detection and tracking the movement of hand. 4 Conclusion & Future Work Circular object detection was successfully implemented and tested. Qazi Omar Farooq. 1986. IEEE Transactions on. A computational approach to edge detection. 
[3] Canny Edge Detection implementation tutorial. The two labs which were done as part of this course were helpful in familiarizing the tool and the DSP kit. Sherine Thomas.
Int. [6] Mohamed Roushdy. April. 2008. Ignacy Duleba. 2007. ETI121. [7] Project Report 2010. Circular Object Detection Using A Modiﬁed Hough Transform. J. Comput. Detecting Coins with Diﬀerent Radii based on Hough Transform in Noisy and Deformed Image. Sci. Algorithm in Signal Processing Course. . GVIP Journal.. Issue 1. Math. Appl. Volume 7.80 Circular Object Detection [5] Marcin Smereka.