Thesis Project Book Format Khademul - Sir
May 2017
Submitted by
ID: xxxxxxxxxx
Name:
Department of Computer Science and Engineering
Varendra University
Rajshahi, Bangladesh
Supervised by
Md. Khademul Islam Molla, PhD
Professor
Department of Computer Science and Engineering
Varendra University
Rajshahi, Bangladesh
Abstract
Audio source separation is the problem of automatically separating the audio sources
present in an acoustic scene captured by a set of microphones placed at different positions. The
problem resembles the task a human solves in a cocktail party situation, where, using only
two sensors (the ears), the brain can focus on a specific source of interest while suppressing all
other sources present. In this thesis, we examine the audio source separation problem using a
range of approaches to segregate the component sources from monophonic and stereo recordings. In
particular, we consider the underdetermined condition (i.e., the number of sensors is less than the
number of sources), which is a challenging topic in the field of blind source
separation ...
Acknowledgements
Completing a Ph.D. is usually a journey along a long and winding road, where one has to
tame oneself more than the actual phenomena under study. Luckily, I was not alone on this trip.
My sincerest thanks are due to all those who have helped me to get this far.
Primarily, I would like to thank my supervisor, Professor Keikichi Hirose, for his inspiration,
support and very fruitful collaboration, not to mention for coping with the mountains of
paperwork I must have caused. Without his generous help it would have been impossible to
finish this work. I also thank Dr. Nobuaki Minematsu for his valuable suggestions during this
journey.
Thanks to the Japanese Government (Monbukagakusho) scholarship for funding my study and
living expenses in Tokyo during my course. Thanks also to the 21st Century Center of
Excellence (COE) project of the University of Tokyo for the financial support to attend and
present my work at several conferences. Thanks to the Nippon Telegraph and Telephone (NTT)
communication research laboratory for permission to use the anechoic room and for
technical support with audio recording. I am grateful to the people of Bangladesh, who
supported my graduate study at a public university. Thanks also to the University of Rajshahi,
Bangladesh, for granting me study leave to pursue this degree.
Thanks to everyone else in the laboratory, who rescued me from frustration during my
early days in Japan and helped me continuously despite my limited ability in the Japanese
language. They have made the lab a fun place to live for the last three and a half years.
I am deeply grateful to my parents for bringing me into this world and to my siblings for
keeping a warm family where I grew up. My wife Sheuly has always inspired me and deserves
more than mere acknowledgement.
Finally, I wish to thank all of my friends and well-wishers for their support over the time it
has taken to get this done.
Contents
Chapter 1
Introduction
where f_j and z_j are column vectors whose lengths equal the number of frequency bins and the
number of time frames of X, respectively. Each f_j(i) corresponds to a spectral basis vector
derived from X. The group of such bases, denoted by F_i, represents the overall spectrum of the
ith source.
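The roles of the two vectors can be illustrated with a small sketch. The equation that precedes this passage is not shown here, so the model below (X written as a sum of outer products of a spectral basis f_j and a time-varying gain z_j) is an assumption, and all function and variable names are hypothetical.

```python
# Illustrative sketch only: the sum-of-outer-products model is an assumption,
# since the defining equation is not part of this excerpt.

def outer(f, z):
    """Rank-one spectrogram: rows index frequency bins, columns index time frames."""
    return [[fi * zt for zt in z] for fi in f]

def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def reconstruct(bases, gains):
    """Sum the rank-one terms built from each (f_j, z_j) pair."""
    n_bins, n_frames = len(bases[0]), len(gains[0])
    total = [[0.0] * n_frames for _ in range(n_bins)]
    for f, z in zip(bases, gains):
        total = add(total, outer(f, z))
    return total

# Toy data: 3 frequency bins, 4 time frames, 2 components.
f1, z1 = [1.0, 0.0, 0.5], [1.0, 1.0, 0.0, 0.0]
f2, z2 = [0.0, 2.0, 0.0], [0.0, 0.0, 1.0, 1.0]

X = reconstruct([f1, f2], [z1, z2])

# Grouping bases by source (the F_i of the text); in this toy case each
# hypothetical source owns exactly one basis vector.
F = {1: [f1], 2: [f2]}
```

In this toy case each f_j has one entry per frequency bin and each z_j has one entry per time frame, which matches the vector lengths stated above.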
References
[1] Anemuller, J. and Gramss, T.: On-line blind separation of moving sound
sources. Proc. of Int. Conf. on Independent Component Analysis and Blind
Source Separation (ICA’99), pp: 331-334, 1999.
[2] Asano, F., Goto, M., Itou, K. and Asoh, H.: Real-time sound source localization
and separation system and its application to automatic speech recognition. Proc.
of Eurospeech01, pp: 1013-1016, 2001.
[3] Allen, J. B.: How do humans process and recognize speech? IEEE Transactions
on Speech and Audio Processing, 2(4), pp: 567-577, 1994.
[4] Brown, G. J. and Cooke, M.: Computational auditory scene analysis. Computer
Speech and Language, Vol. 8(4), pp: 297-336, 1994.
[5] Bofill, P.: Underdetermined blind separation of delayed sound sources in the
frequency domain. Neurocomputing, Vol. 55, No. 3/4, pp: 627-641, 2003.
[6] Boll, S. F.: Suppression of acoustic noise in speech using spectral subtraction.
IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. 27, pp: 113-120,
1979.
[7] Bregman, A. S.: Auditory scene analysis. MIT Press, Cambridge, 1990.
[8] Bregman, A. S.: Auditory Scene Analysis: The perceptual organization of sound.
MIT Press, 2nd edition, 1999.
[9] Breebaart, J., Van de Par, S. and Kohlrausch, A.: Binaural processing model based on
contralateral inhibition. I. Model structure. Journal of the Acoustical Society of
America, Vol. 110, pp: 1074-1088, 2001.
[10] Baeck, M. and Zolzer, U.: Real-time implementation of source separation
algorithm. Proc. of DAFx-03, pp: 29-34, 8-11 Sep, 2003.