6. TEAM PROFILE
Team Leaders
1. Hakan Erdogan, Assistant Professor, Sabanci University
Dr. Erdogan will oversee the development of the software and contribute to it as necessary.
Bio: Hakan Erdogan is an assistant professor at Sabanci University in Istanbul, Turkey. He received
his B.S. degree in Electrical Engineering and Mathematics in 1993 from METU, Ankara and his
M.S. and Ph.D. degrees in Electrical Engineering: Systems from the University of Michigan, Ann
Arbor in 1995 and 1999, respectively. His Ph.D. work developed algorithms to speed up statistical
image reconstruction methods for PET transmission and emission scans and resulted in
three highly cited journal papers. He was with the Human Language Technologies group
at IBM T.J. Watson Research Center, NY between 1999 and 2002 where he worked on various
internally funded and DARPA funded projects. At IBM, he focused on the following problems of
speech recognition: acoustic modeling, language modeling and speech translation. He has been with
Sabanci University since 2002. His research interests are in developing and applying probabilistic
methods and algorithms for multimedia information extraction. Specifically, he is interested in
speech recognition, audio-visual speech recognition and multiple biometrics systems. As of
December 2009, Dr. Erdogan has published 10 journal papers, 2 book chapters and 40+ conference
papers. He holds 3 patents. His work has been cited more than 200 times in the Science Citation
Index. He served as co-organizer of the "Speech to speech translation workshop" at ACL 02 and as technical
co-chair of the IEEE SIU 2006 conference and the DSP-in-cars 2007 workshop. He has been a program
committee member for LREC 2005-2010, ISCIS 2006-2010, SIU 2006-2010 and IEEE ICPS 2007.
He is the finance co-chair of ICPR 2010. He has been a member of IEEE since 1992 and of
ISCA since 2003.
2. Saygin Topkaya, Ph.D. student, Sabanci University
Saygin Topkaya is experienced in software development and will serve as the main project lead.
He is already working on the project, which is also his Ph.D. topic.
Research Interests: Computer Vision, Speech Recognition, Object Tracking
Ph.D. (2008 - Cont.) Sabanci University - Electronics Engineering
Fellow Ph.D. Student in TUBITAK Project: Novel Approaches in Audio Visual Speech Recognition
M.Sc. (2005 - 2008) Yildiz Technical University - Mathematical Engineering
Master's Thesis: Face Recognition in Videos
B.Sc. (1998 - 2004) Yildiz Technical University - Mathematical Engineering
Researchers
1. Berkay Yilmaz, Ph.D. student, Sabanci University
Berkay Yilmaz will work on implementing the visual acquisition and feature extraction part of the
project.
Research Interests: 2D/3D computer vision, image processing, machine learning
Ph.D. (2009 - Cont.) Sabanci University - Computer Science
Fellow Ph.D. Student in TUBITAK Project: Novel Approaches in Audio Visual Speech Recognition
M.Sc. (2007 - 2008) Sabanci University - Mechatronics Engineering
Master's Thesis: Statistical Facial Feature Extraction and Lip Segmentation
B.Sc. (2003 - 2007) Bahcesehir University - Computer Engineering
2. Mehmet Umut Sen, M.Sc. student, Sabanci University
Mehmet Umut Sen will help develop the recognition system in general.
Research Interests: Statistical Signal Processing, Pattern Recognition, Speech and Speaker Recognition
M.Sc. (2009 - Cont.) Sabanci University - Electronics Engineering
Fellow M.Sc. Student in TUBITAK Project: Novel Approaches in Audio Visual Speech Recognition
B.Sc. (2004 - 2009) Sabanci University - Electronics Engineering
3. Murat Saraclar, Professor, Bogazici University
Dr. Murat Saraclar has expressed interest in the project and is willing to contribute as
necessary. We will seek his advice while developing the software.
4. Other interested researchers:
We seek interested researchers to help in all aspects of the project as listed in the “team members”
part of this proposal.