ESCA/SOCRATES Workshop on Method and Tool Innovations for Speech Science Education (MATISSE)
University College, London, UK
April 16-17, 1999

ISCA Archive
http://www.isca-speech.org/archive

Teaching Digital Speech Processing for Telecommunications


Mircea Giurgiu
Department of Communications, Technical University, Cluj-Napoca, Romania
Mircea.Giurgiu@com.utcluj.ro

Abstract
This communication focuses on the educational environment for teaching "Digital Speech Processing" at undergraduate, graduate and PhD levels in the Department of Telecommunications of the Technical University of Cluj-Napoca, Romania. Emphasis is placed on the particular successes and drawbacks encountered while setting up and developing a modern curriculum during the last four years. This educational environment has been set up to cover special issues in speech processing, but from an applied engineering point of view: that of telecommunications. The teaching infrastructure is tailored to local conditions, but it can be adopted as an overall model by other universities. The conclusions present a critical analysis of education in this field in our department, possibilities to improve teaching resources and teaching methods, and proposals to move from "classical" education to an open and continuous one based on electronic information interchange.

1. Introduction
Digital Speech Processing for Telecommunications (DSPT) has developed rapidly over the last decades as an emerging technology in the area of communications, not only through practical applications but also through remarkable progress in the theoretical and scientific fields. Electronic data interchange, mobile communications, multimedia networks and voice-activated systems are just a few examples that demand specific, computationally intensive speech processing algorithms in order to ensure low bit rate transmission, good subjective quality, robustness to channel errors, efficient use of signal bandwidth, or high recognition rates and robustness. These and many other examples, commercial products among them, are now becoming more and more familiar [1][2].

Hence, speech processing as an applied technology in telecommunications engineering education raises specific theoretical and practical issues in the fields of analysis, synthesis, coding and automatic speech recognition (ASR). These issues can be addressed not only with general Digital Signal Processing (DSP) theory, but also by implementing dedicated speech processing algorithms that are directly linked to the nature of the speech production and perception mechanisms and suited to a specific application.

2. General frame for teaching DSPT
From this point of view, the creation of a new subject on DSPT adapted to modern technologies has become a necessity in the Department of Communications of the Technical University of Cluj-Napoca, Romania. This necessity arose after 1990, when a major restructuring process began in higher education as a result of alignment with Western European educational standards.

The analysis of the state of education at that moment highlighted several facts: centralised curricula, an inflexible infrastructure, and a lack of updated literature and equipment, all of them serious drawbacks even where self-motivation and the willingness to change and move things forward played an important role.

University autonomy and educational and scientific connections with EU universities through dedicated programmes (Tempus, COST, PHARE, Socrates, etc.) allowed a restructuring process concerning: setting up new curricula, updating the teaching materials and teaching methodologies, acquiring equipment, developing the communications infrastructure, retraining and specialisation of teaching staff, etc. As a result, a step-by-step strategy was designed in order to close the gap in the areas of DSP and DSPT. This strategy was included in the objectives of the S-JEP 08012/94 Tempus Project "DIDAPRO" (Distributed Data Processing for Telecommunications), coordinated by our department with four Romanian partners and seven EU university partners.

First, a general course on DSP was set up (1993) for students specialising in telecommunications engineering. The course comprised general topics on DSP in the fourth year, second semester, and both speech and image processing applications in the fifth year, first semester, with 3 hours of lectures, 2 hours of laboratory and 1 hour of project work per week. In the meantime, three assistant professors were retrained through Tempus mobilities, the laboratory was equipped with computers, and the teaching materials (the course notes and the laboratory guide) were developed.

Second, based on the experience gained and looking further at the professional demands of the telecommunications field, the shared speech and image processing course in the fifth year was split in two, according to the two newly created specialities. Since then (1995), the DSPT course has been running at undergraduate level as an obligatory discipline.

The above approach is considered a successful one, since students acquire the theoretical DSP, information theory and computer programming skills before
starting speech processing as an applied subject in the fifth and final year of study, when the connections with the subject "Digital Telecommunications Networks" are better assimilated.

Taking advantage of the educational and research infrastructure developed, our department is involved in different joint EU projects aiming to set up and further develop pilot models for teaching and research in telecommunications. In this sense, continuous education is one of our concerns. Furthermore, within the PHARE Project "Open and Distance Learning Education in Central and Eastern European Countries", modern technological facilities have been created in order to support a new MS programme on "Multimedia Technologies". This MS programme has been running since September 1998. One of its subjects is "Multimedia Data Encoding and Compression", which at present covers the encoding of both video and audio streams; in the near future these two topics will be separated into two distinct areas. Thus, the course "Speech Compression for Multimedia Applications" will continue, in a specific manner, the topics studied in the DSPT course.

At PhD level, five people are researching topics in ASR, low bit rate speech coding, text-to-speech synthesis (TTS) and speaker verification.

Considering the variety of topics covered by the subjects mentioned above, taught at undergraduate, graduate and PhD levels, together with the laboratory infrastructure for practical classes, the teaching methods and the strategy for continuous updating, we believe that all of them form an educational environment for teaching DSPT, whose particularities are presented in the following sections.

3. Speech processing for undergraduates
At undergraduate level, for fifth-year students of telecommunications engineering, the obligatory course "Digital Speech Processing for Telecommunications" was introduced into the curriculum in 1995 within the "DIDAPRO" Tempus Project. The course is taught in the first semester and is allocated 42 hours of lectures, 28 hours of practical classes and 14 hours of project development. A dedicated speech processing laboratory has been fully equipped with computing facilities (a LAN of 6 PCs connected to the Internet) using funding from the Tempus Project.

The scientific content has been designed from the telecommunications engineering perspective to cover the major issues in speech processing: analysis, synthesis, coding and automatic recognition, building on the students' previous background in signal processing, information theory and digital communications.

The course starts with an introductory part dealing with the mechanisms of speech production and perception, the acoustic and statistical properties of the speech wave, and an overview of methods, applications and speech technologies [1][3].

Speech analysis covers the applications of speech signal modelling in telecommunications systems. The scientific content is organised into time domain analysis, frequency domain analysis, linear predictive models and homomorphic speech processing. Emphasis is placed on parametric speech representation as a way to extract and characterise speech features that can be stored, transmitted and further processed (coding, synthesis or recognition) [4].

Complementary to speech analysis, speech synthesis techniques are treated in a more general manner, discussing the principles of waveform synthesis, spectrum synthesis, articulatory synthesis and TTS, and outlining their problems and applications.

A major part of the course is dedicated to speech coding technology [5][6]. It deals with the presentation and design of speech coders for different telecommunications standards. Waveform coding (time domain: PCM, DPCM, ADPCM, IMA-ADPCM, delta modulation; frequency domain: subband coding, MPEG-1 and MPEG-2 with their corresponding layers), parametric coding (the LPC10 standard, multiband excitation coding) and hybrid coding based on the analysis-by-synthesis principle (multipulse excited, regular pulse excited and code excited) are discussed both as theoretical principles and as applications. Since mobile communications are growing rapidly, speech compression and encoding in GSM receive special attention. Vector Quantization (VQ) algorithms are also presented, since VQ is a powerful speech compression technique [4].

Multirate speech processing is a recently introduced module that aims to present polyphase decomposition and the power of the Quadrature Mirror Filters (QMF) used in subband coding. The links with the Wavelet Transform are then shown, together with important properties that can be exploited for speech enhancement, fundamental frequency estimation and speech compression using wavelet decomposition [4].

The ASR part gives a flavour of the field through basic material on the problem dimensions, the difficulties and the different approaches to isolated word recognition: Dynamic Time Warping (DTW), Hidden Markov Models (HMM) and Artificial Neural Networks (ANN). More details are reserved for the MS studies [3].

In Romania there is not yet a reference book on speech processing, so, because of the lack of basic literature, most of the scientific material described above was studied and prepared during the Tempus retraining and specialisation visits abroad or during the ELSNET Summer Schools.

The laboratory classes closely follow the course structure, with the aim of practically illustrating and implementing specific speech processing algorithms, for example: endpoint detection, voiced/unvoiced classification, fundamental frequency estimation using different strategies, Linear Predictive Coding (LPC), Fast Fourier Transform (FFT), Line Spectrum Pairs (LSP) analysis,
subband coding, VQ compression, speech enhancement and speech compression using wavelet decomposition, IMA-ADPCM coding, etc. Students also become familiar with PC-based hardware implementations: speech acquisition with plug-in boards, delta modulation with MC3417 Continuously Variable Slope Delta (CVSD) modulator circuits, and digital speech processing with the TMS320C50 signal processor.

Most of the laboratories use a Matlab-based programming environment, and some use computer-aided learning with Java applets or Internet demos. Over the last years, as practical projects for the DSPT discipline or as final diploma projects, dedicated speech processing software environments have been created in the laboratory. For example, "VisualWords" (1995) is a software environment for speech analysis (in the time, frequency and cepstral domains) and automatic recognition of isolated words using dedicated techniques: DTW, VQ and HMM. It has a user-friendly graphical interface that allows easy and interactive selection of speech processing parameters [7][8].

The logistics of the laboratory classes have been developed step by step by involving students in software implementation, in such a manner that many diploma projects later became laboratory platforms. More than 35 diploma projects (proof of the students' interest in the subject) have been supervised in the last four years. These diploma projects focused both on developing the educational environment, as regards computer learning applications, and on research.

4. Speech processing for MS studies
The department offers courses for two MS specialities: "Modern Telecommunications Techniques" (1995) and "Multimedia Technologies" (1998), both of which include subjects on speech processing taught for one semester (28 hours of lectures, 14 hours of laboratory, 14 hours of project work).

For the first of these MS programmes, the course "Speech Analysis and Synthesis" covers general topics in speech processing, focusing on the specific DSP techniques applied to this particular signal. The course is addressed to a heterogeneous audience with different electrical engineering backgrounds, so the topics have been carefully selected to cover basic speech analysis and coding schemes that can easily be implemented during laboratories on a TMS320C50 DSP platform.

The teaching environment for the MS on "Multimedia Technologies" uses facilities created in the regional "Centre for Open and Distance Education", set up within the National Programme for Continuous Education, the Tempus Project and the PHARE Project. For this specific MS, state-of-the-art teaching methodologies (e.g. telematic teaching) are being tried out in our university in order to stimulate self-study and self-learning.

For the first time in the department, the courses are organised and printed in modules and then evaluated in both a formative and a summative sense. To achieve the learning outcomes, key events such as gaining attention, stating the objectives, stimulus, feedback, summarising and concluding remarks have been used in editing the material [9][10]. Markers to gain attention and space for personal notes and answers to the questions are also provided in the written text. Courseware design is carried out entirely in the multimedia studio and the "Tele Europa Nova" studio (a TV channel shared by the private sector and the university) by tutors and technical staff, in order to provide a learning method for the students. This method takes into account: the written text, face-to-face interaction, downloadable course materials, a browsable course structure, dedicated Internet connections between the study group and the tutor, a record of the communication, a discussion list, practical classes using Java applets, etc. [9][10]

The syllabus covers the following topics: multimedia applications that use speech compression, presentation of the current speech compression standards, ADPCM G.722, wideband G.722 speech encoding, MPEG encoding through Layer I, Layer II and Layer III, speech compression using wavelet decomposition, VQ, and speech compression for mobile multimedia (RPE-LTP, CELP and VSELP) [5].

An important aim in such an educational environment is advanced interactivity. For this purpose, a multimedia application is under development. Its main components are the Oracle Web Server, Database Server and Video Server. The functionality is achieved by a collection of PL/SQL procedures and the interaction between the Web Server and the Database Server. The application is divided into three logical modules: course manager, tutors and learners. Every action performed by learners, tutors or the course manager alters only some database tables within the assigned database schema. During development the following requirements have been taken into account: the user can access the information as a hypermedia document; the document has to be personalised according to the application's context; access is controlled through a registration procedure; activity can be monitored; a tutor should be assigned to each study group; and users can communicate on-line and off-line [11].

5. The PhD level
At PhD level there is no official format for teaching speech processing, but monthly scientific seminars take place. They are a way to debate and argue scientific strategies within a group of more than seven specialists. The topics are speech compression, automatic speech recognition and speaker recognition.
Since 1996, two PhD theses in this area have been defended. Another four PhD students are preparing their final dissertations.

The main research contributions have been the implementation of an ASR system for isolated words in
Romanian, developed from beginning to end, and of an experimental platform for speaker verification. Different recognition approaches have been implemented: DTW, HMM, Semicontinuous Hidden Markov Models (SCHMM) and ANN in the form of Multilayer Perceptrons (MLP). Most of this work has been published at national and international conferences in more than 30 scientific papers [8].
The research currently focuses on continuous speech recognition, language modelling, robust ASR, low bit rate speech compression and speech coding for multimedia applications.

6. Conclusions
The paper presented the educational environment for teaching DSPT in terms of curriculum innovations, syllabus, teaching methodology and the strategy adopted to develop it under the particular conditions of the Technical University of Cluj-Napoca, Romania. These specific conditions were created during the restructuring of education within international projects and in response to economic demands.

The key points of the strategy are: setting up new curricula based on retrained or specialised teaching staff, creating the laboratory infrastructure, and a continuous effort to update and develop the teaching materials and methodology in step with the practical needs of telecommunications.

Other telecommunications departments in the country do not yet offer dedicated speech processing courses, only some speech processing background included as an application in a general DSP course. We therefore think that our experience can be shared with other departments and can serve as an overall model for higher education institutions in Romania, since their curricula are based on related principles. The major drawbacks are the lack of updated literature, insufficient computational resources and poor information processing facilities.

The MS on Multimedia Technologies uses specific teaching methods and resources focusing on high-level professional topics that will allow a move from "classical" education to an open and continuous one based on electronic information interchange.

With the experience gained and the infrastructure created, we intend to propose educational projects such as European Masters programmes (e.g. Advanced Speech Coding Techniques for Telecommunications) and to open new ways of inter-university collaboration in both education and research.

Acknowledgements
I would like to thank the Socrates Thematic Network "Speech Communications Sciences", especially Dr. Gerrit Bloothooft from Utrecht University for his important support in the publication of this material; the people from Granada University, where I learned many secrets of speech technology; Prof. Gavril Toderean, my PhD supervisor; and those of my colleagues from the Department of Telecommunications who understood the need for a speech processing subject in the curricula and promoted it at different education levels.

References
[1] Rabiner, L. and Juang, B. (1993). Fundamentals of Speech Recognition. Prentice-Hall, Englewood Cliffs, New Jersey.
[2] Kleijn, B. and Paliwal, K. (1995). Speech Coding and Synthesis. Elsevier Science Publishers, Amsterdam.
[3] Furui, S. (1989). Digital Speech Processing, Synthesis, and Recognition. Marcel Dekker, New York.
[4] Bloothooft, G. et al. (1998). The Landscape of Future Education in Speech Communications Sciences. Proposals. Utrecht Institute of Linguistics, Utrecht.
[5] Kondoz, A. (1994). Digital Speech Coding for Low Bit Rate Communications Systems. John Wiley & Sons, Chichester.
[6] Papamichalis, P. (1988). Practical Approaches to Speech Coding. Prentice-Hall, New Jersey.
[7] Giurgiu, M. (1995). Software Environment for Speech Processing. Proc. of ECCTD'95, Istanbul, August 1995, 503-506.
[8] Giurgiu, M. (1996). Contributions to Automatic Speech Recognition in Romanian (PhD Thesis). Technical University of Cluj-Napoca, Cluj-Napoca.
[9] Mason, R. and Kaye, A. (1990). Mindweave: Communication, Computers and Distance Education. Pergamon Press, London.
[10] Vin, H. (1996). Heterogeneous Networking. IEEE Multimedia, 4(2): 84-87.
[11] Collins, B. (1996). Tele-Learning in a Digital World. International Thomson Computer Press.