Public Version

Gain Optimization for Cochlear Implant Systems
Phyu Phyu Khing
A thesis submitted in fulfilment of the requirements for the degree of
Doctor of Philosophy
The University of New South Wales
School of Electrical Engineering and Telecommunications
Sydney, AUSTRALIA
August 2013
THE UNIVERSITY OF NEW SOUTH WALES
Thesis/Dissertation Sheet
Surname or Family name: Khing
First name: Phyu Phyu
Other name/s:
Abbreviation for degree as given in the University calendar: PhD
School: EE&T
Faculty: Engineering
Title: Gain Optimization for Cochlear Implant Systems
Declaration relating to disposition of project thesis/dissertation
I hereby grant to the University of New South Wales or its agents the right to archive and to make available my
thesis or dissertation in whole or in part in the University libraries in all forms of media, now or hereafter known,
subject to the provisions of the Copyright Act 1968. I retain all property rights, such as patent rights. I also retain
the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts
International (this is applicable to doctoral theses only).
The University recognises that there may be exceptional circumstances requiring restrictions on copying or
conditions on use. Requests for restriction for a period of up to 2 years must be made in writing. Requests for a
longer period of restriction may be considered in exceptional circumstances and require the approval of the Dean
of Graduate Research.

Abstract
Cochlear implant systems need Automatic Gain Control (AGC) to compress the large
dynamic range (~120 dB) of the acoustic environment into the small dynamic range (< 20
dB) of electrical stimulation. This thesis is concerned with the design, implementation and
evaluation of AGC systems for cochlear implants. It investigated the effects of AGC on the
speech intelligibility of cochlear implant recipients. Various AGC configurations were
evaluated with sentences presented over a wide range of levels at different Signal-to-Noise
Ratios (SNR) to identify important factors affecting the performance. Signal metrics were
developed to quantify the effects of AGC on the channel envelopes. The goal was to improve
speech intelligibility in adverse listening conditions.
The performance-intensity functions of cochlear implant recipients with no AGC and with a
front-end compression limiter were measured in noise. With no AGC, the proportion of
envelope clipping grew monotonically with presentation level. The front-end limiter
substantially reduced envelope clipping yet gave little improvement in speech intelligibility.
The recipients were highly tolerant of envelope clipping when the background noise was
low. SNR degradation was identified as the main factor reducing speech intelligibility.
A front-end limiter cannot guarantee zero envelope clipping. In contrast, the proposed
envelope profile limiter eliminated envelope clipping and hence preserved the spectral
profile. The two AGCs were evaluated with two release times (75 and 625 ms). The shorter
release time gave worse speech intelligibility because it caused more waveform distortion
and a greater reduction in output SNR. For a given release time, preserving the spectral
envelope profile gave additional benefits. In a take-home experiment, cochlear implant recipients rated a
program with the envelope profile limiter equivalent to their everyday program.
A conventional cochlear implant signal path uses a predetermined input dynamic range,
which is shifted up or down by the AGC. In contrast, the proposed Adaptive Loudness
Growth Function (ALGF) continually optimized the input dynamic range by estimating the
noise floor and peak level in each channel. When evaluated with a newly developed
roving-level Speech Reception Threshold (SRT) test at three presentation levels, the ALGF
gave a better SRT than the existing state-of-the-art AGC system at the high presentation level.
Acknowledgements
First of all, I would like to thank my supervisor, Professor Eliathamby Ambikairajah. With
his great support, supervision and mentoring, my study at UNSW has been a great and
successful journey.
I would like to thank my co-supervisor, Dr. Brett Swanson, for envisioning this thesis and
guiding me through it. His knowledge in the field of cochlear implant signal processing is
immense and has been invaluable for this thesis. I am proud to be his first PhD student and I
could not have had a better mentor.
Thanks are also due to my employer, Cochlear Limited, and my work colleagues, in
particular Mr. Michael Goorevich, for providing the research tools and for helping from
time to time in the clinical studies. I would like to thank Mr. Sasha Case, whose help in the
take-home study in particular is greatly appreciated. I would like to thank Mr. Paul
Holmberg for teaching me how to write Assembly code. I would like to thank Ms. Esti Nel
and the clinical team for their help in the clinical studies of this thesis.
I would like to express my gratitude and special thanks to the cochlear implant recipients
who voluntarily participated in the listening tests. Their help is greatly appreciated. The
studies in this thesis would not be complete without their contribution.
I would like to thank friends and staff from the UNSW signal processing lab for their
encouragement and support.
Finally, I would like to thank my family for their love and endless support. I hope to spend
more time with you.
Table of Contents
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Acronyms and Abbreviations
1 Introduction
1.1 Thesis Objectives
1.2 Research Overview
1.3 Thesis Outline
1.4 Thesis Contributions
1.5 Patents and Publications
2 Sound and Hearing
2.1 Introduction
2.2 Hearing Mechanism
2.3 Cochlear Compression
2.4 Auditory Models
2.5 Loudness Perception
2.6 Speech Perception
2.7 Conclusion
3 Cochlear Implants
3.1 Introduction
3.2 Brief History of Cochlear Implants
3.3 Cochlear Implant Systems
3.4 Electrical Stimulation
3.4.1 Stimulation Mode
3.4.2 Current Configuration
3.5 Loudness Perception
3.6 Conclusion
4 Cochlear Implant Sound Coding Strategies
4.1 Introduction
4.2 Sound Coding Strategies
4.2.1 Continuous Interleaved Sampling (CIS)
4.2.2 HiResolution (HiRes)
4.2.3 Spectral Peak (SPEAK)
4.2.4 Advanced Combinational Encoder (ACE)
4.2.5 Alternative Channel Selection Rules
4.3 Signal Processing Modules
4.3.1 Microphone Directionality and Pre-emphasis
4.3.2 Front-end Gain Control
4.3.3 Filterbank
4.3.4 Combine into Channels
4.3.5 Channel Gains
4.3.6 Maxima Selection
4.3.7 Loudness Growth Function
4.3.8 Dynamic Range Selection
4.3.9 Mapping
4.4 Speech Perception
4.4.1 Amplitude Cues
4.4.2 Spectral Processing
4.4.3 Temporal Processing
4.5 Conclusion
5 Automatic Gain Control Systems
5.1 Introduction
5.2 Fundamentals of AGC
5.3 AGC in Hearing Aids
5.4 AGC in Cochlear Implant Systems
5.5 AGC Systems in the Nucleus Sound Processors
5.5.1 Compression Limiter
5.5.2 Automatic Sensitivity Control
5.5.3 Whisper
5.5.4 Unified Gain Model (Tri-loop AGC)
5.5.5 Adaptive Dynamic Range Optimization
5.6 Noise Estimation
5.6.1 Martin’s Minimum Statistics Noise Estimator
5.6.2 Lin’s Recursive Averaging Noise Estimator
5.7 Conclusion
6 Speech Intelligibility Metrics
6.1 Introduction
6.2 Speech Intelligibility Index
6.3 Speech Transmission Index
6.4 Normalized Covariance Measure
6.5 Apparent SNR
6.6 Across-Source Modulation Correlation
6.7 Conclusion
7 Test Methodology
7.1 Introduction
7.2 Test Materials
7.3 Test Methods
7.3.1 Fixed Method
7.3.2 Adaptive Method
7.4 Test Methodology for Clinical Studies in this Thesis
7.4.1 Test Setup
7.4.2 Test Materials
7.4.3 Fixed Level Test
7.4.4 Roving-level SRT Test
7.4.5 Research Platforms
7.5 Conclusion
8 Investigating Effects of No AGC and Fast AGC on Cochlear Implant Speech Intelligibility
8.1 Introduction
8.2 Clinical Study
8.2.1 Subjects
8.2.2 Signal Processing
8.2.3 Test Setup
8.2.4 Results
8.3 Discussions
8.4 Conclusions
9 Investigating Effects of Slow AGC and Fast AGC on Cochlear Implant Speech Intelligibility
9.1 Introduction
9.2 Clinical Study
9.2.1 Test Setup
9.2.2 Subjects
9.2.3 Signal Processing
9.2.4 Results
9.2.5 Effect of Presentation Level
9.2.6 Test-retest Reliability of Roving-level SRT Test with Single Adaptive Track
9.3 Discussions
9.4 Conclusions
10 Proposed Envelope Profile Limiter
10.1 Introduction
10.2 Signal Processing
10.2.1 Front-end Compression Limiter
10.2.2 Proposed Envelope Profile Limiter
10.3 Clinical Studies
10.3.1 Test Setup
10.3.2 Study Design
10.3.3 Experiment 1: High Presentation Level
10.3.4 Experiment 2: Performance-Intensity Function
10.4 Results
10.4.1 Experiment 1: High Presentation Level
10.4.2 Experiment 2: Performance-Intensity Functions
10.5 Discussions
10.6 Conclusions
11 Take-home Study with the Proposed Envelope Profile Limiter
11.1 Introduction
11.2 DSP Implementation
11.3 Fitting Procedures
11.4 Results and Discussions
11.5 Conclusions
12 Proposed Adaptive Loudness Growth Function
12.1 Introduction
12.2 Background
12.3 Implementation of the ALGF
12.3.1 Fast Saturation Level Regulator
12.3.2 Slow Saturation Level Regulator
12.3.3 Base Level Regulator
12.4 Offline Data Analysis
12.4.1 Comparison of the Noise Estimators
12.4.2 Processing Conditions
12.4.3 Offline Performance Analysis of the Gain Algorithms
12.5 Clinical Studies
12.5.1 Test setup
12.5.2 Study 1: Tri + ADRO vs. ALGF-1
12.5.3 Study 2: Tri + ADRO vs. ALGF-2
12.5.4 Study 2F: Adaptive vs. Fixed Dynamic Range
12.5.5 Test-retest Reliability of Interleaved Roving-level SRT Test
12.6 Conclusions
13 Predicting Cochlear Implant Speech Intelligibility
13.1 Introduction
13.2 Signal Processing
13.2.1 Test Stimuli
13.2.2 AGC Configurations
13.2.3 Curve Fitting
13.3 Signal Metrics and Performance Analysis
13.3.1 Clipping Proportion
13.3.2 Output SNR
13.3.3 Across-Source Modulation Correlation
13.3.4 Normalized Covariance Measure
13.4 Discussions and Conclusions
14 Conclusions and Future Work
14.1 Summary of Experimental Results
14.2 Conclusions
14.3 Future Work
Appendix 1: Subjects
Appendix 2: Cochlear Implant Clinical Questionnaire
Appendix 3: Statistics
References
List of Figures
Figure 1-1 Overview of the dynamic range difference between acoustic and electric hearing 3
Figure 1-2 Gain optimization research overview ..................................................................... 7
Figure 2-1 Illustration of the peripheral auditory system....................................................... 14
Figure 2-2 Cross-section of the cochlea ................................................................................. 16
Figure 2-3 Block diagram of Lyon's auditory model ............................................................. 19
Figure 3-1 Cochlear implant system ...................................................................................... 26
Figure 3-2 Nucleus 5 system.................................................................................................. 28
Figure 3-3 Stimulation of the biphasic waveform (left panel) and two current waveforms
with equal charge (right panel) .................................................................................... 30
Figure 3-4 Sequential pulse stimulation, showing timing and amplitudes ............................ 31
Figure 4-1 Cochlear implant sound processing (Swanson 2008)........................................... 35
Figure 4-2 Continuous Interleaved Sampling strategy (Wilson 2006b) ............................... 36
Figure 4-3 Signal processing modules of the ACE strategy .................................................. 38
Figure 4-4 Magnitude response of 22-channel filterbank ...................................................... 41
Figure 4-5 Instantaneous infinite non-linear compression of LGF ........................................ 44
Figure 4-6 Electrodogram of the monosyllabic word ‘Choice’ ............................................. 47
Figure 4-7 Reconstructed spectrogram of the monosyllabic word ‘Choice’ ......................... 48
Figure 5-1 Input-output diagram of an AGC with different compression ratios ................... 54
Figure 5-2 Components of an AGC system ........................................................................... 55
Figure 5-3 Behaviour of a typical AGC system ..................................................................... 56
Figure 5-4 Intelligibility score as a function of the number of channels, with compression
ratio as a parameter (Plomp 1994) ............................................................................... 62
Figure 5-5 Block diagram of dual time-constant AGC system (Boyle et al. 2009) ............... 66
Figure 5-6 AGC systems of the Nucleus CP810 sound processor ......................................... 68
Figure 5-7 Block diagram of Automatic Sensitivity Control (Seligman 2000) ..................... 69
Figure 5-8 Input-output diagram of Whisper ......................................................................... 70
Figure 5-9 Input, output and gain signals of the tri-loop AGC on a roving-level sinusoid ... 72
Figure 5-10 Block diagram of ADRO in one frequency channel .......................................... 74
ix
Figure 6-1 Speech modulation envelope spectrum (Houtgast and Steeneken (1985)) ........... 84
Figure 7-1 Top-level architecture of Champ (Swanson et al. 2007) .................................... 103
Figure 7-2 Components of the real-time Nucleus-xPC system (Goorevich 2005).............. 105
Figure 7-3 ACE sound coding strategy with the standard front-end AGC (blue block) and the
proposed AGC (green block) ..................................................................................... 106
Figure 8-1 Signal path used in the experiment ..................................................................... 109
Figure 8-2 Percent correct scores of four cochlear implant subjects with no AGC and with
FEL75 (legends of the curves are as described in Figure 8-3) ................................... 111
Figure 8-3 Group mean scores of four cochlear implant recipients with no AGC and with
FEL75 ........................................................................................................................ 111
Figure 9-1 SRT of seven cochlear implant subjects measured by the roving-level SRT test
with a single adaptive track. The error bar indicates one standard error. The asterisks
indicate statistically significant difference in performance between the two AGC
systems (* p < 0.05, ** p < 0.01). .............................................................................. 120
Figure 9-2 SRT of seven cochlear implant subjects at 80 dB SPL measured by the
interleaved roving-level SRT test. The error bars indicate one standard error. The
asterisks indicate statistically significant difference in performance between the two
AGC systems (* p < 0.05, ** p < 0.01). .................................................................... 121
AGC systems (* p < 0.05, ** p < 0.01). .................................................................... 122
Figure 9-4 An example of bad SRT convergence due to the lack of audibility at 50 dB SPL.
Left panel shows the convergence of SRT over trials and right panel shows mean
percent correct words at each SNR. ........................................................................... 123
Figure 9-5 Percent correct scores of seven cochlear implant subjects at 50 dB SPL. The error
bars indicate one standard error. The asterisks indicate statistically significant
difference in performance between the two AGC systems (* p < 0.05, ** p < 0.01).. 124
Figure 9-6 Comparison of SRTs between 65 and 80 dB SPL for the FEL program. The error
bars indicate one standard error. The asterisks indicate statistically significant
difference in performance between the two test conditions (* p < 0.05, ** p < 0.01).
................................................................................................................................... 125
Figure 9-7 Comparison of SRTs between 65 and 80 dB SPL for the Tri + ADRO program. The
error bars indicate one standard error. The asterisks indicate statistically significant
................................................................................................................................... 126
Figure 9-8 Test-retest variability of the roving SRT test with single adaptive track from the
SRT of six subjects taken from test and retest sessions. Top panel shows SRTs of each
subject and group mean for test and retest. The bottom panel shows the SRT
difference between test and retest, and mean of the absolute SRT differences. ........ 127
Figure 9-9 Adaptive tracks of the roving-level SRT test with a single adaptive track for S5
with the Tri + ADRO in the test and retest sessions. The convergence of SNRs was
poor in the test session (top panel) and good in the retest session (bottom). ............. 129
Figure 10-1 Block diagram of ACE signal path with the envelope profile limiter (EPL) ... 136
Figure 10-2 Envelope clipping of a vowel; at spectral waveform (top panel) and temporal
waveform (bottom panel) at the output of the LGF, processed by FEL and EPL ..... 137
Figure 10-3 Effects of gain structure and release time on speech intelligibility of six cochlear
implant subjects in quiet (top panel) and in noise (bottom panel). Error bars indicate
one standard error. ..................................................................................................... 140
Figure 10-4 Performance-intensity functions of FEL75 and EPL625 ................................. 144
Figure 10-5 Comparison of scores between the FEL75 and the EPL625 for the presentation
levels above 70 dB SPL at SNR 20 dB (top panel) and SNR 10 dB (bottom panel).
The asterisks indicate statistically significant difference in performance between the
two AGCs (* p < 0.05, ** p < 0.01). ......................................................................... 145
Figure 10-6 Proportion of clipping for speech presented at 89 dB SPL with the front-end
compression limiter with the release time 75 ms and 625 ms ................................... 148
Figure 11-1 The DSP 1 signal path of the Nucleus CP810 sound processor with a switch
between UGM and EPL ............................................................................................. 154
Figure 11-2 Helpfulness indication of the UGM and EPL programs for each question in
CICQ .......................................................................................................................... 156
Figure 11-3 Helpfulness indication of the UGM and EPL programs in the categorised
listening conditions .................................................................................................... 158
Figure 12-1 Degree of freedom for the input signal to move within the dynamic range of
LGF ............................................................................................................................ 163
Figure 12-2 Top level Simulink block diagram of ALGF .................................................... 165
Figure 12-3 Simulink block diagram of fast saturation level regulator ................................ 168
Figure 12-4 Simulink block diagram of slow saturation level regulator .............................. 169
Figure 12-5 Simulink block diagram of clipping proportion calculation ............................. 170
Figure 12-6 Simulated listening condition showing the slow saturation level with the hold
distance of 0 dB (top panel) and 15 dB (bottom panel) ............................................. 172
Figure 12-7 Simulink block diagram implementation of base level regulator ..................... 173
Figure 12-8 Simulink block diagram of Lin's recursive averaging noise floor estimator .... 174
Figure 12-9 Simulink block diagram of smoothing parameter calculation .......................... 174
Figure 12-10 Simulink block diagram of the proposed MCRA noise floor estimator ......... 175
Figure 12-11 Simulink block diagram of the minima-controlled feature in the proposed
MCRA noise estimation algorithm ............................................................................ 176
Figure 12-12 Fixed-level sentences presented with three types of noise: four-talker babble,
city noise and LTASS noise ....................................................................................... 178
Figure 12-13 Estimation of three different noises presented at the fixed level by Martin’s
minimum statistics method (top panel), Lin’s recursive averaging method (middle
panel) and the proposed MCRA method (bottom panel) ........................................... 179
Figure 12-14 Probability density functions of the normalized error for estimating fixed level
noises.......................................................................................................................... 180
Figure 12-15 Roving-level sentences presented in four-talker babble (top panel), and LTASS
noise (bottom panel)................................................................................................... 181
Figure 12-16 Estimation of roving-level four-talker babble noise by: Martin’s minimum
statistics method (top panel), Lin’s recursive averaging method (middle panel) and the
proposed MCRA method (bottom panel) ................................................................... 183
Figure 12-17 Probability density function of the normalized error for estimating the roving-
level four-talker babble noise ..................................................................................... 184
Figure 12-18 Estimation of roving-level LTASS noise from the noisy speech by: Martin’s
minimum statistics method (top panel), Lin’s recursive averaging method (middle
panel) and the proposed MCRA method (bottom panel) ........................................... 186
Figure 12-19 Probability density function of the normalized error for estimating roving-level
LTASS noise.............................................................................................................. 187
Figure 12-20 Simulink block diagram of the ALGF with a fixed dynamic range setting ... 191
Figure 12-21 Simulink block diagram of the Nucleus signal path with the existing AGC
systems and the ALGF............................................................................................... 193
Figure 12-22 Input, output and gain signals produced from the Nucleus signal path with Tri
+ ADRO. There are three sentence presentations in the figure, having levels of 65, 80
and 50 dB SPL. Each presentation consists of three seconds of noise, then the
sentence, and then three seconds of noise. Signals at the frequency channels centred at
367 Hz (top panel), 1101 Hz (middle panel) and 4282 Hz (bottom panel) were
analyzed. .................................................................................................................... 195
Figure 12-23 Input, output and gain signals produced from the Nucleus signal path with the
ALGF-1. There are three sentence presentations in the figure, having levels of 65, 80
analyzed. .................................................................................................................... 197
Figure 12-24 Input, output and gain signals produced from the Nucleus signal path with
ALGF-2. There are three sentence presentations in the figure, having levels of 65, 80
analyzed. .................................................................................................................... 199
Figure 12-25 Input, output and gain signals produced from the Nucleus signal path with the
ALGF-2F. There are three sentence presentations in the figure, having levels of 65, 80
analyzed. .................................................................................................................... 200
Figure 12-26 SRT comparison between Tri + ADRO and ALGF-1. Error bars indicate one
standard deviation from the mean. The asterisks indicate statistically significant
difference in performance between the two processing conditions (* p < 0.05, ** p <
0.01). .......................................................................................................................... 203
Figure 12-27 Percent correct scores of four cochlear implant subjects with Tri + ADRO and
with ALGF-1 in the fixed test at 50 dB SPL in noise. Error bars indicate one standard
deviation from the mean. ........................................................................................... 205
Figure 12-28 Percent correct scores of four cochlear implant subjects with Tri + ADRO and
Figure 12-29 SRT comparison between Tri + ADRO and ALGF-2. Error bars indicate one
standard deviation from the mean. The asterisks indicate the statistical significance of
the difference between the two processing conditions (* p < 0.05, ** p < 0.01). ...... 209
Figure 12-30 Percent correct scores of five cochlear implant subjects with Tri + ADRO and
Figure 12-31 Percent correct scores of three cochlear implant subjects with Tri + ADRO and
Figure 12-32 SRT Comparison between ALGF-2 and ALGF-2F. Error bars indicate one
standard deviation from the mean value. The asterisks indicate the statistical
significance of the difference between the two ALGFs (* p < 0.05, ** p < 0.01). .... 217
Figure 12-33 Percent correct scores of four cochlear implant subjects with ALGF-2 and with
ALGF-2F in the fixed test at 50 dB SPL in noise. Error bars indicate one standard
Figure 12-34 Test-retest variability of the interleaved SRT test from the SRT of four subjects
with Tri + ADRO taken from Study 1 and Study 2. Error bars indicate one standard
deviation from the mean. The asterisks indicate the statistical significance of the
difference between the two studies (* p < 0.05, ** p < 0.01). ................................... 222
Figure 12-35 SRT differences between the two studies of each subject. The difference was
calculated as: SRT (Study 1) – SRT (Study 2). The mean was calculated as the
average of the absolute SRT differences between the two studies. ........................... 223
Figure 13-1 Percent correct scores of the cochlear implant subjects with different AGC
configurations in the fixed level test. Open and filled symbols represent 10 dB SNR
and 20 dB SNR respectively. ..................................................................................... 227
Figure 13-2 Comparison of signal amplitudes processed by each AGC configuration, for
speech presented at 80 dB SPL in the presence of four-talker babble at SNR 10dB.
The envelopes at channel 7 were taken before the LGF. ........................................... 230
Figure 13-3 Percent correct scores of the individual subject as a function of clipping
proportion. The open and filled symbols represent the results at SNR 10 dB and 20 dB
respectively. ............................................................................................................... 231
Figure 13-4 Group mean percent correct scores as a function of clipping proportion. The top
panel shows the scores in all conditions, the middle panel shows the scores at 20 dB
SNR and the bottom panel shows the scores at 10 dB SNR. ..................................... 232
Figure 13-5 Block diagram of output SNR calculation for the signal path. The front-end
AGC was used as an example in this diagram. .......................................................... 233
Figure 13-6 Percent correct score of individual subject as a function of output SNR. The
open and filled symbols represent the results in 10 dB and 20 dB SNR respectively.
................................................................................................................................... 235
Figure 13-7 Group mean scores as a function of output SNR. The top panel shows the scores
in all conditions, the middle panel shows the scores at 20 dB SNR and the bottom
panel shows the scores at 10 dB SNR........................................................................ 236
Figure 13-8 ASMC calculation for the signal path. The front-end AGC was used as an
example in this diagram. ............................................................................................ 237
Figure 13-9 Percent correct score of the individual subject as a function of ASMC. The open
and filled symbols represent the results at SNR 10 dB and 20 dB respectively. ....... 238
Figure 13-10 Group mean scores as a function of ASMC. The top panel shows the scores in
all conditions, the middle panel shows the scores at 20 dB SNR and the bottom panel
shows the scores at 10 dB SNR. ................................................................................ 239
Figure 13-11 Percent correct score of the individual subject as a function of
NCM. The open and filled symbols represent the results in SNR 10 dB and 20 dB
respectively. ............................................................................................................... 242
Figure 13-12 Group mean scores as a function of NCM index. The top panel shows the
scores in all conditions, the middle panel shows the scores at 20 dB SNR and the
bottom panel shows the scores at 10 dB SNR. ........................................................... 243
Figure 13-13 Comparison of deviance between four signal metrics .................................... 244
Figure 13-14 Effect of release time on speech intelligibility predicted by ASMC (top panel),
NCM (middle panel) and OSNR (bottom panel) ....................................................... 247
Figure 14-1 Application of the envelope profile limiter in the bilateral sound processing .. 258
List of Tables
Table 3-1 Recent cochlear implant systems of the major cochlear implant companies ......... 27
Table 3-2 Nucleus cochlear implant models .......................................................................... 28
Table 3-3 Nucleus sound processors ...................................................................................... 29
Table 5-1 Rationales for different AGC systems ................................................................... 58
Table 5-2 Parameter settings of the UGM.............................................................................. 71
Table 7-1 Test materials and methods used in AGC studies .................................................. 94
Table 7-2 Test materials and methods used in the clinical studies of this thesis.................. 101
Table 8-1 Parameter setting of the front-end compression limiter ....................................... 109
Table 8-2 Statistical analysis of the score difference between the front-end compression
limiter and no AGC for the presentation levels above 70 dB SPL in two SNR
conditions. The asterisks indicate statistically significant difference in performance
between no AGC and FEL75 (* p < 0.05, ** p < 0.01). ............................................ 112
Table 9-1 Parameter setting of each program....................................................................... 119
Table 10-1 AGC configurations tested ................................................................................. 138
Table 10-2 Statistical analysis on the scores of the individual and group. The score difference
was obtained by subtracting the score of the first AGC from the second AGC. The
AGCs (* p < 0.0125, ** p < 0.0025). ........................................................................ 142
Table 10-3 Statistical analysis of the scores between the FEL75 and the EPL625 for the
presentation levels above 70 dB SPL in two SNR conditions. The asterisks indicate
statistically significant difference in performance between the two AGCs (* p < 0.05,
** p < 0.01)................................................................................................................ 146
Table 11-1 Combination of AGCs in each program. Zoom is a fixed beamformer with a
super-cardioid polar response. ................................................................................... 155
Table 11-2 Mean benefit scores and preferred program in quiet and noisy background. S4 did
not answer some questions, as indicated by dashes. .................................................. 159
Table 12-1 Setting of other program parameters ................................................................. 188
Table 12-2 Parameter setting of three configurations of ALGF .......................................... 189
Table 12-3 Statistical analysis of SRT measured with Tri + ADRO and the ALGF-1 at 50, 65
and 80 dB SPL. The value inside the bracket indicates the standard deviation. p-values
were calculated by a t-test for the hypothesis testing of the significance on the SRT
difference. .................................................................................................................. 204
Table 12-4 Statistical analysis of SRT measured with Tri + ADRO and the ALGF-2 at 50, 65
difference. .................................................................................................................. 210
Table 12-5 Retrospective comparison with SRT results from other studies that used a roving-
level SRT test............................................................................................................. 214
Table 12-6 Statistical analysis of SRT measured with ALGF-2 and the ALGF-2F at 50, 65
difference. .................................................................................................................. 218
Acronyms and Abbreviations
ACE Advanced Combinational Encoder (strategy)

ADC Analogue to Digital Converter
ADRO Adaptive Dynamic Range Optimization (automatic gain control algorithm)

AGC Automatic Gain Control
ALGF Adaptive Loudness Growth Function
ASC Automatic Sensitivity Control (automatic gain control algorithm)

AVC Automatic Volume Control (automatic gain control algorithm)
BLR Base Level Regulator

BTE Behind-The-Ear (sound processor)
C-level Maximum Comfortable level
CDI Cochlear Device Interface
CG Common Ground (stimulation mode)
CI Cochlear Implant
CIS Continuous Interleaved Sampling (strategy)
CP810 The Cochlear Nucleus 5 Sound Processor
DSP Digital Signal Processing
EPL Envelope Profile Limiter (automatic gain control algorithm)
FEL Front-End compression Limiter (automatic gain control algorithm)
FFT Fast Fourier Transform
FSLR Fast Saturation Level Regulator
HiRes HiResolution (strategy)
LGF Loudness Growth Function
LTASS Long-Term Average Speech-Shaped (noise)
MAP The clinical program parameters for an individual recipient

MCRA Minima-Controlled Recursive Average (noise estimator)
MP Monopolar (stimulation mode)
NMB Nucleus MATLAB Blockset (Simulink)

NMT Nucleus MATLAB Toolbox
P-I function Performance-Intensity Function
pps Pulses per second
SLR Saturation Level Regulator
SNR Signal-to-Noise Ratio
SPEAK Spectral Peak (strategy)
SRT Speech Reception Threshold

SSLR Slow Saturation Level Regulator
StimGen Stimulus Generator (test tool)

T-level Threshold level
Tri-loop Front-end AGC system with three control loops
UGM Unified Gain Model (automatic gain control algorithm)
WDRC Wide Dynamic Range Compressor (automatic gain control algorithm)
Whisper Front-end wide dynamic range compressor

xPC “extra” PC processing platform from the Mathworks
Chapter 1 Introduction
1 Introduction
1.1 Thesis Objectives
The goal of this thesis is to improve the speech intelligibility of cochlear implant recipients
using automatic gain control (AGC) techniques; in particular, to allow recipients to better
cope with changing sound levels in the presence of noise.
The objectives that align with the above goal are:
• To evaluate the performance of existing AGC systems on the speech intelligibility of
cochlear implant recipients;
• To develop and evaluate new algorithms for optimizing signal levels within the
limited dynamic range of electrical stimulation;
• To investigate test methodology for evaluating AGC systems;
• To investigate speech metrics to predict the speech intelligibility of cochlear implant
recipients.
1.2 Research Overview
When a person has lost most of the hair cells in the inner ear, no amount of amplification can
restore normal hearing. A cochlear implant (CI) is a hearing prosthesis for persons with
severe-to-profound hearing impairment who can no longer benefit from hearing aids. It bypasses
the transduction process of the inner ear and directly stimulates the auditory nerve to provide
hearing sensation. To date, the cochlear implant is the most successful of all neural
prostheses developed (Wilson and Dorman 2008).
Cochlear implant systems have improved remarkably with technology advances over the last
four decades. The performance of cochlear implant recipients has improved from reliance on
lip-reading to telephone use and music appreciation. The speech understanding of
top-performing cochlear implant recipients is comparable to that of normal-hearing subjects
in easy listening conditions. However, the speech intelligibility of cochlear implant recipients
degrades significantly in adverse listening conditions. Unlike normal-hearing subjects, who
can benefit from the redundant spectral and temporal cues of speech waveforms in
adverse listening conditions, cochlear implant recipients cannot achieve similar listening
performance because of the limited spectral and temporal cues available. The loss of inner ear
function, particularly the loss of the sophisticated gain control mechanism of the outer hair
cells, causes speech intelligibility degradation in adverse listening conditions.
Speech and other sounds vary over a wide range of levels in everyday listening
environments. The overall level of everyday speech can vary over a 35 dB range, from casual
conversation to shouting (Pearsons, Bennett and Fidell 1977; Olsen 1998). The individual
components of a segment of speech vary over a 40 to 50 dB range when the overall
presentation level remains constant (Boothroyd, Erickson and Medwetsky 1994; Zeng et al.
2002). A normal-hearing subject can hear a 120 dB range of acoustic signals. In comparison,
the dynamic range of electrical current pulses in a cochlear implant is considerably
smaller (5 to 20 dB) (Nelson et al. 1995). With constraints on the range of presentable
amplitudes, frequency channels and stimulation rate, an ongoing challenge of cochlear
implant research has been how to best convey the information in acoustic signals onto the
electrodes.
If a very wide range of channel envelopes were mapped into the electrical dynamic range,
fine details of spectro-temporal waveform variations would be lost. This would not matter much
if a large number of discriminable current steps were available within the electrical dynamic
range. However, Nelson et al. (1996) showed that the cumulative number of discriminable
intensity steps across the dynamic range of electric hearing ranged from as few as 6.6 to as
many as 45.2. Therefore the operating range of input signals is restricted so as to optimally
present the most relevant range of acoustic signals, i.e., speech. Cosendai and
Pelizzone (2001) found that an input dynamic range of more than 45 dB was necessary for
optimal speech recognition. Dawson et al. (2007) showed that increasing the input dynamic range
from 31 dB to 46 and 56 dB improved word recognition at low presentation levels in quiet.
In a performance study of different cochlear implant systems, Spahr et al. (2007) highlighted
the importance of the input dynamic range setting and of a dual-loop (slow and fast time
constants) AGC system for speech intelligibility. Figure 1-1 gives an overview of the dynamic
range requirements of cochlear implant hearing.
Figure 1-1 Overview of the dynamic range difference between acoustic and electric
hearing
The input dynamic range of a cochlear implant system determines the range of input signals
(shown between C-SPL and T-SPL in Figure 1-1) that is mapped into the
electrical dynamic range (between C-level and T-level) with no gain adjustment. An AGC is
necessary in the transformation stage for input signals with presentation levels outside
the range between C-SPL and T-SPL. The AGC compresses the dynamic range of input stimuli into
the input dynamic range of the system. It is more effective than a linear gain because it can
improve the audibility of soft sounds without further amplifying sounds that are already loud.
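The mapping described in this paragraph can be illustrated with a minimal sketch. It assumes a simple linear-in-dB mapping from the input dynamic range onto the electrical dynamic range, which is a simplification and not the loudness growth function of the Nucleus signal path. The function name and all numeric values (T-SPL of 25 dB SPL, C-SPL of 65 dB SPL, T-level of 100 and C-level of 200 current units) are hypothetical and chosen only for illustration.

```python
def map_to_electrical(level_db_spl, t_spl=25.0, c_spl=65.0,
                      t_level=100.0, c_level=200.0):
    """Map an acoustic level (dB SPL) to a stimulation level with no gain adjustment.

    Illustrative assumptions: the mapping is linear in dB, inputs below
    T-SPL produce no stimulation (None), and inputs above C-SPL saturate
    at C-level.
    """
    if level_db_spl < t_spl:
        return None  # below the input dynamic range: assumed inaudible
    # Position of the input within the input dynamic range, capped at 1.0
    fraction = min((level_db_spl - t_spl) / (c_spl - t_spl), 1.0)
    return t_level + fraction * (c_level - t_level)


for level in (10, 25, 45, 65, 80):
    print(level, "dB SPL ->", map_to_electrical(level))
```

In this sketch, inputs of 65 and 80 dB SPL both map to the C-level, and an input of 10 dB SPL is not presented at all. Level differences outside the input dynamic range are therefore lost unless a gain adjustment first moves the signal into that range.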
Access to low-level sounds is important because it reduces the effort needed to maintain speech
intelligibility across different acoustic scenarios. The softest components of everyday speech
are approximately 25 dB SPL. The overall level of speech is low if the talker is a soft-spoken
individual, for example a child, or if the talker-listener distance is large, because sound
pressure level decreases with distance according to the inverse square law. Many listening
situations occur over a distance, for example, class participation during a lecture.
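The effect of talker-listener distance can be made concrete with a short sketch. It assumes an idealised point source in a free field (no reflections or reverberation), for which the inverse square law gives a level change of 20 log10(d2/d1) dB. The function name and the starting value of 65 dB SPL at 1 m are hypothetical and used only for illustration.

```python
import math


def level_at_distance(level_db_spl, ref_distance_m, distance_m):
    """Free-field level at distance_m, given the level at ref_distance_m."""
    return level_db_spl - 20.0 * math.log10(distance_m / ref_distance_m)


# Each doubling of distance lowers the level by about 6 dB.
for distance in (1.0, 2.0, 4.0, 8.0):
    print(distance, "m:", round(level_at_distance(65.0, 1.0, distance), 1), "dB SPL")
```

Under these assumptions, speech at a conversational level near the talker falls by roughly 18 dB at eight times the distance, which moves its softer components towards or below the lower end of the input dynamic range.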
Many studies have shown that access to low-level sounds improves the intelligibility of soft
speech for cochlear implant recipients (Skinner et al. 1997; James et al. 2002; Firszt et al.
2004; Muller-Deile et al. 2008; Boyle et al. 2013). Access to soft speech can be beneficial
for children in incidental learning situations. Different methods to improve the intelligibility
of soft speech sounds have been investigated by other researchers. In the study of Holden et al.
(2011), half of the cochlear implant subjects preferred more than one input dynamic range
setting. As clinical guidelines, Holden et al. recommended raised T-levels (> 10% of the
maximum current levels) and a wide input dynamic range for soft speech understanding in quiet,
and a narrow input dynamic range in noisy conditions.
Providing access to low-level input stimuli often also allows unwanted low-level noise to
enter the system. Some recipients found such noise objectionable (Holden et al.
2011). Skinner et al. (1999) showed that raised T-levels improved the understanding of soft
speech sounds by Nucleus 22 implant recipients, and that the recipients preferred to use them
in quiet listening conditions. Although the measured speech intelligibility in noise improved
with raised T-levels, there was a trend towards subjects preferring the default T-levels
in noisy listening conditions. Souza et al. (2000) showed a poor correlation between the
improved speech audibility of hearing-impaired subjects measured in the clinic and the everyday
communication benefit reported by the patients. These studies indicate that there is scope for
further AGC research on the amplification of low-level input stimuli to improve speech
intelligibility.
A decline in the speech recognition performance of normal-hearing listeners (Dubno, Horwitz and
Ahlstrom 2005) and of both normal-hearing and hearing-impaired listeners (Studebaker et al.
1999) was observed when the speech presentation level was increased above the normal
conversation level while the signal-to-noise ratio was fixed. Both studies indicated that a
decrease in the effective signal-to-noise ratio with the increase in speech presentation level
was the main cause of performance degradation. In cochlear implant systems, the most obvious
form of distortion affecting the channel envelopes of high-level input signals is envelope
clipping at the saturation level of the loudness growth function (LGF). Clipping can degrade
speech intelligibility by distorting amplitude cues. Shannon et al. (2001) showed that the vowel
recognition of normal-hearing subjects listening to a four-channel noise vocoder
degraded monotonically with increasing percentage of envelope clipping. Shannon et al.
indicated that the amplitude pattern was important for speech intelligibility if the frequency
components were poorly represented. It is hypothesized that speech intelligibility in noise can
be improved by avoiding peak clipping of the output signal, because the spectral peaks of
formants are then likely to be preserved in background noise. This thesis will investigate the
importance of spectral envelope cues for the speech intelligibility of cochlear implant
recipients.
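Envelope clipping at a saturation level can be sketched as follows. This is a minimal illustration, assuming a channel envelope given as linear amplitude samples and a clipping proportion defined as the fraction of samples that exceed the saturation level. The function names, the sample values and this definition are assumptions for illustration, and the clipping metric used later in this thesis may be defined differently.

```python
def peak_clip(envelope, saturation_level):
    """Limit every envelope sample to the saturation level."""
    return [min(x, saturation_level) for x in envelope]


def clipping_proportion(envelope, saturation_level):
    """Fraction of envelope samples that exceed the saturation level."""
    samples = list(envelope)
    if not samples:
        return 0.0
    clipped = sum(1 for x in samples if x > saturation_level)
    return clipped / len(samples)


# Hypothetical envelope samples; a higher gain stands in for a higher presentation level.
envelope = [0.2, 0.5, 0.9, 1.2, 0.7, 0.3]
for gain in (1.0, 2.0):
    scaled = [gain * x for x in envelope]
    print(gain, clipping_proportion(scaled, 1.0), peak_clip(scaled, 1.0))
```

Raising the gain flattens more of the envelope peaks to the same value, which is how clipping removes the amplitude differences between spectral peaks.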
With advances in technology, the speech intelligibility of experienced cochlear implant
recipients is approaching 100% for sentences in quiet. However, the speech intelligibility of
cochlear implant recipients degrades in noise. In their performance study of cochlear implant
recipients from multiple clinics, Lazard et al. (2012) indicated that the performance variation
was biased by the use of good performers in the studies, and that the performance of cochlear
implant recipients in noise still remained a challenge.
Cochlear implant recipients have more difficulty understanding speech in noisy conditions than
normal-hearing persons because of degraded spectral resolution (Fu, Shannon and
Wang 1998; Stickney et al. 2004a) and poor preservation and processing of fine temporal
structure (Qin and Oxenham 2003; Nie, Barco and Zeng 2006). When the background is a
single competing talker or amplitude-modulated noise, the difference in Speech Reception
Threshold (SRT) between normal-hearing and hearing-impaired subjects can be as large as 15 dB
(Moore, Peters and Stone 1999; Moore 2003b). When the background noise is stationary, the
difference is usually around 2 to 5 dB (Plomp 1994).
The purpose of a hearing device is to rehabilitate hearing. One of its most important
tasks is to provide speech intelligibility in various listening conditions given limited hearing
function. The performance shortcomings of cochlear implant recipients in adverse listening
environments, in which sounds operate beyond the designated dynamic range, show the
importance of gain optimization research for cochlear implant systems. An AGC is essential
to adapt the system to different listening situations by adjusting the gain slowly and to
normalize loudness between soft and loud components of speech by adjusting the gain
dynamically. To design an effective AGC system, it is important to understand the
relationship between spectral and temporal characteristics of acoustic signals and the
compression system (White 1986).
Substantial research has been done on AGC systems in hearing aids (Dillon 2001; Souza
2002; Kates 2010). Compared to that, the amount of research done on AGC systems in
cochlear implant systems is relatively small. The review of Souza (2002) on the effects of
compression on speech intelligibility and quality stated that compression has increased in
complexity with greater numbers of parameters which are under the clinician’s control. The
advances in compression hearing aids bring greater flexibility, precision in fitting and
selection to the users. Together with these advantages, the need for more information about
the effects of compression amplification on speech perception and quality is also increased.
The gain optimization research in this thesis analyses the important factors in the envelope
distortion for speech intelligibility. It then investigates the effects of AGC on speech
intelligibility over a wide range of presentation levels. The robustness of different AGCs for
speech presented at different levels in noise will also be investigated. The aim is to expand
the operating range of input stimuli beyond the designated range between C-SPL and T-SPL.
Figure 1-2 shows the overview of the gain optimization techniques in this thesis. The
investigation is carried out in two stages. In stage 1, the existing front-end and multichannel
AGC systems in the Nucleus signal path will be investigated. The feasibility of consolidating
various AGC systems at a point after the filterbank will be studied. The aim is to simplify the
signal path and improve the envelope presentation within the dynamic range of electrical
stimulation. In stage 2, the feasibility of adapting the input dynamic range to the dynamic
range of input stimuli in various test conditions will be investigated. The research in this
thesis also involves investigation of test methods and signal metrics to evaluate the gain
optimization techniques.
Figure 1-2 Gain optimization research overview
1.3 Thesis Outline
This thesis consists of two major parts.
Part 1 (Chapters 2 – 7) reviews acoustic and electric hearing, concerning AGC functions of
the inner ear and a cochlear implant system.
Chapter 2 is concerned with acoustic hearing. It examines the normal hearing mechanism,
with emphasis on the compressive function of the inner ear.
Chapter 3 is concerned with electric hearing in cochlear implant systems.
Chapter 4 presents sound coding strategies and speech perception in cochlear implant
systems. It concentrates on the signal processing implementation of the Advanced
Combinational Encoder (ACE) strategy in the Nucleus CP810 sound processor.
Chapter 5 broadly discusses AGC systems of hearing aids and cochlear implant systems. It
then explains the existing AGC systems of the Nucleus CP810 sound processor. Noise floor
estimation methods are also studied in this chapter.
Chapter 6 examines signal metrics that attempt to predict the effect of AGC systems on the
speech intelligibility of cochlear implant recipients.
Chapter 7 discusses test methods and procedures for evaluating speech intelligibility with
cochlear implant recipients. It also describes the test materials and methods used in the
clinical studies of this thesis.
Part 2 (Chapters 8 – 13) contains the experimental work of this thesis.
Chapter 8 studies the performance-intensity functions of cochlear implant recipients with no
AGC and with a front-end compression limiter (fast AGC).
Chapter 9 measures the Speech Reception Thresholds (SRTs) of cochlear implant recipients
with two existing AGC systems. It also discusses shortcomings of the existing roving-level
SRT tests.
Chapter 10 describes a new multichannel envelope profile limiter with a spectral profile
preserving feature. It examines the effect of preserving spectral envelope cues and the effect
of compression speed by comparing the envelope profile limiter and the front-end
compression limiter with different release times.
Chapter 11 explains the implementation of the proposed envelope profile limiter in the
CP810 sound processor, and its evaluation in a take-home experiment. The feedback of the
cochlear implant recipients after experiencing the new AGC in their daily life is also
discussed.
Chapter 12 describes the design and implementation of a new dynamic range optimization
technique for cochlear implant systems. The performance of the new algorithm was
compared with the performance of existing AGC systems by conducting acute listening tests
with cochlear implant recipients.
Chapter 13 studies the correlation between existing signal metrics and the measured speech
intelligibility of cochlear implant recipients. It then develops a new metric that could predict
some of the speech scores of cochlear implant recipients from previous chapters.
Chapter 14 provides a general discussion of the experimental results, summarizes the
findings of the thesis, and suggests avenues for future research.
Appendix 1 lists the biographical details of the cochlear implant subjects who participated in
the experiments of this thesis. A cross-reference is provided showing which experiments
they participated in.
Appendix 2 lists the questionnaire used to survey the experience of a cochlear implant
subject with two programs under comparison in different acoustic scenarios.
Appendix 3 explains the statistical methods used for analyzing the experimental results.
1.4 Thesis Contributions
This research provides original contributions on the dynamic range compression of cochlear
implant systems and evaluation techniques. The major contributions of the research can be
summarized as follows:
Effect of envelope clipping on speech intelligibility: The effect of envelope distortion over
a wide range of presentation levels is investigated. This is the first report on the signal path
with no AGC, which represents the worst case processing for presentation levels beyond the
nominal level. With a good SNR, the subjects showed high levels of speech understanding
at very high presentation levels, in which channel envelopes were heavily clipped. Hence,
preserving the envelope shape is not important when other cues, spectral and slow temporal
modulation, of the target signal are not seriously distorted by competing noise.
Effect of optimizing spectral envelopes: A new compression limiter (envelope profile
limiter) is proposed that eliminates envelope clipping at the LGF. The effect of preserving
channel envelopes is studied in parallel with the effect of compression speed. Compression
speed is far more important than preserving the spectral envelope profile. However,
preserving the channel envelope profile is important when temporal envelope cues are
reduced by fast compression speed in quiet. Similarly, preserving the channel envelope
profile is important in noise even when the compression speed is slow. This study also
demonstrates the feasibility of consolidating AGC systems after the filterbank to have a
better representation of spectral envelope cues with respect to the saturation level of the
LGF.
Importance of an AGC with noise-control in real-life listening: The envelope profile
limiter was implemented in the Nucleus CP810 sound processor for take-home use. The
subjects answered a set of questions to compare their everyday program (with front-end
slow and fast AGC, and the multichannel slow AGC) and the modified program with the
envelope profile limiter (and the multichannel slow AGC). For overall comparison between
the two programs, the subjects rated the envelope profile limiter program equivalent to their
everyday program. However, they preferred their everyday program particularly in noisy
situations. The difference between the two programs is the front-end slow AGC that slowly
reduces the overall level when the background noise is high. This study clearly indicates the
importance of a noise-control feature in an AGC for listening comfort in real-life listening
situations where noise is inevitable.
A novel signal level optimization algorithm: Conventional AGCs control the level of an
input signal to be within the designated dynamic range by varying the gain. This sometimes
compromises operation when the goals conflict, for example improving audibility
without increasing the background noise level. A novel signal level optimization algorithm
called Adaptive Loudness Growth Function (ALGF) is proposed in this thesis. The
innovative scheme of the ALGF continually contracts or expands the dynamic range of the
LGF to achieve both level adjustment of input stimuli and noise reduction simultaneously.
The speech intelligibility of the cochlear implant recipients with the ALGF was equal to or
better than that with the existing AGC systems (the tri-loop AGC and ADRO) of the Nucleus
CP810 for roving-level sentences presented at 50, 65 and 80 dB SPL.
Improvement to the roving-level SRT test: The test-retest reliability of the existing
roving-level SRT tests was found to be relatively poor. An improvement to the roving-level
SRT test is proposed: fixing the presentation level sequence eliminates the performance bias
caused by an unbalanced randomized sequence.
Predicting speech performance by signal metrics: Performance of AGC systems in noise
is affected by many factors: temporal envelope distortion, output SNR reduction and cross-
modulation between speech and noise. Four signal metrics were implemented for the
cochlear implant processing and tested. The effective SNR at the output of the signal path is
shown to be the most relevant factor affecting speech performance in noise. It is highly
correlated with speech intelligibility of cochlear implant recipients tested with different
AGC configurations over a wide range of presentation levels.
1.5 Patents and Publications
Patents
1. Swanson, B. A., Khing, P. P. “Post-Filter Common-Gain Determination” US Patent
2013/0103396A1
2. Swanson, B. A., Khing, P. P. “Feature-based Level Control” US Patent
2013/0195278A1
Journal Paper
The clinical studies, results and findings of the effects of different compression limiters on
speech intelligibility of cochlear implant recipients were submitted to PLOS ONE, a
peer-reviewed, open-access journal.
Khing, P. P., Swanson, B. A., E. Ambikairajah. “The Effect of Automatic Gain Control
Structure and Release Time on Cochlear Implant Speech Intelligibility”
Conference Papers
The clinical studies, results and findings of the effects of no AGC and fast AGC on speech
intelligibility of cochlear implant recipients were published and presented at the IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011.
Prague, Czech Republic.
Khing, P. P., E. Ambikairajah, Swanson, B. A. (2011). “Effect of fast AGC on
cochlear implant speech intelligibility” 2011 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP). Page: 285-288
The analysis on predicting speech intelligibility of cochlear implant recipients using selected
signal metrics was published and presented at the IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), 2013. Vancouver, Canada.
Khing, P. P., E. Ambikairajah, Swanson, B. A. (2013). “Predicting the effect of AGC
on speech intelligibility of cochlear implant recipients in noise” 2013 IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Page: 8061-8065
Poster Presentation at International Conferences
In addition to the publications, findings from various chapters of this thesis were also
presented as posters at the international conferences most relevant to cochlear implant
research.
• Conference on Implantable Auditory Prostheses (CIAP), 2011. Asilomar, California,
USA.
• 8th Asia Pacific Symposium on Cochlear Implant and Related Sciences (APSCI),
2011. Daegu, Korea.
• 12th International Conference on Cochlear Implants and Other Implantable Auditory
Technologies, 2012. Baltimore, USA.
• 11th European Symposium on Paediatric Cochlear Implantation (ESPCI), 2013.
Istanbul, Turkey.
• Conference on Implantable Auditory Prostheses (CIAP), 2013. Asilomar, California,
USA.
Chapter 2 Sound and Hearing
2.1 Introduction
Knowledge of the physiology of the ear, especially the inner ear, is necessary to design
sound processing algorithms for cochlear implants. The most effective way of processing is
to mimic the physiological functions that are bypassed by the prosthesis (Wilson and
Dorman 2008). This chapter is concerned with the mechanism of the auditory system, with
emphasis on the compressive function of the inner ear. It also studies loudness and speech
perception.
2.2 Hearing Mechanism
Hearing is a complex process that converts mechanical movements of a sound wave into
action potentials for the brain to process. Sound travels from the outer ear through the middle
ear to the inner ear. Figure 2-1 shows the structure of the peripheral part of the human
auditory system.
Figure 2-1 Illustration of the peripheral auditory system
The pinnae aid sound localization by modifying the incoming sound, particularly at high
frequencies. Sound travels through the ear canal and vibrates the ear drum. These vibrations
are transmitted through the middle ear by three small bones to the oval window of the
cochlea. The three small bones of the middle ear are called the malleus, incus and stapes.
The middle ear acts as the impedance matching unit between the outer ear and the inner ear
for efficient sound transmission. The middle ear is also thought to provide some automatic
gain control (AGC) via the stapedial reflex. The frequency range of human hearing is
typically from 20 Hz to 20 kHz and it is most sensitive to acoustic signals with frequency
between 1 kHz and 5 kHz, largely due to the resonance of the outer ear canal and the middle
ear.
The inner ear contains the cochlea and the vestibular (balance) system. Understanding the
function of the cochlea can provide insight into many aspects of auditory perception (Moore
2003a). A cross-section of the cochlea is shown in Figure 2-2. The cochlea is a spiral-shaped,
fluid-filled chamber of bone. It has three chambers: the scala tympani, the scala
vestibuli and the scala media. The scala tympani and the scala media are separated by the basilar
membrane which supports the organ of Corti. The start of the cochlea, where the oval
window is located, is called the base. The basal end is most sensitive to high frequencies.
The inner tip at the other end is known as the apex, which is most sensitive to low
frequencies. This is partly due to the physical properties of the basilar membrane, which is stiff
and thick at the basal end, where the wave travels faster, and thin and flexible at the apical end,
where the wave travels slowly.
Figure 2-2 Cross-section of the cochlea
When the oval window is set in motion by the three small bones of the middle ear, a pressure
difference causes the basilar membrane to vibrate. Early research on the frequency
selectivity of the cochlea was carried out by Georg von Békésy. He observed the amplitude
displacement of the basilar membrane as a function of frequency by experimenting with the
inner ears of human cadavers. Sounds of different frequencies produce maximum
displacement at different places along the basilar membrane. The characteristic frequency
can be defined as the frequency that gives the maximal displacement at a particular location
on the basilar membrane. Each point on the basilar membrane can be considered as a bandpass
filter with the centre frequency corresponding to the characteristic frequency. Hence the
cochlea acts like a frequency analyzer. The bandwidth increases roughly in proportion to
the characteristic frequency.
Between the basilar membrane and the tectorial membrane are the hair cells, arranged in
three rows of outer hair cells and one row of inner hair cells. The protruding hairs on
top of the hair cells are called stereocilia. When the basilar membrane is moved by sound, a
shearing motion created between the basilar membrane and the tectorial membrane causes
the stereocilia to be displaced. Deflections of the basilar membrane towards the tectorial
membrane cause the inner hair cells to initiate the action potentials in the neurons, also
known as the spiral ganglion cells, of the auditory nerve. No neuron firings occur for the
movement of the basilar membrane in the other direction. This phenomenon is called phase
locking and is effective for audio frequencies up to 5 kHz. Phase locking synchronizes the
timing of the neuron firings to the waveform of the sound. Information is conveyed by the
neurons connected to the inner hair cells to the central nervous system. There are
approximately 30,000 neurons in the human cochlea. Unlike the inner hair cells, the outer
hair cells do not send information to the central nervous system. They function as an active
amplifier and compression unit. They are also responsible for sharp tuning and high
sensitivity to sounds at different frequencies. The gain control mechanism of the cochlea is
described further in the following section.
2.3 Cochlear Compression
The dynamic range of normal hearing is approximately 120 dB (Killion 1978). Neural
response studies show that the dynamic range of auditory nerves is on the order of 20 to 30
dB. Therefore a level-adjusting or gain control mechanism is necessary to damp or amplify
the basilar membrane motion according to the available dynamic range of the auditory
nerves. The outer hair cells act as muscles to feed the energy back into the basilar membrane.
In the absence of the outer hair cells, the basilar membrane motion gets damped by the organ
of Corti. When the stimulus is small, the outer hair cells feed the energy back into the basilar
membrane movement. They reduce the damping until the signal is large enough to transmit
to the higher centres of the brain. Similarly, for a high-level input stimulus, the outer hair cells
compress the motion of the basilar membrane (Yates 1995).
The behavior described above indicates that the motion of the basilar membrane is non-linear.
The velocity response at the characteristic frequency is almost linear for very low and very high
levels but has a shallow slope for levels in the mid range. This compressive function allows a
very large range of input sound levels to be handled by the peripheral hearing system.
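The three-segment input-output behaviour described above can be illustrated with a minimal sketch. The knee points (30 and 90 dB) and the mid-range slope (0.2 dB/dB) are illustrative assumptions only, not values from this thesis or from measured basilar membrane data, and the function name is hypothetical.

```python
def bm_output_level_db(input_db, low_knee_db=30.0, high_knee_db=90.0,
                       mid_slope=0.2):
    """Piecewise-linear input-output function in dB (illustrative values only).

    Linear (1 dB/dB) below low_knee_db and above high_knee_db,
    compressive (mid_slope dB/dB) in between.
    """
    if input_db <= low_knee_db:
        return input_db
    if input_db <= high_knee_db:
        return low_knee_db + mid_slope * (input_db - low_knee_db)
    # Output level reached at the top of the compressive segment.
    mid_top = low_knee_db + mid_slope * (high_knee_db - low_knee_db)
    return mid_top + (input_db - high_knee_db)


if __name__ == "__main__":
    for level in (0, 30, 60, 90, 120):
        print(level, "dB in ->", round(bm_output_level_db(level), 1), "dB out")
```

Under these assumed parameters the 60 dB wide mid-range segment is squeezed into 12 dB of output, which is the kind of range reduction the paragraph describes; real compression characteristics vary with frequency and cochlear health.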
2.4 Auditory Models
Auditory models can be divided into two broad groups: physiological models and
computational models. Physiological models attempt to reproduce the complex
hydromechanical activities of the cochlea by partial differential and integral equations.
Computational models, on the other hand, are much simpler than the physiological
models. They also attempt to predict the behaviour of the cochlea, but with less emphasis placed
on the physiological responses.
One computationally efficient model is Lyon’s cochlea model (Lyon 1982, 1983, 1984). The
model describes the propagation of sounds in the cochlea and the conversion of acoustical
energy into the firing of neurons. The detailed description of the model implementation can
be found in Slaney (1988). The filterbank consists of a series of notch filters that model the
travelling pressure waves and resonators that transform the pressure wave into basilar
membrane motion. A notch filter and a resonator together form a bandpass filter, with a very
sharp high frequency cutoff. The spacing between the centre frequencies is approximately
linear below 1 kHz and logarithmic above 1 kHz. Half-wave rectifiers that follow the
filterbank model the inner hair cells. The half-wave rectifier performs a non-linear detecting
operation of the inner hair cells. The compressive operation of the basilar membrane is
modelled by four stages of AGC. Each AGC uses a different time constant to simulate
different adaptation times in the ear. Each AGC is coupled with the nearest neighbours to
include the simultaneous masking effect of the ear. The output of the AGCs represents the
time-varying probability of firing of the neurons.
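The rectifier and cascaded AGC stages described above can be sketched in a few lines. This is a simplified illustration, not Lyon's or Slaney's implementation: it processes a single channel, omits the notch/resonator filterbank and the coupling between neighbouring channels, and the four time constants and the gain law `1 / (1 + state)` are assumptions chosen only to show the principle. All function names are hypothetical.

```python
import math


def half_wave_rectify(samples):
    """Keep positive half-cycles only, a crude stand-in for inner hair cell detection."""
    return [max(0.0, s) for s in samples]


def agc_stage(samples, time_constant_s, sample_rate_hz):
    """One feedback AGC stage: the gain falls as the smoothed output level rises."""
    # One-pole smoothing coefficient derived from the stage time constant.
    eps = 1.0 - math.exp(-1.0 / (time_constant_s * sample_rate_hz))
    state = 0.0
    out = []
    for x in samples:
        y = x / (1.0 + state)        # apply the current gain
        state += eps * (y - state)   # track the output level slowly
        out.append(y)
    return out


def agc_cascade(samples, sample_rate_hz,
                time_constants_s=(0.64, 0.16, 0.04, 0.01)):
    """Rectifier followed by four AGC stages with different (assumed) time constants."""
    out = half_wave_rectify(samples)
    for tc in time_constants_s:
        out = agc_stage(out, tc, sample_rate_hz)
    return out


def steady_output(amplitude, sample_rate_hz=1000, seconds=3.0):
    """Output after a constant input of the given amplitude has settled."""
    n = int(sample_rate_hz * seconds)
    return agc_cascade([amplitude] * n, sample_rate_hz)[-1]


if __name__ == "__main__":
    soft, loud = steady_output(1.0), steady_output(100.0)
    print("input ratio 100, output ratio", round(loud / soft, 2))
```

With a constant input, each stage settles to an output that grows more slowly than its input, so a 100:1 input ratio emerges as a much smaller output ratio. Using several stages with different time constants lets the cascade respond to both rapid and gradual level changes.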
Figure 2-3 Block diagram of Lyon's auditory model
The computational model of Lyon and Mead (1988) was derived from Lyon’s cochlea
model described above. One of the important changes incorporated in this model was the
replacement of time-invariant gain mechanisms with an adaptive active gain model of the outer
hair cells. Time-varying gain factors resulted in Q-factors of the filters that change in time as
a function of input sound pressure level. The gain control behavior of the cochlea is effectively
coupled with the nearest neighbors such that a large signal detected in one place can reduce
the overall gain at other places. Iso-output curves are also sharpened by the coupled AGC
action.
Another widely used computational model was developed by Kates (1991). In Kates’ model,
the nonlinear compressive behavior of the filters in response to the change in the sound
pressure level and the response due to the damage of the hair cells were included.
2.5 Loudness Perception
Loudness is defined as the attribute of the auditory sensation in terms of which sound can be
ordered on a scale extending from quiet to loud (Moore 2003a). Loudness is measured by
psychophysical procedures such as scaling and matching.
In a direct scaling method, the subject chooses a number corresponding to the loudness of a
stimulus. Stevens’ power law proposed a relationship between the magnitude of a physical
stimulus and its perceived intensity or strength (Stevens 1957). Based on his theory, the
loudness of a pure tone has been described by a power function of sound intensity:
$L = k I^{\alpha}$ (2.1)
where $L$ is the subjective unit of loudness, $k$ is a proportionality constant, $I$ is the intensity of
a tone and $\alpha$ is the exponent. $\alpha$ was approximately 0.6 for loudness estimation done with a
3 kHz tone with randomly chosen amplitude (Stevens 1975). The power or the scaling
term in the log domain is often used in relating the psychological aspects to the physical
aspects of the stimuli. Normal hearing people can reliably report perceived loudness from
intensity changes and discriminate fine intensity differences over a very large dynamic
range. Changes in intensity are highly correlated with loudness changes. However the
perceived loudness can also be affected by changes in frequency, duration and bandwidth of
a stimulus although the intensity remains fixed (Yost and Nielsen 1994).
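As a worked illustration of the power-law relationship between intensity and loudness described above, the sketch below computes how much the predicted loudness grows for a given change in level. The exponent default of 0.6 is the value quoted in the text; the constant `k`, the function names and the example levels are assumptions for illustration.

```python
def stevens_loudness(intensity, k=1.0, exponent=0.6):
    """Loudness predicted by a power function of intensity (arbitrary units)."""
    return k * intensity ** exponent


def loudness_ratio_for_level_change(delta_db, exponent=0.6):
    """Ratio of predicted loudness after a level change of delta_db decibels.

    A change of delta_db corresponds to an intensity ratio of 10**(delta_db / 10),
    and the proportionality constant k cancels in the ratio.
    """
    intensity_ratio = 10.0 ** (delta_db / 10.0)
    return intensity_ratio ** exponent


if __name__ == "__main__":
    for delta in (1.0, 5.0, 10.0):
        ratio = loudness_ratio_for_level_change(delta)
        print(f"+{delta} dB -> predicted loudness x {ratio:.2f}")
```

Because a power law is a straight line on log-log axes, the exponent is simply the slope of log loudness against log intensity, which is why the text refers to it as the scaling term in the log domain.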
In a matching procedure, two stimuli are presented to the listener to match their equality
based on some physical attributes. For example, tones with equal intensity are not equally
loud if their frequencies are different. Fletcher and Munson are the pioneers of loudness
equalization across frequency. They produced the first equal-loudness contours using the
loudness matching procedure (Fletcher and Munson 1933; Fletcher and Munson 1937). In
their study, the subjects were asked to match the loudness of tones at different frequencies to
a reference tone at 1 kHz. The intensity of the reference tone was adjusted until it was
perceived to have the same loudness as the test tone. The lowest equal-loudness contour
represents the threshold of hearing and the highest contour represents the threshold of pain.
The threshold of hearing is typically used for testing hearing level (HL). For example, a
person with 40 dB HL at 1 kHz is 40 dB less sensitive than normal hearing subjects at that
frequency.
While the thresholds of audibility are important to define the dynamic range of the auditory
system over frequency, it is also important to know the sensitivity to discriminate a change
in intensities and frequencies. The auditory system can detect a change of approximately
0.5 to 1 dB across a broad range of frequencies and intensities. The smallest detectable
change in intensity is called the just-noticeable difference (JND). The JND of
intensity can also be described by the Weber fraction $\Delta I / I$. For many conditions, the Weber
fraction is not constant. This subtlety in auditory processing is often described as the “near miss to
Weber’s law” (Yost and Nielsen 1994).
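The two ways of expressing an intensity JND mentioned above are linked by the standard relation ΔL = 10·log10(1 + ΔI/I), where ΔL is the level difference in dB and ΔI/I is the Weber fraction. This relation is not stated in the thesis text; the short sketch below uses it, with hypothetical function names, to convert the quoted 0.5 to 1 dB range.

```python
import math


def weber_fraction_from_jnd_db(jnd_db):
    """Relative intensity change (delta I / I) for a level JND given in dB."""
    return 10.0 ** (jnd_db / 10.0) - 1.0


def jnd_db_from_weber_fraction(weber_fraction):
    """Level JND in dB for a given relative intensity change (delta I / I)."""
    return 10.0 * math.log10(1.0 + weber_fraction)


if __name__ == "__main__":
    for jnd in (0.5, 1.0):
        frac = weber_fraction_from_jnd_db(jnd)
        print(f"JND of {jnd} dB -> Weber fraction of about {frac:.3f}")
```

A 1 dB JND therefore corresponds to an intensity change of roughly 26%, and a 0.5 dB JND to roughly 12%.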
2.6 Speech Perception
Speech is a time-varying complex waveform with intensity modulating in both time and
frequency domain. Speech communication occurs in various environments. Between a
speaker and a listener, speech is subjected to interferences such as background noise from
external sources and reverberation by room acoustics and distortion due to poor transmission
channels. Yet speech, due to its redundant nature, shows resistance to noise and distortion
with little to no degradation in intelligibility. The work of Licklider and Pollack (1948)
showed that speech was still intelligible when all the amplitude information was eliminated
by infinite peak-clipping. One of the qualities of speech is its resistance to distortion.
Assmann and Summerfield reviewed the contribution of redundancy of speech under adverse
listening conditions (Chapter 5, Greenberg et al. 2004). They defined adverse listening
conditions as any perturbation of the communication process resulting from either an error in
production by the speaker, channel distortion or masking in transmission or a distortion in
the auditory system of the listener.
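Infinite peak clipping, as mentioned above, can be illustrated by reducing a waveform to its sign. The sketch below is a generic illustration, not the procedure of Licklider and Pollack; the test signal, sample rate and function names are assumptions. It shows that all amplitude information is removed while the zero-crossing pattern is retained.

```python
import math


def infinite_peak_clip(samples):
    """Replace every sample by +1 or -1 according to its sign."""
    return [1.0 if s >= 0.0 else -1.0 for s in samples]


def zero_crossings(samples):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(samples, samples[1:])
               if (a >= 0.0) != (b >= 0.0))


def am_tone(sample_rate_hz=8000, n=8000, carrier_hz=200.0, mod_hz=4.0):
    """Amplitude-modulated tone used as a stand-in for a speech-like signal."""
    out = []
    for i in range(n):
        t = i / sample_rate_hz
        envelope = 1.0 + 0.8 * math.sin(2.0 * math.pi * mod_hz * t)
        out.append(envelope * math.sin(2.0 * math.pi * carrier_hz * t))
    return out


if __name__ == "__main__":
    signal = am_tone()
    clipped = infinite_peak_clip(signal)
    print("zero crossings before:", zero_crossings(signal))
    print("zero crossings after: ", zero_crossings(clipped))
```

The clipped signal has a flat envelope, yet its zero crossings are unchanged, which is one way to see why some information survives this extreme distortion.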
Research on speech perception has tried to uncover what features of speech are robust and
what features are critical for intelligibility. Redundancy of speech occurs at the acoustic,
phonetic and linguistic levels (Greenberg et al. 2004). At the acoustic level, the redundancy
is shown by the covariation in the pattern of the amplitude modulation across frequency and
time. At the phonetic level, one phonetic cue can be distinguished by many acoustic cues
(Klatt 1989). Linguistic knowledge and contextual information support the robustness of
speech at the sentence level.
Rosen (1992) categorized temporal information of speech into three groups: (i) envelope
cues from 2 to 50 Hz, (ii) periodicity cues from 50 to 500 Hz and (iii) temporal fine structure
cues from 600 Hz to 10 kHz. Low-rate modulation of speech between 5 and 50 Hz conveys
important information for segmental (phonetic) and suprasegmental (stress pattern, word
onset/offset, prosody and speaking rate) distinctions of speech. From the study of Plomp
(1983), the modulation frequencies of stress patterns, words, syllables and phonemes
are around 1 Hz, 2.5 Hz, 5 Hz and 12 Hz respectively.
According to the studies of Drullman and coworkers (Drullman, Festen and Plomp 1994a,
b), speech intelligibility was maintained as long as temporal envelope modulation below 16
Hz was preserved. The modulation spectrum can be sensitive to noise and to distortions such as
peak-clipping and filtering. For speech recognition in noisy conditions, other temporal cues
such as periodicity help to identify the target speaker. Periodicity cues are relevant to
intonation, voicing and consonant manner. Lastly, temporal fine structures are relevant to
information regarding consonant place and vowel quality.
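The notion of a slowly varying temporal envelope can be made concrete with a common textbook approach: full-wave rectification followed by a low-pass filter. The sketch below is a generic illustration, not the band-wise processing used in the cited studies. The one-pole filter, the 16 Hz cutoff (borrowed from the modulation limit quoted above) and the function names are assumptions.

```python
import math


def temporal_envelope(samples, sample_rate_hz, cutoff_hz=16.0):
    """Full-wave rectify, then smooth with a one-pole low-pass filter."""
    # Smoothing coefficient of a first-order low-pass with the given cutoff.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    state = 0.0
    envelope = []
    for s in samples:
        state += alpha * (abs(s) - state)
        envelope.append(state)
    return envelope


def tone(freq_hz, sample_rate_hz, n, amplitude=1.0):
    """Steady sinusoid used here as a simple test input."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(n)]


if __name__ == "__main__":
    fs = 16000
    env = temporal_envelope(tone(1000.0, fs, fs // 2), fs)
    # For a steady tone the envelope settles near the mean of |sin|, about 2/pi.
    print(round(env[-1], 3))
```

Modulations faster than the cutoff, including the carrier itself, are smoothed away, so only the slow amplitude fluctuations associated with envelope cues remain in the output.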
Formants are vocal tract resonances that provide both phonetic information and cues to the
speaker’s identity. The frequency location of the first three formants together with their transition
in time provides phonetic cues for vowels and certain consonants. Vowels are voiced sounds,
which are produced by vibration of the vocal folds in the larynx. Vowels form the nucleus of
syllables and possess distinct energies at the formant frequencies. Different vowels can be
distinguished by the frequencies of their first and second formants (F1 and F2). Vowel
perception involves an analysis of the spectral profile.
Consonants are pronounced with pressure, a constriction or closure at some point along the
vocal tract. The closure and release of the vocal tract create rapid spectral change. Consonants
differ from vowels in their rate of change in short-time frequency content. According to
Stevens (1983), the density of speech information is highest during consonantal closures
and onsets. Each consonant can be distinguished by several phonetic features: manner, place
and voicing. Compared to vowels, consonants are more susceptible to masking by noise
because they are short in duration and low in energy.
Both spectral and temporal information are important for speech intelligibility but some
components of speech rely more on the temporal information and some on the spectral
information. For example, vowels, semivowels and nasals have a good spectral
representation in terms of formants but little temporal modulation information. On the other
hand, some consonants such as plosives and fricatives contain little spectral information.
They are mainly determined from the modulated waveform in the time domain. Both the
envelope modulation and amplitude fine structure carry the intelligibility of speech.
Envelope modulation is more important to speech intelligibility than fine structure as speech
can still be intelligible without fine temporal structures.
The Long-Term Average Speech Spectrum (LTASS) has been studied for the spectral
characteristics of speech in adverse conditions (Dunn and White 1940; French and Steinberg
1947; Licklider and Miller 1951). LTASS has a 25 dB range of variation in average level
across frequency with most of the energy lying below 1 kHz. LTASS is dominated by voiced
sounds, with the energy peaking approximately at 500 Hz and gradually decreasing above
that. The low frequency emphasis of speech improves robustness in adverse conditions. For
instance, the reliability of phase locking below 1.5 kHz is supported by the low frequency
emphasis (Greenberg 1996). The first three formants below 3 kHz carry most intelligibility.
Common periodicity and interaural timing cues for separation of target speech and other
sounds are preserved in the low frequency neural discharge pattern (Greenberg et al. 2004).
2.7 Conclusion
Speech is a robust signal because it has redundant cues at different levels. Even heavily
distorted speech can still be intelligible to normal hearing listeners, owing to robust peripheral
processing. The peripheral hearing system can handle a 120 dB range of sounds. Filtering in
various stages of the ear improves robustness. However, for hearing prostheses such as
cochlear implant systems, it is important to preserve critical cues to maintain speech
intelligibility because the impaired hearing system cannot process all the available acoustic
features. The electric hearing of a cochlear implant system and speech perception of cochlear
implant recipients will be studied in the next two chapters.
Chapter 3 Cochlear Implants
3.1 Introduction
A cochlear implant is a solution for persons with severe-to-profound deafness who can no
longer benefit from hearing aids for speech understanding. Electrical stimulation of the
auditory nerves with the cochlear implant partially restores hearing. This chapter briefly
describes the history of cochlear implants and the clinical aspects. It then describes the
components of a cochlear implant system, the principles of operation and the psychophysics.
Finally, it examines the effects of cochlear compression loss on loudness and speech
perception.
3.2 Brief History of Cochlear Implants
Single-electrode cochlear implants were experimented with between 1950 and 1980. Djourno and
Eyries (1957) implanted a telecoil on the auditory nerve and stimulated with a burst of 100
Hz signals at a rate of 15 to 20 times per minute. With postoperative rehabilitation, the
patient could distinguish certain words but speech recognition was not developed. House and
Urban (1973) implanted a single electrode in the scala tympani of three patients. The patients
experienced hearing sensation but the study did not elaborate on speech understanding.
Building on Helmholtz’s place theory, cochlear implants with multiple electrodes were
experimented with in the 1970s to deliver different frequency components to different places
along the basilar membrane. The University of Melbourne’s multichannel electrode array was
implanted in a recipient in 1978 (Clark 2003). The subject perceived both place pitch and rate
pitch, with a significant improvement in open-set speech understanding. With technology advances
over four decades, cochlear implant recipients today have speech understanding comparable to
that of normal hearing persons in favourable listening conditions.
Initially, only patients with bilateral profound sensorineural hearing loss and no open-set
speech recognition were considered candidates for cochlear implantation. Nowadays,
individuals with greater residual hearing and pre-implant speech recognition
scores are also considered for the implant (Gifford 2011). Most importantly, a candidate
needs to have some functional auditory nerves for a cochlear implant to work successfully.
Hence the audiometric threshold and speech recognition results are key factors in determining
candidacy for adults. Post-lingually deaf adults and pre-lingually deaf children are the
potential candidates for cochlear implants. Candidacy for children also considers age,
etiology, auditory progress with hearing aids and, for older children, speech recognition
performance. Children who receive implants before two years of age show the
most benefit from cochlear implantation (Kral and O'Donoghue 2010). Such children show
language skills equivalent to those of normal-hearing children. According to
Yoshinaga-Itano et al. (1998), early identification of hearing loss and early intervention can
provide significantly better language development.
Profound deafness can have a great impact on education, employment and social life. Hence
cochlear implants have improved the quality of life for the recipients. The cochlear implant
has been labelled as the most successful and effective implantable prosthesis in terms of
restoring function to recipients (Wilson and Dorman 2008). As of today, more than 181,000
people have received Nucleus cochlear implants. A combined effort from various disciplines
including bioengineering, physiology, otolaryngology, speech science, material science and
signal processing technology has contributed to the success of cochlear implants.
3.3 Cochlear Implant Systems
Figure 3-1 Cochlear implant system
A cochlear implant system consists of (1) a radio frequency (RF) transmitter coil, (2) a sound
processor, (3) a receiver-stimulator and (4) an electrode array. The numbers are as labelled in
Figure 3-1. Early sound processors were body worn but modern processors are worn behind
the ear (BTE). The receiver-stimulator implant unit is placed under the skin behind the ear. The
electrode array is surgically inserted into the scala tympani of the cochlea. The sound
processing steps of a cochlear implant system are as follows:
• The microphones of the sound processor capture incoming acoustic signals
• The sound processor processes the captured audio and transmits encoded data and
power to the implant transcutaneously via the RF transmitter coil
• The internal receiver-stimulator circuit decodes the transmitted data and sends a
pulse train to the electrode array inside the cochlea
• The electrical current pulses excite the neurons of the auditory nerve to give hearing
perception to the recipient
At present, there are four major cochlear implant companies: Cochlear Limited (Sydney,
Australia), Med-El (Austria), Advanced Bionics (USA) and Neurelec (France). Table 3-1
lists recent cochlear implant models and default sound coding strategies of the CI
manufacturers.
| Manufacturer | Implant model | Number of electrodes | Sound coding strategy |
|---|---|---|---|
| Cochlear Limited | CI24RE | 22 intra-cochlear electrodes + 2 extra-cochlear electrodes | Continuous Interleaved Sampling (CIS), Spectral Peak (SPEAK), Advanced Combinational Encoder (ACE) |
| Med-El | Pulsar | 19 intra-cochlear electrodes + 2 extra-cochlear electrodes | CIS, Fine Structure Processing (FSP), High Definition CIS (HD-CIS) |
| Advanced Bionics | HiRes 90K | 16 intra-cochlear electrodes + 2 extra-cochlear electrodes | CIS, HiResolution (HiRes) strategies |
| Neurelec | Digisonic | 20 intra-cochlear electrodes + 2 extra-cochlear electrodes | CIS, Main Peaks Interleaved Sampling (MPIS) |
Table 3-1 Recent cochlear implant systems of the major cochlear implant companies
Cochlear Limited have manufactured three generations of Nucleus cochlear implants. They
can be distinguished by the Integrated Circuit (IC) as shown in Table 3-2. Nucleus implants
have 22 intra-cochlear electrodes. In addition, CI24M, CI24R and CI24RE implants have
two extra-cochlear electrodes.
| System name | Implant model | IC | RF link frequency (MHz) | Total pulse rate (pps) |
|---|---|---|---|---|
| Nucleus 22 | CI22M | CIC1 | 2.5 | 1500 |
| Nucleus 24 | CI24M, CI24R | CIC3 | 5.0 | 14400 |
| Nucleus Freedom | CI24RE | CIC4 | 5.0 | 31500 |
| Nucleus 5 | CI24RE | CIC4 | 5.0 | 31500 |
Table 3-2 Nucleus cochlear implant models
The first extra-cochlear electrode is a ball electrode connected by a separate lead wire and
the second extra-cochlear electrode is the platinum plate mounted on the titanium package
(see the left picture of Figure 3-2).
Figure 3-2 Nucleus 5 system
Over the last 18 years, Cochlear Ltd has produced five generations of Nucleus sound
processors. Table 3-3 lists the Nucleus sound processors together with the type of filterbank.
Because the filterbank consumed the most power of all the signal processing stages, decisions
regarding the implementation of the filterbank determined the overall processor architecture
(Swanson et al. 2007). Two different technologies have been used to implement the filterbank:
switched capacitor filters (SCF) and digital signal processing (DSP). Modern sound processors
are DSP-based. The AGC systems used in the Nucleus systems are described in Chapter 5.
| Sound processor name | Processor type | Filterbank type |
|---|---|---|
| Spectra | Body worn | SCF |
| SPrint | Body worn | DSP |
| ESPrit | BTE | SCF |
| Freedom | BTE | DSP |
| CP810 | BTE | DSP |
Table 3-3 Nucleus sound processors
3.4 Electrical Stimulation
During the first fitting session after the implantation, a recipient’s sound processor
communicates with the implant for the first time. The programming software determines the
processor type and speech processing strategy to generate basic stimulation parameters such
as stimulation mode and rate. Using these basic stimulation parameters, a clinician measures
the current level for each electrode pair to obtain the usable dynamic range of the electrical
current stimulation on the auditory nerve. Traditionally, short bursts of pulses are presented
on a single channel at increasing current levels to determine the T and C levels. The minimum
current level that provides a consistent hearing sensation is called the T-level. The maximum
current level that gives a comfortably loud hearing sensation is called the C-level. The clinical
software then creates a program, or MAP, for the recipient to encode the amplitudes of
acoustic signals into electrical stimulation levels.
3.4.1 Stimulation Mode
The stimulation mode refers to an electrode configuration for the current flow between an
active and a reference electrode. The most commonly used stimulation mode in cochlear
implant systems from all manufacturers is monopolar mode, where current flows between an
intra-cochlear (active) electrode and an extra-cochlear (reference) electrode.
3.4.2 Current Configuration
All commercial implants use a charge-balanced biphasic waveform as shown in the left
diagram of Figure 3-3. One current pulse contains two phases of equal duration and
amplitude. This form of stimulation is safe because no net charge remains in the tissue or
electrode bands. Each biphasic current pulse is delivered between an active electrode and a
reference electrode. The opposite polarity of the two phases indicates the direction of current
flow between the active electrode and the reference electrode. The loudness perceived from
the electrical stimulation is influenced by both current level and pulse width (Shannon 1983).
Loudness is determined by the total charge delivered by a pulse. Total charge is the product
of the current level and the duration of the pulse (i.e. the pulse width). Hence the level of
loudness can be adjusted by changing either the current level or the pulse width such that
pulses with an equal charge produce equal loudness sensation. The right plot of Figure 3-3
shows two biphasic pulses with an equal charge.
Figure 3-3 Stimulation of the biphasic waveform (left panel) and two current
waveforms with equal charge (right panel)
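The equal-charge relationship described above can be illustrated with a minimal sketch. The function name and the example currents and pulse widths are hypothetical values chosen only for illustration; they are not parameters of any particular implant.

```python
def charge_per_phase_nc(current_ua: float, pulse_width_us: float) -> float:
    """Charge delivered by one phase, in nanocoulombs (Q = I * t).

    1 uA * 1 us = 1e-12 C = 1e-3 nC.
    """
    return current_ua * pulse_width_us * 1e-3


# Two illustrative pulses: halving the current while doubling the
# pulse width leaves the charge per phase unchanged.
narrow_pulse = charge_per_phase_nc(current_ua=400.0, pulse_width_us=25.0)
wide_pulse = charge_per_phase_nc(current_ua=200.0, pulse_width_us=50.0)
print(narrow_pulse, wide_pulse)
```

Under the charge-based description of loudness given in the text, these two pulses would be expected to produce a similar loudness sensation.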
In the Nucleus 24 implant, the actual current level i in microamperes (µA) is calculated from
the clinical current level c by the following equation:
$$ i = I_{\min} \left( \frac{I_{\max}}{I_{\min}} \right)^{c/255} \tag{3.1} $$
where $I_{\min}$ is the minimum current level, produced when the clinical current value c is
zero, and $I_{\max}$ is the maximum current level, produced when c = 255 (Swanson
2008). The clinical current level in the Nucleus 24 system is specified by an 8-bit value from
0 to 255. The current calculation can also be expressed in terms of an exponential function:
$$ i = I_{\min} \, e^{rc} \tag{3.2} $$
where r = 0.0203 indicates an increment of approximately 2% in the current level for each step
increase in the clinical current unit.
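The exponential relationship between clinical units and current can be sketched as follows. The end-point currents of 10 µA and 1750 µA are assumptions made for this illustration (they are consistent with the quoted r ≈ 0.0203 over 255 steps, but are not stated in the text above), and the function name is hypothetical.

```python
import math

I_MIN_UA = 10.0     # assumed current at clinical level 0 (illustrative)
I_MAX_UA = 1750.0   # assumed current at clinical level 255 (illustrative)
MAX_LEVEL = 255     # largest 8-bit clinical current level

# Growth constant of the exponential form i = I_MIN * exp(R * c).
R = math.log(I_MAX_UA / I_MIN_UA) / MAX_LEVEL


def clinical_to_microamps(c: int) -> float:
    """Convert a clinical current level (0..255) to current in microamperes."""
    if not 0 <= c <= MAX_LEVEL:
        raise ValueError("clinical level must be in 0..255")
    return I_MIN_UA * math.exp(R * c)


step_ratio = clinical_to_microamps(101) / clinical_to_microamps(100)
print(round(R, 4), round(step_ratio, 4))
```

Because each step multiplies the current by the same factor, equal steps in clinical units correspond to equal steps in decibels of current.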
The Nucleus implants can only produce non-simultaneous current pulses because a single
current source is available. The flow of current is routed between electrodes by electronic
switches. An example of sequential current pulses with different amplitudes is shown in
Figure 3-4.
Figure 3-4 Sequential pulse stimulation, showing timing and amplitudes (channel number versus time in ms)
3.5 Loudness Perception
The narrow dynamic range for electrical stimulation is a direct consequence of the loss of
cochlear gain control function (Chapter 6, Bacon, Fay and Popper 2004). The electrical
dynamic range is defined as the ratio between C and T currents. Stevens’ power law states
that the loudness grows as a power of sound intensity with the exponent of 0.6 for the
acoustic hearing (Equation 2.1). Fu and Shannon (1998b) estimated the loudness perceived by
three Nucleus 22 recipients for current levels between the T and C levels at a stimulation rate
of 500 pps. The electrical loudness estimate was well fitted by a power function of the
current level in µA with an exponent $\beta$ = 2.7:
$$ L = k \, i^{\beta} \tag{3.3} $$
where k is a subject-specific constant and i is the current level in µA. The power-function form
of the loudness perceived by cochlear implant recipients is in agreement with Stevens’ power
law. The larger exponent indicates a more rapid loudness growth with the current level. Loudness
is normalized by the reference current level $i_{\mathrm{ref}}$, which produces a loudness of one unit
(Swanson 2008). The loudness as a function of current can be calculated by:
$$ L = \left( \frac{i}{i_{\mathrm{ref}}} \right)^{\beta} \tag{3.4} $$
When $i_{\mathrm{ref}}$ and i in the above equation are substituted using equation 3.2, with
$c_{\mathrm{ref}}$ denoting the clinical level of the reference current, L becomes:
$$ L = e^{\beta r (c - c_{\mathrm{ref}})} \tag{3.5} $$
Taking the natural logarithm of both sides of equation 3.5, the log loudness becomes:
$$ \ln L = \beta r \, (c - c_{\mathrm{ref}}) \tag{3.6} $$
Thus the loudness on a log scale is a linear function of the clinical current level.
The dynamic range of neurons in acoustic hearing varies from 10 to 50 dB depending on the
spontaneous activity. On the other hand, the dynamic range of electrical stimulation is
uniformly narrow from 5 to 20 dB even though T and C levels are different between subjects
(Bacon, Fay and Popper 2004).
The number of discriminable current steps is also important. Nelson et al. (1996) measured
the difference limens (DLs) of eight cochlear implant subjects for changes in electrical
current, using 300 ms burst stimuli at a rate of 125 Hz. Their study showed that the intensity
discrimination of cochlear implant recipients can be quantified by Weber fractions in
decibels, $10\log_{10}(\Delta I/I)$, as in acoustic stimulation. The average exponent of the Weber
functions for electric stimulation is an order of magnitude higher than that for acoustic
stimulation. This was attributed to the loss of cochlear compression when the nerve is
stimulated directly. The cumulative number of discriminable intensity steps across the dynamic
range of electric hearing ranged from as few as 6.6 to as many as 45.2.
3.6 Conclusion
This chapter described the principles of cochlear implant hearing. With an electrode array,
place coding is performed by stimulating neurons at different places along the cochlea.
Loudness is determined by the total charge delivered by a current pulse, so it can be adjusted
by changing either the pulse width or the amplitude. The number of discriminable intensity
steps in cochlear implant hearing is limited, ranging from about 7 to 45. Speech represented by
the electrical current waveforms is crude due to the limited spectral, temporal and intensity cues
that can be transmitted to the implant. Therefore it is important for a sound coding strategy to
selectively present important cues for the intelligibility of speech and other sounds. The next
chapter discusses the sound coding strategies of cochlear implant systems and the speech
perception of cochlear implant recipients.
Chapter 4 Cochlear Implant Sound Coding Strategies
4.1 Introduction
The previous chapter described cochlear implant hearing through electrical stimulation of the
auditory nerve by a multiple-channel electrode array. Place and rate coding of frequency are
important for speech perception. In this chapter, the sound coding strategies of cochlear implant
systems are studied to see how each strategy transforms acoustic stimuli into electrical
stimuli to provide the spectral, temporal and intensity cues for hearing.
4.2 Sound Coding Strategies
A sound coding strategy transforms acoustic signals into electrical current pulses. The main
objective of a sound coding strategy is to convey the essential cues of the speech waveform
needed for speech understanding by cochlear implant recipients. Early sound coding strategies for the
Nucleus systems extracted key speech features from sounds for speech understanding. The
F0/F2 strategy coded the fundamental frequency (F0) as the rate of stimulation and the
second formant (F2) as place of stimulation. The F0/F2 strategy was later upgraded to the
F0/F1/F2 strategy by stimulating one more electrode that corresponded to the first formant
frequency (F1). The F0/F1/F2 strategy was extended to the Multipeak strategy by stimulating
two additional electrodes at the basal end to include frequency information above 2 kHz.
These strategies worked well for listening conditions in quiet (Clark 2003).
Filterbank-based speech processing algorithms were developed in the 1990s. These
strategies present the spectral information of sounds by place of stimulation, making best use
of the tonotopic structure of the cochlea. More importantly, the filterbank-based strategies
provided higher speech intelligibility than the feature-based strategies. The signal processing
framework of the filterbank-based sound processing strategies is shown in Figure 4-1. The
front-end performs frequency shaping and level adjustment on the audio signals captured by
the microphones. It typically comprises a pre-emphasis filter, a sensitivity
control and an AGC system. The filterbank splits the audio signal into frequency bands such
that each frequency band is allocated to one stimulation channel. The sampling and selection
block determines the rate of stimulation and the shape of stimuli across the frequency
channels. The amplitude mapping block converts the channel amplitudes into current levels
within the predetermined electrical dynamic range of each electrode.
Figure 4-1 Cochlear implant sound processing (Swanson 2008)
SPEAK, ACE, CIS and HiRes are the most widely used sound coding strategies in
cochlear implant systems. These strategies can be sorted into two groups based on the
channel selection strategy. CIS and HiRes stimulate all frequency channels, whereas ACE
and SPEAK pick a subset of frequency channels with the maximum amplitude levels.
4.2.1 Continuous Interleaved Sampling (CIS)
The Continuous Interleaved Sampling (CIS) sound coding strategy was initially developed
for the Ineraid implant with six electrodes that are connected directly to a percutaneous plug
(Wilson et al. 1991). The CIS strategy is offered by all manufacturers (Table 3-1). CIS uses
as many frequency bands as there are active electrodes. CIS stimulates
different sites on the cochlea based on the frequency content of the input signals. It
emphasizes presenting the rapid temporal variation of the input signals via a high stimulation
rate. Each of the filterbank envelopes is sampled at a fixed rate, with the current pulses on
each channel interleaved in a round-robin fashion. The signal processing blocks of the CIS
strategy are shown in Figure 4-2. The pre-emphasis filter attenuates the frequency components
below 1.2 kHz at 6 dB/octave to reduce the dominating low-frequency noise and to increase
the energy level of the relatively weak consonants compared to vowels. The filterbank is
followed by full-wave rectification and lowpass filtering for envelope extraction. The
frequency channels are assigned to the selected electrodes, following the tonotopic order of
the cochlea.
Figure 4-2 Continuous Interleaved Sampling strategy (Wilson 2006b)
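The round-robin interleaving described above can be sketched as a pulse schedule in which every channel is sampled at the same fixed rate and no two pulses coincide. This is only an illustration of the timing principle; the function name, the six-channel configuration and the 800 pps per-channel rate are hypothetical.

```python
def cis_pulse_schedule(n_channels: int, channel_rate_pps: float, n_cycles: int):
    """Return (time_in_seconds, channel) pairs for interleaved, non-overlapping pulses."""
    cycle_period = 1.0 / channel_rate_pps   # interval between pulses on one channel
    slot = cycle_period / n_channels        # time slot allotted to each channel
    schedule = []
    for cycle in range(n_cycles):
        for ch in range(n_channels):
            schedule.append((cycle * cycle_period + ch * slot, ch))
    return schedule


schedule = cis_pulse_schedule(n_channels=6, channel_rate_pps=800.0, n_cycles=2)
for time_s, ch in schedule:
    print(f"{time_s * 1e3:.3f} ms -> channel {ch}")
```

In this sketch the total pulse rate is the per-channel rate multiplied by the number of channels, which is why the achievable per-channel rate falls as more channels are stimulated.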
4.2.2 HiResolution (HiRes)
The HiResolution (HiRes) strategy (Firszt 2003), offered by Advanced Bionics, is a close
variation of the CIS strategy. In HiRes there are 16 logarithmically spaced filters in the
filterbank that correspond to 16 active electrodes in the cochlea. The principal differences
between HiRes and CIS are the number of stimulation channels, the rate of stimulation and
the use of a half-wave rectifier only for envelope detection in HiRes (Wilson 2006a). HiRes
can provide temporal information up to 2800 Hz across 16 frequency channels. The maximum
stimulation rate depends on the number of electrodes used and the pulse width.
4.2.3 Spectral Peak (SPEAK)
The Spectral Maxima Sound Processing (SMSP) strategy was developed at the University of
Melbourne for the Nucleus 22 cochlear implant. In this scheme the six largest outputs of 16
frequency channels were used to stimulate at a constant rate of 250 pps. Cochlear
implemented SMSP on the Spectra sound processor of the Nucleus 22 system, where it became
well known as the SPEAK strategy (McDermott, McKay and Vandali 1992; McDermott et al.
1993).
The front-end of the Spectra processor involved a sensitivity control and a fast front-end
AGC. The sensitivity control adjusts the AGC compression threshold to increase the
dynamic range of the input signal. The fast front-end AGC had attack and release times of
2.5 ms and 50 ms respectively. A compression ratio of two was applied to signal peaks
above the compression threshold. SPEAK differs from SMSP in using a total of 20
frequency channels. The total stimulation rate was limited to 2000 pps in the Nucleus 22
implant. Hence the stimulation rate per channel would be no more than 100 pps if all 20
channels were stimulated. To provide adequate temporal information in the channel envelopes,
the number of channels was nominally fixed at six, for an average stimulation rate of 250 pps.
SPEAK selects the channels with the highest amplitudes. The number of channels
selected was based on the number of channel amplitudes above the preset noise threshold
and the energy distribution of the input signal across frequency. Therefore the number of
selected channels could sometimes be fewer than six. For that reason, the stimulation rate
varied from 180 to 300 pps.
4.2.4 Advanced Combinational Encoder (ACE)
The Advanced Combinational Encoder (ACE) sound coding strategy employs the maxima
selection method similar to the SPEAK strategy. The signal processing components of ACE
strategy are shown in Figure 4-3. ACE became available to the recipients when the Nucleus
24 system was launched in 1997. Since the Nucleus 24 implant is able to stimulate at a higher
rate, the number of maxima can be increased without reducing the stimulation rate per
electrode. The maximum stimulation rate of the Nucleus 24 implant is 14400 pps. Therefore
compared to SPEAK, ACE can present more spectro-temporal details of the input signals by
allowing more channels to be stimulated at a higher rate.
Figure 4-3 Signal processing modules of the ACE strategy
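The trade-off between the number of selected channels and the per-channel rate can be checked with simple arithmetic. The sketch below uses the total pulse rates listed in Table 3-2; the function name and the particular numbers of maxima are hypothetical, and the result is an upper bound that ignores any implementation overhead.

```python
def per_channel_rate_pps(total_rate_pps: float, n_selected: int) -> float:
    """Upper bound on the stimulation rate per channel when the total
    pulse rate is shared equally among the selected channels."""
    return total_rate_pps / n_selected


NUCLEUS22_TOTAL_PPS = 1500    # Table 3-2
NUCLEUS24_TOTAL_PPS = 14400   # Table 3-2

# Six maxima on the Nucleus 22 budget, versus several illustrative
# numbers of maxima on the Nucleus 24 budget.
print(6, per_channel_rate_pps(NUCLEUS22_TOTAL_PPS, 6))
for n_maxima in (6, 8, 12):
    print(n_maxima, per_channel_rate_pps(NUCLEUS24_TOTAL_PPS, n_maxima))
```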
4.2.5 Alternative Channel Selection Rules
There are also sound coding strategies that pick channels by a different set of rules.
The Psychoacoustic Advanced Combinational Encoder (PACE) strategy applies a
psychoacoustic masking model to select the perceptually important channels above the
masking threshold (Nogueira et al. 2005). According to speech tests with cochlear
implant recipients, the PACE strategy outperformed ACE when each selected only four
channels per stimulation cycle. Hu and Loizou (2008) proposed a sound coding strategy with
a channel selection criterion based on the SNR of the channels. The SNR in each channel
was estimated, and target-dominated channels with SNR ≥ 0 dB were selected, while
masker-dominated channels with SNR < 0 dB were discarded.
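The SNR-based selection rule can be sketched as follows. This is an illustration of the criterion only, not the published implementation: the per-channel signal and noise powers are assumed to be available already, and the function name and example values are hypothetical.

```python
import math


def select_target_dominated(signal_power, noise_power, threshold_db=0.0):
    """Return indices of channels whose SNR (in dB) is at or above the threshold."""
    selected = []
    for ch, (s, n) in enumerate(zip(signal_power, noise_power)):
        snr_db = 10.0 * math.log10(s / n)   # powers must be strictly positive
        if snr_db >= threshold_db:
            selected.append(ch)
    return selected


# Illustrative per-channel powers (arbitrary linear units).
signal_power = [4.0, 1.0, 0.5, 2.0]
noise_power = [1.0, 1.0, 1.0, 4.0]
print(select_target_dominated(signal_power, noise_power))  # channels 0 and 1
```

Unlike maxima selection, the number of channels retained by this rule varies from cycle to cycle, since it depends on how many channels are target-dominated.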
4.3 Signal Processing Modules
The previous section described the major sound coding strategies for cochlear implant
systems. The signal processing modules of the cochlear implant signal path are studied in
this section. These signal processing modules are within the framework of any sound coding
strategy.
4.3.1 Microphone Directionality and Pre-emphasis
A directional microphone can improve the SNR if target speech and other sounds come from
different directions, for example if the speaker is in front of the recipient and interfering
sounds come from behind. Early Nucleus sound processors only had a single directional
microphone. The Freedom sound processor employs a directional microphone and an omni-
directional microphone. The Nucleus CP810 sound processor has two omni-directional
microphones whose phase and magnitude are calibrated by digital filters in the firmware to
form different microphone directionalities. In addition to the fixed-directional spatial filters,
Freedom and CP810 sound processors also have an adaptive-directional spatial noise
reduction technique called BEAM (Spriet et al. 2007).
A delay-sum directional microphone has a frequency response that emphasizes high-frequency
components more than low-frequency components. The pre-emphasis filter
provides a gain of approximately 5 dB per octave between 0.5 and 4 kHz. It flattens the
energy presented to the filterbank for the long-term average spectrum of speech, and therefore
balances the energy ratio between vowels and consonants. In addition, the pre-emphasis
filter prevents the subsequent gain control systems from being driven predominantly by
intense low-frequency sounds, for example car noise or the recipient’s own voice. Without
pre-emphasis, maxima selection would select more low-frequency channels than
high-frequency channels.
4.3.2 Front-end Gain Control
The front-end gain control system consists of the sensitivity control and the front-end AGC
system. The purpose of the front-end gain control is to increase the operating range of
acoustic signals beyond the pre-determined input dynamic range. The front-end AGC
systems for the Nucleus sound processors will be described in chapter 5 (§5.5).
4.3.3 Filterbank
The filterbank splits the frequency components of the input signals into the frequency bands
whose centre frequencies correspond to the characteristic frequencies of the cochlea. The
filterbank can be a group of Infinite Impulse Response (IIR) or Finite Impulse Response
(FIR) filters, followed by half-wave rectification and low-pass filtering for envelope
extraction. Modern DSP processors, such as SPrint, Freedom and CP810, use an efficient
FFT implementation for the frequency analysis of the input audio waveform. Hence the FFT
filterbank is of interest and explained below.
A 128-point FFT is performed on each analysis frame. Before the FFT analysis, a Hann window is
applied to the input signal. Each analysis frame overlaps the previous frame so that
the analysis rate (the envelope sample rate) is close to the stimulation rate. The audio
sample rate of the Nucleus CP810 sound processor (§7.4.5.1) is approximately 16 kHz.
The Hann window is described by the following equation.
$$ w(n) = \frac{1}{2} \left[ 1 - \cos\left( \frac{2 \pi n}{N} \right) \right] \tag{4.1} $$
where N = 128 and n is the sample index from 1 to N.
The Discrete Fourier Transform (DFT) of an input sequence x(n) of N samples is described by the
equation below.
$$ X(k) = \sum_{n=0}^{N-1} x(n) \, e^{-j 2 \pi k n / N} \tag{4.2} $$
where k is the FFT bin and n is the sample index from 0 to N – 1.
In contrast to a direct DFT implementation, which needs on the order of $N^2$ multiply-accumulate
operations, the FFT implementation only requires on the order of $N \log_2 N$
multiply-accumulate operations. For a 128-point FFT analysis of a real input there are 65
distinct bins, with centre frequencies spaced linearly at multiples of 125 Hz.
4.3.4 Combine into Channels
A linear-log frequency spacing is required for the frequency bands: the centre frequencies
are linearly spaced below 1 kHz and logarithmically spaced above 1 kHz. For the bands
below 1 kHz, each band is assigned to one FFT bin. Bins 0 and 1 are discarded and the
assignment starts from bin 2. For frequency bands above 1 kHz, two or more consecutive
FFT bins are combined to produce wider bands. The default frequency range is 187 – 7937
Hz. The frequency allocation for a 22-channel filterbank is shown in Figure 4-4.
Figure 4-4 Magnitude response of 22-channel filterbank (magnitude in dB versus frequency in Hz)
The channel envelopes are calculated using the quadrature envelope detection method, which
combines the real and imaginary parts of the complex FFT samples in the allocated frequency
bands. Since the input signal is real, the FFT output is conjugate-symmetric between the first
half and the second half of the bins. Hence FFT bins 65 to 127 are not required for the envelope
calculation. There are two ways to calculate the channel envelopes: (i) power sum and (ii)
vector sum. The main difference between the two methods is that the vector sum passes
more high-frequency envelope modulation than the power sum. The following two
equations describe the power sum and the vector sum respectively.
Power sum: $$ a_k = \sqrt{ \sum_j g_{jk} \left[ \mathrm{Re}(X_j)^2 + \mathrm{Im}(X_j)^2 \right] } \tag{4.3} $$
Vector sum: $$ a_k = \sqrt{ \Big( \sum_j g_{jk} \, \mathrm{Re}(X_j) \Big)^2 + \Big( \sum_j g_{jk} \, \mathrm{Im}(X_j) \Big)^2 } \tag{4.4} $$
where j is the FFT bin index and $a_k$ is the magnitude response at channel k. $X_j$ and $g_{jk}$ are
the complex output and the fraction of energy corresponding to the jth FFT bin respectively. Re
and Im stand for the real and imaginary components of the complex FFT respectively.
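The difference between the two combination rules can be illustrated with a minimal sketch. It assumes that the power sum adds weighted squared magnitudes and that the vector sum adds the weighted complex values before taking the magnitude; the function names, the unit weights and the two-bin example are hypothetical, and a practical implementation may apply additional weighting or sign conventions not shown here.

```python
import math


def power_sum(bins, weights):
    """Envelope from the weighted sum of squared bin magnitudes (phase is ignored)."""
    return math.sqrt(sum(w * (z.real ** 2 + z.imag ** 2)
                         for z, w in zip(bins, weights)))


def vector_sum(bins, weights):
    """Envelope from the magnitude of the weighted complex sum (phase is kept)."""
    re = sum(w * z.real for z, w in zip(bins, weights))
    im = sum(w * z.imag for z, w in zip(bins, weights))
    return math.hypot(re, im)


weights = [1.0, 1.0]
in_phase = [1 + 0j, 1 + 0j]        # two bins of equal magnitude, same phase
anti_phase = [1 + 0j, -1 + 0j]     # same magnitudes, opposite phase

print(power_sum(in_phase, weights), power_sum(anti_phase, weights))
print(vector_sum(in_phase, weights), vector_sum(anti_phase, weights))
```

The power sum gives the same value in both cases, whereas the vector sum swings between its maximum and zero as the relative phase of the bins changes. This phase sensitivity is what lets the vector sum follow faster envelope modulation (beating between components within a band).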
Energy leakage occurs between FFT bins due to the Hann window. Channel equalization gains
were therefore applied to the channel envelopes to compensate. The
channel equalization gains took into account the number of FFT bins, bin proportions, bin
indices and the Hann window response for each channel. With the channel equalization, a
sinusoid input at the centre of any channel results in the same peak magnitude. The
maximum attenuation of the channel equalization gains occurs at the highest three frequency
channels and is approximately –6 dB.
4.3.5 Channel Gains
The channel gains are either the fixed channel gains or the adaptive gains from the Adaptive
Dynamic Range Optimization (ADRO). The fixed channel gains are derived from the
channel equalization gains and the clinical gains. The clinical gains are set by the clinician
and allow the user to shape the spectrum of the filterbank. When ADRO is enabled, the fixed
channel gains are bypassed and only used in ADRO as initial gains (Figure 11-1). The
implementation of ADRO will be explained in the next chapter (§5.5.5).
4.3.6 Maxima Selection
The channel selection method for the ACE strategy is called maxima selection. The maxima
selection block scans the amplitudes of the channel envelopes and selects the channels with
the highest amplitudes. The number of maxima is a clinical parameter and can differ
between subjects. If the number of channel amplitudes above the base level is less than the
number of maxima, power-up frames are presented after the stimulus frames.
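Maxima selection can be sketched in a few lines. This is an illustrative reading of the description above rather than the processor firmware: the function name, the example envelope values and the base level are hypothetical.

```python
def select_maxima(envelopes, n_maxima, base_level=0.0):
    """Return the indices of up to n_maxima channels with the largest
    amplitudes, considering only channels above the base level."""
    candidates = [ch for ch, a in enumerate(envelopes) if a > base_level]
    candidates.sort(key=lambda ch: envelopes[ch], reverse=True)
    return sorted(candidates[:n_maxima])   # report in channel order


envelopes = [0.1, 0.9, 0.05, 0.7, 0.3, 0.8]

selected = select_maxima(envelopes, n_maxima=3)
print(selected)

# With a higher base level, fewer channels qualify than the number of
# maxima; the shortfall is the number of power-up frames in this sketch.
few = select_maxima(envelopes, n_maxima=3, base_level=0.75)
n_power_up = 3 - len(few)
print(few, n_power_up)
```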
4.3.7 Loudness Growth Function
For acoustic hearing, loudness is a power function of sound pressure P with exponent
$\gamma$ = 0.6 (equation 2.1). Similarly, loudness was a power function of electrical current i with
exponent $\beta$ = 2.7 (equation 3.3). When the two equations are equated across modalities
for the same loudness (equation 4.5), a new power function with the constant K
and the exponent $\theta$ is derived (equation 4.6).
$$ k_a P^{\gamma} = k \, i^{\beta} \tag{4.5} $$
$$ i = K P^{\theta} \tag{4.6} $$
where $k_a$ and k are the acoustic and electric scaling constants. The exponent
$\theta = \gamma / \beta = 0.6 / 2.7 \approx 0.22$. According to the study of Fu and Shannon (1998a),
cochlear implant recipients showed the highest speech recognition with a power-law
amplitude mapping function with an exponent of 0.2, although the subjects were fairly
insensitive to the exponent between 0.1 and 0.5. The logarithmic compressive function of
Nucleus systems is a close approximation to a power-law mapping function with an
exponent of approximately 0.25. The logarithmic compression of a typical loudness growth
function (LGF) is described by the following equations.
$$ p_k = \frac{\ln(1 + \alpha v_k)}{\ln(1 + \alpha)} \tag{4.7} $$
$$ v_k = \begin{cases} 0, & a_k < B \\ \dfrac{a_k - B}{S - B}, & B \le a_k \le S \\ 1, & a_k > S \end{cases} \tag{4.8} $$
where $\alpha$ is the control parameter for the steepness of the compression curve and $v_k$ is the
non-linear scaling function for the input envelope $a_k$ between the base (B) and saturation (S)
levels for the frequency channel k. $\alpha$ is related to the Q value of the LGF. The Q value is
defined as the percentage decrease in the output for a 10 dB decrease in the input from the
saturation level. The compressive operation of the LGF with two Q values, 20 and 40, is shown in
Figure 4-5.
Figure 4-5 Instantaneous infinite non-linear compression of LGF (output magnitude versus filter envelope amplitude; base level = 4, curves for Q = 20 and Q = 40)
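The relationship between the Q value and the steepness of the logarithmic compression can be explored numerically. The sketch below assumes an LGF of the form ln(1 + αv)/ln(1 + α), with v the envelope scaled linearly between the base and saturation levels and clipped to [0, 1], and it interprets "10 dB below saturation" as an amplitude ratio of 10^(-10/20). The function names, the saturation level of 150 and the bisection search are choices made for this illustration only.

```python
import math


def lgf(a, base, sat, alpha):
    """Logarithmic loudness growth function; returns a value in [0, 1]."""
    v = min(max((a - base) / (sat - base), 0.0), 1.0)
    return math.log1p(alpha * v) / math.log1p(alpha)


def alpha_from_q(q_percent, base, sat, iterations=200):
    """Find the steepness alpha for which the output falls by q_percent
    when the input is 10 dB below the saturation level."""
    target = 1.0 - q_percent / 100.0
    a_10db_down = sat * 10.0 ** (-10.0 / 20.0)
    lo, hi = 1e-6, 1e9
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)          # bisection on a logarithmic scale
        if lgf(a_10db_down, base, sat, mid) < target:
            lo = mid                      # need stronger compression
        else:
            hi = mid
    return math.sqrt(lo * hi)


BASE, SAT = 4.0, 150.0                    # illustrative base and saturation levels
alpha_q20 = alpha_from_q(20.0, BASE, SAT)
alpha_q40 = alpha_from_q(40.0, BASE, SAT)
print(alpha_q20, alpha_q40)
```

A smaller Q demands a smaller drop in output for the same 10 dB drop in input, so it corresponds to a larger α and a more strongly compressive curve.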
The shape of the LGF is intended to match the loudness perception of cochlear implant
recipients to that of normal-hearing subjects for changes in sound intensity. The maximum
current level is not allowed to exceed the C-level. The envelope at the saturation level (M in
Figure 4-5) is stimulated at the maximum current level, the C-level. The envelope at the base
level (B in Figure 4-5) is stimulated at the minimum current level, the T-level. The range
between the saturation level and the base level determines the operating range of the channel
envelopes. When the AGC system is not active, the levels of speech-like input acoustic signals
(in dB SPL) at T-SPL and C-SPL correspond to the base and saturation levels of
the LGF respectively. The signal path is calibrated such that speech at the C-SPL is
stimulated at C-levels (i.e., at the saturation level of the LGF) for the majority of frequency
channels at a nominal sensitivity. T-SPL is typically determined from C-SPL and the
dynamic range of the LGF. With AGC systems, the operating range of the acoustic signals
can be extended beyond T-SPL and C-SPL.
4.3.8 Dynamic Range Selection
The challenge of cochlear implant signal processing is to map a wide range of acoustic
signals into a narrow range of electrical signals without losing essential information. If the
entire acoustic range of 120 dB were mapped into the electrical dynamic range between T- and
C-levels, a substantial amount of compression would be required. Because the number of
differentiable current steps is limited, the intensity variations in acoustic waveforms would
be lost during compression. Therefore the operating range of the LGF is set to be much less
than 120 dB in practice. In Nucleus cochlear implant systems, the dynamic range of the LGF
is set to match the speech dynamic range because speech is the most important target
signal for cochlear implant recipients. A speech dynamic range of 30 dB, with speech peaks
12 dB above and valleys 18 dB below the root-mean-square (RMS) level, has
been used in the Articulation Index (AI) calculation (ANSI S3.5 1997). Hence the input
dynamic range of the LGF in early Nucleus processors was set to 30 dB. The Freedom and
CP810 sound processors use an input dynamic range of at least 40 dB. Studies showed the
benefit of using a wider dynamic range (Dawson, Decker and Psarros 2004; Spahr, Dorman
and Loiselle 2007). The signal path at the default setting is tuned to work optimally at C-SPL
(the targeted speech level). The C-SPL is typically set to 65 dB SPL, the conversational
speech level at normal vocal effort. In the study of speech levels in everyday life by
Pearsons et al. (1976), the overall levels for casual and normal vocal efforts, averaged
across males, females and children, were 56 and 60 dB SPL respectively, with the
measurements taken at one metre from the talkers.
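The relationship between C-SPL, T-SPL and the input dynamic range of the LGF can be illustrated with a short calculation. This is a minimal sketch, assuming that T-SPL is simply C-SPL minus the input dynamic range; the function name and the example values are illustrative and are not taken from any Nucleus implementation.

```python
def t_spl_from_c_spl(c_spl_db: float, dynamic_range_db: float) -> float:
    """Return T-SPL in dB SPL, assuming T-SPL = C-SPL - input dynamic range."""
    if dynamic_range_db <= 0:
        raise ValueError("dynamic range must be positive")
    return c_spl_db - dynamic_range_db


# Speech dynamic range quoted in the text: peaks 12 dB above RMS, valleys 18 dB below.
speech_range_db = 12 + 18

# C-SPL of 65 dB SPL with a 30 dB and a 40 dB input dynamic range.
t_spl_30 = t_spl_from_c_spl(65, speech_range_db)
t_spl_40 = t_spl_from_c_spl(65, 40)
print(t_spl_30, t_spl_40)
```

Under this assumption, widening the input dynamic range from 30 dB to 40 dB lowers T-SPL by 10 dB while C-SPL stays fixed.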
4.3.9 Mapping
The output of the LGF is linearly mapped into the electrode current levels (in clinical units).
Mapping can be described by the following equation.
(4.9)
where c is the current level, p is the output of LGF, T is the threshold current level, C is the
maximum comfortable current level and Vol is the volume control. The volume control can
only reduce current levels if the stimuli are too loud for the recipient.
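A linear mapping of this kind can be sketched in a few lines. The sketch below does not reproduce Equation 4.9; it assumes one plausible linear form, c = T + (C - T) * Vol * p, with p and Vol both between 0 and 1. That form, the function name and the example levels are assumptions made for illustration only.

```python
def map_to_current_level(p: float, t_level: int, c_level: int, vol: float = 1.0) -> int:
    """Linearly map an LGF output p in [0, 1] to a current level in clinical units.

    Assumed form (an illustration, not the thesis equation):
        c = T + (C - T) * Vol * p
    Vol is restricted to [0, 1], so the volume control can only reduce the level.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if not 0.0 <= vol <= 1.0:
        raise ValueError("vol must lie in [0, 1]")
    if c_level < t_level:
        raise ValueError("C-level must not be below T-level")
    return round(t_level + (c_level - t_level) * vol * p)


# Hypothetical T- and C-levels for one electrode.
print(map_to_current_level(0.0, 100, 180))       # base level maps to T-level
print(map_to_current_level(1.0, 100, 180))       # saturation level maps to C-level
print(map_to_current_level(1.0, 100, 180, 0.5))  # reduced volume lowers the maximum
```

With this form the output never exceeds C-level, and lowering Vol scales the range above T-level rather than shifting it.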
4.4 Speech Perception
Many factors can affect the speech intelligibility of cochlear implant recipients. Individual
factors include the type of hearing loss, the duration of deafness before implantation, the
survival of neurons in the inner ear, the duration of implant use and the clinical parameters
and preferences for sound coding. Broader factors affecting the speech intelligibility of a
recipient include the ability of a particular cochlear implant system to provide accurate and
salient speech cues (Henry et al. 2000).
Figure 4-6 shows the electrodogram of the monosyllabic word ‘Choice’ captured at the
output of the cochlear implant signal path (§8.2.2) using the Decoder Implant Emulation Tool
(DIET). Figure 4-7 shows the spectrogram of ‘Choice’ after the output stimuli were
converted back into the channel envelopes before the LGF. The diagrams show the slow
temporal modulation at each electrode and the frequency transitions from one phoneme to
another: ‘ch’ → ‘oi’ → ‘ce’.
[Figure: electrodogram with Electrode (1 to 23) on the vertical axis and Time (ms, 4400 to 5200) on the horizontal axis, with the segments ‘Ch’, ‘oi’ and ‘ce’ marked. A second panel shows electrodes 21 and 22 over 4500 to 4800 ms.]
Figure 4-6 Electrodogram of the monosyllabic word ‘Choice’
[Figure: spectrogram with Frequency (Hz, 367 to 7279) on the vertical axis and Time (s, 4.2 to 5) on the horizontal axis, with additional panels scaled from 0 to 1.]
Figure 4-7 Reconstructed spectrogram of the monosyllabic word ‘Choice’
The speech intelligibility of experienced cochlear implant recipients is comparable to that of
normal hearing subjects in favourable listening conditions even though the representation of
speech provided by the implant is crude. However, considerable performance variability
remains between subjects in noise (Fu and Galvin III 2008).
Researchers have experimented with simulated cochlear implant processing,
sometimes called vocoded speech, with normal hearing subjects to understand the
intelligibility of temporally and spectrally degraded speech.
4.4.1 Amplitude Cues
For normal hearing persons, amplitude cues may not be important for speech intelligibility
(Licklider and Pollack 1948). However, for cochlear implant recipients who rely on limited
temporal and spectral cues, amplitude cues are relatively more important for speech
recognition (Shannon, Zeng and Wygonski 1998; Zeng and Galvin III 1999; Zeng et al.
2002). Shannon et al. (2001) studied the effects of amplitude peak-clipping (amplitudes
above a threshold were clipped) and center-clipping (amplitudes below a threshold were not
presented) on the intelligibility of spectrally degraded speech processed by a four-channel
noise vocoder, using seven normal hearing subjects. The results showed that peak-clipping
affected vowels more than consonants whereas center-clipping affected both equally.
Consonant recognition was not significantly reduced until 75% of the amplitudes were
peak-clipped, whereas vowel recognition degraded monotonically with the percentage of
peak-clipping. Vowels are recognized from formant patterns; for example, the energy ratio of
F1 and F2 is determined from both the frequency place and the amplitude level at each
frequency. Shannon et al.’s study indicated that the amplitude pattern was important for
speech intelligibility if the frequency components were poorly presented. Center-clipping
might be expected to affect consonants more than vowels because the energy of vowels is
higher than that of consonants. However, the results indicated that the recognition scores of
both vowels and consonants dropped monotonically at the same rate with the amount of
center-clipping. Loizou et al. (2000) showed that an amplitude resolution of eight steps was
sufficient for consonant recognition by cochlear implant recipients. They also showed that
intensity resolution could be traded against spectral resolution when normal hearing
subjects were tested with sine-vocoded speech. Similarly, the amount of amplitude
compression did not significantly affect speech recognition (Fu and Shannon 1998a; Zeng
and Galvin III 1999).
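The two clipping operations described above can be made concrete with a small sketch. It assumes a channel envelope stored as a list of non-negative amplitudes and a single fixed threshold; the function names and sample values are illustrative, and the processing used in the cited study may differ in detail.

```python
def peak_clip(envelope, threshold):
    """Peak-clipping: amplitudes above the threshold are limited to the threshold."""
    return [min(a, threshold) for a in envelope]


def center_clip(envelope, threshold):
    """Center-clipping: amplitudes below the threshold are not presented (set to zero)."""
    return [a if a >= threshold else 0.0 for a in envelope]


def fraction_peak_clipped(envelope, threshold):
    """Proportion of envelope samples that exceed the peak-clipping threshold."""
    return sum(a > threshold for a in envelope) / len(envelope)


envelope = [0.1, 0.4, 0.9, 0.6, 0.2]
print(peak_clip(envelope, 0.5))
print(center_clip(envelope, 0.3))
print(fraction_peak_clipped(envelope, 0.5))
```

Peak-clipping removes the level differences between the strongest segments, while center-clipping removes the weakest segments entirely.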
4.4.2 Spectral Processing
The study of Loizou et al. (1999) showed that amplitude cues could be traded against
frequency resolution. A high level of speech understanding was still obtained for sentences
processed through 16 channels with only two steps of amplitude resolution. The
well-known study of Zeng and Galvin III (1999) showed that speech was still intelligible for
Nucleus 22 cochlear implant recipients even when the amplitudes were presented at two
levels. The intelligibility was still carried by slow temporal modulation and frequency-place
information in that case. The Nucleus 22 recipients scored higher in noise when the number
of electrodes was more than four. These studies consistently showed that amplitude
resolution was less important when other cues were not restricted.
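Reducing amplitude resolution to a small number of steps can be sketched as a uniform quantizer applied to the channel envelope. This is only an illustration: the equally spaced levels, the function name and the sample values are assumptions, and the cited studies may have spaced their steps differently.

```python
def quantize_envelope(envelope, steps, max_amp=1.0):
    """Quantize amplitudes in [0, max_amp] to `steps` equally spaced levels."""
    if steps < 2:
        raise ValueError("at least two steps are required")
    quantized = []
    for a in envelope:
        a = min(max(a, 0.0), max_amp)              # keep the amplitude in range
        index = round(a / max_amp * (steps - 1))   # nearest quantization step
        quantized.append(index * max_amp / (steps - 1))
    return quantized


envelope = [0.1, 0.4, 0.9, 0.6]
two_step = quantize_envelope(envelope, 2)
eight_step = quantize_envelope(envelope, 8)
print(two_step)
print(eight_step)
```

With two steps only the on/off pattern of each channel survives, which corresponds to the condition in which intelligibility has to be carried by temporal modulation and frequency-place information.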
Spectral resolution becomes more important for speech understanding in noise. Fu et al.
(1998) showed that phoneme recognition with 16-channel noise-vocoded speech was
significantly higher than with 8-channel noise-vocoded speech. Similarly, Stickney et al.
(2004b) showed that normal hearing subjects performed significantly better with
eight-channel than with four-channel noise-vocoded speech in different SNR conditions.
Friesen et al. (2001) showed that the speech understanding of cochlear implant recipients
and of normal hearing subjects listening to noise-vocoded stimuli improved with the number
of frequency channels in speech tests at different SNRs. However, the improvement was
seen only up to seven to ten electrodes for cochlear implant recipients, whereas it was seen
up to 20 channels for normal hearing subjects. Their study demonstrated that cochlear implant
recipients were not able to fully utilize the spectral information provided by the number of
electrodes used in their implant.
4.4.3 Temporal Processing
Speech perception studies with normal hearing subjects showed that low-rate modulation
below 16 Hz was perceptually most important for speech (Houtgast and Steeneken 1985;
Drullman, Festen and Plomp 1994b). The speech envelope modulation was most prominent
around 3 to 4 Hz (Houtgast and Steeneken 1985).
In the study of Shannon et al. (1995), normal hearing subjects tested with a four-channel
vocoder showed that speech intelligibility in quiet was not significantly affected until
temporal envelope modulation was reduced below 16 Hz. They concluded that slow
temporal envelope modulation was sufficient for speech intelligibility, at least in quiet.
Shannon (1992) showed that cochlear implant recipients could follow temporal modulation
in the range of 1 to 50 Hz if loudness was matched properly. A good temporal envelope
representation requires the carrier rate to be at least four times the highest
modulation rate of the temporal envelope in each channel (McKay, McDermott and Clark
1994). In the study of Füllgrabe et al. (2009), low-rate amplitude modulation, even below
4 Hz, could also contribute to speech intelligibility in noise when the listeners relied only
on the envelope information.
A trade-off between temporal and spectral information was observed for vowel and
consonant recognition (Xu, Thompson and Pfingst 2005). Consonant recognition
improved with up to 12 frequency channels and with temporal modulation up to 32 Hz.
When the spectral resolution was restricted, consonant recognition was improved by
allowing more temporal information. In contrast, spectral cues were more important for
vowel recognition, and trading the number of frequency channels for more temporal
information did not improve the recognition scores much.
In normal hearing, the segregation of two competing voices is based on the
unique temporal information from each speaker, i.e., pitch. Since cochlear implant recipients
do not perceive pitch strongly, it is difficult for them to segregate the target speech from a
competing voice. In addition, the temporal processing of hearing impaired subjects is poor, so
they do not benefit, as normal hearing persons do, from listening in the gaps of the masker
to pick up components of the target speaker. Stickney et al. (2004b) showed that the speech
intelligibility of cochlear implant recipients was lower when the competing signal was
another voice than when it was steady-state speech-shaped noise. Their study indicated that
segregation between target and competing signals was easier for cochlear implant recipients
if the two signals were temporally different (steady vs. fluctuating). Speech understanding of
cochlear implant recipients suffers if the target and masker signals are modulated at a similar
rate. In that case the listeners have to rely on spectral and intensity information for segregation.
4.5 Conclusion
This chapter described the sound coding strategies of cochlear implant systems and the
speech perception of cochlear implant recipients and of normal hearing subjects listening to
simulated cochlear implant processing. The existing framework of sound processing is
sufficient for cochlear implant recipients to understand speech and other environmental
sounds in quiet. Performance variation between cochlear implant recipients is still
considerable in noise. Intensity resolution in the envelope waveforms is not important if
sufficient spectral resolution is provided in favorable listening conditions. Spectral cues can
be traded if slow temporal modulation up to at least 16 Hz is preserved. However, for speech
understanding in adverse listening conditions, the recipient will require more intensity
resolution in the temporal envelope waveforms, even if spectral channels are not limited, to
distinguish the target speech from temporally different non-target signals. The next chapter
will study gain algorithms for hearing prostheses to improve amplitude cues within the
limited dynamic range of impaired hearing.
Chapter 5 Automatic Gain Control Systems
5.1 Introduction
An automatic gain control (AGC) is an essential signal processing component for hearing
devices with a restricted output dynamic range. The role of compression in hearing devices is
to decrease the range of sound levels in the environment to better match the limited dynamic
range of a hearing-impaired person (Dillon 2001). For hearing aid users with a low-to-moderate
hearing impairment, the output dynamic range is limited due to loudness
recruitment. For cochlear implant users with severe-to-profound deafness, the output dynamic
range is limited due to direct stimulation of the auditory nerve. A good AGC should apply
compression or amplification to improve the audibility of the input signal without causing
loudness discomfort, and to prevent loudness discomfort without sacrificing audibility. The
goal is to improve the speech intelligibility of recipients in adverse listening conditions.
Designing an AGC system that fits an input signal whose amplitude varies over a wide range
into the limited dynamic range of an impaired auditory system without perceptual distortion
is a challenge. The aim of this research is to implement such an AGC system or a similar
technique to optimize the input dynamic range of cochlear implants. First, the literature on
the characteristics of AGC systems and their effects on the speech intelligibility of hearing
impaired subjects is reviewed.
AGC systems in hearing aids are well established, and AGC systems in cochlear implants are
adapted from them. There are similarities as well as differences between the AGC systems of
hearing aids and of cochlear implants. Although the common goal of both hearing aid and
cochlear implant systems is to rehabilitate hearing by supplementing the functions of the
impaired auditory system, the methodology to achieve the goal can be quite different
between the two systems. The output of a hearing aid is a reconstructed acoustic signal,
whereas the output of a cochlear implant system is a sequence of electrical current pulses.
The output dynamic range of hearing aids can differ across frequency for an individual user,
whereas the electrical dynamic range of cochlear implant recipients is narrow across all
frequencies. Hence, it is worth reviewing the rationales of AGC systems in cochlear
implant systems as well as in hearing aids.
This chapter first describes the fundamentals of an AGC system. It then reviews different
AGCs for hearing aids and cochlear implant systems and their roles in rehabilitating hearing.
It describes the AGC systems of the Nucleus cochlear implant systems and then elaborates
on the ones that were used by the recipients in this thesis. Noise floor estimation methods
are also investigated in this chapter because some AGC systems employ them.
5.2 Fundamentals of AGC
An AGC compresses the dynamic range of an input signal, to better fit into the dynamic
range of the system. It basically reduces the gain for an input signal above the predetermined
threshold. Typical AGC parameters are:
• compression threshold or compression knee point
• compression ratio
• attack time
• (optional) hold time
• release time
The compression threshold is the input level above which an AGC is active. The
compression ratio, which is the inverse of the slope of the input-output (I-O) diagram,
determines the amount of gain for input levels above the compression threshold. Figure 5-1
shows the I-O diagrams of an AGC with different compression ratios. For a compression
ratio greater than one, the output level differs from the input level above the compression
threshold. A compression ratio of 1:1 indicates a linear gain, 2:1 indicates an I-O slope of
0.5 and ∞:1 indicates infinite compression.
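The static I-O behaviour described above can be written as a small function of the input level in dB. This is a minimal sketch, assuming unity gain below the compression threshold and a hard knee; the function name and the 60 dB threshold are illustrative choices.

```python
import math


def agc_output_level_db(input_db, threshold_db, ratio):
    """Static I-O curve of a compressor with a hard knee.

    Below the threshold the output equals the input (assumed unity gain).
    Above it, the output rises by 1/ratio dB per dB of input.
    """
    if ratio < 1:
        raise ValueError("compression ratio must be at least 1")
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio


for ratio in (1, 2, math.inf):
    print(ratio, agc_output_level_db(100, 60, ratio))
```

For a 100 dB input and a 60 dB threshold, the three ratios give 100 dB (linear), 80 dB (slope of 0.5) and 60 dB (output held at the threshold).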
[Figure: AGC I-O curve with Output level (dB) on the vertical axis and Input level (dB) on the horizontal axis, both from 0 to 120 dB, showing curves for compression ratios of 1:1, 2:1 and ∞:1 above the knee point.]
Figure 5-1 Input-output diagram of an AGC with different compression ratios
The attack time is the time taken for an AGC to react to an increase in input signal level. As
per the standard for AGC systems in hearing aids, IEC 60118-2, the attack time is the time
taken for the output to stabilize within 2 dB of its final value after the input increases from
55 to 80 dB SPL (IEC 1997). According to ANSI, the attack time is the time it
takes the output to drop to within 3 dB of the steady-state level after a 2 kHz sinusoidal input
changes from 55 to 90 dB SPL (ANSI 1996). The release time is the time taken for the gain
to recover from attenuation following a decrease in input level. As per IEC
60118-2, the release time is the time taken for the AGC output signal to
increase to within 2 dB of its final value following a decrease in input level from 80 dB SPL
to 55 dB SPL. According to ANSI, the release time is the time it takes the output for a 2 kHz
sinusoidal input to stabilize within 4 dB of the steady-state level after the input changes
from 90 to 55 dB SPL. The hold time is the time for which the gain is held at its previous
value before it is released from attenuation following a decrease in input level.
The objective of applying a hold time is to reduce the pumping effect. The hold time
parameter is optional in AGC systems.
These parameters, attack time, release time and hold time, define the dynamic characteristics
of an AGC, whereas compression threshold and compression ratio define the static
characteristic of an AGC.
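The attack and release time definitions share one idea: the time after a level step at which the output stays within a tolerance of its final value. The sketch below measures that settling time on a synthetic output trace. The exponential overshoot, its 5 ms time constant and the function name are assumptions for illustration; a real measurement would use the test signals and levels specified by the relevant standard.

```python
import math


def settling_time_ms(output_db, dt_ms, tol_db):
    """Time after the step at which the output stays within tol_db of its final value.

    output_db[0] is the first sample after the input level step.
    """
    final = output_db[-1]
    last_outside = -1
    for i, level in enumerate(output_db):
        if abs(level - final) > tol_db:
            last_outside = i
    return (last_outside + 1) * dt_ms


# Synthetic attack: a 12.5 dB overshoot that decays exponentially (assumed shape).
dt_ms = 0.1
trace_db = [67.5 + 12.5 * math.exp(-(i * dt_ms) / 5.0) for i in range(1000)]

attack_2db = settling_time_ms(trace_db, dt_ms, 2.0)  # 2 dB criterion
attack_3db = settling_time_ms(trace_db, dt_ms, 3.0)  # 3 dB criterion
print(attack_2db, attack_3db)
```

The same AGC therefore reports a shorter time under a looser tolerance, which is one reason time constants quoted under different standards are not directly comparable.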
An AGC consists of a control block, which comprises a feature extraction unit and a gain
decision unit, and a gain unit, as shown in Figure 5-2.
Figure 5-2 Components of an AGC system
The feature extraction in a simple AGC system detects the envelope of an input signal and
calculates the peak or RMS level for the gain decision unit. An envelope detector consists of
a rectifier and a low-pass filter for smoothing. The gain decision unit checks whether the
input level exceeds the compression threshold and determines the amount of gain according
to the compression ratio. The gain unit reduces the input signal accordingly.
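The three components can be sketched as a simple feed-forward AGC. This is a minimal illustration rather than the design used in any particular device: it assumes a full-wave rectifier, a one-pole smoothing filter with separate attack and release coefficients, a hard-knee gain rule, and illustrative time constants of 5 ms and 75 ms.

```python
import math


def one_pole_coeff(time_ms, fs_hz):
    """Smoothing coefficient of a one-pole filter with the given time constant."""
    return math.exp(-1.0 / (time_ms * 1e-3 * fs_hz))


def simple_agc(samples, fs_hz, threshold, ratio, attack_ms=5.0, release_ms=75.0):
    """Feed-forward AGC: envelope detector -> gain decision -> gain unit.

    threshold is a linear amplitude and ratio is the compression ratio (>= 1).
    """
    a_att = one_pole_coeff(attack_ms, fs_hz)
    a_rel = one_pole_coeff(release_ms, fs_hz)
    env = 0.0
    out = []
    for x in samples:
        # Feature extraction: rectify, then smooth. The estimate rises with the
        # attack coefficient and falls with the release coefficient.
        rect = abs(x)
        a = a_att if rect > env else a_rel
        env = a * env + (1.0 - a) * rect
        # Gain decision: unity gain below the threshold, compressive gain above it.
        if env > threshold:
            gain = (env / threshold) ** (1.0 / ratio - 1.0)
        else:
            gain = 1.0
        # Gain unit: apply the gain to the input sample.
        out.append(gain * x)
    return out


fs = 16000
loud = simple_agc([1.0] * 1600, fs, threshold=0.1, ratio=2.0)
soft = simple_agc([0.05] * 1600, fs, threshold=0.1, ratio=2.0)
print(loud[0], loud[-1], soft[-1])
```

In this sketch the first samples of the loud input pass at unity gain until the envelope estimate rises above the threshold, which is the overshoot associated with a finite attack time.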
The amount of gain can differ depending on the type of level detector. If a peak
detector is employed, more compression will be applied to complex stimuli with
distinctive peaks. If the level detector detects the RMS level of the input signal, the
compressor may apply less compression than one with a peak
detector. An AGC often cannot prevent fast transients from entering the system,
depending on the smoothing low-pass filter in the envelope detector. The time constants of
an AGC are determined by the smoothing filter. If the envelope detector is slow, the attack
and release times are long, and the gain responds slowly to rapid changes in the input
signal. Although less spectral distortion can be expected, temporal distortion such as
overshoots can occur. Overshoots can create false articulation and affect speech
intelligibility (Verschuure et al. 1996). If the attack time is short, the AGC responds quickly
to loud input sounds. However, abrupt gain changes can introduce extra frequency
components that are not part of the input signal. Figure 5-3 shows an AGC responding to a
sinusoidal input above the compression threshold.
Figure 5-3 Behaviour of a typical AGC system
A typical AGC uses an attack time much shorter than its release time. An AGC system can be
classified as fast-acting if the release time is less than 200 ms (Walker and Dillon 1982;
Dreschler 1992; Souza 2002). Fast AGCs typically have short attack (0.5–20 ms) and release
times (5–200 ms). A fast-acting AGC is often called a “syllabic” or “phonemic” AGC if the
time constants are shorter than the duration of common syllables (Moore 2008).
5.3 AGC in Hearing Aids
The aims of an AGC system in hearing devices are (i) to provide access to low level sounds
and (ii) to make loud sounds more comfortable. More importantly, an AGC should achieve
both goals without significant perceptual distortion. The ultimate goal is to improve the
speech intelligibility of recipients in their everyday life listening environments. There are
different rationales for designing an AGC system for a hearing prosthesis. An AGC system
either adjusts the gain slowly to adapt to the changes in the overall presentation level from
one listening situation to another or adjusts the gain dynamically to normalize loudness
between soft and loud components of speech. An AGC system can have multiple
compression channels to provide gain specific to frequency.
Dynamic range reduction is one of the deficits associated with sensorineural hearing loss.
For a person with low-to-moderate hearing impairment, the dynamic range reduction appears
as an elevated hearing threshold while the perception of loud sounds remains the same as in
normal hearing. This phenomenon is called loudness recruitment (Dillon 2001). Many forms
of compression are used in both hearing aids and cochlear implants to overcome the reduced
dynamic range.
Suprathreshold deficits associated with sensorineural hearing impairment reduce the ability
to discriminate components in both frequency and time domains (Moore 2003a).
The operation of an AGC system can be described in terms of (i) speed of compression, (ii)
amount of compression, and (iii) number of independent compression channels. The speed of
compression is the rate of gain change for a given input level and is determined from a
combination of the timing parameters. Table 5-1 lists the rationales for different AGC
systems that are commonly used in hearing aids.
Type of AGC            Implementation                            Rationale
                       Compression  Compression  Compression
                       speed        threshold    ratio
Compression limiter    fast         high         high            To limit envelope clipping and to reduce
                                                                 loudness discomfort
Wide Dynamic Range     fast         low          low             To reduce inter-syllabic intensity contrast
Compressor (WDRC)                                                (therefore increase audibility of softer
                                                                 syllables of speech)
Automatic Volume       slow         low          low/high        To reduce the overall level difference
Control (AVC)                                                    between different listening conditions
Table 5-1 Rationales for different AGC systems
The choices of time constants and their effects on speech intelligibility have been reviewed
for hearing aids (Chapter 6, Dillon 2001; Moore 2008). Ideally, the time constants (attack
and release times) need to be fast for an AGC to adjust the energy ratio of soft and loud
sounds effectively. The rationales for a fast-acting AGC system are (i) to avoid loudness
discomfort and sound quality distortion due to peak-clipping of sounds at very high levels,
and (ii) to normalize loudness towards that of normal hearing by reducing the intensity
difference between short-term speech components (phonemes and syllables) with high and
low energy (Dillon 2001).
Based on these two rationales, a fast AGC system can be either a compression limiter or a
Wide Dynamic Range Compressor (WDRC).
Early hearing aids employed a peak-clipper to limit the maximum amplitude of the output
signals. An AGC was proposed as a compression limiter to avoid harmonic distortion,
intermodulation distortion and noticeable sound quality distortion introduced by the peak-
clipper (Steinberg and Gardner 1937). A compression limiter typically has fast time
constants, a high compression threshold and a high compression ratio.
A WDRC works under the assumption that speech intelligibility can be improved by having
access to low level signals (Villchur 1973). It dynamically reduces the inter-syllabic intensity
contrast between speech components by applying both compression and amplification. A
WDRC has a low compression threshold to include signals within a wide range. Because of
the fast time constants and the low compression threshold, the compression ratio of a WDRC
is typically low, less than three, to avoid significant distortion of temporal envelopes (Dillon
2001).
With WDRC, speech intelligibility improvement was mostly shown for speech in quiet, and
degradation was mostly shown in noisy conditions (Souza 2002). In the study of Marriage et
al. (2005) with children with severe and profound hearing loss using multichannel hearing
aids, the reported benefits of WDRC were greater than those of linear amplification.
However, some of the children did not like the amplification of background sounds in noisy
situations with the WDRC. The performance improvement with a WDRC was due to the
accessibility of low-level signals below the hearing threshold. The degradation in noise was
due to the reduction of the output SNR.
Fast AGC can reduce speech intelligibility in noise by distorting the correlated
fluctuations in the envelopes of different frequency channels, which promote the perceptual
fusion of the target speech, and by reducing the modulation depth and intensity contrast
(Stone and Moore 2004). It can also reduce speech intelligibility by reducing the spectral and
temporal contrasts of speech (Plomp 1988). If the rate of gain change is faster than the
modulation frequency of a phonetic entity, a fast AGC is likely to distort the temporal envelope
pattern of that phonetic entity and the information it carries (Plomp 1983). In addition, the
cross-modulation components introduced by fast AGC between the target speech and a
competing voice degraded the intelligibility of the target speech (Stone and Moore 2003,
2004, 2007). Fast compression can also degrade speech intelligibility in noisy conditions by
reducing the apparent signal-to-noise ratio at the output. Rhebergen et al. (2008b) showed an
effective SNR degradation for a WDRC with different compression ratios.
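The reduction of the apparent output SNR can be illustrated with a simple level calculation. The sketch assumes an idealized compressor that is fast enough to set its gain separately for the speech peaks and for the noise heard in the gaps between them, with a hard knee and unity gain below the threshold. The function names and levels are illustrative, and this is not the analysis method of the cited study.

```python
def compress_db(level_db, threshold_db, ratio):
    """Static hard-knee compression of a level in dB (unity gain below threshold)."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def apparent_output_snr_db(speech_db, noise_db, threshold_db, ratio):
    """Output level difference when speech and noise are compressed independently."""
    return compress_db(speech_db, threshold_db, ratio) - compress_db(noise_db, threshold_db, ratio)


# Speech peaks at 70 dB and noise in the gaps at 60 dB: an input SNR of 10 dB.
for ratio in (1, 2, 3):
    print(ratio, apparent_output_snr_db(70, 60, 50, ratio))
```

When both levels lie above the threshold, the apparent SNR in dB is divided by the compression ratio, so a 10 dB input SNR becomes 5 dB at 2:1 under these assumptions.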
The rationale for employing a slow AGC in a hearing prosthesis is to reduce the differences
in overall speech presentation level between different listening conditions. The time
constants, typically the release time, are much longer than the duration of syllables. The
intensity relationship between syllables is not affected by a slow AGC system. A slow AGC
with time constants in the range of seconds is often called Automatic Volume Control (AVC).
From the review of Souza (2002), a release time of 200 ms in compression systems of
hearing aids is typical for most daily situations. Many studies have evaluated the effect of
AGC time constants on speech intelligibility and quality. Most of them consistently showed
that a long release time was preferred for the perceived quality of sounds in listening
conditions with high background noise. The results of speech understanding with slow AGC
were mixed. Hansen (2002) showed the advantage of using a long release time over shorter
release times. Gatehouse et al. (2006) showed that shorter release times gave better speech
intelligibility than longer release times in a two-channel AGC, although the longer release
time setting was rated higher for speech quality. A few studies showed that the release
times, short or long, had no significant effect on speech intelligibility and quality (King and
Martin 1984; Neuman et al. 1998; Stone et al. 1999; Jenstad and Souza 2005).
A slow AGC, with a long release time, often has stagnant moments in which the gain slowly
recovers from momentarily loud transient sounds. Sounds following loud transients can be
too soft during that period. Sudden increases in sound level are common in real life.
For instance, the amplitude variation of an input signal can be wide during a conversation
between the recipient and another person, with rapid increases in level at the onset of voiced
sounds produced by the recipient due to the short distance between the microphones and the
recipient’s mouth. In order to accommodate a wide range of acoustic signals (speech and
other sounds) with short and long-term level variations, more than one AGC system, or an
AGC system with more than one control loop with short and long time constants, is
necessary (Moore and Glasberg 1988).
Multichannel AGC systems are most commonly found in hearing aids because hearing loss
is frequency dependent. Studies reported mixed results on the benefits of multichannel
compression compared to linear amplification in hearing aids. Multichannel compression
can improve the ability of hearing impaired subjects to distinguish between stops and
fricatives. In contrast, it can degrade spectral concentration and hence the relative intensity
cues needed to identify particular stops and fricatives. It can also degrade the place of
articulation and duration cues required to identify some consonants (De Gennaro, Braida and
Durlach 1986).
Unlike a single-channel wideband AGC, a multichannel AGC can reduce the level of signal
in those frequency channels where the background noise dominates. With an independent
multichannel AGC, less cross-modulation effect can be expected from two signals with
different frequency contents (Stone and Moore 2008). Another benefit of using a
multichannel AGC with independent compression channels is that it can reduce the level of
high intensity narrowband components without affecting other frequency components. On
the other hand, the independent compression can flatten the spectral profile of an input signal
by lowering spectral peaks while spectral troughs remain untouched. Since spectral peaks
and valleys are the important characteristics of speech sounds, spectral profile flattening
makes it harder to identify different components of speech, for example identification of
place of articulation of consonants. To reach a compromise between spectral flattening and
the increase in audibility, a multichannel AGC typically avoids a compression ratio
higher than three if the number of channels is high (Dillon 2001). Plomp (1994) showed the
degradation of sentence recognition with more compression channels when the compression
ratio was higher than two (see Figure 5-4). A multichannel AGC can have more detrimental
effects on speech intelligibility than a single-channel wideband AGC if the rate of gain
change is too fast. A fast multichannel AGC with independent compression channels can
distort short-term spectral cues such as formant patterns, and the rapid gain change can
reduce the temporal modulation depth and the intensity contrast of different speech
components (White 1986; Plomp 1994; Moore 2008; Stone and Moore 2008).
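Spectral flattening by independent channel compression can be illustrated by applying the same static compression to each channel of a short-term spectral profile. The sketch assumes a hard knee with unity gain below the threshold; the channel levels, threshold and ratio are illustrative values only.

```python
def compress_db(level_db, threshold_db, ratio):
    """Static hard-knee compression of a level in dB (unity gain below threshold)."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def spectral_contrast_db(levels_db):
    """Difference between the highest spectral peak and the lowest trough."""
    return max(levels_db) - min(levels_db)


# Hypothetical channel levels in dB: two spectral peaks and two troughs.
profile_db = [70, 50, 65, 45]
compressed_db = [compress_db(level, 50, 3) for level in profile_db]

print(compressed_db)
print(spectral_contrast_db(profile_db), spectral_contrast_db(compressed_db))
```

The peaks are lowered while the troughs at or below the threshold are left untouched, so the peak-to-trough contrast shrinks from 25 dB to under 12 dB in this example.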
Figure 5-4 Intelligibility score as a function of the number of channels, with compression ratio as a parameter (Plomp 1994)
A multichannel AGC can be cross-coupled between frequency channels to reduce spectral
distortion and consequently improve speech intelligibility (White 1986). Moore, Peters and
Stone (1999) compared the effectiveness of a linear amplification and a WDRC with one,
two, four or eight channels by measuring SRT of subjects listening to sentences in
background sounds with spectral and/or temporal dips. When the background sounds
contained temporal dips, the number of channels was not important. However, when the
background sounds contained spectral dips, the multichannel AGC helped to improve the
audibility of a target speech that fell in the spectral dips of the background sound.
Yund and Buckles (1995b) analyzed the effect of the number of channels on the speech
intelligibility of mild-to-moderate hearing impaired subjects with speech-shaped noise and
competing speakers at different SNRs. They reported that speech intelligibility
improved as the number of compression channels increased from four to eight. Scores were
not significantly different for the AGC with more than eight compression channels. The
study of Kates (2010) indicated that the number of channels was important for a steeply
sloping hearing loss. Kates showed that speech intelligibility was reduced when the number
of compression channels was reduced. The benefits of using multichannel fast-acting
compression for speech intelligibility were only shown in certain conditions. Bustamante
and Braida (1987) showed that higher speech intelligibility was achieved with multichannel
WDRCs at low presentation levels because the independent channel compression allowed
more gain to be introduced in each channel for an equivalent overall loudness generated by a
linear gain. However, speech intelligibility became worse at high presentation levels because
of spectral distortions inflicted by an independent channel compression. Only subjects with
moderate hearing loss received benefits from a multichannel fast AGC over a linear
amplification in the low SNR condition (Yund and Buckles 1995a).
Numerous studies explored the benefits of different AGC systems on speech intelligibility
and quality for hearing aid users. The results are mixed with no consensus on how to best
configure a compression system for a given hearing loss (Kates 2010). Many studies showed
the advantage of compression over linear amplification (Laurence, Moore and Glasberg
1983; King and Martin 1984; Gatehouse, Naylor and Elberling 2006; Shi and Doherty 2008),
some showed no advantage (Crain and Yund 1995; Yund and Buckles 1995a; Davies-Venn,
Souza and Fabry 2007) and a few showed that linear amplification had an advantage over
compression (Lippmann, Braida and Durlach 1981; van Buuren, Festen and Houtgast 1999).
Although the studies aimed to characterize the same parameter, their results are difficult to
compare because the AGC systems were configured differently. In
addition, the use of different test stimuli and test setups on subjects with different degrees of
hearing loss makes comparison between studies almost impossible.
Many fitting procedures, such as NAL-NL1 (Byrne et al. 2001), Camfit (Moore 2000), and
the Desired Sensation Level (Scollie et al. 2005) have been developed for configuration of
AGC systems in hearing aids as a function of hearing loss. According to the hearing aid
fitting literature, the most effective compression parameters depend on the hearing loss of the
wearer. Reported effects of compression parameters on the speech
intelligibility of hearing aid users are often mixed. Although no single set of AGC parameters
is optimal in all listening conditions, it is generally agreed that
a high compression ratio should be avoided if the compression threshold is low and the time
constants are fast. Similarly, AGC systems should avoid using a large number of compression channels
if the compression ratio is high and the compression channels are independent (Plomp 1988).
Levitt stated that the magnitude and impact of individual differences were greater with more
advanced compression systems and therefore should not be underestimated (Bacon, Fay and
Popper 2004).
5.4 AGC in Cochlear Implant Systems
The purpose of an AGC in a cochlear implant system is to provide access to sounds that
would otherwise be outside of the input dynamic range of the cochlear implant. The
rationale for the input dynamic range selection was reviewed in §4.3.8.
Sensitivity control, located after the Analog-to-Digital Converter (ADC) of the microphones,
is the first gain control in the cochlear implant signal path. It provides a linear gain to adjust
the sensitivity of the microphone signals as well as the knee point of the front-end AGC.
Increasing sensitivity brings more low-level input signals into the electrical dynamic range.
It can increase the intelligibility of low-level speech components. Decreasing sensitivity, on
the other hand, can push low-level signals out of the operating range, and this setting can be
useful in noisy situations. The sensitivity control adjusts the gain for varying acoustic
scenarios only through the user’s intervention. Adjusting it frequently is inconvenient
in some listening situations. Besides, a child recipient may not be able to
adjust the gain manually without adult supervision. Therefore an AGC is necessary to
maintain the input signal at the optimal level in different listening environments.
In cochlear implant systems, different forms of compression are used to present a wide range
of auditory signals to the implant. Relatively little has been published on AGC systems in
cochlear implant systems compared to AGC systems in hearing aids. Like
AGC systems in hearing aids, AGC systems in cochlear implant systems can be
divided into three categories: a compression limiter, a WDRC, and a slow AGC. The purpose of a
compression limiter is to reduce peak-clipping distortion due to the instantaneous infinite
compression at the LGF. The purpose of a WDRC is to provide compression amplification to
promote the audibility of low-level input stimuli. A slow AGC is employed in cochlear
implant systems to adjust the long-term gain for an input signal to adapt to changes in listening
situations.
McDermott et al. (2002) showed the benefits of a WDRC, known as Whisper in the Nucleus
systems, for low level speech understanding in quiet. Whisper improved speech intelligibility
of cochlear implant recipients for speech presented at low levels. Although the effect was
not statistically significant, Whisper tended to degrade performance in noise.
This was explained by the reduction of the effective SNR when background noise was boosted
during speech pauses. Six subjects who used Whisper in real-life listening environments
reported that compression made background noise louder. All subjects except one agreed
that the benefits of Whisper outweighed the background noise. It should be noted that
individuals with a good sensitivity to temporal fine structures can get more benefits from a
fast compression (Moore 2008). Since cochlear implant recipients have poor sensitivity to
temporal fine structures, they are less likely to have much benefit from a fast compression.
A slow AGC is necessary in a cochlear implant system to expand the range of input signals
without causing the spectral and temporal distortion that is typically associated with rapid gain
changes. Automatic Sensitivity Control (ASC) is a slow AGC employed in the Nucleus
systems to control the level of background noise (Seligman and Whitford 1995). ASC
improved speech intelligibility of cochlear implant recipients in noisy conditions (Wolfe et
al. 2009).
Modern cochlear implant sound processors use sophisticated AGC systems with multiple
control loops to meet the requirements for short-term and long-term level adjustments. The
dual-time-constant AGC system developed in Cambridge has been widely accepted in
contemporary cochlear implant systems as a standard AGC (Moore and Glasberg 1988;
Moore, Glasberg and Stone 1991; Stone et al. 1999; Spahr, Dorman and Loiselle 2007;
Boyle et al. 2009). If a loud transient comes in, the dual-loop AGC passes the control to the
fast compression to compress the input signal immediately. Hence it can overcome the
disadvantage of a slow AGC which often makes sounds inaudible after loud transients due to
a long release time. When the transient is gone, the AGC then passes the control back to the
slow component to determine the output level.
Stöbich et al. (1999) evaluated six front-end configurations in the Med-El Combi-40 system:
the standard slow AGC, four configurations of a dual-loop AGC and a linear gain.
Stöbich et al. showed that the AGCs evaluated were effective in maintaining speech
intelligibility at soft, medium, and loud levels (55, 70 and 85 dB SPL), with no significant
performance difference between them. Although the slow AGC was effective for speech
presented at the three presentation levels, subjects performed significantly better with the
dual-loop AGCs in listening situations that involved impulsive noise.
Boyle et al. (2009) compared the performance of two front-end AGC systems, a fast AGC
and a dual-time-constant AGC, with six cochlear implant recipients. Figure 5-5 shows the
implementation of the dual-loop AGC system evaluated in Boyle et al.’s study. If the output
level was more than 8 dB above the running level determined by the slow system, the fast AGC
rapidly reduced the gain. A hold time was inserted in the dual-loop AGC of Boyle et al.’s
study. When the hold timer was activated, the gain was frozen to prevent the background sounds
from becoming loud in short gaps between speech components. Since the compression
threshold of the slow AGC was lower than that of the fast AGC, the operation of the dual-
loop AGC was mainly determined by the slow component.
Figure 5-5 Block diagram of dual time-constant AGC system (Boyle et al. 2009)
The dual-loop AGC showed a significantly better performance than the fast AGC in both
fixed and roving presentation level tests. Boyle et al. explained that the performance
degradation with the fast AGC due to the disruption of low-rate envelope cues was reduced
in the dual-loop AGC. The quality questionnaire indicated that the majority of the
participants preferred the dual-loop AGC to the fast AGC. Many found that it was
easier to separate target speech from the background noise with the dual-loop AGC.
A multichannel AGC operates on the channel amplitudes after the filterbank in the cochlear
implant signal path. The compression channels can be independent of each other or
interdependent. If the number of independent compression channels is the same as the
number of frequency channels, the rate of gain change must be very slow to avoid envelope
distortion. A multichannel slow AGC called Adaptive Dynamic Range Optimization
(ADRO) (Blamey 2005) has been used in the Nucleus systems (James et al. 2002; Patrick,
Busby and Gibson 2006). ADRO independently and slowly adjusts gain in each frequency
channel using some statistical rules. It improved speech intelligibility of cochlear implant
recipients at low presentation level in quiet (Dawson, Decker and Psarros 2004; Iwaki,
Blamey and Kubo 2008; Muller-Deile et al. 2008). Not all studies showed the benefits of
ADRO in noise. Although ADRO has a background noise rule to limit the level of
background noise, it was not very active due to a high noise threshold.
Little has been published on the performance of just the compression limiter on speech
intelligibility of cochlear implant recipients. It has been employed in the Nucleus systems as
a default AGC to limit envelope clipping and loudness discomfort of input signals above the
target level (C-SPL). The compression limiter frequently adjusts short-term levels of high
level input signals above the compression threshold. As its role seems to be important for
loud sounds, at least initially before the slow system takes over the operation, the effects of
the compression limiter on the speech intelligibility of cochlear implant recipients are studied
further in the second part of this thesis.
5.5 AGC Systems in the Nucleus Sound Processors
This section first describes the AGC systems in the Nucleus sound processor. It then
describes the implementation of the existing AGC systems in the Nucleus CP810 sound
processors because these AGCs will be evaluated with cochlear implant recipients in the
second part of this thesis. The speech intelligibility of cochlear implant recipients with the
existing AGC system will serve as the performance benchmark against which the proposed gain
optimization algorithms are compared.
The Nucleus sound processors have the front-end gain control algorithms known as ASC,
Whisper and the compression limiter. The multichannel AGC known as Adaptive Dynamic
Range Optimization (ADRO) (James et al. 2002) is available after the filterbank. ADRO has
been part of the Nucleus signal path since 2005 (Patrick, Busby and Gibson 2006).
The AGC systems available in the Nucleus CP810 sound processor are shown in Figure 5-6.
In the Nucleus CP810 sound processor, the front-end AGCs are consolidated into one AGC
at the front-end. Since this AGC can be configured to perform a single or a combination of
the front-end AGC algorithms, it will be called the unified gain model (UGM). The UGM
can emulate the performance of ASC, Whisper and the compression limiter.
Figure 5-6 AGC systems of the Nucleus CP810 sound processor
5.5.1 Compression Limiter
The front-end compression limiter is the default AGC in the Nucleus sound processors. Its
purpose is to reduce envelope clipping of high-level input stimuli. The compression
threshold of the fast AGC is set to compress peaks of the speech waveform whose overall
level is above the targeted level, i.e., C-SPL. A typical C-SPL is 65 dB SPL in the Nucleus
Freedom and the Nucleus CP810 sound processors. An infinite compression ratio is used to
ensure that the output signal cannot exceed the compression threshold. The attack and
release times are typically less than 10 ms and 100 ms, respectively, for this type of AGC.
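The behaviour of such a limiter can be sketched as follows. This is a minimal illustration operating on levels in dB, not the Nucleus implementation: the function names, the one-pole gain smoothing, the 1 kHz level update rate and the example time constants are all assumptions chosen for illustration.

```python
import math


def limiter_static_gain_db(level_db, threshold_db):
    """Gain (dB) of an infinite-ratio limiter: 0 below threshold, negative above."""
    return min(0.0, threshold_db - level_db)


def one_pole_coeff(time_ms, rate_hz):
    """Smoothing coefficient of a one-pole filter with the given time constant."""
    return math.exp(-1.0 / (time_ms * 1e-3 * rate_hz))


def run_limiter(levels_db, threshold_db, attack_ms, release_ms, rate_hz):
    """Return the smoothed limiter gain (dB) for a sequence of input levels."""
    attack = one_pole_coeff(attack_ms, rate_hz)
    release = one_pole_coeff(release_ms, rate_hz)
    gain_db = 0.0
    gains = []
    for level_db in levels_db:
        target_db = limiter_static_gain_db(level_db, threshold_db)
        # Falling gain (more attenuation) uses the attack time,
        # rising gain (recovery) uses the release time.
        coeff = attack if target_db < gain_db else release
        gain_db = coeff * gain_db + (1.0 - coeff) * target_db
        gains.append(gain_db)
    return gains


# Hypothetical example: 1 s of an 83 dB input, 73 dB threshold, 1 kHz level rate.
gains = run_limiter([83.0] * 1000, threshold_db=73.0, attack_ms=5.0,
                    release_ms=50.0, rate_hz=1000.0)
```

In this sketch the gain settles at the static value (threshold minus input level), so the steady-state output sits at the threshold. During the attack phase the smoothed gain lags the target, which is why a limiter with a finite attack time can still let brief overshoots through.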
5.5.2 Automatic Sensitivity Control
Automatic Sensitivity Control (ASC) is a slow AGC that automatically adjusts the
microphone sensitivity (gain) slowly to control the level of background noise that exceeds
the breakpoint (Seligman and Whitford 1995). It gives listening comfort to the recipients in
difficult listening situations with high background noise (Wolfe et al. 2009).
The block diagram of ASC is shown in Figure 5-7. It is a feedback comparison system
in which the envelope detector is followed by the noise floor detector. The envelope detector
follows the peaks of the half-wave rectified input signal. The noise floor detector tracks the
minimum level of the envelope. The gain provided by ASC is based on the comparison
between the estimated noise floor level and the ASC breakpoint. If the estimated noise floor
is above the breakpoint, the gain is slowly decreased; if it is below the breakpoint, the gain
is slowly increased. The maximum gain is limited by the user’s sensitivity setting.
[Figure 5-7 blocks: ASC input, ASC gain (sensitivity), ASC output, half-wave rectifier, envelope detector, noise floor detector, sensitivity (gain) adjustment, breakpoint (noise floor target)]
Figure 5-7 Block diagram of Automatic Sensitivity Control (Seligman 2000)
When ASC is not enabled, a fixed gain is applied to the input signals. When ASC is enabled,
the sensitivity stays at the default setting of 12 (0 dB gain) as long as the estimated noise floor
is below the ASC breakpoint. The ASC reduces gain in the presence of long-duration high-
level noise while having negligible effects on speech signals in quiet conditions.
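The feedback loop described above can be sketched as follows. This is a simplified illustration rather than the ASC implementation: it works directly on envelope levels in dB (the rectifier and peak detector are omitted), it assumes the noise floor is estimated at the ASC output, and the step sizes and the 57 dB breakpoint in the usage note are hypothetical values.

```python
def asc_step(gain_db, floor_db, breakpoint_db, step_db, max_gain_db=0.0):
    """One slow gain update driven by the estimated noise floor."""
    if floor_db > breakpoint_db:
        gain_db -= step_db          # noise floor too high: reduce sensitivity
    elif floor_db < breakpoint_db:
        gain_db += step_db          # noise floor low: restore sensitivity
    return min(gain_db, max_gain_db)  # never exceed the user's setting


def run_asc(input_levels_db, breakpoint_db, step_db=0.01, floor_rise_db=0.001):
    """Run the feedback loop over envelope levels (dB); return the final gain."""
    gain_db = 0.0
    floor_db = None
    for level_db in input_levels_db:
        output_db = level_db + gain_db
        if floor_db is None or output_db < floor_db:
            floor_db = output_db    # minimum tracker follows dips immediately
        else:
            # ... and creeps upwards slowly when the level stays high.
            floor_db = min(output_db, floor_db + floor_rise_db)
        gain_db = asc_step(gain_db, floor_db, breakpoint_db, step_db)
    return gain_db
```

With a steady 70 dB noise input and a 57 dB breakpoint, `run_asc([70.0] * 5000, 57.0)` settles near -13 dB, so the noise floor at the output sits at the breakpoint. With a steady 40 dB input the gain remains at the 0 dB maximum.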
5.5.3 Whisper
Whisper is a fast-acting front-end AGC with a post-compression gain boost (McDermott,
Henshall and McKay 2002). Whisper is a WDRC that adjusts the intensity difference
between loud and soft speech components. The purpose of Whisper is to increase the
perceived loudness of low-level speech below the compression threshold. The compression
threshold of Whisper is 52 dB SPL and the compression speed is approximately at a syllabic
rate. The compression ratio of Whisper is 2:1. Figure 5-8 shows the input-output diagram of
Whisper.
[Figure 5-8 plot: Whisper I/O curve. Horizontal axis: Input (dB SPL, 1 kHz sine), 30 to 80. Vertical axis: Output (dB), 30 to 80. Annotations: knee point, 2:1 slope.]
Figure 5-8 Input-output diagram of Whisper
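The static input-output relationship of a compressor with the Whisper parameters given above (52 dB SPL knee point, 2:1 ratio) can be sketched as follows. The function name is hypothetical, the curve is assumed to be linear below the knee, and the size of the post-compression gain boost is not stated in the text, so it is left as a parameter (`boost_db`) with a placeholder default of 0 dB.

```python
def whisper_output_db(input_db, knee_db=52.0, ratio=2.0, boost_db=0.0):
    """Static output level (dB) of a 2:1 compressor with a post-compression boost."""
    if input_db <= knee_db:
        compressed_db = input_db                       # linear below the knee
    else:
        compressed_db = knee_db + (input_db - knee_db) / ratio
    return compressed_db + boost_db                    # boost_db: assumed value


# A 20 dB increase above the knee produces only a 10 dB increase at the output.
print(whisper_output_db(52.0), whisper_output_db(72.0))
```

Because the boost is applied after compression, levels below the knee are raised by the full boost while levels above the knee are raised by progressively less relative to the input, which is what narrows the intensity difference between loud and soft speech components.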
5.5.4 Unified Gain Model (Tri-loop AGC)
The unified gain model (UGM) consists of a slow, medium and fast AGC. The fast AGC is
functionally the same as the compression limiter (§5.5.1). The medium AGC with the
intermediate time constants between fast and slow is to provide listening comfort. The AGCs
of the UGM are cascaded in the order of time constants, the slow AGC followed by the
medium AGC and then the fast AGC. When the UGM is configured to operate all three AGC
components, it is referred to as a tri-loop AGC. The input to the UGM is either the RMS or
the peak of the half-wave rectified envelopes. The tri-loop AGC evaluated in this thesis used
an RMS level
detector. The output of the level detection circuit is updated at the envelope sample rate after
the filterbank. Since the compression threshold of the slow AGC is lower than that of the fast
AGC, the operation of the tri-loop AGC is mainly determined by the slow compression. The
slow AGC determines the amount of gain based on the comparison between the input level
and the slow compression threshold. The medium AGC determines the amount of gain based
on the comparison between the compressed input level (i.e., the level at the output of the
slow AGC) and the medium compression threshold. Likewise, the fast AGC determines the
amount of gain based on the comparison between the input level already compressed by the
slow and medium AGCs and the fast compression threshold. The total gain is a linear
combination of the gains, in decibels, of all AGCs involved.
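The cascade can be illustrated with a steady-state sketch. It ignores all time constants, takes the thresholds of the Tri-loop 65 setting in Table 5-2 (54, 64 and 69, read here as dB values), and assumes that the total gain is the plain sum of the three gains in dB. The function names are hypothetical.

```python
import math


def static_gain_db(level_db, threshold_db, ratio):
    """Steady-state gain (dB) of one compressor stage; ratio may be math.inf."""
    excess_db = level_db - threshold_db
    if excess_db <= 0.0:
        return 0.0
    return -excess_db * (1.0 - 1.0 / ratio)


def tri_loop_static_gains(input_db, thresholds_db=(54.0, 64.0, 69.0),
                          ratios=(math.inf, math.inf, math.inf)):
    """Gains of the slow, medium and fast stages in cascade, and their sum."""
    level_db = input_db
    gains = []
    for threshold_db, ratio in zip(thresholds_db, ratios):
        gain_db = static_gain_db(level_db, threshold_db, ratio)
        gains.append(gain_db)
        level_db += gain_db      # each stage sees the already-compressed level
    return gains, sum(gains)


print(tri_loop_static_gains(80.0))   # slow stage does all the work
```

For a steady 80 dB input the slow stage alone supplies the full 26 dB of attenuation and the medium and fast stages stay idle, which matches the statement that the operation is mainly determined by the slow compression. The medium and fast stages only contribute during transients, before the slow gain has had time to settle.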
Many forms of AGC, for example emulation of ASC, Whisper and the front-end
compression limiter of the Freedom processing, can be configured by different parameter
settings of the UGM. Table 5-2 shows three settings of the UGM parameters used in this
thesis. The first column named FEL is the UGM parameter setting that emulates the front-
end compression limiter of the Nucleus Freedom processor. The second and third columns
are two configurations of the tri-loop AGC used in this thesis.
UGM Parameter Setting               FEL    Tri-loop 75   Tri-loop 65

AGC Level Detection                 Peak   RMS           RMS
Slow AGC Attack Time                0      8000          8000
Slow AGC Release Time               0      8000          8000
Slow AGC Compression Threshold      95     54            54
Slow AGC Compression Ratio          1      Inf           Inf
Medium AGC Attack Time              0      300           300
Medium AGC Release Time             0      2000          2000
Medium AGC Hold Time                0      100           100
Medium AGC Compression Threshold    95     74            64
Medium AGC Compression Ratio        1      Inf           Inf
Fast AGC Attack Time                3      5             5
Fast AGC Release Time               53     100           100
Fast AGC Compression Threshold      73     79            69
Fast AGC Compression Ratio          Inf    Inf           Inf
Table 5-2 Parameter settings of the UGM
Figure 5-9 shows the input and output signals and the compression applied by the tri-loop AGC
(Tri-loop 65) for a 1 kHz sinusoidal input with step changes in the presentation level. When the
presentation level of the input signal stepped up at 7 seconds, it exceeded the compression
threshold of the slow AGC. Hence the slow AGC slowly reduced the gain in response to the
increased level of the input signal. The medium AGC also reduced the gain since the
compressed input level at the output of the slow AGC was still above the compression
threshold of the medium AGC. At about 9 and 16.5 seconds, the input stepped up again
and triggered all AGCs. When the medium AGC gain reductions at 9 and 16.5 seconds are
compared, the one at 16.5 seconds was smaller because the input level had already been
reduced by the slow AGC in the latter case. The same observation was made for the fast
AGC. Since the overall compression was mainly determined by the slow AGC, less temporal
and spectral distortion is expected with this multiple-loop compression scheme.
[Figure 5-9 plot, five panels against Time (s), 0 to 30: (1) Input level (dB SPL, 40 to 80) with the slow, medium and fast AGC compression thresholds; (2) Fast AGC Gain; (3) Medium AGC Gain; (4) Slow AGC Gain; (5) Total Gain. Gain panels span 0 to -30 dB.]
Figure 5-9 Input, output and gain signals of the tri-loop AGC on a roving-level sinusoid
5.5.5 Adaptive Dynamic Range Optimization
The Adaptive Dynamic Range Optimization (ADRO) (James et al. 2002) is a slow
multichannel AGC located after the filterbank in the cochlear implant signal path (Figure
5-6). ADRO independently adjusts the gain in each frequency channel. The main objective
of ADRO is to make the output comfortably loud. The gain rules of the ADRO are devised
to meet the following aims:
• to improve the audibility of soft sounds,
• to maintain loud sounds at a comfortable level, and
• to keep background noise below an objectionable level.
The inputs to the control block are percentile estimates of the long-term level of the output
in each frequency channel. The percentile levels are estimated by Nysen’s recursive
percentile estimator (Nysen 1980). The recursive estimator operates by taking the estimate held in the
data register and comparing it with the instantaneous signal level. If the level is greater than
the estimate, the estimate is increased by an amount proportional to ‘q’, the desired
percentile level. If the level is less than the estimate, the estimate is decreased by an amount
proportional to ‘1 - q’.
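The update rule can be sketched in a few lines. This is a minimal illustration of the recursive idea described above, not the ADRO code: the function names, the step size and the initial value are assumptions.

```python
def update_percentile(estimate, level, q, step):
    """One recursive update of a running q-th percentile estimate (0 < q < 1)."""
    if level > estimate:
        return estimate + step * q            # move up in proportion to q
    if level < estimate:
        return estimate - step * (1.0 - q)    # move down in proportion to 1 - q
    return estimate


def track_percentile(levels, q, step=0.05, initial=0.0):
    """Run the estimator over a sequence of levels and return the final estimate."""
    estimate = initial
    for level in levels:
        estimate = update_percentile(estimate, level, q, step)
    return estimate
```

The estimate stops drifting when upward and downward moves balance, that is when the fraction of samples above the estimate is 1 - q, which is the definition of the q-th percentile. No buffer of past samples and no sorting is needed, only one stored value per channel and per percentile. For levels drawn uniformly between 0 and 100, the estimate for q = 0.9 settles close to 90.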
For each gain rule, a percentile estimate is compared with a preset threshold to determine the
direction of gain, up or down, for each channel. Figure 5-10 shows the operation of ADRO
in one channel. Three gain rules operate ADRO as per the objectives outlined above. Only
one rule is activated at a time. Gain rules are in the order of importance as shown in the
pseudo code below:
if high_percentile > high_target      % Comfort rule
    reduce gain;
elseif low_percentile > low_target    % Noise rule
    reduce gain;
elseif mid_percentile < mid_target    % Audibility rule
    increase gain;
else
    hold gain;
end
The rate of gain change is typically slow. The precedence of the rules indicates that listening
comfort is given higher priority than audibility. The final gain is limited to not exceed a
predetermined maximum gain.
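A runnable version of the rule precedence for a single channel might look like the sketch below. The function signature, the fixed step size and the target values used in the usage note are hypothetical; only the order of the rules and the maximum gain limit are taken from the description above.

```python
def adro_gain_step(gain_db, high_pct, mid_pct, low_pct,
                   high_target, mid_target, low_target,
                   step_db, max_gain_db):
    """One ADRO-style gain update for one channel; only one rule fires per call."""
    if high_pct > high_target:       # comfort rule has the highest priority
        gain_db -= step_db
    elif low_pct > low_target:       # background noise rule
        gain_db -= step_db
    elif mid_pct < mid_target:       # audibility rule
        gain_db += step_db
    # Otherwise the gain is held. The result never exceeds the maximum gain.
    return min(gain_db, max_gain_db)
```

For example, with targets of 70, 55 and 40 for the high, mid and low percentiles, a channel whose high percentile is 80 has its gain reduced even if its mid percentile is below target, because the comfort rule is tested first.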
Figure 5-10 Block diagram of ADRO in one frequency channel
One possible drawback of ADRO is that the background noise rule can reduce audibility if the
background noise floor estimated by the percentile estimator is inaccurate. Similarly, the
audibility rule can increase the background noise level together with the target speech. Therefore
ADRO limits the maximum gain to prevent the output from becoming
excessively loud. A few studies have evaluated ADRO with cochlear implant
recipients (James et al. 2002; Muller-Deile et al. 2008). All of them consistently show the
benefit of ADRO for low-level speech perception of the cochlear implant subjects.
5.6 Noise Estimation
Noise reduction and speech enhancement algorithms estimate the inherent noise power from
the input stimuli. Noise estimation is also part of some AGCs, for example ASC and ADRO, to
reduce the overall level of the input stimuli in high background noise.
A single-channel noise estimation algorithm does not have exclusive access to the noise
power. The input signal to a noise estimation algorithm is a mixture of speech and other
competing signals. Therefore, noise estimation algorithms work under the assumption that
speech and noise are independent signals, and they are either temporally or spectrally
distinguishable. If speech and noise come from spatially different sources, for example
listening to the speaker in front while lawn-mowing noise comes from behind, the two
signals can be separated by a microphone-array beamforming algorithm. This thesis is
concerned with the single-channel noise estimation algorithms that estimate noise power
based on different spectro-temporal characteristics of speech and noise signals. Noise
estimation and speech enhancement are active research topics in speech processing and
communication fields.
Single-channel noise estimation algorithms can generally be sorted into three categories
(Loizou 2007):
• Minimum-tracking algorithms
• Time-recursive averaging algorithms
• Histogram-based algorithms
The minimum-tracking algorithms work under the assumption that noisy speech power in
each frequency band often decays to the noise power, even during speech activity (Martin
1994, 2001). Therefore noise power can be estimated by tracking the minimum of the noisy
speech power within a short time window of duration from 400 ms to 1 second. Martin’s
minimum statistics noise estimation algorithm has been successfully used in the SNR-based
Noise Reduction algorithm of the Nucleus CP810 sound processor (Dawson, Mauger and
Hersbach 2011; Hersbach et al. 2012).
The recursive averaging algorithms update the noise estimate whenever the effective SNR is low
(Lin, Holmes and Ambikairajah 2003a, b). The updated noise power is the weighted average
of the estimated noise power from the previous frames and the noisy signal power spectrum
at the present frame. In the noise estimation method proposed by Lin, Holmes and
Ambikairajah (2003b), the coefficient of the noise update filter is adaptively calculated from
the effective SNR. Other implementations of time-recursive averaging
algorithms used a fixed coefficient for the noise update filter but only updated the
estimated noise power when the effective SNR fell below a certain threshold value (Hirsch
and Ehrlicher 1995) or when the noisy spectrum fell within a fraction of the variance of
the noise estimate (Ris and Dupont 2001). Other types of time-recursive averaging
algorithms calculated the probability of speech presence in the frequency bins and updated
the smoothing coefficient accordingly (Malah, Cox and Accardi 1999). The speech presence
probability was also calculated from the ratio of the noisy power spectrum to the spectral
minimum from the previous frames (Cohen and Berdugo 2002). Such algorithms use a
spectral minima-tracking feature and are also known as Minima-Controlled Recursive
Averaging (MCRA) algorithms.
The histogram-based techniques are based on the observation that noise components occur
more frequently than speech components (Loizou 2007). The histogram of noisy speech is
expected to be bimodal, with the lower-level mode corresponding to the noise components.
In the study of Stahl et al. (2000), using the median of the noisy speech as the noise floor
estimate reduced the word error rate. Histogram-based techniques can often be slow due to
the process of sorting the signal levels and counting the number of occurrences. They also
require a large time window to collect samples to obtain meaningful statistical information
from the input signal. Nysen (1980) proposed a method that efficiently estimates a percentile
without having to collect and sort a large number of samples from the input signal. Nysen’s
percentile estimator has been successfully used in ADRO (James et al. 2002; Blamey 2005;
Blamey, Macfarlane and Steele 2005).
The recursive averaging sub-band noise estimation method proposed by Lin (2004) was
evaluated in the proposed Adaptive Loudness Growth Function (§12.3.3). Minima
tracking and bias compensation from Martin’s minimum statistics noise estimation (Martin
2001) are applied in the proposed noise estimator. Therefore both Martin’s and Lin’s methods
are described in the following subsections.
5.6.1 Martin’s Minimum Statistics Noise Estimator
The Power Spectral Density (PSD) of noisy speech often decays to the noise power level
even during speech activity. Hence the power level of noise floor can be estimated within a
time window that is long enough to cover speech components with brief pause periods. The
noisy speech PSD is recursively smoothed by a first-order IIR filter.
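$$P(\lambda,k) = \alpha\,P(\lambda-1,k) + (1-\alpha)\,|Y(\lambda,k)|^2 \tag{5.1}$$
where $\lambda$ is the frame index, k is the frequency bin index, $P(\lambda,k)$ is the smoothed PSD of the
noisy signal, $\alpha$ is the smoothing parameter and $|Y(\lambda,k)|^2$ is the PSD of the noisy signal. As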
per the implementation in Martin (1994), the smoothing parameter is calculated as:
(5.2)
where = the window length, R = FFT buffer length and is the sample rate. The
smoothing parameter is fixed and not updated over time. Using a fixed smoothing parameter
can widen spectral peaks and the noise estimate may not be accurate. In Martin (2001), a
time-varying smoothing parameter was introduced. The optimal smoothing parameter is
derived as:
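$$\alpha_{opt}(\lambda,k) = \frac{1}{1 + \left( P(\lambda-1,k)/\sigma_N^2(\lambda,k) - 1 \right)^2} \tag{5.3}$$
where $\sigma_N^2(\lambda,k)$ is the true noise PSD and $P(\lambda-1,k)$ is the smoothed noisy signal PSD of the
previous frame. In the actual implementation the true noise PSD is
replaced by the estimated noise PSD from the previous frame, $\hat{\sigma}_N^2(\lambda-1,k)$. Because
the estimated noise PSD is used, the smoothing parameter needs to be limited to a maximum
$\alpha_{max}$, with a value less than one, to
avoid a deadlock occurring when the noise PSD is equal to the smoothed noisy signal PSD.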
When it happens, the smoothed noisy signal PSD cannot react quickly to the changes in the
actual noisy signal PSD.
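The deadlock can be seen from a short worked example. It assumes the form of the optimal smoothing parameter given in Martin (2001), written in terms of the ratio $r$ of the previous smoothed PSD to the noise PSD estimate. The symbol names and the value 0.96 used for the upper limit are assumptions made for illustration.

```latex
\begin{align*}
\alpha(r) &= \frac{1}{1 + (r-1)^2}, \qquad
  r = \frac{P(\lambda-1,k)}{\hat{\sigma}_N^2(\lambda-1,k)} \\
r = 1 &: \quad \alpha = 1 \;\Rightarrow\; P(\lambda,k) = P(\lambda-1,k)
  \quad \text{(the new frame is ignored)} \\
r = 2 &: \quad \alpha = \tfrac{1}{2} \\
r = 4 &: \quad \alpha = \tfrac{1}{10} \\
\alpha \le \alpha_{max} = 0.96,\; r = 1 &: \quad
  P(\lambda,k) = 0.96\,P(\lambda-1,k) + 0.04\,|Y(\lambda,k)|^2
\end{align*}
```

With the upper limit in place, a small fraction of every new frame always enters the smoothed PSD, so the estimate can leave the state in which the smoothed PSD and the noise estimate are equal.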
The estimated noise PSD lags behind the actual noise PSD by a number of frames depending
on how the minimum of the smoothed noisy signal PSD is found from the previous frames.
Hence an error monitoring procedure is necessary to detect the deviation between the PSDs
of the smoothed noisy signal and the average of the actual noisy signal. The correction factor
is calculated as:
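$$\tilde{\alpha}_c(\lambda) = \frac{1}{1 + \left( \sum_k P(\lambda-1,k) \Big/ \sum_k |Y(\lambda,k)|^2 - 1 \right)^2} \tag{5.4}$$
where $P(\lambda-1,k)$ is the smoothed noisy signal PSD of the previous frame and $|Y(\lambda,k)|^2$ is the
PSD of the noisy signal in the current frame. The value of $\tilde{\alpha}_c(\lambda)$ is limited and it is
further smoothed by a first-order IIR filter to give $\alpha_c(\lambda)$.
Then the smoothing parameter is finally calculated by:
$$\alpha(\lambda,k) = \frac{\alpha_{max}\,\alpha_c(\lambda)}{1 + \left( P(\lambda-1,k)/\hat{\sigma}_N^2(\lambda-1,k) - 1 \right)^2} \tag{5.5}$$
where $\alpha_{max}$ is the upper limit of the smoothing parameter and $\hat{\sigma}_N^2(\lambda-1,k)$ is the noise
PSD estimated in the previous frame. The minimum value of the smoothing parameter $\alpha(\lambda,k)$ is limited to avoid reaching zero.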
The minimum value $P_{min}(\lambda,k)$ of the smoothed noisy signal PSD $P(\lambda,k)$ is searched over D
consecutive frames. $P_{min}(\lambda,k)$ can be updated for every frame but it is computationally
expensive to do D - 1 compare operations for each frequency band. The search can be
done efficiently by a simple procedure described in Martin (1994). The update is done
every D frames, but the minimum is found for each frame by comparing $P(\lambda,k)$ only with the
value from the previous frame, $P_{min}(\lambda-1,k)$. A temporary variable $P_{tmp}(\lambda,k)$ holds the
minimum found since the last update. The minimum searching procedure is described in
the following pseudo code.
if modulus(lambda, D) == 0    % at Dth frame
    Pmin(lambda,k) = min(Ptmp(lambda-1,k), P(lambda,k));
    Ptmp(lambda,k) = P(lambda,k);
else                          % at other frames
    Pmin(lambda,k) = min(Pmin(lambda-1,k), P(lambda,k));
    Ptmp(lambda,k) = min(Ptmp(lambda-1,k), P(lambda,k));
end
$P_{min}(\lambda,k)$ is the estimated noise PSD that represents $\hat{\sigma}_N^2(\lambda,k)$. The procedure is done for
each frequency band. The estimated noise PSD is the minimum of the smoothed noisy signal
PSD.
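The smoothing and minimum search can be sketched for a single frequency band as follows. This is an illustrative sketch, not the thesis implementation: it uses a fixed smoothing parameter, the variable names are assumptions, and the two-register minimum search follows the commonly described form of the procedure in which a temporary minimum is carried between updates.

```python
def smooth_psd(noisy_power, alpha):
    """First-order recursive smoothing of the noisy power in one band."""
    smoothed = []
    p = noisy_power[0]                      # initialise with the first frame
    for y in noisy_power:
        p = alpha * p + (1.0 - alpha) * y
        smoothed.append(p)
    return smoothed


def track_minimum(smoothed_psd, window):
    """Minimum of the smoothed PSD over roughly the last `window` frames."""
    p_min = p_tmp = smoothed_psd[0]
    minima = []
    for frame, p in enumerate(smoothed_psd):
        if frame % window == 0:
            # Every `window` frames: adopt the temporary minimum and restart it.
            p_min = min(p_tmp, p)
            p_tmp = p
        else:
            # Other frames: one comparison per register, no buffer search.
            p_min = min(p_min, p)
            p_tmp = min(p_tmp, p)
        minima.append(p_min)
    return minima


print(track_minimum([5, 4, 6, 3, 7, 8, 9, 9, 9, 9], window=3))
```

In the printed example the low value of 3 is retained for several frames after the signal has risen and is only released at the second update that follows it. This shows both the efficiency of the scheme (two comparisons per frame) and the lag of the noise estimate when the noise level increases.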
The noise PSD estimated by minimum tracking is biased towards values lower than the
actual noise PSD and therefore needs to be compensated. It can be compensated by
multiplying the noise estimate with the reciprocal of the mean of the minimum of a sequence
of random independent variables. The bias is inversely proportional to the number of
previous noisy PSD samples considered in the estimate of the smoothed noisy signal PSD. In
other words, the more previous frames are used to smooth the power spectrum, the lower
the bias is. However, a very long window cannot be used to smooth the power
spectrum of highly non-stationary signals like speech. In practice, an IIR filter is used to
smooth the noisy signal power. As a result the smoothed PSDs between successive frames
are correlated to a certain extent. Therefore the bias factor can only be estimated by
simulating the underlying exponential distribution of the power spectrum for a given number
of frames to search the minimum. The bias calculation is a complicated procedure yet
important for compensating the estimated noise that would otherwise be underestimated.
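The simulation approach can be illustrated with a small Monte Carlo sketch. It draws unit-mean exponentially distributed power samples, smooths them with a first-order IIR filter, and averages the minimum found over a search window. The function name, the number of trials, the burn-in length and the fixed seed are assumptions for illustration; this is not the bias calculation of Martin (2001).

```python
import random


def mean_of_minimum(alpha, window, trials=2000, burn_in=50, seed=0):
    """Average minimum of a smoothed unit-mean exponential sequence."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        p = 1.0                                  # start at the true mean
        minimum = float("inf")
        for frame in range(burn_in + window):
            y = rng.expovariate(1.0)             # unit-mean power sample
            p = alpha * p + (1.0 - alpha) * y    # first-order IIR smoothing
            if frame >= burn_in:                 # search only after settling
                minimum = min(minimum, p)
        total += minimum
    return total / trials


unsmoothed = mean_of_minimum(alpha=0.0, window=4)   # close to 1/4
smoothed = mean_of_minimum(alpha=0.9, window=4)     # closer to 1
print(1.0 / unsmoothed, 1.0 / smoothed)             # bias compensation factors
```

Without smoothing, the samples are independent and the mean of the minimum of D unit-mean exponential samples is 1/D, so the compensation factor for a window of four frames is about 4. With smoothing the samples are correlated and less variable, the minimum lies closer to the true mean, and the required compensation factor is smaller. No simple closed form covers the correlated case, which is why it is estimated by simulation.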
5.6.2 Lin’s Recursive Averaging Noise Estimator
Unlike other methods, the noise estimation using recursive averaging method does not
require a large buffer of samples to update the noise power. It works under the assumption
that speech and noise are independent signals and noise power changes slowly over time.
The noise power in each frequency band is estimated by a first-order IIR filter.
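$$\hat{\sigma}_N^2(\lambda,k) = \alpha(\lambda,k)\,\hat{\sigma}_N^2(\lambda-1,k) + (1-\alpha(\lambda,k))\,|Y(\lambda,k)|^2 \tag{5.6}$$
The key component in equation 5.6 is the smoothing parameter $\alpha(\lambda,k)$, which is updated at
each analysis frame by:
$$\alpha(\lambda,k) = 1 - \min\left\{ 1,\ \left( \frac{|Y(\lambda,k)|^2}{\frac{1}{L}\sum_{i=1}^{L} \hat{\sigma}_N^2(\lambda-i,k)} \right)^{-Q} \right\} \tag{5.7}$$
where L is the number of frames used for estimating the average noise power and Q controls the
steepness of $\alpha(\lambda,k)$, the smoothing parameter. A value of $\alpha(\lambda,k)$ close to 1 results in slow noise estimation. The
value of L is typically set between 5 and 10 and Q is between 3 and 10. Although the
equation uses the average noise power in the denominator, the median
value of the noise power from the previous L frames can also be used. The smoothing parameter $\alpha(\lambda,k)$ is
limited between 0 and 1. When speech is absent, the noisy signal power contains noise only
and $\alpha(\lambda,k) \approx 0$. As a result, the noise power estimate, $\hat{\sigma}_N^2(\lambda,k)$, is updated with the noisy
signal power at that time. When speech is present, $\alpha(\lambda,k) \approx 1$ and the noise power estimate is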
not updated with the new value of noisy speech power. The final noise power estimate is
obtained by smoothing the noise power estimate with a first-order IIR filter.
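$$\bar{\sigma}_N^2(\lambda,k) = \beta\,\bar{\sigma}_N^2(\lambda-1,k) + (1-\beta)\,\hat{\sigma}_N^2(\lambda,k) \tag{5.8}$$
where $\hat{\sigma}_N^2(\lambda,k)$ is the noise power estimate before the final smoothing and $\bar{\sigma}_N^2(\lambda,k)$ is the
final noise power estimate. The value of the IIR filter coefficient $\beta$ of the final smoothing filter is set around 0.5.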
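The behaviour described in this subsection can be sketched for one frequency band as follows. The exact expression for the smoothing parameter is an assumption: it is chosen so that the parameter is 0 when the noisy power does not exceed the recent average noise power and approaches 1 as the ratio grows, with the exponent controlling the steepness. The function names and default values are hypothetical.

```python
def lin_smoothing_parameter(noisy_power, recent_noise, q_exp):
    """Smoothing parameter in [0, 1) from the ratio of noisy to average noise power."""
    average = max(sum(recent_noise) / len(recent_noise), 1e-12)  # avoid divide by zero
    ratio = noisy_power / average
    if ratio <= 1.0:
        return 0.0                    # speech absent: follow the noisy power
    return 1.0 - ratio ** (-q_exp)    # speech present: approaches 1, estimate held


def lin_noise_update(noise_history, noisy_power, n_frames=5, q_exp=5):
    """New noise power estimate from the last n_frames estimates and the noisy power."""
    recent = noise_history[-n_frames:]
    alpha = lin_smoothing_parameter(noisy_power, recent, q_exp)
    return alpha * noise_history[-1] + (1.0 - alpha) * noisy_power


def final_smooth(previous_smoothed, noise_estimate, beta=0.5):
    """Final first-order IIR smoothing of the noise power estimate."""
    return beta * previous_smoothed + (1.0 - beta) * noise_estimate
```

With a noise history of 1.0 in every frame, a noisy power of 0.5 is adopted immediately as the new estimate, whereas a noisy power of 100 (speech present) leaves the estimate practically unchanged at 1.0. Only the last few estimates need to be stored, in contrast with the longer search window of the minimum statistics method.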
5.7 Conclusion
This chapter reviewed AGC systems for hearing aids and cochlear implant systems, and
described the AGC systems in the Nucleus sound processors. The aims of an AGC system
are (i) to provide access to low-level sounds and (ii) to make loud sounds more comfortable.
More importantly, an AGC should achieve both goals with little to no perceptual
distortion. The rationales for AGC systems in hearing aids and cochlear implant systems
differ mainly in terms of compensating for frequency-dependent hearing loss. The optimal
parameter set for an AGC in hearing aids is dependent on many factors and the results are
mixed between studies. The main purpose of AGC systems in cochlear implants is to expand
the predetermined input dynamic range with minimal distortion. Modern cochlear implant
systems use more than one AGC system or an AGC system with multiple (slow and fast)
control loops. While it is important to know the effects of AGC parameters on speech
intelligibility, it is also important to know how to evaluate those effects. Therefore, signal
metrics to quantify the effects of AGC systems and a test methodology for the evaluation of
AGC systems in cochlear implants will be studied in the next chapters.
Chapter 6 Speech Intelligibility Metrics
6 Speech Intelligibility Metrics
6.1 Introduction
Listening tests with recipients are essential for research studies of cochlear implants.
However, listening tests, including both preparation and actual testing, can be time-consuming
for both cochlear implant recipients and researchers, especially when the parameter space of
an algorithm or system is large or performance is affected by factors other than the main
factor to be analyzed.
Blamey et al. (1996) implemented a three-stage model of auditory performance over time
with the data of 800 recipients who benefited from a cochlear implant. The study was
replicated by Lazard et al. (2012) with data from 2251 recipients at 15 international
clinics. The model accounted for 22% of the performance variance. Part of the unexplained
78% was likely due to the test-retest reliability of the speech intelligibility measures
used. The performance variation between recipients and the limited test-retest reliability of
speech tests can mask the performance variation due to the processing conditions under test.
In that case, the overall test results may not be sensitive to the processing conditions.
A signal metric can be defined as a measure to quantify the performance of a system
objectively. A speech metric is a subset of signal metrics that uses speech signals as input
stimuli. Test conditions are simulated offline to gather the output of a system under test for
performance analysis. The advantage of simulating test conditions offline is repeatability,
with results not affected by subject variability. A reliable metric can expedite the
development of new signal processing algorithms. More importantly, metrics that are
highly correlated with the speech intelligibility of cochlear implant recipients help to
identify the important features of speech.
The standard metrics for speech processing and communication systems are the Speech
Intelligibility Index (SII) of ANSI S3.5 and the Speech Transmission Index (STI) of IEC
60268-16. Both quantify speech intelligibility by the average signal-to-noise ratio across
critical
channels. Both measures are applicable to speech applications for normal and hearing
impaired subjects.
Compression affects speech intelligibility by reducing the depth of low-rate modulations,
flattening the spectral profile, introducing temporal distortion and reducing the
signal-to-noise ratio. Measures that can quantify the spectral-temporal distortion of
envelopes are hypothesized to predict speech intelligibility. The prediction power depends on
how sensitive speech understanding is to the envelope distortion. For example, cochlear
implant recipients are more sensitive to noise distortion than normal hearing subjects.
This chapter examines the signal metrics that can quantify envelope distortion due to various
processing in the cochlear implant signal path. The objective is to predict the effects of AGC
systems on the speech intelligibility of cochlear implant recipients. The standard metrics
(SII and STI) and the non-standard measures (the apparent SNR method, the Across-Source
Modulation Correlation (ASMC) and the Normalized Covariance Measure (NCM)) will be studied.
6.2 Speech Intelligibility Index
The Speech Intelligibility Index (SII) is a metric derived from the Articulation Index (AI). AI
was developed by French and Steinberg (1947) and later improved by Kryter (1962a, b). AI is
based on the principle that the intelligibility of distorted, filtered or masked
speech depends on the proportion of speech available to the listener. AI was originally
developed to predict the speech intelligibility of telephone communication systems under
various conditions of noise distortion, filtering and low level speech. The method for
calculating AI was re-examined in many studies (Pavlovic 1984; Pavlovic and Studebaker
1984; Pavlovic, Studebaker and Sherbecoe 1986; Pavlovic 1987) and became the standard
metric SII in ANSI S3.5 (1997). The SII is a quantity that correlates highly with the
intelligibility of speech under adverse listening conditions such as noise masking, filtering
and reverberation. The concept of the SII is that different frequencies contribute different
amounts to speech intelligibility and the intelligibility of a speech communication system
can be predicted by measuring the SNR in each contributing frequency band. This concept
has been widely used as the basis of other speech intelligibility metrics.
The method calculates the intelligibility index by the summation of the audible quantity of
speech that carries intelligibility across the frequency bands. It can be described as:
(6.1)
where the terms are the band importance function, the audibility function W and the
proficiency factor P of both talker and listener on the speech material. The procedure is to
divide the input speech material into a number of critical bands. The audibility W of a
frequency band depends on the effective sensation level of that band in the listener’s ear.
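As a rough illustration of the band-weighted summation described above, the sketch below multiplies a proficiency factor by the sum of band importance times band audibility. It is not the full ANSI S3.5 procedure. In particular, the audibility here is assumed to be a linear mapping of the band SNR from the range -15 to +15 dB onto 0 to 1, and the function names and the example band values are hypothetical.

```python
def band_audibility(snr_db):
    """Map a band SNR in dB onto 0..1 (assumed linear mapping over -15..+15 dB)."""
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))

def intelligibility_index(importance, snr_db, proficiency=1.0):
    """Proficiency times the importance-weighted sum of band audibilities."""
    if len(importance) != len(snr_db):
        raise ValueError("one SNR value is needed per band")
    return proficiency * sum(i * band_audibility(s) for i, s in zip(importance, snr_db))

# Four equally important bands (hypothetical values).
index = intelligibility_index([0.25, 0.25, 0.25, 0.25], [-15.0, 0.0, 15.0, 30.0])
```

In this example the four bands contribute audibilities of 0, 0.5, 1 and 1, so a band that is fully masked adds nothing to the index regardless of its importance.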
The scope of the SII is limited to normal hearing listeners with no linguistic or cognitive
deficiencies listening to natural speech, and the processing should not contain
sharp filters. The procedure was modified for hearing impairment by widening the
integration bandwidth of the speech and noise spectra beyond the critical bandwidth.
From Pavlovic (1984), the discrepancies between the observed and predicted performance
are greatest in those frequency regions where hearing loss is greatest. The contribution factor
of each frequency band was modified by a speech desensitization factor, a function of
hearing loss in frequency (Pavlovic, Studebaker and Sherbecoe 1986). Another
shortcoming of the SII measure is that it is unable to account for the interaction between
various frequency components because it uses the long-term spectrum of speech and noise
(minimum duration of 30 seconds) and hence cannot reliably predict the intelligibility of
speech in non-stationary noise maskers (Rhebergen and Versfeld 2005). The conventional
SII measure was extended by partitioning the speech and noise into smaller frames and
averaging the SII values of those time frames to obtain the final SII value (Rhebergen and
Versfeld 2005; Rhebergen, Versfeld and Dreschler 2006; Rhebergen, Versfeld and Dreschler
2008a).
6.3 Speech Transmission Index
Steeneken and Houtgast (1980) developed the Speech Transmission Index (STI) to quantify
speech transmission channels. The STI measure is based on the idea that the speech
intelligibility reduction can be predicted by the reduction in the temporal envelope
modulations. The method is conceptually very similar to the AI of French and Steinberg
(1947), although it applied the modulation transfer function (MTF) concept to evaluate
temporal characteristics of the system under test. The MTF calculates the reduction of the
output envelope spectrum compared to the input envelope spectrum. STI is obtained by the
weighted summation of the transmission indices in the critical frequency bands from 125 Hz
to 8 kHz. In each critical band, the modulation index was calculated for each frequency
component in the test stimuli. The range of the modulation frequency, typically between 0.5
Hz and 16 Hz, is also the most intelligible range of the connected speech of a typical
conversation. The study of Houtgast and Steeneken (1985) shows that the modulation index
of the speech intensity envelope spectrum is the highest from 2 to 4 Hz, as shown in Figure
6-1 below.
Figure 6-1 Speech modulation envelope spectrum (Houtgast and Steeneken 1985)
A high correlation between the STI and the subjective speech intelligibility scores was
shown for a wide variety of speech communication systems with distortions such as
band-limiting, peak clipping, AGC, noise and reverberation (Houtgast and Steeneken 1973). Some
studies show that the MTF cannot be used directly to predict speech intelligibility, for
example, if the test stimulus is speech in fluctuating background noise. The fluctuations
in the background noise cause the modulation reduction of speech introduced by the
system under test to be underestimated.
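The step from modulation reduction to a transmission index can be sketched as below. The section does not give the formulas, so this assumes the commonly described STI conversion: the modulation index m is converted to an apparent SNR of 10 log10(m / (1 - m)), limited to ±15 dB and mapped linearly onto 0 to 1, then averaged over modulation frequencies and weighted across bands. Function names and weights are hypothetical.

```python
import math

def transmission_index(m):
    """Convert a modulation index (0..1) into a transmission index (0..1)."""
    m = min(max(m, 1e-6), 1.0 - 1e-6)            # keep the logarithm finite
    snr_db = 10.0 * math.log10(m / (1.0 - m))    # apparent SNR from modulation depth
    snr_db = max(-15.0, min(15.0, snr_db))       # limit to the assumed +/-15 dB range
    return (snr_db + 15.0) / 30.0

def sti(mod_indices_per_band, band_weights):
    """Weighted sum over bands of the mean transmission index of each band."""
    total_weight = sum(band_weights)
    value = 0.0
    for indices, weight in zip(mod_indices_per_band, band_weights):
        band_ti = sum(transmission_index(m) for m in indices) / len(indices)
        value += weight * band_ti
    return value / total_weight
```

A modulation index of 0.5 corresponds to an apparent SNR of 0 dB and hence a transmission index of 0.5 in this sketch.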
6.4 Normalized Covariance Measure
The Normalized Covariance Measure (NCM) is an STI-based measure which quantifies the
performance of a system by a weighted sum of the transmission indexes, derived from the SNR,
across critical bands (Holube and Kollmeier 1996). The difference between the
traditional STI and NCM is that NCM uses the covariance between temporal envelope
modulations of the reference and processed signals to determine the transmission index in
each band (Goldsworthy and Greenberg 2004; Chen and Loizou 2010). Chen and Loizou
(2011b) found that STI-based measures needed to use high modulation rates, up to 100 Hz,
for a better prediction of the intelligibility of vocoded speech. They also showed that a
better prediction of the speech intelligibility of cochlear implant recipients was achieved
when subject-specific factors, the channel interaction matrix in particular, were included in
the NCM calculation (Chen and Loizou 2011a).
The procedures to calculate the NCM were described in Ma et al. (2009) and Chen and
Loizou (2011a). The channel interaction matrix (Chen and Loizou 2011a) was calculated
for each electrode as:
(6.2)
where Ii and Ij are the indices of channels i and j, and the remaining parameter is the
current spreading factor. The channel interaction matrix was then multiplied with both the
reference and processed signals for all channels. The signals were then low-pass filtered.
The normalized covariance between the reference and the processed signals was calculated
as:
(6.3)
where the mean values and the envelopes are those of the reference
and the processed signals in the critical band i respectively.
Then the SNR for the critical band i was calculated as:
(6.4)
where the covariance term is the normalized covariance calculated in the previous step by
equation 6.3. The SNR was limited between -15 and 15 dB.
The transmission index was calculated by linearly mapping the SNR between 0 and 1 as:
(6.5)
The band importance function can be as per the standard SII measure. The critical band
weight was calculated as described in Ma, Hu and Loizou (2009):
(6.6)
where x is the magnitude of the input stimuli in the critical band i. The exponent factor p
can be between 0.12 and 1.5.
NCM was finally calculated as the weighted sum of the transmission indexes from the
critical bands:
(6.7)
where K is the total number of critical bands, and the two terms in the sum are the weight
calculated by equation 6.6 and the transmission index of the critical band i calculated by
equation 6.5.
The NCM will be used in §13 to predict speech intelligibility of cochlear implant recipients
with different AGC conditions.
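The NCM steps described in this section (normalized covariance, SNR limited to ±15 dB, linear mapping to a transmission index, weighted sum across bands) can be sketched as follows. The formulas are assumptions made for illustration: the SNR is taken as 10 log10(r² / (1 - r²)), the band weight as the summed squared reference envelope raised to the power p, and the weighted sum is normalized by the sum of the weights. The channel interaction matrix and the low-pass filtering step are omitted, and the function name is hypothetical.

```python
import numpy as np

def ncm(reference_env, processed_env, p=1.0):
    """NCM sketch for band envelopes of shape (bands, samples)."""
    x = np.asarray(reference_env, dtype=float)
    y = np.asarray(processed_env, dtype=float)
    eps = 1e-12
    # Normalized covariance between reference and processed envelopes in each band.
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    denom = np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1)) + eps
    r = (xc * yc).sum(axis=1) / denom
    # Band SNR from the squared covariance (assumed form), limited to -15..+15 dB.
    r2 = np.clip(r ** 2, eps, 1.0 - eps)
    snr_db = np.clip(10.0 * np.log10(r2 / (1.0 - r2)), -15.0, 15.0)
    # Transmission index: linear mapping of the limited SNR onto 0..1.
    ti = (snr_db + 15.0) / 30.0
    # Band weight from the reference envelope energy raised to p (assumed form).
    w = (x ** 2).sum(axis=1) ** p
    return float((w * ti).sum() / w.sum())
```

Because the covariance is normalized, a processed envelope that is only a scaled and offset copy of the reference gives an NCM of 1 in this sketch, while envelopes with zero covariance give 0.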
6.5 Apparent SNR
Sound processing, either linear or nonlinear, can cause the SNR of the output signal to be
different from the SNR of the input stimuli and therefore affect the intelligibility (Steeneken
and Houtgast 1980; Hagerman and Olofsson 2004; Rhebergen, Versfeld and Dreschler
2009). Compression is non-linear processing and therefore the effective SNR at the output
can be different from the input SNR. Souza et al. (2006) showed that fast AGC (WDRC)
degraded SNR due to the amplification of background noise. Even with no amplification, an
AGC could still reduce a positive SNR value of the input signal by compressing target
speech more than background noise. In cochlear implant processing, there are other
non-linear processing stages, such as envelope sampling (maxima selection) and the input
dynamic range limitation, that can affect the output SNR.
Hagerman and Olofsson (2004) proposed an inversion technique to recover speech and noise
from the compressed mixture at the output of an AGC. One of the components (either speech
or noise) was phase inverted and combined with the mixture to recover the other component.
The output SNR was then calculated from the recovered compressed speech and noise. The
method is sensitive to the exact time alignment of the original and phase-inverted waveforms.
The inversion technique is not suitable for evaluating AGC after the filterbank in cochlear
implant processing because phase information is discarded when the FFT bins are combined
into channels.
Rhebergen et al. developed the apparent SNR method to quantify the effect of compression
amplification (Rhebergen, Versfeld and Dreschler 2008b). The gain produced by an AGC for
a high level input signal, the mixture of speech and noise, was applied to the speech and
noise signals separately to calculate the output SNR. The apparent SNR measure will be used
in this thesis to predict the effect of different AGC systems on speech intelligibility of
cochlear implant recipients.
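The shared-gain idea behind the apparent SNR can be sketched as follows: the gain is computed from the mixture, then applied separately to the speech and noise so that the output SNR can be measured. The gain function, signal values and names below are hypothetical; a real AGC would have attack and release dynamics that this toy example ignores.

```python
import numpy as np

def apparent_snr_db(speech, noise, gain_fn):
    """Output SNR after applying the mixture-driven gain to speech and noise separately."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    gain = gain_fn(speech + noise)     # the gain is derived from the mixture only
    speech_out = gain * speech
    noise_out = gain * noise
    return 10.0 * np.log10(np.sum(speech_out ** 2) / np.sum(noise_out ** 2))

def unity_gain(mixture):
    """No AGC: the output SNR equals the input SNR."""
    return np.ones_like(mixture)

def toy_limiter(mixture, threshold=1.0):
    """Hypothetical instantaneous limiter: attenuate samples above the threshold."""
    return threshold / np.maximum(np.abs(mixture), threshold)

speech = np.array([4.0, 0.0, 4.0, 0.0])   # intermittent target speech
noise = np.array([1.0, 1.0, 1.0, 1.0])    # steady background noise
snr_in = apparent_snr_db(speech, noise, unity_gain)
snr_out = apparent_snr_db(speech, noise, toy_limiter)
```

In this toy case the limiter reduces the gain only when speech is present, so the speech is attenuated more than the noise and the output SNR is lower than the input SNR.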
6.6 Across-Source Modulation Correlation
While an AGC improves speech intelligibility of cochlear implant recipients by adjusting the
signal level to be within a predetermined input dynamic range, it also can have several
effects on the envelope of speech signals and consequently degrade speech intelligibility,
especially in noise. The effects of fast-acting compression on the intelligibility of speech in
noise were analyzed by Stone and Moore (2003, 2004, 2007). Stone and Moore quantified
the effect of fast compression on the envelope of speech by three signal metrics: (i)
Within-Signal Modulation Coefficient (WSMC), (ii) Fidelity of Envelope Shape (FES), and (iii)
Across-Source Modulation Correlation (ASMC). The WSMC measure calculated the amount
of correlation between speech envelopes at different frequency channels after compression.
The hypothesis was that if the correlated fluctuation of envelopes at different frequency
channels was important for the speech intelligibility, the WSMC index would be able to
predict subject’s scores. The FES measure compared the envelope shape of the target speech
in each frequency channel before and after the compression. The hypothesis was that if the
preservation of the envelope shape was important for speech intelligibility, the FES index
would show a positive correlation with the subject’s scores. Lastly, the ASMC measure
calculated the correlation between the target speech and interfering signals at the output of
the signal path. The gain produced by an AGC for a high level input signal, the mixture of
speech and noise, was applied to the speech and noise signals separately to calculate the
ASMC. The hypothesis was that cross-modulation between the two originally independent
signals could degrade the speech intelligibility.
Among the three metrics analyzed, the ASMC was shown to have the highest correlation
with the intelligibility scores. Therefore the ASMC measure will be used in §13.3 of this
thesis to predict speech intelligibility of cochlear implant recipients affected by different
AGC systems in noisy conditions.
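A minimal sketch of an ASMC-style calculation is given below. The exact definition is not given in this section, so the sketch assumes a Pearson correlation between the log envelopes of the speech and the noise in each channel after the shared gain has been applied, averaged across channels. The function name, the envelope floor and the example values are hypothetical.

```python
import numpy as np

def asmc(speech_env, noise_env, gain, floor=1e-6):
    """Mean across channels of the correlation between speech and noise log envelopes."""
    s = np.atleast_2d(np.asarray(speech_env, dtype=float)) * gain
    n = np.atleast_2d(np.asarray(noise_env, dtype=float)) * gain
    log_s = np.log(np.maximum(s, floor))   # floor avoids log(0)
    log_n = np.log(np.maximum(n, floor))
    correlations = [np.corrcoef(ls, ln)[0, 1] for ls, ln in zip(log_s, log_n)]
    return float(np.mean(correlations))

# One channel with independent speech and noise envelopes (hypothetical values).
speech_env = np.array([[1.0, 4.0, 1.0, 4.0]])
noise_env = np.array([[1.0, 1.0, 2.0, 2.0]])
no_agc = asmc(speech_env, noise_env, gain=1.0)
fast_agc = asmc(speech_env, noise_env, gain=1.0 / (speech_env + noise_env))
```

With unity gain the two envelopes remain uncorrelated. With a gain that follows the mixture level, the noise envelope is pushed down whenever the speech is high, so the two envelopes become correlated, which is the cross-modulation the text refers to.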
6.7 Conclusion
Signal metrics are useful for performance analysis during the development and optimization
of new signal processing algorithms. A signal metric that reliably predicted the speech
intelligibility of cochlear implant recipients would accelerate development and optimization
of cochlear implant processing algorithms. This chapter examined standard metrics and also
non-standard signal metrics that have the potential to quantify the effects of AGC for
cochlear implant systems, and consequently predict speech intelligibility of cochlear implant
recipients. An indirect, but more important, goal is to find out the important features of
speech that an AGC or a signal processing algorithm should preserve to improve speech
intelligibility of cochlear implant recipients in adverse listening conditions. The apparent
SNR, the NCM and the ASMC measure are chosen in this thesis to quantify the performance
of different AGC configurations for the cochlear implant system. The prediction power of
each signal metric will be analyzed in chapter 13 based on the correlation between the
quantified indexes and the subjects' scores from various clinical studies in this thesis.
Chapter 7 Test Methodology
7 Test Methodology
7.1 Introduction
With technology advances in implant design, sound processor hardware and accessories,
multi-microphone technology, signal processing and sound coding techniques, the majority
of cochlear implant recipients show high levels of speech understanding even in relatively
challenging listening conditions. Increases in speech performance demand a review of
speech recognition tests to avoid ceiling effects. Tests to evaluate the performance of cochlear
implant recipients range from psychoacoustic to music tests. Among them, speech
recognition tests are most commonly used because the system that is shown to provide the
highest speech intelligibility is considered to be the most beneficial system for the recipients.
Speech recognition tests are important to evaluate the effectiveness of a system or an
algorithm, to compare between two or more devices or to compare different parameter
settings in a single device. An ideal speech test should be reliable and sensitive to differences
between test and processing conditions. The results should be highly correlated with real
world speech perception (Mackersie 2002). A good test is often measured by test-retest
reliability. Test-retest reliability is indicated by the performance deviation for the same test
conducted at different times.
This chapter first describes test materials and methods used in research studies of AGC
systems in cochlear implants. It then devises test methodology for clinical studies in this
thesis. It also explains the research platforms used to develop signal processing strategies
and elaborates the speech tests for evaluation of the existing and proposed AGC systems in
this thesis.
7.2 Test Materials
Planning a speech recognition test involves choosing the right test materials and method for
a system or an algorithm under test. Test materials range from components of speech, vowels
and consonants, spondees, words in carrier phrase to sentences without and with context. A
proper selection of test material is critical for evaluating AGC systems (Stöbich, Zierhofer
and Hochmair 1999). Sentences are most appropriate to use in listening tests that evaluate
effects of compression on dynamic stimuli. For example, speech tests using isolated words
may not fully evaluate the benefits of an AGC system with time constants longer than one
second. The City University of New York (CUNY) sentence list, Bamford-Kowal-Bench
(BKB) sentence list (Bench, Kowal and Bamford 1979), Hearing In Noise Test (HINT)
sentences, IEEE sentences, AzBio sentences, Hochmair-Schulz-Moser (HSM) sentences
(Hochmair-Desoyer et al. 1997) and Oldenburg (OLSA) sentences are commonly used in
research studies for evaluation of speech understanding of hearing-impaired subjects.
Predictability of speech material varies between different test materials, for example, the
BKB sentence test and the Oldenburg sentence test. The BKB sentence lists contain simple
short sentences that are suitable for evaluating speech perception in children. Hence, the
sentences are relatively easy to predict. The Oldenburg test, on the other hand, consists of
sentences that have five words, and each sentence has the same syntactical form (Name verb
number adjective object). The predictability of the speech material is low. The predictability
of speech can affect the performance comparison. As context effects are prevalent in the
perception of speech, the effective number of statistically independent Bernoulli trials can be
smaller than the number of words (Brand and Kollmeier 2002). Thus, easy test material can
bias scores towards the higher end. With easy test materials, for example, short sentences
with context, such as the BKB sentence lists, the probability of correctly scoring the
remaining words increases if one of the first words has been recognized.
Gifford et al. (2008) assessed the speech perception of 156 adult cochlear implant recipients
and 50 hearing aid users on commonly used speech recognition materials: CNC words, sentence
recognition in quiet with Hearing In Noise Test (HINT) and AzBio sentences, and sentence
recognition in noise with BKB-SIN sentences. Their study showed that 28% of the subjects
achieved 100%
correct scores and 71% of the subjects achieved more than 85% correct scores for HINT
sentences in quiet. The ceiling effect suggested that more difficult test materials were required.
Consonant-Nucleus-Consonant (CNC) monosyllabic words and vowel-consonant-vowel (VCV)
disyllables are commonly used in speech recognition tests. Speech tests using single isolated
words are often not efficient for testing certain aspects of an AGC system. For example, a
monosyllabic word test such as the CNC word test may not be efficient for evaluating AGC
systems with different time constants, because the duration of the test material matters in
this case. For this reason, researchers often use carrier phrases before the keywords.
The Australia Sentence Test In Noise (AuSTIN) sentences (Dawson, Hersbach and Swanson
2013) were developed by the Cooperative Research Centre for Cochlear Implant and
Hearing Aid Innovation for Australian use (Cameron and Dillon 2007). They are short and
simple like BKB sentences. The sentences were recorded by a female native Australian
speaker. AuSTIN contains a total of 80 lists, each comprising 16 sentences.
7.3 Test Methods
While selection of appropriate test materials is important, test methods should also consider
including the dynamics of speech sounds that are commonly encountered. Test conditions
should reflect everyday listening conditions, i.e., speech presented in background noise or
competing voices. Sentence-in-noise tests are most commonly used for evaluating AGC
systems in cochlear implants. There are two basic approaches to conducting a speech-in-noise
test:
• Fixed method: both target speech and competing noise are presented at fixed
presentation levels. The SNR is fixed during the test and performance is usually
expressed as percent correct items, for example, words or sentences.
• Adaptive method: either the speech or the noise level is adjusted during the test. The SNR
is adaptively changed using an up/down rule to find the SNR that yields a specified
percent correct score.
Table 7-1 lists test materials and methods used in research studies that evaluated AGCs of
cochlear implant systems. These studies used sentence tests to evaluate different aspects of
compression.
| Study | Test Methods | Sentence Presentation Level | Test Material |
|---|---|---|---|
| Whisper evaluation, McDermott et al. (2002) | Fixed presentation level test | 45, 55, 70 dBA in quiet | SIT sentences |
| | Adaptive SRT test | Speech at 65 dBA | Speech-shaped noise |
| Evaluation of five AGC (slow and dual-loop) configurations, Stöbich et al. (1999) | Fixed presentation level test | 55, 70, 85 dB SPL at individually selected SNR | Gottingen sentences in Fastl-noise (amplitude-modulated speech-shaped noise) |
| ADRO evaluation, James et al. (2002) | Fixed presentation level test | 40, 50, 60 dB SPL in quiet | Closed set spondees in a carrier phrase |
| | | 60 and 70 dB SPL in quiet | Open set CNC words in quiet |
| | | 50, 60, 70 dB SPL in quiet | CUNY sentences in quiet |
| | | 70 dB SPL at SNR of +15 and +10 dB | CUNY sentences in 8-talker babble noise |
| ADRO evaluation, Muller-Deile et al. (2008) | Fixed presentation level test | 50, 60, 70 dB SPL in quiet | Freiburger numbers test; Freiburger monosyllabic words test |
| | Adaptive SRT test | Background noise was fixed at 65 dB SPL and speech was adaptively varied | Oldenburger sentences |
| ADRO evaluation, Dawson, Decker and Psarros (2004) | Fixed presentation level test | 50 dB SPL in quiet; speech at 65 dB SPL at individually selected SNR | BKB sentences in 8-talker babble noise |
| Fast AGC and dual-loop AGC evaluation, Boyle et al. (2009) | Fixed presentation level test; adaptive SRT test | 65 dB SPL at individually selected SNR (fixed test); roving speech level, 65 dB SPL ± 10 dB (adaptive test) | HSM sentence test; ABC sentence test; in unmodulated speech-shaped noise |
Table 7-1 Test materials and methods used in AGC studies
7.3.1 Fixed Method
The fixed method presents both target speech and background noise or competing voices at
predetermined presentation levels. The SNR is fixed throughout the test. Speech
intelligibility is the average of percent correct items, for example, number of correct
morphemes, target words or a whole sentence. Presentation level of sentences and the SNR
were chosen by the researcher in those tests.
The fixed method is only of limited relevance to everyday listening scenarios. In
everyday speech communication, the short-term and long-term levels of target speech as well
as other sounds in the environment can vary over a range of a few tens of decibels. Hence the
fixed method may not fully evaluate all aspects of an algorithm. For example, evaluating the
effect of slow time constants needs the presentation level of test stimuli to be varied within
the test.
The other disadvantage of using the fixed method is that the SNR needs to be carefully
selected to avoid floor and ceiling effects. The advantage is that it is simple and relatively
easy to arrange. It is useful for finding the performance-intensity function of a processing
condition.
7.3.2 Adaptive Method
An adaptive method that finds the speech reception threshold (SRT) is most commonly used in
speech recognition testing. The SRT can be defined as the signal-to-noise ratio (SNR) at which
a listener achieves a targeted performance. 50% is most commonly used as the targeted
percent correct score for the SRT. The adaptive method has the advantages of greater
flexibility and higher efficiency over the fixed SNR method (Levitt 1978). With the adaptive
method, performance evaluation is not affected by floor or ceiling effects. Different aspects
of compression can be evaluated because the presentation level of either speech or background
noise changes as the SNR is adapted during the test. An adaptive method
determines the SNR of the next stimulus based on the current stimulus and the response. Most
tests adapt the SNR by adjusting background noise with respect to the fixed presentation level of
speech. There are also adaptive tests in which the background noise level is fixed and the
presentation level of each sentence is altered based on the subject’s response to the previous
one (Muller-Deile et al. 2008).
The disadvantage is that the test can be relatively complex and parameters need to be
selected carefully. There must be sufficient sentences for the final SNR values to converge
closely to the actual SRT. An SRT test with a standard deviation less than 1 dB is acceptable
for test-retest reliability. The scoring method can also have an impact on test results. For
instance, an adaptive procedure is less efficient with sentence scoring (Brand and Kollmeier
2002). Word scoring produces more test samples and is therefore more suitable to use in
adaptive tests. In addition, the starting SNR value, the up/down procedure, and the method to
calculate the final SRT have an impact on the SRT result (Dawson, Hersbach and Swanson
2013).
Speech tests in the laboratory are often criticized for not being representative of real-life
listening conditions. A test should have more than one presentation level to evaluate the
adaptability of an AGC to different presentation levels. An SRT test with sentences roving
between more than one presentation level has been proposed and used by researchers to
evaluate the performance of gain algorithms and cochlear implant systems (Boyle et al.
2009).
The roving-level SRT test has been used to evaluate the performance of fast and dual-loop
AGC systems (Boyle et al. 2009) and different sound processors (Haumann, Lenarz and
Büchner 2010; Boyle et al. 2013) with cochlear implant subjects. The test originally used
male-voiced German HSM sentences, with speech-shaped noise presented 0.5 seconds before
and after each sentence. The presentation level for each sentence was randomly selected.
Haumann et al. (2010) tested two roving conditions: 65 dB SPL ±10 dB roving (i.e.
presentation levels roving at 55, 65 and 75 dB SPL) and 65 dB SPL ±15 dB roving (i.e.
presentation levels roving at 50, 65 and 80 dB SPL). The SNR was adapted as a single track
for all presentation levels within the test. There were altogether 30 sentences in one test,
with 10 sentences for each presentation level. The SRT was calculated as the mean of the last
ten SNR values. As the presentation level was roved between low and high presentation levels
within the test, the test was hypothesized to be more realistic and to better emulate
listening conditions outside the laboratory. In addition, the adaptive roving-level test was
more sensitive to differences between processor designs that were not effectively revealed
using the fixed method.
Boyle et al. (2013) recently developed a new sentence test called Sentence Test with
Adaptive Randomized Roving levels (STARR) to evaluate the effectiveness of hearing
prostheses. The STARR used IEEE sentences spoken by male and female speakers. The SRT
was independently and adaptively calculated for each speaker in the STARR. In STARR, ten
sentences were presented at each of three presentation levels: 50, 65 and 80 dB SPL, with the
presentation level randomly selected. 15 sentences were spoken by the male speaker and 15
by the female speaker. No consecutive sentences were presented at the same level or by the
same speaker. The initial SNR was set to +20 dB. The level of a speech-shaped noise was
varied adaptively to track the SNR at 50% correct scores. The step size was 10 dB initially
and was reduced to 5 dB after the first reversal and to 2.5 dB after the second reversal.
The SNRs of the last nine sentences for each speaker plus the SNR that would have been
applied to the next sentence were averaged to get the final SRT result for each speaker.
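The adaptive track described above for one speaker can be sketched as follows. Several details are assumptions made for illustration: each response is reduced to a single correct or incorrect flag, the SNR is lowered after a correct response and raised after an incorrect one, and the reduced step size takes effect from the reversal itself. The function name and the response pattern are hypothetical.

```python
def run_adaptive_track(responses, start_snr=20.0, steps=(10.0, 5.0, 2.5)):
    """Return the SRT for one speaker from a list of correct/incorrect responses."""
    snr = start_snr
    presented = []            # SNR at which each sentence was presented
    reversals = 0
    last_direction = 0        # -1: SNR lowered last time, +1: raised, 0: no step yet
    for correct in responses:
        presented.append(snr)
        direction = -1 if correct else 1
        if last_direction != 0 and direction != last_direction:
            reversals += 1    # the track changed direction
        last_direction = direction
        # 10 dB before any reversal, 5 dB after the first, 2.5 dB after the second.
        step = steps[min(reversals, len(steps) - 1)]
        snr += direction * step
    # Mean of the last nine presented SNRs plus the SNR the next sentence would get.
    track = presented[-9:] + [snr]
    return sum(track) / len(track)

# Fifteen sentences for one speaker (hypothetical response pattern).
responses = [True, True] + [False, True] * 6 + [False]
srt = run_adaptive_track(responses)
```

For this response pattern the track falls from +20 dB to 0 dB, reverses, and then alternates between 2.5 and 5 dB, giving an SRT of 3.75 dB.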
The study on STARR showed that the comparison between two test conditions could only be
meaningful if the SRT difference was greater than 2.2 dB for a normal hearing subject using
one test list per condition. The SRT variation of the cochlear implant participants in their
study was much higher than that of the normal hearing subjects. Both the normal hearing and
the cochlear implant subjects performed at their best at 65 dB SPL. However, only 40% of
the cochlear implant recipients achieved an SRT lower than 20 dB. A high SRT above 20 dB
indicates that the competing noise was not the main factor affecting the performance. A
significantly lower group mean score at the low presentation level (50 dB SPL) than at the mid
and high presentation levels (65 and 80 dB SPL) showed that lack of audibility was the main
factor affecting the speech intelligibility. Since the test employed a single adaptive track
across presentation levels, the final SRT was over-weighted by the low presentation level.
These studies showed that the roving-level SRT test needed to be improved.
7.4 Test Methodology for Clinical Studies in this Thesis
This thesis evaluated the existing AGC systems with fast and slow time constants and
proposed new gain or dynamic range optimization algorithms. The fixed test was used for
evaluating AGC systems with fast time constants in chapters 8 and 10. For the evaluation of
AGC systems with fast and slow time constants and with more than one control loop, the
adaptive method was used in chapters 9 and 12. The fixed test was also used for evaluating
ALGF in chapter 12 to observe the benefits of employing a noise estimator in the algorithm.
7.4.1 Test Setup
Listening tests were carried out in a sound-treated room. A 1 kHz narrow band noise was
used to calibrate sound pressure level at the listener’s position in the soundroom. The sound
pressure level of the narrowband noise at the test position was 65 dB SPL, within a 1 dB
tolerance, when measured with a B&K 2250 sound level meter.
Two test setups were organized. In the first set-up (referred to as the loudspeaker set-up), the
audio was presented from a single loudspeaker one metre in front of the subject. The sound
pressure level was restricted to 80 dB SPL to avoid loudspeaker distortion. To achieve effective
presentation levels above 80 dB SPL, the manual sensitivity control was increased to provide
additional gain.
The second setup (referred to as the direct connect setup) bypassed the loudspeaker and
microphones, and presented the audio signal directly to the ADC of the real-time processing
platform (§7.4.5.2). A pre-emphasis filter was used to match the frequency response of the
Standard directionality. The direct connect setup was calibrated by a 1 kHz sinusoid at the
compression threshold of the front-end compression limiter. This had two advantages: the
audio could be presented at high levels without distortion, and there was no possibility of the
recipient using any residual acoustic hearing in their contra-lateral ear. The drawback was
that the subjects could not hear their own voice.
All the experiments in this thesis used a single sound source such that both speech and noise
sounds were presented from the loudspeaker in front of the recipient. Therefore the
directional microphone techniques would have little effect on the performance of the
subjects. The Standard microphone directionality (Hersbach et al. 2012) was used in all
experiments.
7.4.2 Test Materials
The sentence materials of AuSTIN (Dawson, Hersbach and Swanson 2013) were used in the
listening experiments of this thesis. Each sentence contains four to six words or six to eight
syllables. Four-talker babble noise was used in the fixed test and LTASS noise was used in
the roving-level SRT test. Each sentence was time-aligned with a particular segment of four-
talker babble.
The morphemic scoring method was used in all tests except in the adaptive roving-level test
in §9.2, which used word scoring. The morphemic scoring method gives scores on the
number of morphemes correctly repeated from each sentence. For example, the sentence
“She is do/ing her home/work” contains seven morphemes.
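The morpheme count in the example above can be reproduced with a short sketch. It assumes that a slash marks a morpheme boundary inside a word, as in the example sentence; the helper names `count_morphemes` and `morphemic_score` are hypothetical and are not part of the AuSTIN software.

```python
def count_morphemes(marked_sentence):
    """Count morphemes in a sentence where '/' marks boundaries within a word."""
    return sum(len(word.split("/")) for word in marked_sentence.split())


def morphemic_score(marked_sentence, n_correct):
    """Percentage of morphemes that were repeated correctly."""
    return 100.0 * n_correct / count_morphemes(marked_sentence)
```

For "She is do/ing her home/work" the count is 1 + 1 + 2 + 1 + 2 = 7 morphemes.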
7.4.3 Fixed Level Test
The presentation level of sentences and the input SNR were fixed for each listening
condition in the fixed test. Four-talker babble noise was used in the speech-in-noise tests
using the fixed method. Before each test, the RMS level of the noise was adjusted based on
the sentence presentation level and the SNR to be tested. The background noise started one
second before each sentence and continued for one second after it.
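The noise level adjustment in the fixed test can be sketched as follows. This is an illustrative calculation only; the function names are hypothetical, and the signals are assumed to be plain sequences of samples on a common calibration scale.

```python
import math


def noise_level_db(speech_level_db, snr_db):
    """Noise level (dB SPL) that gives the requested SNR at a given speech level."""
    return speech_level_db - snr_db


def rms(samples):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def scale_noise_to_snr(speech, noise, snr_db):
    """Scale the noise so that 20*log10(rms(speech)/rms(scaled noise)) equals snr_db."""
    target_rms = rms(speech) / (10.0 ** (snr_db / 20.0))
    gain = target_rms / rms(noise)
    return [gain * x for x in noise]
```

For example, sentences at 70 dB SPL tested at 10 dB SNR require the babble at 60 dB SPL.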
7.4.4 Roving-level SRT Test
The adaptive SNR rule limits the maximum SNR to 30 dB. The reason for this limit is
that if a subject cannot score well at a very high SNR (30 dB in this case), then the level of
background noise is probably not the factor degrading performance. The performance is more
likely to be affected by factors relating to the presentation level, such as audibility and envelope
distortion. Since increasing the SNR further is then unlikely to improve performance or to help
the track converge to a final SRT, the maximum SNR is limited to 30 dB.
7.4.4.1 Roving-level SRT Test (Single Adaptive Track)
In this test, the presentation level of sentences is randomly roved between 50, 65 and 80 dB
SPL during the test. The adaptive rule is the same as the one used in the study of Haumann et al.
(2010) and described above in §7.3.2. The test was used in the clinical study that evaluated
the performance of the existing AGC systems of the Nucleus CP810 system (§9.2.1).
7.4.4.2 Interleaved Roving-level SRT Test
Cochlear Limited has its own adaptive SRT test for clinical studies, known as the Australian
Sentence Test In Noise (AuSTIN) (Dawson, Hersbach and Swanson 2013). With AuSTIN,
a standard deviation of 1 dB can be achieved with 20 sentences when a psychometric fitting
rule is applied. AuSTIN can be configured to use a single or roving presentation level
for the sentences or the noise during the test. The main difference between the AuSTIN
roving-level test and the roving-level test described in §7.3.2 is that AuSTIN allows multiple
adaptive tracks, i.e., each presentation level has its own track. Hence it is also called the
interleaved roving-level SRT test.
The interleaved roving-level SRT test also selects the sentence presentation level randomly
from the list. The SNR for each presentation level is calculated independently. It thus allows
the SRT to be measured separately for each presentation level by applying a psychometric fit
rule. A total of 48 sentences are used, with 16 sentences for each presentation level. The
number of correct items (words or morphemes) is counted before the direction of the SNR change is
decided for the next sentence. The AuSTIN adaptive rule is similar to the one used in HINT (Nilsson,
Soli and Sullivan 1994). The step size is 4 dB for the first four sentences and 2 dB for
remaining sentences. The background noise can be continuously presented or it can be
played at a certain time before and after the sentence. When the background noise is
presented continuously, a beep is presented before each sentence to alert the subject.
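The interleaving of independent tracks can be sketched as below. This is a minimal illustration under stated assumptions, not the AuSTIN implementation: the three levels (50, 65 and 80 dB SPL) follow the roving range of 65 dB SPL ± 15 dB, the starting SNR of 10 dB is arbitrary, the 30 dB ceiling is the limit from §7.4.4, and the decision of whether a sentence counts as mostly correct is delegated to a hypothetical `respond` callback. The psychometric fit that yields the final SRT is not shown.

```python
import random

LEVELS_DB_SPL = (50, 65, 80)   # assumed roving levels (65 dB SPL +/- 15 dB)
SENTENCES_PER_LEVEL = 16       # 48 sentences in total
MAX_SNR_DB = 30.0              # ceiling on the adaptive SNR


def step_size_db(sentence_index):
    """4 dB for the first four sentences of a track (indices 0 to 3), then 2 dB."""
    return 4.0 if sentence_index < 4 else 2.0


def next_snr(current_snr, sentence_index, mostly_correct):
    """Lower the SNR after a mostly correct sentence, otherwise raise it."""
    step = step_size_db(sentence_index)
    snr = current_snr - step if mostly_correct else current_snr + step
    return min(snr, MAX_SNR_DB)


def run_interleaved(respond, start_snr=10.0, seed=0):
    """Run one test; respond(level, snr) returns True if mostly correct.

    Returns a dict mapping each presentation level to the list of SNRs
    presented on that level's own adaptive track.
    """
    rng = random.Random(seed)
    order = [lvl for lvl in LEVELS_DB_SPL for _ in range(SENTENCES_PER_LEVEL)]
    rng.shuffle(order)  # the level is chosen at random for each sentence
    snr = {lvl: start_snr for lvl in LEVELS_DB_SPL}
    history = {lvl: [] for lvl in LEVELS_DB_SPL}
    for level in order:
        current = snr[level]
        history[level].append(current)
        ok = respond(level, current)
        snr[level] = next_snr(current, len(history[level]) - 1, ok)
    return history
```

Because each level keeps its own SNR state, a recipient who is limited by audibility at 50 dB SPL no longer drags the SNR of the 65 and 80 dB SPL tracks upwards.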
Table 7-2 lists the speech tests, test materials and methods used in the clinical
studies of this thesis.
Study | Test Methods | Sentence Presentation Level | Test Material
Effects of no AGC and the front-end compression limiter on speech intelligibility (§8) | fixed presentation level test | 55-89 dB SPL at SNRs of +10 and +20 dB | AuSTIN sentences in four-talker babble noise
Evaluation of the existing AGC systems (§9) | (a) roving-level SRT test (single adaptive track); (b) interleaved roving-level SRT test (multiple adaptive tracks) | roving speech level, 65 dB SPL ± 15 dB | AuSTIN sentences in speech-shaped noise
Effects of envelope profile limiter on speech intelligibility (§10) | fixed presentation level test | (a) 89 dB SPL in quiet and at SNR 10 dB; (b) 55-89 dB SPL at SNRs of +10 and +20 dB | AuSTIN sentences in four-talker babble noise
Evaluation of Adaptive Loudness Growth Function (§12) | (a) fixed presentation level; (b) interleaved roving-level SRT test (multiple adaptive tracks) | (a) 50 and 80 dB SPL at individually selected SNR; (b) roving speech level, 65 dB SPL ± 15 dB | (a) AuSTIN sentences in four-talker babble; (b) AuSTIN sentences in speech-shaped noise
Table 7-2 Test materials and methods used in the clinical studies of this thesis
7.4.5 Research Platforms
The signal processing algorithms in this thesis were implemented in either the Nucleus
CP810 behind-the-ear (BTE) sound processor or a real-time PC-based research platform
called the Nucleus-xPC system. The Nucleus-xPC system has the advantage of quick
prototyping and evaluation in the laboratory before an algorithm can be deployed on the
BTE processor. The advantage of implementing algorithms on the BTE sound processor is
the flexibility to evaluate them in different listening conditions. Most of the clinical studies
conducted in the laboratory used the Nucleus-xPC system. The Nucleus CP810 sound
processor was used in the clinical study for evaluation of the existing AGC systems (§9.2)
and the take-home experiment of the proposed envelope profile limiter (§10.2.2) for quality
assessment. The description of each system is elaborated in the following subsections.
7.4.5.1 Nucleus CP810 Sound Processor
The Nucleus CP810 sound processor is shown in Figure 3-2. The Nucleus CP810 sound
processor was launched with the Nucleus 5 cochlear implant system in 2009. The sound
processor used the same customized DSP chip as the Freedom processor (Swanson et al. 2007;
Bondarew and Seligman 2012). The processor is based on a 0.18 µm CMOS ASIC. The
processing core consists of an 8051 micro-controller and four identical custom DSPs
(Patrick, Busby and Gibson 2006). The analog domain contains a low power oscillator, three
16-bit sigma-delta Analog-to-Digital Converters (ADCs), a class-D output and a DC/DC
converter. The top level architecture is shown in Figure 7-1.
Figure 7-1 Top-level architecture of Champ (Swanson et al. 2007)
The DSPs are utilised in a serial fashion, such that each DSP performs a different set of
functions in the signal path. The DSP firmware runs on four DSPs, and is responsible for
processing audio samples taken from two omni-directional microphones. The DSPs send the
resulting stimulation commands to the stimulus controller which encodes them into an RF
signal for the implant. An acoustic output, available via an optional earphone accessory, is
used for monitoring the signal as it passes through the DSP firmware.
Each of the four DSPs has 1024 words of instruction, X, Y and Z data memory, and runs at
5011000 cycles per second. Three DSPs are used in the existing design, leaving one for
future expansion. The multi-microphone directional and beamforming algorithms,
microphone calibration filters, input select and mixing and frequency analysis (FFT) are
implemented on the first DSP (DSP0). In the Nucleus CP810 sound processor, a number of
alternative audio inputs, such as telecoil, Lapel Microphone, TV/HI-FI cable, and the Euro-
Adaptor, are available in addition to the omni-directional microphones. The alternative audio
inputs can be mixed with the microphone according to a user-defined mixing ratio.
Combine-Into-Channel (CIC), clinical gains and AGCs, the UGM and ADRO, are
implemented on the second DSP (DSP1). Maxima selection, LGF, mapping and Data
Encoder Formatter (DEF) are implemented on the third DSP (DSP2).
7.4.5.2 Real-Time Nucleus-xPC System
The real-time Nucleus-xPC system is a laboratory-based research platform developed at
Cochlear Ltd (Goorevich 2005). The xPC system assists research and development of
cochlear implant sound processing by allowing quick prototyping and real-time evaluation of
signal processing algorithms with cochlear implant recipients.
Figure 7-2 shows the hardware components of the Nucleus-xPC system for the Nucleus signal
path. Similar to a sound processor, the Nucleus-xPC system consists of hardware for
audio input, a processing unit and a stimulus generator. The processing unit consists of two
computers: a host computer and an x86-based real-time computer that serves as the target
computer. DSP algorithms for cochlear implant sound processing are implemented in the
Nucleus-xPC system using Simulink from MathWorks. The host computer builds
Simulink models and downloads them to the target computer, which executes them
in real time. The audio inputs to the Nucleus-xPC system are captured by two omni-directional
microphones mounted in the behind-the-ear (BTE) housing of the CP810 sound processor. The
electrical signals at the output of the two microphones are amplified by a preamplifier.
The audio interface to the target computer is accomplished by a high performance audio
board such as Bittware PMC+ and General Standards as shown in Figure 7-2. The custom-
made stimulus generator (StimGen) drives the RF coil for the implant.
Figure 7-2 Components of the real-time Nucleus-xPC system (Goorevich 2005)
The Simulink blocks and models for Nucleus sound processing can be found in a Simulink
library called Nucleus MATLAB Blockset (NMB). NMB contains the commercially
available algorithms such as SPEAK, ACE, CIS, Whisper, ADRO and Beam for the Nucleus
systems. NMB can also be used for off-line simulation and experimentation using Simulink
alone. Figure 7-3 shows an example of the ACE sound coding strategy with the two AGCs under
study.
Figure 7-3 ACE sound coding strategy with the standard front-end AGC (blue block)
and the proposed AGC (green block)
7.5 Conclusion
This chapter examined the test methodology (materials and methods) for speech tests that evaluate
the effectiveness of signal processing algorithms, AGC systems in particular, in hearing
prostheses. Speech tests using sentences are more appropriate for evaluating compression
circuits than tests using isolated words. Compared to the fixed method, the adaptive method has two
advantages in a speech test: greater flexibility and higher efficiency. The roving-level SRT test
is considered to represent realistic listening conditions outside the laboratory. A roving-level
adaptive SRT test can have a single adaptive track or multiple adaptive tracks. The clinical studies in this
thesis used both the fixed method and the adaptive method (SRT test). The speech tests that
evaluated the more sophisticated gain algorithms used the roving-level SRT test. Two research
platforms were used in this thesis to implement gain algorithms in the Nucleus signal path.
Implementation on the Nucleus CP810 sound processor provides the flexibility to evaluate
the algorithms in real world listening conditions. On the other hand, implementation on the
Nucleus-xPC system has the advantage of quick prototyping and evaluation in the
laboratory.
8 Investigating Effects of No AGC and Fast AGC on
Cochlear Implant Speech Intelligibility
8.1 Introduction
With no AGC in the signal path, the input dynamic range limitation at the LGF (§4.3.7)
would clip the envelopes for speech presented above C-SPL. It could be considered as the
worst case processing for speech intelligibility of cochlear implant recipients at high
presentation levels. Although there are many studies on the effects of different AGC systems
on the speech performance of cochlear implant recipients (§5.4), no study was found that
specifically examined the effects of a cochlear implant signal path with no AGC on the
speech intelligibility of cochlear implant recipients at various presentation levels.
The study of Zeng et al. (2002) showed that the speech understanding of cochlear implant
subjects was degraded when the input dynamic range was set below 40 dB in quiet.
Zeng and Galvin (1999) showed that the resolution of intensity steps was not very
important for the speech intelligibility of cochlear implant recipients: speech
intelligibility was not significantly affected even at the maximum reduction of
current levels in the electric dynamic range, to just two levels. They concluded
that amplitude cues could be traded with frequency cues, at least for speech in quiet. An
inference made from these two studies is that a proper dynamic range setting before mapping
into electrical current levels is important for speech intelligibility although the number of
current steps may not be so important. With no AGC, speech close to C-SPL was presented
in the upper part of the dynamic range of the LGF. Performance degradation was expected at
presentation levels further away from C-SPL. In other words, an AGC is necessary to extend
the dynamic range beyond the predetermined range between C-SPL and T-SPL. It should be
noted that limited current levels at the implant could degrade performance in noise because
stimulating target speech and noise at similar levels would make them less distinct from each
other and the segregation between them would be harder.
The front-end compression limiter (§5.5.1) is a default signal processing algorithm in the
everyday sound processing strategy of Nucleus cochlear implant systems. It is a fast AGC
with time constants less than 100 ms. The purpose of a compression limiter in a cochlear
implant system is to reduce envelope clipping at the LGF for high level input signals. It
therefore prevents potential loudness discomfort due to excessive stimulation at maximum
current levels (C-level). Studies showed that fast compression could degrade speech
intelligibility. However, the signal path with the front-end compression limiter would still be
better than the signal path with no AGC at all. It would be interesting to find out the
effectiveness of the front-end compression limiter over a wide range of presentation levels.
The objective of this study is to observe the effects of channel envelope clipping for speech
presented at different levels. Hence the Performance-Intensity function (P-I) of cochlear
implant recipients with no AGC will be measured with sentences presented from low to high
presentation levels. The P-I function of the recipients with the front-end compression limiter
will also be measured for the same input conditions to observe any performance
improvement. The P-I functions of both processing conditions will be measured in two SNRs
to study confounding effects of additive noise on speech intelligibility.
8.2 Clinical Study
8.2.1 Subjects
Four cochlear implant recipients (S1, S2, S3 and S4) participated in this study. The subject
details can be found in Appendix 1. The subjects were unilaterally implanted with either the
Nucleus 24 or Nucleus Freedom cochlear implant. All had more than two years' experience
with their implant, and previous experience with speech tests in clinical evaluations.
8.2.2 Signal Processing
The Nucleus Freedom ACE signal path (§4.3) with no microphone directionality and no
other AGC system except the fast front-end AGC was used. The signal processing in the
ACE sound coding strategy and the AGC were implemented in Simulink for testing with the
Nucleus-xPC system (§7.4.5.2).
Figure 8-1 Signal path used in the experiment
The compression limiter evaluated in this study had the same parameter setting as that of the
Nucleus Freedom sound processor as shown in Table 8-1.
Parameter | Value | Unit
Attack time | 5 | ms
Release time | 75 | ms
Hold time | 0 | ms
Compression threshold | 73 | dB SPL
Compression ratio | Inf | -
Post AGC gain | 0 | dB
Table 8-1 Parameter setting of the front-end compression limiter
It had unity gain up to the compression threshold, and infinite compression beyond. The
compression threshold was 73 dB SPL, calibrated by a sine tone at 1 kHz. Since the crest
factor of speech is about 8 dB higher than that of a sinusoid, peaks of speech presented at 65
dB SPL started to hit the compression threshold. The attack time was 5 ms. For a sudden
increase in the input signal level, overshoots can occur during the attack time. Therefore the
front-end compression limiter also has an overshoot limiter that restricts the maximum overshoot
to 3 dB above the compression threshold.
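The steady-state behaviour of the limiter and the level arithmetic in this paragraph can be sketched as follows. This is a static illustration only, using the values from Table 8-1; it does not model the attack and release dynamics, and the function names are hypothetical.

```python
THRESHOLD_DB_SPL = 73.0       # compression threshold (Table 8-1, 1 kHz sine)
MAX_OVERSHOOT_DB = 3.0        # ceiling enforced by the overshoot limiter
SPEECH_PEAK_OFFSET_DB = 8.0   # speech crest factor relative to a sinusoid


def limiter_output_level(input_db):
    """Steady-state output: unity gain below threshold, infinite ratio above."""
    return min(input_db, THRESHOLD_DB_SPL)


def limiter_gain_db(input_db):
    """Steady-state gain in dB (0 below threshold, negative above)."""
    return limiter_output_level(input_db) - input_db


def transient_ceiling_db():
    """Highest level allowed during the attack time."""
    return THRESHOLD_DB_SPL + MAX_OVERSHOOT_DB


def speech_level_reaching_threshold():
    """Speech level at which the peaks just reach the compression threshold."""
    return THRESHOLD_DB_SPL - SPEECH_PEAK_OFFSET_DB
```

With these values, speech presented at 73 - 8 = 65 dB SPL has peaks at the threshold, a steady 89 dB SPL input is attenuated by 16 dB, and transients are held below 76 dB SPL.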
8.2.3 Test Setup
The fixed method as described in §7.3.1 was used to evaluate the processing conditions. The
sentences were presented in four-talker babble noise. Eight or more sentences were tested for
each presentation level.
The presentation level was varied from 55 to 89 dB SPL at two SNR conditions: 10 and 20
dB. All subjects except S1 were tested using the loudspeaker setup described in §7.4.1. S1
was tested with the direct connect setup for sentences presented above 75 dB SPL because
S1 had contra-lateral hearing that could affect the evaluation of test conditions at high
presentation levels.
8.2.4 Results
Each diagram in Figure 8-2 shows the percent correct morpheme scores of each cochlear
implant subject with each processing condition at each SNR. Figure 8-3 shows the group mean scores
of all subjects.
[Figure 8-2: four panels (S1, S2, S3 and S4). Horizontal axis: presentation level (dB SPL), 55 to 89. Vertical axis: percent correct (%), 0 to 100.]
Figure 8-2 Percent correct scores of four cochlear implant subjects with no AGC and
with FEL75 (legends of the curves are as described in Figure 8-3)
[Figure 8-3: group mean scores. Horizontal axis: presentation level (dB SPL), 55 to 89. Vertical axis: percent correct (%), 0 to 100. Curves: SNR10 No AGC, SNR20 No AGC, SNR10 FEL75, SNR20 FEL75.]
Figure 8-3 Group mean scores of four cochlear implant recipients with no AGC and
with FEL75
Subj | Δ scores (%), 20 dB SNR | p-value, 20 dB SNR | Δ scores (%), 10 dB SNR | p-value, 10 dB SNR
S1 | 5 | 0.0625 | 5 | 0.0273*
S2 | 11 | <0.001** | 21 | <0.001**
S3 | 6 | 0.039* | 13 | <0.001**
S4 | 14 | <0.001** | 11 | <0.001**
Group | 9 | <0.001** | 11 | <0.001**
Table 8-2 Statistical analysis of the score difference between the front-end compression
limiter and no AGC for the presentation levels above 70 dB SPL in two SNR
conditions. The asterisks indicate statistically significant difference in performance
between no AGC and FEL75 (* p < 0.05, ** p < 0.01).
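The details of the paired binomial test are not given in this section. One common reading, assumed here, is an exact two-sided binomial (sign) test on the discordant items, that is, the morphemes scored correct under one processing condition but not under the other, with each direction equally likely under the null hypothesis. The sketch below implements that assumed form; the function name is hypothetical.

```python
from math import comb


def paired_binomial_p(only_a_correct, only_b_correct):
    """Two-sided exact binomial p-value on discordant pairs.

    only_a_correct: items correct with condition A but not with condition B.
    only_b_correct: items correct with condition B but not with condition A.
    Under the null hypothesis each discordant item favours A or B with
    probability 0.5.
    """
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0  # no discordant items, so no evidence of a difference
    k = min(only_a_correct, only_b_correct)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

Items scored the same way under both conditions do not enter the calculation, so the test is sensitive only to the items on which the two conditions disagree.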
8.2.4.1 With no AGC
At 20 dB SNR, the group mean scores were at least 80% for the presentation level up to 83
dB SPL. The performance degraded above 83 dB SPL. The scores at 83 and 86 dB SPL were
compared using the paired binomial test. The difference was highly significant for all
subjects. Subjects differed in their tolerance to envelope distortion with no AGC at high
presentation levels. For example, subject S2 could still score more than 60% at the highest
presentation level whereas the score of S3 dropped to 30% at that level. Performance
degradation at 55 dB SPL compared to 60 dB SPL could be due to low audibility at 55
dB SPL. A difference of more than 50 percentage points was observed between the highest
and the lowest group scores.
At 10 dB SNR, group mean scores started to degrade from the presentation level of 70
dB SPL. Individual subjects showed different starting points of the score degradation. For
example, the scores of subject S4 degraded from 65 dB SPL whereas those of S3 degraded
from 75 dB SPL. When the scores at 70 and 75 dB SPL were compared, the difference was
highly significant for S2 and S4 and marginally significant for S3 (p < 0.05)
according to the paired binomial test. The scores of each subject reached their lowest
performance at the highest three presentation levels. The group mean scores were
approximately 10% at 83 and 86 dB SPL and almost 0% at the highest presentation level of
89 dB SPL. A performance difference of more than 80 percentage points was observed
between the highest and the lowest group mean scores.
8.2.4.2 With Compression Limiter
The compression limiter started to show effects on mean scores for the presentation levels
above 65 dB SPL because the peaks of speech waveform at 65 dB SPL started to hit the
compression threshold.
At 20 dB SNR, the group mean scores with the compression limiter were more than 90% for
presentation levels between 60 and 80 dB SPL. The group mean scores started to degrade
above 80 dB SPL. The group mean scores with the compression limiter were higher than
in the no AGC condition above 70 dB SPL except at 83 dB SPL. The paired binomial test
between the two processing conditions on the pooled scores for presentation levels above 70
dB SPL showed that the performance improvement with the compression limiter was highly
significant for subjects S2 and S4 and for the group, and significant for S3 (Table 8-2). A performance
difference of approximately 40 percentage points was observed between the highest and the
lowest group mean scores.
At 10 dB SNR, the improvement of the group mean scores with the compression limiter was
approximately 10 percentage points for the presentation levels above 70 dB SPL. The paired
binomial test on the pooled scores between the two processing conditions for presentation
levels above 70 dB SPL showed that the performance improvement with the compression
limiter was significant for S1 and highly significant for the other subjects and the group (Table 8-2). The P-I functions
of the subjects with and without the compression limiter show that the rates of score
degradation were similar. The difference was that the P-I function of the processing with no
AGC was offset by approximately -10 percentage points. It appears that the input dynamic
range was expanded approximately 5 dB by the AGC between 65 and 80 dB SPL. For
example, the group mean score with the compression limiter at 70 dB SPL was comparable
to that with no AGC at 65 dB SPL. This trend continued to 80 dB SPL. The group mean
scores of the subjects with the compression limiter were approximately 20% at 83 and 86
dB SPL and 10% at 89 dB SPL. The score difference between the highest and the lowest
scores was approximately 80 percentage points.
8.3 Discussions
The subjects showed performance degradation with increasing presentation level for the
processing with no AGC before the LGF in the signal path. The score degradation at high
presentation levels was due to the distortion of envelopes above the saturation level for
speech presented above C-SPL. The effect of clipping can be divided into two categories:
waveform distortion and SNR reduction. Waveform distortion includes temporal envelope
modulation depth reduction and spectral envelope shape distortion. Spectral properties of
speech that carry intelligibility such as formants can be degraded. Waveform distortion
occurred at high presentation levels regardless of background noise. The output SNR was
reduced when the envelopes of high-level speech components were clipped more than the
background noise.
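The SNR reduction caused by clipping can be illustrated with a deliberately crude level-domain model. It assumes that any long-term level above C-SPL is simply held at C-SPL, and it uses a C-SPL of 65 dB purely as an illustrative value. It ignores the temporal and spectral structure of the channel envelopes, so it is not the signal metric developed in this thesis; the function names are hypothetical.

```python
def clipped_level(level_db, c_spl_db):
    """Level after saturation: anything above C-SPL is held at C-SPL."""
    return min(level_db, c_spl_db)


def output_snr_db(speech_db, snr_db, c_spl_db=65.0):
    """Crude output SNR after clipping speech and noise levels at C-SPL.

    speech_db: presentation level of the speech (dB SPL).
    snr_db: input SNR (dB), so the noise level is speech_db - snr_db.
    c_spl_db: assumed saturation level (illustrative default).
    """
    noise_db = speech_db - snr_db
    return clipped_level(speech_db, c_spl_db) - clipped_level(noise_db, c_spl_db)
```

Under these assumptions, speech at 60 dB SPL and 10 dB SNR keeps its 10 dB SNR, whereas speech at 70 dB SPL and 10 dB SNR, or at 80 dB SPL and 20 dB SNR, both end up with an output SNR of only 5 dB.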
The comparison of the P-I functions at two SNR conditions showed that the starting point of
the score degradation and the rate of score degradation were different for different
background noise levels. It appeared that the subjects were more tolerant of waveform
distortion than noise. For example, the background noise levels of speech presented at 70 dB
SPL in noise at 10 dB SNR and speech presented at 80 dB SPL in noise at 20 dB SNR were
approximately 60 dB SPL. Even though waveform distortion occurred more for speech at 80
dB SPL than at 70 dB SPL, the group mean scores with no AGC at those two test conditions
were approximately the same. This agrees with research by others showing that noise is more
detrimental than amplitude distortion to speech intelligibility (Zeng and Galvin III 1999).
The mean scores began to degrade when the level of the background noise was about
60 dB SPL at both SNR conditions. The output SNR at this point could be well below the
input SNR due to clipping. The higher the presentation level, the lower the
output SNR could become in the processing with no AGC. Therefore the output SNR
degradation at high presentation levels was considered the main cause of performance
degradation. A similar observation was made for the speech understanding of normal hearing
subjects (Dubno, Horwitz and Ahlstrom 2005) and hearing impaired listeners (Studebaker et
al. 1999) at high presentation levels. Each study reported that the relative increase in
masking level at high presentation levels caused a drop in the effective SNR, which
affected the performance. The degree of performance degradation is far worse for cochlear
implant recipients because they rely mainly on the envelope cues of target speech, which are
easily corrupted by noise.
The front-end compression limiter was employed in the cochlear implant systems to reduce
envelope clipping at the LGF. The score improvement with the compression limiter was
most significant at the highest two presentation levels at the 20 dB SNR. The channel
envelopes could be clipped extensively at those levels. Employing the compression limiter
certainly helped to improve scores as it reduced the envelope distortion compared to
the no AGC condition. In the 10 dB SNR condition, the group mean score improvement with
the compression limiter was approximately 11 percentage points for presentation levels
above 70 dB SPL. However, the P-I function with the compression limiter showed a similar
trend of score degradation to that with no AGC above 65 dB SPL at 10 dB SNR. The
positive effect of the compression limiter on the reduction of envelope clipping might get
offset by the negative effects of fast compression on the envelopes of the input stimuli at
high presentation levels when the background noise level was also high.
According to the studies of Stone and Moore, the cross-modulation between target and
competing speech signals introduced by fast compression made the segregation between
them difficult (Stone and Moore 2003, 2004, 2007). In addition to the cross-modulation
effect, the output SNR reduction by fast compression could also degrade speech
intelligibility in noise (Rhebergen, Versfeld and Dreschler 2008b). It would be interesting to
find out which component contributed more to the speech intelligibility degradation. The
quantification of the effects of the front-end compression limiter on the envelope of the input
stimuli will be studied in chapter 13.
The front-end compression limiter expanded the input dynamic range by approximately 5 dB.
The speech intelligibility improvement was relatively moderate. That indicates the necessity of
additional compression stages before the compression limiter to expand the dynamic range
further.
8.4 Conclusions
This study investigated the effects of the input dynamic range limitation by the instantaneous
compression at the LGF on the speech intelligibility of cochlear implant recipients. The P-I
functions with no AGC and with the front-end compression limiter were measured for
sentences presented at different presentation levels in two SNR conditions. With no AGC
before the LGF, performance degraded at higher-than-normal presentation levels. Speech
was still intelligible in the high SNR condition even when the sentence presentation level was very high.
In the low SNR condition, envelope clipping had a high impact on the performance not
only due to waveform distortion, but also due to the output SNR degradation. The effect of
background noise level became significant when the noise level was more than 60 dB SPL.
At this level, the background noise produced stimulation at C-level. Although the
compression limiter reduced the envelope distortion at the LGF for the presentation levels
above 65 dB SPL, the score improvement was moderate. Fast compression itself could also
introduce envelope distortion. This study implies that if slow-acting compression were
available to adjust the overall presentation level, then short intervals of clipping due to
louder transients would not be very objectionable, and there would be little need for fast
AGC. The following chapters investigate the performance of AGC systems with different
properties.
9 Investigating Effects of Slow AGC and Fast AGC on
Cochlear Implant Speech Intelligibility
9.1 Introduction
The previous study (Chapter 8) questioned the effectiveness of the front-end compression
limiter to reduce envelope distortion at high presentation levels. It suggested that the use of
slow AGC in the signal path would improve the speech intelligibility of cochlear implant
recipients.
The existing AGC systems of the Nucleus sound processors (ASC, Whisper, the front-end
compression limiter, UGM and ADRO) were described in chapter 5 (§5.5). Chapter 5 also
reviewed the studies on the performance of ASC, Whisper and ADRO in the existing AGC
systems. Although there are a few studies on the performance of the dual-loop AGC from the
Cambridge group, no study has been published on the performance of the tri-loop AGC
(UGM) of the Nucleus CP810 sound processor.
Haumann et al. (2010) conducted a performance study on different cochlear implant systems
using the roving-level SRT test with a single adaptive track. The results showed that
recipients using the Nucleus Freedom performed worse than recipients using the Auria or
Harmony and Opus 2. Haumann et al. hypothesized that the poorer performance was due to
Freedom using only a single fast AGC (i.e., the front-end compression limiter), whilst
Harmony and Opus 2 used a dual-loop AGC system. If Haumann et al.’s hypothesis was
true, then the tri-loop AGC should provide significant benefits under the same test condition.
There are two objectives in this chapter: (i) to study the performance of the existing AGC
system with the tri-loop AGC and ADRO, and (ii) to establish a good test framework for
evaluating AGC systems in cochlear implant systems.
Two roving-level SRT tests were used: the roving-level SRT test with a single adaptive SRT
track (as in Haumann et al.’s study, §7.4.4.1) and the interleaved roving-level SRT test
(§7.4.4.2) to evaluate the two processing conditions. The SRT results measured by the two
roving-level SRT tests on the same processing condition will be compared to observe the
effectiveness of each test.
9.2 Clinical Study
9.2.1 Test Setup
The loudspeaker setup (§7.4.1) was used. The two roving-level SRT tests described in §7.4.4
were used to evaluate the two programs. The roving-level SRT test (single adaptive track) in
this study is very similar to the roving-level SRT test used in the Haumann et al. (2010) study
except for the sentence material.
9.2.2 Subjects
Seven cochlear implant recipients participated in this study: S1, S2, S5, S6, S8, S9 and S10
(refer to Appendix 1 for recipient details). The study was conducted as a repeated-measures
single-subject design in which each subject served as their own control.
9.2.3 Signal Processing
Two AGC systems were evaluated in this study: (i) the front-end compression limiter (FEL)
and (ii) the tri-loop AGC and ADRO (Tri + ADRO). Both the tri-loop AGC and the front-
end compression limiter were configured by changing the parameter setting of the UGM.
The LGF dynamic range of the FEL program was a default 40 dB with C-SPL 65 dB. The
program setting of Tri + ADRO was available as a research option of the Nucleus CP810
sound processor at the time of testing. The C-SPL of the Tri + ADRO program was 75 dB
and therefore the dynamic range of the LGF was 50 dB.
The C-levels of some recipients with the Tri + ADRO program were increased by
approximately 10% to obtain loudness equal to that of the FEL program. In addition to the
increase in C-levels, the Q value of the LGF was reduced from 20 to 16 in the Tri + ADRO
program. The parameter settings of FEL and Tri + ADRO can be found in the first and second
columns of Table 5-2. Other parameter settings of each program are shown in Table 9-1. Both
programs were loaded into a CP810 sound processor. The program order was
counterbalanced between subjects.
Processing      FEL     Tri + ADRO 75
Sensitivity     12      12
C-SPL           65      75
LGF Q Value     20      16
Table 9-1 Parameter setting of each program
9.2.4 Results
9.2.4.1 Roving-level SRT Test (Single Adaptive Track)
The SRT of each subject except S9 was the mean of test and retest SRTs. S9 was not retested
due to limited time available for the study. Figure 9-1 shows the individual and group mean
SRT results of the subjects with each program. All subjects except S6 showed SRT
improvement with the Tri + ADRO program. Subject S10 showed an exceptional SRT
improvement of 12 dB with the Tri + ADRO program. The SRTs of S8 and S10 with the FEL
program (approximately 16 dB) were noticeably higher (worse) than those of the other
subjects. The group SRT performance with the Tri + ADRO program was better than with the
FEL program. According to a t-test, the improvement was statistically significant
(p = 0.0159). The overall SRT improvement with the Tri + ADRO program was
approximately 5 dB.
[Figure 9-1: plot titled "Overall SRT (50, 65, 80 dB)"; y-axis SRT (dB), x-axis Subject (S1, S2, S5, S6, S8, S9, S10, Mean); series FEL and Tri+ADRO; group comparison p = 0.0159*.]
Figure 9-1 SRT of seven cochlear implant subjects measured by the roving-level SRT
test with a single adaptive track. The error bar indicates one standard error. The
asterisks indicate statistically significant difference in performance between the two
AGC systems (* p < 0.05, ** p < 0.01).
9.2.4.2 Interleaved Roving-level SRT Test
The interleaved roving-level SRT test calculated one SRT for each presentation level.
9.2.4.2.1 At 80 dB SPL
The individual and group mean SRT results of the subjects at 80 dB SPL are shown in Figure
9-2. Six out of seven subjects showed SRT improvement with Tri + ADRO. The mean
benefit of Tri + ADRO was approximately 2.5 dB. According to the analysis with a t-test, the
improvement was statistically significant (p < 0.05*).
[Figure 9-2: plot titled "Interleaved Roving-level SRT Test: 80 dB SPL"; y-axis SRT (dB), x-axis Subject; series FEL and Tri+ADRO; group comparison p = 0.0388*. Asterisks mark statistically significant differences between the two AGC systems (* p < 0.05, ** p < 0.01).]
9.2.4.2.2 At 65 dB SPL
The individual and group mean SRTs of the subjects at 65 dB SPL are shown in Figure 9-3.
Four of them performed better with Tri + ADRO and three performed better with FEL.
Because of a large SRT variability amongst subjects, there was no significant difference
between the two processing conditions. Subject S9 showed a large SRT improvement of
approximately 12 dB with the Tri + ADRO at 65 dB SPL although the performance at 80 dB
SPL was quite the opposite. The overall improvement with the Tri + ADRO program was
approximately 1.5 dB.
[Figure 9-3: plot titled "Interleaved Roving-level SRT Test: 65 dB SPL"; y-axis SRT (dB), x-axis Subject; series FEL and Tri+ADRO; group comparison p = 0.2300. Asterisks mark statistically significant differences between the two AGC systems (* p < 0.05, ** p < 0.01).]
9.2.4.2.3 At 50 dB SPL
An SRT could not be obtained for the 50 dB SPL presentation level in the interleaved SRT
test because the scores were below 50% even at the maximum SNR of 30 dB. Figure 9-4
shows an example of poor SRT convergence at 50 dB SPL. The adaptive track of the
interleaved SRT test (left panel) shows that the subject could not repeat half of the words in
the first sentence presented at 20 dB SNR. The SNR was stepped up by 5 dB, but the subject
still could not score 50%. The SNR was stepped up again and reached the maximum of 30 dB.
This clearly showed that the background noise was not the main factor in this test case.
[Figure 9-4: two panels sharing a vertical axis from -10 to 30. Left panel x-axis: Trial number (0 to 15); right panel x-axis: Mean response at each level (0 to 1).]
Figure 9-4 An example of bad SRT convergence due to the lack of audibility at 50 dB
SPL. Left panel shows the convergence of SRT over trials and right panel shows mean
percent correct words at each SNR.
In order to compare the two AGC programs at 50 dB SPL, the correct word scores from the
sentences presented at an SNR of 5 dB or higher were extracted and summed for each AGC
program. At first, it seems improper to compare the percent correct scores between the two
test conditions measured with an adaptive SRT test because the scores were taken at
different SNRs. However, an exception can be made if SNR was not the main factor that
changed the performance. For that case, comparing the percent correct scores collected at
different SNRs (above 5 dB) between the two test conditions is justifiable. A similar
procedure was also followed in a recent study (Boyle et al. 2013). The speech intelligibility at
50 dB SPL in the present study was clearly affected by audibility.
[Figure 9-5: plot of percent correct words (%) by subject; series FEL and Tri+ADRO; group comparison p = 2e-5**.]
Figure 9-5 Percent correct scores of seven cochlear implant subjects at 50 dB SPL. The
asterisks indicate statistically significant difference in performance between the two
AGC systems (* p < 0.05, ** p < 0.01).
Figure 9-5 shows the word scores at 50 dB SPL of each subject and the group. A binomial
test showed that subjects obtained significantly higher word scores with Tri + ADRO than
with FEL. The group mean score of the Tri + ADRO was better than that of the FEL
program by 20 percentage points.
9.2.5 Effect of Presentation Level
The advantage of the interleaved roving-level SRT test is that comparisons can be made
between programs as well as between presentation levels. The intelligibility at 50 dB SPL
was the poorest among the three presentation levels tested due to the lack of audibility.
Therefore, the SRT difference between 65 and 80 dB SPL was analyzed for each program.
Worse performance was observed with FEL at 80 dB SPL compared to 65 dB SPL. The
performance degradation with FEL at the high presentation level was consistent with the
performance indicated by the P-I function in the previous study (§8.2.4.2). Figure 9-6 shows
the SRT comparison between 65 and 80 dB SPL for FEL. Six out of seven subjects showed a
higher (worse) SRT at 80 dB SPL than at 65 dB SPL. However, the difference was not
statistically significant. Subject S9 was different from the other subjects because the SRT of
S9 at 80 dB SPL was lower (better) than at 65 dB SPL. The overall difference was just above
3 dB.
[Figure 9-6: plot titled "FEL: effect of presentation level"; y-axis SRT (dB), x-axis Subject; series 65 dB and 80 dB; group comparison p = 0.0851.]
Figure 9-6 Comparison of SRTs between 65 and 80 dB SPL for the FEL program.
Tri + ADRO also showed a higher (worse) SRT at 80 dB SPL than at 65 dB SPL. Figure 9-7
shows that the overall SRT difference between 65 and 80 dB SPL for the Tri + ADRO program
was just over 2 dB. The difference was statistically significant according to the analysis with
a t-test (p < 0.05*).
[Figure 9-7: plot titled "Tri+ADRO: effect of presentation level"; y-axis SRT (dB), x-axis Subject; series 65 dB and 80 dB; group comparison p = 0.0369*.]
Figure 9-7 Comparison of SRTs between 65 and 80 dB SPL for the Tri + ADRO program.
The error bars indicate one standard error. The asterisks indicate statistically
significant difference in performance between the two test conditions (* p < 0.05, ** p <
0.01).
The SRT difference between 65 and 80 dB SPL was larger with FEL than with Tri + ADRO
for all subjects except S9 and S10. The SRT degradation of S10 at 80 dB SPL was
comparable between the two programs. S9 was different from others, and showed SRT
improvement at 80 dB SPL with the FEL. It could be that S9 needed more audibility than the
other subjects because she got more benefits from the high presentation level of 80 dB SPL.
Every subject except S9 with FEL showed equal or better SRT at 65 dB SPL compared to 80
dB SPL.
9.2.6 Test-retest Reliability of Roving-level SRT Test with Single
Adaptive Track
A retest on each program was only done with the roving-level SRT with single adaptive
track because of limited time available from the subjects who volunteered in the study.
Therefore, test-retest analysis was only done for the roving-level SRT test with single
adaptive track. The SRTs of all subjects except S9 were measured again with both programs.
[Figure 9-8: two panels by subject (S1, S2, S5, S6, S8, S10, Mean). Top panel: SRT (dB) for Test:FEL, Test:Tri+ADRO, Retest:FEL and Retest:Tri+ADRO. Bottom panel: test minus retest difference (dB) for FEL and Tri+ADRO.]
Figure 9-8 Test-retest variability of the roving SRT test with single adaptive track from
the SRT of six subjects taken from test and retest sessions. Top panel shows SRTs of
each subject and group mean for test and retest. The bottom panel shows the SRT
difference between test and retest, and mean of the absolute SRT differences.
Figure 9-8 shows the test and retest SRTs of the subjects with each program. The difference
was taken as the SRT of trial 1 minus that of trial 2. A positive value indicates that the SRT
of trial 2 is better. One possible cause of the high test-retest variability of the roving-level
SRT test is the randomized sequence of the presentation levels. When the less favorable
presentation level was presented at the beginning of the test, the SNR was increased by a
larger step until a performance reversal occurred. Since the number of sentences was limited,
the adaptive track might not be able to converge to the final SRT.
Figure 9-9 shows an example of poor test-retest reliability due to a randomized sequence
in the roving-level SRT test with a single adaptive track. The analyzed case was the test-retest
SRTs of S5 with Tri + ADRO. The top panel of Figure 9-9 shows an example of poor SRT
convergence due to the randomized sequence. The shading of the circles represents the
percent word score for each sentence. Black indicates 0%, white indicates 100%, and the
values in between are shown in grey. The first five sentences were presented at 50 dB SPL.
Due to the lack of audibility, the SNR was stepped up until it reached the maximum, 30
dB SNR. Although the SNR was reduced at other presentation levels, the adaptive track had
yet to converge to the final SRT. Perhaps a few more sentences were required to get there.
The bottom panel of Figure 9-9 shows an example of good SRT convergence for
the same program. Four sentences at the beginning were presented at 65 and 80
dB SPL. The SNRs converged to the true SRT of S5 after half of the sentences were
presented. Poor audibility at 50 dB SPL was still observed, but the overall SRT was less
biased by its effect.
It is hypothesized that the effect of presentation level has less impact on the interleaved SRT
test because the test can isolate the negative effects of any presentation level to its own SRT.
It means that the interleaved SRT test can still measure a good SRT of the other two
presentation levels even if one presentation level is problematic. Test-retest reliability of the
interleaved SRT test will be studied in the clinical study of chapter 12.
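The isolation argument can be illustrated with a minimal Python sketch. It is not the procedure of §7.4.4: the one-up/one-down rule, the 2 dB step, the 20 dB starting SNR and the toy listener are assumptions chosen only for illustration. The 30 dB SNR ceiling and the three presentation levels follow the text.

```python
def run_interleaved(level_sequence, listener, start_snr=20.0, step_db=2.0, max_snr=30.0):
    """Run one independent adaptive SNR track per presentation level.

    Returns {level: [SNR presented on each trial at that level]}.
    """
    next_snr = {}   # SNR to present next, per presentation level
    history = {}    # SNRs already presented, per presentation level
    for level in level_sequence:
        snr = next_snr.get(level, start_snr)
        history.setdefault(level, []).append(snr)
        score = listener(level, snr)  # proportion of words correct, 0..1
        # Assumed rule: make it harder after >= 50% correct, easier otherwise.
        snr = snr - step_db if score >= 0.5 else snr + step_db
        next_snr[level] = min(snr, max_snr)
    return history


def toy_listener(level_db, snr_db):
    """Hypothetical listener: inaudible at 50 dB SPL, 4 dB threshold otherwise."""
    if level_db == 50:
        return 0.3
    return 1.0 if snr_db >= 4.0 else 0.0


tracks = run_interleaved([50, 65, 80] * 10, toy_listener)
print(tracks[50][-1], tracks[65][-1], tracks[80][-1])
```

In this toy run the 50 dB SPL track climbs to the SNR ceiling and stays there, while the 65 and 80 dB SPL tracks descend toward the listener's threshold without being disturbed by it.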
[Figure 9-9: two panels of SNR (dB) against trial number (0 to 30); each point is labelled with its presentation level (50, 65 or 80). Top panel "Test: Tri", Estimate = 6.7. Bottom panel "Retest: Tri", Estimate = -0.4.]
Figure 9-9 Adaptive tracks of the roving-level SRT test with a single adaptive track for
S5 with the Tri + ADRO in the test and retest sessions. The convergence of SNRs was
poor in the test session (top panel) and good in the retest session (bottom).
The test-retest variability was high for subjects S2, S5, S6 and S10 with Tri + ADRO. The
difference was more than 5 dB. Compared to Tri + ADRO, the test-retest variability of FEL
was low. The performance of Tri + ADRO at one presentation level depended on the level of
the previous sentence because of the long time constants. For example, a test sequence with
smaller changes between the presentation levels (+/- 15 dB) would favor the results more
than a sequence with larger steps (+/- 30 dB).
9.3 Discussions
The poor roving-level SRT results with the front-end compression limiter were due to the
poor performance at high (80 dB SPL) and low (50 dB SPL) presentation levels, shown
exclusively by the interleaved SRT test. The previous study (§8.2.4.2) showed that percent
correct scores of the subjects with FEL were lowered by approximately 40 percentage points
when the presentation level was increased from 65 to 80 dB SPL. In this study, a 3 dB higher
(poorer) SRT was observed at 80 dB SPL compared to 65 dB SPL. The transfer function of
percent correct scores to SRT in this study agrees with the study of Plomp (1994), which
showed that an SRT increase of 3 dB could be equated to 50% decrease in percent correct
scores for hearing impaired subjects.
SRT greater than 20 dB at 50 dB SPL with the FEL program showed that performance was
reduced by the lack of audibility, rather than the level of background noise. The subjects
could achieve at most 30% correct scores at 50 dB SPL.
The tri-loop AGC improved the performance of the subjects for speech presented at high
presentation levels. The tri-loop AGC contains slow and medium time-constant AGCs in
addition to the FEL. In the tri-loop AGC, the slow AGC primarily reduced the input signals
at high presentation levels. Hence, the performance improvement was attributed mainly to
the slow AGC that adjusted the overall presentation level of speech to be within the input
dynamic range without serious envelope distortions. The results also agree with the study of
Boyle et al. (2009) which showed that the dual-time-constants AGC performed better than
the fast AGC for roving-level speech. Using a slow AGC as the main one for the level
adjustment of the input signals is shown to be important for speech intelligibility in adverse
conditions. It should be noted that the medium and fast AGC were still important because
they adjusted the signal level when the slow AGC was not ready to respond.
Studies on ADRO by other researchers showed that ADRO improved the audibility of low-
level speech (§5.4). In the present study, the performance improvement at 50 dB SPL by the
Tri + ADRO cannot be attributed to the tri-loop AGC, because it had unity gain for the
envelopes below the compression threshold. The maximum gain ADRO could provide was 3
dB; however, when ADRO was on, the channel equalization (§4.3.4) was bypassed as shown
in Figure 11-1. The channel equalization has attenuation of up to 6 dB on the high frequency
channels. Thus Tri + ADRO provided up to 9 dB additional gain relative to FEL.
Even so, more audibility was needed at 50 dB SPL with the Tri + ADRO program. Another
possible reason for not achieving a good audibility at low presentation level with the Tri +
ADRO program was the C-SPL at 75 dB SPL. Without any adjustment, a program with C-
SPL 75 would stimulate at lower current levels than a program with C-SPL 65. Therefore, C-
levels of the Tri + ADRO program with C-SPL 75 were set approximately 10% higher than
the C-levels of the FEL program with C-SPL 65. The steepness factor Q of the LGF function
was set to 16 (more compressive) in the C-SPL 75 program. With these adjustments,
loudness was assumed to be equal between the two C-SPL programs for speech presented at
the same level. Perhaps a 10% increase in C-levels with the Tri + ADRO program was not
enough to achieve equal loudness.
The roving-level SRT test with a single adaptive track showed poor test-retest reliability.
Using a single adaptive track for all three presentation levels with a randomized sequence to
calculate a single SRT is suspected to be the cause of poor test-retest reliability. The
convergence of an adaptive rule is only guaranteed when the underlying psychometric
function is constant. This is clearly not the case for the processing conditions whose
performance depended on presentation level. It was hypothesized that the randomized
sequence could have less effect on SRT produced by the interleaved SRT test due to the
independent adaptive tracks. Test-retest reliability of the interleaved SRT test will be
observed in the clinical study of chapter 12. From the SRT results obtained from the two
SRT tests, the interleaved roving-level SRT test could be claimed as a better test procedure
because it measured SRT at each presentation level. The effects of presentation level are
shown separately in each adaptive track and hence can be observed more clearly.
Another suggestion to improve test-retest reliability is to use a fixed presentation level
sequence in roving-level SRT tests. If the sequence of presentation levels is fixed, the impact
of poor presentation levels on the SRT can be reduced. A fixed sequence of presentation
level, for example a repeated sequence of 50, 65 and 80, can avoid an unbalanced
presentation level sequence such as the one shown in Figure 9-9. Another advantage of using
a fixed sequence of presentation levels is that AGC systems will be tested with the same
numbers of up-steps and down-steps in every test. For example, the numbers of +15 dB
up-steps and -30 dB down-steps in a roving-level SRT test that repeats a fixed sequence of
50, 65 and 80 dB SPL for a total of 30 sentences are 20 and 9 respectively. Hence, the
performance variation can be attributed more to the different processing conditions of
systems than to factors related to test
systems.
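The step counts quoted above can be checked with a short Python sketch. The function name is hypothetical; the sequence (50, 65, 80 dB SPL) and the total of 30 sentences are taken from the text.

```python
from collections import Counter


def count_level_steps(levels_db, n_sentences):
    """Count level changes between consecutive sentences of a repeated sequence."""
    sequence = [levels_db[i % len(levels_db)] for i in range(n_sentences)]
    # Step from each sentence to the next, in dB (positive means louder).
    steps = [nxt - cur for cur, nxt in zip(sequence, sequence[1:])]
    return Counter(steps)


steps = count_level_steps([50, 65, 80], 30)
print(steps[15], steps[-30])
```

Thirty sentences give 29 level transitions; every third one is the 80 to 50 dB SPL drop, which is why the up-steps outnumber the down-steps.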
Compared to a fixed-presentation level test, an adaptive roving-level SRT test can represent
different listening conditions in everyday life. However, some test conditions are uncommon
in listening conditions outside a laboratory, for example, presenting the background noise
0.5 seconds before and after each sentence. It is more common to have a continuous
background noise between conversations in everyday environments. Besides, a gain
algorithm with long time constants, such as ASC, needs continuous background noise to fully
evaluate its performance in noise.
The advantage of the interleaved roving-level SRT test is measuring an individual SRT for
each presentation level. The disadvantage is that the test needs more sentences for the SNRs
to converge to the 50% correct score point. At least 16 sentences for each presentation level
should be used in the interleaved roving-level SRT test. Since the number of sentence lists is
often limited, it is important to use them wisely to avoid having to repeat them. If a subject is
tested with the same set of sentences within a short period of time, performance comparison
may not be valid due to a bias introduced by familiar test stimuli.
9.4 Conclusions
This chapter expanded the knowledge on AGC systems with different characteristics. It
showed the performance shortcomings of the front-end compression limiter in difficult
listening conditions. The existing AGC systems of the Nucleus CP810, the tri-loop AGC and
ADRO, improved SRT in all three presentation levels tested. The results highlighted the
importance of a slow AGC to adjust the overall presentation level of the input signal to be
within the LGF dynamic range. Although the existing AGC system improved the
performance at all three presentation levels, there is still room for improvement at low and
high presentation levels when compared to the SRT at 65 dB SPL. An ideal AGC system
would produce equal SRTs at all three presentation levels. Gain optimization techniques to
improve the audibility of low-level input stimuli as well as to reduce the spectro-temporal
distortion of input stimuli at high presentation levels should be further investigated.
Based on the findings of the present study on the roving-level SRT tests, a better SRT test
will be determined for evaluation of future gain control algorithms. The proposed roving-
level SRT test was the interleaved SRT test with a fixed sequence of sentence presentation
levels.
10 Proposed Envelope Profile Limiter
10.1 Introduction
The speech intelligibility of cochlear implant recipients with no AGC and with just the front-
end compression limiter in the signal path was studied in chapter 8. The main cause of
envelope distortion with no AGC was envelope clipping at the LGF. The front-end
compression limiter improved the speech intelligibility of the subjects, but the performance
degradation at high presentation levels still showed a similar trend. This study will investigate
different ways to optimize a compression limiter of a cochlear implant system.
It was considered that bringing together all the gain control elements at a single point in the
signal path, preferably after the filterbank, allowed simplification. Slow AGCs such as ASC
and ADRO, could be combined. It may also allow better integration of new features such as
SNR-based noise reduction (Dawson, Mauger and Hersbach 2011) and dual-microphone
spatial noise reduction (Spriet et al. 2007), which also act on the filterbank outputs. In this
chapter, an initial investigation is made on the feasibility of moving the front-end
compression limiter to just before the LGF. A potential benefit of monitoring signal levels at
the input to the LGF is that an AGC can use that information to prevent envelope clipping.
Such an AGC can optimize the spectral envelopes of an input signal by preserving the shape
of the spectral profile.
A secondary goal was to investigate the effect of compression speed on speech intelligibility.
There is no clear consensus regarding the optimal AGC speed in acoustic hearing aids
(Dillon 2001; Souza 2002; Gatehouse, Naylor and Elberling 2006; Moore 2008; Kates
2010).
Hearing aids often use multichannel AGCs because the amount of hearing loss, and the
dynamic range of residual hearing, varies with frequency. However, if each channel operates
independently, with fast time constants, then amplitude differences across frequencies will
be reduced, degrading the spectral cues used in recognising speech (Plomp 1994; Stone and
Moore 2004, 2008). Given these results, it would be expected that independent fast AGC on
22 channels would give very poor performance. A solution is to cross-couple the channels,
so that the gains are related in some way (White 1986). A coupled AGC between frequency
channels is a functional requirement of the peripheral hearing system to be able to cope with
a wide range of loudness (Lyon’s auditory model, §2.4).
The previous chapter showed the benefits of employing slow AGCs in the signal path for
roving-level speech at low and high presentation levels. The tri-loop system provided benefit
because the compression limiter was activated relatively infrequently. However, the fast
AGC will operate whenever there is a sudden increase in the speech level, so it is still
worthwhile understanding its effect.
10.2 Signal Processing
Two AGCs, the front-end compression limiter and the proposed multichannel compression
limiter, with two release times, 75 and 625 ms, were investigated. The AGCs were
incorporated in the simplified ACE path shown in Figure 8-1. The signal processing in the
ACE sound coding strategy and the AGCs was implemented on Simulink for testing with the
Nucleus-xPC system (§7.4.5.2).
10.2.1 Front-end Compression Limiter
The front-end compression limiter (FEL) was a single-channel AGC located before the
filterbank. The implementation and parameter setting were the same as described in §8.2.2.
10.2.2 Proposed Envelope Profile Limiter
Figure 10-1 shows the block diagram of the proposed multichannel compression limiter in
the signal path.
Figure 10-1 Block diagram of ACE signal path with the envelope profile limiter (EPL)
The Max block produced the instantaneous maximum value, across channels, of the set of
envelopes, so that the gain rule acted upon whichever channel had the largest amplitude. The
gain rule had unity gain up to the compression threshold, and infinite compression beyond.
The resulting gain was then applied to all channels. Unlike the front-end AGC, zero attack
time could be realised because gain changes after the filterbank cannot produce any
undesirable spectral smearing. The rise time of the envelopes was thus determined by the
filterbank, and no overshoot could occur. The release time was either 75 ms or 625 ms.
With all channels having equal gain, at first glance it may appear that the multichannel
limiter would have behaviour identical to the front-end limiter. The difference is that levels
were observed after the filterbank, where they directly control the stimulation current, and
the compression threshold was set equal to the saturation level of the LGF. Thus no envelope
could exceed the LGF saturation level. Because it eliminates envelope clipping, and
preserves the spectral profile, this multichannel AGC will be referred to as the Envelope
Profile Limiter (EPL).
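The gain rule described above can be sketched in Python as follows. This is a minimal illustration, not the Simulink implementation: it assumes linear-amplitude envelopes updated at a hypothetical frame rate (frame_rate_hz) and a one-pole exponential release, neither of which is specified in this section.

```python
import numpy as np


def envelope_profile_limiter(envelopes, threshold, release_ms, frame_rate_hz):
    """Apply one shared gain to all channels so that no envelope exceeds threshold.

    envelopes: array of shape (n_frames, n_channels), linear amplitude.
    Returns (limited_envelopes, gains).
    """
    envelopes = np.asarray(envelopes, dtype=float)
    # Assumed release smoothing: one-pole filter with the given time constant.
    alpha = np.exp(-1.0 / (release_ms * 1e-3 * frame_rate_hz))
    gains = np.empty(envelopes.shape[0])
    gain = 1.0
    for n, frame in enumerate(envelopes):
        peak = frame.max()  # the Max block: largest envelope across channels
        # Unity gain below threshold, infinite compression above it.
        target = 1.0 if peak <= threshold else threshold / peak
        if target < gain:
            gain = target  # zero attack time: follow gain reductions at once
        else:
            gain = target + (gain - target) * alpha  # release toward target
        gains[n] = gain
    return envelopes * gains[:, None], gains


# Hypothetical two-channel input with one loud frame.
env = np.array([[0.2, 0.1], [2.0, 1.0], [0.5, 0.25], [0.5, 0.25]])
out, g = envelope_profile_limiter(env, threshold=1.0, release_ms=75.0, frame_rate_hz=1000.0)
print(out.max(), g)
```

Because the same gain multiplies every channel, the ratio between channels (the spectral profile) is unchanged, and because the gain never exceeds the target, the largest envelope never exceeds the threshold.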
The behaviour of the two AGCs is compared in Figure 10-2. At high presentation levels, the
FEL allows some envelope clipping. This has three detrimental effects. Firstly, it distorts the
spectral profile. As shown in the top panel of Figure 10-2, it flattens the formant peak,
making it harder to determine the formant frequency, potentially degrading vowel
perception. Secondly, examining the temporal waveform in the bottom panel of Figure 10-2,
the amplitude modulation is lost. For a vowel, this modulation occurs at the fundamental
frequency, and is the primary cue to voice pitch. Thirdly, at positive SNRs, envelope
clipping reduces the amplitude of the signal peaks relative to the background noise, thus
reducing the SNR. The EPL avoids these drawbacks, and it was hypothesized that it would
provide better speech intelligibility.
Figure 10-2 Envelope clipping of a vowel: spectral profile (top panel) and
temporal waveform (bottom panel) at the output of the LGF, processed by FEL and
EPL
10.3 Clinical Studies
10.3.1 Test Setup
The fixed method described in §7.4.3 was used with the morphemic scoring method.
10.3.2 Study Design
The study used a repeated-measures, single-subject design in which each subject served as
their own control. The test order was counterbalanced between subjects. At the beginning of
the listening session, the subjects were asked to comment on the loudness they perceived
from their own voice and the researcher’s voice with each AGC. They were not informed as
to which AGC was being tested.
Experiment 1 was a two-factor design, measuring sentence recognition in quiet and in noise,
at a high presentation level. Experiment 2 measured sentence recognition in noise as a
function of presentation level, for two of the AGC configurations from Experiment 1. As the
goal was to investigate the effect of fast AGC, the usual slow gain control blocks in the
Nucleus signal path (ASC and ADRO) were disabled in Experiments 1 and 2.
10.3.3 Experiment 1: High Presentation Level
Experiment 1 compared the front-end limiter and the envelope profile limiter with two
release time settings: 75 and 625 ms. Abbreviations for the four AGC configurations are
listed in Table 10-1.
AGC Type                          Release Time 75 ms    Release Time 625 ms
Front-end compression limiter     FEL75                 FEL625
Envelope profile limiter          EPL75                 EPL625
Table 10-1 AGC configurations tested
In this experiment, the presentation level of the sentences was set to the highest possible
level, 89 dB SPL, at which envelope clipping occurred significantly with no AGC. The
speech intelligibility of cochlear implant recipients with each AGC configuration was
measured in this listening condition. The purpose was to study the important factors of AGC
systems affecting speech intelligibility. The comparisons between the two AGC systems
with the same release time aimed to show the importance of the gain structure whereas the
comparison within the same AGC system with different release times was to show the
effects of time constants.
The direct connect setup was used in this experiment. Sentences were presented in two
conditions: in quiet, and in four-talker babble at 10 dB SNR. A total of 16 sentences were
presented for each test condition. The speech-in-quiet condition reveals the effects of
envelope distortions, in particular envelope clipping, as well as reduced amplitude
modulation depth. The speech-in-noise condition is also subject to envelope distortion, and
in addition, a fast AGC can worsen the effective SNR and introduce cross-modulation
components between target speech and background noise.
10.3.4 Experiment 2: Performance-Intensity Function
The high presentation level used in experiment 1 was not representative of everyday
listening conditions. The objective of experiment 2 was to measure performance over a wide
range of presentation levels, i.e. to obtain AGC performance-intensity functions. Because of
the limited availability of the subjects’ time, only two AGC configurations were tested:
FEL75, which gave the lowest scores in Experiment 1; and EPL625, which gave the highest
scores in Experiment 1. The two AGCs were evaluated at presentation levels from 55 to 89
dB SPL, in four-talker babble, at two SNRs: 10 and 20 dB. The 20 dB SNR condition was
used, instead of the speech-in-quiet condition of experiment 1, to avoid ceiling effects.
All subjects were initially tested with the loudspeaker setup. Subject S1 obtained surprisingly
good scores at the higher presentation levels, apparently assisted by his residual contralateral
hearing (despite that ear being plugged), and therefore he was retested using the direct
connect setup.
10.4 Results
10.4.1 Experiment 1: High Presentation Level
Six cochlear implant subjects participated in Experiment 1. Each comparison was tested for
statistical significance by Monte Carlo simulation, assuming binomial distributions
(Simon 1997). Appendix 3 describes the statistical method. The results of the significance
tests for each comparison between the four AGC configurations are shown in Table 10-2.
Because four hypothesis tests were made on the dataset of the two AGC systems, the
Bonferroni correction was applied to adjust the significance level. Hence the significance
level for the p-value (the probability of the observed difference arising if the two AGC
conditions were the same) is 0.05/4 = 0.0125. A p-value lower than 0.0125 rejects the null
hypothesis and indicates that the two conditions are different.
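The following Python sketch illustrates the kind of test described above. It is not the procedure of Appendix 3, which is not reproduced here: the pooled-probability null model, the one-sided comparison, the function name `monte_carlo_p_value` and the item counts are all assumptions made for illustration.

```python
import random

# Bonferroni-corrected significance level for four comparisons.
ALPHA = 0.05 / 4


def monte_carlo_p_value(correct_a, correct_b, n_items, n_trials=5000, seed=1):
    """One-sided p-value for condition B scoring higher than condition A.

    Hypothetical sketch: under the null hypothesis both conditions share one
    pooled probability of a correct response, and each condition is scored
    on n_items independent items (a binomial model).
    """
    rng = random.Random(seed)
    pooled = (correct_a + correct_b) / (2 * n_items)
    observed = correct_b - correct_a

    def draw():
        # One simulated score: n_items independent Bernoulli trials.
        return sum(rng.random() < pooled for _ in range(n_items))

    # Count simulated score differences at least as large as the observed one.
    hits = sum((draw() - draw()) >= observed for _ in range(n_trials))
    return hits / n_trials


if __name__ == "__main__":
    # Illustrative scores only (items correct out of 100), not thesis data.
    p = monte_carlo_p_value(correct_a=20, correct_b=70, n_items=100)
    print("significant" if p < ALPHA else "not significant")
```

With identical scores in the two conditions this one-sided test returns a p-value close to 0.5, which matches the pattern of the near-zero differences in Table 10-2.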
[Figure 10-3 graphic: two bar-chart panels of Percent Correct (%) against Subject, with
bars for FEL75, EPL75, FEL625 and EPL625.]
Figure 10-3 Effects of gain structure and release time on speech intelligibility of six
cochlear implant subjects in quiet (top panel) and in noise (bottom panel). Error bars
indicate one standard error.
Subject  AGC condition      Quiet                   Noisy
                            scores (%)  p-value     scores (%)  p-value
S1       EPL75 - FEL75      7.54        0.1049      -7.23       0.0556
         EPL625 - FEL625    4.15        0.1287      11.96       0.0414
         FEL625 - FEL75     17.73       5e-4**      39.08       <1e-4**
         EPL625 - EPL75     14.34       0.0011**    58.27       <1e-4**
S3       EPL75 - FEL75      8.51        0.0067*     4.85        0.0979
         EPL625 - FEL625    -           -           11.72       0.0426
         FEL625 - FEL75     -           -           52.33       <1e-4**
         EPL625 - EPL75     -           -           59.32       <1e-4**
S4       EPL75 - FEL75      20.04       0.0002**    4.8         0.1771
         EPL625 - FEL625    -1.98       0.1055      -1.49       0.4203
         FEL625 - FEL75     28.9        <1e-4**     42.73       <1e-4**
         EPL625 - EPL75     6.84        0.016       36.44       <1e-4**
S5       EPL75 - FEL75      2.33        0.2841      35.22       <1e-4**
         EPL625 - FEL625    0           0.5         8.55        0.0505
         FEL625 - FEL75     9.47        0.002**     55.8        <1e-4**
         EPL625 - EPL75     7.14        0.0054*     29.13       <1e-4**
S6       EPL75 - FEL75      -4.08       0.0817      -11.39      0.0385
         EPL625 - FEL625    0           0.5         -4.01       0.1818
         FEL625 - FEL75     1.98        0.1055      56.57       <1e-4**
         EPL625 - EPL75     6.06        0.0103*     63.95       <1e-4**
S7       EPL75 - FEL75      -0.04       0.5097      -0.38       0.4803
         EPL625 - FEL625    -0.05       0.5052      14.49       0.0107*
         FEL625 - FEL75     0.07        0.5         31.69       <1e-4**
         EPL625 - EPL75     0.06        0.5044      46.56       <1e-4**
Group    EPL75 - FEL75      5.72        0.0006*     4.31        0.0329
         EPL625 - FEL625    0.42        0.3085      6.87        0.0031*
         FEL625 - FEL75     6.9         0.0001**    46.4        0.0001**
         EPL625 - EPL75     11.6        0.0001**    48.9        0.0001**
Table 10-2 Statistical analysis on the scores of the individual and group. The score
difference was obtained by subtracting the score of the second AGC from the score of the
first AGC (e.g. EPL75 - FEL75). The asterisks indicate statistically significant difference
in performance between the two AGCs (* p < 0.0125, ** p < 0.0025).
10.4.1.1 In Quiet
The upper panel of Figure 10-3 shows the percent correct scores of the cochlear implant
subjects with each AGC configuration evaluated in quiet. Scores in quiet exhibited a ceiling
effect, especially for the 625 ms release time. S3 did not undertake the 625 ms release time
condition due to time constraints and because his scores were likely to have been near
ceiling; the mean scores for the 625 ms condition shown in the upper panel of Figure 10-3
are for the remaining subjects.
The group mean scores for sentences presented in quiet were more than 85% for all AGC
conditions. The EPL75 score was 5.7 percentage points higher than that of the FEL75
(92.6% versus 86.9%). This improvement was statistically significant according to the
binomial test. The group mean scores of the FEL625 and the EPL625 were almost the same.
The scores improved when the release time was increased from 75 ms to 625 ms: by 11.6
percentage points (from 86.9% to 98.5%) with the FEL and by 6.9 percentage points (from
92.6% to 99.5%) with the EPL. The improvement was statistically significant. The score was
mainly determined by the release time.
10.4.1.2 In Four-talker Babble Noise (SNR = 10 dB)
The background noise caused a substantial degradation in speech understanding. The bottom
panel of Figure 10-3 shows the percent correct scores of the cochlear implant subjects with
each AGC configuration for sentences in the presence of four-talker babble noise at 10 dB
SNR. The overall speech intelligibility in noise was higher with the EPL than with the FEL.
The group mean score improvement was 4.4 percentage points (from 20.8% to 25.2%) for
the 75 ms release time and 6.9 percentage points (from 67.2% to 74.1%) for the 625 ms
release time. The score difference between the EPL625 and the FEL625 was statistically
significant according to the paired comparison by the binomial test. Increasing the release
time from 75 ms to 625 ms significantly improved speech intelligibility in noise for both
AGC systems. The group mean score improvement was 46.4 percentage points (from 20.8%
to 67.2%) for the FEL and 48.9 percentage points (from 25.2% to 74.1%) for the EPL
respectively.
When the percent correct scores of all four AGC conditions are compared, the EPL625
obtained the highest scores and the FEL75 obtained the lowest scores. The difference was
approximately 53.3 percentage points.
When the scores in quiet and in noise were compared for each AGC, a significant interaction
between the release time and the background noise was observed for each AGC system. That
is, a slow release time matters more in noisy conditions than in quiet.
10.4.2 Experiment 2: Performance-Intensity Functions
In Experiment 1, speech understanding was measured only for speech presented at the
highest presentation level. It is more informative to measure the performance of each AGC
system over a wide range of speech presentation levels. Performance-Intensity (P-I)
functions show the sensitivity of a gain algorithm to the intensity of the stimulus. If the P-I
functions were to be measured for the nine presentation levels from 55 to 89 dB SPL in two
SNR conditions for four AGC settings, a total of 72 sentence lists would be necessary.
Instead, in the interest of time, it was decided to measure only the P-I functions of the two
AGC systems from Experiment 1 with the largest performance difference between them: the
front-end limiter with the release time of 75 ms and the envelope profile limiter with the
release time of 625 ms. The P-I functions of these two AGC conditions were measured with
sentences presented at each presentation level from 55 to 89 dB SPL at two SNRs, 10 and 20
dB.
Six cochlear implant subjects participated in Experiment 2. Five of them also participated in
Experiment 1. The top six panels of Figure 10-4 show the percent correct scores of the
individual subjects and the bottom panel shows the group mean scores.
[Figure 10-4 graphic: six individual panels (subjects S1 to S6) and one group-mean panel,
each plotting Percent correct (%) against Presentation Level (dB SPL) at 55, 60, 65, 70, 75,
80, 83, 86 and 89 dB SPL, with curves for SNR10 FEL75, SNR20 FEL75, SNR10 EPL625 and
SNR20 EPL625.]
Figure 10-4 Performance-intensity functions of FEL75 and EPL625
[Figure 10-5 graphic: two bar-chart panels (SNR = 20 dB and SNR = 10 dB) of Percent
Correct (%) against Subject, with bars for FEL75 and EPL625 and asterisks marking
significant differences.]
Figure 10-5 Comparison of scores between the FEL75 and the EPL625 for the
presentation levels above 70 dB SPL at SNR 20 dB (top panel) and SNR 10 dB (bottom
panel). The asterisks indicate statistically significant difference in performance between
the two AGCs (* p < 0.05, ** p < 0.01).
Subject  SNR = 20 dB               SNR = 10 dB
         Δ scores (%)  p-value     Δ scores (%)  p-value
S1       10.81         0.0005**    25.98         0.0005**
S2       5.5           0.0005**    16.02         0.0005**
S3       9.23          0.0005**    22.49         0.0005**
S4       20.48         0.0005**    39.39         0.0005**
S5       2.86          0.0485*     17.92         0.0005**
S6       4.42          0.001**     38.55         0.0001**
Group    8.88          0.0001**    26.73         0.0001**
Table 10-3 Statistical analysis of the scores between the FEL75 and the EPL625 for the
presentation levels above 70 dB SPL in two SNR conditions. The asterisks indicate
statistically significant difference in performance between the two AGCs (* p < 0.05, **
p < 0.01).
10.4.2.1 At SNR 20 dB
The subjects scored more than 85% at presentation levels from 55 to 80 dB SPL at 20 dB
SNR. Above 80 dB SPL, speech intelligibility started to decline with the FEL75, whereas the
EPL625 maintained a high level of performance at all presentation levels. The group mean
scores dropped by approximately 25 percentage points with the front-end limiter when the
presentation level was increased from 80 dB SPL to 89 dB SPL. The variation between the
subjects was also higher with the FEL75 than with the EPL625 at high presentation levels.
For example, subjects S3 and S4 scored lower than the others at presentation levels above
83 dB SPL.
For statistical analysis, the percent correct scores of each AGC system were averaged for the
presentation levels above 70 dB SPL, at which both AGCs were active. The top panel of
Figure 10-5 shows the percent correct scores of the two AGC systems for presentation
levels above 70 dB SPL at 20 dB SNR. All subjects showed significantly better
performance with the EPL625. The score improvement ranged from 4.4 to 20.6
percentage points, with an average of 8.9 percentage points.
10.4.2.2 At SNR 10 dB
At 10 dB SNR, the speech intelligibility varied widely between the subjects. The group mean
scores started to degrade above 65 dB SPL for both AGC systems. With the FEL75, the mean
scores dropped by approximately 70 percentage points when the presentation level was
increased from 65 dB SPL to 89 dB SPL; with the EPL625, the drop over the same range was
approximately 30 percentage points. The rate of score degradation from 65 to 75 dB SPL was
approximately the same for the two AGCs. Above 75 dB SPL, the scores continued to drop
with the FEL75, whereas no significant score degradation was observed with the EPL625.
All subjects scored more than 50% with the EPL625 at the highest two presentation levels.
The percent correct scores of the two AGC systems were compared for the presentation
levels above 70 dB SPL as a group. The individual and group mean scores are shown in the
bottom panel of Figure 10-5. All subjects scored significantly higher with the EPL625 than
with the FEL75. The range of improvement was from 16 to 40 percentage points with the
average of 26.7 percentage points.
10.5 Discussions
In Experiment 1, when the release time was kept constant, the envelope profile limiter gave
equal or better speech intelligibility when compared to the front-end compression limiter.
The speech-in-quiet condition revealed the effect of envelope distortion. Figure 10-6 shows
the proportion of envelope samples that exceeded the LGF saturation level (i.e. the amount
of envelope clipping) with the front-end limiter for sentences presented at 89 dB SPL in
Experiment 1 (the envelope profile limiter is not shown because it had zero clipping under
all conditions). In both quiet and noise, increasing the release time substantially reduced the
amount of clipping, because the gain was lower on average.
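The clipping metric described above can be sketched in a few lines of Python. This is only an illustration of the definition (the percentage of envelope samples above the LGF saturation level); the function name, the linear amplitude units and the sample values are assumptions, not taken from the thesis.

```python
def clipped_percent(envelope_samples, saturation_level):
    """Percentage of envelope samples that exceed the saturation level.

    Hypothetical helper: envelope_samples is a flat list of envelope
    amplitudes pooled over all channels and frames, in the same units
    as saturation_level.
    """
    if not envelope_samples:
        raise ValueError("at least one envelope sample is required")
    clipped = sum(1 for level in envelope_samples if level > saturation_level)
    return 100.0 * clipped / len(envelope_samples)


if __name__ == "__main__":
    # Illustrative values only: two of the ten samples exceed 1.0.
    samples = [0.2, 0.5, 1.2, 0.9, 1.5, 0.1, 0.3, 0.4, 0.6, 0.7]
    print(clipped_percent(samples, saturation_level=1.0))
```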
[Figure 10-6 graphic: bar chart of Percent envelopes clipped (%) against Test Condition
(In Quiet, In 10 dB SNR), with bars for FEL75 and FEL625.]
Figure 10-6 Proportion of clipping for speech presented at 89 dB SPL with the front-
end compression limiter with the release time 75 ms and 625 ms
With 75 ms release time, about 10% of envelope samples were clipped, and the envelope
profile limiter provided a small benefit (approximately 6 percentage points), perhaps due to
better representation of spectral peaks. According to Drullman (1995), modulation in spectral
peaks contributes more to intelligibility than modulation in spectral troughs. Hence the performance
improvement with the envelope profile limiter may be explained by its preservation of
spectral peaks. With 625 ms release time, clipping affected less than 4% of envelope
samples, so there was little scope for the envelope profile limiter to provide benefit. The
results suggest that the subjects were not very sensitive to envelope clipping. This is
consistent with the P-I functions shown in the previous study (§8.2.4), where subjects with
no AGC scored well at high SNR at very high presentation levels. Zeng and Galvin III
(1999) found relatively small reduction in cochlear implant vowel intelligibility (about 10
percentage points) in noise and in quiet when the electrical dynamic range was reduced to
one current level, giving a binary representation, which is equivalent to 100% of the pulses
being affected by envelope clipping. It should be noted that these results were obtained with
the ACE or SPEAK coding strategies, which select the envelopes with largest amplitude for
stimulation in each cycle; it is possible that envelope clipping may be more detrimental in a
coding strategy such as CIS, which stimulates all channels in each cycle.
One methodological issue with Experiment 1 was the ceiling effect for sentences in quiet,
especially with 625 ms release time. To better observe a difference between the two AGC
types in quiet, more difficult speech material is needed. Isolated words (e.g. CNC words)
could be used, perhaps with a carrier phrase to exercise the dynamic behaviour of the AGC
systems. An alternative is to use low predictability or nonsense sentences (Boothroyd and
Nittrouer 1988).
The two experiments in this study clearly showed that fast compression speed was
detrimental to speech intelligibility. The effect was consistent across subjects, and was
greatest for speech in noise, with scores in Experiment 1 dropping by more than 45
percentage points when the release time was decreased. In Experiment 2, it is highly likely
that the advantage of EPL625 over FEL75 was primarily due to the longer release time. The
consistency and size of the detriment for fast compression with cochlear implants contrasts
with the mixed results obtained in studies with acoustic hearing aids (Gatehouse, Naylor and
Elberling 2006). Moore (2008) proposed that the benefit depended on the individual’s ability
to process temporal fine structure, which facilitates listening to the dips of background noise.
The results of the present study are consistent with that hypothesis, as cochlear implants are
unable to convey temporal fine structure.
Release time had a significant effect for speech in quiet in Experiment 1, implying that
temporal envelope distortion played a role. As cochlear implant speech perception relies on
envelope cues, the fidelity of the envelopes in the limited number of frequency channels is
important (Stone and Moore 2008). A compression speed of 1.6 Hz (for the 625 ms release
time) is unlikely to have a significant effect on the modulation rates of phonetic entities
other than stress patterns. In contrast, a compression speed of 13.33 Hz (for the 75 ms
release time) affects the modulation of most phonetic entities: words, syllables and phonemes
in particular (Plomp 1983). Comparing the scores in quiet between the two release times, 75
and 625 ms, in each AGC system, supported the importance of preserving low rate
modulation. Temporal modulation below 16 Hz is perceptually most important for speech
(Houtgast and Steeneken 1985; Drullman, Festen and Plomp 1994b). A low rate amplitude
modulation of less than 4 Hz could also contribute to speech intelligibility in noise when the
listeners relied only on the envelope information (Füllgrabe, Stone and Moore 2009). Based
on those studies, the AGC release time needs to be at least 500 ms to maintain temporal
modulation cues.
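The compression speeds quoted above are consistent with taking the reciprocal of the release time. The short Python sketch below makes that arithmetic explicit; the reciprocal relation is inferred from the two quoted figures and the function name is hypothetical.

```python
def compression_speed_hz(release_time_ms):
    """Compression speed in Hz, assumed here to be 1 / release time."""
    if release_time_ms <= 0:
        raise ValueError("release time must be positive")
    return 1000.0 / release_time_ms


if __name__ == "__main__":
    for release_ms in (625, 75):
        speed = compression_speed_hz(release_ms)
        print(f"{release_ms} ms release time: {speed:.2f} Hz")
```

Under this assumption, the 500 ms release time suggested above corresponds to a compression speed of 2 Hz.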
Speech intelligibility of the participants degraded significantly in noise. The additive noise
was more detrimental to speech intelligibility than the compression that reduced the
modulation depth (Drullman 1995). Several factors could contribute to the degradation of
speech intelligibility of cochlear implant recipients at high levels in the presence of noise:
the envelope modulation reduction, the energetic masking of the speech by the noise, the
distortion of the low-rate envelope modulation by compression, and cross-modulation
between target speech and noise introduced by compression. Each of these factors will be
studied in Chapter 13.
The poor results with 75 ms release time probably explain why Spahr et al. (2007) found that
ESPrit 3G users performed worse in noise than users of the CII or Tempo+ sound processors
(which had dual-loop AGC systems). The ESPrit 3G processor (released in 2002) used a
front-end compression limiter with a release time of 82 ms, and although ASC was available,
it was not enabled in the default processor setting. In contrast, ASC is on by default in the
CP810 processor (released in 2009), giving a dual-loop AGC system. The performance-
intensity functions of Experiment 2 suggest the improvement that would be obtained with
ASC. Based on bench measurements, at 89 dB SPL and 10 dB SNR, ASC would reduce the
gain by 18 dB; this is equivalent to reducing the presentation level to 71 dB SPL, and
suggests that scores would improve from about 20% correct to 80% correct.
The experiments in the present study showed some speech intelligibility improvements with
the envelope profile limiter compared to the front-end compression limiter. However, the
envelope profile limiter may underperform in some listening conditions. For example, it
would reduce the level of all frequency channels for a narrowband intense sound. More
experiments are needed to explore the performance of the envelope profile limiter with
different test stimuli. The participants in this study had no experience with the envelope
profile limiter, yet they commented that it did not sound much different from the front-end
limiter. Some of the subjects mentioned anecdotally that the target speech stood out more
clearly from the background noise and that it was therefore easier to identify words with the
envelope profile limiter. The overall impression of the proposed envelope profile limiter was positive.
To date, cochlear implants have used AGC systems that were essentially the same as those
found in hearing aids. A short AGC release time appears to have a more detrimental effect
in cochlear implants than in acoustic hearing aids, showing the importance of studies
involving cochlear implant recipients. The envelope profile limiter proposed in the present
study was specifically tailored to the needs of a cochlear implant system, and would not be
suitable for a hearing aid. Moving all the gain control elements to after the filterbank opens
new opportunities for optimisation and integration with other processing algorithms.
10.6 Conclusions
A front-end compression limiter prevents the amplitude of the audio signal from exceeding
the compression threshold. However, if the signal path is calibrated for typical speech
signals, then occasional envelope clipping can occur when the audio signal has narrow
bandwidth or low crest factor. The proposed envelope profile limiter eliminated envelope
clipping by monitoring the maximum envelope level (rather than the front-end level) and
setting the envelope compression threshold to be equal to the saturation level of the LGF. It
preserved the spectral profile by applying the same gain to all channels. The primary
conclusion of this study is that the envelope profile limiter is a feasible alternative to a front-
end compression limiter in a cochlear implant system. While both the front-end limiter and
the envelope profile limiter can extend the upper boundary of the operational acoustic range
of cochlear implant systems, the envelope profile limiter accomplishes this with less
distortion.
Of the two AGC factors investigated in the present study (gain structure and release time),
the release time was more important for speech intelligibility. The secondary conclusion of this study is
that a slow AGC is important for cochlear implant systems because fast compression speed
can reduce speech intelligibility.
11 Take-home Study with the Proposed Envelope Profile Limiter
11.1 Introduction
Listening tests in the laboratory can only replicate a subset of listening conditions
encountered in real life. Therefore take-home experiments are conducted to evaluate quality
and acceptance. A complete evaluation of a system or an algorithm comprises both
laboratory and take-home assessments.
In a take-home experiment, the subject rates a program after a period of acquaintance. The
subject answers a set of questions designed to assess different aspects of the new program
compared to a reference program. The reference program is often the program that the
subject has previously used most of the time. By having to listen with the new program for a
considerable period in the real-world listening environment, a listener can appreciate the
quality and intelligibility of not only speech but also other environmental sounds. Sound
quality and acceptance should be judged after a familiarization period of at least two to four
weeks.
In the previous chapter, the envelope profile limiter, implemented on the Nucleus-xPC
system, was evaluated with cochlear implant recipients in the laboratory. The results
indicated equal or better speech intelligibility performance with the envelope profile limiter
compared to the front-end limiter. The next step was to conduct a take-home study. This
required implementing the envelope profile limiter on the Nucleus CP810 sound processor
(§7.4.5.1). This chapter describes the implementation of the envelope profile limiter. It
explains the clinical fitting procedures and analyzes the results of the clinical questionnaire.
11.2 DSP Implementation
The present author implemented the proposed EPL on the DSP in assembly language. The
necessary modifications to the DSP firmware were made and thoroughly tested by the
present author before the subjects were fitted with Nucleus CP810 sound processors
running the modified programs.
The modified signal path on DSP 1 is shown in Figure 11-1. The signal path allowed either
the unified gain model (UGM) or the envelope profile limiter (EPL) to be selected. The
inputs to DSP 1 are 16 samples of time-domain data and 128 FFT output samples. The UGM
consisted of three cascaded AGCs with slow, medium and fast time constants respectively
(§5.5.4). Based on the comparison between the level of input signal and a set of compression
thresholds, the UGM block calculated the gain and passed it to the AGC Gains block which
applied the gain as well as scaling. The operation of the EPL was as described in §10.2.2.
The down-slew-rate of the EPL was set to 40 dB per second (equivalent to a release time of
625 ms) and instantaneous infinite compression was applied to the channel envelopes if the
maximum amplitude exceeded the compression threshold. To avoid substantial code
modification, whilst still allowing a recipient to switch between a UGM program and an EPL
program on the same processor, ADRO was retained at the end of the signal path in DSP 1.
It was not ideal for ADRO to operate after the EPL because it was possible that ADRO could
increase the gain and cause envelope clipping.
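The EPL behaviour summarised above can be sketched in Python as follows. This is a simplified illustration, not the assembly firmware and not the full algorithm of §10.2.2: the frame rate, the function name and the use of dB levels per channel are assumptions, and the down-slew-rate is modelled as a tracked peak level that falls by at most 40 dB per second.

```python
def envelope_profile_limiter(frames_db, threshold_db,
                             slew_db_per_s=40.0, frame_rate_hz=1000.0):
    """Apply one common gain per frame so no channel exceeds threshold_db.

    Hypothetical sketch. frames_db is a list of frames; each frame is a
    list of channel envelope levels in dB. frame_rate_hz is an assumed
    analysis rate, not a value from the thesis.
    """
    step_db = slew_db_per_s / frame_rate_hz   # maximum fall per frame
    tracked_db = float("-inf")                # slowly released peak level
    limited = []
    for frame in frames_db:
        peak_db = max(frame)
        # Instant attack: follow a rising peak at once; slow release:
        # let the tracked level fall by at most step_db per frame.
        tracked_db = max(peak_db, tracked_db - step_db)
        # Infinite compression above the threshold, never any amplification.
        gain_db = min(0.0, threshold_db - tracked_db)
        # The same gain on every channel preserves the spectral profile.
        limited.append([level + gain_db for level in frame])
    return limited


if __name__ == "__main__":
    # Illustrative two-channel input: a loud frame followed by a quieter one.
    out = envelope_profile_limiter([[60.0, 70.0], [50.0, 55.0]], threshold_db=65.0)
    print(out)
```

Because the tracked level is never below the current peak, the largest envelope in each output frame cannot exceed the threshold, and the level differences between channels are unchanged.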
Figure 11-1 The DSP 1 signal path of the Nucleus CP810 sound processor with a switch
between UGM and EPL
11.3 Fitting Procedures
Five cochlear implant recipients participated in this study. The subjects were provided with a
loaner CP810 speech processor loaded with research firmware that supported the EPL. The
study duration was at least two weeks.
The Nucleus CP810 sound processor can store a maximum of four programs, each with
parameter settings suited to specific listening situations. A recipient typically has
four programs in his/her sound processor: Everyday, Noise, Focus and Music. The recipient’s
Everyday and Noise programs were selected for this take-home study. The loaner
processor was loaded with the recipient’s standard Everyday and Noise programs (using the
UGM) in program slots 1 and 2, and the modified Everyday and Noise programs (using
the EPL) in program slots 3 and 4. The recipient’s own programs contained the UGM
with a combination of different AGC systems such as ASC, Whisper, the compression
limiter and ADRO.
The subject could easily select programs by using the CR120 remote assistant. The
combination of AGC systems that was used in each program is shown in Table 11-1.
Subject  Standard Program                Modified Program
         Everyday     Noise              Everyday     Noise
S1       ASC, ADRO    Zoom, ASC, ADRO    EPL, ADRO    Zoom, ADRO, EPL
S7       ADRO         Zoom, ASC, ADRO    EPL, ADRO    Zoom, ADRO, EPL
Table 11-1 Combination of AGCs in each program. Zoom is a fixed beamformer with a
super-cardioid polar response.
The subjects were asked to compare two programs: the UGM Everyday program and the
EPL Everyday program. The subjects were provided with the Cochlear Implant Clinical
Questionnaire (CICQ), developed by the HEARing Cooperative Cochlear Research Centre.
The questions in CICQ are listed in Appendix 2. There are a total of 18 items in the
questionnaire. Each item asks the subject to rate the helpfulness of each program in day-to-
day listening situations. Each program is rated on a five-point response scale, ranging from 1
(not helpful) to 5 (extremely helpful). A subject can select ‘Not Applicable’ if they did not
experience that listening condition. In addition to the helpfulness rating, subjects were also
asked to provide the overall preferred program in quiet and noisy conditions. After selecting
the preferred program, the subjects were also asked to rate the sound quality in quiet and
noisy conditions. The rating was on a four-point scale: the preferred program was (i) very
similar to, (ii) slightly better than, (iii) moderately better than, or (iv) much better than the
other program. The subjects were encouraged to wear the loaner processor as much as they
could to cover many listening scenarios.
11.4 Results and Discussions
Five subjects, S1, S3, S4, S5 and S7, participated in the take-home study. S1, S3 and S7 use
contralateral hearing aids. S5 is a bilateral subject but only one loaner processor was
provided. Subject S7 answered the questionnaire differently from the other subjects: he
compared the benefit of the UGM program to listening with his contralateral hearing alone,
and then compared the benefit of the EPL program to the UGM program. Hence the
helpfulness ratings given by S7 were not comparable with those of the other subjects and are
not included in Figure 11-2 and Figure 11-3. The average ratings of the two programs were
equal for 11 out of 18 questions. Subjects rated the EPL program as less helpful than the
UGM program for six out of 18 conditions. The conditions in which the EPL program
performed less well were concerned with background noise. For example, questions 3, 6, 13,
15 and 18 are about conversation in noisy background and question 11 is about soft sounds
in the environment. The absolute helpfulness was less than okay in those six conditions.
[Figure 11-2 graphic: Helpfulness (None, Little, Okay, Very, Extreme) against Question #
(1 to 18), with bars for UGM and EPL.]
Figure 11-2 Helpfulness indication of the UGM and EPL programs for each question in
CICQ
The questions in the CICQ are about a cochlear implant recipient’s understanding of speech
and everyday sounds with each program. These questions can be sorted into the following
categories:
• One-to-one conversation in quiet
• One-to-one conversation in noise
• Group conversation in quiet
• Group conversation in noise
• Listening to TV/radio
• Telephone conversation
• Music
• Other sounds in the environment
Figure 11-3 shows the helpfulness of each program in the categorized listening conditions
listed above. According to the absolute helpfulness ratings, the subjects found the EPL
program to be of little help for conversations in noisy backgrounds, and found the UGM
program more helpful than the EPL program in noisy environments. Neither program was
very helpful for group conversation in noise.
[Figure 11-3 graphic: horizontal bars of Helpfulness (None, Little, Okay, Very, Extreme) for
UGM and EPL in each listening category: Telephone conversation, Other sounds in the
environment, Music, Listening TV/Radio, Group conversation in noise, One-to-one
conversation in noise, Group conversation in quiet, One-to-one conversation in quiet.]
Figure 11-3 Helpfulness indication of the UGM and EPL programs in the categorised
listening conditions
In subsequent analysis, the benefit score for each question was defined as the rating for the
EPL program minus the rating for the UGM program, giving a score in the range -4 to +4. A
positive sign indicates that subjects found EPL more helpful than UGM. Similarly, zero
means no difference and a negative sign indicates that subjects found EPL less helpful than
UGM. The results from subject S7 were included in this analysis because S7 answered the
questions in this manner, i.e. he gave a helpfulness indication of the EPL program with
respect to the UGM program.
From the questionnaire, seven questions concerning conversation in quiet, and six questions
concerning conversation in noisy conditions were selected for analysis. Table 11-2 shows the
mean benefit score and the overall preferred program for the questions concerning quiet and
noisy backgrounds. According to a t-test, the mean benefit score was not significantly
different from zero (i.e. no net benefit). The questionnaire showed that the subjects had no
strong preference of one program over the other. Despite S7 showing a net benefit for the
EPL program in quiet, he still preferred the UGM program.
Some subjects reported anecdotally that background noise was more objectionable with the
EPL program. It should be noted that in most cases, the UGM program incorporated a slow
front-end AGC (ASC); the only exception was the Everyday program of S7. In contrast, the
EPL program did not include ASC. ADRO is a slow multichannel AGC with a background
noise rule to reduce high-level background noise. However, the standard parameter setting of
ADRO is aimed at improving audibility rather than at noise reduction. This is why the EPL
program with ADRO was not helpful in noisy situations. The subjective reports indicated that a slow
AGC system that reduced the overall level in noisy situations was important for listening
comfort.
Some recipients noticed sound drop-outs with the EPL program after impulsive sounds, for
example door slams. This is a known issue for AGC with a release time over 300 ms
(Stöbich, Zierhofer and Hochmair 1999; Moore 2008).
Subject  Quiet                                   Noisy
         Preferred program  Mean benefit score   Preferred program  Mean benefit score
S1       UGM                -1                   None               0.25
S3       None               0                    EPL                0
S4       -                  -                    -                  -1
S5       EPL                0.3                  UGM                -1
S7       UGM                1.6                  UGM                -0.8
Mean                        0.2                                     -0.5
Table 11-2 Mean benefit scores and preferred program in quiet and noisy background.
S4 did not answer some questions, as indicated by dashes.
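As a rough check on the analysis above, the Python sketch below runs a one-sample t-test against zero on the per-subject mean benefit scores listed in Table 11-2. The text does not state exactly how the t-test was set up, so the choice of a one-sample test on per-subject means is an assumption, as are the function and variable names. The critical values are the standard two-sided 5% points of the t distribution for 3 and 4 degrees of freedom.

```python
import math
import statistics

# Per-subject mean benefit scores (EPL rating minus UGM rating), Table 11-2.
quiet_scores = [-1, 0, 0.3, 1.6]          # S1, S3, S5, S7 (S4 did not answer)
noisy_scores = [0.25, 0, -1, -1, -0.8]    # S1, S3, S4, S5, S7

# Two-sided 5% critical values of Student's t, indexed by degrees of freedom.
T_CRITICAL = {3: 3.182, 4: 2.776}


def one_sample_t(values):
    """Return (t statistic, degrees of freedom) for H0: population mean = 0."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return statistics.mean(values) / standard_error, n - 1


if __name__ == "__main__":
    for label, scores in (("quiet", quiet_scores), ("noisy", noisy_scores)):
        t_stat, dof = one_sample_t(scores)
        verdict = "significant" if abs(t_stat) > T_CRITICAL[dof] else "not significant"
        print(f"{label}: mean {statistics.mean(scores):.2f}, t = {t_stat:.2f}, {verdict}")
```

Under these assumptions, neither mean differs significantly from zero, in line with the conclusion drawn in the text.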
11.5 Conclusions
The subjects found that the EPL program was similar to their standard program in easy
listening conditions. However, they perceived that the background noise was louder with the
EPL program when the conversation was held in noisy background. The findings from the
take-home study implied that fast AGC alone was not sufficient for day-to-day listening
situations. A slow AGC is necessary to adjust the presentation level of the input signal from
one listening condition to another. It is particularly useful to have an algorithm like ASC
that reduces the level of background noise. It is hypothesized that the EPL program would be
improved if the EPL were used after a slow AGC. Such an AGC arrangement is similar to the
dual-loop AGC of Boyle et al. (2009).
The previous chapter showed that it was feasible to replace the front-end compression limiter
with the envelope profile limiter. Placing all AGCs at a single location, preferably just
before the LGF, would simplify the signal path. In this hypothetical signal path, the ASC
functionality and ADRO would be combined, and the EPL would be the last gain block
before the LGF, to eliminate envelope clipping. The next chapter will investigate further
optimization of the signal path.
12 Proposed Adaptive Loudness Growth Function
12.1 Introduction
In the previous chapters, some existing AGC systems and the new envelope profile limiter
system were evaluated with cochlear implant subjects. All those systems adjust the level of
the input signal by varying gain. This chapter proposes a new technique, called Adaptive
Loudness Growth Function (ALGF), which adjusts the input dynamic range.
Three configurations of the ALGF were evaluated with cochlear implant subjects in the
laboratory by sentence tests, both with fixed levels and with the adaptive roving-level SRT
test. The performance of the ALGF was compared with the performance of the existing AGC
system, consisting of the front-end tri-loop AGC and ADRO.
12.2 Background
The goal of an AGC system in a hearing device is to improve speech intelligibility by
increasing the audibility of soft components, while keeping loud components at a
comfortable level. This goal is not easily achieved if the input signal is a mixture of a target
and non-target signal, for example speech in the presence of competing noise. When an AGC
compresses noisy speech, the audibility of the desired speech signal suffers. Conversely, an
AGC cannot improve the audibility of low-level speech without amplifying the noise
components.
A typical AGC (§5.2) has a simple level detector, and a gain unit that compresses the signal
level above the compression threshold. ADRO (§5.5.5) uses percentile estimators and a more
sophisticated set of gain rules. As shown in chapters 9 and 10, the performance of these
AGC systems was mixed in noisy conditions. The output SNR can be degraded by an AGC
either by amplifying noise between speech components or by compressing high level speech
components more than interfering noise components. ADRO and ASC employ a noise floor
estimator to control the level of background noise. Yet they cannot satisfy both audibility
and noise reduction goals at the same time.
One approach to reducing the noise level without compromising the audibility of target speech
is to employ a noise reduction algorithm before the gain control algorithms. In that
traditional approach, the dynamic range is fixed and the noise is pushed down out of the
dynamic range by the noise reduction system. The enhanced speech is compressed or
amplified by the subsequent gain algorithm to be within the designated dynamic range. The
AGC system and the noise reduction method operate independently from each other. In this
chapter a new technique is proposed. Unlike conventional AGC systems, the proposed
technique expands or contracts the dynamic range of the LGF. It is hypothesized that
adjusting the dynamic range with the input signal can achieve both audibility and noise
reduction simultaneously. Since the new technique is integral to the LGF and the dynamic
range varies adaptively with the input signal level, it is called the Adaptive Loudness Growth
Function (ALGF).
The inspiration for the ALGF comes from the invention of Neal (2011). Neal
proposed a technique to optimize the input dynamic range by setting the lower end of the
dynamic range, i.e. the base level of the LGF, to the estimated noise floor. This way the
noise reduction is realized without affecting the audibility of a target signal.
All commercially available cochlear implant systems use a fixed input dynamic range.
Holden et al. (2011) recommended, as clinical guidelines, using raised T-levels (> 10% of
M-levels) and providing the recipients with two programs: one with a wide input dynamic
range for soft speech understanding in quiet, and another with a narrow input dynamic range
for noisy conditions. Nucleus cochlear implant systems typically use an input dynamic range
of between 30 and 50 dB. The input dynamic range has an impact on the speech
intelligibility of cochlear implant recipients in different listening conditions (Spahr, Dorman
and Loiselle 2007). For everyday use, there are advantages and limitations associated with
the input dynamic range (Wolfe et al. 2009). A wide dynamic range may be more likely to
facilitate a range of loudness experiences within the small electrical dynamic range of the
cochlear implant user. For example, recipients benefit more from a wide dynamic
range for speech in quiet because low level speech components become more accessible
(James et al. 2003; Dawson et al. 2007; Spahr, Dorman and Loiselle 2007). The effects of a
wide input dynamic range on the intelligibility of noisy speech are mixed. One study shows
no performance difference between different input dynamic range settings (Dawson et al.
2007) but another study shows performance degradation for speech presented in noise
(Spahr, Dorman and Loiselle 2007). A narrow dynamic range may be more useful in noisy
environments because it would partially reduce the noise mapped into the electrical dynamic
range (Wolfe et al. 2009). These studies suggested to the present author that the dynamic
range should be adaptive to the condition of the input signal to maintain the intelligibility.
The aim of the ALGF is to satisfy the requirement of different input dynamic ranges for
different listening scenarios.
[Figure 12-1 plot: output magnitude (0 to 1) against filter band amplitude (-50 to 10 dB), with the base level and saturation level marked; arrows indicate 1 degree of freedom and 2 degrees of freedom.]
Figure 12-1 Degrees of freedom for the input signal to move within the dynamic range of
the LGF
The ALGF adjusts both the lower and upper end of the input dynamic range according to the
varying level of the input signal. Figure 12-1 depicts the adjustment to the level of the signal
in the input dynamic range by a conventional AGC system and the proposed ALGF. A
conventional AGC provides only one degree of freedom. The signal (shown by a red cross in
the diagram) is either shifted left (towards the base level), or right (towards the saturation
level) by varying the gain. The input dynamic range is fixed. The proposed technique can
adjust the relative position of the signal within the dynamic range, with two degrees of
freedom, by adjusting the base and saturation level independently.
12.3 Implementation of the ALGF
The implementation of the ALGF in Simulink is shown in Figure 12-2. The ALGF consisted
of two main components: the saturation level regulator (SLR) and the base level regulator
(BLR). The saturation level regulator consisted of a fast saturation level regulator (FSLR)
and a slow saturation level regulator (SSLR).
Figure 12-2 Top level Simulink block diagram of ALGF
The logarithmic compression of the Nucleus LGF was described in §4.3.7. The scaling
function of the traditional LGF was calculated by using a fixed saturation and fixed base
level (equation 4.8).
The scaling function of the ALGF used the adaptive base and saturation level and produced
the output between 0 and 1. The adaptive saturation and base level were updated
continuously. The noise floor and hence the base level was calculated independently in each
frequency channel because noise in real life was mostly coloured and distributed differently
in different frequency channels. The ALGF slowly adapted the base level to the estimated
noise floor to reduce the level of noise without introducing processing artifacts such as
musical tones. Besides, slow noise removal allows access to environmental sounds although
some of them are noise-like. The scaling function of the ALGF is described as:
\[ v_k = \frac{e_k - b_k}{s - b_k}, \qquad 0 \le v_k \le 1 \tag{12.1} \]
where $e_k$ and $b_k$ are the envelope and the adaptive base level of the frequency channel k
respectively. $s$ is the adaptive saturation level for all frequency channels.
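A minimal sketch of such a scaling function is given below. It assumes linear envelope magnitudes and explicit clamping of the output to the range 0 to 1. The function name `algf_scale` and the small guard against a zero-width dynamic range are illustrative assumptions, not part of the Simulink implementation.

```python
def algf_scale(envelopes, base_levels, sat_level):
    """Scale each channel envelope into [0, 1].

    envelopes, base_levels: one value per frequency channel (linear magnitude).
    sat_level: single saturation level shared by all channels.
    """
    scaled = []
    for env, base in zip(envelopes, base_levels):
        # Guard (assumption): avoid dividing by a zero-width dynamic range.
        span = max(sat_level - base, 1e-12)
        value = (env - base) / span
        # Envelopes below the base level map to 0, above saturation to 1.
        scaled.append(min(max(value, 0.0), 1.0))
    return scaled


# Three channels: at the base level, mid-range, and above saturation.
print(algf_scale([0.1, 0.5, 2.0], [0.1, 0.0, 0.2], 1.0))
```

Raising a channel's base level pushes more of the low-level (noise) content in that channel to zero, while raising the shared saturation level lowers the scaled output in every channel.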
In Figure 12-2, the blocks showing the fast and slow saturation level regulators are coloured
blue and the block showing the base level regulator is coloured yellow. The input to both fast
and slow saturation level regulators was the maximum amplitude of the envelopes across
frequency channels. The fast saturation level regulator worked like a compression limiter and
the slow saturation level regulator worked like a slow AGC. The three orange switches in
Figure 12-2 allowed different configurations of the ALGF. For example, the ALGF could be
configured to employ only an adaptive saturation level or only an adaptive base level. When
both adaptive base and saturation regulators were bypassed, the ALGF became the ordinary
loudness growth function. One of the configurations of the ALGF evaluated with cochlear
implant subjects was the ALGF with a fixed dynamic range setting (§12.4.2.4).
Decreasing the saturation level is equivalent to increasing the gain in a conventional AGC.
For listening comfort, a minimum value was imposed on the dynamic range of the ALGF to
prevent the slow saturation level from approaching the base level when there was no signal
or only low-level signal present at the input. This minimum dynamic range was realized by
restricting the slow saturation level to stay a specified distance above the maximum of base
levels across frequency channels. The final saturation level was the maximum of the fast
saturation level and the slow saturation level. The computational complexity of the ALGF is
similar to that of the existing UGM, ADRO and LGF, which it would replace. The detailed
implementations of the fast and slow saturation level regulators and the base level regulator
are explained in the subsections below.
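The way the minimum dynamic range and the two saturation levels combine can be sketched as follows. The sketch assumes that all levels are expressed in dB; the function name and the example values are hypothetical.

```python
def final_saturation_level(fast_sat_db, slow_sat_db, base_levels_db, min_range_db):
    """Combine the fast and slow saturation levels (all levels in dB, an assumption)."""
    # The slow level must stay at least min_range_db above the highest base level.
    floor_db = max(base_levels_db) + min_range_db
    slow_limited_db = max(slow_sat_db, floor_db)
    # The final saturation level is the larger of the fast and slow levels.
    return max(fast_sat_db, slow_limited_db)


# With no signal, the slow level would sink to -45 dB, but the highest base
# level (-50 dB) plus a 20 dB minimum range holds it at -30 dB.
print(final_saturation_level(-40.0, -45.0, [-60.0, -50.0], 20.0))
```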
12.3.1 Fast Saturation Level Regulator
The implementation of the fast saturation level regulator (FSLR) is shown in Figure 12-3.
The operation of the FSLR was similar to the envelope profile limiter (EPL) described in
§10.2.2 except that the direction of the saturation level was opposite to the direction of gain
in the EPL. The FSLR tracked the maximum level of the channel envelopes over time and
increased the fast saturation level instantaneously if the maximum amplitude exceeded the
fast saturation level from the previous frame. Hence it prevented the channel envelopes from
exceeding the saturation level. Unlike the EPL, the FSLR tracked the maximum level of the
channel envelopes regardless of its relative position from the reference level, for example C-
SPL in the traditional signal path. The potential disadvantage of adjusting the signal level
this way would be the loss of intensity variation in the output signal.
The EPL applied the same amount of gain to all frequency channels to preserve the short-
term spectral profile of the input signal. The saturation level regulator of the ALGF also used
one saturation level for all frequency channels. The other important parameters of the FSLR
were the time constants. The down-slew-rate of the fast saturation level determined how
quickly the system responded to a sudden increase in the input level. The experimental
results of the EPL (§10.4.1) with the cochlear implant subjects showed that a release time
long enough to maintain the modulation rate of the phonetic entities was necessary to
maintain the speech intelligibility of sentences presented in noise. A down-slew-rate of -40
dB/s (i.e. equivalent to a release time of 625 ms) was considered a good choice for a
single-loop AGC. However, the ALGF employed a dual-loop saturation level regulator; therefore a
down-slew-rate faster than -40 dB/s was also acceptable without affecting the speech
intelligibility in noise. The reason for anticipating a faster down-slew-rate (i.e. a shorter
release time) was to reduce the ‘pumping’ effect that would be noticeable for release times
in the range between 100 ms and 500 ms. Two values of down-slew-rate were tested: -80 and
-300 dB/s.
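The correspondence between slew rate and release time can be illustrated with a short calculation. The stated equivalence of -40 dB/s and 625 ms implies a level change of 25 dB (40 dB/s x 0.625 s); the sketch below assumes that the same 25 dB change applies to the other slew rates, which is an inference rather than something stated in the text.

```python
# Assumption: release time is the time taken to traverse a 25 dB level change,
# inferred from the stated equivalence of -40 dB/s and 625 ms.
LEVEL_CHANGE_DB = 25.0


def release_time_ms(down_slew_db_per_s):
    """Equivalent release time in milliseconds for a given down-slew-rate."""
    return 1000.0 * LEVEL_CHANGE_DB / abs(down_slew_db_per_s)


for rate in (-40.0, -80.0, -300.0):
    print(f"{rate:7.1f} dB/s -> {release_time_ms(rate):6.1f} ms")
```

Under this assumption, the two tested values correspond to release times of roughly 310 ms and 80 ms.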
Figure 12-3 Simulink block diagram of fast saturation level regulator
As shown in the block diagram, the FSLR also had a hold timer block. The hold timer
kept the fast saturation level at the same level for a certain duration before it was
released from the compression mode to track the level of the input signal.
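The behaviour described in this subsection (instantaneous increase, hold, then release at the down-slew-rate) can be sketched per frame as below. This is an illustration only: the dB representation, the frame duration, the hold length in frames and all names are assumptions, and the Simulink model in Figure 12-3 may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class FslrState:
    level_db: float = -100.0  # current fast saturation level (dB, assumed)
    hold_left: int = 0        # frames remaining in the hold period


def fslr_step(state, peak_db, down_slew_db_per_s=-80.0, hold_frames=2, frame_s=0.01):
    """Advance the fast saturation level by one frame and return it."""
    if peak_db > state.level_db:
        # Attack: follow the peak envelope instantaneously and restart the hold.
        state.level_db = peak_db
        state.hold_left = hold_frames
    elif state.hold_left > 0:
        # Hold: keep the level unchanged.
        state.hold_left -= 1
    else:
        # Release: fall at the down-slew-rate, but never below the current peak.
        state.level_db = max(peak_db, state.level_db + down_slew_db_per_s * frame_s)
    return state.level_db


state = FslrState(level_db=-40.0)
for peak in (-20.0, -60.0, -60.0, -60.0):
    print(round(fslr_step(state, peak), 2))
```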
12.3.2 Slow Saturation Level Regulator
The Simulink block diagram of the slow saturation level regulator (SSLR) is shown in
Figure 12-4. The objective of the SSLR was to slowly adjust the saturation level so that the
input signal was kept within the upper part of the input dynamic range. Hence the SSLR
improved the audibility of the input signal with minimal distortion on the temporal envelope.
The input signal to the SSLR was the maximum level of the channel envelopes.
Figure 12-4 Simulink block diagram of slow saturation level regulator
The SSLR acted as a feature-based AGC. The main components of the SSLR were a feature
extraction unit that calculated the proportion of clipping, and a level adjustment unit that
adjusted the slow saturation level based on the decision made by the comparison between the
extracted features and the preset thresholds. In this context, clipping is defined as the
maximum envelope exceeding the slow saturation level, i.e. the envelopes would clip if the
FSLR was not present. In other words, clipping proportion was the proportion of time that
the FSLR was active. The level adjustment unit operated on a set of rules as described
below:
if clip_prop > clip_threshold                           % Comfort: raise the level
    slow_sat_level = prev_slow_sat_level * up_slew_rate;
elseif prev_slow_sat_level - peak_env > hold_distance   % Audibility: lower the level
    slow_sat_level = prev_slow_sat_level * down_slew_rate;
else                                                    % Maintain the previous level
    slow_sat_level = prev_slow_sat_level;
end
Rules were arranged in the order of importance. The highest priority was given to the
comfort rule. The SSLR increased the slow saturation level if the proportion of clipping
exceeded the threshold. If the input signal satisfied the comfort criterion, then the SSLR
checked if the audibility criterion was met. The SSLR compared a slow-moving envelope of
the input signal and the slow saturation level from the previous frame. A switch was placed
to choose either the slow envelope or the RMS level of the input signal for checking the
audibility criterion. If the slow saturation level was above the slow envelope of the input
signal by more than a certain level of magnitude (i.e. the hold distance) then the audibility
criterion was not met. The SSLR decreased the slow saturation level to boost the audibility
of the input signal. If both the comfort and audibility criteria were met, the SSLR kept the
slow saturation level at the previous level.
It is more important to reduce the amount of clipping than to increase the audibility (as per
the precedence of the rules). Therefore the up-slew-rate of the SSLR was set considerably
faster than the down-slew-rate. Rather than using fixed time constants as in other AGC
systems, the up-slew-rate was adaptively changed with the proportion of clipping. This
advanced feature allowed the slow saturation level to be more responsive to the variation in
the input signal.
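The rule set, including the adaptive up-slew-rate, can be sketched in Python as follows. Unlike the listing above, which multiplies linear levels by slew factors, this sketch assumes levels in dB and additive per-frame steps; the function name, the parameter names and their default values are all hypothetical.

```python
def sslr_step(prev_sat_db, slow_env_db, clip_prop,
              clip_threshold=0.01, hold_distance_db=15.0,
              up_step_db=2.0, down_step_db=0.05):
    """Return the new slow saturation level (dB) for one frame."""
    if clip_prop > clip_threshold:
        # Comfort rule (highest priority): raise the level, with a step
        # that grows with the proportion of clipping.
        return prev_sat_db + up_step_db * clip_prop
    if prev_sat_db - slow_env_db > hold_distance_db:
        # Audibility rule: the signal sits too far below saturation,
        # so lower the level slowly.
        return prev_sat_db - down_step_db
    # Both criteria met: maintain the previous level.
    return prev_sat_db


print(sslr_step(-20.0, -25.0, clip_prop=0.5))   # comfort rule
print(sslr_step(-20.0, -40.0, clip_prop=0.0))   # audibility rule
print(sslr_step(-20.0, -30.0, clip_prop=0.0))   # maintain
```

Because the comfort rule is tested first, heavy clipping raises the saturation level even when the slow envelope is well below it.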
12.3.2.1 Clipping proportion
Figure 12-5 shows the Simulink block diagram of the clipping proportion calculator. The
input signal to the calculation unit was the maximum amplitude of the channel envelopes.
The SSLR counted the number of envelope samples above the slow saturation level within a
frame and divided it by the total number of samples in the frame.
Figure 12-5 Simulink block diagram of clipping proportion calculation
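The calculation amounts to a count and a division, as in the following sketch. The function name and the example frame are illustrative, and the frame is assumed to be non-empty.

```python
def clipping_proportion(max_env_frame, slow_sat_level):
    """Fraction of samples in a frame whose maximum envelope exceeds the slow saturation level."""
    clipped = sum(1 for sample in max_env_frame if sample > slow_sat_level)
    return clipped / len(max_env_frame)


# Two of the four samples exceed the level of 0.5.
print(clipping_proportion([0.1, 0.6, 0.7, 0.2], 0.5))
```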
12.3.2.2 Adaptive up-slew-rate
A high proportion of clipping could result in loudness discomfort and speech intelligibility
reduction, especially in noise. To make the up-slew-rate adaptive to the input signal, the
proportion of clipping itself was used as a multiplier: the up-slew-rate of the SSLR was
multiplied by the proportion of clipping so that the rate became faster for a larger
proportion of clipping.
12.3.2.3 Hold distance
The purpose of using the hold distance in the level determination rules of the SSLR was to
stabilize the slow saturation level. It also helped to maintain a slow temporal modulation of
the envelopes in each frequency channel. For example, the temporal envelope of the output
signal would have less modulation if the input signal and the saturation level modulated
together. For a listening condition with roving-level input signal, for instance conversation
between the recipient and another person, it could be perceptually annoying for cochlear
implant recipients if they noticed the frequent level adjustment by the algorithm for the
overall level changes of the input signal. The overall levels of the two voices are different at
the microphone of the sound processor depending on the distance between them. According
to the inverse square law, the sound pressure level decreases about 6 dB for every doubling
of distance in the free field. The distance between the behind-the-ear sound processor and the
recipient’s mouth is approximately 0.2 m. In a hypothetical listening situation where another
person was one metre away from the recipient, the sound pressure level difference between
the two voices, measured at the microphone, could be approximately 15 dB.
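As a check on this figure, the level difference can be worked out from the two distances. The calculation assumes free-field propagation and two talkers producing the same level at the same reference distance; the symbols are introduced here for illustration.

```latex
\Delta L = 20\log_{10}\!\left(\frac{d_{\mathrm{other}}}{d_{\mathrm{own}}}\right)
         = 20\log_{10}\!\left(\frac{1.0\ \mathrm{m}}{0.2\ \mathrm{m}}\right)
         = 20\log_{10} 5 \approx 14\ \mathrm{dB}
```

This is consistent with the approximate 15 dB quoted above, and with the rule of about 6 dB per doubling of distance (0.2 m to 1 m is a little more than two doublings).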
The hold distance could be set between 10 and 15 dB in the experiments. The diagrams in
Figure 12-6 show a slow saturation level, with and without the hold distance, varied with the
roving input signal between two input stimuli simulating a typical acoustic scenario between
the recipient and another person (Mr. X). The envelope waveform between 8 and 16.2
seconds belongs to the recipient and that between 16.2 and 24 seconds belongs to Mr. X in
this simulation. The diagrams show that using the hold distance avoided modulation in the
slow saturation level.
Figure 12-6 Simulated listening condition showing the slow saturation level with the
hold distance of 0 dB (top panel) and 15 dB (bottom panel)
12.3.3 Base Level Regulator
The Simulink block diagram of the base level regulator (BLR) is shown in Figure 12-7.
Rather than directly using the estimated noise floor as the base level, the BLR adjusted the
base level with respect to the estimated noise level. This allowed the BLR to control the
speed of noise tracking. The BLR slowly increased the base level if the noise floor was
above the base level. Conversely, the BLR reduced the base level quickly when the noise
floor was below the base level. The up-slew-rate of the BLR was considerably slower than
the down-slew-rate. Like any noise reduction algorithm, the BLR could introduce two types
of error: overestimation and underestimation of true noise power. The overestimation error
could potentially distort the input signal. In contrast, the output could still be noisy due to
underestimation error. The reason for making the down-slew-rate faster than the up-slew-rate
was to reduce the overestimation error more than the underestimation error.
Figure 12-7 Simulink block diagram implementation of base level regulator
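The asymmetric tracking of the BLR can be sketched per channel as follows. The sketch assumes levels in dB and fixed per-frame steps; the step sizes and names are hypothetical and serve only to show a slow rise and a fast fall.

```python
def blr_step(base_db, noise_floor_db, up_step_db=0.01, down_step_db=0.5):
    """Move the base level of one channel towards the estimated noise floor."""
    if noise_floor_db > base_db:
        # Slow rise: limits the damage done by an overestimated noise floor.
        return min(noise_floor_db, base_db + up_step_db)
    # Fast fall: quickly restores audibility when the noise floor drops.
    return max(noise_floor_db, base_db - down_step_db)


print(blr_step(-60.0, -40.0))  # rises by one small step
print(blr_step(-40.0, -60.0))  # falls by one large step
```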
Any sub-band noise floor estimation algorithm can be applied in the BLR.
The first noise estimator applied in the BLR was Lin’s time recursive averaging noise
estimation method (§5.6.2). Later, a new noise estimator was proposed by incorporating the
minima-controlled feature into Lin’s noise estimator.
12.3.3.1 Lin’s time-recursive averaging sub-band noise floor estimator
The Simulink block diagram of Lin’s time-recursive averaging sub-band noise estimation
algorithm is shown in Figure 12-8. The smoothing parameter calculation procedure is shown
in Figure 12-9.
Figure 12-8 Simulink block diagram of Lin's recursive averaging noise floor estimator
Figure 12-9 Simulink block diagram of smoothing parameter calculation
The algorithm was adapted for incorporation into the BLR. As shown in Figure 12-8, the power
of the input stimuli was smoothed by the first-order IIR filter (equation 5.6). The filter
coefficient of the smoothing filter was updated for every input signal (equation 5.7 &
Figure 12-9). The filter coefficient was limited to a maximum value of 0.9999 and a
minimum value of 0.8. The reason for using a maximum value less than one was to avoid
the deadlock of updating the estimated noise floor only with the previous level. The
minimum value of 0.8, on the other hand, was to slow down the algorithm to avoid
overestimating the noise floor. The filter coefficient of the final smoothing filter of the
noise power estimate was set to 0.9667 (equation 5.8). The magnitude of the estimated noise
was obtained by taking the square root of the estimated noise power. The filter coefficients
and the limit values were chosen empirically. Compared to the original method, the adapted
noise estimation used in the BLR updated the noise floor with the noisy signal power less
aggressively, to reduce the overestimation error.
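A rough sketch of one update step is given below. The limits 0.8 and 0.9999 and the final coefficient 0.9667 are taken from the text. Everything else is an assumption: equations 5.6 to 5.8 are not reproduced in this chapter, so the sigmoid mapping from SNR to smoothing coefficient (with hypothetical parameters `beta` and `snr_offset`) and the way the two filters are chained merely stand in for them.

```python
import math

ALPHA_MIN, ALPHA_MAX = 0.8, 0.9999  # limits quoted in the text
FINAL_COEFF = 0.9667                # final smoothing coefficient quoted in the text


def smoothing_coeff(signal_power, noise_power, beta=15.0, snr_offset=1.5):
    """Assumed sigmoid mapping: a high SNR gives a coefficient near 1 (slow update)."""
    snr = signal_power / max(noise_power, 1e-12)
    exponent = min(-beta * (snr - snr_offset), 700.0)  # guard against overflow
    alpha = 1.0 / (1.0 + math.exp(exponent))
    return min(max(alpha, ALPHA_MIN), ALPHA_MAX)


def noise_step(signal_power, raw_noise, smooth_noise):
    """One frame: recursive averaging, final smoothing, then magnitude."""
    alpha = smoothing_coeff(signal_power, smooth_noise)
    raw_noise = alpha * raw_noise + (1.0 - alpha) * signal_power
    smooth_noise = FINAL_COEFF * smooth_noise + (1.0 - FINAL_COEFF) * raw_noise
    return raw_noise, smooth_noise, math.sqrt(smooth_noise)


print(smoothing_coeff(100.0, 1.0))  # speech present: clamped to 0.9999
print(smoothing_coeff(0.0, 1.0))    # signal below noise: clamped to 0.8
```

With the upper limit below one, the estimate always retains a small contribution from the current signal power, which is the deadlock protection described above.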
12.3.3.2 New minima-controlled recursive averaging noise floor
estimator
Lin’s time-recursive averaging method could follow changes in the noisy signal power,
yet the algorithm often overestimated the noise power during speech activity. Therefore a
control feature was added to Lin’s noise floor estimator to reduce the
overestimation error. Since it monitored the minimum level of the input stimuli to control
Lin’s recursive averaging method, it was called a minima-controlled recursive averaging
(MCRA) noise estimation algorithm. The Simulink diagram of the MCRA noise estimation
algorithm is shown in Figure 12-10. The minima-controlled feature was coloured pink in the
block diagram.
Figure 12-10 Simulink block diagram of the proposed MCRA noise floor estimator
Figure 12-11 Simulink block diagram of the minima-controlled feature in the proposed
MCRA noise estimation algorithm
Figure 12-11 shows the implementation of the minima-controlled feature. The updated noise
estimate at the output of the low-pass IIR filter was compared with the minimum of the
smoothed signal power. The minimum signal power was obtained at each frame by
comparing the smoothed signal power at the current frame with the one from the previous
frame. At the end of the search period, the minimum signal power was updated with the new
signal power and the search continued. The new signal power update was restricted to 10 dB
maximum to avoid transients. The noise estimate was then updated by taking the weighted
average of the estimated noise power and the minimum signal power. The coefficient of the
weighted average filter was empirically set to 0.8. The final noise power estimate was
obtained by taking the minimum of the noise power estimate before and after the minimum
search procedure.
Because of the minima-controlled procedure, the noise power estimate could be biased towards
lower values. Therefore bias compensation was applied as described for Martin’s
minimum statistics noise estimator (§5.6.1).
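One possible reading of the minima-controlled feature is sketched below. Several details are assumptions: the search period is counted in frames, the 10 dB restriction is interpreted as a limit on how far the minimum may rise when the search restarts, and the weight of 0.8 is applied to the noise estimate rather than to the minimum. Bias compensation is omitted. All names are hypothetical.

```python
class MinimumTracker:
    """Running minimum of the smoothed signal power over a search period."""

    def __init__(self, period_frames, max_rise_db=10.0):
        self.period = period_frames
        self.max_rise = 10.0 ** (max_rise_db / 10.0)  # power ratio for 10 dB
        self.minimum = None
        self.count = 0

    def update(self, power):
        if self.minimum is None:
            self.minimum = power
        self.count += 1
        if self.count >= self.period:
            # End of search period: restart from the current power, but
            # limit the upward jump (assumed meaning of the 10 dB limit).
            self.minimum = min(power, self.minimum * self.max_rise)
            self.count = 0
        else:
            self.minimum = min(self.minimum, power)
        return self.minimum


def minima_controlled(noise_est, min_power, weight=0.8):
    """Blend the estimate with the tracked minimum and keep the smaller value."""
    blended = weight * noise_est + (1.0 - weight) * min_power
    return min(noise_est, blended)


tracker = MinimumTracker(period_frames=3)
for p in (1.0, 0.5, 2.0, 100.0, 3.0, 100.0):
    print(tracker.update(p))
```

Taking the smaller of the two values means the control feature can only pull the estimate down, which is why a bias towards low values is expected.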
12.4 Offline Data Analysis
12.4.1 Comparison of the Noise Estimators
This section evaluates the performance of the noise floor estimators used in the base level
regulator: (i) Lin’s recursive averaging noise estimator and (ii) the proposed
minima-controlled recursive averaging noise estimator. For comparison, the noise estimator
using Martin’s minimum statistics method was also evaluated.
The algorithms were run offline in Simulink. Fifteen sentences were concatenated with a
0.5 second silent period before and after each sentence. Noise was presented with no silent
gaps. Two test conditions, fixed and roving level sentences, were tested to observe the
effectiveness of the noise estimators. The SNR was fixed at 8 dB for each test condition. The
histogram of the normalized error between the magnitudes of the actual noise and the
estimated noise was generated. The normalized error was calculated for each frequency
channel as:
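\[ \varepsilon_k = \frac{N_k - \hat{N}_k}{N_k} \tag{12.2} \]
where $N_k$ and $\hat{N}_k$ are the magnitudes of the true noise and the estimated noise in
frequency channel k respectively.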
The magnitude of the true noise floor was smoothed first. The normalized error could range
from - ∞ to 1. Zero normalized error indicates an ideal noise estimation, i.e. the magnitude of
the estimated noise is equal to the true noise. Positive normalized error indicates the
underestimation of the true noise. Conversely, negative normalized error indicates the
overestimation of the true noise. The upper limit of the normalized error, i.e. one, indicates
that the magnitude of the estimated noise is negligible compared with that of the true noise.
A normalized error of -1 indicates that the magnitude of the estimated noise floor is twice
that of the true noise.
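The properties listed above are consistent with an error defined as (true - estimated) / true. A minimal sketch under that assumption follows; the function name is hypothetical and the true noise magnitude is assumed to be non-zero.

```python
def normalized_error(true_mag, est_mag):
    """Per-sample normalized error between true and estimated noise magnitudes."""
    return [(t - e) / t for t, e in zip(true_mag, est_mag)]


# Ideal estimate, estimate twice the true noise, zero estimate, half estimate.
print(normalized_error([1.0, 1.0, 1.0, 2.0], [1.0, 2.0, 0.0, 1.0]))
```

The four cases give 0 (ideal), -1 (estimate twice the true noise), 1 (estimate of zero) and 0.5 (underestimation by half).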
12.4.1.1 Fixed Presentation Level
Three different types of noise signals were used in this simulation: four-talker babble noise,
city noise and Long-Term Average Speech-Shaped (LTASS) noise. The noises were
concatenated as shown in Figure 12-12.
[Figure 12-12 plot: amplitude (-0.4 to 0.4) against time (8 to 16 s) for speech and noise; successive segments are speech in four-talker babble, speech in city noise and speech in LTASS noise.]
Figure 12-12 Fixed-level sentences presented with three types of noise: four-talker
babble, city noise and LTASS noise
Figure 12-13 shows the magnitude of noisy speech, actual noise and estimated noise by the
three noise estimators at the output of the frequency channels with the centre frequency (Fc)
of 367 Hz, 1101 Hz and 4282 Hz.
[Figure 12-13 plots: three panels titled "Martin's minimum statistics noise floor estimation", "Lin's recursive averaging noise floor estimation" and "Proposed minima-controlled recursive averaging noise floor estimation"; each shows noisy speech, actual noise and estimated noise in dB FS (-70 to -20) against time (8 to 16 s) for the channels Fc = 4282 Hz, 1101 Hz and 367 Hz.]
Figure 12-13 Estimation of three different noises presented at the fixed level by
Martin’s minimum statistics method (top panel), Lin’s recursive averaging method
(middle panel) and the proposed MCRA method (bottom panel)
[Figure 12-14 plot: "Normalized Noise Estimation Error"; PDF (0 to 0.12) against normalized error (-1 to 1) for Martin's min stats, Lin's RA and New MCRA in the channels Fc = 4282 Hz, 1101 Hz and 367 Hz.]
Figure 12-14 Probability density functions of the normalized error for estimating fixed
level noises
Figure 12-14 shows the probability density function of the normalized estimation error
produced by each noise estimator in the frequency channels centered at 367 Hz, 1101 Hz and
4282 Hz. The RMS level of the noise was fixed at the same level for all three types of noise.
The normalized error of Martin’s minimum statistics algorithm and
the new MCRA method was mainly concentrated between 0.4 and 1. Lin’s recursive
averaging method, on the other hand, produced a normalized error distribution centered at
about 0. The error distribution was spread uniformly across the range. Compared to the other
two noise estimators, Lin’s method showed considerably less underestimation error.
However, Lin’s method produced more negative normalized error than the other two noise
estimators.
The top and bottom panels of Figure 12-13 show that the noise estimated by the minimum
statistics method and the new MCRA method could follow the variation in the true noise
floor. However, its overall level was approximately 10 dB lower than the true noise floor.
The middle panel of Figure 12-13 shows that Lin’s recursive averaging method tracked the
true noise floor of all three types of noise reasonably well although overestimation
occasionally occurred.
12.4.1.2 Roving presentation level
Two different types of noise were used: four-talker babble and LTASS noise. Both the
sentences and the background noise were roved together in a sequence of 50, 65 and 80 dB
SPL as shown in Figure 12-15.
[Figure 12-15 plots: two panels showing amplitude (-1.5 to 1.5) against time (8 to 16 s) for speech and noise.]
Figure 12-15 Roving-level sentences presented in four-talker babble (top panel), and
LTASS noise (bottom panel)
Figure 12-16 shows the magnitude of the noisy speech, the true noise and the noise estimated
by the three potential noise estimators for the base level regulator. The speech and noise
analysis was done at the frequency channels with the centre frequency (Fc) of 367 Hz, 1101
Hz and 4282 Hz.
[Figure 12-16 plots: three panels, each showing noisy speech, actual noise and estimated noise in dB FS (-70 to -20) against time (8 to 16 s) for the channels Fc = 4282 Hz, 1101 Hz and 367 Hz.]
Figure 12-16 Estimation of roving-level four-talker babble noise by: Martin’s minimum
statistics method (top panel), Lin’s recursive averaging method (middle panel) and the
proposed MCRA method (bottom panel)
The results show that the minimum statistics algorithm and the proposed MCRA algorithm
were slow to track the noise floor when the presentation level of the noisy speech was
increased by 15 dB. In comparison, Lin’s recursive averaging method
tracked the noise floor more effectively.
[Figure 12-17 plot: "Normalized Noise Estimation Error - 4-talker babble"; PDF (0 to 0.12) against normalized error (-1 to 1) for the three noise estimators in the channels Fc = 4282 Hz, 1101 Hz and 367 Hz.]
Figure 12-17 Probability density function of the normalized error for estimating the
roving-level four-talker babble noise
Figure 12-17 shows the probability density functions of the normalized estimation error
produced by each noise estimator at three frequency channels centered at 367 Hz, 1101 Hz
and 4282 Hz. The normalized error of Martin’s minimum statistics
algorithm and the new MCRA method was mainly concentrated between 0.8 and 1 with
peaks at about 0.95. The highest probability density at 0.95 indicated the underestimation of
the true noise magnitude by both algorithms. The top and bottom panels of Figure 12-16
show that the noise estimated by Martin’s minimum statistics method and the new
MCRA method was below the true noise floor by 10 to 20 dB, especially when the
presentation level was roved up. Lin’s recursive averaging method, on the other hand,
produced a more evenly distributed normalized error. The middle panel of Figure 12-16 shows
that the noise estimated by Lin’s recursive averaging method could effectively track the true
noise floor of four-talker babble noise. However, the estimated noise floor was occasionally
above the true noise. Based on the comparison of the error distribution curves, Lin’s
recursive averaging method was the most effective of the three noise estimators for tracking
roving-level babble noise.
Figure 12-18 shows the magnitude of the noisy speech, the true noise and the estimated noise.
The stationary LTASS noise was used in this analysis.
[Figure 12-18 plots: three panels, each showing actual noise and estimated noise in dB FS (-70 to -20) against time (8 to 16 s) for the channels Fc = 1101 Hz and 367 Hz among those analysed.]
Figure 12-18 Estimation of roving-level LTASS noise from the noisy speech by:
Martin’s minimum statistics method (top panel), Lin’s recursive averaging method
(middle panel) and the proposed MCRA method (bottom panel)
The results show that Martin’s minimum statistics method and the proposed MCRA method
were slow to track the noise floor when the presentation level was stepped up by 15 dB.
Lin’s recursive averaging method, on the other hand, tracked the noise floor
more effectively than the other two methods.
[Plot: Normalized Noise Estimation Error - LTASS; three stacked panels; x-axis: Normalized Error (-1 to 1); y-axis: PDF (0 to 0.12); labels: Fc = 4282 Hz, Fc = 1101 Hz, New MCRA]
Figure 12-19 Probability density function of the normalized error for estimating
roving-level LTASS noise
Figure 12-19 shows the probability density functions of the normalized estimation error
produced by each noise estimator in three selected frequency channels. With Martin’s
minimum statistics algorithm, the normalized error was concentrated mainly between 0.8
and 1. The error PDF showed two peaks; the peak at almost 1 was suspected to reflect the
very low noise estimate at the beginning of an increased presentation level. With Lin’s
recursive averaging method, the error density was spread across the entire range. It showed
two peaks and was concentrated mostly between -0.4 and 0.4. A small peak at about 0.8
indicates that Lin’s algorithm underestimated the true noise at the beginning of the
sentences when the presentation level was stepped up. In the negative region, the error
density of Lin’s method was higher than that of the other two methods, which indicates
occasional overestimation of the true noise floor. The error PDF of the proposed MCRA lay
between those of the other two methods. Compared to Lin’s algorithm, its normalized error
was still concentrated more in the positive region, peaking at about 0.7. According to this
comparison of the normalized error densities, Lin’s recursive averaging method tracked the
roving-level stationary noise most effectively. The other two noise estimators showed
significant underestimation error.
12.4.2 Processing Conditions
The following signal processing conditions were analyzed. The same processing conditions
were evaluated with cochlear implant subjects in the laboratory:
• Tri + ADRO
• ALGF-1
• ALGF-2
• ALGF-2F
The manual sensitivity and LGF settings were common to all processing conditions and are
listed in Table 12-1.
Other program parameter      Value   Unit
Sensitivity                  12
C-SPL                        65      dB SPL
Dynamic range of the LGF     40      dB
LGF Q value                  20
Table 12-1 Setting of other program parameters
12.4.2.1 Tri + ADRO
Tri + ADRO represents the existing AGC system of the Nucleus signal path. It consists of
the front-end tri-loop AGC (§5.5.4) and ADRO (§5.5.5). The standard setting of ADRO was
used (James et al. 2002). The parameter settings of the tri-loop AGC are shown in the third
column of Table 5-2.
12.4.2.2 ALGF-1
ALGF-1 was the first version of the ALGF evaluated with cochlear implant recipients. It
consisted of the fast and slow saturation level regulators and the base level
regulator.
The parameter settings of the ALGF-1 are shown in Table 12-2. The down-slew-rate of the
FSLR was equivalent to a release time of 312.5 ms. In the offline analysis of the
noise estimators with different noises, Lin’s recursive averaging noise floor estimator
showed occasional overestimation. Hence, the BLR of the ALGF-1 used a conservative
up-slew-rate of 3 dB/s to reduce the overestimation error.
ALGF Parameter                            ALGF-1   ALGF-2          ALGF-2F   Unit
Minimum dynamic range                     20       20              40        dB
Q (steepness factor)                      20       20              20
Fast Saturation Level Regulator
  Down-slew-rate                          -80      -300            -300      dB/s
Slow Saturation Level Regulator
  Down-slew-rate                          -9       -12             -12       dB/s
  Maximum up-slew-rate                    90       60              60        dB/s
  Frame duration for clipping proportion  100      100             100       ms
  Clipping proportion threshold           0        0               0         %
  Hold distance                           20       15              15        dB
Base Level Regulator
  Minimum base level                      -        -               -40       dBFS
  Down-slew-rate                          -300     -300            -         dB/s
  Up-slew-rate                            3        10              -         dB/s
  Noise estimator                         Lin’s    MCRA            -
  Noise floor bias compensation           0        (Martin 2001)   -         dB
  Enabled fixed dynamic range             False    False           True      Boolean
  Fixed dynamic range                     -        -               40        dB
Table 12-2 Parameter setting of three configurations of ALGF
12.4.2.3 ALGF-2
ALGF-2 was the second version of ALGF evaluated with cochlear implant recipients. The
ALGF-2 employed the same saturation level regulator as ALGF-1, but with different
parameter settings. The parameter setting of the ALGF-2 is shown together with the other
two ALGF versions in Table 12-2. The base level regulator of the ALGF-2 employed the
proposed minima-controlled recursive averaging (MCRA) noise floor estimator.
Since the ALGF used a dual-loop saturation level regulator, it was anticipated that a faster
down-slew-rate in the fast saturation level regulator would not be too detrimental to the
speech intelligibility. Hence, a faster down-slew-rate was employed in the ALGF-2. The
down-slew-rate of -300 dB/s is equivalent to the release time of 83.3 ms.
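The release times quoted for the fast down-slew-rates (312.5 ms at -80 dB/s and 83.3 ms at -300 dB/s) are both consistent with the time needed to traverse a 25 dB level change at the given rate. The sketch below assumes that relationship. The 25 dB reference step is inferred from the quoted figures rather than stated in the text, and the function name is illustrative only.

```python
# Assumption: release time = time to fall through a 25 dB level change.
# The 25 dB value is inferred from the quoted slew-rate/release-time pairs.
REFERENCE_STEP_DB = 25.0


def release_time_ms(down_slew_rate_db_per_s: float) -> float:
    """Convert a down-slew-rate in dB/s to an equivalent release time in ms."""
    return 1000.0 * REFERENCE_STEP_DB / abs(down_slew_rate_db_per_s)


if __name__ == "__main__":
    for rate in (-80.0, -300.0, -12.0):
        print(f"{rate:7.1f} dB/s -> {release_time_ms(rate):7.1f} ms")
```

With the same assumption, the -12 dB/s rate of the slow saturation level regulator corresponds to roughly 2.08 seconds.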
The ALGF-2 aimed to achieve more audibility. Hence the hold distance of 15 dB was used.
To give a fair comparison between the two processing conditions: Tri + ADRO vs. ALGF-2,
the fixed channel gains (§4.3.5) were bypassed in the ALGF-2 processing as in the
processing of Tri + ADRO.
In the offline analysis of the noise estimators with different noises, the new
MCRA noise floor estimator showed little to no overestimation. Hence, the BLR of the
ALGF-2 used a higher up-slew-rate of 10 dB/s.
12.4.2.4 ALGF-2F
The ALGF-2F is similar to ALGF-2, but with a fixed dynamic range setting, which bypassed
the BLR. Since it only employed the adaptive saturation level regulator, it behaved like a
normal AGC. The base level in this configuration was calculated as the magnitude of the
saturation level (in dB) minus the fixed dynamic range (in dB). Comparing the performance
of the ALGF-2 and the ALGF-2F shows the importance of using the
estimated noise floor as the base level.
Figure 12-20 shows the Simulink block diagram of the ALGF-2F. A switch colored orange
near the BLR routed either a fixed base level or an adaptive base level as per the estimated
noise floor.
Figure 12-20 Simulink block diagram of the ALGF with a fixed dynamic range setting
The ALGF-2F used a fixed dynamic range of 40 dB. This value was chosen because it is a
typical dynamic range setting in the Nucleus signal path with the existing AGC system.
The parameter settings of the ALGF-2F are shown in Table 12-2.
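The base level selection described for the ALGF-2F can be sketched as follows. This is a minimal illustration, not the Simulink implementation: the function and argument names are hypothetical, and the adaptive base level is passed in as a ready-made value, whereas in the ALGF it is produced by the base level regulator from the estimated noise floor.

```python
def select_base_level_db(saturation_level_db: float,
                         adaptive_base_level_db: float,
                         fixed_range_enabled: bool,
                         fixed_dynamic_range_db: float = 40.0) -> float:
    """Return the base level in dB for one channel.

    With the fixed dynamic range enabled (ALGF-2F), the base level is the
    saturation level minus the fixed dynamic range. Otherwise the adaptive
    base level supplied by the caller is used.
    """
    if fixed_range_enabled:
        return saturation_level_db - fixed_dynamic_range_db
    return adaptive_base_level_db


if __name__ == "__main__":
    # Hypothetical levels in dB SPL, chosen only for illustration.
    print(select_base_level_db(75.0, 50.0, fixed_range_enabled=True))
    print(select_base_level_db(75.0, 50.0, fixed_range_enabled=False))
```

In the fixed case the base level simply follows the saturation level at a constant 40 dB distance, which is why this configuration behaves like a conventional AGC.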
12.4.3 Offline Performance Analysis of the Gain Algorithms
This section investigates the performance of the Nucleus signal path with the existing AGC
system and with the three versions of the ALGF by visualizing the input and output signals.
The implementation of the signal path in the Nucleus-xPC system is shown in Figure 12-21.
The same processing conditions were also evaluated with cochlear implant subjects in the
laboratory.
Figure 12-21 Simulink block diagram of the Nucleus signal path with the existing AGC
systems and the ALGF
Roving-level sentences were presented with the LTASS noise at 8 dB SNR. The input
stimuli were representative of those produced by the roving-level SRT test (§7.4.4). The
presentation levels of the sentences were 50, 65 and 80 dB SPL. The background noise was
presented for 3 seconds before and after each sentence. In the actual testing with the
recipients, a beep was presented two seconds before each sentence to alert the recipients.
To reproduce the stimuli tested with the recipients, a beep was also included in the test
stimuli of the offline data analysis.
12.4.3.1 Tri + ADRO
The input to the front-end tri-loop AGC was a time-domain waveform and the input signal to
ADRO was a vector of filterbank outputs. The output signal was taken at the output of the
loudness growth function before mapping. Figure 12-22 shows the simulated input, output
and gain signals of the Tri + ADRO processing for the frequency channels centered at
367 Hz, 1101 Hz and 4282 Hz. The bottom plot in each panel shows the input signal, the
middle plot shows the gain signals and the top plot shows the signals at the output of the
LGF. The input signal plot shows that the entire sentence was above the saturation level of
the LGF at the 80 dB SPL presentation level. The sentence presented at 65 dB SPL was in the
upper part of the input dynamic range between the base and saturation levels, and speech
occasionally exceeded the saturation level. The sentence presented at 50 dB SPL resided in
the lower part of the dynamic range. The fast and medium AGCs of the tri-loop AGC were
active at 65 dB SPL, while ADRO was not very active at this level. All three AGCs of the
tri-loop AGC and ADRO were active at 80 dB SPL, and the input signal was reduced to within
the input dynamic range. The gain plot of the tri-loop AGC shows that, at 80 dB SPL, the
fast AGC operated at the component level, the medium AGC at the sentence level and the
slow AGC on the overall presentation level. At 50 dB SPL, only ADRO was active, providing
positive gain to improve audibility; the tri-loop AGC was released from compression.
[Plot: three panels (Fc = 367 Hz, 1101 Hz, 4282 Hz), each with LGF output (0 to 1), gain in dB for ADRO and Tri (-25 to 5), and input in dB SPL (10 to 90); legend: Input, Fixed sat level, Fixed base level; x-axis ticks: 10 to 30]
Figure 12-22 Input, output and gain signals produced from the Nucleus signal path
with Tri + ADRO. There are three sentence presentations in the figure, having levels of
65, 80 and 50 dB SPL. Each presentation consists of three seconds of noise, then the
367 Hz (top panel), 1101 Hz (middle panel) and 4282 Hz (bottom panel) were analyzed.
12.4.3.2 ALGF-1
Figure 12-23 shows the input signals, the output signals, and the base and saturation levels
of the ALGF-1 operating in the signal path shown in Figure 12-21. The bottom plot in each
panel shows the input signal together with the adaptive saturation and base levels of the
ALGF. The top plot shows the signals at the output of the ALGF. The ALGF output at the
beginning of the roved-up noise was noticeably high because the noise estimator took a few
seconds to respond to a sudden increase in the overall presentation level. After two
seconds, however, the noise level at the output decreased and became considerably lower
than with the Tri + ADRO processing. The saturation level rose at the onset of the beep and
stayed elevated for the duration of the presentations at 65 and 80 dB SPL. It was then
gradually reduced when the sentence presentation level decreased from 80 to 50 dB SPL. The
release of the ALGF-1 was completed at approximately the beginning of the sentence at
50 dB SPL. The ALGF-1 could be released at a faster rate, at the risk of a pumping effect.
[Plot: three panels (Fc = 367 Hz, 1101 Hz, 4282 Hz), each with LGF output (0 to 1) above input level in dB SPL (10 to 90); legend: Input, Adaptive sat level, Adaptive base level, Fixed sat level, Fixed base level; x-axis: Time (s), 10 to 30]
Figure 12-23 Input and output signals produced from the Nucleus signal path with the ALGF-1. There are three sentence presentations in the figure, having levels of
65, 80 and 50 dB SPL. Each presentation consists of three seconds of noise, then the
12.4.3.3 ALGF-2
Figure 12-24 shows the input signals, the output signals, and the base and saturation levels
of the ALGF-2. Compared to the ALGF-1, the overall saturation level of the ALGF-2 was
closer to the input signal because a shorter hold distance was used in the slow saturation
level regulator: the hold distance was 20 dB in the ALGF-1 and 15 dB in the ALGF-2. This
provides more audibility in the output signal.
Although Lin’s recursive averaging noise estimator of the ALGF-1 could track the noise
floor faster than the new MCRA noise estimator of the ALGF-2 (§12.4.1.2), the up-slew-rate
of the base level regulator in the ALGF-1 was deliberately made slow to reduce potential
overestimation of the true noise floor. The up-slew-rate of the base level regulator in
the ALGF-2 was more than three times that of the ALGF-1 (10 dB/s versus 3 dB/s). Hence,
the noise tracking performance of the two ALGFs was similar. The down-slew-rate of the slow
saturation level regulator of the ALGF-2 was also faster than that of the ALGF-1. As a
result, the release from compression after the 80 dB SPL presentation was completed
approximately one second before the next sentence was presented at 50 dB SPL.
[Plot: three panels (Fc = 367 Hz, 1101 Hz, 4282 Hz), each with LGF output (0 to 1) above input level in dB SPL (10 to 90); x-axis: Time (s), 10 to 30]
Figure 12-24 Input and output signals produced from the Nucleus signal path with ALGF-2. There are three sentence presentations in the figure, having levels of 65,
80 and 50 dB SPL. Each presentation consists of three seconds of noise, then the
12.4.3.4 ALGF-2F
Figure 12-25 shows the input signals, the output signals, and the base and saturation levels
of the ALGF-2F. The difference between the ALGF-2 and the ALGF-2F can be seen by comparing
the dynamic ranges and the base levels shown in Figure 12-24 and Figure 12-25.
[Plot: three panels (Fc = 367 Hz, 1101 Hz, 4282 Hz), each with LGF output (0 to 1) above input level in dB SPL (10 to 90); x-axis: Time (s), 10 to 30]
Figure 12-25 Input and output signals produced from the Nucleus signal path with the ALGF-2F. There are three sentence presentations in the figure, having levels
of 65, 80 and 50 dB SPL. Each presentation consists of three seconds of noise, then the
12.5 Clinical Studies
Two studies were conducted to evaluate the processing conditions described in §12.4.2. At
least four cochlear implant subjects participated in each study. The speech intelligibility
of the subjects was measured with the designated processing conditions on the same day
using the same test setup, to minimize any performance variation due to external factors
unrelated to the algorithms under test. Before each test session, the subjects engaged in a
brief conversation with the experimenters using the processing condition to be tested. The
overall loudness was balanced between the processing conditions by increasing the volume.
Before loudness balancing, the subjects reported that the ALGF was softer than the
Tri + ADRO. Subject S6 had to increase the overall C-levels by 2% with ALGF-2. No formal
procedure was followed to provide additional listening experience with each processing
condition. Subjects were not able to use the processing before or after the test sessions.
12.5.1 Test setup
The real-time Nucleus-xPC system (§7.4.5.2) was used. The speech intelligibility of the
cochlear implant subjects was evaluated with each processing condition using the
fixed-level test (§7.4.3) and the interleaved roving-level SRT test (§7.4.4.2). Two
fixed-level tests, with sentences presented at 50 dB SPL and 80 dB SPL at a preset SNR,
were conducted to complement the results of the adaptive SRT test. Four-talker babble noise
was used for the fixed-level tests and LTASS noise was used for the adaptive roving-level
SRT test.
In the roving-level SRT test, the background noise was presented continuously. A beep was
presented two seconds before each sentence to alert the subject that a sentence was about to
be presented.
12.5.2 Study 1: Tri + ADRO vs. ALGF-1
Four subjects, S1, S4, S5 and S6, participated in this experiment. Two processing conditions
were compared:
• Tri + ADRO
• ALGF-1
The program order was counterbalanced between subjects.
12.5.2.1 Results
12.5.2.1.1 Interleaved roving-level SRT test
Figure 12-26 shows the SRT of each subject as well as the group mean SRT at each
presentation level. The group mean SRT of the ALGF-1 was comparable to that of the
Tri + ADRO at all three presentation levels; half of the group performed better with the
ALGF-1 and the other half performed better with Tri + ADRO. Table 12-3 shows the group mean
SRT difference between the two processing conditions, the standard deviation across
subjects and the p-value from a t-test of the significance of the SRT difference. The
standard deviations show that the SRT variation between subjects was larger with
Tri + ADRO than with ALGF-1 at 50 and 80 dB SPL, but comparable at 65 dB SPL.
A large SRT difference between the two processing conditions was observed for some
subjects. For example, S4 received a large benefit from the ALGF-1 at all three
presentation levels: the improvement was approximately 6, 10 and 4 dB at 50, 65 and
80 dB SPL respectively. S5, on the other hand, benefited more from Tri + ADRO at all three
presentation levels: the SRT improvement with Tri + ADRO was approximately 3, 1.5 and 7 dB
at 50, 65 and 80 dB SPL respectively. The results of S1 and S6 were mixed.
[Plot: Interleaved Roving-level SRT Test; SRT (dB) by subject, -6 to 4, in three panels for 50, 65 and 80 dB SPL; legend: Tri+ADRO, ALGF-1]
Figure 12-26 SRT comparison between Tri + ADRO and ALGF-1. Error bars indicate
one standard deviation from the mean. The asterisks indicate statistically significant
difference in performance between the two processing conditions (* p < 0.05, ** p <
0.01).
                         50 dB SPL       65 dB SPL       80 dB SPL
SRT (dB): Tri + ADRO     1.9 (4.16)      0.48 (2.88)     -1.02 (4.58)
SRT (dB): ALGF-1         0.64 (1.82)     -2.23 (2.95)    -0.02 (0.45)
Mean difference (dB)     -1.26           -2.71           1
p-value                  0.7             0.4             0.5
Table 12-3 Statistical analysis of SRT measured with Tri + ADRO and the ALGF-1 at
50, 65 and 80 dB SPL. The values in brackets are standard deviations. p-values were
calculated by a t-test of the significance of the SRT difference.
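The sign convention of the mean difference row can be checked directly from the group means in Table 12-3. The short sketch below assumes that the difference is computed as ALGF-1 minus Tri + ADRO, so that a negative value is a lower (better) SRT with the ALGF-1; the variable names are illustrative only.

```python
# Group mean SRTs in dB from Table 12-3, keyed by presentation level in dB SPL.
tri_adro_srt = {50: 1.90, 65: 0.48, 80: -1.02}
algf1_srt = {50: 0.64, 65: -2.23, 80: -0.02}

# Assumed convention: difference = ALGF-1 minus Tri + ADRO.
mean_difference = {
    level: round(algf1_srt[level] - tri_adro_srt[level], 2)
    for level in tri_adro_srt
}

if __name__ == "__main__":
    for level, diff in mean_difference.items():
        print(f"{level} dB SPL: {diff:+.2f} dB")
```

The computed values agree with the tabulated differences at all three presentation levels under this convention.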
12.5.2.1.2 Fixed presentation level test
Figure 12-27 and Figure 12-28 show the percent correct scores of the subjects in the fixed
tests at 50 and 80 dB SPL respectively. The group mean scores show comparable performance
between the two processing conditions at both presentation levels. Among the subjects, S4
consistently scored better with the ALGF in both the 50 and 80 dB SPL fixed tests. The
score improvements of S4 were more than 25 and 50 percentage points in the fixed tests at
50 and 80 dB SPL respectively. S1 scored almost equally with both processing conditions in
both fixed tests. The scores of the other two subjects, S5 and S6, were mixed.
[Plot: Fixed Level Test (50 dBSPL); percent correct (%) by subject, 0 to 100; legend: Tri+ADRO, ALGF-1]
Figure 12-27 Percent correct scores of four cochlear implant subjects with Tri + ADRO
and with ALGF-1 in the fixed test at 50 dB SPL in noise. Error bars indicate one
standard deviation from the mean.
[Plot: percent correct (%) by subject, 0 to 100]
Figure 12-28 Percent correct scores of four cochlear implant subjects with Tri + ADRO
and with ALGF-1 in the fixed test at 80 dB SPL in noise.
12.5.2.2 Discussion
The first version of the ALGF was successfully evaluated with four cochlear implant
subjects. The overall SRT results showed comparable group performance between the two
processing conditions for roving-level sentences. The individual SRT results showed that
some subjects performed well with the new algorithm and some did not. The SRT difference
between the two processing conditions was large for some subjects.
Anecdotal reports from the subjects on their first listening experience with the ALGF
were positive. Some subjects reported that they heard soft environmental sounds in the
laboratory for the first time, for example, keyboard tapping and paper rustling. Some
subjects reported that they occasionally noticed a pumping effect with the ALGF-1 when the
compression was released. One subject mentioned that after he spoke, the investigator’s
voice was soft at first and slowly returned to normal. A possible reason is the
down-slew-rate of the fast saturation level regulator (-80 dB/s, which is equivalent to a
release time of 312.5 ms). Subjects can perceive pumping of sounds that follow the
cessation of a louder sound when an AGC has a release time between 100 ms and
3 seconds (Dillon 2001).
Another possible shortcoming of the ALGF is the base level regulator. To guard against
errors in the noise floor estimator, the base level regulator restricted the up-slew-rate
of the base level as it tracked the estimated noise floor. The down-slew-rate of the base
level regulator was much faster than the up-slew-rate. This arrangement ensured that the
base level could not stay above the input signal when the presentation level decreased.
When the presentation level increased, as in the roving SRT test, the base level regulator
took a few seconds to raise the base level to match the increase in the overall
presentation level. This potentially allowed background noise at the beginning of the
increased presentation level to reside in the dynamic range.
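The catch-up delay described above can be illustrated with a simple asymmetric slew-rate limiter. This is a minimal sketch rather than the regulator used in the thesis: it assumes an ideal noise floor estimate that steps up instantly by 15 dB, a hypothetical 10 ms update period, and the up- and down-slew-rates from Table 12-2. The function names and starting level are illustrative.

```python
def regulate_base_level(noise_floor_db, up_slew_db_per_s,
                        down_slew_db_per_s, frame_period_s):
    """Track noise_floor_db with separate limits on rise and fall rates."""
    max_rise = up_slew_db_per_s * frame_period_s    # positive dB per frame
    max_fall = down_slew_db_per_s * frame_period_s  # negative dB per frame
    base = noise_floor_db[0]
    track = []
    for target in noise_floor_db:
        # Move towards the target, but no faster than the slew limits allow.
        step = max(max_fall, min(max_rise, target - base))
        base += step
        track.append(base)
    return track


def catch_up_time_s(up_slew_db_per_s, step_db=15.0, frame_period_s=0.01):
    """Time for the base level to reach a noise floor that steps up by step_db."""
    n_frames = int(10.0 / frame_period_s)
    floor = [30.0] + [30.0 + step_db] * n_frames  # hypothetical levels in dB
    track = regulate_base_level(floor, up_slew_db_per_s, -300.0, frame_period_s)
    for index, level in enumerate(track):
        if floor[-1] - level < 1e-6:
            return index * frame_period_s
    return None


if __name__ == "__main__":
    for name, rate in (("ALGF-1", 3.0), ("ALGF-2", 10.0)):
        print(f"{name}: up-slew {rate} dB/s, catch-up {catch_up_time_s(rate):.2f} s")
```

Even with a perfect noise estimate, a 3 dB/s limit needs about 5 seconds to follow a 15 dB step and a 10 dB/s limit about 1.5 seconds; any lag in the noise estimator itself adds to this.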
Study 1 showed that level adjustment of the input signal by adjusting the dynamic range was
feasible. The dynamic range of the LGF could potentially be matched to the true dynamic
range of the input signal to satisfy both audibility and noise reduction at the same time.
For example, speech in quiet can have a large dynamic range. Conversely, the same segment
of speech can have a reduced dynamic range when presented in noise. Hence it is justified
to expand the dynamic range during periods of clean speech and to contract it, by raising
the base level, during noisy periods.
In summary, the SRTs for roving-level sentences and the percent correct scores at the fixed
presentation levels were comparable between the two processing conditions. The performance
variation between subjects was lower with ALGF-1. This study showed that the ALGF was a
feasible alternative to the AGCs in cochlear implant systems. It also showed that the
parameter space of the ALGF was large and that some parameters could be set differently to
improve the speech intelligibility of cochlear implant recipients.
12.5.3 Study 2: Tri + ADRO vs. ALGF-2
After ALGF-1 was evaluated with the cochlear implant recipients, changes were made to
improve performance, resulting in ALGF-2. Two processing conditions were evaluated in
this experiment:
• ALGF-2
• Tri + ADRO
The parameter settings of the Tri + ADRO were identical to those used in Study 1. The
parameter settings of the ALGF-2 are given in Table 12-2.
12.5.3.1 Results
Six cochlear implant subjects, S1, S3, S4, S5, S6 and S7, participated in this experiment.
All of them participated in the interleaved roving-level SRT test. Only five subjects
participated in the fixed test at 50 dB SPL and three participated in the fixed test at
80 dB SPL.
Figure 12-29 shows the SRT results of the individual subjects and the group mean SRTs with
Tri + ADRO and ALGF-2 at each presentation level. The group mean SRTs were comparable
between the two processing conditions at 50 and 65 dB SPL. At 80 dB SPL, all subjects
performed better with the ALGF-2, and the group mean SRT of the ALGF-2 was significantly
better than that of the Tri + ADRO according to a t-test (p = 0.02, Table 12-4). The SRT
variation within the group was comparable between the two processing conditions at each
presentation level: the difference between the two standard deviations was less than 1 dB
at all presentation levels.
For most subjects, the SRT difference between the two conditions was less than 1 dB, which
is probably within the test-retest reliability (Dawson, Hersbach and Swanson 2013).
However, some of the subjects showed a tendency to improve with the ALGF-2 at 50 and 65
dB SPL. For example, at 50 dB SPL, the SRT improvement of S4 and S7 with ALGF-2 was
more than 6 dB. Among the subjects, S4 obtained a large benefit from the ALGF-2 at all
three presentation levels (the SRT improvement was 6 dB, 3 dB and 4 dB at 50, 65 and 80
dB SPL respectively). In contrast, S1 at 65 dB SPL performed better with Tri + ADRO than
with the ALGF.
In summary, performance with ALGF-2 was generally equal to or better than that with
Tri + ADRO.
[Plot: SRT (dB) by subject, -6 to 4, in three panels for 50, 65 and 80 dB SPL; legend: Tri+ADRO, ALGF-2; annotation in the 80 dB SPL panel: *p<0.02]
Figure 12-29 SRT comparison between Tri + ADRO and ALGF-2. Error bars indicate
one standard deviation from the mean. The asterisks indicate the statistical significance
of the difference between the two processing conditions (* p < 0.05, ** p < 0.01).
                         50 dB SPL       65 dB SPL       80 dB SPL
SRT (dB): Tri + ADRO     -0.29 (2.0)     -0.81 (1.84)    -0.53 (0.98)
SRT (dB): ALGF-2         -2.18 (2.68)    -0.85 (0.95)    -2.42 (1.27)
Difference (dB)          -1.89           -0.04           -1.89
p-value                  0.2             0.96            0.02*
Table 12-4 Statistical analysis of SRT measured with Tri + ADRO and the ALGF-2 at
50, 65 and 80 dB SPL. The values in brackets are standard deviations. p-values were
calculated by a t-test of the significance of the SRT difference.
The group mean score of ALGF-2 was not significantly different to that of Tri + ADRO for
sentences presented at both 50 and 80 dB SPL in noise.
[Plot: percent correct (%) by subject, 0 to 100; legend: Tri+ADRO, ALGF-2]
Figure 12-30 Percent correct scores of five cochlear implant subjects with Tri + ADRO
and with ALGF-2 in the fixed test at 50 dB SPL in noise.
[Plot: percent correct (%) by subject, 0 to 100]
Figure 12-31 Percent correct scores of three cochlear implant subjects with Tri +
ADRO and with ALGF-2 in the fixed test at 80 dB SPL in noise. Error bars indicate
one standard deviation from the mean.
12.5.3.2 Discussion
ALGF-2 achieved equal or better SRTs compared to Tri + ADRO in the roving-level SRT
test. The group mean scores were comparable between the two processing conditions in the
fixed-level tests at 50 and 80 dB SPL. A similar outcome was observed in §12.5.2.1.2. Both
Tri + ADRO and ALGF-2 employed fast and slow gain control mechanisms. The comparably good
results with Tri + ADRO and the ALGF could be due to the slow components of each
algorithm, which adapted to the fixed presentation level throughout the test. A
performance difference would be expected only in a listening situation where the overall
level changed often, if one algorithm handled such changes better than the other. For this
reason, a sentence test at a fixed presentation level was not effective for evaluating
AGCs with slow time constants.
The overall SRT results showed a tendency for the ALGF to improve the speech
intelligibility of the recipients in difficult listening conditions. Based on the
improvement observed in some subjects in the roving-level SRT test, the factors through
which the ALGF could contribute to performance were identified as audibility, listening
comfort and noise reduction.
At 50 dB SPL, ALGF-2 achieved equal or better SRTs compared to Tri + ADRO processing.
Apparently, ALGF-2 provided more audibility than the maximum 3 dB gain of ADRO. The
SRT improvement of S4 and S7 with the ALGF-2 at 50 dB SPL was approximately 6 dB.
The other four subjects achieved comparable SRTs with both processing conditions at 50
dB SPL. Subjects S1, S4 and S7 have contralateral hearing with hearing aids. Their
hearing aids were turned off during testing, so they listened through the implant alone.
The processing that provided more audibility may have given more benefit to these subjects.
The roving SRT test used a fixed sequence to present the sentences at the three
presentation levels of 50, 65 and 80 dB SPL. The presentation level took +15 dB steps on
the way up, from 50 to 65 to 80 dB SPL, but a single -30 dB step on the way down from 80 to
50 dB SPL. Because of this -30 dB step, the beginning of the sentences presented at 50 dB
SPL could have lower audibility than the later part. Hence the SRT of the subjects could be
higher (worse) at 50 dB SPL than at the other two presentation levels if the release time
(i.e., the down-slew-rate) of the ALGF was not fast enough to release the saturation level
from the highest presentation level of 80 dB SPL. Comparing the results across presentation
levels, the SRT at 50 dB SPL was not exceptionally higher (worse) than the SRTs at the
other two presentation levels. Therefore, the down-slew-rate (-12 dB/s) of the slow
saturation level regulator appears to be appropriate.
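A rough check of this reasoning is to compare the time needed to descend 30 dB at the slow down-slew-rate with the 3 seconds of noise that preceded each sentence in the offline stimuli. The sketch below is a simplification under stated assumptions: it supposes that the saturation level must fall by the full 30 dB, that it falls at the slow rate throughout, and it ignores the hold criterion and the fast regulator. The function name is illustrative.

```python
def fall_time_s(level_drop_db: float, down_slew_rate_db_per_s: float) -> float:
    """Time in seconds to lower a level by level_drop_db at a constant slew rate."""
    return level_drop_db / abs(down_slew_rate_db_per_s)


STEP_DOWN_DB = 30.0    # 80 to 50 dB SPL
NOISE_LEAD_IN_S = 3.0  # noise presented before each sentence (offline stimuli)

if __name__ == "__main__":
    # Slow saturation level regulator down-slew-rates from Table 12-2.
    for name, rate in (("ALGF-1", -9.0), ("ALGF-2", -12.0)):
        t = fall_time_s(STEP_DOWN_DB, rate)
        status = "before" if t < NOISE_LEAD_IN_S else "after"
        print(f"{name}: {t:.2f} s, i.e. {status} the sentence onset")
```

Under these assumptions the -12 dB/s rate completes the descent within the noise lead-in, whereas -9 dB/s takes slightly longer than 3 seconds, which is in line with the offline observations for the two configurations.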
The down-slew-rate of -12 dB/s is equivalent to a release time of approximately
2.08 seconds. This is within the range of release times that can produce a pumping effect.
A subject may hear background noise growing louder following a loud sound if the criterion
to hold the slow saturation level is not met. The sound quality of the ALGF should be
investigated further in real-life listening situations.
For sentences presented at 80 dB SPL, there was an initial concern about prolonged
stimulation at C-levels and excessive loudness. However, the minimum dynamic range
constraint and the spectral-profile-preserving feature of the fast saturation level
regulator prevented potential loudness discomfort. With the ALGF, no spectral distortion
due to envelope clipping could occur. None of the subjects reported that sentences were
uncomfortably loud with the ALGF at 80 dB SPL during the test. They also did not
report excessive loudness with Tri + ADRO.
The SRT results at 80 dB SPL clearly showed that the ALGF-2 performed better than the
existing AGC system. Every subject achieved a lower (better) SRT with the ALGF-2 than with
the Tri + ADRO. The improvement with the ALGF could be due partly to reduced spectral
distortion, because clipping was avoided, and partly to the noise reduction provided by
the adaptive base level.
Audibility was less of an issue for sentences presented at 65 and 80 dB SPL. Less
spectral distortion or clipping can be expected at 65 dB SPL than at 80 dB SPL. Any SRT
improvement with the ALGF at 65 dB SPL could therefore be attributed mainly to the noise
reduction ability of the ALGF, and partly to its dynamic behavior. The mean SRTs at
65 dB SPL were almost the same for both processing conditions. The individual SRT results
were also close between the two processing conditions for four out of six subjects.
Because there was no significant improvement at 65 dB SPL, the noise estimator was
considered to be less effective.
The signal path of Nucleus sound processors is designed to work optimally for speech
presented at 65 dB SPL (C-SPL). The results of the earlier study (§9.2.4.1) showed that the
mean SRT of Tri + ADRO at 80 dB SPL was poorer than that at 65 dB SPL. In comparison, the
mean SRTs of Tri + ADRO at 50 and 80 dB SPL were closer to the mean SRT at 65 dB SPL in
Study 1 (§12.5.2.1.1) and in the current study. A possible reason is the wider input
dynamic range (50 dB) employed in the Tri + ADRO program of the earlier study in Chapter
9. This agrees with the study of Holden et al. (2011), which suggested a narrow input
dynamic range in noisy conditions as a clinical guideline.
Both Tri + ADRO (with a proper dynamic range setting) and the ALGF used slow time
constants, and acted as an Automatic Volume Control (AVC). Since they adapt to
changes in the overall presentation level of the input stimuli, the dependence on the input signal
level relative to the targeted SPL (C-SPL) becomes less prominent with these programs.
The experimental results showed that the ALGF could potentially improve the intelligibility
of speech presented at very high or very low levels and also provide more access to sounds
within a wide range. However, some recipients may find that overall loudness perception
becomes less natural if the presentation level of different sounds has little variation over
time. For example, it would be difficult to judge the distance of sound sources if two
sounds from near and far distances were perceived to have the same loudness. With the
ALGF acting as an automatic volume control, cochlear implant recipients may not know the
actual loudness of sounds. This could be inconvenient in some listening situations. For
example, the recipient might not turn down the volume of a loud TV or stereo because the
sound would not seem very loud to him or her. The pros and cons of this dynamic level
adjustment should be tested extensively in various listening conditions outside the laboratory.
It would be interesting to see the SRTs of the subjects evaluated in the present study
compared to the SRT of cochlear implant subjects measured by other researchers using an
adaptive roving-level SRT test. It should be noted that comparing the performance of
subjects between different studies is questionable because different studies used different test
methods and materials, different subjects and different signal processing algorithms.
The comparison SRT results were taken from Haumann et al. (2010) and Boyle et al. (2013); both
studies used the roving-level SRT test with a single adaptive track (§7.4.4.1). The sequence
of the presentation levels was randomized in the SRT test of their studies. The present study
used the adaptive roving-level SRT test with a fixed sequence of the presentation levels.
Table 12-5 shows the mean and standard deviation of the SRTs from each study.
Study                                 Condition                  SRT (dB), mean (std dev)
Boyle et al. (2013)                   better performing group    approximately 9.4 (4)
Haumann, Lenarz and Büchner (2010)    Opus 2                     approximately 1 (2)
Study 2 (present study)               Tri + ADRO                 -0.5 (1.3)
Study 2 (present study)               ALGF-2                     -1.8 (1.1)

Table 12-5 Retrospective comparison with SRT results from other studies that used a
roving-level SRT test
The mean SRTs of the two processing conditions evaluated in the present study were the
lowest among the group. The negative SRTs indicated the robustness of both processing
conditions for roving-level speech in noise. A further improvement with ALGF-2 in such a
difficult listening condition was encouraging.
12.5.4 Study 2F: Adaptive vs. Fixed Dynamic Range
The SRT improvement with ALGF-2 was attributed to the level adjustment by the adaptive
saturation level and the noise reduction by the adaptive base level. However, the SRT
comparison between ALGF-2 and Tri + ADRO did not make clear which component
contributed more to speech intelligibility. Therefore a comparison was made between the ALGF
with and without noise estimation (i.e., with and without the base level driven by the estimated
noise) to show the contribution of the noise floor estimator to the speech intelligibility of the
subjects. When the noise estimator was turned off, the base level was coupled to the
saturation level and the dynamic range of the ALGF was fixed. The ALGF used in this study
was configured from ALGF-2 but with a fixed dynamic range. Hence, it is labeled ALGF-2F.
The current study is a subset of Study 2. To reflect the nature of the investigation, the study
is labeled as Study 2F. Measurements of Tri + ADRO, ALGF-2 and ALGF-2F were
conducted on the same day using the same test setup on the same subjects. When the
adaptive base level was off, the ALGF-2 became a dual-loop AGC. A dynamic range of 40
dB was chosen for the ALGF-2F in this study. The parameter settings of the saturation level
regulators were the same for both ALGF-2 and ALGF-2F. The parameter settings of the ALGF
are shown in Table 12-2.
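The difference between the two configurations can be illustrated with a minimal sketch. It assumes that the base level either follows the estimated noise floor (as in ALGF-2) or sits a fixed 40 dB below the saturation level (as in ALGF-2F). The function names, the example levels and the linear-in-dB mapping are illustrative assumptions, not the implementation used in this thesis.

```python
def base_level_db(sat_level_db, noise_floor_db=None, fixed_range_db=40.0):
    """Return the base level (dB) of the input dynamic range.

    noise_floor_db=None models the fixed configuration (ALGF-2F): the base
    level is coupled to the saturation level at a fixed distance.
    Otherwise the base level follows the estimated noise floor (ALGF-2).
    """
    if noise_floor_db is None:
        return sat_level_db - fixed_range_db
    return noise_floor_db


def position_in_range(level_db, base_db, sat_db):
    """Relative position (0 to 1) of a level inside [base, sat].

    Assumption: a linear-in-dB mapping, clipped at both ends. The real
    loudness growth function is not reproduced here.
    """
    if sat_db <= base_db:
        raise ValueError("saturation level must exceed base level")
    x = (level_db - base_db) / (sat_db - base_db)
    return max(0.0, min(1.0, x))


# Hypothetical levels: saturation at 70 dB, noise floor at 45 dB, speech peak at 65 dB.
sat = 70.0
fixed_base = base_level_db(sat)                          # 30 dB (fixed 40 dB range)
adaptive_base = base_level_db(sat, noise_floor_db=45.0)  # 45 dB (follows the noise)

noise_fixed = position_in_range(45.0, fixed_base, sat)        # noise is mapped above zero
noise_adaptive = position_in_range(45.0, adaptive_base, sat)  # noise sits at the bottom
speech_adaptive = position_in_range(65.0, adaptive_base, sat)
```

In this sketch the noise floor occupies part of the output range with the fixed dynamic range but is mapped to zero with the adaptive base level, provided the noise estimate is accurate.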
12.5.4.1 Results
Only five cochlear implant subjects were tested with ALGF-2F. All five of them
were evaluated using the roving-level SRT test. The speech intelligibility of only four was
evaluated using the fixed-level test at 50 dB SPL, and none of them was measured with the
fixed-level test at 80 dB SPL.
Figure 12-32 shows the SRT comparison between ALGF-2 and ALGF-2F at each
presentation level. Table 12-6 shows the group mean and standard deviation of the SRTs
with each processing condition and the difference between the mean SRTs of the two ALGFs.
The group mean SRT values of ALGF-2 and ALGF-2F were comparable at 50 and 80 dB SPL in the
roving-level SRT test. The group mean SRT of ALGF-2 was higher (worse) than that of ALGF-2F
at 65 dB SPL; according to a t-test, the difference was statistically significant (p = 0.03). The
individual SRT results show that most subjects obtained comparable SRTs with
each processing condition. None of them showed a consistent improvement with either ALGF
configuration at all three presentation levels. For example, the SRT of S4 was lower (better)
at 50 and 80 dB SPL but higher (worse) at 65 dB SPL with ALGF-2. The SRT of S7, on the
other hand, was lower (better) at 65 and 80 dB SPL but higher (worse) at 50 dB SPL with
ALGF-2F. S5 obtained comparable SRT values with both configurations of the ALGF at
each presentation level.
[Figure 12-32 plot: three panels for 50, 65 and 80 dB SPL; x-axis: Subject; y-axis: SRT (dB); legend: ALGF-2F, ALGF-2; the 65 dB SPL panel is marked * p < 0.03]
Figure 12-32 SRT Comparison between ALGF-2 and ALGF-2F. Error bars indicate
one standard deviation from the mean value. The asterisks indicate the statistical
significance of the difference between the two ALGFs (* p < 0.05, ** p < 0.01).
Presentation level      50 dB SPL       65 dB SPL       80 dB SPL
SRT (dB): ALGF-2        -2.66 (2.7)     -0.96 (1.02)    -2.59 (1.35)
SRT (dB): ALGF-2F       -2.45 (1.2)     -2.14 (0.84)    -1.56 (0.79)
Difference (dB)         0.21            -1.18           1.03
p-value                 0.81            0.03*           0.15

Table 12-6 Statistical analysis of SRT measured with ALGF-2 and the ALGF-2F at 50,
65 and 80 dB SPL. The value inside the bracket indicates the standard deviation. The
p-value refers to the SRT difference.
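The sign convention of the difference row in Table 12-6 can be checked from the tabulated group means. The short sketch below recomputes it as the ALGF-2F mean minus the ALGF-2 mean, so a negative value means that the ALGF-2F mean SRT was lower (better). The variable names are illustrative.

```python
# Group mean SRTs (dB) from Table 12-6, keyed by presentation level (dB SPL).
mean_srt_algf2 = {50: -2.66, 65: -0.96, 80: -2.59}
mean_srt_algf2f = {50: -2.45, 65: -2.14, 80: -1.56}

# Difference = mean SRT with ALGF-2F minus mean SRT with ALGF-2.
difference = {
    level: round(mean_srt_algf2f[level] - mean_srt_algf2[level], 2)
    for level in mean_srt_algf2
}
print(difference)
```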
In the fixed-level test at 50 dB SPL (Figure 12-33), the mean scores of ALGF-2 and ALGF-2F
were comparable, with a difference of less than 10 percentage points. Three out of four
subjects scored lower with ALGF-2 than with ALGF-2F. The score difference was more than
20 percentage points for S1 and S7. Only S5 scored better with the ALGF-2, with an
improvement of approximately 15 percentage points.
[Figure 12-33 plot: x-axis: Subject; y-axis: Percent correct (%); legend: ALGF-2F, ALGF-2]
Figure 12-33 Percent correct scores of four cochlear implant subjects with ALGF-2 and
with ALGF-2F in the fixed test at 50 dB SPL in noise. Error bars indicate one standard
deviation from the mean.
12.5.4.2 Discussion
The motivation of Study 2F was to investigate the effect of the adaptive base level driven by
the estimated noise level. A statistically significant SRT degradation of approximately 1.2
dB was observed at 65 dB SPL in the roving-level test for ALGF-2 compared to ALGF-2F.
The rest of the results did not give a clear indication of a performance difference between
the ALGF with and without the adaptive base level. Equal or lower performance was
obtained with ALGF-2, which used the adaptive base level. Three possible reasons for the lower
or equal performance with the ALGF-2 are: (i) inaccurate noise estimation, (ii) an ineffective
test method for the evaluation and (iii) abnormal loudness perception.
When the offline data in Figure 12-24 were reviewed, the base level took time to catch up
with the presentation level of the noisy speech when the presentation level was roved up
from 50 to 65 dB SPL and from 65 to 80 dB SPL. At the beginning of each new presentation
level, at 65 and 80 dB SPL in particular, a high level of noise was observed in the output
signal due to the underestimation of the true noise. Compared to the noise tracking at 65 and 80
dB SPL, the base level of the ALGF-2 tracked the noise level reasonably well at 50 dB SPL.
However, no significant SRT difference between ALGF-2 and ALGF-2F
was observed at this level. The mean SRTs were approximately -2.5 dB at 50 dB SPL. At an
SNR of -2.5 dB, the background noise level was higher than the level of the target speech.
Perhaps the output SNR could not be improved further by the adaptive base level, or the
noise estimator itself became less effective at negative SNRs.
Since the roving-level conditions of the SRT test affected the performance of the noise estimator, the
results of the fixed-level test were examined to see the performance difference due to the adaptive
base level. Although a performance improvement was anticipated from noise removal by the
adaptive base level, no significant performance difference between ALGF-2 and ALGF-2F
was observed at 50 dB SPL. When the offline results for the three different types of noise
presented in the fixed-level test were checked in Figure 12-13, the new MCRA noise estimator
was found to underestimate the four-talker babble noise. If the true dynamic range of the noisy speech at
50 dB SPL was lower than the dynamic range estimated by ALGF-2, the ALGF-2 could
show more noise at the output. Similarly, if the dynamic range estimated by ALGF-2 was
larger than 40 dB, the fixed dynamic range of ALGF-2F, then more noise could be expected
at the output with ALGF-2.
In the new MCRA noise estimator, noise was estimated by recursive averaging of the noisy
channel envelopes, as in Lin's method. It has an additional feature to reduce the potential
overestimation error of the recursive-averaging method. Perhaps the minima-controlled
component was biased towards a level lower than the true noise floor. It should be noted
that the MCRA method employed bias compensation by calculating the mean of the
variance of the stationary noise, as proposed by Martin (2001). The underestimation in the
offline results indicated that more bias compensation was necessary for non-stationary noise
such as babble.
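The general idea of minima-controlled recursive averaging can be sketched as follows. This is a generic, simplified illustration and not the new MCRA estimator of this thesis: the speech-presence rule, the parameter values and the function name are all assumptions, and no bias compensation is included.

```python
def mcra_noise_floor(envelope, alpha=0.95, smooth=0.7, window=50, presence_ratio=2.0):
    """Estimate the noise floor of one channel envelope, frame by frame.

    envelope       : list of non-negative envelope amplitudes (linear scale)
    alpha          : recursive-averaging coefficient (closer to 1 means slower)
    smooth         : coefficient of the light smoothing applied before the minimum search
    window         : number of frames covered by the running minimum
    presence_ratio : speech is assumed present when the smoothed envelope
                     exceeds presence_ratio times the running minimum
    All default values are illustrative assumptions.
    """
    if not envelope:
        return []
    noise = envelope[0]
    smoothed = envelope[0]
    history = []
    estimates = []
    for x in envelope:
        smoothed = smooth * smoothed + (1.0 - smooth) * x
        history.append(smoothed)
        if len(history) > window:
            history.pop(0)
        running_min = min(history)
        speech_present = smoothed > presence_ratio * running_min
        if not speech_present:
            # Recursive averaging, applied only when speech is judged absent.
            noise = alpha * noise + (1.0 - alpha) * x
        estimates.append(noise)
    return estimates


# Synthetic envelopes (made-up values, not thesis data).
# 1) Constant noise with a short louder burst: the estimate ignores the burst.
burst_est = mcra_noise_floor([1.0] * 100 + [10.0] * 20 + [1.0] * 100)
# 2) Noise level stepping up by a factor of 5.6 (about 15 dB): the estimate
#    stays at the old level until the minimum window has passed the step.
step_est = mcra_noise_floor([1.0] * 100 + [5.6] * 200)
```

In this sketch a step increase in noise level is initially treated as speech, so the estimate only starts to rise once the running-minimum window no longer contains the old level. This is one way in which a slow catch-up after an upward level change could arise.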
The individual and group SRT results were more consistent across the presentation levels
with the ALGF-2F than with the ALGF-2. The ALGF-2F only had the slow and fast saturation level
regulators to perform slow and fast gain adjustment at different presentation levels. Achieving
equal SRT values within a certain tolerance across all three presentation levels indicated that
the saturation level regulators performed well and achieved their goal. This validated the
comparison between the two ALGFs to highlight the performance of the adaptive base level.
The third possible reason for not observing a performance improvement with the ALGF with
an adaptive dynamic range was abnormal loudness perception. The subjects had no
experience with processing that varied the dynamic range continuously. Loudness
perception could be affected by frequent variation in the dynamic range. Ideally, the ALGF
would estimate the dynamic range of the input stimuli accurately and no abnormal loudness
perception would be expected. However, in a situation like roving-level sentences in noise,
the overall dynamic range estimated by the ALGF was not accurate due to the slow tracking
of the noise floor by the adaptive base level. Noise was gradually decreased as the base level
was increased to catch up with the noise floor. That could have an impact on loudness
perception, although no speech intelligibility degradation was expected. Further investigations
should be done on the loudness perception of the subjects with the ALGF with an adaptive
dynamic range.
The hypothesis was that the adaptive base level was important for speech in noise and
performance could be improved by using the estimated noise floor as the base level.
However, the experimental results did not show a difference that supported the
hypothesis, for the reasons stated above. From Study 2F, it was concluded, first, that the noise
estimator could be improved further. Secondly, the roving-level SRT test may not be
effective in showing the contribution of the noise estimator. More research needs to be done on
test methodology to evaluate the full potential of the ALGF. Finally, the loudness perception
with the ALGF should be systematically assessed.
12.5.5 Test-retest Reliability of Interleaved Roving-level SRT Test
The test-retest variability of the adaptive roving-level SRT test with a fixed presentation
sequence was analyzed from the SRT values of four subjects who participated in both Study 1
and Study 2 with the same processing condition, Tri + ADRO. The time between the two
studies ranged from 7 to 24 weeks.
Figure 12-34 shows the SRTs and Figure 12-35 shows the SRT difference of the subjects
with the Tri + ADRO between the two studies, which were conducted on different days. A
positive value indicates that the SRT of Study 2 is better. Subject S4 shows an SRT
improvement in Study 2 at all three presentation levels. Likewise, S1 shows an SRT
improvement at 50 and 65 dB SPL in Study 2. S6 is the only subject with SRT differences of
less than 1 dB between the two studies for all presentation levels. Eight out of 12 SRT
differences were positive, which possibly indicates a learning effect in Study 2. The effect of
subject-related factors at the time of testing, such as fatigue, distraction and cognitive load,
could not be ruled out as a cause of the performance difference between studies.
[Figure 12-34 plot: Interleaved Roving-level SRT Test (Tri + ADRO: Test-retest); three panels for 50, 65 and 80 dB SPL; x-axis: Subject; y-axis: SRT (dB); legend: Trial 1, Trial 2]
Figure 12-34 Test-retest variability of the interleaved SRT test from the SRT of four
subjects with Tri + ADRO taken from Study 1 and Study 2. Error bars indicate one
standard deviation from the mean. The asterisks indicate the statistical significance of
the difference between the two studies (* p < 0.05, ** p < 0.01).
[Figure 12-35 plot: Test-retest: Tri + ADRO; x-axis: Subject; y-axis: dB; legend: 50 dB SPL, 65 dB SPL, 80 dB SPL]
Figure 12-35 SRT differences between the two studies of each subject. The difference
was calculated as: SRT (Study 1) – SRT (Study 2). The mean was calculated as the
average of the absolute SRT differences between the two studies.
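The calculation described in the caption of Figure 12-35 can be sketched as below. The SRT values are made-up numbers chosen only to show the arithmetic; they are not the measured results, and the function names are illustrative.

```python
def srt_differences(srt_study1, srt_study2):
    """Per-subject difference, SRT (Study 1) minus SRT (Study 2).

    A positive value means the SRT in Study 2 was lower (better).
    """
    return [s1 - s2 for s1, s2 in zip(srt_study1, srt_study2)]


def mean_absolute_difference(differences):
    """Average of the absolute per-subject differences."""
    return sum(abs(d) for d in differences) / len(differences)


# Hypothetical SRTs (dB) of four subjects at one presentation level.
study1 = [0.5, -1.0, 2.0, -0.5]
study2 = [-0.5, -1.5, 0.0, 0.0]

diffs = srt_differences(study1, study2)        # [1.0, 0.5, 2.0, -0.5]
mean_abs = mean_absolute_difference(diffs)     # 1.0
n_positive = sum(1 for d in diffs if d > 0)    # 3
```

Taking the absolute value before averaging prevents improvements and degradations from cancelling, so the mean reflects the size of the test-retest change rather than its direction.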
The test-retest variability of the interleaved roving-level SRT test was lower (better) than that of
the roving-level SRT test using a single adaptive track (§9.2.6). Each test measured Tri +
ADRO in two sessions held on different days. The improvement was approximately 2 dB.
The improvement in test-retest reliability with the interleaved SRT test supported the claim that the
interleaved SRT test was better due to the use of independent adaptive tracks and the fixed
sequence of presentation levels, which could reduce the performance bias due to an unbalanced
randomized sequence.
From the study of Dawson et al. (2013) on the adaptive SRT test with one presentation level,
a standard deviation of 1.2 dB could be expected from 16 sentences when a psychometric fit
calculation rule was applied. Compared to that, the standard deviations of 3.4, 2.6 and 3.1 dB
for the interleaved SRT test roving at 50, 65 and 80 dB SPL in the present study were higher. The
standard deviation at each presentation level was calculated from 8 SRT values (four
subjects in two studies). A possible reason for the higher variability of the SRT values
observed in the present study could be the difficulty of the SRT test; the presentation levels
were roved between three levels: 50, 65 and 80 dB SPL. Another reason could be the small
sample size.
12.6 Conclusions
Adjusting the signal level by means of gain has the side effect of increasing the background noise
when the audibility of a low-level target signal is improved. In contrast, adjusting the relative
position of the signal level within the input dynamic range, by expanding or contracting the
range itself, can potentially improve audibility without compromising the level of
background noise. With this concept of level adjustment, a new signal processing technique
that optimizes the input dynamic range of a cochlear implant system was proposed,
implemented and evaluated in this chapter. The experimental results show that the ALGF is
a feasible replacement for the existing AGC systems.
Two versions of the ALGF were successfully evaluated with the recipients by using the fixed-level
and roving-level speech tests in the laboratory. Anecdotal reports from the subjects on the
first-time listening experience with the ALGF were positive. The impression was that
audibility was significantly improved, as some reported that they heard low-level sounds of
activities in the laboratory. With a slightly different parameter set and the proposed MCRA
noise estimator, the second version of the ALGF, ALGF-2, was implemented. ALGF-2
achieved equal or better SRTs compared to the existing AGC system in the roving-level SRT
test. The SRT improvement with the ALGF-2 was statistically significant at 80 dB SPL.
Based on the results, it was concluded that the ALGF could potentially perform better in
loud noisy environments than the existing AGC. The envelope profile limiter showed better
performance than the front-end limiter for high-level speech in noise at 10 dB SNR
(§10.4.1). The improvement was attributed to the spectral profile preserving feature of the
EPL in adverse listening conditions. Amplitude cues become more important when spectral
cues are distorted. The ALGF also preserved the spectral envelope shape by avoiding
envelope clipping. Therefore, it was deduced that the SRT improvement at the highest level
with the ALGF was also due to its ability to preserve envelope cues better than the
existing AGC system.
The hypothesis behind the ALGF was that speech intelligibility could be improved by using the
estimated noise floor as the adaptive base level. Equal or better performance with ALGF-2
shown in Study 2 only weakly supported the hypothesis. To determine whether the base level
regulator was effective in reducing the background noise, the ALGF with no base level
regulator (ALGF-2F) was also evaluated and compared to the ALGF with the base level
regulator (ALGF-2). Equal or lower group SRT performance was observed with ALGF-2
compared to ALGF-2F. This cast reasonable doubt on the ability of the base level
regulator to follow the roving-level background noise. The roving-level SRT test may not be
the right test to evaluate the performance of noise reduction.
The interleaved roving-level SRT test using a fixed sequence could reduce the variability of
performance due to test-related factors. A fixed sequence gave each processing condition an
equal chance to be evaluated with the same number of ups and downs in the presentation
levels. On the other hand, in the roving-level SRT test with random presentation level
sequences, a test condition may be evaluated with more up steps than down steps or vice
versa. Test-retest variation can widen as a consequence. The above claim was
supported by the comparison between the test-retest variability of the interleaved roving-level
SRT test using a fixed sequence of the presentation levels in the present study and that
of the roving-level SRT test with a single adaptive track using a randomized sequence. It was
generally concluded that the interleaved roving-level SRT test with a fixed sequence of
presentation levels was more robust for the evaluation of AGC systems. More research should
be done on the effect of test-related factors on AGC performance.
The present study showed the feasibility of ALGF as a robust level optimization technique
for cochlear implant systems. The ALGF is a feature-based algorithm. The features that were
utilised in the present study were level-related features, clipping proportion, and estimated
background noise for adjusting the dynamic range. More features could be added into the
future implementation of ALGF to make it more robust in different listening conditions. An
example of a new feature could be the energy profile of input stimuli. For example, energy
profiles of a transient noise and car noise are different. Energy profiles of speech and other
environmental noise can be different. Such information may be useful to make the algorithm
more adaptable to different stimuli.
Chapter 13 Predicting Cochlear Implant Speech Intelligibility
13.1 Introduction
Optimizing a sound processing algorithm to perform well in different acoustic scenarios can
take up a lot of time from both the researcher and the subjects. Signal metrics are useful for
exploration and tuning of the parameter space of an algorithm. A reliable signal metric can
expedite the optimization process, for example the fine-tuning of a large parameter set. Two
signal metrics that were reviewed in Chapter 6 are applied in this chapter: Across-Source
Modulation Correlation (ASMC) and Normalized Covariance Measure (NCM). In addition,
two new metrics are developed: Clipping Proportion and Output SNR (OSNR).
Several AGC conditions were evaluated in previous chapters. In this chapter, the effect of
those AGC conditions on the channel envelopes is quantified by the selected signal metrics.
Then the correlation between each signal metric and the measured speech intelligibility
scores is analysed, to determine how effectively the selected metric predicts the
speech intelligibility of cochlear implant subjects. This will strengthen understanding of the
important factors affecting the speech intelligibility of cochlear implant recipients.
13.2 Signal Processing
The signal processing used in the previous clinical studies was replicated in this chapter:
• Performance-intensity function: no AGC vs. FEL75 (§8.2)
• Analysis of gain structure and release time: FEL75, FEL625, EPL75 and EPL625 (§10.3.3)
• Performance-intensity function: FEL75 vs. EPL625 (§10.3.4).
Subjects’ scores are collectively shown in Figure 13-1. It should be noted that the
performance-intensity function of the front-end compression limiter was measured in §8.2
with four recipients and in §10.4.2 with six recipients (four common subjects from the
previous study). The results used in this study were the average of the two clinical studies.
[Figure 13-1 plot: x-axis: Presentation level (dB SPL), 55 to 90; y-axis: Percent correct (%); legend: No AGC SNR10, No AGC SNR20, FEL75 SNR10, FEL75 SNR20, EPL625 SNR10, EPL625 SNR20, FEL625 SNR10, EPL75 SNR10]
Figure 13-1 Percent correct scores of the cochlear implant subjects with different AGC
configurations in the fixed level test. Open and filled symbols represent 10 dB SNR and
20 dB SNR respectively.
13.2.1 Test Stimuli
AuSTIN sentences and four-talker babble noise were used as test stimuli. Five sentences
were concatenated with silent gaps of four seconds between sentences. The silent gaps were
inserted so that the compression of each AGC was initialized at 0 dB at the start of each
sentence. The background noise was presented from one second before to one second after
each sentence. Each metric was calculated over the duration of the sentences, excluding the
silent gaps between sentences.
The presentation levels and the SNR values evaluated in the clinical studies were
reproduced. The presentation levels of the sentences ranged from 55 to 89 dB SPL in the
studies that measured the performance-intensity functions (§8.2 and 10.3.4). The noise
presentation level for each test condition was adjusted as per the input SNR of 10 and 20 dB.
13.2.2 AGC Configurations
The following AGC configurations were investigated:
1. No AGC
2. FEL75: front-end compression limiter with 75 ms release time
3. FEL625: front-end compression limiter with 625 ms release time
4. EPL75: envelope profile limiter with 75 ms release time
5. EPL625: envelope profile limiter with 625 ms release time
Signal processing was performed offline in the MATLAB-Simulink platform, using the same
signal path and parameter settings that were used in the clinical studies.
The subjects’ own MAPs were also used for the performance analysis of individual subjects.
A default map was used for the group performance analysis.
13.2.3 Curve Fitting
A psychometric function relates the subject’s performance in a psychophysical task to the
physical quantity of the stimuli (Wichmann and Hill 2001). In this chapter, the performance
measure was the subject’s percent correct score (Chapters 8 and 10), and the physical
quantity was a signal metric (Chapter 6). The psychometric function is assumed to have a
sigmoidal shape. A cumulative Gaussian psychometric function was fitted to the recipients’
mean percent correct scores against each signal metric using the psignifit toolbox for
MATLAB by Jeremy Hill (version 2.5.6, available at
http://bootstrap-software.org/psignifit/), which implements a maximum-likelihood method (Wichmann and
Hill 2001). The goodness of fit was quantified by the deviance, D; a smaller deviance
indicated a better fit.
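The fitting principle can be illustrated with a minimal sketch: a cumulative Gaussian is fitted by minimizing the binomial deviance, which is equivalent to maximizing the binomial likelihood. This is not the psignifit implementation; the coarse grid search, the omission of guess and lapse rates, the trial counts and the synthetic data are all assumptions made for illustration.

```python
import math


def cumulative_gaussian(x, mu, sigma):
    """Cumulative Gaussian psychometric function (values between 0 and 1)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def deviance(xs, n_correct, n_trials, mu, sigma):
    """Binomial deviance D of the fitted curve; a smaller D indicates a better fit."""
    total = 0.0
    for x, k, n in zip(xs, n_correct, n_trials):
        # Clamp the prediction to keep the logarithms finite.
        p = min(max(cumulative_gaussian(x, mu, sigma), 1e-9), 1.0 - 1e-9)
        if k > 0:
            total += k * math.log(k / (n * p))
        if k < n:
            total += (n - k) * math.log((n - k) / (n * (1.0 - p)))
    return 2.0 * total


def fit_by_grid_search(xs, n_correct, n_trials, mus, sigmas):
    """Return (deviance, mu, sigma) with the lowest deviance on the grid."""
    return min(
        (deviance(xs, n_correct, n_trials, mu, sigma), mu, sigma)
        for mu in mus
        for sigma in sigmas
    )


# Synthetic data (not thesis data): metric values and items correct out of 100.
metric_values = [0.0, 2.5, 5.0, 7.5, 10.0]
items_correct = [1, 11, 50, 89, 99]
items_total = [100] * 5

mu_grid = [i * 0.5 for i in range(0, 21)]      # 0.0 to 10.0
sigma_grid = [i * 0.5 for i in range(1, 11)]   # 0.5 to 5.0
best_dev, best_mu, best_sigma = fit_by_grid_search(
    metric_values, items_correct, items_total, mu_grid, sigma_grid
)
```

A dedicated optimizer would normally replace the grid search; the grid is used here only to keep the sketch free of external dependencies.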
13.3 Signal Metrics and Performance Analysis
13.3.1 Clipping Proportion
The most obvious form of envelope distortion is envelope clipping. The clipping proportion
was calculated by summing the number of envelope samples that exceeded the saturation
level of the LGF, and dividing by the total number of samples. The proportion of clipping
(%) was obtained by:
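\[ \text{Clipping proportion (\%)} = \frac{100}{M N} \sum_{k=1}^{M} \sum_{i=1}^{N} \big[\, e_k(i) > S \,\big] \qquad (13.1) \]
where e_k(i) is the amplitude of the envelope at channel k at time index i, M is the number of
frequency channels, N is the number of samples collected in each channel and S is the
saturation level. The logical comparison in square brackets produces a binary number: 0 or 1.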
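The clipping proportion of Equation 13.1 can be computed with a few lines of code. The sketch below assumes that the envelopes are stored as one list of samples per channel; the function name and the example values are illustrative.

```python
def clipping_proportion(envelopes, saturation_level):
    """Percentage of envelope samples that exceed the saturation level.

    envelopes        : list of M channels, each a list of envelope samples
    saturation_level : saturation level S of the LGF, in the same units
    """
    total = sum(len(channel) for channel in envelopes)
    if total == 0:
        raise ValueError("no envelope samples")
    clipped = sum(
        1 for channel in envelopes for sample in channel if sample > saturation_level
    )
    return 100.0 * clipped / total


# Made-up example: two channels, four samples each, saturation level 0.03.
example_envelopes = [
    [0.01, 0.05, 0.02, 0.06],
    [0.01, 0.01, 0.04, 0.01],
]
example_proportion = clipping_proportion(example_envelopes, 0.03)  # 3 of 8 samples
```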
One goal of AGC is to prevent envelope clipping, so clipping proportion is a measure of the
effectiveness of the AGC system. Figure 13-2 shows the envelopes processed by no AGC,
FEL75 and EPL625. The input stimuli were sentences presented at 80 dB SPL in four-talker
babble noise at SNR 10 dB. The envelopes were shown together with the saturation level of
the LGF to observe how effective each configuration was to prevent clipping.
[Figure 13-2 plot: Band 7; three panels (No AGC, FEL, EPL); x-axis: Time (s); y-axis: Amplitude; legend: Signal, Sat level, Base level]
Figure 13-2 Comparison of signal amplitudes processed by each AGC configuration,
for speech presented at 80 dB SPL in the presence of four-talker babble at SNR 10 dB.
The envelopes at channel 7 were taken before the LGF.
The clipping proportion was calculated only for the processing conditions with the front-end
compression limiter and with no AGC. The results of the envelope profile limiter were not
included because no envelope clipping occurred due to the novel gain structure that
preserved the spectral envelope profile. The bottom panel of Figure 13-2 shows that no
envelope processed with the EPL exceeded the saturation level of the LGF.
[Figure 13-3 plot: six panels (S1 to S6); x-axis: Proportion of Clipping (%); y-axis: Percent Correct Scores (%); legend: No AGC SNR 10, No AGC SNR 20, FEL75 SNR 10, FEL75 SNR 20, FEL625 SNR 10; deviance: D = 24.21 (S1), 17.12 (S2), 19.17 (S3), 16.19 (S4), 3.56 (S5), 4.47 (S6)]
Figure 13-3 Percent correct scores of the individual subject as a function of clipping
proportion. The open and filled symbols represent the results at SNR 10 dB and 20 dB
respectively.
The diagrams in Figure 13-3 show the percent correct scores of each subject as a function of
clipping proportion. The deviance ranged from 3.56 to 24.21. The deviances of
S5 and S6 were low because they were not tested in the no-AGC condition.
[Figure 13-4 plot: three panels; x-axis: Proportion of Clipping (%); y-axis: percent correct; top panel (all conditions): D = 90.42; middle panel (SNR 20 dB): No AGC (D = 4.00), FEL75 (D = 4.02); bottom panel (SNR 10 dB): No AGC (D = 1.57), FEL75 (D = 2.90), FEL625]
Figure 13-4 Group mean percent correct scores as a function of clipping proportion.
The top panel shows the scores in all conditions, the middle panel shows the scores at
20 dB SNR and the bottom panel shows the scores at 10 dB SNR.
The three diagrams in Figure 13-4 show the mean percent correct scores of the subjects as a
function of clipping proportion for each SNR and for all SNR conditions together. The
correlation between the scores and the clipping proportion was high, as indicated by the low
deviance, for each processing condition at each SNR. However, a single curve could not fit
the scores from all conditions together well, giving a deviance of 90.42. The scores
degraded faster as a function of clipping proportion with the front-end limiter. The
bottom diagram of Figure 13-4 shows that the proportion of clipping is approximately 15%
with the front-end compression limiter and between 50% and 60% with no AGC at the point
where the same score is predicted for both conditions. Factors other than envelope clipping
therefore affected speech intelligibility in this case.
Speech was still intelligible with no AGC at 20 dB SNR, even at 83 dB SPL presentation
level, where the channel envelopes were significantly clipped (§8.2.4.1). In contrast, the
envelope profile limiter prevented envelope clipping, but speech intelligibility still degraded
at high presentation levels in low SNR condition (§10.4.2). Hence, clipping proportion by
itself does not appear to be a good indicator of speech intelligibility.
13.3.2 Output SNR
The Output SNR (OSNR) metric is a simple but powerful metric to predict the effect of
compression for speech in noise. The calculation of the output SNR is similar to Rhebergen’s
apparent SNR calculation (Rhebergen, Versfeld and Dreschler 2008b; Rhebergen, Versfeld
and Dreschler 2009), but adapted for the cochlear implant processing. Figure 13-5 shows the
effective output SNR calculation in the cochlear signal path of ACE sound coding.
Figure 13-5 Block diagram of output SNR calculation for the signal path. The front-end
AGC was used as an example in this diagram.
The mixture of speech and noise was processed by the signal path. The gain signals from the
AGCs and the channel indices from the maxima selection block were recorded. Next, the
clean speech was processed through the signal path, applying the recorded gain, and
choosing stimulation pulses using the recorded channel indices. An inverse LGF was applied
to revert to the linear domain, while retaining the effect of clipping. Similarly, the noise
alone was processed, using the recorded gain and channel indices. Then, the SNR was
calculated for each channel. As in other metrics, the channel SNRs were weighted by using
the relative signal power (Ma, Hu and Loizou 2009) and summed to give the OSNR.
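The final weighting step can be sketched as follows. The sketch assumes that the speech-only and noise-only channel envelopes have already been obtained as described above (recorded gains and channel indices applied, inverse LGF applied). The weighting by each channel's share of the total speech power is an assumption about how the relative signal power is applied, and the function name and example values are illustrative.

```python
import math


def output_snr_db(speech_envelopes, noise_envelopes):
    """Power-weighted sum of per-channel SNRs (dB).

    speech_envelopes, noise_envelopes : list of channels, each a list of
    linear-domain envelope samples of the separately processed signals.
    Channels with zero speech or noise power are skipped in this sketch.
    """
    channel_powers = []
    for speech, noise in zip(speech_envelopes, noise_envelopes):
        speech_power = sum(s * s for s in speech)
        noise_power = sum(n * n for n in noise)
        if speech_power > 0.0 and noise_power > 0.0:
            channel_powers.append((speech_power, noise_power))
    if not channel_powers:
        raise ValueError("no channel with non-zero speech and noise power")

    total_speech_power = sum(sp for sp, _ in channel_powers)
    osnr = 0.0
    for speech_power, noise_power in channel_powers:
        weight = speech_power / total_speech_power   # relative signal power
        osnr += weight * 10.0 * math.log10(speech_power / noise_power)
    return osnr


# Made-up example with two channels: channel SNRs of 20 dB and 0 dB,
# weighted 0.8 and 0.2 by speech power.
speech_example = [[2.0, 2.0], [1.0, 1.0]]
noise_example = [[0.2, 0.2], [1.0, 1.0]]
osnr_example = output_snr_db(speech_example, noise_example)
```

With this weighting, channels that carry most of the speech power dominate the result, so a poor SNR in a weak channel has little influence on the OSNR.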
[Figure 13-6 plot: six panels (S1 to S6); x-axis: Output SNR (dB), -2 to 20; y-axis: percent correct (%); legend: No AGC SNR 10, No AGC SNR 20, FEL75 SNR 10, FEL75 SNR 20, EPL625 SNR 10, EPL625 SNR 20, FEL625 SNR 10, EPL75 SNR 10; deviance: D = 5.59 (S1), 5.19 (S2), 6.41 (S3), 5.55 (S4), 4.66 (S5), 7.41 (S6)]
Figure 13-6 Percent correct score of individual subject as a function of output SNR.
The open and filled symbols represent the results in 10 dB and 20 dB SNR respectively.
Each of the diagrams in Figure 13-6 shows the percent correct scores of each subject with
different AGC configurations as a function of OSNR. Each subject’s scores were highly
correlated with the OSNR. The psychometric curve was a good fit for each subject, with the
deviance between 4.66 and 7.41. The slopes of the psychometric functions were similar
across subjects, but the OSNR knee points were different, i.e. the curves were shifted
horizontally. This was expected because some subjects performed better than others in noise.
All subjects reached a performance asymptote for high OSNRs. The three diagrams in Figure
13-7 show the group mean percent correct scores of the subjects with the three processing
conditions as a function of OSNR.
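The deviance values quoted with each psychometric fit can be read as a goodness-of-fit measure, where lower is better. The sketch below shows one common way to obtain such a value. It rests on assumptions that this section does not confirm: a logistic psychometric function, binomially scored items, and a coarse grid search in place of a proper optimiser. The names `logistic`, `binomial_deviance` and `fit_grid`, and all numbers in the example, are hypothetical.

```python
import numpy as np

def logistic(x, midpoint, slope, ceiling=1.0):
    """Assumed psychometric function: proportion correct versus metric value."""
    return ceiling / (1.0 + np.exp(-slope * (np.asarray(x, float) - midpoint)))

def binomial_deviance(n_correct, n_total, p_pred, eps=1e-9):
    """Deviance of predicted proportions against observed binomial counts."""
    k = np.asarray(n_correct, float)
    n = np.asarray(n_total, float)
    p = np.clip(p_pred, eps, 1.0 - eps)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Terms with zero counts contribute nothing to the deviance.
        hit = np.where(k > 0, k * np.log(k / (n * p)), 0.0)
        miss = np.where(n - k > 0, (n - k) * np.log((n - k) / (n * (1.0 - p))), 0.0)
    return float(2.0 * np.sum(hit + miss))

def fit_grid(x, n_correct, n_total, midpoints, slopes):
    """Return (deviance, midpoint, slope) of the best logistic fit on a grid."""
    best = None
    for m in midpoints:
        for s in slopes:
            d = binomial_deviance(n_correct, n_total, logistic(x, m, s))
            if best is None or d < best[0]:
                best = (d, float(m), float(s))
    return best

# Hypothetical data: metric values in dB and items correct out of 100.
osnr_db = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])
correct = np.array([5.0, 20.0, 52.0, 80.0, 94.0, 99.0])
total = np.full(6, 100.0)
deviance, midpoint, slope = fit_grid(
    osnr_db, correct, total, np.arange(0.0, 10.5, 0.5), np.arange(0.2, 2.01, 0.2))
```

Under these assumptions, a horizontal shift between subjects appears as a different fitted midpoint with a similar slope.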
[Figure 13-7: three panels of group mean percent correct (0 to 100) against Output SNR (dB)
from 0 to 20. Top panel: all conditions, D = 16.36. Middle panel, SNR 20 dB: No AGC
(D = 3.70), FEL75 (D = 3.42), EPL625 (D = 5.57). Bottom panel, SNR 10 dB: No AGC (D = 3.56),
FEL75 (D = 6.11), EPL625 (D = 3.74), FEL625, EPL75.]
Figure 13-7 Group mean scores as a function of output SNR. The top panel shows the
scores in all conditions, the middle panel shows the scores at 20 dB SNR and the bottom
panel shows the scores at 10 dB SNR.
The low deviance of the psychometric fit indicates that the OSNR could predict the percent
correct scores of recipients with each AGC condition in noise. The OSNR reduction with no
AGC was mainly due to the instantaneous compression at the LGF. As the presentation level
increased, the clipping initially affected only the target speech while having little effect on
the background noise level. The higher the presentation level, the more significant the OSNR
reduction became with no AGC. The OSNR degradation still occurred with the front-end
compression limiter at high presentation levels, although the amount of OSNR reduction was
not as high as with no AGC. Among the three processing conditions, the
OSNR degradation with the proposed envelope profile limiter was the lowest. When the OSNR
of each AGC was compared at the two release times, 75 and 625 ms, each AGC with the 75 ms
release time produced a significantly lower OSNR than the same AGC with the 625 ms release
time. The OSNR therefore explained the performance degradation with the shorter release
time.
13.3.3 Across-Source Modulation Correlation
The Across-Source Modulation Correlation (ASMC) metric calculates the correlation
coefficient between the target speech and competing voice (Stone and Moore 2007). The two
signals that were originally independent acquired some correlation due to the common
modulation introduced by the compression. The hypothesis was that the segregation between
the target and background noise then became difficult, degrading speech intelligibility.
Figure 13-8 ASMC calculation for the signal path. The front-end AGC was used as an
example in this diagram.
The procedure to calculate ASMC was adapted from the study of Stone and Moore (2007).
The implementation of the ASMC metric for the ACE strategy is shown in Figure 13-8.
The mixture of speech and noise was processed by the signal path. The gain signals from the
AGCs were recorded. Next, the clean speech was processed through the signal path,
applying the recorded gain. The channel envelopes after the filterbank were subject to the
base and saturation level in the linear domain, while retaining the effect of clipping. Stone
and Moore suggested calculating the coefficients from the logarithm of the channel
envelopes because log amplitudes were more relevant to perception than linear
amplitudes. Therefore the channel envelopes were converted into the log domain. The log
channel envelopes were then smoothed by a low-pass filter with a cut-off frequency of 50 Hz.
Similarly, the noise alone was processed, using the recorded gain. Then, the correlation was
calculated for each channel. As in other metrics, the ASMCs across all frequency channels
were averaged to give the final ASMC.
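The last steps of this procedure (log conversion, 50 Hz smoothing, per-channel correlation and averaging) can be sketched as below. This is a minimal illustration under stated assumptions: the type of low-pass filter is not specified in the text, so a first-order (one-pole) smoother is used here, and the function names, the envelope sampling rate `fs_hz` and the numerical `floor` are all hypothetical.

```python
import numpy as np

def one_pole_lowpass(x, cutoff_hz, fs_hz):
    """First-order low-pass along the last axis (assumed filter type)."""
    alpha = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / fs_hz)
    y = np.empty_like(x, dtype=float)
    state = x[..., 0].astype(float)
    for n in range(x.shape[-1]):
        state = state + alpha * (x[..., n] - state)
        y[..., n] = state
    return y

def asmc(speech_env, noise_env, fs_hz, cutoff_hz=50.0, floor=1e-6):
    """Mean across channels of the correlation between smoothed log envelopes.

    speech_env, noise_env: arrays of shape (channels, frames) holding the
    envelopes of speech and noise processed with the recorded gains.
    """
    log_s = one_pole_lowpass(np.log10(np.maximum(speech_env, floor)), cutoff_hz, fs_hz)
    log_n = one_pole_lowpass(np.log10(np.maximum(noise_env, floor)), cutoff_hz, fs_hz)
    per_channel = np.array([np.corrcoef(s, n)[0, 1] for s, n in zip(log_s, log_n)])
    return float(np.mean(per_channel)), per_channel

# Toy example: independent envelopes that share a gain driven by the speech.
rng = np.random.default_rng(0)
log_speech = rng.normal(0.0, 0.3, size=(3, 20000))
log_noise = rng.normal(0.0, 0.3, size=(3, 20000))
log_gain = -0.5 * log_speech  # the gain falls when the speech level rises
value, per_channel = asmc(10.0 ** (log_speech + log_gain),
                          10.0 ** (log_noise + log_gain), fs_hz=1000.0)
```

In the toy example the shared gain is driven by the speech level, so the processed noise envelope falls when the speech envelope rises and the resulting correlation is negative, which matches the sign of the ASMC values reported in this section.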
[Figure 13-9: six panels, one per subject (S1 to S6), showing percent correct (0 to 100)
against ASMC Index from -0.5 to 0.1. Deviance of the psychometric fit: S1 D = 31.00,
S2 D = 20.49, S3 D = 22.98, S4 D = 19.33, S5 D = 12.69, S6 D = 28.30. Legend: FEL75 SNR 10,
FEL75 SNR 20, EPL625 SNR 10, EPL625 SNR 20, FEL625 SNR 10, EPL75 SNR 10.]
Figure 13-9 Percent correct score of the individual subject as a function of ASMC. The
open and filled symbols represent the results at SNR 10 dB and 20 dB respectively.
Each of the diagrams in Figure 13-9 shows the percent correct scores of each subject with
different AGC configurations as a function of ASMC. The ASMC could follow the trend of
scores for each subject. The deviance of the psychometric curve fitting ranged between 12.69
and 31.00.
[Figure 13-10: three panels of group mean percent correct (0 to 100) against ASMC from
-0.5 to 0.1. Top panel: all conditions, D = 132.61. Middle panel, SNR 20 dB: FEL75
(D = 3.28), EPL625 (D = 5.88). Bottom panel, SNR 10 dB: FEL75 (D = 4.27), EPL625 (D = 3.88),
FEL625, EPL75.]
Figure 13-10 Group mean scores as a function of ASMC. The top panel shows the
The bottom and middle panels of Figure 13-10 show that the correlation between the
measured scores and the ASMC was high for each SNR condition. The deviance was low
(indicating a good fit) for the psychometric curve within each SNR condition. However, a
single psychometric curve could not fit the scores across the different SNR conditions well.
The top diagram of Figure 13-10 shows two trends of score degradation with ASMC, one for
each SNR condition. The deviance of the single-curve fit was high
(D = 132.61).
The amount of compression and consequently the magnitude of ASMC also increased with
the presentation level. The measured scores reflected the performance indicated by ASMC.
The magnitude of ASMC with the proposed envelope profile limiter was lower than that
with the front-end counterpart. The ASMC comparison between the same AGC with
different release times shows that the performance improvement was mainly due to the
release time.
It was unclear why the ASMC at SNR 20 dB was higher in magnitude than that at SNR 10 dB
for the same presentation levels. A more negative ASMC would be expected in the
low SNR condition because each AGC exercised more compression at SNR 10 dB
than at SNR 20 dB. A possible explanation for the higher magnitude of ASMC at high
SNR was the amount of modulation in the gain. At SNR 20 dB, the AGC gain was mainly
determined by speech and only occasionally by noise. Hence it was modulated approximately
at the rate of speech. At SNR 10 dB, in contrast, the slow modulation pattern of the gain was
less prominent because the AGC reduced the overall presentation level most of the time.
Moreover, the type of background noise could also have an impact on the ASMC. If the
background noise was a competing voice, as in the studies of Stone and Moore (Stone and
Moore 2003, 2004), the SNR value would have less effect on the ASMC.
13.3.4 Normalized Covariance Measure
The Normalized Covariance Measure (NCM) indicates the fidelity of channel envelopes at
the output of the LGF compared to the reference channel envelopes. When the presentation
level is increased, the variation between the reference and processed signals becomes larger
due to compression of the AGC systems and the instantaneous compression at the LGF.
Hence the correlation between the reference and processed signal degrades at high
presentation levels.
The reference signal was the channel envelopes of clean sentences taken before the LGF and
hence not clipped at the saturation nor thresholded at the base level. The processed signal
was taken after the LGF to include the effect of instantaneous infinite compression at the
LGF. An inverse LGF was applied to revert to the linear domain, while retaining the effect
of clipping. The implementation of the NCM method closely followed the procedure
described in the study of Ma et al. (2009). The calculation of the NCM is described in
§6.4. The weight of the transmission index for each frequency channel was calculated based
on the RMS energy of the reference signal.
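As a reminder of how such a measure is typically computed, the sketch below follows the commonly published form of the NCM: a normalized covariance per channel, converted to an apparent SNR, limited to the range -15 to 15 dB, mapped to a transmission index between 0 and 1, and combined with channel weights. It is an illustration and does not reproduce §6.4; the function name `ncm` is hypothetical, and the weight is assumed to be the RMS of the reference envelope in each channel without any further exponent.

```python
import numpy as np

def ncm(ref_env, proc_env, eps=1e-12):
    """Normalized covariance measure between reference and processed envelopes.

    ref_env, proc_env: arrays of shape (channels, frames) in the linear domain.
    Returns the weighted index and the transmission index of each channel.
    """
    ref = ref_env - ref_env.mean(axis=1, keepdims=True)
    proc = proc_env - proc_env.mean(axis=1, keepdims=True)
    # Normalized covariance (correlation coefficient) in each channel.
    r = np.sum(ref * proc, axis=1) / np.sqrt(
        np.sum(ref ** 2, axis=1) * np.sum(proc ** 2, axis=1) + eps)
    r2 = np.clip(r ** 2, 0.0, 1.0 - 1e-9)
    # Apparent SNR in dB, limited to [-15, 15], then mapped to [0, 1].
    snr_db = np.clip(10.0 * np.log10(r2 / (1.0 - r2) + eps), -15.0, 15.0)
    ti = (snr_db + 15.0) / 30.0
    # Assumed weighting: RMS of the reference envelope in each channel.
    weights = np.sqrt(np.mean(ref_env ** 2, axis=1))
    return float(np.sum(weights * ti) / np.sum(weights)), ti

# Toy example: hard clipping of the envelope lowers the index.
rng = np.random.default_rng(1)
reference = rng.uniform(0.0, 1.0, size=(4, 5000))
index_clean, _ = ncm(reference, reference)
index_clipped, _ = ncm(reference, np.minimum(reference, 0.2))
```

The toy example mirrors the behaviour described in the text: an unprocessed envelope gives an index of one, and clipping at a low saturation level reduces the correlation with the reference and therefore the index.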
[Figure 13-11: six panels, one per subject (S1 to S6), showing percent correct (0 to 100)
against NCM Index from 0.3 to 1. Deviance of the psychometric fit: S1 and S2 D = 14.92 and
D = 15.58 (panel assignment not recoverable), S3 D = 14.55, S4 D = 10.00, S5 D = 8.92,
S6 D = 13.22. Legend: No AGC SNR 10, No AGC SNR 20, FEL75 SNR 10, FEL75 SNR 20,
EPL625 SNR 10, EPL625 SNR 20, FEL625 SNR 10, EPL75 SNR 10.]
Figure 13-11 Percent correct score of the individual subject as a function of NCM. The
open and filled symbols represent the results at SNR 10 dB and 20 dB respectively.
The diagrams in Figure 13-11 show the percent correct scores of each subject as a function
of the NCM. The NCM could follow the trend of scores for each subject. The deviance of the
psychometric curve fitting ranged between 8.92 and 15.58.
[Figure 13-12: three panels of group mean percent correct (0 to 100) against NCM from 0.3
to 1. Top panel: all conditions, D = 55.94. Middle panel, SNR 20 dB: No AGC (D = 3.73),
FEL75 (D = 3.35), EPL625 (D = 7.01). Bottom panel, SNR 10 dB: No AGC (D = 2.86), FEL75
(D = 7.73), EPL625 (D = 3.87), FEL625, EPL75.]
Figure 13-12 Group mean scores as a function of NCM index. The top panel shows the
The three diagrams in Figure 13-12 show the mean percent correct scores of the subjects as a
function of NCM. The bottom and middle diagrams of Figure 13-12 show that the
correlation between the scores and the NCM indexes was high for each processing condition
in each SNR. The deviance was low (indicating a good fit) for the psychometric curve for
each processing condition. A single curve fitting for all scores together captured the trend of
score degradation but the spread was wide, with the deviance of 55.94.
The reduction of the NCM with no AGC was mainly due to envelope clipping at the LGF.
The higher the presentation level, the more the correlation was reduced between the
reference and the clipped envelopes for the processing with no AGC. The NCM measure
captured the envelope distortion due to clipping and it was highly correlated to the subjects’
performance for each SNR. Like the OSNR and ASMC measures, the NCM could also capture
the performance change due to the release time.
13.4 Discussions and Conclusions
Four signal metrics have been investigated in this chapter. From the analysis done on each
signal metric, the following conclusions are made. Each metric tried to predict the speech
intelligibility scores of the subjects measured in the previous clinical studies using the fixed
method. Among them, the output SNR metric consistently tracked the changes in
scores. Figure 13-13 shows the comparison of deviances from
the psychometric fitting of each signal metric and the mean speech intelligibility scores.
[Figure 13-13: bar chart of deviance (vertical axis, 0 to 140) for each signal metric
(horizontal axis): Clipping (%), OSNR, ASMC and NCM.]
Figure 13-13 Comparison of deviance between four signal metrics
The clipping proportion showed a good correlation with mean scores for each processing in
each test condition. Scores degraded as the proportion of clipping increased. However, using
the proportion of clipping as a performance indicator to compare two AGC systems or
configurations is not effective. The effect of envelope clipping could be prominent for very
high level speech in quiet. However, not enough speech intelligibility data were measured for that
test condition. For speech in noise, factors other than spectral envelope clipping affected the
performance. The clipping proportion can be used as a measure of envelope distortion but it
alone is not sufficient to predict the speech intelligibility of cochlear implant recipients.
There are other potential side-effects of envelope clipping. For example, sound quality of
clipped envelopes can be low, and stimulating at the maximum current level for prolonged
duration can drain the battery power faster.
The NCM explained the performance degradation with the reduction of temporal envelope
correlation between the reference clean speech and the signal processed by the AGC systems
and the LGF. The NCM captured the general trend of score degradation but the spread was
wide between processing conditions. The sensitivity of the NCM to the speech intelligibility
performance of the subjects depended on the AGC system. For example, the rate of score
degradation with decreasing NCM was higher for the processing with the front-end
compression limiter than the processing with no AGC. The comparison of deviances shows
that the NCM is a better predictor than the clipping proportion. However, as with the
clipping proportion, factors other than preserving temporal envelope shape are more
important for speech intelligibility.
The findings in this chapter on envelope preservation predicted by the proportion of clipping
and NCM agree with Stone and Moore (2007), who also showed that fidelity of envelope
shape was not very important for speech intelligibility.
The ASMC has good sensitivity to the effect of compression for speech in noise at high
presentation levels. It can compare the performance of two AGC systems. However, care
should be taken when two test conditions with different SNRs are compared. More
investigation should be done on ASMC with different types of noise at different SNR levels
to show its dependency on noise-related test conditions.
Because cochlear implant sound processing is non-linear, the SNR at the output differs from
the SNR at the input. The present study extended prior work on calculating the apparent
SNR, to make it suitable for cochlear implant processing. Prior work recorded the gains
produced by the AGC in response to the speech and noise mixture, and applied these gains
separately to the clean speech and the noise (Rhebergen, Versfeld and Dreschler 2009).
However, if the remaining cochlear implant processing was then applied, the maxima
selection on the clean speech would choose different channels from the maxima selection on
the noise. Instead, the channel indexes were recorded from the maxima selection on the
speech and noise mixture, so that the contributions of speech and noise to each stimulation
pulse could be identified.
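The idea of recording channel indices on the mixture and reusing them for the separated signals can be sketched as follows. The function names and the toy numbers are hypothetical, and the sketch ignores everything in the signal path other than the selection step.

```python
import numpy as np

def select_maxima(mixture_env, n_maxima):
    """Indices of the n_maxima largest channels in each frame of the mixture.

    mixture_env: array of shape (channels, frames).
    Returns an integer array of shape (n_maxima, frames).
    """
    order = np.argsort(mixture_env, axis=0)       # ascending along channels
    return np.sort(order[-n_maxima:, :], axis=0)  # keep the largest n_maxima

def apply_recorded_indices(env, indices):
    """Keep only the recorded channels in each frame; zero all the others."""
    out = np.zeros_like(env)
    frames = np.arange(env.shape[1])
    out[indices, frames] = env[indices, frames]
    return out

# Toy example with 4 channels, 3 frames and 2 maxima per frame.
speech = np.array([[5.0, 0.0, 1.0],
                   [0.0, 4.0, 0.0],
                   [1.0, 0.0, 3.0],
                   [0.0, 1.0, 0.0]])
noise = np.array([[0.0, 1.0, 0.0],
                  [2.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [0.0, 3.0, 5.0]])
recorded = select_maxima(speech + noise, n_maxima=2)
speech_selected = apply_recorded_indices(speech, recorded)
noise_selected = apply_recorded_indices(noise, recorded)
```

Because both signals pass through the same recorded selection, their selected parts add up to the selected mixture, which is what allows the speech and noise contributions to each pulse to be separated.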
Unlike other metrics, the performance indication by the OSNR was not conditional on
the processing condition or the input SNR. The OSNR metric quantified the performance
by the signal-to-noise ratio at the output. The diagrams of the measured scores against the
OSNR showed that cochlear implant subjects were sensitive to the output SNR between 0
and 5 dB. In this study, the OSNR explained the performance degradation by the energetic
masking of competing noise to the target speech. Energetic masking is a peripheral masking
phenomenon that occurs when energy from two or more sounds overlaps both spectrally and
temporally, thereby reducing signal detection (Stickney et al. 2004b). The four-talker babble
noise used in the clinical studies of the AGCs appears to produce energetic masking, as
shown by the monotonic performance degradation with the level of background noise in the
fixed-level testing.
Normal hearing subjects can differentiate two voices by their pitches and other properties
such as speaking style and accents. In addition, normal hearing subjects can benefit from dip
listening at the moments when noise energy drops (Miller 1947; Greenberg et al. 2004).
Unlike normal hearing subjects, hearing-impaired subjects did not get much benefit from
masking release during spectral and temporal dips (Moore, Peters and Stone 1999). It is well
known that pitch perception of cochlear implant recipients is poor (Zeng et al. 2002). If the
amount of noise is assumed to be the main determinant of speech understanding of cochlear
implant recipients in noisy conditions, the OSNR metric can be used to predict the
intelligibility of speech in any type of noise. However, the study of Stickney et al. (2004b)
showed that cochlear implant recipients and normal hearing subjects listening to
noise-vocoded speech achieved higher scores for speech presented with stationary noise than
with a single-talker babble at the same SNR. They argued that segregation between speech and
noise was better if they were spectrally different. Further experiments should be conducted
on the effectiveness of the OSNR in predicting the speech intelligibility of cochlear implant
recipients with different types of noise, for example a single competing talker, or LTASS
with spectral and temporal dips, to observe factors other than the energetic masking of noise
that affect speech intelligibility.
[Figure 13-14: three panels with the AGC configurations FE75, FEL625, EPL75 and EPL625 on
the horizontal axis. Top panel: ASMC (-0.45 to 0). Middle panel: NCM (0.45 to 0.65). Bottom
panel: OSNR in dB (0 to 4). The data labels 20%, 25%, 67% and 69% appear in each panel.]
Figure 13-14 Effect of release time on speech intelligibility predicted by ASMC (top
panel), NCM (middle panel) and OSNR (bottom panel)
Figure 13-14 shows the effect of release time on speech intelligibility predicted by each
metric. The OSNR, ASMC and NCM metrics consistently showed that the release time was
the main factor that affected speech intelligibility in this case. When the release time is
longer, the compression speed becomes slower. Slow compression introduces less distortion
of the temporal envelopes than fast compression. In addition, when the release time is
longer, signals above the compression threshold are reduced by a larger amount, so
less clipping can result at the output.
Chapter 14 Conclusions and Future Work
14 Conclusions and Future Work
This chapter first summarizes the experimental results of this thesis chapter by chapter and
then draws the conclusions on the gain control techniques in cochlear implant systems and
their effects on speech intelligibility of the recipients based on the findings. It then suggests
further work on the proposed algorithms to enhance listening performance of the recipients.
14.1 Summary of Experimental Results
Chapter 8 investigated the effects of the input dynamic range limitation by the instantaneous
compression at the LGF on the speech intelligibility of cochlear implant recipients. The
performance-intensity functions with no AGC and with the front-end compression limiter
were measured for sentences presented at different presentation levels in two SNR
conditions. With no AGC, speech was still intelligible even when the sentence presentation
level was very high, for the high SNR condition. At low SNR, the envelope clipping for
sentences presented at high levels had a high impact on performance, not only due to
envelope distortion, but also due to the output SNR degradation. Background noise was
shown to have more impact on speech intelligibility than envelope clipping. Although the
front-end limiter reduced envelope clipping at the LGF for presentation levels above
65 dB SPL, the score improvement was moderate. The study in chapter 8 highlighted the
importance of an AGC system for cochlear implant systems in adverse listening conditions,
but questioned the effectiveness and robustness of a front-end limiter for speech presented at
high presentation levels.
Chapter 9 measured the SRTs of the recipients with an existing AGC system (the tri-loop
AGC and ADRO), and with the front-end limiter, for sentences roving at three presentation
levels: 50, 65 and 80 dB SPL. The study clearly indicated that a slow AGC was essential to
adapt to changes in the overall presentation level. It also showed that the performance of the
existing AGC system at 50 and 80 dB SPL could be improved. The study not only collected
baseline results, but also investigated advantages and disadvantages of two variants of the
roving-level SRT test. The roving-level SRT test with a single adaptive track showed poor
test-retest reliability. The SRT depended on the randomized sequence of presentation levels.
The interleaved roving-level SRT, with a fixed sequence of presentation levels, was
recommended for evaluation of AGC systems in the future.
Chapter 10 demonstrated the feasibility of replacing the front-end compression limiter with
a new multichannel compression limiter, called the envelope profile limiter, which
eliminated envelope clipping. Intelligibility with the envelope profile limiter was equal to, or
slightly better than, that with the front-end limiter. The study also showed that a slow compression
speed (i.e. a longer release time) was more important than preserving the spectral envelope
profile.
Chapter 11 evaluated the envelope profile limiter in a take-home trial. The subjects
preferred their standard program, with ASC, in noisy listening conditions. The study showed
the importance of employing a slow AGC for everyday use.
Chapter 12 proposed a novel algorithm called Adaptive Loudness Growth Function
(ALGF), which expanded and contracted the dynamic range of the LGF adaptively based on
the features extracted from the channel envelopes. It was evaluated with cochlear implant
recipients by using the interleaved SRT test. The ALGF achieved equal or better speech
understanding compared to an existing AGC system (Tri + ADRO). A significant speech
intelligibility improvement with the ALGF was shown at a high presentation level (80
dB SPL). The ALGF showed the potential to achieve both audibility and noise reduction.
Chapter 13 investigated whether signal processing metrics could be used to predict the
speech intelligibility scores of the cochlear implant recipients that were obtained in the
previous chapters. The proportion of envelope clipping was not a good predictor of
intelligibility in noise. The ASMC could not predict the effect of SNR. The NCM was able
to predict the overall trends in scores. The best metric was output SNR, which was an
effective predictor of speech intelligibility scores for a range of processing, presentation
level, and input SNR conditions.
14.2 Conclusions
The cochlear implant is a hearing prosthesis that transforms the world of silence into the
world of sounds. There is no doubt that the cochlear implant has rehabilitated hearing of deaf
people, from lip-reading to telephone answering and music appreciation. Countless
conversations with the recipients and numerous hours of speech tests indicated that speech
understanding of the recipients was up to par at least in favourable listening conditions.
However, performance degraded quickly in adverse listening conditions.
This thesis has investigated gain optimization techniques for presenting input stimuli
within the small electrical dynamic range of cochlear implants. The conventional AGC
systems were investigated first and then new techniques were proposed based on the
findings.
The investigation started from the effect of instantaneous infinite compression of the LGF on
the speech intelligibility. The LGF of the Nucleus systems infinitely compressed the
filterbank outputs above the saturation level. It also thresholded the filterbank outputs below
the base level. With no AGC before the LGF to adjust the signal level, envelope clipping
occurred for high-level sounds. The speech intelligibility of cochlear implant recipients with
no AGC (§8.2.4.1) before the LGF was measured. With no AGC, envelope clipping occurred
substantially at high presentation levels (§13.3.1). Envelope clipping had a strong effect on
speech intelligibility at high presentation levels. The effect was more prominent when the
background noise was also in the upper range of the LGF.
Envelope clipping has two effects on the input stimuli: envelope distortion and SNR
degradation (in noise). The envelope distortion can be analysed as temporal waveform
distortion in the time domain and spectral envelope distortion in the frequency domain. In
the time domain, clipping can distort the shape of the temporal envelope waveform and
reduce the modulation depth. Clipping of spectral envelopes, on the other hand, can destroy
formant patterns and the ratio of energy between voiced and unvoiced speech. The envelope
distortion occurred both in quiet and in noise for high-level input stimuli. However, the SNR
degradation could only occur in noise. Speech intelligibility degradation was plotted against
the proportion of envelope clipping. With no AGC, the recipients could tolerate 25% of
envelope clipping, with mean scores at about 90% (Figure 13-4 in §13.3.1) when the
background noise level was low (at 20 dB SNR). It was deduced that envelope distortion has
a low impact on speech intelligibility.
Envelope clipping reduced the output SNR in noise (§13.3.2). When the speech intelligibility
of cochlear implant recipients with no AGC in noise at 10 dB SNR was plotted against the
clipping proportion, performance degraded monotonically for the presentation levels
above 70 dB SPL (§13.3.1). At 10 dB SNR, the clipping proportion of speech at 70 dB SPL
was approximately 25% and the mean score was about 60% (Figure 13-4 in §13.3.1). For the
same clipping proportion, the mean score was 30 percentage points higher at 20
dB SNR. The output SNR was approximately 2.5 dB and 5 dB for the input SNRs of 10 dB
and 20 dB respectively. The output SNR degradation due to envelope clipping was the main
factor affecting the speech understanding at high presentation levels in noise.
Although envelope distortion has a low impact on speech intelligibility, it can become
prominent when the clipping proportion is more than 25%. The experiment with no AGC did
not show this effect because the background noise level was relatively high (i.e. close to the
saturation level) for the presentation levels above 83 dB SPL at 20 dB SNR. The output SNR
reduction occurred at high presentation level in noise because the envelopes of target speech
were clipped more than the background noise. If that was the case, then reducing clipping
proportion at high presentation level in noise should improve the output SNR and therefore
the speech intelligibility. The experimental results of the front-end compression limiter
(§8.2.4.2) partially supported the above statement. The speech intelligibility improvement
was seen at very high presentation levels, at 86 and 89 dB SPL, at 20 dB SNR and above 70
dB SPL at 10 dB SNR.
The purpose of the front-end compression limiter (§8.2.4.2) was to reduce the envelope
clipping at the LGF. However, it could not guarantee zero percent envelope clipping,
because the compression happened before the filterbank and the filterbank outputs might still
be above the saturation level of the LGF. This could happen for input stimuli with a low
crest factor. Envelope clipping was observed at high presentation levels with the front-end
compression limiter (§13.3.1). Performance was not proportionally improved with the
reduction of clipping proportion (Figure 13-4). Besides, the front-end compression limiter
could also introduce temporal envelope distortion and SNR reduction, only in noise, and
consequently reduce the speech intelligibility of cochlear implant recipients.
A novel compression limiter, known as the envelope profile limiter, was proposed to
improve the spectral envelope cues of input stimuli (§10.2.2). By using the saturation level
of the LGF as the compression threshold, the proposed envelope profile limiter could be
configured to avoid clipping completely. More importantly, it applied a single gain to all
channels to preserve the spectral envelope profile of input stimuli. The effect of avoiding
envelope clipping by preserving spectral profile was observed by comparing speech
intelligibility of cochlear implant recipients with the front-end compression limiter and with
the envelope profile limiter (§10.4.1). In addition to that, the effect of compression speed on
speech intelligibility was also observed. The experimental results showed that the
compression speed had a larger effect on speech intelligibility than the gain structure that
preserved spectral profile (§10.4). The AGC configuration with longer release time achieved
higher speech intelligibility for each AGC.
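The single-gain structure described above can be illustrated with a static sketch. It deliberately omits the attack and release dynamics that the thesis shows to be important, so it only demonstrates why one shared gain both avoids clipping and preserves the spectral profile. The function name `envelope_profile_limit` and the example values are hypothetical.

```python
import numpy as np

def envelope_profile_limit(env, saturation):
    """Apply one gain per frame to all channels so no envelope exceeds saturation.

    env: array of shape (channels, frames) of linear channel envelopes.
    Returns the limited envelopes and the gain applied in each frame.
    """
    peak = np.max(env, axis=0)  # largest channel envelope in each frame
    gain = np.minimum(1.0, saturation / np.maximum(peak, 1e-12))
    return env * gain, gain

# Toy example: the second frame exceeds the saturation level in one channel.
envelopes = np.array([[0.5, 2.0],
                      [0.25, 1.0]])
limited, gain = envelope_profile_limit(envelopes, saturation=1.0)
```

Because every channel in a frame is scaled by the same factor, the ratio between channels is unchanged, whereas clipping each channel independently at the saturation level would flatten the spectral peaks.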
Preserving the spectral envelope could also improve speech intelligibility, but only in some
test conditions. It was more effective when envelope distortion was more severe due to
fast compression; for example, the envelope profile limiter performed better than the
front-end compression limiter in quiet when the release time was 75 ms. However, in noise,
it was essential to reduce the compression speed for speech intelligibility. An additional
benefit of preserving the spectral envelope profile was observed when the release time was
625 ms. Figure 13-14 showed the effect of compression speed or release time quantified by the
output SNR, the ASMC and the NCM. All three signal metrics showed that longer release
time was important. Each AGC with the release time 625 ms improved the output SNR,
reduced the cross-modulation between target and non-target signals and preserved temporal
envelope shape more than the same AGC with the release time 75 ms.
The importance of a slow AGC was also observed when the existing AGC systems were
evaluated using the roving-level SRT tests (§9.2.4). The take-home experiment on the envelope
profile limiter showed that subjects preferred their standard program in noisy situations
because it contained ASC, which reduced the overall level when the background noise was high
(§11.4). AGCs with even slower time constants (i.e., a release time longer than 625 ms) were
necessary not only for speech intelligibility, but also for listening comfort in real life.
Whereas studies with hearing aid users showed mixed results on compression speed (Kates
2010), the experimental results in this thesis clearly indicated that a slow compression speed
in an AGC was of primary importance for cochlear implant recipients. An AGC with a fast
compression speed could do more harm to the channel envelopes than good through loudness
balancing. Hence a hypothetical AGC system that could bring benefits in both quiet and
noisy conditions would be the envelope profile limiter with a release time longer than 625 ms,
or a slow AGC followed by the envelope profile limiter placed before
the LGF.
A fast AGC, also known as WDRC, and a slow multichannel AGC called ADRO are
employed in hearing aids and Nucleus cochlear implant systems to improve audibility of
low-level sounds. Such systems improved speech intelligibility of low-level speech in quiet
but mixed results were shown in noise. WDRC has adverse effects in noise due to its fast
compression speed. ADRO, on the other hand, is a slow AGC, but the background noise rule
that precedes the audibility rule could compromise audibility. The standard parameter setting
of ADRO uses a less stringent background noise criterion to improve audibility.
For the normal hearing system, heavily distorted speech can still be intelligible because the
intelligibility is carried by redundant cues in time and frequency. Unlike normal hearing, it is
important for a cochlear implant system to preserve critical cues of speech to maintain
intelligibility because it cannot preserve all the acoustic features available in speech. AGCs
are important to maintain, if not improve, the available limited spectro-temporal cues. This
thesis has singled out effects of AGC that are important for speech intelligibility. The most
significant factor of AGC that could affect speech intelligibility is compression speed. The
effect is strongly related to the level of background noise. Many researchers have shown that
slow compression speed is important to maintain slow temporal envelope modulation. This
thesis showed that slow compression speed is not only important for slow temporal envelope
modulation but also for reducing the proportion of clipping as well as improving the output
SNR because it applies a larger amount of gain to input stimuli above the compression
threshold.
From the results of various studies on AGC systems in hearing aids, Kates (2010) suggested
that the parameters of hearing aids cannot be fixed for all hearing losses or listening
situations. Rather, a hearing aid should adjust its processing dynamically in response to the calculated
individual benefit. Similarly, in cochlear implant systems, signal processing should be
adaptable to listening situations. Conventional AGCs, slow and fast together, can
tackle the long-term and short-term level variation in input stimuli. They could extend the
upper and lower limits of the operating range, but the net range between them would remain
the same. With conventional AGC systems, the audibility of target signals is achieved at
the risk of increasing the level of competing signals because gain adjustment is done within a
fixed dynamic range.
A proposal was made in this thesis that the input dynamic range of cochlear implant systems
should follow the dynamic range of input stimuli. For example, a large dynamic range was
preferable for clean speech to include low-level components in the range. In contrast, a small
dynamic range was more appropriate in noisy conditions to exclude noise from the range
(Holden et al. 2011). A novel signal level optimization technique proposed in this thesis
could improve audibility without compromising on the level of background noise (§12.3).
The proposed algorithm, known as the Adaptive Loudness Growth Function (ALGF),
continually adjusted the dynamic range of the LGF to accommodate the input stimuli. It
controlled audibility and distortion on the channel envelopes by adjusting the saturation level
of the LGF. The saturation level regulator is a dual-loop level control with slow and fast time
constants and with the ability to preserve spectral envelope profile. In addition, it could also
control the level of background noise by adjusting the base level of the LGF. The ALGF has
the potential to reduce the level of background noise. The experimental results showed that
the ALGF achieved equal or better SRT in roving-level sentences compared to the existing
AGC systems (§12.5).
Based on the experimental results and anecdotal reports from the recipients, the ALGF was
considered a feasible alternative to conventional AGC systems. This thesis has taken
a first step in the research area of robust signal level optimization. Better speech
intelligibility of cochlear implant recipients in adverse listening conditions is anticipated
with this new approach.
14.3 Future Work
Future work is listed in order of priority, as anticipated by the present author.
Parameter fine-tuning of ALGF with signal metrics
It is almost impossible to find the optimal parameter set by evaluating speech intelligibility
of cochlear implant subjects with different parameter sets because the parameter space of the
ALGF is large. The best way to work on fine-tuning the parameter space is to use reliable
signal metrics to quantify the effects of changing the parameters. According to the
experimental work in this thesis, the output SNR metric can consistently predict the speech
intelligibility of recipients in noise. Therefore the optimal parameter set of the ALGF
should be worked out with the output SNR metric or other reliable signal metrics before a
take-home study is conducted.
Take-home experiment for ALGF
The output of the ALGF is likely to be presented at the same presentation level regardless of
the input presentation level. It would be interesting to find out how cochlear
implant subjects perceive sounds processed by the ALGF. Such a quality assessment is more
appropriately conducted in real-life listening conditions than in the laboratory. The benefit of
the ALGF on speech intelligibility has been shown by acute testing in the laboratory in
this thesis. A take-home study is still needed to complement the benefits shown by acute
testing. Because of the considerable amount of time required to implement the ALGF on the BTE
sound processor, such a study was not carried out in this thesis and is therefore noted as
future work.
Feature space exploration for ALGF
The ALGF is a feature-based dynamic range optimization algorithm. The features it utilized
were the clipping proportion in the saturation level regulator and the estimated noise magnitude in
the base level regulator. Other features could also be applied to improve speech
intelligibility, for example spectral envelope correlation or coherence between channels,
modulation rate and depth, and the energy profile of input stimuli. The feature space is large, so it is
important to extract useful features. Some are more important than others, as shown by
the experiments with the signal metrics (§13.3). For example, the output SNR was shown to be
important for speech in noise. Hence employing a noise estimator in the base level regulator
is an appropriate way to improve the output SNR.
Test strategy for evaluating AGC systems for cochlear implant systems
The current test strategy of a roving-level SRT test is to assess the performance of AGC
systems at low, normal and high presentation levels. It also assesses the dynamic behavior of
the AGC in tracking abrupt changes in the presentation level. The roving-level SRT test
was hypothesized to represent realistic listening conditions outside the laboratory (Haumann,
Lenarz and Büchner 2010). However, real-life scenarios in which the speech and
noise presentation levels rove between three levels are rare. It would be better if the test
condition represented a listening situation commonly encountered by recipients, such as
a conversation between a recipient and another person. Such a test condition can be simulated
by roving between two presentation levels, either 50 and 65 dB SPL or 65 and 80 dB SPL,
without losing the essence of the roving-level SRT test. The original roving-level SRT test of
Boyle et al. (2009) presented the background noise 0.5 seconds before and after each
sentence. The interleaved roving-level SRT test used for evaluating the ALGF presented the
background noise continuously. Both techniques increased or decreased the background
noise together with the presentation level of each sentence. Again, it is arguable that
acoustic scenarios in which the background noise changes between different levels are
rarely encountered. A scenario in which sentences rove between different levels in the
presence of background noise at a constant level is more common than the one just
described. It is important to evaluate the robustness of AGCs with roving sentence and noise
levels, but it is equally important for test conditions to reflect realistic listening
conditions. More research should be done on test methods for AGC systems.
Envelope profile limiter for bilateral cochlear implant sound processing
So far, the proposed envelope profile limiter has only been evaluated with unilateral implant
recipients. It is hypothesized that the envelope profile limiter could improve localization in
bilateral cochlear implant recipients by preserving the important binaural cue known as the
Inter-aural Level Difference (ILD). Figure 14-1 shows an example of a binaural envelope
profile limiter. A similar arrangement can be made with the front-end AGCs of the bilateral
processors (van Hoesel, Ramsden and O'Driscoll 2002). However, that arrangement cannot guarantee
that the ILD is preserved for input stimuli at high presentation levels. For example, even if the two
front-end AGCs synchronously reduce the signal level to maintain the ILD, envelope
clipping at the LGF can still destroy the relative level difference between the two envelopes and
therefore the ILD cues. With the bilateral envelope profile limiter, the relative level
difference between the two envelopes is preserved, because clipping never happens.
Figure 14-1 Application of the envelope profile limiter in the bilateral sound processing
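The ILD argument above can be illustrated with a small numerical sketch. It is not the envelope profile limiter of this thesis: it assumes a single channel per ear, a hard clip at a hypothetical saturation level SAT to stand in for clipping at the LGF, and a hypothetical common-gain rule (linked_limiter) to stand in for a limiter that is linked across the two ears. All names and values are illustrative assumptions.

```python
import math

SAT = 1.0  # hypothetical saturation level (linear envelope units)


def to_db(x):
    """Convert a linear envelope magnitude to decibels."""
    return 20.0 * math.log10(x)


def ild_db(left, right):
    """Inter-aural level difference in dB between two envelope magnitudes."""
    return to_db(left) - to_db(right)


def clip(envelope, saturation):
    """Hard clipping: anything above the saturation level is lost."""
    return min(envelope, saturation)


def linked_limiter(left, right, saturation):
    """Apply one common gain to both ears (an assumed rule, for illustration).

    The gain is chosen so that the larger envelope just reaches saturation,
    and it is never greater than 1.
    """
    gain = min(1.0, saturation / max(left, right))
    return left * gain, right * gain


# A loud source on the left: both envelopes exceed the saturation level.
left_in, right_in = 4.0, 2.0

ild_input = ild_db(left_in, right_in)
ild_clipped = ild_db(clip(left_in, SAT), clip(right_in, SAT))
left_out, right_out = linked_limiter(left_in, right_in, SAT)
ild_limited = ild_db(left_out, right_out)

print(f"input ILD   : {ild_input:.2f} dB")
print(f"after clip  : {ild_clipped:.2f} dB")
print(f"after limit : {ild_limited:.2f} dB")
```

In this toy case independent clipping forces both envelopes to the same value and the level difference vanishes, whereas the common gain leaves the ratio between the two envelopes, and hence the ILD, unchanged.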
Envelope profile limiter for music perception
In cochlear implant music perception, amplitude modulation of the envelopes provides an
important temporal cue to pitch (Laneau, Wouters and Moonen 2004; Swanson 2008). Since
the envelope profile limiter preserves the envelope modulation, it may improve melody
recognition. Therefore the application of the envelope profile limiter to cochlear implant
music perception is noted as future work.
Appendix 1: Subjects
The subject details are listed in Table A-1. The subjects were post-lingually deafened adults
with the Nucleus 24 (CI24M, CI24R), Nucleus Freedom (CI24RE) or Nucleus 5 (CI512)
cochlear implants. The last column indicates the per-channel stimulation rate used in their
usual processor. All subjects used the ACE strategy and the CP810 sound processor at the
time of testing. Bilateral subjects used only one implant of their choice and bimodal subjects
turned off their hearing aids during the test.
Subject  Sex  Age   Aetiology                Number of  Implant   Implant     Number of  Number of  PPS
              (yr)                           implants   use (yr)  type        channels   maxima
S1       M    75    Unknown                  Bimodal    > 5       CI24RE      20         10         1800
S2       M    86    Unknown                  Monaural   > 12      CI24M       22         12         720
S3       M    74    Unknown                  Bilateral  > 2.5     CI24RE      22         8          900
S4       M    70    Familial                 Bimodal    > 3.5     CI24RE      22         8          900
S5       F    49    Congenital, Progressive  Bilateral  > 6.5     CI24R       22         9          1200
S6       F    64    Unknown                  Bilateral  > 5.5     CI24M       22         12         900
S7       M    41    Unknown                  Bimodal    > 2.5     CI512       22         8          900
S8       F    73    Otosclerosis             Bilateral  > 10.5    CI24M       22         14         720
S9       F    62    Unknown                  Monaural   > 13      CI24R(ST)   22         12         900
S10      M    79    Progressive              Bilateral  > 5       CI24R(CA)   22         12         900
Table A-1 Subject's biographical information, device use and stimulation information
Table A-2 is a cross-reference listing the experiments participated in by each subject.
Experiment                                                       Results section   S1-S10 (one ✓ per participating subject)
P-I functions of no AGC and front-end limiter                    8.2.4             ✓ ✓ ✓ ✓
Existing AGC systems evaluation                                  9.2.4             ✓ ✓ ✓ ✓ ✓ ✓ ✓
Release time vs. AGC structure                                   10.4.1            ✓ ✓ ✓ ✓ ✓ ✓
P-I functions of front-end limiter and envelope profile limiter  10.4.2            ✓ ✓ ✓ ✓ ✓ ✓
Take-home trial of the envelope profile limiter                  11.4              ✓ ✓ ✓ ✓ ✓
ALGF-1 vs. Tri + ADRO                                            12.5.2.1.1        ✓ ✓ ✓ ✓
                                                                 12.5.2.1.2        ✓ ✓ ✓ ✓
                                                                 12.5.2.1.2        ✓ ✓ ✓ ✓
ALGF-2 vs. Tri + ADRO                                            12.5.3.1.1        ✓ ✓ ✓ ✓ ✓ ✓
                                                                 12.5.3.1.2        ✓ ✓ ✓
                                                                 12.5.3.1.2        ✓ ✓ ✓ ✓ ✓
ALGF-2: adaptive vs. fixed dynamic range                         12.5.4.1.1        ✓ ✓ ✓ ✓ ✓
                                                                 12.5.4.1.2        ✓ ✓ ✓ ✓
Table A-2 Subject-experiment participation
Appendix 2: Cochlear Implant Clinical Questionnaire
Comparison of Two Conditions
Name: _______________________________ Date: ________________
Speech Processor: ______________________ Project: _______________________
We are interested in knowing which of the two programs or speech processors (P1 or P2)
that you have been comparing performs best in your daily life.
In this questionnaire you are asked to judge the helpfulness of each program or speech
processor in a variety of listening situations. You are asked to judge the benefit of the
processors or programs in each situation, NOT the difficulty of the situation itself.
To answer each question, circle the response that indicates how helpful each processor or
program is:
A: Ext. Helpful   B: Very Helpful   C: Helpful   D: Little Help   E: No Help   Not Applicable
Mark ONE rating for each processor or program.
The “Not Applicable” response box is provided if you have not experienced the situation.
We know that not all people talk alike. Some mumble, others talk too fast, and others talk
without moving their lips very much. Please answer the questions according to the way most
people talk.
1. You are watching the news on TV.
P 1: A B C D E
P 2: A B C D E Not Applicable
2. You are at home talking to a friend or member of your family who is in the next room.
P 1: A B C D E
3. You are in a busy shopping centre. There is a lot of background noise and you are in
conversation with a friend.
P 1: A B C D E
4. You are speaking to a softly spoken person in a room without any background noise.
P 1: A B C D E
5. You are listening to the news on the radio in a quiet room.
P 1: A B C D E
6. You are in a crowded grocery store checkout line and are talking with the cashier.
P 1: A B C D E
7. You are talking with a small group of friends or family in a quiet room.
P 1: A B C D E
8. You are talking to a familiar person on the telephone.
P 1: A B C D E
9. You are talking to a friend or family member about 2-3 feet (1 metre) away in a quiet
room at your home.
P 1: A B C D E
10. You are talking to a familiar person in quiet conditions outside.
P 1: A B C D E
11. You are listening to soft sounds in your environment (such as a refrigerator motor, birds
at a distance, or water boiling on the stove).
P 1: A B C D E
12. You are talking with one other familiar person in a quiet carpeted room.
P 1: A B C D E
13. You are travelling in a car in noisy traffic with some of the windows down. You are
having a conversation with one other person.
P 1: A B C D E
14. You are talking with someone across the other side of the room. The other person is
speaking in a normal voice.
P 1: A B C D E
15. You are talking to a friend or family member and the TV is loud in the background.
P 1: A B C D E
16. You are listening to music on your stereo.
P 1: A B C D E
17. You are near a busy road talking with a friend.
P 1: A B C D E
18. You are talking with a small group of people in a busy restaurant or café.
P 1: A B C D E
Program Preference
What is your overall preferred program when listening in quiet?
☐ P1 ☐ P2 ☐ no difference
Using your preferred program in quiet, how would you describe the sound quality?
☐ Very similar to the other programs
☐ Slightly better than the other programs
☐ Moderately better than the other programs
☐ Much better than the other programs
What is your overall preferred program when listening in noise?
☐ P1 ☐ P2 ☐ no difference
Using your preferred program in noise, how would you describe the sound quality?
☐ Very similar to the other programs
☐ Slightly better than the other programs
☐ Moderately better than the other programs
☐ Much better than the other programs
THANK YOU FOR COMPLETING THE QUESTIONNAIRE.
Appendix 3: Statistics
This appendix describes the statistical methods used to analyze the speech scores of the CI
subjects who participated in the experiments in this thesis. Since the time of CI subjects who
volunteer in experiments is precious, the number of trials should be limited, but sufficient to
show a significant difference between two processing conditions.
Binomial Test
In the fixed presentation level sentence tests, the number of correct morphemes was counted
for each sentence, and summed across sentences. The score was the proportion of correct
morphemes. In all experiments in this thesis, we are interested in comparing the subject’s
performance under two processing conditions. A hypothesis test is performed to determine
whether one processing condition is significantly better than the other.
Swanson (2008) developed an efficient way to determine the statistical significance of the
difference between two proportions, using Monte Carlo simulation (Simon 1997). The
analysis scripts are part of the Nucleus Matlab Toolbox and were used in this thesis. The
following steps explain the method, which is implemented in the function
Difference_between_paired_proportions.
>> Difference_between_paired_proportions([X1, X2], N, 'monte carlo');
X1 and X2 are the numbers of correct morphemes in each condition, and N is the total
number of morphemes in each condition. The appropriate null hypothesis is that both
processing conditions give equal probability of a correct response. If the null hypothesis is
true, then the best estimate of this common probability p0 is:
>> p0 = (X1 + X2) / (2 * N);
Next, two sets of random samples are generated assuming that they come from the same
binomial distribution with probability p0, as stated in the null hypothesis.
>> sim_x1 = binornd(N, p0, 1, num_sim);
>> sim_x2 = binornd(N, p0, 1, num_sim);
num_sim is the number of random samples. The difference between the simulated data
sim_x1 and sim_x2 is found.
>> sim_d = sim_x1 - sim_x2;
The difference between the actual data X1 and X2 is determined.
>> actual_d = X1 - X2;
Then the difference calculated from the simulated data, and the difference between the
measured data X1 and X2, are compared.
>> p_diff = sum(sim_d >= actual_d)/num_sim;
If p_diff < 0.05 then the difference is significant and the null hypothesis is
rejected. That is, if both conditions had the same probability of a correct response, a difference at
least as large as the one observed would occur in fewer than 5% of the simulated cases. Hence X1
and X2 are statistically significantly different.
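The steps above can be sketched end to end as follows. This is a minimal Python illustration using only the standard library, not the Nucleus Matlab Toolbox implementation. The function names (binomial_sample, paired_proportion_p_value), the default number of simulations, the fixed seed and the example scores are all assumptions made for the illustration. As in the steps above, each condition is simulated with N trials and the test is one-sided.

```python
import random


def binomial_sample(n, p, rng):
    """Draw one binomial(n, p) sample as the sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def paired_proportion_p_value(x1, x2, n, num_sim=10000, seed=0):
    """Monte Carlo estimate of P(simulated difference >= observed difference).

    x1, x2 : numbers of correct morphemes in the two conditions
    n      : total number of morphemes in each condition
    """
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    p0 = (x1 + x2) / (2 * n)           # common probability under the null hypothesis
    actual_d = x1 - x2                 # observed difference
    count = 0
    for _ in range(num_sim):
        sim_d = binomial_sample(n, p0, rng) - binomial_sample(n, p0, rng)
        if sim_d >= actual_d:
            count += 1
    return count / num_sim


# Hypothetical scores out of 100 morphemes per condition.
p_similar = paired_proportion_p_value(50, 50, 100, num_sim=2000)
p_different = paired_proportion_p_value(90, 50, 100, num_sim=2000)

print("equal scores     :", p_similar)
print("different scores :", p_different)
```

With equal scores the estimated probability stays well above 0.05, so the null hypothesis is retained. With a large difference in favour of the first condition it falls below 0.05.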
References
ANSI, A. (1996). "S3.22-1996, Specification of hearing aid characteristics". New York:
American National Standards Institute.
ANSI, A. (1997). "S3.5-1997, Methods for the calculation of the speech intelligibility
index". New York: American National Standards Institute.
Bacon, S. P., R. R. Fay, et al. (2004). "Compression: from cochlea to cochlear implants",
Springer Verlag.
Bench, J., Å. Kowal, et al. (1979). "The BKB (Bamford-Kowal-Bench) sentence lists for
partially-hearing children." British Journal of Audiology 13(3): 108-112.
Blamey, P. (2005). "Adaptive dynamic range optimization (ADRO): A digital amplification
strategy for hearing aids and cochlear implants." Trends in Amplification 9(2): 77.
Blamey, P., P. Arndt, et al. (1996). "Factors affecting auditory performance of
postlinguistically deaf adults using cochlear implants." Audiology and Neurotology
1(5): 293-306.
Blamey, P., D. Macfarlane, et al. (2005). "An intrinsically digital amplification scheme for
hearing aids." EURASIP Journal on Applied Signal Processing 18: 3026-3033.
Bondarew, V. and P. Seligman (2012). "The Cochlear Story", CSIRO Publishing.
Boothroyd, A., F. N. Erickson, et al. (1994). "The hearing aid input: a phonemic approach to
assessing the spectral distribution of speech." Ear and hearing 15(6): 432.
Boothroyd, A. and S. Nittrouer (1988). "Mathematical treatment of context effects in
phoneme and word recognition." The Journal of the Acoustical Society of America
84: 101.
Boyle, P. J., A. Büchner, et al. (2009). "Comparison of dual-time-constant and fast-acting
automatic gain control (AGC) systems in cochlear implants." International Journal
of Audiology 48(4): 211-221.
Boyle, P. J., T. B. Nunn, et al. (2013). "STARR: A Speech Test for Evaluation of the
Effectiveness of Auditory Prostheses Under Realistic Conditions." Ear and hearing
34(2): 203-212.
Brand, T. and B. Kollmeier (2002). "Efficient adaptive procedures for threshold and
concurrent slope estimates for psychophysics and speech intelligibility tests." The
Journal of the Acoustical Society of America 111: 2801.
Bustamante, D. K. and L. D. Braida (1987). "Multiband compression limiting for
hearing-impaired listeners." Journal of Rehabilitation Research and Development 24(4):
149-160.
Byrne, D., H. Dillon, et al. (2001). "NAL-NL1 procedure for fitting nonlinear hearing aids:
Characteristics and comparisons with other procedures." Journal of the American
Academy of Audiology 12(1): 37-51.
Cameron, S. and H. Dillon (2007). "Development of the listening in spatialized
noise-sentences test (LISN-S)." Ear and hearing 28(2): 196-211.
Chen, F. and P. C. Loizou (2010). "Analysis of a simplified normalized covariance measure
based on binary weighting functions for predicting the intelligibility of noise-
suppressed speech." Journal of the Acoustical Society of America 128(6): 3715-
3723.
Chen, F. and P. C. Loizou (2011a). "Modeling speech intelligibility by cochlear implant
users". Conference on Implantable Auditory Prostheses, Pacific Grove, California,
USA.
Chen, F. and P. C. Loizou (2011b). "Predicting the intelligibility of vocoded speech." Ear
and hearing 32(3): 331.
Clark, G. (2003). "Cochlear implants: fundamentals and applications", Springer Verlag.
Cohen, I. and B. Berdugo (2002). "Noise estimation by minima controlled recursive
averaging for robust speech enhancement." Signal Processing Letters, IEEE 9(1):
12-15.
Cosendai, G. and M. Pelizzone (2001). "Effects of the Acoustical Dynamic Range on Speech
Recognition with Cochlear Implants: Efectos en el rango dinámico del
reconocimiento del habla con implantes cocleares." International Journal of
Audiology 40(5): 272-281.
Crain, T. R. and E. W. Yund (1995). "The effect of multichannel compression on vowel and
stop-consonant discrimination in normal-hearing and hearing-impaired subjects."
Ear and hearing 16(5): 529-543.
Davies-Venn, E., P. Souza, et al. (2007). "Speech and music quality ratings for linear and
nonlinear hearing aid circuitry." Journal of the American Academy of Audiology
18(8): 688-699.
Dawson, P., J. Decker, et al. (2004). "Optimizing dynamic range in children using the
Nucleus cochlear implant." Ear and hearing 25(3): 230.
Dawson, P. W., A. A. Hersbach, et al. (2013). "An adaptive Australian Sentence Test In
Noise (AuSTIN)." Ear and hearing in press.
Dawson, P. W., S. J. Mauger, et al. (2011). "Clinical Evaluation of Signal-to-Noise
Ratio–Based Noise Reduction in Nucleus® Cochlear Implant Recipients." Ear and hearing
32(3): 382.
Dawson, P. W., A. E. Vandali, et al. (2007). "Clinical evaluation of expanded input dynamic
range in nucleus cochlear implants." Ear and hearing 28(2): 163-176.
De Gennaro, S., L. Braida, et al. (1986). "Multichannel syllabic compression for severely
impaired listeners." Journal of Rehabilitation Research and Development 23(1): 17.
Dillon, H. (2001). "Hearing aids", Boomerang press.
Djourno, A. and C. Eyries (1957). "Auditory prosthesis by means of a distant electrical
stimulation of the sensory nerve with the use of an indwelt coiling." La Presse
médicale 65(63): 1417.
Dreschler, W. A. (1992). "Fitting multichannel-compression hearing aids." International
Journal of Audiology 31(3): 121-131.
Drullman, R. (1995). "Temporal envelope and fine structure cues for speech intelligibility."
Journal of the Acoustical Society of America 97(1): 585-592.
Drullman, R., J. M. Festen, et al. (1994a). "Effect of reducing slow temporal modulations on
speech reception." Journal of the Acoustical Society of America 95(5): 2670-2680.
Drullman, R., J. M. Festen, et al. (1994b). "Effect of temporal envelope smearing on speech
reception." Journal of the Acoustical Society of America 95(2): 1053-1064.
Dubno, J., A. Horwitz, et al. (2005). "Word recognition in noise at higher-than-normal
levels: Decreases in scores and increases in masking." The Journal of the Acoustical
Society of America 118: 914.
Dunn, H. and S. White (1940). "Statistical measurements on conversational speech." The
Firszt, J. B. (2003). "HiResolution sound processing." Advanced Bionics White Paper.
Sylmar, Calif: Advanced Bionics Corp.
Firszt, J. B., L. K. Holden, et al. (2004). "Recognition of speech presented at soft to loud
levels by adult cochlear implant recipients of three cochlear implant systems." Ear
and hearing 25(4): 375.
Fletcher, H. and W. Munson (1937). "Relation between loudness and masking." J. Acoust.
Soc. Am. 9: 1-10.
Fletcher, H. and W. A. Munson (1933). "Loudness, its definition, measurement and
calculation." The Journal of the Acoustical Society of America 5(2): 82-108.
French, N. and J. Steinberg (1947). "Factors governing the intelligibility of speech sounds."
The Journal of the Acoustical Society of America 19(1): 90-119.
Friesen, L. M., R. V. Shannon, et al. (2001). "Speech recognition in noise as a function of the
number of spectral channels: comparison of acoustic hearing and cochlear implants."
The Journal of the Acoustical Society of America 110: 1150.
Fu, Q.-J. and J. J. Galvin III (2008). "Maximizing cochlear implant patients’ performance
with advanced speech training procedures." Hearing research 242(1): 198-208.
Fu, Q.-J. and R. V. Shannon (1998a). "Effects of amplitude nonlinearity on phoneme
recognition by cochlear implant users and normal-hearing listeners." The Journal of
the Acoustical Society of America 104: 2570.
Fu, Q.-J., R. V. Shannon, et al. (1998). "Effects of noise and spectral resolution on vowel
and consonant recognition: Acoustic and electric hearing." The Journal of the
Acoustical Society of America 104: 3586.
Fu, Q. and R. Shannon (1998b). "Effects of amplitude nonlinearity on phoneme recognition
by cochlear implant users and normal-hearing listeners." The Journal of the
Füllgrabe, C., M. A. Stone, et al. (2009). "Contribution of very low amplitude-modulation
rates to intelligibility in a competing-speech task." The Journal of the Acoustical
Gatehouse, S., G. Naylor, et al. (2006). "Linear and nonlinear hearing aid fittings--1. Patterns
of benefit." International Journal of Audiology 45(3): 130.
Gifford, R. H. (2011). "Who is a cochlear implant candidate?" The Hearing Journal 64(6):
16-18.
Gifford, R. H., J. K. Shallop, et al. (2008). "Speech recognition materials and ceiling effects:
considerations for cochlear implant programs." Audiology and Neurotology 13(3):
193-205.
Goldsworthy, R. and J. Greenberg (2004). "Analysis of speech-based speech transmission
index methods with implications for nonlinear operations." The Journal of the
Goorevich, M. (2005). "An Algorithmic Testbench for Cochlear Implant DSP Speech
Processors". Department of Electronics of the Division of Information and
Communication Sciences, Macquarie University, Sydney, Australia. Master of
Science. 289
Greenberg, S. (1996). "Auditory processing of speech." Principles of experimental
phonetics: 362-407.
Greenberg, S., W. A. Ainsworth, et al. (2004). "Speech processing in the auditory system",
Springer Berlin.
Hagerman, B. and A. Olofsson (2004). "A method to measure the effect of noise reduction
algorithms using simultaneous speech and noise." Acta Acustica united with
Acustica 90(2): 356-361.
Hansen, M. (2002). "Effects of multi-channel compression time constants on subjectively
perceived sound quality and speech intelligibility." Ear and hearing 23(4): 369-380.
Haumann, S., T. Lenarz, et al. (2010). "Speech Perception with Cochlear Implants as
Measured Using a Roving-Level Adaptive Test Method." ORL 72(6): 312-318.
Henry, B. A., C. M. McKay, et al. (2000). "The relationship between speech perception and
electrode discrimination in cochlear implantees." The Journal of the Acoustical
Society of America 108(3): 1269-1280.
Hersbach, A. A., K. Arora, et al. (2012). "Combining Directional Microphone and Single-
Channel Noise Reduction Algorithms: A Clinical Evaluation in Difficult Listening
Conditions With Cochlear Implant Users." Ear and hearing 33(4): e13-e23.
Hirsch, H. and C. Ehrlicher (1995). "Noise estimation techniques for robust speech
recognition". Acoustics, Speech, and Signal Processing, 1995. ICASSP-95., 1995
International Conference on, IEEE.
Hochmair-Desoyer, I., E. Schulz, et al. (1997). "The HSM sentence test as a tool for
evaluating the speech understanding in noise of cochlear implant users." Otology &
Neurotology 18(6): S83-S86.
Holden, L. K., R. M. Reeder, et al. (2011). "Optimizing the perception of soft speech and
speech in noise with the Advanced Bionics cochlear implant system." International
Journal of Audiology 50(4): 255-269.
Holube, I. and B. Kollmeier (1996). "Speech intelligibility prediction in hearing impaired
listeners based on a psychoacoustically motivated perception model." The Journal of
House, W. F. and J. Urban (1973). "Long term results of electrode implantation and
electronic stimulation of the cochlea in man." The Annals of otology, rhinology, and
laryngology 82(4): 504.
Houtgast, T. and H. J. M. Steeneken (1973). "The modulation transfer function in room
acoustics as a predictor of speech intelligibility." The Journal of the Acoustical
Houtgast, T. and H. J. M. Steeneken (1985). "A review of the MTF concept in room
acoustics and its use for estimating speech intelligibility in auditoria." The Journal of
Hu, Y. and P. Loizou (2008). "A new sound coding strategy for suppressing noise in
cochlear implants." The Journal of the Acoustical Society of America 124: 498.
IEC (1997). "IEC 60118-2, Hearing aids, Part 2: Hearing aids with automatic gain control
circuits".
Iwaki, T., P. Blamey, et al. (2008). "Bimodal studies using adaptive dynamic range
optimization (ADRO) technology." International Journal of Audiology 47(6): 311-
318.
James, C. J., P. J. Blamey, et al. (2002). "Adaptive dynamic range optimization for cochlear
implants: a preliminary study." Ear and hearing 23(1): 49S.
James, C. J., M. W. Skinner, et al. (2003). "An investigation of input level range for the
Nucleus 24 cochlear implant system: speech perception performance, program
preference, and loudness comfort ratings." Ear and hearing 24(2): 157.
Jenstad, L. M. and P. E. Souza (2005). "Quantifying the effect of compression hearing aid
release time on speech acoustics and intelligibility." Journal of Speech, Language
and Hearing Research 48(3): 651.
Kates, J. M. (1991). "A time-domain digital cochlear model." Signal Processing, IEEE
Transactions on 39(12): 2573-2592.
Kates, J. M. (2010). "Understanding compression: Modeling the effects of dynamic-range
compression in hearing aids." International Journal of Audiology 49(6): 395-409.
Killion, M. C. (1978). "Revised estimate of minimum audible pressure: Where is
the 'missing 6 dB'?" The Journal of the Acoustical Society of America 63: 1501.
King, A. and M. Martin (1984). "Is AGC beneficial in hearing aids?" British Journal of
Audiology 18(1): 31-38.
Klatt, D. H. (1989). "Review of selected models of speech perception". Lexical
representation and process, MIT Press.
Kral, A. and G. M. O'Donoghue (2010). "Profound deafness in childhood." New England
Journal of Medicine 363(15): 1438-1450.
Kryter, K. D. (1962a). "Methods for the calculation and use of the articulation index." The
Kryter, K. D. (1962b). "Validation of the articulation index." The Journal of the Acoustical
Laneau, J., J. Wouters, et al. (2004). "Relative contributions of temporal and place pitch cues
to fundamental frequency discrimination in cochlear implantees." The Journal of the
Laurence, R. F., B. C. Moore, et al. (1983). "A comparison of behind-the-ear high-fidelity
linear hearing aids and two-channel compression aids, in the laboratory and in
everyday life." British Journal of Audiology 17(1): 31-48.
Lazard, D. S., C. Vincent, et al. (2012). "Pre-, per-and postoperative factors affecting
performance of postlinguistically deaf adults using cochlear implants: a new
conceptual model over time." PloS one 7(11): e48739.
Levitt, H. (1978). "Adaptive testing in audiology." Scandinavian audiology.
Supplementum(6): 241.
Licklider, J. and I. Pollack (1948). "Effects of differentiation, integration, and infinite peak
clipping upon the intelligibility of speech." The Journal of the Acoustical Society of
America 20: 42.
Licklider, J. C. and G. A. Miller (1951). "The perception of speech."
Lin, L. (2004). "Speech Processing in the Auditory Filter Domain". School of Electrical
Engineering & Telecommunications, The University of New South Wales, Sydney,
Australia. Ph.D. 154
Lin, L., W. Holmes, et al. (2003a). "Adaptive noise estimation algorithm for speech
enhancement." Electronics Letters 39(9): 754-755.
Lin, L., W. Holmes, et al. (2003b). "Subband noise estimation for speech enhancement using
a perceptual Wiener filter". Acoustics, Speech, and Signal Processing, 2003.
Proceedings.(ICASSP'03). 2003 IEEE International Conference on, IEEE.
Lippmann, R., L. Braida, et al. (1981). "Study of multichannel amplitude compression and
linear amplification for persons with sensorineural hearing loss." The Journal of the
Loizou, P. C. (2007). "Speech enhancement: theory and practice", CRC.
Loizou, P. C., M. Dorman, et al. (2000). "Speech recognition by normal-hearing and
cochlear implant listeners as a function of intensity resolution." The Journal of the
Loizou, P. C., M. Dorman, et al. (1999). "On the number of channels needed to understand
speech." The Journal of the Acoustical Society of America 106: 2097.
Lyon, R. (1982). "A computational model of filtering, detection, and compression in the
cochlea". Acoustics, Speech, and Signal Processing, IEEE International Conference
on ICASSP'82., IEEE.
Lyon, R. (1983). "A computational model of binaural localization and separation". Acoustics,
Speech, and Signal Processing, IEEE International Conference on ICASSP'83.,
IEEE.
Lyon, R. (1984). "Computational models of neural auditory processing". Acoustics, Speech,
and Signal Processing, IEEE International Conference on ICASSP'84., IEEE.
Lyon, R. F. and C. Mead (1988). "An analog electronic cochlea." Acoustics, Speech and
Signal Processing, IEEE Transactions on 36(7): 1119-1134.
Ma, J., Y. Hu, et al. (2009). "Objective measures for predicting speech intelligibility in noisy
conditions based on new band-importance functions." The Journal of the Acoustical
Mackersie, C. L. (2002). "Tests of speech perception abilities." Current Opinion in
Otolaryngology & Head and Neck Surgery 10(5): 392-397.
Malah, D., R. V. Cox, et al. (1999). "Tracking speech-presence uncertainty to improve
speech enhancement in non-stationary noise environments". Acoustics, Speech, and
Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on,
IEEE.
Marriage, J. E., B. C. J. Moore, et al. (2005). "Effects of three amplification strategies on
speech perception by children with severe and profound hearing loss." Ear and
hearing 26(1): 35-47.
Martin, R. (1994). "Spectral subtraction based on minimum statistics." power 6: 8.
Martin, R. (2001). "Noise power spectral density estimation based on optimal smoothing and
minimum statistics." Speech and Audio Processing, IEEE Transactions on 9(5): 504-
512.
McDermott, H. J., K. R. Henshall, et al. (2002). "Benefits of syllabic input compression for
users of cochlear implants." Journal of the American Academy of Audiology 13(1):
14-24.
McDermott, H. J., C. M. Mckay, et al. (1992). "A new portable sound processor for the
University of Melbourne/Nucleus Limited multielectrode cochlear implant." The
McDermott, H. J., A. E. Vandali, et al. (1993). "A portable programmable digital sound
processor for cochlear implant research." Rehabilitation Engineering, IEEE
Transactions on 1(2): 94-100.
McKay, C. M., H. J. McDermott, et al. (1994). "Pitch percepts associated with
amplitude-modulated current pulse trains in cochlear implantees." The Journal of the
Miller, G. A. (1947). "The masking of speech." Psychological Bulletin 44(2): 105.
Moore, B. (2000). "Use of a loudness model for hearing aid fitting. IV. Fitting hearing aids
with multi-channel compression so as to restore 'normal' loudness for speech at
different levels." British Journal of Audiology 34(3): 165-177.
Moore, B. C. J. (2003a). "An Introduction to the Psychology of Hearing", Academic Press.
Moore, B. C. J. (2003b). "Speech processing for the hearing-impaired: successes, failures,
and implications for speech mechanisms." Speech communication 41(1): 81-91.
Moore, B. C. J. (2008). "The choice of compression speed in hearing aids: Theoretical and
practical considerations and the role of individual differences." Trends in
Amplification 12(2): 103.
Moore, B. C. J. and B. R. Glasberg (1988). "A comparison of four methods of implementing
automatic gain control (AGC) in hearing aids." British Journal of Audiology 22(2):
93-104.
Moore, B. C. J., B. R. Glasberg, et al. (1991). "Optimization of a slow-acting automatic gain
control system for use in hearing aids." British Journal of Audiology 25(3): 171-182.
Moore, B. C. J., R. W. Peters, et al. (1999). "Benefits of linear amplification and
multichannel compression for speech comprehension in backgrounds with spectral
and temporal dips." The Journal of the Acoustical Society of America 105: 400.
Muller-Deile, J., J. Kiefer, et al. (2008). "Performance benefits for adults using a cochlear
implant with adaptive dynamic range optimization (ADRO): a comparative study."
Cochlear Implants International 9(1): 8.
Neal, T. (2011). "Acoustic Processing Method and Apparatus". US Patent. 20110135129.
Nelson, D., D. Van Tasell, et al. (1995). "Electrode ranking of 'place pitch' and speech
recognition in electrical hearing." Journal of the Acoustical Society of America
98(4): 1987-1999.
Nelson, D. A., J. L. Schmitz, et al. (1996). "Intensity discrimination as a function of stimulus
level with electric stimulation." The Journal of the Acoustical Society of America
100: 2393.
Neuman, A. C., M. H. Bakke, et al. (1998). "The effect of compression ratio and release time
on the categorical rating of sound quality." The Journal of the Acoustical Society of
America 103: 2273.
Nie, K., A. Barco, et al. (2006). "Spectral and temporal cues in cochlear implant speech
perception." Ear and hearing 27(2): 208-217.
Nilsson, M., S. D. Soli, et al. (1994). "Development of the Hearing in Noise Test for the
measurement of speech reception thresholds in quiet and in noise." The Journal of
Nogueira, W., A. Büchner, et al. (2005). "A psychoacoustic 'NofM'-type speech coding
strategy for cochlear implants." EURASIP J. Appl. Sig. Process: 3044–3059.
Nysen, P. A. (1980). "Recursive Percentile Estimator". US Patent. 4204260.
Olsen, W. O. (1998). "Average Speech Levels and Spectra in Various Speaking/Listening
Conditions: A Summary of the Pearson, Bennett, & Fidell (1977) Report." Am J
Audiol 7(2): 21-25.
Patrick, J. F., P. A. Busby, et al. (2006). "The development of the Nucleus Freedom cochlear
implant system." Trends in Amplification 10(4): 175-200.
Pavlovic, C. V. (1984). "Use of the articulation index for assessing residual auditory function
in listeners with sensorineural hearing impairment." The Journal of the Acoustical
Society of America 75(4): 1253.
Pavlovic, C. V. (1987). "Derivation of primary parameters and procedures for use in speech
intelligibility predictions." The Journal of the Acoustical Society of America 82: 413.
Pavlovic, C. V. and G. A. Studebaker (1984). "An evaluation of some assumptions
underlying the articulation index." The Journal of the Acoustical Society of America
75: 1606.
Pavlovic, C. V., G. A. Studebaker, et al. (1986). "An articulation index based procedure for
predicting the speech recognition performance of hearing-impaired individuals." The
Pearsons, K., R. Bennett, et al. (1976). "Speech levels in various environments. Report to the
Office of Resources & Development." Environmental Protection Agency, BBN
Report 3281.
Pearsons, K. S., R. L. Bennett, et al. (1977). "Speech levels in various noise environments".
Washington, DC, Office of Health and Ecological Effects, Office of Research and
Development, US EPA.
Plomp, R. (1983). "Perception of speech as a modulated signal". Proceedings of the Tenth
International Congress of Phonetic Sciences, Dordrecht, Foris.
Plomp, R. (1988). "The negative effect of amplitude compression in multichannel hearing
aids in the light of the modulation transfer function." The Journal of the Acoustical
Plomp, R. (1994). "Noise, amplification, and compression: Considerations of three main
issues in hearing aid design." Ear and hearing 15(1): 2.
Qin, M. K. and A. J. Oxenham (2003). "Effects of simulated cochlear-implant processing on
speech reception in fluctuating maskers." The Journal of the Acoustical Society of
America 114: 446.
Rhebergen, K., N. Versfeld, et al. (2008a). "Prediction of the intelligibility for speech in real-
life background noises for subjects with normal hearing." Ear and hearing 29(2):
169.
Rhebergen, K., N. Versfeld, et al. (2009). "The dynamic range of speech, compression, and
its effect on the speech reception threshold in stationary and interrupted noise." The
Rhebergen, K. S. and N. J. Versfeld (2005). "A speech intelligibility index-based approach
to predict the speech reception threshold for sentences in fluctuating noise for
normal-hearing listeners." The Journal of the Acoustical Society of America 117:
2181.
Rhebergen, K. S., N. J. Versfeld, et al. (2006). "Extended speech intelligibility index for the
prediction of the speech reception threshold in fluctuating noise." The Journal of the
Rhebergen, K. S., N. J. Versfeld, et al. (2008b). "Quantifying and modeling the acoustic
effects of compression on speech in noise." The Journal of the Acoustical Society of
America 123(5): 3167-3167.
Ris, C. and S. Dupont (2001). "Assessing local noise level estimation methods: Application
to noise robust ASR." Speech communication 34(1): 141-158.
Rosen, S. (1992). "Temporal information in speech: acoustic, auditory and linguistic
aspects." Philosophical Transactions of the Royal Society of London. Series B:
Biological Sciences 336(1278): 367-373.
Scollie, S., R. Seewald, et al. (2005). "The desired sensation level multistage input/output
algorithm." Trends in Amplification 9(4): 159-197.
Seligman, P. (2000). "Automatic sensitivity control", US Patent 6,151,400.
Seligman, P. and L. Whitford (1995). "Adjustment of appropriate signal levels in the Spectra
22 and mini speech processors." The Annals of otology, rhinology & laryngology.
Supplement 166: 172.
Shannon, R., Q. Fu, et al. (2001). "Critical cues for auditory pattern recognition in speech:
Implications for cochlear implant speech processor design." Physiological and
Psychological Bases of Auditory Function.
Shannon, R. V. (1983). "Multichannel electrical stimulation of the auditory nerve in man. I.
Basic psychophysics." Hearing research 11(2): 157-189.
Shannon, R. V. (1992). "Temporal modulation transfer functions in patients with cochlear
implants." The Journal of the Acoustical Society of America 91: 2156.
Shannon, R. V., F.-G. Zeng, et al. (1995). "Speech recognition with primarily temporal
cues." Science 270(5234): 303-304.
Shannon, R. V., F.-G. Zeng, et al. (1998). "Speech recognition with altered spectral
distribution of envelope cues." The Journal of the Acoustical Society of America
104: 2467.
Shi, L.-F. and K. A. Doherty (2008). "Subjective and objective effects of fast and slow
compression on the perception of reverberant speech in listeners with hearing loss."
Journal of Speech, Language and Hearing Research 51(5): 1328.
Simon, J. (1997). "Resampling: the new statistics", Resampling Stats Arlington, VA.
Skinner, M. W., L. K. Holden, et al. (1999). "Comparison of two methods for selecting
minimum stimulation levels used in programming the Nucleus 22 cochlear implant."
Journal of Speech, Language and Hearing Research 42(4): 814.
Skinner, M. W., L. K. Holden, et al. (1997). "Speech recognition at simulated soft,
conversational, and raised-to-loud vocal efforts by adults with cochlear implants."
The Journal of the Acoustical Society of America 101: 3766.
Slaney, M. (1988). "Lyon's cochlear model", Citeseer.
Souza, P., L. Jenstad, et al. (2006). "Measuring the acoustic effects of compression
amplification on speech in noise." The Journal of the Acoustical Society of America
119: 41.
Souza, P. E. (2002). "Effects of compression on speech acoustics, intelligibility, and sound
quality." Trends in Amplification 6(4): 131.
Souza, P. E., B. Yueh, et al. (2000). "Fitting hearing aids with the Articulation Index: Impact
on hearing aid effectiveness." Journal of Rehabilitation Research and Development
37(4): 473-482.
Spahr, A., M. Dorman, et al. (2007). "Performance of patients using different cochlear
implant systems: Effects of input dynamic range." Ear and hearing 28(2): 260.
Spriet, A., L. Van Deun, et al. (2007). "Speech Understanding in Background Noise with the
Two-Microphone Adaptive Beamformer BEAM (TM) in the Nucleus Freedom (TM)
Cochlear Implant System." Ear and hearing 28(1): 62-72.
Stahl, V., A. Fischer, et al. (2000). "Quantile based noise estimation for spectral subtraction
and Wiener filtering". Acoustics, Speech, and Signal Processing, 2000. ICASSP'00.
Proceedings. 2000 IEEE International Conference on, IEEE.
Steeneken, H. J. M. and T. Houtgast (1980). "A physical method for measuring speech
transmission quality." The Journal of the Acoustical Society of America 67: 318.
Steinberg, J. C. and M. B. Gardner (1937). "The dependence of hearing impairment on sound
intensity." The Journal of the Acoustical Society of America 9: 11.
Stevens, K. N. (1983). "Acoustic Properties Used for the Identification of Speech Sounds."
Annals of the New York Academy of Sciences 405(1): 2-17.
Stevens, S. S. (1957). "On the psychophysical law." Psychological review 64(3): 153.
Stevens, S. S. (1975). "Psychophysics: Introduction to its perceptual, neural, and social
prospects", Transaction Publishers.
Stickney, G. S., F.-G. Zeng, et al. (2004a). "Cochlear implant speech recognition with speech
maskers." The Journal of the Acoustical Society of America 116: 1081.
Stickney, G. S., F. G. Zeng, et al. (2004b). "Cochlear implant speech recognition with speech
maskers." The Journal of the Acoustical Society of America 116: 1081.
Stöbich, B., C. M. Zierhofer, et al. (1999). "Influence of automatic gain control parameter
settings on speech understanding of cochlear implant users employing the
continuous interleaved sampling strategy." Ear and hearing 20(2): 104.
Stone, M. A. and B. C. J. Moore (2003). "Effect of the speed of a single-channel dynamic
range compressor on intelligibility in a competing speech task." The Journal of the
Stone, M. A. and B. C. J. Moore (2004). "Side effects of fast-acting dynamic range
compression that affect intelligibility in a competing speech task." The Journal of
Stone, M. A. and B. C. J. Moore (2007). "Quantifying the effects of fast-acting compression
on the envelope of speech." The Journal of the Acoustical Society of America 121:
1654.
Stone, M. A. and B. C. J. Moore (2008). "Effects of spectro-temporal modulation changes
produced by multi-channel compression on intelligibility in a competing-speech
task." The Journal of the Acoustical Society of America 123: 1063.
Stone, M. A., B. C. J. Moore, et al. (1999). "Comparison of different forms of compression
using wearable digital hearing aids." The Journal of the Acoustical Society of
America 106: 3603.
Studebaker, G., R. Sherbecoe, et al. (1999). "Monosyllabic word recognition at
higher-than-normal speech and noise levels." The Journal of the Acoustical Society of
America 105: 2431.
Swanson, B. A. (2008). "Pitch Perception with Cochlear Implants". Department of
Otolaryngology, The University of Melbourne. Doctor of Philosophy. 306
Swanson, B. A., E. Van Baelen, et al. (2007). "Cochlear Implant Signal Processing ICs".
Custom Integrated Circuits Conference, 2007. CICC '07. IEEE. 437-442.
van Buuren, R. A., J. M. Festen, et al. (1999). "Compression and expansion of the temporal
envelope: Evaluation of speech intelligibility and sound quality." The Journal of the
van Hoesel, R., R. Ramsden, et al. (2002). "Sound-direction identification, interaural time
delay discrimination, and speech intelligibility advantages in noise for a bilateral
cochlear implant user." Ear and hearing 23(2): 137-149.
Verschuure, J., A. Maas, et al. (1996). "Compression and its effect on the speech signal." Ear
and hearing 17(2): 162-175.
Villchur, E. (1973). "Signal processing to improve speech intelligibility in perceptive
deafness." The Journal of the Acoustical Society of America 53: 1646.
Walker, G. and H. Dillon (1982). "Compression in hearing aids: An analysis, a review and
some recommendations", Australian Government Publishing Service.
White, M. (1986). "Compression systems for hearing aids and cochlear prostheses." Journal
of Rehabilitation Research and Development 23(1): 25.
Wichmann, F. A. and N. J. Hill (2001). "The psychometric function: I. Fitting, sampling, and
goodness of fit." Percept Psychophys 63(8): 1293-1313.
Wilson, B. (2006a). "Speech processing strategies." Cochlear Implants: A Practical Guide,
second ed. John Wiley & Sons, Hoboken, NJ: 21–69.
Wilson, B. and M. Dorman (2008). "Cochlear implants: Current designs and future
possibilities." J Rehabil Res Dev 45(5): 695-730.
Wilson, B., C. Finley, et al. (1991). "Better speech recognition with cochlear implants."
Wilson, B. S. (2006b). "Speech-Processing Strategies." Cochlear implants: a practical
guide: 21-69.
Wolfe, J., E. C. Schafer, et al. (2009). "Evaluation of speech recognition in noise with
cochlear implants and dynamic FM." Journal of the American Academy of
Audiology 20(7): 409-421.
Xu, L., C. S. Thompson, et al. (2005). "Relative contributions of spectral and temporal cues
for phoneme recognition." The Journal of the Acoustical Society of America 117:
3255.
Yates, G. K. (1995). "Cochlear structure and function." Hearing: 41-74.
Yoshinaga-Itano, C., A. L. Sedey, et al. (1998). "Language of early- and later-identified
children with hearing loss." Pediatrics 102(5): 1161-1171.
Yost, W. A. and D. W. Nielsen (1994). "Fundamentals of hearing: an introduction",
Academic Press San Diego.
Yund, E. W. and K. M. Buckles (1995a). "Enhanced speech perception at low
signal-to-noise ratios with multichannel compression hearing aids." The Journal of the
Acoustical Society of America 97: 1224-1224.
Yund, E. W. and K. M. Buckles (1995b). "Multichannel compression hearing aids: Effect of
number of channels on speech discrimination in noise." The Journal of the
Zeng, F. and J. Galvin III (1999). "Amplitude mapping and phoneme recognition in cochlear
implant listeners." Ear and hearing 20(1): 60.
Zeng, F., G. Grant, et al. (2002). "Speech dynamic range and its effect on cochlear implant
performance." The Journal of the Acoustical Society of America 111: 377.