
Gain Optimization for Cochlear Implant Systems

Phyu Phyu Khing

A thesis submitted in fulfilment of the requirements for the degree of

Doctor of Philosophy

The University of New South Wales

School of Electrical Engineering and Telecommunications

Sydney, AUSTRALIA

August 2013
THE UNIVERSITY OF NEW SOUTH WALES
Thesis/Dissertation Sheet

Surname or Family name: Khing

First name: Phyu Phyu
Other name/s:

Abbreviation for degree as given in the University calendar: PhD

School: EE&T
Faculty: Engineering

Title: Gain Optimization for Cochlear Implant Systems


Declaration relating to disposition of project thesis/dissertation

I hereby grant to the University of New South Wales or its agents the right to archive and to make available my
thesis or dissertation in whole or in part in the University libraries in all forms of media, now or hereafter known,
subject to the provisions of the Copyright Act 1968. I retain all property rights, such as patent rights. I also retain
the right to use in future works (such as articles or books) all or part of this thesis or dissertation.

I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts
International (this is applicable to doctoral theses only).


The University recognises that there may be exceptional circumstances requiring restrictions on copying or
conditions on use. Requests for restriction for a period of up to 2 years must be made in writing. Requests for a
longer period of restriction may be considered in exceptional circumstances and require the approval of the Dean
of Graduate Research.



Abstract

Cochlear implant systems need Automatic Gain Control (AGC) to compress the large
dynamic range (~120 dB) of the acoustic environment into the small dynamic range (< 20
dB) of electrical stimulation. This thesis is concerned with the design, implementation and
evaluation of AGC systems for cochlear implants. It investigated the effects of AGC on the
speech intelligibility of cochlear implant recipients. Various AGC configurations were
evaluated with sentences presented over a wide range of levels at different Signal-to-Noise
Ratios (SNR) to identify important factors affecting the performance. Signal metrics were
developed to quantify the effects of AGC on the channel envelopes. The goal was to improve
speech intelligibility in adverse listening conditions.

The performance-intensity functions of cochlear implant recipients with no AGC and with a
front-end compression limiter were measured in noise. With no AGC, the proportion of
envelope clipping grew monotonically with presentation level. The front-end limiter
substantially reduced envelope clipping yet gave little improvement in speech intelligibility.
The recipients were highly tolerant of envelope clipping when the background noise was
low. SNR degradation was identified as the main factor reducing speech intelligibility.
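
As an illustration of the envelope clipping measure mentioned above, the following minimal Python sketch computes the fraction of channel-envelope samples that reach a saturation level, and shows that fraction growing as a fixed envelope is scaled up in level. The function names, the example envelope values and the saturation level of 0.5 are assumptions made for illustration only; the clipping proportion metric used in this thesis is covered in Section 13.3.1 and may be defined differently.

```python
def apply_gain_db(envelope, gain_db):
    """Scale a linear-amplitude envelope by a gain given in decibels."""
    factor = 10.0 ** (gain_db / 20.0)
    return [sample * factor for sample in envelope]


def clipping_proportion(envelope, saturation_level):
    """Return the fraction of samples at or above the saturation level.

    Assumed definition for illustration, not the thesis's own formula.
    """
    if not envelope:
        return 0.0
    clipped = sum(1 for sample in envelope if sample >= saturation_level)
    return clipped / len(envelope)


# Hypothetical channel envelope (linear amplitude) and saturation level.
envelope = [0.1, 0.2, 0.4, 0.8]
saturation_level = 0.5

# With no AGC, raising the presentation level can only increase clipping.
proportions = [
    clipping_proportion(apply_gain_db(envelope, gain_db), saturation_level)
    for gain_db in (-20.0, 0.0, 20.0)
]
```

In this toy case the proportion never decreases as the gain rises, which mirrors the monotonic growth with presentation level reported for the no-AGC condition.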

A front-end limiter cannot guarantee zero envelope clipping. In contrast, the proposed
envelope profile limiter eliminated envelope clipping and hence preserved the spectral
profile. The two AGCs were evaluated, with two release times (75 and 625 ms). The shorter
release time gave worse speech intelligibility because it caused more waveform distortion
and output SNR reduction. For a given release time, preserving spectral envelope profile
gave additional benefits. In a take-home experiment, cochlear implant recipients rated a
program with the envelope profile limiter equivalent to their everyday program.

A conventional cochlear implant signal path uses a predetermined input dynamic range,
which is shifted up or down by the AGC. In contrast, the proposed Adaptive Loudness
Growth Function (ALGF) continually optimized the input dynamic range by estimating the
noise floor and peak level in each channel. The ALGF gave better Speech Reception
Threshold (SRT) than the existing state-of-the-art AGC system at the high presentation level
when evaluated with a newly developed roving-level SRT test at three presentation levels.
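
The idea of continually adapting the input dynamic range can be sketched as follows. This minimal Python example tracks a base level (a crude noise-floor estimate) and a saturation level (a crude peak estimate) for one channel, then maps each envelope level in dB to a normalized value between 0 and 1. The class name `AdaptiveRangeMapper`, the step sizes, the minimum range and the linear-in-dB mapping are all assumptions for illustration; the ALGF described in Chapter 12 uses its own level regulators and noise estimators, which are not reproduced here.

```python
class AdaptiveRangeMapper:
    """Toy per-channel mapper with an adaptive input dynamic range (in dB)."""

    def __init__(self, base_db=-60.0, sat_db=-20.0,
                 base_rise_db=0.01, sat_fall_db=0.05, min_range_db=10.0):
        self.base_db = base_db            # running noise-floor estimate
        self.sat_db = sat_db              # running peak estimate
        self.base_rise_db = base_rise_db  # slow upward drift of the floor
        self.sat_fall_db = sat_fall_db    # slow downward drift of the peak
        self.min_range_db = min_range_db  # keeps the range from collapsing

    def process(self, level_db):
        """Update both estimates, then map level_db into [0, 1]."""
        # Peak estimate: jump up to new peaks, otherwise decay slowly.
        if level_db > self.sat_db:
            self.sat_db = level_db
        else:
            self.sat_db -= self.sat_fall_db
        # Floor estimate: jump down to new minima, otherwise rise slowly.
        if level_db < self.base_db:
            self.base_db = level_db
        else:
            self.base_db += self.base_rise_db
        # Enforce a minimum dynamic range.
        if self.sat_db - self.base_db < self.min_range_db:
            self.sat_db = self.base_db + self.min_range_db
        # Linear-in-dB mapping, clamped to [0, 1].
        span = self.sat_db - self.base_db
        value = (level_db - self.base_db) / span
        return min(1.0, max(0.0, value))


mapper = AdaptiveRangeMapper()
loud = mapper.process(-10.0)   # above the initial saturation level
quiet = mapper.process(-80.0)  # below the initial base level
```

After these two calls the saturation level has moved up towards the loud input and the base level has moved down to the quiet one, so the mapped range follows the signal instead of staying at a predetermined position.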

Acknowledgements

First of all, I would like to thank my supervisor, Professor Eliathamby Ambikairajah. With
his great support, supervision and mentoring, my study at UNSW has been a successful
journey.

I would like to thank my co-supervisor, Dr. Brett Swanson, for envisioning this thesis and
guiding me through it. His knowledge in the field of cochlear implant signal processing is
immense and was invaluable for this thesis. I am proud to be his first PhD student and I
could not have had a better mentor.

Thanks are also due to my employer, Cochlear Limited, and my work colleagues, particularly
Mr. Michael Goorevich, for providing the research tools and helping from time to time in
the clinical studies. I would like to thank Mr. Sasha Case; his help in the take-home study
in particular is greatly appreciated. I would like to thank Mr. Paul Holmberg for teaching
me how to write Assembly code. I would like to thank Ms. Esti Nel and the clinical team for
their help in the clinical studies of this thesis.

I would like to express my gratitude and special thanks to the cochlear implant recipients
who voluntarily participated in the listening tests. Their help is greatly appreciated. The
studies in this thesis would not be complete without their contribution.

I would like to thank friends and staff from the UNSW signal processing lab for their
encouragement and support.

Finally, I would like to thank my family for their love and endless support. I hope to spend
more time with you.

Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Acronyms and Abbreviations

1 Introduction
1.1 Thesis Objectives
1.2 Research Overview
1.3 Thesis Outline
1.4 Thesis Contributions
1.5 Patents and Publications

2 Sound and Hearing
2.1 Introduction
2.2 Hearing Mechanism
2.3 Cochlear Compression
2.4 Auditory Models
2.5 Loudness Perception
2.6 Speech Perception
2.7 Conclusion

3 Cochlear Implants
3.1 Introduction
3.2 Brief History of Cochlear Implants
3.3 Cochlear Implant Systems
3.4 Electrical Stimulation
3.4.1 Stimulation Mode
3.4.2 Current Configuration
3.5 Loudness Perception
3.6 Conclusion

4 Cochlear Implant Sound Coding Strategies
4.1 Introduction
4.2 Sound Coding Strategies
4.2.1 Continuous Interleaved Sampling (CIS)
4.2.2 HiResolution (HiRes)
4.2.3 Spectral Peak (SPEAK)
4.2.4 Advanced Combinational Encoder (ACE)
4.2.5 Alternative Channel Selection Rules
4.3 Signal Processing Modules
4.3.1 Microphone Directionality and Pre-emphasis
4.3.2 Front-end Gain Control
4.3.3 Filterbank
4.3.4 Combine into Channels
4.3.5 Channel Gains
4.3.6 Maxima Selection
4.3.7 Loudness Growth Function
4.3.8 Dynamic Range Selection
4.3.9 Mapping
4.4 Speech Perception
4.4.1 Amplitude Cues
4.4.2 Spectral Processing
4.4.3 Temporal Processing
4.5 Conclusion

5 Automatic Gain Control Systems
5.1 Introduction
5.2 Fundamentals of AGC
5.3 AGC in Hearing Aids
5.4 AGC in Cochlear Implant Systems
5.5 AGC Systems in the Nucleus Sound Processors
5.5.1 Compression Limiter
5.5.2 Automatic Sensitivity Control
5.5.3 Whisper
5.5.4 Unified Gain Model (Tri-loop AGC)
5.5.5 Adaptive Dynamic Range Optimization
5.6 Noise Estimation
5.6.1 Martin’s Minimum Statistics Noise Estimator
5.6.2 Lin’s Recursive Averaging Noise Estimator
5.7 Conclusion

6 Speech Intelligibility Metrics
6.1 Introduction
6.2 Speech Intelligibility Index
6.3 Speech Transmission Index
6.4 Normalized Covariance Measure
6.5 Apparent SNR
6.6 Across-Source Modulation Correlation
6.7 Conclusion

7 Test Methodology
7.1 Introduction
7.2 Test Materials
7.3 Test Methods
7.3.1 Fixed Method
7.3.2 Adaptive Method
7.4 Test Methodology for Clinical Studies in this Thesis
7.4.1 Test Setup
7.4.2 Test Materials
7.4.3 Fixed Level Test
7.4.4 Roving-level SRT Test
7.4.5 Research Platforms
7.5 Conclusion

8 Investigating Effects of No AGC and Fast AGC on Cochlear Implant Speech Intelligibility
8.1 Introduction
8.2 Clinical Study
8.2.1 Subjects
8.2.2 Signal Processing
8.2.3 Test Setup
8.2.4 Results
8.3 Discussions
8.4 Conclusions

9 Investigating Effects of Slow AGC and Fast AGC on Cochlear Implant Speech Intelligibility
9.1 Introduction
9.2 Clinical Study
9.2.1 Test Setup
9.2.2 Subjects
9.2.3 Signal Processing
9.2.4 Results
9.2.5 Effect of Presentation Level
9.2.6 Test-retest Reliability of Roving-level SRT Test with Single Adaptive Track
9.3 Discussions
9.4 Conclusions

10 Proposed Envelope Profile Limiter
10.1 Introduction
10.2 Signal Processing
10.2.1 Front-end Compression Limiter
10.2.2 Proposed Envelope Profile Limiter
10.3 Clinical Studies
10.3.1 Test Setup
10.3.2 Study Design
10.3.3 Experiment 1: High Presentation Level
10.3.4 Experiment 2: Performance-Intensity Function
10.4 Results
10.4.1 Experiment 1: High Presentation Level
10.4.2 Experiment 2: Performance-Intensity Functions
10.5 Discussions
10.6 Conclusions

11 Take-home Study with the Proposed Envelope Profile Limiter
11.1 Introduction
11.2 DSP Implementation
11.3 Fitting Procedures
11.4 Results and Discussions
11.5 Conclusions

12 Proposed Adaptive Loudness Growth Function
12.1 Introduction
12.2 Background
12.3 Implementation of the ALGF
12.3.1 Fast Saturation Level Regulator
12.3.2 Slow Saturation Level Regulator
12.3.3 Base Level Regulator
12.4 Offline Data Analysis
12.4.1 Comparison of the Noise Estimators
12.4.2 Processing Conditions
12.4.3 Offline Performance Analysis of the Gain Algorithms
12.5 Clinical Studies
12.5.1 Test setup
12.5.2 Study 1: Tri + ADRO vs. ALGF-1
12.5.3 Study 2: Tri + ADRO vs. ALGF-2
12.5.4 Study 2F: Adaptive vs. Fixed Dynamic Range
12.5.5 Test-retest Reliability of Interleaved Roving-level SRT Test
12.6 Conclusions

13 Predicting Cochlear Implant Speech Intelligibility
13.1 Introduction
13.2 Signal Processing
13.2.1 Test Stimuli
13.2.2 AGC Configurations
13.2.3 Curve Fitting
13.3 Signal Metrics and Performance Analysis
13.3.1 Clipping Proportion
13.3.2 Output SNR
13.3.3 Across-Source Modulation Correlation
13.3.4 Normalized Covariance Measure
13.4 Discussions and Conclusions

14 Conclusions and Future Work
14.1 Summary of Experimental Results
14.2 Conclusions
14.3 Future Work

Appendix 1: Subjects
Appendix 2: Cochlear Implant Clinical Questionnaire
Appendix 3: Statistics
References

List of Figures

Figure 1-1 Overview of the dynamic range difference between acoustic and electric hearing 3

Figure 1-2 Gain optimization research overview ..................................................................... 7

Figure 2-1 Illustration of the peripheral auditory system....................................................... 14

Figure 2-2 Cross-section of the cochlea ................................................................................. 16

Figure 2-3 Block diagram of Lyon's auditory model ............................................................. 19

Figure 3-1 Cochlear implant system ...................................................................................... 26

Figure 3-2 Nucleus 5 system.................................................................................................. 28

Figure 3-3 Stimulation of the biphasic waveform (left panel) and two current waveforms

with equal charge (right panel) .................................................................................... 30

Figure 3-4 Sequential pulse stimulation, showing timing and amplitudes ............................ 31

Figure 4-1 Cochlear implant sound processing (Swanson 2008)........................................... 35

Figure 4-2 Continuous Interleaved Sampling strategy (Wilson 2006b) ............................... 36

Figure 4-3 Signal processing modules of the ACE strategy .................................................. 38

Figure 4-4 Magnitude response of 22-channel filterbank ...................................................... 41

Figure 4-5 Instantaneous infinite non-linear compression of LGF ........................................ 44

Figure 4-6 Electrodogram of the monosyllabic word ‘Choice’ ............................................. 47

Figure 4-7 Reconstructed spectrogram of the monosyllabic word ‘Choice’ ......................... 48

Figure 5-1 Input-output diagram of an AGC with different compression ratios ................... 54

Figure 5-2 Components of an AGC system ........................................................................... 55

Figure 5-3 Behaviour of a typical AGC system ..................................................................... 56

Figure 5-4 Intelligibility score as a function of the number of channels, with compression

ratio as a parameter (Plomp 1994) ............................................................................... 62

Figure 5-5 Block diagram of dual time-constant AGC system (Boyle et al. 2009) ............... 66

Figure 5-6 AGC systems of the Nucleus CP810 sound processor ......................................... 68

Figure 5-7 Block diagram of Automatic Sensitivity Control (Seligman 2000) ..................... 69

Figure 5-8 Input-output diagram of Whisper ......................................................................... 70

Figure 5-9 Input, output and gain signals of the tri-loop AGC on a roving-level sinusoid ... 72

Figure 5-10 Block diagram of ADRO in one frequency channel .......................................... 74

ix
Figure 6-1 Speech modulation envelope spectrum (Houtgast and Steeneken (1985)) ........... 84

Figure 7-1 Top-level architecture of Champ (Swanson et al. 2007) .................................... 103

Figure 7-2 Components of the real-time Nucleus-xPC system (Goorevich 2005).............. 105

Figure 7-3 ACE sound coding strategy with the standard front-end AGC (blue block) and the

proposed AGC (green block) ..................................................................................... 106

Figure 8-1 Signal path used in the experiment ..................................................................... 109

Figure 8-2 Percent correct scores of four cochlear implant subjects with no AGC and with

FEL75 (legends of the curves are as described in Figure 8-3) ................................... 111

Figure 8-3 Group mean scores of four cochlear implant recipients with no AGC and with

FEL75 ........................................................................................................................ 111

Figure 9-1 SRT of seven cochlear implant subjects measured by the roving-level SRT test

with a single adaptive track. The error bar indicates one standard error. The asterisks

indicate statistically significant difference in performance between the two AGC

systems (* p < 0.05, ** p < 0.01). .............................................................................. 120

Figure 9-2 SRT of seven cochlear implant subjects at 80 dB SPL measured by the

interleaved roving-level SRT test. The error bars indicate one standard error. The

asterisks indicate statistically significant difference in performance between the two

AGC systems (* p < 0.05, ** p < 0.01). .................................................................... 121

Figure 9-3 SRT of seven cochlear implant subjects at 65 dB SPL measured by the

interleaved roving-level SRT test. The error bars indicate one standard error. The

asterisks indicate statistically significant difference in performance between the two

AGC systems (* p < 0.05, ** p < 0.01). .................................................................... 122

Figure 9-4 An example of bad SRT convergence due to the lack of audibility at 50 dB SPL.

Left panel shows the convergence of SRT over trials and right panel shows mean

percent correct words at each SNR. ........................................................................... 123

Figure 9-5 Percent correct scores of seven cochlear implant subjects at 50 dB SPL. The error

bars indicate one standard error. The asterisks indicate statistically significant

difference in performance between the two AGC systems (* p < 0.05, ** p < 0.01).. 124

Figure 9-6 Comparison of SRTs between 65 and 80 dB SPL for the FEL program. The error

bars indicate one standard error. The asterisks indicate statistically significant

difference in performance between the two test conditions (* p < 0.05, ** p < 0.01).

................................................................................................................................... 125

Figure 9-7 Comparison of SRTs between 65 and 80 dB SPL for the Tri + ADRO program. The

error bars indicate one standard error. The asterisks indicate statistically significant

difference in performance between the two test conditions (* p < 0.05, ** p < 0.01).

................................................................................................................................... 126

Figure 9-8 Test-retest variability of the roving SRT test with single adaptive track from the

SRT of six subjects taken from test and retest sessions. Top panel shows SRTs of each

subject and group mean for test and retest. The bottom panel shows the SRT

difference between test and retest, and mean of the absolute SRT differences. ........ 127

Figure 9-9 Adaptive tracks of the roving-level SRT test with a single adaptive track for S5

with the Tri + ADRO in the test and retest sessions. The convergence of SNRs was

poor in the test session (top panel) and good in the retest session (bottom). ............. 129

Figure 10-1 Block diagram of ACE signal path with the envelope profile limiter (EPL) ... 136

Figure 10-2 Envelope clipping of a vowel: spectral waveform (top panel) and temporal

waveform (bottom panel) at the output of the LGF, processed by FEL and EPL ..... 137

Figure 10-3 Effects of gain structure and release time on speech intelligibility of six cochlear

implant subjects in quiet (top panel) and in noise (bottom panel). Error bars indicate

one standard error. ..................................................................................................... 140

Figure 10-4 Performance-intensity functions of FEL75 and EPL625 ................................. 144

Figure 10-5 Comparison of scores between the FEL75 and the EPL625 for the presentation

levels above 70 dB SPL at SNR 20 dB (top panel) and SNR 10 dB (bottom panel).

The asterisks indicate statistically significant difference in performance between the

two AGCs (* p < 0.05, ** p < 0.01). ......................................................................... 145

Figure 10-6 Proportion of clipping for speech presented at 89 dB SPL with the front-end

compression limiter with the release time 75 ms and 625 ms ................................... 148

Figure 11-1 The DSP 1 signal path of the Nucleus CP810 sound processor with a switch

between UGM and EPL ............................................................................................. 154

Figure 11-2 Helpfulness indication of the UGM and EPL programs for each question in

CICQ .......................................................................................................................... 156

Figure 11-3 Helpfulness indication of the UGM and EPL programs in the categorised

listening conditions .................................................................................................... 158

Figure 12-1 Degree of freedom for the input signal to move within the dynamic range of

LGF ............................................................................................................................ 163

Figure 12-2 Top level Simulink block diagram of ALGF .................................................... 165

Figure 12-3 Simulink block diagram of fast saturation level regulator ................................ 168

Figure 12-4 Simulink block diagram of slow saturation level regulator .............................. 169

Figure 12-5 Simulink block diagram of clipping proportion calculation ............................. 170

Figure 12-6 Simulated listening condition showing the slow saturation level with the hold

distance of 0 dB (top panel) and 15 dB (bottom panel) ............................................. 172

Figure 12-7 Simulink block diagram implementation of base level regulator ..................... 173

Figure 12-8 Simulink block diagram of Lin's recursive averaging noise floor estimator .... 174

Figure 12-9 Simulink block diagram of smoothing parameter calculation .......................... 174

Figure 12-10 Simulink block diagram of the proposed MCRA noise floor estimator ......... 175

Figure 12-11 Simulink block diagram of the minima-controlled feature in the proposed

MCRA noise estimation algorithm ............................................................................ 176

Figure 12-12 Fixed-level Sentences presented with three types of noise: four-talker babble,

city noise and LTASS noise ....................................................................................... 178

Figure 12-13 Estimation of three different noises presented at the fixed level by Martin’s

minimum statistics method (top panel), Lin’s recursive averaging method (middle

panel) and the proposed MCRA method (bottom panel) ........................................... 179

Figure 12-14 Probability density functions of the normalized error for estimating fixed level

noises.......................................................................................................................... 180

Figure 12-15 Roving-level sentences presented in four-talker babble (top panel), and LTASS

noise (bottom panel)................................................................................................... 181

Figure 12-16 Estimation of roving-level four-talker babble noise by: Martin’s minimum

statistics method (top panel), Lin’s recursive averaging method (middle panel) and the

proposed MCRA method (bottom panel) ................................................................... 183

Figure 12-17 Probability density function of the normalized error for estimating the roving-

level four-talker babble noise ..................................................................................... 184

Figure 12-18 Estimation of roving-level LTASS noise from the noisy speech by: Martin’s

minimum statistics method (top panel), Lin’s recursive averaging method (middle

panel) and the proposed MCRA method (bottom panel) ........................................... 186

Figure 12-19 Probability density function of the normalized error for estimating roving-level

LTASS noise.............................................................................................................. 187

Figure 12-20 Simulink block diagram of the ALGF with a fixed dynamic range setting ... 191

Figure 12-21 Simulink block diagram of the Nucleus signal path with the existing AGC

systems and the ALGF............................................................................................... 193

Figure 12-22 Input, output and gain signals produced from the Nucleus signal path with Tri

+ ADRO. There are three sentence presentations in the figure, having levels of 65, 80

and 50 dB SPL. Each presentation consists of three seconds of noise, then the

sentence, and then three seconds of noise. Signals at the frequency channels centred at

367 Hz (top panel), 1101 Hz (middle panel) and 4282 Hz (bottom panel) were

analyzed. .................................................................................................................... 195

Figure 12-23 Input, output and gain signals produced from the Nucleus signal path with the

ALGF-1. There are three sentence presentations in the figure, having levels of 65, 80

and 50 dB SPL. Each presentation consists of three seconds of noise, then the

sentence, and then three seconds of noise. Signals at the frequency channels centred at

367 Hz (top panel), 1101 Hz (middle panel) and 4282 Hz (bottom panel) were

analyzed. .................................................................................................................... 197

Figure 12-24 Input, output and gain signals produced from the Nucleus signal path with

ALGF-2. There are three sentence presentations in the figure, having levels of 65, 80

and 50 dB SPL. Each presentation consists of three seconds of noise, then the

sentence, and then three seconds of noise. Signals at the frequency channels centred at

367 Hz (top panel), 1101 Hz (middle panel) and 4282 Hz (bottom panel) were

analyzed. .................................................................................................................... 199

Figure 12-25 Input, output and gain signals produced from the Nucleus signal path with the

ALGF-2F. There are three sentence presentations in the figure, having levels of 65, 80

and 50 dB SPL. Each presentation consists of three seconds of noise, then the

sentence, and then three seconds of noise. Signals at the frequency channels centred at

367 Hz (top panel), 1101 Hz (middle panel) and 4282 Hz (bottom panel) were

analyzed. .................................................................................................................... 200

Figure 12-26 SRT comparison between Tri + ADRO and ALGF-1. Error bars indicate one

standard deviation from the mean. The asterisks indicate statistically significant

difference in performance between the two processing conditions (* p < 0.05, ** p <

0.01). .......................................................................................................................... 203

Figure 12-27 Percent correct scores of four cochlear implant subjects with Tri + ADRO and

with ALGF-1 in the fixed test at 50 dB SPL in noise. Error bars indicate one standard

deviation from the mean. ........................................................................................... 205

Figure 12-28 Percent correct scores of four cochlear implant subjects with Tri + ADRO and

with ALGF-1 in the fixed test at 80 dB SPL in noise. Error bars indicate one standard

deviation from the mean. ........................................................................................... 205

Figure 12-29 SRT comparison between Tri + ADRO and ALGF-2. Error bars indicate one

standard deviation from the mean. The asterisks indicate the statistical significance of

the difference between the two processing conditions (* p < 0.05, ** p < 0.01). ...... 209

Figure 12-30 Percent correct scores of five cochlear implant subjects with Tri + ADRO and

with ALGF-2 in the fixed test at 50 dB SPL in noise. Error bars indicate one standard

deviation from the mean. ........................................................................................... 210

Figure 12-31 Percent correct scores of three cochlear implant subjects with Tri + ADRO and

with ALGF-2 in the fixed test at 80 dB SPL in noise. Error bars indicate one standard

deviation from the mean. ........................................................................................... 211

Figure 12-32 SRT Comparison between ALGF-2 and ALGF-2F. Error bars indicate one

standard deviation from the mean value. The asterisks indicate the statistical

significance of the difference between the two ALGFs (* p < 0.05, ** p < 0.01). .... 217

Figure 12-33 Percent correct scores of four cochlear implant subjects with ALGF-2 and with

ALGF-2F in the fixed test at 50 dB SPL in noise. Error bars indicate one standard

deviation from the mean. ........................................................................................... 218

Figure 12-34 Test-retest variability of the interleaved SRT test from the SRT of four subjects

with Tri + ADRO taken from Study 1 and Study 2. Error bars indicate one standard

deviation from the mean. The asterisks indicate the statistical significance of the

difference between the two studies (* p < 0.05, ** p < 0.01). ................................... 222

Figure 12-35 SRT differences between the two studies of each subject. The difference was

calculated as: SRT (Study 1) – SRT (Study 2). The mean was calculated as the

average of the absolute SRT differences between the two studies. ........................... 223

Figure 13-1 Percent correct scores of the cochlear implant subjects with different AGC

configurations in the fixed level test. Open and filled symbols represent 10 dB SNR

and 20 dB SNR respectively. ..................................................................................... 227

Figure 13-2 Comparison of signal amplitudes processed by each AGC configuration, for

speech presented at 80 dB SPL in the presence of four-talker babble at SNR 10 dB.

The envelopes at channel 7 were taken before the LGF. ........................................... 230

Figure 13-3 Percent correct scores of the individual subject as a function of clipping

proportion. The open and filled symbols represent the results at SNR 10 dB and 20 dB

respectively. ............................................................................................................... 231

Figure 13-4 Group mean percent correct scores as a function of clipping proportion. The top

panel shows the scores in all conditions, the middle panel shows the scores at 20 dB

SNR and the bottom panel shows the scores at 10 dB SNR. ..................................... 232

Figure 13-5 Block diagram of output SNR calculation for the signal path. The front-end

AGC was used as an example in this diagram. .......................................................... 233

Figure 13-6 Percent correct score of individual subject as a function of output SNR. The

open and filled symbols represent the results in 10 dB and 20 dB SNR respectively.

................................................................................................................................... 235

Figure 13-7 Group mean scores as a function of output SNR. The top panel shows the scores

in all conditions, the middle panel shows the scores at 20 dB SNR and the bottom

panel shows the scores at 10 dB SNR........................................................................ 236

Figure 13-8 ASMC calculation for the signal path. The front-end AGC was used as an

example in this diagram. ............................................................................................ 237

Figure 13-9 Percent correct score of the individual subject as a function of ASMC. The open

and filled symbols represent the results at SNR 10 dB and 20 dB respectively. ....... 238

Figure 13-10 Group mean scores as a function of ASMC. The top panel shows the scores in

all conditions, the middle panel shows the scores at 20 dB SNR and the bottom panel

shows the scores at 10 dB SNR. ................................................................................ 239

Figure 13-11 Percent correct score of the individual subject as a function of

NCM. The open and filled symbols represent the results in SNR 10 dB and 20 dB

respectively. ............................................................................................................... 242

Figure 13-12 Group mean scores as a function of NCM index. The top panel shows the

scores in all conditions, the middle panel shows the scores at 20 dB SNR and the

bottom panel shows the scores at 10 dB SNR. ........................................................... 243

Figure 13-13 Comparison of deviance between four signal metrics .................................... 244

Figure 13-14 Effect of release time on speech intelligibility predicted by ASMC (top panel),

NCM (middle panel) and OSNR (bottom panel) ....................................................... 247

Figure 14-1 Application of the envelope profile limiter in the bilateral sound processing .. 258

List of Tables

Table 3-1 Recent cochlear implant systems of the major cochlear implant companies ......... 27

Table 3-2 Nucleus cochlear implant models .......................................................................... 28

Table 3-3 Nucleus sound processors ...................................................................................... 29

Table 5-1 Rationales for different AGC systems ................................................................... 58

Table 5-2 Parameter settings of the UGM.............................................................................. 71

Table 7-1 Test materials and methods used in AGC studies .................................................. 94

Table 7-2 Test materials and methods used in the clinical studies of this thesis.................. 101

Table 8-1 Parameter setting of the front-end compression limiter ....................................... 109

Table 8-2 Statistical analysis of the score difference between the front-end compression

limiter and no AGC for the presentation levels above 70 dB SPL in two SNR

conditions. The asterisks indicate statistically significant difference in performance

between no AGC and FEL75 (* p < 0.05, ** p < 0.01). ............................................ 112

Table 9-1 Parameter setting of each program....................................................................... 119

Table 10-1 AGC configurations tested ................................................................................. 138

Table 10-2 Statistical analysis on the scores of the individual and group. The score difference

was obtained by subtracting the score of the first AGC from the second AGC. The

asterisks indicate statistically significant difference in performance between the two

AGCs (* p < 0.0125, ** p < 0.0025). ........................................................................ 142

Table 10-3 Statistical analysis of the scores between the FEL75 and the EPL625 for the

presentation levels above 70 dB SPL in two SNR conditions. The asterisks indicate

statistically significant difference in performance between the two AGCs (* p < 0.05,

** p < 0.01)................................................................................................................ 146

Table 11-1 Combination of AGCs in each program. Zoom is a fixed beamformer with a

super-cardioid polar response. ................................................................................... 155

Table 11-2 Mean benefit scores and preferred program in quiet and noisy background. S4 did

not answer some questions, as indicated by dashes. .................................................. 159

Table 12-1 Setting of other program parameters ................................................................. 188

Table 12-2 Parameter setting of three configurations of ALGF .......................................... 189

Table 12-3 Statistical analysis of SRT measured with Tri + ADRO and the ALGF-1 at 50, 65

and 80 dB SPL. The value inside the bracket indicates the standard deviation. p-values

were calculated by a t-test for the hypothesis testing of the significance on the SRT

difference. .................................................................................................................. 204

Table 12-4 Statistical analysis of SRT measured with Tri+ADRO and the ALGF-2 at 50, 65

and 80 dB SPL. The value inside the bracket indicates the standard deviation. p-values

were calculated by a t-test for the hypothesis testing of the significance on the SRT

difference. .................................................................................................................. 210

Table 12-5 Retrospective comparison with SRT results from other studies that used a roving-

level SRT test............................................................................................................. 214

Table 12-6 Statistical analysis of SRT measured with ALGF-2 and the ALGF-2F at 50, 65

and 80 dB SPL. The value inside the bracket indicates the standard deviation. p-values

were calculated by a t-test for the hypothesis testing of the significance on the SRT

difference. .................................................................................................................. 218

Acronyms and Abbreviations

ACE           Advanced Combinational Encoder (strategy)
ADC           Analogue to Digital Converter
ADRO          Adaptive Dynamic Range Optimization (automatic gain control algorithm)
AGC           Automatic Gain Control
ALGF          Adaptive Loudness Growth Function
ASC           Automatic Sensitivity Control (automatic gain control algorithm)
AVC           Automatic Volume Control (automatic gain control algorithm)
BLR           Base Level Regulator
BTE           Behind-The-Ear (sound processor)
C-level       Maximum Comfortable level
CDI           Cochlear Device Interface
CG            Common Ground (stimulation mode)
CI            Cochlear Implant
CIS           Continuous Interleaved Sampling (strategy)
CP810         The Cochlear Nucleus 5 Sound Processor
DSP           Digital Signal Processing
EPL           Envelope Profile Limiter (automatic gain control algorithm)
FEL           Front-End compression Limiter (automatic gain control algorithm)
FFT           Fast Fourier Transform
FSLR          Fast Saturation Level Regulator
HiRes         HiResolution (strategy)
LGF           Loudness Growth Function
LTASS         Long-Term Average Speech-Shaped (noise)
MAP           The clinical program parameters for an individual recipient
MCRA          Minima-Controlled Recursive Average (noise estimator)
MP            Monopolar (stimulation mode)
NMB           Nucleus MATLAB Blockset (Simulink)
NMT           Nucleus MATLAB Toolbox
P-I function  Performance-Intensity Function
pps           Pulses per second
SLR           Saturation Level Regulator
SNR           Signal-to-Noise Ratio
SPEAK         Spectral Peak (strategy)
SRT           Speech Reception Threshold
SSLR          Slow Saturation Level Regulator
StimGen       Stimulus Generator (test tool)
T-level       Threshold level
Tri-loop      Front-end AGC system with three control loops
UGM           Unified Gain Model (automatic gain control algorithm)
WDRC          Wide Dynamic Range Compressor (automatic gain control algorithm)
Whisper       Front-end wide dynamic range compressor
xPC           “extra” PC processing platform from the Mathworks

Chapter 1 Introduction

1 Introduction

1.1 Thesis Objectives

The goal of this thesis is to improve the speech intelligibility of cochlear implant recipients

using automatic gain control (AGC) techniques; in particular, to allow recipients to better

cope with changing sound levels in the presence of noise.

The objectives that align with the above goal are:

• To evaluate the performance of existing AGC systems on the speech intelligibility of

cochlear implant recipients;

• To develop and evaluate new algorithms for optimizing signal levels within the

limited dynamic range of electrical stimulation;

• To investigate test methodology for evaluating AGC systems;

• To investigate speech metrics to predict the speech intelligibility of cochlear implant

recipients.

1.2 Research Overview

When a person has lost most of the hair cells in the inner ear, no amount of amplification can

restore normal hearing. A cochlear implant (CI) is a hearing prosthesis for persons with severe-to-

profound hearing impairment who can no longer benefit from hearing aids. It bypasses

the transduction process of the inner ear and directly stimulates the auditory nerve to provide

hearing sensation. To date, the cochlear implant is the most successful of all neural

prostheses developed (Wilson and Dorman 2008).

Cochlear implant systems have improved remarkably with technology advances over the last

four decades. The performance of cochlear implant recipients has progressed from reliance on

lip-reading to telephone use and music appreciation. The speech understanding of top-

performing cochlear implant recipients is comparable to that of normal hearing subjects in easy

listening conditions. However, the speech intelligibility of cochlear implant recipients


degrades significantly in adverse listening conditions. Unlike normal hearing subjects, who

can benefit from the redundant spectral and temporal cues of speech waveforms in

adverse listening conditions, cochlear implant recipients cannot achieve similar listening

performance because the spectral and temporal cues available to them are limited. The loss of inner ear

functions, particularly the loss of the sophisticated gain control mechanism of the outer hair

cells, causes speech intelligibility degradation in adverse listening conditions.

Speech and other sounds vary over a wide range of levels in everyday listening

environments. The overall level of everyday speech can vary over a 35 dB range from casual

conversation to shouting (Pearsons, Bennett and Fidell 1977; Olsen 1998). The individual

components of a segment of speech vary over a 40 to 50 dB range when the overall

presentation level remains constant (Boothroyd, Erickson and Medwetsky 1994; Zeng et al.

2002). A normal hearing subject can hear a 120 dB range of acoustic signals. Compared to

that, the dynamic range of electrical current pulses in a cochlear implant is considerably

smaller (5 ~ 20 dB) (Nelson et al. 1995). With constraints on the range of presentable

amplitudes, frequency channels and stimulation rate, an ongoing challenge of cochlear

implant research has been how to best convey the information in acoustic signals onto the

electrodes.

If a very wide range of channel envelopes were mapped into the electrical dynamic range,

fine details of spectro-temporal waveform variations would be lost. This would matter little

if a large number of discriminable current steps were available within the electrical dynamic

range, but Nelson et al. (1996) showed that the cumulative number of discriminable

intensity steps across the dynamic range of electric hearing ranged from as few as 6.6 to as

many as 45.2. The operating range of input signals is therefore restricted so as to present

the most relevant range of acoustic signals, i.e., speech, optimally. Cosendai and

Pelizzone (2001) found that an input dynamic range of more than 45 dB was necessary for optimal

speech recognition. Dawson et al. (2007) showed that increasing the input dynamic range

from 31 dB to 46 and 56 dB improved word recognition at low presentation levels in quiet.

Spahr et al. (2007) highlighted the importance of the input dynamic range setting and dual-

loop (slow and fast time constants) AGC system for speech intelligibility in the performance


study on different cochlear implant systems. Figure 1-1 gives an overview of the dynamic

range requirements of cochlear implant hearing.

Figure 1-1 Overview of the dynamic range difference between acoustic and electric
hearing
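
The trade-off between input range and amplitude resolution described above can be illustrated with simple arithmetic. The sketch below divides an input dynamic range evenly among the available discriminable steps. The even division and the function name `db_per_step` are assumptions made for illustration only; a real loudness growth function is not uniform in dB.

```python
def db_per_step(input_range_db: float, n_steps: float) -> float:
    """Average input range (in dB) covered by one discriminable intensity step.

    Assumes, for illustration only, that the steps are spread evenly in dB.
    """
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    return input_range_db / n_steps


# Fewest and most discriminable steps reported by Nelson et al. (1996).
FEWEST_STEPS = 6.6
MOST_STEPS = 45.2

# Input dynamic ranges compared by Dawson et al. (2007).
for idr_db in (31.0, 46.0, 56.0):
    coarse = db_per_step(idr_db, FEWEST_STEPS)
    fine = db_per_step(idr_db, MOST_STEPS)
    print(f"IDR {idr_db:.0f} dB: {coarse:.1f} dB/step (6.6 steps), "
          f"{fine:.1f} dB/step (45.2 steps)")
```

Under this simplification, a recipient with few discriminable steps loses resolution quickly as the input dynamic range is widened, which is the motivation for restricting the operating range.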

The input dynamic range of a cochlear implant system determines the range of input signals

(shown between C-SPL and T-SPL in Figure 1-1) that are presented within the

electrical dynamic range (between C-level and T-level) with no gain adjustment. An AGC is

necessary in the transformation stage for input signals with presentation levels outside

the range between T-SPL and C-SPL. AGC compresses the dynamic range of input stimuli into the input

dynamic range of the system. It is more effective than a linear gain because it can improve

the audibility of soft sounds without amplifying sounds that are already loud.
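
A minimal sketch of this idea follows. It is a static (memoryless) compressor with no attack or release time constants, so it is not any of the AGC systems evaluated in this thesis. The T-SPL and C-SPL values, the 10 to 90 dB SPL input range and the linear-in-dB mapping onto the electrical dynamic range are all illustrative assumptions.

```python
def compress_level(level_db, in_lo=10.0, in_hi=90.0, t_spl=25.0, c_spl=65.0):
    """Static compression: map [in_lo, in_hi] dB SPL linearly (in dB) onto [t_spl, c_spl].

    All default values are hypothetical and chosen only for illustration.
    """
    level_db = min(max(level_db, in_lo), in_hi)
    slope = (c_spl - t_spl) / (in_hi - in_lo)  # 0.5 here, i.e. a 2:1 compression ratio
    return t_spl + slope * (level_db - in_lo)


def electric_fraction(level_db, t_spl=25.0, c_spl=65.0):
    """Position within the electrical dynamic range: 0.0 at T-level, 1.0 at C-level.

    Levels outside [t_spl, c_spl] are clipped, as happens with no gain adjustment.
    """
    frac = (level_db - t_spl) / (c_spl - t_spl)
    return min(max(frac, 0.0), 1.0)


for level in (20.0, 50.0, 80.0):
    out = compress_level(level)
    print(f"in {level:.0f} dB SPL: gain {out - level:+.0f} dB, "
          f"fraction without AGC {electric_fraction(level):.3f}, "
          f"with AGC {electric_fraction(out):.3f}")
```

With these assumed settings the soft input receives positive gain and becomes audible, while the loud input receives negative gain and no longer saturates. A single linear gain could achieve only one of these two outcomes.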

Access to low level sounds is important because it reduces the effort needed to maintain speech

intelligibility in different acoustic scenarios. The softest components of everyday speech are

approximately 25 dB SPL. The overall level of speech is low if the talker is a soft-spoken

individual, for example a child, or if the talker-listener distance is large, because sound pressure

level decreases with distance as per the inverse square law. Many listening situations occur over a

distance, for example, class participation during a lecture.
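
The effect of distance can be estimated with the free-field form of the inverse square law, in which the level falls by about 6 dB for each doubling of distance. The sketch below assumes an ideal point source with no reverberation; the 65 dB SPL reference level at 1 m and the function name are illustrative assumptions. In a real classroom, reflections mean the level usually falls more slowly than this.

```python
import math


def spl_at_distance(spl_ref_db, d_ref_m, d_m):
    """Free-field SPL at distance d_m, given the SPL measured at d_ref_m.

    Assumes a point source and no reverberation (inverse square law).
    """
    if d_ref_m <= 0 or d_m <= 0:
        raise ValueError("distances must be positive")
    return spl_ref_db - 20.0 * math.log10(d_m / d_ref_m)


# Hypothetical talker producing 65 dB SPL at 1 m.
for distance_m in (1.0, 2.0, 4.0, 8.0):
    print(f"{distance_m:.0f} m: {spl_at_distance(65.0, 1.0, distance_m):.1f} dB SPL")
```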

Many studies have shown that access to low level sounds improves the intelligibility of soft

speech for cochlear implant recipients (Skinner et al. 1997; James et al. 2002; Firszt et al.

2004; Muller-Deile et al. 2008; Boyle et al. 2013). Access to soft speech can be beneficial

for children in incidental learning situations. Different methods to improve the intelligibility

of soft speech sounds have been investigated by other researchers. In the study of Holden et al. (2011),

half of the cochlear implant subjects preferred more than one input dynamic range

setting. Holden et al. recommended raised T-levels ( > 10% of the maximum current levels)

and a wide input dynamic range for soft speech understanding in quiet and a narrow input

dynamic range in noisy conditions as clinical guidelines.

Access to low-level input stimuli often comes at the cost of unwanted low-level noise

entering the system. Some recipients found such noise objectionable (Holden et al.

2011). Skinner et al. (1999) showed that raised T-levels improved the understanding of soft

speech sounds by Nucleus 22 implant recipients, who preferred to use them in quiet

listening conditions. Although the measured speech intelligibility in noise improved

with raised T-levels, there was a trend towards subjects preferring the default T-levels

in noisy listening conditions. Souza et al. (2000) showed a poor correlation between the

improved speech audibility of hearing impaired subjects measured in the clinic and the everyday

communication benefit reported by the patients. These studies indicate that there is scope for

AGC research on the amplification of low-level input stimuli to improve speech intelligibility.

A decline in speech recognition performance of normal hearing (Dubno, Horwitz and

Ahlstrom 2005) and normal and hearing impaired listeners (Studebaker et al. 1999) was

observed when speech presentation level was increased above the normal conversation level

while the signal-to-noise ratio was fixed. Both studies indicated that a decrease in the

effective signal-to-noise ratio with the increase in speech presentation level was the main

cause of performance degradation. In cochlear implant systems, the most obvious form of

distortion affecting the channel envelopes of high-level input signals is envelope clipping at

the saturation level of the loudness growth function (LGF). Clipping can degrade the speech

intelligibility by distorting amplitude cues. Shannon et al. (2001) showed that the vowel

recognition of normal hearing subjects listening to a four-channel noise vocoder was

monotonically degraded with increasing percentage of envelope clipping. Shannon et al.

indicated that the amplitude pattern was important for speech intelligibility if the frequency

components were poorly presented. It is hypothesized that speech intelligibility in noise can

be improved by avoiding peak-clipping of the output signal because spectral peaks of


formants are likely to be preserved in background noise. This thesis will investigate the

importance of spectral envelope cues for speech intelligibility of cochlear implant recipients.
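The envelope clipping described above can be illustrated with a minimal sketch. It assumes a logarithmic loudness growth function of the form p = log(1 + rho (v - s) / (m - s)) / log(1 + rho), which is often used to describe the LGF; the base level s, saturation level m and steepness rho below are hypothetical placeholder values, not settings taken from this thesis. Envelope samples at or above m all map to the maximum output, which is what is meant by clipping.

```python
import numpy as np

def loudness_growth(v, base=0.03, sat=0.6, rho=416.0):
    """Map envelope amplitude v to a proportion of the electrical dynamic range.

    base, sat and rho are illustrative assumptions, not clinical settings.
    """
    v = np.asarray(v, dtype=float)
    # Below base the output is 0; at or above sat the output is clipped to 1.
    x = np.clip((v - base) / (sat - base), 0.0, 1.0)
    return np.log1p(rho * x) / np.log1p(rho)

def clipped_fraction(v, sat=0.6):
    """Proportion of envelope samples at or above the saturation level."""
    v = np.asarray(v, dtype=float)
    return float(np.mean(v >= sat))

envelope = np.array([0.01, 0.1, 0.3, 0.6, 0.9, 1.2])
print(loudness_growth(envelope))
print(clipped_fraction(envelope))  # 3 of the 6 samples are at or above sat
```

Raising the presentation level scales the envelope upwards, so a larger share of samples reaches the saturation level and the clipped fraction grows.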

With advances in technology, speech intelligibility of experienced cochlear implant

recipients is approaching 100% for sentences in quiet. However, speech intelligibility of

cochlear implant recipients degrades in noise. In their performance study of cochlear implant

recipients from multiple clinics, Lazard et al. (2012) indicated that the performance variation

was biased by using good performers in the studies and the performance of cochlear implant

recipients in noise still remained a challenge.

Cochlear implant recipients have more difficulties in speech understanding than normal

hearing persons in noisy conditions due to the degraded spectral resolution (Fu, Shannon and

Wang 1998; Stickney et al. 2004a) and poor preservation and processing of fine temporal

structures (Qin and Oxenham 2003; Nie, Barco and Zeng 2006). When the background is a

single competing talker or amplitude-modulated noise, the difference in Speech Reception

Threshold (SRT) between normal and hearing-impaired subjects can be as large as 15 dB

(Moore, Peters and Stone 1999; Moore 2003b). When the background noise is stationary, the

difference is usually around 2 to 5 dB (Plomp 1994).

The purpose of a hearing device is to rehabilitate normal hearing. One of the most important

tasks is to provide speech intelligibility in various listening conditions given limited hearing

functions. The performance shortcomings of cochlear implant recipients in adverse listening

environments, in which sounds operate beyond the designated dynamic range, show the

importance of gain optimization research for cochlear implant systems. An AGC is essential

to adapt the system to different listening situations by adjusting the gain slowly and to

normalize loudness between soft and loud components of speech by adjusting the gain

dynamically. To design an effective AGC system, it is important to understand the

relationship between spectral and temporal characteristics of acoustic signals and the

compression system (White 1986).

Substantial research has been done on AGC systems in hearing aids (Dillon 2001; Souza

2002; Kates 2010). Compared to that, the amount of research done on AGC systems in

cochlear implant systems is relatively small. The review of Souza (2002) on the effects of


compression on speech intelligibility and quality stated that compression has increased in

complexity with greater numbers of parameters which are under the clinician’s control. The

advances in compression hearing aids bring greater flexibility, precision in fitting and

selection to the users. Together with these advantages, the need for more information about

the effects of compression amplification on speech perception and quality is also increased.

The gain optimization research in this thesis analyses the important factors in the envelope

distortion for speech intelligibility. It then investigates the effects of AGC on speech

intelligibility over a wide range of presentation levels. The robustness of different AGCs for

speech presented at different levels in noise will also be investigated. The aim is to expand

the operating range of input stimuli beyond the designated range between C-SPL and T-SPL.

Figure 1-2 shows the overview of the gain optimization techniques in this thesis. The

investigation is carried out in two stages. In stage 1, the existing front-end and multichannel

AGC systems in the Nucleus signal path will be investigated. The feasibility of consolidating

various AGC systems at a point after the filterbank will be studied. The aim is to simplify the

signal path and improve the envelope presentation within the dynamic range of electrical

stimulation. In stage 2, the feasibility of adapting the input dynamic range to the dynamic

range of input stimuli in various test conditions will be investigated. The research in this

thesis also involves investigation of test methods and signal metrics to evaluate the gain

optimization techniques.


Figure 1-2 Gain optimization research overview

1.3 Thesis Outline

This thesis consists of two major parts.

Part 1 (Chapters 2 – 7) reviews acoustic and electric hearing, concerning AGC functions of

the inner ear and a cochlear implant system.

Chapter 2 is concerned with acoustic hearing. It examines the normal hearing mechanism,

with emphasis on the compressive function of the inner ear.

Chapter 3 is concerned with electric hearing in cochlear implant systems.

Chapter 4 presents sound coding strategies and speech perception in cochlear implant

systems. It concentrates on the signal processing implementation of the Advanced

Combination Encoder (ACE) strategy in the Nucleus CP810 sound processor.


Chapter 5 broadly discusses AGC systems of hearing aids and cochlear implant systems. It

then explains the existing AGC systems of the Nucleus CP810 sound processor. Noise floor

estimation methods are also studied in this chapter.

Chapter 6 examines signal metrics that attempt to predict the effect of AGC systems on the

speech intelligibility of cochlear implant recipients.

Chapter 7 discusses test methods and procedures for evaluating speech intelligibility with

cochlear implant recipients. It also describes the test materials and methods used in the

clinical studies of this thesis.

Part 2 (Chapters 8 – 13) contains the experimental work of this thesis.

Chapter 8 studies the performance-intensity functions of cochlear implant recipients with no

AGC and with a front-end compression limiter (fast AGC).

Chapter 9 measures the Speech Reception Thresholds (SRTs) of cochlear implant recipients

with two existing AGC systems. It also discusses shortcomings of the existing roving-level

SRT tests.

Chapter 10 describes a new multichannel envelope profile limiter with a spectral profile

preserving feature. It examines the effect of preserving spectral envelope cues and the effect

of compression speed by comparing the envelope profile limiter and the front-end

compression limiter with different release times.

Chapter 11 explains the implementation of the proposed envelope profile limiter in the

CP810 sound processor, and its evaluation in a take-home experiment. The feedback of the

cochlear implant recipients after experiencing the new AGC in their daily life is also

discussed.

Chapter 12 describes the design and implementation of a new dynamic range optimization

technique for cochlear implant systems. The performance of the new algorithm was

compared with the performance of existing AGC systems by conducting acute listening tests

with cochlear implant recipients.


Chapter 13 studies the correlation between existing signal metrics and the measured speech

intelligibility of cochlear implant recipients. It then develops a new metric that could predict

some of the speech scores of cochlear implant recipients from previous chapters.

Chapter 14 provides a general discussion of the experimental results, summarizes the

findings of the thesis, and suggests avenues for future research.

Appendix 1 lists the biographical details of the cochlear implant subjects who participated in

the experiments of this thesis. A cross-reference is provided showing which experiments

they participated in.

Appendix 2 lists the questionnaire used to survey the experience of a cochlear implant

subject with two programs under comparison in different acoustic scenarios.

Appendix 3 explains the statistical methods used for analyzing the experimental results.

1.4 Thesis Contributions

This research provides original contributions on the dynamic range compression of cochlear

implant systems and evaluation techniques. The major contributions of the research can be

summarized as follows:

Effect of envelope clipping on speech intelligibility: The effect of envelope distortion over

a wide range of presentation levels is investigated. This is the first report on the signal path

with no AGC, which represents the worst case processing for presentation levels beyond the

nominal level. With a good SNR, the subjects showed high levels of speech understanding

at very high presentation levels, in which channel envelopes were heavily clipped. Hence,

preserving the envelope shape is not important when other cues, spectral and slow temporal

modulation, of the target signal are not seriously distorted by competing noise.

Effect of optimizing spectral envelopes: A new compression limiter (envelope profile

limiter) is proposed that eliminates envelope clipping at the LGF. The effect of preserving

channel envelopes is studied in parallel with the effect of compression speed. Compression

speed is far more important than preserving the spectral envelope profile. However,


preserving the channel envelope profile is important when temporal envelope cues are

reduced by fast compression speed in quiet. Similarly, preserving the channel envelope

profile is important in noise even when the compression speed is slow. This study also

demonstrates the feasibility of consolidating AGC systems after the filterbank to have a

better representation of spectral envelope cues with respect to the saturation level of the

LGF.

Importance of an AGC with noise-control in real-life listening: The envelope profile

limiter was implemented in the Nucleus CP810 sound processor for take-home use. The

subjects answered a set of questions to compare their everyday program (with front-end

slow and fast AGC, and the multichannel slow AGC) and the modified program with the

envelope profile limiter (and the multichannel slow AGC). For overall comparison between

the two programs, the subjects rated the envelope profile limiter program equivalent to their

everyday program. However, they preferred their everyday program particularly in noisy

situations. The difference between the two programs is the front-end slow AGC that slowly

reduces the overall level when the background noise is high. This study clearly indicates the

importance of a noise-control feature in an AGC for listening comfort in real-life listening

situations where noise is inevitable.

A novel signal level optimization algorithm: Conventional AGCs control the level of an

input signal to be within the designated dynamic range by varying the gain. That sometimes

compromises the operation when the goals conflict, for example improving audibility

without increasing background noise level. A novel signal level optimization algorithm

called Adaptive Loudness Growth Function (ALGF) is proposed in this thesis. The

innovative scheme of the ALGF continually contracts or expands the dynamic range of the

LGF to achieve both level adjustment of input stimuli and noise reduction simultaneously.

The speech intelligibility of the cochlear implant recipients with the ALGF was equal to or

better than with the existing AGC systems (the tri-loop AGC and ADRO) of the Nucleus

CP810 for roving-level sentences presented at 50, 65 and 80 dB SPL.

Improvement to the roving-level SRT test: The test-retest reliability of the existing

roving-level SRT tests was found to be relatively poor. An improvement to the roving-level


SRT test is proposed in which the presentation level sequence is fixed, eliminating the performance bias caused by

an unbalanced randomized sequence.

Predicting speech performance by signal metrics: Performance of AGC systems in noise

is affected by many factors: temporal envelope distortion, output SNR reduction and cross-

modulation between speech and noise. Four signal metrics were implemented for the

cochlear implant processing and tested. The effective SNR at the output of the signal path is

shown to be the most relevant factor affecting speech performance in noise. It is highly

correlated with speech intelligibility of cochlear implant recipients tested with different

AGC configurations over a wide range of presentation levels.

1.5 Patents and Publications

Patents

1. Swanson, B. A., Khing, P. P. “Post-Filter Common-Gain Determination” US Patent

2013/0103396A1

2. Swanson, B. A., Khing, P. P. “Feature-based Level Control” US Patent

2013/0195278A1

Journal Paper

The clinical studies, results and findings of the effects of different compression limiters on

speech intelligibility of cochlear implant recipients were submitted to PLOS ONE, a

peer-reviewed, open-access journal.

Khing, P. P., Swanson, B. A., E. Ambikairajah. “The Effect of Automatic Gain Control

Structure and Release Time on Cochlear Implant Speech Intelligibility”


Conference Papers

The clinical studies, results and findings of the effects of no AGC and fast AGC on speech

intelligibility of cochlear implant recipients were published and presented at the IEEE

International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011.

Prague, Czech Republic.

Khing, P. P., E. Ambikairajah, Swanson, B. A. (2011). “Effect of fast AGC on

cochlear implant speech intelligibility” 2011 IEEE International Conference on

Acoustics, Speech and Signal Processing (ICASSP). Page: 285-288

The analysis of predicting speech intelligibility of cochlear implant recipients using selected

signal metrics was published and presented at the IEEE International Conference on

Acoustics, Speech and Signal Processing (ICASSP), 2013. Vancouver, Canada.

Khing, P. P., E. Ambikairajah, Swanson, B. A. (2013). “Predicting the effect of AGC

on speech intelligibility of cochlear implant recipients in noise” 2013 IEEE

International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Page: 8061-8065

Poster Presentation at International Conferences

In addition to the publications, the findings in various chapters of this thesis

were also presented at the international conferences most relevant to cochlear implant

research.

• Conference on Implantable Auditory Prostheses (CIAP), 2011. Asilomar, California,

USA.

• 8th Asia Pacific Symposium on Cochlear Implant and Related Sciences (APSCI),

2011. Daegu, Korea.

• 12th International Conference on Cochlear Implants and Other Implantable Auditory

Technologies, 2012. Baltimore, USA.

• 11th European Symposium on Paediatric Cochlear Implantation (ESPCI), 2013.

Istanbul, Turkey.

• Conference on Implantable Auditory Prostheses (CIAP), 2013. Asilomar, California,

USA.

Chapter 2 Sound and Hearing

2 Sound and Hearing

2.1 Introduction

Knowledge of the physiology of the ear, especially the inner ear, is necessary to design

sound processing algorithms for cochlear implants. The most effective way of processing is

to mimic the physiological functions that are bypassed by the prosthesis (Wilson and

Dorman 2008). This chapter is concerned with the mechanism of the auditory system, with

emphasis on the compressive function of the inner ear. It also studies loudness and speech

perception.

2.2 Hearing Mechanism

Hearing is a complex process that converts mechanical movements of a sound wave into

action potentials for the brain to process. Sound travels from the outer ear through the middle

ear to the inner ear. Figure 2-1 shows the structure of the peripheral part of the human

auditory system.

Figure 2-1 Illustration of the peripheral auditory system


The pinnae aid sound localization by modifying the incoming sound, particularly at high

frequencies. Sound travels through the ear canal and vibrates the ear drum. These vibrations

are transmitted through the middle ear by three small bones to the oval window of the

cochlea. The three small bones of the middle ear are called the malleus, incus and stapes.

The middle ear acts as the impedance matching unit between the outer ear and the inner ear

for efficient sound transmission. The middle ear is also thought to provide some automatic

gain control (AGC) via the stapedial reflex. The frequency range of human hearing is

typically from 20 Hz to 20 kHz and it is most sensitive to acoustic signals with frequency

between 1 kHz and 5 kHz, largely due to the resonance of the outer ear canal and the middle

ear.

The inner ear contains the cochlea and the vestibular (balance) system. Understanding the

function of the cochlea can provide insight into many aspects of auditory perception (Moore

2003a). A cross-section of the cochlea is shown in Figure 2-2. The cochlea is a spiral-

shaped, fluid-filled chamber of bone. It has three chambers: the scala tympani, scala

vestibuli and scala media. The scala tympani and the scala media are separated by the basilar

membrane, which supports the organ of Corti. The start of the cochlea, where the oval

window is located, is called the base. The basal end is most sensitive to high frequencies.

The inner tip at the other end, known as the apex, is most sensitive to low

frequencies. This is partly due to the physical properties of the basilar membrane, which is stiff

and thick at the basal end, where the wave travels faster, and thin and flexible at the apical end,

where the wave travels more slowly.


Figure 2-2 Cross-section of the cochlea

When the oval window is set in motion by the three small bones of the middle ear, a pressure

difference causes the basilar membrane to vibrate. Early research on the frequency

selectivity of the cochlea was carried out by Georg von Békésy. He observed the amplitude

displacement of the basilar membrane as a function of frequency by experimenting with the

inner ears of human cadavers. Sounds of different frequencies produce maximum

displacement at different places along the basilar membrane. The characteristic frequency

can be defined as the frequency that gives the maximal displacement on a particular location

on the basilar membrane. Each point on the basilar membrane is considered as a bandpass

filter with the centre frequency corresponding to the characteristic frequency. Hence the

cochlea acts like a frequency analyzer. The bandwidth roughly increases in proportion with

the characteristic frequency.

Between the basilar membrane and the tectorial membrane are hair cells. The hair cells

comprise three rows of outer hair cells and one row of inner hair cells. The protruding hairs on

top of the hair cells are called stereocilia. When the basilar membrane is moved by sound, a

shearing motion created between the basilar membrane and the tectorial membrane causes

the stereocilia to deflect. Deflections of the basilar membrane towards the tectorial


membrane cause the inner hair cells to initiate the action potentials in the neurons, also

known as the spiral ganglion cells, of the auditory nerve. No neuron firings occur for the

movement of the basilar membrane in the other direction. This phenomenon is called phase

locking and is effective for audio frequencies up to 5 kHz. Phase locking synchronizes

timing of the neuron firings to the waveform of the sound. Information is conveyed by the

neurons connected to the inner hair cells to the central nervous system. There are

approximately 30,000 neurons in the human cochlea. Unlike the inner hair cells, the outer

hair cells do not send information to the central nervous system. They function as an active

amplifier and compression unit. They are also responsible for the sharp tuning and high

sensitivity to sounds at different frequencies. The gain control mechanism of the cochlea is

described further in the following section.

2.3 Cochlear Compression

The dynamic range of normal hearing is approximately 120 dB (Killion 1978). Neural

response studies show that the dynamic range of auditory nerves is in the order of 20 to 30

dB. Therefore a level adjusting or gain control mechanism is necessary to damp or amplify

the basilar membrane motion according to the available dynamic range of the auditory

nerves. The outer hair cells act as muscles to feed the energy back into the basilar membrane.

In the absence of the outer hair cells, the basilar membrane motion gets damped by the organ

of Corti. When the stimulus is small, the outer hair cells feed the energy back into the basilar

membrane movement. They reduce the damping until the signal is large enough to transmit

to the higher centre of the brain. Similarly for a high level input stimulus, the outer hair cells

compress the motion of the basilar membrane (Yates 1995).
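As a rough worked figure using only the ranges quoted above, and assuming purely for illustration a single uniform compression ratio (CR) over the whole range, mapping 120 dB of acoustic range onto 20 to 30 dB of neural range requires:

```latex
\[
\mathrm{CR} = \frac{\Delta L_{\mathrm{in}}}{\Delta L_{\mathrm{out}}}
            = \frac{120\ \mathrm{dB}}{30\ \mathrm{dB}} = 4
\quad\text{to}\quad
\frac{120\ \mathrm{dB}}{20\ \mathrm{dB}} = 6
\]
```

The compression in the cochlea is not uniform across level, so these ratios should be read only as an average over the whole range.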

This behavior of the cochlea indicates non-linear motion of the basilar membrane. The

velocity in response to a tone at the characteristic frequency grows almost linearly at very low and very high

levels but with a shallow slope at mid-range levels. This compressive function allows the

peripheral auditory system to handle a very large range of input sound levels.
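The shape of this input-output function can be sketched as a piecewise-linear ("broken-stick") curve on dB axes. The knee points (30 and 90 dB) and the mid-range slope (0.2 dB/dB) below are hypothetical values chosen only to show the shape; they are not measurements reported in this thesis.

```python
import numpy as np

def bm_response_db(level_db, knee_lo=30.0, knee_hi=90.0, mid_slope=0.2):
    """Broken-stick input-output function (output in dB, arbitrary reference).

    The slope is 1 dB/dB below knee_lo and above knee_hi, and mid_slope in
    between. All parameter values are illustrative assumptions.
    """
    level_db = np.asarray(level_db, dtype=float)
    low = np.minimum(level_db, knee_lo)                                # linear segment
    mid = mid_slope * (np.clip(level_db, knee_lo, knee_hi) - knee_lo)  # compressive segment
    high = np.maximum(level_db - knee_hi, 0.0)                         # linear again
    return low + mid + high

levels = np.array([0.0, 30.0, 60.0, 90.0, 120.0])
print(bm_response_db(levels))  # a 120 dB input range maps to a 72 dB output range
```

With these assumed parameters most of the range reduction comes from the mid-level segment, where 60 dB of input is squeezed into 12 dB of output.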


2.4 Auditory Models

Auditory models can be divided into two broad groups: physiological models and

computational models. Physiological models attempt to reproduce the complex

hydromechanical activities of the cochlea by partial differential and integral equations.

Computational models, on the other hand, are much simpler than the physiological

models. They also attempt to predict the behaviour of the cochlea, but with less emphasis

on the physiological responses.

One computationally efficient model is Lyon’s cochlea model (Lyon 1982, 1983, 1984). The

model describes the propagation of sounds in the cochlea and the conversion of acoustical

energy into the firing of neurons. The detailed description of the model implementation can

be found in Slaney (1988). The filterbank consists of a series of notch filters that model the

travelling pressure waves and resonators that transform the pressure wave into basilar

membrane motion. A notch filter and a resonator together form a bandpass filter, with a very

sharp high frequency cutoff. The spacing between the centre frequencies is approximately

linear below 1 kHz and logarithmic above 1 kHz. Half-wave rectifiers that follow the

filterbank model the inner hair cells. The half-wave rectifier performs a non-linear detecting

operation of the inner hair cells. The compressive operation of the basilar membrane is

modelled by four stages of AGC. Each AGC uses a different time constant to simulate

different adaptation times in the ear. Each AGC is coupled with the nearest neighbours to

include the simultaneous masking effect of the ear. The output of the AGCs represents the

time-varying probability of firing of the neurons.
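The rectification and gain-control stages described above can be sketched as follows. This is a deliberately simplified illustration, not Lyon's or Slaney's implementation: the filterbank is omitted (the input is treated as a single channel), the coupling between neighbouring channels is omitted, and the time constants, target level and divisive gain rule are all hypothetical choices.

```python
import numpy as np

def half_wave_rectify(x):
    """Keep positive excursions only, as a crude inner-hair-cell detector."""
    return np.maximum(x, 0.0)

def agc_stage(x, fs, tau, target=0.1):
    """One divisive feedback AGC stage.

    A one-pole smoother tracks the stage output level, and the gain falls as
    that level rises. tau (seconds) and target are illustrative assumptions.
    """
    alpha = 1.0 - np.exp(-1.0 / (tau * fs))  # smoothing coefficient
    state = 0.0
    y = np.empty_like(x)
    for n, sample in enumerate(x):
        gain = 1.0 / (1.0 + state / target)
        y[n] = gain * sample
        state += alpha * (y[n] - state)      # update the level estimate
    return y

def cascade(x, fs, taus=(0.64, 0.16, 0.04, 0.01)):
    """Half-wave rectify, then apply four AGC stages with different time constants."""
    y = half_wave_rectify(np.asarray(x, dtype=float))
    for tau in taus:
        y = agc_stage(y, fs, tau)
    return y

fs = 8000
t = np.arange(fs) / fs
quiet = 0.1 * np.sin(2 * np.pi * 500 * t)
loud = 10.0 * np.sin(2 * np.pi * 500 * t)

# Compare output peaks over the last quarter second, after the gains settle.
tail = fs // 4
ratio_db = 20 * np.log10(cascade(loud, fs)[-tail:].max() / cascade(quiet, fs)[-tail:].max())
print(ratio_db)  # smaller than the 40 dB difference between the two inputs
```

Using several stages with different time constants lets the slow stages follow the overall level while the fast stages respond to rapid changes.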


Figure 2-3 Block diagram of Lyon's auditory model

The computational model of Lyon and Mead (1988) was derived from Lyon’s cochlea

model described above. One of the important changes incorporated in this model was the

replacement of time-invariant gain mechanisms with an adaptive active-gain model of the outer

hair cells. Time-varying gain factors resulted in Q-factors of the filters that change in time as

a function of input sound pressure level. The gain control behavior of the cochlea is effectively

coupled with the nearest neighbors such that a large signal detected in one place can reduce

the gain at other places. Iso-output curves are also sharpened by the coupled AGC

action.

Another widely used computational model was developed by Kates (1991). In Kates’ model,

the nonlinear compressive behavior of the filters in response to the change in the sound

pressure level and the response due to the damage of the hair cells were included.

2.5 Loudness Perception

Loudness is defined as the attribute of the auditory sensation in terms of which sound can be

ordered on a scale extending from quiet to loud (Moore 2003a). Loudness is measured by

psychophysical procedures such as scaling and matching.

In a direct scaling method, the subject chooses a number corresponding to the loudness of a

stimulus. Stevens’ power law proposed a relationship between the magnitude of a physical


stimulus and its perceived intensity or strength (Stevens 1957). Based on his theory, the

loudness of a pure tone has been expressed as a power function of sound intensity:

$$L = k I^{\alpha} \qquad (2.1)$$

where $L$ is the subjective unit of loudness, $k$ is a proportionality constant, $I$ is the intensity of

a tone and $\alpha$ is the exponent. $\alpha$ was approximately 0.6 in loudness estimation done with a

3 kHz tone of randomly chosen amplitude (Stevens 1975). The power or the scaling

term in the log domain is often used in relating the psychological aspects to the physical

aspects of the stimuli. Normal hearing people can reliably report perceived loudness from

intensity changes and discriminate fine intensity differences over a very large dynamic

range. Changes in intensity are highly correlated with loudness changes. However the

perceived loudness can also be affected by changes in frequency, duration and bandwidth of

a stimulus although the intensity remains fixed (Yost and Nielsen 1994).
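As a small numerical illustration of the power law in Equation (2.1), the sketch below computes the loudness ratio produced by a given change in intensity level. The function names are hypothetical, and the default exponent of 0.6 is simply the value quoted above; the constant k cancels in the ratio.

```python
import math

def loudness_ratio(level_change_db, exponent=0.6):
    """Loudness ratio L2/L1 for an intensity level change in dB.

    Assumes L = k * I**exponent; the constant k cancels in the ratio.
    """
    intensity_ratio = 10.0 ** (level_change_db / 10.0)
    return intensity_ratio ** exponent

def level_change_for_ratio(ratio, exponent=0.6):
    """Level change in dB needed to scale loudness by the given ratio."""
    return 10.0 * math.log10(ratio) / exponent

print(loudness_ratio(10.0))         # effect of a 10 dB increase in intensity level
print(level_change_for_ratio(2.0))  # dB needed to double loudness
```

In the log domain the power law becomes a straight line, log L = log k + exponent · log I, which is why the exponent acts as a scaling term between level in dB and log loudness.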

In a matching procedure, two stimuli are presented to the listener, who adjusts one of them until the two are judged

equal in some attribute such as loudness. For example, tones with equal intensity are not equally

loud if their frequencies are different. Fletcher and Munson are the pioneers of loudness

equalization across frequency. They produced the first equal-loudness contours using the

loudness matching procedure (Fletcher and Munson 1933; Fletcher and Munson 1937). In

their study, the subjects were asked to match the loudness of tones at different frequencies to

a reference tone at 1 kHz. The intensity of the reference tone was adjusted until it was

perceived to be as loud as the test tone. The lowest equal-loudness contour

represents the threshold of hearing and the highest contour represents the threshold of pain.

The threshold of hearing is typically used for testing hearing level (HL). For example, a

person with 40 dB HL at 1 kHz is 40 dB less sensitive than normal hearing subjects at that

frequency.

While the thresholds of audibility are important for defining the dynamic range of the auditory

system over frequency, it is also important to know how sensitive the system is to changes

in intensity and frequency. The auditory system can detect a change of approximately 0.5 to 1 dB

across a broad range of frequencies and intensities. The smallest detectable

change in intensity is called the just-noticeable difference (JND). The JND of


intensity can also be described by the Weber fraction $\Delta I / I$. For many conditions, the Weber fraction is

not constant. This subtlety in the auditory processing is often described as the “near miss to

Weber’s law” (Yost and Nielsen 1994).
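The JND expressed in dB and the Weber fraction are two descriptions of the same quantity. As a worked conversion (standard arithmetic, not a result from this thesis), with $\Delta I / I$ denoting the Weber fraction, a just-detectable level change of $\Delta L$ dB corresponds to:

```latex
\[
\Delta L = 10 \log_{10}\!\left(1 + \frac{\Delta I}{I}\right)
\quad\Longleftrightarrow\quad
\frac{\Delta I}{I} = 10^{\Delta L / 10} - 1
\]
\[
\Delta L = 1\ \mathrm{dB} \;\Rightarrow\; \frac{\Delta I}{I} \approx 0.26,
\qquad
\Delta L = 0.5\ \mathrm{dB} \;\Rightarrow\; \frac{\Delta I}{I} \approx 0.12
\]
```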

2.6 Speech Perception

Speech is a time-varying complex waveform whose intensity is modulated in both the time and

frequency domains. Speech communication occurs in various environments. Between a

speaker and a listener, speech is subjected to interference such as background noise from

external sources, reverberation from room acoustics and distortion due to poor transmission

channels. Yet speech, due to its redundant nature, resists noise and distortion

with little or no degradation in intelligibility. The work of Licklider and Pollack (1948)

showed that speech was still intelligible when all the amplitude information was eliminated

by infinite peak-clipping. One of the qualities of speech is its resistance to distortion.

Assmann and Summerfield reviewed the contribution of redundancy of speech under adverse

listening conditions (Chapter 5, Greenberg et al. 2004). They defined adverse listening

conditions as any perturbation of the communication process resulting from either an error in

production by the speaker, channel distortion or masking in transmission or a distortion in

the auditory system of the listener.
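Infinite peak-clipping, as used in the Licklider and Pollack experiment mentioned above, can be sketched in a few lines: every sample is replaced by its sign, so only the zero-crossing pattern survives. The test signal, and the finite clipper with its threshold, are hypothetical additions for comparison.

```python
import numpy as np

def infinite_peak_clip(x):
    """Replace each sample by its sign (+1, 0 or -1), discarding amplitude information."""
    return np.sign(np.asarray(x, dtype=float))

def peak_clip(x, threshold):
    """Finite symmetric peak-clipping at +/- threshold."""
    return np.clip(np.asarray(x, dtype=float), -threshold, threshold)

fs = 8000
t = np.arange(fs) / fs
# Amplitude-modulated tone: the 4 Hz modulation is the amplitude information.
x = (1.0 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 440 * t)

y = infinite_peak_clip(x)
# Sign changes (zero crossings) are identical before and after clipping.
same_crossings = np.array_equal(np.sign(x[1:]) != np.sign(x[:-1]), y[1:] != y[:-1])
print(same_crossings, np.unique(np.abs(y)))
```

The clipped waveform keeps the timing of the zero crossings while the slow amplitude modulation is removed from the waveform peaks.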

Research on speech perception has tried to uncover what features of speech are robust and

what features are critical for intelligibility. Redundancy of speech occurs at the acoustic,

phonetic and linguistic levels (Greenberg et al. 2004). At the acoustic level, the redundancy

is shown by the covariation in the pattern of the amplitude modulation across frequency and

time. At the phonetic level, one phonetic distinction can be signalled by many acoustic cues

(Klatt 1989). Linguistic knowledge and contextual information support the robustness of

speech at the sentence level.

Rosen (1992) categorized temporal information of speech into three groups: (i) envelope

cues from 2 to 50 Hz, (ii) periodicity cues from 50 to 500 Hz and (iii) temporal fine structure

cues from 600 Hz to 10 kHz. Low rate modulation of speech between 5 and 50 Hz conveys

important information for segmental (phonetic) and suprasegmental (stress pattern, words


onset/offset, prosody and speaking rate) distinctions of speech. From the study of Plomp

(1983), the modulation frequency of phonetic entities shows that the modulation of stress

pattern, words, syllables and phonemes is around 1 Hz, 2.5 Hz, 5 Hz and 12 Hz respectively.

According to the studies of Drullman and coworkers (Drullman, Festen and Plomp 1994a,

b), speech intelligibility was maintained as long as temporal envelope modulation below 16

Hz was preserved. The modulation spectrum can be sensitive to noise, to distortion such as

peak-clipping, and to filtering. For speech recognition in noisy conditions, other temporal cues

such as periodicity help to identify the target speaker. Periodicity cues are relevant to

intonation, voicing and consonant manner. Lastly, temporal fine structures are relevant to

information regarding consonant place and vowel quality.
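The envelope cues discussed above can be extracted from a waveform by rectification followed by low-pass filtering. The sketch below is one simple way to do this, assuming a 16 Hz cutoff (the modulation limit quoted from Drullman and coworkers) and an idealized FFT brick-wall filter; the test signal and all names are hypothetical.

```python
import numpy as np

def temporal_envelope(x, fs, cutoff_hz=16.0):
    """Full-wave rectify x, then remove modulation above cutoff_hz
    with an idealized (brick-wall) FFT low-pass filter."""
    rectified = np.abs(np.asarray(x, dtype=float))
    spectrum = np.fft.rfft(rectified)
    freqs = np.fft.rfftfreq(rectified.size, d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=rectified.size)

fs = 16000
t = np.arange(fs) / fs                             # 1 second of signal
modulator = 1.0 + 0.8 * np.sin(2 * np.pi * 5 * t)  # 5 Hz, near the syllable rate
x = modulator * np.sin(2 * np.pi * 1100 * t)       # 1.1 kHz carrier

env = temporal_envelope(x, fs)
print(np.corrcoef(env, modulator)[0, 1])           # close to 1
```

A practical system would use a causal low-pass filter rather than a whole-signal FFT; the brick-wall filter is used here only to keep the example short.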

Formants are vocal tract resonances that provide both phonetic information and cues to speaker

identity. The frequency locations of the first three formants, together with their transitions

in time, provide phonetic cues for vowels and certain consonants. Vowels are voiced sounds,

which are produced by vibration of the vocal folds in the larynx. Vowels form the nucleus of

syllables and possess distinct energies at the formant frequencies. Different vowels can be

distinguished by the frequencies of their first and second formants (F1 and F2). Vowel

perception involves an analysis of the spectral profile.

Consonants are produced with pressure, constriction or closure at some point along the

vocal tract. The closure and release of the vocal tract create rapid spectral changes. Consonants

differ from vowels in the rate of change of their short-time spectrum. According to

Stevens (1983), the density of speech information is highest during consonantal closures

and onsets. Each consonant can be distinguished by several phonetic features: manner, place

and voicing. Compared to vowels, consonants are more susceptible to masking by noise

because they are short in duration and low in energy.

Both spectral and temporal information are important for speech intelligibility but some

components of speech rely more on the temporal information and some on the spectral

information. For example, vowels, semivowels and nasals have a good spectral

representation in terms of formants but little temporal modulation information. On the other

hand, some consonants such as plosives and fricatives contain little spectral information.

Chapter 2 Sound and Hearing

They are mainly identified from the modulated waveform in the time domain. Both the
envelope modulation and the fine structure contribute to the intelligibility of speech.
Envelope modulation is the more important of the two, as speech can remain intelligible
without temporal fine structure.

The Long-Term Average Speech Spectrum (LTASS) has been studied for the spectral

characteristics of speech in adverse conditions (Dunn and White 1940; French and Steinberg

1947; Licklider and Miller 1951). LTASS has a 25 dB range of variation in average level

across frequency with most of the energy lying below 1 kHz. LTASS is dominated by voiced

sounds, with the energy peaking approximately at 500 Hz and gradually decreasing above

that. The low frequency emphasis of speech improves robustness in adverse conditions. For

instance, the reliability of phase locking below 1.5 kHz is supported by the low frequency

emphasis (Greenberg 1996). The first three formants below 3 kHz carry most of the intelligibility.

Common periodicity and interaural timing cues for separation of target speech and other

sounds are preserved in the low frequency neural discharge pattern (Greenberg et al. 2004).

2.7 Conclusion

Speech is a robust signal because it has redundant cues at different levels. Even heavily
distorted speech can remain intelligible to normal-hearing listeners thanks to robust peripheral
processing. The peripheral hearing system can handle a 120 dB range of sounds. Filtering at
various stages of the ear improves robustness. However, for hearing prostheses such as
cochlear implant systems, it is important to preserve critical cues to maintain speech
intelligibility because the impaired hearing system cannot process all the available acoustic
features. The electric hearing of a cochlear implant system and the speech perception of cochlear
implant recipients will be studied in the next two chapters.

Chapter 3 Cochlear Implants

3 Cochlear Implants

3.1 Introduction

A cochlear implant is a solution for persons with severe-to-profound deafness who can no

longer benefit from hearing aids for speech understanding. Electrical stimulation of the

auditory nerves with the cochlear implant partially restores hearing. This chapter briefly

describes the history of cochlear implants and the clinical aspects. It then describes the
components of a cochlear implant system, the principles of operation and the psychophysics.
Finally, it examines the effects of the loss of cochlear compression on loudness and speech
perception.

3.2 Brief History of Cochlear Implants

Single-electrode cochlear implants were trialled between 1950 and 1980. Djourno and
Eyries (1957) implanted a telecoil on the auditory nerve and stimulated it with bursts of a 100
Hz signal at a rate of 15 to 20 times per minute. With postoperative rehabilitation, the
patient could distinguish certain words but did not develop speech recognition. House and
Urban (1973) implanted a single electrode in the scala tympani of three patients. The patients
experienced hearing sensation but the study did not elaborate on speech understanding.
Motivated by Helmholtz’s place theory, cochlear implants with multiple electrodes were trialled
in the 1970s to deliver different frequency components to different places along the basilar
membrane. The University of Melbourne’s multichannel electrode array was implanted in a
recipient in 1978 (Clark 2003). The subject perceived both place pitch and rate pitch,
with a significant improvement in open-set speech understanding. With technology advances
over four decades, cochlear implant recipients today have speech understanding comparable
to that of normal-hearing persons in favourable listening conditions.

Initially, only patients with a bilateral profound sensorineural hearing loss with no open-set

speech recognition were considered as candidates for cochlear implantation. Nowadays,

individuals with a greater amount of residual hearing and higher pre-implant speech recognition


scores are also considered for the implant (Gifford 2011). Most importantly, a candidate
needs to have some functional auditory nerves for a cochlear implant to work successfully.
Hence the audiometric thresholds and speech recognition results are key factors in determining
candidacy for adults. Post-lingually deaf adults and pre-lingually deaf children are the
potential candidates for cochlear implants. Candidacy for children also considers age,
etiology, auditory progress with hearing aids and, for older children, speech recognition
performance. Children who receive their implants before two years of age show the
most benefit from cochlear implantation (Kral and O'Donoghue 2010) and can show
language skills equivalent to those of normal-hearing children. According to Yoshinaga-Itano
et al. (1998), early identification of hearing loss and early intervention can provide
significantly better language development.

Profound deafness can have a great impact on education, employment and social life. Hence

cochlear implants have improved the quality of life for the recipients. The cochlear implant

has been labelled as the most successful and effective implantable prosthesis in terms of

restoring function to recipients (Wilson and Dorman 2008). As of today, more than 181,000
people have received Nucleus cochlear implants. A combined effort from various disciplines

including bioengineering, physiology, otolaryngology, speech science, material science and

signal processing technology has contributed to the success of cochlear implants.


3.3 Cochlear Implant Systems

Figure 3-1 Cochlear implant system

A cochlear implant system consists of (1) a radio frequency (RF) transmitter coil, (2) a sound

processor, (3) a receiver-stimulator and (4) an electrode array. The numbers are as labeled in
Figure 3-1. Early sound processors were body-worn but modern processors are worn behind the ear
(BTE). The receiver-stimulator implant unit is placed under the skin behind the ear. The
electrode array is surgically inserted into the scala tympani of the cochlea. The sound
processing steps of a cochlear implant system are as follows:

• The microphones of the sound processor capture incoming acoustic signals.

• The sound processor processes the captured audio and transmits encoded data and
power to the implant transcutaneously via the RF transmitter coil.

• The internal receiver-stimulator circuit decodes the digitized waveform and sends a
pulse train to the electrode array inside the cochlea.

• The electrical current pulses excite the neurons of the auditory nerve to give hearing
perception to the recipient.

At present, there are four major cochlear implant companies: Cochlear Limited (Sydney,

Australia), Med-El (Austria), Advanced Bionics (USA) and Neurelec (France). Table 3-1


lists recent cochlear implant models and default sound coding strategies of the CI

manufacturers.

Manufacturer | Implant model | Number of electrodes | Sound coding strategy
Cochlear Limited | CI24RE | 22 intra-cochlear electrodes + 2 extra-cochlear electrodes | Continuous Interleaved Sampling (CIS), Spectral Peak (SPEAK), Advanced Combinational Encoder (ACE)
Med-El | Pulsar | 19 intra-cochlear electrodes + 2 extra-cochlear electrodes | CIS, Fine Structure Processing (FSP), High Definition CIS (HD-CIS)
Advanced Bionics | HiRes 90K | 16 intra-cochlear electrodes + 2 extra-cochlear electrodes | CIS, HiResolution (HiRes) strategies
Neurelec | Digisonic | 20 intra-cochlear electrodes + 2 extra-cochlear electrodes | CIS, Main Peaks Interleaved Sampling (MPIS)

Table 3-1 Recent cochlear implant systems of the major cochlear implant companies

Cochlear Limited has manufactured three generations of Nucleus cochlear implants, which
can be distinguished by their Integrated Circuit (IC) as shown in Table 3-2. Nucleus implants

have 22 intra-cochlear electrodes. In addition, CI24M, CI24R and CI24RE implants have

two extra-cochlear electrodes.


System name | Implant model | IC | RF link frequency (MHz) | Total pulse rate (pps)
Nucleus 22 | CI22M | CIC1 | 2.5 | 1500
Nucleus 24 | CI24M, CI24R | CIC3 | 5.0 | 14400
Nucleus Freedom | CI24RE | CIC4 | 5.0 | 31500
Nucleus 5 | CI24RE | CIC4 | 5.0 | 31500

Table 3-2 Nucleus cochlear implant models

The first extra-cochlear electrode is a ball electrode connected by a separate lead wire and

the second extra-cochlear electrode is the platinum plate mounted on the titanium package

(see the left picture of Figure 3-2).

Figure 3-2 Nucleus 5 system

Over the last 18 years, Cochlear Ltd has produced five generations of Nucleus sound

processors. Table 3-3 lists the Nucleus sound processors together with the type of filterbank.

As the filterbank consumed most of the signal processing power, decisions regarding the

implementation of the filterbank determined the overall processor architecture (Swanson et

al. 2007). Two different technologies have been used to implement the filterbank: switched

capacitor filters (SCF) and digital signal processing (DSP). Modern sound processors are

driven by the power of DSP. The AGC systems used in the Nucleus systems will be

elaborated in chapter 5.


Sound Processor Name | Processor Type | Filterbank Type
Spectra | Body worn | SCF
Sprint | Body worn | DSP
ESPrit | BTE | SCF
Freedom | BTE | DSP
CP810 | BTE | DSP

Table 3-3 Nucleus sound processors

3.4 Electrical Stimulation

During the first fitting session after the implantation, a recipient’s sound processor

communicates with the implant for the first time. The programming software determines the

processor type and speech processing strategy to generate basic stimulation parameters such

as stimulation mode and rate. Using these basic stimulation parameters, a clinician measures

the current level for each electrode pair to obtain the usable dynamic range of the electrical

current stimulation on the auditory nerve. Traditionally, short bursts of pulses are presented

on a single channel at increasing current levels to determine T and C levels. The minimum

current level that provides a consistent hearing sensation is called T-level. The maximum

current level that gives a comfortably loud hearing sensation is called C-level. The clinical

software then creates a program or MAP for the recipient to encode the amplitudes of

acoustic signals into electrical stimulation levels.

3.4.1 Stimulation Mode

The stimulation mode refers to an electrode configuration for the current flow between an

active and a reference electrode. The most commonly used stimulation mode in cochlear

implant systems from all manufacturers is monopolar mode, where current flows between an

intra-cochlear (active) electrode and an extra-cochlear (reference) electrode.


3.4.2 Current Configuration

All commercial implants use a charge-balanced biphasic waveform as shown in the left

diagram of Figure 3-3. One current pulse contains two phases of equal duration and

amplitude. This form of stimulation is safe because no net charge remains in the tissue or

electrode bands. Each biphasic current pulse is delivered between an active electrode and a

reference electrode. The opposite polarity of the two phases indicates the direction of current

flow between the active electrode and the reference electrode. The loudness perceived from

the electrical stimulation is influenced by both current level and pulse width (Shannon 1983).

Loudness is determined by the total charge delivered by a pulse. Total charge is the product

of the current level and the duration of the pulse (i.e. the pulse width). Hence the level of

loudness can be adjusted by changing either the current level or the pulse width such that

pulses with an equal charge produce equal loudness sensation. The right plot of Figure 3-3

shows two biphasic pulses with an equal charge.
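The equal-charge relationship described above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name and the example currents and pulse widths are hypothetical values chosen for the demonstration, not parameters of any particular implant.

```python
def phase_charge_nc(current_ua: float, pulse_width_us: float) -> float:
    """Charge of one phase in nanocoulombs.

    uA multiplied by us gives picocoulombs, so divide by 1000 for nC.
    """
    return current_ua * pulse_width_us / 1000.0


# Two hypothetical pulses: halving the current while doubling the pulse
# width leaves the charge per phase unchanged.
narrow_pulse = phase_charge_nc(current_ua=800.0, pulse_width_us=25.0)
wide_pulse = phase_charge_nc(current_ua=400.0, pulse_width_us=50.0)
```

Under the charge-based description in the text, these two pulses would be expected to produce a similar loudness sensation.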

Figure 3-3 Stimulation of the biphasic waveform (left panel) and two current
waveforms with equal charge (right panel)

In the Nucleus 24 implant, the actual current level i in microamperes (uA) is calculated from
the clinical current level c by the following equation:

$$i = I_{min} \left( \frac{I_{max}}{I_{min}} \right)^{c/255} \qquad (3.1)$$

where $I_{min}$ is the minimum current level (10 uA), produced when the clinical current value c is
zero, and $I_{max}$ is the maximum current level (1750 uA), produced when c = 255 (Swanson
2008). The clinical current level in the Nucleus 24 system is specified by an 8-bit value from
0 to 255. The current calculation can also be expressed in terms of an exponential function:

$$i = I_{min}\, e^{r c} \qquad (3.2)$$

where $r = \ln(I_{max}/I_{min})/255 = 0.0203$ indicates an increment of about 2% in the current
level for each step increase in the clinical current unit.
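The two forms of the current calculation can be compared numerically. This is a minimal sketch, assuming a minimum current of 10 uA and a maximum of 1750 uA for the Nucleus 24 implant (values assumed here; their ratio is consistent with r = 0.0203). The function and constant names are illustrative.

```python
import math

I_MIN_UA = 10.0    # assumed minimum current (clinical level 0)
I_MAX_UA = 1750.0  # assumed maximum current (clinical level 255)
C_MAX = 255        # 8-bit clinical current level range is 0..255


def current_ua(c: int) -> float:
    """Power-of-ratio form: i = I_min * (I_max / I_min) ** (c / 255)."""
    if not 0 <= c <= C_MAX:
        raise ValueError("clinical current level must be in 0..255")
    return I_MIN_UA * (I_MAX_UA / I_MIN_UA) ** (c / C_MAX)


# Growth constant of the equivalent exponential form i = I_min * exp(r * c).
R = math.log(I_MAX_UA / I_MIN_UA) / C_MAX


def current_ua_exp(c: int) -> float:
    """Exponential form of the same mapping."""
    return I_MIN_UA * math.exp(R * c)


# Ratio between adjacent clinical steps (about a 2% increase per step).
step_ratio = current_ua(101) / current_ua(100)
```

Because the mapping is exponential, every clinical step multiplies the current by the same factor, so equal steps in clinical units correspond to equal steps in dB.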

The Nucleus implants can only produce non-simultaneous current pulses because a single

current source is available. The flow of current is routed between electrodes by electronic

switches. An example of sequential current pulses with different amplitudes is shown in

Figure 3-4.

Figure 3-4 Sequential pulse stimulation, showing timing and amplitudes (axes: channel versus time in ms)

3.5 Loudness Perception

The narrow dynamic range for electrical stimulation is a direct consequence of the loss of

cochlear gain control function (Chapter 6, Bacon, Fay and Popper 2004). The electrical

dynamic range is defined as the ratio between C and T currents. Stevens’ power law states

that the loudness grows as a power of sound intensity with the exponent of 0.6 for the


acoustic hearing (Equation 2.1). Fu and Shannon (1998b) estimated the loudness perceived by
three Nucleus 22 recipients for current levels between T and C, at a stimulation rate
of 500 pps. The electrical loudness estimate was well fitted by a power function of the
current level in uA with an exponent $\beta = 2.7$:

$$L = k\, i^{\beta} \qquad (3.3)$$

where k is the subject-specific constant and i is the current level in uA. The power function of
the loudness perceived by cochlear implant recipients is in agreement with Stevens’ power
law. A greater exponent indicates a more rapid growth of loudness with current level. Loudness
is normalized by the reference current level $i_{ref}$, which produces a loudness of one unit
(Swanson 2008). The loudness as a function of current can be calculated by:

$$L = \left( \frac{i}{i_{ref}} \right)^{\beta} \qquad (3.4)$$

When $i_{ref}$ and i in the above equation are substituted using equation 3.2, with $c_{ref}$
denoting the clinical current level that corresponds to $i_{ref}$, L becomes:

$$L = e^{\beta r (c - c_{ref})} \qquad (3.5)$$

Taking the natural logarithm of both sides of equation 3.5, the log loudness becomes:

$$\ln L = \beta r (c - c_{ref}) \qquad (3.6)$$

Hence the loudness on a log scale is a linear function of the clinical current level.
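The linear relationship between log loudness and clinical current level can be illustrated as follows. This is a sketch under stated assumptions: it uses the exponent 2.7 and r = 0.0203 quoted in the text, while the reference level `c_ref` (the clinical level at which loudness equals one unit) and the function names are hypothetical.

```python
import math

BETA = 2.7   # loudness exponent quoted in the text (Fu and Shannon 1998b)
R = 0.0203   # growth constant of current per clinical step, from the text


def log_loudness(c: float, c_ref: float) -> float:
    """Natural log of normalized loudness: linear in the clinical level c."""
    return BETA * R * (c - c_ref)


def loudness(c: float, c_ref: float) -> float:
    """Normalized loudness; equals 1 when c == c_ref."""
    return math.exp(log_loudness(c, c_ref))


# Each clinical step multiplies loudness by the same factor, exp(BETA * R).
per_step_factor = loudness(101, 100)
```

With these numbers one clinical step changes loudness by roughly 5.6%, compared with roughly 2% for the current itself, which reflects the steep loudness growth in electric hearing.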

The dynamic range of neurons in acoustic hearing varies from 10 to 50 dB depending on the

spontaneous activity. On the other hand, the dynamic range of electrical stimulation is

uniformly narrow from 5 to 20 dB even though T and C levels are different between subjects

(Bacon, Fay and Popper 2004).

The number of discriminable current steps is also important. Nelson et al. (1996) measured

the difference limens (DLs) of eight cochlear implant subjects for changes in electrical

current, using 300 ms stimulus bursts at a rate of 125 Hz. Their study showed that the intensity
discrimination of cochlear implant recipients can be quantified by Weber fractions in
decibels, $10\log_{10}(\Delta I / I)$, as in acoustic stimulation. The average exponent of the Weber

functions of electric stimulation is an order of magnitude higher than that of acoustic


stimulation. This was attributed to the loss of cochlear compression when the nerve is
stimulated directly. The cumulative number of discriminable intensity steps across the dynamic
range of electric hearing ranged from as few as 6.6 to as many as 45.2.

3.6 Conclusion

This chapter described the principles of cochlear implant hearing. With an electrode array,

place coding is performed by stimulating neurons on different places of the cochlea.

Loudness is determined by the total charge delivered by a current pulse, so changing either the
pulse width or the amplitude adjusts loudness. The number of discriminable intensity steps is
limited, ranging from 6 to 45, in cochlear implant hearing. Speech represented by the

electrical current waveforms is crude due to the limited spectral, temporal and intensity cues

that can be transmitted to the implant. Therefore it is important for a sound coding strategy to

selectively present important cues for the intelligibility of speech and other sounds. The next

chapter will discuss the sound coding strategies of cochlear implant systems and the speech

perception of cochlear implant recipients.

Chapter 4 Cochlear Implant Sound Coding Strategies

4 Cochlear Implant Sound Coding Strategies

4.1 Introduction

The previous chapter described how cochlear implant hearing arises from electrical stimulation
of the auditory nerve by a multi-channel electrode array. Place and rate coding of frequency are

important for speech perception. In this chapter, sound coding strategies of cochlear implant

systems are studied to see how each strategy transforms the acoustic stimuli to electrical

stimuli to provide the spectral, temporal and intensity cues for hearing.

4.2 Sound Coding Strategies

A sound coding strategy transforms acoustic signals to electrical current pulses. The main

objective of a sound coding strategy is to provide the essential cues of the speech waveform that
cochlear implant recipients need for speech understanding. Early sound coding strategies for the

Nucleus systems extracted key speech features from sounds for speech understanding. The

F0/F2 strategy coded the fundamental frequency (F0) as the rate of stimulation and the

second formant (F2) as place of stimulation. The F0/F2 strategy was later upgraded to the

F0/F1/F2 strategy by stimulating one more electrode that corresponded to the first formant

frequency (F1). The F0/F1/F2 strategy was extended to the Multipeak strategy by stimulating

two additional electrodes at the basal end to include frequency information above 2 kHz.

These strategies worked well for listening conditions in quiet (Clark 2003).

The filterbank-based speech processing algorithms were developed in the 1990s. These

strategies present the spectral information of sounds by place of stimulation, making the best
use of the tonotopic structure of the cochlea. More importantly, the filterbank-based strategies
provided higher speech intelligibility than the feature-based strategies. The signal processing
framework of the filterbank-based sound processing strategies is shown in Figure 4-1. The
front-end performs frequency shaping and level adjustment on the audio signals captured by
the microphones. The front-end block typically involves a pre-emphasis filter, a sensitivity
control and an AGC system. The filterbank splits the audio signal into frequency bands such


that each frequency band is allocated to one stimulation channel. The sampling and selection

block determines the rate of stimulation and the shape of stimuli across the frequency

channels. The amplitude mapping block converts the channel amplitudes into current levels

within the predetermined electrical dynamic range of each electrode.

Figure 4-1 Cochlear implant sound processing (Swanson 2008)

SPEAK, ACE, CIS and HiRes are the most widely used sound coding strategies in
cochlear implant systems. They can be sorted into two groups based on the
channel selection strategy. CIS and HiRes stimulate all frequency channels whereas ACE
and SPEAK pick the subset of frequency channels with the largest amplitudes.

4.2.1 Continuous Interleaved Sampling (CIS)

The Continuous Interleaved Sampling (CIS) sound coding strategy was initially developed

for the Ineraid implant, whose six electrodes are connected directly to a percutaneous plug
(Wilson et al. 1991). The CIS strategy is offered by all manufacturers (Table 3-1). CIS uses
as many frequency bands as there are active electrodes. CIS stimulates
different sites in the cochlea based on the frequency content of the input signal. It
emphasizes presenting the rapid temporal variation of the input signals via a high stimulation


rate. Each of the filterbank envelopes is sampled at a fixed rate, with the current pulses on

each channel interleaved in a round-robin fashion. The signal processing blocks of the CIS
strategy are shown in Figure 4-2. The pre-emphasis filter attenuates the frequency components
below 1.2 kHz at 6 dB/octave, which reduces the dominant low frequency noise and raises
the energy level of the relatively weak consonants compared to vowels. The filterbank is

followed by full-wave rectification and lowpass filtering for envelope extraction. The

frequency channels are assigned to the selected electrodes, following the tonotopic order of

the cochlea.
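The envelope extraction and round-robin interleaving described above can be sketched as follows. This is a simplified illustration, not the processing of any commercial device: the one-pole low-pass filter, its coefficient `alpha`, and all function names are assumptions, and the band-pass filterbank itself is omitted.

```python
def full_wave_rectify(samples):
    """Full-wave rectification: take the magnitude of every band-pass sample."""
    return [abs(s) for s in samples]


def one_pole_lowpass(samples, alpha):
    """Smooth the rectified signal with a one-pole low-pass filter (assumed form)."""
    state = 0.0
    out = []
    for s in samples:
        state += alpha * (s - state)
        out.append(state)
    return out


def interleave_pulses(envelopes, channel_rate_pps):
    """Return (time_s, channel, amplitude) tuples in round-robin channel order.

    envelopes[ch][frame] is assumed to be already sampled at channel_rate_pps.
    """
    num_channels = len(envelopes)
    num_frames = len(envelopes[0])
    frame_period = 1.0 / channel_rate_pps
    slot = frame_period / num_channels  # each channel gets its own time slot
    pulses = []
    for frame in range(num_frames):
        for ch in range(num_channels):
            time_s = frame * frame_period + ch * slot
            pulses.append((time_s, ch, envelopes[ch][frame]))
    return pulses


# Envelope of a toy band-pass signal, then interleaving of three toy channels.
envelope = one_pole_lowpass(full_wave_rectify([0.0, -1.0, 1.0, -1.0]), alpha=0.5)
pulses = interleave_pulses([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], channel_rate_pps=1000.0)
```

Because every channel has its own time slot within the frame period, no two pulses coincide, which is the property that makes the stimulation non-simultaneous.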

Figure 4-2 Continuous Interleaved Sampling strategy (Wilson 2006b)

4.2.2 HiResolution (HiRes)

The HiResolution (HiRes) strategy (Firszt 2003), offered by Advanced Bionics, is a close
variation of the CIS strategy. In HiRes there are 16 logarithmically spaced filters in the
filterbank, corresponding to 16 active electrodes in the cochlea. The principal differences
between HiRes and CIS are the number of stimulation channels, the rate of stimulation and the
use of a half-wave rectifier alone for envelope detection in HiRes (Wilson 2006a). HiRes can provide


temporal information up to 2800 Hz across 16 frequency channels. The maximum stimulation
rate depends on the number of electrodes used and the pulse width.

4.2.3 Spectral Peak (SPEAK)

The Spectral Maxima Sound Processing (SMSP) strategy was developed at the University of
Melbourne for the Nucleus 22 cochlear implant. In this scheme the six largest outputs of 16
frequency channels were used for stimulation at a constant rate of 250 pps. Cochlear
implemented SMSP on the Spectra sound processor of the Nucleus 22 system, where it became
well known as the SPEAK strategy (McDermott, Mckay and Vandali 1992; McDermott et al.
1993).

The front-end of the Spectra processor involved a sensitivity control and a fast front-end
AGC. The sensitivity control adjusted the AGC compression threshold to increase the
dynamic range of the input signal. The fast front-end AGC had attack and release times of
2.5 ms and 50 ms respectively, and a compression ratio of two for signal peaks
above the compression threshold. SPEAK differs from SMSP by using a total of 20
frequency channels. The total stimulation rate was limited to 2000 pps in the Nucleus 22
implant. Hence the stimulation rate per channel would be no more than 100 pps if all 20 channels
were stimulated. To provide adequate temporal information in the channel envelopes, the
number of channels was nominally fixed at six, for an average stimulation rate of 250 pps.
SPEAK selects the channels with the highest amplitudes. The number of channels
selected depended on the number of channel amplitudes above the preset noise threshold
and on the energy distribution of the input signal across frequency. Therefore the number of
selected channels could sometimes be less than six. For that reason, the stimulation rate
varied from 180 to 300 pps.

4.2.4 Advanced Combinational Encoder (ACE)

The Advanced Combinational Encoder (ACE) sound coding strategy employs a maxima
selection method similar to that of the SPEAK strategy. The signal processing components of
the ACE strategy are shown in Figure 4-3. ACE became available to recipients when the Nucleus


24 system was launched in 1997. Since the Nucleus 24 implant is able to stimulate at a higher

rate, the number of maxima can be increased without reducing the stimulation rate per

electrode. The maximum stimulation rate of the Nucleus 24 implant is 14400 pps. Therefore

compared to SPEAK, ACE can present more spectro-temporal details of the input signals by

allowing more channels to be stimulated at a higher rate.
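The trade-off between the number of stimulated channels and the per-channel rate is simple arithmetic. The sketch below uses the totals quoted in the text (2000 pps for the Nucleus 22 and 14400 pps for the Nucleus 24); the choice of eight maxima for ACE is a hypothetical example, and the result is an upper bound that ignores any timing overhead.

```python
def per_channel_rate_pps(total_rate_pps: float, channels_per_cycle: int) -> float:
    """Upper bound on the per-channel rate when pulses are shared equally."""
    return total_rate_pps / channels_per_cycle


# SPEAK on the Nucleus 22: stimulating all 20 channels gives at most 100 pps each.
speak_all_channels = per_channel_rate_pps(2000, 20)

# SPEAK with six maxima at 250 pps each uses 1500 pps in total.
speak_six_maxima_total = 6 * 250

# ACE on the Nucleus 24 with a hypothetical eight maxima.
ace_eight_maxima = per_channel_rate_pps(14400, 8)
```

With eight maxima the Nucleus 24 total would allow up to 1800 pps per channel, which illustrates how ACE can increase both the number of maxima and the per-channel rate relative to SPEAK.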

Figure 4-3 Signal processing modules of the ACE strategy

4.2.5 Alternative Channel Selection Rules

There are also sound coding strategies that pick channels by a different set of rules.

The Psychoacoustic Advanced Combinational Encoder (PACE) strategy applied a
psychoacoustic masking model to select the perceptually important channels above the
masking threshold (Nogueira et al. 2005). In speech tests with cochlear
implant recipients, PACE performed better than ACE when each strategy selected only four
channels per stimulation cycle. Hu and Loizou (2008) proposed a sound coding strategy with

a channel selection criterion based on the SNR of the channels. The SNR in each channel


was estimated, and target-dominated channels with SNR ≥ 0 dB were selected, while

masker-dominated channels with SNR < 0 dB were discarded.
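The SNR-based selection rule can be sketched as below. This is an idealized illustration that assumes the target and masker powers in each channel are known separately, whereas a practical system would have to estimate the SNR from the noisy mixture. The function names and example powers are hypothetical.

```python
import math


def channel_snr_db(target_power: float, masker_power: float) -> float:
    """SNR of one channel in dB, with explicit handling of zero powers."""
    if masker_power == 0.0:
        return math.inf
    if target_power == 0.0:
        return -math.inf
    return 10.0 * math.log10(target_power / masker_power)


def select_target_dominated(target_powers, masker_powers, threshold_db=0.0):
    """Keep channels whose SNR is at or above the threshold; discard the rest."""
    return [
        ch
        for ch, (t, m) in enumerate(zip(target_powers, masker_powers))
        if channel_snr_db(t, m) >= threshold_db
    ]


# Channel 2 is masker-dominated (about -3 dB) and is discarded.
selected = select_target_dominated([4.0, 1.0, 2.0, 0.5], [1.0, 1.0, 4.0, 0.5])
```

Unlike maxima selection, the number of channels retained by this rule changes from frame to frame with the noise conditions.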

4.3 Signal Processing Modules

The previous section described the major sound coding strategies for cochlear implant

systems. The signal processing modules of the cochlear implant signal path are studied in

this section. These signal processing modules are within the framework of any sound coding

strategy.

4.3.1 Microphone Directionality and Pre-emphasis

A directional microphone can improve the SNR if target speech and other sounds come from

different directions, for example if the speaker is in front of the recipient and interfering

sounds come from behind. Early Nucleus sound processors only had a single directional

microphone. The Freedom sound processor employs a directional microphone and an omni-

directional microphone. The Nucleus CP810 sound processor has two omni-directional

microphones whose phase and magnitude are calibrated by digital filters in the firmware to

form different microphone directionalities. In addition to the fixed-directional spatial filters,

Freedom and CP810 sound processors also have an adaptive-directional spatial noise

reduction technique called BEAM (Spriet et al. 2007).

A delay-sum directional microphone has a frequency response shape that emphasizes high

frequency components more than low frequency components. The pre-emphasis filter

provides a gain of approximately 5 dB per octave between 0.5 and 4 kHz. It flattens the

energy presented to the filterbank for the long term average spectrum of speech. It therefore

balances the energy ratio between vowels and consonants. In addition, the pre-emphasis
filter prevents the subsequent gain control systems from being driven predominantly by
intense low frequency sounds, for example car noise or the recipient’s own voice. Without

the pre-emphasis, maxima selection would select more low frequency channels than high

frequency channels.


4.3.2 Front-end Gain Control

The front-end gain control system consists of the sensitivity control and the front-end AGC

system. The purpose of the front-end gain control is to increase the operating range of

acoustic signals beyond the pre-determined input dynamic range. The front-end AGC

systems for the Nucleus sound processors will be described in chapter 5 (§5.5).

4.3.3 Filterbank

The filterbank splits the frequency components of the input signals into the frequency bands

whose centre frequencies correspond to the characteristic frequencies of the cochlea. The

filterbank can be a group of Infinite Impulse Response (IIR) or Finite Impulse Response

(FIR) filters, followed by half-wave rectification and low-pass filtering for envelope

extraction. Modern DSP processors, such as SPrint, Freedom and CP810, use an efficient

FFT implementation for the frequency analysis of the input audio waveform. Hence the FFT

filterbank is of interest and explained below.

A 128-point FFT is done on the analysis frame. Before the FFT analysis, a Hann window is

applied to the input signals. Each analysis frame is overlapped with the previous frame to

make the analysis rate or the envelope sample rate close to the stimulation rate. The audio

sample rate of the Nucleus CP810 sound processor (§7.4.5.1) is approximately 16 kHz.

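The Hann window is described by the following equation:

$$w(n) = 0.5\left[1 - \cos\left(\frac{2\pi n}{N}\right)\right] \qquad (4.1)$$

where N = 128 and n is the sample index from 1 to N.

The Discrete Fourier Transform (DFT) of an input sequence x(n) of N samples is described by the
equation below:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N} \qquad (4.2)$$

where k is the FFT bin and n is the sample index from 0 to N – 1.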


In contrast to the DFT implementation, which needs $N^2$ multiply-accumulate
operations, the FFT implementation requires only on the order of $N \log_2 N$ multiply-accumulate
operations. For a 128-point FFT analysis of a real input, there are 65 bins
with centre frequencies spaced linearly at multiples of 125 Hz.
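The windowed frequency analysis can be illustrated with a direct DFT in plain Python. This is a sketch for checking the bin arithmetic only: it assumes the standard Hann window formula, takes the sample rate as exactly 16 kHz, and evaluates the DFT sum directly rather than using an FFT. All names are illustrative.

```python
import cmath
import math

N = 128        # FFT length from the text
FS = 16000.0   # audio sample rate in Hz (approximate value from the text)


def hann_window(n_points):
    """Hann window with the sample index running from 1 to n_points."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / n_points))
            for n in range(1, n_points + 1)]


def dft_half(x):
    """Direct DFT of a real sequence, returning bins 0 .. N/2 only."""
    n_points = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_points)
                for n in range(n_points))
            for k in range(n_points // 2 + 1)]


def bin_frequency_hz(k):
    """Centre frequency of FFT bin k."""
    return k * FS / N


# Analyse a 1 kHz tone: it should peak in bin 8 (8 * 125 Hz).
window = hann_window(N)
tone = [math.cos(2.0 * math.pi * 1000.0 * n / FS) for n in range(N)]
spectrum = dft_half([w * s for w, s in zip(window, tone)])
magnitudes = [abs(value) for value in spectrum]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)

# Approximate operation counts for the direct DFT and the FFT.
dft_macs = N * N
fft_macs = N * int(math.log2(N))
```

The tone also appears at half the peak magnitude in bins 7 and 9, which is the leakage between adjacent bins introduced by the Hann window.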

4.3.4 Combine into Channels

A linear-log frequency spacing is required for the frequency bands: the centre frequencies

are linearly spaced below 1 kHz and logarithmically spaced above 1 kHz. For the bands

below 1 kHz, each band is assigned to one FFT bin. Bins 0 and 1 are discarded and the

assignment starts from bin 2. For frequency bands above 1 kHz, two or more consecutive

FFT bins are combined to produce wider bands. The default frequency range is 187 – 7937

Hz. The frequency allocation for a 22-channel filterbank is shown in Figure 4-4.
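The band edges implied by this allocation can be computed from the number of FFT bins assigned to each band. The bin counts below are an assumption: they are a commonly quoted 22-band allocation that is consistent with the 187 to 7937 Hz range in the text, but they are not taken from this document. Each bin is treated as covering 125 Hz centred on its centre frequency.

```python
BIN_WIDTH_HZ = 125.0   # bin spacing of a 128-point FFT at 16 kHz
FIRST_BIN = 2          # bins 0 and 1 are discarded

# Assumed number of FFT bins per band (22 bands, 62 bins in total).
BINS_PER_BAND = [1] * 9 + [2] * 4 + [3] * 2 + [4] * 2 + [5] * 2 + [6, 7, 8]


def band_edges_hz():
    """Return (lower_edge, upper_edge) in Hz for each band, low to high."""
    edges = []
    start = FIRST_BIN
    for width in BINS_PER_BAND:
        lower = (start - 0.5) * BIN_WIDTH_HZ
        upper = (start + width - 0.5) * BIN_WIDTH_HZ
        edges.append((lower, upper))
        start += width
    return edges


edges = band_edges_hz()
```

Under this assumed allocation the lowest band starts at 187.5 Hz and the highest band ends at 7937.5 Hz, using bins 2 to 63.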

Figure 4-4 Magnitude response of 22-channel filterbank (axes: magnitude in dB versus frequency in Hz)

The channel envelopes are calculated using the quadrature envelope detection method which

combines the real and imaginary parts of complex FFT samples into the allocated frequency

bands. Since the input signal is real, the FFT output is symmetric between the first half and

the second half of the bins. Hence FFT bins 65 to 127 are not required for the envelope

calculation. There are two ways to calculate the channel envelopes: (i) power sum and (ii)


vector sum. The main difference between the two methods is that the vector sum will pass

more high frequency envelope modulation than the power sum method. The following two

equations describe the power sum and the vector sum respectively.

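Power sum:

$$a_k = \sqrt{\sum_{j} g_j \left[ \mathrm{Re}(X_j)^2 + \mathrm{Im}(X_j)^2 \right]} \qquad (4.3)$$

Vector sum:

$$a_k = \sqrt{\left[ \sum_{j} g_j\, \mathrm{Re}(X_j) \right]^2 + \left[ \sum_{j} g_j\, \mathrm{Im}(X_j) \right]^2} \qquad (4.4)$$

where j is the FFT bin index and $a_k$ is the magnitude response at channel k. $X_j$ and $g_j$ are the
complex output and the fraction of energy corresponding to the jth FFT bin respectively. Re and
Im stand for the real and imaginary components of the complex FFT output respectively.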
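The difference between the two envelope calculations can be seen with a two-bin example. This sketch assumes that the per-bin weights multiply the bin power in the power sum and the complex bin value in the vector sum; the function names, weights and bin values are hypothetical.

```python
import math


def power_sum(bins, weights):
    """Envelope from summed bin powers; insensitive to the phase between bins."""
    return math.sqrt(sum(g * abs(x) ** 2 for x, g in zip(bins, weights)))


def vector_sum(bins, weights):
    """Envelope from the magnitude of the summed complex bins; phase sensitive."""
    total = sum(g * x for x, g in zip(bins, weights))
    return math.sqrt(total.real ** 2 + total.imag ** 2)


weights = [1.0, 1.0]
in_phase = [1 + 0j, 1 + 0j]      # two bins momentarily in phase
out_of_phase = [1 + 0j, -1 + 0j]  # the same bins half a beat cycle later

power_envelopes = (power_sum(in_phase, weights), power_sum(out_of_phase, weights))
vector_envelopes = (vector_sum(in_phase, weights), vector_sum(out_of_phase, weights))
```

As the relative phase of the two bins rotates, the power sum stays constant while the vector sum swings between its maximum and zero, which is why the vector sum passes more high frequency envelope modulation.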

Energy leakage occurs between FFT bins due to the Hann window. Therefore
channel equalization gains were applied to the channel envelopes to compensate. The
channel equalization gains accounted for the number of FFT bins, bin proportions, bin indices
and the Hann window response for each channel. Hence, with the channel equalization, a
sinusoidal input at the centre of any channel results in the same peak magnitude. The
maximum attenuation of the channel equalization gains occurs at the three highest frequency
channels and is approximately −6 dB.

4.3.5 Channel Gains

The channel gains are either the fixed channel gains or the adaptive gains from the Adaptive

Dynamic Range Optimization (ADRO). The fixed channel gains are derived from the

channel equalization gains and the clinical gains. The clinical gains are set by the clinician

and allow the user to shape the spectrum of the filterbank. When ADRO is enabled, the fixed

channel gains are bypassed and only used in ADRO as initial gains (Figure 11-1). The

implementation of ADRO will be explained in the next chapter (§5.5.5).


4.3.6 Maxima Selection

The channel selection method for the ACE strategy is called maxima selection. The maxima

selection block scans the amplitudes of the channel envelopes and selects the channels with

the highest amplitudes. The number of maxima is a clinical parameter and can differ

between subjects. If the number of channel amplitudes above the base level is less than the

number of maxima, power-up frames are presented after the stimulus frames.
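The selection step can be sketched as follows, under the assumption that only channels above the base level are eligible and that each unfilled maximum is counted as one power-up frame. The function name and the example frame are illustrative only.

```python
def select_maxima(envelopes, num_maxima, base_level):
    """Pick up to num_maxima channels with the largest envelopes above base_level.

    Returns (selected channel indices in channel order, number of power-up frames).
    """
    eligible = [k for k, a in enumerate(envelopes) if a > base_level]
    # Largest amplitudes first; the sort is stable, so ties keep channel order.
    eligible.sort(key=lambda k: envelopes[k], reverse=True)
    selected = sorted(eligible[:num_maxima])
    # Assumption: every maximum that cannot be filled becomes a power-up frame.
    power_up_frames = num_maxima - len(selected)
    return selected, power_up_frames

envelopes = [0.5, 10.0, 3.0, 80.0, 20.0, 1.0]   # hypothetical 6-channel frame
print(select_maxima(envelopes, 3, base_level=4.0))
print(select_maxima(envelopes, 4, base_level=4.0))
```

In the second call only three channels exceed the base level, so one of the four requested maxima is left to a power-up frame.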

4.3.7 Loudness Growth Function

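For acoustic hearing, loudness is a power function of sound pressure with exponent $\alpha$ (equation 2.1). Similarly, for electric hearing, loudness is a power function of electrical current with exponent $\beta$ (equation 3.3). When cross-modality matching is performed by equating the two equations for the same loudness (equation 4.5), a new power function with the constant $k$ and the exponent $\theta$ is derived (equation 4.6).

$k_a P^{\alpha} = k_e I^{\beta}$ (4.5)

$I = k P^{\theta}$ (4.6)

where $P$ is the sound pressure, $I$ is the electrical current, and $k_a$ and $k_e$ are the constants of the acoustic and electric loudness functions. The exponent is $\theta = \alpha / \beta$. According to the study of Fu and Shannon (1998a),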

cochlear implant recipients showed the highest speech recognition with the power-law

amplitude mapping function with an exponent of 0.2 although the subjects were fairly

insensitive to the exponent between 0.1 and 0.5. The logarithmic compressive function of

Nucleus systems has a close approximation to a power-law mapping function with the

exponent of approximately 0.25. The logarithmic compression of a typical loudness growth

function (LGF) is described by the following equation.

$p_k = \dfrac{\log(1 + \rho\, x_k)}{\log(1 + \rho)}$ (4.7)

$x_k = \begin{cases} 0, & E_k < B \\ \dfrac{E_k - B}{S - B}, & B \le E_k < S \\ 1, & E_k \ge S \end{cases}$ (4.8)

where $\rho$ is the control parameter for the steepness of the compression curve and $x_k$ is the non-linear scaling function for the input signal $E_k$ between the base (B) and saturation (S) levels for the frequency channel k. $\rho$ is related to the Q value of the LGF. The Q value is defined as the percentage decrease in the output for a 10 dB decrease in the input from the saturation level. The compressive operation of the LGF with two Q values, 20 and 40, is shown in Figure 4-5.

[Plot: Loudness Growth Function, base level = 4, Q varies. Output magnitude (0 to 1) against filter envelope amplitude (0 to 200), with B marking the base level and M the saturation level; curves for Q = 20 and Q = 40, with a 10 dB input interval marked.]

Figure 4-5 Instantaneous infinite non-linear compression of LGF
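The relationship between the Q value and the steepness of the curve can be explored with a short Python sketch. It assumes the commonly described logarithmic form p = log(1 + ρx) / log(1 + ρ), with x the envelope scaled linearly between the base and saturation levels and clipped to [0, 1]. The function names, the bisection search and the saturation level of 150 (read from the axis of Figure 4-5) are assumptions made for illustration.

```python
import math

def lgf(v, base, sat, rho):
    """Map envelope amplitude v to an output magnitude in [0, 1]."""
    x = (v - base) / (sat - base)
    x = min(max(x, 0.0), 1.0)            # clip to the base..saturation range
    return math.log(1.0 + rho * x) / math.log(1.0 + rho)

def q_value(base, sat, rho):
    """Percentage drop in output when the input falls 10 dB below saturation."""
    v_10db_down = sat * 10.0 ** (-10.0 / 20.0)
    return 100.0 * (1.0 - lgf(v_10db_down, base, sat, rho))

def rho_for_q(q, base, sat, lo=1e-3, hi=1e6, iters=200):
    """Bisection on a log scale for the steepness that yields the requested Q."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if q_value(base, sat, mid) > q:   # Q falls as rho grows
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

rho_20 = rho_for_q(20.0, base=4.0, sat=150.0)
rho_40 = rho_for_q(40.0, base=4.0, sat=150.0)
print(rho_20, rho_40)
```

Under these assumptions a smaller Q corresponds to a larger steepness parameter, that is, a curve that rises faster just above the base level and flattens towards saturation.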

The shape of the LGF is intended to match the loudness perception of cochlear implant recipients to that of normal hearing subjects for changes in sound intensity. The maximum current level is not allowed to exceed the C-level. An envelope at the saturation level (M in Figure 4-5) is stimulated at the maximum current level, the C-level. An envelope at the base level (B in Figure 4-5) is stimulated at the minimum current level, the T-level. The range between the saturation level and the base level determines the operating range of the channel envelopes. When the AGC system is not active, speech-like acoustic input signals at T-SPL and C-SPL (in dB SPL) correspond to the base and saturation levels of the LGF respectively. The signal path is calibrated such that speech at C-SPL is stimulated at C-levels (i.e., at the saturation level of the LGF) for the majority of frequency channels at a nominal sensitivity. T-SPL is typically determined from C-SPL and the dynamic range of the LGF. With AGC systems, the operating range of acoustic signals can be extended beyond T-SPL and C-SPL.

4.3.8 Dynamic Range Selection

The challenge of cochlear implant signal processing is to map a wide range of acoustic signals into a narrow range of electrical signals without losing essential information. If the entire acoustic range of 120 dB were mapped into the electrical dynamic range between the T- and C-levels, a substantial amount of compression would be required. Because the number of differentiable current steps is limited, the intensity variations in acoustic waveforms would be lost during compression. Therefore, in practice, the operating range of the LGF is set to be much less than 120 dB. In Nucleus cochlear implant systems, the dynamic range of the LGF is set to match the speech dynamic range because speech is the most important target signal for cochlear implant recipients. A speech dynamic range of 30 dB, with speech peaks 12 dB above and valleys 18 dB below the root-mean-square (RMS) level, has been used in the Articulation Index (AI) calculation (ANSI S3.5 1997). Hence the input dynamic range of the LGF in early Nucleus processors was set to 30 dB. The Freedom and CP810 sound processors use an input dynamic range of at least 40 dB. Studies have shown the benefit of using a wider dynamic range (Dawson, Decker and Psarros 2004; Spahr, Dorman and Loiselle 2007). The signal path at the default setting is tuned to work optimally at C-SPL (the targeted speech level). The C-SPL is typically set to 65 dB SPL, the conversational speech level at normal vocal effort. In the study of speech levels in everyday life by Pearsons et al. (1976), the average overall levels for casual and normal vocal efforts by males, females and children were 56 and 60 dB SPL respectively, with the measurements taken at one metre from the talkers.
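As a small worked example of these quantities, the sketch below assumes that the dynamic range of the LGF in dB is 20 log10 of the ratio of the saturation level to the base level, and that T-SPL is C-SPL minus the input dynamic range. The function names are hypothetical; the base level of 4 and the saturation level of 150 are read from Figure 4-5.

```python
import math

def lgf_dynamic_range_db(base, sat):
    """Dynamic range between base and saturation levels, in dB (assumed definition)."""
    return 20.0 * math.log10(sat / base)

def t_spl(c_spl, input_dynamic_range_db):
    """T-SPL derived from C-SPL and the input dynamic range (assumed relationship)."""
    return c_spl - input_dynamic_range_db

print(round(lgf_dynamic_range_db(4.0, 150.0), 1))   # close to the 30 dB speech range
print(t_spl(65.0, 40.0))                            # C-SPL of 65 dB SPL, 40 dB range
```

Under these assumptions a base level of 4 with a saturation level of 150 gives a range of about 31 dB, and a 40 dB input dynamic range with C-SPL at 65 dB SPL places T-SPL at 25 dB SPL.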


4.3.9 Mapping

The output of the LGF is linearly mapped into the electrode current levels (in clinical units). The mapping can be described by the following equation.

$c = T + (C - T) \times Vol \times p$ (4.9)

where c is the current level, p is the output of the LGF, T is the threshold current level, C is the maximum comfortable current level and Vol is the volume control. The volume control can only reduce current levels if the stimuli are too loud for the recipient.
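A minimal sketch of this linear mapping is given below. It assumes that the volume control is expressed as a fraction between 0 and 1, so that it can only reduce the current level, and that the LGF output is clipped to [0, 1]. The function name and the T and C values in the example are hypothetical.

```python
def map_to_current_level(p, t_level, c_level, vol=1.0):
    """Linear map from LGF output p in [0, 1] to a current level between T and C."""
    if not 0.0 <= vol <= 1.0:
        raise ValueError("vol is assumed to be a fraction between 0 and 1")
    p = min(max(p, 0.0), 1.0)            # LGF output is bounded by definition
    return t_level + (c_level - t_level) * vol * p

# Hypothetical electrode with T = 100 and C = 180 clinical units.
print(map_to_current_level(0.0, 100, 180))        # envelope at base level
print(map_to_current_level(1.0, 100, 180))        # envelope at saturation level
print(map_to_current_level(1.0, 100, 180, 0.5))   # same envelope, volume halved
```

With this form the T-level is always reached at the base level, and reducing the volume lowers only the upper end of the range.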

4.4 Speech Perception

Many factors can affect the speech intelligibility of cochlear implant recipients. Individual factors include the type of hearing loss, the duration of deafness before implantation, the survival of neurons in the inner ear, the duration of implant use, and the clinical parameters and preferences for sound coding. Broad factors affecting the speech intelligibility of a recipient include the ability of a particular cochlear implant system to provide accurate and salient speech cues (Henry et al. 2000).

Figure 4-6 shows the electrodogram of the monosyllabic word ‘Choice’ captured at the output of the cochlear implant signal path (§8.2.2) using the Decoder Implant Emulation Tool (DIET). Figure 4-7 shows the spectrogram of ‘Choice’ after the output stimuli were converted back into the channel envelopes before the LGF. The diagrams show slow temporal modulation at each electrode and the frequency transition from one phoneme to another: ‘ch’ → ‘oi’ → ‘ce’.


[Electrodogram: electrode number (1 to 23) against time (ms, 4400 to 5200), with the segments ‘Ch’, ‘oi’ and ‘ce’ labelled; a second panel shows electrodes 21 and 22 over 4500 to 4800 ms.]

Figure 4-6 Electrodogram of the monosyllabic word ‘Choice’


[Spectrogram: channel frequency (Hz, 367 to 7279) against time (s, 4.2 to 5), with a lower panel on a 0 to 1 scale over the same time span.]

Figure 4-7 Reconstructed spectrogram of the monosyllabic word ‘Choice’

The speech intelligibility of experienced cochlear implant recipients is comparable to that of normal hearing subjects in favourable listening conditions, even though the representation of speech provided by the implant is crude. However, considerable performance variability remains between subjects in noise (Fu and Galvin III 2008).

Researchers have experimented with simulated processing of the cochlear implant,

sometimes called vocoded speech, with normal hearing subjects to understand the

intelligibility of temporally and spectrally degraded speech.

4.4.1 Amplitude Cues

For normal hearing persons, amplitude cues may not be important for speech intelligibility (Licklider and Pollack 1948). However, for cochlear implant recipients, who rely on limited temporal and spectral cues, amplitude cues are relatively more important for speech recognition (Shannon, Zeng and Wygonski 1998; Zeng and Galvin III 1999; Zeng et al. 2002). Shannon et al. (2001) studied the effects of amplitude peak-clipping (amplitudes above a threshold were clipped) and center-clipping (amplitudes below a threshold were not presented) on the intelligibility of spectrally degraded speech processed by a four-channel noise vocoder, with seven normal hearing subjects. The results showed that peak-clipping affected vowels more than consonants, whereas center-clipping affected both equally. Consonant recognition was not significantly reduced until 75% of the amplitudes were peak-clipped, whereas vowel recognition degraded monotonically with the percentage of peak-clipping. Vowels are recognized from formant patterns; for example, the energy ratio of F1 and F2 is determined from both the frequency place and the amplitude level at each frequency. Shannon et al.’s study indicated that the amplitude pattern was important for speech intelligibility if the frequency components were poorly presented. Center-clipping might be expected to affect consonants more than vowels because the energy of vowels is higher than that of consonants. However, the results indicated that the recognition scores of both vowels and consonants dropped monotonically at the same rate with the amount of center-clipping. Loizou et al. (2000) showed that an amplitude resolution of eight steps was sufficient for consonant recognition by cochlear implant recipients. They also showed that intensity resolution could be traded against spectral resolution when normal hearing subjects were tested with sine-vocoded speech. Similarly, the amount of amplitude compression did not significantly affect speech recognition (Fu and Shannon 1998a; Zeng and Galvin III 1999).

4.4.2 Spectral Processing

The study of Loizou et al. (1999) showed that amplitude cues could be traded against frequency resolution. A high level of speech understanding was still shown for sentences processed through 16 channels with only two steps of amplitude resolution. The well-known study of Zeng and Galvin III (1999) showed that speech was still intelligible for Nucleus 22 cochlear implant recipients even when the amplitudes were presented at only two levels. In that case, the intelligibility was still carried by slow temporal modulation and frequency-place information. The Nucleus 22 recipients scored higher in noise when the number of electrodes was more than four. These studies consistently showed that amplitude resolution was less important when other cues were not restricted.

Spectral resolution becomes more important for speech understanding in noise. Fu et al. (1998) showed that phoneme recognition was significantly higher with 16-channel than with 8-channel noise-vocoded speech. Similarly, Stickney et al. (2004b) showed that normal hearing subjects performed significantly better with eight-channel than with four-channel noise-vocoded speech in different SNR conditions. Friesen et al. (2001) showed that the speech understanding of cochlear implant recipients, and of normal hearing subjects listening to noise-vocoded stimuli, improved with the number of frequency channels for speech tests at different SNRs. However, for cochlear implant recipients the improvement was seen only up to seven to ten electrodes, whereas for normal hearing subjects it was seen up to 20 channels. Their study demonstrated that cochlear implant recipients were not able to fully utilize the spectral information provided by the number of electrodes used in their implant.

4.4.3 Temporal Processing

Speech perception studies with normal hearing subjects showed that low-rate modulation

below 16 Hz was perceptually most important for speech (Houtgast and Steeneken 1985;

Drullman, Festen and Plomp 1994b). The speech envelope modulation was most prominent

around 3 to 4 Hz (Houtgast and Steeneken 1985).

In the study of Shannon et al. (1995), normal hearing subjects tested with a four-channel vocoder showed that speech intelligibility in quiet was not significantly affected until the temporal envelope modulation bandwidth was reduced below 16 Hz. They concluded that slow temporal envelope modulation was sufficient for speech intelligibility, at least in quiet. Shannon (1992) showed that cochlear implant recipients could follow temporal modulation in the range of 1 to 50 Hz if loudness was matched properly. A good temporal envelope representation requires the carrier rate to be at least four times the highest modulation rate of the temporal envelopes in each channel (McKay, McDermott and Clark 1994). According to the study of Füllgrabe et al. (2009), low-rate amplitude modulation even below 4 Hz could also contribute to speech intelligibility in noise when the listeners relied only on the envelope information.

A trade-off between temporal and spectral information was observed for vowel and consonant recognition (Xu, Thompson and Pfingst 2005). Consonant recognition improved as the number of frequency channels increased up to 12 and as the temporal modulation rate increased up to 32 Hz. When the spectral resolution was restricted, consonant recognition was improved by allowing more temporal information. In contrast, spectral cues were more important for vowel recognition, and trading the number of frequency channels for more temporal information did not improve the recognition scores much.

In normal hearing, two competing voices are segregated based on the unique temporal information from each speaker, i.e., pitch. Since cochlear implant recipients do not perceive pitch strongly, it is difficult for them to segregate the target speech from the competing voice. In addition, the temporal processing of hearing impaired subjects is poor, so that, unlike normal hearing persons, they do not benefit from listening through gaps to pick up components related to the target speaker. Stickney et al. (2004b) showed that the speech intelligibility of cochlear implant recipients was lower when the competing signal was another voice than when it was a steady-state speech-shaped noise. Their study indicated that segregation between target and competing signals was easier for cochlear implant recipients if the two signals were temporally different (steady vs. fluctuating). Speech understanding of cochlear implant recipients suffers if the target and masker signals are modulated at a similar rate. In that case the listeners have to rely on spectral and intensity information for segregation.

4.5 Conclusion

This chapter described the sound coding strategies of cochlear implant systems and the speech perception of cochlear implant recipients and of normal hearing subjects listening to simulated cochlear implant processing. The existing framework of sound processing is sufficient for cochlear implant recipients to understand speech and other environmental sounds in quiet. Performance variation between cochlear implant recipients is still considerable in noise. In favourable listening conditions, intensity resolution in the envelope waveforms is not important if sufficient spectral resolution is provided. Spectral cues can be traded if slow temporal modulation of up to at least 16 Hz is preserved. However, for speech understanding in adverse listening conditions, the recipient will require more intensity resolution in the temporal envelope waveforms, even if the spectral channels are not limited, so that the target speech remains temporally distinct from the non-target signals. The next chapter will study gain algorithms for hearing prostheses that improve amplitude cues within the limited dynamic range of impaired hearing.


5 Automatic Gain Control Systems

5.1 Introduction

Automatic gain control (AGC) is an essential signal processing component for hearing devices with a restricted output dynamic range. The role of compression in hearing devices is to decrease the range of sound levels in the environment to better match the limited dynamic range of a hearing-impaired person (Dillon 2001). For hearing aid users with a low-to-moderate hearing impairment, the output dynamic range is limited due to loudness recruitment. For cochlear implant users with severe-to-profound deafness, the output dynamic range is limited due to direct stimulation of the auditory nerve. A good AGC should apply amplification to improve the audibility of the input signal without causing loudness discomfort, and compression to avoid loudness discomfort without sacrificing audibility. The goal is to improve the speech intelligibility of recipients in adverse listening conditions. Designing an AGC system that fits an input signal whose amplitude varies over a wide range into the limited dynamic range of an impaired auditory system without perceptual distortion is a challenge. The aim of this research is to implement such an AGC system, or a similar technique, to optimize the input dynamic range of cochlear implants. First, the literature is reviewed on the characteristics of AGC systems and their effects on the speech intelligibility of hearing impaired subjects.

AGC systems in hearing aids are well established, and AGC systems in cochlear implants are adapted from them. There are similarities as well as differences between the AGC systems of hearing aids and of cochlear implants. Although the common goal of both hearing aid and cochlear implant systems is to rehabilitate hearing by supplementing the functions of the impaired auditory system, the methodology used to achieve the goal can be quite different between the two systems. The output of a hearing aid is a reconstructed acoustic signal, whereas the output of a cochlear implant system is a sequence of electrical current pulses. The output dynamic range of hearing aids can differ across frequency for an individual user, whereas the electrical dynamic range of cochlear implant recipients is low across all frequencies. Hence, it is worth describing the rationales of AGC systems in cochlear implant systems as well as in hearing aids.

This chapter first describes the fundamentals of an AGC system. It then reviews different AGCs for hearing aids and cochlear implant systems and their roles in rehabilitating hearing. It describes the AGC systems of the Nucleus cochlear implant systems and then elaborates on the ones that were used by the recipients in this thesis. Noise floor estimation methods are also investigated in this chapter because some AGC systems employ them.

5.2 Fundamentals of AGC

An AGC compresses the dynamic range of an input signal to better fit it into the dynamic range of the system. It basically reduces the gain for an input signal above a predetermined threshold. Typical AGC parameters are:

• compression threshold or compression knee point

• compression ratio

• attack time

• (optional) hold time

• release time

The compression threshold is the input level above which an AGC is active. The compression ratio determines the amount of gain reduction for input levels above the compression threshold; it is the inverse of the slope of the input-output (I-O) diagram in that region. Figure 5-1 shows the I-O diagrams of an AGC with different compression ratios. For a compression ratio greater than one, the output level differs from the input level above the compression threshold. A compression ratio of 1:1 indicates linear gain, 2:1 indicates an I-O slope of 0.5 and ∞:1 indicates infinite compression.


[Plot: AGC I-O curve. Output level (dB) against input level (dB), both from 0 to 120 dB, with curves for compression ratios of 1:1, 2:1 and ∞:1 above the knee point.]

Figure 5-1 Input-output diagram of an AGC with different compression ratios
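The static I-O relationship described above can be written as a short Python sketch. It assumes unity gain below the knee point and a slope of one over the compression ratio above it; the function names and the 60 dB knee point used in the example are illustrative assumptions.

```python
import math

def agc_output_level(input_db, threshold_db, ratio):
    """Static I-O curve: unity gain below the knee point, slope 1/ratio above it."""
    if input_db <= threshold_db:
        return input_db
    if math.isinf(ratio):
        return threshold_db              # infinite compression: output is held
    return threshold_db + (input_db - threshold_db) / ratio

def agc_gain_db(input_db, threshold_db, ratio):
    """Gain implied by the static curve (zero or negative, in dB)."""
    return agc_output_level(input_db, threshold_db, ratio) - input_db

for ratio in (1.0, 2.0, math.inf):       # the three ratios of Figure 5-1
    print(ratio, agc_output_level(100.0, 60.0, ratio), agc_gain_db(100.0, 60.0, ratio))
```

For an input 40 dB above the assumed knee point, the 2:1 curve passes half of that excess to the output, while the ∞:1 curve holds the output at the knee point.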

The attack time is the time taken for an AGC to react to an increase in input signal level. As per the standard for AGC systems in hearing aids, IEC 60118-2, the attack time is the time taken for the output to stabilize within 2 dB of its final value after the input increases from 55 to 80 dB SPL (IEC 1997). According to ANSI, the attack time is defined as the time it takes the output to drop to within 3 dB of the steady-state level after a 2 kHz sinusoidal input changes from 55 to 90 dB SPL (ANSI 1996). The release time is the time taken for the gain to recover from the attenuation mode following a decrease in input level. As per IEC 60118-2, the release time is the time taken for the AGC output signal to increase to within 2 dB of its final value following a decrease in input level from 80 to 55 dB SPL. According to ANSI, the release time is defined as the time it takes the output to stabilize within 4 dB of the steady-state level after the 2 kHz sinusoidal input changes from 90 to 55 dB SPL. The hold time is the time for which the gain is held at its previous value before it is released from the attenuation mode following a decrease in input level. The objective of applying a hold time is to reduce the pumping effect. The hold time parameter is optional in AGC systems.
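The IEC-style definitions above can be tried out on a simple simulated AGC. The sketch below is only an illustration under stated assumptions: the gain is computed in dB from a static curve (threshold 60 dB SPL, ratio 3:1) and smoothed by separate one-pole attack and release filters with time constants of 5 ms and 50 ms. None of these values or function names come from a particular device or from the standard itself.

```python
import math

def static_gain_db(level_db, threshold_db=60.0, ratio=3.0):
    """Gain decision: attenuate the part of the level above the threshold."""
    excess = max(level_db - threshold_db, 0.0)
    return -excess * (1.0 - 1.0 / ratio)

def run_agc(levels_db, fs, attack_tau, release_tau):
    """Smooth the gain (in dB) with separate one-pole attack and release filters."""
    a_att = math.exp(-1.0 / (fs * attack_tau))
    a_rel = math.exp(-1.0 / (fs * release_tau))
    gain = static_gain_db(levels_db[0])          # start in steady state
    out = []
    for level in levels_db:
        target = static_gain_db(level)
        coeff = a_att if target < gain else a_rel    # falling gain means attack
        gain = coeff * gain + (1.0 - coeff) * target
        out.append(level + gain)
    return out

def settling_time(out_db, fs, step_index, tolerance_db=2.0):
    """Time after the step until the output stays within tolerance of its final value."""
    final = out_db[-1]
    last_outside = step_index - 1
    for n in range(step_index, len(out_db)):
        if abs(out_db[n] - final) > tolerance_db:
            last_outside = n
    return (last_outside + 1 - step_index) / fs

fs = 10000                                   # level samples per second (assumed)
step = 1000
up = [55.0] * step + [80.0] * 5000           # input rises from 55 to 80 dB SPL
down = [80.0] * step + [55.0] * 20000        # input falls from 80 to 55 dB SPL
attack_time = settling_time(run_agc(up, fs, 0.005, 0.050), fs, step)
release_time = settling_time(run_agc(down, fs, 0.005, 0.050), fs, step)
print(f"attack {attack_time * 1e3:.1f} ms, release {release_time * 1e3:.1f} ms")
```

With these assumed settings the measured attack and release times come out at roughly 1.9 times the smoothing time constants, which illustrates that the times defined by the standards depend on the size of the level step and on the static curve as well as on the smoothing filter.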


These parameters, attack time, release time and hold time, define the dynamic characteristics

of an AGC, whereas compression threshold and compression ratio define the static

characteristic of an AGC.

An AGC consists of a control block, which comprises a feature extraction unit and a gain decision unit, and a gain unit, as shown in Figure 5-2.

Figure 5-2 Components of an AGC system

The feature extraction in a simple AGC system detects the envelope of an input signal and

calculates the peak or RMS level for the gain decision unit. An envelope detector consists of

a rectifier and a low-pass filter for smoothing. The gain decision unit checks whether the

input level exceeds the compression threshold and determines the amount of gain according

to the compression ratio. The gain unit reduces the input signal accordingly.

The amount of gain can differ depending on the type of level detector. If a peak detector is employed, more compression will be exerted on complex stimuli with distinctive peaks. If the level detector detects the RMS level of the input signal, the compressor may apply less compression than one with a peak detector. Depending on the smoothing low-pass filter in the envelope detector, an AGC often cannot prevent fast transients from entering the system. The time constants of an AGC are determined by the smoothing filter. If the envelope detector is slow, the attack and release times are long, and the gain will respond slowly to rapid changes in the input signal. Although less spectral distortion can be expected, temporal distortion such as overshoots can occur. Overshoots can create false articulation and affect speech intelligibility (Verschuure et al. 1996). If the attack time is short, the AGC responds quickly to loud input sounds. However, abrupt gain changes can introduce extra frequency components that are not part of the input signal. Figure 5-3 shows an AGC responding to a sinusoidal input above the compression threshold.

Figure 5-3 Behaviour of a typical AGC system
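The influence of the level detector on the amount of compression can be illustrated with a small Python sketch. It compares a peak level and an RMS level computed over the same frame for a sinusoid and for a sparse pulse train, and feeds each level into the same static gain rule. The signals, the threshold of -12 dB re full scale and the 3:1 ratio are arbitrary assumptions chosen for illustration.

```python
import math

def peak_level_db(frame):
    """Level of the largest absolute sample, in dB re full scale."""
    return 20.0 * math.log10(max(abs(s) for s in frame))

def rms_level_db(frame):
    """Root-mean-square level of the frame, in dB re full scale."""
    return 10.0 * math.log10(sum(s * s for s in frame) / len(frame))

def gain_db(level_db, threshold_db, ratio):
    """Static gain rule: attenuate the excess above the threshold."""
    return -max(level_db - threshold_db, 0.0) * (1.0 - 1.0 / ratio)

sine = [math.sin(2.0 * math.pi * n / 8.0) for n in range(80)]    # 10 full cycles
pulses = [1.0 if n % 10 == 0 else 0.0 for n in range(80)]        # distinctive peaks

for name, frame in (("sine", sine), ("pulses", pulses)):
    peak, rms = peak_level_db(frame), rms_level_db(frame)
    print(name, round(peak, 1), round(rms, 1),
          round(gain_db(peak, -12.0, 3.0), 1), round(gain_db(rms, -12.0, 3.0), 1))
```

Both signals have the same peak level, so a peak detector attenuates them equally, whereas the RMS level of the pulse train is much lower and an RMS detector applies far less attenuation to it.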

A typical AGC uses an attack time much shorter than its release time. An AGC system can be

classified as fast-acting if the release time is less than 200 ms (Walker and Dillon 1982;

Dreschler 1992; Souza 2002). Fast AGCs typically have short attack (0.5–20 ms) and release

times (5–200 ms). A fast-acting AGC is often called a “syllabic” or “phonemic” AGC if the

time constants are shorter than the duration of common syllables (Moore 2008).

5.3 AGC in Hearing Aids

The aims of an AGC system in hearing devices are (i) to provide access to low level sounds and (ii) to make loud sounds more comfortable. More importantly, an AGC should achieve both goals without significant perceptual distortion. The ultimate goal is to improve the speech intelligibility of recipients in their everyday listening environments. There are different rationales for designing an AGC system for a hearing prosthesis. An AGC system either adjusts the gain slowly to adapt to changes in the overall presentation level from one listening situation to another, or adjusts the gain dynamically to normalize loudness between soft and loud components of speech. An AGC system can have multiple compression channels to provide frequency-specific gain.

Dynamic range reduction is one of the deficits associated with sensorineural hearing loss. For a person with low-to-moderate hearing impairment, the dynamic range reduction shows as an increased hearing threshold while the perception of loud sounds remains the same as in normal hearing. This phenomenon is called loudness recruitment (Dillon 2001). Many forms of compression are used in both hearing aids and cochlear implants to overcome the reduced dynamic range. Suprathreshold deficits associated with sensorineural hearing impairment reduce the ability to discriminate components in both the frequency and time domains (Moore 2003a).

The operation of an AGC system can be described in terms of (i) speed of compression, (ii)

amount of compression, and (iii) number of independent compression channels. The speed of

compression is the rate of gain change for a given input level and is determined from a

combination of the timing parameters. Table 5-1 lists the rationales for different AGC

systems that are commonly used in hearing aids.


Type of AGC | Implementation: compression speed | Implementation: compression threshold | Implementation: compression ratio | Rationale
Compression limiter | fast | high | high | To limit envelope clipping and to reduce loudness discomfort
Wide Dynamic Range Compressor (WDRC) | fast | low | low | To reduce inter-syllabic intensity contrast (therefore increase audibility of softer syllables of speech)
Automatic Volume Control (AVC) | slow | low | low/high | To reduce the overall level difference between different listening conditions

Table 5-1 Rationales for different AGC systems

The choices of time constants and their effects on speech intelligibility have been reviewed

for hearing aids (Chapter 6, Dillon 2001; Moore 2008). Ideally, the time constants (attack

and release times) need to be fast for an AGC to adjust the energy ratio of soft and loud

sounds effectively. The rationales for a fast-acting AGC system are (i) to avoid loudness discomfort and sound quality distortion due to peak-clipping of sounds at very high levels, and (ii) to normalize loudness towards that of normal hearing by reducing the intensity difference between short-term speech components (phonemes and syllables) with high and low energy (Dillon 2001).

Based on these two rationales, a fast AGC system can be either a compression limiter or a

Wide Dynamic Range Compressor (WDRC).

Early hearing aids employed a peak-clipper to limit the maximum amplitude of the output

signals. An AGC was proposed as a compression limiter to avoid harmonic distortion,

intermodulation distortion and noticeable sound quality distortion introduced by the peak-


clipper (Steinberg and Gardner 1937). A compression limiter typically has fast time

constants, a high compression threshold and a high compression ratio.

A WDRC works under the assumption that speech intelligibility can be improved by having access to low level signals (Villchur 1973). It dynamically reduces the inter-syllabic intensity contrast between speech components by applying both compression and amplification. A WDRC has a low compression threshold to include signals within a wide range. Because of the fast time constants and the low compression threshold, the compression ratio of a WDRC is typically low, less than three, to avoid significant distortion of the temporal envelopes (Dillon 2001).

With WDRC, speech intelligibility improvement was mostly shown for speech in quiet, and degradation was mostly shown in noisy conditions (Souza 2002). In the study of Marriage et al. (2005), with children with severe and profound hearing loss using multichannel hearing aids, the reported benefits of WDRC were higher than those of linear amplification. However, some of the children did not like the amplification of background sounds in noisy situations with the WDRC. The performance improvement with a WDRC was due to the accessibility of low-level signals below the hearing threshold. The degradation in noise was due to the reduction in output SNR.

Fast AGC can reduce speech intelligibility in noise by distorting the correlated fluctuations in the envelopes of different frequency channels, which promote the perceptual fusion of the target speech, and by reducing the modulation depth and intensity contrast (Stone and Moore 2004). It can also reduce speech intelligibility by reducing the spectral and temporal contrasts of speech (Plomp 1988). If the rate of gain change is faster than the modulation frequency of a phonetic entity, a fast AGC is likely to distort the temporal envelope patterns and the information carried by that phonetic entity (Plomp 1983). In addition, the cross-modulation components introduced by fast AGC between the target speech and a competing voice degrade the intelligibility of the target speech (Stone and Moore 2003, 2004, 2007). Fast compression can also degrade speech intelligibility in noisy conditions by reducing the apparent signal-to-noise ratio at the output. Rhebergen et al. (2008b) showed an effective SNR degradation for a WDRC with different compression ratios.
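The reduction in apparent SNR can be illustrated with simple arithmetic. The sketch below assumes an idealized case in which speech segments and the noise in the gaps between them are each compressed according to the static I-O curve, i.e. the compressor is fast enough to follow both levels. This is a deliberate simplification for illustration and is not the method used by Rhebergen et al.; the levels, threshold and function names are hypothetical.

```python
def output_level_db(input_db, threshold_db, ratio):
    """Static compression above the threshold, unity gain below it."""
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio

def apparent_output_snr_db(speech_db, noise_db, threshold_db, ratio):
    """Level difference between compressed speech segments and compressed noise gaps."""
    return (output_level_db(speech_db, threshold_db, ratio)
            - output_level_db(noise_db, threshold_db, ratio))

# Hypothetical case: speech at 70 dB SPL, noise at 55 dB SPL, threshold 45 dB SPL.
for ratio in (1.0, 2.0, 3.0):
    print(ratio, round(apparent_output_snr_db(70.0, 55.0, 45.0, ratio), 1))
```

In this idealized case the input SNR of 15 dB is divided by the compression ratio when both levels lie above the threshold, so a higher ratio gives a lower apparent SNR at the output.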


The rationale for employing a slow AGC in a hearing prosthesis is to reduce the differences in overall speech presentation level between different listening conditions. The time constants, typically the release time, are much longer than the duration of syllables. The intensity relationship between syllables is not affected by a slow AGC system. A slow AGC with time constants in the range of seconds is often called Automatic Volume Control (AVC).

According to the review of Souza (2002), a release time of 200 ms in the compression systems of hearing aids is typical for most daily situations. Many studies have evaluated the effect of AGC time constants on speech intelligibility and quality. Most of them showed that a long release time was preferred for the perceived quality of sounds in listening conditions with high background noise. The results for speech understanding with slow AGC were mixed. Hansen (2002) showed the advantage of using a long release time over shorter release times. Gatehouse et al. (2006) showed that shorter release times gave better speech intelligibility than longer release times in a two-channel AGC, although the longer release time settings were rated higher for speech quality. A few studies showed that the release times, short or long, had no significant effect on speech intelligibility and quality (King and Martin 1984; Neuman et al. 1998; Stone et al. 1999; Jenstad and Souza 2005).

A slow AGC, with a long release time, often has stagnant moments in which gain is slowly

recovered from momentarily loud transient sounds. Sounds following loud transients can be

too soft during that period. Sudden increases in sound level are common in real life.

For instance, the amplitude variation of an input signal can be wide during a conversation

between the recipient and another person, with rapid increases in level at the onset of voiced

sounds produced by the recipient due to a short distance between the microphones and the

recipient’s mouth. In order to accommodate a wide range of acoustic signals (speech and

other sounds) with short- and long-term level variations, more than one AGC system, or an

AGC system with more than one control loop with short and long time constants, is

necessary (Moore and Glasberg 1988).

Multichannel AGC systems are most commonly found in hearing aids because of frequency

dependent hearing loss. Studies reported mixed results on the benefits of a multichannel

compression compared to a linear amplification in hearing aids. A multichannel compression


can improve the ability of hearing impaired subjects to distinguish between stops and

fricatives. In contrast, it can degrade spectral concentration and hence the relative intensity

cues to identify some particular stops and fricatives. It can also degrade the place of

articulation and duration required to identify some consonants (De Gennaro, Braida and

Durlach 1986).

Unlike a single-channel wideband AGC, a multichannel AGC can reduce the level of signal

in those frequency channels where the background noise dominates. With an independent

multichannel AGC, less cross-modulation effect can be expected from two signals with

different frequency contents (Stone and Moore 2008). Another benefit of using a

multichannel AGC with independent compression channels is that it can reduce the level of

high intensity narrowband components without affecting other frequency components. On

the other hand, the independent compression can flatten the spectral profile of an input signal

by lowering spectral peaks while spectral troughs remain untouched. Since spectral peaks

and valleys are the important characteristics of speech sounds, spectral profile flattening

makes it harder to identify different components of speech, for example identification of

place of articulation of consonants. To reach a compromise between spectral flattening and

the increase in audibility, a multichannel AGC typically avoids having a compression ratio

higher than three if the number of channels is high (Dillon 2001). Plomp (1994) showed the

degradation of sentence recognition with more compression channels when the compression

ratio was higher than two (see Figure 5-4). A multichannel AGC can have more detrimental

effects on speech intelligibility than a single-channel wideband AGC if the rate of gain

change is too fast. A fast multichannel AGC with independent compression channels can

distort short-term spectral cues such as formant patterns and the rapid gain change can

reduce the temporal modulation depth and the intensity contrast of different speech

components (White 1986; Plomp 1994; Moore 2008; Stone and Moore 2008).


Figure 5-4 Intelligibility score as a function of the number of channels, with compression ratio as a parameter (Plomp 1994)

A multichannel AGC can be cross-coupled between frequency channels to reduce spectral

distortion and consequently improve speech intelligibility (White 1986). Moore, Peters and

Stone (1999) compared the effectiveness of a linear amplification and a WDRC with one,

two, four or eight channels by measuring SRT of subjects listening to sentences in

background sounds with spectral and/or temporal dips. When the background sounds

contained temporal dips, the number of channels was not important. However, when the

background sounds contained spectral dips, the multichannel AGC helped to improve the

audibility of a target speech that fell in the spectral dips of the background sound.

Yund and Buckles (1995b) analyzed the effect of the number of channels on the speech

intelligibility of mild-to-moderate hearing impaired subjects with speech-shaped noise and

competing speakers at different SNRs. They reported that speech intelligibility

improved as the number of compression channels increased from four to eight. Scores were

not significantly different for the AGC with more than eight compression channels. The

study of Kates (2010) indicated that the number of channels was important for a steeply

sloping hearing loss. Kates showed that speech intelligibility was reduced when the number

of compression channels was reduced. The benefits of using a multichannel fast-acting


compression for speech intelligibility were only shown in certain conditions. Bustamante

and Braida (1987) showed that higher speech intelligibility was achieved with multichannel

WDRCs at low presentation levels because the independent channel compression allowed

more gain to be introduced in each channel for an equivalent overall loudness generated by a

linear gain. However, speech intelligibility became worse at high presentation levels because

of spectral distortions inflicted by an independent channel compression. Only subjects with

moderate hearing loss received benefits from a multichannel fast AGC over a linear

amplification in the low SNR condition (Yund and Buckles 1995a).

Numerous studies explored the benefits of different AGC systems on speech intelligibility

and quality for hearing aid users. The results are mixed with no consensus on how to best

configure a compression system for a given hearing loss (Kates 2010). Many studies showed

the advantage of compression over linear amplification (Laurence, Moore and Glasberg

1983; King and Martin 1984; Gatehouse, Naylor and Elberling 2006; Shi and Doherty 2008),

some showed no advantage (Crain and Yund 1995; Yund and Buckles 1995a; Davies-Venn,

Souza and Fabry 2007) and a few showed linear amplification had advantage over

compression (Lippmann, Braida and Durlach 1981; van Buuren, Festen and Houtgast 1999).

Although the studies aimed to characterize the same parameter, the difficulty of comparing the

results from different studies arises from the different configurations of the AGC systems. In

addition, using different test stimuli and test setups on subjects with different degrees of

hearing loss makes the comparison between studies almost impossible.

Many fitting procedures, such as NAL-NL1 (Byrne et al. 2001), Camfit (Moore 2000), and

the Desired Sensation Level (Scollie et al. 2005) have been developed for configuration of

AGC systems in hearing aids as a function of hearing loss. According to the hearing aid

fitting literature, the most effective compression parameters depend on the hearing loss of the

wearer. The reported effects of compression parameters on the speech

intelligibility of hearing aid users were often mixed. Although an optimal set of parameters for an

AGC system is not readily available in all listening conditions, it is generally agreed upon to

avoid using a high compression ratio if the compression threshold is low and time constants

are fast. Similarly, AGC systems should avoid using a large number of compression channels

if the compression ratio is high and the compression channels are independent (Plomp 1988).


Levitt stated that the magnitude and impact of individual differences were greater with more

advanced compression systems and therefore should not be underestimated (Bacon, Fay and

Popper 2004).

5.4 AGC in Cochlear Implant Systems

The purpose of an AGC in a cochlear implant system is to provide access to sounds that

would otherwise be outside of the input dynamic range of the cochlear implant. The

rationale for the input dynamic range selection was reviewed in §4.3.8.

Sensitivity control, located after the Analog-to-Digital Converter (ADC) of the microphones,

is the first gain control in the cochlear implant signal path. It provides a linear gain to adjust

the sensitivity of the microphone signals as well as the knee point of the front-end AGC.

Increasing sensitivity brings more low-level input signals into the electrical dynamic range.

It can increase the intelligibility of low level speech components. Decreasing sensitivity on

the other hand can push low level signals out of the operating range, and this setting can be

useful in noisy situations. The sensitivity control adjusts the gain for varying acoustic

scenarios through user intervention. It is inconvenient to adjust the sensitivity control frequently

in some listening situations. Besides, if the recipient is a child, he or she may not be able to

manually adjust the gain without adult supervision. Therefore an AGC is necessary to

maintain an input signal at the optimal level in different listening environments.

In cochlear implant systems, different forms of compression are used to present a wide range

of auditory signals to the implant. The amount of published data on AGC systems in

cochlear implant systems is small compared with that on AGC systems in hearing aids. Like

AGC systems in hearing aids, AGC systems in cochlear implant systems can be

categorized into three types: a compression limiter, a WDRC, and a slow AGC. The purpose of a

compression limiter is to reduce peak-clipping distortion due to the instantaneous infinite

compression at the LGF. The purpose of a WDRC is to provide compression amplification to

promote the audibility of low-level input stimuli. A slow AGC is employed in cochlear

implant systems to adjust a long-term gain for an input signal to adapt to changes in listening

situations.


McDermott et al. (2002) showed the benefits of a WDRC, known as Whisper in the Nucleus

systems, for low level speech understanding in quiet. Whisper improved speech intelligibility

of cochlear implant recipients for speech presented at low presentation levels. Although the effect was

not statistically significant, Whisper showed a tendency to degrade performance in noise.

This was explained by the reduction of the effective SNR when background noise was boosted

during speech pauses. Six subjects who used Whisper in real-life listening environments

reported that compression made background noise louder. All subjects except one agreed

that the benefits of Whisper outweighed the background noise. It should be noted that

individuals with a good sensitivity to temporal fine structures can get more benefits from a

fast compression (Moore 2008). Since cochlear implant recipients have poor sensitivity to

temporal fine structures, they are less likely to have much benefit from a fast compression.

A slow AGC is necessary in a cochlear implant system to expand the range of input signals

without causing the spectral and temporal distortion that is typically associated with rapid gain

change. Automatic Sensitivity Control (ASC) is a slow AGC employed in the Nucleus

systems to control the level of background noise (Seligman and Whitford 1995). ASC

improved speech intelligibility of cochlear implant recipients in noisy conditions (Wolfe et

al. 2009).

Modern cochlear implant sound processors use sophisticated AGC systems with multiple

control loops to meet the requirements for short-term and long-term level adjustments. The

dual-time-constant AGC system developed in Cambridge has been widely accepted in

contemporary cochlear implant systems as a standard AGC (Moore and Glasberg 1988;

Moore, Glasberg and Stone 1991; Stone et al. 1999; Spahr, Dorman and Loiselle 2007;

Boyle et al. 2009). If a loud transient comes in, the dual-loop AGC passes the control to the

fast compression to compress the input signal immediately. Hence it can overcome the

disadvantage of a slow AGC which often makes sounds inaudible after loud transients due to

a long release time. When the transient is gone, the AGC then passes the control back to the

slow component to determine the output level.

Stöbich et al. (1999) evaluated six front-end configurations (the standard slow AGC, four

configurations of the dual-loop slow AGC and a linear gain) in the Med-El Combi-40 system.

Stöbich et al. showed that the AGCs evaluated were effective in maintaining speech

intelligibility at soft, medium, and loud levels (55, 70 and 85 dB SPL), with no significant

performance difference between them. Although the slow AGC was efficient for speech

presented at the three presentation levels, subjects performed significantly better with the

dual-loop AGCs in listening situations that involved impulsive noise.

Boyle et al. (2009) compared the performance of two front-end AGC systems, a fast AGC

and a dual-time-constant AGC, with six cochlear implant recipients. Figure 5-5 shows the

implementation of the dual-loop AGC system evaluated in Boyle et al.’s study. If the output

level was more than 8 dB above the running level determined by the slow system, the fast AGC

rapidly reduced the gain. A hold time was inserted in the dual-loop AGC of Boyle et al.’s

study. When the hold timer was activated, gain was frozen to prevent the background sounds

from becoming loud in between short gaps of speech components. Since the compression

threshold of the slow AGC was lower than that of the fast AGC, the operation of the dual-

loop AGC was mainly determined by the slow component.

Figure 5-5 Block diagram of dual time-constant AGC system (Boyle et al. 2009)

The dual-loop AGC showed a significantly better performance than the fast AGC in both

fixed and roving presentation level tests. Boyle et al. explained that the performance

degradation with the fast AGC due to the disruption of low-rate envelope cues was reduced


in the dual-loop AGC. The quality questionnaire indicated that the dual-loop AGC was

preferred over the fast AGC by the majority of the participants. Many found that it was

easier to separate target speech from the background noise with the dual-loop AGC.

A multichannel AGC operates on the channel amplitudes after the filterbank in the cochlear

implant signal path. The compression channels can be independent of one another or

interdependent. If the number of independent compression channels is the same as the

number of frequency channels, the rate of gain change must be very slow to avoid envelope

distortion. A multichannel slow AGC called Adaptive Dynamic Range Optimization

(ADRO) (Blamey 2005) has been used in the Nucleus systems (James et al. 2002; Patrick,

Busby and Gibson 2006). ADRO independently and slowly adjusts gain in each frequency

channel using some statistical rules. It improved speech intelligibility of cochlear implant

recipients at low presentation level in quiet (Dawson, Decker and Psarros 2004; Iwaki,

Blamey and Kubo 2008; Muller-Deile et al. 2008). Not all studies showed the benefits of

ADRO in noise. Although ADRO has a background noise rule to limit the level of

background noise, it was not very active due to a high noise threshold.

Little has been published on the performance of just the compression limiter on speech

intelligibility of cochlear implant recipients. It has been employed in the Nucleus systems as

a default AGC to limit envelope clipping and loudness discomfort of input signals above the

target level (C-SPL). The compression limiter frequently adjusts short-term levels of high

level input signals above the compression threshold. As its role seems to be important for

loud sounds, at least initially before the slow system takes over, the effects of

the compression limiter on the speech intelligibility of cochlear implant recipients will be studied

more in the second part of this thesis.

5.5 AGC Systems in the Nucleus Sound Processors

This section first describes the AGC systems in the Nucleus sound processor. It then

describes the implementation of the existing AGC systems in the Nucleus CP810 sound

processors because these AGCs will be evaluated with cochlear implant recipients in the

second part of this thesis. The speech intelligibility of cochlear implant recipients with the


existing AGC system will serve as the performance benchmark against which the proposed gain

optimization algorithms are compared.

The Nucleus sound processors have the front-end gain control algorithms known as ASC,

Whisper and the compression limiter. The multichannel AGC known as Adaptive Dynamic

Range Optimization (ADRO) (James et al. 2002) is available after the filterbank. ADRO has

been part of the Nucleus signal path since 2005 (Patrick, Busby and Gibson 2006).

The AGC systems available in the Nucleus CP810 sound processor are shown in Figure 5-6.

In the Nucleus CP810 sound processor, the front-end AGCs are consolidated into one AGC

at the front-end. Since this AGC can be configured to perform a single or a combination of

the front-end AGC algorithms, it will be called the unified gain model (UGM). The UGM

can emulate the performance of ASC, Whisper and the compression limiter.

Figure 5-6 AGC systems of the Nucleus CP810 sound processor

5.5.1 Compression Limiter

The front-end compression limiter is a default AGC in the Nucleus sound processors. The

purpose is to reduce envelope clipping of high-level input stimuli. The compression

threshold of the fast AGC is set to compress peaks of the speech waveform whose overall

level is above the targeted level, i.e., C-SPL. A typical C-SPL is 65 dB SPL in the Nucleus

Freedom and the Nucleus CP810 sound processors. An infinite compression ratio is used to

ensure that the output signal cannot exceed the compression threshold. The attack and

release times are typically less than 10 ms and 100 ms, respectively, for this type of AGC.
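
The static behaviour of such a limiter can be sketched in a few lines of Python. This is only an illustration of a hard-knee input-output rule: the function name `static_output_level` and the example input levels are hypothetical, the threshold of 73 is the FEL value from Table 5-2 (assumed here to be in dB SPL), and attack and release dynamics are ignored.

```python
import math


def static_output_level(level_db, threshold_db, ratio):
    """Steady-state output level (dB) of a hard-knee compressor.

    Below the threshold the level passes unchanged; above it, each
    `ratio` dB of input growth yields 1 dB of output growth.
    """
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


# Infinite compression ratio: the output cannot exceed the threshold.
limited = static_output_level(85.0, 73.0, math.inf)   # 73.0
# Below the threshold the limiter leaves the level untouched.
unchanged = static_output_level(60.0, 73.0, math.inf)  # 60.0
```

The same function with a finite ratio (for example 2) describes the static curve of a WDRC above its knee point.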

5.5.2 Automatic Sensitivity Control

Automatic Sensitivity Control (ASC) is a slow AGC that automatically adjusts the

microphone sensitivity (gain) slowly to control the level of background noise that exceeds


the breakpoint (Seligman and Whitford 1995). It gives listening comfort to the recipients in

difficult listening situations with high background noise (Wolfe et al. 2009).

The block diagram of ASC is shown in Figure 5-7. It is a feedback comparative system

with an envelope detector followed by a noise floor detector. The envelope detector

follows the peaks of the half-wave rectified input signal. The noise floor detector tracks the

minimum level of the envelope. The gain provided by ASC is based on the comparison

between the estimated noise floor level and the ASC breakpoint. If the estimated noise floor

is above the breakpoint, the gain is slowly decreased and if it is below the breakpoint, the gain

is slowly increased. The maximum gain is limited by the user’s sensitivity setting.
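
The gain decision described above can be sketched as follows. This is a minimal illustration, not the Nucleus implementation: the function name `asc_update`, the step size and the example levels are hypothetical, and the envelope and noise floor detectors are assumed to have already produced `noise_floor_db`.

```python
def asc_update(gain_db, noise_floor_db, breakpoint_db, max_gain_db, step_db=0.5):
    """One gain update of an ASC-like loop (all values in dB).

    step_db is a hypothetical step size; a real slow AGC would use a
    much smaller change per update.
    """
    if noise_floor_db > breakpoint_db:
        gain_db -= step_db   # noise floor above the breakpoint: reduce gain
    elif noise_floor_db < breakpoint_db:
        gain_db += step_db   # noise floor below the breakpoint: recover gain
    # The gain may never exceed the user's sensitivity setting.
    return min(gain_db, max_gain_db)


# Noise floor 5 dB above the breakpoint: the gain is stepped down.
reduced = asc_update(0.0, noise_floor_db=60.0, breakpoint_db=55.0, max_gain_db=0.0)
```

Because the gain is clamped to `max_gain_db`, a quiet environment leaves the gain at the user's setting rather than boosting it further.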

[Block diagram. Blocks: half-wave rectifier, envelope detector, noise floor detector, sensitivity (gain) adjustment with breakpoint (noise floor target). Signals: ASC input, ASC gain (sensitivity), ASC output]

Figure 5-7 Block diagram of Automatic Sensitivity Control (Seligman 2000)

When ASC is not enabled, a fixed gain is applied to the input signals. When ASC is enabled,

the sensitivity is fixed at the default setting of 12 (0 dB gain) while the estimated noise floor is

below the ASC breakpoint. ASC reduces gain in the presence of long-duration, high-level

noise while having negligible effects on speech signals in quiet conditions.


5.5.3 Whisper

Whisper is a fast-acting front-end AGC with a post-compression gain boost (McDermott,

Henshall and McKay 2002). Whisper is a WDRC that adjusts the intensity difference

between loud and soft speech components. The purpose of Whisper is to increase the

perceived loudness of low-level speech below the compression threshold. The compression

threshold of Whisper is 52 dB SPL and the compression speed is approximately at a syllabic

rate. The compression ratio of Whisper is 2:1. Figure 5-8 shows the input-output diagram of

Whisper.

[Plot "Whisper: IO Curve": output (dB) against input (dB SPL, 1 kHz sine), both axes from 30 to 80, with the knee point and the 2:1 compression region marked]

Figure 5-8 Input-output diagram of Whisper

5.5.4 Unified Gain Model (Tri-loop AGC)

The unified gain model (UGM) consists of a slow, medium and fast AGC. The fast AGC is

functionally the same as the compression limiter (§5.5.1). The medium AGC, with time

constants intermediate between fast and slow, provides listening comfort. The AGCs

of the UGM are cascaded in order of decreasing time constant: the slow AGC followed by the

medium AGC and then the fast AGC. When the UGM is configured to operate all three AGC

components, it is referred to in this thesis as a tri-loop AGC. The input to the UGM is either the

RMS or the peak of the half-wave rectified envelopes. The tri-loop AGC evaluated in this

thesis used an RMS level

detector. The output of the level detection circuit is updated at the envelope sample rate after

the filterbank. Since the compression threshold of the slow AGC is lower than that of the fast

AGC, the operation of the tri-loop AGC is mainly determined by the slow compression. The

slow AGC determines the amount of gain based on the comparison between the input level

and the slow compression threshold. The medium AGC determines the amount of gain based

on the comparison between the compressed input level (i.e., the level at the output of the

slow AGC) and the medium compression threshold. Likewise, the fast AGC determines the

amount of gain based on the comparison between the already compressed input level by the

slow and medium AGCs and the fast compression threshold. The total gain is a linear

combination of the gains, in decibels, from all AGCs involved.
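
The cascade can be illustrated with a simplified Python sketch. It is not the UGM implementation: the class `AgcLoop`, the function `tri_loop_step`, the one-pole gain smoothing and the 1 ms update interval are assumptions made for illustration, the time values of Table 5-2 are assumed to be in milliseconds, the medium AGC hold time is omitted, and the total gain is taken as the plain sum of the three gains in decibels.

```python
import math


class AgcLoop:
    """One AGC loop with an infinite compression ratio (sketch only)."""

    def __init__(self, threshold_db, attack_ms, release_ms, frame_ms=1.0):
        # Assumes nonzero attack and release times.
        self.threshold_db = threshold_db
        self.attack = math.exp(-frame_ms / attack_ms)
        self.release = math.exp(-frame_ms / release_ms)
        self.gain_db = 0.0

    def step(self, level_db):
        # Infinite ratio: the target gain brings the level down to the threshold.
        target_db = min(0.0, self.threshold_db - level_db)
        # Gain decreasing uses the attack time, gain recovering the release time.
        coeff = self.attack if target_db < self.gain_db else self.release
        self.gain_db = coeff * self.gain_db + (1.0 - coeff) * target_db
        return self.gain_db


def tri_loop_step(level_db, loops):
    """Process one level sample; return the gains of the cascaded loops."""
    gains = []
    for loop in loops:
        # Each loop sees the input level already compressed by earlier loops.
        gains.append(loop.step(level_db + sum(gains)))
    return gains


# Tri-loop 65 thresholds and times from Table 5-2: slow, medium, fast.
loops = [AgcLoop(54, 8000, 8000), AgcLoop(64, 300, 2000), AgcLoop(69, 5, 100)]
for _ in range(50):  # first 50 ms of a constant 80 dB SPL input
    slow_db, medium_db, fast_db = tri_loop_step(80.0, loops)
total_db = slow_db + medium_db + fast_db
```

In this sketch the fast loop supplies most of the gain reduction immediately after the onset, while for a sustained input the slow loop gradually takes over and the medium and fast gains return towards 0 dB.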

Many forms of AGC, for example emulation of ASC, Whisper and the front-end

compression limiter of the Freedom processing, can be configured by different parameter

settings of the UGM. Table 5-2 shows three settings of the UGM parameters used in this

thesis. The first column named FEL is the UGM parameter setting that emulates the front-

end compression limiter of the Nucleus Freedom processor. The second and third columns

are two configurations of the tri-loop AGC used in this thesis.

UGM Parameter Setting                 FEL     Tri-loop 75   Tri-loop 65
AGC Level Detection                   Peak    RMS           RMS
Slow AGC Attack Time                  0       8000          8000
Slow AGC Release Time                 0       8000          8000
Slow AGC Compression Threshold        95      54            54
Slow AGC Compression Ratio            1       Inf           Inf
Medium AGC Attack Time                0       300           300
Medium AGC Release Time               0       2000          2000
Medium AGC Hold Time                  0       100           100
Medium AGC Compression Threshold      95      74            64
Medium AGC Compression Ratio          1       Inf           Inf
Fast AGC Attack Time                  3       5             5
Fast AGC Release Time                 53      100           100
Fast AGC Compression Threshold        73      79            69
Fast AGC Compression Ratio            Inf     Inf           Inf

Table 5-2 Parameter settings of the UGM


Figure 5-9 shows the input and output signals and the gains of the tri-loop AGC (Tri-loop

65) in response to a 1 kHz sinusoidal input with step changes in the presentation level. When the

presentation level of the input signal stepped up at 7 seconds, it exceeded the compression

threshold of the slow AGC. Hence the slow AGC slowly reduced the gain in response to the

increased level of the input signal. The medium AGC also reduced the gain since the

compressed input level at the output of the slow AGC was still above the compression

threshold of the medium AGC. At about 9 and 16.5 seconds, the input stepped up again

and triggered all AGCs. When the medium gain reductions at 9 and 16.5 seconds are

compared, the one at 16.5 seconds was smaller because the input level had already been

reduced by the slow AGC in the latter case. The same observation was made for the fast

AGC. Since the overall compression was mainly determined by the slow AGC, less temporal

and spectral distortion is expected in this multiple-loop compression scheme.

[Plot panels against time (0 to 30 s): input level (dB SPL) with the slow, medium and fast AGC compression thresholds; fast AGC gain (dB); medium AGC gain (dB); slow AGC gain (dB); total gain (dB)]

Figure 5-9 Input, output and gain signals of the tri-loop AGC on a roving-level sinusoid

5.5.5 Adaptive Dynamic Range Optimization

The Adaptive Dynamic Range Optimization (ADRO) (James et al. 2002) is a slow

multichannel AGC located after the filterbank in the cochlear implant signal path (Figure

5-6). ADRO independently adjusts the gain in each frequency channel. The main objective


of ADRO is to make the output comfortably loud. The gain rules of the ADRO are devised

to meet the following aims:

• to improve the audibility of soft sounds,

• to maintain loud sounds at a comfortable level, and

• to keep background noise below an objectionable level.

The inputs to the control block are percentile estimates of the long-term output level

in each frequency channel. The percentile levels are estimated by Nysen’s recursive

percentile estimator (Nysen 1980). The recursive estimator operates by taking the level in the

data register and comparing it with the instantaneous signal level. If the level is greater than

the estimate, the estimate is increased by an amount proportional to ‘q’, the desired

percentile level. If the level is less than the estimate, the estimate is decreased by an amount

proportional to ‘1 - q’.
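
The update rule described above can be sketched in Python as follows. The function name `update_percentile`, the step size and the synthetic input levels are hypothetical and chosen only to illustrate the behaviour; the sketch follows the verbal description rather than any particular implementation.

```python
def update_percentile(estimate, level, q, step=0.1):
    """One update of a recursive percentile estimate.

    q is the desired percentile as a fraction (0.9 for the 90th
    percentile); step is a hypothetical adaptation step size.
    """
    if level > estimate:
        return estimate + step * q           # step up, proportional to q
    if level < estimate:
        return estimate - step * (1.0 - q)   # step down, proportional to 1 - q
    return estimate


# Synthetic levels covering 0..99 uniformly in a scrambled order.
estimate = 0.0
for i in range(20000):
    estimate = update_percentile(estimate, (i * 37) % 100, q=0.9)
# The estimate settles near the 90th percentile of the levels.
```

The estimate stops drifting when the upward and downward steps balance on average, that is, when the fraction of levels below the estimate equals q. No samples need to be stored or sorted.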

For each gain rule, a percentile estimate is compared with a preset threshold to determine the

direction of gain, up or down, for each channel. Figure 5-10 shows the operation of ADRO

in one channel. Three gain rules operate ADRO as per the objectives outlined above. Only

one rule is activated at a time. Gain rules are in the order of importance as shown in the

pseudo code below:

    if high_percentile > high_target          % Comfort rule
        reduce gain;
    elseif low_percentile > low_target        % Noise rule
        reduce gain;
    elseif mid_percentile < mid_target        % Audibility rule
        increase gain;
    else
        hold gain;
    end


The rate of gain change is typically slow. The precedence of the rules indicates that listening

comfort is given higher priority than audibility. The final gain is limited to not exceed a

predetermined maximum gain.
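
A runnable counterpart of the gain rules for a single channel might look like the Python sketch below. The function name `adro_gain_step`, the dictionary keys, the step size and the target values are hypothetical; only the rule order and the maximum gain limit are taken from the description above.

```python
def adro_gain_step(gain_db, percentiles, targets, step_db, max_gain_db):
    """Apply the prioritised gain rules to one channel (values in dB).

    percentiles and targets are dicts with keys 'high', 'low' and 'mid'.
    """
    if percentiles["high"] > targets["high"]:    # comfort rule
        gain_db -= step_db
    elif percentiles["low"] > targets["low"]:    # noise rule
        gain_db -= step_db
    elif percentiles["mid"] < targets["mid"]:    # audibility rule
        gain_db += step_db
    # Otherwise the gain is held. The result never exceeds the maximum gain.
    return min(gain_db, max_gain_db)


targets = {"high": 70.0, "low": 40.0, "mid": 55.0}  # hypothetical targets
# The output is both too loud (high) and too soft (mid): comfort wins.
new_gain = adro_gain_step(
    10.0, {"high": 75.0, "low": 30.0, "mid": 50.0}, targets, 0.5, 20.0
)
```

Because the rules form a single if/elif chain, only one rule acts per update, and the comfort rule takes precedence over the audibility rule when both conditions hold.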

Figure 5-10 Block diagram of ADRO in one frequency channel

One possible drawback of ADRO is that the background noise rule can reduce audibility if the

background noise floor estimated by the percentile estimator is inaccurate. Similarly, the

audibility rule can increase the background noise level together with the target speech. ADRO

therefore limits the maximum gain to prevent the output from becoming

excessively loud. A few studies have evaluated ADRO with cochlear implant

recipients (James et al. 2002; Muller-Deile et al. 2008). These studies consistently showed the

benefit of ADRO for low-level speech perception of the cochlear implant subjects.

5.6 Noise Estimation

Noise reduction and speech enhancement algorithms estimate the inherent noise power from

the input stimuli. Noise estimation is also part of the AGCs, for example ASC and ADRO, to

reduce the overall level of the input stimuli in high background noise.

A single-channel noise estimation algorithm does not have exclusive access to the noise

power. The input signal to a noise estimation algorithm is a mixture of speech and other

competing signals. Therefore, noise estimation algorithms work under the assumption that


speech and noise are independent signals, and they are either temporally or spectrally

distinguishable. If speech and noise come from spatially different sources, for example

listening to the speaker in front while lawn-mowing noise comes from behind, the two

signals can be separated by a microphone-array beamforming algorithm. This thesis is

concerned with the single-channel noise estimation algorithms that estimate noise power

based on different spectro-temporal characteristics of speech and noise signals. Noise

estimation and speech enhancement are active research topics in speech processing and

communication fields.

Single-channel noise estimation algorithms can generally be sorted into three categories

(Loizou 2007):

• Minimum-tracking algorithms

• Time-recursive averaging algorithms

• Histogram-based algorithms

The minimum-tracking algorithms work under the assumption that noisy speech power in

each frequency band often decays to the noise power, even during speech activity (Martin

1994, 2001). Therefore noise power can be estimated by tracking the minimum of the noisy

speech power within a short time window of duration from 400 ms to 1 second. Martin’s

minimum statistics noise estimation algorithm has been successfully used in the SNR-based

Noise Reduction algorithm of the Nucleus CP810 sound processor (Dawson, Mauger and

Hersbach 2011; Hersbach et al. 2012).
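
The core idea of minimum tracking can be shown with a short Python sketch for a single frequency band. It is deliberately simplified: the function name `track_minimum` and the example powers are hypothetical, and the smoothing and bias compensation used in Martin's algorithm are left out.

```python
from collections import deque


def track_minimum(power_frames, window_frames):
    """Noise estimate per frame: minimum power over the last window_frames."""
    window = deque(maxlen=window_frames)  # oldest frame drops out automatically
    estimates = []
    for power in power_frames:
        window.append(power)
        estimates.append(min(window))
    return estimates


# Noise power near 1 with two speech bursts (synthetic values).
powers = [1.0, 1.2, 50.0, 80.0, 1.1, 60.0]
noise = track_minimum(powers, window_frames=3)
# noise == [1.0, 1.0, 1.0, 1.2, 1.1, 1.1]
```

As long as the window spans at least one dip between speech components, the bursts do not raise the estimate.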

The recursive averaging algorithms update the noise estimate whenever the effective SNR is low

(Lin, Holmes and Ambikairajah 2003a, b). The updated noise power is the weighted average

of the estimated noise power from the previous frames and the noisy signal power spectrum

at the present frame. In the noise estimation method proposed by Lin, Holmes and

Ambikairajah (2003b), the coefficient of the noise update filter is adaptively calculated from

the effective SNR. Other implementations of time-recursive averaging

algorithms used a fixed coefficient for the noise update filter but only updated the

estimated noise power when the effective SNR fell below a certain threshold value (Hirsch

and Ehrlicher 1995) or when the noisy spectrum fell within a fraction of the variance of


the noise estimate (Ris and Dupont 2001). Other types of time-recursive averaging

algorithms calculated the probability of speech presence in the frequency bins and updated

the smoothing coefficient accordingly (Malah, Cox and Accardi 1999). The speech presence

probability was also calculated from the ratio of the noisy power spectrum to the spectral

minimum from the previous frames (Cohen and Berdugo 2002). Such algorithms use a

spectral minima-tracking feature and are also known as Minima-Controlled Recursive

Averaging (MCRA) algorithms.
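
The simplest variant, a fixed update coefficient that is applied only when the effective SNR is low, can be sketched in Python as follows. The function name `update_noise`, the coefficient and the SNR threshold are hypothetical values for illustration and do not reproduce any of the cited algorithms exactly.

```python
def update_noise(noise, power, alpha=0.9, snr_threshold=2.0):
    """Update the noise power estimate for one band and one frame.

    The estimate is only updated when the ratio of the noisy power to
    the current noise estimate is low, i.e. the frame looks like noise.
    """
    if power / noise < snr_threshold:
        # Weighted average of the previous estimate and the current power.
        return alpha * noise + (1.0 - alpha) * power
    return noise  # likely speech: hold the estimate


held = update_noise(1.0, 100.0)   # speech-like frame, estimate held at 1.0
updated = update_noise(1.0, 1.5)  # noise-like frame, estimate moves towards 1.5
```

Algorithms with an adaptive coefficient replace the hard decision by a value of `alpha` that varies with the effective SNR or with the speech presence probability.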

The histogram-based techniques are based on the observation that noise components occur

more frequently than speech components (Loizou 2007). The histogram of noisy speech is

expected to be bimodal, with the lower-level mode corresponding to the noise components.

From the study of Stahl et al. (2000), using the median of the noisy speech as the noise floor

estimate reduced the word error rate. Histogram-based techniques often can be slow due to

the process of sorting the signal levels and counting the number of occurrences. They also

require a large time window to collect samples to obtain meaningful statistical information

from the input signal. Nysen (1980) proposed a method that efficiently estimates a percentile

without having to collect and sort a large number of samples from the input signal. Nysen’s

percentile estimator has been successfully used in ADRO (James et al. 2002; Blamey 2005;

Blamey, Macfarlane and Steele 2005).
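
A running percentile can be tracked without storing or sorting samples. The sketch below is a generic up/down tracker, not Nysen's published method or the ADRO implementation; the function name `track_percentile`, the step size and the starting value are assumptions chosen for illustration. The estimate rises by a step weighted by p when the input exceeds it and falls by a step weighted by (1 - p) otherwise, so it settles where a fraction p of the samples lie below it.

```python
import random


def track_percentile(samples, p, step=0.05, start=0.0):
    """Track the p-th quantile (0 < p < 1) of a stream without sorting.

    Generic up/down tracker (an assumption, not Nysen's exact method):
    the estimate rises by step*p when a sample exceeds it and falls by
    step*(1-p) otherwise, so it balances where P(sample <= est) = p.
    """
    est = start
    for x in samples:
        if x > est:
            est += step * p
        else:
            est -= step * (1.0 - p)
    return est


if __name__ == "__main__":
    rng = random.Random(0)
    levels_db = [rng.gauss(60.0, 5.0) for _ in range(100_000)]  # hypothetical levels
    print("median estimate:", round(track_percentile(levels_db, 0.5, start=50.0), 1))
    print("90th percentile estimate:", round(track_percentile(levels_db, 0.9, start=50.0), 1))
```

Only one stored value and one comparison per sample are needed, which is the property that makes this family of estimators attractive for a sound processor.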

The recursive averaging sub-band noise estimation method proposed by Lin (2004) was

tested in the proposed Adaptive Loudness Growth Function (§12.3.3). Minima

tracking and bias compensation from Martin’s minimum statistics noise estimation (Martin

2001) are applied in the proposed noise estimator. Therefore both Martin’s and Lin’s methods

are described in the following subsections.

5.6.1 Martin’s Minimum Statistics Noise Estimator

The Power Spectral Density (PSD) of noisy speech often decays to the noise power level

even during speech activity. Hence the power level of the noise floor can be estimated within a

time window that is long enough to cover speech components with brief pause periods. The

noisy speech PSD is recursively smoothed by a first-order IIR filter.

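$$P(\lambda,k) = \alpha\,P(\lambda-1,k) + (1-\alpha)\,|Y(\lambda,k)|^2 \qquad (5.1)$$

where $\lambda$ is the frame index, $k$ is the frequency bin index, $P(\lambda,k)$ is the smoothed PSD of the

noisy signal, $\alpha$ is the smoothing parameter and $|Y(\lambda,k)|^2$ is the PSD of the noisy signal. As

per the implementation in Martin (1994), the smoothing parameter is calculated as: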

(5.2)

where = the window length, R = FFT buffer length and is the sample rate. The

smoothing parameter is fixed and not updated over time. Using a fixed smoothing parameter

can widen spectral peaks and the noise estimate may not be accurate. In Martin (2001), a

time-varying smoothing parameter was introduced. The optimal smoothing parameter is

derived as:

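$$\alpha_{opt}(\lambda,k) = \frac{1}{1 + \left(P(\lambda-1,k)/\sigma_N^2(\lambda,k) - 1\right)^2} \qquad (5.3)$$

where $P(\lambda-1,k)$ is the smoothed noisy signal PSD from the previous frame and $\sigma_N^2(\lambda,k)$ is the true noise PSD. In the actual implementation the true noise PSD is

replaced by the estimated noise PSD from the previous frame, $\hat{\sigma}_N^2(\lambda-1,k)$. Because the

estimated noise PSD is used, $\alpha_{opt}$ needs to be limited to $\alpha_{max}$, a value less than one, to

avoid a deadlock occurring when the noise PSD is equal to the smoothed noisy signal PSD.

When this happens, the smoothed noisy signal PSD cannot react quickly to changes in the

actual noisy signal PSD.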

The estimated noise PSD lags behind the actual noise PSD by a number of frames depending

on how the minimum of the smoothed noisy signal PSD is found from the previous frames.

Hence an error monitoring procedure is necessary to detect the deviation between the PSDs

of the smoothed noisy signal and the average of the actual noisy signal. The correction factor

is calculated as:

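$$\tilde{\alpha}_c(\lambda) = \frac{1}{1 + \left(\sum_k P(\lambda-1,k) \big/ \sum_k |Y(\lambda,k)|^2 - 1\right)^2} \qquad (5.4)$$

where $P(\lambda-1,k)$ is the smoothed noisy signal PSD from the previous frame, $|Y(\lambda,k)|^2$ is the PSD of the noisy signal and the sums run over all frequency bins.

The value of $\tilde{\alpha}_c(\lambda)$ is limited and it is further smoothed by a first-order IIR filter to give $\alpha_c(\lambda)$.

Then the smoothing parameter $\alpha(\lambda,k)$ is finally calculated by:

$$\alpha(\lambda,k) = \frac{\alpha_{max}\,\alpha_c(\lambda)}{1 + \left(P(\lambda-1,k)/\hat{\sigma}_N^2(\lambda-1,k) - 1\right)^2} \qquad (5.5)$$

where $\hat{\sigma}_N^2(\lambda-1,k)$ is the estimated noise PSD from the previous frame and $\alpha_{max}$ is the upper limit of the smoothing parameter.

The minimum value of the smoothing parameter is limited to avoid reaching zero.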

The minimum value of the smoothed noisy signal PSD, $P_{min}(\lambda,k)$, is searched over D

consecutive frames. $P_{min}(\lambda,k)$ can be updated for every frame but it is computationally

expensive to do D – 1 compare operations for each frequency band. The searching can be

done efficiently by a simple procedure described in Martin (1994). The update is done

every D frames but the minimum is found for each frame by only comparing $P(\lambda,k)$ with the

value from the previous frame, $P_{min}(\lambda-1,k)$. A temporary variable $P_{tmp}(\lambda,k)$ holds the

running minimum since the last update. The minimum searching procedure is described in

the following pseudo code.

if modulus(λ, D) == 0   % at every Dth frame

    P_min(λ,k) = min(P_tmp(λ-1,k), P(λ,k));

    P_tmp(λ,k) = P(λ,k);

else   % at other frames

    P_min(λ,k) = min(P_min(λ-1,k), P(λ,k));

    P_tmp(λ,k) = min(P_tmp(λ-1,k), P(λ,k));

end

$P_{min}(\lambda,k)$ is the estimated noise PSD that represents $\hat{\sigma}_N^2(\lambda,k)$. The procedure is done for

each frequency band. The estimated noise PSD is the minimum of the smoothed noisy signal

PSD.
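
The minimum search can be sketched in Python for a single frequency band. This is an illustrative version under stated assumptions: the names `track_minimum`, `p_min` and `p_tmp` are not from the thesis, and the smoothing and bias compensation stages are left out.

```python
def track_minimum(smoothed_psd, D):
    """Running minimum of one band's smoothed PSD over roughly D to 2D frames.

    smoothed_psd: smoothed noisy-signal PSD values, one per frame.
    Returns the per-frame minimum, i.e. the uncompensated noise estimate.
    The names p_min and p_tmp are illustrative assumptions.
    """
    p_min = p_tmp = smoothed_psd[0]
    noise = []
    for frame, p in enumerate(smoothed_psd):
        if frame % D == 0:
            # Window boundary: adopt the minimum of the window just ended,
            # then restart the temporary minimum from the current frame.
            p_min = min(p_tmp, p)
            p_tmp = p
        else:
            # One compare per running value instead of D - 1 compares.
            p_min = min(p_min, p)
            p_tmp = min(p_tmp, p)
        noise.append(p_min)
    return noise


if __name__ == "__main__":
    psd = [5, 4, 3, 6, 7, 8, 9, 2, 9, 9, 9, 9, 9]
    print(track_minimum(psd, D=3))
```

Because an old minimum is only discarded at a window boundary, a rise in the noise floor is followed with a delay of up to about 2D frames, which is the lag discussed above.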

The noise PSD estimated by minimum tracking is often biased towards values lower than the

actual noise PSD and therefore needs to be compensated. It can be compensated by

multiplying the noise estimate with the reciprocal of the mean of the minimum of a sequence

of random independent variables. The bias is inversely proportional to the number of

previous noisy PSD samples considered in the estimate of the smoothed noisy signal PSD. In

other words, the more previous frames are used to smooth the power spectrum, the lower

the bias is. However a very long duration window cannot be used to smooth the power

spectrum of highly non-stationary signals like speech. In practice, an IIR filter is used to

smooth the noisy signal power. As a result the smoothed PSDs between successive frames

are correlated to a certain extent. Therefore the bias factor can only be estimated by

simulating the underlying exponential distribution of the power spectrum for a given number

of frames to search the minimum. The bias calculation is a complicated procedure yet

important for compensating the estimated noise that would otherwise be underestimated.

5.6.2 Lin’s Recursive Averaging Noise Estimator

Unlike other methods, the noise estimation using recursive averaging method does not

require a large buffer of samples to update the noise power. It works under the assumption

that speech and noise are independent signals and noise power changes slowly over time.

The noise power in each frequency band is estimated by a first-order IIR filter.

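$$\hat{\sigma}_N^2(\lambda,k) = \alpha(\lambda,k)\,\hat{\sigma}_N^2(\lambda-1,k) + \left(1-\alpha(\lambda,k)\right)|Y(\lambda,k)|^2 \qquad (5.6)$$

where $\hat{\sigma}_N^2(\lambda,k)$ is the noise power estimate and $|Y(\lambda,k)|^2$ is the noisy signal power at frame $\lambda$ in frequency band $k$.

The key component in equation 5.6 is the smoothing parameter $\alpha(\lambda,k)$, which is updated at

each analysis frame by: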

(5.7)

where L is the number of frames used to estimate the average noise power and Q controls the

steepness of $\alpha$, the smoothing parameter. A value of $\alpha$ close to 1 results in slow noise estimation. The

value of L is typically set between 5 and 10 and Q between 3 and 10. Although the

equation uses the average noise power in the denominator, the median

value of the noise power from the previous L frames can also be used. The smoothing parameter $\alpha(\lambda,k)$ is

limited between 0 and 1. When speech is absent, the noisy signal power contains noise only

and $\alpha(\lambda,k) \approx 0$. As a result, the noise power estimate, $\hat{\sigma}_N^2(\lambda,k)$, is updated with the noisy

signal power at that time. When speech is present, $\alpha(\lambda,k) \approx 1$ and the noise power estimate is

not updated with the new value of noisy speech power. The final noise power estimate is

obtained by smoothing the noise power estimate with a first-order IIR filter.

$$\bar{\sigma}_N^2(\lambda,k) = \beta\,\bar{\sigma}_N^2(\lambda-1,k) + (1-\beta)\,\hat{\sigma}_N^2(\lambda,k) \qquad (5.8)$$

The value of the IIR filter coefficient $\beta$ of the final smoothing filter is set around 0.5.
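
The behaviour described above can be illustrated with a short single-band sketch. The update rule `alpha = 1 - min(1, snr**-Q)` is an assumed form that matches the description (average noise power in the denominator, steepness Q, limits of 0 and 1); it is not confirmed to be the thesis's equation 5.7. The function name and default values are also assumptions.

```python
def recursive_noise_estimate(noisy_power, L=8, Q=5, beta=0.5):
    """SNR-dependent recursive averaging for one frequency band.

    noisy_power: per-frame noisy signal power (strictly positive floats).
    alpha = 1 - min(1, snr**-Q) is an ASSUMED update rule; L, Q and beta
    follow the ranges quoted in the text.
    Returns the final (smoothed) noise power estimate for each frame.
    """
    history = [noisy_power[0]] * L        # last L raw noise estimates
    raw = final = noisy_power[0]
    out = []
    for y in noisy_power:
        avg_noise = sum(history) / L
        snr = y / avg_noise                    # effective SNR
        alpha = 1.0 - min(1.0, snr ** -Q)      # 0: speech absent, near 1: speech present
        raw = alpha * raw + (1.0 - alpha) * y  # first-order IIR noise update
        history = history[1:] + [raw]
        final = beta * final + (1.0 - beta) * raw  # final first-order smoothing
        out.append(final)
    return out


if __name__ == "__main__":
    frames = [1.0] * 20 + [100.0] * 5 + [1.0] * 20   # noise floor with a speech burst
    print(round(max(recursive_noise_estimate(frames)), 4))
```

With these settings the estimate barely moves during the burst, because a high effective SNR drives the smoothing parameter towards 1, while a sustained change in the noise floor is still tracked over time.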

5.7 Conclusion

This chapter reviewed AGC systems for hearing aids and cochlear implant systems, and

described the AGC systems in the Nucleus sound processors. The aims of an AGC system

are (i) to provide access to low level sounds and (ii) to make loud sounds more comfortable.

More importantly, an AGC should achieve both goals with little to no perceptual

distortion. The rationales for AGC systems in hearing aids and cochlear implant systems

differ mainly in terms of compensating for frequency-dependent hearing loss. The optimal

parameter set for an AGC in hearing aids is dependent on many factors and the results are

mixed between studies. The main purpose of AGC systems in cochlear implants is to expand

the predetermined input dynamic range with minimal distortion. Modern cochlear implant

systems use more than one AGC system or an AGC system with multiple (slow and fast)

control loops. While it is important to know the effects of AGC parameters on speech

intelligibility, it is also important to know how to evaluate those effects. Therefore the signal

metrics to quantify the effects of AGC systems and the test methodology for evaluation of AGC

systems in cochlear implants will be studied next.

P a g e | 80
Chapter 6 Speech Intelligibility Metrics

6 Speech Intelligibility Metrics

6.1 Introduction

Listening tests with recipients are compulsory for research studies of cochlear implants.

Sometimes listening tests, preparation and actual testing, can be time-consuming for both

cochlear implant recipients and researchers, especially when the parameter space of an

algorithm/system is large or performance is affected by factors other than the main factor to

be analyzed.

Blamey et al. (1996) implemented a three-stage model of auditory performance over time

with the data of 800 recipients who benefited from a cochlear implant. The study was

replicated by Lazard et al. (2012) with data from more recipients: 2251 recipients from 15

international clinics. The model accounted for 22% of the performance

variance. A part of the unexplained 78% was likely the variation due to test-retest reliability

of speech intelligibility measures used. The performance variation between recipients and

the test-retest reliability of speech tests can undermine the performance variation due to the

processing conditions under test. The overall test results may not be sensitive to the

processing conditions in that case.

A signal metric can be defined as a measure to quantify the performance of a system

objectively. A speech metric is a subset of signal metrics that uses speech signals as input

stimuli. Test conditions are simulated offline to gather the output of a system under test for

performance analysis. The advantage of simulating test conditions offline is repeatability,

with results not affected by subject variability. A reliable metric can expedite the

development process of new signal processing algorithms. More importantly, metrics that are

highly correlated with speech intelligibility of cochlear implant recipients allow

understanding of important features of speech.

The standard metrics for speech processing and communication systems are the Speech

Intelligibility Index (SII) of ANSI S3.5 and the Speech Transmission Index (STI) of IEC 60268-16.

Both quantify speech intelligibility by the average signal-to-noise ratio across critical

channels. Both measures are applicable to speech applications for normal-hearing and

hearing-impaired subjects.

Compression affects speech intelligibility by reducing the amount of low rate modulation

depth, flattening spectral profile, introducing temporal distortion and reducing signal-to-

noise ratio. Measures that can quantify spectral-temporal distortion of envelopes are

hypothesized to predict speech intelligibility. The prediction power depends on how

sensitive speech understanding is to the envelope distortion. For example, cochlear implant

recipients are more sensitive to noise distortion than normal hearing subjects.

This chapter examines the signal metrics that can quantify envelope distortion due to various

processing in the cochlear implant signal path. The objective is to predict the effects of AGC

systems on the speech intelligibility of cochlear implant recipients. The standard metrics: SII

and STI, and non-standard measures: apparent SNR method, Across-Source Modulation

Correlation (ASMC) and Normalized Covariance Measure (NCM), will be studied.

6.2 Speech Intelligibility Index

The Speech Intelligibility Index (SII) is a metric derived from the Articulation Index (AI). AI

was developed by French and Steinberg (1947) and later improved by Kryter (1962a, b). AI

is based on the principle that the intelligibility of distorted, filtered or masked

speech depends on the proportion of speech available to the listener. AI was originally

developed to predict speech intelligibility of telephone communication systems under

various conditions of noise distortion, filtering and low level speech. The method to calculate

AI measure was re-examined in many studies (Pavlovic 1984; Pavlovic and Studebaker

1984; Pavlovic, Studebaker and Sherbecoe 1986; Pavlovic 1987) and became the standard

metric SII in ANSI S3.5 (1997). The SII is a quantity that correlates highly with the

intelligibility of speech under adverse listening conditions such as noise masking, filtering

and reverberation. The concept of SII is that different frequencies contribute different

amounts of speech intelligibility and the intelligibility of a speech communication system

can be predicted by measuring the SNR in each contributing frequency band. This concept

has been widely used as the basis of other speech intelligibility metrics.

The method calculates the intelligibility index by the summation of the audible quantity of

speech that carries intelligibility across the frequency bands. It can be described as:

$$SII = P \sum_{i} I_i\,W_i \qquad (6.1)$$

where $I_i$ is the band importance function, $W_i$ is the audibility function and P is the

proficiency factor of both talker and listener on the speech material. The procedure is to

divide the input speech material into a number of critical bands. The audibility W of a

frequency band depends on the effective sensation level of that band in the listener’s ear.

The scope of the SII is limited to natural speech presented to normal-hearing listeners with no

linguistic or cognitive deficiencies, and the processing should not contain

sharp filters. The procedure was modified for hearing impairment by widening the

integration bandwidth of speech and noise spectrum greater than the critical bandwidth.

From Pavlovic (1984), the discrepancies between the observed and predicted performance

are greatest in those frequency regions where hearing loss is greatest. The contribution factor

of each frequency band was modified by a speech desensitization factor, a function of

hearing loss in frequency (Pavlovic, Studebaker and Sherbecoe 1986). The other

shortcoming of SII measure is that it is unable to account for the interaction between various

frequency components because it uses the long-term spectrum of speech and noise

(minimum duration of 30 seconds) and hence cannot reliably predict the intelligibility of

speech in non-stationary noise maskers (Rhebergen and Versfeld 2005). The conventional

SII measure was extended by partitioning the speech and noise into smaller frames and

averaging the SII values of those time frames to get the final SII value (Rhebergen and

Versfeld 2005; Rhebergen, Versfeld and Dreschler 2006; Rhebergen, Versfeld and Dreschler

2008a).

6.3 Speech Transmission Index

Steeneken and Houtgast (1980) developed the Speech Transmission Index (STI) to quantify

speech transmission channels. The STI measure is based on the idea that the speech

intelligibility reduction can be predicted by the reduction in the temporal envelope

modulations. The method is conceptually very similar to the AI of French and Steinberg

(1947), although it applied the modulation transfer function (MTF) concept to evaluate

temporal characteristics of the system under test. The MTF calculates the reduction of the

output envelope spectrum compared to the input envelope spectrum. STI is obtained by the

weighted summation of the transmission indices in the critical frequency bands from 125 Hz

to 8 kHz. In each critical band, the modulation index was calculated for each frequency

component in the test stimuli. The range of the modulation frequency, typically between 0.5

Hz and 16 Hz, is also the most intelligible range of the connected speech of a typical

conversation. The study of Houtgast and Steeneken (1985) shows that the modulation index

of the speech intensity envelope spectrum is the highest from 2 to 4 Hz, as shown in Figure

6-1 below.

Figure 6-1 Speech modulation envelope spectrum (Houtgast and Steeneken 1985)

A high correlation between the STI and the subjective speech intelligibility scores was

shown for a wide variety of speech communication systems with distortions like band-

limiting, peak clipping, AGC, noise and reverberation (Houtgast and Steeneken 1973). Some

studies show that the MTF cannot be used directly to predict the speech intelligibility, for

example, if the test stimulus is speech in fluctuating background noise. The fluctuations

in the background noise cause the modulation reduction of speech caused by the

system under test to be underestimated.
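
The step from modulation transfer values to an index can be sketched as follows. This follows the commonly published STI mapping (apparent SNR of 10 log10(m / (1 - m)), clipped to ±15 dB and mapped linearly to 0..1). The uniform band weights and the function names are assumptions for illustration; the standard specifies its own band weights.

```python
import math


def transmission_index(m, eps=1e-9):
    """Map a modulation transfer value m (between 0 and 1) to a transmission index."""
    m = min(max(m, eps), 1.0 - eps)               # keep the ratio finite
    snr_db = 10.0 * math.log10(m / (1.0 - m))     # apparent SNR in dB
    snr_db = max(-15.0, min(15.0, snr_db))        # clip to +/-15 dB
    return (snr_db + 15.0) / 30.0                 # linear map to 0..1


def sti(mtf, weights=None):
    """mtf[band][modulation_frequency] -> STI.

    Uniform band weights are an assumption; a standard-compliant
    calculation would use the band weights of IEC 60268-16.
    """
    if weights is None:
        weights = [1.0 / len(mtf)] * len(mtf)
    band_ti = [sum(transmission_index(m) for m in band) / len(band) for band in mtf]
    return sum(w * ti for w, ti in zip(weights, band_ti))


if __name__ == "__main__":
    print(sti([[0.9, 0.8, 0.7], [0.6, 0.5, 0.4]]))
```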

6.4 Normalized Covariance Measure

The Normalized Covariance Measure (NCM) is an STI-based measure which quantifies the

performance of a system by calculating the SNR as a weighted sum of the transmission

indexes across critical bands (Holube and Kollmeier 1996). The difference between the

traditional STI and NCM is that NCM uses the covariance between temporal envelope

modulations of the reference and processed signals to determine the transmission index in

each band (Goldsworthy and Greenberg 2004; Chen and Loizou 2010). Chen and Loizou

(2011b) indicated that STI-based measures needed to use high modulation rates up to 100 Hz

for a better prediction of the intelligibility of the vocoded speech based on the findings from

their study. They also showed that a better speech intelligibility prediction of cochlear

implant recipients was achieved when subject-specific factors, the channel interaction matrix

in particular, were included in the NCM calculation (Chen and Loizou 2011a).

The procedures to calculate the NCM were described in Ma et al. (2009) and Chen and

Loizou (2011a). The channel interaction matrix (Chen and Loizou 2011a) was calculated

for each electrode as:

(6.2)

where Ii and Ij are the index of channel i and j and is the current spreading factor. The

channel interaction matrix was then multiplied with both the reference and processed signals

for all channels. The signals were then low-pass filtered.

The normalized covariance between the reference and the processed signals was calculated

as:

$$r_i = \frac{\sum_t \left(x_i(t)-\mu_i\right)\left(y_i(t)-\nu_i\right)}{\sqrt{\sum_t \left(x_i(t)-\mu_i\right)^2}\,\sqrt{\sum_t \left(y_i(t)-\nu_i\right)^2}} \qquad (6.3)$$

where $\mu_i$ and $\nu_i$ are the mean values and $x_i(t)$ and $y_i(t)$ are the envelopes of the reference

and the processed signals in the critical band i respectively.

Then the SNR for the critical band i was calculated as:

$$SNR_i = 10\log_{10}\left(\frac{r_i^2}{1-r_i^2}\right) \qquad (6.4)$$

where $r_i$ is the covariance calculated in the previous step by equation 6.3. The $SNR_i$ was

limited between -15 and 15 dB.

The transmission index was calculated by linearly mapping the SNR between 0 and 1 as:

$$TI_i = \frac{SNR_i + 15}{30} \qquad (6.5)$$

The band importance function can be as per the standard SII measure. The critical band

weight was calculated as described in Ma, Hu and Loizou (2009):

$$W_i = \left(\sum_t x_i(t)^2\right)^p \qquad (6.6)$$

where $x_i(t)$ is the magnitude of the input stimuli at the critical band i. The exponent factor p can be

between 0.12 and 1.5.

NCM was finally calculated as the weighted sum of the transmission indexes from the

critical bands:

$$NCM = \frac{\sum_{i=1}^{K} W_i\,TI_i}{\sum_{i=1}^{K} W_i} \qquad (6.7)$$

where K is the total number of critical bands, $W_i$ is the weight calculated by equation 6.6 and

$TI_i$ is the transmission index of the critical band i calculated by equation 6.5.

The NCM will be used in §13 to predict speech intelligibility of cochlear implant recipients

with different AGC conditions.
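
The calculation steps of the NCM can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the channel interaction matrix and the low-pass filtering are omitted, the band weight is taken as the summed squared reference envelope raised to the power p, and the function name `ncm` is hypothetical.

```python
import math


def ncm(ref_env, proc_env, p=1.0):
    """Normalized Covariance Measure from per-band envelopes.

    ref_env, proc_env: lists of bands, each a list of envelope samples.
    Channel interaction and low-pass filtering are omitted in this sketch.
    """
    num = den = 0.0
    for x, y in zip(ref_env, proc_env):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        var_x = sum((a - mx) ** 2 for a in x)
        var_y = sum((b - my) ** 2 for b in y)
        r = cov / math.sqrt(var_x * var_y) if var_x > 0 and var_y > 0 else 0.0
        r2 = min(r * r, 1.0 - 1e-12)                  # keep the ratio finite
        snr_db = 10.0 * math.log10(r2 / (1.0 - r2)) if r2 > 0 else -15.0
        snr_db = max(-15.0, min(15.0, snr_db))        # limit to +/-15 dB
        ti = (snr_db + 15.0) / 30.0                   # transmission index, 0..1
        w = sum(a * a for a in x) ** p                # band weight
        num += w * ti
        den += w
    return num / den


if __name__ == "__main__":
    reference = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 2.0, 1.0]]
    processed = [[1.0, 2.0, 2.5, 3.0], [2.0, 1.5, 2.0, 1.0]]
    print(ncm(reference, processed))
```

An undistorted signal path gives a value of 1, and a band whose processed envelope is uncorrelated with the reference contributes a transmission index of 0.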

6.5 Apparent SNR

Sound processing, either linear or nonlinear, can cause the SNR of the output signal to be

different from the SNR of the input stimuli and therefore affect the intelligibility (Steeneken

and Houtgast 1980; Hagerman and Olofsson 2004; Rhebergen, Versfeld and Dreschler

2009). Compression is non-linear processing and therefore the effective SNR at the output

can be different from the input SNR. Souza et al. (2006) showed that fast AGC (WDRC)

degraded SNR due to the amplification of background noise. Even with no amplification, an

AGC could still reduce a positive SNR value of the input signal by compressing target

speech more than background noise. In cochlear implant processing, there are other

non-linear processes such as envelope sampling (maxima selection) and the input dynamic range

limitation that can affect the output SNR.

Hagerman and Olofsson (2004) proposed an inversion technique to recover speech and noise

from the compressed mixture at the output of an AGC. One of the components (either speech

or noise) was phase inverted and combined with the mixture to recover the other component.

The output SNR was then calculated from the recovered compressed speech and noise. The

method is sensitive to the exact time alignment of the original and phase-inverted waveforms.

The inversion technique is not suitable for evaluating AGC after the filterbank in cochlear

implant processing because phase information is discarded when the FFT bins are combined

into channels.
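
The inversion technique can be sketched as two passes through the system under test, one with speech plus noise and one with speech minus noise. The code below is an illustration only: `process` stands for any waveform-in, waveform-out system and the function names are assumptions. Half the sum of the two outputs approximates the processed speech and half the difference approximates the processed noise.

```python
import math


def separate_by_inversion(process, speech, noise):
    """Recover processed speech and noise from two passes (phase inversion).

    process: function mapping a list of samples to a list of samples.
    Pass A uses speech + noise, pass B uses speech - noise.
    """
    mix_a = [s + n for s, n in zip(speech, noise)]
    mix_b = [s - n for s, n in zip(speech, noise)]
    out_a, out_b = process(mix_a), process(mix_b)
    speech_out = [(a + b) / 2.0 for a, b in zip(out_a, out_b)]
    noise_out = [(a - b) / 2.0 for a, b in zip(out_a, out_b)]
    return speech_out, noise_out


def snr_db(speech, noise):
    """Long-term SNR in dB from separated speech and noise waveforms."""
    return 10.0 * math.log10(sum(s * s for s in speech) / sum(n * n for n in noise))


if __name__ == "__main__":
    halve = lambda samples: [0.5 * v for v in samples]   # stand-in linear system
    s_out, n_out = separate_by_inversion(halve, [1.0, -2.0, 3.0], [0.5, 0.5, -1.0])
    print(s_out, n_out, snr_db(s_out, n_out))
```

The recovery is exact for a linear system; for a compressor it relies on the gain being nearly the same in both passes and on sample-accurate alignment of the two outputs.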

Rhebergen et al. developed the apparent SNR method to quantify the effect of compression

amplification (Rhebergen, Versfeld and Dreschler 2008b). The gain produced by an AGC for

a high level input signal, the mixture of speech and noise, was applied to the speech and

noise signals separately to calculate the output SNR. The apparent SNR measure will be used

in this thesis to predict the effect of different AGC systems on speech intelligibility of

cochlear implant recipients.
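
The apparent SNR idea can be sketched as follows: the gain is derived from the mixture and then applied to the speech and the noise separately. The compressor here is a hypothetical feed-forward design with made-up threshold, ratio and time-constant values; it is not one of the AGC systems evaluated in this thesis.

```python
import math


def agc_gain(mixture, threshold=0.1, ratio=3.0, attack=0.5, release=0.01):
    """Per-sample gain from a hypothetical feed-forward compressor."""
    level, gains = 0.0, []
    for x in mixture:
        mag = abs(x)
        coeff = attack if mag > level else release
        level += coeff * (mag - level)               # one-pole level detector
        if level > threshold:
            # Above the kneepoint the output grows 1/ratio dB per input dB.
            gains.append((level / threshold) ** (1.0 / ratio - 1.0))
        else:
            gains.append(1.0)
    return gains


def apparent_snr_db(speech, noise, gains):
    """Apply the mixture-derived gains to speech and noise separately."""
    s_pow = sum((g * s) ** 2 for g, s in zip(gains, speech))
    n_pow = sum((g * n) ** 2 for g, n in zip(gains, noise))
    return 10.0 * math.log10(s_pow / n_pow)


if __name__ == "__main__":
    speech = ([1.0] * 50 + [0.0] * 50) * 4           # bursts with pauses
    noise = [0.05] * len(speech)                     # steady background
    mixture = [s + n for s, n in zip(speech, noise)]
    unity = [1.0] * len(speech)
    print("input SNR (dB):", apparent_snr_db(speech, noise, unity))
    print("apparent SNR (dB):", apparent_snr_db(speech, noise, agc_gain(mixture)))
```

In this toy example the gain recovers during the pauses, so the noise between speech bursts is amplified more than the speech itself and the apparent SNR falls below the input SNR.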

6.6 Across-Source Modulation Correlation

While an AGC improves speech intelligibility of cochlear implant recipients by adjusting the

signal level to be within a predetermined input dynamic range, it also can have several

effects on the envelope of speech signals and consequently degrade speech intelligibility,

especially in noise. The effects of fast-acting compression on the intelligibility of speech in

noise were analyzed by Stone and Moore (2003, 2004, 2007). Stone and Moore quantified

the effect of fast compression on the envelope of speech by three signal metrics: (i) Within-Signal

Modulation Coefficient (WSMC), (ii) Fidelity of Envelope Shape (FES), and (iii)

Across-Source Modulation Correlation (ASMC). The WSMC measure calculated the amount

of correlation between speech envelopes at different frequency channels after compression.

The hypothesis was that if the correlated fluctuation of envelopes at different frequency

channels was important for the speech intelligibility, the WSMC index would be able to

predict the subjects’ scores. The FES measure compared the envelope shape of the target speech

in each frequency channel before and after the compression. The hypothesis was that if the

preservation of the envelope shape was important for speech intelligibility, the FES index

would show a positive correlation with the subjects’ scores. Lastly, the ASMC measure

calculated the correlation between the target speech and interfering signals at the output of

the signal path. The gain produced by an AGC for a high level input signal, the mixture of

speech and noise, was applied to the speech and noise signals separately to calculate the

ASMC. The hypothesis was that cross-modulation between the two originally independent

signals could degrade the speech intelligibility.
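
The cross-modulation idea behind the ASMC can be sketched for a single channel. The version below takes the Pearson correlation of the log envelopes of speech and noise after a shared gain has been applied. The envelope extraction, the channel averaging and other details of Stone and Moore's procedure are not reproduced, and the function names are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def asmc(speech_env, noise_env, gains, floor=1e-6):
    """Correlation of log envelopes of speech and noise after a shared gain."""
    s = [math.log10(max(g * x, floor)) for g, x in zip(gains, speech_env)]
    n = [math.log10(max(g * x, floor)) for g, x in zip(gains, noise_env)]
    return pearson(s, n)


if __name__ == "__main__":
    speech_env = [1.0, 4.0, 1.0, 4.0]
    noise_env = [1.0, 1.0, 2.0, 2.0]          # uncorrelated with the speech
    gains = [1.0, 0.5, 1.0, 0.5]              # gain drops when the speech is loud
    print(asmc(speech_env, noise_env, [1.0] * 4))
    print(asmc(speech_env, noise_env, gains))
```

With unity gain the two envelopes stay uncorrelated. When the gain follows the speech level, the noise envelope is pushed down whenever the speech is loud, and the correlation becomes negative.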

Among the three metrics analyzed, the ASMC was shown to have the highest correlation

with the intelligibility scores. Therefore the ASMC measure will be used in §13.3 of this

thesis to predict speech intelligibility of cochlear implant recipients affected by different

AGC systems in noisy conditions.

6.7 Conclusion

Signal metrics are useful for performance analysis during the development and optimization

of new signal processing algorithms. A signal metric that reliably predicted the speech

intelligibility of cochlear implant recipients would accelerate development and optimization

of cochlear implant processing algorithms. This chapter examined standard metrics and also

non-standard signal metrics that have the potential to quantify the effects of AGC for

cochlear implant systems, and consequently predict speech intelligibility of cochlear implant

recipients. An indirect, but more important, goal is to find out the important features of

speech that an AGC or a signal processing algorithm should preserve to improve speech

intelligibility of cochlear implant recipients in adverse listening conditions. The apparent

SNR, the NCM and the ASMC measure are chosen in this thesis to quantify the performance

of different AGC configurations for the cochlear implant system. The prediction power of

each signal metric will be analyzed in chapter 13 based on the correlation between the

quantified indexes and the subjects’ scores from various clinical studies in this thesis.

P a g e | 89
Chapter 7 Test Methodology

7 Test Methodology

7.1 Introduction

With technology advances in implant design, sound processor hardware and accessories,

multi-microphone technology, signal processing and sound coding techniques, the majority

of cochlear implant recipients show high levels of speech understanding even in relatively

challenging listening conditions. Increases in speech performance demand the review of

speech recognition tests to avoid ceiling effects. Tests to evaluate the performance of cochlear

implant recipients range from psychoacoustic to music tests. Among them, speech

recognition tests are most commonly used because the system that is shown to provide the

highest speech intelligibility is considered to be the most beneficial system for the recipients.

Speech recognition tests are important to evaluate the effectiveness of a system or an

algorithm, to compare between two or more devices or to compare different parameter

settings in a single device. An ideal speech test should be reliable and sensitive to differences

between test and processing conditions. The results should be highly correlated with real

world speech perception (Mackersie 2002). A good test is often measured by test-retest

reliability. Test-retest reliability is indicated by the performance deviation for the same test

conducted at different times.

This chapter first describes test materials and methods used in research studies of AGC

systems in cochlear implants. It then devises a test methodology for clinical studies in this

thesis. It also explains the research platforms used to develop signal processing strategies

and elaborates the speech tests for evaluation of the existing and proposed AGC systems in

this thesis.

7.2 Test Materials

Planning a speech recognition test involves choosing the right test materials and method for

a system or an algorithm under test. Test materials range from components of speech (vowels

and consonants), spondees and words in a carrier phrase to sentences with and without context. A

proper selection of test material is critical for evaluating AGC systems (Stöbich, Zierhofer

and Hochmair 1999). Sentences are most appropriate to use in listening tests that evaluate

effects of compression on dynamic stimuli. For example, speech tests using isolated words

may not fully evaluate the benefits of an AGC system with time constants longer than one

second. The City University of New York (CUNY) sentence list, Bamford-Kowal-Bench

(BKB) sentence list (Bench, Kowal and Bamford 1979), Hearing In Noise Test (HINT)

sentences, IEEE sentences, AzBio sentences, Hochmair-Schulz-Moser (HSM) sentences

(Hochmair-Desoyer et al. 1997) and Oldenburg (OLSA) sentences are commonly used in

research studies for evaluation of speech understanding of hearing-impaired subjects.

Predictability of speech material varies between different test materials, for example, the

BKB sentence test and the Oldenburg sentence test. The BKB sentence lists contain simple

short sentences that are suitable for evaluating speech perception in children. Hence, the

sentences are relatively easy to predict. The Oldenburg test, on the other hand, consists of

sentences that have five words, and each sentence has the same syntactical form (Name verb

number adjective object). The predictability of the speech material is low. The predictability

of speech can affect the performance comparison. As context effects are prevalent in the

perception of speech, the effective number of statistically independent Bernoulli trials can be

smaller than the number of words (Brand and Kollmeier 2002). Thus, easy test material can

bias scores towards the higher end. With easy test materials, for example, short sentences

with context, such as the BKB sentence lists, the probability of correctly scoring the

remaining words increases if one of the first words has been recognized.

Gifford et al. (2008) assessed speech perception of 156 adult cochlear implant recipients and

50 hearing aid users on commonly used speech recognition materials: CNC words, and

sentence recognition in quiet with Hearing In Noise Test (HINT) and AzBio sentences and in

noise with BKB-SIN sentences. Their study shows that 28% of the subjects achieved 100%

correct scores and 71% of the subjects achieved more than 85% correct scores for HINT

sentences in quiet. The ceiling effect suggested that more difficult test materials were required.

CNC monosyllabic words and vowel-consonant-vowel (VCV) disyllables are commonly used in

speech recognition tests. Speech tests using single isolated words are often not efficient to

test certain aspects of an AGC system. For example, a monosyllabic word test such as

Consonant-Nucleus-Consonant (CNC) word test may not be efficient to evaluate AGC

systems with different time constants. Duration of test materials matters in this case.

Researchers often use carrier phrases before the keywords.

The Australian Sentence Test in Noise (AuSTIN) sentences (Dawson, Hersbach and Swanson

2013) were developed by the Cooperative Research Centre for Cochlear Implant and

Hearing Aid Innovation for Australian use (Cameron and Dillon 2007). They are short and

simple like BKB sentences. The sentences were recorded by a female native Australian

speaker. AuSTIN contains a total of 80 lists, each comprising 16 sentences.

7.3 Test Methods

While selection of appropriate test materials is important, test methods should also consider

including the dynamics of speech sounds that are commonly encountered. Test conditions

should reflect everyday listening conditions, i.e., speech presented in background noise or

competing voices. Sentence-in-noise tests are most commonly used for evaluating AGC

systems in cochlear implants. There are two basic approaches to conduct a speech-in-noise

test:

• Fixed method: both target speech and competing noise are presented at fixed

presentation levels. The SNR is fixed during the test and performance is usually

expressed as percent correct items, for example, words or sentences.

• Adaptive method: either speech or noise level is adjusted during the test. The SNR is

adaptively changed using an up/down rule to find the SNR that yields a specified

percent correct score.
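
The adaptive approach can be illustrated with a simulated listener. The sketch below uses a simple 1-up/1-down rule, which converges on the SNR giving 50% correct. The logistic psychometric function, its slope, the step size and the function name are assumptions chosen for illustration and do not describe the procedure used in the clinical studies of this thesis.

```python
import math
import random


def simulate_adaptive_srt(true_srt_db, slope=0.15, start_snr_db=10.0,
                          step_db=2.0, n_sentences=40, seed=0):
    """1-up/1-down staircase on a simulated listener.

    The listener is modelled by a logistic psychometric function with the
    given slope (proportion correct per dB at the midpoint); this model
    is an assumption. Returns the mean SNR over the second half of the
    track as the SRT estimate.
    """
    rng = random.Random(seed)
    snr = start_snr_db
    track = []
    for _ in range(n_sentences):
        p_correct = 1.0 / (1.0 + math.exp(-4.0 * slope * (snr - true_srt_db)))
        correct = rng.random() < p_correct
        track.append(snr)
        # Make the next sentence harder after a correct response, easier after an error.
        snr += -step_db if correct else step_db
    tail = track[n_sentences // 2:]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    print(simulate_adaptive_srt(true_srt_db=-5.0, n_sentences=40))
```

Other up/down rules target other points on the psychometric function; the 50% point is used here only because it needs the simplest rule.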

Table 7-1 lists test materials and methods used in research studies that evaluated AGCs of

cochlear implant systems. These studies used sentence tests to evaluate different aspects of

compression.

Study | Test Methods | Sentence Presentation Level | Test Material
------|--------------|-----------------------------|--------------
Whisper evaluation, McDermott et al. (2002) | Fixed presentation level test | 45, 55, 70 dBA in quiet | SIT sentences
 | Adaptive SRT test | Speech at 65 dBA | Speech-shaped noise
Evaluation of five AGC (slow and dual-loop) configurations, Stöbich et al. (1999) | Fixed presentation level test | 55, 70, 85 dB SPL at individually selected SNR | Göttingen sentences in Fastl-noise (amplitude-modulated speech-shaped noise)
ADRO evaluation, James et al. (2002) | Fixed presentation level test | 40, 50, 60 dB SPL in quiet | Closed-set spondees in a carrier phrase
 | | 60 and 70 dB SPL in quiet | Open-set CNC words in quiet
 | | 50, 60, 70 dB SPL in quiet | CUNY sentences in quiet
 | | 70 dB SPL at SNR of +15 and +10 dB | CUNY sentences in 8-talker babble noise
ADRO evaluation, Muller-Deile et al. (2008) | Fixed presentation level test | 50, 60, 70 dB SPL in quiet | Freiburger numbers test; Freiburger monosyllabic words test
 | Adaptive SRT test | Background noise was fixed at 65 dB SPL and speech was adaptively varied | Oldenburger sentence
ADRO evaluation, Dawson, Decker and Psarros (2004) | Fixed presentation level test | 50 dB SPL in quiet; speech at 65 dB SPL at individually selected SNR | BKB sentences in 8-talker babble noise
Fast AGC and dual-loop AGC evaluation, Boyle et al. (2009) | Fixed presentation level test | 65 dB SPL at individually selected SNR | HSM sentence test; ABC sentence test in unmodulated speech-shaped noise
 | Adaptive SRT test | Roving speech level, 65 dB SPL ± 10 dB |

Table 7-1 Test materials and methods used in AGC studies

7.3.1 Fixed Method

The fixed method presents both target speech and background noise or competing voices at

predetermined presentation levels. The SNR is fixed throughout the test. Speech
intelligibility is expressed as the average percentage of correct items, where an item may be
a morpheme, a target word or a whole sentence. In the studies listed in Table 7-1, the
presentation level of the sentences and the SNR were chosen by the researcher.

The fixed method is of only limited relevance to everyday listening scenarios. In
everyday speech communication, the short-term and long-term levels of the target speech, as
well as of other sounds in the environment, can vary over a range of a few tens of decibels.
Hence the fixed method may not fully evaluate all aspects of an algorithm. For example,
evaluating the effect of slow time constants requires the presentation level of the test stimuli
to be varied within the test. Another disadvantage of the fixed method is that the SNR needs
to be carefully selected to avoid floor and ceiling effects. The advantage is that it is simple
and relatively easy to arrange. It is useful for finding the performance-intensity function of a
processing condition.

7.3.2 Adaptive Method

An adaptive method that finds the speech reception threshold (SRT) is most commonly used
in speech recognition testing. The SRT can be defined as the signal-to-noise ratio (SNR) at
which a listener achieves a targeted performance; 50% correct is the most commonly used
target. The adaptive method has the advantages of greater flexibility and higher efficiency
over the fixed SNR method (Levitt 1978). With the adaptive method, performance evaluation
is not affected by floor or ceiling effects. Different aspects of compression can be evaluated
because the presentation level of either the speech or the background noise changes as the
SNR is adaptively varied during the test. An adaptive method determines the SNR of the next
stimulus based on the current stimulus and the response to it. Most tests adapt the SNR by
adjusting the background noise level with respect to a fixed speech presentation level. There
are also adaptive tests in which the background noise level is fixed and the presentation level
of each sentence is altered based on the subject’s response to the previous one (Muller-Deile
et al. 2008).
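
As an illustration of how such an up/down rule operates, the following minimal sketch adapts the SNR by changing the noise level against a fixed speech level. The simple 1-up/1-down rule, the 2 dB step, the starting SNR and all function names are assumptions made for illustration only; they are not the parameters of any of the tests cited above.

```python
def next_snr(snr_db, response_correct, step_db=2.0):
    """1-up/1-down rule (assumed): lower the SNR after a correct response,
    raise it after an incorrect one, so the track hovers near 50% correct."""
    return snr_db - step_db if response_correct else snr_db + step_db


def noise_level_db(speech_level_db, snr_db):
    """Noise level that gives the requested SNR at a fixed speech level."""
    return speech_level_db - snr_db


def run_track(responses, start_snr_db=10.0, step_db=2.0):
    """Return the SNR used on each trial followed by the SNR for the next trial."""
    snrs = [start_snr_db]
    for correct in responses:
        snrs.append(next_snr(snrs[-1], correct, step_db))
    return snrs
```

For example, with speech fixed at 65 dB SPL, an SNR of +10 dB corresponds to a noise level of 55 dB SPL, and two correct responses followed by one error move the track from +10 dB to +6 dB and back to +8 dB.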

The disadvantage is that the test can be relatively complex and parameters need to be

selected carefully. There must be sufficient sentences for the final SNR values to converge

closely to the actual SRT. An SRT test with a standard deviation less than 1 dB is acceptable

for test-retest reliability. The scoring method can also have an impact on test results. For

instance, an adaptive procedure is less efficient with sentence scoring (Brand and Kollmeier

2002). Word scoring produces more test samples and is therefore more suitable to use in

adaptive tests. In addition, the starting SNR value, the up/down procedure, and the method
used to calculate the final SRT all have an impact on the SRT result (Dawson, Hersbach and
Swanson 2013).

Speech tests in the laboratory are often criticized for not being representative of real-life

listening conditions. A test should have more than one presentation level to evaluate the

adaptability of an AGC to different presentation levels. An SRT test with sentences roving

between more than one presentation level has been proposed and used by researchers to

evaluate the performance of gain algorithms and cochlear implant systems (Boyle et al.

2009).

The roving-level SRT test has been used to evaluate the performance of fast and dual-loop

AGC systems (Boyle et al. 2009) and different sound processors (Haumann, Lenarz and

Büchner 2010; Boyle et al. 2013) with cochlear implant subjects. The test originally used

male-voiced German HSM sentences, with speech-shaped noise presented 0.5 seconds before
and after each sentence. The presentation level for each sentence was randomly selected.
Haumann et al. (2010) tested two roving conditions: 65 dB SPL ±10 dB roving (i.e.
presentation levels roving between 55, 65 and 75 dB SPL) and 65 dB SPL ±15 dB roving (i.e.
presentation levels roving between 50, 65 and 80 dB SPL). The SNR was adapted as a single
track across all presentation levels within the test. There were altogether 30 sentences in one
test, with 10 sentences for each presentation level. The SRT was calculated as the mean of the
last ten SNR values. Because the presentation level was roved between low and high levels
within the test, the test was hypothesized to be more realistic and to better emulate listening
conditions outside the laboratory. In addition, the adaptive roving-level test was more
sensitive to differences between processor designs that were not effectively revealed by the
fixed method.

Boyle et al. (2013) recently developed a new sentence test called Sentence Test with

Adaptive Randomized Roving levels (STARR) to evaluate the effectiveness of hearing

prostheses. The STARR used IEEE sentences spoken by male and female speakers. The SRT

was independently and adaptively calculated for each speaker in the STARR. In STARR, ten

sentences were presented at each of three presentation levels: 50, 65 and 80 dB SPL, with the

presentation level randomly selected. 15 sentences were spoken by the male speaker and 15

by the female speaker. No consecutive sentences were presented at the same level or by the

same speaker. The initial SNR was set to +20 dB. The level of a speech-shaped noise was
varied adaptively to track the SNR at which 50% correct was scored. The step size was
initially 10 dB; it was reduced to 5 dB after the first reversal and to 2.5 dB after the second
reversal. For each speaker, the SNRs of the last nine sentences and the SNR that would have
been applied to the next sentence were averaged to give the final SRT for that speaker.
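
The step-size schedule and the SRT averaging described above can be sketched for a single speaker's track as follows. This is a rough reading of the description, not the published STARR implementation: the function name, the binary correct/incorrect judgement per sentence, and the choice to apply the reduced step on the same trial at which the reversal occurs are all assumptions.

```python
def starr_track(responses, start_snr_db=20.0):
    """Sketch of one speaker's adaptive track.

    responses: list of bool, True if the sentence was judged correct
               (the scoring criterion is an assumption here).
    Returns (snrs, srt), where snrs[:-1] are the SNRs presented and
    snrs[-1] is the SNR that would be applied to the next sentence.
    """
    step_sizes_db = [10.0, 5.0, 2.5]  # initial, after 1st reversal, after 2nd
    reversals = 0
    prev_direction = None
    snrs = [start_snr_db]
    for correct in responses:
        direction = -1 if correct else +1  # correct -> lower SNR (harder)
        if prev_direction is not None and direction != prev_direction:
            reversals += 1
        step = step_sizes_db[min(reversals, 2)]
        snrs.append(snrs[-1] + direction * step)
        prev_direction = direction
    # Last nine presented SNRs plus the would-be next SNR (ten values);
    # with fewer than nine responses, all available values are averaged.
    last_values = snrs[-10:]
    srt = sum(last_values) / len(last_values)
    return snrs, srt
```

In the test as described, each speaker contributes 15 sentences, so `responses` would normally hold 15 entries.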

The study on STARR showed that the comparison between two test conditions could only be

meaningful if the SRT difference was greater than 2.2 dB for a normal hearing subject using

one test list per condition. The SRT variation of the cochlear implant participants in their

study was much higher than that of the normal hearing subjects. Both the normal hearing and

the cochlear implant subjects performed at their best at 65 dB SPL. However, only 40% of

the cochlear implant recipients achieved an SRT lower than 20 dB. A high SRT above 20 dB

indicates that the competing noise was not the main factor affecting the performance. A

significantly lower group mean score at low presentation level (50 dB SPL) than at mid and

high presentation levels (65 and 80 dB SPL) shows that lack of audibility was the main

factor affecting the speech intelligibility. Since the test employed a single adaptive track, the

final SRT was over-weighted by the low presentation level. These studies showed that the

roving-level SRT test needed to be improved.

7.4 Test Methodology for Clinical Studies in this Thesis

This thesis evaluated the existing AGC systems with fast and slow time constants and

proposed new gain or dynamic range optimization algorithms. The fixed test was used for

evaluating AGC systems with fast time constants in chapters 8 and 10. For evaluation of

AGC systems with fast and slow time constants and with more than one control loop, the

adaptive method was used in chapters 9 and 12. The fixed test was also used for evaluating

ALGF in chapter 12 to observe the benefits of employing a noise estimator in the algorithm.

7.4.1 Test Setup

Listening tests were carried out in a sound-treated room. A 1 kHz narrowband noise was
used to calibrate the sound pressure level at the listener’s position in the sound room. The
level of the narrowband noise at the test position was 65 dB SPL, within a tolerance of 1 dB,
when measured with a B&K 2250 sound level meter.

Two test setups were used. In the first setup (referred to as the loudspeaker setup), the
audio was presented from a single loudspeaker one metre in front of the subject. The sound
pressure level was restricted to 80 dB SPL to avoid loudspeaker distortion. To achieve
effective presentation levels above 80 dB SPL, the manual sensitivity control was increased to
provide additional gain.

The second setup (referred to as the direct connect setup) bypassed the loudspeaker and

microphones, and presented the audio signal directly to the ADC of the real-time processing

platform (§7.4.5.2). A pre-emphasis filter was used to match the frequency response of the

Standard directionality. The direct connect setup was calibrated with a 1 kHz sinusoid at the
compression threshold of the front-end compression limiter. This setup had two advantages:
the audio could be presented at high levels without distortion, and there was no possibility of
the recipient using any residual acoustic hearing in their contralateral ear. The drawback was
that the subjects could not hear their own voice.

All the experiments in this thesis used a single sound source such that both speech and noise

sounds were presented from the loudspeaker in front of the recipient. Therefore the

directional microphone techniques would have little effect on the performance of the

subjects. The Standard microphone directionality (Hersbach et al. 2012) was used in all

experiments.

7.4.2 Test Materials

The sentence materials of AuSTIN (Dawson, Hersbach and Swanson 2013) were used in the

listening experiments of this thesis. Each sentence contains four to six words or six to eight

syllables. Four-talker babble noise was used in the fixed test and LTASS noise was used in

the roving-level SRT test. Each sentence was time-aligned with a particular segment of four-

talker babble.

The morphemic scoring method was used in all tests except in the adaptive roving-level test

in §9.2, which used word scoring. The morphemic scoring method gives scores on the

number of morphemes correctly repeated from each sentence. For example, the sentence

“She is do/ing her home/work” contains seven morphemes.
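
A minimal sketch of morphemic scoring is given below, using the "/" notation of the example sentence to mark morpheme boundaries inside a word. The function names and the simple matching of repeated morphemes against the target (ignoring order and case) are assumptions for illustration; in the listening tests, scoring was carried out on the subject's spoken response.

```python
import re


def split_morphemes(marked_sentence):
    """Split a sentence into morphemes at whitespace and '/' boundaries."""
    return [m for m in re.split(r"[\s/]+", marked_sentence.strip()) if m]


def morpheme_score(marked_sentence, repeated_morphemes):
    """Percentage of target morphemes found in the response.

    Each repeated morpheme can be matched only once (assumed rule).
    """
    target = [m.lower() for m in split_morphemes(marked_sentence)]
    heard = [m.lower() for m in repeated_morphemes]
    correct = 0
    for morpheme in target:
        if morpheme in heard:
            heard.remove(morpheme)
            correct += 1
    return 100.0 * correct / len(target)
```

For the example sentence, `split_morphemes("She is do/ing her home/work")` yields seven morphemes, so a response that contains five of them scores 5/7, or about 71%.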

7.4.3 Fixed Level Test

The presentation level of the sentences and the input SNR were fixed for each listening
condition in the fixed test. Four-talker babble noise was used in the speech-in-noise tests using
the fixed method. Before each test, the RMS level of the noise was adjusted according to the
sentence presentation level and the SNR to be tested. The background noise started one
second before each sentence and continued until one second after it.
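
The RMS adjustment of the noise for a given SNR can be sketched as below. This is a generic illustration operating on plain lists of samples; the function names are hypothetical and the sketch is not the calibration code used in the experiments.

```python
import math


def rms(samples):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def scale_noise_for_snr(speech, noise, snr_db):
    """Scale the noise so that 20*log10(rms(speech) / rms(noise)) equals snr_db.

    Assumes the noise is not all zeros.
    """
    target_noise_rms = rms(speech) / (10.0 ** (snr_db / 20.0))
    gain = target_noise_rms / rms(noise)
    return [gain * x for x in noise]
```

For the two SNRs used in the fixed test, +10 dB corresponds to a noise RMS of about 0.32 times the speech RMS, and +20 dB to 0.1 times the speech RMS.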

7.4.4 Roving-level SRT Test

The adaptive SNR rule limits the maximum SNR to 30 dB. The reason for this limit is
that if a subject cannot score well at a very high SNR (30 dB in this case), then the level of
the background noise is unlikely to be the factor degrading performance. The performance is
more likely to be affected by factors related to the presentation level, such as audibility and
envelope distortion. Increasing the SNR further is then unlikely to improve performance or
help the track converge to a final SRT.

7.4.4.1 Roving-level SRT Test (Single Adaptive Track)

In this test, the presentation level of sentences is randomly roved between 50, 65 and 80 dB

SPL during the test. The adaptive rule is the same as the one used in the study of Haumann et al.

(2010) and described above in §7.3.2. The test was used in the clinical study that evaluated

the performance of the existing AGC systems of the Nucleus CP810 system (§9.2.1).

7.4.4.2 Interleaved Roving-level SRT Test

Cochlear Limited has its own adaptive SRT test, known as the Australian Sentence Test In
Noise (AuSTIN), for clinical studies (Dawson, Hersbach and Swanson 2013). With AuSTIN,
a standard deviation of 1 dB can be achieved with 20 sentences when a psychometric fitting
rule is applied. AuSTIN can be configured to present either the sentences or the noise at a
single level or at roving levels during the test. The main difference between the AuSTIN
roving-level test and the roving-level test described in §7.3.2 is that AuSTIN allows multiple
adaptive tracks, i.e., each presentation level has its own track. Hence it is also called the
interleaved roving-level SRT test.

The interleaved roving-level SRT test also selects the sentence presentation level randomly

from the list. The SNR for each presentation level is calculated independently. It thus allows

the SRT to be measured separately for each presentation level by applying a psychometric fit

rule. A total of 48 sentences are used, with 16 sentences for each presentation level. The
number of correct items (words or morphemes) in a sentence is counted before the direction
of the SNR change for the next sentence is decided. The AuSTIN adaptive rule is similar to
the one used in HINT (Nilsson, Soli and Sullivan 1994). The step size is 4 dB for the first four
sentences and 2 dB for the remaining sentences. The background noise can be presented
continuously, or it can be played from a certain time before until a certain time after each
sentence. When the background noise is presented continuously, a beep is presented before
each sentence to alert the subject.
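
The interleaving of independent adaptive tracks can be sketched as follows. This is not the AuSTIN implementation: the class and function names, the starting SNR, the rule of lowering the SNR when at least half of the items in a sentence are correct, and the application of the 4 dB step to the first four sentences of each track (rather than of the whole test) are assumptions. The 30 dB ceiling follows §7.4.4, and the psychometric fit that produces the final SRT is not shown.

```python
import random

MAX_SNR_DB = 30.0  # ceiling on the adaptive SNR (see §7.4.4)


class Track:
    """One adaptive SNR track for a single presentation level."""

    def __init__(self, level_db, start_snr_db):
        self.level_db = level_db
        self.snr_db = start_snr_db
        self.history = []  # (snr_db, proportion_correct) per presented sentence

    def update(self, proportion_correct):
        self.history.append((self.snr_db, proportion_correct))
        # 4 dB steps for the first four sentences of this track, then 2 dB.
        step = 4.0 if len(self.history) <= 4 else 2.0
        if proportion_correct >= 0.5:      # assumed decision criterion
            self.snr_db -= step
        else:
            self.snr_db = min(self.snr_db + step, MAX_SNR_DB)


def run_interleaved(score_sentence, levels_db=(50, 65, 80),
                    sentences_per_level=16, start_snr_db=10.0, seed=0):
    """Present sentences in random level order; each level keeps its own track.

    score_sentence(level_db, snr_db) must return the proportion of items
    (words or morphemes) repeated correctly for that sentence.
    """
    rng = random.Random(seed)
    tracks = {lvl: Track(lvl, start_snr_db) for lvl in levels_db}
    order = [lvl for lvl in levels_db for _ in range(sentences_per_level)]
    rng.shuffle(order)
    for lvl in order:
        track = tracks[lvl]
        track.update(score_sentence(lvl, track.snr_db))
    return tracks
```

Because each presentation level has its own track, poor performance at one level (for example, low audibility at 50 dB SPL) raises only that track's SNR and does not bias the SNR used at the other levels.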

Table 7-2 lists the speech tests, test materials and methods used in the clinical
studies of this thesis.

| Study | Test Methods | Sentence Presentation Level | Test Material |
|---|---|---|---|
| Effects of no AGC and the front-end compression limiter on speech intelligibility (§8) | Fixed presentation level test | 55-89 dB SPL, SNR of +10 and +20 dB | AuSTIN sentences in four-talker babble noise |
| Evaluation of the existing AGC systems (§9) | Roving-level SRT test (single adaptive track); interleaved roving-level SRT test (multiple adaptive tracks) | Roving speech level, 65 dB SPL ± 15 dB | AuSTIN sentences in speech-shaped noise |
| Effects of envelope profile limiter on speech intelligibility (§10) | Fixed presentation level test | 89 dB SPL in quiet and at SNR 10 dB; 55-89 dB SPL, SNR of +10 and +20 dB | AuSTIN sentences in four-talker babble noise |
| Evaluation of Adaptive Loudness Growth Function (§12) | Fixed presentation level | 50 and 80 dB SPL at individually selected SNR | AuSTIN sentences in four-talker babble |
| | Interleaved roving-level SRT test (multiple adaptive tracks) | Roving speech level, 65 dB SPL ± 15 dB | AuSTIN sentences in speech-shaped noise |

Table 7-2 Test materials and methods used in the clinical studies of this thesis

7.4.5 Research Platforms

The signal processing algorithms in this thesis were implemented in either the Nucleus

CP810 behind-the-ear (BTE) sound processor or a real-time PC-based research platform

called the Nucleus-xPC system. The Nucleus-xPC system has the advantage of quick

prototyping and evaluation in the laboratory before an algorithm can be deployed on the

BTE processor. The advantage of implementing algorithms on the BTE sound processor is

the flexibility to evaluate them in different listening conditions. Most of the clinical studies

conducted in the laboratory used the Nucleus-xPC system. The Nucleus CP810 sound

processor was used in the clinical study for evaluation of the existing AGC systems (§9.2)

and in the take-home experiment that assessed the quality of the proposed envelope profile
limiter (§10.2.2). Each system is described in the following subsections.

7.4.5.1 Nucleus CP810 Sound Processor

The Nucleus CP810 sound processor is shown in Figure 3-2. The Nucleus CP810 sound

processor was launched with the Nucleus 5 cochlear implant system in 2009. The sound
processor uses the same customized DSP chip as the Freedom processor (Swanson et al. 2007;
Bondarew and Seligman 2012). The chip is a 0.18 µm CMOS ASIC. The processing core
consists of an 8051 micro-controller and four identical custom DSPs (Patrick, Busby and
Gibson 2006). The analog domain contains a low-power oscillator, three 16-bit sigma-delta
Analog-to-Digital Converters (ADCs), a class-D output and a DC/DC converter. The top-level
architecture is shown in Figure 7-1.

Figure 7-1 Top-level architecture of Champ (Swanson et al. 2007)

The DSPs are utilised in a serial fashion, such that each DSP performs a different set of

functions in the signal path. The DSP firmware runs on four DSPs, and is responsible for

processing audio samples taken from two omni-directional microphones. The DSPs send the

resulting stimulation commands to the stimulus controller which encodes them into an RF

signal for the implant. An acoustic output, available via an optional earphone accessory, is

used for monitoring the signal as it passes through the DSP firmware.

Each of the four DSPs has 1024 words of instruction, X, Y and Z data memory, and runs at
5,011,000 cycles per second. Three DSPs are used in the existing design, leaving one for

future expansion. The multi-microphone directional and beamforming algorithms,

microphone calibration filters, input select and mixing and frequency analysis (FFT) are

implemented on the first DSP (DSP0). In the Nucleus CP810 sound processor, a number of

alternative audio inputs, such as telecoil, Lapel Microphone, TV/HI-FI cable, and the Euro-

Adaptor, are available in addition to the omni-directional microphones. The alternative audio

inputs can be mixed with the microphone according to a user-defined mixing ratio.

Combine-Into-Channel (CIC), clinical gains and AGCs, the UGM and ADRO, are

implemented on the second DSP (DSP1). Maxima selection, LGF, mapping and Data

Encoder Formatter (DEF) are implemented on the third DSP (DSP2).

7.4.5.2 Real-Time Nucleus-xPC System

The real-time Nucleus-xPC system is a laboratory-based research platform developed at

Cochlear Ltd (Goorevich 2005). The xPC system assists research and development of

cochlear implant sound processing by allowing quick prototyping and real-time evaluation of

signal processing algorithms with cochlear implant recipients.

Figure 7-2 shows the hardware components of the Nucleus-xPC system for the Nucleus signal
path. Similar to a sound processor, the Nucleus-xPC system consists of audio input hardware,
a processing unit and a stimulus generator. The processing unit consists of two computers: a
host computer and an x86-based real-time target computer. DSP algorithms for cochlear
implant sound processing are implemented in the Nucleus-xPC system using Simulink from
MathWorks. The host computer builds Simulink models and downloads them to the target
computer. The target computer executes them in real time. The audio inputs to the
Nucleus-xPC system are captured by two omni-microphones mounted in the behind-the-ear
(BTE) housing of the CP810 sound processor. The

electrical signals at the output of the two omni-microphones are amplified by a preamplifier.

The audio interface to the target computer is accomplished by a high performance audio

board such as Bittware PMC+ and General Standards as shown in Figure 7-2. The custom-

made stimulus generator (StimGen) drives the RF coil for the implant.

Figure 7-2 Components of the real-time Nucleus-xPC system (Goorevich 2005)

The Simulink blocks and models for Nucleus sound processing can be found in a Simulink

library called Nucleus MATLAB Blockset (NMB). NMB contains the commercially

available algorithms such as SPEAK, ACE, CIS, Whisper, ADRO and Beam for the Nucleus

systems. NMB can also be used for off-line simulation and experimentation using Simulink

alone. Figure 7-3 shows an example of ACE sound coding strategy with the two AGCs under

study.

Figure 7-3 ACE sound coding strategy with the standard front-end AGC (blue block)
and the proposed AGC (green block)

7.5 Conclusion

This chapter reviewed the test methodology (materials and methods) of speech tests used to
evaluate the effectiveness of signal processing algorithms, AGC systems in particular, in
hearing prostheses. Speech tests using sentences are more appropriate for evaluating
compression circuits than tests using isolated words. Compared to the fixed method, the
adaptive method has the advantages of greater flexibility and higher efficiency. The
roving-level SRT test is considered to represent realistic listening conditions outside the
laboratory. A roving-level adaptive SRT test can have a single adaptive track or multiple
adaptive tracks. The clinical studies in this thesis used both the fixed method and the adaptive
(SRT) method. The speech tests that evaluated the more sophisticated gain algorithms used
the roving-level SRT test. Two research platforms were used in this thesis to implement gain
algorithms in the Nucleus signal path.

Implementation on the Nucleus CP810 sound processor provides the flexibility to evaluate

the algorithms in real world listening conditions. On the other hand, implementation on the

Nucleus-xPC system has the advantage of quick prototyping and evaluation in the

laboratory.

Chapter 8 Investigating Effects of No AGC and Fast AGC
on Cochlear Implant Speech Intelligibility

8 Investigating Effects of No AGC and Fast AGC on

Cochlear Implant Speech Intelligibility

8.1 Introduction

With no AGC in the signal path, the input dynamic range limitation at the LGF (§4.3.7)

would clip the envelopes for speech presented above C-SPL. It could be considered as the

worst case processing for speech intelligibility of cochlear implant recipients at high

presentation levels. Although there are many studies on the effects of different AGC systems
on the speech performance of cochlear implant recipients (§5.4), no study was found that
specifically examined the effects of a cochlear implant signal path with no AGC on speech
intelligibility at various presentation levels.

The study of Zeng et al. (2002) showed that the speech understanding of cochlear implant
subjects was degraded in quiet when the input dynamic range was set below 40 dB. Zeng and
Galvin (1999) showed that the resolution of intensity steps was not very important for the
speech intelligibility of cochlear implant recipients: intelligibility was not significantly
affected even by the maximum reduction in the number of current levels in the electric
dynamic range, to just two levels. They concluded

that amplitude cues could be traded with frequency cues, at least for speech in quiet. An

inference made from these two studies is that a proper dynamic range setting before mapping

into electrical current levels is important for speech intelligibility although the number of

current steps may not be so important. With no AGC, speech close to C-SPL was presented

in the upper part of the dynamic range of the LGF. Performance degradation was expected at

presentation levels further away from C-SPL. In other words, an AGC is necessary to extend

the dynamic range beyond the predetermined range between C-SPL and T-SPL. It should be

noted that limited current levels at the implant could degrade performance in noise because

stimulating target speech and noise at similar levels would make them less distinct from each

other and the segregation between them would be harder.

The front-end compression limiter (§5.5.1) is a default signal processing algorithm in the

everyday sound processing strategy of Nucleus cochlear implant systems. It is a fast AGC

with time constants less than 100 ms. The purpose of a compression limiter in a cochlear

implant system is to reduce envelope clipping at the LGF for high level input signals. It

therefore prevents potential loudness discomfort due to excessive stimulation at maximum

current levels (C-level). Studies showed that fast compression could degrade speech

intelligibility. However, the signal path with the front-end compression limiter would still be

better than the signal path with no AGC at all. It would be interesting to find out the

effectiveness of the front-end compression limiter over a wide range of presentation levels.

The objective of this study is to observe the effects of channel envelope clipping for speech
presented at different levels. Hence the Performance-Intensity (P-I) function of cochlear
implant recipients with no AGC will be measured with sentences presented from low to high
presentation levels. The P-I function of the recipients with the front-end compression limiter
will also be measured for the same input conditions to observe any performance
improvement. The P-I functions of both processing conditions will be measured at two SNRs
to study the confounding effects of additive noise on speech intelligibility.

8.2 Clinical Study

8.2.1 Subjects

Four cochlear implant recipients (S1, S2, S3 and S4) participated in this study. The subject
details can be found in Appendix 1. The subjects were unilaterally implanted with either the
Nucleus 24 or the Nucleus Freedom cochlear implant. All had more than two years’ experience
with their implant, and previous experience with speech tests in clinical evaluations.

8.2.2 Signal Processing

The Nucleus Freedom ACE signal path (§4.3) with no microphone directionality and no

other AGC system except the fast front-end AGC was used. The signal processing in the

ACE sound coding strategy and the AGC were implemented in Simulink for testing with the

Nucleus-xPC system (§7.4.5.2).

Figure 8-1 Signal path used in the experiment

The compression limiter evaluated in this study had the same parameter setting as that of the

Nucleus Freedom sound processor as shown in Table 8-1.

| Parameter | Value | Unit |
|---|---|---|
| Attack time | 5 | ms |
| Release time | 75 | ms |
| Hold time | 0 | ms |
| Compression threshold | 73 | dB SPL |
| Compression ratio | Inf | |
| Post AGC gain | 0 | dB |

Table 8-1 Parameter setting of the front-end compression limiter

It had unity gain up to the compression threshold, and infinite compression beyond. The
compression threshold was 73 dB SPL, calibrated with a 1 kHz sine tone. Since the crest
factor of speech is about 8 dB higher than that of a sinusoid, the peaks of speech presented at
65 dB SPL started to hit the compression threshold. The attack time was 5 ms. For a sudden
increase in the input signal level, overshoots can occur during the attack time. Therefore the
front-end compression limiter also has an overshoot limiter, which limits the maximum
overshoot to 3 dB above the compression threshold.
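
The static behaviour of the limiter and the level arithmetic above can be sketched as follows. Only the steady-state input/output curve is modelled; the attack and release dynamics and the overshoot limiter are not. The function names are hypothetical, and treating the 8 dB crest-factor difference as a simple offset added to the presentation level is a simplifying assumption.

```python
THRESHOLD_DB_SPL = 73.0       # compression threshold, sine-calibrated (Table 8-1)
SPEECH_EXTRA_CREST_DB = 8.0   # speech crest factor relative to a sinusoid


def limiter_output_level_db(input_level_db, threshold_db=THRESHOLD_DB_SPL):
    """Steady-state curve: unity gain below threshold, infinite compression above."""
    return min(input_level_db, threshold_db)


def limiter_gain_db(input_level_db, threshold_db=THRESHOLD_DB_SPL):
    """Steady-state gain applied by the limiter (0 dB or negative)."""
    return limiter_output_level_db(input_level_db, threshold_db) - input_level_db


def speech_peaks_reach_threshold(speech_level_db,
                                 threshold_db=THRESHOLD_DB_SPL,
                                 extra_crest_db=SPEECH_EXTRA_CREST_DB):
    """True if the speech peaks, in sine-equivalent terms, reach the threshold."""
    return speech_level_db + extra_crest_db >= threshold_db
```

With these values, speech at 65 dB SPL has peaks equivalent to 65 + 8 = 73 dB SPL and just reaches the threshold, whereas speech at 60 dB SPL does not.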

8.2.3 Test Setup

The fixed method as described in §7.3.1 was used to evaluate the processing conditions. The

sentences were presented in four-talker babble noise. Eight or more sentences were tested for

each presentation level.

The presentation level was varied from 55 to 89 dB SPL at two SNR conditions: 10 and 20
dB. All subjects except S1 were tested using the loudspeaker setup described in §7.4.1. S1
was tested with the direct connect setup for sentences presented above 75 dB SPL because
S1 had contralateral hearing that could have affected the evaluation of the test conditions at
high presentation levels.

8.2.4 Results

Each panel in Figure 8-2 shows the percent correct morpheme scores of one cochlear
implant subject for each processing condition at each SNR. Figure 8-3 shows the group mean
scores of all subjects.

[Figure 8-2: four panels, one per subject (S1, S2, S3, S4); x-axis: Presentation level (dB SPL), 55 to 89; y-axis: Percent correct (%), 0 to 100]

Figure 8-2 Percent correct scores of four cochlear implant subjects with no AGC and
with FEL75 (legends of the curves are as described in Figure 8-3)

[Figure 8-3: group mean panel; x-axis: Presentation Level (dB SPL), 55 to 89; y-axis: Percent correct (%), 0 to 100; legend: SNR10 No AGC, SNR20 No AGC, SNR10 FEL75, SNR20 FEL75]

Figure 8-3 Group mean scores of four cochlear implant recipients with no AGC and
with FEL75

| Subject | Δ scores (%), 20 dB SNR | p-value, 20 dB SNR | Δ scores (%), 10 dB SNR | p-value, 10 dB SNR |
|---|---|---|---|---|
| S1 | 5 | 0.0625 | 5 | 0.0273* |
| S2 | 11 | <0.001** | 21 | <0.001** |
| S3 | 6 | 0.039* | 13 | <0.001** |
| S4 | 14 | <0.001** | 11 | <0.001** |
| Group | 9 | <0.001** | 11 | <0.001** |

Table 8-2 Statistical analysis of the score difference between the front-end compression
limiter and no AGC for the presentation levels above 70 dB SPL in two SNR
conditions. The asterisks indicate statistically significant difference in performance
between no AGC and FEL75 (* p < 0.05, ** p < 0.01).

8.2.4.1 With no AGC

At 20 dB SNR, the group mean scores were at least 80% for the presentation level up to 83

dB SPL. The performance degraded above 83 dB SPL. The scores at 83 and 86 dB SPL were

compared using the paired binomial test. The difference was highly significant for all

subjects. Subjects differed in their tolerance to envelope distortion with no AGC at high

presentation levels. For example, subject S2 could still score more than 60% at the highest

presentation level whereas the score of S3 dropped to 30% at that level. Performance

degradation at 55 dB SPL compared to 60 dB SPL could be due to low audibility at 55

dB SPL. A difference of more than 50 percentage points was observed between the highest
and the lowest group mean scores.

At 10 dB SNR, the group mean scores started to degrade from the presentation level of 70
dB SPL. Individual subjects showed different starting points of score degradation. For
example, the scores of S4 degraded from 65 dB SPL whereas the scores of S3 degraded from
75 dB SPL. When the scores at 70 and 75 dB SPL were compared, the difference was
highly significant for S2 and S4 and marginally significant for S3 (p < 0.05) according to the
paired binomial test. The scores of each subject reached their lowest at the highest three
presentation levels. The group mean scores were approximately 10% at 83 and 86 dB SPL
and almost 0% at the highest presentation level of 89 dB SPL. A performance difference of
more than 80 percentage points was observed between the highest and the lowest group mean
scores.

8.2.4.2 With Compression Limiter

The compression limiter started to show effects on mean scores for the presentation levels

above 65 dB SPL because the peaks of speech waveform at 65 dB SPL started to hit the

compression threshold.

At 20 dB SNR, the group mean scores with the compression limiter were more than 90% for

presentation levels between 60 and 80 dB SPL. The group mean scores started to degrade

above 80 dB SPL. The group mean scores with the compression limiter were higher than

in the no AGC condition above 70 dB SPL, except at 83 dB SPL. The paired binomial test

between the two processing conditions on the pooled scores for presentation levels above 70

dB SPL showed that the performance improvement with the compression limiter was highly

significant for subjects S2, S3 and S4 and for the group (Table 8-2). A performance

difference of approximately 40 percentage points was observed between the highest and the

lowest group mean scores.

At 10 dB SNR, the improvement of the group mean scores with the compression limiter was

approximately 10 percentage points for the presentation levels above 70 dB SPL. The paired

binomial test on the pooled scores between the two processing conditions for presentation

levels above 70 dB SPL showed that the performance improvement with the compression

limiter was highly significant for each subject and the group (Table 8-2). The P-I functions

of the subjects with and without the compression limiter show that the rates of score

degradation were similar. The difference was that the P-I function of the processing with no

AGC was offset by approximately -10 percentage points. It appears that the input dynamic

range was expanded approximately 5 dB by the AGC between 65 and 80 dB SPL. For

example, the group mean score with the compression limiter at 70 dB SPL was comparable

to that with no AGC at 65 dB SPL. This trend continued to 80 dB SPL. The group mean

scores of the subjects with the compression limiter were approximately 20% at 83 and 86

dB SPL and 10% at 89 dB SPL. The score difference between the highest and the lowest

scores was approximately 80 percentage points.

8.3 Discussions

The subjects showed performance degradation with increasing presentation level for the

processing with no AGC before the LGF in the signal path. The score degradation at high

presentation levels was due to the distortion of envelopes above the saturation level for

speech presented above C-SPL. The effect of clipping can be divided into two categories:

waveform distortion and SNR reduction. Waveform distortion includes temporal envelope

modulation depth reduction and spectral envelope shape distortion. Spectral properties of

speech that carry intelligibility such as formants can be degraded. Waveform distortion

occurred at high presentation levels regardless of background noise. The output SNR was

reduced when the envelopes of high-level speech components were clipped more than the

background noise.

The comparison of the P-I functions at two SNR conditions showed that the starting point of

the score degradation and the rate of score degradation were different for different

background noise levels. It appeared that the subjects were more tolerant of waveform

distortion than noise. For example, the background noise levels of speech presented at 70 dB

SPL in noise at 10 dB SNR and speech presented at 80 dB SPL in noise at 20 dB SNR were

approximately 60 dB SPL. Even though waveform distortion occurred more for speech at 80

dB SPL than at 70 dB SPL, the group mean scores with no AGC at those two test conditions

were approximately the same. This agrees with research by others showing that noise is more

detrimental than amplitude distortion to speech intelligibility (Zeng and Galvin III 1999).
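The equal-noise comparison above follows from the definition of SNR in decibels: the noise level is the speech level minus the SNR. The following minimal Python sketch checks the two conditions quoted in the text. The function name is illustrative and is not part of the study software.

```python
def noise_level_db_spl(speech_level_db_spl: float, snr_db: float) -> float:
    """Noise level (dB SPL) implied by a speech level (dB SPL) and an SNR (dB)."""
    return speech_level_db_spl - snr_db


# The two test conditions compared in the text: (speech level, SNR).
conditions = [(70.0, 10.0), (80.0, 20.0)]

for speech_level, snr in conditions:
    noise = noise_level_db_spl(speech_level, snr)
    print(f"speech {speech_level:.0f} dB SPL at {snr:.0f} dB SNR -> noise {noise:.0f} dB SPL")
```

Both conditions give the same noise level, so any difference between them reflects the 10 dB difference in speech level rather than the amount of noise.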

The mean scores began to degrade when the level of the background noise was about

60 dB SPL at both SNR conditions. The output SNR at this point could be well below the

input SNR due to clipping. The higher the presentation level was increased, the lower the

output SNR could become in the processing with no AGC. Therefore, the output SNR

degradation at high presentation levels was identified as the main cause of performance

degradation. A similar observation was made for the speech understanding of normal-hearing

subjects (Dubno, Horwitz and Ahlstrom 2005) and hearing impaired listeners (Studebaker et

al. 1999) in high presentation levels. Each study reported that the relative increase in

masking level at high presentation levels caused the drop in effective SNR level and that

affected the performance. The degree of performance degradation is far worse for cochlear

implant recipients because they rely mainly on the envelope cues of target speech which is

easily corrupted by noise.

The front-end compression limiter was employed in the cochlear implant systems to reduce

envelope clipping at the LGF. The score improvement with the compression limiter was

most significant at the two highest presentation levels at 20 dB SNR. The channel

envelopes could be clipped extensively at those levels. Employing the compression limiter

certainly helped to improve scores, as it reduced the envelope distortion relative to

the no AGC condition. At 10 dB SNR, the group mean score improvement with

the compression limiter was approximately 11 percentage points for presentation levels

above 70 dB SPL. However, the P-I function of the compression limiter showed a similar

trend of score degradation to that with no AGC above 65 dB SPL at 10 dB SNR. The

positive effect of the compression limiter on the reduction of envelope clipping might get

offset by the negative effects of fast compression on the envelopes of the input stimuli at

high presentation levels when the background noise level was also high.

According to the studies of Stone and Moore, the cross-modulation between target and

competing speech signals introduced by fast compression made the segregation between

them difficult (Stone and Moore 2003, 2004, 2007). In addition to the cross-modulation

effect, the output SNR reduction caused by fast compression could also degrade speech

intelligibility in noise (Rhebergen, Versfeld and Dreschler 2008b). It would be interesting to

find out which component contributed more to the speech intelligibility degradation. The

quantification of the effects of the front-end compression limiter on the envelope of the input

stimuli will be studied in chapter 13.

The front-end compression limiter expanded the input dynamic range by approximately 5 dB.

Speech intelligibility improvement was relatively moderate. That indicates the necessity of

additional compression stages before the compression limiter to expand the dynamic range

even further.

8.4 Conclusions

This study investigated the effects of the input dynamic range limitation by the instantaneous

compression at the LGF on the speech intelligibility of cochlear implant recipients. The P-I

functions with no AGC and with the front-end compression limiter were measured for

sentences presented at different presentation levels in two SNR conditions. With no AGC

before the LGF, performance degraded at higher-than-normal presentation levels. Speech

was still intelligible even when the sentence presentation level was very high in the high SNR

condition. In the low SNR condition, envelope clipping had a high impact on performance, not

only due to waveform distortion, but also due to the output SNR degradation. The effect of

background noise level became significant when the noise level was more than 60 dB SPL.

At this level, the background noise produced stimulation at C-level. Although the

compression limiter reduced the envelope distortion at the LGF for the presentation levels

above 65 dB SPL, the score improvement was moderate. Fast compression itself could also

introduce envelope distortion. This study implies that if slow-acting compression were

available to adjust the overall presentation level, then short intervals of clipping due to

louder transients would not be very objectionable, and there would be little need for fast

AGC. More research will be done in the following chapters on the performance of AGC with different

properties.

9 Investigating Effects of Slow AGC and Fast AGC on Cochlear Implant Speech Intelligibility

9.1 Introduction

The previous study (Chapter 8) questioned the effectiveness of the front-end compression

limiter to reduce envelope distortion at high presentation levels. It suggested that the use of

slow AGC in the signal path would improve the speech intelligibility of cochlear implant

recipients.

The existing AGC systems of the Nucleus sound processors (ASC, Whisper, the front-end

compression limiter, UGM and ADRO) were described in chapter 5 (§5.5). Chapter 5 also

reviewed the studies on the performance of ASC, Whisper and ADRO in the existing AGC

systems. Although there are a few studies on the performance of the dual-loop AGC from the

Cambridge group, no study has been published on the performance of the tri-loop AGC

(UGM) of the Nucleus CP810 sound processor.

Haumann et al. (2010) conducted a performance study on different cochlear implant systems

using the roving-level SRT test with a single adaptive track. The results showed that

recipients using the Nucleus Freedom performed worse than recipients using the Auria or

Harmony and Opus 2. Haumann et al. hypothesized that the poorer performance was due to

Freedom using only a single fast AGC (i.e., the front-end compression limiter), whilst

Harmony and Opus 2 used a dual-loop AGC system. If Haumann et al.’s hypothesis was

true, then the tri-loop AGC should provide significant benefits under the same test condition.

This chapter has two objectives: (i) to study the performance of the existing AGC

system with the tri-loop AGC and ADRO, and (ii) to establish a good test framework for

evaluating AGC systems in cochlear implant systems.

Two roving-level SRT tests were used: the roving-level SRT test with a single adaptive SRT

track (as in Haumann et al.’s study, §7.4.4.1) and the interleaved roving-level SRT test

(§7.4.4.2) to evaluate the two processing conditions. The SRT results measured by the two

roving-level SRT tests on the same processing condition will be compared to observe the

effectiveness of each test.

9.2 Clinical Study

9.2.1 Test Setup

The loudspeaker setup (§7.4.1) was used. The two roving-level SRT tests described in §7.4.4

were used to evaluate the two programs. The roving-level SRT test (single adaptive track) in

this study is very similar to the roving-level SRT test used in the study of Haumann et al. (2010)

except for the sentence material.

9.2.2 Subjects

Seven cochlear implant recipients participated in this study: S1, S2, S5, S6, S8, S9 and S10

(refer to Appendix 1 for recipient details). The study was conducted as a repeated-measures

single-subject design in which each subject served as their own control.

9.2.3 Signal Processing

Two AGC systems were evaluated in this study: (i) the front-end compression limiter (FEL)

and (ii) the tri-loop AGC and ADRO (Tri + ADRO). Both the tri-loop AGC and the front-

end compression limiter were configured by changing the parameter setting of the UGM.

The LGF dynamic range of the FEL program was a default 40 dB with C-SPL 65 dB. The

program setting of Tri + ADRO was available as a research option of the Nucleus CP810

sound processor at the time of testing. The C-SPL of the Tri + ADRO program was 75 dB

and therefore the dynamic range of the LGF was 50 dB.
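Both settings are consistent with a lower limit of the LGF input range of 25 dB SPL (65 - 40 and 75 - 50). That value is not stated in this section and is treated here as an assumption. Under it, the dynamic range follows directly from the C-SPL, as the short Python sketch below illustrates. All names are illustrative.

```python
# Assumed lower limit of the LGF input range (dB SPL). It is inferred from
# the FEL setting (C-SPL 65 dB, dynamic range 40 dB), not quoted from the text.
ASSUMED_T_SPL_DB = 65.0 - 40.0


def lgf_dynamic_range_db(c_spl_db: float, t_spl_db: float = ASSUMED_T_SPL_DB) -> float:
    """Width in dB of the input range between the lower limit and C-SPL."""
    return c_spl_db - t_spl_db


for program, c_spl in [("FEL", 65.0), ("Tri + ADRO", 75.0)]:
    dynamic_range = lgf_dynamic_range_db(c_spl)
    print(f"{program}: C-SPL {c_spl:.0f} dB -> dynamic range {dynamic_range:.0f} dB")
```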

The C-levels of some recipients with the Tri + ADRO program were increased by approximately

10% to obtain loudness equal to that of the FEL program. In addition to the increase in C-levels,

the Q value of the LGF function was reduced from 20 to 16 in the Tri + ADRO

program. The parameter settings of FEL and Tri + ADRO can be found in the first and second

columns of Table 5-2. The other parameter settings of each program are shown in Table 9-1. Both

programs were loaded into a CP810 sound processor. The program order was

counterbalanced between subjects.

Processing     FEL    Tri + ADRO 75
Sensitivity    12     12
C-SPL          65     75
LGF Q Value    20     16

Table 9-1 Parameter setting of each program

9.2.4 Results

9.2.4.1 Roving-level SRT Test (Single Adaptive Track)

The SRT of each subject except S9 was the mean of test and retest SRTs. S9 was not retested

due to limited time available for the study. Figure 9-1 shows the individual and group mean

SRT results of the subjects with each program. All subjects except S6 showed SRT

improvement with the Tri + ADRO program. Subject S10 showed an exceptional SRT

improvement of 12 dB with the Tri + ADRO program. The SRTs of S8 and S10 with the FEL

program (approximately 16 dB) were noticeably higher (worse) than those of the other subjects. The

group SRT performance with the Tri + ADRO program was better than with the FEL

program. According to a t-test, the improvement was statistically significant (p = 0.0159). The overall

SRT improvement with the Tri + ADRO was approximately 5 dB.

[Figure 9-1 plot: "Overall SRT (50, 65, 80 dB)". Bars for FEL and Tri+ADRO per subject (S1, S2, S5, S6, S8, S9, S10, Mean); vertical axis SRT (dB); group difference p = 0.0159*.]

Figure 9-1 SRT of seven cochlear implant subjects measured by the roving-level SRT
test with a single adaptive track. The error bars indicate one standard error. The
asterisks indicate statistically significant difference in performance between the two
AGC systems (* p < 0.05, ** p < 0.01).

9.2.4.2 Interleaved Roving-level SRT Test

The interleaved roving-level SRT test calculated one SRT for each presentation level.

9.2.4.2.1 At 80 dB SPL

The individual and group mean SRT results of the subjects at 80 dB SPL are shown in Figure

9-2. Six out of seven subjects showed SRT improvement with Tri + ADRO. The mean

benefit of Tri + ADRO was approximately 2.5 dB. According to the analysis with a t-test, the

improvement was statistically significant (p < 0.05*).

[Figure 9-2 plot: "Interleaved Roving-level SRT Test: 80 dB SPL". Bars for FEL and Tri+ADRO per subject (S1, S2, S5, S6, S8, S9, S10, Mean); vertical axis SRT (dB); group difference p = 0.0388*.]

Figure 9-2 SRT of seven cochlear implant subjects at 80 dB SPL measured by the
interleaved roving-level SRT test. The error bars indicate one standard error. The
asterisks indicate statistically significant difference in performance between the two
AGC systems (* p < 0.05, ** p < 0.01).

9.2.4.2.2 At 65 dB SPL

The individual and group mean SRTs of the subjects at 65 dB SPL are shown in Figure 9-3.

Four subjects performed better with Tri + ADRO and three performed better with FEL.

Because of the large SRT variability amongst subjects, there was no significant difference

between the two processing conditions (p = 0.2300). Subject S9 showed a large SRT improvement of

approximately 12 dB with the Tri + ADRO at 65 dB SPL although the performance at 80 dB

SPL was quite the opposite. The overall improvement with the Tri + ADRO program was

approximately 1.5 dB.

[Figure 9-3 plot: "Interleaved Roving-level SRT Test: 65 dB SPL". Bars for FEL and Tri+ADRO per subject (S1, S2, S5, S6, S8, S9, S10, Mean); vertical axis SRT (dB); group difference p = 0.2300.]

Figure 9-3 SRT of seven cochlear implant subjects at 65 dB SPL measured by the
interleaved roving-level SRT test. The error bars indicate one standard error. The
asterisks indicate statistically significant difference in performance between the two
AGC systems (* p < 0.05, ** p < 0.01).

9.2.4.2.3 At 50 dB SPL

An SRT could not be obtained for the 50 dB SPL presentation level in the interleaved SRT

test because the scores were below 50% even at the maximum SNR of 30 dB. Figure 9-4

shows an example of bad SRT performance at 50 dB SPL. The adaptive track of the

interleaved SRT test (left panel) shows that the subject could not get half of the words in the

first sentence presented at 20 dB SNR. The SNR was stepped up by 5 dB, but the subject still

could not score 50%. The SNR was stepped up again and reached the maximum of 30 dB.

It clearly showed that the background noise was not the main factor in this test case.
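The saturation described above can be reproduced with a simple up-down rule. The Python sketch below is only an illustration, not the adaptive procedure of §7.4.4. It assumes a fixed 5 dB step and a lower SNR limit of -10 dB (taken from the axis range of Figure 9-4), and the word scores are hypothetical.

```python
def next_snr_db(
    snr_db: float,
    proportion_correct: float,
    step_db: float = 5.0,      # assumed fixed step size
    min_snr_db: float = -10.0,  # assumed lower limit
    max_snr_db: float = 30.0,   # maximum SNR stated in the text
) -> float:
    """Raise the SNR after a score below 50%, lower it otherwise, then clamp."""
    if proportion_correct < 0.5:
        candidate = snr_db + step_db
    else:
        candidate = snr_db - step_db
    return max(min_snr_db, min(max_snr_db, candidate))


# Hypothetical scores of a listener limited by audibility: always below 50%.
scores = [0.2, 0.3, 0.1]

snr = 20.0
track = [snr]
for score in scores:
    snr = next_snr_db(snr, score)
    track.append(snr)

print(track)  # [20.0, 25.0, 30.0, 30.0]
```

Once the track sits at the 30 dB ceiling with scores still below 50%, no SRT can be read from it, which is the situation reported for 50 dB SPL.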

[Figure 9-4 plot: two panels sharing a vertical axis from -10 to 30. Left panel: adaptive track against trial number (0 to 15). Right panel: mean response at each level (0 to 1).]

Figure 9-4 An example of bad SRT convergence due to the lack of audibility at 50 dB
SPL. Left panel shows the convergence of SRT over trials and right panel shows mean
percent correct words at each SNR.

In order to compare the two AGC programs at 50 dB SPL, the correct word scores from the

sentences presented at an SNR of 5 dB or higher were extracted and summed for each AGC

program. At first, it seems improper to compare the percent correct scores between the two

test conditions measured with an adaptive SRT test because the scores were taken at

different SNRs. However, an exception can be made if SNR was not the main factor that

changed the performance. For that case, comparing the percent correct scores collected at

different SNRs (above 5 dB) between the two test conditions is justifiable. A similar

procedure was also followed in a recent study (Boyle et al. 2013). The speech intelligibility at

50 dB SPL in the present study was clearly affected by audibility.
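The pooling step described above can be sketched in Python as follows. The trial data are hypothetical, and the tuple layout and function name are assumptions made for illustration. Only the 5 dB SNR cut-off comes from the text.

```python
from typing import List, Tuple

# One trial: (SNR in dB, words correct, words in the sentence).
Trial = Tuple[float, int, int]


def pooled_percent_correct(trials: List[Trial], min_snr_db: float = 5.0) -> float:
    """Percent correct words pooled over all trials at or above min_snr_db."""
    kept = [(correct, total) for snr, correct, total in trials if snr >= min_snr_db]
    words_correct = sum(correct for correct, _ in kept)
    words_total = sum(total for _, total in kept)
    if words_total == 0:
        raise ValueError("no trials at or above the SNR cut-off")
    return 100.0 * words_correct / words_total


# Hypothetical trials at 50 dB SPL; the last one falls below the cut-off.
trials = [(20.0, 1, 5), (25.0, 2, 5), (30.0, 3, 5), (0.0, 0, 5)]
print(pooled_percent_correct(trials))  # 40.0
```

Pooling word counts rather than averaging per-sentence percentages weights every word equally, which suits a binomial comparison between programs.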

[Figure 9-5 plot: bars for FEL and Tri+ADRO per subject (S1, S2, S5, S6, S8, S9, S10, Mean); vertical axis percent correct words (%), 0 to 100; group difference p = 2e-5**.]

Figure 9-5 Percent correct scores of seven cochlear implant subjects at 50 dB SPL. The
error bars indicate one standard error. The asterisks indicate statistically significant
difference in performance between the two AGC systems (* p < 0.05, ** p < 0.01).

Figure 9-5 shows the word scores at 50 dB SPL of each subject and the group. A binomial

test showed that subjects obtained significantly higher word scores with Tri + ADRO than

with FEL (p = 2e-5). The group mean score of Tri + ADRO was better than that of the FEL

program by 20 percentage points.

9.2.5 Effect of Presentation Level

The advantage of the interleaved roving-level SRT test is that comparisons can be made

between programs as well as between presentation levels. The intelligibility at 50 dB SPL

was the poorest among the three presentation levels tested due to the lack of audibility.

Therefore, the SRT difference between 65 and 80 dB SPL was analyzed for each program.

Worse performance was observed with FEL at 80 dB SPL compared to 65 dB SPL. The

performance degradation with FEL at the high presentation level was consistent with the

performance indicated by the P-I function in the previous study (§8.2.4.2). Figure 9-6 shows

the SRT comparison between 65 and 80 dB SPL for FEL. Six out of seven subjects showed

a higher (worse) SRT at 80 dB SPL than at 65 dB SPL. However, the difference was not

statistically significant (p = 0.0851). Subject S9 was different from the other subjects because the SRT of

S9 at 80 dB SPL was lower (better) than at 65 dB SPL. The overall difference was just above

3 dB.

[Figure 9-6 plot: "FEL: effect of presentation level". Bars for 65 dB and 80 dB per subject (S1, S2, S5, S6, S8, S9, S10, Mean); vertical axis SRT (dB); group difference p = 0.0851.]

Figure 9-6 Comparison of SRTs between 65 and 80 dB SPL for the FEL program. The
error bars indicate one standard error. The asterisks indicate statistically significant
difference in performance between the two test conditions (* p < 0.05, ** p < 0.01).

Tri + ADRO also showed a higher (worse) SRT at 80 dB SPL than at 65 dB SPL. Figure 9-7

shows that the overall SRT difference between 65 and 80 dB SPL for the Tri + ADRO program

was just over 2 dB. The difference was statistically significant according to the analysis with

a t-test (p < 0.05*).

[Figure 9-7 plot: "Tri+ADRO: effect of presentation level". Bars for 65 dB and 80 dB per subject (S1, S2, S5, S6, S8, S9, S10, Mean); vertical axis SRT (dB); group difference p = 0.0369*.]

Figure 9-7 Comparison of SRTs between 65 and 80 dB SPL for the Tri + ADRO program.
The error bars indicate one standard error. The asterisks indicate statistically
significant difference in performance between the two test conditions (* p < 0.05, ** p <
0.01).

The SRT difference between 65 and 80 dB SPL was larger with FEL than with Tri + ADRO

for all subjects except S9 and S10. The SRT degradation of S10 at 80 dB SPL was

comparable between the two programs. S9 differed from the others and showed an SRT

improvement at 80 dB SPL with FEL. It could be that S9 needed more audibility than the

other subjects, because she benefited more from the high presentation level of 80 dB SPL.

Every subject except S9 with FEL showed equal or better SRT at 65 dB SPL compared to 80

dB SPL.

9.2.6 Test-retest Reliability of Roving-level SRT Test with Single Adaptive Track

A retest on each program was only done with the roving-level SRT with single adaptive

track because of limited time available from the subjects who volunteered in the study.

Therefore, test-retest analysis was only done for the roving-level SRT test with single

adaptive track. The SRTs of all subjects except S9 were measured again with both programs.

[Figure 9-8 plot: two panels over subjects S1, S2, S5, S6, S8, S10 and Mean. Top panel: SRT (dB) for Test:FEL, Test:Tri+ADRO, Retest:FEL and Retest:Tri+ADRO. Bottom panel: Test - Retest difference (dB) for FEL and Tri+ADRO.]

Figure 9-8 Test-retest variability of the roving SRT test with single adaptive track from
the SRT of six subjects taken from test and retest sessions. Top panel shows SRTs of
each subject and group mean for test and retest. The bottom panel shows the SRT
difference between test and retest, and mean of the absolute SRT differences.

Figure 9-8 shows the test and retest SRTs of the subjects with each program. The difference

was taken as the SRT of trial 1 minus that of trial 2. A positive value indicates that the SRT

of trial 2 is better. One possible cause of the high test-retest variability of the roving-level SRT

test is the randomized sequence of the presentation levels. When the less favorable

presentation level was presented at the beginning of the test, SNR was increased by a larger

step until the performance reversal occurred. Since the number of sentences was limited, the

adaptive track might not be able to converge to the final SRT.

Figure 9-9 shows an example of poor test-retest reliability due to a randomized sequence

in the roving-level SRT with a single adaptive track. The analyzed case was the test-retest

SRTs of S5 with Tri + ADRO. The top panel of Figure 9-9 shows an example of poor SRT

convergence due to the randomized sequence. The shading of the circles represents the

percent word scores at each sentence. Black indicates 0%, white indicates 100%, and the

values in between are shown by grey. The first five sentences were presented at 50 dB SPL.

Due to the lack of audibility, the SNR was stepped up until it reached the maximum of 30

dB. Although the SNR was reduced at other presentation levels, the adaptive track was

yet to converge to the final SRT. Perhaps, a few more sentences were required to get there.

The bottom panel of Figure 9-9 shows an example of a good SRT convergence for

evaluating the same program. Four sentences at the beginning were presented at 65 and 80

dB SPL. The SNRs converged to the true SRT of S5 after half of the sentences were

presented. Poor audibility at 50 dB SPL was still observed, but overall SRT was less biased

by its effect.

It is hypothesized that the effect of presentation level has less impact on the interleaved SRT

test because the test can isolate the negative effects of any presentation level to its own SRT.

It means that the interleaved SRT test can still measure a good SRT of the other two

presentation levels even if one presentation level is problematic. Test-retest reliability of the

interleaved SRT test will be studied in the clinical study of chapter 12.
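The isolation argument above can be illustrated with a toy simulation in Python. This is a sketch under stated assumptions, not the procedure of §7.4.4.2: the step size, SNR limits and starting SNR are assumed, and the listener model is hypothetical (nothing is understood at 50 dB SPL, and sentences at the other levels are understood whenever the SNR is at least 5 dB).

```python
from typing import Callable, Dict, List


def run_interleaved_tracks(
    level_sequence: List[int],
    respond: Callable[[int, float], float],
    start_snr_db: float = 20.0,  # assumed starting SNR
    step_db: float = 5.0,        # assumed fixed step size
    min_snr_db: float = -10.0,   # assumed lower limit
    max_snr_db: float = 30.0,
) -> Dict[int, List[float]]:
    """Keep one independent adaptive SNR track per presentation level."""
    tracks: Dict[int, List[float]] = {}
    for level in level_sequence:
        track = tracks.setdefault(level, [start_snr_db])
        snr = track[-1]  # SNR used for this sentence
        score = respond(level, snr)
        step = step_db if score < 0.5 else -step_db
        # Store the SNR that the next sentence at this level will use.
        track.append(max(min_snr_db, min(max_snr_db, snr + step)))
    return tracks


def toy_listener(level_db_spl: int, snr_db: float) -> float:
    """Hypothetical listener: inaudible at 50 dB SPL, 5 dB threshold otherwise."""
    if level_db_spl == 50:
        return 0.0
    return 1.0 if snr_db >= 5.0 else 0.0


tracks = run_interleaved_tracks([50, 65, 80] * 4, toy_listener)
for level in sorted(tracks):
    print(level, tracks[level])
```

In this toy run the 50 dB SPL track climbs to the 30 dB ceiling and stays there, while the 65 and 80 dB SPL tracks descend towards the 5 dB threshold unaffected by it.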

[Figure 9-9 plot: two panels of SNR (dB) against trial number (0 to 30), each point labelled with its presentation level (50, 65 or 80 dB SPL). Top panel "Test: Tri": Estimate = 6.7. Bottom panel "Retest: Tri": Estimate = -0.4.]

Figure 9-9 Adaptive tracks of the roving-level SRT test with a single adaptive track for
S5 with the Tri + ADRO in the test and retest sessions. The convergence of SNRs was
poor in the test session (top panel) and good in the retest session (bottom).

The test-retest variability was high for subjects S2, S5, S6 and S10 with Tri + ADRO: the

difference was more than 5 dB. Compared to Tri + ADRO, the test-retest variability of FEL

was low. The performance of Tri + ADRO at one presentation level depended on the level of

the previous sentence because of the long time constants. For example, a test sequence with

smaller changes between the presentation levels (+/-15 dB) would favor the results more

than a sequence with larger steps (+/- 30 dB).
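The sign convention of Figure 9-8 (test minus retest) and the mean of the absolute differences can be written out as a short Python sketch. It takes the two estimates printed in Figure 9-9 for S5 with Tri + ADRO (6.7 dB and -0.4 dB) as the test and retest SRTs, which is an assumption. The function names are illustrative, and the list passed to the second function is hypothetical.

```python
from typing import List


def srt_difference_db(test_srt_db: float, retest_srt_db: float) -> float:
    """Test minus retest; a positive value means the retest SRT was better (lower)."""
    return test_srt_db - retest_srt_db


def mean_absolute_difference_db(differences_db: List[float]) -> float:
    """Mean of the absolute test-retest differences across subjects."""
    return sum(abs(d) for d in differences_db) / len(differences_db)


# Estimates printed in Figure 9-9 for S5 with Tri + ADRO.
s5_difference = srt_difference_db(6.7, -0.4)
print(f"S5, Tri + ADRO: {s5_difference:.1f} dB")

# Hypothetical differences, only to show the averaging step.
print(mean_absolute_difference_db([2.0, -4.0]))  # 3.0
```

Under that assumption the S5 difference is about 7 dB, in line with the statement that the difference exceeded 5 dB for this subject.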

9.3 Discussions

The poor roving-level SRT results with the front-end compression limiter were due to the

poor performance at high (80 dB SPL) and low (50 dB SPL) presentation levels, shown

exclusively by the interleaved SRT test. The previous study (§8.2.4.2) showed that the percent

correct scores of the subjects with FEL were lowered by approximately 40 percentage points

when the presentation level was increased from 65 to 80 dB SPL. In this study, a 3 dB higher

(poorer) SRT was observed at 80 dB SPL compared to 65 dB SPL. The transfer function of

percent correct scores to SRT in this study agrees with the study of Plomp (1994), which

showed that an SRT increase of 3 dB could be equated to a 50% decrease in percent correct

scores for hearing impaired subjects.

An SRT greater than 20 dB at 50 dB SPL with the FEL program showed that performance was

reduced by the lack of audibility, rather than the level of background noise. The subjects

could achieve at most 30% correct scores at 50 dB SPL.

The tri-loop AGC improved the performance of the subjects for speech presented at high

presentation levels. The tri-loop AGC contains slow and medium time-constant AGCs in

addition to the FEL. In the tri-loop AGC, the slow AGC primarily reduced the input signals

at high presentation levels. Hence, the performance improvement was attributed mainly to

the slow AGC that adjusted the overall presentation level of speech to be within the input

dynamic range without serious envelope distortions. The results also agree with the study of

Boyle et al. (2009) which showed that the dual-time-constants AGC performed better than

the fast AGC for roving-level speech. Using a slow AGC as the main one for the level

adjustment of the input signals is shown to be important for speech intelligibility in adverse

conditions. It should be noted that the medium and fast AGC were still important because

they adjusted the signal level when the slow AGC was not ready to respond.

Studies on ADRO by other researchers showed that ADRO improved the audibility of low-

level speech (§5.4). In the present study, the performance improvement at 50 dB SPL by the

Tri + ADRO cannot be attributed to the tri-loop AGC, because it had unity gain for the

envelopes below the compression threshold. The maximum gain ADRO could provide was 3

dB. However, when ADRO was on, the channel equalization (§4.3.4) was bypassed, as shown

in Figure 11-1. The channel equalization attenuates the high-frequency channels by up to 6 dB.

Thus, Tri + ADRO provided up to 9 dB (3 dB + 6 dB) of additional gain relative to FEL in those channels.

Even so, more audibility was needed at 50 dB SPL with the Tri + ADRO program. Another

possible reason for not achieving good audibility at the low presentation level with the Tri +

ADRO program was the C-SPL at 75 dB SPL. Without any adjustment, a program with C-

SPL 75 would stimulate at lower current levels than a program with C-SPL 65. Therefore, C-

levels of the Tri + ADRO program with C-SPL 75 were set approximately 10% higher than

the C-levels of the FEL program with C-SPL 65. The steepness factor Q of the LGF function

was set to 16 (more compressive) in the C-SPL 75 program. With these adjustments,

loudness was assumed to be equal between the two C-SPL programs for speech presented at

the same level. Perhaps the 10% increase in C-levels with Tri + ADRO was not enough to

achieve equal loudness level.

The roving-level SRT test with a single adaptive track showed poor test-retest reliability.

Using a single adaptive track for all three presentation levels with a randomized sequence to

calculate a single SRT is suspected to be the cause of poor test-retest reliability. The

convergence of an adaptive rule is only guaranteed when the underlying psychometric

function is constant. This is clearly not the case for the processing conditions whose

performance depended on presentation level. It was hypothesized that the randomized

sequence could have less effect on SRT produced by the interleaved SRT test due to the

independent adaptive tracks. Test-retest reliability of the interleaved SRT test will be

observed in the clinical study of chapter 12. From the SRT results obtained from the two

SRT tests, the interleaved roving-level SRT test could be claimed to be a better test procedure

because it measured SRT at each presentation level. The effects of presentation level are

shown separately in each adaptive track and hence can be observed more clearly.

Another suggestion to improve test-retest reliability is to use a fixed presentation level

sequence in roving-level SRT tests. If the sequence of presentation levels is fixed, the impact

of poor presentation levels on the SRT can be reduced. A fixed sequence of presentation

levels, for example a repeated sequence of 50, 65 and 80 dB SPL, can avoid an unbalanced

presentation level sequence such as the one shown in Figure 9-9. Another advantage of using

a fixed sequence of presentation levels is that AGC systems will be tested with the same numbers

of up-steps and down-steps in every test. For example, the numbers of +15 dB up-steps and -30 dB

down-steps in a roving-level SRT test that repeats a fixed sequence of 50, 65 and 80 dB SPL for a

total of 30 sentences are 20 and 9, respectively. Hence, performance variation can be

attributed more to the different processing conditions of the systems than to factors related to test

systems.
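
The step counts quoted above can be checked with a short script. This is only an illustrative sketch; the function name `count_level_steps` is hypothetical.

```python
from collections import Counter
from itertools import cycle, islice

def count_level_steps(levels_db, n_sentences):
    """Count the level changes between consecutive sentences of a repeated sequence."""
    sequence = list(islice(cycle(levels_db), n_sentences))
    steps = [b - a for a, b in zip(sequence, sequence[1:])]
    return Counter(steps)

steps = count_level_steps([50, 65, 80], 30)
print(steps[15], steps[-30])   # number of +15 dB up-steps and -30 dB down-steps
```

Thirty sentences give 29 level changes in total, which is why the two counts sum to 29 rather than 30.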

Compared to a fixed-presentation level test, an adaptive roving-level SRT test can represent

different listening conditions in everyday life. However, some test conditions are uncommon

outside a laboratory, for example, starting the background noise 0.5 seconds before each

sentence and stopping it 0.5 seconds after. It is more common to have continuous

background noise between conversations in everyday environments. Besides, a gain

algorithm with long time constants, such as ASC, needs continuous background noise for its

performance in noise to be fully evaluated.

The advantage of the interleaved roving-level SRT test is measuring an individual SRT for

each presentation level. The disadvantage is that the test needs more sentences for the SNRs

to converge to the 50% correct score point. At least 16 sentences for each presentation level

should be used in the interleaved roving-level SRT test. Since the number of sentence lists is

often limited, it is important to use them wisely to avoid having to repeat them. If a subject is

tested with the same set of sentences within a short period of time, performance comparison

may not be valid due to a bias introduced by familiar test stimuli.


9.4 Conclusions

This chapter expanded the knowledge on AGC systems with different characteristics. It

showed the performance shortcomings of the front-end compression limiter in difficult

listening conditions. The existing AGC systems of the Nucleus CP810, the tri-loop AGC and

ADRO, improved SRT at all three presentation levels tested. The results highlighted the

importance of a slow AGC to adjust the overall presentation level of the input signal to be

within the LGF dynamic range. Although the existing AGC system improved the

performance at all three presentation levels, there is still room for improvement at low and

high presentation levels when compared to the SRT at 65 dB SPL. An ideal AGC system

would produce equal SRTs at all three presentation levels. Gain optimization techniques to

improve the audibility of low-level input stimuli as well as to reduce the spectro-temporal

distortion of input stimuli at high presentation levels should be further investigated.

Based on the findings of the present study on the roving-level SRT tests, a better SRT test

was identified for the evaluation of future gain control algorithms. The proposed roving-level

SRT test is the interleaved SRT test with a fixed sequence of sentence presentation

levels.

Chapter 10 Proposed Envelope Profile Limiter

10 Proposed Envelope Profile Limiter

10.1 Introduction

The speech intelligibility of cochlear implant recipients with no AGC and with just the front-

end compression limiter in the signal path was studied in chapter 8. The main cause of

envelope distortion with no AGC was envelope clipping at the LGF. The front-end

compression limiter improved the speech intelligibility of the subjects but performance

degradation at high presentation levels still showed a similar trend. This study will investigate

different ways to optimize a compression limiter of a cochlear implant system.

It was considered that bringing together all the gain control elements at a single point in the

signal path, preferably after the filterbank, would allow simplification. Slow AGCs, such as ASC

and ADRO, could be combined. It may also allow better integration of new features such as

SNR-based noise reduction (Dawson, Mauger and Hersbach 2011) and dual-microphone

spatial noise reduction (Spriet et al. 2007), which also act on the filterbank outputs. In this

chapter, an initial investigation is made on the feasibility of moving the front-end

compression limiter to just before the LGF. A potential benefit of monitoring signal levels at

the input to the LGF is that an AGC can use that information to prevent envelope clipping.

Such an AGC can optimize the spectral envelopes of an input signal by preserving the shape of

the spectral profile.

A secondary goal was to investigate the effect of compression speed on speech intelligibility.

There is no clear consensus regarding the optimal AGC speed in acoustic hearing aids

(Dillon 2001; Souza 2002; Gatehouse, Naylor and Elberling 2006; Moore 2008; Kates

2010).

Hearing aids often use multichannel AGCs because the amount of hearing loss, and the

dynamic range of residual hearing, varies with frequency. However, if each channel operates

independently, with fast time constants, then amplitude differences across frequencies will

be reduced, degrading the spectral cues used in recognising speech (Plomp 1994; Stone and


Moore 2004, 2008). Given these results, it would be expected that independent fast AGC on

22 channels would give very poor performance. A solution is to cross-couple the channels,

so that the gains are related in some way (White 1986). A coupled AGC between frequency

channels is a functional requirement of the peripheral hearing system to be able to cope with

a wide range of loudness (Lyon’s auditory model, §2.4).

The previous chapter showed the benefits of employing slow AGCs in the signal path for

roving-level speech at low and high presentation levels. The tri-loop system provided benefit

because the compression limiter was activated relatively infrequently. However, the fast

AGC will operate whenever there is a sudden increase in the speech level, so it is still

worthwhile understanding its effect.

10.2 Signal Processing

Two AGCs, the front-end compression limiter and the proposed multichannel compression

limiter, with two release times, 75 and 625 ms, were investigated. The AGCs were

incorporated in the simplified ACE path shown in Figure 8-1. The signal processing in the

ACE sound coding strategy and the AGCs was implemented on Simulink for testing with the

Nucleus-xPC system (§7.4.5.2).

10.2.1 Front-end Compression Limiter

The front-end compression limiter (FEL) was a single-channel AGC located before the

filterbank. The implementation and parameter setting were the same as described in §8.2.2.


10.2.2 Proposed Envelope Profile Limiter

Figure 10-1 shows the block diagram of the proposed multichannel compression limiter in

the signal path.

Figure 10-1 Block diagram of ACE signal path with the envelope profile limiter (EPL)

The Max block produced the instantaneous maximum value, across channels, of the set of

envelopes, so that the gain rule acted upon whichever channel had the largest amplitude. The

gain rule had unity gain up to the compression threshold, and infinite compression beyond.

The resulting gain was then applied to all channels. Unlike the front-end AGC, zero attack

time could be realised because gain changes after the filterbank cannot produce any

undesirable spectral smearing. The rise time of the envelopes was thus determined by the

filterbank, and no overshoot could occur. The release time was either 75 ms or 625 ms.
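
The gain rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Simulink implementation used in this study: the first-order exponential release applied to the linear gain, the frame rate parameter and the name `envelope_profile_limiter` are hypothetical.

```python
import numpy as np

def envelope_profile_limiter(envelopes, threshold, release_ms, frame_rate_hz):
    """Hypothetical sketch: one common gain for all channels, driven by the
    largest envelope, with zero attack time and an exponential release."""
    env = np.asarray(envelopes, dtype=float)      # shape (n_channels, n_frames)
    peak = env.max(axis=0)                        # Max block: largest envelope per frame
    # Infinite compression above threshold: gain that puts the peak exactly at threshold.
    target = np.minimum(1.0, threshold / np.maximum(peak, 1e-12))
    # Assumed release model: gain deficit decays with time constant release_ms.
    alpha = np.exp(-1.0 / (release_ms * 1e-3 * frame_rate_hz))
    gain = np.empty_like(target)
    g = 1.0
    for n, t in enumerate(target):
        recovered = 1.0 - alpha * (1.0 - g)       # gain drifting back towards unity
        g = min(t, recovered)                     # zero attack: drop immediately if needed
        gain[n] = g
    return env * gain, gain                       # same gain applied to every channel

env = np.array([[0.1, 2.0, 0.5, 0.2],
                [0.05, 1.0, 0.25, 0.1]])
out, gain = envelope_profile_limiter(env, threshold=1.0, release_ms=75.0,
                                     frame_rate_hz=1000.0)
print(out.max(), gain)
```

Because the gain never exceeds threshold divided by the current peak, no output envelope can exceed the threshold, and because every channel receives the same gain, the ratios between channels within a frame are unchanged.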

With all channels having equal gain, at first glance it may appear that the multichannel

limiter would have behaviour identical to the front-end limiter. The difference is that levels

were observed after the filterbank, where they directly control the stimulation current, and

the compression threshold was set equal to the saturation level of the LGF. Thus no envelope

could exceed the LGF saturation level. Because it eliminates envelope clipping, and

preserves the spectral profile, this multichannel AGC will be referred to as the Envelope

Profile Limiter (EPL).


The behaviour of the two AGCs is compared in Figure 10-2. At high presentation levels, the

FEL allows some envelope clipping. This has three detrimental effects. Firstly, it distorts the

spectral profile. As shown in the top panel of Figure 10-2, it flattens the formant peak,

making it harder to determine the formant frequency, potentially degrading vowel

perception. Secondly, examining the temporal waveform in the bottom panel of Figure 10-2,

the amplitude modulation is lost. For a vowel, this modulation occurs at the fundamental

frequency, and is the primary cue to voice pitch. Thirdly, at positive SNRs, envelope

clipping reduces the amplitude of the signal peaks relative to the background noise, thus

reducing the SNR. The EPL avoids these drawbacks, and it was hypothesized that it would

provide better speech intelligibility.

Figure 10-2 Envelope clipping of a vowel: spectral profile (top panel) and
temporal waveform (bottom panel) at the output of the LGF, processed by FEL and
EPL


10.3 Clinical Studies

10.3.1 Test Setup

The fixed method described in §7.4.3 was used with the morphemic scoring method.

10.3.2 Study Design

The study used a repeated-measures, single-subject design in which each subject served as

their own control. The test order was counterbalanced between subjects. At the beginning of

the listening session, the subjects were asked to comment on the loudness they perceived

from their own voice and the researcher’s voice with each AGC. They were not informed as

to which AGC was being tested.

Experiment 1 was a two-factor design, measuring sentence recognition in quiet and in noise,

at a high presentation level. Experiment 2 measured sentence recognition in noise as a

function of presentation level, for two of the AGC configurations from Experiment 1. As the

goal was to investigate the effect of fast AGC, the usual slow gain control blocks in the

Nucleus signal path (ASC and ADRO) were disabled in Experiments 1 and 2.

10.3.3 Experiment 1: High Presentation Level

Experiment 1 compared the front-end limiter and the envelope profile limiter with two

release time settings: 75 and 625 ms. Abbreviations for the four AGC configurations are

listed in Table 10-1.

AGC Type                        Release Time (ms)
                                75        625
Front-end compression limiter   FEL75     FEL625
Envelope profile limiter        EPL75     EPL625

Table 10-1 AGC configurations tested


In this experiment, the presentation level of the sentences was set to the highest possible

level, 89 dB SPL, at which envelope clipping occurred significantly with no AGC. The

speech intelligibility of cochlear implant recipients with each AGC configuration was

measured in this listening condition. The purpose was to study the important factors of AGC

systems affecting speech intelligibility. The comparisons between the two AGC systems

with the same release time aimed to show the importance of the gain structure whereas the

comparison within the same AGC system with different release times was to show the

effects of time constants.

The direct connect setup was used in this experiment. Sentences were presented in two

conditions: in quiet, and in four-talker babble at 10 dB SNR. A total of 16 sentences were

presented for each test condition. The speech-in-quiet condition reveals the effects of

envelope distortions, in particular envelope clipping, as well as reduced amplitude

modulation depth. The speech-in-noise condition is also subject to envelope distortion, and

in addition, a fast AGC can worsen the effective SNR and introduce cross-modulation

components between target speech and background noise.

10.3.4 Experiment 2: Performance-Intensity Function

The high presentation level used in experiment 1 was not representative of everyday

listening conditions. The objective of experiment 2 was to measure performance over a wide

range of presentation levels, i.e. to obtain AGC performance-intensity functions. Because of

the limited availability of the subjects’ time, only two AGC configurations were tested:

FEL75, which gave the lowest scores in Experiment 1; and EPL625, which gave the highest

scores in Experiment 1. The two AGCs were evaluated at presentation levels from 55 to 89

dB SPL, in four-talker babble, at two SNRs: 10 and 20 dB. The 20 dB SNR condition was

used, instead of the speech-in-quiet condition of experiment 1, to avoid ceiling effects.

All subjects were initially tested with the loudspeaker setup. Subject S1 obtained surprisingly

good scores at the higher presentation levels, apparently assisted by his residual contralateral

hearing (despite that ear being plugged), and therefore he was retested using the direct

connect setup.


10.4 Results

10.4.1 Experiment 1: High Presentation Level

Six cochlear implant subjects participated in Experiment 1. Hypothesis testing was done for

each comparison using Monte Carlo simulation, assuming binomial distributions (Simon

1997). Appendix 3 describes the statistical method. The statistical significance tests for

each comparison between the four AGC configurations are shown in Table 10-2. Since four

hypothesis tests were made on the dataset, the Bonferroni correction was applied to adjust

the significance level. Hence the significance level for the p-value (the probability of

observing such a difference if the two AGC conditions were the same) is

0.05/4 = 0.0125. A p-value lower than 0.0125 rejects the null hypothesis and indicates that

the two conditions are different.
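
A Monte Carlo comparison of two binomial scores with a Bonferroni-corrected significance level can be sketched as below. The method actually used is the one in Appendix 3, which is not reproduced here; the pooled-proportion null model, the one-sided test, the number of simulations, the item count of 100 and the name `monte_carlo_binomial_p` are assumptions made for illustration.

```python
import numpy as np

def monte_carlo_binomial_p(correct_a, correct_b, n_items, n_sim=20000, seed=0):
    """Hypothetical one-sided p-value for condition B scoring higher than A."""
    rng = np.random.default_rng(seed)
    # Null hypothesis: both conditions share the same underlying success rate.
    pooled = (correct_a + correct_b) / (2.0 * n_items)
    sim_a = rng.binomial(n_items, pooled, size=n_sim)
    sim_b = rng.binomial(n_items, pooled, size=n_sim)
    observed = correct_b - correct_a
    # Fraction of simulated differences at least as large as the observed one.
    return float(np.mean(sim_b - sim_a >= observed))

alpha = 0.05 / 4   # Bonferroni correction for four comparisons
p_large = monte_carlo_binomial_p(20, 70, 100)   # illustrative counts only
p_none = monte_carlo_binomial_p(45, 45, 100)
print(p_large < alpha, p_none < alpha)
```

A fixed seed is used so that the simulated p-values are reproducible between runs.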

[Figure 10-3: bar charts of Percent Correct (%) by Subject (S1, S3, S4, S5, S6, S7 and Mean) for FEL75, EPL75, FEL625 and EPL625]

Figure 10-3 Effects of gain structure and release time on speech intelligibility of six
cochlear implant subjects in quiet (top panel) and in noise (bottom panel). Error bars
indicate one standard error.


Subject  AGC condition      Quiet                    Noisy
                            scores (%)  p-value      scores (%)  p-value

S1       EPL75 - FEL75      7.54        0.1049       -7.23       0.0556
         EPL625 - FEL625    4.15        0.1287       11.96       0.0414
         FEL625 - FEL75     17.73       5e-4**       39.08       <1e-4**
         EPL625 - EPL75     14.34       0.0011**     58.27       <1e-4**

S3       EPL75 - FEL75      8.51        0.0067*      4.85        0.0979
         EPL625 - FEL625    -           -            11.72       0.0426
         FEL625 - FEL75     -           -            52.33       <1e-4**
         EPL625 - EPL75     -           -            59.32       <1e-4**

S4       EPL75 - FEL75      20.04       0.0002**     4.8         0.1771
         EPL625 - FEL625    -1.98       0.1055       -1.49       0.4203
         FEL625 - FEL75     28.9        <1e-4**      42.73       <1e-4**
         EPL625 - EPL75     6.84        0.016        36.44       <1e-4**

S5       EPL75 - FEL75      2.33        0.2841       35.22       <1e-4**
         EPL625 - FEL625    0           0.5          8.55        0.0505
         FEL625 - FEL75     9.47        0.002**      55.8        <1e-4**
         EPL625 - EPL75     7.14        0.0054*      29.13       <1e-4**

S6       EPL75 - FEL75      -4.08       0.0817       -11.39      0.0385
         EPL625 - FEL625    0           0.5          -4.01       0.1818
         FEL625 - FEL75     1.98        0.1055       56.57       <1e-4**
         EPL625 - EPL75     6.06        0.0103*      63.95       <1e-4**

S7       EPL75 - FEL75      -0.04       0.5097       -0.38       0.4803
         EPL625 - FEL625    -0.05       0.5052       14.49       0.0107*
         FEL625 - FEL75     0.07        0.5          31.69       <1e-4**
         EPL625 - EPL75     0.06        0.5044       46.56       <1e-4**

Group    EPL75 - FEL75      5.72        0.0006*      4.31        0.0329
         EPL625 - FEL625    0.42        0.3085       6.87        0.0031*
         FEL625 - FEL75     6.9         0.0001**     46.4        0.0001**
         EPL625 - EPL75     11.6        0.0001**     48.9        0.0001**


Table 10-2 Statistical analysis of the individual and group scores. The score
difference was obtained by subtracting the score of the second AGC from that of the
first AGC. The asterisks indicate a statistically significant difference in performance
between the two AGCs (* p < 0.0125, ** p < 0.0025).

10.4.1.1 In Quiet

The upper panel of Figure 10-3 shows the percent correct scores of the cochlear implant

subjects with each AGC configuration evaluated in quiet. Scores in quiet exhibited a ceiling

effect, especially for the 625 ms release time. S3 did not undertake the 625 ms release time

condition due to time constraints and because his scores were likely to have been near

ceiling; the mean scores for the 625 ms condition shown in the upper panel of Figure 10-3

are for the remaining subjects.

The group mean scores for sentences presented in quiet were more than 85% for all AGC

conditions. The group mean score of the EPL75 was 5.7 percentage points higher (92.6%

versus 86.9%) than that of the FEL75. This improvement was statistically significant

according to the binomial test.

The group mean scores of the FEL625 and the EPL625 were almost the same. The scores

were improved when the release time was increased from 75 ms to 625 ms; 11.6 percentage

points (from 86.9% to 98.5%) with the FEL and 6.9 percentage points (from 92.6% to

99.5%) with the EPL. The improvement was statistically significant. The score was mainly

determined by the release time.

10.4.1.2 In Four-talker Babble Noise (SNR = 10 dB)

The background noise caused a substantial degradation in speech understanding. The bottom

panel of Figure 10-3 shows the percent correct scores of the cochlear implant subjects with

each AGC configuration for sentences in the presence of four-talker babble noise at 10 dB

SNR. The overall speech intelligibility in noise was higher with the EPL than with the FEL.

The group mean score improvement was 4.4 percentage points (from 20.8% to 25.2%) for

the release time 75 ms and 6.9 percentage points (from 67.2% to 74.1%) for the release time

625 ms. The score difference between the EPL625 and the FEL625 was statistically

significant according to the paired comparison by the binomial test. Increasing the release


time from 75 ms to 625 ms significantly improved speech intelligibility in noise for both

AGC systems. The group mean score improvement was 46.4 percentage points (from 20.8%

to 67.2%) for the FEL and 48.9 percentage points (from 25.2% to 74.1%) for the EPL

respectively.

When the percent correct scores of all four AGC conditions are compared, the EPL625

obtained the highest scores and the FEL75 obtained the lowest scores. The difference was

approximately 53.3 percentage points.

When the scores in quiet and in noise were compared for each AGC, a significant interaction

between the release time and the background noise was observed for each AGC system. That

means the slow release time is more important in noisy conditions.

10.4.2 Experiment 2: Performance-Intensity Functions

In Experiment 1, speech understanding was measured only for speech presented at the

highest presentation level. It is more informative to measure the performance of each AGC

system over a wide range of speech presentation levels. Performance-Intensity (P-I)

functions show the sensitivity of a gain algorithm to the intensity of the stimulus. If the P-I

functions were to be measured for the nine presentation levels from 55 to 89 dB SPL in two

SNR conditions for four AGC settings, a total of 72 sentence lists would be necessary.

Instead, in the interest of time, it was decided to measure only the P-I functions of the two

AGC systems from Experiment 1 with the largest performance difference between them: the

front-end limiter with the release time of 75 ms and the envelope profile limiter with the

release time of 625 ms. The P-I functions of these two AGC conditions were measured with

sentences presented at each presentation level from 55 to 89 dB SPL at two SNRs, 10 and 20

dB.

Six cochlear implant subjects participated in Experiment 2. Five of them also participated in

Experiment 1. The top six panels of Figure 10-4 show the percent correct scores of the

individual subjects and the bottom panel shows the group mean scores.


[Figure 10-4: Percent correct (%) versus Presentation Level (55 to 89 dB SPL) for subjects S1 to S6 (top six panels) and the group mean (bottom panel), with curves for FEL75 and EPL625 at SNR 10 dB and SNR 20 dB]

Figure 10-4 Performance-intensity functions of FEL75 and EPL625


[Figure 10-5: bar charts of Percent Correct (%) by Subject (S1 to S6 and Mean) for FEL75 and EPL625; top panel SNR = 20 dB, bottom panel SNR = 10 dB; asterisks mark significant differences]

Figure 10-5 Comparison of scores between the FEL75 and the EPL625 for the
presentation levels above 70 dB SPL at SNR 20 dB (top panel) and SNR 10 dB (bottom
panel). The asterisks indicate statistically significant difference in performance between
the two AGCs (* p < 0.05, ** p < 0.01).


Subject  SNR = 20 dB                        SNR = 10 dB
         Score difference (%)  p-value      Score difference (%)  p-value

S1       10.81                 0.0005**     25.98                 0.0005**
S2       5.5                   0.0005**     16.02                 0.0005**
S3       9.23                  0.0005**     22.49                 0.0005**
S4       20.48                 0.0005**     39.39                 0.0005**
S5       2.86                  0.0485*      17.92                 0.0005**
S6       4.42                  0.001**      38.55                 0.0001**
Group    8.88                  0.0001**     26.73                 0.0001**

Table 10-3 Statistical analysis of the scores between the FEL75 and the EPL625 for the
presentation levels above 70 dB SPL in two SNR conditions. The asterisks indicate
statistically significant difference in performance between the two AGCs (* p < 0.05, **
p < 0.01).

10.4.2.1 At SNR 20 dB

The subjects scored more than 85% for the presentation level from 55 to 80 dB SPL at 20 dB

SNR. Above 80 dB SPL, the speech intelligibility started to decline with the FEL75. The

EPL625 consistently maintained a high level of performance at all presentation levels. The

group mean scores dropped by approximately 25 percentage points with the front-end limiter

when the presentation level was increased from 80 dB SPL to 89 dB SPL. The variation

between the subjects was also higher with the FEL75 than with the EPL625 at high

presentation levels. For example, subjects S3 and S4 scored lower than the others at

presentation levels above 83 dB SPL.

For statistical analysis, the percent correct scores of each AGC system were averaged for the

presentation levels above 70 dB SPL, at which both AGCs were active. The top panel of

Figure 10-5 shows the percent correct scores of the two AGC systems for the presentation

level above 70 dB SPL in the SNR of 20 dB. All subjects showed significantly better

performance with the EPL625. The range of score improvement was from 4.4 to 20.6

percentage points with the average of 8.9 percentage points.


10.4.2.2 At SNR 10 dB

At 10 dB SNR, the speech intelligibility varied widely between the subjects. The group mean

scores started to degrade above 65 dB SPL for both AGC systems. A drop of approximately

70 percentage points in the percent correct scores was observed with the FEL75 when the

presentation level was increased from 65 dB SPL to 89 dB SPL. With the EPL625, a drop of

approximately 30 percentage points in the mean scores was observed over the same range of

presentation levels. The rate of score

degradation from 65 to 75 dB SPL was approximately the same for the two AGCs. The

scores continued to drop with FEL75 for the presentation level above 75 dB SPL. Compared

to that, no significant score degradation was observed with the EPL625 for the presentation

level above 75 dB SPL. All subjects scored more than 50% with the EPL625 at the highest

two presentation levels.

The percent correct scores of the two AGC systems were compared for the presentation

levels above 70 dB SPL as a group. The individual and group mean scores are shown in the

bottom panel of Figure 10-5. All subjects scored significantly higher with the EPL625 than

with the FEL75. The range of improvement was from 16 to 40 percentage points with the

average of 26.7 percentage points.

10.5 Discussions

In Experiment 1, when the release time was kept constant, the envelope profile limiter gave

equal or better speech intelligibility when compared to the front-end compression limiter.

The speech-in-quiet condition revealed the effect of envelope distortion. Figure 10-6 shows

the proportion of envelope samples that exceeded the LGF saturation level (i.e. the amount

of envelope clipping) with the front-end limiter for sentences presented at 89 dB SPL in

Experiment 1 (the envelope profile limiter is not shown because it had zero clipping under

all conditions). In both quiet and noise, increasing the release time substantially reduced the

amount of clipping, because the gain was lower on average.
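
The clipping proportion discussed above can be computed along the following lines. This is a sketch only: it assumes the metric is the fraction of all envelope samples above the LGF saturation level, and the name `clipping_proportion` and the example values are hypothetical. The metric used in this study may count only the envelopes selected for stimulation.

```python
import numpy as np

def clipping_proportion(envelopes, saturation_level):
    """Fraction of envelope samples that exceed the LGF saturation level."""
    env = np.asarray(envelopes, dtype=float)   # shape (n_channels, n_frames)
    return float(np.mean(env > saturation_level))

# Illustrative envelopes for two channels and three frames.
example = [[0.2, 1.5, 0.9],
           [1.2, 0.4, 0.1]]
proportion = clipping_proportion(example, saturation_level=1.0)
print(f"{100 * proportion:.1f}% of envelope samples clipped")
```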


[Figure 10-6: bar chart of Percent envelopes clipped (%) for FEL75 and FEL625 in two test conditions, In Quiet and In 10 dB SNR]

Figure 10-6 Proportion of clipping for speech presented at 89 dB SPL with the front-
end compression limiter with the release time 75 ms and 625 ms

With 75 ms release time, about 10% of envelope samples were clipped, and the envelope

profile limiter provided a small benefit (approximately 6 percentage points), perhaps due to

better representation of spectral peaks. According to Drullman (1995), modulation in spectral

peaks contributes more to intelligibility than that in spectral troughs. Hence the performance

improvement with the envelope profile limiter may be explained by its preservation of

spectral peaks. With 625 ms release time, clipping affected less than 4% of envelope

samples, so there was little scope for the envelope profile limiter to provide benefit. The

results suggest that the subjects were not very sensitive to envelope clipping. This is

consistent with the P-I functions shown in the previous study (§8.2.4), where subjects with

no AGC scored well at high SNR at very high presentation levels. Zeng and Galvin III

(1999) found relatively small reduction in cochlear implant vowel intelligibility (about 10

percentage points) in noise and in quiet when the electrical dynamic range was reduced to

one current level, giving a binary representation, which is equivalent to 100% of the pulses

being affected by envelope clipping. It should be noted that these results were obtained with

the ACE or SPEAK coding strategies, which select the envelopes with largest amplitude for


stimulation in each cycle; it is possible that envelope clipping may be more detrimental in a

coding strategy such as CIS, which stimulates all channels in each cycle.

One methodological issue with Experiment 1 was the ceiling effect for sentences in quiet,

especially with 625 ms release time. To better observe a difference between the two AGC

types in quiet, more difficult speech material is needed. Isolated words (e.g. CNC words)

could be used, perhaps with a carrier phrase to exercise the dynamic behaviour of the AGC

systems. An alternative is to use low predictability or nonsense sentences (Boothroyd and

Nittrouer 1988).

The two experiments in this study clearly showed that fast compression speed was

detrimental to speech intelligibility. The effect was consistent across subjects, and was

greatest for speech in noise, with scores in Experiment 1 dropping by more than 45

percentage points when the release time was decreased. In Experiment 2, it is highly likely

that the advantage of EPL625 over FEL75 was primarily due to the longer release time. The

consistency and size of the detriment for fast compression with cochlear implants contrasts

with the mixed results obtained in studies with acoustic hearing aids (Gatehouse, Naylor and

Elberling 2006). Moore (2008) proposed that the benefit depended on the individual’s ability

to process temporal fine structure, which facilitates listening to the dips of background noise.

The results of the present study are consistent with that hypothesis, as cochlear implants are

unable to convey temporal fine structure.

Release time had a significant effect for speech in quiet in Experiment 1, implying that

temporal envelope distortion played a role. As cochlear implant speech perception relies on

envelope cues, the fidelity of the envelopes in the limited number of frequency channels is

important (Stone and Moore 2008). The compression speed of 1.6 Hz (for the release time

625 ms) is unlikely to have a significant effect on the modulation rate of the phonetic entities

except stress pattern. Compared to that, the compression speed of 13.33 Hz (for the release

time 75 ms) affects the modulation of most phonetic entities: words, syllables and phonemes

in particular (Plomp 1983). Comparing the scores in quiet between the two release times, 75

and 625 ms, in each AGC system, supported the importance of preserving low rate

modulation. Temporal modulation below 16 Hz is perceptually most important for speech


(Houtgast and Steeneken 1985; Drullman, Festen and Plomp 1994b). A low rate amplitude

modulation of less than 4 Hz could also contribute to speech intelligibility in noise when the

listeners relied only on the envelope information (Füllgrabe, Stone and Moore 2009). Based

on those studies, the AGC release time needs to be at least 500 ms to maintain temporal

modulation cues.
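
The compression speeds quoted above follow from taking the reciprocal of the release time. The short sketch below reproduces them; the function name `compression_speed_hz` is hypothetical, and treating the reciprocal of the release time as the compression speed is taken from the figures in the text rather than from a stated definition.

```python
def compression_speed_hz(release_time_ms):
    """Compression speed taken as the reciprocal of the release time."""
    return 1000.0 / release_time_ms

for release_ms in (75.0, 500.0, 625.0):
    print(f"{release_ms:.0f} ms -> {compression_speed_hz(release_ms):.2f} Hz")
```

Under the same assumption, a release time of 500 ms corresponds to a compression speed of 2 Hz.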

Speech intelligibility of the participants degraded significantly in noise. The additive noise

was more detrimental to speech intelligibility than the compression that reduced the

modulation depth (Drullman 1995). Several factors could contribute to the degradation of

speech intelligibility of cochlear implant recipients at high levels in the presence of noise:

the envelope modulation reduction, the energetic masking of the speech by the noise, the

distortion of the low-rate envelope modulation by compression, and cross-modulation

between target speech and noise introduced by compression. Each of these factors will be

studied in chapter 13.

The poor results with 75 ms release time probably explain why Spahr et al. (2007) found that

ESPrit 3G users performed worse in noise than users of the CII or Tempo+ sound processors

(which had dual-loop AGC systems). The ESPrit 3G processor (released in 2002) used a

front-end compression limiter with a release time of 82 ms, and although ASC was available,

it was not enabled in the default processor setting. In contrast, ASC is on by default in the

CP810 processor (released in 2009), giving a dual-loop AGC system. The performance-

intensity functions of Experiment 2 suggest the improvement that would be obtained with

ASC. Based on bench measurements, at 89 dB SPL and 10 dB SNR, ASC would reduce the

gain by 18 dB; this is equivalent to reducing the presentation level to 71 dB SPL, and

suggests that scores would improve from about 20% correct to 80% correct.

The experiments in the present study showed some speech intelligibility improvements with

the envelope profile limiter compared to the front-end compression limiter. However, the

envelope profile limiter may underperform in some listening conditions. For example, it

would reduce the level of all frequency channels for a narrowband intense sound. More

experiments are needed to explore the performance of the envelope profile limiter with

different test stimuli. The participants in this study had no experience with the envelope


profile limiter, yet they commented that it did not sound much different to the front-end

limiter. Some of the subjects mentioned anecdotally that the target speech stood out more

clearly from the background noise and that it was therefore easier to identify words with the envelope

profile limiter. The overall impression of the proposed envelope profile limiter was positive.

To date, cochlear implants have used AGC systems that were essentially the same as those

found in hearing aids. A short AGC release time appears to have a more detrimental effect

in cochlear implants than in acoustic hearing aids, showing the importance of studies

involving cochlear implant recipients. The envelope profile limiter proposed in the present

study was specifically tailored to the needs of a cochlear implant system, and would not be

suitable for a hearing aid. Moving all the gain control elements to after the filterbank opens

new opportunities for optimisation and integration with other processing algorithms.

10.6 Conclusions

A front-end compression limiter prevents the amplitude of the audio signal from exceeding

the compression threshold. However, if the signal path is calibrated for typical speech

signals, then occasional envelope clipping can occur when the audio signal has narrow

bandwidth or low crest factor. The proposed envelope profile limiter eliminated envelope

clipping by monitoring the maximum envelope level (rather than the front-end level) and

setting the envelope compression threshold to be equal to the saturation level of the LGF. It

preserved the spectral profile by applying the same gain to all channels. The primary

conclusion of this study is that the envelope profile limiter is a feasible alternative to a front-

end compression limiter in a cochlear implant system. While both the front-end limiter and

the envelope profile limiter can extend the upper boundary of the operational acoustic range

of cochlear implant systems, the envelope profile limiter accomplishes this with less

distortion.

Of the two factors of the AGC systems investigated in the present study, the release

time was more important for speech intelligibility. The secondary conclusion of this study is

that a slow AGC is important for cochlear implant systems because fast compression speed

can reduce speech intelligibility.

11 Take-home Study with the Proposed Envelope

Profile Limiter

11.1 Introduction

Listening tests in the laboratory can only replicate a subset of listening conditions

encountered in real life. Therefore take-home experiments are conducted to evaluate quality

and acceptance. A complete evaluation of a system or an algorithm comprises both

laboratory and take-home assessments.

In a take-home experiment, the subject rates a program after a period of acquaintance. The

subject answers a set of questions designed to probe different aspects of the new program

compared to a reference program. The reference program is often the program that the

subject has previously used most of the time. By having to listen with the new program for a

considerable period in the real-world listening environment, a listener can appreciate the

quality and intelligibility of not only speech but also other environmental sounds. Sound

quality and acceptance should be judged after a familiarization period of at least two to four

weeks.

In the previous chapter, the envelope profile limiter, implemented on the Nucleus-xPC

system, was evaluated with cochlear implant recipients in the laboratory. The results

indicated equal or better speech intelligibility performance with the envelope profile limiter

compared to the front-end limiter. The next step was to conduct a take-home study. This

required implementing the envelope profile limiter on the Nucleus CP810 sound processor

(§7.4.5.1). This chapter describes the implementation of the envelope profile limiter. It

explains the clinical fitting procedures and analyzes the results of the clinical questionnaire.

11.2 DSP Implementation

The proposed EPL was implemented on the DSP by the present author in assembly

language. The necessary modifications to the DSP firmware were made and thoroughly tested by

the present author before the subjects were provided with Nucleus CP810 sound

processors running the modified programs.

The modified signal path on DSP 1 is shown in Figure 11-1. The signal path allowed either

the unified gain model (UGM) or the envelope profile limiter (EPL) to be selected. The

inputs to DSP 1 are 16 time-domain samples and 128 FFT output samples. The UGM

consisted of three cascaded AGCs with slow, medium and fast time constants respectively

(§5.5.4). Based on the comparison between the level of input signal and a set of compression

thresholds, the UGM block calculated the gain and passed it to the AGC Gains block which

applied the gain as well as scaling. The operation of the EPL was as described in §10.2.2.

The down-slew-rate of the EPL was set to 40 dB per second (equivalent to a release time of

625 ms) and instantaneous infinite compression was applied to the channel envelopes if the

maximum amplitude exceeded the compression threshold. To avoid substantial code

modification, whilst still allowing a recipient to switch between a UGM program and an EPL

program on the same processor, ADRO was retained at the end of the signal path in DSP 1.

It was not ideal for ADRO to operate after the EPL because it was possible that ADRO could

increase the gain and cause envelope clipping.
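
The limiting behaviour described above can be illustrated with a minimal Python sketch. It is not the assembly-language firmware. It assumes linear envelope amplitudes, a per-frame update, and that the 40 dB per second slew rate governs how quickly the limiter gain recovers towards 0 dB. The names epl_step, frame_s and release_db_per_s are hypothetical.

```python
import math


def epl_step(envelopes, prev_gain_db, threshold, frame_s, release_db_per_s=40.0):
    """One frame of a simplified envelope profile limiter (illustrative only).

    envelopes:    linear channel envelope amplitudes for this frame
    prev_gain_db: limiter gain from the previous frame (0 dB or lower)
    threshold:    compression threshold (assumed equal to the LGF saturation level)
    frame_s:      frame duration in seconds (assumed parameter)
    """
    peak = max(envelopes)
    # Gain that brings the largest envelope exactly down to the threshold.
    needed_db = 20.0 * math.log10(threshold / peak) if peak > threshold else 0.0
    # Release: the gain recovers towards 0 dB at the slew rate.
    released_db = min(0.0, prev_gain_db + release_db_per_s * frame_s)
    # Attack is instantaneous: take whichever gain is lower.
    gain_db = min(released_db, needed_db)
    gain = 10.0 ** (gain_db / 20.0)
    # The same gain is applied to every channel, so the spectral profile is kept.
    return [e * gain for e in envelopes], gain_db


limited, gain_db = epl_step([2.0, 1.0, 0.5], 0.0, threshold=1.0, frame_s=0.001)
print(limited, gain_db)
```

Because one gain value scales all channels, the ratios between channel envelopes are unchanged while the largest envelope is held at the threshold.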

Figure 11-1 The DSP 1 signal path of the Nucleus CP810 sound processor with a switch
between UGM and EPL

11.3 Fitting Procedures

Five cochlear implant recipients participated in this study. The subjects were provided with a

loaner CP810 sound processor loaded with research firmware that supported the EPL. The

study duration was at least two weeks.

The Nucleus CP810 sound processor can store a maximum of four programs, each

with parameter settings tailored to specific listening situations. A recipient typically has

four programs in his/her sound processor: Everyday, Noise, Focus and Music. The recipient’s

Everyday and Noise programs were selected for this take-home study. The loaner

processor was loaded with the recipient’s standard Everyday and Noise programs (using the

UGM) in program slots 1 and 2, and the modified Everyday and Noise programs (using

the EPL) in program slots 3 and 4. The recipient’s own program contained the UGM

with a combination of different AGC systems such as ASC, Whisper, the compression

limiter and ADRO.

The subject could easily select programs by using the CR120 remote assistant. The

combination of AGC systems that was used in each program is shown in Table 11-1.

Subject  Standard Program                 Modified Program
         Everyday    Noise                Everyday    Noise
S1       ASC, ADRO   Zoom, ASC, ADRO      EPL, ADRO   Zoom, ADRO, EPL
S3       ASC, ADRO   Zoom, ASC, ADRO      EPL, ADRO   Zoom, ADRO, EPL
S4       ASC, ADRO   Zoom, ASC, ADRO      EPL, ADRO   Zoom, ADRO, EPL
S5       ASC, ADRO   Zoom, ASC, ADRO      EPL, ADRO   Zoom, ADRO, EPL
S7       ADRO        Zoom, ASC, ADRO      EPL, ADRO   Zoom, ADRO, EPL

Table 11-1 Combination of AGCs in each program. Zoom is a fixed beamformer with a
super-cardioid polar response.

The subjects were asked to compare two programs: the UGM Everyday program and the

EPL Everyday program. The subjects were provided with the Cochlear Implant Clinical

Questionnaire (CICQ), developed by the HEARing Cooperative Cochlear Research Centre.

The questions in CICQ are listed in Appendix 2. There are a total of 18 items in the

questionnaire. Each item asks the subject to rate the helpfulness of each program in day-to-

day listening situations. Each program is rated on a five-point response scale, ranging from 1

(not helpful) to 5 (extremely helpful). A subject can select ‘Not Applicable’ if they did not

experience that listening condition. In addition to the helpfulness rating, subjects were also

asked to provide the overall preferred program in quiet and noisy conditions. After selecting

the preferred program, the subjects were also asked to rate the sound quality in quiet and

noisy conditions. The rating was on a four-point scale: the preferred program was (i) very

similar to, (ii) slightly better than, (iii) moderately better than, or (iv) much better than the other

program. The subjects were encouraged to wear the loaner processor as much as they could

to cover many listening scenarios.

11.4 Results and Discussions

Five subjects, S1, S3, S4, S5 and S7, participated in the take-home study. S1, S3 and S7 use

contralateral hearing aids. S5 is a bilateral subject but only one loaner processor was

provided. Subject S7 answered the questionnaire differently from the other subjects, in that he

compared the benefit of the UGM program to listening with his contralateral hearing alone. He

then compared the benefit of the EPL program to the UGM program. Hence the helpfulness

indicated by S7 was not comparable with that of the other subjects and was therefore not included in Figure

11-2 and Figure 11-3. The average ratings of the two programs were equal for 11 out of 18

questions. Subjects rated the EPL program as less helpful than the UGM program for six out

of 18 conditions. Those conditions in which the UGM program performed less optimally

were concerned with background noise. For example, questions 3, 6, 13, 15 and 18 are about

conversation in a noisy background and question 11 is about soft sounds in the environment.

The absolute helpfulness was less than okay in those six conditions.

[Figure: helpfulness rating (None, Little, Okay, Very, Extreme) plotted against question number (1 to 18) for the UGM and EPL programs]

Figure 11-2 Helpfulness indication of the UGM and EPL programs for each question in
CICQ

The questions in the CICQ are about a cochlear implant recipient’s understanding of speech

and everyday sounds with each program. These questions can be sorted into the following

categories:

• One-to-one conversation in quiet

• One-to-one conversation in noise

• Group conversation in quiet

• Group conversation in noise

• Listening to TV/radio

• Telephone conversation

• Music

• Other sounds in the environment

Figure 11-3 shows the helpfulness of each program in the categorized listening conditions

mentioned above. According to the absolute helpfulness ratings, the subjects found that the EPL

program was of little help for conversations in a noisy background. The subjects found the UGM program

more helpful than the EPL program in noisy environments. Neither program was very helpful

for group conversation in noise.

[Figure: helpfulness rating (None, Little, Okay, Very, Extreme) of the UGM and EPL programs for each listening category: one-to-one conversation in quiet, group conversation in quiet, one-to-one conversation in noise, group conversation in noise, listening to TV/radio, music, other sounds in the environment, telephone conversation]

Figure 11-3 Helpfulness indication of the UGM and EPL programs in the categorised
listening conditions

In subsequent analysis, the benefit score for each question was defined as the rating for the

EPL program minus the rating for the UGM program, giving a score in the range -4 to +4. A

positive sign indicates that subjects found EPL more helpful than UGM. Similarly, a score of zero

means no difference and a negative sign indicates that subjects found EPL less helpful than

UGM. The results from subject S7 were included in this analysis because S7 answered the

questions in this manner, i.e. he gave a helpfulness indication of the EPL program with

respect to the UGM program.

From the questionnaire, seven questions concerning conversation in quiet, and six questions

concerning conversation in noisy conditions were selected for analysis. Table 11-2 shows the

mean benefit score and the overall preferred program for the questions concerning quiet and

noisy backgrounds. According to a t-test, the mean benefit score was not significantly

different from zero (i.e. no net benefit). The questionnaire showed that the subjects had no

strong preference for one program over the other. Despite S7 showing a net benefit for the

EPL program in quiet, he still preferred the UGM program.

Some subjects reported anecdotally that background noise was more objectionable with the

EPL program. It should be noted that in most cases, the UGM program incorporated a slow

front-end AGC (ASC); the only exception was the Everyday program of S7. In contrast, the

EPL program did not include ASC. ADRO is a slow multichannel AGC with a background

noise rule to reduce high-level background noise. However, the standard parameter setting of

ADRO was aimed at improving audibility rather than at reducing noise. That is why the EPL program

with ADRO was not helpful in noisy situations. The subjective reports indicated that a slow

AGC system that reduced the overall level in noisy situations was important for listening

comfort.

Some recipients noticed sound drop-outs with the EPL program after impulsive sounds, for

example door slams. This is a known issue for AGC with a release time over 300 ms

(Stöbich, Zierhofer and Hochmair 1999; Moore 2008).

Subject  Quiet                                    Noisy
         Preferred program  Mean benefit score    Preferred program  Mean benefit score
S1       UGM                -1                    None               0.25
S3       None               0                     EPL                0
S4       -                  -                     -                  -1
S5       EPL                0.3                   UGM                -1
S7       UGM                1.6                   UGM                -0.8
Mean                        0.2                                      -0.5

Table 11-2 Mean benefit scores and preferred program in quiet and noisy background.
S4 did not answer some questions, as indicated by dashes.
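
The significance test mentioned above can be reproduced approximately from the per-subject means in Table 11-2. The text does not state which form of t-test was used or on which data, so the sketch below assumes a two-sided one-sample t-test of the subject means against zero at the 5% level. The function name one_sample_t is hypothetical; the critical values are the standard tabulated values of Student's t.

```python
import math
import statistics

# Per-subject mean benefit scores (EPL rating minus UGM rating) from Table 11-2.
# S4 has no score for the quiet conditions.
quiet = [-1.0, 0.0, 0.3, 1.6]
noisy = [0.25, 0.0, -1.0, -1.0, -0.8]

# Two-sided 5% critical values of Student's t for 3 and 4 degrees of freedom.
T_CRIT = {3: 3.182, 4: 2.776}


def one_sample_t(scores):
    """Return (t statistic, degrees of freedom) for testing mean(scores) against zero."""
    n = len(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    return statistics.mean(scores) / standard_error, n - 1


for label, scores in (("quiet", quiet), ("noisy", noisy)):
    t, dof = one_sample_t(scores)
    significant = abs(t) > T_CRIT[dof]
    print(f"{label}: t = {t:.2f}, dof = {dof}, significant = {significant}")
```

Under these assumptions neither t statistic exceeds its critical value, which agrees with the conclusion that the mean benefit score was not significantly different from zero.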

11.5 Conclusions

The subjects found that the EPL program was similar to their standard program in easy

listening conditions. However, they perceived that the background noise was louder with the

EPL program when the conversation was held in noisy background. The findings from the

take-home study implied that fast AGC alone was not sufficient for day-to-day listening

situations. A slow AGC is necessary to adjust the presentation level of the input signal from

one listening condition to another. It is particularly useful to have an algorithm like ASC

that reduces the level of background noise. It is hypothesized that the EPL program would be

improved if it was used after a slow AGC. Such an AGC arrangement is similar to the dual-loop

AGC of Boyle et al. (2009).

The previous chapter showed that it was feasible to replace the front-end compression limiter

with the envelope profile limiter. The placement of all AGCs at a single location, preferably just

before the LGF, would simplify the signal path. In this hypothetical signal path, the ASC

functionality and ADRO would be combined, and the EPL would be the last gain block

before the LGF, to eliminate envelope clipping. The next chapter will investigate further

optimization of the signal path.

12 Proposed Adaptive Loudness Growth Function

12.1 Introduction

In the previous chapters, some existing AGC systems and the new envelope profile limiter

system were evaluated with cochlear implant subjects. All those systems adjust the level of

the input signal by varying gain. This chapter proposes a new technique, called Adaptive

Loudness Growth Function (ALGF), which adjusts the input dynamic range.

Three configurations of the ALGF were evaluated with cochlear implant subjects in the

laboratory by sentence tests, both with fixed levels and with the adaptive roving-level SRT

test. The performance of the ALGF was compared with the performance of the existing AGC

system, consisting of the front-end tri-loop AGC and ADRO.

12.2 Background

The goal of an AGC system in a hearing device is to improve speech intelligibility by

increasing the audibility of soft components, while keeping loud components at a

comfortable level. This goal is not easily achieved if the input signal is a mixture of a target

and non-target signal, for example speech in the presence of competing noise. When an AGC

compresses noisy speech, the audibility of the desired speech signal suffers. Conversely, an

AGC cannot improve the audibility of low-level speech without amplifying the noise

components.

A typical AGC (§5.2) has a simple level detector, and a gain unit that compresses the signal

level above the compression threshold. ADRO (§5.5.5) uses percentile estimators and a more

sophisticated set of gain rules. As shown in chapters 9 and 10, the performance of these

AGC systems was mixed in noisy conditions. The output SNR can be degraded by an AGC

either by amplifying noise between speech components or by compressing high level speech

components more than interfering noise components. ADRO and ASC employ a noise floor

estimator to control the level of background noise. Yet they cannot satisfy both audibility

and noise reduction goals at the same time.

One approach to reduce the noise level without compromising the audibility of target speech

is to employ a noise reduction algorithm before the gain control algorithms. In that

traditional approach, the dynamic range is fixed and the noise is pushed down out of the

dynamic range by the noise reduction system. The enhanced speech is compressed or

amplified by the subsequent gain algorithm to be within the designated dynamic range. The

AGC system and the noise reduction method operate independently from each other. In this

chapter a new technique is proposed. Unlike conventional AGC systems, the proposed

technique expands or contracts the dynamic range of the LGF. It is hypothesized that

adjusting the dynamic range with the input signal can achieve both audibility and noise

reduction simultaneously. Since the new technique is integral to the LGF and the dynamic

range varies adaptively with the input signal level, it is called the Adaptive Loudness Growth

Function (ALGF).

The inspiration for the ALGF comes from the invention of Neal (2011). Neal

proposed a technique to optimize the input dynamic range by setting the lower end of the

dynamic range, i.e. the base level of the LGF, to the estimated noise floor. This way the

noise reduction is realized without affecting the audibility of a target signal.

All commercially available cochlear implant systems use a fixed input dynamic range.

Holden et al. (2011) recommended, as clinical guidelines, using raised T-levels (> 10% of M-

levels) and providing the recipients with two programs: one with a wide input dynamic

range for soft speech understanding in quiet, and another with a narrow input dynamic range

for noisy conditions. Nucleus cochlear implant systems typically use an input dynamic range

of between 30 and 50 dB. The input dynamic range has an impact on the speech

intelligibility of cochlear implant recipients in different listening conditions (Spahr, Dorman

and Loiselle 2007). For everyday use, there are advantages and limitations associated with

the input dynamic range (Wolfe et al. 2009). A wide dynamic range may be more likely to

facilitate a range of loudness experiences within the small electrical dynamic range of the

cochlear implant user. For example, recipients benefit more from a wide dynamic

range for speech in quiet because low level speech components become more accessible

(James et al. 2003; Dawson et al. 2007; Spahr, Dorman and Loiselle 2007). The effects of a

wide input dynamic range on the intelligibility of noisy speech are mixed. One study shows

no performance difference between different input dynamic range settings (Dawson et al.

2007) but another study shows performance degradation for speech presented in noise

(Spahr, Dorman and Loiselle 2007). A narrow dynamic range may be more useful in noisy

environments because it would partially reduce the noise mapped into the electrical dynamic

range (Wolfe et al. 2009). These studies suggested to the present author that the dynamic

range should be adaptive to the condition of the input signal to maintain the intelligibility.

The aim of the ALGF is to satisfy the requirement of different input dynamic ranges for

different listening scenarios.

[Figure: LGF output magnitude (0 to 1) plotted against filter band amplitude (-50 to 10 dB), with the base level and saturation level marked and annotations "1 degree of freedom" and "2 degrees of freedom"]

Figure 12-1 Degree of freedom for the input signal to move within the dynamic range of
LGF

The ALGF adjusts both the lower and upper end of the input dynamic range according to the

varying level of the input signal. Figure 12-1 depicts the adjustment to the level of the signal

in the input dynamic range by a conventional AGC system and the proposed ALGF. A

conventional AGC provides only one degree of freedom. The signal (shown by a red cross in

the diagram) is either shifted left (towards the base level), or right (towards the saturation

level) by varying the gain. The input dynamic range is fixed. The proposed technique can

adjust the relative position of the signal within the dynamic range, with two degrees of

freedom, by adjusting the base and saturation level independently.

12.3 Implementation of the ALGF

The implementation of the ALGF in Simulink is shown in Figure 12-2. The ALGF consisted

of two main components: the saturation level regulator (SLR) and the base level regulator

(BLR). The saturation level regulator consisted of a fast saturation level regulator (FSLR)

and a slow saturation level regulator (SSLR).

Figure 12-2 Top level Simulink block diagram of ALGF

The logarithmic compression of the Nucleus LGF was described in §4.3.7. The scaling

function of the traditional LGF was calculated by using a fixed saturation and fixed base

level (equation 4.8).

The scaling function of the ALGF used the adaptive base and saturation level and produced

the output between 0 and 1. The adaptive saturation and base level were updated

continuously. The noise floor and hence the base level was calculated independently in each

frequency channel because noise in real life was mostly coloured and distributed differently

in different frequency channels. The ALGF slowly adapted the base level to the estimated

noise floor to reduce the level of noise without introducing processing artifacts such as

musical tones. In addition, slow noise removal preserves access to environmental sounds, even though

some of them are noise-like. The scaling function of the ALGF is described as:

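$$u_k = \min\!\left(1,\ \max\!\left(0,\ \frac{e_k - b_k}{s - b_k}\right)\right) \qquad (12.1)$$

where $e_k$ and $b_k$ are the envelope and the adaptive base level of the frequency channel k

respectively, and $s$ is the adaptive saturation level for all frequency channels.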
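
The scaling step can be sketched in Python as follows. The sketch assumes a linear mapping of each envelope between its channel's base level and the shared saturation level, limited to the range 0 to 1; the exact expression used in equation 12.1 may differ. The function name algf_scale and the example values are hypothetical, and the saturation level is assumed to lie above every base level.

```python
def algf_scale(envelopes, base_levels, saturation_level):
    """Map each channel envelope onto 0..1 between its base level and the shared saturation level."""
    scaled = []
    for env, base in zip(envelopes, base_levels):
        u = (env - base) / (saturation_level - base)
        scaled.append(min(1.0, max(0.0, u)))  # limit the output to the range 0 to 1
    return scaled


# Three channels with different base levels and one common saturation level.
print(algf_scale([0.5, 0.1, 2.0], [0.1, 0.2, 0.1], 1.1))
```

An envelope below its own base level maps to 0 and an envelope above the saturation level maps to 1, so raising a channel's base level towards the noise floor removes low-level noise from that channel only.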

In Figure 12-2, the blocks showing the fast and slow saturation level regulators are coloured

blue and the block showing the base level regulator is coloured yellow. The input to both fast

and slow saturation level regulators was the maximum amplitude of the envelopes across

frequency channels. The fast saturation level regulator worked like a compression limiter and

the slow saturation level regulator worked like a slow AGC. The three orange switches in

Figure 12-2 allowed different configurations of the ALGF. For example, the ALGF could be

configured to employ only an adaptive saturation level or only an adaptive base level. When

both adaptive base and saturation regulators were bypassed, the ALGF became the ordinary

loudness growth function. One of the configurations of the ALGF evaluated with cochlear

implant subjects was the ALGF with a fixed dynamic range setting (§12.4.2.4).

Decreasing the saturation level is equivalent to increasing the gain in a conventional AGC.

For listening comfort, a minimum value was imposed on the dynamic range of the ALGF to

prevent the slow saturation level from approaching the base level when there was no signal

or only low-level signal present at the input. This minimum dynamic range was realized by

restricting the slow saturation level to stay a specified distance above the maximum of base

levels across frequency channels. The final saturation level was the maximum of the fast

saturation level and the slow saturation level. The computational complexity of the ALGF is

similar to that of the existing UGM, ADRO and LGF, which it would replace. The detailed

implementations of the fast and slow saturation level regulators and the base level regulator

are explained in the subsections below.
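
The combination of the two saturation levels and the minimum dynamic range constraint can be sketched as below. Levels are assumed to be expressed in dB, and the function name final_saturation_db and the 20 dB minimum range in the example are hypothetical choices for illustration.

```python
def final_saturation_db(fast_sat_db, slow_sat_db, base_levels_db, min_range_db):
    """Combine the fast and slow saturation levels (all values in dB)."""
    # Keep the slow saturation level at least min_range_db above the highest base level.
    slow_limited_db = max(slow_sat_db, max(base_levels_db) + min_range_db)
    # The final saturation level is the larger of the fast and slow levels.
    return max(fast_sat_db, slow_limited_db)


# Quiet input: the slow level would fall to -45 dB, but the constraint holds it at -30 dB.
print(final_saturation_db(-50.0, -45.0, [-50.0, -60.0], min_range_db=20.0))
# Loud transient: the fast level dominates.
print(final_saturation_db(-5.0, -20.0, [-60.0], min_range_db=20.0))
```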

12.3.1 Fast Saturation Level Regulator

The implementation of the fast saturation level regulator (FSLR) is shown in Figure 12-3.

The operation of the FSLR was similar to the envelope profile limiter (EPL) described in

§10.2.2, except that the saturation level moved in the opposite direction to the gain

in the EPL. The FSLR tracked the maximum level of the channel envelopes over time and

increased the fast saturation level instantaneously if the maximum amplitude exceeded the

fast saturation level from the previous frame. Hence it prevented the channel envelopes from

exceeding the saturation level. Unlike the EPL, the FSLR tracked the maximum level of the

channel envelopes regardless of its relative position from the reference level, for example C-

SPL in the traditional signal path. The potential disadvantage of adjusting the signal level

this way would be the loss of intensity variation in the output signal.

The EPL applied the same amount of gain to all frequency channels to preserve the short-

term spectral profile of the input signal. The saturation level regulator of the ALGF also used

one saturation level for all frequency channels. The other important parameters of the FSLR

were the time constants. The down-slew-rate of the fast saturation level determined how

quickly the system responded to a sudden decrease in the input level. The experimental

results of the EPL (§10.4.1) with the cochlear implant subjects showed that a release time

long enough to maintain the modulation rate of the phonetic entities was necessary to

maintain the speech intelligibility of sentences presented in noise. A down-slew-rate of -40

dB/s (i.e. equivalent to the release time 625 ms) was considered a good choice for a single-

loop AGC. However, the ALGF employed a dual-loop saturation level regulator, therefore a

down-slew-rate faster than -40 dB/s was also acceptable without affecting the speech

intelligibility in noise. The reason for considering a faster down-slew-rate (i.e. a shorter

release time) was to reduce the ‘pumping’ effect that would be noticeable for release times

in the range between 100 ms and 500 ms. Two values of down-slew-rate were tested: -80 and

-300 dB/s.
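
The correspondence between -40 dB/s and a 625 ms release time quoted above is consistent with timing the recovery over a 25 dB level change (25 dB divided by 40 dB/s gives 625 ms). Assuming that the same 25 dB step applies to the other slew rates, which is an inference and is not stated in the text, the equivalent release times can be computed as follows. The function name release_time_ms is hypothetical.

```python
RECOVERY_STEP_DB = 25.0  # assumed level change, inferred from 40 dB/s <-> 625 ms


def release_time_ms(down_slew_db_per_s, step_db=RECOVERY_STEP_DB):
    """Time in milliseconds for the level to fall by step_db at the given slew rate."""
    return 1000.0 * step_db / abs(down_slew_db_per_s)


for rate in (-40.0, -80.0, -300.0):
    print(f"{rate:.0f} dB/s -> {release_time_ms(rate):.1f} ms")
```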

Figure 12-3 Simulink block diagram of fast saturation level regulator

As shown in the block diagram, the FSLR also had a hold timer block. The hold timer

kept the fast saturation level constant for a certain

duration before it was released from the compression mode to track the level of the input

signal.
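
The behaviour of the FSLR described in this subsection (instantaneous rise, hold, then release at the down-slew-rate) can be sketched as a small Python class. This is an illustration and not the Simulink model: levels are assumed to be in dB, and the frame duration, the hold time and the floor level are hypothetical parameters.

```python
class FastSaturationLevelRegulator:
    """Simplified fast saturation level regulator operating on dB levels."""

    def __init__(self, frame_s, down_slew_db_per_s=80.0, hold_s=0.05, floor_db=-60.0):
        self.down_step_db = down_slew_db_per_s * frame_s  # release per frame
        self.hold_frames = round(hold_s / frame_s)        # hold time in frames
        self.floor_db = floor_db                          # lowest allowed level
        self.level_db = floor_db
        self.hold_left = 0

    def step(self, peak_env_db):
        """Update with the maximum channel envelope of the current frame."""
        if peak_env_db > self.level_db:
            # Attack: rise instantaneously and restart the hold timer.
            self.level_db = peak_env_db
            self.hold_left = self.hold_frames
        elif self.hold_left > 0:
            # Hold: keep the level constant for the hold duration.
            self.hold_left -= 1
        else:
            # Release: slew down, but never below the input or the floor.
            self.level_db = max(peak_env_db, self.floor_db,
                                self.level_db - self.down_step_db)
        return self.level_db


fslr = FastSaturationLevelRegulator(frame_s=0.001, hold_s=0.002)
print([fslr.step(x) for x in (-10.0, -30.0, -30.0, -30.0, -5.0)])
```

Because the level rises to the peak envelope within the same frame, the channel envelopes never exceed the saturation level in this sketch.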

12.3.2 Slow Saturation Level Regulator

The Simulink block diagram of the slow saturation level regulator (SSLR) is shown in

Figure 12-4. The objective of the SSLR was to slowly adjust the saturation level so that the

input signal was kept within the upper part of the input dynamic range. Hence the SSLR

improved the audibility of the input signal with minimal distortion on the temporal envelope.

The input signal to the SSLR was the maximum level of the channel envelopes.

Figure 12-4 Simulink block diagram of slow saturation level regulator

The SSLR acted as a feature-based AGC. The main components of the SSLR were a feature

extraction unit that calculated the proportion of clipping, and a level adjustment unit that

adjusted the slow saturation level based on the decision made by the comparison between the

extracted features and the preset thresholds. In this context, clipping is defined as the

maximum envelope exceeding the slow saturation level, i.e. the envelopes would clip if the

FSLR was not present. In other words, clipping proportion was the proportion of time that

the FSLR was active. The level adjustment unit operated on a set of rules as described

below:

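% Rules are evaluated in order of priority.
if clip_prop > clip_threshold                              % Comfort: too much clipping
    slow_sat_level = prev_slow_sat_level * up_slew_rate;   % raise the saturation level
elseif prev_slow_sat_level - peak_env > hold_distance      % Audibility: input too far below
    slow_sat_level = prev_slow_sat_level * down_slew_rate; % lower the saturation level
else                                                       % Maintain
    slow_sat_level = prev_slow_sat_level;
end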

Rules were arranged in the order of importance. The highest priority was given to the

comfort rule. The SSLR increased the slow saturation level if the proportion of clipping

exceeded the threshold. If the input signal satisfied the comfort criterion, then the SSLR

checked if the audibility criterion was met. The SSLR compared a slow-moving envelope of

the input signal and the slow saturation level from the previous frame. A switch was placed

to choose either the slow envelope or the RMS level of the input signal for checking the

audibility criterion. If the slow saturation level was above the slow envelope of the input

signal by more than a certain amount (i.e. the hold distance), then the audibility

criterion was not met, and the SSLR decreased the slow saturation level to boost the audibility

of the input signal. If both the comfort and audibility criteria were met, the SSLR kept the

slow saturation level at the previous level.

It is more important to reduce the amount of clipping than to increase the audibility (as per

the precedence of the rules). Therefore the up-slew-rate of the SSLR was set considerably

faster than the down-slew-rate. Rather than using fixed time constants as in other AGC

systems, the up-slew-rate was adaptively changed with the proportion of clipping. This

advanced feature allowed the slow saturation level to be more responsive to the variation in

the input signal.

12.3.2.1 Clipping proportion

Figure 12-5 shows the Simulink block diagram of the clipping proportion calculator. The

input signal to the calculation unit was the maximum amplitude of the channel envelopes.

The SSLR counted the number of envelope samples above the slow saturation level within a

frame and divided by the total number of samples in the frame.
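
The calculation can be sketched in a few lines of Python. The function name clipping_proportion is hypothetical, and the envelope samples and the saturation level are assumed to be in the same units.

```python
def clipping_proportion(peak_envelopes, slow_sat_level):
    """Fraction of samples in a frame whose maximum envelope exceeds the slow saturation level."""
    if not peak_envelopes:
        return 0.0  # an empty frame contains no clipping
    clipped = sum(1 for e in peak_envelopes if e > slow_sat_level)
    return clipped / len(peak_envelopes)


# Two of the four samples exceed the saturation level of 1.0.
print(clipping_proportion([0.2, 0.9, 1.2, 1.5], 1.0))
```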

Figure 12-5 Simulink block diagram of clipping proportion calculation

12.3.2.2 Adaptive up-slew-rate

A high proportion of clipping could result in loudness discomfort and speech intelligibility

reduction, especially in noise. To make the up-slew-rate adaptive to the input signal, the

proportion of clipping itself was used as a multiplier: the up-slew-rate of the SSLR was

multiplied by the proportion of clipping, so that the rate became faster for a larger

proportion of clipping.
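
A sketch of one SSLR update with the adaptive up-slew-rate is given below. It assumes that all levels are in dB, so that slewing is additive, and that the update runs once per frame. The function name sslr_step and all default parameter values are hypothetical, apart from the hold distance, which lies within the 10 to 15 dB range used in the experiments.

```python
def sslr_step(prev_slow_sat_db, slow_env_db, clip_prop, frame_s,
              clip_threshold=0.05, hold_distance_db=15.0,
              max_up_slew_db_per_s=100.0, down_slew_db_per_s=5.0):
    """Return the new slow saturation level in dB for one frame."""
    if clip_prop > clip_threshold:
        # Comfort rule: rise at a rate scaled by the proportion of clipping.
        return prev_slow_sat_db + max_up_slew_db_per_s * clip_prop * frame_s
    if prev_slow_sat_db - slow_env_db > hold_distance_db:
        # Audibility rule: the input is too far below the saturation level.
        return prev_slow_sat_db - down_slew_db_per_s * frame_s
    # Both criteria are met: maintain the previous level.
    return prev_slow_sat_db


light = sslr_step(-20.0, -25.0, clip_prop=0.1, frame_s=0.01)
heavy = sslr_step(-20.0, -25.0, clip_prop=0.5, frame_s=0.01)
print(light, heavy)
```

With the same starting level, the frame with the larger proportion of clipping raises the saturation level by a larger step.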

12.3.2.3 Hold distance

The purpose of the hold distance in the level determination rules of the SSLR was to
stabilize the slow saturation level. It also helped to preserve the slow temporal modulation
of the envelopes in each frequency channel: the temporal envelope of the output signal would
have less modulation if the saturation level modulated together with the input signal. In a
listening condition with a roving-level input signal, for instance a conversation between the
recipient and another person, it could be perceptually annoying for cochlear implant
recipients if they noticed the algorithm frequently adjusting for changes in the overall level
of the input signal. The overall levels of the two voices differ at the microphone of the
sound processor because the two talkers are at different distances from it. According to the
inverse square law, the sound pressure level decreases by about 6 dB for every doubling of
distance in the free field. The distance between the behind-the-ear sound processor and the
recipient’s mouth is approximately 0.2 m. In a hypothetical listening situation where another
person was one metre away from the recipient, the sound pressure level difference between
the two voices, measured at the microphone, could be approximately 15 dB.
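
As a rough check of this estimate, the free-field level difference between two distances can be computed directly. The sketch below assumes ideal point sources of equal output level, which real talkers are not; the function name is illustrative.

```python
import math


def level_difference_db(near_m, far_m):
    """Free-field SPL drop (dB) when moving from near_m to far_m from a point source."""
    return 20.0 * math.log10(far_m / near_m)


per_doubling = round(level_difference_db(1.0, 2.0), 1)   # 6.0 dB per doubling
own_vs_other = round(level_difference_db(0.2, 1.0), 1)   # 14.0 dB for 0.2 m against 1 m
```

The idealized figure of about 14 dB is of the same order as the approximate 15 dB quoted above.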

The hold distance could be set between 10 and 15 dB in the experiments. The diagrams in
Figure 12-6 show the slow saturation level, with and without the hold distance, as the input
roved between two stimuli simulating a typical conversation between the recipient and
another person (Mr. X). In this simulation, the envelope waveform between 8 and 16.2
seconds belongs to the recipient and that between 16.2 and 24 seconds belongs to Mr. X.
The diagrams show that using the hold distance avoided modulation in the slow saturation
level.

Figure 12-6 Simulated listening condition showing the slow saturation level with the
hold distance of 0 dB (top panel) and 15 dB (bottom panel)

12.3.3 Base Level Regulator

The Simulink block diagram of the base level regulator (BLR) is shown in Figure 12-7.
Rather than using the estimated noise floor directly as the base level, the BLR adjusted the
base level towards the estimated noise level, which allowed it to control the speed of noise
tracking. The BLR increased the base level slowly when the noise floor was above the base
level and reduced it quickly when the noise floor was below the base level; that is, the
up-slew-rate of the BLR was considerably slower than the down-slew-rate. Like any noise
reduction algorithm, the BLR could introduce two types of error: overestimation and
underestimation of the true noise power. Overestimation could distort the input signal,
whereas underestimation could leave the output noisy. The down-slew-rate was made faster
than the up-slew-rate so that the overestimation error was reduced more than the
underestimation error.
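
The asymmetric tracking described above can be sketched as a slew-rate-limited follower. This is an illustration, not the Simulink implementation: the function name, the per-call time step and the clamping of each step at the noise floor (to avoid overshoot) are assumptions, and the default rates are those listed for the ALGF-1 in Table 12-2.

```python
def blr_step(base_db, noise_floor_db, dt, up_slew=3.0, down_slew=-300.0):
    """One update of the base level (dB) towards the estimated noise floor (dB)."""
    if noise_floor_db > base_db:
        # Noise floor above the base level: creep upwards slowly.
        return min(base_db + up_slew * dt, noise_floor_db)
    if noise_floor_db < base_db:
        # Noise floor below the base level: drop quickly.
        return max(base_db + down_slew * dt, noise_floor_db)
    return base_db
```

With these rates, a 10 dB rise in the noise floor takes more than three seconds to follow, whereas a 10 dB fall is followed in about 33 ms.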

Figure 12-7 Simulink block diagram implementation of base level regulator

Any sub-band noise floor estimation algorithm can be applied in the BLR. The first noise
estimator applied in the BLR was Lin’s time-recursive averaging noise estimation method
(§5.6.2). Later, a new noise estimator was proposed by incorporating a minima-controlled
feature into Lin’s noise estimator.

12.3.3.1 Lin’s time-recursive averaging sub-band noise floor estimator

The Simulink block diagram of Lin’s time-recursive averaging sub-band noise estimation
algorithm is shown in Figure 12-8. The smoothing parameter calculation procedure is shown
in Figure 12-9.

Figure 12-8 Simulink block diagram of Lin's recursive averaging noise floor estimator

Figure 12-9 Simulink block diagram of smoothing parameter calculation

The algorithm was adapted for incorporation into the BLR. As shown in Figure 12-8, the
power of the input stimuli was smoothed by a first-order IIR filter (equation 5.6). The
coefficient of this smoothing filter was updated for every input signal (equation 5.7 and
Figure 12-9) and was limited to a maximum of 0.9999 and a minimum of 0.8. The maximum
was kept below one to avoid a deadlock in which the estimated noise floor would be updated
only with its previous value. The minimum of 0.8, on the other hand, slowed the algorithm
down to avoid overestimating the noise floor. The coefficient of the final smoothing filter of
the noise power estimate was set to 0.9667 (equation 5.8). The magnitude of the estimated
noise was obtained by taking the square root of the estimated noise power. The filter
coefficients and the limit values were chosen empirically. Compared to the original method,
the adapted noise estimator used in the BLR was less aggressive in updating the noise floor
with the noisy signal power, which reduced the overestimation error.
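
The role of the coefficient limits can be illustrated with a generic first-order recursive average. The sketch below does not reproduce equations 5.6 to 5.8: the rule that computes the raw coefficient is left to the caller, and the function names are hypothetical. Only the limit values of 0.8 and 0.9999 are taken from the text.

```python
import math

ALPHA_MIN = 0.8      # lower limit on the smoothing coefficient (from the text)
ALPHA_MAX = 0.9999   # upper limit, kept below one to avoid the deadlock


def clamp_alpha(raw_alpha):
    """Limit the smoothing coefficient to the range [ALPHA_MIN, ALPHA_MAX]."""
    return min(max(raw_alpha, ALPHA_MIN), ALPHA_MAX)


def recursive_average(previous, new_value, alpha):
    """Generic first-order IIR smoothing: alpha weights the previous output."""
    return alpha * previous + (1.0 - alpha) * new_value


def noise_magnitude(noise_power):
    """Magnitude of the noise estimate as the square root of its power."""
    return math.sqrt(noise_power)


# With an unclamped coefficient of exactly one, the new input is ignored
# entirely, which is the deadlock the upper limit is meant to prevent.
stuck = recursive_average(1.0, 100.0, 1.0)                  # stays at 1.0
moving = recursive_average(1.0, 100.0, clamp_alpha(1.0))    # moves slightly towards 100
```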

12.3.3.2 New minima-controlled recursive averaging noise floor

estimator

Lin’s time-recursive averaging method could follow changes in the noisy signal power, yet
it often overestimated the noise power during speech activity. A control feature was
therefore added to Lin’s noise floor estimator to reduce the overestimation error. Since it
monitored the minimum level of the input stimuli to control Lin’s recursive averaging
method, the result was called a minima-controlled recursive averaging (MCRA) noise
estimation algorithm. The Simulink diagram of the MCRA noise estimation algorithm is
shown in Figure 12-10. The minima-controlled feature is coloured pink in the block diagram.

Figure 12-10 Simulink block diagram of the proposed MCRA noise floor estimator

Figure 12-11 Simulink block diagram of the minima-controlled feature in the proposed
MCRA noise estimation algorithm

Figure 12-11 shows the implementation of the minima-controlled feature. The updated noise
estimate at the output of the low-pass IIR filter was compared with the minimum of the
smoothed signal power. The minimum signal power was obtained at each frame by
comparing the smoothed signal power at the current frame with that from the previous
frame. At the end of each search period, the minimum signal power was updated with the
new signal power and the search continued. This update was restricted to a maximum of
10 dB to avoid transients. The noise estimate was then updated by taking the weighted
average of the estimated noise power and the minimum signal power, with the coefficient of
the weighted average filter empirically set to 0.8. The final noise power estimate was the
minimum of the noise power estimates before and after the minimum search procedure.
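
One way to read the procedure above is sketched below in Python. Several details are assumptions rather than facts from the Simulink model: the search period is counted in frames, the 10 dB restriction is applied as a ceiling relative to the previous minimum, powers are linear, and the weight of 0.8 is placed on the existing noise estimate. All names are hypothetical.

```python
def track_minimum(current_min, smoothed_power, frame_index, search_frames,
                  max_rise_db=10.0):
    """Running minimum of the smoothed power with a reset every search period."""
    if frame_index > 0 and frame_index % search_frames == 0:
        # End of a search period: restart from the new power, but let the
        # minimum rise by no more than max_rise_db (a factor of 10 in power).
        ceiling = current_min * 10.0 ** (max_rise_db / 10.0)
        return min(smoothed_power, ceiling)
    return min(current_min, smoothed_power)


def minima_controlled(noise_power, min_power, weight=0.8):
    """Blend the noise estimate with the tracked minimum, then keep the lower value."""
    blended = weight * noise_power + (1.0 - weight) * min_power
    return min(noise_power, blended)
```

Under these assumptions the feature can only lower the noise estimate, never raise it, which is consistent with its stated purpose of reducing overestimation.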

Because of the minima-controlled procedure, the noise power estimate could be biased
towards lower values. Bias compensation was therefore applied as described for Martin’s
minimum statistics noise estimator (§5.6.1).

12.4 Offline Data Analysis

12.4.1 Comparison of the Noise Estimators

This section evaluates the performance of the noise floor estimators used in the base level
regulator: (i) Lin’s recursive averaging noise estimator and (ii) the proposed
minima-controlled recursive averaging noise estimator. For comparison, a noise estimator
using Martin’s minimum statistics method was also evaluated.

The algorithms were run offline in Simulink. Fifteen sentences were concatenated with 0.5
seconds of silence before and after each sentence. Noise was presented with no silent gaps.
Two test conditions, fixed-level and roving-level sentences, were used to observe the
effectiveness of the noise estimators. The SNR was fixed at 8 dB for each test condition. A
histogram of the normalized error between the magnitudes of the actual noise and the
estimated noise was generated. The normalized error was calculated for each frequency
channel as:

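\[
\text{normalized error} = \frac{|\text{true noise}| - |\text{estimated noise}|}{|\text{true noise}|}
\tag{12.2}
\]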

The magnitude of the true noise floor was smoothed first. The normalized error could range
from -∞ to 1. A normalized error of zero indicates ideal noise estimation, i.e. the magnitude
of the estimated noise is equal to that of the true noise. A positive normalized error indicates
underestimation of the true noise; conversely, a negative normalized error indicates
overestimation. A normalized error approaching the upper limit of one indicates that the
magnitude of the estimated noise is far below that of the true noise. A normalized error of -1
indicates that the magnitude of the estimated noise floor is twice that of the true noise.
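
The properties listed above follow from defining the normalized error as one minus the ratio of the estimated to the true noise magnitude. The short Python sketch below, with illustrative function names, checks the stated limits and relates the error to a level offset in dB.

```python
def normalized_error(true_mag, est_mag):
    """(true - estimated) / true, for a strictly positive true magnitude."""
    return (true_mag - est_mag) / true_mag


def error_for_offset_db(offset_db):
    """Normalized error when the estimate lies offset_db below the true noise."""
    return normalized_error(1.0, 10.0 ** (-offset_db / 20.0))


ideal = normalized_error(1.0, 1.0)      # 0: perfect estimate
doubled = normalized_error(1.0, 2.0)    # -1: estimate is twice the true noise
vanishing = normalized_error(1.0, 0.0)  # 1: estimate negligible
ten_db_low = error_for_offset_db(10.0)  # about 0.68
```

An estimate that sits 10 dB below the true noise therefore gives a normalized error of about 0.68, and one that sits 20 dB below gives 0.9.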

12.4.1.1 Fixed Presentation Level

Three different types of noise signal were used in this simulation: four-talker babble noise,
city noise and Long-Term Average Speech-Shaped (LTASS) noise. The noises were
concatenated as shown in Figure 12-12.

[Plot for Figure 12-12: speech and noise waveforms, amplitude (-0.4 to 0.4) against time (8 to 16 s), in three consecutive segments: speech in four-talker babble, speech in city noise and speech in LTASS noise.]

Figure 12-12 Fixed-level sentences presented with three types of noise: four-talker
babble, city noise and LTASS noise

Figure 12-13 shows the magnitude of noisy speech, actual noise and estimated noise by the

three noise estimators at the output of the frequency channels with the centre frequency (Fc)

of 367 Hz, 1101 Hz and 4282 Hz.

[Plots for Figure 12-13: three charts titled "Martin's minimum statistics noise floor estimation", "Lin's recursive averaging noise floor estimation" and "Proposed minima-controlled recursive averaging noise floor estimation"; each shows noisy speech, actual noise and estimated noise in dB FS (-70 to -20) against time (8 to 16 s) for Fc = 4282 Hz, 1101 Hz and 367 Hz.]

Figure 12-13 Estimation of three different noises presented at the fixed level by
Martin’s minimum statistics method (top panel), Lin’s recursive averaging method
(middle panel) and the proposed MCRA method (bottom panel)

[Plot for Figure 12-14: "Normalized Noise Estimation Error"; PDF (0 to 0.12) against normalized error (-1 to 1) for Martin's min stats, Lin's RA and New MCRA, in the channels Fc = 4282 Hz, 1101 Hz and 367 Hz.]

Figure 12-14 Probability density functions of the normalized error for estimating fixed
level noises

Figure 12-14 shows the probability density functions of the normalized estimation error
produced by each noise estimator in the frequency channels centred at 367 Hz, 1101 Hz and
4282 Hz. The RMS level of the noise was fixed at the same level for all three types of noise.
With Martin’s minimum statistics algorithm and the new MCRA method, the normalized
error was mainly concentrated between 0.4 and 1. Lin’s recursive averaging method, on the
other hand, produced a normalized error distribution centred at about 0 and spread evenly
across the range. Compared to the other two noise estimators, Lin’s method showed
considerably less underestimation error. However, it produced more negative normalized
error than the other two noise estimators.

The top and bottom panels of Figure 12-13 show that the noise estimated by the minimum
statistics method and the new MCRA method could follow the variation in the true noise
floor. However, the overall level was approximately 10 dB lower than the true noise floor.
The middle panel of Figure 12-13 shows that Lin’s recursive averaging method tracked the
true noise floor of all three types of noise reasonably well, although overestimation
occasionally occurred.

12.4.1.2 Roving presentation level

Two different types of noise were used: four-talker babble and LTASS noise. Both the

sentences and the background noise were roved together in a sequence of 50, 65 and 80 dB

SPL as shown in Figure 12-15.

[Plots for Figure 12-15: two waveform charts of speech and noise, amplitude (-1.5 to 1.5) against time (8 to 16 s).]

Figure 12-15 Roving-level sentences presented in four-talker babble (top panel), and
LTASS noise (bottom panel)

Figure 12-16 shows the magnitude of the noisy speech, the true noise and the noise
estimated by the three candidate noise estimators for the base level regulator. The speech
and noise were analyzed in the frequency channels with centre frequencies (Fc) of 367 Hz,
1101 Hz and 4282 Hz.

[Plots for Figure 12-16: three charts titled "Martin's minimum statistics noise floor estimation", "Lin's recursive averaging noise floor estimation" and "Proposed minima-controlled recursive averaging noise floor estimation"; each shows noisy speech, actual noise and estimated noise in dB FS (-70 to -20) against time (8 to 16 s) for Fc = 4282 Hz, 1101 Hz and 367 Hz.]

Figure 12-16 Estimation of roving-level four-talker babble noise by: Martin’s minimum
statistics method (top panel), Lin’s recursive averaging method (middle panel) and the
proposed MCRA method (bottom panel)

The results show that the minimum statistics algorithm and the proposed MCRA algorithm
were slow to track the noise floor when the presentation level of the noisy speech was
stepped up by 15 dB. In comparison, Lin’s recursive averaging method tracked the noise
floor more effectively.

[Plot for Figure 12-17: "Normalized Noise Estimation Error - 4-talker babble"; PDF (0 to 0.12) against normalized error (-1 to 1) for Martin's min stats, Lin's RA and New MCRA, in the channels Fc = 4282 Hz, 1101 Hz and 367 Hz.]

Figure 12-17 Probability density function of the normalized error for estimating the
roving-level four-talker babble noise

Figure 12-17 shows the probability density functions of the normalized estimation error
produced by each noise estimator in the three frequency channels centred at 367 Hz,
1101 Hz and 4282 Hz. With Martin’s minimum statistics algorithm and the new MCRA
method, the normalized error was mainly concentrated between 0.8 and 1, with peaks at
about 0.95. The highest probability density at 0.95 indicates that both algorithms
underestimated the true noise magnitude. The top and bottom panels of Figure 12-16 show
that the noise estimated by Martin’s minimum statistics method and the new MCRA method
was 10 to 20 dB below the true noise floor, especially when the presentation level was roved
up. Lin’s recursive averaging method, on the other hand, produced a more evenly distributed
normalized error. The middle panel of Figure 12-16 shows that the noise estimated by Lin’s
recursive averaging method could effectively track the true noise floor of four-talker babble
noise, although the estimated noise floor was occasionally above the true noise. Based on the
comparison of the error distribution curves, Lin’s recursive averaging method was the most
effective of the three noise estimators for tracking roving-level babble noise.

Figure 12-18 shows the magnitude of the noisy speech, the true noise and the estimated
noise when the stationary LTASS noise was used.

[Plots for Figure 12-18: three charts titled "Martin's minimum statistics noise floor estimation", "Lin's recursive averaging noise floor estimation" and "Proposed minima-controlled recursive averaging noise floor estimation"; each shows noisy speech, actual noise and estimated noise in dB FS (-70 to -20) against time (8 to 16 s) for Fc = 4282 Hz, 1101 Hz and 367 Hz.]

Figure 12-18 Estimation of roving-level LTASS noise from the noisy speech by:
Martin’s minimum statistics method (top panel), Lin’s recursive averaging method
(middle panel) and the proposed MCRA method (bottom panel)

The results show that Martin’s minimum statistics method and the proposed MCRA method
were slow to track the noise floor when the presentation level was stepped up by 15 dB.
Lin’s recursive averaging method, on the other hand, tracked the noise floor more
effectively than the other two methods.

[Plot for Figure 12-19: "Normalized Noise Estimation Error - LTASS"; PDF (0 to 0.12) against normalized error (-1 to 1) for Martin's min stats, Lin's RA and New MCRA, in the channels Fc = 4282 Hz, 1101 Hz and 367 Hz.]

Figure 12-19 Probability density function of the normalized error for estimating
roving-level LTASS noise

Figure 12-19 shows the probability density functions of the normalized estimation error
produced by each noise estimator in the three selected frequency channels. With Martin’s
minimum statistics algorithm, the normalized error was mainly concentrated between 0.8
and 1. The error PDF showed two peaks, and the peak at almost 1 was suspected to result
from a very low noise estimate at the beginning of an increased presentation level. Lin’s
recursive averaging method showed an error density spread across the entire range. The
normalized error showed two peaks and was concentrated mostly between -0.4 and 0.4. A
small peak at about 0.8 indicates that Lin’s algorithm underestimated the true noise at the
beginning of the sentences when the presentation level was stepped up. In the negative
region, the error density produced by Lin’s method was higher than that of the other two
methods, which indicates occasional overestimation of the true noise floor. The error PDF of
the proposed MCRA lay between the error PDFs of the other two methods. Compared to
Lin’s algorithm, its normalized error was still concentrated more in the positive region,
peaking at about 0.7. According to the comparison of the normalized error densities, Lin’s
recursive averaging method tracked the roving-level stationary noise most effectively. The
other two noise estimators showed significant underestimation error.

12.4.2 Processing Conditions

The following signal processing conditions were analyzed. The same processing conditions

were evaluated with cochlear implant subjects in the laboratory:

• Tri + ADRO
• ALGF-1
• ALGF-2
• ALGF-2F

The manual sensitivity and LGF settings were common to all processing conditions and are
listed in Table 12-1.

Other program parameter     Value   Unit
Sensitivity                 12
C-SPL                       65      dB SPL
Dynamic range of the LGF    40      dB
LGF Q value                 20

Table 12-1 Setting of other program parameters

12.4.2.1 Tri + ADRO

Tri + ADRO represents the existing AGC system of the Nucleus signal path. It consists of
the front-end tri-loop AGC (§5.5.4) and ADRO (§5.5.5). The standard setting of ADRO was
used (James et al. 2002). The parameter settings of the tri-loop AGC are shown in the third
column of Table 5-2.

12.4.2.2 ALGF-1

ALGF-1 was the first version of ALGF evaluated with cochlear implant recipients. The

ALGF-1 consisted of the fast and slow saturation level regulators and the base level

regulator.

The parameter settings of the ALGF-1 are shown in Table 12-2. The down-slew-rate of the
FSLR was equivalent to a release time of 312.5 ms. In the offline analysis of the noise
estimators with different noises (§12.4.1), Lin’s recursive averaging noise floor estimator
showed occasional overestimation. Hence, the BLR of the ALGF-1 conservatively used an
up-slew-rate of 3 dB/s to reduce the overestimation error.

ALGF Parameter                            ALGF-1   ALGF-2          ALGF-2F   Unit
Minimum dynamic range                     20       20              40        dB
Q (steepness factor)                      20       20              20
Fast Saturation Level Regulator
  Down-slew-rate                          -80      -300            -300      dB/s
Slow Saturation Level Regulator
  Down-slew-rate                          -9       -12             -12       dB/s
  Maximum up-slew-rate                    90       60              60        dB/s
  Frame duration for clipping proportion  100      100             100       ms
  Clipping proportion threshold           0        0               0         %
  Hold distance                           20       15              15        dB
Base Level Regulator
  Minimum base level                      -        -               -40       dBFS
  Down-slew-rate                          -300     -300            -         dB/s
  Up-slew-rate                            3        10              -         dB/s
  Noise estimator                         Lin’s    MCRA            -
  Noise floor bias compensation           0        (Martin 2001)   -         dB
  Enabled fixed dynamic range             False    False           True      Boolean
  Fixed dynamic range                     -        -               40        dB

Table 12-2 Parameter setting of three configurations of ALGF

12.4.2.3 ALGF-2

ALGF-2 was the second version of ALGF evaluated with cochlear implant recipients. The

ALGF-2 employed the same saturation level regulator as ALGF-1, but with different

parameter settings. The parameter setting of the ALGF-2 is shown together with the other

two ALGF versions in Table 12-2. The base level regulator of the ALGF-2 employed the

proposed minima-controlled recursive averaging (MCRA) noise floor estimator.

Since the ALGF used a dual-loop saturation level regulator, it was anticipated that a faster
down-slew-rate in the fast saturation level regulator would not be too detrimental to speech
intelligibility. Hence, a faster down-slew-rate was employed in the ALGF-2. The
down-slew-rate of -300 dB/s is equivalent to a release time of 83.3 ms.
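
The two equivalences quoted in this section (-80 dB/s for 312.5 ms and -300 dB/s for 83.3 ms) are both reproduced if the release time is taken as the time needed to recover 25 dB at the given slew rate. The 25 dB figure is inferred here from those two pairs of numbers and is not stated in the text; the function name is illustrative.

```python
def release_time_ms(slew_rate_db_per_s, recovery_db=25.0):
    """Time (ms) to recover recovery_db at a constant slew rate (sign ignored)."""
    return 1000.0 * recovery_db / abs(slew_rate_db_per_s)


algf1_release = release_time_ms(-80.0)    # 312.5 ms
algf2_release = release_time_ms(-300.0)   # about 83.3 ms
```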

The ALGF-2 aimed to achieve more audibility; hence a hold distance of 15 dB was used. To
give a fair comparison between the two processing conditions, Tri + ADRO and ALGF-2,
the fixed channel gains (§4.3.5) were bypassed in the ALGF-2 processing, as in the
Tri + ADRO processing.

In the offline analysis of the noise estimators with different noises, the new MCRA noise
floor estimator showed little to no overestimation. Hence, the BLR of the ALGF-2 used a
higher up-slew-rate of 10 dB/s.

12.4.2.4 ALGF-2F

The ALGF-2F is similar to the ALGF-2, but with a fixed dynamic range setting, which
bypassed the BLR. Since it employed only the adaptive saturation level regulator, it behaved
like a normal AGC. The base level in this configuration was calculated as the saturation
level (in dB) minus the fixed dynamic range (in dB). Comparing the performance of the
ALGF-2 and the ALGF-2F shows the importance of using the estimated noise floor as the
base level.
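
The fixed dynamic range rule is a simple subtraction in the dB domain, which corresponds to a constant ratio in the magnitude domain. The Python sketch below uses illustrative function names and the 40 dB value from Table 12-2.

```python
def fixed_range_base_level_db(saturation_db, fixed_dynamic_range_db=40.0):
    """Base level (dB) held a fixed dynamic range below the saturation level."""
    return saturation_db - fixed_dynamic_range_db


def fixed_range_base_magnitude(saturation_mag, fixed_dynamic_range_db=40.0):
    """The same rule for linear magnitudes: 40 dB is a factor of 100."""
    return saturation_mag * 10.0 ** (-fixed_dynamic_range_db / 20.0)


base_db = fixed_range_base_level_db(-10.0)    # -50.0 dB
base_mag = fixed_range_base_magnitude(1.0)    # 0.01
```

Because the base level follows the saturation level at a constant distance, the background noise is not tracked separately in this configuration.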

Figure 12-20 shows the Simulink block diagram of the ALGF-2F. A switch colored orange

near the BLR routed either a fixed base level or an adaptive base level as per the estimated

noise floor.

Figure 12-20 Simulink block diagram of the ALGF with a fixed dynamic range setting

The ALGF-2F used a fixed dynamic range of 40 dB. This value was chosen because it was a
typical dynamic range setting used in the Nucleus signal path with the existing AGC system.
The parameter settings of the ALGF-2F are shown in Table 12-2.

12.4.3 Offline Performance Analysis of the Gain Algorithms

This section investigates the performance of the Nucleus signal path with the existing AGC
system and with the three versions of the ALGF by visualizing the input and output signals.
The implementation of the signal path in the Nucleus-xPC system is shown in Figure 12-21.
The same processing conditions were also evaluated with cochlear implant subjects in the
laboratory.

Figure 12-21 Simulink block diagram of the Nucleus signal path with the existing AGC
systems and the ALGF

Roving-level sentences were presented in LTASS noise at 8 dB SNR. The input represented
a stimulus produced by the roving-level SRT test (§7.4.4). The presentation levels of the
sentences were 50, 65 and 80 dB SPL. The background noise was presented for 3 seconds
before and after each sentence. In the actual testing with the recipients, a beep was presented
two seconds before each sentence to alert the recipients. To reproduce those stimuli, a beep
was also included in the test stimuli of the offline data analysis.
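
For readers who wish to build comparable stimuli, the sketch below shows one common way of scaling noise to a target SNR and of roving the overall level. It is not the procedure of the roving-level SRT test: the RMS-based SNR definition, the 65 dB SPL reference for the roving gain and all function names are assumptions.

```python
import math


def rms(samples):
    """Root-mean-square value of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def scale_noise_to_snr(speech, noise, snr_db=8.0):
    """Scale the noise so that the RMS speech-to-noise ratio equals snr_db."""
    gain = rms(speech) / (rms(noise) * 10.0 ** (snr_db / 20.0))
    return [gain * n for n in noise]


def roving_gain(level_db_spl, reference_db_spl=65.0):
    """Linear gain that moves a stimulus from the reference level to level_db_spl."""
    return 10.0 ** ((level_db_spl - reference_db_spl) / 20.0)


speech = [1.0, -1.0, 1.0, -1.0]
noise = scale_noise_to_snr(speech, [0.5, -0.5, 0.5, -0.5])
achieved_snr_db = 20.0 * math.log10(rms(speech) / rms(noise))
```

Applying the same roving gain to the speech and to the scaled noise changes the presentation level while leaving the SNR unchanged.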

12.4.3.1 Tri + ADRO

The input to the front-end tri-loop AGC was a time-domain waveform and the input signal to
ADRO was a vector of filterbank outputs. The output signal was taken at the output of the
loudness growth function before mapping.

Figure 12-22 shows the simulated input, output and gain signals of the Tri + ADRO
processing in the frequency channels centred at 367 Hz, 1101 Hz and 4282 Hz. The bottom
plot in each panel shows the input signal, the middle plot shows the gain signals and the top
plot shows the signals at the output of the LGF. The input signal plot shows that the entire
sentence was above the saturation level of the LGF at the 80 dB SPL presentation level. The
sentence presented at 65 dB SPL was in the upper part of the input dynamic range between
the base and saturation levels, and speech occasionally exceeded the saturation level. The
sentence presented at 50 dB SPL resided in the lower part of the dynamic range. The fast and
medium AGCs of the tri-loop AGC were active at 65 dB SPL, whereas ADRO was not very
active at this level. All three AGCs of the tri-loop AGC and ADRO were active at 80 dB
SPL, and the input signal at 80 dB SPL was reduced to within the input dynamic range. The
gain diagram of the tri-loop AGC shows that, at 80 dB SPL, the fast AGC operated at the
component level, the medium AGC at the sentence level and the slow AGC on the overall
presentation level. At 50 dB SPL, only ADRO was active, providing positive gain to
improve audibility; the tri-loop AGC was released from compression mode at this level.

[Plots for Figure 12-22: for each of Fc = 367 Hz, 1101 Hz and 4282 Hz, the LGF output (0 to 1), the gain in dB of ADRO and of the tri-loop AGC (Tri), and the input in dB SPL with the fixed saturation and base levels, against time (10 to 30 s).]

Figure 12-22 Input, output and gain signals produced from the Nucleus signal path
with Tri + ADRO. There are three sentence presentations in the figure, having levels of
65, 80 and 50 dB SPL. Each presentation consists of three seconds of noise, then the
sentence, and then three seconds of noise. Signals at the frequency channels centred at
367 Hz (top panel), 1101 Hz (middle panel) and 4282 Hz (bottom panel) were analyzed.

12.4.3.2 ALGF-1

Figure 12-23 shows the input signals, the output signals and the base and saturation levels of
the ALGF-1 operating in the signal path shown in Figure 12-21. The bottom plot in each
panel shows the input signal together with the adaptive saturation level and adaptive base
level of the ALGF. The top plot shows the signals at the output of the ALGF. The ALGF
output at the beginning of the roved-up noise was noticeably high, because the noise
estimator took a few seconds to respond to a sudden increase in the overall presentation
level. After about two seconds, however, the noise level at the output decreased, becoming
considerably lower than in the Tri + ADRO processing. The saturation level increased at the
onset of the beep and stayed there for the duration of the presentations at 65 and 80 dB SPL.
It was then gradually reduced when the sentence presentation level decreased from 80 to
50 dB SPL. The release of the ALGF-1 was complete at approximately the beginning of the
sentence at 50 dB SPL. The ALGF-1 could be released at a faster rate, at the risk of a
pumping effect.

[Plots for Figure 12-23: for each of Fc = 367 Hz, 1101 Hz and 4282 Hz, the LGF output (0 to 1) and the input in dB SPL with the adaptive and fixed saturation and base levels, against time (10 to 30 s).]

Figure 12-23 Input, output and gain signals produced from the Nucleus signal path
with the ALGF-1. There are three sentence presentations in the figure, having levels of
65, 80 and 50 dB SPL. Each presentation consists of three seconds of noise, then the
sentence, and then three seconds of noise. Signals at the frequency channels centred at
367 Hz (top panel), 1101 Hz (middle panel) and 4282 Hz (bottom panel) were analyzed.

12.4.3.3 ALGF-2

Figure 12-24 shows the input signals, output signals, base level and saturation level of the ALGF-2. Compared with the ALGF-1, the overall saturation level of the ALGF-2 was closer to the input signal because a shorter hold distance was used in the slow saturation level regulator: the hold distance was set to 20 dB in the ALGF-1 and 15 dB in the ALGF-2. The shorter hold distance should provide more audibility in the output signal.

Although Lin's recursive-averaging noise estimator in the ALGF-1 could track the noise floor faster than the new MCRA noise estimator in the ALGF-2 (§12.4.1.2), the up-slew-rate of the base level regulator in the ALGF-1 was deliberately made slow to reduce potential overestimation of the true noise floor. The up-slew-rate of the base level regulator in the ALGF-2 was at least three times faster than that of the ALGF-1. Hence, the noise tracking performance of the two ALGFs was similar. The down-slew-rate of the slow saturation level regulator of the ALGF-2 was faster than that of the ALGF-1. As a result, the compressive mode following the 80 dB SPL presentation finished approximately one second before the next sentence was presented at 50 dB SPL.
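The effect of a faster down-slew-rate can be illustrated with a minimal sketch of a slew-rate-limited level tracker. This is not the ALGF implementation: the function names, the 10 ms time step, the 0.5 dB tolerance and the 6 dB/s comparison rate are assumptions chosen for illustration. Only the -12 dB/s down-slew-rate of the slow saturation level regulator of the ALGF-2 is taken from this chapter, and hold logic is ignored.

```python
def slew_limited_track(targets_db, start_db, up_rate_db_s, down_rate_db_s, dt_s):
    """Follow targets_db while moving at most up/down_rate dB per second."""
    level = start_db
    track = []
    for target in targets_db:
        max_up = up_rate_db_s * dt_s        # largest allowed rise in one step
        max_down = down_rate_db_s * dt_s    # largest allowed fall in one step
        step = target - level
        step = max(-max_down, min(max_up, step))  # clamp the change
        level += step
        track.append(level)
    return track


def settle_time_s(track_db, target_db, dt_s, tol_db=0.5):
    """Time until the track first comes within tol_db of target_db (None if never)."""
    for i, level in enumerate(track_db):
        if abs(level - target_db) <= tol_db:
            return (i + 1) * dt_s
    return None


# Hypothetical scenario: the tracked level must fall from 80 to 50 dB SPL.
dt = 0.01                      # 10 ms time step (assumption)
target = [50.0] * 500          # five seconds at the new, lower level
slow = slew_limited_track(target, 80.0, up_rate_db_s=3.0, down_rate_db_s=6.0, dt_s=dt)
fast = slew_limited_track(target, 80.0, up_rate_db_s=3.0, down_rate_db_s=12.0, dt_s=dt)
print(settle_time_s(slow, 50.0, dt), settle_time_s(fast, 50.0, dt))
```

With these assumed values, doubling the down-slew-rate halves the time the tracker needs to reach the new level, which is the qualitative behavior described above.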


[Figure 12-24 plot: panels for Fc = 367 Hz, 1101 Hz and 4282 Hz. In each panel the upper axis shows the LGF output (0 to 1) and the lower axis shows level in dB SPL (10 to 90), both against time (10 to 30 s). Legend: Input, Adaptive sat level, Adaptive base level, Fixed sat level, Fixed base level.]

Figure 12-24 Input, output and gain signals produced from the Nucleus signal path
with ALGF-2. There are three sentence presentations in the figure, having levels of 65,
80 and 50 dB SPL. Each presentation consists of three seconds of noise, then the
sentence, and then three seconds of noise. Signals at the frequency channels centred at
367 Hz (top panel), 1101 Hz (middle panel) and 4282 Hz (bottom panel) were analyzed.


12.4.3.4 ALGF-2F

Figure 12-25 shows the input signals, output signals, base level and saturation level of the ALGF-2F. The difference between the ALGF-2 and the ALGF-2F can be seen by comparing the dynamic ranges and the base levels shown in Figure 12-24 and Figure 12-25.

[Figure 12-25 plot: panels for Fc = 367 Hz, 1101 Hz and 4282 Hz. In each panel the upper axis shows the LGF output (0 to 1) and the lower axis shows level in dB SPL (10 to 90), both against time (10 to 30 s). Legend: Input, Adaptive sat level, Adaptive base level, Fixed sat level, Fixed base level.]

Figure 12-25 Input, output and gain signals produced from the Nucleus signal path
with the ALGF-2F. There are three sentence presentations in the figure, having levels
of 65, 80 and 50 dB SPL. Each presentation consists of three seconds of noise, then the
sentence, and then three seconds of noise. Signals at the frequency channels centred at
367 Hz (top panel), 1101 Hz (middle panel) and 4282 Hz (bottom panel) were analyzed.


12.5 Clinical Studies

Two studies were conducted to evaluate the processing conditions described in §12.4.2. At least four cochlear implant subjects participated in each study. The speech intelligibility of the subjects was measured with the designated processing conditions on the same day using the same test setup, to minimize performance variation due to external factors unrelated to the algorithms under test. Before each test session, the subjects engaged in brief conversation with the experimenters, using the processing condition to be tested. The overall loudness was balanced between the processing conditions by increasing the volume. Before loudness balancing, the subjects reported that the ALGF was softer than the Tri + ADRO. Subject S6 with ALGF-2 had to increase the overall C-levels by 2%. No formal procedure was followed to provide additional listening experience with each processing condition. Subjects were not able to use the processing before or after the test sessions.

12.5.1 Test setup

The real-time Nucleus-xPC system (§7.4.5.2) was used. The speech intelligibility of the cochlear implant subjects was evaluated with each processing condition using the fixed-level test (§7.4.3) and the interleaved roving-level SRT test (§7.4.4.2). Two fixed-level tests, with sentences presented at 50 dB SPL and 80 dB SPL at a preset SNR, were conducted to complement the results of the adaptive SRT test. Four-talker babble noise was used for the fixed-level tests and LTASS noise was used for the adaptive roving-level SRT test.

In the roving-level SRT test, the background noise was presented continuously. A beep was presented two seconds before each sentence to alert the subject that a sentence was about to be presented.


12.5.2 Study 1: Tri + ADRO vs. ALGF-1

Four subjects, S1, S4, S5 and S6, participated in this experiment. Two processing conditions were compared:

• Tri + ADRO

• ALGF-1

The program order was counterbalanced between subjects.

12.5.2.1 Results

12.5.2.1.1 Interleaved roving-level SRT test

Figure 12-26 shows the SRT of each subject as well as the group mean SRT at each presentation level. The group mean SRT of the ALGF-1 was comparable to that of the Tri + ADRO at all three presentation levels; half of the group performed better with the ALGF-1 and the other half performed better with Tri + ADRO. Table 12-3 shows the group mean SRT difference between the two processing conditions, the standard deviation across subjects and the p-value calculated by a t-test of the SRT difference between the two processing conditions. The standard deviations show that SRT variation between subjects was larger with Tri + ADRO than with ALGF-1 at 50 and 80 dB SPL, but comparable at 65 dB SPL.

A large SRT difference between the two processing conditions was observed for some subjects. For example, S4 received a large benefit from the ALGF-1 at all three presentation levels: the improvement was approximately 6, 10 and 4 dB at 50, 65 and 80 dB SPL respectively. S5, on the other hand, benefited more from Tri + ADRO at all three presentation levels: the SRT improvement with Tri + ADRO was approximately 3, 1.5 and 7 dB at 50, 65 and 80 dB SPL respectively. The results of S1 and S6 were mixed.


[Figure 12-26 plot: Interleaved Roving-level SRT Test, one panel per presentation level (50, 65 and 80 dB SPL). Vertical axis: SRT (dB), -6 to 4. Horizontal axis: Subject (S1, S3, S4, S5, S6, S7, Mean). Legend: Tri+ADRO, ALGF-1.]

Figure 12-26 SRT comparison between Tri + ADRO and ALGF-1. Error bars indicate
one standard deviation from the mean. The asterisks indicate statistically significant
difference in performance between the two processing conditions (* p < 0.05, ** p <
0.01).


                          50 dB SPL       65 dB SPL       80 dB SPL
SRT (dB): Tri + ADRO      1.9 (4.16)      0.48 (2.88)     -1.02 (4.58)
SRT (dB): ALGF-1          0.64 (1.82)     -2.23 (2.95)    -0.02 (0.45)
Mean difference (dB)      -1.26           -2.71           1
p-value                   0.7             0.4             0.5

Table 12-3 Statistical analysis of SRT measured with Tri + ADRO and the ALGF-1 at
50, 65 and 80 dB SPL. The value inside the bracket indicates the standard deviation. p-
values were calculated by a t-test for the hypothesis testing of the significance on the
SRT difference.
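As an illustration of how a mean difference and a t statistic of this kind can be computed, the following sketch runs a paired comparison on two sets of SRTs. Whether the t-test used here was paired is not stated, so the paired form is an assumption. The function name and the four SRT values per condition are placeholders invented for the example; they are not the measured data.

```python
import math
import statistics


def paired_t(reference, candidate):
    """Return (mean difference candidate - reference, t statistic, degrees of freedom)."""
    if len(reference) != len(candidate) or len(reference) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [c - r for r, c in zip(reference, candidate)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)            # sample standard deviation (n - 1)
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, t_stat, len(diffs) - 1


# Placeholder SRTs in dB for four subjects; NOT the values measured in this study.
srt_reference = [2.0, 5.0, -3.0, 1.0]
srt_candidate = [1.0, -1.0, 0.0, 0.5]
mean_diff, t_stat, dof = paired_t(srt_reference, srt_candidate)
print(f"mean difference = {mean_diff:.2f} dB, t = {t_stat:.2f}, dof = {dof}")
```

The p-value follows from the t statistic and the degrees of freedom through the Student t distribution, which a statistics package would supply. With only four subjects, a large spread of individual differences produces a small t statistic even when the mean difference is more than 1 dB.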

12.5.2.1.2 Fixed presentation level test

Figure 12-27 and Figure 12-28 show the percent correct scores of the subjects in the fixed tests at 50 and 80 dB SPL respectively. The group mean scores show comparable performance between the two processing conditions at both presentation levels. Among the subjects, S4 consistently scored better with the ALGF-1 in both the 50 and 80 dB SPL fixed tests: the score improvements of S4 were more than 25 and 50 percentage points at 50 and 80 dB SPL respectively. S1 scored almost equally with both processing conditions in both fixed tests. The scores of the other two subjects, S5 and S6, were mixed.


[Figure 12-27 plot: Fixed Level Test (50 dB SPL). Vertical axis: Percent correct (%), 0 to 100. Horizontal axis: Subject (S1, S3, S4, S5, S6, S7, Mean). Legend: Tri+ADRO, ALGF-1.]

Figure 12-27 Percent correct scores of four cochlear implant subjects with Tri + ADRO
and with ALGF-1 in the fixed test at 50 dB SPL in noise. Error bars indicate one
standard deviation from the mean.

[Figure 12-28 plot: Fixed Level Test (80 dB SPL). Vertical axis: Percent correct (%), 0 to 100. Horizontal axis: Subject (S1, S3, S4, S5, S6, S7, Mean).]

Figure 12-28 Percent correct scores of four cochlear implant subjects with Tri + ADRO
and with ALGF-1 in the fixed test at 80 dB SPL in noise. Error bars indicate one
standard deviation from the mean.


12.5.2.2 Discussion

The first version of the ALGF was successfully evaluated with four cochlear implant subjects. The overall SRT results showed comparable group performance between the two processing conditions for roving-level sentences. The individual SRT results showed that some subjects did well with the new algorithm, but some did not. The SRT difference between the two processing conditions was large for some subjects.

Anecdotal reports from the subjects on their first listening experience with the ALGF were positive. Some subjects reported that they heard soft environmental sounds in the laboratory for the first time, for example keyboard tapping and paper rustling. Some subjects reported that they occasionally noticed a pumping effect with the ALGF-1 when the compression was released. One subject mentioned that after he spoke, the investigator's voice was soft at the beginning and slowly returned to normal. A possible reason is the down-slew-rate (-80 dB/s, which is equivalent to a 278 ms release time) of the fast saturation level regulator. Subjects can perceive the pumping of sounds that follow a louder sound when an AGC has a release time between 100 ms and 3 seconds (Dillon 2001).

The other possible shortcoming in the ALGF could be the base level regulator. In order to

control the noise floor estimator, the base level regulator restricted the up-slew-rate of the

base level tracking the estimated noise floor. The down-slew-rate of the base level regulator

was significantly faster than the up-slew-rate. This arrangement ensured that the base level

could not stay above the input signal when the presentation level was decreased. When the

presentation level was increased as in the roving SRT test, the base level regulator took a few

seconds to increase the base level to catch up with the increase in the overall presentation

level. This potentially allowed background noise at the beginning of the increased

presentation level to reside in the dynamic range.

Study 1 showed that level adjustment of the input signal by adjusting the dynamic range was feasible. The dynamic range of the LGF could potentially be set to the true dynamic range of the input signal, to satisfy both audibility and noise reduction at the same time. For example, speech in quiet can have a large dynamic range. Conversely, the same segment of speech can have a reduced dynamic range when presented in noise. Hence the expansion of the dynamic range during periods of clean speech, and the contraction of the range by raising the base level during noisy periods, is justified.

In summary, the SRTs for roving-level sentences and the percent correct scores at the fixed presentation levels were comparable between the two processing conditions. The performance variation between subjects was lower with ALGF-1. This study showed that the ALGF was a feasible alternative to the AGCs in cochlear implant systems. It also showed that the parameter space of the ALGF was large and that some parameters could be set differently to improve the speech intelligibility of cochlear implant recipients.

12.5.3 Study 2: Tri + ADRO vs. ALGF-2

After ALGF-1 was evaluated with the cochlear implant recipients, changes were made to improve performance, resulting in ALGF-2. Two processing conditions were evaluated in this experiment:

• ALGF-2

• Tri + ADRO

The parameter setting of the Tri + ADRO was identical to the one used in Study 1. The parameter setting of the ALGF-2 is described in Table 12-2.

12.5.3.1 Results

Six cochlear implant subjects, S1, S3, S4, S5, S6 and S7, participated in this experiment. All

of them participated in the interleaved roving-level SRT test. Only five subjects participated

in the fixed test at 50 dB SPL and three participated in the fixed test at 80 dB SPL.

12.5.3.1.1 Interleaved roving-level SRT test

Figure 12-29 shows the SRT results of the individual subjects and the group mean SRTs with Tri + ADRO and ALGF-2 at each presentation level. The group mean SRT results were comparable between the two processing conditions at 50 and 65 dB SPL. At 80 dB SPL, the group mean SRT of the ALGF-2 was better than that of the Tri + ADRO, and all subjects performed better with the ALGF-2. According to a t-test, the improvement was statistically significant (p = 0.02, Table 12-4). The SRT variation within the group was comparable between the two processing conditions at each presentation level: the difference between the two standard deviations was less than 1 dB at all presentation levels.

For most subjects, the SRT difference between the two conditions was less than 1 dB, which is probably within the test-retest reliability (Dawson, Hersbach and Swanson 2013). However, some of the subjects showed a tendency to improve with the ALGF-2 at 50 and 65 dB SPL. For example, at 50 dB SPL, the SRT improvement of S4 and S7 with ALGF-2 was more than 6 dB. Among the subjects, S4 obtained a large benefit from the ALGF-2 at all three presentation levels (the SRT improvement was 6 dB, 3 dB and 4 dB at 50, 65 and 80 dB SPL respectively). In contrast, S1 at 65 dB SPL performed better with Tri + ADRO than with the ALGF-2.

In summary, performance with ALGF-2 was generally equal to or better than performance with Tri + ADRO.


[Figure 12-29 plot: Interleaved Roving-level SRT Test, one panel per presentation level (50, 65 and 80 dB SPL). Vertical axis: SRT (dB), -6 to 4. Horizontal axis: Subject (S1, S3, S4, S5, S6, S7, Mean). Legend: Tri+ADRO, ALGF-2. The 80 dB SPL panel is annotated *p<0.02.]

Figure 12-29 SRT comparison between Tri + ADRO and ALGF-2. Error bars indicate
one standard deviation from the mean. The asterisks indicate the statistical significance
of the difference between the two processing conditions (* p < 0.05, ** p < 0.01).


                          50 dB SPL       65 dB SPL       80 dB SPL
SRT (dB): Tri + ADRO      -0.29 (2.0)     -0.81 (1.84)    -0.53 (0.98)
SRT (dB): ALGF-2          -2.18 (2.68)    -0.85 (0.95)    -2.42 (1.27)
Difference (dB)           -1.89           -0.04           -1.89
p-value                   0.2             0.96            0.02*

Table 12-4 Statistical analysis of SRT measured with Tri+ADRO and the ALGF-2 at
50, 65 and 80 dB SPL. The value inside the bracket indicates the standard deviation. p-
values were calculated by a t-test for the hypothesis testing of the significance on the
SRT difference.

12.5.3.1.2 Fixed presentation level test

The group mean score of ALGF-2 was not significantly different from that of Tri + ADRO for sentences presented at either 50 or 80 dB SPL in noise.

[Figure 12-30 plot: Fixed Level Test (50 dB SPL). Vertical axis: Percent correct (%), 0 to 100. Horizontal axis: Subject (S1, S3, S4, S5, S6, S7, Mean). Legend: Tri+ADRO, ALGF-2.]

Figure 12-30 Percent correct scores of five cochlear implant subjects with Tri + ADRO
and with ALGF-2 in the fixed test at 50 dB SPL in noise. Error bars indicate one
standard deviation from the mean.


[Figure 12-31 plot: Fixed Level Test (80 dB SPL). Vertical axis: Percent correct (%), 0 to 100. Horizontal axis: Subject (S1, S3, S4, S5, S6, S7, Mean).]

Figure 12-31 Percent correct scores of three cochlear implant subjects with Tri +
ADRO and with ALGF-2 in the fixed test at 80 dB SPL in noise. Error bars indicate
one standard deviation from the mean.

12.5.3.2 Discussion

ALGF-2 achieved equal or better SRTs compared to Tri + ADRO in the roving-level SRT test. The group mean scores were comparable between the two processing conditions in the fixed-level tests at 50 and 80 dB SPL. A similar outcome was observed in §12.5.2.1.2. Both Tri + ADRO and ALGF-2 employed fast and slow gain control mechanisms. The comparably good results with Tri + ADRO and the ALGF could be due to the slow components of each algorithm, which adapted to the fixed presentation level throughout the test. A performance difference would be observed if one algorithm was better than the other in a listening situation where the overall level changed often. For this reason, a sentence test using a fixed presentation level was not effective for evaluating AGCs with slow time constants.

The overall SRT results showed a tendency for the ALGF to improve the speech intelligibility of the recipients in difficult listening conditions. Based on the speech intelligibility improvement observed in some subjects in the roving-level SRT test, the factors through which the ALGF could contribute to performance are identified as audibility, listening comfort and noise reduction.

At 50 dB SPL, ALGF-2 achieved equal or better SRTs compared to Tri + ADRO processing. Apparently, ALGF-2 provided more audibility than the maximum 3 dB gain of ADRO. The SRT improvement of S4 and S7 with the ALGF-2 at 50 dB SPL was approximately 6 dB. The other four subjects achieved comparable SRTs with both processing conditions at 50 dB SPL. Subjects S1, S4 and S7 had contralateral hearing through hearing aids. Their hearing aids were turned off during testing, so they listened through the implant alone. Perhaps the processing that provided more audibility gave more benefit to these subjects.

The roving SRT test used a fixed sequence to present the sentences at three presentation levels of 50, 65 and 80 dB SPL. The presentation level took +15 dB steps on the way up, from 50 to 65 to 80 dB SPL, but took a single -30 dB step on the way down, from 80 to 50 dB SPL. Because of this -30 dB step, the beginning of a sentence presented at 50 dB SPL could have lower audibility than the later part. Hence the SRT of the subjects could be higher (worse) at 50 dB SPL than at the other two presentation levels if the release (i.e., the down-slew-rate) of the ALGF was not fast enough to lower the saturation level from the highest presentation level of 80 dB SPL. Comparing the results across the presentation levels, the SRT at 50 dB SPL was not exceptionally higher (worse) than the SRTs at the other two presentation levels. Therefore, the down-slew-rate (-12 dB/s) of the slow saturation level regulator appears to be appropriate.

The down-slew-rate of -12 dB/s equates to a release time of approximately 2.08 seconds, which is within the range of release times that can produce a pumping effect. A subject may hear background noise growing louder following a loud sound if the criterion to hold the slow saturation level is not met. The sound quality of the ALGF should be investigated further in real-life listening situations.
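The conversion from slew rate to release time can be sketched as a simple division. The 2.08 s figure is consistent with a 25 dB level change divided by 12 dB/s; the 25 dB span is inferred from that arithmetic and is an assumption, since the release time definition is not stated in this section. The function names are hypothetical, and the 100 ms to 3 s range is the one cited from Dillon (2001) in §12.5.2.2.

```python
def release_time_s(level_change_db, down_slew_db_per_s):
    """Time for a slew-rate-limited level to fall by level_change_db."""
    rate = abs(down_slew_db_per_s)
    if rate == 0:
        raise ValueError("slew rate must be non-zero")
    return abs(level_change_db) / rate


def in_pumping_range(release_s, low_s=0.1, high_s=3.0):
    """True if the release time lies in the 100 ms to 3 s range (Dillon 2001)."""
    return low_s <= release_s <= high_s


# Assumed 25 dB span with the -12 dB/s slow saturation level regulator.
t_release = release_time_s(25.0, -12.0)
print(round(t_release, 2), in_pumping_range(t_release))   # 2.08 True

# The 80 to 50 dB SPL step of the roving test, if the level simply slewed down.
t_rove_down = release_time_s(30.0, -12.0)
print(t_rove_down)                                         # 2.5
```

The second calculation ignores the hold criterion of the slow saturation level regulator, so it is only a rough bound on how long the saturation level would take to follow the 30 dB drop.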

For sentences presented at 80 dB SPL, there was an initial concern about prolonged stimulation at C-levels and excessive loudness. However, the minimum dynamic range constraint and the spectral-profile-preserving feature of the fast saturation level regulator prevented potential loudness discomfort. With the ALGF, no spectral distortion due to envelope clipping could occur. None of the subjects reported that sentences were uncomfortably loud with the ALGF at 80 dB SPL during the test. They also did not report excessive loudness with Tri + ADRO.

The SRT results at 80 dB SPL showed that the ALGF-2 performed better than the existing AGC system. Every subject obtained a lower (better) SRT with the ALGF-2 than with the Tri + ADRO. The improvement with the ALGF could be partly due to reduced spectral distortion, because clipping was avoided, and partly due to the noise reduction provided by the adaptive base level.

Audibility was less of an issue for sentences presented at 65 and 80 dB SPL. Less spectral distortion or clipping can be expected at 65 dB SPL than at 80 dB SPL. If the SRT of the subjects with the ALGF had improved at 65 dB SPL, the improvement could have been attributed mainly to the noise reduction ability of the ALGF, and partly to its dynamic behavior. However, the mean SRTs at 65 dB SPL were almost the same for both processing conditions, and the individual SRT results were also close between the two processing conditions for four out of six subjects. Because there was no significant improvement at 65 dB SPL, the noise estimator was considered to be less effective.

The signal path of Nucleus sound processors is designed to work optimally for speech presented at 65 dB SPL (C-SPL). The results of the earlier study (§9.2.4.1) showed that the mean SRT of Tri + ADRO at 80 dB SPL was poorer than that at 65 dB SPL. By comparison, the mean SRTs of Tri + ADRO at 50 and 80 dB SPL were closer to the mean SRT at 65 dB SPL in Study 1 (§12.5.2.1.1) and in the current study. A possible reason is the wider input dynamic range (50 dB) employed in the Tri + ADRO program of the earlier study in Chapter 9. This agrees with the study of Holden et al. (2011), which suggested a narrow input dynamic range in noisy conditions as a clinical guideline.

Both Tri + ADRO (with a proper dynamic range setting) and the ALGF used slow time constants and acted as an Automatic Volume Control (AVC). Since they adapt to changes in the overall presentation level of the input stimuli, performance with these programs depends less on how close the input signal level is to the targeted SPL (C-SPL).


The experimental results showed that the ALGF could potentially improve the intelligibility

of speech presented at very high or very low levels and also provide more access to sounds

within a wide range. However, some recipients may find that an overall loudness perception

becomes less natural if the presentation level of different sounds has little variation over

time. For example, it would be difficult to judge the distance of the sound sources if two

sounds from near and far distances were perceived to have the same loudness. With the

ALGF acting as an automatic volume control, cochlear implant recipients may not know the

actual loudness of sounds. This could be inconvenient in some listening situations. For

example, the recipient would not adjust the volume of a loud TV or stereo because the sound

was not very loud for him or her. The pros and cons of this dynamic level adjustment should

be tested extensively in various listening conditions outside the laboratory.

It would be interesting to see the SRTs of the subjects evaluated in the present study

compared to the SRT of cochlear implant subjects measured by other researchers using an

adaptive roving-level SRT test. It should be noted that comparing the performance of

subjects between different studies is questionable because different studies used different test

methods and materials, different subjects and different signal processing algorithms.

The SRT results were taken from Haumann et al. (2010) and Boyle et al. (2013); both

studies used the roving-level SRT test with a single adaptive track (§7.4.4.1). The sequence

of the presentation levels was randomized in the SRT test of their studies. The present study

used the adaptive roving-level SRT test with a fixed sequence of the presentation levels.

Table 12-5 shows mean and standard deviation of the SRTs from each study.

Study                                                      SRT (dB), mean (std dev)
Boyle et al. (2013), the better performing group           approximately 9.4 (4)
Haumann, Lenarz and Büchner (2010), Opus 2                 approximately 1 (2)
Study 2, Tri + ADRO                                        -0.5 (1.3)
Study 2, ALGF-2                                            -1.8 (1.1)

Table 12-5 Retrospective comparison with SRT results from other studies that used a
roving-level SRT test


The mean SRTs of the two processing conditions evaluated in the present study were the lowest among the group. The negative SRTs indicated the robustness of both processing conditions for roving-level speech in noise. A further improvement with ALGF-2 in such a difficult listening condition was encouraging.

12.5.4 Study 2F: Adaptive vs. Fixed Dynamic Range

The SRT improvement with ALGF-2 was attributed to the level adjustment by the adaptive saturation level and the noise reduction by the adaptive base level. However, the SRT comparison between ALGF-2 and Tri + ADRO did not show which component contributed more to the speech intelligibility. Therefore, a comparison was made between the ALGF with and without noise estimation (i.e., with and without the base level driven by the estimated noise) to show the contribution of the noise floor estimator to the speech intelligibility of the subjects. When the noise estimator was turned off, the base level was coupled to the saturation level and the dynamic range of the ALGF was fixed. The ALGF used in this study was configured from ALGF-2 but with a fixed dynamic range. Hence, it is labeled ALGF-2F.

The current study is a subset of Study 2. To reflect the nature of the investigation, the study is labeled Study 2F. Measurements of Tri + ADRO, ALGF-2 and ALGF-2F were conducted on the same day using the same test setup on the same subjects. When the adaptive base level was off, the ALGF-2 became a dual-loop AGC. A dynamic range of 40 dB was chosen for the ALGF-2F in this study. The parameter settings of the saturation level regulators were the same for both ALGF-2 and ALGF-2F. The parameter settings of the ALGF are shown in Table 12-2.
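The difference between the two configurations can be sketched as a choice of how the base level is derived. This is a simplified illustration, not the ALGF implementation: the function names, the 20 dB minimum dynamic range and the example levels are assumptions, and the linear-in-dB normalization is only a stand-in for the input stage of the LGF, whose actual shape is not reproduced. The 40 dB fixed dynamic range is the value stated above.

```python
def base_level_db(sat_db, noise_db, adaptive, fixed_range_db=40.0, min_range_db=20.0):
    """Return the base level of one channel in dB SPL.

    adaptive=False: the base level is coupled to the saturation level,
                    giving a fixed dynamic range (ALGF-2F style).
    adaptive=True:  the base level follows the noise estimate, but the
                    dynamic range is kept at or above min_range_db (assumed value).
    """
    if not adaptive:
        return sat_db - fixed_range_db
    return min(noise_db, sat_db - min_range_db)


def normalise(level_db, base_db, sat_db):
    """Map a level onto 0..1 across the dynamic range, clipping outside it."""
    x = (level_db - base_db) / (sat_db - base_db)
    return max(0.0, min(1.0, x))


# Hypothetical channel state: saturation level 70 dB SPL, noise estimate 45 dB SPL.
sat, noise = 70.0, 45.0
fixed_base = base_level_db(sat, noise, adaptive=False)      # 30.0
adaptive_base = base_level_db(sat, noise, adaptive=True)    # 45.0

# Noise at 45 dB SPL is passed with the fixed range but mapped to zero
# when the base level is raised to the noise estimate.
print(normalise(noise, fixed_base, sat), normalise(noise, adaptive_base, sat))
```

In this sketch the benefit of the adaptive base level depends entirely on the accuracy of the noise estimate: if the estimate is too low, the adaptive range can be wider than the fixed 40 dB range and more noise reaches the output.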


12.5.4.1 Results

Only five cochlear implant subjects were tested with ALGF-2F. All five were evaluated using the roving-level SRT test. Only four were evaluated using the fixed-level test at 50 dB SPL, and none were evaluated at 80 dB SPL.

12.5.4.1.1 Interleaved roving-level SRT test

Figure 12-32 shows the SRT comparison between ALGF-2 and ALGF-2F at each presentation level. Table 12-6 shows the group mean and standard deviation of the SRTs with each processing condition and the difference between the mean SRTs of the two ALGFs. The group mean SRTs of ALGF-2 and ALGF-2F were comparable at 50 and 80 dB SPL in the roving-level SRT test. The group mean SRT of ALGF-2 was higher (worse) than that of ALGF-2F at 65 dB SPL; according to a t-test, the difference was statistically significant. The individual SRT results show that most subjects obtained comparable SRTs with each processing condition. None of them showed a consistent improvement with either ALGF configuration at all three presentation levels. For example, the SRT of S4 was lower (better) at 50 and 80 dB SPL but higher (worse) at 65 dB SPL with ALGF-2. The SRT of S7, on the other hand, was lower (better) at 65 and 80 dB SPL but higher (worse) at 50 dB SPL with ALGF-2F. S5 obtained comparable SRTs with both configurations of the ALGF-2 at each presentation level.


[Figure 12-32 plot: Interleaved Roving-level SRT Test, one panel per presentation level (50, 65 and 80 dB SPL). Vertical axis: SRT (dB), -6 to 4. Horizontal axis: Subject (S1, S3, S4, S5, S6, S7, Mean). Legend: ALGF-2F, ALGF-2. The 65 dB SPL panel is annotated *p<0.03.]

Figure 12-32 SRT Comparison between ALGF-2 and ALGF-2F. Error bars indicate
one standard deviation from the mean value. The asterisks indicate the statistical
significance of the difference between the two ALGFs (* p < 0.05, ** p < 0.01).


                          50 dB SPL       65 dB SPL       80 dB SPL
SRT (dB): ALGF-2          -2.66 (2.7)     -0.96 (1.02)    -2.59 (1.35)
SRT (dB): ALGF-2F         -2.45 (1.2)     -2.14 (0.84)    -1.56 (0.79)
Difference (dB)           0.21            -1.18           1.03
p-value                   0.81            0.03*           0.15

Table 12-6 Statistical analysis of SRT measured with ALGF-2 and the ALGF-2F at 50,
65 and 80 dB SPL. The value inside the bracket indicates the standard deviation. p-
values were calculated by a t-test for the hypothesis testing of the significance on the
SRT difference.

12.5.4.1.2 Fixed presentation level test

The mean scores of ALGF-2 and ALGF-2F were comparable, with a difference of less than 10 percentage points. Three out of four subjects scored lower with ALGF-2 than with ALGF-2F. The score difference was more than 20 percentage points for S1 and S7. Only S5 scored better with the ALGF-2, with an improvement of approximately 15 percentage points.

[Figure 12-33 plot: Fixed Level Test (50 dB SPL). Vertical axis: Percent correct (%), 0 to 100. Horizontal axis: Subject (S1, S3, S4, S5, S6, S7, Mean). Legend: ALGF-2F, ALGF-2.]

Figure 12-33 Percent correct scores of four cochlear implant subjects with ALGF-2 and
with ALGF-2F in the fixed test at 50 dB SPL in noise. Error bars indicate one standard
deviation from the mean.


12.5.4.2 Discussion

The motivation of Study 2F was to investigate the effect of the adaptive base level driven by the estimated noise level. A statistically significant SRT degradation of approximately 1.2 dB was observed at 65 dB SPL in the roving-level test for ALGF-2 compared to ALGF-2F. The rest of the results did not give a clear indication of a performance difference between the ALGF with and without the adaptive base level. Equal or lower performance was obtained with ALGF-2, which had the adaptive base level. Three possible reasons for the lower or equal performance with the ALGF-2 are: (i) inaccurate noise estimation, (ii) an ineffective test method for evaluation and (iii) abnormal loudness perception.

When the offline data in Figure 12-24 was reviewed, the base level took time to catch up with the presentation level of the noisy speech when the presentation level was roved up from 50 to 65 dB SPL and from 65 to 80 dB SPL. At the beginning of each new presentation level, at 65 and 80 dB SPL in particular, a high level of noise was observed in the output signal due to underestimation of the true noise. Compared to the noise tracking at 65 and 80 dB SPL, the base level of the ALGF-2 tracked the noise level reasonably well at 50 dB SPL. However, no significant SRT difference between ALGF-2 and ALGF-2F was observed at this level. The mean SRTs were approximately -2.5 dB at 50 dB SPL. At an SNR of -2.5 dB, the background noise level was higher than the level of the target speech. Perhaps the output SNR could not be improved further by the adaptive base level, or the noise estimator itself became less effective at negative SNR.

Since the roving-level conditions of the SRT test affected the performance of the noise estimator, the results of the fixed test were examined to see the performance difference due to the adaptive base level. Although a performance improvement was anticipated from noise removal by the adaptive base level, no significant performance difference between ALGF-2 and ALGF-2F was observed at 50 dB SPL. When the offline results for three different types of noise in Figure 12-13 were checked, the new MCRA noise estimator was found to underestimate the four-talker babble noise, which was the noise used in the fixed test. If the true dynamic range of the noisy speech at 50 dB SPL was smaller than the dynamic range estimated by ALGF-2, the ALGF-2 could pass more noise to the output. Similarly, if the dynamic range estimated by ALGF-2 was larger than 40 dB, the fixed dynamic range of ALGF-2F, then more noise could be expected at the output with ALGF-2 than with ALGF-2F.

In the new MCRA noise estimator, noise was estimated by recursive-averaging the noisy

channel envelopes like Lin’s method. It has an additional feature to reduce potential

overestimation error of the recursive-averaging method. Perhaps the minima-controlled

component was biased towards the level lower than the true noise floor. It should be noted

that the MCRA method employed the bias compensation by calculating the mean of the

variance of the stationary noise as proposed by Martin (2001). The underestimation in the

offline results indicated that more bias compensation was necessary for non-stationary noise

such as babble.
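The idea of combining recursive averaging with a minima-controlled component can be illustrated with a deliberately simplified Python sketch. It is not the estimator evaluated in this chapter: the function names, the smoothing constants, the window length, the ratio threshold and the hard speech-presence decision are all assumptions chosen for illustration, and no bias compensation is included. The sketch only shows why freezing the update when the smoothed power rises well above its recent minimum limits the overestimation of plain recursive averaging.

```python
def recursive_average(power, alpha=0.95):
    """Plain recursive averaging: every frame updates the noise estimate."""
    noise, out = power[0], []
    for p in power:
        noise = alpha * noise + (1.0 - alpha) * p
        out.append(noise)
    return out


def minima_controlled_average(power, alpha_s=0.8, alpha_d=0.95,
                              window=50, ratio_threshold=5.0):
    """Recursive averaging that is frozen while the smoothed power sits
    well above its recent minimum (taken here as a sign of speech).
    All parameter values are illustrative assumptions."""
    smoothed = running_min = next_min = noise = power[0]
    out = []
    for n, p in enumerate(power):
        smoothed = alpha_s * smoothed + (1.0 - alpha_s) * p
        if n > 0 and n % window == 0:
            # Start a new search window so the tracked minimum can rise again.
            running_min = min(next_min, smoothed)
            next_min = smoothed
        else:
            running_min = min(running_min, smoothed)
            next_min = min(next_min, smoothed)
        speech_likely = smoothed > ratio_threshold * running_min
        if not speech_likely:
            noise = alpha_d * noise + (1.0 - alpha_d) * p
        out.append(noise)
    return out


# Hypothetical channel power: steady noise with a short loud burst.
power = [1.0] * 200 + [100.0] * 20 + [1.0] * 100
plain = recursive_average(power)
controlled = minima_controlled_average(power)
print(max(plain), max(controlled))
```

In this toy input the plain average is pulled far above the noise floor by the burst, whereas the minima-controlled version stays at the floor. A tracked minimum that sits below the true floor of a fluctuating noise would bias such an estimator downwards, which is the kind of error bias compensation is meant to correct.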

The individual and group SRT results were more consistent across the presentation levels

with the ALGF-2F than ALGF-2. The ALGF-2F only had the slow and fast saturation level

regulator to perform slow and fast gain adjustment at different presentation levels. Achieving

equal SRT values within a certain tolerance across all three presentation levels indicated that

the saturation level regulator performed well and truly achieved the goal. This validated the

comparison between the two ALGFs to highlight the performance of the adaptive base level.

The third possible reason for not observing performance improvement with the ALGF with

an adaptive dynamic range was abnormal loudness perception. The subjects have no

experience with processing that varied the dynamic range continuously. The loudness

perception could be affected by frequent variation in the dynamic range. Ideally, the ALGF

estimated the dynamic range of the input stimuli and no abnormal loudness perception was

expected. However, in a situation like roving-level sentences in noise, the overall dynamic

range estimated by the ALGF was not accurate due to the slow tracking of the adaptive base

level to the noise floor. Noise was gradually decreased as the base level was increased to

catch up with the noise floor. That could have an impact on the loudness perception, although

no speech intelligibility degradation was expected. Further investigations should be done on

the loudness perception of the subjects with the ALGF with an adaptive dynamic range.

The hypothesis was that the adaptive base level was important for speech in noise and

performance could be improved by using the estimated noise floor as the base level.


However, the experimental results did not show a difference that supported the

hypothesis, for the reasons stated above. From Study 2F, it was concluded that the noise

estimator could be improved further. Secondly, the roving-level SRT test may not be

effective to show the contribution of the noise estimator. More research needs to be done on

test methodology to evaluate the full potential of the ALGF. Finally, the loudness perception

with the ALGF should be systematically assessed.

12.5.5 Test-retest Reliability of Interleaved Roving-level SRT Test

The test-retest variability of the adaptive roving-level SRT test with a fixed presentation

sequence was analyzed from the SRT values of four subjects who participated in Study 1 and

2 with the same processing condition, Tri + ADRO. The time difference between the two

studies ranged from 7 to 24 weeks.

Figure 12-34 shows the SRTs and Figure 12-35 shows the SRT difference of the subjects

with the Tri + ADRO between the two studies that were conducted on different days. A

positive value indicates that the SRT of Study 2 is better. Subject S4 shows the SRT

improvement in Study 2 at all three presentation levels. Likewise, S1 shows the SRT

improvement at 50 and 65 dB SPL in Study 2. S6 is the only subject with SRT differences of

less than 1 dB between the two studies for all presentation levels. Eight of the 12 SRT

differences were positive. A learning effect was possibly shown in Study 2. The effect of

subject-related matters at the time of testing such as fatigue, distraction and cognitive load

could not be ruled out from the performance difference between studies.

[Figure 12-34 plot: three panels titled "Interleaved Roving-level SRT Test (Tri+ADRO: Test-retest)" at 50, 65 and 80 dB SPL; y-axis: SRT (dB), -6 to 4; x-axis: Subject (S1, S3, S4, S5, S6, S7, Mean); legend: Trial 1, Trial 2.]

Figure 12-34 Test-retest variability of the interleaved SRT test from the SRT of four
subjects with Tri + ADRO taken from Study 1 and Study 2. Error bars indicate one
standard deviation from the mean. The asterisks indicate the statistical significance of
the difference between the two studies (* p < 0.05, ** p < 0.01).

[Figure 12-35 plot: "Test-retest: Tri + ADRO"; y-axis: dB, -8 to 6; x-axis: Subject (S1, S3, S4, S5, S6, S7, Mean); legend: 50 dB SPL, 65 dB SPL, 80 dB SPL.]

Figure 12-35 SRT differences between the two studies of each subject. The difference
was calculated as: SRT (Study 1) – SRT (Study 2). The mean was calculated as the
average of the absolute SRT differences between the two studies.

The test-retest variability of the interleaved roving-level SRT test was lower than that of

the roving-level SRT test using a single adaptive track (§9.2.6). Each test measured Tri +

ADRO in two sessions held on different days. The improvement was approximately 2 dB.

The test-retest reliability improvement with the interleaved SRT supported the claim that the

interleaved SRT test was better due to the use of independent adaptive tracks and the fixed

sequence of presentation level which could reduce the performance bias due to unbalanced

randomized sequence.

From the study of Dawson et al. (2013) on the adaptive SRT test with one presentation level,

a standard deviation of 1.2 dB could be expected from 16 sentences when a psychometric fit

calculation rule was applied. Compared to that, the standard deviation of 3.4, 2.6 and 3.1 dB

for the interleaved SRT roving at 50, 65 and 80 dB SPL in the present study was higher. The

standard deviation of each presentation level was calculated from 8 SRT values (four

subjects in two studies). A possible reason for the higher variability of the SRT values in the present

study could be the difficulty of the SRT test; the presentation levels were roved among

three levels: 50, 65 and 80 dB SPL. Another reason could be the small sample size.


12.6 Conclusions

Adjusting the signal level by means of gain has the side effect of increasing the background noise

when the audibility of a low-level target signal is improved. In contrast, adjusting the relative

position of the signal level within the input dynamic range, by expanding or contracting the

range itself, can potentially improve audibility without compromising the level of

background noise. With this concept of level adjustment, a new signal processing technique

that optimizes the input dynamic range of a cochlear implant system was proposed,

implemented and evaluated in this chapter. The experimental results show that the ALGF is

a feasible replacement for the existing AGC systems.

Two versions of ALGF were successfully evaluated with the recipients by using the fixed

and roving level speech tests in the laboratory. Anecdotal reports from the subjects on the

first-time listening experience with the ALGF were positive. The impression was that

audibility was significantly improved as some reported that they heard low-level sounds of

activities in the laboratory. With a slightly different parameter set and the proposed MCRA

noise estimator, the second version of the ALGF, ALGF-2, was implemented. ALGF-2

achieved equal or better SRT compared to the existing AGC system in the roving-level SRT

test. The SRT improvement with the ALGF-2 was statistically significant at 80 dB SPL.

Based on the results, it was concluded that the ALGF could potentially perform better in

loud noisy environments than the existing AGC. The envelope profile limiter showed better

performance than the front-end limiter for high-level speech in noise at 10 dB SNR

(§10.4.1). The improvement was attributed to the spectral profile preserving feature of the

EPL in adverse listening conditions. Amplitude cues become more important when spectral

cues are distorted. The ALGF also preserved the spectral envelope shape by avoiding

envelope clipping. Therefore, it was deduced that the SRT improvement at the highest level

with the ALGF was also due to its ability to preserve envelope cues better than the

existing AGC system.

The hypothesis of the ALGF was that speech intelligibility could be improved by using the

estimated noise floor as the adaptive base level. Equal or better performance with ALGF-2

shown in Study 2 only weakly supported the hypothesis. To determine whether the base level


regulator was effective in reducing the background noise, the ALGF with no base level

regulator (ALGF-2F) was also evaluated and compared to ALGF with the base level

regulator (ALGF-2). Equal or lower performance of group SRT with ALGF-2 compared to

ALGF-2F was observed. A reasonable doubt was placed on the efficacy of the base level

regulator to follow the roving-level background noise. The roving-level SRT test may not be

the right test to evaluate the performance of noise reduction.

The interleaved roving-level SRT test using a fixed sequence could reduce the variability of

performance due to test-related factors. A fixed sequence allowed equal chance for each

processing to be evaluated with the same number of ups and downs in the presentation

levels. On the other hand, it is possible for a test condition to be evaluated with more up steps

than down steps, or vice versa, by the roving-level SRT test with random presentation level

sequences. Test-retest variation can widen as a consequence. The above claim was

supported by the comparison between the test-retest variability of the interleaved roving-

level SRT test using a fixed sequence of the presentation levels in the present study and that

of the roving-level SRT test with a single adaptive track using randomized sequence. It was

generally concluded that the interleaved roving-level SRT test with fixed sequence of

presentation levels was more robust for evaluation of AGC systems. More research should

be done on the effect of test-related factors on AGC performance.

The present study showed the feasibility of ALGF as a robust level optimization technique

for cochlear implant systems. The ALGF is a feature-based algorithm. The features that were

utilised in the present study were level-related features, clipping proportion, and estimated

background noise for adjusting the dynamic range. More features could be added into the

future implementation of ALGF to make it more robust in different listening conditions. An

example of a new feature could be the energy profile of input stimuli. For example, energy

profiles of a transient noise and car noise are different. Energy profiles of speech and other

environmental noise can be different. Such information may be useful to make the algorithm

more adaptable to different stimuli.


13 Predicting Cochlear Implant Speech Intelligibility

13.1 Introduction

Optimizing a sound processing algorithm to perform well in different acoustic scenarios can

take up a lot of time from both researcher and subjects. Signal metrics are useful for

exploration and tuning of the parameter space of an algorithm. A reliable signal metric can

expedite the optimization process, for example fine-tuning of a large parameter set. Two

signal metrics that were reviewed in chapter 6 are applied in this chapter: Across-Source

Modulation Correlation (ASMC) and Normalized Covariance Measure (NCM). In addition,

two new metrics are developed: Clipping Proportion, and Output SNR (OSNR).

Several AGC conditions were evaluated in previous chapters. In this chapter, the effect of

those AGC conditions on the channel envelopes is quantified by the selected signal metrics.

Then the correlation between each signal metric and the measured speech intelligibility

scores will be analysed, to determine the effectiveness of the selected metric to predict

speech intelligibility of cochlear implant subjects. This will strengthen understanding of the

important factors affecting speech intelligibility of cochlear implant recipients.

13.2 Signal Processing

The signal processing used in the previous clinical studies was replicated in this chapter:

• Performance-intensity function: no AGC vs. FEL75 (§8.2)

• Analysis of gain structure and release time: FEL75, FEL625, EPL75 and EPL625 (§10.3.3)

• Performance-intensity function: FEL75 vs. EPL625 (§10.3.4).

Subjects’ scores are collectively shown in Figure 13-1. It should be noted that the

performance-intensity function of the front-end compression limiter was measured in §8.2

with four recipients and in §10.4.2 with six recipients (four common subjects from the

previous study). The results used in this study were the average of the two clinical studies.

[Figure 13-1 plot: y-axis: Percent correct (%), 0 to 100; x-axis: Presentation level (dB SPL), 55 to 90; legend: No AGC SNR10, No AGC SNR20, FEL75 SNR10, FEL75 SNR20, EPL625 SNR10, EPL625 SNR20, FEL625 SNR10, EPL75 SNR10.]

Figure 13-1 Percent correct scores of the cochlear implant subjects with different AGC
configurations in the fixed level test. Open and filled symbols represent 10 dB SNR and
20 dB SNR respectively.

13.2.1 Test Stimuli

AuSTIN sentences and four-talker babble noise were used as test stimuli. Five sentences

were concatenated with silent gaps of four seconds between sentences. The silent gaps were

inserted so that the compression of each AGC was initialized at 0 dB at the start of each

sentence. The background noise was presented one second before and after each sentence. Each

metric was calculated over the duration of the sentences, excluding silent gaps in between

sentences.

The presentation levels and the SNR values evaluated in the clinical studies were

reproduced. The presentation levels of the sentences ranged from 55 to 89 dB SPL in the

studies that measured the performance-intensity functions (§8.2 and 10.3.4). The noise

presentation level for each test condition was adjusted as per the input SNR of 10 and 20 dB.


13.2.2 AGC Configurations

The following AGC configurations were investigated:

1. No AGC

2. FEL75: front-end compression limiter with 75 ms release time

3. FEL625: front-end compression limiter with 625 ms release time

4. EPL75: envelope profile limiter with 75 ms release time

5. EPL625: envelope profile limiter with 625 ms release time

Signal processing was performed offline in the MATLAB-Simulink platform, using the same

signal path and parameter settings that were used in the clinical studies.

The subjects’ own MAPs were also used for the performance analysis of individual subjects.

A default map was used for the group performance analysis.

13.2.3 Curve Fitting

A psychometric function relates the subject’s performance in a psychophysical task to the

physical quantity of the stimuli (Wichmann and Hill 2001). In this chapter, the performance

measure was the subject’s percent correct score (Chapters 8 and 10), and the physical

quantity was a signal metric (Chapter 6). The psychometric function is assumed to have a

sigmoidal shape. A cumulative Gaussian psychometric function was fitted to the recipients’

mean percent correct scores against each signal metric using the psignifit toolbox for

MATLAB by Jeremy Hill (version 2.5.6, available at http://bootstrap-software.org/psignifit/),

which implements a maximum-likelihood method (Wichmann and

Hill 2001). The goodness of fit was quantified by the deviance, D; a smaller deviance

indicated a better fit.
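The relationship between the fitted curve and the deviance can be made concrete with a small Python sketch. It is not the psignifit implementation: the cumulative Gaussian here has no guess or lapse rate, the fit is a coarse grid search rather than a proper optimizer, and the function names and example values are assumptions for illustration. Minimizing the binomial deviance over the grid is equivalent to maximizing the likelihood, because the two differ only by a constant that depends on the data.

```python
import math


def cumulative_gaussian(x, mu, sigma):
    """Cumulative Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def deviance(xs, prop_correct, n_trials, mu, sigma, eps=1e-9):
    """Binomial deviance of observed proportions against the fitted curve."""
    total = 0.0
    for x, y, n in zip(xs, prop_correct, n_trials):
        p = min(max(cumulative_gaussian(x, mu, sigma), eps), 1.0 - eps)
        term = 0.0
        if y > 0.0:
            term += y * math.log(y / p)
        if y < 1.0:
            term += (1.0 - y) * math.log((1.0 - y) / (1.0 - p))
        total += 2.0 * n * term
    return total


def fit_by_grid(xs, prop_correct, n_trials, mus, sigmas):
    """Return (deviance, mu, sigma) for the grid point with the lowest deviance."""
    return min((deviance(xs, prop_correct, n_trials, m, s), m, s)
               for m in mus for s in sigmas)


# Hypothetical data generated from a known curve (mu = 10, sigma = 4).
xs = [0, 5, 10, 15, 20]
observed = [cumulative_gaussian(x, 10, 4) for x in xs]
trials = [20] * len(xs)
best_dev, best_mu, best_sigma = fit_by_grid(xs, observed, trials,
                                            mus=[8, 9, 10, 11, 12],
                                            sigmas=[2, 3, 4, 5, 6])
print(best_dev, best_mu, best_sigma)
```

A deviance of zero means the curve passes exactly through every observed proportion; larger values indicate that a single sigmoid describes the data less well.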


13.3 Signal Metrics and Performance Analysis

13.3.1 Clipping Proportion

The most obvious form of envelope distortion is envelope clipping. The clipping proportion

was calculated by summing the number of envelope samples that exceeded the saturation

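level of the LGF, and dividing by the total number of samples. The proportion of clipping

(%) was obtained by:

$$\text{Clipping proportion} = \frac{100}{MN} \sum_{k=1}^{M} \sum_{i=1}^{N} \big[\, e_k(i) > S \,\big] \qquad (13.1)$$

where $e_k(i)$ is the amplitude of an envelope at channel $k$ at time index $i$. $M$ is the number of

frequency channels, $N$ is the number of samples collected per channel and $S$ is the saturation level. The

logical comparison $[\, e_k(i) > S \,]$ produces a binary number: 0 or 1.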
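The clipping proportion described above can be sketched in a few lines of Python. This is an illustration only: the function name, the list-of-channels data layout and the example amplitudes are assumptions, not part of the MATLAB-Simulink implementation used in the study.

```python
def clipping_proportion(envelopes, saturation_level):
    """Percentage of envelope samples that exceed the saturation level.

    envelopes: list of channels, each a list of envelope amplitudes.
    """
    total = sum(len(channel) for channel in envelopes)
    if total == 0:
        raise ValueError("no envelope samples supplied")
    clipped = sum(1 for channel in envelopes for sample in channel
                  if sample > saturation_level)
    return 100.0 * clipped / total


# Hypothetical two-channel example (amplitudes are illustrative only).
example = [[0.01, 0.05, 0.03, 0.02],
           [0.04, 0.06, 0.01, 0.02]]
print(clipping_proportion(example, saturation_level=0.035))
```

Three of the eight samples in this example lie above the saturation level, so the sketch reports a clipping proportion of 37.5%.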

One goal of AGC is to prevent envelope clipping, so clipping proportion is a measure of the

effectiveness of the AGC system. Figure 13-2 shows the envelopes processed by no AGC,

FEL75 and EPL625. The input stimuli were sentences presented at 80 dB SPL in four-talker

babble noise at SNR 10 dB. The envelopes were shown together with the saturation level of

the LGF to observe how effective each configuration was to prevent clipping.

[Figure 13-2 plot: Band 7 envelope in three panels (No AGC, FEL, EPL); y-axis: Amplitude; x-axis: Time (s), 0.5 to 3; legend: Signal, Sat level, Base level.]

Figure 13-2 Comparison of signal amplitudes processed by each AGC configuration,


for speech presented at 80 dB SPL in the presence of four-talker babble at SNR 10dB.
The envelopes at channel 7 were taken before the LGF.

The clipping proportion was calculated only for the processing conditions with the front-end

compression limiter and with no AGC. The results of the envelope profile limiter were not

included because no envelope clipping occurred due to the novel gain structure that

preserved the spectral envelope profile. The bottom panel of Figure 13-2 shows that no envelope

from the processing with the EPL exceeded the saturation level of the LGF.

[Figure 13-3 plot: six panels (S1 to S6); y-axis: Percent Correct Scores (%), 0 to 100; x-axis: Proportion of Clipping (%), 0 to 100; legend: No AGC SNR 10, No AGC SNR 20, FEL75 SNR 10, FEL75 SNR 20, FEL625 SNR 10; deviance values shown in the panels: 24.21, 17.12, 19.17, 16.19, 3.56 and 4.47.]

Figure 13-3 Percent correct scores of the individual subject as a function of clipping
proportion. The open and filled symbols represent the results at SNR 10 dB and 20 dB
respectively.

The diagrams in Figure 13-3 show the percent correct scores of each subject as a function of

clipping proportion. The range of deviance was between 3.56 and 24.21. The deviances of

S5 and S6 were low because they were not tested in the no-AGC condition.

[Figure 13-4 plot: three panels; y-axis: Percent Correct Scores (%), 0 to 100; x-axis: Proportion of Clipping (%), 0 to 80. Top panel (all conditions): D = 90.42. Middle panel (SNR 20 dB): No AGC (D = 4.00), FEL75 (D = 4.02). Bottom panel (SNR 10 dB): No AGC (D = 1.57), FEL75 (D = 2.90), FEL625.]

Figure 13-4 Group mean percent correct scores as a function of clipping proportion.
The top panel shows the scores in all conditions, the middle panel shows the scores at
20 dB SNR and the bottom panel shows the scores at 10 dB SNR.

The three diagrams in Figure 13-4 show the mean percent correct scores of the subjects as a

function of clipping proportion for each SNR and all SNR conditions together. The

correlation between the scores and the clipping proportion was high, indicated by low

deviance, for each processing condition at each SNR. However, a single curve could not fit

the scores from all conditions together (D = 90.42). The rate of score

degradation was faster with the proportion of clipping produced by the front-end limiter. The

bottom diagram of Figure 13-4 shows that the proportion of clipping is approximately 15%

from the processing with the front-end compression limiter and between 50% and 60%

from the processing with no AGC, yet the same score is predicted for

both conditions. Factors other than envelope clipping affected speech intelligibility in this

case.

Speech was still intelligible with no AGC at 20 dB SNR, even at 83 dB SPL presentation

level, where the channel envelopes were significantly clipped (§8.2.4.1). In contrast, the

envelope profile limiter prevented envelope clipping, but speech intelligibility still degraded

at high presentation levels in low SNR condition (§10.4.2). Hence, clipping proportion by

itself does not appear to be a good indicator of speech intelligibility.

13.3.2 Output SNR

The Output SNR (OSNR) metric is a simple but powerful metric to predict the effect of

compression for speech in noise. The calculation of the output SNR is similar to Rhebergen’s

apparent SNR calculation (Rhebergen, Versfeld and Dreschler 2008b; Rhebergen, Versfeld

and Dreschler 2009), but adapted for the cochlear implant processing. Figure 13-5 shows the

effective output SNR calculation in the cochlear implant signal path of ACE sound coding.

Figure 13-5 Block diagram of output SNR calculation for the signal path. The front-end
AGC was used as an example in this diagram.

The mixture of speech and noise was processed by the signal path. The gain signals from the

AGCs and the channel indices from the maxima selection block were recorded. Next, the

clean speech was processed through the signal path, applying the recorded gain, and


choosing stimulation pulses using the recorded channel indices. An inverse LGF was applied

to revert to the linear domain, while retaining the effect of clipping. Similarly, the noise

alone was processed, using the recorded gain and channel indices. Then, the SNR was

calculated for each channel. As in other metrics, the channel SNRs were weighted by using

the relative signal power (Ma, Hu and Loizou 2009) and summed to give the OSNR.
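The final weighting step can be sketched in Python, under stated assumptions. The sketch starts from speech-only and noise-only channel envelopes that have already been passed through the signal path with the recorded gains and channel indices and returned to the linear domain. The function names are hypothetical, and the weight of each channel is assumed to be its share of the total speech power; the exact weighting of Ma, Hu and Loizou (2009) may differ in detail.

```python
import math


def channel_power(samples):
    """Mean power of one channel envelope."""
    return sum(v * v for v in samples) / len(samples)


def output_snr(speech_env, noise_env, floor=1e-12):
    """Power-weighted sum of per-channel SNRs in dB.

    speech_env, noise_env: lists of channels of linear-domain envelopes,
    processed separately with the gains recorded from the mixture.
    """
    speech_pow = [channel_power(ch) for ch in speech_env]
    noise_pow = [channel_power(ch) for ch in noise_env]
    total = sum(speech_pow)
    if total == 0.0:
        raise ValueError("speech envelopes carry no power")
    weights = [p / total for p in speech_pow]  # assumed weighting rule
    snr_db = [10.0 * math.log10(max(s, floor) / max(n, floor))
              for s, n in zip(speech_pow, noise_pow)]
    return sum(w * s for w, s in zip(weights, snr_db))


# Hypothetical two-channel envelopes (values are illustrative only).
speech = [[2.0, 2.0], [1.0, 1.0]]
noise = [[0.2, 0.2], [1.0, 1.0]]
print(output_snr(speech, noise))
```

In this example the first channel has an SNR of 20 dB and carries 80% of the speech power, while the second has an SNR of 0 dB, so the weighted sum is about 16 dB.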

[Figure 13-6 plot: six panels (S1 to S6); y-axis: Percent Correct Scores (%), 0 to 100; x-axis: Output SNR (dB), -2 to 20; legend: No AGC SNR 10, No AGC SNR 20, FEL75 SNR 10, FEL75 SNR 20, EPL625 SNR 10, EPL625 SNR 20, FEL625 SNR 10, EPL75 SNR 10; deviance values shown in the panels: 5.59, 5.19, 6.41, 5.55, 4.66 and 7.41.]

Figure 13-6 Percent correct score of individual subject as a function of output SNR.
The open and filled symbols represent the results in 10 dB and 20 dB SNR respectively.

Each of the diagrams in Figure 13-6 shows the percent correct scores of each subject with

different AGC configurations as a function of OSNR. Each subject’s scores were highly

correlated with the OSNR. The psychometric curve was a good fit for each subject, with the

deviance between 4.66 and 7.41. The slopes of the psychometric functions were similar

across subjects, but the OSNR knee points were different, i.e. the curves were shifted

horizontally. This was expected because some subjects performed better than others in noise.

All subjects reached a performance asymptote for high OSNRs. The three diagrams in Figure


13-7 show the group mean percent correct scores of the subjects with the three processing

conditions as a function of OSNR.

[Figure 13-7 plot: three panels; y-axis: Percent Correct Scores (%), 0 to 100; x-axis: Output SNR (dB), 0 to 20. Top panel (all conditions): D = 16.36. Middle panel (SNR 20 dB): No AGC (D = 3.70), FEL75 (D = 3.42), EPL625 (D = 5.57). Bottom panel (SNR 10 dB): No AGC (D = 3.56), FEL75 (D = 6.11), EPL625 (D = 3.74), FEL625, EPL75.]

Figure 13-7 Group mean scores as a function of output SNR. The top panel shows the
scores in all conditions, the middle panel shows the scores at 20 dB SNR and the bottom
panel shows the scores at 10 dB SNR.

The low deviance of the psychometric fit indicates that the OSNR could predict the percent

correct scores of recipients with each AGC condition in noise. The OSNR reduction with no

AGC was mainly due to the instantaneous compression at the LGF. As the presentation level

increased, the clipping initially affected only target speech while having little effect on

background noise level. As the presentation level increased further, the OSNR reduction became

significant with no AGC. The OSNR degradation still occurred with the front-end

compression limiter at high presentation levels, although the amount of OSNR reduction was

not as high as with no AGC. Among the three processing conditions, the

OSNR degradation with the proposed envelope profile limiter was the lowest. When

comparing the OSNR of each AGC with the two release times, 75 and 625 ms, it was shown

that each AGC with the release time of 75 ms produced significantly lower OSNR than the

same AGC with the release time of 625 ms. The performance degradation with the shorter release

time was explained by the OSNR.

13.3.3 Across-Source Modulation Correlation

The Across-Source Modulation Correlation (ASMC) metric calculates the correlation

coefficient between the target speech and competing voice (Stone and Moore 2007). The two

signals that were originally independent acquired some correlation due to the common

modulation introduced by the compression. The hypothesis was that the segregation between

the target and background noise then became difficult, degrading speech intelligibility.

Figure 13-8 ASMC calculation for the signal path. The front-end AGC was used as an
example in this diagram.

The procedure to calculate ASMC was adapted from the study of Stone and Moore (2007).

The implementation of the ASMC metric for the ACE strategy is shown in Figure 13-8.

The mixture of speech and noise was processed by the signal path. The gain signals from the

AGCs were recorded. Next, the clean speech was processed through the signal path,

applying the recorded gain. The channel envelopes after the filterbank were subject to the

base and saturation level in the linear domain, while retaining the effect of clipping. Stone


and Moore suggested calculating the coefficients from the logarithm of the channel

envelopes because log amplitudes were more relevant to the perception than linear

amplitudes. Therefore the channel envelopes were converted into the log domain. The log

channel envelopes were then smoothed by a low-pass filter with the cut-off of 50 Hz.

Similarly, the noise alone was processed, using the recorded gain. Then, the correlation was

calculated for each channel. As in other metrics, the ASMCs across all frequency channels were

averaged to give the final ASMC.
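The steps above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the study: the function names are hypothetical, a one-pole recursive filter stands in for the 50 Hz low-pass filter (whose design is not specified here), and the example envelopes are artificial. The inputs are the speech-only and noise-only channel envelopes obtained with the gain recorded from the mixture.

```python
import math


def limit(samples, base, sat):
    """Restrict envelope amplitudes to the range [base, sat]."""
    return [min(max(v, base), sat) for v in samples]


def one_pole_lowpass(samples, cutoff_hz, fs_hz):
    """Simple recursive smoother used here in place of the 50 Hz filter."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs_hz)
    state, out = samples[0], []
    for v in samples:
        state = a * state + (1.0 - a) * v
        out.append(state)
    return out


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def asmc(speech_env, noise_env, base, sat, fs_hz, cutoff_hz=50.0):
    """Average over channels of the correlation between log envelopes."""
    values = []
    for s, n in zip(speech_env, noise_env):
        log_s = [math.log10(v) for v in limit(s, base, sat)]  # base must be > 0
        log_n = [math.log10(v) for v in limit(n, base, sat)]
        values.append(pearson(one_pole_lowpass(log_s, cutoff_hz, fs_hz),
                              one_pole_lowpass(log_n, cutoff_hz, fs_hz)))
    return sum(values) / len(values)


# Artificial example: the noise envelope drops whenever the speech is high.
speech = ([0.1] * 10 + [0.4] * 10) * 5
noise = ([0.4] * 10 + [0.1] * 10) * 5
print(asmc([speech], [noise], base=0.01, sat=1.0, fs_hz=1000.0))
```

The artificial envelopes move in exact opposition, so the sketch returns a value close to -1. Envelopes processed by a real AGC would show much weaker negative correlation.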

[Figure 13-9 plot: six panels (S1 to S6); y-axis: Percent Correct Scores (%), 0 to 100; x-axis: ASMC Index, -0.5 to 0.1; legend: FEL75 SNR 10, FEL75 SNR 20, EPL625 SNR 10, EPL625 SNR 20, FEL625 SNR 10, EPL75 SNR 10; deviance values shown in the panels: 31.00, 20.49, 22.98, 19.33, 12.69 and 28.30.]

Figure 13-9 Percent correct score of the individual subject as a function of ASMC. The
open and filled symbols represent the results at SNR 10 dB and 20 dB respectively.

Each of the diagrams in Figure 13-9 shows the percent correct scores of each subject with

different AGC configurations as a function of ASMC. The ASMC could follow the trend of

scores for each subject. The deviance of the psychometric curve fitting ranged between 12.7

and 31.

[Figure 13-10 plot: three panels; y-axis: Percent Correct Scores (%), 0 to 100; x-axis: ASMC, -0.5 to 0.1. Top panel (all conditions): D = 132.61. Middle panel (SNR 20 dB): FEL75 (D = 3.28), EPL625 (D = 5.88). Bottom panel (SNR 10 dB): FEL75 (D = 4.27), EPL625 (D = 3.88), FEL625, EPL75.]

Figure 13-10 Group mean scores as a function of ASMC. The top panel shows the
scores in all conditions, the middle panel shows the scores at 20 dB SNR and the bottom
panel shows the scores at 10 dB SNR.


The bottom and middle panels of Figure 13-10 show that the correlation between the

measured scores and the ASMC was high for each SNR condition. The deviance was low

(indicating a good fit) for the psychometric curve at each SNR. However, a

single psychometric curve could not fit the scores across the different SNR conditions. The

top diagram of Figure 13-10 shows two trends of score degradation with ASMC, one for

each SNR condition. The deviance of the curve fitting was high in that case

(D = 132.61).

The amount of compression and consequently the magnitude of ASMC also increased with

the presentation level. The measured scores reflected the performance indicated by ASMC.

The magnitude of ASMC with the proposed envelope profile limiter was lower than that

with the front-end counterpart. The ASMC comparison between the same AGC with

different release times shows that the performance improvement was mainly due to the

release time.

It was unclear why ASMC at SNR 20 dB was higher in magnitude than that at SNR 10 dB

for the same presentation levels. It was logical to think that more negative ASMC would

result at low SNR condition because each AGC exercised more compression at SNR 10 dB

than at SNR 20 dB. A possible explanation for observing higher magnitude ASMC at high

SNR was the amount of modulation in the gain. At SNR 20 dB, the AGC gain was mainly

determined by speech and occasionally by noise. Hence it was modulated at the rate of

speech approximately. Whereas at SNR 10 dB, the slow modulation pattern of gain was less

prominent because the AGC reduced the overall presentation level most of the time.

Moreover, the type of background noise could also have an impact on the ASMC. If the

background noise was a competing voice, as in the studies of Stone and Moore (2003, 2004),

the SNR value would have less effect on the ASMC.

13.3.4 Normalized Covariance Measure

The Normalized Covariance Measure (NCM) indicates the fidelity of channel envelopes at

the output of the LGF compared to the reference channel envelopes. When the presentation

level is increased, the variation between the reference and processed signals becomes larger

due to compression of the AGC systems and the instantaneous compression at the LGF.

Hence the correlation between the reference and processed signal degrades at high

presentation levels.

The reference signal consisted of the channel envelopes of clean sentences taken before the LGF and

hence neither clipped at the saturation level nor thresholded at the base level. The processed signal

was taken after the LGF to include the effect of instantaneous infinite compression at the

LGF. An inverse LGF was applied to revert to the linear domain, while retaining the effect

of clipping. The implementation of the NCM method closely followed the procedure

described in the study of Ma et al. (2009). The calculation of the NCM is described in

§6.4. The weight of the transmission index for each frequency channel was calculated based on

the RMS energy of the reference signal.
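The calculation outlined above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes the general NCM form of Ma et al. (2009), in which the squared normalized covariance in each channel is converted to an apparent SNR, limited to the range -15 to +15 dB and mapped linearly to a transmission index between 0 and 1. The function name `ncm` and the array layout (channels by samples) are hypothetical; the RMS-based channel weights follow the description in the text.

```python
import numpy as np

def ncm(reference, processed, eps=1e-12):
    """Normalized covariance measure between two sets of channel envelopes.

    reference, processed: arrays of shape (channels, samples).
    """
    ref = np.asarray(reference, dtype=float)
    proc = np.asarray(processed, dtype=float)

    # Remove the mean of each channel envelope.
    ref_c = ref - ref.mean(axis=1, keepdims=True)
    proc_c = proc - proc.mean(axis=1, keepdims=True)

    # Squared normalized covariance (squared correlation) per channel.
    cov = np.sum(ref_c * proc_c, axis=1)
    denom = np.sum(ref_c ** 2, axis=1) * np.sum(proc_c ** 2, axis=1)
    r2 = np.clip(cov ** 2 / np.maximum(denom, eps), 0.0, 1.0 - 1e-9)

    # Apparent SNR in dB, limited to [-15, 15], then mapped to [0, 1].
    snr_db = 10.0 * np.log10(np.maximum(r2, eps) / (1.0 - r2))
    snr_db = np.clip(snr_db, -15.0, 15.0)
    transmission_index = (snr_db + 15.0) / 30.0

    # Channel weights from the RMS energy of the reference envelopes.
    weights = np.sqrt(np.mean(ref ** 2, axis=1))
    return float(np.sum(weights * transmission_index) / np.sum(weights))
```

With identical reference and processed envelopes the index is 1. Hard clipping of the processed envelopes lowers the correlation in each channel and therefore lowers the index, which is the behaviour described for high presentation levels.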

[Figure 13-11: six panels, one per subject (S1 to S6). x-axis: NCM Index (0.3 to 1); y-axis: Percent Correct Scores (%) (0 to 100). Legend: No AGC SNR 10, No AGC SNR 20, FEL75 SNR 10, FEL75 SNR 20, EPL625 SNR 10, EPL625 SNR 20, FEL625 SNR 10, EPL75 SNR 10. Deviances of the fitted curves shown in the panels: D = 14.92, 15.58, 14.55, 10.00, 13.22 and 8.92.]

Figure 13-11 Percent correct scores of the individual subjects as a function of NCM. The
open and filled symbols represent the results at SNR 10 dB and 20 dB respectively.

The diagrams in Figure 13-11 show the percent correct scores of each subject as a function

of the NCM. The NCM followed the trend of scores for each subject. The deviance of the

psychometric curve fitting ranged between 8.92 and 15.58.

[Figure 13-12: three panels. x-axis: NCM (0.3 to 1); y-axis: Percent Correct Scores (%) (0 to 100). Top panel, all conditions: D = 55.94. Middle panel, SNR 20 dB: No AGC (D = 3.73), FEL75 (D = 3.35), EPL625 (D = 7.01). Bottom panel, SNR 10 dB: No AGC (D = 2.86), FEL75 (D = 7.73), EPL625 (D = 3.87), FEL625, EPL75.]

Figure 13-12 Group mean scores as a function of NCM index. The top panel shows the
scores in all conditions, the middle panel shows the scores at 20 dB SNR and the bottom
panel shows the scores at 10 dB SNR.

The three diagrams in Figure 13-12 show the mean percent correct scores of the subjects as a

function of NCM. The bottom and middle diagrams of Figure 13-12 show that the

correlation between the scores and the NCM indexes was high for each processing condition

in each SNR. The deviance was low (indicating a good fit) for the psychometric curve for

each processing condition. A single curve fitting for all scores together captured the trend of

score degradation but the spread was wide, with a deviance of 55.94.

The reduction of the NCM with no AGC was mainly due to envelope clipping at the LGF.

The higher the presentation level, the more the correlation was reduced between the

reference and the clipped envelopes for the processing with no AGC. The NCM measure

captured the envelope distortion due to clipping and it was highly correlated with the subjects’

performance for each SNR. Like the OSNR and ASMC measures, the NCM can also capture

the performance change due to the release time.

13.4 Discussions and Conclusions

Four signal metrics have been investigated in this chapter. From the analysis done on each

signal metric, the following conclusions are made. Each metric was used to predict the speech

intelligibility scores of the subjects measured in the previous clinical studies using the fixed-level

method. Among them, the output SNR metric consistently captured the change in scores with

changes in the output SNR. Figure 13-13 shows the comparison of deviances from

the psychometric fitting of each signal metric and the mean speech intelligibility scores.

[Figure 13-13: bar chart. x-axis: Signal Metric (Clipping(%), OSNR, ASMC, NCM); y-axis: Deviance (0 to 140).]

Figure 13-13 Comparison of deviance between four signal metrics

The clipping proportion showed a good correlation with mean scores for each processing in

each test condition. Scores degraded as the proportion of clipping increased. However, using

the proportion of clipping as a performance indicator to compare two AGC systems or

configurations is not effective. The effect of envelope clipping could be prominent for very

high-level speech in quiet. However, not enough speech intelligibility data were measured for that

test condition. For speech in noise, factors other than spectral envelope clipping affected the

performance. The clipping proportion can be used as a measure of envelope distortion but it

alone is not sufficient to predict the speech intelligibility of cochlear implant recipients.

There are other potential side-effects of envelope clipping. For example, the sound quality of

clipped envelopes can be low, and stimulating at the maximum current level for a prolonged

duration can drain the battery faster.

The NCM explained the performance degradation with the reduction of temporal envelope

correlation between the reference clean speech and the signal processed by the AGC systems

and the LGF. The NCM captured the general trend of score degradation but the spread was

wide between processing conditions. The sensitivity of the NCM to the speech intelligibility

performance of the subjects depended on the AGC system. For example, the rate of score

degradation with decreasing NCM was higher for the processing with the front-end

compression limiter than for the processing with no AGC. The comparison of deviances shows

that the NCM is a better predictor than the clipping proportion. However, as with the clipping

proportion, factors other than the preservation of temporal envelope shape are more important

for speech intelligibility. The findings in this chapter on envelope preservation, as quantified

by the proportion of clipping and the NCM, agree with Stone and Moore (2007), who also

showed that fidelity of envelope shape was not very important for speech intelligibility.

The ASMC has a good sensitivity to the effect of compression for speech in noise at high

presentation levels. It can compare the performance of two AGC systems. However, care

should be taken when two test conditions with different SNRs are compared. More

investigation should be done on ASMC with different types of noise at different SNR levels

to show its dependency on noise-related test conditions.

Because cochlear implant sound processing is non-linear, the SNR at the output differs from

the SNR at the input. The present study extended prior work on calculating the apparent

SNR, to make it suitable for cochlear implant processing. Prior work recorded the gains

produced by the AGC in response to the speech and noise mixture, and applied these gains

separately to the clean speech and the noise (Rhebergen, Versfeld and Dreschler 2009).

However, if the remaining cochlear implant processing was then applied, the maxima

selection on the clean speech would choose different channels from the maxima selection on

the noise. Instead, the channel indexes were recorded from the maxima selection on the

speech and noise mixture, so that the contributions of speech and noise to each stimulation

pulse could be identified.
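The bookkeeping described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the thesis implementation: the envelopes are given per channel and per frame, a single broadband gain per frame is recorded from the AGC acting on the mixture, and the stages after maxima selection (including the LGF) are omitted. The function name `output_snr_db` and all argument names are hypothetical.

```python
import numpy as np

def output_snr_db(mixture_env, speech_env, noise_env, gains, n_maxima):
    """Output SNR in dB using gains and channel indexes taken from the mixture.

    mixture_env, speech_env, noise_env: arrays of shape (channels, frames)
    gains: per-frame gains recorded from the AGC acting on the mixture
    n_maxima: number of channels selected in each frame
    """
    mixture = np.asarray(mixture_env, dtype=float)
    speech = np.asarray(speech_env, dtype=float)
    noise = np.asarray(noise_env, dtype=float)
    gains = np.asarray(gains, dtype=float)

    # Apply the gains recorded from the mixture to each component separately.
    mixture_g = mixture * gains
    speech_g = speech * gains
    noise_g = noise * gains

    # Maxima selection is performed on the mixture only; the same channel
    # indexes are then used to read out the speech and noise contributions.
    selected = np.argsort(mixture_g, axis=0)[-n_maxima:, :]
    frames = np.arange(mixture.shape[1])

    speech_energy = np.sum(speech_g[selected, frames] ** 2)
    noise_energy = np.sum(noise_g[selected, frames] ** 2)
    return float(10.0 * np.log10(speech_energy / noise_energy))
```

Reusing the channel indexes from the mixture is the point of the design: the speech and noise energies are accumulated over exactly the same stimulation pulses, so their ratio describes what is actually delivered.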

Unlike the other metrics, the performance indication given by the OSNR was not conditional on

the processing condition or the input SNR. The OSNR metric quantified the performance

by the signal-to-noise ratio at the output. The diagrams of the measured scores against the

OSNR showed that cochlear implant subjects were sensitive to the output SNR between 0

and 5 dB. In this study, the OSNR explained the performance degradation in terms of the energetic

masking of the target speech by competing noise. Energetic masking is a peripheral masking

phenomenon that occurs when energy from two or more sounds overlaps both spectrally and

temporally, thereby reducing signal detection (Stickney et al. 2004b). The four-talker babble

noise used in the clinical studies of the AGCs appears to produce energetic masking, as

shown by the monotonic performance degradation with the level of background noise in the

fixed-level testing.

Normal hearing subjects can differentiate two voices by their pitches and other properties

such as speaking style and accent. In addition, normal hearing subjects can benefit from dip

listening at the moments when noise energy drops (Miller 1947; Greenberg et al. 2004).

Unlike normal hearing subjects, hearing-impaired subjects did not get much benefit from

masking release during spectral and temporal dips (Moore, Peters and Stone 1999). It is well

known that pitch perception of cochlear implant recipients is poor (Zeng et al. 2002). If the

amount of noise is assumed to be the main determinant of speech understanding of cochlear

implant recipients in noisy conditions, the OSNR metric can be used to predict the

intelligibility of speech in any type of noise. However, the study of Stickney et al. (2004b)

showed that cochlear implant recipients and normal hearing subjects listening to noise-vocoded

speech achieved higher scores for speech presented with stationary noise than with a

single-talker babble at the same SNR. They argued that segregation between speech and

noise was better if they were spectrally different. Further experiments should be conducted

on the effectiveness of the OSNR in predicting the speech intelligibility of cochlear implant

recipients with different types of noise, for example a single competing talker, or LTASS

with spectral and temporal dips, to observe factors other than the energetic masking of noise

that affect speech intelligibility.

[Figure 13-14: three panels showing ASMC (top), NCM (middle) and OSNR in dB (bottom) for the AGC configurations FE75, FEL625, EPL75 and EPL625 (x-axis: AGC configurations). Data labels shown in each panel: 20%, 25%, 67% and 69%.]

Figure 13-14 Effect of release time on speech intelligibility predicted by ASMC (top
panel), NCM (middle panel) and OSNR (bottom panel)

Figure 13-14 shows the effect of release time on speech intelligibility predicted by each

metric. The OSNR, ASMC and NCM metrics consistently showed that the release time was

the main factor that affected speech intelligibility in this case. When the release time is

longer, the compression speed becomes slower. Slow compression introduces less distortion of

temporal envelopes than fast compression. In addition, when the release time is

longer, the amount of compression applied to the signal above the compression threshold

becomes larger, so less clipping results at the output.

P a g e | 248
Chapter 14 Conclusions and Future Work

14 Conclusions and Future Work

This chapter first summarizes the experimental results of this thesis chapter by chapter and

then draws the conclusions on the gain control techniques in cochlear implant systems and

their effects on speech intelligibility of the recipients based on the findings. It then suggests

further work on the proposed algorithms to enhance listening performance of the recipients.

14.1 Summary of Experimental Results

Chapter 8 investigated the effects of the input dynamic range limitation by the instantaneous

compression at the LGF on the speech intelligibility of cochlear implant recipients. The

performance-intensity functions with no AGC and with the front-end compression limiter

were measured for sentences presented at different presentation levels in two SNR

conditions. With no AGC, speech was still intelligible even when the sentence presentation

level was very high, for the high SNR condition. At low SNR, the envelope clipping for

sentences presented at high levels had a high impact on performance, not only due to

envelope distortion, but also due to the output SNR degradation. Background noise was

shown to have more impact on speech intelligibility than envelope clipping. Although the

front-end limiter reduced envelope clipping at the LGF for presentation levels above

65 dB SPL, the score improvement was moderate. The study in Chapter 8 highlighted the

importance of an AGC system for cochlear implant systems in adverse listening conditions,

but questioned the effectiveness and robustness of a front-end limiter for speech presented at

high presentation levels.

Chapter 9 measured the SRTs of the recipients with an existing AGC system (the tri-loop

AGC and ADRO), and with the front-end limiter, for sentences roving at three presentation

levels: 50, 65 and 80 dB SPL. The study clearly indicated that a slow AGC was essential to

adapt to changes in the overall presentation level. It also showed that the performance of the

existing AGC system at 50 and 80 dB SPL could be improved. The study not only collected

baseline results, but also investigated advantages and disadvantages of two variants of the

roving-level SRT test. The roving-level SRT test with a single adaptive track showed poor

test-retest reliability. The SRT depended on the randomized sequence of presentation levels.

The interleaved roving-level SRT, with a fixed sequence of presentation levels, was

recommended for evaluation of AGC systems in the future.

Chapter 10 demonstrated the feasibility of replacing the front-end compression limiter with

a new multichannel compression limiter, called the envelope profile limiter, which

eliminated envelope clipping. Intelligibility with the envelope profile limiter was equal to, or

slightly better than, that with the front-end limiter. The study also showed that a slow compression

speed (i.e. a longer release time) was more important than preserving the spectral envelope

profile.

Chapter 11 evaluated the envelope profile limiter in a take-home trial. The subjects

preferred their standard program, with ASC, in noisy listening conditions. The study showed

the importance of employing a slow AGC for everyday use.

Chapter 12 proposed a novel algorithm called Adaptive Loudness Growth Function

(ALGF), which expanded and contracted the dynamic range of the LGF adaptively based on

the features extracted from the channel envelopes. It was evaluated with cochlear implant

recipients by using the interleaved SRT test. The ALGF achieved equal or better speech

understanding compared to an existing AGC system (Tri + ADRO). A significant speech

intelligibility improvement with the ALGF was shown at a high presentation level (80

dB SPL). The ALGF showed the potential to achieve both audibility and noise reduction.

Chapter 13 investigated whether signal processing metrics could be used to predict the

speech intelligibility scores of the cochlear implant recipients that were obtained in the

previous chapters. The proportion of envelope clipping was not a good predictor of

intelligibility in noise. The ASMC could not predict the effect of SNR. The NCM was able

to predict the overall trends in scores. The best metric was output SNR, which was an

effective predictor of speech intelligibility scores for a range of processing, presentation

level, and input SNR conditions.

14.2 Conclusions

The cochlear implant is a hearing prosthesis that transforms the world of silence into the

world of sounds. There is no doubt that the cochlear implant has rehabilitated hearing of deaf

people, from lip-reading to telephone answering and music appreciation. Countless

conversations with the recipients and numerous hours of speech tests indicated that speech

understanding of the recipients was up to par at least in favourable listening conditions.

However, performance degraded quickly in adverse listening conditions.

This thesis has investigated gain optimization techniques for presenting input stimuli

within the small electrical dynamic range of cochlear implants. The conventional AGC

systems were investigated first and then new techniques were proposed based on the

findings.

The investigation started from the effect of instantaneous infinite compression of the LGF on

the speech intelligibility. The LGF of the Nucleus systems infinitely compressed the

filterbank outputs above the saturation level. It also thresholded the filterbank outputs below

the base level. With no AGC before the LGF to adjust the signal level, envelope clipping

occurred for high-level sounds. The speech intelligibility of cochlear implant recipients with

no AGC (§8.2.4.1) before the LGF was measured. With no AGC, envelope clipping occurred

substantially at high presentation levels (§13.3.1). Envelope clipping had a strong effect on

speech intelligibility at high presentation levels. The effect was more prominent when the

background noise was also in the upper range of the LGF.
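The clipping and thresholding behaviour of the LGF described above can be illustrated with a short sketch. It is not the thesis implementation. The logarithmic mapping between the base and saturation levels, the steepness parameter `alpha` and its default value are assumptions made for illustration only, and the clipping proportion is assumed here to be the fraction of envelope samples above the saturation level.

```python
import numpy as np

def lgf(envelope, base_level, saturation_level, alpha=416.2):
    """Map channel envelope magnitudes to a 0..1 output.

    Values at or below base_level map to 0 (thresholding) and values at or
    above saturation_level map to 1 (infinite compression, i.e. clipping).
    alpha is an illustrative steepness value, not taken from the source.
    """
    v = np.asarray(envelope, dtype=float)
    x = np.clip((v - base_level) / (saturation_level - base_level), 0.0, 1.0)
    return np.log1p(alpha * x) / np.log1p(alpha)

def clipping_proportion(envelope, saturation_level):
    """Fraction of envelope samples above the saturation level (assumed definition)."""
    v = np.asarray(envelope, dtype=float)
    return float(np.mean(v > saturation_level))
```

With no AGC before the LGF, raising the presentation level scales the envelopes up, so more samples exceed the saturation level and the clipping proportion can only stay the same or grow.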

Envelope clipping has two effects on the input stimuli: envelope distortion and SNR

degradation (in noise). The envelope distortion can be analysed as temporal waveform

distortion in the time domain and spectral envelope distortion in the frequency domain. In the

time domain, clipping can distort the shape of the temporal envelope waveform and reduce the

modulation depth. Clipping of spectral envelopes, on the other hand, can destroy formant

patterns and the ratio of energy between voiced and unvoiced speech. The envelope distortion

occurred both in quiet and in noise for high-level input stimuli. However, the SNR degradation

could only occur in noise. Speech intelligibility degradation was plotted against the proportion of envelope

clipping. With no AGC, the recipients could tolerate 25% of envelope clipping, with mean

scores at about 90% (Figure 13-4 at §13.3.1) when the background noise level was low (at

20 dB SNR). It was deduced that envelope distortion has low impact on speech

intelligibility.

Envelope clipping reduced the output SNR in noise (§13.3.2). When the speech intelligibility

of cochlear implant recipients with no AGC in noise at 10 dB SNR was plotted against the

clipping proportion, performance degraded monotonically for the presentation levels

above 70 dB SPL (§13.3.1). At 10 dB SNR, the clipping proportion of speech at 70 dB SPL

was approximately 25% and the mean score was about 60% (Figure 13-4 at §13.3.1). For the

same clipping proportion, the mean score was 30 percentage points higher at 20

dB SNR. The output SNR was approximately 2.5 dB and 5 dB for input SNRs of 10 dB

and 20 dB respectively. The output SNR degradation due to envelope clipping was the main factor

affecting the speech understanding at high presentation levels in noise.

Although envelope distortion has a low impact on speech intelligibility, it can become

prominent when the clipping proportion is more than 25%. The experiment with no AGC did

not show this effect because the background noise level was relatively high (i.e. close to the

saturation level) for the presentation levels above 83 dB SPL at 20 dB SNR. The output SNR

reduction occurred at high presentation levels in noise because the envelopes of target speech

were clipped more than the background noise. If that was the case, then reducing the clipping

proportion at high presentation levels in noise should improve the output SNR and therefore

the speech intelligibility. The experimental results of the front-end compression limiter

(§8.2.4.2) partially supported the above statement. The speech intelligibility improvement

was seen at very high presentation levels, at 86 and 89 dB SPL, at 20 dB SNR and above 70

dB SPL at 10 dB SNR.

The purpose of the front-end compression limiter (§8.2.4.2) was to reduce the envelope

clipping at the LGF. However, it could not guarantee zero percent envelope clipping,

because the compression happened before the filterbank and the filterbank outputs might still

be above the saturation level of the LGF. This could happen for input stimuli with a low

crest factor. Envelope clipping was observed at high presentation levels with the front-end

compression limiter (§13.3.1). Performance was not proportionally improved with the

reduction of clipping proportion (Figure 13-4). Moreover, the front-end compression limiter

could itself introduce temporal envelope distortion and, in noise, SNR reduction, and

consequently reduce the speech intelligibility of cochlear implant recipients.

A novel compression limiter, known as the envelope profile limiter, was proposed to

improve the spectral envelope cues of input stimuli (§10.2.2). By using the saturation level

of the LGF as the compression threshold, the proposed envelope profile limiter could be

configured to avoid clipping completely. More importantly, it applied a single gain to all

channels to preserve the spectral envelope profile of input stimuli. The effect of avoiding

envelope clipping by preserving spectral profile was observed by comparing speech

intelligibility of cochlear implant recipients with the front-end compression limiter and with

the envelope profile limiter (§10.4.1). In addition to that, the effect of compression speed on

speech intelligibility was also observed. The experimental results showed that the

compression speed had a larger effect on speech intelligibility than the gain structure that

preserved spectral profile (§10.4). The AGC configuration with longer release time achieved

higher speech intelligibility for each AGC.
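The gain structure of the envelope profile limiter described above can be sketched as follows. This is a minimal illustration, not the thesis algorithm: it assumes an instantaneous attack and a one-pole release, and the function name, the `release_coeff` parameter and its default value are hypothetical. What it does take from the text is the use of the LGF saturation level as the compression threshold and the application of a single gain to all channels.

```python
import numpy as np

def envelope_profile_limiter(envelopes, saturation_level, release_coeff=0.99):
    """Limit channel envelopes with one common gain per frame.

    envelopes: array of shape (channels, frames)
    release_coeff: closer to 1 means a longer release time (slower recovery)
    """
    env = np.asarray(envelopes, dtype=float)
    out = np.empty_like(env)
    gain = 1.0
    for t in range(env.shape[1]):
        peak = env[:, t].max()
        # Gain that would place the largest channel exactly at saturation.
        target = min(1.0, saturation_level / peak) if peak > 0 else 1.0
        if target < gain:
            gain = target  # instantaneous attack (assumption)
        else:
            # Slow recovery towards the target; the gain never exceeds it.
            gain = release_coeff * gain + (1.0 - release_coeff) * target
        # The same gain is applied to every channel, preserving the profile.
        out[:, t] = gain * env[:, t]
    return out
```

Because the gain never exceeds the value that places the largest channel at the saturation level, no channel is clipped, and because every channel in a frame receives the same gain, the ratios between channels are unchanged. A larger `release_coeff` keeps the gain reduced for longer after a peak, which corresponds to the longer release time discussed in the text.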

Preserving the spectral envelope improved speech intelligibility only in some test

conditions. It was shown to be more effective when envelope distortion was more severe due to

fast compression; for example, the envelope profile limiter performed better than the

front-end compression limiter in quiet when the release time was 75 ms. In noise, however, it was

essential to reduce the compression speed for speech intelligibility. An additional

benefit of preserving the spectral envelope profile was observed when the release time was 625

ms. Figure 13-14 shows the effect of compression speed, or release time, quantified by the

output SNR, the ASMC and the NCM. All three signal metrics showed that a longer release

time was important. Each AGC with the release time 625 ms improved the output SNR,

reduced the cross-modulation between target and non-target signals and preserved temporal

envelope shape more than the same AGC with the release time 75 ms.

The importance of slow AGC was also observed when the existing AGC systems were

evaluated using the roving-level SRT tests (§9.2.4). The take-home experiment on the envelope

profile limiter showed that subjects preferred their standard program in noisy situations

because it contained ASC that reduced the overall level when the background noise was high

(§11.4). AGCs with even slower time constants (i.e., a release time longer than 625 ms) were

necessary not only for speech intelligibility, but also for listening comfort in real life. Unlike

studies of hearing aid users, which showed mixed results on compression speed (Kates 2010), the

experimental results in this thesis clearly indicated that a slow compression speed in an AGC

was of primary importance for cochlear implant recipients. An AGC with a fast compression

speed could do more harm to the channel envelopes than the good it achieved through loudness

balancing. Hence a hypothetical AGC system that could bring benefits in both quiet and

noisy conditions would be the envelope profile limiter with a release time longer than 625 ms

or an AGC system with a slow AGC followed by the envelope profile limiter placed before

the LGF.

A fast AGC, also known as WDRC, and a slow multichannel AGC called ADRO are

employed in hearing aids and Nucleus cochlear implant systems to improve audibility of

low-level sounds. Such systems improved speech intelligibility of low-level speech in quiet

but mixed results were shown in noise. WDRC has drawbacks in noise due to its fast

compression speed. ADRO, on the other hand, is a slow AGC, but the background noise rule,

which takes precedence over the audibility rule, could compromise audibility. The standard

parameter setting of ADRO uses a less stringent background noise criterion to improve audibility.

For the normal hearing system, heavily distorted speech can still be intelligible because the

intelligibility is carried by redundant cues in time and frequency. Unlike normal hearing, it is

important for a cochlear implant system to preserve critical cues of speech to maintain

intelligibility because it cannot preserve all the acoustic features available in speech. AGCs

are important to maintain, if not improve, the available limited spectro-temporal cues. This

thesis has singled out effects of AGC that are important for speech intelligibility. The most

significant factor of AGC that could affect speech intelligibility is compression speed. The

effect is strongly related to the level of background noise. Many researchers have shown that

slow compression speed is important to maintain slow temporal envelope modulation. This

thesis showed that slow compression speed is not only important for slow temporal envelope

modulation but also for reducing the proportion of clipping as well as improving the output

SNR because it applies a larger amount of gain reduction to input stimuli above the compression

threshold.

From the results of various studies on AGC systems in hearing aids, Kates (2010) suggested

that the parameters of hearing aids cannot be fixed for all hearing losses or listening

situations. Rather, a hearing aid should adjust its processing dynamically in response to the calculated

individual benefit. Similarly, in cochlear implant systems, signal processing should be

adaptable to listening situations. Conventional AGCs, slow and fast AGC together, can

tackle the long-term and short-term level variation in input stimuli. They could extend the

upper and lower limit of the operating range but the net range between them would still be

the same. With conventional AGC systems, the audibility of target signals can be achieved at

the risk of increasing the level of competing signals because gain adjustment is done within a

fixed dynamic range.

A proposal was made in this thesis that the input dynamic range of cochlear implant systems

should follow the dynamic range of input stimuli. For example, a large dynamic range was

preferable for clean speech to include low-level components in the range. In contrast, a small

dynamic range was more appropriate in noisy conditions to exclude noise from the range

(Holden et al. 2011). A novel signal level optimization technique proposed in this thesis

could improve audibility without compromising on the level of background noise (§12.3).

The proposed algorithm, known as the Adaptive Loudness Growth Function (ALGF),

continually adjusted the dynamic range of the LGF to accommodate the input stimuli. It

controlled audibility and distortion on the channel envelopes by adjusting the saturation level

of the LGF. The saturation level regulator is a dual-loop level control with slow and fast time

constants and with the ability to preserve spectral envelope profile. In addition, it could also

control the level of background noise by adjusting the base level of the LGF. The ALGF has

the potential to reduce the level of background noise. The experimental results showed that

the ALGF achieved equal or better SRT in roving-level sentences compared to the existing

AGC systems (§12.5).

Based on the experimental results and anecdotal reports from the recipients, the ALGF was

considered a feasible alternative to the conventional AGC systems. This thesis has taken

a first step in the research area of robust signal level optimization. Better speech

intelligibility of cochlear implant recipients in adverse listening conditions is anticipated

with this new approach.

14.3 Future Work

Future work is listed in the order of priority anticipated by the present author.

Parameter fine-tuning of ALGF with signal metrics

It is almost impossible to find the optimal parameter set by evaluating speech intelligibility

of cochlear implant subjects with different parameter sets because the parameter space of the

ALGF is large. The best way to work on fine-tuning the parameter space is to use reliable

signal metrics to quantify the effects of changing the parameters. According to the

experimental works in this thesis, the output SNR metric can consistently predict speech

intelligibility of the recipient in noise. Therefore the optimal parameter set of the ALGF

should be worked out with the output SNR metric or other reliable signal metrics before a

take-home study is conducted.

Take-home experiment for ALGF

The output of the ALGF is likely to be presented at the same presentation level regardless of

the input presentation level. It would be interesting to find out the perception of cochlear

implant subjects on sounds processed by the ALGF. Such quality assessment is more

appropriate to conduct in real-life listening conditions than in the laboratory. The benefit of

the ALGF on speech intelligibility has been shown by the acute testing in the laboratory in

this thesis. A take-home study is still needed for complementing the benefits shown by acute

testing. Owing to the considerable amount of time required to implement the ALGF on the BTE

sound processor, the take-home study was not conducted in this thesis and is therefore noted as

future work.

Feature space exploration for ALGF

The ALGF is a feature-based dynamic range optimization algorithm. The features it utilized

were the clipping proportion in the saturation level regulator and the estimated noise magnitude in

the base level regulator. There are other features that could be applied to improve speech

intelligibility, for example spectral envelope correlation or coherence between channels,

modulation rate and depth and energy profile of input stimuli. Feature space is large and it is

important to extract useful features. Some are more important than others, as shown by

the experiments with the signal metrics (§13.3). For example, the output SNR was shown

to be important for speech in noise. Hence employing a noise estimator in the base level regulator

is an appropriate way to improve the output SNR.

Test strategy for evaluating AGC systems for cochlear implant systems

The current test strategy, a roving-level SRT test, assesses the performance of AGC systems at low, normal and high presentation levels. It also assesses the dynamic behavior of the AGC in tracking abrupt changes in the presentation level. The roving-level SRT test was hypothesized to represent realistic listening conditions outside the laboratory (Haumann, Lenarz and Büchner 2010). However, real-life scenarios in which the speech and noise presentation levels rove between three levels are rare. It would be better if the test condition represented a listening situation that recipients commonly encounter, such as a conversation between a recipient and another person. Such a test condition can be simulated by roving between two presentation levels, either 50 and 65 dB SPL or 65 and 80 dB SPL, without losing the essence of the roving-level SRT test. The original roving-level SRT test of Boyle et al. (2009) started the background noise 0.5 seconds before each sentence and stopped it 0.5 seconds after. The interleaved roving-level SRT test used for evaluating the ALGF presented the background noise continuously. Both techniques increased or decreased the background noise level together with the presentation level of each sentence. Again, it is arguable that acoustic scenarios in which the background noise changes between different levels are seldom encountered. A scenario in which sentences rove between different levels in the presence of background noise at a constant level is more common. It is important to evaluate the robustness of AGCs with roving sentence and noise levels, but it is equally important for the test conditions to reflect realistic listening conditions. More research should be done on test methods for AGC systems.
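The alternative scenario discussed above, with sentences roved between two presentation levels against background noise at a constant level, can be sketched as a simple trial schedule in Python. This is only an illustration under stated assumptions: the function name, the noise level of 55 dB SPL and the random selection rule are hypothetical, and the adaptive SNR tracking of an SRT procedure is not modeled.

```python
import random


def roving_schedule(num_sentences, sentence_levels_db=(50.0, 65.0),
                    noise_level_db=55.0, seed=0):
    """Build a trial list with roved sentence levels and a constant noise level."""
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    trials = []
    for index in range(num_sentences):
        level = rng.choice(sentence_levels_db)
        trials.append({
            "sentence": index,
            "sentence_level_db": level,
            "noise_level_db": noise_level_db,      # noise does not rove
            "snr_db": level - noise_level_db,      # SNR follows the sentence level
        })
    return trials


schedule = roving_schedule(20)
```

Because the noise level is fixed, each change in sentence level also changes the input SNR, which is the property that distinguishes this scenario from the tests in which noise and speech levels are roved together.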


Envelope profile limiter for bilateral cochlear implant sound processing

So far, the proposed envelope profile limiter has only been evaluated with unilateral implant recipients. It is hypothesized that the envelope profile limiter could improve sound localization by bilateral cochlear implant recipients by preserving an important binaural cue, the Inter-aural Level Difference (ILD). Figure 14-1 shows an example of a binaural envelope profile limiter. A similar arrangement can be made with the front-end AGCs of the bilateral processors (van Hoesel, Ramsden and O'Driscoll 2002). However, that arrangement cannot guarantee that the ILD is preserved for input stimuli at high presentation levels. For example, the two front-end AGCs can synchronously reduce the signal level to maintain the ILD, but envelope clipping at the LGF can still destroy the relative level difference between the two envelopes and therefore the ILD cue. With the bilateral envelope profile limiter, the relative level difference between the two envelopes is guaranteed to be preserved, because clipping never happens.

Figure 14-1 Application of the envelope profile limiter in bilateral sound processing
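The argument about ILD preservation can be illustrated numerically with a minimal Python sketch. It reduces each ear to a single envelope level in dB and assumes that the bilateral limiter applies one shared gain to both sides. The function names, the shared-gain rule and the example levels are assumptions made for illustration; they are not taken from Figure 14-1 or from the thesis implementation.

```python
def ild_db(left_db, right_db):
    """Inter-aural level difference in dB."""
    return left_db - right_db


def clip_independently(left_db, right_db, saturation_db):
    """Each side is clipped on its own, as happens when envelopes saturate."""
    return min(left_db, saturation_db), min(right_db, saturation_db)


def limit_with_shared_gain(left_db, right_db, saturation_db):
    """Apply one common gain so that the louder side just reaches saturation."""
    gain_db = min(0.0, saturation_db - max(left_db, right_db))
    return left_db + gain_db, right_db + gain_db


left, right, saturation = 76.0, 70.0, 65.0           # hypothetical levels, ILD = 6 dB
clipped = clip_independently(left, right, saturation)
limited = limit_with_shared_gain(left, right, saturation)
ild_after_clipping = ild_db(*clipped)                 # both sides saturate
ild_after_limiting = ild_db(*limited)                 # same gain on both sides
```

With these example values, independent clipping leaves both sides at the saturation level, so the level difference disappears, whereas the shared gain lowers both sides by the same amount and the 6 dB difference is retained.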

Envelope profile limiter for music perception

In cochlear implant music perception, amplitude modulation of the envelopes provides an

important temporal cue to pitch (Laneau, Wouters and Moonen 2004; Swanson 2008). Since

the envelope profile limiter preserves the envelope modulation, it may improve melody

recognition. Therefore the application of the envelope profile limiter to cochlear implant

music perception is noted as future work.


Appendix 1: Subjects

The subject details are listed in Table A-1. The subjects were post-lingually deafened adults

with the Nucleus 24 (CI24M, CI24R), Nucleus Freedom (CI24RE) or Nucleus 5 (CI512)

cochlear implants. The last column indicates the per-channel stimulation rate used in their

usual processor. All subjects used the ACE strategy and the CP810 sound processor at the

time of testing. Bilateral subjects used only one implant of their choice and bimodal subjects

turned off their hearing aids during the test.

Subject  Sex  Age (yr)  Aetiology                Number of implants  Implant use (yr)  Implant type  Number of channels  Number of maxima  PPS
S1       M    75        Unknown                  Bimodal             > 5               CI24RE        20                  10                1800
S2       M    86        Unknown                  Monaural            > 12              CI24M         22                  12                720
S3       M    74        Unknown                  Bilateral           > 2.5             CI24RE        22                  8                 900
S4       M    70        Familial                 Bimodal             > 3.5             CI24RE        22                  8                 900
S5       F    49        Congenital, Progressive  Bilateral           > 6.5             CI24R         22                  9                 1200
S6       F    64        Unknown                  Bilateral           > 5.5             CI24M         22                  12                900
S7       M    41        Unknown                  Bimodal             > 2.5             CI512         22                  8                 900
S8       F    73        Otosclerosis             Bilateral           > 10.5            CI24M         22                  14                720
S9       F    62        Unknown                  Monaural            > 13              CI24R(ST)     22                  12                900
S10      M    79        Progressive              Bilateral           > 5               CI24R(CA)     22                  12                900

Table A-1 Subjects' biographical information, device use and stimulation information


Table A-2 is a cross-reference listing the experiments participated in by each subject.

Experiment                                                        Results section  Subjects participating (of S1 to S10)
P-I functions of no AGC and front-end limiter                     8.2.4            4
Existing AGC systems evaluation                                   9.2.4            7
Release time vs. AGC structure                                    10.4.1           6
P-I functions of front-end limiter and envelope profile limiter   10.4.2           6
Take-home trial of the envelope profile limiter                   11.4             5
ALGF-1 vs. Tri + ADRO                                             12.5.2.1.1       4
                                                                  12.5.2.1.2       4
                                                                  12.5.2.1.2       4
ALGF-2 vs. Tri + ADRO                                             12.5.3.1.1       6
                                                                  12.5.3.1.2       3
                                                                  12.5.3.1.2       5
ALGF-2: adaptive vs. fixed dynamic range                          12.5.4.1.1       5
                                                                  12.5.4.1.2       4

Table A-2 Subject-experiment participation


Appendix 2: Cochlear Implant Clinical Questionnaire

Comparison of Two Conditions

Name: _______________________________ Date: ________________

Speech Processor: ______________________ Project: _______________________

We are interested in knowing which of the two programs or speech processors (P1 or P2)

that you have been comparing performs best in your daily life.

In this questionnaire you are asked to judge the helpfulness of each program or speech

processor in a variety of listening situations. You are asked to judge the benefit of the

processors or programs in each situation, NOT the difficulty of the situation itself.

To answer each question, indicate how helpful each processor or program is by circling the response:

A: Ext. Helpful    B: Very Helpful    C: Helpful    D: Little Help    E: No Help    Not Applicable

Mark ONE rating for each processor or program.

The “Not Applicable” response box is provided if you have not experienced the situation.


We know that not all people talk alike. Some mumble, others talk too fast, and others talk

without moving their lips very much. Please answer the questions according to the way most

people talk.

1. You are watching the news on TV.

P 1: A B C D E

P 2: A B C D E Not Applicable

2. You are at home talking to a friend or member of your family who is in the next room.

P 1: A B C D E

P 2: A B C D E Not Applicable

3. You are in a busy shopping centre. There is a lot of background noise and you are in

conversation with a friend.

P 1: A B C D E

P 2: A B C D E Not Applicable

4. You are speaking to a softly spoken person in a room without any background noise.

P 1: A B C D E

P 2: A B C D E Not Applicable

5. You are listening to the news on the radio in a quiet room.


P 1: A B C D E

P 2: A B C D E Not Applicable

6. You are in a crowded grocery store checkout line and are talking with the cashier.

P 1: A B C D E

P 2: A B C D E Not Applicable

7. You are talking with a small group of friends or family in a quiet room.

P 1: A B C D E

P 2: A B C D E Not Applicable

8. You are talking to a familiar person on the telephone.

P 1: A B C D E

P 2: A B C D E Not Applicable

9. You are talking to a friend or family member about 2-3 feet (1 metre) away in a quiet

room at your home.

P 1: A B C D E

P 2: A B C D E Not Applicable

10. You are talking to a familiar person in quiet conditions outside.


P 1: A B C D E

P 2: A B C D E Not Applicable

11. You are listening to soft sounds in your environment (such as a refrigerator motor, birds

at a distance, or water boiling on the stove).

P 1: A B C D E

P 2: A B C D E Not Applicable

12. You are talking with one other familiar person in a quiet carpeted room.

P 1: A B C D E

P 2: A B C D E Not Applicable

13. You are travelling in a car in noisy traffic with some of the windows down. You are

having a conversation with one other person.

P 1: A B C D E

P 2: A B C D E Not Applicable

14. You are talking with someone across the other side of the room. The other person is

speaking in a normal voice.

P 1: A B C D E

P 2: A B C D E Not Applicable


15. You are talking to a friend or family member and the TV is loud in the background.

P 1: A B C D E

P 2: A B C D E Not Applicable

16. You are listening to music on your stereo.

P 1: A B C D E

P 2: A B C D E Not Applicable

17. You are near a busy road talking with a friend.

P 1: A B C D E

P 2: A B C D E Not Applicable

18. You are talking with a small group of people in a busy restaurant or café.

P 1: A B C D E

P 2: A B C D E Not Applicable

Program Preference

What is your overall preferred program when listening in quiet?

[ ] P1    [ ] P2    [ ] no difference

Using your preferred program in quiet, how would you describe the sound quality?

[ ] Very similar to the other programs

[ ] Slightly better than the other programs

[ ] Moderately better than the other programs

[ ] Much better than the other programs

What is your overall preferred program when listening in noise?

[ ] P1    [ ] P2    [ ] no difference

Using your preferred program in noise, how would you describe the sound quality?

[ ] Very similar to the other programs

[ ] Slightly better than the other programs

[ ] Moderately better than the other programs

[ ] Much better than the other programs

THANK YOU FOR COMPLETING THE QUESTIONNAIRE.


Appendix 3: Statistics

This appendix describes the statistical methods used to analyze the speech scores of the CI

subjects who participated in the experiments in this thesis. Since the time of CI subjects who

volunteer in experiments is precious, the number of trials should be limited, but sufficient to

show a significant difference between two processing conditions.

Binomial Test

In the fixed presentation level sentence tests, the number of correct morphemes was counted

for each sentence, and summed across sentences. The score was the proportion of correct

morphemes. In all experiments in this thesis, we are interested in comparing the subject’s

performance under two processing conditions. A hypothesis test is performed to determine

whether one processing condition is significantly better than the other.

Swanson (2008) developed an efficient way to determine the statistical significance of the

difference between two proportions, using Monte Carlo simulation (Simon 1997). The

analysis scripts are part of the Nucleus Matlab Toolbox and were used in this thesis. The

following steps explain the method, which is implemented in the function

Difference_between_paired_proportions.

>> Difference_between_paired_proportions([X1, X2], N, 'monte carlo');

X1 and X2 are the numbers of correct morphemes in each condition, and N is the total

number of morphemes in each condition. The appropriate null hypothesis is that both

processing conditions give equal probability of a correct response. If the null hypothesis is

true, then the best estimate of this common probability p0 is:

>> p0 = (X1 + X2) / (2 * N);

Next, two sets of random samples are generated assuming that they come from the same

binomial distribution with probability p0, as stated in the null hypothesis.


>> sim_x1 = binornd(N, p0, 1, num_sim);

>> sim_x2 = binornd(N, p0, 1, num_sim);

num_sim is the number of random samples. The difference between the simulated data

sim_x1 and sim_x2 is found.

>> sim_d = sim_x1 - sim_x2;

The difference between the actual data X1 and X2 is determined.

>> actual_d = X1 - X2;

Then the difference calculated from the simulated data, and the difference between the

measured data X1 and X2, are compared.

>> p_diff = sum(sim_d >= actual_d)/num_sim;

If p_diff < 0.05 then the difference is significant and the null hypothesis is rejected. It means that, if both conditions had the same probability of a correct response, a difference at least as large as the measured one would occur less than 5% of the time. Hence X1 is statistically significantly greater than X2.
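The steps above can also be sketched in plain Python for readers without access to the toolbox. This is an illustrative re-implementation under stated assumptions, not the Nucleus Matlab Toolbox function: the function names, the default number of simulations and the fixed seed are hypothetical, and each binomial sample is drawn as a sum of N Bernoulli trials using only the standard library.

```python
import random


def simulate_binomial(n, p, rng):
    """Draw one binomial(n, p) sample as a sum of n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def monte_carlo_p_value(x1, x2, n, num_sim=2000, seed=1):
    """One-sided Monte Carlo p-value for x1 exceeding x2 under a common p0.

    x1, x2: numbers of correct morphemes in the two conditions.
    n: total number of morphemes in each condition.
    """
    rng = random.Random(seed)
    p0 = (x1 + x2) / (2 * n)          # pooled estimate under the null hypothesis
    actual_d = x1 - x2
    count = 0
    for _ in range(num_sim):
        sim_d = simulate_binomial(n, p0, rng) - simulate_binomial(n, p0, rng)
        if sim_d >= actual_d:          # simulated difference at least as large
            count += 1
    return count / num_sim


p_large_difference = monte_carlo_p_value(90, 60, 100)  # clearly different scores
p_no_difference = monte_carlo_p_value(75, 75, 100)     # identical scores
```

The comparison `sim_d >= actual_d` makes the test one-sided, so it asks whether the first condition scored higher than the second. A two-sided version would compare absolute differences instead.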


References

ANSI (1996). "S3.22-1996, Specification of hearing aid characteristics". New York:
American National Standards Institute.

ANSI (1997). "S3.5-1997, Methods for the calculation of the speech intelligibility
index". New York: American National Standards Institute.

Bacon, S. P., R. R. Fay, et al. (2004). "Compression: from cochlea to cochlear implants",
Springer Verlag.

Bench, J., Å. Kowal, et al. (1979). "The BKB (Bamford-Kowal-Bench) sentence lists for
partially-hearing children." British Journal of Audiology 13(3): 108-112.

Blamey, P. (2005). "Adaptive dynamic range optimization (ADRO): A digital amplification
strategy for hearing aids and cochlear implants." Trends in Amplification 9(2): 77.

Blamey, P., P. Arndt, et al. (1996). "Factors affecting auditory performance of
postlinguistically deaf adults using cochlear implants." Audiology and Neurotology
1(5): 293-306.

Blamey, P., D. Macfarlane, et al. (2005). "An intrinsically digital amplification scheme for
hearing aids." EURASIP Journal on Applied Signal Processing 18: 3026-3033.

Bondarew, V. and P. Seligman (2012). "The Cochlear Story", CSIRO Publishing.

Boothroyd, A., F. N. Erickson, et al. (1994). "The hearing aid input: a phonemic approach to
assessing the spectral distribution of speech." Ear and hearing 15(6): 432.

Boothroyd, A. and S. Nittrouer (1988). "Mathematical treatment of context effects in
phoneme and word recognition." The Journal of the Acoustical Society of America
84: 101.

Boyle, P. J., A. Büchner, et al. (2009). "Comparison of dual-time-constant and fast-acting
automatic gain control (AGC) systems in cochlear implants." International Journal
of Audiology 48(4): 211-221.

Boyle, P. J., T. B. Nunn, et al. (2013). "STARR: A Speech Test for Evaluation of the
Effectiveness of Auditory Prostheses Under Realistic Conditions." Ear and hearing
34(2): 203-212.

Brand, T. and B. Kollmeier (2002). "Efficient adaptive procedures for threshold and
concurrent slope estimates for psychophysics and speech intelligibility tests." The
Journal of the Acoustical Society of America 111: 2801.

Bustamante, D. K. and L. D. Braida (1987). "Multiband compression limiting for
hearing-impaired listeners." Journal of Rehabilitation Research and Development 24(4):
149-160.

Byrne, D., H. Dillon, et al. (2001). "NAL-NL1 procedure for fitting nonlinear hearing aids:
Characteristics and comparisons with other procedures." Journal of the American
Academy of Audiology 12(1): 37-51.


Cameron, S. and H. Dillon (2007). "Development of the listening in spatialized
noise-sentences test (LISN-S)." Ear and hearing 28(2): 196-211.

Chen, F. and P. C. Loizou (2010). "Analysis of a simplified normalized covariance measure
based on binary weighting functions for predicting the intelligibility of
noise-suppressed speech." Journal of the Acoustical Society of America 128(6): 3715-3723.

Chen, F. and P. C. Loizou (2011a). "Modeling speech intelligibility by cochlear implant
users". Conference on Implantable Auditory Prostheses, Pacific Grove, California,
USA.

Chen, F. and P. C. Loizou (2011b). "Predicting the intelligibility of vocoded speech." Ear
and hearing 32(3): 331.

Clark, G. (2003). "Cochlear implants: fundamentals and applications", Springer Verlag.

Cohen, I. and B. Berdugo (2002). "Noise estimation by minima controlled recursive
averaging for robust speech enhancement." Signal Processing Letters, IEEE 9(1):
12-15.

Cosendai, G. and M. Pelizzone (2001). "Effects of the Acoustical Dynamic Range on Speech
Recognition with Cochlear Implants." International Journal of Audiology 40(5): 272-281.

Crain, T. R. and E. W. Yund (1995). "The effect of multichannel compression on vowel and
stop-consonant discrimination in normal-hearing and hearing-impaired subjects."
Ear and hearing 16(5): 529-543.

Davies-Venn, E., P. Souza, et al. (2007). "Speech and music quality ratings for linear and
nonlinear hearing aid circuitry." Journal of the American Academy of Audiology
18(8): 688-699.

Dawson, P., J. Decker, et al. (2004). "Optimizing dynamic range in children using the
Nucleus cochlear implant." Ear and hearing 25(3): 230.

Dawson, P. W., A. A. Hersbach, et al. (2013). "An adaptive Australian Sentence Test In
Noise (AuSTIN)." Ear and hearing in press.

Dawson, P. W., S. J. Mauger, et al. (2011). "Clinical Evaluation of Signal-to-Noise
Ratio–Based Noise Reduction in Nucleus® Cochlear Implant Recipients." Ear and hearing
32(3): 382.

Dawson, P. W., A. E. Vandali, et al. (2007). "Clinical evaluation of expanded input dynamic
range in nucleus cochlear implants." Ear and hearing 28(2): 163-176.

De Gennaro, S., L. Braida, et al. (1986). "Multichannel syllabic compression for severely
impaired listeners." Journal of Rehabilitation Research and Development 23(1): 17.

Dillon, H. (2001). "Hearing aids", Boomerang press.


Djourno, A. and C. Eyries (1957). "Auditory prosthesis by means of a distant electrical
stimulation of the sensory nerve with the use of an indwelt coiling." La Presse
médicale 65(63): 1417.

Dreschler, W. A. (1992). "Fitting multichannel-compression hearing aids." International
Journal of Audiology 31(3): 121-131.

Drullman, R. (1995). "Temporal envelope and fine structure cues for speech intelligibility."
Journal of the Acoustical Society of America 97(1): 585-592.

Drullman, R., J. M. Festen, et al. (1994a). "Effect of reducing slow temporal modulations on
speech reception." Journal of the Acoustical Society of America 95(5): 2670-2680.

Drullman, R., J. M. Festen, et al. (1994b). "Effect of temporal envelope smearing on speech
reception." Journal of the Acoustical Society of America 95(2): 1053-1064.

Dubno, J., A. Horwitz, et al. (2005). "Word recognition in noise at higher-than-normal
levels: Decreases in scores and increases in masking." The Journal of the Acoustical
Society of America 118: 914.

Dunn, H. and S. White (1940). "Statistical measurements on conversational speech." The
Journal of the Acoustical Society of America 11: 278.

Firszt, J. B. (2003). "HiResolution sound processing." Advanced Bionics White Paper.
Sylmar, Calif: Advanced Bionics Corp.

Firszt, J. B., L. K. Holden, et al. (2004). "Recognition of speech presented at soft to loud
levels by adult cochlear implant recipients of three cochlear implant systems." Ear
and hearing 25(4): 375.

Fletcher, H. and W. Munson (1937). "Relation between loudness and masking." The Journal
of the Acoustical Society of America 9: 1-10.

Fletcher, H. and W. A. Munson (1933). "Loudness, its definition, measurement and
calculation." The Journal of the Acoustical Society of America 5(2): 82-108.

French, N. and J. Steinberg (1947). "Factors governing the intelligibility of speech sounds."
The Journal of the Acoustical Society of America 19(1): 90-119.

Friesen, L. M., R. V. Shannon, et al. (2001). "Speech recognition in noise as a function of the
number of spectral channels: comparison of acoustic hearing and cochlear implants."
The Journal of the Acoustical Society of America 110: 1150.

Fu, Q.-J. and J. J. Galvin III (2008). "Maximizing cochlear implant patients’ performance
with advanced speech training procedures." Hearing research 242(1): 198-208.

Fu, Q.-J. and R. V. Shannon (1998a). "Effects of amplitude nonlinearity on phoneme
recognition by cochlear implant users and normal-hearing listeners." The Journal of
the Acoustical Society of America 104: 2570.

Fu, Q.-J., R. V. Shannon, et al. (1998). "Effects of noise and spectral resolution on vowel
and consonant recognition: Acoustic and electric hearing." The Journal of the
Acoustical Society of America 104: 3586.


Fu, Q. and R. Shannon (1998b). "Effects of amplitude nonlinearity on phoneme recognition
by cochlear implant users and normal-hearing listeners." The Journal of the
Acoustical Society of America 104: 2570.

Füllgrabe, C., M. A. Stone, et al. (2009). "Contribution of very low amplitude-modulation
rates to intelligibility in a competing-speech task." The Journal of the Acoustical
Society of America 125: 1277.

Gatehouse, S., G. Naylor, et al. (2006). "Linear and nonlinear hearing aid fittings--1. Patterns
of benefit." International Journal of Audiology 45(3): 130.

Gifford, R. H. (2011). "Who is a cochlear implant candidate?" The Hearing Journal 64(6):
16-18.

Gifford, R. H., J. K. Shallop, et al. (2008). "Speech recognition materials and ceiling effects:
considerations for cochlear implant programs." Audiology and Neurotology 13(3):
193-205.

Goldsworthy, R. and J. Greenberg (2004). "Analysis of speech-based speech transmission
index methods with implications for nonlinear operations." The Journal of the
Acoustical Society of America 116: 3679.

Goorevich, M. (2005). "An Algorithmic Testbench for Cochlear Implant DSP Speech
Processors". Department of Electronics of the Division of Information and
Communication Sciences, Macquarie University, Sydney, Australia. Master of
Science. 289

Greenberg, S. (1996). "Auditory processing of speech." Principles of experimental
phonetics: 362-407.

Greenberg, S., W. A. Ainsworth, et al. (2004). "Speech processing in the auditory system",
Springer Berlin.

Hagerman, B. and A. Olofsson (2004). "A method to measure the effect of noise reduction
algorithms using simultaneous speech and noise." Acta Acustica united with
Acustica 90(2): 356-361.

Hansen, M. (2002). "Effects of multi-channel compression time constants on subjectively
perceived sound quality and speech intelligibility." Ear and hearing 23(4): 369-380.

Haumann, S., T. Lenarz, et al. (2010). "Speech Perception with Cochlear Implants as
Measured Using a Roving-Level Adaptive Test Method." ORL 72(6): 312-318.

Henry, B. A., C. M. McKay, et al. (2000). "The relationship between speech perception and
electrode discrimination in cochlear implantees." The Journal of the Acoustical
Society of America 108(3): 1269-1280.

Hersbach, A. A., K. Arora, et al. (2012). "Combining Directional Microphone and Single-
Channel Noise Reduction Algorithms: A Clinical Evaluation in Difficult Listening
Conditions With Cochlear Implant Users." Ear and hearing 33(4): e13-e23.

Hirsch, H. and C. Ehrlicher (1995). "Noise estimation techniques for robust speech
recognition". Acoustics, Speech, and Signal Processing, 1995. ICASSP-95., 1995
International Conference on, IEEE.


Hochmair-Desoyer, I., E. Schulz, et al. (1997). "The HSM sentence test as a tool for
evaluating the speech understanding in noise of cochlear implant users." Otology &
Neurotology 18(6): S83-S86.

Holden, L. K., R. M. Reeder, et al. (2011). "Optimizing the perception of soft speech and
speech in noise with the Advanced Bionics cochlear implant system." International
Journal of Audiology 50(4): 255-269.

Holube, I. and B. Kollmeier (1996). "Speech intelligibility prediction in hearing impaired
listeners based on a psychoacoustically motivated perception model." The Journal of
the Acoustical Society of America 100: 1703.

House, W. F. and J. Urban (1973). "Long term results of electrode implantation and
electronic stimulation of the cochlea in man." The Annals of otology, rhinology, and
laryngology 82(4): 504.

Houtgast, T. and H. J. M. Steeneken (1973). "The modulation transfer function in room
acoustics as a predictor of speech intelligibility." The Journal of the Acoustical
Society of America 54(2): 557-557.

Houtgast, T. and H. J. M. Steeneken (1985). "A review of the MTF concept in room
acoustics and its use for estimating speech intelligibility in auditoria." The Journal of
the Acoustical Society of America 77: 1069.

Hu, Y. and P. Loizou (2008). "A new sound coding strategy for suppressing noise in
cochlear implants." The Journal of the Acoustical Society of America 124: 498.

IEC (1997). "IEC 60118-2, Hearing aids, Part 2: Hearing aids with automatic gain control
circuits".

Iwaki, T., P. Blamey, et al. (2008). "Bimodal studies using adaptive dynamic range
optimization (ADRO) technology." International Journal of Audiology 47(6): 311-
318.

James, C. J., P. J. Blamey, et al. (2002). "Adaptive dynamic range optimization for cochlear
implants: a preliminary study." Ear and hearing 23(1): 49S.

James, C. J., M. W. Skinner, et al. (2003). "An investigation of input level range for the
Nucleus 24 cochlear implant system: speech perception performance, program
preference, and loudness comfort ratings." Ear and hearing 24(2): 157.

Jenstad, L. M. and P. E. Souza (2005). "Quantifying the effect of compression hearing aid
release time on speech acoustics and intelligibility." Journal of Speech, Language
and Hearing Research 48(3): 651.

Kates, J. M. (1991). "A time-domain digital cochlear model." Signal Processing, IEEE
Transactions on 39(12): 2573-2592.

Kates, J. M. (2010). "Understanding compression: Modeling the effects of dynamic-range
compression in hearing aids." International Journal of Audiology 49(6): 395-409.

Killion, M. C. (1978). "Revised estimate of minimum audible pressure: Where is
the 'missing 6 dB'?" The Journal of the Acoustical Society of America 63: 1501.


King, A. and M. Martin (1984). "Is AGC beneficial in hearing aids?" British Journal of
Audiology 18(1): 31-38.

Klatt, D. H. (1989). "Review of selected models of speech perception". Lexical
representation and process, MIT Press.

Kral, A. and G. M. O'Donoghue (2010). "Profound deafness in childhood." New England
Journal of Medicine 363(15): 1438-1450.

Kryter, K. D. (1962a). "Methods for the calculation and use of the articulation index." The
Journal of the Acoustical Society of America 34: 1689.

Kryter, K. D. (1962b). "Validation of the articulation index." The Journal of the Acoustical
Society of America 34(11): 1698-1702.

Laneau, J., J. Wouters, et al. (2004). "Relative contributions of temporal and place pitch cues
to fundamental frequency discrimination in cochlear implantees." The Journal of the
Acoustical Society of America 116: 3606.

Laurence, R. F., B. C. Moore, et al. (1983). "A comparison of behind-the-ear high-fidelity
linear hearing aids and two-channel compression aids, in the laboratory and in
everyday life." British Journal of Audiology 17(1): 31-48.

Lazard, D. S., C. Vincent, et al. (2012). "Pre-, per-and postoperative factors affecting
performance of postlinguistically deaf adults using cochlear implants: a new
conceptual model over time." PloS one 7(11): e48739.

Levitt, H. (1978). "Adaptive testing in audiology." Scandinavian audiology.
Supplementum (6): 241.

Licklider, J. and I. Pollack (1948). "Effects of differentiation, integration, and infinite peak
clipping upon the intelligibility of speech." The Journal of the Acoustical Society of
America 20: 42.

Licklider, J. C. and G. A. Miller (1951). "The perception of speech."

Lin, L. (2004). "Speech Processing in the Auditory Filter Domain". School of Electrical
Engineering & Telecommunications, The University of New South Wales, Sydney,
Australia. Ph.D. 154

Lin, L., W. Holmes, et al. (2003a). "Adaptive noise estimation algorithm for speech
enhancement." Electronics Letters 39(9): 754-755.

Lin, L., W. Holmes, et al. (2003b). "Subband noise estimation for speech enhancement using
a perceptual Wiener filter". Acoustics, Speech, and Signal Processing, 2003.
Proceedings.(ICASSP'03). 2003 IEEE International Conference on, IEEE.

Lippmann, R., L. Braida, et al. (1981). "Study of multichannel amplitude compression and
linear amplification for persons with sensorineural hearing loss." The Journal of the
Acoustical Society of America 69: 524.

Loizou, P. C. (2007). "Speech enhancement: theory and practice", CRC.


Loizou, P. C., M. Dorman, et al. (2000). "Speech recognition by normal-hearing and
cochlear implant listeners as a function of intensity resolution." The Journal of the
Acoustical Society of America 108: 2377.

Loizou, P. C., M. Dorman, et al. (1999). "On the number of channels needed to understand
speech." The Journal of the Acoustical Society of America 106: 2097.

Lyon, R. (1982). "A computational model of filtering, detection, and compression in the
cochlea". Acoustics, Speech, and Signal Processing, IEEE International Conference
on ICASSP'82., IEEE.

Lyon, R. (1983). "A computational model of binaural localization and separation". Acoustics,
Speech, and Signal Processing, IEEE International Conference on ICASSP'83.,
IEEE.

Lyon, R. (1984). "Computational models of neural auditory processing". Acoustics, Speech,
and Signal Processing, IEEE International Conference on ICASSP'84., IEEE.

Lyon, R. F. and C. Mead (1988). "An analog electronic cochlea." Acoustics, Speech and
Signal Processing, IEEE Transactions on 36(7): 1119-1134.

Ma, J., Y. Hu, et al. (2009). "Objective measures for predicting speech intelligibility in noisy
conditions based on new band-importance functions." The Journal of the Acoustical
Society of America 125: 3387.

Mackersie, C. L. (2002). "Tests of speech perception abilities." Current Opinion in
Otolaryngology & Head and Neck Surgery 10(5): 392-397.

Malah, D., R. V. Cox, et al. (1999). "Tracking speech-presence uncertainty to improve
speech enhancement in non-stationary noise environments". Acoustics, Speech, and
Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on,
IEEE.

Marriage, J. E., B. C. J. Moore, et al. (2005). "Effects of three amplification strategies on
speech perception by children with severe and profound hearing loss." Ear and
hearing 26(1): 35-47.

Martin, R. (1994). "Spectral subtraction based on minimum statistics." power 6: 8.

Martin, R. (2001). "Noise power spectral density estimation based on optimal smoothing and
minimum statistics." Speech and Audio Processing, IEEE Transactions on 9(5): 504-
512.

McDermott, H. J., K. R. Henshall, et al. (2002). "Benefits of syllabic input compression for
users of cochlear implants." Journal of the American Academy of Audiology 13(1):
14-24.

McDermott, H. J., C. M. Mckay, et al. (1992). "A new portable sound processor for the
University of Melbourne/Nucleus Limited multielectrode cochlear implant." The
Journal of the Acoustical Society of America 91: 3367.

McDermott, H. J., A. E. Vandali, et al. (1993). "A portable programmable digital sound
processor for cochlear implant research." Rehabilitation Engineering, IEEE
Transactions on 1(2): 94-100.


McKay, C. M., H. J. McDermott, et al. (1994). "Pitch percepts associated with
amplitude-modulated current pulse trains in cochlear implantees." The Journal of the
Acoustical Society of America 96: 2664.

Miller, G. A. (1947). "The masking of speech." Psychological Bulletin 44(2): 105.

Moore, B. (2000). "Use of a loudness model for hearing aid fitting. IV. Fitting hearing aids
with multi-channel compression so as to restore 'normal' loudness for speech at
different levels." British Journal of Audiology 34(3): 165-177.

Moore, B. C. J. (2003a). "An Introduction to the Psychology of Hearing", Academic Press.

Moore, B. C. J. (2003b). "Speech processing for the hearing-impaired: successes, failures,
and implications for speech mechanisms." Speech communication 41(1): 81-91.

Moore, B. C. J. (2008). "The choice of compression speed in hearing aids: Theoretical and
practical considerations and the role of individual differences." Trends in
Amplification 12(2): 103.

Moore, B. C. J. and B. R. Glasberg (1988). "A comparison of four methods of implementing
automatic gain control (AGC) in hearing aids." British Journal of Audiology 22(2):
93-104.

Moore, B. C. J., B. R. Glasberg, et al. (1991). "Optimization of a slow-acting automatic gain
control system for use in hearing aids." British Journal of Audiology 25(3): 171-182.

Moore, B. C. J., R. W. Peters, et al. (1999). "Benefits of linear amplification and
multichannel compression for speech comprehension in backgrounds with spectral
and temporal dips." The Journal of the Acoustical Society of America 105: 400.

Muller-Deile, J., J. Kiefer, et al. (2008). "Performance benefits for adults using a cochlear
implant with adaptive dynamic range optimization (ADRO): a comparative study."
Cochlear Implants International 9(1): 8.

Neal, T. (2011). "Acoustic Processing Method and Apparatus". US Patent. 20110135129.

Nelson, D., D. Van Tasell, et al. (1995). "Electrode ranking of 'place pitch' and speech
recognition in electrical hearing." Journal of the Acoustical Society of America
98(4): 1987-1999.

Nelson, D. A., J. L. Schmitz, et al. (1996). "Intensity discrimination as a function of stimulus
level with electric stimulation." The Journal of the Acoustical Society of America
100: 2393.

Neuman, A. C., M. H. Bakke, et al. (1998). "The effect of compression ratio and release time
on the categorical rating of sound quality." The Journal of the Acoustical Society of
America 103: 2273.

Nie, K., A. Barco, et al. (2006). "Spectral and temporal cues in cochlear implant speech
perception." Ear and hearing 27(2): 208-217.

Nilsson, M., S. D. Soli, et al. (1994). "Development of the Hearing in Noise Test for the
measurement of speech reception thresholds in quiet and in noise." The Journal of
the Acoustical Society of America 95: 1085.

Nogueira, W., A. Büchner, et al. (2005). "A psychoacoustic 'NofM'-type speech coding
strategy for cochlear implants." EURASIP J. Appl. Sig. Process: 3044–3059.

Nysen, P. A. (1980). "Recursive Percentile Estimator". US Patent. 4204260.

Olsen, W. O. (1998). "Average Speech Levels and Spectra in Various Speaking/Listening
Conditions: A Summary of the Pearson, Bennett, & Fidell (1977) Report." Am J
Audiol 7(2): 21-25.

Patrick, J. F., P. A. Busby, et al. (2006). "The development of the Nucleus Freedom cochlear
implant system." Trends in Amplification 10(4): 175-200.

Pavlovic, C. V. (1984). "Use of the articulation index for assessing residual auditory function
in listeners with sensorineural hearing impairment." The Journal of the Acoustical
Society of America 75(4): 1253.

Pavlovic, C. V. (1987). "Derivation of primary parameters and procedures for use in speech
intelligibility predictions." The Journal of the Acoustical Society of America 82: 413.

Pavlovic, C. V. and G. A. Studebaker (1984). "An evaluation of some assumptions
underlying the articulation index." The Journal of the Acoustical Society of America
75: 1606.

Pavlovic, C. V., G. A. Studebaker, et al. (1986). "An articulation index based procedure for
predicting the speech recognition performance of hearing-impaired individuals." The
Journal of the Acoustical Society of America 80: 50.

Pearsons, K., R. Bennett, et al. (1976). "Speech levels in various environments. Report to the
Office of Resources & Development." Environmental Protection Agency, BBN
Report 3281.

Pearsons, K. S., R. L. Bennett, et al. (1977). "Speech levels in various noise environments".
Washington, DC, Office of Health and Ecological Effects, Office of Research and
Development, US EPA.

Plomp, R. (1983). "Perception of speech as a modulated signal". Proceedings of the Tenth
International Congress of Phonetic Sciences, Dordrecht, Foris.

Plomp, R. (1988). "The negative effect of amplitude compression in multichannel hearing
aids in the light of the modulation transfer function." The Journal of the Acoustical
Society of America 83: 2322.

Plomp, R. (1994). "Noise, amplification, and compression: Considerations of three main
issues in hearing aid design." Ear and hearing 15(1): 2.

Qin, M. K. and A. J. Oxenham (2003). "Effects of simulated cochlear-implant processing on
speech reception in fluctuating maskers." The Journal of the Acoustical Society of
America 114: 446.

Rhebergen, K., N. Versfeld, et al. (2008a). "Prediction of the intelligibility for speech in real-
life background noises for subjects with normal hearing." Ear and hearing 29(2):
169.

Rhebergen, K., N. Versfeld, et al. (2009). "The dynamic range of speech, compression, and
its effect on the speech reception threshold in stationary and interrupted noise." The
Journal of the Acoustical Society of America 126: 3236.

Rhebergen, K. S. and N. J. Versfeld (2005). "A speech intelligibility index-based approach
to predict the speech reception threshold for sentences in fluctuating noise for
normal-hearing listeners." The Journal of the Acoustical Society of America 117:
2181.

Rhebergen, K. S., N. J. Versfeld, et al. (2006). "Extended speech intelligibility index for the
prediction of the speech reception threshold in fluctuating noise." The Journal of the
Acoustical Society of America 120: 3988.

Rhebergen, K. S., N. J. Versfeld, et al. (2008b). "Quantifying and modeling the acoustic
effects of compression on speech in noise." The Journal of the Acoustical Society of
America 123(5): 3167-3167.

Ris, C. and S. Dupont (2001). "Assessing local noise level estimation methods: Application
to noise robust ASR." Speech communication 34(1): 141-158.

Rosen, S. (1992). "Temporal information in speech: acoustic, auditory and linguistic
aspects." Philosophical Transactions of the Royal Society of London. Series B:
Biological Sciences 336(1278): 367-373.

Scollie, S., R. Seewald, et al. (2005). "The desired sensation level multistage input/output
algorithm." Trends in Amplification 9(4): 159-197.

Seligman, P. (2000). "Automatic sensitivity control", US Patent 6,151,400.

Seligman, P. and L. Whitford (1995). "Adjustment of appropriate signal levels in the Spectra
22 and mini speech processors." The Annals of otology, rhinology & laryngology.
Supplement 166: 172.

Shannon, R., Q. Fu, et al. (2001). "Critical cues for auditory pattern recognition in speech:
Implications for cochlear implant speech processor design." Physiological and
Psychological Bases of Auditory Function.

Shannon, R. V. (1983). "Multichannel electrical stimulation of the auditory nerve in man. I.
Basic psychophysics." Hearing research 11(2): 157-189.

Shannon, R. V. (1992). "Temporal modulation transfer functions in patients with cochlear
implants." The Journal of the Acoustical Society of America 91: 2156.

Shannon, R. V., F.-G. Zeng, et al. (1995). "Speech recognition with primarily temporal
cues." Science 270(5234): 303-304.

Shannon, R. V., F.-G. Zeng, et al. (1998). "Speech recognition with altered spectral
distribution of envelope cues." The Journal of the Acoustical Society of America
104: 2467.

Shi, L.-F. and K. A. Doherty (2008). "Subjective and objective effects of fast and slow
compression on the perception of reverberant speech in listeners with hearing loss."
Journal of Speech, Language and Hearing Research 51(5): 1328.

Simon, J. (1997). "Resampling: the new statistics", Resampling Stats Arlington, VA.

Skinner, M. W., L. K. Holden, et al. (1999). "Comparison of two methods for selecting
minimum stimulation levels used in programming the Nucleus 22 cochlear implant."
Journal of Speech, Language and Hearing Research 42(4): 814.

Skinner, M. W., L. K. Holden, et al. (1997). "Speech recognition at simulated soft,
conversational, and raised-to-loud vocal efforts by adults with cochlear implants."
The Journal of the Acoustical Society of America 101: 3766.

Slaney, M. (1988). "Lyon's cochlear model", Citeseer.

Souza, P., L. Jenstad, et al. (2006). "Measuring the acoustic effects of compression
amplification on speech in noise." The Journal of the Acoustical Society of America
119: 41.

Souza, P. E. (2002). "Effects of compression on speech acoustics, intelligibility, and sound
quality." Trends in Amplification 6(4): 131.

Souza, P. E., B. Yueh, et al. (2000). "Fitting hearing aids with the Articulation Index: Impact
on hearing aid effectiveness." Journal of Rehabilitation Research and Development
37(4): 473-482.

Spahr, A., M. Dorman, et al. (2007). "Performance of patients using different cochlear
implant systems: Effects of input dynamic range." Ear and hearing 28(2): 260.

Spriet, A., L. Van Deun, et al. (2007). "Speech Understanding in Background Noise with the
Two-Microphone Adaptive Beamformer BEAM (TM) in the Nucleus Freedom (TM)
Cochlear Implant System." Ear and hearing 28(1): 62-72.

Stahl, V., A. Fischer, et al. (2000). "Quantile based noise estimation for spectral subtraction
and Wiener filtering". Acoustics, Speech, and Signal Processing, 2000. ICASSP'00.
Proceedings. 2000 IEEE International Conference on, IEEE.

Steeneken, H. J. M. and T. Houtgast (1980). "A physical method for measuring speech
transmission quality." The Journal of the Acoustical Society of America 67: 318.

Steinberg, J. C. and M. B. Gardner (1937). "The dependence of hearing impairment on sound
intensity." The Journal of the Acoustical Society of America 9: 11.

Stevens, K. N. (1983). "Acoustic Properties Used for the Identification of Speech Sounds."
Annals of the New York Academy of Sciences 405(1): 2-17.

Stevens, S. S. (1957). "On the psychophysical law." Psychological review 64(3): 153.

Stevens, S. S. (1975). "Psychophysics: Introduction to its perceptual, neural, and social
prospects", Transaction Publishers.

Stickney, G. S., F.-G. Zeng, et al. (2004a). "Cochlear implant speech recognition with speech
maskers." The Journal of the Acoustical Society of America 116: 1081.

Stickney, G. S., F. G. Zeng, et al. (2004b). "Cochlear implant speech recognition with speech
maskers." The Journal of the Acoustical Society of America 116: 1081.

Stöbich, B., C. M. Zierhofer, et al. (1999). "Influence of automatic gain control parameter
settings on speech understanding of cochlear implant users employing the
continuous interleaved sampling strategy." Ear and hearing 20(2): 104.

Stone, M. A. and B. C. J. Moore (2003). "Effect of the speed of a single-channel dynamic
range compressor on intelligibility in a competing speech task." The Journal of the
Acoustical Society of America 114: 1023.

Stone, M. A. and B. C. J. Moore (2004). "Side effects of fast-acting dynamic range
compression that affect intelligibility in a competing speech task." The Journal of
the Acoustical Society of America 116: 2311.

Stone, M. A. and B. C. J. Moore (2007). "Quantifying the effects of fast-acting compression
on the envelope of speech." The Journal of the Acoustical Society of America 121:
1654.

Stone, M. A. and B. C. J. Moore (2008). "Effects of spectro-temporal modulation changes
produced by multi-channel compression on intelligibility in a competing-speech
task." The Journal of the Acoustical Society of America 123: 1063.

Stone, M. A., B. C. J. Moore, et al. (1999). "Comparison of different forms of compression
using wearable digital hearing aids." The Journal of the Acoustical Society of
America 106: 3603.

Studebaker, G., R. Sherbecoe, et al. (1999). "Monosyllabic word recognition at
higher-than-normal speech and noise levels." The Journal of the Acoustical Society
of America 105: 2431.

Swanson, B. A. (2008). "Pitch Perception with Cochlear Implants". Department of
Otolaryngology, The University of Melbourne. Doctor of Philosophy. 306

Swanson, B. A., E. Van Baelen, et al. (2007). "Cochlear Implant Signal Processing ICs".
Custom Integrated Circuits Conference, 2007. CICC '07. IEEE. 437-442.

van Buuren, R. A., J. M. Festen, et al. (1999). "Compression and expansion of the temporal
envelope: Evaluation of speech intelligibility and sound quality." The Journal of the
Acoustical Society of America 105: 2903.

van Hoesel, R., R. Ramsden, et al. (2002). "Sound-direction identification, interaural time
delay discrimination, and speech intelligibility advantages in noise for a bilateral
cochlear implant user." Ear and hearing 23(2): 137-149.

Verschuure, J., A. Maas, et al. (1996). "Compression and its effect on the speech signal." Ear
and hearing 17(2): 162-175.

Villchur, E. (1973). "Signal processing to improve speech intelligibility in perceptive
deafness." The Journal of the Acoustical Society of America 53: 1646.

Walker, G. and H. Dillon (1982). "Compression in hearing aids: An analysis, a review and
some recommendations", Australian Government Publishing Service.

White, M. (1986). "Compression systems for hearing aids and cochlear prostheses." Journal
of Rehabilitation Research and Development 23(1): 25.

Wichmann, F. A. and N. J. Hill (2001). "The psychometric function: I. Fitting, sampling, and
goodness of fit." Percept Psychophys 63(8): 1293-1313.

Wilson, B. (2006a). "Speech processing strategies." Cochlear Implants: A Practical Guide,
second ed. John Wiley & Sons, Hoboken, NJ: 21–69.

Wilson, B. and M. Dorman (2008). "Cochlear implants: Current designs and future
possibilities." J Rehabil Res Dev 45(5): 695-730.

Wilson, B., C. Finley, et al. (1991). "Better speech recognition with cochlear implants."
Nature 352(6332): 236-238.

Wilson, B. S. (2006b). "Speech-Processing Strategies." Cochlear implants: a practical
guide: 21-69.

Wolfe, J., E. C. Schafer, et al. (2009). "Evaluation of speech recognition in noise with
cochlear implants and dynamic FM." Journal of the American Academy of
Audiology 20(7): 409-421.

Xu, L., C. S. Thompson, et al. (2005). "Relative contributions of spectral and temporal cues
for phoneme recognition." The Journal of the Acoustical Society of America 117:
3255.

Yates, G. K. (1995). "Cochlear structure and function." Hearing: 41-74.

Yoshinaga-Itano, C., A. L. Sedey, et al. (1998). "Language of early- and later-identified
children with hearing loss." Pediatrics 102(5): 1161-1171.

Yost, W. A. and D. W. Nielsen (1994). "Fundamentals of hearing: an introduction",
Academic Press San Diego.

Yund, E. W. and K. M. Buckles (1995a). "Enhanced speech perception at low
signal-to-noise ratios with multichannel compression hearing aids." The Journal of
the Acoustical Society of America 97: 1224-1224.

Yund, E. W. and K. M. Buckles (1995b). "Multichannel compression hearing aids: Effect of
number of channels on speech discrimination in noise." The Journal of the
Acoustical Society of America 97: 1206.

Zeng, F. and J. Galvin III (1999). "Amplitude mapping and phoneme recognition in cochlear
implant listeners." Ear and hearing 20(1): 60.

Zeng, F., G. Grant, et al. (2002). "Speech dynamic range and its effect on cochlear implant
performance." The Journal of the Acoustical Society of America 111: 377.
