
Scandinavian Journal of Psychology, 2009, 50, 385–393

DOI: 10.1111/j.1467-9450.2009.00748.x

Background and Basic Processes


The Signal-Cognition interface: Interactions between degraded auditory signals and cognitive processes
STEFAN STENFELT1,3 and JERKER RÖNNBERG2,3

1 Department of Clinical and Experimental Medicine, Linköping University, Sweden
2 Department of Behavioural Sciences and Learning, Linköping University, Sweden
3 Linnaeus Centre HEAD, The Swedish Institute for Disability Research, Linköping and Örebro Universities, Sweden

Stenfelt, S. & Rönnberg, J. (2009). The Signal-Cognition interface: Interactions between degraded auditory signals and cognitive processes. Scandinavian Journal of Psychology, 50, 385–393. A hearing loss leads to problems with speech perception; this is exacerbated when competing noise is present. The speech signal is recognized by the cognitive system of the listener; noise and distortion tax the cognitive system when interpreting it. The auditory system must interact with the cognitive system for optimal signal decoding. This article discusses this interaction between the signal and the cognitive system based on two models: an auditory model describing signal transmission and degeneration due to a hearing loss, and a cognitive model for Ease of Language Understanding. The signal distortion depends on the specifics of the hearing impairment, and thus differently distorted signals can affect the cognitive system in different ways. Consequently, the severity of a hearing loss may not only depend on the lesion itself but also on the cognitive resources required to interpret the signal. Key words: Peripheral hearing, cochlear model, Ease of Language Understanding (ELU), working memory, implicit and explicit processing, hearing loss. Stefan Stenfelt, Dept Clinical and Experimental Medicine, Div Technical Audiology, Linköping University, 581 85 Linköping, Sweden. Tel: +46 13 222856; e-mail: stefan.stenfelt@liu.se

INTRODUCTION

It is estimated that approximately 10 to 15% of the population in the economically developed countries suffers from a hearing loss causing a communication problem (Kochkin, 2005). Communication is of vital importance in today's society, both socially and professionally, and as a result a hearing impaired person can suffer severely from hearing loss, although it is not visible. This is nothing new, and research into hearing loss and its consequences for the individual has a long history. Despite these efforts, which have successfully provided in-depth knowledge of parts of the auditory system, we still know very little about interactions between peripheral auditory mechanisms and perceptual processing involving higher cortical functions. It is generally accepted that sounds are transmitted as vibrations to the cochlea, where they are transformed into electro-chemical potentials which are the source of neural codes. This neural stream is then transmitted, via stations in the brainstem, to the primary auditory cortex. Finally, the neural information is interpreted in further brain regions as a sound event such as music or meaningful speech. For a person with normal hearing function, listening to and understanding audible speech is often effortless and implicit when the sound environment is optimal (insignificant background noise, low reverberation; Rönnberg, Rudner, Foo & Lunner, 2008). However, the presence of competing noise will affect speech understanding negatively, resulting in increased listening effort. The person with a hearing loss is in a similar situation, even in the absence of noise.

If noise is present, the problem of poor speech understanding and high listening effort is further exacerbated (Kramer, Kapteyn & Houtgast, 2006). Whether one considers people with normal hearing or hearing impairment, the lower the signal-to-noise ratio, the poorer the speech understanding and the greater the listening effort. Classically, a hearing impairment is categorized according to its anatomical site of lesion. A problem with sound transmission in the external or middle ear is known as a conductive loss, a problem with transformation of sound to neural signal within the cochlea is termed a sensorineural loss, a problem with auditory neural conduction in the auditory nerve or brainstem is termed a retrocochlear loss, and if the auditory problem involves the cerebral cortex it is known as a central loss. Many people suffering from a hearing loss fall into the category described as sensorineural hearing loss. This group consists primarily of people with presbyacusis (hearing loss due to aging; Gates & Mills, 2005) and noise-induced hearing loss (Rosenhall, Pedersen & Svanborg, 1990). It has been argued that it is not appropriate to use the term sensorineural hearing loss for this group, as the impairment can be caused by several different mechanisms leading to different signal transmission distortions (Schuknecht & Gacek, 1993). The problem of not knowing the specific origin of a hearing degeneration is well recognized within the area of hearing aid fitting. The diagnosis of a hearing impairment is primarily based on the person's audiogram (the quietest pure-tone sound a person can hear, measured at different frequencies).


People with equivalent audiograms fitted with the same type of hearing aids and using the same prescription can differ considerably in their ability to perceive and understand speech. The reasons for this are complex, but one may relate to the specifics of the hearing loss (Edwards, 2007). Another reason for different hearing outcomes subsequent to the hearing aid fitting is individual cognitive function (Lunner, Rudner & Rönnberg, 2009). Most probably, there is also an interaction between the specifics of the hearing loss and cognition. Within the field of hearing aid fitting, individual cognitive abilities have recently become the focus of attention, since cognitive capacity appears to affect the extent of benefit from different signal processing schemes (Lunner et al., 2009). Using the same type of reasoning, the specific type of signal distortion, reflecting the specific origin of hearing impairment, could affect persons with different cognitive abilities differently. For example, a particular type of hearing loss could impair listeners with low cognitive ability far more than those with high cognitive ability, while another type of hearing loss, of similar magnitude in terms of audiogram function, may not show such sensitivity to cognitive ability. These types of interaction are important for understanding the functions of the higher order auditory system and for optimizing individually based hearing-aid fitting. So what does the signal-cognition interface look like? This question is not easy to answer, but it is generally believed that an undistorted signal and signal transmission leads to a bottom-up hearing strategy with fast and implicit decoding of phonetic content and smooth lexical access (Fig. 1).


However, a distorted signal, whether caused by distortion at the source or in the transmission, masking by noise, a hearing impairment, or a combination of these factors, requires an additional top-down strategy using cognitive resources for attention and explicit decoding of the phonological content. To further explore the interaction between specific types of signal degradation and cognitive function, two models are described. One addresses peripheral hearing and signal transmission, and one addresses the cognitive system as it impacts on speech processing under suboptimal conditions.

THE AUDITORY PERIPHERY MODEL

The literature provides numerous auditory models (e.g. Lyon, 1982; Meddis, 1988, 2006; Carney, 1993; de Boer, 1997; Bruce, Sachs & Young, 2003). Some are purely mechanical models simulating the motions of the basilar membrane in the cochlea, while others are filter-bank oriented, primarily built around the gammatone filter (Patterson, Robinson, Holdsworth, McKeown, Zhang & Allerhand, 1992). The gammatone filter has a response that is similar to the vibration response of the basilar membrane in the cochlea. We present here a slightly different model for the auditory periphery (Fig. 2; Stenfelt, 2008). The current model is filter-based, but does not use gammatone filters. The outer ear and middle ear are modeled by two cascaded linear filters simulating the head related transfer function (sound source to eardrum transfer function, HRTF; Shaw, 1974) and the middle ear vibration of the ossicles (Hato, Stenfelt & Goode, 2003). The HRTF describes how a sound from a specific position in space is filtered by the pinna, head and body before reaching the eardrum. The middle ear filter mimics the sound transmission from the eardrum, via the three middle ear ossicles, to the oval window and the fluid inside the cochlea. The next step is a traveling wave filter, made of cascaded low-pass filters. Each of these filters has an equivalent bandwidth corresponding to a human auditory filter (Glasberg & Moore, 1990). The next part is the active inner-ear mechanism, imitating the function of the outer hair cells. This is mimicked by a positive feedback loop containing a nonlinear limiting function and filters (Evans & Dallos, 1993).

Fig. 1. A generalized model for bottom-up and top-down processing of auditory input (adapted from Edwards, 2007).

Fig. 2. Schematics for the peripheral auditory model (Stenfelt, 2008).


The output from this, which is the simulated active basilar membrane motion, is processed by a nonlinear rectifying function and filtered by the inner hair cell block (Meddis, 1988). Also, as a parameter into the outer hair cell and inner hair cell functions, the endocochlear potential is included. This potential can be seen as the voltage driving the active systems in the cochlea. The output from each of these parallel blocks is a signal corresponding to the sum of neural activity at an area of the basilar membrane corresponding to the estimate of human auditory filters (Glasberg & Moore, 1990). This auditory model can simulate hearing losses caused by conduction problems (not relevant to the discussion here), or cochlear losses caused by inner hair cells, outer hair cells, or pathological endocochlear potentials. Retrocochlear problems are not considered in the current model.
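To give a concrete, if deliberately crude, illustration of the kind of processing chain just described, the Python sketch below implements one channel of a filterbank-style model: a bandpass auditory filter with an ERB bandwidth (Glasberg & Moore, 1990), a compressive gain standing in for the outer hair cell amplifier, and half-wave rectification plus low-pass filtering standing in for inner hair cell transduction. The filter orders, the static gain rule and all parameter values are our illustrative assumptions and do not reproduce the actual implementation in Stenfelt (2008).

import numpy as np
from scipy.signal import butter, sosfilt

def erb(fc_hz):
    # Equivalent rectangular bandwidth of the human auditory filter
    # (Glasberg & Moore, 1990).
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def auditory_channel(x, fs, fc, max_ohc_gain_db=40.0, ihc_cutoff_hz=1000.0):
    # Bandpass "auditory filter" centred on fc with an ERB-wide passband.
    bw = erb(fc)
    sos_bp = butter(2, [fc - bw / 2.0, fc + bw / 2.0],
                    btype="bandpass", fs=fs, output="sos")
    y = sosfilt(sos_bp, x)

    # Static stand-in for the active outer hair cell mechanism: weak
    # signals receive more gain than strong ones (compressive growth).
    level_db = 20.0 * np.log10(np.sqrt(np.mean(y ** 2)) + 1e-12)
    gain_db = max_ohc_gain_db * np.clip(1.0 - (level_db + 60.0) / 60.0, 0.0, 1.0)
    y *= 10.0 ** (gain_db / 20.0)

    # Inner hair cell transduction: half-wave rectification followed by
    # low-pass filtering, yielding an envelope-like "neural drive" signal.
    sos_lp = butter(2, ihc_cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfilt(sos_lp, np.maximum(y, 0.0))

# Example: a weak 1 kHz tone through a channel centred on 1 kHz.
fs = 16000
t = np.arange(0, 0.1, 1.0 / fs)
drive = auditory_channel(0.01 * np.sin(2 * np.pi * 1000 * t), fs, fc=1000.0)

In this toy picture, an outer hair cell loss would be mimicked simply by lowering max_ohc_gain_db, with the corresponding reduction in dynamic range and sensitivity.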

INFLUENCE FROM INNER HAIR CELLS

A common assumption is that hearing losses up to approximately 60 dB HL originate in lesions of the outer hair cells, while greater losses indicate additional damage to the inner hair cells (Van Tasell, 1993). This rule of thumb is most probably too simplistic, and losses below 60 dB HL may also be related to damage to the inner hair cells. One problem in understanding the influence of pathological inner hair cells on hearing is that the audiogram (the most common way to diagnose hearing loss) cannot give a clear picture with regard to the pattern of hearing loss. Since the listener is instructed to respond to any tone perceived, a response could represent activation of inner hair cells at a site on the basilar membrane other than that corresponding to the stimulus frequency used, if that area is unable to respond. At present, it is not entirely clear how inner hair cells are damaged and how different types of inner hair cell damage affect signal transmission and sound perception. Below, we speculate on the outcome for some different conditions. First, let us assume that an area of inner hair cells is damaged [complete or near complete dysfunction; a dead region (Moore, 2001; Moore, Glasberg & Stone, 2004)]. If the signal contains important information in the frequency region corresponding to the dead region, this information will be lost. Moreover, if the signal corresponding to the dead region is strong enough, it will stimulate nearby healthy inner hair cells that respond to off-frequencies. Consequently, a frequency component in the original signal may be coded as a different frequency component at a place in the cochlea with functional inner hair cells. Hence, not only is signal information lost, but some signals may be neurally coded (through place-coding) as containing frequencies not included in the original signal. Such distortion means not only that phonemic information is ambiguous due to the erroneous signal-to-neural coding, but also that, since some off-frequency spectral components will be coded, the target phoneme could even be classified as a different one. In people with high-frequency dead regions, amplification of speech signals corresponding to the dead region frequencies has provided no or only small improvement in speech perception (Baer, Moore & Kluk, 2002). It has been argued that inner hair cell damage, barely detectable at threshold testing by audiogram, may still result in distortion (Moore, Vickers, Plack & Oxenham, 1999; Kollmeier, 1999). Roughly 10–20 neurons innervate each inner hair cell. If some of these are lost, fine structural coding may be affected, but with little impact on simple pure-tone threshold testing. For example, a signal may be coded as present at the frequency corresponding to the area at the inner hair cell site, but its exact level cannot be resolved. Nevertheless, it could cause perceptual degradation at suprathreshold levels. A later section of this paper will further discuss some possible linguistic (phoneme identification) outcomes as a function of site of (cochlear) damage.
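The place-coding consequence of a dead region can be illustrated with a few lines of Python. In the sketch below the channel centre frequencies are spaced on the ERB-number scale of Glasberg and Moore (1990); the extent of the dead region, the number of channels and the nearest-channel rule are purely illustrative assumptions.

import numpy as np

def erb_spaced_centres(f_lo=100.0, f_hi=8000.0, n=32):
    # Centre frequencies spaced evenly on the ERB-number scale:
    # ERB-number(f) = 21.4 * log10(4.37 * f / 1000 + 1).
    e_lo = 21.4 * np.log10(4.37 * f_lo / 1000.0 + 1.0)
    e_hi = 21.4 * np.log10(4.37 * f_hi / 1000.0 + 1.0)
    e = np.linspace(e_lo, e_hi, n)
    return (10.0 ** (e / 21.4) - 1.0) / 4.37 * 1000.0

centres = erb_spaced_centres()
dead = (centres > 2500.0) & (centres < 4500.0)   # hypothetical dead region

def place_code(freq_hz):
    # Centre frequency of the nearest functional channel, i.e. the place
    # at which a sufficiently intense component would be coded.
    alive = centres[~dead]
    return alive[np.argmin(np.abs(alive - freq_hz))]

# A 3 kHz formant falling inside the dead region is coded at an
# off-frequency place (here roughly 2.4 kHz), so the spectral pattern is
# transposed rather than merely lost.
print(place_code(3000.0))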

OUTPUT FROM THE AUDITORY MODEL

The auditory model estimates signal output degeneration caused by (1) the effect of the active function of the outer hair cells (the cochlear amplifier), (2) the vibration-to-neural-signal conversion of the inner hair cells, and (3) the influence of the endocochlear potentials on the active mechanisms in the cochlea.

INFLUENCE FROM OUTER HAIR CELLS

The loss or degeneration of outer hair cells has been studied and reported. The loss of the active amplification in the cochlea results in a reduced dynamic range caused by the elevated thresholds (loss of active regulation), but also in reduced temporal and spectral specificity (Robles & Ruggero, 2001). The reduced dynamic range can be restored with a nonlinear amplification algorithm, termed wide dynamic range compression (WDRC). The basic idea is to map a normal hearing individual's dynamic range onto the reduced dynamic range of the hearing impaired individual, with the aim of keeping all auditory input audible (Byrne, Dillon, Ching, Katsch & Keidser, 2001; Scollie, Seewald, Cornelisse et al., 2005). This requires a nonlinear amplifying function that introduces distortion. Hence, not only do the damaged outer hair cells cause signal distortion, the amplification adds further distortion. The tradeoff is between audibility and signal distortion: the faster the compressor works, the more sound is audible, but at the cost of high signal distortion. If the compressor works slowly, almost no signal distortion is introduced, but at the cost of greater inaudibility. Damage to the outer hair cells causes broadening of the auditory nerve fibre tuning curves (Robles & Ruggero, 2001). Perceptually, the lowering of specificity results in spectral and temporal masking of the input signal. In the spectral domain, this masking means that spectral components that are close to each other are not resolvable as they would be in the healthy ear (Glasberg, Moore & Bacon, 1987; Glasberg & Moore, 1989). In terms of phoneme decoding, some phonemes that comprise closely spaced spectral components may not be completely analyzed in a bottom-up fashion but may require more explicit top-down interpretation (cf. the ELU model, Fig. 3).
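As a rough sketch of how such a compressor behaves, the Python fragment below applies a single-band, envelope-driven gain: levels above a knee point grow by only 1/ratio dB per input dB, and the attack and release times set how fast the gain tracks the signal. The knee point, ratio and time constants are arbitrary illustrative values; prescriptive rationales such as NAL-NL1 (Byrne et al., 2001) and DSL (Scollie et al., 2005) specify frequency- and level-dependent gains that a real hearing aid would use instead.

import numpy as np

def wdrc(x, fs, knee_db=-40.0, ratio=3.0, attack_ms=5.0, release_ms=50.0):
    # Toy single-band wide dynamic range compressor. Fast time constants
    # keep weak segments audible but distort the envelope more; slow ones
    # distort less at the cost of audibility, as described in the text.
    a_att = np.exp(-1.0 / (attack_ms * 1e-3 * fs))
    a_rel = np.exp(-1.0 / (release_ms * 1e-3 * fs))
    env = 0.0
    y = np.empty_like(x)
    for n, sample in enumerate(x):
        mag = abs(sample)
        coeff = a_att if mag > env else a_rel    # envelope follower
        env = coeff * env + (1.0 - coeff) * mag
        level_db = 20.0 * np.log10(env + 1e-12)
        excess = max(0.0, level_db - knee_db)    # dB above the knee point
        gain_db = -excess * (1.0 - 1.0 / ratio)  # compress above the knee
        y[n] = sample * 10.0 ** (gain_db / 20.0)
    return y

# Example: "fast" versus "slow" settings applied to the same input.
fs = 16000
noise = np.random.default_rng(0).normal(size=fs) * 0.01
fast = wdrc(noise, fs, attack_ms=2.0, release_ms=20.0)
slow = wdrc(noise, fs, attack_ms=20.0, release_ms=500.0)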


Fig. 3. Working Memory System for ELU (Rönnberg et al., 2008).

ENDOCOCHLEAR POTENTIALS

A degradation of the endocochlear potentials has been suggested as one of several possible causes of presbyacusis (Schmiedt, Lang, Okamura & Schulte, 2002; Mills & Schmiedt, 2004). A lowering of these potentials lowers the voltage driving the active mechanism of the outer hair cells, reducing the influence of the cochlear amplifier and also the gating of the inner hair cells. This results in degraded timing of the neural response. Although endocochlear potentials were not investigated per se, Pichora-Fuller et al. (2007) tested the influence of pre-jittering the auditory signal in healthy young listeners. Jittering the auditory signal results in temporal ambiguity that could be similar to the temporal ambiguity of the neural signal caused by abnormally low endocochlear potentials. Speech perception in these participants then resembled that of older people with good audiograms.
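As a very rough illustration of this kind of manipulation, the sketch below displaces short segments of a signal by small random amounts. This is not the actual jittering procedure used by Pichora-Fuller et al. (2007); the segment length and jitter magnitude are arbitrary assumptions, and the point is only to show how temporal ambiguity can be imposed on an otherwise intact signal.

import numpy as np

def jitter_signal(x, fs, max_jitter_ms=0.25, seg_ms=2.0, seed=0):
    # Cut the signal into short segments and displace each by a random
    # offset, smearing the fine temporal structure while leaving the
    # long-term spectrum largely intact.
    rng = np.random.default_rng(seed)
    seg = int(seg_ms * 1e-3 * fs)
    max_shift = int(max_jitter_ms * 1e-3 * fs)
    y = np.zeros(len(x) + 2 * max_shift)
    for start in range(0, len(x), seg):
        shift = rng.integers(-max_shift, max_shift + 1)
        chunk = x[start:start + seg]
        pos = start + max_shift + shift
        y[pos:pos + len(chunk)] += chunk
    return y[max_shift:max_shift + len(x)]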

THE COGNITIVE MODEL

Recent studies have started to explore the role of cognition, especially in taxing listening conditions with demanding signal processing in hearing aids (e.g., Lunner, 2003; Lunner & Sundewall-Thorén, 2007; Rudner, Foo, Rönnberg & Lunner, 2009a; Rudner, Foo, Sundewall-Thorén, Lunner & Rönnberg, 2008; see review by Akeroyd, 2008). These studies supplement and challenge speech perception models based solely on optimal signal-to-noise speech processing (e.g., the TRACE model, McClelland & Elman, 1986; the Cohort model, Marslen-Wilson, 1987; and the NAM model, Luce & Pisoni, 1998), since they suggest that individual differences in cognitive capacity affect communicative and social competence even when individuals have similar hearing loss and hearing aid interventions. Age is likely to further affect these findings, as cognitive function in the elderly population is highly variable and vulnerable (Craik, 2007; Schneider, Pichora-Fuller & Daneman, in press). Part of the conceptual development in our labs is based on a recent model for Ease of Language Understanding (ELU, see Fig. 3). The ELU model describes a role for working memory in language understanding.

It differs from extant speech perception models in that it emphasizes a broader understanding of speech, rather than simply perception of individual segments or words, while also accommodating the effects of hearing impairment on speech perception (Auer & Bernstein, 2007). The ELU model also differs from working memory models such as the component model (Baddeley, 2000) in its emphasis on communicative outcome (understanding) rather than working memory as such, and from capacity theory (Daneman & Carpenter, 1980) in its emphasis on phonological processing, the interaction with long-term memory, and the roles played by the implicit and explicit processing mechanisms, rather than simply on general storage and processing capacity. The perceptual input to the model (see Fig. 3) is conceptualized as multi-sensory and/or multi-modal linguistic information, which at a cognitive level is assumed to be rapidly and automatically bound together to form phonological streams of information (RAMBPHO). Under optimum signal-to-noise conditions, the RAMBPHO information, assumed to be especially crucial at the syllabic level, unlocks the lexicon rapidly and implicitly by means of a matching mechanism. In particular, syllabic information from RAMBPHO is assumed to be matched with stored phonological representations in long-term memory (Rönnberg, 2003). If input conditions are sub-optimal, RAMBPHO information may not readily match stored long-term memory representations, and mismatch can occur. Mismatch may also occur if the lexical access mechanism is slowed down. This makes it hard to access meaning within the flow of continuous speech (cf. Pichora-Fuller, 2003). This, in turn, will cause overlaps and generate multiple phonological pseudo-targets. Less precise phonological representations in long-term memory may be another cause of mismatch (Andersson, 2002). When mismatch occurs, explicit processing and storage capacity are required to infer the meaning of the message conveyed by the speech signal, prospectively as well as retrospectively, on the basis of incompletely perceived spoken elements, as well as semantic and contextual information retrieved from long-term memory (cf. Hannon & Daneman, 2001). This explicit capacity is crucial for compensatory purposes (Rönnberg, 2003), especially for elderly people (Rönnberg et al., 2008), and may be gauged using non-auditory testing (Zekveld, George, Kramer, Goverts & Houtgast, 2007). In adverse listening conditions, and for persons with a severe hearing impairment, explicit and implicit processing are assumed to interact continuously to support speech understanding in the unfolding of the spoken message. The relative contribution of explicit and implicit functions to ELU varies as a function of mismatch, talker, context and dialogue-specific aspects (see Rönnberg, 2003, for mathematical descriptions). In short, in challenging listening situations, ELU reflects the capacity with which explicit functions can be carried out. In its turn, this is related to cognitive capacity and speech processing under mismatch conditions, but not necessarily under matching conditions (see Foo, Rudner, Rönnberg & Lunner, 2007; Rudner et al., 2008, 2009a). Furthermore, the distinction between implicit and explicit processing implies differences in the effort invested in the speech understanding task.


Recent data from our laboratory (Rudner, Lunner, Behrens, Sundewall-Thorén & Rönnberg, 2009b) suggest that subjectively rated effort goes hand in hand with explicit processing mechanisms, but only above a certain threshold of difficulty. Using a short-term recognition memory paradigm, Buchsbaum and D'Esposito (2008) discovered that implicit audio-verbal repetition suppression effects were related to neural activation in anterior portions of the superior temporal lobe, whereas working memory maintenance and explicit retrieval of phonological information activated posterior parts of the superior temporal lobe. This neural dissociation provides further support for the ELU assumption of a fundamental distinction between implicit and explicit processing mechanisms pertinent to phonology and lexical access. Finally, there are also language-modality specific neural constraints on storage and processing in explicit working memory, but not in the implicit functions of the model (Rönnberg, Rudner & Ingvar, 2004; Rudner, Fransson, Ingvar, Nyberg & Rönnberg, 2007; Rudner & Rönnberg, 2008), but those constraints will not be discussed here.
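The match/mismatch logic at the heart of the model can be caricatured in a few lines of Python. The sketch below is a deliberately schematic toy, not a quantitative version of the ELU model: the lexicon, the syllable-overlap similarity measure and the threshold are all illustrative assumptions, and it serves only to show where implicit lexical access ends and explicit processing is assumed to take over.

def similarity(pattern, stored):
    # Fraction of syllable slots that agree: a crude phonological match.
    hits = sum(p == s for p, s in zip(pattern, stored))
    return hits / max(len(stored), 1)

def elu_access(rambpho_syllables, lexicon, threshold=0.8):
    # RAMBPHO output is matched against stored phonological representations;
    # a good enough match gives rapid, implicit lexical access, otherwise
    # explicit (working-memory based) processing is recruited.
    best_word, best_sim = None, 0.0
    for word, stored in lexicon.items():
        sim = similarity(rambpho_syllables, stored)
        if sim > best_sim:
            best_word, best_sim = word, sim
    return (best_word, "implicit") if best_sim >= threshold else (best_word, "explicit")

lexicon = {"table": ["ta", "ble"], "cable": ["ca", "ble"]}
print(elu_access(["ta", "ble"], lexicon))   # clear input: ('table', 'implicit')
print(elu_access(["da", "ble"], lexicon))   # degraded input: explicit processing needed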

LINKING THE TWO MODELS

The output from the peripheral auditory processing model cannot be directly translated into the input to the cognitive ELU model. Between these two stages, signal feature extraction occurs. The details of this feature extraction are not entirely known, but signal feature extraction is required to compress the rich signal information into a useful form (Elhilali & Shamma, 2008). It is often hypothesized that the most important features involve dominant frequencies, spectral shape, temporal envelope, harmonicity, onset, offset, binaural interaction, and amplitude and frequency modulation (Miller, Escabí, Read & Schreiner, 2002; Oxenham, Bernstein & Penagos, 2004; Viemeister, 1979). These types of feature extraction are believed to occur in the auditory periphery as well as in the central auditory pathways. Different phonemes can be characterized by their inherent acoustic features, and different phonemes evoke unique neural patterns within primary auditory cortex (Mesgarani, David, Fritz & Shamma, 2008). A vowel may be represented by peak frequencies corresponding to its formants as well as its temporal and spectral modulation. Consonants are better represented by the spectral distribution of the sound at onset. Neurons selective for broad spectra respond selectively to a noise burst. Rapid neurons respond to rapid onsets while directional neurons encode formant transitions (Mesgarani et al., 2008). The neural patterns generated by these extracted features form the input to neural phoneme classifiers, see Fig. 4.

MULTI-MODAL INPUT TO THE ELU MODEL

The output from the auditory phoneme classifier is integrated with non-auditory input at the first stage of the ELU model (Fig. 3). Phonetic information can be integrated with and modulated by visual information, cf. the McGurk effect. The classical example of this effect is when a face is seen uttering the syllable ga at the same time as a recording of the syllable ba is heard, and the syllable da is erroneously perceived (see Campbell, 2008, for a review). Such binding of auditory and visual speech seems to occur at a syllabic language level after approximately 150 ms and is neurally manifest in increased activity in parts of the superior temporal gyrus (STG) and the posterior superior temporal sulcus (pSTS) (Campbell, 2008). In general, additional visual information about speech may help clarify what is hard to perceive auditorily. It is well known that speech sounds that are hard to perceive in noise can be perceived from lip movements, and vice versa, in a complementary fashion (Summerfield, 1987). Apart from findings suggesting binding of different modes of speech information, there is also neural evidence to suggest cortical convergence of sign and speech at phonological levels of perception and production (see review by Rönnberg, Söderfeldt & Risberg, 2000), as well as high cortical overlaps in activation patterns for visual speech and audiovisual speech (Söderfeldt, Ingvar, Rönnberg, Eriksson, Serrander & Stone-Elander, 1997; Campbell, 2008). These considerations, taken together with recent research on the episodic buffer in working memory (see review by Rudner & Rönnberg, 2008), have led us to suggest an episodic buffer (i.e., RAMBPHO) in language processing (Rönnberg et al., 2008). This takes into account that (1) multi-modal binding occurs early and rapidly in language processing, (2) binding involves long-term memory information, where lexical access is an important special case for language understanding, (3) RAMBPHO output is syllable-based, and (4) the system for language understanding develops to meet the demands of naturally occurring multi-modal or multi-language circumstances. RAMBPHO should not be conceived of as an intermediary storage function but rather as an expression of a final stage in the phonological abstraction processes whose goal is lexical access (cf. Obleser & Eisner, 2009). Nevertheless, comparisons with previously stored information, by making inferences retrospectively or prospectively in time, are part and parcel of the explicit processing function of the model. To summarize, the output from peripheral auditory processing, via signal feature extraction and phonemic classification, together with selective (i.e. speechread) non-auditory input, constitutes a multi-modal input to the cognitive ELU model, and to RAMBPHO in particular.

Fig. 4. A conceptual layout of the connection between the auditory periphery and the ELU model. The intermediate part consists of two stages. The feature extraction stage performs cue extraction, and the extracted cues are supplied to the classification stage. If the signal is noisy, the classifier cannot extract the phoneme and the output to the ELU model is ambiguous.
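To make the feature extraction stage in Fig. 4 concrete, the Python fragment below computes three of the cues listed in the text: the temporal envelope, a spectral centroid as a proxy for dominant frequency and spectral shape, and a crude onset measure. This naive computation is our illustration only and does not represent the cortical feature extraction reported by Mesgarani et al. (2008).

import numpy as np
from scipy.signal import hilbert

def simple_features(x, fs):
    # Temporal envelope via the analytic signal.
    envelope = np.abs(hilbert(x))
    # Spectral centroid: magnitude-weighted mean frequency.
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    centroid_hz = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    # Crude onset strength: steepest rise of the envelope per second.
    onset_strength = np.max(np.diff(envelope)) * fs
    return {"envelope": envelope, "centroid_hz": centroid_hz,
            "onset_strength": onset_strength}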


DEGRADED INPUT TO AND CONSEQUENCES FOR THE ELU MODEL

Because damage to the outer hair cells impairs temporal and spectral resolution, simple psychoacoustic tests that rest on these aspects of the neural stream may be expected to be sensitive to the status of the outer hair cells, e.g. gap detection, duration discrimination and phoneme or syllable identification. It is also known that chronological age affects temporal processing at the segmental and sub-segmental levels of speech, bringing about impaired perception of, e.g., stop consonants, despite normal audiograms (Pichora-Fuller, 2003, 2008). In the broader literature on language-impaired children, a similar message is proposed: disruption of basic auditory information processing capacities that tap into a person's ability to perceive brief temporal pauses may hamper formation of distinct phonological representations and perception of phoneme durations and vowel transitions (Tallal, 2004), and there is even evidence that non-linguistic auditory training generalizes to perception of syllables (Lakshminarayanan & Tallal, 2007). A cognitive mechanism of ELU that depends on temporal and spectral resolution is an early, phonologically mediated, lexical access system (Rönnberg, 1990), which is part of the RAMBPHO-delivered information in speech understanding (Rönnberg et al., 2008), and an important bottom-up aspect of the description of skilled speechreading (cf. Auer, 2009; Lyxell & Rönnberg, 1991). In this context, early means less than 300–400 ms from signal onset. The sooner lexical access can take place, i.e. the less phonological information that is needed for successful lexical access, the more efficient the system is. The demands on early, rapid and precise lexical access will be especially prominent in conditions with poorly specified physical input characteristics and rapid speech. Degree of impairment to the outer hair cells may therefore be correlated with degree of temporal slowing or phonological imprecision in RAMBPHO input to the lexicon (Rönnberg et al., 2008). Mismatch also depends on the precision of the phonological representations in long-term memory: hearing aid fitting and acclimatization to certain signal processing schemes (affecting phonological processing) have been shown to induce mismatch effects when speech recognition is tested with types of signal processing to which the hearing aid user is unaccustomed (Foo et al., 2007; Rudner et al., 2008, 2009a). We also know that severe hearing impairment affects the precision of phonological representations over time (i.e., rhyme judgments, Andersson, 2002). By implication, it is quite feasible that long-term effects of destruction of the outer hair cells could indirectly cause a deterioration of phonological representations in long-term memory. If damage to the outer hair cells may result in some phonological imprecision in RAMBPHO, dead regions of inner hair cells will cause more profound impairments and frequency transpositions, leading to confusions at the phonemic level, and subsequently for syllabic abstraction in RAMBPHO.


If specific frequency information is important for the classification of a phoneme, e.g. formants in a vowel, the phoneme may be classified erroneously due to the transposition of a formant frequency within the dead region to a frequency corresponding to functional inner hair-cell bundles close to the dead region. However, if the acoustics of the phoneme comprises a broad band of frequencies, as is often the case for consonants with rapid onsets and/or offsets, a dead region need not impair the phoneme classifier to the same extent. We therefore suggest that if formant frequencies are misinterpreted, vowels are affected, especially if the dead region is in the lower frequency bands. This has implications for hearing aid fittings: fast compression provides more information, possibly by enabling the target signal to be audible during gaps in modulated noise (Lunner et al., 2009). This means that for hearing impaired people who have dead regions the phoneme may be audible as a result of the fast compression, but may still be systematically misinterpreted due to the frequency transposition of formants. Greater reliance on vowels than on consonants for listening in the gaps in modulated noise has recently been empirically demonstrated by Kewley-Port, Zachary Burkle and Hee Lee (2007). Consequently, such patients may not benefit from a fast compression strategy. It has also already been noted that loss of fine spectral structure following inner hair cell damage may occur even at supra-threshold levels, contributing to an increase in internal noise in the system. Such effects may also result in ambiguous phoneme classification. Thus, because of the profound influences of inner hair cell damage on phonemic classification and phonological abstraction in RAMBPHO, we expect that lexical access processes will be especially sensitive to context and to early top-down modulation of the signal. Outer hair cell damage may of course also contribute to a need for top-down modulation of the signal, but in this case, signal processing is more likely to be degraded and slowed down, not qualitatively changed as is the case for inner hair cells. Because a pathological endocochlear potential may affect the synchrony and timing of neural impulses to the brain, it will distort the phonological representations more generally, actually reinforcing the putative effects of damage to the inner and outer hair cells. However, lack of synchrony across frequency bands may be more disruptive of perception of consonants, because they are typically characterized by rapid transitions. We are currently conducting a research program that will test the function of the outer/inner hair cells (cf. Stenfelt, 2008), phonological precursors such as duration discrimination (Lidestam, 2009) and phoneme discrimination by means of the gating paradigm (Grosjean, 1980), as well as different kinds of phonological long-term memory representations and phonological working memory in the same individual. This will allow us to more fully chart linkages between the peripheral model and the ELU model (Rönnberg et al., 2008).

ELU, EARLY TOP-DOWN PROCESSING AND TIME-WINDOWS


Viewing the mechanisms of the ELU model over time, one implication of the mismatch assumption is that explicit processing mechanisms are needed in order to retrieve additional semantic and contextual information from long-term memory to narrow down the set of lexical candidates in the speech stream. This is particularly relevant when there is little overlap between the information retrieved from the signal and the information stored in long-term memory. We have suggested that inner hair cell function is especially critical to maintaining the integrity of information related to signal processing. Recent research on the neural networks supporting speech understanding shows a system that is hierarchical and highly interrelated, with rich possibilities for early top-down modulation of speech (Scott & Johnsrude, 2003). One general kind of top-down, early modulation of the speech signal is a function of whether the signal is treated as language or not by the brain. Patterson and Johnsrude (2008) reviewed recent evidence regarding the cognitive state of the perceiver. That is, they clarified the cortical processing streams distinguishing sounds (vocal signal) and language (speech) in the speech stream. The left hemisphere and the upper banks of the Superior Temporal Sulcus (STS) seem to be specifically activated by the linguistic mode of signal processing, whereas the human voice in general activates the STS bilaterally. According to Poeppel, Idsardi and Wassenhove (2008), both segmental/syllabic analysis and audiovisual integration are computed by the left STG and the STS bilaterally within a time window of 150–300 ms (where RAMBPHO is assumed to come into play; cf. Obleser & Eisner, 2009, for similar reasoning regarding phonological, pre-lexical abstraction). Furthermore, it has been suggested that lexical access is computed by the left Middle Temporal Gyrus (MTG; Poeppel et al., 2008). However, early temporal processing within a window of 20–80 ms (where the peripheral auditory processing and simple feature extraction are assumed to take place) is computed by the ascending pathways and the core auditory belt (Poeppel et al., 2008). Very early in the processing chain, internal neural representations are assumed to be set up and updated continuously every 30 ms, with the internal neuronal representation tested and successively modified (from spectral via segmental to lexical hypotheses). The Poeppel et al. (2008) dual time-window approach to the understanding of speech, with a postulated analysis-by-synthesis feed-forward system, is one detailed mechanism that could be important for clarifying the linkage between our peripheral and cognitive models (as indicated above). It is also important to acknowledge that visual information about speech, especially when it is complementary to auditory information, may serve as part of the analysis-by-synthesis mechanism, occurring within 150 ms of auditory signal onset (cf. Campbell, 2008). Our emphasis is on conditions when mismatch occurs (with a time window similar to the window for syllabic analysis; Näätänen, 2008), and when explicit repair strategies are necessary. More specifically, explicit semantic processing of the speech stream activates anterior superior temporal and inferior frontal cortical areas, which implies multiple sites of processing that all contribute to the composition and repair of the percept (Scott & Johnsrude, 2003).

One recent study by Obleser, Wise, Dresner and Scott (2007) investigated the interaction between signal degradation (i.e., noise-vocoded sentences of intermediate difficulty) and semantic predictability, and found increased activation in the left angular gyrus, lateral prefrontal areas and the posterior cingulate gyrus, with the activity in the STS independent of predictability. This study is pertinent to our theorizing as it clearly demonstrates the interconnectivity of cortical regions in the process of disambiguation of a degraded signal, and it points to the possibility that explicit components (e.g., the prefrontal cortex) are directly involved. Along the same lines, Wingfield and Grossman (2006) presented evidence to suggest that in healthy aging persons, a widespread compensatory cortical network is activated to maintain speech understanding performance at levels similar to those of young persons. Rapid rates of processing and syntactically complex sentences challenge executive and working memory resources to a larger extent in the older person. Brain compensation must therefore utilize those cortical areas responsible for executive functions and working memory to a relatively larger extent. Future research should focus on the explicit (and early implicit) aspects of top-down influences on speech understanding in conditions of poorly specified or distorted input, combined with the specifics of hearing impairment. In particular, the interaction between different types of hearing impairment, different kinds of signal processing used in a hearing aid, and individual differences in working memory capacity may be the key to this development.
The authors gratefully acknowledge the valuable input of Mary Rudner (Linköping University), Kathy Pichora-Fuller (University of Toronto), and two anonymous reviewers.

REFERENCES
Andersson, U. (2002). Deterioration of phonological processing skills in adults with an acquired severe hearing loss. European Journal of Cognitive Psychology, 14, 335–352.
Akeroyd, M. (2008). Are individual differences in speech reception related to individual differences in cognitive ability? A survey of twenty experimental studies with normal and hearing-impaired adults. International Journal of Audiology, 47(Suppl. 2), S125–S143.
Auer, E. T., Jr (2009). Spoken word recognition by eye. Scandinavian Journal of Psychology, 50, 419–425.
Auer, E. T., Jr & Bernstein, L. E. (2007). Enhanced visual speech perception in individuals with early-onset hearing impairment. Journal of Speech, Language and Hearing Research, 50, 1157–1165.
Baddeley, A. D. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4(11), 417–423.
Baer, T., Moore, B. C. & Kluk, K. (2002). Effects of low pass filtering on the intelligibility of speech in noise for people with and without dead regions at high frequencies. Journal of the Acoustical Society of America, 112(3), 1133–1144.
Bruce, I. C., Sachs, M. B. & Young, E. D. (2003). An auditory-periphery model of the effects of acoustic trauma on auditory nerve responses. Journal of the Acoustical Society of America, 113, 369–388.
Buchsbaum, B. R. & D'Esposito, M. (2008). Repetition suppression and reactivation in auditory-verbal short-term recognition memory. Cerebral Cortex, doi:10.1093/cercor/bhn186.



Byrne, D., Dillon, H., Ching, T., Katsch, R. & Keidser, G. (2001). NAL-NL1 procedure for fitting nonlinear hearing aids: Characteristics and comparisons with other procedures. Journal of the American Academy of Audiology, 12, 37–51.
Campbell, R. (2008). The processing of audio-visual speech: Empirical and neural bases. Philosophical Transactions of the Royal Society London, 363, 1001–1010.
Carney, L. H. (1993). A model for the responses of low-frequency auditory-nerve fibres in cats. Journal of the Acoustical Society of America, 93, 401–417.
Craik, F. I. M. (2007). The role of cognition in age-related hearing loss. Journal of the American Academy of Audiology, 18, 539–547.
Daneman, M. & Carpenter, P. A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning & Verbal Behavior, 19, 450–466.
de Boer, E. (1997). Classical and non-classical models of the cochlea. Journal of the Acoustical Society of America, 101, 2148–2150.
Edwards, B. (2007). The future of hearing aid technology. Trends in Amplification, 11(1), 31–45.
Elhilali, M. & Shamma, S. A. (2008). A cocktail party with a cortical twist: How cortical mechanisms contribute to sound segregation. Journal of the Acoustical Society of America, 124(6), 3751–3771.
Evans, B. N. & Dallos, P. (1993). Stereocilia displacement induced somatic motility of cochlear outer hair cells. Proceedings of the National Academy of Sciences USA, 90, 8347–8351.
Foo, C., Rudner, M., Rönnberg, J. & Lunner, T. (2007). Recognition of speech in noise with new hearing instrument compression release settings requires explicit cognitive storage and processing capacity. Journal of the American Academy of Audiology, 18, 553–566.
Gates, G. A. & Mills, J. H. (2005). Presbycusis. Lancet, 366(9491), 1111–1120.
Glasberg, B. R. & Moore, B. C. (1989). Psychoacoustic abilities of subjects with unilateral and bilateral cochlear hearing impairments and their relationship to the ability to understand speech. Scandinavian Journal of Audiology, (Suppl. 32), 1–25.
Glasberg, B. R. & Moore, B. C. (1990). Derivation of auditory filter shapes from notched-noise data. Hearing Research, 47, 103–138.
Glasberg, B. R., Moore, B. C. & Bacon, S. P. (1987). Gap detection and masking in hearing-impaired and normal-hearing subjects. Journal of the Acoustical Society of America, 81(5), 1546–1556.
Grosjean, F. (1980). Spoken word recognition processes and the gating paradigm. Perception & Psychophysics, 28, 267–283.
Hannon, B. & Daneman, M. (2001). A new tool for measuring and understanding individual differences in the component processes of reading comprehension. Journal of Educational Psychology, 93(1), 103–128.
Hato, N., Stenfelt, S. & Goode, R. L. (2003). Three-dimensional stapes footplate motion in human temporal bones. Audiology and Neurotology, 8, 140–152.
Kewley-Port, D., Zachary Burkle, T. & Hee Lee, J. (2007). Contribution of consonant versus vowel information to sentence intelligibility for young normal-hearing and elderly hearing-impaired listeners. Journal of the Acoustical Society of America, 122, 2365–2375.
Kochkin, S. (2005). MarkeTrak VII: Hearing loss population tops 31 million people. Hearing Review, 12(7), 16–29.
Kollmeier, B. (1999). Signal processing in the impaired auditory system and its implications for hearing instruments. In A. N. Rasmussen, P. A. Osterhammel, T. Andersson & T. Poulsen (Eds.), Auditory models and non-linear hearing instruments: 18th Danavox Symposium (pp. 251–273). Denmark: Holmens Trykkeri.
Kramer, S. E., Kapteyn, T. S. & Houtgast, T. (2006). Occupational performance: Comparing normally-hearing and hearing-impaired employees using the Amsterdam Checklist for Hearing and Work. International Journal of Audiology, 45(9), 503–512.
Lakshminarayanan, K. & Tallal, P. (2007). Generalization of non-linguistic auditory perceptual training to syllable discrimination. Restorative Neurology and Neuroscience, 25, 263–272.
Lidestam, B. (2009). Visual discrimination of vowel duration. Scandinavian Journal of Psychology, 50, 427–435.
Luce, P. A. & Pisoni, D. A. (1998). Recognising spoken words: The neighbourhood activation model. Ear and Hearing, 19, 1–36.
Lunner, T. (2003). Cognitive function in relation to hearing aid use. International Journal of Audiology, 42(Suppl. 1), S49–S58.
Lunner, T. & Sundewall-Thorén, E. (2007). Interactions between cognition, compression, and listening conditions: Effects on speech-in-noise performance in a two-channel hearing aid. Journal of the American Academy of Audiology, 18, 539–552.
Lunner, T., Rudner, M. & Rönnberg, J. (2009). Cognition and hearing aids. Scandinavian Journal of Psychology, 50, 395–403.
Lyon, R. F. (1982). A computational model of filtering, detection, and compression in the cochlea. IEEE ICASSP, 82, 1282–1285.
Lyxell, B. & Rönnberg, J. (1991). Visual speech processing: Word decoding and word discrimination related to sentence-based speechreading and hearing-impairment. Scandinavian Journal of Psychology, 32, 9–17.
Marslen-Wilson, W. (1987). Functional parallelism in spoken word recognition. Cognition, 25, 71–103.
McClelland, J. L. & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18, 695–698.
Meddis, R. (1988). Simulation of auditory-neural transduction: Further studies. Journal of the Acoustical Society of America, 83, 1056–1063.
Meddis, R. (2006). Auditory-nerve first-spike latency and auditory absolute threshold: A computer model. Journal of the Acoustical Society of America, 119, 406–417.
Mesgarani, N., David, S., Fritz, J. & Shamma, S. (2008). Phoneme representation and classification in primary auditory cortex. Journal of the Acoustical Society of America, 123, 899–909.
Miller, L. M., Escabí, M. A., Read, H. L. & Schreiner, C. E. (2002). Spectrotemporal receptive fields in the lemniscal auditory thalamus and cortex. Journal of Neurophysiology, 87(1), 516–527.
Mills, D. M. & Schmiedt, R. A. (2004). Metabolic presbycusis: Differential changes in auditory brainstem and otoacoustic emission responses with chronic furosemide application in the gerbil. Journal of the Association for Research in Otolaryngology, 5, 1–10.
Moore, B. C. (2001). Dead regions in the cochlea: Diagnosis, perceptual consequences, and implications for the fitting of hearing aids. Trends in Amplification, 5, 1–34.
Moore, B. C., Glasberg, B. R. & Stone, M. A. (2004). New version of the TEN test with calibrations in dB HL. Ear and Hearing, 25(5), 478–487.
Moore, B. C., Vickers, D. A., Plack, C. J. & Oxenham, A. J. (1999). Inter-relationship between different psychoacoustic measures assumed to be related to the cochlear active mechanism. Journal of the Acoustical Society of America, 106(5), 2761–2778.
Näätänen, R. (2008). Mismatch negativity (MMN) as an index of central auditory system plasticity. International Journal of Audiology, 47(Suppl. 2), S88–S92.
Obleser, J. & Eisner, F. (2009). Pre-lexical abstraction of speech in the auditory cortex. Trends in Cognitive Sciences, 13(1), 14–19.
Obleser, J., Wise, R. J. S., Dresner, M. A. & Scott, S. K. (2007). Functional integration across brain regions improves speech perception under adverse listening conditions. Journal of Neuroscience, 27, 2283–2289.


Oxenham, A. J., Bernstein, J. G. & Penagos, H. (2004). Correct tonotopic representation is necessary for complex pitch perception. Proceedings of the National Academy of Sciences USA, 101(5), 1421–1425.
Patterson, R. D. & Johnsrude, I. S. (2008). Functional imaging of the auditory processing applied to speech sounds. Philosophical Transactions of the Royal Society B, 363, 1023–1035.
Patterson, R. D., Robinson, K., Holdsworth, J., McKeown, D., Zhang, C. & Allerhand, M. M. (1992). Complex sounds and auditory images. In Y. Cazals, L. Demany & K. Horner (Eds.), Auditory physiology and perception (pp. 429–446). Oxford: Pergamon Press.
Pichora-Fuller, M. K. (2003). Processing speed and timing in aging adults: Psychoacoustics, speech perception, and comprehension. International Journal of Audiology, 32, S59–S67.
Pichora-Fuller, M. K. (2008). Use of supportive context by younger and older listeners: Balancing bottom-up and top-down information processing. International Journal of Audiology, 47, S72–S82.
Pichora-Fuller, M. K., Schneider, B. A., Macdonald, E., Pass, H. E. & Brown, S. (2007). Temporal jitter disrupts speech intelligibility: A simulation of auditory aging. Hearing Research, 223, 114–121.
Poeppel, D., Idsardi, W. D. & Wassenhove, V. van (2008). Speech perception at the interface of neurobiology and linguistics. Philosophical Transactions of the Royal Society B, 363, 1071–1086.
Robles, L. & Ruggero, M. A. (2001). Mechanics of the mammalian cochlea. Physiological Reviews, 81, 1305–1352.
Rosenhall, U., Pedersen, K. & Svanborg, A. (1990). Presbycusis and noise-induced hearing loss. Ear and Hearing, 11(4), 257–263.
Rönnberg, J. (1990). Cognitive and communicative function: The effects of chronological age and handicap age. European Journal of Cognitive Psychology, 2(3), 253–273.
Rönnberg, J. (2003). Cognition in the hearing impaired and deaf as a bridge between signal and dialogue: A framework and a model. International Journal of Audiology, 42(Suppl. 1), S68–S76.
Rönnberg, J., Rudner, M. & Ingvar, M. (2004). Neural correlates of working memory for sign language. Cognitive Brain Research, 20(2), 165–182.
Rönnberg, J., Rudner, M., Foo, C. & Lunner, T. (2008). Cognition counts: A working memory system for ease of language understanding (ELU). International Journal of Audiology, 47(Suppl. 2), S171–S177.
Rönnberg, J., Söderfeldt, B. & Risberg, J. (2000). The cognitive neuroscience of signed language. Acta Psychologica, 105, 237–254.
Rudner, M., Foo, C., Rönnberg, J. & Lunner, T. (2009a). Cognition and aided speech recognition in noise: Specific role for cognitive factors following nine-week experience with adjusted compression settings in hearing aids. Scandinavian Journal of Psychology, 50, 405–418.
Rudner, M., Foo, C., Sundewall-Thorén, E., Lunner, T. & Rönnberg, J. (2008). Phonological mismatch and explicit cognitive processing in a sample of 102 hearing-aid users. International Journal of Audiology, 47(Suppl. 2), S91–S98.
Rudner, M., Fransson, P., Ingvar, M., Nyberg, L. & Rönnberg, J. (2007). Neural representation of binding lexical signs and words in the episodic buffer of working memory. Neuropsychologia, 45, 2258–2276.

Rudner, M., Lunner, T., Behrens, T., Sundewall-Thorén, E. & Rönnberg, J. (2009b). Good cognitive resources make listening under challenging conditions seem less effortful. 9th European Federation of Audiology Societies (EFAS) Congress, Tenerife, Spain.
Rudner, M. & Rönnberg, J. (2008). The role of the episodic buffer in working memory for language processing. Cognitive Processing, 9, 19–28.
Schmiedt, R. A., Lang, H., Okamura, H. O. & Schulte, B. A. (2002). Effects of furosemide applied chronically to the round window: A model of metabolic presbyacusis. Journal of Neuroscience, 22, 9643–9650.
Schneider, B. A., Pichora-Fuller, M. K. & Daneman, M. (in press). The effects of senescent changes in audition and cognition on spoken language comprehension. In S. Gordon-Salant, R. D. Frisina, A. Popper & D. Fay (Eds.), The aging auditory system: Perceptual characterization and neural bases of presbycusis, Springer Handbook of Auditory Research. Berlin: Springer.
Schuknecht, H. F. & Gacek, M. R. (1993). Cochlear pathology in presbycusis. Annals of Otology, Rhinology & Laryngology, 102(1 Pt 2), 1–16.
Scollie, S., Seewald, R., Cornelisse, L., Moodie, S., Bagatto, M., Laurnagaray, D., Beaulac, S. & Pumford, J. (2005). The Desired Sensation Level multistage input/output algorithm. Trends in Amplification, 9, 159–197.
Scott, S. K. & Johnsrude, I. S. (2003). The neuroanatomical and functional organization of speech perception. Trends in Neuroscience, 26, 100–107.
Shaw, E. A. G. (1974). Transformation of sound pressure level from the free field to the eardrum in the horizontal plane. Journal of the Acoustical Society of America, 56, 1848–1861.
Söderfeldt, B., Ingvar, M., Rönnberg, J., Eriksson, L., Serrander, M. & Stone-Elander, S. (1997). Signed and spoken language perception studied by positron emission tomography. Neurology, 49, 82–87.
Stenfelt, S. (2008). Towards understanding the specifics of cochlear hearing loss: A modeling approach. International Journal of Audiology, 47(Suppl. 2), 10–15.
Summerfield, A. Q. (1987). Some preliminaries to a theory of audiovisual speech processing. In B. Dodd & R. Campbell (Eds.), Hearing by eye (pp. 58–82). Hove, UK: Lawrence Erlbaum.
Tallal, P. (2004). Opinion: Improving language and literacy is a matter of time. Nature Reviews Neuroscience, 5(9), 721–728.
Van Tasell, D. (1993). Hearing loss, speech, and hearing aids. Journal of Speech & Hearing Research, 36, 228–244.
Viemeister, N. F. (1979). Temporal modulation transfer functions based upon modulation thresholds. Journal of the Acoustical Society of America, 66(5), 1364–1380.
Wingfield, A. & Grossman, M. (2006). Language and the aging brain: Patterns of neural compensation revealed by functional brain imaging. Journal of Neurophysiology, 96, 2830–2839.
Zekveld, A. A., George, E. L. J., Kramer, S. E., Goverts, S. T. & Houtgast, T. (2007). The development of the text reception threshold test: A visual analogue of the speech reception threshold test. Journal of Speech, Language & Hearing Research, 50, 576–584.

Received 4 May 2009, accepted 4 May 2009

