

Intelligibility: what’s the score?
Improving intelligibility is an important goal for many of our clients, notably those with dysarthria. The associated assessment and therapy usually involves sets of minimally different words. But do we know enough about the scoring and interpretation of such tools to use them effectively? Jennifer Vigouroux and Nick Miller investigate.


Charting changes in intelligibility is an integral part of clinical work, particularly for clients who have dysarthria. In a typical word recognition test, or more valid diagnostic intelligibility test (Yorkston & Beukelman, 1981; Kent et al., 1989; Weismer & Martin, 1992; Kent et al., 1994; Wilcox & Morris, 1999), speakers read or repeat a list of words randomly selected from sets of (near) minimally differing words (see the examples below). The number of words recognised by a listener is taken as a measure of the severity of any intelligibility problem. In diagnostic intelligibility tests, scrutinising the pattern of mishearings gives valuable insights into the nature of the underlying disorder and targets for therapy.

There are two broad approaches to scoring such tests:
1. Open format scoring. Listeners write on a blank sheet the word they think they heard.
2. Closed format scoring. For each item the listener is presented with a selection of words, one of which is the target and the others (near) minimally differing foils. For example, the targets for three items from the test we used for this study were keep, wad and fat. For each item listeners had to circle the word they thought they heard from a choice, respectively, of: cup, coop, keep, cape, cope, carp (looking at vowel variation); what, wad, watch, watts, one, was (tapping word-final tongue tip sounds); bat, pat, fat, vat, mat, hat (assessing the ability to make word-initial labial distinctions).

But do these two methods lead to equivalent scores, or are they qualitatively different tests with differing strengths? Searching the literature does not turn up many answers. Yorkston & Beukelman (1978, 1980) found scores elicited from the open format were significantly lower than scores from the closed format. Despite the differences in intelligibility scores, however, both versions still ranked dysarthric speakers similarly. Both assessment formats have therefore been referred to as equally reliable and applicable methods of evaluation. Kent et al. (1989) refer to differences between open and closed scoring as a 'negligible concern' (p.492). Yorkston & Beukelman (1981) acknowledged the potential score difference by specifying that "once selected, subsequent re-administrations should utilise the same judging format" (p.6).

Question marks
However, there are enough question marks over earlier studies to justify looking again. For instance, earlier studies used small numbers of speakers across a wide range of intelligibility levels, and listening was done by a limited number of trained speech and language therapists or student speech and language therapists, not carers or everyday listeners. If it were established that there are inconsistencies between the scoring methods, it would have implications for diagnostic accuracy, outcome measures in therapy and prioritisation of treatment. What would be useful to know is exactly how great a discrepancy arises in scores between the two formats: is the gap between them systematic (and so predictable across different speakers), or inconsistent? A further issue is which scoring procedure is more predictive of performance in spontaneous speech.

To find some answers, we recorded twenty-seven people, aged 62-83 years (mean 72 years), with varying severity of Parkinson's disease. They read sixty items from a diagnostic intelligibility test we had devised (based on Yorkston & Beukelman, 1981, but adjusted to the local accent of the area the speakers came from). They also described 'how to make a cup of tea'. Two groups of 'everyday listeners' (with no experience of dysarthric speech or Parkinson's disease), ranging in age from 18-76 years, listened to the same set of recordings. Each speaker was scored by three different listeners in each format; each listener heard five different speakers. Thirty-one listeners scored in a closed format – that is, they ticked the word they heard from a choice of twelve (near) minimally distinguished words (the target and 11 foils). Examples from the score sheet appear in table 1. Thirty other listeners had just a blank piece of paper to write answers down (open format). All listeners also heard an excerpt from the 'cup of tea' passage and rated on a five-point scale how disordered / natural the speech sounded. The total words correctly identified by the three listeners were averaged to derive an intelligibility score per speaker. Mean 'disordered' ratings were calculated in the same way.
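The per-speaker scoring procedure described above (three listeners per speaker per format, with word-correct totals out of 60 averaged into one score) can be sketched as follows. The speaker labels and listener counts here are hypothetical, for illustration only.

```python
# Sketch of the per-speaker scoring described above: each speaker is heard by
# three listeners in a given format, and the three word-correct totals
# (out of 60 items) are averaged into a single intelligibility score.

def intelligibility_score(word_correct_counts):
    """Average the word-correct totals from the listeners who heard a speaker."""
    if not word_correct_counts:
        raise ValueError("need at least one listener total")
    return sum(word_correct_counts) / len(word_correct_counts)

# Hypothetical word-correct totals (out of 60) from three listeners each:
closed_format = {"speaker_06": [55, 53, 54], "speaker_12": [52, 54, 53]}
open_format = {"speaker_06": [46, 44, 45], "speaker_12": [36, 34, 35]}

for speaker in closed_format:
    closed = intelligibility_score(closed_format[speaker])
    opened = intelligibility_score(open_format[speaker])
    print(f"{speaker}: closed {closed:.1f}, open {opened:.1f}")
```

The same averaging is applied to the five-point 'disordered' ratings to give a mean rating per speaker.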
Table 1 Sample items from the closed format scoring version of the diagnostic intelligibility test (twelve choices per item: the target and 11 foils)
Item 1: cub, cape, coop, heap, cup, cop, carp, hub, keep, cap, sheep, hoop
Item 2: one, watts, fall, was, what, wad, wash, want, waltz, warn, wool, watch
Item 3: mat, vat, cat, what, hat, heart, fat, tat, pat, bat, gnat, sat
Item 4: store, snore, draw, score, flair, sore, floor, chore, four, spore, poor, nor

We found a difference in scores derived from the two formats (see the summary in table 2). The difference is statistically significant (t(26) = 7.10, p < 0.001). However, such a divergence would not matter too much if the change were systematic and speakers were ordered the same in severity by both versions. Looking at individual speakers, though, the variation was not consistent. For example, speakers 6, 9, 12 and 17 obtained means of 54, 54, 53 and 55 points respectively in the closed test; in the open test their means ranged from 35 (speaker 12) to 52 (speaker 17). Speaker 22 scored almost identically in both formats; speaker 29 scored higher in the open than the closed version.

Table 2 Mean intelligibility raw scores from the open and closed format diagnostic intelligibility test
Scoring format | No. | Mean (out of 60) | Median | Standard deviation | Range
Closed | 27 | 47.85 | 50.3 | 9.88 | 15-59
Open | 27 | 34.86 | 37.3 | 13.41 | 0.33-54.97
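A divergence in raw scores would matter less if both formats ordered speakers the same way. A minimal sketch of that check, using hypothetical scores, ranks speakers under each format and measures how far each speaker's rank position shifts between formats:

```python
# Rank speakers from lowest to highest intelligibility score under each format,
# then compare each speaker's position across the two rankings. All scores
# below are hypothetical, for illustration only.

def rank_positions(scores):
    """Map speaker -> rank position (0 = lowest score, i.e. most severe)."""
    ordered = sorted(scores, key=scores.get)
    return {speaker: pos for pos, speaker in enumerate(ordered)}

closed = {"s22": 30.0, "s29": 50.0, "s12": 53.0, "s17": 55.0}
open_ = {"s22": 29.0, "s29": 55.0, "s12": 35.0, "s17": 52.0}

closed_ranks = rank_positions(closed)
open_ranks = rank_positions(open_)

# How many positions does each speaker move between the two rankings?
shifts = {s: abs(closed_ranks[s] - open_ranks[s]) for s in closed}
print(shifts)
```

If the gap between formats were systematic, every shift would be zero (or near it); large shifts for some speakers but not others signal inconsistent, speaker-dependent divergence.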

Large discrepancies
We ranked individuals by score in each format. This revealed some large discrepancies. Over half the speakers were ranked more than five positions away from their position in the other test; in the extreme case, speaker 29 was ranked 5th for severity in the open test and 22nd in the closed test. Even when we grouped speakers into severity bands of mild, moderate and severe, only 7 fell in the same band across formats. Some of the discrepancy might have stemmed from having different groups of people rate the recordings, so we removed listeners from the analysis where they produced scores that were clear outliers compared with the other listeners scoring a given test. This improved the correspondence between open and closed rankings, but it did not resolve the inconsistencies in rankings, and it made no difference to the relative mean scores across conditions.

How did results in the different formats fare in relation to the 'disordered' ratings made on the basis of the 'cup of tea' description? Intelligibility rankings from the open and closed formats were compared to the disordered ratings. There was an almost significant relationship between intelligibility and rating scale scores only in the open format – indicating that scores from the open version came closest to reflecting listeners' impressions of severity from spontaneous speech.

All this tells us that significantly lower intelligibility scores are obtained using an open format. That agrees with Yorkston & Beukelman (1978, 1980) but, contrary to Kent et al. (1989), the difference was not 'negligible', and, in contrast to Yorkston and others' conclusions, the difference is not neatly systematic. So why did we get such different outcomes?
a) We used a large sample of everyday naive listeners of mixed ages, not two experienced speech and language therapists (Yorkston & Beukelman, 1980) or a group of young speech and language therapy students (Yorkston & Beukelman, 1978). Both age and experience of dysarthric speech are known to influence intelligibility scores (Tjaden & Liss, 1995; Dagenais et al., 1998, 1999; Garcia & Hayden, 1999).
b) We employed a larger sample of speakers, all with the same aetiology. Yorkston & Beukelman (1978, 1980) used small samples (9-12 speakers) with varying aetiologies.
c) The fact that Yorkston & Beukelman (1978, 1980) used a small sample with a broad range of intelligibility severities may have guaranteed close correspondence between open and closed formats for the purpose of ranking speaker impairment. If there are sufficiently large gaps of ability between subjects, subtle differences between test formats or listener effects are not revealed. In this study, participants clustered at the mild-moderate end.
d) Our word sets were also more tightly controlled for minimal differences. This may have made the listening task harder for our listeners (especially as they were people unused to hearing dysarthric speech).

From our study, it seems open format scoring tallies more closely with overall intelligibility ratings of spontaneous speech. Closer scrutiny of the figures uncovered further lessons here. The correspondence was stronger for more severely impaired speakers and weaker for those mildly affected. This is supported by Dagenais et al. (1998, 1999), who found intelligibility correlated with naturalness for moderately (1999) but not mildly (1998) impaired dysarthric speakers. Yorkston & Beukelman (1981) also argued that closed format tests do not consistently identify mildly impaired speakers. The reason doubtless lies in the fact that, where articulation is not significantly distorted, what identifies these speakers as impaired are aspects of speaking rate, voice quality and loudness, not articulation. To gain a fuller picture of changes, these voice and prosodic features need to be assessed too.

This point has further consequences. Closed format is better where small differences are to be gauged in relatively severely impaired speakers, whilst the open format is more sensitive to impairment and differences in mildly affected speakers. The issue is neatly illustrated by pre and post therapy intelligibility assessments we conducted with people not associated with this study. Both John and Jean had dysarthria following stroke. John was severely affected. Therapy subjectively appeared to have made some progress. Yet a colleague who was not familiar with him, scoring the test from audiorecordings using an open format, found no progress. However, when matched listeners repeated the task with the target and foils before them, clear improvement from pre to post therapy was shown. Jean demonstrated the opposite. She had a relatively mild intelligibility problem, which nevertheless disturbed her greatly. For that reason we had worked hard on speech. Closed format scoring revealed little benefit from our collective efforts; open format scoring showed a definite step forward from pre to post therapy.

Clinical practice
So, what's the score for intelligibility testing? As regards clinical practice we suggest the following:
1. Scores across closed and open formats are not equivalent. Thus, to compare across speakers, or in one speaker over time, one must use the same format. Using open scoring on one occasion and closed on another will not deliver a valid measure of change.
2. The strengths of the two approaches lie in different directions. Where a speaker is severely affected, closed format scoring will provide the more sensitive measure. Conversely, where intelligibility is relatively mildly impaired, the more sensitive measure will be open format scoring. When it comes to associations with overall speech naturalness, the indication is that open format scores correspond more closely. This may be a function of our using speakers with relatively unimpaired intelligibility; we still have to confirm whether it holds for severely affected speakers.
3. The study also confirms that it is important to control for who does the scoring of the tests. 'Everyday' listeners may rate differently from clinicians; people used to hearing dysarthric speech rate differently from those unfamiliar with it. Hence experience, age and familiarity need to be carefully controlled across time to establish reliable readings of change in intelligibility.

Jennifer Vigouroux is a speech and language therapist at Newcastle General Hospital and Dr Nick Miller a senior lecturer based in Speech and Language Sciences, George VI Building, University of Newcastle, Newcastle-upon-Tyne NE1 7RU, tel. 0191 222 5603, e-mail




This research was supported in part through a grant from the Parkinson’s Disease Society, GB. Grateful thanks also to the speakers and listeners who volunteered to be part of this study.

One stop shop
OATS (Open Source Assistive Technology Software) is the first free online 'one stop shop' of open source software that enables people with disabilities to access computers. Developed by a consortium headed by the ACE Centre, the resource also provides a forum for developers to interact with users and even customise software in response to individual requests, which can then, in the 'open source' spirit, be made available to others.

Contact a Family
Contact a Family has produced several new / updated items including:
• A checklist of the most common benefits and other help which may be available to the parents of a disabled child
• Reaching out to Disabled Parents
• Reaching out to Fathers
• Reaching out to Black and Minority Ethnic Parents
• Health Professionals Support Pack
• 'Concerned about your child?', to encourage parents who are worried about their child's development to seek medical advice.

Dagenais, P., Watts, C. & Garcia, J. (1998) 'Acceptability and intelligibility of mildly impaired dysarthric speech by different listeners', in Cannito, M., Yorkston, K. & Beukelman, D. (eds.) Neuromotor Speech Disorders: Nature, Assessment and Management. Baltimore: Brookes, pp. 229-240.
Dagenais, P., Watts, C., Turnage, L. & Kennedy, M. (1999) 'Intelligibility and acceptability of moderately dysarthric speech by three types of listeners', Journal of Medical Speech-Language Pathology, 2, pp. 91-96.
Garcia, J. & Hayden, M. (1999) 'Young and older listener understanding of a person with severe dysarthria', Journal of Medical Speech-Language Pathology, 7, pp. 109-112.
Kent, R., Weismer, G., Kent, J. & Rosenbek, J. (1989) 'Toward phonetic intelligibility testing in dysarthria', Journal of Speech and Hearing Disorders, 54, pp. 482-499.
Kent, R., Miolo, G. & Bloedel, S. (1994) 'Intelligibility of children's speech: a review of evaluation procedures', American Journal of Speech-Language Pathology (May), pp. 81-93.
Tjaden, K. & Liss, J. (1995) 'The role of listener familiarity in the perception of dysarthric speech', Clinical Linguistics and Phonetics, 9, pp. 139-154.
Weismer, G. & Martin, R. (1992) 'Acoustic and perceptual approaches to the study of intelligibility', in Kent, R.D. (ed.) Intelligibility in Speech Disorders. Philadelphia: John Benjamins, pp. 67-118.
Wilcox, K. & Morris, S. (1999) Children's Speech Intelligibility Measure. San Antonio: Psychological Corporation.
Yorkston, K. & Beukelman, D. (1978) 'A comparison of techniques for measuring intelligibility of dysarthric speech', Journal of Communication Disorders, 11, pp. 499-512.
Yorkston, K. & Beukelman, D. (1980) 'A clinician-judged technique for quantifying dysarthric speech based on single-word intelligibility', Journal of Communication Disorders, 13, pp. 15-31.
Yorkston, K. & Beukelman, D. (1981) Assessment of Intelligibility of Dysarthric Speech. Tigard: CC Publications.

The Scottish Intercollegiate Guidelines Network (SIGN) will be publishing guidelines on head and neck cancer and autism spectrum disorders this year. For updates on progress see www.sign.

A new leaflet from the UK’s 13 health and social care regulatory bodies including the Health Professions Council (HPC) lists which body is responsible for monitoring each profession and gives their contact details. It also explains what regulation means. “Who regulates health and social care professionals?” from (available in large print and 12 languages)

Altered Images: Becoming parents of our disabled children.
A parent-led self-help group in North Yorkshire has produced a collection of writing about being and becoming parents of disabled children. £12.99, e-mail

The first version of React2 software for people with aphasia or other language disorders will be available from October. Purchasers will get an automatic upgrade to version 2 in early 2007, which will have over 9000 exercises. New features include randomised exercises, full screen presentation and increased settings controls. There are five modules: auditory processing, visual processing, semantics, memory / sequencing, and life skills. Single user prices vary from £50-£95 per module to around £375 for a full clinical user. Upgrade costs for users of React and multiple user licences are also available.

Waving not drowning
Working Families, which campaigns for a better balance between home and work for all families, has produced a new guide for working parents of disabled children based on interviews with parents who already successfully combine working and caring. The organisation's 'Children with Disabilities' project helps parents of disabled children who are working or would like to work, and has a parents' newsletter, 'Waving not drowning'. Leaflets aimed at helping social workers and health visitors understand the needs of working carers may also be of interest to speech and language therapists. 'Make it work for you' is £5.50 (post free) for parents and £15.50 for others.

Photosymbols 2
This CD of photographic images for making easy-to-read information has been redesigned to include over 800 new images, most suggested by version 1 users. The models reflect a more diverse range of people and over 2000 images are included. There have also been technical improvements which will make the software easier to access and install than before, and all images will be supplied at a resolution suitable for commercial print jobs.

Food Talks
Scope Early Years has developed a pack to promote inclusion for children with eating difficulties. Training has also been developed to accompany the pack. For more details contact Scope Early Years Coordinator Jackie Logue on 01933 625284. Food Talks, £15 inc. p&p, tel. Maria Linehan on 01233 840764.


The hands-free WhisperPhone aims to help people focus on and hear their own speech sounds louder and more clearly. The manufacturer reports positive feedback from speech-language pathologists relating to users with a variety of communication needs.

CAMH guide
The British Medical Association’s board of science has produced a report on ‘Child And Adolescent Mental Health – A Guide For Healthcare Professionals’. Download free at Childadolescentmentalhealth