
ARTICLE IN PRESS

Social Science & Medicine 57 (2003) 1289–1306

A systematic and critical review of the process of
translation and adaptation of generic health-related
quality of life measures in Africa, Asia, Eastern Europe,
the Middle East, South America
Annabel Bowden, Julia A. Fox-Rushby*
Health Policy Unit, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK

Abstract

In recent years there has been a worldwide increase in demand for subjective measures of health-related quality of life
(HRQL). Researchers have the choice of whether to develop a new measure or whether to adapt an existing measure in
another language. This review evaluates the processes used in translating and adapting nine generic HRQL instruments
(15D, Dartmouth COOP/WONCA Charts, EuroQol, HUI, NHP, SIP, SF-36, QWB, WHOQOL) for use in Africa,
Asia, Eastern Europe, the Middle East, and South America. The review adopts a universalist model of equivalence,
outlined by Herdman, Fox-Rushby, and Badia (Qual. Life Res. 7 (1998) 323), to judge the 58 papers reviewed. Research
spans 23 countries and is dominated by research in the East Asia and Pacific region and the SF-36. Results are reported
for conceptual, item, semantic, operational, measurement and functional equivalence. It is argued that currently there is
a misguided pre-occupation with scales rather than the concepts being scaled and too much reliance on unsubstantiated
claims of conceptual equivalence. However, researchers using the WHOQOL approach are more likely to establish
reliable conclusions concerning the equivalence of their instrument across countries. It is a key conclusion of this review
that research practice and translation guidelines still need to change to facilitate more effective and less biased
assessments of equivalence of HRQL measures across countries.
© 2003 Elsevier Science Ltd. All rights reserved.

Keywords: Health-related quality of life; Translation; Adaptation; Equivalence; Cross-cultural

Introduction

Demand for subjective measures of health-related quality of life (HRQL) is increasing worldwide. This is
due to several factors, including the rising burden of chronic disease; the desire to measure the impact of
health interventions beyond the absence of disease; the requirements of international organisations to
estimate the impact of a wide variety of interventions to aid resource allocation (Bobadilla & Cowley,
1995); and an increasingly global dissemination of knowledge through academic journals and international
societies.

Most generic measures of HRQL have been developed in English, leaving researchers working in other
languages with two options: to develop a new measure and/or to translate an existing measure. There is
increasing concern about the relevance of translated versions of HRQL instruments (Fox-Rushby & Parker,
1995). Criticisms have focussed on the quality of the translation process (Anderson, Aaronson, Bullinger, &
McBee, 1996) and on cultural differences not being ‘accounted for’ during the translation and adaptation
process (Hunt, 1994). Guidelines for translating HRQL questionnaires have been developed (Guillemin,
Bombardier, & Beaton, 1993; Bullinger et al., 1998; Sartorius

*Corresponding author. Tel.: +44-20-7927-2267; fax: +44-20-7637-5391.
E-mail address: julia.fox-rushby@lshtm.ac.uk (J.A. Fox-Rushby).

0277-9536/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved.
doi:10.1016/S0277-9536(02)00503-8

Table 1
Summary of the Herdman et al. (1998) definitions for each type of equivalence

Equivalence    Definition

Conceptual     Achieved when the questionnaire has the same relationship to the underlying concept in both cultures, primarily in terms of the domains included and the emphasis placed on different domains.
Item           "Item equivalence exists when items estimate the same parameters on the latent trait being measured and when they are equally relevant and acceptable in both cultures" (p. 325).
Semantic       The transfer of meaning across languages, achieving a "similar effect" on respondents who speak different languages.
Operational    "the possibility of using a similar questionnaire format, instructions, mode of administration and measurement methods" (p. 329).
Measurement    The psychometric properties of the adapted version of the HRQL measures are equivalent.
Functional     "...the extent to which an instrument does what it is supposed to do equally well in two or more cultures" (p. 331). This is demonstrated by being able to state how the "underlying trait" is defined or conceptualised, how well the instrument design reflects that underlying trait, and how the results compare across cultures. This is assessed by examining the degree to which the other five types of equivalence summarised above have been achieved.

& Kuyken, 1994) and the meaning of ‘equivalence’ debated, with confusion shown to abound (Herdman,
Fox-Rushby, & Badia, 1997, p. 234). Notions of equivalence depend on the viewpoint taken (absolutist,
universalist, relativist) and a model has been developed to examine equivalence between source and target
language versions of an HRQL measure from a universalist perspective (Herdman et al., 1998). This
model defined six types of equivalence (see Table 1), proposed strategies for their evaluation and suggested
an order in which testing should take place. However, this model has not yet been operationalised.

This paper reports the results of a systematic and critical review of the literature to establish how nine¹
generic measures of HRQL have been translated and applied amongst populations in Africa, Asia, Eastern
Europe, the Middle East, and South America based on an operationalisation of the Herdman et al. (1998)
model of equivalence. A fuller technical report is available from the corresponding author.

¹ The nine measures are: 15D; Dartmouth COOP Charts; EuroQol (EQ5D); Health Utilities Index (HUI);
Nottingham Health Profile (NHP); Short-Form 36 (SF-36); Sickness Impact Profile (SIP); Quality of
Well-Being Index (QWB); and the WHOQOL.

Methods

Literature search and selection

The first stage of the literature search took place in mid-1999 using Medline, Embase, Psychological
Abstracts, Social Science and Psychological Abstracts and ADIS. Search terms focussed on generic HRQL
instruments (n = 9) and developers (n = 17) (see Appendix A) with no restrictions on language, country or
time period. This search generated a total of 1347 papers. All titles and available abstracts were
downloaded and reviewed to select full papers that used or adapted at least part of one of the nine generic
HRQL (15D, Dartmouth COOP/WONCA Charts, EuroQol, HUI, NHP, SIP, SF-36, QWB, WHOQOL) measures in the
geographical areas of interest.

Of the 62 papers identified, 24 were later rejected for review for the following reasons: paper was
unavailable despite considerable attempts to retrieve it (n = 9); not in country/region of interest (n = 9);
added no new information relative to other papers (n = 3); reported on an HRQL measure not selected for
this review (n = 1); and only a published abstract was available (n = 2). With the remaining 38 papers, a
second search involved: an iterative search of the references from the 38, which identified a further eight
papers; adding three papers known to the authors; and contacting the developers of each measure, which
identified an additional six papers. Finally, in March 2000, a web-page search of the nine generic HRQL
measures identified three further papers².

² The corresponding web pages visited are as follows. 15D: http://www.uku.fi/~sintonen/; Dartmouth COOP
Charts: http://home.fnxnet.com/FNX/home and http://www.dartmouth.edu/~coopproj/publications.html;
EuroQol: http://www.eur.nl/bmg/imta/eq-net/eq5d/translat.htm, (now moved to http://www.euroqol.org/);
Health Utilities Index: http://www.fhs.mcmaster.ca/hug/index.htm; Nottingham Health Profile:
http://www.atsqol.org/nott.html; Quality of Well-Being Scale: http://www/outcomes-trust.org/catalog/qwb.htm
and http://www.atsqol.org/qwb.html; Short Form 36: http://www.sf-36.com,
http://www.outcomes-trust.org/catalog/SF-36.htm and http://www.atsqol.org/SF-36.html; Sickness Impact
Profile: http://www.outcomes-trust.org/catalog/sip.htm, http://www.qlmed.org/SIP/bulletin.html, and
http://www.atsqol.org/sick.html; and WHOQOL: http://www.who.int/msa/mnh/mhp/ql.htm.

Of the 58 papers selected for full review

(see Appendix B), ten were translated into English (six from Japanese, two Polish, one Arabic, and one
Spanish).

Developing the review criteria

The review criteria were developed based on a model developed by Herdman et al. (1998) although we are
aware that our experiences developing an HRQL measure in Kenya (Bowden, Fox-Rushby, Nyandieka, &
Wanjau, 2002; Fox-Rushby, 2000 for the KENQOL Group) and the EQ5D (Fox-Rushby & Selai, in press;
Herdman, Fox-Rushby, Rabin, Badia, & Selai, in press; Rabin, Herdman, Fox-Rushby, & Badia, in press)
influenced our thinking. Each paper included was systematically reviewed using a standard set of questions
based on the Herdman et al. model. Early piloting of the questions by both authors on four papers led to
further discussion and revision. The revised questions were piloted again and were also critically reviewed
by two external reviewers.³

The questions used to review each paper covered the following: background details (such as location of
research, language used and funders); methodological aspects (such as characteristics of the samples,
purpose of paper, other instruments used, and translation processes); and the extent to which a paper
covered methods suggested by Herdman et al. for each part of the universalist model of equivalence (see
Appendix C). For each of the six types of equivalence we also categorised each paper into one of three
categories, depending on the level of reporting (none/minimal; partial; and extensive) and whether specific
issues had been covered. Details of the exact criteria are provided in Table 2.

³ Michael Herdman (Catalan Agency for Health Technology Assessment and Research) and Maria Watson
(GlaxoSmithKline).

Results

Background

All papers were published between 1990 and 1999, despite searches being conducted to identify papers prior
to this date. The SF-36 dominates the literature (40%) followed by the WHOQOL (19%), Dartmouth COOP
Charts (14%), EQ5D (9%), NHP (7%), SIP (5%), 15-D (3%), and HUI (3%). No papers were identified for the
QWB, and it is therefore excluded from further analysis.

The papers spanned 23 countries, falling into six of the nine World Bank regions.⁴ There is a fairly even
spread across these six regions, except for the East Asia and Pacific region where there are three times as
many papers (37 of the 58 papers reviewed). Noticeably, 30 of the 37 papers for the East Asia and Pacific
region reported research carried out in Japan. All eight HRQL measures have been used in the East Asia and
Pacific Region, whereas four or fewer measures had been used in other regions (see Table 3). Research has
been conducted in 20 different languages with the dominant language being Japanese (30 papers). Eight
languages⁵ were represented in six to 11 papers each, and the remaining 11⁶ languages in five or fewer
papers each.

Two papers described research with people suffering from communicable disease (HIV and Hepatitis),
whereas 19 papers focused on people suffering with non-communicable disease (e.g. cancer, and depression).
Only one paper reported on injuries (Wrzesniewski, 1997). The majority of research (n = 38) was non-disease
specific.

⁴ Americas: Argentina, Brazil, Colombia, Mexico, Panama. East and Southern Africa: South Africa, Tanzania,
Zimbabwe. East Asia and Pacific: China, Hong Kong, Japan, Singapore, Thailand. Eastern Europe: Croatia,
Russian Federation, Poland. Middle East: Israel, Jordan, Saudi Arabia. South Asia: India, Nepal, Pakistan,
Sri Lanka.
⁵ Hebrew (n = 11), Spanish (n = 8), Croatian (n = 8), Russian (n = 7), Hindi (n = 7), Cantonese (n = 7),
Tamil (n = 6), and Shona (n = 6).
⁶ Arabic (n = 3), Kiswahili (n = 3), Thai (n = 3), Polish (n = 2), Portuguese (n = 2), Afrikaans (n = 1),
Malay (n = 1), Nepali (n = 1), Putonghua (n = 1), and Xhosa (n = 1).

Conceptual equivalence

Table 4 reveals that 65.5% of papers did not report or only made a minimal investigation of conceptual
equivalence. A partial report was provided in 17 papers (29.3%), and an extensive report in three papers
(5.2%). All of the extensive reports were by researchers using the WHOQOL (e.g. Szabo, 1996). There was no
research on conceptual equivalence for the 15-D, HUI or SIP. The majority for the East Asia and Pacific
region (67.6%) failed to report on conceptual equivalence.

Nineteen papers (32.8%) provided some form of theoretical argument. Examples include vague arguments that
consider the differences in experiences, understanding, and conceptualisation of health, quality of life, or
aspects of health (e.g. Westbury et al., 1997) and discussion of the conceptual basis of the HRQL
instrument (e.g. Wang & Chen, 1999). Authors using the 15-D and the SIP did not present any theoretical
arguments accepting or rejecting conceptual equivalence. Research in the Americas and the East Asia and
Pacific region was more likely to present a theoretical argument than other regions.

In total only 24.2% of the papers provided any information about the assessment of local conceptions

Table 2
Criteria for deciding whether translation/adaptation process had followed Herdman et al.’s model

Conceptual
  None/minimal adherence: No mention of any issues related to conceptual equivalence, or a cursory mention of "cultural bias" or similar statement.
  Partial adherence: At least a mention of one or more of the following: (1) an assessment of the local population’s conceptualisation of health or quality of life, or (2) an assessment of the appropriateness of the measure in the target setting or (3) theoretical arguments questioning or accepting conceptual equivalence. Can also include any of the other points listed under conceptual equivalence.
  Extensive adherence: Explicit details about at least two of the three issues listed under "partial". Can also include any of the other points listed under conceptual equivalence. Should also include information on each of the other points listed under conceptual equivalence.

Item
  None/minimal adherence: No mention of any issues related to item equivalence, or a brief mention in the text relating to any of the issues relating to item equivalence.
  Partial adherence: Description of the assessment of either (1) life style patterns in the target population, or (2) the relevance or acceptability of individual items to the target population, or (3) quantitative assessment of item equivalence. Can also include details about judgements and changes made.
  Extensive adherence: Explicit details about at least two of the three issues listed under "partial". Should also include details about judgements and changes made.

Semantic
  None/minimal adherence: No mention of any issues related to semantic equivalence, or a brief mention in the text about who was involved in the translation of the HRQL measure.
  Partial adherence: Description of at least one of the key issues related to semantic equivalence: (1) an assessment of types of meaning, (2) an assessment of key words and phrases, (3) who was involved in translation of the measure, (4) who was involved in judging the quality of the translations, (5) a description of any problems encountered.
  Extensive adherence: Extensive details about at least 3 of the key items listed under "partial". Should also include information about the other issues listed under semantic equivalence.

Operational
  None/minimal adherence: No mention of any issues related to operational equivalence.
  Partial adherence: A description of at least one or two of the key issues listed under operational equivalence: (1) an assessment of response options, (2) an assessment of missing data, (3) discussion on time frames, (4) a discussion on instructions and format. Can also include other issues listed under operational equivalence.
  Extensive adherence: An extensive description of three or more of the key issues listed under "partial". Should also include other issues listed under operational equivalence.

Measurement
  None/minimal adherence: No mention of any issues relating to measurement equivalence.
  Partial adherence: A description of one or two of the following: (1) reliability, (2) validity, (3) sensitivity, scoring norms, or effect size, (4) association and nature of relationship between socio-economic and demographics, clinical characteristics and HRQL scores, (5) item weighting. Can also include other issues listed under measurement equivalence.
  Extensive adherence: A detailed description of at least three of the five issues listed under "partial". Should also include other issues listed under measurement equivalence.

Functional
  Papers not assessed for reporting on functional equivalence.
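
As a rough illustration of how the Table 2 thresholds partition papers into the three reporting levels, the sketch below encodes them in Python. It is a simplification and not the reviewers' actual procedure: the function name `classify`, the `CRITERIA` dictionary, and the idea of reducing each paper to two counts (key issues at least described, and key issues described in explicit detail) are assumptions introduced here. The qualitative judgements in the criteria, such as a cursory mention of "cultural bias", are not modelled.

```python
# Hypothetical encoding of the Table 2 thresholds; all names are illustrative.
# For each type of equivalence: the number of key issues listed under
# "partial", and how many must be covered in detail to count as "extensive".
CRITERIA = {
    "conceptual": {"key_issues": 3, "extensive_min": 2},
    "item": {"key_issues": 3, "extensive_min": 2},
    "semantic": {"key_issues": 5, "extensive_min": 3},
    "operational": {"key_issues": 4, "extensive_min": 3},
    "measurement": {"key_issues": 5, "extensive_min": 3},
}


def classify(equivalence: str, described: int, detailed: int) -> str:
    """Return 'none/minimal', 'partial' or 'extensive' for one paper.

    described: key issues the paper at least describes.
    detailed:  key issues the paper covers in explicit detail (<= described).
    """
    if equivalence not in CRITERIA:
        # Functional equivalence was not assessed for level of reporting.
        raise ValueError(f"no reporting criteria for {equivalence!r}")
    rule = CRITERIA[equivalence]
    if not 0 <= detailed <= described <= rule["key_issues"]:
        raise ValueError("need 0 <= detailed <= described <= number of key issues")
    if described == 0:
        return "none/minimal"
    if detailed >= rule["extensive_min"]:
        return "extensive"
    return "partial"


print(classify("semantic", described=2, detailed=0))
print(classify("conceptual", described=3, detailed=2))
print(classify("operational", described=0, detailed=0))
```

Under these assumptions a paper that describes two semantic issues without detail is classed as partial, while one giving explicit detail on two of the three conceptual issues is classed as extensive.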

of health and quality of life. Research with the SIP, NHP, and the WHOQOL was the most consistent in
providing this information. Approaches adopted included discussion among researchers (e.g. Lam, Van
Weel, & Lauder, 1994) and involvement of local people (n = 6) (e.g. WHOQOL Group, 1995). Users of the

Table 3
Number of references reporting research in each World Bank regionᵃ for each HRQL instrument

                  East and     East Asia    South   Eastern Europe    Middle  Americas  Total number
                  Southern     and Pacific  Asia    and Central Asia  East              of regions
                  Africa

WHOQOL (n = 11)   6            7            7       8                 6       5         6
SF-36 (n = 23)    4            12           —       2                 4       1         5
COOP (n = 8)      —            6            2       —                 3       1         4
NHP (n = 4)       —            2            —       1                 1       —         3
SIP (n = 3)       —            2            1       —                 —       1         3
HUI (n = 2)       —            1            —       —                 —       1         2
15D (n = 2)       —            2            —       —                 —       —         1
EQ (n = 5)        —            5            —       —                 —       —         1

ᵃ References can fall into more than one region in instances where research was conducted in more than one World Bank region. No
papers were identified for West Africa and North Africa regions.
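
The instrument counts in Table 3 can be used to reproduce the shares of the literature quoted in the Background section (for example 40% for the SF-36) and the "total number of regions" column. The short Python sketch below does this. The dictionary layout and variable names are illustrative assumptions, a dash in the table is entered as zero, and the figures are copied from Table 3.

```python
# Rows of Table 3: papers per instrument (n) and references per region, in the
# table's column order. A dash in the table is entered as 0 (an assumption).
REGIONS = ["E and S Africa", "E Asia and Pacific", "S Asia",
           "E Europe and C Asia", "Middle East", "Americas"]
TABLE_3 = {
    "WHOQOL": (11, [6, 7, 7, 8, 6, 5]),
    "SF-36": (23, [4, 12, 0, 2, 4, 1]),
    "COOP": (8, [0, 6, 2, 0, 3, 1]),
    "NHP": (4, [0, 2, 0, 1, 1, 0]),
    "SIP": (3, [0, 2, 1, 0, 0, 1]),
    "HUI": (2, [0, 1, 0, 0, 0, 1]),
    "15D": (2, [0, 2, 0, 0, 0, 0]),
    "EQ5D": (5, [0, 5, 0, 0, 0, 0]),
}

# Every row must supply one count per region.
assert all(len(counts) == len(REGIONS) for _, counts in TABLE_3.values())

total_papers = sum(n for n, _ in TABLE_3.values())

# Share of the reviewed literature per instrument, as a whole-number percentage.
shares = {name: round(100 * n / total_papers) for name, (n, _) in TABLE_3.items()}

# Number of regions in which each instrument has at least one reference.
regions_used = {name: sum(1 for c in counts if c > 0)
                for name, (_, counts) in TABLE_3.items()}

print("papers:", total_papers)
for name in TABLE_3:
    print(name, f"{shares[name]}%", "regions:", regions_used[name])
```

With these inputs the instrument counts sum to the 58 reviewed papers, and the rounded shares match the percentages given in the Background section.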

WHOQOL adopted the widest range of methods for assessing the local population’s conceptualisation of
health or quality of life, and are most active in making this assessment.

Each paper was reviewed to establish what kinds of people were asked to judge the appropriateness of the
instrument in the target setting. Forty-nine papers (84.5%) gave no information at all. Some studies used
the author’s judgement or that of the original developers of the instrument (n = 2), others (n = 8) used
members of the general population, health care users, and health care workers.

Each paper was reviewed for conclusions, judgements and justifications made by researchers concerning
conceptual equivalence. Only 10 papers provided this information. The outcomes given were grouped into five
categories:

1. No changes were made and the conceptual basis of the HRQL measure was accepted (n = 3) (e.g. Mitchell,
   Nahas, Shukri, & Al-ma’aitah, 1995).
2. The conceptual basis of the measure was questioned but no changes were made to the measure (n = 2)
   (e.g. Shmueli, 1998).
3. The conceptual bases of measures were questioned and changes were made to the measure as a result
   (n = 5) (e.g. Saxena, Chandiramani, & Bhargava, 1998).
4. Implications of the decision made were discussed (n = 1) (Westbury et al., 1997).
5. Recommendations for further research provided in detail (n = 1) (Lam, Gandek, Ren, & Chan, 1998).

None of the papers reviewed considered that the instrument might or would not work in the target culture
as a result of conceptual differences. All but one of the 10 papers provided justification for the outcome,
which was based on qualitative (e.g. focus group discussions) or quantitative evidence (e.g. analysis of
norms).

Item equivalence

Thirty-one papers (53.4%) did not report, or only reported a minimal investigation into item equivalence.⁷
A partial report was provided in 23 papers (39.7%), and an extensive report in four papers (6.9%). The
extensive reporting can be attributed equally to the WHOQOL (e.g. Szabo, 1996) and the SF-36 (e.g. Lam et
al., 1998). By region most reporting occurred in South Asia and least in Eastern Europe/Central Asia.

⁷ Herdman et al. (1998) referred to quantitative analysis to assess item equivalence. However, in this
review, any quantitative analysis of item equivalence will be discussed within measurement equivalence. The
rationale and implications of this decision are considered in the discussion section.

Only six papers addressed the issue of whether life style patterns differed in source and target countries:
SF-36 (n = 3), SIP (n = 2), Dartmouth COOP Charts (n = 1). Examples include:

* Discussion in the text (n = 3) (e.g. Brena, Sanders, & Motoyama, 1990).
* Reporting a discussion amongst the researchers (n = 2) (e.g. Landgraf & Nelson, 1992).
* Quantitative evidence (n = 2) (Fukuhara, Ware, Kosinski, Wada, & Gandek, 1998a).

In all cases there was an emphasis on the target culture rather than a comparison of target and source
cultures. In general, the papers did not reflect on implications of the evidence presented.

Items in the questionnaire must be both relevant and acceptable to the target population. In this context
there was considerable variation by HRQL measure, but the
Table 4
Results of systematic review for each equivalence (%)

Columns, left to right. By HRQL instrument: 15D (n = 2), COOP (n = 8), EQ5D (n = 5), HUI (n = 2), NHP (n = 4), SF-36 (n = 23), SIP (n = 3), WHOQOL (n = 11). By World Bank region: E and S Africa (n = 10), E Asia and Pacific (n = 36), S Asia (n = 10), E Europe and C Asia (n = 11), Middle East (n = 14), Americas (n = 10).

Conceptual equivalence
None 100 75.0 80.0 100 50.0 69.6 100 27.3 50.0 67.6 30.0 40.0 57.1 33.3
Partial 0 25.0 20.0 0 50.0 30.4 0 45.5 20.0 24.3 40.0 30.0 14.3 33.3
Extensive 0 0 0 0 0 0 0 27.3 30.0 8.1 30.0 30.0 28.6 33.3

Assessment of local conceptions of health and QoL (n = 15) 0 37.5 0 0 50.0 8.7 66.7 54.5 40.0 25.0 70.0 54.5 42.9 60.0
Appropriateness of HRQL measure (n = 9) 0 37.5 20.0 0 25.0 4.3 0 18.2 20.0 22.2 30.0 9.1 7.1 20.0
Author’s location (n = 56) 100 75.0 100 100 75.0 91.3 100 100 100 89.1 80.0 90.9 100 88.9
Theoretical arguments (n = 19) 0 50.0 20.0 100 75.0 30.4 0 18.2 20.0 36.1 20.0 9.1 14.3 30.0
Outcome, judgements and justifications (n = 10) 0 12.5 20.0 0 50.0 8.7 0 36.4 30.0 19.4 50.0 18.2 28.9 30.0

Item equivalence
None 50.0 62.5 100 50.0 75.0 47.8 66.7 20.0 30.0 48.6 20.0 54.5 50.0 22.2
Partial 50.0 37.5 0 50.0 25.0 43.5 33.3 60.0 50.0 40.5 60.0 27.3 35.7 55.6
Extensive 0 0 0 0 0 8.7 0 20.0 20.0 10.8 20.0 18.2 14.3 22.7

Life style patterns (n = 6) 0 12.5 0 0 0 13.0 66.7 0 0 16.7 10.0 0 0 20.0
Relevance and acceptability of items (n = 23) 50.0 37.5 40.0 50.0 0 30.4 0 72.7 60.0 44.4 70.0 45.5 50.0 60.0
Outcomes of any quantitative or qualitative analysis 0 12.5 0 0 0 0 0 0 0 4.3 10.0 0 0 10.0
Judgements about item equivalence 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Identifying non-equivalent items (n = 11) 0 12.5 0 0 0 34.8 0 18.2 40.0 22.2 30.0 18.2 21.4 20.0

Semantic equivalence
None 100 62.5 60.0 50.0 50.0 60.9 100 81.8 70.0 67.6 60.0 70.0 64.3 44.4
Partial 0 37.5 20.0 50.0 50.0 26.1 0 18.2 30.0 21.6 40.0 30.0 35.7 55.6
Extensive 0 0 20.0 0 0 13.0 0 0 0 10.8 0 0 0 0

Types of meaning (n = 13) 0 25.0 20.0 50.0 75.0 26.1 0 9.1 20.0 22.2 20.0 27.3 21.4 30.0
Original questionnaire meaning (n = 5) 0 12.5 20.0 50.0 0 13.0 0 9.1 10.0 16.7 10.0 9.1 7.1 20.0
Semantic re-writes 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Contact with developers (n = 20) 0 12.5 20.0 100 0 34.8 0 72.7 70.0 27.8 10.0 0 0 20.0
Translation guidelines (n = 14) 0 25.0 20.0 50.0 0 34.8 0 18.2 20.0 25.0 30.0 18.2 14.3 30.0
Meaning of key words and phrases (n = 4) 0 0 20.0 0 25.0 4.3 0 9.1 10.0 5.6 0 9.1 0 0
Lexical relationships 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Translators (n = 19) 0 50.0 40.0 50.0 50.0 39.1 0 9.1 20.0 32.4 30.0 18.2 35.7 44.4
Judging the translation quality (n = 17) 0 0 40.0 100 50.0 43.5 0 9.1 20.0 24.3 10.0 27.3 28.6 33.3
Translation problems and difficulties 0 0 20.0 0 25.0 26.1 0 0 10.0 13.9 0 0 14.3 10.0

Operational equivalence
None 50.0 62.5 60.0 50.0 100 69.6 33.3 45.5 60.0 51.4 40.0 70.0 71.4 33.3
Partial 50.0 37.5 40.0 50.0 0 30.4 66.7 36.4 20.0 43.2 40.0 10.0 14.3 44.4
Extensive 0 0 0 0 0 0 0 18.2 20.0 8.7 40.0 20.0 14.3 22.2

Questionnaire format and instructions (n = 6) 0 12.5 0 0 0 2.8 66.7 27.3 10.0 50.0 0 0 0 0
Literacy rates (n = 1) 0 12.5 0 0 0 0 0 0 0 2.8 0 0 0 0
Addressing respondents (n = 1) 0 12.5 0 0 0 0 0 0 0 2.8 0 0 0 0
Missing data (n = 1) 0 0 0 0 0 2.8 0 0 0 2.8 0 0 0 0
Response options (n = 14) 50.0 12.5 20.0 0 0 13.9 0 54.5 30.0 25.0 10.0 0 14.3 0
Literature reviews (n = 1) 0 0 0 0 0 0 0 9.1 10.0 0 0 0 0 0
Time frames (n = 8) 0 25.0 0 0 0 8.7 0 36.4 30.0 7.7 10.0 9.1 0 0
Pre-testing (n = 4) 0 12.5 0 50.0 0 17.4 0 18.2 30.0 7.7 0 0 14.3 10.0
Comparative response bias (n = 2) 0 12.5 20.0 0 0 0 0 0 0 7.7 0 0 0 0

Measurement equivalence
None 0 37.5 40.0 50.0 25.0 21.7 66.7 54.5 50.0 37.8 50.0 60.0 57.1 66.7
Partial 100 37.5 40.0 0 50.0 30.4 33.3 18.2 20.0 29.7 20.0 20.0 21.4 11.1
Extensive 0 25.0 20.0 50.0 25.0 47.8 0 27.3 30.0 32.4 30.0 20.0 21.4 22.2

Reliability testing (n = 24) 100 12.5 0 0 25.0 65.2 0 45.5 30.0 33.3 40.0 45.5 35.7 20.0
Validity testing (n = 25) 100 25.0 20.0 100 25.0 60.9 33.3 18.2 40.0 47.2 40.0 18.2 14.3 20.0
Relationship between HRQL and respondent characteristics (n = 30) 0 62.5 60.0 50.0 75.0 56.5 33.3 36.4 60.0 47.2 20.0 27.3 50.0 30.0
Item weighting (n = 6) 100 0 40.0 0 25.0 4.3 0 0 0 13.9 0 0 1 0

Functional equivalence Papers not assessed for reporting on functional equivalence.
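
The overall percentages quoted alongside Table 4 are proportions of the 58 reviewed papers. As a cross-check, the Python sketch below rebuilds that total from the selection counts given under "Literature search and selection" and recomputes the overall reporting levels quoted in the text. The variable names are illustrative. The none/minimal count for conceptual equivalence is not stated in the text; it is inferred here as the total minus the partial and extensive counts.

```python
# Paper selection, using the counts reported in the literature search.
identified = 62
rejected = 9 + 9 + 3 + 1 + 2   # the five stated reasons for rejection
first_stage = identified - rejected
added = 8 + 3 + 6 + 3          # references, known to authors, developers, web pages
reviewed = first_stage + added

# Papers per reporting level (none/minimal, partial, extensive) as quoted in
# the text. The conceptual none/minimal count is inferred, not quoted.
levels = {
    "conceptual": (reviewed - 17 - 3, 17, 3),
    "item": (31, 23, 4),
    "semantic": (39, 15, 4),
    "operational": (36, 20, 2),
}


def percentages(counts, total):
    """Convert paper counts to percentages of total, rounded to one decimal."""
    return tuple(round(100 * c / total, 1) for c in counts)


print(rejected, first_stage, reviewed)
for name, counts in levels.items():
    # Each reviewed paper falls into exactly one reporting level.
    assert sum(counts) == reviewed
    print(name, percentages(counts, reviewed))
```

Under these assumptions the conceptual equivalence counts give 65.5%, 29.3% and 5.2%, which agrees with the figures reported in the text.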


WHOQOL stands out because 73% of papers made an assessment using at least one of the three approaches
described below:

* Consultation with experts (n = 11): e.g. physicians (Shigemoto, 1990), translators (Fukuhara, Bito,
  Green, & Kurokawa, 1998b) and public health nurses (Hisashige, Mikasa, & Katayama, 1998).
* Post hoc analysis of the HRQL questionnaire results (n = 9): e.g. analysis of the time taken to complete
  the questionnaire (Watanabe, 1995).
* Qualitative research techniques with ‘non-experts’ (n = 6) (Lam, Van Weel, & Lauder, 1994).

Very few papers provided details about any outcomes or judgements⁸ as a result of identifying
non-equivalent items. Eleven papers highlighted items that were not considered equivalent. Eight of these
papers were for the SF-36 and the remaining papers utilised the Dartmouth COOP Charts and the WHOQOL. Only
two of the papers attended to this issue in any detail: Lam et al. (1998) and Skevington, Bradshaw, and
Saxena (1999). In general, there is limited information about how items were dealt with once they were
identified as not equivalent.

⁸ Those judgements about item equivalence based on assessment of the items post-translation are covered in
the semantic, operational or measurement equivalence sections.

Semantic equivalence

Thirty-nine papers (67.2%) did not report or gave a minimal report of investigations into semantic
equivalence. A partial report was provided in 15 papers (25.9%), and an extensive report in four papers
(6.9%). Both the 15-D and the SIP failed to report on semantic equivalence. Users of the EQ5D (Japanese
EurQol Translation Team, 1998) and the SF-36 (e.g. Fukuhara et al., 1998b) were the only groups to provide
an extensive report. By region, the only extensive reporting of semantic equivalence assessment has been
completed in East Asia and Pacific regions, and on closer examination, all of this research is within Japan.

Although none of the papers addressed the types of meaning presented in Herdman et al. (1998), 13 papers
(22.4%) did consider other issues of semantic importance, although none addressed this issue in any detail
and in most cases very general comments were made. For example Mitchell et al. (1995) and Fukuhara et al.
(1998b) mentioned literal meaning and Szecket, Medin, Furlong, Feeny, and Barr (1999) discussed idiomatic
expressions.

Each paper was examined to identify how the authors and translators knew the meaning of the questionnaire
in the source language, and only six papers (10.3%) addressed this issue. In all cases the information was
fairly limited with Wagner et al. (1998) using the SF-36 providing the most thorough assessment of this.
Twenty papers (34.5%) had at least some contact with the original developers who should have had the
potential to inform the researchers about the original questionnaire meaning. The majority of these papers
included at least one of the original developers amongst the authors.

Only one paper described an assessment of key words and phrases in the target language. Wrzesniewski (1997)
mentioned that three people directly connected with the adaptation discussed the translation of every point
in the questionnaire. No paper described an assessment of the lexical relationship⁹ between key words and
phrases.

⁹ The relation of a particular word to other words that have some aspects in common with it.

Nineteen of the papers provided information about who translated the HRQL measure during the adaptation
process and these are listed in Table 5. Seven of the papers used translators from more than one of the
categories listed. Seventeen papers provided information about who made judgements about the translated
instrument (see Table 5), and ten of these used people from more than one of the categories listed. Those
using the EQ5D, HUI, and NHP did not favour a type of person to judge the translation. The SF-36 favoured
the general population and the developers to make the judgement, although several of the SF-36 papers used
professional translators and the authors or researchers. Fourteen papers (24.1%) made reference to
translation guidelines or protocol. There was considerable variation by HRQL measure but less variation by
region. Of these, 11 papers (78.6%) used the guidelines recommended by the developers and three (21.4%)
used other guidelines.

Table 5
People selected to translate HRQL instruments and to judge the translation quality

                                                                        Translator (n =)   Judged translation quality (n =)

Authors or researchers                                                  8                  6
Other professionals                                                     6                  1
Unspecified                                                             6                  4
Health care professionals                                               5                  3
Professional translators (including bilingual and linguistic experts)   4                  5
Members of the general population                                       3                  10
Original developers of the instrument                                   —                  7

Herdman et al. (1998) identified three outcomes of assessing semantic equivalence, where items are either easy, difficult or impossible to translate. However, only eight papers identified any problems during the translation exercise, how any problems were dealt with and whether or not items (questions) were considered easy, difficult or impossible to translate, e.g. Japanese EurQol Translation Team (1998).

Operational equivalence

Thirty-six papers (62.1%) did not report or gave a minimal report of investigation into operational equivalence. A partial report was provided in 20 papers (34.5%), and an extensive report in two papers (3.4%) (WHOQOL Group, 1998a; Szabo, 1996). Only the WHOQOL provides an extensive description of investigation into operational equivalence. Each of the six World Bank regions has a reference that made an extensive assessment, and this can be attributed entirely to researchers using the WHOQOL.

Only Lam et al. (1994) provided information regarding literacy rates. Several papers provided information about the educational level of the respondents. Only Westbury et al. (1997) acknowledged how important it is for researchers to be aware of the customs of addressing people, and appropriate ways of framing questions, by stating that written instructions were given to the patients explaining the research to them.

Seven papers (12.1%) reported on whether or not the same instructions and format were used in the source and target versions of the instrument. Fourteen papers (24.1%) provided information about the assessment of response categories¹⁰ to establish if they were equivalent in the source and target countries. Several of these papers (n = 4) provided very detailed descriptions of how the equivalence of response options was examined, e.g. Tsuchiya (1999).

¹⁰ Response categories can be dichotomous (e.g. yes or no), Likert scales (e.g. excellent, very good, good, fair, poor), points on a visual analogue scale (e.g. EQ-VAS) etc.

Nine of the papers used only qualitative methods to assess the equivalence of response options, three used only quantitative methods, and two papers described how both quantitative and qualitative methods were employed, e.g. Keller, Ware, Gandek, Aaronson, Alonso, Apolone et al. (1998). Only one paper (Szabo, Orley, & Saxena, 1997) consulted literature about appropriate response modes. Qualitative methods included discussions between the authors or researchers (Landgraf & Nelson, 1992) and extensive interviews with people from the general population (Szabo et al., 1997). Quantitative methods included Thurstone scaling exercises (Gandek et al., 1998) and Time Trade Off techniques (Tsuchiya, 1999).

Seven papers (12.1%) provided information about the HRQL instrument’s time frame. For example, four WHOQOL papers described the time frame as the 2 weeks prior to the administration of the questionnaire and two SF-36 papers described how the time frame was changed (see Wagner et al., 1999).

Eight papers (13.8%) described pre-testing activities. The information was very limited in all papers and for four it is not clear whether issues surrounding operational equivalence were covered in the pre-testing. Examples include Fukuhara et al. (1998b), who examined the response choices in pre-testing, and Szecket et al. (1999), who tested the skip-patterns on the questionnaire.

Herdman et al. (1998) identified three possible outcomes of assessing operational equivalence: the same methods can be used; some aspects of operational equivalence need to be different; or it is impossible to achieve operational equivalence. None of the papers made a judgement of this nature, although it is clear from the descriptions above that there are examples of both the first and second outcomes. None of the papers stated that the methods could not be transferred to the target setting.

Measurement equivalence

There is an increased level of reporting of measurement equivalence compared with conceptual, item, semantic, and operational equivalence. All of the eight HRQL measures have some papers that fall into the partial and/or extensive categories. In fact, all of the HRQL measures, except for the WHOQOL and the SIP, have at least 50% of their papers in the partial and/or extensive categories. The SF-36 and the NHP are both noted for reporting most frequently. By region, the combined percentage of papers in the extensive and partial categories ranges from 33% (Americas) to 62.1% (East Asia and Pacific).

Twenty-five papers (43%) reported assessments of reliability. Internal consistency (using Cronbach’s alpha) was the most popular (n = 15). Of these, 80% utilised the SF-36 (e.g., Gandek & Ware, 1998). Test–retest reliability was assessed in eight papers, with five using the SF-36 (e.g., Al Abdulmohsin, Coons, Draugalis, & Hays, 1997). Other forms of reliability testing included equivalent forms using the Pearson correlation coefficient (Coons, Al Abdulmohsin, Draugalis, & Hays, 1998) and item-to-scale correlation (Lewin-Epstein, Sagiv-Schifter, Shabtai, & Schumeli, 1998).

Of the 23 papers (43%) that reported validity tests, most concerned the SF-36. Tests included construct validity (e.g. Thomas, Ruby, Peter, & Cherian, 1995), face validity (e.g. Lam et al., 1994) and convergent validity (Thumboo et al., 1999). It is difficult, in practice, to separate the assessment of scoring norms from the assessment of the relationship between questionnaire scores and socio-economic, demographic, and clinical characteristics, and again from an assessment of known (or extreme) group validity. The analysis of variation in HRQL scores by socio-economic, demographic, and clinical characteristics was by far the most common analysis reported in the studies reviewed. Many studies (n = 10) referred to this as discriminant validity; seven of these involved the SF-36. However, the same analytical approach was used by others to establish population norms (Ikeda & Ikegami, 1999; Mitchell et al., 1995; O’Keefe & Wood, 1996), and many others simply considered this an assessment of the variation in scores by demographic and socio-economic characteristics. The most popular characteristics for analysis were age (n = 12 papers) and gender (n = 10 papers), whereas others described relationships between HRQL scores and clinical characteristics. Examples include analysis for those suffering with major and minor depression (Froom, Aoyama, Hermoni, Mino, & Galambos, 1995; Mino, Aoyama, & Froom, 1994).

The majority of the papers (63.3%) compared findings with studies elsewhere. Several made comparisons with studies from different countries that were part of the same research project, but not with scores from the instrument’s country of origin (e.g., Westbury et al., 1997; Lam et al., 1994; WHOQOL Group, 1998a). However, in most cases, the comparison is with data from studies on populations from the source country (e.g., Tsuchiya et al., 1998; Mitchell et al., 1995; Lam et al., 1998).

Six papers provided information about the weighting of items; four of these detailed the process of obtaining new weightings for the HRQL measure in the target country (Watanabe, 1995; Watanabe et al., 1996; Hisashige et al., 1998; Lewin-Epstein et al., 1998). Others suggested that weights should change but did not use a different weighting scheme (e.g., Mitchell et al., 1992).

Functional equivalence

Functional equivalence is the combined effect of assessing for conceptual, item, semantic, operational, and measurement equivalence. Since functional equivalence was a concept introduced by Herdman et al. (1998), and the majority of papers reviewed in this paper were published prior to this date, it is a little premature to examine reporting on functional equivalence. However, it is possible to provide an indication of whether researchers have enough information to make an assessment of functional equivalence. We selected Japan as a focus because the majority of HRQL research reviewed has taken place there, involving six of the nine HRQL measures. Table 6 shows the HRQL instruments in the first column and lists each equivalence across the top row. The table gives an indication of the extent to which each equivalence has been reported for the Japanese version of each HRQL instrument. The equivalence of the Japanese version of the WHOQOL has been more extensively reported on than that of any other Japanese version of the HRQL measures. It also shows that the Japanese version of the SF-36 has been well documented in terms of equivalence. However, this table reveals the focus on semantic and measurement equivalence issues, and the lack of reporting on conceptual, item, and operational issues.

Discussion

The discussion addresses the current state of translating and adapting generic HRQL measures, followed by a reflection on the value of the Herdman et al. (1998) model, and our attempt to operationalise this model.

How widespread is the “international” field of generic HRQL assessment?

This review has identified an increasing number of adaptation studies outside of North America and Northern Europe. Although the adaptation process is increasingly widespread, it is uneven, with an absence of activity in two of the World Bank regions (West Africa and North Africa) and a preponderance of activity in the East Asia and Pacific region, and specifically in Japan.

Table 6
Extent of equivalence reporting in Japan by HRQL measure

Measure   Concept        Item        Semantic       Operational   Measurement    Functional
15D       None/minimal   Partial     None/minimal   Partial       Partial        N/A
COOP      Partial        Partial     Partial        Partial       Extensive      N/A
EQ5D      Partial        Partial     Extensive      Partial       Extensive      N/A
SF-36     Partial        Extensive   Extensive      Partial       Extensive      N/A
SIP       None/minimal   Partial     None/minimal   Partial       None/minimal   N/A
WHOQOL    Extensive      Extensive   Partial        Extensive     Extensive      N/A
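
The pattern in Table 6 can be checked with a simple tally. The sketch below is illustrative only: it transcribes the table by hand into a Python dictionary and counts, for each type of equivalence, how many of the six Japanese versions fall into each reporting category. The names `TABLE_6`, `EQUIVALENCE_TYPES` and `tally_by_equivalence` are hypothetical and are not part of the review.

```python
from collections import Counter

# Column order used for every row below (the Functional column is N/A
# throughout Table 6 and is therefore omitted from this transcription).
EQUIVALENCE_TYPES = ["Concept", "Item", "Semantic", "Operational", "Measurement"]

# Hand transcription of Table 6: one row per HRQL measure.
TABLE_6 = {
    "15D":    ["None/minimal", "Partial", "None/minimal", "Partial", "Partial"],
    "COOP":   ["Partial", "Partial", "Partial", "Partial", "Extensive"],
    "EQ5D":   ["Partial", "Partial", "Extensive", "Partial", "Extensive"],
    "SF-36":  ["Partial", "Extensive", "Extensive", "Partial", "Extensive"],
    "SIP":    ["None/minimal", "Partial", "None/minimal", "Partial", "None/minimal"],
    "WHOQOL": ["Extensive", "Extensive", "Partial", "Extensive", "Extensive"],
}


def tally_by_equivalence(table):
    """Count reporting categories per equivalence type across all measures."""
    tallies = {}
    for col, name in enumerate(EQUIVALENCE_TYPES):
        tallies[name] = Counter(row[col] for row in table.values())
    return tallies


if __name__ == "__main__":
    for name, counts in tally_by_equivalence(TABLE_6).items():
        print(name, dict(counts))
```

On this transcription, measurement equivalence is reported extensively for four of the six measures, whereas conceptual and operational equivalence are each reported extensively for only one.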

The HRQL instruments reviewed in this paper, except for the WHOQOL, were all developed in countries where non-communicable, as opposed to communicable, diseases are prevalent. The World Health Organization reports¹¹ that 6% of mortality in Europe can be attributed to communicable disease, whereas the figure is 71% in Africa and 39% in South East Asia. However, only 3.4% of the papers reviewed focused on communicable disease and a mere 1.7% on accident and injury. Thus, the current application of HRQL instruments may not accurately reflect the disease profile or benefits of appropriate health interventions. However, whether researchers consider existing measures less relevant for assessing the HRQL of people suffering from communicable diseases is not known. Another possible explanation is that researchers are keen to compare their findings with results elsewhere and so research interests from North America and Europe dominate.

¹¹ http://www3.who.int/whosis/menu.cfm?path=whosis, burden, burden gbd2000, burden gbd2000 subregion & language=english (accessed 3 December 2001).

How well are generic measures of HRQL being assessed for “equivalence” during the translation and adaptation process?

Few studies consider each type of equivalence, as set out in Herdman et al. (1998), and none in any detail. The majority of papers only considered measurement equivalence to any degree. This reveals a pre-occupation in this field with scales rather than concepts. Indeed, it confirms Herdman et al.’s earlier view that research in this field either implicitly or explicitly adopts an “absolutist” conception of health (Herdman et al., 1997), where health is conceived of in the same way across the world and all that differs is the value attached to its component parts.

This review has allowed us to go beyond this broad criticism and show how researchers evaluated the translated questionnaires in detail and in practice. Overall, there was a lack of published evidence for the assessment of conceptual equivalence. It was disappointing to see quantitative analysis as the preferred approach when qualitative procedures would certainly prove to be equally, if not more, rewarding. It is notable that developers and those translating and adapting instruments rarely draw on theoretical positionings in this research (Albrecht & Fitzpatrick, 1994) or question the nature of their own beliefs, both of which affect interpretations of whether instruments in source and target languages are considered conceptually equivalent.

The majority of papers also failed to report, either partially or extensively, on item equivalence. Semantic equivalence was outlined in considerable detail by Herdman et al. (1998), but the majority of papers failed to report partially or extensively on this area. There appears to be increasing contact with the original developers of instruments, and it is not unusual to find one among the authors of papers using translated versions of an instrument. The advantage of this is that translators are more likely to understand the aims and intended meanings of the source instrument. The disadvantages could include the unwillingness of original developers to “allow” changes to be made, and the need for greater compromise by translators as developers are able to “withhold” official “approval” of the newly translated instrument without necessarily understanding anything about the target culture.

In terms of operational equivalence, there is a focus on equivalence of response options, with an emphasis on the placing of words on scales. In general, researchers using the SF-36 have frequently examined validity and reliability, although good practice is not restricted to research using the SF-36.

It was surprising to see the ease with which those translating/adapting HRQL measures accept the target measure with unjustified assertions about cultural applicability. Too much emphasis is being placed on establishing the psychometric properties of an adapted instrument. Researchers are generally too happy to accept confirmation of validity and reliability as sufficient evidence that the adapted measure is suitable for use in the target culture. This is not a comment that can be pinned to all the research examined. There were some good examples of critical assessment of the findings, often voiced as concerns over the applicability of the HRQL measure in the target population. However, the field can be characterised by statements of concern about equivalence between source and target versions of an instrument, followed by a sweeping summary from authors that a measure is suitable for use in the target population and, at most, a cautious suggestion that further research might be needed to confirm this.

The processes involved in developing and testing the WHOQOL have more rigorously evaluated equivalence. Therefore, conclusions about the use of the WHOQOL are more likely to provide reliable and valid interpretations across countries. Drawing together papers from Japan for each HRQL instrument demonstrated this. The WHOQOL researchers have provided a greater amount of information to facilitate an assessment of functional equivalence. However, despite this, there are still concerns regarding its use: for example, it uses an imposed concept of health and further investigation of its psychometric properties is needed. It would be unfair to identify an HRQL measure that was the “least useful”, as each measure has shown good and bad practice in terms of adaptation.
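
Because so much of the reviewed work rests on internal-consistency reliability, it may help to recall what that statistic actually measures. The sketch below computes Cronbach's alpha from its standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and any data passed to it are hypothetical and do not come from any of the reviewed papers.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator) for items and total scores.
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(r) != k for r in responses):
        raise ValueError("every respondent must answer the same number of items")

    # Transpose so that each tuple holds all respondents' scores for one item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(r) for r in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)
```

A high alpha indicates only that the items of the translated version co-vary within the target sample. It does not show that the concept being scaled is equivalent across cultures, which is the distinction the discussion above draws between scales and concepts.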

How well was and can the Herdman et al. (1998) model be operationalised?

The Herdman et al. (1998) model of equivalence is unusual in its attempt to relate a philosophical positioning to a working model of equivalence as well as some practical examples of how to examine each type of equivalence. It was particularly helpful in guiding our examination of the processes through which the generic measures of HRQL have been translated and applied amongst our populations of interest. However, the review also allowed us to re-examine the model, particularly with respect to practical applications and the delineation between alternative types of equivalence.

Overall, the model allowed us to develop a structured checklist for each type of equivalence in some detail. Also useful was the suggested order in which the equivalences should be tested. The ordering worked well in this review and helped identify situations where researchers had claimed to address (for example) functional equivalence, when in fact they had assessed measurement equivalence. However, we did experience some difficulties trying to operationalise the model as not all types of equivalence were addressed in the same level of detail in the Herdman et al. (1998) paper: for example, semantic equivalence compared to item equivalence. We also started to consider whether response options (operational equivalence) should actually be part of measurement equivalence. The most difficult distinction to make was to try to establish the difference (if any) between quantitative assessment for item equivalence, and methods used to establish measurement equivalence. Herdman et al. (1998) describe how item equivalence should first be assessed using qualitative approaches as well as assessing the psychometric properties of the item in the target culture, and give examples of techniques such as Cronbach’s alpha and Rasch analysis. However, within measurement equivalence, Herdman et al. also refer to Cronbach’s alpha as a method for assessing measurement equivalence. Thus, information gathered during measurement equivalence can be used to interpret item equivalence, and the difference between the two equivalences is not made clear. One potential solution, and the one used in this review, is to strictly follow the assertion that the equivalences should be tested in the following order: conceptual; item; semantic; operational; and measurement. But in fact, Herdman et al.’s model recognises that assessing equivalence is a circular process allowing a second assessment of (for example) item equivalence at a later stage in the adaptation process. The implication of our approach to solving this problem is that we may have underreported the extent to which researchers have assessed item equivalence by classifying it all as an assessment of measurement equivalence.

Whilst our operationalisation of the Herdman et al. (1998) paper may assist those seeking to test translations and adaptations of HRQL measures, we suggest the following changes to our operationalisation:

* Make more extensive use of local literature reviews for investigating other types of equivalence, not just conceptual equivalence.
* Ask the following question for all types of equivalence, not just conceptual equivalence: “Do the authors highlight any areas where {operational} equivalence has not been achieved?” to encourage greater questioning and discussion of results.
* Include a question regarding what authors recommend when a particular type of equivalence is not achieved.
* Supplement with methods for assessing validity and reliability, as these are not covered in any detail in our operationalisation or in Herdman et al.’s paper.
* Draw on a wider range of information than provided by Herdman et al.’s paper, e.g. participatory rural appraisal techniques (Chambers, 1997), cognitive assessment of survey methodology (Forsyth & Lessler, 1991), or an exercise using the Herdman et al. paper to guide the translation/adaptation of a generic measure in practice.

Finally, neither the Herdman et al. model nor our operationalisation allows any statement to be made concerning the degree to which source and target questionnaires are equivalent per se. They only provide methods through which processes are examined. At present, the conclusions are left entirely to the judgement of the individual researcher(s). Herdman et al. do not suggest who should assess how far equivalence has been achieved. For this review, we felt unable to make such judgements, not only because the quality of the evidence was weak, but also because we are not sufficiently knowledgeable about the source and target cultures within which HRQL instruments were being used. This view is quite challenging because of the way in which developers of existing HRQL instruments seek to control the quality of translated/adapted HRQL instruments. Perhaps the only people ever capable of judging will be those research teams whose members have each worked in both (or all) countries of interest.

There are limitations to this study. The literature searches were predominantly based on international databases which have a bias against literature from developing nations (Aronson & Bertrand, 1993). Adaptation procedures may not always be published, especially where researchers have encountered problems in their adaptation. In fact, it would be very useful to see such material. It is also apparent that the Herdman et al. model is only one model among many.

Conclusion

Translation of HRQL instruments can be characterised to date by their focus on non-communicable diseases in Southeast Asia rather than, for example, communicable disease in Africa. Of the current instruments available, the WHOQOL has evaluated equivalence most rigorously and is therefore more likely to provide valid scores for comparison across settings, although further psychometric testing is still needed and concerns about the imposed definition of health and lack of an index exist. Our conclusions are dependent on accepting the Herdman et al. (1998) model, which we operationalised for the first time. In doing so, we found more guidance for assessing particular types of equivalence and we have suggested a number of additions to the model to improve future reviews and translations. However, such guidelines would ideally need testing.

Acknowledgements

The authors would like to acknowledge Glaxo Wellcome for the financial support for this research including providing a translation service where needed and the Health Economics and Financing Programme at LSHTM who provided funding for the preparation of this article. We thank Sarah Payne and Keiko Tsunekawa for their support and assistance and also gratefully acknowledge Ann Ward, Richard Suchett-Kaye and Annette Cleary for their help with the literature search and obtaining papers. We also thank Suzanne Skevington, Cindy Lam, Michael Herdman, Virginia Wiseman and Mercy Mugo for their comments on earlier drafts of this paper. Any errors are the fault of the authors. Regarding any conflict of interest, both authors have been members of the KENQOL group since 1995. In addition, Julia Fox-Rushby has been a member of the EuroQol since 1987 and was one of many external advisors to the WHOQOL in 1993/4.

Appendix A. Literature search terms

Instrument                           Search terms
EuroQol                              EuroQol; EQ 5D; EQ 5D; EQ 5D; Euro Qol
Short Form 36                        SF-36; SF 36; MOS 36; Rand 36; SF20; SF12; SF6; SF 20; SF 12; SF 6; SF 6D; Short Form 36; Short Form 12; Short Form 20; Short Form 6; Medical Outcomes Survey
Sickness Impact Profile              Sickness Impact Profile; SIP
Nottingham Health Profile            Nottingham Health Profile; NHP
Quality of Well-Being Scale          Quality Well-Being Scale; Quality Well-Being Scale; Quality Well-Being Index; Quality Well-Being Index; QWB Scale; QWB Index
Dartmouth COOP                       Dartmouth COOP
15-D                                 15 Dimensional Questionnaire; 15Dimensional Questionnaire; 15 D Questionnaire; 15D Questionnaire
Health Utilities Index               Health Utilities Index; HUI
WHOQOL                               WHOQOL; WHOQOL 100; WHOQOL BREF
Authors and Instrument Developers    Anderson RT; Bergner M; Bice TW; Deyo RA; Feeny D; Gandek B; Hunt S; Kaplan RM; Marquis P; McKenna S; Nelson EC; Olweny C; Orley J; Patrick D; Sartorius N; Sintonen H; Ware J

Appendix B. Papers reviewed

Al Abdulmohsin, S.A., Coons, S.J., Draugalis, J.R., & Hays, R.D. (1997). Translation of the RAND 36-item health survey 1-0 (AKA SF-36) into Arabic. Santa Monica: RAND.
Amir, M., Roziner, I., Knoll, A., Neufold, M.Y. (1999). Self-efficacy and social support as mediators in the relation between disease severity and quality of life in patients with epilepsy. Epilepsia, 40(2), 216–224.
Bito, S., Fukuhara, S. (1998). Validation of interviewer administration of the Short Form 36 Health Survey, and comparisons of health-related quality of life between community-dwelling and institutionalized elderly people. Japanese Journal of Geriatric Medicine, 35(6), 458–463.
Bobak, M., Pikhart, H., Hertzman, C., Rose, R., Marmot, M. (1998). Socioeconomic factors, perceived control and self-reported health in Russia. A cross-sectional study. Social Science & Medicine, 47(2), 269–279.
Brena, S.F., Sanders, S.H., Motoyama, H. (1990). American and Japanese chronic low back pain patients: Cross-cultural similarities and differences. The Clinical Journal of Pain, 6(2), 118–124.
Bullinger, M., Alonso, J., Apolone, G., Leplege, A., Sullivan, M., Wood-Dauphinee, S., et al. (1998). Translating health status questionnaires and evaluating their quality: The IQOLA project approach. Journal of Clinical Epidemiology, 51(11), 913–923.
Coons, S.J., Al Abdulmohsin, A., Draugalis, J.R., Hays, R.D. (1998). Reliability of an Arabic version of the RAND-36 health survey and its equivalence to the US-English version. Medical Care, 36(3), 428–432.
Froom, J., Aoyama, H., Hermoni, D., Mino, Y., Galambos, N. (1995). Depressive disorders in three primary care populations: United States, Israel, Japan. Family Practice, 12(3), 274–278.
Fukuhara, S., Hino, K., Kato, K., Tomita, E., Yuasa, S., Okushin, H. (1997). Health-related quality of life in patients with chronic liver disease type-C. Kanzo, 38(10), 587–595.
Fukuhara, S., Ware, J.E., Kosinski, M., Wada, S., Gandek, B. (1998a). Psychometric and clinical tests of validity of the Japanese SF-36 health survey. Journal of Clinical Epidemiology, 51(11), 1045–1053.
Fukuhara, S., Bito, S., Green, J.A.H., Kurokawa, K. (1998b). Translation, adaptation, and validation of the SF-36 health survey for use in Japan. Journal of Clinical Epidemiology, 51(11), 1037–1044.
Gandek, B., Ware, J.E., Aaronson, N.K., Alonso, J., Apolone, G., Bjorner, J., et al. (1998). Tests of data quality, scaling assumptions, and reliability of the SF-36 in eleven countries: Results from the IQOLA project. Journal of Clinical Epidemiology, 51(11), 1149–1158.
Gandek, B., & Ware, J.E. for the IQOLA Project Group (1998). Methods for validating and norming translations of health status questionnaires: The IQOLA project approach. Journal of Clinical Epidemiology, 51(11), 953–959.
Hisashige, A., Mikasa, H., Katayama, T. (1998). Description and valuation of health-related quality of life among the general public in Japan by the EuroQol. The Journal of Medical Investigation, 45(1–4), 123–129.
Ikeda, S., Ikegami, N. (1999). Health status in Japanese population: Results from Japanese EuroQol study. Journal of Health Care and Society, 9(3), 83–92.
Jarema, M., Bury, L., Konieczynska, Z., Zaborowski, B., Cikowska, G., Kunicka, A., et al. (1995). Comparison of quality of life of schizophrenic patients in different forms of psychiatric care. Psychiatria Polska, 31(5), 585–594.
Japanese EurQol Translation Team. (1998). The development of the Japanese EuroQol instrument. Iryo to Shakai, 8(1), 109–117.
Keller, S.D., Ware, J.E., Gandek, B., Aaronson, N.K., Alonso, J., Apolone, G., et al. (1998). Testing the equivalence of translations of widely used response choice labels: Results from the IQOLA project. Journal of Clinical Epidemiology, 51(11), 933–944.
Lam, C.L., Van Weel, C., Lauder, I.J. (1994). Can the Dartmouth COOP/WONCA charts be used to assess the functional status of Chinese patients? Family Practice, 11(1), 85–94.
Lam, C.L.K., Gandek, B., Ren, X.S., Chan, M.S. (1998). Tests of scaling assumptions and construct validity of the Chinese (HK) versions of the SF-36 of health survey. Journal of Clinical Epidemiology, 51(11), 1139–1147.
Landgraf, J.M., Nelson, E.C. (1992). Summary of the WONCA/COOP international health assessment field trial. Australian Family Practice, 21(3), 255–269.
Lewin-Epstein, N., Sagiv-Schifter, T., Shabtai, E.L., Schumeli, A. (1998). Validation of the 36-item short form health survey (Hebrew version) in the adult population of Israel. Medical Care, 36(9), 1361–1370.
Li, J., Fielding, R. (1995). The measurement of current perceived health among Chinese people in Guangzhou and Hong Kong, southern China. Quality of Life Research, 4(3), 271–278.
Mino, Y., Aoyama, H., Froom, J. (1994). Depressive disorders in Japanese primary care patients. Family Practice, 11(4), 363–367.
Mitchell, R.A., Imperial, E., Zhou, D., Lu, Y., Watts, G., Kelleher, P., et al. (1992). A cross-cultural assessment of perceived health problems in the elderly. Disability and Rehabilitation, 14(3), 133–135.
Mitchell, R.A., Nahas, V., Shukri, R., Al-ma’aitah, R. (1995). Perceived health problems in elderly residents of Jordan. Journal of Cross-Cultural Gerontology, 10(4), 307–314.
O’Keefe, E.A., Wood, R. (1996). The impact of human immunodeficiency virus (HIV) infection on quality of life in a multiracial South African population. Quality of Life Research, 5(2), 275–280.
Pibernik-Okanovic, M., Szabo, S., Metelko, Z. (1996). Quality of life in diabetic, otherwise ill and healthy persons. Diabetologia Croatia, 25(3), 117–121.
Pibernik-Okanovic, M., Szabo, S., Metelko, Z. (1998). Quality of life following a change in therapy for diabetes mellitus. Pharmacoeconomics, 14(2), 201–207.
Sanders, S.H., Brena, S.F., Spier, C.J., Beltrutti, D., McConnell, H., Quintero, O. (1992). Chronic low back pain patients around the world: Cross-cultural similarities and differences. The Clinical Journal of Pain, 8(4), 317–323.
Saxena, S., Chandiramani, K., Bhargava, R. (1998). WHOQOL-Hindi: A questionnaire for assessing quality of life in health care settings in India. The National Medical Journal of India, 11(4), 160–165.
Shigemoto, H. (1990). A trial of the Dartmouth COOP charts in Japan. In: W.C. Committee (Ed.), Functional status measurement in primary care (pp. 181–187). New York: Springer.
Shumueli, A. (1998). The SF-36 profile and health-related quality of life: An interpretative analysis. Quality of Life Research, 7, 187–195.
Skevington, S.M., Bradshaw, J., Saxena, S. (1999). Selecting national items for the WHOQOL: Conceptual and psychometric considerations. Social Science & Medicine, 48, 473–487.
Szabo, S. (1996). The World Health Organization Quality of Life (WHOQOL) assessment instrument. In: B. Spilker (Ed.), Quality of life and pharmacoeconomics in clinical trials (pp. 355–362). Philadelphia: Lippincott-Raven.
Szabo, S., Orley, J., Saxena, S. (1997). An approach to response scale development for cross-cultural questionnaires. European Psychologist, 2(3), 270–276.
Szecket, N., Medin, G., Furlong, W.J., Feeny, D.H., Barr, R.D. (1999). Preliminary translation and cultural adaptation of Health Utilities Index questionnaires for application in Argentina. International Journal of Cancer Supplement, 12, 119–124.
Tazaki, M., Nakane, Y., Endo, T., Kakikawa, F., Kano, K., Kawano, H., et al. (1998). Results of a qualitative and field study using the WHOQOL instrument for cancer patients. Japanese Journal of Clinical Oncology, 28(2), 134–141.
Thomas, K., Ruby, J., Peter, J.V., Cherian, A.M. (1995). Comparison of disease-specific and a generic quality of life measure in patients with bronchial asthma. The National Medical Journal of India, 8(6), 258–260.
Thumboo, J., Fong, K.Y., Ng, T.P., Leong, K.H., Feng, P.H., Boey, M.L. (1997). Initial construct cross-cultural validation of the Short Form 36 for quality of life assessment of systemic lupus erythematosus patients in Singapore. Annals Academy of Medicine, 26(3), 282–284.
Thumboo, J., Fong, K., Ng, T., Leong, K., Feng, P., Thio, S., et al. (1999). Validation of the MOS SF-36 for quality of life assessment of patients with systemic lupus erythematosus in Singapore. The Journal of Rheumatology, 26(1), 97–102.
Tsuchiya, A. (1999). Estimating a EuroQol tariff: The case of Japan. Barcelona, Spain: EuroQol Plenary Meeting.
Tsuchiya, A., Hasegawa, T., Nishimura, S., Hisashige, A., Ikegami, N., Ikeda, S. (1998). A validity study of the Japanese EuroQol instrument. Iryo-To-Shakai, 8(1), 67–77.
Wagner, A.K., Gandek, B., Aaronson, N.K., Acquadro, C., Alonso, J., Apolone, G., et al. (1998). Cross-cultural comparisons of the content of the SF-36 translations across 10 countries: Results from the IQOLA project. Journal of Clinical Epidemiology, 51(11), 925–932.
Wagner, A.K., Wyss, K., Gandek, B., Kilima, P.M., Lorenz, S., Whiting, D. (1999). A Kiswahili version of the SF-36 health survey for use in Tanzania: Translation and tests of scaling assumptions. Quality of Life Research, 8, 101–110.
Wang, Q., Chen, G. (1999). The health status of the Singaporean population as measured by the Health Utilities Index Mark III system. Singapore Medical Journal, 40(6), 389–396.
Watanabe, Y. (1995). Tests of the weighting and usage of a quality of life (QOL) questionnaire. Annual Report of Research Committee on Epidemiology of Intractable Diseases (pp. 137–139). The Ministry of Health and Welfare of Japan.
Watanabe, Y., Ozasa, K., Higashi, A., Hayashi, A., Taneike, R., Kudoh, A., et al. (1996). Tests of the weighting and usage of a quality of life (QOL) questionnaire (report 2). Annual Report of Research Committee on Epidemiology of Intractable Diseases (pp. 318–326). The Ministry of Health and Welfare of Japan.
Westbury, R.C., Rogers, T.B., Briggs, T.E., Allison, D.J., Gervas, J., Shigemoto, H., et al. (1997). A multinational study of the factorial structure and other characteristics of the Dartmouth COOP Functional Health Assessment Charts/WONCA. Family Practice, 14(6), 478–485.
WHOQOL Group. (1995). The World Health Organization Quality of Life assessment (WHOQOL): Position paper from the World Health Organization. Social Science & Medicine, 41(10), 1403–1409.
WHOQOL Group. (1998a). The World Health Organization Quality of Life assessment (WHOQOL): Development and general psychometric properties. Social Science & Medicine, 46(12), 1569–1585.
WHOQOL Group. (1998b). Development of the World Health Organization WHOQOL-BREF Quality of Life Assessment. Psychological Medicine, 28, 551–558.
Wikblad, K., Smide, B., Bergstrom, A., Kessi, J., Mugusi, F. (1997). Outcome of clinical foot examination in relation to self-perceived health and glycaemic control in a group of urban Tanzanian diabetic patients. Diabetes Research and Clinical Practice, 37(3), 185–192.
Wrzesniewski, K. (1997). Study of quality of life using a Polish adaptation of the Nottingham Health Profile. In: J.B. Karski, H. Kirschner, & J. Leowski (Eds.), Contemporary needs and ways of measuring health (pp. 37–41). Warsaw: Ingis.
Wyss, K., Wagner, A.K., Whiting, D., Mtasiwa, D.M., Tanner, M., Gandek, B., et al. (1999). Validation of the Kiswahili version of the SF-36 health survey in a representative sample of an urban population in Tanzania. Quality of Life Research, 8, 111–120.
Yodfat, Y. (1991). Functional status in the treatment of heart failure by Captopril: A multicentre, controlled, double-blind study in family practice. Family Practice, 8(4), 409–411.
Yodfat, Y. (1995). A multicentre study of lisinopril in the treatment of mild to moderate hypertension. Harefuah, 129(1–2), 26–29.
Zuniga, M.A., Carillo-Jimenez, G.T., Fos, P.J., Gandek, B., Medina-Moreno, M.R. (1999). Health status evaluation with the SF-36 survey: Preliminary results in Mexico. Salud Publica de Mexico, 41(2), 110–117.

Appendix C. Review criteria operationalised from Herdman et al. (1998)

C.1. Background details

* Publication authors and date.
* Journal title.
* Location of research.
* Funders of research.
* Disease and intervention studied.
* HRQL measure used.
* Source and target languages.

C.2. Methodological details

* Sample characteristics: number, sampling frame, and method of selection.
* Sample characteristics: socio-demographic and economic variables.
* Aims of study and hypothesis: specific to trial.
* Aims of study and hypothesis: specific to use of instrument.
* Other disease specific and generic measures used alongside instrument.
* Details of other clinical or behavioural measures used alongside instrument.
* Listing of any other variables on which data was collected.
* Method of using measure.
* Details of translation process.

C.3. Conceptual equivalence

* In what ways was the local population's conceptualisation of health/QOL assessed?

(a) review local literature,
(b) review local questionnaires/instruments,
(c) discussion amongst researchers,
(d) involvement of anthropologists, sociologists, etc.,
(e) involvement of local people: (i) participant observation (ii) questionnaires,
(f) other.

* What kind of people were asked to help judge the appropriateness of the instrument in the target setting?
* Who was involved in the research?
* Were any theoretical arguments presented questioning or accepting conceptual equivalence?
* What were the outcomes of the above? And how were judgments made and justified?
* Do the authors claim conceptual equivalence?

C.4. Item equivalence

* What evidence was presented to suggest that lifestyle patterns were the same/similar in the source and target countries?
* How was the relevance or acceptability of individual items/questions to the target population addressed?
* What quantitative analysis of item equivalence was undertaken?
* What were the outcomes of any quantitative or qualitative analysis?
* What judgements were made about item equivalence?
* Do the authors claim item equivalence?

C.5. Semantic equivalence

What types of meaning were addressed in assessing semantic equivalence? How were authors/translators sure about what was meant by the original questionnaire in the source language? Was a semantic re-write of the original used? Were the original developers of the source version contacted and what was the nature of the contact? Were translation guidelines referred to? (a) instrument guidelines (b) any other guidelines? How was the meaning of keywords/phrases investigated in the target language? Was the lexical relationship between words established? What kind of people did the translations? Who was involved in judging the quality of the translations? Was a translation protocol followed?
What problems were identified in the translation exercise and how were they dealt with? Were items considered easy/difficult/impossible to translate? Was any other evidence presented about semantic equivalence? Do the authors claim semantic equivalence?

C.6. Operational equivalence

* Were the same instructions and format used in the source and target versions of the instrument?
* What are the literacy rates in the source and target countries?
* What issues of how to address respondents were considered/reported?
* What was the % of missing data (by question) in the source and target data?
* How were the response modes considered?
* Were the same time frames used in the source and target questionnaires, and was the time frame investigated?
* Were any reviews in the literature consulted about appropriate response modes?
* How were issues pre-tested?
* Was there any analysis of comparative response bias between source and target versions?
* Was the questionnaire considered to be operationally equivalent?
* Was any other evidence presented about operational equivalence?
* Do the authors claim operational equivalence?

C.7. Measurement equivalence

* How were the following issues investigated in the target setting:

(a) reliability,
(b) validity,
(c) sensitivity,
(d) scoring norms,
(e) effect size,
(f) socio-economic and demographic relationships with the instrument.

* Were items weighted differently in the source and target versions?
* Was any other evidence presented about measurement equivalence?
* Do the authors claim measurement equivalence?

C.8. Functional equivalence

Do the authors claim functional equivalence?
Refer to Herdman et al. (1998) for descriptions and definitions of each criterion.

References

Aronson, B., & Bertrand, I. (1993). Letter. International Journal of Clinical Epidemiology, 22(1), 172.
Albrecht, G. L., & Fitzpatrick, R. (1994). A sociological perspective on health-related quality of life research. Advances in Medical Sociology, 5, 1–1721.
Anderson, R. T., Aaronson, N. K., Bullinger, M., & McBee, W. L. (1996). A review of the progress towards developing health-related quality-of-life instruments for international clinical studies and outcomes research. Pharmacoeconomics, 10(4), 336–355.
Bobadilla, J. L., & Cowley, P. (1995). Designing and implementing packages of essential health services. Journal of International Development, 7(3), 543–554.
Bowden, A., Fox-Rushby, J. A., Nyandieka, L., & Wanjau, J. (2002). Methods for pre-testing and piloting survey questions: Illustrations from the KENQOL survey of health-related quality of life. Health Policy and Planning, 17(3), 322–330.
Bullinger, M., Alonso, J., Apolone, G., Leplege, A., Sullivan, M., & Wood-Dauphinee, S., et al. (1998). Translating health status questionnaires and evaluating their quality: The IQOLA project approach. Journal of Clinical Epidemiology, 51(11), 913–923.
Chambers, R. (1997). Whose reality counts? Putting the first last. London: Intermediate Technology Publications.
Forsyth, B. H., & Lessler, J. T. (1991). Cognitive laboratory methods: A taxonomy. In P. P. Biemer, S. M. Groves, L. E. Lyberg, N. A. Mathiowetz, & S. Sudman (Eds.), Measurement errors in surveys (pp. 393–418). New York: Wiley.
Fox-Rushby, J. (2000). Operationalising conceptions of ‘health’ amongst the Wakamba and Maragoli of Kenya: The basis of the KENQOL instrument. Quality of Life Research, 9(3), 316.
Fox-Rushby, J., & Parker, M. (1995). Culture and the measurement of health-related quality of life. European Review of Applied Psychology, 45(4), 257–263.
Guillemin, F., Bombardier, C., & Beaton, D. (1993). Cross-cultural adaptation of health-related quality of life measures: Literature review and proposed guidelines. Journal of Clinical Epidemiology, 46(12), 1417–1432.
Herdman, M., Fox-Rushby, J., & Badia, X. (1997). ‘Equivalence’ and the translation and adaptation of health-related quality of life questionnaires. Quality of Life Research, 6, 237–247.
Herdman, M., Fox-Rushby, J., & Badia, X. (1998). A model of equivalence in the cultural adaptation of HRQoL instruments: The universalist approach. Quality of Life Research, 7, 323–335.
Herdman, M., Fox-Rushby, J., Rabin, R., Badia, X., & Selai (forthcoming). Developing and translating different language versions of the EQ-5D. In: R. Brooks, R. Rabin, & F. de Charro (Eds.), The measurement and valuation of health status using the EQ-5D: European perspective (Evidence from the EuroQol BIOMED research program). Dordrecht: Kluwer Academic Publishers.
Hunt, S. M. (1994). Cross-cultural comparability of quality of life measures. International Symposium on Quality of Life and Health (pp. 25–27). Blackwell Verlag, Berlin.
Rabin, R., Herdman, M., Fox-Rushby, J., & Badia, X. (forthcoming). Exploring the results of translating the EQ-5D into 11 European languages. In: R. Brooks, R. Rabin, & F. de Charro (Eds.), The measurement and valuation of health status using the EQ-5D: European perspective (Evidence from the EuroQol BIOMED research program). Dordrecht: Kluwer Academic Publishers.
Sartorius, N., & Kuyken, W. (1994). Translation of health status instruments. In J. Orley, & W. Kuyken (Eds.), Proceedings of the joint meeting organized by the World Health Organization and the foundation IPSEN (pp. 3–18). Paris: Springer.
