
PLoS CLINICAL TRIALS

Essay

Factors That Can Affect the External Validity of Randomised Controlled Trials
Peter M. Rothwell

The Essay section contains opinion pieces on topics of broad interest to a general medical audience.

Peter M. Rothwell is Professor of Clinical Neurology, Stroke Prevention Research Unit, University Department of Clinical Neurology, Radcliffe Infirmary, Oxford, United Kingdom. E-mail: peter.rothwell@clneuro.ox.ac.uk

Funding: The author received no specific funding for this article.

Competing Interests: The author declares that he has no competing interests.

Citation: Rothwell PM (2006) Factors that can affect the external validity of randomised controlled trials. PLoS Clin Trials 1(1): e9. DOI: 10.1371/journal.pctr.0010009

DOI: 10.1371/journal.pctr.0010009

Copyright: © 2006 Peter M. Rothwell. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abbreviations: EAFT, European Atrial Fibrillation Trial; MI, myocardial infarction; RCT, randomised controlled trial; SPIRIT, Stroke Prevention in Reversible Ischaemia Trial

Randomised controlled trials (RCTs) must be internally valid (i.e., design and conduct must eliminate the possibility of bias), but to be clinically useful, the result must also be relevant to a definable group of patients in a particular clinical setting (i.e., they must be externally valid). Lack of external validity is the most frequent criticism by clinicians of RCTs, systematic reviews, and guidelines, and is one explanation for the widespread underuse in routine practice of many treatments that have been shown to be beneficial in trials and are recommended in guidelines [1]. Yet medical journals, funding agencies, ethics committees, the pharmaceutical industry, and governmental regulators seem to give external validity a low priority. Admittedly, whereas the determinants of internal validity are intuitive and can generally be worked out from first principles, understanding of the determinants of the external validity of an RCT requires clinical rather than statistical expertise, and often depends on a detailed understanding of the particular clinical condition under study and its management in routine clinical practice. However, reliable judgments about the external validity of RCTs are essential if treatments are to be used correctly in as many patients as possible in routine clinical practice.

The results of RCTs or systematic reviews will never be relevant to all patients and all settings, but they should be designed and reported in a way that allows clinicians to judge to whom the results can reasonably be applied. Table 1 lists some of the important potential determinants of external validity, each of which is reviewed briefly below. Many of the considerations will only be relevant in certain types of trials, for certain interventions, or in certain clinical settings, but they can each sometimes undermine external validity. Moreover, the list is not exhaustive and requires more detailed annotation and explanation than is possible in this short review.

Some of the issues that determine external validity are relevant to the distinction between pragmatic trials and explanatory trials [2], but it would be wrong to assume that pragmatic trials necessarily have greater external validity than explanatory trials. For example, broad eligibility criteria, limited collection of baseline data, and inclusion of centres with a range of expertise and differing patient populations have many advantages, but they can also make it very difficult to generalise the overall average effect of treatment to a particular clinical setting.

The Setting of the Trial

A detailed understanding of the setting in which a trial is performed, including any peculiarities of the health-care system in particular countries, can be essential in judging external validity. The potential impact of differences between health-care systems is illustrated by the analysis of the results of the European Carotid Surgery Trial (ECST) [3], an RCT of endarterectomy for recently symptomatic carotid stenosis, in Figure 1. National differences in the speed with which patients were investigated, with a median delay from last symptoms to randomisation of greater than two months in the United Kingdom (slow centres) compared with three weeks in Belgium and Holland (fast centres), resulted in very different treatment effects in these different health-care systems, due to the shortness of the time window for effective prevention of stroke. Similar differences in performance between health-care systems will exist for other conditions, and there is, of course, the broader issue of how trials done in the developed world apply in the developing world. Moreover, other differences between countries in the methods of diagnosis and management of disease (which can be substantial), or important racial differences in pathology and natural history of disease, also affect the external validity of RCTs. A good example is the heterogeneity of results of trials of bacille Calmette-Guérin (BCG) vaccination in prevention of tuberculosis, with a progressive loss of efficacy (p < 0.0001) with decreasing latitude [4].

How centres and clinicians were selected to participate in trials is seldom reported, but can also have important implications for external validity. For example, the Asymptomatic Carotid Atherosclerosis Study (ACAS) trial of endarterectomy for asymptomatic carotid stenosis only accepted surgeons with an excellent safety record, rejecting 40% of applicants initially, and subsequently barring from further participation those who had adverse operative outcomes in the trial. The benefit from surgery in ACAS was due in major part to the consequently low operative risk [5]. A meta-analysis of 46 surgical case series that published operative risks during the five years after ACAS found operative mortality to be

www.plosclinicaltrials.org 0001 May | 2006 | e9


eight times higher and the risk of stroke and death to be about three times higher [1]. Trials should not include centres that do not have the competence to treat patients safely, but selection should not be so exclusive that the results cannot be generalised to routine clinical practice.

Table 1. Main Issues That Can Affect External Validity and Should Be Addressed in Reports of the Results of Randomised Controlled Trials or Systematic Reviews and Considered by Clinicians

Issue
    Example

Setting of the trial
    Health-care system
    Country
    Recruitment from primary, secondary, or tertiary care
    Selection of participating centres
    Selection of participating clinicians

Selection of patients
    Methods of prerandomisation diagnosis and investigation
    Eligibility criteria
    Exclusion criteria
    Placebo run-in period
    Treatment run-in period
    "Enrichment" strategies
    Ratio of randomised patients to eligible nonrandomised patients in participating centres
    Proportion of patients who declined randomisation

Characteristics of randomised patients
    Baseline clinical characteristics
    Racial group
    Uniformity of underlying pathology
    Stage in the natural history of their disease
    Severity of disease
    Comorbidity
    Absolute risks of a poor outcome in the control group

Differences between trial protocol and routine practice
    Trial intervention
    Timing of treatment
    Appropriateness/relevance of control intervention
    Adequacy of nontrial treatment, both intended and actual
    Prohibition of certain nontrial treatments
    Therapeutic or diagnostic advances since trial was performed

Outcome measures and follow-up
    Clinical relevance of surrogate outcomes
    Clinical relevance, validity, and reproducibility of complex scales
    Effect of intervention on most relevant components of composite outcomes
    Identification of who measured outcome
    Use of patient-centred outcomes
    Frequency of follow-up
    Adequacy of the length of follow-up

Adverse effects of treatment
    Completeness of reporting of relevant adverse effects
    Rates of discontinuation of treatment
    Selection of trial centres and/or clinicians on the basis of skill or experience
    Exclusion of patients at risk of complications
    Exclusion of patients who experienced adverse effects during a run-in period
    Intensity of trial safety procedures

DOI: 10.1371/journal.pctr.0010009.t001

Selection and Exclusion of Patients

Concern is often expressed about highly selective trial eligibility criteria, but there are often several earlier stages of selection that are rarely recorded or reported but which can be more problematic. For example, consider a trial of a new blood pressure–lowering drug, which like most such trials is performed in a hospital clinic. Fewer than 10% of patients with hypertension are managed in hospital clinics, and this group will differ from those managed in primary care. Moreover, only one of the ten physicians who see hypertensive patients in this particular hospital is taking part in the trial, and this physician mainly sees young patients with resistant hypertension. Thus, even before any consideration of eligibility or exclusion criteria, potential recruits are already very unrepresentative of patients in the local community. It is essential therefore that, where possible, trials record and report the pathways to recruitment.

Patients are then further selected according to trial eligibility criteria. Some RCTs exclude women and many exclude the elderly and/or patients with common comorbidities. One review of 214 drug trials in acute myocardial infarction (MI) found that over 60% excluded patients aged over 75 years [6], despite the fact that over 50% of MIs occur in this older age group. A review of 41 United States National Institutes of Health RCTs found an average exclusion rate of 73% [7], but rates can be much higher. One study of the eligibility criteria of an acute stroke treatment trial found that of the small proportion of patients admitted to hospital in time to be suitable for treatment, 96% were ineligible based on the various other exclusion criteria [8]. One centre in another acute stroke trial had to screen 192 patients over two years to find an eligible patient [9]. Yet, highly selective recruitment is not inevitable. The GISSI-1 trial of thrombolysis for acute MI, for example, recruited 90% of patients admitted within 12 hours of the event with a definite diagnosis and no contraindications [10].

Strict eligibility criteria can limit the external validity of RCTs, but physicians should at least be able to select similar patients for treatment in routine practice. Unfortunately, however, reporting of trial eligibility criteria is frequently inadequate. A review of trials leading to clinical alerts by the US National Institutes of Health revealed that of an average of 31 eligibility criteria, only 63% were published in the main trial report and only 19% in the clinical alert [11]. Inadequate reporting is also a major problem in secondary publications, such as systematic reviews and clinical guidelines, where the need for a succinct message does not usually allow detailed consideration of the eligibility and exclusion criteria or other determinants of external validity.

Prerandomisation run-in periods are also often used to select or exclude patients. In a placebo run-in, all eligible patients receive placebo, and those who are poorly compliant are excluded. There can be good reasons for doing this, but high rates of exclusion will reduce external validity. Active treatment run-in periods in which patients who have adverse events or show signs that treatment may be ineffective are excluded are more likely to undermine



external validity. For example, two RCTs
of carvedilol, a vasodilatory beta-blocker,
in chronic heart failure excluded 6% and
9% of eligible patients in treatment run-in
periods—mainly because of worsening
heart failure and other adverse events,
some of which were fatal [1]. In both trials,
the complication rates in the subsequent
randomised phase were much lower than
in the run-in phase.
Trials also sometimes actively recruit
patients who are likely to respond well to
treatment (often termed "enrichment").
For example, some trials of antipsychotic
drugs have selectively recruited patients
who have previously had a good response
to antipsychotics [1]. Other trials have
excluded nonresponders in a run-in
phase. One RCT of a cholinesterase
inhibitor, tacrine, in Alzheimer disease
recruited 632 patients to a six-week
"enrichment" phase in which they were
randomised to different doses of tacrine
versus placebo [12]. After a washout
period, only the 215 (34%) patients who
had a measured improvement on tacrine
in the "enrichment" phase were
randomised to tacrine (at their best
dose) versus placebo in the main phase
of the trial. External validity is clearly
undermined here.
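
The scale of selection in the tacrine example can be checked with a little arithmetic. The sketch below is illustrative only: the helper name retained_fraction is hypothetical and not part of any trial's analysis, and the only inputs are the two patient counts quoted above (632 entering the enrichment phase, 215 going on to the main phase).

```python
def retained_fraction(n_entered: int, n_randomised: int) -> float:
    """Fraction of patients entering a run-in or enrichment phase
    who go on to the main randomised phase.

    Hypothetical helper for illustration; not taken from any trial report.
    """
    if n_entered <= 0:
        raise ValueError("n_entered must be positive")
    if not 0 <= n_randomised <= n_entered:
        raise ValueError("n_randomised must lie between 0 and n_entered")
    return n_randomised / n_entered


# Counts quoted in the text for the tacrine trial [12].
entered_enrichment = 632
entered_main_phase = 215

kept = retained_fraction(entered_enrichment, entered_main_phase)
dropped = entered_enrichment - entered_main_phase

print(f"Retained for main phase: {kept:.0%}")
print(f"Not carried forward:     {dropped} patients ({1 - kept:.0%})")
```

On these figures roughly two thirds of the patients who started the enrichment phase never reached the main comparison, so the main-phase result describes only the minority already shown to improve on the drug.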

Characteristics of Randomised
Patients
Even in large pragmatic trials with very
few exclusion criteria, recruitment of less
than 10% of potentially eligible patients
in participating centres is common.
Those patients who are recruited
generally differ from those who are
eligible but not recruited in terms of
age, sex, race, severity of disease,
educational status, social class, and place
of residence. The outcome in patients
included in RCTs is also usually better
than in those not in trials, often markedly
so, not because of better treatment but
because of a better baseline prognosis.
Trial reports usually include the baseline
clinical characteristics of randomised
patients, so it is argued that clinicians
can assess external validity by comparison
with their patients. However, recorded
baseline clinical characteristics often say very little about the real makeup of the trial population, and can sometimes be misleading. For example, Table 2 shows the baseline clinical characteristics of patients randomised to warfarin in two RCTs of secondary prevention of stroke [1]. In one trial, patients were in atrial fibrillation, and in the other they were in sinus rhythm, but the characteristics of

Figure 1. The Absolute Reductions in the Five-Year Risks of Ipsilateral Ischaemic Stroke (Top) and Any Stroke or Death (Bottom) with Surgery in European Carotid Surgery Trial Centres in Which the Median Delay from Last Symptomatic Event to Randomisation Was Less than or Equal to 50 Days (Fast Centres) Compared with Centres with a Longer Delay (Slow Centres)
Data are shown separately for patients with moderate (50%–69%) and severe (70%–99%) carotid stenosis.
DOI: 10.1371/journal.pctr.0010009.g001



the two cohorts were otherwise fairly similar. However, the risk of intracranial haemorrhage on warfarin was 19 times higher (p < 0.0001) in the Stroke Prevention in Reversible Ischaemia Trial (SPIRIT) than in the European Atrial Fibrillation Trial (EAFT), even after adjustment for differences in baseline clinical characteristics and the intensity of anticoagulation [13]. In judging external validity, an understanding of how patients were referred, investigated, and diagnosed (i.e., their pathway to recruitment), as well as how they were subsequently selected and excluded, is often much more informative than a list of baseline characteristics.

Table 2. The Baseline Clinical Characteristics and Haemorrhage Outcomes of Patients Randomised to Anticoagulation with Warfarin in EAFT and SPIRIT

Measurement / Criterion                                   SPIRIT (n = 651)   EAFT (n = 225)

Baseline clinical characteristics
    Male sex                                              66%                55%
    Age > 65 years                                        47%                81%
    Hypertension                                          39%                48%
    Angina                                                9%                 11%
    Myocardial infarction                                 9%                 7%
    Diabetes                                              11%                12%
    Leukoaraiosis on computerised tomography brain scan   7%                 14%

Outcomes during trial
    Mean (standard deviation) international
    normalized ratio during trial                         3.3 (1.1)          2.9 (0.7)
    Patient-years of follow-up                            735                507
    Intracranial haemorrhage                              27                 0 [a]
    Extracranial haemorrhage                              26                 13

Adjusted hazard ratio (95% confidence interval) [a]
    Intracranial haemorrhage                              19.0 (2.4–250), p < 0.0001
    Extracranial haemorrhage                              1.9 (0.8–4.7), p = 0.15

[a] There were no proven intracranial haemorrhages, but no computerised tomography scan was performed in two strokes. For the purpose of calculation of the adjusted hazard ratio for haemorrhage, these two strokes were categorised as having been due to intracranial haemorrhage.

DOI: 10.1371/journal.pctr.0010009.t002

The Intervention, Control Treatment, and Pre-trial or Nontrial Management

External validity can also be affected if trials have protocols that differ from usual clinical practice. For example, prior to randomisation in the RCTs of endarterectomy for symptomatic carotid stenosis, patients had to be diagnosed by a neurologist and have conventional arterial angiography, neither of which is routine in many centres. The trial intervention itself may also differ from that used in current practice, such as in the formulation and bioavailability of a drug, or the type of anaesthetic used for an operation. The same can be true of the treatment in the control group in a trial, which may use a particularly low dose of the comparator drug, or fall short of best current practice in some other way. External validity can also be undermined by too stringent limitations on the use of nontrial treatments. Any prohibition of nontrial treatments should be reported in the main trial publications, along with details of relevant nontrial treatments that were used. The timing of many interventions is also critical and should be reported when relevant.

Outcome Measures and Follow-Up

The external validity of an RCT also depends on whether the outcomes were clinically relevant. Many trials use "surrogate" outcomes, usually biological or imaging markers that are thought to be indirect measures of the effect of treatment on clinical outcomes. However, as well as being of questionable clinical relevance, surrogate outcomes are often misleading. There are many examples of treatments that have had a major beneficial effect on a surrogate outcome, which had previously been shown to be correlated with a relevant clinical outcome in observational studies, but where the treatments have proved ineffective or harmful in subsequent large RCTs that used these same clinical outcomes [1].

Complex scales, often made up of arbitrary combinations of symptoms and clinical signs, are also problematic. A review of 196 RCTs in rheumatoid arthritis identified more than 70 different outcome scales [14]. More worryingly, a review of 2,000 RCTs in schizophrenia identified 640 scales, many of which were devised for the particular RCT and had no supporting data on validity or reliability, but which were more likely to show statistically significant treatment effects than established scales [15]. Moreover, the clinical meaning of apparent treatment effects (e.g., a 2.7-point mean reduction in a 100-point outcome scale made up of various symptoms and signs) is usually impossible to discern. Simple clinical outcomes usually have most external validity, but, even then, only if they reflect the priorities of patients. For example, patients with epilepsy are much more interested in the proportion of individuals rendered free of seizures in RCTs of anticonvulsants than they are in changes in mean seizure frequency.

Identifying who actually measured the outcome can also be important. For example, the recorded operative risk of stroke due to carotid endarterectomy is highly dependent on whether patients were assessed by a surgeon or a neurologist [16].

Many trials combine events in their primary outcome measure. This can produce a useful measure of the overall effect of treatment on all the relevant outcomes, and it usually affords greater statistical power, but the outcome that is most important to a particular patient may be affected differently by treatment than the combined outcome. Composite outcomes also sometimes combine events of very different severity, and treatment effects can be driven by the least important outcome, which is often the most frequent. Equally problematic is the composite of definite clinical events and episodes of hospitalisation. The fact that a patient is in an RCT will probably affect the likelihood of hospitalisation, and it will certainly vary between different health-care systems.

Another major problem for the external validity of RCTs is an inadequate duration of treatment and/or follow-up. For example, although patients with refractory epilepsy or migraine require treatment for many years, most RCTs of new drugs look at the effect of treatment for only a few weeks. Whether initial response is a good predictor of long-term benefit is unknown. The same problem has been identified in RCTs in schizophrenia, with fewer than 50% of trials having greater than six-week follow-up, and only 20% following patients for longer than six months [17]. The contrast between beneficial effects of treatments



in short-term RCTs and the less encouraging experience of long-term treatment in clinical practice has also been highlighted by clinicians treating patients with rheumatoid arthritis [18].

Adverse Effects of Treatment

Reporting of adverse effects of treatment in RCTs and systematic reviews is often poor. In a review of 192 pharmaceutical trials, fewer than a third had adequate reporting of adverse clinical events or laboratory toxicology [19]. Treatment discontinuation rates provide some guide to tolerability, but pharmaceutical trials often use eligibility criteria and run-in periods to exclude patients who might be prone to adverse effects.

Clinicians are usually most concerned about external validity of RCTs of potentially dangerous treatments. Complications of medical interventions are a leading cause of death in developed countries. Risks can be overestimated in RCTs, particularly during the introduction of new treatments when trials are often done in patients with very severe disease, but stringent selection of patients, confinement to specialist centres, and intensive safety monitoring usually lead to lower risks than routine clinical practice. RCTs of warfarin in nonrheumatic atrial fibrillation are good examples. All trials reported benefit with warfarin, but complication rates were much lower than in routine practice, and consequent doubts about external validity are partly to blame for major underprescribing of warfarin, particularly in the elderly [1].

CONCLUSIONS

Some trials have excellent external validity, but many do not, particularly some of those performed by the pharmaceutical industry. Yet researchers, funding agencies, ethics committees, medical journals, and governmental regulators all neglect proper consideration of external validity. Judgment is left to clinicians, but reporting of the determinants of external validity in trial publications, and particularly in secondary reports and clinical guidelines, is rarely adequate and much relevant information is never published. RCTs cannot be expected to produce results that are directly relevant to all patients and all settings, but to be externally valid they should at least be designed and reported in a way that allows clinicians to judge to whom they can reasonably be applied. A consensus is required on how the design and reporting of trials could be improved in order to achieve this aim. Agreement on a list of the most important issues that should be considered by clinicians and researchers would be a helpful first step.

REFERENCES

1. Rothwell PM (2005) External validity of randomised controlled trials: To whom do the results of this trial apply? Lancet 365: 82–93.
2. Tunis SR, Stryer DB, Clancy CM (2003) Practical clinical trials: Increasing the value of clinical research for decision making in clinical and health policy. JAMA 290: 1624–1632.
3. European Carotid Surgery Trialists' Collaborative Group (1998) Randomised trial of endarterectomy for recently symptomatic carotid stenosis: Final results of the MRC European Carotid Surgery Trial (ECST). Lancet 351: 1379–1387.
4. Fine PEM (1995) Variation in protection by BCG: Implications of and for heterologous immunity. Lancet 346: 1339–1345.
5. Asymptomatic Carotid Atherosclerosis Study Group (1995) Carotid endarterectomy for patients with asymptomatic internal carotid artery stenosis. JAMA 273: 1421–1428.
6. Gurwitz JH, Col NF, Avorn J (1992) The exclusion of elderly and women from clinical trials in acute myocardial infarction. JAMA 268: 1417–1422.
7. Charleson ME, Horwitz RI (1984) Applying results of randomised trials to clinical practice: Impact of losses before randomisation. BMJ 289: 1281–1284.
8. Jorgensen HS, Nakayama H, Kammersgaard LP, Raaschou HO, Olsen TS (1999) Predicted impact of intravenous thrombolysis on prognosis of general population of stroke patients: Simulation model. BMJ 319: 288–289.
9. LaRue LJ, Alter M, Traven ND, Sterman AB, Sobel E, et al. (1988) Acute stroke therapy trials: problems in patient accrual. Stroke 19: 950–954.
10. Gruppo Italiano per lo Studio della Streptochinasi nell'Infarto Miocardico [GISSI] (1986) Effectiveness of intravenous thrombolytic treatment in acute myocardial infarction. Lancet 1: 397–402.
11. Shapiro SH, Weijer C, Freedman B (2000) Reporting the study populations of clinical trials. Clear transmission or static on the line? J Clin Epidemiol 53: 973–979.
12. Davis KL, Thal LJ, Gamzu ER, Davis CS, Woolson RF, et al. (1992) A double-blind, placebo-controlled multicenter study of tacrine for Alzheimer's disease. The Tacrine Collaborative Study Group. N Engl J Med 327: 1253–1259.
13. Gorter JW (1999) Major bleeding during anticoagulation after cerebral ischaemia: Patterns and risk factors. Neurology 53: 1319–1327.
14. Gøtzsche PC (1989) Methodology and overt and hidden bias in reports of 196 double-blind trials of nonsteroidal antiinflammatory drugs in rheumatoid arthritis. Control Clin Trials 10: 31–56.
15. Marshall M, Lockwood A, Bradley C, Adams C, Joy C, et al. (2000) Unpublished rating scales: A major source of bias in randomised controlled trials of treatments for schizophrenia? Br J Psychiatry 176: 249–252.
16. Rothwell PM, Warlow CP (1995) Is self-audit reliable? Lancet 346: 1623.
17. Thornley B, Adams CE (1998) Content and quality of 2000 controlled trials in schizophrenia over 50 years. BMJ 317: 1181–1184.
18. Pincus T (1998) Rheumatoid arthritis: Disappointing long-term outcomes despite successful short-term clinical trials. J Clin Epidemiol 41: 1037–1041.
19. Ioannidis JP, Contopoulos-Ioannidis DG (1998) Reporting of safety data from randomised trials. Lancet 352: 1752–1753.
