Professional Documents
Culture Documents
Critical Appraisal Tools - Sie IT Intermoska-1
Critical Appraisal Tools - Sie IT Intermoska-1
The main question being addressed should be The Title, Abstract or final paragraph of the
clearly stated. The exposure, such as a therapy or Introduction should clearly state the question. If
diagnostic test, and the outcome(s) of interest will you still cannot ascertain what the focused ques-
often be expressed in terms of a simple relation- tion is after reading these sections, search for
ship. another paper!
In this paper
Yes No Unclear
Comment:
The starting point for a comprehensive search for The Methods section should describe the search
all relevant studies is the major bibliographic da- strategy, including the terms used, in some de-
tabases (eg Medline, Cochrane, EMBASE, etc) but tail. The Results section will outline the number of
should also include a search of reference lists from titles and abstracts reviewed, the number of full-
relevant studies and contact with experts, par- text studies retrieved, and the number of studies
ticularly to inquire about unpublished studies. The
search should not be limited to English language
only. The search strategy should include both
MESH terms and text words.
In this paper
Yes No Unclear
Comment:
A - Were the criteria used to select articles for inclusion
appropriate?
The inclusion or exclusion of studies in a system- The Methods section should describe in detail the
atic review should be clearly defined a priori. The inclusion and exclusion criteria. Normally, this will
eligibility criteria used should specify the patients, include the study design.
interventions or exposures and outcomes of in-
terest. In many cases the type of study design will
also be a key component of the eligibility criteria.
In this paper
Yes No Unclear
Comment:
The article should describe how the quality of each The Methods section should describe the assess-
study was assessed using predetermined quality ment of quality and the criteria used. The Results
criteria appropriate to the type of clinical question section should provide information on the quality
(e.g., randomization, blinding and completeness of of the individual studies.
follow-up)
In this paper
Yes No Unclear
Comment:
T - Were the results similar from study to study?
Ideally, the results of the different studies should The Results section should state whether the
be similar or homogeneous. If heterogeneity exists results are heterogeneous and discuss possible
the authors may estimate whether the differences reasons. The forest plot should show the results of
are significant (chi-square test). Possible reasons the chi-square test for heterogeneity and discuss
for the heterogeneity should be explored. reasons for heterogeneity, if present.
In this paper
Yes No Unclear
Comment:
A systematic review provides a summary of the data from the results of a number of individual studies. If
the results of the individual studies are similar, a statistical method (called meta-analysis) is used to
combine the results from the individual studies and an overall summary estimate is calculated. The
meta-analysis gives weighted values to each of the individual studies according to their size. The
individual results of the studies need to be expressed in a standard way, such as relative risk, odds ratio or
mean difference between the groups. Results are traditionally displayed in a figure called a forest plot, like
the one below.
The forest plot depicted above represents a meta-analysis of five trials that assessed the effects of a
hypothetical treatment on mortality. Individual studies are represented by a black square and a horizontal
line, which corresponds to the point estimate and 95% confidence interval of the odds ratio. The size of
the black square reflects the weight of the study in the meta-analysis. The solid vertical line corresponds
to ‘no effect’ of treatment - an odds ratio of 1.0. When the confidence interval includes 1 it indicates that
the result is not significant at conventional levels (P>0.05).
The diamond at the bottom represents the combined or pooled odds ratio of all five trials with its 95%
confidence interval. In this case, it shows that the treatment reduces mortality by 34% (OR 0.66 95% CI
0.56 to 0.78). Notice that the diamond does not overlap with the ‘no effect’ line (the confidence interval
doesn’t include 1) so we can be assured that the pooled OR is statistically significant. The test for overall
effect also indicates statistical significance (p<0.0001).
Exploring heterogeneity
Heterogeneity can be assessed using the ‘eyeball’ test or more formally with statistical tests, such as the
Cochran Q test. With the ‘eyeball’ test one looks for overlap of the confidence intervals of the trials with
the summary estimate. In the example above note that the dotted line running vertically through the
combined odds ratio crosses the horizontal lines of all the individual studies indicating that the studies are
homogenous. Heterogeneity can also be assessed using the Cochran chi-square (Cochran Q). If Cochran Q
is statistically significant there is definite heterogeneity. If Cochran Q is not statistically significant but the
ratio of Cochran Q and the degrees of freedom (Q/df) is > 1 there is possible heterogeneity. If Cochran Q
is not statistically significant and Q/df is < 1 then heterogeneity is very unlikely. In the example above Q/df
is <1 (0.92/4= 0.23) and the p-value is not significant (0.92) indicating no heterogeneity.
Note: The level of significance for Cochran Q is often set at 0.1 due to the low power of the test to detect
heterogeneity.
SYSTEMATIC REVIEW
THERAPY STUDY
Centralised computer randomisation is ideal and The Methods should tell you how patients were
often used in multi-centred trials. Smaller trials allocated to groups and whether or not randomisa-
may use an independent person (eg the hospital tion was concealed.
pharmacy) to ‘police’ the randomization.
In this paper
Yes No Unclear
Comment:
If the randomisation process worked (that is, The Results should have a table of ‘Baseline Char-
achieved comparable groups) the groups should be acteristics’ comparing the randomized groups on a
similar. The more similar the groups the better it is. number of variables that could affect the outcome
There should be some indication of whether differ- (ie age, risk factors etc). If not, there may be a de-
ences between groups are statistically significant scription of group similarity in the first paragraphs
(ie. p values). of the Results section.
In this paper
Yes No Unclear
Comment:
2a. A – Aside from the allocated treatment, were groups treated
equally?
Apart from the intervention the patients in the Look in the Methods section for the follow-up
different groups should be treated the same, eg schedule and permitted additional treatments etc,
additional treatments or tests. and in Results for actual use.
In this paper
Yes No Unclear
Comment:
2b. A – Were all patients who entered the trial accounted for? And
were they analysed in the groups to which they were randomised?
Losses to follow-up should be minimal – prefera- The Results section should say how many patients
bly less than 20%. However, if few patients have were randomised (eg Baseline Characteristics
the outcome of interest, then even small losses table) and how many patients were actually includ-
to follow-up can bias the results. Patients should ed in the analysis. You will need to read the results
also be analysed in the groups to which they were section to clarify the number and reason for losses
randomised – ‘intention-to-treat analysis’. to follow-up.
In this paper
Yes No Unclear
Comment:
It is ideal if the study is ‘double-blinded’ – that is, First, look in the Methods section to see if there is
both patients and investigators are unaware of some mention of masking of treatments eg pla-
treatment allocation. If the outcome is objective cebos with the same appearance or sham therapy.
(eg death) then blinding is less critical. If the out- Second, the Methods section should describe how
come is subjective (eg symptoms or function) then the outcome was assessed and whether the asses-
blinding of the outcome assessor is critical. sor(s) were aware of the patients’ treatment.
In this paper
Yes No Unclear
Comment:
Most often results are presented as dichotomous outcomes (yes or no outcomes that happen or don’t
happen) and can include such outcomes as cancer recurrence, myocardial infarction and death. Consider a
study in which 15% (0.15) of the control group died and 10% (0.10) of the treatment group died after 2
years of treatment. The results can be expressed in many ways as shown below.
Relative Risk (RR) = risk of the outcome in the The relative risk tells us how many times more
treatment group / risk of the outcome in the con- likely it is that an event will occur in the treatment
trol group. group relative to the control group. An RR of 1
means that there is no difference between the two
groups thus, the treatment had no effect. An RR <
1 means that the treatment decreases the risk of
the outcome. An RR > 1 means that the treatment
increased the risk of the outcome.
In our example the RR = 0.10/0.15 = 0.67 Since the RR < 1, the treatment decreases the risk
of death.
Absolute Risk Reduction (ARR) = risk of the The absolute risk reduction tells us the absolute
outcome in the control group - risk of the outcome difference in the rates of events between the two
in the treatment group. This is also known as the groups and gives an indication of the baseline risk
absolute risk difference. and treatment effect. An ARR of 0 means that
there is no difference between the two groups -
therefore the treatment had no effect.
In our example the ARR = 0.15 - 0.10 = 0.05 or The absolute benefit of treatment is a 5% reduc-
5% tion in the death rate.
Relative Risk Reduction (RRR) = absolute risk The relative risk reduction is the complement of the
reduction / risk of the outcome in the control RR and is probably the most commonly reported
group. An alternative way to calculate the RRR is to measure of treatment effects. It tells us the reduc-
subtract the RR from 1 (eg. RRR = 1 - RR) tion in the rate of the outcome in the treatment
group relative to that in the control group.
In our example the RRR = 0.05/0.15 = 0.33 or The treatment reduced the risk of death by 33%
33% relative to that occurring in the control group.
Or
RRR = 1 - 0.67 = 0.33 or 33%
Number Needed to Treat (NNT) = inverse of the The number needed to treat represents the number
ARR and is calculated as 1 / ARR. of patients we need to treat with the experimen-
tal therapy in order to prevent 1 bad outcome and
incorporates the duration of treatment. Clinical
significance can be determined to some extent by
looking at the NNTs, but also by weighing the NNTs
against any harms or adverse effects (NNHs) of
therapy.
In our example the NNT = 1/ 0.05 = 20 We would need to treat 20 people for 2 years in
order to prevent 1 death.
The true risk of the outcome in the population is not known and the best we can do is estimate the true
risk based on the sample of patients in the trial. This estimate is called the point estimate. We can gauge
how close this estimate is to the true value by looking at the confidence intervals (CI) for each estimate.
If the confidence interval is fairly narrow then we can be confident that our point estimate is a precise re-
flection of the population value. The confidence interval also provides us with information about the sta-
tistical significance of the result. If the value corresponding to no effect falls outside the 95% confidence
interval then the result is statistically significant at the 0.05 level. If the confidence interval includes the
value corresponding to no effect then the results are not statistically significant.
The questions that you should ask before you decide to apply the results of the study to your patient are:
• Is my patient so different to those in the study that the results cannot apply?
• Is the treatment feasible in my setting?
• Will the potential benefits of treatment outweigh the potential harms of treatment for my patient?
THERAPY STUDY
DIAGNOSTIC ACCURACY
STUDIES
Are the results of the study valid?
It is ideal if the diagnostic test is applied to the full The Methods section should tell you how patients
spectrum of patients - those with mild, severe, were enrolled and whether they were randomly
early and late cases of the target disorder. It is also selected or consecutive admissions. It should also
best if the patients are randomly selected or con- tell you where patients came from and whether
secutive admissions so that selection bias is mini- they are likely to be representative of the patients
mized. in whom the test is to be used.
In this paper
Yes No Unclear
Comment:
Comment:
Reference standard
+ve -ve
Index test +ve 240
-ve 600
Reference standard
+ve -ve
Index test +ve 240 150 390
-ve 10 600 620
Sensitivity (Sn) = the proportion of people with The sensitivity tells us how well the test identifies
the condition who have a positive test result. people with the condition. A highly sensitive test
will not miss many people.
In our example the Sn = 240/250 = 0.96 10 people (4%) with dementia were falsely iden-
tified as not having it. This means the test is fairly
good at identifying people with the condition.
Specificity (Sp) = the proportion of people with- The specificity tells us how well the test identifies
out the condition who have a negative test result. people without the condition. A highly specific test
will not falsely identify many people as having the
condition.
In our example the Sp = 600/750 = 0.80 150 people (20%) without dementia were falsely
identified as having it. This means the test is only
moderately good at identifying people without the
condition.
Positive Predictive Value (PPV) = the proportion This measure tells us how well the test performs in
of people with a positive test who have the condi- this population. It is dependent on the accuracy of
tion. the test (primarily specificity) and the prevalence
of the condition.
In our example the PPV = 240/390 = 0.62 Of the 390 people who had a positive test result,
62% will actually have dementia.
Negative Predictive Value (NPV) = the propor- This measure tells us how well the test performs in
tion of people with a negative test who do not have this population. It is dependent on the accuracy of
the condition. the test and the prevalence of the condition.
In our example the NPV = 600/610 = 0.98 Of the 610 people with a -ve test , 98% will not
have dementia.
In this paper
Yes No Unclear
Comment:
DIAGNOSTIC ACCURACY
STUDIES
CRITICAL APPRAISAL OF
PROGNOSTIC STUDIES
Are the results of the study valid? (Internal Validity)
It is preferable if study patients are enrolled at a The Methods section should describe the stage
uniformly early time in the disease usually when at which patients entered the study (eg at the
disease first becomes manifest. Such groups of time of first myocardial infarction; Stage 3 breast
patients are called an ‘inception cohort’. Patients cancer). The Methods section should also provide
should also be representative of the underlying information about patient recruitment, whether
population. Patients from tertiary referral centres patients were recruited from primary care or ter-
may have more advanced disease and poorer prog- tiary referral centres.
noses than patients from primary care.
In this paper
Yes No Unclear
Comment:
Length of follow-up should be long enough to The Results section should state the median or
detect the outcome of interest. This will vary de- mean length of follow-up.
pending on the outcome (eg for pregnancy out-
comes, nine months; for cancer, many years). All The Results section should also provide the num-
patients should be followed from the beginning of ber of and the reasons for patients being unavaila-
the study until the outcome of interest or death ble for follow-up. A comparison of the two groups
occurs. Reasons for non follow-up should be pro- (those available and those unavailable) may be
vided along with comparison of the demographic presented in table form or the authors may simply
and clinical characteristics of the patients who state in the text whether or not there were differ-
were unavailable and those in whom follow-up was ences.
complete.
In this paper
Yes No Unclear
Comment:
A clear definition of all outcomes should be pro- The Methods section should provide a clear defini-
vided. It is ideal if less objective outcomes are tion or explicit criteria for each outcome. Whether
assessed blindly, that is, the individual determining determination is blinded to prognostic factors will
the outcome does not know whether the patient be found in either the Methods or Results sections.
has a potential prognostic factor.
In this paper
Yes No Unclear
Comment:
A prognostic factor is a patient characteristic (eg The Results section should identify any prognos-
age, stage of disease) that predicts the patient’s tic factors and whether or not these have been
eventual outcome. The study should adjust for adjusted for in the analysis. Also look at the tables
known prognostic factors in the analysis so that and figures for evidence of this (eg there may be
results are not distorted. separate survival curves for patients at different
stages of disease or for different age groups).
In this paper
Yes No Unclear
Comment:
What are the results?
There are several different ways of reporting outcomes of disease. Often they are reported simply as a
rate (eg the proportion of people experiencing an event). Expressing prognosis as a rate has some advan-
tages. It is simple, easily communicated and understood and readily committed to memory. Unfortunate-
ly, rates convey very little information and there can be important differences in prognosis within similar
summary rates. For this reason survival curves are used to estimate survival of a cohort over time. It is a
useful method for describing any dichotomous outcome (not just survival) that occurs only once during
the follow-up period. The figure below shows the survival curves for three diseases with the same survival
rate at 5 years. Notice that the summary rate obscures important differences to patients
To determine the precision of the estimates we need to look at the 95% confidence intervals (CI) around
the estimate. The narrower the CI, the more useful the estimate. The precision of the estimates depends
on the number of observations on which the estimate is based. Since earlier follow-up periods usual-
ly include results from more patients than later periods, estimates on the left hand side of the curve are
usually more precise. Observations on the right or tail end of the curve are usually based on a very small
number of people because of deaths, dropouts and late entrants to the study. Consequently, estimates of
survival at the end of the follow-up period are relatively imprecise and can be affected by what happens
to only a few people.
The questions that you should ask before you decide to apply the results of the study to your patients are:
• Is my patient so different to those in the study that the results cannot apply?
• Will this evidence make a clinically important impact on my conclusions about what to offer to tell my
patients?
CRITICAL APPRAISAL OF
PROGNOSTIC STUDIES