Critical Appraisal Tools - Sie IT Intermoska-1

SYSTEMATIC REVIEW
Are the results of the review valid?
What question (PICO) did the systematic review address?
What is best? Where do I find the information?
The main question being addressed should be The Title, Abstract or final paragraph of the
clearly stated. The exposure, such as a therapy or Introduction should clearly state the question. If
diagnostic test, and the outcome(s) of interest will you still cannot ascertain what the focused ques-
often be expressed in terms of a simple relation- tion is after reading these sections, search for
ship. another paper!
In this paper
Yes No Unclear
Comment:
F - Is it unlikely that important, relevant studies were missed?
The starting point for a comprehensive search for The Methods section should describe the search
all relevant studies is the major bibliographic da- strategy, including the terms used, in some de-
tabases (eg Medline, Cochrane, EMBASE, etc) but tail. The Results section will outline the number of
should also include a search of reference lists from titles and abstracts reviewed, the number of full-
relevant studies and contact with experts, par- text studies retrieved, and the number of studies
ticularly to inquire about unpublished studies. The
search should not be limited to English language
only. The search strategy should include both
MESH terms and text words.
In this paper
Yes No Unclear
Comment:
A - Were the criteria used to select articles for inclusion
appropriate?
The inclusion or exclusion of studies in a system- The Methods section should describe in detail the
atic review should be clearly defined a priori. The inclusion and exclusion criteria. Normally, this will
eligibility criteria used should specify the patients, include the study design.
interventions or exposures and outcomes of in-
terest. In many cases the type of study design will
also be a key component of the eligibility criteria.
In this paper
Yes No Unclear
Comment:
A - Were the included studies sufficiently valid for the type of

question asked?
The article should describe how the quality of each The Methods section should describe the assess-
study was assessed using predetermined quality ment of quality and the criteria used. The Results
criteria appropriate to the type of clinical question section should provide information on the quality
(e.g., randomization, blinding and completeness of of the individual studies.
follow-up)
In this paper
Yes No Unclear
Comment:
T - Were the results similar from study to study?
Ideally, the results of the different studies should The Results section should state whether the
be similar or homogeneous. If heterogeneity exists results are heterogeneous and discuss possible
the authors may estimate whether the differences reasons. The forest plot should show the results of
are significant (chi-square test). Possible reasons the chi-square test for heterogeneity and discuss
for the heterogeneity should be explored. reasons for heterogeneity, if present.
In this paper
Yes No Unclear
Comment:
What were the results?
A systematic review provides a summary of the data from the results of a number of individual studies. If
the results of the individual studies are similar, a statistical method (called meta-analysis) is used to
combine the results from the individual studies and an overall summary estimate is calculated. The
meta-analysis gives weighted values to each of the individual studies according to their size. The
individual results of the studies need to be expressed in a standard way, such as relative risk, odds ratio or
mean difference between the groups. Results are traditionally displayed in a figure called a forest plot, like
the one below.
The forest plot depicted above represents a meta-analysis of five trials that assessed the effects of a
hypothetical treatment on mortality. Individual studies are represented by a black square and a horizontal
line, which corresponds to the point estimate and 95% confidence interval of the odds ratio. The size of
the black square reflects the weight of the study in the meta-analysis. The solid vertical line corresponds
to ‘no effect’ of treatment - an odds ratio of 1.0. When the confidence interval includes 1 it indicates that
the result is not significant at conventional levels (P>0.05).
The diamond at the bottom represents the combined or pooled odds ratio of all five trials with its 95%
confidence interval. In this case, it shows that the treatment reduces mortality by 34% (OR 0.66 95% CI
0.56 to 0.78). Notice that the diamond does not overlap with the ‘no effect’ line (the confidence interval
doesn’t include 1) so we can be assured that the pooled OR is statistically significant. The test for overall
effect also indicates statistical significance (p<0.0001).
Exploring heterogeneity
Heterogeneity can be assessed using the ‘eyeball’ test or more formally with statistical tests, such as the
Cochran Q test. With the ‘eyeball’ test one looks for overlap of the confidence intervals of the trials with
the summary estimate. In the example above note that the dotted line running vertically through the
combined odds ratio crosses the horizontal lines of all the individual studies indicating that the studies are
homogenous. Heterogeneity can also be assessed using the Cochran chi-square (Cochran Q). If Cochran Q
is statistically significant there is definite heterogeneity. If Cochran Q is not statistically significant but the
ratio of Cochran Q and the degrees of freedom (Q/df) is > 1 there is possible heterogeneity. If Cochran Q
is not statistically significant and Q/df is < 1 then heterogeneity is very unlikely. In the example above Q/df
is <1 (0.92/4= 0.23) and the p-value is not significant (0.92) indicating no heterogeneity.
Note: The level of significance for Cochran Q is often set at 0.1 due to the low power of the test to detect
heterogeneity.
SYSTEMATIC REVIEW
THERAPY STUDY
Are the results of the trial valid? (Internal Validity)
What question did the study ask?

• Patients
• Intervention
• Comparison
• Outcomes
1a. R- Was the assignment of patients to treatments randomised?
Centralised computer randomisation is ideal and The Methods should tell you how patients were
often used in multi-centred trials. Smaller trials allocated to groups and whether or not randomisa-
may use an independent person (eg the hospital tion was concealed.
pharmacy) to ‘police’ the randomization.
In this paper
Yes No Unclear
Comment:
1b. R- Were the groups similar at the start of the trial?
If the randomisation process worked (that is, The Results should have a table of ‘Baseline Char-
achieved comparable groups) the groups should be acteristics’ comparing the randomized groups on a
similar. The more similar the groups the better it is. number of variables that could affect the outcome
There should be some indication of whether differ- (ie age, risk factors etc). If not, there may be a de-
ences between groups are statistically significant scription of group similarity in the first paragraphs
(ie. p values). of the Results section.
In this paper
Yes No Unclear
Comment:
2a. A – Aside from the allocated treatment, were groups treated
equally?
Apart from the intervention the patients in the Look in the Methods section for the follow-up
different groups should be treated the same, eg schedule and permitted additional treatments etc,
additional treatments or tests. and in Results for actual use.
In this paper
Yes No Unclear
Comment:
2b. A – Were all patients who entered the trial accounted for? And
were they analysed in the groups to which they were randomised?
Losses to follow-up should be minimal – prefera- The Results section should say how many patients
bly less than 20%. However, if few patients have were randomised (eg Baseline Characteristics
the outcome of interest, then even small losses table) and how many patients were actually includ-
to follow-up can bias the results. Patients should ed in the analysis. You will need to read the results
also be analysed in the groups to which they were section to clarify the number and reason for losses
randomised – ‘intention-to-treat analysis’. to follow-up.
In this paper
Yes No Unclear
Comment:
3. M - Were measures objective or were the patients and clinicians

kept “blind” to which treatment was being received?
It is ideal if the study is ‘double-blinded’ – that is, First, look in the Methods section to see if there is
both patients and investigators are unaware of some mention of masking of treatments eg pla-
treatment allocation. If the outcome is objective cebos with the same appearance or sham therapy.
(eg death) then blinding is less critical. If the out- Second, the Methods section should describe how
come is subjective (eg symptoms or function) then the outcome was assessed and whether the asses-
blinding of the outcome assessor is critical. sor(s) were aware of the patients’ treatment.
In this paper
Yes No Unclear
Comment:
1. How large was the treatment effect?
Most often results are presented as dichotomous outcomes (yes or no outcomes that happen or don’t
happen) and can include such outcomes as cancer recurrence, myocardial infarction and death. Consider a
study in which 15% (0.15) of the control group died and 10% (0.10) of the treatment group died after 2
years of treatment. The results can be expressed in many ways as shown below.
What is the measure? What does it mean?
Relative Risk (RR) = risk of the outcome in the The relative risk tells us how many times more
treatment group / risk of the outcome in the con- likely it is that an event will occur in the treatment
trol group. group relative to the control group. An RR of 1
means that there is no difference between the two
groups thus, the treatment had no effect. An RR <
1 means that the treatment decreases the risk of
the outcome. An RR > 1 means that the treatment
increased the risk of the outcome.
In our example the RR = 0.10/0.15 = 0.67 Since the RR < 1, the treatment decreases the risk
of death.
Absolute Risk Reduction (ARR) = risk of the The absolute risk reduction tells us the absolute
outcome in the control group - risk of the outcome difference in the rates of events between the two
in the treatment group. This is also known as the groups and gives an indication of the baseline risk
absolute risk difference. and treatment effect. An ARR of 0 means that
there is no difference between the two groups -
therefore the treatment had no effect.
In our example the ARR = 0.15 - 0.10 = 0.05 or The absolute benefit of treatment is a 5% reduc-
5% tion in the death rate.
Relative Risk Reduction (RRR) = absolute risk The relative risk reduction is the complement of the
reduction / risk of the outcome in the control RR and is probably the most commonly reported
group. An alternative way to calculate the RRR is to measure of treatment effects. It tells us the reduc-
subtract the RR from 1 (eg. RRR = 1 - RR) tion in the rate of the outcome in the treatment
group relative to that in the control group.
In our example the RRR = 0.05/0.15 = 0.33 or The treatment reduced the risk of death by 33%
33% relative to that occurring in the control group.
Or
RRR = 1 - 0.67 = 0.33 or 33%
Number Needed to Treat (NNT) = inverse of the The number needed to treat represents the number
ARR and is calculated as 1 / ARR. of patients we need to treat with the experimen-
tal therapy in order to prevent 1 bad outcome and
incorporates the duration of treatment. Clinical
significance can be determined to some extent by
looking at the NNTs, but also by weighing the NNTs
against any harms or adverse effects (NNHs) of
therapy.
In our example the NNT = 1/ 0.05 = 20 We would need to treat 20 people for 2 years in
order to prevent 1 death.
2. How precise was the estimate of the treatment effect?
The true risk of the outcome in the population is not known and the best we can do is estimate the true
risk based on the sample of patients in the trial. This estimate is called the point estimate. We can gauge
how close this estimate is to the true value by looking at the confidence intervals (CI) for each estimate.
If the confidence interval is fairly narrow then we can be confident that our point estimate is a precise re-
flection of the population value. The confidence interval also provides us with information about the sta-
tistical significance of the result. If the value corresponding to no effect falls outside the 95% confidence
interval then the result is statistically significant at the 0.05 level. If the confidence interval includes the
value corresponding to no effect then the results are not statistically significant.
Will the results help me in caring for my patient? (ExternalValidity/Applicability)
The questions that you should ask before you decide to apply the results of the study to your patient are:
• Is my patient so different to those in the study that the results cannot apply?
• Is the treatment feasible in my setting?
• Will the potential benefits of treatment outweigh the potential harms of treatment for my patient?
THERAPY STUDY
DIAGNOSTIC ACCURACY
STUDIES
Are the results of the study valid?
Was the diagnostic test evaluated in a Representative spectrum of

patients (like those in whom it would be used in practice)?
It is ideal if the diagnostic test is applied to the full The Methods section should tell you how patients
spectrum of patients - those with mild, severe, were enrolled and whether they were randomly
early and late cases of the target disorder. It is also selected or consecutive admissions. It should also
best if the patients are randomly selected or con- tell you where patients came from and whether
secutive admissions so that selection bias is mini- they are likely to be representative of the patients
mized. in whom the test is to be used.
In this paper
Yes No Unclear
Comment:
Was the reference standard applied regardless of the index test

result?

Ideally both the index test and the reference The Methods section should indicate whether or
standard should be carried out on all patients in not the reference standard was applied to all pa-
the study. In some situations where the reference tients or if an alternative reference standard (eg
standard is invasive or expensive there may be res- follow-up) was applied to those who tested nega-
ervations about subjecting patients with a negative on the index test.
tive index test result (and thus a low probability of
disease) to the reference standard. An alternative
reference standard is to follow-up people for an
appropriate period of time (dependent on disease
in question) to see if they are truly negative.
In this paper
Yes No Unclear
Comment:
Was there an independent, blind comparison between the index test

and an appropriate reference (‘gold’) standard of diagnosis?

There are two issues here. First, the reference The Methods section should have a description of
standard should be appropriate - as close to the the reference standard used and if you are unsure
‘truth’ as possible. Sometimes there may not be a of whether or not this is an appropriate reference
single reference test that is suitable and a combi- standard you may need to do some background
nation of tests may be used to indicate the pres- searching in the area.
ence of disease. The Methods section should also describe who
Second, the reference standard and the index test conducted the two tests and whether each was
being assessed should be applied to each patient conducted independently and blinded to the results
independently and blindly. Those who interpreted of the other.
the results of one test should not be aware of the
results of the other test.
In this paper
Yes No Unclear
Comment:
Are test charecteristics presented?

There are two types of results commonly reported in diagnostic test studies. One concerns the accuracy
of the test and is reflected in the sensitivity and specificity. The other concerns how the test performs in
the population being tested and is reflected in predictive values (also called post-test probabilities). To ex-
plore the meaning of these terms, consider a study in which 1000 elderly people with suspected dementia
undergo an index test and a reference standard. The prevalence of dementia in this group is 25%. 240
people tested positive on both the index test and the reference standard and 600 people tested nega-
tive on both tests. The first step is to draw a 2 x 2 table as shown below. We are told that the prevalence
of dementia is 25% therefore we can fill in the last row of totals - 25% of 1000 people is 250 - so 250
people will have dementia and 750 will be free of dementia. We also know the number of people testing
positive and negative on both tests and so we can fill in two more cells of the table.
Reference standard
+ve -ve
Index test +ve 240
-ve 600
250 750 1000

By subtraction we can easily complete the table:
Reference standard
+ve -ve
Index test +ve 240 150 390
-ve 10 600 620
250 750 1000
Now we are ready to calculate the various measures.
What is the measure? What does it mean?
Sensitivity (Sn) = the proportion of people with The sensitivity tells us how well the test identifies
the condition who have a positive test result. people with the condition. A highly sensitive test
will not miss many people.
In our example the Sn = 240/250 = 0.96 10 people (4%) with dementia were falsely iden-
tified as not having it. This means the test is fairly
good at identifying people with the condition.
Specificity (Sp) = the proportion of people with- The specificity tells us how well the test identifies
out the condition who have a negative test result. people without the condition. A highly specific test
will not falsely identify many people as having the
condition.
In our example the Sp = 600/750 = 0.80 150 people (20%) without dementia were falsely
identified as having it. This means the test is only
moderately good at identifying people without the
condition.
Positive Predictive Value (PPV) = the proportion This measure tells us how well the test performs in
of people with a positive test who have the condi- this population. It is dependent on the accuracy of
tion. the test (primarily specificity) and the prevalence
of the condition.
In our example the PPV = 240/390 = 0.62 Of the 390 people who had a positive test result,
62% will actually have dementia.
Negative Predictive Value (NPV) = the propor- This measure tells us how well the test performs in
tion of people with a negative test who do not have this population. It is dependent on the accuracy of
the condition. the test and the prevalence of the condition.
In our example the NPV = 600/610 = 0.98 Of the 610 people with a -ve test , 98% will not
have dementia.
Applicability of the results
Were the methods for performing the test described in sufficient

detail to permit replication?

The article should have sufficient description of The Methods section should describe the test in
the test to allow its replication and also detail.
interpretation of the results.
In this paper
Yes No Unclear
Comment:
DIAGNOSTIC ACCURACY
STUDIES
CRITICAL APPRAISAL OF
PROGNOSTIC STUDIES
Are the results of the study valid? (Internal Validity)
1. Was the defined representative sample of patients assembled at a

common (usually early) point in the course of their disease?
It is preferable if study patients are enrolled at a The Methods section should describe the stage
uniformly early time in the disease usually when at which patients entered the study (eg at the
disease first becomes manifest. Such groups of time of first myocardial infarction; Stage 3 breast
patients are called an ‘inception cohort’. Patients cancer). The Methods section should also provide
should also be representative of the underlying information about patient recruitment, whether
population. Patients from tertiary referral centres patients were recruited from primary care or ter-
may have more advanced disease and poorer prog- tiary referral centres.
noses than patients from primary care.
In this paper
Yes No Unclear
Comment:
2. Was patient follow-up sufficiently long and complete?
Length of follow-up should be long enough to The Results section should state the median or
detect the outcome of interest. This will vary de- mean length of follow-up.
pending on the outcome (eg for pregnancy out-
comes, nine months; for cancer, many years). All The Results section should also provide the num-
patients should be followed from the beginning of ber of and the reasons for patients being unavaila-
the study until the outcome of interest or death ble for follow-up. A comparison of the two groups
occurs. Reasons for non follow-up should be pro- (those available and those unavailable) may be
vided along with comparison of the demographic presented in table form or the authors may simply
and clinical characteristics of the patients who state in the text whether or not there were differ-
were unavailable and those in whom follow-up was ences.
complete.
In this paper
Yes No Unclear
Comment:
3. Were outcome criteria either objective or applied in a ‘blind’

fashion?
A clear definition of all outcomes should be pro- The Methods section should provide a clear defini-
vided. It is ideal if less objective outcomes are tion or explicit criteria for each outcome. Whether
assessed blindly, that is, the individual determining determination is blinded to prognostic factors will
the outcome does not know whether the patient be found in either the Methods or Results sections.
has a potential prognostic factor.
In this paper
Yes No Unclear
Comment:
4. If subgroups with different prognoses are identified, did

adjustment for important prognostic factors take place?
A prognostic factor is a patient characteristic (eg The Results section should identify any prognos-
age, stage of disease) that predicts the patient’s tic factors and whether or not these have been
eventual outcome. The study should adjust for adjusted for in the analysis. Also look at the tables
known prognostic factors in the analysis so that and figures for evidence of this (eg there may be
results are not distorted. separate survival curves for patients at different
stages of disease or for different age groups).
In this paper
Yes No Unclear
Comment:
What are the results?
How likely are the outcomes over time?
There are several different ways of reporting outcomes of disease. Often they are reported simply as a
rate (eg the proportion of people experiencing an event). Expressing prognosis as a rate has some advan-
tages. It is simple, easily communicated and understood and readily committed to memory. Unfortunate-
ly, rates convey very little information and there can be important differences in prognosis within similar
summary rates. For this reason survival curves are used to estimate survival of a cohort over time. It is a
useful method for describing any dichotomous outcome (not just survival) that occurs only once during
the follow-up period. The figure below shows the survival curves for three diseases with the same survival
rate at 5 years. Notice that the summary rate obscures important differences to patients
Figure: Five year curves for three different diseases.

How precise are the prognostic estimates?
To determine the precision of the estimates we need to look at the 95% confidence intervals (CI) around
the estimate. The narrower the CI, the more useful the estimate. The precision of the estimates depends
on the number of observations on which the estimate is based. Since earlier follow-up periods usual-
ly include results from more patients than later periods, estimates on the left hand side of the curve are
usually more precise. Observations on the right or tail end of the curve are usually based on a very small
number of people because of deaths, dropouts and late entrants to the study. Consequently, estimates of
survival at the end of the follow-up period are relatively imprecise and can be affected by what happens
to only a few people.
Can I apply this valid, important evidence about prognosis to my patient?
The questions that you should ask before you decide to apply the results of the study to your patients are:
• Is my patient so different to those in the study that the results cannot apply?
• Will this evidence make a clinically important impact on my conclusions about what to offer to tell my
patients?
CRITICAL APPRAISAL OF
PROGNOSTIC STUDIES

Critical Appraisal Tools - Sie IT Intermoska-1

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Critical Appraisal Tools - Sie IT Intermoska-1

Uploaded by

Copyright:

Available Formats

SYSTEMATIC REVIEW

Are the results of the review valid?

What question (PICO) did the systematic review address?

What is best? Where do I find the information?

F - Is it unlikely that important, relevant studies were missed?

What is best? Where do I find the information?

What is best? Where do I find the information?

A - Were the included studies sufficiently valid for the type of

What is best? Where do I find the information?

What is best? Where do I find the information?

What were the results?

Are the results of the trial valid? (Internal Validity)

What question did the study ask?

1a. R- Was the assignment of patients to treatments randomised?

What is best? Where do I find the information?

1b. R- Were the groups similar at the start of the trial?

What is best? Where do I find the information?

What is best? Where do I find the information?

What is best? Where do I find the information?

3. M - Were measures objective or were the patients and clinicians

What is best? Where do I find the information?

What were the results?

1. How large was the treatment effect?

What is the measure? What does it mean?

2. How precise was the estimate of the treatment effect?

Will the results help me in caring for my patient? (ExternalValidity/Applicability)

Was the diagnostic test evaluated in a Representative spectrum of

What is best? Where do I find the information?

Was the reference standard applied regardless of the index test

What is best? Where do I find the information?

Was there an independent, blind comparison between the index test

What is best? Where do I find the information?

What were the results?

Are test charecteristics presented?

250 750 1000

250 750 1000

Now we are ready to calculate the various measures.

What is the measure? What does it mean?

Applicability of the results

Were the methods for performing the test described in sufficient

What is best? Where do I find the information?

1. Was the defined representative sample of patients assembled at a

What is best? Where do I find the information?

2. Was patient follow-up sufficiently long and complete?

What is best? Where do I find the information?

3. Were outcome criteria either objective or applied in a ‘blind’

What is best? Where do I find the information?

4. If subgroups with different prognoses are identified, did

What is best? Where do I find the information?

How likely are the outcomes over time?

Figure: Five year curves for three different diseases.

Can I apply this valid, important evidence about prognosis to my patient?

You might also like