
11.2 Diagnostic studies
A diagnostic test is a procedure which gives a rapid, convenient and/or inexpensive
indication of whether a patient has a certain disease. Some examples of diagnostic tests
are:
A single screening question for depression: a simple yes/no answer to the
following question: Do you often feel sad or depressed? In a study of stroke patients
at the Royal Liverpool and Broadgreen University Hospitals (BMJ 2001; 323: 1159),
this test was shown to perform well compared to a more complex measure, the
Montgomery Asberg depression rating scale.
The SCOFF questionnaire asks five yes/no questions to determine whether a
patient has an eating disorder.
Do you ever make yourself sick because you feel uncomfortably full?
Do you worry you have lost control over how much you eat?
Have you recently lost more than one stone in a 3 month period?
Do you believe yourself to be fat when others say you are thin?
Would you say that food dominates your life?
Two or more yes answers are considered a positive test. In a study at two
general practices in southwest London (BMJ 2002; 325: 755-756), 341 consecutive
patients were given the SCOFF questionnaire and then a formal interview based on
the Diagnostic and Statistical Manual of Mental Disorders (fourth edition). The
interview lasted 10-15 minutes and the interviewer did not know the score on the
SCOFF questionnaire. The SCOFF questionnaire produced results that were
comparable to the formal interview.
Patients with rectal bleeding will sometimes develop colorectal cancer. In a
study at a network of sentinel practices in Belgium (BMJ 2000; 321: 998-999), 386
patients presented with rectal bleeding between 1993 and 1994. After following these
patients for 18 to 30 months, only a few (27 of the 386) developed colorectal cancer.
A standard electrocardiogram can produce a measure called QTc dispersion.
In a study of 49 patients with peripheral vascular disease (BMJ 1996; 312: 874-878),
all were assessed for their QTc dispersion values. These patients were then followed for
52 to 77 months. During this time, there were 12 cardiac deaths, 3 non-cardiac deaths,
and 34 survivors. A value of QTc dispersion of 60 ms or more did quite well in
predicting cardiac death.
 
11.2.1 Assessing the quality of a diagnostic test
To assess the quality of a diagnostic test, you need to compare it to a gold
standard. This is a measurement that is slower, less convenient, or more expensive
than the diagnostic test, but which also gives a definitive indication of disease status.
The gold standard might involve invasive procedures like a biopsy or could mean
waiting for several years until the disease status becomes obvious.
You classify patients as having the disease or being healthy using the gold
standard. Then you count how often the diagnostic test agrees and disagrees with the
gold standard among the patients with disease, and how often it agrees and disagrees
among the healthy patients.
This leads to four possible categories.
TP (true positive) = # who test positive and who have the disease,
FN (false negative) = # who test negative and who have the disease,
FP (false positive) = # who test positive and who are healthy, and
TN (true negative) = # who test negative and who are healthy.

A good diagnostic test will minimize the number of false negative and false
positive results.
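Under the assumption that you have the gold-standard and test results for each patient coded as booleans, the four counts can be tallied in a few lines. This is a minimal sketch; the function and variable names are invented for illustration.

```python
# Classify each patient into one of the four cells of the 2x2 table.
# gold[i] is True if the gold standard says patient i has the disease;
# test[i] is True if the diagnostic test is positive for patient i.
def two_by_two(gold, test):
    tp = sum(1 for g, t in zip(gold, test) if g and t)          # true positives
    fn = sum(1 for g, t in zip(gold, test) if g and not t)      # false negatives
    fp = sum(1 for g, t in zip(gold, test) if not g and t)      # false positives
    tn = sum(1 for g, t in zip(gold, test) if not g and not t)  # true negatives
    return tp, fn, fp, tn

# Example with six patients: three diseased, three healthy.
gold = [True, True, True, False, False, False]
test = [True, True, False, True, False, False]
print(two_by_two(gold, test))  # (2, 1, 1, 2)
```

Every measure discussed in the rest of this section (sensitivity, specificity, predictive values, likelihood ratios) is built from these four counts.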
 
11.2.2 The role of prevalence
Prevalence is the proportion of patients who have the disease in the population
you are testing. This can vary quite a bit in real situations. For example, the
prevalence of a disease is often much higher in a tertiary care center than at a primary
care physician's office. Prevalence can also vary by season of the year, and by race
or gender.
Prevalence plays a large role in determining how effective a diagnostic test is.
Let's look at a hypothetical situation. In the graph below, patients on the left have the
disease and patients on the right are healthy. If you have the disease, the test can
either be a true positive (TP) or a false negative (FN). If you are healthy, the test
can either be a false positive (FP) or a true negative (TN).

[Figure: a population with high prevalence; diseased patients on the left (TP or FN), healthy patients on the right (FP or TN)]
This situation represents a disease with high prevalence. The test performs
reasonably well. Among the patients with disease only a few test negative. Among the
healthy patients only a few test positive. A positive test is reasonably definitive because
the number of true positives is much larger than the number of false positives.
Let's consider a different hypothetical situation.

[Figure: the same test applied to a population with low prevalence; the false positives among the many healthy patients outnumber the true positives]
In this situation, the prevalence of the disease is much lower. As before only a
few of the patients with disease test negative and only a few of the healthy patients
test positive. But since there are so many more healthy patients, their false positive
results swamp out the true positive results.

In general, when the disease you are testing for is rare, it becomes harder to
positively diagnose that disease. It takes a very good test to find the needle in the
haystack.
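The swamping effect can be checked numerically. The sketch below holds sensitivity and specificity fixed (the 90% and 95% figures are made up for illustration) and shows how the positive predictive value collapses as prevalence falls.

```python
# Positive predictive value as a function of prevalence,
# for a test with fixed sensitivity and specificity.
def ppv(prevalence, sensitivity, specificity):
    tp = prevalence * sensitivity              # rate of true positives
    fp = (1 - prevalence) * (1 - specificity)  # rate of false positives
    return tp / (tp + fp)

# The same test looks excellent at 50% prevalence and poor at 1%.
for prev in (0.50, 0.10, 0.01):
    print(f"prevalence {prev:.0%}: PPV = {ppv(prev, 0.90, 0.95):.1%}")
```

With these assumed figures, the PPV falls from about 95% at even prevalence to about 15% when only 1% of patients have the disease, even though the test itself has not changed.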
 
11.2.3 Fagan nomogram
The Fagan nomogram is a graphical tool for estimating how much the result
on a diagnostic test changes the probability that a patient has a disease (NEJM 1975;
293: 257). A picture of the Fagan nomogram appears below.
To use this tool, you need to provide your best estimate of the probability of
the disease prior to testing. This is usually related to the prevalence of the disease,
though this may be modified up or down on the basis of certain risk factors that are
present in your patient pool or possibly in this particular patient. You also need to
know the likelihood ratio for the diagnostic test.
With this information, draw a line connecting the pre-test probability and the
likelihood ratio. Extend this line until it intersects with the post-test probability
scale. The point of intersection is the new estimate of the probability that your
patient has the disease.

[Figure: the Fagan nomogram, with pre-test probability on the left, likelihood ratio in the center, and post-test probability on the right]
Here are details on how the graph works and how you could construct a
similar graph yourself. The principle is very similar to that of a slide rule.
First, the computations involved use odds rather than probabilities. If you multiply
the pre-test odds by the likelihood ratio, you will get the post-test odds. And since
multiplication of two numbers is equivalent to adding their logarithms, we use a log
scaling for both the odds and the likelihood ratio.
The official formula is:

post-test odds = pre-test odds × likelihood ratio

and the Fagan nomogram uses the equivalent formula

log(post-test odds) = log(pre-test odds) + log(likelihood ratio)
So although the labels on the left and right are written in terms of probability,
the tick marks are spaced at the log odds. For technical reasons, we have to set the
scaling of the log likelihood ratio to 1/2 that of the log odds. We also have to invert
the scale for the log pre-test odds.
So if you wanted to construct this graph yourself, simply plot a range of log
odds at x=+1. Plot an inverted range of log odds at x=-1. Write labels in terms of
probabilities rather than odds. Then plot 1/2 of the log likelihood ratio values at x=0.
Example
Here are a couple of examples of how to use the Fagan nomogram.
A study of an early test for developmental dysplasia of the hip (AJPH 1998;
88(2): 285-288) computed a likelihood ratio for a positive result of 7 for boys (5 for
girls). The prevalence of this condition is 1.5% in boys (6% for girls). Suppose one of
our patients is a boy with no special risk factors. The diagnostic test is positive. What
can we say about the chances that this boy will develop hip dysplasia?

[Figure: Fagan nomogram with a line drawn from a pre-test probability of 1.5% through a likelihood ratio of 7]
The post-test probability is a bit below 10%.
Suppose this boy had a family history of hip dysplasia that would increase our
pre-test probability to 25%. How much would our assessment change if we had a
negative test result?
The likelihood ratio for a negative result is 0.09 for boys (0.2 for girls). So we
would draw a line connecting the pre-test probability of 25% to a likelihood ratio
just a bit below 0.1.

[Figure: Fagan nomogram with a line drawn from a pre-test probability of 25% through a likelihood ratio of about 0.1]
This leads to a post-test probability around 3%.
A web-based version of the Fagan nomogram is available at
www.cebm.net/nomogram.asp. This version requires the Shockwave plug-in.
Summary
You can use a Fagan nomogram to calculate disease probabilities. You draw a line
connecting the pre-test probability of disease and the likelihood ratio. When you extend
this line to the right, it intersects at the post-test probability of disease.
 
11.2.4 Likelihood ratio
The likelihood ratio incorporates both the sensitivity and specificity of the test
and provides a direct estimate of how much a test result will change the odds of
having a disease. The likelihood ratio for a positive result (LR+) tells you how
much the odds of the disease increase when a test is positive. The likelihood ratio

for a negative result (LR-) tells you how much the odds of the disease decrease
when a test is negative.
You combine the likelihood ratio with information about
• the prevalence of the disease,
• characteristics of your patient pool, and
• information about this particular patient
to determine the post-test odds of disease.
If you want to quantify the effect of a diagnostic test, you have to first provide
information about the patient. You need to specify the pre-test odds: the likelihood
that the patient would have a specific disease prior to testing. The pre-test odds are
usually related to the prevalence of the disease, though you might adjust it upwards or
downwards depending on characteristics of your overall patient pool or of the
individual patient.
You are probably more comfortable specifying a probability instead of an
odds, and if so there are simple formulas for converting probabilities into odds. You
also may have some uncertainty about the pre-test odds. In this case, you might propose
a range of values that seem plausible.
You can summarize information about the diagnostic test itself using a
measure called the likelihood ratio. The likelihood ratio combines information about
the sensitivity and specificity. It tells you how much a positive or negative result
changes the likelihood that a patient would have the disease.
The likelihood ratio of a positive test result (LR+) is sensitivity divided by
(1 - specificity):

LR+ = sensitivity / (1 - specificity)
The likelihood ratio of a negative test result (LR-) is (1 - sensitivity) divided by
specificity:

LR- = (1 - sensitivity) / specificity
Once you have specified the pre-test odds, you multiply them by the likelihood
ratio. This gives you the post-test odds:

post-test odds = pre-test odds × likelihood ratio
The post-test odds represent the chances that your patient has the disease. They
incorporate information about the disease prevalence, the patient pool, and specific
patient risk factors (the pre-test odds), as well as information about the diagnostic
test itself (the likelihood ratio).
Example
Consider an early test for developmental dysplasia of the hip. The test has 92%
sensitivity and 86% specificity in boys (AJPH 1998; 88(2): 285-288). The
likelihood ratio for a positive result from this test is 0.92 / (1 - 0.86) = 6.6 for boys.
The likelihood ratio for a negative result from this test is (1 - 0.92) / 0.86 = 0.09 (or
roughly 1/11).
Suppose one of our patients is a boy with no special risk factors. The
diagnostic test is positive. What can we say about the chances that this boy will
develop hip dysplasia? The prevalence of this condition is 1.5% in boys. This
corresponds to an odds of one to 66. Multiplying the odds by the likelihood ratio,
you get 6.6 to 66, or roughly 1 to 10. The post-test odds of having the disease are
1 to 10, which corresponds to a probability of 9%.
Suppose we had a negative result, but it was with a boy who had a family
history of hip dysplasia. Suppose the family history would change the pre-test
probability to 25%. How likely is hip dysplasia, factoring in both the family history
and the negative test result? A probability of 25% corresponds to an odds of 1 to 3.
The likelihood ratio for a negative result is 0.09 or 1/11. So the post-test odds would
be roughly 1 to 33, which corresponds to a probability of 3%.
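The two worked examples above reduce to a few lines of arithmetic. The numbers come straight from the text; only the function names are invented.

```python
def prob_to_odds(p): return p / (1 - p)
def odds_to_prob(o): return o / (1 + o)

sens, spec = 0.92, 0.86
lr_pos = sens / (1 - spec)   # about 6.6
lr_neg = (1 - sens) / spec   # about 0.09

# Boy with no special risk factors, positive test: prevalence 1.5%.
post1 = odds_to_prob(prob_to_odds(0.015) * lr_pos)
# Boy with a family history (pre-test probability 25%), negative test.
post2 = odds_to_prob(prob_to_odds(0.25) * lr_neg)
print(f"{post1:.0%} {post2:.0%}")  # roughly 9% and 3%
```

The same pattern (probability to odds, multiply by the likelihood ratio, odds back to probability) works for any pre-test probability and any test.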
Notice that a negative test seems to change things more than a positive test.
There are two factors at work here. First, a positive result multiplies the pre-test
odds by a factor of only seven whereas a negative result divides the pre-test odds
by 11. This means that the test is better at ruling out a condition than ruling it in.
Second, the impact of a test is usually greatest for mid-sized probabilities. If a
condition is either very rare, or very common, then only a very definitive test is
likely to change things much. But mid-sized probabilities (say between 20% and
80%) will change greatly on the basis of even a moderately precise test.
Technical details
Define the following terms:

D+ = the event that the patient has the disease
D- = the event that the patient is healthy
T+ = the event that the test is positive
T- = the event that the test is negative

With this notation, we can describe all the classic measures for diagnostic tests:

sensitivity: Sn = P[T+ | D+]
specificity: Sp = P[T- | D-]
positive predictive value: PPV = P[D+ | T+]
negative predictive value: NPV = P[D- | T-]
prevalence: P[D+]

Bayes Theorem, a classic result of probability theory, tells us that

P[D+ | T+] = P[T+ | D+] P[D+] / P[T+]
P[D- | T+] = P[T+ | D-] P[D-] / P[T+]

Compute the ratio of these two probabilities:

P[D+ | T+] / P[D- | T+] = (P[T+ | D+] / P[T+ | D-]) × (P[D+] / P[D-])
                        = LR+ × pre-test odds

When the test is negative, we get a similar result:

P[D+ | T-] / P[D- | T-] = (P[T- | D+] / P[T- | D-]) × (P[D+] / P[D-])
                        = LR- × pre-test odds
Summary
The likelihood ratio, which combines information from sensitivity and
specificity, gives an indication of how much the odds of disease change based on a
positive or a negative result. You need to know the pre-test odds, which incorporate
information about the prevalence of the disease, characteristics of your patient pool,
and specific information about this patient. You then multiply the pre-test odds by the
likelihood ratio to get the post-test odds.
 
11.2.5 Negative Predictive Value
The negative predictive value of a test is the probability that the patient will
not have the disease when restricted to all patients who test negative.
You can compute the negative predictive value as
NPV = TN / (TN + FN)
where TN and FN are the number of true negative and false negative results,
respectively. Notice that the denominator for negative predictive value is the
number of patients who test negative. You can also define the negative predictive
value using conditional probabilities,
NPV = P [ Patient is healthy | Test is negative ].
If the prevalence of the disease in your situation is different from the prevalence
of the disease in the research study you are examining, then you can use likelihood
ratios to estimate the NPV.
Do not calculate the negative predictive value on a sample where the
prevalence of the disease was artificially controlled. For example, the NPV is
meaningless in a study where you artificially recruited healthy and diseased patients
in a one to one ratio.
Here is an example.
In a study of depression among 79 patients hospitalized for stroke (BMJ 2001;
323: 1159), 34 patients responded "no" to the question: Do you often feel sad or
depressed? Among these 34 patients who tested negative, 6 had clinical
depression as defined by a more complex measure, the Montgomery Asberg depression
rating scale. Since 28 did not have depression, the negative predictive value is 28 /
34 = 82%. This sample is somewhat unusual in that the prevalence of depression is
43%. The authors recalculated the negative predictive value for a range of different
prevalences. If the group being examined had had a prevalence of depression of only
10%, then the negative predictive value would have been 98%.
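A quick check of the arithmetic in this example, with the counts taken from the text. The prevalence-adjustment function is a sketch of the likelihood-ratio approach mentioned above; it assumes sensitivity and specificity carry over to the new population, and its name is invented for illustration.

```python
def npv(tn, fn):
    """Negative predictive value: TN / (TN + FN)."""
    return tn / (tn + fn)

# Stroke-depression example: 34 patients tested negative, 6 of them depressed.
print(f"NPV = {npv(28, 6):.0%}")  # 82%

# To transport an NPV to a population with a different prevalence, work
# through the negative likelihood ratio rather than the raw counts.
def npv_at_prevalence(lr_neg, prevalence):
    pre_odds = prevalence / (1 - prevalence)
    post_odds = pre_odds * lr_neg  # odds of disease given a negative test
    return 1 - post_odds / (1 + post_odds)
```

For example, a test with LR- = 0.2 applied in a population with 10% prevalence gives an NPV just under 98%.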
 
11.2.6 Positive Predictive Value
The positive predictive value of a test is the probability that the patient has the
disease when restricted to those patients who test positive. This term is sometimes
abbreviated as PPV. You can compute the positive predictive value as
PPV = TP / (TP + FP)

where TP and FP are the number of true positive and false positive results, respectively.
Notice that the denominator for positive predictive value is the number of patients
who test positive. You can also define the positive predictive value using conditional
probabilities,
PPV = P [ Patient has the disease | Test is positive ].
If the prevalence of the disease in your situation is different from the prevalence
of the disease in the research study you are examining, then you can use likelihood
ratios to estimate the PPV.
Do not calculate the positive predictive value on a sample where the
prevalence of the disease was artificially controlled. For example, the PPV is
meaningless in a study where you artificially recruited healthy and diseased patients
in a one to one ratio.
Here is an example.
In a study of patients in a network of sentinel practices in Belgium (BMJ
2000; 321: 998-999), 386 patients presented with rectal bleeding. These patients
were followed for 18 to 30 months and 27 of them developed colorectal cancer. The
positive predictive value for rectal bleeding is 27 / 386 = 7%.
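The same calculation in code, with the counts from the study. Here every patient with rectal bleeding counts as a positive "test", so the 27 who developed cancer are the true positives and the remaining 359 are the false positives.

```python
def ppv(tp, fp):
    """Positive predictive value: TP / (TP + FP)."""
    return tp / (tp + fp)

# 27 of the 386 patients with rectal bleeding developed colorectal cancer.
print(f"PPV = {ppv(27, 386 - 27):.0%}")  # 7%
```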
 
11.2.7 Odds
The experts on this issue live just south of here in a town called Peculiar, Missouri.
The sign just outside city limits reads "Welcome to Peculiar, where the odds are with
you."
Odds are just an alternative way of expressing the likelihood of an event such
as catching the flu. Probability is the expected number of flu patients divided by the
total number of patients. Odds would be the expected number of flu patients divided
by the expected number of non-flu patients.
During the flu season, you might see ten patients in a day. One would have the flu and
the other nine would have something else. So the probability of the flu in your patient
pool would be one out of ten. The odds would be one to nine.
More details
It's easy to convert a probability into an odds. Simply take the probability
and divide it by one minus the probability. Here's the formula:

odds = p / (1 - p)

If you know the odds in favor of an event, the probability is just the odds
divided by one plus the odds. Here's the formula:

p = odds / (1 + odds)
You should get comfortable with converting probabilities to odds and vice
versa. Both are useful depending on the situation.
Example
If both of your parents have an Aa genotype, the probability that you will have
an AA genotype is .25. The odds would be

0.25 / (1 - 0.25) = 1/3

which can also be expressed as one to three. If both of your parents are Aa, then the
probability that you will be Aa is .50. In this case, the odds would be

0.50 / (1 - 0.50) = 1

We will sometimes refer to this as even odds or one to one odds.

When the probability of an event is larger than 50%, the odds will be
larger than 1. When both of your parents are Aa, the probability that you will have at
least one A gene is .75. This means that the odds are

0.75 / (1 - 0.75) = 3

which we can also express as 3 to 1 in favor of inheriting that gene. Let's convert that
odds back into a probability. An odds of 3 would imply that

p = 3 / (1 + 3) = 0.75
Well that's a relief. If we didn't get that answer, Professor Mean would be open
to all sorts of lawsuits.
Suppose the odds against winning a contest were eight to one. We need to
re-express this as odds in favor of the event, and then apply the formula. The odds in
favor would be one to eight, or 0.125. Then we would compute the probability as

p = 0.125 / (1 + 0.125) = 0.111

Notice that in this example, the odds (0.125) and the probability (0.111) did
not differ too much. This pattern tends to hold for rare events. In other words, if a
probability is small, then the odds will be close to the probability. On the other hand,
when the probability is large, the odds will be quite different. Just compare the values
of 0.75 and 3 in the example above.
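The conversions used in these examples are easy to script. This is a minimal sketch of the two formulas above, reproducing the genotype and contest examples.

```python
def prob_to_odds(p):
    # odds = p / (1 - p)
    return p / (1 - p)

def odds_to_prob(o):
    # p = odds / (1 + odds)
    return o / (1 + o)

print(round(prob_to_odds(0.25), 3))   # 0.333, i.e. one-to-three odds
print(prob_to_odds(0.50))             # 1.0, i.e. even odds
print(prob_to_odds(0.75))             # 3.0, i.e. three to one in favor
print(round(odds_to_prob(0.125), 3))  # 0.111, from odds of one to eight
```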
Summary
Odds are an alternative way to express the likelihood that an event will occur.
You can compute the odds by taking the probability of an event and dividing by
one minus the probability. You can convert back to a probability by taking the odds
and dividing by one plus the odds.
 
11.2.8 Sensitivity
The sensitivity of a test is the probability that the test is positive when given
to a group of patients with the disease. Sensitivity is sometimes abbreviated Sn.
The formula for sensitivity is
Sn = TP / (TP + FN)
where TP and FN are the number of true positive and false negative results, respectively.
You can think of sensitivity as 1 - the false negative rate. Notice that the
denominator for sensitivity is the number of patients who have the disease. Using
conditional probabilities, we can also define sensitivity as
Sn = P [ Test is positive | Patient has the disease ]
A large sensitivity means that a negative test can rule out the disease.
David Sackett coined the acronym "SnNOut" to help us remember this.
Here is an example of a sensitivity calculation.
In a study of 5,113 subjects checked for gastric cancer by endoscopy (Gut
1999; 44: 693-697), serum pepsinogen concentrations were also measured. A
pepsinogen I concentration of less than 70 ng/ml and a ratio of pepsinogen I to
pepsinogen II of less than 3 was considered a positive test. There were 13 patients with
gastric cancer confirmed by endoscopy. 11 of these patients were positive on the test.
The sensitivity is 11/13 = 85%.
 
11.2.9 Specificity
The specificity of a test is the probability that the test will be negative among
patients who do not have the disease. Specificity is sometimes abbreviated Sp. The
formula for specificity is

Sp = TN / (TN + FP)
where TN and FP are the number of true negative and false positive results,
respectively. You can think of specificity as 1 - the false positive rate. Notice that
the denominator for specificity is the number of healthy patients. Using
conditional probabilities, we can also define specificity as
Sp = P [ Test is negative | Patient is healthy ]
A large specificity means that a positive test can rule in the disease. David
Sackett coined the acronym "SpPIn" to help us remember this.
Here is an example of a specificity calculation.
In a study of the urine latex agglutination test (AJPH 1998;88(2):285-288),
children were tested for H. influenzae using blood, urine, cerebrospinal fluid, or some
combination of these. Of all the children tested, 1,352 did not have H. influenzae
in any of these fluids. Only 9 of these patients tested positive on the urine latex
agglutination test, the remaining 1,343 tested negative. The specificity is 1343 /
1352 = 99.3%.
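Both definitions reduce to one-line calculations. The counts below come from the two studies above (gastric cancer for sensitivity, urine latex agglutination for specificity); only the function names are invented.

```python
def sensitivity(tp, fn):
    return tp / (tp + fn)  # denominator: patients who have the disease

def specificity(tn, fp):
    return tn / (tn + fp)  # denominator: healthy patients

# Gastric cancer / pepsinogen study: 11 of the 13 cancers tested positive.
print(f"Sn = {sensitivity(11, 2):.0%}")    # 85%
# Urine latex study: 1,343 of the 1,352 healthy children tested negative.
print(f"Sp = {specificity(1343, 9):.1%}")  # 99.3%
```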
 
11.2.10 Multistage screening
To make a diagnosis of HIV, we normally perform an ELISA initially. Those
who are found to be positive are then given a Western blot. This type of testing
is called multistage screening. The initial test is often the more sensitive test and the
subsequent test the more specific. When multistage screening is done, the overall
sensitivity (after both tests are done) is usually lower than the individual test
sensitivities; the overall specificity is usually higher than the individual specificities.
Thus multistage screening improves specificity at the cost of sensitivity.
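If the two tests behave independently given disease status (a common simplifying assumption, and only a rough one for real tests), the combined "positive on both tests" rule has the operating characteristics sketched below. The numeric figures are hypothetical, chosen only to illustrate the direction of the effect.

```python
# Serial (multistage) screening: call a patient positive only if BOTH tests
# are positive. Assumes the tests are conditionally independent.
def serial_sn(sn1, sn2):
    return sn1 * sn2                  # must pass both tests to stay positive

def serial_sp(sp1, sp2):
    return 1 - (1 - sp1) * (1 - sp2)  # a single negative rules the patient out

# Hypothetical first-stage and second-stage figures, for illustration only.
sn = serial_sn(0.98, 0.95)
sp = serial_sp(0.90, 0.99)
print(f"combined Sn = {sn:.3f}, combined Sp = {sp:.4f}")
```

As the section states, the combined sensitivity (0.931) is below either individual sensitivity, while the combined specificity (0.999) is above either individual specificity.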
 
11.2.11 ROC Curves
A Receiver Operating Characteristic (ROC) curve is a graphical method of assessing
the ability of a screening test to discriminate between healthy and diseased persons
(Last 1995, Altman 1994). ROC curves are particularly useful in evaluating tests
which do not give a categorical positive or negative result but instead give a value
on a continuous scale. If different cut off values are chosen to define normal and
diseased, then the screening test will have different values of sensitivity and
specificity for each cut off point. How then does one decide which cut off point to
use? The ROC curve plots the different levels of sensitivity and specificity for
differing cut off points (see Fig. 2). The cut off which gives the best combination of
sensitivity and specificity (the point closest to the top left corner of the curve) will
be the best cut off to use in practice.

Fig. 2 Example of a ROC curve
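A small sketch of how the points on an ROC curve are built from raw scores. The scores here are invented (higher score = more likely diseased), and the "best" cut off is chosen by maximizing sensitivity + specificity (Youden's index), one common way of formalizing the top-left-corner criterion.

```python
# Build ROC points by sweeping the cut off over all observed scores.
# A patient is called positive when their score is >= the cut off.
def roc_points(scores_diseased, scores_healthy):
    points = []
    for cut in sorted(set(scores_diseased + scores_healthy)):
        sn = sum(s >= cut for s in scores_diseased) / len(scores_diseased)
        sp = sum(s < cut for s in scores_healthy) / len(scores_healthy)
        points.append((cut, sn, sp))
    return points

diseased = [7, 8, 8, 9, 10]  # invented scores for diseased patients
healthy  = [3, 4, 5, 6, 8]   # invented scores for healthy patients

# Pick the cut off maximizing sensitivity + specificity (Youden's index).
best = max(roc_points(diseased, healthy), key=lambda p: p[1] + p[2])
print(best)  # (7, 1.0, 0.8): cut off 7 gives Sn = 100%, Sp = 80%
```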
 
11.2.11.1 Choosing the right test
An ideal test with 100% sensitivity and 100% specificity does not exist. We generally
have to choose from the available range of tests with varying sensitivities and
specificities. How does one choose the right test? If it is very important not to miss a
disease which is serious and potentially treatable (cancer, for example), it would be
better to use a test which has greater sensitivity, so as to pick up as many cases as
possible. On the other hand, if making a positive diagnosis would result in much
worry, stigma (HIV, for example) or cost, then it would be better to use a test which
has high specificity.
 
