
SECTION III • Sources of Error: Anomalies, Artifacts, Technical Factors, and Statistics

9 Basic Statistics for Electrodiagnostic Studies

For every electrodiagnostic (EDX) test performed, one needs to decide if the study is normal or abnormal. That determination often needs to be made in real time as the testing progresses, so that the study can be modified based on new information obtained as the testing proceeds. However, interpreting a test as normal or abnormal is not always straightforward, and requires some understanding of basic statistics. A full discussion of statistics is beyond the scope and purpose of this text, but there are some basic statistical concepts that every electromyographer needs to know in order to properly interpret a study.

No two normal individuals have precisely the same findings on any biologic measurement, regardless of whether it is a serum sodium level, a hematocrit level, or a distal median motor latency. Most populations can be modeled as a normal distribution, wherein there is a variation of values above and below the mean. This normal distribution results in the commonly described bell-shaped curve (Figure 9–1). The center of the bell-shaped curve is the mean or average value of a test. It is defined as follows:

$$\text{Mean} = \frac{\sum_{i=1}^{N} x_i}{N}$$

where $x_i$ = an individual test result, and $N$ = total number of individuals tested.

FIGURE 9–1  Normal distribution. Many biologic variables can be modeled as a normal distribution wherein there is a variation of values above and below the mean. This normal distribution results in a bell-shaped curve. The center of the bell-shaped curve is the mean or average value of a test. The numbers on the x-axis represent the number of standard deviations above and below the mean. The standard deviation is a measure of the dispersion or variation in a distribution. The number of standard deviations above and below the mean defines a certain portion of the population.
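As a quick numeric illustration (not from the text), the mean can be computed directly from a small sample; the latency values below are hypothetical, and the standard deviation shown uses the sample (N − 1) form defined later in this chapter:

```python
import statistics

# Hypothetical distal median motor latencies (ms) -- illustrative values only
latencies = [3.8, 4.0, 4.1, 3.9, 4.2, 4.0, 3.7, 4.1]

N = len(latencies)
mean = sum(latencies) / N            # Mean = (x1 + x2 + ... + xN) / N
sd = statistics.stdev(latencies)     # sample SD, divides by N - 1

# An upper cutoff set at 2 SD above the mean, as is common for EDX studies
upper_cutoff = mean + 2 * sd

print(f"mean = {mean:.2f} ms, SD = {sd:.2f} ms, upper cutoff = {upper_cutoff:.2f} ms")
```

Setting the cutoff at 2 SD above the mean is exactly the choice discussed below: roughly 97.5% of a normally distributed reference population falls beneath it.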
The standard deviation (SD) is a statistic used as a measure of the dispersion or variation in a distribution. In general, it is a measure of the extent to which numbers are spread around their average. It is defined as follows:

$$\text{SD} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \text{Mean})^2}{N - 1}}$$

The reasons that the SD is such a useful measure of the scatter of the population in a normal distribution are as follows (Figure 9–1):

• The range covered between 1 SD above and below the mean is about 68% of the observations.
• The range covered between 2 SD above and below the mean is about 95% of the observations.
• The range covered between 3 SD above and below the mean is about 99.7% of the observations.

In EDX studies, one usually uses a lower or upper cutoff value, not both. For instance, a normal serum sodium may be 130 to 145 mmol/L (lower and upper cutoffs); however, a normal median distal motor latency is less than 4.4 ms (i.e., there is no lower cutoff, because there is no median distal motor latency that is too good). Thus, for tests where the abnormal values are limited to one tail of the bell-shaped curve instead of two:

• All observations up to 2 SD beyond the mean include approximately 97.5% of the population.
• All observations up to 2.5 SD beyond the mean include approximately 99.4% of the population.

These facts are important because cutoff values for most EDX studies often are set at 2 or 2.5 SD above or below the mean for upper and lower cutoff limits, respectively. After cutoff limits are established, one must next appreciate the important concepts of specificity and sensitivity of a test.

The specificity of a test is the percentage of all patients without the condition (i.e., normals) who have a negative test. Thus, when a test is applied to a population of patients

©2013 Elsevier Inc. DOI: 10.1016/B978-1-4557-2672-1.00009-X
Downloaded from ClinicalKey.com at Univ Targu Mures Med Pharmacy March 28, 2016. For personal use only. No other uses without permission. Copyright ©2016 Elsevier Inc. All rights reserved.

who are normal, the test will correctly identify as normal all patients who do not exceed the cutoff value (true negative); however, it will misidentify a small number of normal patients as abnormal (false positive) (Figure 9–2, left). It is important to remember that not every positive test is necessarily a true positive; there will always be a small percentage of patients (approximately 1–2%) who will be misidentified.

The sensitivity of a test is the percentage of all patients with the condition who have a positive test. When a test is applied to a disease population, the test will correctly identify all abnormal patients who exceed the cutoff value (true positive); however, it will misidentify a small number of abnormal patients as normal (false negative) (Figure 9–2, right). Thus, it is equally important to remember that not every negative test is necessarily a true negative; there will always be a small percentage of abnormal patients (approximately 1–2%) who will be misidentified as normal. Thus, the specificity and sensitivity can be calculated as follows:

$$\text{Specificity (\%)} = \frac{\text{True Negatives}}{\text{True Negatives} + \text{False Positives}} \times 100$$

$$\text{Sensitivity (\%)} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \times 100$$

In an ideal setting, there would be no overlap between a normal and a disease population. Then, a cutoff value could be placed between the two populations, and such a test would have 100% sensitivity and 100% specificity (Figure 9–3, left). However, in the real world, there is always some overlap between a normal and a disease population (Figure 9–3, right). If a test has very high sensitivity and specificity, it will correctly identify nearly all normals and abnormals; however, there will remain a small number of normal patients misidentified as abnormal (false positive) and a small number of abnormal patients misidentified as normal (false negative).

Often there is a compromise between sensitivity and specificity when setting a cutoff value. Take the example of a normal and a disease population where there is significant overlap between the populations for the value of a test. If the cutoff value is set low, the test will have high sensitivity but very low specificity (Figure 9–4). In this case, the test will correctly diagnose nearly all the abnormals (true positive) and will misidentify only a few as normal (false negative) (Figure 9–4, left). However, the tradeoff for this high sensitivity will be low specificity. In this case, a high number of normal patients will be classified as abnormal (false positive) (Figure 9–4, right).

Conversely, take the example where the cutoff value is set high. The test will now have high specificity but very

FIGURE 9–2  Cutoff values and false results. Left: When a test is applied to a population of normal patients, the test will correctly identify all patients who are below the cutoff value as normal (true negative [green in the figure]); however, it will misidentify a small number of normal patients who are above the cutoff value as abnormal (false positive [dark blue in the figure]). Right: When a test is applied to a disease population, the test will correctly identify all abnormal patients who exceed the cutoff value (true positive [red in the figure]); however, it will misidentify a small number of abnormal patients who are below the cutoff value as normal (false negative [dark blue in the figure]).
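The specificity and sensitivity formulas can be sketched directly from counts of the four outcome categories; the counts below are illustrative, not taken from the text:

```python
def specificity(tn: int, fp: int) -> float:
    """Specificity (%) = TN / (TN + FP) * 100."""
    return tn / (tn + fp) * 100

def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity (%) = TP / (TP + FN) * 100."""
    return tp / (tp + fn) * 100

# Hypothetical counts from applying one test to 200 normals and 200 patients
tn, fp = 195, 5    # normals: 195 correctly negative, 5 false positives
tp, fn = 140, 60   # patients: 140 correctly positive, 60 false negatives

print(f"specificity = {specificity(tn, fp):.1f}%")  # 97.5%
print(f"sensitivity = {sensitivity(tp, fn):.1f}%")  # 70.0%
```

Note that specificity is computed entirely from the normal population and sensitivity entirely from the disease population; neither says anything by itself about how likely a given positive result is to be real.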

FIGURE 9–3  Selection of cutoff values. Left: Ideally, there would be no overlap between a normal (green) and a disease (red) population, and the cutoff value could be placed between the two populations, yielding 100% sensitivity and 100% specificity. Right: In biologic populations, there is always some overlap between normal and disease populations. If a test has very high sensitivity and specificity, it will correctly identify nearly all normals and abnormals; however, there will remain a small number of normal patients misidentified as abnormal (false positive [yellow in the figure]) and a small number of abnormal patients misidentified as normal (false negative [dark blue in the figure]).
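The tradeoff that comes from sliding a single cutoff across two overlapping populations (Figure 9–3) can be simulated; the two distributions below are hypothetical stand-ins for a normal and a disease population, not data from the text:

```python
import random

random.seed(0)
# Hypothetical overlapping populations (e.g., a latency-difference value in ms)
normals = [random.gauss(0.0, 0.15) for _ in range(10_000)]
patients = [random.gauss(0.5, 0.25) for _ in range(10_000)]

# Sweep the cutoff: a low cutoff favors sensitivity, a high cutoff favors specificity
for cutoff in (0.1, 0.3, 0.5):
    sens = sum(x > cutoff for x in patients) / len(patients) * 100
    spec = sum(x <= cutoff for x in normals) / len(normals) * 100
    print(f"cutoff {cutoff:.1f} ms: sensitivity {sens:.0f}%, specificity {spec:.0f}%")
```

Running the sweep shows sensitivity falling and specificity rising as the cutoff moves up, which is precisely the compromise illustrated in Figures 9–4 and 9–5.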


FIGURE 9–4  Advantages and disadvantages: high sensitivity and low specificity. Left: If the cutoff value is set low (high sensitivity), the test will correctly diagnose nearly all the abnormals (true positives [red in the figure]) and will misidentify only a few as normal (false negatives [dark blue in the figure]). Right: The tradeoff for this high sensitivity will be low specificity. In this case, some normals will be identified as normal (true negatives [green in the figure]), but a high number of normal patients will be classified as abnormal (false positives [light blue in the figure]).

FIGURE 9–5  Advantages and disadvantages: high specificity and low sensitivity. Left: If the cutoff value is set high (high specificity), the test will correctly identify nearly all of the normals (true negatives [green in the figure]) and will misidentify only a few normals as abnormal (false positives [light blue in the figure]). Right: The tradeoff for this high specificity will be low sensitivity. Here, some abnormal patients will be identified as abnormal (true positives [red in the figure]), but a high number of abnormal patients will be classified as normal (false negatives [dark blue in the figure]).

low sensitivity (Figure 9–5). In this case, the test will correctly identify nearly all the normals (true negative) and will misidentify only a few normals as abnormal (false positive) (Figure 9–5, left). However, the tradeoff for this high specificity will be low sensitivity. Here, a high number of abnormal patients will be classified as normal (false negative) (Figure 9–5, right).

False positives and false negatives result in what are termed type I and type II errors, respectively. In a type I error, a diagnosis of an abnormality is made when none is present (i.e., convicting an innocent man). Conversely, in a type II error, a diagnosis of no abnormality is made when one actually is present (i.e., letting a guilty man go free). Although both are important, type I errors are generally considered more unacceptable (i.e., labeling patients as having an abnormality when they are truly normal), because this can lead to a host of problems, among them inappropriate testing and treatment. Thus, the specificity of a test should take precedence over the sensitivity, unless the test is being used as a screening tool alone (i.e., any positive screening test must be confirmed by a much more specific test before any conclusion is reached).

The tradeoff between sensitivity and specificity can be appreciated by plotting a receiver operating characteristic (ROC) curve, which graphs various cutoff values by their sensitivity on the y-axis and specificity on the x-axis (actually, in a typical ROC curve, the x-axis is 1 minus the specificity, which can alternatively be graphed as the specificity going from 100 to 0, instead of 0 to 100). Figure 9–6 shows an ROC curve for the digit 4 sensory nerve conduction study in patients with mild carpal tunnel syndrome. For this nerve conduction study, the sensory latency stimulating the ulnar nerve at the wrist and recording digit 4 is subtracted from the sensory latency stimulating the median nerve at the wrist and recording digit 4, using identical distances. In normals, one expects there to be no significant difference. In patients with carpal tunnel syndrome, the median latency is expected to be longer than the ulnar latency. Note in Figure 9–6 that there is a tradeoff between specificity and sensitivity as the cutoff value changes. For any cutoff value 0.4 ms or greater, there is a very high specificity. As the cutoff value is lowered, the sensitivity increases but at a significant cost to the specificity. In this example, it is easy to appreciate that the 0.4 ms cutoff is where the


FIGURE 9–6  Receiver operating characteristic (ROC) curve for the digit 4 (D4) study. The graph demonstrates the tradeoff between sensitivity and specificity for various test values of the D4 study. The normal cutoff value (arrow) was set to obtain a 97.5% specificity (dashed line). The ROC curve shows a sharp turn at the cutoff value, maximizing sensitivity and specificity. (Adapted from Nodera, H., Herrmann, D.N., Holloway, R.G., et al., 2003. A Bayesian argument against rigid cutoffs in electrodiagnosis of median neuropathy at the wrist. Neurology 60, 458–464.)

graph abruptly changes its slope. Setting the cutoff value at 0.4 ms or greater achieves a specificity greater than 97%. The sensitivity is approximately 70%. One could place the cutoff value at 0.1 ms and achieve a sensitivity of 90%; however, the specificity would fall to about 60%, meaning 40% of normal patients would be misidentified as abnormal, a clearly unacceptable level.

Important clinical–electrophysiologic implications are as follows:

1. Because of the normal variability and overlap between normal and disease populations, all EDX studies will have a small number of false-positive results and false-negative results.
2. Thus, EDX studies can never completely "rule out" any condition. Likewise, they can never completely "rule in" any condition.
3. Remember that a small number of false-positive results are expected. Always keep in mind the possibility of a type I error (i.e., convicting an innocent man) and the ramifications such an error can have.

FIGURE 9–7  Predictive value of a positive test: high prevalence of disease. Population = 1000; prevalence of disease = 80% (disease 800, normal 200); sensitivity = 95%, specificity = 95%. True positives 760, false negatives 40; false positives 10, true negatives 190. Predictive value of a positive test = (true positives)/(true positives + false positives) = 760/(760 + 10) = 98.7%. See text for details.

BAYES' THEOREM AND THE PREDICTIVE VALUE OF A POSITIVE TEST

Bayes' theorem states that the probability of a test demonstrating a true positive depends not only on the sensitivity and specificity of a test but also on the prevalence of the disease in the population being studied. The chance of a positive test being a true positive is markedly higher in a population with a high prevalence of the disease. In contrast, if a very sensitive and specific test is applied to a population with a very low prevalence of the disease, most positive tests will actually be false positives. The predictive value of a positive test is best explained by contrasting two examples (Figures 9–7 and 9–8). In both examples, the same test with a 95% sensitivity and a 95% specificity is applied to a population of 1000 patients. In Figure 9–7, the prevalence of the disease in the population is high (80%); in Figure 9–8, the prevalence is low (1%). In the population with a disease prevalence of 80%, 760 of the 800 patients with the disease will be correctly identified; of the 200 normals, 10 will be misidentified as abnormal (false positives). The predictive value of a positive test is defined as the number of true positives divided by the number of total positives. The total positives are the true positives added to the false positives. In Figure 9–7, the predictive value that a positive test is a true positive is 760/(760 + 10) = 98.7%. Thus, in this example, where the


disease prevalence in the population is high, a positive test is extremely helpful in correctly identifying the patient as having the disease.

In the example where the disease prevalence is 1% (Figure 9–8), of the 10 patients with the disease, 9.5 will be correctly identified. However, of the 990 normals, 49.5 will be misidentified as abnormal. Thus, the predictive value that a positive test is a true positive is 9.5/(9.5 + 49.5) = 16.1%. This means that 83.9% of the positive results will actually be false! In this setting, where the disease prevalence in the population is low, a highly sensitive and specific test is of absolutely no value.

Although this analysis may seem distressing, the good news is that EDX studies are generally performed in patients with a high index of suspicion for the disorder being questioned; hence, the prevalence of the disease is high. For instance, take the example of a patient referred to the EDX laboratory for possible carpal tunnel syndrome. If the patient has pain in the wrist and hand, paresthesias of the first four fingers, and symptoms provoked by sleep, driving, and holding a phone, the prevalence of carpal tunnel syndrome in patients with such symptoms would be extremely high. Thus, if EDX studies are performed and demonstrate delayed median nerve responses across the wrist, there is a very high likelihood that these positive tests are true positives. However, if the same tests are performed in a patient with back pain and no symptoms in the hands and fingers, the prevalence of carpal tunnel syndrome would be low in such a population. In this situation, any positive finding would have a high likelihood of being a false positive and would likely not be of any clinical significance.
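The two worked examples above can be reproduced in a few lines; this sketch simply restates the Bayes' theorem arithmetic from the text, with prevalence playing the role of the pre-test probability:

```python
def positive_predictive_value(prevalence: float, sensitivity: float,
                              specificity: float, population: int = 1000) -> float:
    """Predictive value of a positive test = TP / (TP + FP), in percent."""
    diseased = population * prevalence
    normal = population - diseased
    true_positives = diseased * sensitivity
    false_positives = normal * (1 - specificity)
    return true_positives / (true_positives + false_positives) * 100

# High prevalence (Figure 9-7): 760 / (760 + 10) = 98.7%
print(f"{positive_predictive_value(0.80, 0.95, 0.95):.1f}%")
# Low prevalence (Figure 9-8): 9.5 / (9.5 + 49.5) = 16.1%
print(f"{positive_predictive_value(0.01, 0.95, 0.95):.1f}%")
```

The same test, with identical sensitivity and specificity, goes from nearly certain to mostly wrong purely because the prevalence changed.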
FIGURE 9–8  Predictive value of a positive test: low prevalence of disease. Population = 1000; prevalence of disease = 1% (disease 10, normal 990); sensitivity = 95%, specificity = 95%. True positives 9.5, false negatives 0.5; false positives 49.5, true negatives 940.5. Predictive value of a positive test = (true positives)/(true positives + false positives) = 9.5/(9.5 + 49.5) = 16.1%. See text for details.

Less well appreciated is that the problem of a false positive in a population with a low prevalence of disease can be overcome by making the cutoff value more stringent (i.e., increasing the specificity). Take the example shown in Figure 9–9 of the palmar mixed latency difference test in patients with suspected carpal tunnel syndrome. For this nerve conduction study, the latency for the ulnar palm-to-wrist segment is subtracted from the latency for the median palm-to-wrist segment, using identical distances. In normals, one expects there to be no significant difference. In patients with carpal tunnel syndrome, the median latency is expected to be longer than the ulnar latency. In this example, the post-test probability (i.e., the predictive value of a positive test) is plotted against different cutoff values for what is considered abnormal, both for patients in whom there is a high pre-test probability of disease and for those in whom there is a low pre-test probability. In the patients with a high pre-test probability of disease, a cutoff value of 0.3 ms (i.e., any value >0.3 ms is abnormal) achieves a 95% or greater chance that a positive test is a true positive. However, the same 0.3 ms

FIGURE 9–9  Post-test probabilities (PostTP) calculated with different pre-test probabilities (PreTP). In this example, the commonly used test for median neuropathy at the wrist, the palmar median–ulnar difference (in ms), is plotted using different PreTPs (90%, 10%) against PostTPs (i.e., the probability that a positive test result is a true positive). Note that PostTP depends on both the actual test value and PreTP, with higher PreTP yielding higher PostTP. A borderline abnormal test value (i.e., 0.4 ms) yields a very high PostTP (95%) when PreTP is high, whereas the same test value results in only an intermediate PostTP when PreTP is low. In contrast, very abnormal test values (i.e., ≥0.5 ms) result in PostTPs of 100%, regardless of the PreTP. (Adapted from Nodera, H., Herrmann, D.N., Holloway, R.G., et al., 2003. A Bayesian argument against rigid cutoffs in electrodiagnosis of median neuropathy at the wrist. Neurology 60, 458–464.)


cutoff in the low pre-test probability population results in only a 55% chance that a positive test is a true positive (and a corresponding 45% false-positive rate). These findings are in accordance with Bayes' theorem, wherein the chance of a positive test being a true positive depends not only on the sensitivity and specificity of the test, but also on the prevalence of the disease in the population being sampled (i.e., the pre-test probability). However, if the cutoff value is increased to 0.5 ms, then the post-test probability that a positive test is a true positive jumps to greater than 95%, even in the population with a low probability of disease.

Important clinical–electrophysiologic implications are as follows:

1. Every EDX study must be individualized, based on the patient's symptoms and signs and the corresponding differential diagnosis. When the appropriate tests are applied for the appropriate reason, any positive test is likely to be a true positive and of clinical significance.
2. A test result that is minimally positive has significance only if there is a high likelihood of the disease being present, based on the presenting symptoms and differential diagnosis.
3. A test that is markedly abnormal is likely a true positive, regardless of the clinical likelihood of the disease.
4. An abnormal test, especially when borderline, is likely a false positive if the clinical symptoms and signs do not suggest the possible diagnosis.

MULTIPLE TESTS AND THE INCREASING RISK OF FALSE POSITIVES

The last relevant statistical issue that every electromyographer needs to appreciate is the increased risk of a false positive when many different tests are applied in an attempt to reach a diagnosis. The most common situation occurs in the electrodiagnosis of median neuropathy at the wrist (i.e., carpal tunnel syndrome), where numerous useful nerve conduction studies have been described. However, when normal values for each individual test are set, an upper limit of normal usually is selected at 2 SD beyond the mean, so that approximately 97.5% of the normal population will be correctly identified. Thus, each test carries a 2.5% false-positive rate. If these tests are independent and used sequentially, the false-positive rate increases and quickly rises to unacceptable levels. For instance, if 10 tests are applied, each with a 2.5% false-positive rate, and only one abnormal test is required to make a diagnosis, the false-positive rate rises above 20%. This situation is similar to a

FIGURE 9–10  Multiple tests and the risk of false positives. The number of tests is plotted against the cumulative false-positive rate (FPR) for a variety of different individual test FPRs. Note the curve with the star (★); this represents a false-positive rate of 2.5%, which corresponds to the most common test specificity of 97.5%. Left: Cumulative FPR is calculated based on the assumption that only one test needs to be abnormal to diagnose the condition. Note that if 10 different tests are done, the cumulative FPR is almost 25%. Right: If two or more tests are required to be abnormal before a diagnosis is reached, the statistics change. When 10 tests are done with an individual FPR of 2.5%, the cumulative FPR remains less than 2.5%, an acceptable level. (Adapted from Van Dijk, J.G., 1995. Multiple tests and diagnostic validity. Muscle Nerve 18, 353–355.)


normal person undergoing an SMA-20 blood screen. It is not uncommon that a single test is above or below the cutoff range and, in nearly every case, this represents a false positive.

Fortunately, there is a relatively simple remedy to this problem of multiple tests and the increasing risk of false positives. In Figure 9–10, the number of tests performed is plotted against the cumulative false-positive rate for a variety of different individual test false-positive rates (FPRs). Note the curve with the star (★); this represents a false-positive rate of 2.5%, which corresponds to the most common test specificity of 97.5%. In the graph to the left, the cumulative false-positive rate is calculated based on the assumption that only one test needs to be abnormal to diagnose the condition. Note that if 10 different tests are performed, with each individual test carrying a false-positive rate of 2.5%, the cumulative false-positive rate is almost 25%. In contrast, the statistics change significantly if two or more tests are required to be abnormal to diagnose the condition. In the graph to the right, if 10 tests are done, each with an individual false-positive rate of 2.5%, the cumulative false-positive rate remains less than 2.5%, an acceptable level, provided two or more of the tests are required to be abnormal.

Important clinical–electrophysiologic implications are as follows:

1. Be very cautious about making any diagnosis based on only one piece of data; if that piece of data is in error, it will be a false positive.
2. Be very cautious about making any diagnosis based on only one piece of data; 2.5% of all tests will be false positives, simply based on how the cutoff values are selected (i.e., 2 SD beyond the mean).
3. Be very cautious about making any diagnosis based on only one piece of data, especially if multiple tests are used; the cumulative false-positive rate quickly rises to unacceptable levels.
4. When multiple tests are used, the false-positive rate can be reduced to an acceptable level if two or more tests must be abnormal before a diagnosis is made.

Suggested Readings
Nodera, H., Herrmann, D.N., Holloway, R.G., et al., 2003. A Bayesian argument against rigid cutoffs in electrodiagnosis of median neuropathy at the wrist. Neurology 60, 458–464.
Rivner, M.H., 1994. Statistical errors and their effect in electrodiagnostic medicine. Muscle Nerve 17, 811–814.
Van Dijk, J.G., 1995. Multiple tests and diagnostic validity. Muscle Nerve 18, 353–355.
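As a closing illustration (not part of the chapter text), the cumulative false-positive rates plotted in Figure 9–10 follow from simple binomial arithmetic; this sketch reproduces both scenarios for 10 independent tests, each with a 2.5% individual FPR:

```python
from math import comb

def cumulative_fpr(n_tests: int, fpr: float, min_abnormal: int) -> float:
    """P(at least `min_abnormal` of `n_tests` independent tests are falsely positive)."""
    return sum(comb(n_tests, k) * fpr**k * (1 - fpr)**(n_tests - k)
               for k in range(min_abnormal, n_tests + 1))

# One abnormal test suffices: 1 - (1 - 0.025)**10, about 22%
print(f"{cumulative_fpr(10, 0.025, 1):.1%}")
# Two or more abnormal tests required: stays under 2.5%
print(f"{cumulative_fpr(10, 0.025, 2):.1%}")
```

Requiring a second concordant abnormality collapses the cumulative false-positive rate back to roughly the single-test level, which is the statistical basis for implication 4 above.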

