
Journal of Clinical Epidemiology 56 (2003) 956–962

Optimal choice of a cut point for a quantitative diagnostic test performed for research purposes

Laurence S. Magder (a,b,*), Alan D. Fix (a)
(a) Department of Epidemiology and Preventive Medicine, University of Maryland, 660 West Redwood Street, Baltimore, MD 21201, USA
(b) Department of Mathematics and Statistics, University of Maryland, Baltimore, MD, USA
Accepted 11 May 2003

Abstract
Often, in epidemiologic research, classification of study participants with respect to the presence of a dichotomous condition (e.g.,
infection) is based on whether a quantitative measurement exceeds a specified cut point. The choice of a cut point involves a tradeoff
between sensitivity and specificity. When the classification is to be made for the purpose of estimating risk ratios (RRs) or odds ratios
(ORs), it might be argued that the best choice of cut point is one that maximizes the precision of estimates of the RRs or ORs. In this article,
two different approaches for estimating RRs and ORs are discussed. For each approach, formulae are derived that give the mean squared
error of the RR and OR estimates, for any choice of cut point. Based on these formulae, a cut point can be chosen that minimizes the
mean squared error of the estimate of interest. © 2003 Elsevier Inc. All rights reserved.
Keywords: Epidemiologic methods; Sensitivity and specificity; Diagnostic tests; Misclassification; Odds ratio; Risk ratio; Study design

* Corresponding author. Tel.: 410-706-3253; fax: 410-706-8013.
E-mail address: Lmagder@epi.umaryland.edu (L.S. Magder).
0895-4356/03/$ – see front matter © 2003 Elsevier Inc. All rights reserved.
doi: 10.1016/S0895-4356(03)00153-7

1. Introduction

Often, in epidemiologic research designed to estimate the association between risk factors and a disease, the classification of a person with respect to the presence or absence of disease is based on whether some quantitative measurement exceeds a specified cut point. This classification is generally not perfectly accurate, and the choice of the cut point involves a tradeoff between sensitivity and specificity. Decreasing the value of the cut point results in higher sensitivity, but lower specificity of the diagnostic test. This article concerns the question of how to choose a cut point in that context.

For example, there is current interest in estimating the degree of association between various risk factors (e.g., sexual activity) and seropositivity for Herpes Simplex Virus 8 (HSV8). The degree of association can be quantified by risk ratios (RRs) or odds ratios (ORs). The classification of study participants with respect to HSV8 seropositivity is based on whether the value of the optical density of a serologic assay exceeds a specified cut point. Unfortunately, no matter what cut point is chosen, the assay does not result in perfectly accurate classifications of HSV8. What would be the optimal cut point in this context?

Methods for choosing a cut point have been developed for the situation in which the test is to be used for diagnosis in a clinical setting where treatment decisions will be based on the test results [1]. In this context, the choice of cut point should be based on the consequences of treating those who do not have the condition, the consequences of failing to treat those who do have the condition, and the prevalence of the condition in question [1]. This approach can be implemented by plotting a receiver operating characteristic (ROC) curve and finding the point on the curve at which the slope equals $[C/B][(1 - p_D)/p_D]$, where C is the net cost of treating someone without the condition, B is the net benefit of treating someone with the condition, and $p_D$ is the patient’s pretest probability of having the disease [1].

However, when the test is used for research purposes and treatment decisions are not based on the test results, the choice of the cut point should be based on scientific considerations. Specifically, if the test is to be used in epidemiologic research to estimate RRs and ORs, it might be argued that a cut point should be chosen that results in the most precise RR and OR estimates.

In this article, we consider two general approaches for estimating RRs and ORs and for each approach we provide formulae that can be used to calculate the precision of the estimates for any choice of cut point. Based on these formulae, cut points can be chosen that result in the best estimates. Throughout, it is assumed that the sensitivity and specificity of the diagnostic test are known at each possible
cut point. However, the results have implications for choosing cut points when there is uncertainty about the sensitivity and specificity of the test at each cut point.

2. Statement of the problem

For linguistic simplicity, we will use the words “disease” to refer to the health condition of interest, and “exposure” to refer to a dichotomous risk factor. Let Z stand for the quantitative variable used to classify people with respect to the presence or absence of disease. The usefulness of Z for classifying people depends on the degree to which the distribution of Z among those with disease differs from the distribution of Z among those without disease. Fig. 1 illustrates hypothetical distributions of Z among the diseased and the nondiseased. In this illustration, the two distributions are both normal with standard deviation equal to 1, but with means that differ by three standard deviations. The horizontal axis is labeled with respect to the distance from the mean of Z in the nondiseased.

To classify people as diseased or nondiseased based on an observed value of Z, a cut point is chosen. If the value of Z for a person exceeds the cut point, then the person is classified as diseased (test result positive). Otherwise, the person is classified as nondiseased (test result negative). The two shaded areas in Fig. 1 illustrate the sensitivity (shaded area on the right) and specificity (shaded area on the left) that would result if a cut point of 2.0 were chosen.

How should we choose the value of the cut point when the classification is being made strictly for research purposes? We would argue that a cut point should be chosen that maximizes the precision of estimates of scientific interest. Here, we consider the cases in which the goal is to estimate RRs or ORs. More precisely, let $p_{D|E}$ and $p_{D|\bar{E}}$ stand for the probability of disease among the exposed and unexposed, respectively. In these terms,

\[ RR = \frac{p_{D|E}}{p_{D|\bar{E}}} \]

and

\[ OR = \frac{p_{D|E}/(1-p_{D|E})}{p_{D|\bar{E}}/(1-p_{D|\bar{E}})}. \]

To quantify the precision of an estimate, it is common to use the mean squared error (MSE), the average squared distance between an estimate and the true value [2]. The MSE is equal to the variance of the estimate plus the square of the bias of the estimate. To determine the precision of estimates of the RR and the OR it is common and convenient to work on the log scale. Thus, the precision of an estimate of RR, say $\widehat{RR}$, can be quantified using

\[ \mathrm{MSE} = E\big[(\log\widehat{RR} - \log RR)^2\big] = \mathrm{Var}(\log\widehat{RR}) + \big(\mathrm{bias}(\log\widehat{RR})\big)^2. \]

The analogous expression is used for the OR. Given an estimation approach, the problem reduces to finding the cut point that minimizes the MSE.
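The decomposition of the MSE into variance plus squared bias follows from adding and subtracting the expected value of the estimate. As a short worked step, write $\hat\theta = \log\widehat{RR}$ and $\theta = \log RR$ (shorthand introduced here for illustration, not notation used elsewhere in the article):

```latex
\begin{align*}
E\big[(\hat\theta-\theta)^2\big]
  &= E\big[\big((\hat\theta - E\hat\theta) + (E\hat\theta - \theta)\big)^2\big] \\
  &= E\big[(\hat\theta - E\hat\theta)^2\big]
     + 2\,(E\hat\theta-\theta)\,E\big[\hat\theta - E\hat\theta\big]
     + (E\hat\theta-\theta)^2 \\
  &= \mathrm{Var}(\hat\theta) + \big(\mathrm{bias}(\hat\theta)\big)^2 ,
\end{align*}
```

where the cross term vanishes because $E[\hat\theta - E\hat\theta] = 0$.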

Fig. 1. Illustration of hypothetical distributions of Z among the diseased (right curve) and undiseased (left curve). The area in the shaded region to the
right of 2.0 represents the sensitivity that would result if 2.0 was chosen as the cut point. The area in the shaded region to the left of 2.0 represents
the specificity that would result.
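The sensitivity and specificity depicted in Fig. 1 can be computed directly from the normal model described in the text. The following is a minimal sketch under that model (unit standard deviations, means three standard deviations apart); the function name `sens_spec` and its parameter names are illustrative choices, not part of the article.

```python
from statistics import NormalDist


def sens_spec(cut, mean_diseased=3.0, mean_nondiseased=0.0, sd=1.0):
    """Return (sensitivity, specificity) when Z > cut is called test-positive.

    Assumes Z is normal in both groups with a common standard deviation,
    as in the hypothetical illustration of Fig. 1.
    """
    diseased = NormalDist(mean_diseased, sd)
    nondiseased = NormalDist(mean_nondiseased, sd)
    sensitivity = 1.0 - diseased.cdf(cut)   # P(Z > cut | diseased)
    specificity = nondiseased.cdf(cut)      # P(Z <= cut | nondiseased)
    return sensitivity, specificity


sens, spec = sens_spec(2.0)
print(f"cut point 2.0: sensitivity = {sens:.3f}, specificity = {spec:.3f}")
# cut point 2.0: sensitivity = 0.841, specificity = 0.977
```

Lowering the `cut` argument raises the first value and lowers the second, which is the tradeoff described in the introduction.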

3. Choosing a cut point when the RR or OR will be estimated in a standard manner

We assume independent observations are available from n study subjects. For each subject, information regarding the presence or absence of the exposure and the value of Z is known. Given such data, there are several approaches to estimating the RR and OR.

The standard approach would be to choose a cut point, classify the subjects with respect to disease based on this cut point, and estimate the RR or OR as if these classifications represented the true disease status of each person. To describe this approach more precisely, notation is provided in Table 1. This represents a hypothetical two-by-two table that can be constructed once a cut point for the quantitative measurement Z is chosen.

Table 1
Notation for cell counts in two-by-two tables

             Classification of study participants based on the diagnostic test
             Classified as diseased    Classified as nondiseased    Total
Exposed      a                         b                            $n_E$
Unexposed    c                         d                            $n_{\bar{E}}$

Using the notation in Table 1, let $\hat{p}_{T+|E} = a/n_E$ and $\hat{p}_{T+|\bar{E}} = c/n_{\bar{E}}$ denote the standard estimates of the probability of testing positive for the disease given exposure and nonexposure, respectively. The standard approach to estimating the RR and the OR in this setting is to use

\[ \widehat{RR}_{\mathrm{standard}} = \frac{\hat{p}_{T+|E}}{\hat{p}_{T+|\bar{E}}} \]

and

\[ \widehat{OR}_{\mathrm{standard}} = \frac{\hat{p}_{T+|E}/(1-\hat{p}_{T+|E})}{\hat{p}_{T+|\bar{E}}/(1-\hat{p}_{T+|\bar{E}})}. \]

Appendix A contains formulas for the asymptotic variance and bias of these estimates derived using the “delta method” [3]. No assumptions were needed to derive these formulae other than the availability of independent observations and knowledge of the sensitivity and specificity of the test. In particular, the formulae do not depend on the normality of Z. As can be seen, the variance and bias depend on: (1) the sample size among exposed and unexposed, (2) the true values of $p_{D|E}$ and $p_{D|\bar{E}}$, and (3) the sensitivity and specificity of the diagnostic test. Given these values, we can calculate the MSE of the estimates.

Now we assume that the sensitivity and specificity are known for each possible cut point under consideration. Therefore, given sample sizes and values of $p_{D|E}$ and $p_{D|\bar{E}}$ we can calculate the MSE of these estimates at each possible cut point under consideration and then choose the cut point that leads to the lowest MSE. In actuality, we will not know the true values of $p_{D|E}$ and $p_{D|\bar{E}}$, but substituting our best guesses for them will result in the best guess regarding the optimal cut point.

To illustrate the results of applying these formulae, we consider the special case in which the distribution of Z among both the diseased and nondiseased is normal with the same variance but different means. Fig. 2a–d shows the MSE of the standard estimates calculated at a range of cut points under various scenarios. Fig. 2b and d are based on two distributions whose means are separated by three standard deviations, as illustrated in Fig. 1. The horizontal axis consists of an interval of possible cut points, labeled by their distance (in standard deviations) from the mean of the distribution of Z among the nondiseased, as in Fig. 1. The sensitivity and specificity corresponding to each possible cut point (calculated based on the normality assumptions) are given below the horizontal axis. The calculations are based on a sample size of 200 per group. It can be seen that in these scenarios, the optimum cut point occurs in a place of high specificity and moderate sensitivity.

The importance of high specificity in these scenarios is tied to the fact that the probability of disease in each group is relatively low. Given a low probability of disease and imperfect specificity, the number of true positives might be relatively low compared to the number of false positives. This will lead to relatively greater bias and variance.

4. Choosing a cut point when the RR or OR will be estimated using an approach that adjusts for imperfect sensitivity and specificity of the diagnostic test

Given data such as that in Table 1 and known values of sensitivity and specificity, it is possible to estimate the RR and OR using a method that adjusts for the imperfect sensitivity and specificity of the test [4,5]. The rationale for this approach is as follows:

Let $p_{T+|E}$ and $p_{T+|\bar{E}}$ stand for the probability of testing positive in the exposed and unexposed, respectively, based on a given cut point. There are two possible ways in which a positive test result could occur: (1) the person really has the disease and the test is correctly positive, and (2) the person does not have the disease, but the test is incorrectly positive. The probability of testing positive is the sum of the probabilities of these two possibilities. Therefore,

\[ p_{T+|E} = p_{D|E}\,\mathrm{sens} + (1-p_{D|E})(1-\mathrm{spec}) \]

where “sens” and “spec” are short for the sensitivity and specificity of the diagnostic test. Solving this equation for $p_{D|E}$ results in:

\[ p_{D|E} = \frac{p_{T+|E} - (1-\mathrm{spec})}{\mathrm{sens} - (1-\mathrm{spec})}. \]

Therefore, an unbiased estimate of $p_{D|E}$ is:

\[ \hat{p}_{D|E} = \frac{\hat{p}_{T+|E} - (1-\mathrm{spec})}{\mathrm{sens} - (1-\mathrm{spec})}. \]
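The correction just described can be sketched in a few lines. This is an illustrative sketch only: the function name `adjusted_prevalence` and the example counts are hypothetical, and the sensitivity and specificity are treated as known constants, as the text assumes.

```python
def adjusted_prevalence(n_positive, n_total, sens, spec):
    """Estimate the probability of disease from the observed fraction of
    positive tests, correcting for known sensitivity and specificity."""
    p_test_positive = n_positive / n_total
    return (p_test_positive - (1.0 - spec)) / (sens - (1.0 - spec))


# Hypothetical example: 35 of 200 exposed subjects test positive with a
# test that has sensitivity 0.85 and specificity 0.90.
estimate = adjusted_prevalence(35, 200, sens=0.85, spec=0.90)
print(f"adjusted estimate of disease probability: {estimate:.3f}")
# adjusted estimate of disease probability: 0.100
```

The example is a round trip through the forward relation: a true disease probability of 0.10 gives a test-positive probability of 0.10(0.85) + 0.90(0.10) = 0.175, which is 35 of 200, and the correction recovers 0.10.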

Fig. 2. Asymptotic mean squared error of $\log RR_{\mathrm{standard}}$ (plots a and b) and $\log OR_{\mathrm{standard}}$ (plots c and d) at a range of cut points, under various scenarios.
All plots assume 200 subjects per group. Plots (a) and (c) are based on the assumption that the distributions of the quantitative assessment have the same
standard deviation, but differ in their means by two standard deviations. Plots (b) and (d) assume they differ by three standard deviations. The probability
of disease in the unexposed was assumed to equal 0.1 (thick lines) or 0.3 (thin lines). The probability of disease in the exposed was set so that the RR = 2 (plots a
and b) or the OR = 2 (plots c and d). The numbers for the cut points on the horizontal axes refer to distances from the mean in the undiseased.
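The kind of calculation behind these plots can be sketched as follows, using the asymptotic bias and variance formulae for the standard RR estimate given in the Appendix together with the normal model of Fig. 1. This is a rough sketch under those assumptions, not the authors' code: the function names, the grid of cut points from 0 to 4, and the step size of 0.01 are all choices made here for illustration.

```python
import math
from statistics import NormalDist


def mse_log_rr_standard(sens, spec, p_unexposed, rr, n_exposed, n_unexposed):
    """Asymptotic MSE of the log of the standard RR estimate."""
    p_exposed = rr * p_unexposed
    # Probability of testing positive in each group.
    pt_e = p_exposed * sens + (1 - p_exposed) * (1 - spec)
    pt_u = p_unexposed * sens + (1 - p_unexposed) * (1 - spec)
    bias = math.log(pt_e / pt_u) - math.log(rr)
    var = (1 - pt_e) / (n_exposed * pt_e) + (1 - pt_u) / (n_unexposed * pt_u)
    return var + bias ** 2


def best_cut(separation, p_unexposed, rr=2.0, n=200):
    """Grid search for the cut point with the lowest asymptotic MSE.

    Z is assumed N(0, 1) in the nondiseased and N(separation, 1) in the
    diseased; the grid 0.00, 0.01, ..., 4.00 is an arbitrary choice.
    """
    std_normal = NormalDist()

    def mse_at(cut):
        sens = 1 - std_normal.cdf(cut - separation)
        spec = std_normal.cdf(cut)
        return mse_log_rr_standard(sens, spec, p_unexposed, rr, n, n)

    cuts = [i / 100 for i in range(0, 401)]
    return min(cuts, key=mse_at)


cut = best_cut(separation=3.0, p_unexposed=0.1)
print(f"lowest-MSE cut point: {cut:.2f} (specificity {NormalDist().cdf(cut):.3f})")
```

Replacing `p_unexposed` or `separation` with other best guesses gives the corresponding guess at the optimal cut point, in the spirit of the substitution approach described in Section 3.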

Using the same approach to derive an estimate of $p_{D|\bar{E}}$ results in the following adjusted estimates for the RR and OR:

\[ \widehat{RR}_{\mathrm{adjusted}} = \frac{\hat{p}_{D|E}}{\hat{p}_{D|\bar{E}}} = \frac{\hat{p}_{T+|E} - (1-\mathrm{spec})}{\hat{p}_{T+|\bar{E}} - (1-\mathrm{spec})} \]

and

\[ \widehat{OR}_{\mathrm{adjusted}} = \frac{\hat{p}_{D|E}/(1-\hat{p}_{D|E})}{\hat{p}_{D|\bar{E}}/(1-\hat{p}_{D|\bar{E}})} = \frac{\hat{p}_{T+|E} - (1-\mathrm{spec})}{\mathrm{sens} - \hat{p}_{T+|E}} \div \frac{\hat{p}_{T+|\bar{E}} - (1-\mathrm{spec})}{\mathrm{sens} - \hat{p}_{T+|\bar{E}}}. \]

For some data sets, these formulae can result in negative values for the estimate. In those cases, the parameter should be estimated with 0 or infinity depending on the situation. For example, if the denominator of the adjusted RR estimate, $\hat{p}_{T+|\bar{E}} - (1-\mathrm{spec})$, is less than 0, then there are fewer subjects testing positive than would be expected if all the unexposed subjects were truly nondiseased. In this case, the data are most consistent with no probability of disease in the unexposed, and the appropriate estimate of the RR is infinity.

Note that, if the specificity is 1.0, the adjusted estimate of the RR is equivalent to the standard estimate. This reflects the fact that when specificity is perfect, the standard estimate of the RR is asymptotically unbiased.

The asymptotic variances of these estimates are given by equations (7) and (8) in the appendix. These variances depend on the sensitivity and specificity of the test, the values of $p_{D|E}$ and $p_{D|\bar{E}}$ and the sample size in each group. Again, the validity of these formulae does not depend on assumptions about the normality of Z. Therefore, for given values of $p_{D|E}$ and $p_{D|\bar{E}}$ and sample size, a cut point can be chosen which results in the lowest variance. Because these estimates are asymptotically unbiased, their asymptotic MSE is equivalent to their asymptotic variances.

Fig. 3a–d shows the asymptotic MSE of the adjusted estimates calculated at a range of cut points under the same scenarios used for Fig. 2. Again, it can be seen that the optimal cut points occur for high values of specificity. Interestingly, despite the fact that these estimates are unbiased, the MSE of the adjusted estimate exceeds the MSE of the standard estimate for many cut points.
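The boundary rules described in this section for the adjusted RR estimate can be sketched as below. The function name `adjusted_rr` and the example counts are hypothetical. The text covers the cases where the numerator or the denominator alone is nonpositive; returning NaN when both are nonpositive is an assumption made here, since the article does not address that case.

```python
import math


def adjusted_rr(a, n_exposed, c, n_unexposed, spec):
    """Adjusted RR estimate from test-positive counts a (exposed) and
    c (unexposed). Sensitivity cancels out of the ratio, so only the
    specificity is needed."""
    numerator = a / n_exposed - (1.0 - spec)
    denominator = c / n_unexposed - (1.0 - spec)
    if numerator <= 0 and denominator <= 0:
        return math.nan   # assumption: case not covered in the text
    if denominator <= 0:
        return math.inf   # data consistent with no disease in the unexposed
    if numerator <= 0:
        return 0.0        # data consistent with no disease in the exposed
    return numerator / denominator


print(adjusted_rr(48, 200, 30, 200, spec=0.90))  # ordinary case, about 2.8
print(adjusted_rr(48, 200, 10, 200, spec=0.90))  # inf
print(adjusted_rr(10, 200, 30, 200, spec=0.90))  # 0.0
```

With specificity 0.90, at least 10% of truly nondiseased subjects are expected to test positive, so observing only 10 positives among 200 subjects (5%) triggers the boundary rule for that group.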

Fig. 3. Asymptotic mean squared error of $\log RR_{\mathrm{adj}}$ (plots a and b) and $\log OR_{\mathrm{adj}}$ (plots c and d) at a range of cut points, under various scenarios. All
plots assume 200 subjects per group. Plots (a) and (c) are based on the assumption that the distributions of the quantitative assessment have the same
standard deviation, but differ in their means by two standard deviations. Plots (b) and (d) assume they differ by three standard deviations. The probability
of disease in the unexposed was assumed to equal 0.1 (thick lines) or 0.3 (thin lines). The probability of disease in the exposed was set so that the RR = 2
(plots a and b) or the OR = 2 (plots c and d). The numbers for the cut points on the horizontal axes refer to distances from the mean in the undiseased.

5. Example

As mentioned in the introduction, there is currently epidemiologic interest in identifying risk factors for HSV8, a recently discovered virus associated with Kaposi’s sarcoma. Unfortunately, serologic assays to identify infection are thought to have imperfect sensitivity and specificity. Engels et al. [6] evaluated the sensitivity and specificity of several assays at different cut points. Table 2 shows the sensitivity and specificity at three optical density cut points for one of the enzyme-linked immunoassays designed to measure antibodies to the lytic phase glycoprotein K8.1. Table 2 also shows the MSE of estimates of the risk ratio under different scenarios, calculated using the formulae in the appendices of this article. It can be seen that if the standard estimate is used and the prevalence in the unexposed is 5%, using a cut point of 1.5 leads to a far more precise estimate than using a cut point of 0.8 (MSE = 0.16 compared to MSE = 0.33). A similar advantage of the larger cut point is seen when the prevalence is 20% and when the adjusted estimate is used.

Table 2
Sensitivity and specificity of the K8.1 assay for detecting HSV8 at various cut points, and the resulting mean square errors of risk ratio estimates

                                                      Optical density cut points
                                                      0.80      1.00      1.50
Sensitivity (a)                                       90%       85%       78%
Specificity (a)                                       83%       90%       98%
MSE (b) of log standard estimate of the risk ratio
  Assuming prevalence in unexposed = 5%               0.33      0.26      0.16
  Assuming prevalence in unexposed = 20%              0.114     0.072     0.038
MSE (b) of log adjusted estimate of the risk ratio
  Assuming prevalence in unexposed = 5%               0.79      0.55      0.26
  Assuming prevalence in unexposed = 20%              0.065     0.055     0.042
(a) From Engels et al. [6].
(b) Assuming 200 patients per group and a true risk ratio of 2.0.

6. Further comments

In practice, it will be impossible to determine the MSE of the estimates at various cut points with certainty. For one thing, there will generally be uncertainty regarding the
sensitivity and specificity of the diagnostic test at various cut points. For another thing, the formulae given in the Appendix provide only approximate MSEs, the quality of which depends on the sample size. However, decisions often have to be made in the presence of uncertainty, and the formulae and the graphs in this article can still provide guidance in choosing cut points. It is clear, for example, that for relatively rare outcomes, good precision in estimation requires high specificity, but is somewhat robust to departures from high sensitivity. This would suggest that one should choose relatively high cut points in this context.

Interestingly, we found that in certain settings, the MSE of the standard estimates is lower than the MSE of the adjusted estimates. This occurs because the standard estimates are biased towards 1.0, reducing the probability of getting extremely large or small estimates. In fact, it can be shown that the variances of the standard estimates are always lower than the variances of the adjusted estimates. Thus, for small sample sizes (where the MSE is predominantly determined by the variance), the MSE will be lower for the standard estimates than for the adjusted estimates.

However, the lower MSE of the standard method does not mean that it is preferable to the adjusted estimates in these settings. It can be argued based on the likelihood principle [7] that if the sensitivity and specificity are known, then the adjusted estimate of association is a more accurate representation of the information in the data with respect to the true value of the association. For example, consider a data set in which the observed number of positive tests in the exposed group is less than would be expected even if the true disease risk in the exposed was 0, given the imperfect specificity of the test. With such data it is arguable that the data are most consistent with an RR of 0. This is what the adjusted estimate would be, whereas the standard estimate would not equal 0. Thus, if the goal of the analysis is to report what the data say regarding the value of the association, it is best to use the adjusted estimate.

There is a third approach to estimating RRs and ORs in this context that obviates the need to choose a specific cut point. In brief, using methods described in Magder and Hughes [5], the exact value of Z can be used in risk assessment using a probabilistic approach. Thus, for example, those with a very high value of Z might be classified as having a higher probability of disease than those with a borderline value of Z. These probabilities can be incorporated into an algorithm to compute maximum likelihood estimates of risk ratios or odds ratios. To use this approach the sensitivity and specificity of the assay must be known (or assumed) for multiple cut points.

Many studies seek estimates of the association of exposure and disease, while controlling for potential confounders. The adjusted method described above can be extended to this context. A SAS macro is available on the Internet that extends logistic regression to adjust for imperfect sensitivity and specificity of a diagnostic test [8].

The methods described in this article are meant to be used when disease status is a true dichotomy (e.g., infected/uninfected). This should be distinguished from the situation when disease status is a matter of degree, and Z is a measure of the degree of disease (e.g., when the disease is obesity and Z is body-mass index). In the latter case, the use of a cut point is mainly for the purpose of providing simpler summaries of the data, and the notions of sensitivity and specificity do not apply. Considerations for choosing a cut point in that setting are discussed by Ragland [9].

The methods described in this article make the implicit assumption that the sensitivity and specificity of the diagnostic test are the same in both study groups. Although generally reasonable, this assumption can be relaxed by using different values for sensitivity and specificity in the terms in the formulae that relate to each study group.

Acknowledgments

This work was supported by research grant R0-1 AR 43727 of the National Institutes of Health.

Appendix

Formulae for the bias and variance of the standard estimates based on a given cut point.

Let $p_{D|E}$ and $p_{D|\bar{E}}$ stand for the probability of disease in the exposed and unexposed, respectively. Similarly, let $p_{T+|E}$ and $p_{T+|\bar{E}}$ stand for the probability of testing positive in the exposed and unexposed, respectively, based on a given cut point, T. Then,

\[ p_{T+|E} = p_{D|E}\,\mathrm{sens} + (1-p_{D|E})(1-\mathrm{spec}) \tag{1} \]

and

\[ p_{T+|\bar{E}} = p_{D|\bar{E}}\,\mathrm{sens} + (1-p_{D|\bar{E}})(1-\mathrm{spec}) \tag{2} \]

where “sens” and “spec” refer to the sensitivity and specificity of the diagnostic test based on the chosen cut point. The standard estimate for the RR, $\hat{p}_{T+|E}/\hat{p}_{T+|\bar{E}}$, is an asymptotically unbiased estimate of $p_{T+|E}/p_{T+|\bar{E}}$. Therefore,

\[ \mathrm{bias}(\log\widehat{RR}_{\mathrm{standard}}) = \log\left(\frac{p_{T+|E}}{p_{T+|\bar{E}}}\right) - \log\left(\frac{p_{D|E}}{p_{D|\bar{E}}}\right) \tag{3} \]

Expression (3) can be rewritten in terms of sens, spec, $p_{D|E}$ and $p_{D|\bar{E}}$ by substituting for $p_{T+|E}$ and $p_{T+|\bar{E}}$ based on expressions (1) and (2).

Also, using the “delta” method [3],

\[ \mathrm{var}(\log\widehat{RR}_{\mathrm{standard}}) \approx \frac{1-p_{T+|E}}{n_E\,p_{T+|E}} + \frac{1-p_{T+|\bar{E}}}{n_{\bar{E}}\,p_{T+|\bar{E}}} \tag{4} \]

where $n_E$ and $n_{\bar{E}}$ are the number of study subjects in the exposed and unexposed groups, respectively. Again, expression
(4) can be written in terms of sens, spec, $p_{D|E}$ and $p_{D|\bar{E}}$ by substituting from expressions (1) and (2).

Using analogous reasoning,

\[ \mathrm{bias}(\log\widehat{OR}_{\mathrm{standard}}) = \log\left(\frac{p_{T+|E}/(1-p_{T+|E})}{p_{T+|\bar{E}}/(1-p_{T+|\bar{E}})}\right) - \log\left(\frac{p_{D|E}/(1-p_{D|E})}{p_{D|\bar{E}}/(1-p_{D|\bar{E}})}\right) \tag{5} \]

and

\[ \mathrm{var}(\log\widehat{OR}_{\mathrm{standard}}) = \frac{1}{n_E\,p_{T+|E}} + \frac{1}{n_E(1-p_{T+|E})} + \frac{1}{n_{\bar{E}}\,p_{T+|\bar{E}}} + \frac{1}{n_{\bar{E}}(1-p_{T+|\bar{E}})} \tag{6} \]

Formulae for the variance of the adjusted estimates based on a given cut point.

Using the delta method, the asymptotic variances of the adjusted estimates can be derived. These are as follows:

\[ \mathrm{var}(\log\widehat{RR}_{\mathrm{adj}}) = \frac{p_{T+|E}(1-p_{T+|E})}{(p_{T+|E} + \mathrm{spec} - 1)^2\,n_E} + \frac{p_{T+|\bar{E}}(1-p_{T+|\bar{E}})}{(p_{T+|\bar{E}} + \mathrm{spec} - 1)^2\,n_{\bar{E}}} \tag{7} \]

and

\[ \mathrm{var}(\log\widehat{OR}_{\mathrm{adj}}) = \left(\frac{\mathrm{sens} + \mathrm{spec} - 1}{(p_{T+|E} + \mathrm{spec} - 1)(\mathrm{sens} - p_{T+|E})}\right)^2 \frac{p_{T+|E}(1-p_{T+|E})}{n_E} + \left(\frac{\mathrm{sens} + \mathrm{spec} - 1}{(p_{T+|\bar{E}} + \mathrm{spec} - 1)(\mathrm{sens} - p_{T+|\bar{E}})}\right)^2 \frac{p_{T+|\bar{E}}(1-p_{T+|\bar{E}})}{n_{\bar{E}}} \tag{8} \]

References

[1] Sox HC Jr, Blatt MA, Higgins MC, Marton KI. Medical decision making. Boston, MA: Butterworth-Heinemann; 1988.
[2] Bickel PJ, Doksum KA. Mathematical statistics: basic ideas and selected topics. Oakland, CA: Holden-Day Inc.; 1977.
[3] Bishop YMM, Fienberg SE, Holland PW. Discrete multivariate analysis: theory and practice. Cambridge, MA: MIT Press; 1975.
[4] Copeland KT, Checkoway H, McMichael AJ, Holbrook RH. Bias due to misclassification in the estimation of relative risk. Am J Epidemiol 1977;105:488–95.
[5] Magder LS, Hughes JP. Logistic regression when the outcome is measured with uncertainty. Am J Epidemiol 1997;146:195–203.
[6] Engels EA, Whitby D, Goebel PB, Stossel A, Waters D, Pintus A, Contu L, Biggar RJ, Goedert JJ. Identifying human herpesvirus 8 infection: performance characteristics of serologic assays. J Acquir Immune Defic Syndr 2000;23:346–54.
[7] Royall RM. Statistical evidence. A likelihood paradigm. London: Chapman & Hall; 1997.
[8] Web site. http://medschool.umaryland.edu/departments/Epidemiology/software.html
[9] Ragland DR. Dichotomizing continuous outcome variables: Dependence of the magnitude of association and statistical power on the cutpoint. Epidemiology 1992;3:434–40.
