Measures of Accuracy of Screening Tests
Introduction
Screening can be defined as the application of a medical procedure or test to a defined population in order to identify people who are in the preclinical phase of a particular disease, or who as yet have no signs or symptoms of it, for the purpose of determining their likelihood of having the disease. The screening procedure itself does not diagnose the illness, but it flags those with a positive result for further evaluation with subsequent diagnostic tests or procedures. The goal of screening is to detect the disease in its earliest phase, when treatment is usually more successful, so that proper treatment or management can reduce morbidity or mortality from the disease. Some common examples of screening tests are the Pap smear, mammogram, clinical breast examination, blood pressure determination, cholesterol level, eye examination/vision test, and urinalysis.
Screening vs. Diagnostic Tests
Screening tests differ from diagnostic tests in meaning, application and use. Diagnostic tests are usually performed on individuals with a symptom or sign of an illness, whereas screening tests are applied to apparently healthy individuals with no such symptoms or signs. A screening test is usually applied to a large population simultaneously, whereas a diagnostic test is applied to a single patient at a time. Diagnostic tests are more expensive and more accurate than screening tests, which are less accurate and inexpensive. A diagnostic test provides a basis for initiation of treatment, whereas a screening test does not.


Natural History of Disease 

The application and effectiveness of a screening test or programme depend upon the natural course of the disease. Screening is not very effective from a public health point of view if the natural course of the disease is short, or in the case of an acute illness where the latent period is very short.
Criteria for a Screening Programme
1. Life-threatening diseases, and those known to have serious and irreversible consequences if not treated early, are appropriate for screening: for example, a life-threatening disease such as lung cancer, or a disease with irreversible consequences such as hypothyroidism.
2. Treatment of diseases at their earlier stages should be more effective than treatment begun after the development of symptoms.
10. The test should be minimally invasive and cause little or no pain or discomfort.
11. The test should be easy to administer and socially acceptable.
12. The test should be reliable, i.e. give consistent results on repeated testing.
13. The test should be valid, i.e. able to distinguish between diseased and non-diseased people.
Test Reliability (Consistency)
A screening test is considered reliable if it gives consistent results with repeated tests. Variability in the measurement can be the result of physiologic variation or the result of variables related to the method of testing. For example, if one were using a sphygmomanometer to measure blood pressure repeatedly over time in a single individual, the results might vary depending on:
♦ Biological variability (BP normally varies within an individual)
♦ Instrument variability (is the sphygmomanometer reliable?)
♦ Intra-observer variability (does a given tester perform the test the same way each time?)
♦ Inter-observer variability (do different testers perform the test the same way?)
The reliability of any test can potentially be affected by one or more of these factors.
Test Validity (Accuracy)
Validity is the ability of a test to correctly measure what it intends to measure: the test should correctly identify diseased and non-diseased persons, giving a positive result in the diseased and a negative result in the non-diseased. The validity of a test can be assessed if the test results can be compared either with a "true" measure of the physiologic, biochemical, or pathologic state of the disease, or with the occurrence of disease progression or a disease complication that the test result seeks to predict (1).
The diagnostic accuracy of a screening test answers the question: how well does this test discriminate between the two conditions of interest (e.g. diseased vs. healthy, or two stages of a disease)? This discriminative ability can be quantified by the measures of diagnostic accuracy:
♦ Sensitivity and specificity
♦ Positive and negative predictive values (PPV, NPV)
♦ Likelihood ratios
♦ Area under the ROC curve (AUC)
Different measures of diagnostic accuracy relate to different aspects of the diagnostic procedure. Some measures are used to assess the discriminative property of the test; others are used to assess its predictive ability (2). While discriminative measures are mostly used for health policy decisions, predictive measures are most useful in predicting the probability of disease in an individual (3). Furthermore, it should be noted that measures of test performance are not fixed indicators of a test's quality. Measures of diagnostic accuracy are very sensitive to the characteristics of the population in which the test is evaluated: some depend largely on the disease prevalence, while others are highly sensitive to the spectrum of the disease in the studied population. It is therefore of utmost importance to know how to interpret them, as well as when and under what conditions to use them (4).
A 2 x 2 table, or contingency table, is also used when testing the validity of a screening test, but note that this is a different contingency table from the ones used for summarizing cohort studies, randomized clinical trials, and case-control studies. The 2 x 2 table below shows the results of the evaluation of a screening test for diseased and non-diseased subjects.
                           Gold Standard
Test Result       Diseased      Not Diseased      Total
Test Positive     a (TP)        b (FP)            a + b
Test Negative     c (FN)        d (TN)            c + d
Total             a + c         b + d             a + b + c + d = N

The contingency table for evaluating a screening test lists the true disease status in the columns, and the observed screening test results are listed in the rows.
The table shows the results for a screening test. There are a + c subjects who are ultimately found to have the disease, and b + d subjects who remained free of disease during the study. Among the a + c subjects with disease, "a" have a positive screening test (TP, true positives) but "c" have negative tests (FN, false negatives). Among the b + d subjects without disease, "d" have negative screening tests (TN, true negatives) but "b" incorrectly have positive screening tests (FP, false positives).
Based on the outcomes in the contingency table above, we can define the different measures of diagnostic accuracy of the test.
Sensitivity and Specificity
A. Sensitivity – the ability of a test to correctly identify diseased subjects as test positive. It is the conditional probability P(T+ | D+) of getting a positive test result (T+) in diseased subjects (D+). Hence, it relates to the potential of a test to recognise subjects with the disease. Numerically, it is estimated as

Sensitivity = P(T+ | D+) = a / (a + c).

It is usually expressed as a percentage: a sensitivity of 80% means that 80 out of 100 diseased subjects test positive.
B. Specificity – the ability of a test to correctly identify healthy or non-diseased subjects as test negative. It is the conditional probability P(T– | D–) of getting a negative test result (T–) in non-diseased subjects (D–). Hence, it relates to the potential of a test to recognise subjects without the disease. Numerically, it is estimated as

Specificity = P(T– | D–) = d / (b + d).

It is usually expressed as a percentage: a specificity of 80% means that 80 out of 100 non-diseased subjects test negative.
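As a minimal sketch (not part of the chapter), the two definitions above can be computed directly from the cells of the 2 x 2 table; the function names and counts here are hypothetical.

```python
# Sensitivity and specificity from 2 x 2 table cells, using the same
# a/b/c/d labels as the contingency table above.

def sensitivity(a, c):
    """P(T+ | D+): true positives / all diseased."""
    return a / (a + c)

def specificity(b, d):
    """P(T- | D-): true negatives / all non-diseased."""
    return d / (b + d)

# Hypothetical counts: 80 TP and 20 FN among 100 diseased;
# 100 FP and 900 TN among 1000 non-diseased.
print(sensitivity(80, 20))    # 0.8  -> 80% sensitivity
print(specificity(100, 900))  # 0.9  -> 90% specificity
```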
A common notion has been that neither sensitivity nor specificity depends on, or is influenced by, the disease prevalence, so that estimates from one study population can easily be transferred to another population with a different prevalence of the disease. Nonetheless, sensitivity and specificity often do vary with prevalence, likely through mechanisms that affect both prevalence and test performance, such as the patient spectrum (5). Investigators are therefore advised to consider the intended use of the test when designing a study of test accuracy, and to specify the inclusion criteria that define the study population accordingly (6).
Along with sensitivity and specificity, accuracy is also an important indicator of the diagnostic ability of a screening test. Accuracy is the proportion of true results, whether true positive or true negative, in a population; it measures the degree of veracity of a diagnostic test for a condition. Numerically it is given as below:
Accuracy = [True Positive (TP) + True Negative (TN)] / (TP + FP + FN + TN) = (a + d) / (a + b + c + d)
In addition to the equation shown above, accuracy can be determined from sensitivity and specificity where the prevalence is known. Prevalence is the probability of disease in the population at a given time:

Accuracy = (Sensitivity) x (Prevalence) + (Specificity) x (1 – Prevalence).
The numerical value of accuracy represents the proportion of true results (both true positive and true negative) in the selected population: an accuracy of 99% means the test result is correct, whether positive or negative, 99% of the time. However, the equation for accuracy shows that it is a prevalence-weighted average of sensitivity and specificity, so the accuracy of a test depends on how common the disease is in the selected population. For a rare condition, accuracy is driven almost entirely by specificity: a test with high sensitivity but modest specificity will have modest accuracy, and conversely a high accuracy figure can mask poor sensitivity. Accuracy therefore needs to be interpreted cautiously (7).
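The prevalence-weighting can be sketched with the identity above; the numbers below are hypothetical and chosen only to show how prevalence shifts accuracy between sensitivity and specificity.

```python
# Accuracy as a prevalence-weighted average of sensitivity and specificity.

def accuracy(sens, spec, prev):
    # accuracy = sensitivity * prevalence + specificity * (1 - prevalence)
    return sens * prev + spec * (1 - prev)

# With sensitivity and specificity both 99%, accuracy is 99% at any prevalence:
print(round(accuracy(0.99, 0.99, 0.01), 4))  # 0.99
# With unequal sensitivity and specificity, prevalence matters:
print(round(accuracy(0.99, 0.80, 0.01), 4))  # 0.8019 (rare disease: near specificity)
print(round(accuracy(0.99, 0.80, 0.50), 4))  # 0.895  (50% prevalence: midway)
```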
Predictive Value
The validity of a test can also be expressed as the extent to which being categorized as positive or negative actually predicts the presence or absence of the disease, i.e. the ability of a test to predict disease among those who test positive and non-disease among those who test negative.
Positive Predictive Value (PPV) – the proportion of those with a positive test who have the disease; that is, the probability that a subject has the disease given a positive screening test result. In terms of Bayes' theorem, it is expressed as
PPV = P(D+ | T+) = P(T+ | D+) P(D+) / [P(T+ | D+) P(D+) + P(T+ | D–) P(D–)]
    = (Sensitivity x Prevalence) / [Sensitivity x Prevalence + (1 – Specificity) x (1 – Prevalence)]
    = a / (a + b)
PPV depends on sensitivity, specificity and prevalence of disease in the population. For a given sensitivity and specificity, the PPV increases as the prevalence of disease increase in the population.
Let us consider a screening test with a sensitivity of 80% and a specificity of 90% used in populations of 10,000 individuals with 5%, 10% and 15% prevalence of disease respectively. We see that the PPV of a test with the same sensitivity and specificity increases as the prevalence of disease increases (Table 2).
Table 2: 2 x 2 contingency tables with increasing prevalence (sensitivity 80%, specificity 90%)

With 5% prevalence
Test Result    D+       D–       Total     PPV
T+             400      950      1350      29.6%
T–             100      8550     8650
Total          500      9500     10000

With 10% prevalence
Test Result    D+       D–       Total     PPV
T+             800      900      1700      47.1%
T–             200      8100     8300
Total          1000     9000     10000

With 15% prevalence
Test Result    D+       D–       Total     PPV
T+             1200     850      2050      58.5%
T–             300      7650     7950
Total          1500     8500     10000
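The PPV figures for these three prevalences can be reproduced from Bayes' theorem with a short sketch (illustrative only; the helper name is made up).

```python
# PPV via Bayes' theorem, reproducing Table 2 (sensitivity 80%, specificity 90%).

def ppv(sens, spec, prev):
    tp_rate = sens * prev                 # P(T+ and D+)
    fp_rate = (1 - spec) * (1 - prev)     # P(T+ and D-)
    return tp_rate / (tp_rate + fp_rate)

for prev in (0.05, 0.10, 0.15):
    print(f"prevalence {prev:.0%}: PPV = {ppv(0.80, 0.90, prev):.1%}")
# prevalence 5%: PPV = 29.6%
# prevalence 10%: PPV = 47.1%
# prevalence 15%: PPV = 58.5%
```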
Now consider a screening test with a sensitivity of 80% and a prevalence of 10%, but with varying specificity of 80%, 90% and 95% respectively. We see that the PPV of a test with the same sensitivity and prevalence increases as the specificity of the test increases (Table 2A).
Table 2A: 2 x 2 contingency tables with increasing specificity (sensitivity 80%, prevalence 10%)

With 80% specificity
Test Result    D+       D–       Total     PPV
T+             800      1800     2600      30.76%
T–             200      7200     7400
Total          1000     9000     10000

With 90% specificity
Test Result    D+       D–       Total     PPV
T+             800      900      1700      47.05%
T–             200      8100     8300
Total          1000     9000     10000

With 95% specificity
Test Result    D+       D–       Total     PPV
T+             800      450      1250      64.0%
T–             200      8550     8750
Total          1000     9000     10000
Finally, consider a screening test with a specificity of 90% and a prevalence of 10%, but with varying sensitivity of 80%, 90% and 95% respectively. We see that the PPV of a test with the same specificity and prevalence increases as the sensitivity of the test increases (Table 2B).
Table 2B: 2 x 2 contingency tables with increasing sensitivity (specificity 90%, prevalence 10%)

With 80% sensitivity
Test Result    D+       D–       Total     PPV
T+             800      900      1700      47.05%
T–             200      8100     8300
Total          1000     9000     10000

With 90% sensitivity
Test Result    D+       D–       Total     PPV
T+             900      900      1800      50.0%
T–             100      8100     8200
Total          1000     9000     10000

With 95% sensitivity
Test Result    D+       D–       Total     PPV
T+             950      900      1850      51.35%
T–             50       8100     8150
Total          1000     9000     10000
From Tables 2, 2A and 2B we can see that the PPV rises more rapidly with increasing specificity of the test and increasing prevalence of the disease than it does with increasing sensitivity. Hence, PPV is influenced more by the specificity of the test and the prevalence of the disease.

Thus, for a screening test with a given sensitivity and specificity, the rarer the disease, the lower the PPV. In this sense, PPV serves as a crude measure of relative cost efficiency: it reflects the ratio of the screening program's benefits or yield (number of TP) to the cost of misdiagnoses (FPs + FNs) for a given number of screened subjects. Further, since PPV is more sensitive to changes in specificity than to changes in sensitivity, we can do more to improve the efficiency of a screening program, especially for a rare disease, by increasing the specificity of the test than by increasing its sensitivity. A PPV of 50% indicates that the chance of having the disease among those who tested positive is 50%.

Negative Predictive Value (NPV) – the proportion of those with a negative test who do not have the disease in question; that is, the probability that a subject is non-diseased given a negative screening test result. In terms of Bayes' theorem, it is expressed as
NPV = P(D– | T–) = P(T– | D–) P(D–) / [P(T– | D–) P(D–) + P(T– | D+) P(D+)]
    = Specificity x (1 – Prevalence) / [Specificity x (1 – Prevalence) + (1 – Sensitivity) x Prevalence]
    = d / (c + d)
An NPV very close to 1 indicates that testing negative is reassuring as to the absence of disease and that rescreening may not be worthwhile. If the NPV falls short of 1 by an amount comparable with the preclinical disease prevalence, much of the preclinical disease pool will be missed by the screening program. A low NPV is more likely to result from poor sensitivity than from poor specificity; hence, a screening test with high sensitivity will improve the NPV.
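The dependence of NPV on sensitivity can be sketched from the Bayes form above (illustrative helper; the parameter values match Table 3A below).

```python
# NPV via Bayes' theorem; NPV rises with sensitivity
# (specificity 90%, prevalence 10%, as in Table 3A).

def npv(sens, spec, prev):
    tn_rate = spec * (1 - prev)       # P(T- and D-)
    fn_rate = (1 - sens) * prev       # P(T- and D+)
    return tn_rate / (tn_rate + fn_rate)

for sens in (0.80, 0.90, 0.95):
    print(f"sensitivity {sens:.0%}: NPV = {npv(sens, 0.90, 0.10):.2%}")
# sensitivity 80%: NPV = 97.59%
# sensitivity 90%: NPV = 98.78%
# sensitivity 95%: NPV = 99.39%
```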
Let us consider a screening test with a specificity of 90% and a prevalence of 10%, but with varying sensitivity of 80%, 90% and 95% respectively. We see that the NPV of a test with the same specificity and prevalence increases as the sensitivity of the test increases (Table 3A).
Table 3A: 2 x 2 contingency tables with increasing sensitivity (specificity 90%, prevalence 10%)

With 80% sensitivity
Test Result    D+       D–       Total     NPV
T+             800      900      1700
T–             200      8100     8300      97.59%
Total          1000     9000     10000

With 90% sensitivity
Test Result    D+       D–       Total     NPV
T+             900      900      1800
T–             100      8100     8200      98.78%
Total          1000     9000     10000

With 95% sensitivity
Test Result    D+       D–       Total     NPV
T+             950      900      1850
T–             50       8100     8150      99.39%
Total          1000     9000     10000
Now consider a screening test with a sensitivity of 80% and a prevalence of 10%, but with varying specificity of 80%, 90% and 95% respectively. We see that the NPV of a test with the same sensitivity and prevalence does not increase by nearly as much as the specificity does (Table 3B); NPV is not very sensitive to increases in the specificity of the test.
Table 3B: 2 x 2 contingency tables with increasing specificity (sensitivity 80%, prevalence 10%)

With 80% specificity
Test Result    D+       D–       Total     NPV
T+             800      1800     2600
T–             200      7200     7400      97.3%
Total          1000     9000     10000

With 90% specificity
Test Result    D+       D–       Total     NPV
T+             800      900      1700
T–             200      8100     8300      97.6%
Total          1000     9000     10000

With 95% specificity
Test Result    D+       D–       Total     NPV
T+             800      450      1250
T–             200      8550     8750      97.7%
Total          1000     9000     10000
Let us consider a screening test with a sensitivity of 80% and a specificity of 90% used in populations of 10,000 individuals with 5%, 10% and 15% prevalence of disease respectively. We see that the NPV of a test with the same sensitivity and specificity decreases as the prevalence of disease increases (Table 3C).
Table 3C: 2 x 2 contingency tables with increasing prevalence (sensitivity 80%, specificity 90%)

With 5% prevalence
Test Result    D+       D–       Total     NPV
T+             400      950      1350
T–             100      8550     8650      98.8%
Total          500      9500     10000

With 10% prevalence
Test Result    D+       D–       Total     NPV
T+             800      900      1700
T–             200      8100     8300      97.6%
Total          1000     9000     10000

With 15% prevalence
Test Result    D+       D–       Total     NPV
T+             1200     850      2050
T–             300      7650     7950      96.2%
Total          1500     8500     10000
Thus the positive predictive value of a screening program can be improved by restricting the program to people at "high risk", that is, those with a relatively high prevalence of preclinical disease, or by screening at a lower frequency so as to maintain the prevalence of preclinical disease in the target population at a higher level. Either approach leads to some overall loss in the value of screening, since fewer cases are detected and treated early (8).
Example 1: The following are the results of Pap smear and cervical biopsy performed on 600 patients attending the gynaecology OPD of a hospital. Study the 2 x 2 table and answer the questions below.
                     Cervical Biopsy
Test Result      Cancer     No Cancer     Total
Positive         96         250           346
Negative         4          250           254
Total            100        500           600
Calculate the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV).

Sensitivity = (96/100) x 100 = 96%
Specificity = (250/500) x 100 = 50%
PPV = (96/346) x 100 = 27.74%
NPV = (250/254) x 100 = 98.42%
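These four figures can be checked in a few lines from the completed table cells (a sketch, not part of the worked example).

```python
# Verify Example 1 from the 2 x 2 table: TP = 96, FN = 4, FP = 250, TN = 250.
tp, fn, fp, tn = 96, 4, 250, 250

sens = tp / (tp + fn)   # 96/100
spec = tn / (fp + tn)   # 250/500
ppv = tp / (tp + fp)    # 96/346
npv = tn / (fn + tn)    # 250/254

print(sens, spec, round(ppv, 4), round(npv, 4))  # 0.96 0.5 0.2775 0.9843
```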
Example 2: The sensitivity of a particular home pregnancy test is 80%. If the test is used by a group of women of whom 1/3 are actually pregnant, and the positive predictive value is 50%, what is the specificity of the test?
Solution: We are given Sensitivity = 80%, PPV = 50% and Prevalence = 1/3 ≈ 33.3%.
The PPV is given by the formula

PPV = (Sensitivity x Prevalence) / [Sensitivity x Prevalence + (1 – Specificity) x (1 – Prevalence)]
Putting the given values in the above expression, we get
Specificity = 60% (approx.)
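The algebra can be checked by rearranging the PPV formula for specificity (a sketch; the rearranged form is derived, not quoted from the text).

```python
# Solve the PPV identity for specificity:
# 1 - spec = sens * prev * (1/PPV - 1) / (1 - prev)
sens, prev, ppv = 0.80, 1/3, 0.50
spec = 1 - sens * prev * (1 / ppv - 1) / (1 - prev)
print(round(spec, 2))  # 0.6 -> specificity = 60%
```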
Likelihood ratio (LR)
The likelihood ratio is a very useful measure of diagnostic accuracy. It is defined as the ratio of the probability of a given test result in subjects with a certain state/disease to that in subjects without the disease. As such, the LR directly links the pre-test and post-test probability of disease in a specific patient (9). Put simply, the LR tells us how many times more likely a particular test result is in subjects with the disease than in those without it. When both probabilities are equal, the test is of no value and its LR = 1.
The likelihood ratio for a positive test result (LR+) tells us how much more likely a positive test result is to occur in subjects with the disease than in those without the disease. Numerically it is given by the formula below:
LR+ = P(T+ | D+) / P(T+ | D–) = Sensitivity / (1 – Specificity)
LR+ is usually greater than 1, because a positive test result is more likely to occur in subjects with the disease than in subjects without it. LR+ is the best indicator for ruling in a diagnosis: the higher the LR+, the more indicative of disease the test is. Good diagnostic tests have LR+ > 10, and their positive result contributes significantly to the diagnosis.
The likelihood ratio for a negative test result (LR–) is the ratio of the probability that a negative result will occur in subjects with the disease to the probability that the same result will occur in subjects without the disease. Therefore, LR– tells us how much less likely a negative test result is to occur in a patient with the disease than in a subject without it.
LR– = P(T– | D+) / P(T– | D–) = (1 – Sensitivity) / Specificity
LR– is usually less than 1, because a negative test result is less likely to occur in subjects with the disease than in subjects without it.
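The two formulas can be sketched with the Pap smear figures from Example 1 (sensitivity 96%, specificity 50%); the helper names are illustrative.

```python
# Likelihood ratios from sensitivity and specificity.

def lr_positive(sens, spec):
    return sens / (1 - spec)

def lr_negative(sens, spec):
    return (1 - sens) / spec

# Pap smear figures from Example 1: sensitivity 96%, specificity 50%.
print(round(lr_positive(0.96, 0.50), 2))  # 1.92 -> weak rule-in value (well below 10)
print(round(lr_negative(0.96, 0.50), 2))  # 0.08 -> strong rule-out value
```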
Area under the ROC curve (AUC)
All the above indicators apply when the outcome of a screening test is a binary variable, i.e. either positive or negative. In many screening tests, however, the outcome is a continuous variable: for example, in the prostate-specific antigen (PSA) test for prostate cancer, a value below 4.0 is considered normal and a value above 4.0 abnormal. Clearly there will be patients with PSA values below 4.0 who are abnormal (false negatives) and patients with values above 4.0 who are normal (false positives). Receiver operating characteristic (ROC) curves are used in medicine to determine the cutoff value for such a clinical test; the goal of an ROC curve analysis is to determine this cutoff.
The sensitivity and specificity of a diagnostic test depend on more than just the "quality" of the test; they also depend on the definition of what constitutes an abnormal result. Consider an idealized graph showing the numbers of patients with and without a disease arranged according to the value of a diagnostic test. The two distributions overlap: the test (like most) does not distinguish normal from diseased with 100% accuracy, and the area of overlap indicates where the test cannot separate the two groups. In practice, we choose a cutoff point above which we consider the test abnormal and below which we consider it normal. The position of the cutoff point determines the numbers of true positives, true negatives, false positives and false negatives, and we may wish to use different cutoff points in different clinical situations, depending on which type of erroneous result we most want to minimize.
Assume there are two groups of men: by using a 'gold standard' technique, one group is known to be normal (negative, no prostate cancer) and the other is known to have prostate cancer (positive). A blood measurement of prostate-specific antigen is made in all men and used to test for the disease; the test will find some, but not all, of the abnormal men to have the disease. An ROC curve analysis of the PSA test will find a cutoff value that in some way minimizes the number of false positives and false negatives, which is the same as maximizing the sensitivity and specificity. The receiver operating characteristic (ROC) curve is the plot that displays the full picture of the trade-off between sensitivity (true positive rate) and 1 – specificity (false positive rate) across a series of cutoff points. The area under the ROC curve is considered an effective measure of the inherent validity of a diagnostic test. The curve is useful in:
(i) Evaluating the discriminatory ability of a test to correctly pick out diseased and non-diseased subjects
(ii) Finding the optimal cutoff point that least misclassifies diseased and non-diseased subjects
(iii) Comparing efficacy of two or more medical tests for assessing the same disease
(iv) Comparing two or more observers measuring the same test (interobserver variability).
Nonparametric and parametric methods to obtain area under the ROC curve
Statistical software provides nonparametric and parametric methods for obtaining the area under ROC curve. The user has to make a choice. The following details may help.
Nonparametric methods are distribution-free, and the resulting area under the ROC curve is called empirical. The first such method uses the trapezoidal rule. If sensitivity and specificity are denoted by Sn and Sp respectively, the trapezoidal rule plots the point (1 – Sp, Sn) at each value of the continuous test, joins successive points with straight lines, and drops perpendiculars to the x-axis. This forms several trapezoids whose areas can easily be calculated and summed. Another nonparametric method uses the Mann-Whitney statistic, also known as the Wilcoxon rank-sum statistic or the c-index, to calculate the area. These two nonparametric estimates of the AUC have been found to be equivalent (10).
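The equivalence of the two nonparametric estimates can be illustrated with a short sketch; the test scores below are hypothetical and the functions are illustrative, not production implementations.

```python
# Two nonparametric AUC estimates: trapezoidal area under the empirical
# ROC curve, and the Mann-Whitney (c-index) form. They agree, ties counted
# as one half. Higher scores indicate disease in this sketch.

def auc_mann_whitney(diseased, healthy):
    # Proportion of (diseased, healthy) pairs correctly ordered.
    wins = sum((d > h) + 0.5 * (d == h) for d in diseased for h in healthy)
    return wins / (len(diseased) * len(healthy))

def auc_trapezoid(diseased, healthy):
    # Empirical ROC points at every distinct cutoff, then the trapezoidal rule.
    cutoffs = sorted(set(diseased) | set(healthy), reverse=True)
    pts = [(0.0, 0.0)]
    for c in cutoffs:
        tpr = sum(d >= c for d in diseased) / len(diseased)  # Sn at cutoff c
        fpr = sum(h >= c for h in healthy) / len(healthy)    # 1 - Sp at cutoff c
        pts.append((fpr, tpr))
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

diseased = [6.2, 7.9, 8.1, 8.8, 9.4]   # hypothetical test scores
healthy = [4.1, 5.0, 6.0, 6.2, 7.5]
print(round(auc_mann_whitney(diseased, healthy), 2))  # 0.94
print(round(auc_trapezoid(diseased, healthy), 2))     # 0.94 (same area)
```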
Parametric methods are used when the statistical distribution of test values in the diseased and non-diseased groups is known. The binormal model is commonly used for this purpose: it applies when the test values in both the diseased and the non-diseased follow a normal distribution. If the data are actually normal, or a transformation such as the log, square or Box-Cox makes them normal, then the relevant parameters can easily be estimated from the means and variances of the test values in diseased and non-diseased subjects. For details, see (9, 11).
The choice of method to calculate the AUC for continuous test values depends largely on the availability of statistical software. The binormal method produces a smooth ROC curve from which further statistics can easily be calculated, but it gives biased results when the data are degenerate or the distribution is bimodal (12-13). When software for both parametric and nonparametric methods is available, the conclusion should be based on the method that yields the greater precision for the estimate of inherent validity, namely of the AUC.
Examples of ROC curve
Patients with suspected hypothyroidism: Consider the following data on patients with suspected hypothyroidism (14). T4 and TSH values were measured in ambulatory patients with suspected hypothyroidism, and the TSH value was used as the gold standard for determining which patients were truly hypothyroid.
T4 value      Hypothyroid     Euthyroid
5 or less     18              1
5.1 – 7       7               17
7.1 – 9       4               36
9 or more     3               39
Totals        32              93
Notice that these authors found considerable overlap in T4 values among the hypothyroid and euthyroid patients. Further, the lower the T4 value, the more likely the patients are to be hypothyroid.
Of a total of 125 subjects, 32 are known to be hypothyroid and 93 are known to have normal thyroid function. All subjects are assessed with respect to T4 (thyroxine) levels, and then sorted among the four ordinal categories: T4 < 5.1, T4 = 5.1 to 7.0, T4 = 7.1 to 9.0, and T4 > 9.0. Of the 19 subjects with T4 levels lower than 5.1, 18 were in fact hypothyroid while only 1 was euthyroid. Thus, if a T4 of 5 or less were taken as an indication of hypothyroidism, this measure would yield 18 true positives and 1 false positive, with a true-positive rate (sensitivity) of 18/32 = .5625 and a false-positive rate (1 – specificity) of 1/93 = .0108.
                 Observed Frequencies              Cumulative Rates
Diagnostic       Euthyroid       Hypothyroid       False         True
Level            (False Pos.)    (True Pos.)       Positive      Positive
<5.1             1               18                .0108         .5625
5.1-7.0          17              7                 .1935         .7813
7.1-9.0          36              4                 .5806         .9063
>9.0             39              3                 1.0           1.0
Totals           93              32
Similarly, 7 of the hypothyroid subjects and 17 of the euthyroid had T4 levels between 5.1 and 7.0. Thus, if any T4 value less than 7.1 were taken as an indication of hypothyroidism, this measure would yield 18 + 7 = 25 true positives and 1 + 17 = 18 false positives, with a true-positive rate of 25/32 = .7813 and a false-positive rate of 18/93 = .1935. And so on for the other diagnostic levels, T4 = 7.1 to 9.0 and T4 > 9.0.
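The cumulative rates above, and the empirical (trapezoidal) area, can be reproduced with a short sketch. Note that this empirical area (about 0.852) is slightly smaller than the fitted-curve value of 0.872 reported for this example, since the fitted area comes from a smooth curve rather than the trapezoidal rule.

```python
# Cumulative TP/FP rates and empirical AUC for the T4 data.
# (hypothyroid, euthyroid) counts per T4 category, lowest T4 first;
# lower T4 indicates hypothyroidism, so cutoffs accumulate from the bottom.
categories = [(18, 1), (7, 17), (4, 36), (3, 39)]
n_hypo = sum(h for h, e in categories)   # 32
n_eu = sum(e for h, e in categories)     # 93

points = [(0.0, 0.0)]
tp = fp = 0
for h, e in categories:
    tp += h
    fp += e
    points.append((fp / n_eu, tp / n_hypo))  # (FPR, TPR) at this cutoff

# Trapezoidal rule over the cumulative (FPR, TPR) points.
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(round(auc, 3))  # 0.852
```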
For the present example k=4, so the curve is fitted to the first three of the bivariate pairs, as shown below in Graph A.
The area under the fitted T4 ROC curve is 0.872. The T4 test would be considered "good" at separating hypothyroid from euthyroid patients.
Interpretation of ROC curve
The total area under the ROC curve is a single index for measuring the performance of a test. The larger the AUC, the better the overall performance of the test at correctly identifying diseased and non-diseased subjects. Equal AUCs for two tests represent similar overall performance, but this does not necessarily mean that the two curves are identical: they may cross each other.
Figure 1 depicts three different ROC curves. Considering the area under the curve, test A is better than both B and C, and its curve is closer to perfect discrimination. Test B has good validity and test C moderate validity.
Figure 1: Three ROC curves with different areas under the curve
The accuracy of the test depends on how well the test separates the group being tested into those with and without the disease in question. Accuracy is measured by the area under the ROC curve. An area of 1 represents a perfect test; an area of .5 represents a worthless test. A rough guide for classifying the accuracy of a diagnostic test is the traditional academic point system:

.90 – 1.0 = excellent (A)
.80 – .90 = good (B)
.70 – .80 = fair (C)
.60 – .70 = poor (D)
.50 – .60 = fail (F)
From a public health point of view it is very important that people perceive symptoms of their own illness and then consult physicians for diagnosis and treatment. The success of any screening program at reducing morbidity and mortality depends on various factors, such as the interrelations between the disease experience of the target population, the characteristics of the screening procedures, and the effectiveness of the methods for treating disease early.
References:
1. Weiss NS. Clinical Epidemiology – Chapter 32. In: Rothman KJ, Greenland S, Lash TL, editors. Modern Epidemiology, Third Edition.
2. Irwig L, Bossuyt P, Glasziou P, Gatsonis C, Lijmer J. Designing studies to ensure that estimates of test accuracy are transferable. BMJ. 2002; 324(7338): 669–71.
3. Raslich MA. Markert RJ, Stutes SA. Selecting and interpreting diagnostic tests. Biochemia Medica 2007; 17(2):139270.
4. Šimundić AM. Measures of diagnostic accuracy: basic definitions. Department of Molecular Diagnostics, University Department of Chemistry, Sestre milosrdnice University Hospital, Zagreb, Croatia. Accessed 12/02/2017 at www.ifcc.org/ifccfiles/docs/190404200805.pdf, page 2.
5. Leeflang MM, Rutjes AW, Reitsma JB, Hooft L, Bossuyt PM. Variation of a test's sensitivity and specificity with disease prevalence. CMAJ. 2013 Aug 6; 185(11).
6. Irwig L, Bossuyt P, Glasziou P, et al. Designing studies to ensure that estimates of test accuracy are transferable. BMJ. 2002; 324: 669–71.
7. Zhu W, Zeng N, Wang N. Sensitivity, Specificity, Accuracy, Associated Confidence Interval and ROC Analysis with Practical SAS Implementations. NESUG 2010, Health Care and Life Sciences.
8. Morrison AS. Screening – Chapter 25. In: Rothman KJ, Greenland S, editors. Modern Epidemiology, Second Edition.
9. Deeks JJ, Altman DG. Diagnostic tests 4: likelihood ratios. BMJ 2004; 329(7458): 168–9.
10. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143: 29–36.
11. Zhou XH, Obuchowski NA, McClish DK. Statistical Methods in Diagnostic Medicine. New York: John Wiley and Sons, Inc, 2002.
12. Faraggi D, Reiser B. Estimation of the area under the ROC curve. Stat Med 2002; 21: 3093–3106.
13. Hajian-Tilaki KO, Hanley JA, Joseph L, Collet JP. A comparison of parametric and nonparametric approaches to ROC analysis of quantitative diagnostic tests. Med Decis Making 1997; 17: 94–102.
14. Goldstein BJ, Mushlin AI. Use of a single thyroxine test to evaluate ambulatory medical patients for suspected hypothyroidism. J Gen Intern Med. 1987 Jan-Feb; 2(1): 20–4.