
Measures of Accuracy of Screening Tests

Introduction

Screening can be defined as the application of a medical procedure or test to a defined population in order to identify people who are in the pre-clinical phase of a particular disease, that is, who as yet have no signs or symptoms, for the purpose of determining their likelihood of having the disease. The screening procedure itself does not diagnose the illness; rather, a positive result flags an individual for further evaluation with subsequent diagnostic tests or procedures. The goal of screening is to detect the disease at its earliest phase, when treatment is usually more successful, so that proper treatment or management can reduce morbidity or mortality. Some common examples of screening tests are the Pap smear, mammogram, clinical breast examination, blood pressure determination, cholesterol level, eye examination/vision test, and urinalysis.

Screening vs. Diagnostic Tests

Screening tests differ from diagnostic tests in meaning, application and use. Diagnostic tests are usually performed on individuals with a symptom or sign of an illness, whereas screening tests are applied to apparently healthy individuals with no such symptoms or signs. A screening test is usually applied to a large population simultaneously, whereas a diagnostic test is applied to a single patient at a time. Diagnostic tests are more expensive and more accurate, while screening tests are less accurate and inexpensive. A diagnostic test provides a basis for initiation of treatment, whereas a screening test does not.

Natural History of Disease

The application and effectiveness of a screening test or programme depend upon the natural course of the disease. Screening is not very effective from a public health point of view if the natural course of the disease is short, as in acute illness, where the latent (pre-clinical) period is very brief. Related concepts such as lead time, the interval by which screening advances the time of diagnosis, are best understood in the context of the natural history of disease.

Criteria for a Screening Programme

1. Life-threatening diseases and those known to have serious and irreversible consequences if not treated early are appropriate for screening: for example, a life-threatening disease such as lung cancer, or a disease with irreversible consequences such as hypothyroidism.
2. Treatment of disease at an earlier stage should be more effective than treatment begun after the development of symptoms.
3. The prevalence of the detectable preclinical phase of disease should be high in the population screened. High prevalence reduces the relative cost of the screening programme and increases the positive predictive value.
4. A screening programme for a disease that occurs rarely can benefit only a few individuals. Such a programme might prevent some deaths, and while preventing even one death is important, given limited resources a more cost-effective programme for a more common disease should receive higher priority, because it will help more people.
5. In some cases, though, screening for a low-prevalence disease is also cost-effective, if the cost of screening is less than the cost of care when the disease is not detected early.
6. A suitable screening test must be available. Suitability criteria include adequate sensitivity and specificity, low cost, ease of administration, safety, minimal discomfort upon administration, and acceptability to both patients and practitioners.
7. There must also be appropriate follow-up of individuals with positive screening results to ensure that thorough diagnostic testing occurs.
Characteristics of a Screening Test
1. The given health condition or disease should be an important health problem.
2. There should be a treatment for the disease or condition.
3. There should be proper facilities for diagnosis and treatment after screening.
4. There should be a latent or asymptomatic stage of the disease.
5. The natural history of the disease should be adequately understood.
6. There should be an agreed policy on whom to treat.
7. There should be a continuous process of case-finding in the population, not just a 'once and for all' project.
8. The test used should be sensitive.
9. The test should be inexpensive.
10. The test should be minimally invasive, with little pain or discomfort.
11. The test should be easy to administer and socially acceptable.
12. The test should be reliable, i.e. give consistent results on repeated testing.
13. The test should be valid, i.e. able to distinguish between diseased and non-diseased people.

Test Reliability (Consistency)


A screening test is considered reliable if it gives consistent results with repeated tests. Variability in the measurement can be the result of physiologic variation or the result of variables related to the method of testing. For example, if one were using a sphygmomanometer to measure blood pressure repeatedly over time in a single individual, the results might vary depending on:

• Biological variability (blood pressure normally varies within an individual)
• Instrument variability (is the sphygmomanometer itself reliable?)
• Intra-observer variability (does a given tester perform the test the same way each time?)
• Inter-observer variability (do different testers perform the test the same way?)

The reliability of any test can potentially be affected by one or more of these factors.

Test Validity (Accuracy)

Validity is the ability of a test to correctly measure what it intends to measure. It should correctly identify diseased and non-diseased persons: in diseased persons it should give a positive result, and in non-diseased persons it should give a negative result. The

validity of a test can be assessed if the test results can be compared either with a “true” measure

of the physiologic, biochemical, or pathologic state of the disease or with the occurrence of disease progression or a disease complication that the test result seeks to predict (1).

The diagnostic accuracy of a screening test answers the question: "How well does this test discriminate between the two conditions of interest (diseased and healthy, two stages of disease, etc.)?" This discriminative ability can be quantified by the following measures of diagnostic accuracy:

• Sensitivity and specificity
• Positive and negative predictive values (PPV, NPV)
• Likelihood ratio
• Area under the ROC curve (AUC)

Different measures of diagnostic accuracy relate to different aspects of the diagnostic procedure. Some measures are used to assess the discriminative property of the test; others are used to assess its predictive ability (2). While discriminative measures are mostly used in health policy decisions, predictive measures are most useful in predicting the probability of disease in an individual (3). Furthermore, it should be noted that measures of test performance are not fixed indicators of a test's quality. Measures of diagnostic accuracy are very sensitive to the characteristics of the population in which the test is evaluated: some depend strongly on disease prevalence, while others are highly sensitive to the spectrum of disease in the studied population. It is therefore of utmost importance to know how to interpret them, as well as when and under what conditions to use them (4).

A 2 x 2 table, or contingency table, is also used when testing the validity of a screening test, but note that this is a different contingency table than the ones used for summarizing cohort studies, randomized clinical trials, and case-control studies. The 2 x 2 table below shows the results of the evaluation of a screening test for diseased and non-diseased subjects.

 

                          Gold Standard
Test Result       Diseased      Not Diseased      Total
Test Positive     a (TP)        b (FP)            a + b
Test Negative     c (FN)        d (TN)            c + d
Total             a + c         b + d             a + b + c + d = N

The contingency table for evaluating a screening test lists the true disease status in the columns, and the observed screening test results in the rows.

The table shown above gives the results for a screening test. There are a + c subjects who are ultimately found to have the disease, and b + d subjects who remained free of disease during the study. Among the a + c subjects with disease, "a" have a positive screening test (TP, true positives), but "c" have negative tests (FN, false negatives). Among the b + d subjects without disease, "d" have negative screening tests (TN, true negatives), but "b" incorrectly have positive screening tests (FP, false positives).


Based on the outcome observed in the contingency table above, we will define different diagnostic accuracy of the test.

Sensitivity and Specificity

A. Sensitivity: It is defined as the ability of the test to correctly identify diseased subjects as test positive. It is the conditional probability P(T+ | D+) of getting a positive test result (T+) in diseased subjects (D+). Hence, it relates to the potential of a test to recognise subjects with the disease. Numerically, it is estimated as

Sensitivity = P(T+ | D+) = a / (a + c).

It is usually expressed as a percentage; for example, a sensitivity of 80% means that 80 out of 100 diseased subjects test positive.

B. Specificity: It is defined as the ability of the test to correctly identify healthy or non-diseased subjects as test negative. It is the conditional probability P(T- | D-) of getting a negative test result (T-) in non-diseased subjects (D-). Hence, it relates to the potential of a test to recognise subjects without the disease. Numerically, it is estimated as

Specificity = P(T- | D-) = d / (b + d).

It is usually expressed as a percentage; for example, a specificity of 80% means that 80 out of 100 non-diseased subjects test negative.
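These two definitions translate directly into code. The sketch below is illustrative only; the cell labels a, b, c, d follow the 2 x 2 contingency table given earlier (a = TP, b = FP, c = FN, d = TN), and the example counts are hypothetical.

```python
# Sensitivity and specificity from the cells of a 2 x 2 screening table.
# a = true positives, b = false positives, c = false negatives, d = true negatives.

def sensitivity(a, c):
    """P(T+ | D+): proportion of diseased subjects who test positive."""
    return a / (a + c)

def specificity(b, d):
    """P(T- | D-): proportion of non-diseased subjects who test negative."""
    return d / (b + d)

# Hypothetical counts: 400 TP, 950 FP, 100 FN, 8550 TN.
print(sensitivity(400, 100))   # 0.8  -> sensitivity 80%
print(specificity(950, 8550))  # 0.9  -> specificity 90%
```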

It was once commonly assumed that neither sensitivity nor specificity depends on or is influenced by disease prevalence, which would mean that results estimated in one study population could easily be transferred to another population with a different prevalence of the disease. Nonetheless, the sensitivity and specificity of a test often vary with prevalence, likely through mechanisms, such as patient spectrum, that affect prevalence, sensitivity and specificity together (5). Therefore, investigators are advised to consider the intended use of the test when designing a study of test accuracy, and to specify the inclusion criteria that define the study population accordingly (6).

Along with sensitivity and specificity, accuracy is also an important indicator of diagnostic ability of a screening test. Accuracy is the proportion of true results, either true positive or true negative, in a population. It measures the degree of veracity of a diagnostic test on a condition. Numerically it is given as below:

              True Positive (TP) + True Negative (TN)         a + d
Accuracy  =  ------------------------------------------  =  ---------------
                      TP + FP + FN + TN                     a + b + c + d

In addition to the equation shown above, accuracy can be determined from sensitivity and specificity when the prevalence is known. Prevalence is the probability of disease in the population at a given time:

Accuracy = (Sensitivity x Prevalence) + (Specificity x (1 - Prevalence)).

The numerical value of accuracy represents the proportion of true results (both true positives and true negatives) in the selected population: an accuracy of 99% means that 99% of the time the test result, whether positive or negative, is correct. However, it is worth noting that this equation implies that accuracy is a prevalence-weighted average of sensitivity and specificity. For a rare condition, accuracy is dominated by specificity and can therefore appear high even when the test detects few of the actual cases. Accuracy needs to be interpreted cautiously (7).
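The two routes to accuracy agree, as a quick numerical sketch shows. The counts below are hypothetical, chosen to give a sensitivity of 80%, specificity of 90% and prevalence of 5%.

```python
# Accuracy computed two ways: directly from the 2 x 2 cells, and from
# sensitivity, specificity and prevalence.

a, b, c, d = 400, 950, 100, 8550   # TP, FP, FN, TN (hypothetical)
n = a + b + c + d

acc_counts = (a + d) / n           # (TP + TN) / N

sens = a / (a + c)                 # 0.80
spec = d / (b + d)                 # 0.90
prev = (a + c) / n                 # 0.05
acc_formula = sens * prev + spec * (1 - prev)

print(acc_counts, acc_formula)     # both approximately 0.895
```

Note how, with only 5% prevalence, the overall accuracy of 89.5% sits much closer to the specificity (90%) than to the sensitivity (80%).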

Predictive Value

The validity of a test can also be expressed as the extent to which being categorized as positive or negative actually predicts the presence or absence of disease, i.e. the ability of a test to predict disease among those who test positive and non-disease among those who test negative.

Positive Predictive Value (PPV): It is the proportion of those with a positive test who have the disease, i.e. the probability that a subject has the disease given a positive screening test result. In terms of Bayes' theorem, it is expressed as

                          P(T+|D+) P(D+)
PPV = P(D+ | T+) = ------------------------------------
                    P(T+|D+) P(D+) + P(T+|D-) P(D-)

                        Sensitivity x Prevalence
    = ---------------------------------------------------------------
       Sensitivity x Prevalence + (1 - Specificity) x (1 - Prevalence)

    = a / (a + b)

PPV depends on the sensitivity and specificity of the test and on the prevalence of disease in the population. For a given sensitivity and specificity, the PPV increases as the prevalence of disease in the population increases.

Let us consider a screening test with sensitivity of 80% and specificity of 90% used in populations of 10,000 individuals with 5%, 10% and 15% prevalence of disease respectively. We see that the PPV of a test with the same sensitivity and specificity increases as the prevalence of disease increases (Table-2).

Table-2 : 2 x 2 Contingency Tables with increasing prevalence

With 5% Prevalence
Test Result    D+       D-       Total    PPV
T+             400      950      1350     29.6%
T-             100      8550     8650
Total          500      9500     10000

With 10% Prevalence
Test Result    D+       D-       Total    PPV
T+             800      900      1700     47.05%
T-             200      8100     8300
Total          1000     9000     10000

With 15% Prevalence
Test Result    D+       D-       Total    PPV
T+             1200     850      2050     58.53%
T-             300      7650     7950
Total          1500     8500     10000

Let us consider a screening test with sensitivity of 80% and prevalence of 10%, but with varying specificity of 80%, 90% and 95% respectively. We see that the PPV of a test with the same sensitivity and prevalence increases as the specificity of the test increases (Table-2A).

Table-2A : 2 x 2 Contingency Tables with increasing Specificity

With 80% Specificity
Test Result    D+       D-       Total    PPV
T+             800      1800     2600     30.76%
T-             200      7200     7400
Total          1000     9000     10000

With 90% Specificity
Test Result    D+       D-       Total    PPV
T+             800      900      1700     47.05%
T-             200      8100     8300
Total          1000     9000     10000

With 95% Specificity
Test Result    D+       D-       Total    PPV
T+             800      450      1250     64.0%
T-             200      8550     8750
Total          1000     9000     10000


Let us consider a screening test with specificity of 90% and prevalence of 10%, but with varying sensitivity of 80%, 90% and 95% respectively. We see that the PPV of a test with the same specificity and prevalence increases as the sensitivity of the test increases (Table-2B).

Table-2B : 2 x 2 Contingency Tables with increasing Sensitivity

With 80% Sensitivity
Test Result    D+       D-       Total    PPV
T+             800      900      1700     47.05%
T-             200      8100     8300
Total          1000     9000     10000

With 90% Sensitivity
Test Result    D+       D-       Total    PPV
T+             900      900      1800     50.0%
T-             100      8100     8200
Total          1000     9000     10000

With 95% Sensitivity
Test Result    D+       D-       Total    PPV
T+             950      900      1850     51.35%
T-             50       8100     8150
Total          1000     9000     10000

From Tables 2, 2A and 2B above, we can see that PPV increases more rapidly with increasing specificity of the test and increasing prevalence of the disease than with increasing sensitivity of the test. Hence, PPV is influenced more by the specificity of the test and the prevalence of the disease.

Thus, for a screening test with a given sensitivity and specificity, the rarer the disease, the lower the PPV. In this sense, PPV serves as a crude measure of relative cost efficiency: it reflects the ratio of the screening programme's benefit or yield (number of TP) to the cost of misdiagnoses (FP + FN) for a given number of screened subjects. Further, PPV is more sensitive to changes in specificity than to changes in sensitivity. Therefore, we can do more to improve the efficiency of a screening programme, especially for a rare disease, by increasing the specificity of the test than by increasing its sensitivity. A PPV of 50% indicates that the chance of having disease among those who tested positive is 50%.

Negative Predictive Value (NPV): It is the proportion of those with a negative test who do not have the disease in question, i.e. the probability that a subject is non-diseased given a negative screening test result. In terms of Bayes' theorem, it is expressed as

                          P(T-|D-) P(D-)
NPV = P(D- | T-) = ------------------------------------
                    P(T-|D-) P(D-) + P(T-|D+) P(D+)

                      Specificity x (1 - Prevalence)
    = ---------------------------------------------------------------
       Specificity x (1 - Prevalence) + (1 - Sensitivity) x Prevalence

    = d / (c + d)

An NPV very close to 1 indicates that testing negative is reassuring as to the absence of disease and that rescreening may not be worthwhile. If NPV falls short of 1 by an amount comparable with the pre-clinical disease prevalence, much of the pre-clinical disease pool will be missed by the screening programme. A low NPV is more likely to result from poor sensitivity than from poor specificity; hence, a screening test with high sensitivity will improve NPV.
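The Bayes-theorem formulas for PPV and NPV can be sketched as short functions, which also reproduce (to rounding) the prevalence trend shown in the surrounding tables for a test with 80% sensitivity and 90% specificity.

```python
# PPV and NPV from sensitivity, specificity and prevalence (Bayes' theorem).

def ppv(se, sp, prev):
    """P(D+ | T+) for a test with sensitivity se and specificity sp."""
    return (se * prev) / (se * prev + (1 - sp) * (1 - prev))

def npv(se, sp, prev):
    """P(D- | T-) for a test with sensitivity se and specificity sp."""
    return (sp * (1 - prev)) / (sp * (1 - prev) + (1 - se) * prev)

# With Se = 0.80 and Sp = 0.90, PPV rises and NPV falls as prevalence rises.
for prev in (0.05, 0.10, 0.15):
    print(prev,
          round(ppv(0.80, 0.90, prev) * 100, 2),   # ~29.63, 47.06, 58.54
          round(npv(0.80, 0.90, prev) * 100, 2))   # ~98.84, 97.62, 96.23
```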

Let us consider a screening test with specificity of 90% and prevalence of 10%, but with varying sensitivity of 80%, 90% and 95% respectively. We see that the NPV of a test with the same specificity and prevalence increases as the sensitivity of the test increases (Table-3A).

Table-3A : 2 x 2 Contingency Tables with increasing Sensitivity

With 80% Sensitivity
Test Result    D+       D-       Total    NPV
T+             800      900      1700
T-             200      8100     8300     97.59%
Total          1000     9000     10000

With 90% Sensitivity
Test Result    D+       D-       Total    NPV
T+             900      900      1800
T-             100      8100     8200     98.78%
Total          1000     9000     10000

With 95% Sensitivity
Test Result    D+       D-       Total    NPV
T+             950      900      1850
T-             50       8100     8150     99.39%
Total          1000     9000     10000

Let us consider a screening test with sensitivity of 80% and prevalence of 10%, but with varying specificity of 80%, 90% and 95% respectively. We see that the NPV of a test with the same sensitivity and prevalence increases only slightly as the specificity of the test increases (Table-3B); NPV is not very sensitive to increases in specificity.

Table-3B : 2 x 2 Contingency Tables with increasing Specificity

With 80% Specificity
Test Result    D+       D-       Total    NPV
T+             800      1800     2600
T-             200      7200     7400     97.3%
Total          1000     9000     10000

With 90% Specificity
Test Result    D+       D-       Total    NPV
T+             800      900      1700
T-             200      8100     8300     97.6%
Total          1000     9000     10000

With 95% Specificity
Test Result    D+       D-       Total    NPV
T+             800      450      1250
T-             200      8550     8750     97.7%
Total          1000     9000     10000

Let us consider a screening test with sensitivity of 80% and specificity of 90% used in populations of 10,000 individuals with 5%, 10% and 15% prevalence of disease respectively. We see that the NPV of a test with the same sensitivity and specificity decreases as the prevalence of disease increases (Table-3C).

Table-3C : 2 x 2 Contingency Tables with increasing prevalence

With 5% Prevalence
Test Result    D+       D-       Total    NPV
T+             400      950      1350
T-             100      8550     8650     98.8%
Total          500      9500     10000

With 10% Prevalence
Test Result    D+       D-       Total    NPV
T+             800      900      1700
T-             200      8100     8300     97.6%
Total          1000     9000     10000

With 15% Prevalence
Test Result    D+       D-       Total    NPV
T+             1200     850      2050
T-             300      7650     7950     96.2%
Total          1500     8500     10000

Thus, the positive predictive value of a screening programme can be improved by restricting the programme to people at "high risk", that is, those with a relatively high prevalence of preclinical disease, or by screening at a lower frequency so as to maintain the prevalence of preclinical disease in the target population at a higher level. Either approach leads to some overall loss of the value of screening, since fewer cases are detected and treated early (8).

Example 1: The following are the results of Pap smear and cervical biopsy performed on 600 patients attending a gynaecology OPD in a hospital. Study the 2 x 2 table and answer the questions below.

                         Cervical Biopsy
Test Result    Cancer    No Cancer    Total
Positive       96        250          346
Negative       4         250          254
Total          100       500          600

Calculate the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV).

Sensitivity = (96/100) x 100 = 96%
Specificity = (250/500) x 100 = 50%
PPV = (96/346) x 100 = 27.74%
NPV = (250/254) x 100 = 98.42%
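These hand calculations can be checked with a few lines of code (the text's 27.74% and 98.42% are truncated; rounding gives 27.75% and 98.43%).

```python
# Verifying Example 1 from the Pap smear / cervical biopsy table.
tp, fp, fn, tn = 96, 250, 4, 250   # TP, FP, FN, TN

sens = tp / (tp + fn)   # 96/100
spec = tn / (fp + tn)   # 250/500
ppv  = tp / (tp + fp)   # 96/346
npv  = tn / (fn + tn)   # 250/254

print(f"Sensitivity = {sens:.2%}")  # 96.00%
print(f"Specificity = {spec:.2%}")  # 50.00%
print(f"PPV = {ppv:.2%}")           # 27.75%
print(f"NPV = {npv:.2%}")           # 98.43%
```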

Example 2: The sensitivity of a particular home pregnancy test is 80%. If the test is used by a group of women of whom 1/3 are actually pregnant, and the positive predictive value is 50%, what is the specificity of the test?

Solution: Given Sensitivity = 80%, PPV = 50% and Prevalence = 1/3 ≈ 33.3%.

PPV is given by the formula

                        Sensitivity x Prevalence
PPV = ---------------------------------------------------------------
       Sensitivity x Prevalence + (1 - Specificity) x (1 - Prevalence)

Substituting the given values in the above expression and solving, we get

Specificity = 60% (approx.)
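The PPV formula can be rearranged to solve for specificity directly, as sketched below; using exact fractions shows the answer is in fact exactly 60%.

```python
# Solving Example 2 for specificity by rearranging the PPV formula:
#   PPV = Se*p / (Se*p + (1 - Sp)*(1 - p))
#   =>  1 - Sp = Se * p * (1 - PPV) / (PPV * (1 - p))
from fractions import Fraction

se  = Fraction(4, 5)    # sensitivity 80%
ppv = Fraction(1, 2)    # positive predictive value 50%
p   = Fraction(1, 3)    # prevalence 1/3

sp = 1 - se * p * (1 - ppv) / (ppv * (1 - p))
print(float(sp))  # 0.6 -> specificity is exactly 60%
```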

Likelihood ratio (LR)

Likelihood ratio is a very useful measure of diagnostic accuracy. It is defined as the ratio of the probability of a given test result in subjects with a certain state/disease to the probability of that result in subjects without the disease. As such, the LR directly links the pre-test and post-test probability of a disease in a specific patient (9).

Simply put, the LR tells us how many times more likely a particular test result is in subjects with the disease than in those without the disease. When both probabilities are equal, the test is of no value and its LR = 1.

Likelihood ratio for positive test results (LR+) tells us how much more likely the positive test result is to occur in subjects with the disease compared to those without the disease. Numerically it is given by the formula as below:

 

        Pr(T+ | D+)         Sensitivity
LR+ =  -------------  =  -----------------
        Pr(T+ | D-)       (1 - Specificity)

LR+ is usually greater than 1 because a positive test result is more likely to occur in subjects with the disease than in subjects without the disease.

LR+ is the best indicator for ruling in a diagnosis: the higher the LR+, the more indicative the test is of the disease. Good diagnostic tests have LR+ > 10, and their positive result contributes significantly to the diagnosis.

Likelihood ratio for a negative test result (LR-) represents the ratio of the probability that a negative result will occur in subjects with the disease to the probability that the same result will occur in subjects without the disease. Therefore, LR- tells us how much less likely a negative test result is to occur in a subject with the disease than in a subject without the disease.

 

        Pr(T- | D+)       (1 - Sensitivity)
LR- =  -------------  =  ------------------
        Pr(T- | D-)          Specificity

LR- is usually less than 1 because a negative test result is less likely to occur in subjects with the disease than in subjects without the disease.
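Both likelihood ratios follow directly from sensitivity and specificity. The sketch below uses the running example of a test with 80% sensitivity and 90% specificity.

```python
# Likelihood ratios from sensitivity and specificity.

def lr_positive(se, sp):
    """LR+ = Sensitivity / (1 - Specificity)."""
    return se / (1 - sp)

def lr_negative(se, sp):
    """LR- = (1 - Sensitivity) / Specificity."""
    return (1 - se) / sp

# With Se = 0.80 and Sp = 0.90: LR+ = 0.80/0.10 = 8, LR- = 0.20/0.90 ~ 0.22.
print(round(lr_positive(0.80, 0.90), 2))  # 8.0
print(round(lr_negative(0.80, 0.90), 2))  # 0.22
```

An LR+ of 8 falls just short of the LR+ > 10 guideline quoted above for a "good" diagnostic test.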

Area under the ROC curve (AUC)

All the diagnostic indicators above apply when the outcome of the screening test is a binary variable, i.e. either positive or negative. In many screening tests the outcome is a continuous variable, such as the prostate-specific antigen (PSA) test for prostate cancer, in which a test value below 4.0 is considered normal and a value above 4.0 abnormal. Clearly there will be patients with PSA values below 4.0 who are abnormal (false negatives) and patients with values above 4.0 who are normal (false positives). Receiver operating characteristic (ROC) curves are used in medicine to determine a cut-off value for such a clinical test; the goal of an ROC curve analysis is to determine this cut-off value.

The sensitivity and specificity of a diagnostic test depend on more than just the "quality" of the test; they also depend on the definition of what constitutes an abnormal result. Consider an idealized graph showing the numbers of patients with and without a disease arranged according to the value of a diagnostic test. These distributions overlap: the test (like most) does not distinguish normal from disease with 100% accuracy, and the area of overlap indicates where the test cannot distinguish normal from disease. In practice, we choose a cut-off point above which we consider the test abnormal and below which we consider it normal. The position of the cut-off point determines the numbers of true positives, true negatives, false positives and false negatives. We may wish to use different cut-off points in different clinical situations if we wish to minimize one of the erroneous types of test results.


Assume there are two groups of men: using a 'gold standard' technique, one group is known to be normal (negative), i.e. not to have prostate cancer, and the other is known to have prostate cancer (positive). A blood measurement of prostate-specific antigen is made in all men and used to test for the disease. The test will find some, but not all, of the abnormal men to have the disease. ROC curve analysis of the PSA test finds a cut-off value that in some way minimizes the numbers of false positives and false negatives; minimizing the false positives and false negatives is the same as maximizing the sensitivity and specificity. The receiver operating characteristic (ROC) curve is the plot that displays the full picture of the trade-off between sensitivity (true positive rate) and 1 - specificity (false positive rate) across a series of cut-off points. The area under the ROC curve is considered an effective measure of the inherent validity of a diagnostic test. The ROC curve is useful for:

(i) Evaluating the discriminatory ability of a test to correctly pick out diseased and non-diseased subjects
(ii) Finding the optimal cut-off point that least misclassifies diseased and non-diseased subjects
(iii) Comparing the efficacy of two or more medical tests for assessing the same disease
(iv) Comparing two or more observers measuring the same test (inter-observer variability)

Non-parametric and parametric methods to obtain area under the ROC curve

Statistical software provides non-parametric and parametric methods for obtaining the area under ROC curve. The user has to make a choice. The following details may help.

Non-parametric methods are distribution-free, and the resulting area under the ROC curve is called empirical. The first such method uses the trapezoidal rule. If sensitivity and specificity are denoted by Sn and Sp respectively, the trapezoidal rule calculates the area by joining the points (1 - Sp, Sn) at each observed value of the continuous test and dropping straight lines to the x-axis. This forms several trapezoids, whose areas can easily be calculated and summed. Another non-parametric method uses the Mann-Whitney statistic, also known as the Wilcoxon rank-sum statistic or the c-index, to calculate the area. These two non-parametric estimates of AUC have been found to be equivalent (10).
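The equivalence of the trapezoidal and Mann-Whitney (c-index) estimates can be checked numerically. The sketch below uses small made-up test values, higher values taken as more disease-like; both the functions and the data are illustrative, not from the text.

```python
# Empirical AUC two ways: trapezoidal rule over the ROC points, and the
# Mann-Whitney c-index (proportion of diseased/non-diseased pairs in which
# the diseased subject has the higher value; ties count 1/2).

diseased     = [6.2, 7.9, 5.5, 8.8]        # hypothetical test values
non_diseased = [3.1, 5.5, 4.0, 6.0, 2.2]

def c_index(dis, non):
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in dis for y in non)
    return wins / (len(dis) * len(non))

def trapezoid_auc(dis, non):
    # Sweep a cut-off over all observed values to build the ROC points.
    cuts = sorted(set(dis + non), reverse=True)
    pts = [(0.0, 0.0)]
    for t in cuts:
        tpr = sum(x >= t for x in dis) / len(dis)   # sensitivity
        fpr = sum(y >= t for y in non) / len(non)   # 1 - specificity
        pts.append((fpr, tpr))
    pts.append((1.0, 1.0))
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

print(c_index(diseased, non_diseased),       # 0.925
      trapezoid_auc(diseased, non_diseased)) # 0.925 (same area)
```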

Parametric methods are used when the statistical distribution of test values in the diseased and non-diseased groups is known. The binormal model is commonly used for this purpose; it is applicable when the test values in both the diseased and the non-diseased follow a normal distribution. If the data are actually binormal, or a transformation such as log, square or Box-Cox makes them binormal, then the relevant parameters can easily be estimated from the means and variances of the test values in diseased and non-diseased subjects. For details, see (9, 11).

The choice of method for calculating the AUC of a continuous test essentially depends upon the availability of statistical software. The binormal method produces a smooth ROC curve from which further statistics can easily be calculated, but it gives biased results when the data are degenerate or the distribution is bimodal (12-13). When software for both parametric and non-parametric methods is available, the conclusion should be based on the method that yields greater precision in the estimate of inherent validity, namely of AUC.

Examples of ROC curve

Patients with Suspected Hypothyroidism: Consider the following reported data on patients with suspected hypothyroidism (14). T4 and TSH values were measured in ambulatory patients with suspected hypothyroidism, and the TSH value was used as a gold standard for determining which patients were truly hypothyroid.

T4 value       Hypothyroid    Euthyroid
5 or less      18             1
5.1 - 7        7              17
7.1 - 9        4              36
9 or more      3              39
Totals:        32             93

Notice that these authors found considerable overlap in T4 values among the hypothyroid and euthyroid patients. Further, the lower the T4 value, the more likely the patients are to be hypothyroid.

Of a total of 125 subjects, 32 are known to be hypothyroid and 93 are known to have normal thyroid function. All subjects are assessed with respect to T4 (thyroxine) levels, and then sorted among the four ordinal categories: T4<5.1, T4=5.1 to 7.0, T4=7.1 to 9.0, and T4>9.0. Of the 19 subjects with T4 levels lower than 5.1, 18 were in fact hypothyroid while only 1 was euthyroid. Thus, if a T4 of 5 or less were taken as an indication of hypothyroidism, this measure would yield 18 true positives and 1 false positive, with a true-positive rate (sensitivity) of 18/32=.5625 and a false-positive rate (1-specificity) of 1/93=.0108.


 

                 Observed Frequencies          Cumulative Rates
T4 Value         Euthyroid     Hypothyroid     Euthyroid     Hypothyroid
(Diagnostic      (False        (True           (False        (True
Level)           Positives)    Positives)      Positive)     Positive)
<5.1             1             18              .0108         .5625
5.1-7.0          17            7               .1935         .7813
7.1-9.0          36            4               .5806         .9063
>9.0             39            3               1.0           1.0
Totals:          93            32

Similarly, 7 of the hypothyroid subjects and 17 of the euthyroid subjects had T4 levels between 5.1 and 7.0. Thus, if any value of T4 less than 7.1 were taken as an indication of hypothyroidism, this measure would yield 18 + 7 = 25 true positives and 1 + 17 = 18 false positives, with a true-positive rate of 25/32 = .7813 and a false-positive rate of 18/93 = .1935. And so on for the other diagnostic levels, T4 = 7.1 to 9.0 and T4 > 9.0.

For the present example k = 4, so the curve is fitted to the first three of the bivariate pairs.

The area under the T4 ROC curve is 0.872. T4 would therefore be considered "good" at separating hypothyroid from euthyroid patients.
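As a check, the empirical (trapezoidal) AUC can be computed directly from the cumulative rates tabulated above. The trapezoidal estimate comes out near 0.85, slightly below the 0.872 quoted for the fitted curve; the two need not agree exactly, since the empirical polygon and the smooth fitted curve estimate the area in different ways.

```python
# Empirical (trapezoidal) AUC for the T4 example, built from the
# cumulative false-positive and true-positive rates.
points = [(0.0, 0.0),
          (1/93, 18/32),    # cut-off: T4 < 5.1 called positive
          (18/93, 25/32),   # cut-off: T4 < 7.1
          (54/93, 29/32),   # cut-off: T4 < 9.1
          (1.0, 1.0)]       # everyone called positive

auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(round(auc, 3))  # 0.852
```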

Interpretation of ROC curve

The total area under the ROC curve (AUC) is a single index for measuring the performance of a test. The larger the AUC, the better the overall performance of the medical test at correctly identifying diseased and non-diseased subjects. Equal AUCs for two tests represent similar overall performance, but this does not necessarily mean that the two curves are identical; they may cross each other.

Figure 1 depicts three different ROC curves. Considering the area under the curve, test A is better than both B and C, and its curve lies closest to perfect discrimination. Test B has good validity and test C moderate validity.


Figure 1: Three ROC curves with different areas under the curve

The accuracy of the test depends on how well the test separates the group being tested into those with and without the disease in question. Accuracy is measured by the area under the ROC curve. An area of 1 represents a perfect test; an area of .5 represents a worthless test. A rough guide for classifying the accuracy of a diagnostic test is the traditional academic point system:

.90-1 = excellent (A)

.80-.90 = good (B)

.70-.80 = fair (C)

.60-.70 = poor (D)

.50-.60 = fail (F)
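For illustration, the point system above can be encoded as a small helper. This is a hypothetical function, not part of any standard library:

```python
# Hypothetical helper: maps an AUC to the traditional academic grades above.
def auc_grade(auc):
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    bands = [(0.9, "excellent (A)"), (0.8, "good (B)"),
             (0.7, "fair (C)"), (0.6, "poor (D)"), (0.5, "fail (F)")]
    for cutoff, grade in bands:
        if auc >= cutoff:
            return grade
    return "worse than chance"

print(auc_grade(0.872))  # the T4 example above: "good (B)"
```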


Screening is very important from a public health point of view, as it helps people recognize possible illness early and then consult physicians for diagnosis and treatment. The success of any screening programme at reducing morbidity and mortality depends on various factors, such as the interrelations between the disease experience of the target population, the characteristics of the screening procedures, and the effectiveness of the methods of treating disease early.


References:

  • 1. Weiss NS. Clinical Epidemiology. Chapter 32 in: Rothman KJ, Greenland S, Lash TL, editors. Modern Epidemiology, Third Edition.

  • 2. Irwig L, Bossuyt P, Glasziou P, Gatsonis C, Lijmer J. Designing studies to ensure that estimates of test accuracy are transferable. BMJ. 2002;324(7338):669-71.

  • 3. Raslich MA, Markert RJ, Stutes SA. Selecting and interpreting diagnostic tests. Biochemia Medica 2007;17(2):139-270.

  • 4. Šimundić AM. Measures of diagnostic accuracy: basic definitions. Department of Molecular Diagnostics, University Department of Chemistry, Sestre milosrdnice University Hospital, Zagreb, Croatia. Accessed 12/02/2017 at www.ifcc.org/ifccfiles/docs/190404200805.pdf, p. 2.

  • 5. Leeflang MMG, Rutjes AWS, Reitsma JB, Hooft L, Bossuyt PM. Variation of a test's sensitivity and specificity with disease prevalence. CMAJ. 2013 Aug 6;185(11).

  • 6. Irwig L, Bossuyt P, Glasziou P, et al. Designing studies to ensure that estimates of test accuracy are transferable. BMJ. 2002; 324:669-71.

  • 7. Zhu W, Zeng N, Wang N. Sensitivity, Specificity, Accuracy, Associated Confidence Interval and ROC Analysis with Practical SAS® Implementations. NESUG 2010, Health Care and Life Sciences.

  • 8. Morrison AS. Screening. Chapter 25 in: Rothman KJ, Greenland S, editors. Modern Epidemiology, Second Edition.

  • 9. Deeks JJ, Altman DG. Diagnostic tests 4: likelihood ratios. BMJ. 2004 Jul 17;329(7458):168-9.

 
  • 10. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143:29-36.

  • 11. Zhou Xh, Obuchowski NA, McClish DK. Statistical Methods in Diagnostic Medicine. New York: John Wiley and Sons, Inc, 2002.

  • 12. Faraggi D, Reiser B. Estimation of the area under the ROC curve. Stat Med 2002;21:3093-106.

  • 13. Hajian-Tilaki KO, Hanley JA, Joseph L, Collet JP. A comparison of parametric and nonparametric approaches to ROC analysis of quantitative diagnostic tests. Med Decis Making 1997;17:94-102.

  • 14. Goldstein BJ, Mushlin AI. Use of a single thyroxine test to evaluate ambulatory medical patients for suspected hypothyroidism. J Gen Intern Med. 1987 Jan-Feb;2(1):20-4.
