Overview
In this session we will learn about the value of a type of secondary prevention called
screening. Screening improves patient outcomes by detecting a disease at an earlier,
more treatable stage, or by preventing recurrence of disease. In order to provide
effective curative or preventive health care, it is necessary to distinguish between
individuals who have a disease and those who do not.
For this purpose, several tests are used, such as physical examination; biochemical
assay of blood, urine, and other body fluids; radiography; ultrasonography; cytology;
and histopathology. One question we need to answer is how good these tests are at
separating individuals with and without the disease in question. Unfortunately, many
screening and diagnostic tests are liable to error. In this chapter, you will learn about
statistical methods for assessing the quality of screening and diagnostic tests, to help
you make informed decisions about their use and interpretation.
Learning objectives
After working through this session, students are expected to be able to:
Describe and calculate the measures of validity of a diagnostic test
Explain the relationship between prevalence and predictive values
List the World Health Organization guidelines for assessing appropriateness of
screening
Describe and calculate the measure of reliability of a test.
We cannot expect sensitivity and specificity values to be equally high for a given test,
and the importance of each measure will depend on the disease in question. In the case
of a communicable disease, for example, specificity may be considered more important
as a false positive case may have less of a public health impact than a false negative
which could result in continued transmission of the disease. Estimation of sensitivity and
specificity will depend on the definition that is used for a true positive. This may be
relatively easy when the test is for a dichotomous variable where a disease is considered
to be either present or absent. For a continuous variable, such as blood pressure, the
definition of a positive case needs to be determined and be evidence-based; this may be
by carrying out a further ‘gold standard’ diagnostic test, or by following up participants
to see who develops clinical manifestations of disease.
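For a continuous measure such as blood pressure, the trade-off involved in choosing a cutoff can be sketched in a few lines of Python; the readings below are invented for illustration, not real data:

```python
# Hypothetical systolic blood pressure readings (mmHg) for people who do
# and do not develop disease; all values are invented for illustration.
diseased = [150, 162, 138, 171, 145, 158, 133, 166]
healthy = [118, 125, 132, 141, 122, 136, 128, 115]

def sens_spec(cutoff):
    # A reading at or above the cutoff counts as a positive test.
    tp = sum(x >= cutoff for x in diseased)   # true positives
    tn = sum(x < cutoff for x in healthy)     # true negatives
    return tp / len(diseased), tn / len(healthy)

for cutoff in (130, 140, 150):
    se, sp = sens_spec(cutoff)
    print(f"cutoff {cutoff} mmHg: sensitivity {se:.2f}, specificity {sp:.2f}")
```

Raising the cutoff makes the test more specific but less sensitive, which is why the definition of a positive case needs to be evidence-based.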
Predictive values
Another important measure for a screening test is the predictive value. The positive
predictive value of mammography, for example, will tell a woman how likely it is that
she has breast cancer after a positive mammogram. The negative predictive value will
tell a woman the probability that she truly does not have breast cancer if the
mammogram is negative. Predictive values measure whether or not the individual
actually has the disease, given the results of the screening test, and are determined by
the validity of a test (specificity and sensitivity) and the characteristics of the population
being tested (particularly the prevalence of preclinical disease). The more sensitive a
test, the less likely it is that an individual with a negative result will have the disease, so
the greater the negative predictive value. The more specific a test, the less likely an
individual with a positive test will be free from disease and the greater the positive
predictive value. However, if the disease is rare, and the population is at a low risk of
disease, the positive results are likely to be mostly false positives. Table 12.1
summarizes the relationship between the results of a screening test and the actual
presence of disease as determined by the result of a subsequent confirmatory
diagnostic test (the ‘gold standard’).
In the table, a is the number of subjects who have the condition and are found positive
by the test (true positives), b the number of subjects who do not have the condition but
are found positive by the test (false positives), c the number of subjects who have the
condition but are found negative by the test (false negatives) and d the number of
subjects who do not have the condition and are found negative by the test (true
negatives).
Test Reliability
Reliability means that the results of a test or measurement are identical, or closely
similar, each time it is conducted. When two tests performed with the same equipment
or tool give different results, there is variation between the first and second
measurements. There are three kinds of variation:
1. Intra subject variation
The values obtained in measuring many human characteristics often vary over
time, even during a short period. Variability over time is considerable. This, as
well as the conditions under which certain tests are conducted (e.g.,
postprandially or postexercise, at home or in a physician's office), clearly can
lead to different results in the same individual. Therefore, in evaluating any test
result, it is important to consider the conditions under which the test was
performed, including the time of day.
2. Intra observer variation
Sometimes variation occurs between two or more readings of the same test
results made by the same observer. For example, a radiologist who reads the
same group of X-rays at two different times may read one or more of the X-rays
differently the second time. Tests and examinations differ in the degree to which
subjective factors enter into the observer's conclusions, and the greater the
subjective element in the reading, the greater the intra observer variation in
readings is likely to be.
3. Inter observer variation
Another important consideration is variation between observers. Two examiners
often do not derive the same result. The extent to which observers agree or
disagree is an important issue, whether we are considering physical
examinations, laboratory tests, or other means of assessing human
characteristics. We therefore need to be able to express the extent of agreement
in quantitative terms. We measure this variation using the kappa statistic.
Kappa Method
The kappa statistic is used to measure agreement between two observers and to
determine whether their agreement is greater than would be expected by chance. This
correction matters because even if two observers used completely different criteria to
classify subjects as positive or negative, we would still expect them to agree on some
subjects solely as a function of chance.
                 Observer A
               +        -        Total
Observer B  +  a        b        a+b
            -  c        d        c+d
Total          a+c      b+d      a+b+c+d
Kappa = (Observed agreement − Agreement expected by chance)
        / (100% − Agreement expected by chance)

where

Observed agreement = (a + d) / (a + b + c + d)

Agreement expected by chance =
[ ((a + c) / (a + b + c + d)) × (a + b) + ((b + d) / (a + b + c + d)) × (c + d) ]
/ (a + b + c + d)
Landis and Koch suggested that a kappa greater than 0.75 represents excellent
agreement beyond chance, a kappa below 0.40 represents poor agreement, and a
kappa of 0.40 to 0.75 represents intermediate to good agreement.
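The formula above can be sketched directly; this version uses the equivalent proportion form (agreements as fractions of the total rather than percentages), and the example counts are invented:

```python
def kappa(a, b, c, d):
    # Cells follow the agreement table above: a and d are the pairs on
    # which the observers agree, b and c the pairs on which they differ.
    n = a + b + c + d
    observed = (a + d) / n
    # Chance-expected agreement computed from the marginal totals:
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (observed - expected) / (1 - expected)

# Invented example: two observers rating 100 subjects.
print(f"kappa = {kappa(46, 10, 12, 32):.2f}")  # 0.55 -> intermediate to good
```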
Activity 1
In a hypothetical study of HIV screening in a population of people with clinical signs
and an HIV prevalence below 30%, we would like to test the validity of two HIV
screening tests. We screen 1000 people using two screening tests (HIV1 and HIV2)
and a gold standard (HIV-gold).
In the first screening, using HIV1, we found 761 people negative on HIV1, while 200
people were positive using HIV-gold. Of the 761 people negative on HIV1, 760 were
also negative using HIV-gold.
In the second screening, using HIV2, we found 850 people negative on HIV2. There
were 145 people positive on both HIV2 and HIV-gold.
5. How many people had a false positive and a false negative result using HIV2?
There were 5 people with a false positive result and 55 people with a false negative
result.
6. What are the sensitivity, specificity, and positive and negative predictive values
of the HIV-2 test?
Sensitivity: 73%
Specificity: 99.4%
PPV: 97%
NPV: 94%
7. Between false positives and false negatives, which needs the most attention in HIV
screening of blood for donation or transfusion? Explain your answer.
False negatives, because a false negative result means that a person tests negative
but is actually HIV-positive; if we fail to catch a positive case (a false negative),
the donated blood could become a source of HIV transmission.
8. Which test is better as the first and as the second test in a two-stage diagnosis of
HIV? Explain the reason for choosing each test.
HIV1 for the first diagnostic test, because its high sensitivity gives few false
negative results.
HIV2 for the second diagnostic test, because its high specificity means it can be
used to screen out the false positive results from the first test.
The reason we use two-stage screening is to increase the PPV: by retesting only the
people who were positive on the first test, we raise the prevalence in the tested
group and reduce the number of false positives.
Activity 2
Two physicians were asked to independently classify 100 chest X-rays as abnormal or
normal. The comparison of their classifications is shown in the following table:
                     Physician 2
              Abnormal    Normal    Total
Physician 1
  Abnormal       40         20        60
  Normal         10         30        40
Total            50         50       100
1. The simple, overall percent agreement between two physicians out of the total is…
= (40+30)/100 = 70%
2. The overall percent agreement between the two physicians, removing the X-rays
that both physicians classified as normal is = 40/ (40+20+10) = 40/70= 57.1%
                     Physician 2
              Abnormal    Normal    Total
Physician 1
  Abnormal       40         20        60 (60%)
  Normal         10         30        40 (40%)
Total            50         50       100
                (50%)      (50%)
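Putting the pieces together, the kappa for the two physicians follows from the formula given earlier; a minimal sketch:

```python
# Agreement table from Activity 2: a, d = concordant readings,
# b, c = discordant readings.
a, b, c, d = 40, 20, 10, 30
n = a + b + c + d
observed = (a + d) / n                                      # 0.70
expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2   # 0.50
kappa = (observed - expected) / (1 - expected)
print(f"kappa = {kappa:.2f}")  # 0.40 -> intermediate to good agreement
```

On the Landis and Koch scale, a kappa of 0.40 falls at the lower bound of intermediate to good agreement, noticeably less impressive than the raw 70% agreement suggests.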