
Epidemiology and Population Based Healthcare (NURS 323)

Assessing Validity & Reliability of Diagnostic & Screening Tests
Objectives
• Define the validity and reliability of screening and
diagnostic tests.
• Compare measures of validity, including sensitivity and
specificity.
• Illustrate the use of multiple tests (sequential and
simultaneous testing).
• Introduce positive and negative predictive value.
• Compare measures of reliability, including percent
agreement and kappa.
Validity
Ø Validity is the strength of our conclusions, inferences or propositions.

Ø The best available approximation to the truth of a given conclusion.

In short:
o Were we right?
o Does the instrument measure what it is supposed to measure?
VALIDITY OF SCREENING TESTS

Ø Validity of a test = its ability to distinguish between those who have a
disease and those who do not.

Validity has two components:

1. Sensitivity

2. Specificity
VALIDITY OF SCREENING TESTS

Sensitivity =

Ability of the test to identify correctly those who have the disease.

Specificity=

Ability of the test to identify correctly those who do not have the
disease.
Example of Sensitivity & Specificity:
Tests with Dichotomous Results (Positive or Negative)

• We have a population of 1,000 people, of whom 100 have a certain disease and 900 do not.

• A test is available that can give either positive or negative results.

• We want to use this test to try to differentiate persons who have the disease from those who do not.

• The results obtained by applying the test to this population of 1,000 people are shown in the table, as follows:
Why we Measure Sensitivity and Specificity
The "truth" about the disease status of each individual in the population may sometimes be the result of:
Ø Another test that has been in use

Ø A more definitive, and often more invasive, test (e.g., cardiac catheterization or tissue biopsy).

Ø To calculate the sensitivity & specificity of a test, we must know who "really" has the disease and who does not from a source other than the test we are using.

Ø In fact, we are comparing our test results with some "gold standard": an external source of "truth".
Why we Measure Sensitivity and Specificity
Where is the problem: false positives?
The issue of false positives is important because all people who screened positive are brought back for more sophisticated and more expensive tests.
o Burden on the health care system.

o Anxiety and worry induced in persons who have been told that they have tested positive.
Where is the problem: false negatives?

If a person has the disease but is mistakenly informed that the test
result is negative:
o If the disease is a serious one for which effective intervention is available, the
problem is really critical.

E.g., if the disease is cancer that is curable only in its early stages, a
false-negative result could represent a virtual death sentence.
Tests of Continuous Variables

For a test of a continuous variable, such as blood pressure or blood glucose level, there is no inherent "positive" or "negative" result, so a cutoff level must be chosen above which the result is considered positive.
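
A minimal sketch of how a cutoff turns a continuous measurement into a positive or negative result, and how moving the cutoff trades sensitivity against specificity. The glucose readings, the disease labels, and the function name `sens_spec_at_cutoff` are hypothetical and were invented purely for illustration; they do not come from the slides.

```python
# Hypothetical blood glucose readings (mg/dL) with known disease status.
# These numbers are invented for illustration only.
diseased = [118, 126, 135, 142, 160, 175]
non_diseased = [82, 90, 95, 101, 110, 121, 130]

def sens_spec_at_cutoff(cutoff):
    """Call a reading 'positive' when it is at or above the cutoff."""
    tp = sum(1 for v in diseased if v >= cutoff)      # diseased, test positive
    fn = len(diseased) - tp                           # diseased, test negative
    tn = sum(1 for v in non_diseased if v < cutoff)   # non-diseased, test negative
    fp = len(non_diseased) - tn                       # non-diseased, test positive
    return tp / (tp + fn), tn / (tn + fp)

for cutoff in (100, 120, 140):
    sens, spec = sens_spec_at_cutoff(cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

With these made-up readings, raising the cutoff lowers sensitivity and raises specificity, which is the trade-off a cutoff decision has to balance.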
USE OF MULTIPLE TESTS

Sequential (Two-Stage) Testing:

1. A less expensive, less invasive, or less uncomfortable test is generally performed first.

2. Those who screen positive are recalled for further testing with a more expensive, more invasive, or more uncomfortable test, which may have greater sensitivity and specificity.
Positive & Negative Predictive Value of a Test
(PPV) and (NPV)
Ø What is the chance that a person with a positive test truly has the disease? If the subject is in the first row in the table, what is the probability of being in cell A as compared to cell B? A clinician calculates across the row as follows:
Ø PPV: The proportion of patients who test positive and actually have the disease
o PPV = True positives / Total test positives (true positives + false positives)
A/(A + B) × 100 = 10/50 × 100 = 20%

Ø NPV: The proportion of patients who test negative and actually do not have the disease
o NPV = True negatives / Total test negatives (true negatives + false negatives)
D/(D + C) × 100 = 45/50 × 100 = 90%

For comparison, from the same table:
Sensitivity: A/(A + C) × 100 = 10/15 × 100 = 67%
Specificity: D/(D + B) × 100 = 45/85 × 100 = 53%
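
The arithmetic above can be checked with a short sketch. The cell counts A = 10, B = 40, C = 5, D = 45 are not printed as a table in the slides; they are inferred from the fractions shown (10/15, 45/85, 10/50, 45/50), so treat them as a reconstruction.

```python
# Cell counts inferred from the fractions on the slide (a reconstruction).
A = 10   # true positives:  test positive, disease present
B = 40   # false positives: test positive, disease absent
C = 5    # false negatives: test negative, disease present
D = 45   # true negatives:  test negative, disease absent

sensitivity = A / (A + C)   # down the "disease present" column
specificity = D / (D + B)   # down the "disease absent" column
ppv = A / (A + B)           # across the "test positive" row
npv = D / (D + C)           # across the "test negative" row

for name, value in [("Sensitivity", sensitivity), ("Specificity", specificity),
                    ("PPV", ppv), ("NPV", npv)]:
    print(f"{name}: {value * 100:.0f}%")
```

Sensitivity and specificity are read down the columns (by true disease status), while PPV and NPV are read across the rows (by test result).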
Reliability of Tests
Is the test repeatable? Does it give the same results when we repeat it?

Factors that cause variation in test results:

1. Intra-subject variation: The values obtained in measuring many human characteristics often vary over time, even during a short period.
o E.g., changes in blood pressure readings over a 24-hour period

2. Intra-observer variation: Sometimes variation occurs between two or more readings of the same test results made by the same observer.
o E.g., a radiologist who reads the same group of X-rays at two different times may read one or more of the X-rays differently the second time.
Reliability of Tests

3. Inter-observer variation: The variation between observers.

• E.g., two examiners often do not derive the same result.

• The extent to which observers agree or disagree is an important issue, whether we are considering physical examinations, laboratory tests, or other means of assessing human characteristics.

• We need to be able to express the extent of agreement in quantitative terms.


Reliability of Tests
Percent Agreement:

Ø Table 5-12 shows a schema for examining variation between observers. Two observers were instructed to categorize each test result into one of the following four categories: abnormal, suspect, doubtful, and normal.

Ø The number of readings in each cell is denoted by a letter of the alphabet.

Ø A X-rays: read as abnormal by both radiologists.

Ø C X-rays: read as abnormal by radiologist 2 and as doubtful by radiologist 1.

Ø M X-rays: read as abnormal by radiologist 1 and as normal by radiologist 2.
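
A minimal sketch of the two reliability measures named in the objectives, percent agreement and kappa, for a 4 × 4 table of two radiologists' readings. The counts below are hypothetical (the slides give only letters for the cells), and the kappa shown is the usual Cohen's kappa: observed agreement minus agreement expected by chance, divided by one minus the agreement expected by chance.

```python
# Hypothetical counts, invented for illustration only.
# Rows: radiologist 2; columns: radiologist 1.
# Category order: abnormal, suspect, doubtful, normal.
table = [
    [20,  5,  2,  1],
    [ 4, 15,  6,  2],
    [ 1,  5, 18,  6],
    [ 0,  2,  3, 30],
]

n = sum(sum(row) for row in table)                      # total X-rays read
row_totals = [sum(row) for row in table]
col_totals = [sum(row[j] for row in table) for j in range(len(table))]

# Observed agreement: readings on the diagonal (both chose the same category).
observed = sum(table[i][i] for i in range(len(table))) / n

# Agreement expected by chance alone, from the marginal totals.
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2

percent_agreement = observed * 100
kappa = (observed - expected) / (1 - expected)

print(f"Percent agreement: {percent_agreement:.1f}%")
print(f"Kappa: {kappa:.2f}")
```

Kappa is lower than the raw percent agreement because it discounts the agreement the two observers would reach by chance.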


Questions 1, 2, & 3
A physical examination was used to screen for breast cancer in 2,500 women with biopsy-proven (confirmed) adenocarcinoma of the breast and in 5,000 age- and race-matched control women. The results of the physical examination were positive (i.e., a mass was palpated) in 1,800 cases and in 800 control women, all of whom showed no evidence of cancer at biopsy.
1. The sensitivity of the physical examination was: ______
2. The specificity of the physical examination was: ______
3. The positive predictive value of the physical examination was: ______

In order to compute sensitivity, it is helpful to complete a 2 × 2 table as below:

                        Biopsy-Proven Adenocarcinoma
Physical Examination    Yes       No        Totals
Positive                1,800     800       2,600
Negative                700       4,200     4,900
Totals                  2,500     5,000     7,500
Answer Question 1, 2 & 3

1. Sensitivity can be calculated as follows:

Sensitivity = True positives (TP) / [True positives (TP) + false negatives (FN)]
= 1,800 / (1,800 + 700)
= 0.720, or 72.0%
Answer Question 1, 2 & 3

2. Specificity can be calculated as follows:

Specificity = True negatives (TN) / [True negatives (TN) + false positives (FP)]
= 4,200 / (4,200 + 800)
= 0.840, or 84.0%

3. Positive predictive value (PPV) of the physical examination:

PPV = True positives (TP) / [True positives (TP) + false positives (FP)]
= 1,800 / (1,800 + 800)
= 0.692, or 69.2%
Question 4
A physical examination and an audiometric test were given to 500 persons with suspected hearing problems, of
whom 300 were actually found to have them. The results of the examinations were as follows:

Table I00-2. PHYSICAL EXAMINATION

              HEARING PROBLEMS
Result        Present    Absent
Positive      240        40
Negative      60         160

AUDIOMETRIC TEST

              HEARING PROBLEMS
Result        Present    Absent
Positive      270        60
Negative      30         140

Compared with the physical examination, the audiometric test is:


◦a. Equally sensitive and specific
◦b. Less sensitive and less specific
◦c. Less sensitive and more specific
◦d. More sensitive and less specific
◦e. More sensitive and more specific
Answer to Q4: d is the best answer, based upon the following calculations.

Sensitivity of the physical examination:

= Persons with the disease correctly identified by the test / Persons with the disease
= True positives (TP) / [True positives (TP) + false negatives (FN)]
= 240 / 300
= 0.800, or 80.0%

Sensitivity of the audiometric test:

= Persons with the disease correctly identified by the test / Persons with the disease
= True positives (TP) / [True positives (TP) + false negatives (FN)]
= 270 / 300
= 0.900, or 90.0%
Question 4: Answer
Specificity of the physical examination:

= Persons without the disease correctly identified by the test / Persons without the disease
= True negatives (TN) / [True negatives (TN) + false positives (FP)]
= 160 / 200
= 0.800, or 80.0%

Specificity of the audiometric test:

= Persons without the disease correctly identified by the test / Persons without the disease
= True negatives (TN) / [True negatives (TN) + false positives (FP)]
= 140 / 200
= 0.700, or 70.0%
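
The comparison in Question 4 can be reproduced with a small sketch. The helper name `sens_spec` is hypothetical; the counts are the ones given in the two tables of the question.

```python
def sens_spec(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the four cells of a 2 x 2 table."""
    sensitivity = tp / (tp + fn)   # diseased persons correctly identified
    specificity = tn / (tn + fp)   # non-diseased persons correctly identified
    return sensitivity, specificity

# Counts from the Question 4 tables (300 with hearing problems, 200 without).
physical = sens_spec(tp=240, fp=40, fn=60, tn=160)
audiometric = sens_spec(tp=270, fp=60, fn=30, tn=140)

print(f"Physical examination: sensitivity {physical[0]:.1%}, specificity {physical[1]:.1%}")
print(f"Audiometric test:     sensitivity {audiometric[0]:.1%}, specificity {audiometric[1]:.1%}")

more_sensitive = audiometric[0] > physical[0]
less_specific = audiometric[1] < physical[1]
print("Audiometric test is more sensitive and less specific:",
      more_sensitive and less_specific)
```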
Question 5
Two pediatricians want to investigate a new laboratory test that identifies streptococcal
infections. Dr. Kidd uses the standard culture test, which has a sensitivity of 90% and
a specificity of 96%. Dr. Childs uses the new test, which is 96% sensitive and 96%
specific.
If 200 patients undergo culture with both tests, which of the following is correct?
A. Dr. Kidd will correctly identify more people with streptococcal infection than Dr.
Childs
B. Dr. Kidd will correctly identify fewer people with streptococcal infection than Dr.
Childs
C. Dr. Kidd will correctly identify more people without streptococcal infection than Dr.
Childs
D. The prevalence of streptococcal infection is needed to determine which pediatrician
will correctly identify the larger number of people with the disease
The best answer for Q5 is B
ü The sensitivity of the test is higher for Dr. Childs than for Dr. Kidd. Dr. Childs will correctly identify a greater
proportion of diseased people by the new test.

ü This can also be interpreted as Dr. Kidd correctly identifying fewer children who actually have streptococcal
infection with the standard culture test.

Answer A: incorrect because Dr. Kidd's test has a lower sensitivity.

Answer C: incorrect because the proportion of non-diseased people who are correctly identified as negative by the test will be the same for the two doctors, since both tests have a specificity of 96%.

Answer D: incorrect since sensitivity and specificity are characteristics of the test itself and do not depend upon disease prevalence.
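
A short sketch of why prevalence is not needed to answer Question 5. The three prevalence values are hypothetical, chosen only to show that the ordering does not change; the function name `expected_correct` is likewise an assumption of this sketch.

```python
N_PATIENTS = 200

def expected_correct(prevalence, sensitivity, specificity, n=N_PATIENTS):
    """Expected (true positives, true negatives) when n patients are tested."""
    diseased = n * prevalence
    non_diseased = n - diseased
    return diseased * sensitivity, non_diseased * specificity

for prevalence in (0.05, 0.20, 0.50):   # hypothetical prevalences
    kidd_tp, kidd_tn = expected_correct(prevalence, 0.90, 0.96)      # standard culture
    childs_tp, childs_tn = expected_correct(prevalence, 0.96, 0.96)  # new test
    print(f"prevalence {prevalence:.0%}: "
          f"Kidd finds {kidd_tp:.1f} infected, Childs finds {childs_tp:.1f}; "
          f"both correctly clear {kidd_tn:.1f} uninfected")
```

Whatever the prevalence, the more sensitive test identifies more of the infected patients, and the two tests clear the same number of uninfected patients because their specificities are equal.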
Questions 6 & 7 are based on the following information:

A colon cancer screening study is being conducted in Nottingham, England. Individuals 50 to 75 years old will be screened with the Hemoccult test. In this test, a stool sample is tested for the presence of blood.

The Hemoccult test has a sensitivity of 70% and a specificity of 75%. If Nottingham has a prevalence of 12/1,000 for colon cancer, what is the positive predictive value of the test?
Answer Q6
In a hypothetical population of 10,000 people:

                  Colon Cancer
Hemoccult Test    Yes     No       Totals
Positive          84      2,470    2,554
Negative          36      7,410    7,446
Totals            120     9,880    10,000

Note: PPV can also be calculated as follows:

PPV = True positives (TP) / [True positives (TP) + false positives (FP)]
= 84 / (84 + 2,470)
= 0.033, or 3.3%
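
The same result can be reached by building the 2 × 2 table from prevalence, sensitivity, and specificity. This is a minimal sketch; the function name `ppv_from_prevalence` and the default population of 10,000 are choices made here for illustration (the population size cancels out of the final ratio).

```python
def ppv_from_prevalence(prevalence, sensitivity, specificity, population=10_000):
    """Positive predictive value for a test applied to a hypothetical population."""
    diseased = population * prevalence          # 120 of 10,000 in the example
    non_diseased = population - diseased        # 9,880
    true_pos = diseased * sensitivity           # diseased who test positive
    false_pos = non_diseased * (1 - specificity)  # non-diseased who test positive
    return true_pos / (true_pos + false_pos)

ppv = ppv_from_prevalence(prevalence=12 / 1000, sensitivity=0.70, specificity=0.75)
print(f"PPV = {ppv:.3f} ({ppv:.1%})")
```

The low PPV reflects the low prevalence: most positive results come from the much larger non-diseased group.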
Question 7
If the Hemoccult test result is negative, no further testing is done. If the Hemoccult test
result is positive, the individual will have a second stool sample tested with the
Hemoccult II test. If this second sample also tests positive for blood, the individual will
be referred for more extensive evaluation. What is the effect on net sensitivity and net
specificity of this method of screening?
◦a. Net sensitivity and net specificity are both increased
◦b. Net sensitivity is decreased and net specificity is increased
◦c. Net sensitivity remains the same and net specificity is increased
◦d. Net sensitivity is increased and net specificity is decreased
◦e. The effect on net sensitivity and net specificity cannot be determined from the data
Answer Q7
The best answer is b.
Ø The method of testing described in this example is called sequential testing. By applying sequential tests, there is a loss of net sensitivity and a gain in net specificity.
• Answer a: incorrect since the two test characteristics are not both increased in a sequential testing approach.
• Answer c: incorrect since the test characteristics undergo a trade-off when multiple tests are performed. This means that any gain in one must be balanced by a loss for the other.
• Answer d: incorrect since this describes the result of simultaneous testing.
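
A minimal sketch of the trade-off, under two assumptions that are not in the slides: the two tests err independently of each other given true disease status, and the Hemoccult II test has the same sensitivity (70%) and specificity (75%) as the first test. The function names are hypothetical.

```python
def sequential(sens1, spec1, sens2, spec2):
    """Net values when only test-1 positives are retested; positive = both positive."""
    net_sens = sens1 * sens2                  # a case must be caught by both tests
    net_spec = spec1 + (1 - spec1) * spec2    # cleared by test 1, or later by test 2
    return net_sens, net_spec

def simultaneous(sens1, spec1, sens2, spec2):
    """Net values when both tests are given to everyone; positive = either positive."""
    net_sens = 1 - (1 - sens1) * (1 - sens2)  # missed only if both tests miss
    net_spec = spec1 * spec2                  # cleared only if both tests are negative
    return net_sens, net_spec

# Assumed values: Hemoccult II taken to match the first test (70% / 75%).
seq_sens, seq_spec = sequential(0.70, 0.75, 0.70, 0.75)
sim_sens, sim_spec = simultaneous(0.70, 0.75, 0.70, 0.75)

print(f"Single test:  sensitivity 70.0%, specificity 75.0%")
print(f"Sequential:   sensitivity {seq_sens:.1%}, specificity {seq_spec:.1%}")
print(f"Simultaneous: sensitivity {sim_sens:.1%}, specificity {sim_spec:.1%}")
```

Under these assumptions, sequential testing lowers net sensitivity and raises net specificity (answer b), while simultaneous testing does the opposite (the situation described in answer d).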
