
VALIDITY AND RELIABILITY OF SCREENING TESTS

Rashida B Syed, Epidemiologist
Consultant Faculty
Field Epidemiology Training Program (FETP), Pakistan

7/11/2013 Validity and reliability of Tests 1


Objectives
• Calculate and interpret measures of the validity of a screening test:
  • Sensitivity
  • Specificity
• Understand the relationship between sensitivity and specificity.
• Calculate and interpret measures of the performance (yield) of a screening test:
  • Predictive value positive (PV+)
  • Predictive value negative (PV-)
• Understand factors that influence PV+ and PV-.
• Recognize issues and sources of bias in evaluating screening programs.

Purpose of screening
• The early detection of disease in individuals who do not show any signs of disease.
• Aims to reduce morbidity and mortality from disease among the persons being screened.
• Is the application of a relatively simple, inexpensive test, examination, or other procedure to people.
• Is a means of identifying persons at increased risk for the presence of disease, who warrant further evaluation.


Diagnosis vs. Screening

• Screening tests can often also be used as diagnostic tests.
• Diagnosis involves confirmation of the presence or absence of disease in someone suspected of having, or at risk for, disease.
• Screening is generally done among individuals who are not suspected of having disease.


Requirements

• Is there a truly effective treatment available for the discovered disease?
• Is that treatment more effective in screened than in non-screened cases?
• What are the side effects of the screening process?
• How efficient is screening? Do we have the right threshold, i.e. how many people must be screened to obtain a case?


Natural History of Disease

[Figure: timeline of the natural history of disease. Stages: Susceptible Host; Subclinical Disease; Clinical Disease; Stage of Recovery, Disability, or Death. Marked events: Point of Exposure; Onset of symptoms; Diagnosis sought. Screening is shown in the period of detectable sub-clinical disease.]

Examples of Screening Tests

• Questions
• Clinical Examinations
• Laboratory Tests
• Genetic Tests
• X-rays

Goel

Diseases for which screening has been recommended

• Cervical cancer
• Breast cancer
• Prostate cancer
• Colon cancer
• Diabetes
• Hypertension


Terminology

Validity is analogous to accuracy.

The validity of a screening test is how well the given screening test reflects another test of known greater accuracy.

Validity assumes that there is a gold standard to which a test can be compared.

Paneth

Three key measures of validity

• Sensitivity
• Specificity
• Predictive value



Sensitivity and Specificity

Sensitivity tells us how well a positive test detects disease. It is defined as the ability of the test to identify correctly, as diseased, those who have the disease.

Specificity tells us how well a negative test detects non-disease. It is defined as the ability of the test to identify correctly, as test negative, those who do not have the disease.


                       Disease
                       Present            Absent
Screening   Positive   True positives     False positives
Test        Negative   False negatives    True negatives


                       Disease
                       Present   Absent   Total
Screening   Positive   a         b        a+b
Test        Negative   c         d        c+d
            Total      a+c       b+d      N


Sensitivity

• Proportion of individuals who have the disease who test positive (true positive rate). It tells us how well a “+” test picks up disease.

                Disease
                yes     no
Screening   +   a       b       a+b
Test        -   c       d       c+d
                a+c     b+d     N

Sensitivity = a / (a + c)

Specificity

• Proportion of individuals who don’t have the disease who test negative (true negative rate). It tells us how well a “-” test detects no disease.

                Disease
                yes     no
Screening   +   a       b       a+b
Test        -   c       d       c+d
                a+c     b+d     N

Specificity = d / (b + d)

Predictive value

• Positive predictive value: the proportion of all those who test positive who actually have the condition.
• Negative predictive value: the proportion of all those who test negative who do not have the condition.


Positive Predictive Value
• Proportion of individuals who test positive who actually have the disease

                Disease
                yes     no
Screening   +   a       b       a+b
Test        -   c       d       c+d
                a+c     b+d     N

P.P.V. = a / (a + b)

Negative Predictive Value

• Proportion of individuals who test negative who don’t have the disease

                Disease
                yes     no
Screening   +   a       b       a+b
Test        -   c       d       c+d
                a+c     b+d     N

N.P.V. = d / (c + d)

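The four measures defined so far can be collected into one small function. This is a minimal Python sketch, not part of the original slides: the function name `screening_measures` and the dictionary keys are illustrative assumptions, and the counts in the usage lines are hypothetical. The cell names a, b, c, d follow the 2×2 table.

```python
def screening_measures(a, b, c, d):
    """Validity and yield measures from a 2x2 table.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    return {
        "sensitivity": a / (a + c),  # diseased who test positive
        "specificity": d / (b + d),  # non-diseased who test negative
        "ppv": a / (a + b),          # test positives who are diseased
        "npv": d / (c + d),          # test negatives who are non-diseased
    }


# Hypothetical counts, chosen only for illustration.
measures = screening_measures(a=45, b=10, c=5, d=40)
for name, value in measures.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity are computed down the columns of the table (by true disease status), whereas the predictive values are computed across the rows (by test result).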
Determinants of predictive value

The predictive value of a test is determined by 3 factors:
  1. Sensitivity
  2. Specificity
  3. Prevalence of the disease in the population being tested


Effect of prevalence on PPV

• As prevalence decreases, the positive predictive value of a test also decreases.
• This explains why diagnostic tests which are developed in clinical populations (where the prevalence of the disease being tested for is often high) often perform poorly in general population settings (where disease prevalence tends to be lower).
• Exercise: prove it using our example.

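The dependence of PPV on prevalence can be checked numerically. The sketch below is an illustration rather than part of the original slides: it rewrites PPV = true positives / all positives in terms of sensitivity, specificity and prevalence (Bayes' theorem), and the function name `ppv` and the values 90%, 80% and the three prevalences are assumptions chosen only for demonstration.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence                # diseased and test positive
    false_pos = (1 - specificity) * (1 - prevalence)   # non-diseased and test positive
    return true_pos / (true_pos + false_pos)


# Illustrative test: 90% sensitivity, 80% specificity (assumed values).
for prevalence in (0.50, 0.10, 0.01):
    print(f"prevalence {prevalence:.0%}: PPV = {ppv(0.90, 0.80, prevalence):.1%}")
```

With these assumed values, PPV falls from about 82% at 50% prevalence to about 4% at 1% prevalence, even though the test itself is unchanged.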
Scenarios
• Tests with dichotomous results
  • Example: positive or negative
• Tests with continuous results
  • Examples:
    • Systolic blood pressure (mm Hg)
    • Tuberculin reaction (induration diameter, mm)


Examples
• In a sample of 200 people, 100 have the disease hypothyroidism and 100 do not.
• In the same sample, 110 people test positive for hypothyroidism using a new diagnostic test, and 90 test negative.
• Of the 110 people who test positive, 90 have the disease and 20 do not.
• Of the 90 people who test negative, 10 have the disease and 80 do not.
• What are the sensitivity and specificity?


Solution

• Sensitivity = TP / (TP + FN) = 90 / (90 + 10) = 90%
• Specificity = TN / (TN + FP) = 80 / (80 + 20) = 80%


A test is used in 50 people with disease and 50 people without. These are the results.

                       Disease
                       Present   Absent   Total
Screening   Positive   48        3        51
Test        Negative   2         47       49
            Total      50        50       100

Paneth

                       Disease
                       Present   Absent   Total
Screening   Positive   48        3        51
Test        Negative   2         47       49
            Total      50        50       100

Sensitivity = 48/50 = 96%
Specificity = 47/50 = 94%
Positive Predictive Value = 48/51 = 94.1%
Negative Predictive Value = 47/49 = 95.9%

Paneth

So… you understand the
accuracy of a screening test …

What is the next step?


Put screening to use in the
population



Sensitive vs. Specific tests
• A test with high sensitivity is usually positive when disease is present and has few false negatives. It is useful when it is important not to miss a diagnosis (e.g. if the disease is dangerous but has an effective treatment).
• A test with high specificity is usually negative when disease is absent and has few false positives. It is useful when a false positive diagnosis would be harmful (e.g. if it resulted in unnecessary treatment).


Balancing sensitivity vs. specificity
• A really good test would be highly sensitive and highly specific.
• In practice, this is often not the case.
• Instead, there is often a trade-off between the sensitivity and the specificity of diagnostic tests.
• This occurs in cases where the test result is expressed on a continuous scale (e.g. blood pressure, blood sugar levels).
• In such circumstances, a cut-point has to be chosen to define normal vs. abnormal.
• The decision for the cut-point involves weighing the consequences of leaving cases undetected (false negatives) against erroneously classifying healthy persons as diseased (false positives).

Refer to Gordis.

NET SENSITIVITY AND SPECIFICITY

• Use of multiple tests
• Refer to Gordis
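The slides defer the details of multiple testing to Gordis. As a rough sketch only, and assuming the two tests err independently of each other given true disease status (an assumption not stated in the slides), net sensitivity and specificity for two tests can be approximated as below. The function names and the example values are hypothetical.

```python
def net_sequential(se1, sp1, se2, sp2):
    """Two-stage testing: only test-1 positives receive test 2, and a
    person is called positive only if both tests are positive."""
    net_sensitivity = se1 * se2              # must be detected twice
    net_specificity = sp1 + (1 - sp1) * sp2  # cleared by test 1 or by test 2
    return net_sensitivity, net_specificity


def net_simultaneous(se1, sp1, se2, sp2):
    """Parallel testing: a person is called positive if either test is positive."""
    net_sensitivity = se1 + se2 - se1 * se2  # detected by at least one test
    net_specificity = sp1 * sp2              # must be negative on both tests
    return net_sensitivity, net_specificity


# Assumed values for illustration: test 1 (80%, 90%), test 2 (90%, 80%).
print("sequential:  ", net_sequential(0.80, 0.90, 0.90, 0.80))
print("simultaneous:", net_simultaneous(0.80, 0.90, 0.90, 0.80))
```

Under this assumption, sequential testing trades sensitivity for specificity, and simultaneous testing does the reverse.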


Balancing sensitivity vs. specificity

Blood sugar level 2 hrs after eating (mg/100 ml)    Sensitivity (%)    Specificity (%)
70                                                  98.6               8.8
90                                                  94.3               47.6
110                                                 85.7               84.1
130                                                 64.3               96.9
170                                                 42.9               100.0


ROC curves

• One method for determining the best cut-off point is by constructing a ROC curve.
• ROC = receiver operating characteristic, a term that comes from radar science.
• ROC curves are constructed by plotting the sensitivity (or true positive rate) against the false positive rate (1 − specificity).
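As an illustration of this construction, the sketch below turns the blood sugar cut-offs tabulated earlier into ROC coordinates. It is not part of the original slides; the names `blood_sugar_table` and `roc_points` are assumptions made for the example.

```python
# Cut-off (mg/100 ml) -> (sensitivity %, specificity %), from the blood sugar table.
blood_sugar_table = {
    70: (98.6, 8.8),
    90: (94.3, 47.6),
    110: (85.7, 84.1),
    130: (64.3, 96.9),
    170: (42.9, 100.0),
}


def roc_points(table):
    """Return (cut-off, false positive rate, true positive rate) per cut-off."""
    points = []
    for cutoff, (sens_pct, spec_pct) in sorted(table.items()):
        false_positive_rate = 1 - spec_pct / 100  # x-axis of the ROC plot
        true_positive_rate = sens_pct / 100       # y-axis of the ROC plot
        points.append((cutoff, false_positive_rate, true_positive_rate))
    return points


for cutoff, fpr, tpr in roc_points(blood_sugar_table):
    print(f"cut-off {cutoff}: FPR = {fpr:.3f}, TPR = {tpr:.3f}")
```

Each cut-off contributes one point; raising the cut-off moves the point down and to the left, lowering both the true positive and the false positive rate.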


ROC curve for blood sugar readings

Source: Fletcher, Fletcher and Wagner, Clinical Epidemiology: The Essentials (3rd ed.)

• Shows the trade-off between sensitivity and specificity
• The closer the curve is to the left-hand and top borders, the more accurate the test
• The slope of the tangent at a cut-point gives the likelihood ratio (LR) for that value of the test
• The area under the curve is a measure of test accuracy


The Area under an ROC Curve

• Good tests lie close to the upper left-hand corner of the graph, where sensitivity and specificity are both high.
• Generally the best cut-off point lies at or near the “shoulder” of the curve.*
• The overall accuracy of the test is represented by the area under the curve.
• Tests that plot close to the diagonal across the middle of the graph are least useful, as this is where the test is no better than chance.
• ROC curves can also be used to compare different tests.

*Unless there are clinical reasons for preferring a highly sensitive or highly specific test.

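One simple way to approximate the area under the curve from a handful of cut-offs is the trapezoidal rule. The sketch below is an illustration and not the method used in the slides: it applies that rule to the sensitivity and specificity pairs of the blood sugar table, adding the corner points (0, 0) and (1, 1) as an assumption so that the curve spans the whole plot. The function name `auc_trapezoid` is hypothetical.

```python
def auc_trapezoid(sens_spec_pairs):
    """Approximate the area under an ROC curve with the trapezoidal rule.

    sens_spec_pairs: iterable of (sensitivity, specificity) as proportions.
    """
    # ROC coordinates are (1 - specificity, sensitivity); add both corners.
    points = [(0.0, 0.0), (1.0, 1.0)]
    points += [(1 - spec, sens) for sens, spec in sens_spec_pairs]
    points.sort()

    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # area of one trapezoid
    return area


# (sensitivity, specificity) for cut-offs 70, 90, 110, 130, 170 mg/100 ml.
pairs = [(0.986, 0.088), (0.943, 0.476), (0.857, 0.841),
         (0.643, 0.969), (0.429, 1.000)]
print(f"Approximate AUC: {auc_trapezoid(pairs):.2f}")
```

With no cut-offs at all the function returns 0.5, the area under the chance diagonal, which matches the point that a test plotting along the diagonal is no better than chance.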
Sources of Bias in the Evaluation of Screening Programs

• Lead time bias
• Length bias
• Volunteer bias


Lead time bias

• Lead time: the interval between the diagnosis of a disease at screening and the usual time of diagnosis (by symptoms).

[Figure: timeline showing lead time as the interval from diagnosis by screening to diagnosis via symptoms.]


Lead-Time Bias

Consider a condition whose natural history allows for an earlier diagnosis, but for which survival does not improve despite identifying it earlier.

With a screening program for such a condition:
• Survival will appear to increase,
• but in reality it is increased by exactly the amount of time by which the screening program advanced the diagnosis.
• Thus there is no benefit to screening from a survival standpoint.


Lead time bias
• Assumes survival is the time between screening and death.
• Does not take into account the lead time between diagnosis at screening and usual diagnosis.

[Figure: timeline from diagnosis by screening in 1994 to death in 2008.]
Survival = 14 years

Lead time bias

[Figure: timeline with diagnosis by screening in 1994, usual time of diagnosis via symptoms in 1998, and death in 2008.]
Survival = 14 years
Lead time = 4 years
True survival = 10 years

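The arithmetic behind this timeline can be written out explicitly. The following is a small illustrative sketch using the years from the slide; the function name `survival_with_lead_time` is an assumption made for the example.

```python
def survival_with_lead_time(screen_dx_year, usual_dx_year, death_year):
    """Split apparent survival after screening into lead time and true survival."""
    apparent_survival = death_year - screen_dx_year  # measured from screen diagnosis
    lead_time = usual_dx_year - screen_dx_year       # how far diagnosis was advanced
    true_survival = apparent_survival - lead_time    # measured from usual diagnosis
    return apparent_survival, lead_time, true_survival


apparent, lead, true_survival = survival_with_lead_time(1994, 1998, 2008)
print(f"Apparent survival: {apparent} years")
print(f"Lead time: {lead} years")
print(f"True survival: {true_survival} years")
```

The date of death is the same in both calculations; only the starting point of the survival clock moves.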
Length Bias

• Most chronic diseases, especially cancers, do not progress at the same rate in everyone.
• Any group of diseased people will include some in whom the disease developed slowly and some in whom it developed rapidly.
• Screening will preferentially pick up slowly developing disease (longer opportunity to be screened), which usually has a better prognosis.


Length bias

[Figure: timelines for several cases along a time axis, each marked O, P, Y and D, with the time of screening indicated.]
O = biological onset of disease; P = disease detectable via screening; Y = symptoms begin; D = death.

Paneth

Volunteer bias

• A type of bias in which those who choose to participate are likely to be different from those who don’t.
• Volunteers tend to have:
  • Better health
  • Lower mortality
  • A greater likelihood of adhering to prescribed medical regimens


A worked example: the fecal occult blood (FOB) screen test is used in 203 people to look for bowel cancer.

                          Bowel cancer (as confirmed on endoscopy)
                          Present    Absent    Total
FOB screen   Positive     2          18        20
test         Negative     1          182       183
             Total        3          200       203

• False positive rate (α) = FP / (FP + TN) = 18 / (18 + 182) = 9% = 1 − specificity.
• False negative rate (β) = FN / (TP + FN) = 1 / (2 + 1) = 33% = 1 − sensitivity.
• Power = sensitivity = 1 − β
• Hence, with large numbers of false positives and few false negatives, a positive FOB screen test is in itself poor at confirming cancer (PPV = 10%) and further investigations must be undertaken; it will, though, pick up 66.7% of all cancers (the sensitivity). However, as a screening test, a negative result is very good at reassuring that a patient does not have cancer (NPV = 99.5%), and at this initial screen it correctly identifies 91% of those who do not have cancer (the specificity).

Reliability

• Validity (accuracy)
• Reliability (repeatability)
• Refer to Epidemiology by Gordis


Review questions from Gordis

• Likelihood ratio positive = sensitivity / (1 − specificity) = 66.67% / (1 − 91%) = 7.4
• Likelihood ratio negative = (1 − sensitivity) / specificity = (1 − 66.67%) / 91% = 0.37
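These two ratios can be reproduced directly from the counts in the FOB worked example (TP = 2, FP = 18, FN = 1, TN = 182). The sketch below is illustrative; the function name `likelihood_ratios` is an assumption and not taken from the slides.

```python
def likelihood_ratios(tp, fp, fn, tn):
    """Positive and negative likelihood ratios from 2x2 table counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Counts from the FOB worked example.
lr_pos, lr_neg = likelihood_ratios(tp=2, fp=18, fn=1, tn=182)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

Unlike the predictive values, the likelihood ratios are built only from sensitivity and specificity, so they do not change with the prevalence of disease in the tested population.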