INTRODUCTION
DIAGNOSIS
• Diagnosis is an imperfect process
o Results in a probability rather than a certainty of being right
• The doctor’s certainty/uncertainty about a diagnosis is expressed using terms like “rule out” or “possible” before a clinical diagnosis
• Increasingly, clinicians express the likelihood that a patient has a disease as a probability
o Implies being familiar with the mathematical relationships between the properties of diagnostic tests and the information they yield in various clinical situations
o Understanding these may:
▪ Help the clinician reduce diagnostic uncertainty
▪ Increase the understanding of the degree of uncertainty (know what you don’t know)
▪ Convince the clinician to increase their level of certainty
• Four Possible Types of Test Results
o Diagnostic tests can be either positive (abnormal) or negative (normal)
o Disease can be either present or absent

         | Disease (+)    | Disease (-)
Test (+) | True Positive  | False Positive
Test (-) | False Negative | True Negative

• Reasons for Performing a Diagnostic Test:
o To diagnose the presence or absence of a particular condition
o To identify people who are predisposed to disease
o To identify early asymptomatic disease (screening)
o To plan an intervention (e.g., surgery)
o To monitor response to therapy
o To estimate the risk of future events
o To determine prognosis in patients with a known disease
• Questions on diagnosis should be phrased in terms of the following variables:
o P - the patient population on whom the test might be done
o I/E - the exposure or the test to be performed (referred to as the index test)
o C - the comparator (the gold standard test/reference standard/criterion standard test)
o O - the outcome (or condition) that is supposed to be diagnosed
• Example: Among patients with febrile illness (P), how accurate is Dengue NS1 (E) compared to viral isolation (C) in establishing the presence or absence of dengue fever (O)?

BIAS AND IMPRECISION
Bias (Review)
• Systematic error or deviation from the truth
o Either in the results or in their inferences
• Can act in either direction
o Can lead to overestimates or underestimates of test accuracy
• Impossible to know for sure whether a study is biased, or the direction and magnitude of the bias
o When weaknesses are identified, judgements can be made of the risk of bias in an individual study
o Its likely direction and size can also occasionally be hypothesized
• Can arise through
o Problems in the design or execution of the study (internal validity)
o Recruiting the wrong participants, using the wrong test, testing in the wrong way (external validity)

Imprecision
• Must not be confused with bias
• Arises when an estimate is based on a small sample
• Caused by random error
• Statistical analysis appropriately describes the uncertainty in an estimate caused by random error using confidence intervals
• In a systematic review, confidence intervals (or regions) are estimated to describe imprecision

Bias vs. Imprecision
Bias | Imprecision
Arises through issues of internal and external validity | Arises when an estimate is based on a small sample
Caused by systematic error | Caused by random error
Statistical analysis cannot describe the uncertainty | Statistical analysis can appropriately describe the uncertainty in an estimate (confidence intervals)
Risk of bias is assessed by study validity in a systematic review | Described by estimating confidence intervals (or regions) in a systematic review

• Assessing study validity and estimating confidence intervals are both essential in all systematic reviews
o Also done to assess the accuracy and precision of the article

Precision and Accuracy
• Some articles can be misleading due to bias

BIAS IN ARTICLES ON DIAGNOSIS
Spectrum of Patients
• An appropriate patient spectrum should be defined in light of the research question
o State key factors that could affect test accuracy
▪ Setting, disease severity, prevalence, and prior testing
o If a small proportion of inappropriate patients will be tolerated, this proportion should be stated
• Exclusion of inappropriate sampling methods may be part of the eligibility criteria of some reviews
o Ex: exclusion of studies that have employed a group of healthy controls

Verification Bias
• The choice of an optimal reference standard is crucial
o Used to determine the presence or absence of the target condition (disease status)
• Indicators of diagnostic accuracy are calculated by comparing the results of the index test with the outcome of the reference standard
o When there are disagreements between the reference standard and the index test, it is assumed that the index test is incorrect
littlemarmaid 1
o Estimates of accuracy are calculated based on the assumption that the reference standard is 100% sensitive and specific
• Perfect reference standards are rare
o Errors due to imperfect reference standards can potentially bias the estimation of the diagnostic accuracy of the index test
• Acceptable reference standards need to be predefined in the review protocol
o Judgements about the accuracy of the reference standard are not always straightforward
▪ Require clinical experience of the topic area to know whether a test or test combination is an appropriate reference standard
o In some research areas, consensus reference standards have been defined
o If a mixture of reference standards is used, consider carefully whether all of these are acceptable

Disease Progression Bias and Recovery Bias
• Ideally, the results of the index test and the reference standard are collected on the same patients at the same time
o If this is not possible/a delay ensues, misclassification may occur due to:
▪ Spontaneous recovery
▪ Benefit from treatment
▪ Progression to a more advanced stage of disease
▪ Occurrence of a new disease
• Disease Progression Bias and Recovery Bias
o Terms used to describe the associated potential biases
• Treatment Paradox
o Effective treatment of those found positive on the first test undertaken leads them to be negative in a later test
o The length of the time period that may cause such bias can vary between conditions
• Have to make judgements about what is considered “short enough” for the condition being considered
o Pre-state this in the review protocol
o The time period will depend on the:
▪ Speed of progression of the disease
▪ Possible resolution of the disease
▪ The speed at which treatment can be administered and become effective
o The time period is likely longer in chronic diseases than in acute diseases
o Should state whether:
▪ All patients have to be assessed within this interval
▪ It is based on mean or maximum times
▪ It is acceptable for a pre-specified proportion to be outside the required interval

Partial Verification Bias
• Aka work-up bias, (primary) selection bias or sequential ordering bias
• Can occur when not all of the study patients are verified by the reference standard
• Biased estimates of test performance may arise where the choice of patients for verification is not random
o Esp. if it is then influenced by the results of the index test
• The effect of partial verification is complicated to predict
o Depends on:
▪ Whether test-positive or test-negative patients are not verified
▪ Whether unverified patients are omitted from the 2×2 table or classified as true negatives or true positives
▪ Whether unverified patients are random samples of index test negatives and positives
o There is no correct way of handling unverified patients in an analysis
• Sensitivity Analysis
o Unverified patients are alternately considered as different combinations of test-positives and test-negatives
o May allow the potential magnitude of any bias to be ascertained

Differential Verification Bias
• Occurs when some patients are verified by one type of reference standard and other patients by a different standard
o Particularly of concern when those positive to the index test use one method of verification, and negatives receive a second
• Where the reference standard is a composite test (involving a panel of tests and other information), differential verification will not occur if all individuals receive all tests
o Becomes problematic when only selected test information for each individual is available, and the extent of that information relates to the index test finding
• In other situations, differential verification may occur because different tests are available in different centers

Incorporation Bias
• In some primary studies, the reference standard is ascertained through a panel of tests, or on the basis of information collected over a prolonged period of investigation
o Ex: hospital discharge diagnosis
• When the result of the index test is used in establishing the reference standard, incorporation bias may occur
o Incorporation of the index test in the reference standard panel is likely to increase the amount of agreement between index test results and the reference standard
▪ Overestimates diagnostic accuracy

Test and Diagnostic Review Bias
• Similar to the issue of blinded outcome assessment in intervention studies
• Test Review Bias
o Interpretation of the results of the index test may be influenced by the knowledge of the results of the reference standard
• Diagnostic Review Bias
o Interpretation of the results of the reference standard may be influenced by the knowledge of the results of the index test
• The extent to which test results can be influenced depends on the degree of subjectivity involved in interpreting the test
o More subjective reading → more likely for interpreters to be influenced by the results of the reference standard
o Fully automated test → less likely
• Consider the topic area being reviewed and determine whether the interpretation of the index test or reference standard could be influenced by knowledge of the results of the other test
• Empirical evidence shows that both diagnostic and test review bias increase sensitivity but have no systematic effect on specificity
• Whether or not blinding was undertaken in a study may not be explicitly stated
o If index and reference tests were undertaken and interpreted in a clear order, it is evident that the first must have been undertaken blind to the results of the second
o If the index and reference tests were undertaken by different people, a degree of ambiguity may exist about what information was available for each test
o Knowledge of standard laboratory practices may allow reasonable assumptions to be made
▪ Ex: where samples are sent in batches to an independent laboratory
▪ Authors should always try to confirm assumptions
Clinical Review Bias
• For some index tests, the availability or absence of relevant patient information when the test is undertaken may affect its performance.
o Age, gender, presence and severity of symptoms, other test results

Uninterpretable Results
• Diagnostic tests may report uninterpretable results, or call results uncertain, indeterminate, or intermediate
• Happens with both index tests and reference standards
• Bias will arise depending on the possible correlation between uninterpretable test results and the true disease status
o If uninterpretable results occur randomly, and are not related to the true disease status of the individual, then these should not affect the test performance

QUESTION 2: WERE THE “DEFINITIONS” OF THE INDEX TEST AND THE REFERENCE STANDARD INDEPENDENT?
• The results of a test (the index test or the reference standard) are interpreted based on a defined set of criteria
o Sometimes, 1 criterion is used (ex: pyuria to define urinary tract infection)
o Other times, multiple criteria are used (ex: pyuria AND positive urine culture AND compatible urinary symptoms)
• If any of the criteria for the index test are part of the criteria of the reference standard, there tends to be a falsely high level of agreement between the two tests
o Results in an overestimation of the accuracy of the index test
• Need to make sure that the definitions of the index test and the reference standard DO NOT overlap
• TIP: To ascertain the independence of interpretation, look for attempts to make sure interpretation of one test did not affect the interpretation of the other
o Suggested by terms such as “independent” or “blind” interpretation of results
o In general, independence can only be assured through a prospective study.
• DECISION POINT: Will I believe the article?

APPRAISING RESULTS
APPRAISING RESULTS OF PRIMARY TEST ACCURACY STUDY
• The 2X2 Table
o Basic format for evaluating the performance characteristics of a diagnostic test

               | (+) Reference Standard   | (-) Reference Standard   | Predictive Value
(+) Index Test | True Positives (TP, a)   | False Positives (FP, b)  | Positive Predictive Value = TP/(TP+FP)
(-) Index Test | False Negatives (FN, c)  | True Negatives (TN, d)   | Negative Predictive Value = TN/(FN+TN)
               | Sensitivity = TP/(TP+FN) | Specificity = TN/(FP+TN) |

• Reported as proportions or percentages
• Knowing the sensitivity and specificity of a test does not necessarily help in making clinical decisions because they are statistics based on knowing whether the patient has a disease
o Except:
▪ A negative test result from a test with high sensitivity (a very low false-negative rate) usually excludes the disease
▪ A positive test result in a test with high specificity (a very low false-positive rate) usually indicates disease
REMEMBER:
SnOUT – Sensitivity, Negative, Rule Out
SpIN – Specificity, Positive, Rule In

PREDICTIVE VALUES
• Predictive Values
o Measures defined as conditional on the index test results
o Computed as proportions of the total with positive and negative index test results
• Although predictive values seem useful, they will vary substantially according to the prevalence of the disease

Positive Predictive Value
PPV = TP/(TP+FP) or a/(a+b)
• The probability that a case with a positive index test result is diseased
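The 2x2 formulas above can be checked with a short Python sketch. The class name `TwoByTwo` and the counts are illustrative assumptions (a hypothetical study of 1000 patients with 1% prevalence), not values taken from any particular article.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a 2x2 table (index test vs. reference standard)."""
    tp: int  # index test positive, reference standard positive (cell a)
    fp: int  # index test positive, reference standard negative (cell b)
    fn: int  # index test negative, reference standard positive (cell c)
    tn: int  # index test negative, reference standard negative (cell d)

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.fp + self.tn)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.fn + self.tn)


# Hypothetical counts: 10 diseased and 990 non-diseased patients.
table = TwoByTwo(tp=9, fp=99, fn=1, tn=891)
for name in ("sensitivity", "specificity", "ppv", "npv"):
    print(f"{name}: {getattr(table, name)():.1%}")
```

With these counts the test looks good on sensitivity and specificity (both 90%), yet fewer than one in ten positive results is a true positive, which is the prevalence effect on predictive values.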
• 2x2 table where the prevalence is 1%:

               | Disease (+) | Disease (-) | Total
(+) Index Test | 9           | 99          | 108
(-) Index Test | 1           | 891         | 892
Total          | 10          | 990         | 1000

o Sn = a/(a+c) = 9/10 = 90%
o Sp = d/(b+d) = 891/990 = 90%
o PPV = a/(a+b) = 9/108 = 8.3%
o NPV = d/(c+d) = 891/892 = 99.9%
• The PPV was higher in the case where prevalence was higher (10%), and the NPV was higher in the case where prevalence was lower (1%)
o As prevalence increases, PPV increases but NPV decreases
o As prevalence decreases, PPV decreases but NPV increases

LIKELIHOOD RATIO
• More useful clinically
o Can be used to update the pretest probability of disease using Bayes’ Theorem or nomogram once the test result is known
• Post-Test Probability
o The updated probability
o Should be higher than the pre-test probability if the test result is positive
o Should be lower than the pre-test probability if the test result is negative

Positive Likelihood Ratio
LR (+) = Sn/(1 – Sp) or estimated as (a/(a+c)) / (b/(b+d))
• Describes how many times more likely positive index results were in the disease group compared to the non-disease group
• Should be greater than 1 if the test is informative
o LR (+) > 1.0 increases the likelihood of the disease
o The higher the LR above 1.0, the better it is at ruling in a condition

LR < 3.0 (close to 1.0) | Weakly positive
LR 3.0 – 10.0           | Moderately positive
LR > 10.0               | Strongly positive

Negative Likelihood Ratio
LR (-) = (1 – Sn)/Sp or estimated as (c/(a+c)) / (d/(b+d))
• Describes how many times less likely negative index results were in the disease group compared to the non-disease group
• Should be less than 1 if the test is informative
o LR (-) < 1.0 decreases the likelihood of disease
o The lower the LR below 1.0, the better it is at ruling out a condition

LR > 0.3 (close to 1.0) | Weakly negative
LR 0.1 – 0.3            | Moderately negative
LR < 0.1                | Strongly negative

ASSESSING APPLICABILITY
• After the article is appraised for its directness, validity, and results, and it shows that it is valid and gave acceptable accuracy, the next step is to assess its applicability to the individual patient
• Look for biologic and socio-economic issues that may affect the applicability of the result of the study
o Similar to assessing applicability in articles on therapy

BIOLOGIC FACTORS
Sex
• Consider physiologic, hormonal, or biochemical differences between males and females that might affect the accuracy of the index test
• Ex: creatinine clearance based on a single serum creatinine determination must be adjusted according to sex

Comorbidities
• Consider comorbid conditions that might affect the performance of the index test
• Ex: malnutrition can decrease the sensitivity of a tuberculin skin test

Race
• Consider racial differences that might alter the performance of the index test
• Ex: African American ancestry increases the likelihood of a high-grade prostatic cancer in patients with high levels of prostate-specific antigen

Age
• Consider how the age of the population in the study compares with that of the patient
• Ex: a sputum AFB stain for PTB performs well in adults, but gastric aspirates are more accurate in infants

Pathology
• Consider differences in the type of pathology for which the index test is being used
• Ex: a plain cranial CT scan is a reasonably accurate diagnostic test within the first few hours of a hemorrhagic stroke, but is less accurate for an ischemic stroke within the same time window

SOCIOECONOMIC FACTORS
Questionnaires
• If the index test is a questionnaire, then it is particularly prone to social, cultural, and economic differences among patients
o A lot can be lost in translation of questionnaires
o Even if the language is the same, interpretation and reaction may vary
• Examples:
o Cognitive tests tend to underestimate the abilities of elderly people from ethnic minorities
▪ Can lead to overdiagnosis of dementia in these communities
o The CAGE questionnaire (commonly used to detect alcoholism) performs poorly in some ethnic groups, especially African American men
o A questionnaire to detect autism developed in the US and UK could not be used in families in Hong Kong because of perceived cultural differences
• Should look for local validation studies before accepting the accuracy of diagnostic tests that come in the form of questionnaires

Laboratory Tests
• Even laboratory tests may sometimes have questionable applicability
• When a lab has limited resources, it may not match the standards of performance defined in a study that uses the best equipment/hires the best interpreters/continuously monitors good laboratory practices
• Need to ensure that these standards are (at least) approximated by the local laboratories
• Ex: RT-PCR used in the diagnosis of COVID-19
o Only laboratories that passed the accreditation are considered

INDIVIDUALIZING THE RESULT
THE DIAGNOSTIC PROCESS
• The algorithm of the Diagnostic Process has 7 steps
• Step 5: Testing the Hypothesis
o Once you have your leading hypothesis, you need to decide whether you need further information before proceeding to treat the patient/before excluding the diagnosis
o Think in terms of probability
▪ Is the pretest probability high enough that no further testing is needed to proceed with treatment?
▪ Is it low enough that no further testing is needed to exclude the diagnosis?
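The two questions in Step 5 can be made concrete with a small numeric sketch. It computes likelihood ratios from sensitivity and specificity, updates a pretest probability through odds (probability to odds, multiply by the LR, odds back to probability), and compares the result with a test threshold and a treatment threshold. The function names, the 90% sensitivity and specificity, the 30% pretest probability, and the 5% and 80% thresholds are all assumptions chosen for illustration; real thresholds depend on the disease, the test, and the treatment.

```python
def positive_lr(sn: float, sp: float) -> float:
    """LR(+) = Sn / (1 - Sp); undefined when specificity is 100%."""
    if sp >= 1.0:
        raise ValueError("LR(+) cannot be computed when specificity is 100%")
    return sn / (1.0 - sp)


def negative_lr(sn: float, sp: float) -> float:
    """LR(-) = (1 - Sn) / Sp; undefined when specificity is 0%."""
    if sp <= 0.0:
        raise ValueError("LR(-) cannot be computed when specificity is 0%")
    return (1.0 - sn) / sp


def post_test_probability(pretest: float, lr: float) -> float:
    """Update a pretest probability with a likelihood ratio via odds."""
    if not 0.0 < pretest < 1.0:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest / (1.0 - pretest)        # probability -> odds
    post_test_odds = pretest_odds * lr              # multiply by the LR
    return post_test_odds / (1.0 + post_test_odds)  # odds -> probability


def next_step(probability: float, test_threshold: float,
              treatment_threshold: float) -> str:
    """Classify a probability against the two (hypothetical) thresholds."""
    if probability >= treatment_threshold:
        return "treat"
    if probability <= test_threshold:
        return "exclude"
    return "test further"


# Hypothetical inputs, for illustration only.
SN, SP = 0.90, 0.90
TEST_THRESHOLD, TREATMENT_THRESHOLD = 0.05, 0.80
pretest = 0.30

after_positive = post_test_probability(pretest, positive_lr(SN, SP))
after_negative = post_test_probability(pretest, negative_lr(SN, SP))

for label, prob in (("positive", after_positive), ("negative", after_negative)):
    decision = next_step(prob, TEST_THRESHOLD, TREATMENT_THRESHOLD)
    print(f"{label} result: post-test probability {prob:.1%} -> {decision}")
```

Under these assumed numbers, a negative result drops the probability below the test threshold, while a positive result still leaves it just under the treatment threshold. Setting the pretest probability to a prevalence of 1% and applying the positive LR gives about 8.3%, the same value as the PPV in the 1% prevalence table.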
DETERMINE THE PRETEST PROBABILITY
• There are 3 ways to determine the pretest probability of the leading diagnosis and the most important (usually the most serious) active alternatives:
o Use a validated clinical decision rule (CDR)
o Use information about the prevalence of certain symptoms in a given disease
o Use your overall clinical impression

Using a Validated CDR
• Investigators construct a list of potential predictors of the outcome of interest and then examine a group of patients to see whether the predictors and outcome are present
o Logistic regression is then used to determine which predictors are most powerful and which can be omitted
o The model is then validated by applying it to other patient populations
o To simplify use, the clinical predictors in the model are often assigned point values, and different point totals correspond to different pretest probabilities
• CDRs are rarely available, but are the most precise way of estimating pretest probability
• Finding a validated CDR will allow you to come up with an exact number (or a small range of numbers) for your pretest probability

Use Information About the Prevalence of Certain Symptoms
• Ex: 73% of patients with pulmonary embolism (PE) have dyspnea. However, this does not tell you how many patients with dyspnea have PE.
• There is often a lot of information available about symptom prevalence.

Use Your Overall Clinical Impression
• This is a combination of:
o Known symptom prevalence and disease prevalence
o Clinical experience
o “Clinical judgment.”
• This is just as imprecise as it sounds
o It has been shown that physicians are disproportionately influenced by their most recent clinical experience.
o BUT it has also been shown that the overall clinical impression of experienced clinicians has significant predictive value.
• Clinicians generally categorize pretest probability as low, moderate, or high.
o This rather vague categorization is still helpful.
o Do not get distracted thinking a number is necessary.
• The ends of the bar in the threshold model represent 0% to 100% pretest probability
• Treatment Threshold
o The probability above which the diagnosis is so likely, you would treat the patient without further testing
• Test Threshold
o The probability below which the diagnosis is so unlikely, it is excluded without further testing
• Diagnostic tests are necessary when the pretest probability of disease is in the middle (grey zone)
o Above the test threshold and below the treatment threshold
• A really useful test shifts the probability of disease so much that the post-test probability (the probability after the test is done) crosses one of the thresholds

Factors to consider when setting the test threshold
• The perceived menace
o The more dangerous a disease seems, the lower the test threshold (test at lower probabilities)
• The invasiveness of the test
o The more invasive a test, the higher the test threshold (test at higher probabilities)
• The side effects of the test
o The more side effects a test has, the higher the test threshold (test at higher probabilities)
• The cost of the test
o The more expensive a test, the higher the test threshold (test at higher probabilities)
▪ Esp. if the expenses will be paid out of pocket

Factors to consider when setting the treatment threshold
• The perceived menace
o The more dangerous a disease seems, the lower the treatment threshold (treat at lower probabilities)
• The invasiveness of treatment
o The more invasive a treatment, the higher the treatment threshold (treat at higher probabilities)
• The side effects of the treatment
o The more side effects a treatment has, the higher the treatment threshold (treat at higher probabilities)
• The cost of the treatment
o The more expensive a treatment, the higher the treatment threshold
▪ Esp. if the expenses will be paid out of pocket

BAYES’ THEOREM
• Step 1: Convert pretest probability to pretest odds.
o Pretest odds = pretest probability / (1 - pretest probability)
• Step 2: Multiply pretest odds by the likelihood ratio to get the post-test odds.
o Post-test odds = pretest odds x LR
• Step 3: Convert post-test odds to post-test probability.
o Post-test probability = post-test odds / (1 + post-test odds)
• DECISION POINT: Did the negative test result make the probability reach the test threshold? Did the positive test result make the probability reach the treatment threshold?

FAGAN NOMOGRAM
• Find the patient’s pretest probability on the left, then draw a line through the likelihood ratio of the test to find the post-test probability
o Once you have arrived at the post-test probability, you can make a clinical decision based on the thresholds of management and the patient’s preferences
• 3 Choices
o Stop testing and get on with the treatment of the probable disease
o Stop testing and reassure the patient that the disease probability is low
o Do more tests before deciding

A useful calculator: http://araw.mede.uic.edu/cgi-bin/testcalc.pl

CASE SCENARIO
You are a doctor in a remote barangay in the mountainous province of Tralala, where RT-PCR is not available. A 40-year-old male patient came to your clinic during the enhanced community quarantine because of cough, sore throat, and fever that started 10 days prior to consult. On history, the patient had close contact with a COVID (+) patient 13 days prior to consult. His past medical history revealed uncontrolled T2DM. You decided to do a rapid antigen test. You asked the laboratory you are affiliated with about the availability of the test. The laboratory personnel said that it was available and that the name of the test kit is Coris COVID-19 Ag Respi-Strip test. You want to know the accuracy of this test, so you decided to search for an article and found:

APPRAISING DIRECTNESS

                   | Question                           | Article
Population/Patient | COVID-19 suspect                   | Symptomatic and asymptomatic patients
Exposure           | Coris COVID-19 Ag Respi-Strip Test | Coris COVID-19 Ag Respi-Strip Test
Comparator         | RT-PCR                             | RT-PCR
Outcome            | Diagnosis of COVID-19              | Diagnosis of COVID-19

APPRAISING VALIDITY
• Q1: Was the reference standard acceptable? Yes
o RT-PCR is the current gold standard being used for the diagnosis of COVID-19
o It uses a combination of nasopharyngeal and oropharyngeal samples to improve on the sensitivity of one specimen alone.
o In the article, only a nasopharyngeal swab was done
▪ The nasopharyngeal swab specimen only has a 97% (92-100%) sensitivity.
• Q2: Were the definitions of the index test and the reference standard independent? Yes
o There are no criteria used for the index test and the reference standard
• Q3: Was the performance of the index test and the reference standard independent? Yes
o Both tests were done in all collected samples
• Q4: Was the interpretation of the index test and the reference standard independent? Yes
o There is no degree of subjectivity in interpreting the RT-PCR
▪ It is based on the CT value
o The Respi-Strip employs a qualitative interpretation using a control line and a test line
▪ Important to blind the interpreter of the Respi-Strip to the results of the RT-PCR

APPRAISING RESULTS
• Prevalence of the Disease
o The computed prevalence of the disease is quite high considering that the samples were taken in a hospital
▪ Might not be reflective of the true prevalence of the disease in the setting of the patient in the clinical case scenario
• Likelihood Ratio
o Cannot compute the LR (+) because the specificity of the test is 100%
▪ The test is highly specific
o The LR (-) is > 0.3 (close to 1), so the test is weakly negative

ASSESSING APPLICABILITY

Factor                | Patient | Article
Sex                   | Male | The study included both males and females
Comorbidity           | Type 2 DM | There was no mention of the conditions (concomitant diseases) that will give a false-positive or false-negative result. The test has no cross-reactivity with other viruses, bacteria, or fungi.
Race                  | Asian | European
Age                   | 40 | The age range included in the study is 0 to 94 years old
Pathology             | Symptomatic | Included both asymptomatic and symptomatic patients, but also stated the results of those who are symptomatic only.
Pathology             | CT-value should be taken | The accuracy of the test is affected by the CT-value
Socioeconomic Factors | Availability of COVID-19 CT-value | Does not require special equipment or skilled laboratory personnel familiar with molecular techniques. Only the test strip is needed.
Socioeconomic Factors | Availability of ultracentrifugation in the laboratory that will conduct the test | Ultracentrifugation is needed.

• Visiting again the website of the manufacturer: https://www.corisbio.com/pdf/Products/COVID-19-Respi-Strip_20200910.pdf
• What CT-value is: https://www.cebm.net/study/duration-of-infectiousness-and-correlation-with-rt-pcr-cycle-threshold-values-in-cases-of-covid-19-in-england/
• The cost of treatment: ↓

Were the definitions of the index test and the reference standard independent?
• Bias: Incorporation Bias
• Answer YES if the index test did not form part of the reference standard or vice versa.
• Answer NO if the reference standard formally included the result of the index test.
• Answer UNCLEAR if it is unclear whether the results of the index test were used in the final diagnosis.