Diagnostic test: validity appraisal.
Validity assessment. Results assessment.
Applicability assessment. Application.
Irine Sakhelashvili
SEU
2022
Diagnostic Test: Validity Appraisal
• Validity Assessment (Is the Information Valid?)
Validity is the extent to which the data are free from bias. Bias can occur in
the selection of the sample or in measurement; therefore, you must ask whether:
1. Selection bias was avoided (Was the sample selection appropriate?)
(a) Appropriate spectrum of patients in whom there is a need for a new
diagnostic test (or new diagnostic approach)?
(b) Selected in an unbiased way (e.g. consecutive cases)?
2. Measurement bias was avoided:
(a) Was there a comparison with an appropriate gold standard?
(b) Blinded measurement: were those doing or reporting the ‘gold standard’
unaware of the test result and vice versa?
(c) No missing data: did everyone who got the test also have the gold
standard?
Q.1. Was the Sample Selection Appropriate?
(a) Appropriate spectrum of patients in whom there is a need for a
new diagnostic test (or approach)?
You need a new diagnostic test to distinguish a disease (early as well as
late) from other diseases with similar symptoms.
(b) Selected in an unbiased way:
Consecutive patients fulfilling the entry criteria, with symptoms and
signs common to both cases and non-cases, ought to be included in the
study.
How do we answer and interpret the question?
Read the methods section of the paper to find out what criteria were used for
inclusion of patients in the study. Determine whether the patients so included
represent
(a) the spectrum of disease for which a new test is needed and also
(b) the diseases commonly confused with the disease to be diagnosed (target
disorder).
How to interpret the results?
If there was only one set of eligibility (entry) criteria and it covered both cases and
non-cases, then the patients are likely to suffer from commonly confused diseases.
Sometimes, authors do not mention entry criteria but present the criteria for final
diagnoses of cases and non-cases. In such papers, you have to decide whether in
clinical practice, there is confusion between the cases and non-cases and whether
there is a need for a test to distinguish them from one another.
Q.2. Was There a Comparison with an Appropriate
Gold Standard?
An appropriate gold standard is one that is ‘error-free’ and independent (distinct or separate)
from the test under evaluation.
The only way to evaluate the correctness of the results of the test under evaluation is to compare
them with something that is never wrong. This something is called the reference standard or gold
standard. This means that the gold standard is never false positive or false negative. It is 100 %
sensitive and 100 % specific.
However, such an ideal ‘gold standard’ is hardly ever available. You may have to accept something
less than ideal as a reasonable gold standard. The purpose of the gold standard is to tell you the
truth: did the patients have the disease or not when the test was performed?
Sometimes authors use more than one ‘gold standard’ to know whether the patients at the time of
testing had the disease or not.
How do we answer and interpret the question?
You need to read the paper carefully and find out which gold standard
(or standards) it uses. From your knowledge of the subject, decide
whether each one is reasonable and whether it is independent
of the test result.
How to interpret the results?
If the gold standard is not reasonable, then the results cannot be
trusted: the test properties may be overestimated or underestimated.
If the gold standard is not independent of the test result,
i.e. the test is part of the gold standard, the sensitivity and
specificity of the test will be overestimated. The extent of this
overestimation depends on the degree of overlap (dependence) between
the test result and the gold standard.
Q.3. Were Those Doing or Reporting the ‘Gold
Standard’ Unaware of the Test Result and Vice Versa?
The test is the one under evaluation, and its correctness is being determined by comparison
with the ‘gold standard’, which gives the definitive diagnosis.
As clinicians, you are aware that once clinicians know the chest CT finding of pleural fluid, they
start finding decreased breath sounds on auscultation; if ultrasound shows a stone in the
kidney, they find a corresponding radio-opaque shadow in the plain abdominal X-ray.
Thus, knowledge of a test result may introduce conscious or subconscious bias in the
interpretation of gold standard and vice versa.
How do we answer and interpret the question?
The authors of the paper may state in the methods section that those doing the gold standard
or the test were unaware of the results of the other. If they do not state this, you have to
determine from reading the methods whether those doing the test could have known the results of
the gold standard, and whether those doing or reporting the gold standard could have introduced
conscious or subconscious bias into the result if they knew the test result.
If there is a possibility of bias due to knowledge of the test results in reporting the gold
standard or vice versa, the validity of the study is compromised.
Q.4. Did Everyone Who Got the Test Also Have the Gold Standard (No
Verification Bias)?
The gold standard is done to verify the results of the test under evaluation. By
comparing the two results, we know the true positives, false positives, true
negatives and false negatives of the test. Ideally, all patients should have both
the test and the gold standard.
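The cross-tabulation described above can be sketched in Python as follows. This is a minimal illustration, not a method taken from any particular study: the function name `two_by_two` and the eight patient results are hypothetical, chosen only to show how paired test and gold standard results yield the four counts and, from them, sensitivity and specificity.

```python
def two_by_two(test_positive, disease_present):
    """Cross-tabulate paired test and gold standard results.

    Both arguments are equal-length sequences of booleans, one entry
    per patient. Returns the counts (tp, fp, fn, tn).
    """
    if len(test_positive) != len(disease_present):
        raise ValueError("every patient needs both a test and a gold standard result")
    tp = fp = fn = tn = 0
    for test, disease in zip(test_positive, disease_present):
        if test and disease:
            tp += 1      # true positive
        elif test and not disease:
            fp += 1      # false positive
        elif not test and disease:
            fn += 1      # false negative
        else:
            tn += 1      # true negative
    return tp, fp, fn, tn


# Hypothetical results for eight patients (not data from any study).
test_positive = [True, True, True, False, False, False, True, False]
disease_present = [True, True, False, True, False, False, True, False]

tp, fp, fn, tn = two_by_two(test_positive, disease_present)
sensitivity = tp / (tp + fn)  # proportion of diseased patients with a positive test
specificity = tn / (tn + fp)  # proportion of non-diseased patients with a negative test
print(tp, fp, fn, tn, sensitivity, specificity)
```

The length check reflects the ideal stated above: a patient with a test result but no gold standard result cannot be placed in any of the four cells.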
Why do we ask this question?
Gold standard tests are often invasive and/or expensive. Clinicians are
reluctant to perform the gold standard when the results of the test under
evaluation are negative. For example, if you are evaluating exercise
electrocardiography, you may not like to do coronary angiography in those
who are exercise test negative. Similarly, if ventilation perfusion scans are
negative, you may not like to do pulmonary angiography, which is the
‘gold standard’ for pulmonary embolism.
But then how do you verify the negative test results?
Clinicians know that some negative results may be false negatives.
Some investigators follow up the patients for a period of time (without
specific treatment). If the patients do not develop direct or indirect
features of the disease, they are taken as true negatives, otherwise as
false negatives.
If there is no follow-up or any other way to verify the negatives, the
study suffers from verification bias.
How do we answer the question?
The results section of the paper gives you the number of patients who
underwent the test and the gold standard. If all those undergoing the
test were subjected to the gold standard, there is no problem. More
than one gold standard may be used, for example, angiographies and
follow-up. If both are acceptable, again there is no problem.
If some patients had not undergone the gold standard test and were
not followed up or subjected to another gold standard, there may be a
problem. You need to find out whether the results are still valid.
How do we interpret the answer?
One way to find out whether the results are still valid is as follows:
The patients who were negative on the test but had no gold standard
(i.e. no verification) may all be assumed to be false negatives, and the
test characteristics may be recalculated. If they are still acceptable, then
the results may be taken as valid, in spite of verification bias. However, if
the test characteristics become unacceptable, then the validity of the results
is compromised.
The above false-negative assumption is extreme and unlikely to be true,
but if a study passes this extreme assumption, then the results are
strong. Otherwise, you do not know whether the results are valid, and
you must doubt their validity.
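The worst-case recalculation described above can be sketched as follows. All counts are hypothetical and the function names are illustrative only; the sketch assumes the study reports the four verified cells plus the number of test-negative patients who never received a gold standard.

```python
def sens_spec(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)


def worst_case_sens_spec(tp, fp, fn, tn, unverified_negatives):
    """Recalculate after counting every unverified test-negative patient
    as a false negative (the extreme assumption)."""
    return sens_spec(tp, fp, fn + unverified_negatives, tn)


# Hypothetical study: 300 verified patients, 30 test-negative patients unverified.
tp, fp, fn, tn = 90, 20, 10, 180
unverified_negatives = 30

observed = sens_spec(tp, fp, fn, tn)
worst = worst_case_sens_spec(tp, fp, fn, tn, unverified_negatives)
print("observed sensitivity, specificity:", observed)
print("worst-case sensitivity, specificity:", worst)
```

Under this assumption only sensitivity falls, because every added patient is counted as diseased; specificity is unchanged. Whether the recalculated sensitivity is still acceptable is a clinical judgment that the code does not make.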
Results Assessment: What Is the Information?
Applicability Assessment
- Is the Test Available and Reproducible in My Clinical Setting?
Test availability includes not only the equipment and reagents
required for the test but also the human resources (e.g. technicians and
experts). If they are available, the next question is whether they can
reproduce the same result when the test is repeated in stable patients.
- Are Patients in My Practice Similar to Those in the Study?
You need to think whether the disease severity and the conditions in the
differential diagnosis of the disease in your practice are similar to those
in the study. Otherwise, the test parameters (sensitivity, specificity,
likelihood ratio [LR]) may not strictly apply, though you can still use them
as a rough guide.
The Likelihood Ratio (LR)
The LR is the likelihood that a given test result would be expected in a patient
with the target disorder compared to the likelihood that the same result
would be expected in a patient without the target disorder. For example,
you have a patient with anaemia and a serum ferritin of 60 mmol/l and you
find in an article that 90 percent of patients with iron deficiency anaemia
have serum ferritins in the same range as your patient (= sensitivity) and
that 15 percent of patients with other causes for anaemia have serum
ferritins in the same range as your patient (1 – specificity).
This means that your patient’s result would be six times as likely (90/15) to
be seen in someone with, as opposed to someone without, iron deficiency
anaemia, and this is called the LR for a positive test result.
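The arithmetic in the ferritin example can be written as a short Python sketch. The function name `likelihood_ratio` and its parameter names are illustrative; the two proportions (90 percent and 15 percent) are the ones quoted above.

```python
def likelihood_ratio(p_result_with_disorder, p_result_without_disorder):
    """Likelihood ratio for a given test result.

    p_result_with_disorder:    proportion of patients WITH the target disorder
                               who show this result (sensitivity, for a positive result)
    p_result_without_disorder: proportion of patients WITHOUT the target disorder
                               who show this result (1 - specificity, for a positive result)
    """
    if p_result_without_disorder == 0:
        raise ValueError("LR is undefined when no patient without the disorder shows the result")
    return p_result_with_disorder / p_result_without_disorder


# Ferritin example from the text: 90 percent versus 15 percent.
lr_positive = likelihood_ratio(0.90, 0.15)
print(round(lr_positive, 2))
```

Because the function takes the two proportions directly, it applies to any result range for which an article reports them, not only to a simple positive or negative result.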
Will the Results Change My Management?
This is a very important question. You know the pretest probability in your setting.
You know the LRs of the test results, so you can determine the posttest probabilities
and then ask whether these are likely to change your decision. If even one of the
possible test results is likely to change your management, then the test is helpful.
Will Your Patients Be Better Off as a Result of the Test?
It is not enough to say that the test results will change your management. What
you need to think about is whether, as a result of this changed management, your
patients are likely to be better off. Better off may mean better health, earlier
discharge home, less inconvenience, an early return to work and even lower expenses.
Then ask whether the benefit, whatever it is, is worth the costs and risks of the test.
Application
As usual, first assess the patient with history and physical examination
and determine the pretest probability of the disease; then do the
diagnostic test and get the result. The likelihood ratio associated with
this result will take you from the pretest probability to the posttest
probability. Based on this, decide whether to treat or to do another test.
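The step from pretest to posttest probability goes through odds: convert the pretest probability to odds, multiply by the likelihood ratio, and convert back to a probability. A minimal Python sketch follows. The pretest probability of 0.50 is a hypothetical value chosen for illustration; the LR of 6 is the one from the ferritin example.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Hypothetical pretest probability of 0.50 and the LR of 6 from the ferritin example.
posttest = posttest_probability(0.50, 6)
print(round(posttest, 3))
```

An LR of 1 leaves the probability unchanged, so such a result cannot alter management; the further the LR is from 1, the larger the shift.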