
Dentomaxillofacial Radiology (2009) 38, 1–10. © 2009 The British Institute of Radiology http://dmfr.birjournals.



Evidence-based diagnosis and clinical decision making

PA Mileman*,1 and WB van den Hout2
1 2

Department of Oral Radiology of the Academic Centre for Dentistry in Amsterdam (ACTA), Amsterdam, The Netherlands; Department of Medical Decision Making, University of Leiden Medical Center (LUMC), Leiden, The Netherlands

The application of evidence-based dentistry to diagnosis should result in a reduction of errors in decision making. The frequency of errors depends not only on the accuracy of a diagnostic test for pathology, but also on the prior chance of the disease being present. If this chance is low and below a certain threshold, then applying a diagnostic test can result in more decision errors, and therefore more inappropriate treatment, than omitting the test. In deciding on the usefulness of a diagnostic test, an additional factor to take into account is the relative value of the possible health states resulting from diagnosis and subsequent therapy. These values can be determined by eliciting from the patient numerical valuations of the possible dental health conditions using a visual analogue scale technique. Clinical decision analysis can then be carried out to calculate the most appropriate diagnostic strategy for the patient. Clinical decision analysis is starting to influence the development of guidelines for the diagnostic use of radiographs, although its application in dentistry needs further refinement and development. Dentomaxillofacial Radiology (2009) 38, 1–10. doi: 10.1259/dmfr/18200441
Keywords: diagnosis, radiography; evidence-based medicine; decision making; state of the art review

Introduction

Diagnosis is only a means to select the best treatment.1 Combining clinical findings with radiological observations to achieve a diagnosis is one of the everyday tasks of the general dental practitioner. The ability to do this may have been learnt from an eminent professor, but even so, how do we know that s/he was accurate in diagnosis and successful in transferring this knowledge to us? If mistakes occur in diagnosis, then lesions will be missed where they are present and lesions will be treated which are not there. Are these two types of error equally important for the patient and, if not, how can we achieve the optimal trade-off between both types of error in diagnosis? Correctly identifying pathology using a diagnostic test will, among other things, depend on the chance that the pathology was present beforehand. These and other aspects of the evidence base of diagnosis in dentistry will be the subject of this article. There are numerous introductions available to evidence-based diagnosis2,3 stemming from an
*Correspondence to: Dr PA Mileman, Academic Centre for Dentistry in Amsterdam (ACTA), Louwesweg 1, 1066 EA Amsterdam, The Netherlands; E-mail: Received 24 July 2007; accepted 20 December 2007 (no revision)

inspirational publication in 1972.4 To complement them, international guidelines for reporting diagnostic research have been published,5 whilst in Europe6 and the USA evidence-based guidelines for prescribing radiographs have recently been issued. In this article we illustrate important aspects of evidence-based dentistry as it pertains to approximal caries diagnosis. How should articles in the literature be evaluated and used in the clinic? Our approach here will be that of clinical decision analysis.7 In this approach, treatment decisions are determined not only by obtaining published evidence, but also by combining this diagnostic evidence in a rational, transparent and systematic way to optimize interventions for the benefit of the patient (Figure 1).

Diagnostic accuracy

A diagnostic test is a procedure carried out with the intention of decreasing the uncertainty regarding whether a disease is present or absent. Tests may be used for different purposes: screening, diagnosis, prognosis and during therapy. Ideally, a test provides


Figure 1 The road to effective decision making for the patient is full of bumps and pitfalls. Aspects of the diagnostic process with potential shortcomings are shown in this graphical illustration. By making these aspects explicit, evidence-based diagnosis can contribute to improved decision making by the dentist for his patient

perfect information, quickly, without changing the state of health of the patient and without cost. Most tests are not perfect. The value of a test will ultimately depend on the net health benefit to the patient subsequent to the treatment chosen on the basis of the test result. Whether the most appropriate treatment is chosen will depend in part on the accuracy of the test. Diagnostic imaging procedures usually demand the interpretation of the image by an observer, who is therefore part of the diagnostic system and as such contributes to its level of accuracy. There is considerable variation between dentists in diagnostic accuracy for dentin caries when viewing bitewing radiographs.8 Feedback to dentists about their own diagnostic accuracy, and how to improve it, would seem part and parcel of an evidence-based approach to the use of imaging tests. Tools should be developed to enable dentists to assess their own accuracy, at least for common diagnostic problems such as caries,9 periodontal disease and periapical lesions. Innovations in research on improving the diagnostic accuracy of imaging tests have included identifying features on radiographs linked to an increased probability of a correct diagnosis, and therefore to an improved prognosis after treatment for the patient. This approach may improve accuracy in

diagnosing the likelihood of treatment complications occurring during, for example, wisdom tooth extraction.10 Practitioners may, however, need training in this method of feature recognition11 to be able to use diagnostic aids or expert systems, such as Oral Radiographic Differential Diagnosis (ORAD), which have appeared on the internet.12,13

The gold standard

A prerequisite for being able to assess the accuracy of a test is the availability of a valid gold or reference standard diagnosis. The appropriate gold standard should be carefully chosen14 and based on a technique other than that of the test being evaluated. It is therefore better to evaluate an imaging test by comparing it with a non-imaging gold standard such as histology or biopsy. Otherwise, there is a danger that the estimated performance of the test under consideration will be inflated, since both the gold standard and the index test may share the same systematic errors.

Measures of diagnostic accuracy

Articles comparing diagnostic tests often use and calculate a broad spectrum of measures of diagnostic


Table 1 Measures of diagnostic test accuracy with methods of calculation and definitions. The calculations are a realistic example of the use of bitewing radiographs as a test for the diagnosis of a tooth surface with dentin caries.

General notation:
                  Disease present   Disease absent   Total
Test positive     a (TP)            b (FP)           a+b
Test negative     c (FN)            d (TN)           c+d
Total             a+c               b+d              a+b+c+d

Diagnosis of the presence of dentin caries using bitewing radiography:
              Caries present   Caries absent   Total
Bitewing +    18               1                19
Bitewing −    27               59               86
Total         45               60               105

Measures of accuracy, with definitions and calculations:
Prevalence: pre-test chance of disease in a population at a point in time; (a+c)/(a+b+c+d) = 43%
Sensitivity (Se): proportion of positive test results in cases with the target disorder (true positive rate); a/(a+c) = 40%
Specificity (Sp): proportion of negative test results in cases without the target disorder (true negative rate); d/(b+d) = 98%
Predictive value positive (PV+): proportion of cases with the target disorder among those with a positive test result; the chance that a patient with a positive test result has the disease; a/(a+b) = 95%
Predictive value negative (PV−): proportion of cases without the target disorder among those with a negative test result; the chance that a patient with a negative test result does not have the disease; d/(c+d) = 69%
Positive likelihood ratio (LR+): ratio of the chance of a positive test result in a case with disease to the chance of a positive test result in a case without disease; (a/(a+c))/(b/(b+d)) = Se/(1−Sp) = 24.0
Negative likelihood ratio (LR−): ratio of the chance of a negative test result in a case with disease to the chance of a negative test result in a case without disease; (c/(a+c))/(d/(b+d)) = (1−Se)/Sp = 0.61
Diagnostic odds ratio (DOR): ratio of the odds of a positive test result in disease relative to the odds of a positive test result in those without disease (odds = chance/(1−chance)); (a×d)/(c×b) = LR+/LR− = 39.3
Area (Az) under the receiver operating characteristic (ROC) curve: the ROC curve is a graph of the relationship between the true positive fraction (Se) and the false positive fraction (1−Sp) of a diagnostic test for all possible threshold values for discriminating between health states. The area under the ROC curve is a measure of the probability that the test can be used to discriminate between diseased and healthy patients.

TP, true positive; FP, false positive; FN, false negative; TN, true negative
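The accuracy measures defined in Table 1 can be reproduced from the 2 × 2 counts with a short script. This is an illustrative sketch in Python, not part of the original article:

```python
# Accuracy measures from the 2x2 table in Table 1 (bitewing vs dentin caries).
# a = true positives, b = false positives, c = false negatives, d = true negatives.
a, b, c, d = 18, 1, 27, 59

prevalence  = (a + c) / (a + b + c + d)   # 45/105, about 0.43
sensitivity = a / (a + c)                 # 18/45 = 0.40
specificity = d / (b + d)                 # 59/60, about 0.98
ppv = a / (a + b)                         # 18/19, about 0.95
npv = d / (c + d)                         # 59/86, about 0.69
lr_pos = sensitivity / (1 - specificity)  # = 24.0
lr_neg = (1 - sensitivity) / specificity  # about 0.61
dor = (a * d) / (c * b)                   # about 39.3

print(f"Prev={prevalence:.2f} Se={sensitivity:.2f} Sp={specificity:.2f} "
      f"PV+={ppv:.2f} PV-={npv:.2f} LR+={lr_pos:.1f} LR-={lr_neg:.2f} DOR={dor:.1f}")
```

Note that the likelihood ratios and the diagnostic odds ratio are computed directly from the same four cell counts, which is why the table can report them as combinations of Se and Sp.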

performance.15–17 Table 1 provides an overview, with illustrative calculations for a simple dichotomous test. The sensitivity of a test is the percentage of cases with the disease for which the test result is, indeed, positive. The specificity of a test is the percentage of cases without the disease for which the test result is actually negative. Both sensitivity and specificity are generally considered properties of the test that are independent of the prevalence of disease, but they may vary with the severity of disease in a study population. One of the problems of test evaluation is that sensitivity and specificity are inversely related. Many medical tests depend on a numerical threshold value to distinguish normal from abnormal test results. High sensitivity can be achieved by choosing a threshold such that few cases of disease are missed; this will, however, generally have the effect that a higher proportion of cases without disease will be judged as diseased, so specificity will be reduced. Receiver operating characteristic (ROC) curves graphically represent the nature of this dependency between sensitivity and specificity. The area under the ROC curve (called Az) provides a measure of the ability of the test to distinguish patients with disease from patients without disease,18 without specifying the clinical cut-off point used for the test and without distinguishing the difference in importance to the patient of false positive (FP) and false negative (FN) diagnostic errors. The area under the ROC curve can vary from 0.5 (an uninformative test) to 1.0 (a test that discriminates perfectly between health and disease). This area measure can be used to compare and test for

the difference in accuracy between different diagnostic techniques. Because of the relationship between sensitivity and specificity, these two measures viewed in isolation provide an incomplete picture of the performance of a diagnostic test. This has resulted in a plethora of accuracy measures being used in diagnostic publications. Overall measures which combine sensitivity and specificity include, for example, likelihood ratios and the diagnostic odds ratio (Table 1).19,20 Another important reason to consider measures other than sensitivity and specificity is that they answer the wrong question, that is, "What is the probability of a certain test result, given the absence or presence of disease?" A clinically more important question would be, "What is the probability of disease, given a positive or negative test result?" Superficially the answer might seem to be provided precisely by two other measures: the positive and negative predictive values of a test (Table 1).17 However, a publication reporting a diagnostic test in terms of predictive values is only directly applicable to a patient population with exactly the same prevalence of disease as that in the original article. To compare articles about a diagnostic test using patients from populations with different prevalences of disease, additional calculations are needed. Predictive values are therefore poor for comparing research publications about diagnostic tests, because differences in the measures can arise from differences in both diagnostic accuracy and the underlying prevalence of disease.
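The prevalence dependence of predictive values can be made concrete with a small sketch, assuming the sensitivity of 40% and the rounded specificity of 98% from Table 1:

```python
# The positive predictive value depends on prevalence, so it cannot be
# compared across studies with different disease prevalence.
se, sp = 0.40, 0.98  # sensitivity and (rounded) specificity from Table 1

def ppv(prev: float) -> float:
    """Positive predictive value via Bayes' rule for a given prevalence."""
    return se * prev / (se * prev + (1 - sp) * (1 - prev))

# The same test yields very different PV+ at different prevalences:
for prev in (0.02, 0.10, 0.43):
    print(f"prevalence {prev:.0%} -> PV+ {ppv(prev):.0%}")
```

With identical accuracy, the same positive result is far less convincing in a low-prevalence population, which is exactly why predictive values from one study population cannot simply be transferred to another.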

Obtaining diagnostic evidence from the literature

At the end of the last century, summaries of the enormous literature concerning therapeutic interventions were made in the form of systematic reviews and meta-analyses.19 In these reviews the level of scientific evidence was assessed and recommendations were made for improving the quality and reporting of research. Although the outcome of therapy will to a large extent depend on the right diagnosis, developments in evidence-based diagnostic research have lagged behind those in the therapeutic field. Systematically summarising the evidence on the accuracy of diagnostic tests has faced a number of problems: literature on diagnostic test evaluation has been difficult to identify, sensitivity and specificity need to be analysed together, and published studies have often used heterogeneous populations in which the prevalence of the target disease was not reported.19 Publications without explicit inclusion of frequency data meant that summarising the accuracy of different tests in a meta-analysis was impossible.19 Because of the inadequacies of reports in the past, there are now guidelines for readers for judging the quality of articles about diagnostic tests15 and for writers of articles about diagnostic accuracy: the Standards for Reporting of Diagnostic Accuracy, the so-called STARD initiative.5 Using these guidelines, diagnostic articles can be searched for in databases such as Medline by using PubMed, and meta-analysis summaries can be made.19 Diagnostic tests in dentistry for which there are published measures of accuracy include oral examination, patient features, dental history and complaints, electric and other forms of vitality testing for pulp necrosis, digital and film radiographic imaging techniques for approximal and occlusal caries, periodontal and periapical disease, and pocket probing for periodontal defects.21

Guidelines for evaluating publications on diagnostic tests

Identifying diagnostic publications has been made easier recently as a result of guidelines advising authors to use the keywords "sensitivity and specificity" or "accuracy" in their publications. There is also now a specialized search engine, SUMSEARCH,22 which applies a filter to articles found in PubMed and can be used for initial searches for diagnostic test literature, ranking the articles found according to criteria for the strength of evidence. For example, in a search using the keywords "radiography" and "dental caries", 695 articles were identified. When the diagnostic filter of the search engine (using the keywords "sensitivity and specificity") was applied, this was reduced to 176 articles, 7 of which were reported as probably being systematic reviews. Quality criteria should be used when reviewing the literature17,19 to judge the validity of the study, the manner of presenting results and whether the conclusions can be applied to help dental practitioners in

caring for their patients. According to the criteria cited, a diagnostic study should report the results of a double-blind, prospective, independent comparison of the index test with a valid gold standard reference test for the actual pathology. The study should report the cut-off point used for the diagnosis of pathology for the index and reference tests, the prevalence and spectrum of disease, previous tests and referrals, and patient demographics. The results should be reported in the form of a frequency table so that likelihood ratios for the index test can be calculated (Table 1). The reproducibility and accuracy of interpretation of the test by the practitioner in general practice should be comparable with those in the article reporting the test. Furthermore, the results of the test should be applicable to patients seen in general practice, change the management of the patients and improve their overall health status. However, in a recent systematic review of the use of bitewing radiography compared with panoramic radiography as a test for caries diagnosis,23 only five publications were found of a high enough standard to answer the question the study had posed. Insufficient evidence was found to support the use of panoramic radiographs for this diagnostic task. The authors concluded that it was not possible to combine the results in a meta-analysis because there were too many differences between the study populations and the reference tests used. This conclusion is echoed in other systematic reviews of diagnosis in dentistry.24

Summarising and comparing accuracy of diagnostic performance

Meta-analyses of diagnostic literature are carried out in a similar way to those for therapeutic studies.25 A previously developed protocol is used for retrieving the pertinent documents and for extracting and combining the accuracy data, which are aggregated using a weighted quality scoring system from the published literature.5 Specific to meta-analyses of diagnostic tests is the problem that sensitivity and specificity are correlated, and therefore pooling of estimates across different studies may render biased estimates of a diagnostic test's performance. For this reason, the log diagnostic odds ratio (DOR) has been advocated for summarising results in meta-analyses (Table 1).19,20 A more explicit way to take into account the relationship between sensitivity and specificity is the summary ROC (SROC) method.19 In diagnostic imaging, the cut-off points or thresholds along the ROC curve usually represent different degrees of certainty with which a diagnosis of pathology was made. Different outcomes in studies of diagnostic accuracy between publications originate, in part, from the use of these different thresholds. The SROC method for summarising data, however, considers the data from each scientific publication as originating from the same ROC curve, whilst taking into account the possibility that the data points represent different thresholds.
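As a sketch of how log DORs might be pooled, the following example uses inverse-variance (Woolf) weighting; the 2 × 2 counts of the second and third studies are invented purely for illustration, so this is not a reconstruction of any published meta-analysis:

```python
import math

# Hypothetical 2x2 counts (a=TP, b=FP, c=FN, d=TN) from three studies.
# The first set is from Table 1; the other two are invented for illustration.
studies = [(18, 1, 27, 59), (30, 5, 20, 95), (12, 3, 18, 67)]

log_dors, weights = [], []
for a, b, c, d in studies:
    log_dor = math.log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d   # Woolf estimate of var(log DOR)
    log_dors.append(log_dor)
    weights.append(1 / var)               # inverse-variance weight

pooled_log_dor = sum(w * x for w, x in zip(weights, log_dors)) / sum(weights)
print(f"pooled DOR = {math.exp(pooled_log_dor):.1f}")
```

Pooling on the log scale keeps the estimate symmetric in the odds, and the inverse-variance weights give larger studies more influence; a full meta-analysis would additionally need to model between-study heterogeneity and threshold effects, as the SROC method does.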


From prevalence of disease to clinical decision making

For a full evaluation of the consequences of using or not using a diagnostic test, evidence is required not only about diagnostic accuracy, but also about the prevalence of disease and the preferences for the possible outcomes of treatment.16 This evidence, and any uncertainties involved, can be aggregated and analysed using clinical decision analysis to arrive at the best course of action.

Estimating prevalence of disease

A prerequisite for decision analysis is to have an estimate of the initial probability of pathology, taking into account the patient's clinical characteristics, signs and symptoms. The dentist's initial sources of information about the chances of pathology will be memories based on previous experience with other patients, but these memories may be selective. Another primary source of evidence is the scientific epidemiological literature about the average prevalence in the population. However, in the dentist's surgery, patients presenting and suspected of having, for example, periapical pathology may have complaints or other features of disease, such as discolouration of the tooth in question, or the suspected tooth may have been crowned. These factors, together with the clinical examination, will all modify the chance of periapical pathology actually being present. Vitality testing will further modify this probability of pathology even before radiographs are considered as an additional diagnostic test.

Recalculating the chance of disease following a diagnostic test

A critical determinant of the value of a diagnostic test is how the test result changes the probability that the patient has the disease. In other words, does the test sufficiently reduce the uncertainty about whether or not the patient has a pathological condition to allow a decision about therapy? The pre-test probability (prevalence or a priori probability) of disease for patients in the waiting room can be used, together with the likelihood ratios of the test, to calculate the post-test (a posteriori) probability of disease given the test result. These calculations can best be illustrated by making use of modified chances in the form of odds. Odds are related to probabilities (and therefore prevalence) by the following formulae: odds = probability/(1 − probability), or prevalence/(1 − prevalence), and probability = odds/(1 + odds). The odds is the ratio of the chance of an event occurring to the chance of it not occurring, and is famously used in the Anglo-Saxon world for betting. The odds of disease before and after the test result is known are related by the following formula: post-test odds = likelihood ratio × pre-test odds. This formula is called Bayes' theorem and is attributed to an English clergyman, the Reverend

Thomas Bayes (1702–1761). The likelihood ratio used should be either the positive or the negative likelihood ratio, depending on whether we want to calculate the probability that disease is present or absent. The respective likelihood ratios are larger and smaller than one, so that a positive test result increases the odds (and therefore the chance) of disease and a negative test result reduces the odds of disease being present. Here we will use, as an example, a prevalence of proximal dentin caries of 0.43 and a positive likelihood ratio of 24 (Table 1). After a positive test result, the post-test probability of dentin caries can be calculated as follows:
Pre-test odds = 0.43 / (1 − 0.43) = 0.754
Post-test odds = 24 × 0.754 = 18.1
Post-test probability = 18.1 / (1 + 18.1) = 0.948

This means that prior to the bitewing radiograph used as a test for dentin caries the chance of pathology was 0.43, but after a positive test the probability had increased to 0.948. This test has therefore substantially increased the certainty with which the diagnosis of pathology can be made. By definition the post-test probability in this example is the same as the positive predictive value. However, if the prevalence had been 0.02, the post-test probability in our example would be:
Pre-test odds = 0.02 / (1 − 0.02) = 0.020
Post-test odds = 24 × 0.020 = 0.480
Post-test probability = 0.480 / (1 + 0.480) = 0.324
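Both worked calculations can be reproduced with a small function implementing Bayes' theorem in odds form. This is an illustrative sketch; note that without rounding the pre-test odds to 0.020, the second example evaluates to about 0.33 rather than 0.324:

```python
def post_test_probability(prevalence: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = LR x pre-test odds."""
    pre_test_odds = prevalence / (1 - prevalence)
    post_test_odds = likelihood_ratio * pre_test_odds
    return post_test_odds / (1 + post_test_odds)

# LR+ = 24 from Table 1; the two worked examples in the text:
print(post_test_probability(0.43, 24))  # about 0.948
print(post_test_probability(0.02, 24))  # about 0.33 (0.324 with the rounded odds)
```

The same function, called with the negative likelihood ratio (0.61 in Table 1), gives the residual probability of disease after a negative test result.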

With the same accuracy of the diagnostic test, a lower prior probability of disease means that the post-test probability is much lower, and we are now much less sure that pathology is present after a positive test result. The probability of pathology after a positive test, given the measures of accuracy in Table 1, can also be arrived at with the help of calculators on the internet (for example, the EBP calculator on http://sumsearch. or ).

The threshold approach to diagnostic testing

Bayes' theorem allows the post-test probability of disease to be calculated once the result of the test is known. If the pre-test probability is sufficiently low, the post-test probability may still not be sufficiently high to justify instigating treatment despite a positive test result (see the second calculation in the previous paragraph). Similarly, the pre-test probability can be so high that even after a negative test result it would be irresponsible not to instigate therapy. In short, at very high and very low prevalences, initiating treatment on the basis of the test result may lead to a deterioration in the health of the patient. This means it is in the interests of the patient not to undergo a test. Consider the test described in Table 1, with a sensitivity of 40% and a specificity of 98%. Furthermore,

assume that a realistic prevalence of approximal dentin caries in the patient is 2%. Then treatment based on the outcome of the test will result in 3.2% of decisions being incorrect (60% of 2% gives 1.2% FN errors, and 2% of 98% gives approximately 2% FP errors), whereas refraining from testing and treatment would render only 2% of decisions incorrect (all FN). In fact, up to a prevalence of 5%, testing leads to more incorrect treatment decisions than refraining from testing and therefore from treatment. Conversely, for prevalences above 62%, owing to the far from perfect 40% sensitivity, deciding on treatment based on the results of testing would lead to more incorrect decisions than instigating treatment without testing. Only in the intermediate range of prevalence, from 5% to 62%, is a better chance of a correct decision obtained by testing and initiating treatment after a positive test result. Unless a diagnostic test has perfect sensitivity or perfect specificity, watchful waiting is the best option if the prevalence is sufficiently low, and proceeding directly to treatment is the best strategy if the prevalence is sufficiently high. The prevalence above which it is better to test instead of wait (5% in the example above) is called the test threshold. The prevalence above which it is better to treat without testing first (62% in the example above) is called the test-treatment threshold.2,16 Even if the exact values of the test and test-treatment thresholds are unknown, it is still important to be conscious of their existence. Testing is often considered a safe option, without the realization that an imperfect test will result in a loss of health for the patient due to the FP and FN results of diagnosis and treatment. For example, one article reported the diagnostic accuracy of radiographs for periapical lesions as a sensitivity of 70% and a specificity of 77%.21 Using this test at a prevalence of lesions of up to 25% would result in more lesions being incorrectly diagnosed than correctly diagnosed and treated.
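The error rates and the two thresholds quoted above can be checked with a short sketch, assuming, as the text does, that FP and FN errors are weighted equally:

```python
# Proportion of incorrect treatment decisions under the three strategies,
# counting FP and FN errors equally (Se = 40%, Sp = 98%, as in the text).
se, sp = 0.40, 0.98

def errors_wait(prev):
    return prev                                    # every diseased surface goes untreated (FN)

def errors_treat(prev):
    return 1 - prev                                # every healthy surface gets treated (FP)

def errors_test(prev):
    # FN errors among the diseased plus FP errors among the healthy
    return prev * (1 - se) + (1 - prev) * (1 - sp)

# At 2% prevalence, testing causes MORE errors than watchful waiting:
print(f"wait: {errors_wait(0.02):.1%}, test: {errors_test(0.02):.1%}")  # 2.0% vs 3.2%

# Closed-form thresholds where the error curves cross:
test_threshold = (1 - sp) / (se + (1 - sp))        # errors_test = errors_wait, about 5%
test_treat_threshold = sp / (sp + (1 - se))        # errors_test = errors_treat, about 62%
print(f"test threshold: {test_threshold:.0%}, "
      f"test-treatment threshold: {test_treat_threshold:.0%}")
```

The closed forms follow from equating the linear error functions in prevalence, which is why only Se and Sp appear in them.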
The pre-test probability of disease will be modified by the patient work-up, including the referral of patients and the use of the patient selection factors recommended in international guidelines for prescribing radiographs.6 For example, the selection factor "presence of anterior caries or restorations" could put children in such a high caries-risk group that the test threshold for screening bitewing radiographs might be exceeded. Similarly, a clinical work-up including periodontal probing may push the chance of moderate periodontal disease over the test-treatment threshold, so that additional radiographs, irrespective of the findings, will not provide information that would change the anticipated management of the patient. The answer to the question "Will my patient benefit from bitewing radiographs?" will therefore depend not only on the prevalence of disease, the use of selection factors, the accuracy of the dentist in diagnosing radiographs and the radiographic (film or digital) diagnostic technique used, but also on the patient's valuation of the desirable and undesirable outcomes of diagnosis and therapy.

Figure 2 Utility assessment. Example of a visual analogue scale for measuring four health state outcome values. By moving the arrows along a scale between best (100) and worst imaginable (0), the respondent indicates the relative value of the intermediate health outcomes. FN, false negative; FP, false positive; TN, true negative; TP, true positive

Measuring patient values

When using an imperfect test, errors occur in diagnosis, which means dentists will miss lesions when they are present (FN errors) and/or find lesions when they are not (FP errors). The threshold approach described above, in which the test and test-treatment thresholds were defined, gave equal weight to both types of error. However, the health states resulting from these two types of error may be valued differently by the patient. Figure 2 illustrates one method by which the numerical values of possible health outcome states (utilities) can be elicited.26 Respondents are asked to indicate the value of a possible health outcome of diagnosis and therapy between the best and worst possible outcomes on a visual analogue scale (valued at 100 and 0, respectively). The best outcome is given by a true negative (TN) decision. Realistically, from the patient's point of view, the values of the true positive (TP) and FP treatment outcomes after the treatment decision might be considered equivalent. However, fourth-year dental students considered the value of the outcome of a FP decision to be significantly lower than that of a TP decision (utility of FP 36 and of TP 78). In the limited number of studies in dentistry on measuring utilities, dentists vary in their values26 and generally seem to value treatment outcomes more highly than patients do.27 This is important because the value of diagnostic testing depends in part on the value attached to the health states of the possible outcomes involved.

Calculating the expected value of a diagnostic strategy while incorporating patient values

When deciding to carry out a diagnostic test, two types of information need to be combined: the probabilities of the various outcomes of testing and the values of those outcomes to the patient. There may, however, be a chance of complications from the treatment and


Figure 3 Simple example of how the options and outcomes of a typical diagnostic problem are modelled in the form of a decision tree. FN, false negative; FP, false positive; Prev, prevalence; Se, sensitivity; Sp, specificity; TN, true negative; TP, true positive

from the use of the diagnostic test itself. For example, an operation to remove wisdom teeth may damage the lingual nerve.10 In addition, there may be a chance that the treatment will fail in the long term, so it is important to know, for example, the chance of composite and amalgam restorations surviving a 10-year period. This type of information, although not as yet readily available in the scientific literature, is important to have when deciding on the value of diagnostic examinations. We demonstrate in Figure 3 how a typical diagnostic-therapeutic problem can be analysed using a decision tree. Three possible strategies are compared: the use of a bitewing radiograph to diagnose whether dentin caries, which would require restorative treatment, is present in an approximal tooth surface, and the two hypothetical options of watchful waiting (without testing or treating) and treating all cases (without testing). Each pathway in the decision tree has its own probability. For example, the probability of a TP decision after a radiograph is the product of the prevalence and the sensitivity of the test. When treatment without testing is the strategy, the probability of a TP decision is equal to the prevalence of disease.

In a decision analysis, the best strategy is the one with the highest expected utility. This expected utility can be calculated by multiplying the utility of each pathway by the probability of that pathway. Assume, in accordance with the study among dental students cited in the previous section, that for a given time scale the utility values are 100 for treatment correctly withheld (TN), 78 for treated caries (TP), 36 for unnecessary treatment (FP) and 0 for untreated dentin caries (FN). Furthermore, assume that radiography as a diagnostic test for dentin caries has a sensitivity of 40% and a specificity of 98% (Table 1), and that the prevalence of dentin caries is estimated at 10%. With these assumptions, the expected utility of each strategy can be calculated. When testing, the TN, TP, FP and FN outcomes have probabilities of 88%, 4%, 2% and 6%, respectively. The resultant overall expected utility of testing is then 92 (i.e. 88% × 100 + 4% × 78 + 2% × 36 + 6% × 0). The expected utility is 90 for watchful waiting (10% × 0 + 90% × 100) and 40 for treating all cases (10% × 78 + 90% × 36). This calculation shows that, at 10% prevalence, testing provides a better expected value than not testing. This conclusion depends, however, on the prevalence of
Dentomaxillofacial Radiology

Evidence-based diagnosis PA Mileman and WB van den Hout

Figure 4 The best test strategy and expected utility depending on the prevalence of dentin caries (using utilities of dental students, and diagnostic accuracy from Table 1). Below the test threshold of 4% prevalence, the best strategy is to withhold treatment without taking a bitewing radiograph. Above the testtreatment threshold of 57% the best strategy is treatment also without resorting to testing. In the intermediate range of 457% the strategy maximizing health is testing and restoring surfaces with a positive test result

dentin caries. This prior probability is more likely to be in the region of 2% for the average Dutch teenager than 10%. Figure 4 shows a so-called sensitivity analysis, where we have recalculated the expected utility of the three strategies for a prior probability varying between 0% and 100%. This sensitivity analysis takes into account the utilities of the outcomes so that the two thresholds (Figure 4), the test threshold and the testtreatment threshold, can be seen. Watchful waiting is the optimal strategy up to a prevalence of 4% (test threshold); treatment following positive diagnosis of dentin caries on a bitewing radiograph is the best strategy between 4% and 57% (the testtreatment threshold) and instigating treatment for all cases without testing is optimal above 57%. This example shows how, starting from the accuracy of a test and preferences for the health outcomes, the test threshold and testtreatment threshold can be determined. The advantage of the threshold approach to diagnosis is that the precise pre-test probability of disease need not be known in order to decide on the best course of action, only whether it is above or below the thresholds. In general, the prevalence of approximal caries in western European populations lies below the test threshold so that the FP diagnoses resulting from the use of an imperfect screening test will damage patient health. Using validated selection criteria from tested guidelines on the population, however, could result in selecting patients who would actually benefit

from a radiographic examination. The incidence of disease determines the period after which patients who have already had a radiographic examination will again exceed the test threshold and therefore benefit from further radiographs. In this illustrative example we have, of course, assumed that bitewing radiographs are used for a single target disorder.
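The whole worked example can be reproduced in a few lines of Python. This is a sketch rather than the authors' software: the utilities, sensitivity and specificity are those quoted above, while the constant-incidence recall model and the 2% annual incidence figure at the end are our own illustrative assumptions, not figures from the article.

```python
# Expected utilities, prevalence thresholds and an illustrative recall interval.
u_tn, u_tp, u_fp, u_fn = 100, 78, 36, 0   # utilities quoted in the text
sens, spec = 0.40, 0.98                    # bitewing accuracy (Table 1)

def expected_utilities(prev):
    """Expected utility of testing, watchful waiting and treating all."""
    eu_test = (prev * sens * u_tp + prev * (1 - sens) * u_fn
               + (1 - prev) * spec * u_tn + (1 - prev) * (1 - spec) * u_fp)
    eu_wait = prev * u_fn + (1 - prev) * u_tn
    eu_treat = prev * u_tp + (1 - prev) * u_fp
    return eu_test, eu_wait, eu_treat

# At 10% prevalence, testing wins.
print([round(x, 1) for x in expected_utilities(0.10)])  # [92.0, 90.0, 40.2]

# Each expected utility is linear in the prevalence, so the thresholds
# follow from equating the strategies' utilities and solving for prevalence.
a = sens * u_tp + (1 - sens) * u_fn   # utility of testing when disease is present
b = spec * u_tn + (1 - spec) * u_fp   # utility of testing when disease is absent
test_threshold = (u_tn - b) / ((u_tn - b) + (a - u_fn))        # ~0.039, i.e. ~4%
test_treat_threshold = (b - u_fp) / ((b - u_fp) + (u_tp - a))  # ~0.573, i.e. ~57%

# Illustrative recall interval (our own assumption): with a constant annual
# incidence, count the years until a surface negative on examination again
# exceeds the test threshold.
annual_incidence = 0.02  # assumed: 2% of sound surfaces develop caries per year
t, p = 0, 0.0
while p <= test_threshold:
    t += 1
    p = 1 - (1 - annual_incidence) ** t
print(t)  # 2 years with these assumed values
```

The printed thresholds reproduce the 4% and 57% figures of Figure 4; changing the utilities (for example, a patient who rates unnecessary treatment lower than 36) shifts both thresholds.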

Implications for dental practice and student education

Traditionally, diagnosis has been learnt from experts, presently seen as the lowest level in the hierarchy of evidence.28 Learning from experts is incompatible with the probabilistic view of diagnosis presented earlier, in which the existence of errors is explicitly taken into account. The basis of the diagnostic literature is now changing.29–31 More emphasis is being placed on criteria assuring the validity of diagnostic research, and essential to this is the use of a valid gold standard. In this process, data are beginning to be published that enable an evidence-based and clinical decision analysis approach to diagnosis in dentistry to be pursued. Scientific publications concerning diagnostic research have a number of additional complications when compared with those on therapeutic research. Randomized controlled clinical trials of diagnosis may be difficult to carry out because omitting the use of a


valid test or treatment option in the control arm could be seen as unethical. The relationship between the sensitivity and specificity of a test means that summarising the results of tests in meta-analyses has led to problems, although these may be resolved by improved data-summarising techniques20 and the adoption of new guidelines for reporting diagnostic studies. The benefits of diagnosis depend on the trade-off between FP and FN decisions. The optimal balance between these two types of error depends not only on the accuracy of the diagnostic test used and the prevalence of disease but also on how serious the patient considers these errors in outcome to be. When the chance of pathology falls below a certain threshold, the use of a diagnostic test will, because of unnecessary treatment, lead to a deterioration in the health of the patient. The use of validated guidelines in dental radiography should, however, lead to the selection of patients who will actually benefit from a diagnostic examination. In conclusion, dentists need to know the factors that play a role in determining the chance of a correct diagnosis and subsequent treatment decision making (Figure 1). More than ever, they need to gain insight into their own diagnostic accuracy for common pathological conditions in order, where necessary, to improve their diagnosis and to interpret the relevance of the literature on diagnostic testing for their dental practice. In dental education, various computer programmes are already available to help the coming generation of dentists in this task.9 The further development of such programmes and their availability on the internet for dentists is likely. Finally, more research into the values that patients place on the results of interventions is essential for improving the evidence base of diagnosis in dentistry. These developments in evidence-based diagnosis, aimed at improving patient health, should not pass unnoticed by the student of dentistry or the dental practitioner.

Acknowledgment
The English translation of this article appears here with the kind permission of Nederlands Tijdschrift voor Tandheelkunde. Previously published as: Mileman PA, Hout WB van den. Evidence-based diagnostiek en klinische besluitvorming. Ned Tijdschr Tandheelkd 2007; 114: 187–194.

References
1. Wulff HR. Rational diagnosis and treatment. An introduction to clinical decision making. Oxford: Blackwell Scientific Publications, 1981.
2. Mileman PA, Kievit J. Efficiëntie van diagnostiek en kwaliteit van besluitvorming: klinische besliskunde. In: van der Stelt PF, Arnold LV, Duinkerke ASH, Sanderink GCH (eds). Tandheelkundige Radiologie. Bohn, Stafleu, van Loghum: Houten, 1995, F10, pp 1–50.
3. Offringa M, Assendelft WJJ, Scholten RJPM. Inleiding in evidence-based medicine. Klinisch handelen gebaseerd op bewijsmateriaal. Bohn Stafleu Van Loghum: Houten/Diegem, 2000.
4. Cochrane AL. Effectiveness and efficiency. Random reflections on health services. London: The Nuffield Provincial Hospitals Trust, 1972.
5. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Clin Chem 2003; 49: 7–18.
6. European Commission, Radiation Protection. European guidelines on radiation protection in dental radiology. The safe use of radiographs in dental practice. Luxembourg: Office for Official Publications of the European Communities, 2004 [cited 2006 July 19]. Available from:
7. Rohlin M, Mileman PA. Decision analysis in dentistry – the last 30 years. J Dent 2000; 28: 453–468.
8. Mileman PA, Hout WB van den. Comparing the accuracy of Dutch dentists and dental students in radiographic diagnosis of dentinal caries. Dentomaxillofac Radiol 2002; 31: 7–14.
9. Mileman PA, Hout WB van den, Sanderink GC. Looking for caries? Teachers evaluate a program to improve caries diagnosis from radiographs. Eur J Dent Educ 2004; 8: 35–42.
10. Sedaghatfar M, August MA, Dodson TB. Panoramic findings as predictors of inferior alveolar nerve exposure following third molar extraction. J Oral Maxillofac Surg 2005; 63: 3–7.
11. Stheeman SE, Mileman PA, Hof MA van 't, van der Stelt PF. An approach to the development of decision support for diagnosing pathology from radiographs. Dentomaxillofac Radiol 1995; 24: 238–242.
12. White SC. Computer-aided differential diagnosis of oral radiographic lesions. Dentomaxillofac Radiol 1989; 18: 53–59.
13. White SC. ORAD II: Oral radiographic differential diagnosis. ORAD for the web, ORAD version 2.0. [Updated 2005 July 9; cited 2006 July 19]. Available from:
14. Wenzel A, Hintze H. The choice of gold standard for evaluating tests for caries diagnosis. Dentomaxillofac Radiol 1999; 28: 132–136.
15. Jaeschke R, Guyatt G, Sackett DL. Users' guides to the medical literature. III. How to use an article about a diagnostic test. B. What are the results and will they help me in caring for my patients? The Evidence-Based Medicine Working Group. JAMA 1994; 271: 703–707.
16. Hunink M, Glasziou P, Siegel J, Weeks J, Pliskin J, Elstein A, et al. Decision making in health and medicine. Integrating evidence and values. Cambridge: Cambridge University Press, 2001.
17. Bhandari M, Montori VM, Swiontkowski MF, Guyatt GH. Users' guide to the surgical literature: how to use an article about a diagnostic test. J Bone Joint Surg Am 2003; 85-A: 1133–1140.
18. Erkel AR van, Pattynama PMT. Receiver operating characteristic (ROC) analysis: basic principles and applications in radiology. Eur J Radiol 1998; 27: 88–94.
19. Devillé WLJM. Evidence in diagnostic research. Reviewing diagnostic accuracy: from search to guidelines [thesis]. Vrije Universiteit Amsterdam. Wageningen: Ponsen & Looijen, 2001, pp 1–152.
20. Glas AS. Beyond diagnostic accuracy. Applying and extending methods for diagnostic test research [thesis]. University of Amsterdam. Amsterdam: Thela Thesis, 2003, pp 1–159.
21. Pretty IA, Maupomé G. A closer look at diagnosis in clinical dental practice: Part 3. Effectiveness of radiographic diagnostic procedures. J Can Dent Assoc 2004; 70: 388–394.
22. SUMSEARCH. University of Texas Health Sciences Center, Department of Medicine, Medical Informatics. [Updated 2006 April 19; cited 2006 July 19]. Available from: http://sumsearch.
23. Taylor-Weetman K, Wake B, Hyde C. Comparison of panoramic and bitewing radiography for the detection of dental caries: a systematic review of diagnostic tests. Birmingham: University of Birmingham, 2002, pp 1–55.
24. Bader J, Ismail A. Survey of systematic reviews in dentistry. J Am Dent Assoc 2004; 135: 464–473.
25. Sanden WJ van der, Nienhuijs ME, Mettes TG. De rol van richtlijnen in de tandheelkundige zorgverlening. Ned Tijdschr Tandheelkd 2007; 114: 179–186.
26. Mileman PA, Hout WB van den. Preferences for oral health states: effect on prescribing periapical radiographs. Dentomaxillofac Radiol 2003; 32: 401–407.
27. Fyffe HE, Kay EJ. Assessment of dental health utilities. Community Dent Oral Epidemiol 1992; 20: 269–273.
28. Aartman IH, Loveren C van. Evidence-based tandheelkunde. Onderzoeksontwerpen en de ladder van evidence. Ned Tijdschr Tandheelkd 2007; 114: 161–165.
29. Devillé WL, Buntinx F, Bouter LM, Montori VM, Vet HC de, Windt AW van der, et al. Conducting systematic reviews of diagnostic studies: didactic guidelines. BMC Med Res Methodol 2002; 2: 1–13.
30. Leeflang M, Reitsma J, Scholten R, Rutjes A, Di Nisio M, Deeks J, et al. Impact of adjustment for quality on results of meta-analyses of diagnostic accuracy. Clin Chem 2007; 53: 164–172.
31. Leeflang MM, Scholten RJ, Rutjes AW, Reitsma JB, Bossuyt PM. Use of methodological search filters to identify diagnostic accuracy studies can lead to the omission of relevant studies. J Clin Epidemiol 2006; 59: 234–240.
