VOLUME 5  NUMBER 1  2012 



From Figures to Values: The Implicit Ethical Judgements in our Measures of Health
Paolo Vineis*, School of Public Health, Imperial College London
Roberto Satolli, Zadig, Milano, Italy

Corresponding author: Paolo Vineis, School of Public Health, Imperial College London, St Mary’s Campus, Norfolk Place W2 1PG London.

Tel: +44 (0)20 75943372; Fax: +44 (0)20 75943196; Email:

The objective of this article is to examine the extensions of a clinical measure of efficacy, the Number Needed to Treat (NNT), to different settings, including screening, scanning, genetic testing and primary prevention, and the associated ethical implications. We examine several situations in which the use of the NNT or NNS (Number Needed to Screen) has been suggested, such as Prostate-Specific Antigen testing for prostate cancer, Magnetic Resonance Imaging scans, genetic testing and banning of smoking. For each application, we explore the ethical implications of the relevant measure. We have found that the different measures have different ethical implications. For example, the Number Needed to Prevent is the only measure that can be lower than one, indicating with a numerical example that prevention is better than cure. Conversely, we raise questions about the acceptability of genetic screening. In a realistic example, we show that primary prevention of the effects of arsenic in drinking water, if targeted to the most susceptible, would require genetic screening of a large number of subjects, while also giving rise to ethical concerns. We warn against the abuse of testing, in particular genetic testing; we show that different measures are associated with different ethical issues; and we argue that prevention tends to be better than cure.

Downloaded from by Edgar M on December 14, 2012

How the impact of medical and preventive activities is measured is one of the important issues that epidemiologists face, and it has moral implications. The purpose of this article is to show how different measures of treatment and prevention are associated with very different impacts on the populations involved, and entail different moral implications. For the purposes of this analysis, any clinical or public health intervention can be considered worthwhile on the basis of two ethical principles: (i) benefits should exceed harm (beneficence) and (ii) the priority in the use of public resources should be for interventions that produce more benefits for more people (utility).

Number Needed to Treat
The Number Needed to Treat (NNT) is probably the most useful single figure that one needs to know in order to judge the efficacy of a therapy, and in fact of any medical intervention. Its properties have been described (see Schulzer and Mancini, 1996, and Walter, 2001, for reviews and a discussion of statistical aspects), and its use has thrived in recent decades (see Zulman et al., 2008, for an application to public health strategies). It is a summary measure that allows the physician to estimate how many patients need to receive a treatment for one of them to benefit; it can be compared with the expected burden of side-effects and with alternative courses of action, and it can lead to a cost–benefit analysis. However, its extensions to testing, screening, scanning (including incidental findings) and primary prevention have not been fully explored and will be analysed here from a public health perspective. By examining different scenarios, we will address the moral implications involved in the use of the NNT and derived measures.

Scenario 1: Therapy and Tertiary Prevention
The NNT is the number of patients who must be treated with a drug or any other medical intervention to save one life, to avoid the loss of 1 year of life (or of one Quality-Adjusted Life Year (QALY)), or to avoid one other specified adverse health outcome. The NNT (Box 1) is a function of the efficacy of the therapy and of the frequency of the outcome we want to avoid or prevent.

© The Author 2012. Published by Oxford University Press. Available online at



Box 1. Example of a measure of treatment efficacy
Let us consider a drug that is supposed to prevent heart disease (e.g. a statin). To express its efficacy, one can calculate the frequency of deaths or of illnesses, after a sufficiently long time, in the treated group compared with the control group. In a large study in healthy subjects with normal cholesterol, but with an altered level of an inflammation marker (CRP) (the Jupiter study), there were 12.5 deaths per 1000 people per year in the control group, and 10 in those treated with the drug. The two frequencies can be compared by calculating the difference (i.e. the deaths decreased by 2.5 per 1000 per year). However, this measure is rarely used to communicate benefits. The authors of clinical trials prefer to calculate the percentage risk reduction in the treated arm compared with controls, in this case 20 per cent (i.e. 2.5 divided by 12.5). In this way, the apparently modest absolute result is transformed into a more attractive relative reduction. In other words, when the basal risk is low, even a modest absolute benefit translates into an apparently large relative benefit. However, one of the most useful measures is the NNT, the number of patients who must be treated in order to avoid one adverse event such as death. In our example, the drug benefits 12.5 patients out of 1000 treated for 5 years (2.5 multiplied by 5 years). This means that 80 subjects (1000 divided by 12.5) need to be treated to obtain one benefit, i.e. to avoid one death. The NNT is thus 80 subjects. Is it large or small? To give an idea, 70 elderly patients with hypertension need to be treated for 5 years with anti-hypertensives in order to avoid one death; or, 100 male adults with no sign of heart disease need to take aspirin for 5 years in order to avoid an infarction. Not only is the NNT an easily interpretable measure, but it also allows comparative analyses including costs. For example, if a year of cholesterol-lowering therapy costs €1000 per patient, then approximately €400,000 are needed to prevent one death by treating 80 people for 5 years.

When the therapy is very effective, like surgery for appendicitis, and the outcome is frequent in the absence of intervention, then the NNT is very close to 1, i.e. we save almost all patients who are treated. This is a very uncommon occurrence in medicine, and most NNTs fluctuate around 50–500. Notice that the NNT may be high, even for a common adverse outcome, not only if the therapy is ineffective, but also if spontaneous recovery occurs, since the measure of efficacy is based on a comparison between treated and untreated patients (Box 1). Therefore, we may have a very high NNT in the case of pancreatic cancer (frequency of death 100%, highly ineffective therapies), but also for the therapy of common influenza, depending on the day of observation (with a very high rate of spontaneous recovery a few days after treatment initiation).

The NNT increases with a decreasing frequency of the outcome, whereas adverse side-effects of therapies have the same occurrence rate, irrespective of the frequency of the outcome that we want to prevent. For example, there is a fixed proportion of subjects who will develop aplastic anaemia after treatment with ibuprofen, whether the drug is properly used in seriously sick patients who really need it or inappropriately used in subjects with a mild and self-limiting disease. This relationship is represented in Figure 1, which shows that treatment should be initiated only when the advantages outweigh the side-effects. This well-known figure is usually applied to therapies, but common sense suggests applying it to any medical act. Walter and Sinclair (2009) have recently analysed the issue of the ‘minimum target event risk for treatment’, i.e. the threshold for undertaking a treatment, and they noted how frequently the information needed for an informed decision is lacking.

The ‘first ethical implication’ is that any benefit should be compared with side-effects, and the two are asymmetric, because only the benefits of treatment are influenced by the frequency of the outcome, so that damage without benefit can easily occur for rare outcomes. Benefit and harm are also asymmetric because they do not necessarily refer to the same persons, so that an intervention can slightly harm a large number of people in order to benefit only one person. These two asymmetries conflict with the principles of both beneficence and utility.
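The arithmetic of Box 1 can be retraced in a few lines of Python. This is only an illustrative sketch: the function name and variable names are ours, and it assumes, as Box 1 does, that the yearly benefit simply accumulates over the 5 years of treatment.

```python
def number_needed_to_treat(control_rate, treated_rate, years=1):
    """Return 1 / (absolute risk reduction accumulated over `years`).

    Assumes, as Box 1 does, that the yearly benefit simply adds up over time.
    """
    yearly_reduction = control_rate - treated_rate
    if yearly_reduction <= 0:
        raise ValueError("no absolute benefit: the NNT is undefined")
    return 1.0 / (yearly_reduction * years)


# Figures quoted in Box 1 (deaths per person per year).
control_rate = 12.5 / 1000
treated_rate = 10.0 / 1000

absolute_reduction = control_rate - treated_rate        # about 2.5 per 1000 per year
relative_reduction = absolute_reduction / control_rate  # about 0.20
nnt_five_years = number_needed_to_treat(control_rate, treated_rate, years=5)  # about 80

# Cost example from Box 1: 1000 euros per patient per year, for 5 years.
cost_per_death_avoided = nnt_five_years * 5 * 1000      # about 400,000 euros
```

The same absolute reduction of 2.5 per 1000 per year yields both the "attractive" 20 per cent relative reduction and the more sobering NNT of about 80, which is the contrast the box draws.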


Number Needed to Test
Suppose that a doctor wants to prescribe a Computerized Tomography (CT) scan for joint pain. If it is highly likely that the CT will help her/him to decide whether to treat the patient or not, then the NNT for therapy can be simply estimated for the treated subjects. But if any treatment is unlikely to be undertaken, why is the test performed? Has the doctor considered the potential side-effects of the CT for the patient? In this circumstance, it seems reasonable to estimate not the NNT, but the Number Needed to Test. In the case of appendicitis, diagnosis is very simple, and in most cases all patients undergoing the relevant tests will have the correct diagnosis and will be saved by surgery. But this is clearly an exception. The doctor may decide to perform a CT scan in 1000 patients with joint pain to identify the 10 who can theoretically benefit from a specific therapy. The NNT for those 10 patients may be, say, 10 (to be optimistic), i.e. out of the 10 patients with that condition who are treated, only 1 will recover thanks to the treatment. The other nine will get the drug (with its side-effects) with no benefit. But we also have to include in the equation the 990 patients who underwent a CT scan with no gain. Therefore, the Number Needed to Test is in fact 1000 (999 patients are tested without benefit for each one who recovers), and among the side-effects, we also have to count those of the diagnostic test. Again, the frequency of the side-effects is independent of the efficacy (or lack of efficacy) of the treatment and of the frequency of the outcome. The ‘second moral implication’ is that testing itself (not only treatment) can lead to a large number of useless interventions, and the related discomfort. In fact, the ratio of useless to useful interventions can be much higher than for the NNT. Therefore, the calculation of the Number Needed to Test is more useful than the NNT in evaluating the beneficence and utility of any medical intervention.

Figure 1. The Figure shows that with an increasing frequency of health effects (outcomes) the NNT is lower, i.e. the benefits of treatment are higher, whereas harm is independent of the frequency of outcomes (see also Box 1).
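The joint-pain example can be written out as a small calculation. The sketch below is illustrative only; the function name is ours, and it takes the Number Needed to Test to be the number of patients tested per patient who finally benefits, which for these figures means 1000 tested and therefore 999 tested without benefit.

```python
def number_needed_to_test(n_tested, n_treatable, nnt_among_treated):
    """Patients tested per patient who finally benefits from treatment."""
    n_benefited = n_treatable / nnt_among_treated
    if n_benefited <= 0:
        return float("inf")  # nobody benefits, however many are tested
    return n_tested / n_benefited


# Figures from the CT example: 1000 scanned, 10 treatable, NNT of 10 among them.
n_to_test = number_needed_to_test(n_tested=1000, n_treatable=10, nnt_among_treated=10)

tested_without_benefit = n_to_test - 1      # scanned, with or without treatment, for no gain
treated_without_benefit = 10 - 10 / 10      # received the drug but did not recover
```

Lowering the proportion of treatable patients among those scanned inflates the result directly, which is why the measure is so sensitive to how selectively the test is prescribed.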

Scenario 2: Secondary Prevention—the Example of PSA Testing
The 1000 hypothetical patients above were all affected by joint pain. What about a screening scenario, such as Prostate-Specific Antigen (PSA) for prostate cancer? This situation is similar to the estimation of the Number Needed to Test, but the computation needs to incorporate the prevalence of the condition in asymptomatic subjects. It is like the Number Needed to Test but in the absence of signs and symptoms, and therefore with a usually much lower disease prevalence. In fact, the Number Needed to Screen (NNS) for breast cancer screening, for example, is around 2500–20,000, depending on the age band. This means that at least 2500 women will undergo the screening test to identify a fraction who have a potentially malignant lesion, among whom there is one who will be saved by the screening activity. This leads to a ‘third moral implication’, i.e. the overall effect of a screening test in asymptomatic subjects depends on the prevalence of the asymptomatic condition, so that a test has completely different effectiveness in, for example, different age groups, and in accordance with the principles of beneficence and utility it should not be offered to a population with a low prevalence of the condition, when the expected benefits are likely to be exceeded by the harm.

According to one study, 3 million American men aged 40–74 years would show abnormal PSA levels if screened (>4.0 nanograms per millilitre; with a proposed threshold of 2.5 nanograms per millilitre, an additional 3 million men would be abnormal). However, only 0.4 per cent of men in the age range 40–74 years are expected to die every year from prostate cancer. Let us suppose that screening reduces the risk of dying by 20 per cent, probably an optimistic estimate [this is the estimate found in the European ERSPC trial, not in the American PLCO (Andriole et al., 2009; Schroder et al., 2009)]. With the figures given in the recent ERSPC report (Schroder et al., 2009), the absolute risk reduction is 0.7 per 1000 in 10 years, which gives an NNS to save a life of about 1400 (1/0.0007), a rather high value. Another way to estimate the impact is to say that 48 additional tumours need to be treated to prevent one death (Schroder et al., 2009). This means that approximately 1399 subjects will undergo screening with no benefit, and 47 of the 48 men treated will suffer from all the complications related to prostatectomy with no real gain in survival. If we consider the different life expectancies, the NNS to avoid the loss of 1 year (or a QALY) would probably be higher for older people (70 years), in spite of the higher prevalence of the cancer.
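The PSA figures quoted from the ERSPC report can be checked with the same kind of arithmetic. The sketch is illustrative and the variable names are ours; note that 1/0.0007 is closer to 1430 than to the rounded value of 1400 used in the text.

```python
# Absolute risk reduction reported in the text: 0.7 deaths per 1000 men over 10 years.
absolute_risk_reduction = 0.7 / 1000

# NNS = 1 / ARR; roughly 1429 before rounding (the text rounds to about 1400).
number_needed_to_screen = 1 / absolute_risk_reduction

# 48 additional tumours must be treated to prevent one death,
# so the remaining men are treated without a gain in survival.
treated_per_death_prevented = 48
treated_without_survival_gain = treated_per_death_prevented - 1
```

The two numbers describe different denominators: the first counts everyone invited to screening, the second only those who go on to treatment.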

Scenario 3: Disease Prediction—the Example of Genetic Testing
One can argue that breast cancer screening is indeed useful, at least over the age of 50 years; and perhaps, less convincingly, that PSA screening may also be useful. But there are instances in which no benefit can be demonstrated. One such instance is screening for low-penetrant genetic variants. Let us consider what the website of Decode, an Icelandic firm specialized in genetic research, offers. They suggest that by sending them a blood sample they can identify the gene variants that predispose to cancer and other chronic diseases. What happens if one has a ‘bad gene’? There are in fact only two possibilities: one is early diagnosis by a screening test such as mammography, a strategy used in women with Breast Cancer 1 (BRCA1) mutations; the other is a primary preventive strategy, e.g. quitting smoking for a smoking-related cancer. Here, we are interested in the methodological properties of the NNT and the ensuing moral implications. According to Decode’s website, to predict the onset of bladder cancer, a smoking-related cancer, they will look at two gene variants, one in the region 8q24 (chromosome 8) and the other in 5p15 (chromosome 5). Is it useful? Will one benefit? It is very hard to say, since Decode does not explain what one is supposed to do with the genetic information they offer. The only ways to make use of such information are either to prevent exposures to carcinogens or to screen the carriers of the variant(s) with greater intensity than non-carriers. Unfortunately, the second possibility does not apply in this situation since there is no effective early detection test for bladder cancer. The ‘fourth moral implication’ is that, for beneficence and utility, no testing should be done when an effective intervention is not available. Let us then imagine that we screen people in order to advise them to avoid exposure to a bladder carcinogen, such as arsenic. The example is purely theoretical and has been fully developed elsewhere (Vineis et al., 2005).

We hypothesize that the relative risk associated with the gene variant is 1.5 (low penetrance), that the cumulative risk for bladder cancer is 1 per cent in the normal population (1.5 per cent among the carriers of the gene variant), and that reduction or elimination of exposure to arsenic leads to a 50 per cent reduction in the risk of bladder cancer (all realistic assumptions). This means that the cumulative risk after intervention is 0.75 per cent in carriers of the gene variant, and the risk reduction becomes 0.015 minus 0.0075, i.e. 0.75 per cent, leading to an NNT of 133 (1/0.0075). Under this scenario, if the exposure to arsenic is reduced only in the carriers of the variant, we will need to ‘treat’ (i.e. to reduce exposure for) 133 exposed subjects to prevent one case. If exposure to arsenic is instead reduced for the ‘wild-type’ (again with an efficacy of 50 per cent), then the risk reduction is 0.5 per cent and the NNT is 200. The difference between 133 and 200 is clearly not striking, i.e. selecting those with the gene variant is not particularly advantageous. But there is a further complication, because we need to screen the population to identify the variant carriers; the wild-type occurs in 80 per cent of the people, the variant only in 20 per cent. This means it would be necessary to screen 666 subjects to identify the 133 to ‘treat’ with preventive policies to avoid one case of cancer (if we want to treat only the variant carriers). So the costs and side-effects of screening may not be worthwhile, even without considering ethical issues related to utility, etc. Thus, the ‘fifth moral implication’ is that testing may divert attention from a more equitable and effective (on a population basis) intervention, in this case primary prevention.
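The arsenic example can be reproduced step by step. The sketch below is illustrative; the variable names are ours and the inputs are the hypothetical values stated in the text (baseline risk 1 per cent, relative risk 1.5, 50 per cent efficacy, 20 per cent carrier frequency).

```python
# Hypothetical inputs taken from the worked example in the text.
baseline_risk = 0.01          # cumulative bladder cancer risk, wild-type
relative_risk_variant = 1.5   # low-penetrance gene variant
efficacy = 0.5                # risk reduction from removing arsenic exposure
variant_frequency = 0.20      # share of the population carrying the variant

# Carriers: risk 1.5 per cent, halved by the intervention.
risk_carriers = baseline_risk * relative_risk_variant
risk_reduction_carriers = risk_carriers * efficacy       # 0.0075
nnt_carriers = 1 / risk_reduction_carriers               # about 133

# Wild-type: risk 1 per cent, halved by the intervention.
risk_reduction_wild_type = baseline_risk * efficacy      # 0.005
nnt_wild_type = 1 / risk_reduction_wild_type             # about 200

# People who must be genotyped to find enough carriers to prevent one case.
# About 667 before rounding; the text, starting from the rounded 133, reports 666.
number_to_screen = nnt_carriers / variant_frequency
```

Setting `variant_frequency` lower, as would be the case for rarer variants, makes the screening burden grow in inverse proportion while the NNT among carriers stays unchanged.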


Scenario 4: Incidental Findings—the Example of Brain Imaging
A clear example of a recent application of the Number Needed to Test is a meta-analysis of studies on brain magnetic resonance imaging (MRI), in which a rather high prevalence (0.7%) of incidental findings occurs (Morris et al., 2009). We could discuss whether this is screening or not: usually, MRI is done because of symptoms, but often it is done for research, and in any case incidental findings arise that are unrelated to the symptoms. Screening is usually not the purpose, or at least the requirements of a screening test are not met. Clinicians do not yet know how to deal with incidental findings, such as aneurysms, and guidelines are not available. The authors of the meta-analysis use (apparently for the first time) what they call the Number Needed to Scan, which is only 50 for ‘any non-neoplastic incidental finding’, clearly a very low figure: for every 50 scans, one will be considered suspect or pathological. It is worth noting that this Number Needed to Scan has little to do with the NNT (or the NNS), which is the number of patients we need to treat or screen to avoid one adverse outcome like death. In the case of scanning, the index just tells us the number we need to scan to find one positive result of any kind, irrespective of treatment efficacy or usefulness of the finding. It seems that the risk of haemorrhage from unruptured aneurysms is low, but MRI is too recent to allow for a sufficiently long follow-up. In contrast, the risk of stroke or death from surgical interventions is sizable. In practice, we do not know where we are positioned in the graph of risks versus benefits shown in Figure 1. Consider also that 94 per cent of meningiomas remain asymptomatic and 63 per cent do not grow. On the other hand, for the patient the incidental discovery of a brain lesion can mean the loss of the driving licence, of insurance and (in some countries and for some jobs) of employment. These are all side-effects of the MRI that do not depend on the efficacy of treatments or on the frequency of the outcomes. This example is an extension of the fifth moral implication. In this case, not only is there no known beneficial intervention, but even the natural history of the disease is poorly understood.

Scenario 5: the Number Needed to Prevent
As an article in the New York Times stated in January 2007, for most Americans, the biggest health threat is not avian flu, West Nile or mad cow disease. It’s their health-care system: ‘advanced technology allows doctors to look really hard for things to be wrong. We can detect trace molecules in the blood. We can direct fiber-optic devices into every orifice. And CT scans, ultrasounds, MRI and PET scans let doctors define subtle structural defects deep inside the body. These technologies make it possible to give a diagnosis to just about everybody . . . Second, the rules are changing. Expert panels constantly expand what constitutes disease: thresholds for diagnosing diabetes, hypertension, osteoporosis and obesity have all fallen in the last few years. The criterion for normal cholesterol has dropped multiple times. With these changes, disease can now be diagnosed in more than half the population’.

In more technical terms, this escalation is represented in Figure 2, which shows how the NNT or NNS increases when we move to the left from death or frank symptomatic disease to early diagnosis and to ‘pre-clinical conditions’. In fact, this encompasses a series of measures that include the NNT on the right, then the Number Needed to Test, then the NNS on the left. The latter shift has repeatedly occurred in recent years with almost all the thresholds used for diagnostic purposes: cholesterol from 160 to 130 or 100 milligrams per decilitre, fasting glycaemia from 140 to 126 milligrams per decilitre, systolic blood pressure from 160 to 140 mmHg and then 120 mmHg. If we add genetic testing for inherited susceptibility to disease, the NNT/NNS shifts further to the left in Figure 2.

Figure 2. The vertical axis shows the NNT, the number of subjects who need to be examined and treated to avoid one adverse effect such as death. The dark grey area represents those who are beyond the clinical threshold, i.e. those who already suffer from symptoms or have signs of disease. The light grey area includes those who are considered ill only because they are above a certain threshold such as glycaemia, though they still feel well. With a decreasing threshold in asymptomatic persons, the NNT increases (to the left).

Table 1. Properties of the different measures described in the text
Measure | Components | Range | Examples
NNTreat | Relative risk; frequency of outcome | 1 to infinity | Surgery for acute appendicitis
NNTest | Same plus prevalence of disease in symptomatic persons | >1 to infinity | MRI for joint pain
NNScreen | Same plus prevalence of pre-clinical condition in asymptomatic persons | >1 to infinity | Mammography
NNPrevent | Relative risk; frequency of outcome | <1 to >1 | Herd immunity

One important property of prevention is that the NNP (Number Needed to Prevent one case of disease) can be <1. This apparently paradoxical situation occurs when a relatively limited preventive action has an impact that goes beyond those who are directly affected by it, for example through an indirect effect. The typical example is herd immunity: vaccinating a relatively limited number of subjects prevents the disease in many more, e.g. treating 10, we save 100. Similarly, banning smoking in public places has a positive effect not only in those potentially exposed to second-hand smoke (the target population), but also in smokers, who will smoke less. Zulman et al. (2008) have considered how the NNT helps to disentangle the efficacies of different public health strategies, including focused strategies aimed at high-risk groups, versus unfocused strategies aimed at the general population. They note that a population-based intervention is a good option (in terms of NNT, though it would more appropriately be called NNP) if there are no adverse effects, whereas a targeted approach may prevent more deaths while treating fewer people if adverse effects are present. The ‘sixth moral implication’ seems to be that prevention is better than cure also for a very technical reason, related to utility and beneficence, i.e. at least in certain cases the ratio between the ‘treated’ individuals and the benefited individuals is <1.
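The herd-immunity figure can be expressed in the same form as the other measures. The sketch below is illustrative; the function name is ours, and it simply takes the Number Needed to Prevent to be the number of people directly targeted divided by the number of cases prevented in the whole population.

```python
def number_needed_to_prevent(n_targeted, cases_prevented):
    """People directly targeted per case prevented, counting indirect benefits."""
    if cases_prevented <= 0:
        return float("inf")  # no benefit at all
    return n_targeted / cases_prevented


# Herd-immunity illustration from the text: "treating 10, we save 100".
nnp_herd_immunity = number_needed_to_prevent(n_targeted=10, cases_prevented=100)

# With no indirect effect the benefit is confined to those targeted,
# so the ratio cannot fall below 1 (here, 1 case prevented per 10 targeted).
nnp_direct_only = number_needed_to_prevent(n_targeted=10, cases_prevented=1)
```

The value falls below 1 only because `cases_prevented` counts people who were never themselves vaccinated, which is exactly the indirect effect the text describes.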

By reviewing the different measures related to the original NNT (as summarized in Table 1), we have identified some interesting features of each of them. In exceptional cases, the NNT is 1, when all patients would die and all are saved; but usually the NNT is >1, and it is even greater if we consider the Number Needed to Test rather than the NNT, i.e. we include in the denominator all the subjects who undergo a diagnostic test rather than only those who are offered the treatment. By definition, the NNS is greater than the NNT, because it involves asymptomatic persons whose disease prevalence is lower than for the symptomatic ones. If we want to screen for genetic susceptibility, as several commercial laboratories now propose, we have to compare the benefits gained by screening and treating only the susceptibles, with the benefits obtained by treating the whole population. At least for low-penetrant genes, we may conclude that the overall effort is far from being justified, as our example of arsenic shows. Clearly, if there is no benefit (or benefits are unknown, as in the case of the Number Needed to Scan with MRI), the NNS tends to infinity. Finally, if we extend the reasoning to primary prevention, we discover an interesting property of the NNP, i.e. it is the only measure that can be <1, when the benefit of prevention goes beyond the target of the preventive effort. Such situations can be much more frequent than we think, from herd immunity to climate change. In general, we can say that the different measures related to the original NNT can also be used as proxies of the adherence of an intervention to the ethical principles of beneficence and utility.

We are grateful to Michael Parker (Ethox, Oxford) for useful suggestions.

Conflict of interest
None declared.

Andriole, G. L., Crawford, E. D., Grubb, R. L. 3rd, Buys, S. S., Chia, D., Church, T. R., Fouad, M. N., Gelmann, E. P., Kvale, P. A., Reding, D. J., Weissfeld, J. L., Yokochi, L. A., O’Brien, B., Clapp, J. D., Rathmell, J. M., Riley, T. L., Hayes, R. B., Kramer, B. S., Izmirlian, G., Miller, A. B., Pinsky, P. F., Prorok, P. C., Gohagan, J. K. and Berg, C. D. (2009). Mortality Results from a Randomized Prostate-cancer Screening Trial. New England Journal of Medicine, 360, 1310–1319.

Morris, Z., Whiteley, W. N., Longstreth, W. T. Jr, Weber, F., Lee, Y. C., Tsushima, Y., Alphs, H., Ladd, S. C., Warlow, C., Wardlaw, J. M. and Al-Shahi Salman, R. (2009). Incidental Findings on Brain Magnetic Resonance Imaging: Systematic Review and Meta-analysis. BMJ, 339, b3016.

Schroder, F. H., Hugosson, J., Roobol, M. J., Tammela, T. L., Ciatto, S., Nelen, V., Kwiatkowski, M., Lujan, M., Lilja, H., Zappa, M., Denis, L. J., Recker, F., Berenguer, A., Maattanen, L., Bangma, C. H., Aus, G., Villers, A., Rebillard, X., van der Kwast, T., Blijenberg, B. G., Moss, S. M., de Koning, H. J. and Auvinen, A. (2009). Screening and Prostate-cancer Mortality in a Randomized European Study. New England Journal of Medicine, 360, 1320–1328.

Schulzer, M. and Mancini, G. B. (1996). ‘Unqualified Success’ and ‘Unmitigated Failure’: Number-needed-to-treat-related Concepts for Assessing Treatment Efficacy in the Presence of Treatment-induced Adverse Events. International Journal of Epidemiology, 25, 704–712.

Vineis, P., Ahsan, H. and Parker, M. (2005). Genetic Screening and Occupational and Environmental Exposures. Journal of Occupational and Environmental Medicine, 62, 657–662, 597.

Walter, S. D. (2001). Number Needed to Treat (NNT): Estimation of a Measure of Clinical Benefit. Statistics in Medicine, 20, 3947–3962.

Walter, S. D. and Sinclair, J. C. (2009). Uncertainty in the Minimum Event Risk to Justify Treatment was Evaluated. Journal of Clinical Epidemiology, 62, 816–824.

Zulman, D. M., Vijan, S., Omenn, G. S. and Hayward, R. A. (2008). The Relative Merits of Population-based and Targeted Prevention Strategies. Milbank Quarterly, 86, 557–580.
