
Expert Opinion on Drug Safety

ISSN: 1474-0338 (Print) 1744-764X (Online) Journal homepage: http://www.tandfonline.com/loi/ieds20

The role of data mining in pharmacovigilance

Manfred Hauben, David Madigan, Charles M Gerrits, Louisa Walsh & Eugene
P Van Puijenbroek

To cite this article: Manfred Hauben, David Madigan, Charles M Gerrits, Louisa Walsh & Eugene
P Van Puijenbroek (2005) The role of data mining in pharmacovigilance, Expert Opinion on Drug
Safety, 4:5, 929-948, DOI: 10.1517/14740338.4.5.929

To link to this article: https://doi.org/10.1517/14740338.4.5.929

Published online: 07 Sep 2005.

Review
General

The role of data mining in pharmacovigilance

Manfred Hauben, David Madigan, Charles M Gerrits†, Louisa Walsh & Eugene P Van Puijenbroek

†Takeda Global Research and Development, Inc., Department of Pharmacoepidemiology and Outcomes Research, Lincolnshire, Illinois, USA

1. Introduction
2. Mining spontaneous reporting system data: theory
3. Mining spontaneous reporting system data: practice
4. Clinical versus computational approaches
5. Conclusion
6. Expert opinion

A principal concern of pharmacovigilance is the timely detection of adverse drug reactions that are novel by virtue of their clinical nature, severity and/or frequency. The cornerstone of this process is the scientific acumen of the pharmacovigilance domain expert. There is understandably an interest in developing database screening tools to assist human reviewers in identifying associations worthy of further investigation (i.e., signals) embedded within a database consisting largely of background ‘noise’ containing reports of no substantial public health significance. Data mining algorithms are, therefore, being developed, tested and/or used by health authorities, pharmaceutical companies and academic researchers. After a focused review of postapproval drug safety signal detection, the authors explain how the currently used algorithms work and address key questions related to their validation, comparative performance, deployment in naturalistic pharmacovigilance settings, limitations and potential for misuse. Suggestions for further research and development are offered.

Keywords: data mining, disproportionality, drug safety, pharmacovigilance

Expert Opin. Drug Saf. (2005) 4(5):929-948
10.1517/14740338.4.5.929 © 2005 Ashley Publications Ltd ISSN 1474-0338

1. Introduction

Increasing scientific, regulatory and public scrutiny is focused on the obligation of the medical community, pharmaceutical industry and health authorities to ensure that marketed drugs have acceptable benefit–risk profiles. This is an intricate and ongoing process that begins with careful preapproval studies, but continues after regulatory market authorisation when the drug is in widespread clinical use. In the latter environment, surveillance schemes based on spontaneous reporting system (SRS) databases are a cornerstone for the early detection of drug hazards that are novel by virtue of their clinical nature, severity and/or frequency. The term pharmacovigilance is often used to describe these surveillance activities.

Early hints that suggest the possibility of novel adverse events (AEs) are often referred to as ‘signals’, although considerable ambiguity surrounds the use of this term. The following definition of a signal will be employed: ‘A set of data constituting a hypothesis that is relevant to the rational and safe use of a medicine. Such data are usually clinical, pharmacological, pathological or epidemiological in nature. A signal consists of a hypothesis together with data and arguments’ [1].

Computational signal detection algorithms – hereafter referred to as data mining algorithms (DMAs) – may assist pharmacovigilance domain experts to discover potentially relevant drug–event associations (DEAs). Data mining is the process of seeking interesting or valuable information within large data sets [2]. Myriad data mining applications exist in healthcare areas as diverse as medical imaging, gene expression analysis and nursing [3-5]. In addition, data mining is being explored in databases that exist principally for purposes other than knowledge discovery (e.g., claims databases or [electronic] medical records). By contrast, SRS databases exist precisely to facilitate early discovery of potentially dangerous associations. In recent years, data mining in pharmacovigilance has attracted significant attention and has the potential to discover complex interactions that defy human recognition. The marriage of computer-intensive data mining algorithms with pharmacovigilance domain expertise represents a promising alliance.

The authors’ objective is to critically assess DMAs that are being used increasingly by pharmacovigilance specialists to explore postapproval safety databases. First, the authors provide a concentrated overview of the nature of SRS databases, including their strengths, limitations and interpretive nuances for purposes of signal detection in general. The authors then develop an intuitive theoretical framework by which pharmacovigilance specialists may better appreciate the ‘mechanics’ of DMAs and feel comfortable with interpreting their output. Finally, the authors delve into a number of issues that the budding data miner in pharmacovigilance needs to consider when deploying DMAs in real-life pharmacovigilance settings, and make recommendations for use and further research.

As a ‘signal’ is usually considered to be more than just a statistical association [1], the authors hereafter use the term ‘signal of disproportionate reporting’ (SDR) when discussing statistical disproportionalities in SRS databases without clinical, pharmacological and/or (pharmaco)epidemiological context [6]. The intention of this terminology is to emphasise that DEAs highlighted with DMAs indicate differential reporting of possible reactions, not necessarily differential occurrence.

1.1 Spontaneous reporting systems and signal detection

Pharmaceutical companies, health authorities and drug monitoring centres use SRS databases for global screening for unforeseen AEs after regulatory authorisation for use in clinical practice. The precise details of each SRS differ in terms of size and scope, statutory reporting mandates, surveillance selectivity or intensity and organisational structure. Prominent SRSs include the Adverse Event Reporting System (AERS) of the US FDA [101], the Yellow Card Scheme of the Medicines and Healthcare Products Regulatory Agency (MHRA) [102], and the international pharmacovigilance programme of the World Health Organization (the WHO Uppsala Monitoring Centre) [103]. These and other systems were created to provide early warnings of possible safety problems that would be difficult to detect during clinical drug development because of the power limitations, constricted range of demographics, exclusion of patients with extensive co-morbid illnesses and co-medications, and limited duration of follow-up characteristic of clinical trials.

The first step in signal detection is the submission of case reports of suspected adverse events to pharmaceutical companies and health authorities by healthcare professionals and patients. Although reporting is legally required in some countries, it is de facto voluntary for all but pharmaceutical manufacturers; this introduces differential reporting of AEs. The literature surveying the factors that influence reporting behaviour provides potential opportunities for process improvements that are beyond the scope of this article [7-13].

The next step is review of these reports by a professional capable of accurately classifying and coding AE terms and recognising potentially serious events in reports that do not contain the usual regulatory flags for seriousness (e.g., death, immediately life threatening, hospitalisation/prolongation of hospitalisation). This facilitates signal detection at the case level and reduces data corruption (e.g., inaccurate coding of reported AEs) at the level of individual records, which would otherwise compromise statistical approaches based on aggregate data.

The initial intake assessment is most useful in detecting signals of so-called designated medical events (DMEs). These are AEs considered rare, serious and associated with a high drug-attributable risk, and they constitute an alarm with as few as 1–3 reports [14]. Typical examples include Stevens-Johnson syndrome, toxic epidermal necrolysis, hepatic failure, anaphylaxis, agranulocytosis, aplastic anaemia and torsade de pointes. Other events of special interest associated with particular drugs and/or patient populations, sometimes called targeted medical events (TMEs), may be monitored in a similar fashion.

From a public health and a regulatory perspective, the majority of reports in SRS databases represent ‘noise’, because the reports are associated with treatment indications (i.e., confounding by indication), co-morbid illnesses, protopathic bias, channeling bias and/or other reporting artifacts, or the reported adverse events are already labelled or are medically trivial. Signals from non-DME reports may not begin to become cogent until cumulative reports generate a pattern that stands out from the background ‘noise’. Thus, the next step in traditional signal detection is manual review of lists of reported AE frequencies and/or comparison of reporting rates to background incidence rates deduced from external longitudinal databases.

DEAs of intermediate frequency or seriousness might be discounted on manual review of AE lists, especially if a pharmacologically plausible explanation is not readily apparent. More complicated reports, such as those involving drug–drug interactions and drug-induced syndromes, may be especially resistant to detection, because the reviewer would have to cognitively link multiple, separately listed drugs and/or adverse events. Historically, discerning factors suggestive of a possible adverse drug reaction (ADR) in non-DME reports has drawn on the clinical and pharmacoepidemiological acumen of the prepared mind [15]. However, as the number of reports submitted to postlicensing safety databases continues to grow, statistical systems have been developed as adjunctive tools to assist the prepared mind in identifying possible ADRs.

Different statistical approaches in pharmacovigilance include disproportionality analyses, sequential probability ratio tests,
correlation analyses and multivariate regression. Most of the published experience to date has been with so-called disproportionality analyses, a major focus of this article. Although the precise operational details of each disproportionality algorithm vary, they all calculate surrogate observed-to-expected ratios in which the reporting experience of each reported drug–event combination (DEC) is compared to the background reporting experience across all/most drugs and events using an independence model [16-20]. In the appropriate clinical context, DECs that stand out statistically against the background reporting experience may reflect credible signals warranting additional investigation. If there is sufficient correlation between these statistical metrics and novel causal associations, these tools could improve drug safety monitoring. However, as discussed in detail below, many current disproportionality analysis methods have the potential to perform poorly in real-world databases and, therefore, the authors encourage the development of new methods to address some of the existing limitations.

To fully understand these issues, the authors first delve into the theory behind these tools by ‘looking under the hood’ of the commonly used DMAs.

2. Mining spontaneous reporting system data: theory

SRSs receive reports that consist of one or more drugs, one or more AEs, and possibly some basic demographic information (in addition to text data). Over time, SRS databases emerge that contain thousands or even millions of these reports. Notwithstanding several well-documented data limitations (see Section 1), SRS databases represent a primary data source for evaluating drug safety. Certainly, these databases may play a role in the investigation of specific associations. Here, however, the authors are concerned with harnessing the databases to detect previously undiscovered associations. The field of data mining focuses on problems of this nature, and the last few years have seen some useful interplay between the data mining and drug safety communities.

Beyond the data quality issues already alluded to, analysis of SRS databases presents some immediate challenges. The MedDRA adverse event coding system includes > 16,000 distinct preferred terms (PTs). The number of licensed drugs is of the same order of magnitude. Thus, SRS databases resemble spreadsheets with one row per report and ∼ 30,000 columns. Table 1 shows a conceptual representation of a typical entry. Multivariate statistical analysis of high-dimensional data of this sort can present significant difficulties. Nonetheless, progress in domains such as gene expression analysis and text categorisation is directly relevant, as discussed in Section 2.3.

Table 1. A conceptual representation of a typical entry in an SRS database.

Age   Sex    Drug 1   Drug 2   …   Drug 15,000   AE 1   AE 2   …   AE 16,000
42    Male   No       Yes      …   No            Yes    No     …   Yes
AE: Adverse event; SRS: Spontaneous reporting system.

2.1 Contingency tables and disproportionality measures

A number of approaches have emerged in recent years that search SRS databases for ‘interesting’ associations. Most such algorithms (e.g., gamma-Poisson shrinker [GPS], multi-item gamma-Poisson shrinker [MGPS], proportional reporting ratios [PRR], reporting odds ratios [ROR], Bayesian Confidence Propagation Neural Network [BCPNN]) focus on low-dimensional projections of the data, typically two-dimensional contingency tables. Table 2 shows a typical (fictitious) table. The explanation of the above names will become clear below.

Table 2. A fictitious 2-dimensional projection of an SRS database.

                 AE j = yes   AE j = no   Total
Drug i = yes     a = 20       b = 100       120
Drug i = no      c = 100      d = 980      1080
Total            120          1080         1200
AE: Adverse event; SRS: Spontaneous reporting system.

The number of such tables in a database with 15,000 drug names and 16,000 AEs is 15,000 × 16,000 ≈ 240 million, so enumeration of all possible tables is tedious, yet still feasible on standard hardware. The basic data mining task is then to rank the tables in order of ‘interestingness’ and report some subset of the DECs as worthy of further human investigation.

Most authors use some statistical measure of association as their measure of ‘interestingness’. Many such measures exist, and their statistical properties for hypothesis testing vary. GPS focuses on the ‘relative risk’ (RR). (The authors note that this is an unfortunate term in the data mining literature, given that SRS data were never intended for, and cannot be used to calculate, incidences and, consequently, relative risks. Alternative terminology, such as relative reporting ratio, is preferred.) The RR for the drug i – adverse event j combination (RR_ij) is the observed number of occurrences of the combination (20 in Table 2) divided by the expected number of occurrences. GPS computes the expected value under a model of independence. Specifically, in the example above, AE j occurs overall in 10% of the reports (120 out of 1200). Thus, if drug i and adverse event j are stochastically independent, 10% of the reports containing drug i should include AE j, that is, 12 reports in this case. The RR for this example is therefore 20/12, or 1 2/3: this combination occurred ∼ 67% more often than expected. Some analysts use 2 as a threshold of ‘interestingness’ and, hence, would not report this combination as a candidate for further human investigation.

Natural (though not necessarily unbiased) estimates of various probabilities emerge from tables such as Table 2. For example, one might estimate the conditional probability of AE j given drug i by a/(a + b) (i.e., 20/120 in the example of Table 2), that is, the observed fraction of drug i reports that listed AE j. Table 3 lists the formulae for the various measures of association in common use, along with their probabilistic interpretation. Here ‘¬drug’, for example, denotes the reports that did not list the target drug. PRR is the ‘proportional reporting ratio’, ROR is the ‘reporting odds ratio’, and IC is the ‘information component’ defined by the WHO Uppsala Monitoring Centre [16,18,21].

Table 3. Common measures of association for 2 x 2 tables in SRS analyses.

Measure of association          Formula                                   Probabilistic interpretation
Relative risk*                  $\frac{a(a+b+c+d)}{(a+c)(a+b)}$           $\frac{\Pr(\mathrm{ae} \mid \mathrm{drug})}{\Pr(\mathrm{ae})}$
Proportional reporting ratio    $\frac{a/(a+b)}{c/(c+d)}$                 $\frac{\Pr(\mathrm{ae} \mid \mathrm{drug})}{\Pr(\mathrm{ae} \mid \neg\mathrm{drug})}$
Reporting odds ratio            $\frac{a/c}{b/d}$                         $\frac{\Pr(\mathrm{ae} \mid \mathrm{drug}) / \Pr(\neg\mathrm{ae} \mid \mathrm{drug})}{\Pr(\mathrm{ae} \mid \neg\mathrm{drug}) / \Pr(\neg\mathrm{ae} \mid \neg\mathrm{drug})}$
Information component‡          $\log_2 \frac{a(a+b+c+d)}{(a+c)(a+b)}$    $\log_2 \frac{\Pr(\mathrm{ae} \mid \mathrm{drug})}{\Pr(\mathrm{ae})}$
* The preferred terminology would be relative reporting ratio (see text).
‡ Information component is used in the frequentist sense in this table (as a ‘crude’ measure), but is formulated as a Bayesian metric in the BCPNN.
BCPNN: Bayesian confidence propagation neural network; SRS: Spontaneous reporting system.

All four of these measures make sense; in each case, a particular drug that is more likely to cause a particular AE than some other drug will typically receive a higher score. Similarly, if an AE and a drug are stochastically independent, all measures will return a null value (1 for RR, PRR and ROR; 0 for the IC).

However, all four methods are subject to sampling variability (i.e., a different set of AE reports from the same ‘population’ will not give exactly the same value of the measure of association). This may particularly be the case with large, sparse databases. Due to the law of large numbers, this statistical variability diminishes as the sample size increases. In the SRS context, however, the count in the ‘a’ cell is often small, leading to substantial variability (and hence uncertainty about the true value of the measure of association) despite the often large numbers of reports overall.

Consider, for example, Tables 4 and 5, showing 2 x 2 tables that differ merely by a single extra report. Table 6 shows the various measures for Tables 4 and 5. Notwithstanding the somewhat large sample size of almost 1200, the addition of one extra report doubles, or almost doubles, all the measures. Table 7 shows a table involving an uncommon drug and a rare AE where a single report results in an RR of > 1000!

Table 4. 2 x 2 table of counts for drug D1 and adverse event AE.

            AE = yes   AE = no
D1 = yes    a = 1      b = 100
D1 = no     c = 5      d = 1080
AE: Adverse event.

Table 5. 2 x 2 table of counts for drug D2 and adverse event AE.

            AE = yes   AE = no
D2 = yes    a = 2      b = 100
D2 = no     c = 5      d = 1080
AE: Adverse event.

Table 6. Measures of association for Tables 4 and 5.

Measure                         Drug D1   Drug D2
Proportional reporting ratio    2.1       4.3
Reporting odds ratio            2.2       4.3
Information component           1.0       1.7
Relative risk                   2.0       3.3

Table 7. 2 x 2 table of counts for drug D3 and adverse event AE.

            AE = yes   AE = no
D3 = yes    a = 1      b = 7
D3 = no     c = 3      d = 34000
AE: Adverse event.

The essential problem in each case is that the standard error of the measure of association is large. Large standard errors mean that the measure of association is unreliable: even though the observed measure of association might be large, the true measure could actually be small (or vice versa).

The pharmacovigilance literature describes different approaches to deal with this difficulty. The most straightforward approach is to estimate a standard error for the measure of association. Roughly speaking, one expects the observed measure of association to be within two or three standard errors of the corresponding true measure of association. Therefore, for example, a PRR of 4 with a standard error of 3 is not as interesting as a PRR of 4 with a standard error of 0.2. Three difficulties arise with this general approach. First, formulae for standard errors justify themselves via asymptotic arguments that may not apply in tables with small counts. Second, the interpretation of the standard error relies on the notion of a ‘true’ population measure of association and a repeatable random sampling mechanism; these may not make any sense in an SRS context. Third, standard errors shrink with increasing sample size, so this approach tends to overemphasise common DECs.

A second, related approach conducts statistical hypothesis tests in the 2 x 2 contingency table. Common tests include chi-square tests, with or without small cell count adjustments, and Fisher’s exact test. These run afoul of the same difficulties as the standard error-based approach.

A third approach involves Bayesian shrinkage (thus the word ‘shrinker’ in the names of certain algorithms). Both (M)GPS and BCPNN use this approach. GPS, for example, places a prior distribution on RRs that encapsulates a prior belief that most RRs are close to a value of one. Only in the face of substantial evidence from the data does (M)GPS return an RR estimate that is substantially larger than one. Thus, for example, an RR of 1000 that derives from an observed count of a = 1 might result in an (M)GPS RR estimate (Empirical Bayes Geometric Mean, or EBGM) of 1.5 (i.e., the crude RR is ‘shrunk’ towards a value of 1), whereas an RR of 1000 that derives from an observed count of a = 100 might result in an EBGM RR estimate close to 1000. For the specific Bayesian setup that (M)GPS uses, observed counts in excess of 10 result in RR estimates that typically receive essentially no shrinkage, although in practice larger differentials have been observed depending on the thresholds used [20,22,23].

All Bayesian statistical analyses begin with a prior distribution for all the unknowns. In the case of (M)GPS, the true RRs are the ‘unknowns’. A standard Bayesian analysis specifies the prior distribution before looking at the data. Via Bayes’ theorem, the data then transform the prior distribution into a posterior distribution. This posterior distribution in a precise sense combines prior knowledge (which the prior distribution encapsulates) with the evidence from the data. As a matter of convenience, prior distributions usually come from some parametric family, such as the normal distribution or the gamma distribution. Encapsulating prior knowledge then amounts to choosing the parameters of these distributions (e.g., the mean and the variance in the case of a normal distribution). One particular Bayesian approach actually uses the data to choose the parameters of the prior. This is the so-called empirical Bayes approach, and although it appears to ‘double-dip’ in the data, it does enjoy some theoretical support. GPS and MGPS adopt the empirical Bayes approach. Madigan showed that a standard Bayesian analysis (i.e., not the empirical approach) yields similar estimates to (M)GPS and proved somewhat more satisfactory in one small example [24].

Regardless of how the Bayesian analysis handles the prior, the approach produces a posterior distribution for each RR. EBGM is the geometric mean of the posterior distribution. Other summaries are possible. For example, DuMouchel mentions ‘EB05’ [25]. This is the fifth percentile of the posterior distribution, meaning that there is a 95% probability that the ‘true’ RR exceeds the EB05. As EB05 is always smaller than EBGM, this, in a sense, adds extra shrinkage and represents a more restrictive choice than EBGM. A number of studies have provided examples where EB05 might be too conservative, in the sense that it could result in delayed detection of relevant signals that other disproportionality methods detected much earlier [22,23,26].

The authors note that Bayesian approaches are not immune to concerns about non-random sampling. Furthermore, whereas BCPNN, GPS and its later variant MGPS provide elegant solutions to an important practical problem, other Bayesian and non-Bayesian shrinkage approaches are possible and will, in general, lead to different RR estimates. For instance, one could craft a Bayesian model that leads to the following shrinkage scheme: if the observed count (i.e., the cell a) is equal to 1, divide the RR by 10; if the observed count is between 2 and 4, divide the RR by 5; if the observed count is ≥ 5, return the observed RR. The operational characteristics of various possible shrinkage schemes are essentially unknown.
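
The enumeration-and-ranking task described in Section 2.1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any of the published algorithms: each report is assumed to be a pair of sets (drug names, AE terms), mirroring the row layout of Table 1; the cells a, b, c and d of Table 2 are derived from report counts for every drug–AE pair that co-occurs at least once; and pairs are ranked by the crude RR (observed/expected under independence). The report list and names such as `drugA` are hypothetical.

```python
from collections import Counter
from itertools import product


def contingency_tables(reports):
    """Return {(drug, ae): (a, b, c, d)} for every drug-AE pair reported together.

    `reports` is a list of (drugs, aes) pairs, each a set, one pair per report.
    """
    n = len(reports)
    drug_counts = Counter()  # reports listing the drug: a + b
    ae_counts = Counter()    # reports listing the AE: a + c
    pair_counts = Counter()  # reports listing both: a
    for drugs, aes in reports:
        drug_counts.update(drugs)
        ae_counts.update(aes)
        pair_counts.update(product(drugs, aes))
    tables = {}
    for (drug, ae), a in pair_counts.items():
        b = drug_counts[drug] - a
        c = ae_counts[ae] - a
        d = n - a - b - c
        tables[(drug, ae)] = (a, b, c, d)
    return tables


def relative_reporting_ratio(a, b, c, d):
    """Observed count a divided by the count expected under independence."""
    expected = (a + b) * (a + c) / (a + b + c + d)
    return a / expected


def rank_by_rr(reports):
    """Rank co-reported drug-AE pairs by crude RR, highest first."""
    tables = contingency_tables(reports)
    scored = [(relative_reporting_ratio(*cells), pair) for pair, cells in tables.items()]
    return sorted(scored, reverse=True)


# Hypothetical toy database of four reports.
reports = [
    ({"drugA"}, {"rash"}),
    ({"drugA", "drugB"}, {"rash", "nausea"}),
    ({"drugB"}, {"nausea"}),
    ({"drugC"}, {"headache"}),
]
for rr, (drug, ae) in rank_by_rr(reports):
    print(f"{drug:6s} {ae:9s} RR = {rr:.2f}")
```

In this toy data the pair (drugC, headache), supported by a single report, ranks first with an RR of 4, a small-scale version of the instability that Table 7 illustrates. Applied to the cells of Table 2, `relative_reporting_ratio(20, 100, 100, 980)` returns 20/12.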
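
The formulae of Table 3 translate directly into code. The following Python sketch (the function name is hypothetical) computes the crude RR, PRR, ROR and IC for a 2 x 2 table and applies them to Tables 4, 5 and 7; rounded to one decimal place it reproduces the values in Table 6, and for Table 7 it gives an RR of about 1063. It performs no zero-cell handling, so it assumes that b, c and d are non-zero.

```python
import math


def association_measures(a, b, c, d):
    """Crude measures of association of Table 3 for a 2 x 2 table (cells a, b, c, d)."""
    n = a + b + c + d
    rr = a * n / ((a + c) * (a + b))        # relative (reporting) ratio
    prr = (a / (a + b)) / (c / (c + d))     # proportional reporting ratio
    ror = (a / c) / (b / d)                 # reporting odds ratio
    ic = math.log2(rr)                      # crude, frequentist information component
    return {"RR": rr, "PRR": prr, "ROR": ror, "IC": ic}


table4 = association_measures(1, 100, 5, 1080)   # drug D1
table5 = association_measures(2, 100, 5, 1080)   # drug D2: one extra report in cell a
table7 = association_measures(1, 7, 3, 34000)    # uncommon drug, rare AE

for name in ("PRR", "ROR", "IC", "RR"):
    print(f"{name:4s} D1 = {table4[name]:.1f}  D2 = {table5[name]:.1f}")
print(f"Table 7 RR = {table7['RR']:.0f}")
```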
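
To illustrate the standard error approach, the sketch below uses the usual large-sample approximations on the log scale, $\mathrm{SE}[\ln(\mathrm{ROR})] = \sqrt{1/a + 1/b + 1/c + 1/d}$ and $\mathrm{SE}[\ln(\mathrm{PRR})] = \sqrt{1/a - 1/(a+b) + 1/c - 1/(c+d)}$. These formulae are not given in the text and are exactly the kind of asymptotic result that may not hold when counts are small; the 1.96 multiplier for an approximate 95% interval is likewise an assumption of the sketch, and the function names are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for an approximate 95% interval (assumption)


def log_scale_interval(estimate, se_log, z=Z_95):
    """Interval computed on the log scale and transformed back."""
    centre = math.log(estimate)
    return math.exp(centre - z * se_log), math.exp(centre + z * se_log)


def ror_with_interval(a, b, c, d):
    """Reporting odds ratio with Woolf's large-sample SE of ln(ROR)."""
    ror = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return ror, se_log, log_scale_interval(ror, se_log)


def prr_with_interval(a, b, c, d):
    """Proportional reporting ratio with the usual large-sample SE of ln(PRR)."""
    prr = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return prr, se_log, log_scale_interval(prr, se_log)


for label, cells in [("Table 2", (20, 100, 100, 980)),
                     ("Table 4", (1, 100, 5, 1080)),
                     ("Table 5", (2, 100, 5, 1080))]:
    ror, se, (low, high) = ror_with_interval(*cells)
    print(f"{label}: ROR = {ror:.2f}, SE(ln ROR) = {se:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

For Table 4 the approximate interval for the ROR runs from about 0.25 to about 19 and includes 1, whereas for Table 2 (a = 20) it excludes 1; the single-report cell dominates the standard error.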
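
For the hypothesis-testing approach, a one-sided Fisher’s exact test is short enough to write with the standard library: conditional on the table margins, the ‘a’ cell follows a hypergeometric distribution, and the p-value is the probability of observing a or more co-reports. This is a sketch for illustration only; the one-sided alternative and the function name are assumptions, and routine work would normally rely on a statistics library.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value, P(A >= a), with the table margins held fixed.

    Under the null, the 'a' cell is hypergeometric: `drug_total` reports are
    drawn from n reports, of which `event_total` list the AE.
    """
    n = a + b + c + d
    drug_total = a + b
    event_total = a + c
    denominator = comb(n, drug_total)
    upper = min(drug_total, event_total)
    tail = sum(comb(event_total, k) * comb(n - event_total, drug_total - k)
               for k in range(a, upper + 1))
    return tail / denominator


print(f"Table 4: p = {fisher_one_sided(1, 100, 5, 1080):.3f}")
print(f"Table 7: p = {fisher_one_sided(1, 7, 3, 34000):.5f}")
```

The single report in Table 7 yields p ≈ 0.0009, whereas Table 4 yields p ≈ 0.41. The test, like the standard error, rests on a sampling model and can flag a combination that is supported by one report.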
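
The shrinkage idea can be sketched with the simplest conjugate setup. The following assumes a single gamma prior with shape α and rate β on the true RR λ, and a Poisson count a ~ Poisson(λE), where E is the expected count under independence. The posterior is then Gamma(α + a, β + E), and its geometric mean, $\exp[\psi(\alpha + a) - \ln(\beta + E)]$, plays the role of an EBGM. This is a simplification: GPS fits a two-component gamma mixture prior to the database by empirical Bayes, whereas here α = β = 1 are hypothetical values chosen by hand. The hypothetical count-based scheme from the text is included for comparison.

```python
import math


def digamma(x):
    """Digamma function for x > 0: recurrence up to x >= 6, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result


def ebgm_single_gamma(a, expected, alpha=1.0, beta=1.0):
    """Geometric mean of the Gamma(alpha + a, beta + expected) posterior for the true RR.

    alpha (shape) and beta (rate) are hypothetical prior values; GPS instead fits a
    two-component gamma mixture prior to the whole database by empirical Bayes.
    """
    shape = alpha + a
    rate = beta + expected
    return math.exp(digamma(shape) - math.log(rate))


def ad_hoc_shrink(rr, a):
    """The hypothetical count-based shrinkage scheme described in the text."""
    if a == 1:
        return rr / 10
    if 2 <= a <= 4:
        return rr / 5
    return rr


expected7 = (1 + 3) * (1 + 7) / 34011      # expected count for Table 7
crude7 = 1 / expected7
print(f"Table 7: crude RR = {crude7:.0f}, "
      f"single-gamma EBGM = {ebgm_single_gamma(1, expected7):.2f}, "
      f"ad hoc = {ad_hoc_shrink(crude7, 1):.0f}")
print(f"a = 100, E = 0.1: EBGM = {ebgm_single_gamma(100, 0.1):.1f}")
```

With these prior values, the single-report combination of Table 7 (crude RR ≈ 1063) is shrunk to about 1.5, and the ad hoc scheme gives about 106. The same prior, however, also pulls a = 100, E = 0.1 (crude RR 1000) down to about 91; letting large counts dominate, as the text describes for (M)GPS, requires a more diffuse prior component (for example, a smaller rate β), which is one thing an empirically fitted mixture prior can provide.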

Expert Opin. Drug Saf. (2005) 4(5) 933


The role of data mining in pharmacovigilance

In summary, all of the approaches mentioned, thus far, the drug). Maximum likelihood provides one way to estimate
consider low-dimensional contingency tables that aggregate the three regression coefficients and, in this case, yields
over very high-dimensional data. The authors contend that β1 = 4.4 and β2 = 0. β1 represents the expected change in the
the specific measure of association used and the specific statis- log-odds of nausea as Rosinex goes from 0 to 1, holding Gan-
tical method to deal with sampling variability is of secondary clex constant. Similarly, β2 represents the expected change in
importance in the face of some of the issues described below. the log-odds of nausea as Ganclex goes from 0 to 1, holding
Rosinex constant. In the latter case, this adjusting for Rosinex
2.2 The problem with contingency tables is key; indeed, within the Rosinex taking group, Ganclex is
Thus far, the authors have focused on measurement of associa- not associated with nausea and within the non-Rosinex taking
tions between individual drugs and AEs. Disproportionality group, Ganclex is not associated with nausea. Thus, logistic
analysis can also be used to examine higher-order associations regression provides a satisfactory answer here showing a posi-
such as between two drugs and one AE, two drugs and two AEs tive and statistically significant (p < 10-8) effect of Rosinex on
and so on [27-29]. Contemporary DMAs can also incorporate a the reporting structure and no effect at all due to Ganclex.
limited set of demographic variables via stratification, or when As mentioned above, SRS databases, such as FDA-AERS,
logistic regression analysis is applied, RORs can be calculated can include > 15,000 different drug names (including many
that are corrected for these demographic variables. In any event, redundant drug names). Thus, a regression of a particular
the general approach measures associations in low-dimensional AE on all of the drugs, involves simultaneous estimation of
projections of high-dimensional data. Some well-known diffi- 15,000 regression coefficients. This represented an insur-
culties with this general approach exist and provide an mountable barrier until relatively recently. However,
important focus for the epidemiology literature in general. approaches based on Bayesian shrinkage have proven suc-
The authors illustrate one basic difficulty with a simple cessful in applications to gene expression data and text cate-
example. Consider a fictitious drug, Rosinex, that causes nau- gorisation where several published applications estimated in
sea. Suppose that 90% of the individuals taking Rosinex experi- excess of 100,000 parameters. In their own work, the
ence nausea, whereas 10% of the individuals not taking Rosinex authors have applied the ‘Bayesian Binary Regression’
experience nausea. Further, suppose that Rosinex makes one software to AERS and VAERS analyses (the software is
susceptible to eye infections. Consequently, due to standard available at [104]).
practice guidelines, 90% of the Rosinex users also take a pro- The regression approach to data mining in SRS databases,
phylactic antibiotic called Ganclex, whereas ∼ 1% of the does not, however, provide a totally satisfactory solution. Key
non-Rosinex users take Ganclex. Ganclex does not cause nau- limitations include the following:
sea. Figure 1 shows a causal model that describes the situation.
• The regression approach just described, builds a separate
Table 8 shows data that are consistent with this description.
regression model for each AE presenting a significant mul-
Considering only Ganclex and Nausea, the observed count is
tiple comparison challenge and ignoring dependencies
82 as compared with an expected value of ∼ 18, leading to an
between AEs;
RR of > 4. The EBGM score would be similar. Therefore, even
• Regression models adjust merely for measured and
though Ganclex has no causal relationship with nausea, the data
recorded factors, such as drugs and demographic covariates,
mining approach based on 2 x 2 tables would generate a Gan-
but fail to take account of unmeasured or unrecorded fac-
clex–Nausea SDR. Fram et al. refer to Ganclex as an ‘innocent
tors, such as health status, or unreported drugs. This issue
bystander’ [19]. The statistical literature sometimes refers to a
is discussed below;
related phenomenon as ‘Simpson’s paradox.’
• A model-based approach requires modelling assumptions;
This is a simple example of a more general phenomenon.
the model above assumes linearity which may or may not
In general, particular patterns of association between observed
be appropriate.
and unobserved variables can lead to essentially arbitrary
measures of association involving the observed variables. Nonetheless, for routine data mining in SRS databases, the
These measures can contradict the true unknown underlying authors contend that regression approaches have distinct
causal model that generated the data. advantages over algorithms that analyse low-dimensional con-
tingency tables, although all are credible additions to the
2.3 Multiple regression pharmacovigilance toolkit.
A multiple regression modelling approach can deal with some
of these concerns. In the Rosinex example above, a logistic 2.4 Confounding, causality and propensity scoring
regression model could be considered: The discovery of a drug-induced disorder is a dynamic and
often lengthy process that begins with signal detection.
Log [Pr(Nausea)/Pr(Not Nausea)]
Although SRS data can never be used to definitively establish
= β0 + β1 x Rosinex + β2 x Ganclex
cause-and-effect relationships, it is often used for adjudicating
Here, ‘Rosinex’ and ‘Ganclex’ are binary predictor variables, associations with sufficient certainty to make decisions in
taking values zero (did not take the drug) and one (did take naturalistic pharmacovigilance settings. In any case, making

934 Expert Opin. Drug Saf. (2005) 4(5)


Hauben, Madigan, Gerrits, Walsh & Van Puijenbroek

Table 8. 2 x 2 x 2 contingency table from an spontaneous reporting system database that is consistent with these
probabilities and with the causal model.

Nausea No nausea Total


Rosinex Ganclex 81 9 90
Rosinex No Ganclex 9 1 10
No Rosinex Ganclex 1 9 10
No Rosinex No Ganclex 90 810 900

causal inferences is a fundamental downstream goal of SRS data mining. Ultimately, one wants to know whether taking a drug (or a combination of drugs) increases the chance of experiencing some AE (or set of AEs).

'Confounding' is a causal concept closely related to Simpson's paradox and its generalisations. When computing the Ganclex–nausea RR, the authors implicitly compared the individuals who took Ganclex with those who did not, and tried to use this comparison to make inferences about the causal effect of Ganclex on nausea. However, the true individual-level causal effect would compare the nausea status of individuals who took Ganclex with that of the same individuals had they not taken Ganclex. Population-level causal effects then average over the individual causal effects. This 'potential outcomes' view of causality dates back at least to Neyman [30] (for a recent review see [31]). Because the authors do not have access to individuals who both took and did not take Ganclex (and even if they did, they would have done so at different times and in different circumstances), the individuals who did not take Ganclex are used as surrogates for the Ganclex users had they not taken Ganclex. This is the root of the phenomenon described above: the individuals who did not take Ganclex differ from those who did in ways that matter. In particular, Ganclex users are much more likely to have taken Rosinex than non-Ganclex users and, as has been seen, Rosinex causes nausea. This phenomenon is called 'confounding': Rosinex confounds the relationship between Ganclex and nausea.

Figure 1. Graphical causal model. Rosinex causes nausea and also causes individuals to take Ganclex. Taking Ganclex has no effect on the probability of experiencing nausea.

In the case of Rosinex, a regression model provides one way to deal with the confounding issue. In the case of an unmeasured/unrecorded confounder, such as health status, however, the situation is much less satisfactory. In general, absent assumptions about potential unmeasured confounders, no association can reliably yield a causal interpretation. To quote Cartwright, 'no causes in, no causes out' [32]. However, the authors again emphasise that SRS data by themselves often cannot provide more than an index of suspicion of novel safety phenomena. As a rule, a further process needs to be initiated using additional data sets that are more reliable from the perspective of causal inference.

The randomised clinical trial (RCT) represents the exception to the above rule and plays a central role in the drug licensing process. The key feature of an RCT is that an objective randomised mechanism controls the probability that a subject receives any particular treatment. For example, one could imagine an RCT where a coin flip decides whether a subject receives Ganclex or a placebo. Because of the coin flip, the fraction of Rosinex users would be expected to be similar in the Ganclex and placebo groups, especially with large sample sizes. This contrasts sharply with the above-mentioned example, where the Ganclex group contained a preponderance of Rosinex users. In fact, the beauty of the coin-flip mechanism is that all potential confounders are expected to be roughly equally balanced between the Ganclex and placebo groups, whether they were measured/recorded or not. SRS databases are different from, and complementary to, randomised clinical trials, and are exposed to confounding by both observed and unobserved factors. 'Propensity scoring' is one framework for formulating causal hypotheses in the face of confounding and examining their consequences. Although propensity scoring has attracted considerable attention in the medical and social sciences [33], the authors are unaware of any prior applications in the SRS context and believe this represents a potentially important future research direction. Here, the authors sketch the basic ideas in the context of drug safety. The propensity-scoring framework comprises two components: one dealing with observed confounders and the other dealing with unobserved confounders.

For the first component, the authors begin by assuming there are no unobserved confounders. Suppose, for example, one wants to estimate the potential causal effect of Rosinex on nausea. Because (by assumption) there are no unmeasured confounders, the Rosinex and non-Rosinex groups differ systematically only in ways that have been measured. The idea is to construct a
model that predicts whether someone will take Rosinex or not, using all the recorded variables as predictors. The 'propensity score' for an individual is the predicted probability that that individual takes Rosinex. Now, if two individuals have identical propensity scores, one a Rosinex taker and the other a non-taker, they can be compared as if they had been assigned according to an RCT. Many other details are being ignored here, concerning predictive model construction, optimal matching and multiple testing, but the essential idea is to estimate a causal effect under the (unrealistic) assumption of no unobserved confounders.

The second component of the propensity-scoring framework assesses the sensitivity of the causal effect estimated by the first component. The idea is to assess how strong an effect any unmeasured confounder would need to have in order to explain the causal effect. Cornfield et al.'s classic work on the smoking–lung cancer connection exemplifies this type of analysis [34]. Specifically, they write:

'If an agent A with no causal effect upon the risk of a disease, nevertheless because of a positive correlation with some other causal agent B shows an apparent risk, r, for those exposed to A relative to those not so exposed, then the prevalence of B among those exposed to A relative to the prevalence of those not so exposed, must be greater than r' [34].

Thus, for some mystery hormone X to explain the nine-fold increase in lung cancer amongst smokers compared to non-smokers, the proportion of hormone X producers among smokers must be at least nine times greater than that among non-smokers. More recent work extends this line of thinking to binary outcomes and also, crucially, deals with sampling variability.

The authors believe that the propensity-scoring approach presents exciting possibilities for SRS analysis. Causal effects estimated under the no-unobserved-confounder assumption would represent the primary signalling mechanism, with a sensitivity analysis providing second-order uncertainty information. However, its usefulness is contingent upon the completeness and consistency of reporting of all the relevant variables which, as discussed, is often an issue.

2.5 Experience with contingency tables
Although the fundamental limitations involved in projecting high-dimensional data onto 2 x 2 contingency tables should be an inspiration for research into new approaches, the authors note that the cumulative experience with disproportionality analysis shows it to be a promising adjunct for safety reviewers confronted with data sets that are difficult to monitor by virtue of their size. This experience has accumulated across a variety of organisational settings (health authorities, drug monitoring centres, academia and pharmaceutical companies; see Table 9) and scientific settings (e.g., general and specialised pharmacovigilance settings). Owing to space limitations, the authors' goal is not to exhaustively list or critically analyse each organisation's experience, but merely to present an informational synopsis of the major published findings in the form of a table. Therefore, the authors emphasise that they do not necessarily agree with the scientific claims made in the cited literature, nor do they necessarily believe that all assertions made are supported by the data. The authors encourage readers to obtain the index articles for critical study using the principles and concepts discussed in this article (Table 9).

3. Mining spontaneous reporting system data: practice

3.1 Fundamental considerations
Understanding the theoretical basis of commonly used DMAs facilitates thoughtful consideration of crucial operational questions, such as: should data mining be added to routine pharmacovigilance operations? Is there a preferred DMA? What is the optimum positioning of DMAs within a comprehensive pharmacovigilance system that utilises multiple approaches and data sets for signal detection?

A sensible starting point is to clearly delineate the improvements that the authors hope to obtain with DMAs. It is important to be clear about this because it is not unusual to hear vague statements that DMAs 'are useful'. If DMAs have value, it is because they achieve one or more of the following pharmacovigilance process improvements: 1) detection of AEs that would otherwise have gone undetected (this is especially pertinent to higher-order associations, because they are difficult for the human mind to capture; examples are drug–drug interactions, drug–food associations, multiple risk factors for developing an adverse drug reaction, etc.); 2) earlier identification of AEs that would have been detected with standard approaches; 3) detection of the same AEs at the same time, but with greater scientific efficiency (i.e., decreased person-time expended per AE detected); 4) provision of a safety net against human cognitive lapses.

The following questions provide a framework for further assessing the potential need for, or added value of, DMAs for individual organisations:

• What is the size and scope of existing surveillance activities (numbers of drugs, events, reports) relative to human resources?
• How rigorous and comprehensive is the current suite of signal detection tools and strategies?
• Are there significant numbers of medically important DECs that do not meet traditional signalling thresholds for cumulative review despite continuing accumulation of reports?
• Is there a significant history of delayed recognition of AEs? For example, has a pharmaceutical company been repeatedly alerted to medically important associations involving its drugs by health authorities rather than by its own internal pharmacovigilance systems (or vice versa)? The authors emphasise 'repeatedly' because all parties are working towards the same goal, and it would not be unusual for this to happen occasionally.
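Returning to the Rosinex/Ganclex example of Sections 2.3 and 2.4, the following minimal sketch (an illustration added here, not part of the authors' analysis) fits the logistic regression model of Section 2.3 to the counts in Table 8. It uses Newton-Raphson on grouped binomial counts, once with Rosinex as a covariate and once without it, and then checks Cornfield's condition with Ganclex as agent A and Rosinex as agent B. The helper names `solve` and `fit_grouped_logistic` are hypothetical; in practice a statistical package's logistic regression routine would normally be used instead.

```python
import math

# Table 8 grouped by exposure pattern.
# Each row is (intercept, Rosinex, Ganclex); counts are nausea reports and all reports.
ROWS = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (1, 0, 0)]
NAUSEA = [81, 9, 1, 90]
TOTALS = [90, 10, 10, 900]


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_grouped_logistic(rows, events, trials, n_iter=25):
    """Maximum-likelihood logistic regression on grouped binomial counts (Newton-Raphson)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, y, n in zip(rows, events, trials):
            eta = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = n * p * (1.0 - p)  # binomial variance, i.e. the Fisher weight
            for i in range(k):
                score[i] += x[i] * (y - n * p)
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Adjusted model (Section 2.3): log-odds of nausea = b0 + b1*Rosinex + b2*Ganclex.
b0, b1, b2 = fit_grouped_logistic(ROWS, NAUSEA, TOTALS)

# Crude model: Table 8 collapsed over Rosinex (Ganclex users 82/100, non-users 99/910).
g0, g1 = fit_grouped_logistic([(1, 1), (1, 0)], [82, 99], [100, 910])

print(f"adjusted Ganclex odds ratio: {math.exp(b2):.2f}")  # 1.00
print(f"crude Ganclex odds ratio: {math.exp(g1):.2f}")     # 37.32

# Cornfield's condition with A = Ganclex and B = Rosinex.
crude_rr = (82 / 100) / (99 / 910)
prevalence_ratio = (90 / 100) / (10 / 910)
print(f"crude RR {crude_rr:.2f} < Rosinex prevalence ratio {prevalence_ratio:.1f}")
```

With Rosinex in the model, the Ganclex coefficient is essentially zero (odds ratio about 1), which matches the causal model of Figure 1; dropping Rosinex produces a large spurious Ganclex odds ratio. The crude Ganclex–nausea risk ratio (about 7.5) is far below the Rosinex prevalence ratio between Ganclex users and non-users (about 82), so Cornfield's condition is comfortably met: Rosinex is prevalent enough among Ganclex users to explain the apparent effect.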


Table 9. Disproportionality analysis: major findings from the published literature.
Columns: organisation (i.e., primary author's affiliation); algorithms (given for each group of organisations); key findings/conclusions.

Drug monitoring centres (algorithms: BCPNN, ROR, PRR)

WHO Uppsala Drug Monitoring Centre (Sweden)
• Retrospective analysis demonstrated early SDRs for known drug–event associations (e.g., cough–captopril) and avoidance of false positives (e.g., lack of an SDR with digoxin–rash). Positive predictive value (44%) and negative predictive value (85%) were calculated [35].
• Retrospective analysis of reports of hepatic injury with SSRIs showed an SDR for hepatic injury and nefazodone, and no SDR for other SSRIs [36].
• Retrospective analysis showed that SDRs could have been observed for the reported DEC SSRI–neonatal withdrawal syndrome, especially for paroxetine [37].
• In the WHO database, an association between clozapine and cardiomyopathy and myocarditis has been established. Related antipsychotics showed a similar relationship with these ADRs [38].
• Relevant DECs were highlighted by BCPNN (practolol–peritonitis, captopril–cough, terfenadine–heart rate and rhythm disorders, clozapine–myocarditis). BCPNN also demonstrated the ability to highlight possible drug-specific and group effects [39].

Netherlands Pharmacovigilance Centre Lareb (The Netherlands)
• Anaphylactic reactions associated with the use of naproxen, ibuprofen and diclofenac were reported disproportionately compared with other drugs [40].
• The inter-relationship between the reported ADRs urticaria, fever and arthralgia, and terbinafine was examined by logistic regression modelling and pointed towards the existence of an immunological syndrome [29].
• Logistic regression analysis confirmed the influence of concomitant NSAID use on reports of decreased efficacy of diuretics [27].
• A cross-sectional analysis of the data set of the Netherlands Pharmacovigilance Centre showed that different frequentist and Bayesian measures of disproportionality are broadly comparable when four or more cases per combination have been collected; empirical Bayesian measures were not included [41].

Regulatory authorities (algorithms: PRR, MGPS, PROFILE)

Medicines and Healthcare products Regulatory Agency (MHRA, UK)
• Retrospective analysis of 15 newly marketed drugs in the UK showed that 70% of signals referred to known ADRs, 13% were related to the underlying disease and 17% required further follow-up [18].
• A preliminary comparison using commonly cited thresholds showed that standard-PRR generated more SDRs than stratified-MGPS, and also highlighted SDRs earlier. Performance gradients were dependent upon threshold selection [26].

Food and Drug Administration (FDA, US)
• Retrospective analysis showed an early SDR for the labelled AE 'bronchospasm' with rapacuronium bromide prior to its withdrawal for this reason [42].
• Retrospective analysis showed an early SDR for liver enzyme abnormalities with pemoline prior to its withdrawal in the UK for this reason [42].
• Retrospective analysis showed an early SDR for the labelled AE 'rhabdomyolysis' with cerivastatin prior to its withdrawal for this reason [21].
• ROC curves were shown for four EB05 cut-off scores for labelled events originally identified in large clinical trials [21].
• MGPS signals reportedly prompted further analyses that led to a black box warning for life-threatening pancreatitis with valproic acid and valproate [43].
• Four data mining metrics (PRR, screened PRR (sPRR), EBGM and EB05) were applied to the Vaccine Adverse Event Reporting System to examine agreement between metrics and other performance characteristics. Unscreened PRR was not studied in full because of an overabundance of signals with singleton associations. The most highly ranked vaccine–event pairs varied between the metrics. Few known associations were in the top 100 scores of any of the methods. The number of vaccine–event pairs shared by the top 100 rankings of any two methods ranged from 42 to 67. sPRR was generally comparable with EBGM. Each method has strengths and limitations [44].

Therapeutic Goods Administration (TGA, Australia)
• An iterative probability-filtering algorithm ('PROFILE') including Fisher's exact test was applied to the Australian voluntary reporting system database. PROFILE is based on 2 x 2 contingency tables, but the output is the number of reports surviving the filter rather than disproportionality scores. Causality guidelines in Australia's voluntary reporting scheme frequently result in multiple suspect drugs per report, raising the issue of 'innocent bystander' drugs. The crude data comprised almost 2000 drugs and 8000 associations. PROFILE analysis identified ∼17% of the associations as 'noise' due to co-suspected drugs. If a signal was defined as three or more reports surviving the probability filter, then over 80% of the reports could be attributed to 25% of the drugs (i.e., 75% of drugs are 'noise' or 'evolving signals'). PROFILE analyses of seven specific reaction terms are described. Retrospective analysis demonstrated potential to identify quality control problems with medicines [45].

Pharmaceutical industry (algorithms: PRR, MGPS, BCPNN)

Various pharmaceutical companies
• Contemporary Bayesian methods highlight similar DECs, especially for frequently reported events. Three DECs were compared in detail: fluoxetine with headache, akathisia and polyneuritis. For one drug, AEs that were well recognised after six years of use would have been highlighted by empirical Bayesian approaches within the first postapproval year [46].
• In a series of retrospective data mining exercises involving diverse drugs, events and pharmacovigilance scenarios, both standard-PRR and stratified-MGPS showed potential value in highlighting relevant DECs in a timely manner, although sensitivity gradients were observed. Standard-PRR was found to be more sensitive than stratified-MGPS in detecting adverse events of demonstrated interest in pharmacovigilance when commonly cited thresholds were used. This greater sensitivity was demonstrated both for the number of relevant DECs highlighted and for the time-to-signal for DECs highlighted by both methods. In some instances, the time lag between the first disproportional PRR and the first disproportional EB05 was substantial [22,23,47-53].
• For a set of potentially serious/life-threatening DECs that were the subject of black-box warnings, a retrospective analysis did not demonstrate an obvious benefit of DMAs over traditional signalling approaches. Some of these DECs were associated with SDRs after detection by traditional approaches [54].
• Using commonly cited thresholds, stratified-MGPS failed to detect an SDR for antipsychotic-associated pancreatitis that had been detected by traditional clinical approaches [55].
• Using commonly cited thresholds, stratified-MGPS failed to highlight relevant DECs involving thalidomide, reported during the first 18 months of marketing, that were selected for further review by traditional approaches [56].
• The ability of MGPS to identify drug–drug interactions between verapamil and other cardiovascular drugs resulting in impaired cardiac conduction was studied. Empirical Bayesian data mining has potential value in investigating the clinical relevance of potential drug–drug interactions [57].
• Disproportionality analysis was used to investigate the associations between different asthma polypharmacy regimens and the spontaneous reporting of Churg-Strauss syndrome. Different degrees of association were observed. Disproportionality analysis was able to reveal the differential contribution of each class of drugs to the reports of Churg-Strauss syndrome [57,58].
• Preliminary data mining results from spontaneous reports of movement disorders associated with diverse pharmacological/therapeutic drug classes suggest that data mining may identify drugs for further molecular structure–toxicity modelling [49].

Academia/national research institutes (algorithms: PRR, MGPS, BCPNN, ROR)

INSERM (France)
• Using commonly cited thresholds, Monte Carlo simulations were performed comparing PRR, ROR, SPRT, IC and EB05. All methods showed low sensitivity. IC was most sensitive, followed by PRR and ROR, EB05 and SPRT [59,60].
• In what is primarily an exercise in SRS modelling, two Bayesian signal detection methods were compared: the IC and the EB method. SRS modelling was used to simulate realistic data on 60 drugs and 40 effects using qualitative knowledge of pharmacovigilance experts and the literature, as expressed by a fuzzy representation of knowledge and a fuzzy inference system. Simulation parameters were assigned based on characteristics of the French pharmacovigilance database. EB was superior to IC for low report counts/rare adverse events. With larger numbers of reports, both methods were comparable [105].

University of Tokyo (Japan)
• Five data mining methodologies were applied to Japanese spontaneous reports. The DECs highlighted as possible signals varied between methodologies, particularly for DECs with report counts of 1 or 2. Using commonly cited thresholds, unstratified-PRR was more sensitive and less specific than stratified-GPS as measured against BCPNN criteria. Metrics based on the lower bound of the confidence interval of PRR and ROR showed high sensitivity but lower specificity. GPS demonstrated the lowest negative predictive values and highest positive predictive values. Stratified versus unstratified analyses (by reporting year) accounted for very small but statistically significant differences in average scores for the relevant metrics (EB05 1.23 vs. 1.29, EBGM 4.70 vs. 4.49) and small differences in the fraction of combinations highlighted as possible signals (6.9 vs. 7.1%) [61].

University of Utrecht (The Netherlands)
• ROR was studied using a case–non-case analysis with reports from the WHO-UMC database. RORs were calculated using all 284,426 case reports of suspected AEs of drugs with known anti-HERG activity. Cases were defined as reports of cardiac arrest, sudden death, torsades de pointes, ventricular fibrillation and ventricular tachycardia (n = 5591). RORs correlated with various indices related to anti-HERG activity [62].

University of Ballarat (Australia)
• Eight methods/metrics (including both frequentist and Bayesian methods and PROFILE) were applied to one reaction term ('hepatitis cholestatic') in the Australian adverse reaction database. Sensitivity, specificity, predictive values and corrected kappa statistics were calculated to compare statistical methods with each other and with the Australian Adverse Drug Reactions Advisory Committee (ADRAC) bulletin [63].

ADR: Adverse drug reaction; BCPNN: Bayesian confidence propagation neural network; DEC: Drug–event combination; DMA: Data mining algorithm; EB: Empirical Bayes; EBGM: Empirical Bayesian geometric mean; IC: Information component; MGPS: Multi-item gamma-Poisson shrinker; NSAID: Nonsteroidal anti-inflammatory drug; PRR: Proportional reporting ratio; ROC: Receiver operating characteristic; ROR: Reporting odds ratio; SDR: Signal of disproportionate reporting; SPRT: Sequential probability ratio test; SSRI: Selective serotonin re-uptake inhibitor.
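The 'commonly cited thresholds' referred to throughout Table 9 can be made concrete for the PRR. The following minimal Python sketch (an illustration added here, not code used by any organisation listed in Table 9) computes the PRR, the ROR with an approximate 95% lower confidence bound, and a 2 x 2 χ² statistic from the four cells of one drug–event table, then applies the PRR screen recommended in reference [18] (PRR ≥ 2, χ² ≥ 4, N > 2). The cell labels a to d, the log-scale interval for the ROR and the default use of Yates' continuity correction are assumptions of this sketch; published implementations differ in such details, and the counts are hypothetical.

```python
import math


def disproportionality(a, b, c, d, yates=True):
    """PRR, ROR (with approximate 95% lower bound) and chi-square for one drug-event pair.

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with any other drug and the event of interest
    d: reports with any other drug and any other event
    All four cells are assumed to be positive.
    """
    n = a + b + c + d
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    # Log-scale standard error of the ROR gives an approximate 95% lower bound.
    se_log_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ror_lower95 = math.exp(math.log(ror) - 1.96 * se_log_ror)
    # Pearson chi-square for the 2 x 2 table, optionally with Yates' continuity correction.
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return {"prr": prr, "ror": ror, "ror_lower95": ror_lower95, "chi2": chi2}


def meets_prr_criteria(a, b, c, d, min_prr=2.0, min_chi2=4.0, min_reports=3):
    """PRR >= 2, chi-square >= 4 and more than two reports; all thresholds adjustable."""
    stats = disproportionality(a, b, c, d)
    return a >= min_reports and stats["prr"] >= min_prr and stats["chi2"] >= min_chi2


# Hypothetical counts for illustration only.
stats = disproportionality(20, 380, 200, 19400)
print({name: round(value, 2) for name, value in stats.items()})
print(meets_prr_criteria(20, 380, 200, 19400))  # True
print(meets_prr_criteria(2, 38, 200, 19400))    # False
```

The thresholds are exposed as parameters rather than hard-coded because, as discussed later in this section, they are unvalidated, ad hoc and adjustable: changing `min_prr`, `min_chi2` or `min_reports` can change which drug–event combinations are highlighted.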

In short, if an organisation receives extremely large numbers of reports involving numerous drugs, and there are significant subsets of the database that do not meet current signalling criteria, then it would be prudent to strengthen existing signalling criteria and/or use supplementary tools, such as DMAs.

Organisations planning to use DMAs are presented with a daunting space of available choices (Table 10). This presents a challenge to researchers developing and testing these tools, as well as to students of the published data mining literature. Discussing each possible choice or configuration would be beyond the scope of this article, so the authors focus on some basic aspects.

Table 10. Modifiable parameters and configurations for data mining investigations.

Algorithm: Proportional reporting ratios; Reporting odds ratios; Bayesian confidence propagation neural network; Multi-item gamma-Poisson shrinker; Sequential probability ratio tests
Type of database: Public, proprietary (company)
Report source: Spontaneous versus spontaneous plus clinical trials; inclusion/exclusion by report source (e.g., consumer reports)
Size of database: ~35,000 to 3,000,000 reports; global versus subset of database
Dictionary: MedDRA, COSTART, WHO-ART, local, company developed
Dictionary hierarchy used: LLT, PT, HLT, HLGT, SOC
Case definitions: Special search categories (e.g., Standardised MedDRA Queries); ad hoc case definitions; drugs (suspect) versus (suspect versus concomitant); certain subsets of drugs removed (to eliminate masking); grouping by pharmacological/therapeutic class
Study design: Real data versus database simulation; prospective versus retrospective
Methodology: Stratified versus unstratified (currently age, gender, time period); reports versus events; binary classifier versus ranking classifier; cross-sectional analysis; time trend analysis; deployment in series versus in parallel with other signal detection activities
Performance measures: Sensitivity, specificity, positive predictive value, negative predictive value, ROC curves
Threshold selection/threshold metrics: Disproportionality threshold; discrete thresholds (point estimates versus lower CI boundaries); confidence intervals; case count threshold

CI: Confidence interval; COSTART: Coding symbols for a thesaurus of adverse reaction terms; HLGT: High level group term; HLT: High level term; LLT: Lowest level term; MedDRA: Medical dictionary for regulatory activities; PT: Preferred term; ROC: Receiver operating characteristic; SOC: System organ class; WHO-ART: The WHO adverse reaction terminology.
With courtesy of Drug Safety [64].

The question of which, if any, DMA outperforms the others has been vigorously debated, yet may be overly broad. The authors have already touched on issues of comparative performance from a purely statistical perspective. To recap, given the rate-limiting data distortions and corruption in SRS data, the limitations of projecting higher-dimensional data into 2 x 2 contingency tables, the arbitrary and adjustable nature of commonly cited thresholds, and the ability to create ad hoc shrinkage rules with any form of disproportionality analysis, performance differentials between the commonly used DMAs in naturalistic pharmacovigilance settings are likely to be of questionable significance. Van Puijenbroek and co-workers demonstrated concordance between Bayesian and non-Bayesian methodologies when there are ≥ 4 cases for a particular DEC, although empirical Bayesian metrics were not included in their analysis [41].

Although theoretical statistical considerations and published performance evaluations of DMAs (which usually study the DMA in isolation) provide informative guidance for
adjudicating utility, it is important to remember that DMAs are used to assist the prepared mind [15]. The use of DMAs in naturalistic pharmacovigilance settings involves cognitive processes and interactions that defy explicit characterisation. Consequently, the authors must turn to extra-statistical considerations to complete their assessment of the practical significance, as opposed to the statistical significance, of reported performance gradients between DMAs.

These parallel 'clinical' considerations lead to similar conclusions. In general, the comparative performance of these methods, used as binary classifiers with commonly cited thresholds, can be summarised as follows [17,18]:

• In general, frequentist forms of DMAs (e.g., PRRs, RORs) seem to highlight a greater number and variety of DEAs than Bayesian DMAs (e.g., BCPNN, (M)GPS) [22,23,26];
• For DEAs that are highlighted by both frequentist and Bayesian methodologies, frequentist DMAs tend to do so earlier [22,23,26];
• Performance gradients in sensitivity and specificity may show a progression from frequentist to Bayesian to empirical Bayesian approaches;
• Some of the additional DEAs obtained with frequentist DMAs are due to confounding, reporting artefacts or statistical noise (especially at low reporting frequency), and require additional triage criteria for practical implementation;
• Some of the additional DEAs represent coding variants encountered with hypergranular dictionaries (e.g., MedDRA).

Characteristics of different forms of disproportionality analysis are shown in Table 11.

Table 11. Characteristics of commonly used data mining algorithms based on disproportionality analysis.

Algorithms
  'Simple'/frequentist: Proportional reporting ratios; Reporting odds ratios
  Bayesian: Bayesian Confidence Propagation Neural Network; (Multi-item) gamma-Poisson shrinker
Some users
  'Simple'/frequentist: Health authorities outside the US; Pharmaceutical companies; Drug Safety Research Unit
  Bayesian: US FDA; WHO Drug Monitoring Centre; Pharmaceutical companies
Advantages
  'Simple'/frequentist: More sensitive*; Clear, easy to use and to understand; Identifies virtually all DEAs identified by Bayesian methods; Natural metric for logistic regression analyses
  Bayesian: More specific*; Numerous data mining settings and configurations maximise exploratory capacity; Configured to perform higher-order analysis (e.g., drug–drug interactions, complex medical syndromes)
Disadvantages
  'Simple'/frequentist: Lower specificity, leading to an overabundance of SDRs that may require additional triage criteria for practical implementation
  Bayesian: Lower sensitivity; Numerous data mining settings and configurations raise issues of confirmation bias and multiple comparisons

* When commonly cited thresholds are used as binary classifiers.
DEA: Drug–event association; SDR: Signal of disproportionate reporting.

Practical reality dictates that the search for truth in pharmacovigilance requires judicious limits on the number of associations that are investigated. Excessive time and effort expended on associations of no significance will be adverse to public safety, as focus is diverted from more significant
hazards, and should therefore be avoided. In response to an overabundance of SDRs generated by DMAs, some health authorities (MHRA) and drug monitoring centres (WHO) apply additional triage procedures, based on public health principles, to filter the associations initially highlighted by data mining. The objective of the triage is to limit the number of potential signals to a more manageable level by pinpointing the associations that seem most significant from a public health perspective. For example, the MHRA has used a triage logic known as ‘SNIP’ (strength of the association, new event, clinically important, and preventable), together with an impact analysis, to help select a subset of disproportionately represented associations warranting further review [18,65]. To filter initial data mining output, the WHO triage algorithms select disproportionately represented associations that are also characterised by such factors as a rapid increase in reporting, serious AEs with new drugs, reactions of special interest and positive rechallenge [18,65]. Triage criteria are not governed by specific regulations and may be decided by each institution.

Although both frequentist and Bayesian methodologies are associated with false-positive findings, as mentioned above, Bayesian DMAs in general highlight fewer SDRs than frequentist methodologies. This is not surprising because, as discussed earlier, Bayesian methodologies use a mathematical process that factors in the overall reporting experience across drugs and events, in part to achieve a statistical shrinkage of SDRs associated with low reporting frequency. The extent to which ‘true’ causal DEAs are ‘shrunk’ along with noise is, however, unknown, although such shrinkage is likely to be occurring, because the mathematical model homogenises individual reports and contains no explicit clinical criteria.

However, by integrating clinical judgment and prior knowledge of drugs, events, patient populations and diseases, safety assessors can often rapidly filter out associations reflecting the influence of patient-specific factors, treatment indication, and/or co-medications and co-morbid illnesses. This clinical shrinkage may allow the safety reviewer using frequentist forms of DMAs to cast a wider net of SDRs, rapidly discarding uninteresting DEAs while singling out DEAs of interest for further evaluation.

The overabundance of SDRs highlighted with frequentist DMAs may not be prohibitive. The opportunity cost due to lower specificity may be acceptable, and even desirable, if there is a resultant gain in sensitivity, although this is presumably highly situation-dependent.

There is currently no theoretical basis or firm empirical support establishing universal thresholds that define a potential signal, although some have been recommended (e.g., PRR: PRR ≥ 2, χ² ≥ 4 and N > 2 [18]; MGPS: EB05 ≥ 2, N > 0 [21]; BCPNN: IC-2SD > 0 [35], with or without a time-trend analysis). The thresholds used for highlighting SDRs are unvalidated, ad hoc and adjustable. The deployment of DMAs is therefore inherently subjective and can be optimised by judicious selection of data mining ‘parameters’, such as statistical thresholds and the background for comparison (see Section 3.2 for a discussion of backgrounds). Consequently, mathematically more complex and sophisticated Bayesian DMAs should not automatically be assumed to yield superior outcomes in all situations. Finally, the quality and usefulness of the results are strongly influenced by the knowledge and experience of the data miner.

Given all the aforementioned considerations, no single method seems superior. Advocacy of a given DMA could partly relate to intellectual or commercial conflicts of interest, but can also reflect practical challenges imposed by the unique structure and function of a given pharmacovigilance organisation. All such organisations have the same overall
objective, although each presents a unique combination of workload and resource capacity. The choice of a DMA could reflect this balance (or imbalance) between workload and resource capacity, as well as comfort with varying levels of process automation.

Pharmacovigilance systems are somewhat analogous to queuing systems, in which arrivals (the ‘signals’) place demands on finite (human) resource capacity. Queuing theory quantifies the common-sense notion that adding load to an already overloaded system can disproportionately increase ‘waiting time’, resulting in decompensation of the system’s efficiency. An organisation massively overloaded with data relative to its resource capacity may put a higher premium on specificity than on sensitivity in data mining. Organisations with a better balance between data and resource capacity might appropriately put a higher premium on sensitivity when choosing a DMA, metric and/or threshold and the manner in which it is applied.

In the former case, reducing the number of signals (‘unloading the system’) would translate into choosing more specific and less sensitive algorithms, metrics, thresholds and configurations (e.g., using the DMA in series with the prepared mind). In the latter case, however, more sensitive and less specific algorithms, metrics, thresholds and configurations may be an appropriate choice (i.e., parallel deployment, in which DMAs assist in detecting additional signals). Accordingly, an organisation in the former situation might find it most expedient to let the computerised algorithm provide all the initial filters. An organisation in the latter situation may instead choose the parallel use of additional filters based on human assessments.

In short, because pharmacovigilance organisations are not structurally or functionally homogeneous, the formula for achieving maximum scientific efficiency may differ across organisations.

However, DMAs may also be useful in the common scenario in which a signal is initially detected without using these tools. Clinical epidemiological principles of screening can help illuminate this process. Screening tests are most informative when the pretest probability of the disease under surveillance is neither very high nor very low. In pharmacovigilance, the counterpart of a very high or very low pretest probability is a detailed case review, prompted by a potential signal, that makes a compelling clinical argument in one of two directions. Either the association is likely to be causal (high pretest probability), or it is obviously due to confounding factors, reporting artifacts, etc. (low pretest probability). In such cases, statistical calculations on SRS data are unlikely to substantially illuminate the phenomena under investigation.

In naturalistic pharmacovigilance settings, however, the clinical data and arguments are often highly ambiguous from the perspective of causality assessment. In these situations, safety reviewers seek multiple convergent lines of data and evidence to formulate an assessment, and statistical calculations on SRS data can be one piece of the puzzle. David Finney pioneered numerical approaches to SRS data and said: ‘the essence is to collect facts that individually tell little, but collectively form a clue to drug dangers.’ [66].

Higher-order phenomena, such as complex drug–drug interactions and drug-induced syndromes, require making cognitive links between multiple drugs and/or events. They may therefore be inherently less amenable to detection by manual review of lists and, as stated above, represent an attractive opportunity for DMAs. A drug–drug interaction may involve both pharmacokinetic and pharmacodynamic mechanisms, each of which may increase or decrease the effect of one or both of the interacting drugs. The mechanism of action is important for understanding the nature of the interaction, but it is of less significance for detecting ADRs in a spontaneous reporting system. The net effect of the possible interaction, however, is essential, because it directs the focus of the safety reviewer to the SDRs most likely in need of further investigation.

One example is the possible interaction between oral contraceptives and itraconazole, in which delayed withdrawal bleeding is taken to be indicative of a possible interaction [28]. Another is the influence of concomitant use of diuretics and NSAIDs on symptoms indicating decreased efficacy of the diuretics [27].

In addition, SRS data sets can be used in the detection and analysis of possible drug-related syndromes. A syndrome can be seen as a complex of signs and symptoms that together constitute the picture of a disease. Only a small proportion of the ADR clusters present in the database actually represent distinct clinical syndromes. Rather than reflecting a particular clinical syndrome, an apparent clustering of symptoms may occur because the symptoms themselves are related. This is the case with nausea and vomiting, or abdominal pain and diarrhoea. These symptoms are closely interrelated but do not represent a particular clinical entity.

The extent to which the symptoms urticaria, fever and arthralgia were interrelated in an SRS data set was examined by logistic regression modelling. The case series, as well as the results of the statistical analyses, showed a clustering of these symptoms among reports of patients using terbinafine. These findings might point towards a clustering of these symptoms in patients using terbinafine [29].

The regression approach described above provides one way to deal with complex multi-drug effects. Disproportionality analysis may also be used, although it is particularly prone to unreliable results in this setting.

The use of DMAs may provide additional ancillary gains. For example, AEs may for several reasons be reported more than once (e.g., directly by the physician and indirectly by the company). Observations of particular interest (e.g., unexpected serious AEs) may be more prone to duplicate reporting. Often, similarities in fields such as age and sex, comedication and administration dates can raise a high level of suspicion of duplication. Automated detection of suspected duplicate reports is often routinely incorporated as part of
‘cleaning’ the data prior to data mining [67], which can improve data quality.

3.2 Mining SRS data: pitfalls

Although results to date are promising, it is important to be mindful of numerous limitations and pitfalls in the use of DMAs and in the interpretation of the published data mining literature. The authors briefly discuss some prominent examples of biases that may creep into the picture.

Advocates of both the frequentist and the Bayesian approaches note that each of the DMAs has been able to identify known signals. The initial peer-reviewed literature reported predominantly positive results, raising the question of publication bias. Subsequently, unpublished and published examples of known safety issues not identified by DMAs have appeared [50,56,59]. The failure of a causal association to generate an SDR can be related to the overall reported safety profile of the drug (‘foreground’). It can also be related to the composition of the background in the database used for comparison [68].

Consider the reported association between cisplatin and bradycardia, which includes multiple positive rechallenges [69]. Despite the presence of up to 41 reports in FDA-AERS through the third quarter of 2003, no SDR was generated by either the more sensitive PRRs or the more specific MGPS. The authors can conjecture several reasonable explanations for this observation, aside from the obvious one (lack of causality). Cisplatin has a very complex safety profile that is dominated by traditional patterns of cytotoxicity. Also, many drugs may cause bradycardia, possibly leading to a high background rate of this event in the database. Therefore, bradycardia may not be associated with an SDR because it is masked by both the foreground and the background.

Sometimes the background is the predominant factor. One can create a more ‘sensitive’ background by excluding heavy contributors to certain AEs of interest. Consider drug X, a drug for which a ‘Dear Healthcare Professional’ letter was issued because of hyperglycaemia. When all the other drugs in the FDA-AERS database are used as the background, the signalling score for the DEA between drug X and hyperglycaemia NOS is far below the threshold. However, when insulin formulations and their reported events are deleted from the background, the signalling score for this DEC rises far above commonly used thresholds. This situation can occur with any DMA. The dilution of the SDR for a relatively weak DEA by strong DEAs is referred to in the literature as cloaking or masking. In-house company safety databases might be especially prone to it because of their relative lack of diversity of events or drugs [18,47,56,70].

The space of available choices in data mining (as displayed in Table 10) maximises exploratory capacity, but also makes these exercises highly susceptible to confirmation bias. Given the freedom to choose so many user-adjustable configurations, the use of DMAs is prone to self-deception bias. Here, a data miner with a strong incentive to believe in a particular outcome, consciously or unconsciously, tries to avoid results that contradict pre-existing expectations by retrofitting an analysis around the data. This can involve sequential trials of nonspecific case definitions of uncertain clinical relevance, different subsets of the database, thresholds and/or other configuration parameters until the ‘desired’ output is achieved. Retrospective and post hoc exercises might be especially susceptible to this form of confirmation bias [71].

Frequentist methodologies in general identify DEAs earlier than Bayesian methodologies [22,23,26]. This finding highlights an important lesson for readers of the published literature on data mining validation: cross-sectional analyses have significant limitations. The ‘developmental anatomy’ of the database needs to be analysed repeatedly over time. Only then can the nature and number of the SDRs produced by each DMA be examined, together with the relative timing of SDRs generated by more than one method.

Finally, the authors would be remiss if they did not mention the considerable impact of dictionary architecture on data mining. The numerous coding redundancies and multiplicities of hypergranular dictionaries such as the Medical Dictionary for Regulatory Activities (MedDRA) can profoundly affect the results of data mining computations. The most obvious example is what has been referred to as ‘signal fragmentation’ [72]. Such effects can also be introduced by switching to another dictionary, or by changes in mapping structure within a given dictionary (e.g., a version upgrade) [50]. It is therefore important to see whether data mining performance can be optimised in one of two ways: by mining particular levels of the dictionary hierarchy, or by using Boolean logic to combine clinically redundant or overlapping event terms. Excellent discussions of this issue can be found in articles by Brown [73-75].

4. Clinical versus computational approaches

The computational approach of DMAs differs fundamentally from the classical case-by-case approach, in which every incoming report is reviewed by a qualified assessor. The place of the additional available clinical information is less well defined when using DMAs, even though this information makes an essential contribution to the signal detection process. In the case-by-case approach, the intrinsic value of the case report itself is a crucial factor. An individual case report does not merely provide information about a certain combination of a drug and an adverse event. It also places this information in a specific context, for instance with regard to the pattern of the clinical events, the course of the reaction, specific time-related information and the experiences of the patient and doctor with related products. These aspects are currently not taken into account by DMAs.

The frame of reference for signal detection in the case-by-case approach is the experience and interpretation of the data by the individual assessor. This differs fundamentally from the statistical approaches, in which the frame of reference is the other DECs in the data set. Another major difference is that in the classical case-by-case approach each case report has its own value and may have a
different contribution in building the evidence for the signal involved. With the current DMAs, all reports contribute equally to the signal. This holds irrespective of each report's level of documentation and of the quality and quantity of additional clinical information it contains. These basic differences make clear that neither approach will ever be able to replace the other.

5. Conclusion

The development, testing and deployment of DMAs represent a quantum leap in pharmacovigilance. There is currently no scientific or regulatory basis to claim that DMAs are a required element of good pharmacovigilance practice. They are nevertheless an intuitively appealing solution to the operational challenges of screening steadily enlarging safety databases. Higher-order phenomena, such as complex drug–drug interactions or drug-induced syndromes, may be especially difficult to identify through manual review of AE lists. It is this type of phenomenon that might be most amenable to detection through data mining.

Retrospective applications indicate that DMAs can highlight some medically significant associations in a timely manner, often in advance of the published literature and traditional signalling strategies. This experience includes both general and more specialised pharmacovigilance settings [56]. DMAs have been incorporated into the routine pharmacovigilance operations of major national and transnational drug monitoring centres. These include the MHRA (PRRs), the Netherlands Pharmacovigilance Centre LAREB (RORs), the WHO drug monitoring centre (BCPNN) and the US FDA (MGPS). Key regulatory guidance documents include discussions about the potential role of data mining within the pharmacovigilance and risk management framework [106].

However, DMAs have three notable weaknesses. They may fail to highlight legitimate associations for various reasons. They carry an unclear opportunity cost associated with false alarms. And they have yet to prospectively detect new drug hazards. The last consideration is especially pertinent in light of questions that have been raised about classifier performance in general. As Hand stated: ‘improvements attributed to the more advanced and recent developments are small, and that aspects of real, practical problems often render such small differences irrelevant, or unreal, so that the gains reported on theoretical grounds, or on empirical comparisons from simulated or even real data sets, do not translate into real advantages in practice. That is, the progress is far less than it appears’ [76].

The authors cannot say with certainty the degree to which these considerations apply to data mining in naturalistic pharmacovigilance settings. It nevertheless seems reasonable to ponder how much global retrospective data mining exercises tell us about good prospective pharmacovigilance practice in such settings. This applies particularly to exercises involving large databases populated with many old drugs and labelled events, especially given that key parameters and uncertainties in the target environment may defy explicit characterisation.

In fairness, however, pharmacovigilance practice has historically depended on semiquantitative and non-computational approaches that have not been formally validated, although there is obviously more prospective experience with these approaches. There are formidable challenges to validating data mining algorithms beyond those already mentioned. One is the choice of appropriate reference AEs (true and false positive and negative signals) for assessing DMA performance in the absence of perfect gold standards for adjudicating causality.

Accordingly, DMAs should be considered one of many potentially performance-enhancing options in the pharmacovigilance toolkit, to be assessed by each institution on an individual basis. They should be considered only as potential supplements to, and not substitutes for, a comprehensive signal detection programme based on multiple approaches and data sets. The authors encourage all stakeholders to participate in testing these methodologies. Data mining research is important, interesting and fun. The authors believe that the greater the number of perspectives and the more vigorous the discourse, the better for patient safety. This presents an important opportunity for multidisciplinary knowledge sharing between regulatory authorities, drug monitoring centres, pharmaceutical companies and academia to improve patient safety.

There is significant potential for misapplication and misuse of DMAs. A great danger is that DMAs, especially those with an extensive mathematical veneer and dazzling visualisation tools, will seduce users into believing that the enormous limitations and defects in SRS data have been neutralised. This would promote overinterpretation of, and overconfidence in, data mining output. As some have noted [77], indications of this have already appeared in the published literature. They have also appeared in the courtroom, where these methods have been described as potentially useful for testing hypotheses [78].

6. Expert opinion

Two rate-limiting factors loom large in mining SRS data. The first is the qualitative and quantitative data distortion and corruption inherent to voluntary reporting schemes. The second is the projection of high-dimensional data onto two-dimensional contingency tables.

One approach to defective data will be the development and application of DMAs to very large, quality-audited pharmacoepidemiological databases. These ordinarily contain anonymous longitudinal medical records, including diagnoses, prescribed drugs, hospital admissions and laboratory results. It is, therefore, tempting to use these databases to identify signs and symptoms, drug–drug interactions and long-latency adverse reactions. Unlike SRS databases, pharmacoepidemiological databases do not contain reporter-defined drug–event/symptom pairs. This might be a challenge. On the other hand, the routine recording of signs and symptoms, regardless of the index of suspicion, provides less biased data. However,
voluntary reporting systems will still be needed, especially for the detection of extremely rare events for which even large longitudinal pharmacoepidemiological databases are inherently too small.

Progress beyond the second rate-limiting step will be facilitated by exploring and developing techniques that incorporate the full dimensionality of the data, such as multivariate regression and propensity scoring.

Just as there is a danger of overinterpreting data mining results, there is also a danger of overattention to data mining research at the expense of other important areas for growth. With the increasing focus on statistical approaches, the authors are once again, and perhaps more than ever, reminded of the profound rate-limiting defects in SRS data. Opportunities for improving data quality are crucial. Similarly, the study and optimisation of clinical cognition in the process of signal detection and evaluation should not be neglected.

From a technological perspective, collaborative knowledge-sharing efforts should strive for better database tools. These efforts should involve the pharmacovigilance, pharmacoepidemiological, statistics, computer science and artificial intelligence communities. The tools should allow the expert safety reviewer to efficiently access extra-statistical scientific knowledge pertinent to the aetiology of ADRs and to integrate it with the statistical calculations. This matters because associations for which cogent post hoc scientific explanations are found may be more likely to be causal in nature [2].
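A small calculation can make two points concrete: the commonly cited PRR criteria (PRR ≥ 2, χ² ≥ 4, N > 2 [18]) and the masking example of Section 3.2, in which removing insulin formulations from the background lifts the score for drug X and hyperglycaemia. The following Python sketch is illustrative only, and several parts of it are assumptions:

- The function names are hypothetical.
- The χ² is a plain Pearson statistic without continuity correction. Implementations differ on this point.
- The report counts for ‘drug X’ and the insulin-heavy background are invented to mimic the scenario described in the text. They are not FDA-AERS data.

```python
"""Sketch: PRR with a Pearson chi-square, the commonly cited PRR screening
rule, and masking by a heavy contributor in the background.
All counts below are invented for illustration."""


def prr_stats(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Return (PRR, Pearson chi-square) for a 2x2 table of report counts.

    a: drug of interest, event of interest
    b: drug of interest, all other events
    c: background drugs, event of interest
    d: background drugs, all other events
    """
    if min(a + b, c + d, a + c, b + d) == 0:
        raise ValueError("every row and column total must be positive")
    prr = (a / (a + b)) / (c / (c + d))
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table (assumption: no Yates correction)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return prr, chi2


def is_sdr(a: int, b: int, c: int, d: int) -> bool:
    """Apply PRR >= 2, chi2 >= 4 and N > 2 (N = a) as a binary classifier."""
    prr, chi2 = prr_stats(a, b, c, d)
    return prr >= 2 and chi2 >= 4 and a > 2


# Hypothetical drug X: 20 hyperglycaemia reports among 1,000 reports.
DRUG_X = (20, 980)
# Hypothetical background of 200,000 reports, of which insulin products
# account for 10,000 reports but 4,000 of the 5,000 hyperglycaemia reports.
FULL_BACKGROUND = (5_000, 195_000)
NON_INSULIN_BACKGROUND = (1_000, 189_000)

if __name__ == "__main__":
    for label, background in [("all other drugs", FULL_BACKGROUND),
                              ("insulin removed", NON_INSULIN_BACKGROUND)]:
        prr, chi2 = prr_stats(*DRUG_X, *background)
        print(f"{label}: PRR={prr:.2f}, chi2={chi2:.1f}, "
              f"SDR={is_sdr(*DRUG_X, *background)}")
```

With these invented counts, the PRR for drug X is 0.8 against the full background. It rises to 3.8, and meets all three criteria, once the insulin reports are removed. The drug's own data are identical in both runs; only the comparator changes.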
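Section 3.1 notes that Bayesian DMAs highlight fewer SDRs because they shrink disproportionality estimates that rest on few reports. A simple observed-to-expected calculation can illustrate this. The Python sketch does not reproduce the BCPNN IC-2SD computation cited in this review [16,35]. As an assumption, it swaps in a later closed-form approximation of the shrunk information component:

- IC = log2((O + 0.5)/(E + 0.5))
- Lower bound IC025 ≈ IC − 3.3(O + 0.5)^(−1/2) − 2(O + 0.5)^(−3/2)

Treat the constants as illustrative and check them against the original source before relying on them. All counts are invented.

```python
"""Sketch: how shrinkage damps disproportionality estimates based on few
reports. Uses a simplified closed-form shrinkage for the information
component (an assumption; not the original BCPNN IC-2SD computation)."""
import math


def expected_count(n_drug: int, n_event: int, n_total: int) -> float:
    """Reports expected for a drug-event pair if drug and event were
    reported independently of each other."""
    return n_drug * n_event / n_total


def crude_ratio(observed: int, expected: float) -> float:
    """Unshrunk observed-to-expected ratio."""
    return observed / expected


def shrunk_ic(observed: int, expected: float) -> float:
    """Information component with 0.5 added to both counts; this pulls the
    ratio towards 1 (IC towards 0), most strongly when counts are small."""
    return math.log2((observed + 0.5) / (expected + 0.5))


def shrunk_ic025(observed: int, expected: float) -> float:
    """Approximate lower 2.5% credibility bound of the shrunk IC."""
    o = observed + 0.5
    return shrunk_ic(observed, expected) - 3.3 * o ** -0.5 - 2.0 * o ** -1.5


if __name__ == "__main__":
    total = 1_000_000   # hypothetical database size (reports)
    n_event = 1_000     # hypothetical reports mentioning the event
    # Two hypothetical drugs with the same crude O/E of 20: a rarely
    # reported drug (2 cases) and a widely reported one (40 cases).
    for n_drug, observed in [(100, 2), (2_000, 40)]:
        e = expected_count(n_drug, n_event, total)
        print(f"O={observed}, E={e:.1f}, O/E={crude_ratio(observed, e):.0f}, "
              f"IC={shrunk_ic(observed, e):.2f}, "
              f"IC025={shrunk_ic025(observed, e):.2f}")
```

Both hypothetical pairs have the same crude ratio. Only the pair supported by 40 reports keeps a lower bound above zero, so only that pair would be highlighted under an IC025 > 0 rule. Because the shrinkage is blind to clinical content, a genuine association resting on two reports would be damped in exactly the same way.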
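The screening argument in Section 3.1 follows from Bayes' theorem in odds form: SRS statistics add most when the pretest probability is neither very high nor very low. The minimal Python sketch below rests on two assumptions, neither of which comes from the text:

- A hypothetical screen with 80% sensitivity and 90% specificity.
- ‘Informativeness’ measured as the absolute change from pretest to post-test probability after a positive result.

```python
"""Sketch: post-test probability after a positive screen, via Bayes'
theorem in odds form (posterior odds = prior odds x positive likelihood
ratio). Sensitivity and specificity values are hypothetical."""


def post_test_probability(pretest: float, sensitivity: float,
                          specificity: float) -> float:
    """Probability that an association is real given a positive screen.
    Requires 0 < pretest < 1 and specificity < 1."""
    lr_positive = sensitivity / (1.0 - specificity)
    prior_odds = pretest / (1.0 - pretest)
    posterior_odds = prior_odds * lr_positive
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    for pretest in (0.01, 0.5, 0.99):
        post = post_test_probability(pretest, sensitivity=0.8, specificity=0.9)
        print(f"pretest={pretest:.2f} -> post-test={post:.3f} "
              f"(change {post - pretest:+.3f})")
```

With these assumed values, the absolute change is about +0.39 at a pretest probability of 0.5. It is under +0.07 at 0.01 and under +0.01 at 0.99. This mirrors the point that SRS calculations add little once a case review is already compelling in either direction.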

Bibliography

Papers of special note have been highlighted as either of interest (•) or of considerable interest (••) to readers.

1. MEYBOOM RH, EGBERTS AC, EDWARDS IR et al.: Principles of signal detection in pharmacovigilance. Drug Saf. (1997) 16(6):355-365.
•• Eloquent discussion of the nuances of signal detection and evaluation.
2. HAND D, BLUNT G, KELLY M, ADAMS N: Data mining for fun and profit. Stat. Science (2000) 15(2):111-131.
3. GOODWIN L, VANDYNE M, LIN S, TALBERT S: Data mining issues and opportunities for building nursing knowledge. J. Biomed Inform. (2003) 36(4-5):379-388.
4. LI X, RAO S, WANG Y, GONG B: Gene mining: a novel and powerful ensemble decision approach to hunting for disease genes using microarray expression profiling. Nucleic Acids Res. (2004) 32(9):2685-2694.
5. PERNER P: Image mining: issues, framework, a generic tool and its application to medical image diagnosis. Engineering Applications of Artificial Intelligence (2002) 15(3):193-203.
6. HAUBEN M, REICH J: Communication of findings in pharmacovigilance: use of term ‘signal’ and the need for precision in its use. Eur. J. Clin. Pharmacol. (2005) 61(5-6):479-480.
7. BATEMAN DN, SANDERS GL, RAWLINS MD: Attitudes to adverse drug reaction reporting in the Northern Region. Br. J. Clin. Pharmacol. (1992) 34(5):421-426.
8. BELTON KJ: Attitude survey of adverse drug-reaction reporting by health care professionals across the European Union. The European Pharmacovigilance Research Group. Eur. J. Clin. Pharmacol. (1997) 52(6):423-427.
9. BELTON KJ, LEWIS SC, PAYNE S, RAWLINS MD, WOOD SM: Attitudinal survey of adverse drug reaction reporting by medical practitioners in the United Kingdom. Br. J. Clin. Pharmacol. (1995) 39(3):223-226.
10. COSENTINO M, LEONI O, BANFI F, LECCHINI S, FRIGO G: Attitudes to adverse drug reaction reporting by medical practitioners in a Northern Italian district. Pharmacol. Res. (1997) 35(2):85-88.
11. DE BRUIN ML, VAN PUIJENBROEK EP, EGBERTS AC, HOES AW, LEUFKENS HG: Non-sedating antihistamine drugs and cardiac arrhythmias – biased risk estimates from spontaneous reporting systems? Br. J. Clin. Pharmacol. (2002) 53(4):370-374.
12. ELAND IA, BELTON KJ, VAN GROOTHEEST AC et al.: Attitudinal survey of voluntary reporting of adverse drug reactions. Br. J. Clin. Pharmacol. (1999) 48(4):623-627.
13. WILLIAMS D, FEELY J: Underreporting of adverse drug reactions: attitudes of Irish doctors. Ir. J. Med. Sci. (1999) 168(4):257-261.
14. BEGAUD B, MORIDE Y, TUBERT-BITTER P, CHASLERIE A, HARAMBURU F: False-positives in spontaneous reporting: should we worry about them? Br. J. Clin. Pharmacol. (1994) 38(5):401-404.
• Describes a method based on the Poisson distribution for computing the maximum number of reports of an ADR that could be expected to be reported coincidentally, and relates this to the concept of DMEs.
15. TRONTELL A: Expecting the unexpected – drug safety, pharmacovigilance, and the prepared mind. N. Engl. J. Med. (2004) 351(14):1385-1387.
•• An excellent description, by an expert at the FDA, of signal detection as a comprehensive process using multiple approaches and techniques.
16. BATE A, LINDQUIST M, EDWARDS IR et al.: A Bayesian neural network method for adverse drug reaction signal generation. Eur. J. Clin. Pharmacol. (1998) 54(4):315-321.
•• A seminal paper describing the BCPNN.
17. EGBERTS AC, MEYBOOM RH, VAN PUIJENBROEK EP: Use of measures of disproportionality in pharmacovigilance: three Dutch examples. Drug Saf. (2002) 25(6):453-458.
18. EVANS SJ, WALLER PC, DAVIS S: Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports. Pharmacoepidemiol. Drug Saf. (2001) 10(6):483-486.
•• Describes the application of PRR at the MHRA.
19. FRAM D, ALMENOFF J, DUMOUCHEL W: Empirical Bayesian data mining for discovering patterns in post-marketing drug safety. 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (August 24-27, 2003, Washington) (2003):359-368.
20. HAUBEN M: A brief primer on automated signal detection Quantitative methods in pharmacovigilance: focus on signal detection. Ann. Pharmacother. (2003) 37(7-8):1117-1123.
21. SZARFMAN A, MACHADO SG, O’NEILL RT: Use of screening algorithms and computer systems to efficiently signal higher-than-expected combinations of drugs and events in the US FDA’s spontaneous reports database. Drug Saf. (2002) 25(6):381-392.
22. HAUBEN M, REICH L: Safety related drug-labelling changes: findings from two data mining algorithms. Drug Saf. (2004) 27(10):735-744.
23. HAUBEN M: Trimethoprim-induced hyperkalaemia – lessons in data mining. Br. J. Clin. Pharmacol. (2004) 58(3):338-339.
24. MADIGAN D: Discussion of ‘Bayesian data mining in large frequency tables’ by Bill DuMouchel. Am. Stat. (1999) 53:198-200.
25. DUMOUCHEL W: Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. Am. Stat. (1999) 53(3):170-190.
•• A seminal paper describing the empirical Bayesian approach to signal detection in pharmacovigilance.
26. MOSELEY J, HEELEY E, EKINS-DAUKES S, EVANS S: Preliminary comparison of 2 signal detection methodologies in the UK regulatory spontaneous ADR database. [Abstract]. Drug Saf. (2004) 27(12):950-951.
27. VAN PUIJENBROEK EP, EGBERTS AC, HEERDINK ER, LEUFKENS HG: Detecting drug-drug interactions using a database for spontaneous adverse drug reactions: an example with diuretics and non-steroidal anti-inflammatory drugs. Eur. J. Clin. Pharmacol. (2000) 56(9-10):733-738.
28. VAN PUIJENBROEK EP, EGBERTS AC, MEYBOOM RH, LEUFKENS HG: Signalling possible drug-drug interactions in a spontaneous reporting system: delay of withdrawal bleeding during concomitant use of oral contraceptives and itraconazole. Br. J. Clin. Pharmacol. (1999) 47(6):689-693.
29. VAN PUIJENBROEK EP, EGBERTS AC, MEYBOOM RH, LEUFKENS HG: Association between terbinafine and arthralgia, fever and urticaria: symptoms or syndrome? Pharmacoepidemiol. Drug Saf. (2001) 10(2):135-142.
30. NEYMAN J: On the application of probability theory to agricultural experiments. Essay on principles. Section 9, translated in Statistical Science (with discussion). Stat. Science (1990) 5(4):465-480.
31. RUBIN D: Causal inference using potential outcomes: design, modeling, decisions. JASA (2005) 100(469):322-331.
32. CARTWRIGHT NE: Nature’s Capacities and Their Measurements. Cartwright N (ed.), Oxford University Press, Oxford (1994).
33. ROSENBAUM P, RUBIN D: The central role of the propensity score in observational studies for causal effects. Biometrika (1983) 70(1):41-55.
34. CORNFIELD J, HAENSZEL W, HAMMOND EEA: Smoking and lung cancer: recent evidence and a discussion of some questions. J. Natl. Cancer Inst. (1959) 22(1):173-203.
35. LINDQUIST M, STAHL M, BATE A et al.: A retrospective evaluation of a data mining approach to aid finding new adverse drug reaction signals in the WHO international database. Drug Saf. (2000) 23(6):533-542.
36. SPIGSET O, HAGG S, BATE A: Hepatic injury and pancreatitis during treatment with serotonin reuptake inhibitors: data from the World Health Organization (WHO) database of adverse drug reactions. Int. Clin. Psychopharmacol. (2003) 18(3):157-161.
37. SANZ EJ, DE-LAS-CUEVAS C, KIURU A, BATE A, EDWARDS R: Selective serotonin reuptake inhibitors in pregnant women and neonatal withdrawal syndrome: a database analysis. Lancet (2005) 365(9458):482-487.
38. COULTER DM, BATE A, MEYBOOM RH, LINDQUIST M, EDWARDS IR: Antipsychotic drugs and heart muscle disorder in international pharmacovigilance: data mining study. BMJ (2001) 322(7296):1207-1209.
39. BATE A, LINDQUIST M, ORRE R et al.: Data-mining analyses of pharmacovigilance signals in relation to relevant comparison drugs. Eur. J. Clin. Pharmacol. (2002) 58(7):483-490.
•• Provides cogent and relevant demonstrations and discussion, including graphical data visualisation tools, of the BCPNN, using real-world examples.
40. VAN PUIJENBROEK EP, EGBERTS AC, MEYBOOM RH, LEUFKENS HG: Different risks for NSAID-induced anaphylaxis. Ann. Pharmacother. (2002) 36(1):24-29.
41. VAN PUIJENBROEK EP, BATE A, LEUFKENS HG et al.: A comparison of measures of disproportionality for signal detection in spontaneous reporting systems for adverse drug reactions. Pharmacoepidemiol. Drug Saf. (2002) 11(1):3-10.
42. O’NEILL RT, SZARFMAN A: Some US Food and Drug Administration perspectives on data mining for pediatric safety assessment. Curr. Ther. Res. (2001) 62(9):650-663.
43. SZARFMAN A, TONNING JM, DORAISWAMY PM, MACHADO SG, O’NEILL RT: Pharmacovigilance in the 21st century: new systematic tools for an old problem. Pharmacotherapy (2004) 24(9):1099-1104.
44. BANKS D, WOO E, BURWEN D et al.: Comparing data mining methods on the VAERS database. Pharmacoepidemiol. Drug Saf. (2005) Published online: 13 Jun 2005.
45. PURCELL P, BARTY S: Statistical techniques for signal generation: the Australian experience. Drug Saf. (2002) 25(6):415-421.
46. GOULD AL: Practical pharmacovigilance analysis strategies. Pharmacoepidemiol. Drug Saf. (2003) 12(7):559-574.
•• A comprehensive, well-written and insightful discussion of both theoretical concepts and issues related to the practical deployment of DMAs.
47. HAUBEN M, REICH L: Drug-induced pancreatitis: lessons in data mining. Br. J. Clin. Pharmacol. (2004) 58(5):560-562.
•• Describes points to consider when using DMAs.
48. HAUBEN M, REICH L: A case report of rhabdomyolysis with pentamidine that prompted a retrospective evaluation of a pharmacovigilance tool under investigation. Br. J. Clin. Pharmacol. (2004) 58(6):675-676.
49. HAUBEN M, REICH L: Data mining, drug safety, and molecular pharmacology: potential for collaboration. Ann. Pharmacother. (2004) 38(12):2174-2175.
50. HAUBEN M, REICH L: Valproate-induced parkinsonism: use of a newer pharmacovigilance tool to investigate the reporting of an unanticipated adverse event with an ‘old’ drug. Mov. Disord. (2005) 20(3):387-388.
51. HAUBEN M, REICH L: Case reports of dobutamine-induced myoclonia in severe renal failure: potential of emerging pharmacovigilance technologies. Nephrol. Dial. Transplant (2005) 20(2):471-472.

52. HAUBEN M, REICH L: Endotoxin-like reactions with intravenous gentamicin: results from pharmacovigilance tools under investigation. Infect. Control Hosp. Epidemiol. (2005) 26(4):391-394.
53. HAUBEN M, REICH L, CHUNG S: Postmarketing surveillance of potentially fatal reactions to oncology drugs: potential utility of two signal-detection algorithms. Eur. J. Clin. Pharmacol. (2004) 60(10):747-750.
54. HAUBEN M, REICH L: Potential utility of data-mining algorithms for early detection of potentially fatal/disabling adverse drug reactions: a retrospective evaluation. J. Clin. Pharmacol. (2005) 45(4):378-384.
55. HAUBEN M: Application of an empiric Bayesian data mining algorithm to reports of pancreatitis associated with atypical antipsychotics. Pharmacother. (2004) 24(9):1122-1129.
56. HAUBEN M: Early postmarketing drug safety surveillance: data mining points to consider. Ann. Pharmacother. (2004) 38(10):1625-1630.
57. ALMENOFF JS, DUMOUCHEL W, KINDMAN LA, YANG X, FRAM D: Disproportionality analysis using empirical Bayes data mining: a tool for the evaluation of drug interactions in the post-marketing setting. Pharmacoepidemiol. Drug Saf. (2003) 12(6):517-521.
58. ALMENOFF J, DUMOUCHEL W, KINDMAN L, YANG X, FRAM D: Letter to the Editor–Re: Almenoff et al., ‘Disproportionality analysis using empirical Bayes data mining: a tool for the evaluation of drug interactions in the post-marketing setting’. Pharmacoepidemiol. Drug Saf. (2004) 13(3):111.
59. ROUX E, TUBERT-BITTER P, THIESSARD F, FOURRIER A, BEGAUD A, EVANS S: Automatic signal detection methods evaluation on a simulated data base. Workshop, 20th International Conference on Pharmacoepidemiology & Therapeutic Risk Management, Bordeaux, France, 2004.
60. ROUX E, TUBERT-BITTER P, THIESSARD F: Evaluation of data mining methods in pharmacovigilance using simulated datasets. 20th International Conference on Pharmacoepidemiology & Therapeutic Risk Management, Bordeaux, France, 2004.
61. KUBOTA K, KOIDE D, HIRAI T: Comparison of data mining methodologies using Japanese spontaneous reports. Pharmacoepidemiol. Drug Saf. (2004) 13(6):387-394.
62. DE BRUIN ML, PETTERSSON M, MEYBOOM RH, HOES AW, LEUFKENS HG: Anti-HERG activity and the risk of drug-induced arrhythmias and sudden death. Eur. Heart J. (2005) 26(6):590-597.
63. HARVEY JT, TURVILLE C, BARTY SM: Data mining of Australian adverse drug reactions database: a comparison of Bayesian and other statistical indicators. Int. Trans. Oper. Res. (2004) 11:419-433.
64. HAUBEN M, PATADIA V, GERRITS C, WALSH L, REICH L: Data mining in pharmacovigilance: the need for a balanced perspective. Drug Saf. (2005) 28 (in press).
65. STAHL M, LINDQUIST M, EDWARDS IR et al.: Introducing triage logic as a new strategy for the detection of signals in the WHO Drug Monitoring Database. Pharmacoepidemiol. Drug Saf. (2004) 13(6):355-363.
66. FINNEY DJ: The detection of adverse reactions to therapeutic drugs. Stat. Med. (1982) 1(2):153-161.
67. NOREN G, ORRE R, BATE A: A hit-miss model for duplicate detection in the WHO drug safety database. Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2005, Chicago, USA.
68. GOGOLAK VV: The effect of backgrounds in safety analysis: the impact of comparison cases on what you see. Pharmacoepidemiol. Drug Saf. (2003) 12(3):249-252.
69. ALTUNDAG O, CELIK I, KARS A: Recurrent asymptomatic bradycardia episodes after cisplatin infusion. Ann. Pharmacother. (2001) 35(5):641-642.
70. YEE C, KLINCEWICS S, KNIGHT J, THOMAS A, WILSON R: Practical consideration in developing an automated signaling program within a pharmacovigilance department. Drug Inf. J. (2004) 38(3):293.
• Describes how one pharmaceutical company incorporates DMAs in their current pharmacovigilance practices.
71. HAUBEN M, REICH L: Application of an empiric Bayesian data mining algorithm to reports of pancreatitis associated with atypical antipsychotics. Pharmacother. (2004) 24(9):1122-1129.
72. PURCELL PM: Data mining in pharmacovigilance. Int. J. Pharm. Med. (2003) 17(2):63-64.
73. BROWN EG: Effects of coding dictionary on signal generation: a consideration of use of MedDRA compared with WHO-ART. Drug Saf. (2002) 25(6):445-452.
74. BROWN EG: Methods and pitfalls in searching drug safety databases utilising the Medical Dictionary for Regulatory Activities (MedDRA). Drug Saf. (2003) 26(3):145-158.
75. BROWN EG: Using MedDRA: implications for risk management. Drug Saf. (2004) 27(8):591-602.
76. HAND D: Technical report, Department of Mathematics, Imperial College London.
77. STROM BL: Evaluation of suspected adverse drug reactions. JAMA (2005) 293(11):1324-1325.
78. ALMENOFF JS, DUMOUCHEL W, KINDMAN LA et al.: Disproportionality analysis using empirical Bayes data mining: a tool for the evaluation of drug interactions in the post-marketing setting. Pharmacoepidemiol. Drug Saf. (2003) 12(6):517-521.

Websites
101. http://www.fda.gov/cder/aers/default.htm FDA website: Adverse event reporting system (2005).
102. http://medicines.mhra.gov.uk/ourwork/monitorsafequalmed/yellowcard/yellowcardscheme.htm MHRA website: Yellow card scheme (2005).
103. http://www.who-umc.org/ The Uppsala Monitoring Centre (2005).
104. http://stat.rutgers.edu/~madigan/BBR Bayesian Binary Regression software (2005).
105. http://www.ailab.si/idamap/idamap2003/Roux.pdf Spontaneous reporting system modelling for data mining methods evaluation in pharmacovigilance. Conference on Intelligent Data Analysis in Medicine and Pharmacology, October 2003, Protaras, Cyprus (2005).
106. http://www.outcome.com/new_home/images/pharmacovig3_05.pdf Guidance for industry: Good pharmacovigilance practices and pharmacoepidemiologic assessment (2005).

Affiliation
Manfred Hauben1,2,3 MD, MPH,
David Madigan4 PhD,
Charles M Gerrits†5 PharmD, PhD,
Louisa Walsh6 MD
& Eugene P Van Puijenbroek7 MD, PhD
†Author for correspondence
1Pfizer, Inc., Risk Management Strategy, New York, NY, USA
2New York University School of Medicine, Department of Medicine, New York, NY, USA
3New York Medical College, Departments of Pharmacology and Community and Preventive Medicine, Valhalla, NY, USA
4Rutgers University, Department of Statistics, Piscataway, New Jersey, USA
5Takeda Global Research and Development, Inc., Department of Pharmacoepidemiology and Outcomes Research, Lincolnshire, Illinois, USA
Tel: +1 847 383 3835; Fax: +1 847 383 3889;
E-mail: cgerrits@tgrd.com
6AstraZeneca LP, Clinical Drug Safety, Wilmington, Delaware, USA
7Netherlands Pharmacovigilance Centre, Lareb, ’s-Hertogenbosch, The Netherlands
