
Central statistical monitoring: detecting fraud in clinical trials

Janice M Pogue (a,b,c), PJ Devereaux (a,b), Kristian Thorlund (a) and Salim Yusuf (a,b)

to identify centers with problematic data or conduct and intervene while the trial is still ongoing. Currently, there are few published models that can be used for this purpose.

Purpose To develop and validate a series of risk scores to identify fabricated data within a multicenter trial, to be used in central statistical monitoring.

Methods We used a database from a multicenter trial in which data from 9 of 109 centers were documented to be fabricated. These data were used to build a series of risk scores to predict fraud at centers. All analyses were performed at the level of the center. Exploratory factor analysis was used to select from 52 possible predictors, chosen from a variety of previously published methods. The final models were selected from a total of 18 independent predictors, based on the factors identified. These models were converted to risk scores for each center.

Results Five different risk scores were identified, and each had the ability to discriminate well between centers with and without fabricated data (area under the curve values ranged from 0.90 to 0.95). True- and false-positive rates are presented for each risk score to arrive at a recommended cutoff of seven or above (high risk score). We validated these risk scores using an independent multicenter trial database that contained no data fabrication and found the occurrence of false-positive high scores to be low and comparable to the model-building data set.

Limitations These risk scores have been validated only for their false-positive rate and require validation within another trial that contains centers that have fabricated data. Validation in noncardiovascular trials is also required to gauge the usefulness of these risk scores in central statistical monitoring.

Conclusions With further validation, these risk scores could become part of a series of tools that provide evidence-based central statistical monitoring, which in turn can improve the efficiency of trials and minimize the need for more expensive on-site monitoring. Clinical Trials 2013; 10: 225–235. http://ctj.sagepub.com

Introduction

The goals of any clinical trial can be achieved if there are sufficient procedures in place to reduce bias and maximize precision in the outcome of interest. Center investigators and study personnel may make a variety of errors, deviations, or misconducts while undertaking a trial, including procedural errors in failing to follow the protocol, and data recording errors, including, in rare cases, falsifying data [1].

a Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, ON, Canada; b Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada; c Population Health Research Institute, Hamilton Health Sciences, Hamilton, ON, Canada

Author for correspondence: Janice M Pogue, Population Health Research Institute, Hamilton Health Sciences, General Campus, DBCVSRI 237 Barton Street East, Hamilton, ON L8L 2X2, Canada. Email: Janice.Pogue@phri.ca

Reprints and permission: http://www.sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/1740774512469312

Not all of these behaviors will lead to biased trial results or increased variability in outcomes [1–8], but for those that do, trialists would like to routinely detect and stop these actions while the trial is ongoing. Trialists are then left with the question of how best to detect and correct important errors at centers within a multicenter trial.

It has been suggested that central statistical monitoring may serve as the foundation for quality assurance and center monitoring in multicenter clinical trials [1,2,4,5]. Traditionally, quality assurance for clinical trials has included trial oversight committees (trial management committee, trial steering committee, and data monitoring committee); central monitoring of center performance, including central statistical monitoring; and on-site monitoring [1]. The first two monitoring methods are used universally, but the use of on-site monitoring visits differs among trials. Morrison et al. [9] surveyed 65 research organizations involved in the conduct of clinical trials and found the use of a wide variety of monitoring practices. Of the academic, government, or cooperative research groups, only 31% always used on-site monitoring, whereas 84% of the industry/contract research organization (CRO) group relied on this approach. Among the various monitoring methods, on-site visits are expensive, yet we have little evidence that they affect the validity of the main findings of an otherwise unbiased trial. While Shapiro and Charrow [10] have documented that Food and Drug Administration (FDA) audits have identified serious deficiencies at trial centers, they did not classify these deficiencies as capable of changing individual clinical trial results or not. There are no published systematic reviews estimating the effect of on-site monitoring on trial results, but the experiences from some individual programs and trials have been published. The National Cancer Institute found that instituting an on-site auditing program did not change the agreement rate for treatment failures or the percentage of protocol deviations within their program [11]. The National Surgical Adjuvant Breast and Bowel Project (NSABP), in reviewing all its participants' data, found no additional treatment failures or deaths, and only a small number of previously unknown ineligible participants. An audit of the Global Utilization of Streptokinase and TPA for Occluded Coronary Arteries (GUSTO) trial also found no errors that would change the trial's results [12]. It has been suggested that regular on-site visits to review entire case report forms are excessive in cost and result in little gain in important data quality [4,12,13]. Given the lack of clear evidence of the value of on-site monitoring to detect biases, and its high costs, which is reflected in less use in investigator-initiated trials, it has been suggested that central statistical monitoring may be able to assist in identifying sites where there is a higher probability of concerns and for which there should be consideration of on-site monitoring [1,2,4,5].

The purpose of this article is to retrospectively evaluate the ability of central statistical monitoring to identify centers that are known to have fabricated trial data. The identification of fabricated data is an important function of trial monitoring. Although falsified data may not alter the results of a large multicenter trial, at least for blinded trials [1–3,6,7], it could reduce the precision of the trial results. Unfortunately, even isolated and small amounts of fraud within a trial can cause significant doubts about its conclusions and have the potential to lead to a lack of public confidence in the clinical trial process in general [1–3,5,6,14–18].

Many authors have suggested that central statistical monitoring of data may be used to identify fraud at centers within multicenter trials [1–3,5,6,14,18–25]. Case reports have been published describing how an individual center with fraudulent data was detected by examining center-level data during the conduct of trials, including Animal Models for Protecting Ischemic Myocardium (AMPIM) [14], the Multiple Risk Factor Intervention Trial (MRFIT) [6], NSABP-06 [15], the Second European Stroke Prevention Study [21], the Clopidogrel and Metoprolol in Myocardial Infarction Trial (COMMIT-1) [19], and the Perioperative Ischemic Evaluation (POISE) trial [26]. Al-Marzouki et al. [20] performed a comparison of a diet intervention database that was thought to be fabricated to a sample from a validated study. In comparing the intervention to the control group within each data set for a variety of laboratory measurements, the authors found more statistically significant differences between the two randomized groups within the data set that was thought to be fabricated.

Any prognostic model must begin with a literature review of previously published articles within that area, and the difficulty with central statistical monitoring is that a great many factors have been suggested to predict fabricated data. A very large number of possible statistical methods have been proposed to identify unusual patterns at centers, and these are summarized in Table 1. Such a long list provides no direct guidance as to which statistical methods are most appropriate for central statistical monitoring. Yet there is even more uncertainty in how to proceed, given that within a typical trial the types of variables that these methods may be applied to include inclusion and exclusion criteria, medical history, physical measurements, laboratory tests, study-specific tests, visit or test dates, compliance, outcomes, adverse events, completeness of follow-up, data query rates, time to clean data, center enrollment, number of participants seen per day, or calendar day of visits. Given such a long list of possible methods and even a modest number of variables collected per trial participant, there is a need for a parsimonious and efficient approach to central statistical monitoring. This article seeks to evaluate some of the methods for central statistical monitoring, using statistical summaries and tests to establish a series of good prognostic models to identify centers with fabricated data within a multicenter trial. Ideally, such a model would be a simple algorithm that could be used widely in different types of multicenter trials, at modest cost.

In this article, we use data from the POISE Trial [26] to retrospectively model confirmed data fabrication at centers within a multicenter trial.

Table 1. Statistical methods suggested for identifying fabricated data

Univariate (examining one variable at a time)
  Statistical summaries: Proportions [1,2,5,19,20]; Means [1,2,5,19,20]; Center event rates [1,2,14,18,19,21]; Variances [2,5,14,20]; Digit preference [1,2,20,25]; Calendar checks (day of week) [1,2]; Benford's law [1,2,25,27,28]; Skewness [1,2]; Kurtosis and outliers [1,2,5,25]; Inliers [2,25]
  Statistical tests: t-test [2,20]; Chi-square tests [2,5,20]; F-test for variances [2,20,21]
  Purely graphical methods: Histograms [2,25]; q-q plots [2,25]; Box plots [2,25]

Multivariate (examining combinations of variables)
  Statistical summaries: Date comparisons [2,5,25]; Correlations [2,5,25]; Intraclass correlations [2]; Autocorrelation [2]; Mahalanobis's distance [2]; Cook's distance [2]
  Statistical tests: Cluster analysis [2]; Discriminant analysis [2]; Hotelling's T2 [2]; Runs tests [2]
  Purely graphical methods: Scatter plots [2,5,14,25]; Chernoff faces [2,25]; Star plots [2,25]

Methods

The POISE Trial

The POISE Trial [26] examined the effect of a perioperative beta-blocker versus placebo in participants at risk of cardiovascular events who were undergoing noncardiac surgery. Initially there were 9298 trial participants randomized from 196 clinical centers in 24 countries, but data fabrication was detected in 9 centers involving a total of 947 randomized trial participants. During this trial, fraud was detected initially at the six participating centers in Iran, all managed by a common research team. Here, problems were first detected when a center called in to randomize a patient when they should not have had any available drug kits left. An on-site audit organized by the National Principal Investigator took place, and only minor problems were reported. However, a former center staff member claimed fraud had occurred in Iran.

Subsequent on-site monitoring by the Study Principal Investigator found massive amounts of data fabrication. We discovered that many submitted electrocardiograms and troponin values did not belong to the patients for whom they were submitted. Commonly, the date of the test was covered up and another date was reported. We also established that the laboratory computer had deleted the troponin results that were beyond 6 months old and that many official laboratory troponin reports did not match what was in the laboratory computer. Consult notes were frequently made up on hospital stationery, indicating that a patient had suffered an event (e.g., a myocardial infarction) when the actual hospital chart did not mention or support such an outcome. Through review of hospital charts, we also identified patients who had suffered a major perioperative cardiovascular complication, but the submitted case report forms indicated that these patients had not suffered such an event. We identified what appeared to be a fabricated death certificate based upon a phone call with the patient's daughter, who informed us that the patient was alive and vacationing in another city. We established that none of the 59 patients randomized and reported to have had surgery at one Iranian center actually had surgery at that center. These patients had surgery at the other Iranian centers. Furthermore, many patients were inappropriately randomized after they had surgery, and their date of surgery was falsified on the case report forms. All records from the randomized patients at these six centers (N = 752) were considered to be fabricated and were omitted from the POISE Trial.

Fraud also was found to have occurred at three centers in Colombia, which shared a common research assistant. Problems were first detected when another center staff member failed to locate some consent forms for patients enrolled by a specific research assistant. An on-site audit, first by the National Principal Investigator and subsequently by the Study Principal Investigator, documented the following problems.

Many of the patients this research assistant had randomized could not be tracked to an actual patient, and of those patients who could be identified, many denied participating in POISE. Of the patients who could be identified and had consented to POISE, many were ineligible to participate in POISE, and the troponin values recorded for these patients were not consistent with the timing of their actual surgery. This audit was extended to cases not associated with this research assistant, both at the three centers at which this research assistant worked and at the other eight Colombian centers, and these cases did not demonstrate any concerns. From these three centers, data from 195 patients (26% of total randomized) were considered to be fabricated and were omitted from the final trial analysis [26].

The accuracy of the data from the rest of the POISE centers was verified by on-site monitoring for hospitals that recruited 40 or more participants and for any other center that was identified as an outlier through central statistical monitoring. This on-site monitoring was completed at 77 centers that had collectively randomized 85% of all trial participants. No major discrepancies, outside of those reported above, were identified between the submitted data and the hospital records. One unreported myocardial infarction was identified out of 534 reported primary outcomes, but no other instances of unreported outcomes or fabricated data were found.

Only the 109 centers that randomized 20 trial participants or more were included in this analysis, so that we could appropriately use statistical tests to summarize data patterns within and between centers. We purposefully planned to use statistical summaries or tests within our models, so that they could potentially be generalized across different studies, regardless of the number of centers or variables collected. We avoided using purely graphical methods, as these can have subjective interpretations, and we preferred models that can be validated in an objective manner.

Statistical methods

The analytic strategy for model building and validation is summarized in Table 2. There are multiple valid strategies for developing risk models, and the strategy we opted to use should be considered only as one possible option. We also decided to develop five models instead of a single one, recognizing that data fabrication may be detected in multiple ways. All analyses were performed in SAS 9.1 for Unix [29].

Table 2. Strategy for model building
1. Identify possible predictors of data fabrication at a center: based on prior publications.
2. Summarize predictors at a center level in a unit-less form: calculate p-values for predictors comparing each center versus all others.
3. Eliminate redundancy among the predictors: factor analysis.
4. Build possible models to predict data fabrication: logistic regression.
5. Convert model regression coefficients to scores: utilize a points system [30].
6. Validate the scores externally: apply to an independent multicenter trial database.

Identifying possible predictors

Based on prior research, we selected a variety of variable types, including the baseline characteristics presented in the baseline characteristics table within the original POISE publication, when the characteristic was present in at least 10% of the trial participants. Other possible predictors of fraud that we included were those with repeated physical measurements over time, compliance and outcome rates, data quality/query rates, and total numbers of participants randomized. See online Appendix A (Table A1) for a complete list of all variables initially included as potential predictors.

Summaries by center

Although it is the individual at a center who commits fraud, most trials, including POISE [26], do not record which center staff member is responsible for the origins of each data item. For this reason, the center was chosen as the unit of analysis for these models, with all variables summarized at the center level. Furthermore, since all trials measure different variables, the summaries have to be unit-less, derived from statistical distributions and tests rather than expressed in their native units. For each variable, the focus was to present ways of showing how different each center was from the others. This assessment of center differences did not assume any directionality (e.g., high or low rates) but only sought to quantify how distinct the data were at individual centers. The one exception to this focus was the summary of repeated measurements over time, where the summary statistic was already unit-less (see test (7)). The following tests were used:

(1) Frequency comparison. For binary data (e.g., age >70 years), the proportion with this characteristic at each center was compared to the overall proportion from all other centers using a two-by-two Pearson chi-squared test. The probability value (p-value) from this test for each center was used as a potential predictor for model building.

(2) Mean comparison. For continuous measurements (e.g., systolic blood pressure), which were likely to be approximately normally distributed, a two-group Student's t-test, using a pooled standard deviation, was used to compare the mean at each center to the overall mean of the other centers. For each center, the two-tailed p-value from this test was used as a potential predictor.

(3) Digit preference. We compared the frequency of the last digit recorded for physical measurements over all trial participants at each center. The frequency of these digits was compared to the overall frequency from all other centers using a two-by-two Pearson chi-squared test. The p-value from this test for each center was used as a potential predictor for model building. This test was also used to assess the day of the week on which randomization occurred. This method assumes no distribution for the digits or days but merely tests whether a center is different from the other centers.

(4) Variance comparison. The variability of measurements at centers was compared using a folded F-test, contrasting the variance of a possible predictor at each center to the variance of the rest of the centers. For each center, a p-value from this test was used as a potential predictor.

(5) Distance measure. Using data from the ith trial participant at the jth center, we calculated a distance measure (d_j) to indicate how far away one center's data are from the overall mean (ȳ) across all centers, standardized by the overall standard deviation (s) [25,31]. The natural logarithm of the distance was used as a possible predictor:

    d_j = Σ_i [ (y_ij − ȳ) / s ]²

(6) Outcome probability. We calculated the outcome rate at each center adjusted for country. We then calculated the probability of observing an adjusted outcome rate as extreme as that observed at that center, assuming a Poisson distribution with the overall adjusted mean. This adjustment for country variation was used to control for the commonly observed patterns of different outcome rates in different countries, due to many factors, including use of concomitant medications, differing health-care systems, and a variety of other factors. The probability from the cumulative probability distribution (CDF) for each center was used as a possible predictor.

(7) Repeated measures. The correlation of repeated measurements may be a good approach to detect fabricated data. The consistency of a continuous measurement recorded multiple times in follow-up (e.g., systolic blood pressure) was quantified by an intraclass correlation coefficient (ICC) calculated for each center.

Table A1 (online Appendix A) lists each possible predictor and the statistic(s) used to assess it. A total of 52 potential predictors were generated to be used to predict fabricated data.
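The paper's analyses were run in SAS [29]. Purely as an illustration, the unit-less center-level summaries behind tests (1), (5), (6), and (7) can be sketched in Python using only the standard library; the function names and data layouts below are assumptions for this sketch, not from the paper.

```python
import math
from statistics import mean

def chi2_2x2_p(center_yes, center_n, others_yes, others_n):
    """Test (1): two-by-two Pearson chi-squared p-value comparing one
    center's proportion with the pooled proportion of all other centers."""
    n = center_n + others_n
    col_yes = center_yes + others_yes
    col_no = n - col_yes
    if 0 in (col_yes, col_no, center_n, others_n):
        return 1.0  # degenerate table: no evidence of a difference
    x2 = 0.0
    for row_yes, row_n in ((center_yes, center_n), (others_yes, others_n)):
        for obs, col in ((row_yes, col_yes), (row_n - row_yes, col_no)):
            expected = row_n * col / n
            x2 += (obs - expected) ** 2 / expected
    # survival function of a chi-square with 1 df, via the error function
    return math.erfc(math.sqrt(x2 / 2.0))

def log_distance(center_values, overall_mean, overall_sd):
    """Test (5): natural log of d_j = sum_i ((y_ij - ybar) / s)^2."""
    d = sum(((y - overall_mean) / overall_sd) ** 2 for y in center_values)
    return math.log(d)

def poisson_cdf(k, lam):
    """Test (6): P(X <= k) for a Poisson mean lam, used to judge how
    improbable a center's country-adjusted outcome count is."""
    term = total = math.exp(-lam)
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return min(total, 1.0)

def icc_one_way(participants):
    """Test (7): one-way ICC for one center; `participants` is a list of
    equal-length lists of repeated measurements per participant."""
    n, k = len(participants), len(participants[0])
    grand = mean(v for p in participants for v in p)
    msb = k * sum((mean(p) - grand) ** 2 for p in participants) / (n - 1)
    msw = sum((v - mean(p)) ** 2 for p in participants for v in p) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

Each function returns a number on a common, unit-free scale (a p-value, probability, or correlation), which is what allows the same scoring machinery to be reused across trials that collect different variables.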

Eliminating redundancy

Given the long list of suggestions to identify fabricated data, we have no clear a priori direction as to which predictors should be included in building a prognostic model. Additionally, in any long list of possible predictors, there is likely to be correlation among them, but since all predictors are summarized at the center level, rather than for individual trial participants, these correlations may be challenging to predict. Therefore, an initial principal components factor analysis was done to identify a subset of independent variables for centers to use in predictive modeling. The exploratory principal component analysis with varimax rotation was used to identify a subset of independent variables, with the total number of factors identified being chosen using the Kaiser criterion of eigenvalues greater than one. For each identified factor, the one variable with the largest loading score was then included in the initial models. This reduced set of variables was used to predict fraudulent centers in a logistic regression.
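The Kaiser criterion described above is easy to illustrate: count the eigenvalues of the predictor correlation matrix that exceed one. This is a sketch only (it assumes numpy is available and is not the authors' SAS procedure).

```python
import numpy as np

def n_factors_kaiser(corr_matrix):
    """Kaiser criterion: retain one factor for each eigenvalue of the
    predictor correlation matrix that is greater than one."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(corr_matrix, dtype=float))
    return int((eigenvalues > 1.0).sum())
```

For two highly correlated predictors the criterion keeps a single factor, which is exactly the redundancy-elimination behavior the authors rely on before logistic regression.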

Building possible models

We used best subsets of models, using the branch and bound algorithm of Furnival and Wilson [32], to find models with the largest score statistic for including different numbers of variables. The final series of models was selected based on no significant increase in the score test for increasing the number of variables in the model. These models were checked for lack of fit using the Hosmer and Lemeshow [33] goodness-of-fit test. Prediction ability was summarized as the area under the curve (AUC) with 95% confidence intervals (CIs) and partial AUC.

Converting models to risk scores

For ease of use, the resulting models were then converted into simplified risk scores using a points system [30]. Here, a reference category is selected, and whole-number points are given to the range of the predictor, relative to its regression coefficient from the logistic regression. We then identified cutoffs of these scores to most efficiently identify the fraudulent centers.
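As a hedged illustration of this points conversion, the sketch below maps a unit-less center summary (an ICC or p-value on a 0 to 1 scale) to whole-number points using the category bands reported in Table 3 of this article; the function names, and the equal point weights for all three terms, are illustrative assumptions.

```python
def points(value, reverse=False):
    """Convert a center summary (0-1 scale) to points using the Table 3
    bands: <=0.20 -> 0, 0.21-0.40 -> 1, 0.41-0.60 -> 2,
    0.61-0.80 -> 3, >0.80 -> 4. `reverse` flips the scale, as the
    article does for the model 5 compliance term."""
    bands = (0.20, 0.40, 0.60, 0.80)
    pts = sum(value > b for b in bands)
    return 4 - pts if reverse else pts

def risk_score(sbp_icc, dbp_t_p, third_term, reverse_third=False):
    """Sum the points for the three model terms; centers scoring seven
    or above are flagged for closer monitoring."""
    return points(sbp_icc) + points(dbp_t_p) + points(third_term, reverse_third)
```

With three terms worth 0 to 4 points each, scores range from 0 to 12, matching the score axis shown for the five models in Figure 1.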

Model validation

Since fabricated data sets are rare, external validation of the model was done by calculating the risk scores in an independent trial that had on-site monitoring and for which no center was identified as having fabricated data. If the fabricated-data score was valid, we would anticipate low scores for all centers within this second trial. For purposes of model validation, the multicenter Heart Outcomes Prevention Evaluation (HOPE) Trial [34] data were used.

Results

A total of 8722 randomized participants from 109 clinical sites were included in this analysis: 947 participants with fabricated data from 9 sites and 7775 participants with validated data from 100 sites. From the 52 possible predictors specified, factor analysis identified 18 independent factors. From this, the variable with the largest loading for each of the 18 factors was selected for inclusion in logistic regression modeling to predict sites with fraud. See Table A2 in online Appendix A for the associations between predictors and the reduced list included in model building.

Based on the score function, any possible model of more than three predictors did not significantly add to the models. The five 3-variable models with the highest score test were selected as possible predictive models. For each of these models, no significant lack of fit was detected. See Table A3 of online Appendix A for the logistic regression coefficients for the five models.

These models all included the intracluster correlation coefficient for repeated systolic blood pressure measurements, with larger values indicating higher risk of fraud. The second term included in each model was the diastolic blood pressure comparison of each center's mean versus that of the other centers. Here, risk increased as the t-test p-value increased toward 1. This indicated that, after adjusting for the high ICC in the first term, a center's mean diastolic blood pressure at baseline that is very similar to that of other centers predicts a risk of fabricated data. Given that there is a very high regularity in the values of the systolic blood pressure over time (high ICC), it is unusual for the same center to have a diastolic blood pressure mean that closely resembles other centers.

The third term for each model differed. These included the following: systolic blood pressure digit preference p-value (model 1), intrathoracic or intraperitoneal surgery frequency p-value (model 2), general anesthesia frequency p-value (model 3), preoperative angiotensin-converting-enzyme inhibitor (ACE-I) or angiotensin II receptor blocker (ARB) frequency p-value (model 4), or compliance rate CDF p-value (model 5). For the statistics in models 1–4, a higher p-value indicated a greater risk of fraud for that center. Again, given the high consistency of systolic blood pressure over time and the similarity of diastolic blood pressure at baseline, centers with data that were very similar to the overall summary statistics were predicted to be at risk of fraud. For the fifth model, a lower CDF probability for compliance was associated with fraud. As the center compliance rate became more improbable (either high or low rates of compliance), there was a higher risk of being a center with fabricated data. For this last model, therefore, the points are reversed.

Table 3 displays these predictive models after conversion of the logistic regression coefficients to a simple scoring system.

Table 3. Risk scores predicting fabricated data

Predictor 1: SBP-over-time intraclass correlation. Categories 1 through 5 score +0, +1, +2, +3, +4.
Predictor 2: DBP mean-comparison t-test p-value. Categories 1 through 5 score +0, +1, +2, +3, +4.
Predictor 3 (differs by model): Model 1, SBP digit preference chi-square p-value; Model 2, intrathoracic or intraperitoneal surgery frequency chi-square p-value; Model 3, general anesthesia frequency chi-square p-value; Model 4, ACE-I/ARB chi-square p-value; Model 5, compliance outcome probability 2 × CDF. Categories 1 through 5 score +0, +1, +2, +3, +4; points are reversed for model 5 only (+4, +3, +2, +1, +0).

SBP: systolic blood pressure; DBP: diastolic blood pressure; ACE-I: angiotensin-converting-enzyme inhibitor; ARB: angiotensin II receptor blocker; CDF: cumulative probability distribution.
Categories for intraclass correlations and p-values: 1 = 0–0.20, 2 = 0.21–0.40, 3 = 0.41–0.60, 4 = 0.61–0.80, and 5 = 0.81+.

Figure 1 displays these five risk scores for each center, with fraudulent centers indicated in red.

[Figure 1. Center scores (0–12) for the five risk models, by fraud status (fraud versus none). 95% CI: (0.90–1.00), (0.81–0.99), (0.84–0.98), (0.82–0.98), (0.88–1.00). ROC: receiver operating characteristic.]
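The discrimination statistic reported below (the AUC) has a simple rank interpretation and can be computed without any modeling library; this is a minimal sketch with illustrative names, not the authors' code.

```python
def auc(fraud_scores, clean_scores):
    """Area under the ROC curve via the Mann-Whitney U statistic: the
    probability that a randomly chosen fraudulent center scores higher
    than a randomly chosen validated center, counting ties as half."""
    wins = ties = 0
    for f in fraud_scores:
        for c in clean_scores:
            if f > c:
                wins += 1
            elif f == c:
                ties += 1
    return (wins + 0.5 * ties) / (len(fraud_scores) * len(clean_scores))
```

An AUC of 1.0 means every fraudulent center outscores every validated center; 0.5 means the score carries no information.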

All five scores could discriminate well between fraudulent and validated centers, with AUC values ranging from 0.90 (95% CI: 0.81–0.99) for model 2 to 0.95 (95% CI: 0.90–1.00) for model 1, with this latter model having a smaller 95% CI. Figure 1 shows that the majority of the centers with fraud have higher scores than those centers without fabricated data.

In using these scores within an active multicenter trial, one would have to select a cutoff value and examine or monitor all centers with this score or greater. Table 4 shows the effect of using various risk score cutoffs on the number of centers selected, by fraud status. For model 1, examining centers with scores of seven or above would have detected 8 of the 9 fraudulent centers (89%) and involve detailed examination of 18 (18%) of the total centers within the trial, including false-positive scores for 10 centers (10%) with no fabricated data. For model 5, 24 centers (22%) would have high fraud scores, 8 (89%) with fraud and 16 (16%) with false-positive high scores.

Similar variables to those included within model 1 were also measured within the HOPE Trial, which randomized 9541 participants at 281 centers [34]. Using the 180 centers that had randomized at least 20 participants into the trial, we tested the various models using similar variables to those collected in POISE. Models 1, 2, 4, and 5 scores were calculated for each center, and a score of seven or greater was defined as a high fraud risk score. The score for model 1 was the only risk score that contained virtually equivalent variables to the model for POISE. For the remaining models, we selected relatively equivalent variables. We used HOPE diabetes inclusion criteria instead of surgery or anesthesia type, concomitant beta-blocker use instead of ACE-I or ARB use, and 75% compliance to 2 years instead of 80% compliance at 30 days. Figure 2 presents the model 1 scores for these HOPE centers alongside those for POISE. Within the HOPE data set, only 2 of the 180 centers had a model 1 risk score of at least seven (1.1%). For models 2, 4, and 5, the number of false-positive scores was 13 (7.2%), 11 (6.1%), and 23 (12.8%), respectively. These false-positive rates are comparable to those observed in POISE (see Table 4 for scores of seven or above), and even lower for models 1 and 5.

Table 5 shows the outcome rates and treatment effects for the POISE trial with and without the fraudulent data. The inclusion of the fraudulent data in the trial database did lead to minor variations in outcome rates, treatment estimates, 95% CIs, and p-values. However, these small differences did change the interpretation of which outcomes were statistically significant, at a p-value of 0.05, for the primary outcome and cardiovascular death.

Discussion

The process of quality assurance in multicenter trials will have multiple components, including oversight by trial committees, site training and communication, data cleaning and checking, central statistical monitoring, and on-site monitoring [1,5]. All trials would benefit from the careful evaluation of how to use each of these components individually and in

232 JM Pogue et al.

Table 4. Sensitivity of score cutoffs to detecting centers with fraud (n = 9 of 109 centers)

| Model | Measure | Cutoff 1 | Cutoff 2 | Cutoff 3 | Cutoff 4 (score ≥7) | Cutoff 5 | Cutoff 6 |
|---|---|---|---|---|---|---|---|
| 1 | Fraud (true positive) | 9 (100%) | 9 (100%) | 8 (89%) | 8 (89%) | 6 (67%) | 4 (44%) |
| 1 | No fraud (false positive) | 42 (42%) | 26 (26%) | 23 (23%) | 10 (10%) | 4 (4%) | 0 (0%) |
| 1 | Total high scores | 51 (47%) | 35 (32%) | 31 (28%) | 18 (17%) | 10 (9%) | 4 (4%) |
| 2 | Fraud (true positive) | 9 (100%) | 7 (78%) | 6 (67%) | 5 (56%) | 2 (22%) | 2 (22%) |
| 2 | No fraud (false positive) | 41 (41%) | 22 (22%) | 8 (8%) | 4 (4%) | 1 (1%) | 1 (1%) |
| 2 | Total high scores | 50 (46%) | 29 (27%) | 14 (13%) | 9 (8%) | 3 (3%) | 3 (3%) |
| 3 | Fraud (true positive) | 9 (100%) | 8 (89%) | 6 (67%) | 4 (44%) | 3 (33%) | 3 (33%) |
| 3 | No fraud (false positive) | 36 (36%) | 20 (20%) | 14 (14%) | 3 (3%) | 2 (2%) | 0 (0%) |
| 3 | Total high scores | 45 (41%) | 28 (26%) | 20 (18%) | 7 (6%) | 5 (5%) | 3 (3%) |
| 4 | Fraud (true positive) | 9 (100%) | 8 (89%) | 7 (78%) | 4 (44%) | 4 (44%) | 1 (11%) |
| 4 | No fraud (false positive) | 46 (46%) | 25 (25%) | 15 (15%) | 6 (6%) | 1 (1%) | 1 (1%) |
| 4 | Total high scores | 55 (51%) | 33 (30%) | 22 (20%) | 10 (9%) | 5 (5%) | 2 (2%) |
| 5 | Fraud (true positive) | 9 (100%) | 9 (100%) | 9 (100%) | 8 (89%) | 6 (67%) | 4 (44%) |
| 5 | No fraud (false positive) | 65 (65%) | 47 (47%) | 33 (33%) | 16 (16%) | 7 (7%) | 1 (1%) |
| 5 | Total high scores | 74 (68%) | 56 (51%) | 42 (39%) | 24 (22%) | 13 (12%) | 5 (5%) |

This table shows the true-positive and false-positive counts and percentages for each model at various score cutoffs; cutoffs increase from left to right, and the fourth corresponds to a risk score of at least seven, the threshold discussed in the text. A true positive is a high score (at or above the cutoff) for a center with fraud, and a false positive is a high score for a center without fraud. Percentages for centers with fraud are out of a total of 9 centers, false positives are out of 100 centers, and total high scores are out of all 109 centers. The total number of centers with high scores per cutoff indicates the number of centers where some action would be required by the trial management group (e.g., on-site visits).
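The tradeoff that Table 4 quantifies (a lower cutoff catches more fraudulent centers but flags more clean ones for costly follow-up) can be sketched as a small tabulation over center-level scores. The scores, fraud labels, and cutoffs below are invented for illustration and are not POISE data:

```python
# Tabulate true positives (fraudulent centers flagged) and false positives
# (clean centers flagged) for a range of risk-score cutoffs.
# All values here are illustrative, not trial data.

def cutoff_table(scores, fraud_centers, cutoffs):
    rows = []
    for cutoff in cutoffs:
        flagged = {c for c, s in scores.items() if s >= cutoff}
        true_pos = len(flagged & fraud_centers)   # fraudulent centers caught
        false_pos = len(flagged - fraud_centers)  # clean centers flagged
        rows.append((cutoff, true_pos, false_pos, len(flagged)))
    return rows

scores = {"A": 11, "B": 9, "C": 8, "D": 7, "E": 5, "F": 4, "G": 2, "H": 1}
fraud_centers = {"A", "B", "D"}

for cutoff, tp, fp, total in cutoff_table(scores, fraud_centers, range(4, 10)):
    print(f"score >= {cutoff}: {tp} true positives, {fp} false positives, "
          f"{total} centers to examine")
```

As in Table 4, the monitoring group would then pick the cutoff whose total count of flagged centers matches the number of on-site visits it can afford.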

[Figure 2: center risk scores (vertical axis, 0 to 12) plotted for POISE and HOPE, with centers with fraud distinguished from those with none.]

Figure 2. External validation of model 1 on a trial without fabricated data: a comparison of the distribution of center risk scores in POISE (with nine fraudulent centers) and HOPE (no fraudulent centers).
POISE: Perioperative Ischemic Evaluation; HOPE: Heart Outcomes Prevention Evaluation.

combination to arrive at a database that contains minimal bias and maximizes precision for trial outcomes.

The traditional model of frequent on-site monitoring has been estimated to take up to 30% of the total cost of a trial and could be reduced substantially with the use of central statistical monitoring [4,13]. However, cost is not the only issue with traditional on-site monitoring. The process of data collection in trials is changing, with greater use of electronic data capture (EDC). Paper records may no longer exist at sites to compare with values within the trial database. Information is now commonly entered directly onto an electronic device during an interview with a

Detecting fraud in trials 233

Table 5. Impact on the results of the POISE Trial with and without inclusion of the centers with fraudulent data

Fraudulent data excluded:

| Outcome | Metoprolol | Placebo | HR (95% CI) | p-value |
|---|---|---|---|---|
| Primary: CV death, MI, cardiac arrest | 244 (5.8%) | 290 (6.9%) | 0.84 (0.70–0.99) | 0.040 |
| CV death | 75 (1.8%) | 58 (1.4%) | 1.30 (0.92–1.83) | 0.137 |
| Non-fatal MI | 152 (3.6%) | 215 (5.1%) | 0.70 (0.57–0.86) | <0.001 |
| Non-fatal cardiac arrest | 21 (0.5%) | 19 (0.5%) | 1.11 (0.60–2.06) | 0.744 |

Fraudulent data included:

| Outcome | Metoprolol | Placebo | HR (95% CI) | p-value |
|---|---|---|---|---|
| Primary: CV death, MI, cardiac arrest | 284 (6.1%) | 328 (7.1%) | 0.86 (0.73–1.01) | 0.064 |
| CV death | 83 (1.8%) | 59 (1.3%) | 1.41 (1.01–1.97) | 0.044 |
| Non-fatal MI | 182 (3.9%) | 247 (5.3%) | 0.73 (0.60–0.89) | 0.001 |
| Non-fatal cardiac arrest | 24 (0.5%) | 24 (0.5%) | 1.00 (0.57–1.76) | 0.994 |

POISE Trial: Perioperative Ischemic Evaluation Trial; HR: hazard ratio; CI: confidence interval; CV: cardiovascular; MI: myocardial infarction.
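The comparison in Table 5 amounts to computing the same pooled summary twice, once with the flagged centers excluded. Here is a minimal sketch with invented counts, using a crude risk ratio as a stand-in for the hazard ratios from the time-to-event models the trial actually reported:

```python
# Recompute a pooled outcome summary with and without suspect centers.
# Counts are invented; POISE used hazard ratios from survival models,
# not the crude risk ratio shown here.

def pooled_summary(centers, exclude=frozenset()):
    events = {"treatment": 0, "control": 0}
    totals = {"treatment": 0, "control": 0}
    for name, arms in centers.items():
        if name in exclude:
            continue  # drop centers flagged by the risk score
        for arm, (n_events, n_randomized) in arms.items():
            events[arm] += n_events
            totals[arm] += n_randomized
    rate_t = events["treatment"] / totals["treatment"]
    rate_c = events["control"] / totals["control"]
    return round(rate_t, 4), round(rate_c, 4), round(rate_t / rate_c, 2)

centers = {
    "site1": {"treatment": (10, 200), "control": (14, 200)},
    "site2": {"treatment": (12, 250), "control": (15, 250)},
    "site3": {"treatment": (2, 100), "control": (2, 100)},  # flagged center
}

print("all centers:   ", pooled_summary(centers))
print("site3 excluded:", pooled_summary(centers, exclude={"site3"}))
```

Running both versions side by side, as Table 5 does, shows directly how much the flagged centers move the event rates and the treatment estimate.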

trial participant. The EDC record may be the only source document that exists in many trials. The future of trial quality assurance must by necessity move to a process that relies predominantly on centralized data checking. Therefore, we need to study how best to avoid bias and excessive variability or noise by identifying important procedural errors at sites, critical data recording errors, and fraudulent data. This article represents a first step in developing an evidence-based set of quality assurance tools that can be used to improve the quality of the research that we undertake.

In this article, we found that the best predictors of fraud are the combination of high similarity of repeated measurements (e.g., systolic blood pressure), estimated through a site ICC, and a higher similarity of center baseline characteristics (e.g., diastolic blood pressure) when compared with data from other centers. On its own, the lack of variability in repeated measurement data at a center may in general be the most sensitive single predictor of possible fraud. However, it is the combination of great regularity within the data, accompanied by very typical baseline characteristic measurements, that best indicates the presence of fabricated data. Those inventing data are creating what they imagine to be a group of typical trial participants, whose baseline characteristics appear very normal, but the resulting means and frequencies are too similar to those of all the other centers. They also fail to recreate the natural variability of continuous physical measurements over time and between individuals, making these data too regular and predictable. Although no prognostic model will ever be 100% accurate, the scores developed here had reasonable success at discriminating between centers with and without fabricated data.

We demonstrated, in an independent database, that the calculated risk scores for centers with valid data are low. This finding is encouraging because the POISE and HOPE trials have some important design differences. Both are cardiovascular trials, but POISE enrolled high-risk cardiovascular patients undergoing surgery, with short-term treatment and follow-up to 30 days, whereas HOPE had a secondary prevention population, with long-term therapy and follow-up to a mean of 4.5 years.

An important limitation of this article is that the validation data set did not contain any case of fraud and can therefore only demonstrate the false-positive rates for these risk scores. Further validation of these risk scores within data sets that do contain significant numbers of fraudulent centers will be required to fully validate the current models. There is also a need for further validation of these models in other types of trials, within different research areas. It may be important to tailor these scores to specific research areas, but a common pattern across all areas should include the investigation of centers which have high regularity in physical measurements and very typical baseline characteristics. We are hopeful that these risk scores may have equivalents in other trials that will also lead to more effective identification of fraud. It should be noted that standard data cleaning must be implemented within any trial prior to calculating any fraud risk score, as random noise in the data can easily mask the patterns we are trying to detect. Also, the percentage of fabricated data at a center will directly influence our ability to identify it, making rare fabricated values almost impossible to detect through any statistical method.

It is important to detect fraud in trials since it could add noise to trial results and reduce the power of the trial to detect treatment effects, if they exist. In the POISE Trial [26], we found that the fabricated data did add some noise to trial results, likely due to the relative size of the fabricated data. Others have found that a small proportion of fabricated data did not change their multicenter trial results [6,7,15,19]. It is possible that trials where the treatment is not masked may be at greater risk of bias due to fabricated data. Regardless of the effect of fraud on trial results, it is important to detect this in a trial because […] clinical trials [1,2,5,7].
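The two signals highlighted above, repeated measurements that are too regular within a center and center-level baseline summaries that sit suspiciously close to the all-trial average, can each be sketched in a few lines. The data below are invented, and these crude summaries merely stand in for the paper's site ICC and fitted risk-score models:

```python
import statistics

# Two crude per-center regularity checks on repeated readings (e.g., SBP).
# Invented data; the published models combine a site ICC with other
# predictors in a formal risk score rather than using raw summaries.

def mean_within_person_sd(participants):
    """Average SD of each participant's repeated readings; fabricated
    series tend to be too regular, giving an unusually low value."""
    return statistics.mean(statistics.stdev(p) for p in participants)

def distance_from_trial_mean(participants, trial_mean):
    """Distance of the center's mean from the all-trial mean; fabricated
    centers tend to look 'too typical', giving a small distance."""
    readings = [x for p in participants for x in p]
    return abs(statistics.mean(readings) - trial_mean)

centers = {
    "site1": [[118, 126, 131], [140, 133, 122]],
    "site2": [[128, 129, 128], [131, 130, 131]],  # implausibly regular
}
trial_mean = statistics.mean(
    x for ps in centers.values() for p in ps for x in p
)

for site, participants in centers.items():
    print(site,
          round(mean_within_person_sd(participants), 2),
          round(distance_from_trial_mean(participants, trial_mean), 2))
```

A center that combines a low within-person SD with a small distance from the trial mean is the pattern the article describes, and it would rank high on a risk score of this kind.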

In the future, it would be useful to build other risk scores to identify other issues with clinical site performance, including major protocol deviations, systematic under-reporting of adverse events or study outcomes, or problems in the consent process. One can envision central statistical monitoring in trials as a toolbox of indices for various suspected problems at centers, guiding the monitoring process. Further research in this area is needed to equip trialists with the tools they need to identify those who threaten the reputation of randomized controlled trials. Use of such scores could streamline the process of quality assurance in multicenter trials, leading to greater effectiveness and efficiency. It is only through systematic study of trial methodology that we can identify evidence-based best practices and arrive at sensible guidelines for the conduct of clinical trials [8].

Funding

Funding for the conduct of the POISE trial was received from the Canadian Institutes of Health Research; the Commonwealth Government of Australia's National Health and Medical Research Council; the Instituto de Salud Carlos III (Ministerio de Sanidad y Consumo) in Spain; the British Heart Foundation; and AstraZeneca, who provided the study drug and funding for drug labelling, packaging, and shipping and helped support the cost of some national POISE investigator meetings. The HOPE trial was funded by the Medical Research Council of Canada, Hoechst-Marion Roussel, AstraZeneca, King Pharmaceuticals, Natural Source Vitamin E Association and Negma, and the Heart and Stroke Foundation of Ontario.

Acknowledgments

The authors would like to acknowledge the contribution of Dr David DeMets for his thoughtful review and advice on an earlier version of this article.

Conflict of interest

POISE: Dr Yusuf has received consultancy fees, research grants, and honoraria from AstraZeneca, which provided the study drug for the POISE trial. HOPE: Dr Yusuf was supported by a Senior Scientist Award of the Medical Research Council of Canada and a Heart and Stroke Foundation of Ontario Research Chair. The other authors had no competing interests.

References

1. Baigent C, Harrell F, Buyse M, Emberson J, Altman D. Ensuring trial validity by data quality assurance and diversification of monitoring methods. Clin Trials 2008; 5: 49–55.
2. Buyse M, George S, Evans S, et al. The role of biostatistics in the prevention, detection and treatment of fraud in clinical trials. Stat Med 1999; 18: 3435–51.
3. DeMets D. Distinctions between fraud, bias, errors, misunderstanding, and incompetence. Control Clin Trials 1997; 18: 637–50.
4. Eisenstein E, Collins R, Cracknell B, et al. Sensible approaches for reducing clinical trials costs. Clin Trials 2008; 5: 75–84.
5. Knatterud G, Rockhold F, George S, et al. Guidelines for quality assurance in multicenter trials: A position paper. Control Clin Trials 1998; 19: 477–93.
6. Neaton J, Bartsch G, Broste S, et al. A case of data alteration in the Multiple Risk Factor Intervention Trial (MRFIT). Control Clin Trials 1991; 12: 731–40.
7. Peto R, Collins R, Sackett D, et al. The trials of Dr. Bernard Fisher: A European perspective on an American episode. Control Clin Trials 1997; 18: 1–13.
8. Yusuf S, Bosch J, Devereaux P, et al. Sensible guidelines for the conduct of large randomized trials. Clin Trials 2008; 5: 38–39.
9. Morrison B, Cochran C, White J, et al. Monitoring the quality of conduct of clinical trials: A survey of current practices. Clin Trials 2011; 8: 342–49.
10. Shapiro M, Charrow R. The role of data audits in detecting scientific misconduct: Results of the FDA program. JAMA 1989; 261: 2505–11.
11. Weiss R, Vogelzang N, Peterson B, et al. A successful system of scientific data audits for clinical trials: A report from the Cancer and Leukemia Group B. JAMA 1993; 270: 459–64.
12. Friedman LM, Furberg CD, DeMets DL. Fundamentals of Clinical Trials. Springer, New York, 2010.
13. Eisenstein E, Lemons P II, Tardiff B, et al. Reducing the costs of phase III cardiovascular clinical trials. Am Heart J 2005; 149: 482–88.
14. Bailey K. Detecting fabrication of data in a multicenter collaborative animal study. Control Clin Trials 1991; 12: 741–52.
15. Christian M, McCabe M, Korn E, et al. The National Cancer Institute audit of the National Surgical Adjuvant Breast and Bowel Project B-06. N Engl J Med 1995; 333: 1469–74.
16. Ranstam J, Buyse M, George S, et al. Fraud in medical research: An international survey of biostatisticians. Control Clin Trials 2000; 21: 415–27.
17. Swazey J, Anderson M, Lewis K. Ethical problems in academic research. Am Sci 1993; 81: 542–52.
18. White C. Suspected research fraud: Difficulties of getting the truth. BMJ 2005; 331: 281–88.
19. COMMIT (ClOpidogrel and Metoprolol in Myocardial Infarction Trial) Collaborative Group. Addition of clopidogrel to aspirin in 45852 patients with acute myocardial infarction: Randomized placebo-controlled trial. Lancet 2005; 366: 1607–21.


20. Al-Marzouki S, Evans S, Marshall T, Roberts I. Are these data real? Statistical methods for the detection of data fabrication in clinical trials. BMJ 2005; 331: 267–70.
21. The ESPS2 Group. European Stroke Prevention Study 2. Efficacy and safety data. J Neurol Sci 1997; 151: S1–S77.
22. Schraepler J, Wagner G. Identification, characteristics and impact of faked interviews in surveys. IZA DP No. 969, 2011. The Institute for the Study of Labor (IZA), www.iza.org.
23. Murphy J, Eyerman J, McCue C, Hottinger C, Kenner J. Interviewer falsification detection using data mining. In Proceedings of Statistics Canada Symposium, Ottawa, Ontario, Canada, 2005, 11-522-XIE.
24. Svolba G, Bauer P. Statistical quality control in clinical trials. Control Clin Trials 1999; 20: 519–30.
25. Taylor R, McEntegart D, Stillman E. Statistical techniques to detect fraud and other data irregularities in clinical questionnaire data. Drug Inform J 2002; 36: 115–25.
26. Devereaux P, Yang H, Yusuf S, et al. Effects of extended-release metoprolol succinate in patients undergoing non-cardiac surgery (POISE trial): A randomised controlled trial. Lancet 2008; 371: 1839–47.
27. Benford F. The law of anomalous numbers. Proc Am Phil Soc 1938; 78: 551–72.
28. Preece D. Distribution of final digits in data. Statistician 1981; 30: 31–60.
29. SAS Institute. SAS, Version 9.1 (computer program). Cary, NC, 2002.
30. Sullivan L, Massaro J, D'Agostino R Sr. Presentation of multivariate data for clinical use: The Framingham Study risk score function. Stat Med 2004; 23: 1631–60.
31. Evans S. Statistical aspects of the detection of fraud. In Lock S, Wells F (eds). Fraud and Misconduct in Medical Research. London: BMJ Publishing Group, 1996, pp. 226–39.
32. Furnival G, Wilson R. Regression by leaps and bounds. Technometrics 1974; 16: 499–511.
33. Hosmer DWJ, Lemeshow S. Applied Logistic Regression. John Wiley & Sons, New York, 1989.
34. The Heart Outcomes Prevention Evaluation (HOPE) Study Investigators. Effects of an angiotensin-converting-enzyme inhibitor, ramipril, on cardiovascular events in high-risk patients. N Engl J Med 2000; 342: 145–53.

Reproduced with permission of the copyright owner. Further reproduction prohibited without

permission.
