
Chapter 15: Biostatistics1
CHARLES HERRING
I. TYPES OF DATA1
A. Nonparametric (aka discrete) variables1
1. Nominal: Numbers are purely arbitrary, without regard to any order or ranking of severity.2–6
This includes dichotomous (binary) data (lived/died, yes/no, hospitalized/not hospitalized) and
categorical data without order or inherent value (race, eye color, hair color, religion, blood type,
acute renal failure [ARF]/congestive heart failure [CHF]/diabetes mellitus [DM]).5,6
2. Ordinal: Categorical, but scored on a continuum, without a consistent level of magnitude of
difference between ranks (pain scale, New York Heart Association [NYHA] class, trauma score,
coma score).2–8
B. Parametric (aka continuous or measuring): Order and consistent level of magnitude of difference be-
tween data units (drug concentrations, glucose, forced expiratory volume in 1 second [FEV1], heart
rate, blood pressure [BP]).2–6

II. MEASURES OF CENTRAL TENDENCY9


A. Mean (aka average): “Sum of all values divided by the total number of values.”4 Mean is affected by
outliers (extreme values) and is used for parametric data.4,5,7,9 Mu (μ) is the population mean.4,7
X-bar (X̄) is the sample mean.4,7,9
B. Median (aka 50th percentile): The “mid-most” point. Median is not affected by outliers and may be
used for ordinal or parametric data.4,7,9
C. Mode: The most common value.4,7,9 Mode is not affected by outliers and may be used for nominal,
ordinal, or parametric data.4,7,9
D. A weakness of measures of central tendency is that they do not describe variability or spread of data.1
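The three measures above can be illustrated with Python's standard statistics module. The glucose values here are hypothetical, chosen so the single outlier shows its effect on the mean but not on the median or mode:

```python
import statistics

# Hypothetical fasting glucose values (mg/dL); the outlier 400
# pulls the mean upward but leaves the median and mode unchanged.
glucose = [90, 92, 95, 95, 100, 400]

mean = statistics.mean(glucose)      # affected by the outlier
median = statistics.median(glucose)  # mid-most point, outlier-resistant
mode = statistics.mode(glucose)      # most common value
```

Dropping the outlier moves the mean close to the median, which is one quick check for skewed data.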

III. MEASURES OF VARIABILITY: DESCRIBE DATA SPREAD AND CAN HELP INFER STATISTICAL SIGNIFICANCE9
A. Range: The interval between lowest and highest values. Range is simply descriptive and only consid-
ers extreme values, so it is affected by outliers.3,4,7,9
B. Interquartile range: The interval between the 25th and 75th percentiles, so it is not affected by outliers.
Because it is directly related to median, it is typically used for ordinal data.5,7,9
C. Variance: Deviation from the mean, expressed as the square of the units used.3,9
1. Variance = sum of (mean − data point) squared, divided by n − 1
2. Variance = Σ(X̄ − Xi)2 / (n − 1)
D. Standard deviation (SD): Square root of variance. SD estimates data scatter around a sample mean.
SD is only used for parametric data.3–5,7,9 Sigma (σ) is the population SD.3 S is the sample SD.3
1. SD = √variance
E. Standard error of the mean (SEM) (aka standard error [SE]): SD divided by the square root of n. As
a measure of variability, SEM is misleading. If one is provided SEM, SD and/or confidence intervals
(CIs) should be calculated to see true sample variability.3,4,7,9
1. SEM = SD / √n

F. CI: In medical literature, a 95% CI is most frequently used, and it is a range of values that “if the entire popu-
lation could be studied, 95% of the time the true population value would fall within the CI estimated from
the sample.”9 CIs are descriptive and inferential. All values contained in the CI are statistically possible.4
1. 95% CI = X̄ ± 1.96 (SEM)


2. Interpretation of statistical significance for CI in superiority trials:
a. For differences such as BP reduction, cholesterol reduction, fingerstick blood sugar (FSBS) or
A1c reductions, relative risk reductions or increases, and absolute risk reductions or increases,
if the 95% CI includes 0, then the results are not statistically significant (NSS).4,7
b. For ratios such as relative risk (aka risk ratio), odds ratio, and hazards ratio, if the 95% CI
includes 1, then the results are NSS.5–7
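The SD, SEM, and 95% CI formulas above fit in a few lines of Python's standard library. The systolic BP reductions below are hypothetical:

```python
import math
import statistics

# Hypothetical sample of systolic BP reductions (mm Hg)
reductions = [8, 10, 12, 9, 11, 10, 12, 8]

n = len(reductions)
mean = statistics.mean(reductions)
sd = statistics.stdev(reductions)   # sample SD = sqrt(variance), n - 1 denominator
sem = sd / math.sqrt(n)             # SEM = SD / sqrt(n)
ci_low = mean - 1.96 * sem          # 95% CI = mean ± 1.96 (SEM)
ci_high = mean + 1.96 * sem
```

If this CI described a difference (e.g., active drug minus placebo) and excluded 0, the result would be statistically significant by the rule above.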

IV. HYPOTHESIS TESTING10


A. H0 (null hypothesis) = For superiority trials, H0 is that no difference exists between the populations
studied.4,5,10 H1 (alternative hypothesis) = For superiority trials, H1 is that a difference does exist
between the populations studied.5,10 Statistical significance is tested (hypothesis testing) to indicate
whether H0 should be accepted or rejected.4,6,10
1. For superiority trials, H0 is “rejected” if a statistically significant difference between groups is
detected (results unlikely due to chance). H0 is “accepted” if no statistically significant difference
is detected.4,6,10
B. A type 1 error occurs if one rejects H0 when, in fact, H0 is true.5,10 For superiority trials, this occurs
when one finds a difference between treatment groups when, in fact, no difference exists.4–6,10 Alpha
(α) is the probability of making a type 1 error.5,6,10 H0 is rejected when P < α.4,10 By convention, α is
usually 0.05, which means that 1 time out of 20, a type 1 error will be committed. This is a conse-
quence that investigators are generally willing to accept and is denoted in trials as a P < 0.05.4–6,10
C. A type 2 error occurs when one accepts H0 when, in fact, H0 is false.4–6,10 For superiority trials, this
is when one finds no difference between treatment groups when, in fact, a difference does exist.5,6
Beta (β) is the probability of making a type 2 error.5,6,10 By convention, β is 20% (0.2).4,5,10
D. Power is the ability of an experiment to detect a statistically significant difference between samples
when a significant difference truly exists.4–6,10 (Power = 1 − β) Inadequate power may cause one to
conclude that no difference exists when, in fact, a difference does exist (type 2 error).10 Note that in
most cases, power is an issue only if one accepts H0. If one rejects H0, there is no way that one could
have made a type 2 error1 (Table 15-1).
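Alpha, beta, and power together determine how many subjects a superiority trial needs. As a rough illustration only (a normal-approximation sketch, not the exact t-based method trialists use), the per-group sample size for comparing two means can be computed from z-values; all numbers here are hypothetical:

```python
import math
from statistics import NormalDist

def n_per_group(alpha, power, sd, delta):
    """Normal-approximation sample size per group for a two-sample
    comparison of means: n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)   # power = 1 - beta
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

# Detecting a 0.5-SD difference with alpha = 0.05 and 80% power (beta = 0.2)
n = n_per_group(alpha=0.05, power=0.80, sd=1.0, delta=0.5)
```

Raising the desired power (lowering β) increases the required sample size, which is why underpowered trials that "accept H0" are so common.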
E. Clinical versus statistical significance
1. Just because one finds a statistically significant difference does not mean that the difference is
clinically meaningful.4,10 With enough patients, one can find all kinds of statistically significant
differences that are not clinically meaningful. For example, with a large enough sample size, one
could detect a statistically significant difference between one BP medication that decreases sys-
tolic BP by 10 mm Hg and another that decreases BP by 11 mm Hg. This would be statistically but
not clinically significant. Also, lack of statistical significance does not mean the results are unim-
portant.4,11 A nonstatistically significant difference is more likely to be accepted as being clinically
significant in the instance of safety issues such as adverse effects.1
2. To help judge the clinical significance of a statistically significant data set, determine what
others think is clinically significant by considering the effect used in the sample size calculation

Table 15-1 TYPE 1 AND 2 ERROR FOR SUPERIORITY TRIALS

                                                     Reality
Decision from statistical test     Difference exists (H0 false)              No difference exists (H0 true)
Difference found (Reject H0)       Correct (no error)                        Incorrect: type 1 error (aka false positive)
No difference found (Accept H0)    Incorrect: type 2 error (false negative)  Correct (no error)

Shargel_8e_CH15.indd 304 07/08/12 1:31 AM



(if reported), the existing evidence-based or expert consensus statements, and any cost-
effectiveness or decision analyses that have been performed.8,12 Absent such guidance, require
that the minimum worthwhile effect be large when the intervention is costly (e.g., in terms of
time, money, or other resources), the intervention is high risk, a patient is risk averse, or the out-
come is unimportant or has intermediate importance but with uncertain benefit to patients.8,12
Accept the minimum worthwhile effect as small when the intervention is low cost, the interven-
tion is low risk, the patient is risk taking, or the intervention is important and has an unambigu-
ous outcome (e.g., death).8,12

V. STATISTICAL INFERENCE TECHNIQUES IN HYPOTHESIS TESTING FOR PARAMETRIC DATA13
A. T test
1. Nonpaired t test. Observations between groups are independent as in a parallel trial.4,13
2. Paired t test (aka matched or repeated measures data). Patients are their own control (i.e., obser-
vations between groups are dependent as in a pretest/posttest or crossover trial).4,13
B. The t test is the statistical test of choice when making a single comparison of parametric data
between two groups. When making either multiple comparisons between two groups or a single
comparison between multiple groups, type 1 error risk increases, and one should make an effort to
keep the type 1 error risk ≤5% (i.e., α = 0.05). One of the best ways to help control for type 1 error
risk when analyzing parametric data for multiple groups or comparisons is analysis of variance
(ANOVA) testing.1
C. ANOVA tests for a statistically significant difference among a group’s collective values.13 ANOVA
involves calculation of an F-ratio, which answers the question, “Is ‘the variability between the groups
large enough in comparison to the variability of data within each group to justify the conclusion that
two or more of the groups differ?’”5,13
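Assuming the SciPy library is available, the choice between a t test (single comparison, two groups) and a one-way ANOVA (three or more groups) can be sketched as follows; the LDL-reduction data are hypothetical:

```python
from scipy import stats

# Hypothetical LDL reductions (mg/dL) under three drugs
drug_a = [30, 32, 35, 31, 33]
drug_b = [28, 29, 31, 27, 30]
drug_c = [40, 42, 41, 43, 39]

# Single comparison of parametric data between two independent groups:
# nonpaired t test
t_stat, t_p = stats.ttest_ind(drug_a, drug_b)

# Three or more independent groups: one-way ANOVA (F-ratio) instead of
# repeated t tests, to help keep the type 1 error risk at 5%
f_stat, f_p = stats.f_oneway(drug_a, drug_b, drug_c)
```

A significant F test only says that some difference exists among the groups; locating it requires the post hoc multiple comparison methods described below.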
D. ANOVAs for independent (aka nonpaired) samples
1. One-way ANOVA. Used if no confounders, the experimental groups differ in one factor (e.g., type
of drug evaluated), and greater than or equal to three independent (i.e., nonpaired) groups are
being evaluated.3
E. Multifactorial ANOVA for independent (aka nonpaired) samples. Any type of ANOVA control-
ling for one or more confounders (i.e., samples differ in greater than or equal to two factors). These
include the following:
1. Two-way ANOVA. Used if there is one confounder (i.e., samples differ in two factors at a time—
drug and confounder) for two or more independent groups (nonpaired).1
a. Example 1: Weight loss studies with at least two groups would use this because heavier
patients lose weight faster than less heavy patients. This type of ANOVA will show if vari-
ability in results is attributable to either factor independently and/or the two factors are
combined.1
F. Analysis of covariance (ANACOVA, ANCOVA) for independent (aka nonpaired) samples. Used if
there are two or more confounders and two or more independent groups (nonpaired).1
1. Three-way ANOVA. Used if there are two confounders (i.e., samples differ in three factors at a
time—drug and two confounders) for two or more independent groups (nonpaired).1
2. Four-way ANOVA. Used if there are three confounders (i.e., samples differ in four factors at a
time—drug and three confounders) for two or more independent groups (nonpaired).
3. . . . and so on . . .
G. ANOVAs for related (aka paired, matched, repeated) samples
1. Repeated measures ANOVA. Used if there are no confounders and three or more related samples
(paired). Subjects participate in more than one treatment group as in a crossover trial. 1
2. Two-way repeated measures ANOVA. Used if there is one confounder and two or more related
samples (paired).1
3. Repeated measures regression. Used if there are two or more confounders and two or more related
samples (paired).1
H. ANOVA will indicate whether a difference exists between groups but will not indicate where this
difference exists.5 For this, one must use multiple comparison methods (MCMs). These are types of
post hoc tests. MCMs are performed only after a statistically significant F test.13


VI. STATISTICAL INFERENCE TECHNIQUES IN HYPOTHESIS TESTING FOR NONPARAMETRIC DATA14
A. If data are not parametric, nonparametric statistical methods must be used.14 We will start with nom-
inal data. Chi-square and Fisher exact tests can be used for proportions and frequencies of nominal
data matrices (e.g., prevalence).14
B. Chi-square tests are “used to answer questions about rates, proportions, or frequencies.”5,14 Used to
determine whether a difference between populations or groups exists but will not indicate where the
difference lies.5,14
1. Chi-square test of independence (aka test of association). Used to compare two or more groups
(≥2 × 2 table, i.e., 2 × 2, 2 × 3, 2 × 4, 2 × 5, 3 × 3, etc.).4,5,14 (Sample size must be >20.)14 Chi-square
test of independence “cannot be used for paired data.”14
a. Example of a chi-square test: Might compare the Board of Pharmacy exam pass rates of
candidates from three different pharmacy schools. This kind of table is a contingency table.
It expresses the idea that one variable (such as passing or failing the examination) may be
contingent on the other variable (such as which pharmacy school one attended).1

Passing Scores Failing Scores

School A 90 5
School B 120 9
School C 130 11

b. There are two possible results (passing vs. failing) for three schools of pharmacy. Therefore,
this is called a 2 × 3 table or matrix.1
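Assuming SciPy is available, the pass/fail contingency table above can be analyzed with a chi-square test of independence; `chi2_contingency` is SciPy's implementation, not part of the original example:

```python
from scipy.stats import chi2_contingency

# Contingency table from the pass/fail example above:
# rows = schools A, B, C; columns = passing, failing
observed = [[90, 5],
            [120, 9],
            [130, 11]]

chi2, p, dof, expected = chi2_contingency(observed)
# degrees of freedom = (rows - 1) * (columns - 1) = (3 - 1) * (2 - 1) = 2
```

A small p here would say only that pass rate and school are associated somewhere in the table; post hoc tests would be needed to say which schools differ.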
C. Once chi-square calculations (for greater than 2 × 2 contingency table) indicate a statistically
significant difference, one must perform post hoc tests to determine which groups or treatments
differ from one another.5 These post hoc tests should only be performed if the chi-square test
was significant.1
D. Fisher exact test may be used when the sample size for a nominal data set is between 20 and 40.14 It
may also be used for a 2 × 2 matrix when a nominal data set is 20 or less.4,14 In addition, Fisher exact
test may be used for matched or paired data (i.e., crossover or prepost test design).1
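A hedged sketch of the Fisher exact test on a small, hypothetical 2 × 2 nominal data set (n = 20), assuming SciPy is available:

```python
from scipy.stats import fisher_exact

# Hypothetical 2 x 2 nominal data set with a small sample (n = 20):
# rows = treatment / control, columns = event / no event
table = [[1, 9],
         [8, 2]]

# Exact test appropriate where chi-square's sample-size requirement fails
odds_ratio, p = fisher_exact(table)
```

The odds ratio returned is (1 × 2) / (9 × 8), i.e., well below 1, consistent with fewer events under treatment in this made-up table.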

VII. STATISTICAL INFERENCE TECHNIQUES IN HYPOTHESIS TESTING FOR ORDINAL DATA14
A. Mann-Whitney U test and Wilcoxon rank sum test. These may be used when one comparison is
being made with two nonpaired groups (which need not be of equal size).5,14 These are not appropri-
ate for grouping into a “cumulative frequency distribution.”14
B. Kolmogorov-Smirnov test. This can also be used when one comparison is being made with two non-
paired groups.14 However, this test is used when data are grouped into a “cumulative frequency distri-
bution” (i.e., individual scores or data values are lumped together into groups for further analysis).14
An example of a cumulative distribution grouping is the manner in which the Glasgow Coma Score
is used to help determine trauma score.14
C. Wilcoxon signed rank test (not to be confused with Wilcoxon rank sum test). This may be used when
one comparison is being made with two paired groups.4,14
D. Kruskal-Wallis test (for nonpaired data). This may be used when two or more comparisons are being
made with three or more nonpaired groups.4,14
E. Friedman test (for paired data). This may be used when two or more comparisons are being made
with three or more paired groups.4,14
1. As with chi-square and ANOVA tests for more than two groups, once Kruskal-Wallis or Friedman
tests indicate a statistically significant difference, one must perform post hoc tests to determine
which groups or treatments differ from one another.5 These post hoc tests should only be per-
formed if the Kruskal-Wallis or Friedman test was statistically significant.1
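The ordinal tests above map onto SciPy functions as sketched below; the pain-score data are hypothetical, and pairing is simply assumed where the test requires it:

```python
from scipy import stats

# Hypothetical pain scores (0-10 ordinal scale), six subjects per group
group_1 = [3, 4, 2, 5, 4, 3]
group_2 = [6, 7, 5, 8, 6, 7]
group_3 = [4, 5, 3, 6, 5, 4]

# Two nonpaired groups: Mann-Whitney U (Wilcoxon rank sum is equivalent)
u_stat, u_p = stats.mannwhitneyu(group_1, group_2)

# Two paired groups (e.g., pretest/posttest): Wilcoxon signed rank
w_stat, w_p = stats.wilcoxon(group_1, group_2)

# Three or more nonpaired groups: Kruskal-Wallis
h_stat, h_p = stats.kruskal(group_1, group_2, group_3)

# Three or more paired groups (e.g., crossover): Friedman
fr_stat, fr_p = stats.friedmanchisquare(group_1, group_2, group_3)
```

As with ANOVA and chi-square, a significant Kruskal-Wallis or Friedman result still needs post hoc testing to locate the difference.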


F. ANOVA rank tests are generally used to account for confounders in ordinal data with the excep-
tion of using repeated measures regression to account for two or more confounders in crossover
design.1

VIII. CORRELATION11 SIMPLY EXPLAINS THE STRENGTH OF A RELATIONSHIP BETWEEN TWO VARIABLES4,11,15
A. The sample correlation coefficient for parametric data is the Pearson correlation coefficient or
Pearson product moment r.5,11
1. r ranges from −1 to +1.5,11
2. −1 = perfect negative linear relationship5,11
3. +1 = perfect positive linear relationship5,11
4. zero = no relationship5,11
5. H0 is that r = 0 (i.e., no relationship between variables)11
6. There is not a consistent level of magnitude of difference between r values.15 Therefore, an r of
0.25 is not half the relationship of an r of 0.5, and “an r of ±0.5 does not imply that the
strength of the relationship is ‘half-way’ between no correlation/relationship and a perfect
correlation.”11,15
7. The strength of a relationship depends on the data being evaluated. In one field of study, an r = 0.6
may be a strong correlation, whereas in another field of study, an r = 0.8 may be required to
indicate a strong correlation.1
8. Some general guidelines are the following:
a. r ≤ 0.25: “doubtful” correlation15,16
b. r = 0.26–0.5: “fair” correlation15,16
c. r = 0.51–0.75: “good” correlation15,16
d. r > 0.75: “superior” correlation15,16
9. There is a P value associated with r and CIs can be calculated5 [e.g., r = 0.74 (95% CI:
0.53 to 0.98)].
B. The sample correlation coefficient for ordinal data is Spearman rho or Spearman rank order r.5,11
C. Most of the time, relationships are “confounded by extraneous variables.”11 This “adversely affects a
‘perfect’ correlation.”11
1. Example: There is an association with overuse of albuterol (ProAir HFA) and mortality. How-
ever, there are multiple confounders that prevent an absolute understanding of the relation-
ship. Possible confounders include the type of controller medication being used, inhalation
technique/medication compliance, concomitant disease states (chronic obstructive pulmonary
disease [COPD], heart failure [HF]), concomitant medications (β-blockers), and one of many
triggers.1
D. Correlation has many limitations. Although it does a good job of recognizing and measuring the
strength of relationship(s) between variables, it does not establish causality.11,15 Remember the age-
less question of which came first, the chicken or the egg? A more relevant example is, does overuse of
albuterol (ProAir HFA) lead to poorly responsive, severe asthma attacks? Or does poorly responsive,
severe asthma lead to albuterol (ProAir HFA) overuse?1
E. Correlations do not have the ability to “predict” one variable based on another.1
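A minimal sketch of both sample correlation coefficients, assuming SciPy is available; the dose–concentration pairs are hypothetical, and the strictly increasing data make Spearman rho exactly 1 even though Pearson r stays slightly below 1:

```python
from scipy import stats

# Hypothetical dose (mg) and drug concentration (mg/L) pairs
dose = [10, 20, 30, 40, 50]
conc = [1.1, 2.0, 2.9, 4.2, 5.0]

# Parametric data: Pearson product moment r (with its P value)
r, r_p = stats.pearsonr(dose, conc)

# Ordinal (rank-based) data: Spearman rho
rho, rho_p = stats.spearmanr(dose, conc)
```

Even an r near 1 here would say nothing about causality or confounders, per the limitations above.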

IX. REGRESSION TAKES CORRELATION ONE STEP FURTHER


Regression not only recognizes and measures the strength of relationship(s) between variables, it also
describes a relationship such that an equation for predicting one variable from another variable can be
developed.4,11
A. With simple linear regression, where m = slope of the line, “y is the dependent variable, x is the
independent variable,” and “b is the y-intercept of the line.”11
y = mx + b
B. r-squared (r2) is known as the coefficient of determination. It “represents the percentage of variation
in y that is accounted for by x.”8 For example, if a study produces a correlation/regression analysis


between stroke and hypertension reporting an r = 0.70, then r2 = 0.49, and one could say that 49%
of stroke risk may be explained by BP.1
1. Conversely, 1 − r2 represents the proportion of the variation that is not related to the indepen-
dent variables (i.e., the residual variation).11 This is sometimes referred to as the coefficient of
nondetermination.15
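Simple linear regression and r2 can be sketched with SciPy's `linregress`; the BP/stroke-score pairs below are hypothetical:

```python
from scipy.stats import linregress

# Hypothetical paired observations: x = systolic BP, y = stroke risk score
x = [120, 130, 140, 150, 160, 170]
y = [2.0, 2.6, 3.1, 3.9, 4.4, 5.2]

fit = linregress(x, y)         # fits y = mx + b
m, b = fit.slope, fit.intercept
r_squared = fit.rvalue ** 2    # coefficient of determination
residual = 1 - r_squared       # coefficient of nondetermination
```

With the fitted equation, a y value can be predicted from a new x, which is exactly what correlation alone cannot do.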
2. As with correlation, CIs can be calculated for regression analyses. Also, as with correlation, the
existence of this kind of statistical association is not in itself evidence of causality. One must take
into account what type of analysis is being performed (i.e., case control vs. cohort vs. randomized
controlled trial [RCT]).1
C. Multiple regression. When more than one independent variable is used to predict a dependent vari-
able, multiple (or multivariate) regression analysis (MRA) is used.4,15 For example, the national cho-
lesterol guidelines use multiple regression to help establish 10-year coronary heart disease (CHD)
risk for patients based on population data. A patient’s 10-year CHD risk is the dependent variable
because its estimate “depends on” several independent variables. The independent variables include
age, total cholesterol, high-density lipoprotein (HDL) cholesterol, smoking status, and systolic BP. All
of these independent variables are used to help predict a patient’s 10-year CHD risk.1
1. Multiple linear regression is “used with parametric (aka continuous) outcomes like BP” and lipid
values.17
2. Logistic regression is used with nominal outcomes such as death and hospitalization.8,17
3. Cox proportional hazards regression is “used when an outcome is the length of time to an event.”17
For example, time until death or hospitalization or time until discharge.5,17

X. ERROR VERSUS BIAS. Errors are “mistakes that do not systematically under- or overestimate
effect size.”18 Biases are systematic errors/flaws in study design that lead to incorrect results.5,6,18 More
common types of biases include publication bias, investigator bias, compliance bias, selection bias,
diagnostic or detection bias, recall bias, and channeling bias (aka confounding by indication).5–7,18–20
The best way to minimize bias is through proper study design (e.g., randomization, inclusion/exclusion
criteria, blinding, using controls and objective outcome measures).5,6,8,18,19

XI. CONFOUNDING. Confounders are “causes, other than the one studied, which may be linked
to the studied outcomes and/or the hypothesized cause.”5,7,19 Although these are sometimes difficult
to detect, investigators should account for known confounders.5,19 For example, atherosclerosis causes
myocardial infarction (MI). There is an association between atherosclerosis and smoking, smoking
and risk for an MI, and atherosclerosis and risk for an MI. The proposed cause is atherosclerosis.
The potential confounder is smoking, so investigators need to account for any significant smoking
differences among studied groups.19

Proposed cause (atherosclerosis) ←——→ Confounder, aka covariate (smoking)
        \                                /
         ——→ Outcome studied (heart attack) ←——

A. Ways of controlling for confounders include proper study design (e.g., randomization, inclusion/
exclusion criteria, blinding, using controls and objective outcome measures) and proper statistical
analysis (stratification, MRA, and use of ANOVA [for parametric data]).5–8,18,19


XII. CONTROLLED VERSUS NONCONTROLLED TRIALS


A. Controlled trials attempt to “keep the study groups as similar as possible and to minimize bias. Ideally
the groups will differ only in the factor (treatment) being studied.”7
B. Uncontrolled trials generally only evaluate one group7 (i.e., no control group).1
C. Various types of controls
1. “Placebo control: One or more groups are given active treatment while the control group receives
a placebo.”7
2. “Historical Control: The data from a group of subjects receiving the intervention are compared
to data from a group of subjects previously treated during a different time period, perhaps in a
different place”7 (experimental group vs. past group).
3. “Crossover control: Each subject serves as his/her own control. During a defined period
of time, group A receives the experimental drug while group B receives the control. Then
for the next defined period of time, group A receives the control and group B receives the
experimental drug.”7
4. “Standard Treatment (aka Active Treatment) Control: Control group subjects receive ‘gold stan-
dard’ treatment while the other group receives the experimental treatment.”7 These are mainly
used when a placebo control would be unethical.7
5. “Within Patient Comparison Control: One part of the body is treated with the experimental treat-
ment while another part of the body is treated with either the ‘gold standard’ or placebo.” This is
mainly used for dermatology trials.7

XIII. BLINDING DOES SEVERAL THINGS. It helps minimize clinicians’ treating/assessing
one group differently from another, helps control for a placebo effect, and helps maximize equal patient
compliance. Blinding is especially important if there is any degree of subjectivity associated with outcome
assessment. Common types of blinding are listed in the following1:
A. Nonblinded trial: The investigator and subject know what treatment or intervention the subject is
receiving. This is commonly referred to as an open or open-label trial.7
B. Single-blind trial: Someone (usually the patient) is unaware of what treatment or intervention the
subject is receiving.7
C. “Double-blind trial: Neither the investigator nor the subject is aware of what treatment or interven-
tion the subject is receiving.”7
D. “Double-dummy trials are used if one is comparing two different dosage forms and doesn’t want the
patient or investigator to know in which arm the patient is participating. For example, if one is com-
paring intranasal sumatriptan (Imitrex) to injectable sumatriptan (Imitrex STATdose), one group
would receive intranasal sumatriptan and a placebo injection, while the other group would receive
intranasal placebo and a sumatriptan injection.”7

XIV. VALIDITY
A. Internal validity. To what degree does a study appropriately test and answer the question(s) being
asked and measure what is claimed to be measured? It “addresses issues of bias, confounding, and
measurement of endpoints.” 19 It directly affects external validity.7,18,19
B. External validity. Presuming internal validity, this assesses whether the results can be extrapolated to
the general population, to other groups, patients, or systems.7,18,19

XV. ASSESSING RISK1


A. Absolute risk (AR) (aka incidence)
AR = (number who develop outcome during a specified period) /
(number available to develop outcome at the beginning of the study)
1. Absolute risk reduction (ARR) = ARControl − ARExperimental
2. Numbers-needed-to-treat (NNT) = 1/ARR
B. Relative risk (RR) (aka risk ratio, rate ratio, or incidence rate ratio) = ARExperimental/ARControl
1. Relative risk reduction (RRR) = 1 − RR and RRR = ARR/ARControl
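The risk measures above are simple arithmetic; a sketch with hypothetical event counts:

```python
# Hypothetical trial: 20 of 100 control patients vs. 15 of 100 treated
# patients develop the outcome during the study period.
ar_control = 20 / 100        # absolute risk (incidence) in control group
ar_experimental = 15 / 100   # absolute risk in experimental group

arr = ar_control - ar_experimental   # absolute risk reduction
nnt = 1 / arr                        # numbers-needed-to-treat
rr = ar_experimental / ar_control    # relative risk
rrr = 1 - rr                         # relative risk reduction (= arr / ar_control)
```

Here a seemingly large 25% relative risk reduction corresponds to an absolute reduction of only 5 percentage points, i.e., 20 patients treated per outcome prevented.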


C. Odds ratio (OR) is an estimate of RR and is generally used for case-control trials.5,15
D. Hazard ratio (HR) estimates RR and is generally used with Cox proportional hazards regression
analysis.
E. OR and HR are fairly accurate estimates of RR if the incidence of an outcome is <15%.17
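A small arithmetic sketch (hypothetical counts) of why the OR approximates the RR only when the outcome is uncommon:

```python
def rr_and_or(a, b, c, d):
    """Relative risk and odds ratio from a 2 x 2 table:
    exposed: a events, b non-events; unexposed: c events, d non-events."""
    rr = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    return rr, odds_ratio

# Rare outcome (incidence well under 15%): OR closely tracks RR
rr_rare, or_rare = rr_and_or(10, 990, 5, 995)

# Common outcome (incidence 20-40%): OR overstates RR
rr_common, or_common = rr_and_or(400, 600, 200, 800)
```

In both made-up tables the RR is 2.0, but the OR drifts well above 2 once the outcome becomes common.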

XVI. TRIAL DESIGN1


A. Prospective versus retrospective
1. Prospective study: Subjects are followed forward in time from a specified time point. Data are
collected and outcomes or variables are measured and analyzed.1
2. Retrospective study: Always begins and sometimes ends in the past (case report, case series, case
control, retrospective cohort) but may end in the present or future (prospective cohort). The
investigator(s) look(s) back in time to collect and analyze data.1
B. Relative strength of causal relationships based on “trial designs.”19 The following study designs are
listed in a “weakest to strongest order for establishing causality and confidence in conclusions/
results.”18,19
1. Case report18,19 (weakest) (not analytical or interventional)
2. Case series18,19 (simply more patients than case reports) (not analytical or interventional)
3. Cross-sectional studies “survey characteristics of a population at a given time and are particularly
useful for measuring the prevalence of a disease or event.”20
4. Case-control study18,19 (aka epidemiological or observational trial designs)5 (analytical)
5. Cohort study18,19 (aka epidemiological or observational trial designs, outcomes or follow-up
studies)5 (analytical)
a. Retrospective cohort (exposure and end point occurred in the past)1
b. Prospective cohort (exposure occurred in the past, end point in the future) 1
c. True prospective cohort (exposure in the present or future, end point in the future)1
6. Meta-analysis. This is controversial, but most are of the opinion that, as long as the meta-analysis
included only RCTs and was conducted appropriately, its strength to determine causality is
between that of a cohort and RCT.19
7. RCT. Strongest for establishing causality (interventional)5,18,19
C. Retrospective trials (arrow charts for trial designs shown hereafter)19
1. Case-control study (aka “risk factor studies”)19:

Exposure (?) ←— look back —— Outcome identified today (+ = disease present, − = disease absent)


a. These are retrospective, identify patients based on outcome or disease, and are therefore good
for rare outcomes or diseases.6,19 These are good for identifying possible “causal influences on
relatively uncommon outcomes” or slow-developing diseases, can be used in diseases with
long latencies such as Alzheimer, and are inexpensive and quick.5,18,19 Problems include their
being only hypothesis generating, subject to bias and confounding, not good for rare expo-
sures, and that selection of controls can be challenging. 5,18,19
2. Retrospective cohort study (aka “outcomes studies”)19:

Exposure (past) ——→ Outcome (?) (past) ——→ Today


3. Prospective cohort study (aka “outcomes studies”)19:

Exposure (past) ——→ Today ——→ Outcome (?) (future)


a. This is called a prospective cohort, but it is really a retrospective design because exposure took
place in the past.19
4. Very rarely, one may see the following type of prospective cohort study, which is truly prospective18:

Today ——→ Exposure ——→ Outcome (?) (exposure and outcome both in the future)

a. Cohorts are good for studying frequent outcomes or diseases such as atherosclerosis.5,18
Subjects are identified based on exposures, which allows investigators to study multiple dis-
eases and/or uncommon exposures.6,19 These are subject to less bias than case control, can
help determine incidence of disease, and can help define temporal trends between exposure
and disease. 5,6,18,19 Problems include loss of follow-up, inability to control exposures, bias
(although less than with case control), time, and expense.1
D. Randomized trials. Randomization (aka allocation) means that all within a population have an equal
and independent opportunity of being selected as part of a sample.1
1. Parallel design (arrow charts for trial designs shown hereafter) 19

Sample —[Randomization]—→ Rx 1 ——→ End point ——→ Evaluation/Analysis
                          Rx 2 ——→ End point

2. Crossover design. Used when there is wide, interpatient variability. “Since each patient serves as
his or her own control, variation between treatment groups is minimized.”20

Sample —[Randomization]—→ Rx 1 —→ End point —[washout]—→ Rx 2 —→ End point —→ Evaluation/Analysis
                          Rx 2 —→ End point —[washout]—→ Rx 1 —→ End point
(Ensure adequate time for washout to prevent carryover effect.)
a. Crossovers may be used for certain “chronic, stable diseases, such as osteoarthritis, or for
pharmacokinetic studies.”20 Crossovers are not appropriate for chronic, unstable diseases (e.g., al-
lergic rhinitis or asthma).20 Also, crossovers are not suitable for “acute conditions, such as post-
operative pain or infections.”20 They are not appropriate for certain types of treatment questions
(e.g., treatment of nausea/vomiting in chemotherapy trials). Also, there may be cases where cross-
overs are unethical or impractical (e.g., a smoking cessation trial). To prevent carryover effect, “a
typical washout period should last at least 5 half-lives of the study drug or its active metabolite.”20
3. Randomized trials are the “best design for determining causality” by minimizing bias and
dividing confounders equally.5,19 However, these are expensive and time and labor intensive, and
generalization of results depends highly on appropriate inclusion/exclusion criteria.5,19


XVII. REFERENCES
1. Herring C. Quick Stats: Basics for Medical Literature Evaluation. 3rd ed. Acton, MA: Copley; 2009.
2. Gaddis ML, Gaddis GM. Introduction to biostatistics: Part 1, basic concepts. Ann Emerg Med.
1990;19(1):86–89.
3. Glaser AN. High Yield Biostatistics. Media, PA: Williams & Wilkins; 1995.
4. DeYoung GR. Biostatistics: A Refresher (handout). 2000 Updates in Therapeutics: The Pharmaco-
therapy Preparatory Course.
5. Kaye KS. Clinical Epidemiology and Biostatistics: Overview and Basic Concepts (handout). Faculty
Development Seminar, Campbell University School of Pharmacy, Department of Pharmacy
Practice. 2001.
6. Kaye KS. Clinical Epidemiology and Biostatistics, Part 2 (handout). Faculty Development Seminar,
Campbell University School of Pharmacy, Department of Pharmacy Practice. 2001.
7. Berensen NM. Statistics: A Review (handout). 2001.
8. DeYoung GR. Understanding statistics: an approach for the clinician. In: Pharmacotherapy Self-
Assessment Program. Book 5: The Science and Practice of Pharmacotherapy 1. 5th ed. Kansas City, MO:
American College of Clinical Pharmacy; 2005:1–17.
9. Gaddis GM, Gaddis ML. Introduction to biostatistics: Part 2, descriptive statistics. Ann Emerg Med.
1990;19(3):309–315.
10. Gaddis GM, Gaddis ML. Introduction to biostatistics: Part 3, sensitivity, specificity, predictive value,
and hypothesis testing. Ann Emerg Med. 1990;19(5):591–597.
11. Gaddis ML, Gaddis GM. Introduction to biostatistics: Part 6, correlation and regression. Ann Emerg
Med. 1990;19(12):1462–1468.
12. Froehlich GW. What is the chance that this study is clinically significant? A proposal for Q values.
Eff Clin Pract. 1999;2:234–239.
13. Gaddis GM, Gaddis ML. Introduction to biostatistics: Part 4, statistical inference techniques in
hypothesis testing. Ann Emerg Med. 1990;19(7):820–825.
14. Gaddis GM, Gaddis ML. Introduction to biostatistics: Part 5, statistical inference techniques for
hypothesis testing with nonparametric data. Ann Emerg Med. 1990;19(9):1054–1059.
15. De Muth JE. Basic Statistics and Pharmaceutical Statistical Applications. 2nd ed. Boca Raton, FL:
Chapman & Hall/CRC, Taylor & Francis Group; 2006.
16. Kelly WD, Ratliff TA, Nenadic CM. Basic Statistics for Laboratories: A Prime for Laboratory Workers.
Hoboken, NJ: John Wiley and Sons; 1992:93.
17. Katz MH. Multivariable analysis: A primer for readers of medical research. Ann Intern Med.
2003;138:644–650.
18. Drew R. Clinical Research Introduction (handout). Drug Literature Evaluation/Applied Statistics
Course. Campbell University School of Pharmacy. 2003.
19. DeYoung GR. Clinical Trial Design (handout). 2000 Updates in Therapeutics: The Pharmacotherapy
Preparatory Course.
20. West PM. Literature evaluation. In: Pharmacotherapy Self-Assessment Program. Book 5: The
Science and Practice of Pharmacotherapy 2. 5th ed. Kansas, MO: American College of Clinical
Pharmacy; 2005.


Study Questions*

1. Which measure(s) of central tendency is/are sensitive to outliers?
(A) Mean
(B) Median
(C) Mode

2. For what type of data can standard deviation (SD) be used?
(A) Parametric data
(B) Ordinal data
(C) Nominal data

3. Which of the following is correct regarding measures of variability?
(A) Range can be both descriptive and inferential.
(B) Standard error of the mean (SEM) is always larger than SD.
(C) All values contained in a confidence interval (CI) are statistically possible.
(D) CI is a descriptive measure only.

4. A study was performed to determine the effect of a new antipsychotic agent (Drug A) on psychosis in patients with underlying schizophrenia as compared to placebo. A sample size of 300 patients was calculated to be needed based on an α of 0.05 and a β of 0.2. The double-blind, parallel, superiority trial was performed in 350 patients for 8 weeks. At the end of the 8-week period, the new antipsychotic agent was found to induce remission in 20% of patients as compared to 19% in the placebo group (P = 0.04). Which of the following statements is true based on the results of the study?
(A) Drug A was found to have a statistically significant and clinically significant difference on remission of psychosis as compared to placebo.
(B) Drug A was found to have a statistically significant difference but not a clinically significant difference on remission of psychosis as compared to placebo.
(C) Drug A was found to have a clinically significant difference but not a statistically significant difference on remission of psychosis as compared to placebo.
(D) Drug A was not found to have a clinically or statistically significant difference on remission of psychosis as compared to placebo.

(For the next two questions) A study of the effects of bupropion (Zyban) versus nicotine gum (Nicorette) on the primary end point of change in the number of cigarettes smoked per day in a parallel, randomized trial. The investigators plan to include 450 subjects (150 in each arm) to reach statistical significance based on a β of 0.2 and an α of 0.05.

5. Which of the following statistical tests would be the most appropriate? (Hint: assume no confounders)
(A) One-way ANOVA
(B) Chi-square (χ²)
(C) Fisher exact test
(D) Friedman test
(E) t test

6. Which of the following statistical tests would be the most appropriate if the study had evaluated three groups instead of only two? (i.e., bupropion [Zyban] vs. nicotine patches [Nicoderm CQ] vs. nicotine gum [Nicorette])
(A) One-way ANOVA
(B) Chi-square (χ²)
(C) Fisher exact test
(D) Friedman test
(E) Student’s t test

7. A study is designed to evaluate the change in blood pressure lowering between metoprolol tartrate (Lopressor®) and metoprolol succinate (Toprol XL®). The investigators decide to perform a parallel trial in 200 patients. There were significant baseline differences between the groups in diet and exercise. Which of the following statistical tests would be most appropriate?
(A) One-way ANOVA
(B) Chi-square (χ²)
(C) Fisher exact test
(D) ANCOVA
(E) Student’s t test

* These study questions were composed by Melanie Pound, PharmD, BCPS and Rebekah Grube, PharmD, BCPS.
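The sample-size statements in question 4 and in the bupropion trial stem (α of 0.05, β of 0.2, i.e., 80% power) can be illustrated with the standard normal-approximation formula for comparing two proportions. This is a sketch only: the 20%-versus-10% effect size below is hypothetical and is not taken from either question.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, beta=0.20):
    """Normal-approximation sample size per group for detecting a
    difference between two proportions at the given alpha (type I
    error) and beta (type II error; power = 1 - beta)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(1 - beta)       # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_a + z_b) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical effect size: 20% vs. 10% response rates
print(n_per_group(0.20, 0.10))  # 197 per group
```

Note how the required n grows rapidly as the expected difference between groups shrinks, which is why small expected effects demand large trials.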


8. The makers of eplerenone (Inspra) want to design a study to compare their medication to the current standard of spironolactone (Aldactone) in the treatment of heart failure. They decide to perform a parallel trial of the two agents in 2000 patients with NYHA classes II to IV heart failure over 2 years. The primary end point is mortality. Which of the following statistical tests would be most appropriate to use?
(A) ANOVA
(B) Fisher exact test
(C) Chi-square (χ²)
(D) Mann-Whitney U test
(E) Student’s t test

9. A retrospective study produces correlation/regression analysis between a high sodium intake (> 2.4 g/day) and hypertension (HTN) reporting an r = 0.45. Which of the following is correct?
(A) 20% of HTN may be explained by high sodium intake.
(B) 20% of HTN is not explained by high sodium intake.
(C) 80% of HTN is explained by high sodium intake.
(D) 55% of HTN is not explained by high sodium intake.
(E) 45% of HTN may be explained by high sodium intake.

10. A study was performed to evaluate a possible correlation between the use of the herbal product Goldenseal and changes in pain relief (based on pain scale scores). Which type of correlation analysis should be used in this trial?
(A) Pearson
(B) Spearman
(C) Linear
(D) Cox

(For the next two questions) In the RE-LY trial, dabigatran (Pradaxa) was compared with warfarin (Coumadin) for the prevention of cerebrovascular accident (CVA) in atrial fibrillation (AF) patients. The primary outcome in this trial was CVA or systemic thromboembolism (TE). The results are presented as follows:

End point     Dabigatran (n = 6076)    Warfarin (n = 6022)    RR, 95% CI
CVA or TE     134 (2.2%)               159 (2.6%)             0.66 (0.53–0.82)
MI            89 (1.5%)                63 (1.0%)              1.38 (1.00–1.91)

11. What can be concluded about the outcome “CVA or TE”?
(A) Dabigatran (Pradaxa) has a clinically significant lower risk than warfarin (Coumadin), although it is not statistically significant because the CI does not include 1.
(B) Dabigatran (Pradaxa) has a clinically significant lower risk than warfarin (Coumadin), although it is not statistically significant because the CI does not include 0.
(C) Dabigatran (Pradaxa) has a clinically significant higher risk than warfarin (Coumadin), and it is statistically significant because the CI does not include 0.
(D) Dabigatran (Pradaxa) has a clinically significant lower risk than warfarin (Coumadin), and it is statistically significant because the CI does not include 1.

12. What can be concluded about the outcome “MI”?
(A) Dabigatran (Pradaxa) has a higher MI risk than warfarin (Coumadin), although it is not statistically significant because the CI includes 1.
(B) Dabigatran (Pradaxa) has a higher MI risk than warfarin (Coumadin), although it is not statistically significant because the CI does not include 0.
(C) Dabigatran (Pradaxa) has a higher MI risk than warfarin (Coumadin), and it is statistically significant because the CI includes 1.
(D) Dabigatran (Pradaxa) has a higher MI risk than warfarin (Coumadin), and it is statistically significant because the CI does not include 0.

13. A researcher was interested in examining the association between postmenopausal hormone replacement therapy (HRT) and development of heart disease. All women who were characterized as postmenopausal were approached regarding their interest in participating in the study by answering a questionnaire annually regarding their medication use and medical conditions. Of the 16,168 women who provided consent, the average length of follow-up was 12.5 years (range, 6 to 16 years). Which of the following best describes the study design?
(A) Case-control study
(B) Prospective cohort study
(C) Randomized controlled trial
(D) Meta-analysis
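Questions 11 and 12 hinge on a single rule: for ratio measures such as RR, OR, or HR, a result is statistically significant only when the confidence interval excludes 1 (the no-difference value for a ratio). A minimal check of that rule against the two intervals in the table above:

```python
def ratio_ci_significant(lo, hi, null_value=1.0):
    """For ratio measures (RR, OR, HR), the result is statistically
    significant only if the CI excludes the null value of 1."""
    return not (lo <= null_value <= hi)

print(ratio_ci_significant(0.53, 0.82))  # CVA or TE: True (CI excludes 1)
print(ratio_ci_significant(1.00, 1.91))  # MI: False (CI touches 1)
```

For difference measures (e.g., absolute risk reduction), the null value to test against would be 0 instead of 1.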


14. An investigator wishes to study a new drug for the treatment of hypertension in patients with diabetes. What is the best type of trial design the investigator should use for determining causality in this particular study?
(A) A case-control study
(B) A prospective cohort study
(C) A prospective, randomized, placebo-controlled trial
(D) A prospective, randomized, standard-of-care comparison trial
(E) A meta-analysis

15. Which of the following would be appropriate for a crossover design study?
(A) Effects of Drug A versus Drug B on hypertension in 100 patients
(B) Effects of varenicline (Chantix) compared to placebo on smoking cessation
(C) Effects of fluticasone/salmeterol (Advair) and budesonide/formoterol (Symbicort) on asthma exacerbations
(D) Effects of hydralazine and hydrochlorothiazide (Microzide) on all-cause mortality

Answers and Explanations


1. The answer is A [see II.A].
Mean is the correct answer because it is affected by outliers. Median and mode are incorrect because they are not affected by outliers.

2. The answer is A [see III.D].
Parametric (aka continuous) data is correct. Standard deviation is only meaningful for parametric data; measures of variability are not meaningful for nominal data, and standard deviation is usually not meaningful for ordinal data, for which interquartile range is the preferred measure of variability.

3. The answer is C [see III.F].
Answer C is correct because all values within a CI are statistically possible. Answer A is incorrect because range is descriptive only; range is not inferential because one is unable to “infer” statistical significance for a data set based on range. Answer B is incorrect because SEM is always smaller than SD. Answer D is incorrect because CI is not only descriptive but also inferential, since one is able to “infer” statistical significance for a data set based on CI.

4. The answer is B [see IV.E].
Answer B is correct. The difference was statistically significant because the P value was .04. Based on this P value, there is a 4% chance that a type I error occurred, which is less than the prespecified acceptable risk of 5% (preset α was 0.05 or 5%). However, the difference was not clinically meaningful because there was only a 1% difference in the primary end point (induction of remission) between the treatment groups. For these reasons, answers A, C, and D are incorrect.

5. The answer is E [see V.A–B].
Answer E is correct because a t test is the test of choice for evaluating statistical differences in parametric data (change in the number of cigarettes smoked daily) when only two groups are being evaluated and there are no detected confounders. Answer A is incorrect because one-way ANOVA is used for testing three or more groups. Answers B and C are incorrect because chi-square and Fisher exact test are used to test nominal data. Answer D is incorrect because Friedman test is used to test ordinal data.

6. The answer is A [see V.C–D].
Answer A is correct because one-way ANOVA is used to test parametric data (change in the number of cigarettes smoked daily) when there are three independent groups and no detected confounders are present. Answers B and C are incorrect because chi-square and Fisher exact test are used to test nominal data. Answer D is incorrect because Friedman test is used to test ordinal data. Answer E is incorrect because t test is used for evaluating statistical differences in parametric data when only two groups are being evaluated.

7. The answer is D [see V.F].
Answer D is correct because ANCOVA is used to test parametric data (change in BP) when two or more groups are being evaluated and two or more confounders (diet and exercise differences) are detected. Answer A is incorrect because one-way ANOVA is used when no detected confounders are present. Answers B and C are incorrect because chi-square and Fisher exact test are used to test nominal data. Answer E is incorrect because t test is used when no detected confounders are present.
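Answer 3 turns on the fact that SEM is always smaller than SD. Since SEM = SD/√n, this holds for any sample with n > 1. A quick numeric check (the data values below are made up for illustration):

```python
import math
import statistics

data = [4.1, 5.0, 5.8, 6.2, 7.4, 8.0]  # made-up parametric sample
sd = statistics.stdev(data)             # sample standard deviation
sem = sd / math.sqrt(len(data))         # standard error of the mean
print(f"SD = {sd:.2f}, SEM = {sem:.2f}")
assert sem < sd                         # SEM < SD whenever n > 1
```

This is also why reporting SEM instead of SD can make data look less variable than they are, a point worth checking when reading trial tables.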


8. The answer is C [see VI.A–B].
Answer C is correct because chi-square is used to detect statistical differences for nominal data (mortality) when there are large numbers of patients in each treatment group. Answers A and E are incorrect because ANOVA and t test are used to test parametric data. Answer B is incorrect because Fisher exact test is used when there are small numbers of patients in each treatment group (< 40 patients). Answer D is incorrect because Mann-Whitney U test is used to test ordinal data.

9. The answer is A [see IX.B].
Answer A is correct because with r = 0.45, r-squared (r²) = 0.2 or 20%. Therefore, 20% of one variable (hypertension) may be explained by the other variable (high-sodium diet). This would mean that 1 − 0.2 = 0.8, or 80% of hypertension would not be explained by a high-sodium diet. Therefore, answers B, C, D, and E are incorrect.

10. The answer is B [see VIII.B].
Answer B is correct because pain scale scores are ordinal data, and Spearman is the sample correlation coefficient for ordinal data. Answer A is incorrect because Pearson is the sample correlation coefficient for linear (parametric) data. Answer C is incorrect because pain scale is ordinal data, not linear (parametric). Answer D is incorrect because Cox is a type of regression analysis, not a form of correlation analysis.

11. The answer is D [see III.F.1.b, IV.E, and XV.B].
Answer D is correct because the outcome measure is relative risk (RR). For ratios like relative risk, it is the difference from 1 that determines statistical significance. Answer A is incorrect because there is a statistically significant difference in CVA or TE because the 95% CI does not include 1. Answers B and C are incorrect because, for ratios like relative risk, it is the difference from 1 that determines statistical significance, not difference from 0.

12. The answer is A [see III.F.1.b, IV.E, and XV.B].
Answer A is correct because for ratios like relative risk, it is the difference from 1 that determines statistical significance. The 95% CI included 1, so although this may be clinically meaningful, it is not statistically significant. Answers B and D are incorrect because for ratios like relative risk, it is the difference from 1 that determines statistical significance, not difference from 0. Answer C is incorrect because all values within the 95% CI are statistically possible and the 95% CI for MI contains 1. Therefore, it is statistically possible that there is no difference between dabigatran (Pradaxa) and warfarin (Coumadin) for the outcome of MI.

13. The answer is B [see XVI.C.4].
Answer B is correct because postmenopausal women were identified based on exposure (medications they were taking, i.e., whether or not they were taking HRT), as is done in cohort studies. Answer A is incorrect because, for case-control studies, subjects are identified based on disease (heart disease in this case) rather than exposure (HRT), which was not the setup for this study. Answer C is incorrect because patients were not randomized to an intervention. Answer D is incorrect because meta-analyses include multiple studies, which is not the case in this example.

14. The answer is D [see XII.C.4 and XVI.D.3].
Answer D is correct because the most robust and ethical way of determining differences between hypertension treatments and determining causality is through a randomized trial with a standard-of-care control. Answer A is incorrect because a case-control study is very weak at determining causality. Answers B and E are incorrect because cohort studies and meta-analyses are weaker than randomized trials for establishing causality. Answer C is incorrect because it would be unethical to treat hypertensive, diabetic patients with a placebo rather than an established therapy that has been shown to improve cardiovascular outcomes.

15. The answer is A [see XVI.D.2.a].
Answer A is correct. Answer B is incorrect because it would be unethical to ask those who had stopped smoking in the first part of the trial to restart smoking in order to obtain data for the second part of the study. Answer C is incorrect because crossovers are not good for treatment evaluation in unstable diseases; asthma severity may vary depending on seasons. Answer D is incorrect because those who died in the first part of the crossover trial could not be evaluated during the second part of the trial.
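The coefficient-of-determination arithmetic behind answer 9 can be checked directly; the values below come from the question itself (r = 0.45):

```python
r = 0.45                      # reported correlation coefficient
r_squared = r ** 2            # coefficient of determination (0.2025)
unexplained = 1 - r_squared   # share of variation not explained (0.7975)
print(round(r_squared, 2), round(unexplained, 2))  # 0.2 0.8
```

The squaring step is why a moderate-looking r can correspond to only a small proportion of explained variation: r must exceed about 0.7 before more than half the variation is shared.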
