INTRO
To analyse trial data, researchers rely on tried and tested statistical methods,
which have to be specified in a filing with the regulatory authorities before the
trial even begins. This makes it possible to monitor and check what’s happening to
the data at any time.
At participating trial centres, the bulk of the data is entered directly online and
electronically processed for submission to the authorities.
Phase III clinical trials are conducted under the direction of an independent
specialist in the disease area of interest. This ‘principal investigator’ is also the
one who will present the results at a medical meeting or in a medical journal —
even if the trial medication fails to produce the desired treatment response.
The quality and utility of monitoring, evaluation and research in our projects and
programmes fundamentally relies on our ability to collect and analyse quantitative
and qualitative data.
A necessary companion to a well-designed clinical trial is its appropriate statistical
analysis. Assuming that a clinical trial will produce data that could reveal
differences in effects between two or more interventions, statistical analyses are
used to determine whether such differences are real or are due to chance. Data
analysis for small clinical trials in particular must be focused. In the context of a
small clinical trial, it is especially important for researchers to make a clear
distinction between preliminary evidence and confirmatory data analysis. When
the sample population is small, it is important to gather considerable preliminary
evidence on related subjects before the trial is conducted to define the size needed
to determine a critical effect.
A single large clinical trial is often insufficient to answer a biomedical research
question, and it is even more unlikely that a single small clinical trial can do so.
Thus, analyses of data must consider the limitations of the data at hand and their
context in comparison with those of other similar or related studies.
Some Statistical Approaches to Analysis of Clinical Trials
1. Sequential analysis
2. Hierarchical models
3. Bayesian analysis
4. Decision analysis
5. Statistical prediction
6. Meta-analysis and other alternatives
7. Risk-based allocation
1) SEQUENTIAL ANALYSIS
Sequential analysis refers to an analysis of the data as they accumulate, with a
view toward stopping the study as soon as the results become statistically
compelling. This is in contrast to a sequential design, in which the probability
that a participant is assigned to a particular intervention is changed depending
on the accumulating results. In sequential analysis, the probability of assignment
to an intervention is constant across the study.
In sequential analysis, the final sample size is not known at the beginning of the
study. On average, sequential analysis will lead to a smaller sample size
than that in an equivalently powered study with a fixed-sample-size design. This
is a major advantage to sequential analysis and is a reason that it should be given
consideration when one is planning and analyzing a small clinical trial.
The use of study stopping (cessation) rules that are based on successive
examinations of accumulating data may cause difficulties because of the need to
reconcile such stopping rules with the standard approach to statistical analysis
used for the analysis of data from most clinical trials. This standard approach is
known as the “frequentist approach.” In this approach the analysis takes a form
that is dependent on the study design. When such analyses assume a design in
which all data are simultaneously available, it is called a “fixed-sample analysis.”
If the data from a clinical trial are not examined until the end of the study, then a
fixed-sample analysis is valid.
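The sample-size saving can be demonstrated with a small simulation. The following is a minimal sketch rather than a production group-sequential design: it analyses a simulated two-arm trial in batches and stops at a constant (Pocock-style) boundary. The boundary value, effect size, batch size, and number of looks are all illustrative assumptions.

    # Minimal sketch: a two-arm trial analysed at interim "looks", stopping as
    # soon as the z-statistic crosses a constant (Pocock-style) boundary.
    # The boundary 2.413 (5 looks, two-sided alpha = 0.05) and all trial
    # parameters are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    true_effect, sigma = 0.5, 1.0    # assumed standardized effect and SD
    per_look, max_looks = 20, 5      # 20 subjects per arm added at each look
    boundary = 2.413                 # Pocock critical value for 5 looks

    totals = []
    for _ in range(2000):            # simulated trials
        treat, ctrl = [], []
        for _ in range(max_looks):
            treat.extend(rng.normal(true_effect, sigma, per_look))
            ctrl.extend(rng.normal(0.0, sigma, per_look))
            n = len(treat)
            z = (np.mean(treat) - np.mean(ctrl)) / (sigma * np.sqrt(2.0 / n))
            if abs(z) > boundary:    # stop once results become compelling
                break
        totals.append(2 * n)         # total subjects used in this trial
    print("average total sample size:", np.mean(totals))
    print("fixed-design total sample size:", 2 * per_look * max_looks)

With a real effect of this size, most simulated trials stop before the final look, so the average total sample size comes out noticeably below the 200 subjects of the equivalent fixed design.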
2) HIERARCHICAL MODELS
Hierarchical models can be quite useful in the context of small clinical trials in
two regards. First, hierarchical models provide a natural framework for
combining information from a series of small clinical trials conducted within
ecological units (e.g., space missions or clinics). In the case where the data are
complete, in which the same response measure is available for each individual,
hierarchical models provide a more rigorous solution than meta-analysis, in that
there is no reason to use effect magnitudes as the unit of observation. Note,
however, that a price must be paid (i.e., the total sample size must be increased)
to reconstruct a larger trial out of a series of smaller trials. Second, hierarchical
models also provide a foundation for analysis of longitudinal studies, which are
necessary for increasing the power of research involving small clinical trials. By
repeatedly obtaining data for the same subject over time as part of a study of a
single treatment or a crossover study, the total number of subjects required in
the trial is reduced.
A common theme in medical research is two-stage sampling, that is, sampling of
responses within experimental units (e.g., patients) and sampling of
experimental units within populations.
Analysis of this type of data (under the assumptions that a subset of the
regression parameters has a distribution in the population of participants and
that the model residuals have a distribution in the population of responses within
participants and also in the population of participants) belongs to the class of
statistical analytical models called:
• mixed models
• regression with randomly dispersed parameters
• exchangeability between multiple regressions
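As a concrete illustration of this two-stage structure, the sketch below fits a random-intercept mixed model to simulated repeated-measures data with the statsmodels library. The variable names, effect sizes, and sample sizes are assumptions made for the example, not values from the text.

    # Minimal sketch: responses sampled within subjects (repeated measures),
    # subjects sampled from a population, fitted as a random-intercept model.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    rows = []
    for subj in range(12):                       # 12 subjects (small trial)
        subj_effect = rng.normal(0.0, 1.0)       # random effect for this subject
        for _ in range(6):                       # 6 repeated measures each
            treatment = int(rng.integers(0, 2))  # 0 = control, 1 = active
            y = 5.0 + 2.0 * treatment + subj_effect + rng.normal(0.0, 0.5)
            rows.append({"subject": subj, "treatment": treatment, "y": y})
    df = pd.DataFrame(rows)

    # Fixed treatment effect, random intercept for each subject
    result = smf.mixedlm("y ~ treatment", df, groups=df["subject"]).fit()
    print(result.summary())

Because each subject contributes several observations, the treatment effect is estimated with fewer subjects than a one-observation-per-subject design would need, which is the point made above about longitudinal designs.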
3) BAYESIAN ANALYSIS
The majority of statistical techniques that clinical investigators encounter are of
the frequentist school and are characterized by significance levels, confidence
intervals, and concern over the bias of estimates. The Bayesian philosophy of
statistical inference, however, is fundamentally different from that underlying the
frequentist approach. In certain types of investigations Bayesian analysis can
lead to practical methods that are similar to those used by statisticians who use
the frequentist approach. The Bayesian approach has a subjective element. It
focuses on an unknown parameter value θ, which measures the effect of the
experimental treatment. Before designing a study or collecting any data, the
investigator acquires all available information about the activities of both the
experimental and the control treatments.
The attraction of the Bayesian approach lies in its simplicity of concept and the
directness of its conclusions. Its flexibility and lack of concern for interim
inspections are especially valuable in sequential clinical trials. The main problem
with the Bayesian approach, however, lies in the idea of a subjective prior distribution.
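A minimal sketch of the Bayesian mechanics, using a conjugate Beta-Binomial model for a response rate θ. The prior and the trial counts are invented for illustration; the choice of prior is exactly the subjective element discussed above.

    # Minimal sketch: Bayesian updating of a response rate theta.
    # The prior Beta(2, 2) stands in for the investigator's pre-trial
    # information; prior and data are both illustrative assumptions.
    from scipy import stats

    prior_a, prior_b = 2, 2     # subjective prior on theta
    responders, n = 14, 20      # assumed trial outcome

    post = stats.beta(prior_a + responders, prior_b + (n - responders))
    print("posterior mean of theta:", round(post.mean(), 3))
    print("95% credible interval:",
          tuple(round(x, 3) for x in post.interval(0.95)))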
4) DECISION ANALYSIS
Decision analysis is a modeling technique that systematically considers all
possible management options for a problem. It uses probabilities and utilities to
explicitly define decisions. The computational methods allow one to evaluate the
importance of any variable in the decision-making process. Sensitivity analysis
describes the process of recalculating the analysis as one changes a variable
through a series of plausible values.
The other major advantage of decision analysis occurs after data collection. If the
sample size is inadequate, and the confidence intervals on the effect in question
are therefore wide, one may still face a clinical situation
for which a decision is required. One might have to make decisions under
conditions of uncertainty, despite a desire to increase the certainty. The use of
decision analysis can make explicit the uncertain decision, even informing the
level of confidence in the decision.
The formulation of a decision analytical model helps investigators consider which
health outcomes are important and how important they are relative to one another.
Decision analysis also facilitates consideration of the potential marginal benefit
of a new intervention by forcing comparisons with other alternatives.
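The sketch below shows the idea in miniature: two management options compared by expected utility, with a one-way sensitivity analysis over plausible cure probabilities. Every probability and utility here is an invented, illustrative value.

    # Minimal sketch: decision analysis with a one-way sensitivity analysis.
    def expected_utility(p_cure, u_cure=1.0, u_fail=0.2, u_side_effect=-0.1):
        # Treating: cure with probability p_cure, plus a fixed side-effect cost
        return p_cure * u_cure + (1.0 - p_cure) * u_fail + u_side_effect

    u_no_treatment = 0.5                  # assumed utility of not treating

    # Recalculate the decision as p_cure sweeps through plausible values
    for p in (0.3, 0.4, 0.5, 0.6, 0.7):
        eu = expected_utility(p)
        choice = "treat" if eu > u_no_treatment else "do not treat"
        print(f"p_cure = {p:.1f}: EU(treat) = {eu:.2f} -> {choice}")

Sweeping p_cure shows where the preferred option flips, making the decision, and its sensitivity to the uncertain inputs, explicit.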
5) STATISTICAL PREDICTION
When the number of control samples is potentially large and the number of
experimental samples is small and is obtained sequentially from a series of
clusters with small sample sizes (e.g., space missions), traditional comparisons of
the aggregate means or medians may be of limited value. In those cases, one can
view the problem not as a classical hypothesis testing problem but as a problem
of statistical prediction. Conceptualized in that way, the problem is one of
deriving a limit or interval on the basis of the control distribution that will
include the mean or median for all or a subset of the experimental cluster
samples.
A drawback to this method is that the control group is typically not a concurrent
control group. Thus, if other conditions, in addition to the intervention being
evaluated, are changed, it will not be possible to determine if the changes are in
fact due to the experimental condition.
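A minimal sketch of the prediction approach, assuming for illustration a normal control distribution: an interval is derived from a large control sample that should contain the mean of a new small cluster if the experimental condition behaves like the control.

    # Minimal sketch: 95% prediction interval, based on a control sample,
    # for the mean of a new experimental cluster of size m. Data simulated.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    control = rng.normal(100.0, 15.0, size=500)  # historical control data
    m = 4                                        # size of one new cluster

    n = len(control)
    xbar, s = control.mean(), control.std(ddof=1)
    t_crit = stats.t.ppf(0.975, df=n - 1)
    half = t_crit * s * np.sqrt(1.0 / m + 1.0 / n)  # SE of (cluster mean - xbar)
    print(f"95% prediction interval: ({xbar - half:.1f}, {xbar + half:.1f})")

A new cluster mean falling outside this interval suggests a departure from the control condition, subject to the non-concurrent-control caveat noted above.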
6) META-ANALYSIS: SYNTHESIS OF RESULTS OF INDEPENDENT STUDIES
Meta-analysis refers to a set of statistical procedures used to summarize
empirical research in the literature. A meta-analysis can summarize an entire set
of research in the literature, a sample from a large population of studies, or some
defined subset of studies (e.g., published studies or n-of-1 studies). The degree to
which the results of a synthesis can be generalized depends in part on the nature
of the set of studies.
In general, meta-analysis serves as a useful tool for answering questions that
single trials were underpowered for or not designed to address. More specifically,
the following are benefits of meta-analysis:
• It can provide a way to combine the results of studies with different designs (within reason) when similar research questions are of interest.
• It uses a common outcome metric when studies vary in the ways in which outcomes are measured.
• It accounts for differences in precision, typically by weighting in proportion to sample size.
• Its indices are based on sufficient statistics.
• It can examine between-study differences in results (heterogeneity).
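A minimal sketch of the pooling step, using a fixed-effect model with inverse-variance weights (a precision weight closely related to sample size). The study estimates and standard errors are invented for illustration.

    # Minimal sketch: fixed-effect meta-analysis via inverse-variance weights,
    # plus Cochran's Q as a simple check for between-study heterogeneity.
    import numpy as np
    from scipy import stats

    effects = np.array([-5.0, -3.0, -4.5, -2.0])  # per-study effect estimates
    ses = np.array([2.5, 1.0, 3.0, 0.8])          # per-study standard errors

    w = 1.0 / ses**2                              # precision weights
    pooled = np.sum(w * effects) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))
    z = pooled / pooled_se
    print(f"pooled effect: {pooled:.2f} +/- {1.96 * pooled_se:.2f}")
    print(f"two-sided P-value: {2 * stats.norm.sf(abs(z)):.5f}")

    Q = np.sum(w * (effects - pooled)**2)         # heterogeneity statistic
    print(f"Cochran's Q = {Q:.2f} on {len(effects) - 1} df")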
7) RISK-BASED ALLOCATION
Empirical Bayes methods are needed for analysis of experiments with risk-based
allocation for two reasons. First, the natural heterogeneity from subject to
subject requires some accounting for random effects; and second, the differential
selection of groups due to the risk-based allocation is handled perfectly by the “u-
v” method.
The u-v method of estimation capitalizes on certain general properties of
distributions such as the Poisson or normal distribution that hold under arbitrary
and unknown mixtures of parameters, thus allowing for the existence of random
effects. At the same time, the u-v method allows estimation of averages under a
wide family of restrictions on the sample space, such as restriction to high-risk
or low-risk subjects, thus addressing the risk-based allocation design feature.
Statistics
The mathematics of the collection, organization, and interpretation of numerical
data, especially the analysis of population characteristics by inference from
sampling.
Why Statistics?
Medicine is a quantitative science but not exact
–Not like physics or chemistry
•Variation characterises much of medicine
•Statistics is about handling and quantifying variation and uncertainty
•Humans differ in response to exposure to adverse effects
Example: not every smoker dies of lung cancer; some non-smokers die of lung
cancer
•Humans differ in response to treatment
Example: penicillin does not cure all infections
•Humans differ in disease symptoms
Example: sometimes cough and sometimes wheeze is the presenting feature of
asthma
Statistics play a very important role in every stage of a clinical trial, from design
through conduct, analysis, and reporting, in terms of controlling for and
minimizing bias and confounding factors, and measuring random error.
The statistician generates the randomization code, calculates the sample size,
estimates the treatment effect, and makes statistical inferences, so an appreciation
of statistical methods is fundamental to understanding randomized trial methods
and results.
Statistical analyses deal with random error by providing an estimate of how likely
it is that the measured treatment effect reflects the true effect.
Why Statistics Are Necessary
• Statistics can tell us whether events could have happened by chance, and help
us make decisions
• We need to use Statistics because of variability in our data
• Generalise: can what we know help to predict what will happen in new and
different situations?
Statistics fall into three major categories based on how they are used:
1. DESCRIPTIVE STATISTICS
• Enumerate, organise, summarise, categorise, and graphically represent data.
This type of statistics describes the data.
•Examples
• means and frequency of outcomes
• charts and graphs
Descriptive statistics are used to summarize and describe data collected in a study.
To summarize a quantitative (continuous) variable, measures of central location
(i.e. mean, median, and mode) and spread (e.g. range and standard deviation) are
often used, whereas frequency distributions and percentages (proportions) are
usually used to summarize a qualitative variable.
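A minimal sketch of these summaries in code, with invented data: central location and spread for a quantitative variable, and a frequency distribution with percentages for a qualitative one.

    # Minimal sketch: descriptive statistics for quantitative and
    # qualitative variables (all data invented for illustration).
    import numpy as np
    from collections import Counter

    ages = np.array([21, 25, 30, 35, 35, 40, 44, 47])   # quantitative
    hair = ["brown", "brown", "black", "blonde",
            "red", "brown", "blonde", "brown"]          # qualitative

    print("mean:", ages.mean(), " median:", np.median(ages))
    print("range:", ages.max() - ages.min(),
          " SD:", round(ages.std(ddof=1), 2))

    for category, count in Counter(hair).items():       # frequencies / %
        print(f"{category}: {count} ({100 * count / len(hair):.0f}%)")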
Descriptive statistics paint a picture of a situation, providing a concise numerical
or graphical summary. We use descriptive statistics every day to help
communicate a phenomenon or situation to other people. People often view
descriptive statistics as being more “objective” than non-numerical descriptions.
A set of descriptive statistics may be as follows: 75 people attended; the average
age was 35 years, with ages ranging from 21 to 47; 65% were female and 35%
male; 80% were college graduates and 20% had advanced degrees; 30% were
from out of state; and 20% had black hair, 20% blonde, 50% brown, and 10% red.
As you can see, descriptive statistics provide “hard” numbers against which each
person can compare his or her personal benchmarks.
A female friend may feel that 35% men is too few for a party, while a male friend
may be happy to hear that the crowd was 65% female.
2. COMPARATIVE STATISTICS
Simple descriptive statistics without a proper context or comparison may not be
useful. Is a party with 80% college graduates good or bad? How good or bad is a
35% to 65% male-to-female ratio? Remember, very few things are inherently
good or bad. It all depends on comparisons. A party with 65% women may not be
good for a man who is used to attending social gatherings with 80% women.
However, a man who has been living and working among only men for several
years may be elated to find a party with any women at all. A more sophisticated
type of descriptive statistics serves to compare, in a numerical or graphical
fashion, one situation with another (or with multiple other situations).
3. INFERENTIAL STATISTICS
• Drawing conclusions from incomplete information.
• They make predictions about a larger population given a smaller sample
• These are thought of as statistical tests
•Examples
• 95%CI, t-test, chi-square test, ANOVA, regression
1 Central Tendency
The central tendency of a variable is the “middle” or “typical” values of the
variable in a sample or population. Measures of central tendency provide a
single number answer to the question: What is the typical value of that variable
in your sample or population? People who want to “go along with the crowd” are
most interested in central tendencies. Our choice of the central tendency measure
depends on the situation and the distribution of the data.
Table 3.1 lists the common measures of central tendencies.
The distribution is the shape of the curve generated if you were to plot the
frequencies of each value of the variable. The two common measures of shape are
skew and kurtosis.
Skew is the degree to which a curve is “bent” toward one direction.
The kurtosis is the degree to which a curve is peaked or flat. The greater the
kurtosis, the more “peaked” a curve is. The less the kurtosis, the flatter the curve
is.
NORMAL DISTRIBUTION
1 The Importance of Normal Distributions
Normal distributions, also known as bell-shaped curves or Gaussian
distributions, have the same general shape: symmetric and unimodal (i.e., a
single peak) with tails that appear to extend to positive and negative infinity. In a
normal curve, approximately 68% of the values fall within one standard
deviation of the mean, 95% fall within two standard deviations, and 99.7% fall
within three standard deviations.
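The 68-95-99.7 figures can be checked directly against the normal cumulative distribution function. The sketch below uses an arbitrary mean and standard deviation, since the rule holds for any normal curve.

    # Minimal sketch: verify the 68-95-99.7 rule for a normal distribution.
    from scipy import stats

    mu, sigma = 0.0, 1.0          # arbitrary; the rule is scale-free
    dist = stats.norm(mu, sigma)
    for k in (1, 2, 3):
        p = dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)
        print(f"within {k} SD of the mean: {100 * p:.1f}%")
    # prints approximately 68.3%, 95.4%, 99.7%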
Normal curves can differ in spread. Like most distributions, the normal
distribution has a mean (𝜇) and a standard deviation (𝜎).
Understanding the normal distribution is important. Interestingly, many
biological, psychological, sociological, economic, chemical, and physical
variables exhibit normal distributions. A classic example is that educational test
scores tend to follow a bell curve: most students score close to the mean and
far fewer have very high or very low scores. In fact, when the distribution of a
variable is unknown, you frequently assume that it is normal until proven
otherwise. Many statistical tests are based on normal distributions. Violations of
normal distributions, in fact, may invalidate some of these tests, although many
statistical tests still function reasonably well with other types of distributions.
3 The t-Distribution
While large samples have a distribution close to the normal distribution (i.e., the
larger the sample, the more normal the distribution), small samples do not.
Moreover, many times the population standard deviation is unknown, requiring
you to use the sample standard deviation to estimate the population standard
deviation. Whenever you have a small sample or do not know the population
standard deviation, using a t-distribution may be more appropriate than using a
normal distribution. t-Distributions are similar to the standard normal
distribution (i.e., symmetrical and bell shaped) but are heavier-tailed
(leptokurtic) and flatter at the peak. Unlike the normal distribution, the
t-distribution has a special additional parameter, degrees of freedom (df), that
can be any real number greater than zero and changes the t-distribution curve's
shape. Curves with smaller df have more of their area under their tails and are
therefore flatter than curves with higher degrees of freedom. As df increases, the
t-distribution becomes more and more like the standard normal distribution.
The t-score (or t-statistic) is to the t-distribution what the Z-score is to the
normal distribution. Use the following formula to calculate a t-score:
t = (x̄ − μ) / (s / √n)
where x̄ is the sample mean, μ the hypothesized population mean, s the sample
standard deviation, and n the sample size.
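A minimal sketch of the calculation with an invented small sample, checked against scipy's built-in one-sample t-test.

    # Minimal sketch: one-sample t-score and its two-sided P-value from the
    # t-distribution with n - 1 degrees of freedom (sample data invented).
    import numpy as np
    from scipy import stats

    sample = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3])  # assumed small sample
    mu0 = 5.0                                          # hypothesized mean

    n = len(sample)
    t = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))
    p = 2 * stats.t.sf(abs(t), df=n - 1)
    print(f"t = {t:.3f}, df = {n - 1}, P = {p:.3f}")

    print(stats.ttest_1samp(sample, mu0))              # same result via scipy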
Two statistical approaches are often used for clinical data analysis: hypothesis
testing and statistical estimation.
Hypothesis Testing
» Hypothesis testing or inference involves an assessment of the probability of
obtaining an observed treatment difference or more extreme difference for
an outcome assuming that there is no difference between two treatments
» This probability is often called the P-value or false-positive rate.
» If the P-value is less than a specified critical value (e.g., 5%), the observed
difference is considered to be statistically significant. The smaller the P-
value, the stronger the evidence is for a true difference between treatments.
» On the other hand, if the P-value is greater than the specified critical value
then the observed difference is regarded as not statistically significant, and
is considered to be potentially due to random error or chance. The traditional
statistical threshold is a P-value of 0.05 (or 5%), which means that we only
accept a result when the likelihood of the conclusion being wrong is less than
1 in 20, i.e., we conclude that only one out of a hypothetical 20 trials will
show a treatment difference when in truth there is none.
When hypothesis testing arrives at the wrong conclusions, two types of errors can
result: Type I and Type II errors (Table 3.4). Incorrectly rejecting the null
hypothesis is a Type I error, and incorrectly failing to reject a null hypothesis is a
Type II error. In general, Type I errors are regarded as more serious than Type
II errors; seeing an effect when there isn’t one (e.g., believing an ineffectual drug
works) is worse than missing an effect (e.g., an effective drug fails a clinical trial).
The null hypothesis for each trial is that there is no difference between the active
drug treatment and placebo in mean weight change.
In trial 1 of drug A, the mean reduction with drug A relative to placebo was 6 kg, with only 40
subjects in each group. The P-value of 0.074 suggests that there is no evidence
against the null hypothesis of no effect of drug A at the 5% significance level. The
95% CI shows that the results of the trial are consistent with a difference ranging
from a large reduction of 12.6 kg in favor of drug A to a reduction of 0.6 kg in favor
of placebo.
The results for trial 2 among 400 patients, again for drug A, suggest that mean
weight was again reduced by 6 kg. This trial was much larger, and the P-value (P
< 0.001) shows strong evidence against the null hypothesis of no drug effect. The
95% CI suggests that the effect of drug A is a greater reduction in mean weight
over placebo of between 3.9 and 8.1 kg. Because this trial was large, the 95% CI
was narrow and the treatment effect was therefore measured more precisely.
In trial 3, for drug B, the reduction in weight was 4 kg. Since the P-value was
0.233, there was no evidence against the null hypothesis that drug B has no
effect over placebo. Again, this was a small trial with
a wide 95% CI, ranging from a reduction of 10.6 kg to an increase of 2.6 kg for the
drug B against the placebo.
The fourth trial on drug B was a large trial in which a relatively small, 2-kg
reduction in mean weight was observed in the active treatment group compared
with the placebo group. The P-value (0.008) suggests that there is strong evidence
against the null hypothesis of no drug effect. However, the 95% CI shows that the
reduction may be as little as 0.5 kg or as large as 3.5 kg. Even though this is convincing
statistically, any recommendation for its use should consider the small reduction
achieved alongside other benefits, disadvantages, and cost of this treatment.
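The link between an estimate, its 95% CI, and the P-value can be reproduced from the figures quoted for trial 2 above (a 6-kg reduction, 95% CI 3.9 to 8.1 kg). Note that the standard error below is back-calculated from the CI width, so this is an approximation, not the original analysis.

    # Minimal sketch: recover the SE, z, and P-value from trial 2's summary.
    from scipy import stats

    estimate = 6.0                        # mean reduction vs placebo (kg)
    ci_low, ci_high = 3.9, 8.1            # reported 95% CI
    se = (ci_high - ci_low) / (2 * 1.96)  # approx 1.07 kg

    z = estimate / se
    p = 2 * stats.norm.sf(abs(z))
    print(f"z = {z:.2f}, two-sided P = {p:.1e}")  # far below 0.001, as reported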
P-value
• The P-value is a “tool” to answer the question: could the observed results
have occurred by chance*?
Remember:
• The decision is made given the observed results in a SAMPLE
• The results are then extrapolated to the POPULATION
• *: chance here accounts exclusively for random error, not bias
Sample Size
The planned number of participants is calculated on the basis of:
• Expected effect of treatment(s)
• Variability of the chosen endpoint
• Accepted risks in the conclusions (Type I and Type II error rates)
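These three ingredients combine in the standard normal-approximation formula for comparing two means; in the sketch below, the effect size, variability, and risk levels are chosen purely for illustration.

    # Minimal sketch: per-group sample size for comparing two means,
    # n = 2 * (z_{1-alpha/2} + z_{1-beta})^2 * (sigma / delta)^2.
    import math
    from scipy import stats

    alpha, power = 0.05, 0.80   # accepted risks (Type I error, 1 - Type II)
    delta = 6.0                 # expected treatment effect
    sigma = 9.0                 # variability of the chosen endpoint

    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    n = 2 * (z_a + z_b) ** 2 * (sigma / delta) ** 2
    print("participants per group:", math.ceil(n))   # about 36 here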
95%CI
Better than p-values…
• use the data collected in the trial to give an estimate of the treatment effect
size, together with a measure of how certain we are of our estimate
•CI is a range of values within which the “true” treatment effect is believed to
be found, with a given level of confidence.
• 95% CI is a range of values within which the ‘true’ treatment effect will lie
95% of the time
•Generally, 95% CI is calculated as
• Sample Estimate ± 1.96 x Standard Error
2. Data Management:
1. CRF management with content and design
3. Study Implementation
It involves sample selection and implementation of randomization procedures
4. Study Monitoring
It involves monitoring of quality and of safety and efficacy
5. Data Analysis:
It involves writing a detailed analysis plan with all hypotheses to be tested, along with
the hierarchy of analyses
This helps in preparing for reporting and manuscript writing, and supports the validity
and credibility of the results
When interval estimates are reported, they should have a commonly held
interpretation in the scientific community and more widely. In this
regard, credible intervals are held to be most readily understood by the
general public. Interval estimates derived from fuzzy logic
have much more application-specific meanings.
For commonly occurring situations there should be sets of standard
procedures that can be used, subject to the checking and validity of any
required assumptions. This applies for both confidence intervals and
credible intervals.
For more novel situations there should be guidance on how interval
estimates can be formulated. In this regard, confidence intervals and
credible intervals have a similar standing, but there are differences.