
DATA ANALYSIS AND INTERPRETATION

INTRO
To analyse trial data, researchers rely on tried and tested statistical methods,
which have to be specified in a filing with the regulatory authorities before the
trial even begins. This makes it possible to monitor and check what’s happening to
the data at any time.
At participating trial centres, the bulk of the data is entered directly online and
electronically processed for submission to the authorities.
Phase III clinical trials are conducted under the direction of an independent
specialist in the disease area of interest. This ‘principal investigator’ is also the
one who will present the results at a medical meeting or in a medical journal —
even if the trial medication fails to produce the desired treatment response.
The quality and utility of monitoring, evaluation and research in our projects and
programmes fundamentally relies on our ability to collect and analyse quantitative
and qualitative data.
A necessary companion to a well-designed clinical trial is its appropriate statistical
analysis. Assuming that a clinical trial will produce data that could reveal
differences in effects between two or more interventions, statistical analyses are
used to determine whether such differences are real or are due to chance. Data
analysis for small clinical trials in particular must be focused. In the context of a
small clinical trial, it is especially important for researchers to make a clear
distinction between preliminary evidence and confirmatory data analysis. When
the sample population is small, it is important to gather considerable preliminary
evidence on related subjects before the trial is conducted to define the sample size
needed to detect a critical effect.
A single large clinical trial is often insufficient to answer a biomedical research
question, and it is even more unlikely that a single small clinical trial can do so.
Thus, analyses of data must consider the limitations of the data at hand and their
context in comparison with those of other similar or related studies.
Some Statistical Approaches to Analysis of Clinical Trials
1. Sequential analysis
2. Hierarchical models
3. Bayesian analysis
4. Decision analysis
5. Statistical prediction
6. Meta-analysis and other alternatives
7. Risk-based allocation
1) SEQUENTIAL ANALYSIS
Sequential analysis refers to an analysis of the data as they accumulate, with a
view toward stopping the study as soon as the results become statistically
compelling. This is in contrast to a sequential design, in which the probability
that a participant is assigned to a particular intervention is changed depending
on the accumulating results. In sequential analysis the probability of assignment
to an intervention is constant across the study.
In sequential analysis, the final sample size is not known at the beginning of the
study. On average, sequential analysis will lead to a smaller sample size
than that of an equivalently powered study with a fixed-sample-size design. This
is a major advantage to sequential analysis and is a reason that it should be given
consideration when one is planning and analyzing a small clinical trial.
The use of study stopping (cessation) rules that are based on successive
examinations of accumulating data may cause difficulties because of the need to
reconcile such stopping rules with the standard approach to statistical analysis
used for the analysis of data from most clinical trials. This standard approach is
known as the “frequentist approach.” In this approach the analysis takes a form
that is dependent on the study design. When such analyses assume a design in
which all data are simultaneously available, it is called a “fixed-sample analysis.”
If the data from a clinical trial are not examined until the end of the study, then a
fixed-sample analysis is valid.
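A minimal Python sketch of the idea, using simulated normal outcomes and a deliberately simple Bonferroni-style split of alpha across the interim looks (a conservative stand-in for formal group-sequential boundaries such as Pocock or O'Brien-Fleming, which the text does not specify); all values are hypothetical:

```python
import numpy as np
from scipy import stats

def sequential_trial(effect=0.5, sd=1.0, n_per_look=20, max_looks=5,
                     alpha=0.05, seed=0):
    """Simulate one two-arm trial analysed after every block of patients
    and stopped early once the z-statistic crosses the boundary.

    The boundary is a simple Bonferroni split of alpha across looks, a
    conservative stand-in for formal group-sequential boundaries.
    """
    rng = np.random.default_rng(seed)
    z_crit = stats.norm.ppf(1 - (alpha / max_looks) / 2)   # two-sided
    treat = np.array([])
    control = np.array([])
    for look in range(1, max_looks + 1):
        treat = np.append(treat, rng.normal(effect, sd, n_per_look))
        control = np.append(control, rng.normal(0.0, sd, n_per_look))
        se = np.sqrt(treat.var(ddof=1) / treat.size
                     + control.var(ddof=1) / control.size)
        z = (treat.mean() - control.mean()) / se
        if abs(z) > z_crit:
            return look, treat.size + control.size, z      # stopped early
    return max_looks, treat.size + control.size, z         # ran to the end

look, total_n, z = sequential_trial()
print(f"stopped at look {look} with {total_n} subjects (z = {z:.2f})")
```

Note that the final sample size is not fixed in advance: the trial may stop at any look, which is the defining feature described above.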
2) HIERARCHICAL MODELS
Hierarchical models can be quite useful in the context of small clinical trials in
two regards. First, hierarchical models provide a natural framework for
combining information from a series of small clinical trials conducted within
ecological units (e.g., space missions or clinics). In the case where the data are
complete, in which the same response measure is available for each individual,
hierarchical models provide a more rigorous solution than meta-analysis, in that
there is no reason to use effect magnitudes as the unit of observation. Note,
however, that a price must be paid (i.e., the total sample size must be increased)
to reconstruct a larger trial out of a series of smaller trials. Second, hierarchical
models also provide a foundation for analysis of longitudinal studies, which are
necessary for increasing the power of research involving small clinical trials. By
repeatedly obtaining data for the same subject over time as part of a study of a
single treatment or a crossover study, the total number of subjects required in
the trial is reduced.
A common theme in medical research is two-stage sampling, that is, sampling of
responses within experimental units (e.g., patients) and sampling of
experimental units within populations.
Analysis of this type of data (under the assumptions that a subset of the
regression parameters has a distribution in the population of participants and
that the model residuals have a distribution in the population of responses within
participants and also in the population of participants) belongs to the class of
statistical analytical models called:
• mixed models
• regression with randomly dispersed parameters
• exchangeability between multiple regressions
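As an illustrative sketch (not the source's own analysis), a two-stage model of this kind can be fitted in Python with the statsmodels mixed-effects API; the data and the variable names (response, treatment, clinic) are simulated and hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated two-stage data: responses nested within patients, patients
# nested within clinics (the "ecological units" mentioned in the text).
rng = np.random.default_rng(1)
clinic = np.repeat(np.arange(8), 12)                  # 8 small centres
treatment = np.tile([0, 1], 48)                       # alternating arms
clinic_shift = rng.normal(0, 0.5, 8)[clinic]          # random clinic effect
response = 1.0 * treatment + clinic_shift + rng.normal(0, 1, 96)

data = pd.DataFrame({"response": response,
                     "treatment": treatment,
                     "clinic": clinic})

# Random intercept for each clinic, fixed effect for treatment.
model = smf.mixedlm("response ~ treatment", data, groups=data["clinic"])
result = model.fit()
print(result.summary())
```

The random intercept lets each clinic (or mission) have its own baseline level, which is how the series of small trials is combined without reducing each trial to a single effect magnitude.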
3) BAYESIAN ANALYSIS
The majority of statistical techniques that clinical investigators encounter are of
the frequentist school and are characterized by significance levels, confidence
intervals, and concern over the bias of estimates. The Bayesian philosophy of
statistical inference, however, is fundamentally different from that underlying the
frequentist approach. In certain types of investigations Bayesian analysis can
lead to practical methods that are similar to those used by statisticians who use
the frequentist approach. The Bayesian approach has a subjective element. It
focuses on an unknown parameter value θ, which measures the effect of the
experimental treatment. Before designing a study or collecting any data, the
investigator acquires all available information about the activities of both the
experimental and the control treatments.
The attraction of the Bayesian approach lies in its simplicity of concept and the
directness of its conclusions. Its flexibility and lack of concern for interim
inspections are especially valuable in sequential clinical trials. The main problem
with the Bayesian approach, however, lies in the idea of a subjective distribution.
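A minimal illustration of this prior-to-posterior updating, assuming a conjugate Beta prior on a response rate θ and hypothetical small-trial data (the prior parameters and counts below are invented for the example):

```python
from scipy import stats

# Hypothetical prior belief about the response rate: Beta(2, 2),
# i.e., weakly centred on 50%.
prior_a, prior_b = 2, 2

# Hypothetical small-trial data: 9 responders out of 15 patients.
responders, n = 9, 15

# Conjugate update: Beta prior + binomial likelihood -> Beta posterior.
post_a = prior_a + responders
post_b = prior_b + (n - responders)
posterior = stats.beta(post_a, post_b)

print(f"posterior mean response rate: {posterior.mean():.2f}")
lo, hi = posterior.ppf([0.025, 0.975])
print(f"95% credible interval: ({lo:.2f}, {hi:.2f})")
```

Because the posterior is updated whenever new data arrive, interim inspections pose no special difficulty, which is the flexibility referred to above.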
4) DECISION ANALYSIS
Decision analysis is a modeling technique that systematically considers all
possible management options for a problem. It uses probabilities and utilities to
explicitly define decisions. The computational methods allow one to evaluate the
importance of any variable in the decision-making process. Sensitivity analysis
describes the process of recalculating the analysis as one changes a variable
through a series of plausible values.
The other major advantage of decision analysis occurs after data collection. If one
assumes that the sample size is inadequate and therefore that the confidence
intervals on the effect in question are wide, one may still have a clinical situation
for which a decision is required. One might have to make decisions under
conditions of uncertainty, despite a desire to increase the certainty. The use of
decision analysis can make explicit the uncertain decision, even informing the
level of confidence in the decision.
The formulation of a decision analytical model helps investigators consider which
health outcomes are important and how important they are relative to one another.
Decision analysis also facilitates consideration of the potential marginal benefit
of a new intervention by forcing comparisons with other alternatives.
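A toy sketch of the mechanics, with made-up probabilities and utilities (not taken from any real trial), including a one-way sensitivity analysis of the kind described above:

```python
# Hypothetical two-option decision: treat with a new intervention or not.
# All probabilities and utilities below are illustrative, not trial data.

def expected_utility(p_success, u_success, u_failure):
    """Expected utility of a strategy with a binary outcome."""
    return p_success * u_success + (1 - p_success) * u_failure

eu_treat = expected_utility(p_success=0.70, u_success=0.9, u_failure=0.4)
eu_no_treat = expected_utility(p_success=0.50, u_success=0.9, u_failure=0.4)
print(f"treat: {eu_treat:.2f}, no treatment: {eu_no_treat:.2f}")

# One-way sensitivity analysis: vary one variable (the success probability
# of treatment) through plausible values and see where the decision flips.
for p in (0.45, 0.50, 0.55, 0.60, 0.65, 0.70):
    eu = expected_utility(p, 0.9, 0.4)
    choice = "treat" if eu > eu_no_treat else "do not treat"
    print(f"p(success | treat) = {p:.2f} -> EU = {eu:.2f} -> {choice}")
```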
5) STATISTICAL PREDICTION
When the number of control samples is potentially large and the number of
experimental samples is small and is obtained sequentially from a series of
clusters with small sample sizes (e.g., space missions), traditional comparisons of
the aggregate means or medians may be of limited value. In those cases, one can
view the problem not as a classical hypothesis testing problem but as a problem
of statistical prediction. Conceptualized in that way, the problem is one of
deriving a limit or interval on the basis of the control distribution that will
include the mean or median for all or a subset of the experimental cluster
samples.
A drawback to this method is that the control group is typically not a concurrent
control group. Thus, if other conditions, in addition to the intervention being
evaluated, are changed, it will not be possible to determine whether the changes are in
fact due to the experimental condition.
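One simple way to operationalize this, assuming approximately normal control data, is a normal-theory prediction interval computed from the historical control sample, against which the mean of a new small experimental cluster can then be compared; the sketch below is illustrative only and the control data are simulated:

```python
import numpy as np
from scipy import stats

def prediction_interval_for_new_mean(control, m, level=0.95):
    """Normal-theory interval expected to contain the mean of m future
    observations, computed from a historical control sample."""
    n = len(control)
    xbar = np.mean(control)
    s = np.std(control, ddof=1)
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=n - 1)
    half_width = t_crit * s * np.sqrt(1 / m + 1 / n)
    return xbar - half_width, xbar + half_width

rng = np.random.default_rng(2)
historical_controls = rng.normal(100, 10, size=200)    # hypothetical data
lo, hi = prediction_interval_for_new_mean(historical_controls, m=6)
print(f"95% prediction interval for the mean of 6 new subjects: "
      f"({lo:.1f}, {hi:.1f})")
```

If the observed mean of a new experimental cluster falls outside this limit, that is taken as evidence of a difference from the (non-concurrent) controls, subject to the caveat noted above.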
6) META-ANALYSIS: SYNTHESIS OF RESULTS OF INDEPENDENT STUDIES
Meta-analysis refers to a set of statistical procedures used to summarize
empirical research in the literature. A meta-analysis can summarize an entire set
of research in the literature, a sample from a large population of studies, or some
defined subset of studies (e.g., published studies or n-of-1 studies). The degree to
which the results of a synthesis can be generalized depends in part on the nature
of the set of studies.
In general, meta-analysis serves as a useful tool to answer questions for which
single trials were underpowered or not designed to address. More specifically,
the following are benefits of meta-analysis:
• It can provide a way to combine the results of studies with different designs
(within reason) when similar research questions are of interest.
• It uses a common outcome metric when studies vary in the ways in which
outcomes are measured.
• It accounts for differences in precision, typically by weighting in proportion
to sample size.
• Its indices are based on sufficient statistics.
• It can examine between-study differences in results (heterogeneity).
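A minimal fixed-effect (inverse-variance) pooling sketch in Python, with hypothetical study estimates and standard errors; real meta-analyses would also consider random-effects models and formal heterogeneity testing:

```python
import numpy as np
from scipy import stats

# Hypothetical treatment-effect estimates (e.g., mean differences) and
# their standard errors from four independent small trials.
effects = np.array([1.2, 0.8, 1.5, 0.4])
ses = np.array([0.6, 0.5, 0.9, 0.3])

# Fixed-effect inverse-variance weighting: more precise studies count more.
weights = 1.0 / ses**2
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))
z = pooled / pooled_se
p_value = 2 * (1 - stats.norm.cdf(abs(z)))

# Cochran's Q as a crude check of between-study heterogeneity.
q_stat = np.sum(weights * (effects - pooled) ** 2)

print(f"pooled effect = {pooled:.2f} (SE {pooled_se:.2f}), p = {p_value:.4f}")
print(f"Cochran's Q = {q_stat:.2f} on {len(effects) - 1} df")
```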
7) RISK-BASED ALLOCATION
Empirical Bayes methods are needed for analysis of experiments with risk-based
allocation for two reasons. First, the natural heterogeneity from subject to
subject requires some accounting for random effects; and second, the differential
selection of groups due to the risk-based allocation is handled perfectly by the
“u-v” method.
The u-v method of estimation capitalizes on certain general properties of
distributions such as the Poisson or normal distribution that hold under arbitrary
and unknown mixtures of parameters, thus allowing for the existence of random
effects. At the same time, the u-v method allows estimation of averages under a
wide family of restrictions on the sample space, such as restriction to high-risk
or low-risk subjects, thus addressing the risk-based allocation design feature.
Statistics
The mathematics of the collection, organization, and interpretation of numerical
data, especially the analysis of population characteristics by inference from
sampling.

Why Statistics?
Medicine is a quantitative science but not exact
–Not like physics or chemistry
•Variation characterises much of medicine
•Statistics is about handling and quantifying variation and uncertainty
•Humans differ in response to exposure to adverse effects
Example: not every smoker dies of lung cancer, and some non-smokers die of lung
cancer
•Humans differ in response to treatment
Example: penicillin does not cure all infections
•Humans differ in disease symptoms
Example: Sometimes cough and sometimes wheeze are presenting features for
asthma
Statistics plays a very important role in every stage of a clinical trial, from design
and conduct through analysis and reporting, in terms of controlling for and minimizing
bias and confounding and measuring random error.
The statistician generates the randomization code, calculates the sample size,
estimates the treatment effect, and makes statistical inferences, so an appreciation
of statistical methods is fundamental to understanding randomized trial methods
and results.
Statistical analyses deal with random error by providing an estimate of how likely
it is that the measured treatment effect reflects the true effect.
Why Statistics Are Necessary
• Statistics can tell us whether events could have happened by chance and help
us make decisions
• We need to use Statistics because of variability in our data
• Generalise: can what we know help to predict what will happen in new and
different situations?
Statistics can be divided into three major categories based on how they are used:

1. DESCRIPTIVE STATISTICS
• Enumerate, organise, summarise, and categorise data, including graphical
representation. This type of statistics describes the data.
•Examples
• means and frequency of outcomes
• charts and graphs


Descriptive statistics are used to summarize and describe data collected in a study.
To summarize a quantitative (continuous) variable, measures of central location
(i.e. mean, median, and mode) and spread (e.g. range and standard deviation) are
often used, whereas frequency distributions and percentages (proportions) are
usually used to summarize a qualitative variable.
Descriptive statistics paint a picture of a situation, providing a concise numerical
or graphical summary. We use descriptive statistics every day to help
communicate a phenomenon or situation to other people. People often view
descriptive statistics as being more “objective” than non-numerical descriptions.
A set of descriptive statistics may be as follows: 75 people attended, the average
age was 35 years, ages ranged from 21 to 47, 65% were female and 35% male, 80% were
college graduates and 20% had advanced degrees, 30% were from out of state,
and 20% had black hair, 20% had blonde hair, 50% had brown hair, and 10%
had red hair. As you can see, descriptive statistics provide “hard” numbers
against which each person can compare his or her personal benchmarks.
A female friend may feel that 35% men is too few for a party, while a male friend
may be happy to hear that the crowd was 65% female.
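A short sketch of how such a descriptive summary could be computed in Python; the attendee data below are invented to mirror the party example:

```python
import numpy as np
from collections import Counter

# Hypothetical attendee data, loosely mirroring the party example above.
rng = np.random.default_rng(3)
ages = rng.integers(21, 48, size=75)          # quantitative variable
sexes = ["female"] * 49 + ["male"] * 26       # qualitative variable

print(f"n = {len(ages)}")
print(f"mean age = {ages.mean():.1f}, median = {np.median(ages):.1f}")
print(f"range = {ages.min()}-{ages.max()}, SD = {ages.std(ddof=1):.1f}")

# Frequencies and percentages for the qualitative variable.
for category, count in Counter(sexes).items():
    print(f"{category}: {count} ({100 * count / len(sexes):.0f}%)")
```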
2. COMPARATIVE STATISTICS
Simple descriptive statistics without a proper context or comparison may not be
useful. Is a party with 80% college graduates good or bad? How good or bad is a
35% to 65% male-to-female ratio? Remember very few things are inherently
good or bad. It all depends on comparisons. A party with 65% women may not be
good for a man who is used to attending social gatherings with 80% women.
However, a man who has been living and working among only men for several
years may be elated to find a party with any women at all. A more sophisticated
type of descriptive statistics serves to compare, in a numerical or graphical fashion,
one situation with another (or multiple other situations).
3. INFERENTIAL STATISTICS
• Drawing conclusions from incomplete information.
• They make predictions about a larger population given a smaller sample
• These are thought of as statistical tests
•Examples
• 95%CI, t-test, chi-square test, ANOVA, regression

Inferential statistics are used to make inferences or judgments about a larger
population based on the data collected from a small sample drawn from the
population. A key component of inferential statistics is hypothesis testing.
Examples of inferential statistical methods are the t-test and regression analysis.
Inferential statistics helps to suggest explanations for a situation or phenomenon.
It allows you to draw conclusions based on extrapolations, and is in that way
fundamentally different from descriptive statistics that merely summarize the
data that has actually been measured. Let us go back to our party example. Say
comparative statistics suggest that parties hosted by your friend Sophia are very
successful (e.g., the average number of attendees and the median duration of her
parties are greater than those of other parties). Your next questions may be: Why
are her parties so successful? Is it the food she serves, the size of her social
network, the prestige of her job, the number of men or women she knows, her
physical attractiveness, the alcohol she provides, or the location and size of her
residence? Inferential statistics may help you answer these questions. Finding
that less well-attended parties had on average fewer drinks served would suggest
that your friend Sophia’s drinks might be the important factor. The differences
in attendance and drinks served between her parties and other parties would
have to be large enough to draw any conclusions.
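A sketch of one basic inferential test in Python: a two-sample t-test on simulated attendance counts (all values are hypothetical and only illustrate the mechanics):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Hypothetical attendance counts at Sophia's parties versus other parties.
sophia = rng.normal(60, 10, size=12)
others = rng.normal(45, 10, size=12)

# Welch's two-sample t-test (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(sophia, others, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests the difference in mean attendance is unlikely to
# be due to chance alone; it does not, by itself, explain why it exists.
```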
SAMPLES
A population is an entire group of people, animals, objects, or data that meet a
specific set of criteria. A population may be real or hypothetical. A hypothetical
population could be the resulting cardiac ejection fractions if a certain heart
procedure were instituted among patients in France. The heart procedure is not
in use yet, and the resulting population is only a guess or prediction. A sample is
a portion or subset of a population. When populations are very large, observing
or testing every single member of the population becomes impractical.
Usually, samples must be random to be representative of the population.
Choosing a simple random sample is equivalent to putting every member of the
population in a hat or large container and blindly selecting a specified number of
members. Every member of a population has an equal chance of being selected
for a simple random sample. Selecting a stratified random sample involves first
dividing the population into different relevant categories or strata (e.g., men and
women or under 21 years old and 21 years and older) and then selecting random
samples from each category. Creating a non-random sample that is truly
representative of the population is very difficult since a population has so many
different characteristics.
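The two sampling schemes just described can be sketched in a few lines of Python, using a hypothetical sampling frame (the population and its proportions are invented):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
# Hypothetical sampling frame: 1,000 people with a sex label.
population = pd.DataFrame({
    "id": np.arange(1000),
    "sex": rng.choice(["female", "male"], size=1000, p=[0.6, 0.4]),
})

# Simple random sample: every member has the same chance of selection.
srs = population.sample(n=100, random_state=5)

# Stratified random sample: draw a random sample within each stratum
# (here, 50 from each sex regardless of its share of the population).
stratified = population.groupby("sex").sample(n=50, random_state=5)

print(srs["sex"].value_counts())
print(stratified["sex"].value_counts())
```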
Choosing the size of your study population (i.e., the sample size) is a critical
decision. Whether you are setting up a clinical study or analysing existing data,
you need to know how many subjects to include in your study or whether you
already have enough subjects. Samples that are too large waste time and
resources and, in a clinical trial, may expose too many patients to potential harm.
Conversely, samples that are too small confer an inadequate study power and
may generate inaccurate or inconclusive results. Having patients participate in a
potentially useless clinical study wastes their time and effort and, as a result,
may be quite unethical.
BIAS
Systematic errors associated with the inadequacies in the design, conduct, or
analysis of a trial on the part of any of the participants of that trial (patients,
medical personnel, trial coordinators or researchers), or in the publication of its
results, that make the estimate of a treatment effect deviate from its true value.
Systematic errors are difficult to detect and cannot be analyzed statistically but
can be reduced by using randomization, treatment concealment, blinding, and
standardized study procedures.
CONFIDENCE INTERVALS
A range of values within which the "true" population parameter (e.g. mean,
proportion, treatment effect) is likely to lie. Usually, 95% confidence limits are
quoted, implying that there is 95% confidence in the statement that the "true"
population parameter will lie somewhere between the lower and upper limits.

STATISTICS AND PARAMETERS

1 Central Tendency
The central tendency of a variable is the “middle” or “typical” values of the
variable in a sample or population. Measures of the central tendency provide a
single number answer to the question: What is the typical value of that variable
in your sample or population? People who want to “go along with the crowd” are
most interested in central tendencies. Our choice of the central tendency measure
depends on the situation and the distribution of the data.
Table 3.1 lists the common measures of central tendencies.

2 Spread (Dispersion or Variability)


The spread (dispersion or variability) measures how different values of a
variable are from each other. The greater the diversity of values, the greater the
spread is. Table 3.2 lists the common measures of spread. Each measure of
spread has its advantages and disadvantages, but the standard deviation is by far
the most commonly used measure.
3 Shape
Knowing the central tendency and the spread of data does not necessarily give
you enough information. Two different sets of data could have equivalent
measures of the central tendency and spread but be very different (Figure 3.1).
Therefore, knowing the distribution of the variable’s values is very helpful.

The distribution is the shape of the curve generated if you were to plot the
frequencies of each value of the variable. The two common measures of shape are
skew and kurtosis.
Skew is the degree to which a curve is “bent” toward one direction.

The skew is:


Positive (Right skewed): If the curve’s right tail is longer than its left and most of
the distribution is shifted to the left. The mean is usually (but not always) greater
than the median (Figure 3.2a).
Negative (Left skewed): If the curve’s left tail is longer than its right and most of
the distribution is shifted to the right. The mean is usually (but not always) less
than the median (Figure 3.2b).
Zero (No skew): If the right and left tails are of equal length and most of the
distribution is in the center of the curve. The normal distribution has a skew of
zero. The mean and the median are usually (but not always) equal (Figure 3.2c).

The kurtosis is the degree to which a curve is peaked or flat. The greater the
kurtosis, the more “peaked” a curve is. The less the kurtosis, the flatter the curve
is.

The kurtosis may be:


Zero (mesokurtic or mesokurtotic): Normal distributions have zero kurtosis
(Figure 3.3a).
Positive (leptokurtic or leptokurtotic): Compared to a normal curve, these curves
have a more acute, taller peak at the mean (i.e., a greater number of values close
to the mean) and “fatter” tails (i.e., a greater number of extreme values) at the
extremes (Figure 3.3b).
Negative (platykurtic or platykurtotic): Compared to a normal curve, these
curves have a flatter, lower peak at the mean (i.e., a smaller number of values
close to the mean) and “thinner” tails (i.e., fewer extreme values) (Figure 3.3c).

NORMAL DISTRIBUTION
1 The Importance of Normal Distributions
Normal distributions, also known as bell-shaped curves or Gaussian
distributions, have the same general shape: symmetric and unimodal (i.e., a
single peak) with tails that appear to extend to positive and negative infinity. In a
normal curve, approximately 68% of the values fall within one standard
deviation of the mean, 95% fall within two standard deviations, and 99.7% fall
within three standard deviations.
Normal curves can differ in spread. Like most distributions, the normal
distribution has a mean (𝜇) and a standard deviation (𝜎).
Understanding the normal distribution is important. Interestingly, many
biological, psychological, sociological, economic, chemical, and physical
variables exhibit normal distributions. A classic example is that educational test
scores tend to follow a bell curve: most students score close to the mean and far
fewer have very high or very low scores. In fact, when the distribution of a
variable is unknown, you frequently assume that it is normal until proven
otherwise. Many statistical tests are based on normal distributions. Violations of
normal distributions, in fact, may invalidate some of these tests, although many
statistical tests still function reasonably well with other types of distributions.
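The 68/95/99.7 figures quoted above can be checked directly with scipy (a quick sketch; any mean and standard deviation give the same answer):

```python
from scipy import stats

# Probability mass of a normal distribution within k standard deviations
# of the mean; the result does not depend on the mean or the SD.
for k in (1, 2, 3):
    coverage = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"within {k} SD of the mean: {coverage:.3%}")
# Prints roughly 68.3%, 95.4%, and 99.7%.
```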

2 Standard Normal Distribution


The standard normal (or Z-) distribution is a special normal distribution that has
a mean of 0 and a standard deviation of 1. If you know the mean and standard
deviation of a normal distribution, you can transform it into a standard normal
distribution.
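Concretely, a value x from a normal distribution with mean μ and standard deviation σ is standardized as z = (x − μ)/σ. A small hypothetical illustration:

```python
# Standardizing a hypothetical cholesterol value of 230 mg/dL from a
# population assumed to have mean 200 mg/dL and SD 25 mg/dL.
x, mu, sigma = 230, 200, 25
z = (x - mu) / sigma
print(f"z = {z:.2f}")   # 1.20 standard deviations above the mean
```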

3 The t -Distribution
While large samples have a distribution close to the normal distribution (i.e., the
larger the sample, the more normal the distribution), small samples do not.
Moreover, many times the population standard deviation is unknown, requiring
you to use the sample standard deviation to estimate the population standard
deviation. Whenever you have a small sample or do not know the population
standard deviation, using a t-distribution may be more appropriate than using a
normal distribution. t-Distributions are similar in shape to the standard normal
distribution (i.e., symmetrical and bell shaped) but have a lower peak and heavier
tails. Unlike the normal distribution, the t-distribution has an additional parameter,
degrees of freedom (df), that can be any real number greater than zero and
changes the t-distribution curve’s shape. Curves with smaller df have more of
their area under their tails and are therefore flatter than curves with higher
degrees of freedom. As df increase, the t-distribution becomes more and more
like the standard normal distribution.
The t-score (or t-statistic) is to the t-distribution what the Z-score is to the
normal distribution. Use the following formula to calculate a t-score:
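A commonly used one-sample form of this formula is

t = (x̄ − μ) / (s / √n)

where x̄ is the sample mean, μ the hypothesized population mean, s the sample standard deviation, and n the sample size.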

Two statistical approaches are often used for clinical data analysis: hypothesis
testing and statistical estimate.

Hypothesis Testing
» Hypothesis testing or inference involves an assessment of the probability of
obtaining an observed treatment difference or more extreme difference for
an outcome assuming that there is no difference between two treatments
» This probability is often called the P-value or false-positive rate.
» If the P-value is less than a specified critical value (e.g., 5%), the observed
difference is considered to be statistically significant. The smaller the P-
value, the stronger the evidence is for a true difference between treatments.
» On the other hand, if the P-value is greater than the specified critical value
then the observed difference is regarded as not statistically significant, and
is considered to be potentially due to random error or chance. The traditional
statistical threshold is a P-value of 0.05 (or 5%), which means that we only
accept a result when the likelihood of the conclusion being wrong is less than
1 in 20, i.e., we conclude that only one out of a hypothetical 20 trials will
show a treatment difference when in truth there is none.

When hypothesis testing arrives at the wrong conclusions, two types of errors can
result: Type I and Type II errors (Table 3.4). Incorrectly rejecting the null
hypothesis is a Type I error, and incorrectly failing to reject a null
hypothesis is a Type II error. In general, Type I errors are considered more serious
than Type II errors; seeing an effect when there isn’t one (e.g., believing an
ineffectual drug works) is worse than missing an effect (e.g., an effective drug
fails a clinical trial).

Alpha (Type I) and Beta (Type II) Errors


Type I Error (α)
False positive
• Rejecting the null hypothesis when in fact it is true
• Standard: α = 0.05
• In words, chance of finding statistical significance when in fact there truly
was no effect
Type II Error (β)
False negative
• Accepting the null hypothesis when in fact the alternative is true
• Standard: β = 0.20 or 0.10
• In words, chance of not finding statistical significance when in fact there was
an effect
• When testing a hypothesis, two types of errors can occur.
• To explain these two types of errors, we will use the example of a
randomized, double-blind, placebo-controlled clinical trial on a cholesterol-
lowering drug ‘A’ in middle-aged men and women considered to be at high
risk for a heart attack.
• The primary endpoint is the reduction in the total cholesterol level at 6
months from randomization.
• The null hypothesis is that there is no difference in mean cholesterol
reduction level at 6 months post dose between patients receiving drug A (μ1)
and patients receiving placebo (μ2) (H0: μ1 = μ2); the alternative hypothesis
is that there is a difference (Ha: μ1 ≠ μ2).
• If the null hypothesis is rejected when it is in fact true, then a Type I error
(or false-positive result) occurs. For example, a Type I error is made if the
trial result suggests that drug A reduced cholesterol levels when in fact there
is no difference between drug A and placebo. The chosen probability of
committing a Type I error is known as the significance level.
• As discussed above, the level of significance is denoted by α.
• In practice, α represents the consumer’s risk, which is often chosen to be 5%
(1 in 20).
• On the other hand, if the null hypothesis is not rejected when it is actually
false, then a Type II error (or false-negative result) occurs. For example,
a Type II error is made if the trial result suggests that there is no difference
between drug A and placebo in lowering the cholesterol level when in fact
drug A does reduce the total cholesterol. The probability of committing a Type
II error, denoted by β, is sometimes referred to as the manufacturer’s risk.
• The power of the test is given by 1 – β, representing the probability of
correctly rejecting the null hypothesis when it is in fact false. It relates to
detecting a pre-specified difference.
Statistical Estimate

» Statistical estimates summarize the treatment differences for an outcome in
the form of point estimates (e.g., means or proportions) and measures of
precision (e.g., confidence intervals [CIs]).
» A 95% CI for a treatment difference means that, if the trial were repeated many
times and a CI calculated each time, about 95 out of 100 such intervals would
contain the true value of the treatment difference, i.e., the value we would
obtain if we could study the entire available patient population.
Let us assume that four randomized, double-blind, placebo-controlled trials are
conducted to establish the efficacy of two weight-loss drugs (A and B) against
placebo, with all subjects, whether on a drug or placebo, receiving similar
instructions as to diet, exercise, behaviour modification, and other lifestyle
changes.
The primary endpoint is the weight change (kg) at 2 months from baseline. The
difference in the mean weight change between the active drug and placebo groups
can be considered the weight reduction for the active drug relative to placebo.
Table 2 presents the results of hypothesis tests and CIs for the four hypothetical
trials.

The null hypothesis for each trial is that there is no difference between the active
drug treatment and placebo in mean weight change.
In trial 1 of drug A, the reduction of drug A over placebo was 6 kg with only 40
subjects in each group. The P-value of 0.074 suggests that there is no evidence
against the null hypothesis of no effect of drug A at the 5% significance level. The
95% CI shows that the results of the trial are consistent with a difference ranging
from a large reduction of 12.6 kg in favor of drug A to a reduction of 0.6 kg in favor
of placebo.
The results for trial 2 among 400 patients, again for drug A, suggest that mean
weight was again reduced by 6 kg. This trial was much larger, and the P-value (P
< 0.001) shows strong evidence against the null hypothesis of no drug effect. The
95% CI suggests that the effect of drug A is a greater reduction in mean weight
over placebo of between 3.9 and 8.1 kg. Because this trial was large, the 95% CI
was narrow and the treatment effect was therefore measured more precisely.
In trial 3, for drug B, the reduction in weight was 4 kg. Since the P-value was
0.233, there was no evidence against the null hypothesis that drug B has no
benefit over placebo. Again this was a small trial with
a wide 95% CI, ranging from a reduction of 10.6 kg to an increase of 2.6 kg for the
drug B against the placebo.
The fourth trial on drug B was a large trial in which a relatively small, 2-kg
reduction in mean weight was observed in the active treatment group compared
with the placebo group. The P-value (0.008) suggests that there is strong evidence
against the null hypothesis of no drug effect. However, the 95% CI shows that the
reduction is as little as 0.5 kg and as high as 3.5 kg. Even though this is convincing
statistically, any recommendation for its use should consider the small reduction
achieved alongside other benefits, disadvantages, and cost of this treatment.

THE NULL HYPOTHESIS


A clinical study is essentially hypothesis testing. A hypothesis is a postulated
theory about or suggested explanation for a phenomenon. Every formal study or
experiment should consist of two rival and polar opposite hypotheses: a null
hypothesis (H0) and an alternative hypothesis (H1). The goal of the experiment
is to either accept or reject the null hypothesis. Rejecting the null hypothesis
suggests that the alternative hypothesis is a viable theory or explanation. For
example, the null hypothesis may be that there is no difference between two
groups of patients. If the study shows a difference between the two groups that is
not likely due to random chance, you may reject the null hypothesis and infer
that the alternative hypothesis (i.e., there is truly a difference between the two
groups) may be true.
The null hypothesis could be that a measure or the difference between two
measures is equal to, greater than, or less than a certain value. The null
hypothesis also does not have to involve means. The null hypothesis may state
that there is no difference between two or among several medians, proportions,
or correlation coefficients

P-value
• The p-value is a “tool” to answer the question:
• Could the observed results have occurred by chance*?
Remember:
– Decision given the observed results in a SAMPLE
– Extrapolating results to POPULATION
*: accounts exclusively for the random error, not bias

Sample Size
The planned number of participants is calculated on the basis of:
• Expected effect of treatment(s)
• Variability of the chosen endpoint
• Accepted risks in conclusion
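As an illustrative sketch (assuming the statsmodels Python package and a two-arm comparison of means; all planning values are hypothetical), these inputs map onto a standard power-based sample size calculation:

```python
from statsmodels.stats.power import TTestIndPower

# Hypothetical planning assumptions for a two-arm comparison of means.
expected_difference = 5.0                 # expected treatment effect
sd = 10.0                                 # variability of the endpoint
effect_size = expected_difference / sd    # standardized effect (Cohen's d)

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=effect_size,
                                   alpha=0.05,          # type I risk
                                   power=0.80,          # 1 - beta
                                   alternative="two-sided")
print(f"approximately {n_per_group:.0f} subjects per group")
```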

95%CI
Better than p-values…
• use the data collected in the trial to give an estimate of the treatment effect
size, together with a measure of how certain we are of our estimate
•CI is a range of values within which the “true” treatment effect is believed to
be found, with a given level of confidence.
• 95% CI is a range of values within which the ‘true’ treatment effect will lie
95% of the time
•Generally, 95% CI is calculated as
• Sample Estimate ± 1.96 x Standard Error
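A small sketch applying that formula; the estimate is taken from the weight-loss example above and the standard error is back-calculated for illustration only:

```python
# Hypothetical summary: mean weight reduction vs placebo and its standard
# error (SE chosen so the interval roughly matches a 3.9-8.1 kg CI).
estimate = 6.0        # kg
std_error = 1.07      # kg

half_width = 1.96 * std_error
print(f"95% CI: ({estimate - half_width:.1f} kg, "
      f"{estimate + half_width:.1f} kg)")
```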

Interpreting Results of Clinical Trials


• Studies termed “negative” are commonly defined as those where the
difference for the primary endpoint has a P value greater than or equal to
0.05 (P ≥ 0.05) (1), that is, where the null hypothesis is not rejected. These
studies are difficult to publish because they are said to be “nonsignificant.”
In other words, the data are not strong enough to persuade rejection of the
null hypothesis. A high P value is frequently interpreted as proof that the null
hypothesis is true; however, such an interpretation is a logical fallacy.
• On the other hand, if a small P value is observed, it implies there is evidence
that the null hypothesis is false, which is why much stronger claims can be
made when the null hypothesis is rejected. Recall that the null hypothesis
(H0) is a stated value of the population parameter that is set up to be refuted.
Most often, the value of H0 states that the effect of interest (e.g., mean
difference, squared multiple correlation coefficient, regression coefficient) is
zero.

• This point is illustrated in Table 1, a 2 × 2 statistical inference decision table,
wherein the true but unknown state of the world is crossed with the
statistical decision, thereby generating conceptual definitions for the type 1
and type 2 error rates.
• Until statistical evidence in the form of a hypothesis test indicates otherwise,
the null hypothesis is presumed true (1). For example, in a clinical trial with
an intervention and a control group, the null hypothesis generally proposes
that the intervention and control are equally effective or ineffective. That
is, the population mean of the treatment and control groups is assumed to be
the same until an effect estimate (which reaches some prespecified statistical
threshold) is observed.

Analysis of clinical trials


The analysis of clinical trials involves many related topics, including:
• the choice of an estimand (measure of effect size) of interest that is closely
linked to the objectives of the trial,
• the choice and definition of analysis sets,
• the choice of an appropriate statistical model for the type of data being
studied,
• appropriate accounting for the treatment assignment process,
• handling of missing data,
• handling of multiple comparisons or endpoints,
• accounting for interim analyses and trial adaptations,
• and appropriate data presentation.

Statistical methodology and case study


1: Bonferroni method: the Bonferroni method strictly controls the type I error rate
at or below 0.05. The alpha adjusted by the Bonferroni method is calculated as 0.05/m
(where m is the number of comparisons); a null hypothesis is rejected when its p-value
is less than the adjusted alpha.
2: Šidák method: the methodology is similar to the Bonferroni method. The alpha
adjusted by the Šidák method is calculated as 1 − (1 − α)^(1/m) (where m is the number
of comparisons); a null hypothesis is rejected when its p-value is less than the
adjusted alpha.
3: Holm stepwise test: the Holm multiple test is based on a sequentially rejective
algorithm that tests the ordered hypotheses H(1), . . ., H(m) corresponding to the
ordered p-values p(1), . . ., p(m). The procedure begins with the hypothesis associated
with the most significant p-value: H(1) is rejected if p(1) < α/m and, more generally,
H(i) is rejected at the ith step if p(j) < α/(m − j + 1) for all j = 1, . . ., i;
otherwise, H(i), . . ., H(m) are retained and the algorithm terminates.
4: Hochberg stepwise test: in contrast with the Holm test, this test examines the
ordered p-values p(1), . . ., p(m) starting with the largest one and thus falls into
the class of step-up tests.
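As an illustrative sketch (not from the source), the Bonferroni, Šidák, and Holm adjustments described above can be written as small Python functions; the p-values used are hypothetical:

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Reject H(i) when p_i is below alpha / m."""
    m = len(pvals)
    return [p < alpha / m for p in pvals]

def sidak(pvals, alpha=0.05):
    """Reject H(i) when p_i is below 1 - (1 - alpha)**(1/m)."""
    m = len(pvals)
    threshold = 1 - (1 - alpha) ** (1 / m)
    return [p < threshold for p in pvals]

def holm(pvals, alpha=0.05):
    """Step-down test on the ordered p-values; stop at the first failure."""
    m = len(pvals)
    order = np.argsort(pvals)
    reject = [False] * m
    for step, idx in enumerate(order):
        if pvals[idx] < alpha / (m - step):
            reject[idx] = True
        else:
            break            # this and all larger p-values are retained
    return reject

pvals = [0.001, 0.020, 0.040, 0.300]     # hypothetical p-values
print("Bonferroni:", bonferroni(pvals))
print("Sidak:     ", sidak(pvals))
print("Holm:      ", holm(pvals))
```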

UNIT VII- Data Analysis and interpretation


MCQs ‐ 1 Mark
1. Which of the following is the odd one out?
A. Kaplan‐Meier
B. Histogram
C. ANOVA
D. Bar
2. What would be being tested for if a Shapiro‐Wilk Test was referenced?
A. Difference in proportions
B. Non‐inferiority
C. Normality
D. Correlation
3. If two primary endpoints were referred to as co‐primary and the
significance level set to 5% which of the following would be classed as a
statistically significant primary endpoint?
A. Primary Endpoint 1: p=0.04, Primary Endpoint 2: p=0.06
B. Primary Endpoint 1: p=0.000001, Primary Endpoint 2: p=0.051
C. Primary Endpoint 1: p=0.06, Primary Endpoint 2: p=0.049
D. Primary Endpoint 1: p=0.049, Primary Endpoint 2: p=0.049
4. If two primary endpoints were referred to as multiple or alternative
primary and the significance level set to 5% which of the following would not
be classed as a statistically significant primary endpoint?
A. Primary Endpoint 1: p=0.04, Primary Endpoint 2: p=0.04
B. Primary Endpoint 1: p=0.00001, Primary Endpoint 2: p=0.06
C. Primary Endpoint 1: p=0.024, Primary Endpoint 2: p=0.001
D. Primary Endpoint 1: p=0.026, Primary Endpoint 2: p=0.024
5. Which of the following is not a method to maintain the type I error level in
a clinical trial?
A. Bonferroni
B. Pearson
C. O'Brien‐Fleming
D. Holm Step Down
6. If the sample size is greater than 2, which of the following would give the
smallest value?
A. 1 standard deviation
B. 2 standard deviations
C. 1 standard error
D. 2 standard errors
7. Which of the following is NOT true?
A. Standard error will reduce with increasing sample size
B. Standard deviation and variability reduce as N gets larger
C. Standard deviation remains reasonably constant as sample size increases
D. If standard deviation increases then so does standard error
8. Which of the following approaches to analysis populations would you expect
to be used to analyse a Phase III Non‐inferiority trial?
A. Intention to treat
B. Per Protocol
C. All treated
D. Modified Intention to treat
9. Which of the following is a non‐parametric test?
A. ANOVA
B. ANCOVA
C. Wilcoxon Rank Sum Test
D. t‐test
LONG ESSAY ‐ 10 Marks
A. Define biostatistics and explain the types of statistics.
Biostatistics, an application of statistics, is the science of the design and analysis
of data from biological experiments, especially in medicine and related
research. By definition, it is the collection, summarization, and analysis of data
from those experiments, and the interpretation of, and inference from, the results.
Biostatistics also helps in presenting scientific manuscripts with relatively
sophisticated statistical analyses of complex sets of medical data in
renowned scientific journals.
» Biostatistical analysis is key to conducting new clinical research and is one of
the foundations of evidence-based clinical practice. It evaluates and applies
prior research findings precisely to new research. With less than 10
per cent of new compounds reaching the market, the need for advanced
biostatistics is increasing every day.
» With the continuous evolution of biostatistical methods over the past few
centuries, they can now help in experimental design with a prospective
approach. The p-value, which came into existence only 80 years ago, has made
the scope and use of biostatistics grow by leaps and bounds. The p-value
summarizes whether the observed data could have happened by chance and is
widely used in clinical research involving:
Investigating proposed medical treatments
Assessing the relative benefits of competing therapies
Establishing optimal treatment combinations

Roles and Importance of Biostatistics in Clinical Research:


1. Protocol Development
In protocol development in clinical research, the following are the roles and
responsibilities of the biostatistician:
» Objectives:
» Study Design
» Sample size
» Analysis plan summary
» Protocol Review

2. Data Management:
1. CRF management with content and design

2. Dataset specification with annotation of CRFs and record layout

3. Validation with error checking specification and test data

3. Study Implementation
It involves sample selection and implementation of randomization procedures

4. Study Monitoring
It involves monitoring of data quality and of safety and efficacy

5. Data Analysis:
• Writing a detailed analysis plan with all hypotheses to be tested, along with the hierarchy
of analyses
• It helps to prepare for reporting and manuscript writing, along with the validity and
credibility of results

6. Reports or Manuscript writing:


• Method section with statistical methodology and description of data with endpoints and
design
• Result section includes data presented in the form of graphs, tables, and others
• Discussion section with the appropriate interpretation of the results

B. Classify types of data and explain.


Refer to the PDF on types of data.

SHORT ESSAY ‐ 5 Marks


A. Define population and sample size.
B. Describe P-value in details.
C. Describe type I and Type-II Error.
D. What is interval estimation? Explain.
Interval estimation is the use of sample data to calculate an interval of possible
values of an unknown population parameter; this is in contrast to point
estimation, which gives a single value. The most prevalent forms of interval
estimation are:
• Confidence intervals (a frequentist method): In statistics, a confidence
interval (CI) is a type of estimate computed from the statistics of the
observed data. It proposes a range of plausible values for an
unknown parameter (for example, the mean). The interval has an
associated confidence level expressing how confident we are that the true
parameter lies in the proposed range. Given observations and a chosen
confidence level, a valid confidence interval has that probability of
containing the true underlying parameter.
The level of confidence can be chosen by the investigator. In general terms,
a confidence interval for an unknown parameter is based on the sampling
distribution of a corresponding estimator.
• Credible intervals (a Bayesian method): In Bayesian statistics, a credible
interval is an interval within which an unobserved parameter value falls
with a particular probability. It is an interval in the domain of a posterior
probability distribution or a predictive distribution. The generalisation to
multivariate problems is the credible region. Credible intervals are
analogous to confidence intervals in frequentist statistics, although they
differ on a philosophical basis: Bayesian intervals treat their bounds as
fixed and the estimated parameter as a random variable, whereas
frequentist confidence intervals treat their bounds as random variables and
the parameter as a fixed value.
The scientific problems associated with interval estimation may be summarised
as follows:

• When interval estimates are reported, they should have a commonly held
interpretation in the scientific community and more widely. In this
regard, credible intervals are held to be most readily understood by the
general public. Interval estimates derived from fuzzy logic
have much more application-specific meanings.
• For commonly occurring situations there should be sets of standard
procedures that can be used, subject to the checking and validity of any
required assumptions. This applies for both confidence intervals and
credible intervals.
• For more novel situations there should be guidance on how interval
estimates can be formulated. In this regard confidence intervals and
credible intervals have a similar standing, but there are differences:
• Credible intervals can readily deal with prior information, while
confidence intervals cannot.
• Confidence intervals are more flexible and can be used practically in
more situations than credible intervals; one area where credible
intervals suffer in comparison is in dealing with non-parametric models.
