Understanding Research
Study Designs
[Design algorithm] Did the investigator assign exposures?
• Yes → experimental study
• No → observational study; classify by direction of inquiry:
– E → O: cohort study
– E and O assessed at the same time: cross-sectional study
– O → E: case-control study
E = Exposure; O = Outcome
Randomization process
• Randomization: the process by which each
subject has the same chance of being assigned
to either the intervention or the control group.
• It tends to produce study groups comparable with
respect to known and unknown risk factors,
• removes investigator bias in the allocation of
participants,
• and guarantees that statistical tests will have
valid significance levels.
Design: comparative trials
• There are two types of experimental
designs of comparative clinical trials:
• Fixed-sample trials: the number of patients
allocated to the two (or more) treatments
is fixed before the study begins.
• Sequential trials: the decision whether to
continue taking new patients is determined
by the results accumulated to that time
(discussed later).
Fixed-sample designs
Simple randomized design:
• Patients are randomized to the two (or more) treatments without
considering their characteristics.
• Its main advantages are simplicity and usefulness when prognostic factors
are unknown or when the potential subjects are homogeneous.
• To allot "A" and "B", toss a coin with H = A and T = B; the design becomes
AAABBAAAAABABABBAAAABAA… as H or T occurs.
Another method uses a table of simple random numbers (see the sketch below):
• With two treatments, digits 0-4 = treatment A and digits 5-9 = treatment B.
• Numbers in the top row of the table: 0 5 2 7 8 4 3 7 4 1 6 8 3 8 etc.
• Sequence of treatments: A B A B B A A B A A B B A B etc.
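A minimal sketch of the random-digit rule in Python (digits 0-4 → A, 5-9 → B); the digit row is the one from the table above:

```python
import random

def simple_randomize(n_subjects, seed=None):
    """Assign each subject independently to A or B with equal probability,
    mimicking the random-digit rule: digit 0-4 -> A, digit 5-9 -> B."""
    rng = random.Random(seed)
    return ['A' if rng.randint(0, 9) <= 4 else 'B' for _ in range(n_subjects)]

# The digit row above reproduces the slide's sequence:
digits = [0, 5, 2, 7, 8, 4, 3, 7, 4, 1, 6, 8, 3, 8]
print(['A' if d <= 4 else 'B' for d in digits])  # A B A B B A A B A A B B A B
```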
Fixed-sample designs
Stratified randomized design (randomized block design)
• When patients can be grouped into prognostic categories, comparability among
treatment groups can be achieved (simple randomization does not guarantee
equal groups).
• Repeated successive occurrence of the same treatment is eliminated.
• Within each group, patients are randomly assigned to the treatments.
• For comparing 4 treatments, experimental units may be grouped in groups
of 4 (called replications); units within a replication are similar.
• Each treatment is then allocated by randomization to one unit in each
replication. With two treatments in blocks of four, an allocation might
look like (see the sketch below):
– AABB-ABBA-BBAA-BAAB-ABAB-AABB-…
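A sketch of permuted-block allocation in Python, matching the blocks-of-four example above:

```python
import random

def block_randomize(n_blocks, treatments=('A', 'B'), reps_per_block=2, seed=None):
    """Randomized block design: within each block, each treatment appears
    reps_per_block times in a randomly shuffled order, so group sizes stay
    balanced and long runs of one treatment cannot occur."""
    rng = random.Random(seed)
    allocation = []
    for _ in range(n_blocks):
        block = list(treatments) * reps_per_block  # e.g. ['A', 'A', 'B', 'B']
        rng.shuffle(block)
        allocation.append(''.join(block))
    return allocation

print('-'.join(block_randomize(6, seed=1)))  # e.g. AABB-ABBA-BBAA-...
```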
Designs
• Parallel: subjects are randomly assigned to treatments, which
then proceed in parallel for each group.
• The aim is to provide assurance that any difference between
treatments is in fact due to treatment effects.
• This is unlike a crossover study, where one group first receives
treatment A followed by treatment B, while the other group
receives treatment B followed by treatment A.
• A parallel-design clinical trial compares the results of a
treatment on two separate groups of patients; each group of
participants is exposed to only one of the study interventions.
• In a parallel-group (also termed "completely randomized")
design, each patient receives a single treatment.
• In a crossover design, each patient receives some or all of the
treatments being studied.
• A parallel-group study is a simple and commonly used clinical
design which compares two treatments; usually a test therapy is
compared with a standard therapy.
• The allocation of subjects to groups is usually achieved by
randomisation.
• The groups are typically named the treatment group and
the control group.
• Parallel-group designs do not require the same number of
subjects in each group, although similar numbers are often
observed.
• The design is commonly used in randomised controlled trials.
Parallel and crossover
Crossover Designs
• Some trials may invoke a crossover design in which patients
serve as their own controls.
• For example, subjects may undergo an experimental therapy
for six weeks and then “cross over” to the control therapy for
another six weeks (or vice versa).
• Crossover designs are appealing because the patients serve
as their own controls.
• A crossover design typically will require a much smaller
sample size than a “parallel” design.
• Crossover designs are problematic when effects of the first
treatment may still be present for a long period (carry-over),
or when the first treatment permanently changes the course
of the disease.
• Methods to minimize carry-over effects and to test
for the presence of carry-over are available.
• Nearly all crossover designs have "balance";
in most crossover trials, in fact, each subject
receives all treatments.
• A crossover study has two advantages over a
non-crossover longitudinal study.
• First, the influence of confounding covariates
is reduced because each crossover patient
serves as his or her own control.
• Second, optimal crossover designs are
statistically efficient and so require fewer
subjects than do non-crossover designs.
Suitability of Cross-Over Design
• Suitable where there is no carry-over effect.
• Example: a drug trial for asthma, in which a subject is given oral
salbutamol for a specified period and then, after the washout
period, the salbutamol puff (inhaled form).
• Randomization, e.g. by tossing a coin, decides which patients get
treatment A first and then B, and which get B and then A
(see the sketch below).
• The order effect is removed.
• Variability in response between groups is minimized.
• Efficiency is increased because a smaller sample size suffices.
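A minimal sketch of the coin-toss sequence assignment in Python (the patient IDs are hypothetical):

```python
import random

def assign_crossover(patient_ids, seed=None):
    """Randomize patients to the AB or BA sequence of a two-period
    crossover trial (treatment A then B, or B then A, with a washout
    period between the two treatment periods)."""
    rng = random.Random(seed)
    return {pid: rng.choice(['AB', 'BA']) for pid in patient_ids}

print(assign_crossover(['pt01', 'pt02', 'pt03', 'pt04'], seed=7))
```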
• Limitations and disadvantages
• Suitable for chronic conditions; for curative treatments or rapidly changing
conditions, cross-over trials may be infeasible or unethical.
• Crossover studies often have two problems:
• First is the issue of "order" effects: the order in which treatments are
administered may affect the outcome (for example, a treatment with strong
side effects given first, followed by a less harmful one).
• Second is "carry-over" effects, which can be avoided with a sufficiently long
"wash-out" period between treatments.
• However, planning a sufficiently long wash-out period requires expert
knowledge of the dynamics of the treatment, which often is unknown,
of course.
• Also, there might be a "learning" effect.
• This is important where you have controls who are naive to the intended
therapy.
• In such a case, e.g., you cannot make a group (typically the group which
learned the skill first) unlearn a skill such as yoga and then act as a control in
the second phase of the study.
Crossover Designs
• Crossover designs are suitable only for chronic diseases.
• The common cold may resolve itself within a short period of time, so a
second treatment is not needed.
• Why are clinical trials necessary? Why not do all the tests on mice?
– Laboratory experiments are the paradigm (model of how something should be
done) of scientific study.
– They have several important limitations, including generalization to human beings.
Design (example)
Patients with stroke are randomized to two groups:
• Experimental group (anticoagulants): 100 patients, of whom 70 survive > 1 yr
• Control group (no anticoagulants): 100 patients, of whom 20 survive > 1 yr
[Figure: A bioequivalency profile comparison of 150 mg extended-release
bupropion as produced by Impax Laboratories for Teva and by Biovail for
GlaxoSmithKline.]
• Prescribability means that a patient is ready
to embark on a treatment regimen for the
first time, so that either the reference or test
formulations can be chosen.
• Switchability means that a patient, who
already has established a regimen on either
the reference or test formulation, can switch
to the other formulation without any
noticeable change in efficacy and safety.
• Prescribability requires that the test and
reference formulations are population
bioequivalent, whereas switchability
requires that the test and reference
formulations have individual bioequivalence.
• Often the old process is time intensive and more costly. Finding
the new process not inferior to the old process gives sufficient
reason to replace it.
• In clinical trials, with patient and treatment variability, a new
treatment that performs within 10 to 20% of an old treatment is
often the margin used to be called non-inferior.
• Often the safety profile of the new treatment is superior to the
standard treatment, making the new treatment preferable even
though it is not superior in efficacy.
• Outcomes of a non-inferiority trial: at the conclusion of a non-
inferiority trial, the confidence interval can be plotted and
examined on a chart showing the non-inferiority margin
representing the non-inferiority difference between X and Y.
• Several outcomes are possible, ranging from superiority of the new
treatment to failure to show non-inferiority. Note that for this
hypothesis test we are comparing (X – Y) with the lower
non-inferiority boundary.
• Switching Between Non-inferiority and Superiority
• An experiment might be easily designed to show non-inferiority.
• Through testing, the outcome actually might show a statistically significant
improvement for Y compared with X.
• The issue arises as to whether the experimenter is entitled to claim superiority
for Y over X, rather than merely the non-inferiority for which the trial was
planned.
• If the upper confidence bound is greater than the non-inferiority margin and
the lower confidence bound exceeds zero, a conclusion of superiority is
warranted.
• It might also be possible to support a non-inferiority conclusion in the case of
an experiment designed to show superiority. This conclusion can be made only
if both the hypothesis of superiority and the hypothesis of non-inferiority are
stated before the experiment.
• Summary
• The use of non-inferiority experimental designs is well established in
evaluating new clinical entities and devices in the biotech and pharmaceutical
industries. Adapting this technique for identifying processes and products as
non-inferior vs. equivalent within a pre-specified non-inferiority margin can
provide more cost-effective sampling and analysis when applied to statistical
quality control.
(δ is used interchangeably with Δ.)
The choice of the non-inferiority margin, Δ, affects the sample size calculation
and the conclusion of the study. A general rule of thumb is that this quantity
must be considerably smaller (1/2 or 1/3) than the minimal clinical difference
we might use to calculate sample size in a superiority trial.
• The null hypothesis in non-inferiority trials seems backwards, in a
sense, as this hypothesis is not 'null' at all.
• Instead, it states that the new treatment is worse than the
old by more than Δ, where Δ is the 'non-inferiority margin'.
The alternative hypothesis states that the difference in the
effect between the new and old interventions is greater than -Δ,
i.e., any inferiority is smaller than the margin (Figure 1).
In symbols, see the sketch below.
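A compact statement of these hypotheses (a sketch; θ denotes the true difference in effect, new minus old, and Δ > 0 is the pre-specified margin):

```latex
% theta = effect(new) - effect(old); Delta > 0 is the pre-specified margin
H_0:\ \theta \le -\Delta \quad (\text{new treatment inferior by at least } \Delta)
H_1:\ \theta >   -\Delta \quad (\text{new treatment non-inferior})
% Decision via confidence interval: conclude non-inferiority if the lower
% confidence bound for theta exceeds -Delta; conclude superiority if it
% also exceeds 0 (the switching rule described earlier).
```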
Interim analysis
• If a treatment is particularly beneficial or harmful compared to the concurrent
placebo group while the study is on-going, the investigators are ethically
obliged to assess that difference using the data at hand and may terminate
the study earlier than planned.
• The term ‘interim analysis’ is used to describe an evaluation of the current
data from an ongoing trial, in which the primary research question is
addressed, and which has the potential for modifying the conduct of the
study, usually before the recruitment is complete.
• In addition to saving time and resources, such a design feature can reduce
study participants' exposure to the inferior treatment. However, when
repeated significance testing on accumulating data is done, some adjustment
of the usual hypothesis testing procedure must be made to maintain an
overall significance level.
• Sometimes interim analyses are equally spaced in terms of calendar time or
the information available from the data, but this assumption can be relaxed to
allow for unplanned or unequally spaced analyses.
Stopping rules
• The design of many clinical trials includes some strategy for early stopping
if an interim analysis reveals large differences between treatment groups.
• Repeatedly testing at a nominal α = 0.05 inflates the overall type 1 error:

Number of looks    Overall type 1 error
      1                  0.05
      2                  0.08
      3                  0.11
Why would we have messed up if we looked early on?
• Every time we look at the data and consider stopping, we
introduce the chance of falsely rejecting the null hypothesis.
• In other words, every time we look at the data, we have the
chance of a type 1 error.
• If we look at the data multiple times, and we use alpha of
0.05 as our criterion for significance, then we have a 5%
chance of stopping each time.
• Under the true null hypothesis and just 2 looks at the data,
we "approximate" the error rates as:
– Probability of stopping at the first look: 0.05
– Probability of stopping at the second look: 0.95 × 0.05 = 0.0475
Total probability of stopping: 0.0975
(This calculation treats the two looks as independent; the actual looks are
correlated, which is why the table above shows about 0.08. A simulation
sketch follows.)
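A minimal Monte Carlo sketch in Python (assumptions: standard-normal data and a two-sided z-test at α = 0.05 applied at each look) confirming the inflation; unlike the back-of-envelope calculation above, it accounts for the correlation between looks:

```python
import random

def prob_stop(n_trials=20_000, looks=2, n_per_look=50, z_crit=1.96, seed=1):
    """Estimate the chance of (falsely) stopping under a true null when a
    two-sided z-test at alpha = 0.05 is applied at each interim look."""
    rng = random.Random(seed)
    stopped = 0
    for _ in range(n_trials):
        total, n = 0.0, 0
        for _ in range(looks):
            total += sum(rng.gauss(0, 1) for _ in range(n_per_look))
            n += n_per_look
            if abs(total / n ** 0.5) > z_crit:  # z-statistic on all data so far
                stopped += 1
                break
    return stopped / n_trials

print(prob_stop())  # about 0.08 for two looks, vs. 0.05 for a single look
```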
Results
The 2 × 2 table below is sampled from outcome back to exposure (the
case-control direction):

                    Outcome present   Outcome absent
Exposure: Yes             a                 b
Exposure: No              c                 d

Odds of disease among the exposed = [a/(a+b)] / [b/(a+b)] = a/b
Odds of disease among the unexposed = [c/(c+d)] / [d/(c+d)] = c/d
Odds ratio: OR = (a/b) / (c/d) = ad/bc

What is the excess risk conferred by having relatives with breast cancer?
Of 100 women with no relatives with breast cancer, 7 will develop breast
cancer; odds of getting breast cancer: (7/100) / (93/100) = 7/93.

Relatives with breast cancer    Cancer   No cancer   Total
Yes                               28        72        100
No                                 7        93        100

OR = (28 × 93) / (7 × 72) = 5.2 (a short computation sketch follows)
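A one-line check of the odds ratio in Python:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table: a, b = cases/non-cases among the exposed;
    c, d = cases/non-cases among the unexposed."""
    return (a * d) / (b * c)

# Breast-cancer example above: 28/72 among exposed vs. 7/93 among unexposed
print(round(odds_ratio(28, 72, 7, 93), 1))  # 5.2
```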
Meta analysis
• A meta-analysis combines the results of several studies that
address a set of related research hypotheses.
• This is done by identifying a common measure of effect size
and estimating the average effect of a treatment.
• The resulting overall averages can be considered meta-effect
sizes, which are more powerful estimates of the true effect
size than those derived in a single study under a given single
set of assumptions and conditions.
• The measure of effect size depends on the problem and even on the customs of
different fields of research (Hall et al. 1994).
• Wolf (1986), on the other hand, suggests that .25 is weakly significant and
.50 is clinically significant.
• The first meta-analysis was performed by Karl Pearson in
1904, in an attempt to overcome the problem of reduced
statistical power in studies with small sample sizes; analyzing
the results from a group of studies can allow more accurate
data analysis
• Meta-analysis is widely used in epidemiology and
evidence-based medicine today.
• Meta-analysis has been used to give helpful insight into:
1. the overall effectiveness of interventions (e.g.,
psychotherapy, outdoor education),
2. the relative impact of independent variables (e.g., the effect
of different types of therapy), and
3. the strength of relationship between variables.
Advantages of meta-analysis
• Derivation and statistical testing of overall
effect-size parameters across related studies
• Generalization to the population of studies
• Ability to control for between-study variation
• Inclusion of moderators to explain variation
• Higher statistical power to detect an effect
Steps in a meta-analysis
• 1. Search of the literature
• 2. Selection of studies ('incorporation criteria')
• Based on quality criteria, e.g. the requirement of randomization and
blinding in a clinical trial
• Selection of specific studies on a well-specified subject, e.g. the treatment
of breast cancer
• Decide whether unpublished studies are included, to avoid publication bias
(the 'file drawer problem')
• 3. Decide which dependent variables or summary measures are allowed.
For instance:
• Differences (discrete data)
• Means (continuous data)
• Hedges' g is a popular summary measure for continuous data; it
standardizes the mean difference by an index of variation (the pooled
standard deviation) in order to eliminate scale differences between
studies (a sketch of the computation follows).
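A minimal sketch of the computation in Python (the example numbers are hypothetical):

```python
import math

def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Hedges' g: standardized mean difference using the pooled SD,
    with the small-sample bias-correction factor J."""
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (mean1 - mean2) / sp          # Cohen's d (pooled-SD version)
    j = 1 - 3 / (4 * (n1 + n2) - 9)   # bias correction for small samples
    return j * d

# Hypothetical study: treatment 10.0 (SD 2.0, n 30) vs. control 9.0 (SD 2.5, n 30)
print(round(hedges_g(10.0, 2.0, 30, 9.0, 2.5, 30), 3))  # about 0.436
```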
• Equation (4) is the expected number of cases when picking a binomial variable from a
population of S_i susceptibles, with probability P_{i+1}. The conditional probability of
obtaining C_{i+1} = x cases given C_i and S_i must then be

equation (5):  P[C_{i+1} = x | C_i, S_i] = \binom{S_i}{x} \, P_{i+1}^{x} \, Q_{i+1}^{S_i - x}
• The parameter P_i changes with i, thereby giving a chain-binomial
process.
• Note that the probabilities Q_{i+1} and P_{i+1} and the expected values are conditional on
C_i and S_i.
• Aside from the question of independence, it seems likely that the number of physically
possible contacts of one susceptible is not necessarily equal to C_i; for large C_i, it is
certainly less.
• On the other hand, if C_i is small, as in household epidemics, the number of physically
possible contacts could be close to C_i, and the Reed-Frost model might then be close,
though not necessarily exact (a simulation sketch follows).
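A small simulation sketch of the Reed-Frost chain binomial in Python (the parameter values are hypothetical; p_contact is the probability of effective contact between one infective and one susceptible):

```python
import random

def reed_frost(s0, c0, p_contact, seed=None):
    """Simulate one Reed-Frost chain-binomial epidemic.
    P_{i+1} = 1 - (1 - p_contact)**C_i is the chance a susceptible is infected,
    and C_{i+1} ~ Binomial(S_i, P_{i+1})."""
    rng = random.Random(seed)
    S, C, chain = s0, c0, [c0]
    while C > 0 and S > 0:
        p_next = 1 - (1 - p_contact) ** C
        C = sum(1 for _ in range(S) if rng.random() < p_next)  # Binomial draw
        S -= C
        chain.append(C)
    return chain

print(reed_frost(s0=10, c0=1, p_contact=0.2, seed=3))  # cases per generation
```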
• A branching process is a Markov process that models a
population in which each individual in generation n produces
some random number of individuals in generation n + 1,
according to a fixed probability distribution that does not vary
from individual to individual.
• Branching processes are used to model reproduction; for
example, the individuals might correspond to bacteria, each of
which generates 0, 1, or 2 offspring with some probability in a
single time unit.
• Branching rule: each individual has a random number of children in the next
generation. These random variables are independent copies of ξ and have a
distribution (p_i).
• Let us now review some standard results about branching processes. For
simplicity, let µ denote E[ξ] and let Z_n denote the number of individuals in the
n-th generation. By default, we set Z_0 = 1, and we also exclude the case where ξ
is a constant.
• Notice that a branching process may either become extinct or survive forever.
We are interested in the conditions under which, and the probabilities with
which, these events occur (see the simulation sketch below).
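A minimal Galton-Watson simulation in Python (the offspring distribution is a hypothetical example with µ = 1, the critical case):

```python
import random

def branching(generations, offspring_probs, seed=None):
    """Galton-Watson branching process: each individual independently has
    k children with probability offspring_probs[k]; Z_0 = 1."""
    rng = random.Random(seed)
    z = 1
    sizes = [z]
    for _ in range(generations):
        z = sum(rng.choices(range(len(offspring_probs)), offspring_probs)[0]
                for _ in range(z))
        sizes.append(z)
        if z == 0:  # extinction
            break
    return sizes

# Offspring distribution p0 = 0.25, p1 = 0.5, p2 = 0.25 (mu = 1):
print(branching(20, [0.25, 0.5, 0.25], seed=5))  # generation sizes Z_0, Z_1, ...
```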
Application
• Extinction probability for queues:
• A customer arrives at an empty server and immediately goes for service, initiating a busy
period. During that service period other customers may arrive, and if so they wait for service.
• The server continues to be busy until the last waiting customer completes service, which
marks the end of the busy period. An interesting question is whether busy periods are
bound to terminate at some point. Are they?
• or multiplicatively:
X_t = T_t · I_t · C_t · E_t
• Cyclical Variation
• The second component of a time series is cyclical variation, consisting of a period of
prosperity followed by periods of recession, depression, and then recovery, with no fixed
duration of the cycle (more than 1 year).
• Employment, production, the S&P/TSX Composite Index, and many other
business and economic series fall below their long-term trend lines in bad
periods and, conversely, rise above them in good periods.
• Seasonal Variation
• The third component of a time series is the seasonal component. Many sales, production,
and other series fluctuate with the seasons. The unit of time reported is either quarterly or
monthly.
• Irregular Variation
• Many analysts prefer to subdivide the irregular variation into episodic and residual
variations.
• Episodic fluctuations are unpredictable, but they can be identified. The initial impact on
the economy of a major strike or a war can be identified, but a strike or war cannot be
predicted.
• Smoothing data removes random variation and reveals the trend,
cyclic, and seasonal components.
• Two methods:
1. Averaging methods
2. Exponential smoothing methods
• Taking averages is the simplest way to smooth data, but the mean
is not a good estimator when there is a trend.
Income of a PC manufacturer between 1985 and 1994:

Yr      Income
1985    46.163
1986    46.998
1987    47.816
1988    48.311
1989    48.758
1990    49.164
1991    49.548
1992    48.915
1993    50.315
1994    50.768

We cannot use the mean to forecast: when there is a trend, different
spans of data give different estimates.
Taking a moving average is a smoothing process
An alternative way to summarize the past data is to compute the mean of
successive smaller sets of past numbers. Taking 3 as the set size, the
average of the first 3 numbers is (9 + 8 + 9) / 3 = 8.667; the smoothing
process is continued by advancing one period and calculating the next
average of three numbers, dropping the first number.

Supplier    $    Error   Error squared
   1        9     -1          1
   2        8     -2          4
   3        9     -1          1
   4       12      2          4
   5        9     -1          1
   6       12      2          4
   7       11      1          1
   8        7     -3          9
   9       13      3          9
  10        9     -1          1
  11       11      1          1
  12       10      0          0
Mean = 10
A 4-period moving average (p = 4) falls between periods; centering it with a
2-period average (p = 2) aligns the smoothed values with the data, and the
2-period averaging can be repeated for further smoothing (a code sketch follows):

Period   Value   p = 4 MA   p = 2 centered   p = 2 again
1          9
2          8
2.5                 9.5
3          9                     9.5
3.5                 9.5                          9.75
4         12                    10.0            10.062
4.5                10.5                         10.375
5          9                    10.75
5.5                11.0
6         12
7         11
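A minimal Python sketch of the 3-period moving average applied to the supplier series above:

```python
def moving_average(data, m):
    """Simple m-period moving average; returns one value per window."""
    return [sum(data[i:i + m]) / m for i in range(len(data) - m + 1)]

supply = [9, 8, 9, 12, 9, 12, 11, 7, 13, 9, 11, 10]
print([round(x, 3) for x in moving_average(supply, 3)])
# first value (9 + 8 + 9) / 3 = 8.667, then advance one period and repeat
```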
Double Moving Averages for a Linear Trend
Process
• Unfortunately, neither the mean of all data nor the moving
average of the most recent M values, when used as a forecast
for the next period, is able to cope with a significant trend.
There exists a variation on the MA procedure that often does
a better job of handling trend.
• It is called Double Moving Averages for a Linear Trend
Process. It calculates a second moving average from the
original moving average, using the same value for M. As soon
as both single and double moving averages are available, a
computer routine uses these averages to compute a slope and
intercept, and then forecasts one or more periods ahead
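A sketch of that routine in Python, under the standard double-moving-average formulation (intercept a = 2M − M′, slope b = 2(M − M′)/(m − 1), where M is the single and M′ the double moving average); applied here to the income series above:

```python
def double_ma_forecast(data, m, horizon=1):
    """Double moving average for a linear trend: compute a single MA and a
    second MA of the first (same m), then extrapolate a + b * horizon."""
    ma = lambda xs: [sum(xs[i:i + m]) / m for i in range(len(xs) - m + 1)]
    single = ma(data)
    double = ma(single)
    a = 2 * single[-1] - double[-1]               # intercept at the last period
    b = 2 * (single[-1] - double[-1]) / (m - 1)   # slope per period
    return a + b * horizon

income = [46.163, 46.998, 47.816, 48.311, 48.758,
          49.164, 49.548, 48.915, 50.315, 50.768]
print(round(double_ma_forecast(income, m=3), 3))  # one-step-ahead forecast
```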
• Linear Trend
• The long-term trend of many business series, such as sales, exports, and
production, often approximates a straight line. If so, the equation to
describe this growth is: Y′ = a + bt, where
• Y′ is the projected value of the Y variable for a selected value of t,
• a is the Y-intercept, the estimated value of Y when t = 0, and b is the slope
of the line, or the average change in Y′ for each change of one unit in t,
• t is any value of time that is selected.
• To illustrate the meaning of Y′, a, b, and t in a time-series problem, a line
may be drawn through the sales data of a company in the first table to
represent the typical trend of sales.
• a and b need to be estimated, for example by the method of least squares
(a sketch follows).
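A minimal least-squares fit of Y′ = a + bt in Python, using the income series above with t = 1…10:

```python
def least_squares_trend(y):
    """Fit Y' = a + b*t by ordinary least squares, with t = 1..n."""
    n = len(y)
    t = list(range(1, n + 1))
    t_bar, y_bar = sum(t) / n, sum(y) / n
    b = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) \
        / sum((ti - t_bar) ** 2 for ti in t)
    a = y_bar - b * t_bar
    return a, b

income = [46.163, 46.998, 47.816, 48.311, 48.758,
          49.164, 49.548, 48.915, 50.315, 50.768]
a, b = least_squares_trend(income)
print(round(a, 3), round(b, 3))  # intercept and average change per year
```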
Uses of time series
• Economic Forecasting
• Sales Forecasting
• Budgetary Analysis
• Stock Market Analysis
• Yield Projections
• Process and Quality Control
• Inventory Studies
• Workload Projections
• Utility Studies
• Census Analysis
and many more
Autocorrelation
[Figure: unemployment data]
• This value can be useful in security analysis. For example, if
you know a stock historically has a high positive autocorrelation value and
you witnessed the stock making solid gains over the past several days, you
might reasonably expect the movements over the upcoming several days
(the leading time series) to match those of the lagging time series and
to move upwards.
• Time series analysis comprises methods that attempt to understand such
time series, often either to understand the underlying context of the data
points (Where did they come from? What generated them?), or to make
forecasts (predictions).
• Time series forecasting is the use of a model to forecast future events
based on known past events: to forecast future data points before they are
measured
• Two main goals:
1. Identifying the nature of the phenomenon represented by the sequence
2. Forecasting (predicting future values of the series)
• An autoregressive (AR) model writes the series as
X_t = φ_1 X_{t-1} + … + φ_p X_{t-p} + ε_t,
where the term ε_t is the source of randomness and is called white noise. It is assumed
to have the following characteristics:
• 1. zero mean: E[ε_t] = 0 for all t
• 2. constant variance: Var(ε_t) = σ² for all t
• 3. no serial correlation: Cov(ε_t, ε_s) = 0 for t ≠ s
• With these assumptions, the process is specified up to second-order moments
and, subject to conditions on the coefficients, may be weakly (second-order) stationary.
• If the noise also has a normal distribution, it is called normal white noise (denoted
here by Normal-WN): ε_t ~ iid N(0, σ²).
• In this case the AR process may be strictly stationary, again subject to conditions
on the coefficients (see the simulation sketch below).
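A minimal simulation sketch in Python of an AR(1) process with normal white noise (the parameter values are illustrative):

```python
import random

def simulate_ar1(n, phi, sigma=1.0, seed=None):
    """Simulate X_t = phi * X_{t-1} + eps_t with normal white noise
    eps_t ~ N(0, sigma^2); |phi| < 1 gives a stationary process."""
    rng = random.Random(seed)
    x, series = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0, sigma)
        series.append(x)
    return series

print([round(v, 2) for v in simulate_ar1(10, phi=0.7, seed=2)])
```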
Trend Analysis
• There are no proven "automatic" techniques to
identify trend components in time series data;
however, as long as the trend is monotonic
(consistently increasing or decreasing), that
part of data analysis is typically not very difficult.
• If the time series data contain considerable
error, then the first step in the process of trend
identification is smoothing.
• Smoothing: some form of local averaging of data such that the
nonsystematic components of individual observations cancel
each other out.
[Figure: seasonal and seasonally adjusted series, 1998–1999]