Multivariate analysis 1


Department of Statistics

National Chengchi University

Taipei 11605, Taiwan

E-mail: chengt@nccu.edu.tw

Outline

• 1.2 Planning for research

• 1.3 Experiments, Treatments, and Experimental Units

• 1.4 Research Hypotheses Generate Treatment Designs

• 1.5 Local Control of Experimental Errors

• 1.6 Replication for Valid Experiments

• 1.7 How Many Replications?

• 1.8 Randomization for Valid Inferences

• 1.9 Relative Efficiency of Experiment Designs

• 1.10 From Principles to Practice: A Case Study

Objectives

research studies have statistical implications.

• The principles in this Chapter form a basis for the

structure of a research study.

• The structure, in turn, defines the study’s function.

• If the structure is faulty, then the study will not function

properly. It will produce either incomplete or misleading

information.

• The statistical principles are those associated with the

collection of observations to gain maximum information

for a research study in an efficient manner.

• They include treatment design, local control of variability,

replication, randomization, and the efficiency of

experiments.

Sir Ronald A. Fisher (1890-1962)

• R. A. Fisher developed the analysis of variance and unified his ideas on the basic principles of experimental design.

• He was on the staff of Rothamsted Experimental Station from 1919 to 1933.

• Fisher (1926) published “The arrangement of field experiments” and outlined three fundamental components:

• Local control of field conditions for reduction of

experimental error

• Replication as a means to estimate experimental error

variance

• Randomization for valid estimation of experimental error

variance

• Two books

• Statistical Methods for Research Workers (1925)

• The Design of Experiments (1935)

Planning for research

scientist to acquire knowledge about a natural or

manufactured process.

• Good planning helps the scientist to organize the required

tasks for a research study.

A checklist at the beginning

• Identification of influential factors and which of those

factors to vary and which to hold constant

• The characteristics to be measured

• The specific procedures for conducting tests or measuring

the characteristics

• The number of repetitions of the basic experiment to

conduct

• Available resources and materials

Questions to focus activities

• What is my objective?

• What do I want to know?

• Why do I want to know it?

• Productive follow-up questions for each activity in the process direct our attention to defining the role of each activity in the research study.

• How am I going to perform this task?

• Why am I doing this task?

Design of experiments (DOE)

• Experiments are investigations that establish a particular set of circumstances under a specific protocol to observe and evaluate the implications of the resulting observations.

• The investigator establishes and controls the protocols in

an experiment to evaluate and test something that for the

most part is unknown up to that time.

• Comparative experiment

• the establishment of more than one set of circumstances

in the experiment

• the responses resulting from different circumstances will

be compared with one another

(Comparative) experiment

• Treatments

• the set of circumstances created for the experiment in

response to research hypothesis

• the focus of the investigation

• Ex: diets, cultivars of a crop species, temperatures, soil

types, amounts of a nutrient

• Experimental units (EUs)

• the physical entity or subject exposed to the treatment independently of other units

• responses or outcomes

• a method to assign treatments to experimental units

Remarks

prior to an experiment.

• Always design experiments with particular statistical tests

in mind

• Proper analysis depends on the design and the kinds of

statistical model assumptions we believe are correct and

are willing to assume.

Experimental error

• Experimental error describes the variation among identically and independently treated EUs

• Origins of experimental error

• the natural variation among EUs

• variability in measurement of the response

• inability to reproduce the treatment conditions exactly

• interaction of treatments and EUs

• other extraneous factors that influence the response

• A major objective is the attainment of an estimate of the

variance of experimental error

• The variance of experimental error is the variance of

observations on EUs

• The differences among the observations can be attributed

only to experimental error.

• Statistical inferences rely on an estimate of such variance.

Comparative observational study

possible effect of treatments on subjects, where the

assignment of subjects into treatment groups is outside

the control of the investigator.

• The subjects are either self-selected into identifiable groups or they simply exist in their particular circumstances.

• The groups or circumstances are used as treatments in the

observational study.

• Practical reasons or ethical considerations

• Observational studies are limited to establishing association relationships between the responses and treatment conditions.

DOE vs. observational study

• helps researchers to answer questions or evaluate the

effectiveness of a research program.

• Distinctions between observational study and designed

experiment:

• Can treatments be assigned to EUs?

• In designed experiments, treatments are assigned to EUs

according to a planned scheme.

• In observational studies, subjects are either self-selected

into identifiable groups or they simply exist in their

particular circumstances.

• The groups or circumstances are used as treatments in an observational study.

• What kind of relationships can be established?

• Designed experiments help to establish causal relationships

between the responses and treatments

• Observational studies help to establish association

relationships between responses and treatments

• Section 1.4 describes how research hypotheses generate treatment designs.

• Why conduct experiments rather than observational studies?

• For instance, an observational study shows a positive

association between ice cream sales and rates of violent

crime.

• What does this mean?

Notes

• Observational studies are prone to be influenced by

confounding variables, which mask or distort the

association between measured variables in a study

• Ex: Ice cream sales and rates of violent crime are both positively correlated with hot weather.

• In an experiment, treatments can be randomly assigned to

units or individuals (randomization) to avoid confounding

variables
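The ice cream example can be illustrated with a small simulation. This is only a sketch under an assumed model: the temperature range, the coefficients, and the noise levels are invented for illustration, and the helper names `pearson` and `residuals` are not from the course material. Both responses are generated from temperature alone, so any association between them is due to the confounder.

```python
import random

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residuals(y, x):
    """Residuals of y after a least-squares straight-line fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
temperature = [rng.uniform(10, 35) for _ in range(500)]

# Assumed model: each response depends on temperature, not on the other response.
ice_cream = [2.0 * t + rng.gauss(0, 1) for t in temperature]
crime = [1.0 * t + rng.gauss(0, 1) for t in temperature]

raw_r = pearson(ice_cream, crime)
adjusted_r = pearson(residuals(ice_cream, temperature),
                     residuals(crime, temperature))

print(f"raw correlation: {raw_r:.2f}")
print(f"correlation after removing temperature: {adjusted_r:.2f}")
```

The raw correlation is strong, while the correlation between the temperature-adjusted residuals is close to zero, which is the pattern a confounding variable produces.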

Purposes of experiments

• To find conditions under which the optimal (maximum or

minimum) response is achieved

• To compare responses at different levels of controllable

variables

• To determine the levels of controllable variables to

minimize the impacts from the uncontrollable variables

• To develop a model for predicting responses

• Etc.

History of design of experiments (I)

• W. S. Gosset and the t-test (1908)

• R. A. Fisher & his co-workers

• Profound impact on agricultural and biological sciences

• The first industrial era, 1951 – late 1970s

• Box & Wilson, response surfaces

• Process modeling and optimization

• Applications in the chemical & process industries

History of design of experiments (II)

• Quality improvement and variation reduction initiatives in many companies

• Taguchi and robust parameter design, process robustness

• The modern era, beginning 1990s

• Popular outside statistics, and an indispensable tool in

many scientific/engineering endeavors

• New challenges:

• Large and complex experiments, e.g. screening design in

pharmaceutical industry, experimental design in

biotechnology

• Computer experiments: efficient ways to model complex

systems based on computer simulation

• Many others

A systematic approach to experimentation

• State objectives

• Choose responses

• What to measure?

• How to measure?

• How good is the measurement system?

• Flow chart and cause-and-effect diagram

• Factor experimental range is crucial for success

• Conduct the experiment

• Analyze the data

• Conclusion and recommendation:

• iterative procedure

• confirmation experiments/follow-up experiments

Research hypotheses generate treatment designs

pathogen

• Research hypothesis: Not all fungicides are equally

effective in controlling the soil pathogen.

• Treatment design: Selecting different types of fungicides

for experiment.

• Treatment design decides treatments to be used, where

treatments can be conditions imposed by researchers or

existing conditions.

• Allowing additional treatments helps to fully evaluate the consequences of hypotheses (next page)

Control treatments are benchmarks

• No treatment helps to reveal the conditions under which

the experiment was conducted.

• Placebo establishes a basis for treatment effectiveness

when manipulating treatment alone may produce a

response.

• Note: One may include two distinct types of controls.

Multiple-factor treatment designs expand inferences

• The categories of a factor are levels of the factor.

• Temperature: 20 °C, 30 °C, 40 °C

• Practical or ethical reasons may prevent experimentation:

• Ex: Can’t randomly assign gender to subjects

• Ex: Unethical to put any individual in the “no seat belt” condition in a car-collision experiment

• Ex: Impractical to randomly assign sites to have only pure

mature oak trees or to have mixed mature oak and pine

trees

• The investigator may simply lack the requisite influence or

power to conduct randomization.

Factorial arrangement

• A factorial arrangement consists of all possible combinations of the levels of the treatment factors.

• Ex: Nitrogen production by a special type of bacteria is

affected by the temperature and soil type

• Temperature: 20 °C, 30 °C, 40 °C

• Soil type: normal, saline, sodic

• 3 × 3 factorial treatment combinations (Two-factor

factorial design)

• See Chapters 6 and 7.
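The 3 × 3 factorial arrangement in this example can be enumerated directly. The sketch below uses the two factors and levels listed on the slide; the variable names are illustrative only.

```python
from itertools import product

temperatures_c = [20, 30, 40]
soil_types = ["normal", "saline", "sodic"]

# Every level of one factor is combined with every level of the other.
treatments = list(product(temperatures_c, soil_types))

for number, (temp, soil) in enumerate(treatments, start=1):
    print(f"treatment {number}: {temp} °C, {soil} soil")
```

With 3 levels per factor the list holds 3 × 3 = 9 treatment combinations.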

Local control of experimental errors

experimental error:

• (1) Techniques related to preparing circumstances or

taking measurement:

• Accurate measurement, preparation of media, pipetting of

solutions, calibration of instruments

• (2) Selection of uniform EUs to reduce experimental error.

• (3) Blocking helps to reduce experimental error and select

experimental units for uniformity.

Local control of experimental errors (II)

Major criteria for blocking:

• Proximity: such as neighboring plots

• Physical characteristics: such as litter, batch, age, or

weight, etc.

• Time: temporal effect on the task, such as each day only

one replication can be run.

• Management: Each technician or observer can be

assigned to one replication of all treatments.

• Matching strategy for grouping subjects or units:

• Pair matching: two approaches based on controlling

variables:

• exact value

• caliper value (similar value)

• Nonpair matching: two approaches based on controlling

variables:

• frequency: Subjects are stratified from a frequency

distribution

• mean-based: Each treatment group has the same average

value.

Local control of experimental errors (III)

• (4) Choice of experiment design to accommodate

treatment designs

• Experiment design decides the selection/arrangement of

EUs and assignment of treatments.

• Ex: Compare 3 gasoline additives.

• Treatment: additive ; 3 treatments

• EU: automobile engine; 6 EUs

• Assignment: 3 additives are randomly allocated to 6

engines, 2 units per additive.

• Design: completely randomized design

• Ex: Compare 3 diets.

• Treatment: diet; 3 treatments.

• EU: mouse; 6 EUs.

• Block : litter ; 2 blocks

• Assignment: 3 treatments were randomly assigned to 3

mice in each litter.

• Design: randomized complete block design
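The two assignment schemes in these examples can be sketched in code. This is a minimal illustration, not a prescribed procedure: the function names, the unit labels, and the seed are assumptions made for the sketch. The first function allocates treatments to all units at random with equal replication (the gasoline additive example); the second randomizes separately within each block (the diet example with litters as blocks).

```python
import random

def completely_randomized(treatments, units, rng):
    """Assign treatments to units at random, with equal replication."""
    reps, extra = divmod(len(units), len(treatments))
    if extra:
        raise ValueError("units must be a multiple of the number of treatments")
    labels = list(treatments) * reps
    rng.shuffle(labels)
    return dict(zip(units, labels))

def randomized_within_blocks(treatments, blocks, rng):
    """Within each block, assign every treatment once in random order."""
    plan = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError("each block needs one unit per treatment")
        labels = list(treatments)
        rng.shuffle(labels)
        plan[block] = dict(zip(units, labels))
    return plan

rng = random.Random(2024)  # seed fixed only to make the sketch repeatable

additives = ["additive 1", "additive 2", "additive 3"]
engines = [f"engine {i}" for i in range(1, 7)]
additive_plan = completely_randomized(additives, engines, rng)

diets = ["diet 1", "diet 2", "diet 3"]
litters = {
    "litter 1": ["mouse 1", "mouse 2", "mouse 3"],
    "litter 2": ["mouse 4", "mouse 5", "mouse 6"],
}
diet_plan = randomized_within_blocks(diets, litters, rng)

print(additive_plan)
print(diet_plan)
```

In the first plan each additive appears on exactly 2 engines; in the second, each litter receives all 3 diets, so litter-to-litter differences do not enter the comparison of diets.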

Local control of experimental errors (IV)

• Variables that are related to the response variable are

called covariates.

• Analysis of covariance (ANCOVA): A procedure considering

covariates in addition to treatments provides a statistical

control on experiment error variance.

• Ex: Body weights would be a covariate of a weight gain

experiment.

• Pretest scores would be a covariate of a learning

experiment.

• Requirement of ANCOVA: Covariates are unaffected by the

treatments.

Replication for valid experiments

• Replication implies an independent repetition of the basic experiment.

• Reasons for replicating an experiment:

• demonstrating the results to be reproducible

• providing some insurance against aberrant results due to

accident

• providing an estimate for experimental error variance

• increasing the precision for estimating treatment means
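The last reason can be made concrete with the standard error of a treatment mean, σ/√r, for r independent replications. The sketch below assumes an error standard deviation of 0.5 purely for illustration; the function name is hypothetical.

```python
import math

def standard_error_of_mean(sigma, r):
    """Standard error of a treatment mean from r independent replications."""
    return sigma / math.sqrt(r)

sigma = 0.5  # assumed experimental error standard deviation
for r in (2, 4, 8, 16):
    print(r, round(standard_error_of_mean(sigma, r), 4))
```

Quadrupling the number of replications halves the standard error, so gains in precision become progressively more expensive.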

Notes

• Why is the observational unit ≠ the experimental unit?

• The observational unit may be a sample from the EU

Example 1.1

• Weight gain data are collected to test the efficacy of 2

rations.

[Figure: Pen 1 receives Ration A and Pen 2 receives Ration B; each pen holds 6 cows.]

• Each of 2 rations was assigned to one of 2 pens.

• Each pen had 6 cows.

• EU: pen; number of EUs= 2.

• Observational unit: cow; number of OUs within each

treatment = 6.

• This experiment has only 1 true replication, i.e. pen.

• Differences between ȲA and ȲB may not be attributable to the rations alone. Why?

• Besides treatment effects, there may exist pen effects and

treatment-pen interactions.

• Note: When the observational unit (OU) ≠ the experimental unit, the OUs are also called pseudoreplicates.

Example 1.2

• Each of two rations was randomly assigned to 2 pens of

cows.

[Figure: Ration A is fed to Pens 1 and 2, Ration B to Pens 3 and 4; each pen holds 3 cows.]

• EU: pen; number of EUs= 4.

• OU: cow; 3 cows per pen.

• Response from EU: the average pen response.
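The point of Example 1.2 is that the analysis uses one response per pen, not one per cow. The sketch below shows that reduction step; the weight gains are hypothetical numbers invented for illustration, not data from the example.

```python
from statistics import mean

# Hypothetical weight gains (kg) for 3 cows in each of 4 pens.
gains = {
    "Pen 1": ("A", [30.0, 32.0, 31.0]),
    "Pen 2": ("A", [28.0, 29.0, 30.0]),
    "Pen 3": ("B", [33.0, 35.0, 34.0]),
    "Pen 4": ("B", [31.0, 30.0, 32.0]),
}

# One response per experimental unit: the pen average over its cows.
pen_means = {pen: (ration, mean(values)) for pen, (ration, values) in gains.items()}

# Treatment means are computed from the pen averages (2 replications each).
ration_means = {
    ration: mean(m for r, m in pen_means.values() if r == ration)
    for ration in ("A", "B")
}

print(pen_means)
print(ration_means)
```

Each ration mean rests on 2 pen averages, so the experimental error variance is estimated from pen-to-pen variation within a ration, with the cows serving only to measure each pen more precisely.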

samples from a field plot are observational units

• Medications were received by patients, then serum

samples from a patient are observational units.

How many replications?

effects.

• Factors involved in determining number of replications:

• The variance σ²

• Size of difference that has physical significance

• The significance level α

• The power of the test, 1 − β

Example

• To test H0 : µ1 = µ2 vs. Ha : µ1 ≠ µ2 , an experiment with 2 independent samples is conducted.

• Let α be the significance level.

• Suppose the true difference in means, µ1 − µ2 , is δ, and the probability of a type II error is to be no greater than β.

• Then the number of replications for each treatment group is at least

$$ r = \frac{2\,(Z_{\alpha/2} + Z_{\beta})^{2}\,\sigma^{2}}{\delta^{2}}. $$

• If µ1 = 10, µ2 = 11, σ = 0.5, α = 0.05, and β = 0.2, then δ² = 1 and

$$ r = \frac{2\,(Z_{\alpha/2} + Z_{\beta})^{2}\,\sigma^{2}}{\delta^{2}} = \frac{2\,(1.96 + 0.84)^{2}\,(0.5)^{2}}{1} = 3.92 \longrightarrow 4 $$
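The replication formula is easy to evaluate for other settings. The sketch below is one way to do so with the Python standard library; the function name and default arguments are assumptions for illustration. It uses exact normal quantiles rather than the rounded values 1.96 and 0.84, and rounds the result up to the next whole replication.

```python
import math
from statistics import NormalDist

def replications_per_group(sigma, delta, alpha=0.05, beta=0.20):
    """Smallest integer r with r >= 2 (Z_{alpha/2} + Z_beta)^2 sigma^2 / delta^2."""
    z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point
    z_beta = NormalDist().inv_cdf(1 - beta)             # upper beta point
    r = 2 * (z_half_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(r)

# Settings of the worked example: sigma = 0.5, |mu1 - mu2| = 1.
print(replications_per_group(sigma=0.5, delta=1.0))
```

For the settings of the worked example this gives 4 replications per group. Because σ and δ enter only through the ratio σ/δ, doubling σ for the same δ roughly quadruples the required replication.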

Randomization for valid inferences

• According to Fisher, the random allocation of treatments

to EUs simulates the effect of independence and permits

us to proceed as if the observations are independent and

normally distributed.

• He showed that normal-theory tests are good approximations to the randomized tests, provided that randomization is applied to a reasonably large sample.

of treatments to EUs.

Example

• Consider 7 EUs, with 4 receiving treatment A and 3 receiving B.

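• Two-sample t test

$$ t = \frac{(\bar{y}_1 - \bar{y}_2) - D_0}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{(16.5 - 15) - 0}{\sqrt{4.2\left(\frac{1}{4} + \frac{1}{3}\right)}} = 0.958 $$

where

$$ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = 4.2, \qquad n_1 + n_2 - 2 = 4 + 3 - 2 = 5. $$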

• ANOVA
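The pooled two-sample t statistic, and its link to a one-way ANOVA with two groups, can be checked numerically. The observations below are hypothetical: they were chosen only so that the group means (16.5 and 15) and the pooled variance (4.2) agree with the values quoted in this example, and they should not be read as the original data.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical observations chosen to match the quoted summaries.
group_a = [14.0, 16.0, 17.0, 19.0]  # mean 16.5
group_b = [13.0, 15.0, 17.0]        # mean 15.0

n1, n2 = len(group_a), len(group_b)
pooled_var = ((n1 - 1) * variance(group_a)
              + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
t_stat = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# One-way ANOVA with two groups: F = MS(treatments) / MS(error).
grand_mean = mean(group_a + group_b)
ss_treatments = (n1 * (mean(group_a) - grand_mean) ** 2
                 + n2 * (mean(group_b) - grand_mean) ** 2)
f_stat = (ss_treatments / 1) / pooled_var  # 1 degree of freedom for treatments

print(round(pooled_var, 3), round(t_stat, 3), round(f_stat, 3))
```

With two treatments the ANOVA F statistic equals the square of the pooled t statistic, so the two analyses lead to the same conclusion.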

Example (continued)

• Randomized test

• If there is no difference between the effects of treatments A and B, then A and B are merely labels on the EUs and can be reallocated among the observed results.

• Hence, there are $\frac{(n_1+n_2)!}{n_1!\,n_2!} = \frac{7!}{4!\,3!} = 35$ possible arrangements.

• Table 1.2 lists all 35 arrangements and the difference between the two treatment means, $(\bar{y}_1 - \bar{y}_2)_j$, j = 1, ..., 35.

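$$ |\bar{y}_1 - \bar{y}_2|_{(1)},\ |\bar{y}_1 - \bar{y}_2|_{(2)},\ \ldots,\ |\bar{y}_1 - \bar{y}_2|_{(35)} $$

• Observed difference: $\bar{y}_1 - \bar{y}_2 = 1.5$

• p-value of randomized test =

$$ \frac{\sum_{i=1}^{35} I\left(|\bar{y}_1 - \bar{y}_2|_{(i)} \ge 1.5\right)}{35} = \frac{13}{35} = 0.37 $$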
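The randomized test can be carried out by enumerating every way of placing the label A on 4 of the 7 results. The sketch below does this exactly with fractions. The observations are hypothetical values chosen to match the summaries quoted in this example (group means 16.5 and 15, pooled variance 4.2); the data behind Table 1.2 are not reproduced in this section.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical observations chosen to match the quoted summaries.
group_a = [14, 16, 17, 19]
group_b = [13, 15, 17]

def mean_difference(first, second):
    """Exact difference of the two group means, as a Fraction."""
    return Fraction(sum(first), len(first)) - Fraction(sum(second), len(second))

pooled = group_a + group_b
observed = abs(mean_difference(group_a, group_b))

arrangements = 0
extreme = 0
# Choose which positions carry label A; the remaining positions carry label B.
for a_positions in combinations(range(len(pooled)), len(group_a)):
    a_values = [pooled[i] for i in a_positions]
    b_values = [pooled[i] for i in range(len(pooled)) if i not in a_positions]
    arrangements += 1
    if abs(mean_difference(a_values, b_values)) >= observed:
        extreme += 1

p_value = Fraction(extreme, arrangements)
print(arrangements, extreme, round(float(p_value), 2))
```

For these hypothetical values the enumeration visits 35 arrangements, of which 13 have an absolute difference of at least 1.5, giving 13/35 ≈ 0.37. Exact fractions are used so that ties with the observed difference are not lost to floating-point rounding.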

Three important principles-I

• Randomization

• Randomization is the random assignment of treatments to

units in an experimental study.

• Breaks the association between potential confounding

variables and the explanatory variables

Three important principles-II

• Replication

• Replication implies an independent repetition of the basic

experiment.

• Reasons for replicating an experiment:

• demonstrating the results to be reproducible

• providing some insurance against aberrant results due to

accident

• providing an estimate for experimental error variance

• increasing the precision for estimating treatment means

Three important principles-III

• Blocking

• Blocking is the grouping of experimental units that have

similar properties

• Within each block, treatments are randomly assigned to

experimental units.

• Blocking allows us to remove extraneous variation from

the data.

• When do we need to consider blocking and/or

randomization?

• When there are nuisance factors, which are not of interest in themselves but do affect the response.

• For the nuisance factors

• Block what you can control

• Randomize what you cannot control

Nuisance factors

• If unknown

• Protect the experiment through randomization

• If known (measurable) but uncontrollable

• Analysis of Covariance

• If known and controllable

• Blocking

Textbook (I)

population means with independent samples, Ch. 2, 3)

• Factorial experimental designs (Ch. 6, 7)

• Factorial designs with fixed effects (Ch. 6)

• Factorial designs with random effects (Sec. 7.1)

• Factorial designs with mixed effects (Sec. 7.2)

• Nested factor designs (Sec. 7.3)

• Nested and crossed factors designs (Sec. 7.4)

• Complete block designs

• Randomized block designs (comparing two population

means with matched samples, Sec. 8.2)

• Latin square designs (Sec. 8.3)

• Factorial experiments in complete block design (Sec. 8.4)

Textbook (II)

• Fractional factorial designs (Ch. 12)

• Response surface designs (Ch. 13)

• Split-plot designs (Ch. 14)

• Repeated measures designs (Ch. 15)

• Crossover designs (Ch. 16)

Other techniques or terminologies may be implemented in DOE

Control group

• A control group is a group of subjects not receiving the treatment of interest but otherwise experiencing the same conditions as the treated subjects

• Example: One group of patients is given a placebo

• Why do we need to consider placebo?

• Because patients treated with placebos, including sugar pills, may report improvement

• Example: More than 30% of patients with chronic back pain

report improvement even when treated only with a placebo.

Blinding

• Blinding is the process of concealing information from participants and/or researchers about which subjects are receiving which treatments

• Single blind: subjects are unaware of treatments

• Double blind: subjects and researchers are unaware of

treatments

• Example: testing heart medication

• Two treatments: drug and placebo

• Single blind: Patients don’t know which group they are in,

but doctors do.

• Double blind: Neither patients nor doctors administering the

drug know which group the patients are in. Only a third

party knows the identification before the study is done.

Balance

• A design is balanced when all treatments have equal sample size.

• Balance maximizes power.

• Balance makes tests more robust if assumptions are

violated.
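One way to see why balance maximizes power is to look at the variance of the difference between two sample means, σ²(1/n1 + 1/n2), for a fixed total number of units. The sketch below assumes two treatments with a common error variance and a total of 12 units; both choices, and the function name, are for illustration only.

```python
def variance_of_difference(n1, n2, sigma2=1.0):
    """Variance of the difference between two independent sample means."""
    return sigma2 * (1 / n1 + 1 / n2)

total_units = 12
allocations = [(n1, total_units - n1) for n1 in range(1, total_units)]
best = min(allocations, key=lambda split: variance_of_difference(*split))

for n1, n2 in allocations:
    print(n1, n2, round(variance_of_difference(n1, n2), 4))
print("smallest variance at", best)
```

The variance is smallest at the equal split (6, 6), so under these assumptions a balanced allocation gives the most precise comparison, and hence the greatest power, for the same total effort.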

Definition from Montgomery

• An experiment is a test or series of tests in which purposeful changes are made to the input variables of a process or system so that we may observe and identify the reasons for changes that may be observed in the output response.

• A nonrandomized experiment may give biased parameter

estimates

• A biased estimate of the error variance

