Hassen Shifa
Introduction
Inferential
Basic research: for the sake of knowledge.
Applied research: to solve a problem.
Field research: uncontrolled environment.
Laboratory research: controlled environment.
Introduction cont.
Data presentation
– Tabular
– Frequency distribution
– Graphic Representation
Table 1. Summary of project cost
Project component    Original budget (mln USD)    Revised budget (mln USD)
[Figure: bar chart comparing Allocated and Used amounts; category labels not recoverable]
[Figure: chart by year (1st to 5th year) with series ATVET, Extension, Research, ICT, Marketing, PMU]
Introduction cont.
Why sampling?
Complete information would emerge only if
data were collected from every individual in
the population.
Collecting data by destructive measurement from every individual would
eliminate the whole population.
Sampling cont.
Assignment
Simple random sampling
Systematic sampling
Stratified random sampling
Multistage random sampling
Stratified multistage random sampling
Cluster sampling
Quota sampling
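Three of the sampling schemes listed above can be sketched in a few lines. This is a minimal illustration, not the course's prescribed procedure: the function names, the population of 100 numbered plots, the strata names and the seeds are all hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units, each subset of size n being equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Take every k-th unit after a random start, with k = N // n."""
    rng = random.Random(seed)
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of equal size inside every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}

# Hypothetical population: 100 plots numbered 1 to 100.
plots = list(range(1, 101))
srs = simple_random_sample(plots, 10, seed=1)
sys_sample = systematic_sample(plots, 10, seed=1)
strat = stratified_sample({"low": plots[:50], "high": plots[50:]}, 5, seed=1)
```

The systematic sample always has a constant gap of k units between successive picks, which is what distinguishes it from the simple random sample.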
Sample Size
What is experimental design?
The Research Process
Results
Implementation
Basic Principles
❑ Considerations:
I. Planning and Execution
1. Define the Problem
general objectives
hypothesis
specific objectives.
Basic Principles cont.
2. Establish Experimental Procedures
biological materials available
treatment to be used
Choice of the treatments
experimental units
replicates,
select proper experimental design
consult experts for choosing design
conduct of experiment
Consideration: I. Planning and
Execution of the Experiment cont.
3. Measurements to be Taken
✪decide on the variables of interest
✪what to measure, and when is the right time
✪plan at the right time
✪consider time constraints
4. Prepare for Data Recording and Summarization
✪use recording aids, e.g. tape recorder, ruler
✪notebooks, pens, etc.
✪be prepared for problems
5. Outline and Prepare for Data Analysis
✪ design considerations are important
✪ how to analyze
✪ software, hardware
✪ what programme to use, etc.
6. Prepare a Detailed Work Schedule
✪ what supplies are needed, and for how long
✪ labor sufficiency, time sufficiency
7. Consider the Scope and Cost
✪ do not make the experiment too large
✪ labor, cost and time: cost-benefit ratio
8. Conduct of Experiment
✪ avoid bias
✪ use numbers
✪ reduce size and scope if possible
9. Analyze Data
✪ what statistics to be used
✪ inferences to be made
10. Write-up Results
✪ results should be clear, and clearly discussed
✪ draw accurate and relevant conclusions
Basic Principles cont.
❖Precision Indicators
❑Degrees of freedom for error
❑Mean squares of Error
Degrees of freedom for error    Mean square for error
High                            Low
Low                             High
Basic Principles cont.
❑ Types of Variation:
1. Inherent Variation
- due to inconsistencies or
heterogeneity within the population
of experimental units
- always present, cannot be avoided
- population variance
Basic Principles cont.
❑Randomization = assigning treatments to experimental units at random
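Randomization can be sketched for the two simplest layouts. This is an illustrative sketch only: the function names, treatment labels and seeds are hypothetical. In a completely randomized layout every unit is equally likely to receive any treatment; in a blocked layout the shuffle is repeated separately inside each block.

```python
import random

def randomize_crd(treatments, replications, seed=None):
    """Completely randomized: shuffle all treatment labels over all units."""
    rng = random.Random(seed)
    labels = [t for t in treatments for _ in range(replications)]
    rng.shuffle(labels)
    # Unit numbers start at 1; each unit gets one treatment label.
    return dict(enumerate(labels, start=1))

def randomize_rcbd(treatments, blocks, seed=None):
    """Randomized complete blocks: a fresh random order inside each block."""
    rng = random.Random(seed)
    layout = {}
    for block in range(1, blocks + 1):
        order = list(treatments)
        rng.shuffle(order)
        layout[block] = order
    return layout

treatments = ["t1", "t2", "t3", "t4"]
crd_layout = randomize_crd(treatments, replications=3, seed=42)
rcbd_layout = randomize_rcbd(treatments, blocks=3, seed=42)
```

Fixing the seed makes the layout reproducible, which is convenient for field books; leaving it as None gives a new random layout on each run.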
Ways to Reduce Experimental Error cont.
1. Functions of Replication
a. to provide an estimate of
experimental error.
b. to improve precision
c. to increase the scope of
inferences
d. to effect control of error
Ways to Reduce Experimental Error
2. Functions of Randomization:
Design Depends on
❑ Types of Treatments
❑ Number of Treatment
❑ Arrangement of Treatment
❑ Objectives of a study
❑ Inherent Variation in experimental area
Choose less complicated design
EXPERIMENTAL DESIGNS
• Complete block
• Incomplete block
• One factor
• Multiple factors
EXPERIMENTAL DESIGNS
Homogeneous environment
Laboratory
Green-house
Growth chamber
CRD cont.
❑Advantages:
❑Disadvantages:
1. Requires uniform experimental units.
2. All non-treatment variation is labeled as experimental error.
3. Size limitation: usually restricted to small experiments.
CRD
Source       df       SS                                    MS        E(MS)
______________________________________
Treatments   t-1      $\sum_i X_{i.}^2/r - X_{..}^2/rt$      SSt/dft   $\sigma^2 + r\sigma_t^2$
Error        t(r-1)   by subtraction (Total SS - SSt)        SSe/dfe   $\sigma^2$
Total        rt-1     $\sum_{ij} X_{ij}^2 - X_{..}^2/rt$
I II III IV
t1
t2
t3
t4
t5
Analysis of variance
SAS CRD.doc
CRD-SAS-output.doc
Raw data of effects of applying six herbicide types on
biomass of broad leaved weed species evaluated at ..
h1 17 20 16 21
h2 20 21 18 17
h3 18 19 21 16
h4 13 18 14 17
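The CRD analysis of variance can be worked through on the four herbicide rows shown above. This is a minimal sketch that treats each row as one treatment with four replicates; the function name and dictionary keys are hypothetical.

```python
def crd_anova(groups):
    """One-way (CRD) ANOVA from a dict of treatment -> list of observations."""
    values = [y for ys in groups.values() for y in ys]
    n, t = len(values), len(groups)
    cf = sum(values) ** 2 / n                      # correction factor
    total_ss = sum(y * y for y in values) - cf
    ss_t = sum(sum(ys) ** 2 / len(ys) for ys in groups.values()) - cf
    ss_e = total_ss - ss_t                         # error SS by subtraction
    df_t, df_e = t - 1, n - t
    ms_t, ms_e = ss_t / df_t, ss_e / df_e
    return {"CF": cf, "TotalSS": total_ss, "SSt": ss_t, "SSE": ss_e,
            "df_t": df_t, "df_e": df_e, "MSt": ms_t, "MSE": ms_e,
            "F": ms_t / ms_e}

herbicide = {
    "h1": [17, 20, 16, 21],
    "h2": [20, 21, 18, 17],
    "h3": [18, 19, 21, 16],
    "h4": [13, 18, 14, 17],
}
result = crd_anova(herbicide)
for key, value in result.items():
    print(f"{key:8s} {value:10.4f}")
```

For these 16 observations the treatment sum of squares is 30.75 on 3 d.f. and the error sum of squares is 57.00 on 12 d.f.; the resulting F ratio still has to be compared with the tabulated F value.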
What design to control
one direction of inherent variation?
Block-I
Block-II
Block-III
High fertility
RCBD cont.
L.A.M.:
$Y_{ij} = \mu + \tau_i + \beta_j + E_{ij}$
T1 T2 T1 T3
T3 T1 T2 T4
T2 T4 T3 T1
T4 T3 T4 T2
❑Advantages:
❑ Disadvantages:
I II III IV
t1
t2
t3
t4
t5
t6
Analysis of variance
• CF
• Total SS
• SSt
• SSB
• SSE
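The five quantities listed above can be computed step by step. The sketch below uses the five-genotype, four-block values that appear in the SAS RCBD data step of this document, assuming each genotype's four values belong to blocks 1 to 4 in order; the function and variable names are hypothetical.

```python
def rcbd_anova(table):
    """RCBD ANOVA from a dict of treatment -> list of values, one per block."""
    t = len(table)
    b = len(next(iter(table.values())))
    values = [y for row in table.values() for y in row]
    cf = sum(values) ** 2 / (t * b)                          # CF
    total_ss = sum(y * y for y in values) - cf               # Total SS
    ss_t = sum(sum(row) ** 2 for row in table.values()) / b - cf
    block_totals = [sum(row[j] for row in table.values()) for j in range(b)]
    ss_b = sum(bt ** 2 for bt in block_totals) / t - cf      # SSB
    ss_e = total_ss - ss_t - ss_b                            # SSE by subtraction
    df_e = (t - 1) * (b - 1)
    ms_t, ms_e = ss_t / (t - 1), ss_e / df_e
    return {"CF": cf, "TotalSS": total_ss, "SSt": ss_t, "SSB": ss_b,
            "SSE": ss_e, "df_e": df_e, "MSE": ms_e, "F_trt": ms_t / ms_e}

genotypes = {
    1: [53, 51, 50, 52],
    2: [38, 43, 40, 40],
    3: [49, 48, 52, 45],
    4: [45, 47, 50, 46],
    5: [37, 36, 37, 42],
}
rcbd = rcbd_anova(genotypes)
for key, value in rcbd.items():
    print(f"{key:8s} {value:10.4f}")
```

With these values the block sum of squares is small (4.95) relative to the genotype sum of squares (520.20), so blocking removed little variation in this particular data set.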
Analysis
❖SAS-RCBD.doc
Table . Raw data of six palm progenies
evaluated for oil to bunch ratio (%)
Advantages:
Disadvantages
❑ Size limitations.
L.A.M. Model for Latin square:
Sum of squares due to Row (SSR)
$SSR = \sum_j R_j^2 / t - C.F.$
Sum of squares due to Column (SSC)
$SSC = \sum_k C_k^2 / t - C.F.$
Source of
variation d.f. SS MS F-cal 0.05 0.01
Rows r-1
Columns r-1
Treatments r-1
Error (r-1)(r-2)
Column
Row 1 2 3 4 5 6 Rj Mean
Ck
What design
when there are two or more
factors
TWO OR MORE FACTORS
❑ A treatment arrangement,
not experimental design
❑ A method of determining the
treatment to be employed
• Factor - a specific type of treatment
• Level - a state of a factor.
FACTORIAL EXPERIMENT cont.
Purpose/Advantage:
• To increase the scope of inferences, i.e., to determine:
  • Important factors
  • Optimum levels of those factors
  • Interactions between factors (joint importance)
• Improved precision (not true in all cases)
FACTORIAL EXPERIMENT cont.
Simple effect
- effect of one factor measured at a specific level of all other factors.
Main effect
- the average of all simple effects of a single factor.
Interaction
- the failure of simple effects of one factor to be the same at every level
of the other factor.
Tells the measure of simple effect of A at
the level of B.
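The three definitions can be made concrete on a 2 x 2 table of treatment means. The cell means below are hypothetical, chosen only for illustration. The interaction is computed here as half the difference between the two simple effects of A, which equals half of (sum of one diagonal minus sum of the other diagonal); this is one common reading of the sum-of-diagonals rule and is an assumption about what the slide intends.

```python
# Hypothetical cell means for factors A (a1, a2) and B (b1, b2).
means = {
    ("a1", "b1"): 10.0, ("a2", "b1"): 14.0,
    ("a1", "b2"): 12.0, ("a2", "b2"): 22.0,
}

# Simple effects of A: change from a1 to a2 at a fixed level of B.
simple_a_at_b1 = means[("a2", "b1")] - means[("a1", "b1")]
simple_a_at_b2 = means[("a2", "b2")] - means[("a1", "b2")]

# Main effect of A: average of its simple effects.
main_a = (simple_a_at_b1 + simple_a_at_b2) / 2

# Interaction: half the difference between the simple effects ...
interaction_ab = (simple_a_at_b2 - simple_a_at_b1) / 2

# ... which matches the diagonal form on the 2 x 2 table.
diagonal_form = ((means[("a1", "b1")] + means[("a2", "b2")])
                 - (means[("a1", "b2")] + means[("a2", "b1")])) / 2

print(simple_a_at_b1, simple_a_at_b2, main_a, interaction_ab, diagonal_form)
```

An interaction of zero would mean the simple effect of A is the same at b1 and b2; here the two simple effects differ, so A and B interact.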
a1 a2
Sum of diagonal rule : b1
b2
Various Types of Interaction
[Figures: interaction plots of b1 and b2 across levels of A]
• Magnitude and direction change, i.e. b1 responds in a negative way to
increasing levels of A.
• Change in magnitude and rank, but not in direction.
FACTORIAL EXPERIMENT cont.
1. Qualitative (specific)
2. Quantitative
3. Ranked qualitative
4. Sample qualitative
Factorial in RCBD
N rate
Variety 100 200
A - -
B - -
C - -
Nested relationship
❑levels of one factor are specific to particular
levels of the other factor.
N rates
Variety 100 200
A -
B -
C -
D -
E -
F -
L1 L2
B1 B3
B2 B1
B3 B2
MSt = SSt/DFt
MSE = SSE/DFe
Correction factor: $C.F. = (GT)^2/abr$
Sources of
variation DF SS MS F-calc. F-tab (0.05) (0.01)
Treatments t-1 SSt MSt**
Error t(r-1) SSe MSe
Sources of
variation DF SS MS F-calc. F-tab (0.05) (0.01)
Block b-1 SSB MSB**
Treatments t-1 SSt MSt**
Error (t-1)(r-1) SSe MSe
Sources of
variation DF SS MS F-calc. F-tab (0.05) (0.01)
Treatment t-1 SSt MSt**
Factor A a-1 SSA MSA
Factor B b-1 SSB MSB
A x B (a-1)(b-1) SSAxB MSAxB
Error t(r-1) SSe MSe
Total
** = Significant at 0.01 level of probability
Analysis of variance for factorial in RCBD
Sources of
variation DF SS MS F-calc. F-tab (0.05) (0.01)
Block b-1 SSB MSB**
Treatments t-1 SSt MSt**
Factor A a-1 SSA MSA
Factor B b-1 SSB MSB
AxB (a-1)(b-1) SSAxB MSAxB
Variety: $SE(m) = \pm\sqrt{MSE/(rn)}$
Nitrogen: $SE(m) = \pm\sqrt{MSE/(rv)}$
V x N: $SE(m) = \pm\sqrt{MSE/r}$
Figure 1. Effect of insecticide on the control of corn rootworm
[Figure: corn rootworm score (0-10) by insecticide (C, O, N); series A, B, C]
One MSc student conducted an experiment to determine the effect of
different levels of nitrogen fertilizer (N0, N1, N2) on three varieties (V1, V2
and V3). A 3 x 3 factorial experiment in RCBD with three replications was
used, keeping all other cultural practices as recommended for the area.
The grain yield data were given in Table 1 for analysis and appropriate
interpretation.
Yield data of three varieties tested at different levels of nitrogen fertilizer.
             Block
Treatment    I       II      III
V1N0         3.85    2.61    3.14
V1N1         4.79    4.94    4.56
Block 1 Block 2
Main-plot1 Main-plot2
SP1 SP2 SP3 SP3 SP1 SP2
Main-plot3 Main-plot1
SP2 SP3 SP1 SP2 SP1 SP3
Main-plot2 Main-plot3
SP1 SP3 SP2 SP1 SP2 SP3
Table . Yield (t/ha) at different densities and sowing times
S\rep   I       II      III     IV      V      VI     Si
S1      18.18   17.44   18.13   17.34   17.0   17.1   105.19
Correction factor: $C.F. = (331.37)^2 / (3 \times 3 \times 6) = 2033.45$
Then, SSS
SS due to error (a)
It is necessary to make a two-way table between sowing time and density as follows.
D1 D2 D3 Si
S\rep
S1 30.97 36.41 37.81 105.19
S2 35.76 40.83 39.57 116.16
S3 32.41 40.21 37.40 110.02
Dj 99.14 117.45 114.78 331.37
SSd
SSs
SS due to error
(b)
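The sums of squares for sowing time, density and their interaction can be recovered from the two-way table of totals above. This is a sketch under one assumption: each cell is a total over six replications (the rep columns I to VI), so the divisors are 18 for a marginal total and 6 for a cell total. Variable names are hypothetical. The error (a) and error (b) terms need the plot-level data and are not computed here.

```python
# Two-way table of totals: sowing time (rows) by density (columns D1..D3).
cells = {
    "S1": [30.97, 36.41, 37.81],
    "S2": [35.76, 40.83, 39.57],
    "S3": [32.41, 40.21, 37.40],
}
reps = 6                      # assumed from the rep columns I..VI
s = len(cells)                # number of sowing times
d = len(cells["S1"])          # number of densities

row_totals = [sum(row) for row in cells.values()]
col_totals = [sum(row[j] for row in cells.values()) for j in range(d)]
grand_total = sum(row_totals)

cf = grand_total ** 2 / (s * d * reps)
ss_sowing = sum(t * t for t in row_totals) / (d * reps) - cf
ss_density = sum(t * t for t in col_totals) / (s * reps) - cf
ss_cells = sum(c * c for row in cells.values() for c in row) / reps - cf
ss_interaction = ss_cells - ss_sowing - ss_density

print(f"C.F.   = {cf:.2f}")
print(f"SSs    = {ss_sowing:.4f}")
print(f"SSd    = {ss_density:.4f}")
print(f"SSsxd  = {ss_interaction:.4f}")
```

The computed correction factor agrees with the value 2033.45 that appears with the table, which supports the six-replication assumption.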
Table . ANOVA table for sowing time x plant density study
in split-plot design
Source of d.f. SS MS F-cal F-tabulated
variation 0.05 0.01
Perfect positive correlation r = 1.0
r = 0 No relationship
r does not depend on units: changing cm to mm does not affect the
correlation, but it does change the slope.
r does not detect cause and effect; it measures how the variables covary.
r quantifies the strength of linear relationships.
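The unit-invariance of r, and the unit-dependence of the slope, can be checked numerically. The height and weight values below are hypothetical and serve only as an illustration; the helper names are not from the slides.

```python
import math

def _sums(x, y):
    """Corrected sums of squares and products for paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy

def pearson_r(x, y):
    sxx, syy, sxy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)

def slope(x, y):
    """Least-squares slope b in y = a + bx."""
    sxx, _, sxy = _sums(x, y)
    return sxy / sxx

height_cm = [150, 160, 170, 180, 190]     # hypothetical data
weight_kg = [52, 58, 63, 70, 77]          # hypothetical data
height_mm = [10 * h for h in height_cm]   # same heights, different unit

r_cm, r_mm = pearson_r(height_cm, weight_kg), pearson_r(height_mm, weight_kg)
b_cm, b_mm = slope(height_cm, weight_kg), slope(height_mm, weight_kg)
print(r_cm, r_mm)   # identical up to rounding error
print(b_cm, b_mm)   # slope per mm is one tenth of slope per cm
```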
Regression
[Figure: yield plotted against fertilizer level]
SAS statement
data crdanov;
input trtment $ regen;
cards;
M1 12
M1 15
M1 16
M2 10
M2 9
M2 11
M3 15
M3 18
M3 17
M4 9
M4 8
M4 7
;
proc anova;
class trtment;
model regen=trtment;
means trtment/lsd;
run;
data rcbdanov;
input genotype block DS;  /* genotype, block and response on every line */
cards;
1 1 53
1 2 51
1 3 50
1 4 52
2 1 38
2 2 43
2 3 40
2 4 40
3 1 49
3 2 48
3 3 52
3 4 45
4 1 45
4 2 47
4 3 50
4 4 46
5 1 37
5 2 36
5 3 37
5 4 42
;
proc anova data=rcbdanov;
class genotype block;
model DS=genotype block;
means genotype/duncan;
run;
OUTPUT
Output from proc anova of the above data is given in Table 4.40
SAS output for 10 genotypes replicated 4 times in RCBD
The SAS System Analysis of Variance
Class Levels Values
BLOCK 4 1 2 3 4
GENOTYPE 10 1 2 3 4 5 6 7 8 9 10
Number of observations in data set = 40
Dependent Variable: DS
Source D.F. Sum of Squares Mean Square F Value Pr > F
Model 12 1031.80000000 85.98333333 17.30 0.0001
Error 27 134.17500000 4.96944444
Total 39 1165.97500000
R-Square C.V. Root MSE DS Mean
0.884925 4.770947 2.22922508 46.72500000
Source D.F. SS Mean Square F Value Pr > F
BLOCK 3 19.07500000 6.35833333 1.28 0.3014
GENOTYPE 9 1012.72500000 112.52500000 22.64 0.0001
Alpha= 0.05 d.f.= 27 MSE= 4.969444, Critical Value of T= 2.05
Least Significant Difference= 3.2343
Means with the same letter are not significantly different.
T Grouping Mean N GENOTYPE
A 53.250 4 8
A
B A 51.500 4 1
B A
B A 51.500 4 6
B
B C 50.000 4 10
B C
B C D 48.500 4 3
C D
C D 47.000 4 4
D
D 46.250 4 7
E 41.000 4 9
E
E 40.250 4 2
data LSDanov;
input row column trtment score;
cards;
1 1 1 33.8
2 4 1 34.6
3 3 1 36.9
4 2 1 37.1
5 5 1 36.4
1 2 2 33.7
2 3 2 33.5
3 5 2 35.1
4 4 2 38.1
5 1 2 34.8
;
proc anova;
class row column trtment;
model score=row column trtment;
means trtment/lsd;
run;
Class Levels Values
ROW 5 1 2 3 4 5
COLUMN 5 1 2 3 4 5
TRTMENT 5 1 2 3 4 5
Number of observations in data set = 25
Analysis of Variance Procedure
Dependent Variable: EL
Source D.F. Sum of Squares Mean Square F Value Pr > F
Model 12 259.85920000 21.65493333 7.06 0.0010
Error 12 36.79920000 3.06660000
Total 24 296.65840000
R-Square C.V. Root MSE EL
0.875954 5.134194 1.75117104 34.10800000
Source D.F. Anova SS Mean Square F Value Pr > F
ROW 4 87.40240000 21.85060000 7.13 0.0035
Column 4 16.56240000 4.14060000 1.35 0.3079
TRTMENT 4 155.89440000 38.97360000 12.71 0.0003
T tests (LSD) for variable: EL
Alpha= 0.05 d.f.= 12 MSE= 3.0666
Critical Value of T= 2.18
Least Significant Difference= 2.4131
Means with the same letter are not significantly different.
T Grouping Mean N
TRTMENT
A 35.760 5 1
A
A 35.680 5 3
A
A 35.040 5 2
A
A 34.900 5 4
B 29.160 5 5
data splot;
input vy $ date $ blk yld;
cards;
BC10 d1 1 2.2
BC10 d1 2 2.0
BC10 d1 3 2.3
BC10 d2 1 3.2
BC10 d2 2 3.3
BC10 d2 3 3.4
BC10 d3 1 4.0
BC10 d3 2 4.1
BC10 d3 3 4.2
BC9 d1 1 1.8
BC9 d1 2 1.9
BC9 d1 3 2.2
BC9 d2 1 2.4
BC9 d2 2 2.4
BC9 d2 3 2.5
BC9 d3 1 3.1
BC9 d3 2 3.2
BC9 d3 3 3.3
;
proc anova;
class vy date blk;
model yld=vy blk vy*blk date vy*date;
test h=vy blk e=vy*blk;
means vy/lsd e=vy*blk;
means date vy*date/lsd;
run;
Class Levels Values
V 2 BC10 BC9
DATE 3 d1 d2 d3
BLOCK 3 1 2 3
Dependent Variable: YIELD(t/ha)
Source D.F. Sum of Squares Mean Square F Value Pr > F
Model 9 10.06500000 1.11833333 154.85 0.0001
Error 8 0.05777778 0.00722222
Total 17 10.12277778
R-Square C.V. Root MSE YIELD Mean
0.994292 2.970303 0.08498366 2.86111111
Source D.F. Anova SS Mean Square F Value Pr > F
V 1 1.93388889 1.93388889 267.77 0.0001
BLOCK 2 0.13777778 0.06888889 9.54 0.0076
V*BLK 2 0.00444444 0.00222222 0.31 0.7435
DATE 2 7.52111111 3.76055556 520.69 0.0001
V*DATE 2 0.46777778 0.23388889 32.38 0.0001
Total 71 43238839.07944430
R-Square C.V. Root MSE MOE Mean
0.885462 8.717321 287.30085649 3295.74722222
data class;
input hybrid $ Eheight Cweight @@;
datalines;
Hyb-1 69 112
. . .
Hyb-n 62 62
;
proc reg data=class;
model Cweight=Eheight;
plot r.*p.;  /* residuals against predicted values */
run;
SAS output for a response and two independent variables
MODEL: MODEL 1
DEPENDENT VARIABLE: Y
ANALYSIS OF VARIANCE
SOURCE D.F. SS MS F-VALUE PROB> F
MODEL 2 1423.83797 711.91898 113.126 0.0001
ERROR 10 62.93126 6.29313
TOTAL 12 1486.76923
PARAMETERS ESTIMATES
VARIABLE D.F. ESTIMATE ERROR T FOR H0 PROB>T
INTERCEPT 1 65.099678 14.94457 4.356 0.0014
X1 1 1.0771 0.077 13.975 0.0001
X2 1 0.4254 0.07315 5.815 0.0002
MEAN COMPARISONS
t-test
T Prob>|T|
8.839 0.0001
T1 T2 T3 T4 T5
9 10 16 17 22
Tukey’s Test
$T_\alpha = q_\alpha(t, \mathrm{d.f.}) \sqrt{MSE/r} = q_\alpha(5, 20) \sqrt{8.06/5} = 5.37$
T1 T2 T3 T4 T5
9 10 16 17 22
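Tukey's comparison for these five means can be reproduced as follows. The studentized range value q = 4.23 for 5 treatments and 20 error d.f. at the 0.05 level is an assumption taken from standard tables; it is not stated on the slide, although it reproduces the slide's critical difference of 5.37. MSE = 8.06 and r = 5 come from the example.

```python
import math
from itertools import combinations

means = {"T1": 9, "T2": 10, "T3": 16, "T4": 17, "T5": 22}
mse = 8.06          # error mean square from the example
r = 5               # replications per treatment
q_005_5_20 = 4.23   # assumed table value: q(0.05; t = 5, d.f. = 20)

# Tukey's critical difference: q * sqrt(MSE / r)
hsd = q_005_5_20 * math.sqrt(mse / r)

# A pair differs significantly when its mean difference exceeds the HSD.
significant = [(a, b) for a, b in combinations(means, 2)
               if abs(means[a] - means[b]) > hsd]
not_significant = [(a, b) for a, b in combinations(means, 2)
                   if abs(means[a] - means[b]) <= hsd]

print(f"HSD = {hsd:.2f}")
print("significant:", significant)
print("not significant:", not_significant)
```

Under this assumed q value, the pairs T1 and T2, T3 and T4, and T4 and T5 are not declared different, so Tukey's test is more conservative here than a test that separates T4 from T5.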
Duncan’s New Multiple Range Test
(DNMRT)
From Duncan’s table of significant ranges (Appendix Table A.7 of Steel and
Torrie, 1980), rα(p, d.f.) values are obtained for p = 2, 3, …, t, where α is
the significance level and d.f. is the error degrees of freedom. These ranges
are converted into a set of t-1 least significant ranges (Rp):
Rp = rα(p, d.f.) SE(m) for p = 2, 3, …, t.
DNMRT can be applied to the previous example. Recall that MSE = 8.06,
r = 5 and error degrees of freedom = 20; the treatment means in ascending
order are T1 = 9, T2 = 10, T3 = 16, T4 = 17 and T5 = 22.
$SE(m) = \sqrt{MSE/r} = \sqrt{8.06/5} = 1.27$
The comparison shows significant differences between all pairs of
means except treatments 1 and 2 and treatments 3 and 4. In
this example, DNMRT and the LSD method lead to identical
conclusions.
T1 T2 T3 T4 T5
9 10 16 17 22
Which comparison method is the best?
1. Linear
$y = a + bx$
[Figure: grain yield against N fertilization]
2. Quadratic
$y = a + b_1x + b_2x^2$
[Figure: nutrient release against time]
3. Cubic
$y = a + b_1x + b_2x^2 + b_3x^3$
[Figure: specific leaf weight against time]
3 turning points
- max, min, change of the curve
Trends Cont.
Considerations (Questions):
• Is there a response?
• Is the response predictable or
explainable?
• What is the nature of the response?
• What model best describe the response?
• Is the response significant?
• practical view point
• statistical view point
GENERAL PRINCIPLES
1. For t treatments, SSt may be partitioned into t-1
portions specified by successively higher-order contrasts.
______________________________________________________
t df terms
2 1 L invalid not done
3 2 L, D
4 3 L, Q, D
5 4 L, Q, C, D
. .
. .
Contrast Q K SS
L 231.72 20 134.24
Q -4.36 4 0.02
D -84.86 20 18.00
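A contrast sum of squares is usually obtained as SS = Q^2 / (r K), where Q is the contrast computed on treatment totals and K is the sum of squared coefficients. The sketch below applies that formula to the table above. The number of replications, r = 20, is an assumption inferred from a total of 79 d.f. with four treatments; it is not stated on the slide.

```python
# Contrast values Q and sums of squared coefficients K from the table.
contrasts = {
    "L": {"Q": 231.72, "K": 20},
    "Q": {"Q": -4.36, "K": 4},
    "D": {"Q": -84.86, "K": 20},
}
r = 20  # assumed replications per treatment (80 observations / 4 treatments)

def contrast_ss(q_value, k, reps):
    """Sum of squares for a single-d.f. contrast on treatment totals."""
    return q_value ** 2 / (reps * k)

ss = {name: contrast_ss(c["Q"], c["K"], r) for name, c in contrasts.items()}
for name, value in ss.items():
    print(f"{name}: SS = {value:.2f}")
```

Under this assumption the L and D rows reproduce the tabulated 134.24 and 18.00. The same formula gives about 0.24 for the quadratic row rather than the printed 0.02, so either the printed value or the assumed r deserves a second look.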
ANOVA
Source df. MS F
Total 79
Conclusion
1. Linear response is significant.
2. There is also a significant curvilinear
response, a response that is not quadratic,
but maybe cubic?
linear
cubic
WHEN ASSUMPTIONS OF ANOVA ARE NOT MET
o Normality
o Independence
o Homogeneity of variance
o Randomness
Data transformation
DATA TRANSFORMATION
LOG TRANSFORMATION
Example
- Number of insects per plot
- Number of diseased plants per
plot
SQUARE-ROOT TRANSFORMATION
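Both transformations are one-line operations on count data. In the sketch below the added constants (1 before taking logs, 0.5 before taking square roots) are a common convention for handling zero counts and are an assumption, not something the slides specify; the insect counts are hypothetical.

```python
import math

def log_transform(counts, constant=1.0):
    """log10(x + constant); the constant keeps zero counts defined."""
    return [math.log10(c + constant) for c in counts]

def sqrt_transform(counts, constant=0.5):
    """sqrt(x + constant); often used for small whole-number counts."""
    return [math.sqrt(c + constant) for c in counts]

insects_per_plot = [0, 3, 9, 24, 99]      # hypothetical counts
logged = log_transform(insects_per_plot)
rooted = sqrt_transform(insects_per_plot)

print([round(v, 3) for v in logged])
print([round(v, 3) for v in rooted])
```

The analysis of variance is then run on the transformed values, and means are usually back-transformed to the original scale for presentation.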
• Choice of design
• Precision
• Treatment
• Inherent variation
• Knowledge