
Biometry

Hassen Shifa
Introduction

 What is statistics? A statistical expression that says, ‘we calculate statistics from statistics by statistics’.
Introduction cont.
 Descriptive and Inferential.
 Descriptive statistics can be defined as
those methods involving the collection,
presentation and characterization of a set
of data
 Inferential statistics = the estimation of a
characteristic of a population or the
making of a decision concerning a
population based only on sample results
Introduction cont.
 Descriptive
– Mean
– Median
– Mode
– Range
– Variance
– Standard deviation
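A minimal SAS sketch of these descriptive statistics (the dataset name is illustrative; the three height values are taken from the variable/data examples later in this introduction): PROC UNIVARIATE reports the mean, median, mode, range, variance and standard deviation in one run.

data heights;
  input height @@;   * heights in metres (example values from the data table below);
  cards;
1.6 1.55 1.5
;
proc univariate data=heights;
  var height;
run;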
Introduction cont.

 Inferential
 Basic Research: for the sake of knowledge
 Applied Research: to solve a problem
 Field Research: uncontrolled environment
 Laboratory Research: controlled environment
Introduction cont.

 Variables: a characteristic that can be measured or described and can vary among individuals or objects of reference
 Data: information about a particular characteristic of an individual

Variable           Data
---------------    -----------------------
gender             male, female
height             1.6 m, 1.55 m, 1.5 m
t-shirt colour     white, black, blue
number of pens     1, 2, 3, ...
weather            sunny, rainy, cloudy
CGPA               1.5, 2.0, 2.5, 3.0, 3.5
weight             65 kg, 100 kg
Introduction cont.

 Quantitative Data = measurable


– Continuous, height of students
– Discrete, number of students
 Qualitative Data = described
– Color
– taste
Introduction cont.

 Data presentation
– Tabular
– Frequency distribution
– Graphic Representation
Table 1. Summary of project cost

Project component    Original budget (mln USD)    Revised budget (mln USD)
1. ATVET                     6.3                          6.3
2. Extension                29.0                         33.7
3. Research                 22.3                         22.3
4. ICT                       3.0                          1.2
5. Marketing                 7.0                         11.0
6. PMU                       3.4                          3.1
Total                       71.0                         77.6
(Figure) RCBP budget utilization by region, 2007: allocated vs. used budget for each region.
(Figure) Budget by project component (ATVET, Extension, Research, ICT, Marketing, PMU) over the five project years.
Introduction cont.

 A population is the whole set of measurements or counts about which we want to draw conclusions.
 A sample is a subset of a population: a set of some of the measurements or counts that comprise the population.
Introduction cont.

 Accuracy and precision
 Precision is the closeness of repeated measurements to one another.
 Accuracy is the closeness of a measured or computed value to its true value.
Introduction cont.

 A hypothesis is an assertion or conjecture concerning one or more populations.
 Null hypothesis: the hypothesis to be tested, denoted by H0.
 Alternative hypothesis: denoted by H1 or HA.
Introduction cont.

 Type-I error: rejection of the null hypothesis when it is true
 Type-II error: acceptance of the null hypothesis when it is false
                           The Real Situation (unknown to the investigator)
Investigator’s Decision    Ho is true (μF = μM)     Ho is false (μF ≠ μM)
Reject Ho                  Type I error             A correct decision
Accept Ho                  Correct decision         Type II error
Introduction cont.

 Experimental error is the variation


between experimental units (plots)
treated alike
Introduction cont.

 Replication is repeating treatments in


more than one experimental unit
 Randomization is the process of
assigning treatments to experimental
units at random.
Introduction cont.

 Blocking is one of the refined techniques whereby experimental units are grouped into blocks of homogeneous units.
Introduction cont.

 Analysis of variance (ANOVA) is a


procedure that can be used to analyze
the results from both simple and complex
experiments
Introduction cont.

 Assumptions of analysis of variance include:
 randomness - sampling of individuals is at random
 independence - the errors are independently distributed
Introduction cont.

 normality - the errors are randomly, independently and normally distributed, and
 homogeneity - the error terms have homogeneous (equal) variance.
Introduction cont.

 Correlation analysis attempts to measure the strength of the relationship between two variables.
 Regression is similar to correlation in that a linear relationship between two types of measurements made on the same individuals is examined.
Introduction cont.

 The covariance between two random


variables is a measurement of the
nature of the association between the
two
Introduction cont.

 An experiment is a planned inquiry to obtain new facts or to confirm or deny the results of previous experiments.
Introduction cont.

 Experiment is an important tool for


research and should have the following
important characteristics:
– simplicity,
– measuring differences at higher precision,
– absence of systematic error,
– wider ranges of conclusion and
– calculation of degree of uncertainty
Introduction cont.

 Experimental units and treatments are


very general terms
 An experimental unit may be an animal,
many animals, a plant, a leaf and so on.
Accordingly, a treatment may be a
standard ration, inoculation, and a
spraying rate/spraying schedule
Sampling

 Why sampling?
 Complete information would emerge only if
data were collected from every individual in
the population.
 Collecting data of a destructive type would lead to all individuals being eliminated.
Sampling cont.

 Sampling is the taking or measuring of more


than one observation per experimental unit.
 Sampling error occurs during sample
measurements. The sampling error is the
variation among observations taken within
experimental units.
 Sampling can be done in greenhouses, under field conditions, in laboratories, on live plants such as perennial crops, on animals, and so on.
Sampling cont.

 Sampling can be done in one run, at two stages, three stages, and so on.
Sampling cont.

 The main reason for sampling is to save


resources (time, money and efforts).
 The second reason is that the sample data
can be useful in drawing conclusions about
the population, with appropriate sampling
method and sample size.
 The third reason is when the act of
measuring the variable destroys the
individual, such as in destructive sampling.
Sampling methods

Assignment
 Simple random sampling
 Systematic sampling
 Stratified random sampling
 Multistage random sampling
 Stratified multistage random sampling
 Cluster sampling
 Quota sampling
Sample Size

 In the planning phase, it is necessary to


decide what kind of measurements to be
taken and which sampling techniques are
going to be used.
 The next step is to decide how many measurements to take to be representative of the population.
Sample Size

 The most common question asked by every investigator who wishes to collect and analyze data is, ‘how much data should be collected?’
 A sample size of 30 sounds enough, depending on conditions.
 Precision required
 The variability, as measured by the standard deviation, is important.
Sample size

 The second suggestion is to take 20% of the


units from each stratum (this is called
proportional sampling)
 The third is to take optimal samples from each stratum.
 Changing the sample size alters the sensitivity or accuracy.
Data Description

 Summarize possibly large numbers of measurements:
 Frequency distribution
 Basically, the frequency distribution is simply a table constructed to show how many times a given score or group of scores occurred. We can set up a table where the highest score is at the top and the lowest at the bottom, with all possible scores in between (Table below).
Frequency distribution of plant height of a sweet corn population
___________________________________
Class frequency
___________________________________
156-160 10
161-165 80
166-170 235
171-175 370
176-180 220
181-185 80
186-190 5
___________________________________
 Real limits and apparent limits
 Apparent limits are the limits as written, e.g. 156-160, 161-165, and so on.
 There is a gap between 160 (the top score of the 156-160 interval) and 161 (the lowest score of the 161-165 interval). For this reason, it is understood that the real limits of any interval extend from 1/2 unit below the apparent lower limit to 1/2 unit above the apparent upper limit. The real lower limit is designated L and the real upper limit U.
Frequency distribution

 The first step in constructing a frequency


distribution from any group of data is to
locate the highest and the lowest scores.
 The distance between the highest and lowest
score is the range.
 The next step is to determine i, the size of
the interval. Dividing the range by the
number of intervals that will be employed
does this.
Frequency distribution

 Most commonly, i values of 3, 5, 10, 25, 50, and other multiples of 10 are used (Berenson et al., 1988).
 The choice of the number of intervals and the
size of the interval is quite arbitrary.
 The main point is to have the frequency
distribution display as much information as
possible concerning the concentrations and
patterns of scores.
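A minimal SAS sketch of building such a frequency distribution (the dataset, variable and format names are assumed, and the few heights listed are arbitrary illustrative values, not the sweet corn data): a user-defined format groups the raw scores into the classes and PROC FREQ counts them.

proc format;
  value htcls 156-160='156-160' 161-165='161-165' 166-170='166-170'
              171-175='171-175' 176-180='176-180' 181-185='181-185'
              186-190='186-190';
run;
data plantht;
  input height @@;   * arbitrary illustrative heights;
  cards;
158 163 167 172 174 178 169 171 183 188
;
proc freq data=plantht;
  tables height;
  format height htcls.;   * counts are reported per formatted class;
run;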
Histogram

 Histograms are vertical bar charts in which


the rectangular bars are constructed at the
boundaries of each class.
 The steps in graphing a frequency distribution are: lay out an area on paper with the height of the graph about three-fourths of its width.
 The horizontal line is called the x-axis or the abscissa.
Measure of central tendency

 A number describing the location of a set of


values is a measure of central tendency or
measure of location.
 The central tendency, dispersion and shape
are three major properties that describe a set
of numerical quantitative data.
 Mean is the arithmetic average of the values.
EXPERIMENTAL DESIGN

What is Experimental design?

❑ The rules and procedures used


in organizing, initiating and
analyzing an experiment.
INTRODUCTION cont.

❑ Main Components of the Research Process: Planning, Implementation and Results
Basic Principles

❑ Considerations:
I. Planning and Execution
1.Define the Problem
 general objectives
hypothesis
specific objectives.
Basic Principles cont.

2. Establish Experimental
Procedures
 biological materials available
 treatment to be used
 Choice of the treatments
 experimental units
 replicates,
 select proper experimental design
 consult experts for choosing design
 conduct of experiment
Consideration: I. Planning and
Execution of the Experiment cont.
3. Measurements to be Taken
✪decide on the variables of interest
✪what to measure, and when is the right
time
✪plan at the right time
✪consider time constraints
Consideration: I. Planning and
Execution of the Experiment cont.
4. Prepare for Data Recording and
Summarization
✪use of any facilities, eg tape recorder,
ruler
✪note books, pens , etc.
✪be prepared for problems
Consideration: I. Planning and
Execution of the Experiment cont.
5. Outline and Prepare for Data
Analysis
✪ design considerations are important
✪ how to analyze
✪ software, hardware
✪ what programme to be used, etc.
Consideration: I. Planning and
Execution of the Experiment cont.
6. Prepare a Detailed Work Schedule
✪ what supplies needed, how long
✪ labor sufficiency, time sufficiency
Consideration: I. Planning and
Execution of the Experiment cont.
7. Consider the Scope and Cost
✪ do not make the experiment too large
✪ labor, cost and time - cost-benefit ratio
Consideration: I. Planning and
Execution of the Experiment cont.
8. Conduct of Experiment
✪ avoid bias
✪ use numbers
✪ reduce size and scope if possible
Consideration: I. Planning and
Execution of the Experiment cont.
9. Analyze Data
✪ what statistics to be used
✪ inferences to be made
Consideration: Planning and
Execution of the Experiment cont.
10. Write-up Results
✪ results should be clear, and
clearly discussed
✪ draw accurate and
relevant conclusions
Basic Principles cont.

❑ Error Control - primary function


of experimental design
1. Among treatments (controllable)
2. Within treatments (uncontrollable)
- to be minimized
ANOVA Table for CRD

Source              d.f.     SS         MS     F-cal
Among treatments    t-1      SSt        MSt    MSt/MSe
Within treatments   t(r-1)   SSe        MSe
Total               tr-1     Total SS

MSt = SSt / d.f. for treatment
MSe = SSe / d.f. for error
Basic Principles cont.

❖Experimental Error - Measure of the


variation which exists among
observations on experimental units
treated alike.

❖Precision Indicators
❑Degrees of freedom for error
❑Mean squares of Error
(Figure: degrees of freedom for error ranging from low to high, plotted against mean squares for error ranging from high to low.)
Basic Principles cont.

• When degrees of freedom for error is high


• Low Mean squares of error
• High precision
• Treatment effect is revealed

• When degrees of freedom for error is low


• High mean squares of error
• Low precision
• Treatment effect is masked
Basic Principles cont.

❑ Types of Variation:
1.Inherent Variation
- due to inconsistencies or
heterogeneity within the population
of experimental units
- always present, cannot be avoided
- population variance
Basic Principles cont.

2. Variation due to inconsistencies or lack of


uniformity in the physical conduct of experiment
 can never be corrected by statistics.

a. mistakes and sloppiness


b. ignorance - lack of knowledge of particular
information.
eg. characteristics of experimental units;
proper experimental design.
c. non-random outcomes - uncontrollable eg. flood
Ways to Reduce Experimental Error

❑Replication = repeating treatments
❑Randomization = assigning treatments at random
Ways to Reduce Experimental Error cont.

1. Functions of Replication
a. to provide an estimate of
experimental error.
b. to improve precision
c. to increase the scope of
inferences
d. to effect control of error
Ways to Reduce Experimental Error

❑ Factors influencing number of


replications

1. Degree of precision required


2. Uniformity of experimental
units
3. Number of Treatments
4. Experimental design used
Ways to Reduce Experimental Error

2. Functions of Randomization:

a. To provide an unbiased estimate of


experimental error and treatment means.

b. Precaution against disturbances that


may be present.
CHOOSING EXPERIMENTAL DESIGN

Design Depends on
❑ Types of Treatments
❑ Number of Treatment
❑ Arrangement of Treatment
❑ Objectives of a study
❑ Inherent Variation in experimental area
Choose less complicated design
EXPERIMENTAL DESIGNS

• Complete block
• Incomplete block

• One factor
• Multiple factors
EXPERIMENTAL DESIGNS

Commonly used designs:

♦ Completely randomized (CRD)

♦ Randomized complete block (RCBD)

♦ Latin square (LS)


♦ Split-plot (SP)
♦ Incomplete block designs (IBD)
COMPLETELY RANDOMIZED DESIGN
(CRD)

❑ Each experimental unit has the same chance


of receiving a treatment in a completely
randomized manner

❑ Difference among experimental units given the


same treatment is attributed to experimental
error
When to Use CRD?

1. Experimental units are homogeneous. i.e.


low inherent variations.

2. When losses of observations is expected,


or already have unequal replications.

3. In small experiments, where error d.f. is


small.
When to use CRD cont.

Homogeneous environment
Laboratory
Green-house
Growth chamber
CRD cont.
❑Advantages:

1. Very flexible for different number of


treatments and replications.
2. Analysis is easy, even with unequal reps and heterogeneous errors, i.e. it can be used when treatments have different error variances.
3. Missing observations are easy to analyze.
4. Maximum error d.f. is possible compared to other designs.
CRD

Disadvantage
1. Requires uniform experimental units.
2. All non-treatment variation is
labeled as experimental error.
3. Size limitation is usually restricted
to small experiments.
CRD

✪L.A.M. (Linear Additive Model):

Xij = μ + Ti + Eij

where Xij = jth observation of the ith treatment
      μ   = overall mean
      Ti  = ith treatment effect (μi - μ)
      Eij = random error of the jth observation of the ith treatment
      i = 1, ..., t;  j = 1, ..., r
P1R2 P2R4 P1R1 P3R4

P3R1 P2R3 P2R1 P2R2

P1R3 P3R3 P3R2 P1R4

Figure 1. Lay-out in CRD, P = progeny


Analysis

Source       df       SS                     MS        E(MS)
Treatments   t-1      ΣXi.²/r - X..²/rt      SSt/dft   σ² + rσ²t
Error        t(r-1)   Total SS - SSt         SSe/dfe   σ²
Total        rt-1     ΣΣXij² - X..²/rt


Example

❖Manual analysis of variance

❖SAS analysis of variance


Treatment    Biomass (g/plot) by replication       Treat. Total (Ti)    Treat. Mean
             I       II      III     IV
t1
t2
t3
t4
t5
Analysis of variance

Correction factor (C.F.) = (GT)²/rt, where r = number of replications, t = number of treatments
Total sum of squares (Total SS)
   Total SS = ∑∑Xij² - C.F.
Sum of squares due to treatment (SSt)
   SSt = (∑Ti²)/r - C.F.
Sum of squares due to error (SSE) = Total SS - SSt
Degrees of freedom for treatment (DFt)
   DFt = number of treatments (t) - 1
Degrees of freedom for error (DFe)
   DFe = t(r-1)
Mean square due to treatment (MSt)
   MSt = SSt/DFt
Mean square due to error (MSE)
   MSE = SSE/DFe
Coefficient of variation (CV%) = 100·√MSE / grand mean (GM)
SAS statement for CRD

 SAS CRD.doc
 CRD-SAS-output.doc
Raw data of effects of applying six herbicide types on biomass of broad-leaved weed species evaluated at ..

Treatment    Biomass (g/plot) by replication     Treat. Total (Ti)    Treat. Mean
             I      II     III    IV
h1           17     20     16     21
h2           20     21     18     17
h3           18     19     21     16
h4           13     18     14     17
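The full SAS statements for CRD are in the separate files referred to above (SAS CRD.doc, CRD-SAS-output.doc); as a minimal sketch (dataset and variable names are assumed), the herbicide rows shown above could be analysed like this:

data crd;
  input herb $ biomass @@;   * treatment label and one biomass value per plot;
  cards;
h1 17 h1 20 h1 16 h1 21
h2 20 h2 21 h2 18 h2 17
h3 18 h3 19 h3 21 h3 16
h4 13 h4 18 h4 14 h4 17
;
proc anova data=crd;
  class herb;
  model biomass = herb;
  means herb / lsd;
run;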
What design to control
one direction of inherent variation?

Low fertility High fertility


RANDOMIZED COMPLETE BLOCK (RCBD)

 Treatments assigned blocks consisting of


experimental units

 To keep variability among experimental units


within a block as small as possible

 To maximize differences among blocks

 No contribution to precision in detecting


treatment difference if no block differences
When to use RCBD?

1. Experimental units must be able to


be blocked in some way.

2. Each block must be large enough to


accommodate all treatments.

➭It is one of the designs that employ


a two-way classification of
treatments.
Low fertility
Gradient

Block-I

Block-II

Block-III

High fertility
RCBD cont.
L.A.M.:

Xij = μ + Ti + Bj + Eij

where Xij = observation of the ith treatment in the jth block
      μ   = overall mean
      Ti  = ith treatment effect (μi - μ)
      Bj  = jth block effect (μj - μ)
      Eij = random error of the ith treatment in the jth block (μij - μi - μj + μ)
      i = 1, ..., t;  j = 1, ..., r
Figure 2: Lay-out in RCBD

Block 1 Block 2 Block 3 Block 4

T1 T2 T1 T3
T3 T1 T2 T4

T2 T4 T3 T1

T4 T3 T4 T2

Low fertility High fertility


RCBD cont.

❑Advantages:

1. More precise than CRD if blocking is


effective.
2. Can handle more treatments or blocks as
an effort to control experimental error.
3. Analysis is relatively easy.
4. Error sum of squares may be partitioned
into heterogeneous components.
5. Provide information on the uniformity of
experimental units.
RCBD cont.

❑ Disadvantages:

1. If blocking is ineffective, precision is lost.

2. Block size depends on the number of


treatments.

3. Limiting the total number of observations


will limit the number of blocks or
treatments. (n = rt).
Treatment    Biomass (g/plot) by block        Treat. Total (Ti)    Treat. Mean
             I       II      III     IV
t1
t2
t3
t4
t5
t6
Analysis of variance
• CF
• Total SS
• SSt
• SSB
• SSE
Analysis

Source of Variation     df            SS                    E(MS)
Blocks                  b-1           ΣX.j²/t - X..²/bt     σ² + tσ²β
Among Treatments        t-1           ΣXi.²/b - X..²/bt     σ² + bσ²t
Blocks x Treatments     (b-1)(t-1)    by subtraction        σ²
Total                   bt-1          ΣΣXij² - X..²/bt


SAS Statement for RCBD

❖SAS-RCBD.doc
Table . Raw data of six palm progenies evaluated for oil to bunch ratio (%)

Progeny    Oil to bunch ratio (%) by replication    Treat. Total (Ti)    Treat. Mean
           I       II      III     IV
P1         27.0    30.1    26.4    31.3
P2         30.4    31.1    28.9    27.7
P3         28.1    29.0    31.0    26.2
P4         23.1    28.2    24.1    27.0
P5         27.3    27.0    24.0    28.3
P6         23.1    21.4    20.1    23.3
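The full SAS statements for RCBD are in the separate file SAS-RCBD.doc; a minimal sketch (dataset and variable names are assumed) of the analysis of the oil-to-bunch data above is given below. The last progeny is entered as P6, assuming the repeated "P5" in the original table is a typo.

data rcbd;
  input progeny $ rep ratio @@;   * rep is the block;
  cards;
P1 1 27.0  P1 2 30.1  P1 3 26.4  P1 4 31.3
P2 1 30.4  P2 2 31.1  P2 3 28.9  P2 4 27.7
P3 1 28.1  P3 2 29.0  P3 3 31.0  P3 4 26.2
P4 1 23.1  P4 2 28.2  P4 3 24.1  P4 4 27.0
P5 1 27.3  P5 2 27.0  P5 3 24.0  P5 4 28.3
P6 1 23.1  P6 2 21.4  P6 3 20.1  P6 4 23.3
;
proc anova data=rcbd;
  class rep progeny;
  model ratio = rep progeny;
  means progeny / duncan;
run;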


What design to control two directions of inherent variation?
(Gradients: low fertility to high fertility in one direction, wet to dry in the other)
LATIN SQUARE DESIGN (LSD)

 Handles simultaneously 2 known sources


of variation among experimental units

 Treats sources as 2 independent blocking


criteria (row and column)

 Every treatment occurs only once in each


row-block and once in each column-block

 Number of replications depends on number


of treatments
LATIN SQUARE cont.

❑ Useful for experiments with 4 to 8


treatments with a single
experimental unit per treatment in
each column and row
Figure 3. Lay-out in LS (rows: wet to dry; columns: low fertility to high fertility)

        Column 1   Column 2   Column 3   Column 4
Row 1   T1         T2         T3         T4
Row 2   T2         T3         T4         T1
Row 3   T3         T4         T1         T2
Row 4   T4         T1         T2         T3
ANOVA for Latin Square vs. RCBD

Latin square                        RCBD
Source        d.f.                  Source     d.f.
Rows          r-1                   Blocks     r-1
Columns       r-1
Treatments    r-1                   Trt.       t-1
Error         (r-1)(r-2)            Error      (r-1)(t-1)
Total         r²-1                  Total      tr-1

LATIN SQUARE cont.

Advantages:

❑ MSe is reduced if blocking is effective


in both directions.
❑ ANOVA is straight forward.
❑ May evaluate the effects of possible
gradients.
LATIN SQUARE cont.

Disadvantages

❑ If blocking is ineffective in either or


both direction, MSe is not reduced.

❑ MSe and all F-tests are invalid if


rows, columns or treatments interact.

❑ Size limitations.
L.A.M. Model for Latin square:

Xij(k) = μ + ρi + Kj + T(k) + Eij(k)


Correction factor (C.F.) = (GT)²/t², where t = number of treatments

Total sum of squares (Total SS)
   Total SS = ∑∑Xijk² - C.F.

Sum of squares due to Row (SSR)
   SSR = (∑Rj²)/t - C.F.

Sum of squares due to Column (SSC)
   SSC = (∑Ck²)/t - C.F.

Sum of squares due to treatment (SSt)
   SSt = (∑Tk²)/t - C.F.
Table . ANOVA table for a 6 x 6 Latin square design

Source of variation   d.f.          SS    MS    F-cal    F-tab 0.05    F-tab 0.01
Rows                  r-1
Columns               r-1
Treatments            r-1
Error                 (r-1)(r-2)

Significant at the stated level of probability

Analysis Using SAS
 SAS_LSD.doc
Table . Body weight of animals fed on different rations (A-F)

        Column 1   Column 2   Column 3   Column 4   Column 5   Column 6   Rj   Mean
Row 1   F(3.19)    E(3.50)    D(3.27)    C(2.62)    B(2.82)    A(1.91)
Row 2   E(3.27)    C(2.41)    A(1.91)    D(2.91)    F(3.13)    B(2.95)
Row 3   B(3.04)    A(1.91)    F(3.25)    E(3.29)    D(2.91)    C(3.07)
Row 4   A(1.77)    B(3.04)    E(3.40)    F(2.99)    C(2.82)    D(3.50)
Row 5   D(3.50)    F(3.31)    C(3.09)    B(3.04)    A(1.91)    E(3.27)
Row 6   C(2.52)    D(2.86)    B(2.91)    A(1.77)    E(3.30)    F(2.98)
Ck
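The SAS statements for the Latin square are in the separate file SAS_LSD.doc; a minimal sketch (dataset and variable names are assumed) of how the ration data above could be analysed is:

data latin;
  input row col ration $ wt @@;
  cards;
1 1 F 3.19  1 2 E 3.50  1 3 D 3.27  1 4 C 2.62  1 5 B 2.82  1 6 A 1.91
2 1 E 3.27  2 2 C 2.41  2 3 A 1.91  2 4 D 2.91  2 5 F 3.13  2 6 B 2.95
3 1 B 3.04  3 2 A 1.91  3 3 F 3.25  3 4 E 3.29  3 5 D 2.91  3 6 C 3.07
4 1 A 1.77  4 2 B 3.04  4 3 E 3.40  4 4 F 2.99  4 5 C 2.82  4 6 D 3.50
5 1 D 3.50  5 2 F 3.31  5 3 C 3.09  5 4 B 3.04  5 5 A 1.91  5 6 E 3.27
6 1 C 2.52  6 2 D 2.86  6 3 B 2.91  6 4 A 1.77  6 5 E 3.30  6 6 F 2.98
;
proc anova data=latin;
  class row col ration;
  model wt = row col ration;
  means ration / lsd;
run;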
What design
when there are two or more
factors
TWO OR MORE FACTORS

• Factorial in CRD, RCBD, LS


• Split-plot
• Split-split-plot
• Split-block, not common
• Fractional Factorial
FACTORIAL EXPERIMENT

❑ A treatment arrangement,
not experimental design
❑ A method of determining the
treatment to be employed
• Factor - a specific type of treatment
• Level - a state of a factor.
FACTORIAL EXPERIMENT cont.

Purpose/Advantage:
• To increase the scope of inferences on:
i.e., to determine:
• Important factors
• Optimum levels of those factors
• Interactions between factors  joint importance
• Improved precision  not always true in all
cases
FACTORIAL EXPERIMENT cont.

Simple effect
- effect of 1 factor measured at a
specific level of all other factors.
Main effect
- the average of all simple effects of a single factor.

Interaction
- the failure of simple effects of one
factor to be the same at every level
of the other factor.
Tells the measure of simple effect of A at
the level of B.

Eg. 2 x 2 factorial (2^n)

Sum of diagonal rule:
        a1    a2
  b1
  b2
Various Types of Interaction

(Figure: response of b1 and b2 across levels of factor A, illustrating four cases)
1. No interaction
2. Change in magnitude but no change in ranking
3. Change in magnitude and direction, i.e. b1 responds in a negative way to increasing levels of A
4. Change in magnitude and rank, but not in direction
FACTORIAL EXPERIMENT cont.

❑ Conditions for no interaction

1. The effects of B are the same at all levels of A


2. The effects of A are the same at all levels of B
3. The effects of A and B are completely
additive
4. The residuals (unexplained effects) are all
zero
FACTORIAL EXPERIMENT cont.

Tests of significance are independent.

e.g. if the interaction AB is significant,
 A and B are non-independent;
but if the interaction AB is non-significant,
 the effect of A and the effect of B are independent, i.e. their effects are additive.
The significance of A or B alone does not show whether the interaction AB is significant or not.
Type of factors useful in factorial
experiments

1. Qualitative (specific)

2. Quantitative

3. Ranked qualitative

4. Sample qualitative
Factorial in RCBD

❑a complete factorial has all factors


cross-classified to each other.

❑Cross-classification  all levels of a factor correspond specifically to equivalent levels at every level of all the other factors.
Factorial in RCBD cont.

e.g. 3 X 2 factorial (3 varieties, 2 N rates)

N rate
Variety 100 200
A - -

B - -
C - -
Nested relationship
❑levels of one factor are specific for specific levels of the other factor.

e.g. 3 x 2 factorial (6 var., 2 N rates)

N rates
Variety 100 200
A -
B -
C -
D -
E -
F -

i.e. levels of variety are nested within N rates


If done in 2 locations to examine interaction
between varieties and location:

L1 L2
B1 B3
B2 B1
B3 B2

blocks are nested within locations


V1 V1
V2 V2
V3 V3
varieties are cross- classified with locations
Rootworm damage ratings & insecticide treatment (j)

Hybrid Rep Carbamate Organophosphate None


A 1 69 79 41
2 57 81 52
3 60 75 37
4 53 77 45
5 60 65 41
B 1 93 84 10
2 82 86 17
3 87 79 9
4 95 72 21
5 91 66 12
C 1 80 70 11
2 84 69 24
3 81 81 19
Correction factor (C.F.) = (GT)²/abr

Total sum of squares (Total SS) = ∑∑Xij² - C.F.

SSt = (∑Ti²)/r - C.F.

SS error = Total SS - SSt

MSt = SSt/DFt

MSE = SSE/DFe
Correction factor (C.F.) = (GT)²/abr

Total sum of squares (Total SS) = ∑∑Xij² - C.F.

SSt = (∑Ti²)/r - C.F.

SSB = (∑Bj²)/t - C.F.

SS error = Total SS - SSt - SSB

Analysis of variance for factorial in CRD

Sources of variation   DF        SS         MS      F-calc.   F-tab (0.05) (0.01)
Treatments             t-1       SSt        MSt**
Error                  t(r-1)    SSe        MSe
Total                  tr-1      Total SS

** = Significant at 0.01 level of probability

Analysis of variance for factorial in RCBD

Sources of variation   DF          SS         MS      F-calc.   F-tab (0.05) (0.01)
Block                  b-1         SSB        MSB**
Treatments             t-1         SSt        MSt**
Error                  (t-1)(r-1)  SSe        MSe
Total                  tr-1        Total SS


Analysis of variance for factorial in CRD

Source of variation   DF           SS       MS       F-calc.   F-tab (0.05) (0.01)
Treatment             t-1          SSt      MSt**
  Factor A            a-1          SSA      MSA
  Factor B            b-1          SSB      MSB
  A x B               (a-1)(b-1)   SSAxB    MSAxB
Error                 t(r-1)       SSe      MSe
Total

** = Significant at 0.01 level of probability

Analysis of variance for factorial in RCBD

Sources of variation  DF           SS        MS       F-calc.   F-tab (0.05) (0.01)
Block                 b-1          SSB       MSB**
Treatments            t-1          SSt       MSt**
  Factor A            a-1          SSA       MSA
  Factor B            b-1          SSB       MSB
  A x B               (a-1)(b-1)   SSAxB     MSAxB
Error                 (t-1)(r-1)   SSe       MSe
Total                 tr-1         Total SS

• Standard error of a mean

Variety:    SE(m) = ± √(MSE / rn)
Nitrogen:   SE(m) = ± √(MSE / rv)
V x N:      SE(m) = ± √(MSE / r)

where r = replications, v = number of varieties, n = number of nitrogen levels
Figure 1. Effect of insecticide on the control of corn rootworm
(Figure: corn rootworm score (0-10) for hybrids A, B and C under each insecticide: C = carbamate, O = organophosphate, N = none.)
One MSc student conducted an experiment to determine the effect of different levels of nitrogen fertilizer (N0, N1, N2) on three varieties (V1, V2 and V3). A 3 x 3 factorial experiment in RCBD with three replications was used, keeping all other cultural practices as recommended for the area. The grain yield data are given in Table 1 for analysis and appropriate interpretation.

Table 1. Yield data of three varieties tested at different levels of nitrogen fertilizer

Treatment    Block I    Block II    Block III
V1N0         3.85       2.61        3.14
V1N1         4.79       4.94        4.56
V1N2         4.58       4.45        4.88
V2N0         2.84       3.79        4.11
V2N1         4.96       5.13        4.15
V2N2         5.93       5.70        5.81
V3N0         4.19       3.75        3.74
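A minimal sketch (dataset and variable names are assumed) of how this factorial in RCBD could be set up in SAS; only the treatment combinations shown in the table above are entered, and the remaining rows of the full table would be added in the same way:

data fact;
  input variety $ nitrogen $ block yield @@;
  cards;
V1 N0 1 3.85  V1 N0 2 2.61  V1 N0 3 3.14
V1 N1 1 4.79  V1 N1 2 4.94  V1 N1 3 4.56
V1 N2 1 4.58  V1 N2 2 4.45  V1 N2 3 4.88
V2 N0 1 2.84  V2 N0 2 3.79  V2 N0 3 4.11
V2 N1 1 4.96  V2 N1 2 5.13  V2 N1 3 4.15
V2 N2 1 5.93  V2 N2 2 5.70  V2 N2 3 5.81
V3 N0 1 4.19  V3 N0 2 3.75  V3 N0 3 3.74
;
* PROC GLM is used here rather than PROC ANOVA because the listing above is incomplete (unbalanced);
proc glm data=fact;
  class block variety nitrogen;
  model yield = block variety nitrogen variety*nitrogen;
  means variety nitrogen / lsd;
run;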


SPLIT PLOT DESIGN

o Assigns main-plot factor to main-plots and


sub-plot factor to sub-plots

o Sub-plots are division of main-plots hence,

o Main-plot becomes a block for sub-plot


treatments
Suitability of Split-plot

 Two-factor experiment with many treatments such


that they cannot be accommodated by a complete
block designs

 Improves precision for measurement of effects


of sub-plots at the expense of main-plot
Table . Treatment randomization in split-plot in RCBD

Block 1 Block 2

Main-plot1 Main-plot2
SP1 SP2 SP3 SP3 SP1 SP2

Main-plot3 Main-plot1
SP2 SP3 SP1 SP2 SP1 SP3
Main-plot2 Main-plot3
SP1 SP3 SP2 SP1 SP2 SP3
Table . Yield (t/ha) of a wheat variety at different density and sowing time

Sowing date  Density    Replication                                       Sowing date
                        I       II      III     IV      V       VI        total (Si)
S1           D1         5.59    5.50    5.25    5.31    4.63    4.69
             D2         5.69    6.66    6.22    5.84    5.97    6.03
             D3         6.90    5.28    6.66    6.19    6.40    6.38
             Sum        18.18   17.44   18.13   17.34   17.0    17.1      105.19
S2           D1         5.84    5.63    6.14    5.94    6.13    6.08
             D2         6.90    6.86    7.03    6.94    6.22    6.88
             D3         7.09    6.72    6.25    6.36    6.34    6.81
             Sum        19.83   19.21   19.42   19.24   18.69   17.77     116.16
S3           D1         5.23    5.41    5.25    5.71    5.06    5.75
             D2         6.60    6.55    7.00    6.28    6.75    7.03
             D3         6.03    6.52    6.12    5.72    6.13    6.88
             Sum        17.86   18.48   18.37   17.71   17.94   19.66     110.02

Correction factor (C.F.) = (GT)²/abr = (331.37)²/(3 x 3 x 6) = 2033.45

Total sum of squares (Total SS)
   Total SS = ∑∑∑Xijk² - C.F.
            = [(5.59)² + (5.50)² + … + (6.88)²] - 2033.45
            = 21.00

Sum of squares due to replication (SSR)
   SSR = (∑Rk²)/ab - C.F.
       = [(55.87)² + (55.13)² + ... + (56.53)²]/(3 x 3) - 2033.45
       = 0.67
Sum of squares due to sowing date (SSS)

Here we need a two-way table between the main plot (sowing date) and replication.

Table 5.6. Grain yield (t/ha) of a wheat variety at different density and sowing time

S\rep   I       II      III     IV      V       VI      Si
S1      18.18   17.44   18.13   17.34   17.0    17.1    105.19
S2      19.83   19.21   19.42   19.24   18.69   17.77   116.16
S3      17.86   18.48   18.37   17.71   17.94   19.66   110.02
Rk      55.87   55.13   55.92   54.29   53.63   56.53   331.37

From this table SSS and the SS due to error (a) are obtained.

It is also necessary to make a two-way table between sowing time and density as follows.

S\D     D1      D2      D3      Si
S1      30.97   36.41   37.81   105.19
S2      35.76   40.83   39.57   116.16
S3      32.41   40.21   37.40   110.02
Dj      99.14   117.45  114.78  331.37

From this table SSD, SSSxD and the SS due to error (b) are obtained.
Table . ANOVA table for sowing time x plant density study in split-plot design

Source of variation   d.f.   SS      MS       F-cal     F-tab 0.05   F-tab 0.01
Replications          5      0.67    0.134    0.96      2.53         5.53
Sowing, S             2      3.35    1.675    18.41**   3.32         8.77
Error (a)             10     0.91    0.091    0.65      2.16         4.24
Densities, D          2      10.87   5.435    38.90**   3.32         8.77
S x D                 4      1.01    0.2525   1.81      2.69         6.12
Error (b)             30     4.19    0.1397

**, significant at 0.01 level of probability


Split plots
An experiment on maize experimental hybrid Hy-59
was conducted to study the effects of 3 population
densities and 3 fertilizer rates on silage yields. The
design was a RCB with 4 replications in split-plots.
The whole plots were population densities of 24000,
30000 and 36000 plants per hectare. Each main-plot
was divided into sub-plots fertilized with 120, 150
and 180 kg/ha of nitrogen.

Calculate the analysis of variance. Test sources of


variation due to main effects and the interactions
using the appropriate or approximate F-tests.
Interpret the results in a clear and concise
discussion. Include the standard errors for
population density means and the fertilizer means.
Density       N rate     Silage yield by replication
(plants/ha)   (kg/ha)    I       II      III     IV
24000         120        9.0     8.3     8.5     8.6
              150        9.5     8.0     8.8     7.8
              180        10.7    9.5     8.8     8.9
30000         120        9.2     9.8     11.2    7.9
              150        8.9     9.8     10.4    8.6
              180        9.2     10.6    9.1     8.7
36000         120        8.1     7.4     8.2     8.5
              150        9.2     8.1     7.8     8.9
              180        9.3     7.4     9.4     8.3
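A minimal sketch (dataset and variable names are assumed) of the split-plot analysis of this exercise in SAS, following the same model-statement pattern as the split-plot SAS example later in these notes (density = main plot, nitrogen rate = sub plot, rep = block):

data maize;
  input density nrate rep yield @@;
  cards;
24000 120 1  9.0  24000 120 2  8.3  24000 120 3  8.5  24000 120 4  8.6
24000 150 1  9.5  24000 150 2  8.0  24000 150 3  8.8  24000 150 4  7.8
24000 180 1 10.7  24000 180 2  9.5  24000 180 3  8.8  24000 180 4  8.9
30000 120 1  9.2  30000 120 2  9.8  30000 120 3 11.2  30000 120 4  7.9
30000 150 1  8.9  30000 150 2  9.8  30000 150 3 10.4  30000 150 4  8.6
30000 180 1  9.2  30000 180 2 10.6  30000 180 3  9.1  30000 180 4  8.7
36000 120 1  8.1  36000 120 2  7.4  36000 120 3  8.2  36000 120 4  8.5
36000 150 1  9.2  36000 150 2  8.1  36000 150 3  7.8  36000 150 4  8.9
36000 180 1  9.3  36000 180 2  7.4  36000 180 3  9.4  36000 180 4  8.3
;
proc anova data=maize;
  class rep density nrate;
  model yield = rep density rep*density nrate density*nrate;
  test h=density e=rep*density;          * main-plot factor tested against error (a);
  means density / lsd e=rep*density;
  means nrate density*nrate / lsd;
run;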
Regression and correlation

* Is the relationship linear (straight-line)?


* Does the value of y depend upon the value
of x or vice versa?
* How strong is the relationship, do the
points form a perfect line?
* Does the scatter represent nothing more
than a random collection of points since
there is no relationship?
* Can we predict the value of x if we know y,
and vice versa?
Correlation:
Very often we are interested in knowing what
association, if any, exists between two
measurements or variables. One way to
determine this is to calculate the correlation
coefficient, a commonly used index which
measures the degree of association between
two variables.

 r has a value between -1 and +1
 From this, r is interpreted as in the examples below:

(Figure: example scatter plots)
- Perfect positive correlation: r = 1.0
- No relationship: r = 0
- Negative correlation: r = -0.6
r does not depend on units: changing cm to mm does not affect the correlation, but it does affect the slope.
r does not detect cause and effect; it measures how the variables covary.
r quantifies the strength of linear relationships.
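A minimal sketch (the dataset and variable names are assumed, not part of these notes): PROC CORR prints the Pearson correlation coefficient r, with its p-value, for each pair of variables listed.

proc corr data=plants pearson;
  var height weight;
run;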
Regression

(Figure: scatter plot of yield against fertilizer level with a fitted regression line)
Regression

 To see the association between variables we need a formal model. The most general form of such a model is:

   Y = f(X1, X2, ..., Xn) + ε   .............(1)

For n = 1, (1) becomes a simple linear regression.
For n ≥ 2, (1) becomes a multiple linear regression.

The independent, predictor or explanatory variable is denoted by X; the dependent or response variable is denoted by Y.
Simple Linear Regression
The functional form of the linear relationship between a dependent variable Y and an independent variable X is represented by the equation:

   Y = a + bX

where a is the intercept and b is the slope of the line.

We test the hypothesis
   Ho: β = 0, i.e. X is not useful as a predictor of Y, OR
   there is no linear relationship between X and Y.

PROC REG;
  MODEL dep=indep;
RUN;
Residuals
• Residuals are the observed value minus the fitted value (the fitted line is the red line on the previous slide).
• Plots involving residuals can be very informative. They can:
  • help assess if assumptions are valid
  • help assess if other variables need to be taken into account
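A minimal sketch (using the Eheight/Cweight variable names from the regression example later in these notes) of how the residuals and fitted values can be saved and then plotted to check the assumptions:

proc reg data=class;
  model Cweight = Eheight;
  output out=diag p=pred r=resid;   * save fitted values and residuals;
run;
proc sgplot data=diag;
  scatter x=pred y=resid;           * residuals against fitted values;
  refline 0 / axis=y;
run;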
SAS statement
data crdanov;
input trtment $ regen;
cards;
M1 12
M1 15
M1 16
M2 10
M2 9
M2 11
M3 15
M3 18
M3 17
M4 9
M4 8
M4 7
;
proc anova;
class trtment;
model regen=trtment;
means trtment/lsd;
run;
Data rcbdanov;
Input genotype block DS;
Cards;
1 1 53
1 2 51
1 3 50
1 4 52
2 1 38
2 2 43
2 3 40
2 4 40
3 1 49
3 2 48
3 3 52
3 4 45
4 1 45
4 2 47
4 3 50
4 4 46
5 1 37
5 2 36
5 3 37
5 4 42
;
proc anova;
class genotype block;
model DS=genotype block;
means genotype/Duncan;
run;
OUTPUT
Output from proc anova of the above data is given in Table 4.40
SAS output for 10 genotypes replicated 4 times in RCBD
The SAS System Analysis of Variance
Class Levels Values
BLOCK 4 1 2 3 4
GENOTYPE 10 1 2 3 4 5 6 7 8 9 10
Number of observations in data set = 40
Dependent Variable: DS
Source D.F. Sum of Squares Mean Square F Value Pr > F
Model 12 1031.80000000 85.98333333 17.30 0.0001
Error 27 134.17500000 4.96944444
Total 39 1165.97500000
R-Square C.V. Root MSE DS Mean
0.884925 4.770947 2.22922508 46.72500000
Source D.F. SS Mean Square F Value Pr > F
BLOCK 3 19.07500000 6.35833333 1.28 0.3014
GENOTYPE 9 1012.72500000 112.52500000 22.64 0.0001
Alpha= 0.05 d.f.= 27 MSE= 4.969444, Critical Value of T= 2.05
Least Significant Difference= 3.2343
Means with the same letter are not significantly different.
T Grouping Mean N GENOTYPE
A 53.250 4 8
A
B A 51.500 4 1
B A
B A 51.500 4 6
B
B C 50.000 4 10
B C
B C D 48.500 4 3
C D
C D 47.000 4 4
D
D 46.250 4 7
E 41.000 4 9
E
E 40.250 4 2
data LSDanov;
input row column trtment score;
cards;
1 1 1 33.8
2 4 1 34.6
3 3 1 36.9
4 2 1 37.1
5 5 1 36.4
1 2 2 33.7
2 3 2 33.5
3 5 2 35.1
4 4 2 38.1
5 1 2 34.8
;
proc anova;
class row column trtment;
model score=row column trtment;
means trtment/lsd;
run;
Class Levels Values
ROW 5 1 2 3 4 5
COLUMN 5 1 2 3 4 5
TRTMENT 5 1 2 3 4 5
Number of observations in data set = 25
Analysis of Variance Procedure
Dependent Variable: EL
Source D.F. Sum of Squares Mean Square F Value Pr > F
Model 12 259.85920000 21.65493333 7.06 0.0010
Error 12 36.79920000 3.06660000
Total 24 296.65840000
R-Square C.V. Root MSE EL
0.875954 5.134194 1.75117104 34.10800000
Source D.F. Anova SS Mean Square F Value Pr > F
ROW 4 87.40240000 21.85060000 7.13 0.0035
Column 4 16.56240000 4.14060000 1.35 0.3079
TRTMENT 4 155.89440000 38.97360000 12.71 0.0003
T tests (LSD) for variable: EL
Alpha= 0.05 d.f.= 12 MSE= 3.0666
Critical Value of T= 2.18
Least Significant Difference= 2.4131
Means with the same letter are not significantly different.
T Grouping Mean N
TRTMENT
A 35.760 5 1
A
A 35.680 5 3
A
A 35.040 5 2
A
A 34.900 5 4
B 29.160 5 5
data splot;
input vy $ date $ blk yld;
cards;
BC10 d1 1 2.2
BC10 d1 2 2.0
BC10 d1 3 2.3
BC10 d2 1 3.2
BC10 d2 2 3.3
BC10 d2 3 3.4
BC10 d3 1 4.0
BC10 d3 2 4.1
BC10 d3 3 4.2
BC9 d1 1 1.8
BC9 d1 2 1.9
BC9 d1 3 2.2
BC9 d2 1 2.4
BC9 d2 2 2.4
BC9 d2 3 2.5
BC9 d3 1 3.1
BC9 d3 2 3.2
BC9 d3 3 3.3
;
proc anova;
class vy date blk;
model yld=vy blk vy*blk date vy*date;
test h=vy blk e=vy*blk;
means vy/lsd e=vy*blk;
means date vy*date/lsd;
run;
Class Levels Values
V 2 BC10 BC9
DATE 3 d1 d2 d3
BLOCK 3 1 2 3
Dependent Variable: YIELD(t/ha)
Source D.F. Sum of Squares Mean Square F Value Pr > F
Model 9 10.06500000 1.11833333 154.85 0.0001
Error 8 0.05777778 0.00722222
Total 17 10.12277778
R-Square C.V. Root MSE YIELD Mean
0.994292 2.970303 0.08498366 2.86111111
Source D.F. Anova SS Mean Square F Value Pr > F
V 1 1.93388889 1.93388889 267.77 0.0001
BLOCK 2 0.13777778 0.06888889 9.54 0.0076
V*BLK 2 0.00444444 0.00222222 0.31 0.7435
DATE 2 7.52111111 3.76055556 520.69 0.0001
V*DATE 2 0.46777778 0.23388889 32.38 0.0001

Tests of Hypotheses using the Anova MS for V*BLOCK as an error term


Source D.F. Anova SS Mean Square F Value Pr > F
V 1 1.93388889 1.93388889 870.25 0.0011
BLOCK 2 0.13777778 0.06888889 31.00 0.0313
Alpha= 0.05 d.f.= 2 MSE= 0.002222
Critical Value of T= 4.30
Least Significant Difference= 0.0956
data diet;
input diet $ breed $ wt;
cards;
a1 b1 30
a1 b1 31
a1 b1 29
a1 b2 35
a1 b2 36
a1 b2 34
a2 b1 37
a2 b1 38
a2 b1 36
a2 b2 40
a2 b2 41
a2 b2 39
;
proc anova;
class diet breed;
model wt=diet breed diet*breed;
means diet breed/lsd;
run;
Source DF Sum of Squares Mean Square F Value Pr > F
Model 11 38286332.15096790 3480575.65008800 42.17 0.0001
Error 60 4952506.92847634 82541.78214127

Total 71 43238839.07944430
R-Square C.V. Root MSE MOE Mean
0.885462 8.717321 287.30085649 3295.74722222

Source DF Anova SS Mean Square F Value Pr > F


AGE 2 440720.21861088 220360.10930544 2.67 0.0775
PASIZE 1 12.93482661 12.93482661 0.00 0.9901
AGE*PASIZ 2 410702.68475962 205351.34237981 2.49 0.0916
DENSITY 1 34665578.67555560 34665578.67555560 419.98 0.0001
AGE*DENSI 2 2439446.37694442 1219723.18847221 14.78 0.0001
PASIZE*DE 1 35587.67096257 35587.67096257 0.43 0.5139
A*P*D 2 294283.58930826 147141.79465413 1.78 0.1770
SAS statement for linear regression with single predictor

data class;
input hybrid $ Eheight Cweight @@;
datalines;
Hyb-1 69 112
. . .
Hyb-n 62 62
;
proc reg data=class;
model Cweight=Eheight;
run;
plot r.*p.;
Run;
SAS output for a response and two independent variables
MODEL: MODEL 1
DEPENDENT VARIABLE: Y
ANALYSIS OF VARIANCE
SOURCE D.F. SS MS F-VALUE PROB> F
MODEL 2 1423.83797 711.91898 113.126 0.0001
ERROR 10 62.93126 6.29313
TOTAL 12 1486.76923

PARAMETER ESTIMATES
VARIABLE D.F. ESTIMATE STD ERROR T FOR H0 PROB > |T|
INTERCEPT 1 65.099678 14.94457 4.356 0.0014
X1 1 1.0771 0.077 13.975 0.0001
X2 1 0.4254 0.07315 5.815 0.0002
MEAN COMPARISONS

t-test

 Tests difference between 2 means with


no specific experimental design

 Means can come from dependent (paired)


or independent (unpaired) experiment
Table 1: Effect of selfing on seedling height

Plant number    Seedling height (cm)
                S0       S1
1               20       12
2               18       10
3               22        8
4               28        5
5               17        4
6               20        2
7               19        6
8               25        3
9               27       11
10              30        7
Mean            22.60     6.80
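A minimal sketch (dataset and variable names are assumed) of a SAS run for these data, treating the S0 and S1 heights of each plant number as a pair; the paired differences give t of about 8.84, consistent with the output below.

data selfing;
  input plant s0 s1 @@;
  cards;
1 20 12  2 18 10  3 22 8  4 28 5  5 17 4
6 20 2  7 19 6  8 25 3  9 27 11  10 30 7
;
proc ttest data=selfing;
  paired s0*s1;
run;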
SAS output

T Prob>|T|

8.839 0.0001

Significant at 0.01 level of probability

Mean of S0 is higher than S1


MULTIPLE COMPARISON OF MEANS

Major statistics for mean comparison


❖ LSD
❖ DNMRT
❖ Tukey’s
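A minimal sketch (reusing the assumed herbicide CRD names from the earlier sketch): all three statistics can be requested on the MEANS statement of PROC ANOVA or PROC GLM.

proc anova data=crd;
  class herb;
  model biomass = herb;
  means herb / lsd duncan tukey;   * LSD, DNMRT and Tukey mean comparisons;
run;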
Least significant difference

   LSD = t(1-α/2, error d.f.) x √(2·MSE/r)

(Cochran and Cox, 1957)

Taking five treatments, for example, the process can be explained with T1, T2, T3, T4 and T5 having means 9, 10, 16, 17 and 22, respectively. With α = 0.05, error degrees of freedom 20, and MSE for the five treatments = 8.06, the LSD = 4.37. From these statistics one can see that treatment 5 had the highest mean, and the two pairs of means that do not differ significantly from each other are T1 and T2, and T3 and T4.

T1   T2   T3   T4   T5
9    10   16   17   22
Tukey’s Test

   q = (maximum mean - minimum mean) / √(MSE/r)

Tukey’s test declares two means significantly different from each other if the absolute value of their sample difference exceeds Tα, where

   Tα = qα(t, d.f.) x √(MSE/r)

For the example above, Tα = qα(5, 20) x √(8.06/5) = 5.37.

Hence, any pair of treatment means that differ in absolute value by more than 5.37 would imply that the corresponding pair of population means is significantly different.

T1   T2   T3   T4   T5
9    10   16   17   22
Duncan’s New Multiple Range Test (DNMRT)

From Duncan’s table of significant ranges, Appendix Table A.7 of Steel and Torrie (1980), rα(p, d.f.) values are obtained for p = 2, 3, …, t, where α is the significance level and d.f. is the error degrees of freedom. These ranges are converted into a set of t-1 least significant ranges (Rp) for p = 2, 3, …, t by calculating

   Rp = rα(p, d.f.) x SE(m)   for p = 2, 3, …, t.

DNMRT can be applied to the previous example. Recalling that MSE = 8.06, r = 5 and error degrees of freedom = 20, the treatment means can be ordered in ascending order as T1 = 9, T2 = 10, T3 = 16, T4 = 17 and T5 = 22.

   SE(m) = √(MSE/r) = √(8.06/5) = 1.27

The comparison yields significant differences between all pairs of means except T1 and T2, and T3 and T4. In this example, DNMRT and the LSD method produced the same result, leading to identical conclusions.

T1   T2   T3   T4   T5
9    10   16   17   22
Which comparison method is the best?

• No clear-cut answer to this question


• Carmer and Swanson (1973) have
conducted simulation studies and
concluded that

• LSD or DNMRT suffice, given present knowledge
Group comparison method

 The structure or nature of the treatments is such that they need to be meaningfully grouped and compared = contrast
 Examination of the functional relationship between treatment levels and treatment means = trend analysis
TREND ANALYSIS
Linear and Curvilinear Trends

❑ Linear and Curvilinear Trends


➭involving quantitative factors i.e. rate
and date type studies.

➭Objective : to study the rate of change of


a variable with increasing levels of a
specific factor.
 linear and curvilinear trends.
eg.
➭variable (y) : Yield
➭factor (x) : N fertilization rate
1. Linear:     y = a + bx
   (e.g. grain yield vs. N fertilization rate)

2. Quadratic:  y = a + b1x + b2x²
   (e.g. nutrient release vs. time)

3. Cubic:      y = a + b1x + b2x² + b3x³
   (e.g. specific leaf weight vs. time; 3 turning points - max, min, change of the curve)
Trends Cont.

Considerations (Questions):
• Is there a response?
• Is the response predictable or
explainable?
• What is the nature of the response?
• What model best describe the response?
• Is the response significant?
• practical view point
• statistical view point
GENERAL PRINCIPLES
1. For t treatments, SSt may be partitioned into t-1
portions specified by increasing higher order contrasts.
______________________________________________________
t df terms
2 1 L invalid not done
3 2 L, D
4 3 L, Q, D
5 4 L, Q, C, D
. .
. .

2. Quantitative treatment factor levels need not be equally spaced, but calculations are easier if they are equally spaced.
3. Tests of significance are made exactly as described for contrasts.
4. Interpretation is never complete without a graph.
                     Treatment totals
Angle                0         60        120       180
Total                970.34    978.78    1087.05   1031.49

Contrast coefficients:
L                    -3        -1         1         3
Q                     1        -1        -1         1
Dev. from Q          -1         3        -3         1

Contrast   Q         K     SS
L          231.72    20    134.24
Q          -4.36     4     0.02
D          -84.86    20    18.00
ANOVA

Source     df    MS       F
Reps       19    3.15     2.22**
Angles
  L        1     134.04   94.39**
  Q        1     0.02     0.01
  D        1     18.00    12.68**
Error      57    1.42
Total      79
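A minimal sketch (dataset and variable names are assumed) of how the linear, quadratic and deviation contrasts above could be requested with CONTRAST statements in PROC GLM; the coefficients correspond to the four equally spaced angle levels (0, 60, 120, 180):

proc glm data=angles;
  class rep angle;
  model y = rep angle;
  contrast 'linear'     angle -3 -1  1  3;
  contrast 'quadratic'  angle  1 -1 -1  1;
  contrast 'deviation'  angle -1  3 -3  1;
run;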
Conclusion
1. Linear response is significant.
2. There is also a significant curvilinear
response, a response that is not quadratic,
but maybe cubic?

linear

cubic
WHEN ASSUMPTIONS OF ANOVA ARE VIOLATED

o Normality
o Independence
o Homogeneity
o Additivity
o Randomness

 Data transformation
DATA TRANSFORMATION
LOG TRANSFORMATION

 Suitable for count data where standard


deviation is proportional to mean

Example
- Number of insects per plot
- Number of diseased plants per
plot
SQUARE-ROOT TRANSFORMATION

 Suitable for percentage data

 Variance is proportional to mean


Examples :
- Number of seeds germinated in a plot
- Number of insects caught in traps
ARC SINE OR ANGULAR TRANSFORMATION

 Suitable for count data expressed as percentages

 Percentage ranging from 0 to 100%
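A minimal sketch (dataset and variable names are assumed) of the three transformations applied in a single SAS data step before ANOVA:

data transformed;
  set raw;
  logcount = log(count + 1);          * log transformation; +1 is a common device when zero counts occur;
  sqrtct   = sqrt(count + 0.5);       * square-root transformation; +0.5 is a common adjustment for small counts;
  angle    = arsin(sqrt(pct/100));    * arc sine (angular) transformation of a percentage;
run;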


CONCLUSION

• Choice of design
• Precision
• Treatment
• Inherent variation
• Knowledge

❑ Less complicated with good precision


❑ Computer know-how
❑ Consult statistician during planning
