
Biometry

Hassen Shifa
Introduction

 What is statistics? A statistical expression that says, ‘we calculate statistics from statistics by statistics’.
Introduction cont.
 Descriptive and Inferential.
 Descriptive statistics can be defined as
those methods involving the collection,
presentation and characterization of a set
of data
 Inferential statistics = the estimation of a
characteristic of a population or the
making of a decision concerning a
population based only on sample results
Introduction cont.
 Descriptive
– Mean
– Median
– Mode
– Range
– Variance
– Standard deviation
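A minimal SAS sketch of these descriptive statistics (the dataset name is illustrative; the three height values are taken from the variable/data examples later in this introduction): PROC UNIVARIATE reports the mean, median, mode, range, variance and standard deviation in one run.

data heights;
  input height @@;   * heights in metres (example values from the data table below);
  cards;
1.6 1.55 1.5
;
proc univariate data=heights;
  var height;
run;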
Introduction cont.

 Inferential
 Basic Research: for the sake of knowledge
 Applied Research: to solve a problem
 Field Research: uncontrolled environment
 Laboratory Research: controlled environment
Introduction cont.

 Variables: a characteristic that can be measured or described and can vary among individuals or objects of reference
 Data: information about a particular characteristic of an individual

Variable           Data
---------------    -----------------------
gender             male, female
height             1.6 m, 1.55 m, 1.5 m
t-shirt colour     white, black, blue
number of pens     1, 2, 3, ...
weather            sunny, rainy, cloudy
CGPA               1.5, 2.0, 2.5, 3.0, 3.5
weight             65 kg, 100 kg
Introduction cont.

 Quantitative Data = measurable


– Continuous, height of students
– Discrete, number of students
 Qualitative Data = described
– Color
– taste
Introduction cont.

 Data presentation
– Tabular
– Frequency distribution
– Graphic Representation
Table 1. Summary of project cost

Project component    Original budget (mln USD)    Revised budget (mln USD)
1. ATVET                     6.3                          6.3
2. Extension                29.0                         33.7
3. Research                 22.3                         22.3
4. ICT                       3.0                          1.2
5. Marketing                 7.0                         11.0
6. PMU                       3.4                          3.1
Total                       71.0                         77.6
(Figure) RCBP budget utilization by region, 2007: allocated vs. used budget for each region.
(Figure) Budget by project component (ATVET, Extension, Research, ICT, Marketing, PMU) over the five project years.
Introduction cont.

 A population is the whole set of measurements or counts about which we want to draw conclusions.
 A sample is a subset of a population: a set of some of the measurements or counts that comprise the population.
Introduction cont.

 Accuracy and precision
 Precision is the closeness of repeated measurements to one another.
 Accuracy is the closeness of a measured or computed value to its true value.
Introduction cont.

 A hypothesis is an assertion or conjecture concerning one or more populations.
 Null hypothesis: the hypothesis to be tested, denoted by H0.
 Alternative hypothesis: denoted by H1 or HA.
Introduction cont.

 Type-I error: rejection of the null hypothesis when it is true
 Type-II error: acceptance of the null hypothesis when it is false
                           The Real Situation (unknown to the investigator)
Investigator’s Decision    Ho is true (μF = μM)     Ho is false (μF ≠ μM)
Reject Ho                  Type I error             A correct decision
Accept Ho                  Correct decision         Type II error
Introduction cont.

 Experimental error is the variation


between experimental units (plots)
treated alike
Introduction cont.

 Replication is repeating treatments in


more than one experimental unit
 Randomization is the process of
assigning treatments to experimental
units at random.
Introduction cont.

 Blocking is one of the refined techniques whereby experimental units are grouped into blocks of homogeneous units.
Introduction cont.

 Analysis of variance (ANOVA) is a


procedure that can be used to analyze
the results from both simple and complex
experiments
Introduction cont.

 Assumptions of analysis of variance include:
 randomness - sampling of individuals is at random
 independence - the errors are independently distributed
Introduction cont.

 normality - the errors are randomly, independently and normally distributed, and
 homogeneity - the error terms have homogeneous (equal) variance.
Introduction cont.

 Correlation analysis attempts to measure the strength of the relationship between two variables.
 Regression is similar to correlation in that a linear relationship between two types of measurements made on the same individuals is examined.
Introduction cont.

 The covariance between two random


variables is a measurement of the
nature of the association between the
two
Introduction cont.

 An experiment is a planned inquiry to obtain new facts or to confirm or deny the results of previous experiments.
Introduction cont.

 Experiment is an important tool for


research and should have the following
important characteristics:
– simplicity,
– measuring differences at higher precision,
– absence of systematic error,
– wider ranges of conclusion and
– calculation of degree of uncertainty
Introduction cont.

 Experimental units and treatments are


very general terms
 An experimental unit may be an animal,
many animals, a plant, a leaf and so on.
Accordingly, a treatment may be a
standard ration, inoculation, and a
spraying rate/spraying schedule
Sampling

 Why sampling?
 Complete information would emerge only if
data were collected from every individual in
the population.
 Collecting data of a destructive type would lead to all individuals being eliminated.
Sampling cont.

 Sampling is the taking or measuring of more


than one observation per experimental unit.
 Sampling error occurs during sample
measurements. The sampling error is the
variation among observations taken within
experimental units.
 Sampling can be done in greenhouses, under field conditions, in laboratories, on live plants such as perennial crops, on animals, and so on.
Sampling cont.

 Sampling can be done in one run, at two stages, three stages, and so on.
Sampling cont.

 The main reason for sampling is to save


resources (time, money and efforts).
 The second reason is that the sample data
can be useful in drawing conclusions about
the population, with appropriate sampling
method and sample size.
 The third reason is when the act of
measuring the variable destroys the
individual, such as in destructive sampling.
Sampling methods

Assignment
 Simple random sampling
 Systematic sampling
 Stratified random sampling
 Multistage random sampling
 Stratified multistage random sampling
 Cluster sampling
 Quota sampling
Sample Size

 In the planning phase, it is necessary to


decide what kind of measurements to be
taken and which sampling techniques are
going to be used.
 The next step is to decide how many measurements to take to be representative of the population.
Sample Size

 The most common question asked by every investigator who wishes to collect and analyze data is, ‘how much data should be collected?’
 A sample size of 30 sounds enough, depending on conditions.
 Precision required
 The variability, as measured by the standard deviation, is important.
Sample size

 The second suggestion is to take 20% of the


units from each stratum (this is called
proportional sampling)
 The third is to take optimal samples from each stratum.
 Changing the sample size alters the sensitivity or accuracy.
Data Description

 Summarize possibly large numbers of measurements:
 Frequency distribution
 Basically, the frequency distribution is simply a table constructed to show how many times a given score or group of scores occurred. We can set up a table where the highest score is at the top and the lowest at the bottom, with all possible scores in between (Table below).
Frequency distribution of plant height of a sweet corn population
___________________________________
Class frequency
___________________________________
156-160 10
161-165 80
166-170 235
171-175 370
176-180 220
181-185 80
186-190 5
___________________________________
 Real limits and apparent limits
 Apparent limits are the limits as written, e.g. 156-160, 161-165, and so on.
 There is a gap between 160 (the top score of the 156-160 interval) and 161 (the lowest score of the 161-165 interval). For this reason, it is understood that the real limits of any interval extend from 1/2 unit below the apparent lower limit to 1/2 unit above the apparent upper limit. The real lower limit is designated L and the real upper limit U.
Frequency distribution

 The first step in constructing a frequency


distribution from any group of data is to
locate the highest and the lowest scores.
 The distance between the highest and lowest
score is the range.
 The next step is to determine i, the size of
the interval. Dividing the range by the
number of intervals that will be employed
does this.
Frequency distribution

 Most commonly, i values of 3, 5, 10, 25, 50, and other multiples of 10 are used (Berenson et al., 1988).
 The choice of the number of intervals and the
size of the interval is quite arbitrary.
 The main point is to have the frequency
distribution display as much information as
possible concerning the concentrations and
patterns of scores.
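A minimal SAS sketch of building such a frequency distribution (the dataset, variable and format names are assumed, and the few heights listed are arbitrary illustrative values, not the sweet corn data): a user-defined format groups the raw scores into the classes and PROC FREQ counts them.

proc format;
  value htcls 156-160='156-160' 161-165='161-165' 166-170='166-170'
              171-175='171-175' 176-180='176-180' 181-185='181-185'
              186-190='186-190';
run;
data plantht;
  input height @@;   * arbitrary illustrative heights;
  cards;
158 163 167 172 174 178 169 171 183 188
;
proc freq data=plantht;
  tables height;
  format height htcls.;   * counts are reported per formatted class;
run;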
Histogram

 Histograms are vertical bar charts in which


the rectangular bars are constructed at the
boundaries of each class.
 The steps in graphing a frequency distribution are: lay out an area on paper with the height of the graph about three-fourths of its width.
 The horizontal line is called the x-axis or the abscissa.
Measure of central tendency

 A number describing the location of a set of


values is a measure of central tendency or
measure of location.
 The central tendency, dispersion and shape
are three major properties that describe a set
of numerical quantitative data.
 Mean is the arithmetic average of the values.
EXPERIMENTAL DESIGN

What is Experimental design?

❑ The rules and procedures used


in organizing, initiating and
analyzing an experiment.
INTRODUCTION cont.

❑ Main Components of the Research Process: Planning, Implementation and Results
Basic Principles

❑ Considerations:
I. Planning and Execution
1.Define the Problem
 general objectives
hypothesis
specific objectives.
Basic Principles cont.

2. Establish Experimental
Procedures
 biological materials available
 treatment to be used
 Choice of the treatments
 experimental units
 replicates,
 select proper experimental design
 consult experts for choosing design
 conduct of experiment
Consideration: I. Planning and
Execution of the Experiment cont.
3. Measurements to be Taken
✪decide on the variables of interest
✪what to measure, and when is the right
time
✪plan at the right time
✪consider time constraints
Consideration: I. Planning and
Execution of the Experiment cont.
4. Prepare for Data Recording and
Summarization
✪use of any facilities, eg tape recorder,
ruler
✪note books, pens , etc.
✪be prepared for problems
Consideration: I. Planning and
Execution of the Experiment cont.
5. Outline and Prepare for Data
Analysis
✪ design considerations are important
✪ how to analyze
✪ software, hardware
✪ what programme to be used, etc.
Consideration: I. Planning and
Execution of the Experiment cont.
6. Prepare a Detailed Work Schedule
✪ what supplies needed, how long
✪ labor sufficiency, time sufficiency
Consideration: I. Planning and
Execution of the Experiment cont.
7. Consider the Scope and Cost
✪ do not make the experiment too large
✪ labor, cost and time - cost-benefit ratio
Consideration: I. Planning and
Execution of the Experiment cont.
8. Conduct of Experiment
✪ avoid bias
✪ use numbers
✪ reduce size and scope if possible
Consideration: I. Planning and
Execution of the Experiment cont.
9. Analyze Data
✪ what statistics to be used
✪ inferences to be made
Consideration: Planning and
Execution of the Experiment cont.
10. Write-up Results
✪ results should be clear, and
clearly discussed
✪ draw accurate and
relevant conclusions
Basic Principles cont.

❑ Error Control - primary function


of experimental design
1. Among treatments (controllable)
2. Within treatments (uncontrollable)
- to be minimized
ANOVA Table for CRD

Source              d.f.     SS         MS     F-cal
Among treatments    t-1      SSt        MSt    MSt/MSe
Within treatments   t(r-1)   SSe        MSe
Total               tr-1     Total SS

MSt = SSt / d.f. for treatment
MSe = SSe / d.f. for error
Basic Principles cont.

❖Experimental Error - Measure of the


variation which exists among
observations on experimental units
treated alike.

❖Precision Indicators
❑Degrees of freedom for error
❑Mean squares of Error
(Figure: degrees of freedom for error ranging from low to high, plotted against mean squares for error ranging from high to low.)
Basic Principles cont.

• When degrees of freedom for error is high


• Low Mean squares of error
• High precision
• Treatment effect is revealed

• When degrees of freedom for error is low


• High mean squares of error
• Low precision
• Treatment effect is masked
Basic Principles cont.

❑ Types of Variation:
1.Inherent Variation
- due to inconsistencies or
heterogeneity within the population
of experimental units
- always present, cannot be avoided
- population variance
Basic Principles cont.

2. Variation due to inconsistencies or lack of


uniformity in the physical conduct of experiment
 can never be corrected by statistics.

a. mistakes and sloppiness


b. ignorance - lack of knowledge of particular
information.
eg. characteristics of experimental units;
proper experimental design.
c. non-random outcomes - uncontrollable eg. flood
Ways to Reduce Experimental Error

❑Replication = repeating treatments
❑Randomization = assigning treatments at random
Ways to Reduce Experimental Error cont.

1. Functions of Replication
a. to provide an estimate of
experimental error.
b. to improve precision
c. to increase the scope of
inferences
d. to effect control of error
Ways to Reduce Experimental Error

❑ Factors influencing number of


replications

1. Degree of precision required


2. Uniformity of experimental
units
3. Number of Treatments
4. Experimental design used
Ways to Reduce Experimental Error

2. Functions of Randomization:

a. To provide an unbiased estimate of


experimental error and treatment means.

b. Precaution against disturbances that


may be present.
CHOOSING EXPERIMENTAL DESIGN

Design Depends on
❑ Types of Treatments
❑ Number of Treatment
❑ Arrangement of Treatment
❑ Objectives of a study
❑ Inherent Variation in experimental area
Choose less complicated design
EXPERIMENTAL DESIGNS

• Complete block
• Incomplete block

• One factor
• Multiple factors
EXPERIMENTAL DESIGNS

Commonly used designs:

♦ Completely randomized (CRD)

♦ Randomized complete block (RCBD)

♦ Latin square (LS)


♦ Split-plot (SP)
♦ Incomplete block designs (IBD)
COMPLETELY RANDOMIZED DESIGN
(CRD)

❑ Each experimental unit has the same chance


of receiving a treatment in a completely
randomized manner

❑ Difference among experimental units given the


same treatment is attributed to experimental
error
When to Use CRD?

1. Experimental units are homogeneous. i.e.


low inherent variations.

2. When losses of observations is expected,


or already have unequal replications.

3. In small experiments, where error d.f. is


small.
When to use CRD cont.

Homogeneous environment
Laboratory
Green-house
Growth chamber
CRD cont.
❑Advantages:

1. Very flexible for different number of


treatments and replications.
2. Analysis is easy, even with unequal reps and heterogeneous errors, i.e. it can be used when treatments have different error variances.
3. Missing observations are easy to analyze.
4. Maximum error d.f. is possible compared to other designs.
CRD

Disadvantage
1. Requires uniform experimental units.
2. All non-treatment variation is
labeled as experimental error.
3. Size limitation is usually restricted
to small experiments.
CRD

✪L.A.M. (Linear Additive Model):

Xij = μ + Ti + Eij

where Xij = jth observation of the ith treatment
      μ   = overall mean
      Ti  = ith treatment effect (μi - μ)
      Eij = random error of the jth observation of the ith treatment
      i = 1, ..., t;  j = 1, ..., r
P1R2 P2R4 P1R1 P3R4

P3R1 P2R3 P2R1 P2R2

P1R3 P3R3 P3R2 P1R4

Figure 1. Lay-out in CRD, P = progeny


Analysis

Source       df       SS                     MS        E(MS)
Treatments   t-1      ΣXi.²/r - X..²/rt      SSt/dft   σ² + rσ²t
Error        t(r-1)   Total SS - SSt         SSe/dfe   σ²
Total        rt-1     ΣΣXij² - X..²/rt


Example

❖Manual analysis of variance

❖SAS analysis of variance


Treatment    Biomass (g/plot) by replication       Treat. Total (Ti)    Treat. Mean
             I       II      III     IV
t1
t2
t3
t4
t5
Analysis of variance

Correction factor (C.F.) = (GT)²/rt, where r = number of replications, t = number of treatments
Total sum of squares (Total SS)
   Total SS = ∑∑Xij² - C.F.
Sum of squares due to treatment (SSt)
   SSt = (∑Ti²)/r - C.F.
Sum of squares due to error (SSE) = Total SS - SSt
Degrees of freedom for treatment (DFt)
   DFt = number of treatments (t) - 1
Degrees of freedom for error (DFe)
   DFe = t(r-1)
Mean square due to treatment (MSt)
   MSt = SSt/DFt
Mean square due to error (MSE)
   MSE = SSE/DFe
Coefficient of variation (CV%) = 100·√MSE / grand mean (GM)
SAS statement for CRD

 SAS CRD.doc
 CRD-SAS-output.doc
Raw data of effects of applying six herbicide types on biomass of broad-leaved weed species evaluated at ..

Treatment    Biomass (g/plot) by replication     Treat. Total (Ti)    Treat. Mean
             I      II     III    IV
h1           17     20     16     21
h2           20     21     18     17
h3           18     19     21     16
h4           13     18     14     17
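The full SAS statements for CRD are in the separate files referred to above (SAS CRD.doc, CRD-SAS-output.doc); as a minimal sketch (dataset and variable names are assumed), the herbicide rows shown above could be analysed like this:

data crd;
  input herb $ biomass @@;   * treatment label and one biomass value per plot;
  cards;
h1 17 h1 20 h1 16 h1 21
h2 20 h2 21 h2 18 h2 17
h3 18 h3 19 h3 21 h3 16
h4 13 h4 18 h4 14 h4 17
;
proc anova data=crd;
  class herb;
  model biomass = herb;
  means herb / lsd;
run;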
What design to control
one direction of inherent variation?

Low fertility High fertility


RANDOMIZED COMPLETE BLOCK (RCBD)

 Treatments assigned blocks consisting of


experimental units

 To keep variability among experimental units


within a block as small as possible

 To maximize differences among blocks

 No contribution to precision in detecting


treatment difference if no block differences
When to use RCBD?

1. Experimental units must be able to


be blocked in some way.

2. Each block must be large enough to


accommodate all treatments.

➭It is one of the designs that employ


a two-way classification of
treatments.
Low fertility
Gradient

Block-I

Block-II

Block-III

High fertility
RCBD cont.
L.A.M.:

Xij = μ + Ti + Bj + Eij

where Xij = observation of the ith treatment in the jth block
      μ   = overall mean
      Ti  = ith treatment effect (μi - μ)
      Bj  = jth block effect (μj - μ)
      Eij = random error of the ith treatment in the jth block (μij - μi - μj + μ)
      i = 1, ..., t;  j = 1, ..., r
Figure 2: Lay-out in RCBD

Block 1 Block 2 Block 3 Block 4

T1 T2 T1 T3
T3 T1 T2 T4

T2 T4 T3 T1

T4 T3 T4 T2

Low fertility High fertility


RCBD cont.

❑Advantages:

1. More precise than CRD if blocking is


effective.
2. Can handle more treatments or blocks as
an effort to control experimental error.
3. Analysis is relatively easy.
4. Error sum of squares may be partitioned
into heterogeneous components.
5. Provide information on the uniformity of
experimental units.
RCBD cont.

❑ Disadvantages:

1. If blocking is ineffective, precision is lost.

2. Block size depends on the number of


treatments.

3. Limiting the total number of observations


will limit the number of blocks or
treatments. (n = rt).
Treatment    Biomass (g/plot) by block        Treat. Total (Ti)    Treat. Mean
             I       II      III     IV
t1
t2
t3
t4
t5
t6
Analysis of variance
• CF
• Total SS
• SSt
• SSB
• SSE
Analysis

Source of Variation     df            SS                    E(MS)
Blocks                  b-1           ΣX.j²/t - X..²/bt     σ² + tσ²β
Among Treatments        t-1           ΣXi.²/b - X..²/bt     σ² + bσ²t
Blocks x Treatments     (b-1)(t-1)    by subtraction        σ²
Total                   bt-1          ΣΣXij² - X..²/bt


SAS Statement for RCBD

❖SAS-RCBD.doc
Table . Raw data of six palm progenies evaluated for oil to bunch ratio (%)

Progeny    Oil to bunch ratio (%) by replication    Treat. Total (Ti)    Treat. Mean
           I       II      III     IV
P1         27.0    30.1    26.4    31.3
P2         30.4    31.1    28.9    27.7
P3         28.1    29.0    31.0    26.2
P4         23.1    28.2    24.1    27.0
P5         27.3    27.0    24.0    28.3
P6         23.1    21.4    20.1    23.3
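The full SAS statements for RCBD are in the separate file SAS-RCBD.doc; a minimal sketch (dataset and variable names are assumed) of the analysis of the oil-to-bunch data above is given below. The last progeny is entered as P6, assuming the repeated "P5" in the original table is a typo.

data rcbd;
  input progeny $ rep ratio @@;   * rep is the block;
  cards;
P1 1 27.0  P1 2 30.1  P1 3 26.4  P1 4 31.3
P2 1 30.4  P2 2 31.1  P2 3 28.9  P2 4 27.7
P3 1 28.1  P3 2 29.0  P3 3 31.0  P3 4 26.2
P4 1 23.1  P4 2 28.2  P4 3 24.1  P4 4 27.0
P5 1 27.3  P5 2 27.0  P5 3 24.0  P5 4 28.3
P6 1 23.1  P6 2 21.4  P6 3 20.1  P6 4 23.3
;
proc anova data=rcbd;
  class rep progeny;
  model ratio = rep progeny;
  means progeny / duncan;
run;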


What design to control two directions of inherent variation?
(Gradients: low fertility to high fertility in one direction, wet to dry in the other)
LATIN SQUARE DESIGN (LSD)

 Handles simultaneously 2 known sources


of variation among experimental units

 Treats sources as 2 independent blocking


criteria (row and column)

 Every treatment occurs only once in each


row-block and once in each column-block

 Number of replications depends on number


of treatments
LATIN SQUARE cont.

❑ Useful for experiments with 4 to 8


treatments with a single
experimental unit per treatment in
each column and row
Figure 3. Lay-out in LS (rows: wet to dry; columns: low fertility to high fertility)

        Column 1   Column 2   Column 3   Column 4
Row 1   T1         T2         T3         T4
Row 2   T2         T3         T4         T1
Row 3   T3         T4         T1         T2
Row 4   T4         T1         T2         T3
ANOVA for Latin Square vs. RCBD

Latin square                        RCBD
Source        d.f.                  Source     d.f.
Rows          r-1                   Blocks     r-1
Columns       r-1
Treatments    r-1                   Trt.       t-1
Error         (r-1)(r-2)            Error      (r-1)(t-1)
Total         r²-1                  Total      tr-1

LATIN SQUARE cont.

Advantages:

❑ MSe is reduced if blocking is effective


in both directions.
❑ ANOVA is straight forward.
❑ May evaluate the effects of possible
gradients.
LATIN SQUARE cont.

Disadvantages

❑ If blocking is ineffective in either or


both direction, MSe is not reduced.

❑ MSe and all F-tests are invalid if


rows, columns or treatments interact.

❑ Size limitations.
L.A.M. Model for Latin square:

Xij(k) = μ + ρi + Kj + T(k) + Eij(k)


Correction factor (C.F.) = (GT)²/t², where t = number of treatments

Total sum of squares (Total SS)
   Total SS = ∑∑Xijk² - C.F.

Sum of squares due to Row (SSR)
   SSR = (∑Rj²)/t - C.F.

Sum of squares due to Column (SSC)
   SSC = (∑Ck²)/t - C.F.

Sum of squares due to treatment (SSt)
   SSt = (∑Tk²)/t - C.F.
Table . ANOVA table for a 6 x 6 Latin square design

Source of variation   d.f.          SS    MS    F-cal    F-tab 0.05    F-tab 0.01
Rows                  r-1
Columns               r-1
Treatments            r-1
Error                 (r-1)(r-2)

Significant at the stated level of probability

Analysis Using SAS
 SAS_LSD.doc
Table . Body weight of animals fed on different rations (A-F)

        Column 1   Column 2   Column 3   Column 4   Column 5   Column 6   Rj   Mean
Row 1   F(3.19)    E(3.50)    D(3.27)    C(2.62)    B(2.82)    A(1.91)
Row 2   E(3.27)    C(2.41)    A(1.91)    D(2.91)    F(3.13)    B(2.95)
Row 3   B(3.04)    A(1.91)    F(3.25)    E(3.29)    D(2.91)    C(3.07)
Row 4   A(1.77)    B(3.04)    E(3.40)    F(2.99)    C(2.82)    D(3.50)
Row 5   D(3.50)    F(3.31)    C(3.09)    B(3.04)    A(1.91)    E(3.27)
Row 6   C(2.52)    D(2.86)    B(2.91)    A(1.77)    E(3.30)    F(2.98)
Ck
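The SAS statements for the Latin square are in the separate file SAS_LSD.doc; a minimal sketch (dataset and variable names are assumed) of how the ration data above could be analysed is:

data latin;
  input row col ration $ wt @@;
  cards;
1 1 F 3.19  1 2 E 3.50  1 3 D 3.27  1 4 C 2.62  1 5 B 2.82  1 6 A 1.91
2 1 E 3.27  2 2 C 2.41  2 3 A 1.91  2 4 D 2.91  2 5 F 3.13  2 6 B 2.95
3 1 B 3.04  3 2 A 1.91  3 3 F 3.25  3 4 E 3.29  3 5 D 2.91  3 6 C 3.07
4 1 A 1.77  4 2 B 3.04  4 3 E 3.40  4 4 F 2.99  4 5 C 2.82  4 6 D 3.50
5 1 D 3.50  5 2 F 3.31  5 3 C 3.09  5 4 B 3.04  5 5 A 1.91  5 6 E 3.27
6 1 C 2.52  6 2 D 2.86  6 3 B 2.91  6 4 A 1.77  6 5 E 3.30  6 6 F 2.98
;
proc anova data=latin;
  class row col ration;
  model wt = row col ration;
  means ration / lsd;
run;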
What design
when there are two or more
factors
TWO OR MORE FACTORS

• Factorial in CRD, RCBD, LS


• Split-plot
• Split-split-plot
• Split-block, not common
• Fractional Factorial
FACTORIAL EXPERIMENT

❑ A treatment arrangement,
not experimental design
❑ A method of determining the
treatment to be employed
• Factor - a specific type of treatment
• Level - a state of a factor.
FACTORIAL EXPERIMENT cont.

Purpose/Advantage:
• To increase the scope of inferences on:
i.e., to determine:
• Important factors
• Optimum levels of those factors
• Interactions between factors  joint importance
• Improved precision  not always true in all
cases
FACTORIAL EXPERIMENT cont.

Simple effect
- effect of 1 factor measured at a
specific level of all other factors.
Main effect
- the average of all simple effects of a single factor.

Interaction
- the failure of simple effects of one
factor to be the same at every level
of the other factor.
Tells the measure of simple effect of A at
the level of B.

Eg. 2 x 2 factorial (2^n)

Sum of diagonal rule:
        a1    a2
  b1
  b2
Various Types of Interaction

(Figure: response of b1 and b2 across levels of factor A, illustrating four cases)
1. No interaction
2. Change in magnitude but no change in ranking
3. Change in magnitude and direction, i.e. b1 responds in a negative way to increasing levels of A
4. Change in magnitude and rank, but not in direction
FACTORIAL EXPERIMENT cont.

❑ Conditions for no interaction

1. The effects of B are the same at all levels of A


2. The effects of A are the same at all levels of B
3. The effects of A and B are completely
additive
4. The residuals (unexplained effects) are all
zero
FACTORIAL EXPERIMENT cont.

Tests of significance are independent.

e.g. if the interaction AB is significant,
 A and B are non-independent;
but if the interaction AB is non-significant,
 the effect of A and the effect of B are independent, i.e. their effects are additive.
The significance of A or B alone does not show whether the interaction AB is significant or not.
Type of factors useful in factorial
experiments

1. Qualitative (specific)

2. Quantitative

3. Ranked qualitative

4. Sample qualitative
Factorial in RCBD

❑a complete factorial has all factors


cross-classified to each other.

❑Cross-classification  all levels of a factor correspond specifically to equivalent levels at every level of all the other factors.
Factorial in RCBD cont.

e.g. 3 X 2 factorial (3 varieties, 2 N rates)

N rate
Variety 100 200
A - -

B - -
C - -
Nested relationship
❑levels of one factor are specific for specific levels of the other factor.

e.g. 3 x 2 factorial (6 var., 2 N rates)

N rates
Variety 100 200
A -
B -
C -
D -
E -
F -

i.e. levels of variety are nested within N rates


If done in 2 locations to examine interaction
between varieties and location:

L1 L2
B1 B3
B2 B1
B3 B2

blocks are nested within locations


V1 V1
V2 V2
V3 V3
varieties are cross- classified with locations
Rootworm damage ratings & insecticide treatment (j)

Hybrid Rep Carbamate Organophosphate None


A 1 69 79 41
2 57 81 52
3 60 75 37
4 53 77 45
5 60 65 41
B 1 93 84 10
2 82 86 17
3 87 79 9
4 95 72 21
5 91 66 12
C 1 80 70 11
2 84 69 24
3 81 81 19
Correction factor (C.F.) = (GT)²/abr

Total sum of squares (Total SS) = ∑∑Xij² - C.F.

SSt = (∑Ti²)/r - C.F.

SS error = Total SS - SSt

MSt = SSt/DFt

MSE = SSE/DFe
Correction factor (C.F.) = (GT)²/abr

Total sum of squares (Total SS) = ∑∑Xij² - C.F.

SSt = (∑Ti²)/r - C.F.

SSB = (∑Bj²)/t - C.F.

SS error = Total SS - SSt - SSB

Analysis of variance for factorial in CRD

Sources of variation   DF        SS         MS      F-calc.   F-tab (0.05) (0.01)
Treatments             t-1       SSt        MSt**
Error                  t(r-1)    SSe        MSe
Total                  tr-1      Total SS

** = Significant at 0.01 level of probability

Analysis of variance for factorial in RCBD

Sources of variation   DF          SS         MS      F-calc.   F-tab (0.05) (0.01)
Block                  b-1         SSB        MSB**
Treatments             t-1         SSt        MSt**
Error                  (t-1)(r-1)  SSe        MSe
Total                  tr-1        Total SS


Analysis of variance for factorial in CRD

Source of variation   DF           SS       MS       F-calc.   F-tab (0.05) (0.01)
Treatment             t-1          SSt      MSt**
  Factor A            a-1          SSA      MSA
  Factor B            b-1          SSB      MSB
  A x B               (a-1)(b-1)   SSAxB    MSAxB
Error                 t(r-1)       SSe      MSe
Total

** = Significant at 0.01 level of probability

Analysis of variance for factorial in RCBD

Sources of variation  DF           SS        MS       F-calc.   F-tab (0.05) (0.01)
Block                 b-1          SSB       MSB**
Treatments            t-1          SSt       MSt**
  Factor A            a-1          SSA       MSA
  Factor B            b-1          SSB       MSB
  A x B               (a-1)(b-1)   SSAxB     MSAxB
Error                 (t-1)(r-1)   SSe       MSe
Total                 tr-1         Total SS

• Standard error of a mean

Variety:    SE(m) = ± √(MSE / rn)
Nitrogen:   SE(m) = ± √(MSE / rv)
V x N:      SE(m) = ± √(MSE / r)

where r = replications, v = number of varieties, n = number of nitrogen levels
Figure 1. Effect of insecticide on the control of corn rootworm
(Figure: corn rootworm score (0-10) for hybrids A, B and C under each insecticide: C = carbamate, O = organophosphate, N = none.)
One MSc student conducted an experiment to determine the effect of different levels of nitrogen fertilizer (N0, N1, N2) on three varieties (V1, V2 and V3). A 3 x 3 factorial experiment in RCBD with three replications was used, keeping all other cultural practices as recommended for the area. The grain yield data are given in Table 1 for analysis and appropriate interpretation.

Table 1. Yield data of three varieties tested at different levels of nitrogen fertilizer

Treatment    Block I    Block II    Block III
V1N0         3.85       2.61        3.14
V1N1         4.79       4.94        4.56
V1N2         4.58       4.45        4.88
V2N0         2.84       3.79        4.11
V2N1         4.96       5.13        4.15
V2N2         5.93       5.70        5.81
V3N0         4.19       3.75        3.74
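A minimal sketch (dataset and variable names are assumed) of how this factorial in RCBD could be set up in SAS; only the treatment combinations shown in the table above are entered, and the remaining rows of the full table would be added in the same way:

data fact;
  input variety $ nitrogen $ block yield @@;
  cards;
V1 N0 1 3.85  V1 N0 2 2.61  V1 N0 3 3.14
V1 N1 1 4.79  V1 N1 2 4.94  V1 N1 3 4.56
V1 N2 1 4.58  V1 N2 2 4.45  V1 N2 3 4.88
V2 N0 1 2.84  V2 N0 2 3.79  V2 N0 3 4.11
V2 N1 1 4.96  V2 N1 2 5.13  V2 N1 3 4.15
V2 N2 1 5.93  V2 N2 2 5.70  V2 N2 3 5.81
V3 N0 1 4.19  V3 N0 2 3.75  V3 N0 3 3.74
;
* PROC GLM is used here rather than PROC ANOVA because the listing above is incomplete (unbalanced);
proc glm data=fact;
  class block variety nitrogen;
  model yield = block variety nitrogen variety*nitrogen;
  means variety nitrogen / lsd;
run;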


SPLIT PLOT DESIGN

o Assigns main-plot factor to main-plots and


sub-plot factor to sub-plots

o Sub-plots are division of main-plots hence,

o Main-plot becomes a block for sub-plot


treatments
Suitability of Split-plot

 Two-factor experiment with many treatments such


that they cannot be accommodated by a complete
block designs

 Improves precision for measurement of effects


of sub-plots at the expense of main-plot
Table . Treatment randomization in split-plot in RCBD

Block 1 Block 2

Main-plot1 Main-plot2
SP1 SP2 SP3 SP3 SP1 SP2

Main-plot3 Main-plot1
SP2 SP3 SP1 SP2 SP1 SP3
Main-plot2 Main-plot3
SP1 SP3 SP2 SP1 SP2 SP3
Table . Yield (t/ha) of a wheat variety at different density and sowing time

Sowing date  Density    Replication                                       Sowing date
                        I       II      III     IV      V       VI        total (Si)
S1           D1         5.59    5.50    5.25    5.31    4.63    4.69
             D2         5.69    6.66    6.22    5.84    5.97    6.03
             D3         6.90    5.28    6.66    6.19    6.40    6.38
             Sum        18.18   17.44   18.13   17.34   17.0    17.1      105.19
S2           D1         5.84    5.63    6.14    5.94    6.13    6.08
             D2         6.90    6.86    7.03    6.94    6.22    6.88
             D3         7.09    6.72    6.25    6.36    6.34    6.81
             Sum        19.83   19.21   19.42   19.24   18.69   17.77     116.16
S3           D1         5.23    5.41    5.25    5.71    5.06    5.75
             D2         6.60    6.55    7.00    6.28    6.75    7.03
             D3         6.03    6.52    6.12    5.72    6.13    6.88
             Sum        17.86   18.48   18.37   17.71   17.94   19.66     110.02

Correction factor (C.F.) = (GT)²/abr = (331.37)²/(3 x 3 x 6) = 2033.45

Total sum of squares (Total SS)
   Total SS = ∑∑∑Xijk² - C.F.
            = [(5.59)² + (5.50)² + … + (6.88)²] - 2033.45
            = 21.00

Sum of squares due to replication (SSR)
   SSR = (∑Rk²)/ab - C.F.
       = [(55.87)² + (55.13)² + ... + (56.53)²]/(3 x 3) - 2033.45
       = 0.67
Sum of squares due to sowing date (SSS)

Here we need a two-way table between the main plot (sowing date) and replication.

Table 5.6. Grain yield (t/ha) of a wheat variety at different density and sowing time

S\rep   I       II      III     IV      V       VI      Si
S1      18.18   17.44   18.13   17.34   17.0    17.1    105.19
S2      19.83   19.21   19.42   19.24   18.69   17.77   116.16
S3      17.86   18.48   18.37   17.71   17.94   19.66   110.02
Rk      55.87   55.13   55.92   54.29   53.63   56.53   331.37

From this table SSS and the SS due to error (a) are obtained.

It is also necessary to make a two-way table between sowing time and density as follows.

S\D     D1      D2      D3      Si
S1      30.97   36.41   37.81   105.19
S2      35.76   40.83   39.57   116.16
S3      32.41   40.21   37.40   110.02
Dj      99.14   117.45  114.78  331.37

From this table SSD, SSSxD and the SS due to error (b) are obtained.
Table . ANOVA table for sowing time x plant density study in split-plot design

Source of variation   d.f.   SS      MS       F-cal     F-tab 0.05   F-tab 0.01
Replications          5      0.67    0.134    0.96      2.53         5.53
Sowing, S             2      3.35    1.675    18.41**   3.32         8.77
Error (a)             10     0.91    0.091    0.65      2.16         4.24
Densities, D          2      10.87   5.435    38.90**   3.32         8.77
S x D                 4      1.01    0.2525   1.81      2.69         6.12
Error (b)             30     4.19    0.1397

**, significant at 0.01 level of probability


Split plots
An experiment on maize experimental hybrid Hy-59
was conducted to study the effects of 3 population
densities and 3 fertilizer rates on silage yields. The
design was a RCB with 4 replications in split-plots.
The whole plots were population densities of 24000,
30000 and 36000 plants per hectare. Each main-plot
was divided into sub-plots fertilized with 120, 150
and 180 kg/ha of nitrogen.

Calculate the analysis of variance. Test sources of


variation due to main effects and the interactions
using the appropriate or approximate F-tests.
Interpret the results in a clear and concise
discussion. Include the standard errors for
population density means and the fertilizer means.
Density       N rate     Silage yield by replication
(plants/ha)   (kg/ha)    I       II      III     IV
24000         120        9.0     8.3     8.5     8.6
              150        9.5     8.0     8.8     7.8
              180        10.7    9.5     8.8     8.9
30000         120        9.2     9.8     11.2    7.9
              150        8.9     9.8     10.4    8.6
              180        9.2     10.6    9.1     8.7
36000         120        8.1     7.4     8.2     8.5
              150        9.2     8.1     7.8     8.9
              180        9.3     7.4     9.4     8.3
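A minimal sketch (dataset and variable names are assumed) of the split-plot analysis of this exercise in SAS, following the same model-statement pattern as the split-plot SAS example later in these notes (density = main plot, nitrogen rate = sub plot, rep = block):

data maize;
  input density nrate rep yield @@;
  cards;
24000 120 1  9.0  24000 120 2  8.3  24000 120 3  8.5  24000 120 4  8.6
24000 150 1  9.5  24000 150 2  8.0  24000 150 3  8.8  24000 150 4  7.8
24000 180 1 10.7  24000 180 2  9.5  24000 180 3  8.8  24000 180 4  8.9
30000 120 1  9.2  30000 120 2  9.8  30000 120 3 11.2  30000 120 4  7.9
30000 150 1  8.9  30000 150 2  9.8  30000 150 3 10.4  30000 150 4  8.6
30000 180 1  9.2  30000 180 2 10.6  30000 180 3  9.1  30000 180 4  8.7
36000 120 1  8.1  36000 120 2  7.4  36000 120 3  8.2  36000 120 4  8.5
36000 150 1  9.2  36000 150 2  8.1  36000 150 3  7.8  36000 150 4  8.9
36000 180 1  9.3  36000 180 2  7.4  36000 180 3  9.4  36000 180 4  8.3
;
proc anova data=maize;
  class rep density nrate;
  model yield = rep density rep*density nrate density*nrate;
  test h=density e=rep*density;          * main-plot factor tested against error (a);
  means density / lsd e=rep*density;
  means nrate density*nrate / lsd;
run;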
Regression and correlation

* Is the relationship linear (straight-line)?


* Does the value of y depend upon the value
of x or vice versa?
* How strong is the relationship, do the
points form a perfect line?
* Does the scatter represent nothing more
than a random collection of points since
there is no relationship?
* Can we predict the value of x if we know y,
and vice versa?
Correlation:
Very often we are interested in knowing what
association, if any, exists between two
measurements or variables. One way to
determine this is to calculate the correlation
coefficient, a commonly used index which
measures the degree of association between
two variables.

 r has a value between -1 and +1
 From this, r is interpreted as in the examples below:

(Figure: example scatter plots)
- Perfect positive correlation: r = 1.0
- No relationship: r = 0
- Negative correlation: r = -0.6
r does not depend on units: changing cm to mm does not affect the correlation, but it does affect the slope.
r does not detect cause and effect; it measures how the variables covary.
r quantifies the strength of linear relationships.
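A minimal sketch (the dataset and variable names are assumed, not part of these notes): PROC CORR prints the Pearson correlation coefficient r, with its p-value, for each pair of variables listed.

proc corr data=plants pearson;
  var height weight;
run;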
Regression

(Figure: scatter plot of yield against fertilizer level with a fitted regression line)
Regression

 To see the association between variables we need a formal model. The most general form of such a model is:

   Y = f(X1, X2, ..., Xn) + ε   .............(1)

For n = 1, (1) becomes a simple linear regression.
For n ≥ 2, (1) becomes a multiple linear regression.

The independent, predictor or explanatory variable is denoted by X; the dependent or response variable is denoted by Y.
Simple Linear Regression
The functional form of the linear relationship between a dependent variable Y and an independent variable X is represented by the equation:

   Y = a + bX

where a is the intercept and b is the slope of the line.

We test the hypothesis
   Ho: β = 0, i.e. X is not useful as a predictor of Y, OR
   there is no linear relationship between X and Y.

PROC REG;
  MODEL dep=indep;
RUN;
Residuals
• Residuals are the observed value minus the fitted value (the fitted line is the red line on the previous slide).
• Plots involving residuals can be very informative. They can:
  • help assess if assumptions are valid
  • help assess if other variables need to be taken into account
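A minimal sketch (using the Eheight/Cweight variable names from the regression example later in these notes) of how the residuals and fitted values can be saved and then plotted to check the assumptions:

proc reg data=class;
  model Cweight = Eheight;
  output out=diag p=pred r=resid;   * save fitted values and residuals;
run;
proc sgplot data=diag;
  scatter x=pred y=resid;           * residuals against fitted values;
  refline 0 / axis=y;
run;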
SAS statement
data crdanov;
input trtment $ regen;
cards;
M1 12
M1 15
M1 16
M2 10
M2 9
M2 11
M3 15
M3 18
M3 17
M4 9
M4 8
M4 7
;
proc anova;
class trtment;
model regen=trtment;
means trtment/lsd;
run;
Data rcbdanov;
Input genotype block DS;
Cards;
1 1 53
1 2 51
1 3 50
1 4 52
2 1 38
2 2 43
2 3 40
2 4 40
3 1 49
3 2 48
3 3 52
3 4 45
4 1 45
4 2 47
4 3 50
4 4 46
5 1 37
5 2 36
5 3 37
5 4 42
;
proc anova;
class genotype block;
model DS=genotype block;
means genotype/Duncan;
run;
OUTPUT
Output from proc anova of the above data is given in Table 4.40
SAS output for 10 genotypes replicated 4 times in RCBD
The SAS System Analysis of Variance
Class Levels Values
BLOCK 4 1 2 3 4
GENOTYPE 10 1 2 3 4 5 6 7 8 9 10
Number of observations in data set = 40
Dependent Variable: DS
Source D.F. Sum of Squares Mean Square F Value Pr > F
Model 12 1031.80000000 85.98333333 17.30 0.0001
Error 27 134.17500000 4.96944444
Total 39 1165.97500000
R-Square C.V. Root MSE DS Mean
0.884925 4.770947 2.22922508 46.72500000
Source D.F. SS Mean Square F Value Pr > F
BLOCK 3 19.07500000 6.35833333 1.28 0.3014
GENOTYPE 9 1012.72500000 112.52500000 22.64 0.0001
Alpha= 0.05 d.f.= 27 MSE= 4.969444, Critical Value of T= 2.05
Least Significant Difference= 3.2343
Means with the same letter are not significantly different.
T Grouping Mean N GENOTYPE
A 53.250 4 8
A
B A 51.500 4 1
B A
B A 51.500 4 6
B
B C 50.000 4 10
B C
B C D 48.500 4 3
C D
C D 47.000 4 4
D
D 46.250 4 7
E 41.000 4 9
E
E 40.250 4 2
data LSDanov;
input row column trtment score;
cards;
1 1 1 33.8
2 4 1 34.6
3 3 1 36.9
4 2 1 37.1
5 5 1 36.4
1 2 2 33.7
2 3 2 33.5
3 5 2 35.1
4 4 2 38.1
5 1 2 34.8
;
proc anova;
class row column trtment;
model score=row column trtment;
means trtment/lsd;
run;
Class Levels Values
ROW 5 1 2 3 4 5
COLUMN 5 1 2 3 4 5
TRTMENT 5 1 2 3 4 5
Number of observations in data set = 25
Analysis of Variance Procedure
Dependent Variable: EL
Source D.F. Sum of Squares Mean Square F Value Pr > F
Model 12 259.85920000 21.65493333 7.06 0.0010
Error 12 36.79920000 3.06660000
Total 24 296.65840000
R-Square C.V. Root MSE EL
0.875954 5.134194 1.75117104 34.10800000
Source D.F. Anova SS Mean Square F Value Pr > F
ROW 4 87.40240000 21.85060000 7.13 0.0035
Column 4 16.56240000 4.14060000 1.35 0.3079
TRTMENT 4 155.89440000 38.97360000 12.71 0.0003
T tests (LSD) for variable: EL
Alpha= 0.05 d.f.= 12 MSE= 3.0666
Critical Value of T= 2.18
Least Significant Difference= 2.4131
Means with the same letter are not significantly different.
T Grouping Mean N
TRTMENT
A 35.760 5 1
A
A 35.680 5 3
A
A 35.040 5 2
A
A 34.900 5 4
B 29.160 5 5
data splot;
input vy $ date $ blk yld;
cards;
BC10 d1 1 2.2
BC10 d1 2 2.0
BC10 d1 3 2.3
BC10 d2 1 3.2
BC10 d2 2 3.3
BC10 d2 3 3.4
BC10 d3 1 4.0
BC10 d3 2 4.1
BC10 d3 3 4.2
BC9 d1 1 1.8
BC9 d1 2 1.9
BC9 d1 3 2.2
BC9 d2 1 2.4
BC9 d2 2 2.4
BC9 d2 3 2.5
BC9 d3 1 3.1
BC9 d3 2 3.2
BC9 d3 3 3.3
;
proc anova;
class vy date blk;
model yld=vy blk vy*blk date vy*date;
test h=vy blk e=vy*blk;
means vy/lsd e=vy*blk;
means date vy*date/lsd;
run;
Class Levels Values
V 2 BC10 BC9
DATE 3 d1 d2 d3
BLOCK 3 1 2 3
Dependent Variable: YIELD(t/ha)
Source D.F. Sum of Squares Mean Square F Value Pr > F
Model 9 10.06500000 1.11833333 154.85 0.0001
Error 8 0.05777778 0.00722222
Total 17 10.12277778
R-Square C.V. Root MSE YIELD Mean
0.994292 2.970303 0.08498366 2.86111111
Source D.F. Anova SS Mean Square F Value Pr > F
V 1 1.93388889 1.93388889 267.77 0.0001
BLOCK 2 0.13777778 0.06888889 9.54 0.0076
V*BLK 2 0.00444444 0.00222222 0.31 0.7435
DATE 2 7.52111111 3.76055556 520.69 0.0001
V*DATE 2 0.46777778 0.23388889 32.38 0.0001

Tests of Hypotheses using the Anova MS for V*BLOCK as an error term


Source D.F. Anova SS Mean Square F Value Pr > F
V 1 1.93388889 1.93388889 870.25 0.0011
BLOCK 2 0.13777778 0.06888889 31.00 0.0313
Alpha= 0.05 d.f.= 2 MSE= 0.002222
Critical Value of T= 4.30
Least Significant Difference= 0.0956
data diet;
input diet $ breed $ wt;
cards;
a1 b1 30
a1 b1 31
a1 b1 29
a1 b2 35
a1 b2 36
a1 b2 34
a2 b1 37
a2 b1 38
a2 b1 36
a2 b2 40
a2 b2 41
a2 b2 39
;
proc anova;
class diet breed;
model wt=diet breed diet*breed;
means diet breed/lsd;
run;
Source DF Sum of Squares Mean Square F Value Pr > F
Model 11 38286332.15096790 3480575.65008800 42.17 0.0001
Error 60 4952506.92847634 82541.78214127

Total 71 43238839.07944430
R-Square C.V. Root MSE MOE Mean
0.885462 8.717321 287.30085649 3295.74722222

Source DF Anova SS Mean Square F Value Pr > F


AGE 2 440720.21861088 220360.10930544 2.67 0.0775
PASIZE 1 12.93482661 12.93482661 0.00 0.9901
AGE*PASIZ 2 410702.68475962 205351.34237981 2.49 0.0916
DENSITY 1 34665578.67555560 34665578.67555560 419.98 0.0001
AGE*DENSI 2 2439446.37694442 1219723.18847221 14.78 0.0001
PASIZE*DE 1 35587.67096257 35587.67096257 0.43 0.5139
A*P*D 2 294283.58930826 147141.79465413 1.78 0.1770
SAS statement for linear regression with single predictor

data class;
input hybrid $ Eheight Cweight @@;
datalines;
Hyb-1 69 112
. . .
Hyb-n 62 62
;
proc reg data=class;
model Cweight=Eheight;
run;
plot r.*p.;
Run;
SAS output for a response and two independent variables
MODEL: MODEL 1
DEPENDENT VARIABLE: Y
ANALYSIS OF VARIANCE
SOURCE D.F. SS MS F-VALUE PROB> F
MODEL 2 1423.83797 711.91898 113.126 0.0001
ERROR 10 62.93126 6.29313
TOTAL 12 1486.76923

PARAMETER ESTIMATES
VARIABLE D.F. ESTIMATE STD ERROR T FOR H0 PROB > |T|
INTERCEPT 1 65.099678 14.94457 4.356 0.0014
X1 1 1.0771 0.077 13.975 0.0001
X2 1 0.4254 0.07315 5.815 0.0002
MEAN COMPARISONS

t-test

 Tests difference between 2 means with


no specific experimental design

 Means can come from dependent (paired)


or independent (unpaired) experiment
Table 1: Effect of selfing on seedling height

Plant number    Seedling height (cm)
                S0       S1
1               20       12
2               18       10
3               22        8
4               28        5
5               17        4
6               20        2
7               19        6
8               25        3
9               27       11
10              30        7
Mean            22.60     6.80
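A minimal sketch (dataset and variable names are assumed) of a SAS run for these data, treating the S0 and S1 heights of each plant number as a pair; the paired differences give t of about 8.84, consistent with the output below.

data selfing;
  input plant s0 s1 @@;
  cards;
1 20 12  2 18 10  3 22 8  4 28 5  5 17 4
6 20 2  7 19 6  8 25 3  9 27 11  10 30 7
;
proc ttest data=selfing;
  paired s0*s1;
run;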
SAS output

T Prob>|T|

8.839 0.0001

Significant at 0.01 level of probability

Mean of S0 is higher than S1


MULTIPLE COMPARISON OF MEANS

Major statistics for mean comparison


❖ LSD
❖ DNMRT
❖ Tukey’s
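A minimal sketch (reusing the assumed herbicide CRD names from the earlier sketch): all three statistics can be requested on the MEANS statement of PROC ANOVA or PROC GLM.

proc anova data=crd;
  class herb;
  model biomass = herb;
  means herb / lsd duncan tukey;   * LSD, DNMRT and Tukey mean comparisons;
run;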
Least significant difference

   LSD = t(1-α/2, error d.f.) x √(2·MSE/r)

(Cochran and Cox, 1957)

Taking five treatments, for example, the process can be explained with T1, T2, T3, T4 and T5 having means 9, 10, 16, 17 and 22, respectively. With α = 0.05, error degrees of freedom 20, and MSE for the five treatments = 8.06, the LSD = 4.37. From these statistics one can see that treatment 5 had the highest mean, and the two pairs of means that do not differ significantly from each other are T1 and T2, and T3 and T4.

T1   T2   T3   T4   T5
9    10   16   17   22
Tukey’s Test

   q = (maximum mean - minimum mean) / √(MSE/r)

Tukey’s test declares two means significantly different from each other if the absolute value of their sample difference exceeds Tα, where

   Tα = qα(t, d.f.) x √(MSE/r)

For the example above, Tα = qα(5, 20) x √(8.06/5) = 5.37.

Hence, any pair of treatment means that differ in absolute value by more than 5.37 would imply that the corresponding pair of population means is significantly different.

T1   T2   T3   T4   T5
9    10   16   17   22
Duncan’s New Multiple Range Test (DNMRT)

From Duncan’s table of significant ranges, Appendix Table A.7 of Steel and Torrie (1980), rα(p, d.f.) values are obtained for p = 2, 3, …, t, where α is the significance level and d.f. is the error degrees of freedom. These ranges are converted into a set of t-1 least significant ranges (Rp) for p = 2, 3, …, t by calculating

   Rp = rα(p, d.f.) x SE(m)   for p = 2, 3, …, t.

DNMRT can be applied to the previous example. Recalling that MSE = 8.06, r = 5 and error degrees of freedom = 20, the treatment means can be ordered in ascending order as T1 = 9, T2 = 10, T3 = 16, T4 = 17 and T5 = 22.

   SE(m) = √(MSE/r) = √(8.06/5) = 1.27

The comparison yields significant differences between all pairs of means except T1 and T2, and T3 and T4. In this example, DNMRT and the LSD method produced the same result, leading to identical conclusions.

T1   T2   T3   T4   T5
9    10   16   17   22
Which comparison method is the best?

• No clear-cut answer to this question


• Carmer and Swanson (1973) have
conducted simulation studies and
concluded that

• LSD or DNMRT suffice, given present knowledge
Group comparison method

 The structure or nature of the treatments is such that they need to be meaningfully grouped and compared = contrast
 Examination of the functional relationship between treatment levels and treatment means = trend analysis
TREND ANALYSIS
Linear and Curvilinear Trends

❑ Linear and Curvilinear Trends


➭involving quantitative factors i.e. rate
and date type studies.

➭Objective : to study the rate of change of


a variable with increasing levels of a
specific factor.
 linear and curvilinear trends.
eg.
➭variable (y) : Yield
➭factor (x) : N fertilization rate
1. Linear:     y = a + bx
   (e.g. grain yield vs. N fertilization rate)

2. Quadratic:  y = a + b1x + b2x²
   (e.g. nutrient release vs. time)

3. Cubic:      y = a + b1x + b2x² + b3x³
   (e.g. specific leaf weight vs. time; 3 turning points - max, min, change of the curve)
Trends Cont.

Considerations (Questions):
• Is there a response?
• Is the response predictable or
explainable?
• What is the nature of the response?
• What model best describe the response?
• Is the response significant?
• practical view point
• statistical view point
GENERAL PRINCIPLES
1. For t treatments, SSt may be partitioned into t-1
portions specified by increasing higher order contrasts.
______________________________________________________
t df terms
2 1 L invalid not done
3 2 L, D
4 3 L, Q, D
5 4 L, Q, C, D
. .
. .

2. Quantitative treatment factor levels need not be equally spaced, but calculations are easier if they are equally spaced.
3. Tests of significance are made exactly as described for contrasts.
4. Interpretation is never complete without a graph.
                     Treatment totals
Angle                0         60        120       180
Total                970.34    978.78    1087.05   1031.49

Contrast coefficients:
L                    -3        -1         1         3
Q                     1        -1        -1         1
Dev. from Q          -1         3        -3         1

Contrast   Q         K     SS
L          231.72    20    134.24
Q          -4.36     4     0.02
D          -84.86    20    18.00
ANOVA

Source     df    MS       F
Reps       19    3.15     2.22**
Angles
  L        1     134.04   94.39**
  Q        1     0.02     0.01
  D        1     18.00    12.68**
Error      57    1.42
Total      79
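A minimal sketch (dataset and variable names are assumed) of how the linear, quadratic and deviation contrasts above could be requested with CONTRAST statements in PROC GLM; the coefficients correspond to the four equally spaced angle levels (0, 60, 120, 180):

proc glm data=angles;
  class rep angle;
  model y = rep angle;
  contrast 'linear'     angle -3 -1  1  3;
  contrast 'quadratic'  angle  1 -1 -1  1;
  contrast 'deviation'  angle -1  3 -3  1;
run;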
Conclusion
1. Linear response is significant.
2. There is also a significant curvilinear
response, a response that is not quadratic,
but maybe cubic?

linear

cubic
WHEN ASSUMPTIONS OF ANOVA ARE VIOLATED

o Normality
o Independence
o Homogeneity
o Additivity
o Randomness

 Data transformation
DATA TRANSFORMATION
LOG TRANSFORMATION

 Suitable for count data where standard


deviation is proportional to mean

Example
- Number of insects per plot
- Number of diseased plants per
plot
SQUARE-ROOT TRANSFORMATION

 Suitable for percentage data

 Variance is proportional to mean


Examples :
- Number of seeds germinated in a plot
- Number of insects caught in traps
ARC SINE OR ANGULAR TRANSFORMATION

 Suitable for count data expressed as percentages

 Percentage ranging from 0 to 100%
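A minimal sketch (dataset and variable names are assumed) of the three transformations applied in a single SAS data step before ANOVA:

data transformed;
  set raw;
  logcount = log(count + 1);          * log transformation; +1 is a common device when zero counts occur;
  sqrtct   = sqrt(count + 0.5);       * square-root transformation; +0.5 is a common adjustment for small counts;
  angle    = arsin(sqrt(pct/100));    * arc sine (angular) transformation of a percentage;
run;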


CONCLUSION

• Choice of design
• Precision
• Treatment
• Inherent variation
• Knowledge

❑ Less complicated with good precision


❑ Computer know-how
❑ Consult statistician during planning
