Target Population
The concept of the target population is defined as a population to which we would like to apply the results of an inquiry.

EXAMPLES
We want to conduct an inquiry on foreigners living in city X. The population is represented by the list of foreigners registered in the city.
The target population is defined by the collection of foreigners living in city X, including those not on the list (unannounced, clandestine, etc.).

FURTHER READING
Population

Test of Independence
A test of independence is a hypothesis test where the objective is to determine if two random variables are independent or not.

HISTORY
See Kendall rank correlation coefficient, Spearman rank correlation coefficient, and chi-square test of independence.

MATHEMATICAL ASPECTS
Let (X1, X2, …, Xn) and (Y1, Y2, …, Yn) be two samples of dimension n. The objective is to test the following two hypotheses:
Null hypothesis H0: The two variables X and Y are independent.
Alternative hypothesis H1: The two variables X and Y are not independent.
The test of independence consists in comparing the empirical distribution with the theoretical distribution by calculating an indicator.
The test of independence applied to two continuous variables is based on the ranks of the observations. Such is the case if the tests are based on the Spearman rank correlation coefficient or on the Kendall rank correlation coefficient.
In the case of two categorical variables, the most widely used test is the chi-square test of independence.

DOMAINS AND LIMITATIONS
To perform a test of independence, each couple (Xi, Yi), i = 1, 2, …, n, must come from the same bivariate population.

EXAMPLES
See chi-square test of independence.
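As an illustration of the chi-square test of independence mentioned above, here is a minimal sketch in pure Python; the contingency table is invented for illustration only:

```python
# Chi-square test of independence on a 2x3 contingency table.
# The observed counts below are hypothetical illustration data.
observed = [
    [20, 30, 25],   # category X = x1
    [30, 20, 25],   # category X = x2
]

rows = len(observed)
cols = len(observed[0])
total = sum(sum(r) for r in observed)
row_tot = [sum(r) for r in observed]
col_tot = [sum(observed[i][j] for i in range(rows)) for j in range(cols)]

# Expected count under H0 (independence): row total * column total / grand total.
chi2 = 0.0
for i in range(rows):
    for j in range(cols):
        expected = row_tot[i] * col_tot[j] / total
        chi2 += (observed[i][j] - expected) ** 2 / expected

df = (rows - 1) * (cols - 1)   # degrees of freedom
print(round(chi2, 4), df)      # compare chi2 against the chi-square table with df degrees of freedom
```

The statistic is then compared with the chi-square table value for df degrees of freedom at the chosen significance level.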
536 Time Series
3. Can we trust the available data?
   The problem of breaks in the series may arise, resulting from an insufficiency of data or a change in the quality of the variables.

EXAMPLES
A time series is generally represented as follows:

(Figure: plot of an example time series, not reproduced.)

We distinguish four components:
• Secular trend, slightly increasing in the present case
• Seasonal variations, readily apparent
• Cyclical fluctuations, in the form of cycles of an approximate amplitude of 27 units of time
• Irregular variations, generally weak enough, except at t = 38, which represents a very abrupt fall that would not be rational

REFERENCES
Dufour, J.M.: Histoire de l'analyse des séries chronologiques. Université de Montréal, Canada (2006)
Funkhauser, H.G.: A note on a 10th century graph. Osiris 1, 260–262 (1936)
Kendall, M.G.: Time Series. Griffin, London (1973)
Macauley, F.R.: The smoothing of time series. National Bureau of Economic Research, 121–136 (1930)
Persons, W.M.: An index of general business conditions. Rev. Econom. Stat. 1, 111–205 (1919)

Trace
For a square matrix A of order n, we define the trace of A as the sum of the terms situated on the diagonal. Thus the trace of a matrix is a scalar quantity.

MATHEMATICAL ASPECTS
Let A be a square matrix of order n, A = (aij), where i, j = 1, 2, …, n. We define the trace of A by:

tr(A) = Σ_{i=1}^{n} aii.
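The definition tr(A) = Σ aii translates directly into code; the matrix below is a hypothetical example:

```python
# Trace of a square matrix: the sum of the diagonal terms a_ii (a scalar).
def trace(a):
    n = len(a)
    assert all(len(row) == n for row in a), "trace is defined for square matrices only"
    return sum(a[i][i] for i in range(n))

A = [[2, 7, 1],
     [0, 5, 3],
     [4, 6, 9]]   # hypothetical 3x3 matrix

print(trace(A))   # 2 + 5 + 9 = 16
```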
Transformation
A transformation is a change in one or many variables in a statistical study. We transform variables, for example, by replacing them with their logarithms (logarithmic transformation).

FURTHER READING
Dependent variable
Independent variable
Logarithmic transformation
Normal distribution
Regression analysis
Variance
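A logarithmic transformation as described in this entry might look like the following sketch; the sample values are invented:

```python
import math

# Replace each (positive) observation by its natural logarithm.
values = [1.0, 10.0, 100.0, 1000.0]          # hypothetical right-skewed data
log_values = [math.log(v) for v in values]   # logarithmic transformation

print([round(x, 3) for x in log_values])
```

Such a transformation is often used to stabilize the variance or to bring a skewed variable closer to a normal distribution.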
MATHEMATICAL ASPECTS
If A = (aij), i = 1, 2, …, m and j = 1, 2, …, n, is a matrix of order (m × n), then the transpose of A is the matrix A′ of order (n × m) given by:

A′ = (aji), with j = 1, 2, …, n and i = 1, 2, …, m.

Transposition is a reversible and reciprocal operation; thus, taking the transpose of the transpose of a matrix, we find the initial matrix.

DOMAINS AND LIMITATIONS
We use the transpose of a matrix, or more precisely the transpose of a vector, when calculating the scalar product.

EXAMPLES
Let A be the following matrix of order (3 × 2):

A = ⎡ 1 2 ⎤
    ⎢ 0 3 ⎥
    ⎣ 2 5 ⎦

The transpose of A is then the matrix of order (2 × 3):

A′ = ⎡ 1 0 2 ⎤
     ⎣ 2 3 5 ⎦

Treatment
In an experimental design, a treatment is a particular combination of levels of the various factors.

EXAMPLES
Experiments are often carried out to compare two or more treatments, for example, two different fertilizers on a particular type of plant or several types of drugs to treat a certain illness. Another example consists in measuring the coagulation time of blood samples from 16 animals subjected to different regimes A, B, C, and D.
In these cases, the treatments are, respectively, the fertilizers, the drugs, and the regimes.
On the other hand, in experiments that test a particular fertilizer, for example nitrogen, on a wheat harvest, we can consider different quantities of the same fertilizer as different treatments. Here we have one factor (nitrogen) at different levels (quantities), for example 30 kg, 50 kg, 100 kg, and 200 kg. Each of these levels corresponds to a treatment.
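Returning to the transpose example: both the transpose of a (3 × 2) matrix and the reversibility of transposition can be checked in a few lines of Python:

```python
# Transpose of an (m x n) matrix: entry (i, j) becomes entry (j, i).
def transpose(a):
    return [[a[i][j] for i in range(len(a))] for j in range(len(a[0]))]

A = [[1, 2],
     [0, 3],
     [2, 5]]          # the (3 x 2) matrix from the transpose example

At = transpose(A)     # its (2 x 3) transpose
print(At)
print(transpose(At) == A)   # transposition is reversible
```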
αi and βj, and εijk is the experimental error of observation Yijk.
This model is subjected to the basic assumptions associated with the analysis of variance if one supposes that the errors εijk are independent random variables following a normal distribution N(0, σ²).
Three hypotheses can then be tested:
1. H0: α1 = α2 = … = αa
   H1: at least one αi is different from αj, i ≠ j.
2. H0: β1 = β2 = … = βb
   H1: at least one βi is different from βj, i ≠ j.
3. H0: (αβ)11 = (αβ)12 = … = (αβ)1b = (αβ)21 = … = (αβ)ab
   H1: at least one of the interactions is different from the others.
The Fisher distribution is used to test the first hypothesis. This requires the creation of a ratio whose numerator is an estimate of the variance of factor A and whose denominator is an estimate of the variance within treatments, known also as the error or residual variance.
This ratio, denoted by F, follows a Fisher distribution with a − 1 and ab(c − 1) degrees of freedom.
The null hypothesis H0: α1 = α2 = … = αa will be rejected at the significance level α if the ratio F is greater than or equal to the value in the Fisher table, meaning if

F ≥ F_{a−1, ab(c−1), α}.

To test the second hypothesis, we create a ratio whose numerator is an estimate of the variance of factor B and whose denominator is an estimate of the variance within treatments.
This ratio, denoted by F, follows a Fisher distribution with b − 1 and ab(c − 1) degrees of freedom.
The null hypothesis H0: β1 = β2 = … = βb will be rejected if the F ratio is greater than or equal to the value in the Fisher table, meaning if

F ≥ F_{b−1, ab(c−1), α}.

To test the third hypothesis, we create a ratio whose numerator is an estimate of the variance of the interaction between factors A and B and whose denominator is an estimate of the variance within treatments.
This ratio, denoted by F, follows a Fisher distribution with (a − 1)(b − 1) and ab(c − 1) degrees of freedom.
The null hypothesis H0: (αβ)11 = (αβ)12 = … = (αβ)1b = (αβ)21 = … = (αβ)ab will be rejected if the F ratio is greater than or equal to the value in the Fisher table, meaning if

F ≥ F_{(a−1)(b−1), ab(c−1), α}.

Variance of Factor A
For the variance of factor A, the sum of squares for factor A (SSA) must be calculated; it is obtained as follows:

SSA = b · c · Σ_{i=1}^{a} (Ȳi.. − Ȳ...)²,

where Ȳi.. is the mean of all the observations of level i of factor A and Ȳ... is the general mean of all the observations.
Variance of Interaction AB
For the variance of the interaction AB, the sum of squares for the interaction AB (SSAB) must be calculated; it is obtained as follows:

SSAB = c · Σ_{i=1}^{a} Σ_{j=1}^{b} (Ȳij. − Ȳi.. − Ȳ.j. + Ȳ...)²,

where Ȳij. is the mean of all the observations of level ij of the interaction AB, Ȳi.. is the mean of all the observations of level i of factor A, Ȳ.j. is the mean of all the observations of level j of factor B, and Ȳ... is the general mean of all the observations.
The number of degrees of freedom associated with this sum is equal to (a − 1)(b − 1).
The variance of the interaction AB is then equal to:

s²AB = SSAB / ((a − 1)(b − 1)).

Total Sum of Squares
The total sum of squares (SST) is equal to:

SST = Σ_{i=1}^{a} Σ_{j=1}^{b} Σ_{k=1}^{c} Y²ijk.

The number of degrees of freedom associated with this sum is equal to N, meaning the total number of observations.
The sum of squares for the mean is equal to:

SSM = N · Ȳ...².

The number of degrees of freedom associated with this sum is equal to 1.
The total sum of squares (SST) can be expressed with all the other sums of squares in the following way:

SST = SSM + SSA + SSB + SSAB + SSE.

Fisher Tests
We can now calculate the different F ratios.
To test the null hypothesis

H0: α1 = α2 = … = αa,

the first F ratio is formed; its numerator is the estimate of the variance of factor A and its denominator is the estimate of the variance within treatments:

F = s²A / s²E.

If F is greater than or equal to the value in the Fisher table for a − 1 and ab(c − 1) degrees of freedom, the null hypothesis is rejected.
To test the null hypothesis

H0: β1 = β2 = … = βb,

the second F ratio is formed; its numerator is the estimate of the variance of factor B and its denominator is the estimate of the variance within treatments:

F = s²B / s²E.

If F is greater than or equal to the value in the Fisher table for b − 1 and ab(c − 1) degrees of freedom, the null hypothesis is rejected.
To test the null hypothesis on the interaction, the third F ratio is formed; its numerator is the estimate of the variance of the interaction AB and its denominator is the estimate of the variance within treatments:

F = s²AB / s²E.

If F is greater than or equal to the value in the Fisher table for (a − 1)(b − 1) and ab(c − 1) degrees of freedom, the null hypothesis is rejected and it is concluded that H1 is true.

Table of Analysis of Variance
All the information required to calculate the different F ratios can be summarized in a table of analysis of variance:

Source of variation   Degrees of     Sum of     Mean of     F
                      freedom        squares    squares
Mean                  1              SSM
Factor A              a − 1          SSA        s²A         s²A/s²E
Factor B              b − 1          SSB        s²B         s²B/s²E
Interaction AB        (a−1)(b−1)     SSAB       s²AB        s²AB/s²E
Within treatments     ab(c−1)        SSE        s²E
Total                 abc            SST
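The sums of squares and F ratios above can be assembled into a small sketch. This is a pure-Python illustration for a balanced design with a = b = c = 2; the observations are invented, and SSB is formed symmetrically to the SSA formula given earlier:

```python
# Two-way ANOVA sums of squares for a balanced design.
# Y[i][j] holds the c replicates at level i of factor A and level j of factor B.
Y = [
    [[8.0, 10.0], [14.0, 16.0]],   # factor A, level 1 (hypothetical data)
    [[9.0, 11.0], [19.0, 21.0]],   # factor A, level 2
]
a, b, c = len(Y), len(Y[0]), len(Y[0][0])
N = a * b * c

def mean(xs):
    return sum(xs) / len(xs)

Yij = [[mean(Y[i][j]) for j in range(b)] for i in range(a)]            # cell means
Yi = [mean([x for j in range(b) for x in Y[i][j]]) for i in range(a)]  # level means of A
Yj = [mean([x for i in range(a) for x in Y[i][j]]) for j in range(b)]  # level means of B
Yg = mean([x for i in range(a) for j in range(b) for x in Y[i][j]])    # general mean

SSA = b * c * sum((Yi[i] - Yg) ** 2 for i in range(a))
SSB = a * c * sum((Yj[j] - Yg) ** 2 for j in range(b))                 # symmetric to SSA
SSAB = c * sum((Yij[i][j] - Yi[i] - Yj[j] + Yg) ** 2
               for i in range(a) for j in range(b))
SSE = sum((x - Yij[i][j]) ** 2
          for i in range(a) for j in range(b) for x in Y[i][j])
SSM = N * Yg ** 2
SST = sum(x ** 2 for i in range(a) for j in range(b) for x in Y[i][j])

# Decomposition: SST = SSM + SSA + SSB + SSAB + SSE
assert abs(SST - (SSM + SSA + SSB + SSAB + SSE)) < 1e-9

s2A, s2B = SSA / (a - 1), SSB / (b - 1)
s2AB = SSAB / ((a - 1) * (b - 1))
s2E = SSE / (a * b * (c - 1))
print(SSA, SSB, SSAB, SSE)             # sums of squares
print(s2A / s2E, s2B / s2E, s2AB / s2E)  # the three F ratios
```

Each printed F ratio would then be compared with the Fisher table value for its degrees of freedom.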
= 2 · [1.5² + (−1.5)² + … + 1²]
= 83.

The number of degrees of freedom associated with this sum is equal to (4 − 1)(5 − 1) = 12.
The variance of the interaction AB is then equal to:

s²AB = SSAB / ((a − 1)(b − 1)) = 83/12 = 6.92.

Variance Within Treatments (Error)
The sum of squares within treatments is equal to:

SSE = Σ_{i=1}^{4} Σ_{j=1}^{5} Σ_{k=1}^{2} (Yijk − Ȳij.)²
    = (33 − 34.5)² + (36 − 34.5)² + (31 − 31)² + … + (33 − 33)²
    = 58.

The number of degrees of freedom associated with this sum is equal to 4 · 5(2 − 1) = 20. The variance within treatments is then equal to:

s²E = SSE / (ab(c − 1)) = 58/20 = 2.9.

The number of degrees of freedom associated with the total sum of squares is equal to the number N of observations, meaning 40.
The sum of squares for the mean is equal to:

SSM = N · Ȳ...² = 40 · 34² = 46240.

The number of degrees of freedom associated with this sum is equal to 1.

Fisher Tests
To test the null hypothesis

H0: α1 = α2 = α3 = α4,

we form the first F ratio whose numerator is the estimate of the variance of factor A and whose denominator is the estimate of the variance within treatments:

F = s²A / s²E = 45/2.9 = 15.5172.
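A quick check of the arithmetic in this example:

```python
# Verify the ratios computed in the worked example above.
s2AB = 83 / 12            # variance of the interaction AB
s2E = 58 / 20             # variance within treatments
F = 45 / s2E              # first F ratio, with s2A = 45

print(round(s2AB, 2), s2E, round(F, 4))
```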
null hypothesis when it is true, and α is the probability of rejecting the null hypothesis H0 when it is true:

α = P(reject H0 | H0 is true).

The probability of type I error α is equal to the significance level of the hypothesis test.

HISTORY
In 1928, Neyman, J. and Pearson, E.S. were the first authors to recognize that a rational choice in hypothesis testing must take into account not only the hypothesis that one wants to verify but also the alternative hypothesis. They introduced the type I error and the type II error.

DOMAINS AND LIMITATIONS
The type I error is one of the two errors the statistician is confronted with in hypothesis testing: it is the type of error that can occur in decision making if the null hypothesis is true.
If the null hypothesis is wrong, another type of error arises, called the type II error, meaning accepting the null hypothesis H0 when it is false.
The different types of errors can be represented by the following table:

Situation    Decision
             Accept H0    Reject H0
H0 true      1 − α        α
H0 false     β            1 − β

where α is the probability of the type I error and β is the probability of the type II error.

Type II Error

HISTORY
See type I error.

EXAMPLES
When we leave home in the morning, we wonder what the weather will be like. If we think it is going to rain, we take an umbrella. If we think it is going to be sunny, we do not take anything for bad weather.
We are therefore confronted with the following hypotheses:
Null hypothesis H0: It is going to rain.
Alternative hypothesis H1: It is going to be sunny.
Suppose we accept the rain hypothesis and take an umbrella. If it really does rain, we have made the right decision, but if it is sunny, we have made an error: the error of accepting a false hypothesis.
In the opposite case, if we reject the rain hypothesis, we have made a good decision if it is sunny, but we will have made an error if it rains: the error of rejecting a true hypothesis.
We can represent these different types of errors in the following table:

Situation                 Decision
                          Accept H0              Reject H0
                          (I take an umbrella)   (I do not take an umbrella)
H0 true (it rains)        Good decision          Type I error
H0 false (it is sunny)    Type II error          Good decision

The probability α of rejecting a true hypothesis is called the level of significance. The probability β of accepting a false hypothesis is the probability of the type II error.

FURTHER READING
Hypothesis testing
Type I error
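The relation α = P(reject H0 | H0 is true) can be illustrated by simulation. The sketch below uses an invented setting: data generated under a true H0 (standard normal, mean 0) and a two-sided z-test at level α = 0.05; the observed rejection rate should be close to α:

```python
import random
from statistics import NormalDist

# Estimate the type I error rate: generate samples under H0 (mean 0, sigma 1),
# run a two-sided z-test at significance level alpha, and count false rejections.
random.seed(42)
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value, about 1.96

n, trials = 30, 2000
rejections = 0
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    z = (sum(sample) / n) / (1.0 / n ** 0.5)   # sample mean divided by its standard error
    if abs(z) >= z_crit:
        rejections += 1                        # a type I error, since H0 is true here

rate = rejections / trials
print(rate)   # close to alpha = 0.05
```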