Statistics
Chapter 1: Introduction to Sampling
Degree in Mathematics
Isabel Pereira,
isabel.pereira@ua.pt
Aveiro University,
2023/24
Contents of Chapter 1
1. Fundamentals
Probability/Statistical Inference:
In probability theory, one starts from a certain model and
calculates the probability of certain events occurring.
In statistical inference, one starts from a set of observations $(x_1, x_2, \ldots, x_n)$
and tries to infer something about the model.
STATISTICAL INFERENCE:
Goal: characterize a population, and eventually define decision rules about it,
knowing only part of it.
Very important
Samples must be representative of the population from which they are taken.
Sampling Processes
Stratified sampling
Group (cluster) sampling
$\mathcal{P} = \{ f(x;\theta) : \theta \in \Theta \}$
Goal:
Know the probability distribution $\Leftrightarrow$ specify the value of the parameter $\theta$
STATISTICS
Statistic $\to$ a function of the sample that does not depend on unknown
quantities
Notation: $T(X) = T(X_1, \ldots, X_n)$
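Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Sample median
Sample mode
Sample center (midrange): $W = \frac{\max(X_1,\ldots,X_n) + \min(X_1,\ldots,X_n)}{2}$
Sample median:
$M = \begin{cases} \dfrac{X_{(n/2)} + X_{(n/2+1)}}{2} & \text{if } n \text{ even} \\ X_{((n+1)/2)} & \text{if } n \text{ odd} \end{cases}$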
Illustration
If $n = 24$, sample median: $M = \frac{X_{(12)} + X_{(13)}}{2}$
If $n = 25$, sample median: $M = X_{(13)}$
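The median and sample-center definitions above can be sketched in Python as follows. This is only an illustrative sketch: the function names and the data (the integers 1 to 24 and 1 to 25) are assumptions chosen so that the order statistics are easy to read off.

```python
def sample_median(xs):
    """Median computed from the order statistics X_(1) <= ... <= X_(n)."""
    ordered = sorted(xs)
    n = len(ordered)
    if n == 0:
        raise ValueError("empty sample")
    if n % 2 == 0:
        # n even: average of X_(n/2) and X_(n/2 + 1) (1-based indices)
        return (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    # n odd: X_((n + 1)/2) (1-based index)
    return ordered[(n + 1) // 2 - 1]


def sample_midrange(xs):
    """Sample center W = (max + min) / 2."""
    return (max(xs) + min(xs)) / 2


sample_24 = list(range(1, 25))  # hypothetical data, so X_(i) = i
sample_25 = list(range(1, 26))  # hypothetical data, so X_(i) = i

print(sample_median(sample_24))  # (X_(12) + X_(13)) / 2 = 12.5
print(sample_median(sample_25))  # X_(13) = 13
```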
Isabel Pereira University of Aveiro, 2023/24 11
2. Statistics
Given the random sample $(X_1, \ldots, X_n)$, the following statistics are some common
examples (associated with dispersion):
Sample variance (uncorrected): $s^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$
Sample standard deviation (corrected): $s_w = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}$
Sample range: Range = Maximum - Minimum
MAD (median absolute deviation): $\mathrm{MAD} = \mathrm{med}\,|X_i - \bar{X}|$
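A minimal sketch of the dispersion statistics listed above, assuming a small hypothetical data set. The function name `dispersion_summary` is illustrative. The MAD takes deviations around the sample mean, as written in this section; taking them around the sample median is another common convention.

```python
import math
import statistics


def dispersion_summary(xs):
    """Dispersion statistics of a sample, following the definitions above."""
    n = len(xs)
    mean = sum(xs) / n
    sum_sq = sum((x - mean) ** 2 for x in xs)
    return {
        "var_uncorrected": sum_sq / n,             # s^2, divisor n
        "sd_corrected": math.sqrt(sum_sq / (n - 1)),  # s_w, divisor n - 1
        "range": max(xs) - min(xs),
        # median of absolute deviations around the sample mean (assumption:
        # centre taken as in the formula of this section)
        "mad": statistics.median([abs(x - mean) for x in xs]),
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
summary = dispersion_summary(data)
print(summary)
```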
POPULATION MOMENTS:
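Ordinary moment of order $k$: $E[X^k]$, $k = 1, 2, \ldots$
Centered moment of order $k$: $\mu_k = E[(X-\mu)^k]$, $k = 1, 2, \ldots$
SAMPLE MOMENTS:
Ordinary sample moment of order $k$:
$\frac{1}{n}\sum_{i=1}^{n} X_i^k$, $k = 1, 2, \ldots$
Centered sample moment of order $k$:
$\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^k$, $k = 1, 2, \ldots$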
Particular cases
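- Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
- Sample variance: $s^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$
Each population moment has a sample counterpart:
$E[X^k] \;\longleftrightarrow\; \frac{1}{n}\sum_{i=1}^{n} X_i^k$
$E[(X-\mu)^k] \;\longleftrightarrow\; \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^k$
Useful identity:
$\sum_{i=1}^{n}(X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$,
which is equivalent to
$s^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2$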
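The identity between the definition of the uncorrected sample variance and its "mean of squares minus squared mean" form can be checked exactly with rational arithmetic. The data below are hypothetical, and the variable names are illustrative only.

```python
from fractions import Fraction

# Hypothetical sample, stored as exact rationals to avoid rounding error
xs = [Fraction(v) for v in (1, 3, 6, 10)]
n = len(xs)
mean = sum(xs) / n

# Definition: s^2 = (1/n) * sum (X_i - mean)^2
s2_definition = sum((x - mean) ** 2 for x in xs) / n

# Shortcut: s^2 = (1/n) * sum X_i^2 - mean^2
s2_shortcut = sum(x ** 2 for x in xs) / n - mean ** 2

print(s2_definition, s2_shortcut)  # both equal 23/2
```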
2. Sample Moments
Theorem
Let $(X_1, \ldots, X_n)$ be a random sample. If the necessary population moments, $E[X^{2k}]$, exist, then:
$E\left[\frac{1}{n}\sum_{i=1}^{n} X_i^k\right] = E[X^k]$
$V\left[\frac{1}{n}\sum_{i=1}^{n} X_i^k\right] = \frac{E[X^{2k}] - (E[X^k])^2}{n}$
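Proof
$E\left[\frac{1}{n}\sum_{i=1}^{n} X_i^k\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i^k] = \frac{1}{n}\, n\, E[X^k] = E[X^k]$
$V\left[\frac{1}{n}\sum_{i=1}^{n} X_i^k\right] = \frac{1}{n^2}\sum_{i=1}^{n} V[X_i^k] = \frac{1}{n^2}\, n\, V[X^k] = \frac{E[X^{2k}] - (E[X^k])^2}{n}$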
Corollary
If $(X_1, \ldots, X_n)$ is a random sample from a population $X$, where $E[X] = \mu$ and $V[X] = \sigma^2$, then
$E[\bar{X}] = \mu$, $V[\bar{X}] = \frac{\sigma^2}{n}$
Theorem
Let $(X_1, \ldots, X_n)$ be a random sample from a population $X$ such that $E[X] = \mu$ and $V[X] = \sigma^2$, and with finite centered moment of 4th order, $\mu_4$. Then
$E[s^2] = \frac{n-1}{n}\sigma^2$
$V[s^2] = \frac{(n-1)^2}{n^3}\left(\mu_4 - \frac{n-3}{n-1}\sigma^4\right)$
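Proof (1st part)
Adding and subtracting $\mu$, it follows that:
$\sum_{i=1}^{n}(X_i - \bar{X})^2 = \sum_{i=1}^{n}(X_i - \mu)^2 - n(\bar{X} - \mu)^2$
i.e.,
$s^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2 - (\bar{X} - \mu)^2$,
then,
$E[s^2] = \frac{1}{n}\sum_{i=1}^{n} E[(X_i - \mu)^2] - E[(\bar{X} - \mu)^2] = \frac{1}{n}\, n\, V[X] - V[\bar{X}] = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n}\sigma^2$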
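The results $E[\bar{X}] = \mu$, $V[\bar{X}] = \sigma^2/n$ and $E[s^2] = \frac{n-1}{n}\sigma^2$ can be verified exactly on a toy example by enumerating every possible i.i.d. sample. The population below (three equally likely values) and the sample size are assumptions chosen only to keep the enumeration small.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: X takes the values 0, 1, 5 with equal probability
population = [Fraction(v) for v in (0, 1, 5)]
n = 3  # sample size

mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

# All 3^n samples are equally likely under i.i.d. sampling
samples = list(product(population, repeat=n))
means = [sum(s) / n for s in samples]
variances = [sum((x - m) ** 2 for x in s) / n for s, m in zip(samples, means)]

e_mean = sum(means) / len(samples)                        # E[X-bar]
v_mean = sum((m - e_mean) ** 2 for m in means) / len(samples)  # V[X-bar]
e_s2 = sum(variances) / len(samples)                      # E[s^2]

print(e_mean == mu, v_mean == sigma2 / n, e_s2 == Fraction(n - 1, n) * sigma2)
```

Because exact fractions are used, the three comparisons hold as equalities rather than as numerical approximations.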
Theorem
Let $(X_1, \ldots, X_n)$ be a random sample from a population $X$ such that $E[X] = \mu$ and $V[X] = \sigma^2$, with finite centered moment of 4th order, $\mu_4$, and with
corrected sample variance $s_w^2$. Then
$E[s_w^2] = \sigma^2$, $V[s_w^2] = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\sigma^4\right)$
NOTE:
Although these results about the 1st and 2nd sample moments are known,
in general their distributions are unknown, even when the distribution of
the population is known.
Taking into account the Central Limit Theorem, and under its conditions, it is
possible to obtain the approximate distribution of some sample moments. In
particular,
$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0,1)$
For the ordinary (non-centered) sample moments, $M_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$,
it follows that
$\frac{M_k - E[X^k]}{\sqrt{V(M_k)}} \xrightarrow{d} N(0,1)$
Order statistics:
Let $(X_1, \ldots, X_n)$ be a random sample from the population $X$; arranging the components in
ascending order, we have
$X_{(1)} \le X_{(2)} \le \ldots \le X_{(n)}$
The r.v.'s $X_{(i)}$, $i = 1, \ldots, n$, are called the order statistics of the
sample.
Definition
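The sample quantile of order $p$, $0 < p < 1$, is defined as
$Q_p = \begin{cases} X_{(np)} & np \in \mathbb{N} \\ X_{([np]+1)} & np \notin \mathbb{N} \end{cases}$
where $[np]$ denotes the integer part of $np$.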
Example 3
If $n = 24$, the 1st quartile is $Q_{1/4} = X_{(6)}$, since $np = 24 \times 1/4 = 6$
If $n = 25$, the 1st quartile is $Q_{1/4} = X_{(7)}$, since $[np] + 1 = [25 \times 1/4] + 1 = 7$
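The quantile rule of the definition can be sketched as below. The function names are illustrative, and $p$ is passed as an exact `Fraction` (an assumption of this sketch) so that the test "$np$ is an integer" is not affected by floating-point rounding.

```python
import math
from fractions import Fraction


def quantile_rank(n, p):
    """1-based index of the order statistic that gives Q_p."""
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    np_value = n * Fraction(p)
    if np_value.denominator == 1:
        return int(np_value)           # np is an integer: use X_(np)
    return math.floor(np_value) + 1    # otherwise: use X_([np] + 1)


def sample_quantile(xs, p):
    """Sample quantile of order p, read from the sorted sample."""
    ordered = sorted(xs)
    return ordered[quantile_rank(len(ordered), p) - 1]


print(quantile_rank(24, Fraction(1, 4)))  # 6
print(quantile_rank(25, Fraction(1, 4)))  # 7
```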
$(0,1,0,1,1,1,0,0,0,1, \ldots, 0,1,0,0)$
What does $\sum x_i$ represent?
3. A Bernoulli population
Remember:
$f_n = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
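Remember:
If $X_i \sim \mathrm{Ber}(p) \equiv B(1, p)$ are i.i.d. r.v.'s, then $S_n = \sum_{i=1}^{n} X_i \sim B(n, p)$, i.e.,
$P(\bar{X} = x) = P\left(\frac{S_n}{n} = x\right) = P(S_n = nx) = \binom{n}{nx}\, p^{nx}(1-p)^{n(1-x)}$, $x = 0, \frac{1}{n}, \frac{2}{n}, \ldots, 1$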
Mean and variance?
$E(\bar{X}) = p$ and $V(\bar{X}) = \frac{p(1-p)}{n}$
$\frac{\sqrt{n}\,(f_n - p)}{\sqrt{p(1-p)}} \xrightarrow{d} N(0,1)$
$f_n \xrightarrow{P} p$
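As a sketch, the exact distribution of the relative frequency $f_n = \bar{X}$ for a Bernoulli population can be tabulated and its mean and variance compared with $p$ and $p(1-p)/n$. The values $n = 10$ and $p = 3/10$, and the function name, are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def pmf_sample_proportion(n, p):
    """Return {x: P(X-bar = x)} for x = 0, 1/n, ..., 1 with X_i i.i.d. Ber(p)."""
    p = Fraction(p)
    return {
        Fraction(k, n): comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(n + 1)
    }


n, p = 10, Fraction(3, 10)  # hypothetical values
pmf = pmf_sample_proportion(n, p)

mean = sum(x * prob for x, prob in pmf.items())
var = sum((x - mean) ** 2 * prob for x, prob in pmf.items())

print(mean, var)  # 3/10 and 21/1000, i.e. p and p(1 - p)/n
```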
Population $X_1$ $\to$ $f_{n_1} = \bar{X}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} X_{1,i}$
Population $X_2$ $\to$ $f_{n_2} = \bar{X}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} X_{2,i}$
$E(\bar{X}) = \mu$; $V(\bar{X}) = \frac{\sigma^2}{n}$
For independent Bernoulli populations $X_1$ and $X_2$, with $E(X_i) = p_i$ and
$V(X_i) = p_i(1-p_i)$, we have for the respective relative frequencies
of two large samples taken from each of them:
$\frac{f_{n_1} - f_{n_2} - (p_1 - p_2)}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}} \xrightarrow{L} N(0,1)$
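The standardized difference of two relative frequencies can be computed as in the sketch below. The function name and the numerical inputs are hypothetical; in practice $p_1$ and $p_2$ are unknown, and the values used here are placeholders for illustration.

```python
import math


def two_proportion_z(f1, n1, f2, n2, p1, p2):
    """Standardized difference (f1 - f2 - (p1 - p2)) / standard error."""
    std_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return ((f1 - f2) - (p1 - p2)) / std_error


# Hypothetical example: observed frequencies 0.55 and 0.50,
# sample sizes 400 and 500, both populations with p = 0.5
z = two_proportion_z(0.55, 400, 0.50, 500, 0.5, 0.5)
print(z)
```

For large $n_1$ and $n_2$, the value `z` can be compared with quantiles of the $N(0,1)$ distribution.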
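Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
If the variables are independent and identically distributed $N(\mu, \sigma^2)$: $\sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2)$
Then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
Hence
$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$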
4. A Normal population: distribution of the sample variance
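Sample variance: $s^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$
Corrected sample variance: $s_w^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$
Remember that: $s_w^2 = \frac{n}{n-1}s^2$
Recall also that $\sum_{i=1}^{n}(X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$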
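$Y \sim \chi^2_{(k)}$
Probability density function of the chi-square distribution:
$f(y) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, e^{-y/2}\, y^{k/2-1}$, $y > 0$, $k \in \mathbb{N}$, where $\Gamma(n) = \int_0^{\infty} x^{n-1} e^{-x}\,dx$
Moment generating function:
$\varphi_Y(t) = \left(\frac{1}{1-2t}\right)^{k/2}$, $t < 1/2$
$Y \sim \chi^2_{(k)} \Leftrightarrow Y \sim \mathrm{Ga}\left(\frac{k}{2}, \frac{1}{2}\right)$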
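Proof (case $k = 1$, with $Y = X^2$ and $X \sim N(0,1)$):
$f_Y(y) = 2 f_X(\sqrt{y})\,\frac{1}{2\sqrt{y}} = \frac{1}{\sqrt{y}}\,\frac{1}{\sqrt{2\pi}}\, e^{-y/2} = \frac{(1/2)^{1/2}}{\Gamma(1/2)}\, y^{\frac{1}{2}-1}\, e^{-\frac{1}{2}y}$, $y > 0$,
which is the density of $\mathrm{Ga}\left(\frac{1}{2}, \frac{1}{2}\right)$, i.e., of $\chi^2_{(1)}$.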
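As a numerical sanity check, the chi-square density with $k = 1$ can be compared with the density of $Y = X^2$, $X \sim N(0,1)$, obtained by change of variable. The function names and the evaluation points are assumptions of this sketch.

```python
import math


def chi2_pdf(y, k):
    """Chi-square density with k degrees of freedom, for y > 0."""
    return math.exp(-y / 2) * y ** (k / 2 - 1) / (2 ** (k / 2) * math.gamma(k / 2))


def std_normal_pdf(x):
    """Density of N(0, 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def square_of_normal_pdf(y):
    """Density of Y = X^2: f_Y(y) = 2 f_X(sqrt(y)) / (2 sqrt(y)), y > 0."""
    return std_normal_pdf(math.sqrt(y)) / math.sqrt(y)


for y in (0.1, 1.0, 2.5, 7.0):  # arbitrary evaluation points
    print(y, chi2_pdf(y, 1), square_of_normal_pdf(y))
```

The two columns agree because $\Gamma(1/2) = \sqrt{\pi}$.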
Additivity property
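$Y_i \sim \chi^2_{(k_i)}$, independent, $i = 1, \ldots, n$ $\Rightarrow$ $\sum_{i=1}^{n} Y_i \sim \chi^2_{\left(\sum_{i=1}^{n} k_i\right)}$
Proof:
$Y_i \sim \chi^2_{(k_i)} \Rightarrow \varphi_{Y_i}(t) = \left(\frac{1}{1-2t}\right)^{k_i/2}$, $t < 1/2$
$\varphi_{\sum_{i=1}^{n} Y_i}(t) = E\left[e^{t\sum_{i=1}^{n} Y_i}\right] = E[e^{tY_1}] \cdots E[e^{tY_n}] = \left(\frac{1}{1-2t}\right)^{\frac{1}{2}\sum_{i=1}^{n} k_i}$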
Theorem
If the sum of two independent r.v.'s has a chi-square distribution and one of
them has a chi-square distribution, then the other also follows a chi-square
distribution, i.e.,
$Y \sim \chi^2_{(m)}$, $Z + Y \sim \chi^2_{(m+k)}$, $Y$ and $Z$ independent $\Rightarrow$ $Z \sim \chi^2_{(k)}$
Theorem
The sum of squares of independent $N(0,1)$ r.v.'s follows a chi-square
distribution whose number of degrees of freedom (d.f.) is equal to the number
of terms, i.e.,
$Y_i \sim N(0,1)$, $i = 1, \ldots, n$, i.i.d. $\Rightarrow$ $\sum_{i=1}^{n} Y_i^2 \sim \chi^2_{(n)}$
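Lemma
If $(Y, Z)$ has a moment generating function (m.g.f.) such that $\varphi_{Y,Z}(s, t) = \varphi_Y(s)\,H(t)$, where $\varphi_Y(s)$ is the
m.g.f. of $Y$, then
- $Y$ and $Z$ are independent
- $H(t)$ is the m.g.f. of $Z$.
Proof:
Setting $s = 0$ in the joint m.g.f., we have:
$\varphi_{Y,Z}(0, t) = \varphi_Y(0)\,H(t)$
$\varphi_Z(t) = E[e^{tZ}] = E[e^{0Y + tZ}] = \varphi_{Y,Z}(0, t) = \varphi_Y(0)\,H(t) = H(t)$
Hence the joint m.g.f. is the product of the two marginal m.g.f.'s, so $Y$ and $Z$ are independent.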
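Proof:
Strategy: obtain the joint m.g.f. of $(\bar{X}, Y)$, with $Y = (X_1 - \bar{X}, \ldots, X_n - \bar{X})$ and $X_i \sim N(\mu, \sigma^2)$ i.i.d., and factorize it in order to apply
the previous lemma.
$\varphi_{\bar{X},Y}(t, t_1, \ldots, t_n) = E\left[e^{t\bar{X} + \sum_{i=1}^{n} t_i (X_i - \bar{X})}\right] = E\left[e^{\sum_{i=1}^{n} a_i X_i}\right]$, with $a_i = \frac{t}{n} + (t_i - \bar{t})$ and $\bar{t} = \frac{1}{n}\sum_{i=1}^{n} t_i$
As
$\varphi_{X_i}(a_i) = e^{\mu a_i + \frac{\sigma^2}{2} a_i^2}$, $i = 1, \ldots, n$
$\varphi_{\bar{X},Y}(t, t_1, \ldots, t_n) = \prod_{i=1}^{n} E\left[e^{a_i X_i}\right] = e^{\mu\sum_{i=1}^{n} a_i + \frac{\sigma^2}{2}\sum_{i=1}^{n} a_i^2}$
But,
$\sum_{i=1}^{n} a_i = \sum_{i=1}^{n}\left(\frac{t}{n} + t_i - \bar{t}\right) = t$
$\sum_{i=1}^{n} a_i^2 = \sum_{i=1}^{n}\left(\frac{t}{n} + t_i - \bar{t}\right)^2 = \frac{t^2}{n} + \sum_{i=1}^{n}(t_i - \bar{t})^2$
It follows that
$\varphi_{\bar{X},Y}(t, t_1, \ldots, t_n) = e^{\mu t + \frac{\sigma^2 t^2}{2n}} \cdot e^{\frac{\sigma^2}{2}\sum_{i=1}^{n}(t_i - \bar{t})^2} = \varphi_{\bar{X}}(t)\, H(t_1, \ldots, t_n)$
$\Leftrightarrow$
$H(t_1, \ldots, t_n) = \varphi_Y(t_1, \ldots, t_n)$, and $\bar{X}$ and $Y$ are independent r.v.'s, q.e.d.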
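The factorization in this proof rests on two algebraic identities for the coefficients $a_i = \frac{t}{n} + (t_i - \bar{t})$: their sum equals $t$, and their sum of squares equals $\frac{t^2}{n} + \sum (t_i - \bar{t})^2$. A minimal exact check, with arbitrary hypothetical values of $t$ and $t_1, \ldots, t_n$:

```python
from fractions import Fraction

# Arbitrary (hypothetical) arguments of the joint m.g.f.
t = Fraction(3, 2)
ts = [Fraction(v) for v in (1, -2, 4, 0, 7)]

n = len(ts)
t_bar = sum(ts) / n

# Coefficient of X_i in t * X-bar + sum t_i (X_i - X-bar)
a = [t / n + (ti - t_bar) for ti in ts]

sum_a = sum(a)
sum_a2 = sum(ai ** 2 for ai in a)
expected_sum_a2 = t ** 2 / n + sum((ti - t_bar) ** 2 for ti in ts)

print(sum_a == t, sum_a2 == expected_sum_a2)
```

The cross term vanishes because $\sum (t_i - \bar{t}) = 0$, which is why the exponent separates into a part depending only on $t$ and a part depending only on $t_1, \ldots, t_n$.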
Corollary
In random samples from normal populations, the sample mean and the
sample variance (corrected or not) are independent.
Theorem
Let $(X_1, \ldots, X_n)$ be a random sample from a population $X \sim N(\mu, \sigma^2)$. Then
$\frac{n s^2}{\sigma^2} \sim \chi^2_{(n-1)}$ $\Leftrightarrow$ $\frac{(n-1)\, s_w^2}{\sigma^2} \sim \chi^2_{(n-1)}$
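Proof:
$\sum_{i=1}^{n}(X_i - \mu)^2 = \sum_{i=1}^{n}\left[(X_i - \bar{X}) + (\bar{X} - \mu)\right]^2$
$= \sum_{i=1}^{n}\left[(X_i - \bar{X})^2 + (\bar{X} - \mu)^2 + 2(\bar{X} - \mu)(X_i - \bar{X})\right]$
$= \sum_{i=1}^{n}(X_i - \bar{X})^2 + n(\bar{X} - \mu)^2$, since $\sum_{i=1}^{n}(X_i - \bar{X}) = 0$.
Dividing by $\sigma^2$:
$\underbrace{\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2}_{\sim\, \chi^2_{(n)}} = \underbrace{\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{\sigma}\right)^2}_{=\, n s^2/\sigma^2} + \underbrace{\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2}_{\sim\, \chi^2_{(1)}}$
Then
$\frac{n s^2}{\sigma^2} = \underbrace{\sum_{i=1}^{n}\left(\frac{X_i - \mu}{\sigma}\right)^2}_{\sim\, \chi^2_{(n)}} - \underbrace{\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2}_{\sim\, \chi^2_{(1)}}$
By a previous theorem (the sample mean and the sample variance being independent),
$\frac{n s^2}{\sigma^2} \sim \chi^2_{(n-1)}$
Hence
$\frac{n s^2}{\sigma^2} \sim \chi^2_{(n-1)}$; $\frac{(n-1)\, s_w^2}{\sigma^2} \sim \chi^2_{(n-1)}$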
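Mean
Since $E[\chi^2_{(n-1)}] = n-1$,
$E\left[\frac{n}{\sigma^2}s^2\right] = n-1 \Rightarrow E[s^2] = \frac{n-1}{n}\sigma^2$ and $E[s_w^2] = \sigma^2$
Variance
Similarly, since $V[\chi^2_{(n-1)}] = 2(n-1)$,
$V\left[\frac{n}{\sigma^2}s^2\right] = 2(n-1) \Rightarrow \frac{n^2}{\sigma^4}V[s^2] = 2(n-1) \Leftrightarrow V[s^2] = \frac{2(n-1)}{n^2}\sigma^4$
$V[s_w^2] = V\left[\frac{n}{n-1}s^2\right] = \frac{n^2}{(n-1)^2}\cdot\frac{2(n-1)}{n^2}\sigma^4 = \frac{2}{n-1}\sigma^4$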
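The normal-population result $V[s_w^2] = \frac{2}{n-1}\sigma^4$ agrees with the general formula $V[s_w^2] = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\sigma^4\right)$ once the fourth centered moment of a normal distribution, $\mu_4 = 3\sigma^4$, is substituted. The sketch below checks this exactly for several sample sizes; the function names and the value of $\sigma^2$ are assumptions.

```python
from fractions import Fraction


def var_s2w_general(n, mu4, sigma2):
    """General formula: (1/n) * (mu4 - (n - 3)/(n - 1) * sigma^4)."""
    return Fraction(1, n) * (mu4 - Fraction(n - 3, n - 1) * sigma2 ** 2)


def var_s2w_normal(n, sigma2):
    """Normal population: 2 * sigma^4 / (n - 1)."""
    return Fraction(2, n - 1) * sigma2 ** 2


sigma2 = Fraction(5, 2)   # hypothetical population variance
mu4 = 3 * sigma2 ** 2     # fourth centered moment of N(mu, sigma^2)

agree = all(
    var_s2w_general(n, mu4, sigma2) == var_s2w_normal(n, sigma2)
    for n in range(2, 20)
)
print(agree)
```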
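$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$
is useful when the variance of the population is known.
When it is unknown, consider
$T = \frac{\bar{X} - \mu}{s_w/\sqrt{n}}$
$\Rightarrow$ it is necessary to establish the distribution of $T$.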
Theorem
Let $(X_1, \ldots, X_n)$ be a random sample from a population $X \sim N(\mu, \sigma^2)$ and $s_w^2$ its
corrected sample variance. Then
$\frac{\bar{X} - \mu}{s_w/\sqrt{n}} \sim T(n-1)$
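Proof
$\frac{\bar{X} - \mu}{s_w/\sqrt{n}} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{s_w/\sigma} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{s_w^2/\sigma^2}} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)\, s_w^2/\sigma^2}{n-1}}} = \frac{N(0,1)}{\sqrt{\chi^2_{(n-1)}/(n-1)}} \sim T(n-1)$,
since the numerator and the denominator are independent.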
Example 4
Given a random sample of size 9 from a normal population, what is
the probability that the difference between the sample mean and the
population mean is less than twice the corrected standard deviation?
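$P(|\bar{X} - \mu| < 2 s_w) = P\left(\left|\frac{\bar{X} - \mu}{s_w/\sqrt{n}}\right| < \frac{2 s_w}{s_w/\sqrt{n}}\right) = P\left(\left|\frac{\bar{X} - \mu}{s_w/\sqrt{n}}\right| < 2\sqrt{n}\right)$
$= P\left(-2\sqrt{n} < \frac{\bar{X} - \mu}{s_w/\sqrt{n}} < 2\sqrt{n}\right) = F_{T(n-1)}(2\sqrt{n}) - F_{T(n-1)}(-2\sqrt{n})$
$= F_{T(8)}(6) - F_{T(8)}(-6) = 2F_{T(8)}(6) - 1 = 0.9997$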
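The probability in Example 4 can be reproduced numerically without statistical tables. The sketch below integrates the Student $T(8)$ density over $(-6, 6)$ with Simpson's rule; the helper names `t_pdf` and `simpson` and the number of integration steps are assumptions of this illustration, not part of the course material.

```python
import math


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


n = 9                      # sample size of Example 4
bound = 2 * math.sqrt(n)   # 2 * sqrt(n) = 6
prob = simpson(lambda x: t_pdf(x, n - 1), -bound, bound)
print(round(prob, 4))      # about 0.9997
```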
Distribution of the sample mean (known population variance): $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$
Distribution of the sample mean (unknown population variance): $\frac{\bar{X} - \mu}{s_w/\sqrt{n}} \sim T(n-1)$
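Population $X_1 \sim N(\mu_1, \sigma_1^2)$ $\to$ $\bar{X}_1 \sim N\left(\mu_1, \frac{\sigma_1^2}{n_1}\right)$
Population $X_2 \sim N(\mu_2, \sigma_2^2)$ $\to$ $\bar{X}_2 \sim N\left(\mu_2, \frac{\sigma_2^2}{n_2}\right)$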
4. Two Normal populations: distribution of the difference between sample means
Theorem
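Let $(X_{1,1}, X_{1,2}, \ldots, X_{1,n_1})$ and $(X_{2,1}, X_{2,2}, \ldots, X_{2,n_2})$ be random samples from two
independent normal populations with equal variance $\sigma^2$ ($\sigma_1^2 = \sigma_2^2$).
Then
$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\frac{(n_1-1)\, s_{w,1}^2 + (n_2-1)\, s_{w,2}^2}{n_1 + n_2 - 2}}} \sim T(n_1 + n_2 - 2)$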
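A minimal sketch of the statistic in this theorem, using the standard-library `statistics` module (whose `variance` function uses the $n-1$ divisor, i.e., it returns the corrected sample variance). The function name, the default `mean_diff=0.0` and the two small samples are hypothetical.

```python
import math
import statistics


def pooled_t_statistic(sample1, sample2, mean_diff=0.0):
    """Return (t, degrees of freedom) for two samples with equal variances.

    mean_diff is the assumed value of mu_1 - mu_2.
    """
    n1, n2 = len(sample1), len(sample2)
    s2w1 = statistics.variance(sample1)  # corrected variance, divisor n1 - 1
    s2w2 = statistics.variance(sample2)  # corrected variance, divisor n2 - 1
    pooled = ((n1 - 1) * s2w1 + (n2 - 1) * s2w2) / (n1 + n2 - 2)
    std_error = math.sqrt((1 / n1 + 1 / n2) * pooled)
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    return (diff - mean_diff) / std_error, n1 + n2 - 2


t_value, df = pooled_t_statistic([1, 2, 3], [2, 3, 4, 5, 6])  # hypothetical data
print(t_value, df)
```

Under the assumptions of the theorem, `t_value` is an observation from the $T(n_1 + n_2 - 2)$ distribution, here with 6 degrees of freedom.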
Theorem
Let $X$ and $Y$ be two independent r.v.'s, each of which has a chi-square
distribution, with $k$ and $l$ degrees of freedom, respectively.
Then
$\frac{X/k}{Y/l} \sim F(k, l)$
Moreover, $X \sim F(k, l) \Rightarrow \frac{1}{X} \sim F(l, k)$
The Fisher $F$ and Student $T$ distributions are related by
$X \sim T(k) \Rightarrow X^2 \sim F(1, k)$
In particular, if $\sigma_1^2 = \sigma_2^2$:
$\frac{s_{w,1}^2}{s_{w,2}^2} \sim F(n_1 - 1, n_2 - 1)$
Summary table:
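Consider two independent populations $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$.
Comparison of means (equal variances, $\sigma_1^2 = \sigma_2^2$):
$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\frac{(n_1-1)\, s_{w,1}^2 + (n_2-1)\, s_{w,2}^2}{n_1 + n_2 - 2}}} \sim T(n_1 + n_2 - 2)$
Comparison of variances:
$\frac{s_{w,1}^2}{s_{w,2}^2}\cdot\frac{\sigma_2^2}{\sigma_1^2} \sim F(n_1 - 1, n_2 - 1)$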