
Reminder of statistics

Lecturer: Dmitri A. Moltchanov


E-mail: moltchan@cs.tut.fi

http://www.cs.tut.fi/kurssit/ELT-53606/
OUTLINE:
• Why do we need statistics?
• Description of statistical data;
• Statistical CDFs and histograms;
• Estimating parameters of the sample;
• Point estimators of parameters;
• Interval estimators of parameters;
• Criteria of fitting accuracy;
• Criteria of homogeneity of samples;
• Statistics of stochastic processes;
• Tests for autocorrelations and white noise;
• Tests for stationarity.


1. Why do we need statistics?


The aim of stochastic modeling:
• develop probabilistic models of random events;
• some examples:
– estimate parameters of the traffic model;
– estimate values obtained via simulations.

Definition: statistics develops methods for the registration, description and analysis of experimental data.

Examples of tasks we are interested in:


• determine the probability of a random event based on its frequency;
• determine an approximating distribution law for empirical data;
• estimate the parameters of a distribution law based on empirical data;
• test statistical hypotheses.


1.1. Tasks of statistics


Statistics addresses the following tasks:
• description of events;
• analysis and forecasting;
• conclusions.
1. Description of events:
• how to represent the data in a convenient form;
• in particular, what kind of statistical tables and graphs are best suited for the data.
2. Analysis and forecasting:
• provide an estimate of a certain parameter (mean, variance, st.dev., etc.);
• given a certain number of experiments, determine how accurate these estimates are;
• predict the next value (of stochastic process) with a certain accuracy.
3. Conclusions:
• hypothesis testing.


1.2. Example: why do we need to know statistics?


Example: how to estimate the variance:
• we measured a certain RV N times to get measurements X_i, i = 1, 2, . . . , N.
From probability theory we know:

σ²[X] = E[(X − E[X])²].  (1)

It seems we have to do the following:

σ̄²[X] = (1/N) Σ_{i=1}^{N} (X_i − m)²,  (2)

where
• σ̄²[X] is the estimate of the variance;
• m is the estimate of the mean, given by m = (1/N) Σ_{i=1}^{N} X_i.

Important: such an estimate of the variance is biased, meaning that we make a systematic error.


1.3. Example: applicability in our course


To provide a quantitative analysis of a system we have to:
• provide an adequate traffic model:
– determine the important statistical parameters of the input traffic;
– match these parameters using an appropriate traffic model.
• provide a model of the service process:
– determine the statistical parameters of the service process;
– match these parameters using an appropriate model.
• analyze the system: what load may a given system carry?
– analytic approach;
– simulation study.
• solve the inverse task: what system parameters are needed to carry a given load?
– analytic approach;
– simulation study.

Questions that statistics helps to answer:

Modeling arrival and service processes:


• what statistical parameters must be taken into account?
• what statistical parameters are statistically significant?
• how many observations do we need to consider?
• how to estimate parameters when only a few observations are available?
• how to guess a model when no statistics are available at all?

Analyzing simulation results:


• what are the point estimates of the parameters?
• what are the confidence limits for parameters?
• how to best organize collection of data?


2. Description of statistical data


After the experiments we are given a list containing:
• the number of the experiment k, k = 1, 2, . . . , N;
• the value x_k, k = 1, 2, . . . , N, of the RV X under investigation;
• this list is called an initial statistical set, or sample.

Note the following:

• if the number of experiments is high, the initial statistical set is not convenient to deal with;
• other representations are required.

Another representation is the ordered statistical set:

• sort all values obtained in the statistical experiments from the smallest up to the highest;
• re-enumerate the values accordingly.


initial statistical set        ordered statistical set


k xk i xi

1 82 1 75
2 80 2 78
3 80 3 78
4 78 4 80
5 78 5 80
6 84 6 81
7 82 7 82
8 75 8 82
9 85 9 84
10 81 10 85

Figure 1: Initial and ordered statistical sets.


2.1. Statistical probability distribution function


Distributional information can be provided in terms of the statistical CDF:

F*_X(x) = Pr*{X ≤ x}.  (3)

• an alternative convention defines the probability distribution function as Pr{X < x};

• F*_X(x) is often called the empirical CDF.
Properties of the empirical CDF:
• it is a step function;
• it is monotone and non-decreasing;
• it is equal to zero for all x less than the smallest observed value of RV X;
• it is equal to one for all x greater than the largest observed value of RV X.
Note that the statistical CDF has a step of height l_i/N at each observed value x_i:
• l_i is the number of observations equal to x_i;
• N is the overall number of experiments.
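A minimal sketch of how such an empirical CDF can be computed in Python/NumPy (the helper name empirical_cdf is ours, not from the lecture):

```python
import numpy as np

def empirical_cdf(x):
    """Return sorted observations and F*_X evaluated at them.

    Each step has height l_i/N; tied values stack into larger steps."""
    xs = np.sort(np.asarray(x, dtype=float))
    f = np.arange(1, len(xs) + 1) / len(xs)   # cumulative relative counts
    return xs, f

# The sample from Figure 1 above
xs, f = empirical_cdf([82, 80, 80, 78, 78, 84, 82, 75, 85, 81])
```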


[Step plot of F*_X(x) = Pr{X ≤ x} versus x, with jumps at the observed values x1, . . . , x5.]

Figure 2: Example of the statistical distribution function.

• the black dots denote the values of the function at the points of discontinuity;

• if we define the statistical CDF as Pr{X < x}, this graph will be different.

Note the following:
• assume RV X is continuous;
• as the number of experiments increases, the empirical CDF tends to the CDF of RV X.

[Step plot of F*_X(x) = Pr{X ≤ x} approaching the smooth CDF of X.]

Figure 3: Trend of the statistical CDF to the CDF of a continuous RV X.


[Four plots of statistical CDFs F_i(∆) versus i∆: SPDF built from 100 random numbers with Normal(15,30), and SPDF built from 1000 random numbers with Normal(15,30); the larger sample yields a visibly smoother CDF.]


2.2. Grouped statistical set and histogram of relative frequencies


Instead of the statistical CDF, one may consider:
• a grouped statistical set and/or
• a histogram of relative frequencies.
Note: the latter is often called just a histogram.
Assume that we have measured RV X K times:
• we got outcomes x_k, k = 1, 2, . . . , K.
Let us define the following:

x_max = max_{∀k} x_k,  x_min = min_{∀k} x_k,  (4)

• i.e., the largest and smallest observed values.

We can identify the following property:
• all values of the measured RV X are between x_min and x_max, including both.

To define a grouped statistical set we have to do the following:
• consider the part of the 0X axis between x_min and x_max, including both;
• divide this part of the axis into N non-overlapping intervals.
Often these intervals are made equal to each other (this is not, however, required):

L = (x_max − x_min)/N.  (5)

Definition: a grouped statistical set is a table where:
• the 1st row contains the intervals;
• the 2nd row contains the corresponding frequencies.

x1–x2   x2–x3   x3–x4   x4–x5   . . .   xi–xi+1   . . .   xN−1–xN
p*_1    p*_2    p*_3    p*_4    . . .   p*_i      . . .   p*_N

Figure 4: Grouped statistical set.

The frequency p*_i of an event is the ratio of:
• the number of experiments l_i in which X ∈ [x_i, x_{i+1});
• to the overall number of experiments N:

p*_i = l_i/N.  (6)

– these frequencies are sometimes called relative frequencies.
For any grouped statistical set the following condition holds:
• all frequencies in the grouped statistical set must sum up to 1:

Σ_{i=1}^{N} p*_i = 1.  (7)

Note the following:

• if RV X is continuous and some observation falls exactly on the boundary between two intervals (e.g., due to rounding):
– one may assign it to either neighboring interval;
• if RV X is discrete we MAY get an analogue of the PF (we should count each distinct value).

How to get an analogue of the PF if RV X is discrete:
• assume we have x1, x2, . . . , xN observations of RV X;
• determine max_i x_i and min_i x_i;
• determine the number of histogram bins as l = max_i x_i;
• determine the vector of histogram bin edges as:

min_i x_i + j (max_i x_i − min_i x_i)/max_i x_i,  j = 0, 1, . . . , l.  (8)

– the length of a bin is then:

(max_i x_i − min_i x_i)/max_i x_i.  (9)

• classify observations into these bins;
• construct the histogram.
Note: works well when N is large (N > 1000).
The idea: construct intervals such that only one distinct value falls in each interval.
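A minimal sketch of this counting idea in Python/NumPy (function name ours; for non-negative integer data, counting each distinct value directly replaces the explicit bin construction):

```python
import numpy as np

def discrete_pf_estimate(x):
    """Estimate the PF of a discrete RV from non-negative integer data."""
    x = np.asarray(x, dtype=int)
    counts = np.bincount(x)        # occurrences of 0, 1, ..., max(x)
    return counts / len(x)         # relative frequencies, summing to 1

rng = np.random.default_rng(1)
sample = rng.geometric(0.1, size=10000) - 1   # support {0, 1, ...}
pf = discrete_pf_estimate(sample)
```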


[Plots: histogram (analogue of the PF) of 100 random numbers with Geom(0.1) and of 10000 random numbers with Geom(0.1), each shown next to the raw sample.]
What if: we observe values of a continuous RV X?
Divide every frequency by the length of its interval to get the frequency density:

f*_i = p*_i/(x_{i+1} − x_i).  (10)

Using f*_i, i = 1, 2, . . . , N, we can now construct a histogram:

• by construction the area of each bar is equal to p*_i, i = 1, 2, . . . , N.

[Bar plot of f*_X(x) over the intervals x1, . . . , x5.]

Figure 5: An example of a histogram for a continuous RV X.


[Plots: histogram of 100 random numbers with Normal(15,30) and of 1000 random numbers with Normal(15,30), each shown next to the raw sample.]

GENERAL ALGORITHM for the histogram of relative frequencies:
• assume we have x1, x2, . . . , xN observations of RV X;
• determine max_i x_i and min_i x_i;
• determine the number of histogram bins as l = int(1.72 N^{1/3});
• determine the vector of histogram bin edges as:

min_i x_i + j (max_i x_i − min_i x_i)/l,  j = 0, 1, . . . , l.  (11)

– the length of a bin is then:

(max_i x_i − min_i x_i)/l.  (12)

• classify observations into these bins;
• divide the relative frequency of each bin by the length of the bin to get f*_i;
• construct the histogram.
Note: works well when N is relatively small or RV X is continuous.
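A sketch of this general algorithm in Python/NumPy (function name ours; np.histogram does the binning, and only the bin-count rule and the density normalization of eq. (10) come from the slides):

```python
import numpy as np

def relative_frequency_histogram(x):
    """Histogram of frequency densities f*_i with l = int(1.72 N^(1/3)) bins."""
    x = np.asarray(x, dtype=float)
    n_bins = int(1.72 * len(x) ** (1.0 / 3.0))
    counts, edges = np.histogram(x, bins=n_bins)    # equal-width bins
    densities = counts / (len(x) * np.diff(edges))  # f*_i = p*_i / bin length
    return densities, edges

rng = np.random.default_rng(1)
f_star, edges = relative_frequency_histogram(rng.normal(15, 30, size=1000))
```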


[Plots: histograms built with the general algorithm from 100 and from 10000 random numbers with Geom(0.1), each shown next to the raw sample.]


3. Estimating parameters of the sample


Why we may turn to parameters only:
• statistical distributions are difficult to estimate:
– we may not be given enough statistical data to deal with (n < 100);
– something may have to be done on-line.
• to approximate distributions using the method of moments;
• sometimes it is not needed to determine a distribution:
– the distribution law may be known in advance;
– the distribution is defined using a certain parameter.
• we may be interested in a certain parameter only.
To proceed, we assume:
• we consider RV X;
• we observed N values of RV: Xi , i = 1, 2, . . . , N .


3.1. Point and interval estimators


Notation:
• a is some parameter whose value we want to estimate;
• ā is the estimate of this parameter.
Important note: the estimate is a function of the experimental data:

ā = φ(X1, X2, . . . , XN).  (13)

There are two types of estimators for parameters:

• point estimators:
– the goal is to give the best possible single estimate of the parameter;
– the value of parameter a is most likely ā.
• interval estimators:
– the sample is always of limited size: confidence intervals must be provided;
– with confidence level (1 − α) (or with risk α) parameter a ∈ [a_min, a_max].


4. Point estimators of parameters


The sample mean:

m = (1/N) Σ_{i=1}^{N} X_i.  (14)

The sample variance:

σ̄²[X] = (1/N) Σ_{i=1}^{N} (X_i − m)².  (15)

The k-th moment and k-th central moment of the sample:

ᾱ_k = (1/N) Σ_{i=1}^{N} X_i^k,  µ̄_k = (1/N) Σ_{i=1}^{N} (X_i − m)^k.  (16)
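A small sketch of these point estimators in Python/NumPy (function name ours; it deliberately uses the 1/N variance from (15), whose bias is examined in Section 4.4):

```python
import numpy as np

def sample_moments(x, k=3):
    """Point estimators (14)-(16): mean, variance (1/N form), k-th moments."""
    x = np.asarray(x, dtype=float)
    m = x.mean()                        # sample mean, eq. (14)
    var = np.mean((x - m) ** 2)         # sample variance, eq. (15)
    alpha_k = np.mean(x ** k)           # k-th moment, eq. (16)
    mu_k = np.mean((x - m) ** k)        # k-th central moment, eq. (16)
    return m, var, alpha_k, mu_k
```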

Major questions we have to answer:

• can we use these estimators?
• how closely can we estimate the value of a parameter?


4.1. Mean is a RV!


Assumptions:
• a finite set of measurements of RV X: X_i, i = 1, 2, . . . , N;
• the measurements X_i, i = 1, 2, . . . , N, are independent;
• RV X is characterized by mean E[X] and variance σ²[X].
Central limit theorem in statistics:
• the sample mean m is an (approximately) normally distributed RV;
• the mean and variance of the sample mean are given by:

E[m] = E[(Σ_{i=1}^{N} X_i)/N] = E[X],
σ²[m] = E[(m − E[X])²] = σ²[X]/N.  (17)

Note: it does not matter what distribution RV X follows.

Consider the mean of m:

E[m] = E[(1/N)(X1 + · · · + XN)]
     = (1/N)(E[X1] + · · · + E[XN])
     = (1/N) N E[X] = E[X].  (18)

Consider the variance of m:

σ²[m] = D[(Σ_{i=1}^{N} X_i)/N] = D[(1/N)(X1 + · · · + XN)]
      = D[X1/N + · · · + XN/N] = (1/N²)(D[X1] + · · · + D[XN])
      = (1/N²) N σ²[X] = σ²[X]/N.  (19)

Note: approximation is good enough when N > 30.


4.2. Requirements for point estimators


The estimator of a, denoted by ā, should be:
• consistent: ā should converge to a as N increases:

ā → a, N → ∞.  (20)

• unbiased: it should not allow a systematic bias (error):

E[ā] = a.  (21)

• effective: ā should be as little stochastic as possible:

σ²[ā] → 0, N → ∞.  (22)

A consistent, unbiased, effective estimator is called absolutely correct.


Note the following:
• sometimes non-effective estimators are used: they have higher variance;
• sometimes even biased estimators are used.


4.3. Point estimator of the mean


The natural estimate of the mean is:

m = (1/N) Σ_{i=1}^{N} X_i.  (23)

Is this estimate consistent (ā → a, N → ∞)? Yes:

lim_{N→∞} (1/N) Σ_{i=1}^{N} X_i = E[X].  (24)

Is this estimate unbiased (E[ā] = a)? Yes:

E[m] = E[(1/N) Σ_{i=1}^{N} X_i] = (1/N) Σ_{i=1}^{N} E[X_i] = (1/N) N E[X] = E[X].  (25)

Is this estimate effective (σ²[ā] → 0, N → ∞)? Yes:

σ²[m] = (1/N²) Σ_{i=1}^{N} σ²[X_i] = (1/N²) N σ²[X] = σ²[X]/N.  (26)


4.4. Point estimator of the variance


The natural estimator:

σ̄²[X] = (1/N) Σ_{i=1}^{N} (X_i − m)².  (27)

Is this estimate consistent (ā → a, N → ∞)? Yes:

• let us express it via the second moment:

σ̄²[X] = (1/N) Σ_{i=1}^{N} (X_i − m)² = (1/N) Σ_{i=1}^{N} X_i² − m².  (28)

• the first term is the sample mean of X², which tends in probability to E[X²];
• the second term m² tends in probability to E²[X];
• finally, we find that the estimator is consistent:

lim_{N→∞} σ̄²[X] = E[X²] − E²[X] = α₂[X] − E²[X] = σ²[X].  (29)

Is this estimate unbiased (E[ā] = a)? No:
• substitute the estimate of the mean into the estimate of the variance to get:

σ̄²[X] = (1/N) Σ_{i=1}^{N} X_i² − ((1/N) Σ_{i=1}^{N} X_i)²
       = ((N−1)/N²) Σ_{i=1}^{N} X_i² − (2/N²) Σ_{i<j} X_i X_j.  (30)

• centralize the measurements, choosing the origin of the 0X axis at E[X]:

– the variance does not depend on where we choose the origin of the 0X axis;
– denote the centralized measurements by X°_i;
– rewriting, we get:

σ̄²[X] = ((N−1)/N²) Σ_{i=1}^{N} X°_i² − (2/N²) Σ_{i<j} X°_i X°_j.  (31)

• determine the mean of the estimate σ̄²[X]:

E[σ̄²[X]] = ((N−1)/N²) Σ_{i=1}^{N} E[X°_i²] − (2/N²) Σ_{i<j} E[X°_i X°_j].  (32)

• note that for any measurement X°_i, i = 1, 2, . . . , N, we have:

E[X°_i²] = σ²[X], i = 1, 2, . . . , N.  (33)

• since all measurements were assumed to be independent, we have:

E[X°_i X°_j] = 0, i ≠ j.  (34)

• substituting, we get:

E[σ̄²[X]] = ((N−1)/N²) N σ²[X] = ((N−1)/N) σ²[X].  (35)

The estimate σ̄²[X] = (1/N) Σ_{i=1}^{N} (X_i − m)² is biased:
• its mean is not equal to σ²[X] but slightly less (by the factor (N − 1)/N);
• using such an estimate we make a systematic error.
To get an unbiased estimate of the variance we just multiply by N/(N − 1):

σ̄²[X] = (N/(N−1)) · (1/N) Σ_{i=1}^{N} (X_i − m)².  (36)

The final expression for the estimate of the variance is now given by:

σ̄²[X] = (1/(N−1)) Σ_{i=1}^{N} (X_i − m)².  (37)

Important note:
• the multiplier N/(N − 1) must be taken into account whenever N < 50;
• as N increases, the multiplier N/(N − 1) tends to one and can be dropped.
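In NumPy this correction is the ddof argument; a two-line sketch:

```python
import numpy as np

x = np.random.default_rng(1).normal(15, 30, size=40)
biased = np.var(x)            # 1/N form, eq. (27)
unbiased = np.var(x, ddof=1)  # 1/(N-1) form, eq. (37)
```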


4.5. Point estimator of the covariance


Assume that we have obtained data on two RVs X and Y:

(X1, Y1), (X2, Y2), . . . , (Xi, Yi), . . . , (XN, YN).  (38)

Then a consistent and unbiased estimate of the covariance is:

K̄_XY = (1/(N−1)) Σ_{i=1}^{N} (X_i − m_X)(Y_i − m_Y),  (39)

• where

m_X = (1/N) Σ_{i=1}^{N} X_i,  m_Y = (1/N) Σ_{i=1}^{N} Y_i.  (40)

Then a consistent estimate of the correlation coefficient is:

r̄_XY = K̄_XY / (σ̄[X] σ̄[Y]).  (41)
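A quick sketch with NumPy's built-ins, which already apply the 1/(N − 1) correction:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=200)
y = 0.5 * x + rng.normal(0.0, 1.0, size=200)

K_xy = np.cov(x, y)[0, 1]       # covariance estimate, eq. (39)
r_xy = np.corrcoef(x, y)[0, 1]  # correlation coefficient estimate, eq. (41)
```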


5. Interval estimators of parameters


Why interval estimators are needed:
• the estimates of the mean, variance, covariance are all RVs!
• sometimes it is needed to say how precisely we can estimate some parameter.
The idea:
• we get an estimate θ̄ of some parameter θ;
• recall that θ̄ is a random function of the experiments;
• we want to say that we are, e.g., 95% confident that θ ∈ (θ1, θ2): Pr{θ1 < θ < θ2} = γ;
– where γ = 1 − α must be close to 1;
– α: the level of significance (usually set to 0.05).
[Plot: density of the estimate with central area 1 − α between θ1 and θ2 and tails of area α/2 on each side.]


5.1. Confidence interval for the mean


Assume we have observations X1, X2, . . . , XN with known variance σ²:
• first we find the point estimator of the mean (sample mean):

m = (1/N) Σ_{i=1}^{N} X_i.  (42)

• the sample mean is a normal RV! Standardizing to N(0, 1) via Z = (X − µ)/σ, we have that

(Σ_{i=1}^{N} X_i − Nµ) / (σ√N)  (43)

– is approximately a standard normal variable N(0, 1);
– where µ is the actual mean.
• let z_α be the (1 − α)-quantile of N(0, 1) (i.e., Pr{Z ≤ z_α} = 1 − α), that is:

Pr{Z > z_α} = α.  (44)

• for example, with α = 0.05 (often used), z_{α/2} = z_{0.025} = 1.96.

• choosing α small enough and observing that N(0, 1) is symmetric:

Pr{−z_{α/2} ≤ (Σ_{i=1}^{N} X_i − Nµ)/(σ√N) ≤ z_{α/2}} ≈ 1 − α,  (45)

• which is equivalent to:

Pr{m − z_{α/2} σ[X]/√N ≤ µ ≤ m + z_{α/2} σ[X]/√N} ≈ 1 − α,  (46)

– substitute your values to get a 100(1 − α)% confidence interval for µ;
– where −z_{α/2} and z_{α/2} are the lower and upper critical values of N(0, 1).

The previous result is correct if:
• the observations X1, X2, . . . , XN are normally distributed;
• the variance is known.
What if the variance is unknown?
• use the estimate of the variance σ̄²;
• the standardized mean now has Student's t distribution with (N − 1) degrees of freedom (the number of observations minus one);
• confidence intervals are obtained by replacing z_{α/2} with t_{N−1,α/2}, as in the sketch below.
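A sketch of both constructions with SciPy (function name ours; a z-interval when σ is known, a t-interval otherwise):

```python
import numpy as np
from scipy import stats

def mean_confidence_interval(x, alpha=0.05, sigma=None):
    """100(1 - alpha)% CI for the mean, cf. eq. (46)."""
    x = np.asarray(x, dtype=float)
    n, m = len(x), x.mean()
    if sigma is not None:                  # known variance: normal quantile
        half = stats.norm.ppf(1 - alpha / 2) * sigma / np.sqrt(n)
    else:                                  # estimated variance: t_{N-1} quantile
        half = stats.t.ppf(1 - alpha / 2, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
    return m - half, m + half
```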
The length of the intervals:
• what we want: to make them as small as possible;
• approach 1: increase the number of experiments N;
• approach 2: decrease the sample variance σ̄²:
– a very effective way;
– you shorten the intervals without running more experiments!
• we will consider techniques to reduce the variance without increasing N.


6. Criteria of fitting accuracy


What we want to do:
• we test whether a sample has a certain distribution;
• the distribution is usually assumed to be analytical.
What we can do:
• compare the CDF F(x) with the empirical CDF F*(x);
• compare the pdf f(x) or pmf p_i with the histogram f*(x).
What tests are available:
• the χ² test:
– compares the pdf f(x) with the empirical pdf f*(x);
• Kolmogorov's test:
– compares the CDF F(x) with the empirical CDF F*(x).
General approach: hypothesis testing!


6.1. χ² test: discrete distribution


After the parameter matching procedure we must check the accuracy of the fitting.

We formulate it as follows: do the data belong to the chosen distribution?

• to answer such questions we use a fitting criterion;
• we consider one of the most popular ones: Pearson's χ² (chi-square) criterion.

To proceed, we assume that:

• the random variable X under consideration is a discrete one with possible values {x1, x2, . . . , xk};
• we performed n independent experiments in which X took on values from the set {x1, x2, . . . , xk};
• n is the whole number of experiments;
• n_i, i = 1, 2, . . . , k, is the number of experiments where the event {X = x_i} occurred;
• p*_i = n_i/n, i = 1, 2, . . . , k, are the frequencies of the events {X = x_i}.

Based on the statistical data we construct the following grouped statistical set for RV X.

x1     x2     x3     x4     . . .   xi     . . .   xk
p*_1   p*_2   p*_3   p*_4   . . .   p*_i   . . .   p*_k

Figure 6: Statistical distribution set.

Let us take the null hypothesis H0 stating that RV X has the following PF.

x1   x2   x3   x4   . . .   xi   . . .   xk
p1   p2   p3   p4   . . .   pi   . . .   pk

Figure 7: Hypothetical probability function.

• under H0, deviations of the frequencies p*_i from the probabilities p_i are due to stochastic reasons only.

Note: to verify or reject this hypothesis we have to define a certain measure of deviation.

As a measure of deviation R, one may choose:
• a sum of squared deviations (p*_i − p_i) with certain 'weights' c_i, i = 1, 2, . . . , k, as follows:

R = Σ_{i=1}^{k} c_i (p*_i − p_i)².  (47)

Deviations associated with different p_i, i = 1, 2, . . . , k, are not equal in their impacts:

• the same absolute deviation (p*_i − p_i) may be of little significance if p_i is large;
• contrarily, it may be very significant if p_i is small;
• to account for this, the coefficients c_i, i = 1, 2, . . . , k, are introduced;
• the coefficients c_i should be inversely proportional to p_i, i = 1, 2, . . . , k.
Question: how to choose c_i, i = 1, 2, . . . , k?
Pearson proposed to choose c_i as follows:

c_i = n/p_i, i = 1, 2, . . . , k,  (48)

• where n is the number of experiments.

If the c_i are chosen this way, the distribution of RV R has the following properties:
• it does not depend on the distribution law of RV X under consideration;
• it depends only little on the number of experiments n;
• it depends on the number of bins k;
• as n increases it tends to the χ² distribution.
The measure of deviation R is then denoted by χ² and given by:

χ² = Σ_{i=1}^{k} (n/p_i) (n_i/n − p_i)².  (49)

Taking n/p_i into the sum we have:

R = χ² = Σ_{i=1}^{k} (n_i − n p_i)²/(n p_i),  (50)

• which is the final expression for the χ² criterion.

The χ² distribution depends on the number of degrees of freedom r:

r = k − l − 1,  (51)

• k is the number of histogram bins;

• l is the number of constraints (bindings) applied to the observations:
– we always estimate frequencies from statistical data:

p*_1 + p*_2 + · · · + p*_k = 1, that is why r = k − 1!  (52)

– if we also estimated the mean to fit our distribution:

Σ_{i=1}^{k} x_i p*_i = Ē[X], in this case r = k − 1 − 1 = k − 2!  (53)

– if we also estimated the variance to fit our distribution:

Σ_{i=1}^{k} (x_i − Ē[X])² p*_i = D̄[X], in this case r = k − 1 − 2 = k − 3!  (54)

Note: estimation of parameters is required to parameterize the hypothetical distribution.

The χ² distribution is extensively tabulated in the literature in the following form:
• input: the value of the χ²_{1−α}(r) statistic;
• input: the number of degrees of freedom r;
• output: the probability that a RV distributed according to χ² exceeds the given value.

Null hypothesis H0: the statistical data are taken from distribution F(x).

The procedure of checking the null hypothesis is as follows:

• if the computed χ² is less than χ²_{1−α}(r) determined from tables:
– H0 must be accepted;
– the deviation between the statistical and hypothetical distributions is not significant;
• if the computed χ² is greater than or equal to χ²_{1−α}(r) determined from tables:
– H0 must be rejected;
– the deviation between the statistical and hypothetical distributions is significant.


6.2. χ² test: continuous distribution


How to: construct the grouped statistical set (histogram):
• where ∆_i, i = 1, 2, . . . , k, are the intervals.

∆1     ∆2     ∆3     ∆4     . . .   ∆i     . . .   ∆k
p*_1   p*_2   p*_3   p*_4   . . .   p*_i   . . .   p*_k

Figure 8: Grouped statistical set.

• p*_i is the frequency of RV X falling in the i-th bin, defined as follows:

p*_i = n_i/n.  (55)

• then proceed similarly to the discrete RV case, replacing the probabilities as follows:

p_i = ∫_{x_i}^{x_{i+1}} f(x) dx.  (56)

Note: observations of a discrete RV are also often grouped!

The procedure:
• choose the significance level α;
• determine the number of bins using either:

k = 1.72 N^{1/3}, or k = 1 + 3.3 ln N.  (57)

• count the frequencies n_i, i = 1, 2, . . . , k;
• evaluate the hypothetical function F(x) to get p_i = Pr{X ∈ ∆_i}, i = 1, 2, . . . , k;
• using tables, get the quantile χ²_{1−α}(k − 1);
• estimate the χ² statistic as:

χ² = Σ_{i=1}^{k} (n_i − n p_i)²/(n p_i).  (58)

• compare χ² and the quantile χ²_{1−α}(k − 1):

– χ² < χ²_{1−α}(k − 1): H0 is accepted;
– χ² ≥ χ²_{1−α}(k − 1): H0 is rejected.
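A sketch of this procedure in Python/SciPy (function name ours; the hypothetical CDF is passed in fully parameterized, and the small-np_i bin-joining rule used in the next example is left out for brevity):

```python
import numpy as np
from scipy import stats

def chi2_gof_test(x, cdf, alpha=0.05, n_params=0):
    """Pearson chi-square goodness-of-fit test against a hypothetical CDF."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = int(1.72 * n ** (1.0 / 3.0))                 # bins, eq. (57)
    counts, edges = np.histogram(x, bins=k)
    p = cdf(edges[1:]) - cdf(edges[:-1])             # p_i, eq. (56)
    chi2 = np.sum((counts - n * p) ** 2 / (n * p))   # statistic, eq. (58)
    critical = stats.chi2.ppf(1 - alpha, df=k - 1 - n_params)
    return chi2, critical, chi2 < critical           # True -> accept H0

rng = np.random.default_rng(1)
result = chi2_gof_test(rng.exponential(1.0, size=1000), stats.expon.cdf)
```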

Example: H0: the data follow a Poisson distribution:
• we have 200 observations.
How we proceed:

• estimate the mean from the sample: Ē[X] = 1;

• get the probabilities of the Poisson distribution p_i = (λ^i/i!)e^{−λ} with λ = Ē[X]:

p0 = p1 = 0.368, p2 = 0.184, p3 = 0.061, p4 = 0.015, p5 = 0.003,  (59)

– note: np5 = 0.6 < 5 and np4 = 3 < 5, while np3 = 12.2 > 5;
– we have to join the three last intervals (important empirical rule: each bin should have np_i ≥ 5!).
• determine the number of degrees of freedom: r = k − l − 1 = 4 − 1 − 1 = 2;
• choosing α = 0.05, get χ²_{1−α}(r) = χ²_{0.95}(2) = 5.99;
• estimate the χ² statistic;
– since χ² = 0.9 < χ²_{0.95}(2) = 5.99, we accept H0.


Figure 9: Data for the χ² test.

Figure 10: Computing the χ² test.


6.3. Criteria of fitting accuracy: Kolmogorov test


Assume the following:
• the hypothetical CDF F(x) is fully defined (parameters must be known in advance);
• the hypothetical CDF F(x) is continuous.
The Kolmogorov criterion:

D_n = sup_x |F*_n(x) − F(x)|,  (60)

• sup_x is the least upper bound, practically the maximum;

• F(x) is the hypothetical CDF;
• F*_n(x) is the statistical CDF.
Kolmogorov determined the limiting statistics:

Pr{√n D_n < x} → K(x) = { Σ_{k=−∞}^{∞} (−1)^k e^{−2k²x²},  x > 0;  0,  x ≤ 0 },  (61)

• which is tabulated in the literature.

The procedure:
• estimate the statistical CDF F*_n(x);
• evaluate the CDF F(x) at the points a_{i−1}, i = 1, 2, . . . , k:
– a_{i−1} are the left end points of the intervals ∆_i, i = 1, 2, . . . , k;
– ∆_i, i = 1, 2, . . . , k, are the histogram bins.
• estimate d_i = |F*_n(a_{i−1} + 0) − F(a_{i−1})|, i = 1, 2, . . . , k;

• determine D_n = max_i d_i and λ = √n D_n;
• choose α and find the root K_{1−α} of K(x) = 1 − α;
• compare λ and K_{1−α}:
– λ ≤ K_{1−α}: H0 is accepted;
– λ > K_{1−α}: H0 is rejected.
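SciPy packages the same construction as scipy.stats.kstest; a minimal sketch (comparing the p-value with α replaces the table lookup of K_{1−α}):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(1.0, size=100)

# H0: the data follow Exp(1); parameters fixed in advance, as required
d_n, p_value = stats.kstest(x, stats.expon.cdf)
accept_h0 = p_value > 0.05
```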

Example: using two samples with m = n = 100, check H0 that the data are exponential:
• use the first sample to estimate the mean: Ē[X] = 1.01;
• determine the parameter of the exponential distribution: λ = 1/Ē[X] = 1/1.01 = 0.99.

Figure 11: Two data sets for testing fitting accuracy.

• use the second sample to construct the statistical CDF F*_n(x) by consecutive summing;
• evaluate the exponential CDF F(x);
• estimate d_i = max_{x∈∆_i} |F*_n(x) − F(x)| = |F*_n(a_{i−1} + 0) − F(a_{i−1})|;

• estimate D_n = max_i d_i = 0.13 and λ = √100 · D_n = 1.3;
• choose α = 0.05 and find the root K_{1−α} of K(x) = 1 − α = 0.95 as K_{1−α} = 1.36;
• since λ = 1.3 < K_{1−α} = 1.36, we accept H0 (BUT IT'S VERY CLOSE!).

Figure 12: Computing Kolmogorov’s test.


7. Homogeneity of samples
Homogeneity of samples:
• we test whether two samples are taken from the same distribution.
We have:
• criteria for fitting to a hypothetical distribution;
• criteria for homogeneity of samples.
Difference between the two:
• fitting: we compare a statistical distribution with an analytical one;
• homogeneity: we compare two statistical distributions.
What tests are available:
• Smirnov's test;
• the χ² test.

Lecture: Reminder of statistics 54


Network analysis and dimensioning I D.Moltchanov, TUT, 2013

7.1. Homogeneity of samples: Smirnov test


Note the following:
• often referred to as the Smirnov-Kolmogorov test:
– the correct name is the Smirnov test;
– it is an extension of the ideas of the Kolmogorov test.
Assume we have:
• a sample (x′_1, x′_2, . . . , x′_m) with statistical CDF F*_{1m}(x);
• a sample (x″_1, x″_2, . . . , x″_n) with statistical CDF F*_{2n}(x).
Question: whether these two samples are taken from the same distribution with CDF F(x).
Smirnov proposed to use:

D_mn = sup_x |F*_{1m}(x) − F*_{2n}(x)|.  (62)

Note: it looks similar to Kolmogorov's statistic but is quite different in practice.

Smirnov found that if the two samples are taken from the same distribution:

Pr{D_mn/√(1/m + 1/n) < x} → K(x) = { Σ_{k=−∞}^{∞} (−1)^k e^{−2k²x²},  x > 0;  0,  x ≤ 0 },  (63)

• K(x) is the Kolmogorov function.


The procedure:
• choose the level of significance α;
• determine K_{1−α} by solving K(x) = 1 − α;
• determine D_mn and λ = D_mn/√(1/m + 1/n);
• compare λ and K_{1−α}:
– if λ > K_{1−α}, the two samples are non-homogeneous;
– if λ ≤ K_{1−α}, the two samples are homogeneous.
Note the following:
• one can compare only two samples at once;
• the samples may have different numbers of observations.
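The two-sample construction is available as scipy.stats.ks_2samp; a minimal sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample1 = rng.exponential(1.0, size=100)
sample2 = rng.exponential(1.0, size=150)    # sizes may differ

d_mn, p_value = stats.ks_2samp(sample1, sample2)
homogeneous = p_value > 0.05   # plays the role of lambda <= K_{1-alpha}
```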

Example: comparing two samples, m = n = 100:
• compute F*_{1n}(x) and F*_{2n}(x);
• compute d_i = |F*_{1n}(z_i) − F*_{2n}(z_i)|, where z_i are the middles of the intervals ∆_i, i = 1, 2, . . . , k;
• compute Smirnov's statistic as D_nn = sup_x |F*_{1n}(x) − F*_{2n}(x)| = max_i d_i = 0.07;
• compute λ = D_nn/√(1/m + 1/n) = 0.07/√0.02 = 0.50;
• choose α = 0.05 and find the root of K(x) = 1 − α = 0.95 from tables as K_{1−α} = 1.36;
• since λ = 0.50 < K_{1−α} = 1.36, we accept H0.

Figure 13: Computing Smirnov’s test for samples.


7.2. Homogeneity of samples: χ² test


For two samples, the following statistic has the χ² distribution:

χ² = Σ_{i=1}^{k} (1/(m_i + n_i)) (√(n/m) m_i − √(m/n) n_i)²,  (64)

• m: the number of observations in the first sample (m_i of them in bin i);

• n: the number of observations in the second sample (n_i of them in bin i);
• the number of degrees of freedom is r = k − 1.
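A sketch of the two-sample statistic (64) in Python (function name ours; it assumes both samples were already binned over common intervals):

```python
import numpy as np
from scipy import stats

def chi2_homogeneity(m_counts, n_counts, alpha=0.05):
    """Two-sample chi-square homogeneity test on binned counts, eq. (64)."""
    m_counts = np.asarray(m_counts, dtype=float)
    n_counts = np.asarray(n_counts, dtype=float)
    m, n = m_counts.sum(), n_counts.sum()
    terms = (np.sqrt(n / m) * m_counts - np.sqrt(m / n) * n_counts) ** 2
    chi2 = np.sum(terms / (m_counts + n_counts))
    critical = stats.chi2.ppf(1 - alpha, df=len(m_counts) - 1)
    return chi2, critical, chi2 < critical   # True -> homogeneous (accept H0)
```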
For several samples, i.e., s samples containing n_1, n_2, . . . , n_s observations:

χ² = n (Σ_{i=1}^{k} Σ_{j=1}^{s} n_ij²/(n_j v_i) − 1),  (65)

• n = n_1 + · · · + n_s is the total number of observations;

• n_ij: the number of observations in sample j that fall into interval ∆_i, i = 1, 2, . . . , k;
• v_i: the number of observations from all samples that fall into interval ∆_i, i = 1, 2, . . . , k;
• the number of degrees of freedom is r = (k − 1)(s − 1).

Example: comparing two samples, m = n = 100:
• combine ∆11 and ∆12, since only a few observations fall there;
• thus we have k = 11, r = k − 1 = 10;
• assuming α = 0.05, we find from tables the quantile χ²_{1−α}(r) = χ²_{0.95}(10) = 18.3;
• computing the statistic, χ² = 10.25 < χ²_{0.95}(10) = 18.3, so we accept H0.

Figure 14: Computing the χ² test for samples.


7.3. Other criteria of homogeneity


We considered Smirnov's and χ² tests:
• powerful and most widely used;
• usually both are used for better assurance.
Wilcoxon, Mann and Whitney:
• proposed by Wilcoxon in 1945 for two samples with m = n, extended by Mann and Whitney to m ≠ n;
• idea: construct the ordered statistical set and compare the ranks (positions in order) of the observations.
Criterion of signs: can be used only for two samples with m = n:
• compute the numbers of signs (+/−) of z_i = x_i − y_i, i = 1, 2, . . . , n;
• if the samples are homogeneous, + and − occur with the same frequency;
• H0: (z_1, z_2, . . . , z_n) is taken from a distribution with median 0 (each sign occurring with probability 0.5).
Simple check: compare the variances and means of two (or more) samples:
• their equality is a mandatory (necessary) requirement for homogeneity.


8. Tests for autocorrelations and white noise


Note the following:
• most tests are biased:
– reason: ρ̄(1), ρ̄(2), . . . are not independently distributed!
• what to do:
– still use these tests;
– have enough intuition to judge. . .

The following tests are available:

• intuitive suggestion;
• rule of thumb;
• turning point test;
• 'portmanteau' test by Box and Pierce;
• modified 'portmanteau' test by Ljung and Box.


8.1. Intuitive suggestion


Use the following:
• observe the plot of the ACF;
• if there are significant values ρ̄(i), say that the data are more complex than white noise.

[Plots of the sample ACF K_Y(i) versus lag i: (a) AR(1) model with K(1) = 0.6; (b) AR(1) model with K(1) = 0.0.]

Figure 15: Suggesting correlation and no correlation.


8.2. Rule of thumb


Rule of thumb:
• the NACF value at lag i is negligible if its estimate ρ̄_i satisfies −2/√n < ρ̄_i < 2/√n;
• here 2 is an approximation of 1.96 (α = 0.05).
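A sketch of this rule in Python/NumPy (sample_acf is our helper implementing the standard sample autocorrelation):

```python
import numpy as np

def sample_acf(x, max_lag=20):
    """Sample autocorrelation function at lags 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    c0 = np.sum(x * x) / n
    return np.array([np.sum(x[:n - k] * x[k:]) / (n * c0)
                     for k in range(1, max_lag + 1)])

x = np.random.default_rng(1).normal(size=100)
rho = sample_acf(x)
negligible = np.abs(rho) < 2 / np.sqrt(len(x))   # rule-of-thumb bound
```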

[Plots of the sample ACF K_Y(i) versus lag i: (a) AR(1) model with K(1) = 0.6; (b) AR(1) model with K(1) = 0.0.]

Figure 16: Suggesting correlation and no correlation (n = 100).


8.3. Turning point test


Testing whether a given series is white noise:
• H0: the process is white noise;
• H1: the process is not white noise.

Note the following:

• this test does not signal autocorrelation as such:
– autocorrelation is a measure of linear dependence in a process;
– there could be other types of dependence.
• it signals whether a process is purely random: white noise.

White noise: a sequence of iid random variables with finite mean and variance.

Do the following:
• consider the successive triples (X_i, X_{i+1}, X_{i+2}), i = 1, 2, . . . , n − 2, from the observations;
• white noise: the values are equally likely to occur in any of the 6 possible orders.

HML HLM MHL MLH LMH LHM

Figure 17: Possible orders of within a triple of values.

The test is as follows:
• in four of the six cases there is a turning point in the middle:
– white noise: there should be about (2/3)(n − 2) such points among n observations.
• for large n the number of turning points is distributed as N(2n/3, 8n/45);
• reject H0 at α = 0.05 if the number of turning points is outside 2n/3 ± 1.96 √(8n/45), as in the sketch below.
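A sketch of the turning point test in Python/NumPy (function name ours; it uses the N(2n/3, 8n/45) approximation from the slide):

```python
import numpy as np

def turning_point_test(x, z=1.96):
    """Turning point test for white noise at alpha = 0.05 (z = 1.96)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mid = x[1:-1]
    turns = np.sum(((mid > x[:-2]) & (mid > x[2:])) |
                   ((mid < x[:-2]) & (mid < x[2:])))
    mean, std = 2 * n / 3, np.sqrt(8 * n / 45)
    return abs(turns - mean) <= z * std   # True -> accept H0 (white noise)

is_white = turning_point_test(np.random.default_rng(1).normal(size=1000))
```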


8.4. ’Portmanteau’ test


We test the hypothesis that several autocorrelations are jointly zero:
• H0: ρ1 = ρ2 = · · · = ρm = 0;
• H1: ρi ≠ 0 for some i ∈ {1, 2, . . . , m}.
Box and Pierce statistic:

Q(m) = n Σ_{i=1}^{m} ρ̄_i²,  (66)

• n is the number of observations;

• m should be such that m ≪ n;
• this statistic approximately has the χ² distribution with m degrees of freedom.
Note the following:
• the test is biased: the χ² distribution is only an approximation;
• perform it for several m to make sure (a sketch follows below).
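A sketch of the Box-Pierce statistic, reusing the sample_acf helper from Section 8.2:

```python
import numpy as np
from scipy import stats

def box_pierce(x, m=10, alpha=0.05):
    """Box-Pierce portmanteau test, eq. (66); needs sample_acf from above."""
    rho = sample_acf(x, max_lag=m)
    q = len(x) * np.sum(rho ** 2)
    critical = stats.chi2.ppf(1 - alpha, df=m)
    return q, critical, q < critical   # True -> accept H0
```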


(a) AR(1) model, ρ1 = 0.6 (ACF plot omitted):
χ²_{α=0.05}(5) = 11.07 < Q(5) = 49.013: H0 rejected;
χ²_{α=0.05}(10) = 18.31 < Q(10) = 51.416: H0 rejected;
χ²_{α=0.05}(15) = 25.00 < Q(15) = 55.332: H0 rejected.

(b) AR(1) model, ρ1 = 0.0 (ACF plot omitted):
χ²_{α=0.05}(5) = 11.07 > Q(5) = 4.193: H0 accepted;
χ²_{α=0.05}(10) = 18.31 > Q(10) = 8.416: H0 accepted;
χ²_{α=0.05}(15) = 25.00 > Q(15) = 11.499: H0 accepted.

Figure 18: Example of the 'portmanteau' test for traces generated from an AR(1) model.


8.5. Modified ’Portmanteau’ test


Ljung and Box statistic:

Q(m) = n(n + 2) Σ_{i=1}^{m} ρ̄_i²/(n − i).  (67)

• modification: ρ̄_i has variance (n − i)/(n(n + 2)).


Practical considerations:
• the choice of m affects the performance of the Q(m) statistic;
• several values of m are often used;
• empirically, m ≈ ln n provides better performance (a sketch follows below).
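The corresponding one-line change to the Box-Pierce sketch (statsmodels also ships this test as acorr_ljungbox, if that library is available to you):

```python
import numpy as np
from scipy import stats

def ljung_box(x, m=10, alpha=0.05):
    """Ljung-Box modified portmanteau test, eq. (67)."""
    n = len(x)
    rho = sample_acf(x, max_lag=m)     # helper from Section 8.2
    q = n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, m + 1)))
    critical = stats.chi2.ppf(1 - alpha, df=m)
    return q, critical, q < critical   # True -> accept H0
```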
Important notes:
• the tests should always be performed if one suspects there is autocorrelation;
• in modern teletraffic studies the tests are rarely performed!.. Reasons?
– n is usually very big (the MPEG trace we considered has ∼ 40000 frame sizes);
– everybody wants to see what he/she wants to see. . .


(a) AR(1) model, ρ1 = 0.6 (ACF plot omitted):
χ²_{α=0.05}(5) = 11.07 < Q(5) = 50.789: H0 rejected;
χ²_{α=0.05}(10) = 18.31 < Q(10) = 53.467: H0 rejected;
χ²_{α=0.05}(15) = 25.00 < Q(15) = 58.028: H0 rejected.

(b) AR(1) model, ρ1 = 0.0 (ACF plot omitted):
χ²_{α=0.05}(5) = 11.07 > Q(5) = 4.454: H0 accepted;
χ²_{α=0.05}(10) = 18.31 > Q(10) = 9.105: H0 accepted;
χ²_{α=0.05}(15) = 25.00 > Q(15) = 12.66: H0 accepted.

Figure 19: Example of the modified 'portmanteau' test for a trace generated from an AR(1) model.


9. Tests for stationarity


Note the following:
• we usually want to know whether the observations are (at least) weakly stationary;
• there are no fully effective tests for stationarity.
Obvious way to test:
• divide the set of observations into two or more segments;
• compare statistics of the segments.
What statistics to compare:
• recall the following:
– strictly stationary process: N-dimensional shifted distributions are the same for all N;
– weakly stationary process: the mean and the ACF are constant (shift-invariant).
• sufficient evidence that observations are taken from a non-stationary process:
– different means;
– different ACFs.


9.1. Testing for non-strictly-stationary behavior


Note the following:
• strictly stationary: N-dimensional shifted distributions are the same for all N;
• start by testing whether the one-dimensional distributions are the same.
[Plot of the SNR process Y(i) versus sample index i, 0 ≤ i ≤ 4000; values range roughly from 0 to 30.]

Figure 20: Signal-to-noise ratio process over IEEE 802.11b wireless channel.

Note: do you see any differences?

Proceed as follows:
• halve the trace first;
– note: you have to ensure that each half contains enough observations;
• compute the histograms;
• use e.g. the χ² test to compare the distributions (a sketch follows below).
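A sketch of this split-and-compare check, reusing the chi2_homogeneity helper from Section 7.2 (the common binning over both halves is our choice):

```python
import numpy as np

def split_and_compare(trace, n_bins=20):
    """Halve a trace and chi-square-compare the halves' histograms."""
    trace = np.asarray(trace, dtype=float)
    half = len(trace) // 2
    edges = np.histogram_bin_edges(trace, bins=n_bins)   # common bin edges
    m_counts, _ = np.histogram(trace[:half], bins=edges)
    n_counts, _ = np.histogram(trace[half:], bins=edges)
    mask = (m_counts + n_counts) > 0                     # drop empty bins
    return chi2_homogeneity(m_counts[mask], n_counts[mask])
```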

[Histograms of relative frequencies f_i(∆): (a) first half of the trace, (b) second half of the trace.]

Figure 21: Histograms of relative frequencies for both halves.


i    0   1   2   3   4    5    6    7    8    9    10   11   12   13   14   15  16  17  18  19
m_i  1   10  61  72  126  329  914  602  513  476  522  433  427  870  353  78  36  21  16  7
n_i  11  17  24  36  31   70   251  213  176  240  212  197  167  266  118  55  49  26  19  7

Figure 22: Values m_i, n_i, i = 0, 1, . . . , 19 (1.72 n^{1/3} ≈ 20).

The χ² statistic is given by:

χ² = Σ_{i=1}^{k} (1/(m_i + n_i)) (√(n/m) m_i − √(m/n) n_i)² = 4907.  (68)

• χ²_{α=0.05}(19) = 30.14 < 4907;

• H0 is rejected: the distributions are completely different.

Conclusion: since the one-dimensional distributions are different, the trace is not strictly stationary.
