
Reminder of statistics

Lecturer: Dmitri A. Moltchanov


E-mail: moltchan@cs.tut.fi

http://www.cs.tut.fi/kurssit/ELT-53606/
OUTLINE:
• Why do we need statistics?
• Description of statistical data;
• Statistical CDFs and histograms;
• Estimating parameters of the sample;
• Point estimators of parameters;
• Interval estimators of parameters;
• Criteria of fitting accuracy;
• Criteria of homogeneity of samples;
• Statistics of stochastic processes;
• Tests for autocorrelations and white noise;
• Tests for stationarity.


1. Why do we need statistics?


The aim of stochastic modeling:
• develop probabilistic models of random events;
• some examples:
– estimate parameters of the traffic model;
– estimate values obtained via simulations.

Definition: statistics develops methods for the registration, description and analysis of experimental data.

Examples of tasks we are interested in:


• determine the probability of a random event based on its frequency;
• determine an approximating distribution law for empirical data;
• estimate the parameters of a distribution law based on empirical data;
• test statistical hypotheses.


1.1. Tasks of statistics


Statistics addresses the following tasks:
• description of events;
• analysis and forecasting;
• conclusions.
1. Description of events:
• how to represent the data in a convenient form;
• in particular, what kind of statistical tables and graphs are best suited for the data.
2. Analysis and forecasting:
• provide an estimate of a certain parameter (mean, variance, st.dev., etc.);
• given a certain number of experiments, determine how accurate these estimates are;
• predict the next value (of stochastic process) with a certain accuracy.
3. Conclusions:
• hypothesis testing.


1.2. Example: why do we need to know statistics?


Example: how to estimate the variance:
• we measured a certain RV N times to get measurements X_i, i = 1, 2, . . . , N.
From probability theory we know:

σ²[X] = E[(X − E[X])²].  (1)

It seems we have to do the following:

σ̄²[X] = (1/N) Σ_{i=1}^{N} (X_i − m)²,  (2)

where
• σ̄²[X] is the estimate of the variance;
• m is the estimate of the mean, given by m = (1/N) Σ_{i=1}^{N} X_i.

Important: such an estimate of the variance is biased, meaning that we make a systematic error.


1.3. Example: applicability in our course


To provide a quantitative analysis of a system we have to:
• provide an adequate traffic model:
– determine the important statistical parameters of the input traffic;
– match these parameters using an appropriate traffic model.
• provide a model of the service process:
– determine the statistical parameters of the service process;
– match these parameters using an appropriate model.
• analyze the system: what load may a given system carry?
– analytic approach;
– simulation study.
• solve the inverse task: what system parameters are needed to carry a given load?
– analytic approach;
– simulation study.

Questions that statistics helps to answer:

Modeling arrival and service processes:


• what statistical parameters must be taken into account?
• what statistical parameters are statistically significant?
• how many observations do we need to consider?
• how to estimate parameters when only a few observations are available?
• how to guess a model when no statistics are available at all?

Analyzing simulation results:


• what are the point estimates of the parameters?
• what are the confidence limits for parameters?
• how to best organize collection of data?


2. Description of statistical data


After the experiments we are given a list containing:
• the number of the experiment k, k = 1, 2, . . . , N;
• the value x_k, k = 1, 2, . . . , N, of the RV X under investigation;
• this list is called an initial statistical set, or sample.

Note the following:

• if the number of experiments is high, the initial statistical set is not convenient to deal with;
• other representations are required.

Another representation is the ordered statistical set:

• sort all values obtained in the statistical experiments from the smallest up to the highest;
• re-enumerate the values accordingly.


initial statistical set        ordered statistical set


k xk i xi

1 82 1 75
2 80 2 78
3 80 3 78
4 78 4 80
5 78 5 80
6 84 6 81
7 82 7 82
8 75 8 82
9 85 9 84
10 81 10 85

Figure 1: Initial and ordered statistical sets.


2.1. Statistical probability distribution function


Distributional information can be provided in terms of the statistical CDF:

F*_X(x) = Pr*{X ≤ x}.  (3)

• an alternative convention defines the probability distribution function as Pr{X < x};

• F*_X(x) is often called the empirical CDF.
Properties of the empirical CDF:
• it is a step function;
• it is monotone and non-decreasing;
• it is equal to zero for all x less than the smallest observed value of RV X;
• it is equal to one for all x greater than the largest observed value of RV X.
Note that the statistical CDF has a step of height l_i/N at each observed value x_i:
• l_i is the number of observations equal to x_i;
• N is the overall number of experiments.
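A minimal sketch of how such an empirical CDF can be computed in Python/NumPy (the helper name empirical_cdf is ours, not from the lecture):

```python
import numpy as np

def empirical_cdf(x):
    """Return sorted observations and F*_X evaluated at them.

    Each step has height l_i/N; tied values stack into larger steps."""
    xs = np.sort(np.asarray(x, dtype=float))
    f = np.arange(1, len(xs) + 1) / len(xs)   # cumulative relative counts
    return xs, f

# The sample from Figure 1 above
xs, f = empirical_cdf([82, 80, 80, 78, 78, 84, 82, 75, 85, 81])
```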


[Step plot of F*_X(x) = Pr{X ≤ x} versus x, with jumps at the observed values x1, . . . , x5.]

Figure 2: Example of the statistical distribution function.

• the black dots denote the values of the function at the points of discontinuity;

• if we define the statistical CDF as Pr{X < x}, this graph will be different.

Note the following:
• assume RV X is continuous;
• as the number of experiments increases, the empirical CDF tends to the CDF of RV X.

[Step plot of F*_X(x) = Pr{X ≤ x} approaching the smooth CDF of X.]

Figure 3: Trend of the statistical CDF to the CDF of a continuous RV X.


[Four plots of statistical CDFs F_i(∆) versus i∆: SPDF built from 100 random numbers with Normal(15,30), and SPDF built from 1000 random numbers with Normal(15,30); the larger sample yields a visibly smoother CDF.]


2.2. Grouped statistical set and histogram of relative frequencies


Instead of the statistical CDF, one may consider:
• a grouped statistical set and/or
• a histogram of relative frequencies.
Note: the latter is often called just a histogram.
Assume that we have measured RV X K times:
• we got outcomes x_k, k = 1, 2, . . . , K.
Let us define the following:

x_max = max_{∀k} x_k,  x_min = min_{∀k} x_k,  (4)

• i.e., the largest and smallest observed values.

We can identify the following property:
• all values of the measured RV X are between x_min and x_max, including both.

To define a grouped statistical set we have to do the following:
• consider the part of the 0X axis between x_min and x_max, including both;
• divide this part of the axis into N non-overlapping intervals.
Often these intervals are made equal to each other (this is not, however, required):

L = (x_max − x_min)/N.  (5)

Definition: a grouped statistical set is a table where:
• the 1st row contains the intervals;
• the 2nd row contains the corresponding frequencies.

x1–x2   x2–x3   x3–x4   x4–x5   . . .   xi–xi+1   . . .   xN−1–xN
p*_1    p*_2    p*_3    p*_4    . . .   p*_i      . . .   p*_N

Figure 4: Grouped statistical set.

The frequency p*_i of an event is the ratio of:
• the number of experiments l_i in which X ∈ [x_i, x_{i+1});
• to the overall number of experiments N:

p*_i = l_i/N.  (6)

– these frequencies are sometimes called relative frequencies.
For any grouped statistical set the following condition holds:
• all frequencies in the grouped statistical set must sum up to 1:

Σ_{i=1}^{N} p*_i = 1.  (7)

Note the following:

• if RV X is continuous and some observation falls exactly on the boundary between two intervals (e.g., due to rounding):
– one may assign it to either neighboring interval;
• if RV X is discrete we MAY get an analogue of the PF (we should count each distinct value).

How to get an analogue of the PF if RV X is discrete:
• assume we have x1, x2, . . . , xN observations of RV X;
• determine max_i x_i and min_i x_i;
• determine the number of histogram bins as l = max_i x_i;
• determine the vector of histogram bin edges as:

min_i x_i + j (max_i x_i − min_i x_i)/max_i x_i,  j = 0, 1, . . . , l.  (8)

– the length of a bin is then:

(max_i x_i − min_i x_i)/max_i x_i.  (9)

• classify observations into these bins;
• construct the histogram.
Note: works well when N is large (N > 1000).
The idea: construct intervals such that only one distinct value falls in each interval.
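A minimal sketch of this counting idea in Python/NumPy (function name ours; for non-negative integer data, counting each distinct value directly replaces the explicit bin construction):

```python
import numpy as np

def discrete_pf_estimate(x):
    """Estimate the PF of a discrete RV from non-negative integer data."""
    x = np.asarray(x, dtype=int)
    counts = np.bincount(x)        # occurrences of 0, 1, ..., max(x)
    return counts / len(x)         # relative frequencies, summing to 1

rng = np.random.default_rng(1)
sample = rng.geometric(0.1, size=10000) - 1   # support {0, 1, ...}
pf = discrete_pf_estimate(sample)
```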


[Plots: histogram (analogue of the PF) of 100 random numbers with Geom(0.1) and of 10000 random numbers with Geom(0.1), each shown next to the raw sample.]
What if: we observe values of a continuous RV X?
Divide every frequency by the length of its interval to get the frequency density:

f*_i = p*_i/(x_{i+1} − x_i).  (10)

Using f*_i, i = 1, 2, . . . , N, we can now construct a histogram:

• by construction the area of each bar is equal to p*_i, i = 1, 2, . . . , N.

[Bar plot of f*_X(x) over the intervals x1, . . . , x5.]

Figure 5: An example of a histogram for a continuous RV X.


[Plots: histogram of 100 random numbers with Normal(15,30) and of 1000 random numbers with Normal(15,30), each shown next to the raw sample.]

GENERAL ALGORITHM for the histogram of relative frequencies:
• assume we have x1, x2, . . . , xN observations of RV X;
• determine max_i x_i and min_i x_i;
• determine the number of histogram bins as l = int(1.72 N^{1/3});
• determine the vector of histogram bin edges as:

min_i x_i + j (max_i x_i − min_i x_i)/l,  j = 0, 1, . . . , l.  (11)

– the length of a bin is then:

(max_i x_i − min_i x_i)/l.  (12)

• classify observations into these bins;
• divide the relative frequency of each bin by the length of the bin to get f*_i;
• construct the histogram.
Note: works well when N is relatively small or RV X is continuous.
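A sketch of this general algorithm in Python/NumPy (function name ours; np.histogram does the binning, and only the bin-count rule and the density normalization of eq. (10) come from the slides):

```python
import numpy as np

def relative_frequency_histogram(x):
    """Histogram of frequency densities f*_i with l = int(1.72 N^(1/3)) bins."""
    x = np.asarray(x, dtype=float)
    n_bins = int(1.72 * len(x) ** (1.0 / 3.0))
    counts, edges = np.histogram(x, bins=n_bins)    # equal-width bins
    densities = counts / (len(x) * np.diff(edges))  # f*_i = p*_i / bin length
    return densities, edges

rng = np.random.default_rng(1)
f_star, edges = relative_frequency_histogram(rng.normal(15, 30, size=1000))
```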


[Plots: histograms built with the general algorithm from 100 and from 10000 random numbers with Geom(0.1), each shown next to the raw sample.]


3. Estimating parameters of the sample


Why we may turn to parameters only:
• statistical distributions are difficult to estimate:
– we may not be given enough statistical data to deal with (n < 100);
– something may have to be done on-line.
• to approximate distributions using the method of moments;
• sometimes it is not needed to determine a distribution:
– the distribution law may be known in advance;
– the distribution is defined using a certain parameter.
• we may be interested in a certain parameter only.
To proceed, we assume:
• we consider RV X;
• we observed N values of RV: Xi , i = 1, 2, . . . , N .


3.1. Point and interval estimators


Notation:
• a is some parameter whose value we want to estimate;
• ā is the estimate of this parameter.
Important note: the estimate is a function of the experimental data:

ā = φ(X1, X2, . . . , XN).  (13)

There are two types of estimators for parameters:

• point estimators:
– the goal is to give the best possible single estimate of the parameter;
– the value of parameter a is most likely ā.
• interval estimators:
– the sample is always of limited size: confidence intervals must be provided;
– with confidence level (1 − α) (or with risk α) parameter a ∈ [a_min, a_max].


4. Point estimators of parameters


The sample mean:

m = (1/N) Σ_{i=1}^{N} X_i.  (14)

The sample variance:

σ̄²[X] = (1/N) Σ_{i=1}^{N} (X_i − m)².  (15)

The k-th moment and k-th central moment of the sample:

ᾱ_k = (1/N) Σ_{i=1}^{N} X_i^k,  µ̄_k = (1/N) Σ_{i=1}^{N} (X_i − m)^k.  (16)
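A small sketch of these point estimators in Python/NumPy (function name ours; it deliberately uses the 1/N variance from (15), whose bias is examined in Section 4.4):

```python
import numpy as np

def sample_moments(x, k=3):
    """Point estimators (14)-(16): mean, variance (1/N form), k-th moments."""
    x = np.asarray(x, dtype=float)
    m = x.mean()                        # sample mean, eq. (14)
    var = np.mean((x - m) ** 2)         # sample variance, eq. (15)
    alpha_k = np.mean(x ** k)           # k-th moment, eq. (16)
    mu_k = np.mean((x - m) ** k)        # k-th central moment, eq. (16)
    return m, var, alpha_k, mu_k
```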

Major questions we have to answer:

• can we use these estimators?
• how closely can we estimate the value of a parameter?


4.1. Mean is a RV!


Assumptions:
• a finite set of measurements of RV X: X_i, i = 1, 2, . . . , N;
• the measurements X_i, i = 1, 2, . . . , N, are independent;
• RV X is characterized by mean E[X] and variance σ²[X].
Central limit theorem in statistics:
• the sample mean m is an (approximately) normally distributed RV;
• the mean and variance of the sample mean are given by:

E[m] = E[(Σ_{i=1}^{N} X_i)/N] = E[X],
σ²[m] = E[(m − E[X])²] = σ²[X]/N.  (17)

Note: it does not matter what distribution RV X follows.

Consider the mean of m:

E[m] = E[(1/N)(X1 + · · · + XN)]
     = (1/N)(E[X1] + · · · + E[XN])
     = (1/N) N E[X] = E[X].  (18)

Consider the variance of m:

σ²[m] = D[(Σ_{i=1}^{N} X_i)/N] = D[(1/N)(X1 + · · · + XN)]
      = D[X1/N + · · · + XN/N] = (1/N²)(D[X1] + · · · + D[XN])
      = (1/N²) N σ²[X] = σ²[X]/N.  (19)

Note: approximation is good enough when N > 30.


4.2. Requirements for point estimators


The estimator of a, denoted by ā, should be:
• consistent: ā should converge to a as N increases:

ā → a, N → ∞.  (20)

• unbiased: it should not allow a systematic bias (error):

E[ā] = a.  (21)

• effective: ā should be as little stochastic as possible:

σ²[ā] → 0, N → ∞.  (22)

A consistent, unbiased, effective estimator is called absolutely correct.


Note the following:
• sometimes non-effective estimators are used: they have higher variance;
• sometimes even biased estimators are used.


4.3. Point estimator of the mean


The natural estimate of the mean is:

m = (1/N) Σ_{i=1}^{N} X_i.  (23)

Is this estimate consistent (ā → a, N → ∞)? Yes:

lim_{N→∞} (1/N) Σ_{i=1}^{N} X_i = E[X].  (24)

Is this estimate unbiased (E[ā] = a)? Yes:

E[m] = E[(1/N) Σ_{i=1}^{N} X_i] = (1/N) Σ_{i=1}^{N} E[X_i] = (1/N) N E[X] = E[X].  (25)

Is this estimate effective (σ²[ā] → 0, N → ∞)? Yes:

σ²[m] = (1/N²) Σ_{i=1}^{N} σ²[X_i] = (1/N²) N σ²[X] = σ²[X]/N.  (26)


4.4. Point estimator of the variance


The natural estimator:

σ̄²[X] = (1/N) Σ_{i=1}^{N} (X_i − m)².  (27)

Is this estimate consistent (ā → a, N → ∞)? Yes:

• let us express it via the second moment:

σ̄²[X] = (1/N) Σ_{i=1}^{N} (X_i − m)² = (1/N) Σ_{i=1}^{N} X_i² − m².  (28)

• the first term is the sample mean of X², which tends in probability to E[X²];
• the second term m² tends in probability to E²[X];
• finally, we find that the estimator is consistent:

lim_{N→∞} σ̄²[X] = E[X²] − E²[X] = α₂[X] − E²[X] = σ²[X].  (29)

Is this estimate unbiased (E[ā] = a)? No:
• substitute the estimate of the mean into the estimate of the variance to get:

σ̄²[X] = (1/N) Σ_{i=1}^{N} X_i² − ((1/N) Σ_{i=1}^{N} X_i)²
       = ((N−1)/N²) Σ_{i=1}^{N} X_i² − (2/N²) Σ_{i<j} X_i X_j.  (30)

• centralize the measurements, choosing the origin of the 0X axis at E[X]:

– the variance does not depend on where we choose the origin of the 0X axis;
– denote the centralized measurements by X°_i;
– rewriting, we get:

σ̄²[X] = ((N−1)/N²) Σ_{i=1}^{N} X°_i² − (2/N²) Σ_{i<j} X°_i X°_j.  (31)

• determine the mean of the estimate σ̄²[X]:

E[σ̄²[X]] = ((N−1)/N²) Σ_{i=1}^{N} E[X°_i²] − (2/N²) Σ_{i<j} E[X°_i X°_j].  (32)

• note that for any measurement X°_i, i = 1, 2, . . . , N, we have:

E[X°_i²] = σ²[X], i = 1, 2, . . . , N.  (33)

• since all measurements were assumed to be independent, we have:

E[X°_i X°_j] = 0, i ≠ j.  (34)

• substituting, we get:

E[σ̄²[X]] = ((N−1)/N²) N σ²[X] = ((N−1)/N) σ²[X].  (35)

The estimate σ̄²[X] = (1/N) Σ_{i=1}^{N} (X_i − m)² is biased:
• its mean is not equal to σ²[X] but slightly less (by the factor (N − 1)/N);
• using such an estimate we make a systematic error.
To get an unbiased estimate of the variance we just multiply by N/(N − 1):

σ̄²[X] = (N/(N−1)) · (1/N) Σ_{i=1}^{N} (X_i − m)².  (36)

The final expression for the estimate of the variance is now given by:

σ̄²[X] = (1/(N−1)) Σ_{i=1}^{N} (X_i − m)².  (37)

Important note:
• the multiplier N/(N − 1) must be taken into account whenever N < 50;
• as N increases, the multiplier N/(N − 1) tends to one and can be dropped.
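In NumPy this correction is the ddof argument; a two-line sketch:

```python
import numpy as np

x = np.random.default_rng(1).normal(15, 30, size=40)
biased = np.var(x)            # 1/N form, eq. (27)
unbiased = np.var(x, ddof=1)  # 1/(N-1) form, eq. (37)
```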


4.5. Point estimator of the covariance


Assume that we have obtained data on two RVs X and Y:

(X1, Y1), (X2, Y2), . . . , (Xi, Yi), . . . , (XN, YN).  (38)

Then a consistent and unbiased estimate of the covariance is:

K̄_XY = (1/(N−1)) Σ_{i=1}^{N} (X_i − m_X)(Y_i − m_Y),  (39)

• where

m_X = (1/N) Σ_{i=1}^{N} X_i,  m_Y = (1/N) Σ_{i=1}^{N} Y_i.  (40)

Then a consistent estimate of the correlation coefficient is:

r̄_XY = K̄_XY / (σ̄[X] σ̄[Y]).  (41)
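A quick sketch with NumPy's built-ins, which already apply the 1/(N − 1) correction:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=200)
y = 0.5 * x + rng.normal(0.0, 1.0, size=200)

K_xy = np.cov(x, y)[0, 1]       # covariance estimate, eq. (39)
r_xy = np.corrcoef(x, y)[0, 1]  # correlation coefficient estimate, eq. (41)
```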


5. Interval estimators of parameters


Why interval estimators are needed:
• the estimates of the mean, variance, covariance are all RVs!
• sometimes it is needed to say how precisely we can estimate some parameter.
The idea:
• we get an estimate θ̄ of some parameter θ;
• recall that θ̄ is a random function of the experiments;
• we want to say that we are, e.g., 95% confident that θ ∈ (θ1, θ2): Pr{θ1 < θ < θ2} = γ;
– where γ = 1 − α must be close to 1;
– α: the level of significance (usually set to 0.05).
[Plot: density of the estimate with central area 1 − α between θ1 and θ2 and tails of area α/2 on each side.]


5.1. Confidence interval for the mean


Assume we have observations X1, X2, . . . , XN with known variance σ²:
• first we find the point estimator of the mean (sample mean):

m = (1/N) Σ_{i=1}^{N} X_i.  (42)

• the sample mean is a normal RV! Standardizing to N(0, 1) via Z = (X − µ)/σ, we have that

(Σ_{i=1}^{N} X_i − Nµ) / (σ√N)  (43)

– is approximately a standard normal variable N(0, 1);
– where µ is the actual mean.
• let z_α be the (1 − α)-quantile of N(0, 1) (i.e., Pr{Z ≤ z_α} = 1 − α), that is:

Pr{Z > z_α} = α.  (44)

• for example, with α = 0.05 (often used), z_{α/2} = z_{0.025} = 1.96.

• choosing α small enough and observing that N(0, 1) is symmetric:

Pr{−z_{α/2} ≤ (Σ_{i=1}^{N} X_i − Nµ)/(σ√N) ≤ z_{α/2}} ≈ 1 − α,  (45)

• which is equivalent to:

Pr{m − z_{α/2} σ[X]/√N ≤ µ ≤ m + z_{α/2} σ[X]/√N} ≈ 1 − α,  (46)

– substitute your values to get a 100(1 − α)% confidence interval for µ;
– where −z_{α/2} and z_{α/2} are the lower and upper critical values of N(0, 1).

The previous result is correct if:
• the observations X1, X2, . . . , XN are normally distributed;
• the variance is known.
What if the variance is unknown?
• use the estimate of the variance σ̄²;
• the standardized mean now has Student's t distribution with (N − 1) degrees of freedom (the number of observations minus one);
• confidence intervals are obtained by replacing z_{α/2} with t_{N−1,α/2}, as in the sketch below.
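A sketch of both constructions with SciPy (function name ours; a z-interval when σ is known, a t-interval otherwise):

```python
import numpy as np
from scipy import stats

def mean_confidence_interval(x, alpha=0.05, sigma=None):
    """100(1 - alpha)% CI for the mean, cf. eq. (46)."""
    x = np.asarray(x, dtype=float)
    n, m = len(x), x.mean()
    if sigma is not None:                  # known variance: normal quantile
        half = stats.norm.ppf(1 - alpha / 2) * sigma / np.sqrt(n)
    else:                                  # estimated variance: t_{N-1} quantile
        half = stats.t.ppf(1 - alpha / 2, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
    return m - half, m + half
```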
The length of the intervals:
• what we want: to make them as small as possible;
• approach 1: increase the number of experiments N;
• approach 2: decrease the sample variance σ̄²:
– a very effective way;
– you shorten the intervals without running more experiments!
• we will consider techniques to reduce the variance without increasing N.


6. Criteria of fitting accuracy


What we want to do:
• we test whether a sample has a certain distribution;
• the distribution is usually assumed to be analytical.
What we can do:
• compare the CDF F(x) with the empirical CDF F*(x);
• compare the pdf f(x) or pmf p_i with the histogram f*(x).
What tests are available:
• the χ² test:
– compares the pdf f(x) with the empirical pdf f*(x);
• Kolmogorov's test:
– compares the CDF F(x) with the empirical CDF F*(x).
General approach: hypothesis testing!


6.1. χ² test: discrete distribution


After the parameter matching procedure we must check the accuracy of the fitting.

We formulate it as follows: do the data belong to the chosen distribution?

• to answer such questions we use a fitting criterion;
• we consider one of the most popular ones: Pearson's χ² (chi-square) criterion.

To proceed, we assume that:

• the random variable X under consideration is a discrete one with possible values {x1, x2, . . . , xk};
• we performed n independent experiments in which X took on values from the set {x1, x2, . . . , xk};
• n is the whole number of experiments;
• n_i, i = 1, 2, . . . , k, is the number of experiments where the event {X = x_i} occurred;
• p*_i = n_i/n, i = 1, 2, . . . , k, are the frequencies of the events {X = x_i}.

Based on the statistical data we construct the following grouped statistical set for RV X.

x1     x2     x3     x4     . . .   xi     . . .   xk
p*_1   p*_2   p*_3   p*_4   . . .   p*_i   . . .   p*_k

Figure 6: Statistical distribution set.

Let us take the null hypothesis H0 stating that RV X has the following PF.

x1   x2   x3   x4   . . .   xi   . . .   xk
p1   p2   p3   p4   . . .   pi   . . .   pk

Figure 7: Hypothetical probability function.

• under H0, deviations of the frequencies p*_i from the probabilities p_i are due to stochastic reasons only.

Note: to verify or reject this hypothesis we have to define a certain measure of deviation.

As a measure of deviation R, one may choose:
• a sum of squared deviations (p*_i − p_i) with certain 'weights' c_i, i = 1, 2, . . . , k, as follows:

R = Σ_{i=1}^{k} c_i (p*_i − p_i)².  (47)

Deviations associated with different p_i, i = 1, 2, . . . , k, are not equal in their impacts:

• the same absolute deviation (p*_i − p_i) may be of little significance if p_i is large;
• contrarily, it may be very significant if p_i is small;
• to account for this, the coefficients c_i, i = 1, 2, . . . , k, are introduced;
• the coefficients c_i should be inversely proportional to p_i, i = 1, 2, . . . , k.
Question: how to choose c_i, i = 1, 2, . . . , k?
Pearson proposed to choose c_i as follows:

c_i = n/p_i, i = 1, 2, . . . , k,  (48)

• where n is the number of experiments.

If the c_i are chosen this way, the distribution of RV R has the following properties:
• it does not depend on the distribution law of RV X under consideration;
• it depends only little on the number of experiments n;
• it depends on the number of bins k;
• as n increases it tends to the χ² distribution.
The measure of deviation R is then denoted by χ² and given by:

χ² = Σ_{i=1}^{k} (n/p_i) (n_i/n − p_i)².  (49)

Taking n/p_i into the sum we have:

R = χ² = Σ_{i=1}^{k} (n_i − n p_i)²/(n p_i),  (50)

• which is the final expression for the χ² criterion.

The χ² distribution depends on the number of degrees of freedom r:

r = k − l − 1,  (51)

• k is the number of histogram bins;

• l is the number of constraints (bindings) applied to the observations:
– we always estimate frequencies from statistical data:

p*_1 + p*_2 + · · · + p*_k = 1, that is why r = k − 1!  (52)

– if we also estimated the mean to fit our distribution:

Σ_{i=1}^{k} x_i p*_i = Ē[X], in this case r = k − 1 − 1 = k − 2!  (53)

– if we also estimated the variance to fit our distribution:

Σ_{i=1}^{k} (x_i − Ē[X])² p*_i = D̄[X], in this case r = k − 1 − 2 = k − 3!  (54)

Note: estimation of parameters is required to parameterize the hypothetical distribution.

The χ² distribution is extensively tabulated in the literature in the following form:
• input: the value of the χ²_{1−α}(r) statistic;
• input: the number of degrees of freedom r;
• output: the probability that a RV distributed according to χ² exceeds the given value.

Null hypothesis H0: the statistical data are taken from distribution F(x).

The procedure of checking the null hypothesis is as follows:

• if the computed χ² is less than χ²_{1−α}(r) determined from tables:
– H0 must be accepted;
– the deviation between the statistical and hypothetical distributions is not significant;
• if the computed χ² is greater than or equal to χ²_{1−α}(r) determined from tables:
– H0 must be rejected;
– the deviation between the statistical and hypothetical distributions is significant.


6.2. χ² test: continuous distribution


How to: construct the grouped statistical set (histogram):
• where ∆_i, i = 1, 2, . . . , k, are the intervals.

∆1     ∆2     ∆3     ∆4     . . .   ∆i     . . .   ∆k
p*_1   p*_2   p*_3   p*_4   . . .   p*_i   . . .   p*_k

Figure 8: Grouped statistical set.

• p*_i is the frequency of RV X falling in the i-th bin, defined as follows:

p*_i = n_i/n.  (55)

• then proceed similarly to the discrete RV case, replacing the probabilities as follows:

p_i = ∫_{x_i}^{x_{i+1}} f(x) dx.  (56)

Note: observations of a discrete RV are also often grouped!

The procedure:
• choose the significance level α;
• determine the number of bins using either:

k = 1.72 N^{1/3}, or k = 1 + 3.3 ln N.  (57)

• count the frequencies n_i, i = 1, 2, . . . , k;
• evaluate the hypothetical function F(x) to get p_i = Pr{X ∈ ∆_i}, i = 1, 2, . . . , k;
• using tables, get the quantile χ²_{1−α}(k − 1);
• estimate the χ² statistic as:

χ² = Σ_{i=1}^{k} (n_i − n p_i)²/(n p_i).  (58)

• compare χ² and the quantile χ²_{1−α}(k − 1):

– χ² < χ²_{1−α}(k − 1): H0 is accepted;
– χ² ≥ χ²_{1−α}(k − 1): H0 is rejected.
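A sketch of this procedure in Python/SciPy (function name ours; the hypothetical CDF is passed in fully parameterized, and the small-np_i bin-joining rule used in the next example is left out for brevity):

```python
import numpy as np
from scipy import stats

def chi2_gof_test(x, cdf, alpha=0.05, n_params=0):
    """Pearson chi-square goodness-of-fit test against a hypothetical CDF."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = int(1.72 * n ** (1.0 / 3.0))                 # bins, eq. (57)
    counts, edges = np.histogram(x, bins=k)
    p = cdf(edges[1:]) - cdf(edges[:-1])             # p_i, eq. (56)
    chi2 = np.sum((counts - n * p) ** 2 / (n * p))   # statistic, eq. (58)
    critical = stats.chi2.ppf(1 - alpha, df=k - 1 - n_params)
    return chi2, critical, chi2 < critical           # True -> accept H0

rng = np.random.default_rng(1)
result = chi2_gof_test(rng.exponential(1.0, size=1000), stats.expon.cdf)
```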

Example: H0: the data follow a Poisson distribution:
• we have 200 observations.
How we proceed:

• estimate the mean from the sample: Ē[X] = 1;

• get the probabilities of the Poisson distribution p_i = (λ^i/i!)e^{−λ} with λ = Ē[X]:

p0 = p1 = 0.368, p2 = 0.184, p3 = 0.061, p4 = 0.015, p5 = 0.003,  (59)

– note: np5 = 0.6 < 5 and np4 = 3 < 5, while np3 = 12.2 > 5;
– we have to join the three last intervals (important empirical rule: each bin should have np_i ≥ 5!).
• determine the number of degrees of freedom: r = k − l − 1 = 4 − 1 − 1 = 2;
• choosing α = 0.05, get χ²_{1−α}(r) = χ²_{0.95}(2) = 5.99;
• estimate the χ² statistic;
– since χ² = 0.9 < χ²_{0.95}(2) = 5.99, we accept H0.


Figure 9: Data for the χ² test.

Figure 10: Computing the χ² test.


6.3. Criteria of fitting accuracy: Kolmogorov test


Assume the following:
• the hypothetical CDF F(x) is fully defined (parameters must be known in advance);
• the hypothetical CDF F(x) is continuous.
The Kolmogorov criterion:

D_n = sup_x |F*_n(x) − F(x)|,  (60)

• sup_x is the least upper bound, practically the maximum;

• F(x) is the hypothetical CDF;
• F*_n(x) is the statistical CDF.
Kolmogorov determined the limiting statistics:

Pr{√n D_n < x} → K(x) = { Σ_{k=−∞}^{∞} (−1)^k e^{−2k²x²},  x > 0;  0,  x ≤ 0 },  (61)

• which is tabulated in the literature.

The procedure:
• estimate the statistical CDF F*_n(x);
• evaluate the CDF F(x) at the points a_{i−1}, i = 1, 2, . . . , k:
– a_{i−1} are the left end points of the intervals ∆_i, i = 1, 2, . . . , k;
– ∆_i, i = 1, 2, . . . , k, are the histogram bins.
• estimate d_i = |F*_n(a_{i−1} + 0) − F(a_{i−1})|, i = 1, 2, . . . , k;

• determine D_n = max_i d_i and λ = √n D_n;
• choose α and find the root K_{1−α} of K(x) = 1 − α;
• compare λ and K_{1−α}:
– λ ≤ K_{1−α}: H0 is accepted;
– λ > K_{1−α}: H0 is rejected.
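SciPy packages the same construction as scipy.stats.kstest; a minimal sketch (comparing the p-value with α replaces the table lookup of K_{1−α}):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(1.0, size=100)

# H0: the data follow Exp(1); parameters fixed in advance, as required
d_n, p_value = stats.kstest(x, stats.expon.cdf)
accept_h0 = p_value > 0.05
```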

Example: using two samples with m = n = 100, check H0 that the data are exponential:
• use the first sample to estimate the mean: Ē[X] = 1.01;
• determine the parameter of the exponential distribution: λ = 1/Ē[X] = 1/1.01 = 0.99.

Figure 11: Two data sets for testing fitting accuracy.

• use the second sample to construct the statistical CDF F*_n(x) by consecutive summing;
• evaluate the exponential CDF F(x);
• estimate d_i = max_{x∈∆_i} |F*_n(x) − F(x)| = |F*_n(a_{i−1} + 0) − F(a_{i−1})|;

• estimate D_n = max_i d_i = 0.13 and λ = √100 · D_n = 1.3;
• choose α = 0.05 and find the root K_{1−α} of K(x) = 1 − α = 0.95 as K_{1−α} = 1.36;
• since λ = 1.3 < K_{1−α} = 1.36, we accept H0 (BUT IT'S VERY CLOSE!).

Figure 12: Computing Kolmogorov’s test.


7. Homogeneity of samples
Homogeneity of samples:
• we test whether two samples are taken from the same distribution.
We have:
• criteria for fitting to a hypothetical distribution;
• criteria for homogeneity of samples.
Difference between the two:
• fitting: we compare a statistical distribution with an analytical one;
• homogeneity: we compare two statistical distributions.
What tests are available:
• Smirnov's test;
• the χ² test.

Lecture: Reminder of statistics 54


Network analysis and dimensioning I D.Moltchanov, TUT, 2013

7.1. Homogeneity of samples: Smirnov test


Note the following:
• often referred to as the Smirnov-Kolmogorov test:
– the correct name is the Smirnov test;
– it is an extension of the ideas of the Kolmogorov test.
Assume we have:
• a sample (x′_1, x′_2, . . . , x′_m) with statistical CDF F*_{1m}(x);
• a sample (x″_1, x″_2, . . . , x″_n) with statistical CDF F*_{2n}(x).
Question: whether these two samples are taken from the same distribution with CDF F(x).
Smirnov proposed to use:

D_mn = sup_x |F*_{1m}(x) − F*_{2n}(x)|.  (62)

Note: it looks similar to Kolmogorov's statistic but is quite different in practice.

Smirnov found that if the two samples are taken from the same distribution:

Pr{D_mn/√(1/m + 1/n) < x} → K(x) = { Σ_{k=−∞}^{∞} (−1)^k e^{−2k²x²},  x > 0;  0,  x ≤ 0 },  (63)

• K(x) is the Kolmogorov function.


The procedure:
• choose the level of significance α;
• determine K_{1−α} by solving K(x) = 1 − α;
• determine D_mn and λ = D_mn/√(1/m + 1/n);
• compare λ and K_{1−α}:
– if λ > K_{1−α}, the two samples are non-homogeneous;
– if λ ≤ K_{1−α}, the two samples are homogeneous.
Note the following:
• one can compare only two samples at once;
• the samples may have different numbers of observations.
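The two-sample construction is available as scipy.stats.ks_2samp; a minimal sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample1 = rng.exponential(1.0, size=100)
sample2 = rng.exponential(1.0, size=150)    # sizes may differ

d_mn, p_value = stats.ks_2samp(sample1, sample2)
homogeneous = p_value > 0.05   # plays the role of lambda <= K_{1-alpha}
```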

Example: comparing two samples, m = n = 100:
• compute F*_{1n}(x) and F*_{2n}(x);
• compute d_i = |F*_{1n}(z_i) − F*_{2n}(z_i)|, where z_i are the middles of the intervals ∆_i, i = 1, 2, . . . , k;
• compute Smirnov's statistic as D_nn = sup_x |F*_{1n}(x) − F*_{2n}(x)| = max_i d_i = 0.07;
• compute λ = D_nn/√(1/m + 1/n) = 0.07/√0.02 = 0.50;
• choose α = 0.05 and find the root of K(x) = 1 − α = 0.95 from tables as K_{1−α} = 1.36;
• since λ = 0.50 < K_{1−α} = 1.36, we accept H0.

Figure 13: Computing Smirnov’s test for samples.


7.2. Homogeneity of samples: χ² test


For two samples, the following statistic has the χ² distribution:

χ² = Σ_{i=1}^{k} (1/(m_i + n_i)) (√(n/m) m_i − √(m/n) n_i)²,  (64)

• m: the number of observations in the first sample (m_i of them in bin i);

• n: the number of observations in the second sample (n_i of them in bin i);
• the number of degrees of freedom is r = k − 1.
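A sketch of the two-sample statistic (64) in Python (function name ours; it assumes both samples were already binned over common intervals):

```python
import numpy as np
from scipy import stats

def chi2_homogeneity(m_counts, n_counts, alpha=0.05):
    """Two-sample chi-square homogeneity test on binned counts, eq. (64)."""
    m_counts = np.asarray(m_counts, dtype=float)
    n_counts = np.asarray(n_counts, dtype=float)
    m, n = m_counts.sum(), n_counts.sum()
    terms = (np.sqrt(n / m) * m_counts - np.sqrt(m / n) * n_counts) ** 2
    chi2 = np.sum(terms / (m_counts + n_counts))
    critical = stats.chi2.ppf(1 - alpha, df=len(m_counts) - 1)
    return chi2, critical, chi2 < critical   # True -> homogeneous (accept H0)
```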
For several samples, i.e., s samples containing n_1, n_2, . . . , n_s observations:

χ² = n (Σ_{i=1}^{k} Σ_{j=1}^{s} n_ij²/(n_j v_i) − 1),  (65)

• n = n_1 + · · · + n_s is the total number of observations;

• n_ij: the number of observations in sample j that fall into interval ∆_i, i = 1, 2, . . . , k;
• v_i: the number of observations from all samples that fall into interval ∆_i, i = 1, 2, . . . , k;
• the number of degrees of freedom is r = (k − 1)(s − 1).

Example: comparing two samples, m = n = 100:
• combine ∆11 and ∆12, since only a few observations fall there;
• thus we have k = 11, r = k − 1 = 10;
• assuming α = 0.05, we find from tables the quantile χ²_{1−α}(r) = χ²_{0.95}(10) = 18.3;
• computing the statistic, χ² = 10.25 < χ²_{0.95}(10) = 18.3, so we accept H0.

Figure 14: Computing the χ² test for samples.


7.3. Other criteria of homogeneity


We considered Smirnov's and χ² tests:
• powerful and most widely used;
• usually both are used for better assurance.
Wilcoxon, Mann and Whitney:
• proposed by Wilcoxon in 1945 for two samples with m = n, extended by Mann and Whitney to m ≠ n;
• idea: construct the ordered statistical set and compare the ranks (positions in order) of the observations.
Criterion of signs: can be used only for two samples with m = n:
• compute the numbers of signs (+/−) of z_i = x_i − y_i, i = 1, 2, . . . , n;
• if the samples are homogeneous, + and − occur with the same frequency;
• H0: (z_1, z_2, . . . , z_n) is taken from a distribution with median 0 (each sign occurring with probability 0.5).
Simple check: compare the variances and means of two (or more) samples:
• their equality is a mandatory (necessary) requirement for homogeneity.


8. Tests for autocorrelations and white noise


Note the following:
• most tests are biased:
– reason: ρ̄(1), ρ̄(2), . . . are not independently distributed!
• what to do:
– still use these tests;
– have enough intuition to judge. . .

The following tests are available:

• intuitive suggestion;
• rule of thumb;
• turning point test;
• 'portmanteau' test by Box and Pierce;
• modified 'portmanteau' test by Ljung and Box.


8.1. Intuitive suggestion


Use the following:
• observe the plot of the ACF;
• if there are significant values ρ̄(i), say that the data are more complex than white noise.

[Plots of the sample ACF K_Y(i) versus lag i: (a) AR(1) model with K(1) = 0.6; (b) AR(1) model with K(1) = 0.0.]

Figure 15: Suggesting correlation and no correlation.


8.2. Rule of thumb


Rule of thumb:
• the NACF value at lag i is negligible if its estimate ρ̄_i satisfies −2/√n < ρ̄_i < 2/√n;
• here 2 is an approximation of 1.96 (α = 0.05).
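A sketch of this rule in Python/NumPy (sample_acf is our helper implementing the standard sample autocorrelation):

```python
import numpy as np

def sample_acf(x, max_lag=20):
    """Sample autocorrelation function at lags 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    c0 = np.sum(x * x) / n
    return np.array([np.sum(x[:n - k] * x[k:]) / (n * c0)
                     for k in range(1, max_lag + 1)])

x = np.random.default_rng(1).normal(size=100)
rho = sample_acf(x)
negligible = np.abs(rho) < 2 / np.sqrt(len(x))   # rule-of-thumb bound
```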

[Plots of the sample ACF K_Y(i) versus lag i: (a) AR(1) model with K(1) = 0.6; (b) AR(1) model with K(1) = 0.0.]

Figure 16: Suggesting correlation and no correlation (n = 100).


8.3. Turning point test


Testing whether a given series is white noise:
• H0: the process is white noise;
• H1: the process is not white noise.

Note the following:

• this test does not signal autocorrelation as such:
– autocorrelation is a measure of linear dependence in a process;
– there could be other types of dependence.
• it signals whether a process is purely random: white noise.

White noise: a sequence of iid random variables with finite mean and variance.

Do the following:
• consider the successive triples (X_i, X_{i+1}, X_{i+2}), i = 1, 2, . . . , n − 2, from the observations;
• white noise: the values are equally likely to occur in any of the 6 possible orders.

HML HLM MHL MLH LMH LHM

Figure 17: Possible orders of within a triple of values.

The test is as follows:
• in four of the six cases there is a turning point in the middle:
– white noise: there should be about (2/3)(n − 2) such points among n observations.
• for large n the number of turning points is distributed as N(2n/3, 8n/45);
• reject H0 at α = 0.05 if the number of turning points is outside 2n/3 ± 1.96 √(8n/45), as in the sketch below.
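A sketch of the turning point test in Python/NumPy (function name ours; it uses the N(2n/3, 8n/45) approximation from the slide):

```python
import numpy as np

def turning_point_test(x, z=1.96):
    """Turning point test for white noise at alpha = 0.05 (z = 1.96)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mid = x[1:-1]
    turns = np.sum(((mid > x[:-2]) & (mid > x[2:])) |
                   ((mid < x[:-2]) & (mid < x[2:])))
    mean, std = 2 * n / 3, np.sqrt(8 * n / 45)
    return abs(turns - mean) <= z * std   # True -> accept H0 (white noise)

is_white = turning_point_test(np.random.default_rng(1).normal(size=1000))
```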


8.4. ’Portmanteau’ test


We test the hypothesis that several autocorrelations are jointly zero:
• H0: ρ1 = ρ2 = · · · = ρm = 0;
• H1: ρi ≠ 0 for some i ∈ {1, 2, . . . , m}.
Box and Pierce statistic:

Q(m) = n Σ_{i=1}^{m} ρ̄_i²,  (66)

• n is the number of observations;

• m should be such that m ≪ n;
• this statistic approximately has the χ² distribution with m degrees of freedom.
Note the following:
• the test is biased: the χ² distribution is only an approximation;
• perform it for several m to make sure (a sketch follows below).
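A sketch of the Box-Pierce statistic, reusing the sample_acf helper from Section 8.2:

```python
import numpy as np
from scipy import stats

def box_pierce(x, m=10, alpha=0.05):
    """Box-Pierce portmanteau test, eq. (66); needs sample_acf from above."""
    rho = sample_acf(x, max_lag=m)
    q = len(x) * np.sum(rho ** 2)
    critical = stats.chi2.ppf(1 - alpha, df=m)
    return q, critical, q < critical   # True -> accept H0
```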


(a) AR(1) model, ρ1 = 0.6 (ACF plot omitted):
χ²_{α=0.05}(5) = 11.07 < Q(5) = 49.013: H0 rejected;
χ²_{α=0.05}(10) = 18.31 < Q(10) = 51.416: H0 rejected;
χ²_{α=0.05}(15) = 25.00 < Q(15) = 55.332: H0 rejected.

(b) AR(1) model, ρ1 = 0.0 (ACF plot omitted):
χ²_{α=0.05}(5) = 11.07 > Q(5) = 4.193: H0 accepted;
χ²_{α=0.05}(10) = 18.31 > Q(10) = 8.416: H0 accepted;
χ²_{α=0.05}(15) = 25.00 > Q(15) = 11.499: H0 accepted.

Figure 18: Example of the 'portmanteau' test for traces generated from an AR(1) model.


8.5. Modified ’Portmanteau’ test


Ljung and Box statistic:

Q(m) = n(n + 2) Σ_{i=1}^{m} ρ̄_i²/(n − i).  (67)

• modification: ρ̄_i has variance (n − i)/(n(n + 2)).


Practical considerations:
• the choice of m affects the performance of the Q(m) statistic;
• several values of m are often used;
• empirically, m ≈ ln n provides better performance (a sketch follows below).
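The corresponding one-line change to the Box-Pierce sketch (statsmodels also ships this test as acorr_ljungbox, if that library is available to you):

```python
import numpy as np
from scipy import stats

def ljung_box(x, m=10, alpha=0.05):
    """Ljung-Box modified portmanteau test, eq. (67)."""
    n = len(x)
    rho = sample_acf(x, max_lag=m)     # helper from Section 8.2
    q = n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, m + 1)))
    critical = stats.chi2.ppf(1 - alpha, df=m)
    return q, critical, q < critical   # True -> accept H0
```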
Important notes:
• the tests should always be performed if one suspects there is autocorrelation;
• in modern teletraffic studies the tests are rarely performed!.. Reasons?
– n is usually very big (the MPEG trace we considered has ∼ 40000 frame sizes);
– everybody wants to see what he/she wants to see. . .


(a) AR(1) model, ρ1 = 0.6 (ACF plot omitted):
χ²_{α=0.05}(5) = 11.07 < Q(5) = 50.789: H0 rejected;
χ²_{α=0.05}(10) = 18.31 < Q(10) = 53.467: H0 rejected;
χ²_{α=0.05}(15) = 25.00 < Q(15) = 58.028: H0 rejected.

(b) AR(1) model, ρ1 = 0.0 (ACF plot omitted):
χ²_{α=0.05}(5) = 11.07 > Q(5) = 4.454: H0 accepted;
χ²_{α=0.05}(10) = 18.31 > Q(10) = 9.105: H0 accepted;
χ²_{α=0.05}(15) = 25.00 > Q(15) = 12.66: H0 accepted.

Figure 19: Example of the modified 'portmanteau' test for a trace generated from an AR(1) model.


9. Tests for stationarity


Note the following:
• we usually want to know whether the observations are (at least) weakly stationary;
• there are no fully effective tests for stationarity.
Obvious way to test:
• divide the set of observations into two or more segments;
• compare statistics of the segments.
What statistics to compare:
• recall the following:
– strictly stationary process: N-dimensional shifted distributions are the same for all N;
– weakly stationary process: the mean and the ACF are constant (shift-invariant).
• sufficient evidence that observations are taken from a non-stationary process:
– different means;
– different ACFs.


9.1. Testing for non-strictly-stationary behavior


Note the following:
• strictly stationary: N-dimensional shifted distributions are the same for all N;
• start by testing whether the one-dimensional distributions are the same.
[Plot of the SNR process Y(i) versus sample index i, 0 ≤ i ≤ 4000; values range roughly from 0 to 30.]

Figure 20: Signal-to-noise ratio process over IEEE 802.11b wireless channel.

Note: do you see any differences?

Proceed as follows:
• halve the trace first;
– note: you have to ensure that each half contains enough observations;
• compute the histograms;
• use e.g. the χ² test to compare the distributions (a sketch follows below).
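A sketch of this split-and-compare check, reusing the chi2_homogeneity helper from Section 7.2 (the common binning over both halves is our choice):

```python
import numpy as np

def split_and_compare(trace, n_bins=20):
    """Halve a trace and chi-square-compare the halves' histograms."""
    trace = np.asarray(trace, dtype=float)
    half = len(trace) // 2
    edges = np.histogram_bin_edges(trace, bins=n_bins)   # common bin edges
    m_counts, _ = np.histogram(trace[:half], bins=edges)
    n_counts, _ = np.histogram(trace[half:], bins=edges)
    mask = (m_counts + n_counts) > 0                     # drop empty bins
    return chi2_homogeneity(m_counts[mask], n_counts[mask])
```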

[Histograms of relative frequencies f_i(∆): (a) first half of the trace, (b) second half of the trace.]

Figure 21: Histograms of relative frequencies for both halves.


i    0   1   2   3   4    5    6    7    8    9    10   11   12   13   14   15  16  17  18  19
m_i  1   10  61  72  126  329  914  602  513  476  522  433  427  870  353  78  36  21  16  7
n_i  11  17  24  36  31   70   251  213  176  240  212  197  167  266  118  55  49  26  19  7

Figure 22: Values m_i, n_i, i = 0, 1, . . . , 19 (1.72 n^{1/3} ≈ 20).

The χ² statistic is given by:

χ² = Σ_{i=1}^{k} (1/(m_i + n_i)) (√(n/m) m_i − √(m/n) n_i)² = 4907.  (68)

• χ²_{α=0.05}(19) = 30.14 < 4907;

• H0 is rejected: the distributions are completely different.

Conclusion: since the one-dimensional distributions are different, the trace is not strictly stationary.
