
Probability & Statistics

17.07.2020
Anh Tuan Tran (Ph.D.) & Thinh Tien Nguyen (Ph.D.)
1. Central Limit Theorem
Central Limit Theorem

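Theorem:
• Let $X_1, \dots, X_n$ be i.i.d. random variables with expected value $E(X_i) = \mu$ and variance $0 < D(X_i) = \sigma^2 < +\infty$ for $i = 1, \dots, n$.
• Then the random variable
$$Z_n := \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{X_1 + \dots + X_n - n\mu}{\sigma\sqrt{n}},$$
where $\bar{X}$ is the average of $X_1, \dots, X_n$, converges in distribution to the standard normal random variable as $n \to +\infty$.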
Example:
• Toss a fair coin n times.
• Let $X_i$ be 1 if Head occurs and 0 if Tail occurs in the ith toss, for $i = 1, \dots, n$.
• $E(X_i) = p = 0.5$ and $D(X_i) = p(1 - p)$ for $i = 1, \dots, n$.
• Then the random variable
$$Z_n := \frac{X_1 + \dots + X_n - np}{\sqrt{np(1 - p)}}$$
converges in distribution to the standard normal random variable as $n \to +\infty$.
• Equivalently, for large n, $\mathrm{Binom}(n, p)$ is approximately $N(np, np(1 - p))$.
[Figures: Central Limit Theorem illustrations for n = 2, n = 5 and n = 30]
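The convergence in the coin example can be checked by simulation. The sketch below is only an illustration under stated assumptions: the function name `standardized_sums`, the seed, and the choices n = 400 and 2000 repetitions are ours, not from the slides. It draws many values of Z_n and compares their mean, standard deviation and the share falling in [−1.96, 1.96] with the standard normal values 0, 1 and about 0.95.

```python
import math
import random
import statistics


def standardized_sums(n, trials, p=0.5, seed=0):
    """Simulate `trials` values of Z_n for n Bernoulli(p) coin tosses."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    mean = n * p
    std = math.sqrt(n * p * (1 - p))
    values = []
    for _ in range(trials):
        heads = sum(rng.random() < p for _ in range(n))  # X_1 + ... + X_n
        values.append((heads - mean) / std)
    return values


z = standardized_sums(n=400, trials=2000)
inside = sum(abs(v) <= 1.96 for v in z) / len(z)
print("mean of Z_n:", round(statistics.fmean(z), 3))
print("std of Z_n:", round(statistics.pstdev(z), 3))
print("share with |Z_n| <= 1.96:", round(inside, 3), "(standard normal: about 0.95)")
```

Because Z_n only takes values on a lattice, the agreement is approximate for any finite n.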
Example:
• Roll n dice.
• Let $X_i$ be the number shown on the ith die for $i = 1, \dots, n$.
• $E(X_i) = \frac{7}{2}$ and $D(X_i) = \frac{35}{12}$ for $i = 1, \dots, n$.
• Then the random variable
$$Z_n := \frac{X_1 + \dots + X_n - \frac{7}{2}n}{\sqrt{\frac{35}{12}n}}$$
converges in distribution to the standard normal random variable as $n \to +\infty$.
• Equivalently, for large n, the sum $X_1 + \dots + X_n$ is approximately $N\left(\frac{7}{2}n, \frac{35}{12}n\right)$.
[Figures: Central Limit Theorem illustrations for n = 1, n = 2 and n = 8]
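The values 7/2 and 35/12 used for a single die follow directly from the definitions of expectation and variance. A minimal sketch with exact fractions is given below; the helper name `standardize` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction

faces = range(1, 7)
mean = sum(Fraction(k, 6) for k in faces)                 # E(X_i)
var = sum(Fraction(k * k, 6) for k in faces) - mean ** 2  # D(X_i) = E(X_i^2) - E(X_i)^2


def standardize(total, n):
    """Z_n for an observed total of n dice."""
    return (total - n * float(mean)) / (n * float(var)) ** 0.5


print(mean, var)  # exact fractions for E(X_i) and D(X_i)
```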
Example:
A bank teller serves customers standing in a queue one by one. Suppose that the service time $X_i$ for customer i has mean $E(X_i) = 2$ (minutes) and variance $D(X_i) = 1$. We assume that service times for different bank customers are independent. Let Y be the total time the bank teller spends serving 50 customers.
Find the probability that the bank teller spends between 90 and 110 minutes serving these customers.
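Solution: $Y = X_1 + \dots + X_{50}$, so $E(Y) = 2 \cdot 50 = 100$ and $D(Y) = 1 \cdot 50 = 50$. By the Central Limit Theorem,
$$P(90 \le Y \le 110) = P\left(\frac{90 - 2 \cdot 50}{\sqrt{50}} \le \frac{Y - 2 \cdot 50}{\sqrt{50}} \le \frac{110 - 2 \cdot 50}{\sqrt{50}}\right) \approx P(-\sqrt{2} \le Z \le \sqrt{2}) \approx 0.8427,$$
where $Z \sim N(0,1)$.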
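The normal approximation in this example can be evaluated numerically. The following sketch assumes only the CLT approximation of the total service time; the function name `clt_interval_probability` is ours.

```python
from math import sqrt
from statistics import NormalDist


def clt_interval_probability(n, mean, variance, low, high):
    """Approximate P(low <= X_1 + ... + X_n <= high) with the CLT."""
    total = NormalDist(mu=n * mean, sigma=sqrt(n * variance))
    return total.cdf(high) - total.cdf(low)


p = clt_interval_probability(n=50, mean=2, variance=1, low=90, high=110)
print(round(p, 4))
```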


2. Hypothesis testing
Hypothesis testing

Motivation problem 1:
• Randomly select 100 people in a city and compute their average height.
• Repeat this a few times; the recorded average heights are all close to 1.65 m.
• Is the average height of the people in the city exactly 1.65 m or not?

Motivation problem 2:
• A dataset contains the final scores of a group of 300 students.
• The group can be divided into two subgroups, boys and girls.
• The average score of each subgroup is computed: boys 7.91/10 and girls 6.96/10.
• Does sex affect the students' performance? That is, is the difference between the average scores significant?
Null and alternative hypotheses

Null hypothesis:
• The hypothesis that is often the opposite of our guess.
• Denoted by H0.

Alternative hypothesis:
• The hypothesis that is often consistent with our guess and is opposite to the null hypothesis.
• Denoted by Ha or H1.
Example 1 (One-sample test):
Two-tailed test:
Is the average height of the people of the city exactly 1.65 m?
H0 : μ = 1.65
Ha : μ ≠ 1.65
One-tailed tests (right or left):
Is the average height of the people of the city less than or equal to (respectively, greater than or equal to) 1.65 m?
Right-tailed: H0 : μ ≤ 1.65, Ha : μ > 1.65
Left-tailed: H0 : μ ≥ 1.65, Ha : μ < 1.65
Example 2 (Two-independent-samples test):
Does sex affect the students' performance? That is, is the difference between the average scores of the boys and the girls in the class really significant?
Two-tailed test:
H0 : μ1 = μ2
Ha : μ1 ≠ μ2
One-tailed tests (right or left):
Right-tailed: H0 : μ1 ≤ μ2, Ha : μ1 > μ2
Left-tailed: H0 : μ1 ≥ μ2, Ha : μ1 < μ2
Test statistic

Definition:
• A test statistic is the output of a scalar function of all the observations (data).
• The test statistic is constructed under the assumption that the null hypothesis H0 is true.
Example (One-sample test):
• We have the heights of 150 people in a city.
• A test statistic for testing whether the average height of the people in the city is 1.65 m is
$$t := \frac{\bar{x} - 1.65}{s/\sqrt{150}}$$
where
• $\bar{x}$ is the average height of the sample,
• s is the adjusted standard deviation of the sample.
Why t?
• Suppose the null hypothesis H0 is true, so that the average height of the people in the city is exactly μ0 = 1.65 (m).
• By the central limit theorem, for n large enough, approximately
$$T := \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim N(0,1)$$
where
• $\bar{X}$ is the random variable whose possible values are the average heights of the samples of size n taken from the people in the city,
• S is the random variable whose possible values are the adjusted standard deviations of the samples of size n taken from the people in the city.
p-value

Definition:
• Assume H0 is true.
• Let T be the test statistic, viewed as a random variable whose distribution is deduced from H0.
• Let t be the test statistic observed from the data.
• Then
  • Right-tailed tests: p-value = $P(T \ge t \mid H_0)$,
  • Left-tailed tests: p-value = $P(T \le t \mid H_0)$,
  • Two-tailed tests: p-value = $2 \min\{P(T \ge t \mid H_0), P(T \le t \mid H_0)\}$.
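The three cases of the definition translate directly into code. The sketch below assumes that T is continuous and, by default, standard normal under H0 (the large-sample case); the function name `p_value` and the labels "right", "left" and "two" are hypothetical names chosen for illustration.

```python
from statistics import NormalDist


def p_value(t, tail, dist=NormalDist()):
    """p-value of an observed statistic t; `dist` is the distribution of T under H0."""
    right = 1.0 - dist.cdf(t)  # P(T >= t | H0) for a continuous T
    left = dist.cdf(t)         # P(T <= t | H0)
    if tail == "right":
        return right
    if tail == "left":
        return left
    if tail == "two":
        return 2.0 * min(left, right)
    raise ValueError("tail must be 'right', 'left' or 'two'")


print(p_value(1.96, "two"))     # close to 0.05
print(p_value(1.645, "right"))  # close to 0.05
```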
[Figures: p-value for a right-tailed test, a left-tailed test and a two-tailed test (p-value = 2 times the min)]
Two types of errors
Definition:
• Type I (false positive): reject H0 when it is actually true.
• Type II (false negative): accept H0 when it is actually false.

Example:
• Type I: reject the hypothesis that the average height of the people in the city is 1.65 m when it is exactly 1.65 m.
• Type II: accept the hypothesis that the average height of the people in the city is 1.65 m when it is not.
Significance level
Definition:
• Assume H0 is true.
• The probability that H0 will be rejected (a Type I error), chosen before looking at the data, is called the significance level.
• Denoted by α.

Example:
α = 0.05 indicates a 5% risk of rejecting the hypothesis that the average height of the people in the city is 1.65 m when it is exactly 1.65 m.
[Figures: significance level for a right-tailed test, a left-tailed test and a two-tailed test]
Accepting H0

If the p-value is larger than the significance level α, we accept H0. Otherwise, we reject it.
Example:
H0 : μ ≤ 1.65
Ha : μ > 1.65
• $P(T \ge t \mid H_0)$ = p-value > α: accept H0.
• $P(T \ge t \mid H_0)$ = p-value < α: reject H0.
Here
$$T = \frac{\bar{X} - 1.65}{S/\sqrt{n}} \sim N(0,1) \quad \text{and} \quad t = \frac{\bar{x} - 1.65}{s/\sqrt{n}},$$
where $\bar{x}$ and s are respectively the mean and the adjusted standard deviation of the observed sample of (large) size n.
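To make the decision rule concrete, here is a minimal sketch of the right-tailed test above. The sample summary (n = 150, mean 1.66 m, adjusted standard deviation 0.08 m) is hypothetical and chosen only for illustration; it does not come from the slides.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical sample summary (not from the slides).
n, x_bar, s = 150, 1.66, 0.08
mu0, alpha = 1.65, 0.05

t = (x_bar - mu0) / (s / sqrt(n))  # observed test statistic
p = 1.0 - NormalDist().cdf(t)      # right-tailed p-value, P(T >= t | H0)
decision = "accept H0" if p > alpha else "reject H0"
print(f"t = {t:.3f}, p-value = {p:.4f}: {decision}")
```

With these made-up numbers the p-value stays above α = 0.05, so H0 is accepted; a larger sample mean would push the decision the other way.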
3. Useful tests
One-sample t-test

Used when we want to test whether the mean μ of the whole population equals a constant μ0.

In two-tailed tests:
H0 : μ = μ0
Ha : μ ≠ μ0

In one-tailed tests:
Right-tailed: H0 : μ ≤ μ0, Ha : μ > μ0
Left-tailed: H0 : μ ≥ μ0, Ha : μ < μ0
Test statistic:
$$t := \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \sim T(n - 1)$$
• $\bar{x}$ is the mean of the sample.
• s is the adjusted standard deviation of the sample.
• Either the sample size satisfies n > 30, or we need to assume that the whole population is normally distributed.
• $T(n - 1)$ is Student's t distribution with n − 1 degrees of freedom.
Student’s t-distribution

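Density function:
If $X \sim T(n)$, then
$$f(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{x^2}{n}\right)^{-\frac{n+1}{2}}.$$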
Gamma function:
$$\Gamma(x) := \int_0^{+\infty} y^{x-1} e^{-y}\,dy.$$
Property:
Let $X \sim T(n)$. Then for n ≥ 30, X is approximately $N(0,1)$.
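The density formula and the normal approximation for n ≥ 30 can be compared numerically. The sketch below evaluates the t density through the log-gamma function (an implementation choice of ours, to avoid overflow for large n) and prints the largest gap to the standard normal density on a grid over [−4, 4].

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist


def t_pdf(x, n):
    """Density of Student's t distribution T(n), computed on the log scale."""
    log_c = lgamma((n + 1) / 2) - lgamma(n / 2) - 0.5 * log(n * pi)
    return exp(log_c - (n + 1) / 2 * log(1 + x * x / n))


normal = NormalDist()
for n in (1, 5, 30, 100):
    gap = max(abs(t_pdf(x / 10, n) - normal.pdf(x / 10)) for x in range(-40, 41))
    print(n, round(gap, 4))  # the gap shrinks as n grows
```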
One-sample t-test

Example:
In a factory, a machine packages sugar into 1 kg packages. To check whether it works properly, workers randomly select 100 packages, with the following weights.

Weight (kg)   0.95   0.97   0.99   1.01   1.03   1.05
#Packages     9      31     40     15     3      2

t ≈ −6.92, p-value ≈ 4.522e-10
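The statistic for the sugar example can be recomputed from the frequency table. The sketch below assumes a two-tailed test of H0 : μ = 1 (the slide does not state the alternative) and obtains the tail probability of T(99) by Simpson's rule, which is our own numerical shortcut rather than a library routine; `t_upper_tail` and its parameters are hypothetical names.

```python
from math import exp, lgamma, log, pi, sqrt

weights = [0.95, 0.97, 0.99, 1.01, 1.03, 1.05]  # kg, from the table
counts = [9, 31, 40, 15, 3, 2]                  # packages per weight
mu0 = 1.0

n = sum(counts)
x_bar = sum(w * c for w, c in zip(weights, counts)) / n
s2 = sum(c * (w - x_bar) ** 2 for w, c in zip(weights, counts)) / (n - 1)  # adjusted variance
t = (x_bar - mu0) / sqrt(s2 / n)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))


def t_upper_tail(x, df, width=60.0, steps=60000):
    """P(T >= x) by Simpson's rule on [x, x + width]; mass beyond is assumed negligible."""
    h = width / steps  # steps must be even
    total = t_pdf(x, df) + t_pdf(x + width, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(x + i * h, df)
    return total * h / 3


p_two_tailed = 2 * t_upper_tail(abs(t), n - 1)
print(f"t = {t:.2f}, p-value = {p_two_tailed:.3e}")
```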


Two-independent-samples t-test

Used when we want to compare the means μ1 and μ2 of two populations, based on two independent samples.

In two-tailed tests:
H0 : μ1 = μ2
Ha : μ1 ≠ μ2

In one-tailed tests:
Right-tailed: H0 : μ1 ≤ μ2, Ha : μ1 > μ2
Left-tailed: H0 : μ1 ≥ μ2, Ha : μ1 < μ2
Test statistic (Equal variances):
$$t := \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim T(n_1 + n_2 - 2)$$
• $\bar{x}_1, \bar{x}_2$ are the means of the samples.
• $s_1, s_2$ are the adjusted standard deviations of the samples, and
$$s_p^2 := \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$
• Either the sample sizes satisfy $n_1, n_2 > 30$, or we need to assume that each group is normally distributed.
• The two samples are independent (otherwise, another test is applied).
Test statistic (Unequal variances):
$$t := \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim T(df)$$
• $\bar{x}_1, \bar{x}_2$ are the means of the samples.
• $s_1, s_2$ are the adjusted standard deviations of the samples.
• Either the sample sizes satisfy $n_1, n_2 > 30$, or we need to assume that each group is normally distributed.
• The two samples are independent (otherwise, another test is applied).
• Degrees of freedom (unequal variances):
$$df := \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$
Two-independent samples test
Example:
In order to compare the average weights of rural and urban births, 10000 births were weighed. Here is the summary table.

Region   #Births   Average weight   (Adjusted) standard deviation
Rural    8000      3.0 kg           0.3 kg
Urban    2000      3.2 kg           0.2 kg

Equal variances:
t ≈ −28.28, df = 9998, p-value ≈ 2.26e-169

Unequal variances:
t ≈ −35.78, df ≈ 4523, p-value ≈ 4.46e-247
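Both statistics for the birth-weight table depend only on the summary values, so they can be recomputed directly. This is a sketch under the formulas of this section; the function names `pooled_t` and `welch_t` are ours, and the p-values are left out because they need the t distribution function.

```python
from math import sqrt


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Equal-variance statistic and its degrees of freedom."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2)), n1 + n2 - 2


def welch_t(m1, s1, n1, m2, s2, n2):
    """Unequal-variance statistic with the Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (m1 - m2) / sqrt(v1 + v2), df


rural = (3.0, 0.3, 8000)  # mean (kg), adjusted sd (kg), sample size
urban = (3.2, 0.2, 2000)
print(pooled_t(*rural, *urban))
print(welch_t(*rural, *urban))
```

The printed pairs (statistic, degrees of freedom) can be compared with the values reported in the example.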
F-test (two independent samples)

Used when we want to compare the variances σ1² and σ2² of two populations, based on two independent samples.

In two-tailed tests:
H0 : σ1² = σ2²
Ha : σ1² ≠ σ2²
In one-tailed tests:
Right-tailed: H0 : σ1² ≤ σ2², Ha : σ1² > σ2²
Left-tailed: H0 : σ1² ≥ σ2², Ha : σ1² < σ2²
Test statistic:
$$f := \frac{s_1^2}{s_2^2} \sim F(n_1 - 1, n_2 - 1)$$
• $s_i$ is the adjusted standard deviation of the ith sample, of size $n_i$, for i = 1, 2.
• $F(n_1 - 1, n_2 - 1)$ is the (Fisher-Snedecor) F distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom.
F-distribution

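Density function:
If $X \sim F(m, n)$, then for x > 0
$$f(x) = \frac{\Gamma\left(\frac{m+n}{2}\right) m^{\frac{m}{2}} n^{\frac{n}{2}} x^{\frac{m}{2} - 1}}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)(n + mx)^{\frac{m+n}{2}}}.$$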
F-test (two independent samples)
Example:
In order to compare the average weights of rural and urban births, 10000 births were weighed. Here is the summary table.

Region   #Births   Average weight   (Adjusted) standard deviation
Rural    8000      3.0 kg           0.3 kg
Urban    2000      3.2 kg           0.2 kg

f ≈ 2.25, df1 = 7999, df2 = 1999, p-value ≈ 1.11e-16
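The F statistic and its two degrees of freedom for this table are a one-line computation. A small sketch follows; the function name `f_statistic` is ours, and the p-value is not recomputed here because it needs the F distribution function.

```python
def f_statistic(s1, n1, s2, n2):
    """Ratio of adjusted sample variances and its two degrees of freedom."""
    return s1 ** 2 / s2 ** 2, n1 - 1, n2 - 1


f, df1, df2 = f_statistic(s1=0.3, n1=8000, s2=0.2, n2=2000)
print(f"f = {f:.2f}, df1 = {df1}, df2 = {df2}")
```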
One-way ANOVA (Analysis of variances)

Used when we want to compare the means μi of more than two populations, based on independent samples.

H0 : μ1 = ⋯ = μk
Ha : ∃ i ≠ j, μi ≠ μj
• $n_i$: the number of observations in the ith group.
• The observations of the ith group are denoted by $x_{i1}, x_{i2}, \dots, x_{in_i}$.
• The average of the ith group:
$$\bar{x}_i := \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}.$$
• The adjusted variance of the ith group:
$$s_i^2 := \frac{1}{n_i - 1}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2.$$
• n: the total number of observations, $n = n_1 + \dots + n_k$.
• The average of the whole sample:
$$\bar{x} := \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}.$$
• The adjusted variance of the whole sample:
$$s^2 := \frac{1}{n - 1}\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2.$$
• The total sum of squares:
$$SST := \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2.$$
• The sum of squares within groups:
$$SSE := \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2.$$
• The sum of squares between the group means and the overall mean $\bar{x}$:
$$SSA := \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2 = SST - SSE.$$
Test statistic:
$$f := \frac{n - k}{k - 1} \cdot \frac{SSA}{SSE} \sim F(k - 1, n - k)$$
where $F(k - 1, n - k)$ is the (Fisher-Snedecor) F distribution with k − 1 and n − k degrees of freedom.
Comments:
• The groups are independent.
• Each group is large enough, or each group is normally distributed.
• Equal variances must be assumed.
Example:
Amount of alkaloid (mg) in a new herb in three regions.

Region A: 7.5, 6.8, 7.1, 7.5, 6.8, 6.6, 7.8
Region B: 5.8, 5.6, 6.1, 6.0, 5.7
Region C: 6.1, 6.3, 6.5, 6.4, 6.5, 6.3

f ≈ 26.56, df1 = 2, df2 = 15, p-value ≈ 1.17e-05
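The ANOVA statistic for the three regions can be recomputed from the raw data with the sums of squares defined in this section. The sketch below is one possible arrangement; the function name `one_way_anova` and its return layout are ours, and the p-value is omitted because it needs the F distribution function.

```python
groups = {
    "A": [7.5, 6.8, 7.1, 7.5, 6.8, 6.6, 7.8],
    "B": [5.8, 5.6, 6.1, 6.0, 5.7],
    "C": [6.1, 6.3, 6.5, 6.4, 6.5, 6.3],
}


def one_way_anova(samples):
    """Return (f, df1, df2, SSA, SSE) for a list of groups of observations."""
    k = len(samples)
    n = sum(len(g) for g in samples)
    grand_mean = sum(sum(g) for g in samples) / n
    means = [sum(g) / len(g) for g in samples]
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(samples, means))  # between groups
    sse = sum((x - m) ** 2 for g, m in zip(samples, means) for x in g)         # within groups
    f = (n - k) / (k - 1) * ssa / sse
    return f, k - 1, n - k, ssa, sse


f, df1, df2, ssa, sse = one_way_anova(list(groups.values()))
print(f"f = {f:.2f}, df1 = {df1}, df2 = {df2}")
```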
