
NPTEL

Course On

STRUCTURAL
RELIABILITY
Module # 02
Lecture 6

Course Format: Web

Instructor:
Dr. Arunasis Chakraborty
Department of Civil Engineering
Indian Institute of Technology Guwahati
6. Lecture 06: Hypothesis Testing

Tests of Distributions

Generally, the data available to us comes from experimental observations or recorded signals. Such
data are not bound to follow any mathematical probability distribution model exactly; at best, their
behaviour may approximately match a defined probability distribution. Hence, the data must be
analysed to check whether they follow an assumed distribution and, if so, at what level of
significance. The following sections discuss two very popular distribution tests: the Chi Square Test
and the Kolmogorov–Smirnov Test.

Chi Square Test

The Chi square test is used to check whether a given random sample follows a theoretically defined
probability distribution. The basic idea is to evaluate the cumulative error between the observed
frequencies of the random sample and the frequencies expected under the theoretical probability
distribution. A stepwise procedure for conducting the Chi square test to check a probability
distribution and its statistical parameters is given below.

Step 1. Firstly, null and alternate hypotheses are formulated based on the probability distribution
and statistical parameters of the random data. Both hypotheses can be presented as

H0 : X ~ f(a, b)                                                               2.6.1
HA : X ≁ f(a, b)                                                               2.6.2

where H0 and HA denote the null and alternative hypotheses, respectively, X is the random
variable and f is the assumed probability distribution with parameters a and b. Rejection of the
null hypothesis indicates that the given data do not follow the assumed distribution, the assumed
parameters, or both.

Step 2. The null hypothesis is defined by assuming an appropriate model for the observed data. The
model must specify a probability distribution and the corresponding statistical parameters,
which serve as the basis for estimating the expected frequencies. Under the null hypothesis,
the Chi square test compares the observed frequencies with the frequencies expected from
the assumed model. The test statistic is expressed as
χ² = Σᵢ₌₁ᵏ (Oᵢ − Eᵢ)² / Eᵢ                                                     2.6.3

where χ² is the value of the test statistic (which follows a chi square distribution), Oᵢ and Eᵢ are
the observed frequency of the given sample and the expected frequency of the assumed model in
the i-th interval, respectively, and k is the total number of intervals of the histogram.
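A minimal sketch of Eq. 2.6.3 in Python (assuming NumPy is available; the array names observed and expected are illustrative, holding the Oᵢ and Eᵢ of the k intervals):

```python
import numpy as np

def chi_square_statistic(observed, expected):
    """Chi square statistic of Eq. 2.6.3 from interval frequencies."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    # sum over intervals of (O_i - E_i)^2 / E_i
    return np.sum((observed - expected) ** 2 / expected)
```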

Step 3. The level of significance α is selected based on the importance or priority of the data.
Usually, a level of significance of 5% is selected for general data. This value can be reduced
for higher-priority or critical data.

Step 4. A histogram of the observed data is formed to evaluate the observed frequencies. Similarly,
the expected frequencies are evaluated from the assumed probability distribution and its
parameters.

Step 5. Acceptance or rejection of the null hypothesis depends on the degrees of freedom of the chi
square distribution, the level of significance and the computed chi square value. These define
a region of rejection: above a certain critical χ² value the hypothesis is rejected, where the
critical value depends on the degrees of freedom and the level of significance. The degrees of
freedom are given by (k − j), where j is the number of quantities estimated from the given
sample for use in calculating the expected frequencies. Generally, these quantities are the
number of observations and the mean and standard deviation of the sample data, hence a
total of 3 degrees of freedom is subtracted (i.e., k − 3). Note that if only the number of
observations is used, the degrees of freedom increase to (k − 1). For a specific number of
degrees of freedom and level of significance, the critical χ² value can be read from a χ²
table. This is clearly explained in the following example.
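As a sketch of the decision rule in this step (assuming SciPy; the function and variable names are illustrative), the critical value is the (1 − α) quantile of the chi square distribution with (k − j) degrees of freedom:

```python
from scipy.stats import chi2

def chi_square_decision(chi_sq_value, k, j, alpha=0.05):
    """Return (reject H0?, critical value) for (k - j) degrees of freedom."""
    dof = k - j                           # e.g. k - 3 when n, mean and std dev are estimated
    critical = chi2.ppf(1.0 - alpha, dof)
    return chi_sq_value > critical, critical
```

For dof = 4 and alpha = 0.05 this gives a critical value of about 9.49, consistent with the table value used in the example below.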

Step 6. If the null hypothesis is rejected, an alternative model is assumed and the Chi square test is
repeated from Step 1.

Example

Ex # 01. A series of random data of sample size 40, obtained from an experimental outcome, is
given below. The assumed probability distribution model is the exponential distribution, and the
levels of significance are 5% and 1%. Check whether the null hypothesis is rejected or accepted by
conducting the Chi square test.

0.2049 0.0989 2.0637 0.0906 0.4583 2.3275 1.2783 0.6035
0.0434 0.0357 1.8476 0.0298 0.0438 0.7228 0.2228 1.9527
0.8633 0.0880 0.2329 0.0414 0.4220 3.3323 0.1635 0.0683
0.3875 0.2774 0.2969 0.9359 0.4224 1.7650 0.3481 3.4473
1.2840 3.0754 2.3317 2.8821 2.4319 1.1098 3.3258 0.1206

Solu. Initially, the null and alternate hypotheses are defined as

H0 : X ~ λ e^(−λx)
HA : X ≁ λ e^(−λx)

where λ e^(−λx) is the probability density function f_X(x) of the exponential distribution. Before
forming the histogram, one must find the mean, the standard deviation, the number of intervals and
the class width. The mean of the observed data is X̄ = (Σxᵢ)/n = 1.0420. For the exponential
distribution the standard deviation is equal to the mean. The parameter λ is evaluated from the
mean as λ = 1/X̄ = 0.9597.

The number of intervals k can be evaluated as shown below

k = 1 + 3.3 log₁₀ n = 1 + 3.3 log₁₀ 40 = 6.2868 ≈ 7

The class width is obtained by dividing the range between the minimum value 0.0298 and the
maximum value 3.4473 of the above sample by the number of intervals. Thus, the class width
comes out to be

Class = (3.4473 − 0.0298) / 7 = 0.4882 ≈ 0.5
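These preliminary quantities (mean, λ, number of intervals and class width) can be reproduced with a short sketch (assuming NumPy; the function name and the rounding of k upwards to 7 follow the hand calculation above):

```python
import numpy as np

def histogram_setup(data):
    """Mean, exponential parameter, number of intervals and class width
    for a 1-D sample, as used in Ex # 01."""
    n = len(data)
    mean = np.mean(data)                       # ≈ 1.0420 for the data above
    lam = 1.0 / mean                           # exponential parameter, ≈ 0.9597
    k = int(np.ceil(1 + 3.3 * np.log10(n)))    # 6.2868 rounded up to 7
    width = (np.max(data) - np.min(data)) / k  # ≈ 0.4882, taken as 0.5
    return mean, lam, k, width
```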
Now, the histogram of the observed data is formed and the expected frequencies are evaluated as
Eᵢ = n f_X(xᵢ₋₁ < x < xᵢ) = 40 × [0.9597 exp(−0.9597 xᵢ₋₁) − 0.9597 exp(−0.9597 xᵢ)], with the
last interval extending to ∞. The results are shown in the table below.

Class interval     Oᵢ     Eᵢ          (Oᵢ − Eᵢ)²/Eᵢ
< 0.5              21     16.9249     0.9812
0.5 − 1.0           4     10.0522     3.6439
1.0 − 1.5           3      5.9703     1.4778
1.5 − 2.0           3      3.5459     0.0840
2.0 − 2.5           4      2.1060     1.7033
2.5 − 3.0           1      1.2508     0.0503
≥ 3.0               4      1.8295     2.5751
                   ΣOᵢ = 40   ΣEᵢ = 40   χ² = 10.5155

Now, the degrees of freedom for this example are (7 − 3) = 4. Based on this and the level of
significance, the critical χ² values are 9.492 (for α = 5%) and 13.280 (for α = 1%). Since the
computed statistic χ² = 10.5155 exceeds 9.492 but not 13.280, the null hypothesis is rejected at the
5% level of significance, whereas at the 1% level of significance it is accepted.
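The whole calculation can be cross-checked with a short script (a sketch assuming NumPy and SciPy). Note that here the expected frequencies are computed directly from the exponential CDF over each interval, Eᵢ = n[F_X(xᵢ) − F_X(xᵢ₋₁)], so the intermediate numbers and the statistic (about 9.7 instead of 10.5155) differ somewhat from the hand calculation, but the decision at both levels of significance is the same:

```python
import numpy as np
from scipy.stats import chi2

data = np.array([
    0.2049, 0.0989, 2.0637, 0.0906, 0.4583, 2.3275, 1.2783, 0.6035,
    0.0434, 0.0357, 1.8476, 0.0298, 0.0438, 0.7228, 0.2228, 1.9527,
    0.8633, 0.0880, 0.2329, 0.0414, 0.4220, 3.3323, 0.1635, 0.0683,
    0.3875, 0.2774, 0.2969, 0.9359, 0.4224, 1.7650, 0.3481, 3.4473,
    1.2840, 3.0754, 2.3317, 2.8821, 2.4319, 1.1098, 3.3258, 0.1206])

n = len(data)
lam = 1.0 / data.mean()                       # ≈ 0.9597

# class intervals of width 0.5, last interval open to infinity
edges = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, np.inf])
observed, _ = np.histogram(data, bins=edges)  # O_i = [21, 4, 3, 3, 4, 1, 4]

# expected frequencies from the exponential CDF differences
cdf = 1.0 - np.exp(-lam * edges)              # F_X at the interval edges
expected = n * np.diff(cdf)                   # E_i

chi_sq = np.sum((observed - expected) ** 2 / expected)
dof = len(observed) - 3                       # k - 3, as in Step 5

for alpha in (0.05, 0.01):
    critical = chi2.ppf(1.0 - alpha, dof)
    print(alpha, round(chi_sq, 4), round(critical, 3),
          "reject H0" if chi_sq > critical else "accept H0")
```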

Kolmogorov–Smirnov Test

The Chi square test works with the probability density, whereas the Kolmogorov–Smirnov (KS) test
works with the cumulative distribution function. The philosophy behind the KS test is to determine
the maximum absolute difference between the cumulative distribution of the given random data and
that of the assumed model under the null hypothesis. The steps for conducting the KS test on a
given random sample with an assumed model and its parameters are explained below.

Step 1. Similar to the Chi square test, null and alternate hypotheses are formulated in terms of the
probability distribution and statistical parameters of the random data, and a level of
significance α is selected (generally, α = 5%).

Step 2. As defined above, the empirical cumulative distribution function of the observed sample,
F_O(x), is calculated as shown in the equation below

          0       for x < x₁
F_O(x) =  i/n     for xᵢ ≤ x < xᵢ₊₁                                            2.6.4
          1       for x ≥ xₙ

where x₁, x₂, …, xₙ are the random data placed in ascending order, n is the sample size
and i = 1, 2, …, n − 1.
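A sketch of Eq. 2.6.4 (assuming NumPy; the function name is illustrative). For the ordered sample, the empirical CDF takes the value i/n at and just above the i-th observation:

```python
import numpy as np

def empirical_cdf(sorted_data):
    """F_O(x_i) = i/n for the i-th ordered observation (Eq. 2.6.4)."""
    n = len(sorted_data)
    return np.arange(1, n + 1) / n   # F_O(x_1) = 1/n, ..., F_O(x_n) = 1
```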

Step 3. The cumulative distribution function of the random sample under the assumed probability
distribution and its parameters, i.e. F_X(x), is calculated.

Step 4. Finally, the maximum absolute difference between the observed and assumed cumulative
distribution functions is evaluated as shown below

KS = max( |F_X(x₁) − F_O(x₀)|, |F_X(x₁) − F_O(x₁)|, |F_X(x₂) − F_O(x₁)|, |F_X(x₂) − F_O(x₂)|, …,
          |F_X(xᵢ) − F_O(xᵢ₋₁)|, |F_X(xᵢ) − F_O(xᵢ)|, …,
          |F_X(xₙ) − F_O(xₙ₋₁)|, |F_X(xₙ) − F_O(xₙ)| )                         2.6.5

where F_O(x₀) = 0.
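A sketch of Eq. 2.6.5 (assuming NumPy; names are illustrative, and cdf is any callable returning F_X, e.g. lambda x: 1 - np.exp(-lam * x) for the exponential model of the examples). Both differences, against F_O(xᵢ₋₁) and against F_O(xᵢ), are evaluated at every ordered data point:

```python
import numpy as np

def ks_statistic(data, cdf):
    """Maximum absolute difference of Eq. 2.6.5 between the model CDF
    and the empirical step function of the sample."""
    x = np.sort(data)                     # data in ascending order
    n = len(x)
    fo = np.arange(1, n + 1) / n          # F_O(x_i) = i/n
    fo_prev = np.arange(0, n) / n         # F_O(x_{i-1}), with F_O(x_0) = 0
    fx = cdf(x)                           # model CDF F_X(x_i)
    return np.max(np.maximum(np.abs(fx - fo), np.abs(fx - fo_prev)))
```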

Step 5. The critical KS value corresponding to α and n is obtained from a KS table and compared
with the observed KS value evaluated as per Eq. 2.6.5.

Step 6. As in the Chi square test, the null hypothesis is rejected if the computed KS value is greater
than the critical value from Step 5.
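A sketch of Steps 5 and 6 (the large-sample critical values 1.36/√n and 1.63/√n, valid for n > 35, are the ones used in the worked example below; for smaller samples the tabulated values should be used):

```python
import numpy as np

def ks_decision(ks_value, n, alpha=0.05):
    """Compare the computed KS value with the large-sample critical value."""
    coeff = {0.05: 1.36, 0.01: 1.63}[alpha]   # valid for n > 35
    critical = coeff / np.sqrt(n)
    return ks_value > critical, critical      # True means reject H0
```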

Example

Ex # 02. Considering the data of Ex # 01, check whether the null hypothesis is rejected or accepted
by conducting the Kolmogorov–Smirnov test. For ease, the random data are arranged in ascending
order.

0.0298 0.0357 0.0414 0.0434 0.0438 0.0683 0.0880 0.0906
0.0989 0.1206 0.1635 0.2049 0.2228 0.2329 0.2774 0.2969
0.3481 0.3875 0.4220 0.4224 0.4583 0.6035 0.7228 0.8633
0.9359 1.1098 1.2783 1.2840 1.7650 1.8476 1.9527 2.0637
2.3275 2.3317 2.4319 2.8821 3.0754 3.3258 3.3323 3.4473

Solu. The null and alternate hypotheses, the mean of the observed data and the parameter λ are
taken from Ex # 01. For performing the KS test one has to evaluate the empirical cumulative
distribution function as per Eq. 2.6.4; F_O(xᵢ) and F_X(xᵢ) are tabulated below.

Rank i     xᵢ        F_O(xᵢ)    F_X(xᵢ)    |F_X(xᵢ) − F_O(xᵢ₋₁)|    |F_X(xᵢ) − F_O(xᵢ)|
1 0.0298 0.0250 0.0306 0.0306 0.0056
2 0.0357 0.0500 0.0365 0.0115 0.0135
3 0.0414 0.0750 0.0422 0.0078 0.0328
4 0.0434 0.1000 0.0442 0.0308 0.0558
5 0.0438 0.1250 0.0446 0.0554 0.0804
6 0.0683 0.1500 0.0687 0.0563 0.0813
7 0.0880 0.1750 0.0876 0.0624 0.0874
8 0.0906 0.2000 0.0901 0.0849 0.1099
9 0.0989 0.2250 0.0979 0.1021 0.1271

10 0.1206 0.2500 0.1181 0.1069 0.1319


11 0.1635 0.2750 0.1566 0.0934 0.1184
12 0.2049 0.3000 0.1922 0.0828 0.1078
13 0.2228 0.3250 0.2072 0.0928 0.1178
14 0.2329 0.3500 0.2155 0.1095 0.1345
15 0.2774 0.3750 0.251 0.099 0.124
16 0.2969 0.4000 0.2661 0.1089 0.1339
17 0.3481 0.4250 0.3042 0.0958 0.1208
18 0.3875 0.4500 0.3322 0.0928 0.1178
19 0.4220 0.4750 0.3558 0.0942 0.1192
20 0.4224 0.5000 0.356 0.119 0.144
21 0.4583 0.5250 0.3797 0.1203 0.1453
22 0.6035 0.5500 0.4668 0.0582 0.0832
23 0.7228 0.5750 0.5291 0.0209 0.0459
24 0.8633 0.6000 0.5932 0.0182 0.0068
25 0.9359 0.6250 0.6229 0.0229 0.0021
26 1.1098 0.6500 0.6854 0.0604 0.0354
27 1.2783 0.6750 0.736 0.086 0.061
28 1.2840 0.7000 0.7376 0.0626 0.0376
29 1.7650 0.7250 0.841 0.141 0.116
30 1.8476 0.7500 0.8541 0.1291 0.1041
31 1.9527 0.7750 0.8693 0.1193 0.0943
32 2.0637 0.8000 0.8835 0.1085 0.0835
33 2.3275 0.8250 0.9115 0.1115 0.0865
34 2.3317 0.8500 0.9119 0.0869 0.0619
35 2.4319 0.8750 0.9207 0.0707 0.0457
36 2.8821 0.9000 0.9504 0.0754 0.0504
37 3.0754 0.9250 0.9594 0.0594 0.0344
38 3.3258 0.9500 0.9687 0.0437 0.0187

39 3.3323 0.9750 0.9689 0.0189 0.0061


40 3.4473 1.0000 0.9725 0.0025 0.0275
max(|F_X(xᵢ) − F_O(xᵢ₋₁)|, |F_X(xᵢ) − F_O(xᵢ)|) = 0.1453

The KS value observed from the random data is 0.1453. The critical KS values for a sample size of
40 and levels of significance α = 5% and 1% are evaluated as

KS₅% = 1.36/√n (for n > 35) = 1.36/√40 = 0.2150
KS₁% = 1.63/√n (for n > 35) = 1.63/√40 = 0.2577

Thus, according to the KS test, the null hypothesis is accepted in both cases, as the observed value
is less than the critical values.
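This result can be cross-checked with SciPy (a sketch; scipy.stats.kstest accepts a callable CDF and returns the same maximum-difference statistic as Eq. 2.6.5). Depending on the exact parameter value used for λ, the statistic obtained here may differ somewhat from the tabulated 0.1453, but it remains below the critical values, so the conclusion is unchanged:

```python
import numpy as np
from scipy.stats import kstest

def ks_test_exponential(data, alpha=0.05):
    """KS test of the sample against an exponential model with
    lambda estimated as 1/mean, as in Ex # 01 / Ex # 02."""
    lam = 1.0 / np.mean(data)
    cdf = lambda x: 1.0 - np.exp(-lam * x)    # assumed exponential CDF F_X(x)
    result = kstest(data, cdf)
    critical = (1.36 if alpha == 0.05 else 1.63) / np.sqrt(len(data))
    return result.statistic, critical, result.statistic > critical
```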
