
14.

The highway between Atlanta, Georgia and Athens, Georgia has a high incidence
of accidents along its 100 kilometers. Public safety officers say that the occurrence
of accidents along the highway is randomly (uniformly) distributed, but the
news media say otherwise. The Georgia Department of Public Safety published
records for the month of September. These records indicated the point at which 30
accidents involving an injury or death occurred, as follows (the data points
representing the distance from the city limits of Atlanta):

88.3 40.7 36.3 27.3 36.8
91.7 67.3 7 45.2 23.3
98.8 90.1 17.2 23.7 97.4
32.4 87.8 69.8 62.6 99.7
20.6 73.1 21.6 6 45.3
76.6 73.2 27.3 87.6 87.2

Use the Kolmogorov-Smirnov test to determine whether the locations of accidents
were uniformly distributed for the month of September.

(Assume the lower and upper bounds of the uniform distribution are 0 and 100,
respectively.)

Solution:
The Kolmogorov-Smirnov (KS) test is used to compare a sample distribution with a
theoretical distribution. In this case, we are comparing the distribution of accident
locations with a uniform distribution between 0 and 100 kilometers.

The KS statistic is given by the maximum vertical distance between the empirical
distribution function (EDF) of the sample and the cumulative distribution function
(CDF) of the theoretical distribution.

The null and alternative hypotheses are formed as follows:

$H_0$: The data are uniformly distributed
$H_1$: The data are not uniformly distributed

1. Sort the accident locations in ascending order (smallest to largest):

Sorted Data
[6.0, 7.0, 17.2, 20.6, 21.6, 23.3, 23.7, 27.3, 27.3, 32.4,
36.3, 36.8, 40.7, 45.2, 45.3, 62.6, 67.3, 69.8, 73.1, 73.2,
76.6, 87.2, 87.6, 87.8, 88.3, 90.1, 91.7, 97.4, 98.8, 99.7]

2. Empirical Distribution Function (EDF):

$EDF(x_i) = \frac{i}{N}$, where $N$ is the sample size ($N = 30$)

3. Theoretical Cumulative Distribution Function (CDF) for a Uniform Distribution
between 0 and 100 kilometers:

$CDF(R) = \frac{R}{100}$
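
Steps 1 to 3 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `edf_and_cdf` and the variable names are assumptions introduced here, not part of the original solution.

```python
# Accident locations in km from the Atlanta city limits (from the problem statement).
locations = [
    88.3, 40.7, 36.3, 27.3, 36.8,
    91.7, 67.3, 7.0, 45.2, 23.3,
    98.8, 90.1, 17.2, 23.7, 97.4,
    32.4, 87.8, 69.8, 62.6, 99.7,
    20.6, 73.1, 21.6, 6.0, 45.3,
    76.6, 73.2, 27.3, 87.6, 87.2,
]


def edf_and_cdf(sample, lower=0.0, upper=100.0):
    """Return (x_i, EDF, CDF) for each observation, sorted in ascending order.

    Hypothetical helper: EDF is i/N and CDF is the uniform CDF on [lower, upper].
    """
    ordered = sorted(sample)                      # step 1: sort ascending
    n = len(ordered)
    rows = []
    for i, x in enumerate(ordered, start=1):
        edf = i / n                               # step 2: EDF(x_i) = i / N
        cdf = (x - lower) / (upper - lower)       # step 3: CDF = x / 100 here
        rows.append((x, edf, cdf))
    return rows


rows = edf_and_cdf(locations)
for x, edf, cdf in rows[:3]:
    print(f"x = {x:5.1f}  EDF = {edf:.6f}  CDF = {cdf:.3f}")
```

The loop prints only the first three sorted observations; the full list in `rows` corresponds to the sorted data, EDF, and CDF rows of the table in step 4.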

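4. Calculate the KS Statistic

$D^+ = \max_{1 \le i \le N} \left\{ \frac{i}{N} - R_i \right\} = 0.0470$

$D^- = \max_{1 \le i \le N} \left\{ R_i - \frac{i-1}{N} \right\} = 0.1720$

Here $R_i$ denotes the CDF value of the $i$-th sorted observation, that is, the
sorted distance divided by 100.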

 i   Sorted data   CDF R = x/100   EDF = i/30   i/30 - R   R - (i-1)/30
 1       6.0          0.060         0.033333    -0.0267       0.0600
 2       7.0          0.070         0.066667    -0.0033       0.0367
 3      17.2          0.172         0.100000    -0.0720       0.1053
 4      20.6          0.206         0.133333    -0.0727       0.1060
 5      21.6          0.216         0.166667    -0.0493       0.0827
 6      23.3          0.233         0.200000    -0.0330       0.0663
 7      23.7          0.237         0.233333    -0.0037       0.0370
 8      27.3          0.273         0.266667    -0.0063       0.0397
 9      27.3          0.273         0.300000     0.0270       0.0063
10      32.4          0.324         0.333333     0.0093       0.0240
11      36.3          0.363         0.366667     0.0037       0.0297
12      36.8          0.368         0.400000     0.0320       0.0013
13      40.7          0.407         0.433333     0.0263       0.0070
14      45.2          0.452         0.466667     0.0147       0.0187
15      45.3          0.453         0.500000     0.0470      -0.0137
16      62.6          0.626         0.533333    -0.0927       0.1260
17      67.3          0.673         0.566667    -0.1063       0.1397
18      69.8          0.698         0.600000    -0.0980       0.1313
19      73.1          0.731         0.633333    -0.0977       0.1310
20      73.2          0.732         0.666667    -0.0653       0.0987
21      76.6          0.766         0.700000    -0.0660       0.0993
22      87.2          0.872         0.733333    -0.1387       0.1720
23      87.6          0.876         0.766667    -0.1093       0.1427
24      87.8          0.878         0.800000    -0.0780       0.1113
25      88.3          0.883         0.833333    -0.0497       0.0830
26      90.1          0.901         0.866667    -0.0343       0.0677
27      91.7          0.917         0.900000    -0.0170       0.0503
28      97.4          0.974         0.933333    -0.0407       0.0740
29      98.8          0.988         0.966667    -0.0213       0.0547
30      99.7          0.997         1.000000     0.0030       0.0303

The largest entry in the "i/30 - R" column is $D^+$ (at i = 15) and the largest
entry in the "R - (i-1)/30" column is $D^-$ (at i = 22).

$D = \max(D^+, D^-) = \max(0.0470, 0.1720) = 0.1720$
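
The hand calculation in step 4 can be cross-checked with a short script. The following is a minimal sketch using only the Python standard library; the function name `ks_uniform` is a hypothetical helper introduced for illustration, and the uniform bounds of 0 and 100 are taken from the problem statement.

```python
# Accident locations in km from the Atlanta city limits (from the problem statement).
locations = [
    88.3, 40.7, 36.3, 27.3, 36.8,
    91.7, 67.3, 7.0, 45.2, 23.3,
    98.8, 90.1, 17.2, 23.7, 97.4,
    32.4, 87.8, 69.8, 62.6, 99.7,
    20.6, 73.1, 21.6, 6.0, 45.3,
    76.6, 73.2, 27.3, 87.6, 87.2,
]


def ks_uniform(sample, lower=0.0, upper=100.0):
    """Return (D+, D-, D) for a KS test against Uniform(lower, upper).

    Hypothetical helper that mirrors the formulas used in step 4.
    """
    ordered = sorted(sample)
    n = len(ordered)
    # R_i: theoretical CDF value of the i-th sorted observation.
    r = [(x - lower) / (upper - lower) for x in ordered]
    d_plus = max(i / n - r_i for i, r_i in enumerate(r, start=1))
    d_minus = max(r_i - (i - 1) / n for i, r_i in enumerate(r, start=1))
    return d_plus, d_minus, max(d_plus, d_minus)


d_plus, d_minus, d_stat = ks_uniform(locations)
print(f"D+ = {d_plus:.4f}, D- = {d_minus:.4f}, D = {d_stat:.4f}")
# prints: D+ = 0.0470, D- = 0.1720, D = 0.1720
```

The values agree with the table: the maximum of the first difference occurs at i = 15 and the maximum of the second at i = 22.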

5. Locate the critical value $D_\alpha$ in Table A.8 for the specified significance
level $\alpha$ and the given sample size $N$.

$D_{0.05,30} = 0.24$

6. If the sample statistic $D$ is greater than the critical value $D_{0.05,30}$, the
null hypothesis $H_0$ that the data are a sample from a uniform distribution is
rejected.

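Since $D = 0.172 < D_{0.05,30} = 0.24$, we do not reject $H_0$. At the 5%
significance level, the September data give no evidence that accident locations
depart from a uniform distribution along the highway.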
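
The decision rule in steps 5 and 6 can be written as a small check. This is a sketch only: `ks_decision` is a hypothetical helper, and the critical value 0.24 is the tabulated figure quoted above. The large-sample approximation 1.36/sqrt(N) for the 5% level is not part of the original solution; it is included as an assumption-labelled sanity check on the tabulated value.

```python
import math


def ks_decision(d_stat, d_crit):
    """Return True if H0 is rejected, i.e. the statistic exceeds the critical value."""
    return d_stat > d_crit


D_STAT = 0.172        # KS statistic from step 4
N = 30                # sample size
D_CRIT_TABLE = 0.24   # critical value quoted in step 5 (alpha = 0.05, N = 30)

# Assumption, not from the source: the usual large-sample approximation
# for the 5% critical value, shown only as a rough cross-check.
d_crit_approx = 1.36 / math.sqrt(N)

reject = ks_decision(D_STAT, D_CRIT_TABLE)
print(f"Tabulated critical value: {D_CRIT_TABLE:.2f}")
print(f"Approximate critical value 1.36/sqrt(N): {d_crit_approx:.3f}")
print("Reject H0" if reject else "Do not reject H0")
```

With either critical value the statistic 0.172 falls below the threshold, so the conclusion is the same.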
