
20th September 2019

MANAGERIAL STATISTICS

PROJECT REPORT

ACCIDENTS IN MUMBAI LOCAL TRAINS


(Western line)

Group No: W2 (Batch 2019 – 2021)

1. Adarsh Sinha – 19F404


2. Debadarsan Barua – 19F418
3. Jayesh Prakash Hemlani – 19F423
4. Sowjanya Sampathkumar – 19F452
5. Varun Ganatra – 19F458
Table of Contents

1. Executive Summary
2. Introduction
2.1 Problem Statement
2.2 Scope of the study
2.3 Objectives of the study
3. Methodology
3.1 Data Collection
3.2 Source of Data
3.3 Tools Used
3.4 Concepts Used
3.5 Research Methodology
3.5.1 Bayes’ Theorem
3.5.2 Hypothesis Testing - Test of Independence
3.5.3 Regression Analysis
4. Results and Analysis
4.1 Analysing Accidents occurring near each station
4.2 Bayes’ Theorem
4.3 Hypothesis Testing – Test of Independence
4.4 Regression Analysis
5. Conclusion and Discussion
5.1 Major Findings
5.2 Limitations of the study
5.3 Recommendation
6. References

1. Executive Summary

In this study, we apply statistical concepts to determine the probability of accidents
occurring at or near various stations of the Mumbai suburban railway network (Western Line).
We also test whether the attributes considered in the study are independent of one another. We
began by collecting accident data for passengers from August 2018 to August 2019. The
attributes considered are Age, Day, Month, Gender, Type of Accident and the Nearest Station
to where the accident occurred. We first calculated a prior probability of an accident
happening at each station. We then used these prior probabilities in Bayes' Theorem to
determine, given that an accident has occurred, the probability that it occurred near a
specific station. Next, we applied the test of independence to Gender and Type of Accident to
find out whether the type of accident (minor, major or death) is independent of gender.
Finally, we used a regression model to examine the relationship between the type of accident
and the other attributes.
2. Introduction

Formerly known as the Bombay Suburban Railway, the Mumbai Suburban Railway is spread over a
length of 390 km and carries more than 7.5 million commuters daily. One of the busiest rail
networks, it runs approximately 2342 train services and carries around 2.64 billion commuters
annually. As a result, the system is overcrowded throughout the year, with services running
from 4:00 a.m. until 1:00 a.m.

The history of this prestigious railway network dates to April 16th, 1853, when the first train
set off from Bori Bunder for Thane and covered 34 km in around an hour and a half. Started
as an experiment, it soon became popular with local people for the ease of travelling from
one place to another. Lines started expanding and today it is the second-largest railway
network in Asia.

Indian Railways handles the operations and divides them between two divisions, mainly
Western Railway and Central Railway. Our primary focus is on the Western Railway network,
which runs from Churchgate parallel to the west coast of the Mumbai Metropolitan Region,
Maharashtra. It consists of 37 railway stations from Dahanu Road in the north to Churchgate
station in the south. The Western Railway's Electric Multiple Unit, which was approved in the
2012-13 Railway Budget, runs on 25 kV AC power and uses 9-car rakes. The conversion from
DC to AC was carried out to improve the punctuality and energy efficiency of the network,
allowing the trains to reach 100 km/h. The slow trains halt at all the stations, while the
fast ones stop only at the important ones.

While the railway services provide ease of transportation to the people, there are also many
negative aspects of this extensive network. The most important one is the number of accidents
that take place daily. Reports show that about 2000 passengers die annually and that, between
2002 and 2012, more than 36000 people met with fatal accidents and around 37000 passengers
were severely injured. Although overcrowding is the primary cause of accidents, there are
several other reasons. Trains halt for a mere 10 seconds, and passengers hastily trying to
board or get off within that window account for a major share of the accidents. Some
passengers die when they travel sitting on the roof to avoid the crowd and accidentally touch
the high-voltage wires. Open doors and windows are another cause: people travel hanging off
the edge of the footboard or the door ledges and often lose their grip or slip off the train.
Teenagers performing stunts in the doorways are a further cause. People also often lose
their lives crossing the tracks on foot to save time instead of using the footbridges.
2.1 Problem Statement
This report studies the number of accidents that occurred at each of the stations of the
Western Line during the period August 2018-August 2019, the demographics of the passengers
who met with the accidents, the types of injury and the frequency of such incidents every
month. We also want to determine the independence and relationship of the various attributes
considered in our study.

2.2 Scope of the study


In this study, we consider data for the Western Line only. The data was acquired from the
Western Railway website, wr.indianrailways.gov.in, and covers the months of August 2018 to
August 2019. The extracted data includes the type of accident and details of each accident
such as gender, age, date, day, month and nearest station.
We consider the following stations: Andheri, Bandra, Bhayander, Borivali, Charni
Road, Churchgate, Dadar, Dahisar, Elphinstone, Goregaon, Grant Road, Jogeshwari,
Kandivali, Khar Road, Lower Parel, Mahalaxmi, Mahim, Malad, Marine Lines, Matunga Road,
Mira Road, Mumbai Central, Naigaon, Nalasopara, Santa Cruz, Vasai Road, Vile Parle and
Virar.
We used this data to calculate probabilities using Bayes' Theorem, to test the independence
of two variables using Hypothesis Testing – Test of Independence, and to find out whether
there is any significant relationship between the variables and the type of accident using
regression analysis.
2.3 Objectives of the study:

1. To determine the probability of accidents occurring at each station during the period
August 2018-August 2019.
2. To apply Bayes' Theorem to determine, given that an accident has occurred, the
probability of it occurring near each station.
3. To apply the Test of Independence to determine whether gender and type of accident
are independent of each other.
4. To determine the relationship between the type of accident and gender, age group and
day of the week through a regression model covering all types of accidents.
3. Methodology

3.1 Data Collection


Data was collected from the Western Railway website (Mumbai Suburban Accident details) and
compiled in Excel sheets. A total of 1356 records were collected for the period from
August 2018 to August 2019.
3.2 Source of Data
The number of accidents that occurred from August 2018 to August 2019, including details like
Age, Gender, Day, Month and Type of accident, was collected from the following site:
https://wr.indianrailways.gov.in/wr_accident.jsp?lang=0&id=0,6

3.3 Tools Used


The following tools have been used for analysis:
▪ Excel Analysis ToolPak
3.4 Concepts Used
The following concepts have been used:
▪ Bayes’ Theorem
▪ Hypothesis testing - Test of Independence
▪ Regression Analysis

3.5 Research Methodology

3.5.1 Bayes’ Theorem


Bayes’ Theorem is used to revise initial (prior) probabilities. Given additional information,
we calculate revised (posterior) probabilities.
The formula is:

$$P(A \mid B) = \frac{P(A)\, P(B \mid A)}{P(B)}$$

Where,
P(A) = Probability of event A occurring
P(B) = Probability of event B occurring
P(A | B) = Probability of event A given that event B occurred.
P(B | A) = Probability of event B given that event A occurred.

A total of 1356 accidents occurred from August 2018 to August 2019. After compiling the
data in Excel, our first objective was to determine the probability of an accident taking
place at each railway station in our study. We use these probabilities as prior
probabilities. Next, we applied Bayes' theorem to these prior probabilities to determine,
given that an accident has occurred, the probability of it occurring near a specific station.
We assumed the conditional probabilities based on the number of accidents recorded near each
station: stations with more accidents were given higher conditional probabilities and
stations with fewer accidents were given lower values. The resulting posterior probabilities
tell us, given that an accident has occurred, how likely it is to have occurred near each
specific station.
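
As a worked form of the calculation described above, the posterior for one station can be written with the law of total probability in the denominator. The symbols $S_i$ (the accident is near station $i$) and $B$ (the conditioning event) are notation introduced here for illustration; the report itself does not name the conditioning event, so this is a sketch of the arithmetic rather than a definition of it.

$$P(S_i \mid B) = \frac{P(S_i)\, P(B \mid S_i)}{\sum_{k} P(S_k)\, P(B \mid S_k)}$$

For example, with the Kandivali values reported in Section 4.2 (94 of 1356 accidents, assumed conditional probability 0.9, and a total joint probability of 0.636 across all stations):

$$P(S_{\text{Kandivali}} \mid B) = \frac{(94/1356) \times 0.9}{0.636} \approx 0.098$$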

3.5.2 Hypothesis Testing - Test of Independence


In hypothesis testing, the Test of Independence is based on the chi-square (χ2) distribution.
It is used to determine whether there is a significant relationship between two variables.
The data used for the calculation is always categorical. In this method, the frequency of one
variable is compared across the categories of the second variable. The test of independence
is done in 5 steps:

Step 1: Define the null (Ho) and alternative (Ha) hypotheses.

Step 2: Select a random sample and record the observed frequency, $f_{ij}$, for each cell of
the table.

Step 3: Compute the expected frequency, $e_{ij}$, for each cell:

$$e_{ij} = \frac{(\text{Row } i \text{ Total})(\text{Column } j \text{ Total})}{\text{Sample Size}}$$

Step 4: Compute the test statistic:

$$\chi^2 = \sum_i \sum_j \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$

Step 5: Determine the rejection rule:
If p-value ≤ α or $\chi^2 \ge \chi^2_{\alpha}$, then reject Ho; otherwise do not reject Ho.

Where α is the significance level and there are (n-1)*(m-1) degrees of freedom (with n rows
and m columns).
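
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Excel workflow the study actually used; the function name `chi_square_independence` is made up for this example, and the observed counts are the gender-by-type figures reported in Section 4.3.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom, expected table)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Step 3: expected frequency e_ij = (row i total * column j total) / sample size
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Step 4: chi-square = sum over all cells of (f_ij - e_ij)^2 / e_ij
    stat = sum(
        (f - e) ** 2 / e
        for f_row, e_row in zip(observed, expected)
        for f, e in zip(f_row, e_row)
    )

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, expected


# Rows: Minor, Major, Dead; columns: Male, Female (counts from Section 4.3).
observed = [[132, 25], [665, 111], [377, 46]]
stat, dof, expected = chi_square_independence(observed)
print(round(stat, 2), dof)
```

The statistic and degrees of freedom returned here feed directly into the rejection rule in Step 5.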

We apply the test of independence to find out whether the type of accident, i.e. death, major
or minor, has any significant relationship with gender. The general perception is that male
passengers are more susceptible to injuries caused by accidents in and around the western
Mumbai railway network.

3.5.3 Regression Analysis


Regression analysis is a statistical procedure that can be used to develop an equation
showing how variables are related. The variable being predicted is called the dependent
variable, and the variables used to predict its value are called independent variables.
From the regression we obtain "R square", the coefficient of determination, which indicates
the proportion of the variability in the dependent variable that is explained by the
independent variables. A high "R square" indicates that little of the variation in Y is left
to random factors and that most of it is associated with changes in the values of x.
We carried out regression analysis to estimate the relationship between the type of accident
and gender, age, month and station.
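
For reference, the coefficient of determination mentioned above is conventionally defined as follows. The symbols $y_i$ (observed value), $\hat{y}_i$ (value predicted by the regression equation) and $\bar{y}$ (mean of the observed values) are standard notation added here, not notation taken from the report.

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

A value close to 1 means the fitted equation accounts for most of the variation in the dependent variable; a value close to 0 means it accounts for very little.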
4. Results and Analysis

4.1 Analysing Accidents occurring near each station

As per the data extracted from the Western Railway website, the maximum number of accidents
from August 2018 to August 2019 occurred near Kandivali station.

The chart below shows the number of accidents that occurred at or near each station.

[Figure: No of Accidents occurred near each station. Bar chart; x-axis: Station, y-axis: No of accidents.]

From the above chart, it can be inferred that the maximum number of accidents occurred at or
near Kandivali station, followed by Borivali station.

The probability of an accident occurring at or near each station was calculated. The maximum
probability is 0.069, at or near Kandivali station, and the minimum is 0.009, at or near
Mahalaxmi station.

The table below shows the probability calculation for the number of accidents at or near each
station.
Stations No of Accidents Probability
ANDHERI 75 0.055
BANDRA 67 0.049
BHAYANDER 61 0.045
BORIVALI 87 0.064
CHARNI ROAD 26 0.019
CHURCHGATE 23 0.017
DADAR 59 0.044
DAHISAR 60 0.044
ELPHINSTONE 15 0.011
GOREGAON 72 0.053
GRANT ROAD 20 0.015
JOGESHWARI 85 0.063
KANDIVALI 94 0.069
KHAR ROAD 33 0.024
LOWER PAREL 48 0.035
MAHALAXMI 12 0.009
MAHIM 49 0.036
MALAD 60 0.044
MARINE LINES 18 0.013
MATUNGA RD 18 0.013
MIRA RD 26 0.019
MUMBAI CENTRAL (L) 37 0.027
NAIGAON 49 0.036
NALASOPARA 61 0.045
SANTA CRUZ 44 0.032
VASAI RD 58 0.043
VILE PARLE 45 0.033
VIRAR 54 0.040
Total 1356 1.000
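
The probability column is simply each station's count divided by the total. A minimal Python sketch of that calculation, using the counts from the table above (the variable names are illustrative only):

```python
# Accident counts per station, copied from the table above.
counts = {
    "ANDHERI": 75, "BANDRA": 67, "BHAYANDER": 61, "BORIVALI": 87,
    "CHARNI ROAD": 26, "CHURCHGATE": 23, "DADAR": 59, "DAHISAR": 60,
    "ELPHINSTONE": 15, "GOREGAON": 72, "GRANT ROAD": 20, "JOGESHWARI": 85,
    "KANDIVALI": 94, "KHAR ROAD": 33, "LOWER PAREL": 48, "MAHALAXMI": 12,
    "MAHIM": 49, "MALAD": 60, "MARINE LINES": 18, "MATUNGA RD": 18,
    "MIRA RD": 26, "MUMBAI CENTRAL (L)": 37, "NAIGAON": 49, "NALASOPARA": 61,
    "SANTA CRUZ": 44, "VASAI RD": 58, "VILE PARLE": 45, "VIRAR": 54,
}

total = sum(counts.values())

# Probability for each station = accidents near the station / all accidents.
probabilities = {station: n / total for station, n in counts.items()}

highest = max(probabilities, key=probabilities.get)
lowest = min(probabilities, key=probabilities.get)

print(total)
print(highest, round(probabilities[highest], 3))
print(lowest, round(probabilities[lowest], 3))
```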

We further analysed the data by gender and found that most of the accidents involve men.
The chart below shows the number of accidents of each type, segregated by gender.

[Figure: No of accidents segregated based on Gender. Bar chart; x-axis: type of accident (Minor, Major, Dead), y-axis: No of accidents; series: Male, Female.]
4.2 Bayes’ Theorem
Using Bayes' theorem, we find, given that an accident has occurred, the probability that it
occurred near a specific station. The table below shows the analysis by Bayes' theorem.

Station             Dead  Major  Minor  Grand Total  Prior Prob  Cond Prob  Joint Prob  Post Prob
Andheri               23     44      8           75       0.055      0.800       0.044      0.070
Bandra                15     45      7           67       0.049      0.700       0.035      0.054
Bhayander             20     35      6           61       0.045      0.700       0.031      0.050
Borivali              27     50     10           87       0.064      0.900       0.058      0.091
Charni road            7     14      5           26       0.019      0.300       0.006      0.009
Churchgate             2     16      5           23       0.017      0.300       0.005      0.008
Dadar                  6     40     13           59       0.044      0.600       0.026      0.041
Dahisar               23     27     10           60       0.044      0.700       0.031      0.049
Elphinstone            4     10      1           15       0.011      0.200       0.002      0.003
Goregaon              22     43      7           72       0.053      0.800       0.042      0.067
Grant road             5     11      4           20       0.015      0.300       0.004      0.007
Jogeshwari            29     50      6           85       0.063      0.900       0.056      0.089
Kandivali             40     45      9           94       0.069      0.900       0.062      0.098
Khar road             16     15      2           33       0.024      0.400       0.010      0.015
Lower Parel           11     31      6           48       0.035      0.500       0.018      0.028
Mahalaxmi              2      8      2           12       0.009      0.200       0.002      0.003
Mahim                 12     35      2           49       0.036      0.500       0.018      0.028
Malad                 20     34      6           60       0.044      0.700       0.031      0.049
Marine lines           0     16      2           18       0.013      0.200       0.003      0.004
Matunga road           5     12      1           18       0.013      0.200       0.003      0.004
Mira road             14     11      1           26       0.019      0.300       0.006      0.009
Mumbai central (l)     2     27      8           37       0.027      0.400       0.011      0.017
Naigaon               21     18     10           49       0.036      0.500       0.018      0.028
Nalasopara            26     27      8           61       0.045      0.700       0.031      0.050
Santa Cruz            14     26      4           44       0.032      0.500       0.016      0.026
Vasai road            23     32      3           58       0.043      0.600       0.026      0.040
Vile Parle             6     31      8           45       0.033      0.500       0.017      0.026
Virar                 28     23      3           54       0.040      0.600       0.024      0.038
Grand Total          423    776    157         1356       1.000      0.400       0.636      1.000
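
The joint and posterior columns can be reproduced from the accident counts and the assumed conditional probabilities. The Python sketch below is an illustration of that arithmetic, not the authors' spreadsheet: it treats the "Cond Prob" column as the conditional probability for each station (the report does not name the conditioning event), and the variable names are hypothetical.

```python
# (accidents near station, assumed conditional probability), from the table above.
stations = {
    "Andheri": (75, 0.8), "Bandra": (67, 0.7), "Bhayander": (61, 0.7),
    "Borivali": (87, 0.9), "Charni road": (26, 0.3), "Churchgate": (23, 0.3),
    "Dadar": (59, 0.6), "Dahisar": (60, 0.7), "Elphinstone": (15, 0.2),
    "Goregaon": (72, 0.8), "Grant road": (20, 0.3), "Jogeshwari": (85, 0.9),
    "Kandivali": (94, 0.9), "Khar road": (33, 0.4), "Lower Parel": (48, 0.5),
    "Mahalaxmi": (12, 0.2), "Mahim": (49, 0.5), "Malad": (60, 0.7),
    "Marine lines": (18, 0.2), "Matunga road": (18, 0.2), "Mira road": (26, 0.3),
    "Mumbai central (l)": (37, 0.4), "Naigaon": (49, 0.5), "Nalasopara": (61, 0.7),
    "Santa Cruz": (44, 0.5), "Vasai road": (58, 0.6), "Vile Parle": (45, 0.5),
    "Virar": (54, 0.6),
}

total = sum(n for n, _ in stations.values())

# Prior: share of all accidents recorded near each station.
prior = {name: n / total for name, (n, _) in stations.items()}

# Joint: prior multiplied by the assumed conditional probability.
joint = {name: prior[name] * cond for name, (_, cond) in stations.items()}

# Denominator of Bayes' theorem: the sum of all joint probabilities.
evidence = sum(joint.values())

# Posterior: each joint probability as a share of that sum.
posterior = {name: j / evidence for name, j in joint.items()}

for name in ("Kandivali", "Borivali", "Mahalaxmi"):
    print(name, round(prior[name], 3), round(joint[name], 3), round(posterior[name], 3))
```

Because the posterior is the joint probability rescaled by a common denominator, stations that have both a high accident share and a high assumed conditional probability move furthest up the ranking.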
4.3 Hypothesis Testing – Test of Independence

The analysis for the Test of Independence is given below:

Step 1:
Ho: Type of accident is independent of gender
Ha: Type of accident is not independent of gender

Step 2:
Level of significance alpha = 0.05

Step 3:
The table below shows the type of accident with respect to gender.

Type of Accident    Male  Female  Total
Minor                132      25    157
Major                665     111    776
Dead                 377      46    423
Column Total        1174     182   1356

The test statistic is calculated from the data in this table. Observed frequency (OF) is the
actual frequency of each combination, and Expected frequency (EF) is the frequency that
combination would have if the type of accident and gender were independent.

Cell            Observed Frequency  Expected Frequency  OF - EF  (OF - EF)^2  (OF - EF)^2 / EF
Minor, Male                    132              135.93    -3.93        15.43              0.11
Minor, Female                   25               21.07     3.93        15.43              0.73
Major, Male                    665              671.85    -6.85        46.88              0.07
Major, Female                  111              104.15     6.85        46.88              0.45
Dead, Male                     377              366.23    10.77       116.09              0.32
Dead, Female                    46               56.77   -10.77       116.09              2.04

Chi Square value = 3.73

Step 4:
The P-value calculated = 0.16
Step 5:
As the p-value (0.16) is greater than alpha (0.05), we cannot reject Ho. Hence, there is not
sufficient evidence to conclude that the type of accident depends on gender.
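
The decision step can be checked by hand because the table has (3-1)*(2-1) = 2 degrees of freedom, and for exactly 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(-x/2). The Python sketch below relies on that special case only; it is not a general chi-square routine, and it starts from the rounded statistic 3.73, so its p-value can differ slightly in the last digit from the value reported above.

```python
import math

chi_square = 3.73  # statistic reported above (rounded)
alpha = 0.05       # significance level from Step 2

# Valid for 2 degrees of freedom only: P(X >= x) = exp(-x / 2).
p_value = math.exp(-chi_square / 2)

# Critical value at level alpha, obtained by inverting the same formula.
critical_value = -2 * math.log(alpha)

# Rejection rule: reject Ho if p-value <= alpha (equivalently, statistic >= critical value).
reject_h0 = p_value <= alpha

print(round(p_value, 3), round(critical_value, 3), reject_h0)
```

Both forms of the rule agree: the p-value is above 0.05 and the statistic is below the critical value of about 5.99, so Ho is not rejected.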

4.4 Regression Analysis


Regression analysis was carried out to estimate the relationship between the type of
accident and gender, age, month and station.

The tables below show the results of the regression analysis.

Regression Statistics
Multiple R           0.196
R Square             0.039
Adjusted R Square    0.036
Standard Error       0.613
Observations         1356

              Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept            1.972           0.079  25.031    0.000      1.818      2.127        1.818        2.127
X Variable 1         0.084           0.013   6.605    0.000      0.059      0.109        0.059        0.109
X Variable 2        -0.097           0.049  -1.972    0.049     -0.193     -0.001       -0.193       -0.001
X Variable 3         0.008           0.005   1.554    0.120     -0.002      0.018       -0.002        0.018
X Variable 4         0.004           0.002   1.791    0.074      0.000      0.008        0.000        0.008

The R square value is very low (approx. 0.039). This indicates that the X variables explain
only about 3.9% of the variation in the Y variable, i.e. the relationship between them is
very weak.
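
Two figures in the output can be cross-checked with a short Python sketch: the adjusted R square, using the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1), and which coefficients are significant at the 5% level. The values are the rounded ones printed above. The report does not state which attribute each "X Variable" corresponds to, so the sketch keeps the generic labels.

```python
r_square = 0.039   # R Square from the regression statistics
n_obs = 1356       # Observations
n_predictors = 4   # X Variable 1 to X Variable 4
alpha = 0.05       # significance level (same level as the test of independence)

# Adjusted R square penalises R square for the number of predictors used.
adjusted_r_square = 1 - (1 - r_square) * (n_obs - 1) / (n_obs - n_predictors - 1)

# P-values as printed in the coefficient table.
p_values = {
    "X Variable 1": 0.000,
    "X Variable 2": 0.049,
    "X Variable 3": 0.120,
    "X Variable 4": 0.074,
}

# A coefficient is treated as significant when its p-value is at or below alpha.
significant = [name for name, p in p_values.items() if p <= alpha]

print(round(adjusted_r_square, 3))
print(significant)
```

A coefficient can be statistically significant in a sample of this size while the model as a whole still explains very little of the variation, which is what the low R square shows.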
5. Conclusion and Discussion

5.1 Major Findings


Based on the analysis of data extracted from the Western Railway website for August 2018 to
August 2019, using the tools and concepts described above, our conclusions are as follows:
1. The maximum number of accidents occurred at or near Kandivali station. A few reasons
for the high number of accidents are falling from a running train, people on the tracks
being hit by trains, trying to board a running train, and crossing the railway line
instead of using the foot overbridge.
2. The number of men involved in accidents is higher than the number of women. However,
as per the test of independence, we could not reject the hypothesis that the severity of
accidents and gender are independent.
3. There is hardly any relationship between the type of accident and factors like age,
gender, month and station.
5.2 Limitations of the study
Due to lack of time, the researchers were unable to extract and analyse data for the last
5 – 10 years. Further research could analyse the difference in the number of accidents during
the summer, monsoon and winter seasons and try to identify accident patterns.
5.3 Recommendation
1. Enhance supervision by authorities to monitor whether people are adhering to the rules.
2. Encourage the use of foot overbridges instead of crossing the railway lines.
3. Spread awareness about the increase in the rate of accidents.
6. References

1. https://wr.indianrailways.gov.in/index.jsp
