ACD - Capstone Project - Car accident severity

ACD_Coursera, Sep 22, 3 min read

Business Problem: Introduction

What are the most salient factors that determine the severity of a car accident? In this project we analyze a data set of car accidents and the conditions at the time of each event (road condition, visibility, weather). By training different machine learning models, we'll be able to predict how these conditions influence the severity of a collision. Furthermore, we'll construct a rating scale to assess the probability of an accident (1: extremely low probability, up to 5: highest probability).

Data Understanding and Preparation

The data used for this project is extracted from the Collisions set. To prepare the data to be fed into the models, we drop the columns that contain irrelevant information, which greatly reduces the computational cost of handling a large data set. In particular, we'll use the columns WEATHER (x1), ROADCOND (x2) and LIGHTCOND (x3) to predict SEVERITYCODE (y).

    import pandas as pd
    # (a second import is illegible in the source)

    df = pd.read_csv("Data-Collisions.csv")

    # Drop the columns that are not used as features or target
    dasta = df.drop(columns=[
        'OBJECTID', 'SEVERITYCODE.1', 'REPORTNO',
        # ... (names truncated in the source)
        'X', 'Y', 'STATUS', 'ADDRTYPE', 'LOCATION', 'EXCEPTRSNCODE', 'SEVERITYDESC',
        # ... (names truncated in the source)
        'PEDCYLCOUNT', 'PERSONCOUNT', 'SPEEDING',
        # ... (list truncated in the source)
    ])

    # Convert the three condition columns to categoricals
    dasta["WEATHER"] = dasta["WEATHER"].astype("category")
    dasta["ROADCOND"] = dasta["ROADCOND"].astype("category")
    dasta["LIGHTCOND"] = dasta["LIGHTCOND"].astype("category")

    # Encode each category as an integer code
    dasta["WEATHER_CAT"] = dasta["WEATHER"].cat.codes
    dasta["ROADCOND_CAT"] = dasta["ROADCOND"].cat.codes
    dasta["LIGHTCOND_CAT"] = dasta["LIGHTCOND"].cat.codes

    dasta.head(20)

Next, we balance the target variable (SEVERITYCODE) by downsampling the majority class prior to the modeling stage.

    from sklearn.utils import resample

    dasta_more = dasta[dasta.SEVERITYCODE == 1]
    dasta_less = dasta[dasta.SEVERITYCODE == 2]

    # Sample the larger class without replacement
    dasta_more_equal = resample(dasta_more,
                                replace=False,
                                n_samples=...,      # truncated in the source (begins 581)
                                random_state=...)   # truncated in the source

    dasta_bal = pd.concat([dasta_more_equal, dasta_less])
    dasta_bal.SEVERITYCODE.value_counts()

Modeling

Once the data has been appropriately transformed, it's time to test different models and determine which one has the best accuracy. In the usual order, we are going to train a logistic regression model, a KNN model and a decision tree.

    import numpy as np

    X = np.asarray(dasta_bal[['WEATHER_CAT', 'ROADCOND_CAT', 'LIGHTCOND_CAT']])
    X[0:5]

    y = np.asarray(dasta_bal['SEVERITYCODE'])
    y[0:5]

    # Standardize the features (after X has been defined)
    from sklearn import preprocessing
    X = preprocessing.StandardScaler().fit(X).transform(X)

As we did in previous labs, I have decided to use an 80% (train) and 20% (test) split.

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)

    # Logistic Regression
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix
    logReg = LogisticRegression(C=..., solver='liblinear').fit(X_train, y_train)  # C truncated in the source
    logPred = logReg.predict(X_test)
    logPredOdd = logReg.predict_proba(X_test)

    # KNN
    from sklearn.neighbors import KNeighborsClassifier
    ks = 15
    hood = KNeighborsClassifier(n_neighbors=ks).fit(X_train, y_train)
    hood
    hoodPred = hood.predict(X_test)
    hoodPred[0:5]
    # array([2, 2, 2, 2, 1])

    # Decision Tree
    from sklearn.tree import DecisionTreeClassifier
    treedat = DecisionTreeClassifier(criterion="entropy", max_depth=...)  # max_depth truncated in the source
    treedat
    treedat.fit(X_train, y_train)
    treePred = treedat.predict(X_test)

Evaluation

Now that we have successfully trained 3 different models, we need to evaluate them in order to determine which is the most accurate.
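To make the comparison concrete, here is a minimal, library-free sketch of three metrics commonly used to score classifiers: accuracy, F1 and log loss. The helper functions (`accuracy`, `f1_for_class`, `binary_log_loss`) and the toy arrays are hypothetical and only illustrate what such scores measure; the project itself relies on `sklearn.metrics`.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_for_class(y_true, y_pred, positive):
    """F1 score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def binary_log_loss(y_true, prob_positive, positive):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for t, p in zip(y_true, prob_positive):
        p = min(max(p, 1e-15), 1 - 1e-15)  # clip to avoid log(0)
        total += -math.log(p if t == positive else 1 - p)
    return total / len(y_true)

# Hypothetical toy data using the two severity codes (1 and 2)
y_true = [1, 1, 2, 2, 2, 1]
y_pred = [1, 2, 2, 2, 1, 1]
prob_2 = [0.2, 0.6, 0.9, 0.7, 0.4, 0.3]  # predicted probability of class 2

print("accuracy:", accuracy(y_true, y_pred))
print("F1 (class 2):", f1_for_class(y_true, y_pred, positive=2))
print("log loss:", binary_log_loss(y_true, prob_2, positive=2))
```

Accuracy and F1 are computed from the hard predictions, whereas log loss is computed from the predicted probabilities, which is why a probability output such as `predict_proba` is needed for it. Higher is better for accuracy and F1; lower is better for log loss.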
    from sklearn.metrics import f1_score
    # (two further imports from sklearn.metrics are truncated in the source)

    # Calls as far as they can be read; the function names are cut off
    # ...y_score(y_test, logPred)
    # ...(y_test, logPredOdd)

    0.529386492524
    0.683972651397
    0.542896479876548
    0.5571833648393195

Conclusion

In this project, we tackled the problem of predicting the severity of a collision (y) with a multifactorial model that takes road condition, visibility and weather as input variables. After selecting the data set, we curated it to facilitate the construction of 3 different models: logistic regression, KNN and a decision tree. After evaluating the models via accuracy tests, we concluded that logistic regression is the best approach for this task. These results are subject to further refinement by exploring higher-order ML models.
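The curation step summarized above (encoding the categorical conditions as integers, then downsampling the majority severity class) can be sketched without any libraries. This is a minimal illustration under stated assumptions, not the project's code: `encode_categories` and `downsample_majority` are hypothetical helpers that mimic the behaviour of pandas category codes (sorted categories, -1 for missing values) and of sampling without replacement, and the records are made up.

```python
import random
from collections import Counter

def encode_categories(values):
    """Map each distinct label to an integer code in sorted order; None -> -1."""
    categories = sorted({v for v in values if v is not None})
    code_of = {cat: i for i, cat in enumerate(categories)}
    return [code_of.get(v, -1) for v in values]

def downsample_majority(rows, label_of, seed=0):
    """Randomly keep, for every class, as many rows as the smallest class has."""
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed for reproducibility
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))  # sampling without replacement
    return balanced

# Hypothetical toy records: weather condition and severity code
weather = ["Clear", "Raining", "Clear", None, "Overcast", "Clear", "Raining", "Clear"]
severity = [1, 1, 1, 1, 2, 1, 2, 1]

weather_codes = encode_categories(weather)
rows = list(zip(weather_codes, severity))
balanced = downsample_majority(rows, label_of=lambda row: row[1])
class_counts = Counter(label for _, label in balanced)

print(weather_codes)
print(dict(class_counts))
```

Downsampling discards rows from the larger class, so it trades data volume for balanced classes; with the toy records, six rows of severity 1 are reduced to two so that both classes are equally represented.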
