
Aim: Predictive risk modelling to classify the underwriting risk of vehicle policies and
to quantify that risk.

Data: The data comes from the Kaggle Allstate Claim Prediction Challenge. It contains
vehicle features as well as non-vehicle features, such as policy-related ones.
The data covers the period 2005-2007. Since the full dataset is very large (around
2.5 GB), a portion of it may have to be used for practical purposes.
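
One possible way to draw such a portion without loading the whole file into memory is reservoir sampling over the CSV rows. The sketch below is only an illustration using the Python standard library; the function name `sample_csv_rows` and the file name in the comment are hypothetical, and it assumes the file has a single header row.

```python
import csv
import random


def sample_csv_rows(path, k, seed=0):
    """Return the header and a uniform random sample of up to k data rows.

    Uses reservoir sampling, so the file is read once and only k rows
    are held in memory at any time.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # assumes the first line is a header
        for i, row in enumerate(reader):
            if i < k:
                # Fill the reservoir with the first k rows.
                reservoir.append(row)
            else:
                # Replace a random slot with probability k / (i + 1).
                j = rng.randint(0, i)  # inclusive on both ends
                if j < k:
                    reservoir[j] = row
    return header, reservoir


# Example call (hypothetical file name):
# header, rows = sample_csv_rows("claims.csv", k=100_000)
```

Fixing the seed keeps the sample reproducible between runs, which helps when comparing models later on.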

Objectives:
1. Based on the vehicle and policy feature set, classify each entry into a risk class. The risk
class can be treated as a multi-class categorical variable.
2. Develop a classification model and a regression model to assign each entry to
its risk class and to quantify the risk by predicting a claim amount.
Solution Steps:
1. Data Pre-processing and cleaning: Handling missing values and outliers.
2. Exploratory Data Analysis: Univariate and Bivariate analysis. Identifying the
dependencies and the redundant variables.
3. Dividing the data into training and test sets (or train/validation/test sets). Using the
insurance claim amounts, assign every vehicle entry a risk class (say 1-8, with 1 for
lowest risk and 8 for highest risk).
4. Run a classification model (such as SVM or logistic regression) to tackle the multiclass
classification problem of assigning a risk level to each entry in the training set.
5. Run a regression model (linear regression, etc.) to predict the claim amounts for
each entry.
6. In each of steps 4 and 5, aspects such as feature selection and hyperparameter
tuning must be addressed to optimize the results. Results from multiple regression
or classification algorithms may also be compared to find the most suitable
option.
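
Steps 3 to 5 could be sketched roughly as follows with scikit-learn. This is a minimal illustration on synthetic data, not the project's actual pipeline: the feature matrix, the coefficients, and the quantile-based class boundaries are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the cleaned features (hypothetical: 500 entries, 4 features).
X = rng.normal(size=(500, 4))
# Synthetic claim amount: a linear signal plus noise, shifted to be non-negative.
claim = X @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(scale=0.5, size=500)
claim = claim - claim.min()

# Step 3: split first, then derive risk classes 1..8 from the claim amounts.
X_train, X_test, claim_train, claim_test = train_test_split(
    X, claim, test_size=0.2, random_state=0
)
# Cut points come from the training claims only (assumed quantile binning).
cuts = np.quantile(claim_train, np.linspace(0, 1, 9)[1:-1])
risk_train = np.digitize(claim_train, cuts) + 1
risk_test = np.digitize(claim_test, cuts) + 1

# Step 4: multiclass classification of the risk level.
clf = LogisticRegression(max_iter=1000).fit(X_train, risk_train)
accuracy = clf.score(X_test, risk_test)  # mean accuracy on the test set

# Step 5: regression on the claim amount.
reg = LinearRegression().fit(X_train, claim_train)
r2 = reg.score(X_test, claim_test)  # coefficient of determination on the test set

print(f"classification accuracy: {accuracy:.3f}, regression R^2: {r2:.3f}")
```

For step 6, either estimator could be wrapped in scikit-learn's `GridSearchCV` to tune hyperparameters, and other estimators could be swapped in for comparison on the same split.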

Progress made: I am currently in the Data pre-processing and EDA stage.
