
Individual Report: Revolut Car Insurance Data Analytics


Incorporating Sustainability with Profitability
(Individual Assignment)

By
Oisín Butler, 23358866

BU7152: Financial Analytics

Module Leader: Michael Dowling

Trinity Business School


TRINITY COLLEGE
UNIVERSITY OF DUBLIN

March 2024
Introduction
The objective of this project was to create a car insurance plan for Revolut, tailored specifically to the Irish market. Within our team, I undertook the data analysis, using Python scripts to infer additional data, model the dataset, and conduct the analysis. I was also responsible for generating the personalised insurance premiums from the insights derived from that analysis. Additionally, I led the presentation of the project, consolidating our findings into a cohesive and polished presentation for the team.

Proposed Revolut Car Insurance Product
The proposed Revolut Car Insurance product is crafted to address the unique needs of the Irish market while aligning with Revolut's brand ethos of innovation, sustainability, and customer-centricity.

Our primary objective is to identify the leading causes of claims in order to manage risk effectively and set personalised premiums that reflect individual driving behaviours and risk profiles. To achieve this, we've developed a formula that analyses various factors such as driving history, vehicle type, location, and frequency of claims. By leveraging data analytics and machine learning techniques, we can predict risk and tailor premiums accordingly, thereby optimising our company's risk management strategy.

In addition to risk management, our team wanted to incorporate a sustainability-oriented reward system to incentivise environmentally conscious driving habits. Through this, customers who exhibit eco-friendly behaviours, such as driving electric vehicles or reducing carbon emissions, will be eligible for discounts and rewards. This not only promotes the green movement but also enhances Revolut's brand image as a socially responsible company committed to environmental sustainability.

(Revolut Green Project example)

Furthermore, our product integrates within the Revolut ecosystem, providing customers with a seamless and convenient experience. By leveraging Revolut's existing infrastructure and user base, we can offer enhanced features such as streamlined payments, real-time monitoring of driving behaviour, and personalised recommendations tailored to individual preferences.

(Interface Integration)

Feature Engineering
When it came to implementing the goal of our project, it was clear that I would need to make some alterations to the existing data set. The original data set contained 43 different variables, including engine_type, is_claim, area_cluster and population density. The first thing I wanted to do was make the data more relatable, so I converted the area clusters to Irish town names in order of largest population density to smallest; for example, cluster C1 (70,000) became Dublin. I then did some research to link the car models, engine types and overall specs with existing cars. I did this successfully and came up with models such as the Suzuki Swift and the Tata Harrier.

(Suzuki Swift)

I could then determine each car's average fuel expenditure on highways and off highways by using the age_of_car and the car's efficiency score found online. Using the car's fuel type, I could calculate its CO2 emissions per litre of fuel burned. I divided the Irish towns into groups such as commuter towns or urban areas. This indicates how often an individual will need to drive, typically how far they will need to drive, and on which kinds of roads they will be driving, e.g. highway or inner city. Using all of these metrics I calculated an average distance travelled per car model per week per town, which combined with my CO2 calculation gave us a personalised total CO2 emissions estimate per car per day.

(Population distribution in new cities)

Model Building
To determine the main variables that were indicative of a claim being made, I ran four classification models: logistic regression, random forest, SVC and decision tree. An issue emerged when I noticed a significant class imbalance.

(Poor Random Forest classifier)

Positive cases (claims being made) in the data set were so few compared to negative cases (claims not being made) that my models had an extremely difficult time identifying claims. My random forest model was returning an accuracy of 92% but was failing to identify claims. The models were no better than random sampling.

(Random forest metrics)

A solution to this problem that I tried was the Synthetic Minority Oversampling Technique, or SMOTE. SMOTE works by generating synthetic instances of the minority class (in this case, claims being made) to rebalance the dataset. The logistic regression model saw significant changes with SMOTE. Before, it had high accuracy but failed to spot claims. After SMOTE, the accuracy dropped to 50% but the model got better at finding claims, with recall rising from 0.00 to 0.57. However, a precision of 0.07 is not good enough to rely on this model.

                 Logistic     SMOTE      Random    SMOTE
                 Regression   Logistic   Forest    Forest
Accuracy         0.93         0.50       0.92      0.88
Precision        0.00         0.07       0.11      0.10
Recall           0.00         0.57       0.02      0.11
F1 Score         0.00         0.13       0.03      0.11
ROC AUC Score    0.50         0.53       0.57      0.52

(Model Metrics with SMOTE)

(SMOTE random forest Confusion Matrix)

After taking the best overall model, random forest, I ran a feature importance analysis to determine which variables were the most important when determining whether a claim would be made or not. The result showed 8 variables being far more important than any other. These included policy tenure, car age, holder age, town, distance driven, CO2 emissions and car model.

(Feature importance)

We can ignore CO2 emissions as it is derived from distance travelled and car model. I ran an exploratory data analysis (EDA) of each of the variables against is_claim. I then categorised their values into risk tiers: the values linked with the highest percentage of claims were high risk, the middle ones moderate risk, and the safest ones low risk. I then moved on to calculating premiums.
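
The tiering step described above can be sketched as follows. This is a minimal illustration, assuming the data is available as (category, is_claim) pairs; the function names, the sample records and the equal three-way split by rank are assumptions for illustration, not the cut-offs used in the project.

```python
from collections import defaultdict


def claim_rates(records):
    """Return {category: share of records with a claim} for (category, is_claim) pairs."""
    totals = defaultdict(int)
    claims = defaultdict(int)
    for category, is_claim in records:
        totals[category] += 1
        claims[category] += int(is_claim)
    return {c: claims[c] / totals[c] for c in totals}


def risk_tiers(rates):
    """Rank categories by claim rate and split the ranking into three tiers.

    Assumption: an equal three-way split by rank. The report does not state
    the actual cut-offs.
    """
    ranked = sorted(rates, key=rates.get, reverse=True)  # highest claim rate first
    n = len(ranked)
    tiers = {}
    for i, category in enumerate(ranked):
        if i < n / 3:
            tiers[category] = "high"
        elif i < 2 * n / 3:
            tiers[category] = "moderate"
        else:
            tiers[category] = "low"
    return tiers


# Hypothetical (town, is_claim) records, not taken from the project data.
sample = (
    [("Kilkenny", 1)] * 3 + [("Kilkenny", 0)] * 7
    + [("Dublin", 1)] * 2 + [("Dublin", 0)] * 8
    + [("Galway", 1)] * 1 + [("Galway", 0)] * 9
)
rates = claim_rates(sample)
tiers = risk_tiers(rates)
print(rates)
print(tiers)
```

The same two functions could be applied to any categorical variable (car model, age band, policy tenure band) before the tiers are turned into premium weightings.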

(Claims percentage per age category)

Results and Insights
In analysing the data, several key findings emerged. Firstly, the distribution of claims by age category showed a trend of increasing claims with age until around the age of 55, after which claims decreased significantly. However, the percentage of claims within each age category remained relatively consistent, suggesting that the proportion of claims within each age group was stable.

(Claims by age)

Secondly, claims varied significantly by city, with some cities such as Kilkenny and Waterford experiencing higher percentages of claims compared to others.

(Claims by city)

Thirdly, examining claims by average travel distance revealed that higher travel distances, particularly in the highway category, were associated with higher percentages of claims.

(Claims Distance travelled in city)

(Claims Distance travelled on Highway)

Moreover, analysing claims by car model showed that certain models, such as the Maruti Suzuki Swift, Hyundai Creta, and Maruti Suzuki Alto, had notably higher percentages of claims compared to others.
(Claims by car)

These insights provide valuable information for understanding the patterns and factors influencing claims within the dataset.

Premium Calculation Formula
When designing a premium formula, I first had to look at what our competitors were charging.

(Irish Insurance market)

Using the average Irish car insurance premium rates as guidance, I began to create a formula. Using the key factors, I created a risk-based weighting for each variable that would act as a coefficient of a base premium in order to determine a fully personalised premium.

Basic Premium:
Base Premium (€850) × CAF × CMAF × DTAF(city) × DTAF(highway) × AAF × PTAF

(Premium Calculation and explanation)

The formula's structure allows for the incorporation of these factors as coefficients that adjust a base premium (set at €850) to produce a fully personalised premium. Each factor is weighted based on its significance in assessing risk. For instance, the City Adjustment Factor (CAF) accounts for the city of registration of each policyholder, while the Car Model Adjustment Factor (CMAF) considers the risk associated with different car models. The Daily Travel Adjustment Factors (DTAF) adjust the premium based on the driving environment, whether in city or highway conditions, as these factors influence accident rates. The Age Adjustment Factor (AAF) reflects the impact of the policyholder's age on risk assessment, with younger or older drivers potentially facing different premiums. Finally, the Policy Tenure Adjustment Factor (PTAF) adjusts the total based on the length of contract with the policyholder. These factors come together to give a tailored assessment of risk. By incorporating these risk-based weightings into the formula, we can determine premiums that reflect the level of risk associated with each policyholder's driving profile, thereby ensuring fair and personalised pricing.

Sustainability Score and Rewards
The Green score is a much easier calculation. The Green score (GS) is determined by evaluating the vehicle's carbon dioxide (CO2) emissions per day relative to the range recorded in our database, with adjustments based on fuel type.

(Sustainability score and premium Interface)

The emissions difference is divided by the CO2 range, then multiplied by 0.7. Additionally, a fuel type factor (Fscore), ranging from 1 for electric vehicles to 0.25 for diesel vehicles, is included, weighted at 0.3. We then provide a discount to users based on their GS, taken off the top of their overall premium. This composite score provides a streamlined assessment of the vehicle's environmental impact, enabling us to encourage eco-friendly driving habits and prioritise low-emission vehicles within our insurance offerings.

Conclusion
Through data analysis and the use of modelling techniques, my team managed to develop a comprehensive calculation formula that represents an individual's risk profile. By integrating these elements into our insurance product, we are poised to deliver a unique and competitive offering within the Revolut ecosystem. This project not only enhances our understanding of the factors influencing claims but also sets the stage for continued innovation in insurance services tailored to the needs of our customers.

Limitations and future work
Dealing with the data was difficult because there were very few claims to work with. This made it hard to build a model that was accurate at identifying claims. In the future, we could try undersampling to address this: balancing the data by taking fewer samples from the majority class. I did not think of trying this before, but it could improve our model. Better group communication and task allocation would also be beneficial, as it would appear that my group was over-reliant on my prior knowledge of Python, coupled with my ability to speak English fluently, for this project.
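
The undersampling idea mentioned under limitations can be sketched in a few lines. This is a minimal illustration using only the standard library; the function name, the `label_of` accessor and the toy data are hypothetical, and it was not part of the project's code. In practice it would normally be applied to the training split only, so that the test set keeps the real class balance.

```python
import random


def random_undersample(rows, label_of, seed=0):
    """Randomly drop majority-class rows until both classes are the same size.

    rows     : list of records
    label_of : function returning 1 (claim) or 0 (no claim) for a record
    seed     : fixed seed so the sample is reproducible
    """
    rng = random.Random(seed)
    positives = [r for r in rows if label_of(r) == 1]
    negatives = [r for r in rows if label_of(r) == 0]
    # Whichever list is shorter is the minority class.
    minority, majority = sorted([positives, negatives], key=len)
    kept = rng.sample(majority, len(minority))  # sample without replacement
    balanced = minority + kept
    rng.shuffle(balanced)
    return balanced


# Toy data: 5 claims among 100 policies (hypothetical, for illustration only).
policies = [{"id": i, "is_claim": 1 if i < 5 else 0} for i in range(100)]
balanced = random_undersample(policies, lambda r: r["is_claim"])
print(len(balanced), sum(r["is_claim"] for r in balanced))
```

The trade-off compared with SMOTE is that no synthetic rows are invented, but most of the majority-class data is discarded.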
Appendices
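
As a supplementary worked example for the premium formula and the Green score described in this report, the following is a minimal sketch. The factor values passed in are hypothetical, the function and parameter names are illustrative, and reading the "emissions difference" as (highest recorded CO2 minus the vehicle's CO2) is an assumption; the mapping from Green score to discount is not specified in the report and is therefore left out.

```python
BASE_PREMIUM_EUR = 850.0  # base premium stated in the report


def personalised_premium(caf, cmaf, dtaf_city, dtaf_highway, aaf, ptaf,
                         base=BASE_PREMIUM_EUR):
    """Base Premium x CAF x CMAF x DTAF(city) x DTAF(highway) x AAF x PTAF."""
    return base * caf * cmaf * dtaf_city * dtaf_highway * aaf * ptaf


def green_score(co2_per_day, co2_min, co2_max, fuel_factor):
    """0.7 * (emissions difference / CO2 range) + 0.3 * fuel-type factor.

    Assumption: the emissions difference is (co2_max - co2_per_day), so the
    cleanest vehicle in the database scores highest on the emissions term.
    fuel_factor is the Fscore: 1 for electric down to 0.25 for diesel.
    """
    co2_range = co2_max - co2_min
    emissions_term = (co2_max - co2_per_day) / co2_range if co2_range else 0.0
    return 0.7 * emissions_term + 0.3 * fuel_factor


# Hypothetical adjustment factors for one policyholder (not project values).
premium = personalised_premium(caf=1.10, cmaf=1.05, dtaf_city=1.00,
                               dtaf_highway=1.20, aaf=0.95, ptaf=0.90)

# Hypothetical CO2 range of 0 to 10 kg per day across the database.
gs_electric = green_score(co2_per_day=0.0, co2_min=0.0, co2_max=10.0, fuel_factor=1.0)
gs_diesel = green_score(co2_per_day=10.0, co2_min=0.0, co2_max=10.0, fuel_factor=0.25)

print(round(premium, 2), gs_electric, gs_diesel)
```

Because the premium is a pure product of factors, a factor of 1.0 leaves the premium unchanged, values above 1.0 load it and values below 1.0 discount it.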

(Numerical vals vs Is_Claim)


(Correlation matrix Numerical)

(Age of holder vs Claim)


(Tableau Visualization 1)

(Age of Car vs Claim)

(Tableau Visualization 2)
(Claims per model)

(All models confusion matrix as above)
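
The accuracy, precision, recall and F1 figures reported in Model Building all follow from the four cells of a confusion matrix. The sketch below shows that arithmetic and why a model can score high accuracy on imbalanced data while finding no claims at all. The counts are hypothetical, chosen only to resemble the imbalance described in the report, and the function name is illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp: claims predicted as claims       fp: non-claims predicted as claims
    fn: claims predicted as non-claims   tn: non-claims predicted as non-claims
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical: 1,000 policies, 70 claims, and a model that always predicts "no claim".
always_no_claim = classification_metrics(tp=0, fp=0, fn=70, tn=930)
print(always_no_claim)
```

With these counts the accuracy is 0.93 even though precision, recall and F1 are all zero, which is why accuracy alone is a poor guide on this kind of data.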

(Random forest ROC)
