
PRICING OF MOTOR VEHICLE INSURANCE PREMIUMS

ALFONCE OPIYO I162/0560/2018
LINET CHESANG I162/0556/2018
MUTHONI NDEGWA I162S/12300/2018
NACKSON MURIUKI I162S/19165/2015
SILVESTER ODHIAMBO I162/0615/2017

School of Pure and Applied Sciences
Department of Mathematics and Actuarial Science
Kenyatta University

A RESEARCH PROJECT SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE AWARD OF THE DEGREE OF BACHELOR OF SCIENCE IN ACTUARIAL SCIENCE IN THE SCHOOL OF PURE AND APPLIED SCIENCES.

22ND SEPTEMBER 2022

Declaration

This project is our original work and has not been submitted to any other institution of higher learning for the award of a certificate, diploma, degree or any other award.

NAME REGISTRATION NO. SIGN DATE


ALFONCE OPIYO I162/0560/2018
LINET CHESANG I162/0556/2018
MUTHONI NDEGWA I162S/12300/2018
NACKSON MURIUKI I162S/19165/2015
SILVESTER ODHIAMBO I162/0615/2017

I confirm that this research project has been submitted for examination with my approval as the university supervisor.

SIGN……………………………. DATE……………………………

Dr. ANANDA KUBE

Lecturer

Kenyatta University
Dedication

It is with genuine gratitude and warm regard that we dedicate this work to our parents and guardians, who have never failed to give us financial and moral support. We also dedicate this project to our friends and to all students pursuing actuarial science.
Acknowledgement

First and foremost, we would like to thank the Almighty God for enabling us to complete this project. We would like to convey our profound gratitude to Dr. Ananda Kube, our lecturer and supervisor, for supporting this project with his guidance, ideas and constructive criticism. We would also like to extend our appreciation to the Department of Mathematics and Actuarial Science and to all those who guided us in writing this project through their comments and suggestions. Lastly, we want to thank ourselves for our cooperation and commitment towards the completion of this project.
Abstract

The objective of this project is to develop a motor vehicle insurance pricing estimator using Generalized Linear Models (GLMs) in order to increase insurance penetration in Kenya. According to the 2020 Financial Stability Report by the Central Bank of Kenya (CBK), insurance penetration in Kenya is low at 2.4%, far below the global average of 7.23%. This is due in part to most Kenyans viewing insurance as a luxury product that they take up mainly when it is a regulatory requirement, as in the case of third-party insurance. Five attributes, namely the age of the driver, the age of the driver's license, the total distance covered by the car, the age of the car and the gender of the driver, have been used to estimate both claim frequency and claim severity. A Poisson model has been used to estimate the expected claim frequency, while a Gamma model has been used to estimate the expected claim severity. The analysis shows that, in the data used, the total distance covered by the car is significant in the estimation of claim frequency, while the other four attributes are not. None of the attributes is significant in the estimation of claim severity. These results suggest that the insurer should continue calculating pure premiums using its existing methods, because the factors analyzed do not affect premium rates and hence cannot be used to reduce the premiums charged by the insurer, which is what this project aimed to achieve.


Table of Contents
Declaration
Dedication
Acknowledgement
Abstract
CHAPTER ONE: INTRODUCTION
1.1 Background of the study
1.2 Problem Statement
1.3 Objectives of the study
1.3.1 General Objective
1.3.2 Specific Objectives
1.4 Significance of the study
CHAPTER 2: LITERATURE REVIEW
CHAPTER THREE: METHODOLOGY
CHAPTER FOUR: DATA ANALYSIS AND PRESENTATION
4.1 Data Presentation
4.1.1 Frequency
4.1.2 Severity
4.2 Data Analysis
4.3 Models Evaluation
CHAPTER FIVE: CONCLUSION AND RECOMMENDATION
References
Appendix

LIST OF TABLES
Table 1: Frequency histogram of five predictor variables for claim frequency
Table 2: Column charts of claim severity against predictor variables

LIST OF FIGURES
Figure 1: Generalized Linear Model output of claims data fitted on a Poisson distribution
Figure 2: Generalized Linear Model output of claims data fitted on a quasi-Poisson distribution
Figure 3: Generalized Linear Model output of claims data fitted on a Gamma distribution

CHAPTER ONE: INTRODUCTION

1.1 Background of the study

Insurance is a contract in which an individual, a corporation or a government pays a premium to an insurer in exchange for a guarantee of compensation for a specified loss, damage, illness or death. An insurance company is a financial institution that provides a variety of insurance plans to safeguard individuals and organizations against the risk of financial loss in exchange for regular premium payments. An insurance business works by pooling the risks of many policyholders. There are two types of insurance: general or non-life (short-term) insurance and life (long-term) insurance. General insurance is either short-term or annual, and it is divided into two types: personal (for individuals) and commercial (for businesses). Life insurance is a long-term contract between an individual (the policyholder) or an organization and an insurance firm.

The Insurance Regulatory Authority (IRA) regulates the insurance industry through the Insurance Act and its accompanying Guidelines/Regulations. The IRA is in charge of licensing, regulating and growing Kenya's insurance business, Insurance Regulatory Authority, (2017). The insurance industry in Kenya dates back to the British colonization of the country. Colonialism had both positive and harmful consequences, and the establishment of the insurance industry was one of them. The white settlers made significant investments in Kenyan land, particularly in farming and agriculture. The settlers, like the Italian merchants, understood the importance of safeguarding their investments against a myriad of risks. As their conquests grew, so did their investments, which in turn increased the demand for insurance coverage. The settlers capitalized on the situation and founded insurance companies. As these colonies grew, the colonizers' insurance networks expanded to meet the increasing need for insurance, and all of the insurance organizations were owned by British insurers.

The earliest insurance companies include Pioneer Assurance Society, founded in 1930; Jubilee Insurance Company, founded in 1937; Pan Africa Insurance, founded in 1946; and Provincial Insurance Company Limited, founded in 1949. By the time Kenya gained independence in 1963, these insurance organizations had been upgraded to full-fledged insurance firms.

Like many other developing countries, Kenya had no specific insurance legislation before independence; insurance operations were governed by the Companies Act. However, following independence in 1963, the United Nations Conference on Trade and Development (UNCTAD) helped Kenya recognize the need for legislation to regulate the growth of insurance companies and support the country's economy.

The Insurance Ordinance (1960) was enacted as a result of this. Its primary responsibility was to oversee the formation, financing and operations of the insurance industry. Two policies were subsequently adopted, in 1964 and 1972. The first recognized a healthy national insurance and reinsurance industry as a key component of economic development. The second encouraged developing nations to take steps to reduce their reliance on international insurers and reinsurers, and to domesticate competitive reinsurance terms and conditions derived from the international insurance business. As a result of these initiatives, the Insurance Act, Cap 487, was passed in 1986 and came into effect in January 1987. The Act established the office of a regulator and required insurance and reinsurance businesses, as well as other industry participants such as agents and brokers, to register.

The depth of a country's insurance market is measured by the insurance penetration ratio, calculated as the premiums underwritten in a particular year divided by the Gross Domestic Product (GDP) for that year. A higher penetration rate indicates that the insurance business contributes more to the country's economy. Insurance promotes economic development by encouraging savings, reducing unnecessary precautionary savings, and converting dormant capital into free capital by mitigating risk for participants in diverse economic sectors, Liedtke, (2007). Individuals, corporations and governments can manage their risk exposure by purchasing insurance to cover losses, liability litigation and natural calamities that might otherwise be catastrophic if no risk transfer mechanisms were in place. Insurance promotes financial stability and fosters investment by mobilizing savings. Personal and social insurance improve residents' well-being and allow workers to stay healthy and financially secure between jobs. A government can also encourage business development by providing insurance to individuals and enterprises directly or by cooperating with the insurance industry, Din et al., (2017).
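As a minimal sketch of the penetration ratio defined above, the Python functions below divide a year's written premiums by that year's GDP and, for comparison, compute insurance density (average premium spending per person). The function names and the figures in the example are hypothetical and are not taken from any of the reports cited in this section.

```python
def penetration_ratio(gross_premium: float, gdp: float) -> float:
    """Premiums written in a year as a percentage of that year's GDP."""
    if gdp <= 0:
        raise ValueError("GDP must be positive")
    return 100.0 * gross_premium / gdp


def insurance_density(gross_premium: float, population: float) -> float:
    """Average premium spending per person (same currency as gross_premium)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return gross_premium / population


# Hypothetical economy: 240 units of premium against a GDP of 10,000 units
# (both measured in the same currency and for the same year).
print(f"{penetration_ratio(240.0, 10_000.0):.1f}%")
```

Both measures use the same premium figure; penetration compares it with the size of the economy, while density compares it with the size of the population.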

The global insurance sector is dominated by wealthy developed nations. Despite accounting for less than 10% of the world's population, the Group of Seven countries (G7) account for about 65% of global insurance premiums. In 2012, these seven countries spent an average of US$ 3,910 per capita on insurance premiums, compared to barely US$ 120 in emerging markets, Re, (2013). Globally, 6.5% of people have insurance. In 2012, total insurance premiums in Africa amounted to US$ 71.9 billion, a penetration rate of 3.65%, Re, (2021). Re, (2020) shows that in 2019 total insurance penetration in emerging markets grew vigorously, continuing a solid upward trend, especially in Asia. Growth was seen in many regions, except in the life sectors of Emerging Europe and of the Middle East and Africa. The average per capita spending on insurance (insurance density) in emerging markets was US$ 175 in 2019, while insurance penetration was 3.3%. The report projected that COVID-19 would slow insurance market growth by close to 3 percentage points.

According to the African Insurance Organization, total insurance penetration in Africa in 2019 was 2.78%, compared to the global average of 7.23%. South Africa dominates the African insurance market, generating 70% of African insurance premiums, a value of US$ 48.3 billion. The impact of COVID-19 slowed the growth of insurance penetration in Africa, but the solid market fundamentals of African countries remain appealing for long-term growth.

Kenya's insurance industry is part of the financial services industry. According to the Insurance Regulatory Authority, Kenya has 56 insurance companies, 204 insurance brokers and 11,273 insurance agents. The 56 companies achieved a total revenue of Kshs 132.1 billion, with an overall underwriting profit of Kshs 15.5 billion, or around 11.7 percent of total revenue, Insurance Regulatory Authority, (2021).

Insurance penetration in Kenya remains low at 2.4%, according to the 2020 Financial Stability Report by the Central Bank of Kenya, far below the global average of 7.23%. This may be because most Kenyans view insurance as a luxury product and mostly take it up when it is a regulatory requirement, as in the case of third-party insurance. A variety of causes have been blamed for Kenya's low insurance penetration rate, including a burdensome regulatory environment, little public awareness, poor customer service, poor claims settlement, a non-supportive culture, low discretionary income, and marketing methods and prices, Barasa, (2016). Improving Kenya's socioeconomic situation would help raise insurance penetration, which has stayed below worldwide standards for a long time, indicating a huge uninsured consumer base.


The COVID-19 pandemic disrupted businesses around the world. It posed a particularly significant test to small and medium businesses (SMBs) given their typically thin cash reserves and over-dependence on a small number of routes to market. Levels of business insurance among SMBs tend to be relatively low, as they view insurance as a cash outflow that may not provide them with the needed benefits. In the wake of the pandemic, SMBs are likely to review their arrangements, since the pandemic served as a powerful demonstration of the value of insurance protection. This creates a significant opportunity for insurers to provide the right range and choice of cover, through the right channels, at the right price. Insurers that can build a truly differentiated SMB proposition, putting the segment's business needs and customer experience at the core of everything they do and connecting it across the enterprise, stand to secure a significant competitive advantage.

Insurance pricing (also known as ratemaking) is the process of determining what rates, or premiums, to charge for insurance. A rate is the cost of insurance per exposure unit (a unit of liability or property with similar characteristics). The true cost of providing insurance is not known until the policy period has expired, so insurance rates must be based on projections rather than on actual costs. Most rates are established by statistical analysis of past losses based on the characteristics of the insured, and premiums are determined using the variables that produce the best projections. In some cases, however, such as earthquake insurance, historical data may not provide an adequate statistical basis for setting a rate. Catastrophe modeling is sometimes used in these situations, although its results are generally less reliable. Underwriters determine which variables apply to a particular insurance application, while actuaries set the insurance rate based on a number of variables.
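To make the idea of a rate per exposure unit concrete, the sketch below computes claim frequency, average claim severity and loss cost (pure premium) for a single rating class from its historical experience. It assumes, purely for illustration, that exposure is measured in car-years; the class and its figures are hypothetical, and the sketch ignores the loss development, trend and credibility adjustments that a real rate review would apply. It also shows the identity pure premium = frequency × severity, which is why frequency and severity can be modeled separately.

```python
from dataclasses import dataclass


@dataclass
class ClassExperience:
    """Historical experience of one rating class (hypothetical figures)."""
    exposure_years: float   # exposure units, assumed here to be car-years
    claim_count: int
    total_losses: float

    def frequency(self) -> float:
        """Claims per exposure unit."""
        return self.claim_count / self.exposure_years

    def severity(self) -> float:
        """Average cost per claim (zero if there were no claims)."""
        return self.total_losses / self.claim_count if self.claim_count else 0.0

    def pure_premium(self) -> float:
        """Loss cost per exposure unit; equals frequency * severity."""
        return self.total_losses / self.exposure_years


experience = ClassExperience(exposure_years=500.0, claim_count=40,
                             total_losses=2_000_000.0)
print(experience.frequency(), experience.severity(), experience.pure_premium())
```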


The insurance sector is primarily concerned with protecting against the risk of an uncertain financial loss, so its operation is largely a risk management exercise. Through an insurance policy contract, the insured transfers future risk to an insurer in exchange for a fixed premium, and whenever the policyholder suffers a loss, they can file a claim with the insurer, provided the contract covers it. Because the insurer sets the premium before any claims occur, it is critical for the company to forecast the risks of its clients in order to set a profitable premium. Predictive modeling is therefore widely used in the insurance industry, both for assessing customers and for determining rates.

To ensure that the customer's expected loss is covered, the premium is set in line with his or her risk. This alone, however, does not cover the entire cost. An insurance company, like any other business, has expenses and aims to make a profit. The premium is therefore adjusted to cover the customer's expected loss and the company's costs while maintaining a reasonable profit margin. At its core, however, the premium is based on the customer's risk, and so that risk must be estimated.
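One common way to turn the risk-based (pure) premium into the premium actually charged is to add a fixed expense per policy and then divide by one minus the variable expense ratio and the profit margin, so that those loadings are proportional to the final premium. The sketch below assumes that convention; the function name, parameters and values are illustrative and are not the loadings of any particular insurer.

```python
def gross_premium(pure_premium: float,
                  fixed_expense: float,
                  variable_expense_ratio: float,
                  profit_margin: float) -> float:
    """Load a pure premium for expenses and profit.

    variable_expense_ratio and profit_margin are fractions of the gross premium
    (for example commission or premium tax), while fixed_expense is a flat
    amount per policy.
    """
    loading = variable_expense_ratio + profit_margin
    if not 0.0 <= loading < 1.0:
        raise ValueError("variable expenses plus profit must be a fraction below 1")
    return (pure_premium + fixed_expense) / (1.0 - loading)


# Illustrative values: pure premium 4,000, fixed expense 500,
# 15% variable expenses and a 5% profit margin, i.e. 4,500 / 0.80.
premium = gross_premium(4_000.0, 500.0, 0.15, 0.05)
print(round(premium, 2))
```

Dividing by (1 - loading) rather than adding a percentage of the pure premium ensures that the variable expenses and profit are the stated fractions of the final, gross premium.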

Both underwriting and ratemaking have to be precise. If the rate is accurate for a specific class but the underwriter assigns applicants who do not belong to that class, the rate may be insufficient to cover losses. On the other hand, if the underwriting is good but the rate is based on a small sample or on variables that do not consistently predict future losses, the insurance firm could lose a lot of money.

According to Re, (2021), commercial lines prices in non-life insurance increased by 15% in the third quarter of 2021. Property pricing increased by 9% in the third quarter, and Financial and Professional liability (FinPro) pricing increased by 32% in almost all regions. Property rates were mostly driven by catastrophe-related covers, and FinPro rates by rising Directors and Officers (D&O) claims. Casualty business showed price improvements of 6% in 2021, driven by improvements in the US and Europe.

1.2 Problem Statement

The insurance business around the world has shown remarkable progress in new product creation and technological advancement. When it comes to insurance market penetration in Kenya, however, there has been little progress. This may be a result of various factors, including the high cost of insurance, the lack of a savings culture among the general public, limited disposable income (close to half of Kenya's population lives in poverty), insufficient tax incentives to encourage the purchase of insurance products, and the limited credibility of the insurance industry, especially regarding the settlement of claims.

This study seeks to price motor vehicle insurance using a Generalized Linear Model (GLM) in order to reach a larger customer base with an affordable premium. Generalized Linear Models have been used successfully as a pricing technique in the insurance industry. Masese, (2020) modeled claim frequency and severity on auto insurance data using GLMs in order to determine the effects of distance driven, speed and time of driving on premium rates. The findings were that increases in distance and speed resulted in increases in claim frequency and severity, causing the pure premium to increase. By using GLMs, this study aims to determine the significant factors affecting motor insurance premium rates and adjust them in order to arrive at affordable premium rates. This will in turn help to increase insurance penetration by making affordable premiums available to the general population.
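With a log link, which is the usual choice for Poisson frequency and Gamma severity GLMs, a fitted coefficient translates directly into a multiplicative effect on the premium, and its significance can be judged with a Wald test. The sketch below illustrates both ideas using only Python's standard library; the coefficient values and standard errors in the example are hypothetical and are not results from Masese (2020) or from this study.

```python
import math


def relativity(beta: float, change: float = 1.0) -> float:
    """Multiplicative effect of a `change`-unit increase in a covariate under a log link."""
    return math.exp(beta * change)


def pure_premium_relativity(beta_freq: float, beta_sev: float, change: float = 1.0) -> float:
    """Pure premium = frequency * severity, so the two log-link effects multiply."""
    return relativity(beta_freq, change) * relativity(beta_sev, change)


def wald_p_value(estimate: float, std_error: float) -> float:
    """Two-sided p-value for H0: beta = 0 using the normal approximation."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical coefficients for distance driven (per 10,000 km):
beta_freq, se_freq = 0.12, 0.04   # frequency model
beta_sev, se_sev = 0.03, 0.05     # severity model
print(pure_premium_relativity(beta_freq, beta_sev))  # about 1.16, i.e. roughly +16%
print(wald_p_value(beta_freq, se_freq), wald_p_value(beta_sev, se_sev))
```

In this made-up example the distance effect would be significant for frequency but not for severity, which is the kind of distinction the study draws when deciding which factors should drive premium rates.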


1.3 Objectives of the study

1.3.1 General Objective

The main objective of this project is to develop a pricing estimator for motor vehicle insurance using GLMs.

1.3.2 Specific Objectives

i. To develop a pricing estimator for motor vehicle insurance using GLMs.
ii. To model claim frequency by use of GLMs and estimate the parameters.
iii. To investigate the properties of the proposed estimator in (i) above.
iv. To fit the proposed estimator in (i) above to motor vehicle insurance data to explain how pricing can be done.

1.4 Significance of the study

This project intends to develop an appropriate pricing estimator for motor vehicle insurance in Kenya by modeling claim frequency using GLMs. Appropriate pricing of motor vehicle insurance will enable more motor vehicle owners to take up insurance, thereby increasing insurance penetration in Kenya. The increase in the volume of motor vehicle insurance business will in turn generate more profit for insurers. Since there is very little literature on GLMs in the Kenyan market, this project will also provide insight into this area, as the use of GLMs in pricing has gained popularity in Malaysia, Canada and several European countries.
CHAPTER 2: LITERATURE REVIEW

This chapter focuses on reviewing and synthesizing existing work in the areas of motor insurance pricing and the use of Generalized Linear Models, which are becoming increasingly prominent as a method of statistical analysis for insurance data.

Nelder and Wedderburn first introduced generalized linear models in 1972. For decades, the Generalized Linear Model (GLM) has been one of the most widely used statistical tools for assessing automobile insurance. According to Huang & Query, (2007), it is a highly valid system for determining automobile insurance rates that can readily handle a large number of risk combinations and model complex relationships in claims data. Nelder & Verrall, (1997) demonstrated how credibility theory can be incorporated into GLM theory. Along the same lines, Schmitter, (2004) presented a straightforward approach for estimating the number of claims required for a GLM tariff calculation. GLMs are now widely regarded as the industry-standard technique for pricing and, more broadly, for modeling actuarial data. Insurers in many countries use GLMs to analyze their portfolios, and their use is becoming more common in Japan, Korea, Canada, Singapore, Brazil, Malaysia and many European countries.

Furthermore, GLM rating structures can take two forms. The first is an additive model, in which the effects of the covariates are added together. The alternative is a multiplicative model, in which the effects are multiplied. According to Ohlsson & Johansson, (2010), Goldburd et al., (2016) and Huang & Query, (2007), the additive model has inferior actuarial applicability because it can produce unrealistic results: negative premiums and claim estimates are easy to obtain even with reasonable input values. Furthermore, charging a $20 penalty on a $100 premium and the same amount on a $1,000 premium seems absurd. Under the multiplicative model, both individuals would instead be charged a 20% surcharge, which seems actuarially fair: 20% of $100 and of $1,000 equals $20 and $200, respectively.
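The contrast between the two structures can be sketched in a few lines of Python. The additive form corresponds to a GLM with an identity link and the multiplicative form to one with a log link, where each relativity is the exponential of a coefficient. The base premiums, the $20 surcharge and the 1.2 relativity mirror the example above; the $150 discount is a hypothetical value added only to show how an additive structure can produce a negative premium.

```python
import math


def additive_premium(base: float, adjustments: list[float]) -> float:
    """Identity-link structure: rating-factor effects are added to the base."""
    return base + sum(adjustments)


def multiplicative_premium(base: float, relativities: list[float]) -> float:
    """Log-link structure: rating-factor effects multiply the base."""
    return base * math.prod(relativities)


for base in (100.0, 1_000.0):
    flat = additive_premium(base, [20.0])           # same $20 for everyone
    scaled = multiplicative_premium(base, [1.2])    # 20% surcharge
    print(base, flat, round(scaled, 2))

# A large additive discount can push the premium below zero,
# whereas a product of positive relativities (exp of any coefficient) cannot.
print(additive_premium(100.0, [-150.0]))
print(multiplicative_premium(100.0, [math.exp(-1.5)]) > 0)
```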

Tweedie models, named after the British statistician who provided a comprehensive examination of them in 1984, are also of great importance in premium rating. Goldburd et al., (2016) noted that Tweedie models are well suited to estimating pure premiums because they place a large probability mass at zero, with the remaining mass skewed to the right. This is essentially how most claim-cost distributions appear when plotted.
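A Tweedie distribution with power parameter between 1 and 2 is a compound Poisson-gamma distribution: the number of claims is Poisson and each claim amount is Gamma distributed. The simulation below is a sketch with arbitrary parameter values chosen only for illustration, not fitted to any real portfolio. It uses NumPy's random generator to show the resulting shape: many policies with exactly zero total claims, and a right-skewed distribution among the rest.

```python
import numpy as np


def simulate_aggregate_claims(n_policies: int, lam: float, shape: float,
                              scale: float, seed: int = 0) -> np.ndarray:
    """Total claim cost per policy under a compound Poisson-gamma model.

    Claim counts are Poisson(lam); individual claims are Gamma(shape, scale).
    The sum of k independent Gamma(shape, scale) claims is Gamma(k * shape, scale),
    so each policy's total can be drawn in one step. Policies with no claims get 0.
    """
    rng = np.random.default_rng(seed)
    counts = rng.poisson(lam, size=n_policies)
    totals = np.zeros(n_policies)
    has_claim = counts > 0
    totals[has_claim] = rng.gamma(shape * counts[has_claim], scale)
    return totals


totals = simulate_aggregate_claims(100_000, lam=0.1, shape=2.0, scale=1_000.0)
print("share of zero totals:", (totals == 0).mean(), "theory:", np.exp(-0.1))
print("mean total:", totals.mean(), "theory:", 0.1 * 2.0 * 1_000.0)
positive = totals[totals > 0]
print("median vs mean of positive totals:", np.median(positive), positive.mean())
```

The share of zeros matches the Poisson probability of no claims, and the median of the positive totals sits below their mean, which is the right skew described above.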

Insurance companies categorize many kinds of risk using "risk variables" such as age and gender. There are a few things to consider when applying generalized linear models to insurance pricing, especially in the current global context. Williams & Shabanova, (2003) found that, for accident deaths, young males were more likely to be at fault than young females, while women in their 50s and older were more likely to be at fault than males of the same age. Young drivers, particularly males, have the highest rates of culpability for deaths per licensed driver. This suggests that gender and age are linked to crash risk and hence to claims. If these two factors contribute to crashes, they should be factored into pricing. In 2012, however, the European Court of Justice (ECJ) prohibited all private European Union (EU) insurers from discriminating between risks on the basis of gender. Insurers can be certain that acquiring more specific information will have little or no impact on their pricing procedures.

Bülbül & Baykal, (2016), in their study on optimal bonus-malus system design in automobile insurance, observed that the bonus-malus system is one of the most significant instruments used in third-party liability pricing. It is a mechanism that adjusts a policyholder's premium according to their claims history. The bonus-malus system aims to ensure fair rates, since each person pays a premium in accordance with the frequency of their claims. Claim frequency is typically used because it is more affected by unobserved and unmeasurable driver factors than claim severity is. Insurers use bonus-malus systems to encourage drivers to drive more cautiously and to make sure that policyholders pay a premium in line with their claim frequency history.
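The mechanics of a bonus-malus system can be sketched as a ladder of premium classes. The scale below is entirely hypothetical (nine classes, one step down after a claim-free year, two steps up per claim); it does not reproduce the system studied by Bülbül & Baykal (2016) or any insurer's actual rules, and only illustrates how claim frequency history drives the premium.

```python
# Hypothetical bonus-malus scale: premium relativity for each class, lowest first.
LEVELS = [0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.4, 1.7, 2.0]
ENTRY_CLASS = 4          # new policyholders start at relativity 1.0
STEPS_UP_PER_CLAIM = 2   # malus: climb two classes for each claim


def next_class(current: int, n_claims: int) -> int:
    """Move one class down after a claim-free year, otherwise up per claim."""
    if n_claims == 0:
        new = current - 1
    else:
        new = current + STEPS_UP_PER_CLAIM * n_claims
    return max(0, min(len(LEVELS) - 1, new))   # stay within the scale


def premium_history(base_premium: float, claims_per_year: list[int],
                    start: int = ENTRY_CLASS) -> list[float]:
    """Premium paid each year, given the number of claims reported in each year."""
    cls, premiums = start, []
    for n in claims_per_year:
        premiums.append(base_premium * LEVELS[cls])
        cls = next_class(cls, n)
    return premiums


# Two claim-free years, then one claim: the premium falls, then rises again.
print(premium_history(1_000.0, [0, 0, 1, 0]))
```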

Silva & Afonso, (2015) discussed other models that could be used to price motor insurance premiums. They considered the following options: pure premium by historically aggregated claims, pure premium by expected aggregated claims, classic linear models, and generalized linear models. Their research aimed to compare these methods in terms of the price increase relative to the original pricing and the dispersion of the pure premium. To make this comparison clear, they constructed a variable called Efficiency, which calculates the price rise so that it is one percentage point lower than the price in their dataset.

After doing their research, Silva & Afonso, (2015) concluded that the historical pure premium and GLM methodologies had the best price efficiency (1.60), but the former showed lower volatility than the latter: the historical pure premium technique had a standard error of 5.26 percent, while the GLM technique had a standard error of 17.52 percent. The historical pure premium had a comparative advantage because its basic indicators varied only modestly over time and because it contained data for all insured vehicles in Brazil, allowing strong adherence to the technique. GLM, on the other hand, appeared to be a viable pricing option for medium-sized portfolios that are growing or are exploring niche segments.

In circumstances where the data contain correlated risk factors, Huang & Query, (2007) proposed combining a Max model with a GLM to improve accuracy. The GLM usually does a good job with factors that are moderately correlated; it only has trouble fitting values when the factors are highly correlated. According to Huang & Query, (2007), the technique of selecting the most significant and least correlated factors did not work well in China because it left very few factors to work with. They went on to advise that correlated factors be used, since they help to bridge the gap between relativities.

The pricing scheme does not give a significant reduction to policyholders who provide inaccurate information, and there is sufficient redundancy to correct it. For instance, actuaries in China evaluated the vehicle's book value, engine size, make or model, mileage and other factors to assess the vehicle's risk, even though all of these factors are interrelated, which contradicts the assumptions of the GLM. Other factors add to the difficulty of separating the risks. Huang & Query, (2007) gave the example of a skilled driver driving a faulty vehicle, or vice versa. Major issues are frequently attributed to a single factor, even though the model's other factors are considered accurate and reliable. Given these challenges, they proposed using a Max model to overcome the problems of correlated variables.

The Max model assumes that for the correlated risk factor,

GLMs can also be extended into more advanced models, such as those described by Goldburd et al., (2016). Examples include elastic net GLMs, Generalized Additive Models (GAMs), Multivariate Adaptive Regression Splines (MARS), Generalized Linear Mixed Models (GLMMs), and GLMs with dispersion modeling (DGLMs). In the modern era, GLMs have had an undeniable impact on the pricing of auto insurance and serve as a foundation for more advanced approaches. According to Liu et al., (2017), telematics, the use of car sensors and positioning systems to monitor vehicle and/or driver behavior, has enabled insurers to redesign their insurance plans on the fly based on the data collected. Artificial Intelligence (AI) has expanded the quantity of data that insurers can collect from their customers. Companies such as Discovery are already using telematics in vehicle-based insurance in South Africa. Poole et al., (1998) defined Artificial Intelligence as intelligence demonstrated by machines, as opposed to human and animal intelligence. In computer science, AI research is described as the study of intelligent agents, which are any devices that perceive their surroundings and take actions that increase their chances of achieving their objectives.

The rise of various data-gathering devices, informally known as the Internet of Things (IoT), together with other technological advancements, has recently enabled actuaries to use predictive analytics, a benefit that comes with harvesting Big Data. This means that a typical consumer's premium can finally be set using criteria specific to them. Liu et al., (2017) and Zheng, (2015) have presented ideas and concepts that are being tested on a larger scale in industrialized countries. Liu et al., (2017) propose a driving behavior model based on Usage-Based Insurance (UBI). They built on previous work by Zheng, (2015), who proposed adjusting a static premium estimated with a GLM by combining it with a dynamic premium that is updated on the fly using telematics data obtained from the insureds' vehicles.

Dionne & Vanasse, (1989) proposed an extension of popular pricing strategies for auto insurance. This was accomplished by creating a negative binomial model with a regression component that merges two systems of tarification: an a priori model, which selects tariff variables, establishes tariff classes and calculates premiums, and an a posteriori model, which uses a bonus-malus system to adjust the policyholder's premium according to their accident history. The study showed how the bonus-malus system can be modified to include both a priori and a posteriori information on an individual basis.
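A standard way to combine a priori and a posteriori information in a negative binomial model is the Poisson-gamma credibility update. A log-link regression gives each policyholder an a priori expected claim frequency λ_t for each year, an unobserved gamma-distributed factor with mean 1 and variance 1/α captures what the tariff variables miss, and the expected frequency for the next year becomes λ_{T+1}(α + Σ n_t)/(α + Σ λ_t), where n_t are the observed claim counts. The sketch below implements that formula; the coefficients, α and the claims history are hypothetical, and it illustrates the general approach rather than reproducing Dionne & Vanasse's fitted model.

```python
import math


def a_priori_frequency(intercept: float, coefs: dict[str, float],
                       risk_factors: dict[str, float]) -> float:
    """Regression (log-link) expected claim frequency from tariff variables."""
    eta = intercept + sum(coefs[k] * risk_factors[k] for k in coefs)
    return math.exp(eta)


def a_posteriori_frequency(past_lambdas: list[float], past_claims: list[int],
                           next_lambda: float, alpha: float) -> float:
    """Poisson-gamma (negative binomial) posterior expected frequency.

    Heterogeneity theta ~ Gamma(shape=alpha, rate=alpha) has mean 1; after
    observing claims n_t with a priori means lambda_t, E[theta | history] is
    (alpha + sum n_t) / (alpha + sum lambda_t).
    """
    return next_lambda * (alpha + sum(past_claims)) / (alpha + sum(past_lambdas))


# Hypothetical tariff: intercept and one coefficient for a young-driver indicator.
lam = a_priori_frequency(-2.3, {"young_driver": 0.4}, {"young_driver": 1.0})
history = [lam] * 3                               # three years on the same tariff
print(a_posteriori_frequency(history, [0, 0, 0], lam, alpha=1.5))  # claim-free: below lam
print(a_posteriori_frequency(history, [0, 1, 0], lam, alpha=1.5))  # one claim: above lam
```

A claim-free history pulls the expected frequency below the tariff value, and a reported claim pushes it above, which is exactly the bonus-malus behaviour built on top of the a priori regression.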

According to Pantelous & Passalidou, (2013), the optimal premium strategy for a general insurance company is influenced by the competitiveness of the insurance market as well as by the demand for the company's products in that market. The premium pricing strategy is primarily based on claim expenses, business and administration costs, margins for changes in claims experience, and anticipated profits. The study approached the problem of finding a company's best course of action in a highly competitive insurance market in two steps. The first step was to specify a general equation for the business volume as a function of variables such as historical performance, the average market premium, the average company premium and company reputation. The optimal premium strategy was then found by maximizing the present value of a linear discounted function of the company's wealth, modeled as a discrete-time stochastic process. These results were applied to data from the Greek insurance industry to illustrate the model.

David, (2015) examined how auto insurance premiums are calculated using Generalized Linear Models, using data from a French auto insurance portfolio. The portfolio was separated into sub-portfolios according to various risk criteria. This prevents anti-selection, because each class contains policyholders with the same risk profile who are willing to pay the same fair premium. A Poisson regression model was used to predict claim frequency, and a Gamma model was used to determine the average claim cost for each class of policyholders. The premium was obtained by multiplying the projected frequency and cost of claims. The study found that the risk factors influencing the pure premium include the insured's occupation, the use of the vehicle, the bonus-malus coefficient and the age of the insurance contract. It also found that the pure premium decreases with an increase in the insured's age, the age of the insurance contract and the growth of the bonus-malus coefficient.

A study by Kafková, (2015) shows that some important factors for an individual policyholder cannot be accounted for in GLMs. These include the driver's capacity, reflex actions, drug abuse and knowledge of the highway code. The study therefore used a GLM to estimate the annual claim frequency for different groups of drivers and then applied a bonus-malus system. This enabled a fair premium to be set for each group of drivers.

A study by Garrido et al., (2016) explored how GLMs can be used when the frequency and severity of claims are dependent. Motor vehicle insurance is an example: claim frequency and severity are often negatively correlated, because drivers who file many claims usually do so for minor accidents with low claim amounts. In the study, this dependence was modelled conditionally. The claim frequency was included as a covariate in the severity model, giving an extension of the previously used independence model. When the method was applied to an automobile insurance dataset, the dependence was found to be small but proved to be an important factor in determining the pure premium.

An advantage of Generalized Linear Models is that they are not limited to normally distributed data. They can be used with any distribution from the exponential family, which includes the Normal, Binomial, Gamma and Poisson distributions. Nelder & Wedderburn, (1972) showed that GLMs extend the Gaussian linear model to the exponential family of distributions. This has been beneficial in pricing general insurance, because claim frequency and severity are often not normally distributed. The development of GLMs has improved the quality of risk analysis models and of the methods used to determine an appropriate premium for a given risk.


The GLM framework has inspired a large literature on these models, and many authors have developed and demonstrated the assumptions used when applying them in general insurance. This chapter has reviewed the arguments of various authors and the ways in which they used GLMs to price insurance products. Their contributions have led to the development of better pricing models for non-life insurance.

CHAPTER THREE: METHODOLOGY

In statistical analysis, linear regression is used to model the relationship between a dependent variable and one or more independent variables. It assumes that the dependent variable follows a normal distribution and that it has a linear relationship with the independent variables. This becomes a shortcoming when the data are not normally distributed or the relationship is non-linear.

Generalized Linear Models were introduced to generalize linear regression. Instead of modelling the dependent variable directly, a GLM models a function of its mean, referred to as the link function, as a linear combination of the independent variables. The GLM works with a general class of distributions that includes both normal and non-normal distributions, and so addresses this shortcoming of linear regression. Because the transformed mean is a linear function of the explanatory variables x, the framework contains both additive (linear) models of the data and multiplicative models, the latter obtained with a log link.


The Components of a GLM

In a GLM, the dependent variable Y has an exponential dispersion family distribution with the

general form:

$$f(y;\theta,\phi)=\exp\left(\frac{y\theta-b(\theta)}{a(\phi)}+c(y,\phi)\right) \tag{1}$$

where $\theta$ is the natural (canonical) parameter, $\phi$ is the dispersion or scale parameter, and $a$, $b$ and $c$ are known functions. The exponential dispersion model becomes a one-parameter exponential family if $\phi$ is fixed. A distribution from the one-parameter exponential family has a log-likelihood of the form:

$$l=\frac{y\theta-b(\theta)}{\phi}+c(y,\phi) \tag{2}$$

where $\theta$ is the canonical parameter and $\phi$ is the dispersion parameter, assumed known. The dispersion parameter $\phi$ equals 1 for the Poisson distribution. For the Gamma distribution, however, it is unknown and has to be estimated. As shown in Haberman & Renshaw, (1996),

$$m=E(Y)=\frac{d}{d\theta}b(\theta)=b'(\theta) \tag{3}$$

$$\operatorname{Var}(Y)=\phi\,\frac{d^{2}}{d\theta^{2}}b(\theta)=\phi\,b''(\theta) \tag{4}$$
dx

$\operatorname{Var}(Y)$ is therefore the product of the dispersion parameter $\phi$ and $b''(\theta)$. The quantity $b''(\theta)$ is called the variance function. It depends on the canonical parameter $\theta$ and therefore on the mean. This can be illustrated with the log-likelihood of the Poisson distribution,

$$l=y\log m-m-\log y! \tag{5}$$

where $\theta=\log m$, $b(\theta)=\exp(\theta)$, $\operatorname{Var}(Y)=b''(\theta)=m$ and the dispersion parameter $\phi=1$.
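As a quick numerical check of (3) and (4) in the Poisson case, the sketch below differentiates $b(\theta)=e^{\theta}$ by central finite differences at $\theta=\log m$. It recovers both the mean and the variance, which equal $m$. The value $m=2.5$ and the step sizes are arbitrary choices made for illustration.

```python
import math


def b(theta):
    """Cumulant function b(theta) = exp(theta) of the Poisson distribution."""
    return math.exp(theta)


def first_derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_derivative(f, x, h=1e-4):
    """Central-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


m = 2.5                 # hypothetical Poisson mean
theta = math.log(m)     # canonical parameter, theta = log m
phi = 1.0               # dispersion parameter of the Poisson distribution

mean = first_derivative(b, theta)               # E(Y) = b'(theta), equation (3)
variance = phi * second_derivative(b, theta)    # Var(Y) = phi * b''(theta), equation (4)
print(round(mean, 6), round(variance, 6))
```

The same check works for any other cumulant function $b$ by swapping in the corresponding function.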

The link function defines the relationship between the random and systematic components of the model. It relates the expected value of the response to a linear combination of the predictor variables. Let the expected value of the response (dependent) variable be $E(Y_i)=\mu_i$. The link function can then be expressed as

$$g(\mu_i)=\beta_0+\sum_{j=1}^{p}\beta_j x_{ij}=x_i'\beta=\eta_i \tag{6}$$

where $\beta_0,\beta_1,\dots,\beta_p$ are the regression coefficients, $x_i$ includes a leading 1 for the intercept, and the linear predictor $\eta_i$ is a linear combination of the explanatory variables $x_{i1},\dots,x_{ip}$ of observation $i$. The function $g$ is called the link function because it links the linear predictor $\eta_i$ to the mean $\mu_i$ of the dependent variable.

This study aims to rate premium factors with a GLM in which the link function relates the mean of the response variable $Y$ to the predictor variables through this linear predictor.
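To show what the link function does, the sketch below computes the linear predictor $\eta_i$ of (6). It then maps $\eta_i$ back to the mean, once with the identity link and once with the log link. The coefficient values and covariates (a male indicator and distance in thousands of km) are hypothetical. Under the log link, changing one covariate multiplies the mean by $e^{\beta_j}$, which is why log-link GLMs give multiplicative rating factors. Under the identity link the change is additive.

```python
import math


def linear_predictor(beta, x):
    """eta_i = beta_0 + sum_j beta_j * x_ij, with x holding x_i1..x_ip (no intercept column)."""
    return beta[0] + sum(bj * xj for bj, xj in zip(beta[1:], x))


def mean_identity_link(beta, x):
    """Identity link g(mu) = mu, so mu_i = eta_i."""
    return linear_predictor(beta, x)


def mean_log_link(beta, x):
    """Log link g(mu) = log(mu), so mu_i = exp(eta_i)."""
    return math.exp(linear_predictor(beta, x))


# Hypothetical coefficients: intercept, male indicator, distance (thousand km).
beta = [-2.0, 0.3, 0.05]

female = mean_log_link(beta, [0, 8])
male = mean_log_link(beta, [1, 8])
print("log link relativity for males:", male / female)    # approximately exp(0.3)

difference = mean_identity_link(beta, [1, 8]) - mean_identity_link(beta, [0, 8])
print("identity link difference for males:", difference)  # approximately 0.3
```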


Estimation of the claim frequency

Frequency is the number of claims per unit of exposure over a specified period of time. In the non-life insurance industry, it has been demonstrated that GLM estimation of claim frequency can be based on an a priori Poisson structure. Antonio & Valdez, (2012) presented the Poisson model as the archetype for modelling event counts, which are commonly referred to as claim frequencies in the actuarial literature. The contribution of Cameron & Trivedi, (1998) represents an important development in regression models for counts. Assume that the discrete random variable $Y$ (the claim frequency, or observed number of claims) is Poisson distributed, conditional on the vector of explanatory variables $x_i$ (the observable risk characteristics). The Poisson distribution is then the appropriate statistical model for evaluating the probability of 0, 1, 2, ... claims occurring. Therefore, for insured $i$, the probability that the random variable $Y_i$ takes the value $y_i$ ($y_i\in\mathbb{N}$) is given by:

$$f(Y_i=y_i\mid x_i)=\frac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!} \tag{7}$$

The Poisson distribution implies a specific type of heteroscedasticity: the mean and the variance of the claim frequency are equal. In other words, the Poisson parameter gives both the mean and the variance of the distribution:

$$E(y_i\mid x_i)=V(y_i\mid x_i)=\lambda_i=e^{x_i'\beta} \tag{8}$$
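The sketch below evaluates the Poisson probabilities (7) for one insured whose rate is $\lambda_i=e^{x_i'\beta}$, as in (8). It then confirms numerically that the mean and the variance both equal $\lambda_i$. The coefficients and covariate values are hypothetical. The sum is truncated at 100 claims, which is ample for a rate of the size of a claim frequency.

```python
import math


def poisson_pmf(y, lam):
    """P(Y = y) = exp(-lam) * lam**y / y!, equation (7)."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def poisson_moments(lam, max_count=100):
    """Total probability, mean and variance from the pmf truncated at max_count claims."""
    probs = [poisson_pmf(y, lam) for y in range(max_count + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var


# Hypothetical coefficients and risk characteristics (leading 1.0 is the intercept).
beta = [-1.5, 0.4]
x_i = [1.0, 2.0]
lam = math.exp(sum(bj * xj for bj, xj in zip(beta, x_i)))   # equation (8)

total, mean, var = poisson_moments(lam)
print(round(lam, 4), round(mean, 4), round(var, 4))
```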

Maximum likelihood is the standard estimation method for this model. The likelihood function is

$$L(\beta)=\prod_{i=1}^{n}\frac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!} \tag{9}$$

Substituting $\lambda_i=e^{x_i'\beta}$ gives

$$L(\beta)=\prod_{i=1}^{n}\frac{\exp\left(-e^{x_i'\beta}\right)\left(e^{x_i'\beta}\right)^{y_i}}{y_i!} \tag{10}$$

The log-likelihood function is obtained by taking logarithms of both sides of (10):

$$l(\beta)=\sum_{i=1}^{n}\left[y_i\ln\lambda_i-\lambda_i-\ln y_i!\right]=\sum_{i=1}^{n}\left[y_i x_i'\beta-e^{x_i'\beta}-\ln y_i!\right] \tag{11}$$

The first two partial derivatives of the log-likelihood function exist and are:

$$\frac{\partial l(\beta)}{\partial\beta_j}=\sum_{i=1}^{n}(y_i-\lambda_i)x_{ij}=\sum_{i=1}^{n}\left(y_i-e^{x_i'\beta}\right)x_{ij} \tag{12}$$

$$\frac{\partial^{2} l(\beta)}{\partial\beta_j\,\partial\beta_k}=-\sum_{i=1}^{n}\lambda_i x_{ij}x_{ik}=-\sum_{i=1}^{n}e^{x_i'\beta}x_{ij}x_{ik} \tag{13}$$
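The score and Hessian above are all that Newton-Raphson needs. The sketch below is a minimal pure-Python implementation for a Poisson GLM with a log link; it is not the R routine used in this project, and the small dataset is hypothetical. Each iteration computes the score vector and the negative Hessian, then adds the step $(-H)^{-1}\times\text{score}$ to $\beta$. It stops when the step becomes negligible.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poisson_newton(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for a Poisson GLM with log link.

    X: rows x_i, each starting with 1.0 for the intercept; y: observed claim counts.
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        lam = [math.exp(sum(bj * xj for bj, xj in zip(beta, row))) for row in X]
        # Score: sum_i (y_i - lambda_i) x_ij
        score = [sum((yi - li) * row[j] for yi, li, row in zip(y, lam, X)) for j in range(p)]
        # Negative Hessian: sum_i lambda_i x_ij x_ik
        neg_hessian = [[sum(li * row[j] * row[k] for li, row in zip(lam, X)) for k in range(p)]
                       for j in range(p)]
        step = solve(neg_hessian, score)   # beta_new = beta - H^{-1} * score
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical portfolio: four insureds with x = 0 and four with x = 1.
X = [[1.0, 0.0]] * 4 + [[1.0, 1.0]] * 4
y = [0, 1, 0, 1, 2, 3, 1, 2]
beta_hat = fit_poisson_newton(X, y)
print([round(b, 4) for b in beta_hat])   # approximately [log(0.5), log(4)]
```

With one binary covariate, the fitted rates reproduce the group averages (0.5 and 2.0 claims). For the canonical log link, Newton-Raphson coincides with Fisher scoring, the iteratively reweighted least squares algorithm that R's glm uses.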

The maximum likelihood estimators $\hat\beta_j$ are found by differentiating the log-likelihood with respect to the regression coefficients and setting the derivatives equal to zero. These equations have no explicit solution, so an iterative method must be used to solve them numerically. Newton-Raphson is regarded as one of the most efficient iterative techniques. Claim frequency can also be estimated with R packages in RStudio; however, due to the nature of the dataset, there may be a problem of random variation. As seen in Ohlsson & Johansson, (2010), “The variance of the data within a tariff cell is greater than the variance of a Poisson distribution, resulting in overdispersions due to random variation among customers and insured objects and the impact of explanatory variables not included in the model.” To overcome the problem of random variation, the Overdispersed Poisson (ODP) model can be used by specifying the ‘quasipoisson’ family instead of the usual ‘poisson’ family in R's glm function. The ODP is the same as the usual Poisson model except for the dispersion parameter $\phi$. In the ODP, $\phi$ is estimated from the data and may take values other than the value 1 fixed in the ‘true’ Poisson, so that $\operatorname{Var}(Y)=\phi\mu$.
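A common way to estimate $\phi$ for an overdispersed Poisson model is the Pearson statistic divided by the residual degrees of freedom: $\hat\phi=\frac{1}{n-p}\sum_i(y_i-\hat\mu_i)^2/\hat\mu_i$. To our understanding, this is the estimate R reports for a quasipoisson fit. The quasi-Poisson coefficients equal the Poisson ones, while their standard errors are multiplied by $\sqrt{\hat\phi}$. The counts below are hypothetical, and the fitted means come from an intercept-only model, whose fitted value is the sample mean.

```python
import math


def pearson_dispersion(y, mu, n_params):
    """phi_hat = sum((y - mu)^2 / mu) / (n - p), the Pearson chi-square over residual df."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


def quasi_poisson_standard_errors(poisson_se, phi):
    """Quasi-Poisson standard errors: the Poisson ones scaled by sqrt(phi)."""
    return [se * math.sqrt(phi) for se in poisson_se]


# Hypothetical claim counts; an intercept-only Poisson fit gives mu_i = mean(y) = 2.
y = [0, 4, 0, 5, 1, 0, 6, 0]
mu = [sum(y) / len(y)] * len(y)
phi_hat = pearson_dispersion(y, mu, n_params=1)
print(round(phi_hat, 4))   # 23/7, about 3.2857: overdispersed relative to the Poisson
```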

Estimation of the claim cost

The Gamma model is often used to model claim costs. In Pinquet, (1997), the author describes a simple, realistic parametric model based on the Gamma distribution for modelling claim costs in auto insurance. Let $c_{i1},c_{i2},\dots,c_{iy_i}$ be the costs of the $y_i$ claims incurred by insured $i$. Assuming the costs are independently Gamma distributed, the pdf is given by

$$f(c_i)=\frac{1}{\Gamma(v)}\left(\frac{v c_i}{\mu_i}\right)^{v}\exp\left(-\frac{v c_i}{\mu_i}\right)\frac{1}{c_i},\qquad c_i>0 \tag{14}$$

with mean $E(c_i)=\mu_i$ and variance $V(c_i)=\mu_i^{2}/v$.
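The Gamma density with mean $\mu_i$ and shape $v$, $f(c)=\frac{1}{\Gamma(v)}(vc/\mu_i)^{v}e^{-vc/\mu_i}/c$, can be checked numerically. The sketch below integrates it with a midpoint rule for a hypothetical mean claim cost $\mu_i=1500$ and shape $v=2$. It recovers a total probability of 1, a mean of $\mu_i$ and a variance of $\mu_i^2/v$. The integration range and number of steps are arbitrary choices that suit $v\ge 1$. For $v<1$ the density is unbounded near zero, and a finer treatment would be needed.

```python
import math


def gamma_pdf(c, mu, v):
    """Gamma density with mean mu and shape v: (v c / mu)^v exp(-v c / mu) / (Gamma(v) c)."""
    z = v * c / mu
    return math.exp(v * math.log(z) - z - math.lgamma(v)) / c


def gamma_moments(mu, v, upper_factor=40.0, steps=200_000):
    """Total probability, mean and variance by the midpoint rule on (0, upper_factor * mu]."""
    h = upper_factor * mu / steps
    total = first = second = 0.0
    for i in range(steps):
        c = (i + 0.5) * h
        weight = gamma_pdf(c, mu, v) * h
        total += weight
        first += c * weight
        second += c * c * weight
    mean = first / total
    return total, mean, second / total - mean ** 2


# Hypothetical mean claim cost and shape parameter.
mu_i, v = 1500.0, 2.0
total, mean, variance = gamma_moments(mu_i, v)
print(round(total, 6), round(mean, 2), round(variance, 0), mu_i ** 2 / v)
```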

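The likelihood for the Gamma model is therefore expressed as:

$$L(\beta)=\prod_{i\mid y_i>0}\ \prod_{k=1}^{y_i}\frac{1}{\Gamma(v)}\left(\frac{v c_{ik}}{\mu_i}\right)^{v}\exp\left(-\frac{v c_{ik}}{\mu_i}\right)\frac{1}{c_{ik}} \tag{15}$$

Differentiating the log-likelihood with respect to $\beta_j$, keeping only the terms that depend on $\beta$, gives

$$\frac{\partial l(\beta\mid c)}{\partial\beta_j}=\frac{\partial}{\partial\beta_j}\sum_{i\mid y_i>0}\ \sum_{k=1}^{y_i}\left(-v\ln\mu_i-\frac{v c_{ik}}{\mu_i}\right)=0 \tag{16}$$

With the log link $\mu_i=e^{x_i'\beta}$, we have $\partial\mu_i/\partial\beta_j=\mu_i x_{ij}$. Dividing by $v$, equation (16) simplifies to

$$\sum_{i\mid y_i>0}\ \sum_{k=1}^{y_i}x_{ij}\left(\frac{c_{ik}}{\mu_i}-1\right)=0 \tag{17}$$

Take $\hat c_i=\hat\mu_i=e^{x_i'\hat\beta}$ as the estimated claim cost for insured $i$. The maximum likelihood estimate $\hat\beta$ is then found by solving the equation

$$\sum_i\left(\frac{c_{i\cdot}}{\hat c_i}-y_i\right)x_i=0 \tag{18}$$

where $c_{i\cdot}=\sum_{k=1}^{y_i}c_{ik}$ is the total cost of the claims of insured $i$.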

This equation shows the connection between the explanatory variables and the residuals. A key benefit of using the Gamma model to estimate claim costs is that its two parameters, $v$ and $\mu$, allow more flexibility in forecasting claim costs.
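Equation (17) also has no closed-form solution in general. The sketch below solves it by Fisher scoring under the log link. In this case the expected information (divided by $v$) is $\sum_i y_i x_i x_i'$, and the score is $\sum_i (c_{i\cdot}/\mu_i - y_i)x_i$, where $c_{i\cdot}$ is the total cost of insured $i$'s claims. The shape $v$ cancels, so it is not needed to estimate $\beta$. The starting value (the log of the overall average cost) and the small claims dataset are assumptions made for illustration.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_gamma_log_link(insureds, tol=1e-10, max_iter=100):
    """Fisher scoring for a Gamma GLM with log link (estimating equation (17)).

    insureds: list of (x_i, costs_i) for insureds with at least one claim, where
    x_i starts with 1.0 for the intercept and costs_i lists c_i1, ..., c_iy_i.
    """
    p = len(insureds[0][0])
    all_costs = [c for _, costs in insureds for c in costs]
    beta = [math.log(sum(all_costs) / len(all_costs))] + [0.0] * (p - 1)
    for _ in range(max_iter):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for x, costs in insureds:
            mu = math.exp(sum(bj * xj for bj, xj in zip(beta, x)))
            y = len(costs)
            resid = sum(costs) / mu - y      # sum over k of (c_ik / mu_i - 1)
            for j in range(p):
                score[j] += x[j] * resid
                for k in range(p):
                    info[j][k] += y * x[j] * x[k]   # expected information divided by v
        step = solve(info, score)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical claims: two insureds with x = 0 and two with x = 1.
insureds = [
    ([1.0, 0.0], [1000.0, 3000.0]),
    ([1.0, 0.0], [2000.0]),
    ([1.0, 1.0], [4000.0]),
    ([1.0, 1.0], [5000.0, 6000.0, 5000.0]),
]
beta_hat = fit_gamma_log_link(insureds)
print(math.exp(beta_hat[0]), math.exp(beta_hat[0] + beta_hat[1]))  # close to 2000 and 5000
```

With a single binary covariate, the fitted mean cost in each group equals the average cost per claim in that group, which is what (17) requires.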

Modeling the pure premium

A reasonable pure premium reflects the expected value of claims. This amount is then adjusted to give the actual price per unit of exposure after accounting for expenses and claim handling fees. The pure premium is obtained by multiplying the expected claim frequency by the expected claim cost:

$$E\left[\sum_{i=1}^{Y}C_i\right]=E(Y)\times E(C_i) \tag{19}$$

for the claim amounts $C_1,C_2,\dots$. These are assumed to be independent and identically distributed, and independent of the number of claims $Y$. According to Denuit & Charpentier, (2005), modelling the frequency and the cost of claims separately is relevant because the risk factors affecting each are typically different. Studying the two components separately shows how each risk factor affects the premium.
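Equation (19) can be illustrated with a short simulation. The sketch below uses hypothetical values: an expected frequency of 0.12 claims per policy and Gamma claim costs with mean 1800 and shape $v=2$. It compares $E(Y)\times E(C)$ with the average simulated aggregate claim per policy. The identity relies on the claim amounts being independent of the number of claims, and the simulation builds this in by drawing them separately.

```python
import math
import random


def pure_premium(expected_frequency, expected_severity):
    """Equation (19): expected aggregate claims per policy = E(Y) * E(C)."""
    return expected_frequency * expected_severity


def sample_poisson(lam, rng):
    """Draw a Poisson count by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def simulated_aggregate_mean(lam, mu, v, n_policies, seed=2022):
    """Average of sum_{k=1}^{Y} C_k with Y ~ Poisson(lam) and C_k ~ Gamma(mean mu, shape v)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_policies):
        y = sample_poisson(lam, rng)
        # gammavariate(alpha, beta) uses shape alpha and scale beta, so the mean is v * (mu / v) = mu.
        total += sum(rng.gammavariate(v, mu / v) for _ in range(y))
    return total / n_policies


# Hypothetical frequency and severity parameters.
lam, mu, v = 0.12, 1800.0, 2.0
print(pure_premium(lam, mu))                                    # about 216
print(simulated_aggregate_mean(lam, mu, v, n_policies=100_000))
```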

CHAPTER FOUR: DATA ANALYSIS AND PRESENTATION

A dataset of 489 observations was used in the analysis. It contains telematics data and claims data on motor vehicle accidents that occurred in Spain in 2011. The data was provided by an insurer and consists of a random sample with information on total cost in thousand Euros, drivers' information, and telematics information on the driving patterns of the car.

Nine attributes that could be used as predictor variables were retained, together with two response variables: the number of claims (Nclaims) and the amount settled by the insurer in respect of claims made (total_cost).


4.1 Data Presentation

4.1.1 Frequency

This segment examines how claim frequency varies with vehicle age, age of license, age of driver, gender of driver and the total distance in kms covered by the car. A histogram has been created for each attribute to present the data.


Table 1: Frequency histogram of five predictor variables for claim frequency

[Histograms, vertical axis Frequency: (a) Histogram of Age of car; (b) Histogram of Age of License; (c) Histogram of Age of driver; (d) Histogram of Gender (Females, Males); (e) Histogram of Distance covered by car, in kms]
Table 1a: A histogram of age of car

It is noted that frequency is highest in the 5.7-7.5 car age band, followed closely by the 3.9-5.7 age band. The 7.5-9.3 age band has the third highest frequency, while the 16.5-18.3 age band has the lowest frequency.

Table 1b: A histogram of Age of License

It is noted that frequency rises steadily with license age up to the 4.6-5.9 band and then drops steadily. Frequency is highest in the 4.6-5.9 license age band. It is followed by the 3.3-4.6 band, which is in turn followed very closely by the 5.9-7.2 band.

Table 1c: A histogram of Age of driver

It is noted that frequency is highest in the 24.79-26.19 age band, followed very closely by the 26.19-27.59 age band. The 23.39-24.79 age band has the third highest frequency.

Table 1d: A histogram of Gender

It is observed that males have a higher claim frequency compared to females.

Table 1e: A histogram of the Distance covered by car

It is observed that frequency increases to a maximum in the 5220-7220 km distance band and then drops steadily with distance. The 5220-7220 band has the highest frequency, followed closely by 7220-9220, while 3220-5220 has the third highest frequency.


4.1.2 Severity

This segment examines how claim severity varies with the attributes age of car, age of license, age of driver, gender of driver and the total distance covered by the car in kilometers. Column charts were used to present the data by plotting claim severity against each attribute.

Table 2: Column charts of claim severity against predictor variables

[Column charts, vertical axis Severity: (a) Claim Severity Against Age of car (bands Under 4, 4-7.99, 8-11.99, 12-15.99, Above 16); (b) Claim Severity Against Age of License (bands Under 3, 3-5.99, 6-8.99, 9-11.99, Above 12); (c) Claim Severity Against Age of Driver (bands 20.5-22.5 to 32.5-34.5 in steps of 2); (d) Claim Severity Against Gender (Females, Males); (e) Claim Severity Against Distance covered by car (bands 0-5000, 5001-10000, 10001-15000, 15001-20000, Above 20000)]
Table 2a: A column chart of claim severity against age of car

It is observed that severity increases to a maximum in the 4-7.99 car age band and then decreases steadily as car age increases. Generally, more of the insured cars are aged between 4 and 7.99 years, hence the high severity in that band. Insurance companies also set higher premiums for older cars because of their higher exposure to risk, which discourages owners from taking out policies on older cars.

Table 2b: A column chart of claim severity against age of license

It is observed that severity increases to a maximum in the 3-5.99 license age band and then decreases steadily as license age increases. Severity is lowest for license ages below 3 years, because fewer insured drivers have held a license for less than 3 years. Severity decreases after the 3-5.99 band because drivers become more experienced and less exposed to the risks that lead to insurance claims.

Table 2c: A column chart of claim severity against age of driver

It is observed that severity is highest in the 24.5-26.5 age band. The 30.5-32.5 age band is the second highest, followed by the 22.5-24.5 age band. Drivers aged between 24.5 and 26.5 are more susceptible to accidents, which accounts for the high severity.

Table 2d: A column chart of claim severity against gender

It is observed that males have a higher severity than females. Men are more likely than women to drive more miles and to engage in dangerous driving behaviors such as speeding, not wearing seat belts and driving while intoxicated.

Table 2e: A column chart of claim severity against distance covered by car

It is observed that severity increases up to the 5001-10000 km distance band and then decreases steadily as distance increases. This is mainly because most cars are driven between 5001 and 10000 kilometers and fewer cars have higher mileage, hence the highest severity in that band.

4.2 Data Analysis

GLMs were fitted to the data with R's glm function in order to determine the significance of the predictor variables age, age of license, total km, age of car and gender in calculating the pure premium. A Poisson model was used to model claim frequency, and a Gamma distribution was used to model claim severity.

Figure 1: Generalized Linear Model output of claims data fitted on a Poisson distribution

A second model was then fitted using the quasipoisson family, with the following results.
Figure 2: Generalized Linear Model output of claims data fitted on a quasipoisson distribution

Using the quasipoisson model, all the predictor variables except distance are found to be insignificant. This means that only the distance (tkm) variable should be used in clustering the data. It also indicates that the number of claims and the distance covered (tkm) are positively correlated.
Figure 3: Generalized Linear Model output of claims data fitted on a Gamma distribution

From Figure 3, none of the variables is significant for severity except the intercept. The intercept is the expected value of the severity, on the link scale, when all the predictor variables are zero. This means that age, age of license, distance (tkm), age of car and gender do not affect the severity of the claims.

From the analysis above, the data is right-skewed: most of the observations lie to the left of the mean, with a long tail extending to the right. This is also evident from the histograms above, where the frequency of observations is lower on the right side of each graph than on the left side.


4.3 Models Evaluation

The models' goodness of fit was evaluated using the deviance, as follows:

Hypothesis testing

A test is carried out to determine whether car age, age of license, distance covered by the car, gender and age of the policyholder affect premium rates, so that the rates can be adjusted to arrive at affordable premiums.

The null hypothesis states that these risk factors are not significant in calculating the pure premium and do not affect motor insurance premium rates. The alternative hypothesis states that they are significant in calculating the pure premium and do affect motor insurance premium rates.

Frequency Model

From Figure 2 above, the effect of distance covered (tkm) is statistically significant at the 0.05 level of significance. This indicates that a change in distance (tkm) is associated with a change in the frequency of claims. Deviance is used as a measure of goodness of fit in GLMs. The residual deviance of the frequency model on 483 degrees of freedom is 105.88. The p-value associated with the residual deviance, using the chi-square distribution, is 1. Hence, at the 5% level of significance, there is sufficient evidence to reject the null hypothesis H0 that the age of the policyholder, age of car, age of license, gender and distance covered by the car do not affect the frequency of claims.
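The residual deviance quoted above compares the fitted model with a saturated model. For a Poisson GLM it is $D=2\sum_i\left[y_i\ln(y_i/\hat\mu_i)-(y_i-\hat\mu_i)\right]$, with the convention $y_i\ln(y_i/\hat\mu_i)=0$ when $y_i=0$. The sketch below computes it for hypothetical counts and fitted means. It also computes the ratio of the reported deviance to its residual degrees of freedom. This ratio is often used as a rough check, since a well-fitting Poisson model tends to have a deviance of the same order as its degrees of freedom.

```python
import math


def poisson_deviance(y, mu):
    """D = 2 * sum[y log(y / mu) - (y - mu)], taking y log(y / mu) = 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


# Hypothetical counts and fitted means.
y = [0, 1, 2]
mu = [1.0, 1.0, 1.0]
print(poisson_deviance(y, mu))        # 4 * log(2), about 2.7726

# Ratio of the residual deviance reported above to its degrees of freedom.
print(round(105.88 / 483, 4))
```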
Severity Model

From Figure 3 above, it is noted that the policyholder's age, age of car, age of license, gender and distance covered by the car are not significant in calculating the pure premium; that is, these risk factors do not affect motor insurance premium rates. The residual deviance of the severity model on 483 degrees of freedom is 861.54. Using the chi-square distribution, the p-value associated with the residual deviance is 0. Hence, at the 5% significance level, we do not reject the null hypothesis H0 that the age of the policyholder, age of car, age of license, gender and distance covered by the car do not affect the amount settled by the insurer in respect of claims made.
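For the Gamma model, the residual deviance is $D=2\sum_i\left[-\ln(c_i/\hat\mu_i)+(c_i-\hat\mu_i)/\hat\mu_i\right]$. Unlike in the Poisson case, $\phi$ is not fixed at 1. The deviance is therefore usually divided by an estimate of $\phi$, giving the scaled deviance, before it is compared with a chi-square distribution. The sketch below computes the unscaled deviance for hypothetical claim costs and fitted means.

```python
import math


def gamma_deviance(c, mu):
    """Unscaled Gamma deviance D = 2 * sum[-log(c / mu) + (c - mu) / mu]."""
    return 2.0 * sum(-math.log(ci / mi) + (ci - mi) / mi for ci, mi in zip(c, mu))


def scaled_deviance(deviance, phi):
    """Scaled deviance D / phi, the quantity compared with a chi-square distribution."""
    return deviance / phi


# Hypothetical claim costs and fitted means.
costs = [1000.0, 2000.0, 4000.0]
fitted = [2000.0, 2000.0, 2000.0]
d = gamma_deviance(costs, fitted)
print(round(d, 6))
```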


CHAPTER FIVE: CONCLUSION AND RECOMMENDATION

This chapter outlines the results of the project and the main conclusions drawn from the data analysis in Chapter Four. It is organized as follows: segment 5.1 summarizes the project's findings, segment 5.2 presents the conclusion, and segment 5.3 gives the recommendations of this research.

5.1 Summary

This project focused on developing a pricing estimator for affordable motor vehicle insurance premiums using Generalized Linear Models (GLMs). Five attributes were used to estimate both the frequency and the severity of claims: the driver's age, the age of the driver's license, the total distance covered by the car, the car's age and the driver's gender. Each attribute was plotted against both the frequency and the severity of claims. The Poisson model estimated the expected claim frequency, while the Gamma model estimated the expected claim severity. The significance of the attributes in calculating claim frequency and claim severity was tested by fitting the models. The goodness of fit of the models was also tested using the deviance.


5.2 Conclusion

The project was on how to price motor vehicle insurance by developing a pricing estimator using GLMs. The analysis shows that none of the five attributes used is significant in calculating claim severity. This means that the attributes do not affect the amount of claims settled by the insurer per policyholder. Hence these risk factors do not affect the calculation of the pure premium or the insurer's motor insurance premium rates. We conclude that the insurer should continue calculating the pure premium using its existing methods. The anticipated risk factors do not affect premium rates and therefore cannot be used to reduce the premiums charged. Regarding claim frequency, the total distance covered by the car is significant in its estimation. The other risk factors are not significant in estimating claim frequency, so the insurer should ignore them and use only the total distance covered by the car for clustering. This demonstrates that the total distance covered by the car and the claim frequency are positively correlated. Therefore, the total distance covered by the car is a good estimator of claim frequency.

The models' goodness of fit showed that the Gamma model is suitable for estimating claim severity. The residual deviance was not large at the 5% significance level, which means that the Gamma model is a good fit for claim severity. The Poisson model is not suitable for estimating claim frequency. The residual deviance was large at the 5% significance level, which means that the Poisson model is not a good fit for claim frequency.


5.3 Recommendation

In using Generalized Linear Models, the research assumed that the data attributes used to determine insurance pricing were independent. This is not always the case in real life, as some attributes jointly determine claim frequency and severity. For example, the age of the driver's license (used as a measure of the driver's experience) and the age of the car can jointly affect the claim severity and frequency of a motor vehicle insurance policy. It is therefore recommended that insurance companies adopt models that take the co-dependence of specific attributes into account for better insurance pricing.

The attributes used in this research were insignificant in estimating claim severity, and the Poisson model was noted not to be ideal for estimating claim frequency. We therefore recommend further research to identify the ideal model for estimating claim frequency. Further research should also identify predictor variables that are significant in estimating claim severity.


References

Antonio, K., & Valdez, E. A. (2012). Statistical concepts of a priori and a posteriori risk

classification in insurance. AStA Advances in Statistical Analysis, 96(2), 187–224.

https://doi.org/10.1007/s10182-011-0152-7

Barasa, K. (2016). To identify a framework for adoption by insurance industry for enhancing

insurance penetration. Strathmore University.

Cameron, A. C., & Trivedi, P. K. (1998). Regression Analysis of Count Data. Cambridge

University Press. https://doi.org/10.1017/CBO9780511814365

David, M. (2015). Auto Insurance Premium Calculation Using Generalized Linear Models.

Procedia Economics and Finance, 20, 147–156. https://doi.org/10.1016/S2212-5671(15)00059-3

Denuit, M., & Charpentier, A. (2005). Mathematics of Non-Life Insurance. Volume II: Pricing

and provisioning. https://dial.uclouvain.be/pr/boreal/object/boreal:17317

Din, U., Mohy, S., Regupathi, Bakar, A., & Arpah. (2017). Insurance effect on economic

growth– among economies in various phases of development. Review of International

Business and Strategy.


Dionne, G., & Vanasse, C. (1989). A Generalization of Automobile Insurance Rating Models:

The Negative Binomial Distribution with a Regression Component. ASTIN Bulletin: The

Journal of the IAA, 19(2), 199–212. https://doi.org/10.2143/AST.19.2.2014909

Goldburd, M., Khare, A., Tevet, D., & Guller, D. (2016). Generalized Linear Models for Insurance Rating (2nd ed.). 122.

Haberman, S., & Renshaw, A. E. (1996). Generalized Linear Models and Actuarial Science.

Journal of the Royal Statistical Society: Series D (The Statistician), 45(4), 407–436.

https://doi.org/10.2307/2988543

Huang, D., & Query, J. T. (2007). Designing a New Automobile Insurance Pricing System in

China: Actuarial and Social Considerations.

Insurance Regulatory Authority. (2017). Insurance-Industry-Annual-Report-2017.

Insurance Regulatory Authority. (2021). Insurance Industry Annual Report 2020. 231.

Kafková, S. (2015). Bonus-malus Systems in Vehicle Insurance. Procedia Economics and

Finance, 23, 216–222. https://doi.org/10.1016/S2212-5671(15)00354-8

Liedtke, P. M. (2007). What’s Insurance to a Modern Economy? The Geneva Papers on Risk

and Insurance - Issues and Practice.

Liu, Z., Shen, Q., & Ma, J. (2017). A driving behavior model evaluation for UBI. International

Journal of Crowd Science, 1(3), 223–236. https://doi.org/10.1108/IJCS-08-2017-0022

Nelder, J. A., & Verrall, R. J. (1997). Credibility Theory and Generalized Linear Models. ASTIN

Bulletin.
Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized Linear Models. Journal of the

Royal Statistical Society: Series A (General), 135(3), 370–384.

https://doi.org/10.2307/2344614

Ohlsson, E., & Johansson, B. (2010). Non-Life Insurance Pricing with Generalized Linear

Models. Springer. https://doi.org/10.1007/978-3-642-10791-7

Pinquet, J. (1997). Allowance for Cost of Claims in Bonus-Malus Systems. ASTIN Bulletin: The

Journal of the IAA, 27(1), 33–57. https://doi.org/10.2143/AST.27.1.542066

Poole, D., Mackworth, A., & Goebel, R. (1998). Computational Intelligence: A Logical

Approach.

Re, S. (2020). World insurance: Riding out the 2020 pandemic storm. Sigma, 4.

Re, S. (2021). Turbulence after lift-off: Global economic and insurance market outlook 2022/23.

Sigma, 5.

Schmitter, H. (2004). The Sample Size Needed for the Calculation of a GLM Tariff. ASTIN Bulletin, 34. https://doi.org/10.2143/AST.34.1.504964

Silva, Y., & Afonso, L. (2015). A Comparative Study of Pricing Methods of Automobile

Insurance in Brazil. Revista Brasileira de Educação, 19, 25–44.

Williams, A., & Shabanova, V. (2003). Responsibility of drivers, by age and gender, for motor-

vehicle crash deaths. Journal of Safety Research, 34, 527–531.

https://doi.org/10.1016/j.jsr.2003.03.001

Zheng, C. (2015). The automobile insurance pricing model, combining static premium with dynamic premium: Based on the generalized linear models. World Risk and

Insurance Economics Congress (WRIEC).


Appendix

R Syntaxes

Code for claim frequency

Code for claim severity
