INSURANCE PREMIUMS
This project is our original work and has not been submitted to any other institution of higher
learning for the award of a certificate, diploma, degree or any other award.
I confirm that this research project has been submitted for examination with my approval as the
university supervisor.
SIGN……………………………. DATE……………………………
Lecturer
Kenyatta University
Dedication
It is with genuine gratitude and warm regard that we dedicate this work to our parents and
guardians who have never failed to give us financial and moral support. We also dedicate this
project to our friends and all the students pursuing actuarial science.
Acknowledgement
First and foremost, we would like to thank the Almighty God for enabling us to complete this
project. We would like to convey profound gratitude to Dr. Ananda Kube, our lecturer and
supervisor, for supporting this project with his guidance, ideas and positive criticism. We would
also like to extend our appreciation to the Department of Mathematics and Actuarial Science and
all those who guided us in writing this project through their comments and suggestions. Lastly,
we want to thank ourselves for our cooperation and commitment towards the completion of this
project.
Abstract
The objective of this project is to develop a motor vehicle insurance pricing estimator using
Generalized Linear Models (GLMs). Insurance penetration in Kenya is low, at 2.4% according to a
2020 study by the Central Bank of Kenya (CBK), well below the global average of 7.23%. This is
due in part to Kenyans viewing insurance mostly as a luxury product that they take up only when
it is a regulatory requirement, as in the case of third-party insurance. Five attributes, namely the
age of the driver, the age of the driver's license, the total distance covered by the car, the age of
the car and the gender of the driver, have been used to estimate both the claim frequency and the
claim severity. A Poisson model has been used to estimate the expected claim frequency, while a
Gamma model has been used to estimate the expected claim severity. The analysis shows that, in
the data used, the total distance covered by the car is significant in the estimation of claim
frequency, while the other four attributes are not. None of the attributes is significant in the
estimation of claim severity. These results show that the insurer should continue calculating pure
premiums using the existing methods, because the factors analyzed do not affect premium rates
and hence cannot reduce the premiums charged by
Dedication
Acknowledgement
Abstract
4.1.1 Frequency
4.1.2 Severity
References
Appendix
LIST OF TABLES
Table 1: Frequency histogram of five predictor variables for claim frequency
LIST OF FIGURES
Figure 1: Generalized Linear Model output of claims data fitted on a Poisson distribution
Figure 2: Generalized Linear Model output of claims data fitted on a quasipoisson distribution
Figure 3: Generalized Linear Model output of claims data fitted on a Gamma distribution
CHAPTER ONE: INTRODUCTION
premium in exchange for a guarantee of compensation for a certain loss, damage, illness, or
death. An insurance company is a financial institution that provides a variety of insurance plans
to safeguard individuals and organizations against the risk of financial losses in exchange for
regular premium payment. An insurance business works by pooling the risks of many
policyholders. There are two types of insurance: general or non-life (short-term) insurance and
life insurance (long-term). General insurance is either short-term or annual, and it is divided into
two types: personal (for individuals) and commercial (for businesses). Life insurance is a long-
The Insurance Regulatory Authority (IRA) regulates the insurance industry through the
Authority (IRA) is in charge of licensing, regulating, and growing Kenya's insurance business,
Insurance Regulatory Authority, (2017). The insurance industry in Kenya dates back to the
British colonization of the country. Colonialism had both positive and harmful consequences,
and the emergence of the insurance industry was one of them. The white settlers made significant
investments in Kenyan land, particularly in farming. Like the Italian merchants, the settlers
understood the importance of safeguarding their investments against a myriad of risks. As their
conquests grew, so did their investments, which in turn increased the demand for insurance
cover. The settlers capitalized on the situation and founded insurance companies, and as the
colonies grew, the colonizers' insurance networks expanded to meet the increasing need for
insurance. All of the insurance organizations were owned by British insurers.
The earliest insurance companies include Pioneer Assurance Society, founded in 1930; Jubilee
Insurance Company, founded in 1937; Pan Africa Insurance, founded in 1946; and Provincial
Insurance Company Limited, founded in 1949. By the time Kenya gained independence in 1963,
these organizations had been upgraded into fully fledged insurance firms.
Kenya, like many other developing countries, had no specific insurance legislation before
independence; insurance operations were governed by the Companies Act. However, following our
assisted Kenya in recognizing the necessity for legislation to limit the growth of insurance
This led to the enactment of the Insurance Ordinance (1960), whose primary responsibility was to
oversee the formation, financing, and operations of the insurance industry. Two policies were
later created and established, in 1964 and 1972 respectively. The first regarded a healthy national
insurance and reinsurance industry as a key component of economic development. The second
encouraged developing nations to take steps to reduce their reliance on international insurers and
reinsurers, and to domesticate competitive reinsurance terms and conditions derived from the
international insurance business. As a result of these initiatives, the Insurance Act, Cap 487, was
passed in 1986 and came into effect in January 1987. The Act established a regulator's office as
well as regulations for insurance and reinsurance businesses, as
The depth of a country's insurance market is measured by the insurance penetration ratio, defined
as the premiums underwritten in a particular year divided by the country's Gross Domestic
Product (GDP). A higher penetration rate indicates that the insurance business contributes more
to the country's economy. Insurance
savings, and converting dormant capital into free capital by mitigating risk for participants in
diverse economic sectors, Liedtke, (2007). Individuals, corporations, and governments can
manage their risk exposure by purchasing insurance to cover losses, liability litigation, and
natural calamities that might otherwise be catastrophic if there were no risk transfer mechanisms
in place. Insurance promotes financial stability and fosters investment by mobilizing savings.
Personal and social insurance improves residents' well-being and allows workers to stay healthy
and financially secure in between jobs. A government can also encourage business development
The global insurance sector is dominated by wealthy developed nations. Despite accounting for
less than 10% of the world's population, the Group of Seven countries (G7) account for about
65% of global insurance premiums. In 2012, these seven countries spent an average of US$ 3,910
per capita on insurance premiums, compared with barely US$ 120 in emerging markets, Re,
(2013). Globally, 6.5% of people have insurance. In 2012, total insurance premiums in Africa
amounted to US$ 71.9 billion, giving a penetration rate of 3.65%, Re, (2021). Re, (2020) shows
that in 2019 total insurance penetration in emerging markets grew vigorously, continuing a solid
upward trend, especially in Asia. Growth was seen in many regions, with the exception of the life
sectors of Emerging Europe and of the Middle East and Africa. The average per capita spending
on insurance (insurance density) in emerging markets was US$ 175 in 2019, while insurance
penetration was 3.3%. Due to COVID-19, the report projected slow insurance market growth by
Total insurance penetration in Africa in 2019 was 2.78%, compared with the global average of
7.23%, as stated by the African Insurance Organization. South Africa dominates the African
insurance market, generating 70% of African insurance premiums, worth US$ 48.3 bn. The
impact of COVID-19 slowed insurance penetration in Africa, but the solid market
Kenya's insurance industry is part of the financial services industry. According to a report by the
Insurance Regulatory Authority, Kenya has 56 insurance companies, 204 insurance brokers, and
11,273 insurance agents. The 56 companies achieved total revenue of Kshs 132.1 billion, with an
overall underwriting profit of Kshs 15.5 billion, or around 11.7 percent of total revenue.
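The market measures used in this section (penetration, density, and the underwriting margin) are simple ratios. The Python sketch below assumes premiums, GDP and population are in consistent units; the penetration and density inputs are hypothetical round numbers rather than official statistics, while the margin check reuses the IRA figures quoted above.

```python
def penetration_ratio(gross_premiums: float, gdp: float) -> float:
    """Insurance penetration: premiums underwritten in a year divided by GDP."""
    if gdp <= 0:
        raise ValueError("GDP must be positive")
    return gross_premiums / gdp


def insurance_density(gross_premiums: float, population: float) -> float:
    """Insurance density: average premium spending per person."""
    if population <= 0:
        raise ValueError("population must be positive")
    return gross_premiums / population


def underwriting_margin(underwriting_profit: float, total_revenue: float) -> float:
    """Underwriting profit as a share of total revenue."""
    return underwriting_profit / total_revenue


# Hypothetical economy: 240 units of premium against a GDP of 10,000 units.
print(f"penetration: {penetration_ratio(240.0, 10_000.0):.1%}")         # penetration: 2.4%
# Hypothetical: US$ 2.0 bn of premiums spread over 10 million people.
print(f"density: US$ {insurance_density(2.0e9, 10.0e6):.0f}")         # density: US$ 200
# IRA figures quoted in the text (Kshs billions).
print(f"underwriting margin: {underwriting_margin(15.5, 132.1):.1%}")  # underwriting margin: 11.7%
```

The margin line reproduces the 11.7 percent stated in the text; a penetration of 2.4% means premiums equal 2.4% of GDP, as in the CBK figure for Kenya.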
Insurance penetration in Kenya remains low at 2.4%, according to the 2020 Financial Stability
Report by the Central Bank of Kenya, well below the global average of 7.23%. This may be
because most Kenyans view insurance as a luxury product and mostly take it up only when it is a
regulatory requirement, as in the case of third-party insurance. A variety of causes have been
blamed for Kenya's low insurance penetration rate, including the burdensome regulatory
environment, low public awareness, poor customer service, poor claims settlement, a
non-supportive culture, low discretionary income, and marketing methods and prices, Barasa,
(2016). Improving Kenya's socioeconomic situation will help raise insurance penetration, which
has stayed below worldwide standards for a long
significant test to small and medium businesses (SMBs), given their typically thin cash reserves
and over-dependence on a small number of routes to market. Levels of business insurance among
SMBs tend to be relatively low because they view insurance as a cash outflow that may not
provide them with the benefits they need. In the wake of the pandemic, SMBs are likely to review
their arrangements, since the pandemic served as a powerful demonstration of the value of
insurance protection. This creates a significant opportunity for insurers to provide the right range
and choice of cover, through the right channels, at the right price. Insurers that can offer a truly
differentiated SMB proposition have the potential to secure a significant competitive advantage,
putting the segment's business needs and customer experience at the core of
Insurance pricing (also known as ratemaking) is the process of determining what rates, or
premiums, to charge for insurance. A rate is the cost of insurance per exposure unit (a unit of
liability or property with similar characteristics). Until the policy period has expired, the true
cost of providing the insurance is unknown. As a result, insurance rates must be based on
projections rather than on actual costs. Most rates are established by statistical analysis of past
losses based on the insured's characteristics, and premiums are determined from the variables
that produce the best projections. However, in some cases, such as earthquake insurance,
historical analysis may not provide an adequate statistical basis for setting a rate. Catastrophe
modeling is occasionally used in these situations, although with less success. Underwriters
determine which variables apply to a particular insurance application, while actuaries set the
financial loss, and its operation is therefore largely a risk management exercise. Through an
insurance policy contract, the insured transfers future risk to an insurer in exchange for a fixed
premium, and whenever the policyholder suffers a loss, they can file a claim with the insurer,
provided the agreement covers it. Because the insurer sets the premium before any claims occur,
it is critical for the company to forecast its clients' risks in order to set a profitable premium.
Predictive modeling is widely used in the insurance industry, both for appraising customers and
for determining rates.
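Since a rate is the cost of insurance per exposure unit projected from past losses, the simplest historical estimate divides total losses by total exposure. The Python sketch below uses a hypothetical rating class with made-up figures and ignores expense and profit loadings, so it gives a pure (risk) premium only. It also shows that this pure premium equals claim frequency times claim severity, which is why the two can be modeled separately, as this project does with Poisson and Gamma models.

```python
from dataclasses import dataclass


@dataclass
class RatingClass:
    """Historical experience of one rating class (hypothetical figures)."""
    exposure_years: float  # earned exposure, e.g. vehicle-years
    claim_count: int       # number of claims reported
    total_losses: float    # total amount paid on those claims

    def frequency(self) -> float:
        """Claims per unit of exposure."""
        return self.claim_count / self.exposure_years

    def severity(self) -> float:
        """Average cost per claim."""
        if self.claim_count == 0:
            raise ValueError("severity is undefined when there are no claims")
        return self.total_losses / self.claim_count

    def pure_premium(self) -> float:
        """Expected loss cost per unit of exposure (no expense or profit loading)."""
        return self.total_losses / self.exposure_years


rating_class = RatingClass(exposure_years=1_000.0, claim_count=80, total_losses=4_000_000.0)
print(rating_class.frequency(), rating_class.severity(), rating_class.pure_premium())
# 0.08 50000.0 4000.0

# Pure premium = frequency x severity (up to floating-point rounding).
product = rating_class.frequency() * rating_class.severity()
print(abs(product - rating_class.pure_premium()) < 1e-9)  # True
```

A gross premium would add loadings for the insurer's expenses and profit margin on top of this figure.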
To ensure that the customer's loss is covered, the premium is set according to his or her risk.
This, however, does not cover the insurer's entire cost: an insurance company, like any other
business, has expenses and aims to make a profit. The premium is therefore adjusted to cover
both the customer's expected loss and the costs, while maintaining a reasonable profit margin.
Nevertheless, the core of the premium is the customer's risk. And thus, the customer's
Both the underwriting and the ratemaking have to be precise. If the rate is accurate for a specific
class but the underwriter assigns applicants who do not belong to that class, the rate may be
insufficient to cover losses. On the other hand, if the underwriting is sound but the rate is based
on a small sample or on variables that do not consistently predict future losses, the
According to Re, (2021), commercial lines prices in non-life insurance rose by 15% in the third
quarter of 2021. In that quarter, property pricing increased by 9% and Financial and Professional
liability (FinPro) pricing by 32%, in almost all regions. Property rates were driven mostly by
catastrophe-related covers, and FinPro rates by rising Directors & Officers claims. Casualty
business showed price improvements of 6% in 2021. These were driven
Around the world, the insurance business has made remarkable progress in new product
development and technological advancement. In Kenya, however, there has been little progress in
insurance market penetration. This may be the result of various factors, including the high cost
of insurance, the lack of a savings culture among the general public, limited disposable income
(close to half of Kenya's population lives in poverty), insufficient tax incentives to encourage the
purchase of insurance products, and lack of enough
This study seeks to price motor vehicle insurance using a Generalized Linear Model (GLM) in
order to reach a larger customer base with an affordable premium. Generalized Linear Models
have been used successfully as a pricing technique in the insurance industry. Masese, (2020)
modeled claim frequency and severity in auto insurance data using Generalized Linear Models to
determine the effects of distance driven, speed, and time of driving on premium rates. The
findings were that increases in distance and speed led to higher claim severity and frequency,
and hence to a higher pure premium. By using a GLM, this study will be able to identify the
significant factors affecting motor insurance premium rates and adjust them to arrive at affordable
premium rates. This will in turn help to increase insurance penetration through the availability of
affordable
The main objective of this project is to develop a pricing estimator for motor vehicle insurance
using GLMs.
ii. To model claim frequency by use of GLMs and estimate the parameters.
iv. To fit the proposed estimator in (i) above to motor vehicle insurance data to explain how
This project intends to develop an appropriate pricing estimator for motor vehicle insurance in
Kenya by modeling claim frequency using GLMs. Correct pricing of motor vehicle insurance will
enable more motor vehicle owners to take up insurance, thereby increasing insurance penetration
in Kenya. The increase in the volume of motor vehicle insurance business will in turn generate
more profit for insurers. Because there is very little literature on GLMs in the Kenyan market,
this project will also provide insight into this area, as the use of GLMs in pricing has gained
popularity in Malaysia, Canada, and several European countries.
CHAPTER 2: LITERATURE REVIEW
This chapter focuses on reviewing and synthesizing existing work on motor insurance pricing and
the use of Generalized Linear Models. These models are
Nelder and Wedderburn introduced generalized linear models in 1972. For decades, the
Generalized Linear Model (GLM), a common statistical tool, has been one of the most widely
used methods for assessing automobile insurance risk. According to Huang & Query, (2007), it is
a highly valid system for determining automobile insurance rates that can readily handle a large
number of risk combinations and model complex claim relationships.
Nelder & Verrall, (1997) demonstrated how credibility theory can be incorporated into GLM
theory. Along these lines, Schmitter, (2004) presented a straightforward approach for estimating
the number of claims required for a GLM tariff calculation. GLMs are now widely regarded as the
industry-standard technique for pricing and, more broadly, for modeling actuarial data. Insurers in
most countries use GLMs to analyze their portfolios, and their use is becoming more common in
Japan, Korea, Canada, Singapore, Brazil, Malaysia, and many European countries.
Furthermore, there are two types of GLM rating structures. The first is an additive model, in
which the effects of the covariates are added together; the alternative is a multiplicative model.
According to Ohlsson & Johansson, (2010), Goldburd et al., (2016), and Huang & Query, (2007),
the additive model is less suitable for actuarial work because it can produce implausible results:
negative premiums and claim estimates are easily obtained even for reasonable covariate values.
Furthermore, charging the same $20 surcharge on a $100 premium and on a $1,000 premium
seems absurd. Under the multiplicative model, the surcharge would instead be 20% of the
premium for both individuals, which seems actuarially fair: 20% of $100 and of $1,000 is $20 and
$200, respectively.
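The contrast between the two rating structures can be seen in a few lines of Python. This is a sketch with hypothetical numbers: the multiplicative model is written as a log-link factor exp(β), so a coefficient β = log(1.2) corresponds to a 20% surcharge.

```python
import math


def additive_premium(base: float, adjustment: float) -> float:
    """Additive rating: the same amount is added whatever the base premium."""
    return base + adjustment


def multiplicative_premium(base: float, beta: float) -> float:
    """Multiplicative (log-link) rating: the base premium is scaled by exp(beta)."""
    return base * math.exp(beta)


beta_surcharge = math.log(1.20)  # hypothetical coefficient equivalent to a 20% surcharge
for base in (100.0, 1000.0):
    print(base, additive_premium(base, 20.0), round(multiplicative_premium(base, beta_surcharge), 2))
# 100.0 120.0 120.0
# 1000.0 1020.0 1200.0

# A large additive discount can push the premium below zero ...
print(additive_premium(100.0, -150.0))  # -50.0
# ... whereas exp(beta) > 0 keeps a multiplicative premium positive.
print(multiplicative_premium(100.0, math.log(0.25)) > 0)  # True
```

Because exp(β) is always positive, a log-link model cannot produce a negative premium however large the discount, which is one reason the multiplicative structure is preferred.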
Tweedie models, named after the British statistician who provided a comprehensive examination
of the concept in 1984, are also of great importance in premium rating. Goldburd et al., (2016)
noted that Tweedie models are well suited to pure premium estimation because most of their
mass is at or near zero and the remaining mass is skewed to the right. When most claim distributions
Insurance companies classify many kinds of risk using "risk variables" such as age and gender.
There are a few things to consider when applying generalized linear models to insurance pricing,
especially in the current global context. Williams & Shabanova, (2003) found that, in fatal
crashes, young males were more likely than young females to be at fault, while women in their
50s and older were more likely than males of the same age to be at fault. Young drivers,
particularly males, have the highest rates of culpability for deaths per licensed driver. This
demonstrates a link between gender, age and claims. If these two factors contribute to crashes,
they must be factored into pricing. In 2012, however, the European Court of Justice (ECJ)
prohibited all private European Union (EU) insurers from discriminating between risks on the
basis of gender. Insurers can be certain that acquiring more specific information
Bülbül & Baykal, (2016) observed that the bonus-malus system is one of the most significant
in accordance with their history of claims. The bonus-malus system ensures fair rates because
each person pays a premium according to the frequency of their claims. Claim frequency is
typically used because it is more affected by unobserved and unmeasurable driver characteristics
than claim severity is. To encourage drivers to drive more cautiously and to make sure that
policyholders pay a premium in line with their history of
Silva & Afonso, (2015) discussed other models that could be used to price motor insurance
premiums. They examined the following options: pure premium by historically aggregated
claims, pure premium by expected aggregated claims, classic linear models, and generalized
linear models. Their research aimed to compare the price growth relative to the original pricing
and the dispersion of the pure premium. To make the comparison clear, they constructed a
variable called Efficiency, which calculates the price rise so that it is one percentage point lower
than the price in their dataset.
After completing their research, Silva & Afonso, (2015) concluded that the historical pure
premium and GLM methodologies had the best price efficiency (1.60), but the former showed
lower volatility than the latter: the historical pure premium technique had a standard error of
5.26 percent, while the GLM technique had a standard error of 17.52 percent. The historical pure
premium had a comparative advantage because the basic indicators varied only modestly over
time and because it contained data for all insured vehicles in Brazil, allowing strong adherence to
the technique. GLM, on the other hand, appeared to be a viable pricing option for medium-sized
portfolios that are growing or moving into specialty segments.
When the data contain correlated risk factors, Huang & Query, (2007) proposed combining a Max
Model with a GLM to improve accuracy. A GLM usually copes well with moderately correlated
factors; it only has trouble fitting when the factors are highly correlated. According to Huang &
Query, (2007), the technique of selecting the most significant and least correlated factors did not
work well in China because it left very few factors to work with. They went on to advise that
correlated factors be used
The pricing scheme then does not give a significant reduction to policyholders who provide
inaccurate information, since there is sufficient redundancy to correct it. For instance, actuaries in
China evaluated the vehicle's book value, engine size, make or model, mileage, and other factors
to assess the vehicle's risk, even though all of these factors are interrelated. This conflicts with
the assumptions of the GLM. Other factors add to the difficulty of separating the risks: Huang &
Query, (2007) gave the example of a skilled driver driving a faulty vehicle, or vice versa. Major
issues are frequently attributed to a single factor, even though the model's other factors are
considered accurate and reliable. To address these challenges, they proposed using a Max model
to overcome the problems of correlated variables.
The Max model assumes that for the correlated risk factor,
GLMs can be extended into further models, such as those suggested by Goldburd et al., (2016).
Elastic Net GLMs, Generalized Additive Models (GAMs), Multivariate Adaptive Regression
Splines (MARS) models, Generalized Linear Mixed Models (GLMMs), and GLMs with
dispersion modeling (DGLMs) are a few examples. In the modern era, GLMs have had an
undeniable impact on the pricing of auto insurance and serve as a foundation for more advanced
approaches. According to Liu et al., (2017), telematics, the use of car sensors and positioning
systems to monitor vehicle and/or driver behavior, has enabled insurers to redesign their
insurance plans on the fly based on the data collected. Artificial Intelligence (AI) has expanded
the quantity of data that insurers can collect from their customers. Companies such as Discovery
are already using telematics in vehicle-based insurance in South Africa. Poole et al., (1998)
defined Artificial Intelligence as intelligence demonstrated by
described as the study of intelligent agents, which are any devices that sense their surroundings
The rise of various data-gathering devices, informally known as the Internet of Things (IoT), as
well as technological advancements, has recently enabled actuaries to use predictive analytics, a
benefit that comes with harvesting Big Data. This means that a typical consumer's premium can
now be set according to criteria that are unique to them. Liu et al., (2017) and Zheng, (2015)
have presented ideas and concepts that are being tested on a larger scale in industrialized
countries. Liu et al., (2017) propose a driving behavior model based on Usage-Based Insurance
(UBI). They built on previous work by Zheng, (2015), who presented the idea of adjusting a
GLM-estimated static premium by combining it with a dynamic premium that adjusted on the fly
Dionne & Vanasse, (1989) proposed an extension of popular pricing strategies for auto
insurance. This was accomplished by combining two tarification systems into a negative
binomial model with a regression component: an a priori model that selected tariff variables,
established tariff classes, and calculated premiums, and an a posteriori model that used a
bonus-malus system to adjust the policyholder's premium according to their accident history.
The study showed how the bonus-malus system can be modified to include both the
According to Pantelous & Passalidou, (2013), the optimal premium strategy for a general
insurance company is influenced by the competitiveness of the insurance market as well as the
demand the company faces in that market. Premium pricing is based primarily on claim
expenses, business and administration costs, margins for changes in claims experience, and
anticipated profits. The study took two steps to determine a company's best course of action in a
highly competitive insurance market. The first was to derive a generic equation for the business
volume as a function of variables such as historical performance, average market premium,
average company premium, and company reputation. The best premium method was then
function's present value. These conclusions were used to analyze data from the Greek insurance
David, (2015) examined how auto insurance premiums are calculated using Generalized Linear
Models. The study used data from a French auto insurance portfolio where the bonus-malus
coefficient growth
The insurance portfolio was divided into sub-portfolios according to various risk criteria, which
prevents anti-selection because each class contains policyholders with the same risk profile who
are willing to pay the same fair premium. A Poisson regression model was used to predict claim
frequency, and a Gamma model was used to estimate the average claim cost for each class of
policyholders. The predicted frequency and average cost of claims were then multiplied to obtain
the premium. The study's findings showed that the pure premium decreases with increasing
insured age and increasing insurance contract age, and that it also depends on risk factors such as
the insured's occupation, the use of the vehicle, the bonus-malus coefficient and the age of the
insurance contract. The pure premium decreases with an increase in the insured age, the age
A study by Kafková, (2015) shows that there are important factors for an individual
policyholder that cannot be accounted for in GLMs, such as the driver's ability, reflexes, drug
abuse, and knowledge of the highway code. Hence, the use of a bonus-malus
able to determine an estimate of annual claim frequency for different groups of drivers
using GLM. A bonus-malus system was applied that enabled a fair premium price for each of the
A study by Garrido et al., (2016) explored how GLMs can be used when the frequency and
severity of claims are dependent. Motor vehicle insurance is an example: claim frequency and
severity are often negatively correlated, as drivers who file many claims usually do so for minor
accidents with low claim amounts. In the study, this dependence between claim frequency and
claim severity was modelled with a conditional approach: the claim frequency was included as a
covariate in the conditional severity model, forming an extended version of the formerly used
independent model. The approach was applied to an automobile insurance dataset, where the
dependence was found to be small but
An advantage of Generalized Linear Models is that they are not limited to normally distributed
data but can be used with any distribution from the exponential family, which includes the
Normal, Binomial, Gamma and Poisson distributions. Nelder & Wedderburn, (1972) showed that
GLMs extend the Gaussian model to the whole exponential family of distributions. This has been
beneficial in pricing general insurance, as claim frequency and severity are often not normally
distributed. The advancement of GLMs has improved the quality of risk analysis models and the
methods of
authors have succeeded in showing and developing the assumptions used when applying these
models in general insurance. This chapter has looked at the arguments of various authors and the
ways in which they used GLMs to price insurance products. Their contribution has led to the
In statistical analysis, linear regression is used to model the relationship between a dependent
variable and one or more independent variables. It assumes that the dependent variable follows a
normal distribution and that it has a linear relationship with the independent variables. This
becomes a shortcoming when dealing with data that follows a
Hence, Generalized Linear Models were introduced to generalize linear regression models. A
function of the mean of the dependent variable, referred to as the link function, is used instead in
the model. The Generalized Linear Model works with a general class of distributions, including
normal and non-normal distributions, and so addresses the shortcoming of linear regression. The
transformations of the means are linear functions of x, and they contain
In a GLM, the dependent variable Y has an exponential dispersion family distribution with the
general form:
$$f(y;\theta,\phi)=\exp\left(\frac{y\theta-b(\theta)}{a(\phi)}+c(y,\phi)\right) \qquad (1)$$
where θ and φ are the natural (canonical) and dispersion (scale) parameters respectively, and a,
b and c are known functions. The exponential dispersion model becomes a one-parameter
exponential family if φ is fixed. Taking a(φ) = φ, a distribution from the one-parameter
exponential family has a log-likelihood of the form:
$$l=\frac{y\theta-b(\theta)}{\phi}+c(y,\phi) \qquad (2)$$
where θ is the canonical parameter and φ is the dispersion parameter, assumed known. The
dispersion parameter φ equals 1 for the Poisson distribution. For the Gamma distribution,
however, it is unknown and has to be estimated from the data. As seen in Haberman &
Renshaw, (1996), it can be shown that
$$m=E(Y)=\frac{d}{d\theta}b(\theta)=b'(\theta) \qquad (3)$$
$$\operatorname{Var}(Y)=\phi\,\frac{d^{2}}{d\theta^{2}}b(\theta)=\phi\,b''(\theta) \qquad (4)$$
Var(Y) is the product of the dispersion parameter φ and b''(θ). The quantity b''(θ), called the variance function, depends on the canonical parameter θ and therefore on the mean. This can be illustrated with the log-likelihood of the Poisson distribution,
$$ l = y \log m - m - \log y! \qquad (5) $$
where θ = log m, b(θ) = exp(θ) and the dispersion parameter φ = 1, so that Var(Y) = b''(θ) = exp(θ) = m.
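As a numerical illustration of equations (3) to (5), the Python sketch below (an added illustration, not part of the original analysis) takes the Poisson cumulant function b(θ) = exp(θ), approximates b'(θ) and b''(θ) by central finite differences at θ = log m, and checks that both equal the mean m, so that Var(Y) = φ b''(θ) = m with φ = 1. It also confirms that form (2), with c(y, φ) = −log y!, reproduces the Poisson log-likelihood (5). The mean m = 2.5 and the step sizes are arbitrary choices.

```python
import math

def b(theta):
    """Cumulant function of the Poisson distribution: b(theta) = exp(theta)."""
    return math.exp(theta)

def first_derivative(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

def poisson_loglik(y, m):
    """Equation (5): l = y log m - m - log y!  (lgamma(y + 1) equals log y!)."""
    return y * math.log(m) - m - math.lgamma(y + 1)

def exp_family_loglik(y, theta, phi=1.0):
    """Equation (2) with b(theta) = exp(theta) and c(y, phi) = -log y!."""
    return (y * theta - b(theta)) / phi - math.lgamma(y + 1)

m = 2.5                        # an arbitrary Poisson mean
theta = math.log(m)            # canonical parameter theta = log m
mean = first_derivative(b, theta)              # equation (3): E(Y) = b'(theta)
variance = 1.0 * second_derivative(b, theta)   # equation (4) with phi = 1

print(round(mean, 6), round(variance, 6))
print(poisson_loglik(3, m), exp_family_loglik(3, theta))
```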
The link function defines the relationship between the random and the systematic components. It shows the relationship between the expected value of the response and the linear combination of the explanatory variables:
$$ g(\mu_i) = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} = x_i^{T}\beta = \eta_i \qquad (6) $$
where β_0, β_1, ..., β_p are the regression parameters and the linear predictor η_i is a linear combination of the explanatory variables x_i1, ..., x_ip. Here, the link function is the function g, as it links the linear predictor η_i with the mean μ_i = E(Y_i) of the response for observation i.
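With the log link g(μ) = log μ, which is the link implied by the Poisson model in equation (8), the linear predictor η_i maps to μ_i = exp(η_i), so each coefficient acts multiplicatively on the mean: increasing x_ij by one unit multiplies μ_i by exp(β_j). The sketch below illustrates this; the coefficient values and covariate values are hypothetical, not estimates from this study.

```python
import math

def linear_predictor(beta, x):
    """eta = beta_0 + sum_j beta_j * x_j, where x excludes the intercept term."""
    return beta[0] + sum(b_j * x_j for b_j, x_j in zip(beta[1:], x))

def log_link(mu):
    """g(mu) = log(mu)."""
    return math.log(mu)

def inverse_log_link(eta):
    """g^{-1}(eta) = exp(eta), so the fitted mean is always positive."""
    return math.exp(eta)

# Hypothetical coefficients: an intercept and one covariate (e.g. distance in 1000 km).
beta = [-2.0, 0.05]

mu_a = inverse_log_link(linear_predictor(beta, [10.0]))
mu_b = inverse_log_link(linear_predictor(beta, [11.0]))

# One extra unit of the covariate multiplies the mean by exp(beta_1).
ratio = mu_b / mu_a
print(ratio, math.exp(beta[1]))
# Applying the link to the mean recovers the linear predictor.
print(log_link(mu_a), linear_predictor(beta, [10.0]))
```

This multiplicative structure is one reason the log link is a common choice for insurance rating factors.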
A link function is used to define the relationship between the mean of the response variable Y and the predictor variables. This study aims to rate premium factors by the use of a GLM that
Frequency is the number of claims per unit of exposure over a specified period of time. In non-life insurance, claim frequency is commonly estimated a priori with a Poisson GLM. Antonio & Valdez (2012) presented the Poisson model as the archetype for modelling event counts, here the frequency of claims, and as the starting point in the development of count regression models. In this instance, the number of claims of insured i is assumed to be a discrete random variable Y_i following a Poisson distribution, which is taken as the appropriate statistical model for the probability of 0, 1, 2, ... claims occurring. Therefore, for insured i, the probability that the random variable Y_i takes the value y_i is
$$ f(Y_i = y_i \mid x_i) = \frac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!} \qquad (7) $$
The Poisson model assumes equality between the mean and variance of claim frequency. In other words, the Poisson parameter represents both the mean and the variance of the distribution:
$$ E(y_i \mid x_i) = V(y_i \mid x_i) = \lambda_i = e^{x_i'\beta} \qquad (8) $$
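The sketch below evaluates the Poisson probabilities (7) for a single insured and sums them to confirm that the mean and the variance both equal λ_i, as stated in (8). The value λ_i = 0.3 (through a hypothetical linear predictor x_i'β = log 0.3) and the truncation point y_max = 200 are illustrative assumptions.

```python
import math

def poisson_pmf(y, lam):
    """Equation (7): P(Y = y) = exp(-lam) * lam**y / y!, computed on the log scale."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def poisson_moments(lam, y_max=200):
    """Total probability, mean and variance of Poisson(lam), summing the pmf up to y_max."""
    probs = [poisson_pmf(y, lam) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return sum(probs), mean, var

# Hypothetical linear predictor x_i' beta for one insured.
x_beta = math.log(0.3)
lam = math.exp(x_beta)          # equation (8): lambda_i = exp(x_i' beta)
total, mean, var = poisson_moments(lam)
print(total, mean, var)
```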
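Maximum likelihood is the standard method of estimation for this model. The likelihood function is
$$ L(\beta) = \prod_{i=1}^{n} \frac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!} \qquad (9) $$
By substituting λ_i = e^{x_i'β},
$$ L(\beta) = \prod_{i=1}^{n} \frac{e^{-e^{x_i'\beta}}\left(e^{x_i'\beta}\right)^{y_i}}{y_i!} \qquad (10) $$
and the log-likelihood is
$$ l = \sum_{i=1}^{n}\left[y_i \ln\lambda_i - \lambda_i - \ln y_i!\right] = \sum_{i=1}^{n}\left[y_i x_i'\beta - e^{x_i'\beta} - \ln y_i!\right] \qquad (10) $$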
The first two partial derivatives of the log-likelihood function exist and are written as:
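$$ \frac{\partial l(\beta)}{\partial \beta_j} = \sum_{i=1}^{n}(y_i - \lambda_i)\,x_{ij} = \sum_{i=1}^{n}\left(y_i - e^{x_i'\beta}\right)x_{ij} \qquad (11) $$
$$ \frac{\partial^{2} l(\beta)}{\partial \beta_j \,\partial \beta_k} = -\sum_{i=1}^{n}\lambda_i\, x_{ij}x_{ik} = -\sum_{i=1}^{n} e^{x_i'\beta}\, x_{ij}x_{ik} \qquad (12) $$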
terms of the regression coefficients and setting them equal to zero. This system of equations has no explicit solution, so an iterative method must be used to solve it numerically. Newton-Raphson is regarded as one of the most efficient iterative techniques. Claim frequency can also be estimated with the glm function in R (for example, in RStudio); however, due to the nature of the dataset, there may be a problem of extra random variation. As noted by Ohlsson & Johansson (2010), “The variance of the data within a tariff cell is greater than the variance of a [...] and insured objects and the impact of explanatory variables not included in the model.” To overcome this problem, the overdispersed Poisson (ODP) model can be used by specifying the quasipoisson family in the glm function instead of the usual poisson family. The ODP is the same as the usual Poisson model except for the dispersion parameter φ: in the ODP, φ is estimated from the data and can take values other than the value 1 fixed in the ‘true’ Poisson model.
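A minimal sketch of the Newton-Raphson iteration mentioned above, applied to the Poisson log-likelihood (10) with the score (11) and the Hessian (12): each step is β_new = β − H⁻¹U. To stay dependency-free it handles only an intercept and one covariate, so the 2×2 system is solved by hand, and the small data set is invented for illustration. With the canonical log link the observed and expected information coincide, so these steps match the Fisher scoring (IRLS) steps that R's glm uses.

```python
import math

def poisson_newton_raphson(x, y, tol=1e-10, max_iter=50):
    """Fit log(lambda_i) = b0 + b1 * x_i by Newton-Raphson using equations (11)-(12)."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        lam = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector, equation (11): U_j = sum_i (y_i - lambda_i) x_ij, with x_i0 = 1.
        u0 = sum(yi - li for yi, li in zip(y, lam))
        u1 = sum((yi - li) * xi for yi, li, xi in zip(y, lam, x))
        # Negative Hessian, equation (12): sum_i lambda_i x_ij x_ik.
        h00 = sum(lam)
        h01 = sum(li * xi for li, xi in zip(lam, x))
        h11 = sum(li * xi * xi for li, xi in zip(lam, x))
        det = h00 * h11 - h01 * h01
        # Newton step beta_new = beta + (-H)^{-1} U, solving the 2x2 system by hand.
        step0 = (h11 * u0 - h01 * u1) / det
        step1 = (-h01 * u0 + h00 * u1) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1

# Invented example: a binary covariate (e.g. 0 = low mileage, 1 = high mileage).
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 1, 0, 1, 2, 1, 3, 2]
b0, b1 = poisson_newton_raphson(x, y)
print(math.exp(b0), math.exp(b0 + b1))  # fitted mean claim counts per group
```

With a binary covariate the fitted means reproduce the two group averages (0.5 and 2 claims here), which gives an easy check on the iteration.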
The Gamma model is often used to model claim costs. Pinquet (1997) describes a simple, realistic parametric model based on the Gamma distribution for the modeling of claim costs. For insured i, assuming the costs are independently Gamma distributed, the pdf is given by
$$ f(c_i) = \frac{1}{\Gamma(v)\, c_i}\left(\frac{v c_i}{\mu_i}\right)^{v}\exp\left(-\frac{v c_i}{\mu_i}\right), \qquad c_i > 0 \qquad (14) $$
with mean E(c_i) = μ_i and variance V(c_i) = μ_i²/v.
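A numerical check of density (14) and its moments: the sketch integrates f(c), c·f(c) and c²·f(c) with a simple midpoint rule and compares the results with 1, μ and μ²/v. The values μ = 3 and v = 2 and the integration grid are arbitrary choices for illustration.

```python
import math

def gamma_pdf(c, mu, v):
    """Equation (14): f(c) = (v c / mu)^v exp(-v c / mu) / (Gamma(v) c), for c > 0."""
    log_f = v * math.log(v * c / mu) - v * c / mu - math.lgamma(v) - math.log(c)
    return math.exp(log_f)

def midpoint_moments(mu, v, upper, n=200_000):
    """Total probability, mean and variance over (0, upper) by the midpoint rule."""
    h = upper / n
    total = mean = second = 0.0
    for k in range(n):
        c = (k + 0.5) * h
        f = gamma_pdf(c, mu, v)
        total += f * h
        mean += c * f * h
        second += c * c * f * h
    return total, mean, second - mean ** 2

mu, v = 3.0, 2.0
total, mean, var = midpoint_moments(mu, v, upper=60 * mu)
print(total, mean, var, mu ** 2 / v)
```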
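For insured i with y_i > 0 claims of costs c_i1, ..., c_iy_i, the maximum likelihood equations for β are
$$ \frac{\partial l(\beta \mid c)}{\partial \beta_j} = \frac{\partial}{\partial \beta_j} \sum_{i \mid y_i > 0} \sum_{k=1}^{y_i}\left(-v \ln\mu_i - \frac{v c_{ik}}{\mu_i}\right) = 0 \qquad (16) $$
With μ_i = e^{x_i'β}, differentiating and dividing by −v gives
$$ \sum_{i \mid y_i > 0} \sum_{k=1}^{y_i} x_{ij}\left(1 - \frac{c_{ik}}{\mu_i}\right) = 0 \qquad (17) $$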
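Hence, with ĉ_i = μ̂_i = e^{x_i'β̂} as the estimated claim cost for insured i, the maximum likelihood equations can be written as
$$ \sum_{i \mid y_i > 0} \sum_{k=1}^{y_i} \frac{c_{ik} - \hat{c}_i}{\hat{c}_i}\, x_{ij} = 0 \qquad (18) $$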
This demonstrates the connection between the explanatory variables and the residuals. The parameters v and μ, which allow more flexibility in forecasting claim costs, are the key
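A minimal fitting sketch for the Gamma severity model with log link, μ_i = exp(x_i'β), assuming one claim cost per observation. It uses Fisher scoring (IRLS) rather than full Newton-Raphson: for a Gamma response with log link the working weights are constant, so each update is β ← β + (X'X)⁻¹X'r with r_i = (c_i − μ_i)/μ_i, and the iteration stops when the score equations (17) hold. The shape v cancels from these equations, so it is not estimated here. The data are invented; with an intercept and a binary covariate the fitted means equal the two group averages.

```python
import math

def gamma_log_link_fit(x, c, tol=1e-12, max_iter=100):
    """Fit log(mu_i) = b0 + b1 * x_i to claim costs c_i by Fisher scoring (IRLS).

    With a Gamma response and log link the working weights are constant, so each
    step is the least-squares regression of r_i = (c_i - mu_i) / mu_i on (1, x_i).
    """
    n = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    det = n * sxx - sx * sx          # determinant of X'X for the design (1, x_i)
    b0, b1 = math.log(sum(c) / n), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        r = [(ci - mi) / mi for ci, mi in zip(c, mu)]
        sr = sum(r)
        sxr = sum(xi * ri for xi, ri in zip(x, r))
        # (X'X)^{-1} X'r for the 2x2 case.
        step0 = (sxx * sr - sx * sxr) / det
        step1 = (-sx * sr + n * sxr) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1

# Invented claim costs for two groups of a binary covariate.
x = [0, 0, 0, 1, 1, 1]
c = [150.0, 200.0, 250.0, 600.0, 800.0, 1000.0]
b0, b1 = gamma_log_link_fit(x, c)
print(math.exp(b0), math.exp(b0 + b1))  # fitted mean cost in each group
```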
The pure premium reflects the expected value of claims. It is then adjusted for expenses and claim-handling fees to give the actual price per unit of exposure. The pure premium is obtained by multiplying the expected claim frequency by the expected claim cost:
$$ E\left[\sum_{i=1}^{Y} C_i\right] = E(Y) \times E(C_i) \qquad (19) $$
where the number of claims Y is assumed to be independent of the identically distributed claim amounts (C_1, C_2, ...). According to Denuit & Charpentier (2005), calculating the frequency and the cost of claims separately is relevant because the risk factors affecting them are typically different. The separate study of the two
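Equation (19) can be illustrated with a small simulation, assuming (as the separate frequency-severity approach does) that the claim count is Poisson and independent of i.i.d. Gamma claim amounts. The sketch computes the pure premium λ × μ and compares it with the average simulated aggregate claim per policy; λ = 0.3, μ = 500 and the Gamma shape v = 2 are made-up values. The Poisson sampler is Knuth's multiplication method, which is adequate for small λ.

```python
import math
import random

def pure_premium(expected_frequency, expected_severity):
    """Equation (19): E[sum of claims] = E(Y) * E(C)."""
    return expected_frequency * expected_severity

def sample_poisson(lam, rng):
    """Knuth's method: count uniforms until their running product drops below exp(-lam)."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def simulate_aggregate(lam, mu, v, n_policies, seed=1):
    """Average total claim cost per policy: Poisson count, Gamma(shape v, mean mu) costs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_policies):
        n_claims = sample_poisson(lam, rng)
        # gammavariate(alpha, beta) has mean alpha * beta, so scale mu / v gives mean mu.
        total += sum(rng.gammavariate(v, mu / v) for _ in range(n_claims))
    return total / n_policies

lam, mu, v = 0.3, 500.0, 2.0      # hypothetical frequency, mean severity, Gamma shape
print(pure_premium(lam, mu))       # 150.0
average = simulate_aggregate(lam, mu, v, n_policies=200_000)
print(average)
```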
PRESENTATION
A dataset containing 489 observations of telematics and claims data on motor vehicle accidents that occurred in Spain in 2011 was used in the analysis. The data was provided by an insurer and contained a random sample with information on total cost in thousand euros, drivers'
Nine attributes that could be used as predictor variables were retained, together with two response variables: the number of claims (Nclaims) and the amount settled by the insurer with respect to
4.1.1 Frequency
This segment examines how the frequency of claims varies with vehicle age, age of license, age of driver, gender of driver and the total distance in km covered by the car. Histograms for each of
[Figure: histograms of claim frequency against (a) age of car, (b) age of license, (c) age of driver, (d) gender (females, males) and (e) distance in km]
Table 1a: A histogram of age of car
It is noted that frequency is highest in the 5.7-7.5 car age band, followed closely by the 3.9-5.7 age band. The 7.5-9.3 age band has the third highest frequency, while the 16.5-18.3 age band has
It is noted that frequency rises steadily with license age up to the 4.6-5.9 band and then drops steadily as license age increases. Frequency is highest in the 4.6-5.9 license age band, followed by the 3.3-4.6 band, which is in turn followed very closely by the 5.9-7.2 band.
It is noted that frequency is highest in the 24.79-26.19 driver age band, followed very closely by the 26.19-27.59 band. The 23.39-24.79 band has the third highest frequency.
It is observed that frequency increases to a maximum in the 5220-7220 km distance band and then drops steadily with distance. The 5220-7220 band has the highest frequency, followed
This segment examines how claim severity varies with the attributes: age of car, age of license, age of driver, gender of driver and the total distance covered by the car in kilometers. Column charts were used to present the data by plotting claim severity against each attribute.
[Figure: column charts of claim severity against (a) age of car, (b) age of license, (c) age of driver, (d) gender (females, males) and (e) distance covered by car]
Table 2a: A column chart of claim severity against age of car
It is observed that severity increases to a maximum in the 4-7.99 car age band and then decreases steadily as car age increases. Generally, more of the insured cars are aged between 4 and 7.99 years, hence the high severity. Insurance companies also set higher premiums for older cars because of their high exposure to risk, which discourages owners from taking out policies on older cars.
It is observed that severity increases to a maximum in the 3-5.99 license age band and then decreases steadily as license age increases. Severity is lowest for license age under 3 years because there are fewer insured drivers with licenses under 3 years. Severity then decreases after the 3-5.99 band because drivers become more experienced and less exposed to risks leading to insurance claims.
It is observed that severity is highest in the 24.5-26.5 age band. The 30.5-32.5 age band is the second highest, followed by the 22.5-24.5 age band. Drivers between the ages of 24.5 and 26.5 are
It is observed that males have a higher severity than females. Men are more likely than women to drive longer distances and to engage in dangerous driving behaviors such as speeding, not
Table 2b: A column chart of claim severity against distance covered by car.
It is observed that severity increases to a peak in the 5001-10000 km distance band and then decreases steadily as distance increases. This is mainly because most cars were driven between 5001 and 10000 kilometers, with fewer cars having higher mileage, hence the highest severity in this band.
The data was fitted using the glm function in order to determine the significance of the predictor variables age, age of license, total km, age of car and gender in calculating the pure premium. A Poisson model was used to model the frequency of claims and a Gamma model to model the severity of claims.
Figure 1: Generalized Linear Model output of claims data fitted on a Poisson distribution
The model was then refitted using a quasipoisson model, with the following results.
Figure 2: Generalized Linear Model output of claims data fitted on a quasipoisson distribution
Using the quasipoisson model, all the predictor variables except distance are found to be insignificant. This means that only the distance (tkm) variable should be used in clustering the data. This indicates that the number of claims and the distance covered (tkm) are positively correlated.
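In a quasipoisson fit the coefficient estimates are the same as in the Poisson fit; what changes is that φ is estimated. R's summary.glm takes φ̂ as the Pearson chi-square statistic divided by the residual degrees of freedom, φ̂ = Σ(y_i − λ̂_i)²/λ̂_i / (n − p), and the Poisson standard errors are multiplied by √φ̂, which is how significance can differ between a Poisson and a quasipoisson output. The sketch below illustrates the calculation; the counts and fitted means are invented and do not come from the Spanish dataset.

```python
import math

def pearson_dispersion(y, fitted, n_params):
    """Estimate phi as the Pearson chi-square statistic over residual degrees of freedom."""
    pearson_chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, fitted))
    return pearson_chi2 / (len(y) - n_params)

def quasi_standard_error(poisson_se, phi_hat):
    """Quasi-Poisson standard errors are the Poisson ones scaled by sqrt(phi)."""
    return poisson_se * math.sqrt(phi_hat)

# Invented data: observed claim counts and fitted means from a 2-parameter model.
y = [0, 3, 1, 0, 5, 2, 0, 4]
fitted = [1.0, 2.0, 1.5, 1.0, 2.5, 2.0, 1.0, 2.0]
phi_hat = pearson_dispersion(y, fitted, n_params=2)
print(phi_hat)                        # phi_hat > 1 suggests overdispersion
print(quasi_standard_error(0.10, phi_hat))
```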
Figure 3: Generalized Linear Model output of claims data fitted on a Gamma distribution
From the output in Figure 3 above, none of the variables is significant for severity except the intercept, which gives the expected severity, on the scale of the link function, when all the predictor variables are zero. This means that age, age of license, distance (tkm), age of car and gender do not affect the
From the analysis above, the data is right-skewed. This means that the distribution has a long tail to the right of its peak, with most observations concentrated at lower values. This is also evident from the histograms above, where it can be seen that the frequency of observations is lower on the right side of the graph than it is on the left.
The goodness of fit of the models was evaluated using the deviance, as follows:
Hypothesis testing
A test is carried out to determine whether car age, age of license, distance covered by car, gender and age of the policyholder affect premium rates, so that the rates can be adjusted in order to come
The null hypothesis states that the risk factors stated above are not significant in calculating the pure premium and do not affect motor insurance premium rates, while the alternative hypothesis states that these risk factors are significant in calculating the pure premium.
Frequency Model
From Figure 2 above, the effect of distance covered (tkm) is statistically significant at the 0.05 level of significance. This denotes that a change in distance (tkm) is associated with a change in the frequency of claims, so at the 5% level of significance there is adequate evidence to reject the null hypothesis H0 that the age of the policyholder, age of car, age of license, gender and distance covered by the car do not affect the frequency of claims. Deviance is used as a measure of goodness of fit in GLMs: the residual deviance of the frequency model on 483 degrees of freedom is 105.88, and the p-value associated with it from the chi-square distribution is 1.
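The p-values quoted for the residual deviances can be reproduced from the upper tail of the chi-square distribution. The dependency-free sketch below evaluates P(χ²_df > D) through the regularized incomplete gamma function (power series and Lentz continued fraction, in the style of Numerical Recipes) and applies it to the frequency deviance above (105.88 on 483 df) and the severity deviance reported below (861.54 on 483 df). In practice a library routine such as R's pchisq(D, df, lower.tail = FALSE) would be used instead.

```python
import math

def _lower_series(a, x):
    """Regularized lower incomplete gamma P(a, x) by its power series (x < a + 1)."""
    term = total = 1.0 / a
    denom = a
    for _ in range(100_000):
        denom += 1.0
        term *= x / denom
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def _upper_continued_fraction(a, x):
    """Regularized upper incomplete gamma Q(a, x) by Lentz's continued fraction (x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 100_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_upper_tail(deviance, df):
    """P(chi-square with df degrees of freedom > deviance)."""
    if deviance <= 0:
        return 1.0
    a, x = df / 2.0, deviance / 2.0
    if x < a + 1.0:
        return 1.0 - _lower_series(a, x)
    return _upper_continued_fraction(a, x)

p_frequency = chi2_upper_tail(105.88, 483)
p_severity = chi2_upper_tail(861.54, 483)
print(p_frequency, p_severity)
```

A p-value close to 1 means the residual deviance is small relative to its degrees of freedom. The deviance R prints for a Gamma fit is unscaled, so a formal chi-square comparison would first divide it by the estimated dispersion φ̂.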
Severity Model
From Figure 3 above, the policyholder's age, age of car, age of license, gender and distance covered by car are not significant in calculating the pure premium; that is, these risk factors do not affect motor insurance premium rates. The residual deviance of the severity model on 483 degrees of freedom is 861.54, and the p-value associated with it from the chi-square distribution is 0. Since none of the risk factors is significant, at the 5% significance level we do not reject the null hypothesis H0 that the age of the policyholder, age of car, age of license, gender and distance covered by car do not affect the amount settled by the insurer with
RECOMMENDATION
This chapter outlines the results of the project and the main conclusions drawn from the data analysis in Chapter Four. It is organized as follows: segment 5.1 summarizes the project's findings, segment 5.2 presents the conclusion and, lastly, segment 5.3
5.1 Summary
This project focused on developing a pricing estimator for affordable motor vehicle insurance premiums using Generalized Linear Models (GLMs). Five attributes were used in estimating both the frequency and the severity of claims: the driver's age, the age of the driver's license, the total distance covered by the car, the car's age and the driver's gender. Each attribute was plotted against both the frequency and the severity of claims. The Poisson model estimated the expected claim frequency, while the Gamma model estimated the expected claim severity. The significance of the attributes in the calculation of claim frequency and claim severity was tested by fitting the models. Utilizing
The project was on how to price motor vehicle insurance by developing a pricing estimator using GLMs. Based on the analysis, none of the five attributes used is significant in the calculation of claim severity. This means that the attributes do not affect the amount of claims settled by the insurer per policyholder, and hence that the risk factors stated above do not affect the calculation of the pure premium or the insurer's motor insurance premium rates. We conclude that the insurer should continue calculating the pure premium using its existing methods, because the anticipated risk factors do not affect premium rates and hence cannot reduce the premiums charged by the insurer. Regarding claim frequency, the total distance covered by the car is significant in its estimation. The other risk factors are not significant in claim frequency estimation, hence the insurer should ignore them and use only the total distance covered by the car to cluster the data. This demonstrates that the total distance covered by the car and the claim frequency are positively correlated. Therefore, total distance covered by the
The models' goodness of fit showed that the Gamma model is ideal for estimating claim severity. This is because the residual deviance was not large at the 5% significance level, which means that the Gamma model is a good fit for claim severity. The Poisson model is not ideal for estimating claim frequency. This is because the residual deviance was large at the 5% significance level; this means
In using Generalized Linear Models, the research assumed that the data attributes used in determining insurance pricing were independent, which is not always the case in real life, as some attributes are co-dependent in determining claim frequency and severity. Attributes such as the age of the driver's license (used to determine the driver's experience) and the age of the car can co-dependently affect the claim severity and frequency of motor vehicle insurance. It is therefore recommended that insurance companies adopt models that would take the co-
The attributes used in this research were insignificant in the estimation of claim severity, and the Poisson model was noted not to be ideal for the estimation of claim frequency. We therefore recommend further research to identify the ideal model for estimating claim frequency. Further research should also be done to identify predictor variables which will be
Antonio, K., & Valdez, E. A. (2012). Statistical concepts of a priori and a posteriori risk
https://doi.org/10.1007/s10182-011-0152-7
Barasa, K. (2016). To identify a framework for adoption by insurance industry for enhancing
Cameron, A. C., & Trivedi, P. K. (1998). Regression Analysis of Count Data. Cambridge
David, M. (2015). Auto Insurance Premium Calculation Using Generalized Linear Models.
5671(15)00059-3
Denuit, M., & Charpentier, A. (2005). Mathematics of Non-Life Insurance. Volume II: Pricing
Din, U., Mohy, S., Regupathi, Bakar, A., & Arpah. (2017). Insurance effect on economic
The Negative Binomial Distribution with a Regression Component. ASTIN Bulletin: The
Goldburd, M., Khare, A., Tevet, D., & Guller, D. (2016). Generalized Linear Models
Haberman, S., & Renshaw, A. E. (1996). Generalized Linear Models and Actuarial Science.
Journal of the Royal Statistical Society: Series D (The Statistician), 45(4), 407–436.
https://doi.org/10.2307/2988543
Huang, D., & Query, J. T. (2007). Designing a New Automobile Insurance Pricing System in
231.
Liedtke, P. M. (2007). What’s Insurance to a Modern Economy? The Geneva Papers on Risk
Liu, Z., Shen, Q., & Ma, J. (2017). A driving behavior model evaluation for UBI. International
Nelder, J. A., & Verrall, R. J. (1997). Credibility Theory and Generalized Linear Models. ASTIN
Bulletin.
Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized Linear Models. Journal of the
https://doi.org/10.2307/2344614
Ohlsson, E., & Johansson, B. (2010). Non-Life Insurance Pricing with Generalized Linear
Pinquet, J. (1997). Allowance for Cost of Claims in Bonus-Malus Systems. ASTIN Bulletin: The
Poole, D., Mackworth, A., & Goebel, R. (1998). Computational Intelligence: A Logical
Approach.
Swiss Re. (2020). World insurance: Riding out the 2020 pandemic storm. Sigma, 4.
Swiss Re. (2021). Turbulence after lift-off: Global economic and insurance market outlook 2022/23. Sigma, 5.
Schmitter, H. (2004). The Sample Size Needed for the Calculation of a GLM Tariff. Astin
Silva, Y., & Afonso, L. (2015). A Comparative Study of Pricing Methods of Automobile
Williams, A., & Shabanova, V. (2003). Responsibility of drivers, by age and gender, for motor-
https://doi.org/10.1016/j.jsr.2003.03.001
Zheng, C. (2015). The automobile insurance pricing model, combining static premium with
R Syntaxes