Business Analytics Project: Sector: Insurance Industry

PGDM Finance
(2019-2021)
Business Analytics project
Sector: Insurance industry
Submitted by: Team 8
Mayank Chauda 201921027
Nandita Singla 201921029
Pratik Dokania 201921036
Rajat Kumar Nath 201921039
Sajeev George 201921041
Mentored by: Dr S. Maheswaran
INSURANCE INDUSTRY
 Insurance is a means of protection from financial loss. It is a form of risk
management, primarily used to hedge against the risk of a contingent or uncertain
loss.
 An entity which provides insurance is known as an insurer, insurance company,
insurance carrier or underwriter. A person or entity who buys insurance is known as
an insured or as a policyholder. The insurance transaction involves the insured
assuming a guaranteed and known relatively small loss in the form of payment to the
insurer in exchange for the insurer's promise to compensate the insured in the event of
a covered loss. The loss may or may not be financial, but it must be reducible to
financial terms, and usually involves something in which the insured has an insurable
interest established by ownership, possession, or pre-existing relationship.
 The insured receives a contract, called the insurance policy, which details the
conditions and circumstances under which the insurer will compensate the insured.
The amount of money charged by the insurer to the policyholder for the coverage set
forth in the insurance policy is called the premium. If the insured experiences a loss
which is potentially covered by the insurance policy, the insured submits a claim to
the insurer for processing by a claims adjuster. The insurer may hedge its own risk by
taking out reinsurance, whereby another insurance company agrees to carry some of
the risk, especially if the primary insurer deems the risk too large for it to carry.
DATASET
 The dataset contains details of policyholders with different insurers. The
data contains variables such as the names of the insured and the insurer,
DPID, age, office phone, amount insured and email ID of the
policyholders.
 The source of the data is the Insurance Regulatory and Development
Authority of India (IRDA) website. The Excel file is attached to this report.
 The sample contains 490 observations on 7 variables, of which 4 are
stored as numbers.
 The numeric variables are DPID, age, office phone and the amount
insured. Of these, DPID and office phone are identifiers rather than
measurements, so only age and the amount insured are analysed
quantitatively.
OBJECTIVES OF THE STUDY
 The objective of the study is to analyse the information collected from
IRDA.
 Through the data, we analyse the behaviour of policyholders with
respect to the amount they insure.
 We look at the various insurers in the insurance industry.
 The data is a sample. We test whether the sample mean is consistent
with a hypothesized population mean.
 We also test whether the amount insured is affected by the age of
the individuals.
ANALYSIS OF DATA
Measures of Central Tendency and Dispersion:
 Mean: The mean of a data set is also known as the average value. It is
calculated by dividing the sum of all values in a data set by the number of
values.
 Median: The median of a data set is the value that is at the middle of a
data set arranged from smallest to largest.
 Mode: The mode is the most common observation of a data set, or the
value in the data set that occurs most frequently.
 Standard Deviation: The standard deviation of a dataset is a measure of
dispersion rather than of central tendency. It describes how far the
observations in the dataset typically deviate from their mean.
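The four measures defined above can be computed with Python's standard library. This is a minimal sketch: the `amounts` list is hypothetical illustrative data, not the IRDA sample, and the use of the sample (n - 1) standard deviation is an assumption about how the reported figure was produced.

```python
import statistics

# Hypothetical insured amounts in rupees (illustrative only, not the IRDA sample).
amounts = [1_127_788, 1_127_788, 15_000_000, 26_000_000, 26_000_000, 41_000_000]

mean_amount = statistics.mean(amounts)      # sum of values / number of values
median_amount = statistics.median(amounts)  # middle value of the sorted data
modes = statistics.multimode(amounts)       # every value tied for most frequent
smallest_mode = min(modes)                  # the value reported when modes are tied
std_dev = statistics.stdev(amounts)         # sample standard deviation (n - 1)

print(mean_amount, median_amount, modes, smallest_mode, std_dev)
```

`statistics.multimode` returns all tied modes, which mirrors the situation in this dataset where several values occur equally often.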
INTERPRETATION:
 We have applied these measures to the amount insured by the
policyholders.
 The mean amount insured is approximately ₹ 2,59,97,405 while the
median amount insured is ₹ 2,68,04,555.5.
 The dataset has multiple modes, i.e. several values share the highest
frequency. The smallest mode is ₹ 11,27,788.
 The standard deviation is ₹ 1,42,33,972, which is roughly 55% of the
mean and indicates that the insured amounts are widely spread around
it.
BOXPLOT
 We have created a boxplot of the age of the insurance policyholders.

 The youngest policyholder is 25 years old while the oldest policyholder
is 70 years old.
 The median age of the policyholders is 47.5 years.
 Q1 is 36 years while Q3 is 58.25 years, giving an interquartile range of 22.25 years.
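The five values a boxplot draws (minimum, Q1, median, Q3, maximum) can be sketched as follows. The `ages` list is hypothetical, and the "inclusive" quartile method is an assumption; statistical packages interpolate quartiles in slightly different ways, so the quartiles may not match another tool exactly.

```python
import statistics

# Hypothetical policyholder ages (illustrative only, not the IRDA sample).
ages = [25, 30, 36, 40, 47, 48, 55, 58, 62, 70]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median) and Q3.
q1, q2, q3 = statistics.quantiles(ages, n=4, method="inclusive")

youngest = min(ages)
oldest = max(ages)
iqr = q3 - q1  # interquartile range: the height of the box

print(youngest, q1, q2, q3, oldest, iqr)
```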
PIE CHART
INTERPRETATION:
 The pie chart shows the share of policyholders held by each insurer.
 The largest number of policyholders is insured by Life Insurance
Corporation of India.
 There are 49 different insurance companies in this dataset.
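The counts behind such a pie chart can be sketched with a frequency table. The `insurers` list is hypothetical ("Insurer A" and "Insurer B" are placeholder names); only the counting logic is the point.

```python
from collections import Counter

# Hypothetical insurer column (illustrative only, not the IRDA sample).
insurers = [
    "Life Insurance Corporation of India", "Insurer A",
    "Life Insurance Corporation of India", "Insurer B",
    "Life Insurance Corporation of India", "Insurer A",
]

counts = Counter(insurers)                       # policyholders per insurer
distinct_insurers = len(counts)                  # number of slices in the pie
top_insurer, top_count = counts.most_common(1)[0]
shares = {name: 100 * c / len(insurers) for name, c in counts.items()}  # percent

print(distinct_insurers, top_insurer, top_count, shares)
```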
ONE-SAMPLE t TEST (ONE-TAILED)
 The One Sample t Test determines whether the sample mean is statistically different
from a known or hypothesized population mean. The One Sample t Test is a
parametric test.
 This test is also known as:
 Single Sample t Test
 The variable used in this test is known as:
 Test variable
 In a One Sample t Test, the test variable is compared against a "test value", which is a
known or hypothesized value of the mean in the population.
INTERPRETATION:
H0 : The mean of the amount insured is
equal to ₹ 2,00,00,000.
µ = 2,00,00,000
H1 : The mean of the amount insured is
greater than ₹ 2,00,00,000.
µ > 2,00,00,000
Significance Level: α = 0.05
P-value = (.000/2) = 0.000, i.e. p < 0.001 {The p-value given is for a 2-tailed
test, so it is halved for the one-tailed test. This is valid because the sample
mean lies on the side of H1.}
P-value < α
0.000 < 0.050
Thus, we reject the null hypothesis in favour of the alternative hypothesis. Hence,
the mean of the amount insured is significantly greater than ₹ 2,00,00,000.
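As a rough cross-check, the test statistic can be recomputed from the summary statistics reported in this document (mean, standard deviation and sample size). This sketch assumes the reported standard deviation is the sample (n - 1) value; the exact p-value would need a t-distribution routine, which is left out here.

```python
import math

# Summary statistics as reported in this document.
sample_mean = 25_997_405   # mean amount insured (rupees)
sample_sd = 14_233_972     # standard deviation (assumed to be the n - 1 version)
n = 490                    # sample size
mu0 = 20_000_000           # hypothesized population mean under H0

standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - mu0) / standard_error
df = n - 1                 # degrees of freedom for a one-sample t test

print(round(t_stat, 2), df)
```

A statistic of this size lies far above the one-tailed 5% critical value of about 1.65 for 489 degrees of freedom, which is consistent with the reported p-value being below 0.001.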
ANOVA
One-Way ANOVA ("analysis of variance") compares the means of two or more independent
groups in order to determine whether there is statistical evidence that the associated
population means are significantly different. One-Way ANOVA is a parametric test.
This test is also known as:
 One-Factor ANOVA
 One-Way Analysis of Variance
 Between Subjects ANOVA
The variables used in this test are known as:
 Dependent variable
 Independent variable (also known as the grouping variable, or factor)
 This variable divides cases into two or more mutually exclusive levels, or
groups
Interpretation
H0 : The mean amount insured is equal across the age groups of policyholders.
µ1 = µ2 = µ3 = µ4 = … = µk
H1 : The mean amount insured differs for at least one age group of
policyholders.
At least one µi ≠ µj (i ≠ j)

Observation:
P-value = 0.313
P-value > α
0.313 > 0.050
Thus, we fail to reject the null hypothesis.
Hence, there is no statistically significant difference between the means of the amount
insured of the policyholders across various age groups.
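The F statistic behind a one-way ANOVA compares the variation between group means with the variation within groups. The sketch below uses a hypothetical helper, `one_way_anova_f`, and made-up group values; it is not the SPSS output used in this report.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of each observation around its own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical amounts insured (in crore rupees) for three age groups.
age_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_b, df_w = one_way_anova_f(age_groups)
print(f_stat, df_b, df_w)
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom gives a small p-value; a p-value above α, as observed here, means the group means cannot be distinguished.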
CORRELATION
The bivariate Pearson Correlation produces a sample correlation coefficient, r, which
measures the strength and direction of linear relationships between pairs of continuous
variables. By extension, the Pearson Correlation evaluates whether there is statistical
evidence for a linear relationship among the same pairs of variables in the population,
represented by a population correlation coefficient, ρ (“rho”). The Pearson Correlation is a
parametric measure.
This measure is also known as:
 Pearson’s correlation
 Pearson product-moment correlation (PPMC)
Interpretation
H0 : There is no correlation between the age of the policyholders and the amount insured by
them.
ρ=0
H1 : There is a correlation between the age of the policyholders and the amount insured by
them.
ρ≠0
Observation:
P-value = 0.202
P-value > α
0.202 > 0.050
Thus, we fail to reject the null hypothesis.
Hence, there is no statistically significant linear correlation between the age of the
policyholders and the amount insured by them.
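The sample coefficient r and the t statistic used to test ρ = 0 can be sketched as below. The helper names `pearson_r` and `t_for_r` and the small x/y lists are hypothetical; the value 0.058 is the R reported in the regression section of this document and n = 490 is the sample size.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def t_for_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical perfectly linear data, to show the coefficient's upper bound.
r_demo = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])

# Magnitude of the correlation reported in this document, with n = 490.
t_reported = t_for_r(0.058, 490)
print(r_demo, round(t_reported, 2))
```

The resulting t statistic is below the two-tailed 5% critical value of about 1.96, which agrees with the reported p-value of 0.202.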
REGRESSION
 Linear regression is the next step up after correlation. It is used when we want to
predict the value of a variable based on the value of another variable. The variable we
want to predict is called the dependent variable. The variable we are using to predict
the other variable's value is called the independent variable.
 The Model Summary table provides the R and R² values. The R value represents the
simple correlation and is 0.058 (the "R" column), which indicates a negligible
correlation. The Adjusted R² value (the "Adjusted R Square" column) indicates how
much of the total variation in the dependent variable, Amount Insured, can be
explained by the independent variable, Age. Since R² = 0.058² ≈ 0.003 and the
adjusted value is smaller still, less than 1% can be explained, which is very small.
 The ANOVA table reports whether the regression model predicts the dependent
variable significantly well. Let's look at the "Regression" row and go to the "Sig."
column. This indicates the statistical significance of the regression model that was
run. Here, p = 0.202, which is greater than 0.05, and indicates that, overall, the
regression model is not a good fit for the data.
 The Coefficients table provides us with the necessary information to predict amount
insured from age, as well as determine whether age contributes statistically
significantly to the model (by looking at the "Sig." column). Furthermore, we can use
the values in the "B" column under the "Unstandardized Coefficients" column to
present the regression equation as:
 Amount Insured = 28972079 - 62493(Age)
 Each additional year of age is associated with a decrease of about ₹ 62,493 in the
predicted amount insured, but this slope is not statistically significant (p = 0.202).
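The fitted equation and the share of variation it explains can be evaluated directly. The coefficients and R below are the values reported in this document; the function name `predict_amount_insured` and the example ages are illustrative assumptions.

```python
# Coefficients as reported in the regression equation of this document.
INTERCEPT = 28_972_079       # predicted amount insured at age 0 (rupees)
SLOPE_PER_YEAR = -62_493     # change in predicted amount per extra year of age
R = 0.058                    # simple correlation reported in the model summary


def predict_amount_insured(age):
    """Predicted amount insured (rupees) for a given age, per the fitted line."""
    return INTERCEPT + SLOPE_PER_YEAR * age


r_squared = R ** 2           # proportion of variation explained by age

for age in (25, 47.5, 70):   # youngest, median and oldest ages in the sample
    print(age, predict_amount_insured(age))
print(round(r_squared, 4))
```

Across the whole observed age range the prediction moves by less than ₹ 30 lakh, which is small compared with the standard deviation of the amount insured and illustrates why age explains so little.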
SCOPE FOR FURTHER STUDY
 As we have seen, age is not a good predictor of the amount insured. We can
further analyse whether other factors help in determining the amount
insured.
 We can collect the gender of the policyholders and test whether it is a determining
factor for the amount insured.
 We can collect the salary of the policyholders, because salary is generally an
important factor in deciding the amount insured.
BIBLIOGRAPHY
 https://libguides.library.kent.edu/SPSS/OneWayANOVA
 https://statistics.laerd.com/spss-tutorials/one-way-anova-using-spss-statistics.php
 https://statistics.laerd.com/spss-tutorials/linear-regression-using-spss-statistics.php
