(2019-2021)
DATASET
ANALYSIS OF DATA
Measures of Central Tendency and Dispersion:
Mean: The mean of a data set is also known as the average value. It is
calculated by dividing the sum of all values in a data set by the number of
values.
Median: The median of a data set is the value that is at the middle of a
data set arranged from smallest to largest.
Mode: The mode is the most common observation of a data set, or the
value in the data set that occurs most frequently.
Standard Deviation: The standard deviation of a data set measures how
widely the values are spread around the mean. Unlike the mean, median and
mode, it is a measure of dispersion rather than of central tendency.
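As a minimal sketch of these four definitions, the snippet below uses Python's standard statistics module. The amounts are hypothetical and are not taken from the report's dataset. The sample standard deviation (n - 1 divisor) is an assumption, since the report does not say which form was used.

```python
import statistics

# Hypothetical amounts insured (in rupees); NOT the report's dataset.
amounts = [1_000_000, 2_500_000, 2_500_000, 4_000_000, 4_000_000, 6_000_000]

mean_amount = statistics.mean(amounts)      # sum of values / number of values
median_amount = statistics.median(amounts)  # middle of the sorted data (average of the two middle values here)
modes = statistics.multimode(amounts)       # every value tied for the highest frequency
smallest_mode = min(modes)                  # the smallest of the tied modes
std_dev = statistics.stdev(amounts)         # sample standard deviation (assumed n - 1 divisor)

print(mean_amount, median_amount, modes, smallest_mode, round(std_dev, 2))
```

When several values tie for the highest frequency, multimode returns all of them, which is why a report may quote only the smallest mode.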
INTERPRETATION:
We applied these measures to the amount insured by the policyholders.
The mean amount insured is ₹ 2,59,97,405 approximately while the
median amount insured is ₹ 2,68,04,555.5.
The dataset has several modes, that is, several values tied for the highest
frequency. The smallest of these modes is ₹ 11,27,788.
The standard deviation is ₹ 1,42,33,972, which is more than half of the
mean, so the amounts insured vary widely from one policyholder to
another.
BOXPLOT
PIE CHART
INTERPRETATION:
The pie chart shows the share of policyholders insured by each insurer.
The largest number of policyholders are insured by Life Insurance
Corporation of India.
The dataset contains 49 different insurance companies.
ONE-TAILED t TEST
The One Sample t Test determines whether the sample mean is statistically different
from a known or hypothesized population mean. The One Sample t Test is a
parametric test.
This test is also known as:
Single Sample t Test
The variable used in this test is known as:
Test variable
In a One Sample t Test, the test variable is compared against a "test value", which is a
known or hypothesized value of the mean in the population.
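The comparison against a test value can be sketched as follows. The function name one_sample_t and the data are hypothetical. Only the t statistic and degrees of freedom are computed; the p-value would come from a t distribution table or a statistics package.

```python
import math
import statistics

def one_sample_t(sample, test_value):
    """Return (t statistic, degrees of freedom) for a one-sample t test."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_sd = statistics.stdev(sample)        # sample SD, n - 1 divisor
    standard_error = sample_sd / math.sqrt(n)   # SD of the sample mean
    t_stat = (sample_mean - test_value) / standard_error
    return t_stat, n - 1

# Hypothetical observations and test value, for illustration only.
sample = [4.0, 5.0, 6.0, 7.0, 8.0]
t_stat, df = one_sample_t(sample, test_value=5.0)
print(round(t_stat, 4), df)
```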
INTERPRETATION:
H0 : The mean amount insured is equal across the different age groups of policyholders.
µ1 = µ2 = µ3 = µ4 = … = µk
H1 : The mean amount insured differs for at least one age group of policyholders.
Not all of µ1, µ2, …, µk are equal
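Hypotheses of this form, with k group means, are normally tested with a one-way ANOVA F statistic. Treating that as the intended test is an assumption, since the output table is not reproduced here. The sketch below uses a hypothetical function name and made-up groups, and stops at the F statistic and its degrees of freedom.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical amounts for three age groups; NOT the report's data.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f_stat, df_between, df_within)
```

A large F relative to the F distribution with these degrees of freedom is evidence against H0.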
CORRELATION
Interpretation
H0 : There is no correlation between the age of the policyholders and the amount insured by
them.
ρ=0
H1 : There is a relation between the age of the policyholders and the amount insured by them.
ρ≠0
Significance Level: α = 0.05
Observation:
P-value = 0.202
P-value > α
0.202 > 0.050
Thus, we fail to reject the null hypothesis.
Hence, there is no statistically significant correlation between the age of the policyholders and
the amount insured by them.
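The correlation coefficient and the decision rule used above can be sketched in plain Python. The function names and the age/amount values are hypothetical. Only the p-value 0.202 and α = 0.05 come from the report.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def decide(p_value, alpha=0.05):
    """Apply the usual rule: reject H0 only when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Hypothetical ages and amounts (arbitrary units), for illustration only.
ages = [20, 30, 40, 50]
amounts = [3, 1, 4, 2]
r = pearson_r(ages, amounts)

print(r)               # correlation for the hypothetical data
print(decide(0.202))   # decision for the p-value reported above
```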
REGRESSION
Linear regression is the next step up after correlation. It is used when we want to
predict the value of a variable based on the value of another variable. The variable we
want to predict is called the dependent variable. The variable we are using to predict
the other variable's value is called the independent variable.
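As a rough illustration of how a regression line is obtained, the sketch below fits a simple least-squares line with one independent variable. The function name fit_line and the data are hypothetical; the report's own coefficients come from its SPSS output, not from this code.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical independent (age) and dependent (amount) values.
ages = [20, 30, 40, 50]
amounts = [50.0, 45.0, 40.0, 35.0]
intercept, slope = fit_line(ages, amounts)
print(intercept, slope)
```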
This table provides the R and R² values. The R value represents the simple correlation
and is 0.058 (the "R" column), which indicates a very weak correlation. The Adjusted R² value
(the "Adjusted R Square" column) indicates how much of the total variation in the
dependent variable, Amount Insured, can be explained by the independent
variable, Age. In this case, no more than about 1% can be explained (R = 0.058 gives
R² ≈ 0.003), which is very small.
This table indicates whether the regression model predicts the dependent variable
significantly well. Look at the "Regression" row and go to the "Sig." column, which
gives the statistical significance of the regression model that was run. Here, p
= 0.202, which is greater than 0.05, and indicates that, overall, the regression model
does not predict the amount insured significantly well.
The Coefficients table provides us with the necessary information to predict amount
insured from age, as well as determine whether age contributes statistically
significantly to the model (by looking at the "Sig." column). Furthermore, we can use
the values in the "B" column under the "Unstandardized Coefficients" column to
present the regression equation as:
Amount Insured = 28972079 - 62493(Age)
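To show how this equation is used, the short sketch below plugs two example ages into it. The coefficients are the ones reported above. The function name and the chosen ages are hypothetical.

```python
# Coefficients from the regression equation reported above.
INTERCEPT = 28_972_079       # predicted amount insured at age 0
SLOPE_PER_YEAR = -62_493     # change in predicted amount per additional year of age

def predicted_amount_insured(age):
    """Predicted amount insured (in rupees) for a policyholder of the given age."""
    return INTERCEPT + SLOPE_PER_YEAR * age

for age in (30, 40):  # example ages chosen for illustration
    print(age, predicted_amount_insured(age))
```

Because the model is not statistically significant (p = 0.202), such predictions should be read as illustrations of the equation rather than as reliable estimates.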
As we have seen, age is not a good predictor of the amount insured. We can further
analyse whether other factors help determine the amount insured.
We can find the gender of the policyholders and check whether it is a determining
factor for the amount insured.
We can also obtain the salary of the policyholders, because salary is generally an
important factor in deciding the amount insured.
BIBLIOGRAPHY
https://libguides.library.kent.edu/SPSS/OneWayANOVA
https://statistics.laerd.com/spss-tutorials/one-way-anova-using-spss-statistics.php
https://statistics.laerd.com/spss-tutorials/linear-regression-using-spss-statistics.php