
Business Mathematics and Statistics

Group Assignment & Presentation

Submitted by

Jue Yadanar Win - 65190266


Zhaoxu Yang - 65190183
Qiulin Wu - 65190217
1. Descriptive Statistics
(a) Measures of Central Tendency, Dispersion, Skewness, and Kurtosis of All Variables

Annual Medical Expenditure (Medexp) in hundreds of dollars

No. of Observations = 1000
Minimum = 0
Mean = 4.865
Maximum = 13
Median = 5
Standard Deviation = 2.841276
Skewness = 0.361864
Kurtosis = 2.46231

Annual Income in Thousands of Dollars (Inc)

No. of Observations = 1000
Minimum = 22
Mean = 73.571
Maximum = 146
Median = 74
Standard Deviation = 18.60191
Skewness = 0.0958106
Kurtosis = 3.498669

Age in years (age)

No. of Observations = 1000
Minimum = 21
Mean = 46.42
Maximum = 70
Median = 47
Standard Deviation = 13.76419
Skewness = -0.0698053
Kurtosis = 1.814745

Insurance status (insur)

No. of Observations = 1000
Minimum = 0
Mean = 0.477
Maximum = 1
Median = 0
Standard Deviation = 0.4997206
Skewness = 0.0920975
Kurtosis = 1.008482
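The statistics above can be computed with a short routine. The sketch below is an illustration, not the procedure used to produce the reported figures: the function name `describe` and the toy values are hypothetical. It assumes the conventions that the reported insurance figures are consistent with: a standard deviation with an n - 1 divisor, and skewness and kurtosis from 1/n central moments, so that a normal distribution has kurtosis 3.

```python
import statistics


def describe(values):
    """Summary statistics for a list of numbers (a sketch).

    Assumed conventions: the standard deviation uses the n - 1 divisor,
    while skewness and kurtosis use 1/n central moments, so a normal
    distribution has kurtosis 3 rather than an excess kurtosis of 0.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # 2nd central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # 3rd central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # 4th central moment
    return {
        "n": n,
        "min": min(values),
        "mean": mean,
        "max": max(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample SD, n - 1 divisor
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


# Hypothetical toy values shaped like medexp (hundreds of dollars);
# the assignment's dataset is not reproduced here.
toy_medexp = [0, 2, 3, 5, 5, 6, 8, 13]
for name, value in describe(toy_medexp).items():
    print(f"{name:>9}: {value:.7g}")
```

Applied to a 0/1 column with 477 ones out of 1,000 values, these conventions give the insurance figures listed above (standard deviation 0.4997206, skewness 0.0920975, kurtosis 1.008482).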

(b) Histogram for Annual Medical Expenditure

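Figure: Histogram of Annual Medical Expenditure (medexp), in hundreds of dollars

The distribution of annual medical expenditure is positively (right) skewed, with a skewness of 0.361864: a longer tail of higher expenditures extends to the right of the bulk of the data. The kurtosis of 2.46231 is below 3, the value for a normal distribution, so the distribution is platykurtic: its tails are lighter than those of a normal distribution and extreme values are relatively uncommon. The right skew therefore reflects a group of individuals with higher-than-typical medical expenditures rather than a few extreme outliers.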

(c) The Coefficient of Variation for the Medical Expenditure

The coefficient of variation (CV), the standard deviation divided by the mean, is a useful metric in healthcare research, especially when examining the variability in medical expenditure. In our dataset, the CV is 2.841276 / 4.865 = 0.5840238, so the standard deviation of medical expenditure is about 58% of the average medical expenditure. This indicates high variability around the average: the higher the variability, the less predictable individual expenditures are relative to the average. This information is important for understanding the range and dispersion of healthcare costs within the dataset and the unpredictability of individual medical expenditures compared with the average.
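As a quick check of the arithmetic, the sketch below recomputes the CV from the reported summary statistics. The function name is hypothetical. The CVs for income and age are included only for comparison: because the CV is unit-free, it can be compared across variables measured in different units.

```python
def coefficient_of_variation(sd, mean):
    """CV = standard deviation / mean; only interpretable when the mean is positive."""
    if mean <= 0:
        raise ValueError("the CV is only interpretable for a positive mean")
    return sd / mean


# Reported (standard deviation, mean) pairs from the descriptive statistics
reported = {
    "medexp": (2.841276, 4.865),
    "inc": (18.60191, 73.571),
    "age": (13.76419, 46.42),
}
for name, (sd, mean) in reported.items():
    cv = coefficient_of_variation(sd, mean)
    print(f"{name:>6}: CV = {cv:.7f} ({cv:.1%})")
```

The income and age CVs come out at roughly 25% and 30%, respectively.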

2. Correlation Analysis

(a) Significance Levels

           medexp      inc        age
medexp     1.0000
inc        0.0780**    1.0000
age        0.6650***   0.0677**   1.0000

Note: Correlation tests: *** significant at 1%, ** at 5%, * at 10% significance level.

(b) Direction and Strength

Inc

𝐻0: 𝜌 = 0 (There is no correlation between medexp and inc.)

𝐻1: 𝜌 ≠ 0 (There is a correlation between medexp and inc.)

The coefficient of correlation (r) is 0.0780, and the p-value is 0.0137.

The positive correlation coefficient (0.0780) indicates a weak positive correlation between the two variables. Because the p-value (0.0137) is below 0.05, the correlation is statistically significant at the 5% level, so the null hypothesis is rejected in favor of the alternative hypothesis. Higher annual income (inc) therefore tends to go together with slightly higher medical expenditure (medexp), although the association is weak.
Age

𝐻0: 𝜌 = 0 (There is no correlation between medexp and age.)

𝐻2: 𝜌 ≠ 0 (There is a correlation between medexp and age.)

The coefficient of correlation (r) is 0.6650, and the p-value is 0.0000.

The positive correlation coefficient (0.6650) indicates a strong positive linear correlation between the two variables. Because the p-value (0.0000) is below 0.01, the correlation is statistically significant at the 1% level, so the null hypothesis is rejected in favor of the alternative hypothesis. Older individuals therefore tend to have higher medical expenditure (medexp).
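The p-values above come from the standard t-test for a Pearson correlation, t = r·√(n - 2) / √(1 - r²) with n - 2 degrees of freedom. The sketch below recomputes the test statistic from the rounded coefficients; the function name is hypothetical. To stay within the Python standard library, it approximates the t tail with the standard normal distribution. This assumption is reasonable with 998 degrees of freedom, but it gives a slightly smaller p-value than an exact t-test, so the result need not match the reported 0.0137 exactly.

```python
import math


def correlation_t_test(r, n):
    """Two-sided test of H0: rho = 0 for a Pearson correlation r from n pairs.

    t = r * sqrt(n - 2) / sqrt(1 - r**2) follows a t distribution with
    n - 2 degrees of freedom. Assumption: with n = 1000 (998 df) the t
    distribution is close to the standard normal, so the p-value is the
    normal approximation P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p


for label, r in [("medexp vs inc", 0.0780), ("medexp vs age", 0.6650)]:
    t, p = correlation_t_test(r, 1000)
    decision = "reject H0" if p < 0.05 else "do not reject H0"
    print(f"{label}: t = {t:.3f}, approx. p = {p:.4g} -> {decision} at 5%")
```

If SciPy is available, `2 * scipy.stats.t.sf(abs(t), n - 2)` gives the exact two-sided t-test p-value instead of the normal approximation.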
3. Regression Analysis
(a) Multiple Linear Regression Analysis
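The regression output table is not reproduced in this section. As a hedged illustration of what a multiple linear regression of medexp on age, inc and insur computes, the sketch below fits ordinary least squares with NumPy's `np.linalg.lstsq`. The data are synthetic and are not the assignment's dataset. The regressors are simulated to roughly match the reported summary statistics. The outcome is generated from the reported coefficients plus normal noise with an assumed standard deviation of 2 (hundreds of dollars).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic regressors loosely shaped like the reported summary statistics
age = rng.integers(21, 71, size=n)          # whole ages 21-70
inc = rng.normal(73.571, 18.60191, size=n)  # income, thousands of dollars
insur = rng.integers(0, 2, size=n)          # 0 = no insurance, 1 = insured

# Synthetic outcome built from the reported coefficients plus assumed noise
true_beta = np.array([-2.622888, 0.132764, 0.0090412, 1.38326])
X = np.column_stack([np.ones(n), age, inc, insur])  # first column = intercept
medexp = X @ true_beta + rng.normal(0, 2.0, size=n)

# Ordinary least squares: choose beta to minimise ||medexp - X @ beta||^2
beta_hat, *_ = np.linalg.lstsq(X, medexp, rcond=None)

# R-squared = 1 - (residual sum of squares) / (total sum of squares)
resid = medexp - X @ beta_hat
r2 = 1 - (resid @ resid) / np.sum((medexp - medexp.mean()) ** 2)

for name, b in zip(["_cons", "age", "inc", "insur"], beta_hat):
    print(f"{name:>6}: {b: .5f}")
print(f"R-squared: {r2:.4f}")
```

With the real data in a pandas DataFrame (hypothetically named `df`), statsmodels' formula interface, `smf.ols("medexp ~ age + inc + insur", data=df).fit()`, would also report standard errors, p-values and the overall F-test.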

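(b) Overall Model Fit

Number of observations - 1000

All 1,000 observations were used in the regression analysis.

Prob > F - 0.0000
Since Prob > F (0.0000) is below the 1% significance level (0.01), the model is statistically significant at the 1% level. Age, inc and insur jointly explain a significant share of the variation in medexp. A significant F-test shows that the predictors matter jointly; on its own, it does not show that the model fits well.

R-squared - 0.5015
The R-squared of 0.5015 means that approximately 50.15% of the variability in the dependent variable (medexp) is explained by age, inc and insur in the regression model. The remaining 49.85% is due to factors not included in the model and to random variation.

Adj R-squared - 0.5000

There is very little difference between R-squared (50.15%) and adjusted R-squared (50.00%). Adjusted R-squared applies a penalty for each additional predictor. With 1,000 observations and only three predictors, this penalty is small, so the small gap is expected.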
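Both fit statistics can be recovered from R-squared alone. The sketch below applies the standard formulas to the reported R-squared of 0.5015 with n = 1,000 observations and k = 3 predictors. The function names are hypothetical. Because R-squared is rounded to four decimals, the F statistic it returns is approximate; the F value itself is not listed above.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k slope coefficients (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def overall_f(r2, n, k):
    """F statistic for H0: all k slope coefficients are zero; df = (k, n - k - 1)."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


r2, n, k = 0.5015, 1000, 3  # reported R-squared; predictors: age, inc, insur
print(f"Adjusted R-squared = {adjusted_r2(r2, n, k):.4f}")
print(f"F({k}, {n - k - 1}) = {overall_f(r2, n, k):.1f}")
```

The adjusted value rounds to 0.5000, matching the output. An F statistic in the hundreds on (3, 996) degrees of freedom is consistent with the reported Prob > F of 0.0000.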

Parameter Estimates

Age in Years (Age)

𝐻0: There is no significant relationship between age and medexp

𝐻1: There is a significant relationship between age and medexp
The coefficient is 0.132764 (positive). Holding income and insurance status constant, each additional year of age is associated with an increase of about 0.133 hundred dollars (roughly $13.28) in annual medical expenditure. This may be due to age-related health issues.

The p-value is 0.000 (less than 0.01). This means that age in years (age) is a statistically significant predictor of medical expenditure (medexp) at the 1% significance level. Hence, the null hypothesis is rejected and the alternative hypothesis is accepted.

Annual Income (Inc)

𝐻0: There is no significant relationship between inc and medexp

𝐻2: There is a significant relationship between inc and medexp

The coefficient is 0.0090412 (positive). Holding age and insurance status constant, each additional $1,000 of annual income (inc) is associated with about 0.009 hundred dollars (roughly $0.90) of additional annual medical expenditure, which is a small effect. Individuals with higher income might be more willing to spend on healthcare services.

The p-value is 0.009 (less than 0.01). This means that annual income is a statistically significant predictor of medical expenditure at the 1% significance level. Hence, the null hypothesis is rejected and the alternative hypothesis is accepted.

Insurance Status (insur)

𝐻0: There is no significant relationship between insur and medexp

𝐻3: There is a significant relationship between insur and medexp

The coefficient is 1.38326 (positive). Having insurance is associated with significantly higher medical expenditure. Holding age and income constant, individuals with insurance have a predicted annual medical expenditure about 1.383 hundred dollars (about $138.33) higher than those without.

The p-value is 0.000 (less than 0.01). This means that insurance status is a statistically significant predictor of medical expenditure at the 1% significance level. Hence, the null hypothesis is rejected and the alternative hypothesis is accepted.
(c) The prediction of the annual medical expenditure by the regression model

Multiple Linear Regression Function

Y = B0 + B1*X1 + B2*X2 + B3*X3

Y (Dependent variable) = Annual Medical Expenditure (medexp), in hundreds of dollars

X1 [Age in Years (age)] - 55

X2 [Annual Income in Thousands of Dollars (inc)] - 61

X3 [Insurance Status (insur)] - No insurance policy (X3 = 0); the insured case (X3 = 1) is also shown for comparison

B0 (Constant) = -2.622888
B1 (age) = 0.132764
B2 (inc) = 0.0090412
B3 (insur) = 1.38326

If X3 = 0 (No insurance)

Y = -2.622888 + (0.132764 * 55) + (0.0090412 * 61) + (1.38326 * 0)
Y = 5.2306452

Since medexp is measured in hundreds of dollars, the predicted annual medical expenditure without insurance (X3 = 0) is approximately 5.2306 × $100 ≈ $523.06. Because X3 = 0, the insurance term contributes nothing to Y.

If X3 = 1 (Insurance)

Y = -2.622888 + (0.132764 * 55) + (0.0090412 * 61) + (1.38326 * 1)
Y = 6.6139052

This indicates that, when there is insurance (X3 = 1), the predicted annual medical expenditure increases to approximately 6.6139 × $100 ≈ $661.39.

The multiple regression function shows that the predicted annual medical expenditure for a 55-year-old patient with an annual income of $61,000 depends on age, annual income, and insurance status. Without insurance, the predicted expenditure is about $523.06. With insurance, it rises to about $661.39, a difference of about $138.33 (the insur coefficient of 1.38326 hundred dollars).
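The two predictions can be checked with a few lines of code. The sketch uses the reported coefficients; the function name `predict_medexp` is hypothetical. Because medexp is measured in hundreds of dollars, each prediction is multiplied by 100 to express it in dollars.

```python
# Reported coefficients; the outcome medexp is in hundreds of dollars
B0, B_AGE, B_INC, B_INSUR = -2.622888, 0.132764, 0.0090412, 1.38326


def predict_medexp(age, inc_thousands, insured):
    """Predicted annual medical expenditure, in hundreds of dollars."""
    return B0 + B_AGE * age + B_INC * inc_thousands + B_INSUR * int(insured)


for insured in (False, True):
    y = predict_medexp(55, 61, insured)
    # Multiply by 100 to convert hundreds of dollars into dollars
    print(f"insured={insured}: Y = {y:.7f} -> ${y * 100:,.2f}")
```

The gap between the two predictions equals B_INSUR, because age and income are held at the same values in both cases.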
4. (a) Summary of the findings from the descriptive statistics, correlation analysis, and regression analysis

1. Descriptive statistics
For Annual Medical Expenditure (medexp) in hundreds of dollars

The minimum value of medexp is 0.
The maximum value of medexp is 13.
The average (mean) value of medexp is 4.865.
The median value of medexp is 5.

Based on the histogram of annual medical expenditure (medexp), the distribution is positively (right) skewed, with a skewness of 0.361864. The kurtosis of 2.46231 (less than 3) indicates that the distribution is platykurtic, with lighter tails than a normal distribution. The standard deviation of 2.841276 indicates moderate dispersion in medical expenditure across the dataset. Most people's healthcare spending is concentrated around the median of 5 (about $500), but some individuals spend well above the average, so spending varies noticeably overall.
For Annual Income in Thousands of Dollars (inc)

The minimum value of inc is 22.
The maximum value of inc is 146.
The average (mean) value of inc is 73.571.
The median value of inc is 74.

Based on the histogram of annual income in thousands of dollars (inc), the distribution is peaked and very slightly positively (right) skewed, with a skewness of 0.0958106. A kurtosis of 3.498669 (greater than 3) indicates that the distribution is leptokurtic. It has heavier tails than a normal distribution, so extreme values (outliers) are more common. The standard deviation of 18.60191 (about $18,600) suggests a high level of dispersion of annual income around the average. Most people's annual income is clustered around the median of 74 (about $74,000), with some outliers in the tails.
For Age in years (age)

The minimum value of age is 21.
The maximum value of age is 70.
The average (mean) value of age is 46.42.
The median value of age is 47.

Based on the histogram of age in years (age), the distribution is nearly symmetric, with a slight negative (left) skew (skewness of -0.0698053). The kurtosis of 1.814745 (less than 3) indicates a flat, platykurtic distribution with light tails, meaning that there are no extreme values. The standard deviation of 13.76419 suggests a relatively high level of dispersion in ages within the dataset.
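The skewness and kurtosis readings in this summary follow a rule of thumb. Skewness above or below 0 signals a longer right or left tail, and kurtosis is compared with 3, the value for a normal distribution. The sketch below encodes that rule; the function names are hypothetical, and it treats any nonzero skewness as skew, which a real analysis would refine. As a reference point, it also computes the kurtosis of an assumed perfectly flat spread of ages, in which every whole age from 21 to 70 appears equally often. This benchmark is not the dataset. Its kurtosis is about 1.80, close to the 1.814745 reported for age.

```python
def describe_shape(skewness, kurtosis):
    """Rule-of-thumb reading of skewness and (non-excess) kurtosis; a normal distribution has kurtosis 3."""
    if skewness > 0:
        skew = "positive (right) skew"
    elif skewness < 0:
        skew = "negative (left) skew"
    else:
        skew = "symmetric"
    if kurtosis < 3:
        tails = "platykurtic: lighter tails than a normal distribution"
    elif kurtosis > 3:
        tails = "leptokurtic: heavier tails than a normal distribution"
    else:
        tails = "mesokurtic: normal-like tails"
    return f"{skew}; {tails}"


def population_kurtosis(values):
    """Kurtosis m4 / m2**2 using 1/n central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2


# Reported (skewness, kurtosis) pairs from the descriptive statistics
shape_stats = {
    "medexp": (0.361864, 2.46231),
    "inc": (0.0958106, 3.498669),
    "age": (-0.0698053, 1.814745),
}
for name, (skew, kurt) in shape_stats.items():
    print(f"{name:>6}: {describe_shape(skew, kurt)}")

# Hypothetical benchmark: every whole age from 21 to 70 equally often
flat_ages = list(range(21, 71))
print(f"flat ages 21-70: kurtosis = {population_kurtosis(flat_ages):.4f}")
```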
For Insurance Status (insur)

Based on the histogram, insur takes only two values: 0 if the individual has no insurance policy and 1 if the individual has an insurance policy. Hence, the histogram consists of two separate bars. The mean of 0.477 means that 47.7% of individuals are insured. The kurtosis of 1.008482 (less than 3) indicates a flat distribution with no outliers. For a 0/1 variable, a kurtosis close to 1 reflects a nearly even split between the two groups.
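For a 0/1 variable, every moment is determined by the share of 1s, p. The sketch below (function name hypothetical) applies the Bernoulli formulas with p = 0.477, the reported mean of insur, and reproduces the reported skewness and kurtosis. The population standard deviation is rescaled by √(n/(n - 1)) to match the sample standard deviation reported for n = 1,000. The sketch also shows why a kurtosis near 1 signals a nearly even split: the kurtosis 1/(p(1 - p)) - 3 reaches its minimum of 1 at p = 0.5.

```python
import math


def bernoulli_moments(p):
    """Mean, population SD, skewness and kurtosis of a 0/1 variable with P(1) = p."""
    var = p * (1 - p)
    return {
        "mean": p,
        "sd": math.sqrt(var),
        "skewness": (1 - 2 * p) / math.sqrt(var),
        "kurtosis": 1 / var - 3,  # non-excess kurtosis; minimum of 1 at p = 0.5
    }


n = 1000
m = bernoulli_moments(0.477)  # reported mean of insur
sample_sd = m["sd"] * math.sqrt(n / (n - 1))  # rescale to the n - 1 divisor
print(f"sample SD = {sample_sd:.7f}")
print(f"skewness  = {m['skewness']:.7f}")
print(f"kurtosis  = {m['kurtosis']:.6f}")
print(f"kurtosis at p = 0.5: {bernoulli_moments(0.5)['kurtosis']:.1f}")
```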

(b) Recommendations to the healthcare researcher on strategies for targeted interventions to improve healthcare access and affordability
Study goals
The goal of our analysis is to provide recommendations to the healthcare researcher on strategies for targeted interventions to improve healthcare accessibility and affordability.

For Annual Income,

Among the three explanatory variables (age, insurance status, and annual income), annual income (inc) has the highest standard deviation, indicating substantial variability in income levels among individuals in the dataset. Its positive (though weak) correlation with medical expenditure suggests that higher income is associated with higher medical expenditure, and lower income with lower medical expenditure. Given the substantial variability in annual income and its positive correlation with medical expenditure, we recommend targeted interventions that address healthcare affordability for individuals at different income levels. For example, by targeting interventions towards lower-income populations, the researcher can address the financial barriers these groups face in accessing healthcare services. This would reduce inequity in healthcare access and improve health outcomes for individuals with limited financial resources. Therefore, focusing these interventions on annual income would be a sound decision.

For insurance,
In the case scenario of a 55-year-old patient with an annual income of $61,000, the absence of insurance is associated with a predicted expenditure of around $523.06. Having insurance increases the predicted expenditure to approximately $661.39. The difference between patients with and without a policy is about $138.33 in annual medical expenditure (medexp). In the real world, one would expect insurance policies to reduce medical expenditure and make healthcare more affordable. However, in the regression analysis the coefficients of both annual income and insurance are positive. This means that individuals with higher income and those who purchase an insurance policy tend to spend more on healthcare services. Hence, we recommend that the researcher analyze why individuals who purchase insurance have higher expenditure than individuals who do not, in order to improve healthcare affordability.

For age,

Ages vary substantially within the dataset, as the histogram and the standard deviation of age show. Age also has a strong positive correlation with medical expenditure and a positive coefficient in the regression analysis, so medical expenditures tend to rise as individuals get older. Given the considerable variability in age and its association with medical expenditure, we recommend targeted interventions to enhance healthcare affordability and accessibility for individuals across different age groups. This approach aims to address the diverse healthcare needs associated with different ages and income levels, facilitating more effective and equitable healthcare interventions.
