This is a group assignment where you can work in groups of 3-4 other students. All group
members will receive the same marks for the assignment. You must submit an electronic
copy of your assignment in Canvas in pdf, doc or docx format. Hard copies will not be accepted.
Show your calculations (if any) and answer the questions in clear, full sentences. The
number of tables, graphs and calculations given in parentheses after each question is a guide.
For this home assignment you will be required to model life expectancy worldwide.
Please use the file: life_expectancy.Rdata (World Bank Database – 2016 values). Please read the
description at the end of this document to understand the variables. In this home assignment
we are going to model life expectancy at birth (dependent variable).
QUESTION 1
Please model the determinants of life expectancy using R. We recommend that you create
the log of life expectancy at birth for modelling purposes.
SOLUTION
(Scatter plots’ notes:
Blue line: smoothed regression line
Green line: linear regression line (method = ‘lm’, standard-error band not shown))
g. UHC service coverage index
1. Use of logarithmic/linear form. Why?
Logarithmic form. When plotted in RStudio, the smoothed regression line largely
overlapped the linear (lm) fit, indicating an approximately linear relationship between
the UHC service coverage index and logged life expectancy.
2. Economic or common sense behind the model - why do you pick this variable?
The UHC service coverage index is based on tracer interventions covering maternal,
newborn and child health, infectious (contagious) diseases, noncommunicable (chronic)
diseases, and service capacity and access. As described, it is given on a scale of 0 to 100.
If there are more interventions, as measured by UHC service coverage, people's access to
healthcare is expected to improve, leading to better health and longer life expectancy.
This expected positive relationship is consistent with what the scatter plot shows.
3. Functional form specification (potential nonlinear relationships, eg: log-linear or
quadratic relationships)
Log-linear relationship: the log of life expectancy is regressed on the UHC service
coverage index in levels.
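The log-linear specification above can be sketched as a simple OLS of ln(life expectancy) on the UHC index. This is a minimal Python illustration, not the R code used for the assignment (in R the same model would be an `lm()` call with `log()` applied to the dependent variable). The helper `ols_fit` and the data are hypothetical: the values are generated noise-free from LE = 50 * exp(0.005 * UHC), so the regression should recover a slope of 0.005, read as a semi-elasticity of about 0.5% per index point.

```python
import math


def ols_fit(x, y):
    """Closed-form OLS for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical, noise-free data: LE = 50 * exp(0.005 * UHC)
uhc_index = [30, 45, 60, 75, 90]
life_exp = [50 * math.exp(0.005 * u) for u in uhc_index]

# Log-linear model: regress ln(life expectancy) on the UHC index in levels
log_life_exp = [math.log(v) for v in life_exp]
b0, b1 = ols_fit(uhc_index, log_life_exp)

# Semi-elasticity: approx. % change in life expectancy per 1-point rise in UHC
print(f"slope = {b1:.6f}, i.e. about {100 * b1:.3f}% per index point")
```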
C. TABLE
Min: the most negative residual, i.e. the data point furthest below the regression line
Median: the median residual is 0.002146, so 50% of the residuals are greater and 50% are smaller than this value
Max: the most positive residual, i.e. the data point furthest above the regression line
p-value: since the p-value (2e-16) is far smaller than 0.05, the model as a whole is statistically significant.
R-squared: 0.6854, i.e. the model explains 68.54% of the variance of the dependent variable.
Residual degrees of freedom: 54
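The residual summary, R-squared and residual degrees of freedom described above can be recomputed by hand. Below is a minimal sketch in Python. It assumes a one-regressor model with an intercept, so the residual df is n - 2. `summarize_ols` is a hypothetical helper, and the toy data are illustrative only, not the assignment data.

```python
import statistics


def summarize_ols(x, y):
    """Fit y = b0 + b1*x by OLS and report residual min/median/max, R^2, df."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    ssr = sum(r * r for r in resid)              # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)     # total sum of squares
    return {
        "min": min(resid),                       # furthest below the line
        "median": statistics.median(resid),
        "max": max(resid),                       # furthest above the line
        "r_squared": 1 - ssr / sst,              # share of variance explained
        "df_resid": n - 2,                       # n minus intercept and slope
    }


summary = summarize_ols([1, 2, 3, 4], [1, 3, 2, 4])
print(summary)
```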
A. LOG-LOG RELATIONSHIPS
ii) PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
● Coefficient of -0.0079602 (elasticity, since both variables are in logs)
● Interpretation: %∆Life expectancy ≈ -0.0079602 × %∆PM2.5 air pollution,
mean annual exposure (micrograms per cubic meter)
or, if PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
increases by 1%, life expectancy is expected to decrease by about 0.0079602%, other
things held constant.
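Because both variables are in logs, the coefficient is an elasticity, and the 1% rule above is a first-order approximation. The minimal Python sketch below compares that rule with the exact change the model implies. The function name is hypothetical; only the coefficient -0.0079602 comes from the regression.

```python
def log_log_effect(beta, pct_change_x):
    """% change in y for a pct_change_x % change in x when ln(y) = a + beta*ln(x)."""
    approx = beta * pct_change_x                          # elasticity rule
    exact = 100 * ((1 + pct_change_x / 100) ** beta - 1)  # implied by the model
    return approx, exact


BETA_PM25 = -0.0079602  # coefficient on ln(PM2.5) reported above

for pct in (1, 10):
    approx, exact = log_log_effect(BETA_PM25, pct)
    print(f"+{pct}% PM2.5 -> approx {approx:.6f}%, exact {exact:.6f}%")
```

For small percentage changes the two figures agree closely, so reporting the elasticity directly is reasonable.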
B. LOG-LINEAR RELATIONSHIPS
i) People with basic hand washing facilities including soap and water (% of population)
● Coefficient of 0.0006497 (semi-elasticity, since only the dependent variable is in logs)
● Interpretation: %∆Life expectancy ≈ 100 × 0.0006497 × ∆(People with basic hand
washing facilities including soap and water (% of population))
or, if People with basic hand washing facilities including soap and water (% of
population) increases by 1 unit, life expectancy is expected to increase by about 0.06497%,
other things held constant.
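For a log-linear term, 100 × coefficient is the approximate percentage change in life expectancy for a one-unit change in the regressor. The exact change is 100 × (exp(coefficient × Δx) - 1). The minimal Python sketch below applies both to the 0.0006497 coefficient from the text; the helper name is hypothetical.

```python
import math


def log_linear_effect(beta, delta_x):
    """% change in y for a delta_x change in x when ln(y) = a + beta*x."""
    approx = 100 * beta * delta_x                   # semi-elasticity rule
    exact = 100 * (math.exp(beta * delta_x) - 1)    # implied by the model
    return approx, exact


BETA_HANDWASHING = 0.0006497  # coefficient on hand washing access (% of pop.)

for dx in (1, 10):
    approx, exact = log_linear_effect(BETA_HANDWASHING, dx)
    print(f"+{dx} unit(s) -> approx {approx:.5f}%, exact {exact:.5f}%")
```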
c) Interpret the statistical significance of these coefficients using the p-values OR the t-
stat.
(5 marks)
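The p-values interpreted below are two-sided tail probabilities of each coefficient's t-statistic under Student's t distribution, using the regression's residual degrees of freedom. The minimal Python sketch below shows that conversion with the standard library only. It integrates the density numerically with Simpson's rule, an assumption made to avoid extra dependencies (R's `summary()` computes these p-values internally). It is not precise enough to reproduce extremely small values such as 2e-16. The function names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|), via Simpson's rule on the density over [0, |t_stat|]."""
    a = abs(t_stat)
    if a == 0:
        return 1.0
    if steps % 2:            # Simpson's rule needs an even number of steps
        steps += 1
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 <= T <= |t_stat|)
    return max(0.0, 1 - 2 * central)


# A t-statistic of about 2 with 54 degrees of freedom sits near the 5% cut-off
print(round(two_sided_p_value(2.0049, 54), 4))
```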
● P-value of 0.655027
● Interpretation: Holding other things constant, if Logged Domestic general government
health expenditure (% of GDP) had no relationship with Logged Life expectancy at birth,
total (years), there would be a 65.5027% chance of obtaining results at least as extreme as
those observed in this sample purely through random sampling error. This p-value is
therefore not statistically significant (p-value >> 0.05).
iii) Logged Literacy rate, adult total (% of people ages 15 and above)
● P-value of 0.117959
● Interpretation: Holding other things constant, if Logged Literacy rate, adult total (% of
people ages 15 and above) had no effect on Logged Life expectancy at birth, total (years),
there would be an 11.7959% chance of obtaining results at least as extreme as those
observed in this sample purely through random sampling error. This p-value is not
statistically significant at the 5% level (p-value > 0.05).
iv) People with basic hand washing facilities including soap and water (% of population)
● P-value of 0.000875
● Interpretation: Holding other things constant, if People with basic hand washing
facilities including soap and water (% of population) had no influence on Logged Life
expectancy at birth, total (years), there would be only a 0.0875% chance of obtaining
results at least as extreme as those observed in this sample purely through random
sampling error. In other words, assuming that the model is correctly specified, the null
hypothesis of no effect can be rejected even at the 0.1% significance level. This is strong
evidence that People with basic hand washing facilities including soap and water (% of
population) affects Logged Life expectancy at birth, total (years). This p-value is highly
statistically significant (p-value << 0.05).
v) Logged PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
● P-value of 0.638594
● Interpretation: Holding other things constant, if Logged PM2.5 air pollution, mean
annual exposure (micrograms per cubic meter) had no relationship with Logged Life
expectancy at birth, total (years), there would be a 63.8594% chance of obtaining results at
least as extreme as those observed in this sample purely through random sampling error.
This p-value is not statistically significant (p-value >> 0.05).
vi) In conclusion, People with basic hand washing facilities including soap and water (% of
population) is the explanatory variable most likely to affect the dependent variable, Logged
Life expectancy at birth, total (years). It is also the only one that is statistically significant at
the 5% level. In order of decreasing significance (increasing p-value), it is followed by the
Logged Literacy rate, adult total (% of people ages 15 and above), Logged Domestic private
health expenditure per capita (current US$), and Logged PM2.5 air pollution, mean annual
exposure (micrograms per cubic meter). Logged Domestic general government health
expenditure (% of GDP) is the least significant variable and the least likely to affect Logged
Life expectancy at birth, total (years). This ranking holds other things constant: each p-value
comes from the same multiple regression, and the 5 explanatory variables' relationships with
the dependent variable are compared with one another.
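The ranking in vi) simply orders the variables by p-value and compares each one with the 5% level. The minimal Python sketch below uses the four p-values reported above. Logged Domestic private health expenditure per capita is left out because its p-value is not given in this section, and the shortened variable labels are our own.

```python
ALPHA = 0.05  # conventional significance level

# p-values reported above (shortened, hypothetical labels)
P_VALUES = {
    "hand washing facilities (% of population)": 0.000875,
    "log literacy rate, adult (%)": 0.117959,
    "log PM2.5 mean annual exposure": 0.638594,
    "log gov. health expenditure (% of GDP)": 0.655027,
}


def rank_by_p_value(p_values, alpha=ALPHA):
    """Return (label, p, significant?) tuples, most significant first."""
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    return [(label, p, p < alpha) for label, p in ranked]


for label, p, significant in rank_by_p_value(P_VALUES):
    print(f"{label:45s} p = {p:.6f}  significant at 5%: {significant}")
```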
d) Test for heteroscedasticity in R using the Breusch-Pagan test and copy below the
results. Interpret the results of the Breusch Pagan test.
(2 marks) 1 Table & Explanations
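In R, part d) is usually run with `bptest()` from the lmtest package. As far as we know, its default is the studentized (Koenker) version of the test: regress the squared OLS residuals on the regressors, then compare n·R² of that auxiliary regression with a chi-squared distribution. Below is a minimal one-regressor Python sketch of that statistic. The function names and toy data are hypothetical. With one regressor the chi-squared has 1 degree of freedom, so its upper tail is erfc(sqrt(LM/2)).

```python
import math


def ols_residuals(x, y):
    """Residuals from the OLS fit y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def r_squared(x, y):
    """R^2 of the OLS regression of y on x (with intercept)."""
    resid = ols_residuals(x, y)
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sum(r * r for r in resid) / sst


def breusch_pagan(x, y):
    """Studentized Breusch-Pagan test with one regressor: (LM, p-value)."""
    e2 = [r * r for r in ols_residuals(x, y)]       # squared residuals
    lm = max(0.0, len(x) * r_squared(x, e2))        # n * R^2, guard rounding
    p_value = math.erfc(math.sqrt(lm / 2))          # chi-squared(1) upper tail
    return lm, p_value


# Hypothetical data: the error spread grows with x (heteroscedastic) ...
x = list(range(1, 21))
y_het = [2 + 0.5 * xi + 0.1 * xi * (-1) ** xi for xi in x]
# ... versus errors with constant spread (homoscedastic)
y_hom = [2 + 0.5 * xi + (-1) ** xi for xi in x]

print(breusch_pagan(x, y_het))
print(breusch_pagan(x, y_hom))
```

A small p-value rejects the null of constant error variance.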
e) Present the results from a) using HAC robust errors! Did any of the standard errors
change significantly?
(3 marks) 1 Table & Explanations
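Part e) is usually done in R with `coeftest()` from lmtest and `NeweyWest()` from the sandwich package. The sketch below is a minimal Python version for one regressor and rests on several assumptions. It computes a Bartlett-kernel Newey-West standard error for the slope with a user-chosen lag, without prewhitening or small-sample adjustment. As we understand it, sandwich's `NeweyWest()` prewhitens and picks the lag automatically by default, so the numbers will not match exactly. With `lags=0` the estimator reduces to White's heteroskedasticity-robust (HC0) standard error. Function names and data are hypothetical.

```python
import math


def slope_and_residuals(x, y):
    """OLS slope of y on x (with intercept) and the residuals."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    return b1, [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def conventional_se(x, y):
    """Classical OLS standard error of the slope (homoscedastic errors)."""
    _, e = slope_and_residuals(x, y)
    x_bar = sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sigma2 = sum(r * r for r in e) / (len(x) - 2)
    return math.sqrt(sigma2 / sxx)


def newey_west_se(x, y, lags):
    """HAC (Newey-West, Bartlett weights) standard error of the slope."""
    _, e = slope_and_residuals(x, y)
    x_bar = sum(x) / len(x)
    xd = [xi - x_bar for xi in x]                 # demeaned regressor
    sxx = sum(v * v for v in xd)
    u = [a * b for a, b in zip(xd, e)]            # per-observation scores
    s = sum(v * v for v in u)                     # lag-0 (White) term
    for lag in range(1, lags + 1):
        w = 1 - lag / (lags + 1)                  # Bartlett weight
        s += 2 * w * sum(u[t] * u[t - lag] for t in range(lag, len(u)))
    return math.sqrt(s) / sxx


x = list(range(1, 21))
y = [2 + 0.5 * xi + 0.1 * xi * (-1) ** xi for xi in x]
print(conventional_se(x, y), newey_west_se(x, y, lags=0), newey_west_se(x, y, lags=2))
```

A large gap between the conventional and robust standard errors is the kind of change that question e) asks about.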
SECTION A TABLE:
In section a, this assumption is somewhat less likely to be met, as indicated by the equation
estimated with the OLS method. In the residuals-versus-fitted scatter plot, the residuals are
scattered randomly around the 0 line, which suggests that the assumption of a linear
relationship is appropriate. The residuals form a roughly horizontal band around the 0 line,
which suggests that the error terms have constant variance. No single residual stands out
from the basic random pattern of the residuals, which suggests that there are no obvious
outliers. Although this does not guarantee that the errors have a zero conditional mean, it
increases the likelihood that the model satisfies the zero conditional mean assumption.
Interpretation:
The significance of the results of this paper lies in their practical and policy relevance. The
paper relies mainly on the OLS method to express the relationship between the dependent
variable, logged life expectancy, and the independent variables. The sample was also
obtained from a credible source, the World Bank (worldbank.org). By estimating the
parameters that link the independent variables to life expectancy, this paper can help predict
life expectancy at birth in a country from given values of the 5, or even 7, explanatory
variables.
A. 7 VARIABLES
As seen in the regression output table for the 7 originally chosen independent variables
(People with basic hand washing facilities including soap and water (% of population), Logged
Literacy rate, adult total (% of people ages 15 and above), UHC service coverage index, GDP
per capita PPP, Logged Domestic private health expenditure per capita (current US$), Logged
PM2.5 air pollution, mean annual exposure (micrograms per cubic meter), and Logged Domestic
general government health expenditure (% of GDP)), several relationships can be interpreted
from the p-values, which lead to the following policy proposals.
1. With a p-value of 0.000875, People with basic hand washing facilities including soap and
water (% of population) has a statistically significant effect on logged life expectancy (years).
The graph of logged life expectancy regressed on People with basic hand washing
facilities including soap and water (% of population) shows a positive correlation. As a
result, the government should give a larger percentage of people access to hand washing
facilities, for example by subsidizing these products.
2. With a p-value of 0.117959, the logged Literacy rate, adult total (% of people ages 15 and
above) is not statistically significant at the 5% level, but it may still have some impact on
logged life expectancy (years). The graph of logged life expectancy regressed on Literacy
rate, adult total (% of people ages 15 and above) shows a positive correlation. As a result,
the government should provide free access to educational institutions so that citizens can
improve their literacy, which would raise the logged literacy rate and, therefore, potentially
the logged life expectancy (years).
3. With a p-value of 0.449, the logged Domestic general government health expenditure (%
of GDP) is not statistically significant, but it may still influence logged life expectancy, and the
relationship is positive. As a result, the government should spend more on hospitals and
clinics to improve the healthcare system, creating an opportunity to increase logged life
expectancy (years).