7varversion RR Group Assignment Mel Econ1066 2110

Basic Econometrics
Research Report Group Assignment
This is a group assignment where you can work in groups of 3-4 other students. All group
members will receive the same marks for the assignment. You must submit an electronic
copy of your assignment in Canvas in pdf, doc or docx format. Hard copies will not be accepted.
Show your calculations (if any) as well as answering the questions in clear full sentences. The
number of tables, graphs, calculations given in parentheses after each question are a guide.
What determines life expectancy?
For this home assignment you will be required to model life expectancy worldwide.
Please use the file: life_expectancy.Rdata (World Bank Database – 2016 values). Please read the
description at the end of this document to understand the variables. In this home assignment
we are going to model life expectancy at birth (dependent variable).
QUESTION 1
Please model the determinants of life expectancy using R: We recommend that you create
the log of life expectancy at birth for modelling purposes.
a) Include a minimum of 7 (seven) explanatory variables in the regression equation and
provide a scatter plot of your dependent and independent variables (7 scatter plots). When
modelling, explain each of your functional form specification choices with respect to:
● Use of logarithms - should you specify a linear or logarithmic form? Why?
● Economic or common sense behind the model - why do you pick this variable?
● Multicollinearity – are the independent variables multicollinear?
● Functional form specification (potential nonlinear relationships, eg: log-linear or
quadratic relationships)
● Sampling – what is the impact of your modelling choices on the sample size?
in writing. You will be graded on model accuracy in this section.
Use OLS standard errors.
(14 marks) 1 Table [regression output] & Explanations, 7 scatter plots
SOLUTION
(Scatter plots’ notes:
Blue line: Smoothed regression line
Green line: Forced linear regression line (method = ‘lm’, with the standard error band not shown))
A. ANSWERING QUESTIONS AND SCATTER PLOTS

a. GDP per capita PPP (with scatterplot)
1. Use of logarithms - should you specify a linear or logarithmic form? Why?
Logarithmic form, as shown in the scatterplot. Moreover, as computed in R
Studio, the smoothed regression line demonstrated a rather linear relationship
between the log of the given dependent variable and log of GDP per capita PPP
constant 20, while forced linear regression line (green) could capture most of the
shape of the smoothed regression line (blue).
2. Economic or common sense behind the model - why do you pick this variable?
GDP at purchaser's prices is the sum of the gross value added by all resident
producers in the country plus any product taxes and minus any subsidies not included in
the value of the products. It is calculated without deducting depreciation of fabricated
assets or depletion and degradation of natural resources. An increase in this
variable is therefore likely to lead to an increase in health expenditure, contributing to better
healthcare and a longer lifespan. A positive correlation could be predicted in this way
and is illustrated in the scatterplot.
3. Functional form specification (potential nonlinear relationships, eg: log-linear or
quadratic relationships)
Log-log relationship.
b. Literacy rate total (with scatterplot)
1. Use of logarithmic or linear form. Why?
The logarithmic form should be applied, as demonstrated in the scatterplot below.
When computed in R Studio, the smoothed regression line (blue) largely overlapped
the forced linear regression line (green), illustrating a linear
relationship between logged Literacy rate and logged Life expectancy.
The relationship between literacy and life expectancy emphasizes the role of
health and social conditions as critical transmission pathways between literacy
and lifespan. In a broad sense, illiterate individuals would have lower incomes,
leading to limited access to healthcare facilities and insurance. As a result, their
life expectancy would be reduced. This correlation could be foreseen and,
therefore, illustrated.
c. Pollution with scatterplot

Logarithmic form would be chosen. As shown in the scatter diagram, the
smoothed regression line demonstrated a rather linear relationship between logged
air pollution and logged Life expectancy, as the forced linear line was close to the
smoothed regression line. Moreover, this data set when plotted demonstrated
skewed distribution, needing logarithmic form to normalize it.
This variable represents the population's exposure to ambient PM2.5
pollution, defined as the average level of exposure of a country's
population to concentrations of suspended particles with aerodynamic diameters smaller
than 2.5 microns. Such exposure can lead to respiratory problems in the long
term, worsening health conditions and shortening life expectancy.
A negative correlation could therefore be foreseen and is graphed below.
d. Domestic health exp (with scatterplot)

Logarithmic form could be used. As can be seen in the scatterplot produced by R
Studio, the smoothed regression line demonstrated a relationship between the log
of the given dependent variable and logged Domestic private health expenditure,
while forced linear relationship was close to the smoothed regression line.
Non-public sources include individuals', businesses', and non-profit organisations'
funds. Such funds can be paid to health insurance providers or healthcare in
general, which influences life expectancy tremendously (good healthcare means
life longevity). Such a correlation could be predicted and, therefore, illustrated
through graphing.
Log-Log relationship.
e. Domestic Gov health exp (with scatterplot)

Logarithmic form. As can be seen in the scatterplot produced by R Studio, once
deviations are taken into account, the relationship between logged Domestic
Government Health expenditure and logged Life expectancy at birth can be
described as roughly linear, so the logarithmic form is used to describe the two
variables’ relationship.
Domestic government health expenditure is, essentially, public expenditure on healthcare
expressed as a share of the economy (as measured by GDP). Generally, the
more the government invests in healthcare, the better the healthcare system
becomes, meaning improved public health. Accordingly, life expectancy is expected to
rise, leading to a positive correlation that is graphed below.
f. People hand washing (with scatterplot)

Log-linear form. The scatterplot shows a roughly linear relationship between
people hand washing (in levels) and logged life expectancy. Therefore,
only the dependent variable is logged.
This variable represents the percentage of people living in households that have a
handwashing facility with soap and water on the premises. Handwashing facilities
may be fixed or mobile and include a sink with tap water, buckets with
taps, or basins designated for handwashing. Essentially, access to hand washing facilities
means better hygiene and lower exposure to bacteria or diseases, leading to
improved health conditions and lifespan. Such a correlation could be predicted
and, therefore, illustrated through graphing.
Log-linear relationship.
g. UHC service
Log-linear form. When computed in R Studio, the smoothed regression line
largely overlapped the forced linear regression line, illustrating a linear
relationship between UHC service coverage index and logged Life expectancy.
The UHC service coverage index covers essential health services and is based on tracer
interventions that include reproductive, maternal, newborn and child health, infectious
diseases, noncommunicable diseases, and service capacity and access. It is presented on a
scale of 0 to 100. If more of these interventions are available, the healthcare accessed
by people is expected to improve, meaning better health and prolonged life
expectancy. This correlation could be predicted in such a way and, therefore,
illustrated through graphing.
Log-linear relationship
B. IMPACT OF MODELLING CHOICES ON SAMPLE SIZE.

The sample of this model can be considered large (172 countries out of the total number of
countries worldwide). However, even in such a large sample, design flaws or biases, if any, can
carry through to the estimates and undermine the validity of the results. In this case, it could be
beneficial to choose logarithmic models, since these can produce less biased results and
convert unit changes into percentage changes, which are easier to interpret.
C. TABLE
Equation: Log (Life expectancy at birth, total (years)) = 1.6746574
- 0.0126116*log(Domestic private health expenditure per capita (current US$))
+ 0.0072770*log(Domestic general government health expenditure (% of GDP))
- 0.0062961*log(GDP per capita, PPP (constant 2017 international $))
+ 0.0605795*log(Literacy rate, adult total (% of people ages 15 and above))
+ 0.0006497*People with basic hand washing facilities including soap and water (% of population)
+ 0.0011018*UHC service coverage index
- 0.0079602*log(PM2.5 air pollution, mean annual exposure (micrograms per cubic meter))
Min: The value of the data point furthest below the regression line
Q1: 25% of the residuals are less than - 0.012860
Median: 50% are greater and 50% of the residuals are smaller than 0.002146
Q3: 25% of the residuals are larger than 0.013644
Max: The value of the data point furthest above the regression line
OLS standard error
t-value = Estimate / Standard Error (how many standard errors the estimate lies
from zero).
p-value: Since 2e-16 is much smaller than 0.05, this model as a whole is statistically significant.
R-squared: 0.6854. R-squared shows that 68.54% of the variance of the dependent variable can be
explained by the model.
54 degrees of freedom. If these are the residual degrees of freedom, then with 7 explanatory
variables and an intercept the regression used 54 + 8 = 62 complete observations.
D. MULTICOLLINEARITY (Variance Inflation Factor analysis)

To assess multicollinearity among the 7 independent variables, their variance inflation factors
(VIFs) should be examined. As indicated in the table, the VIFs for the 7 explanatory variables
vary from just above 1 to above 5, and some of them indicate high correlation
between certain variables. A rule of thumb for interpreting VIFs is that a VIF higher than five means the
variable is highly correlated with the other regressors; meanwhile, a VIF from 1 to 5 indicates only
moderate correlation, which is acceptable. As a result, high levels of multicollinearity could be seen for the
logged GDP per capita, PPP (constant 2017 international $) (VIF = 5.888244) and UHC service
coverage index (VIF = 5.846582). Other than that, multicollinearity for logged Domestic
private health expenditure per capita (current US$), logged Domestic general government health
expenditure (% of GDP), logged Literacy rate adult total (% of people ages 15 and above),
People with basic hand washing facilities including soap and water (% of population),
and logged PM2.5 air pollution, mean annual exposure (micrograms per cubic meter) was
acceptable.
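As a rough illustration of what these VIFs mean, the definition VIF = 1 / (1 - R^2) can be inverted to recover the R-squared of regressing each variable on the remaining regressors. The sketch below uses Python purely for the arithmetic (the assignment itself is done in R), and the function name is a hypothetical helper, not part of the original analysis.

```python
def implied_aux_r2(vif: float) -> float:
    """R^2 of regressing one regressor on the others, from VIF = 1 / (1 - R^2)."""
    return 1.0 - 1.0 / vif


# VIF values quoted in the paragraph above.
vifs = {
    "log GDP per capita, PPP": 5.888244,
    "UHC service coverage index": 5.846582,
}

for name, vif in vifs.items():
    flag = "high" if vif > 5 else "acceptable"  # rule of thumb used in the text
    print(f"{name}: VIF = {vif:.3f}, implied R^2 = {implied_aux_r2(vif):.3f} ({flag})")
```

Both variables come out with an implied R-squared of roughly 0.83, meaning about 83% of their variation is explained by the other six regressors.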
b) Interpret the coefficients on 5 explanatory variables. Describe if the coefficients are
elasticities or semi-elasticities, or simple level variables.
(5 marks)
Based on the VIFs of the 7 previously chosen explanatory variables, the 5
explanatory variables interpreted from this stage of the paper onwards are
logged Domestic private health expenditure per capita (current US$), logged Domestic general
government health expenditure (% of GDP), logged Literacy rate adult total (% of people ages
15 and above), People with basic hand washing facilities including soap and water (% of
population), and logged PM2.5 air pollution, mean annual exposure (micrograms per cubic
meter).
A. LOG-LOG RELATIONSHIPS
i) Literacy rate total
● Coefficient of 0.0605795 (elasticity for log-log relationship)
● Interpretation: %∆Life Expectancy = 0.0605795 * %∆Literacy rate total
or if Literacy rate total is increased by 1%, Life expectancy is expected to increase by
0.0605795%, other things held constant.
ii) PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
● Coefficient of -0.0079602 (elasticity for log-log relationship)
● Interpretation: %∆Life Expectancy = -0.0079602 * %∆PM2.5 air pollution,
mean annual exposure (micrograms per cubic meter)
or if the PM2.5 air pollution, mean annual exposure (micrograms per cubic meter) is
increased by 1%, Life expectancy is expected to decrease by 0.0079602%, other things
held constant.
iii) Domestic private health expenditure per capita (current US$)
● Coefficient of -0.0126116 (elasticity for log-log relationship)
● Interpretation: %∆Life Expectancy = -0.0126116 * %∆Domestic private health
expenditure per capita (current US$)
or if the Domestic private health expenditure per capita (current US$) increases by 1%,
Life expectancy is expected to decrease by 0.0126116%, other things held constant.
iv) Domestic general government health expenditure (% of GDP)
● Coefficient of 0.0072770 (elasticity for log-log relationship)
● Interpretation: %∆Life Expectancy = 0.0072770 * %∆Domestic general
government health expenditure (% of GDP)
or if the Domestic general government health expenditure (% of GDP) increases by 1%,
Life expectancy is expected to increase by 0.0072770%, other things held constant.
B. LOG-LINEAR RELATIONSHIPS
i) People with basic hand washing facilities including soap and water (% of population)
● Coefficient of 0.0006497 (semi-elasticity for log-linear relationship)
● Interpretation: %∆Life Expectancy = 100 * 0.0006497 * ∆(People with basic hand
washing facilities including soap and water (% of population))
or if People with basic hand washing facilities including soap and water (% of
population) is increased by 1 percentage point, Life Expectancy is expected to increase by
0.06497%, other things held constant.
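The elasticity and semi-elasticity readings above are first-order approximations. The following sketch, written in Python only for the arithmetic, compares them with the exact implied changes. The function names are hypothetical helpers, and the log-linear formula assumes the dependent variable was transformed with the natural logarithm.

```python
import math


def loglog_pct_change(beta: float, pct_change_x: float) -> float:
    """Exact % change in y for a given % change in x when log(y) = ... + beta*log(x)."""
    return ((1.0 + pct_change_x / 100.0) ** beta - 1.0) * 100.0


def loglinear_pct_change(beta: float, delta_x: float) -> float:
    """Exact % change in y for a delta_x-unit change in x when ln(y) = ... + beta*x.

    Assumes a natural-log dependent variable.
    """
    return (math.exp(beta * delta_x) - 1.0) * 100.0


# Coefficients taken from the regression equation in part a).
literacy = loglog_pct_change(0.0605795, 1.0)        # 1% rise in literacy
pm25 = loglog_pct_change(-0.0079602, 1.0)           # 1% rise in PM2.5 exposure
handwashing = loglinear_pct_change(0.0006497, 1.0)  # 1 percentage point rise

print(f"literacy: {literacy:.5f}%  PM2.5: {pm25:.5f}%  hand washing: {handwashing:.5f}%")
```

For changes this small the exact figures are very close to the coefficient-based approximations, so the simple reading (coefficient for log-log, 100 times the coefficient for log-linear) is adequate here.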
c) Interpret the statistical significance of these coefficients using the p-values OR the t-
stat.
(5 marks)
i) Logged Domestic private health expenditure per capita (current US$)
● P-value of 0.334154 (or 33.4154%)
● Interpretation: Other things held constant, if the Logged Domestic private
health expenditure per capita (current US$) had no relationship with the
Logged Life expectancy at birth total (years), there would be a 33.4154% chance of
obtaining an estimate at least as extreme as the one seen in the sample due to random sampling error.
The coefficient is therefore not statistically significant (p-value > 0.05).
ii) Logged Domestic general government health expenditure (% of GDP)
● P-value of 0.655027
● Interpretation: Other things held constant, with a p-value of 0.655027, if the
Logged Domestic general government health expenditure (% of GDP) had no
relationship with Logged Life expectancy at birth total (years), there would be a 65.5027% chance
of obtaining an estimate at least as extreme as the one observed in the sample due to random
sampling error. The coefficient is therefore not statistically significant (p-value >> 0.05).
iii) Logged Literacy rate, adult total (% of people ages 15 and above)
● P-value of 0.117959
● Interpretation: Other things held constant, if the Logged Literacy rate, adult
total (% of people ages 15 and above) had no impact on the Logged Life expectancy at
birth total (years), there would be an 11.7959% chance of obtaining an estimate at least as
extreme as the one observed in this sample due to random sampling error. The coefficient is
therefore not statistically significant at the 5% level (p-value > 0.05).
iv) People with basic hand washing facilities including soap and water (% of population)
● P-value 0.000875
● Interpretation: Other things held constant, if People with basic hand
washing facilities including soap and water (% of population) had no
influence on Logged Life expectancy at birth total (years), there would be only a 0.0875% chance
of obtaining an estimate at least as extreme as the one observed in this sample due to random
sampling error. In other words, assuming that the model is specified correctly, such a result
would be very unlikely if the true coefficient were zero, so the null hypothesis of no
effect is rejected at the 5% level. The coefficient is highly statistically significant
(p-value << 0.05).
v) Logged PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
● P-value of 0.638594
● Interpretation: Other things held constant, if Logged PM2.5 air pollution,
mean annual exposure (micrograms per cubic meter) had no relationship with
Logged Life expectancy at birth total (years), there would be a 63.8594% chance of
obtaining an estimate at least as extreme as the one observed in this sample due to random sampling error.
The coefficient is therefore not statistically significant (p-value >> 0.05).
vi) In conclusion, the variable People with basic hand washing
facilities including soap and water (% of population) is the most likely to have an impact on the dependent
variable - Logged Life expectancy at birth total (years) - and it is the only one of the five that is
statistically significant at the 5% level. It is followed, in order of increasing p-value, by the Logged
Literacy rate, adult total (% of people ages 15 and above), Logged Domestic private health
expenditure per capita (current US$), Logged PM2.5 air pollution, mean annual exposure
(micrograms per cubic meter), and the Logged Domestic general government health expenditure
(% of GDP), which is the least likely to have an impact on Logged Life expectancy at birth total (years)
and is also the least significant variable. This conclusion holds when each relationship between
the dependent variable and each independent variable is examined with other things held constant
and the 5 explanatory variables’ relationships to the dependent variable are compared separately.
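The ranking and the 5% decisions in part c) can be reproduced mechanically from the reported p-values. This is a minimal Python sketch for the comparison only; the short variable labels are shorthand introduced here, not names from the data file.

```python
ALPHA = 0.05  # conventional 5% significance level

# p-values reported in part c).
p_values = {
    "log private health expenditure": 0.334154,
    "log government health expenditure": 0.655027,
    "log literacy rate": 0.117959,
    "hand washing facilities": 0.000875,
    "log PM2.5 exposure": 0.638594,
}

# Sort from most to least significant (smallest p-value first).
ranked = sorted(p_values.items(), key=lambda item: item[1])
significant = [name for name, p in ranked if p < ALPHA]

for name, p in ranked:
    verdict = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"{name}: p = {p:.6f} -> {verdict}")
```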
d) Test for heteroscedasticity in R using the Breusch-Pagan test and copy below the
results. Interpret the results of the Breusch Pagan test.
(2 marks) 1 Table & Explanations
A. 7 INDEPENDENT VARIABLES WITH BP TEST
Explanation: df = 7 because logged Life expectancy is regressed on 7 independent
variables. More importantly, the p-value in this case is 0.3075, which is
higher than 0.05, indicating that the null hypothesis (that the error variance is constant) is
not rejected, and there is no evidence of heteroscedasticity.
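For intuition about where the test statistic comes from, the studentized form of the Breusch-Pagan statistic is the number of observations times the R-squared of an auxiliary regression of the squared residuals on the regressors, compared against a chi-square distribution with 7 degrees of freedom. The Python sketch below is only illustrative: the sample size and auxiliary R-squared are hypothetical values, not figures from the assignment's output, and only the p-value of 0.3075 comes from the text.

```python
CHI2_CRIT_5PCT_DF7 = 14.067  # 95th percentile of the chi-square distribution, 7 df


def bp_lm_statistic(n_obs: int, aux_r2: float) -> float:
    """Studentized Breusch-Pagan statistic: n times the auxiliary-regression R^2."""
    return n_obs * aux_r2


def reject_by_statistic(lm_stat: float, critical_value: float = CHI2_CRIT_5PCT_DF7) -> bool:
    """Reject constant error variance when the statistic exceeds the critical value."""
    return lm_stat > critical_value


def reject_by_p_value(p_value: float, alpha: float = 0.05) -> bool:
    """Equivalent decision rule expressed through the p-value."""
    return p_value < alpha


# Hypothetical inputs, chosen only to show the mechanics.
lm = bp_lm_statistic(n_obs=62, aux_r2=0.13)
print(f"LM = {lm:.2f}, reject homoscedasticity: {reject_by_statistic(lm)}")

# Decision for the p-value reported in the text.
print(f"p = 0.3075, reject homoscedasticity: {reject_by_p_value(0.3075)}")
```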
e) Present the results from a) using HAC robust errors! Did any of the standard errors
change significantly?
(3 marks) 1 Table & Explanations
HAC ROBUST ERROR:
SECTION A TABLE:
Did any of the standard errors change significantly?

None of the standard errors below change significantly.
f) Describe each of the five “Gauss Markov” assumptions, (define them) and explain in
the context of the regression output in a) whether these assumptions are likely to be
met in these models.
(5 marks)
A. GAUSS MARKOV ASSUMPTIONS:
1. Linearity: The population model has to be linear in parameters. In other words, the
dependent variable must be a linear function of the parameters estimated (by the OLS method),
even if the variables themselves are transformed.
In section a, this assumption is likely to be met because several variables were
converted into logs to linearize their relationship with the dependent variable. After applying
OLS, the following linear-in-parameters equation was obtained:
Log (Life expectancy at birth, total (years)) = 1.6746574 - 0.0126116*log(Domestic
private health expenditure per capita (current US$)) + 0.0072770*log(Domestic general
government health expenditure (% of GDP)) - 0.0062961*log(GDP per capita, PPP
(constant 2017 international $)) + 0.0605795*log(Literacy rate, adult total (% of people
ages 15 and above)) + 0.0006497*People with basic hand washing facilities including
soap and water (% of population) + 0.0011018*UHC service coverage index -
0.0079602*log(PM2.5 air pollution, mean annual exposure (micrograms per cubic
meter))
2. A random sample: Within the population, a random sample would occur if each single
data point within the population is equally likely to be sampled or be picked. This
assumption also implies that all of the data points must come from the same population.
In section a, this assumption is met since all the data are from the same investigated
population. Moreover, each data point, regardless of the lack of some data points, is
equally likely to be picked and examined equally by the OLS method.
3. Zero conditional mean of error: The error term u has an expected value of zero given
any values of the independent variables.
In section a, this assumption is less certain to be met, since it cannot be verified directly from
the estimated equation. From the residuals versus fitted scatter plot below, it can be seen
that the residuals are distributed at random around the 0 line. This indicates that the
assumption of a linear relationship is appropriate. Around the 0 line, the residuals form a
horizontal band, which suggests that the variances of the error terms are the same.
No single residual stands out from the random pattern of the
residuals, which suggests that no outliers exist. Although this does not guarantee a zero
conditional mean (an omitted variable could still be correlated with the regressors), it
increases the likelihood that the model satisfies the zero conditional mean assumption.
4. No perfect collinearity: There must be no perfect collinearity among regressors, which
means there are no independent variables that can be exactly expressed as a linear combination
of other independent variables.
In section a, there is no perfect collinearity despite the presence of highly
correlated independent variables, as shown by the finite variance inflation factors (VIFs).
5. Homoscedasticity: the variance of the error term is constant, regardless of the values of the
regressors.
In section a, the Breusch-Pagan test did not reject the null hypothesis of constant variance
(p-value = 0.3075); therefore, this assumption is likely to be met.
g) Please include the OECD dummy in the regression output, and interpret it, along with its
level of significance.
(2 marks) 1 Table [regression output]
A. 7 INDEPENDENT VARIABLES
Interpretation:
● When OECD = 1 and the other independent variables are held at 0, log(Life_expectancy) =
1.6746318 + 0.0034241 = 1.6780559
● When OECD = 0 and the other independent variables are held at 0, log(Life_expectancy) =
1.6746318
(Holding the other independent variables constant, countries in the group OECD = 1 have
a logged life expectancy that is 0.0034241 higher than those in the group OECD = 0.)
● The p-value of OECD (dummy variable) is 0.89692, which is higher than 0.05; therefore,
the dummy does not have a statistically significant impact on the dependent variable.
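The dummy coefficient can also be read as an approximate percentage gap in life expectancy between OECD and non-OECD countries. The Python sketch below does that arithmetic with the figures quoted above; the conversion through the exponential assumes the dependent variable was transformed with the natural logarithm, which the text does not state explicitly.

```python
import math

INTERCEPT = 1.6746318    # intercept of the regression that includes the OECD dummy
OECD_COEF = 0.0034241    # coefficient on the OECD dummy
OECD_P_VALUE = 0.89692   # reported p-value of the dummy

# Intercept shift for OECD members (all other regressors held fixed).
intercept_oecd = INTERCEPT + OECD_COEF

# Percentage gap implied by the dummy, assuming a natural-log dependent variable.
pct_gap = (math.exp(OECD_COEF) - 1.0) * 100.0

is_significant = OECD_P_VALUE < 0.05

print(f"OECD intercept: {intercept_oecd:.7f}")
print(f"Implied gap: {pct_gap:.3f}% (significant at 5%: {is_significant})")
```

Under that assumption the implied gap is about a third of one percent, and with a p-value this large it cannot be distinguished from zero.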
h) Explain the practical/ policy relevance of your results in 1 paragraph. Describe a
minimum of 3 policy proposals [ each 2-3 sentences] aiming at increasing life
expectancy. (4 marks)
The significance of the results of this paper lies in their practical and policy
relevance. The paper used a number of methods (mostly the OLS method) to express the
relationship between the dependent variable - logged life expectancy - and the independent
variables. The sample was also derived from a credible source - Worldbank.org. By
estimating the parameters and relationships of the independent variables with the variable of
interest - Life expectancy - this paper can help predict the life expectancy at birth of people in a
country based on certain values of the 5, or even 7, explanatory variables.
A. 7 VARIABLES
As seen in the regression output table for the 7 originally chosen independent variables
(People with basic hand washing facilities including soap and water (% of population), Logged
Literacy rate, adult total (% of people ages 15 and above), UHC service coverage index, Logged GDP
per capita PPP, Logged Domestic private health expenditure per capita (current US$), Logged PM2.5 air
pollution mean annual exposure (micrograms per cubic meter), and the Logged Domestic general
government health expenditure (% of GDP)), some relationships could be interpreted
based on the p-values, which resulted in the following policy proposals.
1. With a p-value of 0.000875, People with basic hand washing facilities including soap and
water (% of population) shows a statistically significant impact on Life expectancy (years).
The graph of logged Life expectancy regressed on People with basic hand washing
facilities including soap and water (% of population) shows a positive correlation. As a
result, the government should give a larger percentage of people access to hand washing
facilities by subsidising this type of product.
2. With a p-value of 0.117959, the logged Literacy rate, adult total (% of people ages 15 and
above) is not statistically significant at the 5% level, although it has the second-lowest
p-value among the variables interpreted. The graph
of logged Life expectancy regressed on Literacy rate, adult total (% of people ages
15 and above) shows a positive correlation. As a result, the government should provide free
access to educational institutions so that citizens can improve their literacy, leading to an increase
in the logged Literacy rate and, therefore, in the logged Life expectancy (years).
3. With a p-value of 0.449, the logged Domestic general government health expenditure (%
of GDP) is not statistically significant, but it may still influence the logged Life expectancy,
and the two demonstrated a positive
relationship. As a result, the government should spend more on equipping hospitals and
clinics to improve the healthcare system, creating an opportunity for an increase in the
logged Life expectancy (years).
Assignment total: 40 marks
VARIABLE DESCRIPTIONS: Source: www.worldbank.org
Access to electricity (% of population): Access to electricity is the percentage of the population with access to electricity. Electrification data are collected from industry, national surveys and international sources.

Life expectancy at birth, total (years): Life expectancy at birth indicates the number of years a newborn infant would live if prevailing patterns of mortality at the time of its birth were to stay the same throughout its life.

Domestic private health expenditure per capita (current US$): Current private expenditures on health per capita expressed in current US dollars. Domestic private sources include funds from households, corporations and non-profit organizations. Such expenditures can be either prepaid to voluntary health insurance or paid directly to healthcare providers.

Domestic general government health expenditure (% of GDP): Public expenditure on health from domestic sources as a share of the economy as measured by GDP.

GDP per capita, PPP (constant 2017 international $): GDP per capita based on purchasing power parity (PPP). PPP GDP is gross domestic product converted to international dollars using purchasing power parity rates. An international dollar has the same purchasing power over GDP as the U.S. dollar has in the United States. GDP at purchaser's prices is the sum of gross value added by all resident producers in the country plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources. Data are in constant 2017 international dollars.

Government expenditure on education, total (% of GDP): General government expenditure on education (current, capital, and transfers) is expressed as a percentage of GDP. It includes expenditure funded by transfers from international sources to government. General government usually refers to local, regional and central governments.

Military expenditure (% of GDP): Military expenditures data from SIPRI are derived from the NATO definition, which includes all current and capital expenditures on the armed forces, including peacekeeping forces; defense ministries and other government agencies engaged in defense projects; paramilitary forces, if these are judged to be trained and equipped for military operations; and military space activities. Such expenditures include military and civil personnel, including retirement pensions of military personnel and social services for personnel; operation and maintenance; procurement; military research and development; and military aid (in the military expenditures of the donor country). Excluded are civil defense and current expenditures for previous military activities, such as for veterans' benefits, demobilization, conversion, and destruction of weapons. This definition cannot be applied for all countries, however, since that would require much more detailed information than is available about what is included in military budgets and off-budget military expenditure items. (For example, military budgets might or might not cover civil defense, reserves and auxiliary forces, police and paramilitary forces, dual-purpose forces such as military and civilian police, military grants in kind, pensions for military personnel, and social security contributions paid by one part of government to another.)

Compulsory education, duration (years): Duration of compulsory education is the number of years that children are legally obliged to attend school.

Literacy rate, adult total (% of people ages 15 and above): Adult literacy rate is the percentage of people ages 15 and above who can both read and write with understanding a short simple statement about their everyday life.

Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years of age): Total alcohol per capita consumption is defined as the total (sum of recorded and unrecorded alcohol) amount of alcohol consumed per person (15 years of age or older) over a calendar year, in litres of pure alcohol, adjusted for tourist consumption.

Suicide mortality rate (per 100,000 population): Suicide mortality rate is the number of suicide deaths in a year per 100,000 population. Crude suicide rate (not age-adjusted).

Smoking prevalence, total (ages 15+): Prevalence of smoking is the percentage of men and women ages 15 and over who currently smoke any tobacco product on a daily or non-daily basis. It excludes smokeless tobacco use. The rates are age-standardized.

People with basic hand washing facilities including soap and water (% of population): The percentage of people living in households that have a handwashing facility with soap and water available on the premises. Handwashing facilities may be fixed or mobile and include a sink with tap water, buckets with taps, tippy-taps, and jugs or basins designated for handwashing. Soap includes bar soap, liquid soap, powder detergent, and soapy water but does not include ash, soil, sand or other handwashing agents.

UHC service coverage index: Coverage index for essential health services (based on tracer interventions that include reproductive, maternal, newborn and child health, infectious diseases, noncommunicable diseases and service capacity and access). It is presented on a scale of 0 to 100.

PM2.5 air pollution, mean annual exposure (micrograms per cubic meter): Population-weighted exposure to ambient PM2.5 pollution is defined as the average level of exposure of a nation's population to concentrations of suspended particles measuring less than 2.5 microns in aerodynamic diameter, which are capable of penetrating deep into the respiratory tract and causing severe health damage. Exposure is calculated by weighting mean annual concentrations of PM2.5 by population in both urban and rural areas.
