See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/346345860

A research study on the variables affecting Life Expectancy Descriptive and


inferential statistics with Excel and R. Data, Models and Decisions -Professor
Pompeo Dalla Posta

Research · November 2020

2 authors, including:

Suresh Kumar Karna


Beijing Normal University
All content following this page was uploaded by Suresh Kumar Karna on 28 November 2020.

19th July
20

A research study on the variables affecting Life Expectancy


Descriptive and inferential statistics with Excel and R.

Data, Models and Decisions - Professor Pompeo Dalla Posta

Elisa D’Odorico
201929070006
dodorico.elisa@gmail.com

Suresh K. Karna
201929070044
sureshk.karna@gmail.com

Table of Contents

Introduction
Background
    Figure 1. Life Expectancy (1960-2018)
Research question
Literature review
Significance and limitations of the study: adjustments and data sources
Description of the variables
    Table 1. Structure Dataset “Life Expectancy” on R
    1. Life expectancy at birth
    2. Total Population
    3. Current health expenditure
    4. GDP
    5. Mortality caused by road traffic injury
    6. Mortality rate attributed to unsafe water, unsafe sanitation and lack of hygiene
    7. Prevalence of HIV
    8. Mortality rate under 5-year-old
    9. Mortality from CVD, cancer, diabetes or CRD
    10. Total alcohol consumption per capita
    11. Mortality rate attributed to household and ambient air pollution
Descriptive Statistics Analysis
    Life Expectancy
        Figure 2. Histogram Life expectancy
        Table 2. 5 Number Summary for Life Expectancy on R
    Life Expectancy and Health Expenditure
        Figure 3. Life expectancy against Current expenditure in health in proportion to GDP
        Table 3. Correlation between Life expectancy and the other variables.
    Life Expectancy and Mortality due to Road Injury, Lack of Hygiene and Pollution
        Figure 4. Life expectancy and mortality (other than natural death), region-wise
        Figure 5. Life expectancy and mortality (other than natural death), income-wise
    Life Expectancy and Infant Mortality
        Figure 6. Life expectancy and infant mortality rate
    Life Expectancy and Alcohol Consumption
        Figure 7. Life expectancy and Alcohol consumption
    Life Expectancy and Mortality due to disease
        Figure 8. Life expectancy and mortality due to diseases
Inferential statistics
    Multiple Linear Regression
        Table 4. Output Multiple Linear Regression on R
        Table 5. Output Multiple Linear Regression on Excel
        Estimate values for the variables
        Goodness of fit
        p-value
        F-statistics
        Residual Standard Error
Discussion of the results obtained and Conclusion
References

Introduction

With this research paper, we aim to apply the tools and concepts learned during the course “Data,
Models and Decisions” on a real-life collection of observations on factors affecting life expectancy.
We will consider demographic variables, income composition, and mortality rates in a cross-
sectional dataset for the year 2016 for 120 countries. The paper will assess whether factors such
as GDP, health expenditure, the mortality due to traffic accidents or alcohol consumption may
affect life expectancy. A multiple regression model with ten independent variables and life
expectancy as the dependent variable will be used. The data are collected from the World Bank
and the World Health Organization (WHO). We selected 2016 because the WHO conducted several
statistical studies on mortality in that year. Several adjustments and simplifications have been
applied to the data so that we could study a complex phenomenon with basic statistics, Excel,
and R.

Background

In the past decades, as Figure 1 shows, the world's life expectancy has improved continuously
thanks to the development of technology, the growth of the economy, and the improvement of social
and living conditions. The World Bank Data develops and collects economic, social, and health
datasets from different organizations, such as the Global Health Observatory (GHO), the WHO, and
the United Nations (UN). Each variable's dataset is publicly available at data.worldbank.org.
We consider datasets related to life expectancy and to health, social, and economic factors,
collected from the World Bank Data for 2016. The usability of the data is limited because
information is scarce for many countries and there are several missing values. Below we explain
the method used to prepare our final dataset for the descriptive and inferential analysis; the
dataset is included in Excel and CSV format. Only 120 countries are considered in our paper, and
they are regarded as a sample for explaining the phenomenon.

Figure 1. Life Expectancy (1960-2018)

[Line chart: Life Expectancy (in Years) on the vertical axis, 50 to 75; year on the horizontal axis, 1960 to 2010.]

Data Source: World Bank Data.

Research question

Which factors affect life expectancy? Our research focuses on whether the ten variables we have
selected affect life expectancy, also considering life expectancy by region and by income class.
The research can lead to several outcomes. For example, if health expenditure is positively
related to life expectancy, a country with a low life expectancy may increase its healthcare
expenditure to improve its average lifespan. Does life expectancy correlate positively or
negatively with variables such as alcohol, income, and sanitation? If drinking alcohol and driving
imprudently (as a possible cause of traffic accidents) affect life expectancy, tighter policies on
alcohol consumption and greater effort in enforcing effective traffic laws may increase life
expectancy in countries with a short lifespan.

Literature Review

Several studies have examined the factors affecting life expectancy, considering demographic
variables, income composition, and mortality rates, using not only cross-sectional data but also
panel data covering a few decades. Their research questions aimed to assess the significant
factors contributing to the improvement of life expectancy over the years and to understand
whether life expectancy is influenced by socioeconomic and demographic inequalities such as
gender, race, and education. The World Bank defines life expectancy at birth as the average number
of years a newborn is expected to live if mortality patterns at the time of its birth remain
constant in the future. It summarizes the overall mortality level of a population and the
mortality pattern that prevails across age groups in a given year. High mortality in young age
groups significantly lowers life expectancy at birth. But if people survive their childhood in a
country with high child mortality, they may live much longer. Therefore, in the model, a low life
expectancy at birth may also be caused by high childhood mortality. Our aim with this research
question is first to test our knowledge and put it into practice. We are aware that our data have
several limitations, and that the use of basic statistics may not give results that fully explain
the phenomenon.

Significance and limitations of the study: adjustments and data sources

The dataset "Life Expectation" is a cross-sectional dataset composed of 120 observations collected
for 2016, representing the countries' life expectancy and several economic, health, and social
variables that may or may not influence it. The data sources are the World Bank and the WHO. We
read several articles and papers to understand how to manage complex datasets like those
downloaded from the World Bank. The primary issue was how to handle the missing values and which
method was most effective in creating a dataset that could meet the requirements of R and be used
for an inferential analysis. To create a usable dataset, especially one that follows R's
requirements, we opted for convenience sampling. We collected each variable's observations for
2016, but several countries have almost no values for any of the parameters. The techniques for
handling such missing real-life data are very complex, and even major research studies have to
work with data of limited quality. Therefore, we performed a listwise deletion, eliminating the
observations with missing values. For this reason, our dataset is limited to 120
observations/countries. We deleted 100 observations from the original list of 220 countries, so
that n/N > 0.5. Despite this, to simplify our analysis, we will consider our sample not as part of
the 2016 cross-sectional dataset of 220 countries, but as a sample of a much larger population
spanning different years. We are not interested in yearly growth, in any property of the
year-to-year change, or in the country itself, but mainly in the relationship between life
expectancy and the variables collected. For these reasons, we take the liberty of performing both
a descriptive and an inferential statistical analysis on our dataset, treating it as a random
sample of 120 observations of life expectancy; the country or year of the observation is not
crucial to our study. Still, we will try to assess regional and income patterns in life expectancy.
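
The listwise deletion described above can be illustrated with a minimal Python sketch. It is not the procedure actually run in Excel or R for this study: the function name `listwise_delete`, the field names, and the four rows are hypothetical placeholders chosen only to show the idea of dropping every observation that has at least one missing value.

```python
# Listwise deletion: keep only the rows (countries) with no missing values.
# All rows below are made-up placeholders, not values from the study's dataset.

def listwise_delete(rows, required):
    """Return the rows in which every required field is present and non-empty."""
    complete = []
    for row in rows:
        if all(row.get(key) not in (None, "") for key in required):
            complete.append(row)
    return complete

rows = [
    {"country": "A", "life_exp": 71.2, "health_exp": 5.1, "alcohol": 6.0},
    {"country": "B", "life_exp": 64.8, "health_exp": None, "alcohol": 2.3},
    {"country": "C", "life_exp": 80.1, "health_exp": 9.4, "alcohol": ""},
    {"country": "D", "life_exp": 58.9, "health_exp": 4.2, "alcohol": 7.7},
]
required = ["life_exp", "health_exp", "alcohol"]

kept = listwise_delete(rows, required)
retained_share = len(kept) / len(rows)  # the n/N ratio: share of rows retained

print([row["country"] for row in kept], retained_share)
```

In this toy input, rows B and C each lack one value and are dropped, so half of the rows are retained. With 120 of 220 countries retained, the same ratio is about 0.55.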

Description of the variables


The dataset is composed of 14 variables and 120 observations (Table 1): 3 qualitative variables
and 11 quantitative variables. The qualitative variables are the country the data are taken
from, the region the country belongs to, and its income class.

Table 1. Structure Dataset “Life Expectancy” on R

The cross-sectional dataset described above is summarized “Region-wise” and “Income-wise” in
Excel to describe features of the population quantitatively, since a graphical representation of
120 elements (countries) with ten variables may not always be informative.

The 11 quantitative variables consist of life expectancy and 10 other quantitative variables such as
mortality rates, GDP, and population, which will be briefly introduced here:

1. Life expectancy at birth in total years: Life expectancy at birth is a quantitative ratio
variable. It indicates the number of years a newborn child would live if the patterns of
mortality prevailing at the time of its birth were to stay the same throughout its life. The
weighted average is used as the aggregation method. Mortality rates for different age groups and
overall mortality indicators are essential parameters for the welfare of a country. Mortality
rates are used to identify vulnerable populations and to compare socioeconomic development across
countries when data on the incidence and prevalence of diseases are unavailable. A limitation of
the dataset is that it is not a collection of observed data, but a projection based on data
interpolated from 5-year period data.

2. Total Population: Total population counts all residents regardless of legal status or
citizenship. Summation is the aggregation method and the values are midyear estimates. Population
can be used to assess a country's sustainability. An increase in immigration or a higher
birth/death ratio can affect the natural resources and social structure of a country. A large
population may reduce the availability of food, energy, water, social services, and
infrastructure, and so reduce the quality of life and possibly life expectancy. On the other hand,
a decreasing population, with low birth rates and people moving abroad, can damage the long-run
competitiveness and economy of a country. A limitation of these data is the lack of reliable
recent census data for several countries, especially developing countries. The data available are
mainly based on estimates by the United Nations. Methods for estimating the population are based
on fertility, mortality, and net migration data, often from sample surveys that are small or
limited in coverage, and so are susceptible to biases and errors in both the model and the data.

3. Current health expenditure as % of GDP: The indicator covers estimates of current health
expenditures for healthcare services and goods consumed during each year, but does not include
capital health expenditures such as buildings, machinery, IT and stocks of vaccines for
emergencies or outbreaks. The estimates are produced by the World Health Organization; the
indicator is considered a comprehensive measure of health spending in a country and therefore an
indicator for policymaking. The weighted average is the aggregation method.

4. GDP in current US$: GDP at purchaser's prices is the sum of gross value added by all
resident producers in the economy plus any product taxes and minus any subsidies not
included in the value of the products. Data are in current U.S. dollars. Dollar figures for
GDP are converted from domestic currencies using single year official exchange rates
(World Bank). GDP, despite being widely used, may not be the most effective indicator of
aggregate economic performance for all countries. Different countries use different
methods, definitions, and reporting standards. Actual practice may deviate from these
standards, particularly in developing countries, which have limited resources, training, and
budgets for producing reliable and comprehensive series of national accounts statistics.

5. Mortality caused by road traffic injury (per 100,000 people): It is the estimated weighted
average of deaths from road traffic injuries per 100,000 population.

6. Mortality rate attributed to unsafe water, unsafe sanitation and lack of hygiene (per 100,000
population): This mortality rate represents the deaths attributable to unsafe water,
sanitation and hygiene, focusing on inadequate services, per 100,000 population. The death
rates are calculated by dividing the number of deaths by the total population. This estimate
takes into account the deaths caused by diarrheal diseases, intestinal nematode infections, and
protein-energy malnutrition (WHO). These deaths, according to the World Health Organization,
could be prevented if adequate sanitary services were provided.

7. Prevalence of HIV, total (% of population ages 15-49): Prevalence of HIV refers to the
percentage of people aged 15-49 who are infected with HIV. The estimates are provided
by UNAIDS. However, the availability of data on health status is limited, which is a
major constraint in assessing the health situation of a developing country. Estimates of
prevalence and incidence are available for some diseases but are often unreliable and
incomplete (World Bank).

8. Mortality rate under 5-year-old (per 1,000 live births): The under-five mortality rate is the
probability per 1,000 that a newborn baby will die before reaching age five, if subject to the
age-specific mortality rates of the specified year. The estimates are developed by the UN
Group for Child Mortality Estimation (childmortality.org). Under-five mortality rates are
higher for males than for females in countries in which parental gender
preferences are not significant. This mortality rate also captures the effect of gender
discrimination, as malnutrition and medical interventions have more significant impacts on
this age group. In countries with higher female under-five mortality, girls are likely to have
less access to resources than boys (World Bank). These data are estimates, since vital
registration systems are not complete in developing countries; they are obtained from
sample surveys or derived by applying indirect estimation techniques to registration,
census, or survey data.

9. Mortality from CVD, cancer, diabetes or CRD between exact ages 30 and 70 (%): This
mortality rate is the percentage of 30-year-old people who would die before their 70th birthday
from any of cardiovascular disease, cancer, diabetes, or chronic respiratory disease,
assuming that they would experience current mortality rates at every age and would
not die from any other cause of death (e.g., injuries or HIV/AIDS) (World Bank).

10. Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years
of age): It is defined as the sum of recorded and unrecorded alcohol amounts consumed per
person, 15 years of age or older, over a calendar year, in liters of pure alcohol (World Bank).
According to the World Health Organization, alcohol consumption causes more than 200
disease and injury conditions, such as liver cirrhosis, some cancers and cardiovascular
diseases, as well as injuries resulting from violence and road incidents.

11. Mortality rate attributed to household and ambient air pollution, age-standardized (per
100,000 population): It is the number of deaths attributable to the effects of household and
ambient air pollution in a year per 100,000 population, standardized for age. Air pollution
is one of the major environmental risks for our health. According to the World Health
Organization, the combined effects of outdoor and household air pollution cause about 7
million premature deaths every year. These deaths are due mainly to stroke, heart disease,
chronic obstructive pulmonary disease, lung cancer, and acute respiratory infections, and
they affect more people in low- and middle-income countries.
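
Several of the indicators above are expressed as rates per 100,000 population or aggregated as weighted averages. The Python sketch below shows the arithmetic behind both, under stated assumptions: the function names are hypothetical, the numbers are made up, and population is used as the weight only for illustration, since the weights the World Bank applies differ by indicator.

```python
# Arithmetic behind "per 100,000 population" rates and weighted-average aggregation.
# Function names and all numbers are illustrative assumptions, not study data.

def rate_per_100k(deaths, population):
    """Deaths divided by total population, scaled to 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population * 100_000

def weighted_average(values, weights):
    """Average of the values, each weighted by the matching weight."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and of equal length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# A made-up country: 150 deaths in a population of 1,000,000.
print(rate_per_100k(150, 1_000_000))

# Two made-up countries with life expectancies of 70 and 80 years and
# populations of 3 and 1 million; the larger country dominates the aggregate.
print(weighted_average([70.0, 80.0], [3.0, 1.0]))
```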

Descriptive Statistics Analysis

Life Expectancy

The histogram below gives a quick snapshot of the life expectancies of the 120 countries
considered, showing how many countries fall into each age class. The negative skewness, with the
data concentrated toward the right of the age axis, shows that life expectancy is high for most
countries.

Figure 2. Histogram Life expectancy

[Histogram "Life expectancy of 120 countries": Frequency on the vertical axis, 0 to 35; Age in years on the horizontal axis, in classes 50-55, 56-60, 61-65, 66-70, 71-75, 76-80, 81-85.]

Descriptive statistics reveals the following information:

Table 2. 5 Number Summary for Life Expectancy on R

Mode  | Variance | Standard Deviation | Standard Error | Skewness
53.44 | 63.23    | 7.95               | 0.73           | 0.62

Negative skewness is evidence that the mean age is less than the median age.
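
As a quick consistency check on Table 2, the standard deviation and standard error can be recomputed from the reported variance and the sample size of 120. The Python sketch below does this. It also uses a made-up, left-skewed list of ages to illustrate the link between negative skewness and a mean below the median. The helper `moment_skewness` is a hypothetical name and uses the simple moment formula, which may differ slightly from the sample-adjusted formula in Excel or R packages.

```python
import math
import statistics

# Values reported in Table 2 (life expectancy, n = 120 countries).
n = 120
variance = 63.23

std_dev = math.sqrt(variance)        # standard deviation = sqrt(variance)
std_error = std_dev / math.sqrt(n)   # standard error of the mean = sd / sqrt(n)
print(round(std_dev, 2), round(std_error, 2))  # prints: 7.95 0.73

def moment_skewness(values):
    """Moment skewness: third central moment divided by the cubed standard deviation."""
    m = statistics.mean(values)
    second = sum((v - m) ** 2 for v in values) / len(values)
    third = sum((v - m) ** 3 for v in values) / len(values)
    return third / second ** 1.5

# Made-up ages with a long tail toward low values (left-skewed), not study data.
toy_ages = [50, 70, 72, 74, 75, 76, 78]
print(moment_skewness(toy_ages) < 0)                              # True
print(statistics.mean(toy_ages) < statistics.median(toy_ages))    # True
```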

Life Expectancy and Health Expenditure

We plotted a line diagram to see whether a relationship exists between “Life expectancy” and
“Current expenditure in health (% of GDP).” Since the output for 120 countries would be large, we
chose to plot region-wise expenditure as a proportion of GDP.

Figure 3. Life expectancy against Current expenditure in health in proportion to GDP

[Line chart "Life expectancy against Current expenditure in health proportion to GDP": Life expectancy in years on the vertical axis, 0 to 100; regions on the horizontal axis (East Asia & Pacific, Latin America & Caribbean, Europe & Central Asia, Middle East & North Africa, North America, Sub-Saharan Africa). Series: Life Exp; Current Health Expenditure % of GDP.]


The line plot above shows that expenditure on the health system has a positive relationship with
life expectancy: higher expenditure goes with higher life expectancy. However, since GDP is an
absolute value, expenditure relative to population (say, per 10,000 people) would likely have
given a better correlation.
Table 3. Correlation between Life expectancy and the other variables.

Life Expectancy and Mortality due to Road Injury, Lack of Hygiene and Pollution

We drew scatter plots to see whether the mortality rates due to road traffic injury; unsafe water,
unsafe sanitation and lack of hygiene; and household and ambient air pollution (each per 100,000
population) affect life expectancy. The correlations between life expectancy and mortality due
to traffic injury, unsafe sanitation, and air pollution are -0.763, -0.833, and -0.896,
respectively. These values indicate a strong negative relationship between these variables and
life expectancy: higher values of any of the three are associated with lower life expectancy, and
lower values with higher life expectancy. Scatter diagrams are shown below:

Figure 4. Life expectancy and mortality (other than natural death), region-wise

[Scatter chart "Life expectancy and Mortality rate": vertical axis "Mortality per 100,000 population / Life expectancy in years", 0 to 200; horizontal axis with region codes 1. East Asia & Pacific, 2. Latin America and Caribbean, 3. Europe and Central Asia, 4. Middle East and North Africa, 5. North America, 6. Sub-Saharan Africa. Series: Life Exp; Road injury; Lack of hygiene; Air pollution.]

Figure 5. Life expectancy and mortality (other than natural death), income-wise

[Scatter chart "Life expectancy and Mortality": vertical axis "Mortality per 100,000 population / Life expectancy in years", 0 to 200; horizontal axis with income codes 1. High Income, 2. Low Income, 3. Lower Middle Income, 4. Upper Middle Income, 5. World Aggregate. Series: Life Exp; Road injury; Lack of hygiene; Air pollution.]

Life Expectancy and Infant Mortality

The bar diagram of infant mortality rate and life expectancy shows that the two variables are
inversely related. The correlation between life expectancy and infant mortality is -0.922. This
indicates a strong negative correlation: the lower the infant mortality rate, the higher the life
expectancy.

Figure 6. Life expectancy and infant mortality rate

[Bar chart "Life expectancy and Infant mortality Rate": vertical axis "LE years and mortality per 1000 births", 0 to 90; regions on the horizontal axis (East Asia & Pacific, Latin America & Caribbean, Europe & Central Asia, Middle East & North Africa, North America, Sub-Saharan Africa). Series: Life Exp; Mortality rate, under-5 (per 1,000 live births).]

Life Expectancy and Alcohol Consumption

The stacked bar diagram of alcohol consumption and life expectancy shows that the two variables
are not tightly associated. The correlation between life expectancy and alcohol consumption is
-0.34. This is a weak negative correlation, meaning that alcohol consumption has little
association with life expectancy.

Figure 7. Life expectancy and Alcohol consumption

[Stacked bar chart "Life expectancy and Alcohol consumption": vertical axis "Life expectancy in years, alcohol consumption ltrs", 0 to 100; Region on the horizontal axis. Values labelled on the chart:]

Region                     | Life Exp | Alcohol consumption per capita
East Asia & Pacific        | 75.64    | 6.55
Latin America & Caribbean  | 75.12    | 6.93
Europe & Central Asia      | 77.59    | 9.88
Middle East & North Africa | 73.71    | 0.80
North America              | 78.88    | 9.71
Sub-Saharan Africa         | 60.44    | 6.26

Life Expectancy and Mortality due to disease

The chart below plots mortality due to diseases against life expectancy and shows that the two
variables are negatively associated: higher mortality due to diseases goes with lower life
expectancy. The correlation between life expectancy and mortality due to diseases is -0.71, a
strong negative correlation.

Figure 8. Life expectancy and mortality due to diseases

[Horizontal bar chart "Life expectancy and Mortality due to diseases": Region on the vertical axis (Sub-Saharan Africa, North America, Middle East & North Africa, Europe & Central Asia, Latin America & Caribbean, East Asia & Pacific); "Life expectancy and mortality percentage" on the horizontal axis, 0 to 90. Series: Mortality from CVD, cancer, diabetes or CRD between exact ages 30 and 70 (%) (2016); Life Exp.]

Life expectancy has a low correlation with population and GDP, -0.036 and 0.20 respectively,
whereas its correlation with the prevalence of HIV is moderate (-0.508).
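
The correlation values quoted in this section can be read as Pearson correlation coefficients, assuming the defaults of R's `cor` and Excel's `CORREL` were used, which the text does not state. A minimal Python sketch of that coefficient follows. The function name `pearson_r` is hypothetical and the data are made up so that the relationship is exactly linear and negative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (spread_x * spread_y)

# Made-up values: life expectancy falls by 5 years for every 10-point rise in mortality.
under5_mortality = [10.0, 20.0, 30.0, 40.0]
life_expectancy = [80.0, 75.0, 70.0, 65.0]

r = pearson_r(under5_mortality, life_expectancy)
print(round(r, 6))  # a perfectly linear negative relationship gives -1.0
```

Values near -1 or +1, such as the -0.922 reported for infant mortality, indicate a nearly linear relationship. Values near 0, such as the -0.036 for population, indicate almost no linear relationship.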

Descriptive statistics summarize and graph a chosen group of observations; they give us a visual
interpretation of the data that is easier to understand than the raw numbers. We grouped data and
variables that are similar in character and drew different types of graphs and charts to see
whether relationships exist. For the categorical representation, we used region-wise and
income-wise grouping. However, it is not practically possible to select all elements and
variables of the population and analyze the data for possible relationships this way. Therefore,
to gain a more precise insight that includes all variables of the dataset, and to conclude whether
the variables are related, we use a mathematical approach known as inferential statistics.

Inferential statistics

Multiple Linear Regression


We tested our model for “Life Expectancy” with a multiple linear regression (MLR) in both Excel
and R. The MLR has life expectancy as the dependent variable and the other ten quantitative
variables described before as independent variables. The summary of the MLR at a 95%
confidence level is reported in Tab. 4 for R and Tab. 5 for Excel, which produced almost identical
results for the parameters.
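
As a rough illustration of what a multiple linear regression computes, the sketch below fits ordinary least squares by solving the normal equations in plain Python. It is not the Excel or R procedure used for this paper: the function names `solve_linear_system` and `fit_ols` are hypothetical, and the data are made up from a known equation with two predictors so that the recovered coefficients can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(predictors, y):
    """Return [intercept, b1, ..., bk] minimizing the sum of squared residuals."""
    design = [[1.0] + list(row) for row in predictors]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)  # normal equations: (X'X) b = X'y

# Made-up data generated exactly from y = 80 - 0.1*x1 - 0.3*x2.
predictors = [(10.0, 5.0), (20.0, 8.0), (30.0, 4.0), (40.0, 12.0), (50.0, 9.0)]
y = [80.0 - 0.1 * x1 - 0.3 * x2 for x1, x2 in predictors]

coefficients = fit_ols(predictors, y)
print([round(c, 6) for c in coefficients])
```

Because the toy data contain no noise, the fit recovers the generating intercept and slopes. With real data, the residuals are non-zero and the coefficients are estimates with standard errors.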

Table 4. Output Multiple Linear Regression on R

Table 5. Output Multiple Linear Regression on Excel
SUMMARY OUTPUT

Regression Statistics
Multiple R        | 0.986733219
R Square          | 0.973642445
Adjusted R Square | 0.971224321
Standard Error    | 1.348868584
Observations      | 120

ANOVA
           | df  | SS          | MS          | F           | Significance F
Regression | 10  | 7325.885923 | 732.5885923 | 402.6436667 | 3.44135E-81
Residual   | 109 | 198.3196637 | 1.819446456 |             |
Total      | 119 | 7524.205587 |             |             |

Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0%
Intercept | 86.13555624 | 0.796270257 | 108.1737707 | 9.665E-113 | 84.55737448 | 87.713738 | 84.55737448 | 87.713738
Population | 1.53343E-09 | 1.09175E-09 | 1.404567653 | 0.162993132 | -6.30376E-10 | 3.69725E-09 | -6.30376E-10 | 3.69725E-09
Current health expenditure (% of GDP) | 0.096594348 | 0.061035132 | 1.58260242 | 0.116409458 | -0.0243753 | 0.217563996 | -0.0243753 | 0.217563996
GDP (current US$) | -2.86356E-14 | 8.39037E-14 | -0.341290656 | 0.73354181 | -1.9493E-13 | 1.37659E-13 | -1.9493E-13 | 1.37659E-13
Mortality caused by road traffic injury (per 100,000 people) 2016 | -0.139449653 | 0.020051238 | -6.954665405 | 2.74087E-10 | -0.179190556 | -0.099708751 | -0.179190556 | -0.099708751
Mortality rate attributed to unsafe water, unsafe sanitation and lack of hygiene (per 100,000 population) (2016) | -0.052899368 | 0.017399974 | -3.040198068 | 0.002960857 | -0.087385552 | -0.018413185 | -0.087385552 | -0.018413185
Prevalence of HIV, total (% of population ages 15-49) | -0.330636747 | 0.034035687 | -9.714413696 | 1.91724E-16 | -0.398094373 | -0.263179122 | -0.398094373 | -0.263179122
Mortality rate, under-5 (per 1,000 live births) | -0.098689957 | 0.016230294 | -6.080601767 | 1.80439E-08 | -0.130857873 | -0.066522041 | -0.130857873 | -0.066522041
Mortality from CVD, cancer, diabetes or CRD between exact ages 30 and 70 (%) (2016) | -0.376968867 | 0.035595939 | -10.5902211 | 1.90527E-18 | -0.447518862 | -0.306418873 | -0.447518862 | -0.306418873
Total alcohol consumption per capita (liters of pure alcohol, projected estimates, 15+ years of age) | -0.031816775 | 0.035888344 | -0.886548982 | 0.377273616 | -0.102946305 | 0.039312756 | -0.102946305 | 0.039312756
Mortality rate attributed to household and ambient air pollution, age-standardized (per 100,000 population) | -0.011926911 | 0.005627714 | -2.119317058 | 0.036334086 | -0.023080857 | -0.000772964 | -0.023080857 | -0.000772964

The output gives the following linear equation:


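\[
\begin{aligned}
y_{life\_exp} = {} & 86.14 + 1.5 \times 10^{-9}\, x_{pop} + 0.098\, x_{health\_exp} - 2.89 \times 10^{-14}\, x_{GDP} - 0.139\, x_{mrt\_traffic} - 0.052\, x_{unsafe\_san} \\
& - 0.33\, x_{HIV} - 0.098\, x_{mrt\_5yo} - 0.377\, x_{mrt\_diseases} - 0.033\, x_{alcohol} - 0.012\, x_{mrt\_pollution}
\end{aligned}
\]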

Estimate values for the variables: The intercept is positive; this means that when all the
independent variables are equal to 0, the life expectancy would be around 86 years. The relation is
also positive with population and health expenditure, although neither is significant. All other
variables have negative estimated values: as mortality rates, alcohol consumption, and the
prevalence of HIV (which weakens the immune system) increase, life expectancy decreases. The
parameter that shortens life expectancy most strongly is mortality from diseases, which covers
cardiovascular disease, cancer, diabetes, or chronic respiratory disease, the diseases that cause
the most deaths.
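
To make the reading of the estimates concrete, the following minimal Python sketch evaluates the fitted equation for one set of predictor values. The coefficients are the rounded ones from the equation above; the function name `predict_life_expectancy` and the values in `example_country` are hypothetical and are not taken from the study's data set.

```python
# Rounded coefficients from the fitted equation in the text.
INTERCEPT = 86.14
COEFFICIENTS = {
    "pop": 1.5e-9,
    "health_exp": 0.098,
    "GDP": -2.89e-14,
    "mrt_traffic": -0.139,
    "unsafe_san": -0.052,
    "HIV": -0.33,
    "mrt_5yo": -0.098,
    "mrt_diseases": -0.377,
    "alcohol": -0.033,
    "mrt_pollution": -0.012,
}


def predict_life_expectancy(values):
    """Evaluate the linear equation; predictors missing from `values` count as 0."""
    return INTERCEPT + sum(
        coef * values.get(name, 0.0) for name, coef in COEFFICIENTS.items()
    )


# Hypothetical country: illustrative numbers only, not from the data set.
example_country = {
    "pop": 10e6,          # inhabitants
    "health_exp": 8.0,    # % of GDP
    "GDP": 2e11,          # current US$
    "mrt_traffic": 10.0,  # per 100,000 people
    "unsafe_san": 1.0,    # per 100,000 population
    "HIV": 0.2,           # % of population ages 15-49
    "mrt_5yo": 5.0,       # per 1,000 live births
    "mrt_diseases": 12.0, # %
    "alcohol": 9.0,       # liters per capita
    "mrt_pollution": 20.0 # per 100,000 population
}

print(round(predict_life_expectancy(example_country), 2))
```

With every predictor set to 0 the function returns the intercept, and raising `mrt_diseases` by one percentage point lowers the prediction by 0.377 years, which is how each slope in the equation is read.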

Goodness of fit: R square is a basic metric that tells us how much of the variance is explained
by the model. R2 is 0.9737; hence approximately 97.4% of the variation in Life Expectancy can be
explained by our model (Tab. 4). Since R2 never decreases when a variable is added, irrespective
of that variable's significance, adjusted R2 is used in multiple linear regression, because the
latter penalizes additional variables that do not improve the model. The adjusted R2 value, 0.9713,
shows that the multiple linear regression model explains almost all of the variance, and therefore
there is a very close relationship between Life Expectancy and the set of independent variables.
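
As a hedged illustration of how the two measures relate, the sketch below recomputes R2 and adjusted R2 from the sums of squares in the Excel ANOVA output (120 observations, 10 predictors). It uses the standard textbook formulas; the variable names are our own. Because it starts from the Excel figures, it reproduces the R Square and Adjusted R Square shown in the regression statistics.

```python
# Figures from the Excel ANOVA output.
ss_regression = 7325.885923
ss_total = 7524.205587
n_obs = 120
n_predictors = 10

# Share of total variation explained by the regression.
r_squared = ss_regression / ss_total

# Adjusted R2 penalizes each additional predictor through the degrees of freedom.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

print(round(r_squared, 4), round(adj_r_squared, 4))
```

The adjustment factor (n - 1) / (n - k - 1) is always greater than 1 when at least one predictor is present, so adjusted R2 is never above R2 and falls when an added variable explains too little.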
p-value: The p-values indicate the significance of each coefficient in the linear model. At the 5%
significance level (95% confidence), a coefficient is significant when its p-value is less than
0.05. This is the case for the mortality rates (mortality due to traffic, due to unsafe sanitation,
under 5 years old, due to diseases, and due to air pollution) and for HIV prevalence. The p-values
for population, health expenditure, GDP, and alcohol consumption are higher than 0.1, so we do not
reject the null hypothesis for these variables: the model shows no significant relationship between
them and life expectancy.
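
The decision rule can be sketched in a few lines of Python. The coefficients, standard errors, and p-values are copied from the Excel output; the short variable labels and the name `ALPHA` are our own. The t statistic is recomputed as coefficient divided by standard error, and the p-values are taken as given rather than recomputed.

```python
ALPHA = 0.05  # 5% significance level

# (coefficient, standard error, p-value) per predictor, from the Excel output.
rows = {
    "Population": (1.53343e-09, 1.09175e-09, 0.162993132),
    "Health expenditure": (0.096594348, 0.061035132, 0.116409458),
    "GDP": (-2.86356e-14, 8.39037e-14, 0.73354181),
    "Road traffic mortality": (-0.139449653, 0.020051238, 2.74087e-10),
    "Unsafe water and sanitation mortality": (-0.052899368, 0.017399974, 0.002960857),
    "HIV prevalence": (-0.330636747, 0.034035687, 1.91724e-16),
    "Under-5 mortality": (-0.098689957, 0.016230294, 1.80439e-08),
    "CVD, cancer, diabetes, CRD mortality": (-0.376968867, 0.035595939, 1.90527e-18),
    "Alcohol consumption": (-0.031816775, 0.035888344, 0.377273616),
    "Air pollution mortality": (-0.011926911, 0.005627714, 0.036334086),
}

# t statistic = coefficient / standard error.
t_stats = {name: coef / se for name, (coef, se, _) in rows.items()}

# A coefficient is significant when its p-value is below ALPHA.
significant = [name for name, (_, _, p) in rows.items() if p < ALPHA]
not_significant = [name for name in rows if name not in significant]

for name in rows:
    flag = "significant" if name in significant else "not significant"
    print(f"{name}: t = {t_stats[name]:.3f}, {flag}")
```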

F-statistic: The F-statistic tests the overall statistical significance of our model. The null
hypothesis is that all slope coefficients are equal to 0; the alternative is that at least one of
them is different from 0.

\[ H_0 : b_{pop} = b_{health\_exp} = b_{GDP} = \dots = b_{air\_poll} = 0 \]

\[ H_a : \text{at least one coefficient is different from zero} \]

In our case, the calculated F (with 10 numerator and 108 denominator degrees of freedom) is 400.2,
which is much higher than the critical value of F. We therefore reject the null hypothesis and
consider our model statistically significant.
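
For illustration, the F statistic is the ratio of the regression mean square to the residual mean square. The sketch below applies this to the Excel ANOVA table, which has 10 and 109 degrees of freedom, so it reproduces the F printed in that table (about 402.64), which is close to but not identical with the 400.2 quoted above. The variable names are our own, and the critical value is not computed here.

```python
# Figures from the Excel ANOVA table.
ss_regression, df_regression = 7325.885923, 10
ss_residual, df_residual = 198.3196637, 109

# Mean squares: sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# Overall F statistic of the model.
f_statistic = ms_regression / ms_residual

print(round(f_statistic, 2))
```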

Residual Standard Error: The residual standard error is 1.353 with 108 degrees of freedom. It
indicates how far the observed life expectancy values typically lie from the predicted life
expectancy values.
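
As a rough sketch, the residual standard error is the square root of the residual sum of squares divided by the residual degrees of freedom. The helper `residual_standard_error` below is a hypothetical name. It is applied to the Excel ANOVA figures (109 residual degrees of freedom), so it reproduces the Standard Error in the Excel regression statistics rather than the 1.353 on 108 degrees of freedom quoted above.

```python
import math


def residual_standard_error(ss_residual, df_residual):
    """Square root of the residual mean square, in the units of the response (years)."""
    return math.sqrt(ss_residual / df_residual)


# Residual sum of squares and degrees of freedom from the Excel ANOVA table.
rse = residual_standard_error(198.3196637, 109)

print(round(rse, 3))
```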

Discussion of the results obtained and Conclusion

In conclusion, only some factors, mainly those related to mortality, are statistically significant
in our multiple linear regression model. The effects of other factors, such as population size,
GDP, alcohol consumption, and health expenditure, are not captured by our model. The correlation of
these factors with life expectancy is also weak. This result does not necessarily mean there is no
relation; rather, the relationship may need to be studied with more sophisticated tools. By
contrast, the different types of mortality are highly correlated with life expectancy. Several
policies and initiatives may be taken to lower mortality rates. Mortality due to traffic injuries
may be reduced with tighter road laws. Improvements in sanitation, access to good-quality medical
infrastructure and devices, and investment in research should improve all the mortality rates, from
both injuries and diseases. Initiatives on sanitation and access to clean water may reduce the
mortality of children under five years old and the mortality due to unsafe sanitation.

References

• Pongiglione, B. (2015), “A Systematic Literature Review of Studies Analyzing Inequalities in Health Expectancy among the Older Population”, https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0130747
• Convenience sampling, http://dissertation.laerd.com/convenience-sampling.php
• World Bank Data, https://data.worldbank.org/indicator/SP.DYN.LE00.IN
• Anderson, D. (2011), Statistics for Business and Economics, Eleventh Ed., South-Western Cengage Learning.
