
Introduction

When India gained independence, the average life expectancy of its people was around 35 years;
the current Indian population has a life expectancy of 68.3 years. Several explanations have been
suggested for this increase, one of them being better medical facilities. Some also attribute it
to economic growth and the rise in per-capita income. Economic growth has led to urbanization in
India, and the urban population has access to better medical facilities because its per-capita
income is higher than that of the rural population. In the current study, we forecast the average
life expectancy of the Indian population using per-capita income, economic growth and urban
population as explanatory variables. We consider both univariate and multivariate techniques for
the analysis.

Literature review

In the modern era, the average life expectancy of the Indian population has risen sharply
compared with the nineteenth century, when it was around 40 years. It now stands at 68.3 years
for males and females combined: 66.6 years for males and 69.9 years for females. What could
explain such a rise in life expectancy? This study tries to identify the reasons for the
increase.

James C. Riley (Riley, 2011) found that the disparity in life expectancy across countries can be
attributed to the medical facilities available in each country. Developed countries generally
have better medical facilities, a consequence of their higher per-capita income and stronger
economic growth. In India, the population can be divided into two main categories: the rural
population, which has limited access to basic medical facilities, and the urban population,
which has comparatively high income and better access to medical facilities. In this paper, we
therefore treat per-capita income, economic growth and urban population as variables and
estimate their effect on the life expectancy of the people of India.

For univariate analysis, most authors suggest the Lee-Carter model for forecasting life
expectancy. The model was introduced by Ronald D. Lee and Lawrence Carter in 1992 in the article
“Modeling and Forecasting U.S. Mortality” (Lee & Carter, 1992). It describes the movement of
age-specific mortality over time as a function of a latent level of mortality, also known as the
overall index of mortality, which can be forecast using simple time series methods. The method
is currently used by many countries to forecast their mortality rates. The Lee-Carter model has
drawbacks that it shares with univariate analysis in general: it assumes stability over time,
which usually does not hold (the trend may change because of structural changes), and inaccurate
modeling of past data produces biased results. This prompted the search for a model that allows
for structural change and reduces the effect of inaccuracies in past data. For this purpose, the
ARIMA model has been suggested as the most suitable method for forecasting the mortality index
(Nunes & Coelho, 2011).

Methodology
The data were collected from secondary sources. Life expectancy data for India come from the
United Nations database and cover 55 years, from 1960 to 2014. We use the following univariate
techniques:

1. Simple Regression
2. Decomposition Techniques
3. ARIMA
4. Smoothening Techniques

The multivariate analysis uses multiple regression with life expectancy as the dependent
variable and per capita income, economic growth and urban population as independent variables.
Data for the independent variables were collected from the World Bank data repository.
We applied decomposition techniques in the univariate analysis. A strong trend is observed when
per capita income is analyzed through decomposition. The coefficient of determination, R2, is
higher than 99% for a cubic functional form regressed on time as the independent variable. The
results show positive coefficients for the first-degree (a = 752.885) and third-degree
(c = 0.867417) terms of time, while the coefficient of T2 is negative (b = -39.052). The order of
the moving average used for determining cyclicity is 5.

Simple Regression
Simple regression, commonly referred to as simple linear regression, is a statistical technique
that models the relationship between one dependent and one independent variable. We have used
simple regression with life expectancy as the dependent variable and time as the independent
variable. In its basic form it fits a straight-line relationship between the variables.

The first step in the method is the choice of functional form. We selected five functional
forms for regression on time and carried out a separate regression for each: linear, quadratic,
cubic, exponential and double-log. The rationale for trying several forms is to find the one
with the highest coefficient of determination. The coefficient of determination, R2, is the
ratio of the regression sum of squares to the total sum of squares. A higher value of R2 is
desirable because it means that more of the variation in the dependent variable is explained by
the independent variable. For example, an R2 of 0.7 means that 70% of the variation is explained
by the fitted relationship. For the linear form, the relationship is defined by Y = a + bT.

The significance of each coefficient of the time variable is checked with a t-test on the
regression output. A confidence level of 95% is used throughout the study. Simple linear
regression uses the ordinary least squares method to produce the best-fit line.
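
The comparison of functional forms described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the computation used in the study: the function names are hypothetical, NumPy's polyfit stands in for the spreadsheet regression, base-10 logarithms are assumed for the exponential and double-log forms, and R2 for those two forms is measured on the log scale. The sample values are the first eight life expectancy observations, rounded.

```python
import numpy as np

def r_squared(y, fitted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_functional_forms(values):
    """Regress a series on time (T = 1, 2, ...) under five functional forms."""
    y = np.asarray(values, dtype=float)
    t = np.arange(1, len(y) + 1, dtype=float)
    scores = {}
    # Polynomial forms: Y = a + b*T (+ c*T^2 (+ d*T^3))
    for name, degree in (("linear", 1), ("quadratic", 2), ("cubic", 3)):
        coeffs = np.polyfit(t, y, degree)
        scores[name] = r_squared(y, np.polyval(coeffs, t))
    # Exponential form (assumed): log10(Y) = a + b*T
    log_y = np.log10(y)
    slope, intercept = np.polyfit(t, log_y, 1)
    scores["exponential"] = r_squared(log_y, intercept + slope * t)
    # Double-log form (assumed): log10(Y) = a + b*log10(T)
    log_t = np.log10(t)
    slope, intercept = np.polyfit(log_t, log_y, 1)
    scores["double_log"] = r_squared(log_y, intercept + slope * log_t)
    return scores

# Illustrative input: life expectancy for 1960-1967, rounded
sample = [41.17195, 41.79049, 42.41741, 43.05273,
          43.69841, 44.35351, 45.01851, 45.69093]
scores = fit_functional_forms(sample)
best_form = max(scores, key=scores.get)
```

Because the polynomial forms are nested, the cubic form can never score a lower R2 than the quadratic or linear form on the same data, which is one reason a high R2 alone is a weak basis for choosing between them.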

Decomposition Techniques
Decomposition is a method for breaking data into its components. There are four components:
trend, seasonality, cyclicity, and irregularity (randomness). Any variation or pattern in the
data is explained by these components. Trend refers to a continuously increasing or decreasing
pattern over a period of time, and that period is always longer than one year. Since the data we
have selected are for life expectancy, they can be expected to show a strong trend in a
developing country like India.
Seasonality occurs when the data repeatedly show a pattern over a fixed interval of time. The
interval for seasonality is always shorter than 12 months. Seasonality may occur in monthly data
such as sales, or in quarterly or daily data, but never in annual data. Since the data we have
used are annual, no seasonality will be present, and no seasonality factor needs to be
calculated for this study.

A multiplicative model of the components is assumed for extracting the components from the
actual data. Since the data are for life expectancy over years, they are expected to show a
strong trend component, and a simple increasing trend is observed on initial analysis. Cyclical
components may affect average life expectancy, as the number of deaths might increase during
periods of recession and depression. During famines, too, the number of deaths per thousand
persons increases, which raises the mortality rate and reduces life expectancy. Random
variations in the data are expected to be very small, as most of the variation will be explained
by trend and the rest by the cyclical component. If life expectancy falls because of disasters
or outbreaks of disease, the fall is included in the random variations. Such events have no
period or pattern of occurrence and affect life expectancy to a different degree each time, so
they form part of the random component of the data. The entire life expectancy dataset from 1960
to 2014 is used for decomposition.
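
A minimal sketch of the multiplicative decomposition for annual data (no seasonal factor) is given below. It assumes the trend values are already available from a fitted regression, that the cyclical component is a centered moving average of the actual-to-trend ratio, and that the irregular component is that ratio divided by the cyclical component. The function name and the choice of an odd window are assumptions made for illustration; the sample values are the first five rows of the life expectancy and trend series reported in Table 4.

```python
import numpy as np

def decompose_multiplicative(actual, trend, order=5):
    """Split Y = T * C * I given the trend T.

    Returns the ratio C*I = Y / T, the cyclical component C (centered
    moving average of C*I, NaN where the window does not fit) and the
    irregular component I = (C*I) / C. `order` is assumed to be odd.
    """
    actual = np.asarray(actual, dtype=float)
    trend = np.asarray(trend, dtype=float)
    ci = actual / trend
    half = order // 2
    cyclical = np.full(len(ci), np.nan)
    for i in range(half, len(ci) - half):
        cyclical[i] = ci[i - half:i + half + 1].mean()
    irregular = ci / cyclical
    return ci, cyclical, irregular

# First five observations (1960-1964) and their trend values
actual = [41.17195122, 41.7904878, 42.41741463, 43.05273171, 43.69841463]
trend = [40.8575, 41.62992, 42.38549, 43.12462, 43.84768]
ci, cyclical, irregular = decompose_multiplicative(actual, trend, order=5)
```

With a window of five, the first and last two observations have no cyclical or irregular value, which matches the blank cells at both ends of a decomposition table built this way.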

Exponential Smoothening
Brown’s Method
Brown’s method of exponential smoothening involves the calculation of simple exponential
smoothening (SES) and double exponential smoothening (DES) values. The initial values of SES and
DES have to be assumed in Brown’s method; in our study, we use the first observation as the
initial value for both. With Y_t denoting the actual value in period t and α the smoothening
constant, the single and double smoothed values are

$$SES_t = \alpha Y_t + (1-\alpha)\,SES_{t-1}$$

$$DES_t = \alpha\,SES_t + (1-\alpha)\,DES_{t-1}$$

The adjustment coefficient a_t (the level) is determined by

$$a_t = 2\,SES_t - DES_t$$

and the trend estimate b_t is calculated by

$$b_t = \frac{\alpha}{1-\alpha}\,(SES_t - DES_t)$$

The forecasts are derived using the equation below, where m is the number of periods ahead for
which the forecast is to be made.

$$F_{t+m} = a_t + b_t\,m$$

An appropriate value of α, which lies between 0 and 1, is selected by trial and error. The
objective is to forecast the data for a period of 5 years beyond 2014, i.e., up to 2019.
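
The recursions of Brown’s method can be sketched as follows. This is a hedged illustration rather than the spreadsheet used in the study: the function name and return layout are hypothetical, SES is assumed to follow the standard recursion SES_t = α·Y_t + (1-α)·SES_{t-1}, and α must be strictly less than 1 because the trend formula divides by (1 - α).

```python
def brown_forecast(series, alpha, horizon):
    """Brown's double exponential smoothing.

    SES and DES are both initialised with the first observation.
    Returns the final SES, DES, level a, trend b and the forecasts
    F(t+m) = a + b*m for m = 1..horizon.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ses = des = float(series[0])
    for y in series[1:]:
        ses = alpha * y + (1 - alpha) * ses      # single smoothing
        des = alpha * ses + (1 - alpha) * des    # double smoothing
    a = 2 * ses - des                            # adjustment coefficient (level)
    b = alpha / (1 - alpha) * (ses - des)        # trend estimate
    forecasts = [a + b * m for m in range(1, horizon + 1)]
    return ses, des, a, b, forecasts

# Two rounded observations (1960, 1961) and alpha = 0.99 as an example
ses, des, a, b, forecasts = brown_forecast([41.17195, 41.79049], 0.99, 2)
```

Running the function over the full series and reading off the forecasts for m = 1 to 5 gives the five-year horizon described above; trying several values of α and comparing error measures reproduces the trial-and-error selection.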

Multiple Regression
As part of the multivariate analysis, we use multiple regression to forecast life expectancy.
Life expectancy is the dependent variable, and the independent variables are per capita income,
GDP growth rate and urban population. Multiple regression is used when the variation in a
dependent variable cannot be explained by a single independent variable. The technique uses the
Ordinary Least Squares (OLS) method to estimate linear equations. OLS finds the coefficients of
the independent variables that give the best-fit line, that is, the line that minimizes the sum
of squared residuals, where a residual is the difference between the value fitted by the line
and the actual value. These residuals constitute the error of the model.

The error should be random. Randomness is checked by examining the mean and variance of the
error. If the mean error is zero and the variance is constant, the error follows a probability
distribution with mean zero and constant standard deviation. Under the further assumption of
normality, the probability of any error value can be obtained from its z-value. The error should
not follow any pattern; a pattern indicates a problem in the model specification or correlation
among the selected independent variables.

The literature review suggested other factors in addition to the above independent variables,
such as the number of physicians per 1000 people, healthcare expenditure, and the average
last-mile distance to a hospital. We have not used them because data for these variables were
not available for the full period under consideration. The data intervals were also
inconsistent: data on the number of physicians were collected every five years, and healthcare
expenditure data were available only from 1990. We have therefore restricted our approach to
three independent variables.

Y = b0 + b1*X1 + b2*X2 + b3*X3

Where,
Y = Life Expectancy
X1 = Urban Population
X2 = GDP per Capita (constant LCU)
X3 = GDP growth rate
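
The OLS estimation of an equation of this shape can be sketched with NumPy. The data below are synthetic and noise-free, generated only so that the example runs on its own; the function name, the demonstration coefficients and the use of numpy.linalg.lstsq in place of a spreadsheet regression tool are all assumptions for illustration.

```python
import numpy as np

def fit_ols(predictors, response):
    """Estimate Y = b0 + b1*X1 + ... + bk*Xk by ordinary least squares.

    Returns the coefficient vector (intercept first) and the residuals.
    """
    X = np.asarray(predictors, dtype=float)
    y = np.asarray(response, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])   # prepend intercept column
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coeffs
    return coeffs, residuals

# Synthetic demonstration data (hypothetical, not the study's dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 3))
true_coeffs = np.array([2.0, 0.5, -1.0, 0.25])       # b0, b1, b2, b3
y_demo = true_coeffs[0] + X_demo @ true_coeffs[1:]

estimated, residuals = fit_ols(X_demo, y_demo)
mean_residual = residuals.mean()
```

When an intercept is included, OLS residuals average to zero by construction, so a zero mean error on its own says little about model adequacy; plotting the residuals against time or against the fitted values is the more informative check for patterns.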

Data Analysis
The initial analysis shows that life expectancy increased steadily over the period from 1960 to
2014. The line graph of the life expectancy data is shown in Figure 1.

[Line graph: Life Expectancy, vertical axis from 0 to 80 years]

Figure 1: Life Expectancy over years

Simple Regression
The output of simple regression is shown in Table 1. Five functional forms are used for the
simple regression of life expectancy on time, with life expectancy as the dependent variable and
time as the independent variable. In the quadratic and cubic forms, time raised to the powers
two and three is added as a regressor. We obtained high values of R2 for all the functional
forms. For the cubic form, R2 was 0.999, which is very close to 1. Hence, we could interpret
that the cubic form explains almost all of the variation in the data.

Simple Regression - Life Expectancy

Equation  Functional Form  Constant   Coefficient T  Coefficient T2  Coefficient T3  R2
1         Linear           42.4506    0.487452                                       0.989313
2         Quadratic        40.67012   0.674871       -0.00335                        0.998705
3         Cubic            40.06785   0.798399       -0.00881        6.51E-05        0.999394
4         Exponential      1.636237   0.00387                                        0.97151
5         Double log       1.538667   0.154927                                       0.917382
Table 1: Simple Regression Functional Form

The ANOVA output for simple regression with cubic form is shown below.

ANOVA
            df   SS            MS            F          Significance F
Regression  3    3326.822788   1108.940929   28021.38   5.25596E-82
Residual    51   2.018315723   0.039574818
Total       54   3328.841104

           Coefficients  Standard Error  t Stat        P-value   Lower 95%    Upper 95%  Lower 95.0%  Upper 95.0%
Intercept  40.06784538   0.115049538     348.2660688   8.85E-88  39.83687     40.29882   39.83688     40.29882
Period(T)  0.798398317   0.017632106     45.28043559   7.42E-43  0.763000     0.833797   0.763001     0.833797
T2         -0.00881206   0.0007281       -12.10278751  1.3E-16   -0.010273    -0.00735   -0.01027     -0.00735
T3         6.50626E-05   8.55116E-06     7.608626473   5.97E-10  4.78954E-05  8.22E-05   4.79E-05     8.22E-05
Table 2: ANOVA output for Simple Regression
The absolute t-statistics of all the coefficients are well above the critical value (about 1.96
at the 95% confidence level).

The F-statistic is 28021.38, which is also very high. These values indicate that time (T) and
its square and cube are each significant. The regression equation has an intercept of 40.06784.
The regression equation formed from the above output is

LifeExpectancy = 40.0678 + 0.7984*Time - 0.008812*Time2 + 0.0000651*Time3

We use the above equation to generate forecasts. To determine the accuracy of the method, Mean
Absolute Percentage Error (MAPE) and Theil U2 are used for comparison with the Naïve method.
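
As a rough sketch, the fitted cubic can be evaluated for the forecast periods as shown below. The coefficients are those reported in the regression output, T = 1 is assumed to correspond to 1960 (so that year = 1959 + T), and the function and variable names are hypothetical.

```python
# Coefficients of the cubic form from the regression output
INTERCEPT = 40.06784538
COEF_T = 0.798398317
COEF_T2 = -0.00881206
COEF_T3 = 6.50626e-05

def cubic_trend(period):
    """Fitted life expectancy for time period T (T = 1 is assumed to be 1960)."""
    return (INTERCEPT
            + COEF_T * period
            + COEF_T2 * period ** 2
            + COEF_T3 * period ** 3)

# Forecast periods 56 to 60 correspond to the years 2015 to 2019
forecasts = {1959 + period: cubic_trend(period) for period in range(56, 61)}
```

Evaluating the same function for periods 1 to 55 gives the in-sample fitted values that enter the error measures.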

Method             MAPE        U2 Numerator   U2 Denominator   Theil U2
Simple Regression  0.2933087   1.27E-05       9.94867E-05      0.356671
Table 3: Measures of Relative Error Simple Regression

The MAPE came out to be 0.29%, which is very low. This suggests that the method is suited to
this purpose. The low MAPE can also be attributed to the nature of the data: simple regression
equations are good at estimating trends, so a very low MAPE indicates the presence of a strong
trend in the data. The coefficients point to an increasing trend, as they are all positive
except for that of T2, which is negative but small in magnitude. Its effect on life expectancy
is minor, and hence we can say that there is an increasing trend in life expectancy.

Theil U2 for the data is 0.36, which is less than 1. This indicates that the method is better
than the Naïve method of forecasting, in which the forecast for the next period equals the most
recent actual value. A value of 0.36 for U2 is very desirable. The entire dataset and output can
be found in Annexure 2.
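
The two accuracy measures can be sketched as below. The exact formulas used in the study are not stated, so this block assumes the common textbook definitions: MAPE as the mean of absolute percentage errors, and Theil U2 as the square root of the ratio between the squared relative errors of the method and those of the Naïve forecast. The function names and the small demonstration series are hypothetical.

```python
import math

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)

def theil_u2(actual, forecast):
    """Theil's U2: below 1 means the method beats the Naive forecast."""
    pairs = range(len(actual) - 1)
    numerator = sum(((forecast[t + 1] - actual[t + 1]) / actual[t]) ** 2
                    for t in pairs)
    denominator = sum(((actual[t + 1] - actual[t]) / actual[t]) ** 2
                      for t in pairs)
    return math.sqrt(numerator / denominator)

# Hypothetical demonstration series
actual = [10.0, 12.0, 15.0]
naive = [10.0, 10.0, 12.0]       # each forecast repeats the previous actual
u2_naive = theil_u2(actual, naive)
u2_perfect = theil_u2(actual, actual)
demo_mape = mape([100.0, 200.0], [110.0, 180.0])
```

Under this definition the Naïve forecast scores exactly 1 and a perfect forecast scores 0, which is why values below 1 are read as an improvement over the Naïve method.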

Decomposition
Decomposition technique output is attached in Table 4. The data shows a very strong increasing
trend over 55 years.
Year  Life Expectancy  Period(T)  Trend(T)  CI  Cyclical C (5)  Irregularity(I)
1960 41.17195122 1 40.8575 100.77%
1961 41.7904878 2 41.62992 100.39%
1962 42.41741463 3 42.38549 100.08% 100.14% 99.93%
1963 43.05273171 4 43.12462 99.83% 99.90% 99.93%
1964 43.69841463 5 43.84768 99.66% 99.72% 99.94%
1965 44.3535122 6 44.55506 99.55% 99.61% 99.94%
1966 45.0185122 7 45.24717 99.49% 99.54% 99.95%
1967 45.69092683 8 45.92438 99.49% 99.53% 99.96%
1968 46.3677561 9 46.58709 99.53% 99.56% 99.97%
1969 47.047 10 47.2357 99.60% 99.63% 99.97%
1970 47.72707317 11 47.87058 99.70% 99.73% 99.97%
1971 48.40741463 12 48.49213 99.83% 99.85% 99.98%
1972 49.08697561 13 49.10074 99.97% 99.98% 99.99%
1973 49.76029268 14 49.69681 100.13% 100.13% 100.00%
1974 50.42341463 15 50.28071 100.28% 100.27% 100.02%
1975 51.06792683 16 50.85285 100.42% 100.40% 100.02%
1976 51.68946341 17 51.41361 100.54% 100.50% 100.03%
1977 52.28463415 18 51.96338 100.62% 100.58% 100.04%
1978 52.84902439 19 52.50255 100.66% 100.62% 100.04%
1979 53.38070732 20 53.03152 100.66% 100.61% 100.05%
1980 53.87470732 21 53.55067 100.61% 100.55% 100.05%
1981 54.32853659 22 54.06039 100.50% 100.45% 100.05%
1982 54.74460976 23 54.56108 100.34% 100.30% 100.03%
1983 55.13234146 24 55.05312 100.14% 100.13% 100.02%
1984 55.49963415 25 55.53691 99.93% 99.94% 99.99%
1985 55.86090244 26 56.01283 99.73% 99.76% 99.97%
1986 56.2305122 27 56.48128 99.56% 99.60% 99.95%
1987 56.61843902 28 56.94264 99.43% 99.49% 99.94%
1988 57.03112195 29 57.39731 99.36% 99.42% 99.94%
1989 57.47304878 30 57.84568 99.36% 99.41% 99.94%
1990 57.94373171 31 58.28813 99.41% 99.46% 99.95%
1991 58.43821951 32 58.72507 99.51% 99.54% 99.97%
1992 58.94507317 33 59.15687 99.64% 99.65% 99.99%
1993 59.45292683 34 59.58393 99.78% 99.78% 100.00%
1994 59.95482927 35 60.00663 99.91% 99.90% 100.02%
1995 60.44436585 36 60.42538 100.03% 100.01% 100.02%
1996 60.91560976 37 60.84056 100.12% 100.10% 100.02%
1997 61.36956098 38 61.25255 100.19% 100.17% 100.02%
1998 61.80721951 39 61.66176 100.24% 100.21% 100.02%
1999 62.22707317 40 62.06856 100.26% 100.23% 100.02%
2000 62.63063415 41 62.47336 100.25% 100.23% 100.02%
2001 63.01985366 42 62.87654 100.23% 100.21% 100.01%
2002 63.39919512 43 63.27849 100.19% 100.18% 100.01%
2003 63.77453659 44 63.6796 100.15% 100.15% 100.00%
2004 64.14780488 45 64.08027 100.11% 100.11% 99.99%
2005 64.52387805 46 64.48088 100.07% 100.08% 99.99%
2006 64.90809756 47 64.88182 100.04% 100.05% 99.99%
2007 65.30043902 48 65.28348 100.03% 100.03% 99.99%
2008 65.69943902 49 65.68626 100.02% 100.02% 100.00%
2009 66.10263415 50 66.09055 100.02% 100.02% 100.00%
2010 66.50614634 51 66.49672 100.01% 100.00% 100.01%
2011 66.90417073 52 66.90519 100.00% 99.98% 100.02%
2012 67.28987805 53 67.31633 99.96% 99.93% 100.03%
2013 67.66041463 54 67.73053 99.90%
2014 68.01380488 55 68.14819 99.80%
Table 4: Decomposition of Life Expectancy

The trend component was calculated using a regression equation. Five functional forms were
checked for the best fit; since the output would be the same as in the simple regression method,
we have used that output instead. The trend values deviate from the actual values in a wave-like
pattern: the actual value lies below the trend for 1963-1972, for 1984-1994 and for observations
52-55, and above it in the remaining years. This showed that some component other than trend was
involved in determining life expectancy.

A moving average of order 5 was used to calculate cyclicity. The order was determined by trial
and error: four moving averages, of order 3, 4, 5 and 12, were compared using Theil U2 as the
measure. Orders 3 and 12 were tried first to establish the direction in which the order should
be moved.

Theil Coefficient for Moving Averages for Cyclicity

      Order=3       Order=4    Order=5       Order=12
U2    0.210575473   0.219432   0.184207952   0.200797
Table 5: Order determination for Cyclicity
From the above table it can be seen that moving average with order 5 has the least value among
the four and hence we have selected order of moving average as 5 for calculating cyclicity.

The cyclical component deviated from 100% by less than 1 percent in all cases, and so did the
random (irregular) component. This indicates that about 98% of the variation in life expectancy
was explained by the trend component. Trend was the only significant component, while the random
and cyclical components were not very important.

Exponential Smoothening
Brown’s Method
The output of Brown’s method is calculated for various values of α. MAPE and Theil U2 are used
together as relative measures of the accuracy of the method for each value of α.

Alpha MAPE Theil U2
0.1 1.938320254 2.839708267
0.2 0.799373269 1.233985209
0.5 0.168576529 0.31808415
0.6 0.113573691 0.224184435
0.7 0.078257313 0.15616104
0.8 0.054471734 0.102959274
0.9 0.037835335 0.059778912
0.95 0.031241748 0.042654101
0.97 0.028862383 0.037361929
0.99 0.026614954 0.033517516
Table 6: Brown’s Method Smoothening Constant

Table 6 shows MAPE and U2 for various values of α. As α approaches 1, both MAPE and Theil U2
decrease. With α close to 1, DES is approximately equal to SES; the only difference between them
comes from the previous value of DES. The output of Brown’s method is shown in Table 7; the
tabulated values are consistent with α = 0.99, the setting with the lowest MAPE and U2 in
Table 6.

Brown’s Method
Year Time Actual SES DES a b Forecast
1960 1 41.17195 41.17195 41.17195 41.17195
1961 2 41.79049 41.7843 41.77818 41.79043 0.606228 41.17195
1962 3 42.41741 42.41108 42.40475 42.41741 0.626576 42.39665
1963 4 43.05273 43.04632 43.0399 43.05273 0.635145 43.04399
1964 5 43.69841 43.69189 43.68537 43.69841 0.645474 43.68788
1965 6 44.35351 44.3469 44.34028 44.35351 0.654907 44.34389
1966 7 45.01851 45.0118 45.00508 45.01851 0.6648 45.00842
1967 8 45.69093 45.68414 45.67734 45.69093 0.672264 45.68331
1968 9 46.36776 46.36092 46.35408 46.36776 0.676739 46.36319
1969 10 47.047 47.04014 47.03328 47.047 0.679195 47.04449
1970 11 47.72707 47.7202 47.71333 47.72707 0.680056 47.72619
1971 12 48.40741 48.40054 48.39367 48.40741 0.680336 48.40713
1972 13 49.08698 49.08011 49.07325 49.08698 0.679576 49.08775
1973 14 49.76029 49.75349 49.74669 49.76029 0.673442 49.76655
1974 15 50.42341 50.41672 50.41002 50.42342 0.663327 50.43373
1975 16 51.06793 51.06141 51.0549 51.06793 0.644886 51.08674
1976 17 51.68946 51.68318 51.6769 51.68947 0.621999 51.71281
1977 18 52.28463 52.27862 52.2726 52.28464 0.595702 52.31147
1978 19 52.84902 52.84332 52.83761 52.84903 0.565011 52.88034
1979 20 53.38071 53.37533 53.36996 53.38071 0.532343 53.41404
1980 21 53.87471 53.86971 53.86472 53.87471 0.49476 53.91305
1981 22 54.32854 54.32395 54.31936 54.32854 0.45464 54.36947
1982 23 54.74461 54.7404 54.73619 54.74461 0.416837 54.78318
1983 24 55.13234 55.12842 55.1245 55.13234 0.388307 55.16145
1984 25 55.49963 55.49592 55.49221 55.49964 0.367708 55.52065
1985 26 55.8609 55.85725 55.8536 55.8609 0.361394 55.86734
1986 27 56.23051 56.22678 56.22305 56.23051 0.369446 56.2223
1987 28 56.61844 56.61452 56.61061 56.61844 0.38756 56.59996
1988 29 57.03112 57.02696 57.02279 57.03112 0.412185 57.006
1989 30 57.47305 57.46859 57.46413 57.47305 0.441337 57.4433
1990 31 57.94373 57.93898 57.93423 57.94373 0.470102 57.91438
1991 32 58.43822 58.43323 58.42824 58.43822 0.494005 58.41383
1992 33 58.94507 58.93995 58.93484 58.94507 0.5066 58.93222
1993 34 59.45293 59.4478 59.44267 59.45293 0.50783 59.45167
1994 35 59.95483 59.94976 59.94469 59.95483 0.502021 59.96076
1995 36 60.44437 60.43942 60.43447 60.44437 0.489784 60.45685
1996 37 60.91561 60.91085 60.90608 60.91561 0.471612 60.93415
1997 38 61.36956 61.36497 61.36038 61.36956 0.454301 61.38722
1998 39 61.80722 61.8028 61.79837 61.80722 0.437988 61.82386
1999 40 62.22707 62.22283 62.21859 62.22707 0.420213 62.24521
2000 41 62.63063 62.62656 62.62248 62.63064 0.403891 62.64729
2001 42 63.01985 63.01592 63.01199 63.01986 0.38951 63.03453
2002 43 63.3992 63.39536 63.39153 63.3992 0.379542 63.40936
2003 44 63.77454 63.77074 63.76695 63.77454 0.375424 63.77874
2004 45 64.1478 64.14403 64.14026 64.14781 0.373311 64.14996
2005 46 64.52388 64.52008 64.51628 64.52388 0.376018 64.52112
2006 47 64.9081 64.90422 64.90034 64.9081 0.384057 64.8999
2007 48 65.30044 65.29648 65.29252 65.30044 0.392177 65.29215
2008 49 65.69944 65.69541 65.69138 65.69944 0.398865 65.69262
2009 50 66.10263 66.09856 66.09449 66.10263 0.40311 66.0983
2010 51 66.50615 66.50207 66.49799 66.50615 0.403505 66.50574
2011 52 66.90417 66.90015 66.89613 66.90417 0.398133 66.90965
2012 53 67.28988 67.28598 67.28208 67.28988 0.385954 67.3023
2013 54 67.66041 67.65667 67.65292 67.66042 0.370842 67.67583
2014 55 68.0138 68.01023 68.00666 68.01381 0.353736 68.03126
2015 56 68.36754
2016 57 68.72128
2017 58 69.07501
2018 59 69.42875
2019 60 69.78249
Table 7: Forecast using Brown’s Method

When forecasting with Brown’s method, it is necessary to decide the period for which forecasts
are to be generated. We selected a period of 5 years, as the study aims to predict life
expectancy up to the year 2019. Brown’s method is well suited to data with a trend. We have not
used Winters’ method, as it is more effective when the data exhibit seasonality as well as
trend. The current data show only trend, as observed in the decomposition, and hence the choice
among exponential smoothening techniques is between Brown’s and Holt’s methods. Since the values
of U2 and MAPE for Brown’s method came out very low, we did not feel the need to apply Holt’s
method.
[Chart: Life Expectancy (Years), Series1 and Series2]

Figure 2: Brown’s Method Forecast


Multiple Regression
For the multivariate analysis, multiple regression was run with life expectancy as the dependent
variable and per capita income, GDP growth rate and urban population as the independent
variables. The regression output is shown below:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.990211452
R Square            0.98051872
Adjusted R Square   0.979372763
Standard Error      1.127639257
Observations        55

ANOVA
            df   SS            MS            F          Significance F
Regression  3    3263.991019   1087.997006   855.6326   1.39048E-43
Residual    51   64.85008502   1.271570294
Total       54   3328.841104

                               Coefficients   Standard Error  t Stat        P-value   Lower 95%     Upper 95%
Intercept                      38.74972043    0.396868211     97.63876104   1.15E-59  37.95297425   39.54647
Urban population               1.17918E-07    4.58281E-09     25.73050452   7.08E-31  1.08718E-07   1.27E-07
GDP per capita (constant LCU)  -0.000261061   2.56684E-05     -10.17052375  7.22E-14  -0.000312593  -0.00021
GDP Growth Rate                -0.011614199   0.05602744      -0.207294844  0.836605  -0.124093979  0.100866
Table 8: ANOVA output for Multiple Regression

GDP per capita at constant LCU means that GDP is measured in constant local currency units. A
95% confidence level is used. A t-test has been performed to check the significance of each
variable in the multiple regression: urban population and GDP per capita are significant at this
level, whereas GDP growth rate is not (p-value 0.84). The regression line has a non-zero
intercept, so it does not pass through the origin. Since the data concern average life
expectancy, the intercept should not be zero, and this is confirmed by the regression output.
Year  Life expectancy at birth, total (years)  Urban population  GDP per capita (constant LCU)  GDP Growth Rate  Forecasted Value
1960 41.17195122 80597394 15203.85464 3.213567412 44.24715548
1961 41.7904878 82711244 15459.4171 3.722742532 44.42378562
1962 42.41741463 85270104 15594.24675 2.931127737 44.69951678
1963 43.05273171 87926199 16194.24668 5.994353262 44.82050495
1964 43.69841463 90685977 17045.80901 7.452950122 44.90668254
1965 44.3535122 93534323 16255.53236 -2.635770112 45.56603708
1966 45.0185122 96479620 15911.55488 -0.05532877 45.97317037
1967 45.69092683 99528560 16801.22849 7.825963033 46.00890156
1968 46.3677561 102693004 17006.41993 3.387929174 46.38002356
1969 47.047 105995689 17731.69905 6.539700298 46.5435225
1970 47.72707317 109459181 18238.92262 5.157229736 46.83557091
1971 48.40741463 113270086 18124.28585 1.642930383 47.35568866
1972 49.08697561 117821286 17613.80971 -0.553301313 48.05113058
1973 49.76029268 122565619 17775.75103 3.295521136 48.52359585
1974 50.42341463 127509099 17572.07334 1.18533626 49.18420218
1975 51.06792683 132621821 18740.06246 9.149912014 49.38966604
1976 51.68946341 137905348 18618.1223 1.663103637 50.13147675
1977 52.28463415 143368899 19517.09103 7.254764585 50.47609998
1978 52.84902439 149029915 20166.38719 5.712532089 50.99204218
1979 53.38070732 154913681 18677.49148 -5.238182703 52.20172141
1980 53.87470732 161046127 19481.77618 6.735821528 52.57581199
1981 54.32853659 167094674 20179.22046 6.006203623 53.11544365
1982 54.74460976 172694859 20401.913 3.475733241 53.74705998
1983 55.13234146 178465642 21389.0025 7.288892902 54.12556239
1984 55.49963415 184383497 21704.12705 3.820737855 54.78139783
1985 55.86090244 190422087 22335.66783 5.254299224 55.31193662
1986 56.2305122 196583439 22889.71615 4.77656417 55.89937972
1987 56.61843902 202861559 23284.14386 3.965355634 56.54613565
1988 57.03112195 209262114 24984.39527 9.627782919 56.79124298
1989 57.47304878 215784885 25918.01808 5.947343329 57.35940863
1990 57.94373171 222412636 26790.88584 5.533454563 57.91787574
1991 58.43821951 229041105 26528.21866 1.056831432 58.82005696
1992 58.94507317 235534919 27428.55893 5.482396022 59.29935209
1993 59.45292683 242129413 28171.76566 4.750776219 59.89143724
1994 59.95482927 248838086 29469.81154 6.658924067 60.32148045
1995 60.44436585 255660006 31099.19528 7.574491841 60.68990611
1996 60.91560976 262616315 32818.06808 7.549522248 61.06174015
1997 61.36956098 269690046 33513.11926 4.049820849 61.75505662
1998 61.80721951 276868152 34934.99965 6.184415821 62.20549611
1999 62.22707317 284132133 37342.86189 8.845755561 62.40254256
2000 62.63063415 291466608 38096.07455 3.840991157 63.12890188
2001 63.01985366 299249745 39248.04848 4.823966264 63.73452271
2002 63.39919512 307913082 40057.10886 3.803975321 64.55671932
2003 63.77453659 316683356 42497.06081 7.860381475 64.90680506
2004 64.14780488 325568976 45129.15002 7.922936613 65.26671799
2005 64.52387805 334543792 48547.54302 9.284831507 65.41678461
2006 64.90809756 343617891 52234.1983 9.263958898 65.52458537
2007 65.30043902 352796785 55884.37637 8.6082046 65.66163978
2008 65.69943902 362065825 57215.65302 3.890957062 66.46187006
2009 66.10263415 371381904 61192.6698 8.479783897 66.46886458
2010 66.50614634 380743507 66550.06955 10.25996306 66.15348306
2011 66.90417073 390151214 70031.40088 6.6383638 66.39604391
2012 67.28987805 399686039 73021.16706 5.618562773 66.75170503
2013 67.66041463 409362870 76900.68691 6.638812736 66.86813757
2014 68.01380488 419234061 81465.4502 7.243471746 66.83342501
2015 429329716 86573.69454 7.56336718 66.68660634
Table 9: Multiple Regression for Life Expectancy
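
The forecasted values in Table 9 follow from applying the coefficients in Table 8 to each row of inputs. The sketch below reproduces that arithmetic for the 1960 and 2015 rows; the dictionary keys and function name are hypothetical, and small differences from the tabulated values can arise because the coefficients are reported to limited precision.

```python
# Coefficients as reported in the regression output (Table 8)
COEFFICIENTS = {
    "intercept": 38.74972043,
    "urban_population": 1.17918e-07,
    "gdp_per_capita": -0.000261061,   # constant LCU
    "gdp_growth_rate": -0.011614199,
}

def predict_life_expectancy(urban_population, gdp_per_capita, gdp_growth_rate):
    """Apply Y = b0 + b1*X1 + b2*X2 + b3*X3 with the reported coefficients."""
    return (COEFFICIENTS["intercept"]
            + COEFFICIENTS["urban_population"] * urban_population
            + COEFFICIENTS["gdp_per_capita"] * gdp_per_capita
            + COEFFICIENTS["gdp_growth_rate"] * gdp_growth_rate)

# Input rows for 1960 and 2015 taken from Table 9
forecast_1960 = predict_life_expectancy(80597394, 15203.85464, 3.213567412)
forecast_2015 = predict_life_expectancy(429329716, 86573.69454, 7.56336718)
```

Note that a forecast for a future year requires projected values of all three independent variables for that year, which is an additional source of uncertainty compared with the univariate methods.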

ARIMA
We have not used the ARIMA method for univariate analysis because, given the strong trend in
the data, exponential smoothening and regression techniques were expected to yield accurate
results. When the data were fed into EViews for ARIMA, we saw strong autocorrelation but almost
no partial autocorrelation beyond the first lag. Taking the first difference of the series did
not achieve stationarity, as a wave-like (helix) pattern was still observed in the
autocorrelations. To our knowledge, such a pattern is indicative of seasonality, but since the
data are annual there should be no seasonality present. On checking the individual values, we
found only increasing values, as shown in Figure 4. The correlogram of life expectancy is shown
in Figure 3; it was generated after creating a series with a lag length of 5. The pattern
suggests a combination of trend and seasonality. Trend should be eliminated by differencing,
but the correlogram shows that this did not happen, and hence we have not used this method for
forecasting. Instead, we have used exponential smoothening methods to forecast life expectancy.

Equation used to generate correlogram is

life_expectancy_at_birth1 = life_expectancy_at_birth - life_expectancy_at_birth(-5)
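
The differencing step and the autocorrelation coefficients shown in a correlogram can be sketched outside EViews as follows. This is an illustrative NumPy version under stated assumptions: the function names are hypothetical, the autocorrelation uses the usual sample estimator (deviations from the overall mean, divided by the total sum of squares), and a purely linear series stands in for the trending data.

```python
import numpy as np

def lag_difference(series, lag=1):
    """Return x(t) - x(t - lag); the result is `lag` observations shorter."""
    x = np.asarray(series, dtype=float)
    return x[lag:] - x[:-lag]

def autocorrelation(series, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    deviations = x - x.mean()
    total = np.sum(deviations ** 2)
    return [float(np.sum(deviations[k:] * deviations[:len(x) - k]) / total)
            for k in range(max_lag + 1)]

# A linear series of 55 points as a stand-in for strongly trending data
linear_series = np.arange(1, 56, dtype=float)
acf_levels = autocorrelation(linear_series, max_lag=5)
differenced = lag_difference(linear_series, lag=5)
```

A trending series gives autocorrelations that start close to 1 and decay slowly, then turn negative at long lags, so a slowly decaying wave in the correlogram is usually a sign of remaining trend rather than seasonality. For a purely linear trend, differencing at any lag leaves a constant series.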


Lag AC PAC Q-Stat Prob
1 0.943 0.943 51.636 0.000
2 0.886 -0.030 98.078 0.000
3 0.829 -0.030 139.52 0.000
4 0.772 -0.029 176.20 0.000
5 0.716 -0.028 208.37 0.000
6 0.661 -0.027 236.29 0.000
7 0.606 -0.027 260.26 0.000
8 0.552 -0.026 280.57 0.000
9 0.499 -0.026 297.53 0.000
10 0.447 -0.026 311.45 0.000
11 0.396 -0.025 322.64 0.000
12 0.347 -0.025 331.41 0.000
13 0.299 -0.025 338.07 0.000
14 0.252 -0.024 342.92 0.000
15 0.207 -0.024 346.27 0.000
16 0.163 -0.024 348.41 0.000
17 0.121 -0.023 349.62 0.000
18 0.081 -0.023 350.17 0.000
19 0.042 -0.023 350.32 0.000
20 0.005 -0.024 350.32 0.000
21 -0.031 -0.024 350.41 0.000
22 -0.065 -0.025 350.81 0.000
23 -0.098 -0.026 351.75 0.000
24 -0.129 -0.027 353.43 0.000
25 -0.159 -0.028 356.07 0.000
26 -0.188 -0.029 359.87 0.000
27 -0.215 -0.029 365.06 0.000
28 -0.242 -0.029 371.83 0.000
29 -0.267 -0.029 380.40 0.000
30 -0.290 -0.028 390.95 0.000
31 -0.312 -0.027 403.67 0.000
32 -0.332 -0.025 418.73 0.000
33 -0.351 -0.024 436.24 0.000
34 -0.367 -0.022 456.31 0.000
35 -0.380 -0.020 479.00 0.000
36 -0.392 -0.017 504.31 0.000
37 -0.400 -0.015 532.22 0.000
38 -0.406 -0.013 562.65 0.000
39 -0.409 -0.010 595.45 0.000
40 -0.409 -0.007 630.45 0.000
41 -0.406 -0.004 667.41 0.000
42 -0.400 -0.001 706.01 0.000
43 -0.391 0.002 745.91 0.000
44 -0.378 0.006 786.67 0.000
45 -0.362 0.010 827.80 0.000
46 -0.343 0.014 868.74 0.000
47 -0.320 0.019 908.85 0.000
48 -0.293 0.024 947.37 0.000
49 -0.263 0.030 983.51 0.000
50 -0.229 0.036 1016.3 0.000
51 -0.191 0.042 1044.9 0.000
52 -0.149 0.048 1068.0 0.000
53 -0.103 0.055 1084.7 0.000
54 -0.053 0.062 1093.6 0.000

Figure 3: Correlogram of Life Expectancy


[Line graph: Life expectancy at birth, total (years), vertical axis from 30 to 80, horizontal axis observations 5 to 55]

Figure 4: Life Expectancy graph in EViews

Findings and Interpretations


All three univariate techniques used gave results showing a significant trend component. In the
decomposition, almost 98% of the total variation in the data was explained by the trend
component; only about 2% was attributable to cyclical and random variations.

References
Lee, R. D., & Carter, L. (1992). Modeling and Forecasting U.S. Mortality. Journal of the
American Statistical Association, 87(419), 659-671.

Nunes, L. C., & Coelho, E. (2011, July). Forecasting mortality in the event of a structural
change. Journal of the Royal Statistical Society. Series A (Statistics in Society), 174(3),
713-736. Retrieved from http://www.jstor.org/stable/23013518

Riley, J. C. (2011). Rising Life Expectancy: A Global History. New York: Cambridge University
Press.
