
AMERICAN JOURNAL OF SCIENTIFIC AND INDUSTRIAL RESEARCH

© 2017, Science Huβ, http://www.scihub.org/AJSIR


ISSN: 2153-649X, doi:10.5251/ajsir.2017.8.3.34.46

Stochastic models of Nigerian total livebirths


Adekanmbi, D.B
Department of Statistics,
Ladoke Akintola University of Technology,
Ogbomoso, Oyo-State, Nigeria.

ABSTRACT

This study models Nigerian total livebirths data and selects the more appropriate model for
the disaggregated livebirths series from the proposed univariate stochastic time series models,
based on in-sample fitting. Forecasts of demographic variables such as births matter because
population growth determines the demands placed on systems such as
education, health, the economy, and the provision of social amenities for the future population.
The Dickey-Fuller test confirmed the stationarity of the livebirths series after
subjecting it to non-seasonal differencing and a Box-Cox variance stabilizing transformation.
Correlogram visual analysis method was employed to identify reasonable models for the total
livebirths series; and the identified univariate models are ARIMA(1,1,0), and
SARIMA(1,1,0)(0,0,1)4. The diagnostic checks to evaluate the adequacy of the fits of the models
revealed that the residuals of the two univariate models were indeed serially uncorrelated. The
results of other measures of adequacy verification and forecast accuracy suggest that the two
univariate models provided adequate predictive models for the Nigerian total livebirths series,
with the SARIMA model outperforming the ARIMA model. Thus the SARIMA model was chosen
as the more appropriate model in fitting the livebirths data because of its overall fit as confirmed
by its measures of forecast performance.

Keywords: Autoregressive Integrated Moving Average (ARIMA) model, Seasonal Autoregressive
Integrated Moving Average (SARIMA), Augmented Dickey Fuller (ADF) test, Autocorrelation
function (acf), Partial autocorrelation function (pacf), Box-Cox variance stabilizing transformation.

Am. J. Sci. Ind. Res., 2017, 8(3):34-46

INTRODUCTION

Fertility is a vital component of demographic change and of predicting the size and structure of a
population. Four demographic terms, namely fertility, livebirths, natality and reproduction, have
overlapping meanings relating to total births, including live-births and still-births. Fertility, which
is the actual birth performance in a population, is also defined as the frequency of child bearing
among a population. Natality in a broader sense is the role of births in population change and human
reproduction, [36]. The demographic definition of a livebirth recommended by the United Nations and
the World Health Organisation is the complete expulsion or extraction from its mother of a product of
conception, irrespective of duration of pregnancy, which after such separation breathes or shows any
other evidence of life, such as beating of the heart, pulsation of the umbilical cord, or definite
movement of voluntary muscles, whether or not the umbilical cord has been cut or the placenta is
attached; each product of such a birth is considered a livebirth, [29]. Fertility rate refers to the
relative frequency with which births occur in a given population, while birth rate refers to the rate
of incidence of births within the total population. Reproduction refers to the ability of a population
to grow and replace itself; it encompasses the processes of birth and death and also examines the
extent to which fertility balances the force of mortality and permits population growth.

Nigeria had a population close to 124 million in 2003; it is one of the 10 most populous countries in
the world, and the most populous country in Africa. The crude birth rate, which is the number of births
per thousand of a population, stood at 36.1 for Nigeria in 2010, [31]. The estimated total fertility
rate per woman in Nigeria has also declined from 7.2 in 1960 to 4.0 in 2013, [31]. This indicates that
although the number of livebirths is increasing in Nigeria, there is a sharp reduction in the number of
children a woman between ages 15-49 would have during her reproductive life if, for all of her
childbearing years, she were to experience the age-specific birth rates for that given year. Between
1980 and 2003, the birthrate among Nigerian women aged 15-19 declined by 27%. However, because the
Nigerian population increased rapidly, the annual number of births to teenage women increased by 50%
over this period, [32]. The adolescent birthrate therefore contributes significantly to the total birth
rate of women in Nigeria. In recent years, the age at marriage of females has increased, which has
consequently led to a decrease in the fertility rate in Nigeria. A decrease in the total fertility rate
may be due to postponement of births among females of reproductive age in the country. The estimated
crude birth rate of Nigeria, which stood at 46.2 in 1970, had reduced to 41.5 as at 2012, [30].

The main sources of data on livebirths are the vital statistics registration system, national censuses,
and national sample surveys. Most developing nations in Africa now have a vital registration system
which provides birth statistics among other relevant vital statistics that can be collected through the
registration system. In Nigeria, the National Population Commission is the agency with the statutory
mandate to establish and maintain a uniform system of vital registration for the nation, with a view to
providing vital statistics on a regular basis for the purpose of socio-economic planning, [24]. The
National Population Commission reported a total of 9,936,221 registered livebirths in Nigeria during
the period 1994-2007.

Modelling of a demographic phenomenon can contribute to understanding such a demographic variable by
revealing something about the process that builds persistence into the series. Time series modelling is
a critical step in direct projection of demographic phenomena such as total livebirths. Time series
models are useful statistical instruments for short term forecasts of births, to indicate the number of
children to be born if movements observed in the recent past continue in the near future, [21].
Univariate time series models use only past values of the variable under consideration to forecast
future values. The two univariate stochastic time series models considered in this study are the
Autoregressive Integrated Moving Average (ARIMA) model and the Seasonal Autoregressive Integrated Moving
Average (SARIMA) model. The ARIMA model, a forecasting instrument in time series, is adequate for
calculating projections for a particular variable based on observations of that same variable in the
past, [5]. Several researchers have proposed ARIMA models for predicting future demographic variables
of different populations, [4, 21, 22, 25]. The seasonal ARIMA (SARIMA) model is a generalization of the
non-seasonal ARIMA model, [18]. The SARIMA model is capable of describing a wide range of series
containing random changes in the level and slope of their seasonal pattern. In seasonal data there are
two time intervals of importance, [23]. A SARIMA model may be thought of as describing two effects
simultaneously. When dealing with a quarterly series, the quarter-to-quarter behavior is assumed to be
described by a non-seasonal ARIMA model with parameters (p, d, q), while the residuals from this model
are assumed to be represented by a year-to-year ARIMA model with parameters (P, D, Q).

The focus of this study is to propose univariate stochastic time series models, namely the ARIMA model
and the SARIMA model, for projecting total livebirths of Nigeria, and to conduct an empirical evaluation
of the forecasting power of the models based on measures of forecast performance. This article is
organised into eight sections. The data employed and its disaggregation are described in section 2.
Section 3 focuses on the theory of the univariate time series models, which are the ARIMA and SARIMA
models. Overviews of the theories of diagnostic tests, measures of adequacy of time series models and
model selection are presented in sections 4, 5 and 6 respectively. Results of the analyses and an
extensive discussion of the models are presented in section 7. Conclusions and recommendations based on
the study are given in section 8.

Data: Data on livebirths were extracted from the reports on vital statistics registration of livebirths
published by the National Population Commission in Nigeria, [24]. According to the Commission's
statistical report on livebirths, birth registration coverage stood at 35% as at 2007. The data were
recorded on an annual basis from 1994 to 2007. Time series models are adequate for short term
forecasting, but fifty or more observations are needed to obtain good estimates of the parameters of a
time series model, [5, 6, 8, 13, 25]. It is therefore necessary to disaggregate the annual observations
on livebirths to quarterly figures so as to have a series with a high caseload that will not be affected
by random fluctuations, and to uncover the possible pattern of total livebirths in Nigeria. The
Boot-Feibes-Lisman first difference (BFL-FD) method [3], which is a non-model based method suitable for
disaggregating annual series into quarterly figures, was employed to disaggregate
the annual data on registered livebirths into quarterly figures.

Univariate Time Series Models: Univariate stochastic time series models attempt to forecast a time
series $y_t$ from its past history only. A forecast from a univariate time series can be written as:

$$\text{Forecast} = \pi_1 y_t + \pi_2 y_{t-1} + \pi_3 y_{t-2} + \pi_4 y_{t-3} + \cdots \qquad (1)$$

A univariate model must be developed so that the weights $\pi_i$ can be expressed in terms of a few
parameters, in order to achieve parsimony. A parsimonious model is one which represents the data
adequately with the minimum number of parameters. ARIMA and SARIMA models are referred to as univariate
because only one variable, depending on its past values, is included. The models are useful for
out-of-sample forecasting and for descriptive analyses. The two univariate time series models considered
in this study are: (i) ARIMA (ii) SARIMA. The univariate models use the total livebirths as the variable
for the reference period. The main objective of a time series model is forecasting, which is to estimate
the future value of a series as accurately as possible from the current and past values of the observed
time series, [5, 7].

ARIMA Modelling: Autoregressive Integrated Moving Average (ARIMA) models, which are the most general
class of univariate time series models, were first introduced by Box and Jenkins, [5]. ARIMA models may
include autoregressive terms, moving average terms, and a differencing operation. ARIMA models are
capable of describing a wide class of stationary and non-stationary time series containing stochastic
trends, and have become popular time series models for forecasting demographic variables, [21, 22, 25].
They are capable of describing series whose slope, level and higher derivatives are being continuously
modified by random shocks entering the system, [6]. An ARIMA model can be used to forecast a value in a
response series as a linear combination of its own past values and past errors. The models could be
adopted for short term forecasts of the livebirths of a population. $y_t$ can be projected based on the
observed number of livebirths,

$$\phi(B)\,\nabla^d y_t = \theta(B)\,\varepsilon_t \qquad (2)$$

where

$B$: the backward shift operator, such that $B^j y_t = y_{t-j}$.
$\phi(B)$: the autoregressive (AR) operator, such that $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$.
$\theta(B)$: the moving average (MA) operator, such that $\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$.
$\varepsilon_t$: a white noise sequence, also referred to as the disturbance term, such that
$E(\varepsilon_t) = 0$; $E(\varepsilon_t^2) = \sigma_\varepsilon^2$ and $E(\varepsilon_t \varepsilon_{t-k}) = 0$ for $k \neq 0$.
$y_t$: livebirths at time t.

Equation (2) is referred to as an autoregressive integrated moving average (ARIMA) process, written
simply as the ARIMA(p, d, q) model, where

p: order of non-seasonal autoregressive (AR) term.
d: order of non-seasonal differencing, a non-negative integer.
q: order of non-seasonal moving average (MA) term.

If $d = 1$, then writing $w_t = \nabla y_t = y_t - y_{t-1}$ for the differenced series, (2) becomes

$$w_t = \phi_1 w_{t-1} + \cdots + \phi_p w_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q} \qquad (3)$$

The differenced series will be stationary when the zeros of $\phi(B)$ are all outside the unit circle,
and will be invertible when the zeros of $\theta(B)$ are all outside the unit circle, [1, 15, 16, 28].
Stationarity of the series must first be checked by subjecting the series to the Augmented Dickey-Fuller
(ADF) unit root test, [12]. A unit root makes a series non-stationary. If the result of the ADF test
shows that the original series is non-stationary, the degree of differencing d should be chosen to
transform the series into a stationary series. Differencing d times reduces a polynomial trend of degree
d to a constant, and also removes stochastic 'unit root' non-stationarity. In order to determine the
order of integration, d, of a series one could proceed by testing the differenced series until the unit
root hypothesis is rejected. A difference stationary series
is said to be integrated, denoted as I(d), where d is the order of integration. The order of integration
is the number of unit roots contained in a series, and determines the number of differencing operations
required to make the series stationary. A series with one unit root is regarded as I(1), and a
stationary series is I(0), [14]. A stationary series has constant mean, variance and autocorrelations,
so that the autocovariances of observations a fixed interval apart are constant through time.

When a series is characterized by heteroscedasticity, it cannot be seen as the realization of an ARIMA
process even after differencing, because of nonstationarity in variance, which can be removed by
applying the Box-Cox transformation, [7, 8, 11, 13]. The Box-Cox transformation is a variance
stabilizing measure which deals with nonstationarity in variance.

$$z_t(\lambda) = \begin{cases} \dfrac{y_t^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \log y_t & \text{if } \lambda = 0 \end{cases} \qquad (4)$$

The Box-Cox transformation helps to achieve a good level of symmetry in a series, independence of random
effects and stable variance. Apart from the case of $\lambda = 0$, which corresponds to the logarithmic
transformation, other particular values are $\lambda = 0.5$ (square root) and $\lambda = 1/3$ (cubic
root). $\lambda$ can be determined through maximum likelihood estimation or by visual analysis. A good
value of $\lambda$ for treating heteroscedasticity also makes the process closer to Gaussian and reduces
the effects of possible outliers. Mean non-stationarity can therefore be removed by differencing the
series, while variance non-stationarity can be removed by subjecting the series to the Box-Cox variance
stabilizing transformation. Unless the trend is removed, no other components can be recognized from the
acf and pacf.

After achieving stationarity in a series, the first step in ARIMA modelling is to identify an adequate
model, which involves specifying the appropriate structure and order of the model. Identification could
be done by visual inspection of the time series plot of the data, the acf and the pacf, judging the
appropriate model structure and order from the appearance of the plotted acf and pacf. Many popular
statistical computing packages now have facilities for identifying possible models for a series, [19].
The second step is to estimate the parameters of the model. The parameters $(\phi, \theta)$ can be
estimated by least squares regression, and sometimes require a more complicated iterative procedure, [9].

The next step is diagnostic checking of the proposed model, which involves verifying that the residuals
of the model are random and that the estimated parameters are statistically significant. If the ARIMA
model is correctly specified, the residuals from the model should be nearly white noise. This implies
that there should be no serial correlation left in the residuals. If the proposed model fails the
diagnostic checks, one starts over with identification. The estimation process is usually guided by the
principle of parsimony, by which the best model is the simplest possible model which adequately
describes the series. The goal of ARIMA analysis is a parsimonious representation of the process
governing the residuals. Models that survive diagnostic checks could be used to forecast, using measures
of accuracy to assess the performance of such models.

Seasonal ARIMA (SARIMA) Modelling: Many demographic variables are seasonal, and this variation would be
present even if the factors are not causally related. When a series possesses a seasonal component that
repeats every s observations, SARIMA models could be formulated to deal with seasonality in such data.
Seasonality in a series is a regular pattern of changes that repeats over s time periods, where s is the
number of time periods until the pattern repeats again. The fundamental fact about seasonal series with
period s therefore is that observations which are s intervals apart are similar. Seasonality usually
causes a series to be non-stationary because the average values at some particular times within the
seasonal span may be different from the average values at other times, [7]. Seasonal differencing
removes seasonal trend, and can also get rid of a seasonal random walk type of nonstationarity. A series
with both trend and seasonality could be subjected to a non-seasonal difference and a seasonal
difference to induce stationarity. The seasonal ARIMA (SARIMA) model incorporates both non-seasonal and
seasonal factors in a multiplicative fashion. A multiplicative model is one in which, for example, for a
quarterly series, the quarter-to-quarter behavior can be separated from the year-to-year behavior, [18].
This implies that the autoregressive operator, which dictates the weight to be applied to the past
history of the series, can be
factorised into a product of a non-seasonal autoregressive operator and a number of seasonal
autoregressive operators, one for each seasonal period. Similarly, the moving average operator may be
written as the product of non-seasonal and seasonal moving average operators. SARIMA models are
ARIMA(p,d,q) models whose residuals $\varepsilon_t$ are ARIMA(P,D,Q), so that
$y_t \sim \text{SARIMA}(p, d, q) \times (P, D, Q)_s$. The general form of the multiplicative SARIMA model,
which can be written as the SARIMA(p, d, q)(P, D, Q)s model as suggested by reference [5], is:

$$\phi_p(B)\,\Phi_P(B^s)\,\nabla^d \nabla_s^D y_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t, \qquad t = 1, 2, \ldots, T \qquad (5)$$

where

p: order of non-seasonal autoregressive (AR) term.
q: order of non-seasonal moving average (MA) term.
d: order of non-seasonal differencing.
P: order of seasonal autoregressive (AR) term.
Q: order of seasonal moving average (MA) term.
D: order of seasonal differencing.
s: seasonal length or seasonal order.
$B$: backward shift operator, such that $B y_t = y_{t-1}$ and $B \varepsilon_t = \varepsilon_{t-1}$.
$\nabla$: differencing operator, $\nabla = 1 - B$, such that $\nabla y_t = y_t - y_{t-1}$ and
$\nabla^d y_t = \nabla^{d-1}(\nabla y_t) = w_t$.
$\phi_p(B)$: AR polynomial in $B$ of order p, $\phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$.
$\theta_q(B)$: MA polynomial in $B$ of order q, $\theta_q(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q$.
$\Phi_P(B^s)$: seasonal AR polynomial in $B^s$ of order P, $\Phi_P(B^s) = 1 - \Phi_1 B^s - \Phi_2 B^{2s} - \cdots - \Phi_P B^{Ps}$.
$\Theta_Q(B^s)$: seasonal MA polynomial in $B^s$ of order Q, $\Theta_Q(B^s) = 1 - \Theta_1 B^s - \Theta_2 B^{2s} - \cdots - \Theta_Q B^{Qs}$.
$\varepsilon_t$: a sequence of independently and identically distributed random variables, which are
normally distributed with mean zero and variance $\sigma_\varepsilon^2$. It is also referred to as a
white noise process. The assumption of normality of the white noise process is usually made in order to
construct forecast intervals for point forecasts.

The SARIMA modelling process is also carried out in three stages: model identification; model estimation
and order selection; and diagnostic checks and residual analysis, which is also referred to as
verification of the adequacy of the model. Plots of the acf and pacf could be used to discover s and
also the degree of differencing that is necessary to achieve stationarity. Spikes in the acf at low lags
could indicate non-seasonal MA terms, while spikes in the pacf indicate possible non-seasonal AR terms.
Repetition of patterns across the lags of the correlograms that are multiples of s could indicate
seasonality in the series, [11]. The acf of a pure seasonal MA should have a significant value at the
lag of the period and be roughly zero otherwise. The acf of a pure seasonal AR should taper off
exponentially at multiples of the period and be roughly zero otherwise. The pacf of a pure seasonal MA
should taper off exponentially at multiples of the period and be zero otherwise. The pacf of a pure
seasonal AR should cut off after the lag of one period, and should be zero for all other values
[5, 9, 11]. Having identified the appropriate SARIMA model for a series, the parameters of the selected
model are estimated at the estimation stage. Diagnostic measures should also be employed to evaluate the
appropriateness of the fits. If the model is found inadequate, the three stages are repeated until a
satisfactory SARIMA model is selected for the series under consideration.

Diagnostic Tests For Time Series Models: A crucial aspect of the model building process is diagnosis of
the univariate models, which involves analysing the model residuals to verify that the residuals are
random, [9]. Different diagnostic tests can be performed to ascertain the adequacy of the model. Large
residuals are an indication that the model is not adequate, while the model is a good fit for a series
if the residuals are random. The residuals should be without pattern, so that there is no serial
correlation in the residuals. One such test is to verify whether the residuals of a proposed univariate
model are a white noise process. The significance of the autocorrelations of the residuals could be
checked by whether they lie within the insignificance bound, which is the two standard error bounds
$\pm 2/\sqrt{N}$, [20]. The desirable result is that the correlation is 0 between residuals separated by
any given time span, so that the residuals are unrelated to each other. The adequacy of the model should
be questioned if the autocorrelations of the residuals at the first N/4 lags are close to the critical
bounds.

Another approach to evaluating the randomness of the residuals of univariate time series models is to
examine the acf 'as a whole' rather than the individual $r_k$'s separately, [9]. The test is called the
portmanteau lack-of-fit test or Ljung-Box statistic. It is also simply referred to as the Q-statistic.
The Q-statistic for high-order serial correlation is a function of the accumulated sample
autocorrelations up to any specified time lag, and is therefore a more general test for serial
correlation in the residuals. The most direct evidence of random residuals is the absence of significant
values of the Q-statistic at lags of about one quarter of the sample size. The ideal acf of residuals is
that all autocorrelations are zero. A significant Q for residuals indicates a possible problem with the
model. If the autocorrelation of the residuals at a particular lag exceeds the confidence bound, but the
Q-statistic at that lag is non-significant, then the autocorrelation could be regarded as a chance
occurrence. The Q-statistic approximately follows a $\chi^2_{K-p}$ distribution, where K is the number
of lags and p is the number of estimated parameters in the model. The Q statistic is:

$$Q = n(n+2) \sum_{k=1}^{K} (n-k)^{-1}\, \tilde{\rho}_a^{\,2}(k) \qquad (6)$$

where

$\tilde{\rho}_a(k)$: the autocorrelations of the estimation residuals at lag k.
K: a prefixed number of lags.
n: sample size.

Accuracy of Forecast Models: In most forecasting situations, accuracy is treated as the overriding
criterion for selecting a forecasting model. The word 'accuracy' refers to goodness of fit, which in
turn refers to how well the forecasting model is able to reproduce the observed series. Empirical
evaluations of the performance of forecasting models rely upon error measures such as the Forecast Mean
Square Error (FMSE), the Forecast Mean Absolute Error (FMAE) or the Forecast Mean Absolute Percentage
Error (FMAPE), [20]. The measures are used to assess the accuracy of the projections of the models.
Given a particular model for a process,

$$y_t = \mu_t + \varepsilon_t \qquad (8)$$

the error after $y_t$ is observed is $e_t = y_t - \mu_t$, where

$\mu_t$: a known function of past $y_t$ and $\varepsilon_t$ values.
$e_t$: a realisation of the random variable $\varepsilon_t$; the $\varepsilon_t$ are independent and
identically distributed with mean 0 and variance $\sigma^2$.

$$\text{FMSE} = E\left[(y_t - \mu_t)^2\right] = E\left(e_t^2\right) \qquad (9)$$

The Forecast Mean Absolute Error (FMAE) could be computed using the formula

$$\text{FMAE} = E\left|y_t - \mu_t\right| = E\left|e_t\right| \qquad (10)$$

The Forecast Mean Absolute Percentage Error (FMAPE):

$$\text{FMAPE} = E\left(\frac{\left|y_t - \mu_t\right|}{y_t}\right) \qquad (11)$$

given that $y_t$ takes only positive values.

Model Selection: Objective penalty function criteria such as the Akaike Information Criterion (AIC) and
the Bayesian Information Criterion (BIC), also called the Schwarz-Bayesian Information Criterion (SBC),
can be used to choose the most appropriate model among the proposed models for a series. These criteria
try to find a trade-off between the goodness-of-fit and the parsimony of the models considered.
Akaike, [2] proposed an information criterion of the form

$$\text{AIC} = -2 \ln L + 2k \qquad (12)$$

where

k: number of parameters in the model to be estimated.
L: likelihood function of the model.
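The forecast error measures in equations (9) to (11) are expectations; in practice they are usually estimated by sample averages over the forecast period. The following is a minimal sketch under that assumption. The function name `forecast_errors` is illustrative, and the factor of 100 that expresses FMAPE as a percentage is an assumption not shown in equation (11).

```python
def forecast_errors(actual, predicted):
    """Sample versions of FMSE (9), FMAE (10) and FMAPE (11).

    actual    -- observed values y_t (must be strictly positive for FMAPE)
    predicted -- model values mu_t for the same periods
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    if any(y <= 0 for y in actual):
        raise ValueError("FMAPE is defined only for positive observations")

    errors = [y - m for y, m in zip(actual, predicted)]   # e_t = y_t - mu_t
    n = len(errors)
    fmse = sum(e * e for e in errors) / n                 # mean of e_t^2
    fmae = sum(abs(e) for e in errors) / n                # mean of |e_t|
    # Assumption: report the ratio in (11) as a percentage.
    fmape = 100.0 * sum(abs(e) / y for e, y in zip(errors, actual)) / n
    return fmse, fmae, fmape


# Hypothetical numbers, for illustration only.
fmse, fmae, fmape = forecast_errors([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
```

Smaller values of all three measures indicate a closer fit, which is the sense in which the competing models are compared.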

When there are two or more competing models, the model with the smallest AIC is deemed best in the sense
of minimizing the forecast mean square error, [20]. It was however pointed out by Schwarz [27] that AIC
is not a consistent criterion, because it does not select the true model with probability approaching 1
as $n \to \infty$, [10, 17]. Schwarz [27] therefore proposed the Bayesian Information Criterion (BIC)

$$\text{BIC} = -2 \ln L + k \ln(n) \qquad (13)$$

where

n: number of observed data points or sample size.

The BIC tends to select models with fewer terms than the AIC since its penalty term is greater. If the
process follows an ARIMA(p,d,q) model, then it is known that the orders specified by minimizing the BIC
are consistent, that is, they approach the true orders as the sample size increases. However, if the
true process is not a finite order ARIMA process, then minimizing AIC among an increasingly large class
of ARIMA models will lead to an optimal ARIMA model that is closest to the true process among the class
of models under study.

RESULTS AND DISCUSSION OF THE ANALYSES: The time plot for the quarterly disaggregated livebirths series
is shown in Figure 1. Examination of the time plot revealed that there is no consistent trend in the
series over the time period considered. The livebirths series did not show any marked seasonal pattern,
and the curved trend shows that the series is not stationary in variance. A downward trend in the number
of registered livebirths was noticed in 1998, followed by an upward trend till 2003, after which there
was a slight downward trend. A sudden upward movement was obvious in 2006, reaching its maximum in 2007,
followed by a downward trend. The sharp peak indicates that the highest number of livebirths was
recorded in this period. The disaggregated series was subjected to the Augmented Dickey-Fuller test to
ascertain its stationarity status. The ADF statistic of -3.0655, with a p-value of 0.1723, is not
significant at the 5% level, showing that a unit root is present and the original disaggregated series
is therefore not stationary. The autocorrelations of the series exhibit non-stationarity in mean and the
variability of the data does not look constant, as shown in Figure 2(a). After subjecting the series to
non-seasonal differencing and the Box-Cox variance stabilizing transformation, the ADF test yielded a
value of -5.020 with a p-value of 0.0105, which is significant at the 5% level and is evidence that
stationarity has been achieved.
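The two operations applied before the second ADF test, the Box-Cox transformation of equation (4) followed by a non-seasonal first difference, can be sketched as follows. This is an illustration only: the function names and the sample values are hypothetical, and the ADF test itself is not implemented here.

```python
import math


def box_cox(series, lam):
    """Box-Cox transform of equation (4); lam = 0 gives the log transform."""
    if any(y <= 0 for y in series):
        raise ValueError("the Box-Cox transform requires positive observations")
    if lam == 0:
        return [math.log(y) for y in series]
    return [(y ** lam - 1.0) / lam for y in series]


def difference(series, lag=1):
    """Difference at the given lag: lag=1 is non-seasonal, lag=s is seasonal."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical quarterly counts, not taken from the paper's data.
quarterly = [150000.0, 152000.0, 155500.0, 161000.0, 158000.0]
z = box_cox(quarterly, 0)   # z_t = log(y_t)
dz = difference(z)          # first difference of z_t, one observation shorter
```

Differencing the log series gives approximate quarter-to-quarter growth rates, which is why the two steps together address non-stationarity in both the mean and the variance.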

Fig. 1: Time plot of the disaggregated livebirths series (yt).


Figure 2 shows the acf and the pacf of the disaggregated livebirths series, $y_t$, while figure 3 is the
acf and pacf of the differenced series $\nabla y_t$ with log transformation. The acf of the livebirths
data shown in Figure 2(a) has a sinusoidal pattern that tails off to zero, reflecting the meandering
shape of the series, which indicates that the series is non-stationary, suggesting that at least one
non-seasonal differencing will be appropriate. The livebirths series is expected to possess
non-stationarity characteristics, since the series represents 'uncontrolled' behavior of certain process
outputs. Spikes in the pacf could indicate possible non-seasonal AR terms. The pacf of the series shows a
clear single positive spike at lag 1 and a negative spike at lag 2. After subjecting the series to the
variance stabilizing transformation, so that $z_t = \log(y_t)$, and to non-seasonal differencing
$\nabla z_t$ to remove trend nonstationarity, the series becomes a stationary process, as shown in
figure 3(a). The acf cuts off after lag 2, and the rest of the values oscillate randomly about zero,
within the insignificance limits. The non-significant spikes show some patterns of groups of positive
and negative values. The autocorrelations at lags 1 and 2 are positive, an indication that the AR
coefficient is positive. The tapering pattern in the lags of the acf suggests that a non-seasonal AR(1)
may be a useful part of the model. The large spike in the acf at lag 1 might lead to a seasonal MA(1)
interpretation. The identified univariate models are therefore ARIMA(1,1,0) and SARIMA(1,1,0)(0,0,1)4.

Fig. 2: acf and pacf of livebirths series (yt)

Fig. 3: acf and pacf of $\nabla z_t$.

The equation of the fitted ARIMA model is therefore:

$$(1 - \phi B)\,\nabla z_t = \varepsilon_t \qquad (14)$$

ARIMA(1,1,0) can be written explicitly as

$$z_t - z_{t-1} - \phi z_{t-1} + \phi z_{t-2} = \varepsilon_t \qquad (15)$$

$$z_t = z_{t-1} + \phi\,(z_{t-1} - z_{t-2}) + \varepsilon_t \qquad (16)$$

The equation of the SARIMA(1,1,0)(0,0,1)4 model is

$$(1 - \phi B)\,\nabla z_t = (1 - \Theta B^4)\,\varepsilon_t$$

The SARIMA model can also be written explicitly as

$$z_t = z_{t-1} + \phi\,(z_{t-1} - z_{t-2}) + \varepsilon_t - \Theta\,\varepsilon_{t-4} \qquad (17)$$

The disaggregated livebirths data are a quarterly series, so that the period length is 4.
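Equation (16) can be iterated to produce point forecasts by setting each future shock to its mean of zero. The sketch below assumes that simplification; the function name and the two starting counts are hypothetical, and the AR coefficient 0.508 is the ARIMA(1,1,0) estimate reported in Table 1. The seasonal MA term of equation (17) is left out because it needs the past residuals, which are not available here.

```python
import math


def arima110_forecast(z, phi, steps):
    """Point forecasts from z_t = z_{t-1} + phi * (z_{t-1} - z_{t-2}),
    that is, equation (16) with future shocks replaced by zero."""
    if len(z) < 2:
        raise ValueError("at least two observations are needed")
    prev, last = z[-2], z[-1]
    forecasts = []
    for _ in range(steps):
        nxt = last + phi * (last - prev)
        forecasts.append(nxt)
        prev, last = last, nxt
    return forecasts


# Hypothetical last two quarterly counts, not taken from the paper's data.
recent_counts = [180000.0, 184000.0]
z = [math.log(y) for y in recent_counts]      # z_t = log(y_t)
phi_hat = 0.508                               # AR estimate from Table 1
z_forecasts = arima110_forecast(z, phi_hat, steps=4)
# Simple back-transformation to the original scale (ignores any bias correction).
count_forecasts = [math.exp(v) for v in z_forecasts]
```

Because the coefficient is below 1 in absolute value, each forecast increment is a fraction of the previous one, so the forecasts level off rather than continuing the last observed change indefinitely.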

Table 1. Model estimation for the livebirths models

Model                   Estimated model                                                  MLE^a                    SE^b    t-value   p-value   Stationary R^2   R^2
ARIMA(1,1,0)            $(1 - 0.508B)\nabla z_t = \varepsilon_t$                         $\hat{\phi} = 0.508$     0.126   4.028     0.000     0.247            0.824
SARIMA(1,1,0)(0,0,1)4   $(1 - 0.519B)\nabla z_t = (1 - 0.421B^4)\varepsilon_t$           $\hat{\phi} = 0.519$     0.124   4.187     0.000     0.302            0.902
                                                                                         $\hat{\Theta} = 0.421$   0.138   3.056     0.004

a The maximum likelihood estimates.
b Standard error.

Table 2. Performance Comparison of the Univariate Models

Model                   AIC      BIC      FMSE        FMAE       FMAPE    Residual Variance
ARIMA(1,1,0)            -15.02   21.529   45521.864   23212.44   12.178   0.03999
SARIMA(1,1,0)(0,0,1)4   -20.89   21.033   34179.619   19699.29   11.577   0.03236

The summary of the fitted models of the livebirths series is shown in Table 1. The estimates of the
parameters of the proposed univariate models, including the associated standard errors, t-values and
their corresponding significance values, are also reported in Table 1. The associated p-value of the AR
parameter shows that its coefficient is statistically significant. The estimated coefficients for the
SARIMA model are also significant, as revealed by their p-values. The R2 value for the ARIMA model is
0.824, indicating that over 82% of the total variation in the series is accounted for by the model, and
this demonstrates the effectiveness of ARIMA(1,1,0) in modelling the livebirths series. The value of R2
for the SARIMA model is 0.902, which shows that more than 90% of the variation in the series is
explained by the model, and the model is considered to be practically significant. This is an indication
that the inclusion of the seasonal MA term is desirable. The estimate of the residual standard deviation
$\hat{\sigma} = 0.199975$ for the ARIMA model implies that the standard deviation of the residuals is
approximately 20% of the level of the series. The criteria for selecting the better model are
minimization of forecast errors, AIC, BIC and residual variance. It is noticeable from Table 2 that the
SARIMA model appears to have performed better than the ARIMA model, based on the values of their AIC,
BIC and measures of forecasting accuracy. The performance evaluation of the two univariate models
therefore shows that SARIMA yields better forecasts than the ARIMA model. The ARIMA model, though not as
good as the SARIMA model, would be suitable for forecasting as well.

Measures of diagnostic verification were computed to determine the adequacy of the models in
representing the underlying process in the livebirths series. The acf of the residuals and the Ljung-Box
test could be used to check for correlations between

successive forecast errors. The acfs of the residuals of the two models show no significant correlation at any lag except lag 0, as shown in Figures 4(a) and 5(a). The large spike at the beginning is the lag 0 correlation, which always equals 1 and is therefore unimportant. Figures 4(b) and 5(b) give p-values for the Ljung-Box statistics for each lag up to 10, which are all well above 0.05, indicating non-significance, which is a desirable result. The Ljung-Box test on the residuals of the estimated univariate models therefore favours accepting the models as effective models for the persistence in the livebirths series.
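The residual check described above can be sketched as follows, assuming the standard Ljung-Box form $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$, where $r_k$ is the lag-$k$ sample autocorrelation of the residuals. The paper does not print the formula, so this form, the function names and the toy residuals are assumptions; the toy series is deliberately strongly autocorrelated, unlike the residuals reported for the fitted models.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return numer / denom


def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic using lags 1..max_lag."""
    n = len(x)
    return n * (n + 2) * sum(
        autocorrelation(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Toy alternating residuals (hypothetical, not from the paper)
toy_residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

if __name__ == "__main__":
    print(ljung_box_q(toy_residuals, max_lag=1))
```

A p-value is then obtained by referring Q to a chi-square distribution; for residuals of a fitted ARMA-type model the degrees of freedom are usually taken as the number of lags minus the number of estimated ARMA parameters.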

Fig. 4: (a) acf of residuals and (b) p-values for Q-statistic for ARIMA(1,1,0) model.

Fig. 5: (a) acf of residuals and (b) p-values for Q-statistic for SARIMA(1,1,0)(0,0,1)4 model.


Table 3: Actual and Predicted number of Livebirths in Nigeria for period 2005-2007

                                   Predicted cases from ARIMA model         Predicted cases from SARIMA model
Year  Quarter  Actual livebirths   Predicted   Lower limit  Upper limit     Predicted   Lower limit  Upper limit
2005  Q1       166882.28           166734.03   114222.13    235608.16       177932.06   123271.17    249083.37
2005  Q2       168043.07           169893.11   116386.28    240072.19       163786.62   113471.23    229281.47
2005  Q3       170364.64           171396.95   117416.50    242197.23       173171.50   119973.06    242419.17
2005  Q4       173847.01           174364.91   119449.72    246391.19       177135.37   122719.23    247968.10
2006  Q1       154566.01           178518.51   122295.16    252260.55       182097.46   126156.96    254914.43
2006  Q2       185486.81           147994.35   101384.41    209127.54       145132.06   100547.36    203167.33
2006  Q3       247328.40           206814.54   141679.52    292245.04       207146.65   143511.01    289980.26
2006  Q4       340090.79           290923.04   199298.55    411096.89       292038.57   202324.07    408818.70
2007  Q1       560020.28           406333.50   278361.17    574180.86       433754.17   300504.51    607203.40
2007  Q2       516714.67           733234.72   502306.79    1036117.7       660201.45   457386.98    924202.20
2007  Q3       430103.45           504146.80   345368.75    712398.67       464019.10   321472.02    649570.63
2007  Q4       300186.61           398262.70   272832.22    562776.19       369996.60   256333.32    517950.51

[Line plot. Vertical axis: Livebirths (0 to 800000). Horizontal axis: Year (2005 to 2007). Series: actual, forecastARIMA, forecastSARIMA.]

Fig. 6: Forecasted livebirths for the ARIMA and the SARIMA models, with the actual Livebirths
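Forecast accuracy for actual and predicted values such as those in Table 3 can be summarised with the mean square error, mean absolute error and mean absolute percentage error of the forecasts. The paper does not print the formulas it used, so the sketch below assumes the standard definitions, and its output is not meant to reproduce the figures in Table 2. The function names are hypothetical; the sample input is the four 2005 rows of Table 3 for the ARIMA model.

```python
def fmse(actual, predicted):
    """Forecast mean square error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def fmae(actual, predicted):
    """Forecast mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def fmape(actual, predicted):
    """Forecast mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# 2005 quarterly rows of Table 3: actual livebirths and ARIMA predictions
actual_2005 = [166882.28, 168043.07, 170364.64, 173847.01]
arima_2005 = [166734.03, 169893.11, 171396.95, 174364.91]

if __name__ == "__main__":
    print(fmse(actual_2005, arima_2005))
    print(fmae(actual_2005, arima_2005))
    print(fmape(actual_2005, arima_2005))
```

All three measures are zero for a perfect forecast; FMAPE is scale-free, which makes it easier to compare across series of different magnitude.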
The univariate models are used as predictive models for making forecasts of future values of livebirths in Nigeria. The results of forecasting for the period 2005-2007 with the univariate models are given in Table 3. In order to assess the prediction accuracy, the forecast mean square error (FMSE), forecast mean absolute error (FMAE), and forecast mean absolute percentage error (FMAPE) were employed. The


FMSE for the SARIMA forecasts is lower than that for the forecasts generated by the ARIMA model. Figure 6 is the plot of the actual livebirths and the forecasts for both the ARIMA and SARIMA models, for the period 2005-2007. The SARIMA model provides better forecasts, tending to outperform the ARIMA model forecasts before the peak in the registered livebirths of the 1st quarter of 2007. For the remaining part of 2007, the ARIMA model forecasts seem superior. Generally, the forecasts at the turning point are poor for the two models. The analysis of the two univariate time series models suggests that both models are satisfactory, but the SARIMA model outperformed the ARIMA model.

CONCLUSION: Modelling and prediction of Nigerian total livebirths are the focus of this study. Univariate stochastic time series models were developed and fitted to the quarterly disaggregated total livebirths series to predict future livebirths in Nigeria based on past data. The proposed univariate models were ARIMA(1,1,0) and SARIMA(1,1,0)(0,0,1)4. It was discovered that the inclusion of the seasonal MA term improved the effectiveness of the model in predicting future total livebirths in Nigeria. The values of the measures of forecast performance considered showed that the models fit the livebirths series well and are statistically significant. Since the acfs of the residuals show that none of the autocorrelations for lags 1-15 exceeds the significance bounds, and the p-values for the Ljung-Box test statistic are all well above 0.05, it can be concluded that there is very little evidence for non-zero autocorrelations in the errors at lags 1-15. The SARIMA model was chosen as the more adequate model for predicting future livebirths because of its good prediction performance and lower values of AIC and BIC, compared with the ARIMA model. It is necessary to forecast the future livebirths of a population like Nigeria's in order to determine the future demands of this demographic phenomenon on our various systems such as education, health, economy, and provision of social amenities for the future population over a period of time. Although the Nigerian birth rate has declined, the total number of livebirths is on the increase, as revealed by the result of this study.

Data from vital registration on live births are likely to be inadequate in a developing nation like Nigeria, due to a poor record-keeping culture. A critical assessment of the data suggests an under-reporting of live births in 1994 in most Nigerian states. The data on livebirths for 1994 were therefore discarded, so that the study is based on total live births data from 1995-2007, which are more reliable. Despite the possibility of under-reporting of total livebirths in Nigeria, it is believed that under-reporting will not substantially alter the proposed time series models and the forecast of future livebirths in the country.

There is a need to exercise caution in interpreting the result of the analysis, because recent changes in the fertility behaviour of Nigerians are not accounted for in this study, and this could make a difference when considered. An empirical question to be investigated in future research is whether improved forecasts of total births can be obtained by considering a stochastic model that incorporates other factors that have implications for births in Nigeria.
