
INTRODUCTION TO TIME SERIES FORECASTING (STA233)

WRITTEN REPORT:
TOTAL POPULATION IN MALAYSIA FROM 1970-2019

No. Student Name Student ID

1 IYLIA SYAMIMI BINTI AZMAN 2022617404

2 NUR NIESA FATIMAH BINTI AZMI 2022461104

3 SOFIYYAH MAWADDAH BINTI KHAIRIL HILMI 2022816476

4 NUR ALIYAH BINTI HAMZAH 2022474138

5 NABILA MARDIAH BINTI MAZLAN 2022617748


TABLE OF CONTENTS

ACKNOWLEDGMENT
1.0 INTRODUCTION
2.0 PROBLEM STATEMENT
3.0 RESEARCH OBJECTIVES
4.0 DATA DESCRIPTION
5.0 METHODOLOGY
5.1 Univariate Model
5.1.1 Naïve Forecast
5.1.2 Naïve with Trend
5.1.3 Average Change Model
5.1.4 Average Percent Change Model
5.1.5 Single Exponential Model
5.1.6 Double Exponential Model
5.1.7 Holt’s Method
5.2 Box-Jenkins Model
6.0 ANALYSIS AND RESULT
6.1 Components of Time Series Data
6.2 Univariate Modelling
6.2.1 Naïve Forecast
6.2.2 Naïve with Trend
6.2.3 Average Change Model
6.2.4 Average Percent Change Model
6.2.5 Single Exponential Smoothing Model
6.2.6 Double Exponential Smoothing Model
6.2.7 Holt’s Method
6.3 Box-Jenkins
6.3.1 First-order differencing
6.3.2 Second-order differencing
6.3.3 ARIMA (1,2,2)
6.3.4 ARIMA (1,2,1)
6.3.5 ARIMA (1,2,3)
6.3.6 ARIMA (2,2,2)
6.3.7 ARIMA Comparison
6.3.8 Evaluating Part
CONCLUSION
REFERENCE

ACKNOWLEDGMENT

First and foremost, we would like to express our sincere gratitude to Allah SWT for providing us
with the strength, health, and desire to carry out our project on time. Without His assistance, it would not
have been feasible to complete our assignment, which required tireless effort, within the time constraints.
Next, we would like to thank DOSM (Department of Statistics Malaysia) for developing a website that
makes it easy for us to gather data for our research. Without this information, we would be unable to carry
out and finish this project. Furthermore, we would like to thank our lecturer, Sir Sapuan Baharuddin, for
guiding us in the right direction throughout our group project. Finally, thank you
to the team members for their dedication and cooperative spirit. The team members are constantly eager
to assist one another, exchange ideas, and have a good time while working on this group project. We want
to express our gratitude to everyone who helped us, both directly and indirectly, with the creation of our
group project.

1.0 INTRODUCTION

The population of Malaysia has experienced significant growth and transformation over
the years. According to the most recent demographic data published by the Department of
Statistics Malaysia (DOSM), Malaysia's population in 2019 was 32.5 million, up from
32.4 million in 2018. The total comprises 29.4 million citizens (90.2%) and 3.2 million
non-citizens (9.8%). Selangor was the most populous state in the fourth quarter of 2022,
followed by Johor (12.3%) and Sabah (10.4%). Nonetheless, it is important to bear in mind that
population estimates are subject to revision and that the actual population may differ.
Malaysia's population is unevenly distributed, with urban areas having higher population
densities than rural ones. Kuala Lumpur, the nation's capital, and the surrounding Klang Valley
are large urban hubs with dense populations, expanding economies, and modern infrastructure.
Ipoh, Penang, Johor, and other significant cities also contribute to urbanization and
population growth.

Many factors influence population change in Malaysia, including development, migration, the
state of the economy, governmental initiatives, and social dynamics.
Malaysia's urban areas, such as Selangor, Penang, and Johor, are attracting people from rural
regions like Kelantan, Kedah and Terengganu in search of better job opportunities and improved
living standards, leading to a significant increase in the urban population and a decline
in rural areas. This substantial rural-to-urban migration has created
a need for immigrant workers from neighboring countries like Indonesia, Thailand, and
Bangladesh to work on large plantations. These workers remit large sums of money to
their home countries, and some intend to stay permanently. To plan effectively,
allocate resources, and formulate policies that promote sustainable development, social cohesion,
and population well-being, it is essential to understand these patterns and their
implications.
The purpose of this project is to study the population of Malaysia from 1970 to
2019 and to observe how the population has grown over that period. It is
important to study and forecast this pattern because it helps policymakers allocate resources
effectively when planning infrastructure. It also helps businesses and industries with economic
planning, since forecast results can inform how they attract customers and increase
their profit. The forecast is also important so that the government can do long-term
planning for the country's future. In addition, the data collected can be used in the future to
predict and forecast the total population of Malaysia.
One of the issues identified in forecasting the population of Malaysia is that population
growth has been slowing in recent years after decades of steady increase, which may distort the
trend component. Chief Statistician Datuk Seri Dr Mohd Uzir Mahidin said the country's population
growth was being affected by a substantial decline in birth rates, higher death rates and slower
growth in the number of non-citizens in most states.

2.0 PROBLEM STATEMENT

From the raw dataset of the population of Malaysia from 1970 to 2019, we have identified
a few problems. One problem is that no suitable trend has been established for the yearly pattern
of the population of Malaysia from 1970 to 2019. This makes it difficult for the relevant
departments to obtain reliable estimates, and the results may fall short of their expectations.
In addition, no suitable model has yet been identified for analyzing the population of Malaysia
over this period. We have therefore decided to explore which model is the most suitable for
analyzing the population of Malaysia from 1970 to 2019, judged by the Mean Squared Error (MSE)
and the Mean Absolute Percentage Error (MAPE). From this, we will forecast the population of
Malaysia beyond 2019 to observe its trend line.

3.0 RESEARCH OBJECTIVES

1) To study the yearly pattern of the population of Malaysia from 1970 to 2019.
2) To determine the most suitable model for analyzing the population of Malaysia from 1970 to
2019.
3) To forecast the estimated population of Malaysia in 2023.
4) To analyze the Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE) of each
model.

4.0 DATA DESCRIPTION

The data we have chosen for our project is displayed below. It gives the total population
of Malaysia for each year from 1970 to 2019 and is treated as a historical time series.
Microsoft Excel is used to perform the computations and create the graphs.
The forecasting models are fitted to the total population of Malaysia from 1970 to 2019.

Number Year Total Population ('000)

1 1970 10881.8

2 1971 11159.7

3 1972 11441.3

4 1973 11719.8

5 1974 12001.3

6 1975 12300.3

7 1976 12588.1

8 1977 12901.1

9 1978 13200.2

10 1979 13518.3

11 1980 13879.2

12 1981 14256.9

13 1982 14651.1

14 1983 15048.2

15 1984 15450.4

16 1985 15882.7

17 1986 16329.4

18 1987 16773.5

19 1988 17219.1

20 1989 17662.1

21 1990 18102.4

22 1991 18547.2

23 1992 19067.5

24 1993 19601.5

25 1994 20141.7

26 1995 20681.8

27 1996 21222.6

28 1997 21769.3

29 1998 22333.5

30 1999 22909.5

31 2000 23494.9

32 2001 24030.5

33 2002 24542.5

34 2003 25038.1

35 2004 25541.5

36 2005 26045.5

37 2006 26549.9

38 2007 27058.4

39 2008 27567.6

40 2009 28081.5

41 2010 28588.6

42 2011 29062

43 2012 29510

44 2013 30213.7

45 2014 30708.5

46 2015 31186.1

47 2016 31633.5

48 2017 32022.6

49 2018 32382.3

50 2019 32581.4
Table 1

5.0 METHODOLOGY

5.1 Univariate Model

5.1.1 Naïve Forecast

This model assumes a strong deterministic relation, in which the current
value determines the future outcome. This model works best when the actual
historical data series contains no discernible pattern.

$$F_{t+m} = y_t \quad \text{for } m = 1, 2, 3, 4, \dots$$

where $m$ is the number of periods into the future for which the forecast is
desired and $y_t$ is the actual value at time $t$.

5.1.2 Naïve with Trend

The Naïve Model is modified to take the trend component into account. It is used to
overcome the problem of insufficient data, which commonly happens in
organizations. This model can be used even with short time series. The
one-step-ahead forecast is represented as:

$$F_{t+1} = y_t \times \frac{y_t}{y_{t-1}}$$

where $y_t$ is the actual value at time $t$ and $y_{t-1}$ is the actual value in the preceding
period.
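
The two naïve models above can be sketched in Python as follows. This is a minimal illustration rather than the Excel workbook used in this report: the function names are hypothetical, only the first five observations of Table 1 are used, and the Naïve with Trend forecast is assumed to take the ratio form F(t+1) = y(t) × y(t) / y(t-1). Applied to the full series, that ratio form appears to be consistent with the MSE of 3,458.867 reported in Section 6.2.2.

```python
# First five observations from Table 1 (total population, in thousands).
population = [10881.8, 11159.7, 11441.3, 11719.8, 12001.3]


def naive_forecast(series):
    """One-step-ahead naive forecasts: F[t+1] = y[t].

    Returns the forecasts for the 2nd observation onward.
    """
    return list(series[:-1])


def naive_trend_forecast(series):
    """One-step-ahead naive-with-trend forecasts (ratio form, assumed):
    F[t+1] = y[t] * (y[t] / y[t-1]).

    Returns the forecasts for the 3rd observation onward.
    """
    return [series[t] * series[t] / series[t - 1]
            for t in range(1, len(series) - 1)]


# Compare actual and fitted values for the naive-with-trend model.
for actual, fitted in zip(population[2:], naive_trend_forecast(population)):
    print(f"actual={actual:.1f}  naive-with-trend fitted={fitted:.1f}")
```

The naïve forecast simply shifts the series by one period, while the trend version scales the latest value by the most recent growth ratio.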

5.1.3 Average Change Model

It is given as:

$$F_{t+m} = y_t + \text{Average of Changes}$$

This model is similar to the Naïve with Trend Model, with the exception that it is
influenced by all historical observations and responds relatively quickly to changes
in the actual time series. The model is most useful when the historical data being
analyzed are characterized by period-to-period changes that are approximately of
the same size.
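
A minimal Python sketch of the Average Change Model is given below. The text above does not state how many changes enter the average, so the `window` parameter and the function name are assumptions made for illustration. A window of the two most recent changes appears to be consistent with the MSE of 3,143.748 reported for this model in Section 6.2.3.

```python
# First five observations from Table 1 (total population, in thousands).
population = [10881.8, 11159.7, 11441.3, 11719.8, 12001.3]


def average_change_forecast(series, window=2):
    """Forecast F[t+1] = y[t] + mean of the last `window` changes.

    `window` is an assumed parameter. The result holds one forecast for
    each t from `window` to the last index, so the final element is a
    forecast one period beyond the end of the series.
    """
    forecasts = []
    for t in range(window, len(series)):
        changes = [series[i] - series[i - 1]
                   for i in range(t - window + 1, t + 1)]
        forecasts.append(series[t] + sum(changes) / window)
    return forecasts


forecasts = average_change_forecast(population)
print([round(f, 2) for f in forecasts])
```

Because only recent changes are averaged, the forecast adapts quickly when the size of the yearly increase shifts.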

5.1.4 Average Percent Change Model

The model assumes that the forecast of the dependent variable is equal to the actual
level of that variable in the current time period plus the average of the percentage
changes from one time period to the next. It can be formally stated as:

$$F_{t+m} = y_t + \text{Average of Percent Changes}$$

Where Average of Percent Changes =

The most significant aspect is that the forecasts are generated from percentage
changes in the historical data. The model is most appropriate for time series that exhibit a
constant percentage growth rate, and it is also suitable for short data series.

5.1.5 Single Exponential Model

The general equation for single exponentially smoothed statistics is given as:

$$F_{t+m} = \alpha y_t + (1-\alpha) F_t$$

where

$F_{t+m}$ is the single exponentially smoothed value in period $t+m$, for $m = 1, 2, 3, 4, \dots$,

$y_t$ is the actual value in time period $t$,

$\alpha$ is the unknown smoothing constant, with a value between 0 and 1, selected by the
forecaster or alternatively determined from the data,

$F_t$ is the forecast or smoothed value for period $t$.
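
The recursion above can be sketched in Python as follows. The function name is hypothetical, and the initial forecast is assumed to equal the first observation because the text does not specify a starting value. The smoothing constant 0.8 is the value selected in Section 6.2.5.

```python
# First three observations from Table 1 (total population, in thousands).
population = [10881.8, 11159.7, 11441.3]


def single_exponential_smoothing(series, alpha):
    """Return [F_1, F_2, ..., F_(n+1)] for a series y_1..y_n.

    F_1 is set to y_1 (an assumed initialisation); afterwards
    F_(t+1) = alpha * y_t + (1 - alpha) * F_t.
    """
    forecasts = [series[0]]
    for y in series:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts


forecasts = single_exponential_smoothing(population, alpha=0.8)
print([round(f, 3) for f in forecasts])
```

With a trending series the smoothed forecast always lags the actual value, which is why this model tends to show larger errors on steadily growing data.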

5.1.6 Double Exponential Model

This technique is also known as Brown’s Method. It is useful for series that exhibit
a linear trend characteristic. To demonstrate the method the following notations will
be used:

Let,

$S_t$ be the exponentially smoothed value of $y_t$ at time $t$.

$S'_t$ be the double exponentially smoothed value of $y_t$ at time $t$.

Generally, there are 5 main equations involved:

$$S_t = \alpha y_t + (1-\alpha) S_{t-1}$$

$$S'_t = \alpha S_t + (1-\alpha) S'_{t-1}$$

$$a_t = 2S_t - S'_t$$

$$b_t = \frac{\alpha}{1-\alpha}\,(S_t - S'_t)$$

$$F_{t+m} = a_t + b_t \times m$$
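
The five equations can be combined into a short Python sketch. The function name and the initialisation (both smoothed series start at the first observation) are assumptions, since the text does not state starting values.

```python
def brown_double_exponential(series, alpha, horizon=1):
    """Brown's double exponential smoothing forecast F_(t+m).

    Both smoothed series are initialised at the first observation
    (an assumed starting value). `horizon` is m, the number of periods
    ahead to forecast from the end of the series.
    """
    s = s2 = series[0]
    for y in series[1:]:
        s = alpha * y + (1 - alpha) * s        # S_t
        s2 = alpha * s + (1 - alpha) * s2      # S'_t
    a = 2 * s - s2                             # a_t
    b = alpha / (1 - alpha) * (s - s2)         # b_t
    return a + b * horizon                     # F_(t+m)


# On a perfectly linear series the forecast approaches the next value.
linear = [float(v) for v in range(1, 41)]
print(round(brown_double_exponential(linear, alpha=0.8), 6))
```

The slope term is what lets the method follow a linear trend instead of lagging behind it.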

5.1.7 Holt’s Method

This technique is used to handle data with linear trends. It not only
smooths the trend and the slope directly by using different smoothing constants,
but also provides more flexibility in selecting the rates at which the trend and slope
are tracked.

The application of this method requires 3 equations:

$$S_t = \alpha y_t + (1-\alpha)(S_{t-1} + T_{t-1})$$

$$T_t = \beta (S_t - S_{t-1}) + (1-\beta) T_{t-1}$$

$$F_{t+m} = S_t + T_t \times m$$

The values of $\alpha$ and $\beta$ range from 0 to 1.
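
A minimal Python sketch of the three equations follows. The function name and the starting values (the first observation for the smoothed value, the first change for the trend) are assumptions, because the text does not give an initialisation.

```python
def holt_forecast(series, alpha, beta, horizon=1):
    """Holt's method forecast F_(t+m) from the end of the series.

    Assumed initial values: S_1 = y_1 and T_1 = y_2 - y_1.
    """
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (previous_level + trend)      # S_t
        trend = beta * (level - previous_level) + (1 - beta) * trend    # T_t
    return level + trend * horizon                                      # F_(t+m)


# First five observations from Table 1 (total population, in thousands).
population = [10881.8, 11159.7, 11441.3, 11719.8, 12001.3]
print(round(holt_forecast(population, alpha=0.8, beta=0.8), 1))
```

Using separate constants for the smoothed value and the trend allows each to be tuned independently, which is the flexibility described above.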

5.2 Box-Jenkins Model

5.2.1 The Autoregressive (AR) Model

In the AR model, the current value of the variable is defined as a function of its previous values
plus an error term. In other words, the dependent variable, $y_t$, is taken as a function of the
time-lagged values of itself. The general formula is written as

$$y_t = \mu - \phi_1 y_{t-1} - \phi_2 y_{t-2} - \dots - \phi_p y_{t-p} + \varepsilon_t$$

where $\mu$ and $\phi_1, \dots, \phi_p$ are constant terms or parameters to be estimated, $y_t$ is
the dependent or current value, $y_{t-p}$ is the $p$-th order lagged value of the dependent
variable, and $\varepsilon_t$ is the error term, which is assumed to be iid with mean zero and
variance $\sigma^2$.

5.2.2 The Moving Average (MA) Model

The moving average model links the current values of the time series to random errors that have
occurred in previous periods rather than to the values of the actual series themselves. The
moving average model can be written as

$$y_t = \mu - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q} + \varepsilon_t$$

where $\mu$ is the mean about which the series fluctuates, the $\theta$'s are the moving average
parameters to be estimated, and the $\varepsilon_{t-q}$'s are the error terms ($q = 1, 2, 3, \dots$),
assumed to be independently distributed over time.

5.2.3 The Mixed Autoregressive Moving Average (ARMA) Model

Under the assumption of stationarity, the mixed autoregressive and moving average model of the
Box-Jenkins methodology is known as the ARMA model. In other words, the series $y_t$ is assumed
to be stationary and the ARMA model is written as

$$y_t = \mu - \phi_1 y_{t-1} - \phi_2 y_{t-2} - \dots - \phi_p y_{t-p} - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q} + \varepsilon_t$$

The AR and MA parts are of order $p$ and $q$ respectively, so the model can be expressed in terms
of both as ARMA($p$, $q$). For example,

$$\text{ARMA}(1,1): \quad y_t = \mu - \phi_1 y_{t-1} - \theta_1 \varepsilon_{t-1} + \varepsilon_t$$
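
To make the ARMA(1,1) equation concrete, the sketch below computes residuals and a one-step-ahead forecast using the sign convention written above. The function names, the parameter values and the short input series are hypothetical, and the first error term is assumed to be zero. In practice the parameters would be estimated by statistical software rather than supplied by hand.

```python
def arma11_residuals(series, mu, phi1, theta1):
    """Residuals for y_t = mu - phi1*y_(t-1) - theta1*e_(t-1) + e_t.

    The error attached to the first observation is assumed to be zero.
    """
    residuals = [0.0]
    for t in range(1, len(series)):
        fitted = mu - phi1 * series[t - 1] - theta1 * residuals[-1]
        residuals.append(series[t] - fitted)
    return residuals


def arma11_forecast(series, mu, phi1, theta1):
    """One-step-ahead forecast; the future error is replaced by its mean, zero."""
    last_error = arma11_residuals(series, mu, phi1, theta1)[-1]
    return mu - phi1 * series[-1] - theta1 * last_error


# Hypothetical parameters and data, chosen only to illustrate the recursion.
print(arma11_forecast([2.0, 3.0], mu=1.0, phi1=-0.5, theta1=0.4))
```

The forecast combines the latest observation (the AR part) with the latest one-step error (the MA part).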

6.0 ANALYSIS AND RESULT

6.1 Components of Time Series Data

Figure 1

The graph shows the actual total population of Malaysia from 1970 to 2019, covering 50 yearly
observations. In the first 20 years, from 1970 to 1990, the total
population rose from about 10.9 million to about 18.1 million, and the line increases
from each year to the next. From 1991 to 2010, the total population grew from about
18.5 million to about 28.6 million, and it passed 30 million in 2013. The graph continues to rise
as the population of Malaysia increases. This could be caused by improvements in technology in
the medical sector, migration and social dynamics that increase the birth rate in Malaysia
throughout the years. Furthermore, the total population is expected to keep increasing beyond
2019 with more advanced technology, especially in the medical industry, causing an increase in
Malaysian fertility rates. Lastly, the latest total population shown in the graph, for 2019, is
about 32.6 million and is expected to increase in the future.

6.2 Univariate Modelling

Based on the data, seven univariate forecasting techniques have been chosen. The
models are Naïve Forecast, Naïve with Trend, Average Change Model,
Average Percent Change Model, Single Exponential Method, Double Exponential Method and
Holt's Method. The best model was determined by calculating the Mean Squared Error (MSE)
and Mean Absolute Percentage Error (MAPE).
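
The two error measures can be sketched in Python as follows. The calculations in this report were done in Microsoft Excel, so the function names here are hypothetical and the code is only an illustration. It applies both measures to the Naïve Forecast using the data in Table 1, and the resulting MSE should be close to the 206,627.2 reported in Section 6.2.1.

```python
# Total population ('000) for 1970-2019, copied from Table 1.
population = [
    10881.8, 11159.7, 11441.3, 11719.8, 12001.3, 12300.3, 12588.1, 12901.1,
    13200.2, 13518.3, 13879.2, 14256.9, 14651.1, 15048.2, 15450.4, 15882.7,
    16329.4, 16773.5, 17219.1, 17662.1, 18102.4, 18547.2, 19067.5, 19601.5,
    20141.7, 20681.8, 21222.6, 21769.3, 22333.5, 22909.5, 23494.9, 24030.5,
    24542.5, 25038.1, 25541.5, 26045.5, 26549.9, 27058.4, 27567.6, 28081.5,
    28588.6, 29062.0, 29510.0, 30213.7, 30708.5, 31186.1, 31633.5, 32022.6,
    32382.3, 32581.4,
]


def mse(actual, fitted):
    """Mean Squared Error: average of the squared forecast errors."""
    return sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual)


def mape(actual, fitted):
    """Mean Absolute Percentage Error, expressed in percent."""
    return 100 * sum(abs(a - f) / a for a, f in zip(actual, fitted)) / len(actual)


# Naive forecast: each fitted value is the previous year's actual value.
actual = population[1:]
fitted = population[:-1]
print(f"MSE  = {mse(actual, fitted):.1f}")
print(f"MAPE = {mape(actual, fitted):.6f}")
```

MSE penalises large errors heavily, while MAPE expresses the typical error relative to the size of the series, so the two measures can rank models differently.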

6.2.1 Naïve Forecast

Figure 2

MSE 206627.2

MAPE 2.211999
Table 2

Referring to the graph above, the fitted values differ only slightly from the actual values and
follow their pattern rather closely. The calculation was done in Microsoft
Excel, giving an MSE of 206,627.2 and a MAPE of 2.211999.

Values in Microsoft Excel for Naive Forecast
Figure 3

Formulas in Microsoft Excel for Naive Forecast
Figure 4

6.2.2 Naïve with Trend

Figure 5

MSE 3458.867

MAPE 0.122432
Table 3

The graph above shows the fitted values and the actual values for the Naïve
with Trend model. The fitted values are almost identical to the actual values. The
calculation was done in Microsoft Excel, giving an MSE of 3,458.867 and a MAPE of
0.122432.

Values in Microsoft Excel for Naive with Trend Forecasts
Figure 6

Formulas in Microsoft Excel for Naive with Trend Forecast

Figure 7

6.2.3 Average Change Model

Figure 8

Table 4

According to the graph, the Average Change Model shows the total population of Malaysia
increasing from 1970 to 2019. The fitted values are almost the same as the
actual values and continue to increase through 2019. The graph shows an upward trend and no
decline in the population. The MSE and MAPE for the Average Change Model, calculated in Microsoft
Excel, are 3,143.748 and 0.132962 respectively.

Values in Microsoft Excel for Average Change Model
Figure 9

Formulas in Microsoft Excel for Average Change Forecast
Figure 10

6.2.4 Average Percent Change Model

Figure 11

Table 5

Based on the graph above, the Average Percent Change Model shows an increase in the total
population of Malaysia from 1970 to 2019. The fitted values stay close to the actual
values and keep increasing, creating an upward trend. We developed this model
in Microsoft Excel to determine the MSE and MAPE, which are 3,399.711 and 0.145488
respectively.

Values in Microsoft Excel for Average Percent Change Model
Figure 12

Formulas in Microsoft Excel for Average Percent Change Forecast
Figure 13

6.2.5 Single Exponential Smoothing Model

Figure 14

Table 6

Referring to the graph above, there is an upward trend in the total population of Malaysia
from 1970 to 2019 under the single exponential smoothing method, with the population
increasing gradually from year to year. The graph has two lines, which represent the actual
values and the fitted values. The fitted values are close to the actual values
and do not show a large difference. However, there is a slight gap between the two lines from
1992 until 2014. To find a suitable alpha, we tried three values, 0.2, 0.5 and 0.8.
The value of alpha selected is 0.8.
Lastly, we calculated the MSE and MAPE using Microsoft Excel, which are 314,544.9 and 2.689941
respectively.

Values in Microsoft Excel for Single Exponential Smoothing Model
Figure 15

Formulas in Microsoft Excel for Single Exponential Smoothing Model
Figure 16

6.2.6 Double Exponential Smoothing Model

Figure 17

Table 7

The graph above shows the estimated total population of Malaysia from 1970 to 2019 using the
double exponential model. As the years pass, Malaysia's population grows and
creates an upward trend. The two lines in this graph indicate the actual total population and
the fitted values. The fitted pattern is almost equal to the actual values, but in
several years there is a slight gap between the actual and the fitted value. Next, we
experimented with three values, 0.2, 0.5, and 0.8, in order to find an appropriate alpha
for this model. As a result, we found that the most suitable alpha for
this model is 0.8. Lastly, the MSE, which is 4,743.025, and the MAPE, which is
0.204392, were calculated using Microsoft Excel.

Values in Microsoft Excel for Double Exponential Smoothing Model
Figure 18

Formulas in Microsoft Excel for Double Exponential Smoothing Model
Figure 19

6.2.7 Holt’s Method

Figure 20

Table 8

In the graph above, Holt's Method is used to estimate the total population of Malaysia
from 1970 until 2019. The graph shows an upward trend, with the total population gradually
increasing as the years pass. The actual values are represented by the blue line while the fitted
values are represented by the orange line. The actual values are close to the fitted values, with
a slight gap between them. For instance, we can see a gap between the lines from 2012 to 2014.
However, the gap narrows from 2015 onwards. We tried alpha and beta values of
0.2, 0.4 and 0.8, substituting each into the formula in turn. As a
result, the most suitable values of alpha and beta are both 0.8. The MSE and MAPE, calculated
using Microsoft Excel, are 53,738.75 and 1.127817 respectively.

Values in Microsoft Excel for Holt's Method
Figure 21

Formulas in Microsoft Excel for Holt’s Method
Figure 22

6.3 Box-Jenkins
Fitting part: 1970 - 2007
Hold-out part: 2008 – 2019

Figure 23

Figure 24

Figure 25

Figure 26

Figure 27

Comment:
● ACF: the values of the autocorrelation function (ACF) of the original series decline slowly,
which suggests that the population data is not stationary. This behavioral pattern is also called
decay.
● PACF: the first partial autocorrelation is highly significant with a value of 0.925, and for the
subsequent lags the values of the partial autocorrelation are insignificant. Since there is only one
large partial autocorrelation value and the rest are insignificant, the PACF is said to have
one significant 'spike'.
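
For readers who want to see how the autocorrelations behind an ACF plot are obtained, a minimal Python sketch is given below. The figures in this report were produced by statistical software, so the function name is hypothetical and the code is only an illustration using the first ten observations of Table 1.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a series.

    r_k = sum over t of (y_t - mean)(y_(t-k) - mean), divided by
    the sum of squared deviations from the mean.
    """
    n = len(series)
    mean = sum(series) / n
    deviations = [y - mean for y in series]
    denominator = sum(d * d for d in deviations)
    return [
        sum(deviations[t] * deviations[t - k] for t in range(k, n)) / denominator
        for k in range(max_lag + 1)
    ]


# First ten observations from Table 1 (total population, in thousands).
population = [10881.8, 11159.7, 11441.3, 11719.8, 12001.3,
              12300.3, 12588.1, 12901.1, 13200.2, 13518.3]
print([round(r, 3) for r in sample_acf(population, max_lag=4)])
```

For a steadily trending series such as this one, the autocorrelations fall away only gradually as the lag increases, which is the slow decay described in the comment above.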

6.3.1 First-order differencing

Figure 28

Figure 29

Figure 30

Figure 31

Comment:
● The figures show the ACF and PACF of the same series after taking the first difference.
In both cases, the patterns do not show a significant change.
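
Differencing itself is a simple operation, sketched below in Python. The function name is hypothetical and only the first four observations of Table 1 are used, so this is an illustration of the transformation rather than the procedure run in the statistical software.

```python
def difference(series, order=1):
    """Apply differencing `order` times: each pass replaces y_t by y_t - y_(t-1)."""
    result = list(series)
    for _ in range(order):
        result = [result[i] - result[i - 1] for i in range(1, len(result))]
    return result


# First four observations from Table 1 (total population, in thousands).
population = [10881.8, 11159.7, 11441.3, 11719.8]
print([round(d, 1) for d in difference(population, order=1)])  # yearly changes
print([round(d, 1) for d in difference(population, order=2)])  # changes of the changes
```

Each round of differencing shortens the series by one observation and removes one level of trend.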

6.3.2 Second-order differencing

Figure 32

Figure 33

Figure 34

Figure 35
Comment:
● ACF: The rate of decay is much faster after second differencing, with the values of
the autocorrelation changing from positive to negative at lag 3. The ACF has one
significant spike at lag 9 and a second spike at lag 12 that barely reaches the line.
● PACF: There is one spike, at lag 9.

6.3.3 ARIMA (1,2,2)

Figure 36

Figure 37

Figure 38

Figure 39

6.3.4 ARIMA (1,2,1)

Figure 40

Figure 41

Figure 42

Figure 43

6.3.5 ARIMA (1,2,3)

Figure 44

Figure 45

Figure 46

Figure 47

6.3.6 ARIMA (2,2,2)

Figure 48

Figure 49

Figure 50

Figure 51

6.3.7 ARIMA Comparison

Statistics                              Model Form
                                        ARIMA(1,2,2)   ARIMA(1,2,1)   ARIMA(1,2,3)   ARIMA(2,2,2)
Bayesian Information Criterion (BIC)    6.406          6.281          6.503          6.362

Comment:
By comparing the values of the Bayesian Information Criterion (BIC), we can conclude that
ARIMA (1,2,1) is the best of the four models because it has the lowest BIC value.
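
The selection rule can be expressed in a few lines of Python. The BIC values are copied from the comparison table above, and the variable names are hypothetical.

```python
# BIC values from the ARIMA comparison table.
bic = {
    "ARIMA(1,2,2)": 6.406,
    "ARIMA(1,2,1)": 6.281,
    "ARIMA(1,2,3)": 6.503,
    "ARIMA(2,2,2)": 6.362,
}

# The preferred model is the one with the smallest BIC.
best_model = min(bic, key=bic.get)
print(best_model, bic[best_model])  # prints: ARIMA(1,2,1) 6.281
```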

6.3.8 Evaluating Part

Figure 52

Figure 53

Figure 54

Figure 55

Figure 56

Therefore, the one-step-ahead forecast from the ARIMA(1,2,1) model is 32,825.6 (in thousands).

CONCLUSION

MODEL ESTIMATION

Model                            MSE         MAPE
Naive Forecast                   206627.2    2.211999
Naive with Trend                 3458.867    0.122432
Average Change Model             3143.748    0.132962
Average Percent Change Model     3399.711    0.145488
Single Exponential               314554.9    2.689941
Double Exponential               4743.025    0.204392
Holt's Method                    53738.75    1.127817

The error measure table shows that the Average Change Model has the lowest MSE (3,143.748) and
the second-lowest MAPE (0.132962). The Naive with Trend model has the lowest MAPE (0.122432)
and the second-lowest MSE (3,458.867), so it is a close competitor. The Naive Forecast performs
poorly, with the second-highest MSE and the second-highest MAPE among all models.
The Single Exponential model is the worst model since it has the highest MSE and the highest
MAPE, followed by the Naive Forecast and Holt's Method.
As a result, the Average Change Model may be the best method to forecast the
population of Malaysia for the upcoming years.
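
The ranking can be checked with a short Python sketch. The error values are copied from the model estimation table above, and the variable names are hypothetical.

```python
# (MSE, MAPE) for each model, copied from the model estimation table.
errors = {
    "Naive Forecast": (206627.2, 2.211999),
    "Naive with Trend": (3458.867, 0.122432),
    "Average Change Model": (3143.748, 0.132962),
    "Average Percent Change Model": (3399.711, 0.145488),
    "Single Exponential": (314554.9, 2.689941),
    "Double Exponential": (4743.025, 0.204392),
    "Holt's Method": (53738.75, 1.127817),
}

lowest_mse = min(errors, key=lambda model: errors[model][0])
lowest_mape = min(errors, key=lambda model: errors[model][1])
highest_mse = max(errors, key=lambda model: errors[model][0])

print("Lowest MSE: ", lowest_mse)
print("Lowest MAPE:", lowest_mape)
print("Highest MSE:", highest_mse)
```

Sorting by each measure separately makes it clear when the two criteria disagree about the best model.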

REFERENCE

DOSM, Department of Statistics Malaysia. (2019, July 15). Current Population
Estimates, Malaysia, 2018-2019. Retrieved June 16, 2023.

Dr. H. F. (n.d.). Malaysia. Map Malaysia - Population density by administrative division.
http://geo-ref.net/ph/mys.htm

Pfordten, D. (2022, April 21). Interactive: Malaysia's population is barely growing. Here's why
and why it matters. The Star.
https://www.thestar.com.my/news/nation/2022/04/22/interactive-malaysias-population-is-barely-growing-heres-why-and-why-it-matters

The Star. (2023, February 9). Malaysia's Population Estimated to Be 33mil at End-2022, Says
Stats Dept.

Yusoff, M. B., Hasan, F. A., & Jalil, S. A. (2000). Globalisation, economic policy, and equity: the
case of Malaysia.
