Credit Units: 4 CU
Course description
Forecasting
By the end of the course the student should be able to demonstrate basic
competence in the concepts, principles, procedures and applications of time
series analysis.
Reading List
Chapter One
There are a number of reasons for analyzing time series, which include but are
not limited to the following:
As a result, the ability to forecast and predict future events and trends
greatly enhances the likelihood of success. It is therefore no wonder that
businesses and governments spend a good deal of time and effort in the
pursuit of accurate forecasts of future trends and developments.
There are four components to a time series: the trend, the cyclical variation,
the seasonal variation, and the irregular variation.
Trend (Secular Trend) is the steady increase or decrease over a long period of
time, reflecting long-term growth or decline of the variable of interest.
TIME SERIES ANALYSIS KYAMBOGO UNIVERSITY; Jan –May 2020
There are two main reasons why we may wish to identify cycles in time series.
In the first place, we may want to know where we are in the cycle to anticipate
what may happen in the near future.
Second, as with trend, when a cycle is identified and isolated, the other factors
affecting the time series data are more easily seen and can be explained
accordingly.
Practically, all business and economic series have recurring seasonal patterns.
For example, sales for clothes are high prior to Christmas. Prices of produce
are low at harvest time.
Furthermore, once we know the seasonal variations, we may want to iron out
the intra-year variations by promoting during the off season.
Time series analysts prefer to subdivide the irregular variation into episodic
and residual variations.
Episodic variation
Residual variation
After the episodic fluctuations have been removed, the remaining variation is
the residual variation, often called chance fluctuations. These are
unpredictable and they cannot be identified.
Note: Neither the episodic nor the residual variations can be predicted into the
future. They are merely treated as the residual influence after the other three
components of the time series data have been taken into account.
1.4.1 Definition
The additive model refers to time series 𝑌𝑡 as an algebraic sum of the four
components, symbolically expressed as: 𝑌𝑡 = 𝑇𝑡 + 𝑆𝑡 + 𝐶𝑡 + 𝐼𝑡
Where 𝑌𝑡 is the value of the time series for the time period t, and the right-hand
side values are the trend, the seasonal variation, the cyclical variation, and the
random or irregular variation respectively, for the same time period.
All the values are expressed in their original units, and S, C and I are
deviations around T.
The four components are assumed to be independent of one another. That is, a
movement in one component does not affect the other components.
In order to break down the time series data and measure the effect of the
individual components, we proceed in four steps as follows:
Example: If we were to develop a time-series model for sales for a local retail
store, we might find that T = $500, S = $100, C = -$25, and I = -$10. Sales
would be: Y = $500 + $100 -$25 -$10 = $565
Notice that the positive value for S indicates that existing seasonal influences
have a positive impact on sales. The negative cyclical value suggests that the
business cycle is currently in a downswing. There was apparently some
random event that had a negative impact on sales.
The additive model suffers from the somewhat unrealistic assumption that the
components are independent of each other. This is seldom the case in the real
world. In most instances, movements in one component will have an impact on
the other components, thereby negating the assumption of independence. Or,
perhaps even more commonly, we often find that certain forces at work in the
economy simultaneously affect two or more components. Again, the
assumption of independence is violated.
The multiplicative model assumes that the four components interact with each
other and do not move independently.
It is expressed as follows: 𝑌𝑡 = 𝑇𝑡 x 𝑆𝑡 x 𝐶𝑡 x 𝐼𝑡
This model is often preferred for the reason that the components affect one
another.
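As a small illustration of the two models, the sketch below combines the four components both ways. The additive figures are those of the retail-store example above; the multiplicative indices (1.20, 0.95, 0.98) are hypothetical values chosen only for illustration, and the function names are not from the course text.

```python
def additive(trend, seasonal, cyclical, irregular):
    """Additive model: all components are in the original units and are summed."""
    return trend + seasonal + cyclical + irregular


def multiplicative(trend, seasonal, cyclical, irregular):
    """Multiplicative model: trend in original units, the rest as indices around 1."""
    return trend * seasonal * cyclical * irregular


# Figures from the retail-store example in the text.
sales_additive = additive(500, 100, -25, -10)

# Hypothetical indices (an assumption, not from the text): season 20% above
# trend, cycle 5% below, irregular 2% below.
sales_multiplicative = multiplicative(500, 1.20, 0.95, 0.98)

print(sales_additive, round(sales_multiplicative, 2))
```

In the multiplicative form a seasonal index scales with the level of the trend, which is one way of seeing why the components are said to interact.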
In order to break down the time series data and measure the effect of the
individual components, we proceed in four steps as follows:
1. Isolate the seasonal variation, S, and then deseasonalise the data:
$$\frac{T_t \times S_t \times C_t \times I_t}{S_t} = T_t \times C_t \times I_t$$
Interpretation
Examine the trend analysis plot to determine whether your model fits your data. If the fits closely follow
the actual data, the model fits your data. Ideally, the data points should fall randomly around the fitted
line.
If the model fits the data, you can perform double exponential smoothing and compare the two
models.
If the model does not fit the data, perform the analysis again and select a different type of
model. If you fit a linear model and see curvature in the data, select the quadratic, exponential,
or S-curve model. If none of the models fit your data, use a different time series analysis.
On this trend analysis plot, the fits closely follow the data, which indicates that the model fits the
data.
Traditional methods of time series analysis are concerned with decomposing a series into a
trend, a seasonal variation and other irregular fluctuations. Although this approach is not always
the best, it is still useful (Kendall and Stuart, 1996).
The parts of which a time series is composed are called the components of time series data.
There are four basic components of time series data, described below.
average or autoregressive models, i.e. we can see if any cyclical variation is still left in the
residuals. Variations that occur due to sudden causes are called residual variation (irregular
variation, or accidental or erratic fluctuations) and are unpredictable: for example, a rise in the
price of steel due to a strike in the factory, an accident due to brake failure, a flood, an
earthquake, a war, etc.
R TIME SERIES
Exercise 1:
Plot a graph using the time series data in the table below:
Exercise 2:
Plot a graph using the time series data in the table below:
Exercise 3:
The table below shows weekly fuel purchases for the Ministry of Finance
Headquarters:
Week Fuel (liters) Week Fuel (liters)
1 409 11 318
2 289 12 598
3 509 13 418
4 364 14 359
5 404 15 432
6 445 16 252
7 310 17 446
8 372 18 473
9 440 19 337
10 414 20 478
In the modules on hypothesis testing we presented techniques for testing the equality of means in
more than two independent samples using analysis of variance (ANOVA). An underlying
assumption for appropriate use of ANOVA was that the continuous outcome was approximately
normally distributed or that the samples were sufficiently large (usually nj> 30, where j=1, 2, ...,
k and k denotes the number of independent comparison groups). An additional assumption for
appropriate use of ANOVA is equality of variances in the k comparison groups. ANOVA is
generally robust when the sample sizes are small but equal. When the outcome is not normally
distributed and the samples are small, a nonparametric test is appropriate.
A popular nonparametric test to compare outcomes among more than two independent groups is
the Kruskal Wallis test. The Kruskal Wallis test is used to compare medians among k
comparison groups (k > 2) and is sometimes described as an ANOVA with the data replaced by
their ranks. The null and research hypotheses for the Kruskal Wallis nonparametric test are
stated as follows:
The procedure for the test involves pooling the observations from the k samples into one
combined sample, keeping track of which sample each observation comes from, and then
ranking lowest to highest from 1 to N, where N = n1+n2 + ...+ nk. To illustrate the procedure,
consider the following example.
Example:
A clinical study is designed to assess differences in albumin levels in adults following diets with
different amounts of protein. Low protein diets are often prescribed for patients with kidney
failure. Albumin is the most abundant protein in blood, and its concentration in the serum is
measured in grams per deciliter (g/dL). Clinically, serum albumin concentrations are also used to
assess whether patients get sufficient protein in their diets. Three diets are compared, ranging
from 5% to 15% protein, and the 15% protein diet represents a typical American diet. The
albumin levels of participants following each diet are shown below.
Is there a difference in serum albumin levels among subjects on the three different diets? For
reference, normal albumin levels are generally between 3.4 and 5.4 g/dL. By inspection, it
appears that participants following the 15% protein diet have higher albumin levels than those
following the 5% protein diet. The issue is whether this observed difference is statistically
significant.
In this example, the outcome is continuous, but the sample sizes are small and not equal across
comparison groups (n1=3, n2=5, n3=4). Thus, a nonparametric test is appropriate. The hypotheses
to be tested are given below, and we will use a 5% level of significance.
To conduct the test we first order the data in the combined total sample of 12 subjects from
smallest to largest. We also need to keep track of the group assignments in the total sample.
Notice that the lower ranks (e.g., 1, 2.5, 4) are assigned to the 5% protein diet group while the
higher ranks (e.g., 10, 11 and 12) are assigned to the 15% protein diet group. Again, the goal of
the test is to determine whether the observed data support a difference in the three population
medians. Recall in the parametric tests, discussed in the modules on hypothesis testing, when
comparing means among more than two groups we analyzed the difference among the sample
means (mean square between groups) relative to their within group variability and summarized
the sample information in a test statistic (F statistic). In the Kruskal Wallis test we again
summarize the sample information in a test statistic based on the ranks.
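The test statistic for the Kruskal Wallis test is denoted H and is defined as follows:
$$H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1)$$
where k=the number of comparison groups, N= the total sample size, nj is the sample size in the
jth group and Rj is the sum of the ranks in the jth group.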
In this example R1 = 7.5, R2 = 30.5, and R3 = 40. Recall that the sum of the ranks will always
equal n(n+1)/2. As a check on our assignment of ranks, we have n(n+1)/2 = 12(13)/2=78 which
is equal to 7.5+30.5+40 = 78. The H statistic for this example is computed as follows:
We must now determine whether the observed test statistic H supports the null or research
hypothesis. Once again, this is done by establishing a critical value of H. If the observed value of
H is greater than or equal to the critical value, we reject H0 in favor of H1; if the observed value
of H is less than the critical value we do not reject H0. The critical value of H can be found in the
table below.
To determine the appropriate critical value we need sample sizes (n1=3, n2=5 and n3=4) and our
level of significance (α=0.05). For this example the critical value is 5.656, thus we reject H0
because 7.52 > 5.656, and we conclude that there is a difference in median albumin levels among
the three different diets.
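The computation of H from the rank sums can be sketched as follows. This is a minimal illustration, not a full implementation: it assumes the ranks have already been assigned and summed by group, and it applies no correction for ties. The function names are chosen here for illustration. The inputs are the rank sums quoted in this chapter for the albumin example (7.5, 30.5, 40) and for the athletes example (46, 62, 24, 78).

```python
def kruskal_wallis_h(rank_sums, sizes):
    """H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1), without tie correction."""
    n_total = sum(sizes)
    weighted = sum(r * r / n for r, n in zip(rank_sums, sizes))
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


def rank_sums_are_consistent(rank_sums, sizes):
    """Check that the rank sums add up to N (N + 1) / 2."""
    n_total = sum(sizes)
    return sum(rank_sums) == n_total * (n_total + 1) / 2


# Albumin example: three diets with n1 = 3, n2 = 5, n3 = 4.
h_albumin = kruskal_wallis_h([7.5, 30.5, 40], [3, 5, 4])

# Athletes example: four groups of five.
h_athletes = kruskal_wallis_h([46, 62, 24, 78], [5, 5, 5, 5])

print(round(h_albumin, 2), round(h_athletes, 2))
```

The value returned is then compared with the tabulated critical value, or with the chi-square critical value when every group has at least five observations.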
Notice that Table 8 contains critical values for the Kruskal Wallis test for tests comparing 3, 4 or
5 groups with small sample sizes. If there are 3 or more comparison groups and 5 or more
observations in each of the comparison groups, it can be shown that the test statistic H
approximates a chi-square distribution with df=k-1. Thus, in a Kruskal Wallis test with 3 or
more comparison groups and 5 or more observations in each group, the critical value for the test
can be found in the table of Critical Values of the χ 2 Distribution below.
Example:
A personal trainer is interested in comparing the anaerobic thresholds of elite athletes. Anaerobic
threshold is defined as the point at which the muscles cannot get more oxygen to sustain activity
or the upper limit of aerobic exercise. It is a measure also related to maximum heart rate. The
following data are anaerobic thresholds for distance runners, distance cyclists, distance
swimmers and cross-country skiers.
H0: The four population medians are equal
H1: The four population medians are not all equal
α=0.05
The test statistic for the Kruskal Wallis test is denoted H and is defined as follows:
$$H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1)$$
where k=the number of comparison groups, N= the total sample size, nj is the sample size in the
jth group and Rj is the sum of the ranks in the jth group.
Because there are 4 comparison groups and 5 observations in each of the comparison groups, we
find the critical value in the table of critical values for the chi-square distribution for
df=k-1=4-1=3 and α=0.05. The critical value is 7.81, and the decision rule is to reject H0 if H > 7.81.
To conduct the test we assign ranks using the procedures outlined above. The first step in
assigning ranks is to order the data from smallest to largest. This is done on the combined or total
sample (i.e., pooling the data from the four comparison groups (n=20)), and assigning ranks from
1 to 20, as follows. We also need to keep track of the group assignments in the total sample. The
table below shows the ordered data.
We now assign the ranks to the ordered values and sum the ranks in each group.
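Recall that the sum of the ranks will always equal n(n+1)/2. As a check on our assignment of
ranks, we have n(n+1)/2 = 20(21)/2=210 which is equal to 46+62+24+78 = 210. In this example,
$$H = \frac{12}{20(21)}\left(\frac{46^2}{5} + \frac{62^2}{5} + \frac{24^2}{5} + \frac{78^2}{5}\right) - 3(21) = 9.11$$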
Step 5. Conclusion.
Reject H0 because 9.11 > 7.81. We have statistically significant evidence at α =0.05, to show that
there is a difference in median anaerobic thresholds among the four different groups of elite
athletes.
Notice that in this example, the anaerobic thresholds of the distance runners, cyclists and cross-
country skiers are comparable (looking only at the raw data). The distance swimmers appear to
be the athletes that differ from the others in terms of anaerobic thresholds. Recall, similar to
analysis of variance tests, we reject the null hypothesis in favor of the alternative hypothesis if
any two of the medians are not equal.
Exercise 1
Test for stationarity in the time series using a Runs test on the following data:
Exercise 2
Use turning point test to test for stationarity in the data in exercise 1.
Exercise 3
Exercise 4
Exercise 5
Use the Pearson’s test on the data below to test for stationarity in the time
series:
t 𝒀𝒕 𝒕𝟐 𝒀𝒕 𝟐 t𝒀𝒕
1 1.82 1 3.31 1.82
2 2.6 4 6.76 5.2
3 1.7 9 2.89 5.1
4 2.8 16 7.84 11.2
5 3.4 25 11.56 17.0
6 4.3 36 18.49 25.8
7 3.1 49 9.61 21.7
8 4.5 64 20.25 36.0
9 5.0 81 25.0 45.0
10 5.7 100 32.49 57.0
11 4.12 121 16.97 45.32
12 3.6 144 12.96 43.2
13 6.3 169 39.69 81.9
14 7.0 196 49.0 98.0
Exercise 6
A record of the production for each machine operator was kept over a period of
time. Certain changes in the production procedure were suggested, and 11
operators were picked as an experimental test group to determine whether the
new procedures were worthwhile. Their production rates before and after the
new procedures were established and recorded as follows:
Exercise 7
One of the major car manufacturers is studying the effect of regular versus
super gasoline in its economy cars. Ten executives are selected and asked to
maintain records on the number of kilometers per liter of gas. The results are:
Exercise 8
It has been suggested that daily production of an assembly line for batteries at
Uganda Batteries Limited would be increased if better portable lighting were
installed and background music and free coffee and doughnuts were provided
during the day. Management agreed to try the scheme for a limited time. The
numbers of batteries produced per week by a small test group of employees are
as follows:
Using the Wilcoxon signed-rank test, determine whether the suggested changes
are worthwhile.
a. State the null hypothesis
b. Decide on the alternative hypothesis
c. Decide on the level of significance
d. State the decision rule
e. Compute T and arrive at a decision
Exercise 9
The following observations were selected from populations that were not
necessarily normally distributed. Use the 0.05 significance level, a two-tailed
test, and the Wilcoxon rank-sum test to determine whether there is a
difference between the two populations:
Population A 38 45 56 57 61 69 70 79
Population B 26 31 35 42 51 52 57 62
Exercise 10
Current Method: 41 36 42 39 36 48 49 38
Experimental: 21 27 36 20 19 21 39 24 22
Exercise 11
independent.
Exercise 12
Four methods of treating steel rods are analyzed to determine whether there
is any difference in the pressure the rods can bear before breaking.
The results of the tests measuring the pressure in pounds before the rods
bent are shown. Conduct the test, complete with the hypotheses, decision
rule, and conclusion. Set 𝛂 = 1 percent.
Exercise 13
The quality control manager for a large plant in Kampala Industrial Area gives
two operations manuals to two groups of employees. Each group is then tested
on operational procedures. The scores are shown in the table below. The
manager has always felt that manual 1 provides a better base of knowledge for
new employees. Compute the mean test scores of the employees and report
your conclusion. State the hypotheses. Set 𝛂 = 0.05.
Manual 1 87 97 82 97 92 90 81 89 90 88 87 89 93
Manual 2 92 79 80 73 84 93 86 88 91 82 81 84 72 74
Exercise 14
At the 10 percent level, is there a relationship between study time in hours and
grades on a test, according to the data in the table below?
Time 21 18 15 17 18 25 18 4 6 5
Grade 67 58 59 54 58 80 14 15 19 21
CHAPTER THREE
3.1 Introduction
A series whose average value changes over time is said to be non-stationary.
3.2 Definition
A trend is a general tendency of a series to either rise or decline. That is, a time
series $Y_t$ is trended if $E(Y_t) = f(\beta_0, \beta_1, \ldots)$ for $t = 1, 2, \ldots$
There are numerous tests for trends but the most common are the tests for
linear trends and those for non-linear trends.
The basic tests for linear trends are the Simple linear and the Least Square
Models.
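The long-term trend of many time series such as sales, exports and production
often approximates a straight line.
$$\frac{\partial(\varepsilon_t)^2}{\partial\alpha_1} = 2(-t)(Y_t - \alpha_0 - \alpha_1 t) = 0 \qquad \text{(ii)}$$
From (i) dividing both sides by -2:
$$\frac{(-2)(Y_t - \alpha_0 - \alpha_1 t)}{-2} = \frac{0}{-2}$$
Therefore: $(Y_t - \alpha_0 - \alpha_1 t) = 0$
$$\begin{pmatrix} n & \sum t \\ \sum t & \sum t^2 \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} = \begin{pmatrix} \sum Y_t \\ \sum tY_t \end{pmatrix}$$
$$\alpha_0 = \frac{\begin{vmatrix} \sum Y_t & \sum t \\ \sum tY_t & \sum t^2 \end{vmatrix}}{\begin{vmatrix} n & \sum t \\ \sum t & \sum t^2 \end{vmatrix}}, \quad \text{and} \quad \alpha_1 = \frac{\begin{vmatrix} n & \sum Y_t \\ \sum t & \sum tY_t \end{vmatrix}}{\begin{vmatrix} n & \sum t \\ \sum t & \sum t^2 \end{vmatrix}}$$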
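Example:
        t     Y_t     tY_t     t^2
        1     1.8      1.8       1
        2     2.4      4.8       4
        3     2.8      8.4       9
        4     3.6     14.4      16
        5     4.2     21.0      25
        6     4.9     29.4      36
        7     5.9     41.3      49
        8     3.8     30.4      64
        9     6.4     57.6      81
       10     7.0     70.0     100
Total  55    42.8    279.1     385
$$\sum Y_t = n\alpha_0 + \alpha_1 \sum t$$
$$\sum tY_t = \alpha_0 \sum t + \alpha_1 \sum t^2$$
$$\begin{pmatrix} n & \sum t \\ \sum t & \sum t^2 \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} = \begin{pmatrix} \sum Y_t \\ \sum tY_t \end{pmatrix}$$
$$\begin{pmatrix} 10 & 55 \\ 55 & 385 \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} = \begin{pmatrix} 42.8 \\ 279.1 \end{pmatrix}$$
$$\alpha_0 = \frac{\begin{vmatrix} 42.8 & 55 \\ 279.1 & 385 \end{vmatrix}}{\begin{vmatrix} 10 & 55 \\ 55 & 385 \end{vmatrix}} = \frac{(42.8 \times 385) - (279.1 \times 55)}{(10 \times 385) - (55 \times 55)} = \frac{1127.5}{825} = 1.37$$
$$\alpha_1 = \frac{\begin{vmatrix} 10 & 42.8 \\ 55 & 279.1 \end{vmatrix}}{\begin{vmatrix} 10 & 55 \\ 55 & 385 \end{vmatrix}} = \frac{(10 \times 279.1) - (42.8 \times 55)}{(10 \times 385) - (55 \times 55)} = \frac{437}{825} = 0.53$$
Projection for t = 15: $\hat{Y}_{15} = T_{15} = 1.37 + 0.53(15) = 1.37 + 7.95 = 9.32$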
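The determinant solution of the two normal equations can be checked with a few lines of code. The sketch below is illustrative only: the function name is an assumption, and it works at full precision, so its coefficients may differ in the last decimal place from values rounded by hand.

```python
def fit_linear_trend(t, y):
    """Solve the two normal equations for T_t = a0 + a1 * t by Cramer's rule."""
    n = len(t)
    sum_t = sum(t)
    sum_y = sum(y)
    sum_ty = sum(ti * yi for ti, yi in zip(t, y))
    sum_t2 = sum(ti * ti for ti in t)
    det = n * sum_t2 - sum_t * sum_t          # |n  St ; St  St2|
    a0 = (sum_y * sum_t2 - sum_ty * sum_t) / det
    a1 = (n * sum_ty - sum_t * sum_y) / det
    return a0, a1


# Data of the worked example above.
t = list(range(1, 11))
y = [1.8, 2.4, 2.8, 3.6, 4.2, 4.9, 5.9, 3.8, 6.4, 7.0]

a0, a1 = fit_linear_trend(t, y)
forecast_15 = a0 + a1 * 15  # projection for t = 15

print(round(a0, 4), round(a1, 4), round(forecast_15, 2))
```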
Definition
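The Least Squares Model is the method of computing the equation for the
straight line through the data of interest that gives the “best-fitting” line.
The Least Squares Estimation Method (LSE) minimizes the sum of the squared
error terms.
$$\varepsilon_t = Y_t - \alpha_0 - \alpha_1 t - \alpha_2 t^2$$
$$\frac{\partial(\varepsilon_t)^2}{\partial\alpha_1} = 2(-t)(Y_t - \alpha_0 - \alpha_1 t - \alpha_2 t^2) = 0 \qquad \text{(ii)}$$
$$\frac{\partial(\varepsilon_t)^2}{\partial\alpha_2} = 2(-t^2)(Y_t - \alpha_0 - \alpha_1 t - \alpha_2 t^2) = 0 \qquad \text{(iii)}$$
From (i) dividing both sides by -2:
$$\frac{(-2)(Y_t - \alpha_0 - \alpha_1 t - \alpha_2 t^2)}{-2} = \frac{0}{-2}$$
Therefore: $(Y_t - \alpha_0 - \alpha_1 t - \alpha_2 t^2) = 0$
From (ii) dividing both sides by -2:
Therefore: $(tY_t - \alpha_0 t - \alpha_1 t^2 - \alpha_2 t^3) = 0$
From (iii) dividing both sides by -2:
$$\frac{(-2)(t^2)(Y_t - \alpha_0 - \alpha_1 t - \alpha_2 t^2)}{-2} = \frac{0}{-2}$$
Therefore: $(t^2 Y_t - \alpha_0 t^2 - \alpha_1 t^3 - \alpha_2 t^4) = 0$
$$\begin{pmatrix} n & \sum t & \sum t^2 \\ \sum t & \sum t^2 & \sum t^3 \\ \sum t^2 & \sum t^3 & \sum t^4 \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} \sum Y_t \\ \sum tY_t \\ \sum t^2 Y_t \end{pmatrix}$$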
Example:
        t     Y_t    t^2     t^3      t^4     tY_t    t^2 Y_t
        1     1.8      1       1        1      1.8        1.8
        2     2.3      4       8       16      4.6        9.2
        3     2.8      9      27       81      8.4       25.2
        4     3.6     16      64      256     14.4       57.6
        5     4.2     25     125      625     21.0      105.0
        6     4.9     36     216     1296     29.4      176.4
        7     5.9     49     343     2401     41.3      289.1
        8     3.8     64     512     4096     30.4      243.2
        9     6.4     81     729     6561     57.6      518.4
       10     7.0    100    1000    10000     70.0      700.0
Total  55    42.7    385    3025    25333    278.9     2125.9
$$\begin{pmatrix} n & \sum t & \sum t^2 \\ \sum t & \sum t^2 & \sum t^3 \\ \sum t^2 & \sum t^3 & \sum t^4 \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} \sum Y_t \\ \sum tY_t \\ \sum t^2 Y_t \end{pmatrix}$$
$$\begin{pmatrix} 10 & 55 & 385 \\ 55 & 385 & 3025 \\ 385 & 3025 & 25333 \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} 42.7 \\ 278.9 \\ 2125.9 \end{pmatrix}$$
𝛼0 = 1.225, 𝛼1 = 0.588, 𝛼2 = -0.0049
Therefore: 𝑌̂ = 𝛼0 + 𝛼1 t + 𝛼2 𝑡 2
For example t = 7
For projection:
For example t = 12
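The three normal equations of the second-degree trend can be solved numerically. The sketch below is one possible way of doing so, using a small Gaussian elimination written out by hand so that no external library is needed; the helper names are assumptions made for illustration. It then evaluates the fitted curve at t = 7 (a fitted value inside the sample) and t = 12 (a projection), the two cases named above.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_quadratic_trend(t, y):
    """Build and solve the normal equations for a0 + a1 t + a2 t^2."""
    def s(p):       # sum of t^p
        return sum(ti ** p for ti in t)

    def sy(p):      # sum of t^p * Y_t
        return sum((ti ** p) * yi for ti, yi in zip(t, y))

    a = [[len(t), s(1), s(2)],
         [s(1), s(2), s(3)],
         [s(2), s(3), s(4)]]
    return solve_linear_system(a, [sy(0), sy(1), sy(2)])


# Data of the second-degree example above.
t = list(range(1, 11))
y = [1.8, 2.3, 2.8, 3.6, 4.2, 4.9, 5.9, 3.8, 6.4, 7.0]

a0, a1, a2 = fit_quadratic_trend(t, y)
fitted_7 = a0 + a1 * 7 + a2 * 7 ** 2
projected_12 = a0 + a1 * 12 + a2 * 12 ** 2

print(round(a0, 3), round(a1, 3), round(a2, 4))
print(round(fitted_7, 2), round(projected_12, 2))
```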
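$$\sum Y_t = n\alpha_0 + \alpha_1 \sum t + \alpha_2 \sum t^2 + \alpha_3 \sum t^3 \qquad \text{(i)}$$
$$\sum tY_t = \alpha_0 \sum t + \alpha_1 \sum t^2 + \alpha_2 \sum t^3 + \alpha_3 \sum t^4 \qquad \text{(ii)}$$
$$\sum t^2 Y_t = \alpha_0 \sum t^2 + \alpha_1 \sum t^3 + \alpha_2 \sum t^4 + \alpha_3 \sum t^5 \qquad \text{(iii)}$$
$$\sum t^3 Y_t = \alpha_0 \sum t^3 + \alpha_1 \sum t^4 + \alpha_2 \sum t^5 + \alpha_3 \sum t^6 \qquad \text{(iv)}$$
$$\sum Y_t = n\alpha_0 + \alpha_1 \sum t$$
$$\sum tY_t = \alpha_0 \sum t + \alpha_1 \sum t^2$$
Conversely,
$$\sum \ln Y_t = n \ln\alpha_0 + \ln\alpha_1 \sum t$$
$$\sum t \ln Y_t = \ln\alpha_0 \sum t + \ln\alpha_1 \sum t^2$$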
This describes the exponential trend as a curve, where 𝛼0 and 𝛼1 are both
positive constants and 𝛼1 is raised to the power (exponent) equal to the number
of time periods from the midpoint origin. If we take the logarithm of both sides
of the equation, we get a logarithmic trend similar to a simple linear regression
equation.
Example:
        t      Y_t    ln Y_t   t ln Y_t    t^2
        1     2.27     0.82      0.82        1
        2     2.32     0.84      1.68        4
        3     2.393    0.87      2.61        9
        4     2.56     0.94      3.76       16
        5     2.647    0.97      4.85       25
        6     2.775    1.02      6.12       36
        7     2.85     1.05      7.35       49
        8     2.9      1.06      8.48       64
        9     2.982    1.09      9.81       81
       10     2.059    0.72      7.2       100
       11     3.143    1.15     12.65      121
       12     3.289    1.19     14.28      144
       13     3.376    1.22     15.86      169
       14     3.602    1.28     17.92      196
Total 105             14.22    113.39     1015
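$$\alpha_1 = \left(\sum\nolimits_2 y - \sum\nolimits_1 y\right)\left(\frac{\alpha_2 - 1}{(\alpha_2^n - 1)^2}\right)$$
$$\alpha_0 = \frac{1}{n}\left[\sum\nolimits_1 y - \left(\frac{\alpha_2^n - 1}{\alpha_2 - 1}\right)\alpha_1\right]$$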
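Example:
t       y_t       t       y_t       t       y_t
1       1.985     8       2.393     15      3.059
2       2.032     9       2.560     16      3.143
3       2.088     10      2.647     17      3.289
4       2.095     11      2.775     18      3.376
5       2.182     12      2.853     19      3.454
6       2.270     13      2.904     20      3.547
7       2.320     14      2.982     21      3.602
Total   14.972    Total   19.114    Total   23.47
$$\alpha_2^7 = \frac{\sum_3 y - \sum_2 y}{\sum_2 y - \sum_1 y} = \frac{23.47 - 19.114}{19.114 - 14.972} = 1.0517$$
$$\alpha_2 = \sqrt[7]{1.0517} = 1.0072$$
$$\alpha_1 = \left(\sum\nolimits_2 y - \sum\nolimits_1 y\right)\left(\frac{\alpha_2 - 1}{(\alpha_2^n - 1)^2}\right) = (4.142)\left(\frac{1.0072 - 1}{(1.0517 - 1)^2}\right) = (4.142)\left(\frac{0.0072}{(0.0517)^2}\right) = (4.142)\left(\frac{0.0072}{0.00267}\right) = 11.1574$$
$$\alpha_0 = \frac{1}{n}\left[\sum\nolimits_1 y - \left(\frac{\alpha_2^n - 1}{\alpha_2 - 1}\right)\alpha_1\right] = \frac{1}{7}\left[14.972 - \left(\frac{1.0517 - 1}{1.0072 - 1}\right)11.1574\right] = -9.3063$$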
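$$\alpha_2^n = \frac{\sum_3 \ln y - \sum_2 \ln y}{\sum_2 \ln y - \sum_1 \ln y}$$
$$\ln\alpha_1 = \left(\sum\nolimits_2 \ln y - \sum\nolimits_1 \ln y\right)\left(\frac{\alpha_2 - 1}{(\alpha_2^n - 1)^2}\right)$$
$$\ln\alpha_0 = \frac{1}{n}\left[\sum\nolimits_1 \ln y - \left(\frac{\alpha_2^n - 1}{\alpha_2 - 1}\right)\ln\alpha_1\right]$$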
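Example:
Using the information from the modified trend above fit a Gompertz Curve.
$$\alpha_2 = \sqrt[7]{0.849} = 0.977$$
$$\ln\alpha_1 = \left(\sum\nolimits_2 \ln y - \sum\nolimits_1 \ln y\right)\left(\frac{\alpha_2 - 1}{(\alpha_2^n - 1)^2}\right) = (7.014 - 5.313)\left(\frac{0.977 - 1}{(0.849 - 1)^2}\right) = -1.73$$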
The general behavior of the variable can often be best discussed by examining
its long-term trend. However, if the time series contains too many random
fluctuations or short-term seasonal changes, the trend may be somewhat
obscured and difficult to observe. It is possible to eliminate many of these
confounding factors by averaging the data over several time periods. This is
accomplished by using certain smoothing techniques that remove random
fluctuations in the series, thereby providing a less obstructed view of the
behavior of the series.
A Moving Average (MA) will have the effect of “smoothing out” the data,
producing a movement with fewer peaks and valleys. It is computed by
averaging the values in the time series over a set number of time periods. The
same number of time periods is retained for each average by dropping the
oldest and picking up the newest.
𝑌𝑡 − 𝑀𝐴 = 𝑆𝑡 + 𝐼𝑡
If L is odd, only calculate the Moving Average (MA) and not the Centred
Moving Average (CMA).
If L is even, calculate both the Moving Average and the Centred Moving
Average.
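The rule for odd and even L can be sketched as below. The function names are chosen for illustration. For an odd L each moving average already sits on a time period; for an even L it falls between two periods, so successive pairs of moving averages are averaged again to centre them. The quarterly series used here is the one tabulated in the seasonality example of Chapter Five.

```python
def moving_average(y, length):
    """Simple moving averages over windows of the given length."""
    return [sum(y[i:i + length]) / length for i in range(len(y) - length + 1)]


def centred_moving_average(y, length):
    """Moving averages aligned on time periods (extra centring step if length is even)."""
    ma = moving_average(y, length)
    if length % 2 == 1:
        return ma  # already centred on a period
    return [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]


# Quarterly data, 2008 to 2012 (quarters I to IV of each year in order).
quarterly = [4, 2, 1, 5, 6, 4, 4, 14, 10, 3, 5, 16, 12, 9, 7, 22, 10, 13, 35, 35]

ma4 = moving_average(quarterly, 4)
cma4 = centred_moving_average(quarterly, 4)   # first value belongs to 2008 quarter III
ma3 = centred_moving_average(quarterly, 3)    # odd length: plain moving average

print(ma4[:2], cma4[:2])
```

With L = 4 the first centred value lines up with the third observation, which is why the first two and last two quarters have no specific seasonal.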
Example 1:
Consider the data below and assume an additive model to compute the
seasonal and adjusted seasonal, hence de-seasonalize the series.
Solution:
Year                        I         II        III        IV
2008                                            -2.25      1.25
2009                        1.625     -1.875    -3.5       6.125
2010                        2.125     -5.25     -3.75      6.25
2011                        1.25      -2.75     -5.25      9.5
2012                        -6.5      -8.625
Average seasonal (AS_i)     -0.375    -4.625    -3.6875    5.78125
$$\sum AS_i = -2.90625$$
Adjusted seasonal: $\text{Adj } S_i = AS_i - \frac{\sum AS}{L}$
$$\text{Adj } S_1 = AS_1 - \frac{\sum AS}{L} = -0.375 - \frac{(-2.90625)}{4} = 0.3516$$
$$\text{Adj } S_2 = AS_2 - \frac{\sum AS}{L} = -4.625 - \frac{(-2.90625)}{4} = -3.8984$$
$$\text{Adj } S_3 = AS_3 - \frac{\sum AS}{L} = -3.6875 - \frac{(-2.90625)}{4} = -2.9609$$
$$\text{Adj } S_4 = AS_4 - \frac{\sum AS}{L} = 5.78125 - \frac{(-2.90625)}{4} = 6.5078$$
Example 2:
Consider the data below and assume a multiplicative model. Compute the
seasonal and adjusted seasonal.
Year I II III IV
2009 6 9 10 5
2010 16 20 18 6
2011 15 18 24 9
2012 20 22 26 12
Solution:
Year                I         II        III       IV
2009                                    1.1429    0.4396
2010                1.1636    1.3445    1.2101    0.4138
2011                1.0       1.1163    1.4015    0.4932
2012                1.0526    1.1210
Average seasonal    1.0721    1.1939    1.2515    0.4489
$$\sum AS_i = 3.9664$$
Adjusted seasonal: $\text{Adj } S_i = \frac{AS_i}{\sum AS_i} \times L$
$$\text{Adj } S_1 = \frac{AS_1}{\sum AS_i} \times L = \frac{1.0721}{3.9664} \times 4 = 1.0812$$
$$\text{Adj } S_2 = \frac{AS_2}{\sum AS_i} \times L = \frac{1.1939}{3.9664} \times 4 = 1.2040$$
$$\text{Adj } S_3 = \frac{AS_3}{\sum AS_i} \times L = \frac{1.2515}{3.9664} \times 4 = 1.2621$$
$$\text{Adj } S_4 = \frac{AS_4}{\sum AS_i} \times L = \frac{0.4489}{3.9664} \times 4 = 0.4527$$
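The two worked examples follow the same recipe: compute the centred moving average, take the difference (additive) or ratio (multiplicative) to obtain the specific seasonals, average them by quarter, and adjust the averages so that they sum to 0 or to L. A minimal sketch is given below; the function names are assumptions, and the series is assumed to start in the first season. The additive input is the quarterly series for 2008 to 2012 that appears in the seasonality example of Chapter Five; the multiplicative input is the table of Example 2.

```python
def centred_moving_average(y, period):
    """Moving average of the given period, centred on time periods."""
    ma = [sum(y[i:i + period]) / period for i in range(len(y) - period + 1)]
    if period % 2 == 1:
        return ma
    return [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]


def seasonal_indices(y, period, model="additive"):
    """Return (average seasonals, adjusted seasonals) for the chosen model."""
    if model not in ("additive", "multiplicative"):
        raise ValueError("model must be 'additive' or 'multiplicative'")
    cma = centred_moving_average(y, period)
    offset = period // 2                      # cma[k] is aligned with y[k + offset]
    by_season = [[] for _ in range(period)]
    for k, centre in enumerate(cma):
        i = k + offset
        specific = y[i] - centre if model == "additive" else y[i] / centre
        by_season[i % period].append(specific)
    averages = [sum(v) / len(v) for v in by_season]
    if model == "additive":
        correction = sum(averages) / period   # adjusted seasonals sum to 0
        adjusted = [a - correction for a in averages]
    else:
        scale = period / sum(averages)        # adjusted seasonals sum to L
        adjusted = [a * scale for a in averages]
    return averages, adjusted


additive_data = [4, 2, 1, 5, 6, 4, 4, 14, 10, 3, 5, 16, 12, 9, 7, 22, 10, 13, 35, 35]
multiplicative_data = [6, 9, 10, 5, 16, 20, 18, 6, 15, 18, 24, 9, 20, 22, 26, 12]

add_avg, add_adj = seasonal_indices(additive_data, 4, "additive")
mul_avg, mul_adj = seasonal_indices(multiplicative_data, 4, "multiplicative")

print([round(v, 4) for v in add_adj])
print([round(v, 4) for v in mul_adj])
```

To de-seasonalize, the adjusted seasonal for the relevant quarter is subtracted from each observation under the additive model, or divided into it under the multiplicative model.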
Exercise 1
Fit a linear trend for the data in the table below and plot both the time series
and the trend values.
Exercise 2
Fit a second degree trend to the following data and plot the time series and
trend.
t 𝒀𝒕 T 𝒀𝒕
1 1.985 8 2.393
2 2.032 9 2.560
3 2.086 10 2.775
4 2.095 11 2.853
5 2.188 12 2.904
6 2.270 13 2.981
7 2.320 14 3.376
Exercise 3
Exercise 4
CHAPTER FOUR
1. Understand what cyclic variations are and what their causes are
2. Define and use the application of the test for cyclic variations
4.1 Introduction
The cyclic variations are long peaks and troughs away from trend that occur
over a number of years.
There are a number of causes that lead to cycles in time series. These include:
The most commonly used test for cycles is the von Neumann ratio test.
4.3.1 Definition
The von Neumann ratio test is used to test whether the residuals in the time
series are independent of each other, that is, they do not have cycles within
themselves.
Example:
Year     t     Y_t     t^2     T_t      (Y_t - T_t)    R_t    d_t    d_t^2
1998     1     16.2      1     16.04       0.16         10
1999     2     15.4      4     16.82      -1.42          3     -7      49
2000     3     17.1      9     17.6       -0.5           6      3       9
2001     4     18.0     16     18.38      -0.38          7      1       1
2002     5     21.2     25     19.16       2.04         13      6      36
2003     6     21.4     36     19.94       1.46         12     -1       1
2004     7     20.4     49     20.72      -0.32          8     -4      16
2005     8     18.0     64     21.5       -3.5           1     -7      49
2006     9     21.3     81     22.28      -0.98          5      4      16
2007    10     23.7    100     23.06       0.64         11      6      36
2008    11     28.0    121     23.84       4.16         15      4      16
2009    12     27.6    144     24.62       2.98         14     -1       1
2010    13     24.2    169     25.4       -1.2           4    -10     100
2011    14     25.9    196     26.18      -0.28          9      5      25
2012    15     24.2    225     26.96      -2.76          2     -7      49
Total  120    322.6   1240                                             404
𝑇𝑡 = 𝛼0 + 𝛼1 t, ∑ 𝑌𝑡 = n𝛼0 + 𝛼1 ∑ 𝑡
∑ 𝑡𝑌𝑡 = 𝛼0 ∑ 𝑡 + 𝛼1 ∑ t 2
𝑇𝑡 = 𝑌̂ = 15.26 + 0.78t
Step 2: Select a Level of Significance: Choose a level say 0.1, 0.05 etc.
Step 4: Formulate a decision rule: Reject $H_0$ if $RM_c$ is less than $RM_t$ with (n-2).
Step 5: Carry out the test:
$$RM_c = \frac{12}{15(15^2 - 1)} \times 404 = 1.44$$
$RM_t = 1.19$
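The steps behind the computed ratio (residuals from the trend, their ranks, the successive rank differences and the ratio itself) can be reproduced with the sketch below. The function names are assumptions made for illustration, and the trend line 15.26 + 0.78t is taken from the example as given rather than re-estimated.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1       # mean of positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def rank_ratio(residuals):
    """Return (12 * sum(d^2) / (n (n^2 - 1)), sum(d^2)) for successive rank differences d."""
    ranks = average_ranks(residuals)
    n = len(ranks)
    sum_d2 = sum((ranks[i + 1] - ranks[i]) ** 2 for i in range(n - 1))
    return 12.0 * sum_d2 / (n * (n * n - 1)), sum_d2


# Series of the example, 1998 to 2012, and the fitted trend from the text.
y = [16.2, 15.4, 17.1, 18.0, 21.2, 21.4, 20.4, 18.0, 21.3, 23.7,
     28.0, 27.6, 24.2, 25.9, 24.2]
residuals = [yt - (15.26 + 0.78 * t) for t, yt in enumerate(y, start=1)]

rm_c, sum_d2 = rank_ratio(residuals)
print(sum_d2, round(rm_c, 2))
```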
Exercise 1
Exercise 2
t 𝒀𝒕 T 𝒀𝒕
1 1.985 8 2.393
2 2.032 9 2.560
3 2.086 10 2.775
4 2.095 11 2.853
5 2.188 12 2.904
6 2.270 13 2.981
7 2.320 14 3.376
Exercise 3
Listed below are the total numbers of vehicles sold by Toyota Motor Company
(in thousands) from 2000 to 2012.
CHAPTER FIVE
1. Understand what seasonal variations are and what their causes are
2. Define and use the application of the test for seasonal variations
5.1 Introduction
The rationale of a test for seasonality is that if the specific seasonals are
purely random then their distribution would be the same for all seasons. In
other words, the rankings of the specific seasonals should be as likely to fall
in one season as in another.
𝑌𝑡 − 𝑀𝐴 = 𝑆𝑡 + 𝐼𝑡
If L is odd, only calculate the Moving Average (MA) and not the Centred
Moving Average (CMA).
If L is even, calculate both the Moving Average and the Centred Moving
Average.
By doing so, we remove the trend and cyclic components from the series and
remain only with the seasonal and irregular terms.
Definitions
$n = \sum_{i=1}^{L} n_i$
Test statistic
$$H = \frac{12}{n(n+1)}\left[\sum \frac{R_i^2}{n_i}\right] - 3(n+1)$$
where n denotes the total number of pieces of data and $R_1, R_2, \ldots, R_K$
denote respectively, the sums of the ranks for the sample data from
populations 1, 2, ..., K.
1. Independent samples
2. Populations have the same shape
3. All samples are of size 5 or greater
Additive model:
$H_0: S_1 + S_2 + S_3 + S_4 = 0$, and $H_A$: Some $S_i \neq 0$
Or
$H_0: \sum S_i = 0$, and $H_A: \sum S_i \neq 0$
Multiplicative model:
$H_0: S_1 = S_2 = S_3 = S_4 = 1$, and $H_A$: Some $S_i \neq 1$
Or
$H_0: \sum S_i = L$, and $H_A: \sum S_i \neq L$
Generalization:
Step 3: Decide on the critical value $\chi^2_{\alpha}$ with (K - 1) degrees of
freedom. Use the tables or a calculator to determine the critical value.
Step 6: If the value of the test statistic falls in the rejection region, then reject
𝐻0 ; otherwise, do not reject 𝐻0 .
Example 1:
Using the data in the table below, make the Kruskal-Wallis test to determine
whether seasonality exists in the series or not:
Year I II III IV
2008 4 2 1 5
2009 6 4 4 14
2010 10 3 5 16
2011 12 9 7 22
2012 10 13 35 35
Specific Seasonal
Year       I         II        III       IV
2008                           -2.25     1.25
2009       1.625     -1.875    -3.5      6.125
2010       2.125     -5.25     -3.75     6.25
2011       1.25      -2.75     -5.25     9.5
2012       -6.5      -8.625
Ranking
Year       I         II        III       IV
2008                           8         10.5
2009       12        9         6         14
2010       13        3.5       5         15
2011       10.5      7         3.5       16
2012       2         1
R_i        37.5      20.5      22.5      55.5
n_i        4         4         4         4
R_i^2      1406.25   420.25    506.25    3080.25
$n = \sum n_i = 4 + 4 + 4 + 4 = 16$
$$H = \frac{12}{16(17)}\left[\frac{1406.25 + 420.25 + 506.25 + 3080.25}{4}\right] - 3(17) = \frac{12}{272}\left[\frac{5413}{4}\right] - 51 = 59.7022 - 51 = 8.7022$$
Example 2:
In Kampala, independent random samples of cars, buses, and trucks provided
the data on the number of kilometers, driven last year, in thousands:
Cars      Buses     Trucks
19.2      1.3       11.6
12.5      7.3       24.0
1.5       7.3       8.2
6.1       7.0       10.6
33.5      12.8      10.0
7.6       18.9      2.3
11.3                44.1
6.3                 1.5
8.8                 13.0
0.4
$n = 10 + 6 + 9 = 25$
$R_1 = 22 + 18 + 3.5 + 6 + 24 + 11 + 16 + 7 + 13 + 1 = 121.5$
$R_2 = 2 + 9.5 + 9.5 + 8 + 19 + 21 = 69$
$R_3 = 17 + 23 + 12 + 15 + 14 + 5 + 25 + 3.5 + 20 = 134.5$
$$H = \frac{12}{25(26)}\left[\frac{121.5^2}{10} + \frac{69^2}{6} + \frac{134.5^2}{9}\right] - 3(26) = 1.011$$
Step 6: If the value of the test statistic falls in the rejection region, reject𝐻0 ;
otherwise, do not reject 𝐻0 .
The test results are not statistically significant at the 5% level; that is,
at the 5% significance level the data do not provide sufficient evidence to
conclude that a difference exists in last year's mean number of kilometers
driven among cars, buses and trucks.
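The ranking and the test statistic for this example can be reproduced from the raw figures. The sketch below pools the three samples, assigns average ranks to tied values, sums the ranks by group and applies the formula for H. It is a minimal illustration with hypothetical function names, and it applies no correction for ties.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, rank sums) for a list of samples; no tie correction."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)
    n = len(pooled)
    rank_sums, start = [], 0
    for group in groups:
        rank_sums.append(sum(ranks[start:start + len(group)]))
        start += len(group)
    weighted = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    return h, rank_sums


cars = [19.2, 12.5, 1.5, 6.1, 33.5, 7.6, 11.3, 6.3, 8.8, 0.4]
buses = [1.3, 7.3, 7.3, 7.0, 12.8, 18.9]
trucks = [11.6, 24.0, 8.2, 10.6, 10.0, 2.3, 44.1, 1.5, 13.0]

h, rank_sums = kruskal_wallis([cars, buses, trucks])
print(rank_sums, round(h, 3))
```

With K = 3 groups the statistic is compared with the chi-square critical value on K - 1 = 2 degrees of freedom.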
Exercise 1
Exercise 3
Gaba Village, near Kansanga Trading Center, contains shops, restaurants, and
motels. They have two peak seasons – Vacation and End of Year. The specific
seasonals with respect to the total sales volumes for the recent years are:
Quarter
Year I II III IV
2008 117.0 80.7 129.6 76.1
2009 118.6 82.5 121.4 77.0
2010 114.0 84.3 119.9 75.0
2011 120.7 79.6 130.7 69.6
2012 125.2 80.2 127.6 72.0
a. Develop the typical seasonal pattern for Gaba Village using the ratio – to-
moving average method.
b. Explain the typical index for the first quarter.
Exercise 4
Using the data in the table below, compute a three- year and a five-year moving
average:
CHAPTER SIX
FORECASTING
6.1 Introduction
Example:
Using the multiplicative model fit a linear trend and forecast the values for the
four quarters of 2013 on the data below:
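$$\sum Y_t = n\alpha_0 + \alpha_1 \sum t$$
$240 = 16\alpha_0 + 136\alpha_1$ (i)
$$\sum tY_t = \alpha_0 \sum t + \alpha_1 \sum t^2$$
$2{,}358 = 136\alpha_0 + 1496\alpha_1$ (ii), since for t = 1, 2, ..., 16, $\sum t^2 = 1496$
Solving the two equations simultaneously:
$$\begin{pmatrix} n & \sum t \\ \sum t & \sum t^2 \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} = \begin{pmatrix} \sum Y_t \\ \sum tY_t \end{pmatrix}$$
$$\begin{pmatrix} 16 & 136 \\ 136 & 1496 \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} = \begin{pmatrix} 240 \\ 2{,}358 \end{pmatrix}$$
$$\alpha_0 = \frac{\begin{vmatrix} 240 & 136 \\ 2358 & 1496 \end{vmatrix}}{\begin{vmatrix} 16 & 136 \\ 136 & 1496 \end{vmatrix}} = \frac{(240 \times 1496) - (136 \times 2358)}{(16 \times 1496) - (136 \times 136)} = \frac{359040 - 320688}{23936 - 18496} = \frac{38352}{5440} = 7.05$$
$$\alpha_1 = \frac{\begin{vmatrix} 16 & 240 \\ 136 & 2358 \end{vmatrix}}{\begin{vmatrix} 16 & 136 \\ 136 & 1496 \end{vmatrix}} = \frac{(16 \times 2358) - (136 \times 240)}{(16 \times 1496) - (136 \times 136)} = \frac{37728 - 32640}{23936 - 18496} = \frac{5088}{5440} = 0.9353$$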
Fit a simple exponential trend for the data below and project for 𝑡18 :
t 𝒀𝒕 T 𝒀𝒕
1 2.27 8 2.9
2 2.32 9 2.982
3 2.393 10 2.059
4 2.56 11 3.143
5 2.647 12 3.289
6 2.775 13 3.376
7 2.85 14 3.602
Solution
t 𝒀𝒕 ln𝒀𝒕 t ln𝒀𝒕 𝒕𝟐
1 2.27 0.82 0.82 1
2 2.32 0.84 1.68 4
3 2.393 0.87 2.61 9
4 2.56 0.94 3.76 16
5 2.647 0.97 4.85 25
6 2.775 1.02 6.12 36
7 2.85 1.05 7.35 49
Let ln 𝛼0 = 𝑎 and ln 𝛼1 = 𝑏
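With a = ln α0 and b = ln α1, the log-linear normal equations are solved exactly like those of the linear trend, and the estimates are then transformed back. The sketch below is one way of carrying this out for the 14 observations above and projecting to t = 18; it assumes the trend form $Y_t = \alpha_0 \alpha_1^t$, uses a hypothetical function name, and works with unrounded logarithms, so its results differ slightly from values built on the two-decimal logs in the table.

```python
import math


def fit_exponential_trend(t, y):
    """Least squares fit of ln Y = a + b t; returns (alpha0, alpha1) = (e^a, e^b)."""
    logs = [math.log(v) for v in y]
    n = len(t)
    sum_t = sum(t)
    sum_l = sum(logs)
    sum_tl = sum(ti * li for ti, li in zip(t, logs))
    sum_t2 = sum(ti * ti for ti in t)
    det = n * sum_t2 - sum_t * sum_t
    a = (sum_l * sum_t2 - sum_tl * sum_t) / det   # a = ln(alpha0)
    b = (n * sum_tl - sum_t * sum_l) / det        # b = ln(alpha1)
    return math.exp(a), math.exp(b)


t = list(range(1, 15))
y = [2.27, 2.32, 2.393, 2.56, 2.647, 2.775, 2.85,
     2.9, 2.982, 2.059, 3.143, 3.289, 3.376, 3.602]

alpha0, alpha1 = fit_exponential_trend(t, y)
projection_18 = alpha0 * alpha1 ** 18

print(round(alpha0, 3), round(alpha1, 4), round(projection_18, 2))
```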
Exercise 1
Fit a linear trend for the data in the table below and plot both the time series
and the trend values and project the value for 𝑌18 .
Exercise 2
Fit a second degree trend to the following data and plot the time series and
trend and project the value for 𝑌17 .
t 𝒀𝒕 t 𝒀𝒕
1 1.985 8 2.393
2 2.032 9 2.560
3 2.086 10 2.775
4 2.095 11 2.853
5 2.188 12 2.904
6 2.270 13 2.981
7 2.320 14 3.376
Exercise 3
The following is the enrollment at the University of Kampala from 2000 to 2012.
e. Develop both a linear and a logarithmic trend equation.
f. Estimate the enrollment for 2016 using both equations.
g. Which trend equation would you recommend? Why?
h. Plot the data and comment on your forecast.
Exercise 4
Listed below are the total numbers of vehicles sold by Toyota Motor Company
(in thousands) from 2000 to 2012.