BY MUKALAZI HERBERT
COURSE CONTENT
The Time Series and Index Numbers I course is designed to introduce students to the
major concepts and tools for analyzing and drawing conclusions from data. Data and
information are integral to operations and planning, and as statisticians grow and
develop there is an increasing need for formalized statistical methodology to
answer statistics-related questions.
Teaching Objectives
COURSE CONTENT
READING MATERIALS/LIST
1. Anderson, The Statistical Analysis of Time Series
2. Neil R. Ullman, Elementary Statistics: An Applied Approach
TIME SERIES ANALYSIS KYAMBOGO UNIVERSITY; Jan –May 2019
Chapter One
There are a number of reasons for analyzing time series, which include but are
not limited to the following:
As a result, the ability to forecast and predict future events and trends
greatly enhances the likelihood of success. It is therefore no wonder that
businesses and governments spend a good deal of time and effort in the
pursuit of accurate forecasts of future trends and developments.
There are four components to a time series: the trend, the cyclical variation,
the seasonal variation, and the irregular variation.
Trend (Secular Trend) is the steady increase or decrease over a long period of
time, reflecting long-term growth or decline of the variable of interest.
Disposable income, money supply, bank deposits have generally increased over
time, together with sales of durable goods such as cars, mobile phones, usually
accompanied by steadily rising prices. The per capita death rates data exhibit
long-term downward trends attributable to advances in medicine and the rising
standards of living.
There are two main reasons why we may wish to identify cycles in time series.
In the first place, we may want to know where we are in the cycle to anticipate
what may happen in the near future.
Second, as with trend, when a cycle is identified and isolated, the other factors
affecting the time series data are more easily seen and can be explained
accordingly.
These are movements in the time series that recur each year at about the same
time.
banking activity at the end of the month, while resorts and amusement parks
are busiest on weekends.
Practically, all business and economic series have recurring seasonal patterns.
For example, sales for clothes are high prior to Christmas. Prices of produce
are low at harvest time.
Furthermore, once we know the seasonal variations, we may want to iron out
the intra-year variations by promoting during the off season.
Time series analysts prefer to subdivide the irregular variation into episodic
and residual variations.
Episodic variation
Residual variation
After the episodic fluctuations have been removed, the remaining variation is
the residual variation, often called chance fluctuations. These are
unpredictable and they cannot be identified.
Note: Neither the episodic nor the residual variations can be predicted into the
future. They are merely treated as the residual influence after the other three
components of the time series data have been taken into account.
1.4.1 Definition
The additive model expresses the time series 𝑌𝑡 as the algebraic sum of the four
components:
𝑌𝑡 = 𝑇𝑡 + 𝑆𝑡 + 𝐶𝑡 + 𝐼𝑡
where 𝑌𝑡 is the value of the time series for the time period t, and the right-hand
side values are the trend, the seasonal variation, the cyclical variation, and the
random or irregular variation respectively, for the same time period.
All the values are expressed in their original units, and S, C and I are
deviations around T.
The model assumes that the four components are independent of one another. That
is, a movement in one component does not affect the others.
In order to break down the time series data and measure the effect of the
individual components, we proceed in four steps as follows:
Example: If we were to develop a time-series model for sales for a local retail
store, we might find that T = $500, S = $100, C = -$25, and I = -$10. Sales
would be: Y = $500 + $100 -$25 -$10 = $565
Notice that the positive value for S indicates that existing seasonal influences
have a positive impact on sales. The negative cyclical value suggests that the
business cycle is currently in a downswing. There was apparently some
random event that had a negative impact on sales.
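The arithmetic of the example can be checked with a short Python sketch. The function name additive_value is only illustrative; the component values are the ones given in the example above.

```python
def additive_value(trend, seasonal, cyclical, irregular):
    """Combine the four components under the additive model Y = T + S + C + I."""
    return trend + seasonal + cyclical + irregular


# Component values from the retail-store example, in dollars.
sales = additive_value(trend=500, seasonal=100, cyclical=-25, irregular=-10)
print(sales)  # 565
```

Because every component is in the original units, the signs of S, C and I can be read directly as upward or downward pushes on the trend value.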
The additive model suffers from the somewhat unrealistic assumption that the
components are independent of each other. This is seldom the case in the real
world. In most instances, movements in one component will have an impact on
the other components, thereby negating the assumption of independence. Or,
perhaps even more commonly, we often find that certain forces at work in the
economy simultaneously affect two or more components. Again, the
assumption of independence is violated.
The multiplicative model assumes that the four components interact with each
other and do not move independently.
It is expressed as follows: 𝑌𝑡 = 𝑇𝑡 × 𝑆𝑡 × 𝐶𝑡 × 𝐼𝑡
This model is often preferred for the reason that the components affect one
another.
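A minimal Python sketch of the multiplicative model follows. It assumes, as is common practice but not stated in these notes, that the trend is in the original units while S, C and I are ratios around 1.0. All the numbers are hypothetical and chosen only for illustration.

```python
def multiplicative_value(trend, seasonal, cyclical, irregular):
    """Combine the four components under the multiplicative model Y = T x S x C x I."""
    return trend * seasonal * cyclical * irregular


# Hypothetical values: trend in dollars, the other components as ratios (assumption).
# 1.20 means 20% above trend, 0.95 means 5% below, 0.98 means 2% below.
y = multiplicative_value(trend=500, seasonal=1.20, cyclical=0.95, irregular=0.98)
print(round(y, 2))  # about 558.6

# Dividing the observed value by the other components recovers the trend.
recovered_trend = y / (1.20 * 0.95 * 0.98)
```

Under this convention a seasonal effect scales with the level of the trend, which is one way the components can "affect one another".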
In order to break down the time series data and measure the effect of the
individual components, we proceed in four steps as follows:
R TIME SERIES
Exercise 1:
Plot a graph using the time series data in the table below:
Exercise 2:
Plot a graph using the time series data in the table below:
Exercise 3:
The table below shows weekly fuel purchases for the Ministry of Finance
Headquarters:
6 445 16 252
7 310 17 446
8 372 18 473
9 440 19 337
10 414 20 478
SOLUTIONS TO EXERCISES
Chapter Two
2.1 Introduction
There are a number of conditions under which a time series can be treated as
stationary. These include:
1. Stable environment – the forces that generate the time series would have
stabilized and the environment in the time series is relatively unchanged.
For example, the mature stage of the life cycle of a good or a service.
2. Easily correctable trend – where stabilization may be obtained by making
simple corrections for factors such as population growth and inflation.
For example, the Gross Domestic Product (GDP) per Capita.
3. Short forecasting horizon – where the time series may have a trend but
the period over which the forecasts are needed is relatively short so that
the amount due to trend is negligible.
4. Transformable series – whereby the series may be mathematically altered
into a stable one by taking logarithms, square roots or differences.
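The transformations in item 4 can be sketched in a few lines of Python. The series below is hypothetical (a value growing by 10% per period) and the helper names are illustrative only. Taking logarithms and then first differences turns the steadily growing series into a constant one.

```python
import math


def log_transform(series):
    """Natural logarithm of each observation (all values must be positive)."""
    return [math.log(y) for y in series]


def first_differences(series):
    """Change from one period to the next: Y_t - Y_(t-1)."""
    return [b - a for a, b in zip(series, series[1:])]


# Hypothetical series growing by 10% each period.
y = [100 * 1.1 ** t for t in range(6)]

# The differenced logs are all (approximately) log(1.1), so the trend is gone.
log_diffs = first_differences(log_transform(y))
print([round(d, 4) for d in log_diffs])
```

Which transformation is appropriate depends on the series. This sketch only shows the mechanics for one simple growth pattern.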
The decision to use a no-trend model depends on cost, availability of data and
the desired level of accuracy.
Definition:
Nonparametric tests are statistical procedures that can be used to test
hypotheses when no assumptions regarding parameters or population
distribution are possible.
Coverage:
Seven distribution-free tests will be considered in this chapter: the runs
test, the turning point test, the sign test, Daniel’s test, Pearson’s test,
the Wilcoxon signed-rank test and the Wilcoxon rank-sum test.
Example:
Using the Runs test, test for stationarity in the time series below:
t    𝒀𝒕    +/−   Run
1    2.0   -     1
2    4.3   -
3    2.4   -
4    4.5   +     2
5    2.8   -     3
6    4.1   -
7    5.6   +     4
8    4.8   +
9    3.6   -     5
10   2.4   -
11   5.5   +     6
12   5.8   +
13   3.3   -     7
14   5.2   +     8
15   4.1   -     9
16   4.9   +     10
17   2.9   -     11
18   5.6   +     12
19   5.8   +
20   6.2   +
Step 1: Arrange the 𝑌𝑡 values in ascending order and compute the median of the
series.
2.0 2.4 2.4 2.8 2.9 3.3 3.6 4.1 4.1 4.3 4.5 4.8 4.9 5.2 5.5 5.6 5.6 5.8 5.8 6.2
From 2.0 to 4.1 there are 9 values and from 4.8 to 6.2 there are 9 values. Since
the number of values is even the median is the average of the two values in the
middle. That is, (4.3 + 4.5)/2 = 8.8/2 = 4.4
Step 2: Assign a plus sign (+) to observations above the median and a minus
sign (-) to observations below
Step 3: List the pluses and minuses in chronological order and count the
number of runs, that is, blocks of consecutive pluses or consecutive minuses (R).
Here R = 12.
Step 4: Since the number of observations is even this step is not applicable
Step 5: Let m be the number of pluses. The test statistic R is the number of
runs in a sequence of m pluses and m minuses. In a random sequence the expected
number of runs is
\[ \mu_R = m + 1 = 10 + 1 = 11 \]
and the standard deviation of the number of runs is
\[ S_R = \sqrt{\frac{m(m-1)}{2m-1}} = \sqrt{\frac{10(10-1)}{20-1}} = 2.176 \]
Step 8: Conclusion
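The runs test steps can be sketched in Python as follows, using the 20 observations of the example. Two points are assumptions rather than statements from these notes: observations equal to the median are dropped, and the standardized statistic is taken as Z = (R − μ_R)/S_R by analogy with the other tests in this chapter. The function name runs_test is illustrative.

```python
import math
from statistics import median


def runs_test(series):
    """Runs test about the median, following the steps described in the text."""
    med = median(series)
    # Observations equal to the median get no sign and are dropped (assumption).
    signs = ["+" if y > med else "-" for y in series if y != med]
    # A new run starts whenever the sign changes.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    m = signs.count("+")
    mu_r = m + 1
    s_r = math.sqrt(m * (m - 1) / (2 * m - 1))
    z_c = (runs - mu_r) / s_r  # assumed form of the standardized statistic
    return med, runs, m, mu_r, s_r, z_c


y = [2.0, 4.3, 2.4, 4.5, 2.8, 4.1, 5.6, 4.8, 3.6, 2.4,
     5.5, 5.8, 3.3, 5.2, 4.1, 4.9, 2.9, 5.6, 5.8, 6.2]
med, runs, m, mu_r, s_r, z_c = runs_test(y)
print(runs, m, mu_r, round(s_r, 3), round(z_c, 3))
```

For this series the sketch reproduces R = 12, m = 10, μ_R = 11 and S_R = 2.176 from the worked example. The resulting Z is well below 1.96, which would point towards not rejecting the hypothesis of no trend at the 5% level.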
Definition
A turning point in a time series is a point where the series changes direction.
Each such point represents either a local peak or a local trough in the series. A
turning point is a time period whose sign is different from that of the next
period.
Step 1: Assign a plus (+) or a minus (-) to a period depending on whether its
first difference (𝑌𝑡 -𝑌𝑡−1 ) is positive or negative. A positive indicates that the
series went up in the period and a negative implies it went down
Step 2: Determine the number of turning points (U). This is the test statistic.
For a random series of n observations, its mean and standard deviation are:
\[ \mu_U = \frac{2(n-2)}{3} \quad \text{and} \quad \sigma_U = \sqrt{\frac{16n-29}{90}} \]
Step 6: Conclusion
Example:
Using the turning point test, test for stationarity in the following time series:
Step 1: Assign a plus (+) or a minus (-) to a period depending on whether its
first difference (𝑌𝑡 -𝑌𝑡−1 ) is positive or negative
U = 13
Step 6: Conclusion
Since 𝑍𝑐 = 0.179 is less than 𝑍𝛼/2 = 1.96, we fail to reject 𝐻0 and conclude at 95%
confidence that the series is stationary, that is, it has no trend.
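A minimal Python sketch of the turning point test is given below. Since the data table for the example above is not reproduced in these notes, the sketch is run on the 20 observations from the runs test example instead, so its U differs from the U = 13 quoted above. Ignoring zero first differences is an assumption, and the function name is illustrative.

```python
import math


def turning_point_test(series):
    """Turning point test: count sign changes in the first differences."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    # True for a rise, False for a fall; zero differences are ignored (assumption).
    signs = [d > 0 for d in diffs if d != 0]
    # A turning point is a period whose sign differs from that of the next period.
    u = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = len(series)
    mu_u = 2 * (n - 2) / 3
    sigma_u = math.sqrt((16 * n - 29) / 90)
    z_c = (u - mu_u) / sigma_u
    return u, mu_u, sigma_u, z_c


# Series from the runs test example (20 observations).
y = [2.0, 4.3, 2.4, 4.5, 2.8, 4.1, 5.6, 4.8, 3.6, 2.4,
     5.5, 5.8, 3.3, 5.2, 4.1, 4.9, 2.9, 5.6, 5.8, 6.2]
u, mu_u, sigma_u, z_c = turning_point_test(y)
print(u, mu_u, round(sigma_u, 3), z_c)
```

For this series the number of turning points equals its expected value, so the statistic gives no evidence against stationarity.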
Definition
When the signs of the first difference have been determined for the turning
point test, a sign test may be used. The sign test is based on the sign of a
difference between two related observations. We designate a plus sign for a
positive difference and a minus sign for a negative difference
Step 1: Determine the signs of the first differences for the turning point test
Step 2: Determine the test statistic, V, the number of positive first differences
in the series. Let 𝑛΄ be the number of non-zero first differences. Then:
\[ \mu_V = \frac{n'}{2} \quad \text{and} \quad \sigma_V = \sqrt{\frac{n'}{2}} \]
Step 4: Calculate the value of 𝑍𝑐 as follows:
\[ Z_c = \frac{V - \mu_V}{\sigma_V} \]
Step 6: Conclusion
Example:
Test for stationarity in the time series below using the sign test at the 95%
confidence level:
t    𝒀𝒕    𝒀𝒕 - 𝒀𝒕−𝟏   V    N
1    1.4
2    3.0   +           1    1
3    1.9   -                2
4    3.1   +           2    3
5    2.1   -                4
6    2.5   +           3    5
7    4.1   +
8    3.6   -                6
9    2.9   -
10   1.9   -
11   4.0   +           4    7
12   4.2   +
13   2.7   -                8
14   3.4   +           5    9
15   3.0   -                10
16   3.5   +           6    11
17   2.7   -                12
18   4.1   +           7    13
19   4.3   +
Step 1: Determine the signs of the first differences for the turning point test.
Step 2: Determine the test statistic, V, the number of positive first differences
in the series. V = 7
\[ \mu_V = \frac{n'}{2} = \frac{13}{2} = 6.5 \quad \text{and} \quad \sigma_V = \sqrt{\frac{n'}{2}} = \sqrt{\frac{13}{2}} = \sqrt{6.5} = 2.55 \]
Step 6: Conclusion
Since 𝑍𝑐 = 0.196 is less than 𝑍𝛼/2 = 1.96, we fail to reject 𝐻0 and conclude at 95%
confidence that the series is stationary, that is, it has no trend.
Definition
Step 2: Select a Level of Significance; Choose a level say 0.1, 0.05 etc.
\[ r = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \]
where 𝑑𝑖 is the difference between the rankings for each observation, and n
is the number of observations.
\[ \mu_r = 0 \quad \text{and} \quad \sigma_r = \frac{1}{\sqrt{n-1}} \]
\[ Z_c = \frac{r - \mu_r}{\sigma_r} \]
Step 5: Conclusion
Example
Using the data in the table below, test for stationarity using Daniel’s test:
t     𝒀𝒕      𝑹𝒕    𝑹𝒀     d      𝒅𝟐
1     19.1    1     1      0      0
2     40.5    2     3.5    -1.5   2.25
3     40.5    3     3.5    -0.5   0.25
4     62.3    4     5      -1     1
5     37.6    5     2      3      9
6     84.3    6     7      -1     1
7     123.9   7     9      -2     4
8     74.5    8     6      2      4
9     200.5   9     11     -2     4
10    177.4   10    10     0      0
11    114.6   11    8      3      9
Total                             34.5
Step 2: Select a Level of Significance; Choose a level say 0.1, 0.05 etc.
\[ Z_c = \frac{r - \mu_r}{\sigma_r} = \frac{0.843 - 0}{0.316} = 2.668 \]
Step 5: Conclusion
Since |𝑍𝑐| = 2.668 is greater than 𝑍𝛼/2 = 1.96, we reject 𝐻0 and conclude that the
time series is not stationary, that is, it has a trend.
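The calculation in this example can be reproduced with the following Python sketch. Tied values receive the average of the ranks they occupy, as in the table. The helper names average_ranks and daniel_test are illustrative and not part of any library.

```python
import math


def average_ranks(values):
    """Rank values from smallest to largest, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def daniel_test(series):
    """Spearman rank correlation between time order and the ranks of the series."""
    n = len(series)
    time_ranks = range(1, n + 1)
    value_ranks = average_ranks(series)
    sum_d2 = sum((rt - ry) ** 2 for rt, ry in zip(time_ranks, value_ranks))
    r = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    sigma_r = 1 / math.sqrt(n - 1)
    z_c = (r - 0) / sigma_r
    return sum_d2, r, sigma_r, z_c


y = [19.1, 40.5, 40.5, 62.3, 37.6, 84.3, 123.9, 74.5, 200.5, 177.4, 114.6]
sum_d2, r, sigma_r, z_c = daniel_test(y)
print(sum_d2, round(r, 3), round(sigma_r, 3), round(z_c, 2))
```

Without intermediate rounding Z comes out at about 2.67. The small difference from 2.668 arises because the worked example rounds r and σ to three decimals before dividing.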
Definition
Step 2: Select a Level of Significance; Choose a level say 0.1, 0.05 etc.
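\[ r = \frac{S_{tY}}{\sqrt{S_{tt} S_{YY}}} \]
where
\[ S_{tt} = \sum (t - \bar{t})^2 = \sum t^2 - \frac{(\sum t)^2}{n} \]
\[ S_{YY} = \sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{n} \]
\[ S_{tY} = \sum (t - \bar{t})(Y - \bar{Y}) = \sum tY - \frac{\sum t \sum Y}{n} \]
The test statistic is
\[ t_c = r \, \frac{\sqrt{n-2}}{\sqrt{1 - r^2}} \]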
Step 5: Conclusion
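The sums of squares and the test statistic can be sketched in Python as below. The sketch assumes that r is the usual Pearson correlation between t and Y, that is, r = S_tY / sqrt(S_tt × S_YY). For illustration it is run on the series from the Daniel's test example, and the function name is illustrative.

```python
import math


def pearson_trend_test(series):
    """Pearson correlation between time t = 1..n and Y, with its t statistic."""
    n = len(series)
    t = list(range(1, n + 1))
    s_tt = sum(v * v for v in t) - sum(t) ** 2 / n
    s_yy = sum(v * v for v in series) - sum(series) ** 2 / n
    s_ty = sum(a * b for a, b in zip(t, series)) - sum(t) * sum(series) / n
    r = s_ty / math.sqrt(s_tt * s_yy)
    # Undefined when r is exactly +1 or -1 (division by zero).
    t_c = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r, t_c


# Series from the Daniel's test example, reused here for illustration.
y = [19.1, 40.5, 40.5, 62.3, 37.6, 84.3, 123.9, 74.5, 200.5, 177.4, 114.6]
r, t_c = pearson_trend_test(y)
print(round(r, 3), round(t_c, 3))
```

The computed value of t_c would then be compared with the critical value of Student's t distribution with n − 2 degrees of freedom at the chosen level of significance.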
Definition
Example:
Thomas’s is a family restaurant in town, offering a full dinner menu, but their
specialty is chicken. Recently, Jones Thomas, the owner and founder,
developed a new spicy flavor for the batter in which the chicken is cooked.
Before replacing the current flavor, he wants to conduct some tests to be sure
that patrons will like the spicy flavor better
The samples are dependent or related. That is, the participants are asked to
rate both flavors of chicken. Thus, if we compute the difference between the
rating for the spicy flavor and the current flavor, the resulting value shows the
amount the participants favor one flavor over the other. If we choose to
subtract the current flavor score from the spicy flavor score, a positive result is
the “amount” the participant favors the spicy flavor. Negative difference scores
indicate the participant favored the current flavor. Because of the somewhat
subjective nature of the scores, we are not sure the distribution of the
differences follows the normal distribution. We decide to use the nonparametric
Wilcoxon signed-rank test.
As usual, we will use the five-step hypothesis testing procedure. The null
hypothesis is that there is no difference in the rating of the chicken flavors by
the participants. That is, as many participants in the study rated the spicy
flavor higher as rated the regular flavor higher. The alternative hypothesis is
that the ratings are higher for the spicy flavor. That is,
This is a one-tailed test. Why? Because Jones, the owner of Thomas’s, will want
to change his chicken flavor only if the sample participants show that the
population of customers like the new flavor better.
Step 1: Compute the difference between the spicy flavor score and the current
flavor score for each participant. These differences are shown in column 4 of
the table below:
                                                Absolute           Signed rank
Participant   Spicy   Current   Difference    difference   Rank    R+     R−
Annette       14      12        2             2            1       1
Susan         8       16        -8            8            6              6
George        6       2         4             4            3       3
William       18      4         14            14           13      13
Jonah         20      12        8             8            6       6
Jacob         16      16        *             *            *       *
Liz           14      5         9             9            9       9
Garrett       6       16        -10           10           11             11
John          19      10        9             9            9       9
Joseph        18      10        8             8            6       6
Peter         16      13        3             3            2       2
Paul          18      2         16            16           14      14
Godwin        4       13        -9            9            9              9
James         7       14        -7            7            4              4
Deon          16      4         12            12           12      12
Total                                                              75     30
Step 2: Only the positive and negative differences are considered further. That
is, if the difference in flavor scores is 0, that participant is dropped from the
analysis and the number in the sample reduced. Hence Jacob is dropped from
the study and the sample size reduced from 15 to 14.
Step 3: Determine the absolute differences for the values computed in column
4. Recall that in an absolute difference we ignore the sign of the difference.
Step 4: Rank the absolute differences from smallest to largest. There are three
participants who rated the difference in flavors as 8. To resolve the problem, we
average the rankings involved and report the average rank for each. This
situation involves the ranks 5, 6, and 7, so all three participants are assigned
the rank of 6. The same situation occurs for those participants with a
difference of 9. The ranks involved are 8, 9, and 10, so three participants are
assigned a rank of 9.
Step 5: Each assigned rank in column 6 is then given the same sign as the
original difference, and the results are reported in column 7.
Step 7: Obtain the critical values for the Wilcoxon signed-rank test from table.
An extract of the table is below:
The decision rule is to reject the null hypothesis if the smaller of the rank sums
is 25 or less. This is the largest value in the rejection region.
Step 8: Test the values obtained in column 7 against the critical value and
make decision to reject 𝐻0 or not
In this case the smaller rank sum is 30, so the decision is not to reject the null
hypothesis
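The ranking steps of this example can be reproduced with a short Python sketch. The scores are the spicy and current flavor ratings of the 15 participants, in the order listed in the table. The critical value of 25 is the one quoted in the decision rule above, and the helper names are illustrative.

```python
def average_ranks(values):
    """Rank values from smallest to largest, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_sums(first, second):
    """Return the reduced sample size and the rank sums R+ and R-."""
    # Zero differences are dropped, which reduces the sample size.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    r_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    r_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return len(diffs), r_plus, r_minus


spicy = [14, 8, 6, 18, 20, 16, 14, 6, 19, 18, 16, 18, 4, 7, 16]
current = [12, 16, 2, 4, 12, 16, 5, 16, 10, 10, 13, 2, 13, 14, 4]
n, r_plus, r_minus = signed_rank_sums(spicy, current)
t_stat = min(r_plus, r_minus)
reject_h0 = t_stat <= 25  # critical value quoted in the text for this example
print(n, r_plus, r_minus, t_stat, reject_h0)  # 14 75.0 30.0 30.0 False
```

The sketch gives a sample size of 14 and rank sums of 75 and 30, matching the table, and the smaller rank sum of 30 exceeds 25, so the null hypothesis is not rejected.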
Definition
The Wilcoxon rank-sum test is based on the average of ranks. The data are
ranked as if the observations were part of a single sample. If the null
hypothesis is true, then the ranks will be about evenly distributed between the
two samples, and the average of the ranks for the two samples will be about
the same. That is, the low, medium, and high ranks should be about equally
divided between the two samples. If the alternative hypothesis is true, one of
the samples will have more of the lower ranks and, thus, a smaller rank total.
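If each of the samples contains at least eight observations, the test statistic
approximately follows the standard normal distribution. The formula is:
\[ Z = \frac{W - \frac{n_1(n_1 + n_2 + 1)}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} \]
where W is the sum of the ranks of the first sample, and 𝑛1 and 𝑛2 are the
numbers of observations in the first and second samples.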
Example:
Nairobi Entebbe
11 13
15 14
10 10
18 8
11 16
20 9
24 17
22 21
25
At the 0.05 significance level, can we conclude that there are more no-shows for
flights originating from Nairobi?
Solution:
If the number of no-shows is the same for Nairobi and Entebbe, then we expect
the means of the two ranks to be about the same. If the number of no-shows is
not the same, we expect the two sums of ranks to be quite different.
Mr. Thompson believes there are more no-shows for Nairobi flights. Thus, a
one-tailed test is appropriate, with the rejection region located in the upper tail.
The test statistic follows the standard normal distribution. At the 0.05
significance level, we find from the standard normal tables, the critical value of
Z is 1.65. The null hypothesis is rejected if the computed value of Z is greater
than 1.65
Nairobi               Entebbe
No-Shows   Rank       No-Shows   Rank
11         5.5        13         7
15         9          14         8
10         3.5        10         3.5
18         12         8          1
11         5.5        16         10
20         13         9          2
24         16         17         11
22         15         21         14
25         17
Total      96.5                  56.5
The value of W is calculated for the Nairobi group and is found to be 96.5
Because the computed z value (1.49) is less than 1.65, the null hypothesis is
not rejected.
The evidence does not show a difference in the typical number of no-shows.
That is, it appears that the number of no-shows is the same in Nairobi as in
Entebbe.
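The ranking and the value of Z in this example can be checked with the following Python sketch. It ranks the combined samples, averaging tied ranks, and takes W as the rank sum of the Nairobi group. The helper names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from smallest to largest, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(sample1, sample2):
    """Wilcoxon rank-sum statistic W for sample1 and its normal approximation Z."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return w, (w - mean_w) / sd_w


nairobi = [11, 15, 10, 18, 11, 20, 24, 22, 25]
entebbe = [13, 14, 10, 8, 16, 9, 17, 21]
w, z = rank_sum_test(nairobi, entebbe)
print(w, round(z, 2))  # 96.5 1.49
```

With 𝑛1 = 9 and 𝑛2 = 8, the expected rank sum is 81 and the standard deviation is about 10.39, which gives the Z of 1.49 reported above.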
Exercise 1
Test for stationarity in the time series using a Runs test on the following data:
Exercise 2
Use turning point test to test for stationarity in the data in exercise 1.
Exercise 3
Exercise 4
Exercise 5
Use the Pearson’s test on the data below to test for stationarity in the time
series:
t 𝒀𝒕 𝒕𝟐 𝒀𝒕 𝟐 t𝒀𝒕
1 1.82 1 3.31 1.82
2 2.6 4 6.76 5.2
3 1.7 9 2.89 5.1
4 2.8 16 7.84 11.2
5 3.4 25 11.56 17.0
6 4.3 36 18.49 25.8
7 3.1 49 9.61 21.7
8 4.5 64 20.25 36.0
9 5.0 81 25.0 45.0
10 5.7 100 32.49 57.0
11 4.12 121 16.97 45.32
12 3.6 144 12.96 43.2
13 6.3 169 39.69 81.9
14 7.0 196 49.0 98.0
Exercise 6
A record of the production for each machine operator was kept over a period of
time. Certain changes in the production procedure were suggested, and 11
operators were picked as an experimental test group to determine whether the
new procedures were worthwhile. Their production rates before and after the
new procedures were established and recorded as follows:
Exercise 7
One of the major car manufacturers is studying the effect of regular versus
super gasoline in its economy cars. Ten executives are selected and asked to
maintain records on the number of kilometers per liter of gas. The results are:
Exercise 8
It has been suggested that daily production of an assembly line for batteries at
Uganda Batteries Limited would be increased if better portable lighting were
installed and background music and free coffee and doughnuts were provided
during the day. Management agreed to try the scheme for a limited time. The
numbers of batteries produced per week by a small test group of employees are
as follows:
Using the Wilcoxon signed-rank test, determine whether the suggested changes
are worthwhile.
a. State the null hypothesis
b. Decide on the alternative hypothesis
c. Decide on the level of significance
d. State the decision rule
e. Compute T and arrive at a decision
Exercise 9
The following observations were selected from populations that were not
necessarily normally distributed. Use the 0.05 significance level, a two-tailed
test, and the Wilcoxon rank-sum test to determine whether there is a
difference between the two populations:
Population A 38 45 56 57 61 69 70 79
Population B 26 31 35 42 51 52 57 62
Exercise 10
Current Method: 41 36 42 39 36 48 49 38
Experimental: 21 27 36 20 19 21 39 24 22
Exercise 11
Exercise 12
Four methods of treating steel rods are analyzed to determine whether there
is any difference in the pressure the rods can bear before breaking.
The results of the tests measuring the pressure in pounds before the rods
bent are shown. Conduct the test, complete with the hypotheses, decision
rule, and conclusion. Set 𝛂 = 1 percent.
Exercise 13
The quality control manager for a large plant in Kampala Industrial Area gives
two operations manuals to two groups of employees. Each group is then tested
on operational procedures. The scores are shown in the table below. The
manager has always felt that manual 1 provides a better base of knowledge for
new employees. Compute the mean test scores of the employees and report
your conclusion. State the hypotheses. Set 𝛂 = 0.05.
Manual 1 87 97 82 97 92 90 81 89 90 88 87 89 93
Manual 2 92 79 80 73 84 93 86 88 91 82 81 84 72 74
Exercise 14
At the 10 percent level, is there a relationship between study time in hours and
grades on a test, according to the data in the table below?
Time 21 18 15 17 18 25 18 4 6 5
Grade 67 58 59 54 58 80 14 15 19 21