
TIME SERIES ANALYSIS NOTES; JAN – MAY 2020

Course Code: SMT 1206

Course Name: TIME SERIES ANALYSIS

Course Level: Year I Semester II

Credit Units: 4 CU

Pre-requisite: SMT 1101 Calculus

Course description

• Introduction to time series

• Applied Time Series Analysis

• Forecasting

Course Learning Outcomes

By the end of the course the student should be able to demonstrate basic competence in
the concepts, principles, procedures and applications of time series analysis.

Reading List

1. Ronald E. Walpole, Introduction to Statistics, 3rd edition.

2. Wayne W. Daniel, Biostatistics: A Foundation for Analysis in the Health Sciences, 6th and 7th editions.

MUKALAZI HERBERT, KYAMBOGO UNIVERSITY MATH DEPARTMENT Page 1


TIME SERIES ANALYSIS KYAMBOGO UNIVERSITY; Jan –May 2020

Chapter One

Introduction to Time Series

1.0 Goals of the Introduction to Time Series

By the end of this chapter the student should be able to:

1. Define and explain the meaning of each of the components of a time series
2. Describe the uses of time series and their application to different situations and fields of study
3. Appreciate the importance of forecasting the future with some degree of accuracy
4. Explain how those forecasts can be used to make informed decisions
5. Define and explain the time series models: the additive and the multiplicative models
6. Use the time series models and apply them in different situations

1.1 Definition of Time Series

A Time Series is a collection of data for some variable or a set of variables
recorded over a period of time, usually hourly, daily, weekly, monthly,
quarterly, half-yearly or yearly. The data set is called a time series because
it contains observations for some variable over time. Examples of time series
are: (1) sales by quarter at Uchumi Supermarket; (2) the annual production
of coffee in Uganda since independence (1962); (3) the weekly interest rates
in financial institutions; (4) the hourly wind speed recorded by the
Meteorology Department of the Ministry of Natural Resources; (5) the
annual birth and death rates from the Social Science Surveys.

1.2 Reasons for analyzing Time Series

There are a number of reasons for analyzing time series, including but not
limited to the following:

1. The main purpose of time series analysis is to predict or forecast future
values of the variable from past observations.
Time series can be used by management to make current decisions and
for long-term forecasting and planning. Long-term forecasts usually
extend more than one year into the future; 5-, 10-, 15- and 20-year
projections are common. Long-range predictions are considered essential
in order to allow sufficient time for the procurement, manufacturing,
sales, finance and other departments of a company to develop plans for
possible new products and new methods of assembly.

As a result, the ability to forecast and predict future events and trends
greatly enhances the likelihood of success. It is therefore no wonder that
businesses and governments spend a good deal of time and effort in the
pursuit of accurate forecasts of future trends and developments.

2. In time series analysis, we try to capture the underlying patterns of
variation of the variable of interest over time. This means identifying
what factors affect the patterns of variation by examining the time series
data. Such data come from repeated observations of the same variable of
interest over equal intervals of time.

1.3 Components of a Time Series

There are four components to a time series: the trend, the cyclical variation,
the seasonal variation, and the irregular variation.

1.3.1 Trend (Secular Trend)

Trend (Secular Trend) is the steady increase or decrease over a long period of
time, reflecting long-term growth or decline of the variable of interest.

1.3.1.1 Causes of Trend

Trend can be a consequence of long-term gradual changes in population, gross
national product, technological advances or consumer preferences, among
other causes. Disposable income, money supply and bank deposits have generally
increased over time, together with sales of durable goods such as cars and mobile
phones, usually accompanied by steadily rising prices. Per capita death
rates exhibit long-term downward trends attributable to advances in
medicine and rising standards of living.

1.3.2 Cyclical Variations

Cyclical variations are medium-term variations lasting a few years but
exhibiting no regular periodicity or pattern. These fluctuations are also referred
to as business cycles. They cover much longer time periods than do seasonal
variations, often encompassing three or more years in duration.

1.3.2.1 Phases of a cycle

A cycle contains four phases, namely:
1. Upswing or expansion, during which the level of business activity is
accelerated, unemployment is low, and production is brisk.
2. Peak, at which the rate of economic activity has “topped out”.
3. Downturn or contraction, when unemployment rises and activity wanes.
4. Trough, when activity is at its lowest point.

1.3.2.2 A typical business cycle

A typical business cycle consists of a period of prosperity followed by periods of
recession, depression and recovery.

1.3.2.3 Reasons for identifying cycles in time series

There are two main reasons why we may wish to identify cycles in time series.

In the first place, we may want to know where we are in the cycle to anticipate
what may happen in the near future.

Second, as with trend, when a cycle is identified and isolated, the other factors
affecting the time series data are more easily seen and can be explained
accordingly.

1.3.3 Seasonal Variations

Seasonal variations are characterized by predictable swings occurring within
one calendar year or less with clockwork regularity. Such intra-year variations
often reflect natural phenomena, such as changing weather associated with the
four seasons, as well as institutional, man-made factors such as work holidays
and school calendars.

Consider the expanded consumption of electricity and gas in winter, the
increased sales of warm clothing in cold weather, and the growth in
summertime sales of ice cream, sunglasses and air-conditioners. Climatic
conditions also affect production in industries such as agriculture and outdoor
construction. Finally, social factors cause increases in the purchase of toys just
before Christmas, in the purchase of flowers around Valentine’s Day, and in
recreational travel during school vacations. These are movements in the time
series that recur each year at about the same time.

1.3.3.1 Periodic variations

Periodic variations are shorter versions of seasonal variations. They manifest
themselves within a month, a week, or even a day. Notice the increase in
banking activity at the end of the month, while resorts and amusement parks
are busiest on weekends.

The unit of time may be quarterly, monthly, weekly or even daily.


Practically all business and economic series have recurring seasonal patterns.
For example, sales of clothes are high prior to Christmas, and prices of produce
are low at harvest time.

1.3.3.2 Reasons for analyzing seasonal variations

An analysis of seasonal variation is important in planning production
schedules and anticipating sales.

Furthermore, once we know the seasonal variations, we may want to iron out
the intra-year variations by promoting during the off season.

1.3.4 Irregular Variations

Time series contain irregular or random fluctuations caused by unusual
occurrences, producing movements that have no discernible pattern. These
movements are unique and unlikely to recur in similar fashion. They can be
caused by events such as wars, natural disasters (floods, earthquakes),
political upheavals and economic embargoes.

In time series analysis, irregular activity in the variable consists of whatever
variation is left over after we have accounted for the effects of trend, cycles and
seasonal variation.

1.3.4.1 Types of irregular variations

Time series analysts prefer to subdivide the irregular variation into episodic
and residual variations.

Episodic variation

Episodic fluctuations are unpredictable, but they can be identified. For
example, the impact on the economy of a major strike or a war can be
identified, but a strike or a war cannot be predicted.

Residual variation

After the episodic fluctuations have been removed, the remaining variation is
the residual variation, often called chance fluctuations. These are
unpredictable and they cannot be identified.

Note: Neither the episodic nor the residual variations can be predicted into the
future. They are merely treated as the residual influence after the other three
components of the time series data have been taken into account.


1.4 Time Series Models

1.4.1 Definition

A time series model can be expressed as some combination of the four
components: trend, cyclical, seasonal and irregular.

The model is simply a mathematical statement of the relationship among the
four components. Two types of models are commonly associated with time
series, namely, the additive and the multiplicative models.

1.4.2 The Additive Model

The additive model treats the time series $Y_t$ as the algebraic sum of the four
components, expressed symbolically as:

$$Y_t = T_t + S_t + C_t + I_t$$

where $Y_t$ is the value of the time series for time period $t$, and the right-hand
side values are the trend, the seasonal variation, the cyclical variation and the
random or irregular variation, respectively, for the same time period.

All the values are expressed in their original units, and S, C and I are
deviations around T.

The additive model assumes that the four components are independent of one
another; that is, the value of any one component does not depend on the values
of the others.

1.4.2.1 Application of the Additive Model

In order to break down the time series data and measure the effect of the
individual components, we proceed in four steps as follows:

1. Isolate the seasonal variation, S, and then deseasonalize the data:

$$Y_t - S_t = T_t + C_t + I_t$$

2. Compute the trend, T, then remove its influence:

$$Y_t - S_t - T_t = C_t + I_t$$

3. Identify the cyclical fluctuation, C, then remove its influence:

$$Y_t - S_t - T_t - C_t = I_t$$

4. Recognize that the residual (what remains) is the effect of unpredictable
irregular events, I.

Example: If we were to develop a time-series model for sales for a local retail
store, we might find that T = $500, S = $100, C = -$25, and I = -$10. Sales
would be: Y = $500 + $100 -$25 -$10 = $565


Notice that the positive value for S indicates that existing seasonal influences
have a positive impact on sales. The negative cyclical value suggests that the
business cycle is currently in a downswing. There was apparently some
random event that had a negative impact on sales.
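The four-step additive decomposition above can be sketched in Python. This is a minimal illustration, assuming the value of each component for a single period is already known; the function names `additive_value` and `peel_additive` are hypothetical and do not come from any library. It reuses the retail-store figures from the example.

```python
def additive_value(trend, seasonal, cyclical, irregular):
    """Additive model: Y_t = T_t + S_t + C_t + I_t (all in original units)."""
    return trend + seasonal + cyclical + irregular


def peel_additive(y, trend, seasonal, cyclical):
    """Remove S, then T, then C by subtraction; what remains is I."""
    deseasonalized = y - seasonal        # step 1: T + C + I
    detrended = deseasonalized - trend   # step 2: C + I
    irregular = detrended - cyclical     # step 3: I
    return deseasonalized, detrended, irregular


# Retail-store example: T = 500, S = 100, C = -25, I = -10 (dollars)
y = additive_value(500, 100, -25, -10)
print(y)                               # 565
print(peel_additive(y, 500, 100, -25))  # (465, -35, -10)
```

Subtracting the known components in the order of steps 1 to 3 recovers I = -10, the irregular effect assumed in the example.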

1.4.2.2 Defects of the Additive Model

The additive model suffers from the somewhat unrealistic assumption that the
components are independent of each other. This is seldom the case in the real
world. In most instances, movements in one component will have an impact on
the other components, thereby negating the assumption of independence. Or,
perhaps even more commonly, we often find that certain forces at work in the
economy simultaneously affect two or more components. Again, the
assumption of independence is violated.

1.4.3 The Multiplicative Model

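The multiplicative model assumes that the four components interact with each
other and do not move independently.
It is expressed as follows:

$$Y_t = T_t \times S_t \times C_t \times I_t$$

In this model T is usually expressed in the original units, while S, C and I are
expressed as ratios (indices) around 1, as in the bad-debts example below. This
model is often preferred because, in practice, the components affect one
another.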

1.4.3.1 Application of the Multiplicative Model

In order to break down the time series data and measure the effect of the
individual components, we proceed in four steps as follows:
1. Isolate the seasonal variation, S, and then deseasonalize the data:

$$\frac{Y_t}{S_t} = \frac{T_t \times S_t \times C_t \times I_t}{S_t} = T_t \times C_t \times I_t$$

2. Compute the trend, T, then remove its influence:

$$\frac{T_t \times C_t \times I_t}{T_t} = C_t \times I_t$$

3. Identify the cyclical fluctuation, C, then remove its influence:

$$\frac{C_t \times I_t}{C_t} = I_t$$

4. Recognize that the residual (what remains) is the effect of unpredictable
irregular events, I.

Example: Values for bad debts at a commercial bank might be recorded as
T = $10 million, S = 1.7, C = 0.91 and I = 0.87. Bad debts could then be computed
as:

Y = (10)(1.7)(0.91)(0.87) = $13.46 million
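The multiplicative steps can be sketched the same way, with division taking the place of subtraction. This minimal Python illustration assumes known component values for one period, with T in original units and S, C and I as ratios; the names `multiplicative_value` and `peel_multiplicative` are hypothetical. It reuses the bad-debts figures.

```python
import math


def multiplicative_value(trend, seasonal, cyclical, irregular):
    """Multiplicative model: Y_t = T_t * S_t * C_t * I_t."""
    return trend * seasonal * cyclical * irregular


def peel_multiplicative(y, trend, seasonal, cyclical):
    """Divide out S, then T, then C; what remains is I."""
    deseasonalized = y / seasonal        # step 1: T * C * I
    detrended = deseasonalized / trend   # step 2: C * I
    irregular = detrended / cyclical     # step 3: I
    return deseasonalized, detrended, irregular


# Bad-debts example: T = 10 (million dollars), S = 1.7, C = 0.91, I = 0.87
y = multiplicative_value(10, 1.7, 0.91, 0.87)
print(round(y, 2))  # 13.46
_, _, irregular = peel_multiplicative(y, 10, 1.7, 0.91)
print(math.isclose(irregular, 0.87))  # True
```

Floating-point division does not return 0.87 exactly, which is why the check uses `math.isclose` rather than `==`.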



Trend analysis plot


The trend analysis plot displays the observations versus time. The plot includes the fits
calculated from the fitted trend equation, the forecasts, and the accuracy measures.

Interpretation
Examine the trend analysis plot to determine whether your model fits your data. If the fits closely follow
the actual data, the model fits your data. Ideally, the data points should fall randomly around the fitted
line.

• If the model fits the data, you can perform double exponential smoothing and compare the two
models.
• If the model does not fit the data, perform the analysis again and select a different type of
model. If you fit a linear model and see curvature in the data, select the quadratic, exponential,
or S-curve model. If none of the models fits your data, use a different time series analysis.

On this trend analysis plot, the fits closely follow the data, which indicates that the model fits the
data.
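A trend analysis starts from a fitted trend equation. As a hedged sketch, assuming a linear trend Y_t = a + bt fitted by ordinary least squares with time coded t = 1, 2, ..., n, the fits, residuals and forecasts can be computed as below. The helpers `fit_linear_trend` and `forecast` are hypothetical, and the quadratic, exponential and S-curve alternatives mentioned above are not shown.

```python
def fit_linear_trend(y):
    """Fit Y_t = a + b*t by ordinary least squares, with t = 1, 2, ..., n.

    Returns the intercept a, the slope b, the fitted values and the residuals.
    """
    n = len(y)
    t = range(1, n + 1)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared deviations of t
    s_ty = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    b = s_ty / s_tt
    a = y_bar - b * t_bar
    fits = [a + b * ti for ti in t]
    residuals = [yi - fi for yi, fi in zip(y, fits)]
    return a, b, fits, residuals


def forecast(a, b, n, steps):
    """Extend the fitted trend line `steps` periods beyond t = n."""
    return [a + b * (n + h) for h in range(1, steps + 1)]
```

For a series lying exactly on a line, such as `[3, 5, 7, 9]`, the sketch returns a = 1, b = 2 and zero residuals, and `forecast(1.0, 2.0, 4, 2)` gives `[11.0, 13.0]`. With real data, residuals scattered randomly around zero are the numerical counterpart of "data points fall randomly around the fitted line".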


Components of Time Series Data

Traditional methods of time series analysis are concerned with decomposing a series into a
trend, a seasonal variation and other irregular fluctuations. Although this approach is not always
the best, it is still useful (Kendall and Stuart, 1996).

The parts of which a time series is composed are called the components of time series data.
The four basic components are described below.

Different Sources of Variation are:

1. Seasonal effect (Seasonal Variation or Seasonal Fluctuations)
Many time series, such as sales figures and temperature readings, exhibit seasonal variation with
an annual period. This type of variation is easy to understand and can easily be measured or
removed from the data to give deseasonalized data. Seasonal fluctuation describes any regular
variation (fluctuation) with a period of less than one year; for example, the cost of various types
of fruits and vegetables and of clothes, unemployment figures, average daily rainfall, the increase
in sales of tea in winter and the increase in sales of ice cream in summer all show seasonal
variation. Changes that repeat themselves within a fixed period are also called seasonal
variations, for example, traffic on roads in the morning and evening hours, sales at festivals such
as Eid, and the increase in the number of passengers at weekends. Seasonal variations are caused
by climate, social customs, religious activities, etc.
2. Other Cyclic Changes (Cyclical Variation or Cyclic Fluctuations)
Some time series exhibit cyclical variation at a fixed period due to some other physical cause,
such as the daily variation in temperature. Cyclical variation is a non-seasonal component that
varies in a recognizable cycle. Some time series exhibit oscillations that do not have a fixed
period but are predictable to some extent; for example, economic data are affected by business
cycles with a period varying between about 5 and 7 years. In weekly or monthly data, the cyclical
component may describe any regular variation (fluctuation) in time series data. Cyclical
variations are periodic in nature and repeat themselves like the business cycle, which has four
phases: (i) peak, (ii) recession, (iii) trough/depression, (iv) expansion.
3. Trend (Secular Trend or Long-Term Variation)
Trend is a longer-term change. Here we take into account the number of observations available
and make a subjective assessment of what is long term. To see what "long term" means, note that
climate variables sometimes exhibit cyclic variation over a very long period, such as 50 years. If
one had just 20 years of data, this long-term oscillation would appear to be a trend, but if several
hundred years of data were available, the long-term oscillations would be visible. These
movements are systematic in nature: they are broad and steady, showing a slow rise or fall in the
same direction. The trend may be linear or non-linear (curvilinear). Some examples of secular
trend are an increase in prices, an increase in pollution, an increase in the demand for wheat, an
increase in the literacy rate, and a decrease in deaths due to advances in science. Taking averages
over a certain period is a simple way of detecting trend in seasonal data. A change in these
averages over time is evidence of a trend in the given series, though there are more formal tests
for detecting trend in time series.
4. Other Irregular Variation (Irregular Fluctuations)
When the trend and cyclical variations have been removed from a set of time series data, what is
left is a residual, which may or may not be random. Various techniques for analyzing series of
this type examine whether the irregular variation can be explained in terms of probability models
such as moving average or autoregressive models, i.e. we can see whether any cyclical variation is
still left in the residuals. Variations that occur due to sudden causes are called residual variations
(irregular, accidental or erratic fluctuations) and are unpredictable; examples are a rise in the
price of steel due to a strike in the factory, an accident due to brake failure, floods, earthquakes
and wars.
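The remark above that taking averages over a period is a simple way of detecting trend can be illustrated with a centred moving average. This Python sketch is an illustration under stated assumptions rather than the notes' own method. It assumes the seasonal period is known (for example 4 for quarterly data). For an even period, it uses the common 2×period centred average, which gives the two end points of the window half weight. The function name `centred_moving_average` is hypothetical.

```python
def centred_moving_average(y, period):
    """Centred moving average of length `period`.

    Odd period: a simple average of `period` points centred on each point.
    Even period: the window spans period + 1 points, with half weight on the
    two end points (a "2 x period" moving average).
    Returns a list the same length as y, with None where no full window fits.
    """
    n = len(y)
    half = period // 2
    if period % 2 == 1:
        weights = [1.0 / period] * period
    else:
        weights = [0.5 / period] + [1.0 / period] * (period - 1) + [0.5 / period]
    result = [None] * n
    for i in range(half, n - half):
        window = y[i - half : i + half + 1]
        result[i] = sum(w * v for w, v in zip(weights, window))
    return result
```

A seasonal pattern that repeats every period and sums to zero over one period is averaged out. For quarterly data made of a linear trend plus such a pattern, the smoothed values therefore lie on the trend line. The first and last `period // 2` positions have no full window and are left as `None`.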

[Figure: R TIME SERIES]


1.5 CHAPTER EXERCISES

Exercise 1:
Plot a graph using the time series data in the table below:

Year    Shares (%)    Year    Shares (%)
2004    31.3          2009    32.5
2005    31.1          2010    35.2
2006    29.3          2011    38.0
2007    28.9          2012    39.7
2008    29.5

What is the direction of the trend?

Exercise 2:

Plot a graph of the time series data in the table below:

Year    Sales (Billion Shillings)    Year    Sales (Billion Shillings)
1997    3.6                          2005    6.3
1998    4.2                          2006    5.8
1999    5.5                          2007    4.7
2000    6.2                          2008    5.9
2001    5.6                          2009    6.1
2002    4.3                          2010    7.5
2003    6.0                          2011    8.1
2004    7.2                          2012    8.5
Draw the trend line.

Exercise 3:
The table below shows weekly fuel purchases for the Ministry of Finance
Headquarters:
Week Fuel (liters) Week Fuel (liters)
1 409 11 318
2 289 12 598
3 509 13 418
4 364 14 359
5 404 15 432
6 445 16 252
7 310 17 446
8 372 18 473
9 440 19 337
10 414 20 478

a. Plot the fuel purchases by the Ministry against time.
b. Do you expect a trend, a seasonal variation or a cyclical variation in this series, or not?

Tests with More than Two Independent Samples

In the modules on hypothesis testing we presented techniques for testing the equality of means in
more than two independent samples using analysis of variance (ANOVA). An underlying
assumption for appropriate use of ANOVA was that the continuous outcome was approximately
normally distributed or that the samples were sufficiently large (usually $n_j > 30$, where j = 1, 2, ...,
k and k denotes the number of independent comparison groups). An additional assumption for
appropriate use of ANOVA is equality of variances in the k comparison groups. ANOVA is
generally robust when the sample sizes are small but equal. When the outcome is not normally
distributed and the samples are small, a nonparametric test is appropriate.

The Kruskal-Wallis Test

A popular nonparametric test to compare outcomes among more than two independent groups is
the Kruskal-Wallis test. The Kruskal-Wallis test is used to compare medians among k
comparison groups (k > 2) and is sometimes described as an ANOVA with the data replaced by
their ranks. The null and research hypotheses for the Kruskal-Wallis nonparametric test are
stated as follows:

H0: The k population medians are equal versus

H1: The k population medians are not all equal

The procedure for the test involves pooling the observations from the k samples into one
combined sample, keeping track of which sample each observation comes from, and then
ranking from lowest to highest from 1 to N, where N = n1 + n2 + ... + nk. Tied observations
are each given the average of the ranks they would otherwise occupy. To illustrate the
procedure, consider the following example.

Example:
A clinical study is designed to assess differences in albumin levels in adults following diets with
different amounts of protein. Low protein diets are often prescribed for patients with kidney
failure. Albumin is the most abundant protein in blood, and its concentration in the serum is
measured in grams per deciliter (g/dL). Clinically, serum albumin concentrations are also used to
assess whether patients get sufficient protein in their diets. Three diets are compared, ranging
from 5% to 15% protein, and the 15% protein diet represents a typical American diet. The
albumin levels of participants following each diet are shown below.

5% Protein    10% Protein    15% Protein
3.1           3.8            4.0
2.6           4.1            5.5
2.9           2.9            5.0
              3.4            4.8
              4.2

Is there a difference in serum albumin levels among subjects on the three different diets? For
reference, normal albumin levels are generally between 3.4 and 5.4 g/dL. By inspection, it
appears that participants following the 15% protein diet have higher albumin levels than those
following the 5% protein diet. The issue is whether this observed difference is statistically
significant.

In this example, the outcome is continuous, but the sample sizes are small and not equal across
comparison groups (n1 = 3, n2 = 5, n3 = 4). Thus, a nonparametric test is appropriate. The
hypotheses to be tested are given below, and we will use a 5% level of significance.

H0: The three population medians are equal versus

H1: The three population medians are not all equal

To conduct the test we first order the data in the combined total sample of 12 subjects from
smallest to largest. We also need to keep track of the group assignments in the total sample.

Total Sample (Ordered Smallest to Largest) and Ranks

Value    Group          Rank
2.6      5% Protein     1
2.9      5% Protein     2.5
2.9      10% Protein    2.5
3.1      5% Protein     4
3.4      10% Protein    5
3.8      10% Protein    6
4.0      15% Protein    7
4.1      10% Protein    8
4.2      10% Protein    9
4.8      15% Protein    10
5.0      15% Protein    11
5.5      15% Protein    12

Notice that the lower ranks (e.g., 1, 2.5, 4) are assigned to the 5% protein diet group while the
higher ranks (e.g., 10, 11 and 12) are assigned to the 15% protein diet group. Again, the goal of
the test is to determine whether the observed data support a difference in the three population
medians. Recall in the parametric tests, discussed in the modules on hypothesis testing, when
comparing means among more than two groups we analyzed the difference among the sample
means (mean square between groups) relative to their within group variability and summarized
the sample information in a test statistic (F statistic). In the Kruskal Wallis test we again
summarize the sample information in a test statistic based on the ranks.
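The ranking step can be sketched in Python, including the rule that tied observations share the average of the ranks they occupy. This is an illustrative implementation, not code from the source. The names `average_ranks` and `rank_sums` are hypothetical. Applied to the albumin data, it reproduces the rank sums R1 = 7.5, R2 = 30.5 and R3 = 40 for the three diet groups.

```python
def average_ranks(values):
    """Rank from 1 (smallest) to N; tied values share the mean of their ranks."""
    ordered = sorted(values)
    first, last = {}, {}
    for pos, v in enumerate(ordered, start=1):
        first.setdefault(v, pos)  # first position a value occupies
        last[v] = pos             # last position a value occupies
    return [(first[v] + last[v]) / 2 for v in values]


def rank_sums(groups):
    """Pool the groups, rank the pooled sample, and sum the ranks per group."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    sums, start = [], 0
    for g in groups:
        sums.append(sum(ranks[start : start + len(g)]))
        start += len(g)
    return sums


albumin = [
    [3.1, 2.6, 2.9],            # 5% protein
    [3.8, 4.1, 2.9, 3.4, 4.2],  # 10% protein
    [4.0, 5.5, 5.0, 4.8],       # 15% protein
]
print(rank_sums(albumin))  # [7.5, 30.5, 40.0]
```

The two values of 2.9 occupy positions 2 and 3 in the ordered sample, so each receives rank (2 + 3)/2 = 2.5, as in the table above.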

Test Statistic for the Kruskal-Wallis Test

The test statistic for the Kruskal-Wallis test is denoted H and is defined as follows:

$$H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1)$$

where k = the number of comparison groups, N = the total sample size, n_j is the sample size in the
jth group and R_j is the sum of the ranks in the jth group.

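In this example R1 = 7.5, R2 = 30.5 and R3 = 40. Recall that the sum of the ranks will always
equal N(N+1)/2. As a check on our assignment of ranks, we have N(N+1)/2 = 12(13)/2 = 78, which
is equal to 7.5 + 30.5 + 40 = 78. The H statistic for this example is computed as follows:

$$H = \frac{12}{12(13)}\left(\frac{7.5^2}{3} + \frac{30.5^2}{5} + \frac{40^2}{4}\right) - 3(13) = \frac{12}{156}(18.75 + 186.05 + 400) - 39 = 46.52 - 39 = 7.52$$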

We must now determine whether the observed test statistic H supports the null or research
hypothesis. Once again, this is done by establishing a critical value of H. If the observed value of
H is greater than or equal to the critical value, we reject H0 in favor of H1; if the observed value
of H is less than the critical value we do not reject H0. The critical value of H can be found in the
table below.

Critical Values of H for the Kruskal Wallis Test

To determine the appropriate critical value we need sample sizes (n1=3, n2=5 and n3=4) and our
level of significance (α=0.05). For this example the critical value is 5.656, thus we reject H0
because 7.52 > 5.656, and we conclude that there is a difference in median albumin levels among
the three different diets.
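The hand calculation of H and the decision rule can be checked with a short Python function. This is a sketch that implements the formula exactly as given in these notes. It applies no correction for ties, which some statistical software applies, so its output matches the hand formula rather than every package. The name `kruskal_wallis_h` is hypothetical, and the critical value 5.656 is the tabled value quoted above.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """H = 12/(N(N+1)) * sum(R_j^2 / n_j) - 3(N+1), with no correction for ties."""
    n_total = sum(sizes)
    # Check from the text: the rank sums must add up to N(N+1)/2
    if abs(sum(rank_sums) - n_total * (n_total + 1) / 2) > 1e-9:
        raise ValueError("rank sums do not add up to N(N+1)/2")
    total = sum(r * r / n for r, n in zip(rank_sums, sizes))
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


h = kruskal_wallis_h([7.5, 30.5, 40], [3, 5, 4])
critical = 5.656  # tabled critical value for sample sizes (3, 5, 4), alpha = 0.05
print(round(h, 2))    # 7.52
print(h >= critical)  # True: reject H0
```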

Notice that Table 8 contains critical values for the Kruskal-Wallis test for tests comparing 3, 4 or
5 groups with small sample sizes. If there are 3 or more comparison groups and 5 or more
observations in each of the comparison groups, it can be shown that the test statistic H
approximates a chi-square distribution with df = k - 1. Thus, in a Kruskal-Wallis test with 3 or
more comparison groups and 5 or more observations in each group, the critical value for the test
can be found in the table of Critical Values of the χ² Distribution below.

Critical Values of the χ² Distribution

The following example illustrates this situation.

Example:
A personal trainer is interested in comparing the anaerobic thresholds of elite athletes. Anaerobic
threshold is defined as the point at which the muscles cannot get more oxygen to sustain activity
or the upper limit of aerobic exercise. It is a measure also related to maximum heart rate. The
following data are anaerobic thresholds for distance runners, distance cyclists, distance
swimmers and cross-country skiers.

Distance Runners    Distance Cyclists    Distance Swimmers    Cross-Country Skiers
185                 190                  166                  201
179                 209                  159                  195
192                 182                  170                  180
165                 178                  183                  187
174                 181                  160                  215

Is there a difference in anaerobic thresholds among the different groups of elite athletes?

Step 1. Set up hypotheses and determine level of significance.

H0: The four population medians are equal versus

H1: The four population medians are not all equal

α = 0.05

Step 2. Select the appropriate test statistic.

The test statistic for the Kruskal-Wallis test is denoted H and is defined as follows:

$$H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1)$$

where k = the number of comparison groups, N = the total sample size, n_j is the sample size in the
jth group and R_j is the sum of the ranks in the jth group.

Step 3. Set up the decision rule.

Because there are 4 comparison groups and 5 observations in each of the comparison groups, we
find the critical value in the table of critical values for the chi-square distribution for
df = k - 1 = 4 - 1 = 3 and α = 0.05. The critical value is 7.81, and the decision rule is to reject H0
if H > 7.81.

Step 4. Compute the test statistic.


To conduct the test we assign ranks using the procedures outlined above. The first step in
assigning ranks is to order the data from smallest to largest. This is done on the combined or total
sample (i.e., pooling the data from the four comparison groups (n=20)), and assigning ranks from
1 to 20, as follows. We also need to keep track of the group assignments in the total sample. The
table below shows the ordered data.

Total Sample (Ordered Smallest to Largest)

Value    Group
159      Distance Swimmers
160      Distance Swimmers
165      Distance Runners
166      Distance Swimmers
170      Distance Swimmers
174      Distance Runners
178      Distance Cyclists
179      Distance Runners
180      Cross-Country Skiers
181      Distance Cyclists
182      Distance Cyclists
183      Distance Swimmers
185      Distance Runners
187      Cross-Country Skiers
190      Distance Cyclists
192      Distance Runners
195      Cross-Country Skiers
201      Cross-Country Skiers
209      Distance Cyclists
215      Cross-Country Skiers


We now assign the ranks to the ordered values and sum the ranks in each group.

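Total Sample (Ordered Smallest to Largest) and Ranks

Value    Group                   Rank
159      Distance Swimmers       1
160      Distance Swimmers       2
165      Distance Runners        3
166      Distance Swimmers       4
170      Distance Swimmers       5
174      Distance Runners        6
178      Distance Cyclists       7
179      Distance Runners        8
180      Cross-Country Skiers    9
181      Distance Cyclists       10
182      Distance Cyclists       11
183      Distance Swimmers       12
185      Distance Runners        13
187      Cross-Country Skiers    14
190      Distance Cyclists       15
192      Distance Runners        16
195      Cross-Country Skiers    17
201      Cross-Country Skiers    18
209      Distance Cyclists       19
215      Cross-Country Skiers    20

Rank sums: R1 (Distance Runners) = 46, R2 (Distance Cyclists) = 62, R3 (Distance Swimmers) = 24,
R4 (Cross-Country Skiers) = 78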

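Recall that the sum of the ranks will always equal N(N+1)/2. As a check on our assignment of
ranks, we have N(N+1)/2 = 20(21)/2 = 210, which is equal to 46 + 62 + 24 + 78 = 210. In this example,

$$H = \frac{12}{20(21)}\left(\frac{46^2}{5} + \frac{62^2}{5} + \frac{24^2}{5} + \frac{78^2}{5}\right) - 3(21) = \frac{12}{420}(2524) - 63 = 72.11 - 63 = 9.11$$

Step 5. Conclusion.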

Reject H0 because 9.11 > 7.81. We have statistically significant evidence at α =0.05, to show that
there is a difference in median anaerobic thresholds among the four different groups of elite
athletes.
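The whole athlete example can also be reproduced in code, including the chi-square approximation. The sketch below is a hedged illustration, not the source's method. It recomputes the rank sums and H from the raw data. As an extra assumption beyond the notes, it evaluates the upper-tail chi-square probability with the closed form P(X > x) = erfc(√(x/2)) + √(2x/π)·e^(-x/2). That closed form holds only for df = 3. The names `average_ranks`, `kruskal_wallis` and `chi2_sf_df3` are hypothetical.

```python
import math


def average_ranks(values):
    """Rank from 1 (smallest) to N; tied values share the mean of their ranks."""
    ordered = sorted(values)
    first, last = {}, {}
    for pos, v in enumerate(ordered, start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def kruskal_wallis(groups):
    """Return the rank sums R_j and H = 12/(N(N+1)) * sum(R_j^2/n_j) - 3(N+1)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    sums, start = [], 0
    for g in groups:
        sums.append(sum(ranks[start : start + len(g)]))
        start += len(g)
    n = len(pooled)
    total = sum(r * r / len(g) for r, g in zip(sums, groups))
    return sums, 12 / (n * (n + 1)) * total - 3 * (n + 1)


def chi2_sf_df3(x):
    """P(X > x) for a chi-square variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


athletes = {
    "Distance Runners": [185, 179, 192, 165, 174],
    "Distance Cyclists": [190, 209, 182, 178, 181],
    "Distance Swimmers": [166, 159, 170, 183, 160],
    "Cross-Country Skiers": [201, 195, 180, 187, 215],
}
sums, h = kruskal_wallis(list(athletes.values()))
print(sums)                   # [46.0, 62.0, 24.0, 78.0]
print(round(h, 2))            # 9.11
print(h > 7.81)               # True: reject H0 at alpha = 0.05
print(chi2_sf_df3(h) < 0.05)  # True: same conclusion via the p-value
```

Comparing H with the tabled critical value 7.81 and comparing the p-value with α = 0.05 are two forms of the same decision rule.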

Notice that in this example, the anaerobic thresholds of the distance runners, cyclists and cross-
country skiers are comparable (looking only at the raw data). The distance swimmers appear to
be the athletes that differ from the others in terms of anaerobic thresholds. Recall, similar to
analysis of variance tests, we reject the null hypothesis in favor of the alternative hypothesis if
any two of the medians are not equal.

2.5 CHAPTER EXERCISES

Exercise 1

Test for stationarity in the time series using a Runs test on the following data:

1.7 6.1 3.8 2.4 2.1 3.8 4.9 3.2 2.8
1.6 3.4 3.0 2.9 1.7 3.1 2.8 3.0 1.9
2.4 2.7 1.6 1.9 3.3 2.6 2.2 2.6

Exercise 2

Use the turning point test to test for stationarity in the data in Exercise 1.

Exercise 3

Use the sign test to test for stationarity in the data in Exercise 1.

Exercise 4

Given the following performances in two subjects, STA2101 and STA2106,
determine Spearman’s rank correlation coefficient and use it to test for
stationarity in the time series.

STA2101    STA2106    STA2101    STA2106
40         39         28         96
43         45         80         47
47         60         44         81
42         54         59         94
56         77         64         96
70         80         68         81
85         81         74         44
52         49         55         50


Exercise 5

Use the Pearson’s test on the data below to test for stationarity in the time
series:

t    Y_t    t²    Y_t²    tY_t
1 1.82 1 3.31 1.82
2 2.6 4 6.76 5.2
3 1.7 9 2.89 5.1
4 2.8 16 7.84 11.2
5 3.4 25 11.56 17.0
6 4.3 36 18.49 25.8
7 3.1 49 9.61 21.7
8 4.5 64 20.25 36.0
9 5.0 81 25.0 45.0
10 5.7 100 32.49 57.0
11 4.12 121 16.97 45.32
12 3.6 144 12.96 43.2
13 6.3 169 39.69 81.9
14 7.0 196 49.0 98.0

Exercise 6

A record of the production for each machine operator was kept over a period of
time. Certain changes in the production procedure were suggested, and 11
operators were picked as an experimental test group to determine whether the
new procedures were worthwhile. Their production rates before and after the
new procedures were established and recorded as follows:

Operator    Production Before    Production After
S.M. 17 18
D.J. 21 23
M.D. 25 22
B.B. 15 25
M.F. 10 28
A.A. 16 16
U.Z. 10 22
Y.U. 20 19
U.T. 17 20
Y.H. 24 30
Y.Y. 23 26

a. How many usable pairs are there? That is, what is n?
b. Using the Wilcoxon signed-rank test, determine whether the new procedures
actually increased production. Use the 0.05 level and a one-tailed test.


Exercise 7
One of the major car manufacturers is studying the effect of regular versus
super gasoline in its economy cars. Ten executives are selected and asked to
maintain records on the number of kilometers per liter of gas. The results are:

Kilometers per Liter

Executive    Regular    Super
B.W.         25         28
D.M.         33         31
G.S.         31         35
D.T.         45         44
K.L.         42         47
R.U.         38         40
G.O.         29         29
B.N.         42         37
S.W.         41         44
L.W.         30         44

At the 0.05 significance level, is there a difference in the number of kilometers per liter between regular and super gasoline?

Exercise 8
It has been suggested that daily production of an assembly line for batteries at
Uganda Batteries Limited would be increased if better portable lighting were
installed and background music and free coffee and doughnuts were provided
during the day. Management agreed to try the scheme for a limited time. The
numbers of batteries produced per week by a small test group of employees are
as follows:

Employee Past Production Current Production Employee Past Production Current Production
J.D. 23 33 J.B. 30 29
S.B. 26 26 W.W 21 25
M.D. 24 30 O.P. 25 22
R.C. 17 25 C.D. 21 23
M.F. 20 19 P.A. 16 17
U.H. 24 22 R.T. 20 15
A.T. 17 9 O.O 23 30

Using the Wilcoxon signed-rank test, determine whether the suggested changes
are worthwhile.
a. State the null hypothesis
b. Decide on the alternative hypothesis
c. Decide on the level of significance
d. State the decision rule
e. Compute T and arrive at a decision

Exercise 9

The following observations were selected from populations that were not
necessarily normally distributed. Use the 0.05 significance level, a two-tailed
test, and the Wilcoxon rank-sum test to determine whether there is a
difference between the two populations:

Population A 38 45 56 57 61 69 70 79
Population B 26 31 35 42 51 52 57 62

Exercise 10

One group was taught an assembly procedure using a standard sequence of steps and another group was taught a new experimental technique. The time to complete the assembly, in seconds, for a sample of workers is shown below:

Current Method: 41 36 42 39 36 48 49 38
Experimental: 21 27 36 20 19 21 39 24 22

At the 0.05 significance level, can we conclude the experimental method is faster? Assume that the distribution of assembly times is not normal.

Exercise 11

Six truck models are rated on a scale of 1 to 10 by two companies that purchase entire fleets of trucks for industrial use. Calculate the Spearman rank correlation coefficient to determine at the 1% level whether the rankings are independent.

Model Rating by First Company Rating by Second Company
1 8 9
2 7 6
3 5 8
4 7 5
5 3 7
6 2 8


Exercise 12

Four methods of treating steel rods are analyzed to determine whether there is any difference in the pressure the rods can bear before breaking. The results of the tests measuring the pressure in pounds before the rods bent are shown. Conduct the test, complete with the hypotheses, decision rule, and conclusion. Set 𝛂 = 1 percent.

Method 1 Method 2 Method 3 Method 4
50 10 72 54
62 12 63 59
73 10 73 64
48 14 82 82
63 10 79 79

Exercise 13

The quality control manager for a large plant in Kampala Industrial Area gives
two operations manuals to two groups of employees. Each group is then tested
on operational procedures. The scores are shown in the table below. The
manager has always felt that manual 1 provides a better base of knowledge for
new employees. Compute the mean test scores of the employees and report
your conclusion. State the hypotheses. Set 𝛂 = 0.05.

Manual 1 87 97 82 97 92 90 81 89 90 88 87 89 93
Manual 2 92 79 80 73 84 93 86 88 91 82 81 84 72 74

Exercise 14

At the 10 percent level, is there a relationship between study time in hours and
grades on a test, according to the data in the table below?

Time 21 18 15 17 18 25 18 4 6 5
Grade 67 58 59 54 58 80 14 15 19 21


CHAPTER THREE

THE TRENDS AND SMOOTHING ANALYSIS

3.0 Goals of the chapter

1. Understand what trends are and what their causes are
2. Define and apply the tests for linear trends
3. Define and apply the tests for non-linear trends
4. Define and use smoothing techniques for trends

3.1 Introduction

A series whose average value changes over time is referred to as non-stationary.

3.2 Definition

A trend is a general tendency of a series to either rise or decline. That is, a time series 𝑌𝑡 is trended if E(𝑌𝑡) = f(t; 𝛽0, 𝛽1, …) for t = 1, 2, …, where f is an increasing or decreasing function of t.

3.3 Causes of Trend

There are many causes of trend, including the following:

1. Population changes – which result in changes in demand and in variables associated with that demand, for example sales, raw material use and energy consumption
2. Technological changes – which affect productivity and hence the quality of life and standard of living, leading to the introduction of new products and making others obsolete
3. Changes in social customs and behaviors – which affect tastes, habits and consumption patterns
4. Inflationary pressure – which leads to a general increase in prices over time
5. Market acceptance – which is shaped by branding and advertising aimed at the consumer

3.4 Tests for Trends

There are numerous tests for trends but the most common are the tests for
linear trends and those for non-linear trends.


3.4.1 Tests for Linear Trends

The basic tests for linear trends are the simple linear and the least squares models.

Test for Linear Trend

The long-term trend of many time series, such as sales, exports and production, often approximates a straight line.

The Linear Trend Model is expressed as: 𝑌𝑡 = 𝛼0 + 𝛼1 t + 𝜀𝑡

This implies that: 𝜀𝑡 = 𝑌𝑡 − 𝛼0 − 𝛼1 t

The least squares method chooses 𝛼0 and 𝛼1 to minimize the sum of the squared errors:

$$S = \sum_{t=1}^{n} \varepsilon_t^2 = \sum_{t=1}^{n} (Y_t - \alpha_0 - \alpha_1 t)^2$$

Differentiating with respect to 𝛼0 and 𝛼1 and equating the results to zero:

$$\frac{\partial S}{\partial \alpha_0} = -2\sum (Y_t - \alpha_0 - \alpha_1 t) = 0 \qquad \text{(i)}$$

$$\frac{\partial S}{\partial \alpha_1} = -2\sum t\,(Y_t - \alpha_0 - \alpha_1 t) = 0 \qquad \text{(ii)}$$

From (i), dividing both sides by −2 and expanding the brackets:

$$\sum Y_t = n\alpha_0 + \alpha_1 \sum t \qquad \text{(1st normal equation)}$$

From (ii), dividing both sides by −2 and expanding the brackets:

$$\sum tY_t = \alpha_0 \sum t + \alpha_1 \sum t^2 \qquad \text{(2nd normal equation)}$$

Solving the two normal equations simultaneously:

$$\begin{pmatrix} n & \sum t \\ \sum t & \sum t^2 \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} = \begin{pmatrix} \sum Y_t \\ \sum tY_t \end{pmatrix}$$

so that, by Cramer's rule,

$$\alpha_0 = \frac{\begin{vmatrix} \sum Y_t & \sum t \\ \sum tY_t & \sum t^2 \end{vmatrix}}{\begin{vmatrix} n & \sum t \\ \sum t & \sum t^2 \end{vmatrix}}, \qquad \alpha_1 = \frac{\begin{vmatrix} n & \sum Y_t \\ \sum t & \sum tY_t \end{vmatrix}}{\begin{vmatrix} n & \sum t \\ \sum t & \sum t^2 \end{vmatrix}}$$


Example:

Fit a linear trend for the following data:

t 𝒀𝒕 𝒕𝒀𝒕 𝒕²
1 1.8 1.8 1
2 2.4 4.8 4
3 2.8 8.4 9
4 3.6 14.4 16
5 4.2 21.0 25
6 4.9 29.4 36
7 5.9 41.3 49
8 3.8 30.4 64
9 6.4 57.6 81
10 7.0 70.0 100
55 42.8 279.1 385

∑ 𝑌𝑡 = n𝛼0 + 𝛼1 ∑ 𝑡

42.8 = 10𝛼0 + 55𝛼1 (i)

∑ 𝑡𝑌𝑡 = 𝛼0 ∑ 𝑡+ 𝛼1 ∑ 𝑡 2

279.1 = 55𝛼0 + 385𝛼1 (ii)

Solving the two equations simultaneously:

$$\begin{pmatrix} 10 & 55 \\ 55 & 385 \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix} = \begin{pmatrix} 42.8 \\ 279.1 \end{pmatrix}$$

$$\alpha_0 = \frac{\begin{vmatrix} 42.8 & 55 \\ 279.1 & 385 \end{vmatrix}}{\begin{vmatrix} 10 & 55 \\ 55 & 385 \end{vmatrix}} = \frac{(42.8 \times 385) - (279.1 \times 55)}{(10 \times 385) - (55 \times 55)} = \frac{1127.5}{825} = 1.37$$

$$\alpha_1 = \frac{\begin{vmatrix} 10 & 42.8 \\ 55 & 279.1 \end{vmatrix}}{\begin{vmatrix} 10 & 55 \\ 55 & 385 \end{vmatrix}} = \frac{(10 \times 279.1) - (42.8 \times 55)}{(10 \times 385) - (55 \times 55)} = \frac{437}{825} = 0.53$$

Therefore the trend is: 𝑌̂ = 𝑇𝑡 = 1.37 + 0.53t. For example, for t = 6:

𝑌̂6 = 𝑇6 = 1.37 + 0.53(6) = 1.37 + 3.18 = 4.55

Projection for t = 15: 𝑌̂15 = 𝑇15 = 1.37 + 0.53(15) = 1.37 + 7.95 = 9.32
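As a check on the hand calculation, the normal equations can be solved in code. The following is a minimal Python sketch, assuming time is indexed t = 1, 2, …, n as in the example; `linear_trend` is an illustrative name, not a library function. It applies Cramer's rule to the two normal equations, so each coefficient is a ratio of 2 × 2 determinants. Because it keeps the coefficients unrounded, its projection for t = 15 (about 9.31) can differ slightly from a hand calculation that uses rounded values.

```python
def linear_trend(y):
    """Least-squares linear trend T_t = a0 + a1*t with t = 1, 2, ..., n.

    Solves the normal equations
        sum(Y)  = n*a0      + a1*sum(t)
        sum(tY) = a0*sum(t) + a1*sum(t^2)
    by Cramer's rule (ratios of 2x2 determinants).
    """
    n = len(y)
    t = range(1, n + 1)
    s_t = sum(t)
    s_t2 = sum(ti * ti for ti in t)
    s_y = sum(y)
    s_ty = sum(ti * yi for ti, yi in zip(t, y))
    det = n * s_t2 - s_t * s_t            # |n  sum t; sum t  sum t^2|
    a0 = (s_y * s_t2 - s_ty * s_t) / det  # |sum Y  sum t; sum tY  sum t^2| / det
    a1 = (n * s_ty - s_t * s_y) / det     # |n  sum Y; sum t  sum tY| / det
    return a0, a1


y = [1.8, 2.4, 2.8, 3.6, 4.2, 4.9, 5.9, 3.8, 6.4, 7.0]
a0, a1 = linear_trend(y)
print(round(a0, 4), round(a1, 4))   # 1.3667 0.5297
print(round(a0 + a1 * 15, 2))       # 9.31  (projection for t = 15)
```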

3.4.2 Least Squares Model (Second Degree Trend)

Definition

The Least Squares Model is the method of computing the equation of the line or curve through the data of interest that gives the "best fit". In this section it is applied to a second-degree (quadratic) trend.

The model is: 𝑌𝑡 = 𝛼0 + 𝛼1 t + 𝛼2 t² + 𝜀𝑡

The Least Squares Estimation Method (LSE) minimizes the sum of the squared error terms, where

𝜀𝑡 = 𝑌𝑡 − 𝛼0 − 𝛼1 t − 𝛼2 t²

$$S = \sum \varepsilon_t^2 = \sum (Y_t - \alpha_0 - \alpha_1 t - \alpha_2 t^2)^2$$

Differentiating with respect to 𝛼0, 𝛼1 and 𝛼2 and equating to zero:

$$\frac{\partial S}{\partial \alpha_0} = -2\sum (Y_t - \alpha_0 - \alpha_1 t - \alpha_2 t^2) = 0 \qquad \text{(i)}$$

$$\frac{\partial S}{\partial \alpha_1} = -2\sum t\,(Y_t - \alpha_0 - \alpha_1 t - \alpha_2 t^2) = 0 \qquad \text{(ii)}$$

$$\frac{\partial S}{\partial \alpha_2} = -2\sum t^2 (Y_t - \alpha_0 - \alpha_1 t - \alpha_2 t^2) = 0 \qquad \text{(iii)}$$

Dividing each equation by −2 and expanding the brackets gives the three normal equations:

$$\sum Y_t = n\alpha_0 + \alpha_1 \sum t + \alpha_2 \sum t^2 \qquad \text{(1st normal equation)}$$

$$\sum tY_t = \alpha_0 \sum t + \alpha_1 \sum t^2 + \alpha_2 \sum t^3 \qquad \text{(2nd normal equation)}$$

$$\sum t^2 Y_t = \alpha_0 \sum t^2 + \alpha_1 \sum t^3 + \alpha_2 \sum t^4 \qquad \text{(3rd normal equation)}$$

Solving the three normal equations simultaneously:

$$\begin{pmatrix} n & \sum t & \sum t^2 \\ \sum t & \sum t^2 & \sum t^3 \\ \sum t^2 & \sum t^3 & \sum t^4 \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} \sum Y_t \\ \sum tY_t \\ \sum t^2 Y_t \end{pmatrix}$$

Example:

Fit a second degree trend using the following data:

t 𝒀𝒕 𝒕² 𝒕³ 𝒕⁴ 𝒕𝒀𝒕 𝒕²𝒀𝒕
1 1.8 1 1 1 1.8 1.8
2 2.3 4 8 16 4.6 9.2
3 2.8 9 27 81 8.4 25.2
4 3.6 16 64 256 14.4 57.6
5 4.2 25 125 625 21.0 105.0
6 4.9 36 216 1296 29.4 176.4
7 5.9 49 343 2401 41.3 289.1
8 3.8 64 512 4096 30.4 243.2
9 6.4 81 729 6561 57.6 518.4
10 7.0 100 1000 10000 70.0 700.0
55 42.7 385 3025 25333 278.9 2125.9

42.7 = 10𝛼0 + 55𝛼1 + 385𝛼2 (i)

278.9 = 55𝛼0 + 385𝛼1 + 3025𝛼2 (ii)

2125.9 = 385𝛼0 + 3025𝛼1 + 25333𝛼2 (iii)

Solving the three normal equations simultaneously:

$$\begin{pmatrix} 10 & 55 & 385 \\ 55 & 385 & 3025 \\ 385 & 3025 & 25333 \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} 42.7 \\ 278.9 \\ 2125.9 \end{pmatrix}$$

𝛼0 = 1.225, 𝛼1 = 0.588, 𝛼2 = −0.0049

Therefore: 𝑌̂ = 𝛼0 + 𝛼1 t + 𝛼2 t² = 1.225 + 0.588t − 0.0049t²

For example, for t = 7:

𝑌̂7 = 1.225 + (0.588 × 7) − (0.0049 × 49) = 1.225 + 4.116 − 0.2401 = 5.1

For projection, for example t = 12:

𝑌̂12 = 1.225 + (0.588 × 12) − (0.0049 × 144) = 1.225 + 7.056 − 0.7056 = 7.5754 ≈ 7.6

3.4.3 Third Degree Trend

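The model is: 𝑌𝑡 = 𝛼0 + 𝛼1 t + 𝛼2 t² + 𝛼3 t³ + 𝜀𝑡

The normal equations are:

$$\sum Y_t = n\alpha_0 + \alpha_1 \sum t + \alpha_2 \sum t^2 + \alpha_3 \sum t^3 \qquad \text{(i)}$$

$$\sum tY_t = \alpha_0 \sum t + \alpha_1 \sum t^2 + \alpha_2 \sum t^3 + \alpha_3 \sum t^4 \qquad \text{(ii)}$$

$$\sum t^2 Y_t = \alpha_0 \sum t^2 + \alpha_1 \sum t^3 + \alpha_2 \sum t^4 + \alpha_3 \sum t^5 \qquad \text{(iii)}$$

$$\sum t^3 Y_t = \alpha_0 \sum t^3 + \alpha_1 \sum t^4 + \alpha_2 \sum t^5 + \alpha_3 \sum t^6 \qquad \text{(iv)}$$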

Solve the four normal equations simultaneously.
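Higher-degree trends only change the size of the normal-equation system. Below is a minimal Python sketch, assuming t = 1, 2, …, n; `polynomial_trend` and `solve` are illustrative helpers, not library functions. The sketch builds the (k + 1) × (k + 1) matrix of power sums ∑tᵖ and the right-hand sides ∑tʲ𝑌𝑡, then solves the system by Gaussian elimination, so degree=2 reproduces the second-degree example and degree=3 gives the four normal equations of this section.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]     # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                       # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def polynomial_trend(y, degree):
    """Least-squares trend a0 + a1*t + ... + ak*t^k with t = 1, 2, ..., n.

    Row j of the normal equations reads
        sum(t^j * Y) = a0*sum(t^j) + a1*sum(t^(j+1)) + ... + ak*sum(t^(j+k)).
    """
    t = range(1, len(y) + 1)
    power_sums = [sum(ti ** p for ti in t) for p in range(2 * degree + 1)]
    A = [[power_sums[j + k] for k in range(degree + 1)] for j in range(degree + 1)]
    b = [sum(ti ** j * yi for ti, yi in zip(t, y)) for j in range(degree + 1)]
    return solve(A, b)


y = [1.8, 2.3, 2.8, 3.6, 4.2, 4.9, 5.9, 3.8, 6.4, 7.0]
a0, a1, a2 = polynomial_trend(y, degree=2)
print(round(a0, 3), round(a1, 3), round(a2, 4))   # 1.225 0.588 -0.0049
```

For higher degrees or long series the power sums grow quickly and the normal equations become ill-conditioned; a least-squares solver based on a QR decomposition is the usual safer alternative.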

3.5 Simple Exponential Trend

The model is: 𝑌𝑡 = 𝛼0 𝛼1ᵗ 𝜀𝑡

Using natural logarithms:

$$\ln Y_t = \ln(\alpha_0\,\alpha_1^{t}\,\varepsilon_t) = \ln\alpha_0 + t\ln\alpha_1 + \ln\varepsilon_t$$

Compare with the linear trend, whose normal equations are:

$$\sum Y_t = n\alpha_0 + \alpha_1 \sum t, \qquad \sum tY_t = \alpha_0 \sum t + \alpha_1 \sum t^2$$

Correspondingly, with ln 𝑌𝑡 in place of 𝑌𝑡 and ln 𝛼0, ln 𝛼1 in place of 𝛼0, 𝛼1:

$$\sum \ln Y_t = n\ln\alpha_0 + \ln\alpha_1 \sum t$$

$$\sum t\ln Y_t = \ln\alpha_0 \sum t + \ln\alpha_1 \sum t^2$$

Solving the two equations simultaneously gives the trend 𝑌̂ = 𝑇𝑡 = 𝛼0 𝛼1ᵗ.

This describes the exponential trend as a curve, where 𝛼0 and 𝛼1 are both positive constants and 𝛼1 is raised to the power (exponent) t, the number of time periods from the origin. Taking the logarithm of both sides of the equation gives a logarithmic trend of the same form as a simple linear regression equation.

Example:

Fit a simple exponential trend for the data below:

t 𝒀𝒕 ln 𝒀𝒕 t ln 𝒀𝒕 𝒕²
1 2.27 0.82 0.82 1
2 2.32 0.84 1.68 4
3 2.393 0.87 2.61 9
4 2.56 0.94 3.76 16
5 2.647 0.97 4.85 25
6 2.775 1.02 6.12 36
7 2.85 1.05 7.35 49
8 2.9 1.06 8.48 64
9 2.982 1.09 9.81 81
10 2.059 0.72 7.2 100
11 3.143 1.15 12.65 121
12 3.289 1.19 14.28 144
13 3.376 1.22 15.86 169
14 3.602 1.28 17.92 196
105 14.22 113.39 1015

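14.22 = 14 ln𝛼0 + 105 ln𝛼1

113.39 = 105 ln𝛼0 + 1015 ln𝛼1

Let ln𝛼0 = a and ln𝛼1 = b

14.22 = 14a + 105b (i)

113.39 = 105a + 1015b (ii)

Solving the two equations simultaneously:

$$a = \frac{(14.22 \times 1015) - (113.39 \times 105)}{(14 \times 1015) - (105 \times 105)} = \frac{2527.35}{3185} = 0.7935, \qquad b = \frac{(14 \times 113.39) - (105 \times 14.22)}{3185} = \frac{94.36}{3185} = 0.0296$$

Therefore, ln𝛼0 = 0.7935 and ln𝛼1 = 0.0296

𝛼0 = e^0.7935 = 2.211 and 𝛼1 = e^0.0296 = 1.030

The exponential trend is: 𝑌̂ = 𝑇𝑡 = (2.211)(1.030)ᵗ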
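The exponential fit is the linear-trend calculation applied to ln 𝑌𝑡, followed by exponentiation. A minimal Python sketch, assuming t = 1, 2, …, n and the 14 observations exactly as listed in the example (including 2.059 at t = 10); the function name is illustrative. It takes logarithms without rounding them to two decimals, so ln 𝛼0 and ln 𝛼1 can differ in the third decimal place from a hand calculation based on the rounded ln 𝑌𝑡 column.

```python
import math


def exponential_trend(y):
    """Fit T_t = a0 * a1**t (t = 1, ..., n) by least squares on ln Y_t.

    These are the linear-trend normal equations with ln Y_t in place of Y_t
    and ln a0, ln a1 in place of the two coefficients.
    """
    n = len(y)
    t = range(1, n + 1)
    ln_y = [math.log(v) for v in y]
    s_t = sum(t)
    s_t2 = sum(ti * ti for ti in t)
    s_ly = sum(ln_y)
    s_tly = sum(ti * ly for ti, ly in zip(t, ln_y))
    det = n * s_t2 - s_t * s_t
    ln_a0 = (s_ly * s_t2 - s_tly * s_t) / det
    ln_a1 = (n * s_tly - s_t * s_ly) / det
    return math.exp(ln_a0), math.exp(ln_a1)


y = [2.27, 2.32, 2.393, 2.56, 2.647, 2.775, 2.85, 2.9, 2.982,
     2.059, 3.143, 3.289, 3.376, 3.602]
a0, a1 = exponential_trend(y)
print(round(a0, 3), round(a1, 4))
trend_t15 = a0 * a1 ** 15      # projected trend value for t = 15
```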


3.6 Asymptotic Growth Curves

3.6.1 Modified Exponential Trend

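𝑦𝑡 = 𝛼0 + 𝛼1 𝛼2ᵗ 𝜀𝑡

The observations are split into three consecutive groups of n periods each, with time measured from t = 0 at the first observation, and ∑1𝑦, ∑2𝑦 and ∑3𝑦 denote the totals of the first, second and third groups. Then:

$$\alpha_2^{\,n} = \frac{\sum_3 y - \sum_2 y}{\sum_2 y - \sum_1 y}$$

$$\alpha_1 = \left(\sum_2 y - \sum_1 y\right)\frac{\alpha_2 - 1}{(\alpha_2^{\,n} - 1)^2}$$

$$\alpha_0 = \frac{1}{n}\left[\sum_1 y - \frac{\alpha_2^{\,n} - 1}{\alpha_2 - 1}\,\alpha_1\right]$$

Example:

t 𝒚𝒕 t 𝒚𝒕 t 𝒚𝒕
1 1.985 8 2.393 15 3.059
2 2.032 9 2.560 16 3.143
3 2.088 10 2.647 17 3.289
4 2.095 11 2.775 18 3.376
5 2.182 12 2.853 19 3.454
6 2.270 13 2.904 20 3.547
7 2.320 14 2.982 21 3.602
Total 14.972 Total 19.114 Total 23.47

Here n = 7 (the observation labelled t = 1 corresponds to t = 0 in the fitted curve):

$$\alpha_2^{\,7} = \frac{23.47 - 19.114}{19.114 - 14.972} = \frac{4.356}{4.142} = 1.0517$$

$$\alpha_2 = \sqrt[7]{1.0517} = 1.0072$$

$$\alpha_1 = (4.142)\,\frac{1.0072 - 1}{(1.0517 - 1)^2} = (4.142)\,\frac{0.0072}{(0.0517)^2} = 11.1574$$

$$\alpha_0 = \frac{1}{7}\left[14.972 - \frac{1.0517 - 1}{1.0072 - 1}\,(11.1574)\right] = -9.3063$$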
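The three-partial-sums method can be written directly from the formulas for 𝛼2ⁿ, 𝛼1 and 𝛼0. This is a minimal Python sketch under two assumptions: the series length is a multiple of 3, and time is measured from t = 0 at the first observation, which is the convention under which those formulas hold. `modified_exponential` is an illustrative name. Because it does not round 𝛼2 to four decimals, it returns 𝛼1 ≈ 11.21 and 𝛼0 ≈ −9.31; 𝛼1 is sensitive to rounding in 𝛼2 − 1, which is why hand values can differ slightly.

```python
def modified_exponential(y):
    """Three-partial-sums fit of y_t = a0 + a1 * a2**t.

    Assumes len(y) = 3n and time measured from t = 0 at the first value.
    """
    if len(y) % 3:
        raise ValueError("the series length must be a multiple of 3")
    n = len(y) // 3
    s1, s2, s3 = sum(y[:n]), sum(y[n:2 * n]), sum(y[2 * n:])
    a2n = (s3 - s2) / (s2 - s1)                # a2**n
    a2 = a2n ** (1 / n)
    a1 = (s2 - s1) * (a2 - 1) / (a2n - 1) ** 2
    a0 = (s1 - (a2n - 1) / (a2 - 1) * a1) / n
    return a0, a1, a2


y = [1.985, 2.032, 2.088, 2.095, 2.182, 2.270, 2.320,
     2.393, 2.560, 2.647, 2.775, 2.853, 2.904, 2.982,
     3.059, 3.143, 3.289, 3.376, 3.454, 3.547, 3.602]
a0, a1, a2 = modified_exponential(y)
print(round(a0, 3), round(a1, 3), round(a2, 5))
```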

3.6.2 The Gompertz Curve


$$y_t = \alpha_0\,\alpha_1^{\,\alpha_2^{\,t}}$$

Transforming by taking logs:

$$\ln y_t = \ln\!\left(\alpha_0\,\alpha_1^{\,\alpha_2^{\,t}}\right) = \ln\alpha_0 + (\ln\alpha_1)\,\alpha_2^{\,t}$$

This is a modified exponential in ln 𝑦𝑡, so the same partial-sum formulas apply to the group totals of ln 𝑦:

$$\alpha_2^{\,n} = \frac{\sum_3 \ln y - \sum_2 \ln y}{\sum_2 \ln y - \sum_1 \ln y}$$

$$\ln\alpha_1 = \left(\sum_2 \ln y - \sum_1 \ln y\right)\frac{\alpha_2 - 1}{(\alpha_2^{\,n} - 1)^2}$$

$$\ln\alpha_0 = \frac{1}{n}\left[\sum_1 \ln y - \frac{\alpha_2^{\,n} - 1}{\alpha_2 - 1}\,\ln\alpha_1\right]$$

Example:

Using the information from the modified exponential trend above, fit a Gompertz curve.

t 𝒚𝒕 ln𝒚𝒕 t 𝒚𝒕 ln𝒚𝒕 t 𝒚𝒕 ln𝒚𝒕
1 1.985 0.686 8 2.393 0.873 15 3.059 1.118
2 2.032 0.709 9 2.560 0.940 16 3.143 1.145
3 2.088 0.736 10 2.647 0.973 17 3.289 1.191
4 2.095 0.740 11 2.775 1.021 18 3.376 1.217
5 2.182 0.780 12 2.853 1.048 19 3.454 1.240
6 2.270 0.820 13 2.904 1.066 20 3.547 1.266
7 2.320 0.842 14 2.982 1.093 21 3.602 1.281
Total 14.972 5.313 Total 19.114 7.014 Total 23.47 8.458

$$\alpha_2^{\,7} = \frac{8.458 - 7.014}{7.014 - 5.313} = \frac{1.444}{1.701} = 0.849$$

$$\alpha_2 = \sqrt[7]{0.849} = 0.977$$

$$\ln\alpha_1 = (7.014 - 5.313)\,\frac{0.977 - 1}{(0.849 - 1)^2} = -1.72$$

Therefore, 𝛼1 = e^(−1.72) = 0.179

$$\ln\alpha_0 = \frac{1}{7}\left[5.313 - \frac{0.849 - 1}{0.977 - 1}\,(-1.72)\right] = 2.37$$

Therefore, 𝛼0 = e^2.37 = 10.7

Hence,

$$y_t = \alpha_0\,\alpha_1^{\,\alpha_2^{\,t}} = 10.7\,(0.179)^{0.977^{\,t}}$$
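Because ln 𝑦𝑡 = ln 𝛼0 + (ln 𝛼1)𝛼2ᵗ is a modified exponential in ln 𝑦𝑡, the Gompertz curve can be fitted by applying the partial-sum formulas to the logged series and exponentiating. A minimal Python sketch, assuming 3n observations and time measured from t = 0 at the first observation; `gompertz` is an illustrative name. Working with unrounded logarithms, it gives roughly 𝛼2 ≈ 0.977, 𝛼1 ≈ 0.179 and 𝛼0 ≈ 10.6.

```python
import math


def gompertz(y):
    """Three-partial-sums fit of y_t = a0 * a1**(a2**t) using ln y_t.

    ln y_t = ln a0 + (ln a1) * a2**t is a modified exponential in ln y_t,
    so the partial-sum formulas are applied to the logged series.
    Assumes len(y) = 3n and time measured from t = 0 at the first value.
    """
    if len(y) % 3:
        raise ValueError("the series length must be a multiple of 3")
    n = len(y) // 3
    ln_y = [math.log(v) for v in y]
    s1, s2, s3 = sum(ln_y[:n]), sum(ln_y[n:2 * n]), sum(ln_y[2 * n:])
    a2n = (s3 - s2) / (s2 - s1)                        # a2**n
    a2 = a2n ** (1 / n)
    ln_a1 = (s2 - s1) * (a2 - 1) / (a2n - 1) ** 2
    ln_a0 = (s1 - (a2n - 1) / (a2 - 1) * ln_a1) / n
    return math.exp(ln_a0), math.exp(ln_a1), a2


y = [1.985, 2.032, 2.088, 2.095, 2.182, 2.270, 2.320,
     2.393, 2.560, 2.647, 2.775, 2.853, 2.904, 2.982,
     3.059, 3.143, 3.289, 3.376, 3.454, 3.547, 3.602]
a0, a1, a2 = gompertz(y)
print(round(a0, 2), round(a1, 3), round(a2, 3))
fitted_first = a0 * a1 ** (a2 ** 0)    # fitted value at t = 0 (first observation)
```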

3.7 Smoothing Techniques

The general behavior of the variable can often be best discussed by examining
its long-term trend. However, if the time series contains too many random
fluctuations or short-term seasonal changes, the trend may be somewhat
obscured and difficult to observe. It is possible to eliminate many of these
confounding factors by averaging the data over several time periods. This is
accomplished by using certain smoothing techniques that remove random
fluctuations in the series, thereby providing a less obstructed view of the
behavior of the series.


3.7.1 Moving Averages

A Moving Average (MA) will have the effect of “smoothing out” the data,
producing a movement with fewer peaks and valleys. It is computed by
averaging the values in the time series over a set number of time periods. The
same number of time periods is retained for each average by dropping the
oldest and picking up the newest.

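Under the multiplicative model, 𝑌𝑡 = 𝑆𝑡 × 𝑇𝑡 × 𝐶𝑡 × 𝐼𝑡, and the moving average (MA) estimates the trend-cycle component 𝑇𝑡 × 𝐶𝑡, so that

$$\frac{Y_t}{MA} = S_t \times I_t$$

Under the additive model, 𝑌𝑡 = 𝑆𝑡 + 𝑇𝑡 + 𝐶𝑡 + 𝐼𝑡, and the moving average estimates 𝑇𝑡 + 𝐶𝑡, so that

𝑌𝑡 − 𝑀𝐴 = 𝑆𝑡 + 𝐼𝑡

L is the number of times the time series is observed in a year (for example, L = 4 for quarterly data).

If L is odd, only calculate the Moving Average (MA); an L-term average is already centered on an observation, so the Centered Moving Average (CMA) is not needed.

If L is even, calculate both the Moving Average and the Centered Moving Average, where each CMA is the average of two adjacent moving averages.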
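The MA and CMA columns in the worked examples can be reproduced with a short function. This minimal Python sketch (an illustrative helper, not a library routine) places each L-term moving average at its center; for even L it averages each pair of adjacent moving averages so that the result lines up with an actual observation. Positions with no centered value are returned as None.

```python
def centered_moving_average(y, L):
    """Moving averages of length L, centered on the observations.

    Returns a list the same length as y, with None where no centered value
    exists. For odd L the L-term average already sits on an observation;
    for even L each pair of adjacent L-term averages is averaged again (CMA).
    """
    ma = [sum(y[i:i + L]) / L for i in range(len(y) - L + 1)]
    out = [None] * len(y)
    half = L // 2
    if L % 2:                       # odd L: MA belongs to the middle observation
        for i, m in enumerate(ma):
            out[i + half] = m
    else:                           # even L: average consecutive MAs
        for i in range(len(ma) - 1):
            out[i + half] = (ma[i] + ma[i + 1]) / 2
    return out


y = [4, 2, 1, 5, 6, 4, 4, 14, 10, 3, 5, 16, 12, 9, 7, 22, 10, 13, 35, 35]
cma = centered_moving_average(y, 4)
print(cma[2], cma[3], cma[17])   # 3.25 3.75 21.625
```

Applied to the quarterly series of Example 1 (L = 4), the first centered value falls at t = 3 and equals 3.25, matching the CMA column.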

Example 1:

Consider the data below and assume an additive model to compute the seasonal and adjusted seasonal indices, and hence de-seasonalize the series.

Year Quarter I Quarter II Quarter III Quarter IV
2008 4 2 1 5
2009 6 4 4 14
2010 10 3 5 16
2011 12 9 7 22
2012 10 13 35 35

Solution:

Year Q t 𝒀𝒕 MA CMA 𝒀𝒕 − 𝑪𝑴𝑨 𝑺𝒊 𝒀𝒕 − 𝑺𝒊
2008 I 1 4 0.3516 3.6484
II 2 2 3.0 -3.8984 5.8984
III 3 1 3.5 3.25 -2.25 -2.9609 3.9609
IV 4 5 4.0 3.75 1.25 6.5078 -1.5078
2009 I 5 6 4.75 4.375 1.625 0.3516 5.6484
II 6 4 7.0 5.875 -1.875 -3.8984 7.8984
III 7 4 8.0 7.5 -3.5 -2.9609 6.9609
IV 8 14 7.75 7.875 6.125 6.5078 7.4922
2010 I 9 10 8.0 7.875 2.125 0.3516 9.6484
II 10 3 8.5 8.25 -5.25 -3.8984 6.8984
III 11 5 9.0 8.75 -3.75 -2.9609 7.9609
IV 12 16 10.5 9.75 6.25 6.5078 9.4922
2011 I 13 12 11.0 10.75 1.25 0.3516 11.6484
II 14 9 12.5 11.75 -2.75 -3.8984 12.8984
III 15 7 12.0 12.25 -5.25 -2.9609 9.9609
IV 16 22 13.0 12.5 9.5 6.5078 15.4922
2012 I 17 10 20.0 16.5 -6.5 0.3516 9.6484
II 18 13 23.25 21.625 -8.625 -3.8984 16.8984
III 19 35 -2.9609 37.9609
IV 20 35 6.5078 28.4922

Year I II III IV
2008 -2.25 1.25
2009 1.625 -1.875 -3.5 6.125
2010 2.125 -5.25 -3.75 6.25
2011 1.25 -2.75 -5.25 9.5
2012 -6.5 -8.625
Average seasonal (ASi) -0.375 -4.625 -3.6875 5.78125

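∑ 𝐴𝑆𝑖 = −2.90625

$$\text{Adjusted seasonal: } \operatorname{Adj} S_i = AS_i - \frac{\sum AS_i}{L}$$

$$\operatorname{Adj} S_1 = AS_1 - \frac{\sum AS_i}{L} = -0.375 - \frac{(-2.90625)}{4} = 0.3516$$

$$\operatorname{Adj} S_2 = AS_2 - \frac{\sum AS_i}{L} = -4.625 - \frac{(-2.90625)}{4} = -3.8984$$

$$\operatorname{Adj} S_3 = AS_3 - \frac{\sum AS_i}{L} = -3.6875 - \frac{(-2.90625)}{4} = -2.9609$$

$$\operatorname{Adj} S_4 = AS_4 - \frac{\sum AS_i}{L} = 5.78125 - \frac{(-2.90625)}{4} = 6.5078$$

Interpretation of the results:

𝑆1 = 0.3516 implies that in quarter I, 𝑦𝑡 is on average 0.3516 above the trend-cycle level.

𝑆2 = −3.8984 implies that in quarter II, 𝑦𝑡 is on average 3.8984 below the trend-cycle level.

𝑆3 = −2.9609 implies that in quarter III, 𝑦𝑡 is on average 2.9609 below the trend-cycle level.

𝑆4 = 6.5078 implies that in quarter IV, 𝑦𝑡 is on average 6.5078 above the trend-cycle level.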
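The additive procedure above (specific seasonals 𝑌𝑡 − CMA, their averages by quarter, then the adjustment that makes them sum to zero) fits in one function. A minimal Python sketch, assuming even L and a series that starts in the first season; `additive_seasonal_indices` is an illustrative name. With this data every intermediate value is a binary fraction, so the floating-point results carry no rounding error.

```python
def additive_seasonal_indices(y, L):
    """Adjusted additive seasonal indices from centered moving averages.

    Assumes L is even (e.g. L = 4 for quarterly data) and y starts in season 1.
    """
    ma = [sum(y[i:i + L]) / L for i in range(len(y) - L + 1)]
    half = L // 2
    by_season = [[] for _ in range(L)]
    for i in range(len(ma) - 1):
        t = i + half                         # 0-based position of Y_t
        cma = (ma[i] + ma[i + 1]) / 2        # centered moving average at t
        by_season[t % L].append(y[t] - cma)  # specific seasonal Y_t - CMA_t
    avg = [sum(v) / len(v) for v in by_season]   # AS_i
    correction = sum(avg) / L
    return [a - correction for a in avg]         # Adj S_i, summing to zero


y = [4, 2, 1, 5, 6, 4, 4, 14, 10, 3, 5, 16, 12, 9, 7, 22, 10, 13, 35, 35]
s = additive_seasonal_indices(y, 4)
print([round(v, 4) for v in s])   # [0.3516, -3.8984, -2.9609, 6.5078]
deseasonalized = [v - s[i % 4] for i, v in enumerate(y)]   # Y_t - S_i
```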


Example 2:

Consider the data below and assume a multiplicative model. Compute the seasonal and adjusted seasonal indices.

Year I II III IV
2009 6 9 10 5
2010 16 20 18 6
2011 15 18 24 9
2012 20 22 26 12

Solution:

Year Q t 𝒀𝒕 MA CMA 𝒀𝒕 / 𝑪𝑴𝑨 𝑺𝒊 𝒀𝒕 / 𝑺𝒊
2009 I 1 6 1.0812 5.5494
II 2 9 7.5 1.2040 7.4751
III 3 10 10.0 8.75 1.1429 1.2621 7.9233
IV 4 5 12.75 11.375 0.4396 0.4527 11.0448
2010 I 5 16 14.75 13.75 1.1636 1.0812 14.7984
II 6 20 15.0 14.875 1.3445 1.2040 16.6113
III 7 18 14.75 14.875 1.2101 1.2621 14.2619
IV 8 6 14.25 14.5 0.4138 0.4527 13.2538
2011 I 9 15 15.75 15.0 1.0 1.0812 13.8735
II 10 18 16.5 16.125 1.1163 1.2040 14.9502
III 11 24 17.75 17.125 1.4015 1.2621 19.0159
IV 12 9 18.75 18.25 0.4932 0.4527 19.8807
2012 I 13 20 19.25 19.0 1.0526 1.0812 18.4980
II 14 22 20 19.625 1.1210 1.2040 18.2724
III 15 26 1.2621 20.6006
IV 16 12 0.4527 26.5076

Year I II III IV
2009 1.1429 0.4396
2010 1.1636 1.3445 1.2101 0.4138
2011 1.0 1.1163 1.4015 0.4932
2012 1.0526 1.1210
Average seasonal (ASi) 1.0721 1.1939 1.2515 0.4489

∑ 𝐴𝑆𝑖 = 3.9664

$$\text{Adjusted seasonal: } \operatorname{Adj} S_i = \frac{AS_i}{\sum AS_i} \times L$$

$$\operatorname{Adj} S_1 = \frac{1.0721}{3.9664} \times 4 = 1.0812$$

$$\operatorname{Adj} S_2 = \frac{1.1939}{3.9664} \times 4 = 1.2040$$

$$\operatorname{Adj} S_3 = \frac{1.2515}{3.9664} \times 4 = 1.2621$$

$$\operatorname{Adj} S_4 = \frac{0.4489}{3.9664} \times 4 = 0.4527$$
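The multiplicative version replaces differences by ratios and rescales the average seasonals so that they sum to L. A minimal Python sketch, assuming even L and a series that starts in the first season; the function name is illustrative. It does not round the specific seasonals to four decimals first, so the last digit of an index can differ from the hand calculation.

```python
def multiplicative_seasonal_indices(y, L):
    """Adjusted multiplicative seasonal indices (ratio to centered moving average).

    Assumes L is even (e.g. L = 4 for quarterly data) and y starts in season 1.
    """
    ma = [sum(y[i:i + L]) / L for i in range(len(y) - L + 1)]
    half = L // 2
    by_season = [[] for _ in range(L)]
    for i in range(len(ma) - 1):
        t = i + half                          # 0-based position of Y_t
        cma = (ma[i] + ma[i + 1]) / 2         # centered moving average at t
        by_season[t % L].append(y[t] / cma)   # specific seasonal Y_t / CMA_t
    avg = [sum(v) / len(v) for v in by_season]   # AS_i
    total = sum(avg)
    return [a / total * L for a in avg]          # Adj S_i, summing to L


y = [6, 9, 10, 5, 16, 20, 18, 6, 15, 18, 24, 9, 20, 22, 26, 12]
s = multiplicative_seasonal_indices(y, 4)
print([round(v, 4) for v in s])
deseasonalized = [v / s[i % 4] for i, v in enumerate(y)]   # Y_t / S_i
```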

3.8 CHAPTER EXERCISES

Exercise 1

Fit a linear trend for the data in the table below and plot both the time series
and the trend values.

Year (t) 𝒀𝒕 Year (t) 𝒀𝒕
2000 2248 2007 4283
2001 2966 2008 4795
2002 3453 2009 5033
2003 3766 2010 5021
2004 4031 2011 5267
2005 3749 2012 5511
2006 3778

Exercise 2

Fit a second degree trend to the following data and plot the time series and
trend.

t 𝒀𝒕 t 𝒀𝒕
1 1.985 8 2.393
2 2.032 9 2.560
3 2.086 10 2.775
4 2.095 11 2.853
5 2.188 12 2.904
6 2.270 13 2.981
7 2.320 14 3.376
Exercise 3

Using the data in exercise 2 fit a third degree trend.

Exercise 4

The following is the enrollment at Kyambogo University from 1986 to 2012.

a. Develop both a linear and a logarithmic trend equation
b. Estimate the enrollment for 2015 using both equations
c. Which trend equation would you recommend? Why?
d. Plot the data and comment on your forecast

Year Enrollment Year Enrollment Year Enrollment
1986 12,755 1995 20,270 2004 23,928
1987 14,489 1996 21,117 2005 24,781
1988 15,158 1997 21,386 2006 24,969
1989 14,669 1998 21,589 2007 24,541
1990 15,730 1999 21,039 2008 24,188
1991 17,204 2000 21,238 2009 23,107
1992 17,498 2001 21,176 2010 21,991
1993 17,257 2002 21,740 2011 21,343
1994 18,246 2003 22,806 2012 20,040
Exercise 5
Listed below are the total numbers of vehicles sold by Toyota Motor Company (in thousands) from 2003 to 2012.
Year Units Sold Year Units Sold
2003 3,876 2008 3,693
2004 4,313 2009 4,131
2005 4,131 2010 4,591
2006 3,632 2011 4,279
2007 3,212 2012 4,222

a. Determine the trend equation
b. What is the yearly rate of increase?
c. Plot the data
d. Does it appear that we can effectively forecast the total number of units
sold with this equation?
Exercise 6

De-seasonalise the following data assuming:

(a) a multiplicative model. (b) an additive model.

Year Quarter Sales Year Quarter Sales
2007 I 6.7 2010 I 7.0
II 4.6 II 5.5
III 10.0 III 10.8
IV 12.7 IV 15.0
2008 I 6.5 2011 I 7.1
II 4.6 II 5.7
III 9.8 III 11.1
IV 13.6 IV 14.5
2009 I 6.9 2012 I 8.0
II 5.0 II 6.2
III 10.4 III 11.4
IV 14.1 IV 14.9


CHAPTER FOUR

The Cyclic Variations

4.0 Goals of the chapter

1. Understand what cyclic variations are and what their causes are
2. Define and use the application of the test for cyclic variations

4.1 Introduction

Cyclic variations are long swings (peaks and troughs) above and below the trend that occur over a number of years.

In the additive model: 𝐶𝑡 = 𝑌𝑡 − 𝑆𝑡 − 𝑇𝑡 − 𝐼𝑡

$$\text{In the multiplicative model: } C_t = \frac{Y_t}{S_t \times T_t \times I_t}$$

4.2 Causes of cycles

There are a number of causes that lead to cycles in time series. These include:

1. Psychological factors – swings in the time series linked to popular tastes such as fashion, food or music
2. Population changes – which lead to changes in demand for certain goods
3. Institutional changes – which lead to support for certain items and exclusion of others
4. Replacement cycles – in which certain goods are periodically replaced by others
5. Education levels – which lead people to favor certain goods and leave out others
6. Predator – which leads to certain elements of society falling prey to calamities

4.3 Tests for Cycles

The most commonly used test for cycles is the von Neumann ratio test.

4.3.1 Definition

The von Neumann ratio test is used to test whether the residuals in the time series are independent of each other, that is, whether they are free of cyclic patterns.

Example:

Consider the following data on rice production on Kibuku Rice Scheme in Uganda for the period 1998 to 2012:

Year t 𝒀𝒕 𝒕² 𝑻𝒕 (𝒀𝒕 − 𝑻𝒕) 𝑹𝒕 𝒅𝒕 𝒅𝒕²
1998 1 16.2 1 16.04 0.16 10
1999 2 15.4 4 16.82 -1.42 3 -7 49
2000 3 17.1 9 17.6 -0.5 6 3 9
2001 4 18.0 16 18.38 -0.38 7 1 1
2002 5 21.2 25 19.16 2.04 13 6 36
2003 6 21.4 36 19.94 1.46 12 -1 1
2004 7 20.4 49 20.72 -0.32 8 -4 16
2005 8 18.0 64 21.5 -3.5 1 -7 49
2006 9 21.3 81 22.28 -0.98 5 4 16
2007 10 23.7 100 23.06 0.64 11 6 36
2008 11 28.0 121 23.84 4.16 15 4 16
2009 12 27.6 144 24.62 2.98 14 -1 1
2010 13 24.2 169 25.4 -1.2 4 -10 100
2011 14 25.9 196 26.18 -0.28 9 5 25
2012 15 24.2 225 26.96 -2.76 2 -7 49
Total 120 322.6 1240 404

Here 𝑹𝒕 is the rank of the residual (𝒀𝒕 − 𝑻𝒕) and 𝒅𝒕 = 𝑹𝒕 − 𝑹𝒕₋₁.

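𝑇𝑡 = 𝛼0 + 𝛼1 t, with normal equations ∑ 𝑌𝑡 = n𝛼0 + 𝛼1 ∑ t and ∑ t𝑌𝑡 = 𝛼0 ∑ t + 𝛼1 ∑ t²

With n = 15, ∑ t = 120, ∑ t² = 1240, ∑ 𝑌𝑡 = 322.6 and ∑ t𝑌𝑡 = 2799.6, solving gives

𝑇𝑡 = 𝑌̂ = 15.26 + 0.78t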

4.3.2 Applying the von Neumann Ratio Test

Step 1: State the Null and the Alternative Hypotheses

𝐻0 : The residuals are independent

𝐻𝐴 : The residuals have cycles present

Step 2: Select a Level of Significance: Choose a level say 0.1, 0.05 etc.

Step 3: Decide on the test statistic


$$RM = \frac{\sum (R_{t+1} - R_t)^2}{\sum (R_t - \bar{R})^2} = \frac{\sum (R_{t+1} - R_t)^2}{n(n^2 - 1)/12} = \frac{12}{n(n^2 - 1)} \sum (R_{t+1} - R_t)^2 = \frac{12}{n(n^2 - 1)} \sum d_t^2$$

The second step uses the fact that, for the ranks 1, 2, …, n, ∑(𝑅𝑡 − 𝑅̅)² = n(n² − 1)/12.

Step 4: Formulate a decision rule: Reject 𝐻0 if 𝑅𝑀𝑐 is less than the tabulated value 𝑅𝑀𝑡 with (n − 2).

Step 5: Carry out the test:

$$RM_c = \frac{12}{15(15^2 - 1)} \times 404 = \frac{4848}{3360} = 1.44$$

𝑅𝑀𝑡 = 1.19

Step 6: Conclusion: Since 𝑅𝑀𝑐 = 1.44 is greater than 𝑅𝑀𝑡 = 1.19, 𝐻0 is not rejected. At the (1 – α) x 100% confidence level, the data provide no evidence that the residuals are dependent, so no cyclic component is detected.
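The ratio test above can be scripted end to end: fit the linear trend, rank the residuals, and compute RM from the squared differences of successive ranks. A minimal Python sketch; the helper names are illustrative, tied residuals get the average rank, and the comparison with the critical value 𝑅𝑀𝑡 is left to published tables.

```python
def linear_trend(y):
    """Least-squares a0, a1 for T_t = a0 + a1*t with t = 1, ..., n."""
    n = len(y)
    t = range(1, n + 1)
    s_t, s_t2 = sum(t), sum(ti * ti for ti in t)
    s_y, s_ty = sum(y), sum(ti * yi for ti, yi in zip(t, y))
    det = n * s_t2 - s_t * s_t
    return (s_y * s_t2 - s_ty * s_t) / det, (n * s_ty - s_t * s_y) / det


def ranks(values):
    """Ranks 1..n in ascending order; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1     # average of positions i+1 .. j+1
        i = j + 1
    return r


def rank_von_neumann_ratio(y):
    """RM = 12 * sum(d_t^2) / (n(n^2 - 1)), with d_t = R_{t+1} - R_t the
    differences between ranks of successive residuals from a linear trend."""
    a0, a1 = linear_trend(y)
    resid = [yi - (a0 + a1 * t) for t, yi in enumerate(y, start=1)]
    r = ranks(resid)
    n = len(r)
    sum_d2 = sum((r[k + 1] - r[k]) ** 2 for k in range(n - 1))
    return 12 * sum_d2 / (n * (n * n - 1)), sum_d2


rice = [16.2, 15.4, 17.1, 18.0, 21.2, 21.4, 20.4, 18.0, 21.3,
        23.7, 28.0, 27.6, 24.2, 25.9, 24.2]
rm, sum_d2 = rank_von_neumann_ratio(rice)
print(sum_d2, round(rm, 2))   # 404.0 1.44
```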


4.4 Chapter Exercises

Exercise 1

Determine whether the time series below has a cyclic component.

Year (t) 𝒀𝒕 Year (t) 𝒀𝒕
2000 2248 2007 4283
2001 2966 2008 4795
2002 3453 2009 5033
2003 3766 2010 5021
2004 4031 2011 5267
2005 3749 2012 5511
2006 3778


Exercise 2

Determine whether the time series below has a cyclic component.

t 𝒀𝒕 t 𝒀𝒕
1 1.985 8 2.393
2 2.032 9 2.560
3 2.086 10 2.775
4 2.095 11 2.853
5 2.188 12 2.904
6 2.270 13 2.981
7 2.320 14 3.376

Exercise 3

Listed below are the total numbers of vehicles sold by Toyota Motor Company (in thousands) from 2003 to 2012.

Year Units Sold Year Units Sold
2003 3,876 2008 3,693
2004 4,313 2009 4,131
2005 4,131 2010 4,591
2006 3,632 2011 4,279
2007 3,212 2012 4,222

Determine the trend equation and test for cyclic variation.


CHAPTER FIVE

The Seasonal Variations

5.0 Goals of the chapter

1. Understand what seasonal variations are and what their causes are
2. Define and use the application of the test for seasonal variations

5.1 Introduction

The rationale of a test for seasonality is that if the specific seasonals are purely random, their distribution would be the same for all seasons. In other words, the rankings of the specific seasonals should be as likely to fall in one season as in another.

5.2 Causes of seasonality

There are numerous causes of seasonality. These include:

1. Climate changes – winter, summer, rainy seasons
2. Calendar-related factors – Christmas, fasting, end-of-year seasons
3. Changes in the gestation period of goods – the time taken for a product to mature
4. Institutional (man-made) factors – such as working holidays and the school calendar

5.3 Computation of the seasonal factor in time series

Under the multiplicative model, 𝑌𝑡 = 𝑆𝑡 × 𝑇𝑡 × 𝐶𝑡 × 𝐼𝑡, and dividing by the moving average (MA) gives

$$\frac{Y_t}{MA} = S_t \times I_t$$

Under the additive model, 𝑌𝑡 = 𝑆𝑡 + 𝑇𝑡 + 𝐶𝑡 + 𝐼𝑡, and subtracting the moving average gives

𝑌𝑡 − 𝑀𝐴 = 𝑆𝑡 + 𝐼𝑡

L is the number of times the time series is observed in a year.

If L is odd, only calculate the Moving Average (MA); the Centered Moving Average (CMA) is not needed.

If L is even, calculate both the Moving Average and the Centered Moving Average (the average of two adjacent moving averages).

By doing so, we remove the trend and cyclic components from the series and are left with only the seasonal and irregular terms.


5.3.1 Kruskal – Wallis Test for seasonal variation

Definitions

𝑅𝑖 = Sum of the ranks of the specific seasonals in the ith season

𝑛𝑖 = Number of specific seasonals in the ith season

$$n = \sum_{i=1}^{L} n_i$$

𝑅̅𝑖 = Average rank of the ith season

Test statistic

$$H = \frac{12}{n(n+1)}\left[\sum \frac{R_i^2}{n_i}\right] - 3(n+1)$$

where n denotes the total number of pieces of data and 𝑅1, 𝑅2, …, 𝑅𝐾 denote, respectively, the sums of the ranks for the sample data from populations 1, 2, …, K.

Assumptions for the Kruskal – Wallis Test

1. Independent samples
2. Populations have the same shape
3. All samples are of size 5 or greater

Additive model:

𝐻0: 𝑆1 = 𝑆2 = 𝑆3 = 𝑆4 = 0, and 𝐻𝐴: Some 𝑆𝑖 ≠ 0

Or

𝐻0: ∑ 𝑆𝑖 = 0, and 𝐻𝐴: ∑ 𝑆𝑖 ≠ 0

Multiplicative model:

𝐻0: 𝑆1 = 𝑆2 = 𝑆3 = 𝑆4 = 1, and 𝐻𝐴: Some 𝑆𝑖 ≠ 1

Or

𝐻0: ∑ 𝑆𝑖 = L, and 𝐻𝐴: ∑ 𝑆𝑖 ≠ L

Generalization:

𝐻0 : The series is stationary, 𝐻𝐴 : The series has seasonality

Steps to follow when constructing the Kruskal-Wallis Test


Step 1: State the null and the alternate hypotheses

Step 2: Decide on the significance level, α

Step 3: Decide on the critical value 𝜒𝛼² with (K − 1) degrees of freedom. Use the tables or a calculator to determine the critical value.

Step 4: Construct a worktable of the following form:

Sample (Population 1) Overall Rank Sample (Population 2) Overall Rank … Sample (Population K) Overall Rank

Step 5: Compute the value of the test statistic

$$H = \frac{12}{n(n+1)}\left[\sum \frac{R_i^2}{n_i}\right] - 3(n+1)$$

Step 6: If the value of the test statistic falls in the rejection region, then reject
𝐻0 ; otherwise, do not reject 𝐻0 .

Step 7: State the conclusion in words.

Example 1:

Using the data in the table below, carry out the Kruskal-Wallis test to determine whether seasonality exists in the series or not:

Year I II III IV
2008 4 2 1 5
2009 6 4 4 14
2010 10 3 5 16
2011 12 9 7 22
2012 10 13 35 35

Specific Seasonal
Year I II III IV
2008 -2.25 1.25
2009 1.625 -1.875 -3.5 6.125
2010 2.125 -5.25 -3.75 6.25
2011 1.25 -2.75 -5.25 9.5
2012 -6.5 -8.625

Ranking (tied values share the average rank)

Year I II III IV
2008 8 10.5
2009 12 9 6 14
2010 13 3.5 5 15
2011 10.5 7 3.5 16
2012 2 1
𝑅𝑖 37.5 20.5 22.5 55.5
𝑛𝑖 4 4 4 4
𝑅𝑖² 1406.25 420.25 506.25 3080.25

𝑛 = ∑ 𝑛𝑖 = 4 + 4 + 4 + 4 = 16

$$H = \frac{12}{n(n+1)}\left[\sum \frac{R_i^2}{n_i}\right] - 3(n+1) = \frac{12}{16(16+1)}\left[\frac{1406.25}{4} + \frac{420.25}{4} + \frac{506.25}{4} + \frac{3080.25}{4}\right] - 3(16+1)$$

$$= \frac{12}{272}\left[\frac{5413}{4}\right] - 51 = 59.7022 - 51 = 8.7022$$

Rejection criterion: Reject 𝐻0 if 𝐻 > 𝜒𝛼² with (k − 1) degrees of freedom, where

$$\chi^2_{\alpha}(k-1) = \chi^2_{0.05}(4-1) = \chi^2_{0.05}(3) = 7.815$$

Conclusion: Since 𝐻 = 8.7022 > 7.815, reject 𝐻0 (that the series is stationary) at the 5% significance level; the series shows seasonal variation.

Example 2:
In Kampala, independent random samples of cars, buses, and trucks provided
the data on the number of kilometers, driven last year, in thousands:
Cars Buses Trucks
19.2 1.3 11.6
12.5 7.3 24.0
1.5 7.3 8.2
6.1 7.0 10.6
33.5 12.8 10.0
7.6 18.9 2.3
11.3 44.1
6.3 1.5
8.8 13.0
0.4

At the 5% significance level, do the data provide sufficient evidence to conclude that a difference exists in last year's mean number of kilometers driven among cars, buses and trucks?


Step 1: State the null and alternative hypotheses

Let 𝜇1, 𝜇2 and 𝜇3 denote last year's mean number of kilometers driven for cars, buses and trucks, respectively.
𝐻0: 𝜇1 = 𝜇2 = 𝜇3 (the mean numbers of kilometers driven are equal)
𝐻𝐴: Not all means are equal
Step 2: Decide on the significance level, α
We are to perform the hypothesis test at the 5% significance level, so α = 0.05.
Step 3: The critical value is

$$\chi^2_{\alpha}(k-1) = \chi^2_{0.05}(3-1) = \chi^2_{0.05}(2) = 5.991$$

Step 4: Construct a worktable

Cars Rank Buses Rank Trucks Rank
19.2 22 1.3 2 11.6 17
12.5 18 7.3 9.5 24.0 23
1.5 3.5 7.3 9.5 8.2 12
6.1 6 7.0 8 10.6 15
33.5 24 12.8 19 10.0 14
7.6 11 18.9 21 2.3 5
11.3 16 44.1 25
6.3 7 1.5 3.5
8.8 13 13.0 20
0.4 1
Mean rank 12.15 11.50 14.94

Step 5: Compute the value of the test statistic

𝑛 = 10 + 6 + 9 = 25

𝑅1 = 22 + 18 + 3.5 + 6 + 24 + 11 + 16 + 7 + 13 + 1 = 121.5

𝑅2 = 2 + 9.5 + 9.5 + 8 + 19 + 21 = 69.0

𝑅3 = 17 + 23 + 12 + 15 + 14 + 5 + 25 + 3.5 + 20 = 134.5

$$H = \frac{12}{n(n+1)}\left[\sum \frac{R_i^2}{n_i}\right] - 3(n+1) = \frac{12}{25(25+1)}\left[\frac{121.5^2}{10} + \frac{69^2}{6} + \frac{134.5^2}{9}\right] - 3(25+1) = 1.011$$

Step 6: If the value of the test statistic falls in the rejection region, reject 𝐻0; otherwise, do not reject 𝐻0.

Since 𝐻 = 1.011 < 𝜒²0.05(2) = 5.991, we do not reject 𝐻0.

Step 7: State the conclusion in words

The test results are not statistically significant at the 5% level; that is, at the 5% significance level the data do not provide sufficient evidence to conclude that a difference exists in last year's mean number of kilometers driven among cars, buses and trucks.
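The H statistic in Steps 4 and 5 can be computed with a short function that pools the samples, assigns average ranks to ties, and applies the formula without any correction for ties, as in the hand calculation. This is a sketch; `kruskal_wallis_h` is an illustrative name. SciPy's `scipy.stats.kruskal` computes the same statistic but divides it by a tie-correction factor, so with the tied values here it would be marginally larger.

```python
def kruskal_wallis_h(*samples):
    """H = 12/(n(n+1)) * sum(R_i^2 / n_i) - 3(n+1), with no tie correction.

    All observations are ranked together; tied values share the average rank.
    Returns H and the rank sum R_i of each sample.
    """
    pooled = sorted(v for s in samples for v in s)
    n = len(pooled)
    avg_rank = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        avg_rank[pooled[i]] = (i + j) / 2 + 1   # average of ranks i+1 .. j+1
        i = j + 1
    rank_sums = [sum(avg_rank[v] for v in s) for s in samples]
    weighted = sum(r * r / len(s) for r, s in zip(rank_sums, samples))
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    return h, rank_sums


cars = [19.2, 12.5, 1.5, 6.1, 33.5, 7.6, 11.3, 6.3, 8.8, 0.4]
buses = [1.3, 7.3, 7.3, 7.0, 12.8, 18.9]
trucks = [11.6, 24.0, 8.2, 10.6, 10.0, 2.3, 44.1, 1.5, 13.0]
h, rank_sums = kruskal_wallis_h(cars, buses, trucks)
print(rank_sums, round(h, 3))   # [121.5, 69.0, 134.5] 1.011
```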

5.4 CHAPTER EXERCISES

Exercise 1

The Appliance Center sells a variety of electronic equipment and home appliances. For the last four years the following quarterly sales (in $ millions) were reported and recorded as below:
Quarter
Year I II III IV
2009 5.3 4.1 6.8 6.7
2010 4.8 3.8 5.6 6.8
2011 4.3 3.8 5.7 6.0
2012 5.6 4.6 6.4 5.9
Determine a typical seasonal index for each of the four quarters.
Exercise 2
Victor Mule, the owner of Mule Investments, is studying absenteeism among
his employees. His workforce is small, consisting of only five employees. For the
last three years he recorded the following number of employee absences, in
days, for each quarter.
Quarter
Year I II III IV
2010 4 10 7 3
2011 5 12 9 4
2012 6 16 12 4
Determine a typical seasonal index for each of the four quarters.

Exercise 3

Gaba Village, near Kansanga Trading Center, contains shops, restaurants, and
motels. It has two peak seasons: Vacation and End of Year. The specific
seasonal indexes of total sales volume for recent years are:

Quarter
Year I II III IV
2008 117.0 80.7 129.6 76.1
2009 118.6 82.5 121.4 77.0
2010 114.0 84.3 119.9 75.0
2011 120.7 79.6 130.7 69.6
2012 125.2 80.2 127.6 72.0

a. Develop the typical seasonal pattern for Gaba Village using the
ratio-to-moving-average method.
b. Explain the typical index for the first quarter.
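The table already gives specific seasonal indexes, so the ratio-to-moving-average step has been done. What remains is to combine them into one typical index per quarter. The Python sketch below assumes the typical index is the arithmetic mean of each quarter's specific seasonals, scaled by a correction factor so that the four indexes sum to 400. Some texts first use a modified mean that drops the highest and lowest value in each quarter, so check which convention your course uses. The function name `typical_indexes` is hypothetical.

```python
# Specific seasonal indexes from Exercise 3 (rows: 2008-2012, columns: quarters I-IV)
specific = [
    [117.0, 80.7, 129.6, 76.1],
    [118.6, 82.5, 121.4, 77.0],
    [114.0, 84.3, 119.9, 75.0],
    [120.7, 79.6, 130.7, 69.6],
    [125.2, 80.2, 127.6, 72.0],
]


def typical_indexes(rows):
    """Average each quarter's specific seasonals, then rescale so the indexes sum to 100 * (number of quarters)."""
    n_quarters = len(rows[0])
    means = [sum(row[q] for row in rows) / len(rows) for q in range(n_quarters)]
    correction = 100 * n_quarters / sum(means)  # correction factor, here 400 / (sum of means)
    return [m * correction for m in means]


indexes = typical_indexes(specific)
print([round(x, 2) for x in indexes])
```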

Exercise 4

Using the data in the table below, compute a three-year and a five-year moving
average:

Year Production Year Production


1995 6 2004 9
1996 8 2005 13
1997 10 2006 15
1998 5 2007 18
1999 3 2008 15
2000 7 2009 11
2001 10 2010 14
2002 12 2011 17
2003 11 2012 22
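A k-year moving average replaces each value by the mean of the k values centred on it, so for odd k it is simply a sliding-window mean. The following is a small Python sketch for checking hand calculations in Exercise 4. The helper `moving_average` is a hypothetical name.

```python
def moving_average(values, k):
    """Simple k-period moving average: entry i is the mean of values[i:i + k]."""
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]


years = list(range(1995, 2013))
production = [6, 8, 10, 5, 3, 7, 10, 12, 11, 9, 13, 15, 18, 15, 11, 14, 17, 22]

for k in (3, 5):
    averages = moving_average(production, k)
    # For odd k each average belongs to the middle year of its window.
    centre_years = years[k // 2 : k // 2 + len(averages)]
    print(f"{k}-year moving average:")
    for year, avg in zip(centre_years, averages):
        print(f"  {year}: {avg:.2f}")
```

Each three-year average is placed against the middle year of its window (the first is 1996). Each five-year average is placed against the third year (the first is 1997).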

CHAPTER SIX

FORECASTING

6.0 Goals of the chapter

1. Understand what forecasts are
2. Define forecasting and apply it

6.1 Introduction

Forecasting is the process of predicting future events by extrapolating past and
current time series data according to a statistical model.

Accurate forecasting is crucial in governmental, corporate, and private
planning and decision making. For instance, government economists
must be able to forecast next year's Gross National Product in order to estimate
the expected tax revenues. Corporations need to forecast personal disposable
incomes in order to estimate gross sales and make plant capacity adjustments.
Private individuals want to know their future financial needs in order to set
aside adequate funds for retirement.

6.2 Linear Trend

Using the multiplicative model

Example:

Using the multiplicative model, fit a linear trend to the data below and forecast
the values for the four quarters of 2013:

Year Quarter 𝒀𝒕 Year Quarter 𝒀𝒕


2009 I 6 2011 I 15
II 9 II 18
III 10 III 24
IV 5 IV 9
2010 I 16 2012 I 20
II 20 II 22
III 18 III 17
IV 6 IV 25

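Year Quarter 𝒀𝒕 t t𝒀𝒕 𝒕𝟐


2009 I 6 1 6 1
II 9 2 18 4
III 10 3 30 9
IV 5 4 20 16
2010 I 16 5 80 25
II 20 6 120 36
III 18 7 126 49
IV 6 8 48 64
2011 I 15 9 135 81
II 18 10 180 100
III 24 11 264 121
IV 9 12 108 144
2012 I 20 13 260 169
II 22 14 308 196
III 17 15 255 225
IV 25 16 400 256
Totals 240 136 2,358 1,496
2013 I 22.95 17
II 23.89 18
III 24.82 19
IV 25.76 20
2014 I
II
III
IV

$\sum Y_t = n\alpha_0 + \alpha_1 \sum t \quad\Rightarrow\quad 240 = 16\alpha_0 + 136\alpha_1 \qquad \text{(i)}$

$\sum tY_t = \alpha_0 \sum t + \alpha_1 \sum t^2 \quad\Rightarrow\quad 2358 = 136\alpha_0 + 1496\alpha_1 \qquad \text{(ii)}$

where $\sum t^2 = 1^2 + 2^2 + \cdots + 16^2 = \frac{16(17)(33)}{6} = 1496$.

Solving the two equations simultaneously:

$$\begin{pmatrix} n & \sum t \\ \sum t & \sum t^2 \end{pmatrix}\begin{pmatrix}\alpha_0 \\ \alpha_1\end{pmatrix} = \begin{pmatrix}\sum Y_t \\ \sum tY_t\end{pmatrix}, \qquad \begin{pmatrix} 16 & 136 \\ 136 & 1496 \end{pmatrix}\begin{pmatrix}\alpha_0 \\ \alpha_1\end{pmatrix} = \begin{pmatrix}240 \\ 2358\end{pmatrix}$$

$$\alpha_0 = \frac{\begin{vmatrix} \sum Y_t & \sum t \\ \sum tY_t & \sum t^2 \end{vmatrix}}{\begin{vmatrix} n & \sum t \\ \sum t & \sum t^2 \end{vmatrix}} = \frac{\begin{vmatrix} 240 & 136 \\ 2358 & 1496 \end{vmatrix}}{\begin{vmatrix} 16 & 136 \\ 136 & 1496 \end{vmatrix}} = \frac{(240)(1496) - (136)(2358)}{(16)(1496) - (136)(136)} = \frac{359040 - 320688}{23936 - 18496} = \frac{38352}{5440} = 7.05$$

$$\alpha_1 = \frac{\begin{vmatrix} n & \sum Y_t \\ \sum t & \sum tY_t \end{vmatrix}}{\begin{vmatrix} n & \sum t \\ \sum t & \sum t^2 \end{vmatrix}} = \frac{\begin{vmatrix} 16 & 240 \\ 136 & 2358 \end{vmatrix}}{\begin{vmatrix} 16 & 136 \\ 136 & 1496 \end{vmatrix}} = \frac{(16)(2358) - (136)(240)}{(16)(1496) - (136)(136)} = \frac{37728 - 32640}{5440} = \frac{5088}{5440} = 0.9353$$

Therefore the trend is: $\hat{Y} = T_t = 7.05 + 0.9353t$

Projection for t = 17 (2013, quarter I):

$\hat{Y}_{17} = T_{17} = 7.05 + 0.9353(17) = 7.05 + 15.90 = 22.95$

Similarly, $T_{18} = 23.89$, $T_{19} = 24.82$ and $T_{20} = 25.76$ for quarters II, III and IV of 2013. Under the multiplicative model these are the trend (T) components. Multiplying each by its quarter's seasonal index (expressed as a proportion) gives the seasonal forecast.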
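You can reproduce the normal equations and Cramer's rule for this example in a few lines of Python. This is a minimal sketch: `fit_linear_trend` is a hypothetical helper, and t is assumed to run 1, 2, ..., n from the first quarter of 2009. The sketch recomputes every sum directly from the sixteen observations, including $\sum t^2 = 1^2 + \cdots + 16^2 = 1{,}496$, so it also serves as a check on the worktable totals.

```python
def fit_linear_trend(y):
    """Least-squares trend T_t = a0 + a1*t for t = 1..n, solving the two
    normal equations by Cramer's rule."""
    n = len(y)
    t = range(1, n + 1)
    sum_t = sum(t)
    sum_t2 = sum(ti * ti for ti in t)
    sum_y = sum(y)
    sum_ty = sum(ti * yi for ti, yi in zip(t, y))
    det = n * sum_t2 - sum_t * sum_t            # determinant of the coefficient matrix
    a0 = (sum_y * sum_t2 - sum_t * sum_ty) / det
    a1 = (n * sum_ty - sum_t * sum_y) / det
    return a0, a1


# Quarterly data, 2009 quarter I to 2012 quarter IV (t = 1..16)
y = [6, 9, 10, 5, 16, 20, 18, 6, 15, 18, 24, 9, 20, 22, 17, 25]
a0, a1 = fit_linear_trend(y)
print(f"T_t = {a0:.4f} + {a1:.4f} t")  # T_t = 7.0500 + 0.9353 t
for t in range(17, 21):                 # t = 17..20 are the four quarters of 2013
    print(t, round(a0 + a1 * t, 2))
```

The loop prints the trend values for t = 17 to 20, the four quarters of 2013.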

6.3 Non-Linear Trends

Fit a simple exponential trend to the data below and project the value for $t = 18$:

t 𝒀𝒕 t 𝒀𝒕
1 2.27 8 2.9
2 2.32 9 2.982
3 2.393 10 2.059
4 2.56 11 3.143
5 2.647 12 3.289
6 2.775 13 3.376
7 2.85 14 3.602
Solution

t 𝒀𝒕 ln𝒀𝒕 t ln𝒀𝒕 𝒕𝟐
1 2.27 0.82 0.82 1
2 2.32 0.84 1.68 4
3 2.393 0.87 2.61 9
4 2.56 0.94 3.76 16
5 2.647 0.97 4.85 25
6 2.775 1.02 6.12 36
7 2.85 1.05 7.35 49
8 2.9 1.06 8.48 64


9 2.982 1.09 9.81 81
10 2.059 0.72 7.2 100
11 3.143 1.15 12.65 121
12 3.289 1.19 14.28 144
13 3.376 1.22 15.86 169
14 3.602 1.28 17.92 196
Totals: $\sum t = 105$, $\sum \ln Y_t = 14.22$, $\sum t\ln Y_t = 113.39$, $\sum t^2 = 1015$
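Taking natural logarithms of the trend $T_t = \alpha_0\alpha_1^t$ gives $\ln T_t = \ln\alpha_0 + t\ln\alpha_1$, which is a straight line in t. The normal equations are therefore those of the linear trend with $\ln Y_t$ in place of $Y_t$:

$14.22 = 14\ln\alpha_0 + 105\ln\alpha_1$

$113.39 = 105\ln\alpha_0 + 1015\ln\alpha_1$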

Let ln 𝛼0 = 𝑎 and ln 𝛼1 = 𝑏

14.22 = 14𝑎 + 105𝑏 (i)

113.39 = 105𝑎 + 1015𝑏 (ii)

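Solving the two equations simultaneously:

$b = \frac{(14)(113.39) - (105)(14.22)}{(14)(1015) - (105)^2} = \frac{1587.46 - 1493.10}{14210 - 11025} = \frac{94.36}{3185} = 0.02963, \qquad a = \frac{14.22 - 105b}{14} = 0.7935$

Therefore, $\ln\alpha_0 = 0.7935$ and $\ln\alpha_1 = 0.02963$, so

$\alpha_0 = e^{0.7935} = 2.2111$ and $\alpha_1 = e^{0.02963} = 1.03007$

The exponential trend is: $\hat{Y} = T_t = (2.2111)(1.03007)^t$

The projection for $T_{18}$:

$\hat{Y}_{18} = T_{18} = (2.2111)(1.03007)^{18} = (2.2111)(1.7045) \approx 3.769$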
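The same least-squares machinery fits the exponential trend once $Y_t$ is replaced by $\ln Y_t$. In the Python sketch below, `fit_exponential_trend` is a hypothetical helper. It uses unrounded natural logarithms, whereas the worktable rounds $\ln Y_t$ to two decimals. The fitted constants and the projection can therefore differ slightly from the hand calculation in the third decimal place. A quick check on any hand solution is to substitute a and b back into both equations (i) and (ii).

```python
import math


def fit_exponential_trend(y):
    """Fit T_t = alpha0 * alpha1**t (t = 1..n) by least squares on ln(Y_t),
    since ln T_t = ln(alpha0) + t * ln(alpha1) is linear in t."""
    n = len(y)
    t = range(1, n + 1)
    log_y = [math.log(v) for v in y]
    sum_t = sum(t)
    sum_t2 = sum(ti * ti for ti in t)
    sum_ly = sum(log_y)
    sum_tly = sum(ti * ly for ti, ly in zip(t, log_y))
    det = n * sum_t2 - sum_t * sum_t
    a = (sum_ly * sum_t2 - sum_t * sum_tly) / det  # a = ln(alpha0)
    b = (n * sum_tly - sum_t * sum_ly) / det       # b = ln(alpha1)
    return math.exp(a), math.exp(b)


y = [2.27, 2.32, 2.393, 2.56, 2.647, 2.775, 2.85,
     2.9, 2.982, 2.059, 3.143, 3.289, 3.376, 3.602]
alpha0, alpha1 = fit_exponential_trend(y)
forecast_18 = alpha0 * alpha1 ** 18
print(f"T_t = {alpha0:.4f} * {alpha1:.5f}**t,  T_18 = {forecast_18:.3f}")
```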

6.4 CHAPTER EXERCISES

Exercise 1

Fit a linear trend to the data in the table below, plot both the time series
and the trend values, and project the value for $Y_{18}$.

Year (t) 𝒀𝒕 Year (t) 𝒀𝒕


2000 2248 2007 4283
2001 2966 2008 4795
2002 3453 2009 5033
2003 3766 2010 5021
2004 4031 2011 5267
2005 3749 2012 5511
2006 3778

Exercise 2

Fit a second-degree trend to the following data, plot the time series and the
trend, and project the value for $Y_{17}$.

t 𝒀𝒕 t 𝒀𝒕
1 1.985 8 2.393
2 2.032 9 2.560
3 2.086 10 2.775
4 2.095 11 2.853
5 2.188 12 2.904
6 2.270 13 2.981
7 2.320 14 3.376

Exercise 3

The following table gives the enrollment at the University of Kampala from 1986 to 2012.
a. Develop both a linear and a logarithmic trend equation.
b. Estimate the enrollment for 2016 using both equations.
c. Which trend equation would you recommend? Why?
d. Plot the data and comment on your forecast.

Year Enrollment Year Enrollment Year Enrollment


1986 12,755 1995 20,270 2004 23,928
1987 14,489 1996 21,117 2005 24,781
1988 15,158 1997 21,386 2006 24,969
1989 14,669 1998 21,589 2007 24,541
1990 15,730 1999 21,039 2008 24,188
1991 17,204 2000 21,238 2009 23,107
1992 17,498 2001 21,176 2010 21,991
1993 17,257 2002 21,740 2011 21,343
1994 18,246 2003 22,806 2012 20,040

Exercise 4

Listed below are the total numbers of vehicles sold by Toyota Motor Company
(in thousands) from 2003 to 2012.

Year Units Sold Year Units Sold


2003 3,876 2008 3,693
2004 4,313 2009 4,131
2005 4,131 2010 4,591
2006 3,632 2011 4,279
2007 3,212 2012 4,222

a. Determine the trend equation.
b. Project the number of units to be sold in 2015.