
MODULE 6

6.1 TIME SERIES PATTERNS:


Trend
A trend exists when there is a long-term increase or decrease in the data. It does
not have to be linear. Sometimes we will refer to a trend as “changing direction”,
when it might go from an increasing trend to a decreasing trend.
Seasonal
A seasonal pattern occurs when a time series is affected by seasonal factors such
as the time of the year or the day of the week. Seasonality is always of a fixed
and known frequency. The monthly sales of antidiabetic drugs show seasonality
which is induced partly by the change in the cost of the drugs at the end of the
calendar year.
Cyclic
A cycle occurs when the data exhibit rises and falls that are not of a fixed
frequency. These fluctuations are usually due to economic conditions, and are
often related to the “business cycle”. The duration of these fluctuations is usually
at least 2 years.
Many people confuse cyclic behaviour with seasonal behaviour, but they are
really quite different. If the fluctuations are not of a fixed frequency then they are
cyclic; if the frequency is unchanging and associated with some aspect of the
calendar, then the pattern is seasonal. In general, the average length of cycles is
longer than the length of a seasonal pattern, and the magnitudes of cycles tend to
be more variable than the magnitudes of seasonal patterns.
Many time series include trend, cycles and seasonality. When choosing a
forecasting method, we will first need to identify the time series patterns in the
data, and then choose a method that is able to capture the patterns properly.
The examples in Figure 2.3 show different combinations of the above
components.
Figure 2.3: Four examples of time series showing different patterns.
The monthly housing sales (top left) show strong seasonality within each year, as
well as some strong cyclic behaviour with a period of about 6–10 years. There is
no apparent trend in the data over this period.
The US treasury bill contracts (top right) show results from the Chicago market
for 100 consecutive trading days in 1981. Here there is no seasonality, but an
obvious downward trend. Possibly, if we had a much longer series, we would see
that this downward trend is actually part of a long cycle, but when viewed over
only 100 days it appears to be a trend.
The Australian quarterly electricity production (bottom left) shows a strong
increasing trend, with strong seasonality. There is no evidence of any cyclic
behaviour here.
The daily change in the Google closing stock price (bottom right) has no trend,
seasonality or cyclic behaviour. There are random fluctuations which do not
appear to be very predictable, and no strong patterns that would help with
developing a forecasting model.
6.2 FORECAST ACCURACY:
Training and test sets
It is important to evaluate forecast accuracy using genuine forecasts. Because
residuals are computed from the same data used to fit the model, their size is
not a reliable indication of how large true forecast errors are likely to be. The
accuracy of forecasts can only be determined by considering how well a model
performs on new data that were not used when fitting the model.
When choosing models, it is common practice to separate the available data into
two portions, training and test data, where the training data is used to estimate
any parameters of a forecasting method and the test data is used to evaluate its
accuracy. Because the test data is not used in determining the forecasts, it should
provide a reliable indication of how well the model is likely to forecast on new
data.

The size of the test set is typically about 20% of the total sample, although this
value depends on how long the sample is and how far ahead you want to forecast.
The test set should ideally be at least as large as the maximum forecast horizon
required. The following points should be noted.
• A model which fits the training data well will not necessarily
forecast well.
• A perfect fit can always be obtained by using a model with enough
parameters.
• Over-fitting a model to data is just as bad as failing to identify a
systematic pattern in the data.
Some references describe the test set as the “hold-out set” because these data are
“held out” of the data used for fitting. Other references call the training set the
“in-sample data” and the test set the “out-of-sample data”. We prefer to use
“training data” and “test data” in this book.
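
As a minimal illustration, the split described above might look as follows in
Python; the series and the 20% proportion are illustrative assumptions, not data
from the text.

    # A minimal sketch of a training/test split for time series data.
    # The series and the 20% split are illustrative assumptions.
    series = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
              115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140]

    test_size = max(1, int(0.2 * len(series)))   # roughly 20% of the sample
    train, test = series[:-test_size], series[-test_size:]

    # The model is fitted on `train` only; `test` is kept aside to measure
    # genuine forecast accuracy over the required horizon.
    print(len(train), len(test))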
Forecast errors
A forecast “error” is the difference between an observed value and its forecast.
Here “error” does not mean a mistake; it means the unpredictable part of an
observation.
Note that forecast errors are different from residuals in two ways. First, residuals
are calculated on the training set while forecast errors are calculated on
the test set. Second, residuals are based on one-step forecasts while forecast
errors can involve multi-step forecasts.
We can measure forecast accuracy by summarising the forecast errors in different
ways.
1) Scale-dependent errors
The forecast errors are on the same scale as the data. Accuracy measures that are
based only on e_t are therefore scale-dependent and cannot be used to make
comparisons between series that involve different units.
The two most commonly used scale-dependent measures are based on the
absolute errors or squared errors:
Mean absolute error: MAE = mean(|e_t|)
Root mean squared error: RMSE = sqrt(mean(e_t^2))
When comparing forecast methods applied to a single time series, or to several
time series with the same units, the MAE is popular as it is easy to both
understand and compute. A forecast method that minimises the MAE will lead to
forecasts of the median, while minimising the RMSE will lead to forecasts of the
mean. Consequently, the RMSE is also widely used, despite being more difficult
to interpret.
2) Percentage errors
The percentage error is given by p_t = 100 e_t / y_t. Percentage errors have the
advantage of being unit-free, and so are frequently used to compare forecast
performance between data sets. The most commonly used measure is:
Mean absolute percentage error: MAPE = mean(|p_t|)
Measures based on percentage errors have the disadvantage of being infinite or
undefined when y_t = 0 for any t in the period of interest, and of taking extreme
values when any y_t is close to zero. Another problem with percentage errors that
is often overlooked is that they assume the unit of measurement has a meaningful
zero. For example, a percentage error makes no sense when measuring the accuracy
of temperature forecasts on either the Fahrenheit or Celsius scales, because
temperature has an arbitrary zero point.
They also have the disadvantage that they put a heavier penalty on negative errors
than on positive errors.
3) Scaled errors
Scaled errors were proposed by Hyndman & Koehler (2006) as an alternative to
using percentage errors when comparing forecast accuracy across series with
different units. They proposed scaling the errors based on the training MAE from
a simple forecast method (the naïve method for non-seasonal series, or the
seasonal naïve method for seasonal series); the resulting measure is the mean
absolute scaled error (MASE).
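
These measures are straightforward to compute. Below is a minimal sketch in plain
Python; the observations, forecasts, and the use of the naïve method for scaling
are illustrative assumptions, not data from the text.

    # Sketch of the accuracy measures described above (illustrative data).
    actual   = [130, 140, 150, 145]            # test-set observations y_t
    forecast = [128, 138, 155, 150]            # forecasts for the same periods
    train    = [100, 120, 110, 130, 140, 150]  # training data, used only for MASE

    errors = [y - f for y, f in zip(actual, forecast)]       # e_t = y_t - f_t

    mae  = sum(abs(e) for e in errors) / len(errors)         # MAE = mean(|e_t|)
    rmse = (sum(e**2 for e in errors) / len(errors)) ** 0.5  # RMSE = sqrt(mean(e_t^2))
    mape = 100 * sum(abs(e / y) for e, y in zip(errors, actual)) / len(errors)  # MAPE

    # MASE: scale the test MAE by the training MAE of one-step naive forecasts.
    naive_mae = sum(abs(train[i] - train[i - 1]) for i in range(1, len(train))) / (len(train) - 1)
    mase = mae / naive_mae

    print(round(mae, 2), round(rmse, 2), round(mape, 2), round(mase, 2))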

6.3 MOVING AVERAGES AND EXPONENTIAL SMOOTHING:


Moving averages and exponential smoothing are two commonly used techniques
for forecasting time series data. Here's an explanation of each technique along
with examples:

1. Moving Averages:
Moving averages involve calculating the average of a specific number of past
observations to forecast future values. It is a simple and intuitive method that
smooths out fluctuations in the data.

There are different types of moving averages, such as:


- Simple Moving Average (SMA): In this method, the forecasted value is the
average of the past 'n' observations. For example, if you want to forecast the sales
for the next month using a 3-month simple moving average, you would take the
average of the sales for the past three months.

- Weighted Moving Average (WMA): WMA assigns a different weight to each
observation based on its importance. The weighted average is then calculated by
multiplying each observation by its corresponding weight and summing the results.
This method gives more weight to recent observations. For example, you may assign
weights of 0.4, 0.3, and 0.3 to the last three months' sales and calculate the
weighted average accordingly.

Example:
Let's say you want to forecast monthly sales for a product using a 3-month simple
moving average. The sales data for the past six months are as follows:

Month: Sales
----------------
January: 100
February: 120
March: 110
April: 130
May: 140
June: 150

To forecast the sales for July, you would take the average of the sales for April,
May, and June, which is (130 + 140 + 150) / 3 = 140.
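
The same calculation can be written as a short Python sketch. The weighted
variant uses the illustrative 0.4/0.3/0.3 weights mentioned earlier, with the
largest weight placed on the most recent month (an assumption, since the text
does not fix the ordering).

    # Moving-average forecasts for July using the monthly sales above.
    sales = {"Jan": 100, "Feb": 120, "Mar": 110, "Apr": 130, "May": 140, "Jun": 150}
    last_three = [sales["Apr"], sales["May"], sales["Jun"]]

    # Simple moving average: plain mean of the last 3 observations.
    sma_forecast = sum(last_three) / 3          # (130 + 140 + 150) / 3 = 140.0

    # Weighted moving average: heavier weight on the most recent month.
    weights = [0.3, 0.3, 0.4]                   # Apr, May, Jun
    wma_forecast = sum(w * x for w, x in zip(weights, last_three))  # 0.3*130 + 0.3*140 + 0.4*150 = 141.0

    print(sma_forecast, wma_forecast)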

2. Exponential Smoothing:
Exponential smoothing is a technique that assigns exponentially decreasing
weights to past observations, with more recent observations having higher
weights. It is a popular method for forecasting time series data; in its simple
form it smooths the level of the series, and extensions such as Holt's and
Holt-Winters' methods also capture trend and seasonality.

The forecasted value is calculated using the following formula:


Forecast = α * (Latest Observation) + (1 - α) * (Previous Forecast)
Here, α (alpha) is the smoothing factor that determines the weight given to the
latest observation.

Example:
Let's consider the same sales data as in the previous example. Suppose we use
exponential smoothing with α = 0.3 to forecast the sales for July. The forecast
calculation would be as follows:

Latest Observation (June sales) = 150


Previous Forecast (June forecast) = 140

Forecast = 0.3 * 150 + 0.7 * 140 = 45 + 98 = 143

Therefore, the forecasted sales for July using exponential smoothing would be
143.
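
A minimal Python sketch of this update, using the same α = 0.3 and the previous
forecast of 140 from the example; the reusable function is an illustrative
addition, not part of the text.

    # Simple exponential smoothing:
    # forecast = alpha * latest observation + (1 - alpha) * previous forecast.
    alpha = 0.3
    previous_forecast = 140    # forecast that had been made for June (as in the example)
    latest_observation = 150   # actual June sales

    july_forecast = alpha * latest_observation + (1 - alpha) * previous_forecast
    print(july_forecast)       # 0.3*150 + 0.7*140 = 143.0

    # The same update rule can be applied repeatedly as new observations arrive:
    def ses_forecast(observations, alpha, initial_forecast):
        forecast = initial_forecast
        for y in observations:
            forecast = alpha * y + (1 - alpha) * forecast
        return forecast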

6.4 TREND PROJECTION:


Trend projection, also known as linear regression or trend analysis, is a
forecasting method that assumes a linear relationship between the forecasted
variable and time. It involves fitting a straight line to historical data points and
extending the line into the future to make predictions.
The trend projection method assumes that the underlying trend in the data will
continue in a straight-line fashion. It is a simple and commonly used technique
for forecasting time series data when there is a clear and consistent trend observed
over time.
The process of trend projection involves the following steps:
1. Data collection: Gather historical data on the variable of interest over a specific
period.
2. Plot the data: Create a scatter plot with time on the x-axis and the variable's
values on the y-axis. This helps visualize the trend and identify any patterns.

3. Fit the trend line: Use a regression analysis technique to fit a straight line to the
data points. The regression line represents the best linear fit to the historical data.

4. Determine the equation of the trend line: The equation of the trend line takes
the form of a linear equation: Y = a + bx, where Y is the forecasted variable, x is
time, a is the y-intercept (the value of Y when x is 0), and b is the slope of the line
(the rate of change of Y with respect to time).

5. Forecast future values: Extend the trend line into the future by adding new time
points and using the equation of the line to calculate the corresponding forecasted
values.
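
A minimal sketch of steps 3–5 in Python, assuming numpy is available; the six
data points are illustrative.

    import numpy as np

    # Illustrative historical data: periods 1..6 and the corresponding values.
    x = np.arange(1, 7)                       # time index
    y = np.array([100, 120, 110, 130, 140, 150])

    # Steps 3-4: fit the trend line Y = a + b*x by least squares.
    b, a = np.polyfit(x, y, deg=1)            # slope b, intercept a
    print(f"Y = {a:.2f} + {b:.2f} * x")

    # Step 5: extend the line to forecast future periods.
    future_x = np.arange(7, 10)               # next three periods
    forecasts = a + b * future_x
    print(forecasts)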

It's important to note that trend projection assumes a constant rate of change and
may not capture complex patterns or changes in the underlying data. Therefore,
it is more suitable for short- to medium-term forecasts when the trend is relatively
stable.

Additionally, it's always a good practice to evaluate the accuracy of the trend
projection model using appropriate metrics and consider other factors that may
influence the variable being forecasted, such as seasonality, cyclical patterns, or
external factors.

6.5 SEASONALITY, TREND, AND TIME SERIES DECOMPOSITION:
Seasonality and trend are two important components of time series data.
Seasonality refers to the regular and recurring patterns that occur at fixed intervals
within a time series, while trend represents the long-term upward or downward
movement in the data.
Time series decomposition is a method used to separate a time series into its
underlying components, including trend, seasonality, and other residual or
irregular components. The decomposition process helps in understanding and
modeling the individual components of the time series, which can then be used
for forecasting or analysis purposes.
There are two commonly used approaches for time series decomposition:
1. Additive Decomposition:
In the additive decomposition method, the time series is expressed as the sum of
its individual components:
Time Series = Trend + Seasonality + Residual
The additive decomposition assumes that the fluctuations around the trend and
seasonality components are constant over time.
2. Multiplicative Decomposition:
In the multiplicative decomposition method, the time series is expressed as the
product of its components:
Time Series = Trend * Seasonality * Residual
The multiplicative decomposition assumes that the fluctuations around the trend
and seasonality components are proportional to the level of the time series.
The decomposition process involves the following steps:

1. Identifying the period: Determine the length of the seasonal pattern in the data.
For example, if the data shows a repeating pattern every 12 months, the period is
12.
2. Decompose the time series: Apply the chosen decomposition method (additive
or multiplicative) to break down the time series into its individual components,
i.e., trend, seasonality, and residual.
3. Analyzing the components: Examine the trend component to understand the
long-term movement in the data and identify any patterns or changes. Assess the
seasonality component to identify recurring patterns and variations. Analyze the
residual component, which represents the random or unpredictable fluctuations
in the data.
4. Forecasting: Once the components have been identified and analyzed, they can
be used for forecasting future values of the time series. For example, the trend
and seasonality components can be extrapolated into the future, and the residuals
can be modeled using statistical techniques.
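
A minimal sketch of steps 1–3 in Python, assuming the statsmodels library
(version 0.11 or later) is available; the constructed monthly series and the
period of 12 are illustrative.

    import pandas as pd
    from statsmodels.tsa.seasonal import seasonal_decompose

    # Illustrative monthly series with an upward trend and yearly seasonality.
    index = pd.date_range("2018-01", periods=48, freq="MS")
    values = [100 + 2 * t + 10 * ((t % 12) in (5, 6, 7)) for t in range(48)]
    series = pd.Series(values, index=index)

    # Additive decomposition with a period of 12 (monthly data).
    result = seasonal_decompose(series, model="additive", period=12)

    print(result.trend.dropna().head())     # long-term movement
    print(result.seasonal.head(12))         # repeating yearly pattern
    print(result.resid.dropna().head())     # irregular component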
Time series decomposition is a useful tool for understanding and modeling
complex time series data, especially when there are clear seasonal and trend
patterns. It helps in capturing and incorporating these components into
forecasting models, leading to more accurate and reliable predictions.
6.6 NONPARAMETRIC METHODS:
Nonparametric methods, also known as distribution-free methods, are statistical
techniques that do not make explicit assumptions about the underlying probability
distribution or functional form of the data. Unlike parametric methods, which
require assumptions about the data distribution, nonparametric methods offer
more flexibility and can be applied to a wider range of data types and situations.
Here are some key characteristics and examples of nonparametric methods:
1. No Assumptions about Data Distribution:
Nonparametric methods do not assume a specific probability distribution for the
data. Instead, they rely on fewer assumptions or make weaker assumptions about
the data structure, which makes them more robust and applicable in various
scenarios.
2. Data-Driven Approaches:
Nonparametric methods often employ data-driven approaches to estimate
statistical parameters. They rely on the observed data itself to make inferences or
perform analyses, rather than assuming a specific mathematical model.

3. Rank-Based Techniques:
Many nonparametric methods utilize ranks or orders of the data instead of the
actual data values. Ranks provide a way to represent the relative positions or
orders of the observations, and these methods can be especially useful when
dealing with skewed or heavy-tailed distributions.

4. Examples of Nonparametric Methods:


- Mann-Whitney U Test: A nonparametric alternative to the independent samples
t-test, used to compare two independent groups or populations.
- Kruskal-Wallis Test: A nonparametric alternative to the one-way ANOVA, used
to compare three or more independent groups.
- Wilcoxon Signed-Rank Test: A nonparametric alternative to the paired samples
t-test, used to compare paired observations.
5. Advantages and Disadvantages:
Advantages of nonparametric methods include their flexibility, robustness to
violations of assumptions, and applicability to a wide range of data types. They
can be used in situations where parametric assumptions may not hold or when
there is limited data. However, nonparametric methods may have lower statistical
power than parametric methods, especially when the assumptions of those
parametric methods are in fact met.

Nonparametric methods are particularly useful when dealing with small sample
sizes, data with outliers or non-normal distributions, or variables with unknown
distributions. They provide an alternative approach to statistical inference and
analysis that does not require strong distributional assumptions.

6.7 SIGN TEST:


The sign test is a nonparametric statistical test used to determine whether there is
a significant difference between paired observations or to compare the medians
of two related samples. It is a distribution-free test that does not rely on specific
assumptions about the underlying distribution of the data.
Here's an overview of how the sign test works:
1. Formulate the hypothesis:
The sign test is typically used to test the null hypothesis that there is no difference
between two paired or related samples. The alternative hypothesis may state that
there is a difference, but it does not specify the direction of the difference.

2. Calculate the test statistic:


For each pair of observations, assign a "+" sign if the second observation is
greater than the first, a "-" sign if the second observation is smaller, and a "0"
if the two observations are equal. Discard the tied ("0") pairs and count the
number of "+" and "-" signs among the remaining pairs; under the null hypothesis
the number of "+" signs follows a binomial distribution with p = 0.5.

3. Determine the critical region:


Based on the number of pairs and the significance level chosen, determine the
critical region or critical value for the test. The critical region represents the range
of values that, if the test statistic falls within, would lead to rejecting the null
hypothesis.

4. Perform the test:


Compare the number of "+" and "-" signs to the critical region. If the test statistic
falls within the critical region, the null hypothesis is rejected, indicating a
significant difference between the paired observations. If the test statistic falls
outside the critical region, the null hypothesis is not rejected, indicating no
significant difference.

5. Calculate the p-value:


Optionally, you can calculate the exact p-value associated with the observed test
statistic. The p-value represents the probability of obtaining a test statistic as
extreme as, or more extreme than, the observed value, assuming the null
hypothesis is true. If the p-value is below the significance level, the null
hypothesis is rejected.
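
A minimal Python sketch of the test, using the binomial distribution under the
null hypothesis; scipy (version 1.7 or later, for binomtest) is assumed and the
paired data are illustrative.

    from scipy.stats import binomtest

    # Illustrative paired observations (e.g., before/after measurements).
    before = [72, 80, 65, 90, 75, 84, 70, 66]
    after  = [75, 78, 70, 95, 75, 88, 74, 69]

    # Step 2: assign signs and discard ties.
    signs = [(b2 > b1) for b1, b2 in zip(before, after) if b2 != b1]
    n_plus = sum(signs)            # number of "+" signs
    n = len(signs)                 # non-tied pairs

    # Under H0 the "+" count follows a Binomial(n, 0.5) distribution.
    result = binomtest(n_plus, n, p=0.5, alternative="two-sided")
    print(n_plus, n, result.pvalue)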
The sign test is commonly used in situations where the data are not normally
distributed, the differences between paired observations are not assumed to
follow a specific distribution, or when the data are measured on an ordinal scale.

It is important to note that the sign test has less statistical power than parametric
tests, such as the paired t-test, especially when the differences are approximately
normally distributed. Additionally, the sign test focuses on the medians or central tendency
of the data and does not provide information about the magnitude of the
difference between the paired observations.

6.8 WILCOXON SIGNED-RANK TEST:


The Wilcoxon signed-rank test is a nonparametric statistical test used to
determine whether there is a significant difference between paired observations
or to compare the medians of two related samples. It is often used when the data
do not meet the assumptions of normality required for parametric tests, such as
the paired t-test.
Here's an overview of how the Wilcoxon signed-rank test works:
1. Formulate the hypothesis:
The Wilcoxon signed-rank test is typically used to test the null hypothesis that
there is no difference between paired or related samples. The alternative
hypothesis may state that there is a difference, but it does not specify the direction
of the difference.

2. Calculate the test statistic:


For each pair of observations, calculate the difference between the paired values.
Discard any pairs with a difference of zero. Rank the absolute values of the
differences, assigning ranks from 1 to N, where N is the number of non-zero
differences. The positive ranks are assigned to positive differences, and the
negative ranks are assigned to negative differences. Calculate the sum of the
positive ranks (W+) and the sum of the negative ranks (W-).
3. Determine the critical region:
Based on the number of pairs and the significance level chosen, determine the
critical region or critical value for the test. The critical region represents the range
of values that, if the test statistic falls within, would lead to rejecting the null
hypothesis.

4. Perform the test:


Compare the test statistic (usually the smaller of W+ and W-) to the critical
region. If the test statistic falls within the critical region, the null hypothesis
is rejected, indicating a significant difference between the paired observations.
If the test statistic falls outside the critical region, the null hypothesis is not
rejected, indicating no significant difference.

5. Calculate the p-value:


Optionally, you can calculate the exact p-value associated with the observed test
statistic. The p-value represents the probability of obtaining a test statistic as
extreme as, or more extreme than, the observed value, assuming the null
hypothesis is true. If the p-value is below the significance level, the null
hypothesis is rejected.
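
A minimal Python sketch using scipy's implementation of this test (assumed
available); the paired data are illustrative.

    from scipy.stats import wilcoxon

    # Illustrative paired observations.
    before = [72, 80, 65, 90, 75, 84, 70, 66]
    after  = [75, 78, 70, 95, 76, 88, 74, 69]

    # scipy ranks the non-zero differences and, for the default two-sided test,
    # returns the smaller of W+ and W- as the statistic together with a p-value.
    statistic, p_value = wilcoxon(before, after)
    print(statistic, p_value)

    if p_value < 0.05:
        print("Reject H0: the paired samples differ significantly.")
    else:
        print("Fail to reject H0.")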

The Wilcoxon signed-rank test is commonly used in situations where the data are
not normally distributed, the differences between paired observations are not
assumed to follow a specific distribution, or when the data are measured on an
ordinal scale.

It is important to note that the Wilcoxon signed-rank test is somewhat less
sensitive at detecting differences than parametric tests such as the paired t-test
when the data are approximately normally distributed. However, it provides a
robust alternative when the assumptions of parametric tests are violated.
6.9 MANN-WHITNEY-WILCOXON TEST:
The Mann-Whitney-Wilcoxon test, also known as the Mann-Whitney U test or
Wilcoxon rank-sum test, is a nonparametric statistical test used to compare two
independent groups or samples. It is commonly used when the data do not meet
the assumptions of normality required for parametric tests, such as the
independent t-test.

Here's an overview of how the Mann-Whitney-Wilcoxon test works:

1. Formulate the hypothesis:


The Mann-Whitney-Wilcoxon test is used to test the null hypothesis that there is
no difference between two independent groups or samples. The alternative
hypothesis may state that there is a difference, but it does not specify the direction
of the difference.

2. Combine the data:


Combine the data from both groups or samples into a single dataset. Assign a rank
to each observation, regardless of the group it belongs to. The ranks are assigned
based on the combined order of the observations, from the smallest to the largest.

3. Calculate the test statistic:


Calculate the sum of ranks (R1) for one of the groups (typically the smaller
group) and convert it into the U statistic: U = R1 - N1(N1 + 1)/2, where N1 is
that group's sample size. Dividing U by N1 * N2 (where N2 is the other group's
sample size) estimates the probability that a randomly chosen observation from
that group exceeds a randomly chosen observation from the other group.
Conventionally, the test statistic is the smaller of U and its complement
(N1 * N2 - U).
4. Determine the critical region:
Based on the sample sizes and the significance level chosen, determine the critical
region or critical value for the test. The critical region represents the range of
values that, if the test statistic falls within, would lead to rejecting the null
hypothesis.

5. Perform the test:


Compare the test statistic (U or its minimum with its complement) to the critical
region. If the test statistic falls within the critical region, the null hypothesis is
rejected, indicating a significant difference between the two groups. If the test
statistic falls outside the critical region, the null hypothesis is not rejected,
indicating no significant difference.

6. Calculate the p-value:


Optionally, you can calculate the exact p-value associated with the observed test
statistic. The p-value represents the probability of obtaining a test statistic as
extreme as, or more extreme than, the observed value, assuming the null
hypothesis is true. If the p-value is below the significance level, the null
hypothesis is rejected.
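
A minimal Python sketch using scipy's implementation (assumed available); the
two samples are illustrative.

    from scipy.stats import mannwhitneyu

    # Illustrative independent samples (e.g., scores from two groups).
    group_a = [12, 15, 14, 10, 18, 20, 11]
    group_b = [22, 25, 19, 24, 28, 21]

    # Two-sided test; scipy returns the U statistic for the first sample
    # and the corresponding p-value.
    u_statistic, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
    print(u_statistic, p_value)

    # The complementary statistic and the conventional (smaller) test statistic.
    u_complement = len(group_a) * len(group_b) - u_statistic
    print(min(u_statistic, u_complement))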

The Mann-Whitney-Wilcoxon test is commonly used when comparing two


independent groups or samples, especially when the data are not normally
distributed or measured on an ordinal scale. It is a robust alternative to parametric
tests, such as the independent t-test, when the assumptions of those tests are
violated.

It's important to note that the Mann-Whitney-Wilcoxon test does not provide
information about the magnitude of the difference between the groups. It only
tests whether there is a significant difference or not.

6.10 KRUSKAL-WALLIS TEST:


The Kruskal-Wallis test is a non-parametric statistical test used to determine
whether there are statistically significant differences between the medians of two
or more groups. It is an extension of the Mann-Whitney U test, which is used to
compare two independent groups.

The Kruskal-Wallis test is used when the data do not meet the assumptions
required for parametric tests, such as the analysis of variance (ANOVA).
Specifically, it does not assume that the data are normally distributed or have
equal variances across groups.

Here are the steps involved in conducting the Kruskal-Wallis test:

1. Formulate hypotheses:
- Null hypothesis (H0): The medians of all groups are equal.
- Alternative hypothesis (Ha): At least one group has a different median.

2. Rank the data:


- Combine the data from all groups into a single ranked list.
- Assign ranks to the data values, ignoring group membership.
- If there are tied values, assign the average rank to each tied value.

3. Calculate the test statistic:


- Calculate the sum of ranks for each group.
- Calculate the overall sum of ranks across all groups.
- Calculate the test statistic, which approximately follows a chi-square distribution:
H = (12 / (n(n + 1))) * Σ(Ri^2 / ni) - 3(n + 1),
where Ri is the sum of ranks in group i, ni is the number of observations in
group i, and n is the total number of observations.

4. Determine the critical value:


- Based on the significance level (α) chosen for the test (e.g., 0.05), determine
the critical value from the chi-square distribution with (k - 1) degrees of freedom,
where k is the number of groups.

5. Compare the test statistic with the critical value:


- If the test statistic is greater than the critical value, reject the null hypothesis
and conclude that there are significant differences between the medians of the
groups.
- If the test statistic is not greater than the critical value, fail to reject the null
hypothesis and conclude that there is insufficient evidence to suggest differences
between the medians.

6. Report the results:


- If the null hypothesis is rejected, you can perform post hoc tests (e.g., Dunn's
test) to determine which specific groups differ significantly from each other.
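
A minimal Python sketch that computes H from the formula above and then checks
it against scipy's implementation (assumed available); the three groups are
illustrative.

    from scipy.stats import kruskal, rankdata, chi2

    # Illustrative data from three independent groups.
    groups = [[68, 72, 77, 80], [84, 88, 91, 79], [95, 90, 99, 93]]

    # H statistic from the rank-based formula above.
    pooled = [x for g in groups for x in g]
    ranks = rankdata(pooled)                  # average ranks for any ties
    n = len(pooled)
    offsets = [0, len(groups[0]), len(groups[0]) + len(groups[1])]
    rank_sums = [sum(ranks[o:o + len(g)]) for o, g in zip(offsets, groups)]
    H = 12 / (n * (n + 1)) * sum(r**2 / len(g) for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    p_manual = chi2.sf(H, df=len(groups) - 1)  # chi-square with k - 1 degrees of freedom

    # Same test via scipy (which also applies a correction for ties).
    H_scipy, p_scipy = kruskal(*groups)
    print(round(H, 3), round(p_manual, 4), round(H_scipy, 3), round(p_scipy, 4))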

It's worth noting that the Kruskal-Wallis test is an omnibus test, meaning it only
tells you whether there are overall differences among the groups. If the test is
significant, further analyses are needed to determine the specific group
differences.

6.11 RANK CORRELATION:


Rank correlation is a statistical measure that quantifies the relationship between
the ranks of two variables or sets of data. It is used when the variables are ordinal
or when the assumptions required for parametric correlation measures, such as
Pearson's correlation coefficient, are not met.

The most commonly used rank correlation measures are:

1. Spearman's Rank Correlation Coefficient (ρ):


Spearman's rank correlation coefficient assesses the monotonic relationship
between two variables. It is calculated by assigning ranks to the observations of
each variable, calculating the difference in ranks for each pair of observations,
and then computing the correlation coefficient. Spearman's ρ ranges from -1 to 1,
where -1 indicates a perfect negative monotonic relationship, 1 indicates a perfect
positive monotonic relationship, and 0 indicates no monotonic relationship.

2. Kendall's Rank Correlation Coefficient (τ):


Kendall's rank correlation coefficient also measures the strength of the
monotonic relationship between two variables. It is calculated by counting the
number of concordant and discordant pairs of observations, and then computing
the correlation coefficient. Kendall's τ ranges from -1 to 1, with the same
interpretation as Spearman's ρ.
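
A minimal Python sketch computing both coefficients with scipy (assumed
available); the paired data are illustrative.

    from scipy.stats import spearmanr, kendalltau

    # Illustrative paired observations (e.g., two judges' scores).
    x = [86, 97, 99, 100, 101, 103, 106, 110, 112, 113]
    y = [2, 20, 28, 27, 50, 29, 7, 17, 6, 12]

    rho, p_rho = spearmanr(x, y)      # Spearman's rank correlation coefficient
    tau, p_tau = kendalltau(x, y)     # Kendall's rank correlation coefficient

    print(round(rho, 3), round(p_rho, 3))
    print(round(tau, 3), round(p_tau, 3))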
