The size of the test set is typically about 20% of the total sample, although this
value depends on how long the sample is and how far ahead you want to forecast.
The test set should ideally be at least as large as the maximum forecast horizon
required. The following points should be noted.
• A model which fits the training data well will not necessarily
forecast well.
• A perfect fit can always be obtained by using a model with enough
parameters.
• Over-fitting a model to data is just as bad as failing to identify a
systematic pattern in the data.
Some references describe the test set as the “hold-out set” because these data are
“held out” of the data used for fitting. Other references call the training set the
“in-sample data” and the test set the “out-of-sample data”. We prefer to use
“training data” and “test data” in this book.
Forecast errors
A forecast “error” is the difference between an observed value and its forecast.
Here “error” does not mean a mistake; it means the unpredictable part of an
observation.
Note that forecast errors are different from residuals in two ways. First, residuals
are calculated on the training set while forecast errors are calculated on
the test set. Second, residuals are based on one-step forecasts while forecast
errors can involve multi-step forecasts.
We can measure forecast accuracy by summarising the forecast errors in different
ways.
1) Scale-dependent errors
Forecast errors are on the same scale as the data. Accuracy measures that are
based only on e_t are therefore scale-dependent and cannot be used to make
comparisons between series that involve different units.
The two most commonly used scale-dependent measures are based on the
absolute errors or squared errors:
Mean absolute error: MAE = mean(|e_t|)
Root mean squared error: RMSE = √(mean(e_t²))
When comparing forecast methods applied to a single time series, or to several
time series with the same units, the MAE is popular as it is easy to both
understand and compute. A forecast method that minimises the MAE will lead to
forecasts of the median, while minimising the RMSE will lead to forecasts of the
mean. Consequently, the RMSE is also widely used, despite being more difficult
to interpret.
2) Percentage errors
The percentage error is given by p_t = 100 e_t / y_t. Percentage errors have the
advantage of being unit-free, and so are frequently used to compare forecast
performance between data sets. The most commonly used measure is:
Mean absolute percentage error: MAPE = mean(|p_t|)
Measures based on percentage errors have the disadvantage of being infinite or
undefined if y_t = 0 for any t in the period of interest, and of taking extreme
values if any y_t is close to zero. Another problem with percentage errors that is
often overlooked is that they assume the unit of measurement has a meaningful
zero. For example, a
percentage error makes no sense when measuring the accuracy of temperature
forecasts on either the Fahrenheit or Celsius scales, because temperature has an
arbitrary zero point.
They also have the disadvantage that they put a heavier penalty on negative errors
than on positive errors.
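These accuracy measures can be sketched in a few lines of Python; the observed and forecast values below are hypothetical, and the function names are illustrative:

```python
from math import sqrt

def mae(actual, forecast):
    """Mean absolute error: mean(|e_t|), where e_t = actual - forecast."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error: sqrt(mean(e_t^2))."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage error: mean(|100 * e_t / y_t|).
    Undefined if any actual value y_t is zero."""
    return sum(abs(100 * (a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

actual = [100, 120, 110, 130]    # hypothetical observed values
forecast = [98, 125, 105, 130]   # hypothetical forecasts
print(mae(actual, forecast))     # 3.0
```

Note that MAE and RMSE share the units of the data, while MAPE is unit-free, matching the distinction drawn above.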
3) Scaled errors
Scaled errors were proposed by Hyndman & Koehler (2006) as an alternative to
using percentage errors when comparing forecast accuracy across series with
different units. They proposed scaling the errors based on the training MAE from
a simple forecast method, such as the naïve method. The scaled error is
q_t = e_t / MAE_training, and the mean absolute scaled error is MASE = mean(|q_t|);
a MASE below one indicates that the forecasts are better, on average, than
one-step naïve forecasts computed on the training data.
1. Moving Averages:
A moving average forecasts future values by averaging a fixed number of past
observations. It is a simple and intuitive method that smooths out fluctuations
in the data.
Example:
Let's say you want to forecast monthly sales for a product using a 3-month simple
moving average. The sales data for the past six months are as follows:
Month: Sales
----------------
January: 100
February: 120
March: 110
April: 130
May: 140
June: 150
To forecast the sales for July, you would take the average of the sales for April,
May, and June, which is (130 + 140 + 150) / 3 = 140.
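The calculation above can be written as a short function (a minimal sketch; the function and variable names are illustrative):

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("need at least `window` observations")
    return sum(series[-window:]) / window

sales = [100, 120, 110, 130, 140, 150]  # January .. June
print(moving_average_forecast(sales, window=3))  # (130 + 140 + 150) / 3 = 140.0
```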
2. Exponential Smoothing:
Exponential smoothing is a technique that assigns exponentially decreasing
weights to past observations, with more recent observations receiving higher
weights. It is a popular method for forecasting time series data; the simple form
suits series without a clear trend or seasonal pattern, while extensions such as
Holt's linear method and Holt-Winters also capture trend and seasonality.
Example:
Let's consider the same sales data as in the previous example, and apply simple
exponential smoothing with α = 0.3. Each forecast is a weighted average of the
latest observation and the previous forecast: F_{t+1} = αy_t + (1 − α)F_t.
Initialising the first forecast at the first observation (100), the successive
forecasts are 100, 106, 107.2, 114.04 and 121.83, so the forecast for July is
0.3(150) + 0.7(121.83) ≈ 130.3. Note that the result depends on how the initial
forecast is chosen.
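A minimal sketch of simple exponential smoothing, assuming the first forecast is initialised to the first observation (other initialisations give different results):

```python
def ses_forecast(series, alpha, initial=None):
    """Simple exponential smoothing: F_{t+1} = alpha * y_t + (1 - alpha) * F_t.
    The first forecast is initialised to `initial` (default: first observation)."""
    forecast = series[0] if initial is None else initial
    for y in series:
        forecast = alpha * y + (1 - alpha) * forecast
    return forecast

sales = [100, 120, 110, 130, 140, 150]  # January .. June
print(round(ses_forecast(sales, alpha=0.3), 1))  # 130.3
```

A larger α makes the forecast track recent observations more closely; a smaller α produces a smoother, more slowly adapting forecast.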
3. Fit the trend line: Use a regression analysis technique to fit a straight line to the
data points. The regression line represents the best linear fit to the historical data.
4. Determine the equation of the trend line: The equation of the trend line takes
the form of a linear equation: Y = a + bx, where Y is the forecasted variable, x is
time, a is the y-intercept (the value of Y when x is 0), and b is the slope of the line
(the rate of change of Y with respect to time).
5. Forecast future values: Extend the trend line into the future by adding new time
points and using the equation of the line to calculate the corresponding forecasted
values.
It's important to note that trend projection assumes a constant rate of change and
may not capture complex patterns or changes in the underlying data. Therefore,
it is more suitable for short- to medium-term forecasts when the trend is relatively
stable.
Additionally, it's always a good practice to evaluate the accuracy of the trend
projection model using appropriate metrics and consider other factors that may
influence the variable being forecasted, such as seasonality, cyclical patterns, or
external factors.
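The trend-projection steps above can be sketched with an ordinary least-squares fit of Y = a + bx; the sales figures reuse the hypothetical data from the moving-average example, with time coded as x = 0, 1, 2, ...:

```python
def fit_trend_line(y):
    """Ordinary least-squares fit of Y = a + b * x, with x = 0, 1, ..., n - 1.
    Returns the intercept a and slope b."""
    n = len(y)
    x = list(range(n))
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    b = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
         / sum((xi - x_mean) ** 2 for xi in x))
    a = y_mean - b * x_mean
    return a, b

sales = [100, 120, 110, 130, 140, 150]  # January .. June
a, b = fit_trend_line(sales)
forecast_july = a + b * 6  # x = 6 is the next period; approximately 158.0
```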
1. Identify the period: Determine the length of the seasonal pattern in the data.
For example, if the data shows a repeating pattern every 12 months, the period is
12.
2. Decompose the time series: Apply the chosen decomposition method (additive
or multiplicative) to break down the time series into its individual components,
i.e., trend, seasonality, and residual.
3. Analyze the components: Examine the trend component to understand the
long-term movement in the data and identify any patterns or changes. Assess the
seasonality component to identify recurring patterns and variations. Analyze the
residual component, which represents the random or unpredictable fluctuations
in the data.
4. Forecast: Once the components have been identified and analyzed, they can
be used for forecasting future values of the time series. For example, the trend
and seasonality components can be extrapolated into the future, and the residuals
can be modeled using statistical techniques.
Time series decomposition is a useful tool for understanding and modeling
complex time series data, especially when there are clear seasonal and trend
patterns. It helps in capturing and incorporating these components into
forecasting models, leading to more accurate and reliable predictions.
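The decomposition steps above can be sketched for the additive case. This is a simplified classical decomposition (the seasonal indices are not re-centred, and no tie-breaking refinements are applied), and the synthetic quarterly series is illustrative:

```python
def additive_decompose(series, period):
    """Classical additive decomposition: y_t = trend_t + seasonal_t + residual_t.
    The trend is a centred moving average; positions without a full window
    are left as None."""
    n, half = len(series), period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            # even period: average two moving averages offset by one step
            trend[t] = (sum(window[:-1]) + sum(window[1:])) / (2 * period)
        else:
            trend[t] = sum(window) / period
    # seasonal component: mean detrended value at each position in the cycle
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    seasonal = [sum(b) / len(b) for b in buckets]
    residual = [series[t] - trend[t] - seasonal[t % period]
                if trend[t] is not None else None for t in range(n)]
    return trend, seasonal, residual

# synthetic quarterly series: linear trend t plus seasonal pattern [5, -2, -4, 1]
series = [t + [5, -2, -4, 1][t % 4] for t in range(12)]
trend, seasonal, residual = additive_decompose(series, period=4)
print(seasonal)  # recovers the pattern: [5.0, -2.0, -4.0, 1.0]
```

For a multiplicative decomposition (y_t = trend_t × seasonal_t × residual_t), the subtractions and divisions would be replaced by divisions and ratios.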
6.6 NONPARAMETRIC METHODS:
Nonparametric methods, also known as distribution-free methods, are statistical
techniques that do not make explicit assumptions about the underlying probability
distribution or functional form of the data. Unlike parametric methods, which
require assumptions about the data distribution, nonparametric methods offer
more flexibility and can be applied to a wider range of data types and situations.
Here are some key characteristics and examples of nonparametric methods:
1. No Assumptions about Data Distribution:
Nonparametric methods do not assume a specific probability distribution for the
data. Instead, they rely on fewer assumptions or make weaker assumptions about
the data structure, which makes them more robust and applicable in various
scenarios.
2. Data-Driven Approaches:
Nonparametric methods often employ data-driven approaches to estimate
statistical parameters. They rely on the observed data itself to make inferences or
perform analyses, rather than assuming a specific mathematical model.
3. Rank-Based Techniques:
Many nonparametric methods utilize ranks or orders of the data instead of the
actual data values. Ranks provide a way to represent the relative positions or
orders of the observations, and these methods can be especially useful when
dealing with skewed or heavy-tailed distributions.
Nonparametric methods are particularly useful when dealing with small sample
sizes, data with outliers or non-normal distributions, or variables with unknown
distributions. They provide an alternative approach to statistical inference and
analysis that does not require strong distributional assumptions.
It is important to note that the sign test has less statistical power than parametric
tests, such as the paired t-test, especially when the data exhibit a symmetric
distribution. Additionally, the sign test focuses on the medians or central tendency
of the data and does not provide information about the magnitude of the
difference between the paired observations.
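The sign test described above can be sketched as follows. Zero differences are discarded, and under the null hypothesis the number of positive signs follows a Binomial(n, 0.5) distribution; the paired differences below are hypothetical:

```python
from math import comb

def sign_test_p(diffs):
    """Two-sided sign test p-value for paired differences."""
    signs = [d for d in diffs if d != 0]  # drop zero differences
    n = len(signs)
    k = sum(1 for d in signs if d > 0)
    k = min(k, n - k)                     # count in the smaller tail
    p_one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * p_one_tail)

# hypothetical paired differences (after - before) for 8 subjects
diffs = [2.0, 1.5, -0.5, 3.0, 0.5, 1.0, 2.5, -1.0]
print(sign_test_p(diffs))  # 0.2890625
```

Consistent with the note above, the test uses only the signs of the differences, so the magnitudes 2.0 and 0.5 contribute equally.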
The Wilcoxon signed-rank test is commonly used in situations where the data are
not normally distributed, the differences between paired observations are not
assumed to follow a specific distribution, or when the data are measured on an
ordinal scale.
It's important to note that the Mann-Whitney-Wilcoxon test does not provide
information about the magnitude of the difference between the groups. It only
tests whether there is a significant difference or not.
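A minimal sketch of the Mann-Whitney U statistic underlying that test (the p-value, not shown here, comes from comparing U with its null distribution, or a normal approximation for larger samples); the two groups are hypothetical:

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for sample x versus sample y:
    the number of pairs (xi, yj) with xi > yj, counting ties as 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

group_a = [7, 3, 5]  # hypothetical observations
group_b = [2, 4, 6]
print(mann_whitney_u(group_a, group_b))  # 6.0 out of a maximum of 9
```

Because U depends only on which of each pair is larger, it reflects ordering rather than the size of the gap, matching the limitation noted above.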
The Kruskal-Wallis test is used when the data do not meet the assumptions
required for parametric tests, such as the analysis of variance (ANOVA).
Specifically, it does not assume that the data are normally distributed or have
equal variances across groups.
1. Formulate hypotheses:
- Null hypothesis (H0): The medians of all groups are equal.
- Alternative hypothesis (Ha): At least one group has a different median.
It's worth noting that the Kruskal-Wallis test is an omnibus test, meaning it only
tells you whether there are overall differences among the groups. If the test is
significant, further analyses are needed to determine the specific group
differences.