Abstract—In the last two decades, with the rise of the Data Mining process, there has been increasing interest in the adaptation of Machine Learning methods to support Time Series non-parametric modeling and prediction. Non-parametric temporal data modeling can be performed according to local and global approaches. Most of the local prediction strategies are based on the k-Nearest Neighbor (kNN) learning method. In this paper we propose a modification of the kNN algorithm for Time Series prediction. Our proposal differs from the literature by incorporating three techniques for obtaining amplitude and offset invariance, complexity invariance, and treatment of trivial matches. We evaluate the proposed method with six complexity measures, in order to verify the impact of these measures on the projection of future values. In addition, we compare our method against two Machine Learning regression algorithms. The experimental comparisons were performed using 55 data sets, which are available at the ICMC-USP Time Series Prediction Repository. Our results indicate that the developed method is competitive and that the use of a complexity-invariant distance measure generally improves the predictive performance.

Keywords—Time Series Prediction, Similarity-Based Methods, Machine Learning, Data Mining.
I. INTRODUCTION

In the last two decades, with the rise of the Data Mining (DM) process, there has been increasing interest in the adaptation of Machine Learning (ML) methods, especially those for regression tasks, to support Time Series (TS) non-parametric modeling and prediction [1]. Due to their simplicity and comprehensibility, ML methods have established themselves as serious candidates to the classical parametric models, which are based on autoregression and moving averages [2].
The non-parametric temporal data modeling does not presuppose the nature of the data distribution and can be performed according to local and global approaches [3]. In the global approach, the predictive models are constructed from a training procedure that takes as input all observations of the series. Differently, in the local approach, the original series is partitioned into subsequences, and the values closest or most relevant to the current value are considered to predict future observations.

In this context, one of the local prediction strategies consists of a modification of the k-Nearest Neighbor (kNN) method. The kNN is an instance-based ML algorithm that consists in finding, according to some similarity measure, the k examples that are the nearest to an unlabeled example. The classification of the new example is decided based on the labels of those k nearest examples [4].

In this article we propose a novel modification of the kNN algorithm for TS prediction, named the kNN - Time Series Prediction with Invariances (kNN-TSPI). Our proposal differs from the literature by incorporating three techniques for obtaining amplitude and offset invariance [5], complexity invariance [6], and treatment of trivial matches [7]. As we discuss in more detail throughout this paper, these three modifications allow a more meaningful matching between the reference query and the TS subsequences.

This study has three major contributions:

• We describe for the first time our similarity-based method for TS prediction (kNN-TSPI) and discuss the relevance of using the invariances to complexity, offset and amplitude, as well as the elimination of trivial matches;

• As complexity can be measured according to different paradigms, we evaluate six complexity measures for TS. The investigated complexity measures use concepts from information theory, Kolmogorov complexity and chaos complexity;

• We perform one of the most comprehensive empirical evaluations ever done for TS prediction. We employ a set of 55 TS from diverse areas. We built an online archive so that other researchers can replicate our results and evaluate their own methods. We named our archive the ICMC-USP Time Series Prediction Repository (ICMC-USP TSPR) [8].

Our results show that using complexity invariance generally improves the similarity-based TS prediction. Of the complexity measures compared, all but one outperformed the prediction without complexity invariance.

Additionally, we compare the kNN-TSPI with the best-performing complexity invariance against two well-known ML regression algorithms: Multilayer Perceptron (MLP) and Support Vector Machines (SVM). The results indicate that the proposed method outperforms MLP but is outperformed by SVM, with no statistical difference.

In general, we believe the kNN-TSPI performs well considering its simplicity and the reduced number of parameters. The two parameters necessary are the window size and the quantity of nearest neighbors. However, both parameters can be easily estimated using seasonality as a reference.
We also highlight the importance of evaluating on a set of publicly available TS from the ICMC-USP TSPR. The data in this repository pose some of the problems that one usually encounters in typical one- or multi-step-ahead prediction tasks, such as growing trend, non-stationarity, outliers and multiple overlying seasonalities. In addition, the method used for this experimental comparison is based on the guidelines advocated in the related literature.
The remainder of this article is organized as follows: in Section II, we briefly describe the fundamentals of TS prediction and related work. In Section III, we present our similarity-based Time Series prediction method. In Section IV, we introduce concepts about the complexity-invariant distance measure. In Section V, we specify the configuration of the experiments, as well as the considered data sets. Results and discussion are shown in Section VI, while in Section VII we present the conclusions and directions for future work.

II. BACKGROUND AND RELATED WORK

The methods for TS prediction are essentially based on the idea that historical data include intrinsic patterns, usually difficult to identify and not always directly interpretable, whose discovery may help in the future description of the phenomenon investigated. This description is one of the main goals of TS processing: it aims to answer in which circumstances the patterns found will repeat and what types of changes they may suffer over time [9].

The design of a model for TS prediction focuses on the application of algorithms which make assumptions about the nature of the data distribution. Given a series Z = z_1, ..., z_m and a prediction horizon h, the estimate of z_{m+h} is typically denoted by ẑ(m, h) or ẑ(h). For simplicity, but without loss of generality, the idea will be discussed for a unitary horizon (h = 1), i.e., considering the prediction of the next value in the series.

As mentioned, from the series Z, the objective is to predict the next (unobserved) data point ẑ(m, 1). The simplest method uses the last l observations of the TS as the query Q and searches for the k subsequences most similar to Q, using a sliding window of size l. Let S^(1)_{1..l}, ..., S^(k)_{1..l} be the k most similar subsequences; we use the next observation of each subsequence, S^(j)_{l+1} with 1 ≤ j ≤ k, to predict ẑ(m, 1). Thus, the values of S^(j)_{l+1} are provided as input to a prediction function f, for example the average (Equation 1), which aims to approximate the value of ẑ(m, 1).

f(S) = \frac{1}{k} \sum_{j=1}^{k} S_{l+1}^{(j)}    (1)

In Equation 1, f denotes the prediction function, S denotes the set of the most similar subsequences, and S^(j) refers to the j-th nearest neighbor. This is the simplest way of combining the predictions, since the average considers that all prediction values are equally probable to occur in the future.

Figure 1 displays an example of applying the aforementioned method with k = 3 and l = 25.

Fig. 1. Example of similarity-based prediction (elements shown: the series Z, the query Q, the subsequences S^(k)_{1..l}, their successors S^(j)_{l+1}, and the prediction ẑ(m, 1)).
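To make the procedure concrete, the following is a minimal sketch of this naive predictor (the function name, the use of NumPy, and all implementation details are our own illustration; the paper itself specifies no code):

```python
import numpy as np

def naive_knn_forecast(z, l, k):
    """One-step-ahead forecast zhat(m, 1): average the successors of the
    k subsequences most similar to the query formed by the last l
    observations (Equation 1)."""
    z = np.asarray(z, dtype=float)
    query = z[-l:]                                   # Q, the reference query
    # Slide a window of size l over the series, keeping only windows
    # that still have a next observation to serve as S(j)_{l+1}.
    starts = np.arange(len(z) - l)
    dists = np.array([np.linalg.norm(z[i:i + l] - query) for i in starts])
    nearest = np.argsort(dists)[:k]                  # the k nearest windows
    # Note: nothing here normalizes the windows or avoids near-duplicates
    # of Q itself -- precisely the issues Section III addresses.
    return z[nearest + l].mean()                     # f(S), Equation 1

# Example with the parameters of Figure 1 (k = 3, l = 25):
series = np.sin(np.arange(300) / 10.0)
print(naive_knn_forecast(series, l=25, k=3))
```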
III. THE kNN-TSPI ALGORITHM

There are three major issues with the naïve similarity search procedure shown in Figure 1, and the combination of these issues means that the k subsequences most similar to a query Q are frequently very different from Q.

The first issue is the lack of invariance to amplitude and offset. When comparing the query Q to a subsequence S, both should be made invariant to amplitude and offset. This invariance can be obtained by several means, but a simple way is through z-normalization, which is defined by Equation 2.

z'_t = \frac{z_t - \mu}{\sigma}    (2)

In Equation 2, z'_t and z_t refer respectively to the normalized value and the observation of the TS Z, both at time t. Likewise, µ indicates the average and σ the standard deviation of the values of a given subsequence that includes the observation z_t. The z-normalization, which transforms the data to ensure zero average and unit standard deviation, has been strongly advocated in tasks that need to search for TS subsequences [5].
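A direct transcription of Equation 2 in code (the guard for constant subsequences, which have σ = 0, is our addition; the paper does not discuss that corner case):

```python
import numpy as np

def z_normalize(s):
    """Z-normalize a subsequence (Equation 2): zero mean and unit standard
    deviation, making comparisons invariant to amplitude and offset."""
    s = np.asarray(s, dtype=float)
    sigma = s.std()
    if sigma == 0.0:          # constant subsequence: only remove the offset
        return s - s.mean()
    return (s - s.mean()) / sigma
```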
Figure 2 illustrates a subsequence S^(1) that is an exact match to the query Q. The subsequence S^(1) has the exact same values as Q, with just one difference: a small offset increase. We also included a second subsequence S^(2) that is completely different from Q. In our example, S^(2) is a straight line, in order to make our argument stronger, but also for reasons that will become clearer when we discuss the complexity invariance. According to the Euclidean distance (ED), the most similar subsequence to Q is S^(2) (ED(Q, S^(2)) = 84.26 and ED(Q, S^(1)) = 114.54), even though the offset difference between Q and S^(1) is just 10 units. The reason is very simple: small offset differences quickly accumulate, making the final ED grow very fast.

Fig. 2. Small offset difference makes ED consider Q more similar to S^(2) than to S^(1).

The second issue addresses the need for complexity invariance when comparing subsequences. In summary, the problem lies in the fact that pairs of complex objects, even those that subjectively may seem very similar, tend to be further apart under current distance measures than pairs of simple objects [6]. A general idea of this problem is exemplified in Figure 3, where the ED between Q and S^(1) is greater than the distance between Q and the straight line S^(2). The reason is that simple shapes such as S^(2) usually present a good average behavior that matches well with complex shapes. In contrast, complex shapes usually have several features, such as peaks and valleys, that make them difficult to match to each other, even when the shapes look similar to the human eye.

Fig. 3. The simple-shaped subsequence S^(2) is considered the best match for Q, even though S^(1) has a similar behavior.

As we scan along the TS with a sliding window looking for a match for Q, subsequences with a huge diversity of shapes will appear, and the current distance measures will tend to choose simpler shapes as the best matches. However, overly simple shapes, such as straight lines, will hardly be useful for prediction purposes. The solution to this problem lies in the use of a complexity-invariant distance.
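The complexity-invariant distance itself is detailed in Section IV; as a preview, here is a sketch following the definition of Batista et al. [6], where the Euclidean distance is rescaled by a complexity correction factor (the eps guard against division by zero is our own addition):

```python
import numpy as np

def complexity_estimate(x):
    """CE(x) of Batista et al. [6]: the length of the series when
    'stretched' flat -- complex, jagged shapes yield larger values."""
    return np.sqrt(np.sum(np.diff(np.asarray(x, dtype=float)) ** 2))

def cid(q, s):
    """Complexity-Invariant Distance: Euclidean distance times a factor
    that grows when the two subsequences differ in complexity."""
    q = np.asarray(q, dtype=float)
    s = np.asarray(s, dtype=float)
    ed = np.linalg.norm(q - s)
    ce_q, ce_s = complexity_estimate(q), complexity_estimate(s)
    eps = np.finfo(float).eps                 # guard for flat subsequences
    return ed * max(ce_q, ce_s) / max(min(ce_q, ce_s), eps)
```

Under this distance, a flat subsequence such as S^(2) is heavily penalized when matched against a jagged query, so subsequences with similar structure, like S^(1), are preferred.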
A third issue we should take care of is known as trivial matches. A subsequence taken from a sliding window that starts at observation m is very similar to the subsequence that starts at observation m + 1 (or m − 1). That happens because these subsequences share all but two observations. We illustrate this idea in Figure 4, in which the query Q is shifted first to the right and then to the left just by changing one of the observations.

Fig. 4. Trivial matches: subsequences adjacent to Q are nearly identical to it.
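One simple way to discard trivial matches is to select neighbors greedily while enforcing an exclusion zone around windows already chosen. The sketch below is our own illustration: the exclusion width of l positions is an assumption, not necessarily the exact rule adopted by the kNN-TSPI.

```python
import numpy as np

def k_nearest_nontrivial(dists, k, l):
    """Pick k nearest windows from precomputed distances (indexed by
    window start) while skipping trivial matches: any candidate whose
    start lies within l positions of an already selected window."""
    chosen = []
    for i in np.argsort(dists):               # candidates, nearest first
        if all(abs(int(i) - j) >= l for j in chosen):
            chosen.append(int(i))
        if len(chosen) == k:
            break
    return chosen
```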
Fig. 5. Experimental setup in three steps; the elements shown include the kNN-TSPI, the CID, and complexity measures such as Compression, Edges, Zero-crossings, and Permutation Entropy.

TABLE I. SUMMARY OF FEATURES AND SETTINGS OF THE BENCHMARK DATA SETS

POCID = \frac{\sum_{t=1}^{h} D_t}{h} \times 100    (9)

In Equation 9, the term D_t has the value 1 if (ẑ_t − ẑ_{t−1})(z_t − z_{t−1}) > 0, and is 0 otherwise.
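A sketch of Equation 9 (the names are our own; since D_t compares consecutive pairs, the mean below runs over the h − 1 available differences):

```python
import numpy as np

def pocid(y_true, y_pred):
    """POCID (Equation 9): percentage of time steps in which the
    predicted series changes direction together with the observed one."""
    dy_true = np.diff(np.asarray(y_true, dtype=float))
    dy_pred = np.diff(np.asarray(y_pred, dtype=float))
    d = (dy_pred * dy_true) > 0               # the D_t indicator
    return 100.0 * d.mean()
```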
From the values of these evaluation measures, it was also possible to objectively compare different settings of the kNN-TSPI algorithm. Such comparisons were analyzed using Friedman's non-parametric statistical test for paired data.