Time Series Analysis and Mining with R
Yanchang Zhao
Outline
4 Time Series Clustering
5 Time Series Classification
6 R Functions & Packages for Time Series
7 Conclusions
R
a free software environment for statistical computing and graphics
runs on Windows, Linux and macOS
widely used in academia and research, as well as industrial applications
over 3,000 packages
CRAN Task View: Time Series Analysis
  http://cran.r-project.org/web/views/TimeSeries.html
Time Series Data in R
class ts
represents data which has been sampled at equispaced points in time
  frequency=7: a weekly series
  frequency=12: a monthly series
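As a minimal sketch of how a ts object is built and inspected, the following creates a monthly series in base R. The values 1:20 and the March 2011 start are illustrative assumptions, chosen to be consistent with a 20-point monthly series.

```r
# Monthly series (frequency = 12): 20 observations starting in March 2011
a <- ts(1:20, frequency = 12, start = c(2011, 3))
print(a)

# Position of each observation within the 12-month cycle (3 = March)
cycle(a)

# Start time, end time and frequency, as stored in the tsp attribute
tsp(a)

# Keep only the observations from January 2012 onwards
window(a, start = c(2012, 1))
```

The start argument takes a (period, position) pair, so c(2011, 3) means the third observation slot of 2011.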
Time Series Data in R
> # a is a monthly series of 20 values starting in March 2011
> str(a)
 Time-Series [1:20] from 2011 to 2013: 1 2 3 4 5 6 7 8 ...
> attributes(a)
$tsp
[1] 2011.167 2012.750   12.000
$class
[1] "ts"
What is Time Series Decomposition
To decompose a time series into components such as trend, seasonal and irregular
Irregular component: the residuals
Data AirPassengers
[Figure: time plot of the AirPassengers series; y-axis: AirPassengers]
[Figure: seasonal figures f$figure plotted against Index]
Decomposition
> plot(f)
[Figure: "Decomposition of additive time series", with four panels (observed, trend, seasonal, random) plotted against Time]
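The panels of the decomposition plot can be reproduced and checked numerically. The sketch below assumes the additive model with default settings of decompose() and applies it directly to AirPassengers; the object name recon is introduced here for illustration.

```r
# AirPassengers ships with base R (package datasets)
f <- decompose(AirPassengers)   # additive model by default

# One seasonal figure per month, centred so that they sum to zero
round(f$figure, 1)

# Additive identity: observed = trend + seasonal + random
recon <- f$trend + f$seasonal + f$random
ok <- !is.na(recon)             # the moving average leaves NAs at both ends
max(abs(recon[ok] - AirPassengers[ok]))
```

The centred 12-month moving average used for the trend cannot be computed for the first and last six months, which is why those positions are NA in the trend and random components.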
Time Series Forecasting
To forecast future events based on known past data
E.g., to predict the opening price of a stock based on its past performance
Popular models
  Autoregressive moving average (ARMA)
  Autoregressive integrated moving average (ARIMA)
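For reference, the two model families can be written as below. The symbols (c, \phi_i, \theta_j, \varepsilon_t and the backshift operator B) are standard textbook notation introduced here as an assumption; they do not appear in the slides.

```latex
% ARMA(p, q): the current value depends on p past values and q past shocks
X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}

% ARIMA(p, d, q): the same ARMA(p, q) form applied to the d-th difference Y_t
Y_t = (1 - B)^d X_t, \qquad B X_t = X_{t-1}
```

With d = 0 the ARIMA model reduces to ARMA; with d = 1 it models the changes X_t - X_{t-1} rather than the levels.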
Forecasting
> # build an ARIMA model with a seasonal part (period of 12 months)
> fit <- arima(AirPassengers, order=c(1,0,0),
+              seasonal=list(order=c(2,1,0), period=12))
> # forecast the next 24 months
> fore <- predict(fit, n.ahead=24)
> # approximate error bounds at 95% confidence level (2 standard errors)
> U <- fore$pred + 2*fore$se
> L <- fore$pred - 2*fore$se
> ts.plot(AirPassengers, fore$pred, U, L,
+         col=c(1,2,4,4), lty = c(1,1,2,2))
> legend("topleft", col=c(1,2,4), lty=c(1,1,2),
+        c("Actual", "Forecast",
+          "Error Bounds (95% Confidence)"))
Forecasting
[Figure: AirPassengers with the 24-month forecast; legend: Actual, Forecast, Error Bounds (95% Confidence); x-axis: Time]
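The factor 2 in the error bounds is a rounded version of the normal quantile for a two-sided 95% interval. A minimal sketch, assuming normally distributed forecast errors and reusing the model specification from the forecasting code (the name z is introduced here):

```r
fit <- arima(AirPassengers, order = c(1, 0, 0),
             seasonal = list(order = c(2, 1, 0), period = 12))
fore <- predict(fit, n.ahead = 24)

z <- qnorm(0.975)          # about 1.96
U <- fore$pred + z * fore$se
L <- fore$pred - z * fore$se

# Width of the interval at the first and last forecast step
range(U - L)
```

The standard errors do not shrink as the horizon grows, so the band around the forecast widens or stays level toward the end of the 24 months.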
Time Series Clustering
To partition time series data into groups based on similarity or distance, so that time series in the same cluster are similar
Measure of distance/dissimilarity
  Euclidean distance
  Manhattan distance
  Maximum norm
  Hamming distance
  The angle between two vectors (inner product)
  Dynamic Time Warping (DTW) distance
  ...
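Several of the listed measures can be computed in base R. The sketch below uses two short made-up series (the matrix x and its values are illustrative assumptions) so that each result can be checked by hand.

```r
# Two series of equal length, stored as the rows of a matrix
x <- rbind(a = c(1, 2, 3, 4, 5),
           b = c(2, 2, 5, 4, 9))

dist(x, method = "euclidean")   # sqrt(1 + 0 + 4 + 0 + 16) = sqrt(21)
dist(x, method = "manhattan")   # 1 + 0 + 2 + 0 + 4 = 7
dist(x, method = "maximum")     # largest absolute difference = 4

# Hamming distance: number of positions where the values differ
sum(x["a", ] != x["b", ])       # 3

# Angle between the two vectors, from the normalised inner product
acos(sum(x["a", ] * x["b", ]) /
     sqrt(sum(x["a", ]^2) * sum(x["b", ]^2)))
```

All of these compare the series point by point, so they require equal lengths and are sensitive to shifts in time; DTW relaxes that restriction.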
Dynamic Time Warping (DTW)
[Figure: DTW alignment between two series; y-axis: Query value; x-axis: Index]
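To illustrate what the DTW distance computes, here is a minimal base R sketch of the dynamic programming recursion. It assumes absolute difference as the local cost and the basic step pattern (diagonal, horizontal, vertical, all with weight 1). The helper dtw_distance is hypothetical and is not the API of the dtw package, whose default step pattern weights steps differently.

```r
# DTW distance between numeric vectors a and b
dtw_distance <- function(a, b) {
  n <- length(a); m <- length(b)
  # D[i + 1, j + 1] = cost of the best alignment of a[1:i] with b[1:j]
  D <- matrix(Inf, n + 1, m + 1)
  D[1, 1] <- 0
  for (i in 1:n) {
    for (j in 1:m) {
      cost <- abs(a[i] - b[j])
      D[i + 1, j + 1] <- cost + min(D[i, j], D[i, j + 1], D[i + 1, j])
    }
  }
  D[n + 1, m + 1]
}

a <- c(0, 1, 2, 1, 0)
b <- c(0, 0, 1, 2, 1, 0)    # same shape, delayed by one step
dtw_distance(a, b)          # 0: warping absorbs the delay
sum(abs(a - b[1:5]))        # 4: a point-by-point comparison does not
```

The two series may have different lengths, which point-by-point measures such as Euclidean distance cannot handle.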
Synthetic Control Chart Time Series
> # read data into R
> # sep="": the separator is white space, i.e., one
> # or more spaces, tabs, newlines or carriage returns
> sc <- read.table("synthetic_control.data",
+                  header=FALSE, sep="")
> # show one sample from each class
> idx <- c(1,101,201,301,401,501)
> sample1 <- t(sc[idx,])
> plot.ts(sample1, main="")
Six Classes
[Figure: one sample series from each of the six classes (rows 1, 101, 201, 301, 401 and 501), each plotted against Time]
Hierarchical Clustering with Euclidean distance
> # sample n cases from every class
> n <- 10
> s <- sample(1:100, n)
> idx <- c(s, 100+s, 200+s, 300+s, 400+s, 500+s)
> sample2 <- sc[idx,]
> observedLabels <- c(rep(1,n), rep(2,n), rep(3,n),
+                     rep(4,n), rep(5,n), rep(6,n))
> # hierarchical clustering with Euclidean distance (average linkage)
> hc <- hclust(dist(sample2), method="average")
> plot(hc, labels=observedLabels, main="")
[Figure: dendrogram from hierarchical clustering with Euclidean distance; y-axis: Height; leaves labelled with the observed class IDs]
Observed class labels (rows) against cluster membership (columns)
observedLabels  1  2  3  4  5  6  7  8
             3  0  0  0  0  0  0 10  0
             4  0  0  0  0  0  0  0 10
             5  0  0  0  0  0  0 10  0
             6  0  0  0  0  0  0  0 10
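A cross-tabulation of this kind can be obtained by cutting the dendrogram into a fixed number of clusters and tabulating the result against the known labels. The sketch below runs on simulated data, since the synthetic control file is not bundled with R. The three class shapes, the helper make and the choice k = 3 are assumptions for illustration only.

```r
set.seed(1)
n <- 10; len <- 60; tt <- 1:len

# Three simulated classes: flat, increasing and decreasing trend,
# each with Gaussian noise; every row of x is one series
make <- function(slope) t(replicate(n, 30 + slope * tt + rnorm(len)))
x <- rbind(make(0), make(0.4), make(-0.4))
observedLabels <- rep(1:3, each = n)

# Average-linkage clustering on Euclidean distances
hc <- hclust(dist(x), method = "average")

# Cut the tree into 3 clusters and compare with the known classes
memb <- cutree(hc, k = 3)
table(observedLabels, memb)
```

Rows with a single non-zero entry indicate classes that fall entirely into one cluster; columns shared by several rows, as with classes 3 and 5 in the table above, indicate classes that the distance measure fails to separate.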
Hierarchical Clustering with DTW Distance
[Figure: dendrogram from hierarchical clustering with DTW distance; y-axis: Height; leaves labelled with the observed class IDs]
Time Series Classification
ctree from package party
> # class labels: 100 consecutive series per class
> classId <- c(rep("1",100), rep("2",100),
+              rep("3",100), rep("4",100),
+              rep("5",100), rep("6",100))
> # the response must be a factor for a classification tree
> newSc <- data.frame(classId=factor(classId), sc)
> library(party)
> ct <- ctree(classId ~ ., data=newSc,
+             controls = ctree_control(minsplit=20,
+                        minbucket=5, maxdepth=5))
Decision Tree
> # accuracy (pClassId holds the class labels predicted by ct)
> (sum(classId==pClassId)) / nrow(sc)
[1] 0.8183333
DWT (Discrete Wavelet Transform)
> library(wavelets)
> # extract Haar DWT coefficients from every series as features
> wtData <- NULL
> for (i in 1:nrow(sc)) {
+   a <- t(sc[i,])
+   wt <- dwt(a, filter="haar", boundary="periodic")
+   # wavelet coefficients of all levels plus the coarsest scaling coefficients
+   wtData <- rbind(wtData,
+                   unlist(c(wt@W, wt@V[[wt@level]])))
+ }
> wtData <- as.data.frame(wtData)
> wtSc <- data.frame(classId=factor(classId), wtData)
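To show what a single level of the Haar transform does, the base R sketch below splits a series into pairwise averages and differences, scaled by sqrt(2). The helper haar_step and the sample values are assumptions for illustration. The function is not part of the wavelets package, and sign conventions for the detail coefficients vary between implementations.

```r
# One level of the Haar DWT on a vector of even length
haar_step <- function(x) {
  odd  <- x[seq(1, length(x), by = 2)]
  even <- x[seq(2, length(x), by = 2)]
  list(V = (odd + even) / sqrt(2),   # scaling (smooth) coefficients
       W = (even - odd) / sqrt(2))   # wavelet (detail) coefficients
}

x <- c(4, 6, 10, 12, 8, 8, 2, 0)
h <- haar_step(x)
h$V * sqrt(2)    # pairwise sums:        10 22 16  2
h$W * sqrt(2)    # pairwise differences:  2  2  0 -2

# The transform is orthonormal, so the energy of the series is preserved
sum(x^2)
sum(h$V^2) + sum(h$W^2)
```

Applying the same step again to V gives the next, coarser level, which is how the multi-level coefficients used as classification features arise.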
Decision Tree with DWT
classId  1  2  3  4  5  6
      1 98  2  0  0  0  0
      2  1 99  0  0  0  0
      3  0  0 81  0 19  0
      4  0  0  0 74  0 26
      5  0  0 16  0 84  0
      6  0  0  0  3  0 97
> (sum(classId==pClassId)) / nrow(wtSc)
[1] 0.8883333
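The reported accuracy follows directly from the table: it is the share of cases on the diagonal. A short sketch that re-enters the table by hand (the matrix name cm is introduced here) and also derives the per-class rates:

```r
# Confusion matrix from the slide: rows are true classes, columns predictions
cm <- matrix(c(98,  2,  0,  0,  0,  0,
                1, 99,  0,  0,  0,  0,
                0,  0, 81,  0, 19,  0,
                0,  0,  0, 74,  0, 26,
                0,  0, 16,  0, 84,  0,
                0,  0,  0,  3,  0, 97),
             nrow = 6, byrow = TRUE,
             dimnames = list(classId = 1:6, pClassId = 1:6))

sum(diag(cm)) / sum(cm)     # overall accuracy: 533 / 600
diag(cm) / rowSums(cm)      # share of each true class predicted correctly
```

The per-class figures show where the remaining errors sit: classes 3 and 5 are confused with each other, as are classes 4 and 6.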
> plot(ct, ip_args=list(pval=FALSE), ep_args=list(digits=0))
[Figure: the fitted tree, with the root node split on V57 and 11 terminal nodes (Nodes 4, 5, 7, 8, 10, 13, 15, 16, 18, 20 and 21, with n = 68, 6, 9, 86, 31, 80, 9, 99, 12, 103 and 97), each shown as a bar chart of class proportions for classes 1 to 6]
k-NN Classification
> library(dtw)   # provides the "DTW" method for dist()
> k <- 20
> # create a new time series by adding noise to series 501
> newTS <- sc[501,] + runif(ncol(sc))*15
> distances <- dist(newTS, sc, method="DTW")
> s <- sort(as.vector(distances), index.return=TRUE)
> # class IDs of k nearest neighbours
> table(classId[s$ix[1:k]])

 4  6
 3 17
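The same k-NN procedure can be tried without the data file or the dtw package. The sketch below is an illustration on simulated data: the two classes (noisy sine waves and noisy ramps), the value k = 5 and the helper dtw_distance are all assumptions, and the DTW variant is a basic one with unit step weights rather than the dtw package's default.

```r
set.seed(42)

# Basic DTW distance (absolute-difference cost, unit step weights)
dtw_distance <- function(a, b) {
  n <- length(a); m <- length(b)
  D <- matrix(Inf, n + 1, m + 1)
  D[1, 1] <- 0
  for (i in 1:n) for (j in 1:m) {
    D[i + 1, j + 1] <- abs(a[i] - b[j]) +
      min(D[i, j], D[i, j + 1], D[i + 1, j])
  }
  D[n + 1, m + 1]
}

len <- 30
tt <- seq(0, 2 * pi, length.out = len)

# Training set: 20 noisy sine waves (class "A"), 20 noisy ramps (class "B")
train <- rbind(t(replicate(20, sin(tt) + rnorm(len, sd = 0.1))),
               t(replicate(20, tt / pi - 1 + rnorm(len, sd = 0.1))))
classId <- rep(c("A", "B"), each = 20)

# New series to classify: a sine wave with fresh noise
newTS <- sin(tt) + rnorm(len, sd = 0.1)

# Distance from the new series to every training series
k <- 5
d <- apply(train, 1, dtw_distance, b = newTS)
nn <- order(d)[1:k]

table(classId[nn])                       # votes of the k nearest neighbours
names(which.max(table(classId[nn])))     # predicted class by majority vote
```

The predicted class is the most frequent label among the k nearest neighbours, which is how the table of neighbour class IDs on the slide would be read as well.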
Functions - Construction, Plot & Smoothing
Functions - Decomposition & Forecasting
Decomposition
  decomp() time series decomposition by square-root filter (timsac)
  decompose() classical seasonal decomposition by moving averages (stats)
  stl() seasonal decomposition of time series by loess (stats)
  tsr() time series decomposition (ast)
  ardec() time series autoregressive decomposition (ArDec)
Forecasting
  arima() fit an ARIMA model to a univariate time series (stats)
  predict.Arima() forecast from models fitted by arima (stats)
Packages
  timsac: time series analysis and control program
  ast: time series analysis
  ArDec: time series autoregressive-based decomposition
  ares: a toolbox for time series analyses using generalized additive models
  dse: tools for multivariate, linear, time-invariant, time series models
  forecast: displaying and analysing univariate time series forecasts
  dtw: Dynamic Time Warping, find optimal alignment between two time series
  wavelets: wavelet filters, wavelet transforms and multiresolution analyses
Online Resources
Conclusions
... k-NN, neural networks, regression and decision trees
Time series clustering: work out your own distance/similarity metrics, and then use existing clustering techniques, such as k-means and hierarchical clustering
Techniques specially for classifying/clustering time series data: a lot of research publications, but no R implementations (as far as I know)
The End
Email: yanchangzhao@gmail.com
RDataMining: http://www.rdatamining.com
Twitter: http://twitter.com/rdatamining
Group on LinkedIn: http://group.rdatamining.com
Group on Google: http://group2.rdatamining.com