Time Series Analysis and Mining with R

Yanchang Zhao
RDataMining.com
http://www.rdatamining.com/

18 July 2011

Outline

1 R and Time Series Data
2 Time Series Decomposition
3 Time Series Forecasting
4 Time Series Clustering
5 Time Series Classification
6 R Functions & Packages for Time Series
7 Conclusions

R

a free software environment for statistical computing and graphics
runs on Windows, Linux and MacOS
widely used in academia and research, as well as industrial applications
over 3,000 packages
CRAN Task View: Time Series Analysis
  http://cran.r-project.org/web/views/TimeSeries.html

Time Series Data in R

class ts
represents data which has been sampled at equispaced points in time
frequency is the number of observations per unit of time:
  frequency=7: a weekly series (7 observations per week, e.g. daily data)
  frequency=12: a monthly series (12 observations per year)
  frequency=4: a quarterly series (4 observations per year)

Time Series Data in R

> a <- ts(1:20, frequency=12, start=c(2011,3))
> print(a)
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2011           1   2   3   4   5   6   7   8   9  10
2012  11  12  13  14  15  16  17  18  19  20
> str(a)
 Time-Series [1:20] from 2011 to 2013: 1 2 3 4 5 6 7 8
> attributes(a)
$tsp
[1] 2011.167 2012.750   12.000

$class
[1] "ts"

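The same series can be inspected and subset with base R helpers. This is a minimal sketch using only functions from the stats package; the variable name w is chosen here for illustration.

```r
# Rebuild the monthly series from the slide: 20 values starting in March 2011
a <- ts(1:20, frequency = 12, start = c(2011, 3))

start(a)      # first time point as c(year, month)
end(a)        # last time point as c(year, month)
frequency(a)  # number of observations per year

# Extract the sub-series from January 2012 to June 2012
w <- window(a, start = c(2012, 1), end = c(2012, 6))
print(w)
```

window() selects by time rather than by position, so the same call keeps working if the series is later extended.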
What is Time Series Decomposition

To decompose a time series into components:
  Trend component: long term trend
  Seasonal component: seasonal variation
  Cyclical component: repeated but non-periodic fluctuations
  Irregular component: the residuals

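To make the components concrete, the sketch below builds a synthetic monthly series as trend + seasonal + irregular and then recovers the parts with decompose(). The series, its coefficients and the variable names are invented for illustration; classical decomposition does not estimate a separate cyclical component.

```r
set.seed(1)
tm <- 1:120                                    # 10 years of monthly data
trend     <- 0.5 * tm                          # long term trend
seasonal  <- rep(10 * sin(2 * pi * (1:12) / 12), times = 10)  # yearly pattern
irregular <- rnorm(120)                        # residual noise

x <- ts(trend + seasonal + irregular, frequency = 12)
parts <- decompose(x)                          # additive model by default

# The estimated components add back up to the observed series
# (the first and last 6 values are NA because of the moving average)
recon <- parts$trend + parts$seasonal + parts$random
max(abs(recon - x), na.rm = TRUE)
```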
Data AirPassengers

Data AirPassengers: monthly totals of international airline passengers (the classic Box & Jenkins airline data), 1949 to 1960. It has 144 (= 12 × 12) values.

> plot(AirPassengers)


[Figure: line plot of AirPassengers (y-axis, 100 to 600) against time (1950 to 1960)]

Decomposition

> apts <- ts(AirPassengers, frequency = 12)
> f <- decompose(apts)
> # seasonal figures
> plot(f$figure, type="b")

[Figure: the 12 seasonal figures f$figure (y-axis, about −40 to 60) plotted against Index (1 to 12)]

Decomposition

> plot(f)

[Figure: "Decomposition of additive time series", with four panels (observed, trend, seasonal, random) plotted against Time]

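The seasonal swings in AirPassengers grow with the level of the series, so a multiplicative decomposition, or an additive one on the log scale, may describe it better than the additive default. A minimal sketch, assuming only the stats functions decompose() and stl(); the names fm and fs are chosen here for illustration.

```r
# Multiplicative classical decomposition: observed = trend * seasonal * random
fm <- decompose(AirPassengers, type = "multiplicative")
round(fm$figure, 2)    # 12 seasonal factors, scaled to average 1

# Loess-based decomposition on the log scale, where the model becomes additive
fs <- stl(log(AirPassengers), s.window = "periodic")
head(fs$time.series)   # columns: seasonal, trend, remainder
```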
Time Series Forecasting

To forecast future events based on known past data
E.g., to predict the opening price of a stock based on its past performance
Popular models
  Autoregressive moving average (ARMA)
  Autoregressive integrated moving average (ARIMA)

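As a small illustration of the autoregressive part of these models, the sketch below simulates an AR(1) process with a known coefficient and checks that arima() recovers it. The coefficient 0.7, the sample size and the seed are arbitrary choices made for this example.

```r
set.seed(42)
# Simulate x[t] = 0.7 * x[t-1] + noise
x <- arima.sim(model = list(ar = 0.7), n = 500)

# Fit an ARMA(1,0) model, i.e. ARIMA with order (p, d, q) = (1, 0, 0)
fit <- arima(x, order = c(1, 0, 0))
coef(fit)   # the "ar1" estimate should be close to 0.7
```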
Forecasting

> # build an ARIMA model with a seasonal part (period of 12 months)
> fit <- arima(AirPassengers, order=c(1,0,0),
+              seasonal=list(order=c(2,1,0), period=12))
> fore <- predict(fit, n.ahead=24)
> # approximate error bounds at 95% confidence level (about 2 standard errors)
> U <- fore$pred + 2*fore$se
> L <- fore$pred - 2*fore$se
> ts.plot(AirPassengers, fore$pred, U, L,
+         col=c(1,2,4,4), lty=c(1,1,2,2))
> legend("topleft", col=c(1,2,4), lty=c(1,1,2),
+        c("Actual", "Forecast",
+          "Error Bounds (95% Confidence)"))

Forecasting

[Figure: AirPassengers with the 24-month forecast; legend: Actual, Forecast, Error Bounds (95% Confidence); y-axis 100 to 700, x-axis Time (1950 to 1962)]

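A common alternative for this dataset is to model the log of the series with the so-called airline model, ARIMA(0,1,1)(0,1,1) with period 12, and to transform the forecast back. This is a sketch of that variant, not the model used on the slide; the names pred, upper and lower are chosen here for illustration.

```r
# Fit the airline model on the log scale
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
fore <- predict(fit, n.ahead = 24)

# 95% bounds from the normal quantile, then back-transform with exp()
z <- qnorm(0.975)
pred  <- exp(fore$pred)
upper <- exp(fore$pred + z * fore$se)
lower <- exp(fore$pred - z * fore$se)

ts.plot(AirPassengers, pred, upper, lower,
        col = c(1, 2, 4, 4), lty = c(1, 1, 2, 2))
```

Working on the log scale keeps the bounds positive and lets them widen with the level of the series.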
Time Series Clustering

To partition time series data into groups based on similarity or distance, so that time series in the same cluster are similar
Measure of distance/dissimilarity
  Euclidean distance
  Manhattan distance
  Maximum norm
  Hamming distance
  The angle between two vectors (inner product)
  Dynamic Time Warping (DTW) distance
  ...

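Several of the measures listed above can be computed directly in base R. The two short vectors below are made up for illustration; stats::dist is called explicitly because other packages (for example proxy) mask dist().

```r
x <- c(1, 2, 3, 4)
y <- c(2, 2, 5, 1)
m <- rbind(x, y)    # dist() compares the rows of a matrix

d_euc <- as.numeric(stats::dist(m, method = "euclidean"))  # sqrt(sum((x - y)^2))
d_man <- as.numeric(stats::dist(m, method = "manhattan"))  # sum(abs(x - y))
d_max <- as.numeric(stats::dist(m, method = "maximum"))    # max(abs(x - y))

# Angle between the two vectors, from the normalised inner product
cosine <- sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
angle  <- acos(cosine)

c(euclidean = d_euc, manhattan = d_man, maximum = d_max, angle = angle)
```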
Dynamic Time Warping (DTW)

DTW finds the optimal alignment between two time series.

> library(dtw)
> idx <- seq(0, 2*pi, length.out=100)
> a <- sin(idx) + runif(100)/10
> b <- cos(idx)
> align <- dtw(a, b, step.pattern=asymmetricP1, keep.internals=TRUE)
> dtwPlotTwoWay(align)

[Figure: two-way DTW alignment plot of the two series; y-axis Query value (−1.0 to 1.0), x-axis Index (0 to 100)]

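The idea behind DTW can be sketched in a few lines of base R. This is a minimal version of the classic dynamic-programming recurrence with absolute difference as the local cost and unit-weight steps; it is not the dtw package's implementation, which supports many step patterns and windows. The function name dtw_distance is hypothetical.

```r
dtw_distance <- function(a, b) {
  n <- length(a)
  m <- length(b)
  # D[i + 1, j + 1] holds the cheapest cost of aligning a[1:i] with b[1:j]
  D <- matrix(Inf, nrow = n + 1, ncol = m + 1)
  D[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- abs(a[i] - b[j])
      # extend the cheapest of the three neighbouring alignments
      D[i + 1, j + 1] <- cost + min(D[i, j + 1],  # a[i] also matches b[j]'s predecessor path
                                    D[i + 1, j],  # b[j] also matches a[i]'s predecessor path
                                    D[i, j])      # both series advance together
    }
  }
  D[n + 1, m + 1]
}

# A series and a copy with one repeated value align at zero cost
dtw_distance(c(1, 2, 3), c(1, 2, 2, 3))
```

Unlike Euclidean distance, the two series do not need to have the same length.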
Synthetic Control Chart Time Series

The dataset contains 600 examples of control charts synthetically generated by the process in Alcock and Manolopoulos (1999).
Each control chart is a time series with 60 values.
Six classes:
  1-100 Normal
  101-200 Cyclic
  201-300 Increasing trend
  301-400 Decreasing trend
  401-500 Upward shift
  501-600 Downward shift
http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_control.html

Synthetic Control Chart Time Series

> # read data into R
> # sep="": the separator is white space, i.e., one
> # or more spaces, tabs, newlines or carriage returns
> sc <- read.table("synthetic_control.data",
+                  header=FALSE, sep="")
> # show one sample from each class
> idx <- c(1,101,201,301,401,501)
> sample1 <- t(sc[idx,])
> plot.ts(sample1, main="")

Six Classes

[Figure: six panels showing series 1, 101, 201, 301, 401 and 501 (one sample per class), each plotted against Time (0 to 60)]

Hierarchical Clustering with Euclidean distance

> # sample n cases from every class
> n <- 10
> s <- sample(1:100, n)
> idx <- c(s, 100+s, 200+s, 300+s, 400+s, 500+s)
> sample2 <- sc[idx,]
> observedLabels <- c(rep(1,n), rep(2,n), rep(3,n),
+                     rep(4,n), rep(5,n), rep(6,n))
> # hierarchical clustering with Euclidean distance
> hc <- hclust(dist(sample2), method="average")
> plot(hc, labels=observedLabels, main="")

Hierarchical Clustering with Euclidean distance

[Figure: dendrogram of the 60 sampled series under Euclidean distance; leaves are labelled with the observed class IDs (1 to 6); y-axis Height (20 to 140)]

Hierarchical Clustering with Euclidean distance

> # cut tree to get 8 clusters
> memb <- cutree(hc, k=8)
> table(observedLabels, memb)
              memb
observedLabels  1  2  3  4  5  6  7  8
             1 10  0  0  0  0  0  0  0
             2  0  3  1  1  3  2  0  0
             3  0  0  0  0  0  0 10  0
             4  0  0  0  0  0  0  0 10
             5  0  0  0  0  0  0 10  0
             6  0  0  0  0  0  0  0 10

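The same workflow can be tried without the data file. The sketch below simulates three simple classes (flat, increasing and decreasing), which is an assumption made here for illustration and not the synthetic control generator, and then clusters them with Euclidean distance and average linkage.

```r
set.seed(123)
n   <- 10          # series per class
len <- 60          # values per series
tm  <- 1:len

# One row per series: flat, increasing and decreasing trends plus noise
normal <- t(replicate(n, 30 + rnorm(len, sd = 2)))
up     <- t(replicate(n, 30 + 0.4 * tm + rnorm(len, sd = 2)))
down   <- t(replicate(n, 30 - 0.4 * tm + rnorm(len, sd = 2)))
x      <- rbind(normal, up, down)
labels <- rep(1:3, each = n)

hc   <- hclust(stats::dist(x), method = "average")
memb <- cutree(hc, k = 3)

# Cross-tabulate true classes against cluster membership
tab <- table(labels, memb)
tab
```

With trends this well separated every class should fall into its own cluster; the real control chart classes overlap more, as the table on the slide shows.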
Hierarchical Clustering with DTW Distance

> # needs library(dtw), which registers the "DTW" method with proxy's dist()
> myDist <- dist(sample2, method="DTW")
> hc <- hclust(myDist, method="average")
> plot(hc, labels=observedLabels, main="")
> # cut tree to get 8 clusters
> memb <- cutree(hc, k=8)
> table(observedLabels, memb)
              memb
observedLabels  1  2  3  4  5  6  7  8
             1 10  0  0  0  0  0  0  0
             2  0  4  3  2  1  0  0  0
             3  0  0  0  0  0  6  4  0
             4  0  0  0  0  0  0  0 10
             5  0  0  0  0  0  0 10  0
             6  0  0  0  0  0  0  0 10

Hierarchical Clustering with DTW Distance

[Figure: dendrogram of the 60 sampled series under DTW distance; leaves are labelled with the observed class IDs (1 to 6); y-axis Height (0 to 1000)]

Time Series Classification

To build a classification model based on labelled time series and then use the model to predict the label of unlabelled time series
Feature Extraction
  Singular Value Decomposition (SVD)
  Discrete Fourier Transform (DFT)
  Discrete Wavelet Transform (DWT)
  Piecewise Aggregate Approximation (PAA)
  Perceptually Important Points (PIP)
  Piecewise Linear Representation
  Symbolic Representation

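As an example of feature extraction, Piecewise Aggregate Approximation replaces a series by the means of equal-length segments. A minimal base R sketch, assuming the series length is a multiple of the number of segments; the function name paa is hypothetical.

```r
paa <- function(x, segments) {
  stopifnot(length(x) %% segments == 0)
  # Each column of the matrix holds one segment; take its mean
  colMeans(matrix(x, nrow = length(x) / segments))
}

x <- c(1, 3, 2, 4, 10, 12, 7, 9)
paa(x, 4)   # 8 values reduced to 4 segment means
```

The reduced vectors can then be used as input features for any standard classifier.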
Decision Tree (ctree)

ctree from package party

> classId <- c(rep("1",100), rep("2",100),
+              rep("3",100), rep("4",100),
+              rep("5",100), rep("6",100))
> # the response must be a factor for ctree to build a classification tree
> newSc <- data.frame(classId=factor(classId), sc)
> library(party)
> ct <- ctree(classId ~ ., data=newSc,
+             controls = ctree_control(minsplit=20,
+                                      minbucket=5, maxdepth=5))

Decision Tree

> pClassId <- predict(ct)
> table(classId, pClassId)
       pClassId
classId   1   2   3   4   5   6
      1 100   0   0   0   0   0
      2   1  97   2   0   0   0
      3   0   0  99   0   1   0
      4   0   0   0 100   0   0
      5   4   0   8   0  88   0
      6   0   3   0  90   0   7
> # accuracy
> (sum(classId==pClassId)) / nrow(sc)
[1] 0.8183333

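The overall accuracy hides where the errors are. The sketch below re-enters the confusion table from the slide by hand and computes the per-class recall; note that predict(ct) without new data scores the training set, so these are in-sample figures. The names conf, accuracy and recall are chosen here for illustration.

```r
# Confusion table copied from the slide (rows: true class, columns: predicted)
conf <- matrix(c(100,  0,  0,   0,  0, 0,
                   1, 97,  2,   0,  0, 0,
                   0,  0, 99,   0,  1, 0,
                   0,  0,  0, 100,  0, 0,
                   4,  0,  8,   0, 88, 0,
                   0,  3,  0,  90,  0, 7),
               nrow = 6, byrow = TRUE,
               dimnames = list(classId = 1:6, pClassId = 1:6))

accuracy <- sum(diag(conf)) / sum(conf)   # share of correct predictions
recall   <- diag(conf) / rowSums(conf)    # share of each true class recovered

round(accuracy, 4)
round(recall, 2)
```

Class 6 (downward shift) is almost always predicted as class 4 (decreasing trend), which accounts for most of the errors.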
DWT (Discrete Wavelet Transform)

Wavelet transform provides a multi-resolution representation using wavelets.
Haar Wavelet Transform: the simplest DWT
  http://dmr.ath.cx/gfx/haar/
DFT (Discrete Fourier Transform): another popular feature extraction technique

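One level of the Haar transform is simple enough to write by hand: scaled pairwise sums give the smooth part and scaled pairwise differences give the detail part. This is a sketch under one common sign and scaling convention, which may differ from the one used by a given package; the function names haar_step and haar_inverse are hypothetical.

```r
haar_step <- function(x) {
  stopifnot(length(x) %% 2 == 0)
  odd  <- x[seq(1, length(x), by = 2)]
  even <- x[seq(2, length(x), by = 2)]
  list(V = (odd + even) / sqrt(2),   # scaling (smooth) coefficients
       W = (even - odd) / sqrt(2))   # wavelet (detail) coefficients
}

haar_inverse <- function(h) {
  x <- numeric(2 * length(h$V))
  x[seq(1, length(x), by = 2)] <- (h$V - h$W) / sqrt(2)
  x[seq(2, length(x), by = 2)] <- (h$V + h$W) / sqrt(2)
  x
}

x <- c(4, 6, 10, 12)
h <- haar_step(x)
h
haar_inverse(h)   # recovers the original series
```

Applying haar_step again to the V coefficients gives the next, coarser level of the multi-resolution representation.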
DWT (Discrete Wavelet Transform)

> # extract DWT (with Haar filter) coefficients
> library(wavelets)
> wtData <- NULL
> for (i in 1:nrow(sc)) {
+   a <- t(sc[i,])
+   wt <- dwt(a, filter="haar", boundary="periodic")
+   # keep all wavelet coefficients plus the coarsest scaling coefficients
+   wtData <- rbind(wtData,
+                   unlist(c(wt@W, wt@V[[wt@level]])))
+ }
> wtData <- as.data.frame(wtData)
> # the response must be a factor for ctree to build a classification tree
> wtSc <- data.frame(classId=factor(classId), wtData)

Decision Tree with DWT

> ct <- ctree(classId ~ ., data=wtSc, controls =
+             ctree_control(minsplit=20,
+                           minbucket=5, maxdepth=5))
> pClassId <- predict(ct)
> table(classId, pClassId)
       pClassId
classId  1  2  3  4  5  6
      1 98  2  0  0  0  0
      2  1 99  0  0  0  0
      3  0  0 81  0 19  0
      4  0  0  0 74  0 26
      5  0  0 16  0 84  0
      6  0  0  0  3  0 97
> (sum(classId==pClassId)) / nrow(wtSc)
[1] 0.8883333

> plot(ct, ip_args=list(pval=FALSE), ep_args=list(digits=0))

[Figure: the fitted tree; inner nodes split on the DWT features V57, W43, W5, W31 and W22; each of the 11 terminal nodes (n between 6 and 103) shows a bar chart of the class distribution over classes 1 to 6]

k-NN Classification

find the k nearest neighbours of a new instance
label it by majority voting
needs an efficient indexing structure for large datasets

> k <- 20
> # a new series: series 501 plus one random offset per time point (60 values)
> newTS <- sc[501,] + runif(60)*15
> distances <- dist(newTS, sc, method="DTW")
> s <- sort(as.vector(distances), index.return=TRUE)
> # class IDs of k nearest neighbours
> table(classId[s$ix[1:k]])

 4  6
 3 17

Results of Majority Voting
  Label of newTS ← class 6

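The voting step can be wrapped in a small function. The sketch below uses plain Euclidean distance instead of DTW so that it runs in base R without extra packages; the function name knn_predict and the toy data are made up for illustration.

```r
knn_predict <- function(train, labels, new, k) {
  # Euclidean distance from the new instance to every training row
  d <- sqrt(rowSums(sweep(train, 2, new)^2))
  nn <- order(d)[1:k]            # indices of the k nearest neighbours
  votes <- table(labels[nn])     # count class labels among them
  names(votes)[which.max(votes)] # majority class
}

train <- rbind(c(0, 0), c(0, 1), c(1, 0),
               c(10, 10), c(10, 11), c(11, 10))
labels <- c("a", "a", "a", "b", "b", "b")

knn_predict(train, labels, new = c(9, 9), k = 3)
```

This brute-force version compares the new instance with every training series, which is why an indexing structure matters for large datasets.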
Functions - Construction, Plot & Smoothing

Construction
  ts() create time-series objects (stats)
Plot
  plot.ts() plot time-series objects (stats)
Smoothing & Filtering
  smoothts() time series smoothing (ast)
  sfilter() remove seasonal fluctuation using moving average (ast)

Functions - Decomposition & Forecasting

Decomposition
  decomp() time series decomposition by square-root filter (timsac)
  decompose() classical seasonal decomposition by moving averages (stats)
  stl() seasonal decomposition of time series by loess (stats)
  tsr() time series decomposition (ast)
  ardec() time series autoregressive decomposition (ArDec)
Forecasting
  arima() fit an ARIMA model to a univariate time series (stats)
  predict.Arima() forecast from models fitted by arima (stats)

Packages

timsac: time series analysis and control program
ast: time series analysis
ArDec: time series autoregressive-based decomposition
ares: a toolbox for time series analyses using generalized additive models
dse: tools for multivariate, linear, time-invariant, time series models
forecast: displaying and analysing univariate time series forecasts
dtw: Dynamic Time Warping, find optimal alignment between two time series
wavelets: wavelet filters, wavelet transforms and multiresolution analyses

Online Resources

An R Time Series Tutorial
  http://www.stat.pitt.edu/stoffer/tsa2/R_time_series_quick_fix.htm
Time Series Analysis with R
  http://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_intro.pdf
Using R (with applications in Time Series Analysis)
  http://people.bath.ac.uk/masgs/time%20series/TimeSeriesR2004.pdf
CRAN Task View: Time Series Analysis
  http://cran.r-project.org/web/views/TimeSeries.html
R Functions for Time Series Analysis
  http://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf
R Reference Card for Data Mining; R and Data Mining: Examples and Case Studies
  http://www.rdatamining.com/
Time Series Analysis for Business Forecasting
  http://home.ubalt.edu/ntsbarsh/stat-data/Forecast.htm

Conclusions

Time series decomposition and forecasting: many R functions and packages available
Time series classification and clustering: no R functions or packages specially for this purpose; have to work it out by yourself
Time series classification: extract and build features, and then apply existing classification techniques, such as SVM, k-NN, neural networks, regression and decision trees
Time series clustering: work out your own distance/similarity metrics, and then use existing clustering techniques, such as k-means and hierarchical clustering
Techniques specially for classifying/clustering time series data: a lot of research publications, but no R implementations (as far as I know)

The End

Email: yanchangzhao@gmail.com
RDataMining: http://www.rdatamining.com
Twitter: http://twitter.com/rdatamining
Group on LinkedIn: http://group.rdatamining.com
Group on Google: http://group2.rdatamining.com