Abstract—A spiral recurrent neural network (SpiralRNN) has a special structure of the recurrent hidden layer which allows bounding the eigenvalues of the recurrent weight matrix. Thus, the network can learn characteristic temporal correlations online without running into dynamical instabilities. In this paper, the SpiralRNN is employed to solve the financial time series prediction problem of the NN5 competition.
We use these time series to demonstrate the performance of the SpiralRNN for data with pronounced weekly and seasonal periodicities. These periodicities are taken into account by pre-processing and by enlarging the input with additional sinusoidal time series of appropriate periodicities. The non-regular Easter holidays are taken into account by an additional Gaussian-shaped input signal centered at these holidays. The prediction performance is enhanced by a mixture of experts approach consisting of the combined output of 30 online learning SpiralRNNs, with weights inversely proportional to their temporally averaged one-step forecast error. The main advantage of this approach is the low configuration effort and the online learning capability. An evaluation based on a forecast of the last 56 data points of each of the 111 time series is provided.


Data Prediction

Huaien Gao 1,2, Rudolf Sollacher 2, Hans-Peter Kriegel 1

1- University of Munich, Germany

2- Siemens AG, Corporate Technology, Germany

I. INTRODUCTION

Time series prediction is a common task in various industry sectors, such as robotic control and the financial market. The NN5 competition¹ is one of the leading competitions with an emphasis on utilizing computational intelligence methods. The data in question come from the amounts of money withdrawn from ATMs across England. These data exhibit strong periodic (e.g. weekly, seasonal and yearly) behavior. The associated processes have deterministic and stochastic components. In general, they will not be stationary, as for example when more tourists visit an area or a new shopping mall opens. In this paper, we apply the online learning spiral recurrent neural network (SpiralRNN) [1], [2] to this prediction problem. Our approach focuses on the online learning capability and on keeping the configuration and preprocessing effort as low as possible.

The remainder of this paper is arranged as follows: Section II introduces the SpiralRNN structure; Section III discusses the adaptation of the SpiralRNN model to the prediction of the NN5 competition data; Section IV presents some evaluation results of forecasting the last 56 data points of the 111 time series.

This paper has been presented in the special session on the time series competition at the World Congress on Computational Intelligence (WCCI) 2008 in Hong Kong.
¹ http://www.neural-forecasting-competition.com/

II. SPIRAL RECURRENT NEURAL NETWORK

A. Hidden Units

A SpiralRNN [1], [2] is a recurrent neural network with a special recurrent layer structure which can be broken down into smaller units, namely "hidden units" or "spiral units". Each hidden unit receives signals from the input neurons and provides processed signals to the output neurons. In addition, the hidden neurons receive signals from other hidden neurons in the same unit, delayed by one time step. Fig. 1(a) illustrates a typical hidden unit with three input neurons and three output neurons, where the hidden layer structure is only shown symbolically. Note that the hidden neurons are fully connected to all input neurons and all output neurons. More details of the connections inside the hidden layer are shown in Fig. 1(b), where the connections from only one particular neuron to all other neurons in the hidden unit are displayed. With all neurons in the hidden unit aligned clockwise on a circle, the connection weights are defined such that the connection from one neuron to its first clockwise neighbor has value β₁, the connection to its second clockwise neighbor has value β₂, and so on. This definition is applied to all neurons, so that all connections from neurons to their respective first clockwise neighbors have the identical weight β₁, all connections from neurons to their second clockwise neighbors have value β₂, and so on.

The corresponding hidden-weight matrix M is shown in eq. (1). Its matrix elements are determined by a vector $\vec\beta \in \mathbb{R}^{(u-1)\times 1}$, where u refers to the number of hidden neurons in the hidden unit:

$$M = \begin{pmatrix} 0 & \beta_1 & \beta_2 & \cdots & \beta_{u-1} \\ \beta_{u-1} & 0 & \beta_1 & \cdots & \beta_{u-2} \\ \beta_{u-2} & \beta_{u-1} & 0 & \cdots & \beta_{u-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \beta_1 & \beta_2 & \beta_3 & \cdots & 0 \end{pmatrix}, \qquad P = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 1 & 0 & 0 & \cdots & 0 \end{pmatrix} \quad (1)$$

Furthermore, matrix M can be decomposed into iterated permutations described by a matrix P:

$$M = \beta_1 P + \beta_2 P^2 + \ldots + \beta_{u-1} P^{u-1}, \qquad P \in \mathbb{R}^{u \times u} \quad (2)$$

It is obvious that matrix P² is also a permutation matrix, shifting a multiplier vector by two positions; similarly, Pᵘ shifts it by u positions.
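The construction in eqs. (1)–(2) is easy to verify numerically. Below is a minimal sketch (not from the paper; function and variable names are illustrative) that assembles M from a weight vector β⃗ via powers of the cyclic permutation P and checks that shifting by u positions returns a vector to itself:

```python
import numpy as np

def spiral_weight_matrix(beta):
    """Assemble M = sum_i beta[i-1] * P^i for one spiral unit (eqs. 1-2).

    beta has u-1 entries; P is the u x u cyclic up-shift permutation.
    """
    u = len(beta) + 1
    # P has ones on the first superdiagonal and in the bottom-left corner.
    P = np.roll(np.eye(u), 1, axis=1)
    M = np.zeros((u, u))
    Pi = np.eye(u)
    for b in beta:
        Pi = Pi @ P          # successive powers P^1, P^2, ...
        M += b * Pi
    return M, P

beta = [0.3, -0.2, 0.5]      # u = 4 hidden neurons, hypothetical weights
M, P = spiral_weight_matrix(beta)

# First row is (0, beta_1, ..., beta_{u-1}); every row is a cyclic shift.
print(M)
# Shifting u times is the identity, so P^(u+1) = P.
u = len(beta) + 1
assert np.allclose(np.linalg.matrix_power(P, u), np.eye(u))
```

The loop mirrors the decomposition (2) directly; the circulant layout of eq. (1) emerges automatically from the powers of P.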

Fig. 2. The typical structure of SpiralRNNs. Note that all hidden units have the same basic topology (although the number of hidden neurons in the hidden units can differ), as shown in Fig. 1, and are separated from each other, whereas the input and output connections are fully connected to the hidden neurons.

Fig. 1. (a) The structure of a hidden unit with 3 input neurons and 3 output neurons; (b) structure of a hidden unit, where only the outgoing connections from one neuron are shown; connections from other neurons have the same structure and weights.

Since Pᵘ up-shifts the multiplier vector by u positions, it follows that

$$P^{u+1} = P$$

Now, the eigenvalue $\hat\lambda_k$ of any permutation matrix $P^i$ ($i \in \mathbb{N}^+$) satisfies [3]:

$$|\hat\lambda_k| = 1, \qquad k = 1, \ldots, u$$

Therefore, the maximum absolute eigenvalue of matrix M is bounded, such that the relation in (3) holds:

$$|\lambda_u| \le \sum_{i=1}^{u-1} |\beta_i| \quad (3)$$

A suitable parameterization of the vector β⃗ by a predefined value $\gamma \in \mathbb{R}^+$ and a trainable vector ξ⃗ is the following:

$$\vec\beta = \gamma \tanh(\vec\xi) \quad (4)$$

Now, the matrix M can be rewritten as

$$M = \sum_{i=1}^{u-1} \gamma \tanh(\xi_i) P^i,$$

and the relation (3) simplifies to the following relation:

$$|\lambda_u| \le \gamma \sum_{i=1}^{u-1} |\tanh(\xi_i)| \le \gamma (u-1) \quad (5)$$

B. SpiralRNNs

The construction of SpiralRNNs is generally based on spiral hidden units. It simply concatenates several hidden units together and fully connects all hidden neurons to all input and output neurons. Note that the hidden units are separated from each other, i.e. there are no interconnections between any hidden neuron of one hidden unit and any hidden neuron of another hidden unit (see Fig. 2). The hidden-weight matrix W_hid of the entire network is a block-diagonal matrix with each sub-block corresponding to one particular hidden unit. Note that the sizes of the different sub-blocks M_i can differ from each other.

For such a block-diagonal structure, the constraint upon the eigenvalue spectrum of the hidden-weight matrix W_hid can easily be derived:

$$|\lambda| \le \max_k \left\{ \|\vec\beta^{(k)}\|_{\mathrm{taxi}} \right\}, \qquad k \in [1, \ldots, n_{\mathrm{units}}] \quad (6)$$

With this structure, a SpiralRNN can have a bounded eigenvalue spectrum of the hidden-weight matrix, as in echo state networks (ESN) [4], while still remaining trainable like the simple recurrent networks (SRN) introduced by Elman [5].

C. On-line training

On-line training of a SpiralRNN is conducted with the extended Kalman filter (EKF). The EKF is an extension of the Kalman filter, which is an optimal linear estimator with the following equations:

$$P_t^{\dagger} = P_{t-1} + Q_t$$
$$P_t = \left( (P_t^{\dagger})^{-1} + H^T R_t^{-1} H \right)^{-1}$$
$$w_t = w_t^- + P_t H^T R_t^{-1} \left( \hat y_t - H w_t^- \right)$$

where $w_t$ is the parameter set to be optimized, and the matrices P, Q, R, with initializations $\{1, 10^{-8}, 1\} \times I_d$,² are parameters of the Kalman filter. During the on-line training of the SpiralRNN with the NN5 competition data, we have fixed the Q and R matrices at their initialization values. H is the gradient of the error w.r.t. the parameter set, which has a special form because of the definition of the SMAPE error value in the NN5 competition, as in equation (7), where $F_t^*$ is the data and $y_t^*$ is the prediction:

$$E_{\mathrm{smape}} = \frac{1}{n} \sum_t \frac{|y_t^* - F_t^*|}{(y_t^* + F_t^*)/2} \times 100\% \quad (7)$$

² $I_d$ refers to the identity matrix.
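As a sanity check on the bounds (3)–(6), the following sketch (illustrative code, not from the paper) parameterizes β⃗ = γ tanh(ξ⃗) as in eq. (4), builds a block-diagonal W_hid from two spiral units of different sizes, and confirms numerically that the spectral radius stays below max_k ‖β⃗⁽ᵏ⁾‖_taxi ≤ γ(u−1):

```python
import numpy as np

def spiral_block(gamma, xi):
    """Hidden-weight block M = sum_i gamma*tanh(xi_i)*P^i (eqs. 2 and 4)."""
    u = len(xi) + 1
    P = np.roll(np.eye(u), 1, axis=1)   # cyclic up-shift permutation
    beta = gamma * np.tanh(np.asarray(xi, dtype=float))
    M = np.zeros((u, u))
    Pi = np.eye(u)
    for b in beta:
        Pi = Pi @ P
        M += b * Pi
    return M, beta

rng = np.random.default_rng(0)
gamma = 0.4
# Two spiral units of different sizes (sub-blocks may differ, cf. Sec. II-B).
blocks, norms = [], []
for u in (4, 6):
    M, beta = spiral_block(gamma, rng.normal(size=u - 1))
    blocks.append(M)
    norms.append(np.abs(beta).sum())    # ||beta^(k)||_taxi, the L1 norm

# Block-diagonal hidden-weight matrix W_hid of the whole network.
W_hid = np.zeros((10, 10))
W_hid[:4, :4] = blocks[0]
W_hid[4:, 4:] = blocks[1]

spectral_radius = np.abs(np.linalg.eigvals(W_hid)).max()
# Eq. (6): |lambda| <= max_k ||beta^(k)||_taxi, itself <= gamma * (u_max - 1).
assert spectral_radius <= max(norms) + 1e-9
assert max(norms) <= gamma * (6 - 1)
```

Because the spectrum of a block-diagonal matrix is the union of the spectra of its blocks, the per-unit bound (5) carries over to the whole network, which is the content of eq. (6).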
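The three EKF equations of Section II-C can be sketched as follows; this is a generic single update step under the stated initialization {1, 10⁻⁸, 1} × I_d, with a hypothetical gradient H and innovation (neither is derived from an actual network here):

```python
import numpy as np

def ekf_step(w, P, H, innovation, Q, R):
    """One EKF parameter update following the three equations in Sec. II-C.

    w: parameter vector; P: d x d covariance; H: 1 x d error gradient;
    innovation: the residual (y_hat - H w) as a 1-vector; Q, R kept fixed.
    """
    P_dag = P + Q                                   # P_t^dagger = P_{t-1} + Q_t
    P_new = np.linalg.inv(np.linalg.inv(P_dag) + H.T @ np.linalg.inv(R) @ H)
    w_new = w + P_new @ H.T @ np.linalg.inv(R) @ innovation
    return w_new, P_new

d = 3
w = np.zeros(d)
P = 1.0 * np.eye(d)          # P initialized to 1 * I_d
Q = 1e-8 * np.eye(d)         # Q initialized to 1e-8 * I_d (kept fixed)
R = 1.0 * np.eye(1)          # R initialized to 1 * I_d (kept fixed)

H = np.array([[0.5, -1.0, 0.2]])        # hypothetical error gradient
innovation = np.array([0.7])            # hypothetical residual
w, P = ekf_step(w, P, H, innovation, Q, R)
```

In the paper's setting H would be the SMAPE-derived gradient of eq. (9); explicitly inverting P is fine at this toy size, while a production implementation would use the standard Kalman gain form instead.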

As will be mentioned in a later section, we use the logarithm operator to transform the data into a reasonable range before the data are fed into the neural network. Therefore, the gradient H is calculated as in equation (9), where $y_t$ and $F_t$ are the corresponding values of $y_t^*$ and $F_t^*$ in logarithmic scale and $e_t$ is the on-line training one-step forecast error:

$$s = \exp(y_t) + \exp(F_t) \quad (8)$$

$$H_t = -\frac{\exp(y_t)}{s} \left( \operatorname{sign}(e_t) + \frac{|e_t|}{s} \right) \frac{\partial e_t}{\partial w_t} \quad (9)$$

As there exist data with empty values, whenever such empty values would be the target for training, we do not implement the parameter update but still accumulate the gradient of the output w.r.t. the parameters until data values become available again.

III. TOWARDS THE NN5 COMPETITION

Theoretically, a SpiralRNN can learn the dynamic characteristics of the given data by itself. Being aware of the features of the given data, additional inputs can help to find a more accurate solution and to speed up convergence. These efforts include: (1) providing more information as input to the neural network; (2) using a committee of experts approach on top of the neural network training. The use of these efforts is based on the characteristics of the given dataset of the NN5 competition.

A. Data characteristics

The time series data in the NN5 dataset exhibit at least the following features:

F1 A strong weekly period dominates the frequency spectrum, usually with higher values on Thursday and/or Friday;
F2 Important holidays such as the Christmas holidays (including the New Year holiday) and the Easter holidays have a visible impact on the data dynamics;
F3 Several of the time series, such as time series No. 9 and No. 89, show strong seasonal behavior;
F4 Some of the time series (like No. 26 and No. 48) show a sudden change in their statistics, e.g. a shift in the mean value.

B. Pre-processing and additional inputs

The data presented to the neural network are mapped to a useful range by the logarithm function. In order to avoid singularities due to original zero values, we replace them by small positive random values.

Additional sinusoidal inputs are provided as a representation of calendar information. These additional inputs include:

1) Weekly behavior, addressing feature F1. Refer to figure-3(a) and note that the period is equal to 7.
2) Christmas and seasonal behavior, addressing features F2 and F3. It is often observed from the dataset that, right after the Christmas holidays, withdrawal of money was rare; it then increased along the year and finally reached its summit right before Christmas. Seasonal features do not prevail in the dataset, but they do exist in several of the time series, e.g. No. 9 and No. 88. As both are regular features with a yearly period, it makes sense to provide an additional input as shown in figure-3(b), which has the period value 365.
3) An Easter holiday bump, addressing feature F2. The Easter holidays did not have as much impact on the data dynamics as the Christmas holidays did, but they did provide a certain stimulation of ATM usage in some areas (shown in some time series). Furthermore, as the 56-step prediction interval includes the Easter holidays of the year 1998, the prediction over the holiday can be improved when the related data behavior is learnt. This additional input uses a Gaussian-shaped curve to emulate the Easter holiday bump, as in figure-3(c).

Fig. 3. Additional inputs of the neural networks: (a) weekly input, (b) Christmas input, (c) Easter input. On the X-axis are the time steps; on the Y-axis is the additional input value.

C. Hybrid with expertization

A SpiralRNN is capable of learning time series prediction with fast convergence; nevertheless, the learned weights correspond to local minima of the error landscape, as mentioned in [9]. As computational complexity is not an issue for this competition, we apply a mixture of experts ansatz.

The committee of experts consists of SpiralRNN models with identical structure but different initializations of the parameter values. Each SpiralRNN model operates in parallel without any interference with the others. During the on-line training, a filtered value of the training error is recorded according to equation (10), with $e_t$ referring to the one-step on-line training error and α = 0.01:

$$\varepsilon \leftarrow \alpha \varepsilon + (1-\alpha) e_t^2 \quad (10)$$

The reciprocal of this filtered value at the end of the on-line training determines the weight of the corresponding expert's vote in the committee. After the on-line training, the autonomous predictions of all models are combined based on their ε values. This procedure is shown in Table I.

TABLE I. Committee of experts.
0. Initialize the n experts;
1. For each SpiralRNN model k, implement on-line training with the data and make a prediction $y_{t,k}$; meanwhile calculate the filtered error value $\varepsilon_k$;
2. Based on their ε values, combine the predictions, such that:
   $y_t = \frac{1}{\varphi} \sum_{k=1}^{n} y_{t,k}/\varepsilon_k$, with $\varphi = \sum_{k=1}^{n} 1/\varepsilon_k$.

IV. RESULTS

Some results of the prediction are shown in this section, indicating the performance of the SpiralRNN model. In figure-4, prediction and data are displayed together, where the X-axis is the time step and the Y-axis is the value. It is shown that the prediction has a period of 7, as does the data; furthermore, the prediction recognizes not only the main peak within the period but also the smaller bump.

Fig. 4. Comparison between result and data, in terms of weekly behavior. The dashed line with circles is the data and the solid line with squares is the prediction.

Seasonal behavior of the data is also learnt and predicted, as shown in figure-5. The curves in both sub-plots begin with the values in the Christmas holidays (with time indices around 280 and 650). It is observed from the data in figure-5(a) that the data values formed an arch in the first season (90 days) after the Christmas holidays and continued with another arch in the second season. Figure-5(b) shows that the model is able to capture the alternation of the seasons. Note that, in figure-5(b), the data and the prediction overlap, where the prediction is displayed with the black solid line and the data are shown by the dashed line.

Fig. 5. Comparison between result and data, in terms of seasonal behavior: (a) seasonal data, (b) seasonal result. The dashed line is the data and the solid line is the prediction.

The Easter holidays can also be recognized by the trained model, as shown in figure-6. The Easter holidays from 1996 to 1998 are indexed at positions around 20, 375 and 755. In figure-6, the prediction for the Easter holidays of 1998 follows the data values of the Easter holidays of 1996 and 1997, predicting a spike in the throughput of the ATM machine.

Fig. 6. Comparison between result and data, in terms of Easter behavior. The dashed line is the data and the solid line is the prediction.

Table II shows the SMAPE errors and their variances for the hybrid approach on the testing dataset (i.e. the data from the last 56 time steps) with a varied number of members in the expert committee. The table shows that the number of experts hardly alters the average result; this, on the other hand, saves the effort of utilizing a large number of experts and is favourable for distributed sensor network applications.

TABLE II. Statistical results: the average SMAPE error value and its variance for the expert committee on all 111 time series, given different numbers of expert members.

# experts | 3     | 5     | 10    | 15    | 20    | 30
SMAPE     | 20.65 | 20.15 | 20.41 | 20.96 | 20.58 | 20.38
variance  | 2.45  | 2.82  | 2.78  | 3.30  | 3.16  | 3.30
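The additional calendar inputs of Section III-B, together with the logarithmic pre-processing, can be sketched as follows. The holiday centers (20, 375, 755) are taken from Section IV; the width of the Gaussian bump and the range of the random replacement values are illustrative assumptions, not values from the paper:

```python
import numpy as np

T = 800                                   # number of time steps in the series
t = np.arange(T)

# Log pre-processing: replace zeros by small positive random values first,
# then map to a useful range with the logarithm (Sec. III-B).
rng = np.random.default_rng(0)
def preprocess(series):
    series = np.asarray(series, dtype=float).copy()
    zeros = series == 0
    series[zeros] = rng.uniform(1e-3, 1e-2, zeros.sum())  # hypothetical range
    return np.log(series)

# Additional inputs representing calendar information:
weekly_input = np.sin(2 * np.pi * t / 7)          # feature F1, period 7
yearly_input = np.sin(2 * np.pi * t / 365)        # features F2/F3, period 365

# Gaussian-shaped Easter bump; holiday indices ~20, 375, 755 (Sec. IV).
sigma = 5.0                                        # illustrative width
easter_input = sum(np.exp(-0.5 * ((t - c) / sigma) ** 2) for c in (20, 375, 755))

extra_inputs = np.stack([weekly_input, yearly_input, easter_input], axis=1)
```

At each time step the network would then receive the (log-transformed) series value together with the corresponding row of `extra_inputs`.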
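The committee mechanism of eq. (10) and Table I reduces to a few lines; the error sequences and expert predictions below are hypothetical:

```python
import numpy as np

def filtered_error(errors, alpha=0.01, eps0=1.0):
    """Recursive error filter of eq. (10): eps <- alpha*eps + (1-alpha)*e_t^2."""
    eps = eps0
    for e in errors:
        eps = alpha * eps + (1 - alpha) * e ** 2
    return eps

def combine_predictions(predictions, eps_values):
    """Committee vote of Table I: y_t = (1/phi) * sum_k y_{t,k}/eps_k."""
    predictions = np.asarray(predictions, dtype=float)
    weights = 1.0 / np.asarray(eps_values, dtype=float)   # reciprocal weights
    phi = weights.sum()
    return (weights @ predictions) / phi

# Hypothetical one-step training errors of three experts and their votes:
eps = [filtered_error(e) for e in ([0.1, 0.2], [0.5, 0.4], [0.05, 0.1])]
y_committee = combine_predictions([21.0, 25.0, 20.0], eps)
# Experts with smaller filtered training error get proportionally larger weight.
```

The combined output is thus a convex combination of the expert predictions, so it always stays within their range.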

REFERENCES

[1] H. Gao, R. Sollacher, and H.-P. Kriegel, "Spiral recurrent neural network for online learning," in 15th European Symposium on Artificial Neural Networks: Advances in Computational Intelligence and Learning, Bruges (Belgium), April 2007.
[2] H. Gao and R. Sollacher, "Conditional prediction of time series using spiral recurrent neural network," in European Symposium on Artificial Neural Networks: Advances in Computational Intelligence and Learning, 2008.
[3] K. Wieand, "Eigenvalue distributions of random permutation matrices," The Annals of Probability, vol. 28, no. 4, pp. 1563–1587, 2000.
[4] H. Jaeger, "Adaptive nonlinear system identification with echo state networks," Advances in Neural Information Processing Systems, vol. 15, pp. 593–600, 2003.
[5] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
[6] R. Kalman, "A new approach to linear filtering and prediction problems," Transactions of the ASME–Journal of Basic Engineering, vol. 82, pp. 35–45, 1960.
[7] F. Lewis, Optimal Estimation: With an Introduction to Stochastic Control Theory. Wiley-Interscience, 1986, ISBN 0-471-83741-5.
[8] G. Welch and G. Bishop, "An introduction to the Kalman filter," University of North Carolina at Chapel Hill, Department of Computer Science, Tech. Rep. 95-041, 2002.
[9] R. Sollacher and H. Gao, "Efficient online learning with spiral recurrent neural networks," in International Joint Conference on Neural Networks, 2008 (to appear).
