Neural Networks For Time Series Prediction
Prediction
[Figure: a process P emitting a signal x(t)]
Examples of Time Series
sunspot activity
Discrete Phenomena
Continuous Phenomena
Nyquist Sampling Theorem
Studying Time Series
In addition to describing either discrete or continuous phenomena,
time series can also be deterministic vs stochastic, governed by linear
vs nonlinear dynamics, etc.
[Figure: example prediction targets, e.g. "price will go up" vs. "no change"]
Embedding
Using the Past to Predict the Future
[Figure: a tapped delay line; delay elements hold x(t-1), x(t-2), ..., x(t-T), which together with x(t) feed a predictor f producing x̂(t+1)]
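As a concrete illustration (not from the original slides), the tapped delay line amounts to building a matrix of lagged inputs; delay_embed and the toy signal below are invented for the example:

    import numpy as np

    def delay_embed(x, T):
        """For each t, inputs [x[t-1], ..., x[t-T]] and target x[t]."""
        X = np.array([x[t - T:t][::-1] for t in range(T, len(x))])
        y = x[T:]
        return X, y

    x = np.sin(0.3 * np.arange(100))   # toy signal
    X, y = delay_embed(x, T=5)
    print(X.shape, y.shape)            # (95, 5) (95,)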
Linear Systems
It's possible that P, the process whose output we are trying to predict, is governed by linear dynamics.
There are two basic filter architectures, known as the FIR filter and
the IIR filter.
Finite Impulse Response (FIR) Filters
Characterized by q + 1 coefficients:

x[t] = \sum_{i=0}^{q} \beta_i \, u[t-i] \qquad (3)

FIR filters implement the convolution of the input signal with a given coefficient vector \{\beta_i\}. They are known as Finite Impulse Response because, when the input u[t] is the impulse function, the output x is only as long as q + 1, which must be finite.
[Figure: three example FIR impulse responses; each dies out after q + 1 samples]
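A minimal sketch of equation (3), with made-up coefficients b standing in for {β_i}; feeding in an impulse shows the response is exactly q + 1 samples long:

    import numpy as np

    def fir(b, u):
        """x[t] = sum_i b[i] * u[t-i]; b holds the q+1 taps."""
        q = len(b) - 1
        u_pad = np.concatenate([np.zeros(q), u])   # zeros before t = 0
        return np.array([u_pad[t:t + q + 1][::-1] @ b for t in range(len(u))])

    b = np.array([1.0, 0.5, 0.25])       # q = 2
    impulse = np.zeros(10); impulse[0] = 1.0
    print(fir(b, impulse))               # nonzero only for the first q+1 = 3 samples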
Infinite Impulse Response (IIR) Filters

Characterized by p coefficients:

x[t] = \sum_{i=1}^{p} \alpha_i \, x[t-i] + u[t] \qquad (4)

In IIR filters, the input u[t] contributes directly to x[t] at time t, but, crucially, x[t] is otherwise a weighted sum of its own past samples.
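A corresponding sketch of equation (4); with even one feedback tap the impulse response decays forever rather than cutting off (the names are illustrative):

    import numpy as np

    def iir(a, u):
        """x[t] = sum_i a[i] * x[t-i] + u[t]; a holds the p feedback taps."""
        p = len(a)
        x = np.zeros(len(u))
        for t in range(len(u)):
            past = [x[t - i] if t - i >= 0 else 0.0 for i in range(1, p + 1)]
            x[t] = np.dot(a, past) + u[t]
        return x

    impulse = np.zeros(10); impulse[0] = 1.0
    print(iir(np.array([0.5]), impulse))   # 1, 0.5, 0.25, ... never exactly zero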
FIR and IIR Differences
In DSP notation:
[Figure: signal-flow graphs. FIR: x[t] = β_0 u[t] + β_1 u[t−1] + ... + β_q u[t−q]. IIR: x[t] = u[t] + α_1 x[t−1] + ... + α_p x[t−p]]
DSP Process Models
Autoregressive (AR[p]) Models
An AR[p] model assumes that at its heart is an IIR filter applied to some (unknown) internal signal, ε[t]; p is the order of that filter.

x[t] = \sum_{i=1}^{p} \alpha_i \, x[t-i] + \varepsilon[t] \qquad (5)
Estimating AR[p] Parameters
Batch version:
x[t] \approx \hat{x}[t] \qquad (8)

\hat{x}[t] = \sum_{i=1}^{p} w_i \, x[t-i] \qquad (9)

\begin{pmatrix} x[p+1] \\ x[p+2] \\ \vdots \end{pmatrix} =
\begin{pmatrix} x[p] & x[p-1] & \cdots & x[1] \\ x[p+1] & x[p] & \cdots & x[2] \\ \vdots & \vdots & & \vdots \end{pmatrix}
\underbrace{\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_p \end{pmatrix}}_{w} \qquad (10)
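Equation (10) is an ordinary least-squares problem; a sketch using numpy (the synthetic AR[3] process and all names here are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    p, n = 3, 500
    true_w = np.array([0.6, -0.2, 0.1])
    x = np.zeros(n)
    for t in range(p, n):                        # simulate an AR[3] process
        x[t] = true_w @ x[t - p:t][::-1] + 0.1 * rng.standard_normal()

    A = np.array([x[t - p:t][::-1] for t in range(p, n)])   # rows: x[t-1..t-p]
    b = x[p:]                                               # targets: x[t]
    w_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(w_hat)                                 # close to true_w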
Estimating AR[p] Parameters
As before,

x[t] \approx \hat{x}[t] = \sum_{i=1}^{p} w_i \, x[t-i]
Moving Average (MA[q]) Models
An MA[q] model assumes that at its heart is an FIR filter applied to some (unknown) internal signal, ε[t]; q is the order of that filter.
Autoregressive Moving Average (ARMA[p, q]) Models
A combination of the AR[p] and MA[q] models:
x[t] = \sum_{i=1}^{p} \alpha_i \, x[t-i] + \sum_{i=1}^{q} \beta_i \, \hat{\varepsilon}[t-i] + \varepsilon[t] \qquad (12)

\hat{\varepsilon}[t-i] = x[t-i] - \hat{x}[t-i] \qquad (13)
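A sketch of one-step prediction under equations (12) and (13), with the coefficients assumed known (estimating them is a separate problem); the residuals are fed back as they are computed:

    import numpy as np

    def arma_predict(x, alpha, beta):
        """x_hat[t] from past x and past residuals eps = x - x_hat."""
        p, q = len(alpha), len(beta)
        x_hat, eps = np.zeros(len(x)), np.zeros(len(x))
        for t in range(len(x)):
            ar = sum(alpha[i - 1] * x[t - i] for i in range(1, p + 1) if t >= i)
            ma = sum(beta[i - 1] * eps[t - i] for i in range(1, q + 1) if t >= i)
            x_hat[t] = ar + ma
            eps[t] = x[t] - x_hat[t]             # equation (13)
        return x_hat

    x = np.random.default_rng(0).standard_normal(20).cumsum()
    print(arma_predict(x, alpha=[0.9], beta=[0.3])[:5])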
Linear DSP Models as Linear NNs
DSP Filter    DSP Model    NN Connections
FIR           MA[q]        feedforward
IIR           AR[p]        recurrent
[Figure: AR[p] as a linear recurrent NN; weights α_1, α_2, ..., α_p on the delayed inputs x(t−1), x(t−2), ..., x(t−p)]
Nonlinear AR[p] Models
Non-linear models are more powerful, but they need more training data and are less well behaved (overfitting, local minima, etc.); a sketch follows below.
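For example, replacing the linear weighted sum with a small MLP over the same delay-line inputs gives a nonlinear AR[p] model; this sketch uses scikit-learn purely for illustration (the signal, lag p, and layer sizes are arbitrary choices):

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    x = np.sin(0.3 * np.arange(300)) + 0.05 * rng.standard_normal(300)
    p = 5
    X = np.array([x[t - p:t][::-1] for t in range(p, len(x))])  # x[t-1..t-p]
    y = x[p:]

    f = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
    f.fit(X, y)                    # a nonlinear f in place of the linear sum
    print(f.predict(X[:3]), y[:3])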
Nonlinear ARMA[p, q] Models
[Figure: a nonlinear ARMA network; a nonlinear unit f, trained with backprop, receives the delayed inputs x[t−3], x[t−2], x[t−1] and the delayed residuals ε̂[t−3], ε̂[t−2], ε̂[t−1] (obtained by subtracting x̂ from x) and outputs the prediction x̂]
Jordan Nets
[Figure: a Jordan network; "plan" and "state" units feed a hidden layer, which feeds the output units; the output is fed back to the state units]
The Case for Alternative Memory Models
Uniform sampling is simple but has limitations.
[Figure: the tapped delay line again; f maps x(t), x(t−1), x(t−2), ..., x(t−T) to x̂(t+1)]
Can only look back T equispaced time steps. To look far into the
past, T must be large.
Writing each tap as a memory term:

\hat{x}_i[t] = x[t - i + 1] \qquad (15)

[Figure: the same delay line, with taps relabeled as memory terms x̂_2[t], x̂_3[t], ..., x̂_{T+1}[t] feeding f to produce x̂(t+1)]
Propose Non-uniform Sampling
Convolutional Memory Terms
Mozer has suggested treating each memory term as a convolution
of x[t] with a kernel function:
\hat{x}_i[t] = \sum_{\tau=1}^{t} c_i[t - \tau] \, x[\tau] \qquad (17)
[Figure: a kernel c_i[t] peaked near t = d_i]
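Equation (17) in code, using 0-based indexing; the Gaussian kernel peaked at d_i is only an illustrative choice, not Mozer's:

    import numpy as np

    def conv_memory(x, kernel):
        """x_hat_i[t] = sum_tau kernel(t - tau) * x[tau] (0-indexed here)."""
        return np.array([sum(kernel(t - tau) * x[tau] for tau in range(t + 1))
                         for t in range(len(x))])

    d_i = 4.0
    kernel = lambda s: np.exp(-0.5 * (s - d_i) ** 2)   # peaked near lag d_i
    x = np.random.default_rng(0).standard_normal(50)
    print(conv_memory(x, kernel)[:5])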
Exponential Trace Memory
The idea: remember past values as an exponentially decaying weighted average of the input:

c_i[t] = (1 - \mu_i) \, \mu_i^{t}

Each x̂_i uses a different decay rate μ_i.

[Figure: exponentially decaying kernel c_i[t]]
Exponential Trace Memory, cont'd

A nice feature: if all μ_i ≡ μ, we don't have to do the convolution at each time step; we can compute incrementally:

\hat{x}_i[t] = (1 - \mu) \, x[t] + \mu \, \hat{x}_i[t-1] \qquad (20)
[Figure: a network with plan and state units, hidden layer, and output, as in the Jordan net]
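Equation (20) is a one-line update per time step; a sketch (mu and the input sequence are arbitrary):

    def trace_update(x_t, trace, mu):
        """Equation (20): fold the newest sample into the decaying average."""
        return (1.0 - mu) * x_t + mu * trace

    trace = 0.0
    for x_t in [3.0, 1.0, 4.0, 1.0, 5.0]:
        trace = trace_update(x_t, trace, mu=0.9)
    print(trace)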
Special Case: Binary Sequences
With μ = 1/2, the memory x̂[t] is a bit string, treated as a floating-point (binary) fraction:

x[t] = {1}              x̂[t] = .1
x[t] = {1, 0}           x̂[t] = .01
x[t] = {1, 0, 0}        x̂[t] = .001
x[t] = {1, 0, 0, 1}     x̂[t] = .1001
x[t] = {1, 0, 0, 1, 1}  x̂[t] = .11001
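A quick check of the last row of the table (μ = 1/2): the trace equals the input bits, most recent first, read as a binary fraction.

    trace = 0.0
    for bit in [1, 0, 0, 1, 1]:
        trace = 0.5 * bit + 0.5 * trace
    print(trace)        # 0.78125 = .11001 in binary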
Memory Depth and Resolution
Gamma Memory (deVries & Principe)
c_i[t] = \begin{cases} \binom{t}{d_i} \, (1 - \mu_i)^{d_i + 1} \, \mu_i^{\,t - d_i} & \text{if } t \ge d_i \\ 0 & \text{otherwise} \end{cases} \qquad (21)

[Figure: gamma kernel c_i[t], peaked near t = d_i]
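Equation (21) in code; math.comb supplies the binomial coefficient (the values of d and mu below are arbitrary):

    from math import comb

    def gamma_kernel(t, d, mu):
        """Equation (21); reduces to the exponential trace when d = 0."""
        if t < d:
            return 0.0
        return comb(t, d) * (1.0 - mu) ** (d + 1) * mu ** (t - d)

    print([round(gamma_kernel(t, d=2, mu=0.5), 4) for t in range(8)])
    # peaked near t = d, unlike the monotone exponential trace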
Elman Nets

Can store any transformation we like. For example, can store the internal state of the NN.

[Figure: a network with plan and context units, a hidden layer, and output; the hidden layer is copied back to the context units]

Think of this as a 1-tap delay line storing f(x[t]), the hidden layer.
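A minimal sketch of that 1-tap delay on the hidden layer (all names and sizes invented for the example):

    import numpy as np

    def elman_step(x_t, context, W_in, W_ctx, W_out):
        """Hidden layer sees the input and its own previous values (the context)."""
        hidden = np.tanh(W_in @ x_t + W_ctx @ context)
        return W_out @ hidden, hidden     # new context = f(x[t]), the hidden layer

    rng = np.random.default_rng(0)
    W_in = rng.standard_normal((5, 3))    # 3 inputs, 5 hidden units, 1 output
    W_ctx = rng.standard_normal((5, 5))
    W_out = rng.standard_normal((1, 5))
    context = np.zeros(5)
    y, context = elman_step(rng.standard_normal(3), context, W_in, W_ctx, W_out)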
Horizon of Prediction
Three options:
Train to predict x[t + 1] only, but iterate to get x̂[t + s] for any s (a sketch follows below).
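A sketch of the iterated option; f is any one-step predictor mapping the last T values to x̂[t + 1] (the linear extrapolator below is only a stand-in):

    def iterate(f, history, s):
        """Feed one-step predictions back into the window to reach x_hat[t+s]."""
        window = list(history)
        for _ in range(s):
            window.append(f(window[-len(history):]))
        return window[-1]

    f = lambda w: 2 * w[-1] - w[-2]           # stand-in one-step predictor
    print(iterate(f, [1.0, 2.0, 3.0], s=4))   # errors compound as s grows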
Predicting Sunspot Activity
Fessant et al: the IR5 Sunspots Series
IR5[t] = \frac{1}{5} \left( R[t-3] + R[t-2] + R[t-1] + R[t] + R[t+1] \right)

where R[t] is the mean sunspot number for month t and IR5[t] is the desired index.
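The index is just a 5-month mean; a sketch (R here is a toy array standing in for the monthly sunspot means):

    import numpy as np

    def ir5(R, t):
        """IR5[t] = mean of R[t-3..t+1]; valid for 3 <= t <= len(R) - 2."""
        return (R[t - 3] + R[t - 2] + R[t - 1] + R[t] + R[t + 1]) / 5.0

    R = np.arange(12, dtype=float)      # toy monthly means
    print(ir5(R, 5))                    # (2+3+4+5+6)/5 = 4.0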
Fessant et al: Simple Feedforward NN
Output: {x̂[t], ..., x̂[t + 5]}  (1087 weights)
Fessant et al: Modular Feedforward NN
Output: x̂[t + 5]  (552 weights)

[Figure: modular network; intermediate predictions x̂[t], ..., x̂[t + 5] are combined into the final output x̂[t + 5]]
Fessant et al: Elman NN
Output: {x̂[t], ..., x̂[t + 5]}  (786 weights)
Fessant et al: Results