Eric Plummer
Computer Science Department
University of Wyoming
February 15, 2010
Topics
• Thesis Goals
• Time Series Forecasting
• Neural Networks
• K-Nearest-Neighbor
• Test-Bed Application
• Empirical Evaluation
• Data Preprocessing
• Contributions
• Future Work
• Conclusion
• Demonstration
Empirical Evaluation – Data Series
[Charts: the Original series plotted alone and overlaid with the More Noisy, Less Noisy, and Ascending variants (value/count vs. data point), and the Sunspots 1784-1983 series (value vs. year).]
Empirical Evaluation – Neural Network Architectures
• Number of network inputs based on data series
• Need to make unambiguous examples
• For “sawtooths”:
– 24 inputs are necessary
– Test networks with 25 & 35 inputs
– Test networks with 1 hidden layer with 2, 10, & 20 hidden layer units
– One output layer unit
• For sunspots:
– 30 inputs
– 1 hidden layer with 30 units
• For real-world data series, selection may be trial-and-error!
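The idea of turning a series into training examples with a fixed number of inputs can be sketched as a sliding window (a minimal illustration only; the function name and toy series are not from the thesis, and a real sawtooth run would use the 24-input window named above):

```python
def make_examples(series, n_inputs):
    """Slide a window of n_inputs values across the series; each
    window is a training input and the next value is the target."""
    examples = []
    for i in range(len(series) - n_inputs):
        examples.append((series[i:i + n_inputs], series[i + n_inputs]))
    return examples

# A toy sawtooth with period 4; a 5-input window is used here for
# brevity (it is long enough to make each example unambiguous).
sawtooth = [0, 1, 2, 3] * 10
examples = make_examples(sawtooth, 5)
```

With too few inputs, two identical windows can precede different targets, which is exactly the ambiguity the slide warns about.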
[Charts: Heuristic NN, Simple NN, and Smaller NN forecasts from nets trained on the Original series, and k-nearest-neighbor forecasts on the Original series (value vs. data point). Legends: Original, 25,10, 25,20 (Smaller NN); Original, 2,20, 2,24, 2,30 (k-NN).]
Empirical Evaluation – Less Noisy Data Series
[Charts: Heuristic NN and Simple NN forecasts from nets trained on the Less Noisy series, and k-nearest-neighbor forecasts on the Less Noisy series (value vs. data point).]
Empirical Evaluation – More Noisy Data Series
[Charts: Heuristic NN and Simple NN forecasts from nets trained on the More Noisy series, and k-nearest-neighbor forecasts on the More Noisy series (value vs. data point).]
Empirical Evaluation – Ascending Data Series
[Charts: Heuristic NN and Simple NN forecasts from nets trained on the Ascending series (value vs. data point).]
Empirical Evaluation – Longer Forecast
[Charts: Heuristic NN forecasts from nets trained on the Less Noisy series over a longer forecast horizon (value vs. data point, two panels).]
Empirical Evaluation – Sunspots Data Series
[Chart: sunspot count by year, 1950-1982.]
Empirical Evaluation – Discussion
• Heuristic training method observations:
– Networks train longer (more epochs) on smoother data series like the original and ascending data series
– The total squared error and unscaled error are higher for noisy data series
– Neither the number of epochs nor the errors appear to correlate well with the coefficient of determination
– In most cases, the committee forecast is worse than the best candidate's forecast
• When actual values are unavailable, choosing the best candidate is difficult!
Coefficient of Determination
[Charts: two bar charts of the coefficient of determination (roughly -0.5 to 0.7) for the Original, Less Noisy, More Noisy, and Ascending data series.]
Data Preprocessing
• First-difference
– For ascending data series, a neural network trained on first-difference can forecast near perfectly
– In that case, it is better to train and forecast on first-difference
– FORECASTER reconstitutes the forecast from its first-difference
• Moving average
– For noisy data series, a moving average would eliminate much of the noise
– But it would also smooth out peaks and valleys
– The series may then be easier to learn and forecast
– But in some series, the “noise” may be important data (e.g., utility load forecasting)
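The first-difference transform and its reconstitution can be sketched as follows (hypothetical helper names; the deck only states that FORECASTER reconstitutes the forecast from its first-difference):

```python
def first_difference(series):
    # d[i] = x[i+1] - x[i]; an ascending series becomes nearly constant
    return [b - a for a, b in zip(series, series[1:])]

def reconstitute(start, diffs):
    # Rebuild the original scale by cumulatively adding differences
    out = [start]
    for d in diffs:
        out.append(out[-1] + d)
    return out

ascending = [3, 5, 8, 12, 17]
diffs = first_difference(ascending)          # differences grow slowly
restored = reconstitute(ascending[0], diffs) # recovers the original
```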
Conclusion
• Presented:
– Time series forecasting
– Neural networks
– K-nearest-neighbor
– Empirical evaluation
• Learned a lot about the implementation details of the forecasting techniques
• Learned a lot about MFC programming
$$O_c = h_{\text{output}}\!\left(\sum_{p=1}^{P} i_{c,p}\, w_{c,p} + b_c\right), \quad \text{where } h_{\text{output}}(x) = x$$

$$\delta_c = h'_{\text{output}}(x)\,(D_c - O_c) \quad \text{(output units)}$$

$$\delta_c = h'_{\text{hidden}}(x)\sum_{n=1}^{N} \delta_n\, w_{n,c} \quad \text{(hidden units)}$$

$$\Delta w_{c,p} = \alpha\, \delta_c\, O_p$$
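The update rules above can be run numerically as a sketch (a toy net with two tanh hidden units and a linear output, matching h_output(x) = x so h'_output = 1; the sizes, initial weights, and learning rate are arbitrary choices, not the thesis's values):

```python
import math

def backprop_step(inputs, w_hidden, b_hidden, w_out, b_out, target, alpha):
    """One delta-rule step for a 1-hidden-layer net with tanh hidden
    units and a linear output unit."""
    # Forward pass
    hidden_net = [sum(w * i for w, i in zip(ws, inputs)) + b
                  for ws, b in zip(w_hidden, b_hidden)]
    hidden_out = [math.tanh(n) for n in hidden_net]
    output = sum(w * h for w, h in zip(w_out, hidden_out)) + b_out

    # delta_c = h'_output(x) * (D_c - O_c), with h'_output = 1
    delta_out = target - output
    # Hidden deltas: h'_hidden(x) * sum_n delta_n * w_{n,c};
    # tanh'(x) = 1 - tanh(x)^2
    delta_hidden = [(1 - h * h) * delta_out * w
                    for h, w in zip(hidden_out, w_out)]

    # Delta w_{c,p} = alpha * delta_c * O_p (biases updated likewise)
    w_out = [w + alpha * delta_out * h for w, h in zip(w_out, hidden_out)]
    b_out = b_out + alpha * delta_out
    w_hidden = [[w + alpha * d * i for w, i in zip(ws, inputs)]
                for ws, d in zip(w_hidden, delta_hidden)]
    b_hidden = [b + alpha * d for b, d in zip(b_hidden, delta_hidden)]
    return w_hidden, b_hidden, w_out, b_out, output

# Toy run: repeated steps on one example shrink the forecast error.
inputs = [0.5, -0.3]
w_hidden, b_hidden = [[0.1, 0.2], [0.3, -0.1]], [0.0, 0.0]
w_out, b_out = [0.2, -0.4], 0.0
errors = []
for _ in range(50):
    w_hidden, b_hidden, w_out, b_out, output = backprop_step(
        inputs, w_hidden, b_hidden, w_out, b_out, 1.0, 0.1)
    errors.append(abs(1.0 - output))
```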
Forecast Error Formulas

$$E_C = \frac{1}{2}\sum_{c=1}^{C} (D_c - O_c)^2$$

$$UE_C = \sum_{c=1}^{C} \left| UD_c - UO_c \right|$$

$$r^2 = 1 - \frac{\sum_{i=1}^{n} (x_i - \hat{x}_i)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

where
• $r^2 = 1$ if $\hat{x}_i = x_i$ for all $i$
• $0 < r^2 < 1$ if $\hat{x}_i$ is a better forecast than $\bar{x}$
• $r^2 = 0$ if generally $\hat{x}_i = \bar{x}$
• $r^2 < 0$ if $\hat{x}_i$ is a worse forecast than $\bar{x}$
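The three metrics can be written directly from the formulas (assuming the "unscaled" values UD and UO are simply the actual and forecast values before scaling):

```python
def total_squared_error(actual, forecast):
    # E_C = 1/2 * sum_c (D_c - O_c)^2
    return 0.5 * sum((d - o) ** 2 for d, o in zip(actual, forecast))

def unscaled_error(actual, forecast):
    # UE_C = sum_c |UD_c - UO_c|
    return sum(abs(d - o) for d, o in zip(actual, forecast))

def coefficient_of_determination(actual, forecast):
    # r^2 = 1 - sum_i (x_i - xhat_i)^2 / sum_i (x_i - xbar)^2
    mean = sum(actual) / len(actual)
    ss_res = sum((x, f) and (x - f) ** 2 for x, f in zip(actual, forecast))
    ss_tot = sum((x - mean) ** 2 for x in actual)
    return 1 - ss_res / ss_tot
```

A perfect forecast gives r² = 1, forecasting the series mean gives r² = 0, and anything worse than the mean goes negative, which matches the case analysis above.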
Related Work