Ch22 Presn
22 Network Training Steps
Initialize Weights → Train Network → Analyze Network Performance → Use Network
22 Selection of Data
22 Determination of Input Range
22 Data Preprocessing
• Nonlinear transformations (e.g., p^t = 1/p, p^t = exp(p)).
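As a minimal sketch (NumPy, with made-up input values assumed strictly positive so the reciprocal is defined), the two transformations on the slide look like:

```python
import numpy as np

# Hypothetical raw input values (assumed positive so 1/p is defined)
p = np.array([0.5, 1.0, 2.0, 4.0])

# The two example nonlinear transformations from the slide
p_recip = 1.0 / p    # p^t = 1/p
p_exp = np.exp(p)    # p^t = exp(p)
```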
22 Importance of Transfer Function
22 Softmax Transfer Function
a_i = f(n_i) = exp(n_i) / ∑_{j=1}^{S} exp(n_j)
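The softmax transfer function above can be sketched in NumPy; shifting by max(n) is a standard numerical-stability trick that leaves the result unchanged:

```python
import numpy as np

def softmax(n):
    """Softmax transfer function: a_i = exp(n_i) / sum_j exp(n_j).
    Subtracting max(n) avoids overflow without changing the output."""
    e = np.exp(n - np.max(n))
    return e / e.sum()

a = softmax(np.array([1.0, 2.0, 3.0]))
```

The outputs are positive and sum to one, so they can be interpreted as class probabilities.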
22 Missing Data
22 Problem Types
• Fitting (nonlinear regression). Map between a set of inputs and a
corresponding set of targets. (e.g., estimate home prices from tax rate,
pupil/teacher ratio, etc.; estimate emission levels from fuel
consumption and speed; predict body fat level from body
measurements.)
• Pattern recognition (classification). Classify inputs into a set of target
categories. (e.g., recognize the vineyard from a chemical analysis of
the wine; classify a tumor as benign or malignant, from uniformity of
cell size, clump thickness and mitosis.)
• Clustering (segmentation). Group data by similarity. (e.g., group
customers according to buying patterns, group genes with related
expression patterns.)
• Prediction (time series analysis, system identification, filtering or
dynamic modeling). Predict the future value of some time series. (e.g.,
predict the future value of some stock; predict the future value of the
concentration of some chemical; predict outages on the electric grid.)
22 Choice of Network Architecture
• Fitting
– Multilayer networks with sigmoid hidden layers and linear output layers.
– Radial basis networks
• Pattern Recognition
– Multilayer networks with sigmoid hidden layers and sigmoid output
layers.
– Radial basis networks.
• Clustering
– Self-organizing feature map
• Prediction
– Focused time-delay neural network
– NARX network
22 Prediction Networks
[Diagram: focused time-delay network — the input p(t) passes through a tapped delay line (TDL) into layer 1 (weight matrix IW1,1, bias b1, S1 neurons, output a1(t)), then into layer 2 (weight matrix LW2,1, bias b2, S2 neurons, output a2(t)). The NARX network adds a second tapped delay line feeding the output back into layer 1 through LW1,3.]
22 Architecture Specifics
• Number of layers/neurons
– For multilayer network, start with two layers. Increase number of layers if
result is not satisfactory.
– Use a reasonably large number of neurons in the hidden layer (e.g., 20). Use
early stopping or Bayesian regularization to prevent overfitting.
– Number of neurons in output layer = number of targets. You can use
multiple networks instead of multiple outputs.
• Input selection
– Sensitivity analysis (see later slide)
– Bayesian regularization with separate α for each column of the input
weight matrix.
22 Weight Initialization
Nguyen-Widrow method: set each row ᵢw of the first-layer weight matrix to a
random direction with magnitude ‖ᵢw‖ = 0.7 (S^1)^(1/R), and set the biases to
uniform random values between −‖ᵢw‖ and +‖ᵢw‖.
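A minimal NumPy sketch of this style of initialization (the function name and seed are my own; the magnitude formula is the one on the slide):

```python
import numpy as np

def nguyen_widrow_init(S, R, rng=None):
    """Sketch of Nguyen-Widrow-style first-layer initialization:
    each row of W gets a random direction scaled to magnitude
    0.7 * S**(1/R); biases are uniform in [-mag, mag]."""
    rng = rng or np.random.default_rng(0)
    mag = 0.7 * S ** (1.0 / R)
    W = rng.uniform(-1.0, 1.0, size=(S, R))
    W = mag * W / np.linalg.norm(W, axis=1, keepdims=True)
    b = rng.uniform(-mag, mag, size=S)
    return W, b

W1, b1 = nguyen_widrow_init(S=10, R=3)
```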
22 Choice of Training Algorithm
22 Stopping Criteria
• Norm of the gradient (of the mean squared error) less than
a pre-specified amount (for example, 10^-6).
• Early stopping because the validation error increases.
• Maximum number of iterations reached.
• Mean square error drops below a specified threshold (not
generally a useful method).
• Mean square error curve (on a log-log scale) becomes flat
for some time (user stop).
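Three of the criteria above can be combined into a single stopping check. A sketch (function name, thresholds, and the `patience` window are illustrative, not from the slides):

```python
def should_stop(grad_norm, val_errors, iteration,
                grad_tol=1e-6, patience=5, max_iter=1000):
    """Combine three stopping criteria: small gradient norm,
    early stopping on a rising validation error, and a maximum
    iteration count. Returns the triggering criterion, or None."""
    if grad_norm < grad_tol:
        return "gradient"
    if iteration >= max_iter:
        return "max_iter"
    # early stopping: validation error rose for `patience` consecutive steps
    if len(val_errors) > patience and all(
            val_errors[-k] > val_errors[-k - 1]
            for k in range(1, patience + 1)):
        return "early_stop"
    return None
```

In practice the training loop calls this once per iteration and restores the weights from the minimum-validation-error iteration when early stopping fires.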
22 Typical Training Curve
[Figure: typical training curve — sum squared error (log scale, 10^-3 to 10^3) versus iteration number (log scale, 1 to 1000).]
22 Competitive Network Stopping Criteria
22 Choice of Performance Function
Mean Square Error:

  F(x) = (1 / (Q S^M)) ∑_{q=1}^{Q} (t_q − a_q)^T (t_q − a_q)

       = (1 / (Q S^M)) ∑_{q=1}^{Q} ∑_{i=1}^{S^M} (t_{i,q} − a_{i,q})^2

Minkowski Error:

  F(x) = (1 / (Q S^M)) ∑_{q=1}^{Q} ∑_{i=1}^{S^M} |t_{i,q} − a_{i,q}|^K

Cross-Entropy:

  F(x) = − ∑_{q=1}^{Q} ∑_{i=1}^{S^M} t_{i,q} ln(a_{i,q} / t_{i,q})
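The three performance functions can be sketched directly from the formulas (the small `eps` in the cross-entropy is my own guard against log(0); targets/outputs here are made-up class probabilities):

```python
import numpy as np

def mse(t, a):
    """Mean square error: average of (t_{i,q} - a_{i,q})^2 over all i, q."""
    return np.mean((t - a) ** 2)

def minkowski(t, a, K=1):
    """Minkowski error with exponent K (K=2 reduces to MSE)."""
    return np.mean(np.abs(t - a) ** K)

def cross_entropy(t, a, eps=1e-12):
    """Cross-entropy F = -sum t * ln(a/t); eps guards against log(0)."""
    return -np.sum(t * np.log((a + eps) / (t + eps)))

t = np.array([[1.0, 0.0], [0.0, 1.0]])  # hypothetical targets
a = np.array([[0.9, 0.1], [0.2, 0.8]])  # hypothetical network outputs
```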
22 Committees of Networks
22 Post-Training Analysis
• Fitting
• Pattern Recognition
• Clustering
• Prediction
22 Fitting
Regression Analysis (Outputs vs. Targets)

Linear model:  a_q = m t_q + c + ε_q

  m̂ = ∑_{q=1}^{Q} (t_q − t̄)(a_q − ā) / ∑_{q=1}^{Q} (t_q − t̄)^2

  ĉ = ā − m̂ t̄

  t̄ = (1/Q) ∑_{q=1}^{Q} t_q,   ā = (1/Q) ∑_{q=1}^{Q} a_q

R Value (−1 ≤ R ≤ 1):

  R = ∑_{q=1}^{Q} (t_q − t̄)(a_q − ā) / ((Q − 1) s_t s_a)

  s_t = sqrt( (1/(Q−1)) ∑_{q=1}^{Q} (t_q − t̄)^2 ),   s_a = sqrt( (1/(Q−1)) ∑_{q=1}^{Q} (a_q − ā)^2 )
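A direct NumPy transcription of these formulas (function name is my own; `t` and `a` are 1-D arrays of targets and network outputs):

```python
import numpy as np

def regression_analysis(t, a):
    """Fit a = m*t + c between targets and outputs and compute the
    linear correlation coefficient R, as in the formulas above."""
    Q = len(t)
    t_bar, a_bar = t.mean(), a.mean()
    m_hat = np.sum((t - t_bar) * (a - a_bar)) / np.sum((t - t_bar) ** 2)
    c_hat = a_bar - m_hat * t_bar
    s_t = np.sqrt(np.sum((t - t_bar) ** 2) / (Q - 1))
    s_a = np.sqrt(np.sum((a - a_bar) ** 2) / (Q - 1))
    R = np.sum((t - t_bar) * (a - a_bar)) / ((Q - 1) * s_t * s_a)
    return m_hat, c_hat, R
```

For a well-trained network the fitted line should be close to a = t, i.e. m̂ ≈ 1, ĉ ≈ 0, R ≈ 1.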
22 Sample Regression Plot
[Figure: sample regression plot of network outputs a versus targets t (both 0 to 50), with fitted regression line; R = 0.965, R^2 = 0.931.]
22 Error Histogram
[Figure: histogram of prediction errors e = t − a, spanning roughly −8 to 12.]
22 Pattern Recognition
Confusion Matrix

  Output Class \ Target Class        1               2
  1                            47 (22.0%)       1 (0.5%)     97.9%
  2                             4 (1.9%)      162 (75.7%)    97.6%

False positives (Type I errors): target class 2 predicted as class 1.
False negatives (Type II errors): target class 1 predicted as class 2.
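A minimal sketch of building such a matrix, assuming integer class labels 0..n_classes−1 and rows indexed by the predicted (output) class to match the layout above:

```python
import numpy as np

def confusion_matrix(targets, outputs, n_classes=2):
    """Rows = output (predicted) class, columns = target class."""
    C = np.zeros((n_classes, n_classes), dtype=int)
    for t, o in zip(targets, outputs):
        C[o, t] += 1
    return C
```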
22 Pattern Recognition
Receiver Operating Characteristic (ROC) Curve
[Figure: ROC curve — true positive rate versus false positive rate, both from 0 to 1.]
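An ROC curve is traced by sweeping a decision threshold over the network outputs. A sketch (function name and test data are my own; `scores` are scalar outputs, `labels` are 0/1 targets):

```python
import numpy as np

def roc_points(scores, labels):
    """Return (false-positive-rate, true-positive-rate) pairs,
    one per candidate threshold, for an ROC curve."""
    pts = []
    for thresh in np.unique(scores):
        pred = scores >= thresh
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        tpr = tp / np.sum(labels == 1)
        fpr = fp / np.sum(labels == 0)
        pts.append((fpr, tpr))
    return pts
```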
22 Clustering (SOM)
h_{i,j} = neighborhood function:

  h_{i,j} = exp( − ‖ᵢw − ⱼw‖^2 / (2 d^2) )
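The Gaussian neighborhood function above in NumPy (function and argument names are my own; `d` is the neighborhood width):

```python
import numpy as np

def neighborhood(w_i, w_j, d=1.0):
    """Gaussian neighborhood: h_ij = exp(-||w_i - w_j||^2 / (2*d^2))."""
    return np.exp(-np.sum((w_i - w_j) ** 2) / (2.0 * d ** 2))
```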
22 Prediction
Autocorrelation Function of Prediction Errors:

  R_e(τ) = (1 / (Q − τ)) ∑_{t=1}^{Q−τ} e(t) e(t + τ)

Confidence Intervals:

  −2 R_e(0) / sqrt(Q)  <  R_e(τ)  <  2 R_e(0) / sqrt(Q)

[Figure: autocorrelation of prediction errors before and after training, for lags τ from −40 to 40.]
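A direct transcription of the estimator and bound (function names are my own; the bound assumes the sqrt(Q) form of the confidence interval above):

```python
import numpy as np

def autocorr(e, tau):
    """R_e(tau) = (1/(Q-tau)) * sum_{t=1}^{Q-tau} e(t) e(t+tau)."""
    Q = len(e)
    return np.sum(e[:Q - tau] * e[tau:]) / (Q - tau)

def conf_bound(e):
    """Approximate white-noise confidence bound: 2*R_e(0)/sqrt(Q)."""
    return 2.0 * autocorr(e, 0) / np.sqrt(len(e))
```

For a well-trained predictor, R_e(τ) at nonzero lags should fall inside ±conf_bound(e).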
22 Prediction
Cross-correlation Between Prediction Errors and Input:

  R_pe(τ) = (1 / (Q − τ)) ∑_{t=1}^{Q−τ} p(t) e(t + τ)

Confidence Intervals:

  −2 sqrt(R_e(0) R_p(0)) / sqrt(Q)  <  R_pe(τ)  <  2 sqrt(R_e(0) R_p(0)) / sqrt(Q)

[Figure: cross-correlation between inputs and prediction errors before and after training, for lags τ from −40 to 40.]
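The cross-correlation estimator follows the same pattern as the autocorrelation (function name is my own; `p` and `e` are equal-length 1-D arrays of inputs and errors):

```python
import numpy as np

def crosscorr(p, e, tau):
    """R_pe(tau) = (1/(Q-tau)) * sum_{t=1}^{Q-tau} p(t) e(t+tau)."""
    Q = len(p)
    return np.sum(p[:Q - tau] * e[tau:]) / (Q - tau)
```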
22 Overfitting and Extrapolation
22 Diagnosing Problems
22 Sensitivity Analysis
Check for important inputs.

  s_i^m ≡ ∂F̂ / ∂n_i^m

  ∂F̂/∂p_j = ∑_{i=1}^{S^1} (∂F̂/∂n_i^1)(∂n_i^1/∂p_j) = ∑_{i=1}^{S^1} s_i^1 (∂n_i^1/∂p_j)

Since

  n_i^1 = ∑_{j=1}^{R} w_{i,j}^1 p_j + b_i^1

this becomes

  ∂F̂/∂p_j = ∑_{i=1}^{S^1} s_i^1 w_{i,j}^1

or, in matrix form,

  ∂F̂/∂p = (W^1)^T s^1
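The final matrix form is a single matrix-vector product. A sketch with made-up values for the first-layer weight matrix W1 (S1 × R) and the sensitivity vector s1:

```python
import numpy as np

# Hypothetical first-layer weight matrix (S1 = 3 neurons, R = 2 inputs)
W1 = np.array([[1.0, 2.0],
               [0.0, -1.0],
               [3.0, 0.5]])
# Hypothetical first-layer sensitivities s_i^1 = dF/dn_i^1
s1 = np.array([0.1, -0.2, 0.05])

# dF/dp = (W1)^T s1 -- one sensitivity per input element p_j;
# inputs with small |dF/dp_j| are candidates for removal
dF_dp = W1.T @ s1
```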