Regularization in Neural Networks

Sargur Srihari
srihari@buffalo.edu
Machine Learning Srihari
Definition of Regularization
• Generalization error
  • Performance on inputs not previously seen
  • Also called the test error
• Regularization is:
  • “any modification to a learning algorithm intended to reduce its generalization error but not its training error”
  • Reduce generalization error even at the expense of increased training error
  • E.g., limiting model capacity is a regularization method
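As an illustrative sketch (not from the slides), one common such modification is an L2 weight-decay penalty added to the training error; the data, learning rate, and penalty strength below are arbitrary assumptions:

```python
import numpy as np

def ridge_gd(X, t, lam, lr=0.01, steps=1000):
    """Gradient descent on the L2-regularized squared error
    E(w) = 1/2 ||Xw - t||^2 + (lam/2) ||w||^2."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - t) + lam * w
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
t = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

w_unreg = ridge_gd(X, t, lam=0.0)
w_reg = ridge_gd(X, t, lam=10.0)
# The penalty shrinks the learned weights toward zero
print(np.linalg.norm(w_reg) < np.linalg.norm(w_unreg))  # True
```

Training error for the regularized fit is slightly worse, but the smaller weights generalize better when the data are noisy.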
Goals of Regularization
• Some goals of regularization:
  1. Encode prior knowledge
  2. Express a preference for a simpler model
  3. Make an underdetermined problem determined
• Generalization
  • Prevent over-fitting
  • Occam's razor
• Bayesian point of view
  • Regularization corresponds to prior distributions on model parameters
Invariance to Transformation
[Figure: same building, different views and scales]
Linear Transformation
https://mathinsight.org/determinant_linear_transformation
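The linked page's point — that the determinant of a linear transformation is the factor by which it scales areas — can be checked numerically; the matrix and the shoelace-area helper below are illustrative assumptions:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])           # an arbitrary 2-D linear map

# corners of the unit square, mapped through A
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
mapped = square @ A.T

def area(p):
    """Polygon area via the shoelace formula."""
    x, y = p[:, 0], p[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# the mapped square's area equals |det A| times the original area
print(np.isclose(area(mapped), abs(np.linalg.det(A)) * area(square)))  # True
```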
  y_k = ∑_j w_kj z_j + w_k0
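A minimal vectorized sketch of this output-unit computation (the layer sizes and values are arbitrary illustrations):

```python
import numpy as np

# y_k = sum_j w_kj z_j + w_k0: an affine map of hidden activations z
rng = np.random.default_rng(4)
W = rng.normal(size=(3, 5))    # weights w_kj: 3 outputs, 5 hidden units
w0 = rng.normal(size=3)        # biases w_k0
z = rng.normal(size=5)         # hidden-unit activations

y = W @ z + w0                 # all outputs y_k at once
print(y.shape)  # (3,)
```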
Regularizer invariance
• The regularizer should be invariant to re-scaling of weights and shifts of biases
• Such a regularizer is
  (λ1/2) ∑_{w∈W1} w² + (λ2/2) ∑_{w∈W2} w²
• Example: y(x,w) = w0 + w1x with prior p(w|α) = N(w | 0, α⁻¹)
• Posterior:
  p(w|t) = ∏_{n=1}^N N(t_n | wᵀφ(x_n), β⁻¹) N(w | 0, α⁻¹I)
• Log of posterior is
  ln p(w|t) = −(β/2) ∑_{n=1}^N {t_n − wᵀφ(x_n)}² − (α/2) wᵀw + const
[Figure: network with input and outputs; output scales controlled by α1^w and α2^w for the two weight groups]
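Maximizing a log posterior of the form ln p(w|t) = −(β/2) ∑ {t_n − wᵀφ(x_n)}² − (α/2) wᵀw + const is equivalent to L2-regularized least squares with λ = α/β. A numerical sketch (data and hyperparameters made up for illustration) checks that the closed-form ridge solution zeroes the gradient of the log posterior:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 2.0, 25.0                      # prior and noise precisions
Phi = rng.normal(size=(40, 4))               # design matrix rows phi(x_n)
t = Phi @ rng.normal(size=4) + rng.normal(scale=beta**-0.5, size=40)

# MAP estimate: w_MAP = (Phi^T Phi + (alpha/beta) I)^(-1) Phi^T t,
# i.e. ridge regression with lambda = alpha/beta
lam = alpha / beta
w_map = np.linalg.solve(Phi.T @ Phi + lam * np.eye(4), Phi.T @ t)

# gradient of the log posterior should vanish at w_MAP:
# d/dw ln p(w|t) = beta * Phi^T (t - Phi w) - alpha * w
grad = beta * Phi.T @ (t - Phi @ w_map) - alpha * w_map
print(np.allclose(grad, 0, atol=1e-6))  # True
```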
3. Early Stopping
• An alternative to regularization for controlling complexity
• Error is measured on an independent validation set
• The validation error shows an initial decrease as training proceeds
[Figure: training set error vs. iteration step]
• The effective number of parameters in the network grows during training
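A minimal early-stopping sketch (illustrative, not from the slides): run gradient descent, monitor validation error, and keep the weights from the best iteration. The data and step counts are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.0, 0.5]                # most features are irrelevant
y = X @ w_true + rng.normal(size=200)

X_tr, y_tr = X[:100], y[:100]                # training split
X_val, y_val = X[100:], y[100:]              # held-out validation split

w = np.zeros(20)
best_w, best_val, best_step = w.copy(), np.inf, 0
for step in range(2000):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= 0.05 * grad
    val_err = np.mean((X_val @ w - y_val) ** 2)
    if val_err < best_val:                   # snapshot the best weights
        best_val, best_w, best_step = val_err, w.copy(), step

final_err = np.mean((X_val @ w - y_val) ** 2)
print(best_val <= final_err)  # True: early-stopped weights are no worse
```

Stopping at `best_step` rather than running to convergence limits the effective number of parameters the fit can use.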
Invariances
• In classification there is a need for predictions invariant to transformations of the input
• Example:
  • A handwritten digit should be assigned the same classification irrespective of its position in the image (translation) and its size (scale)
• Such transformations produce significant changes in the raw data, yet must produce the same output from the classifier
• Examples: translated or scaled pixel intensities in images; in speech recognition, nonlinear time warping along the time axis
Approaches to Invariance
1. Augment the training set
   • Transform training patterns for the desired invariances, e.g., shift each image into different positions
2. Add a regularization term to the error function
   • Penalize changes in output when the input is transformed
   • Leads to tangent propagation
3. Build invariance into pre-processing
   • By extracting features invariant to the transformations
4. Build invariance into the structure of the network (CNN)
   • Local receptive fields and shared weights
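Approach 1 can be sketched as follows; `shift_image` and the set of shifts are illustrative assumptions, not the slides' implementation:

```python
import numpy as np

def shift_image(img, dx, dy):
    """Translate a 2-D image by (dx, dy) pixels, zero-padding the border."""
    out = np.zeros_like(img)
    h, w = img.shape
    src_y = slice(max(0, -dy), min(h, h - dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_y = slice(max(0, dy), min(h, h + dy))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out

def augment(images, shifts=((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))):
    """Return each image plus its one-pixel translations."""
    return np.stack([shift_image(im, dx, dy)
                     for im in images for dx, dy in shifts])

img = np.zeros((5, 5))
img[2, 2] = 1.0                       # a single bright pixel
aug = augment([img])
print(aug.shape)  # (5, 5, 5): five shifted copies of one image
```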
• Warp each handwritten digit image before presentation to the model
• Easy to implement but computationally costly
• Random displacements Δx, Δy ∈ (0,1) at each pixel, then smooth by convolutions of width 0.01, 30 and 60 respectively
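The warping described above is a form of elastic distortion; a hedged sketch using SciPy (the parameter values here are illustrative, not the slide's 0.01/30/60 settings):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_warp(img, sigma=4.0, alpha=8.0, rng=None):
    """Random per-pixel displacements, smoothed by a Gaussian of width
    sigma, scaled by alpha, then used to resample the image."""
    rng = rng or np.random.default_rng()
    h, w = img.shape
    dx = gaussian_filter(rng.uniform(-1, 1, size=(h, w)), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, size=(h, w)), sigma) * alpha
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.array([ys + dy, xs + dx])      # where to sample from
    return map_coordinates(img, coords, order=1, mode="constant")

img = np.zeros((28, 28))
img[10:18, 10:18] = 1.0                        # a toy "digit" blob
warped = elastic_warp(img, rng=np.random.default_rng(3))
print(warped.shape)  # (28, 28)
```

Each call produces a differently warped variant, so the model never sees exactly the same image twice — which is what makes the approach effective but computationally costly.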
• Tangent vector to the transformation s(x, ξ) at ξ = 0:
  τ_n = ∂s(x_n, ξ)/∂ξ |_{ξ=0}
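The tangent vector can be approximated by a finite difference in ξ. In this sketch, s is a hypothetical transformation (translation of a sampled signal), chosen so the tangent vector is known analytically:

```python
import numpy as np

grid = np.linspace(0, 2 * np.pi, 200)

def s(xi):
    """Transformation s(x, xi): translate the sampled signal sin by xi."""
    return np.sin(grid - xi)

# finite-difference tangent vector: tau ≈ (s(eps) - s(0)) / eps
eps = 1e-5
tau = (s(eps) - s(0.0)) / eps

# analytically, d/dxi sin(grid - xi) at xi = 0 is -cos(grid)
print(np.allclose(tau, -np.cos(grid), atol=1e-4))  # True
```

Tangent propagation penalizes the component of the network's output gradient along τ, making the output insensitive to this transformation.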