Neural networks
If the neuron does end up firing, the nerve impulse, or action potential, is
conducted down the axon.
The signals ($p$-vectors $x_i = \{x_{i1}, \ldots, x_{ip}\}$) interact with the dendrites through synaptic weights ($\omega = \{\omega_0, \ldots, \omega_p\}$). The dendrites carry the input signals to the cell body, where they are all summed.
The output signal $\hat{y}_i$ is equal to:
$$\hat{y}_i = \phi\Big(\omega_0 + \sum_{j=1}^{p} \omega_j x_{ij}\Big) = \phi\left(\omega^\top \begin{pmatrix} 1 \\ x_i \end{pmatrix}\right).$$
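As a minimal sketch of this computation in R (the weights omega and the input x_i below are made-up illustrative values, not taken from the slides):

# Output of a single neuron: y_hat = phi(w0 + sum_j wj * xij),
# with phi taken here to be tanh
phi <- function(z) tanh(z)

omega <- c(0.1, -0.4, 0.7, 0.2)   # (w0, w1, w2, w3): bias plus p = 3 weights
x_i   <- c(1.5, -2.0, 0.3)        # one input vector x_i of length p = 3

y_hat <- phi(omega[1] + sum(omega[-1] * x_i))
y_hat                             # same as phi(t(omega) %*% c(1, x_i))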
A popular choice is the hyperbolic tangent,
$$\phi(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}} \in [-1, 1].$$
These activation functions are bounded. Other popular activation functions are the rectifier
$$\phi(z) = \max(z, 0)$$
and the softplus
$$\phi(z) = \log(1 + e^z).$$
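For reference, here is a sketch of these three activation functions in R (softplus is written with log1p for numerical stability; the plotting range is arbitrary):

phi_tanh     <- function(z) tanh(z)        # bounded in [-1, 1]
phi_relu     <- function(z) pmax(z, 0)     # rectifier (ReLU)
phi_softplus <- function(z) log1p(exp(z))  # smooth approximation of the rectifier

z <- seq(-4, 4, by = 0.01)
plot(z, phi_tanh(z), type = "l", ylim = c(-1.5, 4), ylab = "phi(z)")
lines(z, phi_relu(z), lty = 2)
lines(z, phi_softplus(z), lty = 3)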
The number of neuronal layers: $n^{net}$. The number of neurons in the $j$th layer is denoted $n_j^{net}$.
Activation function $\phi_{i,j}(\cdot)$: $i$ denotes the position in a layer and $j$ the layer.
The output of the $i$th neuron in hidden and output layers $j$, $\hat{y}_{i,j}$, is
$$\hat{y}_{i,j} = \phi_{i,j}\Big(\omega_{i,0}^{j} + \sum_{k=1}^{n_{j-1}^{net}} \omega_{i,k}^{j}\, \hat{y}_{k,j-1}\Big),$$
where $\omega_{i,k}^{j}$ are the weights for the $k$th input signal received by neuron $(i, j)$.
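A small R sketch of this recursion, propagating an input through successive layers (the layer sizes, random weights, and the choice of tanh are illustrative):

# W[[j]] is an n_j x (n_{j-1} + 1) matrix; its first column holds the biases w_{i,0}^j
forward <- function(x, W, phi = tanh) {
  y <- x                            # outputs of layer 0 are the input signals
  for (j in seq_along(W)) {
    y <- phi(W[[j]] %*% c(1, y))    # prepend 1 so column 1 acts as the bias
  }
  as.vector(y)
}

set.seed(42)
W <- list(matrix(rnorm(3 * 3), 3, 3),   # layer 1: 3 neurons, 2 inputs (+ bias)
          matrix(rnorm(1 * 4), 1, 4))   # layer 2: 1 neuron, 3 inputs (+ bias)
forward(c(0.5, -1.2), W)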
Cybenko's theorem.
Let $\phi(\cdot)$ be a bounded, continuous function. Let $I_m$ denote $[0,1]^m$. Given $\varepsilon > 0$ and any function $f \in C(I_m)$, there exist $N \in \mathbb{N}$, $v_i, \omega_{0,i} \in \mathbb{R}$ and vectors $\omega_i \in \mathbb{R}^m$, where $i = 1, \ldots, N$, such that
$$F(x) = \sum_{i=1}^{N} v_i\, \phi\big(\omega_i^\top x + \omega_{0,i}\big)$$
satisfies $|F(x) - f(x)| < \varepsilon$ for all $x \in I_m$.
In simple words: any continuous function can be approximated by a single-hidden-layer neural network.
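As an illustration of the theorem (not part of the original slides), a single-hidden-layer network can be fitted to a smooth target such as sin(2*pi*x) on [0, 1], here with the neuralnet package presented at the end of these slides; convergence may require another seed or a larger stepmax:

library(neuralnet)

set.seed(1)
d <- data.frame(x = runif(100))
d$y <- sin(2 * pi * d$x)

fit <- neuralnet(y ~ x, data = d, hidden = 10, linear.output = TRUE,
                 stepmax = 1e6)

x_new <- data.frame(x = seq(0, 1, by = 0.01))
y_hat <- compute(fit, x_new)$net.result
plot(d$x, d$y); lines(x_new$x, y_hat, lty = 2)   # F(x) tracks f(x) closely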
Deep learning?
A shallow network has only 1 hidden layer. It is a universal approximator, but the number of neurons may be high for approximating a function.
A deep neural network has multiple hidden layers and possibly loops (recurrent networks). It is a universal approximator with multiplicative layers and needs fewer neurons for approximating high-dimensional functions.
$\mu = \beta^\top x$ (linear model),
$g(\mu) = \beta^\top x$ (generalized linear model, with link function $g$),
$g(\mu) = f(x)$ (the linear score $\beta^\top x$ is replaced by a flexible function $f$, e.g. a neural network).
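For the GLM case, a minimal R sketch with a log link (the data frame policies and its columns nclaims, driver_age, vehicle_age are hypothetical, simulated here for illustration):

set.seed(1)
policies <- data.frame(driver_age  = runif(500, 18, 80),
                       vehicle_age = runif(500, 0, 15))
policies$nclaims <- rpois(500, lambda = 0.1)

# g(mu) = beta' x with g = log, so mu = exp(beta' x)
glm_fit <- glm(nclaims ~ driver_age + vehicle_age,
               family = poisson(link = "log"), data = policies)
head(predict(glm_fit, type = "response"))   # fitted mu on the response scale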
Data preprocessing
Categorical variables are modeled as dummy variables: e.g. if a variable has the three modalities 'Urban', 'Suburban' and 'Countryside', we create two dummy input variables as follows:
$$x_{i,d} = \begin{cases} 1 & \text{if Urban} \\ 0 & \text{else,} \end{cases} \qquad x_{i,d+1} = \begin{cases} 1 & \text{if Suburban} \\ 0 & \text{else.} \end{cases}$$
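In R, this encoding is produced automatically by model.matrix; a sketch (the variable name area is illustrative):

# Three modalities, 'Countryside' as reference level: only two dummy columns
area <- factor(c("Urban", "Suburban", "Countryside", "Urban"),
               levels = c("Countryside", "Urban", "Suburban"))
model.matrix(~ area)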
Training...
Weights are found by minimizing the distance between the output signals of the network and the observed outputs.
Gradient descent
or
$$\Omega^* = \Omega_0 - H\big(R(\Omega_0)\big)^{-1} \nabla R(\Omega_0).$$
Therefore, by iterating, we can find the optimal weights.
Problems:
Risk of reaching a local minimum...
Inversion of the Hessian matrix is costly...
Back-propagation
Solution: adjust the vector of weights $\Omega_t$ by a small step in the direction opposite to the gradient.
Algorithm
Main procedure: for $t = 0$ to the maximum epoch $T$:
1. Calculate the gradient $\nabla R(\Omega_t)$.
2. Update the step size: $\rho_{t+1} = \rho_0\, e^{-\alpha t}$.
3. Update the weights: $\Omega_{t+1} = \Omega_t - \rho_{t+1} \nabla R(\Omega_t)$.
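A minimal R sketch of this procedure on a toy quadratic loss (the loss, starting point, and hyperparameters rho0, alpha, T are illustrative):

grad_R <- function(omega) 2 * (omega - c(1, -2))   # gradient of a toy quadratic loss

omega <- c(0, 0)    # Omega_0
rho0  <- 0.5
alpha <- 0.01

for (t in 0:100) {
  rho   <- rho0 * exp(-alpha * t)        # step 2: decaying step size
  omega <- omega - rho * grad_R(omega)   # step 3: move against the gradient
}
omega   # converges toward the minimizer (1, -2)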
In most applications, the quality of the model is assessed by the total loss $R(\Omega)$.
There is no rule for determining the best architecture for the neural network.
In practice, we test several models and choose the one with the lowest loss...
But overfitting must be checked.
Overfitting
Overfitting is the production of an analysis that corresponds too closely or exactly
to a particular set of data, and may therefore fail to fit additional data or predict
future observations reliably.
Solution to control overfitting: split the database into training (e.g. 85%) and validation (e.g. 15%) samples, as in the sketch below.
Train the neural network on the training set and monitor the training loss.
Evaluate the trained network on the validation set and monitor the validation loss.
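A sketch of such a split in R (85/15; the data frame is simulated for illustration):

set.seed(123)
policies <- data.frame(x1 = rnorm(1000), x2 = rnorm(1000),
                       y  = rpois(1000, lambda = 0.1))

n        <- nrow(policies)
train_id <- sample(n, size = floor(0.85 * n))
train    <- policies[train_id, ]    # 85%: fit the network, monitor training loss
valid    <- policies[-train_id, ]   # 15%: only monitor the validation loss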
Until now, we have assumed that the estimator is the output of the last layer of neurons: $(\hat{y}_i)_{i=1,\ldots,n} = (\hat{y}_{i,n^{net}})_{i=1,\ldots,n}$.
However, the domain of $Y_i$ depends on the distribution ($\mathbb{R}$ for the Gaussian, $\mathbb{R}^+$ for the Gamma and Poisson).
For these last two distributions, we transform the output signal of the neural network with a function $g(\cdot)$ to ensure that the estimator lies in the domain of definition of $Y_i$:
$$\hat{y}_i = g(\hat{y}_{i,n^{net}}), \qquad i = 1, \ldots, n.$$
Distribution | transform $g(\cdot)$ | estimator | $\partial D(y_i, \hat{y}_{i,n^{net}}) / \partial \hat{y}_{i,n^{net}}$
Normal | none | $\hat{y}_i := \hat{y}_{i,n^{net}}$ | $2\nu_i\,(\hat{y}_{i,n^{net}} - y_i)$
Gamma | exponential | $\hat{y}_i := \exp(\hat{y}_{i,n^{net}})$ | $2\nu_i\,(1 - y_i\, e^{-\hat{y}_{i,n^{net}}})$
Poisson | exponential | $\hat{y}_i := \exp(\hat{y}_{i,n^{net}})$ | $2\nu_i\,(e^{\hat{y}_{i,n^{net}}} - y_i)$
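These derivatives can be checked numerically; a sketch in R for the Poisson line of the table, with nu_i = 1 (the Normal and Gamma cases are analogous):

# Poisson deviance contribution with exponential transform, nu_i = 1
dev_pois  <- function(y, eta) 2 * (y * log(y / exp(eta)) - (y - exp(eta)))
grad_pois <- function(y, eta) 2 * (exp(eta) - y)   # table entry

y <- 3; eta <- 0.5; h <- 1e-6
(dev_pois(y, eta + h) - dev_pois(y, eta - h)) / (2 * h)   # numerical derivative
grad_pois(y, eta)                                         # analytical, matches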
Frequencies of claims
$$\hat{y}_i = \exp(\hat{y}_{i,n^{net}}), \qquad i = 1, \ldots, n.$$
Estimation
Database: 62,436 contracts but only 693 claims, so we fit the model on the whole dataset:

Model | # of hidden neurons | # of weights | Deviance | AIC | BIC
NN(3) | 3 | 52 | 5664.36 | 7116.93 | 7587.11
NN(4) | 4 | 69 | 5564.86 | 7051.43 | 7675.32
NN(5) | 5 | 86 | 5546.91 | 7067.48 | 7845.08
NN(2,2) | 2 × 2 | 41 | 5684.57 | 7115.14 | 7485.86
NN(3,3) | 3 × 3 | 57 | 5499.72 | 7080.29 | 8129.15
GLM | - | 16 | 5781.66 | 7162.23 | 7306.90
Frequency estimates:
$$\mu = g^{-1}\big(f_{net}(x)\big) = \exp\Big(\omega_{1,0}^{2} + \sum_{k=1}^{4} \omega_{1,k}^{2}\, \hat{y}_{k,1}\Big).$$
Forecast frequencies of claims for drivers of a 4-year-old vehicle, with the NN(4) model.
Package 'neuralnet' in R
Other functions:
compute(x, covariate, rep = 1)
Computes the outputs of all neurons for a specific covariate, given a trained neural network x.
rep: an integer indicating which repetition of the neural network should be used.
gwplot(x, rep = NULL, ...)
Plots the generalized weights of the network for a selected covariate.
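A minimal sketch of how these functions fit together (the simulated data and column names are illustrative, not the claims database of the application above):

library(neuralnet)

set.seed(1)
dat <- data.frame(x1 = runif(300), x2 = runif(300))
dat$y <- dat$x1 + dat$x2^2 + rnorm(300, sd = 0.05)

# Train a network with one hidden layer of 4 neurons
nn <- neuralnet(y ~ x1 + x2, data = dat, hidden = 4, linear.output = TRUE)

# compute(): outputs of all neurons for new covariates, using repetition 1
new_x <- data.frame(x1 = 0.5, x2 = 0.3)
compute(nn, new_x, rep = 1)$net.result

# gwplot(): plot the generalized weights for one covariate
gwplot(nn, selected.covariate = "x1")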