QBUS6840 Predictive Analytics
Forecasting with Neural Networks and Deep Learning
University of Sydney Business School

These lecture slides are comprehensive enough for your study. Optional readings include:
- Online textbook Sections 9.1 and 9.3: introduces (very briefly) some concepts in neural networks.
- A comprehensive book is Deep Learning by Goodfellow, Bengio and Courville, freely available at https://www.deeplearningbook.org
Introduction
A simple example
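The code below assumes a model has already been defined; the slides' model definition is not reproduced here, so what follows is a minimal sketch of one plausible definition (the hidden-layer size of 10 is an illustrative assumption, and X_train is assumed to exist):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Hypothetical architecture: one hidden layer of 10 ReLU units and a
# single linear output unit (appropriate for regression)
model = Sequential()
model.add(Dense(10, input_dim=X_train.shape[1], activation='relu'))
model.add(Dense(1))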
# Compile the model: mean squared error loss, Adam optimizer
model.compile(loss='mse', optimizer='adam')
# Fit the model
model.fit(X_train, y_train, epochs=100, batch_size=10)
# Evaluate the model on the test data
MSE_nn = model.evaluate(X_test, y_test)
print('\nRoot of MSE on the test data for neural net:', np.sqrt(MSE_nn))
Introduction: Representation Learning

- We want to predict a response Y, based on raw/original covariates X = (X1, ..., Xp), using linear regression modelling.
- Usually, before doing regression modelling, some appropriate transformation of the covariates Xi is needed: Z1 = φ1(X), ..., Zd = φd(X).
- The Zi are called predictors or features.
- Then we model

  $$Y = \beta_0 + \beta_1 Z_1 + \cdots + \beta_d Z_d + \epsilon$$
What are neural networks?

- A neural network is an interconnected assembly of simple processing units or neurons, which communicate by sending signals to each other over weighted connections.
- A neural network is made of layers of similar neurons: an input layer, (one or many) hidden layers, and an output layer.
- The input layer receives data from outside the network. The output layer sends data out of the network. Hidden layers receive/process/send data within the network.
- A neural network is said to be deep if it has many hidden layers. Deep neural network modelling is collectively referred to as deep learning.
- In a nutshell, a neural net is a multivariate function: the output η is a function of the inputs X = (X1, ..., Xp)ᵀ:

  $$\eta = f(X_1, \ldots, X_p)$$

- More precisely, this function is a layered composite function:

  $$Z_1 = f_1(X), \quad Z_2 = f_2(Z_1), \quad \ldots, \quad Z_L = f_L(Z_{L-1}), \quad \eta = f_{L+1}(Z_L)$$
Fundamental concepts
Elements of a neural network

A (feedforward) neural net includes:
- a set of processing units (also called neurons, nodes)
- weights w_ik, which are connection strengths from unit i to unit k
- a propagation rule that determines the total input S_k of unit k, from the units that send information to unit k
- the output Z_k for each unit k, which is a function of the input S_k
- an activation function h_k that determines the output Z_k based on the input S_k: Z_k = h_k(S_k)

It's useful to distinguish three types of units:
- input units (often denoted by X): receive data from outside the network
- hidden units (often denoted by Z): receive data from and send data to units within the network
- output units: send data out of the network. The type of the output depends on the task (regression, binary classification or multinomial regression). In many cases, there is only one scalar output unit.

Given the signal from a set of inputs X, an NN produces an output.
The total input sent to unit k is

$$S_k = \sum_i w_{ik} Z_i + w_{0k}$$

which is a weighted sum of the outputs from all units i that are connected to unit k, plus a bias/intercept term w_0k. Then, the output of unit k is

$$Z_k = h_k(S_k) = h_k\Big(\sum_i w_{ik} Z_i + w_{0k}\Big)$$

Usually, we use the same activation function h_k = h for all units.
Elements of a neural network

Popular activation functions: (figures omitted).

Neural Net as a Data Representation Learning tool

A neural net transforms the raw input X into features that depend on the network weights w:

$$Z = \phi(X) = \phi(X, w)$$
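For reference, hedged numpy definitions of three commonly used activation functions (the slide's figures are not reproduced; the specific functions plotted there are assumed to include these):

import numpy as np

def sigmoid(s):
    # Logistic function: squashes input to (0, 1)
    return 1.0 / (1.0 + np.exp(-s))

def tanh(s):
    # Hyperbolic tangent: squashes input to (-1, 1)
    return np.tanh(s)

def relu(s):
    # Rectified linear unit: max(0, s), applied elementwise
    return np.maximum(0.0, s)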
Forward propagation algorithm*

- Let $w_{uv}^{(j)}$ be the weight from unit u in the previous layer j−1 to unit v in layer j. Layer j = 0 is the input layer, ℓ(0) := p.
- The set of weights sending signals to unit v of layer j is

  $$w_v^{(j)} = \big(w_{0,v}^{(j)}, w_{1,v}^{(j)}, \ldots, w_{\ell(j-1),v}^{(j)}\big)^\top$$

- The total input to unit v of layer j is

  $$S_v^{(j)} = w_{0v}^{(j)} + \sum_{u=1}^{\ell(j-1)} w_{uv}^{(j)} Z_u^{(j-1)} = w_v^{(j)\top} Z^{(j-1)}$$

  Its output is $Z_v^{(j)} = h(S_v^{(j)})$.
- The vector of total inputs to layer j is

  $$S^{(j)} = \big(S_1^{(j)}, \ldots, S_{\ell(j)}^{(j)}\big)^\top, \quad j = 1, \ldots, L$$

- The vector of outputs from layer j is

  $$Z^{(j)} = \big(1, Z_1^{(j)}, \ldots, Z_{\ell(j)}^{(j)}\big)^\top, \quad \text{with } Z^{(0)} := (1, X_1, \ldots, X_p)^\top$$
- Let

  $$W^{(j)} = \begin{pmatrix} w_{01}^{(j)} & w_{11}^{(j)} & \cdots & w_{\ell(j-1),1}^{(j)} \\ w_{02}^{(j)} & w_{12}^{(j)} & \cdots & w_{\ell(j-1),2}^{(j)} \\ \vdots & \vdots & & \vdots \\ w_{0,\ell(j)}^{(j)} & w_{1,\ell(j)}^{(j)} & \cdots & w_{\ell(j-1),\ell(j)}^{(j)} \end{pmatrix} = \begin{pmatrix} w_1^{(j)\top} \\ w_2^{(j)\top} \\ \vdots \\ w_{\ell(j)}^{(j)\top} \end{pmatrix}$$

  be the matrix of all weights from layer j−1 to layer j.
- Then

  $$S^{(j)} = W^{(j)} Z^{(j-1)}$$

- The final output of the network is

  $$\eta = \beta_0 + \beta_1 Z_1^{(L)} + \cdots + \beta_{\ell(L)} Z_{\ell(L)}^{(L)} = \beta^\top Z^{(L)}$$

Pseudo-code algorithm for computing the output.

Input: covariates X1, ..., Xp and weights w = (W^(1), ..., W^(L)), β
Output: η
- Z^(0) := (1, X1, ..., Xp)ᵀ
- For j = 1, ..., L:
  - S^(j) = W^(j) Z^(j−1)
  - Z^(j) = (1, h(S^(j))ᵀ)ᵀ
- η = βᵀ Z^(L)
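A minimal numpy sketch of this pseudo-code (a sketch, not the course's official implementation; the tanh activation and the weight-storage convention are assumptions):

import numpy as np

def forward(X, Ws, beta, h=np.tanh):
    """Compute the network output eta for input vector X.

    Ws   : list of weight matrices [W1, ..., WL]; Ws[j] has shape
           (l_j, l_{j-1} + 1), with the first column holding the biases.
    beta : output weights of length l_L + 1 (first entry = intercept).
    h    : activation function, applied elementwise.
    """
    Z = np.concatenate(([1.0], X))         # Z^(0) = (1, X1, ..., Xp)'
    for W in Ws:
        S = W @ Z                          # S^(j) = W^(j) Z^(j-1)
        Z = np.concatenate(([1.0], h(S)))  # Z^(j) = (1, h(S^(j))')'
    return beta @ Z                        # eta = beta' Z^(L)

For example, with p = 3 inputs and layer sizes ℓ(1) = 4, ℓ(2) = 2, Ws would hold matrices of shapes (4, 4) and (2, 5), and beta would have length 3.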
Neural net for forecasting

Given a neural network, we now know how to compute its output η from an input vector X. How is this output used for forecasting?

The neural net model for regression is

$$Y = \eta(X, w, \beta) + \epsilon = \beta_0 + \beta_1 Z_1 + \cdots + \beta_d Z_d + \epsilon$$

where ε is an error term with mean 0 and variance σ². Often, we assume ε ~ N(0, σ²). The least squares method can now be used to estimate the model parameters θ = (w, β, σ²).

Note on Python: in Python, the activation function of the output unit for regression is defined as the identity function, named linear.
Training a neural net
Learning objectives

- Know the methods used to train/estimate a neural network model, and the difficulties in training
- Know how to use a neural network for prediction with cross-sectional data and time series data

Neural Networks for cross-sectional data
Suppose that the response Y is numerical. From now on, we will use w to denote ALL the weights in the neural net. The output is η(X, w). The neural net model for regression is

$$Y = \eta(X, w) + \epsilon$$

where ε is an error term with mean 0 and variance σ². Often, we assume ε ~ N(0, σ²). The least squares method can now be used to estimate the model parameters.

- Let {y_i, x_i = (x_{i1}, ..., x_{ip})ᵀ}, i = 1, ..., n be the training dataset.
- The neural net regression model can be written as

  $$y_i = \eta(x_i, w) + \epsilon_i, \quad i = 1, \ldots, n$$

- The parameters are θ = (w, σ²).
- Define the loss function to be the sum of squared errors

  $$\mathrm{Loss}(\theta) = \sum_i \ell_i(\theta), \qquad \ell_i(\theta) = \big(y_i - \eta(x_i, w)\big)^2$$

- LS minimizes Loss(θ) to estimate θ.
Difficulties in training a neural net

Difficulties:
- There is a huge number of parameters
- The surface of the loss function is often highly multimodal
- We often need big data, so training is computationally expensive

Multimodality issue
Gradient descent method for optimization

Gradient descent iteratively updates the parameters in the direction of the negative gradient,

$$\theta^{(t+1)} = \theta^{(t)} - a_t \nabla_\theta \mathrm{Loss}(\theta^{(t)})$$

where the learning rate a_t satisfies a_t → 0 as t → ∞. For the sum-of-squared-errors loss, the gradient decomposes over observations:

$$\nabla_\theta \mathrm{Loss}(\theta) = \sum_{i=1}^{n} \nabla_\theta \ell_i(\theta)$$
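A minimal sketch of gradient descent with a decaying learning rate, illustrated on least squares (the decay schedule, step size and synthetic data are illustrative assumptions):

import numpy as np

def gradient_descent(grad_loss, theta0, a0=0.002, decay=0.01, n_iter=1000):
    """Generic gradient descent: theta <- theta - a_t * grad Loss(theta),
    with a learning rate a_t that decays towards 0 as t grows."""
    theta = np.asarray(theta0, dtype=float)
    for t in range(n_iter):
        a_t = a0 / (1.0 + decay * t)   # one common decay schedule (assumption)
        theta = theta - a_t * grad_loss(theta)
    return theta

# Illustration on least squares: Loss(theta) = sum_i (y_i - x_i' theta)^2,
# whose gradient is -2 * X' (y - X theta).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
theta_hat = gradient_descent(lambda th: -2 * X.T @ (y - X @ th),
                             theta0=np.zeros(3))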
How to select the learning rate a_t?

Back-propagation algorithm
Practice recommendations

- It's not unusual that people have to spend a few days, or even weeks and months, training a deep learning model
- If you can train a deep learning model successfully, then in most cases you beat conventional models (linear regression, logistic, LDA, etc.) in terms of prediction accuracy
- But training a deep learning model successfully requires some effort. There are some implementation tricks that you might find useful in practice
Practice recommendations

- Fix random seed: training involves random weight initialisation and data shuffling, so fix the seed to make results reproducible.
- Early stopping: monitor the validation loss during training and stop once it no longer improves.

A Keras sketch of both tricks follows.
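This is a sketch under assumptions: the patience value and validation split are illustrative choices, and the model/data are assumed defined as in the earlier example.

import random
import numpy as np
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping

# Fix random seeds so weight initialisation and shuffling are reproducible
np.random.seed(0)
random.seed(0)
tf.random.set_seed(0)

# Stop training once the validation loss has not improved for 10 epochs,
# and roll back to the best weights seen so far
early_stop = EarlyStopping(monitor='val_loss', patience=10,
                           restore_best_weights=True)
# model.fit(X_train, y_train, validation_split=0.2,
#           epochs=500, callbacks=[early_stop])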
Practice recommendations

How to select the structure of the neural net? How many hidden layers should we use? How many units in each layer?

- This is a challenging model selection problem. There is no definite answer.
- But there are basically three shapes to choose from: left-base pyramid (large to small layers), right-base pyramid (small to large layers) and rectangular (roughly equal-size layers)
- In many cases, rectangular-shaped neural nets work well
- For the number of hidden layers, simply start from 1 layer, adding more until the validation loss doesn't get smaller (see the sketch after this list).
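One way to follow the last recommendation, sketched in Keras; the layer width of 30, the maximum depth of 5 and the training data X_train, y_train are illustrative assumptions:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def build_model(n_layers, n_units, input_dim):
    # Rectangular shape: every hidden layer has the same number of units
    model = Sequential()
    model.add(Dense(n_units, input_dim=input_dim, activation='relu'))
    for _ in range(n_layers - 1):
        model.add(Dense(n_units, activation='relu'))
    model.add(Dense(1))  # linear output for regression
    model.compile(loss='mse', optimizer='adam')
    return model

best_loss, best_depth = float('inf'), 0
for n_layers in range(1, 6):
    model = build_model(n_layers, n_units=30, input_dim=X_train.shape[1])
    hist = model.fit(X_train, y_train, validation_split=0.2,
                     epochs=100, verbose=0)
    val_loss = min(hist.history['val_loss'])
    if val_loss < best_loss:
        best_loss, best_depth = val_loss, n_layers
    else:
        break  # validation loss stopped improving; keep the previous depth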
Neural Networks for Time Series

Neural Networks for Time Series: Non-linear Autoregression

- For time series, we shall use lagged values of the time series as inputs to a neural network, and the output as the prediction/forecast
- This means, in general, the number of input neurons of the neural network equals the number of time lags, and there is only one output neuron
- We will first consider feed-forward networks with one hidden layer.
- We denote by NNAR(p, k) the NN with p lagged inputs, k neurons in the hidden layer and one output, the forecast
- For example, the NNAR(12, 10) model is a neural network that uses the last 12 observations (y_{t−1}, y_{t−2}, ..., y_{t−12}) to fit y_t at any time step t, with 10 neurons in the hidden layer
- An NNAR(p, 0) model is equivalent to ARIMA(p, 0, 0) without the restrictions on the parameters that ensure stationarity.
Neural Networks for Time Series: Training

- Consider the neural network NNAR(p, k). For a given section of the time series y_{t−p+1}, y_{t−p+2}, ..., y_t, denote the output from NNAR(p, k) by ŷ_{t+1} = η(y_{t−p+1}, y_{t−p+2}, ..., y_t; w), where w collects all the weights in the network.
- Given training data {y_1, ..., y_n}, we form the training patterns as follows (a code sketch for building these patterns appears below):

  y_1, y_2, ..., y_p → y_{p+1}:  ε_{p+1} = y_{p+1} − η(y_1, y_2, ..., y_p; w)
  y_2, y_3, ..., y_{p+1} → y_{p+2}:  ε_{p+2} = y_{p+2} − η(y_2, y_3, ..., y_{p+1}; w)
  y_3, y_4, ..., y_{p+2} → y_{p+3}:  ε_{p+3} = y_{p+3} − η(y_3, y_4, ..., y_{p+2}; w)
  ...
  y_{n−p}, y_{n−p+1}, ..., y_{n−1} → y_n:  ε_n = y_n − η(y_{n−p}, y_{n−p+1}, ..., y_{n−1}; w)

- To estimate the weights w, we minimise

  $$\min_w \Big\{ \mathrm{Loss}(w) = \sum_{i=p+1}^{n} \epsilon_i^2 \Big\}$$

Neural Networks for Seasonal Time Series

- Besides the lagged data as inputs, it is useful to add the last observed values from the same seasons as inputs, for seasonal time series
- The notation NNAR(p, P, k)_m means a model with inputs

  (y_{t−1}, y_{t−2}, ..., y_{t−p}, y_{t−m}, y_{t−2m}, ..., y_{t−Pm})

  and k neurons in the hidden layer
- What is the input for NNAR(5, 4, 10)_6?
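A small helper that builds the NNAR training patterns above (a lag matrix) from a series; a sketch, with make_patterns a name introduced here for illustration:

import numpy as np

def make_patterns(y, p):
    """Turn a series y_1,...,y_n into NNAR(p, .) training patterns:
    each row of X holds (y_{t-p}, ..., y_{t-1}) and the target is y_t."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([y[i:len(y) - p + i] for i in range(p)])
    target = y[p:]
    return X, target

# Example: p = 4 gives patterns (y1,y2,y3,y4) -> y5, (y2,y3,y4,y5) -> y6, ...
X, target = make_patterns(np.arange(1, 11), p=4)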
- Use time series analysis methods to identify the lag values p and P (e.g. ACF).
- Split your data into two main sections: a training section {Y_1, Y_2, ..., Y_n} and a validation section {Y_{n+1}, Y_{n+2}, ..., Y_T}. For example, for a time series of three years of data, you may use the data from the first two years for training and the data from the third year for validation.
- Define the neural network architecture, with the lagged values as input units and the prediction as the output unit. Here are all the data patterns for the training data {Y_1, Y_2, ..., Y_n} for an NNAR(4, k):

  Training Pattern 1: Y_1, Y_2, Y_3, Y_4 → Y_5
  Training Pattern 2: Y_2, Y_3, Y_4, Y_5 → Y_6
  Training Pattern 3: Y_3, Y_4, Y_5, Y_6 → Y_7
  ...
  Training Pattern n−4: Y_{n−4}, Y_{n−3}, Y_{n−2}, Y_{n−1} → Y_n

- Train the network.
- Forecast using the network on the validation set {Y_{n+1}, Y_{n+2}, ..., Y_T}: here you pass in four values as the input layer and see what the output node produces.

  Validation Pattern 1: Y_{n−3}, Y_{n−2}, Y_{n−1}, Y_n → Ŷ_{n+1}
  Validation Pattern 2: Y_{n−2}, Y_{n−1}, Y_n, Y_{n+1} → Ŷ_{n+2}
  Validation Pattern 3: Y_{n−1}, Y_n, Y_{n+1}, Y_{n+2} → Ŷ_{n+3}
  ...

- Then, form the validation/test error.

The example:

- Dataset: International Airport Arriving Passengers, in csv format
- Prepare data: Xtrain, Ytrain, Xtest and Ytest with time_lag = 4
- Define the NN architecture:

  model = Sequential()
  model.add(Dense(30, input_dim=time_lag, activation='relu'))
  model.add(Dense(1))

- This defines a network with 30 neurons in the hidden layer
- Test on Lecture10_Example01.py (a sketch of such a script follows)
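Putting the pieces together, a hedged sketch of the kind of script Lecture10_Example01.py is described as containing; the CSV path, column name and 80/20 split are assumptions, not the actual file:

import numpy as np
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

time_lag = 4

# Load the series (file path and column name are assumptions)
y = pd.read_csv('airport_passengers.csv')['passengers'].to_numpy(dtype=float)

# Build lagged patterns: rows of X are (y_{t-4}, ..., y_{t-1}), target is y_t
X = np.column_stack([y[i:len(y) - time_lag + i] for i in range(time_lag)])
target = y[time_lag:]

# Chronological train/test split (no shuffling for time series)
n_train = int(0.8 * len(X))
Xtrain, Xtest = X[:n_train], X[n_train:]
Ytrain, Ytest = target[:n_train], target[n_train:]

# Define the NN architecture: 30 hidden ReLU units, one linear output
model = Sequential()
model.add(Dense(30, input_dim=time_lag, activation='relu'))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

model.fit(Xtrain, Ytrain, epochs=100, batch_size=10, verbose=0)
print('Test RMSE:', np.sqrt(model.evaluate(Xtest, Ytest, verbose=0)))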
The example: Forecasting (figures omitted; panels: Hidden = 3, Hidden = 30)
Recurrent neural network (RNN)

An RNN maintains a hidden state h_t that summarises the series so far:

$$h_t = h(v\, y_{t-1} + w\, h_{t-1} + b)$$
$$\eta_t = \beta_0 + \beta_1 h_t$$
$$y_t = \eta_t + \epsilon_t$$

so that

$$\mathbb{V}(y_t \mid y_{1:t-1}) = \sigma^2$$
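A scalar numpy sketch of this recurrence (the parameter values, the tanh choice for h, and the zero initial state are illustrative assumptions):

import numpy as np

def rnn_forecasts(y, v, w, b, beta0, beta1, h=np.tanh):
    """One-step-ahead forecasts eta_t from the simple RNN
    h_t = h(v*y_{t-1} + w*h_{t-1} + b), eta_t = beta0 + beta1*h_t."""
    h_t, etas = 0.0, []
    for y_prev in y[:-1]:
        h_t = h(v * y_prev + w * h_t + b)  # update the hidden state
        etas.append(beta0 + beta1 * h_t)   # forecast of the next value
    return np.array(etas)                  # etas[i] forecasts y[i+1]

eta = rnn_forecasts(np.sin(np.arange(50)), v=0.5, w=0.3, b=0.0,
                    beta0=0.0, beta1=1.0)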
Recurrent neural network (RNN): training

Advanced variants of RNN

(Figures omitted; panels: Epoch = 1, Epoch = 20)
Consultation hours

- I still run consultation as usual until the exam day. Or send me an email to make an appointment at other times.
- Check with your tutor/lecturer as they might extend their consultation hours as well.

Study hard and play hard! All the best with your exams!
Hope you'll be like this after the final exams...

Don't forget to give the teaching team your feedback on the course! If you can, please do it now. You might win a Macbook!