
Course: Intelligent Systems

Unit 2: Neural Networks


2.1 Representing Neural Networks

Daniel Manrique
2021

This work is licensed under a Creative Commons


Attribution-NonCommercial-ShareAlike 4.0 International License:
http://creativecommons.org/licenses/by-nc-sa/4.0/
Intelligent Systems
Neural Networks

1. Representing neural networks.


2. Training neural networks.
1. Linear Regression.
2. Logistic Regression.
3. Multilayer perceptrons.

2
Textbooks, tutorials, and articles
● D. Manrique (2021). From Artificial Cells to Deep Learning. An Evolutionary Story. Archivo Digital UPM, Madrid.
● Python Tutorial: https://docs.python.org/3/tutorial/
● T. P. Lillicrap, A. Santoro, L. Marris, C. J. Akerman, and G. Hinton (2020). Backpropagation and the Brain. Nature Reviews Neuroscience, 21, 335-346.
● A. Géron (2019). Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow: Concepts, Tools, and Techniques for Building Intelligent Systems. O'Reilly, CA, USA.
● M. Abadi et al. (2016). TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv:1603.04467v2.
● J. M. Font, D. Manrique, J. Ríos (2009). Redes de Neuronas Artificiales y Computación Evolutiva. Fundación General de la UPM, Madrid, España.
● S. Haykin (1999). Neural Networks: A Comprehensive Foundation. Prentice Hall, Ontario, Canada, 2nd Edition.
3
Machine learning

● Machine learning is a field of study (discipline) within artificial intelligence that gives (programs) computers the ability to learn from data without being explicitly programmed. Géron, 2017

● Machine learning is well suited for:
● Problems for which existing solutions require a lot of hand-tuning or long lists of rules.
● Complex problems for which there is no good solution using a traditional approach.
● Fluctuating environments: adapting to new data.
● Discovering patterns or knowledge in complex problems and large amounts of data.

4
Learning

M. Sewell (2008). Structural risk minimization. University College London.

A (machine) learning problem consists of setting up, tuning, or finding a configuration for modifiable components of an (intelligent or learning) system that are responsible for its success or failure.
5
Supervised learning
● The training data fed to the algorithm include the desired or target
solutions.
● Typical examples are classification, where labels correspond to
classes, or regression, where labels correspond to the target
values.

Labelled data

[Image credits: https://scorecardstreet.wordpress.com/2015/12/09/is-machine-learning-the-new-epm-black/ Laura Edell, 2015; http://www.ashbooth.com/blog/tag/machine-learning-2/ Ash Booth, Iris flower database.]
6
Unsupervised learning

[Image credits: http://www.frankichamaki.com/data-driven-market-segmentation-more-effective-marketing-to-segments-using-ai/ Franki Chamaki, 2016; Unsupervised Image-to-Image Translation Networks: https://arxiv.org/abs/1703.00848]
7
Semi-supervised learning

[Figure: example photos labelled with names (Carla, María, Alejandra, Pablo, Miguel, Samuel) illustrating semi-supervised learning. Image source: Unsupervised Image-to-Image Translation Networks: https://arxiv.org/abs/1703.00848]
8
Reinforcement learning

By Megajuice (Own work) [CC0], via Wikimedia Commons https://www.econsejos.com/nina-quema-boca/

9
Artificial neural networks
An artificial neural network is a machine learning system
inspired by the natural (animal) nervous system, composed
of processing elements, units, or (artificial) neurons
interconnected by weighted connections.

[Figure: a layered neural network with kernels W[1], W[2], W[3]. Source: https://medium.com/autonomous-agents/mathematical-foundation-for-activation-functions-in-artificial-neural-networks-a51c9dd7c089#.r0uddzxdd]
Features
● Learning from a set of examples.
● Neural network learning is about finding weights that make the neural network exhibit the desired behavior.
● This set of weights is called a solution to the problem.
● The learning process changes the synaptic weights (parameters) between neurons to adapt their responses and achieve the expected network behavior: to fit the dataset.
● Generalization: giving adequate answers to unseen data.

[Figure: a neural network with kernels W[1], W[2], W[3] learning from MNIST handwritten-digit input examples.]
11
[Figure: The Neural Network Zoo: http://www.asimovinstitute.org/neural-network-zoo/]
12
A little bit about history

[Figure: timeline of neural network history, including Alan Mathison Turing (1912-1954; 1936), GANN (Manrique & Ríos, 2001), and EANN (Manrique, 2013). Source: Favio Vazquez]
Feedforward neural networks
● These networks are organized in layers. Each layer groups a set of neurons that receive synapses from the neurons of the previous layer and send their outputs to the neurons in the next layer.
● Feedforward NNs with a single hidden layer are universal approximators under some smoothness conditions, but empirical evidence suggests that NNs with several hidden layers are better adapted to learn functions.
[Figure: a feedforward network with inputs x1, x2, x3, kernels W[1], W[2], and outputs y1, y2. Source: https://commons.wikimedia.org/wiki/File:Artificial_neural_network.svg#/media/File:Artificial_neural_network.svg]
14
Deep and shallow
● Deep NN learning assigns weights to long causal chains (paths) of connections throughout the neurons, each of which transforms the aggregate activation of the network.
● Shallow NN models have short such paths.
[Figure: a feedforward network with inputs x1, x2, x3, kernels W[1], W[2], and outputs y1, y2.]
15
Components of FF neural networks
Neuron: the basic network processing unit. A neuron i receives different pieces of information from multiple inputs X, processes the information received, and sends out a single output or response yi that is transmitted identically to multiple neurons.
[Figure: neuron i receiving inputs x1, x2, …, xn𝓍 and producing the output yi.]
● xj is the jth input to neuron i.
● X is the input vector to neuron i: a column vector of size n𝓍 × 1.
● yi is the output from neuron i.
16
Components of FF neural networks
Synapse: a directed connection from neuron j in the previous layer ℓ-1 to a neuron i in the current layer ℓ, and from neuron i to a neuron k in the following layer ℓ+1.
[Figure: neuron j (layer ℓ-1) → neuron i (current layer ℓ) → neuron k (layer ℓ+1).]
17
Components of FF neural networks
Synaptic weight: a real number wij representing the strength of the connection between the neuron j in the previous layer and the neuron i. A large weight means that the information communicated through the connection makes a significant contribution to the new state of the receptor neuron i.
[Figure: neuron i receiving inputs x1, …, xn𝓍 through weights wi1, …, win𝓍 and producing the output yi.]
18
Components of FF neural networks

Net: the total weighted input received by a neuron i, noted as neti.
Bias: a weight with its input fixed to 1, noted as bi or wi0.

neti = ∑j=1…n𝓍 xj wij + bi

[Figure: neuron i with inputs x1, …, xn𝓍, weights wi1, …, win𝓍, bias bi, and output yi.]
19
Components of FF neural networks
Activation: a neuron's level of excitation ai, given by the activation function f(neti). It is usually the output of the neuron, yi:

yi = f(neti)

[Figure: neuron i with inputs x1, …, xn𝓍, weights, bias bi, and output yi = f(neti).]
20
Components of FF neural networks
Kernel: the matrix of weights corresponding to layer ℓ, noted as W[ℓ]. We do not consider the input layer, since it has neither a kernel nor an activation function.
The activation function is the same for all neurons in a layer ℓ, but it may differ between neurons in different layers.
[Figure: input layer (input vector X: x1, x2) → hidden layer 1 (three neurons, activation f[1], kernel W[1]) → hidden layer 2 (two neurons, activation f[2], kernel W[2]) → output layer (activation f[3], kernel W[3], output vector Y: y).]
21
The artificial neuron

[Figure: scalar model with inputs x1, …, xn𝓍, weights w1, …, wn𝓍, bias b, net, activation function, and output y; matrix model with weight matrix W of size 1 × n𝓍, input vector x of size n𝓍 × 1, bias b, Net, and output y.]

Scalar model:
net = ∑j=1…n𝓍 xj wj + b
y = f(∑j=1…n𝓍 xj wj + b)

Matrix model:
Net = Wx + b
y = f(Wx + b)
22
Neural network dynamics
[Figure: forward propagation through a feedforward network with inputs x1, x2, three neurons in hidden layer 1 (activation f[1]), two neurons in hidden layer 2 (activation f[2]), and one output neuron (activation f[3]), computed neuron by neuron:]
y1 = f[1](w1,0·1 + w1,x1·x1 + w1,x2·x2)
y2 = f[1](w2,0·1 + w2,x1·x1 + w2,x2·x2)
y3 = f[1](w3,0·1 + w3,x1·x1 + w3,x2·x2)
y4 = f[2](w4,0·1 + w4,1·y1 + w4,2·y2 + w4,3·y3)
y5 = f[2](w5,0·1 + w5,1·y1 + w5,2·y2 + w5,3·y3)
y = f[3](w6,0·1 + w6,4·y4 + w6,5·y5)
23
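As a minimal sketch (not part of the slides) of the same layer-by-layer dynamics using the kernel notation W[ℓ], each layer computes f[ℓ](W[ℓ]·a + b[ℓ]); the random weights and the tanh/sigmoid activations are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Kernels and biases of a 2-3-2-1 feedforward network (random, for illustration)
W1, b1 = rng.normal(size=(3, 2)), np.zeros((3, 1))   # hidden layer 1
W2, b2 = rng.normal(size=(2, 3)), np.zeros((2, 1))   # hidden layer 2
W3, b3 = rng.normal(size=(1, 2)), np.zeros((1, 1))   # output layer

f1 = np.tanh                                   # f[1] (assumed)
f2 = np.tanh                                   # f[2] (assumed)
f3 = lambda net: 1.0 / (1.0 + np.exp(-net))    # f[3] (assumed sigmoid)

x = np.array([[0.2], [-0.7]])                  # input vector (x1, x2)

# Forward propagation, one layer at a time
a1 = f1(W1 @ x + b1)     # activations y1, y2, y3
a2 = f2(W2 @ a1 + b2)    # activations y4, y5
y = f3(W3 @ a2 + b3)     # network output y
print(y)
```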
Case study: California housing dataset
Median house value regression and classification
Dataset: m=20,640 examples corresponding to districts in
California, ranging from 600 to 3,000 people.

n𝓍=9 attributes.
Label = Median house value
ny= 3 classes (classification)
ny= 1 for regression.

R. K. Pace & R. Barry, “Sparse Spatial Autoregressions,” Statistics & Probability Letters 33, no. 3 (1997): 291-297. http://lib.stat.cmu.edu/datasets/
[Figure: map of the California districts in the dataset (Pacific Ocean shown).]
24
Median house value regression problem
Attributes: longitude, latitude, median age, total rooms,
total bedrooms, population, households, median income,
ocean proximity.
Median house value: $15,000 - $500,000.

Attributes | Output
Long.    Lat.   Age  Rooms  Beds  Pop.  HouH.  Inc.    Ocean proximity | Median H. Val.
-122.23  37.88  41   880    129   322   126    8.3252  Near Bay        | 452600
-121.97  37.57  21   4342   783   2172  789    4.6146  <1h ocean       | 247600
-121.9   37.66  18   7397   1137  3126  1115   6.4994  Inland          | 323000
-124.17  41.8   16   2739   480   1259  436    3.7557  Near ocean      | 109400
-118.32  33.35  27   1675   521   744   331    2.1579  Island          | 450000

25
Median house value classification problem
Attributes: longitude, latitude, median age, total rooms,
total bedrooms, population, households, median income,
ocean proximity.
Median house value classes: Cheap [15, 141.3]; Averaged [141.4, 230.2]; Expensive [230.3, 500] thousand dollars.

Attributes | Output (classes)
Long.    Lat.   Age  Rooms  Beds  Pop.  HouH.  Inc.    Ocean proximity | Median H. Val.
-122.23  37.88  41   880    129   322   126    8.3252  Near Bay        | Expensive
-121.97  37.57  21   4342   783   2172  789    4.6146  <1h ocean       | Expensive
-121.9   37.66  18   7397   1137  3126  1115   6.4994  Inland          | Expensive
-124.17  41.8   16   2739   480   1259  436    3.7557  Near ocean      | Cheap
-118.32  33.35  27   1675   521   744   331    2.1579  Island          | Expensive
26
ANN project stages

Christof Angermueller et al., 2016. Deep learning for computational biology, Molecular Systems Biology.

27
● Cleaning and preparing data:
1. The total_bedrooms attribute has 207 missing values (na or nan); these examples are removed, since they are very few compared to the whole dataset.
2. ISLAND has only 5 samples, not enough to generalize. This class is removed.
   Discretizing attributes: <1h ocean: 0; Inland: 1; Near bay: 2; Near ocean: 3.
3. The dataset is randomized.
4. Classes are encoded: first discretized and then one-hot encoded (see the code sketch after this slide).
   One-hot encoded labels: Cheap: 1,0,0; Averaged: 0,1,0; Expensive: 0,0,1.
28
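A minimal sketch (not part of the slides) of this cleaning and encoding with pandas; the file name housing.csv is an assumption, the column names and category strings follow the public California housing dataset, and the class boundaries are the approximate values from the slide.

```python
import pandas as pd

housing = pd.read_csv("housing.csv")   # assumed file with the 20,640 examples

# 1. Remove the few examples with missing total_bedrooms
housing = housing.dropna(subset=["total_bedrooms"])

# 2. Remove the ISLAND class (only 5 samples)
housing = housing[housing["ocean_proximity"] != "ISLAND"]

# Discretize the ocean proximity attribute
ocean_codes = {"<1H OCEAN": 0, "INLAND": 1, "NEAR BAY": 2, "NEAR OCEAN": 3}
housing["ocean_proximity"] = housing["ocean_proximity"].map(ocean_codes)

# 3. Randomize the dataset
housing = housing.sample(frac=1.0, random_state=42).reset_index(drop=True)

# 4. Discretize the label into three classes and one-hot encode it
classes = pd.cut(housing["median_house_value"],
                 bins=[15_000, 141_300, 230_200, 500_000],
                 labels=["Cheap", "Averaged", "Expensive"],
                 include_lowest=True)
one_hot = pd.get_dummies(classes)   # one column per class: Cheap, Averaged, Expensive
```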
Data cleaning and preparing

5. Attributes are individually re-scaled (normalized) with min-max scaling to the range [-1, 1]: x' = (x - (max+min)/2) / ((max-min)/2).

Long.    Lat.   Age    Rooms  Beds   Pop.   HouH.  Inc.   O. prox. | Reg. H. val. | Class. H. val.
-0.5623  0.763  0.17   -0.87  -0.74  -0.85  -0.91  0.93   0.33     | 0.87         | 0,0,1
0.4297   0.477  -0.13  0.44   0.32   0.44   0.52   0.23   -1       | -0.45        | 1,0,0
0.0212   -0.75  0.21   0.67   0.87   0.94   0.9    0.64   -0.33    | 0.34         | 0,1,0
-0.6171  -0.11  -0.37  -0.31  -0.23  -0.34  -0.27  -0.34  1        | -0.82        | 1,0,0

6. The correlation matrix between all pairs of attributes has been calculated to visualize their dependencies. The results show that total_rooms, total_bedrooms, population, and households are highly (positively) correlated.
7. Finally, the dataset is partitioned into three subsets: 16,342 (80%) samples for training the neural model, 2,043 (10%) for development testing, and 2,043 (10%) for final testing purposes. A code sketch of the scaling and the partition follows after this slide.
29
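A minimal sketch (not part of the slides) of steps 5 and 7 with NumPy: column-wise min-max scaling to [-1, 1] and the 80/10/10 partition; the variable names and the placeholder data are assumptions.

```python
import numpy as np

# X: one row per example, one column per numeric attribute
# (random placeholder data of the same size as the cleaned dataset)
X = np.random.default_rng(0).uniform(0.0, 100.0, size=(20_428, 9))

# 5. Min-max scaling of each attribute to [-1, 1]:
#    x' = (x - (max+min)/2) / ((max-min)/2)
col_min, col_max = X.min(axis=0), X.max(axis=0)
X_scaled = (X - (col_max + col_min) / 2.0) / ((col_max - col_min) / 2.0)

# 7. Partition: 80% training, 10% development (validation), 10% final test
m = X_scaled.shape[0]
n_train, n_dev = int(0.8 * m), int(0.1 * m)
X_train = X_scaled[:n_train]                   # ~16,342 examples
X_dev = X_scaled[n_train:n_train + n_dev]      # ~2,043 examples
X_test = X_scaled[n_train + n_dev:]            # ~2,043 examples
```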
Dataset partition
[Figure: the scaled dataset (columns Long., Lat., Age, Rooms, Beds, Pop., HouH., Inc., O. proximity, H. val.) partitioned by rows into three subsets: training set 📉, development/validation set 🔍, and (final) test set 🔒.]

● Training set: adjust the parameters (weights and biases).
● Development or validation (test) set: tune the model (hyperparameters).
● (Final) test set: reliability.
30
Measuring performance
● Training set: the data employed to adjust the parameters (weights and biases) in the training process.
● Development or validation (test) set: employed to measure how well the model classifies unseen data and to tune the hyperparameters (e.g., the learning rate) accordingly. Since the model is tuned according to the results on this dataset, it seems better than it really is: it is biased.
● (Final) test set: never presented until the model is finished (tuned); it is used to predict how well the model fits the problem.
● Accuracy: the percentage of correctly classified examples (see the small sketch after this list).
● Utility function: greater is better. Maximization.
● Cost (J) and loss (ℒ) functions (error measures): lower is better. Minimization.
31
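As a small illustrative sketch (not in the slides), accuracy over one-hot encoded labels is the fraction of examples whose predicted class matches the target class; the arrays below are made-up values.

```python
import numpy as np

# Made-up one-hot targets and network outputs for 4 examples and 3 classes
Y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])
Y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6],
                   [0.2, 0.2, 0.6],
                   [0.2, 0.7, 0.1]])

# Accuracy: percentage of correctly classified examples (greater is better)
accuracy = np.mean(Y_pred.argmax(axis=1) == Y_true.argmax(axis=1))
print(accuracy)   # 0.75
```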
Development environments

32
Computational resources

33
Our choice

34
Other open source DL libraries
● H2O: Python, R. h2o.ai, 2014.
● Deeplearning4j: Java. Gibson & Patterson, 2014.
● Caffe: Python, C++, Matlab. Berkeley, 2013.
● Theano: Python. Montreal, 2010.
● PyTorch: Python, C++. Facebook AI Research, 2016 (built on Torch, 2002).
35
36
Overview
● TensorFlow is a programming library for machine learning.
● Google released TensorFlow 2 in 2019; Google Colab made it the default version in 2020.
● Tensors may be arrays of any dimension.
● TensorFlow encourages Python, although it also allows C++, Java, and Go.
● TensorFlow 2 promotes Keras, a library built on top of TensorFlow, to ease the construction and execution of neural models. Keras is central to TensorFlow 2.
● Main difference between the two versions:
● TensorFlow 1 represents computations as graphs. Writing code comprises two parts: building the computational graph and executing it within a session (device).
● TensorFlow 2 implements eager execution: you can see the result without the need to create a session.
37
Example in TensorFlow 1
n Matrix multiplication.
1. Assembling the graph.
● The code below builds the graph but does not perform any operation yet.

# TensorFlow is imported first
import tensorflow as tf
# Three nodes: two constant ops and one matmul op.
# First, a constant op that produces a 1x2 matrix.
node_matrix1 = tf.constant([[3., 3.]])
# Then, a constant op that produces a 2x1 matrix.
node_matrix2 = tf.constant([[2.], [2.]])
# Finally, a matmul op that takes both matrices as inputs
# and computes their product.
node_product = tf.matmul(node_matrix1, node_matrix2)

2. Launching the session to execute the graph.

with tf.Session() as sess:
    result = sess.run(node_product)
    print(result)   # [[12.]]
38
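Note (not in the original slides): TensorFlow 2 keeps this graph-and-session style available through the v1 compatibility module, so the example above can still be run after disabling eager execution. A minimal sketch:

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()   # switch back to graph mode

node_matrix1 = tf.constant([[3., 3.]])
node_matrix2 = tf.constant([[2.], [2.]])
node_product = tf.matmul(node_matrix1, node_matrix2)

with tf.compat.v1.Session() as sess:
    print(sess.run(node_product))   # [[12.]]
```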
Example in TensorFlow 2
Eager execution
import tensorflow as tf
M1 = tf.constant([[3., 3.]])     # M1 = (3, 3)
M2 = tf.constant([[2.], [2.]])   # M2 = (2, 2)ᵀ
result = tf.matmul(M1, M2)
# The multiplication has already been performed (eager execution)
print(result)   # tf.Tensor([[12.]], shape=(1, 1), dtype=float32)

39
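Since the Overview slide stresses that Keras is central to TensorFlow 2, here is a minimal, hypothetical sketch (not part of the slides) of how the 2-3-2-1 feedforward network from the dynamics slide could be represented with the Keras Sequential API; the tanh and sigmoid activations are assumptions.

```python
import tensorflow as tf
from tensorflow import keras

# A 2-3-2-1 feedforward network: each Dense layer holds a kernel W[l] and a bias b[l]
model = keras.Sequential([
    keras.Input(shape=(2,)),                      # inputs x1, x2
    keras.layers.Dense(3, activation="tanh"),     # hidden layer 1, f[1] assumed tanh
    keras.layers.Dense(2, activation="tanh"),     # hidden layer 2, f[2] assumed tanh
    keras.layers.Dense(1, activation="sigmoid"),  # output layer, f[3] assumed sigmoid
])

model.summary()                          # lists the kernels and biases per layer
y = model(tf.constant([[0.2, -0.7]]))    # eager forward pass on one example
print(y)
```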
Lecture slides “Representing Neural Networks” of the master course “Intelligent Systems”.
2021 Daniel Manrique

Suggested work citation:


D. Manrique. (2021). Representing Neural Networks. Lecture slides. In the course
“Intelligent Systems” of the Master's Degree in Computer Engineering. Department of
Artificial Intelligence. Universidad Politécnica de Madrid.

This work is licensed under a Creative Commons


Attribution-NonCommercial-ShareAlike 4.0 International License:
http://creativecommons.org/licenses/by-nc-sa/4.0/
