
Course: Intelligent Systems

Unit 2: Neural Networks


2.1 Representing Neural Networks

Daniel Manrique
2021

This work is licensed under a Creative Commons


Attribution-NonCommercial-ShareAlike 4.0 International License:
http://creativecommons.org/licenses/by-nc-sa/4.0/
Intelligent Systems
Neural Networks

1. Representing neural networks.


2. Training neural networks.
1. Linear Regression.
2. Logistic Regression.
3. Multilayer perceptrons.

2
Textbooks, tutorials, and articles
● D. Manrique (2021). From Artificial Cells to Deep Learning. An Evolutionary Story. Archivo Digital UPM, Madrid.
● Python Tutorial: https://docs.python.org/3/tutorial/
● T. P. Lillicrap, A. Santoro, L. Marris, C. J. Akerman, and G. Hinton (2020). Backpropagation and the Brain. Nature Reviews Neuroscience, 21, 335-346.
● A. Géron (2019). Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow: Concepts, Tools, and Techniques for Building Intelligent Systems. O'Reilly, CA, USA.
● M. Abadi et al. (2016). TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv:1603.04467v2.
● J. M. Font, D. Manrique, J. Ríos (2009). Redes de Neuronas Artificiales y Computación Evolutiva. Fundación General de la UPM, Madrid, España.
● S. Haykin (1999). Neural Networks: A Comprehensive Foundation. Prentice Hall, Ontario, Canada, 2nd Edition.
3
Machine learning

● Machine learning is a field of study (discipline) within artificial intelligence that gives (programs) computers the ability to learn from data without being explicitly programmed. Géron, 2017

● Machine learning is well suited for:
● Problems for which existing solutions require a lot of hand-tuning or long lists of rules.
● Complex problems for which there is no good solution using a traditional approach.
● Fluctuating environments: adapting to new data.
● Discovering patterns or knowledge in complex problems and large amounts of data.

4
Learning

M. Sewell (2008). Structural risk minimization. University College London.

A (machine) learning problem consists of setting up, tuning, or finding a configuration for modifiable components of an (intelligent or learning) system that are responsible for its success or failure.
5
Supervised learning
● The training data fed to the algorithm include the desired or target
solutions.
● Typical examples are classification, where labels correspond to
classes, or regression, where labels correspond to the target
values.

Labelled data

[Image credits: https://scorecardstreet.wordpress.com/2015/12/09/is-machine-learning-the-new-epm-black/ Laura Edell, 2015; http://www.ashbooth.com/blog/tag/machine-learning-2/ Ash Booth, Iris flower database.]
6
Unsupervised learning

[Image credits: http://www.frankichamaki.com/data-driven-market-segmentation-more-effective-marketing-to-segments-using-ai/ Franki Chamaki, 2016; Unsupervised Image-to-Image Translation Networks: https://arxiv.org/abs/1703.00848]
7
Semi-supervised learning

[Figure: example photos labelled with names (Carla, María, Alejandra, Pablo, Miguel, Samuel) illustrating semi-supervised learning. Image source: Unsupervised Image-to-Image Translation Networks: https://arxiv.org/abs/1703.00848]
8
Reinforcement learning

By Megajuice (Own work) [CC0], via Wikimedia Commons https://www.econsejos.com/nina-quema-boca/

9
Artificial neural networks
An artificial neural network is a machine learning system
inspired by the natural (animal) nervous system, composed
of processing elements, units, or (artificial) neurons
interconnected by weighted connections.

[Figure: a layered neural network with kernels W[1], W[2], W[3]. Source: https://medium.com/autonomous-agents/mathematical-foundation-for-activation-functions-in-artificial-neural-networks-a51c9dd7c089#.r0uddzxdd]
Features
● Learning from a set of examples.
● Neural network learning is about finding weights that make the neural network exhibit the desired behavior.
● This set of weights is called a solution to the problem.
● The learning process changes the synaptic weights (parameters) between neurons to adapt their responses and achieve the expected network behavior: to fit the dataset.
● Generalization: giving adequate answers to unseen data.

[Figure: a neural network with kernels W[1], W[2], W[3] learning from MNIST handwritten-digit input examples.]
11
[Figure: The Neural Network Zoo: http://www.asimovinstitute.org/neural-network-zoo/]
12
A little bit about history

[Figure: timeline of neural network history, including Alan Mathison Turing (1912-1954; 1936), GANN (Manrique & Ríos, 2001), and EANN (Manrique, 2013). Source: Favio Vazquez]
Feedforward neural networks
● These networks are organized in layers. Each layer groups a set of neurons that receive synapses from the neurons of the previous layer and send their outputs to the neurons in the next layer.
● Feedforward NNs with a single hidden layer are universal approximators under some smoothness conditions, but empirical evidence suggests that NNs with several hidden layers are better adapted to learn functions.
[Figure: a feedforward network with inputs x1, x2, x3, kernels W[1], W[2], and outputs y1, y2. Source: https://commons.wikimedia.org/wiki/File:Artificial_neural_network.svg#/media/File:Artificial_neural_network.svg]
14
Deep and shallow
● Deep NN learning assigns weights to long causal chains (paths) of connections throughout the neurons, each of which transforms the aggregate activation of the network.
● Shallow NN models have short such paths.
[Figure: a feedforward network with inputs x1, x2, x3, kernels W[1], W[2], and outputs y1, y2.]
15
Components of FF neural networks
Neuron: the basic network processing unit. A neuron i receives different pieces of information from multiple inputs X, processes the information received, and sends out a single output or response yi that is transmitted identically to multiple neurons.
[Figure: neuron i receiving inputs x1, x2, …, xn𝓍 and producing the output yi.]
● xj is the jth input to neuron i.
● X is the input vector to neuron i: a column vector of size n𝓍 × 1.
● yi is the output from neuron i.
16
Components of FF neural networks
Synapse: a directed connection from neuron j in the previous layer ℓ-1 to a neuron i in the current layer ℓ, and from neuron i to a neuron k in the following layer ℓ+1.
[Figure: neuron j (layer ℓ-1) → neuron i (current layer ℓ) → neuron k (layer ℓ+1).]
17
Components of FF neural networks
Synaptic weight: a real number wij representing the strength of the connection between the neuron j in the previous layer and the neuron i. A large weight means that the information communicated through the connection makes a significant contribution to the new state of the receptor neuron i.
[Figure: neuron i receiving inputs x1, …, xn𝓍 through weights wi1, …, win𝓍 and producing the output yi.]
18
Components of FF neural networks

Net: the total weighted input received by a neuron i, noted as neti.
Bias: a weight with its input fixed to 1, noted as bi or wi0.

neti = ∑j=1…n𝓍 xj wij + bi

[Figure: neuron i with inputs x1, …, xn𝓍, weights wi1, …, win𝓍, bias bi, and output yi.]
19
Components of FF neural networks
Activation: a neuron's level of excitation ai, given by the activation function f(neti). It is usually the output of the neuron, yi:

yi = f(neti)

[Figure: neuron i with inputs x1, …, xn𝓍, weights, bias bi, and output yi = f(neti).]
20
Components of FF neural networks
Kernel: the matrix of weights corresponding to layer ℓ, noted as W[ℓ]. We do not consider the input layer, since it has neither a kernel nor an activation function.
The activation function is the same for all neurons in a layer ℓ, but it may differ between neurons in different layers.
[Figure: input layer (input vector X: x1, x2) → hidden layer 1 (three neurons, activation f[1], kernel W[1]) → hidden layer 2 (two neurons, activation f[2], kernel W[2]) → output layer (activation f[3], kernel W[3], output vector Y: y).]
21
The artificial neuron

[Figure: scalar model with inputs x1, …, xn𝓍, weights w1, …, wn𝓍, bias b, net, activation function, and output y; matrix model with weight matrix W of size 1 × n𝓍, input vector x of size n𝓍 × 1, bias b, Net, and output y.]

Scalar model:
net = ∑j=1…n𝓍 xj wj + b
y = f(∑j=1…n𝓍 xj wj + b)

Matrix model:
Net = Wx + b
y = f(Wx + b)
22
Neural network dynamics
[Figure: forward propagation through a feedforward network with inputs x1, x2, three neurons in hidden layer 1 (activation f[1]), two neurons in hidden layer 2 (activation f[2]), and one output neuron (activation f[3]), computed neuron by neuron:]
y1 = f[1](w1,0·1 + w1,x1·x1 + w1,x2·x2)
y2 = f[1](w2,0·1 + w2,x1·x1 + w2,x2·x2)
y3 = f[1](w3,0·1 + w3,x1·x1 + w3,x2·x2)
y4 = f[2](w4,0·1 + w4,1·y1 + w4,2·y2 + w4,3·y3)
y5 = f[2](w5,0·1 + w5,1·y1 + w5,2·y2 + w5,3·y3)
y = f[3](w6,0·1 + w6,4·y4 + w6,5·y5)
23
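As a minimal sketch (not part of the slides) of the same layer-by-layer dynamics using the kernel notation W[ℓ], each layer computes f[ℓ](W[ℓ]·a + b[ℓ]); the random weights and the tanh/sigmoid activations are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Kernels and biases of a 2-3-2-1 feedforward network (random, for illustration)
W1, b1 = rng.normal(size=(3, 2)), np.zeros((3, 1))   # hidden layer 1
W2, b2 = rng.normal(size=(2, 3)), np.zeros((2, 1))   # hidden layer 2
W3, b3 = rng.normal(size=(1, 2)), np.zeros((1, 1))   # output layer

f1 = np.tanh                                   # f[1] (assumed)
f2 = np.tanh                                   # f[2] (assumed)
f3 = lambda net: 1.0 / (1.0 + np.exp(-net))    # f[3] (assumed sigmoid)

x = np.array([[0.2], [-0.7]])                  # input vector (x1, x2)

# Forward propagation, one layer at a time
a1 = f1(W1 @ x + b1)     # activations y1, y2, y3
a2 = f2(W2 @ a1 + b2)    # activations y4, y5
y = f3(W3 @ a2 + b3)     # network output y
print(y)
```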
Case study: California housing dataset
Median house value regression and classification
Dataset: m=20,640 examples corresponding to districts in
California, ranging from 600 to 3,000 people.

n𝓍=9 attributes.
Label = Median house value
ny= 3 classes (classification)
ny= 1 for regression.

R. K. Pace & R. Barry, “Sparse Spatial Autoregressions,” Statistics & Probability Letters 33, no. 3 (1997): 291-297. http://lib.stat.cmu.edu/datasets/
[Figure: map of the California districts in the dataset (Pacific Ocean shown).]
24
Median house value regression problem
Attributes: longitude, latitude, median age, total rooms,
total bedrooms, population, households, median income,
ocean proximity.
Median house value: $15,000 - $500,000.

Attributes | Output
Long.    Lat.   Age  Rooms  Beds  Pop.  HouH.  Inc.    Ocean proximity | Median H. Val.
-122.23  37.88  41   880    129   322   126    8.3252  Near Bay        | 452600
-121.97  37.57  21   4342   783   2172  789    4.6146  <1h ocean       | 247600
-121.9   37.66  18   7397   1137  3126  1115   6.4994  Inland          | 323000
-124.17  41.8   16   2739   480   1259  436    3.7557  Near ocean      | 109400
-118.32  33.35  27   1675   521   744   331    2.1579  Island          | 450000

25
Median house value classification problem
Attributes: longitude, latitude, median age, total rooms,
total bedrooms, population, households, median income,
ocean proximity.
Median house value classes: Cheap [15, 141.3]; Averaged [141.4, 230.2]; Expensive [230.3, 500] thousand dollars.

Attributes | Output (classes)
Long.    Lat.   Age  Rooms  Beds  Pop.  HouH.  Inc.    Ocean proximity | Median H. Val.
-122.23  37.88  41   880    129   322   126    8.3252  Near Bay        | Expensive
-121.97  37.57  21   4342   783   2172  789    4.6146  <1h ocean       | Expensive
-121.9   37.66  18   7397   1137  3126  1115   6.4994  Inland          | Expensive
-124.17  41.8   16   2739   480   1259  436    3.7557  Near ocean      | Cheap
-118.32  33.35  27   1675   521   744   331    2.1579  Island          | Expensive
26
ANN project stages

Christof Angermueller et al., 2016. Deep learning for computational biology, Molecular Systems Biology.

27
● Cleaning and preparing data:
1. The total_bedrooms attribute has 207 missing values (na or nan); these examples are removed, since they are very few compared to the whole dataset.
2. ISLAND has only 5 samples, not enough to generalize. This class is removed.
   Discretizing attributes: <1h ocean: 0; Inland: 1; Near bay: 2; Near ocean: 3.
3. The dataset is randomized.
4. Classes are encoded: first discretized and then one-hot encoded (see the code sketch after this slide).
   One-hot encoded labels: Cheap: 1,0,0; Averaged: 0,1,0; Expensive: 0,0,1.
28
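A minimal sketch (not part of the slides) of this cleaning and encoding with pandas; the file name housing.csv is an assumption, the column names and category strings follow the public California housing dataset, and the class boundaries are the approximate values from the slide.

```python
import pandas as pd

housing = pd.read_csv("housing.csv")   # assumed file with the 20,640 examples

# 1. Remove the few examples with missing total_bedrooms
housing = housing.dropna(subset=["total_bedrooms"])

# 2. Remove the ISLAND class (only 5 samples)
housing = housing[housing["ocean_proximity"] != "ISLAND"]

# Discretize the ocean proximity attribute
ocean_codes = {"<1H OCEAN": 0, "INLAND": 1, "NEAR BAY": 2, "NEAR OCEAN": 3}
housing["ocean_proximity"] = housing["ocean_proximity"].map(ocean_codes)

# 3. Randomize the dataset
housing = housing.sample(frac=1.0, random_state=42).reset_index(drop=True)

# 4. Discretize the label into three classes and one-hot encode it
classes = pd.cut(housing["median_house_value"],
                 bins=[15_000, 141_300, 230_200, 500_000],
                 labels=["Cheap", "Averaged", "Expensive"],
                 include_lowest=True)
one_hot = pd.get_dummies(classes)   # one column per class: Cheap, Averaged, Expensive
```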
Data cleaning and preparing

5. Attributes are individually re-scaled (normalized) with min-max scaling to the range [-1, 1]: x' = (x - (max+min)/2) / ((max-min)/2).

Long.    Lat.   Age    Rooms  Beds   Pop.   HouH.  Inc.   O. prox. | Reg. H. val. | Class. H. val.
-0.5623  0.763  0.17   -0.87  -0.74  -0.85  -0.91  0.93   0.33     | 0.87         | 0,0,1
0.4297   0.477  -0.13  0.44   0.32   0.44   0.52   0.23   -1       | -0.45        | 1,0,0
0.0212   -0.75  0.21   0.67   0.87   0.94   0.9    0.64   -0.33    | 0.34         | 0,1,0
-0.6171  -0.11  -0.37  -0.31  -0.23  -0.34  -0.27  -0.34  1        | -0.82        | 1,0,0

6. The correlation matrix between all pairs of attributes has been calculated to visualize their dependencies. The results show that total_rooms, total_bedrooms, population, and households are highly (positively) correlated.
7. Finally, the dataset is partitioned into three subsets: 16,342 (80%) samples for training the neural model, 2,043 (10%) for development testing, and 2,043 (10%) for final testing purposes. A code sketch of the scaling and the partition follows after this slide.
29
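A minimal sketch (not part of the slides) of steps 5 and 7 with NumPy: column-wise min-max scaling to [-1, 1] and the 80/10/10 partition; the variable names and the placeholder data are assumptions.

```python
import numpy as np

# X: one row per example, one column per numeric attribute
# (random placeholder data of the same size as the cleaned dataset)
X = np.random.default_rng(0).uniform(0.0, 100.0, size=(20_428, 9))

# 5. Min-max scaling of each attribute to [-1, 1]:
#    x' = (x - (max+min)/2) / ((max-min)/2)
col_min, col_max = X.min(axis=0), X.max(axis=0)
X_scaled = (X - (col_max + col_min) / 2.0) / ((col_max - col_min) / 2.0)

# 7. Partition: 80% training, 10% development (validation), 10% final test
m = X_scaled.shape[0]
n_train, n_dev = int(0.8 * m), int(0.1 * m)
X_train = X_scaled[:n_train]                   # ~16,342 examples
X_dev = X_scaled[n_train:n_train + n_dev]      # ~2,043 examples
X_test = X_scaled[n_train + n_dev:]            # ~2,043 examples
```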
Dataset partition
[Figure: the scaled dataset (columns Long., Lat., Age, Rooms, Beds, Pop., HouH., Inc., O. proximity, H. val.) partitioned by rows into three subsets: training set 📉, development/validation set 🔍, and (final) test set 🔒.]

● Training set: adjust the parameters (weights and biases).
● Development or validation (test) set: tune the model (hyperparameters).
● (Final) test set: reliability.
30
Measuring performance
● Training set: the data employed to adjust the parameters (weights and biases) in the training process.
● Development or validation (test) set: employed to measure how well the model classifies unseen data and to tune the hyperparameters (e.g., the learning rate) accordingly. Since the model is tuned according to the results on this dataset, it seems better than it really is: it is biased.
● (Final) test set: never presented until the model is finished (tuned); it is used to predict how well the model fits the problem.
● Accuracy: the percentage of correctly classified examples (see the small sketch after this list).
● Utility function: greater is better. Maximization.
● Cost (J) and loss (ℒ) functions (error measures): lower is better. Minimization.
31
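As a small illustrative sketch (not in the slides), accuracy over one-hot encoded labels is the fraction of examples whose predicted class matches the target class; the arrays below are made-up values.

```python
import numpy as np

# Made-up one-hot targets and network outputs for 4 examples and 3 classes
Y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])
Y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6],
                   [0.2, 0.2, 0.6],
                   [0.2, 0.7, 0.1]])

# Accuracy: percentage of correctly classified examples (greater is better)
accuracy = np.mean(Y_pred.argmax(axis=1) == Y_true.argmax(axis=1))
print(accuracy)   # 0.75
```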
Development environments

32
Computational resources

33
Our choice

34
Other open source DL libraries
● H2O: Python, R. h2o.ai, 2014.
● Deeplearning4j: Java. Gibson & Patterson, 2014.
● Caffe: Python, C++, Matlab. Berkeley, 2013.
● Theano: Python. Montreal, 2010.
● PyTorch: Python, C++. Facebook AI Research, 2016 (built on Torch, 2002).
35
36
Overview
● TensorFlow is a programming library for machine learning.
● Google released TensorFlow 2 in 2019; Google Colab made it the default version in 2020.
● Tensors may be arrays of any dimension.
● TensorFlow encourages Python, although it also allows C++, Java, and Go.
● TensorFlow 2 promotes Keras, a library built on top of TensorFlow, to ease the construction and execution of neural models. Keras is central to TensorFlow 2.
● Main difference between the two versions:
● TensorFlow 1 represents computations as graphs. Writing code comprises two parts: building the computational graph and executing it within a session (device).
● TensorFlow 2 implements eager execution: you can see the result without the need to create a session.
37
Example in TensorFlow 1
n Matrix multiplication.
1. Assembling the graph.
● The code below builds the graph but does not perform any operation yet.

# TensorFlow is imported first
import tensorflow as tf
# Three nodes: two constant ops and one matmul op.
# First, a constant op that produces a 1x2 matrix.
node_matrix1 = tf.constant([[3., 3.]])
# Then, a constant op that produces a 2x1 matrix.
node_matrix2 = tf.constant([[2.], [2.]])
# Finally, a matmul op that takes both matrices as inputs
# and computes their product.
node_product = tf.matmul(node_matrix1, node_matrix2)

2. Launching the session to execute the graph.

with tf.Session() as sess:
    result = sess.run(node_product)
    print(result)   # [[12.]]
38
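Note (not in the original slides): TensorFlow 2 keeps this graph-and-session style available through the v1 compatibility module, so the example above can still be run after disabling eager execution. A minimal sketch:

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()   # switch back to graph mode

node_matrix1 = tf.constant([[3., 3.]])
node_matrix2 = tf.constant([[2.], [2.]])
node_product = tf.matmul(node_matrix1, node_matrix2)

with tf.compat.v1.Session() as sess:
    print(sess.run(node_product))   # [[12.]]
```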
Example in TensorFlow 2
Eager execution
import tensorflow as tf
M1 = tf.constant([[3., 3.]])     # M1 = (3, 3)
M2 = tf.constant([[2.], [2.]])   # M2 = (2, 2)ᵀ
result = tf.matmul(M1, M2)
# The multiplication has already been performed (eager execution)
print(result)   # tf.Tensor([[12.]], shape=(1, 1), dtype=float32)

39
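Since the Overview slide stresses that Keras is central to TensorFlow 2, here is a minimal, hypothetical sketch (not part of the slides) of how the 2-3-2-1 feedforward network from the dynamics slide could be represented with the Keras Sequential API; the tanh and sigmoid activations are assumptions.

```python
import tensorflow as tf
from tensorflow import keras

# A 2-3-2-1 feedforward network: each Dense layer holds a kernel W[l] and a bias b[l]
model = keras.Sequential([
    keras.Input(shape=(2,)),                      # inputs x1, x2
    keras.layers.Dense(3, activation="tanh"),     # hidden layer 1, f[1] assumed tanh
    keras.layers.Dense(2, activation="tanh"),     # hidden layer 2, f[2] assumed tanh
    keras.layers.Dense(1, activation="sigmoid"),  # output layer, f[3] assumed sigmoid
])

model.summary()                          # lists the kernels and biases per layer
y = model(tf.constant([[0.2, -0.7]]))    # eager forward pass on one example
print(y)
```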
Lecture slides “Representing Neural Networks” of the master course “Intelligent Systems”.
2021 Daniel Manrique

Suggested work citation:


D. Manrique. (2021). Representing Neural Networks. Lecture slides. In the course
“Intelligent Systems” of the Master's Degree in Computer Engineering. Department of
Artificial Intelligence. Universidad Politécnica de Madrid.

This work is licensed under a Creative Commons


Attribution-NonCommercial-ShareAlike 4.0 International License:
http://creativecommons.org/licenses/by-nc-sa/4.0/
