Artificial Neural Networks

Neural networks are the backbone of AI. They are used in a variety
of applications, including image recognition, natural language
processing, and even self-driving cars. In short, if you’re interested
in AI and want to understand how it works, you need to understand
neural networks.
Let's begin!
We will use an example where we want to predict gender based on a
person's weight and height.
First, a little overview of AI concepts and types of neural networks.
An (artificial) neural network is a type of ML model loosely modeled
on the human brain. Deep learning is a subfield of ML based on
neural networks with many layers.
Types of neural networks
1. Feedforward Neural Networks: These are the most basic type of
neural network. Data moves in one direction only, from the input
nodes to the output nodes. They are mainly used for simple
classification problems. This is the type of neural network we
will show in our example.
2. Perceptron and Multilayer Perceptron neural networks: A
perceptron is one of the earliest and simplest models of a
neuron. A perceptron model is a binary classifier, separating
data into two classes.
3. Radial basis function artificial neural networks (RBF)
4. Recurrent neural networks (RNN)
5. Convolutional neural networks (CNN)
6. Long Short-Term Memory networks (LSTM)
7. Transformers
Difference between (our example of) feedforward neural network and
other types:
The simplest type of feedforward neural network is the perceptron, a
feedforward neural network with no hidden units. A perceptron is
always feedforward, that is, all the arrows point in the direction
of the output. Neural networks in general might have loops, and if
so, they are often called recurrent networks. A recurrent network is
generally harder to train than a feedforward network.
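To make the perceptron less abstract, here is a minimal sketch in Python. It assumes the classic step rule (output 1 if the weighted sum plus bias is positive, otherwise 0). The weights and bias are hand-picked for illustration, not learned, and the names `perceptron`, `AND_WEIGHTS` and `AND_BIAS` are made up for this example.

```python
def perceptron(inputs, weights, bias):
    """Binary classifier: weighted sum of the inputs plus bias, then a step."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0


# Hand-picked (not learned) parameters that make the perceptron
# behave like a logical AND of two 0/1 inputs.
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron([a, b], AND_WEIGHTS, AND_BIAS))
```

All the data flows one way here, from the inputs to the single output, which is exactly what "feedforward" means.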
Explaining Neural Networks
CHAPTER 1: INPUT AND OUTPUT
First, we feed the neural network a lot of data about people's
height, weight and gender.
Data = any info, be it quantitative (e.g. height, weight, number of
readers of my blog) or qualitative (e.g. gender, colors, comments
on my blog post).
Second, when the network is trained, we give it new data (the height
and weight of any person in the world) and it predicts (makes a
guess at) that person's gender.
Note that the predictions are never guaranteed to be correct. How
good they are depends on several factors which we will get to.
But how do we give data to the neural network? Simply by entering
the data into a program written in a language such as Python. The
trick is that such a program only works with data in the form of
numbers, so it cannot read words or images directly. How different
types of data are converted to numbers is beyond the scope of this
text.
In conclusion, a neural network does two things.
1. It trains on data.
2. After it is trained, it can tell us insightful things about other,
completely new data that it hasn't seen.
Now you may ask: what happens in step 1? How does a neural network
actually train on data? We put our data (usually called something
like “training set”) into our model. It performs some magic that we
will get to later (it includes applying some functions to the data)
and it produces a result. The result is usually in the form of
numbers (let's say one number, for simplicity). The result is then
compared with the target, that is, the correct answer we already
know for that piece of training data.
Result = Gender of a person
Data = Height and Weight.
The result is usually called the output; the data that we use is the
input.
Input is the data that the network uses to make predictions or
decisions. It can be any type of data, such as images, text, or
numbers.
Output is the result or prediction produced by the network based
on the input it received. The output can be a single value or a set of
values, depending on the task the network is designed to perform.
For example, if the network is trained to recognize images, the
output might be a label indicating what the image shows, such as
“dog” or “cat”.
In our example, the input data is the height and weight of people.
The output data is the predicted gender that the neural network
“guesses”. We compare it with the actual gender so that the neural
network can learn how close it was and improve its calculations
based on that.
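As a small illustration of input and target in code, here is a sketch in Python. The three records are made-up toy values, and the 0/1 encoding of gender in `LABELS` is just one arbitrary choice assumed for this example; the names `people`, `to_input` and `to_target` are hypothetical too.

```python
# Made-up toy records, for illustration only.
people = [
    {"height_cm": 180.0, "weight_kg": 82.0, "gender": "male"},
    {"height_cm": 165.0, "weight_kg": 58.0, "gender": "female"},
    {"height_cm": 172.0, "weight_kg": 70.0, "gender": "male"},
]

# Assumed encoding: the network only works with numbers,
# so each label is mapped to a number.
LABELS = {"female": 0, "male": 1}


def to_input(person):
    """Input: the numbers the network gets to see."""
    return [person["height_cm"], person["weight_kg"]]


def to_target(person):
    """Target: the known correct answer, encoded as a number."""
    return LABELS[person["gender"]]


inputs = [to_input(p) for p in people]
targets = [to_target(p) for p in people]

print(inputs)
print(targets)
```

During training, the network's output for each entry of `inputs` would be compared with the matching entry of `targets`.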
CHAPTER 2: NEURONS
Notice from the previous chapter that in our example with height,
weight and gender, we already assume that these are somehow related.
Indeed, it wouldn't make much sense to train a network to predict
tomorrow's weather based on the number of words in my blog posts or
something similarly nonsensical. The output data should depend on
the input data.
But what actually is the relationship that connects input and
output? How does the neural network find the connection between
your height and weight and your gender?
The magic happens in parts of the neural network called neurons.
(Sometimes they are also called nodes.)
When we say relationship, in mathematical language, it usually
means a function. This already implies that the neurons apply some
functions to the data they get.
There are many neurons in each neural network. And, similarly to
the human brain (hence the name), each neuron takes an input, does
some “calculation” on it and sends the result forward as its output.
There are tons of ways a neuron can process the data it receives
(tons of different functions it can use).
Neurons are like tiny calculators that process information and
transmit it to other neurons in the network. They work together to
analyze and recognize patterns in data, and they are the building
blocks of the network.
Neurons are organized in layers, and they pass information from one
layer to the next.
CHAPTER 3: WEIGHTS AND BIASES
What happens in one neuron when it gets our data? Suppose we put
the data about people's weight and height into a program. The
neuron takes the pair (height, weight) (which are numbers, so the
programming language we use can read them well) and turns it into a
single number: each input is multiplied by something called a
weight, the products are added up, and something called a bias is
added to the sum. (DISCLAIMER: The weight in a neural network is not
the weight we mentioned as the weight of people.)
(height, weight) -> height * w1 + weight * w2 + bias.
What are the weights and the bias? And why do we multiply and add
them?
The weights are usually initialized randomly and then adjusted
during training to minimize a loss function.
Think of a neural network as a water pipe system: each pipe has a
certain diameter, and the bigger the pipe, the more water can flow
through it. The weights are like the pipe diameters: the bigger the
weight (in absolute value), the more information flows through that
connection.
Bias: inside each neuron, the linear combination of inputs and
weights also includes a bias, similar to the constant term in a
linear equation. The full formula of a neuron is therefore
f( Σ(x_i * w_i) + bias )
where f is the neuron's activation function.
The bias shifts the neuron's output up or down independently of the
inputs: it is what the weighted sum equals when all inputs are zero.
Like the weights, it is a parameter that the network adjusts during
training. Note that this bias is a different thing from “bias” in
the sense of systematic unfairness or inaccuracy in a model's
results, which usually comes from the training data.
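The linear part of the neuron formula, Σ(x_i * w_i) + bias, can be sketched in a few lines of Python. The input values, weights and bias below are hypothetical numbers chosen so the arithmetic is easy to follow; they are not learned from any data, and the inputs stand for already rescaled measurements.

```python
def weighted_sum(inputs, weights, bias):
    """Multiply each input by its own weight, add the products, add the bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


# Hypothetical, already rescaled inputs and hand-picked parameters.
inputs = [2.0, 4.0]      # stand-ins for (height, weight) of one person
weights = [0.5, -0.25]   # one weight per input
bias = 1.0

z = weighted_sum(inputs, weights, bias)
print(z)  # 2.0 * 0.5 + 4.0 * (-0.25) + 1.0

# With all inputs at zero, only the bias is left.
print(weighted_sum([0.0, 0.0], weights, bias))
```

The activation function f is then applied to `z` to get the neuron's output.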
How it all looks:
[Figure: what happens in a neuron.]
Source: https://victorzhou.com/blog/intro-to-neural-networks/
https://deepai.org/machine-learning-glossary-and-terms/weight-artificial-neural-network
CHAPTER 4: ACTIVATION FUNCTION
Now, some math comes onto the scene. What we have seen with the
weights and biases is called a linear function (just a function that
takes x and returns w*x + b).
After that, the neuron has to do one more thing: apply an activation
function to the result of the linear function.
So far we haven't done anything different from a linear regression
(which is nothing “mysterious”, but something commonly used in
econometrics and statistics). But by applying one more function f
(the activation function), we switch from the linear model
Σ(x_i*w_i) + b = Y to a non-linear one, f(Σ(x_i*w_i) + b) = Y.
The process then looks like:
Input -> Linear Function -> Activation Function -> Output.
Why do we need an activation function? It introduces the
non-linearity that lets the network learn more than a straight-line
relationship, and it turns an unbounded input into an output that
has a nice, predictable form. A commonly used activation function
is the sigmoid function, which squeezes any number into the range
between 0 and 1, but there are many more possibilities.
How to choose the right activation function? One of the most used
ones is ReLU, a piecewise linear function that returns its input if
it is positive and zero otherwise; it is mainly used for hidden
layers. The output layer, in turn, must have an activation
compatible with the expected output, so the choice also depends on
the task. For example, a linear output is suited to regression
problems, while the sigmoid is frequently used for binary
classification.
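Here is a minimal Python sketch of the three activation functions mentioned above, using only the standard library. The function names are chosen for this example; real libraries ship their own, more robust versions.

```python
import math


def sigmoid(z):
    """Squeeze any number into the range (0, 1).

    Note: math.exp can overflow for very large negative z;
    that is acceptable for a sketch with small inputs.
    """
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Return z if it is positive, otherwise zero."""
    return max(0.0, z)


def linear(z):
    """Identity: pass the value through unchanged (used for regression outputs)."""
    return z


for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), relu(z), linear(z))
```

Whichever function is picked, it is applied to the result of the linear step, so the full neuron is `f(weighted sum + bias)`.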
To see everything in one more picture:
https://medium.com/technologymadeeasy/for-dummies-the-introduction-to-neural-networks-we-all-need-c50f6012d5eb
CHAPTER 4: GRADIENT DESCENT
We have explained neurons and activation functions above. Recall:
A neuron is just a linear function with an activation function on
top.
But that's not all.
The weights and the bias inside that linear function have to be
learned, and this is done by some form of gradient descent.
What is gradient descent?
It is used to minimize the error by adjusting the weights.
Mathematically, gradient descent is the optimization algorithm
used to train neural networks. It looks for a local minimum of the
loss function by taking repeated steps in the direction of steepest
descent.
You can imagine gradient descent as a hike in the mountains where,
at every spot, you take a step in the steepest downhill direction.
The “gradient” is a vector (an arrow) pointing in the steepest
uphill direction, the one you would walk in to climb as fast as
possible. Of course, it is different at every spot.
“Gradient descent” then means going the opposite way, that is, in
the direction in which you descend the fastest.
In the case of a neural network, the gradient descent algorithm
looks for the smallest “loss”, which measures how big a mistake the
neural network makes.
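The hiking picture can be tried out on the simplest possible "mountain". The sketch below runs gradient descent on a made-up one-parameter loss, loss(w) = (w - 3)^2, whose lowest point is at w = 3. The loss function, the learning rate of 0.1 and the 100 steps are all assumptions picked for illustration; a real network has many parameters and a far more complicated loss.

```python
def loss(w):
    """A made-up loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the loss: positive means 'uphill is to the right'."""
    return 2.0 * (w - 3.0)


def gradient_descent(start, learning_rate=0.1, steps=100):
    """Repeatedly take a small step against the gradient (downhill)."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


w_final = gradient_descent(start=0.0)
print(w_final, loss(w_final))
```

The learning rate controls the step size: too small and the walk takes forever, too large and it can jump over the valley.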
CHAPTER 5: LAYERS
Neural networks are like onions: they have layers.
Neural networks are made up of layers of neurons. Some networks
only have an input layer and an output layer (such as the one we
have explained so far). But some have more layers. What does that
mean? Why more layers? How do the layers differ?
The reason for having more layers is to let the network capture
more complex relationships, which can improve the accuracy of the
final result. The layers between the input layer and the output
layer are called hidden layers, and they contain the intermediary
nodes.
Layers are like a set of filters that process information.
A layer is a group of nodes that work side by side at the same
stage of the network. These layers are stacked on top of each
other, and each layer performs a set of calculations on the data it
receives from the previous layer. The first layer of a neural
network typically receives the raw input data, and the last layer
produces the output. In between these two layers, there can be any
number of “hidden layers” that perform more complex calculations
and help the network learn more intricate patterns in the data.
When it comes to choosing the number of layers for a neural
network, the best approach is to experiment with different options
and evaluate their performance. Hidden layers (together with
non-linear activation functions) let the network model non-linear
relationships in the data, so if the relationship you are modeling
is linear, you can do without hidden layers.
It is not true that more layers always mean better results. Having
too many hidden layers can cause overfitting, which is a problem
that we won't devote time to in this text.
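To show how data passes from layer to layer, here is a sketch of a tiny network in Python: 2 inputs, one hidden layer with 2 neurons, and 1 output neuron. The sigmoid activation and all the weights and biases are assumptions picked by hand for illustration; nothing here is trained.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """One neuron: weighted sum plus bias, then the activation function."""
    return sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)


def layer(inputs, weight_rows, biases):
    """One layer: every neuron in it sees the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical hand-picked parameters (one row of weights per neuron).
HIDDEN_W = [[0.5, -0.5], [0.25, 0.75]]
HIDDEN_B = [0.0, 0.0]
OUTPUT_W = [[1.0, -1.0]]
OUTPUT_B = [0.0]


def forward(inputs):
    """Input -> hidden layer -> output layer."""
    hidden = layer(inputs, HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]


print(forward([1.0, 2.0]))
```

Adding another hidden layer would just mean one more `layer(...)` call between the two that are already there.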
CHAPTER 6: TRAINING THE MODEL
What does “training a model” mean? In our example, we want the
neural network, after some time, to guess the gender of as many
people as possible correctly. That is,
training a neural network means adjusting the internal parameters
of the network so that it can make accurate predictions or
decisions based on the input data.
This is done by showing the network a large set of training data
and adjusting the parameters until the network's output is as close
as possible to the desired output.
Imagine you want to teach your network to recognize pictures of cats
and dogs. You will give the network a large set of pictures of cats and
dogs, along with the correct label (“cat” or “dog”). The network will
then try to guess the label of each picture, and it will compare its
guess to the correct label. If the network's guess is wrong, it will
adjust its parameters a little bit and try again. This process will be
repeated multiple times, and with each iteration, the network will
become better at recognizing cats and dogs. Once the network's
predictions are accurate enough, the training process is stopped, and
the network is ready to be used on new, unseen data.
In mathematical terms, training the model means minimizing its loss
function.
In technical terms, when coding a neural network in Python or any
programming tool of your choice, setting up training means choosing
an optimizer, a loss function and metrics, and then running the
training loop on the training data.
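Here is a minimal sketch of such a training loop in plain Python, for a single sigmoid neuron. Everything specific in it is an assumption for illustration: the four data points are made-up, already rescaled (height, weight) pairs with a 0/1 label, the loss is the mean squared error, and the optimizer is plain gradient descent applied one example at a time. A real project would normally use a library instead.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, weights, bias):
    return sigmoid(sum(xi * wi for xi, wi in zip(x, weights)) + bias)


def mean_squared_error(data, weights, bias):
    """Loss: average squared gap between prediction and target."""
    return sum((predict(x, weights, bias) - y) ** 2 for x, y in data) / len(data)


def train(data, learning_rate=0.5, epochs=1000, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    weights = [rng.uniform(-0.5, 0.5) for _ in data[0][0]]  # random start
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = predict(x, weights, bias)
            # Slope of (p - y)^2 with respect to the weighted sum.
            dz = 2.0 * (p - y) * p * (1.0 - p)
            weights = [w - learning_rate * dz * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * dz
    return weights, bias


# Made-up, rescaled toy data: ([height, weight], label).
TOY_DATA = [
    ([-1.0, -0.8], 0),
    ([-0.5, -1.2], 0),
    ([0.8, 1.0], 1),
    ([1.2, 0.6], 1),
]

weights, bias = train(TOY_DATA)
print(mean_squared_error(TOY_DATA, weights, bias))
print([1 if predict(x, weights, bias) >= 0.5 else 0 for x, _ in TOY_DATA])
```

The fraction of correct guesses in the last line would be a simple example of a metric (accuracy).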
CHAPTER 7: BACKPROPAGATION
To be complete, we have to mention backpropagation, which is a
method for working out how the weights of the connections between
the neurons should be adjusted. It is used in conjunction with an
optimization algorithm such as the gradient descent mentioned above.
Backpropagation: during training, the model learns by propagating
the error back through the nodes and updating the parameters
(weights and biases) to minimize the loss.
In mathematical terms, backpropagation computes the gradient that
gradient descent needs. We take partial derivatives of the loss,
and through an application of the chain rule we propagate portions
of the loss backwards, layer by layer, to find out how much each
weight contributed to it. The weights are then updated accordingly.
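The chain rule idea can be checked on a single neuron. The sketch below assumes a sigmoid activation and a squared-error loss (both are choices made for this example) and computes the gradient in two ways: with the chain rule, as backpropagation does, and by nudging each weight a tiny bit and measuring how the loss changes. The input, weights, bias and target are hypothetical numbers. In a network with several layers the same chain rule is simply applied layer after layer, from the output backwards.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w, b):
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return sigmoid(z)


def loss(x, w, b, y):
    return (forward(x, w, b) - y) ** 2


def backward(x, w, b, y):
    """Chain rule: dloss/dw_i = dloss/dp * dp/dz * dz/dw_i."""
    p = forward(x, w, b)
    dloss_dp = 2.0 * (p - y)        # derivative of (p - y)^2
    dp_dz = p * (1.0 - p)           # derivative of the sigmoid
    dloss_dz = dloss_dp * dp_dz
    grad_w = [dloss_dz * xi for xi in x]  # dz/dw_i = x_i
    grad_b = dloss_dz                     # dz/db = 1
    return grad_w, grad_b


def numerical_grad_w(x, w, b, y, i, eps=1e-6):
    """Check: nudge weight i up and down and measure the change in loss."""
    w_plus, w_minus = list(w), list(w)
    w_plus[i] += eps
    w_minus[i] -= eps
    return (loss(x, w_plus, b, y) - loss(x, w_minus, b, y)) / (2.0 * eps)


# Hypothetical values for one training example.
x, w, b, y = [0.5, -1.0], [0.3, 0.8], 0.1, 1.0

grad_w, grad_b = backward(x, w, b, y)
print(grad_w, grad_b)
print([numerical_grad_w(x, w, b, y, i) for i in range(len(w))])
```

The two printed gradients should agree closely, which is a handy sanity check whenever you write backpropagation by hand.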
References
[1] https://victorzhou.com/blog/intro-to-neural-networks/
[2] https://www.bbntimes.com/science/artificial-intelligence-vs-machine-learning-vs-artificial-neural-networks-vs-deep-learning
[3] https://cs.stackexchange.com/questions/53521/what-is-difference-between-multilayer-perceptron-and-multilayer-neural-network?newreg=948ddd49cab64af09eb5d1c5ad6ed3ce
Useful resources
• https://towardsdatascience.com/deep-learning-with-python-neural-networks-complete-tutorial-6b53c0b06af0
• 3Blue1Brown (YouTube)
• https://www.freecodecamp.org/news/neural-networks-for-dummies-a-quick-intro-to-this-fascinating-field-795b1705104a/
• https://victorzhou.com/blog/intro-to-neural-networks/
• https://www.youtube.com/watch?v=ov_RkIJptwE
• https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.23935&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false
• http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
• Neural Network In 5 Minutes
• Machine Learning for Beginners: An Introduction to Neural Networks
• Weight (Artificial Neural Network)
• Everything you need to know about Neural Networks and Backpropagation — Machine Learning Easy and Fun
• First neural network for beginners explained (with code)
