
Artificial Neural Networks

Neural networks are the backbone of AI. They are used in a variety
of applications, including image recognition, natural language
processing, and even self-driving cars. In short, if you’re interested
in AI and want to understand how it works, you need to understand
neural networks.

Let's begin!

We will use an example where we want to predict gender based on a
person's weight and height.

First, a little overview of AI concepts and types of neural networks.

An (artificial) neural network is a type of ML model designed to
mimic the human brain. Deep learning is a subfield of ML based on the
use of neural networks.

Types of neural networks

1. Feedforward Neural Networks: These are the most basic type of
neural network. Data moves in one direction only, from the input
nodes to the output nodes. They are mainly used for simple
classification problems. This is the type of neural network we
will use in our example.

2. Perceptron and Multilayer Perceptron neural networks: A
perceptron is one of the earliest and simplest models of a neuron.
A perceptron model is a binary classifier, separating data into
two classes.

3. Radial basis function artificial neural networks (RBF)

4. Recurrent neural networks (RNN)

5. Convolutional Neural Networks (CNN)

6. Long Short-Term Memory networks (LSTM)

7. Transformers

Difference between (our example of) a feedforward neural network
and other types:

The simplest type of feedforward neural network is the perceptron, a
feedforward neural network with no hidden units. A perceptron is
always feedforward, that is, all the arrows point in the direction
of the output. Neural networks in general might have loops, and if
so, they are often called recurrent networks. A recurrent network is
typically harder to train than a feedforward network.
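
As a rough illustration of the perceptron described above, here is a
minimal Python sketch. The function name, the step rule ("output 1 if
the weighted sum is positive") and the hand-picked weights for a
logical AND are assumptions made for this example, not something taken
from the original text.

```python
def perceptron(inputs, weights, bias):
    """Binary classifier: 1 if the weighted sum plus bias is positive, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0


# Hypothetical hand-picked parameters that make the perceptron behave like AND.
and_weights = [1.0, 1.0]
and_bias = -1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron([a, b], and_weights, and_bias))
```

There are no hidden units and no loops here: the inputs go straight to
a single output, which is what makes it feedforward.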

Explaining Neural Networks

CHAPTER 1: INPUT AND OUTPUT

First, we feed the neural network a lot of data about people's
height, weight and gender.

Data = any information, be it quantitative (e.g. height, weight,
number of readers of my blog) or qualitative (e.g. gender, colors,
comments on my blog post).

Second, once the network is trained, we give it new data (the height
and weight of any person in the world) and it predicts (makes a
guess at) that person's gender.

Note that the prediction is never 100% correct. How good it is
depends on several factors, which we will get to.

But how do we give data to the neural network? Simply by entering
the data into a program written in a language such as Python. The
trick is that such a program only works with data in the form of
numbers, so words or images must first be converted to numbers. How
different types of data are converted to numbers is beyond the scope
of this text.
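
For our running example the conversion is simple enough to sketch.
The records below are invented for illustration, and the choice of
which gender is encoded as 0 and which as 1 is an arbitrary
assumption.

```python
# Hypothetical toy records: (height in cm, weight in kg, gender label).
people = [
    (180.0, 82.0, "male"),
    (165.0, 58.0, "female"),
    (172.0, 70.0, "male"),
    (158.0, 52.0, "female"),
]

# Assumed encoding of the qualitative label as a number (the mapping is arbitrary).
GENDER_TO_NUMBER = {"male": 0, "female": 1}

# Inputs are already numbers; only the label needs converting.
inputs = [[height, weight] for height, weight, _ in people]
targets = [GENDER_TO_NUMBER[gender] for _, _, gender in people]

print(inputs)
print(targets)
```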

In conclusion, a neural network does two things.

1. It trains on data.

2. After it is trained, it can tell us insightful things about other,
completely new data that it hasn't seen.

Now you may ask: what happens in step 1? How does a neural network
actually train on data? We put our data (usually called something
like a “training set”) into our model. It performs some magic that
we will get to later (it includes applying some functions to the
data) and it produces a result. The result is usually in the form of
numbers (let's say one number, for simplicity). The result is then
compared with the target.

Result = Gender of a person

Data = Height and Weight.

The result is usually called the output, and the data that we use is
the input.

Input is the data that the network uses to make predictions or
decisions. It can be any type of data, such as images, text, or
numbers.

Output is the result or prediction produced by the network based
on the input it received. The output can be a single value or a set of
values, depending on the task the network is designed to perform.
For example, if the network is trained to recognize images, the
output might be a label indicating what the image shows, such as
“dog” or “cat”.

In our example, the input data is the height and weight of people.
The output data is the predicted gender that the neural network
“guesses”. We compare it with the actual gender so that the neural
network can learn how close it was and improve its calculations
based on that.

CHAPTER 2: NEURONS

Notice that in the previous chapter, in our example with height,
weight and gender, we already assumed that these are somehow related.
Indeed, it wouldn't make much sense to train a network to predict
tomorrow's weather based on the number of words in my blog posts or
something similarly nonsensical. The output data should depend on
the input data.

But what is actually the relationship that connects input and
output? How does the neural network find the connection between
your height and weight and your gender?

The magic happens in parts of the neural network called neurons.
(Sometimes they are also called nodes.)

When we say relationship, in mathematical language it usually
means a function. This already implies that the neurons apply some
functions to the data they get.

There are many neurons in each neural network. And, similarly
to the human brain (hence the name), each neuron takes an input, does
some “calculation” on it and sends the result forward as its output.
There are tons of ways a neuron can process the data it receives
(tons of different functions it can use).

Neurons are like tiny calculators that process information and
transmit it to other neurons in the network. They work together to
analyze and recognize patterns in data, and they are the building
blocks of the network.

Neurons are organized in layers, and they process and transmit
information between each other.

CHAPTER 3: WEIGHTS AND BIASES

What happens in one neuron when it gets our data? Suppose we put
the data about people's weight and height into a program. The neuron
first takes the pair (height, weight) (which are numbers, so the
programming language we use can read them well) and creates another
number from it: each input is multiplied by something called a
weight, the products are added up, and something called a bias is
added to the sum. (DISCLAIMER: The weight in a neural network is not
the weight we mentioned as the weight of people.)

(height, weight) -> height * w1 + weight * w2 + bias
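
A minimal sketch of this linear step in Python. The parameter names
w_height and w_weight and all the numeric values are made up for
illustration; in a real network they would be learned during training.

```python
def linear_step(height, weight, w_height, w_weight, bias):
    """Multiply each input by its own weight, add the products, then add the bias."""
    return height * w_height + weight * w_weight + bias


# Hypothetical inputs and parameters, chosen only to show the arithmetic.
z = linear_step(height=170.0, weight=65.0, w_height=0.02, w_weight=-0.01, bias=-2.0)
print(z)  # approximately 0.75 (3.4 - 0.65 - 2.0)
```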

What are the weight and the bias? And why do we multiply and add
them?

The weights are usually randomly initialized and then adjusted as
learning proceeds, so as to minimize a loss function.

Think of a neural network as a water pipe system. Each pipe has a
certain diameter: the bigger the pipe, the more water can flow
through it. The weights are like the pipe diameters: the bigger the
weight, the more information flows through that connection.

Bias: inside each neuron, the linear combination of inputs and
weights also includes a bias, similar to the constant term in a
linear equation. The full formula of a neuron is therefore

f( Σ(Xi * Wi) + bias )

where Xi are the inputs, Wi the corresponding weights, and f the
activation function.

The bias sets the neuron's starting point: it is the value of the
linear step when all inputs are zero, and it shifts the point at
which the neuron “activates”. Like the weights, it is a parameter
that the network adjusts during training. This bias term should not
be confused with “bias” in the sense of a model producing
systematically inaccurate or unfair results, which is a property of
the data and the trained model rather than a parameter of a neuron.
How it all looks:

[Figure: What happens in a neuron.]

Sources: https://victorzhou.com/blog/intro-to-neural-networks/
https://deepai.org/machine-learning-glossary-and-terms/weight-artificial-neural-network

CHAPTER 4: ACTIVATION FUNCTION

Now some math enters the scene. What we have seen with the weights
and biases is called a linear function (just a function that takes x
and returns w*x + b).

After that, the neuron has to do one more thing: apply an activation
function to the result of the linear function.

So far we haven’t done anything different from linear regression
(which is nothing “mysterious”, but something commonly used in
econometrics and statistics). But by applying one more function f
(the activation function), we switch from the linear model
Σ(x_i*w_i) + b = Y to a non-linear one, f(Σ(x_i*w_i) + b) = Y.

The process then looks like:

Input -> Linear Function -> Activation Function -> Output.
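
The pipeline above can be sketched in a few lines of Python. The
sigmoid is assumed here as the activation function because it is a
common choice; the input values, weights and bias are invented for the
example.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    # Linear function: weighted sum of the inputs plus the bias.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Activation function applied on top of the linear result.
    return sigmoid(z)


# Hypothetical inputs and parameters: z = 2*0 + 3*1 + 4 = 7.
output = neuron(inputs=[2.0, 3.0], weights=[0.0, 1.0], bias=4.0)
print(round(output, 3))  # 0.999
```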

Why do we need an activation function? It introduces non-linearity,
and it turns an unbounded input into an output that has a nice,
predictable form. A commonly used activation function is the sigmoid
function, which squeezes any number into the range between 0 and 1,
but there are many more possibilities.

How to choose the right activation function? One of the most used
is ReLU, a piecewise linear function that returns its input if it is
positive and zero otherwise; it is mainly used for hidden layers.
In addition, the output layer must have an activation compatible with
the expected output. The choice also depends on the type of neural
network model. For example, a linear activation is suited to
regression problems, while the sigmoid is frequently used for
classification.
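
To make the three activations mentioned above concrete, here is a
small sketch that evaluates each of them on a few sample values. The
function names and sample values are chosen for this example only.

```python
import math


def relu(z):
    """Return z if it is positive, otherwise 0."""
    return max(0.0, z)


def sigmoid(z):
    """Map any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear(z):
    """Identity activation: the output is left unbounded."""
    return z


for z in (-2.0, 0.0, 2.0):
    print(z, relu(z), round(sigmoid(z), 3), linear(z))
```

Note how ReLU cuts off negative values, the sigmoid stays between 0
and 1, and the linear activation passes the value through unchanged.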

To see everything in one more picture:

[Figure] Source: https://medium.com/technologymadeeasy/for-dummies-the-introduction-to-neural-networks-we-all-need-c50f6012d5eb

CHAPTER 4: GRADIENT DESCENT

We have explained neurons and activation functions above. Recall:

A neuron is just a linear function with an activation function on
top.

But that's not all.

The weights and bias of that linear function are not fixed: they are
adjusted during training by some form of gradient descent.

What is gradient descent?

It is used to minimize the error by adjusting the weights.

Mathematically, gradient descent is an optimization algorithm used
to train neural networks. It looks for a local minimum of the loss
function by taking repeated steps in the direction of steepest
descent.

You can imagine gradient descent as a hike in the mountains where,
at every place, you choose to move in the steepest direction. The
“gradient” is a vector (an arrow) pointing in the steepest direction
you can walk if you want to get to the highest point. Of course, it
is different at every place you stand.

“Gradient descent” then means going the opposite way, that is, in
the direction in which you descend the fastest.

In the case of a neural network, the gradient descent algorithm
looks for the smallest “loss”, which is a measure of how mistaken
the neural network's predictions are.
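
A minimal sketch of gradient descent on a single parameter. The loss
function (w - 3)^2, the starting point, the learning rate and the
number of steps are all assumptions picked so that the minimum (at
w = 3) is easy to see; a real network has many parameters and a loss
computed from data.

```python
def loss(w):
    """Hypothetical loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the loss: it points in the direction of steepest ascent."""
    return 2.0 * (w - 3.0)


w = 0.0              # arbitrary starting point
learning_rate = 0.1  # assumed step size

for step in range(50):
    # Step against the gradient, i.e. downhill.
    w -= learning_rate * gradient(w)

print(w, loss(w))
```

After the loop, w sits very close to 3 and the loss is close to zero.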

CHAPTER 5: LAYERS

Neural networks are like onions: they have layers.

Neural networks are made up of layers of neurons. Some networks
only have an input layer and an output layer (such as the one we
have explained so far). But some have more layers. What does that
mean? Why more layers? How do the layers differ?

The reason for having more layers is to improve the accuracy of the
final result. The additional layers between input and output are
called hidden layers, and they represent the intermediate nodes.

Layers are like a set of filters that process information.

A layer refers to a group of interconnected nodes. These layers are
stacked on top of each other, and each layer performs a set of
calculations on the data it receives from the previous layer. The first
layer of a neural network typically receives the raw input data, and
the last layer produces the output. In between these two layers, there
can be any number of “hidden layers” that perform more complex
calculations and help the network learn more intricate patterns in
the data.
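
Stacking layers can be sketched as feeding the output of one layer
into the next. The network shape (2 inputs, 2 hidden neurons, 1 output
neuron), the sigmoid activation and all the parameter values below are
assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Compute one layer: each row of `weights` belongs to one neuron."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical parameters for a 2-2-1 network.
hidden_weights = [[0.0, 1.0], [0.0, 1.0]]
hidden_biases = [0.0, 0.0]
output_weights = [[0.0, 1.0]]
output_biases = [0.0]

x = [2.0, 3.0]
hidden = layer_forward(x, hidden_weights, hidden_biases)            # hidden layer
output = layer_forward(hidden, output_weights, output_biases)       # output layer
print(hidden, output)
```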

When it comes to choosing the number of layers for a neural
network, the best approach is to experiment with different options
and evaluate their performance. Hidden layers are useful for
capturing non-linear relationships in the data, so if you don’t need
non-linearity you can do without hidden layers.

It is not true that more layers always mean a better model. Having
too many hidden layers can cause overfitting, which is a problem
that we won't devote time to in this text.

CHAPTER 6: TRAINING THE MODEL

What does “training a model” mean? In our example, we want the
neural network, after some time, to be able to guess the gender of
as many people as possible correctly. That is:

Training a neural network means adjusting the internal
parameters of the network so that it can make accurate predictions
or decisions based on the input data.

This process is done by showing the network a large set of training
data and adjusting the parameters until the network’s output is as
close as possible to the desired output.

Imagine you want to teach your network to recognize pictures of cats
and dogs. You will give the network a large set of pictures of cats and
dogs, along with the correct label (“cat” or “dog”). The network will
then try to guess the label of each picture, and it will compare its
guess to the correct label. If the network’s guess is wrong, it will
adjust its parameters a little bit and try again. This process will be
repeated multiple times, and with each iteration, the network will
become better at recognizing cats and dogs. Once the network’s
predictions are accurate enough, the training process is stopped, and
the network is ready to be used on new, unseen data.

In mathematical terms, training the model means minimizing its
loss function.
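
The text does not name a specific loss function. As one common
example, the sketch below uses the mean squared error (MSE), which
averages the squared differences between targets and predictions. The
target and prediction values are invented for illustration.

```python
def mse_loss(targets, predictions):
    """Mean squared error: the average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


# Hypothetical targets (encoded genders) and a network that always predicts 0.
targets = [1, 0, 0, 1]
predictions = [0.0, 0.0, 0.0, 0.0]

print(mse_loss(targets, predictions))  # 0.5
```

The better the predictions match the targets, the smaller this number
gets; a perfect match gives a loss of 0.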

In technical terms, when coding a neural network in Python or any
programming tool of your choice, setting up training typically
means choosing an optimizer, a loss function and metrics, and then
fitting the model to the training data.

CHAPTER 7: BACKPROPAGATION

To be complete, we have to mention backpropagation, which is a
method for working out how to adjust the weights of the connections
between the neurons. It is usually used in conjunction with an
optimization algorithm such as the gradient descent mentioned above.

Backpropagation: during training, the model learns by
propagating the error back through the nodes and updating the
parameters (weights and biases) to minimize the loss.

In mathematical terms, backpropagation computes the gradient that
gradient descent needs. We take partial derivatives of the loss, and
through an application of the chain rule we are able to propagate
portions of the loss backwards, layer by layer, updating our weights
along the way.
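
To show the chain rule at work, here is a minimal sketch that trains a
single sigmoid neuron with a squared-error loss. Everything specific
in it is an assumption: the toy data (already centered around zero),
the learning rate, the number of epochs, and the zero starting values,
which replace random initialization only to keep the run reproducible.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, already-centered toy data: [weight offset, height offset] -> gender.
inputs = [[-2.0, -1.0], [25.0, 6.0], [17.0, 4.0], [-15.0, -6.0]]
targets = [1.0, 0.0, 0.0, 1.0]


def predict(x, weights, bias):
    return sigmoid(sum(xi * wi for xi, wi in zip(x, weights)) + bias)


def mean_loss(weights, bias):
    errors = [(predict(x, weights, bias) - t) ** 2 for x, t in zip(inputs, targets)]
    return sum(errors) / len(errors)


weights = [0.0, 0.0]  # zeros instead of random values, for reproducibility
bias = 0.0
learning_rate = 0.1
initial_loss = mean_loss(weights, bias)

for epoch in range(500):
    for x, t in zip(inputs, targets):
        p = predict(x, weights, bias)
        # Chain rule: dL/dz = dL/dp * dp/dz, with L = (p - t)^2 and p = sigmoid(z).
        dloss_dp = 2.0 * (p - t)
        dp_dz = p * (1.0 - p)
        dloss_dz = dloss_dp * dp_dz
        # dz/dw_i = x_i and dz/db = 1, so step each parameter against its gradient.
        weights = [w - learning_rate * dloss_dz * xi for w, xi in zip(weights, x)]
        bias -= learning_rate * dloss_dz

final_loss = mean_loss(weights, bias)
predictions = [predict(x, weights, bias) for x in inputs]
print(initial_loss, final_loss)
print([round(p) for p in predictions])
```

With more layers the same idea applies: the derivative of the loss is
passed backwards from the output layer through each hidden layer.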

References

[1] https://victorzhou.com/blog/intro-to-neural-networks/

[2] https://www.bbntimes.com/science/artificial-intelligence-vs-machine-learning-vs-artificial-neural-networks-vs-deep-learning

[3] https://cs.stackexchange.com/questions/53521/what-is-difference-between-multilayer-perceptron-and-multilayer-neural-network?newreg=948ddd49cab64af09eb5d1c5ad6ed3ce

Useful resources

• https://towardsdatascience.com/deep-learning-with-python-neural-networks-complete-tutorial-6b53c0b06af0

• 3Blue1Brown (YouTube)

• https://www.freecodecamp.org/news/neural-networks-for-dummies-a-quick-intro-to-this-fascinating-field-795b1705104a/

• https://victorzhou.com/blog/intro-to-neural-networks/

• https://www.youtube.com/watch?v=ov_RkIJptwE

• https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.23935&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false

• http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/

• Neural Network In 5 Minutes

• Machine Learning for Beginners: An Introduction to Neural Networks

• Weight (Artificial Neural Network)

• Everything you need to know about Neural Networks and Backpropagation: Machine Learning Easy and Fun

• First neural network for beginners explained (with code)
