
1. Determine and report the network architecture, including the number of input nodes, the number of output nodes and the number of hidden nodes.
Neural Networks are the functional unit of deep learning and are known to mimic the behavior of
the human brain to solve complex data-driven problems. The input data is processed through
different layers of artificial neurons stacked together to produce the desired output. From speech
recognition and person recognition to healthcare and marketing, Neural Networks have been used
in a varied set of domains. 

Neural Network Architecture


Key Components of the Neural Network Architecture
The Neural Network architecture is made of individual units called neurons that mimic the
biological behavior of the brain. 
Here are the various components of a neuron.
Figure: Neuron in an Artificial Neural Network.
Input - It is the set of features that are fed into the model for the learning process. For example, the
input in object detection can be an array of pixel values pertaining to an image.
Weight - Its main function is to give importance to those features that contribute more towards the learning. It does so by multiplying each input value by its corresponding weight. For example, a negative word would impact the decision of a sentiment analysis model more than a pair of neutral words.
Transfer function - The job of the transfer function is to combine multiple inputs into one output
value so that the activation function can be applied. It is done by a simple summation of all the
inputs to the transfer function. 
Activation Function - Introduces non-linearity into the working of perceptrons. Without it, the output would just be a linear combination of the input values, and the network would not be able to model non-linear relationships.
Bias - The role of bias is to shift the value produced by the activation function. Its role is similar to
the role of a constant in a linear function. When multiple neurons are stacked together in a row,
they constitute a layer, and multiple layers piled next to each other are called a multi-layer neural
network.
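To make these components concrete, here is a minimal sketch of a single artificial neuron in Python with NumPy (the input, weight and bias values are illustrative assumptions, not values from the model described here):

import numpy as np

# Illustrative inputs, weights and bias for one neuron.
x = np.array([0.5, -1.2, 3.0])   # input features
w = np.array([0.4, 0.3, -0.2])   # weights
b = 0.1                          # bias

z = np.dot(w, x) + b             # transfer function: weighted sum plus bias

output = 1 / (1 + np.exp(-z))    # sigmoid activation introduces non-linearity
print(output)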

Nodes
Input Layer
The input layer should contain 387 nodes, one for each feature, and the output layer should contain 3 nodes, one for each class. The data that we feed to the model is loaded into the input layer from external sources like a CSV file or a web service. It is the only visible layer in the complete Neural Network architecture, and it passes the complete information from the outside world without any computation.
Hidden Layers
If the data is less complex and has fewer dimensions or features, then a neural network with 1 to 2 hidden layers would work. If the data has many dimensions or features, then 3 to 5 hidden layers can be used to get an optimum solution.
The hidden layers are what makes deep learning what it is today. They are intermediate layers that
do all the computations and extract the features from the data. There can be multiple interconnected
hidden layers that account for searching different hidden features in the data. For example, in image processing, the first hidden layers are responsible for lower-level features like edges, shapes, or boundaries. On the other hand, the later hidden layers perform more complicated tasks like identifying complete objects (a car, a building, a person).
Output Layer 
Each binary network has a 12–5–1 structure, i.e., 12 input nodes, 5 hidden neurons, and 1 output node. The activation functions for both the hidden and output neurons are logistic sigmoid functions.
The output layer takes input from the preceding hidden layers and comes to a final prediction based on the model's learnings. It is the most important layer, where we get the final result. In the case of regression models, the output layer generally has a single node, while classification models typically have one node per class. However, it is completely problem-specific and dependent on the way the model was built.
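Putting the pieces together, here is a hedged sketch of a forward pass through the 387-input, 3-output architecture described above (the two hidden layer sizes are assumptions for illustration; the text does not fix them):

import numpy as np

rng = np.random.default_rng(0)

# Layer sizes: 387 input features, two hidden layers (sizes assumed
# for illustration), and 3 output classes.
sizes = [387, 64, 32, 3]

# Small random weights and zero biases for each layer.
weights = [rng.normal(0, 0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x):
    # Each layer applies the transfer function (weighted sum plus bias),
    # then the sigmoid activation.
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(a @ W + b)
    return a

x = rng.normal(size=387)   # one dummy input sample
print(forward(x))          # 3 output values, one per class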

2. RATIONALE OF NEURAL NETWORK


The artificial neural network is a computational system created to mimic how the human brain solves problems. Artificial neural networks and the human brain share the ability to learn how to interpret input and find answers.
Principles of Neural networks
Simple neurons (also known as nodes or units) are arranged in layers and connected by links
to form neural networks, which are parallel information processing systems. The neurons in
artificial neural networks are analogous to the cell bodies in biology, and the links are
analogous to the axons, and they mimic the highly interconnected structures of the brain and
nervous system of animals and humans. Of the many different forms of NN, only multilayer feedforward neural networks are employed here. A multilayer feedforward neural network with three input variables (x1, x2, and x3) and one output variable (y) is shown below.
Figure: Network elements of a multilayer feedforward backpropagation network.

Types of Neural Networks

 Convolutional neural networks (CNNs) contain five types of layers: input, convolution, pooling, fully connected, and output. The purpose of each layer is distinct, such as connecting, activating, or summarizing. Image classification and object detection have become popular thanks to convolutional neural networks. However, CNNs have also been used in natural language processing and forecasting, among other fields.

 Recurrent neural networks (RNNs) make use of sequential data like time-stamped sensor device data or a spoken sentence made up of a series of terms. A recurrent neural network, in contrast to conventional neural networks, does not have independent inputs; rather, each element's output is dependent on the computations of its predecessors. Forecasting, sentiment analysis, and other text-based applications all make use of RNNs.

 Feedforward neural networks, in which each perceptron in one layer is connected to every perceptron in the next layer. Information is fed forward from one layer to the next in the forward direction only. There are no feedback loops.

 Autoencoder neural networks are used to create abstractions known as encoders. Autoencoders are considered unsupervised because they attempt to model the inputs themselves, in contrast to more conventional neural networks. Autoencoders are based on the idea of sensitizing the relevant and desensitizing the irrelevant. As layers are added, additional abstractions are formulated at the higher layers (those closest to where the decoder layer is added). Linear and nonlinear classifiers can then make use of these abstractions.
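As a rough illustration of this encoder/decoder idea, here is a minimal autoencoder sketch in Python using Keras (the framework choice, layer sizes and training data are all assumptions for illustration; the text above names no specific library):

import numpy as np
from tensorflow import keras

input_dim = 64       # assumed input dimensionality
encoding_dim = 8     # assumed size of the compressed abstraction

autoencoder = keras.Sequential([
    keras.layers.Dense(encoding_dim, activation="relu",
                       input_shape=(input_dim,)),          # encoder
    keras.layers.Dense(input_dim, activation="sigmoid"),   # decoder
])

# Unsupervised: the inputs themselves are the targets.
autoencoder.compile(optimizer="adam", loss="mse")
x = np.random.rand(256, input_dim).astype("float32")
autoencoder.fit(x, x, epochs=5, batch_size=32, verbose=0)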
General structural layer of Neural network

Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the
programmer.
Hidden Layer:
The hidden layer sits between the input and output layers. It performs all the calculations to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.
The artificial neural network takes input and computes the weighted sum of the inputs and
includes a bias. This computation is represented in the form of a transfer function.

An activation function receives the weighted total as input and produces the node's output. Activation functions decide whether a node fires; only the nodes that fire pass their values on towards the output layer. Depending on the kind of job we're doing, we can use a variety of different activation functions, as illustrated below.
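A small illustration of passing the same weighted total through several common activation functions (the value of z is an illustrative assumption):

import numpy as np

z = 0.8                          # weighted total from the transfer function

print(1 / (1 + np.exp(-z)))      # sigmoid: smooth value between 0 and 1
print(np.tanh(z))                # tanh: smooth value between -1 and 1
print(max(0.0, z))               # ReLU: passes positive totals, zeroes the rest
print(1 if z > 0 else 0)         # step function: a hard fire / no-fire decision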
Applications of Neural network

1. Facial Recognition 
Strong surveillance systems are being implemented using facial recognition technologies. Recognition systems match digital images against human faces, and they are used for selective entry in offices. As a result, the systems verify a person's identity by comparing it to a database's ID list. Face recognition and image processing are made possible by Convolutional Neural Networks (CNNs).
2. Stock Market Prediction
Market risks apply to investments. In the highly volatile stock market, it is nearly impossible
to anticipate future changes. Before neural networks, the ever-changing bullish and bearish
phases were unpredictable.
A Multilayer Perceptron MLP is used to accurately predict stocks in real time. MLP has
multiple layers of nodes, and each layer is completely connected to the nodes below it. The
MLP model takes into account past stock performance, annual returns, and non-profit ratios.
3. Social Media
Neural networks duplicate the behaviours of social media users. By analysing individuals' behaviour on social media networks, the data can be linked to people's spending habits. Multilayer Perceptron ANNs are used to mine data from social media applications.
4. Healthcare
The age-old saying goes, "Health is Wealth". Modern-day individuals are leveraging the advantages of technology in the healthcare sector. Convolutional Neural Networks are actively employed in the healthcare industry for X-ray detection, CT scans and ultrasounds. Because CNNs are utilized in image processing, the medical imaging data retrieved from the aforementioned tests is analyzed and evaluated with neural network models. Additionally, voice recognition systems are being developed using Recurrent Neural Networks (RNNs).
Nowadays, voice recognition systems are used to track the patient's data. Generative neural
networks are also being used by researchers to discover new drugs. Generative neural
networks have simplified the cumbersome process of drug discovery, which involves
matching various drug categories. They can be put to use in drug discovery by combining
various components. 

5. Report the results on both the training set and the test set.
Training set
The actual dataset from which a model is trained is called the Training Set. In order to
correctly predict the outcome or make decisions, the model looks at this data and learns from
it. The majority of the training data is gathered from a variety of sources and preprocessed
and arranged to ensure that the model works properly. The nature of the training data has a significant impact on the model's capacity to generalize, i.e., the more varied and high-quality the training data, the better the model's performance. This data accounts for more than 60% of the project's available data.
Example:
Firstly, we created a dummy matrix of 8×2 shape using the NumPy library to represent the input x, and a list of the integers 0 to 7 representing our target variable y.
Now, in order to split our dataset into training and testing data, a function named train_test_split from the scikit-learn library is used.
The input data x and the target variable y are passed as parameters to the function, which then divides the dataset into 2 parts based on the size given in train_size, i.e. if train_size=0.8 is given, then the dataset will be divided in such a way that the training set will be 80% of the given dataset and the testing set will be 20% of the given dataset.
And if we specify a value for random_state, the train_test_split function will shuffle the data the same way every time, making the random split reproducible. A runnable sketch follows below.
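Here is a hedged, runnable version of the example just described (the random_state value is an illustrative assumption; the original does not state which seed produced the split shown below):

import numpy as np
from sklearn.model_selection import train_test_split

x = np.arange(16).reshape(8, 2)   # dummy 8x2 input matrix
y = list(range(8))                # target variable: integers 0 to 7

# train_size=0.8 keeps 6 of the 8 samples for training; the exact rows
# selected depend on the random_state seed.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, train_size=0.8, random_state=42)

print("Training set x:", x_train)
print("Training set y:", y_train)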

Training set x: [[ 0 1]
[14 15]
[ 4 5]
[ 8 9]
[ 6 7]
[12 13]]
Training set y: [0, 7, 2, 4, 3, 6]
Test set
This dataset is independent of the training set but has a probability distribution of classes that
is somewhat comparable. It is used as a benchmark to evaluate the model and is used only
after the model has been trained. Most of the time, a testing set is a well-organized dataset with all kinds of data for scenarios that the model would likely encounter in the real world. It is not recommended to use the same data as both the validation set and the testing set.
Overfitting occurs when a model's accuracy on training data is higher than its accuracy on
testing data. This data accounts for between 20 and 25 percent of the project's total data.
Example:
To demonstrate the operation of the train_test_split function, we first created an 8×2 dummy matrix to represent the input x and a list of the integers 0 to 7 representing our target variable y.
Next, in order to divide our dataset into training and testing data, the input data x and our target variable y are passed as parameters to the function. This function then divides the dataset into two parts based on the size specified in test_size.
Additionally, the train_test_split function will split the data reproducibly if we specify a value for random_state.
Test set x: [[ 2 3]
[10 11]]
Test set y: [1, 5]
3. Determine the learning parameters, including the learning rate, momentum, initial weight ranges and any other used parameters.
Momentum: this is a technique used during the backpropagation phase. As said regarding the learning rate, parameters are updated so that they converge towards the minimum of the loss function. This process can be slow, affecting the efficiency of the algorithm. Hence, one possible solution is keeping track of the previous directions (that is, the gradients of the loss function with respect to the weights) and retaining them as embedded information: this is what momentum is designed for. It basically increases the speed of convergence, not in terms of the learning rate (how much a weight is updated each time) but in terms of embedded memory of past re-calibrations (the algorithm knows the previous direction of that weight was, let's say, right, and it will directly proceed in this direction during the next propagation). We can visualize this by considering the projection of a two-weight loss function (specifically, a paraboloid), as in the sketch below.
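A minimal sketch of a gradient-descent update with momentum on such a paraboloid (the hyper-parameter values are illustrative assumptions):

import numpy as np

learning_rate = 0.01
beta = 0.9                     # momentum coefficient
w = np.array([4.0, -3.0])      # current weights
velocity = np.zeros_like(w)    # embedded memory of past directions

def grad(w):
    # Gradient of the paraboloid loss f(w1, w2) = w1**2 + w2**2.
    return 2 * w

for _ in range(200):
    velocity = beta * velocity + (1 - beta) * grad(w)
    w = w - learning_rate * velocity

print(w)   # converges towards the minimum at (0, 0)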

WEIGHT INITIALIZATION TECHNIQUES IN NEURAL NETWORK:


Basic notations: Consider an L-layer neural network, which has L-1 hidden layers and 1 input and output layer each. The parameters (weights and biases) for layer l are represented as W[l], a weight matrix of shape (n[l-1], n[l]), and b[l], a bias vector, where n[l] denotes the number of units in layer l.

In this section, we'll have a look at some of the basic initialization practices in common use and some improved techniques that can be used to achieve a better result. The following are some techniques generally practised to initialize parameters:

 Zero initialization
 Random initialization

Zero initialization :

In general practice, biases are initialized with 0 and weights are initialized with random numbers. What if the weights are initialized with 0?

To understand this, let us consider a network that applies the sigmoid activation function for the output layer.

If all the weights are initialized with 0, the derivative with respect to the loss function is the same for every w in W[l], so all weights have the same value in subsequent iterations. This makes the hidden units symmetric, and the symmetry continues for all n iterations, i.e. setting the weights to 0 does not make the network better than a linear model. An important thing to keep in mind is that biases have no effect whatsoever when initialized with 0.

W[l] = np.zeros((n[l-1], n[l]))   # every weight in layer l starts at 0
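A small demonstration of the symmetry problem (the layer sizes and input values are illustrative assumptions): with zero weights, every hidden unit computes exactly the same value, so every unit also receives the same gradient update.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.7, -1.3, 2.1])   # one input sample with 3 features
W1 = np.zeros((3, 4))            # zero-initialized weights, 4 hidden units
b1 = np.zeros(4)

hidden = sigmoid(x @ W1 + b1)
print(hidden)                    # [0.5 0.5 0.5 0.5] -- all units identical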

Random initialization :

Assigning random values to weights is better than just assigning 0. But one thing to keep in mind is what happens if the weights are initialized with very high values or very low values, and what a reasonable initialization of the weight values is.

a) If the weights are initialized with very high values, the term np.dot(W,X)+b becomes significantly larger, and if an activation function like sigmoid() is applied, the function maps its value near to 1, where the slope of the gradient changes slowly and learning takes a lot of time.
b) If the weights are initialized with very low values, the activation gets mapped near to 0, where the case is the same as above. This problem is often referred to as the vanishing gradient.

To see this, consider the example we took above, but now with the weights initialized with very large values instead of 0:

W[l] = np.random.randn(n[l-1], n[l]) * 10   # large random initial values
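A short demonstration of the saturation this causes (the sizes and seed are illustrative assumptions): with very large weights, the sigmoid activations land near 0 or 1, where the gradient sigmoid'(z) = a * (1 - a) is almost zero and learning is slow.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(0)
x = np.random.randn(3)             # one input sample
W1 = np.random.randn(3, 4) * 10    # very large random weights

a = sigmoid(x @ W1)
print(a)                           # values pushed towards 0 or 1
print(a * (1 - a))                 # gradients close to 0 -- vanishing gradient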

LEARNING RATE:

If there is only time to optimize one hyper-parameter and one uses stochastic gradient
descent, then this is the hyper-parameter that is worth tuning. Unfortunately, we cannot
analytically calculate the optimal learning rate for a given model on a given dataset. Instead,
a good (or good enough) learning rate must be discovered via trial and error. The range of
values to consider for the learning rate is less than 1.0 and greater than 10^-6. A traditional
default value for the learning rate is 0.1 or 0.01, and this may represent a good starting point
on your problem. The grid search approach can help both to highlight an order of magnitude where good learning rates may reside and to describe the relationship between learning rate and performance, as in the toy sketch below.
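A toy grid search over learning rates on a log scale (the loss function f(w) = w**2 stands in, as an assumption, for whatever real training objective is used):

def final_loss(lr, steps=50):
    # Plain gradient descent on f(w) = w**2, starting from w = 5.0;
    # the gradient of w**2 is 2*w.
    w = 5.0
    for _ in range(steps):
        w -= lr * 2 * w
    return w ** 2

# Candidate learning rates spanning the suggested range (10^-6 up to 0.1).
for lr in [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]:
    print("lr =", lr, " final loss =", final_loss(lr))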

Moreover, momentum can accelerate learning on those problems where the high-dimensional weight space being navigated by the optimization process has structures that mislead the gradient descent algorithm, such as flat regions or steep curvature. Although no single method works best on all problems, adaptive momentum methods such as Adam have proven to be robust over many types of neural network architectures and problem types.

6. Analyze the results and make a conclusion.

CONCLUSION:

The gaming industry already relies heavily on neural networks. They can help us recognize
handwriting, which can be useful in banking, for example. In addition, there are a lot of
important things that artificial neural networks can do in medicine. They could be used to
make human body models that doctors could use to accurately diagnose diseases in their
patients. Furthermore, complex medical images, such as CT scans, can be analyzed more
quickly and precisely as a result of artificial neural networks. Neural network-based machines
will be able to independently resolve many abstract issues. They will gain knowledge from
their errors. Using a device known as a brain-computer interface, we might one day be able to
connect humans and machines! Human thoughts would become signals that could control
machines as a result of this. In the future, perhaps we will only need to interact with our
surroundings by thinking.
DRAWBACKS
 There is no real theory that explains how to select the number of hidden layers.
 When the input data is large, training takes a long time and requires powerful computers.
 The findings are difficult to comprehend; it is incredibly challenging to interpret and quantify the influence of individual predictors.
 Choosing the right training sample size and learning rate is difficult.
 The local minimum issue: the gradient descent algorithm arrives at the best weights for a local minimum, but the global minimum of the error function is not always guaranteed.
