Coursera ML Advanced Algos
Neural Networks
The original intuition behind creating neural networks was to write software that could
mimic how the human brain works.
Neural networks have revolutionized the application areas they've been a part of, with
speech recognition being one of the first.
Let’s look at an example to further understand how neural networks work:
The problem we will look at is demand prediction for a shirt.
Looking at a plot of the data, it's fit by a sigmoid curve, since we've applied
logistic regression.
Our function's output, which we originally called the output of the learning algorithm,
we will now denote with a and call the activation.
The activation takes in the input x, which in our case is the price, runs it through the
sigmoid formula, and returns the probability that this shirt is a top seller.
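As a concrete sketch, this is what the activation computes for a single price input. The parameter values w, b and the price here are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z into a probability in (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Hypothetical learned parameters for the single price feature
w, b = -0.9, 8.0

price = 7.0                 # input feature x
a = sigmoid(w * price + b)  # activation: P(top seller | price)
```

Higher prices push z down and therefore push the predicted probability toward 0.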
Now let's talk about how our neural network will work before we get to the activation
function/output.
The network will take in several features (price, marketing, material, shipping cost)
→ feed it through a layer of neurons which will output → affordability, awareness,
perceived value which will output → the probability of being the top seller.
Each of these stages is carried out by a group of neurons, or a single neuron, called a
“layer”.
The last layer is called the output layer, easy enough! The first layer is called the
input layer; it takes in a feature vector x. The layer in the middle is called the
hidden layer, because the values for affordability, awareness, etc. are not explicitly
stated in the dataset.
Each layer of neurons will have access to its previous layer and will focus on the
subset of features that are the most relevant.
Forward Propagation
Inference: the process of using a trained model to make predictions on previously
unseen data.
Forward propagation: computing the activations layer by layer, going through the network
from left to right.
With forward prop, you’d be able to download the parameters of a neural network that
someone else had trained and posted on the Internet. You’d also be able to carry out
inference on your new data using their neural network.
Let's look at an example with TensorFlow below:
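A sketch of the code being described (the unit counts and input values here are illustrative, not the course's exact numbers):

```python
import numpy as np
import tensorflow as tf

# Input feature vector (e.g., [price, marketing, material, shipping cost])
x = np.array([[200.0, 17.0, 1.0, 5.0]])

# First hidden layer: 3 units (affordability, awareness, perceived value)
layer_1 = tf.keras.layers.Dense(units=3, activation='sigmoid')
a1 = layer_1(x)   # activations of layer 1, shape (1, 3)

# Output layer: 1 unit giving P(top seller)
layer_2 = tf.keras.layers.Dense(units=1, activation='sigmoid')
a2 = layer_2(a1)  # final activation, shape (1, 1)
```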
Here, x, the input vector, is instantiated as a NumPy array.
layer_1, the first hidden layer in this network, is a Dense layer whose activation
function is sigmoid (σ), since this is a logistic regression problem.
layer_1 can also be thought of as a function: a1 holds the output vector produced by
applying that function to the input feature vector.
This process will continue for the second layer and onwards.
Each layer carries out its part of inference on the output of the layer before it.
And below is a more succinct version of the code doing the same thing as earlier.
Each of w, b, and the input would be a NumPy array; we take the dot product of w with
the input and add b to get our value of z.
This value z is then given to the sigmoid function, and the result is the output
activation for that layer.
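The two steps above can be sketched for a single unit like so (the parameter values are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical parameters for one unit, plus the previous layer's activations
w = np.array([1.0, 2.0])
b = -1.0
a_in = np.array([-2.0, 4.0])

z = np.dot(w, a_in) + b  # dot product of the arrays, plus the bias
a_out = sigmoid(z)       # the unit's output activation
```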
Let's delve even deeper and see how things work under the hood within NumPy.
Here we take each neuron's parameters and collect them into a matrix W and a vector b.
We then implement a dense function in Python that takes in the previous layer's
activations a, the weights W, the biases b, and the activation function g.
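One way to sketch that dense function (the parameter values here are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def dense(a_in, W, b, g):
    """Compute one layer's activations, one unit at a time.
    a_in: previous layer's activations, shape (n,)
    W:    weight matrix, one column per unit, shape (n, units)
    b:    biases, shape (units,)
    g:    activation function applied to each unit's z
    """
    units = W.shape[1]
    a_out = np.zeros(units)
    for j in range(units):
        w = W[:, j]                 # weights of unit j
        z = np.dot(w, a_in) + b[j]
        a_out[j] = g(z)
    return a_out

# Hypothetical parameters for a 2-input, 3-unit layer
W = np.array([[1.0, -3.0, 5.0],
              [2.0,  4.0, -6.0]])
b = np.array([-1.0, 1.0, 2.0])
a1 = dense(np.array([-2.0, 4.0]), W, b, sigmoid)
```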
Below is code for a vectorized implementation of forward prop in a neural network.
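A sketch of what that vectorized implementation looks like in NumPy: a single matrix multiply replaces the per-unit loop (example values are illustrative).

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def dense_vectorized(A_in, W, B, g):
    """Vectorized layer: all units computed in one matrix multiply.
    A_in: activations as a 2-D row vector, shape (1, n)
    W:    weight matrix, shape (n, units); B: biases, shape (1, units)."""
    Z = np.matmul(A_in, W) + B
    return g(Z)

A_in = np.array([[-2.0, 4.0]])
W = np.array([[1.0, -3.0, 5.0],
              [2.0,  4.0, -6.0]])
B = np.array([[-1.0, 1.0, 2.0]])
A_out = dense_vectorized(A_in, W, B, sigmoid)  # shape (1, 3)
```

Note that everything is kept 2-D here (row vectors rather than 1-D arrays), which is the convention TensorFlow uses.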
Now let's look into each of the steps for training a neural network.
The first step is to specify how to compute the output given the input x and the
parameters w and b.
This code snippet specifies the entire architecture of the neural network. It tells you
that there are 25 hidden units in the first hidden layer, 15 in the next one, and then
one output unit, all using the sigmoid activation function.
Based on this code snippet, we also know the parameters: W1, b1 for the first layer,
W2, b2 for the second layer, and W3, b3 for the third layer.
This code snippet therefore tells TensorFlow everything it needs in order to compute
the output a(x) as a function of the input x.
The second step is to specify the loss/cost function we used to train the neural network.
By the way, once you've specified the loss with respect to a single training example,
TensorFlow knows that the cost you want to minimize is the average of that loss over
all the training examples.
You can also always change your loss function.
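The three steps can be sketched end to end like this (the toy data is random and purely illustrative; the 25/15/1 architecture matches the snippet described above):

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import BinaryCrossentropy

# Step 1: define the architecture (25 -> 15 -> 1, sigmoid activations)
model = Sequential([
    Dense(units=25, activation='sigmoid'),
    Dense(units=15, activation='sigmoid'),
    Dense(units=1, activation='sigmoid'),
])

# Step 2: specify the loss; TensorFlow minimizes its average over all examples
model.compile(loss=BinaryCrossentropy())

# Step 3: fit the parameters to the data (toy random data here)
X = np.random.rand(8, 4).astype(np.float32)
y = np.random.randint(0, 2, size=(8, 1)).astype(np.float32)
model.fit(X, y, epochs=1, verbose=0)
```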
Activation Functions
ReLU (Rectified Linear Unit)
Most common choice: it's faster to compute, and Prof. Andrew Ng suggests using it
as the default for all hidden layers.
It only goes flat in one place (for negative inputs), thus gradient descent is faster to run.
If the output can never be negative, like the price of a house, this is the best
activation function for the output layer.
There are variants like LeakyReLU.
Linear activation function:
Output can be negative or positive.
This is great for regression problems.
Sometimes if we use this, people will say we are not using any activation function.
Sigmoid activation function:
This is the natural choice for the output layer in a binary classification problem, as it
naturally gives you a probability between 0 and 1.
It's flat in two places (both tails), so it's slower than ReLU.
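The three activation functions above can be sketched in NumPy:

```python
import numpy as np

def relu(z):
    """ReLU: flat only for z < 0, linear for z >= 0."""
    return np.maximum(0, z)

def linear(z):
    """Linear / 'no' activation: output can be any real number."""
    return z

def sigmoid(z):
    """Sigmoid: squashes z into (0, 1); flat in both tails."""
    return 1 / (1 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])
relu(z)  # -> [0., 0., 3.]
```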
Multiclass Classification Problem
This is a problem where there are more than 2 classes. This is when the target y can take
on more than two possible values for its output.
Binary classification only has 2 class possibilities, whereas multiclass can have multiple
possibilities for the output.
So now we need a new algorithm, softmax regression, to learn a decision boundary and
a probability for each class.
Side by side, the loss functions look similar: logistic regression uses
loss = -y log(a1) - (1 - y) log(1 - a1), while softmax regression uses
loss = -log(aj) when the true class is y = j.
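A sketch of softmax and its loss in NumPy (the logits here are illustrative):

```python
import numpy as np

def softmax(z):
    """Turn a vector of logits z into probabilities that sum to 1."""
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
a = softmax(z)  # one probability per class

# Softmax loss: -log of the probability assigned to the true class
y_true = 0
loss = -np.log(a[y_true])
```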
In order to build a neural network to have multiclass classification, we will need to add a
softmax layer to its output.
Let's also look at the TensorFlow code for softmax below:
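A sketch of such a model (the layer sizes and 10-class output are illustrative; in practice the course also shows a more numerically stable variant that uses a linear output layer together with from_logits=True in the loss):

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import SparseCategoricalCrossentropy

# Softmax output layer turns the 10 output units into class probabilities
model = Sequential([
    Dense(units=25, activation='relu'),
    Dense(units=15, activation='relu'),
    Dense(units=10, activation='softmax'),
])
model.compile(loss=SparseCategoricalCrossentropy())

# Each row of the prediction is a probability distribution over the 10 classes
probs = model.predict(np.random.rand(2, 4).astype('float32'), verbose=0)
```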
XGBoost
Extreme Gradient Boosting, or XGBoost, is a distributed gradient-boosted decision tree
library.
It provides parallel tree boosting and is a leading machine learning library for
regression, classification, and ranking problems.
It has built in regularization to prevent overfitting.
Use from xgboost import XGBClassifier for classification, or
from xgboost import XGBRegressor for regression.
TL;DR
Let's quickly go over the key takeaways from this section:
Neural nets under the hood:
Every layer inputs a vector of numbers and applies a bunch of logistic regression
units to it, and then computes another vector of numbers.
These vectors then get passed from layer to layer until you get to the final output
layer's computation, which is the prediction of the neural network.
Then you can either threshold at 0.5 or not to come up with the final prediction.
A convolutional layer only looks at part of the previous layer's inputs instead of all of them.
Convolutional networks have faster compute times and need less training data, and are
thus less prone to overfitting.
www.amanchadha.com