
**Deep Learning has been around for a long time.
However, the computing and storage capabilities
it needs were not available back then.
**If the output is categorical, there will be
multiple dummy output values,
one for each category
**Weights
**Weighted sum
of all input values
**Apply activation
function to
weighted sum
**Neuron passes
signal to next neuron
down the line
**Binary Values
(0 or 1)
**Similar to Logistic
Regression, binary
output with smooth
curve for probabilities
**One of the most
used functions in
Neural Networks
**Similar to the sigmoid
function, but values can drop
below zero (range -1 to 1)
**Very common to apply:
rectifier function on hidden layer &
sigmoid function on output layer
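**Taken together, the notes above describe a neuron's forward step: form the weighted sum of the inputs, apply an activation function, and pass the signal on. Below is a minimal sketch of that step with the four activations mentioned (threshold, sigmoid, tanh, rectifier); the input values and weights are made up for illustration.
```python
# Minimal sketch of one neuron's forward step with the four activation
# functions discussed above; inputs and weights are illustrative only.
import numpy as np

def threshold(z):
    return np.where(z >= 0, 1.0, 0.0)   # binary values (0 or 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))     # smooth curve for probabilities

def tanh(z):
    return np.tanh(z)                   # like sigmoid, but ranges from -1 to 1

def rectifier(z):
    return np.maximum(0.0, z)           # ReLU, very common on hidden layers

x = np.array([0.5, -1.2, 3.0])          # input values
w = np.array([0.8,  0.1, 0.4])          # weights
z = np.dot(w, x)                        # weighted sum of all input values

for f in (threshold, sigmoid, tanh, rectifier):
    print(f.__name__, f(z))             # signal passed to the next neuron
```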
**Property Valuation Example
(focusing on application, not training)
**Power comes from
the hidden layer
**Not all inputs are relevant to every neuron.
This neuron looks specifically for
relationships between certain inputs,
e.g. Area and Distance to City
**Looking specifically for relationships
between certain inputs,
e.g. Area, Bedrooms, Age
**Could be looking for a single criterion,
e.g. if Age > 100 then it is a historic property

**A good example of the rectifier function in use
**Even discovering relationships that
may not be as intuitive to us,
e.g. Bedrooms & Dist. to City
**Or picking up a combination of
all 4 inputs
**The hidden layer allows for
increased flexibility of the neural network
**Each neuron by itself cannot
predict the price, but all neurons
together are powerful
(cf. the ant analogy); see the sketch below
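**To make the property-valuation idea concrete, here is a hedged sketch of the applied (untrained) network the notes describe: each hidden neuron carries zero weights on the inputs it ignores, so one neuron pairs Area with Distance to City, another combines Area, Bedrooms and Age, and a third reacts only to Age; the output then combines their signals into a price. All weights and numbers are placeholders, not values from the slides.
```python
# Illustrative forward pass for the property-valuation example: four inputs,
# a small hidden layer with the rectifier, and a single price output.
# Weights here are arbitrary placeholders, not trained values.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

x = np.array([120.0, 3.0, 8.5, 45.0])   # Area, Bedrooms, Dist. to City, Age

W_hidden = np.array([
    [0.02, 0.0, -0.05,  0.0 ],          # neuron focused on Area & Distance
    [0.01, 0.3,  0.0,  -0.01],          # neuron focused on Area, Bedrooms, Age
    [0.0,  0.0,  0.0,   0.04],          # neuron focused on Age alone
])
w_output = np.array([1.5, 2.0, 0.8])    # combine the hidden neurons' signals

hidden = relu(W_hidden @ x)             # each neuron reacts only to "its" inputs
price = w_output @ hidden               # no single neuron predicts price alone
print(hidden, price)
```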
**Hard Coding vs. Neural Network
**The Perceptron
**The Perceptron: invented by
Frank Rosenblatt in 1957 at the Cornell
Aeronautical Laboratory
**Comparing y-hat to y
**Cost Function: taking the squared
difference between y-hat and y
(Error in our Prediction)
**Our goal is to minimize the
Cost Function
**Cost Function readjusts
the weights
**Iterative process, refeed input
values to get new y-hat
**Cost function and weights
are readjusted
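**A hedged sketch of that loop, using a single-weight linear model, one data row, and a made-up learning rate (the 1/2 factor in the cost is a common convention that simplifies the derivative):
```python
# Minimal sketch of the loop described above: compute y_hat, measure the
# squared-error cost C = 1/2 * (y_hat - y)^2, and readjust the weight.
# Model, data, and learning rate are illustrative assumptions.
x, y = 2.0, 10.0         # one input row and its true value
w = 0.5                  # initial weight
lr = 0.05                # learning rate

for step in range(20):
    y_hat = w * x                        # refeed the input to get a new y_hat
    cost = 0.5 * (y_hat - y) ** 2        # error in our prediction
    grad = (y_hat - y) * x               # dC/dw for this simple model
    w -= lr * grad                       # cost function readjusts the weight
    print(step, round(cost, 4), round(w, 4))
```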
**This is one neural network, but
copied 8 times to visualize
**This is one neural network, but
copied 8 times to visualize –
the copies share the same weights
**Back Propagation
**Cost function for weights
**Trained
**Untrained
**Impossible to use
brute force to test every
possible combination of weights
**Finding the optimal
weight for the cost function
**Slope is negative, must
move to the right
**Slope is positive, must
move to the left
**Slope is negative, must
move to the right
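**A hedged one-dimensional sketch of that rule, with an assumed convex cost C(w) = (w - 3)^2: subtracting the slope times a learning rate moves right when the slope is negative and left when it is positive.
```python
# Minimal 1-D gradient descent sketch on an illustrative convex cost.
def cost(w):
    return (w - 3.0) ** 2

def slope(w):
    return 2.0 * (w - 3.0)

w = -4.0                  # illustrative starting weight
lr = 0.1                  # illustrative learning rate
for step in range(25):
    w -= lr * slope(w)    # step against the slope: right if negative, left if positive
print(w, cost(w))         # w approaches the minimum at 3
```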
**Gradient Descent in a
1-Dimensional problem
**Gradient Descent in a
2-Dimensional problem

**Descending to the minimum
of the cost function
**Gradient Descent in a
3-Dimensional problem
**Same problem projected onto a
2-Dimensional space
**Gradient Descent is only effective
on a convex cost function
**But what about this?
(different cost function or
multidimensional)
**May find local minimum rather
than global minimum
**Normal Gradient Descent
(batch gradient descent)
**Stochastic Gradient Descent
(taking rows one by one)
**Stochastic
(Random)
**Deterministic
**Ability to readjust
all weights simultaneously
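**A hedged sketch contrasting the two variants on a toy one-weight model with made-up data: batch gradient descent makes one weight update per pass over all rows, while stochastic gradient descent takes the rows one by one in random order.
```python
# Illustrative comparison of batch vs. stochastic gradient descent on y = w * x.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = 2.5 * X                              # true weight is 2.5

def batch_gd(epochs=50, lr=0.01):
    w = 0.0
    for _ in range(epochs):
        grad = np.mean((w * X - Y) * X)  # gradient over all rows at once
        w -= lr * grad                   # one update per pass (deterministic)
    return w

def stochastic_gd(epochs=50, lr=0.01):
    w = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):     # rows one by one, random order
            grad = (w * X[i] - Y[i]) * X[i]
            w -= lr * grad                    # one update per row
    return w

print(batch_gd(), stochastic_gd())       # both approach the true weight 2.5
```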
