Deep learning has been around for a long time but was historically limited by available computing power. Neural networks use weighted sums of inputs passed through activation functions to model relationships between inputs in hidden layers. Gradient descent minimizes a cost function by iteratively adjusting the weights until the network is trained. Deep learning finds complex patterns across many inputs to make predictions.
However, the computing and storage capabilities needed for these networks did not exist at the time.

**If the output is categorical, there will be one dummy output value per category

How a neuron works:
**Weights are applied to each input value
**The neuron takes the weighted sum of all input values
**An activation function is applied to the weighted sum
**The neuron passes its signal to the next neuron down the line

Activation functions:
**Threshold function: binary values (0 or 1)
**Sigmoid function: similar to logistic regression, a smooth curve useful for probabilities
**Rectifier (ReLU) function: one of the most used functions in neural networks
**Hyperbolic tangent (tanh): similar to the sigmoid function, but values can drop below zero (-1 to 1)
**Very common to apply the rectifier function on the hidden layers and the sigmoid function on the output layer

Property Valuation Example (focusing on application, not training):
**The power comes from the hidden layer
**Not all inputs are relevant to every neuron; a neuron may look specifically for relationships between certain inputs, e.g. Area and Distance to City
**Another neuron may look for relationships between Area, Bedrooms, and Age
**A neuron could also look for a single criterion, e.g. if Age > 100 then it is a historic property
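The weighted-sum-plus-activation idea above can be sketched in a few lines. This is a minimal illustration, not the course's code: the input values (area, distance to city, bedrooms, age) mirror the property example, but the specific weight values are made up for demonstration.

```python
import numpy as np

def threshold(z):
    """Binary step: outputs 1 if the weighted sum is non-negative, else 0."""
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    """Smooth curve between 0 and 1, useful for probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectifier: passes positive values through, zeroes out negatives."""
    return np.maximum(0.0, z)

def tanh(z):
    """Like the sigmoid, but output ranges from -1 to 1."""
    return np.tanh(z)

def neuron(inputs, weights, activation):
    z = np.dot(inputs, weights)   # weighted sum of all input values
    return activation(z)          # apply the activation function

# Hypothetical property: area, distance to city, bedrooms, age.
inputs = np.array([1200.0, 5.0, 3.0, 40.0])
weights = np.array([0.002, -0.1, 0.3, -0.01])  # illustrative values only

print(neuron(inputs, weights, relu))
print(neuron(inputs, weights, sigmoid))
```

Swapping the `activation` argument is all it takes to compare the four functions on the same weighted sum, which is why the rectifier/sigmoid combination mentioned above is a configuration choice rather than a structural change.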
**A good example of the rectifier function
**The network may even discover relationships that are not intuitive to us, e.g. Bedrooms and Distance to City
**Or a neuron may pick up a combination of all 4 inputs
**The hidden layer allows for increased flexibility of the neural network
**Each neuron by itself cannot predict the price, but all neurons together are powerful (the ant-colony analogy)

Hard Coding vs. Neural Network

**The Perceptron: first invented by Frank Rosenblatt in 1957 at the Cornell Aeronautical Laboratory

How a network learns:
**Compare the prediction y-hat to the actual value y
**Cost function: the squared difference between y-hat and y (the error in our prediction), e.g. C = 1/2 (y-hat - y)^2
**Our goal is to minimize the cost function
**The cost function feeds back to readjust the weights
**Iterative process: re-feed the input values to get a new y-hat
**The cost function is recomputed and the weights are readjusted again
**Feeding 8 rows of data can be visualized as one neural network copied 8 times, but the copies share the same weights
**Back propagation adjusts the weights based on the cost function
**Plotting the cost function against a weight shows the difference between an untrained and a trained network
**It is impossible to use brute force to compute every possible combination of weights
**Gradient descent finds the optimal weight for the cost function:
**If the slope is negative, move to the right
**If the slope is positive, move to the left
**Gradient descent in a 1-dimensional problem
**Gradient descent in a 2-dimensional problem
**Descending to the minimum of the cost function
**Gradient descent in a 3-dimensional problem, and the same problem projected onto a 2-dimensional space
**Gradient descent is only effective on a convex cost function
**With a non-convex cost function (or many dimensions), it may find a local minimum rather than the global minimum
**Normal (batch) gradient descent: updates on all rows at once; deterministic
**Stochastic gradient descent: takes the rows one by one; random, which helps avoid local minima
**Back propagation gives the ability to readjust all weights simultaneously
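The batch vs. stochastic distinction above can be sketched on a one-weight linear model. This is an illustrative toy, not the course's code: the data (following y = 3x) and learning rate are invented for the example.

```python
import random

data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]  # toy rows, y = 3x

def batch_gd(data, w=0.0, lr=0.01, epochs=200):
    """Deterministic: one update per epoch, using the gradient over all rows."""
    for _ in range(epochs):
        grad = sum((w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def stochastic_gd(data, w=0.0, lr=0.01, epochs=200, seed=0):
    """Random: rows are shuffled, and the weight is updated one row at a time."""
    rng = random.Random(seed)
    for _ in range(epochs):
        rows = data[:]
        rng.shuffle(rows)  # the randomness in row order is what makes it stochastic
        for x, y in rows:
            w -= lr * (w * x - y) * x
    return w

print(batch_gd(data))        # both converge near 3.0 on this convex toy problem
print(stochastic_gd(data))
```

On this convex toy cost both variants reach the same answer; the point of the stochastic version is that its row-by-row, randomized updates can jump out of shallow local minima when the cost surface is not convex.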