Neural Networks
Linear Models
Statistical Models
Greedy Hill Descent
1. Initialize the parameters randomly.
2. Repeatedly find the direction in which to change the parameters so as to reduce the loss function, and update in that direction based on local information and the learning rate.
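The two steps above can be sketched as follows. This is a minimal illustration, not the course's code: the data, learning rate, and step count are made-up choices, fitting y = w·x + b by mean squared error.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = 2.0 * X + 1.0          # illustrative data generated from w=2, b=1

# 1. Initialize the parameters randomly.
w, b = rng.normal(size=2)

# 2. Repeatedly step opposite the gradient of the loss.
lr = 0.01                  # learning rate
for _ in range(5000):
    err = (w * X + b) - Y
    # Gradients of mean squared error with respect to w and b.
    grad_w = 2 * np.mean(err * X)
    grad_b = 2 * np.mean(err)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)                # approaches the true w=2, b=1
```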
Multinomial Logistic Regression
Logistic regression generalizes to the multinomial case by learning one set of parameters per class value.
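A minimal sketch of the idea, with made-up weights for three classes: each class has its own weight vector and bias, and the per-class scores are normalized into probabilities.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# One set of parameters (weight vector + bias) per class; values are made up.
W = np.array([[0.5, -0.2],     # class 0
              [0.1,  0.4],     # class 1
              [-0.3, 0.2]])    # class 2
b = np.array([0.0, 0.1, -0.1])

x = np.array([1.0, 2.0])       # a single feature vector
probs = softmax(W @ x + b)     # one probability per class, summing to 1
print(probs)
```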
Feature Transformations & Non-Linear Models

X´=X   X´´=X^2   Y
1      1         4
2      4         9
3      9         10
4      16        5
Feature Transformations & Non-Linear Models

Most of the (X´, X´´) plane represents impossible points. Only one curve, where X´^2 = X´´, represents possible points.
Feature Transformations & Non-Linear Models

We can take that curve and plot the X´=X vs. Y values back in our original two-dimensional space. And we have a non-linear model!
X   Y
1   4
2   9
3   10
4   5
Feature Transformations & Non-Linear Models
We can create non-linear models by:
• Performing non-linear feature transformations
• Creating a linear model on the transformed data
• Taking the curve of possible values
Most non-linear models are of this sort...
… including neural networks and deep learning models.
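The recipe can be sketched on the slide's own table: transform X into (X, X^2), fit an ordinary linear model on the transformed features, and the result is a quadratic (non-linear) model in the original X. The least-squares call is an illustrative choice, not the course's required method.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([4.0, 9.0, 10.0, 5.0])

# Non-linear feature transformation: columns are [1, X, X^2].
features = np.column_stack([np.ones_like(X), X, X**2])

# A linear model on the transformed data (ordinary least squares).
coef, *_ = np.linalg.lstsq(features, Y, rcond=None)
pred = features @ coef

print(coef)   # intercept, weight on X, weight on X^2 (negative: downward curve)
print(pred)   # close to Y, even though Y is non-linear in X
```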
Artificial Neural Networks (ANNs)
Basic ANNs are a series of one or more feature transformations
followed by linear regression (for regression tasks) or logistic
regression (for classification tasks).
• The transformations are parameterized, and the values of the
parameters are fit during training.
• ANNs learn the transformations themselves from the data.
Note: Multinomial logistic regression is often called softmax in the
context of ANNs.
Artificial Neural Networks
• Input, hidden, and output layers
• Dummy variables (X0, H0)
• Parameters are weights on edges
• Hidden layers are feature transformations of the previous layer
• Can have multiple hidden layers
• Output layer: linear/logistic regression
Artificial Neural Networks
• Transformations are non-linear functions of the weighted sum of the previous layer's variables.
• The sum's weights are those associated with the incoming edges.
• The non-linear function is called the 'activation' function.
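A minimal sketch of one hidden layer, assuming a sigmoid activation and made-up weights: each hidden unit applies the activation to a weighted sum of the previous layer's variables (the bias terms play the role of the dummy variables X0, H0), and the output layer is plain linear regression on the transformed features.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # the 'activation' function

x = np.array([1.0, 0.5])               # input layer (X1, X2)

# Weights on the edges into the hidden layer; rows = hidden units,
# columns = inputs. b_h holds the weights on the dummy variable X0.
W_h = np.array([[0.2, -0.4],
                [0.7,  0.1]])
b_h = np.array([0.0, -0.3])

# Hidden layer: a learned feature transformation of the input.
h = sigmoid(W_h @ x + b_h)

# Output layer: linear regression on the transformed features.
w_out = np.array([1.5, -2.0])
b_out = 0.1
y_hat = w_out @ h + b_out
print(y_hat)
```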
Artificial Neural Networks