
Foundations of Machine Learning

Dr. Panashe Chiurunge


Machine Learning

 TensorFlow
 Directed Acyclic Graphs
 TensorFlow Eager Execution Mode
 TensorFlow Keras API
 Linear Regression with TensorFlow
 What is Machine Learning
 The types of Machine Learning
What is Machine Learning

 We are trying to learn from data, or to learn a representation of the data.
 To formulate the basic learning-from-data problem, we must specify several basic elements:
 data spaces, probability measures, loss functions, and statistical risk.
Machine Learning – Data Space

 We have to learn from some data.
 Learning from data begins with a specification of two spaces: the input space X and the output space Y.
 The input space is also sometimes called the feature space.
 The output space is also called the "label space", "outcome space", "signal range", or, in statistical regression, the "response space".
Machine Learning

 We then want to construct a function that maps points in the feature space to the output space, allowing for some random noise within the data.
Machine Learning

 The basic problem in machine learning is to determine a mapping f : X → Y
 that takes an input x ∈ X
 and predicts the output ŷ = f(x).

Machine Learning – Loss Functions

 Since we are trying to predict/classify labels, we need to measure the performance of our learner in some way.
 Suppose we have a true label y and a label prediction ŷ.
 A loss function measures how "different" these two quantities are. Formally, a loss function is a map ℓ : Y × Y → ℝ.
Machine Learning – Loss Functions

 In regression or estimation problems, the output space is Y = ℝ. The squared-error loss function ℓ(y, ŷ) = (y − ŷ)² is often employed as the cost function.
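As a minimal sketch (the function name is illustrative), the squared-error loss can be written as:

```python
# Squared-error loss: l(y, y_hat) = (y - y_hat)^2.
# Squaring makes the loss non-negative and penalizes large errors more.
def squared_error(y, y_hat):
    return (y - y_hat) ** 2

print(squared_error(5.0, 4.5))  # 0.25
print(squared_error(3.0, 3.0))  # 0.0
```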
Machine Learning – Loss Functions

 The loss function can be used to measure the "risk" of a learning rule: the expected loss R(f) = E[ℓ(Y, f(X))].
 We have to minimize this risk as we learn our representation of the data.
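In practice the true risk is unknown, so we estimate it with the average loss over the observed data (the empirical risk). A minimal sketch with a made-up dataset and illustrative names:

```python
# Empirical risk: the average squared-error loss of a learning rule f
# over a finite dataset, used as an estimate of E[l(Y, f(X))].
def empirical_risk(f, xs, ys):
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # made-up data generated by y = 2x + 1

def perfect(x):
    return 2 * x + 1       # matches the data exactly

def biased(x):
    return 2 * x           # off by 1 everywhere

print(empirical_risk(perfect, xs, ys))  # 0.0
print(empirical_risk(biased, xs, ys))   # 1.0
```

A better learning rule has lower empirical risk, which is exactly what we minimize during training.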
Machine Learning – Linear Regression

 Linear regression is simply finding the best possible line of fit to represent a set of data points.
 In machine learning terms, we are creating a learning rule that fits a line representing our data.
Machine Learning – Linear Regression

 Let’s suppose we want to model a set of points with a line.
 To do this we’ll use the standard line equation y = mx + b, where m is the line’s gradient and b is the line’s intercept.
Machine Learning – Linear Regression

 To find the best line for our data, we need to find the
best set of gradient and intercept values.
Machine Learning – Linear Regression

 A standard approach to solving this type of problem is to define an error function (also called a cost function or loss function) that measures how “good” a given line is.
 This function will take in an (m, b) pair and return an error value based on how well the line fits our data.
Machine Learning – Linear Regression

 To compute this error for a given line, we’ll iterate through each (x, y) point in our data set and sum the square distances between each point’s y value and the candidate line’s y value (computed as mx + b).
 It’s conventional to square this distance to ensure that it is positive and to make our cost function differentiable.
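A minimal sketch of this error computation, using a small made-up dataset (names are illustrative):

```python
# Mean squared error of the candidate line y = m*x + b:
# sum the squared vertical distances from each point to the line.
def line_error(m, b, points):
    total = 0.0
    for x, y in points:
        total += (y - (m * x + b)) ** 2
    return total / len(points)

points = [(1, 3), (2, 5), (3, 7), (4, 9)]  # made-up data on y = 2x + 1
print(line_error(2.0, 1.0, points))  # 0.0  (perfect fit)
print(line_error(1.0, 0.0, points))  # 13.5 (a worse line)
```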
Machine Learning – Linear Regression

 Our loss function is the mean squared error over our N data points:

 E(m, b) = (1/N) · Σᵢ (yᵢ − (m·xᵢ + b))²
Machine Learning – Linear Regression

 Lines that fit our data better (where “better” is defined by our cost function) will result in lower error values.
 If we minimize this function, we will get the best line of fit to represent our data.
Machine Learning – Linear Regression

 Since our cost function has two parameters (m and b), we can visualize it as a two-dimensional surface.
Machine Learning – Gradient Descent

 Each point in this two-dimensional space represents a line. The


height of the function at each point is the error value for that line.
You can see that some lines yield smaller error values than
others (i.e., fit our data better). When we run gradient descent
search, we will start from some location on this surface and
move downhill to find the line with the lowest error.
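We can get a feel for this surface by tabulating the error over a coarse grid of (m, b) pairs; this rough sketch uses a made-up dataset:

```python
# Each (m, b) pair is one candidate line; its error is the height of
# the surface at that point. The minimum over the grid is the best line.
points = [(1, 3), (2, 5), (3, 7), (4, 9)]  # made-up data on y = 2x + 1

def error(m, b):
    return sum((y - (m * x + b)) ** 2 for x, y in points) / len(points)

grid = {(m, b): error(m, b) for m in (0, 1, 2, 3) for b in (0, 1, 2)}
best = min(grid, key=grid.get)
print(best, grid[best])  # (2, 1) 0.0 — the lowest point on this grid
```

Gradient descent does the same search without evaluating every point: it follows the local slope downhill instead.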
Machine Learning – Gradient Descent

 To run gradient descent on this error function, we first need to compute its gradient.
 The gradient will act like a compass and always point us downhill.
 To compute it, we will need to differentiate our error function.
 Since our function is defined by two parameters (m and b), we will need to compute a partial derivative for each.
 These derivatives work out to be:

 ∂E/∂m = (2/N) · Σᵢ −xᵢ (yᵢ − (m·xᵢ + b))
 ∂E/∂b = (2/N) · Σᵢ −(yᵢ − (m·xᵢ + b))
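These partial derivatives translate directly into code; a minimal sketch with a made-up dataset:

```python
# Gradient of the mean squared error E(m, b) with respect to the
# line's gradient m and intercept b.
def gradient(m, b, points):
    n = len(points)
    dm = sum(-2 * x * (y - (m * x + b)) for x, y in points) / n
    db = sum(-2 * (y - (m * x + b)) for x, y in points) / n
    return dm, db

points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]  # y = 2x + 1
print(gradient(2.0, 1.0, points))  # (0.0, 0.0) — zero gradient at the minimum
print(gradient(0.0, 0.0, points))  # (-35.0, -12.0) — we move in the opposite direction
```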
Machine Learning – Gradient Descent

 We can initialize our search at any pair of m and b values (i.e., any line) and let the gradient descent algorithm march downhill on our error function towards the best line.
 Each iteration will update m and b to a line that yields slightly lower error than the previous iteration.
 The direction to move in for each iteration is calculated using the two partial derivatives.
Machine Learning – Gradient Descent

 The learning rate variable controls how large a step we take downhill during each iteration. If we take too large a step, we may step over the minimum.
 However, if we take small steps, it will require many iterations to arrive at the minimum.
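A tiny one-dimensional illustration makes this concrete: for E(x) = x² (gradient 2x, minimum at x = 0), one gradient-descent step is x − α·2x. The numbers here are made up to show both regimes:

```python
# One gradient-descent step for E(x) = x^2, whose gradient is 2x.
def step(x, learning_rate):
    return x - learning_rate * 2 * x

x0 = 1.0
print(step(x0, 0.1))  # 0.8  — a small step makes safe progress toward 0
print(step(x0, 1.5))  # -2.0 — too large: we step over the minimum and move further away
```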
Machine Learning – Gradient Descent

 We can also observe how the error changes as we move toward the minimum. A good way to ensure that gradient descent is working correctly is to make sure that the error decreases on each iteration.
Machine Learning – Gradient Descent
 Do the following until convergence:

 m := m − α · ∂E/∂m
 b := b − α · ∂E/∂b

 where α is the learning rate.
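Putting the pieces together, a minimal self-contained sketch of batch gradient descent for the line-fitting problem (made-up data, illustrative names; a fixed iteration count stands in for a convergence test):

```python
# Repeat m <- m - alpha*dE/dm, b <- b - alpha*dE/db.
points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]  # y = 2x + 1

def gradient(m, b):
    n = len(points)
    dm = sum(-2 * x * (y - (m * x + b)) for x, y in points) / n
    db = sum(-2 * (y - (m * x + b)) for x, y in points) / n
    return dm, db

m, b, alpha = 0.0, 0.0, 0.05
for _ in range(2000):
    dm, db = gradient(m, b)
    m, b = m - alpha * dm, b - alpha * db

print(round(m, 3), round(b, 3))  # 2.0 1.0 — recovers the true line
```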
Machine Learning – Stochastic Gradient Descent
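Stochastic gradient descent replaces the full-dataset gradient with the gradient of the loss on a single randomly chosen point per step. A minimal sketch (made-up data; the constant learning rate and iteration count are arbitrary choices):

```python
import random

# Each update uses the gradient of the squared error on ONE random
# point: dE/dm = -2*x*residual and dE/db = -2*residual.
points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]  # y = 2x + 1

random.seed(0)  # reproducible run
m, b, alpha = 0.0, 0.0, 0.01
for _ in range(20000):
    x, y = random.choice(points)   # sample a single data point
    residual = y - (m * x + b)
    m += alpha * 2 * x * residual  # descend along the per-point gradient
    b += alpha * 2 * residual

print(round(m, 2), round(b, 2))  # near the true values m = 2, b = 1
```

Each step is much cheaper than a full-batch step, at the cost of a noisier path downhill.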
Machine Learning – GD Algorithms
 Stochastic Gradient Descent (SGD)
 Adaptive Momentum Estimation (Adam)
 Nesterov Accelerated Gradient (NAG)
 Adaptive Gradient Descent (AdaGrad)
 Adaptive Learning Rate Method (AdaDelta)
 Root Mean Square Propagation (RMSProp)
Machine Learning

Q&A
