
Foundations of Machine Learning

Dr. Panashe Chiurunge


Machine Learning

 TensorFlow
 Directed Acyclic Graphs
 TensorFlow Eager Execution Mode
 TensorFlow Keras API
 Linear Regression with TensorFlow
 What is Machine Learning
 The types of Machine Learning
What is Machine Learning

 We are trying to learn from data, or to learn a representation of the data.
 To formulate the basic learning-from-data problem, we must specify several basic elements:
 data spaces, probability measures, loss functions, and statistical risk.
Machine Learning – Data Space

 We have to learn from some data.
 Learning from data begins with a specification of two spaces: the input space X and the output space Y.
 The input space is also sometimes called the feature space.
 The output space is also called the "label space", "outcome space", "signal range", or, in statistical regression, the "response space".
Machine Learning

 We then want to construct a function that maps points in the feature space to the output space, allowing for some random noise within the data.
Machine Learning

 The basic problem in machine learning is to determine a mapping f : X → Y
 that takes an input x ∈ X
 and predicts the output ŷ = f(x).

Machine Learning – Loss Functions

 Since we are trying to predict/classify labels, we need to measure the performance of our learner in some way.
 Suppose we have a true label y and a label prediction ŷ.
 A loss function measures how "different" these two quantities are. Formally, a loss function is a map ℓ : Y × Y → ℝ.
Machine Learning – Loss Functions

 In regression or estimation problems, the output space is Y = ℝ. The squared-error loss function ℓ(y, ŷ) = (y − ŷ)² is often employed as the cost function.
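As a minimal sketch (the function name is illustrative), the squared-error loss can be written as:

```python
# Squared-error loss: l(y, y_hat) = (y - y_hat)^2.
# Squaring makes the loss non-negative and penalizes large errors more.
def squared_error(y, y_hat):
    return (y - y_hat) ** 2

print(squared_error(5.0, 4.5))  # 0.25
print(squared_error(3.0, 3.0))  # 0.0
```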
Machine Learning – Loss Functions

 The loss function can be used to measure the "risk" of a learning rule: the expected loss R(f) = E[ℓ(Y, f(X))].
 We have to minimize this risk as we learn our representation of the data.
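In practice the true risk is unknown, so we estimate it with the average loss over the observed data (the empirical risk). A minimal sketch with a made-up dataset and illustrative names:

```python
# Empirical risk: the average squared-error loss of a learning rule f
# over a finite dataset, used as an estimate of E[l(Y, f(X))].
def empirical_risk(f, xs, ys):
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # made-up data generated by y = 2x + 1

def perfect(x):
    return 2 * x + 1       # matches the data exactly

def biased(x):
    return 2 * x           # off by 1 everywhere

print(empirical_risk(perfect, xs, ys))  # 0.0
print(empirical_risk(biased, xs, ys))   # 1.0
```

A better learning rule has lower empirical risk, which is exactly what we minimize during training.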
Machine Learning – Linear Regression

 Linear regression is simply finding the best possible line of fit to represent a set of data points.
 In machine learning terms, we are creating a learning rule that fits a line representing our data.
Machine Learning – Linear Regression

 Let’s suppose we want to model a set of points with a line.
 To do this we’ll use the standard line equation y = mx + b, where m is the line’s gradient and b is the line’s intercept.
Machine Learning – Linear Regression

 To find the best line for our data, we need to find the
best set of gradient and intercept values.
Machine Learning – Linear Regression

 A standard approach to solving this type of problem is to define an error function (also called a cost function or loss function) that measures how “good” a given line is.
 This function will take in an (m, b) pair and return an error value based on how well the line fits our data.
Machine Learning – Linear Regression

 To compute this error for a given line, we’ll iterate through each (x, y) point in our data set and sum the square distances between each point’s y value and the candidate line’s y value (computed as mx + b).
 It’s conventional to square this distance to ensure that it is positive and to make our cost function differentiable.
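A minimal sketch of this error computation, using a small made-up dataset (names are illustrative):

```python
# Mean squared error of the candidate line y = m*x + b:
# sum the squared vertical distances from each point to the line.
def line_error(m, b, points):
    total = 0.0
    for x, y in points:
        total += (y - (m * x + b)) ** 2
    return total / len(points)

points = [(1, 3), (2, 5), (3, 7), (4, 9)]  # made-up data on y = 2x + 1
print(line_error(2.0, 1.0, points))  # 0.0  (perfect fit)
print(line_error(1.0, 0.0, points))  # 13.5 (a worse line)
```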
Machine Learning – Linear Regression

 Our loss function is the mean squared error over our N data points:

 E(m, b) = (1/N) · Σᵢ (yᵢ − (m·xᵢ + b))²
Machine Learning – Linear Regression

 Lines that fit our data better (where “better” is defined by our cost function) will result in lower error values.
 If we minimize this function, we will get the best line of fit to represent our data.
Machine Learning – Linear Regression

 Since our cost function has two parameters (m and b), we can visualize it as a two-dimensional surface.
Machine Learning – Gradient Descent

 Each point in this two-dimensional space represents a line. The


height of the function at each point is the error value for that line.
You can see that some lines yield smaller error values than
others (i.e., fit our data better). When we run gradient descent
search, we will start from some location on this surface and
move downhill to find the line with the lowest error.
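We can get a feel for this surface by tabulating the error over a coarse grid of (m, b) pairs; this rough sketch uses a made-up dataset:

```python
# Each (m, b) pair is one candidate line; its error is the height of
# the surface at that point. The minimum over the grid is the best line.
points = [(1, 3), (2, 5), (3, 7), (4, 9)]  # made-up data on y = 2x + 1

def error(m, b):
    return sum((y - (m * x + b)) ** 2 for x, y in points) / len(points)

grid = {(m, b): error(m, b) for m in (0, 1, 2, 3) for b in (0, 1, 2)}
best = min(grid, key=grid.get)
print(best, grid[best])  # (2, 1) 0.0 — the lowest point on this grid
```

Gradient descent does the same search without evaluating every point: it follows the local slope downhill instead.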
Machine Learning – Gradient Descent

 To run gradient descent on this error function, we first need to compute its gradient.
 The gradient will act like a compass and always point us downhill.
 To compute it, we will need to differentiate our error function.
 Since our function is defined by two parameters (m and b), we will need to compute a partial derivative for each.
 These derivatives work out to be:

 ∂E/∂m = (2/N) · Σᵢ −xᵢ (yᵢ − (m·xᵢ + b))
 ∂E/∂b = (2/N) · Σᵢ −(yᵢ − (m·xᵢ + b))
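These partial derivatives translate directly into code; a minimal sketch with a made-up dataset:

```python
# Gradient of the mean squared error E(m, b) with respect to the
# line's gradient m and intercept b.
def gradient(m, b, points):
    n = len(points)
    dm = sum(-2 * x * (y - (m * x + b)) for x, y in points) / n
    db = sum(-2 * (y - (m * x + b)) for x, y in points) / n
    return dm, db

points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]  # y = 2x + 1
print(gradient(2.0, 1.0, points))  # (0.0, 0.0) — zero gradient at the minimum
print(gradient(0.0, 0.0, points))  # (-35.0, -12.0) — we move in the opposite direction
```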
Machine Learning – Gradient Descent

 We can initialize our search at any pair of m and b values (i.e., any line) and let the gradient descent algorithm march downhill on our error function towards the best line.
 Each iteration will update m and b to a line that yields slightly lower error than the previous iteration.
 The direction to move in for each iteration is calculated using the two partial derivatives.
Machine Learning – Gradient Descent

 The learning rate variable controls how large a step we take downhill during each iteration. If we take too large a step, we may step over the minimum.
 However, if we take small steps, it will require many iterations to arrive at the minimum.
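A tiny one-dimensional illustration makes this concrete: for E(x) = x² (gradient 2x, minimum at x = 0), one gradient-descent step is x − α·2x. The numbers here are made up to show both regimes:

```python
# One gradient-descent step for E(x) = x^2, whose gradient is 2x.
def step(x, learning_rate):
    return x - learning_rate * 2 * x

x0 = 1.0
print(step(x0, 0.1))  # 0.8  — a small step makes safe progress toward 0
print(step(x0, 1.5))  # -2.0 — too large: we step over the minimum and move further away
```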
Machine Learning – Gradient Descent

 We can also observe how the error changes as we move toward the minimum. A good way to ensure that gradient descent is working correctly is to make sure that the error decreases on each iteration.
Machine Learning – Gradient Descent
 Do the following until convergence:

 m := m − α · ∂E/∂m
 b := b − α · ∂E/∂b

 where α is the learning rate.
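Putting the pieces together, a minimal self-contained sketch of batch gradient descent for the line-fitting problem (made-up data, illustrative names; a fixed iteration count stands in for a convergence test):

```python
# Repeat m <- m - alpha*dE/dm, b <- b - alpha*dE/db.
points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]  # y = 2x + 1

def gradient(m, b):
    n = len(points)
    dm = sum(-2 * x * (y - (m * x + b)) for x, y in points) / n
    db = sum(-2 * (y - (m * x + b)) for x, y in points) / n
    return dm, db

m, b, alpha = 0.0, 0.0, 0.05
for _ in range(2000):
    dm, db = gradient(m, b)
    m, b = m - alpha * dm, b - alpha * db

print(round(m, 3), round(b, 3))  # 2.0 1.0 — recovers the true line
```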
Machine Learning – Stochastic Gradient Descent
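Stochastic gradient descent replaces the full-dataset gradient with the gradient of the loss on a single randomly chosen point per step. A minimal sketch (made-up data; the constant learning rate and iteration count are arbitrary choices):

```python
import random

# Each update uses the gradient of the squared error on ONE random
# point: dE/dm = -2*x*residual and dE/db = -2*residual.
points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]  # y = 2x + 1

random.seed(0)  # reproducible run
m, b, alpha = 0.0, 0.0, 0.01
for _ in range(20000):
    x, y = random.choice(points)   # sample a single data point
    residual = y - (m * x + b)
    m += alpha * 2 * x * residual  # descend along the per-point gradient
    b += alpha * 2 * residual

print(round(m, 2), round(b, 2))  # near the true values m = 2, b = 1
```

Each step is much cheaper than a full-batch step, at the cost of a noisier path downhill.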
Machine Learning – GD Algorithms
 Stochastic Gradient Descent (SGD)
 Adaptive Momentum Estimation (Adam)
 Nesterov Accelerated Gradient (NAG)
 Adaptive Gradient Descent (AdaGrad)
 Adaptive Learning Rate Method (AdaDelta)
 Root Mean Square Propagation (RMSProp)
Machine Learning

Q&A
