
Unit 1

Fundamentals of Machine
Learning

Part 1

Oscar Contreras Carrasco


UNIVALLE
2021
Introduction

Machine Learning is a discipline whose aim is
to make machines learn

For Machine Learning to be effective, it is
paramount to have data at our disposal.

In this unit, we are going to discuss the basics
of Machine Learning.

Primarily, our focus will be on parametric
methods as well as their main traits.

Let’s begin
Machine Learning
[Diagram: nested circles showing Deep Learning inside Machine Learning inside Artificial Intelligence]

Machine Learning

[Diagram: the same nested circles, now shown alongside Big Data]
Machine Learning

The past ten years have seen an increasing number of applications
of Artificial Intelligence in general

These developments can be ascribed to the rise of Deep Learning as
well as the emergence of other technologies that support it.

Cloud Computing and Big Data have also played an important role.

Machine Learning in general will continue to see further
developments in the coming years.
Machine Learning

Are you familiar with any of these technologies?

Can you name them?
Machine Learning

All of the applications we
mentioned have something in
common.

They all require data to be
successful at performing some
operation.

This is the very principle of Machine
Learning. Data is fed into our
model, and it is expected to
produce a prediction in return.
Datasets, features, and predictions
FEATURES                                                 RESPONSE

Salary (x1)   Marital status (x2)   Total debt (x3)      Has credit (y)

Dataset (feature matrix with known responses):
3000          Single                0                    Yes
3500          Married               2000                 No
2500          Divorced              3500                 No

Predictions (responses produced by the model):
8000          Single                120                  Yes
7500          Married               300                  Yes
Can you imagine what it would be like to have to use a complex
program to make the predictions?

Can we create our own features and entries? YES!

The number of features defines the dimensionality of our dataset!
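As a minimal sketch of how such a dataset might be represented (assuming Python with pandas; the column names are illustrative, matching the table above):

import pandas as pd

# Feature matrix: each row is an entry, each column a feature (x1, x2, x3).
data = pd.DataFrame({
    "salary":         [3000, 3500, 2500],
    "marital_status": ["Single", "Married", "Divorced"],
    "total_debt":     [0, 2000, 3500],
})

# Response vector y: the known outcome for each entry.
y = pd.Series(["Yes", "No", "No"], name="has_credit")

# The dimensionality of the dataset is the number of features.
print(data.shape[1])  # 3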
Supervised and unsupervised learning
From this standpoint, we can immediately determine that Machine
Learning in general can be divided into two major areas: supervised
and unsupervised learning.

SUPERVISED (responses y are known)

Salary (x1)   Marital status (x2)   Total debt (x3)      Has credit (y)
3000          Single                0                    Yes
3500          Married               2000                 No
2500          Divorced              3500                 No
8000          Single                120                  Yes
7500          Married               300                  Yes

UNSUPERVISED (responses are not available)

Salary (x1)   Marital status (x2)   Total debt (x3)      Has credit? (z)
3000          Single                0                    ?
3500          Married               2000                 ?
2500          Divorced              3500                 ?
8000          Single                120                  ?
7500          Married               300                  ?
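A rough sketch of the distinction (assuming Python with scikit-learn; the estimators and toy data below are illustrative):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Toy numeric feature matrix: [salary, total_debt] (marital status omitted
# for simplicity; it would need to be encoded numerically first).
X = np.array([[3000, 0], [3500, 2000], [2500, 3500], [8000, 120], [7500, 300]])

# Supervised learning: the responses y are known and used during training.
y = np.array([1, 0, 0, 1, 1])  # 1 = "Yes", 0 = "No"
clf = LogisticRegression().fit(X, y)
print(clf.predict([[4000, 500]]))

# Unsupervised learning: no responses; the algorithm looks for structure.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(clusters)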


Further detail
[Diagram: supervised learning vs. unsupervised learning; under unsupervised learning: clustering and dimensionality reduction]
Training, validation, and testing
When we are working on predictive analysis of data, it is a good
practice to divide our dataset into three parts.

Training set: used to train the model. In strict terms, we use the
training set to adjust the parameters of the model.

Validation set: used to measure the predictive performance of the
model. We use the validation set to adjust the hyperparameters of
the model.

Testing set: used to report the final performance of the model. The
information elicited here will be used to report performance in a
paper, or as a benchmark in a Kaggle competition.
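A minimal sketch of such a split (assuming Python with scikit-learn; the 60/20/20 proportions are just an example):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)   # toy feature matrix
y = np.arange(50)                   # toy responses

# First split off the test set (20%), then carve the validation set
# (25% of the remainder = 20% of the total) out of what is left.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 30 10 10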
Bias and variance tradeoff
Model predictive performance is measured using the validation set.
This process is sometimes called "cross-validation", because it relies
on special techniques that ensure the stability of the performance
measures.

We say a model has high bias (nothing to do with the bias term of
linear models) when it does not fit the training data well.

We say a model has high variance when it overfits the training data.

Let’s illustrate this concept graphically for better understanding.
Bias and variance tradeoff

[Figure: example fits illustrating high bias (underfitting) and high variance (overfitting)]
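A small numerical sketch of the same idea (assuming Python with NumPy; the polynomial degrees are illustrative):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)  # noisy data

# Degree 1: high bias (underfits). Degree 9: high variance (overfits).
# Degree 3: a reasonable middle ground.
for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    train_error = np.mean((y - y_hat) ** 2)
    print(degree, round(train_error, 4))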
Before we move on...

At this point, we have covered the basics of Machine Learning in
a rather intuitive, informal fashion.

The next thing we are going to cover is linear models, which will
involve a good deal of mathematical notation.

So before we continue, we will now talk about derivatives, the
main intuitions behind extrema and other topics.
Derivatives

Training a Machine Learning model is all about maximizing or
minimizing an objective function.

This process involves calculating derivatives.

Let’s delve deeper into the intuitions behind differentiation and
its relevance to ML optimization in general.
Derivatives

In very simple terms, a derivative is the slope of the tangent line
evaluated at a point (x, y).

From this we can infer the fundamental equation for derivatives,
which is:

f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}
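A quick sketch of this definition in code (assuming Python; the function and step size are illustrative):

def numerical_derivative(f, x, h=1e-6):
    """Approximate f'(x) with the forward-difference quotient from the definition."""
    return (f(x + h) - f(x)) / h

# Example: f(x) = x**2, whose exact derivative at x = 3 is 6.
print(numerical_derivative(lambda x: x ** 2, 3.0))  # ~6.000001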
Question

When is this slope going to be zero?


Maxima and minima

Yes, you guessed it right! At the critical points of the function we
are evaluating. That means that in order to find the critical points
we need to find the first derivative and then set it equal to zero.

But let’s first look at what kinds of extrema we could have.
General differentiation problems

All differentiation rules can be derived from the general equation we
mentioned before. Here is a table of some of the most common
derivatives:
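For reference, a few standard entries of such a table (written in LaTeX; the selection is illustrative):

\begin{align*}
\frac{d}{dx}\, c &= 0 \\
\frac{d}{dx}\, x^n &= n x^{n-1} \\
\frac{d}{dx}\, e^x &= e^x \\
\frac{d}{dx}\, \ln x &= \frac{1}{x} \\
\frac{d}{dx}\, \sin x &= \cos x \\
\frac{d}{dx}\, \cos x &= -\sin x
\end{align*}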
Example
Find the maxima and minima of the following function:

y = x^3 - 6x^2 + 9x + 15

The first derivative of this function is:

y' = 3x^2 - 12x + 9 = 3(x - 1)(x - 3)

And from here, setting y' = 0, we get x = 1 and x = 3.

If we replace these values in y, we get the points (1, 19) and (3, 15).
Since y = 19 is the larger value, (1, 19) is a maximum and (3, 15) is a
minimum.
The second derivative criterion
Instead of comparing the function values directly, we can use the
second derivative criterion to confirm whether a critical point is a
maximum or a minimum: if y'' < 0 at the point it is a maximum, and if
y'' > 0 it is a minimum.

Given this, the second derivative of the previous function is:

y'' = 6x - 12

Since y''(1) = -6 < 0 and y''(3) = 6 > 0, we can assert that x = 1 is a
maximum and x = 3 is a minimum.
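A quick way to double-check these computations (a sketch assuming Python with SymPy):

import sympy as sp

x = sp.symbols("x")
y = x**3 - 6*x**2 + 9*x + 15

dy = sp.diff(y, x)                 # first derivative: 3x^2 - 12x + 9
critical_points = sp.solve(dy, x)  # [1, 3]

d2y = sp.diff(y, x, 2)             # second derivative: 6x - 12
for p in critical_points:
    # Negative second derivative -> maximum, positive -> minimum.
    print(p, y.subs(x, p), d2y.subs(x, p))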
The chain rule
The chain rule of derivatives is essential, and we will see it in many
proofs and derivations. For a composite function y = f(g(x)), it states
that:

\frac{dy}{dx} = f'(g(x)) \, g'(x)

Example:
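A representative case (this particular function is chosen only as an illustration):

y = (x^2 + 1)^3

\frac{dy}{dx} = 3(x^2 + 1)^2 \cdot \frac{d}{dx}(x^2 + 1) = 6x \, (x^2 + 1)^2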
Exercise
Find the derivatives of

Find the maxima and minima of


Gradients
Thus far we have talked about univariate derivatives. In a
multivariate scenario, we will have gradients (Gradient Descent,
anyone?) whose intuitions we are now going to discuss.

The gradient of a function f(x, y) can be denoted as:

\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)

On the next slide, we will look into a graphical definition of gradients.
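A short sketch of computing a gradient symbolically (assuming Python with SymPy; the function f(x, y) = x^2 + 3xy is only an example):

import sympy as sp

x, y = sp.symbols("x y")
f = x**2 + 3*x*y   # illustrative function

# The gradient collects the partial derivatives along each axis.
gradient = [sp.diff(f, x), sp.diff(f, y)]
print(gradient)  # [2*x + 3*y, 3*x]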
Gradients

The gradient of a function is a vector whose components indicate the
rate of change along each axis; it points in the direction of steepest
ascent. Let's look at a graph for further clarity on this definition:
Example

Find the gradient of the following function:


For more info on how to create these nice plots, go to the Wolfram
Alpha website
