
Unit 1

Fundamentals of Machine
Learning

Part 1

Oscar Contreras Carrasco


UNIVALLE
2021
Introduction

Machine Learning is a discipline whose aim is
to make machines learn

For Machine Learning to be effective, it is
paramount to have data at our disposal.

In this unit, we are going to discuss the basics
of Machine Learning.

Primarily, our focus will be on parametric
methods as well as their main traits.

Let’s begin
Machine Learning
[Diagram: nested circles showing Deep Learning inside Machine Learning inside Artificial Intelligence]

Machine Learning

[Diagram: the same nested circles, now shown alongside Big Data]
Machine Learning

The past ten years have seen an increasing number of applications
of Artificial Intelligence in general

These developments can be ascribed to the rise of Deep Learning as
well as the emergence of other technologies that support it.

Cloud Computing and Big Data have also played an important role.

Machine Learning in general will continue to see further
developments in the coming years.
Machine Learning

Are you familiar with any of these technologies?

Can you name them?
Machine Learning

All of the applications we
mentioned have something in
common.

They all require data to be
successful at performing some
operation.

This is the very principle of Machine
Learning. Data is fed into our
model, and it is expected to
produce a prediction in return.
Datasets, features, and predictions
FEATURES                                                 RESPONSE

Salary (x1)   Marital status (x2)   Total debt (x3)      Has credit (y)

Dataset (feature matrix with known responses):
3000          Single                0                    Yes
3500          Married               2000                 No
2500          Divorced              3500                 No

Predictions (responses produced by the model):
8000          Single                120                  Yes
7500          Married               300                  Yes
Can you imagine what it would be like to have to use a complex
program to make the predictions?

Can we create our own features and entries? YES!

The number of features defines the dimensionality of our dataset!
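As a minimal sketch of how such a dataset might be represented (assuming Python with pandas; the column names are illustrative, matching the table above):

import pandas as pd

# Feature matrix: each row is an entry, each column a feature (x1, x2, x3).
data = pd.DataFrame({
    "salary":         [3000, 3500, 2500],
    "marital_status": ["Single", "Married", "Divorced"],
    "total_debt":     [0, 2000, 3500],
})

# Response vector y: the known outcome for each entry.
y = pd.Series(["Yes", "No", "No"], name="has_credit")

# The dimensionality of the dataset is the number of features.
print(data.shape[1])  # 3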
Supervised and unsupervised learning
From this standpoint, we can immediately determine that Machine
Learning in general can be divided into two major areas: supervised
and unsupervised learning.

SUPERVISED (responses y are known)

Salary (x1)   Marital status (x2)   Total debt (x3)      Has credit (y)
3000          Single                0                    Yes
3500          Married               2000                 No
2500          Divorced              3500                 No
8000          Single                120                  Yes
7500          Married               300                  Yes

UNSUPERVISED (responses are not available)

Salary (x1)   Marital status (x2)   Total debt (x3)      Has credit? (z)
3000          Single                0                    ?
3500          Married               2000                 ?
2500          Divorced              3500                 ?
8000          Single                120                  ?
7500          Married               300                  ?
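A rough sketch of the distinction (assuming Python with scikit-learn; the estimators and toy data below are illustrative):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Toy numeric feature matrix: [salary, total_debt] (marital status omitted
# for simplicity; it would need to be encoded numerically first).
X = np.array([[3000, 0], [3500, 2000], [2500, 3500], [8000, 120], [7500, 300]])

# Supervised learning: the responses y are known and used during training.
y = np.array([1, 0, 0, 1, 1])  # 1 = "Yes", 0 = "No"
clf = LogisticRegression().fit(X, y)
print(clf.predict([[4000, 500]]))

# Unsupervised learning: no responses; the algorithm looks for structure.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(clusters)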


Further detail
[Diagram: supervised learning vs. unsupervised learning; under unsupervised learning: clustering and dimensionality reduction]
Training, validation, and testing
When we are working on predictive analysis of data, it is a good
practice to divide our dataset into three parts.

Training set: used to train the model. In strict terms, we use the
training set to adjust the parameters of the model.

Validation set: used to measure the predictive performance of the
model. We use the validation set to adjust the hyperparameters of
the model.

Testing set: used to report the final performance of the model. The
information elicited here will be used to report performance in a
paper, or as a benchmark in a Kaggle competition.
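A minimal sketch of such a split (assuming Python with scikit-learn; the 60/20/20 proportions are just an example):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)   # toy feature matrix
y = np.arange(50)                   # toy responses

# First split off the test set (20%), then carve the validation set
# (25% of the remainder = 20% of the total) out of what is left.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 30 10 10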
Bias and variance tradeoff
Model predictive performance is measured using the validation set.
This process is sometimes called "cross-validation", because it relies
on special techniques that ensure the stability of the performance
measures.

We say a model has high bias (nothing to do with the bias term of
linear models) when it does not fit the training data well.

We say a model has high variance when it overfits the training data.

Let’s illustrate this concept graphically for better understanding.
Bias and variance tradeoff

[Figure: example fits illustrating high bias (underfitting) and high variance (overfitting)]
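A small numerical sketch of the same idea (assuming Python with NumPy; the polynomial degrees are illustrative):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)  # noisy data

# Degree 1: high bias (underfits). Degree 9: high variance (overfits).
# Degree 3: a reasonable middle ground.
for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    train_error = np.mean((y - y_hat) ** 2)
    print(degree, round(train_error, 4))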
Before we move on...

At this point, we have covered the basics of Machine Learning in
a rather intuitive, informal fashion.

The next thing we are going to cover is linear models, which will
involve a good deal of mathematical notation.

So before we continue, we will now talk about derivatives, the
main intuitions behind extrema and other topics.
Derivatives

Training a Machine Learning model is all about maximizing or
minimizing an objective function.

This process involves calculating derivatives.

Let’s delve deeper into the intuitions behind differentiation and
its relevance to ML optimization in general.
Derivatives

In very simple terms, a derivative is the slope of the tangent line
evaluated at a point (x, y).

From this we can infer the fundamental equation for derivatives,
which is:

f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}
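A quick sketch of this definition in code (assuming Python; the function and step size are illustrative):

def numerical_derivative(f, x, h=1e-6):
    """Approximate f'(x) with the forward-difference quotient from the definition."""
    return (f(x + h) - f(x)) / h

# Example: f(x) = x**2, whose exact derivative at x = 3 is 6.
print(numerical_derivative(lambda x: x ** 2, 3.0))  # ~6.000001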
Question

When is this slope going to be zero?


Maxima and minima

Yes, you guessed it right! At the critical points of the function we
are evaluating. That means that in order to find the critical points
we need to find the first derivative and then set it equal to zero.

But let’s first look at what kinds of extrema we could have.
General differentiation problems

All differentiation rules can be derived from the general equation we
mentioned before. Here is a table of some of the most common
derivatives:
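For reference, a few standard entries of such a table (written in LaTeX; the selection is illustrative):

\begin{align*}
\frac{d}{dx}\, c &= 0 \\
\frac{d}{dx}\, x^n &= n x^{n-1} \\
\frac{d}{dx}\, e^x &= e^x \\
\frac{d}{dx}\, \ln x &= \frac{1}{x} \\
\frac{d}{dx}\, \sin x &= \cos x \\
\frac{d}{dx}\, \cos x &= -\sin x
\end{align*}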
Example
Find the maxima and minima of the following function:

y = x^3 - 6x^2 + 9x + 15

The first derivative of this function is:

y' = 3x^2 - 12x + 9 = 3(x - 1)(x - 3)

And from here, setting y' = 0, we get x = 1 and x = 3.

If we replace these values in y, we get the points (1, 19) and (3, 15).
Since y = 19 is the larger value, (1, 19) is a maximum and (3, 15) is a
minimum.
The second derivative criterion
Instead of comparing the function values directly, we can use the
second derivative criterion to confirm whether a critical point is a
maximum or a minimum: if y'' < 0 at the point it is a maximum, and if
y'' > 0 it is a minimum.

Given this, the second derivative of the previous function is:

y'' = 6x - 12

Since y''(1) = -6 < 0 and y''(3) = 6 > 0, we can assert that x = 1 is a
maximum and x = 3 is a minimum.
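A quick way to double-check these computations (a sketch assuming Python with SymPy):

import sympy as sp

x = sp.symbols("x")
y = x**3 - 6*x**2 + 9*x + 15

dy = sp.diff(y, x)                 # first derivative: 3x^2 - 12x + 9
critical_points = sp.solve(dy, x)  # [1, 3]

d2y = sp.diff(y, x, 2)             # second derivative: 6x - 12
for p in critical_points:
    # Negative second derivative -> maximum, positive -> minimum.
    print(p, y.subs(x, p), d2y.subs(x, p))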
The chain rule
The chain rule of derivatives is essential, and we will see it in many
proofs and derivations. For a composite function y = f(g(x)), it states
that:

\frac{dy}{dx} = f'(g(x)) \, g'(x)

Example:
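A representative case (this particular function is chosen only as an illustration):

y = (x^2 + 1)^3

\frac{dy}{dx} = 3(x^2 + 1)^2 \cdot \frac{d}{dx}(x^2 + 1) = 6x \, (x^2 + 1)^2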
Exercise
Find the derivatives of

Find the maxima and minima of


Gradients
Thus far we have talked about univariate derivatives. In a
multivariate scenario, we will have gradients (Gradient Descent,
anyone?) whose intuitions we are now going to discuss.

The gradient of a function f(x, y) can be denoted as:

\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)

On the next slide, we will look into a graphical definition of gradients.
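A short sketch of computing a gradient symbolically (assuming Python with SymPy; the function f(x, y) = x^2 + 3xy is only an example):

import sympy as sp

x, y = sp.symbols("x y")
f = x**2 + 3*x*y   # illustrative function

# The gradient collects the partial derivatives along each axis.
gradient = [sp.diff(f, x), sp.diff(f, y)]
print(gradient)  # [2*x + 3*y, 3*x]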
Gradients

The gradient of a function is a vector whose components indicate the
rate of change along each axis; it points in the direction of steepest
ascent. Let's look at a graph for further clarity on this definition:
Example

Find the gradient of the following function:


For more info on how to create these nice plots, go to the Wolfram
Alpha website
