0:00

Hello and welcome back. In this lecture we will be talking about the k-nearest neighbor (KNN) algorithm, so without any further delay, let's start.
0:08
Let's assume a scenario: our data set contains data points for two categories, which are indicated by different colors in the scatter plot. The points are colored according to their categories: one is orange, which is on the right side, and the second category is blue, which is on the left side of the plot.
0:28
The only other thing to mention here is the variables x and y, which represent two columns of our data set. For instance, they may be age and salary, as in the example that we considered in the data preprocessing section of the course.
0:53
However, to keep things extremely simple, we are not going to put any scale on these variables. Now let's assume that we encounter a new data point, and we are asked to assign it a category based on the available information.
1:12
The key issue is whether it should fall in the blue category or in the orange category. We have category 1 and category 2; in other words, how do we decide how to classify this data point? This is the problem that the k-nearest neighbor algorithm is going to resolve for us. After applying the algorithm, we will be able to determine the category of the new data point; in this case it turns out to be category 2, which is shown by the blue color.
1:45
So let's see how the KNN algorithm is going to do that for us. In order to understand the KNN algorithm, we are going to introduce a four-step procedure.
2:03
You will notice that it is a very simple algorithm. The very first step is to choose the number of neighbors, k, that you are going to have in your algorithm. This means that you need to decide whether k is equal to 1, 2, 3, or some other number. One of the most commonly used values for k is 5.
2:29
Step two is to compute the k nearest neighbors of the new data point according to some distance measure, such as the Euclidean distance. You don't have to use Euclidean distance all the time; you can use other distance measures, such as Manhattan (city block) or Hamming distance. Since Euclidean distance is used in most cases, in this example we will just stick to that.
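To make the three distance measures named above concrete, here is a small sketch in plain Python. The lecture itself works in MATLAB, so this is only an illustration, not the course's code; the function names and the sample inputs are made up for the example.

```python
import math

def euclidean(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def hamming(p, q):
    """Hamming distance: number of positions where two equal-length sequences differ."""
    return sum(a != b for a, b in zip(p, q))

# Hypothetical inputs, chosen only so the arithmetic is easy to follow.
p1, p2 = (1, 2), (4, 6)
print(euclidean(p1, p2))              # 5.0
print(manhattan(p1, p2))              # 7
print(hamming("karolin", "kathrin"))  # 3
```

Euclidean and Manhattan distance suit numeric columns such as age and salary, while Hamming distance is usually applied to categorical or binary features.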
3:04
Once you have computed the k nearest neighbors, the next step is to count the number of data points from each category among the neighbors computed in the second step. So how many of the neighboring points happen to be in category 1, and how many of them happen to be in category 2? We need to determine that in this step, which is step number three.
3:33
If you have more than two categories in the data set, then you simply need to count how many of the neighboring data points happen to be in each of the categories. Finally, in step number four, we assign the new data point to the category with the most neighbors. It is as simple as that. After these four steps you are done, and your model is ready to predict any new data point.
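The four steps can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the course's MATLAB code: the function name `knn_predict` and the toy points are invented for the example, and Euclidean distance is used as in the lecture.

```python
import math
from collections import Counter

def knn_predict(points, labels, new_point, k=5):
    """Classify new_point by majority vote among its k nearest neighbors."""
    # Step 1: k is chosen by the caller (5 is the common value from the lecture).
    # Step 2: sort the known points by Euclidean distance to new_point, keep k.
    by_distance = sorted(
        zip(points, labels),
        key=lambda pair: math.dist(pair[0], new_point),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # Step 3: count how many of the k neighbors fall in each category.
    votes = Counter(nearest_labels)
    # Step 4: assign the category with the most neighbors.
    return votes.most_common(1)[0][0]

# Made-up toy data: category 2 (blue) on the left, category 1 (orange) on the right.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (6, 5), (6, 6), (7, 5), (7, 6)]
labels = [2, 2, 2, 2, 1, 1, 1, 1]

print(knn_predict(points, labels, (3, 3)))  # 2
print(knn_predict(points, labels, (6, 4)))  # 1
```

Note that there is no separate training phase in this sketch: the "model" is simply the stored points and labels, and all the work happens at prediction time.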
4:06
So let's do a manual exercise to solidify our knowledge and see the KNN algorithm in action. Remember the issue: we need to classify the new data point based on the available data points in the two categories. So let's start the four-step process of the algorithm. Step one is to choose the number of neighbors, so we keep it at five. The next step, which is step number two, is to determine the five nearest neighbors of this new data point according to some distance measure. As pointed out earlier, we use Euclidean distance, and we also talked about the Euclidean distance in the data preprocessing section.
5:02
Euclidean distance is a basic measure which we study in geometry. Basically, if we have two points such as P1 and P2, the Euclidean distance between those two points is measured according to the formula, which means that we need to determine the difference of the x-coordinate values of the two points and the difference of the y-coordinate values, square each difference, take the sum, and finally take the square root.
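Written out, the distance just described looks as follows. The symbol names are an assumption, since the transcript does not reproduce the slide: P1 is taken to have coordinates (x_1, y_1) and P2 coordinates (x_2, y_2). The second line plugs in two made-up points, (1, 2) and (4, 6), purely as a worked example.

```latex
d(P_1, P_2) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}

d\big((1, 2), (4, 6)\big) = \sqrt{(4 - 1)^2 + (6 - 2)^2} = \sqrt{9 + 16} = 5
```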
5:39
Coming back to our example, we were in step two of the algorithm, in which we need to determine the neighbors based on the Euclidean distance. Basically, we just look at the points and judge the distances by eye; here we can see that these are the closest ones. If we were given the actual values of these points, then we could easily verify that they are the five nearest neighbors. Step three is to count the number of data points in each category. In this case we see that, of the neighbors, which are the data points inside the circle, two belong to category 1 and three belong to category 2. Finally, step four is to assign the data point to the category with the most neighbors, which in this case happens to be category 2. It was as simple as that. We have now classified our data point, and we already have a model to classify any further data points.
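Steps three and four of this manual exercise amount to a tally and a majority pick. A short Python sketch, using only the counts stated in the lecture (two neighbors in category 1, three in category 2); the variable names are illustrative:

```python
from collections import Counter

# Categories of the five nearest neighbors in the lecture's example:
# two neighbors in category 1 and three in category 2.
neighbor_categories = [1, 1, 2, 2, 2]

votes = Counter(neighbor_categories)    # step 3: count neighbors per category
predicted = votes.most_common(1)[0][0]  # step 4: pick the category with most votes

print(votes[1], votes[2])  # 2 3
print(predicted)           # 2
```

With only two categories, an odd k such as 5 can never produce a tied vote, which is one practical reason to prefer odd values.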
6:50
In conclusion, it is one of the oldest algorithms in machine learning, and one of the simplest ones too. I believe that is enough to get you started with the KNN algorithm. Now we will apply this algorithm in MATLAB.
7:13
We are about to build our first machine learning model, k-nearest neighbors, and I can't wait to show you the first result: how KNN can take data from some categories and predict the category of an unseen data point.
7:26
So let's start making the model right now. The first thing we need to do is load the data set, and the data that we will be using is related to a social network. As you can see on the screen, the data set contains information about the users of a social network, and the information includes the user ID, gender, age, and estimated salary. The social network has several business clients, which can put their ads on the social network, and one client is a car company that has launched a brand-new SUV. We are trying to see which of the users of the social network are going to buy this brand-new SUV. The last column tells us whether a certain user of the social network has bought the SUV or has not bought the SUV. So the model we are building is going to predict whether a user is going to buy the SUV or not, based on the variables given in the table.
8:34
There are 400 instances in this particular data set. Let's load the data set into MATLAB.
8:44
Okay, we will need the preprocessing template that we built in the last section of the course. We will just copy the template and paste it over here in order to load the data set.
9:00
So we will ignore this, and we will get rid of this variable. Then we will build up our model, and you will see that this is very easy in MATLAB; in fact, we will do it without writing many extra statements of code.
9:20
So the question is: do we need to apply any preprocessing technique that we learned in the earlier preprocessing section? The answer is yes: we are going to apply the standardization technique to preprocess our data.
9:45
As you know, we will be using Euclidean distance, and that is why we need standardization: without it, the variable with the larger range of values would dominate the distance.
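A quick way to see why Euclidean distance calls for standardization is to compare a distance before and after rescaling. The sketch below is plain Python, not the MATLAB code from the course; `standardize` is a hypothetical helper name, and the ages and salaries are invented rather than taken from the course data set.

```python
import math
import statistics

def standardize(values):
    """Rescale values to zero mean and unit standard deviation (z-scores)."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation
    return [(v - mean) / std for v in values]

# Made-up ages (years) and salaries; not rows from the course data set.
ages = [20, 30, 40, 50]
salaries = [20000, 50000, 60000, 90000]

# Raw distance between the first two users is almost entirely the salary gap.
raw = math.dist((ages[0], salaries[0]), (ages[1], salaries[1]))
print(raw)

# After standardization both columns contribute on a comparable scale.
z_ages, z_salaries = standardize(ages), standardize(salaries)
scaled = math.dist((z_ages[0], z_salaries[0]), (z_ages[1], z_salaries[1]))
print(scaled)
```

In the raw case the ten-year age gap is invisible next to the 30000 salary gap, so the nearest neighbors would effectively be chosen on salary alone.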
9:58
We will need a few functions in MATLAB, so let's see how we can do this. The function we need to use in order to build the classification model is fitcknn. First we need to provide the variable which contains the data. As a second input, the function expects the name of the variable which we will use as the response variable, in other words, the variable for which we will make predictions.
10:28
In this case that is the Purchased variable in the table, as pointed out in the last lecture. We will be building our model based on two variables, age and estimated salary, and I told you that this is going to be very simple in MATLAB. In order to specify that, we need to write the names of the variables that we use for building our model, and we also need to insert a plus sign between the variable names. And our model is ready to use. We need to store this model in some variable, so we use the variable name classification_model. This means that classification_model contains our k-nearest neighbor classification model. Now we are all done: we have built the classification model, and that model is stored in the variable.
11:34
Please note that if you don't mention anything here, that is, if we delete the variable names age and estimated salary, then by default MATLAB is going to build the model based on all the variables within our data.
11:55
Another very important point: the KNN model we built, which is stored in the variable classification_model, is built with some default options, and we can check some of its properties by typing in the command window, for instance.
12:50
Please note that the sum of all four values happens to be the total number of predictions. The classifier did respond reasonably well in this case. So we did successfully implement our first machine learning model and tested its performance on the testing data, and surprisingly we did this by using only five lines of code. That indicates the strength of MATLAB: the analysis is done with very few lines of code. So enjoy your first machine learning algorithm.
13:35
Now let's have a look at the results again. We will be looking at the evaluation metrics later; that will be a later part of the course. For now, just note that the two diagonal values correspond to the correct predictions and the two off-diagonal values correspond to the incorrect predictions.
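The table of four values discussed here is a two-by-two confusion matrix. The following plain Python sketch shows how such a table can be tallied and why its entries sum to the number of predictions; the helper name `confusion_matrix` and the label lists are made up for illustration and are not the lecture's actual results.

```python
def confusion_matrix(actual, predicted):
    """Return a 2x2 table: rows are actual classes 0/1, columns are predicted 0/1."""
    table = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        table[a][p] += 1
    return table

# Made-up labels (1 = bought the SUV, 0 = did not); not the lecture's results.
actual    = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]
predicted = [0, 0, 1, 1, 1, 0, 0, 1, 0, 1]

cm = confusion_matrix(actual, predicted)
correct = cm[0][0] + cm[1][1]      # diagonal entries: correct predictions
incorrect = cm[0][1] + cm[1][0]    # off-diagonal entries: incorrect predictions
total = sum(sum(row) for row in cm)

print(cm)                          # [[4, 1], [1, 4]]
print(correct, incorrect, total)   # 8 2 10
```

Dividing the diagonal sum by the total gives the accuracy, which is 0.8 for these invented labels.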
