
So with this idea in mind, we will
start by looking at the first algorithm
for predicting recommendations.
And this is called K nearest neighbors.
Let's write it down--
K nearest neighbors.
So the number K here means how big
your advisory pool should be -- how many neighbors
you want to look at.
And this can be one of the hyperparameters
of the algorithm.
Let's say here we have some user who didn't watch--
let's start with our first user--
the first movie.
But maybe they watched some other movies.
So here this user gave 5, here they gave 4,
and here they gave 1.
We may have some other user who did watch this movie,
gave it also a 1, and gave the others 4, 4, and 2.
We also have some other users who, let's say,
liked everything--
there are some people who love everything.
So you can look at this particular user:
maybe here they also liked some other movies.
Let's make them a very happy user.
So you can clearly look at these two representations
of the users -- here we are representing the users
by their choices -- and say that actually my user is
closer to this specific user.
And maybe because this user gave a 1,
I can predict the same score, or at least
some low score, for my first user on the first movie.
Now, in reality, you may not want
to use just a single user -- maybe there are many other users who
are similar to me.
So I am going to aggregate them all,
look at how they scored the movie that I'm interested in,
and take their average.
This is our first idea.
So what we will do, we will say, OK,
I will use the following notation --
Y_ai, and Y with a hat means predicted Y, the predicted value.
I am going to go now through all the users b
which are the K nearest neighbors of me --
of user a -- who did watch movie i.
So I'm going to select all the nearest neighbors who
did watch this movie, take their score for that movie,
because we know that they watched it, and then divide
by K.
So what we pretty much wrote here informally
is exactly what I said KNN was: we
will identify users similar to me, take their values,
and take the average.
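Written down, a minimal sketch of this rule (the shorthand here is mine: Y_bi is user b's score for movie i, and KNN(a, i) is the set of K nearest neighbors of user a who did rate movie i) would be

\hat{Y}_{ai} = \frac{1}{K} \sum_{b \in \mathrm{KNN}(a, i)} Y_{bi}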
Now, the question you may be asking
is, how do I define this similarity?
And we already discussed, there are many ways
to compute the similarity.
You can use cosine similarity, or you can use Euclidean distance.
Because, at the end, every user is just a vector,
and you can use any measures that you've seen in this class
to compute the similarity between the user vectors.
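For example, treating each user a as a vector x_a of their ratings (and, as a simplifying assumption, comparing only over the movies both users have rated), two standard choices are

\mathrm{sim}(a, b) = \frac{x_a \cdot x_b}{\|x_a\|\,\|x_b\|} \qquad \text{or} \qquad \mathrm{dist}(a, b) = \|x_a - x_b\|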
Now, some of you may be saying, OK,
let's say I want K to be large,
and some users are really close to me,
and some users are farther away from me.
You can get even more sophisticated, and you can say,
OK, I'm not just going to take the score of that user,
but I want to weight it by how similar this user is to me.
We can easily do it.
We can just say, OK, in this case,
we assume that we have a similarity score.
And we will look at the K nearest neighbors.
But now, when we take their score,
we will actually use the similarity between a and b --
between me and this user --
as a weighting factor.
And then, we will divide by the sum
of all the similarities.
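As a sketch, with sim(a, b) standing for whichever similarity measure you chose, this weighted version would look something like

\hat{Y}_{ai} = \frac{\sum_{b \in \mathrm{KNN}(a, i)} \mathrm{sim}(a, b)\, Y_{bi}}{\sum_{b \in \mathrm{KNN}(a, i)} |\mathrm{sim}(a, b)|}

so that closer neighbors pull the prediction more strongly toward their own score.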

So in this case, you can see that there are many ways to go.
And people, at the time when these algorithms were popular,
designed a variety of metrics.
Like, for instance, we can say that --
and actually, people found that this is true in practice --
our median, kind of our baseline score,
differs from user to user, meaning that maybe I
think that a movie that is OK should just get a 3,
and I would give a 5 only to some super, super exceptional movie.
The majority of movies that I like, I give them a 3.
Well, somebody else may give the vast majority of movies a 5,
like this happy user.
And only if they don't like something
would they give it a 4.
So what people have done about that is
they actually compared not the scores themselves,
but how you deviate from your average score.
So instead of using the original matrix,
you would take your average score and record your deviation,
either positive or negative, from that average,
and then you'd compare.
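To make this concrete, here is a minimal Python sketch -- my own illustration, not code from the lecture -- of mean-centered, similarity-weighted K-nearest-neighbor prediction. It assumes a small dense ratings matrix where 0 marks "not rated"; a real system would use a sparse representation, but the idea is the same.

import numpy as np

def predict(Y, a, i, K=2):
    """Predict user a's rating of movie i from their K most similar users."""
    rated_a = Y[a] > 0
    # Each user's mean rating over the movies they actually rated.
    means = np.array([Y[u][Y[u] > 0].mean() for u in range(Y.shape[0])])

    # Cosine similarity between user a and every other user who rated movie i,
    # computed only over the movies both users rated.
    sims = {}
    for b in range(Y.shape[0]):
        if b == a or Y[b, i] == 0:
            continue
        both = rated_a & (Y[b] > 0)
        if not both.any():
            continue
        xa, xb = Y[a, both], Y[b, both]
        sims[b] = xa @ xb / (np.linalg.norm(xa) * np.linalg.norm(xb))

    # Keep the K most similar users and take the similarity-weighted
    # average of their deviations from their own mean rating.
    neighbors = sorted(sims, key=sims.get, reverse=True)[:K]
    num = sum(sims[b] * (Y[b, i] - means[b]) for b in neighbors)
    den = sum(abs(sims[b]) for b in neighbors)
    return means[a] + (num / den if den > 0 else 0.0)

# Toy matrix in the spirit of the board example (0 = not rated).
Y = np.array([[0, 5, 4, 1],    # our user: has not rated the first movie
              [1, 4, 4, 2],    # a similar user who gave that movie a 1
              [5, 5, 5, 5]])   # the very happy user who loves everything
print(predict(Y, a=0, i=0))

Note how the mean-centering handles the very happy user here: since their rating of the movie equals their own average, their deviation is zero and they contribute nothing to push the prediction up.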
So there are lots of different heuristics
for how people have tried to approach this problem.
And they produce some OK results, but all these techniques
are very far from the state-of-the-art methods.
And the question is, why?
So clearly, the goodness of this method
would depend on how you compute the similarity
between me and somebody else.
And it can be cosine similarities.
It can be any other similarity.
But the problem is that if you look at people--
like look at me, I may buy some books in machine learning,
because this is an area that I work in,
but I can also buy books on plants,
because I really love plants.
So how many people do you have who
would have the same tastes as me, both in plants
and in technical books?
Maybe not that many.
But there clearly will be people who
would buy the same machine learning textbooks and also
people who are interested in gardening.
So the idea is that this method doesn't
enable you to detect the hidden structure that is there
in the data, which is that I may be similar to some pool of users
in one dimension, but similar to some other set of users
in a different dimension.
So moving on from K nearest neighbors,
we are going to go to this new approach, which
goes by the name collaborative filtering or matrix
factorization, where the algorithm would actually
be able to detect these hidden groupings,
both in terms of products and in terms of users.
So then you don't have to engineer a very sophisticated
similarity measure; your algorithm
would be able to pick up these very complex dependencies.
And for us, as humans, it would definitely not be tractable
to come up with such measures.
So now, we are ready to start talking
about the next approach, which is called matrix factorization.
