
Recommenders can take many different forms and serve many different purposes.

For example, I recently bought a gas-powered generator, and Amazon now automatically recommends
things like starter fluid and motor oil to me. It is amazing how effective these recommendations can
be. Billions of dollars have resulted from them, and they have helped people find the things that they
need.

The beauty of it is that it is all data-driven. Recommender systems find relationships between users and
items based purely on actions. Usually there is no human curation involved at all. The system knows
statistically that people who buy generators also buy starter fluid, and it can use those historical
patterns to show people things they want before they even know they want them.
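As a rough illustration of "people who buy X also buy Y", here is a minimal sketch that counts how often pairs of items show up in the same purchase history. The function names and the sample histories are made up for illustration; a real system would work on far more data and would normalise the counts.

```python
from collections import Counter
from itertools import combinations

def co_purchase_counts(histories):
    """Count how many purchase histories contain each unordered pair of items."""
    counts = Counter()
    for history in histories:
        # sorted(set(...)) gives each pair a single canonical order
        for a, b in combinations(sorted(set(history)), 2):
            counts[(a, b)] += 1
    return counts

def bought_with(item, counts):
    """Items most often bought alongside `item`, most frequent first."""
    related = Counter()
    for (a, b), n in counts.items():
        if a == item:
            related[b] += n
        elif b == item:
            related[a] += n
    return related.most_common()

# Hypothetical purchase histories, one list per user
histories = [
    ["generator", "starter fluid", "motor oil"],
    ["generator", "starter fluid"],
    ["tent", "motor oil"],
]
counts = co_purchase_counts(histories)
```

Calling bought_with("generator", counts) lists starter fluid first, because it appears with the generator in two of the three histories.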

But we don’t have to limit ourselves to recommending things; we can also recommend content, music,
movies, etc.

It is the same idea, using patterns in the articles or books people read instead of the stuff people buy.

Music recommenders work in the same manner, but a music company like Pandora takes it a step further:
it analyses the tempo, musical style and structure of songs and the similarities between them. This is
an example of content-based recommendation.

Content-based recommendation relies not just on the behaviour of users but on the properties of the
things we are recommending themselves. In this case, those are the musical properties of the songs we like.
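To make the content-based idea concrete, here is a minimal sketch that compares songs by cosine similarity between feature vectors. The three features and their values are invented for illustration and are not how Pandora actually describes songs.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

# Hypothetical features per song: [tempo scaled to 0-1, acoustic, vocal-heavy]
songs = {
    "song_a": [0.9, 0.1, 0.8],
    "song_b": [0.8, 0.2, 0.7],
    "song_c": [0.1, 0.9, 0.2],
}

def most_similar(liked, catalog):
    """Other songs in the catalog, sorted by similarity to the liked song."""
    target = catalog[liked]
    scored = [(name, cosine_similarity(target, features))
              for name, features in catalog.items() if name != liked]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Here most_similar("song_a", songs) ranks song_b ahead of song_c, using only the properties of the songs and no user behaviour at all.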

Look up Pandora’s Music Genome Project for more depth.

Why stop at products and music? You can recommend people as well; that is basically what online dating
services do.

How do recommender systems work?

Where does the data come from? From users.

Explicit feedback: rating content or a course on a 1-5 scale, or with a thumbs up or down

Problem: extra work for users

Data tends to be sparse

Everyone has a different standard (one person's 4 may be another person's 3)

Implicit behaviour:

Clicking a link on a webpage, clicking on ads

Clicks are not reliable; people often click by accident, so click data alone does not work well

Things you buy are a much better signal

Amazon has more than enough purchase data

YouTube uses consumption data: minutes watched

Top-N recommender architecture: the job of a top-N recommender is to produce a finite list of the best
things to present to a given person

Say I search for books and am given 100 books spread across 20 pages; here N is 100

Ultimately our goal is to put the best content we can find in front of users in the form of a top-N list.

Our success depends on producing the best top-N recommendations for people. So what matters is finding
things people will love, not our ability to predict ratings.

The database may be NoSQL, e.g. Cassandra or MongoDB

Here is one way a top-N recommender might work; there are many ways to do it.

But generally we start with some data store representing the individual interests of each user, for
example their ratings of movies they have seen, or implicit ratings such as things they have bought in
the past. In practice this is usually a big distributed NoSQL datastore like Cassandra, MongoDB or
Memcached, because it has to vend lots of data across lots of queries. Ideally, this interest data is
normalized using techniques such as mean centering or z-scores to ensure that the data is comparable
between users.
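As a minimal sketch of the two normalisation techniques just mentioned, assume each user's ratings arrive as a plain dict of item to rating (the function names and that layout are illustrative choices, not something the notes specify):

```python
from statistics import mean, pstdev

def mean_center(ratings):
    """Subtract the user's own average so 0 means 'typical for this user'."""
    values = list(ratings.values())
    avg = mean(values)
    return {item: r - avg for item, r in ratings.items()}

def z_scores(ratings):
    """Mean-center, then divide by the user's standard deviation."""
    values = list(ratings.values())
    avg = mean(values)
    sd = pstdev(values)  # population standard deviation
    if sd == 0:
        # The user gave every item the same rating, so there is no spread
        return {item: 0.0 for item in ratings}
    return {item: (r - avg) / sd for item, r in ratings.items()}

# A harsh rater and a generous rater with the same relative preferences
harsh = {"a": 1, "b": 2, "c": 3}
generous = {"a": 3, "b": 4, "c": 5}
```

After mean centering, the harsh and the generous rater end up with identical values, which is exactly the "comparable between users" property. With only a handful of ratings per user, the average and standard deviation are themselves unreliable, so the normalised values are too.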

But in the real world, data is often too sparse to normalise effectively.


The first step is to generate recommendation candidates: items we think might be interesting to the user
based on their past behaviour. So the candidate generation phase might take all the items a user
indicated interest in before and consult another datastore of items similar to those items, based on
aggregate behaviour.

Example: based on aggregate behaviour I know that people who like Star Trek also tend to like Star Wars,
so based on my interest in Star Trek I might get some recommendation candidates that include Star Wars items.

In the process of building up those recommendations, I might assign a score to each candidate based
on how I rated the items it came from and how strong the similarity is between those items and the
candidate. I might even filter out candidates at this stage if the score isn’t
high enough.
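The candidate generation step described above could be sketched roughly as follows. The similarity table, the rating values and the rule "score = my rating times the similarity" are all assumptions made for illustration; the notes do not fix a particular scoring formula.

```python
# Hypothetical item-to-item similarity store: item -> [(similar item, similarity)]
SIMILAR_ITEMS = {
    "Star Trek": [("Star Wars", 0.9), ("Galaxy Quest", 0.6)],
    "Alien": [("Star Wars", 0.5), ("Blade Runner", 0.8)],
}

def generate_candidates(user_ratings, similar_items, min_score=0.0):
    """Return (candidate, score) pairs for items similar to what the user rated.

    The same candidate may appear several times, once per source item;
    combining those duplicates is left to the ranking stage.
    """
    candidates = []
    for item, rating in user_ratings.items():
        for neighbor, similarity in similar_items.get(item, []):
            score = rating * similarity  # assumed scoring rule
            if score >= min_score:
                candidates.append((neighbor, score))
    return candidates

# Hypothetical normalised ratings for one user
my_ratings = {"Star Trek": 1.0, "Alien": 0.5}
candidates = generate_candidates(my_ratings, SIMILAR_ITEMS)
```

With these numbers, Star Wars shows up twice (once via Star Trek, once via Alien), and passing min_score=0.3 drops the weaker of the two.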

Next we move to candidate ranking. Many candidates will appear more than once and need to be
combined in some way, maybe boosting their score in the process since they keep coming
up repeatedly. After that it can just be a matter of sorting the resulting candidates by
score to get our first cut at the top-N list of recommendations. Much more complicated
approaches exist, such as learning to rank, where machine learning is used to find the optimal ranking
of candidates at this stage. This ranking stage might also have access to more information about the
recommendation candidates, such as average review scores, that it can use to
boost results for highly rated or popular items, for example.
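A minimal sketch of this simple ranking step, assuming duplicates are combined by summing their scores and that any popularity boost is a per-item multiplier (both are illustrative choices, not the only option):

```python
from collections import defaultdict

def rank_candidates(candidates, boosts=None):
    """Combine duplicate candidates and sort by combined score, best first.

    candidates: list of (item, score) pairs, possibly with repeated items.
    boosts: optional dict of item -> multiplier, e.g. from average review scores.
    """
    combined = defaultdict(float)
    for item, score in candidates:
        combined[item] += score  # repeated candidates accumulate score
    if boosts:
        for item in combined:
            combined[item] *= boosts.get(item, 1.0)
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical output of a candidate generation step
candidates = [
    ("Star Wars", 0.9),
    ("Galaxy Quest", 0.6),
    ("Star Wars", 0.25),
    ("Blade Runner", 0.4),
]
ranked = rank_candidates(candidates)
```

Star Wars comes out on top because its two appearances are added together. Passing boosts={"Blade Runner": 2.0} would lift Blade Runner above Galaxy Quest.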

Some filtering will be required before presenting the final sorted list of recommendation candidates
to the user. This filtering stage is where we might eliminate recommendations for items the user
has already rated, since we don’t want to recommend things the user has already seen. We might also
apply a stop-list here to remove items that are potentially offensive to users, or remove items that
fall below some minimum quality score or minimum rating threshold.

It is also where we apply the N in top-N recommendations and cut things off if we have more results
than we need.
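The filtering stage could be sketched like this, assuming the ranked list is a list of (item, score) pairs sorted best first. The parameter names and the sample data are hypothetical.

```python
def filter_top_n(ranked, already_rated, stop_list, n, min_score=0.0):
    """Drop unwanted items from a ranked list, then keep at most n of the rest."""
    kept = [
        item
        for item, score in ranked
        if item not in already_rated   # user has already seen it
        and item not in stop_list      # potentially offensive or blocked
        and score >= min_score         # below the quality threshold
    ]
    return kept[:n]  # apply the N in top-N

# Hypothetical ranked candidates, best first
ranked = [
    ("Star Wars", 1.15),
    ("Star Trek", 1.0),
    ("Galaxy Quest", 0.6),
    ("Blade Runner", 0.4),
    ("Bad Movie", 0.05),
]
top = filter_top_n(
    ranked,
    already_rated={"Star Trek"},
    stop_list={"Blade Runner"},
    n=2,
    min_score=0.1,
)
```

Filtering before truncating matters: cutting to N first and then removing already-rated items could leave the user with fewer than N recommendations even when more good candidates exist.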

The output of the filtering stage is then handed off to your display layer, where a pretty widget of
product recommendations is presented to the user.

Generally speaking, candidate generation, ranking and filtering will live inside some distributed
recommendation web service that your web front end talks to in the process of rendering a page for
a specific user.

The above diagram is a simplified version of what we call item-based collaborative filtering, and it is
the same algorithm Amazon published in 2003.

Another way to do it is to build up a database ahead of time of predicted ratings of every item by every
user. The candidate generation phase is then just retrieving all of the rating predictions for a given
user, and ranking is just a matter of sorting them. However, this requires us to look at every
single item in our catalog for every single user, which isn’t very efficient at run time.
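A minimal sketch of that precomputed-predictions approach, assuming the predictions have already been produced by some rating-prediction model and stored as a nested dict (the data and names are hypothetical):

```python
# Hypothetical precomputed table: user -> {item: predicted rating}
PREDICTED_RATINGS = {
    "user_1": {
        "Star Wars": 4.6,
        "Galaxy Quest": 3.9,
        "Blade Runner": 4.2,
        "Alien": 2.5,
    },
}

def top_n_from_predictions(user, predictions, n):
    """Fetch every predicted rating for the user and return the n highest items."""
    user_predictions = predictions.get(user, {})
    ranked = sorted(user_predictions.items(), key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ranked[:n]]
```

The lookup and sort are trivial, but the table needs one entry per user per catalog item, which is where the cost of this design comes from.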

In the previous case, we started only with items the user actually liked and worked from there, instead
of looking at every item that exists.

The reason we see this kind of architecture is that people like to measure themselves on how
accurately they can predict ratings, good or bad.

But as we will see, that is not the right thing to focus on in the real world.

If you have a small list of items to recommend, however, this approach is not entirely unreasonable.
