Supervisor
Assist. Prof. Dr. Nejra Beganović
SARAJEVO
January, 2018
Contents
Abstract
Introduction
Literature Review
Methodology
  Data set information
  Building the Model
Results
  Predicting Ratings
  Recommendations from Catalog
Web Service
References
Abstract
This paper shows the use of the Matchbox recommender modules to train a movie recommender
system on the Azure Machine Learning platform. A pure collaborative filtering approach is used
for training the model. The model learns from a collection of users who have given ratings to
some of the movies in the dataset. Matrix factorization is used to infer latent user
preferences and movie traits from these ratings. These preferences and traits can later be used
to predict what rating a specific user will give to unseen movies, so that the movies the user is
most likely to enjoy can be recommended. After training the model, a web service is deployed to
make the model easier to use.
Introduction
Nowadays, almost everyone has had an online experience in which a website made personalized
recommendations in the hope of sustaining traffic and future sales. Amazon gives suggestions
such as “Customers Who Bought This Item Also Bought”, and Udemy gives a similar suggestion,
“Students Who Viewed This Course Also Viewed”. Netflix gave an award of $1 million to a
developer crew in 2009, for making an algorithm which increased the accuracy of the
In the last few years recommender systems have become increasingly popular and are used in
a variety of areas like: movies, music, news, books, research articles, search queries, social tags,
and products in general. There are also recommender systems for experts, collaborators, jokes,
restaurants, garments, financial services, life insurance, romantic partners (online dating), and
Twitter pages.
Recommender systems are a useful alternative to search algorithms since they help users
learning and software engineering, and acquiring new skills and tools is problematic and time-
consuming.
There are three different groups of recommendation systems. Those are the following:
based on user-based input. They recommend items based on user behavior, and
• Content-based filtering systems – Content-based systems generate suggestions based on
items and similarities between them. Pandora, a popular music streaming service, uses
are derived from thin datasets. Netflix is one of the examples of a hybrid recommender
Collaborative systems often use the nearest-neighbor technique. The end objective of collaborative
and favorite items, along with item characteristics, price ranges, and product categories. This
Literature Review
algorithms, were among the earliest algorithms developed for collaborative filtering. These
algorithms are based on the fact that similar users display similar patterns of rating behavior
In the last few years recommender systems have become increasingly popular and were used in
a variety of areas like: movies, music, news, books, research articles, search queries, social tags,
One of the best-known examples in this field is the work of Ruslan Salakhutdinov and Andriy
Mnih, whose recommender system was evaluated on the data from Netflix’s competition for
improving its old recommendation system. They used the Probabilistic Matrix
Factorization (PMF) model, which scales linearly with the number of observations and, more
importantly, performs well on the large, sparse, and very imbalanced Netflix dataset. They
achieved an error rate of 0.8861, which is nearly 7% better than the score of Netflix’s old system
Michael J. Pazzani and Daniel Billsus in their paper (Pazzani & Billsus, n.d.) discuss content-
based recommendation systems, i.e., systems that recommend an item to a user based upon a
description of the item and a profile of the user’s interests. Content-based recommendation
systems may be used in a variety of domains, such as recommending web pages, news
articles, restaurants, television programs, and items for sale. Although the details of various
Ivens Portugal, Paulo Alencar and Donald Cowan in their paper (Portugal, Alencar, & Cowan,
2018) discuss why choosing a suitable machine learning algorithm for a recommender system
is difficult because of the number of algorithms available today. They analyse different learning
algorithms in recommender systems and give useful tips on which ones to use with their
In paper (Pazzani & Billsus, n.d.) Shreya Agrawal and Pooja Jain analyze how to improve the
filtering and collaborative filtering, using Support Vector Machine as a classifier and a genetic
Methodology
This paper shows the use of the Matchbox recommender modules to train a movie recommender
system on the Azure Machine Learning platform. To get started, a free trial ($200) account on the
Azure Portal needs to be created. Once the account is created, implementation of the recommendation
The training data consists of approximately 225,000 ratings for 15,742 movies, given
by 26,770 users. It was gathered from Twitter using techniques described in the original paper
by Dooms, De Pessemier and Martens (Dooms & Martens, 2014). The data can be found on
Each instance of data consists of a user identifier, a movie identifier, and the rating. The dataset
also contains a time-stamp, but it was not used in this analysis. A brief overview of the dataset
is shown in Figure 1. To this data, a file containing movie names extracted from IMDb was
added. Because a movie ID alone does not indicate which movie it refers to, the two datasets
were joined on the movie identifier from the ratings data.
Figure 1. Statistical overview of the dataset
First, because a Train Recommender module will be used, the data needs to be prepared for that
The ratings and movie datasets have already been uploaded and are available in Azure ML
1. The rating field looks like an integer, but is actually of numeric type. Since the trainer
2. The Train Recommender module is more tolerant with respect to the user and item
identifiers. To make the results easier to work with, the data needs to be merged
so that it includes both the ratings and the movie title datasets, using the Join module. A specific
key column that is common to both the left and right datasets needs to be chosen. In this
3. The Train Recommender module requires that the input contain three fields used for
training, so the Project Columns module is used to select only the user ID, movie name, and
rating fields.
4. This dataset contains a few inconsistent ratings for the same user-movie pairs. This
introduces noise in the training and evaluation, so the duplicates need to be removed,
arbitrarily retaining only the first occurrence of each user-movie pair that is encountered.
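The join, column selection, and duplicate removal described in steps 2 to 4 can be sketched outside Azure ML Studio in plain Python. This is only an illustration under stated assumptions: the function name `prepare_ratings`, the sample identifiers, and the choice to drop ratings that have no matching title (an inner join) are hypothetical and are not taken from the experiment itself. The type handling from step 1 is left out.

```python
from typing import Dict, List, Tuple


def prepare_ratings(
    ratings: List[Tuple[str, str, int]],
    titles: Dict[str, str],
) -> List[Tuple[str, str, int]]:
    """Join ratings with titles, keep three fields, drop duplicate pairs."""
    seen = set()
    prepared = []
    for user_id, movie_id, rating in ratings:
        title = titles.get(movie_id)
        if title is None:
            # Assumption: behave like an inner join and skip unmatched movies.
            continue
        key = (user_id, title)
        if key in seen:
            # Keep only the first occurrence of each user-movie pair.
            continue
        seen.add(key)
        # Only the user ID, movie name, and rating are kept.
        prepared.append((user_id, title, rating))
    return prepared


# Hypothetical sample data: user u1 rated movie m1 twice, inconsistently.
sample_ratings = [("u1", "m1", 8), ("u1", "m1", 6), ("u2", "m1", 9), ("u2", "m9", 5)]
sample_titles = {"m1": "The Shawshank Redemption"}
prepared = prepare_ratings(sample_ratings, sample_titles)
```

In this sample, the second rating by `u1` is discarded as a duplicate and the rating for `m9` is discarded because no title is available for it.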
As with any statistical model, the parameters need to be fitted on one set of data and the accuracy
tested on a hold-out set. In a collaborative filtering approach, something needs to be known about
each user and each item, so simply taking a random sample of all the observations will not
work. Fortunately, Azure ML Studio provides a special Recommender split option in the Split
module that gives the user control over how the train and test samples are selected.
• Fraction of training-only users: 0.75. This means that 75% of the users are used only for
training.
• Fraction of test-user ratings for testing: 0.25. For each user in the testing group, 25% of
• Fraction of cold users: 0. Cold users are users for whom no prior training data is known.
Usually, the Matchbox algorithm can use optional user metadata to make
recommendations for users even before a single rating has been seen. However, for this
experiment the user metadata is not given, so the fraction of cold users is set to 0.
• Fraction of cold items: 0. Cold items are treated the same as cold users, and are evaluated
• Fraction of ignored users: 0. In some cases the user might want to test an algorithm or
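The idea behind the split settings above can be sketched in plain Python. This is a simplified illustration, not the implementation of the Split module: the function name `recommender_split` and the sample data are hypothetical, cold and ignored users and items are not modelled, and it is assumed that the ratings of a test user that are not selected for testing are added to the training set.

```python
import random
from collections import defaultdict


def recommender_split(ratings, training_only_users=0.75, test_ratings=0.25, seed=0):
    """Split (user, item, rating) rows so that every test user also has training rows."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)
    users = sorted(by_user)
    rng.shuffle(users)
    n_train_only = round(len(users) * training_only_users)
    train, test = [], []
    for index, user in enumerate(users):
        rows = by_user[user][:]
        if index < n_train_only:
            # Training-only user: all ratings go to the training set.
            train.extend(rows)
            continue
        rng.shuffle(rows)
        # Test user: hold out a fraction of the ratings, but always leave at
        # least one rating for training so the user is never cold.
        n_test = min(len(rows) - 1, max(1, round(len(rows) * test_ratings)))
        test.extend(rows[:n_test])
        train.extend(rows[n_test:])
    return train, test


# Hypothetical data: 8 users who each rated the same 4 movies.
demo = [(f"u{u}", f"m{m}", 5 + (u + m) % 5) for u in range(8) for m in range(4)]
train_rows, test_rows = recommender_split(demo)
```

With 8 users and the fractions 0.75 and 0.25, 6 users are training-only and each of the 2 test users contributes one of their 4 ratings to the test set.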
Now, everything is ready to train the model. The Train Recommender module requires two
parameters:
• Number of features: This determines the number of hidden parameters that will be
extracted for each user and each item. More features make more powerful models, but
risk overfitting the training data. The parameter is typically determined
through experimentation, with the goal of finding the smallest number that achieves
acceptable performance. For this experiment, the default value of 20 features is used.
• Number of iterations: Model parameters are found by arbitrary initialization, followed
by minimizing a residual error, the difference between the true and predicted ratings for
each user-movie pair, using an iterative gradient descent technique. The error typically
decreases exponentially, meaning that most of the benefit occurs in the initial iterations.
Therefore, it is common practice not to run the optimization all the way to convergence,
but instead to limit the iterations to a reasonable number to limit training time. For this
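The roles of the two parameters can be illustrated with a generic matrix factorization trained by stochastic gradient descent. This is a minimal sketch and not the internals of the Train Recommender module: the function names, the learning rate, the regularization term, and the toy ratings are all assumptions made for illustration, and the demo uses small parameter values rather than the ones from the experiment.

```python
import random


def train_factorization(ratings, n_features, n_iterations,
                        learning_rate=0.02, regularization=0.05, seed=0):
    """Fit hidden user and item features to (user, item, rating) triples."""
    rng = random.Random(seed)
    mean = sum(r for _, _, r in ratings) / len(ratings)
    # Arbitrary (small random) initialization of the hidden features.
    user_vecs = {u: [rng.gauss(0, 0.1) for _ in range(n_features)]
                 for u in sorted({u for u, _, _ in ratings})}
    item_vecs = {i: [rng.gauss(0, 0.1) for _ in range(n_features)]
                 for i in sorted({i for _, i, _ in ratings})}
    rmse_history = []
    for _ in range(n_iterations):
        squared_error = 0.0
        for user, item, rating in ratings:
            p, q = user_vecs[user], item_vecs[item]
            # Residual: true rating minus the current prediction.
            residual = rating - (mean + sum(a * b for a, b in zip(p, q)))
            squared_error += residual ** 2
            for k in range(n_features):
                pk, qk = p[k], q[k]
                p[k] += learning_rate * (residual * qk - regularization * pk)
                q[k] += learning_rate * (residual * pk - regularization * qk)
        # Training RMSE measured during this pass over the data.
        rmse_history.append((squared_error / len(ratings)) ** 0.5)
    return mean, user_vecs, item_vecs, rmse_history


def predict(model, user, item):
    """Predict a rating as the global mean plus the feature dot product."""
    mean, user_vecs, item_vecs, _ = model
    return mean + sum(a * b for a, b in zip(user_vecs[user], item_vecs[item]))


# Hypothetical toy data: two groups of users with opposite tastes.
demo = [
    ("u1", "m1", 9), ("u1", "m2", 8), ("u1", "m3", 2), ("u1", "m4", 3),
    ("u2", "m1", 8), ("u2", "m2", 9), ("u2", "m3", 3), ("u2", "m4", 2),
    ("u3", "m1", 2), ("u3", "m2", 3), ("u3", "m3", 9), ("u3", "m4", 8),
    ("u4", "m1", 3), ("u4", "m2", 2), ("u4", "m3", 8),  # (u4, m4) is held out
]
model = train_factorization(demo, n_features=4, n_iterations=200)
unseen_prediction = predict(model, "u4", "m4")
```

Inspecting `model[3]`, the per-iteration training error, is one way to see how much of the improvement happens in the early iterations for a given dataset.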
Results
In this experiment, two different ways of using the trained recommender model are
shown: predicting the ratings and making n recommendations from the full catalog for each
user. The first method is used for simply evaluating the performance of the learned model, while
To perform different types of predictions, the Score Recommender module is used. The module
• The first required input is a trained model. In this case the output of the trainer has been
directly connected, but for production use the trained model would be saved and then
• The second input is a dataset to be scored. The format of this dataset will depend on the
• The two optional ports are for user and item metadata, similar to the optional inputs
when training. Here no metadata was given, so these fields were left blank.
Predicting Ratings
Prediction is a straightforward task. An input dataset for which the scores are needed is
provided, using the three-item tuple format used for training. The Score Recommender module
will use the trained model to predict a rating for each user-movie pair, and will output a tuple
For evaluating the accuracy of predictions, the Evaluate Recommender module is used. The
first input is the testing dataset, containing tuples (user-movie-rating) similar to those provided
for training. Typically, this data is obtained from the test output port
of the Split module that was used while setting up the experiment. The Evaluate
By using these parameters, the user can limit the evaluation to users who have rated at least n
items; and items that have been rated by at least m users, respectively.
In this experiment, the second input contains the same set of tuples that were used earlier for
training the model; thus, evaluation will compare the predicted ratings with the actual ratings,
• Mean Absolute Error (MAE): MAE measures the average magnitude of the errors in a
set of predictions, without considering their direction. It is the average over the test
sample of the absolute differences between prediction and actual observation where all
• Root mean squared error (RMSE): RMSE is a quadratic scoring rule that also measures
the average magnitude of the error. It is the square root of the average of squared
differences between prediction and actual observation. This measures how well the
model approximates the true expected value of the ratings and penalizes large errors
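Both metrics can be computed directly from paired lists of actual and predicted ratings. The following is a minimal sketch of the standard definitions; the function names and the four sample ratings are hypothetical and are not results from this experiment.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between prediction and observation."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical ratings: the errors are 1, 0, 4, and 0.
actual = [8, 6, 9, 4]
predicted = [7, 6, 5, 4]
mae = mean_absolute_error(actual, predicted)       # (1 + 0 + 4 + 0) / 4 = 1.25
rmse = root_mean_squared_error(actual, predicted)  # sqrt((1 + 0 + 16 + 0) / 4)
```

In this sample the single error of 4 raises RMSE (about 2.06) well above MAE (1.25), which shows how squaring gives large errors more weight.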
The real value of these metrics is for comparing different parameter settings for the trainer.
For this run values of MAE=1.77 and RMSE=2.46 were obtained. These are reasonable,
Recommendations from Catalog
A typical use of a recommendation system is to request the top n items most likely of
interest to a user from the catalog of all items. For this mode the input to the scorer should
contain only one column, containing the user IDs for which to generate recommendations.
To demonstrate this approach, a list of 100 user IDs was generated by taking the test data and
extracting a list of unique user IDs, and then using the Head option in the Partition and Sample
The output, which is shown in Figure 5, contains the three recommendations for each of the
100 user IDs provided. The Shawshank Redemption and The Dark Knight seem to be popular
choices, which is not surprising because they have some of the highest scores on IMDb.
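Selecting the top n items for a user amounts to scoring every catalog item and keeping the highest-scoring ones. The sketch below illustrates that idea only: the function `recommend_top_n`, the `predict` callback, the optional exclusion of already rated items, and the scores are hypothetical and do not describe how the Score Recommender module works internally.

```python
def recommend_top_n(user_id, catalog, predict, already_rated=frozenset(), n=3):
    """Return the n catalog items with the highest predicted rating for a user."""
    # Assumption: items the user has already rated can optionally be excluded.
    candidates = [item for item in catalog if item not in already_rated]
    # Sort by descending predicted rating; ties are broken by item name.
    ranked = sorted(candidates, key=lambda item: (-predict(user_id, item), item))
    return ranked[:n]


# Hypothetical predicted ratings for a single user.
scores = {
    ("u1", "Movie A"): 9.1,
    ("u1", "Movie B"): 7.4,
    ("u1", "Movie C"): 8.8,
    ("u1", "Movie D"): 6.0,
}
catalog = ["Movie A", "Movie B", "Movie C", "Movie D"]


def lookup(user, item):
    return scores[(user, item)]


top_three = recommend_top_n("u1", catalog, lookup)
top_unseen = recommend_top_n("u1", catalog, lookup, already_rated={"Movie A"})
```

Here `top_three` is Movie A, Movie C, Movie B, and excluding Movie A yields Movie C, Movie B, Movie D.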
Web Service
A key feature of Azure Machine Learning is the ability to straightforwardly publish models as
web services on the Windows Azure platform. In order to publish the movie recommender, the first
step is to save the trained model. This can be done by clicking the output port of Train
A new experiment is then created which only has the scoring module, and then the trained model
is added. Sample input data needs to be provided, so in this case the data pipeline that was built
for sampling 100 user IDs was used. To specify the Web service entry and exit points, the
special Web Service modules were used. The Web service input module is attached to the node
where input data would enter the experiment and the output module is attached to the output of
After successfully running the experiment, the experiment can be published by clicking Publish
Web Service at the bottom of the experiment canvas. An overview of the experiment is
shown in Figure 6.
References
Building a recommendation system in Python - as easy as 1-2-3! (2017, May 2). Retrieved
Pazzani, M. J., & Billsus, D. (n.d.). Content-Based Recommendation Systems. In Lecture Notes
Portugal, I., Alencar, P., & Cowan, D. (2018). The use of machine learning algorithms in
recommender systems: A systematic review. Expert Systems with Applications, 97, 205–227.
Salakhutdinov, R., & Mnih, A. (2008). Bayesian probabilistic matrix factorization using
Markov chain Monte Carlo. In Proceedings of the 25th international conference on Machine
Dooms, S., & Martens, L. (2014). “Harvesting movie ratings from structured data in social
media” by Simon Dooms and Luc Martens with Ching-man Au Yeung as coordinator. ACM
JJ. (2016, March 23). MAE and RMSE — Which Metric is Better? – Human in a Machine
world/mae-and-rmse-which-metric-is-better-e60ac3bde13d