
INTERNATIONAL BURCH UNIVERSITY

FACULTY OF ENGINEERING AND NATURAL SCIENCES


DEPARTMENT OF INFORMATION TECHNOLOGIES

MOVIE RECOMMENDATION SYSTEM USING MICROSOFT AZURE


MACHINE LEARNING STUDIO

CLOUD COMPUTING PROJECT REPORT


TARIK SULIĆ

Supervisor
Assist. Prof. Dr. Nejra Beganović

SARAJEVO
January, 2018
Contents

Abstract
Introduction
Literature Review
Methodology
    Data set information
    Building the Model
Results
    Predicting Ratings
    Recommendations from Catalog
Web Service
References

Abstract

This paper shows the use of the Matchbox recommender modules to train a movie recommender system on the Azure Machine Learning platform. A pure collaborative filtering approach is used for training the model. The model learns from a collection of users who have given ratings to some of the movies in the dataset. Matrix factorization is used to deduce latent user preferences and movie traits from these ratings. These preferences and traits can later be used to predict what rating a specific user will give to unseen movies, so that movies the user is most likely to enjoy can be recommended. After training the model, a web service is deployed to provide a simpler user interface.

Introduction

Nowadays, almost everyone has had an online experience where a website made custom-made recommendations in the hope of sustaining traffic and driving future sales. Amazon gives suggestions like “Customers Who Bought This Item Also Bought”, and Udemy gives the similar suggestion “Students Who Viewed This Course Also Viewed”. In 2009, Netflix awarded $1 million to a development team for an algorithm which increased the accuracy of the company’s recommendation system by 10 percent.

In the last few years recommender systems have become increasingly popular and are used in a variety of areas such as movies, music, news, books, research articles, search queries, social tags, and products in general. There are also recommender systems for experts, collaborators, jokes, restaurants, garments, financial services, life insurance, romantic partners (online dating), and Twitter pages.

Recommender systems are a useful alternative to search algorithms since they help users

discover items they might not have found otherwise.

Building recommender systems nowadays requires specialized knowledge in analytics, machine learning and software engineering, and acquiring these skills and tools is difficult and time-consuming.

There are three different groups of recommendation systems:

• Collaborative filtering systems – Collaborative systems generate recommendations based on input from users. They recommend items based on user behavior and similarities between users. An example of this is Google PageRank, which recommends similar web pages based on a web page’s backlinks.

• Content-based filtering systems – Content-based systems generate suggestions based on

items and similarities between them. Pandora, a popular music streaming service, uses

content-based filtering to make its music recommendations.

• Hybrid recommendation systems – Hybrid recommendation systems mix both

collaborative and content-based algorithms. They help improve recommendations that

are derived from sparse datasets. Netflix is one example of a hybrid recommender

(“Building a recommendation system in Python - as easy as 1-2-3!,” 2017).

Collaborative systems often use a nearest-neighbor technique. The end objective of collaborative

filtering systems is to make recommendations based on users’ behavior, purchasing patterns,

and favorite items, along with item characteristics, price ranges, and product categories. This

paper analyzes the implementation of a collaborative filtering recommendation system.

Literature Review

Neighborhood-based collaborative filtering algorithms, also referred to as memory-based

algorithms, were among the earliest algorithms developed for collaborative filtering. These

algorithms are based on the fact that similar users display similar patterns of rating behavior

and similar items receive similar ratings (Aggarwal, 2016).

In the last few years recommender systems have become increasingly popular and have been used in a variety of areas such as movies, music, news, books, research articles, search queries, social tags, and products in general. A great deal of research has been done on this topic.

One of the most popular examples in this field is the work of Ruslan Salakhutdinov and Andriy Mnih on a recommender system developed in the context of Netflix’s 2009 competition for improving its recommendation system. They used a Probabilistic Matrix Factorization (PMF) model which scales linearly with the number of observations and, more importantly, performs well on the large, sparse, and very imbalanced Netflix dataset. They achieved an error rate of 0.8861, which is nearly 7% better than the score of Netflix’s old system (Salakhutdinov & Mnih, 2008).

Michael J. Pazzani and Daniel Billsus in their paper (Pazzani & Billsus, n.d.) discuss content-

based recommendation systems, i.e., systems that recommend an item to a user based upon a

description of the item and a profile of the user’s interests. Content-based recommendation

systems may be used in a variety of domains, such as recommending web pages, news articles, restaurants, television programs, and items for sale. Although the details of various systems differ, content-based recommendation systems share in common a means for describing the items that may be recommended.

Ivens Portugal, Paulo Alencar and Donald Cowan in their paper (Portugal, Alencar, & Cowan, 2018) discuss why choosing a suitable machine learning algorithm for a recommender system is difficult given the number of algorithms available today. They analyse different learning algorithms used in recommender systems and give useful guidance on which ones to use, along with their advantages and disadvantages.

In another paper, Shreya Agrawal and Pooja Jain analyze how to improve the quality of a movie recommendation system using a hybrid approach that combines content-based filtering and collaborative filtering, with a Support Vector Machine as a classifier and a genetic algorithm, as described in their methodology.

Methodology

This paper shows the use of the Matchbox recommender modules to train a movie recommender system on the Azure Machine Learning platform. To get started, a free trial account (with $200 of credit) needs to be created on the Azure Portal. Once the account is created, implementation of the recommendation system can begin.

Data set information

The training data consists of approximately 225,000 ratings for 15,742 movies, given by 26,770 users. It was gathered from Twitter using techniques described in the original paper by Dooms, De Pessemier and Martens (Dooms & Martens, 2014). The data can be found on the following website: https://github.com/sidooms/MovieTweetings.

Each instance of data consists of a user identifier, a movie identifier, and the rating. The dataset also contains a timestamp, but it was not used in this analysis. A short overview of the dataset can be seen in Figure 1. To this data, a file containing movie names extracted from IMDb was added. Because a movie ID alone does not give any insight into which movie it refers to, the two files were joined on the movie identifier from the ratings data. (A pandas sketch of loading the two files is shown after Figure 1.)

Figure 1. Statistical overview of the dataset
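As an illustration, the following is a minimal sketch of how the two files could be loaded with pandas, assuming the “::”-delimited layout used in the MovieTweetings repository (user_id::movie_id::rating::timestamp for ratings and movie_id::title::genres for movies); the file names are placeholders.

```python
import pandas as pd

# Load the ratings and movie files (layout assumed from the repository).
ratings = pd.read_csv(
    "ratings.dat", sep="::", engine="python",
    names=["user_id", "movie_id", "rating", "timestamp"],
)
movies = pd.read_csv(
    "movies.dat", sep="::", engine="python",
    names=["movie_id", "title", "genres"],
)

# The timestamp is not used in this analysis.
ratings = ratings.drop(columns=["timestamp"])

# Quick statistical overview of the ratings, cf. Figure 1.
print(ratings.describe())
```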

Building the Model

First, because the Train Recommender module will be used, the data needs to be prepared for it. The module requires triplets in the format <user, item, rating>. The ratings and movie datasets have already been uploaded and are available in Azure ML Studio, so they only need to be connected to the experiment.

1. The rating field looks like an integer, but is actually of numeric type. Since the trainer requires an integer rating, the Metadata Editor module is used to convert it to an integer.

2. The Train Recommender module is more tolerant with respect to the user and item identifiers. To make the results easier to work with, the ratings and movie title datasets are merged using the Join module. A key column common to both the left and right datasets needs to be chosen; in this case it is the “Movie Id” column.

3. The Train Recommender module requires that the input contain only the three fields used for training, so the Project Columns module is used to select the user ID, movie name, and rating fields.

4. This dataset contains a few inconsistent ratings for the same user-movie pairs. These introduce noise into training and evaluation, so the duplicates are removed, keeping only the first occurrence of each user-movie pair that is encountered.

The overview of the data preprocessing can be seen in Figure 2; a pandas sketch of the same steps is given after the figure.

Figure 2. Overview of Data Preprocessing
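The same preprocessing steps could be sketched in pandas as follows (a rough equivalent only; in the experiment itself they are performed by the Metadata Editor, Join, Project Columns and duplicate-removal modules). The ratings and movies frames are assumed to come from the loading sketch above.

```python
# 1. The rating arrives as a numeric type; force it to an integer.
ratings["rating"] = ratings["rating"].astype(int)

# 2. Join the ratings with the movie titles on the movie identifier.
merged = ratings.merge(movies, on="movie_id", how="inner")

# 3. Keep only the three fields required by the trainer: <user, item, rating>.
triplets = merged[["user_id", "title", "rating"]]

# 4. Remove inconsistent duplicate ratings, keeping only the first
#    occurrence of each user-movie pair.
triplets = triplets.drop_duplicates(subset=["user_id", "title"], keep="first")
```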

As with any statistical model, the parameters need to be fit on one set of data and the accuracy tested on a hold-out set. In a collaborative filtering approach, the model needs to see some information about each user and each item, so simply taking a random sample of all the observations will not work. Fortunately, Azure ML Studio provides a special Recommender split option in the Split module that gives the user control over how the training and test samples are selected.

For this experiment, the following settings were used (a simplified sketch of this kind of per-user split is given after the list):

• Fraction of training-only users: 0.75. This means that 75% of the users will be used to

train.

• Fraction of test-user ratings for testing: 0.25. For each user in the testing group, 25% of

that user's ratings will be used for testing the model.

• Fraction of cold users: 0. Cold users are users for whom no prior training data is known. Usually, the Matchbox algorithm can use optional user metadata to make recommendations for users even before a single rating has been seen. However, for this experiment no user metadata is given, so the fraction of cold users is set to 0.

• Fraction of cold items: 0. Cold items are treated the same way as cold users, so the model is evaluated only on movies for which ratings are known.

• Fraction of ignored users: 0. In some cases the user might want to test an algorithm or

settings on a subset of the data. Here the full dataset is used.

• Fraction of ignored items: 0. Same as for users.
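The sketch below illustrates this kind of per-user split, assuming the triplets frame from the preprocessing sketch above; the Recommender split option in Azure ML Studio handles this logic (together with the cold-user, cold-item and ignore options) internally.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
users = triplets["user_id"].unique()
rng.shuffle(users)

# 75% of the users are used only for training.
n_train_only = int(0.75 * len(users))
train_only_users = set(users[:n_train_only])

train_parts, test_parts = [], []
for user_id, group in triplets.groupby("user_id"):
    if user_id in train_only_users:
        train_parts.append(group)                 # all ratings go to training
    else:
        group = group.sample(frac=1.0, random_state=42)
        n_test = max(1, int(0.25 * len(group)))   # 25% of this user's ratings held out
        test_parts.append(group.iloc[:n_test])
        train_parts.append(group.iloc[n_test:])

train_set = pd.concat(train_parts, ignore_index=True)
test_set = pd.concat(test_parts, ignore_index=True)
```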

Now, everything is ready to train the model. The Train Recommender module requires two

parameters:

• Number of features: This determines the number of hidden parameters that will be extracted for each user and each item. More features yield a more powerful model, but carry the risk of overfitting the training data. The parameter is typically determined through experimentation, with the goal of finding the smallest number that achieves acceptable performance. For this experiment, the default value of 20 features is used.

• Number of iterations: Model parameters are found by initializing them arbitrarily and then minimizing the residual error (the difference between the true and predicted ratings for each user-movie pair) using an iterative gradient descent technique. The error typically decreases exponentially, meaning that most of the benefit occurs in the initial iterations. Therefore, it is common practice not to run the optimization all the way to convergence, but instead to limit the number of iterations to a reasonable value in order to keep training time down. For this experiment the default value of 30 iterations is used. (A toy illustration of this kind of iterative factorization is given after this list.)
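The Matchbox algorithm itself is a Bayesian recommender and is not reproduced here; the sketch below only illustrates the generic idea described above (random initialization of latent features, then iteratively reducing the residual between true and predicted ratings by gradient descent). All names and parameter values are illustrative.

```python
import numpy as np

def train_mf(ratings, n_users, n_items, n_features=20, n_iters=30,
             lr=0.005, reg=0.02):
    """Toy matrix factorization: ratings is a list of (user_idx, item_idx, rating)."""
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, n_features))   # latent user features
    Q = rng.normal(scale=0.1, size=(n_items, n_features))   # latent item features
    for _ in range(n_iters):
        for u, i, r in ratings:
            pu, qi = P[u].copy(), Q[i].copy()
            err = r - pu @ qi                    # residual for this user-movie pair
            P[u] += lr * (err * qi - reg * pu)   # gradient step on the user features
            Q[i] += lr * (err * pu - reg * qi)   # gradient step on the item features
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating for user index u and item index i."""
    return float(P[u] @ Q[i])

# Usage (index maps from user IDs / movie titles to row indices are assumed
# to have been built beforehand):
# P, Q = train_mf(indexed_train_triplets, n_users, n_items)
```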

Results

In this experiment, two different ways of using the trained recommender model are shown: predicting ratings and making n recommendations from the full catalog for each user. The first method is used simply to evaluate the performance of the learned model, while the second method represents a typical production use case.

Figure 3. Overview of Training the Matchbox Recommender System

To perform different types of predictions, the Score Recommender module is used. The module

has two required and two optional inputs.

• The first required input is a trained model. In this case the output of the trainer has been connected directly, but in production the trained model would be saved and then connected to the scorer.

• The second input is a dataset to be scored. The format of this dataset will depend on the

task, which will be described below.

• The two optional ports are for user and item metadata, similar to the optional inputs

when training. Here no metadata was given, so these fields were left blank.

Predicting Ratings

Prediction is a straightforward task. An input dataset for which the scores are needed is

provided, using the three-item tuple format used for training. The Score Recommender module

will use the trained model to predict a rating for each user-movie pair, and will output a tuple

consisting of <user, item, predicted rating>.
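Continuing the toy factorization sketch from the training section, scoring could look like the following, where P and Q are the factors returned by train_mf and user_to_idx / movie_to_idx are assumed index maps from user IDs and movie titles to the row indices used during training.

```python
# Predict a rating for each user-movie pair in the held-out test set,
# producing <user, item, predicted rating> tuples like the Score
# Recommender output.
scored = test_set.copy()
scored["predicted_rating"] = [
    predict(P, Q, user_to_idx[u], movie_to_idx[m])
    for u, m in zip(scored["user_id"], scored["title"])
]
scored = scored[["user_id", "title", "predicted_rating"]]
```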

For evaluating the accuracy of predictions, the Evaluate Recommender module is used. The first input is the testing dataset, containing tuples (user-movie-rating) similar to those provided for training. Typically, this data is obtained from the test output port of the Split module that was used when setting up the experiment. The Evaluate

Recommender module requires two parameters:

• Minimum number of items

• Minimum number of users

By using these parameters, the user can limit the evaluation to users who have rated at least n items, and to items that have been rated by at least m users, respectively.

In this experiment, the second input contains the same set of tuples that were used earlier for training the model; thus, evaluation will compare the predicted ratings with the actual ratings, using the following two metrics (defined formally after the list):

• Mean Absolute Error (MAE): MAE measures the average magnitude of the errors in a

set of predictions, without considering their direction. It is the average over the test

sample of the absolute differences between prediction and actual observation where all

individual differences have equal weight.

• Root mean squared error (RMSE): RMSE is a quadratic scoring rule that also measures

the average magnitude of the error. It is the square root of the average of squared

differences between prediction and actual observation. This measures how well the

model approximates the true expected value of the ratings and penalizes large errors

more heavily (JJ, 2016).
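For reference, with r_j the actual rating and r̂_j the predicted rating of the j-th scored pair, the two metrics are defined as:

$$\mathrm{MAE} = \frac{1}{n}\sum_{j=1}^{n}\left|\hat{r}_j - r_j\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(\hat{r}_j - r_j\right)^2}$$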

Figure 4. MAE and RMSE Values

The real value of these metrics is for comparing different parameter settings for the trainer.

For this run, values of MAE = 1.77 and RMSE = 2.46 were obtained, which is reasonable considering the 1–10 rating scale.

Recommendations from Catalog

A typical usage of a recommendation system is to request the top n items most likely to be of interest to a user from the catalog of all items. For this mode, the input to the scorer should contain only one column, containing the user IDs for which to generate recommendations.

To demonstrate this approach, a list of 100 user IDs was generated by taking the test data, extracting the list of unique user IDs, and then using the Head option in the Partition and Sample module to select the first 100.
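In terms of the toy factor model sketched earlier, generating such recommendations amounts to scoring every movie in the catalog for a given user and keeping the top n. The snippet below assumes the P and Q factors, the user_to_idx map, and an idx_to_movie map from row indices back to movie titles.

```python
import numpy as np

def recommend(P, Q, user_idx, top_n=3):
    """Return the top_n movie titles with the highest predicted rating for one user."""
    scores = Q @ P[user_idx]                  # predicted rating for every movie
    best = np.argsort(scores)[::-1][:top_n]   # indices of the top_n scores
    return [idx_to_movie[i] for i in best]

# First 100 unique user IDs from the test data, as in the experiment.
sample_user_ids = test_set["user_id"].drop_duplicates().head(100)
recommendations = {u: recommend(P, Q, user_to_idx[u]) for u in sample_user_ids}
```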

The output, which can be seen in Figure 5, shows the three recommendations for each of the 100 user IDs provided. The Shawshank Redemption and The Dark Knight appear to be popular choices, which is not surprising because they are among the highest-rated movies on IMDb.

Figure 5. Movie Recommendations for 100 User IDs

Web Service

A key feature of Azure Machine Learning is the ability to publish models as web services on the Windows Azure platform in a straightforward way. In order to publish the movie recommender, the first step is to save the trained model. This can be done by clicking the output port of Train Recommender and selecting the option Save as Trained Model.

A new experiment is then created which contains only the scoring module, and the trained model is added to it. Sample input data needs to be provided, so in this case the data pipeline that was built for sampling 100 user IDs was used. To specify the web service entry and exit points, the special Web Service modules were used. The Web Service Input module is attached to the node where input data would enter the experiment, and the Web Service Output module is attached to the output of the Matchbox recommender.

After the experiment has been run successfully, it can be published by clicking Publish Web Service at the bottom of the experiment canvas. The overview of the experiment can be seen in Figure 6.

Figure 6. Web Service Experiment Overview
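Once published, the web service can be called from client code. The following is a sketch of such a call, assuming the request-response JSON format used by Azure ML Studio (classic) web services; the URL, API key, input port name and column name are placeholders that would in practice be taken from the service's API help page.

```python
import json
import urllib.request

API_URL = "https://ussouthcentral.services.azureml.net/workspaces/<workspace>/services/<service>/execute?api-version=2.0"
API_KEY = "<your api key>"

# One-column input containing the user ID to generate recommendations for.
body = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["User"],
            "Values": [["<user id>"]],
        }
    },
    "GlobalParameters": {},
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer " + API_KEY,
    },
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))
```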

References

Aggarwal, C. C. (2016). Recommender Systems. Cham: Springer International Publishing.

Building a recommendation system in Python - as easy as 1-2-3! (2017, May 2). Retrieved

January 22, 2018, from http://www.data-mania.com/blog/recommendation-system-python/

Pazzani, M. J., & Billsus, D. (n.d.). Content-Based Recommendation Systems. In Lecture Notes

in Computer Science (pp. 325–341).

Portugal, I., Alencar, P., & Cowan, D. (2018). The use of machine learning algorithms in

recommender systems: A systematic review. Expert Systems with Applications, 97, 205–227.

Salakhutdinov, R., & Mnih, A. (2008). Bayesian probabilistic matrix factorization using

Markov chain Monte Carlo. In Proceedings of the 25th international conference on Machine

learning - ICML ’08. https://doi.org/10.1145/1390156.1390267

Dooms, S., & Martens, L. (2014). Harvesting movie ratings from structured data in social media. ACM SIGWEB Newsletter, (Winter), 1–5.

JJ. (2016, March 23). MAE and RMSE — Which Metric is Better? Human in a Machine World, Medium. Retrieved January 31, 2018, from https://medium.com/human-in-a-machine-world/mae-and-rmse-which-metric-is-better-e60ac3bde13d

