http://www.caffeinatedrook.com/MovieRec/MovieRecServlet
Problem Statement
Data: User-Movie Ratings
Input: User number and Movie number
Output: Predicted Rating
Goal: Predict ratings with the smallest RMSD (root-mean-square deviation) possible.
(Make the customer happy.)
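As a point of reference for the goal above, RMSD can be computed as in the following minimal sketch. The function name `rmsd` and the sample ratings are illustrative assumptions, not taken from the slides.

```python
import math

def rmsd(predicted, actual):
    """Root-mean-square deviation between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical example: three predicted ratings against the true ratings.
example_rmsd = rmsd([4.0, 3.0, 5.0], [5.0, 3.0, 4.0])
print(f"RMSD: {example_rmsd:.4f}")
```

A lower value means the predictions sit closer to the real ratings; a perfect predictor scores 0.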
[Figure: columns of numeric feature values combined ("+", "=") to give a predicted rating of 4.312 / 5]
Motivation – Netflix
www.netflix.com
Impact to Field
Better Recommendations
= Happy Customers
Happy Customers
= More Money
= Larger Market Share
Motivation - Personal
One million of them
Feature Extraction
A problem with the problem statement
Tackling the Netflix Challenge requires many hundreds (thousands… more?) of hours of computation.
Ultimately, it will require solving many sub-problems:
Sparsity
Noise
Memory Requirements
Movie Similarity
User Similarity (more on these a little later)
The Problem Statement Redefined
Related Work
Netflix Prize forum http://www.netflixprize.com//community/
Lots of info on strategies people are trying.
www.netflix.com www.blockbuster.com
www.amazon.com www.spout.com
Domain Understanding
Data Selection
First, what the data does not contain.
Data Selection
Sample of the ratings data (movie 6):
6:
2031561,1,2004-07-26
1176140,1,2004-02-16
2336133,2,2004-09-05
1521836,1,2004-08-11
117277,3,2004-10-12
326587,3,2004-09-06
1961542,3,2004-04-20
1041552,3,2004-10-19
1678346,3,2005-04-11
643182,2,2004-07-18
2182301,5,2004-08-04
2502669,2,2004-02-10
2211030,4,2004-05-26
603277,3,2004-12-13
214166,2,2005-10-09
……..

• 17,770 Movies
• 480,189 Users
• 100,480,507 Ratings
• 17,770 * 480,189 = 8,532,958,530
• 100,480,507 / 8,532,958,530 = 0.01177
(98.8% sparse!)
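The sample rows and the sparsity arithmetic above can be reproduced with a short sketch. It assumes each file starts with a `movie_id:` header followed by `user_id,rating,date` rows; the function name `parse_ratings` is hypothetical.

```python
from datetime import date

def parse_ratings(text):
    """Parse 'movie_id:' headers followed by 'user_id,rating,YYYY-MM-DD' rows."""
    ratings = []
    movie_id = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            movie_id = int(line[:-1])  # a header line starts a new movie
            continue
        if movie_id is None:
            raise ValueError("rating row appeared before any movie header")
        user_id, rating, day = line.split(",")
        ratings.append((movie_id, int(user_id), int(rating), date.fromisoformat(day)))
    return ratings

# First three rows of the sample shown on the slide.
sample = """6:
2031561,1,2004-07-26
1176140,1,2004-02-16
2336133,2,2004-09-05
"""
rows = parse_ratings(sample)

# Sparsity of the full movie-by-user matrix, using the counts from the slide.
NUM_MOVIES, NUM_USERS, NUM_RATINGS = 17_770, 480_189, 100_480_507
possible_ratings = NUM_MOVIES * NUM_USERS
density = NUM_RATINGS / possible_ratings
sparsity = 1.0 - density
print(f"{possible_ratings:,} cells, {sparsity:.1%} empty")
```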
Cleaning and Preprocessing
Discovering Patterns
Which Software to use? SPSS, SAS, Weka?
8,532,958,530 ratings * 4 bytes / rating
= 34,131,834,120 bytes
= 33,331,869 kilobytes
= 32,550 megabytes
≈ 31.8 gigabytes
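The memory estimate for a dense movie-by-user matrix can be checked with a few lines. This is only a sketch of the arithmetic; the constant names are illustrative, and the 4 bytes per rating is the figure used on the slide.

```python
NUM_MOVIES = 17_770
NUM_USERS = 480_189
BYTES_PER_RATING = 4  # figure from the slide, e.g. one 32-bit value per cell

dense_bytes = NUM_MOVIES * NUM_USERS * BYTES_PER_RATING
dense_kib = dense_bytes / 1024
dense_mib = dense_kib / 1024
dense_gib = dense_mib / 1024

print(f"{dense_bytes:,} bytes")
print(f"{dense_gib:.1f} gigabytes (1024-based)")
```

At this size the full matrix would not fit in the memory of a typical desktop machine, which is what motivates the factored representation.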
Discovering Patterns
D = 17,770 * 480,189 = 8,532,958,530
M = 17,770 * 25 = 444,250
U = 25 * 480,189 = 12,004,725
M + U = 12,448,975
Movie: a
User: b
$v_{ab} = \sum_{i=1}^{25} M_{ai} \, U_{ib}$
Other values on the slide: 1000, .001, 1c/5h, 25c / ~5
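To make the factored form concrete, the sketch below predicts one rating as the dot product of a movie's and a user's 25 feature values, and compares the number of stored values against the dense matrix. The function names and the random feature values are illustrative assumptions.

```python
import random

NUM_FEATURES = 25

def predict(movie_features, user_features):
    """Predicted rating v_ab: dot product of movie a's and user b's features."""
    if len(movie_features) != len(user_features):
        raise ValueError("feature vectors must have the same length")
    return sum(m * u for m, u in zip(movie_features, user_features))

def parameter_count(num_movies, num_users, num_features=NUM_FEATURES):
    """Values stored by the two factor matrices M and U together."""
    return num_features * (num_movies + num_users)

# Hypothetical feature vectors for one movie and one user.
rng = random.Random(0)
movie_a = [rng.uniform(-0.1, 0.1) for _ in range(NUM_FEATURES)]
user_b = [rng.uniform(-0.1, 0.1) for _ in range(NUM_FEATURES)]
v_ab = predict(movie_a, user_b)

factored = parameter_count(17_770, 480_189)
dense = 17_770 * 480_189
print(f"factored: {factored:,} values, dense: {dense:,} values")
```

The two factor matrices hold several hundred times fewer values than the dense matrix they approximate.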
A little board work to explain the algorithm
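The slides do not record the board work, so the following is only one plausible reading: a minimal sketch of fitting the movie and user features by stochastic gradient descent on the prediction error. The toy ratings, learning rate, epoch count, and all function names are assumptions. It updates all features at once and omits regularization, which may differ from what was actually presented.

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def train(ratings, num_movies, num_users, num_features=2,
          learning_rate=0.01, epochs=500, seed=0):
    """Fit movie and user feature vectors by stochastic gradient descent."""
    rng = random.Random(seed)
    movie = [[rng.uniform(0.05, 0.15) for _ in range(num_features)]
             for _ in range(num_movies)]
    user = [[rng.uniform(0.05, 0.15) for _ in range(num_features)]
            for _ in range(num_users)]
    for _ in range(epochs):
        for a, b, rating in ratings:
            err = rating - dot(movie[a], user[b])
            for i in range(num_features):
                m, u = movie[a][i], user[b][i]  # old values, so both updates match
                movie[a][i] = m + learning_rate * err * u
                user[b][i] = u + learning_rate * err * m
    return movie, user

def training_rmsd(ratings, movie, user):
    total = sum((r - dot(movie[a], user[b])) ** 2 for a, b, r in ratings)
    return math.sqrt(total / len(ratings))

# Hypothetical toy data: (movie index, user index, rating).
toy = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 3), (2, 0, 2), (2, 2, 1)]
before = training_rmsd(toy, *train(toy, 3, 3, epochs=0))
after = training_rmsd(toy, *train(toy, 3, 3))
print(f"training RMSD before: {before:.3f}, after: {after:.3f}")
```

Only the known ratings are visited during training, so the cost scales with the number of ratings rather than with the size of the full matrix.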
Interpretation: Feature 1-movie view
First column:
Trailer Park Boys: Season 3
Trailer Park Boys: Season 4
The Lord of the Rings: The Fellowship of the Ring: Extended Edition
Lord of the Rings: The Return of the King: Extended Edition
Lord of the Rings: The Two Towers: Extended Edition
Lost: Season 1
Veronica Mars: Season 1
House
As Time Goes By: Series 9
Gilmore Girls: Season 4

Second column:
Sweet Potato Pie
Legion of the Dead
Dark Town
Comedy Only in Da Hood
Predator Island
Bad Bizness
Vampiyaz
My Big Phat Hip Hop Family
Jack O'Lantern
Desperate Souls

[Chart: feature value per movie; axis ticks shown from 4 down to -2]
Interpretation: Feature 2-movie view
First column:
Lost in Translation
National Lampoon's Mr. Wong
Without You I'm Nothing
Punch-Drunk Love
Dogville
The Royal Tenenbaums
Whiteboyz
Pornografia
Spooks & Creeps
Kaaterskill Falls

Second column:
Dragon Ball Z: World Tournament
Dragon Ball: Piccolo Jr. Saga: Part 2
Dragon Ball: Tien Shinhan Saga
Dragon Ball Z: Fusion
Dragon Ball: Red Ribbon Army Saga
Dragon Ball Z: Garlic Jr.
Dragon Ball: Piccolo Jr. Saga: Part 1
Armageddon
Dragon Ball: The Path to Power
Pearl Harbor

[Chart: feature value per movie; axis ticks shown from 1 down to -1]
Interpretation: Feature 3-movie view
Nostradamus: A Voice from the Past
Absolution
Ozzy Osbourne: Double O: Unauthorized
Monster-a-Go-Go!
Still Bout It
WWE: Rebellion 2002

[Chart: feature value per movie; axis ticks shown from 1.5 down to -1.5]
18 components
Find the nearest neighbors using Euclidean (q=2) distance
q=1
1. American Beauty (1999)
2. Fight Club (1999)
3. Reservoir Dogs (1992)
4. Mystic River (2003)
q=2
1. American Beauty (1999)
2. Fight Club (1999)
3. Mystic River (2003)
4. Reservoir Dogs (1992)
q=3
1. American Beauty (1999)
2. Mystic River (2003)
3. Fight Club (1999)
4. Traffic (2000)
q=4
1. American Beauty (1999)
2. Mystic River (2003)
3. Fight Club (1999)
4. Traffic (2000)
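The rankings above change with q because q is the order of the Minkowski distance (q=1 is Manhattan, q=2 is Euclidean). The sketch below shows the idea on made-up two-component feature vectors; the titles "A", "B", "C" and their values are hypothetical, and the function names are illustrative.

```python
def minkowski(x, y, q):
    """Minkowski distance of order q between two equal-length vectors."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

def nearest(query, vectors, q=2, k=4):
    """Titles ranked by distance to the query; the query itself comes first."""
    ranked = sorted(vectors, key=lambda title: minkowski(vectors[query], vectors[title], q))
    return ranked[:k]

# Hypothetical feature vectors keyed by title.
features = {
    "A": [0.0, 0.0],
    "B": [3.0, 0.0],  # far along one component only
    "C": [2.0, 2.0],  # moderately far along both components
}

by_manhattan = nearest("A", features, q=1, k=3)
by_euclidean = nearest("A", features, q=2, k=3)
print(by_manhattan, by_euclidean)
```

With q=1 the vector that differs in a single component ranks closer; with q=2 the one with smaller per-component differences does, which mirrors how the order of the neighbours shifts between the lists above.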
Demo – Name a movie!
http://www.caffeinatedrook.com/MovieRec/MovieRecServlet