Rex Cheung
Movie ratings:
Iron Man Beautiful Mind Harry Potter Toy Story 4 Annabelle The Greatest Showman
User A 4 5 1
User B 5 5 5
User C 1 1 5 3
User D 3 2 2 1
General pseudocode:
For each item, obtain the attributes.
Calculate similarity between items.
Recommend to the user the items most similar to the user's previously
viewed item.
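The three steps above can be sketched in Python as follows. This is a minimal illustration, not the lecture's implementation: the names `recommend` and `shared_attribute_count`, the `top_n` parameter, and the toy `catalog` are all hypothetical, and the similarity function is a simple placeholder that can be swapped for any other measure.

```python
def recommend(last_viewed, item_attributes, similarity, top_n=2):
    """Rank all other items by similarity to the last viewed item."""
    # Step 1: obtain the attributes of the item the user last viewed.
    target = item_attributes[last_viewed]
    # Step 2: calculate the similarity between that item and every other item.
    scores = {
        item: similarity(target, attrs)
        for item, attrs in item_attributes.items()
        if item != last_viewed
    }
    # Step 3: recommend the most similar items, skipping items with zero similarity.
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [(item, score) for item, score in ranked[:top_n] if score > 0]


def shared_attribute_count(a, b):
    """Placeholder similarity (an assumption): number of shared attributes."""
    return len(a & b)


# Hypothetical catalog: item name -> set of attributes.
catalog = {
    "item_1": {"x", "y"},
    "item_2": {"y", "z"},
    "item_3": {"w"},
}

print(recommend("item_1", catalog, shared_attribute_count))
```

Passing the similarity measure as an argument keeps the ranking logic independent of how similarity is defined.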
Movie Attributes:
                      Action  Biography  Drama  Horror  Sci-Fi  Animation  Adventure
Iron Man              1       0          0      0       1       0          1
Beautiful Mind        0       1          1      0       0       0          0
Harry Potter          0       0          0      0       0       0          1
Toy Story 4           0       0          0      0       0       1          1
Annabelle             0       0          0      1       0       0          0
The Greatest Showman  0       1          1      0       0       0          0
Useful for text documents or when observations are binary, though it can
also be used when vectors are real-valued.
Rex Cheung (SFSU, DS 862) 4/21/2021
Jaccard Similarity
\[
s(x, y) = \frac{|X \cap Y|}{|X \cup Y|} = \frac{|X \cap Y|}{|X| + |Y| - |X \cap Y|}
\]
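As a quick illustration of the Jaccard formula, here is a small Python sketch. The function name `jaccard` is hypothetical, returning 0.0 for two empty sets is a convention assumed here, and the attribute sets for the two movies are inferred from the tables in these slides rather than stated by them in this form.

```python
def jaccard(x, y):
    """Jaccard similarity |X ∩ Y| / (|X| + |Y| - |X ∩ Y|) between two collections."""
    x, y = set(x), set(y)
    intersection = len(x & y)
    union = len(x) + len(y) - intersection
    if union == 0:
        # Assumed convention: two empty sets have similarity 0.
        return 0.0
    return intersection / union


# Attribute sets inferred from the slides' tables (an assumption).
iron_man = {"Action", "Sci-Fi", "Adventure"}
toy_story_4 = {"Animation", "Adventure"}

# One shared attribute (Adventure) out of four distinct attributes.
print(jaccard(iron_man, toy_story_4))  # 0.25
```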
Similarity matrix (cosine similarity of the binary attribute vectors; TGS = The Greatest Showman):
                Iron Man  Beautiful Mind  Harry Potter  Toy Story 4  Annabelle  TGS
Iron Man        1         0               0.5774        0.4082       0          0
Beautiful Mind  0         1               0             0            0          1
Harry Potter    0.5774    0               1             0.7071       0          0
Toy Story 4     0.4082    0               0.7071        1            0          0
Annabelle       0         0               0             0            1          0
TGS             0         1               0             0            0          1
So, if a user last watched Toy Story 4, the content-based recommendation
system will recommend Harry Potter (0.7071) and Iron Man (0.4082) as the
next movies to watch.
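The recommendation for Toy Story 4 can be reproduced with a short Python sketch. It assumes cosine similarity on binary attribute vectors, which matches the values in the similarity matrix (for example, 1/√3 ≈ 0.5774 for Iron Man and Harry Potter). The attribute assignments are inferred from the tables in these slides, and the names `MOVIES`, `cosine`, and `most_similar` are hypothetical.

```python
import math

# Column order: Action, Biography, Drama, Horror, Sci-Fi, Animation, Adventure.
# Attribute assignments are inferred from the slides (an assumption).
MOVIES = {
    "Iron Man":             [1, 0, 0, 0, 1, 0, 1],
    "Beautiful Mind":       [0, 1, 1, 0, 0, 0, 0],
    "Harry Potter":         [0, 0, 0, 0, 0, 0, 1],
    "Toy Story 4":          [0, 0, 0, 0, 0, 1, 1],
    "Annabelle":            [0, 0, 0, 1, 0, 0, 0],
    "The Greatest Showman": [0, 1, 1, 0, 0, 0, 0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(title, top_n=2):
    """Return the top_n other movies ranked by similarity to `title`."""
    scores = [
        (other, round(cosine(MOVIES[title], vec), 4))
        for other, vec in MOVIES.items()
        if other != title
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


print(most_similar("Toy Story 4"))
# [('Harry Potter', 0.7071), ('Iron Man', 0.4082)]
```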
              Action  Biography  Drama  Horror  Sci-Fi  Animation  Adventure
Iron Man      4       0          0      0       4       0          4
Toy Story 4   0       0          0      0       0       5          5
Annabelle     0       0          0      1       0       0          0
Total         4       0          0      1       4       5          9
Scaled Total  0.17    0          0      0.04    0.17    0.22       0.39
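Table: Each entry is the movie's binary attribute value weighted by the user's rating of that movie (4 for Iron Man, 5 for Toy Story 4, 1 for Annabelle). Scaled Total divides each Total by the sum of the totals, 23.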
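The weighted profile in the table can be computed with a few lines of Python. This is a sketch under assumptions: the binary attribute vectors are inferred from the nonzero entries of the table, the ratings (4, 5, 1) are read off the same entries, and the variable names are hypothetical.

```python
GENRES = ["Action", "Biography", "Drama", "Horror", "Sci-Fi", "Animation", "Adventure"]

# Binary attribute vectors, in GENRES order (inferred from the table, an assumption).
attributes = {
    "Iron Man":    [1, 0, 0, 0, 1, 0, 1],
    "Toy Story 4": [0, 0, 0, 0, 0, 1, 1],
    "Annabelle":   [0, 0, 0, 1, 0, 0, 0],
}

# The user's ratings of the movies they have seen.
ratings = {"Iron Man": 4, "Toy Story 4": 5, "Annabelle": 1}

# Weight each movie's attributes by the user's rating of that movie.
weighted = {
    movie: [ratings[movie] * value for value in vector]
    for movie, vector in attributes.items()
}

# Sum each attribute column, then scale so the profile sums to 1.
total = [sum(column) for column in zip(*weighted.values())]
scaled = [t / sum(total) for t in total]

for genre, t, s in zip(GENRES, total, scaled):
    print(f"{genre:<10} total={t} scaled={s:.2f}")
```

The scaled profile expresses how much of the user's rated viewing falls on each attribute, so attributes of highly rated movies dominate.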
Cold start: For a new user or item, there isn't enough data to make
accurate recommendations.
Scalability: Computing recommendations often requires substantial
computation power, especially with a large number of users or items.
Sparsity: Users may have rated only a small portion of the items.
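To make the sparsity point concrete, the fraction of missing ratings in the movie ratings example (four users, six movies) can be counted as below. Only the number of ratings per user matters for this calculation, so the sketch does not assume which movie each rating belongs to; the variable names are hypothetical.

```python
# Ratings given by each user in the movie ratings example
# (which movie each rating belongs to is not needed here).
ratings_per_user = {
    "User A": [4, 5, 1],
    "User B": [5, 5, 5],
    "User C": [1, 1, 5, 3],
    "User D": [3, 2, 2, 1],
}
n_items = 6  # number of movies in the example

observed = sum(len(r) for r in ratings_per_user.values())
total_cells = len(ratings_per_user) * n_items
sparsity = 1 - observed / total_cells

print(f"{observed} of {total_cells} cells rated, sparsity = {sparsity:.1%}")
```

Even in this small example, a sizable share of the user-item matrix is empty; with realistic catalogs the share is typically far larger.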