Moviesuggester - Jupyter Notebook
Movie Suggester
By: Haylee Bell, Marilyn Kathka, David Baker and Eduardo Belman
After you finish watching a movie, do you feel empty, with nothing to do? This project aims to solve this problem by suggesting a similar movie, so you have something else to watch.
The data in this dataset was scraped from various streaming platforms such as Amazon, Apple TV, Crunchyroll, Darkmatter, Disney, Funimation, HBO, Hulu, Netflix, Paramount, Rakuten Viki, and Starz. For each movie or TV show, the dataset includes the title, type, description, release year, age_certification, runtime, genres, production countries, seasons, IMDB ID, IMDB score, IMDB votes, and IMDB popularity.
Goal/Prediction
In this project we will predict which movies a user is most likely to enjoy based on the last movie they watched. To do this, we will use IMDB score, release year, and runtime as predictors to find the movie the user should watch next.
Data Preparation
The data initially came in separate files, one for each streaming service it was scraped from. Our first step was to combine them into a single file so the data would be easier to manage. We named this file raw_titles.csv, and it can be found here: raw_titles.csv
(https://raw.githubusercontent.com/Osprey-Corp/CST383-Final/main/raw_titles.csv)
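Combining the per-platform files can be sketched with `pandas.concat`. This is a minimal sketch rather than the authors' code: the per-platform file names are not given in the notebook, and the extra `platform` column recording each row's source service is an assumption added for illustration.

```python
import pandas as pd


def combine_platform_files(paths, out_path="raw_titles.csv"):
    """Concatenate per-platform title CSVs into one DataFrame and save it.

    `paths` maps a platform label (hypothetical names) to that platform's CSV file.
    """
    frames = [
        # 'platform' is a hypothetical column, not part of the dataset described above
        pd.read_csv(path).assign(platform=name)
        for name, path in paths.items()
    ]
    # Stack all platforms into one table with a fresh 0..n-1 index
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(out_path, index=False)
    return combined
```

For example, `combine_platform_files({"netflix": "netflix.csv", "hulu": "hulu.csv"})` would write raw_titles.csv (those input file names are hypothetical). To work from the published combined file instead, `pd.read_csv` can be pointed directly at the raw_titles.csv URL above.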
localhost:8888/notebooks/Downloads/MovieSuggester.ipynb# 1/11
8/16/22, 3:40 AM MovieSuggester - Jupyter Notebook
Since we are building a movie suggester rather than a movie/TV show suggester, we keep only entries whose type is MOVIE.
This reduced our working dataset from 31392 entries to 21613 entries.
In [ ]: # Keep only movies; TV shows are outside the scope of this suggester
df = df[df['type']=='MOVIE']
df.info()
df.describe()
<class 'pandas.core.frame.DataFrame'>
In our first data exploration and visualization experiment, we grouped the movies by genre and computed each genre's average IMDB score. Only movies tagged with exactly one genre were included in this grouping. The lowest rated genre was Horror (about 4.22) and the highest rated genre was Documentation (about 6.91).
In [ ]: import matplotlib.pyplot as plt
# Group single-genre movies by genre and compute each genre's average IMDB score.
# A genres value such as "['horror']" contains exactly two quote characters,
# so the filter below keeps only movies tagged with a single genre.
single_category = df[df['genres'].str.count("'") == 2].groupby('genres')['imdb_score'].mean().dropna()
# Display data
print(single_category)
print('\nLowest Rated Genre:', single_category.idxmin(), 'Score:', single_category.min())
print('Highest Rated Genre:', single_category.idxmax(), 'Score:', single_category.max())
# Graph data
single_category.plot.bar(color=(0.1, 0.1, 0.1, 0.1), edgecolor='blue')
plt.title("Average IMDB Scores per Genre")
plt.xlabel("Genre")
plt.ylabel("Average Score")
genres
['action'] 4.899180
['animation'] 6.574359
['comedy'] 6.027032
['crime'] 5.816667
['documentation'] 6.905603
['drama'] 6.317478
['family'] 5.595745
['fantasy'] 5.492857
['history'] 6.150000
['horror'] 4.217564
['music'] 6.812500
['romance'] 5.886792
['scifi'] 4.400000
['sport'] 5.600000
['thriller'] 5.186730
['war'] 6.000000
['western'] 5.705914
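The filter `str.count("'") == 2` relies on the genres column holding the text of a Python list, so a single-genre value such as "['horror']" contains exactly two quote characters. A sketch of a more explicit alternative, assuming that string format, parses each value with `ast.literal_eval` and then uses `DataFrame.explode`; the function name `average_score_by_genre` is hypothetical, and treating missing genres as an empty list is an assumption.

```python
import ast

import pandas as pd


def average_score_by_genre(df, single_genre_only=True):
    """Average imdb_score per genre, parsing strings like "['drama', 'comedy']"."""
    # Turn each genres string into a real list; missing values become an empty list
    parsed = df.assign(genre_list=df["genres"].fillna("[]").apply(ast.literal_eval))
    if single_genre_only:
        # Keep movies with exactly one genre, as the quote-count filter does
        parsed = parsed[parsed["genre_list"].apply(len) == 1]
    # One row per (movie, genre) pair; empty lists become NaN and are dropped by groupby
    exploded = parsed.explode("genre_list")
    return exploded.groupby("genre_list")["imdb_score"].mean().dropna()
```

Calling `average_score_by_genre(df)` should match the averages printed above, keyed by plain genre names (`horror`) instead of list strings (`['horror']`). Passing `single_genre_only=False` is a different analysis in which a multi-genre movie counts toward every genre it lists.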
In our second experiment, we grouped the movies by release year and computed each release year's average IMDB score. The lowest rated release year was 1935 and the highest rated release year was 1926. The graph suggests that movies from the early 20th century were typically rated higher than recent releases, although the early yearly averages also vary much more from year to year (for example, 5.6 in 1914 versus 7.05 in 1916) than recent ones (roughly 5.8 to 6.3 between 2018 and 2022).
In [ ]: # Group movies by release year and compute each year's average IMDB score
single_year = df.groupby('release_year')['imdb_score'].mean().dropna()
# Display data
print(single_year)
print('\nLowest Rated Year:', single_year.idxmin(), 'Score:', single_year.min())
print('Highest Rated Year:', single_year.idxmax(), 'Score:', single_year.max())
# Graph data
single_year.plot.line()
plt.title("Average IMDB Scores per Release Year")
plt.xlabel("Year")
plt.ylabel("Average Score")
release_year
1912 5.800000
1914 5.600000
1915 6.083333
1916 7.050000
1917 6.233333
...
2018 5.840277
2019 5.951824
2020 5.839374
2021 5.834129
2022 6.262011
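The yearly averages for the earliest years may rest on only a handful of films, so the extreme years reported above (1926 and 1935) are worth checking against how many scored movies each year contains. A minimal sketch, assuming the same `df` columns; the helper name `yearly_score_summary` and the default minimum of 10 titles per year are illustrative choices, not values from the original analysis.

```python
import pandas as pd


def yearly_score_summary(df, min_titles=10):
    """Mean imdb_score and number of scored titles per release year."""
    # 'count' ignores missing scores, so it reflects how many titles the mean uses
    summary = df.groupby("release_year")["imdb_score"].agg(["mean", "count"])
    # Keep only years with enough scored titles for the average to be meaningful
    return summary[summary["count"] >= min_titles]
```

With `summary = yearly_score_summary(df)`, `summary["mean"].idxmin()` and `summary["mean"].idxmax()` give the lowest and highest rated years among the well-populated ones, which can then be compared with 1935 and 1926.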
In [ ]: # Group movies by release year and compute each year's average runtime
runtime_year = df.groupby('release_year')['runtime'].mean().dropna()
# Display data
print(runtime_year)
print('\nYear with Lowest Runtime:', runtime_year.idxmin(), 'Runtime:', runtime_year.min())
print('Year with Highest Runtime:', runtime_year.idxmax(), 'Runtime:', runtime_year.max())
# Graph data
runtime_year.plot.line()
plt.title("Average Runtime (Minutes) per Release Year")
plt.xlabel("Year")
plt.ylabel("Average Runtime (Minutes)")
release_year
1901 2.000000
1902 8.000000
1903 2.000000
1904 21.000000
1906 8.000000
...
2018 94.189969
2019 93.790486
2020 89.566957
2021 92.355464
2022 92.461207
Machine Learning
Title: Expelled from Paradise - IMDB Score: 6.7 - Release Year: 2014 - Runtime: 104
Title: Paycheck - IMDB Score: 6.3 - Release Year: 2003 - Runtime: 119 - Distance: 0.00034925176931333013
Title: Sky Racket - IMDB Score: 4.8 - Release Year: 1937 - Runtime: 63 - Distance: 0.00034925176931333013
Title: Visit to a Chief's Son - IMDB Score: 6.8 - Release Year: 1974 - Runtime: 85 - Distance: 0.0004673107914404673
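Only the printed suggestions survive from the Machine Learning section; the code that produced them is not in this export. The "Distance" values suggest a nearest-neighbour search over the three predictors named in the project goal (IMDB score, release year, runtime). The sketch below illustrates that idea under stated assumptions: z-score scaling, Euclidean distance, and deduplication by title are choices made here, not the authors' confirmed method, so it will not reproduce the distances printed above. The function name `suggest_movies` is hypothetical.

```python
import numpy as np
import pandas as pd

# The three predictors named in the project goal
FEATURES = ["imdb_score", "release_year", "runtime"]


def suggest_movies(df, last_title, k=3):
    """Return the k movies nearest to `last_title` in z-scored feature space."""
    # Drop rows missing a predictor; keep one row per title because the same
    # movie can appear on several platforms (deduplicating by title is an assumption)
    movies = (
        df.dropna(subset=FEATURES)
        .drop_duplicates(subset="title")
        .reset_index(drop=True)
    )
    feats = movies[FEATURES].astype(float)
    # Scale each column so release years (around 2000) don't swamp scores (1 to 10);
    # a constant column gets std 1 to avoid dividing by zero
    std = feats.std(ddof=0).replace(0, 1.0)
    z = (feats - feats.mean()) / std
    positions = np.flatnonzero(movies["title"].to_numpy() == last_title)
    if positions.size == 0:
        raise KeyError(f"{last_title!r} not found in the dataset")
    target = z.iloc[positions[0]].to_numpy()
    # Euclidean distance from the last-watched movie to every movie
    distances = np.sqrt(((z.to_numpy() - target) ** 2).sum(axis=1))
    # Exclude the last-watched movie itself, then take the k closest (ascending)
    candidates = movies.assign(distance=distances).drop(index=positions[0])
    return candidates.nsmallest(k, "distance")[["title", *FEATURES, "distance"]]
```

Scaling is the key design choice here: with raw values, release year would dominate the Euclidean distance and the suggester would mostly return films from the same year, regardless of score or runtime.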