Official (Closed) - Non Sensitive
Table of Contents
1. Project Objectives
4. Visualisations
5. Dashboard
DVST2020AprAssignment
1. Project Objectives
• Provide a list of primary business/research questions you are trying to answer with your
visualisation(s).
• Background on the business research question as to why the target audience or others should
care.
• The questions may evolve over the course of the project. Update this part by adding or removing
new business/research questions you consider necessary in the course of your analysis.
Based on the dataset available for this assignment, a key source of business value would be to design
dashboard(s) that help users search for suitable content based on their stated preferences.
The preferences could be captured using a number of filters. A secondary objective would
be to do exploratory data analysis and, via visualisations, discover any interesting insights that can be
gleaned from the dataset provided.
Based on the assignment scope, meta-data and data provided, the potential business questions that
were identified included the following:
iv. Lead Time (In Years) between Year of Release and Year Added to Netflix
v. Movies by Duration (in Hours)
vi. TV Shows by Duration (in Seasons)
vii. Movies/TV Shows by Genre
viii. Movies/TV Shows by Content Rating
ix. Has Netflix been increasingly focusing on Movies or TV Shows in recent years
b. Pertaining to Geography of Production
i. Movies/TV Shows produced by Country of Production,
ii. Movies/TV Shows produced by Region of Production
iii. Movies/TV Shows produced in ASEAN
iv. Top N Countries where Movies/TV Shows were produced
c. Pertaining to Directors
i. Movies/TV Shows by Director
ii. Top N Directors of Movies/TV Shows
iii. Most Prolific Directors by Movie
iv. Most Prolific Directors by Genre
v. Which Movies/TV Shows have the Most Directors involved in the Production
vi. Which Director has collaborated with the Most Actors
d. Pertaining to Actors
i. Movies/TV Shows by Actor
ii. Top N Actors of Movies/TV Shows
iii. Most Popular First Name for Actors
iv. Most Prolific Actors by Movie
v. Most Prolific Actors by Genre
vi. Which Movies/TV Shows have the Most Actors in their Cast List
vii. Which Actors have collaborated with the Most Directors
2. Data Preparation
• Describe the state of the data (e.g. is it clean, is it complete). Do you expect to do substantial
data clean-up?
• How will you prepare your data so that it is suitable for exploration and analysis?
Describe the steps involved in detail.
a. The data source was made available to us in csv format. Quite a bit of effort was expended
to transform the data and make it accessible for analysis. After conversion from csv to xls
format, the following initial observations were made:
i. Dataset File Structure: The file contained 6,235 rows (observations) with 12 columns
(variables). According to the metadata provided with the assignment, the columns are
listed below. I also did an initial exploration of possible data issues and rectified them
where appropriate.
No. | Column | Description | Observations and treatment
3 | title | Title of the movie / TV show | Some titles had dates, parts of a date or time, or were purely numeric (7 records). Where possible, the data were verified against the Netflix search engine or the Internet and amended accordingly.
4 | director | Director of the movie | Some blank fields were found (1,969 records). It may not be possible to update the fields manually as the number of records is large. The column was split into individual Directors, and each Director’s name was further split into First and Last Names. This makes more data available for analysis, for example to find which Director was most prolific in TV/movie direction.
5 | cast | Actors involved in the movie / show | Some blank fields were found (1,969 records). It may not be possible to update the fields manually as the number of records is large.
6 | country | Country where the movie / show was produced | Some blank fields were found (476 records) and some fields have more than 1 country listed. The Country field was split so that the countries can be analysed separately. Further classification was also done to group the countries into Regions. Both Soviet Union and West Germany were updated with new Country names.
7 | date_added | Date it was added on Netflix | Some blank fields were found (11 records). Where possible, the data were verified against the Netflix search engine or the Internet and amended manually.
8 | release_year | Actual release year of the movie / TV show | Converted this field from Numeric (default) to Date.
9 | rating | TV rating of the movie / TV show | Some blank fields were found (10 records). Where possible, the data were verified against the Netflix search engine or the Internet and amended accordingly.
11 | listed_in | Genre of movie / TV show | Some fields have more than 1 genre listed. The field was split into individual genres to make more data available for analysis.
12 | description | Summarized description of movie / TV show |
b. Column Names in the Dataset: Some of the column names were converted to initial caps for
visual aesthetics and readability. Some of the data were transformed as per the above table to
enable more data to be made available for analysis.
c. Data Type: A scan of all the columns in the Tableau Data Sources tab was also done to ensure
that columns are of the appropriate data types (e.g. Numeric Fields, Geographic Data, Date
fields converted from String to Date)
d. Custom Split-CrossTab-Pivoting: For those attributes (i.e. Genre, Country, Director, Cast) that
contain more than 1 categorical value in the same column, a process of split-crosstab-pivot
was done for each field so as to enlarge the pool of categorical data available for
analysis. The Tableau split function was only able to generate a maximum of 10 split columns, but this is
acceptable as I computed that 10 columns cover over 99% of the data that can be
recovered from splits. This process also generates a lot of blank rows, which may be a
performance drain on Tableau. These were removed by using Excel to filter out the blank lines
(e.g. rows that have a Show_ID but a blank Country). This process enriched the analysis in areas
such as Director, Actor, Country of production and Genre.
e. Grouping of Countries: I also grouped the countries into 6 broad regions (Africa, Americas,
ASEAN, Asia, Europe and Oceania) to enrich the analysis.
f. Some of the names have special characters in them, so I progressively verified the names against
the Netflix search engine or the Internet and amended them manually. I may not have been able to amend
all the names with special characters.
g. Using Calculated Fields: There were some interesting data on the duration of movies which
were then categorised using calculated fields to enhance the analysis for insights.
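The preparation above was done in Tableau and Excel. Purely as an illustration of the same logic, the Python sketch below splits a multi-valued column into one row per value (dropping blanks), maps each country to a region, and bands movie durations. The sample rows, the partial lookup tables, the replacement country names, the "95 min" duration format and the band edges are all assumptions made for the example, not values taken from the workbook.

```python
import csv
import io

# Hypothetical rows in the shape of the source file; values are illustrative only.
SAMPLE = """show_id,type,country,duration
s1,Movie,"United States, West Germany",95 min
s2,Movie,,120 min
s3,Movie,Singapore,59 min
"""

# Hypothetical, partial lookup tables (assumptions, not the mapping used in the report).
RENAMED = {"Soviet Union": "Russia", "West Germany": "Germany"}
REGION_BY_COUNTRY = {
    "United States": "Americas",
    "Germany": "Europe",
    "Russia": "Europe",
    "Singapore": "ASEAN",
    "India": "Asia",
    "Australia": "Oceania",
    "Nigeria": "Africa",
}


def split_to_long(rows, key, column, sep=","):
    """Yield one (key, value) pair per non-blank entry of a multi-valued column."""
    for row in rows:
        for part in (row.get(column) or "").split(sep):
            value = part.strip()
            if value:  # blank entries are skipped, so no empty rows are produced
                yield row[key], value


def region_of(country):
    """Apply the rename table, then look up the region; unknown names are flagged."""
    name = RENAMED.get(country, country)
    return REGION_BY_COUNTRY.get(name, "Unmapped")


def duration_band(duration):
    """Band a duration string such as '95 min' (assumed format) into hour ranges."""
    hours = int(duration.split()[0]) / 60
    if hours < 1:
        return "< 1 hr"
    if hours < 1.5:
        return "1-1.5 hrs"
    if hours < 2:
        return "1.5-2 hrs"
    return "2+ hrs"


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
country_rows = [
    (show_id, country, region_of(country))
    for show_id, country in split_to_long(rows, "show_id", "country")
]
bands = {row["show_id"]: duration_band(row["duration"]) for row in rows}
```

Unlike a fixed 10-column split, this approach keeps every value in the list, so no coverage check is needed; the trade-off is that it runs outside Tableau.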
ii. Before the data transformation, the original data available for Cast, Country,
Director and Genre were as follows:
Table 2-2: Example of Fields with multiple Directors (DirectorList), Actors (Castlist) and Countries
(CountryList)
a. Based on a number of basic descriptive visualisations, quite a number of columns were suited
for categorical analysis and transformation. I found the following observations most
interesting in terms of insights for this project. Please see the Data Story entitled
“Exploratory Data Analysis - Insights I Learnt about the Netflix Dataset”.
2) Most of the movies were of 1-1.5 hrs and 1.5-2 hrs duration (Side note: This was
enabled through the use of Calculated Fields to categorize the numerical data)
4) Over the years, there has been a steady increase in the proportion of movies added
to Netflix.
5) In absolute numbers, we can also observe more Movies being
added to the offering each year in recent years. In addition, some months are “hotter” than
other months for additions. You can use the filter to focus on Movies, TV Shows
or both.
6) About one-third (2,174 titles, 34.9%) of the titles were added to Netflix within a lead
time of between 1 and 3 years.
Table 3-7: Lead Time (Years) between Year of Release and the Year the content was
added to Netflix
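The lead-time figure can be reproduced outside Tableau with a short calculation. The Python sketch below is illustrative only: the function names and the band edges (under 1 year, 1 to 3 years, over 3 years) are assumptions, and the only figures taken from the report are the 2,174 titles and the 6,235 rows.

```python
from collections import Counter


def lead_time_years(release_year, year_added):
    """Years between the release year and the year the title was added."""
    return year_added - release_year


def lead_time_band(years):
    """Band a lead time; the edges are hypothetical, chosen to match the 1-3 year band."""
    if years < 1:
        return "< 1 year"
    if years <= 3:
        return "1-3 years"
    return "> 3 years"


def band_shares(pairs):
    """Share of titles per band, given (release_year, year_added) pairs."""
    counts = Counter(lead_time_band(lead_time_years(r, a)) for r, a in pairs)
    total = sum(counts.values())
    return {band: n / total for band, n in counts.items()}


# Check of the share quoted above: 2,174 of 6,235 titles.
quoted_share = f"{2174 / 6235:.1%}"
print(quoted_share)  # 34.9%
```

For example, `band_shares([(2015, 2016), (2010, 2019), (2018, 2018), (2016, 2019)])` puts half of these four hypothetical titles in the 1-3 year band.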
into North, South and Central America but it is advisable to keep the color range
between 3 and 5 entities for visual impact)
2) Interestingly, ASEAN as a whole had 206 titles in the dataset. The major production
countries (in descending order) were Thailand, Indonesia, Philippines, Singapore and
Malaysia. The ranking of the countries is different if you toggle the (content) Type
filter.
3) Singapore had 30 productions located in the country between 2009 and 2018
1) The 5 most popular Actor first names were Michael, David, John, Paul and James.
You can toggle the Type filter to see the most popular Actor first names for
Movies and TV Shows. You can also toggle to see the Top N (i.e. 5, 10, 15 etc, in
increments of 5) most popular Actor first names.
2) James Franco was the actor who collaborated with the most Directors.
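Both observations come down to simple counting once the Cast and Director lists have been split. The Python sketch below is an illustration under stated assumptions, not the Tableau calculation used in the dashboard: it assumes the first name is the first whitespace-separated token, that each distinct actor is counted once, and that ties are broken alphabetically. All names in the usage example are made up.

```python
from collections import Counter, defaultdict


def top_first_names(actors, n=5):
    """Top N first names across distinct actor names (ties broken alphabetically)."""
    distinct = {name.strip() for name in actors if name.strip()}
    counts = Counter(name.split()[0] for name in distinct)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def directors_per_actor(titles):
    """Number of distinct directors each actor has worked with.

    titles: iterable of (directors, cast) pairs, each a list of names.
    """
    seen = defaultdict(set)
    for directors, cast in titles:
        for actor in cast:
            seen[actor].update(directors)
    return {actor: len(directors) for actor, directors in seen.items()}


# Hypothetical data for illustration only.
actors = ["Michael A", "Michael B", "David C", "Michael A"]
titles = [
    (["Director X"], ["Michael A", "David C"]),
    (["Director Y", "Director X"], ["Michael A"]),
]
names = top_first_names(actors, n=2)
collaborations = directors_per_actor(titles)
```

Counting each distinct actor once keeps a prolific actor from inflating the tally for their own first name; counting per appearance instead would answer a slightly different question.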
4. Visualisations
• Identify the core findings and insights that help to answer the business/research questions
identified earlier. Create the visualisations accordingly.
• What are the different visualisations you have considered? Justify the design decisions you
made using visualisation and design principles.
• Describe each visualisation by highlighting the business/research questions it answers.
• Explain how to interpret the visualisations in order to answer the business/research questions
(e.g. interactive elements).
b. Who:
i. The visualisation is targeted at the end-user who might be interested in finding out what
content is available for viewing based on a number of preferences (enabled by
Filters). The preferences could take many forms e.g. Title, Year of Release, Genre,
Director, Actor etc.
ii. The dashboard would likely take the shape of either an exploratory or information
dashboard
c. Where:
i. The interface can be in the form of desktop, tablet or mobile. However, based on the
questions derived, I decided to optimise the design and visual real estate for
desktop dimensions.
d. Why
i. As the dataset has many dimensions for analysis, it could be quite challenging for an
interested user to look for interesting content to view. It is hoped that a
visualisation dashboard would help facilitate this search for interesting content to
watch.
e. What
Business Questions
Table 4-1: Overall plan in grouping the business questions to shape the
analysis and design of the Dashboard(s)
i. Based on the business questions collected and generated, the questions can be grouped into
4 key categories to focus the analysis and design of the dashboards.
5. Dashboard
• From your library of visualisations created earlier, identify suitable ones to create a dashboard.
Recall that dashboards should incorporate only visualisations related to one main topic.
• Depending on your business/research questions, create multiple dashboards to aid you in
performing your video recorded data story telling presentation subsequently.
Using the Netflix portal as inspiration, the following dashboards were designed:
a) Search – allowing a user to find suitable content based on his/her preferences. Netflix.com
currently has a search field which allows you to input the Title, People (Actors or
Directors) and Genre you wish to search for.
b) Movies – interesting visual filters related to Movies. Netflix currently has a Movies
page with a listing of recommendations.
c) TV Shows – interesting visual filters related to TV Shows. Netflix currently has a TV
Shows page with a listing of recommendations.
Filters were added to show different categories and the Top N items, as well as links to Wikipedia
(Countries and Regions) and IMDB (Titles, Directors, Actors) for more information.