

CIA – 1

REPORT ON DATA VISUALIZATION

Under the Guidance of


Dr Subburaj Alagarsamy

MBA PROGRAMME
SCHOOL OF BUSINESS AND MANAGEMENT
CHRIST (DEEMED TO BE UNIVERSITY), BANGALORE

AUGUST 2021

Done By:
ATUL JOHNSON
SECTION – J
REGISTER NUMBER – 2127714

DATASET

Source - https://www.kaggle.com/narmelan/top-ten-blockbusters-20191977

The Worldwide Blockbusters 1977-2019 dataset contains data on the ten highest-grossing films
in the world for each year from 1977 to 2019. The main attributes of the dataset are film
title, worldwide gross, film budget, rating, genre, domestic distributor, and release year.
The dataset gives a broad idea of the size of the film industry and its turnover. The IMDb
ratings indicate how well each film was received, and the name of the distribution company is
provided alongside the film title. There are 430 observations and 14 attributes.

Transformation or cleaning done –

The following cleaning steps were applied to the dataset.

• The whole dataset was converted into a table to make it easier to work with.
• It was sorted from highest to lowest gross value.
• The data was already well arranged and clean; there were no outliers or blank observations.
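The sorting and blank-value check described above could be sketched in Python roughly as follows. This is only an illustration: the report did these steps in a spreadsheet, the column names film_title and worldwide_gross are assumptions, and the three sample rows are placeholders rather than values from the dataset.

```python
# Hypothetical sample rows; titles and figures are placeholders, not dataset values.
rows = [
    {"film_title": "Film A", "worldwide_gross": 250_000_000},
    {"film_title": "Film B", "worldwide_gross": 900_000_000},
    {"film_title": "Film C", "worldwide_gross": 40_000_000},
]


def find_blank_rows(records):
    """Return the records in which any field is None or an empty string."""
    return [r for r in records if any(v is None or v == "" for v in r.values())]


def sort_by_gross(records):
    """Return a new list ordered from highest to lowest worldwide gross."""
    return sorted(records, key=lambda r: r["worldwide_gross"], reverse=True)


blank_rows = find_blank_rows(rows)
sorted_rows = sort_by_gross(rows)

print(len(blank_rows))
print([r["film_title"] for r in sorted_rows])
```

An empty blank_rows list corresponds to the "no blank observations" finding; outlier detection is not covered by this sketch.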

Nominal Data

Nominal data is a type of qualitative data which groups observations into categories that have
no inherent order.

[Figure: Distributor's Releases, number of films on the list per domestic distributor]

Transformation –

The data was converted to a pivot chart. In the pivot table, the domestic distributor field
was dragged to both the Rows and Values areas, and the pivot chart was then created from it.

Row Labels                            Count of domestic_distributor
Warner Bros.                          71
Walt Disney                           76
Vestron Pictures                      1
Universal Pictures                    58
United Artists                        8
Twentieth Century Fox                 49
TriStar Pictures                      9
The H Collective                      1
Summit Entertainment                  3
Sony Pictures                         31
Rank Film Distributors                1
Paramount Pictures                    51
Orion Pictures                        6
New Line Cinema                       8
Miramax                               1
Metro-Goldwyn-Mayer                   13
Lionsgate                             5
IFC Films                             1
Icon Productions                      1
Fox Searchlight Pictures              1
Embassy Pictures                      1
DreamWorks                            17
Compass International Pictures        1
Columbia Pictures                     15
American International Pictures       1
Grand Total                           430

Inference –

This graph shows which distributors in the industry have performed best. Since the dataset
consists of the highest-grossing movies of each year, every distributor on it has had at least
one major commercial success. Among them, the graph shows that Walt Disney is the most
successful distributor, with 76 movies on the list. Just behind it comes Warner Bros., with 71
films. Next come Universal Pictures (58) and Paramount Pictures (51). All other distributors
are far behind the ones mentioned above. The graph makes it clear that Walt Disney and
Warner Bros. are the leading players over the whole period.
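The pivot-table count used here is equivalent to counting how often each distributor name occurs. A minimal Python sketch of that idea follows; the short list stands in for the domestic_distributor column and is a hypothetical sample, not the real 430 rows.

```python
from collections import Counter

# Hypothetical sample of the domestic_distributor column (not the real 430 rows).
domestic_distributor = [
    "Walt Disney", "Warner Bros.", "Walt Disney",
    "Universal Pictures", "Walt Disney", "Warner Bros.",
]

# Count one film per occurrence of each distributor name.
counts = Counter(domestic_distributor)

# most_common() orders distributors from most to fewest films.
ranking = counts.most_common()
grand_total = sum(counts.values())

for name, n in ranking:
    print(f"{name:<20}{n:>4}")
print(f"{'Grand Total':<20}{grand_total:>4}")
```

Applied to the full column, the same counting should reproduce the "Count of domestic_distributor" figures and the grand total of 430.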

Ordinal data

Ordinal data is where the variables have natural, ordered categories and the distance between
the categories is unknown.

Transformation –

The dataset doesn't contain an ordinal attribute, so the IMDb rating was used to create one.
The categories are Brilliant, Outstanding, Very Good, Good, and Average. The IF function was
used to categorize the IMDb ratings. After the new column was created, a pivot table was built
on it and the graph was plotted.
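The nested IF logic can be illustrated with a small Python function. The five labels and their order come from the report, but the cut-off values below are assumptions chosen only for illustration, since the report does not state the thresholds it used.

```python
def categorize_rating(rating):
    """Map a numeric IMDb rating to one of the report's five labels.

    The cut-off values are assumptions; the report does not state them.
    """
    if rating >= 8.5:      # assumed threshold
        return "Brilliant"
    if rating >= 7.5:      # assumed threshold
        return "Outstanding"
    if rating >= 6.5:      # assumed threshold
        return "Very Good"
    if rating >= 5.5:      # assumed threshold
        return "Good"
    return "Average"


# Hypothetical ratings, used only to show the mapping.
sample_ratings = [9.0, 7.8, 6.9, 6.0, 4.2]
labels = [categorize_rating(r) for r in sample_ratings]
print(labels)
```

Because the checks run from the highest threshold downward, each rating receives exactly one label, which is the same behaviour as a chain of nested IF functions in a spreadsheet.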

[Figure: Categorized Rating, bar chart with Rating on the x-axis and Number of Films on the y-axis]

Interpretation –

Row Labels       Count of rating
Average          12
Brilliant        21
Good             95
Outstanding      106
Very Good        196
Grand Total      430

All 430 films are categorized and plotted in the graph. From this graph, we can read off the
number of Brilliant, Outstanding, Very Good, Good, and Average movies based on IMDb rating.
A film is not only a matter of revenue and profit margin. Films also have a large impact on
society's values, messages, and moral outlook, so it is essential to identify the best movies
on these criteria as well. IMDb is a suitable source for this, since films with stronger
values tend to be rated higher there. The graph shows that most movies fall in the Very Good
category, with 196 films. Only 21 films are in the Brilliant category, and 12 are in the
Average category. Since this dataset covers the top-grossing movies of each year, the 21 films
in the Brilliant category can be considered the ones with the best values and the best in the
period covered.

Interval Data

It is defined as a data type which is measured along a scale, in which each point is placed at
equal distance from one another. Interval data always appears in the form of numbers.

Data source - https://www.kaggle.com/new-york-city/new-york-city-sat-results

Data set –

The City of New York hosts this dataset, which contains SAT results at the school level for
New York City. The records give the average SAT scores of college-bound seniors graduating in
2012, taken during the 2012 school year. The city maintains an open data platform, which it
updates as data is received. The SAT is wholly owned, developed, and published by the College
Board, a private, not-for-profit organization in the United States. The schools with the
highest SAT scores in the dataset were selected to create the graph.

[Figure: Average SAT Score of Schools, clustered bar chart with School Name on the y-axis and
SAT Score (200 to 800) on the x-axis. Schools labelled: Susan E. Wagner High School, Fort
Hamilton High School, Edward R. Murrow High School, Forest Hills High School, Midwood High
School, Benjamin N. Cardozo High School, Brooklyn Technical High School]

Inference –

This clustered bar chart shows the highest SAT scores among schools in New York. It is clear
from the graph that Stuyvesant High School has the highest average score on a single SAT
section. Any SAT score above the 50th percentile (median) can be considered a decent score.
The SAT is scored on a 200 to 800 scale in each section, in 10-point increments. The 2
sections (Evidence-Based Reading and Writing, and Math) have their scores reported separately.
Here the Writing SAT score is used. From this graph, we can understand that the
highest-scoring students in New York are mostly in Stuyvesant High School. After that come
the Bronx high school, Fiorello high school, and Brooklyn high school. From this graph, we can
infer that students at the schools mentioned above have a good chance of scoring well on the
SAT and joining the college of their choice.

Ratio Data

• For ratio data, the Worldwide Blockbusters 1977-2019 dataset is used.

Ratio data has equal intervals between values and an absolute zero that is treated as the
point of origin. In other words, ratio data cannot take negative values.

Interpretation –

The histogram shows the gross ranges into which most of the films fall. The bin width is
18,00,00,000 (180 million). The most common range is 3,71,87,139 – 21,71,87,139, which
contains 111 movies. Close behind is the range 21,71,87,139 – 39,71,87,139, which contains
109 films. From this graph, we can infer that most of the blockbusters in the world fall in
the gross range of 3,71,87,139 – 39,71,87,139, with 220 films in this range altogether. So
about half of the 430 observations have a comparatively lower gross, and only the remaining
half have a high gross. War films, musicals, and historical dramas have traditionally been
the most popular genres, but franchise films have been the most successful in the twenty-first
century. The superhero genre has many fans, with nine films from the Marvel Cinematic
Universe among the nominal top earners. Avengers: Endgame, the most successful superhero
picture, is also the highest-grossing film on the nominal profits chart, with four films based
on the Avengers comic books charting in the top twenty in total. All these films were released
recently, and the graph shows a trend that the highest-grossing movies are mostly recent
releases.
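The bin ranges quoted above follow directly from the starting edge and the bin width. The short Python sketch below reproduces that arithmetic; the starting edge, bin width, and film counts are taken from the text, while the helper names and the 300,000,000 example value are assumptions for illustration.

```python
BIN_START = 37_187_139   # lowest bin edge reported in the text (3,71,87,139)
BIN_WIDTH = 180_000_000  # bin width reported in the text


def bin_edges(start, width, n_bins):
    """Return the n_bins + 1 edges of equal-width bins beginning at start."""
    return [start + i * width for i in range(n_bins + 1)]


def bin_index(value, start, width):
    """Return the zero-based index of the bin that contains value."""
    return (value - start) // width


edges = bin_edges(BIN_START, BIN_WIDTH, 2)
print(edges)  # [37187139, 217187139, 397187139]

# Share of the 430 films that fall in the first two bins (111 + 109 films).
share = (111 + 109) / 430
print(round(share * 100, 1))  # 51.2

# A hypothetical gross of 300,000,000 falls in the second bin (index 1).
print(bin_index(300_000_000, BIN_START, BIN_WIDTH))  # 1
```

The three edges correspond to 3,71,87,139, 21,71,87,139 and 39,71,87,139 in the grouping used in the text, and the 51.2% share is the basis for saying that about half of the films sit in the two lowest bins.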

Conclusion –

From the above graphs, the importance of data visualization is evident. By placing data in a
visual context, such as maps or graphs, data visualization helps us understand what it means.
This makes the data more natural for the human mind to understand, making it easier to see
trends, patterns, and outliers in huge data sets. Business users can utilize data
visualization to gain insight into their massive volumes of data. They benefit from being able
to spot new patterns and faults in the data. By making sense of these patterns, users can
focus on areas that suggest red flags or progress. In this report, the datasets were found,
cleaned, analyzed, and interpreted in order to visualize them.

REFERENCES

1) City of New York. (2021, January 1). New York City SAT Results [Dataset]. Kaggle.
https://www.kaggle.com/new-york-city/new-york-city-sat-results

2) Box Office Mojo by IMDb. (2020, February 3). Worldwide Blockbusters 2019–1977 [Dataset].
Kaggle. https://www.kaggle.com/narmelan/top-ten-blockbusters-20191977
