Description
Conclusion
Unanswered questions
Knowledge earned
1. Description
After more than 100 years of rapid development, cinema has grown from a
simple new form of entertainment into an art form and one of the most
important tools of mass communication in modern society.
Our team set out to analyze how the film industry has developed and to learn
about its trends, focusing on the period from 1990 to 2019. We regard this
period as a golden era of cinema: many new technologies were adopted, such
as 3D, 4D, motion capture, IMAX, VFX and StageCraft, and box office records
were broken again and again.
We have collected a dataset containing basic information (title, director, year,
runtime, genre, imdb_rating, metascore_rating, vote, mpap, keyword, budget,
country, language, released_date, award_win, award_nomination, profit,
worldwide_gross, us_gross, international_gross) about 12,146 movies.
After exploring the dataset we gathered, and driven by our long-standing
curiosity, we pose the following questions about the film industry:
● How diverse are movies?
○ What content is most popular in movies?
○ How many genres of movies have been made? Which genre is
the most popular?
○ Which countries produce the most movies?
○ Which movies are the most popular?
○ Which movies have the most votes?
○ Which directors are the most successful?
● Which genre of film is taking the throne at the box office?
○ Which genres are growing in both number of films and sales? And
which genres are declining?
○ What causes those increases and decreases?
● What affects box office profit?
○ How do domestic sales, votes, IMDb rating and award wins affect
global profit?
○ Does the genre of the movie affect its profit?
○ Which MPA rating is the most profitable?
○ Are these features enough to predict a movie's profit?
Through targeted analysis and charts, we take a closer look at cinema from
several angles and look for answers to the questions above.
Pandas describe() also shows the number of values, the number of unique
values, the mode and its frequency for non-numeric columns.
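As a minimal sketch of this behavior, the snippet below calls describe() on a
tiny frame. The column names follow our dataset, but the rows are hypothetical
and invented for illustration. In the pandas output, the mode and its
frequency appear under the labels "top" and "freq".

```python
import pandas as pd

# Hypothetical sample rows; only the column names follow the dataset.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genre": ["Drama", "Comedy", "Drama", "Action"],
    "runtime": [120, 95, 110, 130],
})

# Keep only the non-numeric columns, then summarize them.
summary = movies.select_dtypes(exclude="number").describe()
print(summary)
```

The resulting rows are count, unique, top and freq, one column per
non-numeric feature.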
3.2. Distribution
Distribution of imdb, metascore, score, popularity, vote, year, runtime, award
win and award nomination in the dataset
We use the 'distplot' function of seaborn to draw these distributions. Note
that 'distplot' is deprecated in recent seaborn releases. Its replacement,
'displot', provides access to several approaches for visualizing the
univariate or bivariate distribution of data, including subsets of data defined
by semantic mapping and faceting across multiple subplots.
Here is the word cloud of the words in the movie titles. To draw word
clouds, we use the WordCloud class of the wordcloud library.
The word Love is the most commonly used word in movie titles. Man and Girl
are also among the most commonly occurring words. This captures the
ubiquitous presence of romance in movies pretty well.
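A word cloud sizes each word by how often it occurs, so the ranking above
comes down to a word count. The sketch below shows that counting step using
only the standard library. The titles and the stop-word list are hypothetical,
and this is not the exact processing done inside the wordcloud library.

```python
import re
from collections import Counter

# Hypothetical titles, invented for illustration only.
titles = ["Love Actually", "Crazy, Stupid, Love", "Iron Man", "Gone Girl", "Man on Fire"]

# A small hand-picked stop-word list (an assumption, not the wordcloud default).
stop_words = {"on", "the", "a", "of"}

# Lowercase each title, split it into words and drop the stop words.
words = [
    w
    for title in titles
    for w in re.findall(r"[a-z']+", title.lower())
    if w not in stop_words
]
word_counts = Counter(words)
print(word_counts.most_common(3))
```

A dictionary of counts like word_counts can then be passed to the
generate_from_frequencies() method of a WordCloud object to draw the cloud.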
Love, friendship, relationship, woman, male, girl, husband wife, father, son and
marriage all appear frequently, reflecting how popular emotional themes are
in movies. In addition, sensational topics such as murder, death and drugs
also attract a lot of attention. Both subjective and objective factors
contribute to this.
Humans have always longed for healthy relationships with the people around
them, and love movies give us hope of finding love of our own. There is also
a scientific reason that people fall for a good love story: oxytocin, a.k.a.
the love hormone. Oxytocin is released into our bloodstream when we hear a
well-told story, and our brains react as if we were experiencing it ourselves.
Movies about murder attract us because the murder and crime genres give
people a glimpse into the minds of those who have committed crimes. We are
drawn to the tension between good and evil, and crime and murder movies
embody our fascination with that dynamic. Besides, learning about crime and
murder may simply appeal to our survival instinct, as a subconscious way of
preparing for real-life threats.
Comedy, Drama and Action are the three most produced film genres,
accounting for about 72% of the films produced between 1990 and 2019.
Drama and Comedy account for such a large share because the two genres
are easy to relate to: their content deals with relationships between people.
The action genre, in turn, gives viewers moments of relaxation. An action
movie also creates a scenario of stress, but this stress is fully under control
and short-lived, which is something people enjoy. Viewers feel that they are
traveling with the hero, sharing the same pain, the same happiness and the
same excitement, and in their imagination they become one with the hero.
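A share such as the 72% above can be computed by counting movies per genre
and normalizing. The following is a minimal sketch under the assumption that
each movie carries a single genre label. The labels here are hypothetical, so
the printed share is not our actual figure.

```python
import pandas as pd

# Hypothetical genre labels, one per movie; invented for illustration.
genres = pd.Series(
    ["Comedy", "Drama", "Action", "Drama", "Comedy", "Horror", "Drama", "Action"]
)

# Share of each genre among all movies, largest first.
genre_share = genres.value_counts(normalize=True)

# Combined share of the three most produced genres.
top3_share = genre_share.head(3).sum()
print(genre_share)
print(f"Top three genres: {top3_share:.1%} of movies")
```

If a movie lists several genres, the labels would first have to be split into
one row per genre, and the shares would then be shares of labels, not movies.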
The variety of film genres exists because people have different interests and
like each genre to a different degree. The following is the box plot of the
popularity of movies by genre from 1990 to 2019.
Biography, Drama, and Animation are the three genres whose median
popularity tends to be higher than that of other genres, which shows that
films in these three genres are slightly more popular. What these three
genres have in common is that their content deals with highly dramatic
issues. Animation also allows more creativity in storytelling.
The spread of the Music, Musical and Fantasy genres is relatively large,
which indicates that the quality of these movies is uneven.
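The two quantities read off the box plot, the median and the spread, can also
be computed directly. The sketch below uses the interquartile range (the
height of the box) as the measure of spread. The "popularity" column name and
all values are hypothetical.

```python
import pandas as pd

# Hypothetical popularity scores; invented for illustration.
df = pd.DataFrame({
    "genre": ["Biography", "Biography", "Biography", "Musical", "Musical", "Musical"],
    "popularity": [6.0, 7.0, 8.0, 2.0, 6.0, 10.0],
})

grouped = df.groupby("genre")["popularity"]
stats = pd.DataFrame({
    "median": grouped.median(),
    # Interquartile range: the height of the box in a box plot.
    "iqr": grouped.quantile(0.75) - grouped.quantile(0.25),
})
print(stats)
```

In this toy data the Musical group has the larger interquartile range, which
is the pattern a tall box would show in the plot.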
The parameters of interest are p1 and p2, the proportion of movies that won
awards in biography genre (p1) or in action genre (p2):
H0: p1 = p2
H1: p1 > p2
α = 0.05
From this, we conclude that the population proportion of award-winning
movies is higher in the biography genre than in the action genre.
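A test of these hypotheses can be carried out as a one-sided two-proportion
z-test with a pooled standard error. The sketch below uses only the standard
library. The function name and the counts are hypothetical and are not the
figures from our dataset.

```python
from math import erfc, sqrt


def two_proportion_z_test(x1, n1, x2, n2):
    """One-sided test of H0: p1 = p2 against H1: p1 > p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Pooled proportion, used because H0 assumes p1 = p2.
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Upper-tail probability of the standard normal: P(Z > z).
    p_value = 0.5 * erfc(z / sqrt(2))
    return z, p_value


# Hypothetical counts (award winners, total movies); not the project's data.
z, p_value = two_proportion_z_test(x1=60, n1=100, x2=45, n2=100)
alpha = 0.05
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

H0 is rejected when the p-value falls below α, which supports H1: p1 > p2.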
As we can see on the map, in addition to Hollywood, the world also has
established movie powerhouses such as Britain, France, India, and Canada, as
well as the emerging dragons of Asian cinema such as Japan, Korea, China
and Hong Kong.
Europe can be considered the cradle of cinema. Europeans were pioneers in
the motion picture industry, with a number of innovative engineers and
artists. Over time, many new waves and classic film movements were born
and developed here, such as German Expressionism, Soviet Montage, French
Impressionist Cinema and Italian Neorealism.
India is considered one of the world's major film markets. In particular,
Bollywood is famous for its lavish dance sequences set to romantic music.
Such movies are mass-produced in this country to cater to the tastes of
mass audiences. Indian cinema also takes on serious topics, emphasizing
realism and naturalism, symbolic elements and concern for the environment,
politics and society.
It is also impossible not to mention the Korean film industry, with recent
admirable achievements such as Parasite winning the Academy Awards for
Best Picture, Best Director and Best International Feature Film. Korean
movies, with their unique content, are inspired not only by Western cinema
and the Japanese New Wave but also by Pansori, a traditional Korean art
form of storytelling.
In 1990, the total revenue of action movies was roughly equal to the total
revenue of comedy movies, and comedy's revenue was nearly 27 times that of
animation movies. But in 2019, action's revenue was 5.3 times that of
comedy, and animation's revenue was nearly 3 times that of comedy.
There are three main reasons for the decline and instability of comedy and
the dramatic growth of action over the past 30 years:
● Action movies are more profitable
● Theater audiences are more interested in action movies
● Fierce competition from other genres
The two graphs below show why action movies have thrived while comedies
have declined:
The average revenue of action movies has risen and fallen from year to year,
but the general trend is still upward. In 1990, an action film earned about
90 million USD on average, while in 2019 the average revenue of an action
film was about 450 million. In the comedy genre, by contrast, the average
was about 70 million in 1990 and only about 110 million in 2019.
The annual revenues of these two genres are also significantly different. From
1990 to 2000, the gross revenues of the two genres were fairly close. But by
2008, action's annual gross had jumped while comedy's annual gross had
fallen, turning them into two unequal opponents.
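The average and annual (total) revenue series behind such line plots can be
obtained with a single group-by. The following is a minimal sketch, assuming
one row per movie and a gross column named worldwide_gross. The rows are
hypothetical values in millions of USD.

```python
import pandas as pd

# Hypothetical rows; gross values in millions of USD, invented for illustration.
df = pd.DataFrame({
    "year": [1990, 1990, 1990, 2019, 2019, 2019],
    "genre": ["Action", "Action", "Comedy", "Action", "Comedy", "Comedy"],
    "worldwide_gross": [100.0, 80.0, 70.0, 450.0, 120.0, 100.0],
})

# Mean and total gross per (year, genre), with one column per genre.
revenue = (
    df.groupby(["year", "genre"])["worldwide_gross"]
    .agg(["mean", "sum"])
    .unstack("genre")
)
print(revenue)

# Ratio of total action revenue to total comedy revenue in each year.
ratio = revenue["sum"]["Action"] / revenue["sum"]["Comedy"]
print(ratio)
```

Plotting revenue["mean"] and revenue["sum"] against the year index would give
the average and total revenue lines for each genre.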
Here are line plots of the average and total profit of these two genres.
Not surprisingly, these two charts are quite similar to the two above. It seems
that the film industry is like the stock market: you have to spend money to
make money. And if you want to make really big money, you are gambling
with an awful lot of risk. The cost of an action movie is high, but its
revenue is much higher, so the profit is large.
Like its revenue, the average profit of comedy has grown only slightly, and
for a long period it barely increased at all. From 1991 to 2005, the average
annual profit was even lower than it was in 1990.
Therefore, investors and film producers do not hesitate to choose action
movies, which take more money to produce but also earn more.
In both charts, "Average votes per film of two genres Action and Comedy by
year" and "Total votes of two genres Action and Comedy by year", it is
interesting that the average and the total number of votes for action movies
are consistently higher than those for comedy movies. The gap in
mass-audience interest between the two lines is especially clear in 2008.
To explain the increased popularity of action movies, we rely on our
knowledge as well as the results of some surveys:
● Action movies have higher budgets, so marketing costs are also higher.
● New technologies in action movies are always appealing to us.
● Action movie content is often more novel and attractive than comedy.
● According to a survey by Gökçe Bayramıçlılar, 75% of respondents said
that they would rather watch sitcom series online than go to the
cinema to watch comedy.
The regression plots in seaborn are primarily intended to add a visual
guide that helps to emphasize patterns in a dataset during exploratory data
analyses. In other words, a regression plot shows us the relationship and
the degree of connection between two features.
The regression plot of budget and profit helps us understand whether movies
with higher budgets generate higher profits. We can see that there is a
strong correlation between budget and profit.
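The straight line drawn in such a regression plot is a least-squares fit. As
a rough sketch of that calculation, the snippet below fits the line with
NumPy and reports the Pearson correlation coefficient. The budget and profit
values are hypothetical, in millions of USD.

```python
import numpy as np

# Hypothetical budgets and profits in millions of USD; invented for illustration.
budget = np.array([10.0, 20.0, 50.0, 100.0, 150.0, 200.0])
profit = np.array([5.0, 30.0, 60.0, 150.0, 260.0, 310.0])

# Least-squares line: profit ≈ slope * budget + intercept.
slope, intercept = np.polyfit(budget, profit, deg=1)

# Pearson correlation coefficient between the two features.
r = np.corrcoef(budget, profit)[0, 1]
print(f"profit ≈ {slope:.2f} * budget + {intercept:.2f}, r = {r:.3f}")
```

A positive slope together with r close to 1 corresponds to the strong
positive linear relationship described above.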
Similarly, worldwide gross, international gross and domestic gross each show
a strong positive linear relationship with profit. This is to be expected,
because profit is partly calculated from them.
In addition, vote, imdb, won awards and nominated awards are also associated
with profit. These features will play an important role in building the movie
profit prediction model.
Although we were able to establish a moderate positive correlation between
rating and vote, we could not establish an acceptable correlation between
either of them and profit.
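Pairwise correlations like these can be read from a correlation matrix. The
sketch below uses column names from our dataset with hypothetical values,
chosen so that rating and vote are positively correlated while neither is
clearly correlated with profit.

```python
import pandas as pd

# Hypothetical values; only the column names follow the dataset.
df = pd.DataFrame({
    "imdb_rating": [5.0, 6.0, 7.0, 8.0, 9.0],
    "vote": [10, 30, 20, 50, 40],
    "profit": [20.0, 50.0, 30.0, 10.0, 40.0],
})

# Pearson correlation between every pair of columns.
corr = df.corr()
print(corr.round(2))
```

Each cell holds the Pearson coefficient for one pair of features, so the
rating-vote cell and the two profit cells can be compared directly.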
4.3.2. Genres
Movie sales can also be related to the genre of the film. Each genre has a
different number of followers. Some genres suit the majority of audiences,
such as action, comedy, family and animation. Other genres appeal to a
narrower audience, such as crime, horror and thriller.
4.3.3. MPA
Another factor that greatly affects profit is the film's MPA rating, which
indicates the age of the audience that the film is aimed at. We will consider
four main ratings: R, PG-13, PG and G.
● R – Restricted: Under 17 requires accompanying parent or adult
guardian. Contains some adult material. Parents are urged to learn
more about the film before taking their young children with them.
5. Conclusion
5.1. Unanswered questions
● How do the director and main actors affect the profit of the film?
(We do not know how to handle the director column, and we have
not collected the main actors of each movie.)
● Is the average profit for each genre the same across
countries? (Our data comes from two US websites, so US movies
are more numerous and their data is more accurate.)
● What content makes the movie profitable? Which audience will
watch movies in theaters more?
● How do online movie streaming platforms like Netflix, Amazon
Prime, HBO Max,... affect the box office profit of movies?
● How has the COVID-19 pandemic changed the film industry?
5.2. Knowledge earned
After working together on this project, we have learned a lot of
new knowledge and reinforced what we already knew:
● Reviewed our knowledge of probability and statistics, such as
probability, sampling distributions, descriptive statistics,
confidence intervals, tests of hypotheses, simple linear regression
and correlation.
● Learned more about how to collect, clean and process data, and
how to run calculations and draw graphs from that data in Python.
● Improved our ability to work in groups and to prepare documents
and slides.
● Learned more about the film industry and answered the group's
questions about movies.