A Two-Stage Proxy Variable Approach To Estimating Movie Box Office Receipts

J Cult Econ
DOI 10.1007/s10824-012-9198-y
ORIGINAL ARTICLE
Frederick W. Derrick • Nancy A. Williams •
Charles E. Scott
Received: 27 October 2011 / Accepted: 18 December 2012

© Springer Science+Business Media New York 2013
Abstract This paper advances the ongoing discussion of methods for predicting
movie box office revenues with two contributions to the methodology and an out-of-
sample test of the model. The first innovation is the development of a two-stage
model using publicly available pre-release indicators to predict (1) initial week and
(2) subsequent run box office revenues. To incorporate the experience-good nature
of movies, the second stage is estimated by incorporating a proxy variable for box
office success during the first week relative to predicted first week success. The
second contribution is an empirical test of De Vany and Walls’ (J Econ Dyn Control
28:1035–1057, 2004) finding that the distribution of movie revenues has “heavy
tails” and follows a non-Gaussian stable distribution with infinite variance. We
estimate the two-stage model of a movie’s box office success on all general release
movies in 1 year with both the Gaussian and stable distribution with heavy tails and
infinite variance and find no evidence for the stable distribution in either stage of the
estimation. This two-stage model is validated by comparing all general release
movies in 3 future years (out-of-sample data) to the model’s predictions.
Keywords Motion picture success · Movie critics · Two-stage estimation · Stable distribution · Stable Paretian hypothesis
JEL Classification C20 L82
F. W. Derrick · N. A. Williams (✉) · C. E. Scott
Department of Economics, Loyola University Maryland, Baltimore, MD 21210, USA
e-mail: nwilliams@loyola.edu
1 Introduction
Motion pictures are one of America’s most cherished pastimes with the industry
earning over 10.2 billion dollars in revenue in 2011 (www.boxofficemojo.com), but
the prediction of box office success is a daunting challenge when one considers that
each movie is a crafted product with a unique combination of inputs (including
specific actors/actresses, directors, script, etc.) which introduces greater variation
than that in mass-produced goods and services.1 Historically, many movies have
failed to cover production costs, and the industry was dubbed the “nobody knows
anything” industry by screenwriter Goldman (1983).
Consumers, while having experience with the industry, do not have knowledge
about the particular movie until it is experienced. At release, potential movie
viewers may be influenced by signaling: advance marketing of the movie, actor(s),
actress(es), director, story line, and/or reviews. Shortly after release, potential movie
viewers are influenced by the experience of previous viewers. In addition, during the
relatively short life of each movie, the movie faces an ever-changing market of
competitors, as new movies are released and previously released movies end
screenings. Furthermore, the vast majority of the costs of a movie are sunk costs at
the time of release, including production costs and most marketing costs.
Thus, the factors that impact movie revenues in the first week and in subsequent
weeks differ and suggest a two-stage model is appropriate for prediction of box
office revenues. In the first, studio-driven revenue stage, marketing and distribution
decisions are made largely by corporate decision makers without the experience
component available. Movie houses normally sign contracts for 2- to 4-week runs
without knowing how consumers will respond to the movie. After the first week, the
experiential component (second stage) enters the model as word spreads regarding
the movie’s appeal and influences the contracts and success for the remaining run of
the movie.2
Clearly, movie theaters and studios alike would benefit from being able to predict
revenues for the entire run of the movie soon after its release. We use data, all
publicly available before or at the time of each movie’s release, on the 135 general
release movies of 1999 and estimate both the first week’s revenues and revenues for
subsequent weeks.3 Revenues during the opening week are estimated on production
and distribution characteristics of the movie as well as two variables that measure
critical reviews. In the second stage, the subsequent financial success is estimated
incorporating pre-release information as well as information not available at
opening—a proxy variable measure of the first week box office success.
De Vany and Walls (1999, 2004) concluded that the distribution of movie
revenues has “heavy tails” and is a non-Gaussian stable distribution with infinite
variance. In this paper, we estimate the two-step model with both ordinary least
squares (OLS) and the stable distribution with heavy tails and infinite variance. We
test for the appropriateness of OLS estimation and provide the best estimates in both
steps.
1 In addition, Prag and Casavant (1994) state that the total domestic and foreign box office represents
about one-third of gross revenue when video rental and sales, television, and cable are considered. Ravid
(1999) used domestic and foreign box office receipts and video rentals in the study of returns for 180
movies released between 1990 and 1993.
2 The release of a new consumer product line or product has similar dynamics and also has relatively
high failure rates.
3 The definition of “wide release movie” used in this paper is the one used by the MPAA of an initial
release in more than 600 theaters for at least 1 week. This eliminates most minor studio releases and art
house films.
To test the accuracy of the forecast and predictive performance, we use a hold-
out sample of all US movies widely released during the years 2000, 2001, and 2002
to evaluate the estimates based on R², mean absolute deviation (MAD), and mean
square error (MSE). To adjust for inflation, box office revenues were adjusted to
2001 US dollars based on the average ticket price/year available at the Motion
Picture Association of America (MPAA) website.4
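The ticket-price adjustment described above can be sketched as a simple rescaling. This is only one reading of the procedure: nominal revenue is multiplied by the ratio of the 2001 average ticket price to the release-year average ticket price. The price figures and the function name `to_2001_dollars` are hypothetical placeholders, not MPAA values.

```python
# Hypothetical average ticket prices in US dollars (placeholders, NOT MPAA data).
AVG_TICKET_PRICE = {1999: 5.00, 2000: 5.40, 2001: 5.60, 2002: 5.80}


def to_2001_dollars(revenue, year, prices=AVG_TICKET_PRICE, base_year=2001):
    """Rescale nominal revenue by base-year price / release-year price."""
    return revenue * prices[base_year] / prices[year]


if __name__ == "__main__":
    # A 1999 film with 10.0 (million) in nominal first week revenue.
    print(to_2001_dollars(10.0, 1999))
```

Under this reading, revenue from the base year itself is left unchanged, and revenue from years with lower ticket prices is scaled up.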
2 Literature
Understanding and forecasting box office revenue has been a source of fascination
for economists over the years beginning with early works of Litman (1983) and
Smith and Smith (1986). The literature is vast, and we restrict our review here to the
key findings that relate to the development of our model and test of the De Vany and
Walls (2004) findings.
2.1 Use of a two-stage model
To date, several authors have modeled revenue (or profit) in a two-step approach of
first week revenue estimation followed by subsequent weeks’ revenue estimation.
Deuchert et al. (2005) find that first week revenue significantly influences the revenue
in subsequent weeks. They identify post-release factors, such as Oscar nominations
and awards for the 204 most successful movies in each year between 1990 and 2000,
and model their impact in a survival analysis. De Vany and Walls (1997) found that
the initial revenue as well as the initial number of bookings are significant direct
predictors in a survival model. Our purpose here is a predictive model using a priori
data; hence, we do not focus on survival analysis. Ravid et al. (2006) investigated
reviewer bias and ran their model with opening week revenue/screen and total
revenue using identical independent variables. Our model differs from their model in
the use of a new independent variable in the second stage of estimation, the use of
remainder revenue in the second stage, and the test of the De Vany and Walls finding.
Our specific two-stage model specification is new to the literature.
2.2 OLS versus general stable distribution modeling
De Vany and Walls (2004) showed that profit estimates based on OLS yield
significantly different and incorrect results when compared to the stable estimates
that take into account the infinite variance. They modeled the unconditional
distribution of motion picture profits using the general stable distribution, capturing
the heavy Paretian tails as well as the central portion of the distribution. They
rejected the Gaussian model since revenues, returns and profits all have infinite
variance.5 Walls’ (2005) paper refined and extended this earlier work by modeling
the conditional stable distribution of outcomes. In the framework of a linear
regression model, he explicitly accounted for the stable distribution of the
endogenous variables while simultaneously estimating the regression coefficients
and the characteristic exponent of the stable distribution. Moreover, he found that
OLS estimators differ statistically from those obtained using the stable distribution.
4 Source: http://www.stop-runaway-production.com/wp-content/uploads/2009/07/2002-MPAA-Market-Stats-60-pages.pdf. Accessed January 10, 2013. More recent data are found at www.mpaa.org.
The following sections of the paper lay out a two-stage model specification using
publicly available pre-release data and critical reviews, discuss its rationale—
including the rationale for the proxy variable and the De Vany and Walls (2004) test,
provide the estimation of the model, and finally test the model with out-of-sample data.
3 Two-stage model specification
Box office revenue of the initial domestic release is modeled with an equation in two
distinct time periods, first week and subsequent weeks, using the 135 domestic movies
released in 1999.6 At each stage of the estimation, only information observable to
consumers is used to estimate revenue: number of theaters, genre, rating, season, prior
popularity, distribution company, and star power. Unobservable characteristics of the
movie (such as lighting, critical acclaim) are captured by two additional variables
based on critical reviews: the number of critical reviews and the percentage of positive
critical reviews.7 In the estimation of revenue for subsequent weeks, a new variable is
added to factor in the word-of-mouth buzz from the first week. Data sources for each
of the variables are identified in the “Appendix: Data sources”.
3.1 Stage 1: first week domestic box office revenue equation
The first week revenue (in millions of dollars) might be viewed as the “advance
marketing” stage, and is incorporated in log form due to its highly positive skew8:
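\[
\begin{aligned}
\log(\text{1st Week Rev}) ={}& \beta_0 + \beta_1\,\text{Number of Reviews} + \beta_2\,\text{Percent Positive Reviews} \\
&+ \beta_3 \log(\text{Theatres}) + \beta_4\,\text{HighSeason} + \beta_5\,\text{StarPower} \\
&+ \beta_6\,\text{PriorPopularity} + \beta_7\,\text{Rating} + \beta_8\,\text{Genre} \\
&+ \beta_9\,\text{DistributionCompany} + \text{error}
\end{aligned}
\tag{1}
\]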
The impact of critical reviews has been studied extensively in the literature (see
e.g., Boatwright et al. 2007; King 2007; Lee 2009; Reinstein and Snyder 2005;
Hennig-Thurau et al. 2012). We utilize the number of reviews and percent positive
reviews to test for the “influencer” versus “predictor” nature of reviews identified
by Eliashberg and Shugan (1997). Reinstein and Snyder (2005) found no influence
effect of reviews for widely released movies. Wallace et al. (1993) suggested that
there may be positive benefits to receiving particularly bad reviews, and Ravid et al.
(2006) found that although critics may be biased, audiences may not be able to
distinguish biased from unbiased opinions. Hennig-Thurau et al. (2012) found that
long-term box offices are influenced by reviewer quality perceptions, while
short-term box offices are not.9,10
5 Since no robust standard OLS model can be established, Collins et al. (2002) and De Vany and Walls
(1999) analyzed blockbuster movies, those with revenue beyond a given threshold, using a binary
dependent variable.
6 Revenue from rare subsequent releases is not addressed in this paper.
7 Reinstein and Snyder (2005) found that roughly 80 % of the movies in their study were reviewed before
the opening weekend. We find that 95 % of critical reviews appear within the first week of the film’s
opening.
8 This is consistent with De Vany and Walls (2002), who concluded that the log specification provides
the best fit of box office revenue using a Box–Cox transformation.
Larger film budgets may be viewed as a proxy for higher-quality films in the eyes
of movie attendees, but Epstein (2012) warns of the unreliability of publicly
available budget data. Moreover, production budgets are closely correlated with the
number of opening screens, a more accurate variable.11 The number of theaters the
movie opened in during the initial weekend, Theaters, is expected to have a direct
relationship as it provides a proxy for the expected initial interest in the movie by
theater owners, promotional marketing, and for the expected demand by the
distributor.12 Studies of movie revenue often overlook the interdependence of
revenue and screens and incorporate number of screens as an exogenous variable.
Following the works of Elberse and Eliashberg (2003) and Fernandez-Blanco et al.
(2012), we investigate potential endogeneity in the estimation of revenue and
screens. The logarithm of the number of theaters is used to address potential
heteroskedasticity.
Results on the impact of movie stars on box office revenue are mixed as some
argue a real quality difference (Rosen 1981; Hamlen 1994), while others argue that
the presence of stars lowers search costs for audiences (Adler 1985).13,14 These
inconsistent findings may be the result of a differing impact of stars on revenue
during the first week and in subsequent weeks. StarPower is measured as the number
of actors and/or actresses in the movies who starred in at least 10 movies and were
ranked in the top 25 actors or top 25 actresses when sorted by average box office
gross during the 1990s. This measure is consistent with that used by Simonoff and
Sparrow (2000).15
9 The impact of positive reviews may not be expected to be constant over this range since there may be
diminishing impact at some point as percent of positive reviews approaches 1. Thus, critical review could
be modeled as a cubic to allow varying impacts on the dependent variable and linearly for comparison.
Hennig-Thurau et al. (2012) found nonlinear influence of critics. To allow for the possibility that it is not
only the reviewers’ statements about the quality of the movie but also the volume of reviews that
influence interest in a movie, the interaction between quality of review and the number of critical reviews
was modeled. Our (unreported) analysis finds neither the cubic nor the interaction to be significant.
10 Hadida (2010) suggested that directors, producers, and lead actors may also be a signal in the first
week of movie quality.
11 Log (production budget) has a correlation of 0.69 with Log (theaters), and we have omitted production
budget from our model.
12 In the second equation, the number of theaters could have a quadratic effect if the first week release is
on a sufficiently large number of screens to saturate the market so that there is little or no aftermarket.
13 Pokorny and Sedgwick (2001) investigated the strategy for the use of stars by movie firms during the
studio era of the 1930s to the 1950s as opposed to the individual film strategy.
14 The use of awards is inappropriate at this point of a film’s box office revenue cycle since major awards
such as the Academy Awards and Cannes Film Festival occur following a movie’s release and are
unknown during the first week.
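As an illustration of how a count such as StarPower could be computed, the sketch below applies the two stated conditions: at least 10 movies, and a top-25 rank among actors or among actresses by average box office gross. It assumes the ranking is taken among performers who already meet the 10-movie threshold, which the text does not spell out. The data layout, the performer names, and the function `star_power` are hypothetical.

```python
def star_power(cast, career_stats, min_movies=10, top_n=25):
    """Count cast members who meet the movie threshold and rank in the
    top_n of their group (actors or actresses) by average gross."""
    qualifying = set()
    for group in ("actor", "actress"):
        eligible = [
            (name, stats["avg_gross"])
            for name, stats in career_stats.items()
            if stats["group"] == group and stats["movies"] >= min_movies
        ]
        # Highest average gross first, then keep the top_n names.
        eligible.sort(key=lambda pair: pair[1], reverse=True)
        qualifying.update(name for name, _ in eligible[:top_n])
    return sum(1 for name in cast if name in qualifying)


# Hypothetical career statistics (average gross in millions of dollars).
CAREER_STATS = {
    "Actor A": {"group": "actor", "movies": 12, "avg_gross": 80.0},
    "Actor B": {"group": "actor", "movies": 15, "avg_gross": 60.0},
    "Actor C": {"group": "actor", "movies": 4, "avg_gross": 120.0},
    "Actress D": {"group": "actress", "movies": 11, "avg_gross": 70.0},
}

if __name__ == "__main__":
    print(star_power(["Actor A", "Actress D", "Actor C"], CAREER_STATS))
```

In this toy data, Actor C has the highest average gross but is excluded by the 10-movie threshold, so a film with all three performers would receive a StarPower of 2.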
Following the literature, other variables observable to the consumer include
distribution company, high season, prior popularity, rating, and genre, and they are
defined in the “Appendix: Data sources”.16
3.2 Stage 2: post-first week domestic box office revenue equation
The second step of the estimation can be viewed as estimating the “legs of the
movie” or its “playability,” the ability of the movie to attract and to continue to
attract customers through experience and word of mouth. As De Vany and Walls
(1996, p. 1493) state, “audiences make hits or flops, and they do it, not by revealing
preferences they already have, but by discovering what they like.” The “legs of the
movie” can be measured in the value of the remaining revenue stream of box office
revenues (in millions of dollars)17:
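\[
\begin{aligned}
\log(\text{Post-First Week Revenue}) ={}& \beta_0 + \beta_1\,\text{Number of Reviews} + \beta_2\,\text{Percent Positive Reviews} \\
&+ \beta_3 \log(\text{First Week Actual Revenue} / \text{Estimated Revenue}) \\
&+ \beta_4 \log(\text{Theatres}) + \beta_5\,\text{High Season} \\
&+ \beta_6\,\text{Star Power} + \beta_7\,\text{Prior Popularity} \\
&+ \beta_8\,\text{Rating} + \beta_9\,\text{Genre} \\
&+ \beta_{10}\,\text{Distribution Company} + \text{error}
\end{aligned}
\tag{2}
\]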
This model includes all of the variables in the first week’s revenue equation and
adds a new explanatory variable: the log of the ratio of the actual first week revenue
to the estimated first week revenue. This variable is a measure of the relative success
or failure of a movie’s first week after controlling for the other variables in (1).18 It
can be viewed as an instrument for the first week viewer’s actual evaluation of the
uniquely crafted product, a proxy for the word-of-mouth buzz that follows the first
week. We hypothesize that there is a change in the mindset of potential viewers
between the first week and subsequent weeks under the premise that the experience
of previous viewers will influence later viewers. Under the assumption of constant
ticket prices, first week revenues could be a proxy for initial interest in the movie
and a predictor of future success. Consumers discover what they like in the first
week and convey it to potential moviegoers over time; that is, the expectation is that
the coefficient will have a positive sign.
15 An alternative measure could be based on Entertainment Weekly which does a yearly analysis of the
top 25 movie stars in the United States but would lack the impact of an actor/actress over time. Simonoff
and Sparrow (2000) also used this measure. In a recent paper, Nelson and Glotfelty (2012) measure star
power with visits to the star’s webpage on IMDb.com.
16 Recent research by Gutierrez-Navratil et al. (2012) studies the extent to which a movie’s box office
receipts are influenced by the temporal distribution of rival films. This is beyond the scope of our paper,
and the data are not available. However, this issue may be partially addressed by our distribution
company variable.
17 The staying power of a movie could also be estimated in a hazard model. Simonoff and Ma (2003)
used a hazard model to measure the success of Broadway shows. Deuchert et al. (2005) used a hazard
model for a movie continuing to be shown. Nelson et al. (2001) also used a hazard model in determining
the worth of an Oscar using weekly box office revenue. In a vastly different approach, Sawhney and
Eliashberg (1996) modeled the gross box office revenue based on the first 3 weeks of box office revenue
using either the exponential, Erlang-2 or generalized gamma probability distributions.
18 A reviewer noted that if there are endogeneity issues with the first stage of estimation, this variable
would introduce endogeneity issues in the second stage. See Fernandez-Blanco et al. (2012).
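The two-stage procedure can be sketched on synthetic data as follows. This is a minimal illustration, not the estimation code used in the paper. It keeps only two regressors, assumes base-10 logs, and takes the estimated first week revenue to be the antilog of the stage 1 fitted value, so the proxy log(actual/estimated) equals the stage 1 residual. All data, coefficient values, and helper names (`ols`, `predict`) are invented for the example.

```python
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(X, b):
    """Fitted values X b."""
    return [sum(x * c for x, c in zip(row, b)) for row in X]


# Synthetic sample (hypothetical values, chosen only to resemble the setting).
random.seed(0)
n = 135
log_theaters = [random.uniform(2.5, 3.5) for _ in range(n)]
n_reviews = [random.uniform(16, 144) for _ in range(n)]
buzz = [random.gauss(0, 0.2) for _ in range(n)]  # unobserved first week appeal
log_week1 = [-2.3 + 0.84 * t + 0.005 * r + u
             for t, r, u in zip(log_theaters, n_reviews, buzz)]
log_rest = [-0.8 + 0.30 * t + 0.009 * r + 1.3 * u + random.gauss(0, 0.1)
            for t, r, u in zip(log_theaters, n_reviews, buzz)]

# Stage 1: log first week revenue on pre-release variables.
X1 = [[1.0, r, t] for r, t in zip(n_reviews, log_theaters)]
b1 = ols(X1, log_week1)

# Proxy: log(actual / estimated) = log(actual) - fitted log value.
proxy = [a - f for a, f in zip(log_week1, predict(X1, b1))]

# Stage 2: log remainder revenue on the same variables plus the proxy.
X2 = [row + [p] for row, p in zip(X1, proxy)]
b2 = ols(X2, log_rest)

if __name__ == "__main__":
    print("stage 1 coefficients:", b1)
    print("stage 2 coefficients:", b2)
```

Because the proxy is a stage 1 residual, it has mean zero in the estimation sample and is uncorrelated with the stage 1 regressors, so adding it to stage 2 leaves the OLS point estimates on the other variables unchanged.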
4 Testing De Vany and Walls’ (2004) findings on heavy Paretian tails
While many researchers use OLS estimation for movie revenue estimation, De Vany
and Walls (1999) report that box office revenues are asymptotically Pareto-
distributed and have infinite variance. The statistical model that accommodates these
features and which is the most general form of the central limit theorem is the stable
distribution (De Vany and Walls 2004). If the finite variance assumption of the central
limit theorem is dropped, one obtains the generalized central limit theorem, which
states the limiting distribution of the sum of a large number of independent identically
distributed random variables must be the stable class of distributions. A stable
distribution S(α, β, c, δ) is a four-parameter distribution. With its characteristic high
peak, heavy Paretian tails, and skew, the stable distribution captures the
winner-take-all nature of the movie business as well as the influence of extreme events. The
characteristic exponent α is a measure of the probability weight in the upper and lower
tails with a range of 0 < α ≤ 2 and the variance is infinite when α < 2.19
The conditional distribution of box office returns for Eqs. (1) and (2) is analyzed
using the stable distribution regression model.20 The coefficients in this model
represent what is known about the correlates of film success while at the same time
permitting the variance of film success at the box office to be infinite.
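To make the role of the characteristic exponent concrete, the following sketch draws from a standard symmetric stable distribution (skewness zero, unit scale, zero location) with the Chambers-Mallows-Stuck method. It is an illustration only and is unrelated to the Gauss code used for estimation. In this parameterization, a characteristic exponent of 2 yields a normal distribution with variance 2. The function names are assumptions of the example.

```python
import math
import random


def symmetric_stable(alpha, rng):
    """One draw from a standard symmetric stable law (Chambers-Mallows-Stuck)."""
    v = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    return (math.sin(alpha * v) / math.cos(v) ** (1 / alpha)
            * (math.cos(v - alpha * v) / w) ** ((1 - alpha) / alpha))


def sample_variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


rng = random.Random(1)
gaussian_case = [symmetric_stable(2.0, rng) for _ in range(20000)]
heavy_case = [symmetric_stable(1.5, rng) for _ in range(20000)]

if __name__ == "__main__":
    print("sample variance, exponent 2.0:", sample_variance(gaussian_case))
    print("largest |draw|, exponent 2.0:", max(abs(x) for x in gaussian_case))
    print("largest |draw|, exponent 1.5:", max(abs(x) for x in heavy_case))
```

With an exponent of 2 the sample variance settles near 2, whereas with an exponent below 2 occasional extreme draws dominate and the sample variance does not converge as the sample grows.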
To evaluate the quality of the estimations and stability of results over time, out-of-
sample comparisons are made for the movies released in the 3 years following the
estimation. The equations are estimated using the 135 domestic movies released in
1999, and the resulting model was used to compare the estimates for the 126 domestic
movies released in 2000 to actual 2000 first week revenue and remainder revenue. The
measures of successful prediction of revenue are based on dollar amounts not the log
specification used in the first week revenue and remainder revenue equations. The
equations are first used to obtain the estimated log of the revenue, and then the antilog
is taken to convert the estimated values to dollars. This approach is repeated for the
128 movies released in 2001 and for the 129 movies in 2002. To control for inflation
across the years used in this study, the revenue data are adjusted for inflation using
average ticket price per year deflated by the average ticket price in 2001.
19 The skewness coefficient β is a measure of the skewness with a range of -1 < β < 1, where the sign
indicates the type of skewness. The scale parameter must be positive and either expands or contracts the
distribution in a nonlinear way about the location parameter δ. Given the infinite variance, the precision of
the MSE and other second moment-based measures of the forecasts would necessarily be zero.
20 The symmetric stable regression model was estimated in Gauss 3.2 using code by McCulloch (1998a,
b). Source code is electronically available at: http://www.econ.ohio-state.edu/jhm/programs/SMSTRG.
Accessed January 10, 2013.
For each of the out-of-sample comparisons on dollar amounts, we compute the
R², p value on the corresponding F statistic, MSE, and MAD.
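The dollar-scale comparison can be sketched as below. The sketch assumes base-10 logs for the antilog step and takes R² to be the squared correlation between actual and predicted revenue, which is one common out-of-sample definition. The paper does not state the exact formula used. The function name and the sample values are hypothetical.

```python
import math


def forecast_metrics(actual, predicted_log10):
    """Convert log10 forecasts to dollars and compare them with actual revenue."""
    predicted = [10 ** p for p in predicted_log10]  # antilog back to dollars
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(actual)
    mad = sum(abs(e) for e in errors) / n   # mean absolute deviation
    mse = sum(e * e for e in errors) / n    # mean square error
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r2 = cov ** 2 / (var_a * var_p)         # squared correlation
    return {"R2": r2, "MAD": mad, "MSE": mse}


# Hypothetical revenues (millions of dollars) and log10 forecasts.
example = forecast_metrics(
    [10.0, 20.0, 40.0],
    [1.0, math.log10(25.0), math.log10(30.0)],
)

if __name__ == "__main__":
    print(example)
```

Measuring errors in dollars rather than logs penalizes misses on high-revenue films much more heavily, which is why the MAD can remain large even when the log-scale fit is good.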
5 Estimation results
There is considerable diversity in the movies that were released in 1999 across the
variables in the data set. The 135 movies in 1999 had a mean first week box office
revenue of 12.94 million dollars with a maximum of 68.07 million as shown in
Table 1. The mean remainder box office revenue was 43.24 million dollars and
ranged from 1.11 to 412.24 million.
The explanatory variables show similar wide variation. Critical reviews of movies
ranged from as few as 16 to as many as 144 for a given movie and ranged in evaluation
from no positive reviews to all positive reviews. On average, there were 65.42 reviews per
movie with 48.19 % of the reviews positive. The numbers of reviews and the percentage
of positive reviews for a movie have a correlation of 0.535. The number of theaters the
movie opened in during the first week ranged from as few as 25 to 3,342 with a mean of
1,990. All of the categorical variables, rating, genre, prior popularity, and distribution
company exhibit numerous observations per category with no predominant outcome.
5.1 First week domestic box office revenue
The results for the estimation of logarithm of the first week box office revenue with
symmetric stable maximum likelihood (ML) estimation are shown in Table 2. The
estimate of alpha of 2.0 indicates that first week revenues can be estimated with
OLS and the variance in the error term is finite. Thus, the log first week revenue is
not necessarily asymptotically Pareto-distributed with infinite variance as reported
by De Vany and Walls (1999).
In addition to OLS estimation, the log (first week revenue) equation was estimated with
three-stage least squares (3SLS) using log (first week revenue) and log (number of
theaters) as endogenous variables. The results of OLS and 3SLS estimation are
identical in significance test results and show only minute differences in estimated
coefficients and standard errors. We report the OLS estimates here since the gains
from 3SLS are minimal.21
The impact of critical reviews on log first week revenue is mixed.22 Consistent
with Ravid’s (1999) finding, the percentage of positive reviews for a movie is
insignificant, while the number of reviews is significant. This finding indicates that
the consumer does not consider the reviewer an “influencer,” and this finding is in
keeping with Eliashberg and Shugan’s (1997) result. One may conclude that the
visibility of a review rather than its content is the more important marketing tool.
21 The 3SLS estimates are available from the authors.
22 Throughout the results, the columns titled Wald ratio record the ratio of each regression coefficient to
its standard error, a statistic which is asymptotically standard normal under the hypothesis that the
corresponding β is zero.
Table 1 Descriptive statistics for wide release movies 1999 (n = 135)
Variable Mean Median SD Range Minimum Maximum
Dependent variables
First week revenue ($ mil.) 12.94 8.37 12.8 67.66 0.41 68.07
Log (first week revenue) 0.92 0.92 0.43 2.22 -0.39 1.83
Remainder revenue ($ mil.) 43.24 23.17 57.76 411.13 1.11 412.24
Log (remainder revenue) 1.34 1.36 0.53 2.57 0.04 2.62
Independent variables
Percentage positive reviews 48.19 48 26.17 100 0 100
Number of reviews 65.42 64 27.28 128 16 144
Production budget ($ mil.) 57.25 46.5 39.45 180.9 0.1 181
Number of theaters 1,989.89 2,124.5 770.99 3,317 25 3,342
Log (number of theaters) 3.23 3.33 0.33 2.13 1.4 3.52
High season 0.49 0 0.5 1 0 1
Star power 0.32 0 0.51 2 0 2
Prior popularity 0.21 0 0.41 1 0 1
G/PG 0.16 0 0.37 1 0 1
PG 13 0.33 0 0.47 1 0 1
Drama 0.30 0 0.46 1 0 1
Comedy 0.37 0 0.48 1 0 1
Science fiction 0.04 0 0.21 1 0 1
Action 0.13 0 0.33 1 0 1
Horror 0.09 0 0.29 1 0 1
BuenaVista 0.09 0 0.29 1 0 1
WarnerBrothers 0.13 0 0.33 1 0 1
Columbia 0.10 0 0.30 1 0 1
Paramount 0.10 0 0.31 1 0 1
Fox20 0.12 0 0.32 1 0 1
Universal 0.12 0 0.32 1 0 1
Disney/Touchstone 0.04 0 0.19 1 0 1
MGM 0.06 0 0.24 1 0 1
NewLine 0.05 0 0.22 1 0 1
Miramax 0.04 0 0.21 1 0 1
Sony 0.06 0 0.24 1 0 1
DreamWorks 0.04 0 0.19 1 0 1
Alternatively, it may be that a critic’s normative evaluation of a movie and a
movie’s entertainment value are not related.
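The Wald ratios reported in the estimation tables are coefficients divided by their standard errors, referred to a standard normal distribution. A minimal sketch of that calculation follows. The function names are illustrative, and the inputs are the rounded symmetric stable ML figures for Log (theaters) and Prior popularity from Table 2, so small rounding differences from the published ratios are to be expected.

```python
import math


def wald_ratio(coef, se):
    """Ratio of a coefficient to its standard error."""
    return coef / se


def two_sided_p(z):
    """Two-sided p value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    for name, coef, se in [("Log (theaters)", 0.8400, 0.0795),
                           ("Prior popularity", 0.1386, 0.0602)]:
        z = wald_ratio(coef, se)
        print(name, round(z, 3), two_sided_p(z))
```

A ratio larger than about 1.96 in absolute value corresponds to significance at the 0.05 level, which matches the asterisks in the tables.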
The logarithm of the number of theaters has a positive, significant coefficient in
the first week revenue equation. This could be due to larger exposure or constrained
attendance for those with fewer theaters.23 Of the twelve major distribution
companies modeled with indicator variables, only Fox20 and Sony have significantly
different first week box office revenues.
23 In another formulation of the model, the production budget coefficient is of the expected sign but not
statistically significant. As suggested by a reviewer, it has been removed from the model.
Table 2 Log (first week revenue) estimation results for 1999 releases (n = 135)
Columns 2–4 report ordinary least squares estimates (Coefficient, SE, t); columns 5–7 report symmetric stable ML estimates (Coefficient, SE, Wald ratio)
Variable Coefficient SE t Coefficient SE Wald ratio
Constant -2.3225 0.2768 -8.391 -2.3225 0.2487 -9.339
Percentage of positive reviews 0.0010 0.0011 0.8536 0.0010 0.0010 0.950
Number of reviews 0.0052 0.0011 4.5988* 0.0052 0.0010 5.118*
Log (theaters) 0.8400 0.0885 9.4945* 0.8400 0.0795 10.566*
High season 0.0058 0.0491 0.1178 0.0058 0.0441 0.131
Star power 0.0445 0.0491 0.9060 0.0445 0.0441 1.008
Prior popularity 0.1386 0.0670 2.0676* 0.1386 0.0602 2.301*
G/PG 0.0522 0.0831 0.6277 0.0522 0.0747 0.699
PG 13 0.1266 0.0572 2.2121* 0.1266 0.0514 2.462*
Drama 0.1692 0.1256 1.3477 0.1692 0.1128 1.500
Comedy 0.1492 0.1204 1.2385 0.1492 0.1082 1.378
Science fiction 0.2417 0.1472 1.6419 0.2417 0.1323 1.827
Action 0.2904 0.1397 2.0790* 0.2904 0.1255 2.314*
Horror 0.2299 0.1435 1.6018 0.2299 0.1290 1.783
BuenaVista -0.1638 0.1346 -1.217 -0.1638 0.1209 -1.355
WarnerBrothers -0.1096 0.1330 -0.824 -0.1096 0.1195 -0.917
Columbia -0.1485 0.1373 -1.082 -0.1485 0.1234 -1.204
Paramount -0.0149 0.1345 -0.111 -0.0149 0.1208 -0.123
Fox20 -0.2701 0.1276 -2.116* -0.2701 0.1147 -2.355*
Universal -0.0576 0.1306 -0.441 -0.0576 0.1173 -0.491
Disney/Touchstone 0.0106 0.1711 0.0619 0.0106 0.1538 0.069
MGM -0.0903 0.1416 -0.638 -0.0903 0.1272 -0.710
NewLine -0.1966 0.1488 -1.322 -0.1966 0.1337 -1.471
Miramax -0.2211 0.1425 -1.551 -0.2211 0.1281 -1.726
Sony -0.3346 0.1401 -2.389* -0.3346 0.1259 -2.658*
DreamWorks -0.1074 0.1541 -0.697 -0.1074 0.1385 -0.776
Summary statistics, ordinary least squares: Log L = 13.8495; R² = 0.6058 [based on log (revenue)]; F stat = 204.421 (p value = 1.14E-28)
Summary statistics, symmetric stable ML: α = 2.0000, SE = 0.0000; c = 0.1544; Log c = -1.8681, SE = 0.0609; Log L = 13.8495; LR stat = 0.0000
* Significance at the 0.05 level
Prior Popularity has the expected positive and significant coefficient in the log of
first week revenue equation, but Star Power and High Season are insignificant. Basing
the movie on a popular book or earlier movie or television show (Prior Popularity)
would lead to an estimated increase of 0.1386 in the log of first week revenue. All three of these
factors are decisions under the control of the creators of the movie, and these
coefficients point to the marginal revenue product (MRP) of each characteristic.
The findings for genre and rating of the movie are generally consistent with
expectations. Both G/PG and PG-13 ratings show more impact on the log of first
week revenue compared to the R-rating, but only PG-13 is significant.24 Action
movies perform significantly better than children’s movies, with an estimated increase of
0.2904 in the log of first week revenue. The signs of the remaining coefficients of ratings and genres
are consistent with expectations but are insignificant.
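Because the dependent variable is in logs, the coefficient on an indicator variable translates into a multiplicative effect on revenue in dollars. The sketch below performs that conversion under the assumption that the logs are base 10, which is what Table 1 implies (the maximum first week revenue of 68.07 has a log of 1.83). The function name is illustrative, and the calculation is an interpretation aid rather than a result reported in the paper.

```python
def pct_change_from_dummy(coef, base=10.0):
    """Percentage change in revenue implied by an indicator coefficient
    in a log-revenue equation with logs taken to the given base."""
    return 100.0 * (base ** coef - 1.0)


if __name__ == "__main__":
    # Coefficients from Table 2: Prior popularity and Action.
    for name, coef in [("Prior popularity", 0.1386), ("Action", 0.2904)]:
        print(name, pct_change_from_dummy(coef))
```

Under the base-10 assumption, a coefficient of 0.1386 corresponds to first week revenue roughly 38 % higher, and a coefficient of 0.2904 to revenue almost twice as high, holding the other regressors fixed.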
Given the inconsistency of findings in the literature, the question that arises is
whether these results for 1999 are driven by specific movies and due to fitting the
model to the data. In Table 4 (below), the out-of-sample estimates of fit for first
week revenue in dollars [not log (dollars)] in 2000, 2001, and 2002 indicate that the
results are surprisingly stable. In 2000–2002, the R² for first week revenue, again in
dollars, range from 0.474 in 2002 to 0.623 in 2001, and the p values indicate
significance of the overall forecasting model in each year.
While the R² and p values indicate the estimated equations are significant in their
fit to the 1999 data and to the out-of-sample data for 2000, 2001, and 2002, the
magnitude of the errors as measured by the MAD for first week revenue is still
substantial, ranging from $5.021 million in 1999 to $15.41 million in 2002.
5.2 Post-first week domestic box office revenue equation
In the estimation of the logarithm of the post-first week (remainder) of box office
revenue, the estimated α of 2.0 in Table 3 indicates that the distribution of
remainder revenue in 1999 has finite variance, a result again inconsistent with the
findings of De Vany and Walls (2004). We find no evidence of infinite variance in
either stage of a movie’s run. Estimation of the post-first week revenue equation
with 3SLS shows only minimal differences in any of the results and does not impact
the findings. Thus, we report the OLS estimates.
In contrast to the first week revenue results, both measures of critical review are
of the expected sign and significant. The percentage of positive reviews coefficient
indicates that an increase of one percentage point in positive reviews for the movie
raises the log of remainder revenue by an estimated 0.0034. The number of
reviews is significant and consistent with the conclusion of the first week revenue
results that movie reviews are another way of advertising or marketing a movie.
The proxy variable, the log of the ratio of actual first week revenue to estimated first
week revenue, is highly significant and positive. A one percent increase in this
ratio leads to a 1.3 % increase in remainder revenue. As hypothesized,
unexpected success in the first week leads to continued success through the experience
of early movie goers, likely due to word-of-mouth dissemination of information.
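Because both remainder revenue and the proxy enter in logs, the proxy coefficient of 1.2999 in Table 3 can be read as an elasticity. The short sketch below works through that arithmetic; the function name is illustrative, and the calculation holds for any logarithm base as long as the same base is used on both sides.

```python
def pct_change_in_remainder(pct_change_in_ratio, elasticity=1.2999):
    """Percent change in remainder revenue implied by a log-log coefficient."""
    factor = (1.0 + pct_change_in_ratio / 100.0) ** elasticity
    return (factor - 1.0) * 100.0

# A 1 % rise in the actual-to-expected ratio gives roughly a 1.3 % rise.
one_pct = pct_change_in_remainder(1.0)
# Larger changes compound, so a 10 % rise implies somewhat more than 13 %.
ten_pct = pct_change_in_remainder(10.0)
```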
24 Children’s movies would be expected to have lower revenue due to lower prices for children’s tickets,
but higher revenue due to parents accompanying them. Thus, this result is not necessarily an indication of less
interest in the product.
Table 3 Log (remainder revenue) estimation results for 1999 releases (n = 135)

                                   Ordinary least squares estimates Symmetric stable ML estimates
Variables                          Coefficient  SE      t           Coefficient  SE      Wald ratio
Constant                           -0.8362      0.1926  -4.3416     -0.8362      0.1723  -4.8541
Percentage of positive reviews     0.0034       0.0008  4.3759*     0.0034       0.0007  4.8924*
Number of reviews                  0.0090       0.0008  11.2743*    0.0090       0.0070  12.0650*
Log (theaters)                     0.3002       0.0616  4.8758*     0.3002       0.0551  5.4513*
High season                        0.0314       0.0342  0.9175      0.0314       0.0306  1.0258
Star power                         0.0912       0.0342  2.6701*     0.0912       0.0306  2.9852*
Prior popularity                   0.2132       0.0467  4.5705*     0.2132       0.4170  5.1099*
Log (first week rev/expected rev)  1.2999       0.0667  19.5021*    1.2999       0.0596  21.8040*
G/PG                               0.2740       0.0578  4.7373*     0.2740       0.0517  5.2965*
PG-13                              0.2242       0.0398  5.6296*     0.2242       0.0356  6.2940*
Drama                              0.3437       0.0874  3.9341*     0.3437       0.0781  4.3984*
Comedy                             0.3019       0.0838  3.6021*     0.3019       0.0750  4.0273*
Science fiction                    0.4854       0.1024  4.7385*     0.4854       0.0916  5.2978*
Action                             0.5390       0.0972  5.5446*     0.5390       0.0869  6.1990*
Horror                             0.4202       0.0999  4.2078*     0.4202       0.0893  4.7044*
BuenaVista                         -0.0843      0.0937  -0.9002     -0.0843      0.0838  -1.0065
WarnerBrothers                     -0.0140      0.0926  -0.1513     -0.0140      0.0828  -0.1692
Columbia                           -0.1254      0.0955  -1.3131     -0.1254      0.0854  -1.4680
Paramount                          0.0544       0.0936  0.5811      0.0544       0.0837  0.6497
Fox20                              -0.3380      0.0888  -3.8062*    -0.3380      0.0794  -4.2555*
Universal                          0.0021       0.0909  0.0228      0.0021       0.0813  0.0254
Disney/Touchstone                  0.2334       0.1191  1.9602*     0.2334       0.1065  2.1915*
MGM                                -0.1203      0.0985  -1.2213     -0.1203      0.0881  -1.3654
NewLine                            -0.2196      0.1035  -2.1214*    -0.2196      0.0926  -2.3717*
Miramax                            -0.1884      0.0992  -1.8995*    -0.1884      0.0887  -2.1237*
Sony                               -0.3912      0.0975  -4.0133*    -0.3912      0.0872  -4.4870*
DreamWorks                         0.0317       0.1072  0.2960      0.0317       0.0959  0.3309

Ordinary least squares: Log L = 13.8495; R2 = .7779 [based on log (remainder revenue)]; F stat = 465.8968 (p value = 2.72E-45)
Symmetric stable ML: α = 2.0000, SE = 0.0000; c = 0.1070; Log c = -2.2353, SE = 0.0609; Log L = 63.4197; LR stat = 0.0000
* Significance at the 0.05 level
The logarithm of the number of theaters in the first week has a positive and
significant impact on log of remainder revenue. This finding and the number of
critical reviews finding are consistent with the interpretation of both acting as
advertising early in the movie’s run. Prior popularity and star power also have
positive and significant impacts on the log of remainder revenue.
Table 4 Out-of-sample goodness of fit comparisons on dollar revenues

Years            R2      p value on F stat   MAD      MSE
First week revenue
1999             0.606   1.14E-28            5.021    68.236
2000 (n = 126)   0.538   1.66E-22            5.878    74.355
2001 (n = 128)   0.623   1.74E-28            7.669    113.213
2002 (n = 129)   0.474   1.89E-19            15.410   537.639
Remainder revenue
1999             0.778   2.72E-45            15.106   1,180.751
2000 (n = 126)   0.734   1.84E-37            13.293   543.595
2001 (n = 128)   0.643   6E-30               20.934   2,630.698
2002 (n = 129)   0.675   9.4E-33             28.601   4,931.665

Based on actual revenue and exponentiated predicted revenue, not log (revenue)
Standard errors and MAPE exhibit similar results
MSE mean squared error; MAD mean absolute deviation in millions of dollars
The findings for genre and rating for the log of remainder revenue are consistent
with the finding in the log of first week revenue estimation in that G/PG and PG-13
movies perform significantly better than the base R-rating. Horror, Comedy, Action,
Sci-Fi, and Drama all perform significantly better than children’s movies with
Action movies having the largest advantage. In addition, all genres are significant in
the remainder weeks, while only Action movies perform significantly better than
children’s movies in predicting first week revenues. Five of the 12 distribution
companies are statistically significant (Fox20, Disney/Touchstone, NewLine,
Miramax and Sony) at the 0.05 level.
The significant R2 of 0.778 in 1999 (Table 4) measures the fit between actual
remaining box office revenue and the predicted remaining box office revenue [in dollars,
not log (dollars)]. Surprisingly, the fits are stronger for the out-of-sample estimates
of remainder revenue in 2000, 2001, and 2002 than the fits for first week revenue
and range from 0.643 to 0.734. The MADs for the out-of-sample years ranged from
$13.29 to $28.60 million. The consistency of these findings in the out-of-sample
data is in contrast to the literature; our results indicate that it is possible to predict
the remaining box office revenue with some accuracy. Our finding is of practical
importance because contracts between the theaters and the studios for the remaining
run are signed after the initial box office success is known. The results do not appear
to be driven by specific movies; they appear instead to reflect the appropriateness of the model
estimated with the initial data.
6 Conclusion
Given the experience-good nature of movies, one would expect difficulty in
predicting revenues using a priori data. The two-step model analyzing the impact of
known information on the first week box office revenue (phase 1) and of acquired
information on the remaining life of the movie (phase 2) contributes to economists’
understanding of this experience good. This second phase is a separate estimation
involving remaining box office revenue. The study of the impact of the observed
revenue from the first week relative to the expected revenue from the first week on
the remaining revenue is a second contribution to the literature.
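The two-phase logic summarized above can be sketched in a few lines. This is a deliberately simplified illustration, not the estimated model: it uses one regressor per stage (the paper's equations include many more), natural logarithms, and made-up data, and the helper `ols_simple` is hypothetical.

```python
import math

def ols_simple(x, y):
    """Intercept and slope of a one-regressor least squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical movies: opening theaters, first week and remainder revenue ($ millions).
theaters = [1500, 2000, 2500, 3000, 3500]
first_week = [8.0, 15.0, 14.0, 30.0, 28.0]
remainder = [12.0, 35.0, 20.0, 80.0, 45.0]

# Phase 1: log(first week revenue) on a pre-release indicator.
log_theaters = [math.log(t) for t in theaters]
log_first = [math.log(r) for r in first_week]
a1, b1 = ols_simple(log_theaters, log_first)
expected_log_first = [a1 + b1 * x for x in log_theaters]

# Proxy: log(actual first week revenue / expected first week revenue).
proxy = [act - exp_ for act, exp_ in zip(log_first, expected_log_first)]

# Phase 2: log(remainder revenue) on the proxy (other regressors omitted here).
log_remainder = [math.log(r) for r in remainder]
a2, b2 = ols_simple(proxy, log_remainder)
```

Within the estimation sample the proxy is simply the phase 1 residual, which is why it captures first week success that the pre-release indicators did not anticipate.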
The two-phase estimation approach is successful in analyzing the 135 wide
release movies in 1999. We find significant summary measures (R2, F statistics) in
the first week revenue equation and in the remainder revenue equation. The finding
of significant fit indicates that Goldman’s description (1983) of the movie industry
as the ‘‘nobody knows anything’’ industry may be overstated. The success of the
critical reviews in each of the equations along with the success of the ratio of the
first week actual revenues to expected revenues in the second stage lends additional
credibility to this model. Most surprising of these results is that both first week
revenue and remaining week revenues can be estimated with OLS with the
assumption of a finite error term variance. This is in contrast to the work of De Vany
and Walls (2004), which concludes that estimation requires a non-Gaussian
stable distribution with infinite variance. Contrary to the works of Elberse and
Eliashberg (2003) and of Fernandez-Blanco et al. (2012), endogeneity between
revenue and screens is not important in our estimation of either stage.
The third contribution to the literature is the evaluation of the extent to which fits
of the two-phase model can be extended to movies in the future by evaluating the
quality of the forecast on out-of-sample data points. The finding of relatively stable
fits for movies 3 years into the future is further indication of a reliable underlying
model. One would expect sizeable degradation of the quality of the fits as one
moves from the transformed dependent variable specification to the actual
dependent variable measures and from the estimation data set to out-of-sample
estimation. Sizeable MSEs and mean absolute deviations are expected due to the
experience-good nature of movies. In sum, the production of a movie is still a
gamble in domestic box offices, but the factors we expect to impact revenues still
play a large role in the gross revenue of a movie.
Appendix: Data sources
Dependent variable
Box office revenue for the first week and for the remainder of first run for most movies
is from www.boxofficemojo.com. Only movies with wide releases in 1999, 2000,
2001, and 2002 (defined by the site as being released in more than 600 theaters for at
least 1 week) were included. If a wide release movie was not included in the first site,
the data are from www.the-movie-times.com. To account for inflation, movie revenues
($ millions) were adjusted to 2001 US dollars by dividing by the average ticket price
for a given year, found at http://boxofficemojo.com/about/adjuster.htm and then
multiplying by the average ticket price for 2001.
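The ticket-price adjustment described above amounts to a single ratio. A minimal sketch follows; the function name is illustrative and the ticket prices are placeholders, not the values used in the study.

```python
def adjust_to_base_year(revenue_millions, avg_ticket_price_year, avg_ticket_price_base):
    """Convert revenue to base-year dollars using average ticket prices."""
    # Dividing by the release-year price expresses revenue in ticket equivalents.
    tickets_equivalent = revenue_millions / avg_ticket_price_year
    # Multiplying by the base-year price re-expresses it in base-year dollars.
    return tickets_equivalent * avg_ticket_price_base

# Placeholder prices for illustration only.
adjusted = adjust_to_base_year(100.0, 5.00, 5.50)
```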
Explanatory variables
(1) Production budget is predominantly from www.boxofficemojo.com. More
than 99 % of production budget data were collected from this site. In the rare
instance it was not, budget data are from www.imdb.com. All production
budget data were adjusted to 2001 prices using the CPI.
(2) Number of theaters is the number of theaters in which the movie was released
in the first week. Although some movies were released in a small number of
theaters during the first week, all movies in this data set are considered ‘‘wide
release’’ because they eventually played in at least 600 theaters in a given
week. Data are from www.boxofficemojo.com.
(3) Percentage positive reviews data were obtained from the movie review site
www.rottentomatoes.com. This website was accessed for collection of data
for 1999, 2000, 2001, and 2002 in the summer of 2004. This website compiles
all critical reviews that pass the website’s standards. The methodology
the website uses to calculate what constitutes a positive or negative review
can be found at http://www.rottentomatoes.com/pages/faq#marginal.
(4) Number of critical reviews is from www.rottentomatoes.com. This website
was accessed for collection of data for 1999, 2000, 2001, and 2002 in the
summer of 2004.
(5) High season is an indicator variable equal to 1 if the initial release date falls
from June 1 through August 31 or from November 15 through
December 31, based on opening day dates available at
www.boxofficemojo.com (see footnote 25).
(6) Prior popularity is defined to be 1 if the movie was based on a popular
book (e.g., Lord of the Rings, Harry Potter), TV show (The Brady Bunch), or
a comic book (Spiderman).
(7) Genre (drama, comedy, science fiction, action, horror, and children) is from
www.rottentomatoes.com. If multiple genres appear, the first genre that is
listed determines the type of movie dummy. Children’s movies are the
base category.
(8) Rating (G/PG, PG-13, R) is from www.rottentomatoes.com. R is the base
category. The data set does not include any unrated or NC-17
movies.
(9) Star power is defined using data from www.the-movie-times.com. It is
measured as the number of actors and/or actresses in each movie who starred
in at least 10 movies and ranked in the top 25 actors or top 25 actresses when
sorted by average box office gross during the 1990s. The measure does not distinguish
by gender or by whether the star has a lead or a supporting role.
(10) Distribution company is the company listed as having
distributed the movie in its US theatrical opening. Twelve dummy variables
represent the various companies: Buena Vista, Warner Brothers, Columbia,
Paramount, Twentieth Century Fox, Universal, Sony, DreamWorks, Miramax,
MGM, Disney, and NewLine. The base case is other companies that had no
more than 2 movies released during the year. The data are from
www.rottentomatoes.com and www.imdb.com.
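The high season indicator in item (5) can be sketched as a small date check. The function name is illustrative; the date windows are the ones stated in the definition, with both endpoints treated as inclusive (an assumption).

```python
from datetime import date

def high_season(release: date) -> int:
    """Return 1 if the release falls June 1 to August 31 or November 15 to December 31."""
    month_day = (release.month, release.day)
    summer = (6, 1) <= month_day <= (8, 31)
    holidays = (11, 15) <= month_day <= (12, 31)
    return int(summer or holidays)

# Example: a release on July 2, 1999 falls in the summer window.
summer_release = high_season(date(1999, 7, 2))
```

Comparing (month, day) tuples keeps the rule independent of the release year, so the same function applies to the 1999 through 2002 samples.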
25 Production companies may dump bad movies during the last 2 weekends of August. This would
suggest shortening the time period of the dummy.
References
Adler, M. (1985). Stardom and talent. The American Economic Review, 75(1), 208–212.
Boatwright, P., Basuroy, S., & Kamakura, W. (2007). Reviewing the reviewers: The impact of individual
film critics on box office performance. Quantitative Marketing and Economics, 5(4), 401–425.
Collins, A., Hand, C., & Snell, M. C. (2002). What makes a blockbuster? Economic analysis of film
success in the United Kingdom. Managerial and Decision Economics, 23, 343–354.
De Vany, A., & Walls, W. D. (1996). Bose-Einstein dynamics and adaptive contracting in the motion
picture industry. Economic Journal, 106, 1493–1514.
De Vany, A., & Walls, W. D. (1997). The market for motion pictures: Rank, revenue, and survival.
Economic Inquiry, 35(4), 783–797.
De Vany, A., & Walls, W. D. (1999). Uncertainty in the movie industry: Does star power reduce the terror
of the box office? Journal of Cultural Economics, 23, 285–318.
De Vany, A., & Walls, W. D. (2002). Big budgets, movie stars, and wide releases: Empirical analysis of
the blockbuster strategy. Proceedings of the XIX Latin American meeting of the Econometric
Society, São Paulo.
De Vany, A., & Walls, W. D. (2004). Motion picture profit, the stable Paretian hypothesis, and the curse
of the superstar. Journal of Economic Dynamics and Control, 28, 1035–1057.
Deuchert, E., Adjamah, K., & Pauly, F. (2005). For Oscar glory or Oscar money? Journal of Cultural
Economics, 29(3), 159–176.
Elberse, A., & Eliashberg, J. (2003). Demand and supply dynamics for sequentially released products in
international markets: The case of motion pictures. Marketing Science, 22(3), 329–354.
Eliashberg, J., & Shugan, S. M. (1997). Film critics: Influencers or predictors? Journal of Marketing,
61(April), 68–78.
Epstein, E. J. (2012). The Hollywood economist 2.0: The hidden financial reality behind the movies.
New York: Melville House.
Fernandez-Blanco, V., Orea, B., & Prieto-Rodriguez, J. (2012). Endogeneity and measurement errors
when estimating demand functions with average prices. Empirical Economics. doi:10.1007/s00181-012-0587-z.
Goldman, W. (1983). Adventures in the screen trade. New York: Warner Books.
Gutierrez-Navratil, F., Fernandez-Blanco, V., & Prieto-Rodriguez, J. (2012). How do your rivals’
releasing dates affect your box office? Journal of Cultural Economics. doi:10.1007/s10824-012-9188-0.
Hadida, A. L. (2010). Commercial success and artistic recognition of motion picture projects. Journal of
Cultural Economics, 34, 45–80.
Hamlen, W., Jr. (1994). Variety and superstardom in music. Economic Inquiry, 32, 395–406.
Hennig-Thurau, T., Marchand, A., & Hiller, B. (2012). The relationship between reviewer judgments and
motion picture success: Re-analysis and extension. Journal of Cultural Economics, 36, 249–283.
King, T. (2007). Does film criticism affect box office earnings? Evidence from movies released in the US
in 2003. Journal of Cultural Economics, 31, 171–186.
Lee, F. L. F. (2009). Cultural discount of cinematic achievement: The academy awards and US movies’
East Asian box office. Journal of Cultural Economics, 33, 239–263.
Litman, B. R. (1983). Predicting success of theatrical movies: An empirical study. Journal of Popular
Culture, 16(4), 159–175.
McCulloch, J. H. (1998a). Linear regression with stable disturbances. In R. J. Adler, R. E. Feldman, & M.
S. Taqqu (Eds.), A practical guide to heavy tails: Statistical techniques and applications (pp.
359–378). Boston: Birkhäuser.
McCulloch, J. H. (1998b). Numerical approximations of the symmetric stable distribution and density. In
R. J. Adler, R. E. Feldman, & M. S. Taqqu (Eds.), A practical guide to heavy tails: Statistical
techniques and applications (pp. 489–499). Boston: Birkhäuser.
Nelson, R. A., Donihue, M. R., Waldman, D. M., & Wheaton, C. (2001). What’s an Oscar worth?
Economic Inquiry, 39, 1–16.
Nelson, R. A., & Glotfelty, R. (2012). Movie stars and box office revenues: An empirical analysis.
Journal of Cultural Economics, 36, 141–166.
Pokorny, M., & Sedgwick, J. (2001). Stardom and the profitability of film making: Warner Bros in the
1930s. Journal of Cultural Economics, 25, 157–184.
Prag, J., & Casavant, J. (1994). An empirical study of the determinants of revenues and marketing
expenditures in the motion picture industry. Journal of Cultural Economics, 18, 217–235.
Ravid, S. A. (1999). Information, blockbusters, and stars—A study of the film industry. Journal of
Business, 72, 463–492.
Ravid, S. A., Wald, J. K., & Basuroy, S. (2006). Distributors and film critics: Does it take two to tango?
Journal of Cultural Economics, 30, 201–218.
Reinstein, D. A., & Snyder, C. M. (2005). The influence of expert reviews on consumer demand for
experience goods: A case study of movie critics. The Journal of Industrial Economics, 53(1), 27–51.
Rosen, S. (1981). The economics of superstars. The American Economic Review, 71(5), 845–858.
Sawhney, M. S., & Eliashberg, J. (1996). A parsimonious model for forecasting gross box-office revenues
of motion pictures. Marketing Science, 15(2), 113–131.
Simonoff, J. S., & Ma, L. (2003). An empirical study of factors relating to the success of Broadway shows.
Journal of Business, 76(1), 135–150.
Simonoff, J. S., & Sparrow, I. R. (2000). Predicting movie grosses: Winners and losers, blockbusters and
sleepers. Chance, 13(3), 15–24.
Smith, S., & Smith, V. (1986). Successful movies: A preliminary empirical analysis. Applied Economics,
18(5), 501–507.
Wallace, W. D., Seigerman, A., & Holbrook, M. B. (1993). The role of actors and actresses in the success
of films: How much is a movie star worth? Journal of Cultural Economics, 17(1), 1–27.
Walls, W. D. (2005). Modeling movie success when ‘nobody knows anything’: Conditional stable
distribution analysis of film returns. Journal of Cultural Economics, 29, 177–190.