
Journal of Cultural Economics 26: 53–64, 2002.

© 2001 Kluwer Academic Publishers. Printed in the Netherlands.

Research Note

The Distribution and Predictability of Cinema Admissions

CHRIS HAND
Department of Economics, University of Portsmouth, Locksway Road, Milton, Southsea, Hants,
PO4 8JF, U.K.

Abstract. Using a time series data set covering the period 1936–1999, this paper investigates the
statistical distribution of cinema admissions and attempts to produce a forecasting model using the
ARIMA methodology.

Key words: ARIMA, cinema admissions, forecasting, heavy-tailed distribution

1. Introduction
Most of the published studies of cinema demand/admissions (e.g. Cameron 1986,
1988, 1990) have adopted a static model specification. Dynamic models are
employed by Fernández-Blanco and Baños-Pino (1997), who use Johansen's procedure
to estimate a model of Spanish cinema admissions, and by Cameron (1999), who
adopts a Rational Addiction framework for U.K. data from 1965 to 1983. The focus
of these studies was to identify the factors that affect cinema admissions, for
example by estimating price and income elasticities. Less effort appears to have
been devoted to forecasting cinema admissions. The cinema industry's interest in
such forecasts should be obvious, but the academic interest may be less so.
Recently De Vany and Walls (1999) suggested that the revenue for an individual
film is inherently unpredictable, as the statistical distribution of revenues has an
infinite variance. This raises the question: if individual film revenues (and hence
admissions) cannot be forecast, can total admissions be forecast with any degree
of accuracy? U.K. cinema-goers choose which film to see well in advance of their
visit: according to a recent Cinema and Video Industry Audience Research (CAVIAR)
survey, conducted by the Cinema Advertising Association, the majority of people
decide which film to see before going to the cinema (CAVIAR Consortium, 2000).
It could therefore be argued that film choice is the driving factor behind cinema
admissions, so that total admissions may be unpredictable (as the sum of
unpredictable admissions to the films on offer). Alternatively, the number of
“cinema prone” individuals in a given population might vary independently of film
availability, owing to other factors such as the availability and accessibility of
cinemas, prior experience of cinema-going or demographic factors such as age. The
proportion of this population subset choosing a particular film may not be
predictable, but this need not affect the size of the subset. On this view cinema
admissions should be forecastable, as the models of cinema admissions estimated in
the studies cited earlier would also suggest.
Forecasts of cinema admissions will of course be of interest to the cinema industry,
since they allow it to assess, for example, the viability of expansion plans or the
success of efforts to promote the cinema. There appear to be few such forecasts in
the public domain; those produced by Dodona Research (Grummet and Couling, 2000)
are perhaps the most accessible.

2. The Distribution of Admissions


The discussion of heavy-tailed distributions in economics has been largely
restricted to the financial economics literature, with De Vany and Walls' (1999)
study of film revenues a recent exception. Their study concluded that film revenues
followed a stable-paretian distribution, which displays unbounded higher moments
(stable-paretian distributions are discussed in more detail by Mandelbrot (1963)). If
revenues follow a stable-paretian distribution, it may be reasonable to assume that
admissions follow the same distribution. If the series were heavy-tailed, the first
differences of admissions would not be normally distributed: their distribution
would have longer tails than the normal (i.e., it would be leptokurtic). Hence a basic
test for heavy-tailedness is to test for non-normality in the change in the value of
the series (i.e., its first differences). In the finance literature, where this approach
has been used, it is usual to transform the data by taking logarithms first (the
resulting log-differences are denoted LAdm_t − LAdm_{t−1} in Table I). As Table I
shows, the changes in admissions over the sample period are not normally
distributed, but according to a Kolmogorov–Smirnov test (at the 5% level) the
changes in the logarithm of admissions are.
Table I. Normality tests

                              LAdm_t − LAdm_{t−1}    Adm_t − Adm_{t−1}
Kolmogorov–Smirnov Z          0.709 (0.696)          1.504 (0.022)

Note: Asymptotic significance level in parentheses.
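The normality check in Table I can be sketched in Python with only the standard library. This is a minimal illustration, not the author's software. The helper names (`log_first_differences`, `ks_z`, `ks_asymptotic_p`) are hypothetical, the normal mean and standard deviation are assumed to be estimated from the sample, and the CAA admissions series itself is not reproduced here. The reported statistic is assumed to be the Kolmogorov–Smirnov Z, √n·D, with the asymptotic Kolmogorov p-value. That assumption fits the table: passing Table I's Z values to `ks_asymptotic_p` gives about 0.696 and 0.022, the reported significance levels.

```python
import math


def log_first_differences(series):
    """Return LAdm_t - LAdm_{t-1} for a list of positive admissions figures."""
    logs = [math.log(x) for x in series]
    return [b - a for a, b in zip(logs, logs[1:])]


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of N(mean, sd^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ks_z(sample):
    """One-sample Kolmogorov-Smirnov Z = sqrt(n) * D against a normal
    distribution whose mean and standard deviation are estimated from the
    sample (no Lilliefors correction is applied)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    d = 0.0
    for i, x in enumerate(sorted(sample), start=1):
        f = normal_cdf(x, mean, sd)
        # Largest gap between the empirical step function and the normal CDF,
        # checked just after and just before each jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return math.sqrt(n) * d


def ks_asymptotic_p(z, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution:
    p = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 z^2)."""
    if z <= 0:
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))
```

Because the mean and variance are estimated from the same data, this asymptotic p-value is conservative. A Lilliefors-type correction would reject normality more readily.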

The tail index of the stable paretian distribution (usually denoted as α) can be
estimated from a survival-type function. A conventional survival function gives
the probability of survival beyond a given time period, denoted as Pr(X > x). A
similar function can be calculated to show the probability of a level of admissions
being above a given level, where x represents admissions rather than time. Using
the top 10 per cent of the sample (i.e., the biggest changes in admissions) the upper
tail index can be obtained from the equation log (Pr(X > x)) = a + b log x.

The tail index α is given by minus the coefficient b attached to log x, since a
Pareto-type tail implies Pr(X > x) ∝ x^(−α). A tail index of less than 2 suggests
that the variance of the distribution is unbounded. Using this technique, and
treating the top 6 observations as the tail, an OLS regression gave an estimate of
α = 1.19. A second approach, which uses all of the observations, has been
developed by Nolan (1999). Using the quantile estimator in Nolan's STABLE
program, estimates of α = 2 were obtained; α = 2 corresponds to the normal
distribution (for a more detailed discussion of these tail index estimators see Lee,
1999).
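The log-log regression for the upper tail index can be sketched as follows. The function names are hypothetical. The text leaves two details open, and both are assumptions here: the empirical survival probability of the i-th largest observation is taken as i/n, and the caller decides whether the changes are entered as signed or absolute values. Because a Pareto-type tail satisfies Pr(X > x) ∝ x^(−α), the estimate of α is minus the fitted slope.

```python
import math


def ols_slope(xs, ys):
    """Slope coefficient b from an OLS regression of ys on xs with an intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def tail_index(sample, k):
    """Estimate the upper tail index alpha from the k largest observations by
    regressing log Pr(X > x) on log x; alpha is minus the slope b.

    The empirical survival probability of the i-th largest value is taken as
    i / n, one common plotting-position choice (an assumption here)."""
    n = len(sample)
    top = sorted(sample, reverse=True)[:k]
    if any(x <= 0 for x in top):
        raise ValueError("tail observations must be positive to take logs")
    log_x = [math.log(x) for x in top]
    log_surv = [math.log(i / n) for i in range(1, k + 1)]
    return -ols_slope(log_x, log_surv)
```

The 1936–1999 sample yields 63 annual changes, so the paper's tail of the top 6 gives a regression on very few points, and the OLS estimate is likely to be imprecise.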
On the basis of the results above, the changes in the logarithm of admissions
would appear to be normally distributed; hence it would be appropriate to proceed
using a standard econometric technique such as ARIMA.

3. The Modelling Strategy


This study attempts to model cinema admissions using the Box–Jenkins approach,
otherwise known as the ARIMA (Autoregressive Integrated Moving Average)
methodology (Box and Jenkins, 1976). ARIMA modelling has fallen somewhat out of
favour in mainstream economic modelling, although it was once widely used and is
still used as a benchmark for assessing alternative forecasting techniques. Time
series models are increasingly estimated using cointegration techniques, and time
series modellers are cautioned to test their series for non-stationarity (Harris,
1995). There are, however, practical reasons for adopting the ARIMA framework.
Firstly, ARIMA is a dedicated forecasting technique; in the tourism literature,
ARIMA models have been found to forecast better than other econometric models
(Dharmaratne, 1995). Secondly, cointegration requires a substantial amount of data.
Where only a few reliable or complete series are available, cointegration techniques
cannot be gainfully employed, but ARIMA models can still be estimated and used to
generate forecasts. For these reasons ARIMA modelling is more often employed in
the tourism economics literature, recent examples being Dharmaratne (1995), Kim
(1999) and Dalrymple and Greenidge (1999). ARIMA models also have the
intuitively appealing property of letting the data “speak for themselves”: they do
not require the modeller to identify every factor that may influence the dependent
variable. It could be argued that the aggregate series likely to be available, such as
price, income, the price of substitutes and the number of cinema sites and screens,
may not adequately explain cinema admissions, with individual-specific factors
playing a more important role.

4. The Data and Results


It is the second reason above, the lack of complete data series, that necessitates
the use of the ARIMA framework. Earlier studies based on U.K. data (those by
Cameron) used quarterly data collected for the U.K. Department of Trade and
Industry (DTI) by H.M. Customs and Excise. With the passing of the Films Act in
1985, the Eady Levy, which supported domestic production with funds levied on
exhibition, was abolished. With the levy gone, there seemed little reason to continue
collecting statistics. The operation of the levy had required the collection of data on
the number of cinemas, seating capacity, number of admissions, gross box office
takings, average ticket price, payments for film hire and payments to the British
film fund; data on numbers of full-time and part-time staff were also collected. The
collection and publication of cinema statistics resumed in 1987 (using a voluntary
panel, benchmarked against the Annual Business Inquiry), hence there is a 2-year
gap in the official admissions, screens, box office takings and payments for film
hire series. Given this gap, data collected by the cinema industry can be used
instead; such a series is compiled by the Cinema Advertising Association (CAA,
undated). The main source for this series up to 1985 was the DTI; after 1985 the
data were collected by market research companies. Currently the data are collected
by AC Nielsen EDI, a major supplier of data to the film industry and to trade
publications such as Variety and Screen International. There is a shift in the series
because the DTI data defined the cinema market geographically, covering just the
U.K., whereas the industry data cover both the U.K. and the Republic of Ireland,
since the two form one market for film distributors (see Allin, 1998). Hence the
CAA series has recorded higher values than the official series in recent years. The
structural shift in the data is not an insurmountable problem: it can be accounted for
by introducing a dummy variable where necessary.

Figure 1. Cinema admissions 1936–1999 (source CAA, undated).
The CAA series consists of annual observations of cinema admissions from
1936 to 1999 and is shown in Figure 1. Cinema admissions fell steadily throughout
the 1950s, ’60s and ’70s, after the upswing in admissions during the Second World
War and its aftermath. Spraos (1962) put forward various explanations for this
decline (for example, the introduction of television, the availability of credit to
purchase consumer durables and improved housing). Subsequently, home video has
also been cited as a cause of the decline in admissions, although whether it caused
a decline or merely prevented an increase is a moot point. The decline may have
been caused in part by a vicious spiral: as admissions fell, cinemas began to close,
and as fewer cinemas were operating there was less opportunity to see films, so
admissions fell further. The upturn in cinema admissions coincides with the
expansion of the exhibition sector. The first multiplex in the U.K. opened in 1986,
signalling the start of the growth in cinema screens (and the trend towards
multi-screen cinemas in the U.K.).

Figure 2. ACF plot of admissions.
The data were transformed into logarithms to account for the non-normality of
the data. The ARIMA methodology requires the data series to be stationary, as it
assumes that the process generating the series remains constant over time. An
Augmented Dickey–Fuller (ADF) test is usually performed to assess whether a
variable is stationary. However, given that the data show both a change in the
underlying trend and a structural shift, the ADF test is less powerful (see Note 1).
Hence the sample autocorrelation function (ACF) was also plotted (Figure 2), and
its pattern was used to judge whether the series is stationary. If the series were
stationary (as the ADF test suggested), the autocorrelations should die out quickly
as the lag increases. Instead, the plot shows significant correlations persisting over
many lags, which is indicative of non-stationarity. After first differences are taken,
only lags 1 and 2 are significant, with subsequent lags tailing off. Hence the series is
identified as I(1), and modelling proceeds on the differenced series.
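The identification step (inspect the ACF of the logged series, difference once, inspect again) can be sketched as below. The helper names are hypothetical. The ±1.96/√n band is the usual approximate white-noise benchmark drawn on ACF plots; it is assumed here rather than taken from the paper's software.

```python
import math


def difference(series):
    """First differences y_t - y_{t-1}; an I(1) series becomes stationary."""
    return [b - a for a, b in zip(series, series[1:])]


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag, using the usual estimator
    r_k = sum_t (y_t - ybar)(y_{t+k} - ybar) / sum_t (y_t - ybar)^2."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def significant_lags(series, max_lag, z=1.96):
    """Lags whose autocorrelation lies outside the approximate +/- z/sqrt(n)
    band shown on ACF plots (a white-noise benchmark)."""
    bound = z / math.sqrt(len(series))
    acf = sample_acf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > bound]
```

For example, `significant_lags(difference(log_adm), 16)` would list the lags outside the band for the differenced series. Here `log_adm` is a hypothetical list holding the logged CAA series, which is not reproduced here. A trending series in levels typically shows many such lags, which is the pattern described above.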
An initial specification for the ARIMA model can be obtained from the ACF and the
Partial Autocorrelation Function (PACF): the number of significant lags in the ACF
suggests how many moving average terms to include, and the number of significant
lags in the PACF suggests how many autoregressive terms, as summarised in Table II.
The ACF and PACF plots (Figure 3) show significant correlations at lags 1 and 2,
suggesting ARIMA (2,1,2) as an initial specification.
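Table II's rules can be checked numerically by computing the PACF from the ACF with the Durbin–Levinson recursion. This is one standard way of obtaining partial autocorrelations, not necessarily the one used by the paper's software. For a theoretical AR(1) process with φ = 0.5, the ACF tails off (0.5, 0.25, 0.125) while the PACF cuts off after lag 1, as the table predicts. The function names are hypothetical. Counting consecutive significant lags is a deliberately crude stand-in for reading the plots.

```python
def pacf_from_acf(acf):
    """Partial autocorrelations from autocorrelations acf[0..K] (acf[0] == 1)
    using the Durbin-Levinson recursion. Returns [1.0, phi_11, ..., phi_KK]."""
    pacf = [1.0]
    prev = []  # phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, len(acf)):
        num = acf[k] - sum(prev[j - 1] * acf[k - j] for j in range(1, k))
        den = 1.0 - sum(prev[j - 1] * acf[j] for j in range(1, k))
        phi_kk = num / den
        # Update lower-order coefficients: phi_kj = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
        prev = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


def count_leading_significant(values, bound):
    """Number of consecutive lags, starting at lag 1, lying outside +/- bound:
    a crude reading of where an ACF or PACF 'cuts off' (see Table II)."""
    count = 0
    for v in values[1:]:
        if abs(v) <= bound:
            break
        count += 1
    return count
```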
Table II. ARIMA model specification guide

Model         ACF                  PACF
AR(p)         Tails off            Cuts off at lag p
MA(q)         Cuts off at lag q    Tails off
ARMA(p, q)    Tails off            Tails off

Figure 3. Autocorrelation and partial autocorrelation function plots.

From this initial specification, three models can be identified on the basis of the
significance (at the 5% level) of the individual coefficients: ARIMA (1,1,1), ARIMA
(2,1,0) and ARIMA (0,1,2) (see Note 2). For notational convenience these models
are denoted I, II and III respectively below. Models I and II produced non-normal
errors owing to a single outlying value, the result of the structural shift in the data
when government-collected figures were replaced by industry-collected figures. This
was corrected by including an impulse dummy (1985 = 1) in the estimated models
(Model III did not require an impulse dummy to produce normal errors).
The models should produce residuals that are white noise. The independence of
the residuals can be tested using either the Box–Pierce Q test (Box and Pierce,
1970) or the Ljung–Box (LB) test (Ljung and Box, 1978), both of which produce test
statistics that are approximately χ² distributed. However, the LB test has better
small-sample properties (Gujarati, 1995), so it alone is employed here (see the
Appendix). For lag length K = 16 the LB test produces values of 8.111 for Model I,
7.549 for Model II and 12.379 for Model III, with 13 degrees of freedom. None of
these exceeds the 5% critical value of 22.36, suggesting that all three models
produce white noise residuals.
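The Ljung–Box statistic of equation (A.1) and its χ² p-value can be sketched with the standard library alone. The names are hypothetical. The χ² tail probability is computed from the series expansion of the regularized incomplete gamma function, an implementation choice made here to avoid external dependencies. The degrees of freedom are K minus the number of estimated ARMA (and dummy) terms. For Model I, ARIMA (1,1,1) with the 1985 dummy, this gives 16 − 1 − 1 − 1 = 13.

```python
import math


def sample_acf(series, max_lag):
    # Sample autocorrelations r_0..r_max_lag about the series mean.
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def ljung_box(residuals, max_lag):
    """LB = n(n + 2) * sum_{k=1}^{K} r_k^2 / (n - k), equation (A.1)."""
    n = len(residuals)
    r = sample_acf(residuals, max_lag)
    return n * (n + 2) * sum(r[k] ** 2 / (n - k) for k in range(1, max_lag + 1))


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variable, computed as
    1 - P(df/2, x/2), with P the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series: P(a, z) = z^a e^-z / Gamma(a) * sum_n z^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)
```

For example, `chi2_sf(12.379, 13)` gives a p-value far above 0.05 for Model III, consistent with the conclusion in the text.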
Table III. Goodness of fit statistics

Model      I       II      III
RMSPE      9.1%    9.3%    9.9%

If more than one specification satisfies the LB test, a number of other decision
criteria can be used to select the best specification for the model. A variety of
criteria can be found in the published ARIMA studies; which ones are employed and
reported appears to depend on personal preference (and on which statistics the
software package produces). One can measure goodness of fit to the observed series,
on the assumption that the closer the fit, the better the forecasts are likely to be. In
this study, goodness of fit is measured using the Root Mean Square Percentage Error
(RMSPE) of each model (defined in the Appendix). The lower the RMSPE, the better
the fit to the data. The RMSPE was computed after the fitted values were
transformed back into levels, and the results are shown in Table III.
As Table III shows, all three models perform reasonably well, with Models I and II
the best by a small margin. There is, of course, no guarantee that a model that
closely follows the observed series will also generate accurate forecasts. Hence, as
Pindyck and Rubinfeld (1991) suggest, where more than one specification satisfies
the assumptions about the residuals (i.e., produces random errors), the best
specification should be selected on the basis of its out-of-sample forecasting
accuracy.
The last three years of the sample (1997–1999) were excluded from the estimation
period, so that these observations could be used to test the accuracy of the models'
forecasts. A three-year test period was chosen to provide both a suitably long
estimation period and a test period large enough to draw conclusions from. The
three models were used to generate forecasts for 1997–1999, and their accuracy was
assessed using the mean absolute percentage error (MAPE), defined in the Appendix.
The lower the MAPE value, the more accurate the forecast; in general, a MAPE of
less than 10 per cent is regarded as highly accurate (Lewis, 1982). Accuracy was
evaluated after the predicted values were transformed back from logarithms to
actual values. The results are shown in Table IV.
The three-year forecasts produced by all three models underestimate admissions in
each of the three years. However, the last three years of the sample (kept back to
test the models' predictions) display a rather unusual pattern, with admissions of
139 million, 135 million and 139 million in 1997, 1998 and 1999 respectively.
Towards the end of the sample the series changes direction more frequently than in
earlier years. This may be the effect of particularly successful films drawing more
people to the cinema (e.g. Four Weddings and a Funeral, The Full Monty and
Titanic). On the basis of the MAPE values, Model I (the ARIMA (1,1,1) model)
clearly outperforms the other models. However, it should be remembered that any
measure of forecast accuracy is sensitive to the test period chosen.

Table IV. Three year forecasts

                                   Model I        Model II       Model III
Forecast (millions)        1997    126.6          121.9          122.4
                           1998    129.9          124.6          126.3
                           1999    132.8          125.0          126.3
Error (millions)           1997    12.4           17.1           17.6
                           1998    5.1            10.4           9.7
                           1999    6.2            14.0           13.7
Confidence interval        1997    104.3–153.7    100.6–147.8    100.2–149.6
(millions)                 1998    94.5–178.3     91.2–170.4     90.2–176.7
                           1999    85.7–205.9     79.2–197.3     78.7–202.5
MAPE                               5.7            10.0           9.2

Table V. One year forecasts

                                  Model I          Model II         Model III
Forecast (millions)               137.4            136.7            134.9
Error (millions)                  1.6              2.3              4.1
Confidence interval (millions)    114.16–165.37    113.33–164.90    110.27–164.96
1 year MAPE                       1.15             1.94             2.95
The models were re-estimated including 1997 and 1998 in the estimation period
to generate a 1-year forecast to assess the models’ shorter term forecasting ability.
The 1-year forecast values, errors, confidence intervals and MAPE values are pre-
sented in Table V. On the basis of the MAPE values, all three models perform well,
but again Model I outperforms the other specifications.
The forecasts are accompanied by fairly wide 95 per cent confidence intervals,
which suggest that only short-term forecasts can be made with any degree of
accuracy (longer-term forecasts compound the error, so the confidence intervals
widen as the forecast horizon lengthens). This also illustrates that measured
forecast accuracy depends in part on the forecast horizon.
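The forecast recursion for an ARIMA (1,1,1) model in logs shows why the intervals widen. The sketch below is illustrative only. The paper does not report the estimated coefficients, so `c`, `phi`, `theta`, `sigma` and the last residual are placeholders. The MA term is written with a plus sign, although many Box–Jenkins texts use a minus sign. The h-step forecast error variance is σ² times the sum of the first h squared ψ-weights. Because the model contains a unit root, those weights do not die out, so the intervals keep widening. Intervals built on the log scale become asymmetric once exponentiated. Table VI's bounds look consistent with this (for 2000, √(118 × 170) ≈ 142), although the paper does not say how they were computed.

```python
import math


def psi_weights(phi, theta, h):
    """psi_0..psi_{h-1} for ARIMA(1,1,1) written as
    (1 - phi B)(1 - B) y_t = c + (1 + theta B) e_t.
    The AR side expands to 1 - (1 + phi) B + phi B^2."""
    psi = [1.0]
    if h > 1:
        psi.append(1.0 + phi + theta)
    for j in range(2, h):
        psi.append((1.0 + phi) * psi[j - 1] - phi * psi[j - 2])
    return psi


def forecast_arima111(log_y, last_resid, c, phi, theta, sigma, h, z=1.96):
    """h-step forecasts in levels with approximate 95% intervals.

    log_y      : logged series (at least two observations)
    last_resid : estimated residual e_T for the last observation
    c, phi, theta, sigma : fitted constant, AR, MA and residual s.d.
    Returns a list of (point, lower, upper) tuples in levels."""
    y_last = log_y[-1]
    dy_prev = log_y[-1] - log_y[-2]
    psi = psi_weights(phi, theta, h)
    out = []
    var = 0.0
    for step in range(1, h + 1):
        # The MA term only enters the one-step-ahead change; later shocks have mean zero.
        dy = c + phi * dy_prev + (theta * last_resid if step == 1 else 0.0)
        y_last += dy
        dy_prev = dy
        # Forecast error variance accumulates: sigma^2 times the sum of squared psi weights.
        var += sigma ** 2 * psi[step - 1] ** 2
        half = z * math.sqrt(var)
        # Exponentiating the log forecast gives the median, not the mean, in levels.
        out.append((math.exp(y_last), math.exp(y_last - half), math.exp(y_last + half)))
    return out
```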
Table VI. Cinema admissions forecasts

Year    Forecast (millions)    Lower bound, 95% C.I.    Upper bound, 95% C.I.
2000    142                    118                      170
2001    144                    106                      196
2002    146                    95                       226
Figure 4. Cinema admissions, 1980–1999, and forecasts, 2000–2002 (millions).

5. Forecasted Admissions

The ARIMA (1,1,1) model is used to produce a three-year forecast (with the
previous test period now included in the estimation period), shown in Table VI.
It has been suggested that the growth in cinema admissions is not likely to
continue at its average rate over the past decade of approximately 3 per cent per
annum (Grummet and Couling, 2000). The growth in cinema screens has outstripped
the growth in admissions in recent years and does not seem sustainable. The
expectation is that the growth in admissions will slow, as the forecasting model
shows. As the test-period forecasting performance of the models and the widening
confidence intervals suggest, however, confidence can only be placed in the
one-step-ahead forecasts. This is clearly illustrated when the forecast values and
confidence intervals are plotted, as in Figure 4.
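A little arithmetic on Table VI makes the slowdown concrete. Starting from the 1999 outturn of 139 million, the point forecasts imply growth of roughly 2.2, 1.4 and 1.4 per cent a year. That compares with the approximately 3 per cent average of the past decade; continuing at 3 per cent would instead give about 151.9 million by 2002. The helper name is hypothetical.

```python
def annual_growth_pct(values):
    """Year-on-year percentage growth between consecutive values."""
    return [100.0 * (b / a - 1.0) for a, b in zip(values, values[1:])]


# 1999 outturn followed by the Table VI point forecasts for 2000-2002 (millions).
path = [139.0, 142.0, 144.0, 146.0]
print([round(g, 1) for g in annual_growth_pct(path)])  # [2.2, 1.4, 1.4]
# For comparison: three further years at the recent ~3 per cent average rate.
print(round(139.0 * 1.03 ** 3, 1))  # 151.9
```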

6. Conclusions
The ARIMA model used above is a fairly simple model of cinema admissions.
A more accurate model might be obtained through the use of less aggregated
(quarterly or monthly) data. In general, as Kim (1999) argues, monthly or quarterly
data may contain fewer distortions because of their lesser degree of aggregation.
Data on quarterly admissions have been collected and published for the U.K. by the
Office for National Statistics (ONS), but the series is somewhat erratic with no clear
seasonal pattern emerging (the ONS estimated seasonal factors are fairly small), so
there is less reason to expect it to produce a significantly more accurate model.
Disaggregated observations will also show the effects of individual films more
clearly: a blockbuster released in one quarter may induce a surge of admissions in
that quarter, so the error from a quarterly model may well be as great as, if not
greater than, that from an annual model. Hence, using less aggregated data could
introduce distortions into the model and could potentially encounter problems of
infinite variance.
The results above would appear to suggest that, even if the performance of
individual films cannot be forecast, the level of cinema admissions is forecastable,
at least in the short term. This might appear paradoxical: total admissions can be
forecast (at least 1 year ahead) while admissions to particular films cannot, yet film
choice is the driving factor behind cinema-going. Whilst film revenues have been
shown to appear heavy-tailed (De Vany and Walls, 1999; Lee, 1999), on the basis of
the results presented here annual admissions are not. The implications of the
stable-paretian distribution will only be important if word-of-mouth support for a
film encourages those who would not otherwise have attended to go to the cinema
(i.e., if it results in extra admissions). If word-of-mouth effects merely encourage
substitution of one film for another, there would be no effect on the level of
admissions and hence on the predictability of admissions. It is likely that both
effects occur, which would suggest that admissions may be forecastable, but that
such forecasts may be subject to large errors.

Acknowledgements
I would like to thank the two anonymous referees for their detailed and helpful
comments on earlier drafts of this paper.

Appendix: LB, RMSPE and MAPE Statistics


The LB test is defined as:

$$
\mathrm{LB} = n(n+2)\sum_{k=1}^{K}\frac{\rho_k^{2}}{n-k} \qquad \text{(A.1)}
$$

where n is the number of observations, K is the lag length and ρ_k is the sample
autocorrelation coefficient at lag k. The test statistic approximately follows the χ²
distribution with K − p − q degrees of freedom (i.e., the lag length K, minus the
numbers of autoregressive and moving average terms, p and q, minus any other
estimated variables, such as the 1985 dummy if included).
The RMSPE is defined as:

$$
\mathrm{RMSPE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(\frac{Y_t^{f} - Y_t^{a}}{Y_t^{a}}\right)^{2}} \qquad \text{(A.2)}
$$

where Y_t^f is the fitted value of Y in period t, Y_t^a is the actual value of Y in
period t, and T is the number of observations. The error is expressed in percentage
terms to allow easier comparison across models.
The Mean Absolute Percentage Error (MAPE) is used to assess the accuracy of
forecasts and is defined as:

$$
\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\frac{|e_t|}{N_t} \times 100 \qquad \text{(A.3)}
$$

where n is the number of forecasts, e_t is the forecast error and N_t is the actual
observation. Using absolute values prevents positive and negative forecast errors
from cancelling each other out and hence gives a better indication of forecast
accuracy.
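Equations (A.2) and (A.3) translate directly into code. The sketch below uses hypothetical function names and takes the forecast error as actual minus forecast, consistent with the errors in Table IV. With the rounded 1997–1999 outturns quoted in the text (139, 135 and 139 million), it reproduces Model I's three-year MAPE of 5.7.

```python
import math


def rmspe(fitted, actual):
    """Root mean square percentage error, equation (A.2), reported in per cent."""
    ratios = [(f - a) / a for f, a in zip(fitted, actual)]
    return 100.0 * math.sqrt(sum(r * r for r in ratios) / len(ratios))


def mape(forecasts, actual):
    """Mean absolute percentage error, equation (A.3): the mean of |e_t| / N_t
    times 100, where e_t = N_t - forecast_t."""
    return 100.0 * sum(abs(a - f) / a for f, a in zip(forecasts, actual)) / len(actual)


# Model I three-year forecasts (Table IV) against the 1997-1999 outturns.
actual = [139.0, 135.0, 139.0]
model_i = [126.6, 129.9, 132.8]
print(round(mape(model_i, actual), 1))  # 5.7, matching Table IV
```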

Notes
1. Whilst it is possible to reformulate the ADF test to accommodate structural breaks, a simpler
approach to testing for stationarity, based on the Autocorrelation Function (ACF), is used here.
This approach appears to be the more common one in the ARIMA literature.
2. In ARIMA notation, an ARIMA (p, i, q) model contains p autoregressive terms, is based on
data integrated of order i (i.e., the data are I(i)) and contains q moving average terms.

References
Allin, P. (1998) “Statistics on Film: What the Official Statistics Show”. Cultural Trends 30: 5–23.
Box, G.E.P. and Jenkins, G.M. (1976) Time Series Analysis: Forecasting and Control. Holden-Day,
San Francisco.
Box, G.E.P. and Pierce, D.A. (1970) “Distribution of Residual Autocorrelations in Autoregressive-
Integrated Moving Average Time Series Models”. Journal of the American Statistical Association 65: 1509–1526.
CAA (undated) Screens and Admissions History. CAA, London.
Cameron, S. (1986) “The Supply and Demand for Cinema Tickets: Some U.K. Evidence”. Journal
of Cultural Economics 10: 38–62.
Cameron, S. (1988) “The Impact of Video Recorders on Cinema Attendance”. Journal of Cultural
Economics 12: 73–80.
Cameron, S. (1990) “The Demand for Cinema in the U.K.”. Journal of Cultural Economics 14:
35–47.

Cameron, S. (1999) “Rational Addiction and the demand for the Cinema”. Applied Economics Letters
6 (9): 617–620.
CAVIAR Consortium (2000) CAVIAR 17. Cinema Advertising Association, London.
Dalrymple, K. and Greenidge, K. (1999) “Forecasting Arrivals to Barbados”. Annals of Tourism
Research 26 (1): 188–190.
De Vany, A.S. and Walls, W.D. (1999) “Uncertainty in the Movie Industry: Does Star Power Reduce
the Terror of the Box Office?” Journal of Cultural Economics 23 (4): 285–318.
Dharmaratne, G. (1995) “Forecasting Tourist Arrivals in Barbados”. Annals of Tourism Research
22 (4): 804–818.
Fernández-Blanco, V. and Baños Pino, J. (1997) “Cinema Demand in Spain: A Cointegration
Analysis”. Journal of Cultural Economics 21 (1): 57–75.
Grummet, K.P. and Couling, K. (2000) Cinemagoing 8. Dodona Research, Leicester.
Gujarati, D. (1995) Basic Econometrics, 3rd edition. McGraw-Hill, London.
Harris, R.I.D. (1995) Cointegration Analysis in Econometric Modelling. Prentice Hall, London.
Kim, J.H. (1999) “Forecasting Monthly Tourist Departures from Australia”. Tourism Economics
5 (3): 277–291.
Lee, C.H.K. (1999) “Heavy-Tailed Distributions and the Motion Picture Industry”, unpublished
paper, available at http://orion.oac.uci.edu/%7Ehlee2/papers.html.
Lewis, C. (1982) Industrial and Business Forecasting Methods. Butterworth Scientific, London.
Ljung, G.M. and Box, G.E.P. (1978) “On a Measure of Lack of Fit in Time Series Models”.
Biometrika 65 (2): 297–303.
Mandelbrot, B. (1963) “The Variation of Certain Speculative Prices”. Journal of Business 36 (4):
394–419.
Nolan, J.P. (1999) STABLE version 2.16 [online] available from http://www.cas.american.edu/~jpnolan
accessed 18/5/2000.
Pindyck, R.S. and Rubinfeld, D.L. (1991) Econometric Models and Economic Forecasts, 3rd edition.
McGraw-Hill, London.
Spraos, J. (1962) The Decline of the Cinema: An Economist’s Report. George Allen and Unwin,
London.
