The Costs of Growth: Accelerated Growth and Crowd-Out in The Mexican Supermarket Industry

International Journal of Industrial Organization 61 (2018) 1–52
Mauricio J. Varela
Department of Economics, University of Arizona, USA
Article info
Article history: Received 10 February 2017; Revised 16 August 2018; Accepted 22 August 2018;
Available online 28 August 2018
JEL classification: L81, L13, F61, F23
Keywords: Gradual growth, Entry, Supermarkets, Walmart, Mexico
Abstract
Retailers expand gradually into new markets. Policies that aim to facilitate or delay this
process must understand how firms differ in their ability to expand and how firms’ expansions
affect those of others. I propose a model that quantifies various explanations for gradual
expansion, with a focus on expansion costs: those due to the rate of outlet expansion. I
estimate the model using entry patterns in Mexico’s supermarket industry for the years
1996–2006. I find firms incur a 33% higher cost for opening the marginal store during the
expansion period rather than at the end of it. I simulate how the industry would have grown
under lower expansion costs and find that, although the industry would have added more
stores on average, there are outcomes in which certain firms’ accelerated expansion would
have crowded out other firms’ expansion, resulting in fewer long-run stores and lower
consumer welfare.
© 2018 Elsevier B.V. All rights reserved.
1. Introduction
When expanding their roster of outlets across geographic markets, retailers typically do
so gradually rather than all at once. Walmart’s expansion is a textbook example of
this gradual growth process. In this paper, I study some of the drivers of such gradual
growth, in the context of the Mexican supermarket industry. Specifically, is the observed
gradual expansion of supermarket stores in Mexico driven by firms’ constraints on how
many outlets can be opened in a given year, i.e. expansion costs as modeled in Baumol
(1962)? Alternatively, are the drivers of gradual expansion other factors such as demand
growth, changing competitive landscapes, increasing access to distribution, increasing
network economies, etcetera?
* I am very grateful to Guy Arie, Paul Greico, Kate Ho, Mike Mazzeo, Aviv Nevo, Mo Xiao and two
anonymous referees for their useful comments. All errors are my own.
E-mail address: mvarela@email.arizona.edu
https://doi.org/10.1016/j.ijindorg.2018.08.006
0167-7187/© 2018 Elsevier B.V. All rights reserved.
Stimulating business growth has always been a high priority in public policy. For
example, the current US administration implemented a “Two-for-One” executive order
that requires federal agencies to cut two existing regulations for every new regulation
they implement. Commenting on this executive order, President Trump stressed how the
purpose of this new measure is to ease the opening and expansion of small businesses
(cf. Lam, 2017). As for the social benefits of accelerating supermarket entry, it increases
employment (Basker, 2005), increases productivity and innovation of suppliers (Javorcik
et al., 2008; Iacovone et al., 2015), and displaces stores in the informal sector, increasing
tax revenue (Fuentes et al., 2008) and health safety. More importantly, it gives consumers
earlier access to products and services, generating welfare from trade. However, in order
to effectively stimulate supermarket expansion, it is important to understand the drivers
of gradual growth.
Crowd-out plays an important role in the growth process of industries. Depending
on the driver of gradual growth, accelerating growth can lead some firms to crowd-out
others, resulting in fewer offerings. In this paper, in addition to studying the drivers of
gradual growth in Mexico, I study how stimulating growth by easing expansion costs
could have resulted in fewer stores and higher market concentration due to crowd-out.
Section 2 illustrates these points with a simple theoretical model.
Much work exists1 on the drivers of gradual growth. In this paper, I focus on expansion
costs: the incremental costs firms incur from opening various stores simultaneously in a
year, relative to opening them sequentially over various years. However, I do not explore
the reasons behind such expansion costs: be it limits on managers’ ‘span-of-control’ (e.g.
Lucas, 1978), upward sloping supply curves for entrepreneurial talent, imperfect financial
markets, increasing company risks, etc. Instead, I contrast expansion costs as a whole with
drivers of store profitability: market demographics, the competitive landscape (Ellickson
et al., 2013; Maican and Orth, 2018), distribution economies (Holmes, 2011), density
economies (Jia, 2008; Nishida, 2015), and preemption and learning (Aguirregabiria and
Vicentini, 2016).
Once expansion costs and store profitability have been estimated, I simulate how
the industry would have evolved had expansion costs been reduced. These simulations
illustrate how accelerated growth can foster crowd-out, resulting in fewer stores. I also
simulate how the industry would have evolved had store profitability been improved (e.g.
through favorable taxation policies) and show how, for this particular setting, crowd-out
is more pervasive under this type of growth policy vis-a-vis policies that reduce expansion
costs (e.g. reducing regulations for store opening).
1 Penrose (1995) provides a summary of the classic models, and Sutton (2001), of more recent work.
In order to estimate expansion costs and store profitability, I propose a structural
model that builds on Bresnahan and Reiss (1990, 1991). In a one-shot game (i.e. a
static game), firms simultaneously choose how many stores to open and where to open
them. The cost of opening a store is increasing in the number of stores the firm opens,
capturing the expansion costs. The profitability of these stores is indexed by a rich set
of covariates that capture demand (i.e. market demographics), competition from both
inter-firm and intra-firm stores, density economies, distribution economies, and dynamic
incentives (e.g. future demand, learning, etc.). The model identifies expansion costs from
comparing firms’ entry decisions over time. More specifically, the model estimates per-store
entry costs for each year and infers expansion costs from the correlation between
these yearly costs and firms’ yearly number of stores opened. The yearly entry costs
are themselves identified from where firms choose to open stores, contrasting markets
in which firms opened stores to markets in which they did not. Importantly, as this
identification applies on a firm-by-firm basis, the model estimates entry costs for each
firm separately, therefore capturing heterogeneity that is important in analyzing crowd-
out under accelerated growth.
Why use a static model to estimate a growth process? With non-zero expansion costs,
geographic markets are interdependent: opening an additional store in Mexico City increases
the cost of opening a store in Tijuana. Hence, entry decisions cannot be modeled
market-by-market. The static model proposed in this paper captures these cross-market
interdependencies. A dynamic model creates interdependencies over time: a decision to
enter today affects the decision to enter tomorrow. A model that couples cross-market
interdependencies with intertemporal dependencies would be intractable. As the key focus
of this paper is to understand expansion costs, it is best to model cross-market
interdependencies explicitly and capture dynamic incentives with static controls.
I estimate the model using Mexico’s supermarket expansion from 1996 to 2006. It
is a time of expansion for both Walmart and other discount retail chains. I find firms’
expansion costs are sizable: firms incur a 33% cost increase by opening the marginal store
early on, rather than waiting until the end of the expansion wave. More importantly, there
is significant heterogeneity across firms: Soriana, Comercial Mexicana, and Walmart have
high expansion costs, whereas Gigante, Casa Ley, and Chedraui do not. In addition to
heterogeneity in expansion costs, there is also heterogeneity in competitive pressure:
Walmart is least affected by rival entry, with profits decreasing only fifteen percent upon
the entry of a rival firm’s store into the same market. In contrast, Soriana’s profits
decrease by twenty-five percent upon entry of a rival firm’s store. This heterogeneity
comes into play when simulating industry expansion under relaxed expansion costs.
I simulate the industry’s expansion had expansion costs been ten percent lower than
the estimated values. These simulations show how reducing expansion costs would have
likely resulted in the opening of more stores, in earlier store openings, and in lower long-run
concentration. However, the simulations also show how, under certain scenarios, reducing
expansion costs would have resulted in negative long-run effects, with the expansion
period ending with fewer stores and more concentrated markets than actually occurred,
i.e. without reducing expansion costs. These simulations illustrate both the benefits and
pitfalls of accelerating firm growth: consumers get earlier access to firms’ products but
firm crowd-out may lead to fewer products in the end.
Previous work on Walmart entry has focused on its impact on the labor market
(Basker, 2005) and on drivers of the geographic location of stores: economies of scale
in distribution (Holmes, 2011), regional complementarities (Jia, 2008), and regional ri-
valry (Ellickson et al., 2013). On the international front, Maican and Orth (2018) study
how licensing fees affect the Swedish grocery industry and Nishida (2015) the impact of
density economies in Japanese convenience stores. While my model borrows from these
papers, I focus on the rate of store entry and on expansion costs. By focusing on the
rate of store entry, I shed light on the growth process of firms. In doing so, however, I
impose modeling limitations relative to prior literature: I do not have a dynamic model
as in Holmes (2011) and I do not endogenize regional complementarities as in Jia (2008).
These trade-offs are necessary for tractability and I discuss their impact on the model.
The hope is that they allow me to say something about expansion costs and inform the
discussion on long-run impacts of accelerating firm growth in new industries.
To better understand the potential for crowd-out, the next section presents a simple
model of store expansion and illustrates how reducing expansion costs can result in
fewer products being offered. Section 3 provides necessary background on the Mexican
supermarket industry and describes the available data. A model of store entry, tailored to
the Mexican supermarket industry, is presented in Section 4. Afterwards, in Section 5, I
introduce an estimator that greatly simplifies the estimation process of the entry model.
However, for the sake of clarity, this estimator is introduced using a simple two-player
coordination game. Section 6 applies this estimator to the model of store entry, with
results shown in Section 7. Finally, Section 8 simulates the expansion of the Mexican
supermarket industry under alternative expansion costs and illustrates the role of crowd-out
in the Mexican supermarket industry. Conclusions follow.
2. Understanding expansion costs and crowd-out
In this section I present a simple theoretical framework to understand how expansion
costs shape long-run market structures and how reducing such costs can result in fewer
net offerings due to crowd-out.
Assume there are two firms, A and B, both of which have access to the same two
identical markets, M and N. Firms make entry decisions over the course of two periods:
in the first period firms choose, simultaneously, whether to enter both markets, only
market N, only market M, or neither. After first-period choices are made and observed
by everyone, in a second period, firms simultaneously choose whether to enter any markets
they themselves had not entered in the prior period.
The cost of entering any one market in any given period is given by $f_A$ and $f_B$. The
cost of entering both markets simultaneously is given by $F_A$ and $F_B$. Assume expansion
costs are non-negligible: $F_i > 2f_i$ for $i \in \{A, B\}$. That is, it is more costly to enter both
markets simultaneously than to do so sequentially.
Immediately upon entering a market, firms earn a per-period revenue stream from
such market, given by $R_A$ and $R_B$ if facing no competition and by $r_A$ and $r_B$ if the rival
firm has also entered such market. Assume monopoly revenue is higher than duopoly
revenue: $R_A > r_A$ and $R_B > r_B$. In addition, assume no discounting. Hence, as an example,
the profit for firm A from entering market M in the first period, market N in the second
period, and having firm B enter no market ever, is $3R_A - 2f_A$, where $3R_A$ captures the
first-period revenue in market M and the second-period revenue in markets M and N.
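The payoff accounting described above can be sketched in a few lines of Python. This is only an illustration: the function name `firm_profit`, the dictionary encoding of entry plans, and the numeric values are assumptions made here, not part of the paper.

```python
MARKETS = ("M", "N")
PERIODS = (1, 2)


def firm_profit(own_entry, rival_entry, R, r, f, F):
    """Undiscounted two-period profit for one firm.

    own_entry / rival_entry map a market to its period of entry (1 or 2);
    a market that is never entered is simply absent from the mapping.
    R and r are per-period monopoly and duopoly revenues; f is the cost of
    entering one market in a period, F the cost of entering both at once.
    """
    revenue = 0.0
    for market in MARKETS:
        own_period = own_entry.get(market)
        if own_period is None:
            continue  # the firm never enters this market
        rival_period = rival_entry.get(market)
        for period in PERIODS:
            if period < own_period:
                continue  # not yet in the market
            rival_present = rival_period is not None and rival_period <= period
            revenue += r if rival_present else R

    cost = 0.0
    for period in PERIODS:
        entries = sum(1 for p in own_entry.values() if p == period)
        if entries == 1:
            cost += f
        elif entries == 2:
            cost += F
    return revenue - cost


# Illustrative (made-up) values satisfying F > 2f and R > r.
sequential = firm_profit({"M": 1, "N": 2}, {}, R=10, r=6, f=5, F=25)
simultaneous = firm_profit({"M": 1, "N": 1}, {}, R=10, r=6, f=5, F=25)
print(sequential, simultaneous)
```

With these illustrative numbers the sequential plan yields $3R_A - 2f_A = 20$, matching the example in the text, while simultaneous entry yields $4R_A - F_A = 15$.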
In order to illustrate how decreasing expansion costs can result in less net entry, the
following assumption characterizes firm A as a strong firm that prefers sequential entry
and firm B as a weak firm that requires monopoly rents to justify entry costs:
Assumption 1. Revenues and costs are such that
1. Firm A prefers sequential entry but always enters: $R_A < F_A - 2f_A$ and $r_A > f_A$
2. Firm B requires monopoly rents and never enters simultaneously: $\frac{1}{2} F_B > R_B > f_B > r_B$
Given these assumptions, in all sub-game perfect Nash equilibria (SPNE), firms A and
B enter opposing markets in the first period and, in the second period, firm B enters no
market and firm A enters its remaining market. Thus, both markets have one firm in the
first period and one market has two firms in the second period.
It is useful to understand why the SPNE is such. First, note how entering both markets
in the first period dominates doing so in the second period. Second, Assumption 1 guarantees
firm A enters any remaining markets in the second period. Because of this, and
since firm B needs monopoly rents to justify entry, firm B never enters in the second
period. Firm B also never enters both markets simultaneously in the first period (by
Assumption 1). Hence, in any equilibrium, firm B enters at most one market, and does
so in the first period. As none of firm A’s first-period actions can alter firm B’s second-period
actions, i.e. firm B never enters in the second period, firm A has no preemption
benefits from simultaneously entering both markets in the first period. As firm A prefers
sequential entry (by Assumption 1), in equilibrium, firm A enters sequentially. Finally,
as firm A’s profits are higher when coordinating first-period entry away from firm B’s
first-period entry, both firms enter opposing markets in the first period.
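Firm A's comparison in this argument can be written out explicitly. The following is a sketch that takes as given, as in the reasoning above, that firm B enters market N in the first period and never enters in the second. The labels $\pi_A^{MN}$, $\pi_A^{NM}$ and $\pi_A^{sim}$ are introduced here for illustration only.

```latex
\begin{align*}
\pi_A^{MN}  &= \underbrace{R_A}_{\text{period 1}} + \underbrace{(R_A + r_A)}_{\text{period 2}} - 2f_A
             = 2R_A + r_A - 2f_A
             && \text{(enter $M$ first, then $N$)} \\
\pi_A^{NM}  &= r_A + (r_A + R_A) - 2f_A = R_A + 2r_A - 2f_A
             && \text{(enter $N$ first, then $M$)} \\
\pi_A^{sim} &= 2(R_A + r_A) - F_A
             && \text{(enter both in period 1)} \\[4pt]
\pi_A^{MN} - \pi_A^{sim} &= (F_A - 2f_A) - r_A > 0
             && \text{since } r_A < R_A < F_A - 2f_A \\
\pi_A^{MN} - \pi_A^{NM}  &= R_A - r_A > 0
             && \text{since } R_A > r_A
\end{align*}
```

Under these premises, firm A's best plan is to enter the market firm B did not choose first and the remaining market second, which is the sequential, opposing-market outcome stated above.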
Let us now contrast these equilibrium outcomes with the equilibrium outcomes that
would have arisen absent expansion costs. That is, assume $F_i = 2f_i$ for $i \in \{A, B\}$ and replace
Assumption 1 with a new assumption that accommodates the absence of expansion
costs, but otherwise retains the same premises:
Assumption 2. Revenues and costs are such that firm A always enters (i.e. $r_A > f_A$) and
firm B requires monopoly rents to enter: $R_B > f_B > r_B$
Under these new assumptions, all entry occurs in the first period. However, to fully
characterize the SPNE of this new game, one needs to additionally specify whether two-period
duopoly rents are sufficient to justify firm B’s entry. That is, if $2r_B$ is smaller than
$f_B$ there is a unique equilibrium in which firm A enters both markets in the first period
and firm B enters neither. On the other hand, if $2r_B$ is larger than $f_B$, both firms enter
both markets in the first period.
In summary, whenever $2r_B$ is less than $f_B$, reducing expansion costs results in only
firm A entering both markets. No market ever becomes a duopoly nor do customers ever
have access to firm B. This occurs because firm A effectively crowds out firm B: firm A’s
early entry in both markets doesn’t allow firm B to ever enjoy monopoly profits, making
it unattractive for firm B to enter. On the other hand, whenever $2r_B$ is larger than $f_B$,
reducing expansion costs results in both markets becoming duopoly markets in the first
period: more entry occurs and it occurs sooner.
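The role of the threshold between $2r_B$ and $f_B$ can be checked by enumerating firm B's entry plans. The sketch below is illustrative only: it takes firm A's strategy as given (A is in both markets from the first period, as the text describes for the case without expansion costs), and the function names and numeric values are assumptions introduced here.

```python
from itertools import product

MARKETS = ("M", "N")
ENTRY_OPTIONS = (1, 2, None)  # enter in period 1, enter in period 2, or never


def plan_profit_vs_incumbent(plan, r_B, f_B):
    """Firm B's profit from an entry plan when firm A already occupies both
    markets from period 1 and there are no expansion costs (F_B = 2 * f_B),
    so every entry costs f_B and every active period earns duopoly revenue r_B."""
    profit = 0.0
    for entry_period in plan:
        if entry_period is None:
            continue
        periods_active = 3 - entry_period  # 2 periods if entering in period 1, else 1
        profit += periods_active * r_B - f_B
    return profit


def best_plan(r_B, f_B):
    """Return firm B's profit-maximizing plan as (period for M, period for N)."""
    plans = product(ENTRY_OPTIONS, repeat=len(MARKETS))
    return max(plans, key=lambda plan: plan_profit_vs_incumbent(plan, r_B, f_B))


# Illustrative (made-up) values on either side of the threshold 2 * r_B = f_B.
print(best_plan(r_B=2, f_B=5))  # 2 * r_B < f_B: firm B stays out of both markets
print(best_plan(r_B=3, f_B=5))  # 2 * r_B > f_B: firm B enters both in period 1
```

When $2r_B < f_B$ every plan involving entry loses money, so staying out is optimal and firm B is crowded out; when $2r_B > f_B$ entering both markets in the first period is optimal.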
There are two additional points worth noting. First, crowd-out is not preemption:
neither firm’s choices in period one affect its rival’s choices in period two. Second, foresight
is not required for crowd-out to occur: if firms were myopic and firm A prefers sequential
entry under myopia, i.e. if $R_A < \frac{1}{2} F_A$ replaces $R_A < F_A - 2f_A$ in Assumption 1 (1), the
resulting equilibrium is one where both firms enter opposing markets in the first period
and firm A enters the remaining market in the second period. Reducing expansion costs,
so that $F_A$ is decreased to $2f_A$, results in a new equilibrium in which firm A enters both
markets in the first period and firm B enters neither, ever. As firm B is also myopic, the
equilibrium is irrespective of the relationship between $2r_B$ and $f_B$.
3. Data and industry background
3.1. Overview
The principal data source is the National Retailers’ Association (i.e. ANTAD) annual
directories, from 1995 to 2006. The directories contain store counts, by store brand and
by municipality. Walmart data missing in the ANTAD directories was acquired from
Walmex and complemented with maps in Iacovone et al. (2015) and annual reports. All
major Mexican chain retailers are included in the directories.
This study focuses on self-service supermarket, hypermarket, and bodega stores, as
defined by the ANTAD. These stores sell a wide range of groceries and general mer-
chandise and have sales floor size generally greater than 25 thousand square feet. For
simplicity, I refer to this sector as the supermarket industry.
The period under study, 1995 through 2006, includes the years that follow the signing
of NAFTA, which Durand (2007) and Iacovone et al. (2015) suggest was a major driver
behind the expansion of discount retailers in Mexico. The study ends in 2006, the last
year for which the ANTAD directories include store counts at the municipality level.
Fig. 1. Store counts, by firm and year.
I aggregate store counts across all brands owned by the same holding company. For
example, I aggregate all of Walmart’s stores, regardless of store name (i.e. Supercentro,
Bodega Aurrera, and Superama), under a single corporate identifier: Walmart. I use the
term firm to denote each distinct holding company, and focus on the largest six companies in
the supermarket industry: Walmart (WM), Grupo Gigante (GG), Comercial Mexicana
(CM), Soriana (SOR), Casa Ley (LEY), and Chedraui (CHD). These companies have a
national footprint and significant expansion activity. For example, by 2006, each of them
operated more than one hundred stores and had been opening stores at an average pace
of eleven stores per year. In contrast, the largest firm excluded from this sample, HEB,
operated only 25 stores at the time and had been opening stores at a rate of fewer than two
stores per year.2 Supermarket stores from firms other than the top six aforementioned
are aggregated into a single composite count labeled other. Store opening decisions of
the firms aggregated into other are not directly modeled, but taken as exogenous.3 The
appendix contains a detailed list of these firms.
Fig. 1 shows nationwide store counts, by firm and year, for the top six firms. There
is tremendous growth during this time, with the number of stores more than doubling
over the eleven-year span. However, the expansion is gradual: store counts increase every
year, without any year standing out as an exceptional growth year.
2 Carrefour operated 29 hypermarkets prior to being sold to Chedraui in 2005. Waldo’s Dolar Mart,
although classified by the ANTAD as a supermarket, has a business model closer to that of a convenience
store, so I exclude it from the analysis.
3 Endogenizing entry decisions for these small players deeply complicates the analysis and adds little
identifying variation, as store openings and closings are rare.
3.2. Markets
Defining the relevant economic market is always a challenge in these studies. Stores
cater to consumers within localized neighborhoods. However, both ANTAD and the Cen-
sus report store counts and demographics at the coarser municipality level. Mexican
municipalities, similar in size to a US county, are mutually exclusive and exhaustive
geopolitical constructs. Most population centers (i.e. towns) are fully contained within a
municipality, with the exception of metropolitan areas. As data is reported at the municipality
level and municipalities contain the localized neighborhoods to which stores cater, I
define the geographic scope of an economic market to be a municipality.4 This definition
is similar to those used in Basker (2005) and in Ellickson et al. (2013) when analyzing
discount retailers. I include in the analysis all municipalities reported in the ANTAD
directories and all municipalities that, at any point during the sample period, contained
a population center with more than 35 thousand people. Given this definition, there are
332 markets, with a median population of 71 thousand inhabitants.
Stores operating in municipalities that are part of larger metropolitan areas as well as
municipalities with an international border can experience profitability spillovers from
adjacent municipalities. Hence, I obtain the list of municipalities belonging to a metropolitan
area from the census bureau (cf. Consejo Nacional de Poblacion and Secretaria de
Desarrollo Social, 2007) and the list of municipalities sharing a port of entry with either
the USA, Guatemala, or Belize from the Mexican State Department, i.e. Secretaria de Relaciones
Exteriores. These municipalities receive special attention in the empirical analysis
below.
3.3. Variables that proxy for supermarket store profitability
I consider urban population counts at a municipality level as the main demand shifter
for supermarket stores. The Mexican census bureau, INEGI, reports population counts
at the locality level in each five-year census. A locality is a subdivision of a municipality
constructed to encompass a single population center. Only data from urban localities
is retained. Data for the years between censuses is interpolated using CONAPO’s5
state-wide yearly projections; cf. the appendix for details.
I also obtain variables pertaining to income, education, employment, urbanization,
tourism, and industry output, all of which proxy for supermarket demand. Specifically,
from the censuses I obtain the fractions of the urban population that (a) work in the
formal sector, (b) work in government jobs, (c) are adults, and (d) are retirees. I also
obtain the fraction of the adult urban population that have a middle school degree and
the average years of schooling among adults. I proxy for poverty with the fraction of
houses that have access to public utilities, i.e. water, sewage, and electricity. I calculate
population density using the land area of all urban localities reported in INEGI’s Marco
Geoestadistico 2005. I create a measure of urbanization akin to the Herfindahl index: the sum
across localities of the square of each locality’s population share relative to the municipality’s
total population. I use INEGI’s biennial income and expenditure survey to obtain
income per-capita, and obtain yearly GDP at the state level from the central bank. I
convert all currency values into real 2010 USD using average yearly exchange rates and
the US CPI. The appendix contains details on interpolations used to fill in between census
years and between survey years. Finally, I classify a municipality as a touristic destination
if it is among the top 70 destinations in the Mexican Secretary of Tourism’s report
of hotel occupancy (cf. Secretaria de Turismo, 2015).
4 Metropolitan areas are defined as population centers spanning more than one municipality (e.g. 75
municipalities make up Mexico City; cf. Consejo Nacional de Poblacion and Secretaria de Desarrollo Social,
2007). Hence, defining economic markets by municipality rather than by metropolitan area is a narrower
definition, and one that is closer to approximating the localized demand for a given supermarket store.
Indeed, the average population density in Mexican metropolitan areas is 32 thousand people per sq. mile,
almost twice that of San Francisco.
5 Consejo Nacional de Poblacion: the agency that oversees population dynamics.
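The Herfindahl-style urbanization measure described above can be sketched as follows. The function name, its arguments, and the example populations are hypothetical; whether the denominator is the sum of the listed localities or a separately reported municipal total is left as an option because the text does not spell this out.

```python
def urbanization_index(locality_populations, municipality_population=None):
    """Sum of squared locality population shares within one municipality.

    locality_populations: populations of the localities in the municipality.
    municipality_population: total used as the denominator; if omitted, the
    sum of the listed localities is used (an assumption made for this sketch).
    """
    total = (municipality_population
             if municipality_population is not None
             else sum(locality_populations))
    if total <= 0:
        raise ValueError("municipality population must be positive")
    return sum((pop / total) ** 2 for pop in locality_populations)


# Hypothetical municipalities: one dominated by a single town, one evenly split.
print(urbanization_index([100_000]))         # a single population center
print(urbanization_index([50_000, 50_000]))  # two equally sized centers
```

The index equals 1 when the whole population lives in one locality and falls toward 0 as the population spreads evenly across many localities, which is consistent with the [0,1] range reported for Urbanization in Table 1.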
The profitability of non-supermarket retailers (department stores, pharmacies, and
convenience stores) correlates with that of supermarkets, as they share common inputs
and cater to similar customers. As the ANTAD directories also contain store counts
for these retailers, I include the number of such stores, aggregated across all retailers
of the same class, as a proxy for supermarket profitability. In calculating these store
counts, I include only those retailers for which data is available in all eleven years.6
I use a logistic transformation on these counts to prevent outliers from driving most
variation: if $x_{mt}$ is the count of pharmacies in market-year $mt$, the transformed variable
is $\tilde{x}_{mt} = x_{mt} / \left(x_{mt} + \frac{1}{MT} \sum_i x_i\right)$.
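As a minimal sketch, the transformation divides each count by itself plus the sample mean of all counts, so values lie in [0, 1) and a count equal to the mean maps to 0.5. The function name and the example counts are hypothetical.

```python
def logistic_transform(counts):
    """Map each count x to x / (x + mean(counts)).

    counts: store counts, one per market-year observation.
    """
    if not counts:
        return []
    mean_count = sum(counts) / len(counts)
    transformed = []
    for x in counts:
        denominator = x + mean_count
        # If every count is zero the ratio is undefined; return 0.0 in that case.
        transformed.append(x / denominator if denominator > 0 else 0.0)
    return transformed


# Hypothetical pharmacy counts with one large outlier.
print(logistic_transform([0, 2, 4, 50]))
```

Because the transformed value approaches 1 only asymptotically, a market with an unusually large count (such as the hypothetical 50 above) cannot dominate the variation in the regressor.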
Profit variables are merged with the ANTAD supermarket store counts to form a
single data set in which the unit of observation is a firm-market-year triplet. Table 1
summarizes these variables, segmenting them across two categories, according to where
firms entered. Thus, the average population of markets (i.e. of municipalities) in which
firms enter is 441 thousand inhabitants, while that in which firms do not enter is 169
thousand. The difference in means is suggestive of how population is an important factor
for determining store entry. Similarly, income, department stores, and pharmacies appear
to be important factors impacting store entry.
Market structure also plays an important role in store profitability. Rivals’ stores
can decrease profitability of a firm’s stores through competition. Moreover, a firm’s own
stores can cannibalize each other’s profitability, such that a firm’s total profitability in
a given market is concave in the number of stores that firm operates in such market.
With this in mind, I calculate the number of stores a firm has in a given market prior
to adding/removing stores and the number of rivals’ stores in that market. For rivals’
store counts, I use end-of-year values to account for rivals’ store openings and closures
done during that year. These values, displayed in Table 1, show that firms prefer to add
stores in markets with already many stores. This does not immediately imply stores are
complements to each other, as there may be other factors that make certain markets
attractive, inducing repeated entry over time. Thus, I also calculate population-per-store
as a variable that reflects demand for the marginal store, after controlling for existing
stores. Not surprisingly, firms enter markets with higher values, preferring markets with
fewer own and rival stores.
6 Participation in the ANTAD directories is voluntary. By considering only those retailers present across
all years, I avoid considering as entry and exit what is simply intermittent participation by some retailers.
The included firms are, by sector: (a) convenience stores: OXXO, 7–11, and Extra; (b) department stores:
Del Sol, Fabricas de Francia, Las Galas, Liverpool, Roberts, Sanborns, Sears, Woolworth, and Zara; (c)
pharmacies: ABC, Benavides, COFAR, El Fenix, Farmacias Guadalajara, San Francisco de Asis, and Yza.
Table 1
Profitability factors in markets with and without entry. An observation is a firm-market-year triplet.

Variables common to all firms                               With entry      Without entry
No. of observations                                         626             22,804
Population ('000 hab)                                       441 (411)       169 (255)
Urbanization ([0,1])                                        0.84 (0.23)     0.73 (0.27)
Income (US$ / month / hab)                                  946 (518)       687 (601)
Gvt. employees (per 1000 hab)                               127 (86.2)      112 (90.8)
Pop. density (hab / sq km)                                  5.27 (3.73)     3.89 (3.12)
Access to public utilities (%)                              82.9 (11.4)     71.6 (19.2)
Dept. stores (* count)                                      0.42 (0.39)     0.14 (0.29)
Convenience stores (* count)                                0.27 (0.38)     0.13 (0.27)
Pharmacies (* count)                                        0.44 (0.32)     0.21 (0.27)
Tourist city (dummy)                                        0.65 (0.48)     0.41 (0.49)
Border city (dummy)                                         0.08 (0.27)     0.04 (0.20)

Variables specific to each firm                             With entry      Without entry
No. of observations                                         626             22,804
Own stores (#, start-of-year)                               1.93 (3.18)     0.34 (1.27)
Rival stores (#, end-of-year)                               5.33 (6.00)     1.96 (4.22)
No prior own store (dummy)                                  0.49 (0.50)     0.85 (0.35)
Pop / own stores ('000/store)                               253 (235)       161 (202)
Pop / rival stores ('000/store)                             107 (104)       83.8 (59.9)
Distance to nearest market with stores (mi)                 65.5 (59.4)     123 (157)
Centrality: sum of inverse distance to stores (stores/mi)   0.44 (0.49)     0.37 (0.47)
Distance to existing DC (mi)                                200 (246)       327 (295)
Distance to future DC (mi)                                  164 (221)       284 (276)
Age of oldest store (years) **                              15.0 (11.2)     11.5 (11.3)

Standard deviations in parentheses. (* count) Count variables shown under logistic transformation: f(x) =
x/(x + E[x]). (**) Age of oldest store shown for markets where firm has prior stores, reducing the number
of observations to 320 for the With Entry column and to 3337 for the Without Entry column. Distance to
Future DC is the distance to the nearest distribution center that will be operational within the following
three years.
Fig. 2. Evolution of variables affecting store profitability. Yearly average values indexed to 1996.
Both distribution and density economies matter in supermarket retail (cf. Holmes,
2011; Jia, 2008). To account for distribution economies I obtain the distance from markets
to the nearest distribution center (DC). I obtain DCs’ locations from firms’ webpages
and annual reports. DCs for both perishable and non-perishable goods are included and
treated equally. As firms may open stores in anticipation of future distribution centers,
I also calculate the distance from markets to the nearest distribution center that will
be operational within the following three years. I proxy for density economies with two
variables: (a) the distance to the nearest municipality in which the firm has stores, and
(b) the sum of the inverse distance to all of the firm’s stores. All distances are
calculated ‘as-the-crow-flies’ between municipality centers.
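The two density proxies can be sketched with a great-circle (haversine) distance between municipality centers. Several details here are assumptions, since the text does not give them: the haversine formula as the implementation of "as-the-crow-flies", the Earth radius constant, the exclusion of stores at zero distance, and all function and argument names.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def density_proxies(market_center, firm_store_locations):
    """Return (distance to nearest municipality with stores, centrality).

    market_center: (lat, lon) of the municipality being evaluated.
    firm_store_locations: list of ((lat, lon), n_stores) for the other
    municipalities in which the firm operates stores.
    Centrality is the sum over stores of 1 / distance (stores per mile).
    """
    nearest = math.inf
    centrality = 0.0
    for (lat, lon), n_stores in firm_store_locations:
        distance = great_circle_miles(market_center[0], market_center[1], lat, lon)
        if distance == 0:
            continue  # assumption: skip zero distances to avoid dividing by zero
        nearest = min(nearest, distance)
        centrality += n_stores / distance
    return nearest, centrality


# Hypothetical coordinates: two stores one degree east, one store two degrees east.
print(density_proxies((0.0, 0.0), [((0.0, 1.0), 2), ((0.0, 2.0), 1)]))
```

A firm with no stores elsewhere gets an infinite nearest distance and zero centrality under this sketch; how such cases are treated in the estimation is not stated in the text.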
Finally, the 2006 and 2007 ANTAD directories include the date on which stores were
opened. I use this to build the age of the oldest store in the market, where a young age
captures the value of learning about the local market and an old age captures shifting
demographics within a municipality.
This paper argues that expansion costs are a large factor behind the gradual growth
process of firms. However, they are not the only factor. Fig. 2 shows the evolution, over time,
of some of the variables discussed. The gradual growth of these variables may also be
causing gradual growth of stores: markets slowly become attractive over time, thus firms
gradually open stores in those markets, waiting until the market is sufficiently profitable
to add the next store. There is significant growth in variables that ought to be related to
store profitability: income grows by 85% and the number of department stores by 35%.
Similarly, economies of distribution and density improve significantly, with distance to
distribution centers dropping by 36% and distance to nearest market with stores by 39%.
The other profit variables included in Table 1 also have significant variation over time,
but are not shown in Fig. 2 for readability.
3.4. Industry practices for store opening
Having interviewed employees of ANTAD and of select firms, I learned that decisions
for store opening are made by a board of directors or a vice president and are revised once
every twelve or eighteen months, depending on the firm. During these twelve months,
a real estate department is constantly searching for optimal locations, relying on infor-
mation from regional managers, store managers, and suppliers. Having found a suitable
location, the real estate department negotiates purchase of the land and obtains all
required permits. This process is complex, as it requires reaching agreements with vari-
ous levels of government, suppliers, landowners, financial institutions, etc. Firms finance
land purchase and store construction with bank loans and internal cash flows; exter-
nal investors and stock sales are rarely utilized. Hence, limited cash flows and limited
managerial talent can constrain how quickly firms expand.
On a few occasions, firms have expanded through mergers & acquisitions: Gigante pur-
chased Azcunaga (10 stores, 2000) and Super Maz (11 stores, 2001), Chedrahui purchased
Carrefour (29 stores, 2005), Comercial Mexicana purchased Kmart (2 stores, 1997) and
Auchan (2 stores, 2003), and Walmart purchased Almacenes Chalita (4 stores, 2005).
These decisions were taken by the board of directors and took priority over individual
store expansion choices.7
3.5. Store openings
I infer store openings and closures by differencing store counts year to year. I do not
consider changes in store counts from mergers as store openings and instead take them
as exogenous to this study, as the decision-making behind M&A is very different from,
and takes priority over, that for store opening. On a few occasions, a firm will close a
store one year and open another the following year. I do not consider these shifts as entry
and exit, but as the result of store refurbishing or store repositioning within a market.
Fig. 3 illustrates the main premise of this paper. It is a scatter plot where each dot
represents a firm-year pair. The x-axis plots the number of stores that firm opened that
year, in logs. The y-axis plots, in logs, the population-per-store averaged across the
markets in which the firm opened stores. This captures how large demand must have
been to justify the cost of opening a store. The key takeaway is that the cost of opening
a store, as proxied by the average population-per-store, is increasing in the total number
of stores opened in the same year.
The above scatter plot relies on population-per-store to proxy for store profitability,
and, by revealed preference, for the cost of opening a store. However, many factors
drive store profitability. Table 1 shows summary statistics for some of these factors,
segmented by markets in which store openings occurred and those in which they did not.
This covariance, between profit factors and store entry choice, is precisely what identifies
how profit factors determine store profitability. Not surprisingly, store openings occur
in markets with high demand: population, income, education, and employment. Store
openings also occur in markets with favorable cost structures, such as short distances to
distribution centers and to other markets in which the firm operates.
7
For example, Chedrahui capitalized on Carrefour’s global restructuring and acquired all of Carrefour’s
stores upon their decision to exit the Mexican market (Tamayo, 2005). Gigante acquired Super Maz with
the intent of establishing a footprint and distribution in southern Mexico (Grupo Gigante, 2001).
Fig. 3. Relationship between profitability in markets where stores were opened and nationwide number of
stores opened. An observation is a firm-year. The vertical axis measures the average population-per-store in
markets where the given firm opened stores in the given year. Population-per-store divides population by
the sum of the firm’s own stores prior to entry and rivals’ stores after entry. The horizontal axis measures the
total count of stores opened by the given firm in the given year. Both axes are shown in log-scale. A positive
relation suggests that opening many stores simultaneously is costlier than opening them sequentially.
4. A model of store opening
I extend the model in Bresnahan and Reiss (1991) to account for multi-business-unit
firms,8 with the objective of estimating a cost of opening a store that is increasing in the
number of stores opened. Increasing costs introduce strategic interactions across markets,
as opening in a market increases the cost of opening simultaneously in another. Although
these inter-market interactions add complexity to the model, they are necessary to
capture the opportunity cost of opening in one location relative to another.
As in Bresnahan and Reiss (1991), I model competition as a static, one-shot game.
Therefore, payoffs ought to be interpreted as the discounted sum of all future payoffs,
including expectations on future actions.
8
Previous extensions of Bresnahan and Reiss (1991) involved firms choosing a single location on an
exogenous choice set of vertically (Mazzeo, 2002) or horizontally (Seim, 2006) differentiated locations; firms
choosing multiple entry in a single location (Ishii, 2008); and, as this paper does, firms choosing multiple
entry in multiple locations (Ellickson et al., 2013; Holmes, 2011; Jia, 2008; Nishida, 2015).
4.1. Game setup
The industry is composed of I firms, all competing across the same M markets. In each
market, firms are endowed with prior stores, ranging in number from zero to infinity. The
game consists in firms simultaneously choosing how many stores to add (i.e. open) or
remove (i.e. close) in every market. Firms cannot remove more than the endowed stores,
but can add as many as they like. Firms’ profits are determined by how many stores
they have after adding and removing stores, as well as by how many stores they add: i.e.
store opening costs. Finally, the game is one of complete information: all firms have full
knowledge of rivals’ payoffs conditional on rivals’ actions.
To put structure on the game, let $z_{im} \in \mathbb{N}$ be the endowed stores of firm $i$ in
market $m$, $a_{im}$ be the number of stores added (a positive integer) or removed (a negative
integer), and $s_{im}$ the number of stores after adding and removing stores: $s_{im} \equiv z_{im} + a_{im}$.
Also, let $q_{im} \in \mathbb{R}_{+}$ be an index for firm $i$’s profitability in market $m$, akin to market size.
For notational purposes, let $z_i$ be the vector containing firm $i$’s endowed stores across all
markets, $z_i \equiv (z_{i1}, \dots, z_{iM})$, and $z$ be the vector containing endowed stores for all firms,
$z \equiv (z_1, \dots, z_I)$. As is common in the literature, firm $i$’s rivals’ endowed stores in market
$m$ and across all markets are denoted $z_{-im}$ and $z_{-i}$, respectively. Similar notation is used
for $a$, $s$, and $q$.
Firm i obtains a profit from each store, where the per-store profit in market m is:
$$\pi_{im}(s_{im}, s_{-im}) = q_{im}\,(s_{im} + 1)^{\theta^{own}} \prod_{j \neq i} (s_{jm} + 1)^{\theta^{rival}} - \theta^{fc} \tag{1}$$
In the above equation, $(s_{im} + 1)^{\theta^{own}}$ captures how a firm’s own stores cannibalize or
complement each other. A positive value of $\theta^{own}$ indicates that a firm’s own stores are
complements to each other. Jia (2008) suggests that neighboring stores of the same firm
can be complements to each other, driven by economies of scope in distribution, branding,
information gathering, etc. On the other hand, Ellickson et al. (2013) suggest that
neighboring stores of the same firm can be partial substitutes to each other, as they cater
to the same consumers.9 If this were the case, $\theta^{own}$ would be negative.
As for the other terms in Eq. (1), $(s_{jm} + 1)^{\theta^{rival}}$ captures how firm j’s stores, rivals to
firm i’s stores, affect firm i’s per-store profit. As firms i and j are competitors, I expect
$\theta^{rival}$ to be negative. $q_{im}$ is an exogenous ‘market-size’: higher values of $q_{im}$ generate
higher per-store profits without altering stores’ fixed costs, $\theta^{fc}$. As $q_{im}$ captures both
demand and variable cost factors, I refer to $q_{im}$ simply as a profit index.
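To make the roles of the terms in Eq. (1) concrete, the following Python sketch evaluates a per-store profit of that form. It assumes that the rival terms enter multiplicatively across rivals. The function name `per_store_profit` and all numerical values are hypothetical, chosen only for illustration; they are not estimates from this paper.

```python
import math

def per_store_profit(q, s_own, s_rivals, theta_own, theta_rival, theta_fc):
    """Per-store profit in one market, following the form of Eq. (1).

    s_rivals lists the store counts of each rival firm in the market.
    Assumption: rival effects multiply across rivals.
    """
    own_term = (s_own + 1) ** theta_own
    rival_term = math.prod((s_j + 1) ** theta_rival for s_j in s_rivals)
    return q * own_term * rival_term - theta_fc

# Illustrative (made-up) values with negative theta_own and theta_rival.
params = dict(q=10.0, theta_own=-0.5, theta_rival=-0.3, theta_fc=1.0)
base = per_store_profit(s_own=1, s_rivals=[0, 0], **params)
more_own = per_store_profit(s_own=3, s_rivals=[0, 0], **params)
more_rivals = per_store_profit(s_own=1, s_rivals=[2, 1], **params)
print(base, more_own, more_rivals)
```

With negative exponents, both additional own stores and additional rival stores lower the per-store profit, while the fixed cost is unaffected by the profit index.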
9
For example, consider a Hotelling model with uniformly distributed consumers on the unit line, willing
to pay 1.5 dollars for the consumption of a homogeneous good, and having to incur a linear transportation
cost of 1 dollar per unit of distance traveled. Assume there is a single store located at the edge of the line
(i.e. x = 0) which has no manufacturing costs. At optimal prices, this store sells to consumers located in
the [0,0.75] interval and makes $(3/4)^2$ dollars. Consider now adding a second store, operated by the same
firm, at the opposite end of the line (i.e. at x = 1). At optimal prices, the old store sells to consumers on
the [0,0.5] interval at a price of 1 dollar, making a total profit of 1/2 dollars, which is less than the $(3/4)^2$
dollars it was making before. The firm’s total profits increased, but the per-store profit decreased as the
new store cannibalized the old store’s sales.
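The arithmetic of this Hotelling example can be checked with a short Python sketch. It searches a price grid rather than solving first-order conditions, and it assumes both stores charge the same price in the two-store case (a symmetry assumption made here for simplicity). All names are hypothetical.

```python
WTP = 1.5        # willingness to pay
TRANSPORT = 1.0  # cost per unit of distance

def demand(price, max_reach):
    """Length of the line served by one store, capped at max_reach."""
    reach = (WTP - price) / TRANSPORT
    return max(0.0, min(max_reach, reach))

def best_price_and_profit(max_reach):
    """Grid search over prices in steps of 0.001 (zero production cost)."""
    grid = [i / 1000 for i in range(0, 1501)]
    price = max(grid, key=lambda p: p * demand(p, max_reach))
    return price, price * demand(price, max_reach)

# One store at x = 0 can reach the whole unit line.
p_one, profit_one = best_price_and_profit(max_reach=1.0)
# Two stores at x = 0 and x = 1 each serve at most half of the line.
p_two, profit_two = best_price_and_profit(max_reach=0.5)

print(p_one, profit_one)   # single store
print(p_two, profit_two)   # per store, with two stores
```

Per-store profit falls when the second store is added, while the firm’s total profit (twice the per-store profit) rises.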
Firm i’s total profits are simply the sum of all per-store profits minus the cost of
adding new stores. As mentioned previously, this cost is convex in the number of added
stores: the per-store cost is increasing in the total number of stores added. To represent
this, let $\omega_i$ be the total number of added stores, so that the cost of adding new stores is:
$$C(a_i) = \theta^{lin}\,\omega_i + \theta^{cvx}\,\omega_i \ln \omega_i \quad \text{where} \quad \omega_i \equiv \sum_{m=1}^{M} \max\{a_{im}, 0\} \tag{2}$$
This cost function captures the sunk cost of adding new stores, where the average cost is
$\theta^{lin} + \theta^{cvx} \ln \omega_i$, an increasing function of $\omega_i$ whenever $\theta^{cvx}$ is positive.10 This increasing
average cost captures diseconomies of scale in adding stores, i.e. expansion costs.
While the model doesn’t specify the source of these expansion costs, it does capture
their size. It is conceivable that $\theta^{cvx}$ be negative instead of positive, capturing
economies of scale. However, as the logarithm function is unbounded, a negative $\theta^{cvx}$
would imply economies of scale never taper off and it becomes optimal to add an infinite
number of stores: the cost of adding a store becomes negative and infinitely large. As
this is unrealistic, and, more importantly, never observed in the data, I impose that $\theta^{cvx}$ be
non-negative.
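As a small numerical illustration of Eq. (2), the sketch below computes the total and average cost of adding stores. The function names and parameter values are hypothetical, and the cost is taken to be zero when no stores are added (the limit of $\omega \ln \omega$ as $\omega$ goes to zero), which is an assumption of this sketch.

```python
import math

def expansion_cost(a_i, theta_lin, theta_cvx):
    """Total cost of adding stores for an action vector a_i, as in Eq. (2).

    Only positive entries (added stores) count; removals are free.
    """
    omega = sum(max(a, 0) for a in a_i)
    if omega == 0:
        return 0.0  # assumption: no added stores, no cost
    return theta_lin * omega + theta_cvx * omega * math.log(omega)

def average_cost(omega, theta_lin, theta_cvx):
    """Average cost per added store: theta_lin + theta_cvx * ln(omega)."""
    return theta_lin + theta_cvx * math.log(omega)

# Made-up parameters: two stores added in one market, one removed in
# another, three added in a fourth.
cost = expansion_cost([2, -1, 0, 3], theta_lin=1.0, theta_cvx=0.5)
averages = [average_cost(w, 1.0, 0.5) for w in range(1, 6)]
print(cost, averages)
```

With a positive convexity parameter the average cost rises with the number of stores added, and removing stores never enters the cost.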
Firm i’s total profits from choosing $a_i$ are:
$$\Pi_i(a_i, a_{-i}) = \sum_{m=1}^{M} s_{im}\,\pi_{im}(s_{im}, s_{-im}) - C(a_i) \quad \text{where} \quad s_{im} \equiv z_{im} + a_{im} \tag{3}$$
An equilibrium in pure strategies is the set of actions (adding and removing stores)
that maximize each firm’s profits when rivals’ actions are optimal:
$$a_i^{*} = \arg\max_{a_i \in \mathbb{Z}^{M}} \Pi_i(a_i, a_{-i}^{*}) \quad \text{s.t.} \quad a_i \geq -z_i \qquad \forall i = 1, \dots, I \tag{4}$$
Before characterizing the equilibrium in more detail, I would like to note a few things
about the structural forms assumed so far.
First, the per-store cost of adding stores is irrespective of where such store is added.
That is, adding a store in Mexico City increases the cost of adding a store in Juarez
by the same amount as in Cancun. This is not a naive assumption but is necessary
for tractability: the ranking of most profitable store addition to least profitable store
addition is invariant to where stores are added. Hence, in searching for an optimum, it is
sufficient to order store additions from most profitable to least and add stores one at a
time until the per-store cost of adding a store is larger than the profit gain from adding
the store. Without this assumption, the search for an optimum would be an enormous
combinatorial problem. The assumption is not as restrictive as may seem at first glance:
the model does allow for heterogeneity in per-store profits, such that adding a store in
Juarez may be preferred to adding one in Cancun. As store opening decisions in Mexico
are highly centralized, assuming diseconomies of scale only at the national level is not
unrealistic.
10
The specific functional form implies per-store costs increase sub-linearly in ω. Alternatively, one could
have chosen a functional form in which these costs increase linearly in ω. As it turns out, the main empirical
findings are robust to the choice of functional form, but the sub-linear form fits the data best.
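The ordering argument behind the first assumption can be sketched in Python: because the cost depends only on the total number of added stores, a firm can add stores one at a time, always choosing the market with the largest profit gain, and stop once the gain no longer exceeds the marginal cost. The sketch below is a simplified illustration and not the paper’s estimation code. It holds rivals fixed (their effect is absorbed into `q`), ignores store removals, and uses made-up parameter values; all function names are hypothetical.

```python
import heapq
import math

def total_cost(omega, theta_lin, theta_cvx):
    """Cost of adding omega stores, as in Eq. (2); zero when omega == 0."""
    if omega == 0:
        return 0.0
    return theta_lin * omega + theta_cvx * omega * math.log(omega)

def variable_profit(s, q, theta_own, theta_fc):
    """Profit from s stores in one market, rivals held fixed inside q."""
    return s * (q * (s + 1) ** theta_own - theta_fc)

def greedy_additions(z, q, theta_own, theta_fc, theta_lin, theta_cvx):
    """Add stores one by one, best market first, while gain > marginal cost."""
    added = [0] * len(z)

    def gain(m):
        s = z[m] + added[m]
        return (variable_profit(s + 1, q[m], theta_own, theta_fc)
                - variable_profit(s, q[m], theta_own, theta_fc))

    # Max-heap (via negated gains) of each market's next-store gain.
    heap = [(-gain(m), m) for m in range(len(z))]
    heapq.heapify(heap)
    omega = 0
    while heap:
        neg_gain, m = heapq.heappop(heap)
        marginal_cost = (total_cost(omega + 1, theta_lin, theta_cvx)
                         - total_cost(omega, theta_lin, theta_cvx))
        if -neg_gain <= marginal_cost:
            break  # the best remaining store is not worth its cost
        added[m] += 1
        omega += 1
        heapq.heappush(heap, (-gain(m), m))
    return added

plan = greedy_additions(z=[0, 1, 0], q=[8.0, 5.0, 3.0], theta_own=-0.5,
                        theta_fc=1.0, theta_lin=1.0, theta_cvx=0.5)
print(plan)
```

The one-store-at-a-time search relies on gains within a market falling as stores are added, which holds here because `theta_own` lies between −1 and 0.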
A second key assumption of the above model is that there is no cost in reducing
the number of stores. This restriction is imposed for the purpose of identification: one
observes only two distinct choices but would like to identify three distinct values. In
simple terms, one observes if firms add a store or not and if they remove a store or not,
but would want to identify the cost of adding a store, the value of keeping a store, and
the cost (or scrap value) of removing a store. As not all three are identified, I normalize
the cost of removing a store to zero. Hence the fixed cost of a store, θfc , and the linear
cost of adding a store, θlin , are identified up to level shifts, akin to discrete choice models
normalizing the mean utility of the outside good to zero.
Third, I assume the per-store-profit function is decreasing in the number of a firm’s
own stores. More specifically, that θown is between −1 and 0. This imposes that stores be
partial substitutes to each other, as in Ellickson et al. (2013), and rules out complemen-
tarities, as in Jia (2008). The restriction follows from observed data patterns and from
the additional restriction that removing stores is free: if complementarities existed and
the cost of closing stores did not increase with the number of stores being closed, a firm
that were ever to close a store in a given market would close all stores in that market.
In the observed data, in certain markets firms close some stores but not all, suggesting
that a firm’s own stores need be partial substitutes to each other.
Fourth, the use of a static model instead of a dynamic model is not without limitations.
The current model assumes a firm’s action in one market does not alter future actions in
other markets, by the same firm or by rivals. It does alter the firm’s future actions in the
same market, but only through the change in the number of existing stores.11 Interviews
with industry experts suggest this limitation, the lack of intertemporal cross-market in-
terdependencies, is not of first-order concern. That is, industry experts commented how
decisions were based on existing market profitability, with little regard to future entry
choices. This is likely due to the early stages of the expansion wave, in which opening
in the most profitable locations takes priority over preemptive entry choices. A second
major limitation of the static model is that payoffs in a given market are a function of
solely a profit index and of the number of stores in that market. Thus, if a given market
has the same profit index and the same number of stores in 1996 as in 2006, benefits
11
It should be clear by now why a dynamic model that allows for expansion costs is intractable in the
current setting. Expansion costs create inter-market dependencies. Dynamic models create intertemporal
dependencies. A model that accommodates both would likely be characterized with a 2I · M dimensional state
space, where I is the number of firms and M the number of markets. The current setting has 6 firms and
335 markets, making such characterization impossible. Methods that reduce the complexity of large state
spaces in dynamic models (e.g. Benkard et al. (2008), Bajari et al. (2007)) require aggregating markets,
agents, or actions into a single ‘representative’ market, agent, or action. Any such reduction that simplifies
the dynamic problem in a meaningful way, in this particular setting, is likely to impose stronger restrictions
on the representation of industry and firm behavior than the restrictions imposed by the proposed static
model.
from adding a store are the same for the two years, even though firms are much further
into their expansion wave in 2006. Given this limitation, estimated parameters will rep-
resent average effects across the 11-year time span. Finally, the estimated payoffs of a
static model capture the discounted sum of all future payoffs, where this discounted sum
accounts for expected future actions, i.e. the static profits are a reduced form expression
of the dynamic process. Therefore, this reduced form expression captures firms’ responsiveness
to variables that affect future profits, e.g. market growth, early entry, etc., in
addition to variables that affect current profits. However, in any counterfactual scenario
in which model primitives (i.e. costs) are altered, so too must the reduced form pay-
offs be, as these depend on expectations of future actions which are likely to shift with
changes in model primitives. Unfortunately, without a dynamic model, it is impossible
to predict how these reduced form payoffs ought to be modified. The counterfactual sim-
ulations in Section 8 alter costs but do not alter parameters governing payoffs, and thus
are representative only if expectations of future actions do not change with changes in
costs. Thus, the simulations alter costs only slightly so that any changes to expectations
of future actions are of second order.
4.2. Characterizing best responses
As the action space is discrete, it is not feasible to characterize best responses
using FOCs. Instead, I use inequalities to characterize best responses. Specifically,
if $a_i^{*}$ maximizes firm i’s profits, then it must be at least as profitable as any other
action:
$$\Pi_i(a_i^{*}, a_{-i}^{*}) \geq \Pi_i(a_i', a_{-i}^{*}) \qquad \forall a_i' \in \mathbb{Z}^{M},\; a_i' \geq -z_i \tag{5}$$
Eq. (5) references an infinite number of inequalities, most of which are redundant. The
only non-redundant inequalities are those in which the optimal allocation is perturbed
by adding or removing a single store or perturbed by adding one less store in one market
and one more store in another. Formally, let $e_m$ be an M-long vector of zeros with a
single one in the mth position, $\mathcal{M}$ be the set of market indices, $N_{>0}(x)$ be the set
of market indices with positive x value: $N_{>0}(x) \equiv \{m \in \mathcal{M} \mid x_m > 0\}$ and $N_{\geq 0}(x)$ be
the set of market indices with non-negative x value. The non-redundant inequalities
are:
$$\Pi_i(a_i^{*}, a_{-i}^{*}) \geq \Pi_i(a_i', a_{-i}^{*}) \qquad \forall a_i' \in A^{a} \cup A^{b} \tag{6}$$
$$A^{a}(a_i^{*}, z_i) = \{a_i' \mid a_i' = a_i^{*} \pm e_m,\; a_i' \geq -z_i,\; \forall m \in \mathcal{M}\} \tag{7}$$
$$A^{b}(a_i^{*}, z_i) = \{a_i' \mid a_i' = a_i^{*} + e_m - e_n,\; a_i' \geq -z_i,\; \forall m \in N_{\geq 0}(a_i^{*}),\; \forall n \in N_{>0}(a_i^{*})\} \tag{8}$$
To prove these are the only non-redundant inequalities, note how profits (Eq. (3))
are concave in the allocation choice ai .12 As such, it is sufficient to consider deviations
from the optimal allocation that consist of at most one additional or one less store in
each market. There are 2M such deviations. However, benefits from removing stores in
one market are invariant to actions in other markets: variable profits in one market
are unaffected by actions in other markets and there is no cost for removing stores. As
such, inequalities in which the optimal allocation is perturbed by changing the number
or location of removed stores are redundant given the inequalities characterized by A a
and by inequalities that perturb the optimal allocation without changing the number or
location of removed stores. In addition, recall the cost function is convex and depends
solely on the total number of stores added, irrespective of where they are added. Hence,
inequalities that perturb the optimal allocation by changing the number of added stores
are redundant given inequalities characterized by A a and inequalities that perturb the
optimal allocation without changing the total number of added stores. Thus, the non-
redundant deviations yet to consider, in addition to those given in A a , are deviations
that differ from the optimal allocation by at most one unit in every market, that have the
same number of added stores, and that have the same number and location of removed
stores. The set A b is sufficient to characterize these deviations as variable profits in any
one market are irrespective of actions in other markets.
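The two deviation sets can be enumerated directly, which shows that the number of inequalities to check grows with the number of markets rather than with the size of the full action space. The Python sketch below is one hedged reading of Eqs. (7) and (8): the first set adds or removes one store in a single market, and the second moves one added store from a market with positive additions to another market with non-negative additions. The function name and the example allocation are hypothetical.

```python
def deviation_sets(a_star, z):
    """Enumerate single-store deviations from a feasible allocation a_star.

    set_a: add or remove one store in one market (subject to a >= -z).
    set_b: shift one added store from market n (a_star[n] > 0) to a
           different market m with a_star[m] >= 0, keeping total additions fixed.
    """
    M = len(a_star)
    set_a, set_b = set(), set()
    for m in range(M):
        for step in (1, -1):
            dev = list(a_star)
            dev[m] += step
            if dev[m] >= -z[m]:      # cannot remove more than endowed
                set_a.add(tuple(dev))
    for n in range(M):
        if a_star[n] <= 0:
            continue                  # only markets with added stores
        for m in range(M):
            if m != n and a_star[m] >= 0:
                dev = list(a_star)
                dev[m] += 1
                dev[n] -= 1
                set_b.add(tuple(dev))
    return set_a, set_b

# Example: add two stores in market 0, none in market 1, close one in market 2.
set_a, set_b = deviation_sets(a_star=(2, 0, -1), z=(0, 1, 1))
print(sorted(set_a))
print(sorted(set_b))
```

A candidate allocation is then a best response if its profit is at least as large as the profit at every element of the two sets.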
The above game is an extension of the two-player entry game, and as such can have
multiple equilibria.13 To obtain unique outcomes I impose an equilibrium selection process.
Specifically, I find an equilibrium by initiating firms’ actions at a given value and
iterating firms’ best responses until convergence. The initial action and the order in which
firms best respond determine the equilibrium. For example, in a two-player symmetric
entry game, where parameters are such that either the first firm enters and the second
does not or vice versa, initiating both firms’ actions at ‘no-entry’ and iterating best
responses with firm one best-responding first, converges to the equilibrium in which firm
one enters and firm two does not. In contrast, having firm two best-respond first results
in the other equilibrium arising.
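The two-player example can be reproduced with a few lines of Python. This is a sketch of the best-response iteration for the symmetric entry game only, not of the full multi-market model. The payoffs (`alpha` when entering alone, `alpha - beta` when both enter, zero otherwise) and their values are assumptions chosen so that exactly one firm enters in equilibrium.

```python
def iterate_best_responses(alpha, beta, order, start=(0, 0), max_rounds=100):
    """Iterate best responses in the given firm order until actions stop changing.

    Actions are 1 (enter) or 0 (stay out). Entering pays alpha if the rival
    is out and alpha - beta if the rival is in; staying out pays zero.
    """
    actions = list(start)
    for _ in range(max_rounds):
        previous = tuple(actions)
        for i in order:
            rival_action = actions[1 - i]
            payoff_from_entry = alpha - beta * rival_action
            actions[i] = 1 if payoff_from_entry >= 0 else 0
        if tuple(actions) == previous:
            return tuple(actions)
    raise RuntimeError("best responses did not converge")

# alpha > 0 > alpha - beta: entry is profitable only for a lone entrant.
firm_one_first = iterate_best_responses(alpha=1.0, beta=2.0, order=(0, 1))
firm_two_first = iterate_best_responses(alpha=1.0, beta=2.0, order=(1, 0))
print(firm_one_first, firm_two_first)
```

Starting from no entry, the firm that moves first in the ordering is the one that ends up entering, so the ordering acts as the selection rule.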
Let E describe the order in which firms best respond to each other and the initial ac-
tion under this best response heuristic. As different orderings and different actions result
in different equilibria being selected, E can be thought of as a complex equilibrium selection
rule. Denote the unique equilibrium outcome of the game as $a^{*}(E, q, z)$, where the
dependency on the equilibrium selection rule, on the profit indices, and on the endowed
stores is explicitly stated.
12
The cost function, Eq. (2), is the sum of convex functions, i.e. max {x, 0} is a convex function, and
hence is convex itself. The sum of variable profits, i.e. the term $\sum_{m=1}^{M} s_{im}\pi_{im}$ in Eq. (3), is concave in $a_i$ as
per-store-profits (Eq. (1)) are decreasing and concave in sim and invariant to actions taken in other markets,
and sim is a linear transformation of aim . As the sum of variable profits is concave and the cost function is
convex, the profit function is concave.
13
Existence of equilibria in mixed strategies follows from Nash (1950b). Existence in pure strategies is not
guaranteed, as firms’ set of feasible actions is not convex. However, in the empirical exercise that follows, I
show an equilibrium in pure strategies does exist by computing the equilibrium at the required parameter
values and for the given data.
Section 6 details the empirical approach to estimating the above model using the Mex-
ican supermarket industry. However, to most easily understand the empirical approach,
it is best to introduce the estimating strategy in a much simpler setting. The next section
does so.
5. A partial likelihood estimator with correction
5.1. Setup
Consider the following standard entry game and data generating process. There are
N markets. In each market, two firms simultaneously make an entry decision. Firm i’s
profit from entering market n is:
$$\pi_{in} = \begin{cases} \alpha + \epsilon_{in} & \text{if rival firm does not enter} \\ \alpha - \beta + \epsilon_{in} & \text{if rival firm does enter} \end{cases} \tag{9}$$
Profit from not entering is zero. The two parameters of this game, α and β, are the same
for all firms and all markets. Firms differ in $\epsilon_{in}$, which is distributed Standard Normal
and is iid across firms and markets. All firms know each other’s random shocks prior to
making entry decisions.
Every market represents a coordination game of complete information. For certain
values of $(\epsilon_{1n}, \epsilon_{2n})$ the game has multiple equilibria. To resolve such cases, let the realized
equilibrium be the one that generates highest industry profits. After these games are
played and outcomes are realized, the researcher observes the outcomes and would like
to estimate α and β. The researcher does not observe the profit shocks.
Represent a firm’s entry choice with a ‘1’ and a no-entry choice with a ‘0’. A market
outcome is a pair of digits where the first digit is firm one’s choice and the second digit
is firm two’s choice. There are four possible market outcomes: (00), (10), (01), and (11).
Denote with Y the set of possible market outcomes, and with yn the realized market
outcome in market n.
One way to estimate α and β is to calculate the likelihood that a given market outcome
is realized, and, using the observed distribution of market outcomes, choose parameters
that maximize the sample log-likelihood. Compute the log-likelihood by first partitioning
the space of profit shocks by the corresponding market outcome. Fig. 4 illustrates this
partition and Table 2 details the half-spaces on profit shocks whose intersection generates
the desired partition. The half-spaces in Table 2 have been separated into two columns
for reasons that should become apparent soon. The first column, column A, corresponds
to half-spaces derived from firms’ best response functions. The second column, column B,
includes any additional half-spaces that resolve multiplicity of equilibria. The half-spaces
in column B are specific to the equilibrium selection rule and would be different under
alternative equilibrium selection rules.
Fig. 4. Market outcomes in a two-firm entry game shown over the space of each firm’s profit shocks. The
four regions defined by the solid lines correspond to the four market outcomes: (11) - both firms enter; (10)
- firm 1 enters and firm 2 does not; (01) - firm 2 enters and firm 1 does not; (00) - neither firm enters.
The figure assumes both firms share the same parameters, α and β, and multiplicity of equilibria is resolved by
selecting the equilibrium with highest industry profits. The square defined by ABCD is the area where there
is multiplicity of equilibria. The shaded area is that for which firms’ actions are best responses to the rival’s
action when the market outcome is (10). The dashed area, a subset of the shaded area, is that for which
the market outcome is (10) given the equilibrium selection rule.
Table 2
Two-firm entry game. Correspondence between profit shocks and market outcomes.

Market outcome    Bounds on profit shocks
                  Set A                                     Set B
In,In (11)        $0 \leq \alpha - \beta + \epsilon_1$      none
                  $0 \leq \alpha - \beta + \epsilon_2$
Out,Out (00)      $0 > \alpha + \epsilon_1$                 none
                  $0 > \alpha + \epsilon_2$
In,Out (10)       $0 \leq \alpha + \epsilon_1$              $\epsilon_1 \geq \epsilon_2$
                  $0 > \alpha - \beta + \epsilon_2$
Out,In (01)       $0 > \alpha - \beta + \epsilon_1$         $\epsilon_1 < \epsilon_2$
                  $0 \leq \alpha + \epsilon_2$

Sets A and B contain half-spaces on the 2-dimensional profit shock such that when profit shocks are within
the defined half-spaces the outcome of the game is the row’s market outcome. A and B split the half-spaces
into two sets: set A contains half-spaces generated from firms’ best response functions; set B contains
additional half-spaces that resolve multiplicity of equilibria by choosing the equilibrium with highest industry
payoff.
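The correspondence in Table 2 can be written as a small classifier that maps a pair of profit shocks to a market outcome. The Python sketch below checks the set A bounds first and then the set B tie-break for the (10) and (01) outcomes. It assumes β is non-negative, and the function name and parameter values are hypothetical.

```python
def market_outcome(eps1, eps2, alpha, beta):
    """Outcome of the two-firm entry game for given shocks, following Table 2.

    Multiplicity is resolved in favor of the entrant with the larger shock,
    which is the equilibrium with the highest industry profit.
    Assumption: beta >= 0.
    """
    if alpha - beta + eps1 >= 0 and alpha - beta + eps2 >= 0:
        return "11"
    if alpha + eps1 < 0 and alpha + eps2 < 0:
        return "00"
    in_A10 = alpha + eps1 >= 0 and alpha - beta + eps2 < 0
    in_A01 = alpha - beta + eps1 < 0 and alpha + eps2 >= 0
    if in_A10 and eps1 >= eps2:      # set A and set B for (10)
        return "10"
    if in_A01 and eps1 < eps2:       # set A and set B for (01)
        return "01"
    raise ValueError("shocks not covered; check that beta >= 0")

alpha, beta = 0.5, 1.0
examples = {
    (1.0, 1.0): market_outcome(1.0, 1.0, alpha, beta),
    (-1.0, -1.0): market_outcome(-1.0, -1.0, alpha, beta),
    (0.0, 0.0): market_outcome(0.0, 0.0, alpha, beta),   # multiplicity region
    (0.0, 0.1): market_outcome(0.0, 0.1, alpha, beta),   # multiplicity region
    (2.0, -1.0): market_outcome(2.0, -1.0, alpha, beta),
}
print(examples)
```

In the multiplicity region both (10) and (01) satisfy the set A bounds, and only the set B comparison of the two shocks determines which one is realized.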
Calculating the likelihood of each partition may be a challenge, as there are no closed
form solutions for the partitions corresponding to market outcomes (10) and (01). These
likelihoods can always be approximated through simulation (e.g. Monte Carlo integration),
but such a procedure may be inaccurate and computationally burdensome, especially
if the likelihood is close to zero for some markets. The following estimator is suggested
as a more robust alternative.
5.2. The estimator
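Let θ be the parameters of the model, i.e. $\theta \equiv (\alpha, \beta)$, and define $A_y(\theta)$ and $B_y(\theta)$ as
the set of profit shocks within the half-spaces defined in Table 2. For example,
$A_{(10)}(\theta) \equiv \{(\epsilon_1, \epsilon_2) \in \mathbb{R}^2 \mid \alpha + \epsilon_1 \geq 0 \wedge \alpha - \beta + \epsilon_2 < 0\}$ and $B_{(10)}(\theta) \equiv \{(\epsilon_1, \epsilon_2) \in \mathbb{R}^2 \mid \epsilon_1 \geq \epsilon_2\}$. The
log-likelihood for specific outcome y is
$$l_y(\theta) = \ln \Pr[A_y(\theta) \cap B_y(\theta)] \tag{10}$$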
Decompose this log-likelihood using Bayes’ Law and properties of logarithms:
$$l_y(\theta) = \ln\left(\Pr[B_y(\theta) \mid A_y(\theta)] \Pr[A_y(\theta)]\right) = \ln \Pr[B_y(\theta) \mid A_y(\theta)] + \ln \Pr[A_y(\theta)] \tag{11}$$
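To keep notation compact, let $a(Y, \theta)$ and $b(Y, \theta)$ be the corresponding log-partial-likelihoods
for specific outcome Y (a random variable), where the sums run over the set of
possible market outcomes $\mathcal{Y}$:
$$a(Y, \theta) \equiv \sum_{y \in \mathcal{Y}} 1\{y = Y\} \cdot \ln \Pr[A_y(\theta)] \tag{12}$$
$$b(Y, \theta) \equiv \sum_{y \in \mathcal{Y}} 1\{y = Y\} \cdot \ln \Pr[B_y(\theta) \mid A_y(\theta)] \tag{13}$$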
Also, define $Q^{a}(\theta)$ and $Q^{b}(\theta)$ to be the log-partial-likelihood functions (i.e. $Q^{a}(\theta) = E[a(Y, \theta)]$)
and define $Q_N^{a}(\theta)$ and $Q_N^{b}(\theta)$ to be their sample analogs (i.e. $Q_N^{a}(\theta) = \frac{1}{N}\sum_{n=1}^{N} a(y_n, \theta)$).
Finally, define $\theta^{a}$ and $\theta_N^{a}$ to be the unique maximizers of their corresponding
functions. Notice how the log-likelihood and its finite sample version are
simply the sum of the corresponding log-partial-likelihoods, i.e. $L(\theta) = Q^{a}(\theta) + Q^{b}(\theta)$.
Thus, how far is $\theta_N^{a}$ from the maximizer of the finite-sample log-likelihood, $\theta_N^{L}$? The
latter is the value that sets the gradient of the sample log-likelihood function to zero.
Using a first order Taylor expansion of this gradient around $\theta_N^{a}$ (denote the gradient and
hessian of f with $\nabla f$ and $\nabla^2 f$, respectively):
$$\nabla L_N(\theta_N^{L}) = 0 = \nabla L_N(\theta_N^{a}) + \nabla^2 L_N(\theta_N^{a})\,(\theta_N^{L} - \theta_N^{a}) + o\left(\lVert \theta_N^{L} - \theta_N^{a} \rVert\right) \tag{14}$$
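By definition of $\theta_N^{a}$, the gradient $\nabla Q_N^{a}(\theta_N^{a})$ is zero. Hence, $\nabla L_N(\theta_N^{a}) = \nabla Q_N^{b}(\theta_N^{a})$
and
$$\theta_N^{L} - \theta_N^{a} \approx -\left[\nabla^2 L_N(\theta_N^{a})\right]^{-1} \nabla Q_N^{b}(\theta_N^{a}) \tag{15}$$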
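This suggests using an estimator that approximates $\theta_N^{L}$ by estimating $\theta_N^{a}$ and adding
a corrective term. That is, define the Partial Likelihood Estimator with Correction
(PLEC) as:
$$\hat{\theta}_N = \hat{\theta}_N^{a} - \left[\nabla^2 L_N(\hat{\theta}_N^{a})\right]^{-1} \nabla Q_N^{b}(\hat{\theta}_N^{a}) \tag{16}$$
$$\hat{\theta}_N^{a} = \arg\max_{\theta \in \Theta} Q_N^{a}(\theta) \tag{17}$$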

If the corrective term is ‘small’, the discrepancy $o\left(\lVert \theta_N^{L} - \theta_N^{a} \rVert\right)$ is also small, the
approximation in Eq. (15) is accurate, and $\hat{\theta}_N$ is a good approximation of $\theta_N^{L}$. If the
corrective term is ‘large’, the approximation is inaccurate and one should not consider $\hat{\theta}_N$
as a good approximation of $\theta_N^{L}$.
Why is it easier to calculate $\hat{\theta}_N$ than $\theta_N^{L}$? The log-partial-likelihood, $Q_N^{a}(\cdot)$, is a smooth
concave function. Its first and second derivatives have closed form solutions which are
easy to program. Hence $\theta_N^{a}$ can be obtained using gradient based optimizers, which are
fast and accurate. In many applications $Q_N^{b}(\cdot)$, and by extension $L_N(\cdot)$, need to be
calculated through simulation or through complex algorithms which are computationally
burdensome and prone to error: human error, numerical error, and statistical error.14
The PLEC removes $Q_N^{b}(\cdot)$ from the maximization process, accelerating the search for
optimal parameters and reducing errors introduced in this search. The correction term
re-introduces $Q_N^{b}(\cdot)$ after the search for parameters is finalized, allowing for $\hat{\theta}$ to include
information in $Q_N^{b}(\cdot)$ if such information exists, but mostly to confirm whether $Q_N^{b}(\cdot)$
contains relevant information.
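The mechanics of the correction in Eq. (16) can be seen in a scalar toy problem. The Python sketch below uses two made-up quadratic functions in place of the partial log-likelihoods, so that the one-step correction can be compared with the exact maximizer of their sum. The quadratics, the function name `plec_scalar`, and all numbers are assumptions for illustration and are unrelated to the entry game.

```python
def plec_scalar(theta_a, grad_Qb, hess_L):
    """One-step correction in the spirit of Eq. (16), scalar-parameter case.

    theta_a : maximizer of the first partial log-likelihood
    grad_Qb : derivative of the second partial log-likelihood
    hess_L  : second derivative of the full log-likelihood
    """
    correction = -grad_Qb(theta_a) / hess_L(theta_a)
    return theta_a + correction, correction

# Toy stand-ins: Qa(t) = -(t - 1)^2 and Qb(t) = -0.1 * (t - 2)^2.
theta_a = 1.0                                 # maximizer of Qa
grad_Qb = lambda t: -0.2 * (t - 2.0)          # Qb'(t)
hess_L = lambda t: -2.0 - 0.2                 # Qa''(t) + Qb''(t)

theta_hat, correction = plec_scalar(theta_a, grad_Qb, hess_L)

# Exact maximizer of Qa + Qb, from the first-order condition.
theta_L = (2.0 * 1.0 + 0.2 * 2.0) / (2.0 + 0.2)
print(theta_hat, theta_L, correction)
```

Because the toy functions are quadratic, the first-order expansion is exact and the corrected value coincides with the full maximizer. The small correction reflects that the second function is nearly flat around the first-stage maximizer.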
Newey and McFadden (1994) describe conditions under which the estimator is con-
sistent, detailed in the appendix. The key assumption for consistency of the estimator is
that the correction term go to zero as N becomes large. This will be the case when the
identifying variation in a(Y, θ) is much stronger than that in b(Y, θ). This is not to say
that the latter is likely or unlikely, but that it is either invariant to different values of θ
at θa or that the sensitivity of a(Y, θ) to θ is so much larger than that of b(Y, θ), that
the former is sufficient to identify θ.
The following exercise illustrates such cases in the context of the two-firm entry game
and shows how the PLEC estimator can outperform a simulated maximum likelihood
estimator under certain conditions.
14
For example, given the half-spaces in Table 2, both $\Pr[B_y(\theta) \mid A_y(\theta)]$ and $\Pr[B_y(\theta) \cap A_y(\theta)]$ need to be
calculated using simulation (e.g. Monte Carlo integration, Gibbs sampling, etc.). The resulting simulated
probability is not smooth and only approximates a smooth function if sufficient simulation draws are
used. Simulation error is introduced by the sampling, and this error is larger for the full log-likelihood,
$\ln \Pr[B_y(\theta) \cap A_y(\theta)]$, than for the partial log-likelihood, $\ln \Pr[B_y(\theta) \mid A_y(\theta)]$, as the probability in the
former is smaller than in the latter by construction. In addition, numerical error is introduced in computing
derivatives, as the step size used in finite differencing needs to be large to account for how the objective
function is not truly smooth.
5.3. Exogenous identifying variation
Modify the above two-firm entry model so that markets have some number of prior
competitors, x, ranging between zero and four, and distributed Binomial(p, 4). Firms’
profits upon entry are
$$\pi = \begin{cases} \alpha - \beta x & \text{if rival does not enter} \\ \alpha - \beta(x + 1) & \text{if rival enters} \end{cases} \tag{18}$$
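The partial log-likelihood is
$$
\begin{aligned}
a(Y, X, \theta) ={} & 1\{(00) = Y\} \cdot \ln \left[\Phi(-\alpha + \beta X)\right]^2 \\
& + 1\{(01) = Y\} \cdot \ln \left[\Phi(-\alpha + \beta (X + 1)) \left(1 - \Phi(-\alpha + \beta X)\right)\right] \\
& + 1\{(10) = Y\} \cdot \ln \left[\left(1 - \Phi(-\alpha + \beta X)\right) \Phi(-\alpha + \beta (X + 1))\right] \\
& + 1\{(11) = Y\} \cdot \ln \left[1 - \Phi(-\alpha + \beta (X + 1))\right]^2 \\
Q^a_N(\theta) ={} & \frac{1}{N} \sum_n a(y_n, x_n, \theta)
\end{aligned} \tag{19}
$$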
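As a rough illustration of Eq. (19), the following Python sketch evaluates the partial log-likelihood for a sample of market outcomes. The function names (`std_normal_cdf`, `partial_loglik`, `q_a`) are assumptions made for this example and are not taken from the paper.

```python
import math

def std_normal_cdf(v):
    """Standard normal CDF, Phi(v), via the complementary error function."""
    return 0.5 * math.erfc(-v / math.sqrt(2.0))

def partial_loglik(y, x, alpha, beta):
    """a(Y, X, theta) in Eq. (19) for one market.

    y is the entry outcome (y1, y2) with entries in {0, 1};
    x is the number of prior competitors.
    """
    out_if_rival_out = std_normal_cdf(-alpha + beta * x)        # Pr[stay out | rival out]
    out_if_rival_in = std_normal_cdf(-alpha + beta * (x + 1))   # Pr[stay out | rival in]
    if y == (0, 0):
        return 2.0 * math.log(out_if_rival_out)
    if y == (1, 1):
        return 2.0 * math.log(1.0 - out_if_rival_in)
    if y in ((0, 1), (1, 0)):
        # Both asymmetric outcomes share the same expression in Eq. (19).
        return math.log(out_if_rival_in * (1.0 - out_if_rival_out))
    raise ValueError("y must be one of (0,0), (0,1), (1,0), (1,1)")

def q_a(sample, alpha, beta):
    """Sample average Q^a_N(theta); sample is a list of (y, x) pairs."""
    return sum(partial_loglik(y, x, alpha, beta) for y, x in sample) / len(sample)
```

Maximizing `q_a` over (α, β) with any numerical optimizer would give the partial-likelihood estimate to which the PLEC correction is then applied.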
I illustrate the advantages of the PLEC, as well as how variation in X identifies β, by
simulating markets and horse racing the PLEC against a simulated maximum likelihood
estimator (SMLE). I do so across six different simulation sets, where sets differ in how
much X varies and in how pervasive multiple equilibria are. For each set, I simulate 10,000
draws of (ε1, ε2, x) and, for each draw, solve the entry game and calculate the resulting
market outcomes. I then use the market outcomes and the draws of x to estimate α and
β using both the PLEC and the SMLE. I compare the two estimators in how close the
estimated parameters are to the true parameters and in the time they take to compute.
For three of the simulation sets there is no variance in X, which is achieved by setting
the Binomial probability parameter, p, to zero. For the other three sets the probability
parameter is set to 1/2. For each value of p, I choose α and β so the percentage of markets
with multiple equilibria is one, ten, and forty, respectively.15
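The simulation design can be sketched in Python as follows. This is a minimal illustration rather than the paper's code: the function names are hypothetical, β is taken to be positive in Eq. (18) so that competitors lower profits (an assumption of the example), and the equilibrium selection rule is the one stated in the notes to Table 3, where the firm with the highest profit shock enters when multiple equilibria exist.

```python
import random

def solve_entry_game(alpha, beta, x, eps):
    """Pure-strategy equilibrium of the two-firm entry game (assumes beta > 0).

    Returns (outcome, multiple), where outcome is (a1, a2) and multiple flags
    whether the market had more than one equilibrium.
    """
    if beta <= 0:
        raise ValueError("this sketch assumes beta > 0")

    def profit(i, rival_enters):
        return alpha - beta * (x + rival_enters) + eps[i]

    equilibria = []
    for a in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        # Each firm enters if and only if entry is profitable given the rival's action.
        if all((profit(i, a[1 - i]) >= 0) == (a[i] == 1) for i in (0, 1)):
            equilibria.append(a)
    if len(equilibria) == 1:
        return equilibria[0], False
    # Otherwise (0, 1) and (1, 0) are both equilibria: the larger shock enters.
    return ((1, 0) if eps[0] >= eps[1] else (0, 1)), True

def simulate_markets(alpha, beta, p, n_markets=10_000, seed=0):
    """Draw (eps1, eps2, x) and solve the game in each simulated market."""
    rng = random.Random(seed)
    sample, n_multiple = [], 0
    for _ in range(n_markets):
        x = sum(rng.random() < p for _ in range(4))          # x ~ Binomial(p, 4)
        eps = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))     # iid standard normal shocks
        outcome, multiple = solve_entry_game(alpha, beta, x, eps)
        sample.append((outcome, x))
        n_multiple += multiple
    return sample, n_multiple / n_markets
```

The returned sample of (outcome, x) pairs is what either estimator would take as data, and the second return value is the realized share of markets with multiple equilibria.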
Table 3 contains the outcomes from these simulations. When variation in X is large,
i.e. p = 1/2, the PLEC performs just as well as the SMLE in that the estimated confidence
intervals include the true parameter and are small. Although the SMLE’s confidence
intervals also contain the true parameter and are smaller than the PLEC’s, the latter is
much faster: up to twenty times faster when multiple equilibria are pervasive. Importantly,
the PLEC’s performance does not degrade with higher incidences of multiple equilibria,
as the variation in X allows the PLEC to identify β well. The PLEC also performs
15
Fig. 4 illustrates how values of α and β define the percentage of markets with multiple equilibria.
Expressed algebraically, the percentage of markets with multiple equilibria is
$PME = E_X\left[\left(F(-\alpha + \beta(X + 1)) - F(-\alpha + \beta X)\right)^2\right]$. Hence, for a given p and a desired amount of multiple equilibria, α and β are chosen
by inverting the above equation. As the inversion has more than one solution, I choose the solution with
the smallest β, effectively centering the area of multiple equilibria on zero.
Table 3
Monte Carlo simulations comparing the Partial Likelihood Estimator with Correction (PLEC) with the Simulated Maximum Likelihood Estimator (SMLE).

                              Fraction of markets with multiple equilibria
                              One percent                            Ten percent                            Forty percent
Exogenous variation           True    PLEC           SMLE            True    PLEC           SMLE            True    PLEC           SMLE
None (p = 0)        α         0.12    0.12 (0.01)    0.12 (0.02)     0.41    0.35 (0.01)    0.46 (0.00)     0.90    0.52 (0.02)    0.89 (0.02)
                    β         −0.25   −0.23 (0.02)   −0.24 (0.02)    −0.81   −0.70 (0.02)   −0.83 (0.01)    −1.80   −1.07 (0.03)   −1.79 (0.03)
                    Time (s)          22             114                     25             145                     31             236
Much (p = 1/2)      α         0.65    0.68 (0.02)    0.68 (0.01)     2.84    2.82 (0.05)    2.88 (0.00)     7.09    6.88 (0.20)    7.22 (0.13)
                    β         −0.26   −0.27 (0.01)   −0.27 (0.01)    −1.14   −1.13 (0.02)   −1.15 (0.01)    −4.72   −4.63 (0.11)   −4.80 (0.07)
                    Time (s)          21             146                     21             162                     11             230

Each box corresponds to a different simulation set. Each simulation set consists of 10,000 draws of (ε1, ε2) ∼ Normal(0, I) and x ∼ Binomial(p, 4). For each
draw, a two-firm entry game is played where Firm 1’s profit upon entering is α − βx + ε1 if Firm 2 does not enter and is α − βx − β + ε1 if Firm 2 does
enter. Profits for Firm 2 are defined similarly. If multiple equilibria exist, the firm with the highest profit shock enters. Given equilibrium entry decisions
and exogenous shock x, the parameters α and β are estimated using the two different estimators. The table shows the true parameters, as well as the
estimated parameters from each of the two estimators and corresponding standard errors in parentheses. The table also shows the time required to compute
the estimated parameters, in seconds. Six simulation sets are shown, corresponding to two different parameter values for p, the parameter governing the
exogenous variation in x, and three different parameter values for (α, β) which generate various incidences of multiple equilibria. For each choice of p, and
each desired incidence of multiple equilibria, the values of α and β that generate such incidence are shown in the True sub-column.
well when the incidence of multiple equilibria is small, i.e. when less than ten percent of
markets have multiple equilibria, regardless of variation in X. In such cases, the difference
between the partial likelihood and the full likelihood is small and little information is
lost by ignoring multiplicity of equilibria. In contrast, when there is no variation in X,
i.e. p = 0, and when multiple equilibria are pervasive, i.e. forty percent of markets have
multiple equilibria, the PLEC performs poorly, underestimating the magnitudes of both β and α.
Intuitively, the preexisting competitors are very helpful in identifying the role of com-
petition. Firms’ entry decisions are determined by the number of prior competitors as
well as by coordination with potential entrants. When there is a lot of variation in the
number of prior competitors, coordinating with potential entrants becomes just a small
factor in a firm’s entry choice. The researcher’s understanding of competition is minimally
affected by ignoring this coordination.
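The expression for the share of markets with multiple equilibria given in footnote 15 can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, F is taken to be the standard normal CDF, and β is entered as the magnitude of the competition effect reported in Table 3 (treating the table's negative sign as a reporting convention is an assumption of this example).

```python
import math

def std_normal_cdf(v):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-v / math.sqrt(2.0))

def binomial_pmf(k, n, p):
    """Pr[X = k] for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def share_multiple_equilibria(alpha, beta, p, n=4):
    """E_X[(F(-alpha + beta (X + 1)) - F(-alpha + beta X))^2] for binomial X."""
    total = 0.0
    for x in range(n + 1):
        width = (std_normal_cdf(-alpha + beta * (x + 1))
                 - std_normal_cdf(-alpha + beta * x))
        total += binomial_pmf(x, n, p) * width**2
    return total

# Table 3 "True" values with p = 0, using |beta|.
for alpha, beta in [(0.12, 0.25), (0.41, 0.81), (0.90, 1.80)]:
    print(alpha, beta, round(share_multiple_equilibria(alpha, beta, 0.0), 3))
```

Under these assumptions the three p = 0 parameter pairs give shares close to one, ten, and forty percent, in line with the column headings of Table 3.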
6. Estimating expansion costs in the Mexican supermarket industry
Section 4 characterizes an entry game with expansion costs. In this section, I detail
how to estimate the parameters of such a game using data on the Mexican supermarket
industry and a PLEC estimator.
6.1. Estimation strategy
Assume firms replay the game every year and let $a^t \equiv (a^t_1, \ldots, a^t_I)$ denote the observed
outcome of the game in year t.16 In addition, assume the profit index, $q^t_{im}$ in Eq. (1), is
distributed log-normal with mean parameter $x^t_{im}\gamma$ and scale parameter equal to one17
and is iid across firms, markets, and years. The parameters of the model are:
$\theta \equiv (\gamma, \theta^{own}, \theta^{rival}, \theta^{fc}, \theta^{lin}, \theta^{cvx})$.
Recall from the end of Section 4 that $a(\mathcal{E}, q, z)$ denotes the outcome of the game
given equilibrium selection rule $\mathcal{E}$, profit index q, and firms’ starting stores z. As $q^t_{im}$ is
log-normal with mean parameter $x^t_{im}\gamma$, it can be written as $q^t_{im} = \exp(x^t_{im}\gamma + \varepsilon^t_{im})$ for a
Standard Normal shock $\varepsilon^t_{im}$. Hence, re-write the outcome of the game as $a(\varepsilon \mid \mathcal{E}, x, z, \theta)$,
which makes explicit the dependency on the Standard Normal shock, on the exogenous
variables, x and z, and on the parameter vector θ.
By revealed preference, i.e. taking the observed outcome as the game’s equilibrium,
the log-likelihood is:
$$
L(\theta) = \sum_{t=1}^{T} \ln \Pr\left[a(\varepsilon \mid \mathcal{E}, x^t, z^t, \theta) = a^t\right] \tag{20}
$$
16
Throughout this section, I add time superscripts to denote variation in a variable across time.
17
This normalization is typical in entry models. The most common alternative normalization (Manski,
1975; Fox, 2018) sets the parameter value of a covariate with full support to one. This is done when
the researcher does not want to make a distributional assumption on the unobservable, allowing higher
moments of the unobservable to be unknown.
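As with the PLEC estimator, I calculate the probability of ε by taking the intersection
of three sets of half-spaces. Specifically, recall the definitions of $\Pi_i(a_i, a_{-i})$, $\mathcal{A}^a(a_i, z_i)$
and $\mathcal{A}^b(a_i, z_i)$ from Eqs. (3), (7) and (8), and define the three half-spaces:
$$
\Lambda^a(\varepsilon \mid x, z, a, \theta) = \left\{\varepsilon \in \mathbb{R}^{I \cdot M} \mid \Pi_i(a_i, a_{-i}) \ge \Pi_i(y, a_{-i}) \;\; \forall y \in \mathcal{A}^a(a_i, z_i) \;\; \forall i \in I\right\} \tag{21}
$$
$$
\Lambda^b(\varepsilon \mid x, z, a, \theta) = \left\{\varepsilon \in \mathbb{R}^{I \cdot M} \mid \Pi_i(a_i, a_{-i}) \ge \Pi_i(y, a_{-i}) \;\; \forall y \in \mathcal{A}^b(a_i, z_i) \;\; \forall i \in I\right\} \tag{22}
$$
$$
\Lambda^c(\varepsilon \mid x, z, a, \mathcal{E}, \theta) = \left\{\varepsilon \in \mathbb{R}^{I \cdot M} \mid a(\varepsilon \mid \mathcal{E}, x, z, \theta) = a\right\} \tag{23}
$$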
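The last set, $\Lambda^c(\varepsilon \mid x, z, a, \mathcal{E}, \theta)$, is the only one that uses the equilibrium selection rule.
It is nested within the other two: the intersection of all three is exactly the third. As with the PLEC,
express the likelihood as the sum of three partial log-likelihoods:
$$
\begin{aligned}
L(\theta) &= \sum_{t=1}^{T} \ln \Pr\left[\Lambda^a(\varepsilon \mid x^t, z^t, a^t, \theta) \cap \Lambda^b(\varepsilon \mid x^t, z^t, a^t, \theta) \cap \Lambda^c(\varepsilon \mid x^t, z^t, a^t, \theta)\right] \\
&= \sum_{t=1}^{T} \ln \Pr\left[\Lambda^a_t(\theta)\right] + \sum_{t=1}^{T} \ln \Pr\left[\Lambda^b_t(\theta) \mid \Lambda^a_t(\theta)\right] + \sum_{t=1}^{T} \ln \Pr\left[\Lambda^c_t(\theta) \mid \Lambda^b_t(\theta) \cap \Lambda^a_t(\theta)\right]
\end{aligned} \tag{24}
$$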
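where I’ve used a compressed notation: $\Lambda^a_t(\theta) \equiv \Lambda^a(\varepsilon \mid x^t, z^t, a^t, \theta)$.
I estimate the parameters with a PLEC estimator, $\hat{\hat{\theta}}$, which adds a corrective term
to the maximizer of the partial log-likelihood given by the first two elements of the
likelihood, $\hat{\theta}$:
$$
\hat{\theta} \equiv \arg\max_{\theta \in \Theta} \; \sum_{t=1}^{T} \ln \Pr\left[\Lambda^a_t(\theta)\right] + \sum_{t=1}^{T} \ln \Pr\left[\Lambda^b_t(\theta) \mid \Lambda^a_t(\theta)\right] \tag{25}
$$
$$
\hat{\hat{\theta}} \equiv \hat{\theta} - \left[\nabla^2 L(\hat{\theta})\right]^{-1} \nabla \sum_{t=1}^{T} \ln \Pr\left[\Lambda^c_t(\hat{\theta}) \mid \Lambda^b_t(\hat{\theta}) \cap \Lambda^a_t(\hat{\theta})\right] \tag{26}
$$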
Standard errors are calculated using the sandwich variance estimator given in
Proposition 1, found in the Appendix.
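Eq. (26) has the form of a single Newton-type step taken from the partial-likelihood maximizer. The Python sketch below illustrates only that arithmetic: the function names are hypothetical, and the Hessian of the log-likelihood and the gradient of the conditional log-likelihood are assumed to be supplied by the user (for instance from finite differences).

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("Hessian is numerically singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def plec_correction(theta_hat, hessian, cond_gradient):
    """theta_hat - H^{-1} g, the one-step correction in Eq. (26).

    hessian: Hessian of the log-likelihood evaluated at theta_hat.
    cond_gradient: gradient of the conditional log-likelihood at theta_hat.
    """
    step = solve_linear(hessian, cond_gradient)
    return [t - s for t, s in zip(theta_hat, step)]
```

Solving the linear system rather than inverting the Hessian explicitly is a numerical convenience of the sketch, not something the text prescribes.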
6.2. Calculating likelihoods
One advantage of computing $\hat{\hat{\theta}}$ instead of the maximizer of the log-likelihood is that
$\Pr[\Lambda^a_t]$ has an exact closed form solution and $\Pr[\Lambda^b_t \mid \Lambda^a_t]$ can be approximated by a
smooth function. In contrast, the likelihood function, L(θ), has to be approximated by
a non-smooth simulated likelihood. Moreover, $\Lambda^a_t$ defines upper and lower bounds on
the profit shocks, ε. As the profit shocks are iid across firms, markets, and time, these
upper and lower bounds can be expressed for each firm, market, and time separately, in
essence making $\ln \Pr[\Lambda^a_t]$ the sum of censored Normals’ log-likelihood. Similarly, $\Lambda^b_t$ is
the intersection of half-spaces proper to each firm and thus can be calculated separately
for each firm. The following paragraphs expand on this.
6.2.1. Partial likelihood from single market deviations ($\Pr[\Lambda^a]$)
Recall from Section 4 that $\mathcal{A}^a$ considers inequalities that perturb the optimal alloca-
tion by only one unit and in only one market. Hence it is useful to introduce notation
that captures how profits change when adding or removing one additional store. The fol-
lowing variables, all of which are functions of θ, $x^t$, $z^t$, and $a^t$, do just that. Specifically,
let k be either one or minus one, recall $s^t_{im} \equiv z^t_{im} + a^t_{im}$ and $\omega^t_i \equiv \sum_{m=1}^{M} \max\{a^t_{im}, 0\}$,
and define:
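1. Effective profit index (logged):
$$
r^t_{im} \equiv x^t_{im}\gamma + \sum_{j \neq i} \theta^{rival} \ln\left(s^t_{jm} + 1\right)
$$
2. Change in unit profits:
$$
g^t_{im}(k) \equiv \begin{cases} s^t_{im}\left(s^t_{im} + 1\right)^{\theta^{own}} - \left(s^t_{im} + k\right)\left(s^t_{im} + 1 + k\right)^{\theta^{own}} & \text{if } s^t_{im} + k \ge 0 \\ \infty & \text{otherwise} \end{cases}
$$
3. Change in costs:
$$
h^t_{im}(k) \equiv \begin{cases} -\theta^{fc} k - \theta^{lin} k - \theta^{cvx}\left[\left(\omega^t_i + k\right) \ln\left(\omega^t_i + k\right) - \omega^t_i \ln \omega^t_i\right] & \text{if } a^t_{im} > 0 \,\lor\, \left(a^t_{im} = 0 \land k > 0\right) \\ -\theta^{fc} k & \text{otherwise} \end{cases}
$$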
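With this notation, re-write $\Lambda^a_t$ as
$$
\begin{aligned}
\Lambda^a_t &= \left\{\varepsilon \in \mathbb{R}^{I \cdot M} \mid \Pi_i(a^t_i, a^t_{-i}) - \Pi_i(a^t_i \pm e_m, a^t_{-i}) \ge 0 \;\land\; a^t_{im} \pm e_m \ge z^t_{im} \;\; \forall (m, i) \in M \times I\right\} \\
&= \left\{\varepsilon \in \mathbb{R}^{I \cdot M} \mid \exp(\varepsilon_{im} + r^t_{im}) \cdot g^t_{im}(k) \ge h^t_{im}(k) \;\; \forall k \in \{1, -1\} \;\; \forall (m, i) \in M \times I\right\} \\
&= \left\{\varepsilon \in \mathbb{R}^{I \cdot M} \mid \ln h^t_{im}(-1) - r^t_{im} - \ln g^t_{im}(-1) \le \varepsilon_{im} \le \ln\left[-h^t_{im}(1)\right] - r^t_{im} - \ln\left[-g^t_{im}(1)\right] \;\; \forall (m, i) \in M \times I\right\}
\end{aligned} \tag{27}
$$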
where the second line follows from the definition of profits, per-store profits, and costs,
i.e. Eqs. (1)–(3), and the last line follows from g(1) < 0 and from taking logs on both
sides of the inequalities.
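Denote with Φ( · ) the standard normal CDF and note that $\varepsilon^t_{im}$ is normally distributed
and independent across firms, markets, and time. Hence, the log-likelihood of $\Lambda^a_t$ is:
$$
\sum_{t=1}^{T} \ln \Pr\left[\Lambda^a_t\right] = \sum_{t=1}^{T} \sum_{i=1}^{I} \sum_{m=1}^{M} \ln\left[\Phi\left(\psi^t_{im}(1)\right) - \Phi\left(\psi^t_{im}(-1)\right)\right] \quad \text{where}
$$
$$
\psi^t_{im}(k) \equiv \ln\left[-k \cdot h^t_{im}(k)\right] - r^t_{im} - \ln\left[-k \cdot g^t_{im}(k)\right] \tag{28}
$$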
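To make the bounds in Eqs. (27) and (28) concrete, the following Python sketch computes the contribution of a single firm-market-year cell. It is a minimal illustration under stated assumptions: the function names are hypothetical, the sign of the convex-cost term in `h` is chosen so that −k·h(k) agrees with footnote 18, and the parameter values in the usage line are arbitrary.

```python
import math

def std_normal_cdf(v):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-v / math.sqrt(2.0))

def xlogx(w):
    """w ln w, with 0 ln 0 taken to be 0."""
    return w * math.log(w) if w > 0 else 0.0

def g(s, k, theta_own):
    """Change in unit profits from perturbing s stores by k; infinite if s + k < 0."""
    if s + k < 0:
        return math.inf
    return s * (s + 1) ** theta_own - (s + k) * (s + 1 + k) ** theta_own

def h(a, omega, k, fc, lin, cvx):
    """Change in costs; a is stores added in the market, omega is stores added overall."""
    if a > 0 or (a == 0 and k > 0):
        return -fc * k - lin * k - cvx * (xlogx(omega + k) - xlogx(omega))
    return -fc * k

def psi(k, r, s, a, omega, theta_own, fc, lin, cvx):
    """Bound on the profit shock, Eq. (28): upper for k = 1, lower for k = -1."""
    return (math.log(-k * h(a, omega, k, fc, lin, cvx)) - r
            - math.log(-k * g(s, k, theta_own)))

def loglik_cell(r, s, a, omega, theta_own, fc, lin, cvx):
    """ln[Phi(psi(1)) - Phi(psi(-1))] for one firm-market-year cell."""
    upper = psi(1, r, s, a, omega, theta_own, fc, lin, cvx)
    lower = psi(-1, r, s, a, omega, theta_own, fc, lin, cvx)
    return math.log(std_normal_cdf(upper) - std_normal_cdf(lower))

# One existing store, no stores added anywhere this year, illustrative parameters.
print(loglik_cell(r=1.0, s=1, a=0, omega=0, theta_own=-0.772, fc=1.0, lin=16.83, cvx=1.587))
```

When the firm has no stores in the market (s = 0), removing a store is infeasible, `g` returns infinity, and the lower bound becomes minus infinity, so the cell reduces to a one-sided censored normal term.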
One advantage of this notation is that $r^t_{im}$ is linear in the parameters and
$\ln\left[-k \cdot h^t_{im}(k)\right]$ can be approximated by a linear function of parameters. Specifically,
define the alternative cost parameters $\tilde{\theta}^{fc} \equiv \ln \theta^{fc}$, $\tilde{\theta}^{lin} \equiv \ln\left(1 + \theta^{lin}/\theta^{fc}\right)$, and
$\tilde{\theta}^{cvx} \equiv \theta^{cvx}/(\theta^{fc} + \theta^{lin})$ and the function $\rho(\omega) \equiv \omega \ln \omega - (\omega - 1) \ln(\omega - 1)$. Whenever $\theta^{cvx}$ is
small, $\ln\left[-k \cdot h^t_{im}(k)\right]$ is approximately a linear function of the alternative parameters.18
Hence, $\ln \Pr[\Lambda^a_t]$ is the sum of many censored normal log-likelihoods whose mean
parameter values are linear in all but one of the structural model’s parameters: $\theta^{own}$.
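The quality of this linearization can be checked numerically. The Python sketch below compares the exact log cost with its linear approximation in the alternative parameters; the function names and the parameter values in the usage lines are assumptions made for illustration.

```python
import math

def rho(w):
    """rho(w) = w ln w - (w - 1) ln(w - 1), with 0 ln 0 taken to be 0."""
    def xlogx(v):
        return v * math.log(v) if v > 0 else 0.0
    return xlogx(w) - xlogx(w - 1)

def log_cost_exact(fc, lin, cvx, w):
    """ln[fc + lin + cvx * rho(w)], the exact expression."""
    return math.log(fc + lin + cvx * rho(w))

def log_cost_linear(fc, lin, cvx, w):
    """Approximation that is linear in the transformed (tilde) parameters."""
    fc_tilde = math.log(fc)
    lin_tilde = math.log(1.0 + lin / fc)
    cvx_tilde = cvx / (fc + lin)
    return fc_tilde + lin_tilde + cvx_tilde * rho(w)

# Small convex cost: the two expressions nearly coincide.
print(log_cost_exact(1.0, 16.83, 1.587, 2), log_cost_linear(1.0, 16.83, 1.587, 2))
# Large convex cost: the approximation error becomes visible.
print(log_cost_exact(1.0, 16.83, 10.0, 2), log_cost_linear(1.0, 16.83, 10.0, 2))
```

Because ln(1 + c) ≤ c, the linear expression never falls below the exact one, and the gap widens as the transformed convex-cost parameter grows.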

6.2.2. Partial likelihood from market-pair deviations ($\Pr[\Lambda^b \mid \Lambda^a]$)
$\mathcal{A}^b$ considers the inequalities that perturb the optimal allocation by adding one more
store in markets where stores were not removed, i.e. markets where it is costly to add
more stores, and adding one fewer store in markets where stores were added. In effect, the
opportunity cost of adding a store in a given market is not adding it in another. $\mathcal{A}^b$
captures this opportunity cost by comparing the benefits of reducing a store in a market
where it is added optimally, and adding that store in any other market. However, one
need not compute all pair-wise comparisons. It is sufficient to show the smallest benefit
received from adding a store is larger than the biggest benefit that could have been
received from adding it elsewhere. Hence, there really is only one half-space that needs
to be considered for each firm, but such half-space depends on the firm’s profit shocks in
all markets:
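$$
\begin{aligned}
\Lambda^b_t \mid \Lambda^a_t &= \left\{\varepsilon \in \Lambda^a_t \mid \Pi_i(a^t_i, a^t_{-i}) - \Pi_i(a^t_i + e_m - e_n, a^t_{-i}) \ge 0, \;\; \forall (m, n) \in N_{\ge 0}(a^t_i) \times N_{>0}(a^t_i), \;\; \forall i \in I\right\} \\
&= \left\{\varepsilon \in \Lambda^a_t \mid \exp(\varepsilon_{in} + r^t_{in}) \cdot g^t_{in}(-1) \ge -\exp(\varepsilon_{im} + r^t_{im}) \cdot g^t_{im}(1), \;\; \forall (m, n) \in N_{\ge 0}(a^t_i) \times N_{>0}(a^t_i), \;\; \forall i \in I\right\} \\
&= \left\{\varepsilon \in \Lambda^a_t \mid \min_{n \in N_{>0}(a^t_i)} \left\{\varepsilon_{in} + r^t_{in} + \ln g^t_{in}(-1)\right\} \ge \max_{m \in N_{\ge 0}(a^t_i)} \left\{\varepsilon_{im} + r^t_{im} + \ln\left[-g^t_{im}(1)\right]\right\}, \;\; \forall i \in I\right\}
\end{aligned} \tag{29}
$$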
And as the profit shocks are independent across firms, the partial likelihood can be
decomposed into the sum of each firm’s log-partial-likelihood. These partial likelihoods,
however, do not have closed form representations. Thus, I approximate them using Monte
Carlo integration. Specifically, I take 1000 draws from an M·I-dimensional censored nor-
mal distribution with upper and lower limits given by $\psi^t_{im}(1)$ and $\psi^t_{im}(-1)$, respectively.
I define $\varepsilon^r$ to be one such draw and 1{ · } to be the indicator function. The simulated

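18
Note that $\ln\left[-k \cdot h^t_{im}(k)\right] = \ln\left[\theta^{fc} + \theta^{lin} + \theta^{cvx} \cdot \rho\left(\omega^t_i + \tfrac{1}{2}(1 + k)\right)\right]$ whenever $a^t_{im} > 0$ or whenever
$a^t_{im} = 0$ and k = 1. Simple algebra gives $\ln\left[-k \cdot h^t_{im}(k)\right] = \tilde{\theta}^{fc} + \tilde{\theta}^{lin} + \ln\left[1 + \tilde{\theta}^{cvx} \cdot \rho\left(\omega^t_i + \tfrac{1}{2}(1 + k)\right)\right]$. By
properties of log functions, ln[1 + c] ≈ c whenever c is close to zero. Hence, $\ln\left[-k \cdot h^t_{im}(k)\right] \approx \tilde{\theta}^{fc} + \tilde{\theta}^{lin} +
\tilde{\theta}^{cvx} \cdot \rho\left(\omega^t_i + \tfrac{1}{2}(1 + k)\right)$. The case for when $a^t_{im} = 0$ and k = −1 is straightforward.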
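log-partial-likelihood is:
$$
\sum_{t=1}^{T} \sum_{i=1}^{I} \ln \frac{1}{R} \sum_{r=1}^{R} 1\left\{ \min_{n \in N_{>0}(a^t_i)} \left\{\varepsilon^r_{in} + r^t_{in} + \ln g^t_{in}(-1)\right\} \ge \max_{m \in N_{\ge 0}(a^t_i)} \left\{\varepsilon^r_{im} + r^t_{im} + \ln\left[-g^t_{im}(1)\right]\right\} \right\} \tag{30}
$$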
Many practices can be implemented to improve performance of simulated likelihoods
(cf. Train, 2009), including using Halton sequences in constructing $\{\varepsilon^r\}_{r=1}^{R}$, replac-
ing the indicator function with a smoothing kernel (e.g. the standard normal CDF),
and replacing the maximum and minimum operators with smoothing kernels, e.g.
$J(x_1, \ldots, x_N \mid w) = \sum_n x_n \exp(w x_n) / \sum_n \exp(w x_n)$. In the current application I implement
the first two.
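A minimal Python sketch of the simulated probability inside Eq. (30) for a single firm and year, using Halton draws and a normal-CDF smoothing kernel, might look as follows. All names, the bandwidth, and the cap of ten markets are assumptions of this example; the inverse-CDF method used to draw from the truncated normal is also a choice made here rather than one stated in the text.

```python
from statistics import NormalDist

STD = NormalDist()
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]  # one Halton base per market (sketch limit)

def halton(index, base):
    """index-th element (index >= 1) of the Halton sequence in the given base."""
    result, f = 0.0, 1.0 / base
    while index > 0:
        result += f * (index % base)
        index //= base
        f /= base
    return result

def truncated_normal(lo, hi, u):
    """Inverse-CDF draw from N(0, 1) restricted to [lo, hi], given u in (0, 1)."""
    p_lo, p_hi = STD.cdf(lo), STD.cdf(hi)
    p = p_lo + u * (p_hi - p_lo)
    return STD.inv_cdf(min(max(p, 1e-12), 1.0 - 1e-12))

def simulated_prob(added, candidates, lower, upper, v_remove, v_add,
                   draws=1000, bandwidth=0.05):
    """Smoothed frequency of min_n{eps_n + v_remove[n]} >= max_m{eps_m + v_add[m]}.

    added:      markets where the firm added stores (must be non-empty)
    candidates: markets where the firm did not remove stores
    v_remove[n] stands for r_n + ln g_n(-1); v_add[m] for r_m + ln[-g_m(1)].
    """
    markets = sorted(set(added) | set(candidates))
    if len(markets) > len(PRIMES):
        raise ValueError("sketch supports at most %d markets" % len(PRIMES))
    total = 0.0
    for r in range(1, draws + 1):
        eps = {m: truncated_normal(lower[m], upper[m], halton(r, PRIMES[j]))
               for j, m in enumerate(markets)}
        smallest_kept = min(eps[n] + v_remove[n] for n in added)
        largest_forgone = max(eps[m] + v_add[m] for m in candidates)
        # The normal CDF replaces the indicator function as a smoothing kernel.
        total += STD.cdf((smallest_kept - largest_forgone) / bandwidth)
    return total / draws
```

Taking the log of `simulated_prob` for each firm and year and summing gives a smooth approximation to Eq. (30); a smaller bandwidth tracks the indicator more closely at the cost of a less smooth objective.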

6.2.3. Conditional likelihood ($\Pr[\Lambda^c \mid \Lambda^a \cap \Lambda^b]$)
I solve for the conditional likelihood through simulation. For each time period, I sim-
ulate 1000 draws from an M·I-dimensional censored normal, with upper and lower limits
given by $\psi^t_{im}(1)$ and $\psi^t_{im}(-1)$. I then remove any draws that do not satisfy the restrictions
in $\mathcal{A}^b$, i.e. the Accept-Reject method described in Train (2009). This guarantees that the
simulated draws are taken from $\Lambda^a_t \cap \Lambda^b_t$. Denote with $R_t$ the number of accepted draws,
which varies across time but is increasing linearly in the number of original draws, R.
Also denote with $\varepsilon^{rt}$ one such accepted draw in period t. These draws are functions of
the parameter θ as well as the exogenous data, $(x^t, z^t, a^t)$. For each simulated draw I
calculate the equilibrium action, $a(\varepsilon^{rt} \mid \mathcal{E}, x^t, z^t)$, where the equilibrium selection rule is
given by best responding in random order and having the starting actions of all firms be
neither adding nor removing any stores. The simulated conditional likelihood is
$$
\sum_{t=1}^{T} \ln \frac{1}{R_t} \sum_{r=1}^{R_t} 1\left\{a^t = a(\varepsilon^{rt} \mid \mathcal{E}, x^t, z^t)\right\} \tag{31}
$$
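The accept-reject step and the frequency in Eq. (31) can be sketched for one period as below. The callables `satisfies_b` and `solve_equilibrium` are hypothetical placeholders for the market-pair restrictions and the equilibrium solver, which are not reproduced here.

```python
import math

def conditional_loglik_period(draws, satisfies_b, solve_equilibrium, observed):
    """Log share of accepted draws whose simulated equilibrium matches the data.

    draws:             candidate shock draws already satisfying the single-market bounds
    satisfies_b:       hypothetical predicate for the market-pair restrictions
    solve_equilibrium: hypothetical solver mapping a draw to an equilibrium action
    observed:          the observed action profile for the period
    """
    accepted = [eps for eps in draws if satisfies_b(eps)]   # accept-reject step
    if not accepted:
        raise ValueError("no draw satisfies the market-pair restrictions")
    hits = sum(1 for eps in accepted if solve_equilibrium(eps) == observed)
    return math.log(hits / len(accepted)) if hits > 0 else -math.inf
```

Summing this quantity over periods gives the simulated conditional likelihood; because it is a frequency of discrete matches, it changes in jumps as the parameters move, which is the non-smoothness the text refers to.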
The gradient and Hessian of this function are calculated through forward-differencing.
As the simulated conditional likelihood is not smooth, and only approximates a smooth
function as R becomes large, the forward-differencing step length is chosen so as to
minimize errors in the calculation of $\hat{\hat{\theta}}$. Details are given in the appendix.
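The role of the step length can be seen in a small Python sketch of forward differencing applied to a step-like objective. The function name and the toy objective are assumptions for illustration; the appendix's actual rule for choosing the step is not reproduced here.

```python
import math

def forward_gradient(f, theta, step):
    """Forward-difference gradient of f at theta with a common step length."""
    base = f(theta)
    grad = []
    for j in range(len(theta)):
        bumped = list(theta)
        bumped[j] += step
        grad.append((f(bumped) - base) / step)
    return grad

# A staircase that approximates f(t) = t, mimicking a simulated frequency.
staircase = lambda t: math.floor(t[0] * 100) / 100

print(forward_gradient(staircase, [0.505], 1e-6))  # step too small: flat region, slope 0
print(forward_gradient(staircase, [0.505], 0.1))   # larger step recovers a slope near 1
```

With a very small step the difference falls inside a flat region of the staircase and the estimated slope is zero, while a larger step averages over several jumps and recovers the underlying slope.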
6.3. Exogenous profit shifters
The profit index, $q^t_{im}$ in Eq. (1), captures the value from adding or removing stores
from a market. The higher the value the more stores a firm would find it optimal to have
in said market. As the mean value of $\log q^t_{im}$ is given by $x^t_{im}\gamma$, the variables included in $x^t_{im}$
should capture firms’ incentives to have stores in a given market. Hence, I include within
$x^t_{im}$ variables that represent stores’ current and future demand, as well as operating costs.
The main proxy for demand is population counts. However, firms’ profitability is not
likely linear in population counts. Thus, I include a linear spline of population counts
with three knots, at 150k, 340k, and 590k inhabitants, respectively. These values correspond
to the 25th, 50th, and 75th population percentiles among municipalities with stores. In
addition to population, variables pertaining to income, education, poverty, and income
stability are also included: average income, GDP per capita, population density, the
degree of urbanization, fraction of households with access to public utilities, years of
schooling among adults, and the fractions of adults working in the formal sector, in
government jobs, retired, and with middle school degrees. I also include indicators for
touristic destinations, border cities, a Mexico City municipality, and a municipality that
is within an MSA and is not in Mexico City. Population counts, income, and fraction of
households with access to public utilities are also interacted with the dummies indicating
if the municipality is within Mexico City or within other non-Mexico City MSAs. This
allows these variables to have a differential impact in MSAs and in non-MSAs, and
particularly in Mexico City.
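One way to build a linear spline with these three knots is sketched below in Python. The basis is computed on log population, following the note to Table 5 that variables outside the unit interval enter in logs; the exact form of the basis and the function name are assumptions of this example.

```python
import math

KNOTS = (150_000, 340_000, 590_000)  # inhabitants, as stated in the text

def log_population_spline(population, knots=KNOTS):
    """Piecewise-linear basis in log population with one term per segment.

    The terms sum to log(population), so each coefficient can be read as the
    slope of the profit index within its population range.
    """
    x = math.log(population)
    cuts = [math.log(k) for k in knots]
    edges = [-math.inf] + cuts + [math.inf]
    basis = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        if lo == -math.inf:
            basis.append(min(x, hi))                        # segment below the first knot
        else:
            basis.append(min(max(x - lo, 0.0), hi - lo))    # increment within the segment
    return basis

print(log_population_spline(200_000))
```

The four resulting terms line up with the four population rows reported in Table 5.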
Costs are captured with the number of department stores, convenience stores, phar-
macies, and small grocery stores, as well as with measures of economies of distribution
and of density. Distance to the nearest distribution center captures economies of distri-
bution, while economies of density are captured with: (a) distance to nearest market with
prior stores and, (b) the sum of the inverse of the distance to all prior stores. Firm and
year fixed effects are also included to account for firm-wide profitability and temporal
macroeconomic shocks affecting all firms.
I also include variables that capture dynamic incentives, i.e. learning, future growth,
future distribution centers, etc. Learning is captured with the age of the oldest store in
the market and with dummies for when there are no prior stores in the market and for
when there have been stores for only one year. Future distribution economies are captured
with the distance to the nearest distribution center that will be active three years later.
Demand growth is captured with the market’s year-to-year population growth.
The above demand and cost factors control for much heterogeneity. However, it is
likely that some municipalities are more profitable than others, even after conditioning
on the above variables. Such unobserved profitability would drive multiple firms to add
stores to a given municipality, biasing the rivalry parameter θrival . I address this issue
by including municipality fixed effects. These fixed effects capture any unobserved mu-
nicipality specific profitability that is common to all firms and that does not vary over
time. While effective at controlling for unobserved market-specific heterogeneity, the inclu-
sion of municipality fixed effects has two drawbacks. First, they suffer from an incidental
parameters problem: with only six firms and eleven years, each fixed effect is estimated
off only 66 observations. As the model is non-linear in the parameters, any bias in the
municipality fixed effects permeates to all other estimates. Second, the estimated fixed
effect is unbounded below for any municipality where no stores are added and which
has no preexisting stores. Hence, the fixed effects effectively remove these municipalities
from the estimation sample, creating sample selection bias.
Unobserved market heterogeneity that is specific to a given firm and that is persistent
over time biases upward the cannibalization parameter, θown . That is, in the presence of
such unobserved (positive) heterogeneity, a firm will open many stores in the given mar-
ket. This will appear as if stores are weak substitutes to each other when in truth they
are strong substitutes, profiting from the same unobserved factors. To address this con-
cern, I estimate a specification with firm-municipality random effects. The random effects
capture persistent, firm-municipality specific unobserved heterogeneity. I use random ef-
fects instead of fixed effects, as each firm-municipality fixed effect would be estimated
solely off the eleven-year time span and would suffer acutely from the incidental param-
eters problem and sample bias. However, these random effects can be estimated only
by omitting the PLEC’s correction term, as well as the partial likelihood corresponding
to market-pair deviations, the details of which are provided in the appendix. Hence, the
specification with random effects corrects for unobserved firm-market heterogeneity but
introduces model misspecification.
Finally, firms also differ from each other in their competitive pressure. For example,
the profit loss that a Walmart store imposes on a Soriana store may differ from the
profit loss a Soriana store imposes on a Walmart store. Thus I include a specification in
which the cannibalization and competition parameters, θown and θrival , vary across firms.
Similarly, I allow for the cost parameters, θlin and θcvx to vary across firms.
6.4. Discussion on identification
The exogenous covariates, xt , include a constant. As this constant is collinear with the
fixed cost parameter, θfc , I normalize the fixed cost parameter to one. Such normalization
is typical in static entry models (cf. Sutton, 1991), as the inequalities that define entry
thresholds are preserved under positive, scalar transformations.19
The profit index parameters, γ, are identified off where firms choose to add and remove
stores. For example, if firms add stores in markets with high population and with low
poverty, the model infers –and estimates– that the coefficient on population is positive
and that on poverty is negative. Similarly, if firms shun markets where they already have
many stores, or where rivals have many stores, the respective cannibalization and rivalry
parameters, θown and θrival , are estimated negative.20
In contrast, cost parameters, θlin and θcvx are identified from how many stores firms
choose to add. If firms add few stores, θlin is estimated large so that only with low
probability would the unobserved shocks, ε, be sufficiently large to justify adding stores.
The increasing-cost parameter, θcvx , is identified from variation in how many stores are
added at the national level, which varies over time and across firms. More specifically, for
19
For example, if M is a measure of market size, π are the per-unit-of-market size variable profits, and F
are fixed costs, a firm enters if Mπ − F ≥ 0. This inequality is unaltered for any alternative market size
and fixed cost pairs, M′ and F′, such that M/F = M′/F′.
20
The cannibalization and rivalry parameters are also identified off coordination in firms’ choices: firms
simultaneously adding stores in different markets, rather than the same markets. Such coordination is not
included in the partial likelihood, but in the PLEC’s correction term.
each year and each firm, the profit parameters and exogenous variables imply the number
of stores that should be added. If the true number of stores added is small whenever this
expected number is high, θcvx is estimated as large: it is more costly to add stores when
many stores should be added than when few stores should be added. This is similar to a
simultaneous equations model, in which each market represents an equation, choosing to
add stores is the dependent variable, and this choice depends on the choices in all other
markets. In such case, the identifying restriction is that θcvx does not vary over time and
depends solely on the sum of choices.
Finally, the cannibalization estimate, θown , is identified separately from the dummy
variable on store age, a dummy that equals one whenever the firm has no stores in the
market, from two sources: first, from store closures, in which the dummy variable is always
one, and therefore has no effect on decisions; second, from variation in the number of
starting stores, z, in which the dummy variable is defined as there being any number of
starting stores, while the cannibalization effects vary with the number of starting stores.
7. Results
7.1. Entry costs
I estimate six different specifications of the structural model, shown in Tables 4 and 5,
where the former contains the estimates on costs and competition parameters, and the
latter on variables affecting variable profits. Specification I is the most stringent specifi-
cation. In this specification the mean profit index is a function of solely a linear spline of
log-population and firm and year fixed effects. Specification II expands on Specification
I in that the profit index includes an extensive set of covariates, including detailed de-
mographics, non-grocery retailers, and variables that capture economies of distribution
and of density. Specification III adds variables that capture dynamic incentives: market
growth, store age, and distance to future distribution centers. Specifications IV, V, and VI
extend Specification III in different ways. Specification IV adds municipality fixed effects
to control for time invariant unobserved market factors common to all firms. Specification
V adds firm-municipality random effects to control for time invariant unobserved market
factors specific to each firm. Finally, Specification VI allows cost and competition pa-
rameters to differ across firms, estimating heterogeneity in the key parameters governing
store expansion.
Regarding cost estimates, i.e. Table 4, the convex cost parameter is positive and signif-
icant across all six specifications. Focusing on Specification III, a convex cost parameter
value of 0.089, coupled with a linear cost parameter value of 2.881, implies the cost of
adding the second store is 13% higher than the cost of adding the first store.21 More im-
21
The cost of the nth store is given by $\theta^{lin} + \theta^{cvx}\left(n \ln n - (n - 1) \ln(n - 1)\right)$. However, the parameter
values shown in Table 4 correspond to the linearized transformation of the cost parameters. Thus, for the
parameters $\tilde{\theta}^{lin}$ and $\tilde{\theta}^{cvx}$ shown in Table 4, the cost parameters are calculated as $\theta^{lin} = \exp(\tilde{\theta}^{lin}) - 1$ and
$\theta^{cvx} = \tilde{\theta}^{cvx} \exp(\tilde{\theta}^{lin})$.
Table 4
Structural estimates - cost and rivalry parameters.

                                                                                                   VI
                                            I          II         III        IV         V          CCM        CHD        GG         LEY        SOR        WM
Convex cost ($\tilde{\theta}^{cvx}$)        0.104      0.097      0.089      0.090      0.081      0.460      0.242      0.137      0.199      0.892      0.299
                                            (0.001)    (0.013)    (0.016)    (0.018)    (0.031)    (0.068)    (0.043)    (0.034)    (0.040)    (0.101)    (0.045)
Linear cost ($\tilde{\theta}^{lin}$)        1.391      1.583      2.881      3.117      3.867      2.436      2.306      3.036      2.510      0.069†     1.841
                                            (0.004)    (0.048)    (0.081)    (0.089)    (0.136)    (0.230)    (0.151)    (0.141)    (0.163)    (0.383)    (0.205)
Cannibalization ($\theta^{own}$)            −0.956     −0.947     −0.772     −0.856     −0.968     −0.712     −0.783     −0.775     −0.548     −0.753     −0.691
                                            (0.005)    (0.005)    (0.012)    (0.011)    (0.027)    (0.035)    (0.047)    (0.023)    (0.037)    (0.032)    (0.022)
Rivalry ($\theta^{rival}$)                  −0.079     −0.107     −0.188     −0.448     −0.187     −0.324     −0.353     −0.285     −0.341     −0.418     −0.239
                                            (0.013)    (0.016)    (0.020)    (0.035)    (0.049)    (0.031)    (0.030)    (0.029)    (0.034)    (0.028)    (0.035)
Partial log-likelihood                      −5391      −4945      −3309      −2563      −471       −3356
Size of correction term                     7.7×10^−5  9.6×10^−4  3.7×10^−4  0.122      na         3.2×10^−3
Equilibrium selection -                     0.769      0.623      0.163      0.001      na         0.012
  avg. of Pr[Λc | Λa ∩ Λb]

Standard errors in parentheses. (†) NOT statistically significant at 5% p-level. Abbreviations: CCM - Comercial Mexicana, CHD - Chedraui, GG - Gigante,
LEY - Casa Ley, SOR - Soriana, WM - Walmart. Convex cost and linear cost estimates shown correspond to the linearized parameters: $\tilde{\theta}^{cvx} \equiv \theta^{cvx}/(1 + \theta^{lin})$
and $\tilde{\theta}^{lin} \equiv \ln(1 + \theta^{lin})$. ‘Size of correction term’ shows the norm of the PLEC correction term (i.e. Eq. (26)) over the norm of the partial log-likelihood
estimate, $\hat{\theta}$. ‘Equilibrium selection’ averages the estimated conditional likelihood $\Pr[\Lambda^c_t \mid \Lambda^a_t \cap \Lambda^b_t]$ across all eleven years. See Table 5 for estimated parameters
on profit shifters. For Specification I, these are simply population (linear spline) and firm and year fixed effects. Specification II additionally includes extensive
demographic variables and proxies for economies of distribution and density. Specification III adds proxies for dynamic incentives. Specifications IV and V
add to Specification III market fixed effects and market-firm random effects, respectively. Specification VI includes the same variables as Specification III.
Table 5
Structural estimates - select profit shifters. Estimated values, standard errors, and increased profit index induced by a typical shift in the covariate’s value.

                                  I               II              III             IV              V               VI
                                  Est    S.E.     Est    S.E.     Est    S.E.     Est    S.E.     Est    S.E.     Est    S.E.
Population (< 150k hab)           −0.83* (0.01)   −0.75* (0.06)   −0.52* (0.06)   −0.88  (0.50)   −0.73* (0.10)   −0.49* (0.06)
  +50k hab                        −40%            −37%            −27%            −42%            −36%            −26%
Population (150k–340k hab)        0.13*  (0.04)   0.10   (0.07)   0.42*  (0.08)   0.28   (0.60)   0.11   (0.27)   0.62*  (0.08)
  +50k hab                        2.7%            1.9%            9.1%            5.7%            2.2%            13%
Population (340k–590k hab)        0.43*  (0.06)   0.65*  (0.13)   0.68*  (0.15)   3.74*  (0.74)   1.48   (0.78)   0.74*  (0.15)
  +100k hab                       9.1%            14%             15%             114%            35%             16%
Population (> 590k hab)           0.99*  (0.10)   1.08*  (0.11)   1.33*  (0.13)   5.85*  (0.92)   1.93*  (0.39)   1.61*  (0.13)
  +100k hab                       10%             12%             14%             80%             21%             18%
Local grocery stores (#)                          −0.13* (0.02)   −0.07* (0.03)   −0.06  (0.06)   −0.13* (0.06)   −0.07* (0.03)
  +1 store                                        −8.8%           −5.3%           −4.5%           −9.3%           −5.2%
Department stores†                                0.27*  (0.06)   0.78*  (0.07)   0.29   (0.18)   0.77*  (0.16)   0.94*  (0.07)
  +2 stores                                       7.3%            22%             7.7%            22%             28%
Convenience stores†                               0.44*  (0.05)   0.46*  (0.07)   0.34*  (0.09)   0.53*  (0.14)   0.53*  (0.07)
  +10 stores                                      16%             17%             13%             20%             20%
Pharmacies†                                       −0.13* (0.06)   −0.08  (0.07)   −0.07  (0.15)   0.02   (0.20)   −0.11  (0.07)
  +10 stores                                      −3.9%           −2.2%           −2.0%           0.5%            −3.2%
Distribution center (miles)                       −0.01  (0.01)   −0.06* (0.03)   −0.07  (0.03)   −0.02  (0.06)   −0.09* (0.03)
  +100 miles                                      −0.3%           −2.6%           −3.0%           −0.7%           −4.0%
Nearest active market (miles)                     0.41*  (0.02)   0.16*  (0.02)   0.23*  (0.03)   0.26*  (0.10)   0.17*  (0.02)
  +60 miles                                       33%             12%             17%             19%             12%
Network centrality (stores/mi)                    0.01   (0.02)   −0.11* (0.03)   −0.25* (0.05)   −0.16  (0.11)   −0.10* (0.03)
  +0.5 stores/miles                               1.0%            −8.2%           −17%            −11%            −7.4%
Population growth (%)                                             1.32   (0.72)   −2.57  (1.42)   −0.79  (1.70)   1.14   (0.76)
  +2.5%                                                           3.3%            −6.2%           −2.0%           2.9%
Future dist. center (miles)                                       −0.05  (0.04)   −0.01  (0.02)   0.01   (0.07)   −0.01  (0.03)
  +100 miles                                                      −2.2%           −0.6%           0.4%            −0.2%
Store age (years)                                                 −0.93* (0.03)   −0.99* (0.03)   −1.21* (0.07)   −1.04* (0.03)
  +1 year                                                         −7.8%           −8.3%           −10%            −8.7%
Store age - new market (dummy)                                    −1.44* (0.04)   −1.40* (0.05)   −1.11* (0.08)   −1.44* (0.05)
  0→1                                                             −76%            −75%            −67%            −76%
Store age - one year (dummy)                                      −2.89* (0.06)   −2.88* (0.06)   −2.84* (0.10)   −2.97* (0.06)
  0→1                                                             −94%            −94%            −94%            −95%

Standard errors in parentheses. (*) statistically significant at 5% p-level. (†) Variable expressed on the unit interval using a logistic transformation: f(x) =
x/(x + E[x]). All variables whose domain is not the unit interval are in logs. For each variable, the table includes a typical shift in the covariate’s value,
usually determined by the variable’s sample standard deviation. For each estimate, the table includes the percent change in the profit index that a typical
shift in the corresponding covariate would cause. Additional profit shifters included in Specifications II-VI but not shown above are: average income, state
GDP per capita, population density, average years of schooling among adults, fraction of population with IMSS, with ISSTE, retired, and with access to
public utilities, fraction of adults with middle school diploma, urbanization (i.e. the HHI of population centers), indicators for municipalities belonging to
border cities, touristic destinations, Mexico City, and other metropolitan areas, population of adjacent municipalities within the same MSA, indicators for
missing data, and interactions of the metropolitan area indicator with population, income, access to public utilities, and fraction of population in urban
centers.
Table 6
Cost of adding the nth store relative to the first store – Specification VI estimates.
CCM CHD GG LEY SOR WM
nth store (#) 5 4 11 5 13 27

Cost of first store 10.4 9.04 19.8 11.3 0.07 5.29
Incremental cost of nth store 13.1 6.06 7.1 6.1 2.39 4.71
Abbreviations: CCM - Comercial Mexicana, CHD - Chedrahui, GG - Gigante, LEY - Casa Ley, SOR -
Soriana, WM - Walmart. Incremental cost of nth store is the difference in per-store cost for the nth store
relative to the first store.
portantly, given firms open an average of eleven stores a year, the estimated parameters
imply that opening the 12th store, the marginal store, costs 33% more than the cost of
the first store. This incremental cost can be avoided by delaying adding the marginal
store until the end of the expansion period. Thus, assuming the expansion wave lasts ten
years, the deferred cost is consistent with a three percent internal rate of return.
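The three percent figure can be checked with a short calculation. The sketch below assumes the 33% premium is the only cost of opening the marginal store now rather than ten years later, and that the return compounds annually; the function name is illustrative and does not come from the paper.

```python
def implied_annual_return(cost_premium: float, years: float) -> float:
    """Annual rate r that solves (1 + r) ** years = 1 + cost_premium."""
    return (1.0 + cost_premium) ** (1.0 / years) - 1.0


# A 33% premium avoided by waiting ten years (both values from the text).
rate = implied_annual_return(0.33, 10)
print(f"{rate:.1%}")  # close to three percent per year
```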
The estimated cost parameters do not vary much across Specifications I through V,
which differ in how they control for store profitability. Specification VI allows cost param-
eters to differ across firms. In this specification, the estimated parameters imply Soriana
has the largest expansion costs and Gigante the smallest. Such estimates follow from
observed behavior, in which Soriana’s expansion is consistent year after year while Gi-
gante’s is sporadic: some years adding 41 stores, others adding none. The model attributes
expansion costs to consistent expansion, and high entry costs to sporadic expansion.22
The convex cost parameter estimates do not have a straightforward interpretation on
how entry costs rise with the number of added stores. In order to better understand how
entry costs increase as firms add stores, Table 6 uses the estimates from Specification VI
to show how much higher firms’ entry costs are for the nth store relative to the first store,
where n is set to firms’ average number of yearly opened stores (rounded to the nearest
integer). In other words, this is the cost that a firm could avoid by delaying adding the
marginal store until the end of the expansion period, which could be several years into
the future. Table 6 shows cost increases ranging from 13.1 for Comercial Mexicana to
2.39 for Soriana.23 The lower incremental costs for Soriana and WalMart are consistent
with these firms’ aggressive expansion relative to the other firms.
In summary, firms have significant expansion costs that limit how quickly they can add
stores, and these expansion costs differ significantly across firms. Policies that accelerate
expansion by decreasing these costs are more likely to favor Comercial Mexicana and
Gigante, as these have the largest expansion costs. However, WalMart and Soriana are
likely to expand fastest under lower expansion costs, as they have the highest percentage
increases in entry costs and therefore are the most sensitive to changes in expansion costs.
22
Walmart’s convex cost parameter in Specification VI is 0.299, one-third of Soriana’s value. The lower
estimate is due to Walmart’s increased expansion in the later years. As the model does not allow expansion
costs to vary with time, the estimation attributes the slower expansion in the earlier years to high entry
costs.
23
Parameters are estimated up to positive affine transformations; hence, there is no direct interpretation
to firms’ profit units. However, such profit units can be compared, one relative to another.
7.2. Competition parameters
In addition to the cost parameters, the model’s two other key parameters are the
competition parameters, θown and θrival . These parameters dictate how the entry of additional
stores, of the same firm or of other firms, depresses the profitability of existing stores. A
cannibalization parameter (i.e. θown ) of −0.772, as in Specification III, implies that a
firm’s second store depresses profitability of the first store by 27%, i.e.
$(s+2)^{\theta^{own}}/(s+1)^{\theta^{own}}$, for s preexisting stores. In contrast, the estimated value of the rivalry parameter
(i.e. θrival ) is approximately one-fourth that of the cannibalization parameter, or −0.188
in Specification III. Hence, a firm’s per-store profits fall 12% when a rival adds its first
store, i.e. $(2/1)^{\theta^{rival}}$. Not surprisingly, stores of the same firm are closer substitutes for each
other than stores of different firms.
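The two percentages follow directly from the ratios quoted above. The sketch below evaluates them for the Specification III estimates; the function names are illustrative, and s = 1 is used because the second store is added when one store already exists.

```python
def own_store_effect(theta_own: float, s: int = 1) -> float:
    """Change in per-store profit when a firm with s stores adds one more."""
    return ((s + 2) / (s + 1)) ** theta_own - 1.0


def rival_first_store_effect(theta_rival: float) -> float:
    """Change in per-store profit when a rival adds its first store."""
    return (2.0 / 1.0) ** theta_rival - 1.0


own_iii = own_store_effect(-0.772)             # cannibalization, Specification III
rival_iii = rival_first_store_effect(-0.188)   # rivalry, Specification III
print(f"{own_iii:.0%} {rival_iii:.0%}")        # about -27% and -12%
```

The same functions applied to the Specification IV rivalry estimate of −0.448 give a fall of about 27%.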
Both competition parameters, θown and θrival , are larger in magnitude in Specifications IV and V
than in Specification III. Specifications IV and V control for unobserved, time-invariant
market heterogeneity common across firms (Specification IV) or specific to each firm
(Specification V). Unobserved market profitability induces entry of multiple stores into
the same market, biasing competition parameters upwards. When controlling for it, both
competition parameters become more negative: the cannibalization parameter becomes
−0.968 (Specification V) and the rivalry parameter becomes −0.448 (Specification IV).
These parameters imply per-store profits decrease by 33% with the entry of a firm’s
second store, and by 27% with the entry of a rival’s first store. Ellickson et al.
(2013) report similar cannibalization estimates for the US industry: adding a second
store depresses Walmart’s per-store profits by 18%. However, the same paper finds much
larger rivalry effects than those estimated here: for a Walmart store to offset the profit
loss from a Target store’s entry, markets must grow 130%. In contrast, Jia (2008), who
also studies Walmart’s expansion in the US, finds rivalry effects similar to those estimated
here: for a Kmart store to offset the profit loss from a Walmart store’s entry, markets
must grow 27%.24
To better understand how competition parameters and entry costs affect market out-
comes, Table 7 uses the estimates from Specification VI to show entry and exit thresholds,
similar to those in Bresnahan and Reiss (1991). An entry threshold is the profit-index
value at which a firm finds it optimal to open an additional store. An exit threshold
is the profit-index value at which firms find it optimal to close a store. Entry and exit
thresholds differ in that the latter does not depend on entry costs.25 A given profit-index
value can be achieved by a combination of factors, including increasing population and
24
Values pertaining to Ellickson et al. (2013) are calculated using estimates from Column I of Table 1,
using the average value of the identified sets reported in Table 2, and assuming no other stores are present
and market population is 50 thousand. Values pertaining to Jia (2008) are calculated using estimates from
the Baseline specification in Table 3. Cannibalization estimates are not available in Jia (2008), as that paper
assumes a firm’s own stores are complements to each other and not substitutes.
25
The entry threshold for adding the nth store is defined as the profit index, q, at which the firm is
indifferent between the profits from n − 1 stores and the profits from n stores minus the cost of adding one
store: $q\,(n-1)\,n^{\theta^{own}} - \theta^{fc}(n-1) = q\,n\,(n+1)^{\theta^{own}} - \theta^{fc}\,n - C$, where C is the cost of adding a store. In
calculating thresholds, C is defined as the firm’s cost of their average marginal store (i.e. values reported in
Table 7
Market thresholds at which firms add or remove the first three stores in a given market – Specification VI
estimates.
Entry thresholds Exit thresholds
0→1 1→2 2→3 0←1 1←2 2←3
CCM 64.9 130 195 1.64 3.29 4.92

CHD 31.6 69.4 110 1.72 3.77 5.98
GG 57.0 124 195 1.71 3.71 5.58
LEY 30.8 51.2 68.4 1.46 2.43 3.25
SOR 44.8 94.6 146.4 1.69 3.56 5.50
WM 37.0 72.3 106 1.61 3.16 4.65
Entry thresholds calculated using firms’ average marginal cost, as shown in Table 6. Values shown in abstract
profit units. Abbreviations: CCM - Comercial Mexicana, CHD - Chedrahui, LEY - Casa Ley, GG - Gigante,
SOR - Soriana, WM - Walmart.
decreasing distance to distribution centers, to name a few. As firms may have different
mechanisms to achieve a given profit-index value, these thresholds are expressed in their
abstract profit-units and should be compared between each other and across firms, but
should not be assigned a direct per-se meaning.
Three facts are worth mentioning regarding Table 7. First, depending on the firm, entry
thresholds are twenty to forty times larger than the corresponding exit thresholds. Thus,
market profitability must fall drastically for a firm to want to close existing stores. This
difference between entry and exit thresholds is due to entry costs being much larger than
fixed costs. Second, firms differ significantly in their entry thresholds. For example, Casa
Ley has the lowest entry thresholds: 30.8 units for the first store. In contrast, Comercial
Mexicana’s entry threshold is 64.9 for the first store, more than twice that of Casa Ley.
Importantly, the difference in thresholds is not the same as the difference in entry costs,
i.e. those in Table 6, as the thresholds also factor in the cannibalization estimates, θown .
Third, the amount by which the profit index must increase to justify the second store is
larger than the profit loss the existing store suffers when the second store is added. For
example, Chedrahui’s entry thresholds for the first and second stores are 31.6 and 69.4,
respectively. That is, the profit index must grow 120% from when Chedrahui opens its
first store to when it opens its second store. However, Chedrahui’s cannibalization
parameter of −0.78 implies the profit index must grow only 27% to offset the first store’s
profit loss from the second store’s entry. Entry thresholds command larger increases in
the profit index because the second store’s profit needs to offset both the profit loss to
the first store and the costs of the second store.
Table 6). In addition, θ fc is assumed to be one, its normalized value. Exit thresholds are similarly calculated,
with the exception that C is set to zero.
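Solving the indifference condition in footnote 25 for q gives q = (θfc + C) / (n(n+1)^θown − (n−1)n^θown). The sketch below evaluates this expression for Chedrahui’s exit thresholds, using the cannibalization parameter of −0.78 quoted in the text; the function name is illustrative, and small gaps relative to Table 7 arise because −0.78 is a rounded value.

```python
def threshold(n: int, theta_own: float, cost: float = 0.0, theta_fc: float = 1.0) -> float:
    """Profit index q at which a firm is indifferent between n - 1 and n stores.

    cost = 0 gives the exit threshold; cost = C gives the entry threshold.
    """
    gain = n * (n + 1) ** theta_own - (n - 1) * n ** theta_own
    return (theta_fc + cost) / gain


# Exit thresholds for the first three stores with theta_own = -0.78.
exit_thresholds = [threshold(n, -0.78) for n in (1, 2, 3)]
print([round(q, 2) for q in exit_thresholds])  # compare with 1.72, 3.77, 5.98 in Table 7
```

Under this expression the ratio of entry to exit thresholds equals (θfc + C)/θfc for every n, which matches the near-constant ratio within each row of Table 7 (for Comercial Mexicana, 64.9/1.64, 130/3.29 and 195/4.92 are all close to 39.6).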
7.3. Profit shifters
Recall that estimates for select variables concerning the mean profit index are shown in
Table 5. In addition to showing the estimates and standard errors, the table includes the
percentage by which the mean profit index would change if the variable were to increase
above its mean value by a typical shift. This typical shift is displayed underneath each
variable name and is chosen as the variable’s standard deviation, as a unit change (for
dummy variables), or as a commonly used value. For example, the top population spline
(i.e. population counts above 590k inhabitants) has a typical shift of 100k inhabitants. In
addition, Specification I reports a point estimate of 0.99 for this variable. Thus, firms with
stores in municipalities of more than 590k inhabitants experience a ten percent increase in
their profit indices when population increases by 100k inhabitants.
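For the dummy variables in Table 5, the reported percentages are consistent with a profit index that is exponential in the linear combination of covariates, so that switching a dummy from 0 to 1 changes the index by exp(β) − 1. That functional form is inferred here from the reported numbers and is an assumption of this sketch; the function name is illustrative.

```python
import math


def pct_change_dummy(beta: float) -> float:
    """Relative change in the profit index when a dummy switches from 0 to 1,
    assuming the index is exp(x'beta)."""
    return math.exp(beta) - 1.0


# Store-age dummies reported in Table 5.
new_market = pct_change_dummy(-1.44)   # "Store age - new market"
one_year = pct_change_dummy(-2.89)     # "Store age - one year"
print(f"{new_market:.0%} {one_year:.0%}")  # about -76% and -94%
```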
Most demand shifters have the expected effects across all specifications. For example,
focus on Specification III. Here, the top two population splines have statistically
significant estimates of 0.68 and 1.08, respectively. These estimates imply profit indices
increase by fifteen and fourteen percent, respectively, when population counts increase
by 100k inhabitants over mean values. Similarly, increasing the number of department
stores by two is equivalent to a twenty-two percent increase in the profit index.
Economies of distribution appear weak, as decreasing the distance to the nearest dis-
tribution center by one hundred miles generates a meager 2.6% increase in the profit
index. In comparison, Holmes (2011) finds that locating a store one hundred miles closer
to a distribution center increases Walmart’s stores’ variable profits by 1.5–4.3 percent.26
Economies of density are represented by two variables: distance to nearest active market
and network centrality –the sum of inverse distance to all of a firm’s stores. The two
measures estimate opposing effects. For example, markets that are closer to a firm’s core
have larger profit indices: increasing the measure of network centrality by one standard
deviation (by 0.5 units) increases the profit index by eight percent. However, locating
sixty miles farther from the nearest market also increases the profit index by twelve per-
cent. Both effects are reconcilable with industry practices: firms build their core business
around specific geographic regions, e.g. Walmart in central Mexico and Soriana in northeast
Mexico, but also branch a few stores far out, e.g. Walmart in Tijuana and Soriana in Merida.
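The 1.5–4.3 percent range attributed to Holmes (2011) can be retraced from the figures given in footnote 26. The sketch below uses only those figures; the variable names are illustrative, and the rounded operating margin of $12M is taken from the footnote.

```python
# Figures reported in footnote 26 for Holmes (2011).
sales_per_store = 70e6        # average yearly sales per store, in dollars
gross_margin = 0.24           # share of revenue
other_variable_costs = 0.07   # share of revenue

# Yearly operating margin per store; the footnote rounds this to $12M.
operating_margin = sales_per_store * (gross_margin - other_variable_costs)
rounded_margin = 12e6

# Cost savings from locating 100 miles closer, as a share of operating margin.
low_share = 178_000 / rounded_margin
high_share = 520_000 / rounded_margin
print(f"{low_share:.1%} to {high_share:.1%}")
```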
As for the dynamic incentives, estimated in Specifications III-VI, demand growth and
future distribution economies appear to have no effects on the profit index. Store age, in
contrast, has a negative impact on the index. The negligible effects of demand growth
and future distribution economies are consistent with comments from industry experts,
who describe firms’ expansion plans as responding to existing market needs and not
to future developments. The negative effects of store age on the profit index, which are
strongest for the first year after entry, are consistent with firms being cautious when
26
Holmes (2011) estimates yearly cost savings of between $178,000 and $520,000 when locating a
store one hundred miles closer to a distribution center. Holmes (2011) also reports average per-store sales of
$70M, gross margins of 24% of revenue, and additional variable costs of 7% of revenue, netting a yearly
operating margin of $12M.
expanding into new markets: probe the market with a single store for a few years before
adding additional stores, despite the market being large enough to support multiple
stores.
7.4. Equilibrium selection and its effect on estimated parameters
The PLEC estimator includes a correction term that accounts for how the estimated
partial-log-likelihood does not consider equilibrium selection. When the norm of this
correction is small, the PLEC estimator is a good approximation of the log-likelihood
estimator. Said differently, a small norm means the equilibrium selection rule provides
little identifying power to the model parameters, even when the area of multiple equilibria
may be large. Hence, Table 4 shows the norm of this correction term relative to the norm
of the partial log-likelihood estimate. This ratio is small across all specifications. For
example, in Specification III the norm of the correction term is less than 0.04 percent
that of the partial log-likelihood estimate. Thus, the PLEC is an appropriate estimator in
this setting and firms’ best responses are highly informative of parameter values vis-a-vis
equilibrium selection.
In order to assess the degree of multiple equilibria, Table 4 also includes the average,
across years, of the conditional likelihood used in the PLEC’s correction term. This
conditional likelihood is the probability that model outcomes correspond to observed
outcomes, conditional on firms’ actions being best responses to each other. Said loosely, it is
the probability that the equilibrium chosen by the model corresponds to the true
equilibrium. This conditional likelihood is high for the restrictive specifications. For example,
it is 0.769 in Specification I, which implies that 77% of the simulated draws generate the
same outcomes as those observed in the data whenever these draws guarantee firms are
best responding to each other. In contrast, the conditional likelihood values are much
smaller in the least restrictive specifications, Specifications III, IV and VI. For example,
in Specification III, the average conditional likelihood is 0.16. Importantly, the relative
norm of the correction term is also small, $3.7 \times 10^{-4}$. Thus, even as the assumed
equilibrium selection rule predicts outcomes poorly, knowing such a rule is not necessary to identify
the model parameters: firms’ best responses carry sufficient information to identify model
parameters.
8. Simulating industry expansion under accelerated growth
8.1. Overview
Mexico currently ranks 38th in the World Bank’s Ease of Doing Business Index. Part
of firms’ expansion costs comes from dealing with government regulation and red tape.
Had the country implemented reforms that facilitated store expansion, would the
industry have grown faster? While lower expansion costs would lead to more stores in the
short run, they could also induce crowd-out, resulting in a more concentrated industry and
possibly even fewer stores. In order to address whether less constrained growth does in fact lead
to more growth, I use the estimates from Specification III and simulate the industry’s
expansion from 1996 to 2006 assuming a lower expansion cost than estimated.
I simulate the industry under three different scenarios. The first scenario retains all pa-
rameters at estimated values and serves as a basis for comparing the other two scenarios.
The second scenario assumes the expansion cost parameter, θcvx , is ten percent smaller
than estimated. The third scenario assumes expansion costs are as estimated but the per-store
cost parameter, θlin , is reduced by one percent. For the second and third scenarios
I calculate a ‘subsidy amount’, which, given firms’ entry decisions in each scenario, is the
difference between firms’ incurred costs and the costs they would have incurred had cost
parameters not been reduced. Hence, the one percent reduction in the third scenario was
chosen such that the ‘subsidy amount’ were the same, on average, across the second and
third scenarios.
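The ‘subsidy amount’ defined above can be sketched as follows. The cost function used here, linear in θlin plus quadratic in θcvx, is a hypothetical stand-in chosen only to make the sketch runnable; the paper’s actual cost specification is defined in its model section, and all parameter values below are made up for illustration.

```python
from typing import Iterable, Tuple


def entry_cost(n_added: int, theta_lin: float, theta_cvx: float) -> float:
    """Hypothetical yearly cost of adding n_added stores (assumed form)."""
    return theta_lin * n_added + theta_cvx * n_added ** 2


def subsidy_amount(stores_added: Iterable[int],
                   estimated: Tuple[float, float],
                   reduced: Tuple[float, float]) -> float:
    """Cost at estimated parameters minus cost at reduced parameters,
    holding fixed the entry decisions made under the reduced parameters."""
    return sum(entry_cost(n, *estimated) - entry_cost(n, *reduced)
               for n in stores_added)


stores_added = [3, 0, 5]   # stores added per firm-year in a counterfactual run (made up)
estimated = (10.0, 1.0)    # (theta_lin, theta_cvx), made-up values

# Scenario 2: convex parameter ten percent smaller.
subsidy_cvx = subsidy_amount(stores_added, estimated, (10.0, 0.9))
# Scenario 3: linear parameter one percent smaller.
subsidy_lin = subsidy_amount(stores_added, estimated, (9.9, 1.0))
print(subsidy_cvx, subsidy_lin)
```

In the paper the one percent reduction is chosen so that the two subsidy amounts coincide on average; with the made-up numbers above they do not, which illustrates why the reduction in θlin has to be calibrated.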
These cost reductions compare two different mechanisms by which policy makers can
accelerate store expansion. The first one, a reduction in expansion cost, can be achieved
by easing the process of opening stores by, for example, streamlining permits, centralizing
government involvement, educating the work force, etc. The second one, a reduction in
per-store costs, can be achieved by lowering the cost of opening stores by, for example,
reducing tax burdens, permit costs, hiring costs, etc. If the costs of these policies are
smaller than the ‘subsidy amount’, then these policies are likely to be revenue neutral
under an appropriate taxation policy.
For each of the three scenarios, I simulate 3000 profit shocks, each an M×I×T-long
vector, drawn from the estimated partial-likelihood distribution.27 For each profit draw,
I simulate firms’ equilibrium choice of adding and removing stores one year at a time,
starting in 1996 and progressing through 2006. For each year, I update the starting stores,
zt , according to the previous year’s equilibrium choices and the previous year’s starting
stores. Exogenous variables that depend on starting stores are updated accordingly, i.e.
distance to nearest active market, store age, etc., and profit indices for every firm and
market are calculated by adding the simulated profit shock with the relevant mean profit
index (evaluated using the exogenous profit shifters: population, income, store age, etc.).
Equilibrium choices are then computed using the iterated best-response heuristic discussed
previously, in which Walmart best responds first, followed by Soriana, Comercial
Mexicana, Gigante, Chedrahui, and Casa Ley. Thus, each simulation represents one way
in which the industry could have evolved across the eleven-year time span that is con-
sistent with observed data. Importantly, for these simulations to properly assess how
the industry would have evolved under alternative cost structures, it is necessary that
firms’ expectations on future actions not be altered by the policy change, such that the
estimated parameters truly capture firms’ responsiveness to shifts in the profit variables,
27
Drawing shocks from the estimated partial likelihood is akin to including the residuals from a linear
regression in a prediction model: if parameters are kept at their estimated values and rival firms choose
the same actions as in the observed data, the simulated profit shocks would be such that a firm would
endogenously choose the same actions as in the observed data.
i.e. $x_i$, $z_i$, and $a_{-i}$ for $i \in I$. Therefore, I purposely model small cost reductions such that
any effect of these cost reductions on firms’ expectations becomes a second-order issue.
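The ordered best-response step described above can be illustrated with a generic sketch: firms move in a fixed order, each choosing its best action given the others’ current actions, until a full pass produces no change. This is not the paper’s implementation. The single-market payoff, the cost form, and every numerical value except the two competition parameters are hypothetical.

```python
from typing import Callable, Dict, Sequence


def iterated_best_response(
    firms: Sequence[str],
    actions: Sequence[int],
    payoff: Callable[[str, Dict[str, int]], float],
    max_rounds: int = 100,
) -> Dict[str, int]:
    """Cycle through firms in a fixed order until no firm changes its action."""
    choice = {f: actions[0] for f in firms}
    for _ in range(max_rounds):
        changed = False
        for f in firms:
            # Best action for f, holding the other firms' current actions fixed.
            best = max(actions, key=lambda a: payoff(f, {**choice, f: a}))
            if best != choice[f]:
                choice[f] = best
                changed = True
        if not changed:
            return choice
    raise RuntimeError("best responses did not settle")


# Hypothetical single-market example with three firms and no existing stores.
Q = {"WM": 30.0, "SOR": 20.0, "CCM": 12.0}        # made-up profit indices
THETA_OWN, THETA_RIVAL = -0.772, -0.188           # Specification III estimates
THETA_FC, THETA_LIN, THETA_CVX = 1.0, 5.0, 1.0    # made-up cost values


def toy_payoff(firm: str, added: Dict[str, int]) -> float:
    """Profit from the stores a firm adds, net of an assumed entry cost."""
    n = added[firm]
    rivals = sum(v for g, v in added.items() if g != firm)
    per_store = Q[firm] * (n + 1) ** THETA_OWN * (rivals + 1) ** THETA_RIVAL
    return n * (per_store - THETA_FC) - THETA_LIN * n - THETA_CVX * n ** 2


equilibrium = iterated_best_response(["WM", "SOR", "CCM"], [0, 1, 2], toy_payoff)
print(equilibrium)
```

When the loop returns, every firm’s action is a best response to the others, so the result is a pure-strategy equilibrium of the toy game; which equilibrium is reached depends on the order in which firms move.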
From each simulation, I record the number of stores each firm has in each market and
in each year. To summarize these market configurations in a meaningful way, I calculate
four metrics: (a) the total number of stores, (b) aggregate consumer welfare, (c) the
percentage of markets with no stores, and (d) the percentage of markets with stores
from only one firm. For each metric, I calculate the value in the last year of the sample
–to show long-run effects– as well as the accumulated values across all years –to show
aggregate effects. These metrics are calculated for each of the 3000 simulation draws, in
each of the three scenarios. For Scenarios II and III, I compute the difference in these
metrics’ values with respect to the metrics’ values in Scenario I, and show in Table 8 the
average of these differences (i.e. averaged across simulations), as well as their standard
deviation, minimum, and maximum.
The model estimated in Section 6 does not directly generate a measure of consumer
welfare under which alternative cost reductions can be compared. However, the model
does predict how many stores each firm would have had in each market and in each year.
In order to calculate consumer welfare from the store allocation, I introduce the following
extremely simplified model of demand and of consumer welfare.
8.2. A simple model relating stores to consumer welfare
Let consumers’ preference for a given store be captured in a nested-logit framework.
Specifically, for every market and year, consumers make a discrete choice of which store
to purchase from or to not purchase at all. Consumer r’s utility from choosing store j is:

$$u_{rj} = \delta - \alpha p_j + \xi_{rj} \qquad (32)$$

In the above equation, $\delta$ captures consumers’ mean utility of purchasing at store j, which
I assume is the same for all stores, regardless of firm ownership.28 $p_j$ is the ‘price’ of store
j and $\xi_{rj}$ is consumer r’s random utility for store j. Not purchasing generates utility
$u_{r0} = \xi_{r0}$. Random utility, $\xi_r = (\xi_{r0}, \ldots, \xi_{rJ})$, is distributed according to a GEV distribution
with correlations pertaining to a two-level nest structure, where the upper nest is the
choice of which firm to purchase from and the lower nest is the choice of which store
to purchase from among those of the upper-nest firm. Formally, let $J_{imt}$ be the set
containing the indices to firm i’s stores (in market m at time t) and $\lambda$ be the nesting
parameter, common to all firms. The CDF of $\xi_r$ is
$\exp\left(-\sum_{i=1}^{I}\left(\sum_{j\in J_{imt}} e^{-\xi_{rj}/\lambda}\right)^{\lambda}\right)$ and
the probability of a consumer choosing firm i’s store j is:

$$D_j = \frac{e^{(\delta-\alpha p_j)/\lambda}\left(\sum_{k\in J_i} e^{(\delta-\alpha p_k)/\lambda}\right)^{\lambda-1}}{1+\sum_{n=1}^{I}\left(\sum_{k\in J_n} e^{(\delta-\alpha p_k)/\lambda}\right)^{\lambda}} \qquad (33)$$
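Equation (33) can be evaluated directly once each store’s mean utility net of price is known. The sketch below is a minimal illustration; the function name, the input layout (one list of store utilities per firm) and the example values are assumptions made here, not part of the paper.

```python
import math
from typing import List


def nested_logit_shares(v_by_firm: List[List[float]], lam: float) -> List[List[float]]:
    """Choice probabilities from equation (33).

    v_by_firm[i][k] is delta - alpha * p_k for store k of firm i;
    lam is the nesting parameter.
    """
    nest_sums = [sum(math.exp(v / lam) for v in stores) for stores in v_by_firm]
    denom = 1.0 + sum(s ** lam for s in nest_sums)
    return [[math.exp(v / lam) * s ** (lam - 1.0) / denom for v in stores]
            for stores, s in zip(v_by_firm, nest_sums)]


# Illustrative market: firm 0 has two stores, firm 1 has one, all with the same utility.
shares = nested_logit_shares([[-1.48, -1.48], [-1.48]], lam=0.49)
firm_totals = [sum(s) for s in shares]
print(firm_totals, 1.0 - sum(firm_totals))  # firm-level shares and outside-option share
```

With a nesting parameter below one, the two-store firm captures more demand than the single-store firm but less than twice as much, which is the within-firm substitution that the cannibalization parameter is meant to capture.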
28
This assumes all firm heterogeneity estimated in Section 7 will be driven by cost factors, not consumer
preferences.
Table 8
Accelerating growth: industry outcomes under three different cost scenarios. Summary of market outcomes from simulating store entry and exit, from 1996
through 2006.
Columns 1-6, aggregate across 11-year span: Cost, Subsidy, Cons. welfare, Stores, Unserved markets, Monop. markets
Columns 7-10, last year values (2006): Cons. welfare, Stores, Unserved markets, Monop. markets
Base scenario
Mean 31,892 0 380 11,194 54.2 17.6 44.1 1611 29.3 32.1
SD 655 0 1.20 85 0.44 0.50 0.32 22 0.58 0.79
Min 30,372 0 376 10,971 52.5 16.3 43.2 1549 27.0 29.9
Max 34,379 0 383 11,415 55.4 19.3 45.1 1693 30.7 34.3
Reduced expansion cost (difference relative to base)
Mean 1264 842 2.65 203 −2.05 1.66 0.28 23 −0.62 0.16
SD 269 21 0.60 38 0.44 0.51 0.16 9 0.48 0.66
Min 530 786 0.99 95 −3.59 0.15 −0.21 0 −1.97 −1.97
Max 2184 913 3.98 342 −0.74 3.28 0.81 55 0.56 1.97
Reduced per-store cost (difference relative to base)
Mean 808 842 1.97 148 −1.94 1.59 0.18 13 −0.53 0.16
SD 239 17 0.59 34 0.41 0.48 0.15 8 0.46 0.65
Min 133 795 0.28 58 −3.48 0.10 −0.29 −8 −1.97 −1.69
Max 1672 894 3.51 280 −0.67 3.33 0.74 45 0.56 2.25
Summary statistics of market outcomes generated from 3,000 simulations. SD - standard deviation. Base scenario assumes all parameters are at estimated
values. Reduced expansion cost assumes the convex cost parameter, θ cvx , is reduced by ten percent relative to its estimated value. Reduced per-store cost
assumes the linear cost parameter, θ lin , is reduced by one percent relative to its estimated value. Each simulation predicts a number of stores every firm has
in every market for every year, from 1996 through 2006. From these outcomes, six metrics are calculated: (a) the total cost of adding stores, aggregated across
firms, markets, and years; (b) the total subsidized expense, measured as the cost difference between the expense firms incurred and that which they would
have incurred had cost parameters not been reduced; (c) the aggregate consumer welfare, aggregated across markets and years, (d) the total number of stores,
aggregated across firms, markets, and years; (e) the percentage of markets with no stores, calculated across all market-year pairs; and (f) the percentage of
markets with stores owned by solely one firm. The top panel shows the summary statistics of these metrics. The bottom panels show the summary statistics
of the difference in the metrics’ values relative to the Base scenario. Cost and Subsidy are expressed in ‘profit units’, which have no external translation
(e.g. dollar value) but can be compared to each other and across scenarios. Similarly, Consumer Welfare is expressed in ‘utils’, which also have no external
meaning but can be compared across scenarios.
Demand for store j is simply Mmt · Dj where Mmt is the number of consumers in the
market: the population of the municipality. Variable costs are assumed to be a constant,
c, same for all firms. Prices are determined by Nash equilibrium. Given equilibrium prices,
I calculate equilibrium profits and consumer welfare. As all firms have the same mean
utility and costs, equilibrium prices are the same across all of a firm’s stores. In addition,
equilibrium profits and consumer welfare depend solely on each firm’s number of stores
and two parameters: λ and δ − αc. Details are provided in the appendix.
I calibrate the two parameters, λ and δ − αc, so that this simplified model generates
similar cannibalization and competition effects to those estimated in Section 6. Specifically,
I set the parameters so that both models generate the same percentage change to a
firm’s variable profits from (a) adding a second store when no rival stores exist and (b) a
rival adding its first store when the firm has only one store and no other rival stores exist.
The resulting calibrated values are: λ = 0.49 and δ − αc = −1.48. Given these parameter
values, the model generates value from variety: two stores of different firms generate
higher consumer welfare than two stores of the same firm. This value for variety is driven
by two factors: competition decreases prices and, due to the nested structure of demand,
consumers value variety per-se.
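The second factor, value from variety per se, can be illustrated while holding prices fixed. The sketch below uses the standard nested-logit log-sum (expected maximum utility up to a constant) as the welfare measure, which may differ from the exact expression in the paper’s appendix. It also assumes every store prices at cost, so that each store’s mean utility equals the calibrated δ − αc; the price-competition channel is therefore switched off on purpose.

```python
import math
from typing import Sequence


def log_sum_welfare(stores_per_firm: Sequence[int], v: float, lam: float) -> float:
    """Nested-logit log-sum when all stores share the same mean utility v."""
    return math.log(1.0 + sum((n * math.exp(v / lam)) ** lam for n in stores_per_firm))


LAM, V = 0.49, -1.48   # calibrated values from the text; V assumes price equals cost

one_store = log_sum_welfare([1, 0], V, LAM)
same_firm = log_sum_welfare([2, 0], V, LAM)        # two stores, one owner
different_firms = log_sum_welfare([1, 1], V, LAM)  # two stores, two owners
print(one_store, same_firm, different_firms)
```

A second store raises the log-sum in both cases, but by more when it belongs to a different firm, because stores of the same firm are closer substitutes when the nesting parameter is below one.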
8.3. Results from counterfactual simulations
Table 8 summarizes the simulations’ outcomes. The top panel contains the values
from the first scenario, where costs have not been modified. In this scenario, the industry
generates an average of 11,194 store-years. However, under some profit draws, store-years
can be as low as 10,971 or, for other profit draws, as high as 11,415. These store-years
generate an average consumer welfare of 380 utility units,29 which arises from the number
of stores, the size of the markets in which they are located, and the diversity of firm
ownership within markets. In addition, by the end of the sample period, 29% of markets
remain unserved and 32% of markets are served by a single firm. Thus, there appears to
be significant space for further growth.
The bottom panels of Table 8 display how market outcomes change when store entry
costs are reduced. For example, the panel ‘Reduced Expansion Cost’ displays outcomes
when reducing the expansion cost (i.e. θcvx ) by ten percent. From this panel, one observes
how the reduction in cost generates, on average, 203 more store-years. More importantly,
for the Mexican supermarket industry of the late 1990s, reducing expansion costs would
have likely resulted in more stores, higher consumer welfare, and lower market concen-
tration. However, in some simulations, no additional stores are generated in the long run:
the minimum value of ‘Stores’ in ‘Last year values’ is zero. In addition, the minimum
‘Consumer Welfare’ value for ‘Last year values’ is negative (i.e. −0.21): a reduction in
29
Without a calibrated price coefficient, consumer welfare cannot be expressed in dollar values. However,
consumer welfare can be compared across scenarios, expressing differences in percentage terms. Hence, I
express consumer welfare in utility units and discuss percentage differences across scenarios.
expansion costs could have resulted in increased concentration and lower welfare, even
as the number of stores is kept constant.30
The bottommost panel in Table 8, the panel labeled ‘Reduced Per-Store Cost’, displays
market outcomes when the per-store cost, θlin , is reduced by one percent. The
percentage reduction is chosen so that this scenario generates the same average subsidy
expense as the Reduced Expansion Cost Scenario. In comparing simulations, one
observes how subsidizing expansion costs is much more effective at accelerating growth
than subsidizing per-store costs. For example, when subsidizing the expansion cost, the
industry generates 203 more store-years than in the Base Scenario. In contrast, when
subsidizing per-store costs, the industry generates only 143 more store-years. In addition,
crowd-out is also larger in the Reduced Per-Store Cost Scenario: for some simulation
draws, eight fewer stores are generated in the long run relative to the Base Scenario.
That is, a reduction in store costs can result in fewer stores being opened.
It is not intuitive that reducing per-store costs generates fewer stores than reducing
expansion costs, and this is likely driven by the fact that the industry continues to expand
long after the end of the sample period. If the industry were near the end of its expansion
wave, reductions in expansion costs should have negligible effects on fostering industry
growth.
Finally, notice that the subsidy amounts and their effects are small. The average
subsidy amount is 842 units, while firms’ total costs in the Base Scenario are 32,892
units. Similarly, average consumer welfare increases by 2.65 units from a base of 380
units. I simulate small subsidy amounts to illustrate the effects of crowd-out, which are
hard to observe at the beginning of an expansion wave: when there are ample expansion
opportunities, lowering expansion costs is unlikely to result in crowd-out as firms simply
expand faster into different markets. Crowd-out is most easily observed at the end of
an expansion wave, where such expansion opportunities dry out. However, the sample
period ends in 2006, long before Mexican supermarket firms stalled their expansion.
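Since the outcome metrics have no external units, it helps to restate the Reduced Expansion Cost effects as percentages of the Base Scenario. The sketch below does this using mean values copied from Table 8; the dictionary layout and names are illustrative.

```python
# Mean values from Table 8: Base Scenario level and Reduced Expansion Cost difference.
table8 = {
    "store-years (11-year aggregate)": (11_194, 203),
    "consumer welfare (11-year aggregate)": (380, 2.65),
    "stores (2006)": (1_611, 23),
    "consumer welfare (2006)": (44.1, 0.28),
}

# Average change relative to the Base Scenario, in percent.
pct_change = {name: 100.0 * diff / base for name, (base, diff) in table8.items()}
for name, pct in pct_change.items():
    print(f"{name}: {pct:.2f}%")

# Additional store-years per unit of subsidy (mean subsidy of 842 units).
store_years_per_subsidy_unit = 203 / 842
```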
9. Conclusions
This paper studies expansion costs and the consequences of lowering them in Mexico’s
supermarket industry. Mexican firms do indeed appear to be constrained in their rate
of growth, in that the marginal store opened in a given year costs 33% more to open
than if firms had delayed its opening until the end of the expansion period. What would
have happened if such costs were lower? In simulating industry expansion under lower
expansion costs, firms do indeed add more stores and add them sooner. However, under
some conditions, reducing expansion costs would have resulted in firms crowding out
each other, and thus in fewer stores in the long run. Crowd-out occurs, in the presence
30
The changes in outcomes are not driven by equilibrium selection, as all simulations use the same equilib-
rium selection rule. Moreover, in comparing across scenarios the simulated profit shocks are kept constant.
That is, the differences in market outcomes do not arise from comparing outcomes across different profit
shocks, but from the industry evolving differently under the same profit shocks.
of sunk cost, as some firms add stores only if they can achieve high enough profits for a
short p erio d. Accelerating expansion results in rival firms entering sooner than otherwise
and deteriorating those profits. As a result, some firms do not add stores that they would
have otherwise added. I indeed find that reducing expansion costs by ten percent could
have resulted in fewer supermarket stores in Mexico, with consumer welfare decreasing
and industry concentration increasing as a result.
Although this paper quantifies expansion costs in the Mexican supermarket industry, it
does not address where these expansion costs come from. Whether such expansion
costs arise from imperfect capital markets or from limited managerial talent remains to
be studied. In addition, in quantifying expansion costs, the structural model presented
in this paper assumes firms are myopic. Building a richer model that allows for both expansion
costs and forward-looking agents is a daunting task and is left for future research.
Appendix A. Data
Regional and local firms

Firms whose supermarket stores are included as exogenous profit shifters are: Al Super,
Aramburo, Arteli, Auchan, Azcunaga, Calimax, Carrefour, Casa Chapa, Chalita, Coloso,
Comercial Californiano, Comercial Cruz Azul, Comercial VH, De las Fuentes, El Alba, El
Camino, El Fenix, HEB, Kmart, Luna, Los Molinos, Merco, MZ, Pitico, Rialfer, S-Mart,
Smart & Final, Su Bodega, Super 10, Super Ahorros, Super del Norte, Super Gutierrez,
Super Kompras, Super Maz, Super San Fco de Asis, Tiendas PH, and Vision.
Interpolating population counts
INEGI provides population counts for each locality for the years 1995, 2000, 2005,
and 2010. CONAPO provides population estimates for each state and year. I use the
year-on-year changes in CONAPO's projections to obtain yearly growth rates at the
state level, which I then re-scale so that their five-year compounded rate equals INEGI's
five-year population change. The inter-census population count in a given locality is given
by the count in the prior year times the growth rate of the respective year and state.
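The re-scaling step can be sketched as follows. This is only an illustration under stated assumptions: the function name, the example numbers, and the choice of a common multiplicative re-scaling constant are hypothetical, since the text does not specify how the growth rates are re-scaled.

```python
import numpy as np

def interpolate_population(count_start, count_end, state_growth):
    """Interpolate yearly locality counts between two census years.

    count_start, count_end: locality counts in two consecutive census years.
    state_growth: gross yearly state-level growth factors (e.g. 1.012),
        one per year between the two census years.
    Assumption (not from the text): every yearly factor is re-scaled by the
    same multiplicative constant.
    """
    g = np.asarray(state_growth, dtype=float)
    target = count_end / count_start  # locality's census-to-census growth factor
    # Common constant that makes the compounded growth match the census change.
    scale = (target / g.prod()) ** (1.0 / len(g))
    scaled = g * scale
    # Count in each year = count in the prior year times that year's scaled factor.
    return count_start * np.concatenate(([1.0], np.cumprod(scaled)))

# Hypothetical locality: 10,000 inhabitants in one census, 11,000 five years later.
path = interpolate_population(10_000.0, 11_000.0, [1.010, 1.012, 1.015, 1.013, 1.011])
print(np.round(path, 1))
```

By construction the first and last entries reproduce the two census counts, and the intermediate years inherit the timing of the state-level growth.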
Interpolating demographic data
INEGI also provides demographic data at the municipality level for the years 2000, 2005,
and 2010. Most demographic variables take on values between 0 and 1: e.g. the fraction
of the population that are retirees, government employees, private sector employees, and
adults; the fraction of adults with a middle school degree; and the fraction of households
with public utilities. Average years of schooling is also included, which I divide by 14 so
that it also takes values between 0 and 1. As the relevant range of all these variables is
between 0 and 1, I interpolate between census years using a logistic transformation of a
quadratic index, where the weights are specific to each variable and each municipality.
For example, let $x_{mt}$ be the fraction of adult population in municipality $m$ at year $t$ and
let $f(t)$ be the logistic function with quadratic time index:

$$f(t) = \frac{1}{1 + e^{\alpha_0 + \alpha_1 \cdot t + \alpha_2 \cdot t^2}}$$

I calibrate the parameters $(\alpha_0, \alpha_1, \alpha_2)$ so that $x_{mt}$ is exactly $f(t)$ at all three census years.
I then use the calibrated parameters to interpolate values for the remaining years. This
calibration and interpolation is done separately for each variable and for each municipality.
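Because $\ln(1/f(t) - 1) = \alpha_0 + \alpha_1 t + \alpha_2 t^2$ is linear in the parameters, matching three census values exactly amounts to solving a 3-by-3 linear system. A minimal sketch, assuming time is measured in years since the first census (a normalization chosen here for numerical conditioning) and using hypothetical values:

```python
import numpy as np

def calibrate_logistic(years, values):
    """Calibrate (a0, a1, a2) so that 1/(1+exp(a0+a1*s+a2*s^2)) matches three points.

    years: three census years; values: the variable in those years, in (0, 1).
    s is years since the first census (an assumption made for this sketch).
    Returns the parameters and a function that evaluates f at any year.
    """
    base = float(years[0])
    s = np.asarray(years, dtype=float) - base
    x = np.asarray(values, dtype=float)
    index = np.log(1.0 / x - 1.0)            # invert the logistic transformation
    A = np.vander(s, 3, increasing=True)     # columns: 1, s, s^2
    alpha = np.linalg.solve(A, index)

    def f(year):
        u = np.asarray(year, dtype=float) - base
        return 1.0 / (1.0 + np.exp(alpha[0] + alpha[1] * u + alpha[2] * u ** 2))

    return alpha, f

# Hypothetical municipality: fraction of adults in 2000, 2005, and 2010.
alpha, f = calibrate_logistic([2000, 2005, 2010], [0.55, 0.60, 0.63])
print(f(np.arange(2000, 2011)))
```

The interpolated values stay strictly between 0 and 1 for any calibrated parameters, which is the reason for working with the logistic transformation rather than interpolating the fractions directly.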
Interpolating income and expenditure data
ENIGH provides biennial surveys on household income and expenditure. Not all
municipalities are surveyed every survey year, but all states are accurately represented
in every survey year. Hence I first calculate average income and expenditure at the
state level and interpolate between survey years to obtain values for every year. The
interpolation fits a quadratic index to log-values from the prior year, the following year,
and the year three years after. Interpolated values are used to calculate state-specific yearly
growth rates. These growth rates are then used to extrapolate values at the municipal level
for years prior to the first year in which the municipality is surveyed, as well as for years
following the last year the municipality is surveyed. For values between years in which a
municipality is surveyed, I interpolate these values in the same way as population: scale
the state-specific growth rates so that the compounded growth between survey years
matches the survey data, and then interpolate between survey years using these scaled
growth rates.
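The state-level step can be illustrated with a short sketch. It assumes that "fitting a quadratic index to log-values" means passing an exact quadratic through the three log-values; the function name and the numbers are hypothetical.

```python
import numpy as np

def interpolate_log_quadratic(year, survey_years, survey_values):
    """Interpolate a missing year from three surrounding survey years.

    A quadratic in time is fitted to the logs of the three survey values
    (for a missing year t: the surveys in t-1, t+1, and t+3) and evaluated
    at the missing year.
    """
    s = np.asarray(survey_years, dtype=float) - year   # time relative to target year
    coeffs = np.polyfit(s, np.log(survey_values), deg=2)
    return float(np.exp(np.polyval(coeffs, 0.0)))

# Hypothetical state income in survey years 2000, 2002, and 2004; 2001 is missing.
print(interpolate_log_quadratic(2001, [2000, 2002, 2004], [100.0, 112.0, 121.0]))
```

With three points and a degree-two polynomial the fit is exact at the survey years, so the interpolated series passes through the survey values.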
Appendix B. Asymptotic properties of the partial likelihood estimator with correction
Asymptotic properties of the PLEC are derived under the usual regularity assumptions
and one additional condition: that the second part of the likelihood function, $Q^b(\cdot)$, be
uninformative of the parameter values to be estimated. In order to formally present
the asymptotic properties of the estimator, define the score of the log-partial-likelihood
function:

$$S(Y, \theta) \equiv \nabla a(Y, \theta) \qquad (34)$$
The following result can now be established:
Proposition 1. Assume the parameter space, $\Theta$, is compact, that $a(Y, \theta)$ and $b(Y, \theta)$ are
defined as in Eqs. (12) & (13), that $Q^a(\theta) = E[a(Y, \theta)]$, $Q^b(\theta) = E[b(Y, \theta)]$ and $L(\theta) = Q^a(\theta) + Q^b(\theta)$,
and that $Q^a_N(\theta)$, $Q^b_N(\theta)$, and $L_N(\theta)$ are their corresponding finite sample
equivalents. In addition, assume $\theta^L$, $\theta^a$, and $\theta^a_N$ are the maximizers of $L(\theta)$, $Q^a(\theta)$ and
$Q^a_N(\theta)$, respectively, and $\hat{\theta}_N$ and $S(Y, \theta)$ are defined as in Eqs. (16) and (34). Also,
assume:
1. Uniform convergence in probability: $\max_{\theta \in \Theta} |L_N(\theta) - L(\theta)| \xrightarrow{p} 0$ and
$\max_{\theta \in \Theta} |Q^a_N(\theta) - Q^a(\theta)| \xrightarrow{p} 0$
2. Unique identification: $L(\theta^L) > L(\theta)\ \forall \theta \in \Theta \setminus \{\theta^L\}$ and $Q^a(\theta^a) > Q^a(\theta)\ \forall \theta \in \Theta \setminus \{\theta^a\}$
3. Normality of the log-partial-likelihood estimator: $\theta^a$ is in the interior of $\Theta$, $S(Y, \theta)$
is continuously differentiable in the interior of $\Theta$ for every feasible $Y$, $|\nabla S(Y, \theta)|$ is
bounded by a function $c(Y)$, where $E[c(Y)] < \infty$, $\nabla^2 Q(\theta^a)$ is positive-definite and each
element of $S(Y, \theta)$ has finite second moment.
4. Smoothness: $L(\cdot)$ is twice differentiable at $\theta^a$ and $\nabla^2 L(\theta^a)$ is non-singular
5. Irrelevance: $Q^b(\cdot)$ is once differentiable at $\theta^a$ and $\nabla Q^b(\theta^a) = 0$
Then $\sqrt{N}(\hat{\theta}_N - \theta^L) \xrightarrow{d} N(0, G^{-1} H G^{-1})$, where $G \equiv \mathrm{Var}[S(Y, \theta^a)]$ and $H \equiv \nabla^2 Q^a(\theta^a)$.
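Proof. By Assumptions (1) and (2) the finite sample parameters converge in probability
to their respective asymptotic values: $\theta^L_N \xrightarrow{p} \theta^L$ and $\theta^a_N \xrightarrow{p} \theta^a$. By Assumption (3) the
distribution to which $\theta^a_N$ converges is a normal distribution with variance $G^{-1} H G^{-1}$.
Refer to Wooldridge (2002), Theorems 12.2 and 12.3 for these assertions.
Define $T(\theta) \equiv (\nabla^2 L(\theta))^{-1} \nabla Q^b(\theta)$ and $T_N(\theta) \equiv (\nabla^2 L_N(\theta))^{-1} \nabla Q^b_N(\theta)$. By the continuous
mapping theorem, $T_N(\theta^a_N) \xrightarrow{p} T_N(\theta^a)$. Given Assumptions (1) and (4), by Slutsky's
theorem, $\nabla L_N(\theta^a)^{-1} \xrightarrow{p} \nabla L(\theta^a)^{-1}$ and $T_N(\theta^a) \xrightarrow{p} T(\theta^a)$. By Assumption (5), $T(\theta^a) = 0$,
which implies $|\hat{\theta}_N - \theta^a_N| \xrightarrow{p} 0$. This fact, taken together with $\theta^a_N \xrightarrow{p} \theta^a$, implies $\hat{\theta}_N \xrightarrow{p} \theta^a$
and $\hat{\theta}_N \xrightarrow{d} \theta^a$.
By the mean value theorem, $\exists \lambda \in [0, 1]$ s.th. $\theta^L = \theta^a + T(\lambda \theta^a + (1 - \lambda) \theta^L)$. As $T(\theta^a) = 0$,
$\lambda = 1$ satisfies such condition and $\theta^a = \theta^L$. Hence, $\hat{\theta}_N \xrightarrow{p} \theta^L$ and $\hat{\theta}_N \xrightarrow{d} \theta^L$, where such
distribution is Normal with variance $G^{-1} H G^{-1}$.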
Appendix C. Derivatives through finite differencing
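Recall that $\hat{\theta}$ is the estimator that maximizes the partial likelihood and that $\hat{\hat{\theta}} = \hat{\theta} + T(\hat{\theta})$,
where $T(\hat{\theta})$ is the PLEC's correction term:

$$T(\hat{\theta}) = -\left(\nabla^2 Q^a(\hat{\theta}) + \nabla^2 Q^b(\hat{\theta})\right)^{-1} \nabla Q^b(\hat{\theta})$$

$$Q^a(\theta) = \sum_{t=1}^{T} \ln \Pr\left[\Lambda^a_t(\theta)\right] + \sum_{t=1}^{T} \ln \Pr\left[\Lambda^b_t(\theta) \mid \Lambda^a_t(\theta)\right]$$

$$Q^b(\theta) = \sum_{t=1}^{T} \ln \widehat{\Pr}\left[\Lambda^c_t(\theta) \mid \Lambda^b_t(\theta) \cap \Lambda^a_t(\theta)\right]$$

The gradient, $\nabla Q^b$, and the Hessians, $\nabla^2 Q^a$ and $\nabla^2 Q^b$, are calculated using forward
differences. Forward differencing is the procedure in which the derivative of $f(x)$ at $x_1$ is
approximated by $(f(x_1 + w) - f(x_1))/w$ for a predetermined step length $w$. The objective
of this section is to determine an appropriate step length such that $T(\hat{\theta})$ is as
accurate as possible.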
The optimal step length balances mathematical accuracy with numerical accuracy: a small
step length makes the forward-difference derivative mathematically accurate but numerically
inaccurate. The numerical inaccuracy arises from computers not being able to distinguish
$f(x_1 + w)$ from $f(x_1)$. With simulated functions the numerical inaccuracy can be very
large, as $f(x_1 + w)$ can compute to the same value as $f(x_1)$ even for relatively large step
lengths, i.e. $w \gg \sqrt{2 \cdot 10^{-16}}$, the standard step length. For example, if $f(x)$ is a simulated
function of the form $f(x) = \frac{1}{R} \sum_r 1\{g_r(x) = y\}$, the smallest non-zero difference between
$f(x + w)$ and $f(x)$ is $1/R$. Hence, the finite difference derivative, given step length $w$, is
either 0 or at least as large (in absolute terms) as $\frac{1}{Rw}$. In the current application, the simulated
conditional likelihood is $Q^b(\theta) = \sum_{t=1}^{T} \ln[\hat{P}_t(\theta)]$, where $\hat{P}_t(\theta) = \frac{1}{R} \sum_r 1\{a(\theta \mid \varepsilon_r, x_t, z_t) = a_t\}$.
Hence, the smallest non-zero difference between $Q^b(\theta + w)$ and $Q^b(\theta)$ is at least
$\ln(\max_{t \in T} \hat{P}_t(\theta) + \frac{1}{R}) - \ln(\max_{t \in T} \hat{P}_t(\theta))$, which is approximately $(R \cdot \max_{t \in T} \hat{P}_t(\theta))^{-1}$
whenever this value is close to zero.
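The step-function behaviour of a frequency simulator can be illustrated with a toy example. The simulator below, $\Pr[x + \varepsilon > 0]$ with fixed Normal draws, is a hypothetical stand-in and not the model's simulated likelihood; it only shows that a forward difference of such a function is either zero or at least $1/(Rw)$ in magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)
R = 200
eps = rng.standard_normal(R)  # simulation draws, held fixed across evaluations

def f_sim(x):
    """Frequency simulator of Pr[x + eps > 0]; a step function of x."""
    return np.mean(x + eps > 0)

def forward_diff(f, x, w):
    """Forward-difference approximation of f'(x) with step length w."""
    return (f(x + w) - f(x)) / w

x0 = 0.3
for w in (1.5e-8, 1e-3, 1e-1):
    d = forward_diff(f_sim, x0, w)
    # f_sim changes only in multiples of 1/R, so d is 0 or at least 1/(R*w).
    print(f"w={w:g}  derivative={d:g}  lower bound if non-zero={1.0 / (R * w):g}")
```

For very small steps no indicator switches and the estimated derivative is exactly zero, whereas larger steps give a non-zero but coarse estimate; this is the trade-off the step-length rule is meant to balance.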
Hence, whenever $\nabla Q^b$ is calculated by forward differencing with step size $w$, any non-zero
element of $\nabla Q^b$ is at least as large as $\rho/w$, for $\rho \equiv (R \cdot \max_{t \in T} \hat{P}_t(\theta))^{-1}$. This means
that if the true gradient value is between 0 and $\rho/w$, the forward-difference gradient
will round it to one of the two extreme values. The larger the $w$, the smaller the error
due to this rounding. This suggests choosing $w$ to be as small as possible subject to
acceptable rounding errors. I define an acceptable rounding error as one for which the
rounding error introduced in the correction term, $T(\hat{\theta})$, is within one percent of $\hat{\theta}$.
That is, choose the smallest $w$ s.th. $|(\nabla^2 Q^a(\hat{\theta}) + \nabla^2 Q^b(\hat{\theta}))^{-1} (\rho/w)| \le 0.01\,\hat{\theta}$. As, prior to
calculating finite difference derivatives, $\nabla^2 Q^a(\theta)$ is known but $\nabla^2 Q^b(\theta)$ is not, I assume
$|\nabla^2 Q^b(\theta)| \ll |\nabla^2 Q^a(\theta)|$ and choose $w$ as

$$w = \rho / (0.01 \cdot |\nabla^2 Q^a(\hat{\theta}) \cdot \hat{\theta}|) \qquad (35)$$

Eq. (35) is intuitive. The step length decreases as the number of simulated draws, $R$,
increases: with high $R$ the simulated likelihood closely approximates the true likelihood,
which is a smooth function. Also, the step length increases as the acceptable rounding error,
0.01, decreases: the higher the numerical precision required, the lower the mathematical
accuracy of the gradient calculation.
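A small numerical sketch of Eq. (35) follows. The function name and all inputs are hypothetical, and reading $|\cdot|$ as the Euclidean norm of the vector $\nabla^2 Q^a(\hat{\theta}) \cdot \hat{\theta}$ is an assumption made for this illustration.

```python
import numpy as np

def plec_step_length(R, max_prob, hess_a, theta_hat, tol=0.01):
    """Step length from Eq. (35): w = rho / (tol * |hess_a @ theta_hat|).

    R: number of simulation draws.
    max_prob: largest simulated probability, max_t P_t(theta).
    hess_a: Hessian of the partial likelihood Q^a at theta_hat.
    tol: acceptable rounding error (0.01 in the text).
    The Euclidean norm is an assumption of this sketch.
    """
    rho = 1.0 / (R * max_prob)
    scale = np.linalg.norm(np.asarray(hess_a) @ np.asarray(theta_hat))
    return rho / (tol * scale)

# Hypothetical two-parameter problem.
H = np.array([[-40.0, 5.0], [5.0, -25.0]])
theta = np.array([1.2, -0.7])
print(plec_step_length(500, 0.4, H, theta))
print(plec_step_length(1000, 0.4, H, theta))  # doubling R halves the step length
```

The two calls illustrate the first comparative static discussed above: more simulation draws allow a proportionally smaller step.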
Appendix D. A random effects model
The random effects model alters the assumption that the profit shocks, $\varepsilon^t_{im}$, are independent
across markets, firms, and time. Instead, the random effects model assumes $\varepsilon^t_{im}$
is the sum of two independent Normal random terms, one of which varies across firms
and markets, but is common across time. The other term is independent across markets,
firms, and time:

$$\varepsilon^t_{im} = \nu_{im} + \zeta^t_{im} \qquad (36)$$

The log-likelihood of observing a sequence of actions $\{a_t\}_{t=1}^{T}$ is thus:

$$L(\theta) = \ln \Pr\left[\cap_{t=1}^{T} \{a(\nu + \zeta \mid E, x_t, z_t) = a_t\}\right]$$
$$= \ln \Pr\left[\cap_{t=1}^{T} \Lambda^a_t\right] + \ln \Pr\left[\cap_{t=1}^{T} \Lambda^b_t \mid \cap_{t=1}^{T} \Lambda^a_t\right] + \ln \Pr\left[\cap_{t=1}^{T} \Lambda^c_t \mid \cap_{t=1}^{T} (\Lambda^b_t \cap \Lambda^a_t)\right] \qquad (37)$$

where the second line follows from Bayes' rule and the definitions of $\Lambda^a_t$, $\Lambda^b_t$ and $\Lambda^c_t$.
The log-likelihood has no closed form solution and cannot be expressed as the sum
of log-likelihoods across 'observations', as the time-persistent component in the profit
shock implies observations across time are correlated. However, the first component of
the log-likelihood (Eq. (37)) can be expressed as the sum of log-partial-likelihoods, each
one pertaining to different firms and markets. This suggests estimating the parameters
using this partial likelihood, with asymptotics based on markets and firms. Specifically,
note the definition of $\cap_{t=1}^{T} \Lambda^a_t$:

$$\cap_{t=1}^{T} \Lambda^a_t \equiv \{(\nu, \zeta) \in \mathbb{R}^{IM} \times \mathbb{R}^{IMT} \mid \ln h^t_{im}(-1) - r^t_{im} - \ln g^t_{im}(-1) \le \nu_{im} + \zeta^t_{im} \le \ln[-h^t_{im}(1)] - r^t_{im} - \ln[-g^t_{im}(1)]\}$$
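Assume the time-persistent shock, $\nu$, has a variance given by $\sigma_\nu$, and that of $\zeta$ is
normalized to one. The log-partial-likelihood is

$$Q^a(\theta, \sigma_\nu) = \ln \Pr\left[\cap_{t=1}^{T} \Lambda^a_t\right] = \sum_i \sum_m \ln \int \prod_{t=1}^{T} \left[\Phi\left(\psi^t_{im}(1) - \nu\right) - \Phi\left(\psi^t_{im}(-1) - \nu\right)\right] \frac{e^{-\nu^2/\sigma_\nu^2}}{\sqrt{2\pi}\,\sigma_\nu}\, d\nu$$

where $\psi^t_{im}(\cdot)$ is as previously defined in Eq. (28). I estimate $(\theta, \sigma_\nu)$ by maximizing $Q^a(\theta, \sigma_\nu)$.
The integral is approximated using Gauss-Hermite quadrature with fifteen sample
points.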
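A minimal sketch of the quadrature step for a single firm-market pair is given below. It assumes $\nu \sim N(0, \sigma_\nu^2)$ with the standard Normal density, so the change of variable $\nu = \sqrt{2}\,\sigma_\nu x$ is the usual one for Gauss-Hermite rules; a differently scaled density would need a different change of variable. The function name and the inputs are hypothetical, and the thresholds $\psi^t_{im}(\pm 1)$ are taken as given rather than computed from Eq. (28).

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.stats import norm

def log_partial_likelihood_im(psi_hi, psi_lo, sigma_nu, n_nodes=15):
    """Log of the integral over nu of prod_t [Phi(psi_hi_t - nu) - Phi(psi_lo_t - nu)].

    psi_hi, psi_lo: arrays over time with the upper and lower thresholds
        (psi(1) and psi(-1) in the text) for one firm-market pair.
    sigma_nu: standard deviation of the time-persistent shock
        (assumption: nu ~ N(0, sigma_nu^2)).
    """
    psi_hi = np.asarray(psi_hi, dtype=float)
    psi_lo = np.asarray(psi_lo, dtype=float)
    nodes, weights = hermgauss(n_nodes)        # rule for weight function exp(-x^2)
    nu = np.sqrt(2.0) * sigma_nu * nodes       # change of variable
    # Rows index quadrature nodes, columns index time periods.
    probs = norm.cdf(psi_hi[None, :] - nu[:, None]) - norm.cdf(psi_lo[None, :] - nu[:, None])
    integrand = probs.prod(axis=1)
    return float(np.log(weights @ integrand / np.sqrt(np.pi)))

# Hypothetical thresholds for three periods.
print(log_partial_likelihood_im([0.8, 1.0, 1.2], [-1.5, -1.2, -1.0], sigma_nu=0.8))
```

Summing this quantity over firms and markets gives the objective under the stated assumption; a one-period case with no lower threshold has the closed form $\ln \Phi(c/\sqrt{1+\sigma_\nu^2})$, which is a convenient check on the quadrature.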
Appendix E. Details on consumer welfare in counterfactual simulations
Recall that demand for store $j$, owned by firm $i$, is given by

$$D_j = \frac{e^{(\delta - \alpha p_j)/\lambda} \left(\sum_{k \in J_i} e^{(\delta - \alpha p_k)/\lambda}\right)^{\lambda - 1}}{1 + \sum_{n=1}^{I} \left(\sum_{k \in J_n} e^{(\delta - \alpha p_k)/\lambda}\right)^{\lambda}}$$

Let the vectors $p^{(i)}$ and $p^{(-i)}$ be the prices of firm $i$'s stores and of firm $i$'s rivals' stores,
respectively. Nash equilibrium in prices is defined as:

$$p^{(i)} = \arg\max_{p^{(i)}} \sum_{j \in J_i} M \cdot D_j(p^{(i)}, p^{(-i)}) \cdot (p_j - c) \quad \forall i$$

Equilibrium prices are determined by the FOCs of the maximization problem:

$$D_j + (p_j - c) \cdot \frac{d}{dp_j} D_j + \sum_{k \in J_i \setminus j} (p_k - c) \frac{d}{dp_j} D_k = 0 \quad \forall j \in J_i,\ \forall i$$

As all of a firm's stores are identical, there is an equilibrium where prices are the same
across all of a firm's stores. In addition, if prices are the same across a firm's stores,
then so are demand and its derivatives. Thus, let $s_i$ be firm $i$'s number of stores, let $p_i$ be
firm $i$'s equilibrium price in all of its stores, and let $D_i$ be firm $i$'s equilibrium demand,
aggregated across all of firm $i$'s stores. The FOC on prices, equilibrium demand, and
equilibrium profits simplify considerably:

$$\text{FOC} \rightarrow 0 = \alpha(p_i - c) - \left(1 - \frac{1}{s_i} D_i\right)^{-1}$$

$$\text{Demand} \rightarrow D_i = s_i \frac{e^{\lambda \ln s_i + \delta - \alpha p_i}}{1 + \sum_{n=1}^{I} e^{\lambda \ln s_n + \delta - \alpha p_n}}$$

$$\text{Profits} \rightarrow \pi_i = (p_i - c) D_i$$

To solve for equilibrium prices it is useful to introduce the change of variable $\rho_i = \alpha(p_i - c)$.
With this change of variable, equilibrium demand is:

$$D_i = s_i \frac{e^{\lambda \ln s_i + \delta - \alpha c - \rho_i}}{1 + \sum_{n=1}^{I} e^{\lambda \ln s_n + \delta - \alpha c - \rho_n}}$$

Furthermore, consumer surplus is

$$CS = \ln\left(1 + \sum_{n=1}^{I} e^{\lambda \ln s_n + \delta - \alpha c - \rho_n}\right)$$

It should now be clear how equilibrium prices, demand, and consumer surplus are
determined as a function of two parameters, $\lambda$ and $\delta - \alpha c$. Moreover, relative profits are
also a function of only these two parameters, as equilibrium profits can be re-written as
$\pi_i = \frac{1}{\alpha} \rho_i D_i$.
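One way to compute the equilibrium numerically is a damped fixed-point iteration on the first-order condition, $\rho_i = (1 - D_i/s_i)^{-1}$. The sketch below is an assumption about implementation (the text does not say which solver is used) and takes the demand and consumer surplus expressions of this appendix as written; the function name, damping factor, and parameter values are hypothetical.

```python
import numpy as np

def solve_equilibrium(stores, lam, kappa, damping=0.5, tol=1e-12, max_iter=10_000):
    """Solve rho_i = 1 / (1 - D_i / s_i) by damped fixed-point iteration.

    stores: number of stores s_i of each firm.
    lam: nesting parameter lambda; kappa: the composite delta - alpha * c.
    Returns scaled markups rho, demands D, and consumer surplus CS.
    """
    s = np.asarray(stores, dtype=float)

    def shares(rho):
        v = np.exp(lam * np.log(s) + kappa - rho)
        return v / (1.0 + v.sum()), v        # D_i / s_i in the appendix's notation

    rho = np.ones_like(s)
    for _ in range(max_iter):
        share, _ = shares(rho)
        rho_new = damping * rho + (1.0 - damping) / (1.0 - share)
        if np.max(np.abs(rho_new - rho)) < tol:
            rho = rho_new
            break
        rho = rho_new

    share, v = shares(rho)
    demand = s * share                        # D_i as written in the appendix
    cs = float(np.log1p(v.sum()))             # CS = ln(1 + sum_n exp(...))
    return rho, demand, cs

# Hypothetical market with three firms owning 1, 2, and 5 stores.
rho, demand, cs = solve_equilibrium([1, 2, 5], lam=0.5, kappa=1.0)
print(rho, demand, cs)
```

Relative profits then follow from $\rho_i D_i$, since the factor $1/\alpha$ is common to all firms. Only $\lambda$ and $\delta - \alpha c$ enter the computation, in line with the observation above.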
References
Aguirregabiria, V., Vicentini, G., 2016. Dynamic spatial competition between multi-store firms. The
Journal of Industrial Economics (64) 710–754.
Bajari, P., Benkard, L., Levin, J., 2007. Estimating dynamic models of imperfect competition. Econo-
metrica 75 (5), 1331–1370.
Basker, E., 2005. Job creation or destruction? labor market effects of Wal-Mart expansion. Review of
Economics and Statistics 87 (1), 174–183.
Baumol, W.J., 1962. On the theory of the expansion of the firm. The American Economic Review 52,
1078–1087.
Benkard, L., Weintraub, G., Roy, B.V., 2008. Markov perfect industry dynamics with many firms. Econo-
metrica 1375–1441.
Bresnahan, T., Reiss, P., 1990. Entry in monopoly markets. The Review of Economic Studies 57 (4),
531–553.
Bresnahan, T.F., Reiss, P.C., 1991. Entry and competition in concentrated markets. The Journal of
Political Economy 99 (5), 977–1009.
Consejo Nacional de Poblacion, Secretaria de Desarrollo Social, 2007. Delimitacion de las Zonas
Metropolitanas de Mexico 2005.
Durand, C., 2007. Externalities from foreign direct investment in the mexican retailing sector. Cambridge
Journal of Economics 31, 393–411.
Ellickson, P.B., Houghton, S., Timmins, C., 2013. Estimating network economies in retail chains: a
revealed preference approach. RAND Journal of Economics 44, 169–193.
Fox, J., 2018. Estimating matching games with transfers. Quantitative Economics 9, 1–38.
Fuentes, H.J., Zamudio, A., Ortega, C., Mendoza, J., Soto, J., 2008. Servicio de elaboracion del estudio
de: “evasion fiscal generada por el ambulantaje”. ITESM, Campus Ciudad de Mexico.
Grupo Gigante, 2001. Informe Annual 2001, Grupo Gigante.
Holmes, T.J., 2011. The diffusion of Wal-Mart and economies of density. Econometrica 79 (1), 253–302.
Iacovone, L., Javorcik, B., Keller, W., Tybout, J., 2015. Supplier responses to Wal-Mart's invasion of
mexico. Journal of International Economics 95, 1–15.
Ishii, J., 2008. Compatibility, Competition, and Investment in Network Industries: ATM Networks in
the Banking Industry. working paper. https://www.researchgate.net/publication/237271062.
Javorcik, B.S., Keller, W., Tybout, J.R., 2008. Openness and industrial response in a Wal-Mart world:
a case study of mexican soaps, detergents, and surfactant producers. World Economy 31, 1558–1580.
Jia, P., 2008. What happens when Wal-Mart comes to town: An empirical analysis of the discount
retailing industry. Econometrica 76 (6), 1263–1316.
Lam, B., 2017. Trump’s ‘Two-for-One’ Regulation Executive Order. The Atlantic.
Lucas Jr., R.E., 1978. On the size and distribution of business firms. Bell Journal of Economics 9,
508–523.
Maican, F., Orth, M., 2018. Entry regulation, welfare and determinants of market structure. International
Economic Review 59, 727–756.
Manski, C.F., 1975. Maximum score estimation of the stochastic utility model of choice. Journal of
Econometrics 3 (3), 205–228.
Mazzeo, M.J., 2002. Product choice and oligopoly market structure. The RAND Journal of Economics
33 (2), 221–242.
Nash Jr, J., 1950b. Equilibrium points in n-person games. Proceedings of the National Academy of
Sciences 36, 48–49.
Newey, W., McFadden, D., 1994. Large Sample Estimation and Hypothesis Testing, IV. Elsevier.
Nishida, M., 2015. Estimating a model of strategic network choice: the convenience-store industry in
Okinawa. Marketing Science 34, 20–38.
Penrose, E., 1995. The Theory of the Growth of the Firm, third ed. Oxford University Press.
Secretaria de Turismo, 2015. Resultados de la actividad hotelera.
Seim, K., 2006. An empirical model of firm entry with endogenous product-type choices. The RAND
Journal of Economics 37 (3), 619–640.
Sutton, J., 1991. Sunk Costs and Market Structure: Price Competition, Advertising, and the Evolution
of Concentration. The MIT Press.
Sutton, J., 2001. Technology and Market Structure, first ed. MIT Press.
Tamayo, Z.R., 2005. Chedrahui adquiere carrefour Mexico. El Universal.
Train, K.E., 2009. Discrete Choice Methods with Simulation, second ed. Cambridge University Press.
Wooldridge, J.M., 2002. Econometric Analysis of Cross Section and Panel Data, first ed. The MIT Press.
