
Advanced Topics in Industrial Organization

Lecture 2: Discrete-choice models

Panle Jia Barwick

Cornell

Spring 2015

Panle Jia Barwick (Cornell) Discrete-Choice Berry94 Spring 2015 1 / 50


Introduction: random utility and discrete choices

In AIDS models, demand is continuous: consumers allocate a share of
their expenditure / budget to each product.
We now introduce the second type of demand models – discrete
choice models:
there are many products, with each product being a unique
combination of different characteristics;
consumers have well-defined and idiosyncratic preferences;
each consumer chooses the single option that yields the highest utility.
These models are often called “random utility” models, because
utility is described as a random variable reflecting unobserved taste
differences.



Road map

A quick road-map of today’s lecture:


we will start with the (in)famous logit demand;
discuss its limitations;
present the extension of logit models: nested logit;
discuss Berry (1994) which inverts shares to solve the endogeneity
problem in a nonlinear demand system.



Basic set-up of a random utility model

Let i index consumers, j index products. Let consumer i’s indirect
utility of consuming product j be denoted as:

Uij = Vij + εij , j = 0, ..., J

where Vij is the utility explained by observed attributes (which include
both observed consumer characteristics and observed product
attributes) and εij is the unobserved random utility.
Assume εi = {εi0 , ..., εiJ } is distributed according to a density function
f (εi ).
Consumer i chooses product j iff:

Uij ≥ Uik , ∀k ≠ j



Basic set-up of a random utility model II

The probability of consumer i choosing product j is:

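\[
P_{ij} = \Pr(\varepsilon_{ik} - \varepsilon_{ij} \le V_{ij} - V_{ik},\ \forall k \ne j)
= \int I(\varepsilon_{ik} - \varepsilon_{ij} \le V_{ij} - V_{ik},\ \forall k \ne j)\, f(\varepsilon_i)\, d\varepsilon_i
\]

What is the standard normalization that we impose on this kind of
models?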


As in all discrete choice models, only differences in utility matter: if
every Vij is shifted up or down by the same constant, the choice does not
change. Typically we normalize the observed utility of one choice to 0
(Vi0 = 0).
The scale of utility does not matter either: dividing both sides of
the inequality Uij ≥ Uik by a positive constant does not change the
inequality.
The standard solution is to normalize the variance of the error terms.
Note that the choice probability is a J dimensional integral in the
most unrestricted form, which indicates the level of computational
complexity.
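
The two normalizations can be checked numerically. The following is a minimal sketch, not part of the lecture: the utility values are made up, and `chosen_alternative` is a hypothetical helper. It draws one set of taste shocks and confirms that the chosen alternative does not change when every utility is shifted by a constant or multiplied by a positive constant.

```python
import numpy as np

def chosen_alternative(utilities):
    """Index of the utility-maximizing alternative."""
    return int(np.argmax(utilities))

rng = np.random.default_rng(0)
V = np.array([0.0, 1.2, -0.3, 0.8])   # hypothetical observed utilities; V[0] is the outside option
eps = rng.gumbel(size=V.size)         # one draw of the unobserved taste shocks
U = V + eps

base_choice = chosen_alternative(U)
shifted_choice = chosen_alternative(U + 5.0)   # location shift: add a constant to every utility
scaled_choice = chosen_alternative(3.0 * U)    # scale change: multiply by a positive constant
```

Because only the ranking of the utilities matters, neither the level nor the scale of utility can be identified from choices, which is why one Vij and the error variance are fixed.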

Logit models

Suppose εij is iid across j, distributed according to the type one
extreme value distribution $F(\varepsilon) = e^{-e^{-\varepsilon}}$.
The distribution is sometimes called “double exponential”, or the
“Gumbel distribution”, or (loosely, in older econometrics papers) the
“Weibull distribution”.
When all consumers are identical except for the error term, Vij
becomes Vj , and the choice probabilities are the same across
consumers and therefore equal to market shares.



Logit models: deriving shares

Let sj denote the market share of product j (omitting the subscript i):
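\begin{align*}
s_j &= \int I(\varepsilon_k - \varepsilon_j \le V_j - V_k,\ \forall k \ne j)\, f(\varepsilon)\, d\varepsilon \\
&= \int \prod_{k \ne j} e^{-e^{-(V_j - V_k + \varepsilon_j)}} \cdot e^{-\varepsilon_j} \cdot e^{-e^{-\varepsilon_j}}\, d\varepsilon_j \\
&= \int e^{-\left( \sum_{k \ne j} \frac{e^{V_k}}{e^{V_j}} \right) e^{-\varepsilon_j}} \cdot e^{-\varepsilon_j} \cdot e^{-e^{-\varepsilon_j}}\, d\varepsilon_j \\
&= \int e^{-\varepsilon_j} \cdot e^{-\left( \sum_{k \ne j} \frac{e^{V_k}}{e^{V_j}} + 1 \right) e^{-\varepsilon_j}}\, d\varepsilon_j \\
&= \int \exp\left( -\varepsilon_j - e^{-\varepsilon_j} \sum_k \frac{e^{V_k}}{e^{V_j}} \right) d\varepsilon_j
\end{align*}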



Logit models: deriving shares II

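Let $e^{\lambda_j} = \sum_k \frac{e^{V_k}}{e^{V_j}}$. Using the change of variables (replacing $d\varepsilon_j$ with
$d(\varepsilon_j - \lambda_j)$), we have:

\begin{align*}
s_j &= \int \exp\left( -\varepsilon_j - e^{-\varepsilon_j} e^{\lambda_j} \right) d\varepsilon_j \\
&= e^{-\lambda_j} \int e^{-(\varepsilon_j - \lambda_j)}\, e^{-e^{-(\varepsilon_j - \lambda_j)}}\, d(\varepsilon_j - \lambda_j) \\
&= e^{-\lambda_j} = \frac{e^{V_j}}{\sum_k e^{V_k}} \tag{1}
\end{align*}

The remaining integrand is the type one extreme value density, which integrates to 1.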
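
The closed form in equation (1) can be checked against the integral it came from. The sketch below is an illustration under stated assumptions: the mean utilities are made up, and `logit_shares` and `simulated_shares` are hypothetical helper names. It compares the analytic shares with the fraction of simulated consumers who pick each product when the shocks are iid type one extreme value draws.

```python
import numpy as np

def logit_shares(V):
    """Closed-form logit shares s_j = exp(V_j) / sum_k exp(V_k)."""
    V = np.asarray(V, dtype=float)
    expV = np.exp(V - V.max())   # subtracting the max avoids overflow and leaves the shares unchanged
    return expV / expV.sum()

def simulated_shares(V, n_draws=200_000, seed=0):
    """Fraction of simulated consumers for whom V_j + eps_ij is the largest utility."""
    V = np.asarray(V, dtype=float)
    rng = np.random.default_rng(seed)
    eps = rng.gumbel(size=(n_draws, V.size))   # iid type one extreme value shocks
    choices = np.argmax(V + eps, axis=1)
    return np.bincount(choices, minlength=V.size) / n_draws

V = [0.0, 1.0, 0.5]              # hypothetical mean utilities; V[0] = 0 is the outside option
analytic = logit_shares(V)
simulated = simulated_shares(V)
```

The two vectors agree up to simulation noise, which is the practical payoff of the extreme value assumption: a J-dimensional integral collapses to a ratio of exponentials.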



Logit models: linear utility

Typically, the indirect utility function is linear in attributes:

Uij = Xj β − αpj + εij , j = 1, ..., J

where ‘Xj ’ is the vector of product attributes, and ‘pj ’ is product j’s
price.
This kind of indirect utility can be derived from a quasi-linear utility
function, where consumers assign a fixed budget to consumption of
products j = 0, ..., J, so that demand is free of income effects.
For many products, the assumption of no income effects seems
reasonable.
For large household items (cars, houses, durable goods), this
assumption is less plausible. Later we will study papers that incorporate
income effects.



Logit models: price coefficient and welfare

What does the price coefficient ‘α’ measure?


How do we measure consumers’ willingness to pay for one additional
unit of some product attribute?
What is consumer’s expected surplus with J choices?
How does the introduction of a new product affect consumer surplus?
Is this feature unique to logit models?


The price coefficient ‘α’ is the marginal utility of a dollar, and the
ratio β/α measures consumers’ willingness to pay for one additional unit
of some product attribute (for example, 1GB of computer RAM).
The expected consumer surplus with J choices (in utility units; divide
by α to express it in dollars) is:

\[
E(CS) = E\left[ \max_{j = 0, ..., J} U_{ij} \right] = \ln\left( \sum_{j=0}^{J} e^{V_j} \right) + \text{Euler's constant}
\]

The introduction of a new product k, regardless of how similar the
product is to existing products, increases consumer surplus from
$\ln\left( \sum_{j=0}^{J} e^{V_j} \right)$ to $\ln\left( \sum_{j=0}^{J} e^{V_j} + e^{V_k} \right)$.
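
The welfare effect of a new product can be computed directly from the log-sum formula. The sketch below is an illustration only: the mean utilities and the price coefficient are made-up numbers, and `expected_surplus` is a hypothetical helper that drops the additive Euler constant (it cancels in differences) and divides by α to express surplus in dollars.

```python
import numpy as np

def expected_surplus(V, alpha=1.0):
    """Logit inclusive value ln(sum_j exp(V_j)) / alpha, omitting the additive Euler constant."""
    V = np.asarray(V, dtype=float)
    m = V.max()
    return (m + np.log(np.exp(V - m).sum())) / alpha   # numerically stable log-sum-exp

V_old = np.array([0.0, 1.0, 1.0])   # hypothetical mean utilities; index 0 is the outside option
V_new_product = 1.0                 # a new product identical to the existing inside goods
alpha = 2.0                         # hypothetical price coefficient (utils per dollar)

cs_before = expected_surplus(V_old, alpha)
cs_after = expected_surplus(np.append(V_old, V_new_product), alpha)
gain_in_dollars = cs_after - cs_before
```

Even though the new product is an exact copy of existing ones, `gain_in_dollars` is strictly positive, because the new product arrives with its own independent taste shock.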



Welfare with N+1 products

Note: this feature is not unique to logit models.


It holds for all models with independent taste errors.
In other words, the unobserved product space is expanded each time a
new product is introduced.
There are several ways to deal with this:
One is to modify the distribution assumption (for example, make the
error terms correlated or make the support of the error terms bounded).
Another is to get rid of the error term (εj : taste for products) and only
use characteristics (Berry and Pakes 2007).



Logit models: estimation

When all characteristics (including prices) are exogenous, this is a
standard multinomial logit model.
The parameters can be estimated using MLE. Stata does it easily.
How does the IO literature depart from the rest of the literature on
discrete choices?
What is some related empirical evidence?

Often it is problematic to assume prices are exogenous. For example,
we do not observe all relevant product characteristics (omitted
variable bias).
To make things explicit:

Uij = Xj β − αpj + ξj + εij , j = 1, ..., J

where ξj captures unobserved product attributes (product quality,
brand image, customer service, advertising etc.) that consumers value
but researchers do not observe.



Logit models: estimation II

Price is correlated with ξj through the first-order condition: the
mark-up (pj − cj ) is inversely related to demand elasticity, and
demand elasticity depends on ξj .
We will discuss how to address the endogeneity problem (Berry
(1994)) after presenting extensions to logit models.



Drawbacks of logit models: IIA

Logit models are easy to estimate, and the endogeneity problem can
be easily addressed (see Berry 1994). However, they have a couple of
undesirable features.
What are the undesirable features?

IIA property: independence from irrelevant alternatives. From the
share equation (1), we know that:

\[
\frac{s_j}{s_k} = \frac{e^{V_j}}{e^{V_k}}, \quad \forall j, k
\]

The odds of choosing j over k do not depend on other alternatives.
Conditioning on Vj and Vk , the odds of choosing j over k do not
depend on the individual attributes of products j and k.
The famous red-bus and blue-bus example.
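
The red-bus and blue-bus example can be worked out with the share formula. The sketch below is an illustration with made-up utilities (all set to 0, so commuters are indifferent between modes); `logit_shares` is a hypothetical helper implementing equation (1).

```python
import numpy as np

def logit_shares(V):
    """Logit shares s_j = exp(V_j) / sum_k exp(V_k)."""
    V = np.asarray(V, dtype=float)
    expV = np.exp(V - V.max())
    return expV / expV.sum()

# Commuters are indifferent between the car and the red bus.
car_redbus = logit_shares([0.0, 0.0])          # car 1/2, red bus 1/2

# Add a blue bus that is identical to the red bus in everything but color.
car_red_blue = logit_shares([0.0, 0.0, 0.0])   # car 1/3, red bus 1/3, blue bus 1/3

odds_before = car_redbus[0] / car_redbus[1]
odds_after = car_red_blue[0] / car_red_blue[1]
```

The car-to-red-bus odds stay at 1, as IIA requires, so the car share falls from 1/2 to 1/3. Intuition suggests the two buses should instead split the original bus share, leaving the car at 1/2.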



Proportional substitution patterns

Proportional substitution patterns:

\[
\frac{\partial s_j}{\partial p_k} =
\begin{cases}
\dfrac{\partial V_j}{\partial p_j}\, s_j (1 - s_j), & j = k \\[1ex]
-\dfrac{\partial V_k}{\partial p_k}\, s_j s_k, & j \ne k
\end{cases}
\]

When the price of product k increases, some consumers who used to
buy k will switch to other products. The fraction of consumers
switching to product j is proportional to j’s existing market share.
The fact that consumers switch to the most popular products is not
unique to the logit model.
This is driven by a) homogeneous observed utility Vj , and b) an error
term εij that is iid across both i and j.
Since consumers only differ in the iid random taste shocks εij , the
market share of each product is determined by the order statistics.



Proportional substitution patterns II

Consider the case of cars. Suppose the price of BMW increases. For
some consumers who used to buy BMW, their utility from this
product decreases enough such that they switch to their second best
choice.
In models with iid εij and Vj that is identical across i, the proportion
of consumers who rank each brand as their second choice is simply
determined by the order statistics.
These models would predict that when BMW becomes more
expensive, a high fraction of its consumers would switch to either the
Honda Accord or the Toyota Camry, the most popular products.
This switching pattern is unlikely to hold in reality, because a typical
BMW consumer values engine power rather than fuel efficiency.



Unappealing elasticity and mark-up predictions

In logit models, the price elasticities are:

\[
\eta_{jk} = \frac{\partial s_j}{\partial p_k} \cdot \frac{p_k}{s_j} =
\begin{cases}
-\alpha p_j (1 - s_j), & j = k \\
\alpha p_k s_k, & j \ne k
\end{cases}
\]

Typically 1 − sj ≈ 1. Low price products have:
low elasticities
high Lerner’s ratio $\frac{p_j - c_j}{p_j} = \frac{1}{|\eta_{jj}|}$
marginal cost as a percentage of price is lower
Two products with the same mean utility Vj = Vj′ , j ≠ j′ will have:
the same market shares
the same cross-price elasticities with respect to any other product
k ≠ j, k ≠ j′.
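
The elasticity formulas are easy to tabulate. The sketch below is illustrative: prices and qualities are made-up numbers chosen so that all three products have the same mean utility, and `logit_elasticities` is a hypothetical helper that fills a matrix with entry (j, k) equal to the elasticity of sj with respect to pk.

```python
import numpy as np

def logit_shares(delta):
    """Shares of the J inside goods; the outside good has mean utility 0."""
    expd = np.exp(delta)
    return expd / (1.0 + expd.sum())

def logit_elasticities(alpha, prices, shares):
    """Matrix eta[j, k] = (ds_j / dp_k) * (p_k / s_j) for the logit model."""
    J = len(prices)
    eta = np.tile(alpha * prices * shares, (J, 1))               # column k holds alpha * p_k * s_k
    eta[np.diag_indices(J)] = -alpha * prices * (1.0 - shares)   # own-price: -alpha * p_j * (1 - s_j)
    return eta

alpha = 1.0                                # hypothetical price coefficient
prices = np.array([1.0, 2.0, 4.0])         # hypothetical prices
quality = np.array([1.0, 2.0, 4.0])        # hypothetical X_j * beta + xi_j
shares = logit_shares(quality - alpha * prices)
eta = logit_elasticities(alpha, prices, shares)
```

With equal shares, the own-price elasticity is proportional to price, so the cheapest product has the least elastic demand and the highest implied Lerner ratio. Within each column the off-diagonal entries are identical: every rival responds in the same proportion to a change in pk.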



Extension of logit models: nested logit and generalized
extreme value models

There are two ways to generalize the logit model:


allow Vj to depend on household attributes: Vij (next class)
generalize the distribution of ε (GEV models).
In this class, we will discuss the GEV models, where the unobserved
utilities for all alternatives {εi · } have a joint generalized extreme
value distribution.
The most popular candidate is the nested logit, where products are
clustered into different nests. The unobserved utilities of alternatives
within a nest are correlated, while those of alternatives in different
nests are independent.
When prices change, consumers substitute more towards products
within a nest, less to products in other nests.



Nested logit: distribution assumption
Formally, ε = {ε1 , ..., εJ } is distributed according to:

\[
F(\varepsilon) = \exp\left( - \sum_{k=1}^{K} \left( \sum_{j \in B_k} \exp\left( -\frac{\varepsilon_j}{\lambda_k} \right) \right)^{\lambda_k} \right)
\]

where Bk is the set of products included in nest k, and λk measures the
independence of εj within nest k. The higher λk is, the more
independent the εj are.
If λk = 1 for all nests, then we are back to the logit model, where all
εj ’s are independent:

\[
F(\varepsilon) = \exp\left( - \sum_{j} e^{-\varepsilon_j} \right), \quad j = 1, ..., J
\]

When λk is close to 0, products in nest k are highly
substitutable.
λk < 0 is not consistent with utility maximization; if λk > 1, then
only certain parameter values are consistent with utility maximization.
We assume λk ∈ [0, 1] ∀k.

Nested logit: deriving shares

The choice probabilities can be decomposed into two components:
the probability of choosing product j conditioning on nest k being
chosen, and the probability of choosing nest k.
For simplicity, let Wk denote the utility common to the nest k, and
Vj be the deviation of product j from the nest average:

Uij = Wk + Vj + εij , j ∈ Bk

Then:

\begin{align*}
s_j &= s_{j|B_k} \cdot s_{B_k} \\
s_{B_k} &= \frac{e^{W_k + \lambda_k I_k}}{\sum_l e^{W_l + \lambda_l I_l}}, \qquad I_k = \ln \sum_{j \in B_k} e^{V_j / \lambda_k} \\
s_{j|B_k} &= \frac{e^{V_j / \lambda_k}}{\sum_{i \in B_k} e^{V_i / \lambda_k}}
\end{align*}

where Ik is the inclusive value of nest Bk , and λk Ik is the expected
utility that consumers receive from the choices in nest Bk .
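
The two-level decomposition translates directly into code. The sketch below is a minimal illustration, assuming every nest is non-empty and λk > 0; the function name `nested_logit_shares`, the nest assignment and all numbers are hypothetical. It returns the product shares together with the within-nest and nest-level shares.

```python
import numpy as np

def nested_logit_shares(W, V, nest_of, lam):
    """
    W[k]       : utility common to nest k
    V[j]       : product-specific utility of product j
    nest_of[j] : index of the (non-empty) nest that contains product j
    lam[k]     : nesting parameter lambda_k, assumed to lie in (0, 1]
    Returns (s, s_within, s_nest) with s[j] = s_within[j] * s_nest[nest_of[j]].
    """
    W, V, lam = (np.asarray(a, dtype=float) for a in (W, V, lam))
    nest_of = np.asarray(nest_of)
    scaled = np.exp(V / lam[nest_of])                              # exp(V_j / lambda_k)
    nest_sums = np.bincount(nest_of, weights=scaled, minlength=W.size)
    I = np.log(nest_sums)                                          # inclusive values I_k
    s_within = scaled / nest_sums[nest_of]                         # s_{j | B_k}
    nest_weights = np.exp(W + lam * I)
    s_nest = nest_weights / nest_weights.sum()                     # s_{B_k}
    return s_within * s_nest[nest_of], s_within, s_nest

# Hypothetical example: nest 0 holds only the outside option, nest 1 holds three products.
W = np.array([0.0, 0.5])
V = np.array([0.0, 0.2, -0.1, 0.3])
nest_of = np.array([0, 1, 1, 1])
s, s_within, s_nest = nested_logit_shares(W, V, nest_of, lam=[1.0, 0.5])
s_logit, _, _ = nested_logit_shares(W, V, nest_of, lam=[1.0, 1.0])
```

Setting every λk to 1 reproduces the plain logit shares with mean utility Wk + Vj, which is a convenient sanity check on any nested logit implementation.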

Nested logit & multi-stage budgeting

Nested logit is often associated with multi-stage budgeting, where
consumers first choose a nest, then narrow their choice down to a
particular product within the nest.
For example, consumers can first choose which city to live in, and
then choose a house in that city.
Like logit models, nested logit models also have analytic solutions,
which is convenient.
Nesting only partially alleviates the problem that consumers
switch to popular products: they now switch to popular products within
the same nest.
The way nests are constructed also matters: the substitution pattern
will be different for different nest structures, even if the final branches
are the same.



Generalized nested logit

The most general solution is the generalized nested logit, where the
same product can belong to multiple nests.
For example, suppose consumers can choose between driving, car
pool, bus, or train:
driving and car pool are similar choices, and bus and train are similar
choices;
like bus and train, car pool is also inflexible to some extent;
as a result, car pool can belong to both nests.



Generalized Extreme Value II

Let αjk denote the portion of εj that belongs to nest k, and let
∑k αjk = 1. The probability of choosing product j can be
decomposed as:

\begin{align*}
s_j &= \sum_k s_{j|k} \cdot s_k \\
s_k &= \frac{\left\{ \sum_{j \in k} [\alpha_{jk} \exp(V_j)]^{1/\lambda_k} \right\}^{\lambda_k}}{\sum_{l=1}^{K} \left\{ \sum_{j \in l} [\alpha_{jl} \exp(V_j)]^{1/\lambda_l} \right\}^{\lambda_l}} \\
s_{j|k} &= \frac{(\alpha_{jk} \exp(V_j))^{1/\lambda_k}}{\sum_{i \in k} (\alpha_{ik} \exp(V_i))^{1/\lambda_k}}
\end{align*}

where the utility model is: Uij = Vj + εij .
In nested logit, product j belongs to one nest; here it can belong to
multiple nests, and we sum over all nests that product j belongs to.
For more details, see Kenneth Train (2003): “Discrete Choice
Methods with Simulation.”



Generalized Extreme Value III

In principle one could consider estimating an unrestricted
variance-covariance matrix to allow for fully flexible substitution
patterns between any two products.
However, this reintroduces the dimensionality problem of estimating
J × J parameters when J is large.
In the next class, we introduce the second approach to relax the
restrictive substitution patterns imposed by logit: random coefficient
models.



Berry (1994): a solution to endogenous prices

As we argued above, prices are endogenous:
prices are correlated with unobserved attributes that affect demand,
because firms set prices to maximize profits.
In nonlinear models (the discrete choice model is one of them),
dealing with endogeneity is difficult when we do not want to impose
any restriction on the correlation between prices and the unobserved
attributes.
The contribution of this paper is to invert unobserved product
attributes from observed market shares.
As we see in the production literature, another popular approach to
deal with unobserved attributes is the control function approach.



Berry (1994): model primitives

Firms produce differentiated products: j = 1, ..., J
Consumers have well-defined (random) preferences: Uij
Equilibrium concept: consumers maximize utility, and firms compete
in prices or quantities.
Data requirement: a large number of markets and product shares in
each market.
The concept of “market” is fairly general. Different cities, different
time periods, or different consumer groups are all valid candidates for
“markets”.



Berry (1994): data

Shares: we need physical shares (sj ), not revenue shares
$\left( \frac{P_j Q_j}{\sum_k P_k Q_k} \right)$.
Sometimes this requires converting observed sales to physical shares.
Prices: what if we need to aggregate over different styles, packages,
SKUs, etc.?
Use share-weighted prices. Constant shares (e.g. sample average
shares) are preferred.



Berry (1994): data II

Outside-option: we rarely observe the fraction of consumers
choosing the outside option. (How many people choose not to
purchase an iPhone, or not to watch Avatar?)
Market size: the standard solution is to assume that the market size
is a parameterized function of observed variables, like population,
with the parameters to be determined.
For example, in most studies on cars (or household durables), the
market size is assumed to be the number of households in the US.
In airline studies, the market size is often assumed to be the product
of the populations of the end-cities (partly motivated by gravity models).
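
Once a market size is assumed, physical shares and the outside share follow by arithmetic. The sketch below is illustrative: the unit sales are made-up numbers, the market size of 100 million households is taken as given, and `shares_from_sales` is a hypothetical helper.

```python
import numpy as np

def shares_from_sales(quantities, market_size):
    """Convert unit sales into inside-good shares and the implied outside-good share."""
    quantities = np.asarray(quantities, dtype=float)
    inside = quantities / market_size
    outside = 1.0 - inside.sum()
    if outside <= 0.0:
        raise ValueError("market size is too small: the outside share must be positive")
    return inside, outside

# Hypothetical numbers: three car models sold in a market of 100 million households.
units_sold = [400_000, 250_000, 150_000]
households = 100_000_000
s_inside, s_outside = shares_from_sales(units_sold, households)
```

The outside share is defined only relative to the assumed market size, so it is worth checking how sensitive the estimates are to that assumption.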



Berry (1994): demand

Given the utility function Uij = Xj β − αpj + ξj + εij , the market share of
product j is:

\[
s_j = \int I\left[ \varepsilon_{ij} \ge \varepsilon_{ik} + X_k \beta - \alpha p_k + \xi_k - (X_j \beta - \alpha p_j + \xi_j),\ \forall k \ne j \right] dF(\varepsilon) = s_j(X, P, \xi)
\]

What are the standard approaches in estimating parameters {α, β}?


We cannot estimate the likelihood of the sample, because we do not
want to impose a distribution assumption on the unobserved quality
ξj .
We cannot instrument for pj directly, because ξj enters the share equation
nonlinearly.
It seems that ξj is the troublemaker.
Why do we need ξj in our model?



Berry (1994): demand and xi

Suppose we are interested in studying car demand. For now, let
us simply ignore ξj .
There are 100 million households in the US.
If each household’s purchase probability is described by sj (X , P ), with
such a big “sample size”, the model-predicted aggregate share should
differ from observed market shares by approximately $\frac{1}{\sqrt{100{,}000{,}000}} = 0.0001$.
There is no demand model (with pure consumer taste shocks) that is
up to the test.
As a result, we need unobserved quality ξj to help us fit data, even if
we are bold enough to claim prices are exogenous!



Berry (1994): inverting shares

The contribution of this paper is to show that under mild
distributional assumptions, there is a one-to-one mapping between the
observed market share vector {sj } and the vector of mean utility
{δj = Xj β − αpj + ξj }.
The existence of a one-to-one mapping justifies inverting {ξj } from
{sj }.
The paper’s proof used to be important since most extensions of BLP
had to show that there was a one-to-one mapping between ξ and s.
Berry, Gandhi, and Haile (2012) proved the existence of such a
mapping for a very general set up.
In the following, we use g (·) to denote the share function, to
distinguish it from the observed shares s.



Berry (1994): inverting shares

Theorem
If the density of ε is positive and continuous for all ε, and we
normalize mean utility for the outside option to 0, then the inverse
function $g^{-1}$ exists: $\delta = g^{-1}(s)$, and hence ξ = δ − Xβ + αp. In other
words, there is one and only one mean utility vector that can rationalize
the observed market shares.

Proof sketch: Since the density of ε is positive and continuous, it is easy
to show that the share function is smooth, increasing in own mean utility and
decreasing in other products’ mean utility:

\[
\frac{\partial g_j}{\partial \delta_j} > 0, \qquad \frac{\partial g_j}{\partial \delta_k} < 0, \quad \forall k \ne j.
\]

Note that $\lim_{\delta_j \to -\infty} g_j(\delta) = 0$ and $\lim_{\delta_j \to \infty} g_j(\delta) = 1$.



Berry (1994): inverting shares II

We first define the element-by-element inverse of the share
equation: rj (δ, sj ), which is the value that makes the predicted share
gj equal to the observed share sj , conditional on δ−j :

sj = gj (δ1 , ..., rj (δ, sj ), ..., δJ )

Since gj (·) is monotone and smooth, the inverse function rj (δ−j , sj )
exists and strictly increases in δk , k ≠ j. Combining the element by
element inverse functions into a vector:

\[
\begin{cases}
\delta_1 = r_1(\delta_{-1}, s_1) \\
\quad \vdots \\
\delta_j = r_j(\delta_{-j}, s_j) \\
\quad \vdots \\
\delta_J = r_J(\delta_{-J}, s_J)
\end{cases}
\]

The construction of r (δ, s ) transforms the inversion problem into a
fixed-point problem: δ satisfies g (δ) = s iff δ = r (δ, s ).
We show that the fixed point δ exists and is unique.
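
For the logit share function the element-by-element inverse has a closed form, which makes the fixed-point construction easy to try out. The sketch below is an illustration, not the paper's algorithm: the observed shares are made up, the function names are hypothetical, and the simple iteration δ ← r(δ, s) is just one way to locate the fixed point. Solving sj = gj for δj given δ−j yields rj = ln(sj / (1 − sj)) + ln(1 + ∑k≠j exp(δk)).

```python
import numpy as np

def logit_share(delta):
    """Predicted shares g(delta) of the inside goods; delta_0 = 0 for the outside good."""
    expd = np.exp(delta)
    return expd / (1.0 + expd.sum())

def r(delta, s):
    """Element-by-element inverse for logit: r_j solves s_j = g_j(delta_1, ..., r_j, ..., delta_J)."""
    expd = np.exp(delta)
    others = 1.0 + expd.sum() - expd          # 1 + sum over k != j of exp(delta_k)
    return np.log(s / (1.0 - s)) + np.log(others)

def solve_fixed_point(s, tol=1e-12, max_iter=10_000):
    """Iterate delta <- r(delta, s) until the largest update is smaller than tol."""
    delta = np.zeros_like(s)
    for _ in range(max_iter):
        new_delta = r(delta, s)
        if np.max(np.abs(new_delta - delta)) < tol:
            return new_delta
        delta = new_delta
    raise RuntimeError("fixed-point iteration did not converge")

s_obs = np.array([0.2, 0.3, 0.1])             # hypothetical observed inside shares (s_0 = 0.4)
delta_hat = solve_fixed_point(s_obs)
```

In this logit case the fixed point can be verified by hand: it coincides with the log-odds ln(sj) − ln(s0), and plugging `delta_hat` back into `logit_share` reproduces the observed shares.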

Berry (1994): inverting shares III

Key: show that r (δ, s ) is a continuous function that maps a compact
space into itself. Loosely speaking, we want to show that r (δ, s ) is
bounded.
Before we start, notice that to rationalize a given level of market
share $s_j = \frac{e^{\delta_j}}{1 + \sum_k e^{\delta_k}}$, if δk were to increase for k ≠ j, then
δj = rj (δ, sj ) would have to increase as well.
This is just to repeat that rj (δ−j , sj ) strictly increases in δk , k ≠ j.
How low can rj (δ−j , sj ) go?


We can push δk to −∞ for k ≠ j. Then $\delta_j^L = r_j(0, -\infty, -\infty, ..., s_j)$ is
as low as we can go if we were to rationalize sj .
Intuitively, rj (δ, sj ) is bounded below because $\lim_{\delta_j \to -\infty} g_j(\delta) = 0$.
Let $\underline{\delta} = \min_j \{ \delta_j^L \}$. This is the lower bound of r (δ, s ).


How do we show that rj (δ, sj ) is also bounded above?
Hint: $s_0 = \frac{1}{1 + \sum_k e^{\delta_k}}$.
What happens if we push δk higher and higher?


In other words, r (δ, s ) is also bounded above; otherwise the market
share of the outside product would be 0 (because δ0 ≡ 0).
Let δk = −∞ for all k ≠ j. Define $\delta_j^U$ as the value of δj that sets the
market share of the outside good equal to the observed share s0 . Let
$\bar{\delta} = \max_j \{ \delta_j^U \}$.
This is the upper bound of r (δ, s ).



Berry (1994): inverting shares IV

We show that r (δ, s ) is a continuous function that maps $[\underline{\delta}, \bar{\delta}]^J$ into
itself.
Brouwer’s fixed point theorem shows that a fixed point exists.
The dominant diagonal argument says the fixed point is unique if
$\sum_k \frac{\partial r_j}{\partial \delta_k} < 1, \ \forall j$.
What is implied by the condition $\sum_k \frac{\partial r_j}{\partial \delta_k} < 1, \ \forall j$?
Intuition: since δ0 is fixed at 0, when the mean utilities of all rival
products increase by some amount Δ, δj (or rj (δ, sj )) only needs to increase
by less than Δ to rationalize sj . Hence the ‘slope’ of rj (δ, sj ) is less than 1.

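To show this formally, first note that $\frac{\partial r_j}{\partial \delta_j} = 0$ because rj does not
depend on δj . In addition, $\frac{\partial r_j}{\partial \delta_0} = 0$ because we fix δ0 = 0.
Since

\[
\frac{\partial r_j}{\partial \delta_k} = -\frac{\partial g_j / \partial \delta_k}{\partial g_j / \partial \delta_j}, \qquad \frac{\partial r_j}{\partial \delta_j} = 0, \qquad \frac{\partial r_j}{\partial \delta_0} = 0,
\]

we have

\[
\sum_k \frac{\partial r_j}{\partial \delta_k} = \sum_{k \ne 0, j} \frac{\partial r_j}{\partial \delta_k} < 1
\iff -\sum_{k \ne 0, j} \frac{\partial g_j}{\partial \delta_k} < \frac{\partial g_j}{\partial \delta_j}.
\]

Given that $\frac{\partial g_j}{\partial \delta_k} < 0$ for all k ≠ j, rearranging gives the equivalent condition

\[
\frac{\partial g_j}{\partial \delta_1} + ... + \frac{\partial g_j}{\partial \delta_j} + ... + \frac{\partial g_j}{\partial \delta_J} > 0.
\]

This is true because $\frac{\partial g_j}{\partial \delta_0} + \frac{\partial g_j}{\partial \delta_1} + ... + \frac{\partial g_j}{\partial \delta_j} + ... + \frac{\partial g_j}{\partial \delta_J} = 0$ and $\frac{\partial g_j}{\partial \delta_0} < 0$.
Note: $\frac{\partial r_j}{\partial \delta_0} = 0$ but $\frac{\partial g_j}{\partial \delta_0} < 0$. This is because rj is defined such that it
does not depend on δ0 , while gj is defined such that it depends on δ0 ,
and δ0 is normalized to 0.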



Inverting shares: implementation

Berry (1994) shows that the inversion exists and is unique. In
practice, how do we solve for the mean utility δj = Xj β − αpj + ξj ?
Logit and nested logit models have explicit formulas for δj . We will
learn about a) inverting δj iteratively and b) imposing the share equations
as constraints (the MPEC approach) when we study BLP.
How do we solve for δj (and hence ξj ) in logit models?


In logit models,

ln(sj ) − ln(s0 ) = Xj β − αpj + ξ j

This is a simple IV regression, where we instrument prices.
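
The inversion followed by an IV regression can be sketched on simulated data. Everything below is an assumption made for illustration: the parameter values, the sample sizes, and in particular the ad hoc reduced-form price equation (price = constant + cost shifter + unobserved quality + noise), which stands in for an equilibrium pricing rule. The helper `two_stage_least_squares` is a hypothetical, hand-rolled 2SLS built on NumPy least squares.

```python
import numpy as np

def two_stage_least_squares(y, X, Z):
    """2SLS: project the regressors X on the instruments Z, then regress y on the fitted values."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

rng = np.random.default_rng(0)
T, J = 1000, 5                                    # hypothetical number of markets and products
beta0, beta_x, alpha = 1.0, 2.0, 1.0              # hypothetical true parameters

x = rng.normal(size=(T, J))                       # observed characteristic
z = rng.normal(size=(T, J))                       # cost shifter excluded from demand
xi = rng.normal(size=(T, J))                      # unobserved quality
p = 1.0 + 0.5 * z + 0.8 * xi + 0.1 * rng.normal(size=(T, J))   # price is correlated with xi

delta = beta0 + beta_x * x - alpha * p + xi
exp_delta = np.exp(delta)
denom = 1.0 + exp_delta.sum(axis=1, keepdims=True)
s = exp_delta / denom                             # logit shares of the inside goods
s0 = 1.0 / denom                                  # outside-good share, equal to 1 - sum_j s_j

y = (np.log(s) - np.log(s0)).ravel()              # inversion: ln(s_j) - ln(s_0) = delta_j
ones = np.ones(T * J)
X = np.column_stack([ones, x.ravel(), p.ravel()])
Z = np.column_stack([ones, x.ravel(), z.ravel()])

ols = np.linalg.lstsq(X, y, rcond=None)[0]
iv = two_stage_least_squares(y, X, Z)
alpha_ols, alpha_iv = -ols[2], -iv[2]             # the price coefficient enters with a minus sign
```

In this design the OLS estimate of α is pulled toward 0 because price and ξj are positively correlated, while the IV estimate recovers a value close to the true α.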


What IVs can we use?



Estimation and instruments

Possible IVs include:


cost variables excluded from demand side, like input prices that vary
across firms;
characteristics of other products: Xk , k 6= j, if we take characteristics
as exogenously given.
Why are characteristics of other products appropriate IVs?


Because firms’ first-order pricing condition suggests that pj depends
on other products’ attributes:

\[
\frac{p_j - c_j}{p_j} = -\frac{1}{\varepsilon_{jj}(X, P, \xi)}
\]

where εjj is the own-price elasticity of product j.
Intuitively, products facing lots of substitutes have lower prices.
In addition, other products’ attributes are excluded from the utility of
consuming product j.
As a result, in differentiated product models, demand-side
instruments can be any excluded variables that affect mark-ups.
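
One common way to turn "characteristics of other products" into an instrument is to sum a characteristic over the rival products in the same market. The sketch below shows that construction only as an example; the lecture does not prescribe a specific functional form, and the data and the helper name `rival_characteristic_sums` are hypothetical.

```python
import numpy as np

def rival_characteristic_sums(x, market_ids):
    """For each product, sum the characteristic x over the other products in the same market."""
    x = np.asarray(x, dtype=float)
    market_ids = np.asarray(market_ids)
    _, market_index = np.unique(market_ids, return_inverse=True)
    market_totals = np.bincount(market_index, weights=x)
    return market_totals[market_index] - x       # market total minus the product's own x

# Hypothetical data: market A has three products, market B has two.
x = [1.0, 2.0, 3.0, 10.0, 20.0]
market_ids = ["A", "A", "A", "B", "B"]
instruments = rival_characteristic_sums(x, market_ids)   # -> [5., 4., 3., 20., 10.]
```

The resulting column varies across products within a market and can be added to the instrument matrix alongside cost shifters.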



Monte-Carlo evidence

                             σξd = 1               σξd = 3
                          (1)       (2)         (3)       (4)
Parameter   True Value    OLS       IV          OLS       IV
βo          5             3.460     4.980       0.378     4.890
                          (0.158)   (0.226)     (0.415)   (0.738)
βx          2             1.410     1.990       0.325     1.950
                          (0.058)   (0.091)     (0.127)   (0.272)
α           1             0.726     0.995       0.181     0.979
                          (0.029)   (0.039)     (0.076)   (0.128)

Table: Berry, Table 1: Monte Carlo Parameter Estimates, 100 Random Samples
of 500 Duopoly Markets, Logit Utility. Notes: The values given in the table are
empirical means and (standard errors). The utility function is
$u_{ij} = \beta_o + \beta_x x_j + \sigma_{\xi d}\, \xi_j - \alpha p_j + \varepsilon_{ij}$. Marginal cost is
$c_j = e^{\gamma_o + \gamma_x x_j + \sigma_{\xi c}\, \xi_j + \gamma_w w_j + \sigma_\omega \omega_j}$.



Results

In column one, 80% of the variance in the mean utility is accounted
for by the observed attributes, and the OLS estimates of all parameters
are biased toward 0.
In the second experiment, 70% of the variance in the mean utility is
accounted for by the unobserved attributes, and the OLS estimates are
severely biased toward 0.
As expected, IV estimates have slightly larger standard errors.
