Lec2 Demand Berry1994 Sp15

Advanced Topics in Industrial Organization
Lecture 2: Discrete-choice models
Panle Jia Barwick
Cornell
Spring 2015
Panle Jia Barwick (Cornell) Discrete-Choice Berry94 Spring 2015 1 / 50

Introduction: random utility and discrete choices
In AIDS models, demand is continuous: consumers allocate a share of
their expenditure (budget) to each product.
We now introduce the second type of demand models – discrete
choice models:
there are many products, with each product being a unique
combination of different characteristics;
consumers have well-defined and idiosyncratic preferences;
each consumer chooses the single option that yields the highest utility.
These models are often called “random utility” models, because
utility is described as a random variable reflecting unobserved taste
differences.

Road map
A quick road-map of today’s lecture:

we will start with the (in)famous logit demand;
discuss its limitations;
present the extension of logit models: nested logit;
discuss Berry (1994) which inverts shares to solve the endogeneity
problem in a nonlinear demand system.

Basic set-up of a random utility model
Let i index consumers, j index products. Let consumer i’s indirect
utility of consuming product j be denoted as:
Uij = Vij + εij , j = 0, ..., J
where Vij is the utility explained by observed attributes (which include
both observed consumer characteristics and observed product
attributes) and εij is the unobserved random utility.
Assume εi = {εi0 , ..., εiJ } is distributed according to a density function
f (εi ).
Consumer i chooses product j iff:
Uij ≥ Uik , ∀k ≠ j

Basic set-up of a random utility model II
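The probability of consumer i choosing product j is:
$$
\begin{aligned}
P_{ij} &= \Pr(\varepsilon_{ik} - \varepsilon_{ij} \le V_{ij} - V_{ik},\ \forall k \ne j) \\
&= \int I(\varepsilon_{ik} - \varepsilon_{ij} \le V_{ij} - V_{ik},\ \forall k \ne j)\, f(\varepsilon_i)\, d\varepsilon_i
\end{aligned}
$$
What is the standard normalization that we impose on this kind of
model?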

As in all discrete choice models, only differences in utility matter:
if every Vij is shifted up or down by a constant, the choice does not
change.
Typically we normalize the observed utility of one choice to 0
(Vi0 = 0).
Scale does not matter either. Dividing both sides of the inequality by a
positive constant does not change the inequality.
The standard solution is to normalize the variance of the error terms.
Note that the choice probability is a J-dimensional integral in the
most unrestricted form, which indicates the level of computational
complexity.
Logit models
Suppose εij is iid across j, distributed according to the type one
extreme value distribution $F(\varepsilon) = e^{-e^{-\varepsilon}}$.
The distribution is sometimes called “double exponential” or the
“Gumbel distribution” (and, somewhat loosely, the “Weibull
distribution”).
When all consumers are identical except for the error term, Vij
becomes Vj , and the choice probabilities are the same across
consumers, which are equal to market shares.

Logit models: deriving shares
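Let sj denote the market share of product j (omitting the subscript i):
$$
\begin{aligned}
s_j &= \int I(\varepsilon_k - \varepsilon_j \le V_j - V_k,\ \forall k \ne j)\, f(\varepsilon)\, d\varepsilon \\
&= \int \prod_{k \ne j} e^{-e^{-(V_j - V_k + \varepsilon_j)}} \cdot e^{-\varepsilon_j}\, e^{-e^{-\varepsilon_j}}\, d\varepsilon_j \\
&= \int \exp\Big(-\sum_{k \ne j} \frac{e^{V_k}}{e^{V_j}}\, e^{-\varepsilon_j}\Big) \cdot e^{-\varepsilon_j}\, e^{-e^{-\varepsilon_j}}\, d\varepsilon_j \\
&= \int e^{-\varepsilon_j} \exp\Big(-\Big(\sum_{k \ne j} \frac{e^{V_k}}{e^{V_j}} + 1\Big) e^{-\varepsilon_j}\Big)\, d\varepsilon_j \\
&= \int \exp\Big(-\varepsilon_j - e^{-\varepsilon_j} \sum_k \frac{e^{V_k}}{e^{V_j}}\Big)\, d\varepsilon_j
\end{aligned}
$$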

Logit models: deriving shares II
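Let $e^{\lambda_j} = \sum_k e^{V_k} / e^{V_j}$. Using the change of variables (replacing
$d\varepsilon_j$ with $d(\varepsilon_j - \lambda_j)$), we have:
$$
\begin{aligned}
s_j &= \int \exp(-\varepsilon_j - e^{-\varepsilon_j} e^{\lambda_j})\, d\varepsilon_j \\
&= \int e^{-\lambda_j}\, e^{-(\varepsilon_j - \lambda_j)}\, e^{-e^{-(\varepsilon_j - \lambda_j)}}\, d(\varepsilon_j - \lambda_j) \\
&= e^{-\lambda_j} = \frac{e^{V_j}}{\sum_k e^{V_k}} \qquad (1)
\end{aligned}
$$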
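
As a quick numerical check of share equation (1), the following sketch compares the closed-form logit shares with simulated choice frequencies when each option receives an independent type one extreme value draw. The mean utilities in `V`, the number of draws, and the function names are illustrative assumptions, not part of the lecture.

```python
import math
import random

def logit_shares(V):
    """Closed-form logit shares s_j = exp(V_j) / sum_k exp(V_k)."""
    m = max(V)                                  # subtract the max for numerical stability
    expV = [math.exp(v - m) for v in V]
    total = sum(expV)
    return [e / total for e in expV]

def simulated_shares(V, n_draws=20000, seed=0):
    """Frequency with which each option attains the highest V_j + eps_j."""
    rng = random.Random(seed)
    counts = [0] * len(V)
    for _ in range(n_draws):
        utils = []
        for v in V:
            u = rng.random()
            while u == 0.0:                     # avoid log(0)
                u = rng.random()
            # Type one extreme value draw: eps = -log(-log(u)), u ~ Uniform(0, 1)
            utils.append(v - math.log(-math.log(u)))
        counts[utils.index(max(utils))] += 1
    return [c / n_draws for c in counts]

# Hypothetical mean utilities; option 0 plays the role of the normalized choice.
V = [0.0, 1.0, 0.5]
print(logit_shares(V))
print(simulated_shares(V))
```

The two printed vectors should agree up to simulation error, which shrinks at rate 1/sqrt(n_draws).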

Logit models: linear utility
Typically, the indirect utility function is linear in attributes:
Uij = Xj β − αpj + εij , j = 1, ..., J
where ‘Xj ’ is the vector of product attributes, and ‘pj ’ is product j’s
price.
This kind of indirect utility can be derived from a quasi-linear utility
function, where consumers assign a fixed budget to the consumption of
products j = 0, ..., J, so that demand is free of income effects.
For many products, the assumption of no income effects seems
reasonable.
For large household items (cars, houses, durable goods), this is less
ideal. Later we will study papers that incorporate income effects.

Logit models: price coefficient and welfare
What does the price coefficient ‘α’ measure?
How do we measure consumers’ willingness to pay for one additional
unit of some product attribute?
What is the consumer’s expected surplus with J choices?
How does the introduction of a new product affect consumer surplus?
Is this feature unique to logit models?

Logit models: price coefficient and welfare
The price coefficient ‘α’ is the marginal utility of a dollar, and the
ratio β/α measures consumers’ willingness to pay for one additional unit
of some product attribute (for example, 1GB of computer RAM).
The expected consumer surplus with J choices is:
$$E(CS) = E\Big[\max_{j = 0, \ldots, J} U_{ij}\Big] = \ln\Big(\sum_{j=0}^{J} e^{V_j}\Big) + \text{Euler's constant}$$
The introduction of a new product k, regardless of how similar the
product is to existing products, increases consumer surplus from
$\ln(\sum_{j=0}^{J} e^{V_j})$ to $\ln(\sum_{j=0}^{J} e^{V_j} + e^{V_k})$.
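
The log-sum formula can be evaluated directly. The sketch below computes the expected maximum utility and the gain from adding one product; dividing by α to express the gain in dollars is an assumption that follows from reading α as the marginal utility of a dollar, and the function names and numbers are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def expected_max_utility(V):
    """E[max_j (V_j + eps_j)] for iid type one extreme value eps, in utility units."""
    m = max(V)                                   # stabilize the log-sum-exp
    return m + math.log(sum(math.exp(v - m) for v in V)) + EULER_GAMMA

def surplus_gain_in_dollars(V_old, V_new_product, alpha):
    """Change in expected surplus from adding one product, divided by alpha."""
    before = expected_max_utility(V_old)
    after = expected_max_utility(V_old + [V_new_product])
    return (after - before) / alpha

# Two existing options with V = 0; the entrant is an exact copy of them.
print(surplus_gain_in_dollars([0.0, 0.0], 0.0, alpha=2.0))    # ln(3/2) / 2
# Even a very unattractive entrant raises expected surplus in this model.
print(surplus_gain_in_dollars([0.0, 0.0], -5.0, alpha=2.0))
```

The Euler constant cancels in any surplus comparison, so only the change in the log-sum matters.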

Welfare with N+1 products
Note: this feature is not unique to logit models.

It holds for all models with independent taste errors.
In other words, the unobserved product space is expanded each time a
new product is introduced.
There are several ways to deal with this:
One is to modify the distribution assumption (for example, make the
error terms correlated or make the support of the error terms bounded).
Another is to get rid of the error term (εj : taste for products) and only
use characteristics (Berry and Pakes 2007).

Logit models: estimation
When all characteristics are exogenous, this is a standard multinomial
logit model.
The parameters can be estimated using MLE. Stata does it easily.
How does the IO literature depart from the rest of the literature on
discrete choices?
What is some related empirical evidence?

Logit models: estimation
If all characteristics (including prices) are exogenous, then this is a
standard multinomial logit model. The parameters can be estimated
using MLE. Stata does it easily.
Often it is problematic to assume prices are exogenous. For example,
we do not observe all relevant product characteristics (omitted
variable bias).
To make things explicit:
Uij = Xj β − αpj + ξj + εij , j = 1, ..., J
where ξj captures unobserved product attributes (product quality,
brand image, customer service, advertising, etc.) that consumers value
but researchers do not observe.

Logit models: estimation II
Price is correlated with ξj through the first-order condition: the
mark-up (pj − cj ) is inversely related to the demand elasticity, and
the demand elasticity depends on ξj .
We will discuss how to address the endogeneity problem (Berry
(1994)) after presenting extensions to logit models.

Drawbacks of logit models: IIA
Logit models are easy to estimate, and the endogeneity problem can
be easily addressed (see Berry 1994). However, they have a couple of
undesirable features.
What are the undesirable features?

IIA property: independence from irrelevant alternatives. From the
share equation (1), we know that:
$$\frac{s_j}{s_k} = \frac{e^{V_j}}{e^{V_k}}, \quad \forall j, k$$
The odds of choosing j over k do not depend on other alternatives.
Conditional on Vj and Vk , the odds of choosing j over k do not depend
on any further attributes of products j and k: only the difference
Vj − Vk matters.
The famous red-bus and blue-bus example.
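
A minimal sketch of the red-bus and blue-bus example, assuming (hypothetically) that all options have the same mean utility: adding a red bus that is a perfect copy of the blue bus lowers the car share from 1/2 to 1/3, while the odds of car versus blue bus stay fixed.

```python
import math

def logit_shares(V):
    """Logit shares for a dict mapping option name -> mean utility."""
    total = sum(math.exp(v) for v in V.values())
    return {name: math.exp(v) / total for name, v in V.items()}

# Hypothetical example: every option has mean utility 0.
before = logit_shares({"car": 0.0, "blue bus": 0.0})
after = logit_shares({"car": 0.0, "blue bus": 0.0, "red bus": 0.0})

# Car share before and after the red bus is added.
print(before["car"], after["car"])
# Odds of car versus blue bus before and after.
print(before["car"] / before["blue bus"], after["car"] / after["blue bus"])
```

Intuitively the car share should stay near 1/2, with the two buses splitting the other half; the logit model cannot deliver that because of IIA.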

Proportional substitution patterns
Proportional substitution patterns:
$$
\frac{\partial s_j}{\partial p_k} =
\begin{cases}
\dfrac{\partial V_j}{\partial p_j}\, s_j (1 - s_j), & j = k \\[1ex]
-\dfrac{\partial V_k}{\partial p_k}\, s_j s_k, & j \ne k
\end{cases}
$$
When the price of product k increases, some consumers who used to
buy k will switch to other products. The fraction of consumers
switching to product j is proportional to j’s existing market share.
The fact that consumers switch to the most popular products is not
unique to logit models.
This is driven by a) homogeneous observed utility Vj , and b) error
term εij that is iid across both i and j.
Since consumers only differ in the iid random taste shocks εij , the
market share of each product is determined by the order statistics.

Proportional substitution patterns II
Consider the case of cars. Suppose the price of BMW increases. For
some consumers who used to buy BMW, their utility from this
product decreases enough such that they switch to their second best
choice.
In models with iid εij and Vj that is identical across i, the proportion
of consumers who rank each brand as their second choice is simply
determined by the order statistics.
These models would predict that when BMW becomes more
expensive, a high fraction of its consumers would switch to either
the Honda Accord or the Toyota Camry, the most popular products.
This switching pattern is unlikely to hold in reality, because a typical
BMW consumer values engine power (not fuel efficiency).
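
Proportional substitution can be made concrete with diversion ratios. In the logit model the fraction of consumers leaving product k who move to option j is s_j / (1 − s_k), which follows from the logit share derivatives. The sketch below uses made-up shares; the product list (including the Audi entry) is purely hypothetical.

```python
def logit_diversion(shares, k):
    """Fraction of consumers leaving product k that goes to each other option.

    `shares` includes the outside option; logit implies s_j / (1 - s_k).
    """
    return {j: s / (1.0 - shares[k]) for j, s in shares.items() if j != k}

# Hypothetical market shares (they sum to one, outside option included).
shares = {"outside": 0.50, "Accord": 0.25, "Camry": 0.20, "BMW": 0.04, "Audi": 0.01}

diversion = logit_diversion(shares, "BMW")
for name, d in diversion.items():
    print(f"{name}: {d:.3f}")
```

With these numbers most former BMW buyers are predicted to move to the outside option, the Accord, or the Camry, and very few to the other luxury car, simply because the latter has a small share.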

Unappealing elasticity and mark-up predictions
In logit models, the price elasticities are:
$$
\eta_{jk} = \frac{\partial s_j}{\partial p_k} \cdot \frac{p_k}{s_j} =
\begin{cases}
-\alpha p_j (1 - s_j), & j = k \\
\alpha p_k s_k, & j \ne k
\end{cases}
$$
Typically 1 − sj ≈ 1. Low price products have:
low elasticities (in absolute value)
high Lerner’s ratio: $\frac{p_j - c_j}{p_j} = \frac{1}{|\eta_{jj}|}$
marginal cost as a percentage of price is lower
Two products with the same mean utility $V_j = V_{j'}$, $j \ne j'$, will have:
the same market shares
the same cross-price elasticities with respect to any other product
k ≠ j, k ≠ j′.
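
The elasticity formulas are easy to tabulate. The sketch below builds the full logit elasticity matrix and the implied Lerner ratios for two hypothetical products with equal shares but different prices; the prices, shares, and value of α are illustrative assumptions.

```python
def logit_elasticities(prices, shares, alpha):
    """eta[j][k] = (ds_j / dp_k) * (p_k / s_j) for the logit model."""
    J = len(prices)
    eta = [[0.0] * J for _ in range(J)]
    for j in range(J):
        for k in range(J):
            if j == k:
                eta[j][k] = -alpha * prices[j] * (1.0 - shares[j])   # own-price
            else:
                eta[j][k] = alpha * prices[k] * shares[k]            # cross-price
    return eta

# Hypothetical inside goods: same share, one priced at 10 and one at 20.
prices, shares, alpha = [10.0, 20.0], [0.1, 0.1], 0.2
eta = logit_elasticities(prices, shares, alpha)
lerner = [1.0 / abs(eta[j][j]) for j in range(len(prices))]
print(eta)
print(lerner)   # the cheaper product gets the higher implied mark-up ratio
```

The own-price elasticity scales almost one-for-one with price, which is what drives the mark-up prediction in the list above.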

Extension of logit models: nested logit and generalized
extreme value models
There are two ways to generalize the logit model:

allow Vj to depend on household attributes: Vij (next class)
generalize the distribution of ε (GEV models).
In this class, we will discuss the GEV models, where the unobserved
utilities for all alternatives {εi · } have a joint generalized extreme
value distribution.
The most popular candidate is the nested logit, where products are
clustered into different nests. Alternatives within a nest are
correlated, while alternatives in different nests are independent.
When prices change, consumers substitute more towards products
within a nest, less to products in other nests.

Nested logit: distribution assumption
Formally, ε = {ε1 , ..., εJ } is distributed according to:
$$F(\varepsilon) = \exp\Big(-\sum_{k=1}^{K} \Big(\sum_{j \in B_k} \exp\big(-\tfrac{\varepsilon_j}{\lambda_k}\big)\Big)^{\lambda_k}\Big)$$
where Bk is the set of products included in nest k, and λk measures the
independence of εj within nest k. The higher λk is, the more
independent the εj are.
If λk = 1 for all nests, then we are back to the logit model, where all
εj ’s are independent:
$$F(\varepsilon) = \exp\Big(-\sum_{j} e^{-\varepsilon_j}\Big), \quad j = 1, \ldots, J$$
When λk is close to 0, then products in nest k are highly
substitutable.
λk < 0 is not consistent with utility maximization; if λk > 1, then
only certain parameter values are consistent with utility maximization.
We assume λk ∈ [0, 1] ∀k.
Nested logit: deriving shares
The choice probabilities can be decomposed into two components:
the probability of choosing product j conditioning on nest k being
chosen, and the probability of choosing nest k.
For simplicity, let Wk denote the utility common to the nest k, and
Vj be the deviation of product j from the nest average:
Uij = Wk + Vj + εij , j ∈ Bk
Then:
$$s_j = s_{j|B_k} \cdot s_{B_k}$$
$$s_{B_k} = \frac{e^{W_k + \lambda_k I_k}}{\sum_l e^{W_l + \lambda_l I_l}}, \qquad I_k = \ln \sum_{j \in B_k} e^{V_j/\lambda_k}$$
$$s_{j|B_k} = \frac{e^{V_j/\lambda_k}}{\sum_{i \in B_k} e^{V_i/\lambda_k}}$$
where Ik is the inclusive value of nest Bk , and λk Ik is the expected
utility that consumers receive from the choices in nest Bk .
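
The two-step decomposition translates directly into code. The sketch below computes nested logit shares from nest-level utilities W, within-nest deviations V, and nesting parameters λ; the data layout (one list of products per nest) and the numbers are assumptions made for illustration.

```python
import math

def nested_logit_shares(W, V, lam):
    """Nested logit shares.

    W[k]   : utility common to nest k
    V[k]   : list of product-level deviations V_j for products in nest k
    lam[k] : nesting parameter of nest k, assumed to lie in (0, 1]
    Returns shares[k][j] = s_{j|B_k} * s_{B_k}.
    """
    K = len(W)
    # Inclusive values I_k = ln sum_{j in B_k} exp(V_j / lam_k)
    I = [math.log(sum(math.exp(v / lam[k]) for v in V[k])) for k in range(K)]
    denom = sum(math.exp(W[k] + lam[k] * I[k]) for k in range(K))
    shares = []
    for k in range(K):
        s_nest = math.exp(W[k] + lam[k] * I[k]) / denom            # s_{B_k}
        within_total = sum(math.exp(v / lam[k]) for v in V[k])
        shares.append([s_nest * math.exp(v / lam[k]) / within_total for v in V[k]])
    return shares

# Hypothetical example: two nests, the first with two products.
W = [0.0, 0.5]
V = [[0.0, 1.0], [0.2]]
print(nested_logit_shares(W, V, lam=[0.5, 1.0]))
print(nested_logit_shares(W, V, lam=[1.0, 1.0]))   # collapses to plain logit
```

Setting every λk to 1 reproduces the logit shares with utilities Wk + Vj, which is a convenient sanity check.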
Nested logit & multi-stage budgeting
Nested logit is often associated with multi-stage budgeting, where
consumers first choose a nest, then narrow their choice down to a
particular product within the nest.
For example, consumers can first choose which city to live in, and
then choose a house in that city.
Like logit models, nested logit models also have analytic solutions,
which is convenient.
Nesting only partially alleviates the problem that consumers
switch to popular products: they switch to popular products within
the same nest.
The way nests are constructed also matters: the substitution pattern
will be different for different nest structures, even if the final branches
are the same.

Generalized nested logit
The most general solution is the generalized nested logit, where the
same product can belong to multiple nests.
For example, suppose consumers can choose between driving, car
pool, bus, or train:
driving and car pool are similar choices, while bus or train are similar
choices;
like bus and train, car pool is also inflexible to some extent;
as a result, car pool can belong to both nests.

Generalized Extreme Value II
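Let αjk denote the portion of εj that belongs to nest k, and let
∑k αjk = 1. The probability of choosing product j can be
decomposed into:
$$s_j = \sum_k s_{j|k} \cdot s_k$$
$$s_k = \frac{\big\{\sum_{j \in k} [\alpha_{jk} \exp(V_j)]^{1/\lambda_k}\big\}^{\lambda_k}}{\sum_{l=1}^{K} \big\{\sum_{j \in l} [\alpha_{jl} \exp(V_j)]^{1/\lambda_l}\big\}^{\lambda_l}}$$
$$s_{j|k} = \frac{(\alpha_{jk} \exp(V_j))^{1/\lambda_k}}{\sum_{i \in k} (\alpha_{ik} \exp(V_i))^{1/\lambda_k}}$$
where the utility model is: Uij = Vj + εij .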

In nested logit, product j belongs to one nest; here it can belong to
multiple nests, and we sum over all nests that product j belongs to.
For more details, see Kenneth Train (2003): “Discrete Choice
Methods with Simulation.”
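
A minimal sketch of the generalized nested logit shares, using the commuting example with two overlapping nests. The allocation weights, mean utilities, and nesting parameters are hypothetical, and every nest is assumed to contain at least one product.

```python
import math

def gnl_shares(V, alloc, lam):
    """Generalized nested logit shares.

    V[j]        : mean utility of product j
    alloc[j][k] : allocation alpha_jk of product j to nest k (each row sums to 1)
    lam[k]      : nesting parameter of nest k
    """
    J, K = len(V), len(lam)
    # term[j][k] = (alpha_jk * exp(V_j)) ** (1 / lam_k); zero if j is not in nest k
    term = [[(alloc[j][k] * math.exp(V[j])) ** (1.0 / lam[k]) for k in range(K)]
            for j in range(J)]
    nest_sum = [sum(term[j][k] for j in range(J)) for k in range(K)]
    denom = sum(nest_sum[k] ** lam[k] for k in range(K))
    shares = []
    for j in range(J):
        s_j = 0.0
        for k in range(K):
            if term[j][k] > 0.0:
                s_nest = nest_sum[k] ** lam[k] / denom              # s_k
                s_j += (term[j][k] / nest_sum[k]) * s_nest          # s_{j|k} * s_k
        shares.append(s_j)
    return shares

# Products: driving, car pool, bus, train. Nests: "car" and "inflexible schedule".
V = [0.0, 0.3, -0.2, 0.1]
alloc = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 1.0]]   # car pool sits in both nests
print(gnl_shares(V, alloc, lam=[0.5, 0.5]))
print(gnl_shares(V, alloc, lam=[1.0, 1.0]))                 # equals plain logit
```

When every λk equals 1 the allocation weights drop out and the shares reduce to the logit formula, whatever the values of αjk.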

Generalized Extreme Value III
In principle one could consider estimating an unrestricted
variance-covariance matrix to allow for fully flexible substitution
patterns between any two products.
However, this reintroduces the dimensionality problem of estimating
J × J parameters when J is large.
In the next class, we introduce the second approach to relax the
restrictive substitution patterns imposed by logit: random coefficient
models.

Berry (1994): a solution to endogenous prices
As we argued above, prices are endogenous:
prices are correlated with unobserved attributes that affect demand,
because firms set prices to maximize profits.
In nonlinear models (the discrete choice model is one of them),
dealing with endogeneity is difficult when we do not want to impose
any restriction on the correlation between prices and the unobserved
attributes.
The contribution of this paper is to invert unobserved product
attributes from observed market shares.
As we see in the production literature, another popular approach to
deal with unobserved attributes is the control function approach.

Berry (1994): model primitives
Firms produce differentiated products: j = 1, ..., J

Consumers have well-defined (random) preference: Uij
Equilibrium concept: consumers maximize utility, and firms compete
in prices or quantity.
Data requirement: a large number of markets and product shares in
each market.
The concept of “market” is fairly general. Different cities, different
time periods, or different consumer groups are all valid candidates of
“markets”.

Berry (1994): data
Shares: we need physical shares (sj ), not revenue shares
($P_j Q_j / \sum_k P_k Q_k$).
Sometimes this requires converting observed sales to physical shares.
Prices: what if we need to aggregate over different styles, packages,
SKUs, etc.?
Use share-weighted prices. Constant shares (e.g. sample average
shares) are preferred.

Berry (1994): data II
Outside option: we rarely observe the fraction of consumers
choosing the outside option. (How many people choose not to
purchase an iPhone, or not to watch Avatar?)
Market size: the standard solution is to assume that the market size
is a parameterized function of observed variables, like population,
with the parameters to be determined.
For example, in most studies on cars (or household durables), the
market size is assumed to be the number of households in US.
In airline studies, the market size is often assumed to be the product
of the population of end-cities (partly motivated by gravity models).
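
Once a market size M is assumed, physical shares and the outside share follow mechanically. The sketch below uses made-up unit sales and takes the number of households as M; both the numbers and the function name are hypothetical.

```python
def market_shares(quantities, market_size):
    """Physical shares s_j = q_j / M and outside share s_0 = 1 - sum_j s_j."""
    shares = {name: q / market_size for name, q in quantities.items()}
    s0 = 1.0 - sum(shares.values())
    if s0 <= 0.0:
        raise ValueError("market size is too small: outside share is not positive")
    return shares, s0

# Hypothetical unit sales of two car models in a market of 100 million households.
quantities = {"model A": 2_000_000, "model B": 1_000_000}
shares, s0 = market_shares(quantities, market_size=100_000_000)
print(shares, s0)
```

Because the outside share is whatever is left over, the estimated mean utilities depend on the assumed M, so it is worth checking how sensitive results are to that choice.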

Berry (1994): demand
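Given utility function Uij = Xj β − αpj + ξj + εij , the market share of
product j is:
$$
\begin{aligned}
s_j &= \int I[\varepsilon_{ij} \ge \varepsilon_{ik} + X_k\beta - \alpha p_k + \xi_k - (X_j\beta - \alpha p_j + \xi_j),\ \forall k \ne j]\, dF(\varepsilon) \\
&= s_j(X, P, \xi)
\end{aligned}
$$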
What are the standard approaches in estimating parameters {α, β}?

We cannot estimate the likelihood of the sample, because we do not
want to impose a distributional assumption on the unobserved quality
ξj .
We cannot instrument for pj directly, because ξj enters the share
equation nonlinearly.
It seems that ξj is the troublemaker.
Why do we need ξj in our model?

Berry (1994): demand and xi
Suppose we are interested in studying the car demand. For now, let
us simply ignore ξ j .
There are 100 million households in US.
If each household’s purchase probability is described by sj (X , P ), with
such a big “sample size”, the model predicted aggregate share should
differ from observed market shares by approximately
$1/\sqrt{100{,}000{,}000} = 10^{-4}$.
There is no demand model (with pure consumer taste shocks) that is
up to the test.
As a result, we need unobserved quality ξ j to help us fit data, even if
we are bold enough to claim prices are exogenous!

Berry (1994): inverting shares
The contribution of this paper is to show that under mild
distributional assumptions, there is a one-to-one mapping between the
observed market share vector {sj } and the vector of mean utility
{δj = Xj β − αpj + ξj }.
The existence of a one-to-one mapping justifies inverting {ξ j } from
{sj }.
The paper’s proof used to be important since most extensions of BLP
had to show that there was a one-to-one mapping between ξ and s.
Berry, Gandhi, and Haile (2012) proved the existence of such a
mapping for a very general set up.
In the following, we use g (·) to denote the share function, to
distinguish it from the observed shares s.

Berry (1994): inverting shares
Theorem
If the density function F (ε) is positive and continuous for all ε, and we
normalize mean utility for the outside option to 0, then the inverse
function $g^{-1}$ exists: $\delta = g^{-1}(s)$ (and hence $\xi_j = \delta_j - X_j\beta + \alpha p_j$).
In other words, there is one and only one mean utility vector that can
rationalize the observed market shares.
Proof sketch: Since F (ε) is positive and continuous, it is easy to show
that the share function is smooth, increasing in own mean utility and
decreasing in other products’ mean utility:
$\partial g_j/\partial\delta_j > 0$, $\partial g_j/\partial\delta_k < 0$, ∀k ≠ j.
Note that $\lim_{\delta_j \to -\infty} g_j(\delta) = 0$ and $\lim_{\delta_j \to \infty} g_j(\delta) = 1$.

Berry (1994): inverting shares II
We first define the element-by-element inverse of the share
equation: rj (δ, sj ), which is the value that makes the predicted share
gj equal to the observed share sj , conditional on δ−j :
$$s_j = g_j(\delta_1, \ldots, r_j(\delta, s_j), \ldots, \delta_J)$$
Since gj (·) is monotone and smooth, the inverse function rj (δ−j , sj )
exists and strictly increases in δk , k ≠ j. Combining the
element-by-element inverse functions into a vector:
$$
\begin{cases}
\delta_1 = r_1(\delta_{-1}, s_1) \\
\quad\vdots \\
\delta_j = r_j(\delta_{-j}, s_j) \\
\quad\vdots \\
\delta_J = r_J(\delta_{-J}, s_J)
\end{cases}
$$
The construction of r (δ, s ) transforms the inversion problem into a
fixed-point problem: δ satisfies g (δ) = s iff δ = r (δ, s ).
We show that the fixed point δ exists and is unique.
Berry (1994): inverting shares III
Key: show that r (δ, s ) is a continuous function that maps a compact
space into itself. Loosely speaking, we want to show that r (δ, s ) is
bounded.
Before we start, notice that to rationalize a given level of market
share $s_j = \frac{e^{\delta_j}}{1 + \sum_k e^{\delta_k}}$, if δk were to increase for k ≠ j, then
δj = rj (δ, sj ) would have to increase as well.
This is just to repeat that rj (δ−j , sj ) strictly increases in δk , k ≠ j.
How low can rj (δ−j , sj ) go?

We can push δk to −∞ for k ≠ j. Then $\delta_j^L = r_j(0, -\infty, -\infty, \ldots, s_j)$ is
as low as we can go if we were to rationalize sj .
Intuitively, rj (δ, sj ) is bounded below because $\lim_{\delta_j \to -\infty} g_j(\delta) = 0$.
Let $\underline{\delta} = \min_j \{\delta_j^L\}$. This is the lower bound of r (δ, s ).

How do we show that rj (δ, sj ) is also bounded above?

Hint: $s_0 = \frac{1}{1 + \sum_k e^{\delta_k}}$.
What happens if we push δk higher and higher?

In other words, r (δ, s ) is also bounded above; otherwise the market
share of the outside product would be 0 (because δ0 ≡ 0).
Let δk = −∞ for all k ≠ j. Define $\delta_j^U$ as the value of δj that sets the
market share of the outside good equal to the observed share s0 . Let
$\bar{\delta} = \max_j \{\delta_j^U\}$.
This is the upper bound of r (δ, s ).

Berry (1994): inverting shares IV
We show that r (δ, s ) is a continuous function that maps $[\underline{\delta}, \bar{\delta}]^J$ into
itself.
Brouwer’s fixed point theorem shows that a fixed point exists.
The dominant diagonal argument says the fixed point is unique if
$\sum_k \partial r_j/\partial\delta_k < 1$, ∀j.
What is implied by the condition $\sum_k \partial r_j/\partial\delta_k < 1$, ∀j?
Intuition: since δ0 is fixed at 0, when the mean utilities of all rival
products increase by some amount Δ, δj (that is, rj (δ, sj )) only needs
to increase by less than Δ to rationalize sj . Hence the ‘slope’ of
rj (δ, sj ) is less than 1.

Berry (1994): inverting shares IV
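To show this formally, first note that $\partial r_j/\partial\delta_j = 0$ because rj does not
depend on δj . In addition, $\partial r_j/\partial\delta_0 = 0$ because we fix δ0 = 0.
By the implicit function theorem, for k ≠ 0, j:
$$\frac{\partial r_j}{\partial\delta_k} = -\frac{\partial g_j/\partial\delta_k}{\partial g_j/\partial\delta_j}$$
Hence:
$$\sum_k \frac{\partial r_j}{\partial\delta_k} = \sum_{k \ne 0, j} \frac{\partial r_j}{\partial\delta_k} < 1 \iff \sum_{k \ne 0, j} \Big|\frac{\partial g_j}{\partial\delta_k}\Big| < \frac{\partial g_j}{\partial\delta_j}$$
Given that $\partial g_j/\partial\delta_k < 0$ for all k ≠ j, this is equivalent to:
$$\frac{\partial g_j}{\partial\delta_1} + \ldots + \frac{\partial g_j}{\partial\delta_j} + \ldots + \frac{\partial g_j}{\partial\delta_J} > 0$$
This is true because $\frac{\partial g_j}{\partial\delta_0} + \frac{\partial g_j}{\partial\delta_1} + \ldots + \frac{\partial g_j}{\partial\delta_j} + \ldots + \frac{\partial g_j}{\partial\delta_J} = 0$ and $\frac{\partial g_j}{\partial\delta_0} < 0$.
Note: $\partial r_j/\partial\delta_0 = 0$ but $\partial g_j/\partial\delta_0 < 0$. This is because rj is defined such that it
does not depend on δ0 , while gj does depend on δ0 , which is
normalized to 0.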
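
To see the fixed-point construction at work, the sketch below specializes it to the logit share function, for which the element-by-element inverse has the closed form r_j = ln(s_j) − ln(1 − s_j) + ln(1 + Σ_{k≠j} e^{δ_k}). This closed form, the simple iteration δ ← r(δ, s), and the example shares are illustrative assumptions; the lecture's argument covers more general share functions.

```python
import math

def r(delta, s):
    """Element-by-element inverse of the logit share function (delta_0 = 0)."""
    out = []
    for j, sj in enumerate(s):
        others = 1.0 + sum(math.exp(d) for k, d in enumerate(delta) if k != j)
        out.append(math.log(sj) - math.log(1.0 - sj) + math.log(others))
    return out

def invert_shares(s, tol=1e-12, max_iter=1000):
    """Iterate delta <- r(delta, s) until the update is smaller than tol."""
    delta = [0.0] * len(s)
    for _ in range(max_iter):
        new = r(delta, s)
        if max(abs(a - b) for a, b in zip(new, delta)) < tol:
            return new
        delta = new
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical observed inside shares; the outside share is the remainder.
s = [0.2, 0.3, 0.1]
s0 = 1.0 - sum(s)
delta = invert_shares(s)
print(delta)
print([math.log(sj) - math.log(s0) for sj in s])   # logit closed form for comparison
```

The iteration converges because each row of the Jacobian of r sums to less than one, and its limit matches the logit closed form ln(s_j) − ln(s_0).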

Inverting shares: implementation
Berry (1994) shows that the inversion exists and is unique. In
practice, how do we solve for the mean utility δj = Xj β − αpj + ξj ?
Logit and nested logit models have explicit formulas for δj . We will
learn about a) inverting δj iteratively and b) imposing the share
equations as constraints (the MPEC approach) when we study BLP.
How do we solve for δj (and hence ξ j ) in logit models?

Inverting shares: implementation
In logit models,
ln(sj ) − ln(s0 ) = Xj β − αpj + ξ j
This is a simple IV regression, where we instrument prices.

What IVs can we use?

Estimation and instruments
Possible IVs include:

cost variables excluded from demand side, like input prices that vary
across firms;
characteristics of other products: Xk , k ≠ j, if we take characteristics
as exogenously given.
Why are characteristics of other products appropriate IVs?

Estimation and instruments
Because firms’ first-order pricing condition suggests that pj depends
on other products’ attributes:
$$\frac{p_j - c_j}{p_j} = -\frac{1}{\eta_{jj}(X, P, \xi)}$$
where $\eta_{jj}$ is the own-price elasticity of product j.
Intuitively, products facing lots of substitutes have lower prices.

In addition, other products’ attributes are excluded from the utility of
consuming product j.
As a result, in differentiated product models, demand-side
instruments can be any excluded variables that affect mark-ups.
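
The logit inversion followed by IV can be sketched on simulated data. To keep the example short it assumes one inside good per market, no observed characteristics besides a constant, and a reduced-form price equation in which price loads on both ξ and an excluded cost shifter w; none of these simplifications come from the lecture, and all parameter values are made up.

```python
import math
import random

def simulate_markets(n=20000, alpha=1.0, beta0=1.0, seed=0):
    """Return (ln s_1 - ln s_0, price, cost shifter) for n single-product markets."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)                  # unobserved quality
        w = rng.gauss(0.0, 1.0)                   # excluded cost shifter (instrument)
        p = 2.0 + w + 0.5 * xi + 0.5 * rng.gauss(0.0, 1.0)   # assumed pricing rule
        delta = beta0 - alpha * p + xi            # mean utility
        s1 = math.exp(delta) / (1.0 + math.exp(delta))
        s0 = 1.0 - s1
        rows.append((math.log(s1) - math.log(s0), p, w))
    return rows

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def estimate_alpha(rows):
    """OLS and IV estimates of alpha from ln(s_j) - ln(s_0) = beta0 - alpha * p + xi."""
    y = [row[0] for row in rows]
    p = [row[1] for row in rows]
    w = [row[2] for row in rows]
    alpha_ols = -cov(p, y) / cov(p, p)
    alpha_iv = -cov(w, y) / cov(w, p)             # simple IV with one instrument
    return alpha_ols, alpha_iv

alpha_ols, alpha_iv = estimate_alpha(simulate_markets())
print(alpha_ols, alpha_iv)    # true alpha is 1.0
```

Under this design the OLS estimate is pulled toward zero because price is positively correlated with ξ, while the IV estimate recovers the true coefficient up to sampling error.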

Monte-Carlo evidence
                           σξd = 1               σξd = 3
                         (1)       (2)         (3)       (4)
Parameter   True Value   OLS       IV          OLS       IV
βo          5            3.460     4.980       0.378     4.890
                         (0.158)   (0.226)     (0.415)   (0.738)
βx          2            1.410     1.990       0.325     1.950
                         (0.058)   (0.091)     (0.127)   (0.272)
α           1            0.726     0.995       0.181     0.979
                         (0.029)   (0.039)     (0.076)   (0.128)
Table: Berry, Table 1: Monte Carlo Parameter Estimates, 100 Random Samples
of 500 Duopoly Markets, Logit Utility.
Notes: The values given in the table are empirical means and (standard
errors). The utility function is
$u_{ij} = \beta_o + \beta_x x_j + \sigma_{\xi d}\, \xi_j - \alpha p_j + \varepsilon_{ij}$. Marginal cost is
$c_j = \exp(\gamma_o + \gamma_x x_j + \sigma_{\xi c}\, \xi_j + \gamma_w w_j + \sigma_\omega \omega_j)$.

Results
In column (1), 80% of the variance in the mean utility is accounted
for by the observed attributes, and all OLS parameter estimates are
biased toward 0.
In the second experiment, 70% of the variance in the mean utility is
accounted for by the unobserved attributes, and the OLS parameter
estimates are severely biased toward 0.
As expected, IV estimates have slightly larger standard errors.
