
Lecture 3

Discrete Choice Models

• To date we have implicitly assumed that the variable yi is a continuous random variable.

• But the CLM does not require this assumption! The dependent variable can have discontinuities. For example, it can be discrete or count data. In these cases, linearity of the conditional expectation is unusual.

• Different types of discontinuities (truncation, censoring, discreteness, duration) generate different models:
– Discrete Choice Models (DCM)
– Truncated/Censored Regression Models
– Duration (Hazard) Models

RS – Lecture 17

With limited dependent variables, the conditional mean is rarely linear. We need to use adjusted models.

• We usually study discrete data that represent a decision, a choice.

• Sometimes, there is a single choice. Then, the data come in binary form, with a "1" representing a decision to do something and a "0" a decision not to do something.

=> Single Choice (binary choice models): Binary Data
Data: yi = 1 (yes/accept) or 0 (no/reject)
- Examples: Trade a stock or not, do an MBA or not, etc.

• Sometimes, there are multiple choices, yi = 1, ..., J, where J represents the number of choices.

=> Multiple Choice (multinomial choice models)
Data: yi = 1 (opt. 1), 2 (opt. 2), ..., J (opt. J)
- Examples: CEO candidates, transportation modes, etc.

(Figure: example of a discrete choice – Fly vs. Ground transportation modes.)

• Truncated variables:
We only sample from (observe/use) a subset of the population. The variable is observed only beyond a certain threshold level (the 'truncation point').
- Examples: store expenditures, labor force participation, income below the poverty line.

• Censored variables:
Values in a certain range are all transformed to/grouped into (or reported as) a single value.
- Examples: hours worked, exchange rates under Central Bank intervention.

Note: Censoring is essentially a defect in the sample data. Presumably, if they were not censored, the data would be a representative sample from the population of interest.

• Duration models: we model the time between two events. Examples:
– Time between two trades
– Time between cash flow withdrawals from a fund
– Time until a consumer becomes inactive/cancels a subscription
– Time until a consumer responds to direct mail/a questionnaire

• Consumers maximize utility.

• Fundamental Choice Problem: Max U(x1, x2, …) subject to prices and a budget constraint.
– Indirect Utility Function: V = V(p, I)
– Demand System of Continuous Choices (Roy's identity):

x*j = − [∂V(p, I)/∂pj] / [∂V(p, I)/∂I]
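As a quick numeric illustration of Roy's identity above, the sketch below differentiates an assumed Cobb-Douglas indirect utility by central finite differences; the functional form, prices, and income are hypothetical choices for the example, not from the lecture.

```python
import math

a = 0.3  # assumed Cobb-Douglas expenditure share on good 1

def V(p1, p2, I):
    # Indirect utility for U = x1^a * x2^(1-a); additive constants dropped,
    # since they do not affect the derivatives in Roy's identity
    return math.log(I) - a * math.log(p1) - (1 - a) * math.log(p2)

def roy_demand(p1, p2, I, h=1e-6):
    # x1* = -(dV/dp1) / (dV/dI), via central finite differences
    dV_dp1 = (V(p1 + h, p2, I) - V(p1 - h, p2, I)) / (2 * h)
    dV_dI = (V(p1, p2, I + h) - V(p1, p2, I - h)) / (2 * h)
    return -dV_dp1 / dV_dI

# Closed-form Cobb-Douglas demand is x1* = a*I/p1 = 0.3*100/2 = 15
print(roy_demand(2.0, 1.0, 100.0))
```

The finite-difference ratio recovers the closed-form demand, which is the content of Roy's identity.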

• Theory is silent about discrete choices.

• Translation to discrete choice:
- Existence of well-defined utility indexes: completeness of rankings
- Rationality: utility maximization
- Axioms of revealed preference

• Choice and consideration sets: consumers simplify choice situations.

• Implications for choice among a set of discrete alternatives: commonalities and uniqueness.
– Does this allow us to build "models"?
– What common elements can be assumed?
– How can we account for heterogeneity?

• Revealed choices do not reveal utility, only rankings, which are scale invariant.

• We will model discrete choice. We observe a discrete variable yi and a set of variables connected with the decision, xi, usually called covariates. We want to model the relation between yi and xi.

• It is common to distinguish between covariates zi that vary by unit (individuals or firms), and covariates that vary by choice (and possibly by individual), wij.

• Example of zi's: individual characteristics, such as age or education.

• Example of wij's: the cost associated with the choice, for example the cost of investing in bonds/stocks/cash, or the price of a product.

• We model the relation between yi and the covariates using utility-maximizing choice behavior. We may put restrictions on the way covariates affect utilities: the characteristics of choice i should affect the utility of choice i, but not the utility of choice j.

• The modern literature goes back to the work of Daniel McFadden in the seventies and eighties (McFadden 1973, 1981, 1982, 1984).

• Usual notation:
n = decision maker
i, j = choice options
y = decision outcome
x = explanatory variables/covariates
β = parameters
ε = error term
I[.] = indicator function: equal to 1 if the expression within brackets is true, 0 otherwise. Example: I[y = j|x] = 1 if j was selected (given x); = 0 otherwise.

• Predicting behavior:
- Individual – for example, will a person buy the add-on insurance?
- Aggregate – for example, what proportion of the population will buy the add-on insurance?

• Analyze changes in behavior when attributes change. For example, how will changes in education change the proportion of people who buy the insurance?

German Health Care Usage Data, N = 7,293, Varying Numbers of Periods
Data downloaded from the Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals and altogether 27,326 observations. The number of observations per individual ranges from 1 to 7 (frequencies: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987).

Variables in the file are:
DOCTOR = 1(Number of doctor visits > 0)
HOSPITAL = 1(Number of hospital visits > 0)
HSAT = health satisfaction, coded 0 (low) – 10 (high)
DOCVIS = number of doctor visits in last three months
HOSPVIS = number of hospital visits in last calendar year
PUBLIC = insured in public health insurance = 1; otherwise = 0
ADDON = insured by add-on insurance = 1; otherwise = 0
HHNINC = household nominal monthly net income in German marks / 10000 (4 observations with income = 0 were dropped)
HHKIDS = children under age 16 in the household = 1; otherwise = 0
EDUC = years of schooling
AGE = age in years
FEMALE = 1 for female headed household, 0 for male

Q: Does income affect doctor visits? What is the effect of age on doctor visits? Is gender relevant?

• 27,326 observations –
– 1 to 7 years, panel
– 7,293 households observed
– We use the 1994 year => 3,337 household observations

Descriptive Statistics
=========================================================
Variable    Mean      Std.Dev.  Minimum   Maximum
--------+------------------------------------------------
DOCTOR |  .657980    .474456   .000000   1.00000
AGE    |  42.6266    11.5860   25.0000   64.0000
HHNINC |  .444764    .216586   .034000   3.00000
FEMALE |  .463429    .498735   .000000   1.00000

1. Characteristics of the choice set
- Alternatives must be mutually exclusive: no combination of choice alternatives. For example, no combination of different investment types (bonds, stocks, real estate, etc.).
- Alternatives must be exhaustive: if we are considering types of investments, we should include all of them: bonds, stocks, real estate, hedge funds, exchange rates, commodities, etc. If relevant, we should include international and domestic financial markets.

2. Random utility maximization (RUM)
Assumption: Revealed preference. The decision maker selects the alternative that provides the highest utility. That is, decision maker n selects choice i if Uni > Unj ∀ j ≠ i.

• We think of an individual's utility as an unobservable variable, with an observable (deterministic) component, Vnj, and an unobservable (tastes?) random part, εnj:

Unj = Vnj + εnj

- The deterministic part, Vnj, is a function of some observed variables, xnj (age, income, sex, price, etc.):

Vnj = α + β1 Agen + β2 Incomenj + β3 Sexn + β4 Pricenj

Note: The β's are the same for all individuals. There is no heterogeneity. This is a useful assumption for estimation. It can be relaxed.

2. Random utility maximization (continuation)
Probability Model: Since both U's are random, the choice is random. Then, n selects i over j if:

Pni = Prob(Uni > Unj ∀ j ≠ i)
    = Prob(Vni + εni > Vnj + εnj ∀ j ≠ i)
    = Prob(εnj − εni < Vni − Vnj ∀ j ≠ i)

Pni = ∫ I(εnj − εni < Vni − Vnj ∀ j ≠ i) f(εn) dεn

• Vni − Vnj = h(X, β). h(.) is usually referred to as the index function. In the binary case, the choice probability is:

Pr[yi = 1] = F(h(xi, β))

• To evaluate the integral, f(εn) needs to be specified. Many possibilities:
– Normal: Probit Model, natural for behavior.
– Logistic: Logit Model, allows "thicker tails."
– Gompertz: Extreme Value Model, asymmetric distribution.

• Semiparametric and nonparametric methods avoid fully specifying the CDF F(X, β). These methods impose weaker assumptions than the fully parametric model described above: weaker conclusions, but likely more robust results.

• Note: Probit? Logit?
The Normal distribution function is usually called a "Probability Unit," or Probit for short. "Probit" graph paper has a normal probability scale on one axis. The Normal qualitative choice model became known as the Probit model. The "it" was transmitted to the Logistic Model (Logit) and the Gompertz Model (Gompit).

• Many candidates for the CDF –i.e., Pn(x′nβ) = F(Zn):
– Normal (Probit Model) = Φ(Zn)
– Logistic (Logit Model) = 1/[1 + exp(−Zn)]
– Gompertz (Gompit Model) = 1 − exp[−exp(Zn)]

- Probit Model: Prob(yn = 1) approaches 1 very rapidly as X, and therefore Z, increases. It approaches 0 very rapidly as X and Z decrease.
- Logit Model: it approaches the limits 0 and 1 more slowly than does the Probit.
- Gompit Model: its distribution is strongly negatively skewed, approaching 0 very slowly for small values of Z, and 1 even more rapidly than the Probit for large values of Z.
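These tail-behavior claims are easy to verify numerically. A small pure-Python sketch of the three candidate CDFs, building the normal CDF from math.erf:

```python
import math

def probit(z):
    # Normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit(z):
    # Logistic CDF: 1 / (1 + exp(-z))
    return 1.0 / (1.0 + math.exp(-z))

def gompit(z):
    # Extreme-value CDF used above: 1 - exp(-exp(z))
    return 1.0 - math.exp(-math.exp(z))

for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"z={z:+.1f}  probit={probit(z):.4f}  "
          f"logit={logit(z):.4f}  gompit={gompit(z):.4f}")
```

At z = 3 the Probit is already closer to 1 than the Logit, and the Gompit closer still, matching the description above; at z = 0 the symmetric Probit and Logit give 0.5 while the skewed Gompit does not.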

• Comparisons: Probit vs Logit

Note: Not all the parameters may be identified.

• Example: Suppose we model the decision to visit a doctor or not –i.e., (0,1) data. If Uvisit > 0, an agent visits a doctor:

Uvisit > 0 ⇔ α + β1 Age + β2 Income + β3 Sex + ε > 0
=> ε > −(α + β1 Age + β2 Income + β3 Sex)

Let Y = 1 if Uvisit > 0, with Var[ε] = σ².

• Dividing everything by σ:

Uvisit > 0 ⇔ ε/σ > −[α/σ + (β1/σ) Age + (β2/σ) Income + (β3/σ) Sex]
or w > −[α′ + β1′ Age + β2′ Income + β3′ Sex]

Y = 1 if Uvisit > 0, now with Var[w] = 1. Only the scaled coefficients (β/σ) are identified.

Note: The (0,1) coding is arbitrary; we could have assigned the values (1,2) instead of (0,1) to yn. It is possible to produce any range of values in yn.

Note: Aggregation can be problematic when average values of x are used as inputs:

E[P1(x)] ≠ P1(E[x])

Consistent aggregate estimates can be obtained by sample enumeration:
- Compute probabilities/elasticities for each decision maker.
- Compute the (weighted) average of these values:

P̄1 = (1/N) Σi P1(xi)
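A minimal numeric sketch of this aggregation point, with a logistic F and made-up covariate values: the sample-enumeration average of individual probabilities differs from the probability evaluated at the average x.

```python
import math

def F(z):
    # Logistic CDF
    return 1.0 / (1.0 + math.exp(-z))

beta = 1.5
xs = [-2.0, -1.0, 0.0, 1.0, 4.0]   # hypothetical covariate values

# Sample enumeration: (1/N) * sum_i P1(x_i)
avg_prob = sum(F(beta * x) for x in xs) / len(xs)

# Naive aggregation: P1 evaluated at mean(x)
prob_at_avg = F(beta * (sum(xs) / len(xs)))

print(avg_prob, prob_at_avg)   # not equal: E[P1(x)] != P1(E[x])
```

With a symmetric x distribution the two can coincide by accident, which is why the illustration uses a skewed set of x values.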

• Example: Suppose there are two individuals, a and b, equally represented in the population, with

Va = β′xa and Vb = β′xb

Then
Pa = Pr[yi = 1|xa] = F[β′xa]
Pb = Pr[yi = 1|xb] = F[β′xb]

but
P̄ = ½ (Pa + Pb) ≠ P(x̄) = F[β′x̄]

In general, P(V̄) will tend to underestimate P̄ when probabilities are high, and overestimate P̄ when probabilities are low.

(Figure: plot of F(·) against V, marking Pa, P̄, P(V̄), Pb at Va, V̄, Vb – average probability vs. probability of the average.)

3. Identification problems

a. Only differences in utility matter.
Choice probabilities do not change when a constant is added to each alternative's utility.
Implication: Some parameters cannot be identified/estimated: alternative-specific constants; coefficients of variables that change over decision makers but not over alternatives.

b. Overall scale of utility is irrelevant.
Choice probabilities do not change when the utilities of all alternatives are multiplied by the same factor.
Implication: Coefficients of different models (data sets) are not directly comparable.

Normalization of the parameters or of the variance of the error terms is used to deal with identification issues.

DCM: Estimation

• Since we specify a pdf, ML estimation seems natural. But, in general, it can get complicated.
– Normal: Probit Model = Φ(x′nβ)
– Logistic: Logit Model = exp(x′nβ)/[1 + exp(x′nβ)]
– Gompertz: Extreme Value Model = 1 − exp[−exp(x′nβ)]

• Methods:
- ML estimation (numerical optimization)
- Bayesian estimation (MCMC methods)
- Simulation-assisted estimation

DCM: ML Estimation

• Example: Logit Model
Suppose we have binary (0,1) data. The logit model follows from:
Pn[yn = 1|x] = exp(x′nβ)/[1 + exp(x′nβ)] = F(x′nβ)
Pn[yn = 0|x] = 1/[1 + exp(x′nβ)] = 1 − F(x′nβ)

- Likelihood function:
L(β) = Πn [1 − F(x′nβ)]^(1−yn) [F(x′nβ)]^yn

- Log likelihood:
Log L(β) = Σy=0 log[1 − F(x′nβ)] + Σy=1 log[F(x′nβ)]

- Numerical optimization is used to get β. In practice, computation of the Hessian, H, may cause problems.

• ML estimators are consistent, asymptotically normal and efficient.


• How can we estimate the covariance matrix, Σβ̂?
Under the usual conditions, we can use the information matrix:

In general: Σβ̂ = [−E(∂²L/∂β∂β′)]⁻¹ = I(β)⁻¹

Newton-Raphson: Σβ̂ = [−∂²L/∂β∂β′]⁻¹, evaluated at β = β̂

BHHH: Σβ̂ = [Σi (∂Li/∂β)(∂Li/∂β′)]⁻¹, evaluated at β = β̂

• These estimators are asymptotically equivalent, but in finite samples they often provide different covariance estimates for the same model.
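The finite-sample difference between the Hessian-based and BHHH estimators can be seen in a scalar logit sketch; the data below are synthetic (true coefficient 1, small n so the gap is visible), and this is an illustration, not the lecture's data.

```python
import math
import random

def F(z):
    # Logistic CDF
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic (x, y) sample; small n so finite-sample differences are visible.
random.seed(7)
data = []
for _ in range(100):
    x = random.gauss(0.0, 1.0)
    y = 1 if random.random() < F(x) else 0   # assumed true beta = 1
    data.append((x, y))

# Newton-Raphson to the ML estimate (scalar beta; logit score and Hessian).
beta = 0.0
for _ in range(50):
    score = sum(x * (y - F(beta * x)) for x, y in data)
    hess = -sum(x * x * F(beta * x) * (1.0 - F(beta * x)) for x, y in data)
    beta -= score / hess

# Hessian-based standard error: sqrt([-H]^{-1}) at beta-hat (scalar case).
hess_hat = -sum(x * x * F(beta * x) * (1.0 - F(beta * x)) for x, y in data)
se_hessian = math.sqrt(-1.0 / hess_hat)

# BHHH standard error: sqrt([sum of squared scores]^{-1}) (scalar case).
sum_sq = sum((x * (y - F(beta * x))) ** 2 for x, y in data)
se_bhhh = math.sqrt(1.0 / sum_sq)

print(se_hessian, se_bhhh)   # similar magnitude, but not identical
```

Both standard errors estimate the same asymptotic quantity; with n = 100 they agree only approximately.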

DCM: ML Estimation

• Numerical optimization – steps:
(1) Start by specifying the likelihood for one observation: Fn(X, β)
(2) Get the joint likelihood function: L(β) = Πn Fn(X, β)
(3) It is easier to work with the log likelihood function:
Log L(β) = Σn ln(Fn(X, β))
(4) Maximize Log L(β) with respect to β:
- Set the score equal to 0 ⇒ no closed-form solution.
- Numerical optimization, as usual:
(i) Choose starting values β0.
(ii) Determine a new value βt+1 = βt + update, such that Log L(βt+1) > Log L(βt). Say, N-R's updating step: βt+1 = βt − λt H⁻¹ ∇f(βt)
(iii) Repeat step (ii) until convergence.
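The steps above can be sketched for a one-parameter logit on synthetic data; the true coefficient (1.0), sample size, and seed are assumptions made for the illustration.

```python
import math
import random

def F(z):
    # Logistic CDF
    return 1.0 / (1.0 + math.exp(-z))

# Simulate binary (0,1) data from a logit with assumed true beta = 1.0.
random.seed(42)
true_beta = 1.0
data = []
for _ in range(2000):
    x = random.gauss(0.0, 1.0)
    y = 1 if random.random() < F(true_beta * x) else 0
    data.append((x, y))

beta = 0.0                          # (i) starting value
for _ in range(25):                 # (ii)-(iii) iterate to convergence
    # Score: dlogL/dbeta = sum x*(y - F(x*beta))
    score = sum(x * (y - F(beta * x)) for x, y in data)
    # Hessian: d2logL/dbeta2 = -sum x^2 * F * (1 - F)
    hess = -sum(x * x * F(beta * x) * (1.0 - F(beta * x)) for x, y in data)
    beta_new = beta - score / hess  # Newton-Raphson update
    if abs(beta_new - beta) < 1e-10:
        break
    beta = beta_new

print(beta)   # ML estimate, near the assumed true value 1.0
```

The logit log likelihood is globally concave in β, so the Newton-Raphson loop converges in a handful of iterations from β0 = 0.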

• The Bayesian estimator will be the mean of the posterior density:

f(β, γ|y, X) = f(y|X, β, γ) f(β, γ) / f(y|X) = f(y|X, β, γ) f(β, γ) / ∫ f(y|X, β, γ) f(β, γ) dβ dγ

- f(β, γ) is the prior density for the model parameters.
- f(y|X, β, γ) is the likelihood.
- The priors are usually non-informative (flat), say f(β, γ) ∝ 1.
- The likelihood depends on the model in mind. For a Probit Model, we use the normal distribution. If we have binary data, then, with Φn = Φ(x′nβ):

f(y|X, β, γ) = Πn (1 − Φn)^(1−yn) Φn^yn

• Let θ = (β, γ). Suppose we have binary data with Pn[yn = 1|x, θ] = F(x′nθ). The estimator of θ is the mean of the posterior density:

E[θ|y, X] = ∫ θ f(y|X, θ) f(θ) dθ / ∫ f(y|X, θ) f(θ) dθ
          = ∫ θ Πn (1 − F(x′nθ))^(1−yn) (F(x′nθ))^yn dθ / ∫ Πn (1 − F(x′nθ))^(1−yn) (F(x′nθ))^yn dθ

(with a flat prior). Evaluating these integrals is complicated; in practice, it is done using MCMC methods. Much simpler.
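A minimal sketch of this posterior-mean computation: a random-walk Metropolis sampler (one of the MCMC methods mentioned above) for a one-parameter logit with a flat prior. The data, seed, chain length, and proposal scale are assumptions for the illustration.

```python
import math
import random

def F(z):
    # Logistic CDF
    return 1.0 / (1.0 + math.exp(-z))

def log_lik(beta, data):
    # log of prod_n F^yn * (1-F)^(1-yn)
    return sum(math.log(F(beta * x)) if y == 1 else math.log(1.0 - F(beta * x))
               for x, y in data)

# Synthetic binary data with assumed true beta = 1.0.
random.seed(0)
data = []
for _ in range(500):
    x = random.gauss(0.0, 1.0)
    y = 1 if random.random() < F(x) else 0
    data.append((x, y))

# Random-walk Metropolis; with a flat prior, the prior cancels in the ratio.
beta, ll = 0.0, log_lik(0.0, data)
chain = []
for t in range(6000):
    prop = beta + random.gauss(0.0, 0.2)
    ll_prop = log_lik(prop, data)
    # Accept with probability min(1, exp(ll_prop - ll))
    if random.random() < math.exp(min(0.0, ll_prop - ll)):
        beta, ll = prop, ll_prop
    if t >= 1000:                    # discard burn-in draws
        chain.append(beta)

post_mean = sum(chain) / len(chain)  # Bayesian estimator: posterior mean
print(post_mean)
```

With 500 observations and a flat prior, the posterior mean lands close to the ML estimate, as Bayesian asymptotics suggest.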

• ML estimation is likely complicated due to the multidimensional integration problem. Simulation-based methods approximate the integral, and are relatively easy to apply.

• The quantity we need to compute is an expectation, i.e., an integral. For example:

E[h(U)] = ∫ h(u) f(u) du

In many cases, an analytic solution or a precise numerical solution is not possible. But we can always simulate E[h(U)]:
- Steps:
- Draw R pseudo-RVs from f(u): u1, u2, ..., uR (R: repetitions)
- Compute Ê[h(U)] = (1/R) Σr h(ur)

• We call Ê[h(U)] a simulator. It is an unbiased and (most of the time) consistent estimator for E[h(U)]. Ideally, the simulator is continuous and differentiable in the parameters.

• There are many simulators. But the idea is the same: compute an integral by drawing pseudo-RVs, never by analytic integration.
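The two simulation steps above fit in a few lines. As an assumed example, take U ~ N(0,1) and h(u) = u², so the true value is E[h(U)] = Var[U] = 1.

```python
import random

random.seed(123)
R = 100_000   # number of repetitions (pseudo-random draws)

# E-hat[h(U)] = (1/R) * sum_r h(u_r), with h(u) = u^2 and U ~ N(0,1)
est = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(R)) / R

print(est)   # close to the true value E[U^2] = 1
```

The simulation error shrinks at rate 1/√R, so R trades accuracy against computation.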

• In general, the β's do not have an interesting interpretation: βk does not have the usual marginal effect interpretation. We care instead about:

Partial effect = ∂P(α + β1 Income + …)/∂xk (derivative)
Marginal effect = ∂E[yn|x]/∂xk
Elasticity = ∂log P(α + β1 Income + …)/∂log xk = Partial effect × xk/P(α + β1 Income + …)

• These effects vary with the values of x: larger near the center of the distribution, smaller in the tails.
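For a logit, the partial effect of xk is f(x′β)·βk, where f = F(1−F) is the logistic pdf; the sketch below (with an assumed βk) shows it peaking at the center of the distribution and vanishing in the tails.

```python
import math

def F(z):
    # Logistic CDF
    return 1.0 / (1.0 + math.exp(-z))

def partial_effect(z, beta_k):
    # Logit partial effect: f(z) * beta_k, with f(z) = F(z) * (1 - F(z))
    return F(z) * (1.0 - F(z)) * beta_k

beta_k = 0.8   # hypothetical coefficient
for z in (-4.0, 0.0, 4.0):
    print(f"index z={z:+.1f}: partial effect = {partial_effect(z, beta_k):.4f}")
```

At z = 0 the logistic pdf attains its maximum of 0.25, so the effect there is 0.25·βk; far in either tail it is close to zero, which is exactly the "larger near the center, smaller in the tails" pattern.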

• Delta method: We know the distribution of bn, with mean θ and variance σ²/n, but we are interested in the distribution of g(bn), where g(·) is a continuous, differentiable function, independent of n. Then, asymptotically:

g(bn) →a N(g(θ), [g′(θ)]² σ²/n)

In the multivariate case, with G(θ) = ∂g(θ)/∂θ:

g(bn) →a N(g(θ), [G(θ)]′ Var[bn] [G(θ)])
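A numeric check of the delta method above, for the assumed transformation g(b) = exp(b) with b ~ N(θ, s²): the delta-method standard deviation |g′(θ)|·s is compared against a simulated one. The values of θ and s are made up for the illustration.

```python
import math
import random

theta, s = 0.5, 0.05   # hypothetical mean and standard deviation of b

# Delta method: sd[g(b)] ~ |g'(theta)| * s, with g(b) = exp(b), g'(b) = exp(b)
delta_sd = math.exp(theta) * s

# Simulation check: draw b, transform, and measure the spread of g(b).
random.seed(1)
draws = [math.exp(random.gauss(theta, s)) for _ in range(100_000)]
mean = sum(draws) / len(draws)
sim_sd = math.sqrt(sum((d - mean) ** 2 for d in draws) / len(draws))

print(delta_sd, sim_sd)   # nearly identical for small s
```

The linearization is accurate because s is small relative to the curvature of g; for large s the two numbers would drift apart.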

• The partial and marginal effects will vary with the values of x. It is common to evaluate them at the means of the x. For example:

Estimated partial effect = f(α + β1 Mean(Income) + …) βk, where f(·) is the pdf.

An alternative is to compute, and then average, the marginal effects at every observation.

• Q: How well does a DCM fit?
In the regression framework, we used R². But with a DCM there are no residuals or RSS, and the model is not computed to optimize the fit of the model: => there is no R².

- Let Log L(β0) be the log likelihood of a model with only a constant term. Then, define the Pseudo-R²:

Pseudo-R² = 1 − Log L(β)/Log L(β0) ("likelihood ratio index")

(This is McFadden's Pseudo-R². There are many others.)

- LR-test: LR = −2(Log L(β0) − Log L(β)) ~ χ²(k)

=> Different fit measures can sometimes give conflicting results.
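The two likelihood-based measures above take one line each. The log-likelihood values below are hypothetical numbers chosen for the illustration, not results from the German health data.

```python
logL_full = -1800.0    # model with covariates (hypothetical value)
logL_const = -2000.0   # constant-only model (hypothetical value)

# McFadden's Pseudo-R^2: 1 - logL(beta) / logL(beta0)
pseudo_r2 = 1.0 - logL_full / logL_const

# LR statistic: -2 * (logL(beta0) - logL(beta)) ~ chi^2(k) under H0
lr_stat = -2.0 * (logL_const - logL_full)

print(pseudo_r2, lr_stat)
```

Note that the Pseudo-R² lies between 0 (no improvement over the constant) and values well below 1 in practice; it is not comparable to a regression R².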

• "Fit measures" computed from the accuracy of predictions: a direct assessment of the effectiveness of the model at predicting the outcome –i.e., a 1 or a 0.
- Computation:
- Use the model to compute predicted probabilities.
- Use the model and a rule to compute predicted y = 0 or 1.
Rule: Predict ŷ = 1 if the estimated F is "large," say 0.5 or greater. More generally, use ŷ = 1 if the estimated F is greater than some cutoff P*.

Example: Cramer fit measure, with F̂ = predicted probability:

λ̂ = Σi yi F̂i / N1 − Σi (1 − yi) F̂i / N0
  = Mean F̂ | when y = 1 − Mean F̂ | when y = 0
  = reward for correct predictions minus penalty for incorrect predictions
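The Cramer measure above, sketched on made-up fitted probabilities: the mean predicted probability among the y = 1 observations minus the mean among the y = 0 observations.

```python
ys = [1, 1, 1, 0, 0]                    # hypothetical outcomes
fhats = [0.9, 0.7, 0.6, 0.4, 0.2]       # hypothetical fitted probabilities

n1 = sum(ys)            # number of y = 1 observations
n0 = len(ys) - n1       # number of y = 0 observations

# lambda-hat = mean(F-hat | y=1) - mean(F-hat | y=0)
lam = (sum(f for y, f in zip(ys, fhats) if y == 1) / n1
       - sum(f for y, f in zip(ys, fhats) if y == 0) / n0)

print(lam)   # positive when the model separates the two groups
```

A value near zero means the model assigns similar probabilities to both groups; larger positive values indicate better separation.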

Let
ŷi = 1 if F̂i ≥ 0.5, and ŷi = 0 if F̂i < 0.5.

Then cross-tabulate the actual yi (rows: yi = 1, yi = 0) against the predicted ŷi (columns: F̂i ≥ 0.5, F̂i < 0.5).

• Limitations of this 2x2 classification table:
– No weight is given to the costs of individual errors made. It may be more costly to classify a "yes" as a "no" than vice versa. In this case, some loss functions would be more helpful than others.
– There is no way to judge departures from a diagonal table.
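A sketch of the 0.5-rule classification table above, on made-up outcomes and fitted probabilities: predict ŷ = 1 when F̂ ≥ 0.5 and count the (actual, predicted) pairs.

```python
ys = [1, 1, 0, 0, 1, 0]                      # hypothetical actual outcomes
fhats = [0.8, 0.4, 0.3, 0.6, 0.7, 0.1]       # hypothetical fitted probabilities

# Classification rule: y-hat = 1 iff F-hat >= 0.5
yhat = [1 if f >= 0.5 else 0 for f in fhats]

# 2x2 table of (actual, predicted) counts
table = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
for a, p in zip(ys, yhat):
    table[(a, p)] += 1

print(table)   # off-diagonal cells (1,0) and (0,1) are the classification errors
```

As the slide notes, the raw counts weight both error types equally; a cost-sensitive rule would move the 0.5 cutoff.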

• Use the likelihood:
- LR-test:
LR = −2(Log L(βr) − Log L(βu)) ~ χ²(k)
(r = restricted model; u = unrestricted (full) model; k = difference in number of parameters)

• Information criteria: AIC, CAIC, BIC ⇒ choose the model with the lowest value.

DCM: Testing

• Given the ML estimation setup, the trilogy of tests (LR, W, and LM) is used:
- LR test: based on unrestricted and restricted estimates.
- Distance measures – Wald test: based on unrestricted estimates.
- LM tests: based on restricted estimates.

• Tests of pooling or structural change are easily constructed:
- Fit an unrestricted model, based on models for the different categories (say, female and male) or subsamples (regimes), and compare it to the restricted model (pooled model) => LR test.

DCM: Testing

• Issues:

- Linear or nonlinear functions of the parameters

- Constancy of parameters (Chow Test)

- Correct specification of distribution

- Heteroscedasticity

DCM: Heteroscedasticity

• In the RUM, with binary data, agent n selects
yn = 1 iff Un = β′xn + εn > 0,
where the unobserved εn has E[εn] = 0 and Var[εn] = 1.

We set the variance to 1 as an identification assumption. But, implicitly, we are assuming homoscedasticity across individuals.

• As in the linear model, in the presence of heteroscedasticity, we scale each observation by the square root of its variance.

DCM: Heteroscedasticity

• Q: How to accommodate heterogeneity in a DCM?
Use different scaling for each individual. We need a model for the variance:
– Parameterize: Var[εn] = exp(zn′γ)
– Reformulate the probabilities. Binary Probit or Logit: Pn[yn = 1|x] = P(xn′β/exp(zn′γ))

• Marginal effects (derivatives of E[yn] w.r.t. xn and zn) are now more complicated. If xn = zn, the signs and magnitudes of the marginal effects tend to be ambiguous.

• There is no generic, White-type test for heteroscedasticity. We do the tests in the context of the maximum likelihood estimation; the tests are straightforward once we specify the variance under H1 (heteroscedasticity), say,

H1: Var[εn] = exp(zn′γ)

• In the context of maximum likelihood estimation, it is common to define Var[bML] = (1/T) H0⁻¹ V0 H0⁻¹, where, if the model is correctly specified, −H = V. Similarly, for a DCM we can define the "robust" covariance matrix:

V = A B A

A = negative inverse of the second derivatives matrix
  = [−Σi ∂²log Probi/∂β̂∂β̂′]⁻¹  (an estimate of [−E(∂²log L/∂β∂β′)]⁻¹)

B = matrix sum of outer products of first derivatives
  = Σi (∂log Probi/∂β̂)(∂log Probi/∂β̂′)  (an estimate of E[(∂log L/∂β)(∂log L/∂β′)])

For a logit model:
A = [Σi P̂i(1 − P̂i) xi xi′]⁻¹
B = Σi (yi − P̂i)² xi xi′ = Σi ei² xi xi′

(This resembles the White estimator in the linear model case.)
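A scalar sketch of the logit V = A B A formula above, on synthetic data (true coefficient 1, assumed for the illustration); with one covariate, A and B are just sums of scalars.

```python
import math
import random

def F(z):
    # Logistic CDF
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic binary data with assumed true beta = 1.
random.seed(3)
data = []
for _ in range(200):
    x = random.gauss(0.0, 1.0)
    y = 1 if random.random() < F(x) else 0
    data.append((x, y))

# Newton-Raphson to the ML estimate (scalar beta).
beta = 0.0
for _ in range(50):
    score = sum(x * (y - F(beta * x)) for x, y in data)
    hess = -sum(x * x * F(beta * x) * (1.0 - F(beta * x)) for x, y in data)
    beta -= score / hess

# A = [sum P(1-P) x x']^{-1} and B = sum (y - P)^2 x x', scalar versions.
A = 1.0 / sum(F(beta * x) * (1.0 - F(beta * x)) * x * x for x, y in data)
B = sum((y - F(beta * x)) ** 2 * x * x for x, y in data)

V_robust = A * B * A          # the "sandwich" V = A B A
print(V_robust, math.sqrt(V_robust))
```

When the logit is correctly specified, as here, A⁻¹ and B estimate the same matrix and V ≈ A, mirroring the −H = V remark above.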

• Q: Is this matrix robust? To what?
• It is not "robust" to:
– Heteroscedasticity
– Correlation across observations
– Omitted heterogeneity
– Omitted variables (even if orthogonal)
– A wrong distributional assumption
– A wrong functional form for the index function

• In all of these cases, the estimator is inconsistent, so a "robust" covariance matrix is pointless.

DCM: Endogeneity

• It is possible to have endogenous covariates in a DCM. For example, we often include education as part of an individual's characteristics, or the income/benefits generated by the choice as part of its characteristics.

Suppose agent n selects yn = 1 iff
Un = xn′β + hn′θ + εn > 0,
where E[ε|h] ≠ 0 (h is endogenous).

– Case 1: h is continuous (complicated).
– Case 2: h is discrete, say, binary (easier: a treatment effect).

DCM: Endogeneity

• Approaches:
- Maximum Likelihood (parametric approach)
- GMM
- Various approaches for case 2. Case 2 is the easier case: SE DCM!

• The usual problems with endogenous variables are made worse in nonlinear models. In a DCM it is not clear how to use IVs. A simple GMM approach uses moment conditions; for example, for a Probit Model: E[(yn − Φ(xn′β))(xn, zn)] = 0.
=> This moment equation forms the basis of a straightforward two-step GMM estimator. Since we specify Φ(.), it is parametric.

DCM: Endogeneity - ML

• ML estimation requires full specification of the model, including the assumption that underlies the endogeneity of hn. For example:
- Revealed preference: yn = 1[Un > 0]
- Endogenous variable: hn = zn′α + un, with
E[ε|h] ≠ 0, Cov[u, ε] ≠ 0 (ρ = Corr[u, ε])
- Additional assumptions: e.g., bivariate normality of (εn, un).

DCM: Endogeneity - ML

• ML becomes a simultaneous equations model.
- Reduced form estimation is possible: insert the second equation in the first. If we use a Probit Model, this becomes Prob(yn = 1|xn, zn) = Φ(xn′β* + zn′α*).
- For full (structural) estimation:
- Write down the joint density: f(yn|xn, zn) f(hn|zn).
- Assume a probability model for f(yn|xn, zn), say a Probit Model.
- Assume a marginal for f(hn|zn), say a normal distribution.
- Use the projection: εn|un = (ρ/σu) un + vn, with σv² = (1 − ρ²) (using σε = 1).
- Insert the projection in P(yn).
- Replace un = (hn − zn′α) in P(yn).
- Maximize Log L(.) w.r.t. (β, α, θ, ρ, σu).

• Estimation is not straightforward, because of the correlation between h and ε induced by the correlation of u and ε. Using the bivariate normality:

Prob(y = 1|x, h) = Φ[(β′x + θh + (ρ/σu) u) / √(1 − ρ²)]

Insert ui = hi − α′zi and include f(h|z) to form the log likelihood:

Log L = Σi { log Φ[(2yi − 1)(β′xi + θhi + ρ(hi − α′zi)/σu) / √(1 − ρ²)] + log[(1/σu) φ((hi − α′zi)/σu)] }

• Two-step limited information ML (Control Function) is also possible:
- Use OLS to estimate α and σu.
- Compute the residuals vn.
- Plug the residuals vn into the assumed model P(yn).
- Fit the probability model for P(yn).
- Transform the estimated coefficients into the structural ones.
- Use the delta method to calculate standard errors.
