MFE 402
Dan Yavorsky
Topics for Today
Consumer Search with Maximum Likelihood
A Model of Sequential Consumer Search
$$u_{ij} = \underbrace{x_j'\beta + \eta_{ij}}_{\delta_{ij}} + \varepsilon_{ij}$$
Reservation Utilities
The expected benefit of searching alternative $j$ when the best utility found so far is $u_i^*$:

$$B_{ij}(u_i^*) = \int_{u_i^*}^{\infty} \big(u_{ij} - u_i^*\big)\, f_{u_{ij}}(u_{ij})\, du_{ij}$$

Define a reservation utility ($z_{ij}$) by equating marginal benefit and marginal cost:

$$c_{ij} = B_{ij}(z_{ij}) \;\Longrightarrow\; c_{ij} = \big[1 - \Phi(\zeta_{ij})\big] \times \left[\frac{\phi(\zeta_{ij})}{1 - \Phi(\zeta_{ij})} - \zeta_{ij}\right] \times \sigma$$

Kim et al. (2010) show that you can invert this function to find $z_{ij} = \delta_{ij} + \zeta_{ij}$.
Rational (Optimal) Behavior
Weitzman (1979) shows that a consumer acting optimally will search alternatives in order of descending reservation utilities, continuing until the maximum realized utility among the searched alternatives exceeds the reservation utility of the next-to-be-searched alternative.
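A minimal R sketch of this stopping rule, using hypothetical reservation utilities z (already sorted in descending order) and realized utilities u:

# Weitzman's rule: search in descending order of z, stop when the best
# realized utility so far beats the next alternative's reservation utility
z <- c(2.1, 1.7, 1.3, 0.9, 0.4)    # hypothetical reservation utilities (sorted)
u <- c(0.8, 1.5, 1.1, 1.8, 0.2)    # utilities revealed only upon search
searched <- 1                      # always search the highest-z alternative first
while (searched < length(z) && max(u[1:searched]) < z[searched + 1]) {
  searched <- searched + 1
}
chosen <- which.max(u[1:searched]) # purchase the best searched alternative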
Example
Likelihood
Individual Likelihood
$$L_i(\beta, \gamma) = \int \mathbf{1}\Big[\, z_{ij} \geq \max_{h<j}\{u_h\}\ \text{ for } j = 2, \dots, K_i \ \cap\ \cdots\, \Big]$$

Total Likelihood

$$L(\beta, \gamma) = \prod_{i=1}^{N} L_i(\beta, \gamma)$$
Estimation via KSF MLE
Define:
$$\tilde{L}_i = \left(1 + \sum_{j=2}^{K_i} e^{-\lambda \nu_{1,j}} + \sum_{j=1}^{K_i} e^{-\lambda \nu_{2,j}} + e^{-\lambda \nu_3} + e^{-\lambda \nu_4}\right)^{-1}$$
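The kernel-smoothed frequency (KSF) simulator replaces each indicator in the likelihood with a smooth logistic kernel so that the simulated likelihood is differentiable in the parameters. A minimal R sketch of the expression above (the function name, the nu arguments, and the default lambda are illustrative assumptions, not the paper's notation):

# kernel-smoothed likelihood contribution: nu1, nu2 are vectors and nu3, nu4
# scalars measuring the slack in the search, stopping, and purchase conditions;
# lambda is the smoothing (scaling) parameter
ksf_lik <- function(nu1, nu2, nu3, nu4, lambda = 10) {
  1 / (1 + sum(exp(-lambda * nu1)) + sum(exp(-lambda * nu2)) +
         exp(-lambda * nu3) + exp(-lambda * nu4))
}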
Results
Leverage and Outliers
Intuition about Unusual Observations
An unusual observation is one that deviates markedly from the rest of the sample, due to…
• Large residual: the distance between the actual data point $(Y_n, X_n)$ and its fitted value ($\hat{Y}_n$) is much larger for the outlier than for other observations
• High leverage: the data point has an unusual combination of values for the explanatory variables (i.e., it's in a "remote" part of the X-space)
Why do we care?
• Observations with high leverage and large residuals are influential: if you dropped that
observation from the data, coefficient estimates would change markedly
• May suggest something is wrong with the model specification
• Understand if predicted values are driven more by data or modeling assumptions
Visual Example: Influential Observation
[Figure: scatterplot of Y against X illustrating an influential observation with a large positive residual ($\hat{e} < e$).]
Visual Example: “Hidden” Leverage
[Figure: scatterplot of $X_2$ against $X_1$ (both roughly in $[-2, 2]$) illustrating an observation with "hidden" leverage: unusual in the joint X-space even though it is not extreme in either variable alone.]
Leverage Values
The leverage value of observation i is the ith diagonal element of the “hat” matrix
$$P = X(X'X)^{-1}X', \qquad h_{ii} = [P]_{ii} = X_i'(X'X)^{-1}X_i$$

Rule of thumb:
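In R, leverage values are the diagonal of the hat matrix, or hatvalues() on a fitted model; a minimal sketch on simulated data:

# leverage values: diagonal of the hat matrix P = X (X'X)^{-1} X'
set.seed(1)
x <- rnorm(50); y <- 1 + 2*x + rnorm(50)
X <- cbind(1, x)                            # design matrix with intercept
P <- X %*% solve(t(X) %*% X) %*% t(X)       # hat matrix
h <- diag(P)                                # leverage values h_ii
all.equal(h, unname(hatvalues(lm(y ~ x))))  # matches R's built-in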
The Effect on the Coefficients and Fitted Values (part 1)
Let’s assess the effect on the coefficient estimates when we leave out observation i:
$$
\begin{aligned}
\hat{\beta}_{(-i)} &= \big(X_{(-i)}'X_{(-i)}\big)^{-1} X_{(-i)}'\, y_{(-i)} \\
&= \Big(\sum_{j \neq i} X_j X_j'\Big)^{-1} \Big(\sum_{j \neq i} X_j Y_j\Big) \\
&= \big(X'X - X_i X_i'\big)^{-1} \big(X'Y - X_i Y_i\big)
\end{aligned}
$$

Premultiplying both sides by $(X'X - X_i X_i')$ and then by $(X'X)^{-1}$:

$$\hat{\beta}_{(-i)} - (X'X)^{-1} X_i X_i' \hat{\beta}_{(-i)} = (X'X)^{-1}\big(X'Y - X_i Y_i\big) = \hat{\beta} - (X'X)^{-1} X_i Y_i$$

Rearrange to find:

$$\hat{\beta} - \hat{\beta}_{(-i)} = (X'X)^{-1} X_i \big(Y_i - X_i'\hat{\beta}_{(-i)}\big) = (X'X)^{-1} X_i \tilde{e}_i$$

where $\tilde{e}_i = Y_i - X_i'\hat{\beta}_{(-i)}$ is the leave-one-out residual.
The Effect on the Coefficients and Fitted Values (part 2)
$$\hat{e}_i = Y_i - X_i'\hat{\beta} = Y_i - X_i'\hat{\beta}_{(-i)} - X_i'(X'X)^{-1} X_i \tilde{e}_i = (1 - h_{ii})\,\tilde{e}_i$$

Alternatively:

$$\hat{Y}_i - \tilde{Y}_i = X_i'\hat{\beta} - X_i'\hat{\beta}_{(-i)} = X_i'(X'X)^{-1} X_i\, \tilde{e}_i = h_{ii}(1 - h_{ii})^{-1}\hat{e}_i$$
Thus both differences (in the coefficient estimates and in the fitted values) are
functions of the observation’s leverage and residual.
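A quick numerical check of the identity $\hat{e}_i = (1 - h_{ii})\,\tilde{e}_i$ on simulated data:

# verify e_hat_i = (1 - h_ii) * e_tilde_i
set.seed(42)
x <- rnorm(30); y <- 1 + 2*x + rnorm(30)
fit   <- lm(y ~ x)
e_hat <- resid(fit)                        # full-sample residuals
h     <- hatvalues(fit)                    # leverage values
e_tilde <- sapply(1:30, function(i) {      # leave-one-out residuals
  coef_i <- coef(lm(y[-i] ~ x[-i]))
  y[i] - sum(coef_i * c(1, x[i]))
})
all.equal(as.numeric(e_hat), as.numeric((1 - h) * e_tilde))  # TRUE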
Some Intuition
Because OLS minimizes squared errors, observations with high leverage can "pull" the regression line toward the observation in order to minimize the error. Thus, observations with high leverage ($h_{ii}$) will have smaller residuals ($\hat{e}_i$) as a result of this influence.
Under homoskedasticity, $\text{var}(\hat{e}_i \mid X) = \sigma^2(1 - h_{ii})$.

So when looking for unusual observations, you should look for those with high leverage and high standardized residuals, defined as:

$$\hat{r}_i = \frac{\hat{e}_i}{s\sqrt{1 - h_{ii}}} \approx \frac{e_i}{\sigma} \sim N(0, 1)$$
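In R these are available via rstandard(); a minimal check of the formula on simulated data:

# standardized residuals: r_i = e_hat_i / (s * sqrt(1 - h_ii))
set.seed(3)
x <- rnorm(40); y <- 1 + 2*x + rnorm(40)
fit <- lm(y ~ x)
r <- resid(fit) / (sigma(fit) * sqrt(1 - hatvalues(fit)))
all.equal(r, rstandard(fit))   # matches R's built-in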
Regression Diagnostics
[Figure: standard regression diagnostic plots for the fitted model: Residuals vs Fitted, Normal Q-Q, and standardized-residual panels with Cook's distance contours; several observations (including 123, 227, and 236) are flagged as unusual.]
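These panels are the default diagnostic plots R produces for a fitted lm object; assuming out is the lwage ~ exper regression from the example computations later in the deck, a minimal sketch:

# default lm diagnostics: Residuals vs Fitted, Normal Q-Q,
# Scale-Location, and Residuals vs Leverage (with Cook's distance)
par(mfrow = c(2, 2))
plot(out)
par(mfrow = c(1, 1))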
What to do with Influential Observations
Forecasting
Prediction and Prediction Error
$$\hat{Y}_{n+1} = X_{n+1}'\hat{\beta}$$
[Figure: at $X_{n+1}$, the true conditional mean line $E[Y|X] = \beta_0 + \beta_1 X$ and the fitted line $\hat{\beta}_0 + \hat{\beta}_1 X$; the realized $Y_{n+1}$, the prediction $\hat{Y}_{n+1}$, the prediction error $e_{n+1}$, and the sampling error between the two lines are marked.]
MSFE: Mean Squared Forecast Error
The mean squared forecast error is $\text{MSFE} = E\big[(Y_{n+1} - \hat{Y}_{n+1})^2\big]$. Notice that:
• MSFE $> \sigma^2$
• MSFE depends on $X_{n+1}$
• In particular, MSFE gets larger as $X_{n+1}$ is further from $\bar{X}$
Estimating MSFE & Prediction Intervals
Under conditional homoskedasticity (i.e., $\text{var}(e_i \mid X_i) = \sigma^2$ for all $i$), this simplifies to:

$$\text{MSFE} = \sigma^2\big(1 + X_{n+1}'(X'X)^{-1}X_{n+1}\big)$$

Replace $\sigma^2$ with $s^2$ to get an estimator $\widehat{\text{MSFE}}$ of the MSFE:

$$\widehat{\text{MSFE}} = s^2\big(1 + X_{n+1}'(X'X)^{-1}X_{n+1}\big)$$

Assuming the errors are normally distributed (i.e., $e \sim N(0, \sigma^2)$), we can construct prediction intervals using the estimated MSFE, where $c$ is the critical value from a $t_{n-k}$ distribution:

$$\hat{Y}_{n+1} \pm c \times \sqrt{\widehat{\text{MSFE}}}$$
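A minimal R sketch on simulated data, building the interval by hand and comparing it to predict()'s built-in prediction interval:

# 95% prediction interval at a new x value, by hand and via predict()
set.seed(7)
x <- rnorm(100); y <- 1 + 2*x + rnorm(100)
fit  <- lm(y ~ x)
xnew <- 1.5
Xnew <- c(1, xnew)
X    <- model.matrix(fit)
msfe <- sigma(fit)^2 * (1 + t(Xnew) %*% solve(t(X) %*% X) %*% Xnew)  # estimated MSFE
yhat <- sum(coef(fit) * Xnew)
cval <- qt(0.975, df = fit$df.residual)
c(low = yhat - cval*sqrt(msfe), high = yhat + cval*sqrt(msfe))
predict(fit, newdata = data.frame(x = xnew), interval = "prediction")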
Cross Validation
One way to assess prediction error is to set aside some of your data (the "validation" data), fit your model on the rest, and then assess predictions on the observations in the validation dataset.
• When data are scarce (as is often the case), this is not possible
A general, simple, and widely-used method for estimating prediction error (in any
model, not just linear regression) is K-fold cross validation.
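A minimal sketch of K-fold cross validation for a simple linear regression on simulated data (K = 5 here is an arbitrary choice):

# K-fold cross validation for a simple linear regression
set.seed(99)
n <- 100
x <- rnorm(n); y <- 1 + 2*x + rnorm(n)
K <- 5
fold <- sample(rep(1:K, length.out = n))   # randomly assign each observation to a fold
cv_err <- numeric(K)
for (k in 1:K) {
  train <- fold != k
  fit_k <- lm(y[train] ~ x[train])
  pred  <- coef(fit_k)[1] + coef(fit_k)[2] * x[!train]
  cv_err[k] <- mean((y[!train] - pred)^2)  # mean squared prediction error on fold k
}
mean(cv_err)                               # CV estimate of prediction error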
Computation
[1] 0.511118
# leave-one-out regression (dat holds the exper/lwage sample; see the data setup below)
n <- nrow(dat)
y <- dat$lwage
x <- cbind(1, dat$exper)
etilde_loo <- vector(length = n)
for (i in 1:n) {
  coef_est <- coef(lm(lwage ~ exper, data = dat[-i, ]))
  etilde_loo[i] <- y[i] - x[i, ] %*% coef_est
}
mean(etilde_loo^2)

[1] 0.5046459
Bootstrap
Bootstrap Procedure
While CV is a general purpose method for assessing prediction errors, the bootstrap
procedure is a general purpose method for assessing standard errors (and thus
confidence intervals and hypothesis tests).
The procedure:
• Sample (with replacement) $n$ observations from the original dataset; call this bootstrap sample $b$
• Calculate the parameter (or quantity of interest) from bootstrap sample $b$; call this $\hat{\theta}^{(b)}$
• Repeat the first two steps $B$ times (often $B = 1{,}000$ or $B = 10{,}000$)
• Then assess the distribution of the $B$ values of $\hat{\theta}^{(b)}$, as explained on the next two slides
Bootstrap Standard Errors
The bootstrap estimator of the variance of an estimator $\hat{\theta}$ is the sample variance across the bootstrap parameter estimates (or quantity of interest):

$$\hat{V}^{\text{boot}}_{\hat{\theta}} = \frac{1}{B-1}\sum_{b=1}^{B}\big(\hat{\theta}^{(b)} - \bar{\theta}\big)\big(\hat{\theta}^{(b)} - \bar{\theta}\big)' \quad\text{where}\quad \bar{\theta} = \frac{1}{B}\sum_{b=1}^{B}\hat{\theta}^{(b)}$$

$$s^{\text{boot}}(\hat{\theta}_j) = \sqrt{\Big[\hat{V}^{\text{boot}}_{\hat{\theta}}\Big]_{jj}}$$

$$\text{CI}^{\text{boot-se}}_j = \Big(\hat{\theta}_j - c \times s^{\text{boot}}(\hat{\theta}_j)\,,\;\ \hat{\theta}_j + c \times s^{\text{boot}}(\hat{\theta}_j)\Big) \quad\text{where}\quad c = z^*_{1-\alpha/2}$$
Bootstrap Percentile Intervals
Given that we have an empirical distribution, we can simply take the empirical
quantiles as the confidence interval boundary values.
• For example, if you have 10,000 bootstrap values of $\hat{\theta}^{(b)}$, sort those values (call the sorted values $\hat{\theta}_*^{(b)}$) and take $\hat{\theta}_*^{(250)}$ and $\hat{\theta}_*^{(9750)}$ as your 95% confidence interval:

$$\text{CI}^{\text{boot-pi}} = \big(\hat{\theta}_*^{(250)},\; \hat{\theta}_*^{(9750)}\big)$$
Example Computations
# get data
dat <- read.table("support/cps09mar.txt")
exper <- dat[,1] - dat[,4] - 6
lwage <- log( dat[,5]/(dat[,6]*dat[,7]) )
sam <- dat[,11]==4 & dat[,12]==7 & dat[,2]==0
dat <- data.frame(exper=exper[sam], lwage=lwage[sam])

# run regression
out <- lm(lwage ~ exper, data=dat)
tt <- summary(out)$coefficients[1:2,1:2]
tt

               Estimate  Std. Error
(Intercept) 2.876515044 0.067631401
exper       0.004776039 0.004335196

# calculate CIs
cbind(low = tt[,1] - 2*tt[,2],
      high = tt[,1] + 2*tt[,2])

                     low       high
(Intercept)  2.741252241 3.01177785
exper       -0.003894353 0.01344643

# bootstrap
set.seed(1234)
B <- 1000
n <- nrow(dat)
res <- matrix(NA_real_, nrow=B, ncol=2)
for(b in 1:B) {
  draws <- sample(1:n, size=n, replace=T)
  res[b,] <- lm(lwage ~ exper, data=dat[draws, ])$coef
}

# CIs from stderr
serr <- apply(res, 2, function(x) sqrt(var(x)))
cbind(low = out$coef - 2*serr,
      high = out$coef + 2*serr)

                     low       high
(Intercept)  2.737715643 3.01531444
exper       -0.003596382 0.01314846

# CIs from percentiles
cbind(sort(res[,1])[c(25, 975)],
      sort(res[,2])[c(25, 975)])

         [,1]         [,2]
[1,] 2.742939 -0.003159473
[2,] 3.012780  0.013336742
Next Time
Next time:
• Bayes!