Decision Theory
Big Data and Machine Learning for Applied Economics
Ignacio Sarmiento-Barbieri
1 Shifting Paradigms
3 Linear Regression
Y = f(X) + u (1)
I Interest is in predicting Y
I We want a "correct" f() so that we can predict well (no inference!)
I Which model should we use?
I The loss function specifies the penalty for taking a particular action when θ is the true state
L : Θ × A → [0, ∞) (3)
I At a given θ, the risk function is the average loss incurred when the estimator δ(X) is used: R(θ, δ) = E_θ [L(θ, δ(X))]
I Since θ is unknown, we would like to use an estimator that has a small value of R(θ, δ) for all values of θ
I To compare two estimators δ1 and δ2, we compare their risk functions
I If R(θ, δ1) < R(θ, δ2) for all θ ∈ Θ, then δ1 is preferred because it performs better for every θ
[Figure: risk of two estimators, p1 and p2, plotted against p ∈ [0, 1] (two panels; vertical axis: Risk)]
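The p1/p2 comparison in the figure can be sketched numerically. Below is a Monte Carlo approximation of the risk R(p, δ) = E_p[(δ(X) − p)^2] of two estimators of a Bernoulli proportion; the concrete choices here (p1 the sample proportion, p2 a shrinkage estimator) are illustrative assumptions, not necessarily the estimators plotted in the figure.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 30, 20_000

def risk(estimator, p):
    """Monte Carlo risk E_p[(delta(X) - p)^2] under squared error loss."""
    x = rng.binomial(n, p, size=reps)      # draws of the sufficient statistic
    return np.mean((estimator(x) - p) ** 2)

p1 = lambda x: x / n            # sample proportion (MLE), illustrative choice
p2 = lambda x: (x + 1) / (n + 2)  # shrinkage estimator, illustrative choice

for p in [0.05, 0.5, 0.9]:
    print(f"p={p:.2f}  R(p, p1)={risk(p1, p):.5f}  R(p, p2)={risk(p2, p):.5f}")
```

Neither estimator has uniformly smaller risk: p2 does better near p = 0.5, while p1 does better near the boundary, which is exactly why risk functions must be compared over all of Θ.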
I In a prediction problem we want to predict Y from f(X) so that the loss is as small as possible
I Decision theory requires us to specify a loss function that penalizes errors in prediction
I The most common is the squared error loss
(y − f(X))^2 (5)
I The risk under this loss is known as the mean squared (prediction) error, MSE(f), or expected prediction error (EPE)
min_m E(Y − m)^2 = min_m ∫ (y − m)^2 f(y) dy (10)
I Result: the best prediction of Y at any point X = x is the conditional mean E(Y|X = x), when "best" is measured by squared error loss
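A quick numerical check of this result (a sketch, with an arbitrary distribution chosen for illustration): among all constant predictions m on a grid, the one minimizing the average squared error is the sample mean.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=100_000)   # any distribution works here

# Average squared loss for each constant prediction m on a grid
grid = np.linspace(0.0, 5.0, 501)
mse = [np.mean((y - m) ** 2) for m in grid]
best = grid[np.argmin(mse)]

print(f"argmin_m mean((y - m)^2) = {best:.2f}, sample mean = {y.mean():.2f}")
```

The empirical MSE is quadratic in m with its minimum exactly at the sample mean, the finite-sample analogue of E(Y) minimizing E(Y − m)^2.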
I Proof
(Y − X'b)^2 = ((Y − E(Y|X)) + (E(Y|X) − X'b))^2 (12)
= (Y − E(Y|X))^2 + (E(Y|X) − X'b)^2 + 2(Y − E(Y|X))(E(Y|X) − X'b) (13)
I Taking expectations, the cross term vanishes by iterated expectations (E[(Y − E(Y|X)) h(X)] = 0 for any function h), and the first term does not involve b, so minimizing E(Y − X'b)^2 over b is equivalent to minimizing E(E(Y|X) − X'b)^2
I The CEF approximation problem therefore has the same solution as the population least squares problem
I Regression provides the best linear predictor for the dependent variable in the same
way that the CEF is the best unrestricted predictor of the dependent variable.
I The fact that Regression approximates the CEF is useful because it helps describe the
essential features of statistical relationships, without necessarily trying to pin them
down exactly.
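This equivalence is easy to see in simulation. The sketch below (a hypothetical setup, assuming a nonlinear CEF E(Y|X) = exp(X/2)) regresses both Y and the CEF itself on X; the two OLS coefficient vectors nearly coincide, since the residual Y − E(Y|X) is mean-independent of X.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

x = rng.normal(size=n)
cef = np.exp(x / 2)                 # nonlinear CEF: E(Y|X) = exp(X/2)
y = cef + rng.normal(size=n)        # Y = E(Y|X) + u, with E(u|X) = 0

X = np.column_stack([np.ones(n), x])
b_y, *_ = np.linalg.lstsq(X, y, rcond=None)      # regress Y on X
b_cef, *_ = np.linalg.lstsq(X, cef, rcond=None)  # regress the CEF on X

print("coefficients from Y:  ", np.round(b_y, 3))
print("coefficients from CEF:", np.round(b_cef, 3))
```

Both regressions recover the same best linear predictor, even though neither the CEF nor Y is linear in X, illustrating that regression approximates the CEF.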
I Linear regression is the "workhorse" of econometrics and (supervised) machine learning.
I It is very powerful in many contexts.
I There is a big "payday" to studying this model in detail.