
The classic and the predictive paradigms

Decision Theory
Big Data and Machine Learning for Applied Economics

Ignacio Sarmiento-Barbieri

Universidad de los Andes

August 10, 2022



Agenda

1 Shifting Paradigms

2 Statistical Decision Theory: A bit of theory

3 Linear Regression



Shifting Paradigms
Big vs Small

- Classical statistics ("small" data?)

  - Get the most out of limited data (Gosset)
  - Lots of structure, e.g. X1, X2, ..., Xn ∼ tν
  - Carefully curated data that approximate random sampling: expensive and slow, but very good and reliable

- Big Data (the 4 V's)

  - Data Volume
  - Data Variety
  - Data Velocity
  - Data Value



Classic vs Predictive: The Classic Paradigm

Y = f(X) + u    (1)

- Interest lies in inference

  - The "correct" f(·), to understand how Y is affected by X
  - Model: theory, experiment
  - Hypothesis testing (standard errors, tests)



Classic vs Predictive: The Predictive Paradigm

Y = f(X) + u    (2)

- Interest lies in predicting Y
- The "correct" f(·) is whichever predicts well (no inference!)
- Model?





Statistical Decision Theory: A bit of theory
How do we measure "whatever works, works"?

- We need a bit of theory to give us a framework for choosing f
- A decision theory approach involves an action space A
- The action space A specifies the possible "actions we might take"
- Some examples:

Table 1: Action Spaces

Inference          Action Space
Estimation         A = Θ
Prediction         A = space of Xn+1
Model Selection    A = {Model I, Model II, ...}
Hyp. Testing       A = {Reject H0, Accept H0}



Statistical Decision Theory: A bit of theory

- The loss function specifies the penalty for taking a particular action

L : Θ × A → [0, ∞)    (3)

- The loss is minimized when the action taken is correct
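
As a concrete illustration, a few standard loss functions written as plain Python functions (a hypothetical sketch; the names are ours, not from any library):

```python
# Squared error loss: penalizes deviations of the action a from theta quadratically
def squared_error(theta, a):
    return (theta - a) ** 2

# Absolute error loss: penalizes deviations linearly
def absolute_error(theta, a):
    return abs(theta - a)

# Zero-one loss: no penalty for the correct action, unit penalty otherwise
def zero_one(theta, a):
    return 0.0 if a == theta else 1.0
```

In all three cases the loss is zero exactly when the action matches the target, as the slide notes.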



Statistical Decision Theory: A bit of theory
Risk Function

- In a decision-theoretic analysis, the quality of an estimator is quantified by its risk function
- For example, for an estimator δ(X) of θ, the risk function is

R(θ, δ) = Eθ[L(θ, δ(X))]    (4)

- At a given θ, the risk function is the average loss that will be incurred if the estimator δ(X) is used



Statistical Decision Theory: A bit of theory
Risk Function

- Since θ is unknown, we would like to use an estimator with a small value of R(θ, δ) for all values of θ
- If we need to compare two estimators, δ1 and δ2, we compare their risk functions
- If R(θ, δ1) < R(θ, δ2) for all θ ∈ Θ, then δ1 is preferred because it performs better for every θ



Statistical Decision Theory: A bit of theory
Example: Binomial Risk Function

- Let X1, X2, ..., Xn ∼ iid Bernoulli(p)
- Consider two estimators of p:

  1. p̂1 = (1/n) ∑ Xi
  2. p̂2 = (∑ Xi + √(n/4)) / (n + √n)

- Their risks under squared error loss are:

  1. R(p̂1, p) = p(1 − p)/n
  2. R(p̂2, p) = n / (4(n + √n)²)
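
As a quick check, a minimal Monte Carlo sketch (Python with numpy; the helper names are ours) comparing simulated risks against these closed-form expressions:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_risk(estimator, p, n, reps=200_000):
    """Monte Carlo estimate of the risk E[(p_hat - p)^2]."""
    s = rng.binomial(n, p, size=reps)  # s = sum of the X_i's
    return np.mean((estimator(s, n) - p) ** 2)

p_hat_1 = lambda s, n: s / n                                    # sample mean
p_hat_2 = lambda s, n: (s + np.sqrt(n / 4)) / (n + np.sqrt(n))  # shrinkage estimator

n, p = 4, 0.5
print(mc_risk(p_hat_1, p, n), p * (1 - p) / n)                # both ~0.0625
print(mc_risk(p_hat_2, p, n), n / (4 * (n + np.sqrt(n))**2))  # both ~0.0278
```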



Statistical Decision Theory: A bit of theory
Example: Binomial Risk Function

[Figure: risk of each estimator as a function of p. Panel (a): n = 4; panel (b): n = 400. R(p̂1, p) varies with p, while R(p̂2, p) is constant.]
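
A sketch that reproduces both panels (Python, assuming numpy and matplotlib are available):

```python
import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0, 1, 200)

fig, axes = plt.subplots(1, 2, figsize=(9, 4))
for ax, n in zip(axes, [4, 400]):
    ax.plot(p, p * (1 - p) / n, label="p1")  # risk of the sample mean
    ax.plot(p, np.full_like(p, n / (4 * (n + np.sqrt(n))**2)), label="p2")  # constant risk
    ax.set(title=f"n = {n}", xlabel="p", ylabel="Risk")
    ax.legend(title="Estimators")
plt.tight_layout()
plt.show()
```

For n = 4 the shrinkage estimator p̂2 dominates over most of the range of p; for n = 400 the sample mean has lower risk except near p = 1/2.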



Decision Theory for prediction
How to choose f?

- In a prediction problem we want to predict Y from f(X) in such a way that the loss is minimized
- Decision theory requires us to specify a loss function that penalizes errors in prediction
- The most common is the squared error loss

(y − f(X))²    (5)

- We assume that X ∈ Rp and Y ∈ R, with joint distribution Pr(X, Y)



Decision Theory for prediction
How to choose f?

- Minimizing the risk leads to a criterion for choosing f(X)

R(Y, f(X)) = E[(Y − f(X))²]    (6)
           = ∫ (y − f(x))² Pr(dx, dy)    (7)

- Conditioning on X, we have

R(Y, f(X)) = EX EY|X[(Y − f(X))²|X]    (8)

- This risk is also known as the mean squared (prediction) error, MSE(f), or the expected prediction error (EPE)



Decision Theory for prediction
- By conditioning on X, it suffices to minimize MSE(f) pointwise, so

f(x) = argmin_m EY|X[(Y − m)²|X = x]    (9)

- Here Y is a random variable and m is a constant (the predictor)

min_m E(Y − m)² = min_m ∫ (y − m)² f(y) dy    (10)

- Expanding gives E(Y − m)² = E(Y²) − 2m E(Y) + m², and the first-order condition yields m = E(Y); conditionally on X = x this is E(Y|X = x)
- Result: the best prediction of Y at any point X = x is the conditional mean, f(x) = E(Y|X = x), when "best" is measured by squared error loss
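
A small numerical illustration of this result (Python with numpy; the distribution is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=100_000)  # draws of Y; E(Y) = 2

# Average squared loss over a grid of constant predictors m
grid = np.linspace(0.0, 5.0, 501)
losses = [np.mean((y - m) ** 2) for m in grid]

print(grid[np.argmin(losses)], y.mean())  # the minimizer is approximately E(Y)
```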



Linear Regression
- Note the following from the Regression-CEF Theorem: the function X′β provides the minimum risk linear approximation to E(Y|X), that is,

β = argmin_b E[(E(Y|X) − X′b)²]    (11)

- Proof:

(Y − X′b)² = ((Y − E(Y|X)) + (E(Y|X) − X′b))²    (12)
           = (Y − E(Y|X))² + (E(Y|X) − X′b)² + 2(Y − E(Y|X))(E(Y|X) − X′b)    (13)

- Taking expectations, the first term does not depend on b and the cross term vanishes by the law of iterated expectations, so minimizing E(Y − X′b)² over b is the same as minimizing E[(E(Y|X) − X′b)²]
- The CEF approximation problem therefore has the same solution as the population least squares problem
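
A simulation sketch of the theorem (Python with numpy; the nonlinear CEF below is an arbitrary choice for illustration). Regressing Y on X and regressing the CEF E(Y|X) on X recover the same linear coefficients:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.uniform(-2, 2, size=n)
cef = np.exp(x / 2)            # E(Y|X) = exp(X/2), a nonlinear CEF
y = cef + rng.normal(size=n)   # Y = E(Y|X) + u, with E(u|X) = 0

X = np.column_stack([np.ones(n), x])  # design matrix with an intercept

beta_y, *_ = np.linalg.lstsq(X, y, rcond=None)      # regress Y on X
beta_cef, *_ = np.linalg.lstsq(X, cef, rcond=None)  # regress E(Y|X) on X

print(beta_y)    # the two coefficient vectors coincide
print(beta_cef)  # up to simulation noise
```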



Linear Regression

- Regression provides the best linear predictor of the dependent variable, in the same way that the CEF is the best unrestricted predictor of the dependent variable
- The fact that regression approximates the CEF is useful because it helps describe the essential features of statistical relationships without necessarily trying to pin them down exactly
- Linear regression is the "workhorse" of econometrics and of (supervised) machine learning
- It is very powerful in many contexts, so there is a big "payday" to studying this model in detail



Further Readings

- Angrist, J. D., & Pischke, J. S. (2008). Mostly Harmless Econometrics. Princeton University Press.

- Casella, G., & Berger, R. L. (2002). Statistical Inference (Vol. 2, pp. 337-472). Pacific Grove, CA: Duxbury.

- Elliott, G., & Timmermann, A. (2008). Economic forecasting. Journal of Economic Literature, 46(1), 3-56.

- Shaffer, T. (2017). The 42 V's of Big Data and Data Science. https://www.kdnuggets.com/2017/04/42-vs-big-data-data-science.html

