
The classic and the predictive paradigms

Decision Theory
Big Data and Machine Learning for Applied Economics

Ignacio Sarmiento-Barbieri

Universidad de los Andes

August 10, 2022



Agenda

1 Shifting Paradigms

2 Statistical Decision Theory: A bit of theory

3 Linear Regression



Shifting Paradigms
Big vs Small

- Classical statistics ("small" data?)

  - Get the most out of limited data (Gosset)
  - Lots of structure, e.g. X1, X2, ..., Xn ∼ tν
  - Carefully curated data that approximate random sampling: expensive and slow, but very good and reliable

- Big Data (the 4 V's)

  - Data Volume
  - Data Variety
  - Data Velocity
  - Data Value



Classic vs Predictive: The Classic Paradigm

Y = f(X) + u    (1)

- Interest lies in inference

  - The "correct" f(·), to understand how Y is affected by X
  - Model: theory, experiment
  - Hypothesis testing (standard errors, tests)



Classic vs Predictive: The Predictive Paradigm

Y = f(X) + u    (2)

- Interest lies in predicting Y
- The "correct" f(·) is whichever predicts well (no inference!)
- Model?





Statistical Decision Theory: A bit of theory
How do we measure "whatever works, works"?

- We need a bit of theory to give us a framework for choosing f
- A decision theory approach involves an action space A
- The action space A specifies the possible "actions we might take"
- Some examples:

Table 1: Action Spaces

Inference          Action Space
Estimation         A = Θ
Prediction         A = space of Xn+1
Model Selection    A = {Model I, Model II, ...}
Hyp. Testing       A = {Reject H0, Accept H0}



Statistical Decision Theory: A bit of theory

- The loss function specifies the penalty for taking a particular action

L : Θ × A → [0, ∞)    (3)

- The loss is minimized when the action taken is correct
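
As a concrete illustration, a few standard loss functions written as plain Python functions (a hypothetical sketch; the names are ours, not from any library):

```python
# Squared error loss: penalizes deviations of the action a from theta quadratically
def squared_error(theta, a):
    return (theta - a) ** 2

# Absolute error loss: penalizes deviations linearly
def absolute_error(theta, a):
    return abs(theta - a)

# Zero-one loss: no penalty for the correct action, unit penalty otherwise
def zero_one(theta, a):
    return 0.0 if a == theta else 1.0
```

In all three cases the loss is zero exactly when the action matches the target, as the slide notes.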



Statistical Decision Theory: A bit of theory
Risk Function

- In a decision-theoretic analysis, the quality of an estimator is quantified by its risk function
- For example, for an estimator δ(X) of θ, the risk function is

R(θ, δ) = Eθ[L(θ, δ(X))]    (4)

- At a given θ, the risk function is the average loss that will be incurred if the estimator δ(X) is used



Statistical Decision Theory: A bit of theory
Risk Function

- Since θ is unknown, we would like to use an estimator with a small value of R(θ, δ) for all values of θ
- If we need to compare two estimators, δ1 and δ2, we compare their risk functions
- If R(θ, δ1) < R(θ, δ2) for all θ ∈ Θ, then δ1 is preferred because it performs better for every θ



Statistical Decision Theory: A bit of theory
Example: Binomial Risk Function

- Let X1, X2, ..., Xn ∼ iid Bernoulli(p)
- Consider two estimators of p:

  1. p̂1 = (1/n) ∑ Xi
  2. p̂2 = (∑ Xi + √(n/4)) / (n + √n)

- Their risks under squared error loss are:

  1. R(p̂1, p) = p(1 − p)/n
  2. R(p̂2, p) = n / (4(n + √n)²)
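
As a quick check, a minimal Monte Carlo sketch (Python with numpy; the helper names are ours) comparing simulated risks against these closed-form expressions:

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_risk(estimator, p, n, reps=200_000):
    """Monte Carlo estimate of the risk E[(p_hat - p)^2]."""
    s = rng.binomial(n, p, size=reps)  # s = sum of the X_i's
    return np.mean((estimator(s, n) - p) ** 2)

p_hat_1 = lambda s, n: s / n                                    # sample mean
p_hat_2 = lambda s, n: (s + np.sqrt(n / 4)) / (n + np.sqrt(n))  # shrinkage estimator

n, p = 4, 0.5
print(mc_risk(p_hat_1, p, n), p * (1 - p) / n)                # both ~0.0625
print(mc_risk(p_hat_2, p, n), n / (4 * (n + np.sqrt(n))**2))  # both ~0.0278
```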



Statistical Decision Theory: A bit of theory
Example: Binomial Risk Function

[Figure: risk of each estimator as a function of p. Panel (a): n = 4; panel (b): n = 400. R(p̂1, p) varies with p, while R(p̂2, p) is constant.]
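
A sketch that reproduces both panels (Python, assuming numpy and matplotlib are available):

```python
import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0, 1, 200)

fig, axes = plt.subplots(1, 2, figsize=(9, 4))
for ax, n in zip(axes, [4, 400]):
    ax.plot(p, p * (1 - p) / n, label="p1")  # risk of the sample mean
    ax.plot(p, np.full_like(p, n / (4 * (n + np.sqrt(n))**2)), label="p2")  # constant risk
    ax.set(title=f"n = {n}", xlabel="p", ylabel="Risk")
    ax.legend(title="Estimators")
plt.tight_layout()
plt.show()
```

For n = 4 the shrinkage estimator p̂2 dominates over most of the range of p; for n = 400 the sample mean has lower risk except near p = 1/2.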



Decision Theory for prediction
How to choose f?

- In a prediction problem we want to predict Y from f(X) in such a way that the loss is minimized
- Decision theory requires us to specify a loss function that penalizes errors in prediction
- The most common is the squared error loss

(y − f(X))²    (5)

- We assume that X ∈ Rp and Y ∈ R, with joint distribution Pr(X, Y)



Decision Theory for prediction
How to choose f?

- Minimizing the risk leads to a criterion for choosing f(X)

R(Y, f(X)) = E[(Y − f(X))²]    (6)
           = ∫ (y − f(x))² Pr(dx, dy)    (7)

- Conditioning on X, we have

R(Y, f(X)) = EX EY|X[(Y − f(X))²|X]    (8)

- This risk is also known as the mean squared (prediction) error, MSE(f), or the expected prediction error (EPE)



Decision Theory for prediction
- By conditioning on X, it suffices to minimize MSE(f) pointwise, so

f(x) = argmin_m EY|X[(Y − m)²|X = x]    (9)

- Here Y is a random variable and m is a constant (the predictor)

min_m E(Y − m)² = min_m ∫ (y − m)² f(y) dy    (10)

- Expanding gives E(Y − m)² = E(Y²) − 2m E(Y) + m², and the first-order condition yields m = E(Y); conditionally on X = x this is E(Y|X = x)
- Result: the best prediction of Y at any point X = x is the conditional mean, f(x) = E(Y|X = x), when "best" is measured by squared error loss
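
A small numerical illustration of this result (Python with numpy; the distribution is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=100_000)  # draws of Y; E(Y) = 2

# Average squared loss over a grid of constant predictors m
grid = np.linspace(0.0, 5.0, 501)
losses = [np.mean((y - m) ** 2) for m in grid]

print(grid[np.argmin(losses)], y.mean())  # the minimizer is approximately E(Y)
```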



Linear Regression
- Note the following from the Regression-CEF Theorem: the function X′β provides the minimum risk linear approximation to E(Y|X), that is,

β = argmin_b E[(E(Y|X) − X′b)²]    (11)

- Proof:

(Y − X′b)² = ((Y − E(Y|X)) + (E(Y|X) − X′b))²    (12)
           = (Y − E(Y|X))² + (E(Y|X) − X′b)² + 2(Y − E(Y|X))(E(Y|X) − X′b)    (13)

- Taking expectations, the first term does not depend on b and the cross term vanishes by the law of iterated expectations, so minimizing E(Y − X′b)² over b is the same as minimizing E[(E(Y|X) − X′b)²]
- The CEF approximation problem therefore has the same solution as the population least squares problem
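
A simulation sketch of the theorem (Python with numpy; the nonlinear CEF below is an arbitrary choice for illustration). Regressing Y on X and regressing the CEF E(Y|X) on X recover the same linear coefficients:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
x = rng.uniform(-2, 2, size=n)
cef = np.exp(x / 2)            # E(Y|X) = exp(X/2), a nonlinear CEF
y = cef + rng.normal(size=n)   # Y = E(Y|X) + u, with E(u|X) = 0

X = np.column_stack([np.ones(n), x])  # design matrix with an intercept

beta_y, *_ = np.linalg.lstsq(X, y, rcond=None)      # regress Y on X
beta_cef, *_ = np.linalg.lstsq(X, cef, rcond=None)  # regress E(Y|X) on X

print(beta_y)    # the two coefficient vectors coincide
print(beta_cef)  # up to simulation noise
```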



Linear Regression

- Regression provides the best linear predictor of the dependent variable, in the same way that the CEF is the best unrestricted predictor of the dependent variable
- The fact that regression approximates the CEF is useful because it helps describe the essential features of statistical relationships without necessarily trying to pin them down exactly
- Linear regression is the "workhorse" of econometrics and of (supervised) machine learning
- It is very powerful in many contexts, so there is a big "payday" to studying this model in detail



Further Readings

- Angrist, J. D., & Pischke, J. S. (2008). Mostly Harmless Econometrics. Princeton University Press.

- Casella, G., & Berger, R. L. (2002). Statistical Inference (Vol. 2, pp. 337-472). Pacific Grove, CA: Duxbury.

- Elliott, G., & Timmermann, A. (2008). Economic forecasting. Journal of Economic Literature, 46(1), 3-56.

- Shaffer, T. (2017). The 42 V's of Big Data and Data Science. https://www.kdnuggets.com/2017/04/42-vs-big-data-data-science.html

