
ECON3210
Big Data Econometrics
Lecture Slides Week 1
Fangzhou Yu


Week 1: Course Overview


Econometrics in a Big Data world

Course overview & themes

Big Data & research design

Econometrics vs ML or Econometrics & ML

Prediction & Causal inference

Prediction methods for causal inference

This course is 20% Prediction + 80% Causal inference

The key takeaway from this course:

Difference between Prediction and Causal inference


1. What is Big Data?


Data, data everywhere


Conventional sources

Statistical agencies such as the Australian Bureau of Statistics (ABS)

Census of Population & Housing

Surveys such as Australian Health Survey

GDP, employment rate, income…

Reserve Bank of Australia (RBA)

Interest rates, exchange rates…

Administrative data

Firms, government agencies routinely record their operations

Sources of data have exploded & become more diverse

Era of Big Data & the data deluge


What exactly is Big Data?


According to NSF and NIH — two of the largest funders of academic research on big data

In their 2012 joint program solicitation:

Big data […] refers to large, diverse, complex, longitudinal, and/or distributed data sets generated from instruments,
sensors, Internet transactions, email, video, click streams, and/or all other digital sources available

Or according to the “3Vs” definition of Big Data

The definition of big data is data that contains greater variety, arriving in increasing volumes and with more
velocity.


What exactly is Big Data?


“The GDELT Project is an initiative to construct a catalog of human societal-scale behavior and beliefs across all countries of the world, connecting every person, organization, location, count, theme, news source, and event across the planet into a single massive network that captures what’s happening around the world, what its context is and who’s involved, and how the world is feeling about it, every single day.”


What exactly is Big Data?


Also observe many more “traditional” surveys

Data sources varied & not monopoly of statistical agencies

Data collection costs are relatively low

Researchers and businesses are developing new data sources

Online panels

Lab, field & social experiments

However, the above definitions/examples of Big Data are vague

Later, we will discuss in more detail the kind of “Big Data” covered in this course


Common data in ECON


A common data set we have in Economics

n observations and p variables

$p \ll n$: $10^2$ to $10^5$ observations, fewer than 10 or tens of variables

The traditional methods (e.g. OLS, IV) can handle this


Big n problem ✗
Big n is when the number of observations explodes

Census data / user data in big tech firms (Google, Amazon…) / high-frequency data (stock transactions)

The size of the data file is the major challenge

New software/platform to analyze this kind of data: SQL/Hive/Hadoop

But little challenge to the traditional methods

Not discussed in this course


Big p problem ✓

Big p is when the number of variables explodes

Sometimes due to a rich dataset

Sometimes we create a Big p problem by adding in polynomials or interactions to capture complex relationships

The dataset does not have to be big in size

Huge challenge for traditional methods (will discuss this next week)

Sometimes also called high-dimensional data

This is the focus of this course
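To get a feel for how quickly polynomials and interactions create a Big p problem, the following minimal Python sketch counts the regressors generated from k raw variables. The helper name `n_polynomial_terms` is made up for illustration; the count uses the standard combinatorial formula for the number of non-constant monomials up to a given total degree.

```python
from math import comb


def n_polynomial_terms(k, degree):
    """Number of non-constant monomials in k variables up to the given total degree."""
    return comb(k + degree, degree) - 1


# Regressor counts for degrees 1, 2 and 3 as the number of raw variables grows.
for k in (5, 10, 20):
    print(k, [n_polynomial_terms(k, d) for d in (1, 2, 3)])
```

With 10 raw variables, a full second-degree expansion already has 65 regressors (10 linear terms, 10 squares, 45 pairwise interactions), so p can approach or exceed n even in a modest dataset.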

2. Big Data with different research objectives


Research objectives
Descriptive Analytics: summarizing & exploring data

Is there a problem with customer churn?

Inferring “social networks” from social media interactions

Our world in data

Predictive Analytics: forecasting out-of-sample data

Which customers are at the most risk of churning?

What will GDP / the employment rate / electricity consumption be next year?

“Your recommendations” on Amazon, Netflix …


Research objectives
Causal Inference: predicting counterfactuals

Will customers stay if they are offered discounts?

Is it profitable if online advertising is increased?

What is the effect of lockdown during Covid-19?

What is the impact of combating inflation by raising interest rates on the housing market?

How much is the gender wage gap?

Why does incumbency status affect election outcomes?

This kind of research objective is the main focus of Economics

Differs from the other two because correlation is not causation


Big p challenge under research objectives


For Descriptive Analytics and Predictive Analytics

Machine Learning methods are developed to deal with Big p

Example: ChatGPT, a Large Language Model (LLM)

| Model | Release Date | Training Data Size | Parameters | Training Data Source |
|---|---|---|---|---|
| GPT-1 | 2018 | ~800 million words | 117M | BooksCorpus (7,000 unpublished books) |
| GPT-2 | 2019 | ~40GB | 1.5B | WebText (45 million pages from diverse web sources) |
| GPT-3 | 2020 | Hundreds of GBs | 175B | WebText2 (Extended and larger version of WebText) |
| GPT-3.5 | Not available | Not available | Not available | Not available |
| GPT-4 | Knowledge cutoff 2021 | Not available | Not available | Assumed larger and more diverse dataset |


The previous page was generated by ChatGPT (GPT-4)


Big p challenge under research objectives


For Causal Inference

Machine Learning methods cannot be applied directly

ML is algorithm-based while Econometrics is model- or design-based

ML stresses accurate predictions $\hat{Y}$

while Econometrics stresses relationships $\hat{\beta}$ and their inference (e.g. hypothesis testing)

ML relies on data-driven model selection

while Econometrics starts with a model based on economic theory


“…ML tools are becoming standard across disciplines, so the economist’s toolkit needs to adapt accordingly while
preserving the traditional strengths of applied econometrics.” - Athey and Imbens (2019)

The main task of ECON3210 is to introduce recent research in causal inference with ML methods

Susan Athey: Professor of Economics at Stanford, President of American Economic Association, Chief Economist at
Microsoft

Guido Imbens: Professor of Economics at Stanford, Nobel Laureate in 2021 for his research in causal inference, Chief
Editor of Econometrica, Husband of Susan Athey

3. Regression as description, prediction and causal inference


Linear regression for descriptive analysis


Linear regression model

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$

Consider sales ($Y_i$) & TV advertising ($X_i$) (advertising.csv)

Q1: What key features of the data are revealed by the scatter plot?

Without specifying a linear model, the OLS estimator $\hat{\beta}_1$ is just the (standardized) covariance of sales and TV advertising

$$\hat{\beta}_1 = \frac{\mathrm{Cov}(X_i, Y_i)}{\mathrm{Var}(X_i)}$$
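The covariance formula can be checked numerically. The sketch below is a minimal illustration in Python with NumPy on simulated data (the numbers are made up and are not the advertising.csv data); it compares Cov(X, Y) / Var(X) with the slope from a least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for TV advertising (x) and sales (y); not the course data.
x = rng.uniform(0, 300, size=200)
y = 7.0 + 0.05 * x + rng.normal(0, 3, size=200)

# Slope as sample covariance over sample variance (same ddof in both).
slope_cov = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Slope from a degree-1 least-squares fit; polyfit returns [slope, intercept].
slope_ols, intercept_ols = np.polyfit(x, y, deg=1)

print(slope_cov, slope_ols)
```

The two numbers agree up to floating-point error, which is the sense in which the OLS slope is purely a descriptive summary of how X and Y co-move.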


Linear regression for predictive analysis


Assume that the relationship between sales and TV ad is linear

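Or, without assuming linearity, OLS still gives the best linear predictor of Y given X: it minimizes mean squared prediction error among linear functions of X. (The Gauss-Markov theorem is a separate result: under a linear model and its classical assumptions, OLS is the Best Linear Unbiased Estimator, BLUE.)

Either way, we can use TV advertising to predict sales in an out-of-sample market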

$$\hat{Y}_i = 7.033 + 0.048 X_i$$

A market where $TV_i = 100$ → $\widehat{sales}_i = 11.833$
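The prediction is just the fitted line evaluated at a new value of X. A minimal Python sketch, using the coefficients reported on the slide (the function name `predict_sales` is made up for illustration):

```python
# Fitted coefficients as reported on the slide.
INTERCEPT = 7.033
SLOPE = 0.048


def predict_sales(tv):
    """Predicted sales from the fitted line for a given TV advertising level."""
    return INTERCEPT + SLOPE * tv


print(round(predict_sales(100), 3))
```

For TV = 100 this gives 7.033 + 0.048 × 100 = 11.833, matching the slide.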


Linear regression for causal inference


But may want the model to do even more - causality & “what-if” counterfactuals

What happens to sales in a particular market if TV advertising were increased?

Doesn’t our regression answer this question?

At least two threats to interpreting the regression causally

Confounding variables leading to omitted variable bias

What could be the confounding variables?

Reverse Causality

What if markets with low sales increase advertising?
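The confounding threat can be illustrated with a small simulation. This is a sketch under made-up assumptions: a hypothetical confounder `market_size` drives both advertising and sales, and the true effect of advertising is set to 0.5. None of these numbers come from the advertising data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical confounder: market size drives both advertising and sales.
market_size = rng.normal(0, 1, size=n)
tv = 2.0 * market_size + rng.normal(0, 1, size=n)
true_effect = 0.5
sales = true_effect * tv + 3.0 * market_size + rng.normal(0, 1, size=n)


def ols(y, *regressors):
    """OLS coefficients of y on an intercept plus the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


slope_short = ols(sales, tv)[1]               # confounder omitted
slope_long = ols(sales, tv, market_size)[1]   # confounder included

print(slope_short, slope_long)
```

The regression that omits the confounder returns a slope far above 0.5, while the one that includes it recovers roughly the true effect. In real data the confounder is often unobserved, which is why the problem is hard.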


Correlation is not causation


If X and Y are correlated, what could be the true relationship (or more formally, the data generating process (DGP))?

X could cause Y (causality we are looking for)

or Y could cause X (threat of reverse causality)

or X could cause Y , and C could cause both X and Y (threat of confounding)

or no causation, but C could cause both X and Y (threat of confounding)

or D, E, F …

Q2: What if X and Y are not correlated?


4. Introduction to causal inference


Causality & notion of ceteris paribus


Definition of causal effect of X on Y

How does variable Y change if X is changed but all other relevant factors are held constant?

In evaluating an intervention or policy change think of counterfactual outcomes

e.g. A person’s wage with & without higher education

Important to define the causal effect of interest

Useful to describe how an experiment would have to be designed to infer the causal effect in question

See this NSW doc and Netflix blog for a summary of experiments and A/B tests


Experiment 1
Impact of back-to-work program on employment

“If a person from the population of those looking for work is given access to a back-to-work program, will that increase their chance of employment?”

Implicit assumption: all other factors that influence employment (experience, ability, local employment
prospects…) are held fixed

Experiment:

Choose a group of workers looking for work

Randomly assign them to access the program or not

Compare employment outcomes in next period

The experiment works because of randomness: characteristics of people are independent of whether they receive the program or not
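Why randomness does the work can be seen in a small simulation. The sketch below assumes a made-up characteristic (`experience`) and an assumed program effect of 0.10 on the employment probability; it is an illustration, not the Haynes et al. (2012) trial.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Hypothetical characteristic that affects employment (e.g. years of experience).
experience = rng.normal(10, 3, size=n)

# Random assignment: a fair coin flip that ignores every characteristic.
program = rng.integers(0, 2, size=n)

# Employment depends on experience plus an assumed program effect of 0.10.
p_employed = np.clip(0.2 + 0.02 * experience + 0.10 * program, 0, 1)
employed = rng.random(n) < p_employed

# Balance check: average experience should be nearly equal across groups.
balance_gap = experience[program == 1].mean() - experience[program == 0].mean()

# Difference in employment rates between the two groups.
effect_hat = employed[program == 1].mean() - employed[program == 0].mean()

print(balance_gap, effect_hat)
```

Because assignment ignores experience, the two groups look alike on average, so the gap in employment rates can be attributed to the program.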


RCT evaluating back-to-work program

Haynes et al. (2012)

Experiment 2
A/B testing of a website landing page

“If a business rearranges its current website (UI), by how much will this change customer behaviors such as consumption, time spent on the website…?”

Implicit assumption: all other factors that influence customer behaviors are held fixed


A/B test by Netflix


Experiment 3
Measuring returns to education

“If a person is given another year of education, by how much will his or her wage increase?”

Implicit assumption: experience, family background, intelligence etc. are held fixed

How would you design an experiment for this?

What are the difficulties in your experiment?


Experiments in a Big Data world


Experiments by firms are common


Experiments in a Big Data world


Economists also run experiments


Experiments in a Big Data world


But most empirical analysis in Economics relies on non-experimental/observational data

Experiments can be expensive, and economists do not have the same funds as firms

Experiments can be unethical, such as random assignment of years of education

It is more challenging to identify causal effects in observational data

The difficulty is usually called “selection bias”

Next, need to formalize how we think about impact of a treatment


Potential outcome framework


Previous examples included various types of treatment: advertising, lockdown, raising interest rate…

Will concentrate on binary treatment or policy

Let $D_i$ represent a binary treatment

Customer saw new website ($D_i = 1$) or old ($D_i = 0$)

Person completed a university degree ($D_i = 1$) or not ($D_i = 0$)

In experiments, treatments are applied by chance

In observational data, treatments are more likely to be applied by choice

Deciding to go to university is a choice and unlikely to be as if randomly assigned, leading to a selection problem


Potential outcome framework

Consider two potential states of the world

Two potential outcomes depending on the treatment status

Yi(1) if treated

Yi(0) if untreated

Unit-level treatment/causal effect is

$$TE_i = Y_i(1) - Y_i(0)$$

We observe $Y_i = Y_i(1)$ if $D_i = 1$, and $Y_i = Y_i(0)$ if $D_i = 0$


Potential outcome framework


I have a headache and need to decide whether to take a pill or not

Binary treatment $D_i$: take the pill or not

Graph by Brady Neal



Potential outcome framework

Di = 1

Yi(Di) = Yi(1) = 1

Di = 0

Yi(Di) = Yi(0) = 0


Potential outcome framework

Di = 1

Yi(Di) = Yi(1) = 1

Di = 0

Yi(Di) = Yi(0) = 1


Potential outcome framework

Q3: For any individual, the treatment effect cannot be identified/estimated. Why?

Q4: What if we can observe more individuals?


Potential outcome framework


The true status of the two parallel worlds

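| i | D | Y | Y(1) | Y(0) | Y(1) − Y(0) |
|---|---|---|------|------|-------------|
| 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 1 | 1 | 1 | 0 |
| 3 | 1 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 |
| 5 | 0 | 1 | 0 | 1 | -1 |
| 6 | 1 | 1 | 1 | 0 | 1 |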

With this population, we can define the Average treatment effect (ATE)

$$ATE = E[TE_i] = E[Y_i(1) - Y_i(0)] = \;?$$


Potential outcome framework


We can only observe

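| i | D | Y | Y(1) | Y(0) | Y(1) − Y(0) |
|---|---|---|------|------|-------------|
| 1 | 0 | 0 | ? | 0 | ? |
| 2 | 1 | 1 | 1 | ? | ? |
| 3 | 1 | 0 | 0 | ? | ? |
| 4 | 0 | 0 | ? | 0 | ? |
| 5 | 0 | 1 | ? | 1 | ? |
| 6 | 1 | 1 | 1 | ? | ? |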

Q3: This is why the treatment effect cannot be identified/estimated at the individual level

What about Q4?


Potential outcome framework


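| i | D | Y | Y(1) | Y(0) | Y(1) − Y(0) |
|---|---|---|------|------|-------------|
| 1 | 0 | 0 | ? | 0 | ? |
| 2 | 1 | 1 | 1 | ? | ? |
| 3 | 1 | 0 | 0 | ? | ? |
| 4 | 0 | 0 | ? | 0 | ? |
| 5 | 0 | 1 | ? | 1 | ? |
| 6 | 1 | 1 | 1 | ? | ? |
| Mean | | | 2/3 | 1/3 | |

Try to use the group means we have (a feasible estimator)

$$2/3 - 1/3 = 1/3 \neq ATE$$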
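The comparison can be reproduced from the six-unit table of potential outcomes shown earlier. The following plain-Python sketch computes the true ATE (which needs both potential outcomes for every unit) and the feasible difference in observed group means; the helper name `mean` and the tuple layout are chosen for illustration.

```python
# The six-unit population from the table: (D, Y(1), Y(0)).
units = [
    (0, 0, 0),
    (1, 1, 1),
    (1, 0, 0),
    (0, 0, 0),
    (0, 0, 1),
    (1, 1, 0),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# True ATE: needs both potential outcomes for every unit (never available in practice).
ate = mean(y1 - y0 for _, y1, y0 in units)

# Feasible estimator: uses only observed outcomes, Y = D*Y(1) + (1-D)*Y(0).
treated_mean = mean(y1 for d, y1, _ in units if d == 1)
control_mean = mean(y0 for d, _, y0 in units if d == 0)
naive = treated_mean - control_mean

print(ate, naive)
```

The true ATE in this population is 0, while the difference in group means is 1/3, so the feasible estimator is off even though every observed number is correct.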


Potential outcome framework


Formally, the observed outcome can be written as

$$Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)$$

What we did in the previous slide is

$$E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] = 1/3$$

And we obtained the conclusion that

$$E[Y_i(1) - Y_i(0)] \neq E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]$$

The RHS is the estimator we usually use in experiments, right? Why is it not working in this case?


Potential outcome framework


Because the treatment is not randomly assigned! Otherwise, we would have

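$$
\begin{aligned}
& E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] \\
&= E[D_i Y_i(1) + (1 - D_i) Y_i(0) \mid D_i = 1] - E[D_i Y_i(1) + (1 - D_i) Y_i(0) \mid D_i = 0] \\
&= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0] \\
&= E[Y_i(1)] - E[Y_i(0)] \quad \text{by random assignment of treatment} \\
&= E[Y_i(1) - Y_i(0)] = ATE
\end{aligned}
$$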

So for Q4,

We can estimate ATE in experiments

But not directly in observational data; this is the challenge we will deal with in later lectures
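The contrast between assignment by chance and assignment by choice can be sketched in a simulation. The setup is an assumption made for illustration: the treatment effect is a constant 1.0, and under "choice" units with high untreated outcomes are more likely to opt in.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Potential outcomes with an assumed constant effect of 1.0.
y0 = rng.normal(0, 1, size=n)
y1 = y0 + 1.0


def diff_in_means(d):
    """Difference in observed group means for assignment vector d."""
    y = d * y1 + (1 - d) * y0
    return y[d == 1].mean() - y[d == 0].mean()


# By chance: coin flip, independent of the potential outcomes.
d_random = rng.integers(0, 2, size=n)

# By choice: units with high untreated outcomes are more likely to opt in.
d_choice = (y0 + rng.normal(0, 1, size=n) > 0).astype(int)

est_random = diff_in_means(d_random)
est_choice = diff_in_means(d_choice)

print(est_random, est_choice)
```

Under random assignment the difference in means is close to the true effect of 1.0; under self-selection it is much larger, because the treated group would have had higher outcomes even without treatment.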


Linear regression in Potential outcome framework


How does the familiar linear regression fit into our discussion?

The observed outcome

$$Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0) = Y_i(0) + (Y_i(1) - Y_i(0)) D_i$$

Need some assumptions to transform this into a linear model

Assume $Y_i(0) = \alpha + \epsilon_i(0)$

Assume the treatment effect is constant across individuals, that is $TE_i = Y_i(1) - Y_i(0) = \tau$

Then we have a linear model

$$Y_i = \alpha + \tau D_i + \epsilon_i(0)$$

$\tau$ can be estimated in experiments because the exogeneity assumption of linear regression holds:

$$E[\epsilon_i \mid D_i] = E[\epsilon_i(0) \mid D_i] = 0$$

which is implied by random assignment of the treatment in experiments
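The link between this linear model and the difference in group means can be checked numerically. The sketch below simulates the model with assumed values α = 2.0 and τ = 1.5 (made up for illustration) and a randomly assigned treatment, then compares the OLS coefficient on D with the difference in means.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
alpha, tau = 2.0, 1.5  # assumed values for illustration

d = rng.integers(0, 2, size=n)     # randomly assigned treatment
eps0 = rng.normal(0, 1, size=n)    # epsilon_i(0), drawn independently of d
y = alpha + tau * d + eps0         # Y_i = alpha + tau * D_i + epsilon_i(0)

# OLS of y on an intercept and d.
X = np.column_stack([np.ones(n), d])
(alpha_hat, tau_hat), *_ = np.linalg.lstsq(X, y, rcond=None)

# With a single binary regressor, the OLS slope equals the difference in group means.
diff_means = y[d == 1].mean() - y[d == 0].mean()

print(alpha_hat, tau_hat, diff_means)
```

The OLS coefficient on D coincides with the difference in means and is close to τ, so in an experiment the regression is simply another way to compute the same estimator.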



Summary and Future Plans


Discussed the concept of Big Data

Difference between prediction and causal inference

Potential outcome framework

The role of experiments in causal inference

Difficulties in causal inference with observational data

In Weeks 2 and 3, we introduce two machine learning methods

Prediction problems in Econometrics with Big Data

Also preparation for the later discussion of new causal inference methods in Econometrics

Come back to causal inference after Week 3

Causal inference with machine learning methods


References
Athey, Susan, and Guido W Imbens. 2019. “Machine Learning Methods That Economists Should Know About.” Annual
Review of Economics 11: 685–725.
Haynes, Laura, Ben Goldacre, David Torgerson, et al. 2012. “Test, Learn, Adapt: Developing Public Policy with
Randomised Controlled Trials.” Cabinet Office-Behavioural Insights Team.
