
Introduction to Econometrics

Lecture 2: Review of Basic Statistics & Randomized Controlled Trials (RCTs)

Zhaopeng Qu

Business School,Nanjing University

Sep. 19th, 2019

Zhaopeng Qu (Nanjing University) Introduction to Econometrics Sep. 19th, 2019 1 / 129
Outline

1 A Brief Review of Basic Statistics

2 Review: Random Experiment as the Research Design

3 What is an RCT?

4 Randomization: The Cornerstone of RCTs

5 How to Run RCTs?

6 Limitations of RCTs

7 An Example of Randomized Controlled Trials


A Brief Review of Basic Statistics

A Brief Review of Basic Statistics Basic Concepts

Population and Sample

A population is a collection of people, items, or events about which you want to make inferences.
A population always has a probability distribution.
A sample is a subset of the population, drawn from it in a certain way.
The sample can also follow a probability distribution.
To represent the population well, a sample should be randomly collected and adequately large.


Random Sample and i.i.d.

Definition
The r.v.s $X_1, \dots, X_n$ are called a random sample of size n from the population f(x) if $X_1, \dots, X_n$ are mutually independent and have the same p.d.f./p.m.f. f(x). Alternatively, $X_1, \dots, X_n$ are called independent and identically distributed random variables with p.d.f./p.m.f. f(x), commonly abbreviated to i.i.d. r.v.s.

E.g., a random sample of n respondents in a survey.


Statistic and Sampling Distribution

Definition
Let $X_1, \dots, X_n$ be a random sample of size n from the population f(x). A statistic is a real-valued or vector-valued function that depends only on $X_1, \dots, X_n$, that is,
$$T = T(X_1, \dots, X_n)$$

A statistic is a function of the sample only.
The probability distribution of a statistic T is called the sampling distribution of T.


Sample Mean and Sample Variance

Definition
The sample average or sample mean, $\bar{X}$, of the n observations $X_1, \dots, X_n$ is
$$\bar{X} = \frac{1}{n}(X_1 + X_2 + \dots + X_n) = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Accordingly, the sample variance is
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$
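The two definitions can be checked numerically. The following is a minimal sketch, assuming a small made-up data set; the function names `sample_mean` and `sample_variance` are illustrative and not part of the lecture.

```python
import statistics


def sample_mean(xs):
    """Sample mean: (1/n) times the sum of the observations."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance with the n - 1 divisor."""
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


# Hypothetical observations, for illustration only.
data = [2.0, 4.0, 4.0, 5.0, 7.0]
xbar = sample_mean(data)
s2 = sample_variance(data)
print(xbar, s2)
```

The standard library's `statistics.variance` uses the same n − 1 divisor, so it should agree with `sample_variance` on any data set with at least two observations.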


Recall that if $X_i$ is a random variable (r.v.), then $f(X_i)$, which is a function of $X_i$, is also a r.v. (a function of a random variable is again a random variable).
So if the $X_i$ are r.v.s, then any function of them, such as their sum, is also a r.v.
The sample mean and the sample variance are functions of sums, so they are r.v.s too.
There are therefore probability distributions that describe the sample mean and the sample variance.
A natural question follows: what are the expectation, variance, and p.d.f./c.d.f. of these distributions?


A simple case of sample mean

Let each of $X_1, \dots, X_n$ take values in [1, 100] and assume n = 2, so the sample consists of only $X_1$ and $X_2$.

A Brief Review of Basic Statistics Large-Sample Approximations to Sampling Distributions

Sampling Distributions
There are two approaches to characterizing sampling distributions:
  exact/finite-sample distribution: the sampling distribution that exactly describes the distribution of $\bar{X}$ for any n is called the exact/finite-sample distribution of $\bar{X}$.
  approximate/asymptotic distribution: when the sample size n is large, the sampling distribution is well approximated by a known distribution function.
Two key tools are used to approximate sampling distributions when the sample size is large, that is, as $n \to \infty$:
  The Law of Large Numbers (L.L.N.): when the sample size is large, $\bar{X}$ will be close to $\mu_X$, the population mean, with very high probability.
  The Central Limit Theorem (C.L.T.): when the sample size is large, the sampling distribution of the standardized sample average, $(\bar{X} - \mu_X)/\sigma_{\bar{X}}$, is approximately normal.

Convergence in probability

Definition
A sequence of random variables $X_1, X_2, \dots$ is said to converge in probability to a value b if for every $\varepsilon > 0$,
$$P(|X_n - b| > \varepsilon) \to 0$$
as $n \to \infty$. We write this as $X_n \xrightarrow{p} b$ or $\text{plim}(X_n) = b$.

It is the probabilistic counterpart of the ordinary limit of a sequence.


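The Law of Large Numbers

Theorem
Let $X_1, \dots, X_n$ be i.i.d. draws from a distribution (a population) with mean $\mu$ and finite variance $\sigma^2$, and let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ be the sample mean. Then
$$\bar{X}_n \xrightarrow{p} \mu$$

Intuition: the distribution of $\bar{X}_n$ "collapses" on $\mu$.
The larger the sample, the closer its mean is to the population mean (the sampling distribution becomes more concentrated).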


A simple case

Example
Suppose X has a Bernoulli distribution: it takes binary values $X \in \{0, 1\}$ and its probability mass function is
$$P(X = x) = \begin{cases} 0.78 & \text{if } x = 1 \\ 0.22 & \text{if } x = 0 \end{cases}$$
Then $E(X) = p = 0.78$ and $Var(X) = p(1 - p) = 0.1716$.
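The Law of Large Numbers can be illustrated with this Bernoulli example. The sketch below is a simulation under assumptions of my own (the seed, the sample sizes, and the helper name `bernoulli_sample_mean` are not from the lecture): it draws i.i.d. Bernoulli(0.78) values and prints the sample mean for growing n.

```python
import random


def bernoulli_sample_mean(n, p, rng):
    """Mean of n i.i.d. Bernoulli(p) draws."""
    return sum(1 if rng.random() < p else 0 for _ in range(n)) / n


rng = random.Random(2019)  # fixed seed so the run is reproducible
p = 0.78
for n in (10, 100, 10_000, 100_000):
    # As n grows, the printed sample mean should settle near p = 0.78.
    print(n, bernoulli_sample_mean(n, p, rng))
```

The sample means for small n can be far from 0.78, while those for large n should cluster tightly around it, which is the "collapse" described by the theorem.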


Convergence in Distribution
Definition
Let $X_1, X_2, \dots$ be a sequence of r.v.s, and for n = 1, 2, ... let $F_n(x)$ be the c.d.f. of $X_n$. Then $X_1, X_2, \dots$ is said to converge in distribution to a r.v. W with c.d.f. $F_W$ if
$$\lim_{n \to \infty} F_n(x) = F_W(x)$$
at every point x where $F_W$ is continuous, which we write as $X_n \xrightarrow{d} W$.

Basically: when n is big, the distribution of $X_n$ is very similar to the distribution of W.
It is common to standardize a r.v. by subtracting its expectation and dividing by its standard deviation:
$$Z = \frac{X - E[X]}{\sqrt{Var[X]}}$$

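The Central Limit Theorem

Theorem
Let $X_1, \dots, X_n$ be i.i.d. draws of sample size n from a distribution with mean $\mu$ and variance $0 < \sigma^2 < \infty$. Then
$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \overset{d}{\sim} N(0, 1)$$
where $\overset{d}{\sim}$ means "is approximately distributed as, when n is large".

We do not need any specific assumption about the distribution of $X_i$: whatever that distribution is, when n is big,
  the standardized $\bar{X}_n \overset{d}{\sim} N(0, 1)$
  or $\bar{X}_n \overset{d}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right)$
Intuition: the larger the sample, the closer the distribution of the sample mean is to a normal distribution.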

How large is "large enough"?

How large must n be for the distribution of $\bar{Y}$ to be approximately normal?
The answer: it depends.
  If the $Y_i$ are themselves normally distributed, then $\bar{Y}$ is exactly normally distributed for all n.
  If the $Y_i$ have a distribution that is far from normal, then the approximation can require n = 30 or even more.
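One way to explore "how large is large enough" is by simulation. The sketch below is only an illustration under my own assumptions (Bernoulli(0.78) data, 2000 replications, a fixed seed, and the helper names `standardized_mean` and `coverage`): it standardizes the sample mean and reports how often it falls within ±1.96, which should approach 0.95 if the normal approximation is good.

```python
import math
import random


def standardized_mean(n, p, rng):
    """Standardized mean of n Bernoulli(p) draws: (mean - p) / sqrt(p(1-p)/n)."""
    mean = sum(1 if rng.random() < p else 0 for _ in range(n)) / n
    return (mean - p) / math.sqrt(p * (1 - p) / n)


def coverage(n, p, reps, rng):
    """Share of replications whose standardized mean lies within +/-1.96."""
    hits = sum(abs(standardized_mean(n, p, rng)) <= 1.96 for _ in range(reps))
    return hits / reps


rng = random.Random(2019)  # fixed seed so the run is reproducible
for n in (5, 30, 500):
    print(n, coverage(n, 0.78, 2000, rng))
```

For a skewed, discrete distribution like this one, the share for small n can be noticeably off 0.95, which is the sense in which the answer "depends" on the underlying distribution.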

A Brief Review of Basic Statistics Statistical Inference: Estimation, Confidence Intervals and Testing

Statistical Inference

Inference
What is our best guess about some quantity of interest?
What is a set of plausible values of the quantity of interest?


Inference: from Samples to Population

Our focus: $\{Y_1, Y_2, \dots, Y_n\}$ are i.i.d. draws from f(y) or F(y), that is, the population distribution.
Statistical inference, or learning, is using samples to infer f(y).
Normally we do not need to know everything about the population; a few measures (the moments) are enough to describe its characteristics.

Point estimation: providing a single "best guess" as to the value of some fixed, unknown quantity of interest, $\theta$, which is a feature of the population distribution f(y). Examples of such quantities:
  $\mu = E[Y]$
  $\sigma^2 = Var[Y]$
  $\mu_Y - \mu_X = E[Y] - E[X]$

Three Characteristics of an Estimator

Let $\hat{\mu}_Y$ denote an estimator of the population moment $\mu_Y$, and let $E(\hat{\mu}_Y)$ be the mean of the sampling distribution of $\hat{\mu}_Y$.
1 Unbiasedness: the estimator of $\mu_Y$ is unbiased if
$$E(\hat{\mu}_Y) = \mu_Y$$
2 Consistency: the estimator of $\mu_Y$ is consistent if
$$\hat{\mu}_Y \xrightarrow{p} \mu_Y$$
3 Efficiency: let $\tilde{\mu}_Y$ be another estimator of $\mu_Y$ and suppose that both $\tilde{\mu}_Y$ and $\hat{\mu}_Y$ are unbiased. Then $\hat{\mu}_Y$ is said to be more efficient than $\tilde{\mu}_Y$ if
$$var(\hat{\mu}_Y) < var(\tilde{\mu}_Y)$$
Comparing variances is difficult if we do not restrict our attention to unbiased estimators, because we could always use a trivial estimator with variance zero that is biased (for example, a constant).

Properties of the Sample Mean

Let $\mu_Y$ and $\sigma_Y^2$ denote the population mean and variance of Y.
Let $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ be the sample mean of the $Y_i$.
Then the expectation of the sample mean is
$$E(\bar{Y}) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \mu_Y$$
so $\bar{Y}$ is an unbiased estimator of $\mu_Y$.
By the L.L.N., $\bar{Y} \xrightarrow{p} \mu_Y$, so $\bar{Y}$ is also consistent.


The variance of the sample mean:
$$Var(\bar{Y}) = Var\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} Var(Y_i) = \frac{\sigma_Y^2}{n}$$
Then the standard deviation of the sample mean is $\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{n}}$.
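The middle equality in the variance formula relies on the i.i.d. assumption. A short derivation, written out here as a sketch of the omitted step (the covariance notation is standard but does not appear on the slide):

```latex
\begin{align*}
Var\left(\sum_{i=1}^{n} Y_i\right)
  &= \sum_{i=1}^{n} Var(Y_i) + 2\sum_{i<j} Cov(Y_i, Y_j) \\
  &= \sum_{i=1}^{n} Var(Y_i)
     && \text{(independence: } Cov(Y_i, Y_j) = 0 \text{ for } i \neq j\text{)} \\
  &= n\,\sigma_Y^2
     && \text{(identical distribution)} \\
Var(\bar{Y})
  &= \frac{1}{n^2}\, Var\left(\sum_{i=1}^{n} Y_i\right) = \frac{\sigma_Y^2}{n}
\end{align*}
```

Without independence the covariance terms would not vanish, and the variance of the sample mean would generally differ from $\sigma_Y^2/n$.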


By the C.L.T.,
$$\bar{Y} \overset{d}{\sim} N\left(\mu_Y, \frac{\sigma_Y^2}{n}\right)$$
Let Z be the standardized $\bar{Y}$; then
$$Z = \frac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}} \overset{d}{\sim} N(0, 1)$$


Properties of the Sample Variance

Let $\mu_Y$ and $\sigma_Y^2$ denote the mean and variance of $Y_i$; then the sample variance is
$$S_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$$
It can then be shown that
1 $E(S_Y^2) = \sigma_Y^2$, so $S_Y^2$ is an unbiased estimator of $\sigma_Y^2$; this is also why the average uses the divisor n − 1 instead of n.
2 $S_Y^2 \xrightarrow{p} \sigma_Y^2$, so the sample variance is a consistent estimator of the population variance.
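The role of the n − 1 divisor can be seen in a small simulation. This is a sketch under assumptions of my own (normal data with variance 4, samples of size 5, 20000 replications, and the helper name `variance_with_divisor`): it averages the two candidate estimators over many samples.

```python
import random


def variance_with_divisor(ys, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    ybar = sum(ys) / len(ys)
    return sum((y - ybar) ** 2 for y in ys) / divisor


rng = random.Random(2019)  # fixed seed so the run is reproducible
n, reps = 5, 20_000
total_unbiased = 0.0
total_biased = 0.0
for _ in range(reps):
    ys = [rng.gauss(0.0, 2.0) for _ in range(n)]  # true variance is 4
    total_unbiased += variance_with_divisor(ys, n - 1)
    total_biased += variance_with_divisor(ys, n)

avg_unbiased = total_unbiased / reps  # should be near 4
avg_biased = total_biased / reps      # should be near 4 * (n - 1) / n = 3.2
print(avg_unbiased, avg_biased)
```

With the divisor n, the average falls short of the true variance by the factor (n − 1)/n, a gap that shrinks as n grows, which is why both versions are consistent but only one is unbiased.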


The Standard Error

Recall: the standardized sample mean approximately follows a standard normal distribution when n is large:
$$Z = \frac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}} \overset{d}{\sim} N(0, 1)$$
But in general $\sigma_Y$, the standard deviation of the population, is unknown, so we have to use the sample to estimate it.


Let $\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{n}}$. Because $S_Y^2$ is an unbiased and consistent estimator of $\sigma_Y^2$, we can use $\frac{S_Y}{\sqrt{n}}$ as an estimator of the standard deviation of the sample mean, $\sigma_{\bar{Y}}$.
It is called the standard error of the sample mean:
$$SE[\bar{Y}] = \hat{\sigma}_{\bar{Y}} = \frac{S_Y}{\sqrt{n}}$$
It is the estimated "standard deviation" of the sampling distribution of the sample mean, which measures how much the sample mean varies from sample to sample.
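Computing the standard error takes one line once the sample standard deviation is available. A minimal sketch, assuming made-up observations; the function name `standard_error` is illustrative.

```python
import math
import statistics


def standard_error(ys):
    """SE of the sample mean: S_Y / sqrt(n), where S_Y uses the n - 1 divisor."""
    return statistics.stdev(ys) / math.sqrt(len(ys))


# Hypothetical observations, for illustration only.
ys = [2.0, 4.0, 4.0, 5.0, 7.0]
se = standard_error(ys)
print(se)
```

Because of the $\sqrt{n}$ in the denominator, a sample four times as large with the same spread would have roughly half the standard error.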


Application: Sample Size and Standard Error
[Figures omitted: panels for the population and for samples of n = 250, n = 500, and n = 1000.]


Recall: The Chi-Square Distribution

Let $Z_i$ (i = 1, 2, ..., m) be independent random variables, each distributed as standard normal. Then a new random variable can be defined as the sum of the squares of the $Z_i$:
$$X = \sum_{i=1}^{m} Z_i^2$$
Then X has a chi-squared distribution with m degrees of freedom.
It can then be proved that, when the $Y_i$ are i.i.d. normal, a rescaled version of the sample variance follows a chi-square distribution:
$$\frac{(n - 1)S_Y^2}{\sigma_Y^2} \sim \chi^2_{n-1}$$

The Student-t Distribution

The Student t distribution can be obtained from a standard normal and a chi-square random variable.
Let Z have a standard normal distribution, let X have a chi-square distribution with m degrees of freedom, and assume that Z and X are independent. Then the random variable
$$T = \frac{Z}{\sqrt{X/m}}$$
has a t-distribution with m degrees of freedom, denoted as $T \sim t_m$.
It follows that, for a normal population, the standardized sample mean computed with $S_Y$ in place of $\sigma_Y$ follows a Student t distribution:
$$\frac{\bar{Y} - \mu_Y}{S_Y/\sqrt{n}} \sim t_{n-1}$$

But this distinction does not matter much in large samples.
As the degrees of freedom, n − 1, grow with the sample size n, the t-distribution approaches the standard normal distribution.
A Brief Review of Basic Statistics Interval Estimation and Confidence Intervals

Interval Estimation

A point estimate provides no information about how close the estimate is "likely" to be to the population parameter.
We cannot know how close an estimate for a particular sample is to the population parameter, because the population parameter is unknown.
A different (complementary) approach to estimation is to produce a range of values that will contain the truth with some fixed probability.


What is a Confidence Interval?

Definition
A 100(1 − α)% confidence interval for a population parameter θ is an
interval Cn = (a, b) , where a = a(Y1 , ..., Yn ) and b = b(Y1 , ..., Yn )
are functions of the data such that

P(a < θ < b) = 1 − α

In general, the confidence level is $1 - \alpha$, where $\alpha$ is called the significance level.
The key is how to obtain or construct the values of a and b.


Interval Estimation and Confidence Intervals


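Suppose the population has a normal distribution $N(\mu, \sigma^2)$ and let $Y_1, Y_2, \dots, Y_n$ be a random sample from the population.
Then the sample mean $\bar{Y}$ has a normal distribution:
$$\bar{Y} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
The standardized sample mean Z is given by:
$$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
Choose a and b such that $P(a < Z < b) = 1 - \alpha$. The event inside the probability is
$$a < \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} < b$$
and solving for $\mu$ gives
$$P\left(\bar{Y} - b\,\sigma/\sqrt{n} < \mu < \bar{Y} - a\,\sigma/\sqrt{n}\right) = 1 - \alpha$$
The random interval contains the population mean with probability $1 - \alpha$.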


Two cases: $\sigma$ is known and $\sigma$ is unknown.
When $\sigma$ is known, for example $\sigma = 1$, so that $Y \sim N(\mu, 1)$, we have
$$\bar{Y} \sim N\left(\mu, \frac{\sigma^2}{n}\right) = N\left(\mu, \frac{1}{n}\right)$$
From this we can standardize $\bar{Y}$. Because the standardized version of $\bar{Y}$ has a standard normal distribution, letting $\alpha = 0.05$ gives
$$P\left(-1.96 < \frac{\bar{Y} - \mu}{1/\sqrt{n}} < 1.96\right) = 1 - 0.05$$
The event in parentheses is identical to the event $\bar{Y} - 1.96/\sqrt{n} < \mu < \bar{Y} + 1.96/\sqrt{n}$, so
$$P\left(\bar{Y} - 1.96/\sqrt{n} < \mu < \bar{Y} + 1.96/\sqrt{n}\right) = 0.95$$
The interval estimate of $\mu$ may be written as
$$[\bar{Y} - 1.96/\sqrt{n},\ \bar{Y} + 1.96/\sqrt{n}]$$
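The known-$\sigma$ interval can be computed directly. The sketch below assumes a small made-up sample and takes $\sigma = 1$ as known, as in the slide; the function name `ci_known_sigma` is illustrative. It uses `statistics.NormalDist` from the Python standard library for the normal quantile.

```python
import math
from statistics import NormalDist


def ci_known_sigma(ys, sigma, alpha=0.05):
    """Two-sided 100(1 - alpha)% interval for the mean when sigma is known."""
    n = len(ys)
    ybar = sum(ys) / n
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    half_width = z * sigma / math.sqrt(n)
    return ybar - half_width, ybar + half_width


# Hypothetical sample; sigma = 1 is assumed known, as in the slide.
ys = [0.3, -1.1, 0.8, 1.9, 0.4, -0.2, 1.2, 0.6, -0.5]
low, high = ci_known_sigma(ys, sigma=1.0)
print(low, high)
```

The half-width depends only on $\sigma$, n and $\alpha$, not on the data, so all samples of the same size give intervals of the same length centered at their own sample mean.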

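When $\sigma$ is unknown, we can use an estimate of it: the standard error $SE[\bar{Y}] = \hat{\sigma}_{\bar{Y}} = \frac{S_Y}{\sqrt{n}}$ replaces the unknown $\sigma/\sqrt{n}$, so that
$$Z_t = \frac{\bar{Y} - \mu_Y}{S_Y/\sqrt{n}} = \frac{\bar{Y} - \mu_Y}{SE(\bar{Y})}$$
As shown earlier, it follows a Student t distribution:
$$Z_t \sim t_{n-1}$$

Definition
The t-statistic or t-ratio:
$$\frac{\bar{Y} - \mu_Y}{SE(\bar{Y})} \sim t_{n-1}$$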

To construct a 95% confidence interval, let c denote the 97.5th percentile of the $t_{n-1}$ distribution, so that
$$P(-c < t \le c) = 0.95$$
In general, $c = c_{\alpha/2}$ is the critical value of the t distribution.
The confidence interval may be written as $[\bar{Y} \pm c_{\alpha/2}\, S_Y/\sqrt{n}]$.


A simple rule of thumb for a 95% confidence interval

As the degrees of freedom, n − 1, grow with the sample size n, the t-distribution approaches the standard normal distribution.
Since $\Phi(1.96) = 0.975$, a rule of thumb for an approximate 95% confidence interval is
$$[\bar{Y} \pm 1.96 \times SE(\bar{Y})]$$
or
$$[\bar{Y} \pm 2 \times SE(\bar{Y})]$$
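The rule of thumb translates into a few lines of code. This is a sketch with simulated data of my own choosing (400 normal draws with mean 10 and standard deviation 3, fixed seed); the function name `ci95_rule_of_thumb` is illustrative.

```python
import math
import random
import statistics


def ci95_rule_of_thumb(ys, multiplier=1.96):
    """Approximate 95% interval: sample mean +/- multiplier * SE."""
    ybar = statistics.mean(ys)
    se = statistics.stdev(ys) / math.sqrt(len(ys))
    return ybar - multiplier * se, ybar + multiplier * se


rng = random.Random(2019)  # fixed seed so the run is reproducible
ys = [rng.gauss(10.0, 3.0) for _ in range(400)]  # simulated, not real data
low, high = ci95_rule_of_thumb(ys)
low2, high2 = ci95_rule_of_thumb(ys, multiplier=2.0)
print((low, high), (low2, high2))
```

Using 2 instead of 1.96 gives a slightly wider, and therefore slightly more conservative, interval.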

A Brief Review of Basic Statistics Hypothesis Testing

Hypothesis Testing
Definition
A hypothesis is a statement about a population parameter $\theta$. Formally, we want to test whether $\theta$ is significantly different from a certain value $\mu_0$:
$$H_0: \theta = \mu_0$$
which is called the null hypothesis. The (two-sided) alternative hypothesis is
$$H_1: \theta \neq \mu_0$$

If the value $\mu_0$ does not lie within the calculated confidence interval, then we reject the null hypothesis.
If the value $\mu_0$ lies within the calculated confidence interval, then we fail to reject the null hypothesis.

Introduction

In criminal law, institutions in most countries follow the rule "innocent until proven guilty" (when in doubt, acquit).
  The prosecutor wants to prove the hypothesis that the accused person is guilty.
  However, the burden is on the prosecutor to show guilt.
  The jury or judge starts with the "null hypothesis" that the accused person is innocent.


In program evaluations, instead of the "presumption of innocence", the rule is the "presumption of insignificance".
  Policymaker's hypothesis: the program improves learning.
  Evaluators approach experiments using the hypothesis that the program has zero impact.
  Then we test this "null hypothesis".
  The burden of proof is on the program: it should show a statistically significant impact.


Two Types of Errors

In both cases, there is a certain risk that our conclusion is wrong.
Type I and Type II Errors
A Type I error occurs when we reject the null hypothesis although it is in fact true ("left-wing").
A Type II error occurs when we fail to reject the null hypothesis although it is false ("right-wing").

In the criminal trial case:
  Type I: the judge rejects the null hypothesis when the suspect is actually not guilty, that is, an innocent person is convicted.
  "Better to wrongly kill a thousand than to let one go."
  Type II: the judge fails to reject the null hypothesis when the suspect is actually guilty, that is, a guilty person goes free.
  "Better to let a thousand go than to wrongly kill one."

The Significance Level

Definition
The significance level or size of a test is the maximum probability of a Type I error:
$$P(\text{Type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true}) = \alpha$$

Usually we have to carry the "burden of proof": we would like to show that the assertion of $H_1$ is true by showing that the data reject $H_0$.


Testing procedure

The steps of hypothesis testing are:
1 Specify $H_0$ and $H_1$.
2 Choose the significance level $\alpha$.
3 Define a decision rule (critical value).
4 Given the data, compute the test statistic and see whether it falls into the critical region.


Decision Rule

The decision rule that leads us to reject or not to reject $H_0$ is based on a test statistic, which is a function of the data:
$$T_n = T(Y_1, \dots, Y_n)$$
Usually, one rejects $H_0$ if the test statistic falls into a critical region (rejection region). A critical region is constructed by taking into account the probability of making a wrong decision, that is, $\alpha$.
By convention, $\alpha$ is chosen to be a small number, for example $\alpha$ = 0.01, 0.05, or 0.10.


P-Value

To provide additional information, we could ask the question: what is the largest significance level at which we could carry out the test and still fail to reject the null hypothesis?
In other words: given the data, what is the smallest significance level at which the null can be rejected?
This is the p-value of a test:
1 Calculate the t-statistic t.
2 The largest significance level at which we would fail to reject $H_0$ is the significance level associated with using t as our critical value. For a one-sided test,
$$p\text{-value} = 1 - \Phi(t)$$
where $\Phi(t)$ denotes the standard normal c.d.f. (we assume that n is large enough). For a two-sided test, the p-value is $2(1 - \Phi(|t|))$.
Suppose that t = 1.52. Then the largest significance level at which we would fail to reject $H_0$ is
$$p\text{-value} = P(T > 1.52 \mid H_0) = 1 - \Phi(1.52) \approx 0.064$$
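The p-value for t = 1.52 can be reproduced with the standard normal c.d.f. from the Python standard library. A minimal sketch; the function names are illustrative, and the two-sided version is included for comparison with the two-sided tests used elsewhere in the lecture.

```python
from statistics import NormalDist


def p_value_one_sided(t):
    """Upper-tail p-value under the standard normal approximation."""
    return 1 - NormalDist().cdf(t)


def p_value_two_sided(t):
    """Two-sided p-value: probability of a statistic at least as extreme as |t|."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


p1 = p_value_one_sided(1.52)
p2 = p_value_two_sided(1.52)
print(round(p1, 4), round(p2, 4))
```

At the 5% level, neither p-value leads to rejection; at the 10% level, the one-sided test rejects while the two-sided test does not.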


Hypothesis Test of Ȳ

Specify $H_0$ and $H_1$:
$$H_0: E[Y] = \mu_{Y,0} \qquad H_1: E[Y] \neq \mu_{Y,0}$$
Choose the significance level $\alpha$ and define a decision rule (critical region or critical value).
  E.g., if we choose $\alpha = 0.05$, then the critical value is 1.96 and the critical region is $(-\infty, -1.96]$ and $[1.96, +\infty)$.


Given the data, compute the test statistic:
Step 1: Compute the sample average $\bar{Y}$.
Step 2: Compute the standard error of $\bar{Y}$:
$$SE(\bar{Y}) = \frac{s_Y}{\sqrt{n}}$$
Step 3: Compute the t-statistic:
$$t^{act} = \frac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})}$$
Step 4: Reject the null hypothesis if $|t^{act}| >$ critical value, or equivalently if p-value < significance level.
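The four steps can be put together in one function. The sketch below assumes a large sample so that the standard normal critical value applies, and it uses simulated data of my own choosing (200 normal draws, fixed seed); the function name `large_sample_t_test` and the returned dictionary keys are illustrative.

```python
import math
import random
import statistics
from statistics import NormalDist


def large_sample_t_test(ys, mu_0, alpha=0.05):
    """Two-sided test of H0: E[Y] = mu_0 using the normal approximation."""
    ybar = statistics.mean(ys)                           # Step 1
    se = statistics.stdev(ys) / math.sqrt(len(ys))       # Step 2
    t_act = (ybar - mu_0) / se                           # Step 3
    critical = NormalDist().inv_cdf(1 - alpha / 2)       # about 1.96 for alpha = 0.05
    p_value = 2 * (1 - NormalDist().cdf(abs(t_act)))
    return {
        "t": t_act,
        "critical": critical,
        "p_value": p_value,
        "reject": abs(t_act) > critical,                 # Step 4
    }


rng = random.Random(2019)  # fixed seed so the run is reproducible
ys = [rng.gauss(5.2, 1.0) for _ in range(200)]  # simulated, not real data
result = large_sample_t_test(ys, mu_0=5.0)
print(result)
```

The two rejection criteria in Step 4 agree by construction: $|t^{act}|$ exceeds the critical value exactly when the two-sided p-value is below $\alpha$.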

Review: Random Experiment as the Research Design


Recall the last lecture

The Core of Empirical Studies: Causality vs. Forecasting


The Central Question of Causality?
Rubin Causal Model: comparing counterfactuals or potential
outcomes.
However, we can never observe both counterfactuals —
fundamental problem of causal inference.


Random assignment solves the selection problem.
We should treat experimental design as a benchmark.
To construct the counterfactuals, there are two broad categories of empirical strategies.
  Randomized Controlled Trials/Experiments: they eliminate selection bias, which is the most important bias that arises in empirical research. If we could observe the counterfactual directly, there would be no evaluation problem: the effect would simply be a difference.
  Program Evaluation Econometrics: the various approaches using naturally occurring data provide alternative methods of constructing the proper counterfactual.
What is an RCT?

What is an RCT?


Randomized Controlled Trials(RCTs)(随机可控试验)

In essence, an RCT is an experiment carried out on two or more groups where participants are randomly assigned to receive an intervention or not.
Participants are randomly assigned to either a treatment group, who are given the intervention, or a control group, who are not.
In RCTs, each group is measured at the end of the trial and the results from the groups are compared to see if the intervention has made a difference and achieved its desired outcome. If the randomized groups are large enough, you can be confident that the differences observed are due to the intervention and not some other factor.
RCTs are considered the gold standard for establishing a causal link between an intervention and change.

RCT in History

What is often cited as the first recorded controlled trial was run in 1747 by James Lind, a Scottish physician in the Royal Navy.
Scurvy(坏血病) is a terrible disease caused by Vitamin C deficiency.
It was a serious problem during long sea voyages.
Lind took 12 sailors with scurvy and split them into six groups of two.
The groups were assigned:
(1) 1 qt of cider(苹果酒), (2) 25 drops of vitriol(硫酸), (3) 6 spoonfuls of vinegar, (4) 1/2 pt of sea water, (5) garlic, mustard(芥末) and barley water, (6) citrus fruit

Only Group 6 (citrus fruit) showed substantial improvement.


RCT in History

Ronald A. Fisher (1890-1962), British statistician and geneticist who pioneered the application of statistical procedures to the design of scientific experiments.
“a genius who almost single-handedly created the foundations for modern statistical science”.
He developed many of these ideas while working at the Rothamsted Experimental Station.


RCTs in Economics


RCTs in Social Policies

According to Baruch (1978), 245 randomized field experiments had been conducted in the U.S. for social policy evaluation up to 1978.
This huge effort was prompted by the 1% share of every social budget devoted to evaluation.
Some of them were ambitious and very costly, and covered different kinds of policies:
The Perry Preschool Program in 1961
The Rand Health Insurance Experiment from 1974 to 1982


Education: the Perry Preschool Program

123 children born between 1958 and 1962 in Michigan.
Half of them (drawn at random) entered the Perry Preschool Program at 3 or 4 years old.
Education by skilled professionals in nurseries and kindergarten.
Program duration: about 30 weeks.
Follow-up surveys at ages 14, 15, 19, 27 and 40.


Health Care: The Rand Health Insurance Experiment

5809 people were randomly assigned in 1974 to different insurance plans with 0%, 25%, 50% and 95% cost sharing.
They were followed until 1982.
Main result: paying a portion of health costs makes people give up some “superfluous” care, with little harm to their health.
But there is some heterogeneity: this is not true for poor people.


RCTs in China

“One egg a day” program in rural China, run by REAP at Stanford.
“Free lunch” program in primary schools in Western China.


RCT in Business

An interesting question: What is the optimal color for taxis?
Ho, Chong and Xia (2017), “Yellow taxis have fewer accidents than blue taxis because yellow is more visible than blue”, PNAS.


RCT in Business

Another critical question for business: Is working from home better than working at the office?
Bloom, Liang, Roberts and Ying (2015), “Does Working from Home Work? Evidence from a Chinese Experiment”, The Quarterly Journal of Economics.


Types of RCTs

Lab Experiments
eg: students take part in an experiment in a classroom.
eg: a computer gambling game in the lab.
Field Experiments
eg: the role of women in household decisions, or fake resumes in job applications.
Quasi-Experiments or Natural Experiments: some unexpected institutional change or natural shock
eg: German Reunification(德国统一), the Great Famine in China(1959-1961 年大饥荒)and the U.S. Bombing of Vietnam(美国轰炸越南).


Randomization: The Cornerstone of RCTs


Introduction:

Randomization is the key element that separates RCTs from other study designs.
Suppose our study is about entrepreneurs or firms. Decide whether each of the following ways of assigning participants to treatment is random:
The left (front) side and the right (back) side of our classroom
Date of birth(生日)
Some sort of record number, odd vs. even(登记号码奇偶数字)
Day of enrollment, alternating by order of arrival(参加实验的时间交替给编号)
None of the methods described above really generates random assignment; they are systematic occurrences.

What does random assignment mean?

More specifically, it means that all participants have a defined probability of being assigned to a particular treatment (group).
In RCTs, neither the investigators, the practitioners nor the participants determine assignments, and the assignment is not predictable from any pattern.
So what are better ways to achieve random assignment?
Tossing a coin
Lottery
Most investigators use software packages to perform randomization.
A wide range of software can be used, including Excel, Stata and R.
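As a minimal sketch of software-based randomization, the Python snippet below shuffles a list of participant IDs with a seeded random number generator and assigns the first part of the shuffled list to treatment. The function name `assign_treatment`, the IDs and the seed value are hypothetical choices for illustration; Excel, Stata or R could do the same thing.

```python
import random


def assign_treatment(participant_ids, n_treated, seed=2019):
    """Randomly assign n_treated participants to treatment (hypothetical helper)."""
    ids = list(participant_ids)
    rng = random.Random(seed)   # a fixed seed makes the draw reproducible and auditable
    shuffled = ids[:]
    rng.shuffle(shuffled)       # every ordering of the participants is equally likely
    treated = set(shuffled[:n_treated])
    return {pid: ("treatment" if pid in treated else "control") for pid in ids}


# Ten hypothetical participants, five of whom receive the intervention.
assignment = assign_treatment(range(1, 11), n_treated=5)
print(assignment)
```

Recording the seed means that anyone can re-run the draw and confirm that neither investigators nor participants influenced the assignment.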

Which level to randomize?

1 Randomizing at the individual level
2 Randomizing at the group level

We have to consider:
What unit does the program target for treatment?
What is the unit of analysis?


Unit of Randomization: Individual


Unit of Randomization: Class


Unit of Randomization: School


Constraints in reality

1 Simple random sampling. The most basic of the probability sampling methods: every unit has an equal probability of being selected into the study. Easiest to do.
2 Stratified sampling. Sample each subpopulation of interest independently and then reweight after data collection. Statistically efficient while allowing for adequately powered subgroup analyses.
3 Cluster sampling. Sample by "cluster", such as a geographical location or unit. For example, sampling entire schools at random. Statistically efficient if there is more variance within clusters than between clusters.
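The three sampling schemes can be contrasted in a short Python sketch. Everything here is hypothetical and only for illustration: the toy population of 40 students in 4 schools, the tuple layout `(student_id, school, gender)`, and the helper names.

```python
import random
from collections import defaultdict


def simple_random_sample(units, n, rng):
    """Every unit has the same chance of being selected."""
    return rng.sample(units, n)


def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Sample each subpopulation (stratum) independently."""
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample


def cluster_sample(units, cluster_of, n_clusters, rng):
    """Pick whole clusters at random and keep every unit inside them."""
    clusters = sorted({cluster_of(u) for u in units})
    chosen = set(rng.sample(clusters, n_clusters))
    return [u for u in units if cluster_of(u) in chosen]


# Hypothetical population: (student_id, school, gender).
population = [(i, f"school_{i % 4}", "F" if (i // 4) % 2 == 0 else "M")
              for i in range(40)]
rng = random.Random(0)

srs = simple_random_sample(population, 8, rng)
by_gender = stratified_sample(population, lambda u: u[2], 3, rng)
by_school = cluster_sample(population, lambda u: u[1], 2, rng)
print(len(srs), len(by_gender), len(by_school))
```

The stratified draw guarantees three students of each gender, while the cluster draw returns all students of two randomly chosen schools.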


Other methods of randomization


Constraints in reality

Political Advantages:
Lotteries are simple, common and transparent.
Randomly chosen from applicant pool
Participants know the “winners”and “losers”
Simple lottery is useful when there is no a priori reason to discriminate
Perceived as fair.


Constraints in reality

Avoid contamination?
Spillovers
Crossovers


Introduction

Sometimes the core question is deciding among different possible interventions.
We can randomize across these programs.
Does adding one more intervention bring enough benefit?
Remember: you should always have a control group.


Multiple treatments: can’t work


Varying levels of treatment:

Treatment 1: Some schools are assigned full treatment: all kids get pills
Treatment 2: Some schools are assigned partial treatment: 50% of kids get pills
Control: Some schools are assigned zero treatment: no kids get pills
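One way to implement this design is a two-stage draw: first randomize schools into the three arms, then, inside partially treated schools, randomize which half of the pupils get pills. The sketch below is an assumption about how this could be coded, not the procedure of any particular study; the helper names, the equal split into thirds and the school and pupil IDs are all hypothetical.

```python
import random


def assign_schools(school_ids, rng):
    """Stage 1: shuffle schools and split them into three equally sized arms."""
    ids = list(school_ids)
    rng.shuffle(ids)
    third = len(ids) // 3
    arms = {}
    for k, sid in enumerate(ids):
        if k < third:
            arms[sid] = "full"
        elif k < 2 * third:
            arms[sid] = "partial"
        else:
            arms[sid] = "control"
    return arms


def assign_pupils(pupil_ids, arm, rng):
    """Stage 2: return the set of pupils who receive pills in one school."""
    ids = list(pupil_ids)
    if arm == "full":
        return set(ids)
    if arm == "control":
        return set()
    return set(rng.sample(ids, len(ids) // 2))   # partial: a random 50%


rng = random.Random(42)
school_arms = assign_schools(range(9), rng)
pills = {sid: assign_pupils(range(20), arm, rng) for sid, arm in school_arms.items()}
print(school_arms)
```

Comparing untreated pupils in partial schools with pupils in control schools is one way such a design can be used to look at spillovers.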


Should the assignment put an equal number of participants in each group?

In general, participants are allocated in equal numbers to the different treatments:
50% of participants go to the treatment group and the other 50% to the control group.
However, there are practical reasons why you may choose a different allocation ratio, i.e. an unequal allocation.
Resource constraints or costs: the program has a fixed quota of participants that it needs to assign to the treatment group.
eg: an allocation ratio of 3:1, where the intervention group has three times as many participants as the control group.
Limited resources to provide the intervention (including the availability of staff delivering it), or a treatment that is significantly more expensive.
eg: an allocation ratio of 1:2 (twice as many participants in the control group as in the intervention group).
In most cases, we do not even need to adjust the sample weights.
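An unequal allocation only changes how many shuffled participants go to each arm. The sketch below assumes a hypothetical helper `allocate_by_ratio` and made-up participant IDs, and reproduces the 3:1 and 1:2 ratios mentioned above.

```python
import random


def allocate_by_ratio(participant_ids, treat_parts, control_parts, seed=0):
    """Randomly split participants so that treated:control = treat_parts:control_parts."""
    ids = list(participant_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_treat = round(len(ids) * treat_parts / (treat_parts + control_parts))
    return ids[:n_treat], ids[n_treat:]


# 120 hypothetical participants.
treat_3_1, control_3_1 = allocate_by_ratio(range(120), 3, 1)   # 3:1 allocation
treat_1_2, control_1_2 = allocate_by_ratio(range(120), 1, 2)   # 1:2 allocation
print(len(treat_3_1), len(control_3_1), len(treat_1_2), len(control_1_2))
```

Because assignment is still random within the chosen ratio, a simple difference in group means remains a valid comparison; the ratio mainly affects precision.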

How to Run RCTs?


Step 1: Asking the Right Question

Choosing the right topic: check whether the answer to the question you are planning to study already exists.
There are many different types of questions that you might be interested in; make sure that you are asking the type of question that requires an RCT.
Strategic question: What areas of innovation policy should we focus our attention on?
Descriptive question: What are the key challenges that SMEs in certain geographic areas are facing?
Process-related question: Is the young entrepreneurs mentoring program using the resources dedicated to program delivery?
RCTs can answer questions about impact:
Did the program or policy work?
Which components of the intervention were most crucial for achieving impact?
Try to answer one question at a time at first.

Step 2: Finding a specific context

Transform into a measurable empirical question
Empirical problem: class size and educational output.
Policy question: What is the effect on test scores (or some other outcome measure) of reducing class size by one student per class? By 8 students per class?


Step 3: Making an Experimental Design

Specifying eligibility criteria for participants
Specifying the treatment
Specifying the outcomes
Sample sizes
Randomization
Specifying the method of analysis


Step 4: Piloting the treatment


Step 5:Random assignment(Baseline)


Step 6:Implementing the program


Step 7: Measure participant outcomes


Step 8: Analyzing the data and Writing the report


A Case: the California School

Draw school districts (n = 420) randomly from all school districts in California.
Variables:
5th grade test scores (Stanford-9 achievement test, combined math and reading), district average
Student-teacher ratio (STR) = number of students in the district divided by the number of full-time equivalent teachers


Summary Table: Descriptive Statistics

Does this table tell us anything about the relationship between test
scores and the STR?


Scatterplot: test score v. student-teacher ratio

What does this figure show, and what might it suggest?



The California Test Score

We need some numerical evidence on whether districts with low STRs have higher test scores.
But how?
1 Compare average test scores in districts with low STRs to those with high STRs (“estimation”)
2 Test the “null” hypothesis that the mean test scores in the two types of districts are the same, against the “alternative” hypothesis that they differ (“hypothesis testing”)
3 Estimate an interval for the difference in the mean test scores, high vs. low STR districts (“confidence interval”)


The California Test Score

Compare districts with “small” and “large” class sizes:

Small vs. Large

Class Size          Average score (Ȳ)    Standard deviation    n
Small (STR < 20)    657.4                19.4                  238
Large (STR ≥ 20)    650.0                17.9                  182

1 Estimate ∆ = difference between group means
2 Test the hypothesis that ∆ = 0
3 Construct a confidence interval for ∆
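All three tasks can be carried out directly from the summary table. The Python sketch below uses the large-sample standard error of a difference in two means, the square root of $s_a^2/n_a + s_b^2/n_b$; the dictionary layout and the function name `compare_means` are illustrative assumptions.

```python
import math

# Summary statistics copied from the table above.
small = {"mean": 657.4, "sd": 19.4, "n": 238}   # STR < 20
large = {"mean": 650.0, "sd": 17.9, "n": 182}   # STR >= 20


def compare_means(a, b, z=1.96):
    """Difference in means, its standard error, t-statistic and 95% interval."""
    delta = a["mean"] - b["mean"]
    se = math.sqrt(a["sd"] ** 2 / a["n"] + b["sd"] ** 2 / b["n"])
    t_act = delta / se                      # tests H0: delta = 0
    ci = (delta - z * se, delta + z * se)
    return delta, se, t_act, ci


delta, se, t_act, ci = compare_means(small, large)
print(round(delta, 1), round(se, 2), round(t_act, 2), [round(x, 1) for x in ci])
```

With the table's numbers the difference is 7.4 points, the standard error is about 1.83, the t-statistic is about 4.05 and the 95% interval is roughly (3.8, 11.0), so the hypothesis ∆ = 0 is rejected at the 5% level.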


An Example: Comparing Means from Different Populations

In an RCT, we would like to estimate the average causal effect over the population:

$$ATE = ATT = E\{Y_i(1) - Y_i(0)\}$$

We only have random samples and random assignment to treatment, so what we can estimate instead is

$$\text{difference in means} = \bar{Y}_{treated} - \bar{Y}_{control}$$

Under randomization, the difference in means is a good estimate of the ATE.
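A small simulation can make this claim concrete. The sketch below is built entirely on made-up assumptions: baseline outcomes drawn from a normal distribution, a constant true effect of 5, and a "self-selection" rule in which only units with a high baseline outcome take the treatment. It compares the difference in means under random assignment with the one obtained under self-selection.

```python
import random
from statistics import mean


def simulate(n=20000, true_effect=5.0, seed=2019):
    """Return (difference in means under randomization, under self-selection)."""
    rng = random.Random(seed)
    y0 = [rng.gauss(50, 10) for _ in range(n)]    # potential outcome without treatment
    y1 = [y + true_effect for y in y0]            # potential outcome with treatment

    d_random = [rng.random() < 0.5 for _ in range(n)]   # coin-flip assignment
    d_selected = [y > 50 for y in y0]                   # high-baseline units opt in

    def diff_in_means(d):
        treated = [y1[i] for i in range(n) if d[i]]
        control = [y0[i] for i in range(n) if not d[i]]
        return mean(treated) - mean(control)

    return diff_in_means(d_random), diff_in_means(d_selected)


randomized, self_selected = simulate()
print(round(randomized, 2), round(self_selected, 2))
```

Under the coin flip the estimate lands close to the true effect of 5, while under self-selection it is far larger because the treated group would have had higher outcomes even without treatment.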


Hypothesis Tests for the Difference Between Two Means

To illustrate a test for the difference between two means, let $\mu_w$ be the mean hourly earnings in the population of women recently graduated from college and let $\mu_m$ be the population mean for recently graduated men.
Then the null hypothesis and the two-sided alternative hypothesis are

$$H_0: \mu_m = \mu_w$$
$$H_1: \mu_m \neq \mu_w$$

More generally, consider the null hypothesis that mean earnings for these two populations differ by a certain amount, say $d_0$, so that $H_0: \mu_m - \mu_w = d_0$. The null hypothesis that men and women in these populations have the same mean earnings corresponds to $d_0 = 0$, i.e. $H_0: \mu_m - \mu_w = 0$.

The Difference Between Two Means


Suppose we have samples of $n_m$ men and $n_w$ women drawn at random from their populations. Let the sample average earnings be $\bar{Y}_m$ for men and $\bar{Y}_w$ for women. Then an estimator of $\mu_m - \mu_w$ is $\bar{Y}_m - \bar{Y}_w$.
Let us discuss the distribution of $\bar{Y}_m - \bar{Y}_w$. In large samples it is approximately

$$\bar{Y}_m - \bar{Y}_w \sim N\left(\mu_m - \mu_w,\ \frac{\sigma_m^2}{n_m} + \frac{\sigma_w^2}{n_w}\right)$$

If $\sigma_m^2$ and $\sigma_w^2$ are known, then this approximate normal distribution can be used to compute p-values for the test of the null hypothesis. In practice, however, these population variances are typically unknown, so they must be estimated.
Thus the standard error of $\bar{Y}_m - \bar{Y}_w$ is

$$SE(\bar{Y}_m - \bar{Y}_w) = \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}}$$

The Difference Between Two Means

The t-statistic for testing the null hypothesis is constructed analogously to the t-statistic for testing a hypothesis about a single population mean. The t-statistic for comparing two means is

$$t^{act} = \frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)}$$

If both $n_m$ and $n_w$ are large, then this t-statistic has an approximately standard normal distribution when the null hypothesis is true, that is, when $\mu_m - \mu_w = d_0$.


Confidence Intervals for the Difference Between Two Means

The 95% two-sided confidence interval for $d = \mu_m - \mu_w$ consists of those values of $d$ within ±1.96 standard errors of $\bar{Y}_m - \bar{Y}_w$, that is,

$$(\bar{Y}_m - \bar{Y}_w) \pm 1.96\, SE(\bar{Y}_m - \bar{Y}_w)$$


Hypothesis Test of the Difference Between Two Means

Reject the null hypothesis if

$$|t^{act}| = \left| \frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)} \right| > \text{critical value}$$

or if p-value < significance level.
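Starting from raw data rather than summary statistics, the estimator, standard error, t-statistic, confidence interval and decision rule for two means fit in one small function. The sketch below assumes the large-sample normal approximation; the function name `two_sample_test` and the tiny earnings samples are hypothetical and far too small for the approximation to be reliable in practice.

```python
import math
from statistics import mean, variance


def two_sample_test(y_m, y_w, d0=0.0, z=1.96):
    """Large-sample test of H0: mu_m - mu_w = d0 with a 95% confidence interval."""
    diff = mean(y_m) - mean(y_w)
    # SE = sqrt(s_m^2 / n_m + s_w^2 / n_w), using sample variances.
    se = math.sqrt(variance(y_m) / len(y_m) + variance(y_w) / len(y_w))
    t_act = (diff - d0) / se
    p_value = math.erfc(abs(t_act) / math.sqrt(2))   # two-sided normal p-value
    ci = (diff - z * se, diff + z * se)
    return {"diff": diff, "se": se, "t_act": t_act,
            "p_value": p_value, "ci": ci, "reject": abs(t_act) > z}


# Made-up hourly earnings, for illustration only.
men = [20.0, 22.0, 24.0, 26.0, 28.0]
women = [10.0, 12.0, 14.0, 16.0, 18.0]
print(two_sample_test(men, women))
```

Setting `d0` to a non-zero value tests whether the two means differ by that specific amount instead of testing equality.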


Limitations of RCTs


RCTs are far from perfect!

High costs, long duration
Potential ethical problems: “Parachutes reduce the risk of injury after gravitational challenge, but their effectiveness has not been proved with randomized controlled trials.”
Milgram Experiment
Stanford Prison Experiment
Monkey Experiment
Limited generalizability
RCTs allow us to gain knowledge about causal effects, but without knowing the mechanism.


Potential Problems in Practice

Small sample: Student Effect


Hawthorne effect(霍桑效应): subjects who know they are in an experiment can change their behavior.
Attrition(样本流失): subjects drop out of the study after being randomly assigned to the treatment or control group.
Failure to randomize or failure to follow the treatment protocol: people don’t always do what they are told.
eg: the eyeglasses program in rural Western China.


Program Evaluation Econometrics(项目评估计量经济学)


Question: How can we do empirical research scientifically when we cannot run experiments? This means that we always have selection bias in our data, or, in other terms, “endogeneity”.
Answer: The core of econometric methods is to build a reasonable counterfactual world from naturally occurring data, that is, to find a proper control group.
Here are your “Furious Seven” weapons in applied econometrics(七种盖世武器)
1 Randomized Controlled Trials (RCT)(随机实验)
2 OLS(最小二乘回归)
3 Decomposition(分解)
4 Instrumental Variable(工具变量)
5 Differences in Differences(双差分)
6 Matching and Propensity Score(匹配)
7 Regression Discontinuity(断点回归)
8 Synthetic Control(合成控制法)

An Example of Randomized Controlled Trials


Working from Home (WFH) vs. Working from Office

“Does Working from Home Work? Evidence from a Chinese Experiment”, by Nicholas A. Bloom, James Liang, John Roberts and Zhichun Jenny Ying, The Quarterly Journal of Economics, February 2015, Vol. 130, Issue 1, Pages 165-218.
Basic question: Is WFH = SFH (Shirking from Home)?


Working from Home (WFH) is becoming an international trend.


Motivations

Working from home is a modern management practice which appears to be stochastically spreading in the US and Europe.
20 million people in the US report working from home at least once per week.
There is little evidence on the effect of workplace flexibility on:
productivity
employee satisfaction
shirking


Ctrip Experiment

Ctrip, China’s largest travel agent, with 16,000 employees, listed on NASDAQ ($6bn).
The co-founder of Ctrip, James Liang, was an Econ PhD at Stanford and decided to run an experiment to test WFH.


Ctrip Experiment: A call center in Shanghai


The experiment was run in the airfare and hotel departments in Shanghai.
Main work: employees take calls and make bookings.


The Experimental Design

Treatment: work 4 shifts (days) a week at home and work the 5th shift in the office on a fixed day.
Control: work in the office on all 5 days.


The Experimental Design

In early November 2010, employees in the airfare and hotel booking departments were informed of the WFH program.
Of the 994 employees in the airfare and hotel booking departments, 503 (51%) volunteered for the experiment.
Among the volunteers, 249 (50%) met the eligibility requirements and were recruited into the experiment.
The treatment and control groups were then determined from this group of 249 employees through a public lottery.


The Experimental Design


Results: Number of Calls Received


Results: Working hours


Results:Many Outcomes


Conclusion: Very positive

They found a highly significant 13% increase in employee performance from WFH, of which about 9% was from employees working more minutes of their shift period (fewer breaks and sick days) and about 4% from higher performance per minute.
Home workers also reported substantially higher work satisfaction and psychological attitude scores, and their job attrition rates fell by over 50%.
