
MFIN6201 Lecture 1

An Overview of Econometrics

Raphael Park
May 30, 2021

1
Who’s talking?

• Raphael Park, jonghyeon.park@unsw.edu.au
• Research areas include corporate finance, nonprofits, and fund
management.
• Consultation by appointment. Please email me if you would like to arrange
a consultation.
• I am more flexible with appointment times before the final exams.

2
Course Admins

• Questions about the course content should be posted on the Moodle Discussion
Forum; I will not reply to emails about course content.
• Questions about course admin should be posted on the Moodle "Questions
about Course Admin" forum. Email me only if the matter is personal (such as
special consideration).
• Group formation is also done via Moodle; you have to select a group
before Week 3. A group must have 5-8 people, no fewer and no more.
• You will need UNSW myAccess to complete this course, since we use
non-free software called Stata (more on this later).
• Lecture recordings will be provided after each lecture.
• The textbook is Stock and Watson (SW), but it is for reference only. Lecture
slides plus homework will be sufficient for you to succeed in the exams.

3
Course Schedule

• Week 2 - Programming in Stata


• Week 3 - Statistics, Probability theory, Mathematics and intro to OLS
• Week 4 - OLS regression I
• Week 5 - OLS regression II
• Week 6 - Flexibility week - we will have a discussion lecture
• Week 7 - IV regressions
• Week 8 - Panel Data and Fixed Effect Regressions
• Week 9 - Experiments
• Week 10 - Machine Learning

4
Assessment

The course outline is a bit outdated due to the virus situation; please refer
to the slides and materials on Moodle.

• Participation (10%) - based on finishing the homework each week. There will
be 8 homework assignments in total, each worth 1.25%. Each homework should be
submitted via the link under the relevant week's tab.
• Written assignment (10%) - due in Week 4.
• Group project (20%) - written group report due in Week 10.
• Peer evaluations (10%) - the average of how your teammates grade you.
• Final exam (50%) - held during the university exam period.

5
Overview of today’s lecture

Reading: SW Chapters 1, 2, and 3. Topics:

• The probability framework for statistical inference


• Estimation
• Hypothesis testing
• Confidence interval

6
What is Econometrics?

The science and art of using economic theory and statistical
techniques to analyze economic data.

• Statistics + economics
• Standard assumptions in statistics
• Nature of economic data
• Financial econometrics = econometrics + finance
• Use econometric techniques to study a variety of problems
from finance
• Focus on hypothesis testing, causal inference and forecasting

7
Types of data in finance

• Studies varying across entities (e.g., firms, individuals, etc.):
cross-sectional data
• Studies varying across time: time-series data
• Study variations both across firms and through time using
panel data

Can you visualize the data structure?


Cross-sectional vs. time series vs. panel data
Answer: See Stock and Watson Chapter 1.3

8
Brief overview of the first part of the course

Economic and financial theories suggest important relations, often
with policy implications, but virtually never suggest the quantitative
magnitudes of causal effects.
• What is the quantitative effect of independent board members
on firm performance?
• What is the quantitative effect of high frequency trading on
market quality?
• What is the quantitative effect on asset prices of a 1
percentage point increase in interest rates by the Fed?
• What is the magnitude for the price of risk across different
financial assets?
We need to carry out empirical analysis to answer these questions
with rigor!
9
How to carry out empirical analysis in finance?

Example: medical research

• Treatment group: given the new drug
• Control group: given a placebo (usually a vitamin)
• Treatment effect: the difference between the two groups

Example: financial research

• Firms with high R&D expenses vs. firms with low R&D expenses
• Do firms that invest in social responsibility perform better?

10
How to carry out empirical analysis in finance?

But almost always we only have observational (non-experimental)
data

• Independent board of directors and algorithmic traders
• Monetary policy

Part of the course deals with difficulties arising from using
observational data to estimate causal effects

• confounding effects (omitted factors)


• simultaneous causality
• “correlation does not imply causation”

11
An empirical example: class size and education output

• Question: What is the effect on test scores (or some other
outcome measure) of reducing class size by one student per
class? By 8 students per class?
• We must use data to find out (is there any way to answer this
without data?)

12
Using data for empirical analysis

The California Test Score Data Set - see SW chapter 1.3

• California school districts (n = 420) in 1999


• Variables:
• fifth-grade test scores (combined math and reading)
• district average student-teacher ratio (STR) = number of students
in the district divided by the number of full-time equivalent teachers

13
Initial look at the data

What is the relationship between test scores and the STR?

14
Initial look at the data

What does this figure show? Answer: Eyeball econometrics

15
Eyeball econometrics is not rigorous enough!

We need some numerical evidence on whether districts with low STRs have
higher test scores. But how?

• Compare average test scores in districts with low STRs to those with high
STRs ("estimation").
Point estimation uses sample data to calculate a single value (known as a
statistic), which serves as the "best estimate" of an unknown (fixed or
random) population parameter.
• Test the null hypothesis that the mean test scores in the two types of
districts are the same, against the "alternative" hypothesis that they
differ ("hypothesis testing").
Hypothesis testing evaluates a claim about the population on the basis of
observed data modeled via a set of random variables.
• Estimate an interval for the difference in the mean test scores, high vs.
low STR districts ("confidence interval"). A confidence interval is a
type of interval estimate of a population parameter.
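
As a rough illustration of all three procedures, here is a minimal Python
sketch on simulated data (the course itself uses Stata, and the numbers below
are made up rather than taken from the California data set):

import numpy as np
from scipy import stats

# Simulated test scores for low-STR and high-STR districts (illustrative numbers only)
rng = np.random.default_rng(0)
low_str = rng.normal(657, 19, size=200)    # districts with small classes
high_str = rng.normal(650, 18, size=220)   # districts with large classes

# Estimation: the difference in sample means is the point estimate
diff = low_str.mean() - high_str.mean()

# Hypothesis testing: the null hypothesis is that the two population means are equal
t_stat, p_value = stats.ttest_ind(low_str, high_str, equal_var=False)

# Confidence interval: an approximate 95% interval for the difference in means
se = np.sqrt(low_str.var(ddof=1) / len(low_str) + high_str.var(ddof=1) / len(high_str))
ci = (diff - 1.96 * se, diff + 1.96 * se)

print(diff, t_stat, p_value, ci)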

16
What is next?

• Or, you can use (linear) regression to measure the relation (i.e., the
slope) between student-to-teacher ratios and average test scores.
• Why linear? - it is easy to implement, easy to interpret, and it gives the
best linear approximation even when the relationship is not linear.
• Why regression? - the estimate comes from a rigorous mathematical procedure
(hard to beat with alternatives like eyeballing).
• Before turning to regression, however, we will review some of the
underlying theory of estimation, hypothesis testing, and confidence
intervals:
• Why do these procedures work, and why use these rather than
others?
• We will review the intellectual foundations of statistics and
econometrics

17
Review of Statistical Theory

• Probability framework for statistical inference


• Estimation
• Testing
• Confidence Intervals

18
Probability framework for statistical inference

• Population, random variable, and distribution


• Moments of a distribution (mean, variance, standard
deviation, covariance, correlation)
• Conditional distributions and conditional means
• Distribution of a sample of data drawn randomly from a
population: X_1, ..., X_n

19
Population, random variable, and distribution

Population
• The group or collection of ALL possible entities of interest
(school districts)
• We will think of populations as infinitely large
• e.g. the population average height of all human beings? (only
God knows)
Random variable
• Random variable is a variable whose value is subject to
variations due to chance
• As a result, it can take on a set of possible different values
each with an associated probability
• Numerical summary of a random outcome (district average
test score, district STR)
20
Discrete random variable

Example

X = 1 with probability 0.2
    2 with probability 0.3
    3 with probability 0.3
    4 with probability 0.2

21
Discrete random variable: pdf

• pdf (probability density function)

f (x) ≡ P(X = x)

- The probability for a random variable X to be equal to a given
constant, x

22
Discrete random variable: cdf

• cdf (cumulative distribution function)

F(x) ≡ P(X ≤ x)

- The probability for a random variable X to be less than or equal
to a given constant, x

F(1) = P(X ≤ 1) = 0.2
F(2) = P(X ≤ 2) = 0.2 + 0.3 = 0.5
F(3) = P(X ≤ 3) = 0.2 + 0.3 + 0.3 = 0.8
F(4) = P(X ≤ 4) = 0.2 + 0.3 + 0.3 + 0.2 = 1

23
Discrete random variable: cdf

24
Relation between pdf and cdf

• cdf is a sum of pdf

F (x) = f (x) + f (x − ∆) + · · ·
= f (x) + F (x − ∆)

• pdf is a difference of cdf

f (x) = F (x) − F (x − ∆)

25
Continuous random variable

For example,
• Normal distribution (bell-shaped curve)

X ∼ N(µ, σ²)
: X is drawn from a normal distribution with mean µ and variance σ²
• pdf of the normal distribution
26
Continuous random variable

• The pdf of a continuous distribution is somewhat different from the
pdf of a discrete distribution because P(X = x) = 0
• Let's begin with the cdf

F(x) = P(X ≤ x)

27
pdf and cdf of continuous distribution

cdf is an integration of pdf

F(x) = ∫_{−∞}^{x} f(t) dt

pdf is a differentiation of cdf

f(x) = d/dx F(x)

28
pdf and cdf

cdf

• Discrete: F(x) = Σ_{t ≤ x} f(t)
• Continuous: F(x) = ∫_{−∞}^{x} f(t) dt

pdf

• Discrete: f(x) = F(x) − F(x − ∆)
• Continuous: f(x) = d/dx F(x)
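
These relations can be checked on the discrete example from earlier; a minimal
Python sketch (the course software is Stata, Python is used here purely for
illustration):

import numpy as np

# Discrete example from earlier: values 1..4 with the stated probabilities
values = np.array([1, 2, 3, 4])
pdf = np.array([0.2, 0.3, 0.3, 0.2])

cdf = np.cumsum(pdf)                               # the cdf is the running sum of the pdf
print(cdf)                                         # [0.2 0.5 0.8 1. ]

pdf_back = np.diff(np.concatenate(([0.0], cdf)))   # the pdf is the difference of the cdf
print(pdf_back)                                    # [0.2 0.3 0.3 0.2]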

29
Probability density of stock returns

30
Moments: mean

• Mean (µ, average, expectation, expected value, 1st moment)

E[X] ≡ Σ_{i=1}^{n} x_i p_i           (discrete)
E[X] ≡ ∫_{−∞}^{∞} x f(x) dx          (continuous)

• Example

X = 1 with probability 0.2
    2 with probability 0.3
    3 with probability 0.3
    4 with probability 0.2

E[X] = 1 · 0.2 + 2 · 0.3 + 3 · 0.3 + 4 · 0.2
     = 2.5

31
Moments: variance

• Variance (σ², 2nd moment)

Var[X] ≡ E[(X − µ)²]
       = Σ_{i=1}^{n} (x_i − µ)² p_i          (discrete)
       = ∫_{−∞}^{∞} (x − µ)² f(x) dx         (continuous)

• Standard deviation

std(X) = √Var[X]

• Example

Var(X) = (1 − 2.5)² · 0.2 + (2 − 2.5)² · 0.3
       + (3 − 2.5)² · 0.3 + (4 − 2.5)² · 0.2
       = 1.05

32
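
The mean and variance computed in the example above can be verified
numerically; a minimal Python sketch (illustrative only):

import numpy as np

x = np.array([1, 2, 3, 4])
p = np.array([0.2, 0.3, 0.3, 0.2])

mean = np.sum(x * p)                 # E[X] = sum of x_i * p_i
var = np.sum((x - mean) ** 2 * p)    # Var[X] = sum of (x_i - mean)^2 * p_i
std = np.sqrt(var)

print(mean, var, std)                # 2.5, 1.05, about 1.025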
Useful formulas

• Expectation

E[a + bX] = a + bE[X]
E[X + Y] = E[X] + E[Y]
Combined,
E[aX + bY] = aE[X] + bE[Y]

• Variance

Var[a + bX] = b² Var[X]
Var[X + Y] = Var[X] + Var[Y] + 2Cov[X, Y]
Combined,
Var[aX + bY] = a² Var[X] + b² Var[Y] + 2ab Cov[X, Y]

33
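
The two "combined" formulas above can also be checked by simulation; a Python
sketch with assumed, purely illustrative parameters (Var[X] = 1, Var[Y] = 2,
Cov[X, Y] = 0.5):

import numpy as np

rng = np.random.default_rng(0)
X, Y = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 2.0]], size=1_000_000).T
a, b = 2.0, -3.0

lhs = np.var(a * X + b * Y)
rhs = a**2 * np.var(X) + b**2 * np.var(Y) + 2 * a * b * np.cov(X, Y)[0, 1]
print(lhs, rhs)   # the two values agree up to sampling error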
Moments: skewness, kurtosis

• Skewness (3rd moment) = E[(X − µ)³] / σ³
• measure of asymmetry of a distribution
• skewness = 0: distribution is symmetric
• skewness > (<) 0: distribution has a long right (left) tail

• Kurtosis (4th moment) = E[(X − µ)⁴] / σ⁴
• measure of mass in the tails
• measure of the probability of large values
• kurtosis = 3: normal distribution
• kurtosis > 3: heavy tails (leptokurtic)
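
As an illustration, the sample skewness and kurtosis of simulated normal data
can be computed with scipy (note that scipy's kurtosis function returns excess
kurtosis unless fisher=False is passed):

import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=1_000_000)

print(skew(x))                    # close to 0: the normal distribution is symmetric
print(kurtosis(x, fisher=False))  # close to 3: fisher=False gives raw (not excess) kurtosis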

34
Skewness and Kurtosis

35
Covariance

• Random variables X and Y have a joint distribution
• The covariance between X and Y is
Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)] = σ_XY
• The covariance is a measure of the linear association between
X and Y
• Cov(X, Y) > 0 means a positive relation between X and Y
• If X and Y are independently distributed, then Cov(X, Y) = 0
(but not vice versa! - consider X ∼ N(0, 1) and Y = X²)
• The covariance of a random variable with itself is its variance

36
Correlation

• Correlation is defined as

Corr(X, Y) = Cov(X, Y) / √(Var(X)Var(Y)) = σ_XY / (σ_X σ_Y) = ρ_XY

• −1 ≤ Corr(X, Y) ≤ 1
• Cov(X, Y) > 0 means a positive relation between X and Y
• Corr(X, Y) = 1 means perfect positive linear association
• Corr(X, Y) = −1 means perfect negative linear association
• Corr(X, Y) = 0 means no linear association

37
Correlation examples

• Two Bernoulli random variables (think of flipping coins)

X, Y = 1 with probability p
       0 with probability 1 − p

• Mean

E[X] = p · 1 + (1 − p) · 0 = p

• Variance

Var[X] = E[(X − µ)²]
       = p · (1 − p)² + (1 − p) · (0 − p)²
       = p(1 − p)
38
Correlation examples

• Example 1: independent distributions

Joint probabilities (Y in rows, X in columns):
          X = 1         X = 0
Y = 1     p²            p(1 − p)
Y = 0     p(1 − p)      (1 − p)²

• Covariance

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]
          = p² · (1 − p)(1 − p) + p(1 − p) · (1 − p)(0 − p)
          + p(1 − p) · (0 − p)(1 − p) + (1 − p)² · (0 − p)(0 − p)
          = 0
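
A quick numerical check of this zero covariance, with an assumed illustrative
value p = 0.3 (any p works):

import numpy as np

p = 0.3
x = np.array([1, 1, 0, 0])      # X value in each cell of the joint table
y = np.array([1, 0, 1, 0])      # Y value in each cell
prob = np.array([p * p, p * (1 - p), (1 - p) * p, (1 - p) * (1 - p)])  # independent joint probabilities

mean_x = np.sum(x * prob)
mean_y = np.sum(y * prob)
cov = np.sum((x - mean_x) * (y - mean_y) * prob)
print(cov)                      # 0 (up to floating-point error)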

39
Correlation examples

• Correlation

Corr(X, Y) = Cov(X, Y) / √(Var(X)Var(Y)) = 0

• Thus, an independent distribution implies zero
covariance/correlation, but not vice versa.

40
Correlation examples

• Example 2: perfect correlation

Joint probabilities (Y in rows, X in columns):
          X = 1     X = 0
Y = 1     p         0
Y = 0     0         1 − p

• Covariance

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]
          = p · (1 − p)(1 − p) + 0 · (1 − p)(0 − p)
          + 0 · (0 − p)(1 − p) + (1 − p) · (0 − p)(0 − p)
          = p(1 − p)
          = Var(X)

Thus, perfect correlation implies that the covariance is equal to the
variable's own variance.

41
Correlation examples

• Correlation

Corr(X, Y) = Cov(X, Y) / √(Var(X)Var(Y)) = p(1 − p) / p(1 − p) = 1

• Thus, perfect correlation implies that the correlation is equal to
one

42
Correlation

43
Correlation

• Corr(X, Y) ∈ (0, 1)
: when X is high, Y is likely, but not certain, to be high
• Corr(X, Y) ∈ (−1, 0)
: when X is high, Y is likely, but not certain, to be low
• Note: correlation does not imply causality
• Example: more education is related to a higher salary.
• But does that mean getting more education will definitely
increase your salary? No! Perhaps a higher salary gives you the
money to get more education, or more education reflects a better
family background!

44
Bayes’ theorem

• The distribution of Y, given value(s) of some other random
variable, X

P(A|B) = P(A ∩ B) / P(B)
       = P(B|A)P(A) / P(B)

45
Conditional probability: example

• Example: what is the probability that the first child is a son
if at least one of the two children is known to be a son?

P(first is a son | at least one is a son)
= P(first is a son & at least one is a son) / P(at least one is a son)
= (1/2) / (3/4)
= 2/3
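
The same answer can be obtained by brute-force enumeration; a tiny Python
sketch:

from itertools import product

# The four equally likely (first child, second child) combinations
outcomes = list(product(["son", "daughter"], repeat=2))

at_least_one_son = [o for o in outcomes if "son" in o]
first_is_son = [o for o in at_least_one_son if o[0] == "son"]

print(len(first_is_son), "/", len(at_least_one_son))   # 2 / 3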

46
Textbook example: Table 2.2

• Joint distribution

47
Textbook example: Table 2.2

• Marginal distribution

P(Y = y) = Σ_{i=1}^{n} P(X = x_i, Y = y)

• From the example,

P(rain) = 0.15 + 0.15 = 0.30
P(no rain) = 0.07 + 0.63 = 0.70
P(long commute) = 0.15 + 0.07 = 0.22
P(short commute) = 0.15 + 0.63 = 0.78

48
Textbook example: Table 2.2

• Conditional distribution

P(Y = y | X = x) = P(X = x and Y = y) / P(X = x)

• From the example,

P(long commute | rain) = 0.15 / 0.30 = 0.50
P(rain | long commute) = 0.15 / 0.22 = 0.68
P(short commute | no rain) = 0.63 / 0.70 = 0.90
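
The marginal and conditional probabilities above can be reproduced directly
from the joint distribution in Table 2.2; a minimal Python sketch:

import numpy as np

# Joint distribution from SW Table 2.2 (rows: rain, no rain; columns: long, short commute)
joint = np.array([[0.15, 0.15],
                  [0.07, 0.63]])

p_rain = joint[0].sum()               # marginal P(rain) = 0.30
p_long = joint[:, 0].sum()            # marginal P(long commute) = 0.22

print(joint[0, 0] / p_rain)           # P(long commute | rain) = 0.50
print(joint[0, 0] / p_long)           # P(rain | long commute) ≈ 0.68
print(joint[1, 1] / joint[1].sum())   # P(short commute | no rain) = 0.90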

49
Conditional probability example: AIDS testing

• Question
The probability that a patient has HIV is 0.001 and the
diagnostic test for HIV can detect the virus with a probability
of 0.98. Given that the chance of a false positive is 6%, what
is the probability that a patient who has already tested
positive really has HIV?

50
Conditional probability example: AIDS testing

• The following information is given in the question,

P(HIV ) = 0.001
P(positive | HIV ) = 0.98
P(positive | not HIV ) = 0.06

• The question asks

P(HIV | positive) =???

51
Conditional probability example: AIDS testing

• Marginal distribution & Bayes' theorem

P(positive) = P(positive & HIV) + P(positive & not HIV)
            = P(positive | HIV) · P(HIV) + P(positive | not HIV) · P(not HIV)
            = 0.98 × 0.001 + 0.06 × 0.999
            = 0.06092

52
Conditional probability example: AIDS testing

• Bayes' theorem

P(HIV | positive) = P(positive | HIV) · P(HIV) / P(positive)
                  = 0.98 × 0.001 / 0.06092
                  ≈ 0.0161
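
The same calculation in a few lines of Python (illustrative only):

p_hiv = 0.001
p_pos_given_hiv = 0.98
p_pos_given_not_hiv = 0.06

# Marginal probability of testing positive (law of total probability)
p_pos = p_pos_given_hiv * p_hiv + p_pos_given_not_hiv * (1 - p_hiv)

# Bayes' theorem
p_hiv_given_pos = p_pos_given_hiv * p_hiv / p_pos

print(p_pos)             # 0.06092
print(p_hiv_given_pos)   # about 0.0161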

53
Conditional means

• Conditional expectations and conditional moments

E (Y |X = x)

• Example: E (test scores|STR < 20) = the mean of test scores


among districts with small class sizes
• Conditional variance: variance of conditional distribution

54
Conditional means

• Do you remember the classroom-size example?

∆ = E (test scores|STR < 20) − E (test scores|STR ≥ 20)

Other examples of conditional means:

• Wages of all female workers (Y = wages, X = gender)


• Mortality rate of those given an experimental treatment
(Y =live/die; X = treated/not treated)
• If E (X |Z ) = const, then corr(X,Z) = 0 (not necessarily vice
versa however)

The conditional mean is a (possibly new) term for the familiar idea of the
group mean
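
A sketch of computing such a group (conditional) mean difference on simulated
district data (illustrative only; the lecture's empirical example uses the
California data set, and the course software is Stata):

import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Simulated districts with an assumed negative relation between STR and scores
str_ratio = rng.uniform(14, 26, size=420)
score = 700 - 2.0 * str_ratio + rng.normal(0, 15, size=420)
df = pd.DataFrame({"str": str_ratio, "score": score})

small = df.loc[df["str"] < 20, "score"].mean()    # E(test scores | STR < 20)
large = df.loc[df["str"] >= 20, "score"].mean()   # E(test scores | STR >= 20)
print(small - large)                              # the estimated difference in group means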
55
Homework!

• Try to summarize an academic paper that you are interested in.
• The main point is to illustrate the research question, the empirical
challenges, and how the authors overcame those challenges.
• Choose an article from one of the following journals: Journal of Finance,
Journal of Financial Economics, Review of Financial Studies.
• Due on the day before the next lecture.
• Effort is more important than getting the correct answer!

56
