
Lecture Notes 2

Review of Probability and Statistics

Tingting Wu

National University of Singapore


Recap for week 1 lecture
Ø What is Econometrics?
• Economics + Mathematics + Statistics

Ø What can econometrics do?


• Quantify, identify and predict

Ø Steps of econometrics analysis


• Formulate the question; select models; select and collect data; test and verify

Ø Data sampling
• Stochastic process for representative sample data (Typical causes for Sample Bias?)

Ø Types of research data (by how the data are collected)


• Observational data; Experimental data; Simulation data; Derived/compiled data

Ø Types of dataset (by how the dataset looks)


• Cross sectional data; Time series data; Panel data; Pooled cross sections data
Various Types of Datasets
(Grid axes: Individuals — person, household, firm, or other economic agent — vs. Time)

Cross Sectional Data
• One observation per individual
• e.g., surveys
• Focus of our module

Time Series Data
• Indexed by time
• e.g., price, interest rate, exchange rate
• Useful in finance

Panel Data
• Cross sectional + Time series
• A set of individuals surveyed repeatedly over time

Pooled Cross Sections Data
• Cross-sectional data allowing for different individuals from different times
Types of Datasets

Can you tell the types of dataset?

A: Daily expense for Jerry
Date         Daily Expense ($)
01/01/2023   25
02/01/2023   37
03/01/2023   78
04/01/2023   25
……           …..
16/01/2023   76

B: Surveys of consumption for alcohol
Month  Name   Alcohol (ml)
Jan    Jenny  100
Jan    Mary   293
Jan    Ting   20
Jan    Ying   30
Feb    Tom    120
Feb    Jerry  20

C: Monthly Survey of consumption for alcohol
Month  Name   Alcohol (ml)
Jan    Jenny  100
Jan    Mary   293
Jan    Ting   20
Feb    Jenny  200
Feb    Mary   124
Feb    Ting   30
Types of Datasets
Can you tell the types of dataset?

D: Undergraduate students' tuition fees for special terms across Schools in NUS
Outline

Ø Probability, Random variables, and Probability Distribution

Ø Moments of a probability distribution


• Mean; Variance; Skewness

Ø Often used probability distributions in econometrics


• Normal; Chi-Squared; Student t; F-distribution

Ø Two Random Variables


• Joint distribution, Marginal distribution, Conditional distribution
• Independence, Covariances and Correlations
Probability, Random variables, and Probability
Distribution
Probability
Probability reflects the likelihood of the occurrence of any event.

The Addition Rule


• The probability that either A or B will happen is equal to
• P(A∪B)=P(A)+P(B)−P(A∩B)
• If A and B are disjoint, then P(A∪B)=P(A)+P(B), as P(A∩B)=0

The Multiplication Rule


• The probability that A and B both occur is equal to the probability that B occurs times the
conditional probability that A occurs given that B occurs.
• P(A∩B)=P(B)⋅P(A|B)
P(A∩B)=P(A)⋅P(B|A)
• If A and B are independent, then P(A∩B)=P(A)⋅P(B), as P(B)=P(B|A)

But, what is the importance of knowing probability rules in real life situations…?
Practice for probability

An urn contains 20 red and 10 blue balls. Two balls are drawn from the urn one after the other
without replacement.

• What is the probability that both balls drawn are red?

• In the first draw, P(A) = P(red ball in the first draw) = 20/30
• In the second draw, the conditional probability of B given A is P(B|A) = 19/29
• By the multiplication rule of probability,
$$P(A \cap B) = P(A) \times P(B|A) = \frac{20}{30} \times \frac{19}{29} = \frac{38}{87}$$
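The multiplication-rule calculation above can be checked in a few lines of Python (a minimal sketch using the standard library's `Fraction` type for exact arithmetic):

```python
# Urn example: 20 red and 10 blue balls, two draws without replacement.
from fractions import Fraction

p_first_red = Fraction(20, 30)               # P(A): red on the first draw
p_second_red_given_first = Fraction(19, 29)  # P(B|A): 19 red left among 29 balls

# Multiplication rule: P(A ∩ B) = P(A) · P(B|A)
p_both_red = p_first_red * p_second_red_given_first
print(p_both_red)  # 38/87
```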
Random Variables
Some definitions:
Outcomes are the mutually exclusive potential results of a random process
• The number of days it will rain next week
A random variable is a numerical summary of a random outcome
• The number of days it will rain next week is random and takes on a numerical value
(0,1,2,3,4,5,6 or 7)

Two types of random variables:


Discrete random variables: take on a discrete (countable) set of values, like 0, 1, 2
Continuous random variables: take on a continuum of possible values, i.e., any value in a real interval

In general, X is a random variable if it represents a random draw from some population


Probability distribution of discrete random variables
• Each outcome of a discrete random variable occurs with a certain probability

A probability distribution of a discrete random variable is the list of possible values


of the variable and the probability that each value will occur.

• Let random variables S be the number of days it will rain in the last week of August.

Probability Distribution of S
Outcome (s) 0 1 2 3 4 5 6 7
Probability 0.2 0.25 0.2 0.15 0.1 0.05 0.04 0.01

e.g., Pr(S = 1) = 0.25
Cumulative distribution of discrete random variables
A cumulative probability distribution is the probability that the random variable is less
than or equal to a particular value

• The probability that it will rain less than or equal to s days, F(s) = Pr(S ≤ s) is the cumulative
probability distribution of S evaluated at s.
• A cumulative probability distribution is also referred to as a cumulative distribution or a CDF.

(Cumulative) Probability distribution of S


Outcome (s) 0 1 2 3 4 5 6 7
Probability 0.2 0.25 0.2 0.15 0.1 0.05 0.04 0.01
CDF 0.2 0.45 0.65 0.8 0.9 0.95 0.99 1

Pr(S ≤ 1) = Pr(S=0) + Pr(S=1)

Pr(S ≤ 3) = Pr(S=0) + Pr(S=1)+ Pr(S=2)+ Pr(S=3)
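The CDF row in the table is just a running sum of the probability row; a minimal sketch in Python, using the probabilities from the table above:

```python
# CDF of S (rainy days) as the running sum of the probabilities in the table.
from itertools import accumulate

pmf = {0: 0.20, 1: 0.25, 2: 0.20, 3: 0.15, 4: 0.10, 5: 0.05, 6: 0.04, 7: 0.01}
cdf = dict(zip(pmf, accumulate(pmf.values())))  # F(s) = Pr(S <= s)

print(round(cdf[1], 2))  # 0.45 = Pr(S=0) + Pr(S=1)
print(round(cdf[3], 2))  # 0.8  = Pr(S=0) + ... + Pr(S=3)
```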


Probability distribution of continuous random variables
• Tomorrow’s temperature is an example of a continuous random variable
• The CDF is defined similarly to that of a discrete random variable
• A probability distribution that lists all values and the probability of each value is not suitable for a
continuous random variable.
• Instead, the probability is summarized in a probability density function (PDF/ density) or Cumulative
distribution function (CDF)

[Figure: the PDF and CDF of a continuous random variable]
Moments of a Probability Distribution
Mean (expected value)
The mean or expected value of a random variable X is the average value over many repeated
trials or occurrences.
It measures the location of the central point of density curve of X.

• Suppose a discrete random value X takes on k possible values


$$E(X) = \sum_{i=1}^{k} x_i \cdot \Pr(X = x_i)$$
Numbers of days (S) it will rain in the last week of August
Outcome 0 1 2 3 4 5 6 7
Probability 0.2 0.25 0.2 0.15 0.1 0.05 0.04 0.01

⇒ 𝐸 𝑆 = 0 ×0.2 + 1×0.25 + 2×0.2 + 3×0.15 + 4×0.1 + 5×0.05 + 6×0.04 + 7×0.01 = 2.06

• The mean or expected value of a continuous random variable X with a probability distribution function 𝑓 𝑥
$$E(X) = \int_{-\infty}^{\infty} x \cdot f(x)\, dx$$
Variance
The variance of a random variable X is the expected value of the square of the deviation of X
from its mean.
• The variance of X is a measure of the dispersion/spread of the density of X.
• Suppose a discrete random value X takes on k possible values
$$\sigma_X^2 = Var(X) = E\left[(X - E(X))^2\right] = \sum_{i=1}^{k} (x_i - E(X))^2 \cdot \Pr(X = x_i)$$

Numbers of days (S) it will rain in the last week of August


Outcome 0 1 2 3 4 5 6 7
Probability 0.2 0.25 0.2 0.15 0.1 0.05 0.04 0.01
⇒ Var(S) = (0 − 2.06)² · 0.2 + (1 − 2.06)² · 0.25 + (2 − 2.06)² · 0.2 + (3 − 2.06)² · 0.15
+ (4 − 2.06)² · 0.1 + (5 − 2.06)² · 0.05 + (6 − 2.06)² · 0.04 + (7 − 2.06)² · 0.01 ≈ 2.94
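Both moments can be checked directly from the probability table (a minimal sketch; nothing beyond the two summation formulas):

```python
# Mean and variance of S, computed from the probability table above.
pmf = {0: 0.20, 1: 0.25, 2: 0.20, 3: 0.15, 4: 0.10, 5: 0.05, 6: 0.04, 7: 0.01}

mean = sum(s * p for s, p in pmf.items())               # E(S)
var = sum((s - mean) ** 2 * p for s, p in pmf.items())  # Var(S) = E[(S - E(S))^2]

print(round(mean, 2))  # 2.06
print(round(var, 2))   # 2.94
```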

The standard deviation of a random variable X is the square root of Var(X).


$$\sigma_X = \sqrt{Var(X)}$$
Skewness
Skewness is a measure of the lack of symmetry of a distribution.

• 𝑆𝑘𝑒𝑤𝑛𝑒𝑠𝑠 = 0: Distribution is symmetric


• 𝑆𝑘𝑒𝑤𝑛𝑒𝑠𝑠 ≠ 0: Distribution is asymmetric: negative skewness (longer left tail) or positive skewness (longer right tail)

Checking the shapes: a left-skewed distribution typically has Mean < Median; a right-skewed one has Mean > Median.

The Mean will typically be more strongly influenced by extreme values than the Median.
Often used probability distributions in Econometrics
Normal Distribution
“Everyone believes the normal distribution, Experimentalists think that it is a mathematical theorem while the
mathematicians believe it to be an experimental fact.”
- Gabriel Lippman, France, 1845-1921, (1908 Nobel Physics Prize)

• The most often encountered probability distribution (density) function in Econometrics is the Normal
distribution:
Normal Distribution
A normal distribution with mean µ and standard deviation σ (variance σ²) is denoted as 𝑵(𝝁, 𝝈𝟐)

• Suppose a random variable $X \sim N(\mu, \sigma^2)$; its probability density function $f(x)$ is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$

• Symmetric, bell-shaped, and single-peaked.


• The total area (size) under the probability density function = 1.

A standard normal distribution N(0, 1) has 𝜇 = 0 and 𝜎 = 1.


Normal Distribution & Probability
• Cumulative distribution function: $P(X \le x) = F(x) = \int_{-\infty}^{x} f(u)\, du$

• Probability density function: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$

• Probability over an interval: $P(a < X < b) = \int_{a}^{b} f(x)\, dx$, the area under the density curve between $a$ and $b$.
Standard Normal Distribution
• To look up probabilities of a general normally distributed random variable
$X \sim N(\mu, \sigma^2)$,
• We need to standardize (rescale) X to obtain the standard normal random variable Z:
$$Z = \frac{X - \mu}{\sigma}$$
This transformation is called standardization, and Z is the z-score.

• It should be applied to data that follow the normal distribution.


• Useful when we compare measurements that have different units.
• After standardization, the z-score always has a mean of 0 and a standard deviation of 1.
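Standardization is just the one-line transformation above; a minimal sketch in Python (the height figures — mean 170 cm, sd 5 cm — are made-up numbers for illustration):

```python
# Standardization: how many standard deviations x lies from the mean.
def z_score(x, mu, sigma):
    return (x - mu) / sigma

# Hypothetical example: a 180 cm height in a population with mean 170 cm, sd 5 cm
print(z_score(180, 170, 5))  # 2.0
```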
Standard Normal Distribution
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)} \quad \xrightarrow{\;Z = (X-\mu)/\sigma\;} \quad f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$

• After standardization, the z-score always has a mean of 0 and a standard deviation of 1.

• Once standardized to z-scores, observations initially measured in different units are comparable.
Probabilities for Standardized Normal Distribution

Find the probability that a z score lies between 0 and 1, using a cumulative-from-mean Z table for the standard normal distribution:

z     .00    .01    .02
0.0   .0000  .0040  .0080
0.1   .0398  .0438  .0478
0.2   .0793  .0832  .0871
0.3   .1179  .1217  .1255
0.4   .1554  .1591  .1628
0.5   .1915  .1950  .1985
0.6   .2257  .2291  .2324
0.7   .2580  .2611  .2642
0.8   .2881  .2910  .2939
0.9   .3159  .3186  .3212
1.0   .3413  .3438  .3461
1.1   .3643  .3665  .3686

1. Check the Z table (Row => integer + first decimal; Column => second decimal)
2. The area between 0 and 1 is .3413, i.e., 34.13%.
Probabilities for Standardized Normal Distribution
Find the area between -2 and 1 under the standard normal distribution curve.

The area between -2 and 1 = (area between -2 and 0) + (area between 0 and 1).
The area between -2 and 0 is the same as the area between 0 and 2, by symmetry.
The area between -2 and 0 is about 48% and the area between 0 and 1 is about 34%.
Hence the area between -2 and 1 is about 48% + 34% = 82%.
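The same probabilities can be computed without a printed table: the standard normal CDF is expressible through the error function in Python's standard `math` module. A sketch reproducing the 82% result:

```python
# Standard normal probabilities via the error function (no Z table needed).
from math import erf, sqrt

def phi(z):
    """Standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p = phi(1) - phi(-2)  # P(-2 < Z < 1)
print(round(p, 4))    # 0.8186, i.e., about 82%
```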
The Chi-Squared Distribution
The chi-squared distribution is the distribution of the sum of m squared independent standard
normal random variables
• If we have m independent standard normal random variables $Z_i$, then the sum of their squares
$\sum_{i=1}^{m} Z_i^2$ has a chi-squared distribution with m degrees of freedom: $\chi^2_m$

• The chi-squared distribution is used when testing


hypotheses in Econometrics (Survey studies)
e.g., goodness of fit of distribution of data
relationships between two categorical variables.

• Note: Degrees of freedom refers to the number of


independent pieces of information (variables) used to
calculate a statistic.
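The definition can be illustrated with a small Monte Carlo sketch using only the standard library: sum m squared independent standard normals many times and check that the simulated mean is close to m, the mean of a $\chi^2_m$ distribution (the seed, m = 3, and sample size are arbitrary choices for illustration):

```python
# Monte Carlo check: sum of m squared independent standard normals
# behaves like a chi-squared random variable with mean m.
import random

random.seed(0)           # arbitrary seed, for reproducibility
m, trials = 3, 200_000   # arbitrary degrees of freedom and simulation size

draws = [sum(random.gauss(0, 1) ** 2 for _ in range(m)) for _ in range(trials)]
sample_mean = sum(draws) / trials
print(round(sample_mean, 1))  # close to m = 3
```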
The Student t distribution
Let Z be a standard normal random variable and W a chi-squared distributed random variable with m
degrees of freedom; then $\frac{Z}{\sqrt{W/m}}$ follows the Student t-distribution with m degrees of
freedom, $t_m$.

• The t distribution is similar to the standard normal


distribution but with fatter tails (higher variance).

• As m increases, the t distribution converges to the standard normal distribution (approximately so when m > 30).

• The Student t distribution is also often used when testing


hypotheses in Econometrics
e.g., test how close one observation is to the mean
The F-distribution
The F-distribution with m and n degrees of freedom, $F_{m,n}$, is the distribution of the statistic $\frac{W/m}{V/n}$,
where W (df = m) and V (df = n) are two independent chi-squared random variables.

• The main use of F-distribution is to test whether two


independent samples have been drawn from normal
populations with the same variance.

• In Econometrics, we use F-distribution to


Ø Compute F-statistics to compare two populations.
Ø Test the fit of different regression models.
Two Random Variables
Two Random Variables: Joint Distribution
• Most of the interesting questions in econometrics involve 2 or more random variables
• Answering these questions requires understanding the concepts of joint distribution, as well as
marginal distribution and conditional distribution
The joint probability distribution of two random variables X and Y can be written as
Pr(X = x, Y = y)
Simple example with a binary variable:
• Let Y equal 1 if it rains and 0 if it does not rain
• Let X equal 1 if it is very cold and 0 if it is not very cold

Joint probability distribution of X and Y


Very Cold (X=1) Not Very Cold (X=0) Total
Rain (Y=1) 0.15 0.07 0.22
Not Rain(Y =0) 0.15 0.63 0.78
Total 0.30 0.70 1
Two Random Variables: Marginal Distribution
The marginal probability distribution of a random variable is just another name for its probability
distribution
• The marginal distribution of Y can be computed from the joint distribution of X and Y by adding up the
probabilities of all possible outcomes for which Y takes a specific value
$$\Pr(Y = y) = \sum_{i=1}^{l} \Pr(X = x_i, Y = y)$$
Then the probability that it will rain:

Pr(Y = 1) = Pr(X = 1, Y = 1) + Pr(X = 0, Y = 1) = 0.15 + 0.07 = 0.22

Joint probability distribution of X and Y


Very Cold (X=1) Not Very Cold (X=0) Total
Rain (Y=1) 0.15 0.07 0.22
Not Rain(Y =0) 0.15 0.63 0.78
Total 0.30 0.70 1
Two Random Variables: Conditional Distribution
The conditional distribution is the distribution of a random variable conditional on another random variable
taking on a specific value.

• The conditional probability that it rains given that it is very cold


$$\Pr(Y = 1 \mid X = 1) = \frac{0.15}{0.3} = 0.5$$
• In general, the conditional distribution of Y given X is
$$\Pr(Y = y \mid X = x) = \frac{\Pr(X = x, Y = y)}{\Pr(X = x)}$$
• The conditional expectation of Y given X is
$$E(Y \mid X = x) = \sum_{i=1}^{k} y_i \Pr(Y = y_i \mid X = x)$$
Example: The expected value of rain (Y) given that it is very cold (X=1) equals

$E(Y \mid X = 1) = 1 \times \Pr(Y = 1 \mid X = 1) + 0 \times \Pr(Y = 0 \mid X = 1) = 1 \times 0.5 + 0 \times 0.5 = 0.5$
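The marginal and conditional calculations can be reproduced from the joint table in a few lines of Python (a minimal sketch; the dictionary keys (x, y) mirror the table's cells):

```python
# Joint distribution from the table: X = very cold, Y = rain.
joint = {(1, 1): 0.15, (0, 1): 0.07, (1, 0): 0.15, (0, 0): 0.63}

pr_rain = sum(p for (x, y), p in joint.items() if y == 1)  # marginal Pr(Y = 1)
pr_cold = sum(p for (x, y), p in joint.items() if x == 1)  # marginal Pr(X = 1)

pr_rain_given_cold = joint[(1, 1)] / pr_cold               # Pr(Y = 1 | X = 1)
e_rain_given_cold = sum(y * p / pr_cold for (x, y), p in joint.items() if x == 1)

print(round(pr_rain, 2), round(pr_rain_given_cold, 2), round(e_rain_given_cold, 2))
# 0.22 0.5 0.5
```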


Two Random Variables: Independence
Independence: Two random variables X and Y are independent if the conditional distribution of Y given X
does not depend on X

$$\Pr(Y = y \mid X = x) = \Pr(Y = y)$$
• If X and Y are independent this also implies

$\Pr(Y = y, X = x) = \Pr(X = x) \cdot \Pr(Y = y)$ (see slide 8)

• The conditional mean of Y given X equals the unconditional mean of Y


$$E(Y \mid X) = E(Y)$$

Take the example: What is the expected value for rain?


Two Random Variables: Covariance
The covariance is a measure of the extent to which two random variables X and Y move together,
i.e., the linear association between X and Y.

$$Cov(X, Y) = E\left[(X - E(X))(Y - E(Y))\right] = \sigma_{XY}$$


Example:
Very Cold (X=1) Not Very Cold (X=0) Total
Rain (Y=1) 0.15 0.07 0.22
Not Rain(Y =0) 0.15 0.63 0.78
Total 0.30 0.70 1

The covariance between rain (Y) and it being very cold (X):

Cov(X, Y) = (1 − 0.3)(1 − 0.22) · 0.15 + (1 − 0.3)(0 − 0.22) · 0.15 + (0 − 0.3)(1 − 0.22) · 0.07 + (0 −
0.3)(0 − 0.22) · 0.63 = 0.084

Note: If X and Y are independently distributed, then 𝐶𝑜𝑣 𝑋, 𝑌 = 0 (But not vice versa !!)
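Applying the covariance definition directly to the joint table gives ≈ 0.084 (a minimal sketch using the same joint distribution dictionary):

```python
# Covariance of X (very cold) and Y (rain) from their joint distribution.
joint = {(1, 1): 0.15, (0, 1): 0.07, (1, 0): 0.15, (0, 0): 0.63}

e_x = sum(x * p for (x, y), p in joint.items())  # E(X) = 0.30
e_y = sum(y * p for (x, y), p in joint.items())  # E(Y) = 0.22

cov = sum((x - e_x) * (y - e_y) * p for (x, y), p in joint.items())
print(round(cov, 3))  # 0.084
```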
Two Random Variables: Correlation
• The units of the covariance of X and Y are the units of X multiplied by the units of Y.
• This makes it hard to interpret the size of the covariance.

• The correlation between X and Y is unit free:

$$Corr(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X)\, Var(Y)}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$

• A correlation is always between -1 and 1 and X and Y are uncorrelated if Corr (X, Y) = 0

• If the conditional mean of Y does not depend on X, X and Y are uncorrelated.

• If X and Y are uncorrelated, this does not necessarily imply mean independence!
Two Random Variables: Correlation

Once we have calculated the correlation, we can conclude that

Ø corr(X,Y) = 1 means perfect positive linear association
Ø corr(X,Y) = -1 means perfect negative linear association
Ø corr(X,Y) = 0 means no linear association

Have a look at the three scatter plots. Guess what the correlation Corr(X, Y) looks like:

ü corr(X,Y) != 0?
ü corr(X,Y) > 0?
ü corr(X,Y) < 0?
ü corr(X,Y) = 0?
ü Others?

[Figure: three scatter plots showing different patterns of association]
Two Random Variables: Correlation for a sample dataset
Example: Association between the scores of midterm and final

1. Collect the data for two variables over each individual (Cross-sectional dataset)

Midterm:
Rank  id  major  name  score
1     …   …      ***   198
2     …   …      ***   188
2     …   …      ***   188
4     …   …      ***   185
4     …   …      ***   185
6     …   …      ***   182
7     …   …      ***   181

Final:
Rank  id  major  name  score
1     …   …      ***   192
2     …   …      ***   188
3     …   …      ***   180
3     …   …      ***   180
5     …   …      ***   175
5     …   …      ***   175
7     …   …      ***   173

Check the joint distribution of the two variables?


Two Random Variables: Correlation for a sample dataset
2. Check whether the midterm score tends to be similar to the final score through a scatter plot,
with the midterm score (independent variable) on the horizontal axis, the final exam score
(dependent variable) on the vertical axis, and a 45-degree line for reference.

• If the midterm score and final score were always the same, all observations would lie on the 45-degree line.
• The pairs (midterm score, final score) are spread around the 45-degree line.
• The midterm score helps a lot in predicting the final score.
Two Random Variables: Correlation for a sample dataset
Population correlation $\rho_{XY}$:

$$\rho_{XY} = Corr(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X)\, Var(Y)}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$

Sample correlation coefficient $r_{xy}$ (after simplification):

$$r_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

Where:
n is the sample size and the degrees of freedom is n − 1.
$x_i, y_i$ are the individual sample points indexed by i.
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ are the sample means of x and y, respectively.
$s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$ is the sample standard deviation, and analogously for $s_y$.
$\frac{x_i - \bar{x}}{s_x}$ and $\frac{y_i - \bar{y}}{s_y}$ are also called the sample standardized scores for x and y, respectively.
Two Random Variables: Correlation for a sample dataset

x  y
1  5      $\bar{x} = 4$, $s_x = 2.24$
3  9      $\bar{y} = 7$, $s_y = 4.47$
4  7
5  1
7  13

x  y   standardized x  standardized y  Product
1  5   -1.34           -0.45            0.6
3  9   -0.45            0.45           -0.2
4  7    0.00            0.00            0.0
5  1    0.45           -1.34           -0.6
7  13   1.34            1.34            1.8

$r_{xy}$ = (0.6 − 0.2 + 0.0 − 0.6 + 1.8) / (5 − 1) = 0.40
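The hand computation above can be verified with the standard library's `statistics` module (a minimal sketch; note that `stdev` already uses the n − 1 denominator):

```python
# Sample correlation for the five (x, y) pairs in the worked example.
from statistics import mean, stdev  # stdev uses the n - 1 denominator

xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)
n = len(xs)

# r = (1 / (n - 1)) * sum of products of standardized scores
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)
print(round(r, 2))  # 0.4
```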


Two Random Variables: Correlation for a sample dataset

Concern and Caution on Interpreting Correlation:

1. Correlation by Chance
• Replicate with a big enough sample size

2. Correlation is not Causation


• Reverse causality
New Hebrides (Vanuatu) in the South Pacific Ocean, under UK and French sovereignty
“Lice make a man healthy”
From observation that people in good health usually had lice and sick people did not

• Simultaneity
Income and investment
This is it for today!
