FRM Level I Quantitative Analysis — Intensive Review (Standard Edition)
FRM Level I Training Handout — Intensive Class
Lecturer: Crystal
金程教育
1-156
Framework
Fundamentals of Probability
Random Variables and basic statistics
Distribution
Hypothesis Testing
Linear Regression
Time Series
Measuring Returns, Volatility and
Correlation
Simulation and Bootstrapping
2-156
Fundamentals of
Probability
3-156
1. Fundamental principles of probability
Axioms of Probability
Any event A in the event space has P(A) ≥ 0 (also written Pr(A) ≥ 0).
The probability of all events in the sample space is 1: P(Ω) = 1.
If events A1 and A2 are mutually exclusive, P(A1 ∪ A2) = P(A1) + P(A2).
Extensions
The probability of the union of any two sets can be decomposed into:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
4-156
2. Conditional probability
Unconditional Probability (Marginal Probability)
The probability of an event without any restrictions (or lacking any prior
information), commonly known as P(A).
Conditional Probability
The probability that one event occurs given that another has occurred. The
conditional probability of Event B, conditional on Event A, is given by
P(B|A) = P(A ∩ B) / P(A),  P(A) > 0
Joint probability
P(A ∩ B) is the joint probability, which means the probability that two
events occur simultaneously.
5-156
2. Conditional probability
Independent events (or unconditionally independent events):
If event B is not influenced by whether the other event A occurs, then we
say those events are independent; otherwise they are dependent.
P(A ∩ B) = P(A) × P(B)
Conditional independence
Like probability, independence can be redefined to hold conditional on
another event C. Two events A and B are conditionally independent if:
P(A ∩ B | C) = P(A | C) × P(B | C)
Note that two types of independence—unconditional and conditional—do
not imply each other.
Events can be both unconditionally dependent and conditionally
independent.
Events can be independent, yet conditional on another event they may
be dependent.
6-156
3. Total Probability Formula
Total Probability Formula
If an event A must result in one of the mutually exclusive events A1, A2,
A3, …, An, then
P(A) = P(A1)P(A|A1) + P(A2)P(A|A2) + … + P(An)P(A|An)
where (1) Ai ∩ Aj = ∅ (i ≠ j), and (2) A1 ∪ A2 ∪ … ∪ An = Ω
7-156
4. Bayes’ rule
Specifically, we are often interested in the probability of an event happening
only if another event happens first.
Bayes’ rule provides a method to construct conditional probabilities using
other probability measures.
It is both a simple application of the definition of conditional probability
and an extremely important tool.
P(A|B) = P(A ∩ B) / P(B) = [P(B|A) / P(B)] × P(A)
8-156
4. Bayes’ rule
Example:
Suppose you are asked to use historical data to categorize
managers as excellent or average.
Excellent managers are expected to outperform the market 70% of the
time. Average managers are expected to outperform the market only
50% of the time.
Assume that the probability of a manager outperforming the market
in any given year is independent of performance in prior years.
ABC Insurance Company has found that only 20% of all fund managers
are excellent managers and the remaining 80% are average managers.
A new fund manager to the portfolio started three years ago and
outperformed the market all three years.
9-156
4. Bayes’ rule
Questions:
1. What is the probability that the new manager was an excellent
manager when she first started managing portfolios three years
ago?
2. What are the probabilities that the new manager is an excellent or
average manager today?
3. What is the probability that the new manager will beat the market
next year, given that the new manager outperformed the market
the last three years?
TIPS:
P(O): The probability that the fund manager will outperform the
market.
P(E): The probability that the fund manager is an excellent manager.
P(A): The probability that the fund manager is an average manager.
10-156
4. Bayes’ rule
Answer 1: P(E)=20%
Answer 2:
The probability of an excellent manager outperforming the market is
70% [P(O|E)=70%].
The probability of an average manager outperforming the market is
50% [P(O|A) = 50%].
Next, we need to use Bayes’ theorem to determine the probability that
the new manager is excellent given that the manager outperformed
the market three years in a row.
Let O denote outperforming the market three years in a row.
P(O|E) = 0.7³ = 0.343,  P(E) = 0.2
P(O) = P(O|E) × P(E) + P(O|A) × P(A) = 0.7³ × 0.2 + 0.5³ × 0.8 = 0.1686
P(E|O) = P(O|E) × P(E) / P(O) = (0.343 × 0.2) / 0.1686 = 0.4069 = 40.7%
11-156
4. Bayes’ rule
Answer 3: This question is determined by finding the unconditional
probability of the new manager outperforming the market. However,
now we will use 40.7% as the weight for the probability that the
manager is excellent and 59.3% as the weight for the probability that
the manager is average:
P(O) = P(O|E) × P(E) + P(O|A) × P(A) = 0.7 × 0.407 + 0.5 × 0.593 = 0.5814
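The full Bayesian update in this example can be reproduced with a short Python sketch (the variable names are our own):

```python
# Bayes' rule update for the fund-manager example.
p_e, p_a = 0.20, 0.80            # prior: excellent vs. average manager
p_o_e, p_o_a = 0.70, 0.50        # P(outperform in a year | manager type)

# P(outperform 3 years running | type), assuming independence across years
p_3_e, p_3_a = p_o_e ** 3, p_o_a ** 3
p_3 = p_3_e * p_e + p_3_a * p_a                  # total probability
post_e = p_3_e * p_e / p_3                       # Bayes' rule: P(E | 3 wins)
p_next = p_o_e * post_e + p_o_a * (1 - post_e)   # next-year forecast
print(round(p_3, 4), round(post_e, 3), round(p_next, 4))
```

The posterior 40.7% then replaces the 20% prior when forecasting next year's performance.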
12-156
5. Question 1
The following is a probability matrix for X = {1, 2, 3} and Y = {1, 2, 3}.
         X=1    X=2    X=3
Y=1      6%     15%    9%
Y=2      12%    30%    18%
Y=3      2%     5%     3%
Each of the following is true except:
A. X and Y are independent.
B. The covariance (X,Y) is non-zero.
C. The probability Y = 3 conditional on X = 1 is 10%.
D. The unconditional probability that X = 2 is 50%.
Correct Answer : B
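A quick way to verify the answer is to compute the marginals and check independence cell by cell; a minimal Python sketch (our own illustration):

```python
# Verify the probability-matrix question: marginals, independence, conditional.
joint = {(1, 1): 0.06, (2, 1): 0.15, (3, 1): 0.09,
         (1, 2): 0.12, (2, 2): 0.30, (3, 2): 0.18,
         (1, 3): 0.02, (2, 3): 0.05, (3, 3): 0.03}   # keys are (x, y)
px = {x: sum(p for (xx, y), p in joint.items() if xx == x) for x in (1, 2, 3)}
py = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in (1, 2, 3)}
independent = all(abs(joint[x, y] - px[x] * py[y]) < 1e-12
                  for x in (1, 2, 3) for y in (1, 2, 3))
cond = joint[1, 3] / px[1]            # P(Y = 3 | X = 1)
print(independent, px[2], round(cond, 2))
```

Every joint cell equals the product of its marginals, so X and Y are independent and their covariance is zero — statement B is the false one.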
13-156
Random Variable
and basic
statistics
14-156
1.Random Variable
Definition of a random variable
A random variable is a function X(ω) that maps each outcome ω in the
sample space Ω to a number x.
Two classes of random variables
Discrete random variable
Assigns a probability to a distinct set of values, which can be either
finite or contain a countably infinite set of values.
When the set of values is infinite, the set must be countable.
Probability mass function (PMF)
Cumulative distribution function (CDF): 𝐹𝑋 𝑥 = P X ≤ 𝑥
Continuous random variable
Produces values from an uncountable set.
Probability density function (PDF) is used instead of PMF
𝑃(𝑋 = 𝑥) = 0 even though x can occur.
15-156
1.Random Variable
Multivariate random variables extend the concept of a single random
variable to include measures of dependence between two or more random
variables.
Like their univariate counterparts, they can be discrete or continuous.
Probability matrix(probability table): When dealing with the joint
probabilities of two variables, it is often convenient to summarize the
various probabilities in a probability matrix.
Example: X is drawn from {1, 2} with equal probability; then Y is drawn
from {1, …, X} with equal probability.
         X=1     X=2
Y=1      0.50    0.25
Y=2      0.00    0.25
The probabilities in the cells are joint probabilities.
16-156
1.Random Variable
Marginal distributions:
The distribution of a single component of a bivariate random variable is
called a marginal distribution.
When a PMF is represented as a probability matrix, the two marginal
distributions are computed by either summing across columns or
summing down rows.
E.g. The added f_Y(y) column and f_X(x) row contain the marginal
probabilities; they are the marginal distributions of Y and X.
         X=1     X=2     f_Y(y)
Y=1      0.50    0.25    0.75
Y=2      0.00    0.25    0.25
f_X(x)   0.50    0.50
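The marginal sums above can be reproduced with a minimal Python sketch (joint PMF hard-coded from the table):

```python
# Marginal distributions from the joint PMF: sum across the other variable.
joint = {(1, 1): 0.50, (2, 1): 0.25, (1, 2): 0.00, (2, 2): 0.25}  # keys (x, y)
fx = {x: joint[x, 1] + joint[x, 2] for x in (1, 2)}   # sum over y
fy = {y: joint[1, y] + joint[2, y] for y in (1, 2)}   # sum over x
print(fx, fy)
```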
17-156
1.Random Variable
Conditional Distributions:
The conditional distribution summarizes the probability of the outcomes
for one random variable conditional on the other taking a specific value.
E.g. What is the conditional distribution of Y conditional on X = 2?
f_{Y|X}(y | x = 2) = f_{X,Y}(2, y) / f_X(2) = f_{X,Y}(2, y) / 0.5
The conditional distribution is then
         Y=1     Y=2
         50%     50%
18-156
1.Random Variable
Independence
The components of a bivariate random variable are independent if
f_{X,Y}(x, y) = f_X(x) × f_Y(y)
E.g. X and Y are not independent in the previous example:
f_X(1) × f_Y(1) = 0.50 × 0.75 = 0.375 ≠ f_{X,Y}(1, 1) = 0.50
         X=1     X=2     f_Y(y)
Y=1      0.50    0.25    0.75
Y=2      0.00    0.25    0.25
f_X(x)   0.50    0.50
19-156
2.Expectations and Moments
Moment: As stated previously, moments are a set of commonly used
descriptive measures that characterize important features of random
variables.
Two types of moments
Central moment: μ_k = E[(X − E[X])^k], (k ≥ 2)
Non-central moment: μ_k^NC = E[X^k], (k ≥ 1)
20-156
2.Expectation and conditional expectations
The expectation of a function g(X,Y) is a probability weighted average of the
function of the outcomes g(X, Y). The expectation is defined as:
E[g(X, Y)] = Σ_x Σ_y g(x, y) f_{X,Y}(x, y)
         X=1     X=2
Y=3      15%     10%
Y=4      60%     15%
Now consider the function g(X, Y) = X^Y. The expectation of g(X, Y) is therefore:
E[g(X, Y)] = Σ_{x∈{1,2}} Σ_{y∈{3,4}} g(x, y) f(x, y)
E[g(X, Y)] = 1³ × 0.15 + 1⁴ × 0.60 + 2³ × 0.10 + 2⁴ × 0.15 = 3.95
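A short sketch confirming the probability-weighted sum (the dictionary layout is our own):

```python
# Probability-weighted expectation of g(X, Y) = X**Y over the joint PMF.
joint = {(1, 3): 0.15, (2, 3): 0.10, (1, 4): 0.60, (2, 4): 0.15}  # keys (x, y)
e_g = sum((x ** y) * p for (x, y), p in joint.items())
print(e_g)   # 1**3*0.15 + 2**3*0.10 + 1**4*0.60 + 2**4*0.15
```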
21-156
2.Expectation and conditional expectations
A conditional expectation is an expectation when one random variable
takes a specific value or falls into a defined range of values.
E.g. In the previous example, the distribution of Y conditional on X = 2:
f_{Y|X}(y | x = 2) = f_{X,Y}(2, y) / f_X(2) = f_{X,Y}(2, y) / 0.5
The conditional distribution is then
         Y=1     Y=2
         50%     50%
The conditional expectation of Y is then
E[Y | X = 2] = 1 × 50% + 2 × 50% = 1.5
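The conditional expectation can be checked the same way; a minimal sketch assuming the joint PMF from the earlier matrix:

```python
# Conditional distribution and expectation of Y given X = 2.
joint_at_x2 = {(2, 1): 0.25, (2, 2): 0.25}       # slice of the joint PMF at X = 2
fx2 = sum(joint_at_x2.values())                  # f_X(2) = 0.5
cond = {y: p / fx2 for (x, y), p in joint_at_x2.items()}
e_y_given_x2 = sum(y * p for y, p in cond.items())
print(cond, e_y_given_x2)
```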
22-156
2.Expectations and Moments
Expected Value
A measure of central tendency – the first moment
23-156
2.Expectations and Moments
The median measures the 50% quantile of the distribution.
First, the data are sorted from smallest to largest.
When the sample size is odd, median(x) = x_((n+1)/2)
Two other commonly reported quantiles are the 25% and 75% quantiles.
First, the data are sorted from smallest to largest
Second, calculate the unit interval = 100%/(n − 1).
Find the 25% quantile and 75% quantile with linear interpolation.
These two quantile estimates ( 𝑞25 and 𝑞75 ) are frequently used together to
estimate the inter-quartile range:
𝐼𝑄𝑅 = 𝑞75 − 𝑞25
24-156
2. Variance and Moments
Variance
A measure of dispersion – the second moment
σ²_x = (1/n) Σᵢ₌₁ⁿ (Xᵢ − μ)² = E[(X − μ)²]
Above formula is the definition of variance. To compute the variance, we
use the following formula:
σ2 X = E X 2 − [E(X)]2
Measures how noisy or unpredictable that random variable is.
The positive square root of 𝜎𝑥2 , 𝜎, is known as the standard deviation,
also called volatility.
25-156
2. Variance and Moments
Properties of Variance
The variance of a constant is zero.
If a is constant, then: σ2 (aX) = a2 σ2 (X).
If b is a constant, then: σ2 (X + b) = σ2 (X).
If a and b are constant, then: σ2 (aX + b) = a2 σ2 (X).
If X and Y are independent random variables and a and b are constants,
then σ2 aX + bY = a2 σ2 X + b2 σ2 (Y).
If X and Y are two independent random variables, then:
σ²(X + Y) = σ²(X) + σ²(Y);  σ²(X − Y) = σ²(X) + σ²(Y)
The relationship between expectation and variance:
σ²(X) = E(X²) − [E(X)]²
26-156
3. Covariance and Moments
Covariance
Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y]
Sample covariance:
S_XY = (1/(n − 1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ)
Covariance measures how one random variable moves with another
random variable.
Covariance ranges from negative infinity to positive infinity.
27-156
3. Covariance and Moments
Properties of Covariance
If X and Y are independent random variables, their covariance is zero.
The relationship between covariance and variance:
σ²(X ± Y) = σ²_x + σ²_y ± 2Cov(X, Y)
If a, b and c are constant, then
Cov(a + bX, cY) = Cov(a, cY) + Cov(bX, cY) = b × c × Cov(X, Y)
The covariance of X and itself is the variance of X
Cov(X, X) = E[(X − E[X])(X − E[X])] = σ²(X)
28-156
4. Correlation Coefficient and Moments
Correlation coefficient
ρ_XY = Cov(X, Y)/(σ_x σ_y) = σ_xy/(σ_x σ_y) = E[(X − μ_x)(Y − μ_y)]/(σ_x σ_y)
Properties of Correlation coefficient
Correlation has no units and ranges from −1 to +1.
Correlation measures the linear relationship between two random
variables.
If two variables are independent, their covariance is zero, therefore, the
correlation coefficient will be zero. The converse, however, is not true.
For example, Y = X².
Variances of correlated variables:
σ²(X ± Y) = σ²_x + σ²_y ± 2ρ_XY σ_x σ_y
29-156
5. Skewness and Moments
Skewness
A measure of asymmetry of a PDF – the third moment.
S = E[(X − μ_x)³] / σ³_x = third central moment / cube of standard deviation
Symmetrical and nonsymmetrical distributions
Positively skewed (right skewed) and negatively skewed(left skewed)
[Figure: negatively skewed (mean < median < mode), symmetric (mean = median = mode), positively skewed (mode < median < mean)]
Positively skewed: mode < median < mean, with a fat right tail
Negatively skewed: mode > median > mean, with a fat left tail
30-156
6. Kurtosis and Moments
Kurtosis
A measure of tallness or flatness of a PDF – the fourth moment.
K = E[(X − μ_x)⁴] / (E[(X − μ_x)²])² = fourth central moment / square of second central moment
For a normal distribution, the K value is 3.
Excess kurtosis = kurtosis – 3
31-156
7. Moments and Linear Transformations
Let 𝑌 = 𝑎 + 𝑏𝑋, a and b are constants:
The mean of Y is
𝐸 𝑌 = 𝑎 + 𝑏𝐸 𝑋
The variance of Y is
V[Y] = b²V[X], or b²σ²_X
If b>0, the skewness and kurtosis of Y are identical to the skewness and
kurtosis of X.
If b<0, (and thus 𝑌 = 𝑎 + 𝑏𝑋 is a decreasing transformation), then the
skewness has the same magnitude but the opposite sign.
32-156
8. The BLUE mean estimator
A sample estimator is used to estimate a population parameter; a desirable
estimator is the Best Linear Unbiased Estimator (BLUE).
Linear: the estimator is calculated as a linear function of the sample data.
Unbiased: the expected value of the estimator equals the population
parameter.
Best: the estimator has the minimum variance among all linear unbiased
estimators.
The mean estimator is the Best Linear Unbiased Estimator (BLUE) of the
population mean when the data are iid.
33-156
8. Question 1
A model of the frequency of losses (L) per day assumes the following
discrete distribution: zero loss with probability of 20%; one loss with
probability of 30%; two losses with probability of 30%; three losses with
probability of 10%; and four losses with probability of 10%. What are,
respectively, the expected number of loss events and the standard
deviation of the number of loss events?
A. E(L) = 1.2 and σ = 1.44
B. E(L) = 1.6 and σ = 1.20
C. E(L) = 1.8 and σ = 2.33
D. E(L) = 2.2 and σ = 9.60
Correct Answer : B
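The answer follows from the definitional formulas for the mean and variance of a discrete distribution; a minimal Python check (PMF hard-coded from the question):

```python
# Expected loss count and its standard deviation for the discrete PMF.
import math

pmf = {0: 0.20, 1: 0.30, 2: 0.30, 3: 0.10, 4: 0.10}
mean = sum(k * p for k, p in pmf.items())                  # E(L)
var = sum((k - mean) ** 2 * p for k, p in pmf.items())     # E[(L - E(L))^2]
print(mean, math.sqrt(var))
```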
34-156
8. Question 2
An analyst is concerned with the symmetry and peakedness of a
distribution of returns over a period of time for a company she is
examining. She does some calculations and finds that the median return is
4.2%, the mean return is 3.7%, and the mode return is 4.8%. She also finds
that the measure of kurtosis is 2. Based on this information, the correct
characterization of the distribution of return over time is:
Skewness Kurtosis
A. Positive Leptokurtic
B. Positive Platykurtic
C. Negative Platykurtic
D. Negative Leptokurtic
Correct Answer : C
35-156
8. Question 3
Which one of the following statements about the correlation coefficient
is false?
A. It always ranges from -1 to +1.
B. A correlation coefficient of zero means that two random variables
are independent.
C. It is a measure of linear relationship between two random variables.
D. It can be calculated by scaling the covariance between two
random variables.
Correct Answer: B
36-156
Distribution
37-156
1. Binomial Distribution
Bernoulli Distribution
P(X = 1) = p P(X = 0) = 1 – p
Binomial Distribution
The probability of x successes in n trials
p(x) = P(X = x) = C(n, x) p^x (1 − p)^(n−x) = [n!/(x!(n − x)!)] p^x (1 − p)^(n−x)
Expectations and variances
Expectation Variance
Bernoulli random variable p p(1 – p)
Binomial random variable np np(1 – p)
38-156
2. Poisson Distribution
Poisson Distribution
When there are a large number of trials but a small probability of
success, Binomial calculations become impractical.
If we substitute λt/n for p and let n grow very large, the Binomial
distribution becomes the Poisson distribution.
p(k) = P(X = k) = (λ^k / k!) e^(−λ),  where λ = np
X refers to the number of success per unit.
λ indicates the rate of occurrence of the random events; i.e., it tells
us how many events occur on average per unit of time.
Mean Variance
λ λ
39-156
3. Normal Distribution
Normal Distribution
As n increases, the binomial distribution approaches Normal Distribution.
40-156
3. Normal Distribution
Normal distribution in practice
① Sums of independent normally distributed random variables are also
normally distributed.
② Standardization and Z-table Application
③ Key quantiles in normal distribution
④ Approximating discrete random variables to normal random variable
41-156
3. Normal Distribution
① Sums of independent normally distributed random variables are also
normally distributed
If X ~ N(μ1, σ1²), Y ~ N(μ2, σ2²) and they are independent, then
aX + bY ~ N(aμ1 + bμ2, a²σ1² + b²σ2²)
When applied to portfolio, a and b are usually asset weights
Example:
A $50 million prudent fund (PF) is merged with a $200 million
aggressive fund (AF). The return of PF ~ N(0.03, 0.07²) and the return of
AF ~ N(0.07, 0.15²). Assuming the returns are independent, what is the
distribution of the portfolio return?
Correct Answer: R_p ~ N(0.062, 0.1208²)
42-156
3. Normal Distribution
② Standardization and Z-table Application
Standardization: if 𝑋 ~ 𝑁(µ, 𝜎²), then
Z = (X − μ) / σ ~ N(0, 1)
Check the Z-table to find the CDF:
Φ((X − μ) / σ)
Example:
If 𝑋 ~ 𝑁(70, 9), compute the probability of X ≤ 64.12.
Solution:
𝑍 = (𝑋 − 𝜇)/𝜎 = (64.12 − 70)/3 = −1.96
𝑃(𝑍 ≤ −1.96) = 0.0250
𝑃(𝑋 ≤ 64.12) = 0.0250
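The Z-table lookup can be reproduced with the standard library's `statistics.NormalDist` (an exact-CDF substitute for the table):

```python
# Standardize X ~ N(70, 9) and look up the standard-normal CDF.
from statistics import NormalDist

mu, sigma = 70, 3                 # variance 9, so sigma = 3
z = (64.12 - mu) / sigma          # standardization: Z = (X - mu)/sigma
p = NormalDist().cdf(z)           # P(Z <= z) = P(X <= 64.12)
print(round(z, 2), round(p, 4))
```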
43-156
3. Normal Distribution
③ Key quantiles in normal distribution
Approximately 68% of all observations fall in the interval 𝜇 ± 𝜎
Approximately 90% of all observations fall in the interval 𝜇 ± 1.65𝜎
Approximately 95% of all observations fall in the interval 𝜇 ± 1.96𝜎
Approximately 99% of all observations fall in the interval 𝜇 ± 2.58𝜎
44-156
3. Normal Distribution
④ Approximating discrete random variables to normal random variable
A binomial random variable can approximate to a normal random
variable, 𝑁(𝑛𝑝, 𝑛𝑝(1 − 𝑝) if both
𝑛𝑝 ≥ 10
𝑛(1 − 𝑝) ≥ 10.
[Figure: binomial PMFs with n = 20, approaching the normal bell shape]
45-156
4. Student’s t distribution
Student’s t distribution
It is similar to the normal, except it exhibits slightly heavier tails.
t = (X̄ − μ_X) / (S_X/√n) ~ t_(n−1)
It is symmetrical and has a mean of zero.
As the d.f. increase, the t-distribution converges with the standard
normal distribution.
Both the normal and student’s t distribution characterize the sampling
distribution of the sample mean. The difference is that the normal is
used when we know the population variance. The student’s t is used
when we must rely on the sample variance. In practice, we don’t know
the population variance, so the student’s t distribution is typically
appropriate.
46-156
5. Lognormal Distribution
Lognormal Distribution
It is useful for modeling asset prices which never take negative values.
Right skewed.
Chi-Square (𝝌𝟐 ) Distribution
F-Distribution
47-156
6. LLN and CLT
Law of Large Numbers (LLN)
Kolmogorov's Strong Law of Large Numbers states that if X_i is a
sequence of iid random variables with E[X_i] ≡ μ (a finite parameter),
then:
μ̂_n = (1/n) Σᵢ₌₁ⁿ X_i → μ  (a.s.)
The symbol a.s. means "converges almost surely": the mean in large
samples μ̂_n (formally, as n → ∞) approaches the true population mean μ.
48-156
6. LLN and CLT
The Central Limit Theorem (CLT)
If X1, X2, …, Xn is an i.i.d. random sample from any population (regardless
of its distribution) with a finite mean μ and finite variance σ², then the
mean estimator μ̂_n tends to be normally distributed with mean μ and
variance σ²/n as the sample size increases:
μ̂_n = X̄ →d N(μ, σ²/n)
or
(μ̂_n − μ) / (σ/√n) →d N(0, 1)
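A Monte Carlo sketch of the CLT, using uniform(0, 1) draws (μ = 0.5, σ² = 1/12) as an assumed example; the seed, sample size, and repetition count are arbitrary choices:

```python
# Means of n uniform draws should look ~N(mu, sigma^2/n) for large n.
import random
import statistics

random.seed(42)
n, reps = 50, 2000
means = [statistics.fmean([random.random() for _ in range(n)])
         for _ in range(reps)]
# Mean of the sample means ~ 0.5; variance scaled by n ~ 1/12.
print(round(statistics.fmean(means), 2),
      round(statistics.pvariance(means) * n, 3))
```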
49-156
7. Question 1
1. On a multiple choice exam with four choices for each of six questions,
what is the probability that a student gets less than two questions
correct simply by guessing?
A. 0.46%
B. 23.73%
C. 35.60%
D. 53.39%
Correct Answer:D
p(X=0) = (3/4)⁶ = 17.80%
p(X=1) = 6 × (1/4) × (3/4)⁵ = 35.59%
The probability of getting less than two questions correct is
p(X=0) + p(X=1) = 53.39%.
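The same numbers fall out of the binomial PMF computed with `math.comb`; a minimal sketch:

```python
# Binomial PMF: P(fewer than two correct) when guessing n = 6
# four-choice questions (p = 1/4 per question).
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 6, 0.25
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))   # equals np
less_than_two = binom_pmf(0, n, p) + binom_pmf(1, n, p)
print(round(mean, 6), round(less_than_two, 4))
```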
50-156
7. Question 2
2. A call center receives an average of two phone calls per hour. The
probability that they will receive 20 calls in an 8-hour day is closest to:
A. 5.59%
B. 16.56%
C. 3.66%
D. 6.40%
Correct Answer: A
To solve this question, we first need to realize that the expected
number of phone calls in an 8-hour day is 16. Using the Poisson
distribution, we solve for the probability that X will be 20.
P(X = 20) = (16²⁰ × e⁻¹⁶) / 20! = 5.59%
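The Poisson probability above is easy to evaluate directly with the standard library:

```python
# Poisson probability for the call center: lambda = 2 calls/hour * 8 hours.
import math

lam = 2 * 8
p20 = lam ** 20 * math.exp(-lam) / math.factorial(20)
print(round(p20, 4))
```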
51-156
7. Question 3
3. Which of the following statements about the normal distribution is not
accurate?
A. Kurtosis equals three.
B. Skewness equals one.
C. The entire distribution can be characterized by two moments,
mean and variance.
D. The normal density function has the following expression:
f(x) = [1/√(2πσ²)] e^(−(x−μ)²/(2σ²))
Correct Answer:B
52-156
Hypothesis
Testing
53-156
1. Statistical Inference
What is Statistical Inference?
Drawing conclusions about a population from a sample: sampling and estimation.
54-156
2. Point and Confidence Interval Estimate
Point Estimation
Using a single numerical value to estimate the parameter of the
population.
sample estimator → population parameter:  X̄ → μ_X,  S² → σ²_X
[Figure: hypothesis-testing flowchart — Step 4: formulate a decision rule; Step 5: take a sample and arrive at a decision (reject or do not reject)]
58-156
3. Hypothesis Testing
Step 1: State the null (H0) and alternative hypotheses (Ha)
H0 : 𝜇𝑋 = 18.5, HA : 𝜇𝑋 ≠ 18.5
One-tailed test vs. Two-tailed test
59-156
3. Hypothesis Testing
Step 3: Choose the level of significance α and obtain the critical value
Level of significance (alpha)
Critical Value
The distribution of test statistic (z, t, χ2, F)
Degree of freedom
t(n-1), χ2 (n-1), F(n1-1,n2-1)
One-tailed or two-tailed test
For the example:
Alpha=5%
t_(α/2)(n − 1) = 2.052
60-156
3. Hypothesis Testing
Step 4: Formulate a decision rule
Reject H0 if |test statistic| > critical value.
Fail to reject H0 if |test statistic| < critical value.
The rejection region lies on the same side as HA.
61-156
4. Test of Single Population Mean
H0: μ = μ0
z-test vs. t-test
z-statistic: z = (x̄ − μ0) / (σ/√n)
t-statistic: t = (x̄ − μ0) / (s/√n)
62-156
5. Testing the equality of two means
H0 : μX = μY
Testing whether the means of two series are equal.
If the null hypothesis is true, then E[Z_i] = E[X_i] − E[Y_i] = μ_X − μ_Y = 0.
The test statistic is constructed as:
T = μ̂_Z / √(σ²_Z/n) = (μ̂_X − μ̂_Y) / √((σ²_X + σ²_Y − 2σ_XY)/n)
When X_i and Y_i are both iid and mutually independent, the test statistic
for testing that the means are equal is:
T = (μ̂_X − μ̂_Y) / √(σ²_X/n_X + σ²_Y/n_Y)
63-156
5. Testing the equality of two means
Example: Assume two distributions that measure the average rainfall in City
X and City Y, respectively. Their means, variances, sample sizes and
correlation are:
μ_X = 10cm; μ_Y = 14cm; σ²_X = 4; σ²_Y = 6; n_X = n_Y = 12; ρ_XY = 0.3
Question: Is the average rainfall in City X significantly different from
that in City Y?
H0: μ_X = μ_Y
The value of the test statistic is
T = (μ_X − μ_Y) / √((σ²_X + σ²_Y − 2σ_XY)/n)
  = (10 − 14) / √((4 + 6 − 2 × 0.3 × √4 × √6)/12) = −5.21
Therefore, the null would be rejected even at a 99.99% confidence level.
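The test statistic can be verified numerically; a minimal sketch with the example's values:

```python
# Test statistic for equal means with correlated series:
# T = (mx - my) / sqrt((var_x + var_y - 2*cov_xy)/n).
import math

mx, my = 10, 14
var_x, var_y, n, rho = 4, 6, 12, 0.3
cov_xy = rho * math.sqrt(var_x) * math.sqrt(var_y)   # sigma_XY from correlation
t = (mx - my) / math.sqrt((var_x + var_y - 2 * cov_xy) / n)
print(round(t, 2))
```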
64-156
6. Test of Single Population Variance
H0: 𝝈𝟐 = 𝝈𝟐𝟎
The Chi-Square test
χ² = (n − 1)s² / σ0²,  df = n − 1
Where:
𝑛 = sample size
𝑠 2 = sample variance
𝜎02 = hypothesized value for the population variance
65-156
7. Test of Variances Difference
H0: 𝛔𝟐𝟏 = 𝛔𝟐𝟐
The F-test
F = s1² / s2²,  df1 = n1 − 1, df2 = n2 − 1
Always put the larger variance in the numerator (s1² > s2²).
66-156
8. Summary of Hypothesis Testing
Test type     Assumptions                          H0          Test statistic                                  Distribution
Mean          Normally distributed population,     μ = μ0      z = (x̄ − μ0)/(σ/√n)                            N(0, 1)
hypothesis    known population variance
testing       Normally distributed population,     μ = μ0      t = (x̄ − μ0)/(s/√n)                            t(n − 1)
              unknown population variance
              Testing the equality of two means    μ1 = μ2     T = μ̂_Z/√(σ²_Z/n)                              t(n − 1)
                                                               = (μ̂_X − μ̂_Y)/√((σ²_X + σ²_Y − 2σ_XY)/n)
Variance      Normally distributed population      σ² = σ0²    χ² = (n − 1)s²/σ0²                              χ²(n − 1)
hypothesis    Two independent normally             σ1² = σ2²   F = s1²/s2²                                     F(n1 − 1, n2 − 1)
testing       distributed populations
67-156
9. P Value Approach
① P Value Approach
P value is the smallest level of significance for which the null
hypothesis can be rejected.
Return to our P/E example:
t = (X̄ − μ_X)/(S_X/√n) = (23.25 − 18.5)/(9.49/√28) = 2.65 ~ t(27) → p ≈ 0.015
68-156
9. P Value Approach
69-156
10. Type I and Type II Errors
Type I error
Reject the null hypothesis when it’s actually true.
Type II error
Fail to reject the null hypothesis when it’s actually false.
Significance level (α)
The probability of making a Type I error:
Significance level = P(Type I error)
Power of a test
The probability of correctly rejecting the null hypothesis when it is false:
Power of a test = 1 – P(Type II error)
70-156
11. Question 1
Analyst John is concerned that the average days sales outstanding has
increased above its historical average of 27 days. From a large sample
of 36 companies, he computes a sample mean DSO of 29 days with
sample standard deviation of 7. His one-sided alternative hypothesis,
stated with 95% confidence, is that DSO is greater than 27. Does he
reject the null?
A. No, do not reject one-sided null as the t-statistic is less than 1.65.
B. No, do not reject one-sided null as the t-statistic is less than 1.96.
C. Yes, do reject one-sided null as the t-statistic is greater than 1.65.
D. Yes, do reject one-sided null as the t-statistic is greater than 1.96.
Correct answer : C
71-156
11. Question 2
If the mean P/E of 30 stocks in a certain industrial sector is 18 and the
sample standard deviation is 3.5, standard error of the mean is closest
to:
A. 0.12
B. 0.34
C. 0.64
D. 1.56
Correct answer :C
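The standard error formula s/√n gives the answer directly:

```python
# Standard error of the sample mean: s / sqrt(n).
import math

se = 3.5 / math.sqrt(30)
print(round(se, 2))
```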
72-156
11. Question 3
Assume random variable is normally distributed with mean of 10 and a
variance of 25. Without using a calculator, what is the probability that X
falls within 1.75 and 21.65?
A. 68%
B. 94%
C. 95%
D. 99%
Correct Answer : B
73-156
11. Question 4
Assume the population of hedge fund returns has an unknown
distribution with mean of 8% and volatility of 10%. From a sample of
40 funds, what is the probability the sample mean return will exceed
10.6%?
A. 5%
B. 8%
C. 10%
D.12%
Correct Answer : A
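By the CLT the sample mean is approximately normal with standard error σ/√n; a minimal sketch (using `statistics.NormalDist` in place of a Z-table):

```python
# P(sample mean > 10.6%) via the CLT and a standard-normal lookup.
import math
from statistics import NormalDist

se = 0.10 / math.sqrt(40)          # sigma / sqrt(n)
z = (0.106 - 0.08) / se            # standardized distance from the mean
p_exceed = 1 - NormalDist().cdf(z)
print(round(z, 2), round(p_exceed, 2))
```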
74-156
11. Question 5
Which of the following statements regarding hypothesis testing is
correct?
A. Type II error refers to the failure to reject the Ha when it is actually
false.
B. Hypothesis testing is used to make inferences about the
parameters of a given population on the basis of statistics
computed for a sample that is drawn from another population.
C. All else being equal, the decrease in the chance of making a type I
error comes at the cost of increasing the probability of making a
type II error.
D. If the p-value is greater than the significance level, then the
statistics falls into the reject intervals.
Correct answer : C
75-156
Linear Regression
76-156
1. Elements of linear regression
Interpretation of regression function
An estimated slope coefficient b1 indicates that the dependent
variable will change by b1 units for every 1-unit change in the
independent variable.
The intercept term b0 is interpreted as the value of the dependent
variable when the independent variable is zero.
The error term is the portion of the dependent variable that cannot
be explained by the model.
This shock is assumed to have mean 0, so that E[Y] = E[b0 + b1X + ε]
= b0 + b1E[X]
Slope coefficient
𝑌𝑖 = b0 + 𝑏1 𝑋𝑖 + 𝜀𝑖
77-156
2. Key assumptions in linear regression
OLS regression requires a number of assumptions. Most of the major assumptions pertain
to the regression model’s residual term (i.e., error term).
A linear relationship exists between X and Y.
Independent variables are not random; independent variables are not
correlated with the error term; and independent variables are not
correlated with each other (only in multiple regression).
The expected value of the error term is zero (i.e., 𝐸 𝜀𝑖 = 0).
The variance of the error term is constant (i.e., the error terms are homoskedastic).
The error term is uncorrelated across observations (i.e., the error terms
are not autocorrelated).
The error term is normally distributed.
The key assumptions for the linear regression model are:
The regression errors, 𝜀𝑖 , have a mean of zero conditional on the regressors Xi
Constant Variance of Shocks
Variance of independent variable is strictly greater than 0 (σ2X > 0).
The sample observations are i.i.d. random draws from the population
Large outliers are unlikely
The explanatory variables are not perfectly linearly dependent.
78-156
3. Ordinary Least Squares
Ordinary least squares (OLS)
The OLS estimation is the process of estimating the population
parameters using the sample coefficients that minimize the sum of
squared residuals (i.e., the squared error terms).
The OLS sample coefficients are those that:
minimize Σεᵢ² = Σ[Yᵢ − (b0 + b1Xᵢ)]²
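This minimization has the familiar closed form b1 = Cov(X, Y)/Var(X) and b0 = Ȳ − b1·X̄; a minimal sketch on toy data (the data points are invented for illustration):

```python
# OLS slope and intercept from the closed-form moment formulas.
from statistics import mean

xs = [1, 2, 3, 4, 5]                 # toy data, assumed
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
mx, my = mean(xs), mean(ys)
b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
      / sum((x - mx) ** 2 for x in xs))     # cov(X, Y) / var(X)
b0 = my - b1 * mx                            # line passes through (x-bar, y-bar)
print(round(b1, 2), round(b0, 2))
```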
79-156
4. Inference in linear regression
The confidence interval for the regression coefficient, b1and b0
𝑏1 ± (Critical Value × 𝑠𝑏1 )
𝑏0 ± (𝐶𝑟𝑖𝑡𝑖𝑐𝑎𝑙 𝑉𝑎𝑙𝑢𝑒 × 𝑠𝑏0 )
The standard errors of the regression coefficients:
S_b1 = (1/√n) × (s / σ̂_X)
S_b0 = √( s²(μ̂²_X + σ̂²_X) / (n σ̂²_X) )
80-156
4. Inference in linear regression
Regression Coefficient Hypothesis Testing
A t-test may also be used to test the hypothesis that the true slope
coefficient b1, is equal to some hypothesized value, k (usually k=0), the
appropriate test procedure is:
State null hypothesis and alternative hypothesis
𝐻0: 𝑏1 = 0 𝐻𝑎 : 𝑏1 ≠ 0
Calculate the test statistic:
T = (b̂1 − b1) / s_b1
The test statistic can also be transformed into a p-value:
p-value = 2(1 − Φ(|T|))
Reject H0 if T falls into rejection region, or p-value is less than test
size.
81-156
4. Inference in linear regression
The relationship between each component
Σ(Yᵢ − Ȳ)² = Σ(Ŷᵢ − Ȳ)² + Σ(Yᵢ − Ŷᵢ)²
TSS (Total Sum of Squares) = ESS (Explained Sum of Squares) + RSS (Residual Sum of Squares)
[Figure: fitted line Ŷᵢ = b0 + b1Xᵢ; deviations Yᵢ − Ŷᵢ → RSS, Ŷᵢ − Ȳ → ESS, Yᵢ − Ȳ → TSS]
82-156
4. Inference in linear regression
The reporting matrix of each component: Analysis of Variance (ANOVA)
Table
df SS MSS
Explained k ESS ESS/k
Residual n-k-1 RSS RSS/(n-k-1)
Total n–1 TSS -
83-156
4. Inference in linear regression
Joint Hypothesis Testing
The t-test is not directly applicable when testing complex hypotheses
that involve more than one parameter , because the parameter
estimators can be correlated.
An F-test is used to test whether at least one slope coefficient is
significantly different from zero.
H0: b1 = b2 = b3 = … = bk = 0; H1: at least one bj ≠ 0 (j = 1 to k)
F = (ESS/k) / (RSS/(n − k − 1))
The test assesses the effectiveness of the model as a whole in explaining
the dependent variable.
Decision rule: reject H0, if F(test-statistic) > Fc(critical value).
84-156
4. Inference in linear regression
Selected Data from 30 college students
Age Height Weight Age Height Weight
21 163 60 20 153 42
22 164 56 20 156 44
21 165 60 21 156 38
23 168 55 21 157 48
21 169 60 21 158 52
21 170 54 23 158 45
23 170 80 22 159 43
23 170 64 22 160 50
22 171 67 21 160 45
22 172 65 21 160 52
23 172 60 23 160 50
21 172 60 22 161 50
23 173 60 21 161 45
22 173 62 21 162 55
21 174 65 20 162 60
85-156
4. Inference in linear regression
Summary output
Regression Statistics
Multiple R 0.8148004
R Square 0.6638996
Adjusted R Square 0.6390033
Standard Error 5.5120076
Observations 30
ANOVA
df SS MS F Significance F
Regression 2 1620.3799 810.18993 26.666575 4.04884E-07
Residual 27 820.32014 30.382227
Total 29 2440.7
              Coefficients   Standard Error   t Stat      P-value      Lower 95%       Upper 95%
Intercept -140.7652 29.755964 -4.730656 6.282E-05 -201.8194407 -79.71104917
Age -0.048528 1.1637722 -0.041699 0.9670455 -2.436391407 2.33933526
Height 1.1972821 0.1800553 6.6495241 3.896E-07 0.827839137 1.566725091
86-156
5. Measuring model fit
R2 (the Coefficient of Determination)
A more intuitive measure of the “goodness of fit” of the regression. It is
interpreted as a percentage of variation in the dependent variable
explained by the independent variable. Its limits are 0 ≤ R2 ≤ 1.
R² = ESS/TSS = 1 − RSS/TSS
Note that in a simple two-variable regression, the square root of R² is
the correlation coefficient (r) between Xi and Yi:
r² = R² → r = ±√R²
87-156
5. Measuring model fit
The limitations of R²
Adding a new variable to the model always increases the R².
The adjusted R² is a modified version of R² that does not necessarily
increase when a new independent variable is added.
Adjusted R² = 1 − [RSS/(n − k − 1)] / [TSS/(n − 1)]
            = 1 − [(n − 1)/(n − k − 1)] × (RSS/TSS)
            = 1 − [(n − 1)/(n − k − 1)] × (1 − R²)
Adjusted R² ≤ R²; adjusted R² may be less than zero.
R2 cannot be compared across models with different dependent
variables.
There is no such thing as a "good" value for R2
88-156
5. Measuring model fit
Calculating R2 and adjusted R2
The analyst runs a regression of monthly return on the five independent
variables within 60 months. The total sum of squares is 570, the residual
sum of squares is 180. Calculate R2 and adjusted R2.
Assuming that the analyst now adds four independent variables to the
regression, R2 increases to 70.0%. Identify the model the analyst is
most likely to use.
Answer:
R² = (570 − 180)/570 = 68.42%
Adjusted R² = 1 − (60 − 1)/(60 − 5 − 1) × (1 − 68.42%) = 65.5%
Adjusted R²′ = 1 − (60 − 1)/(60 − 9 − 1) × (1 − 70.0%) = 64.6%
Since 65.5% > 64.6%, the five-variable model is preferred.
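The arithmetic above can be checked with a short script (a minimal sketch; the figures are the ones given in the example):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (n-1)/(n-k-1) * (1 - R^2)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

tss, rss, n = 570, 180, 60
r2 = 1 - rss / tss                      # = ESS/TSS = 68.42%
adj5 = adjusted_r2(r2, n, k=5)          # five regressors
adj9 = adjusted_r2(0.70, n, k=9)        # after adding four more regressors
print(round(r2, 4), round(adj5, 3), round(adj9, 3))
```

The adjusted R² falls from 65.5% to 64.6% even though R² rises, which is why the five-variable model is preferred.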
89-156
6. Violation of the assumption
Conditional Heteroskedasticity
The heteroskedasticity is related to the level of (i.e., conditional on) the
independent variable.
Conditional heteroskedasticity does create significant problems for statistical
inference.
[Figure: Effect of Heteroskedasticity on Regression Analysis, showing low
residual variance at low values of X and high residual variance at high X]
The standard errors are usually unreliable estimates.
The coefficient estimates (the b1) aren’t affected.
If the standard errors are too small, but the coefficient estimates themselves
are not affected, the t-statistics will be too large and the null hypothesis of
no statistical significance is rejected too often.
How to test heteroskedasticity: White Test
90-156
6. Violation of the assumption
Multi-collinearity (Multiple-regression only)
Multicollinearity refers to the situation that two or more independent
variables are highly correlated with each other.
Three methods to detect multi-collinearity
1. T-tests indicate that none of the individual coefficients is
significantly different from zero, while the F-test indicates overall
significance and the R2 is high.
2. The absolute value of the sample correlation between any two
independent variables is greater than 0.7 (i.e., |r| > 0.7).
3. Common sense
Methods to correct multi-collinearity
Omit one or more of the correlated independent variables.
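Detection method 2 can be sketched with NumPy on hypothetical data (the 0.7 cutoff is the rule of thumb above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.standard_normal(n)
x2 = x1 + 0.1 * rng.standard_normal(n)   # nearly collinear with x1
x3 = rng.standard_normal(n)              # unrelated regressor

X = np.column_stack([x1, x2, x3])
corr = np.corrcoef(X, rowvar=False)      # pairwise sample correlations

# flag any pair of regressors with |r| > 0.7
flags = [(i, j) for i in range(3) for j in range(i + 1, 3)
         if abs(corr[i, j]) > 0.7]
print(flags)
```

Only the (x1, x2) pair is flagged; dropping one of the two is the correction suggested above.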
91-156
6. Violation of the assumption
Omitted Variable Bias
For omitted variable bias to arise, two things must be true:
At least one of the included regressors must be correlated with the
omitted variable.
The omitted variable must be a determinant of the dependent
variable Y.
Implications of Omitted Variable Bias
Omitted variable bias is a problem whether the sample size is large or
small.
Whether this bias is large or small in practice depends on the correlation
𝑟𝑋,𝜀 between the regressor and the error term.
The larger the correlation, the larger is the bias.
Corrections of Omitted Variable Bias
Add omitted variables to construct a multiple regression
92-156
7. Selection models
Two approaches to find the appropriate model complexity
① General-to-specific model selection
② M-fold cross-validation
93-156
7. Selection models
① General-to-specific model selection
1. First include all relevant variables.
2. Remove the variable whose coefficient has the smallest absolute
t-statistic (statistically insignificant).
3. Re-estimate using the remaining explanatory variables and remove
unqualified variables.
4. Repeat the steps mentioned above until the model contains no
coefficients that are statistically insignificant.
5. Common choices for 𝛼 are between 1% and 0.1% (critical t-values of at
least 2.57 or 3.29, respectively).
94-156
7. Selection models
② M-fold cross-validation
It is designed to select a model that performs well in fitting observations
not used to estimate the parameters (out-of-sample prediction).
Steps of m-fold cross-validation
1. The first step is to determine a set of candidate models. If a data
set has n candidate explanatory variables, then there are 2ⁿ
possible model specifications.
2. Split the data into m equal-sized blocks; parameters are
estimated using m−1 blocks (the training set) and residuals are
computed with data in the excluded block (the validation set).
3. Repeat the process of estimating parameters and computing
residual for a total of m times (ensure each block is used to
compute residual once).
4. Compute sum of squared errors for m times and choose the model
with the smallest out-of-sample sum of squared residuals.
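The four steps above can be sketched with NumPy for a single candidate model (a simple OLS fit on simulated data; the DGP and fold count are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 120, 5
x = rng.standard_normal(n)
y = 2.0 * x + rng.standard_normal(n)           # hypothetical DGP

idx = rng.permutation(n)
blocks = np.array_split(idx, m)                # m equal-sized blocks

sse = 0.0
used = []
for v in range(m):
    val = blocks[v]                            # validation block
    train = np.concatenate([blocks[b] for b in range(m) if b != v])
    # estimate OLS (with intercept) on the m-1 training blocks
    A = np.column_stack([np.ones(train.size), x[train]])
    beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
    pred = beta[0] + beta[1] * x[val]
    sse += np.sum((y[val] - pred) ** 2)        # out-of-sample SSE
    used.append(val)

used = np.concatenate(used)                    # each block validated once
print(sse)
```

Repeating this for every candidate specification and picking the smallest out-of-sample SSE completes step 4.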
95-156
8. Plot of Residuals and Outliers
Outlier
Outliers are values that, if removed from the sample, produce large
changes in the estimated coefficients.
Cook’s distance measures the sensitivity of the fitted values in a
regression to dropping a single observation j. It is defined as:
Dj = Σᵢ₌₁ⁿ (Ŷᵢ^(−j) − Ŷᵢ)² / (K·S²)
Ŷᵢ^(−j) is the fitted value of 𝑌𝑖 when observation j is dropped and the
model is estimated on the remaining n−1 observations.
K is the number of coefficients in the regression model.
S 2 is the estimate of error variance from the model using all
observations.
A Dj larger than 1 indicates that observation j has a large impact on the
estimated model parameters.
96-156
9. Question 1
1. A regression of a stock’s return (in percent) on an industry index’s return (in
percent) provides the following results:
Coefficient Standard Error
Intercept 2.1 2.01
Industry index 1.9 0.31
Degrees of Freedom SS
Explained 1 92.648
Residual 3 24.512
Total 4 117.160
Which of the following statements regarding the regression is incorrect?
A. The correlation coefficient between the X and Y variables is 0.889.
B. The industry index coefficient is significant at the 99% confidence interval.
C. If the return on the industry index is 4%, the stock’s expected return is 9.7%.
D. The variability of industry returns explains 21% of the variation of company
returns.
97-156
9. Question 1
Answer: D
r2 = R2 = 92.648/117.160 = 79%, so the variability of industry returns
explains 79% of the variation of company returns (not 21%).
t-stat (industry index) = 1.9/0.31 = 6.13, so the coefficient of
industry index is significant.
R = 2.1% + 1.9 × R(industry index) = 2.1% + 1.9 × 4 % = 9.7%
98-156
Time Series
99-156
1. Introduction to time series analysis
A time series is a set of observations on a variable’s outcomes in different
time periods, for example a set of monthly U.S. retail sales data.
100-156
1. Introduction to time series analysis
A time series can be decomposed into three distinct components
① The trend, which captures the changes in the level of the time series
over time.
② The seasonal component, which captures predictable changes in the
time series according to the time of year.
③ The cyclical component, which captures the cycles in the data.
101-156
2. Non-Stationary Time Series
Covariance-stationary time series have means, variances, and
autocovariances that do not depend on time. Any time series that is not
covariance-stationary is non-stationary.
Sources of non-stationarity
The three most pervasive sources of non-stationarity in time series are:
Time trends
Seasonalities
Unit roots (random walks)
102-156
2. Non-Stationary Time Series——Trend
Polynomial Trends
Linear time trend
Yt = δ0 + δ1 t + εt εt ~WN 0, σ2
This process is non-stationary because the mean depends on time :
𝐸 𝑌𝑡 = δ0 + δ1 t
Nonlinear Trends
The linear time trend model generalizes to a polynomial time
trend model by including higher powers of time :
Yt = δ0 + δ1 t + δ2 𝑡 2 + ⋯ + δm 𝑡 𝑚 + εt
Log-linear trend
Constant growth rates are more plausible than constant absolute growth
for most financial variables. These growth rates can be examined using a
log-linear model that relates the natural log of 𝑌𝑡 to a linear trend:
𝑙𝑛𝑌𝑡 = δ0 + δ1 t + εt
The constant growth rate of 𝑌𝑡 is δ1.
103-156
2. Non-Stationary Time Series——Seasonalities
Seasonalities induce non-stationary behavior in time series by relating the
mean of the process to the month or quarter of the year.
Seasonal Dummy Variables
The seasonality repeats each s periods and can be directly modeled as:
𝑌𝑡 = 𝛿 + 𝛾1 𝑙1𝑡 + 𝛾2 𝑙2𝑡 + ⋯ + 𝛾𝑠−1 𝑙𝑠−1𝑡 + 𝜀𝑡
𝑙𝑖𝑡 = 1 or 0
Note that dummy variable 𝑙𝑠𝑡 is omitted to avoid what is known as
the dummy variable trap, which occurs when dummy variables are
multicollinear.
Illustrates the use of dummy variables
𝑌𝑡 = 𝛿 + 𝛾1 𝐽𝑎𝑛𝑡 + 𝛾2 𝐹𝑒𝑏𝑡 + ⋯ + 𝛾11 𝑁𝑜𝑣𝑡 + 𝜀𝑡
𝑌𝑡 = 𝑎 𝑚𝑜𝑛𝑡ℎ𝑙𝑦 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 𝑜𝑓 𝑟𝑒𝑡𝑢𝑟𝑛𝑠
𝐽𝑎𝑛𝑡 = 1 if period t is in January, 𝐽𝑎𝑛𝑡 = 0 otherwise
···
𝑁𝑜𝑣𝑡 = 𝑂𝑐𝑡𝑡 = ··· = 𝐽𝑎𝑛𝑡 = 0 if period t is in December.
104-156
2. Non-Stationary Time Series——Random Walk
Random Walks
105-156
2. Non-Stationary Time Series——Random Walk
What does a random walk look like?
[Figure omitted: simulated random walk paths]
106-156
2. Non-Stationary Time Series——Random Walk
Unit Roots
Example: if the characteristic roots of a process are 1 and 0.8, the root
equal to 1 indicates the existence of a unit root.
107-156
2. Non-Stationary Time Series——Random Walk
Consider the situation
If 𝑌𝑡 is a random walk, then:
𝑌𝑡 = 𝑌𝑡−1 + 𝜀𝑡
𝑌𝑡 − 𝑌𝑡−1 = 𝑌𝑡−1 − 𝑌𝑡−1 + 𝜀𝑡
Δ𝑌𝑡 = 0 × 𝑌𝑡−1 + 𝜀𝑡
The value of γ is 0 when the process is a random walk.
Testing for Unit Roots: Augmented Dickey-Fuller (ADF) Test
Step 1: form the correct model
Δ𝑌𝑡 = 𝛾𝑌𝑡−1 + 𝜀𝑡
Step 2: State the Null H0 : γ = 0(Yt is a random walk) , H1 : γ < 0(Yt is
covariance-stationary)
Step 3: Perform the regular t test and draw a conclusion
A unit root can only be removed by differencing.
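The three steps can be sketched numerically with NumPy on simulated data. Note the assumptions here: the series is a hypothetical stationary AR(1), and −1.95 is the approximate 5% Dickey-Fuller critical value for the no-constant specification (the slide does not give critical values, so this cutoff is supplied for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
# simulate a stationary AR(1): Y_t = 0.5 Y_{t-1} + eps_t
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + rng.standard_normal()

# Step 1: form the model  dY_t = gamma * Y_{t-1} + eps_t
dy, ylag = np.diff(y), y[:-1]
gamma = (ylag @ dy) / (ylag @ ylag)            # OLS slope, no constant
resid = dy - gamma * ylag
se = np.sqrt(resid @ resid / (len(dy) - 1) / (ylag @ ylag))
# Steps 2-3: t-test of H0: gamma = 0 against H1: gamma < 0
t_stat = gamma / se
print(round(t_stat, 2))   # strongly negative: reject H0, no unit root
```

For a true random walk the estimated γ would be close to 0 and the t-statistic would not fall below the critical value.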
108-156
3. Stationary Time Series
Stationary Time Series
In a stationary time series, the trend and seasonality components have
been eliminated; only the cyclical component remains.
Cyclical component is modeled by both the shocks to the process
and the memory (i.e., persistence) of the process.
109-156
3. Stationary Time Series——Covariance Stationarity
Why is stationarity so important?
The ability of a model to forecast a time series depends crucially on
whether the past resembles the future.
If the past data can’t explain the data in the future, the time series
models are useless.
Stationarity is a key concept that formalizes the structure of a time series
and justifies the use of historical data to build models.
If the underlying probabilistic structure of the series were changing
over time(not stationary), there would be no way to predict the
future accurately on the basis of the past.
Covariance stationarity is the criterion used to assess whether a time series is stationary.
Covariance stationarity depends on the first two moments of a time
series: the mean and the autocovariances.
110-156
3. Stationary Time Series——Covariance Stationarity
Covariance Stationary: A time series is covariance stationary if its first two
moments satisfy three key properties.
The mean is constant and does not change over time.
E Yt = μ for all t
The autocovariance is finite, does not change over time, and depends
only on the distance h between observations:
𝐶𝑜𝑣(Yt , Yt−ℎ) = 𝛾ℎ for all t
The variance is finite and does not change over time (the special
case where h = 0):
V(Yt) = 𝛾0 < ∞ for all t
111-156
3. Stationary Time Series——Covariance Stationarity
Autocovariance
Autocovariance is defined as the covariance between a stationary time
series at different points in time.
The ℎ𝑡ℎ autocovariance is defined as:
γt,h = E[(Yt − E[Yt])(Yt−h − E[Yt−h])]
When h = 0, γt,0 is the variance of Yt:
γt,0 = E[(Yt − E[Yt])²]
The autocovariance function is a function of h that returns the
autocovariance between Yt and Yt−h
𝛾 ℎ =γℎ
The symmetry follows from the property that the autocovariance
depends only on h:
Cov Yt , Yt−h = Cov Yt+h , Yt
112-156
3. Stationary Time Series——Covariance Stationarity
Autocorrelation function(ACF)
The autocorrelations are just the “simple” or “regular” correlations
between 𝑌𝑡 and 𝑌t−ℎ , so the function can be written as follow:
𝜌(h) = cov(𝑌𝑡 , 𝑌𝑡−ℎ) / √(var(𝑌𝑡) var(𝑌𝑡−ℎ)) = 𝛾(ℎ)/√(𝛾(0)𝛾(0)) = 𝛾(ℎ)/𝛾(0)
Partial Autocorrelation function(PACF)
The partial autocorrelation function measures the relationship between
𝑌t and 𝑌t−ℎ after removing the effects of 𝑌t−1 , ⋯ , 𝑌t−ℎ+1.
In other words, the partial autocorrelation function measures the
relationship only between 𝐘𝐭 and 𝐘𝐭−𝐡.
Given that
Yt = b0 + b1 Yt−1 + b2 Yt−2 + ⋯ + bh Yt−h + εt
α(h) = bℎ
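The sample ACF can be computed directly from the autocovariance definition (a sketch on a simulated AR(1) with φ = 0.6, whose theoretical ACF is ρ(h) = 0.6ʰ; the parameter choice is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n, phi = 100_000, 0.6
# simulate a stationary AR(1): Y_t = 0.6 Y_{t-1} + eps_t
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.standard_normal()

def acf(y, h):
    """Sample autocorrelation rho(h) = gamma(h) / gamma(0)."""
    d = y - y.mean()
    g0 = d @ d / len(y)                      # gamma(0) = variance
    gh = d[h:] @ d[:-h] / len(y) if h else g0
    return gh / g0

print([round(acf(y, h), 2) for h in range(4)])
```

The estimates come out close to 1, 0.6, 0.36, 0.22, the gradual geometric decay expected of an AR process.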
113-156
3. Stationary Time Series——White Noise
White noise
White noise is the fundamental building block used to model shocks in
any time-series model; all of the stationary time series models below
are built on a white noise process.
𝜀𝑡 ~𝑊𝑁 0, 𝜎 2
A white noise process is covariance-stationary because of three properties
Mean zero
Constant and finite variance, 𝜎 2
No autocorrelation or autocovariance, 𝛾 ℎ = 𝜌 ℎ = 0 𝑓𝑜𝑟 ℎ ≠ 0
Independent white noise
If 𝜀 is serially independent, then we say that 𝜀 is independent white noise.
𝜀𝑡 ~ 𝑖𝑖𝑑(0, 𝜎²)
Note that white noise itself has no autocorrelation but is not
necessarily independent.
Gaussian white noise(normal white noise)
A special case of iid noise, where 𝜀𝑡 ~ 𝑖𝑖𝑑 𝑁(0, 𝜎²)
114-156
3. Stationary Time Series——AR, MA and ARMA
There are three famous models to model stationary time series process,
each built on white noise process.
① AR(autoregressive model)
② MA(moving average model)
③ ARMA(autoregressive & moving average model)
115-156
3. Stationary Time Series——AR, MA and ARMA
AR(p) Model
Yt = δ + ϕ1 Yt−1 + ϕ2 Yt−2 +. . . +ϕp Yt−p + εt ; εt ~WN 0, σ2
How to determine whether an AR(p) model is stationary?
Step 1, find the lag polynomial in AR(P) model
The lag operator L shifts the time index of an observation, so that
LYt = Yt−1 ; LP Yt = Yt−P
The AR(p) model can be written as
1 − ϕ1 L − ϕ2 L2 −. . . −ϕp Lp Yt = δ + εt
Lag polynomial: Φ L = 1 − ϕ1 L − ϕ2 L2 −. . . −ϕp Lp
116-156
3. Stationary Time Series——AR, MA and ARMA
How to determine whether an AR(p) model is stationary?
Step 2, let z=1/L, rewrite the lag polynomial to get characteristic
equation:
Φ z = z p − ϕ1 z p−1 − ϕ2 z p−2 −. . . −ϕp
Step 3, make Φ z = 0 and solve for the characteristic roots of the
function.
Step 4, if the roots of the function are all less than 1 in absolute value,
the AR(p) model is stationary.
The long-run mean of AR(p) is
E[Yt] = μ = δ/(1 − ϕ1 − ϕ2 − ⋯ − ϕ𝑝)
117-156
3. Stationary Time Series——AR, MA and ARMA
Test the stationarity of an AR(2) model.
Yt = 1.4 × Yt−1 − 0.45 × Yt−2 +εt
Correct answer
Step 1, find the lag polynomial in AR(2) model
Φ L = 1 − 1.4L + 0.45L2
Step 2, let z=1/L, rewrite the lag polynomial to get characteristic
equation
Φ(z) = z² − 1.4z + 0.45
Step 3, make Φ z = 0 and solve for the characteristic roots of the
function
z1=0.9 and z2=0.5
Step 4, the roots of the function are all less than 1 in absolute value, the
AR(2) model is stationary.
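Steps 3 and 4 can be verified numerically by solving the characteristic equation with NumPy:

```python
import numpy as np

# characteristic equation of Y_t = 1.4 Y_{t-1} - 0.45 Y_{t-2} + eps_t:
#   z**2 - 1.4*z + 0.45 = 0
roots = np.roots([1.0, -1.4, 0.45]).real
stationary = bool(np.all(np.abs(roots) < 1))
print(np.sort(roots), stationary)   # roots 0.5 and 0.9, both inside the unit circle
```

Both roots are less than 1 in absolute value, confirming that the AR(2) model is stationary.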
118-156
3. Stationary Time Series——AR, MA and ARMA
② MA Model: The current value of the observed series is expressed as a
function of current and lagged unobservable shocks.
MA(1) Model
𝑌𝑡 = 𝜇 + 𝜀𝑡 + 𝜃𝜀𝑡−1 = μ + (1 + 𝜃𝐿)𝜀𝑡 ; 𝜀𝑡 ~𝑊𝑁(0, 𝜎²)
μ is the long-run mean; θ is the MA parameter.
θ acts as a weight and determines the strength of the effect of the
previous shock.
MA(q) Model
𝑦𝑡 = 𝜇 + 𝜀𝑡 + 𝜃𝟏 𝜀𝑡−1 +. . . +𝜃𝑞 𝜀𝑡−𝑞 ; 𝜀𝑡 ~𝑊𝑁 0, 𝜎 2
119-156
3. Stationary Time Series——AR, MA and ARMA
Characteristic of MA(1)
Moving averages are always covariance-stationary.
The property of coefficient,𝜃
If the coefficient 𝜃 > 0 , then MA(1) process is persistent because
two consecutive values are positively correlated.
If the coefficient 𝜃 < 0 , then the MA(1) process mean-reverts
aggressively, because the effect of the previous shock is reversed in
the current period.
Difference between MA(1) and AR(1)
The structure of the MA(1) process, in which only the first lag of the
shock appears on the right, forces it to have a very short memory, and
hence weak dynamics, regardless of the parameter value.
An AR(1) process, by contrast, has a much longer memory because the
lagged values of the series carry information from all previous periods.
120-156
3. Stationary Time Series——AR, MA and ARMA
For MA(1) Model
E(𝑌𝑡 ) = 𝜇
V(𝑌𝑡 ) = (1 + 𝜃 2 )𝜎 2
𝜌(ℎ) = 1, ℎ = 0
𝜌(ℎ) = 𝜃/(1 + 𝜃²), ℎ = 1
𝜌(ℎ) = 0, ℎ ≥ 2
121-156
3. Stationary Time Series——AR, MA and ARMA
③ ARMA Model: Autoregressive Moving Average (ARMA) processes combine AR
and MA processes.
ARMA(1,1) model
𝑌𝑡 = δ + ϕ1 𝑌𝑡−1 + 𝜃𝜀𝑡−1 + 𝜀𝑡
𝜀𝑡 ~𝑊𝑁 0, 𝜎 2
The mean of ARMA(1,1) is
E[Yt] = μ = δ/(1 − ϕ1)
The variance of ARMA(1,1) is
V[Yt] = 𝛾0 = 𝜎²(1 + 2ϕ1𝜃 + 𝜃²)/(1 − ϕ1²)
ARMA(p,q) models are often both highly accurate and highly parsimonious.
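The two moment formulas can be checked against a long simulated ARMA(1,1) path (a sketch; the parameter values δ = 1, φ₁ = 0.5, θ = 0.4, σ = 1 are hypothetical):

```python
import numpy as np

delta, phi, theta, sigma = 1.0, 0.5, 0.4, 1.0
mu = delta / (1 - phi)                                    # theoretical mean
var = sigma**2 * (1 + 2 * phi * theta + theta**2) / (1 - phi**2)

rng = np.random.default_rng(11)
n = 200_000
eps = sigma * rng.standard_normal(n + 1)
y = np.empty(n)
prev = mu                                                 # start at the mean
for t in range(n):
    y[t] = delta + phi * prev + eps[t + 1] + theta * eps[t]
    prev = y[t]

print(round(mu, 3), round(y.mean(), 3))   # theoretical vs. sample mean
print(round(var, 3), round(y.var(), 3))   # theoretical vs. sample variance
```

With these parameters the theoretical mean is 2.0 and the theoretical variance is 2.08, and the sample moments land close to both.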
123-156
3. Stationary Time Series——AR, MA and ARMA
Gradual decay of an ACF or PACF
[Figure: sample ACF/PACF plots decaying gradually toward zero over many lags]
Sharp cutoff of an ACF or PACF
[Figure: sample ACF/PACF plot cutting off sharply after a few lags]
124-156
4. Hypothesis test for autocorrelation
Hypothesis test for autocorrelation
If an ARMA model applies, the ACF must gradually decay to 0.
Box-Pierce Statistic & Ljung-Box Statistic
The joint null hypothesis is H0 : ρ1 = ρ2 = ⋯ = ρℎ = 0
Box-Pierce Statistic and Ljung-Box Statistic (follow a χ2 distribution)
𝑄𝐵𝑃 = 𝑇 Σᵢ₌₁ʰ 𝜌ᵢ² ;  𝑄𝐿𝐵 = 𝑇 Σᵢ₌₁ʰ [(𝑇 + 2)/(𝑇 − 𝑖)] 𝜌ᵢ²
Reject the null when the Q-statistic is large (look up the critical value in the χ2 table).
Ljung-Box statistic is a version of Box-Pierce statistic and works
better in smaller samples.
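Both Q-statistics can be computed straight from the formulas (a sketch on simulated white noise; T = 500 and h = 10 are hypothetical choices):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
T, h = 500, 10
x = rng.standard_normal(T)                 # white noise under H0

# sample autocorrelations rho_1, ..., rho_h
d = x - x.mean()
g0 = d @ d / T
rho = np.array([(d[i:] @ d[:-i] / T) / g0 for i in range(1, h + 1)])

q_bp = T * np.sum(rho**2)                                  # Box-Pierce
q_lb = T * np.sum((T + 2) / (T - np.arange(1, h + 1)) * rho**2)  # Ljung-Box
crit = chi2.ppf(0.95, df=h)                # reject H0 when Q > crit
print(round(q_bp, 2), round(q_lb, 2), round(crit, 2))
```

Since each weight (T + 2)/(T − i) exceeds 1, the Ljung-Box statistic is always at least as large as the Box-Pierce statistic; the small-sample correction is what makes it preferable for shorter series.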
125-156
Measuring Returns,
Volatility and
Correlation
126-156
1. Simple Returns
Simple Returns (R t )
The return on an asset bought at time t−1 and sold at time t can
be expressed as the percentage change in the market variable between
the end of day t−1 and the end of day t.
Pt − Pt−1
Rt =
Pt−1
The return of an asset over multiple periods is the product of the simple
returns in each period :
1 + R_T = ∏ₜ₌₁ᵀ (1 + R_t)
127-156
2. Continuously Compounded Returns
Continuously Compounded Returns (rt )
Continuously compound returns is also known as log returns. These are
computed as the difference of the natural logarithm of the price :
rt = lnPt − lnPt−1
The relationship between simple and log return
1 + R t = 𝑒 rt
The main advantage of log returns is that the total return over multiple
periods is just the sum of the single period log return:
r_T = Σₜ₌₁ᵀ r_t
However, the accuracy of the log-return approximation to the simple
return is poor when the simple return is large.
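The multi-period identities for both return definitions can be checked on a small hypothetical price path:

```python
import numpy as np

# hypothetical daily prices
p = np.array([100.0, 110.0, 99.0, 104.0])

simple = p[1:] / p[:-1] - 1          # R_t = (P_t - P_{t-1}) / P_{t-1}
logret = np.diff(np.log(p))          # r_t = ln P_t - ln P_{t-1}

total_simple = np.prod(1 + simple) - 1      # multi-period: product
total_log = logret.sum()                    # multi-period: sum
print(round(total_simple, 4))               # 0.04  (= 104/100 - 1)
print(round(np.exp(total_log) - 1, 4))      # 0.04  (same, via 1 + R = e^r)
```

Both routes recover the same 4% total return, illustrating the relationship 1 + R_t = e^{r_t}.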
128-156
3. Measuring Volatility and Risk
Measuring Volatility
A volatility of a financial asset is usually measured by the standard
deviation of its returns.
The measure of volatility scales with the square root of the holding
period. When volatility is measured daily, it is common to convert
daily volatility to annualized volatility by scaling by √252.
σannual = √252 × σdaily
The variance (also called the variance rate) of returns is estimated using
the standard estimator:
σ̂² = (1/T) Σₜ₌₁ᵀ (rₜ − μ̂)²
129-156
4.The Distribution of Financial Returns
Non-Normal Distributions
A normal distribution is symmetric and thin-tailed, and so has no
skewness or excess kurtosis.
However, many return series are both skewed and fat-tailed.
E.g., equity prices tend to decline in stress periods, while gold serves
as a flight-to-safety asset and so moves in the opposite direction.
Two methods to test normality of a distribution
① The Jarque-Bera Test
② Power Laws
130-156
4.The Distribution of Financial Returns
① The Jarque-Bera Test
It is used to formally test whether the sample skewness and kurtosis are
compatible with an assumption that the returns are normally distributed.
H0 : Skewness = 0 and Kurtosis = 3
The test statistic is
JB = (T − 1)[S²/6 + (K − 3)²/24]
where T is the sample size, S is the sample skewness, K is the sample
kurtosis, and JB follows a χ2 distribution with 2 degrees of freedom.
The critical values are 5.99 for a test size of 5% and 9.21 for a
test size of 1%.
When test statistic is above these values, the null that the data are
normally distributed is rejected.
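A sketch of the statistic on simulated data, contrasting a thin-tailed normal sample with a fat-tailed Student's t sample (the sample size and t distribution's 3 degrees of freedom are hypothetical choices):

```python
import numpy as np
from scipy.stats import skew, kurtosis

def jarque_bera(x):
    """JB = (T-1) * (S**2/6 + (K-3)**2/24), compared against chi2(2)."""
    T = len(x)
    s = skew(x)
    k = kurtosis(x, fisher=False)        # raw kurtosis, 3 under normality
    return (T - 1) * (s**2 / 6 + (k - 3)**2 / 24)

rng = np.random.default_rng(9)
jb_norm = jarque_bera(rng.standard_normal(1000))   # normal draws
jb_fat = jarque_bera(rng.standard_t(3, 1000))      # fat-tailed t(3) draws
print(round(jb_norm, 2), round(jb_fat, 2))
```

The fat-tailed sample produces a far larger JB statistic, so its normality null is rejected while the normal sample's statistic typically stays below the critical values.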
131-156
4.The Distribution of Financial Returns
② Power Laws
Normal random variables have thin tails, so the probability of a
return larger than Kσ declines rapidly as K increases, whereas many other
distributions have tails that decline less quickly for large deviations.
The simplest method to compare tail behavior is to examine the
natural log of the tail probability, lnP(X > 𝑥), which measures the
chance of observing an extreme tail value.
132-156
5. Correlation Versus Dependence
Dependence
Two random variables, X and Y are independent if their joint density is
equal to the product of their marginal densities:
fX,Y x, y = fX x × fY y
Financial assets are highly dependent and exhibit both linear and
nonlinear dependence.
The linear correlation estimator (Pearson’s correlation) measures
linear dependence.
ρXY = Cov(X, Y)/(σX σY)
Nonlinear dependence takes many forms and cannot be
summarized by a single statistic.
Rank correlation (also known as Spearman's correlation)
Kendall’s 𝝉
133-156
6. Rank Correlation——Spearman’s Correlation
Performance of a Portfolio with Two Assets
[Table omitted: five paired returns of assets X and Y]
134-156
6. Rank Correlation——Spearman’s Correlation
Order the return set pairs of X and Y with respect to the set X.
Then derive the ranks of Xi and Yi.
Derive the difference of the ranks and square the difference.
135-156
6. Rank Correlation——Spearman’s Correlation
Spearman correlation coefficient
𝜌𝑠 = 1 − 6 Σᵢ₌₁ⁿ 𝑑ᵢ² / [𝑛(𝑛² − 1)]
We derive:
𝜌𝑆 = 1 − (6 × 36)/(5 × (5² − 1)) = −0.8
136-156
6. Rank Correlation——Kendall’s Ƭ
A concordant pair is a pair of observations, each on two variables, {X1, Y1}
and {X2, Y2}, having the property that (X2 − X1)(Y2 − Y1) > 0; a
discordant pair has (X2 − X1)(Y2 − Y1) < 0.
137-156
6. Rank Correlation——Kendall’s Ƭ
Recall the previous example
2 concordant pairs: {(1,4),(2,5)}, {(4,1),(5,2)}
8 discordant pairs: {(1,4),(4,1)}, {(1,4),(5,2)}, {(2,5),(4,1)}, {(2,5),(5,2)},
{(1,4),(3,3)}, {(2,5),(3,3)}, {(3,3),(4,1)}, {(3,3),(5,2)}
0 neither concordant nor discordant pairs
138-156
6. Rank Correlation——Kendall’s Ƭ
Kendall’s Ƭ:
τ = (nc − nd) / [n(n − 1)/2]
nc: the number of concordant data pairs.
nd: the number of discordant pairs.
We derive:
τ = (2 − 8) / [5 × (5 − 1)/2] = −0.6
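Both rank correlations can be verified with SciPy using the rank pairs from the example, (1,4), (2,5), (3,3), (4,1), (5,2):

```python
from scipy.stats import spearmanr, kendalltau

x = [1, 2, 3, 4, 5]   # ranks of asset X
y = [4, 5, 3, 1, 2]   # ranks of asset Y

rho_s, _ = spearmanr(x, y)   # Spearman's rank correlation
tau, _ = kendalltau(x, y)    # Kendall's tau
print(rho_s, tau)            # -0.8 and -0.6
```

Both library results match the hand calculations above.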
139-156
6. Rank Correlation——Kendall’s Ƭ
The problem with applying ordinal rank correlations to cardinal
observations is that ordinal correlations are less sensitive to outliers.
Suppose we double the outliers of the returns of asset X.
That results in a change of the Pearson correlation coefficient from
−0.7402 to −0.6108, while the rank correlations are unchanged because
the ranks themselves do not change.
A special problem with Kendall's τ arises when many pairs are neither
concordant nor discordant, since these pairs are omitted from the calculation.
If, of the 10 observation pairs, four are neither concordant nor discordant,
just six pairs are left to be evaluated.
140-156
7. The Relationship between Correlations
Relationship between Pearson's correlation (𝝆), Spearman's rank correlation (𝝆𝑺 ),
and Kendall's 𝝉 for a bivariate normal distribution.
The rank correlation is virtually identical to the linear correlation for normal
random variables.
Kendall's 𝜏 is increasing in 𝜌 , although the relationship is not linear.
Kendall's 𝜏 produces values much closer to zero than the linear correlation for
most of the range of 𝜌.
Estimates of 𝜏 with the same sign as the sample correlation, but that are smaller
in magnitude, do not provide evidence of nonlinearity.
141-156
Simulation and
Bootstrapping
142-156
1. Monte Carlo Simulation
Monte Carlo: Monte Carlo was derived from the name of a famous casino
established in 1862 in the south of France (actually, in Monaco).
143-156
1. Monte Carlo Simulation
Improving the Accuracy of Simulations
① Increasing the number of repetitions, N.
The standard method of estimating an expected value relies on the
LLN and CLT. In that case, the standard error of the estimated
expected value is proportional to 1/√𝑁.
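The 1/√N scaling can be demonstrated empirically (a sketch that estimates E[X²] = 1 for X ~ N(0,1); the repetition counts are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

def mc_estimates(n_draws, reps=300):
    """Repeated Monte Carlo estimates of E[X^2] for X ~ N(0,1)."""
    return np.array([np.mean(rng.standard_normal(n_draws) ** 2)
                     for _ in range(reps)])

se_small = mc_estimates(100).std()       # standard error with N = 100
se_big = mc_estimates(10_000).std()      # standard error with N = 10,000
print(round(se_small / se_big, 1))       # near sqrt(10000/100) = 10
```

Multiplying N by 100 shrinks the standard error by roughly a factor of 10, as 1/√N predicts.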
144-156
1. Monte Carlo Simulation
Improving the Accuracy of Simulations
② Antithetic Variables
The antithetic variables add a second set of random variables that
are constructed to have a negative correlation with the iid
variables used in the simulation.
③ Control Variates
The control variates help to reduce the Monte Carlo variation owing
to particular sets of random draws by using the same draws on a
related problem whose solution is known.
145-156
1. Monte Carlo Simulation
Limitation of Simulations
The biggest challenge when using simulation to approximate moments
is the specification of the DGP.
If the DGP does not adequately describe the observed data, then
the approximation of the moment may be unreliable, so specific
assumptions about the observed data must be made.
The other important consideration is the computational cost.
146-156
2. Bootstrapping
Bootstrapping directly uses the observed data to simulate a sample that
has similar characteristics.
Specifically, it does this without directly modeling the observed data or
making specific assumptions about their distribution.
Two types of bootstrapping
i.i.d. bootstrap
Circular block bootstrap(CBB)
147-156
2. Bootstrapping
Example: Bootstrapping is used to simulate data from a data set of {X1,X2,X3……X100}
If the data set is iid, then iid bootstrap is applied
Define simulation sample size m=5
Random sampling with replacement
Sample 1: {X1,X2,X34,X56,X99}
Sample 2: {X3,X5,X44,X56,X67}
Sample 3: {X7,X8,X31,X67,X68}
If the data set is not iid, then Circular block bootstrap(CBB) is applied
Define block size q=5 (rule-of-thumb, q = √𝑁)
Samples blocks of size q with replacement
Sample 1: {X1,X2,X3,X4,X5}
Sample 2: {X2,X3,X4,X5,X6}
Sample 3: {X3,X4,X5,X6,X7}
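The two sampling schemes can be sketched with NumPy (hypothetical data standing in for {X1, …, X100}; for brevity a single circular block is drawn, whereas a full CBB resample would concatenate enough blocks to reach the original sample size):

```python
import numpy as np

rng = np.random.default_rng(4)
data = np.arange(1, 101)            # stands in for {X1, ..., X100}
N = len(data)

# i.i.d. bootstrap: sample m observations with replacement
m = 5
iid_sample = rng.choice(data, size=m, replace=True)

# circular block bootstrap: draw one contiguous block of size q,
# wrapping around the end of the sample (rule of thumb q ~ sqrt(N))
q = int(np.sqrt(N))                 # = 10
start = rng.integers(N)             # random block starting point
cbb_sample = data[(start + np.arange(q)) % N]

print(iid_sample)
print(cbb_sample)
```

The iid bootstrap scrambles the ordering entirely, while the circular block preserves the within-block time dependence that non-iid data require.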
148-156
The Standard Normal Distribution
149-156
The Standard Normal Distribution
150-156
Student’s t distribution Table
151-156
Student’s t distribution Table
152-156
Chi-Square Distribution Table
153-156
F-Distribution Table
154-156
It’s not the end but just the beginning.
When you feel it is too late, it is actually the earliest time to start.
155-156
Feedback
If you believe there is an error in a GFEDU course handout, question bank,
video, or other material, please let us know. We will verify every
submission and reply as quickly as possible.
How do you reach us?
Send the issue you found to us by email, including:
Your name or online-school account
Your class (e.g., 2005 FRM Part I long-term program)
The subject of the issue (if unknown, the chapter and knowledge point) and the page number
A detailed description of the issue and your view of it
Please email: academic.support@gfedu.net
Thank you very much for supporting GFEDU; every piece of feedback helps us
improve. We will also open other feedback channels (such as WeChat) in the
future.
156-156