FRM Level I Quantitative Analysis — Intensive Review (Standard Edition)
FRM Level I Training Handout — Intensive Class
Lecturer: Crystal
金程教育
1-156
Framework
Fundamentals of Probability
Random Variables and basic statistics
Distribution
Hypothesis Testing
Linear Regression
Time Series
Measuring Returns, Volatility and
Correlation
Simulation and Bootstrapping
2-156
Fundamentals of
Probability
3-156
1. Fundamental principles of probability
Axioms of Probability
Any event A in the event space has P(A) ≥ 0 (also written Pr(A) ≥ 0).
The probability of all events in the sample space is 1: P(Ω) = 1.
If events A1 and A2 are mutually exclusive, P(A1 ∪ A2) = P(A1) + P(A2).
Extensions
The probability of the union of any two sets can be decomposed into:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
4-156
2. Conditional probability
Unconditional Probability (Marginal Probability)
The probability of an event without any restrictions (or lacking any prior
information), commonly known as P(A).
Conditional Probability
The probability that one event occurs given that another has occurred. The
conditional probability of Event B, conditional on Event A, is given by
P(B|A) = P(A ∩ B) / P(A),  P(A) > 0
Joint probability
P(A ∩ B) is the joint probability, which means the probability that two
events occur simultaneously.
5-156
2. Conditional probability
Independent events (or unconditionally independent events):
If event B is not influenced by whether the other event A occurs, then we
say those events are independent; otherwise they are dependent.
P(A ∩ B) = P(A) × P(B)
Conditional independence
Like probability, independence can be redefined to hold conditional on
another event C. Two events A and B are conditionally independent if:
P(A ∩ B | C) = P(A | C) × P(B | C)
Note that two types of independence—unconditional and conditional—do
not imply each other.
Events can be both unconditionally dependent and conditionally
independent.
Events can be independent, yet conditional on another event they may
be dependent.
6-156
3. Total Probability Formula
Total Probability Formula
If an event A must result in one of the mutually exclusive events A1, A2,
A3, …, An, then
P(A) = P(A1)P(A|A1) + P(A2)P(A|A2) + … + P(An)P(A|An)
where (1) Ai ∩ Aj = ∅ (i ≠ j), and (2) A1 ∪ A2 ∪ … ∪ An = Ω
7-156
4. Bayes’ rule
Specifically, we are often interested in the probability of an event happening
only if another event happens first.
Bayes’ rule provides a method to construct conditional probabilities using
other probability measures.
It is both a simple application of the definition of conditional probability
and an extremely important tool.
P(A|B) = P(A ∩ B) / P(B) = [P(B|A) / P(B)] × P(A)
8-156
4. Bayes’ rule
Example:
Suppose you are asked to use historical data to categorize
managers as excellent or average.
Excellent managers are expected to outperform the market 70% of the
time. Average managers are expected to outperform the market only
50% of the time.
Assume that the probability of a manager outperforming the market
in any given year is independent of performance in prior years.
ABC Insurance Company has found that only 20% of all fund managers
are excellent managers and the remaining 80% are average managers.
A new fund manager to the portfolio started three years ago and
outperformed the market all three years.
9-156
4. Bayes’ rule
Questions:
1. What is the probability that the new manager was an excellent
manager when she first started managing portfolios three years
ago?
2. What are the probabilities that the new manager is an excellent or
average manager today?
3. What is the probability that the new manager will beat the market
next year, given that the new manager outperformed the market
the last three years?
TIPS:
P(O): The probability that the fund manager will outperform the
market.
P(E): The probability that the fund manager is an excellent manager.
P(A): The probability that the fund manager is an average manager.
10-156
4. Bayes’ rule
Answer 1: P(E)=20%
Answer 2:
The probability of an excellent manager outperforming the market is
70% [P(O|E)=70%].
The probability of an average manager outperforming the market is
50% [P(O|A) = 50%].
Next, we need to use Bayes’ theorem to determine the probability that
the new manager is excellent given that the manager outperformed
the market three years in a row.
Let O denote outperforming the market three years in a row.
P(O|E) = 0.7³ = 0.343,  P(E) = 0.2
P(O) = P(O|E) × P(E) + P(O|A) × P(A) = 0.7³ × 0.2 + 0.5³ × 0.8 = 0.1686
P(E|O) = P(O|E) × P(E) / P(O) = (0.343 × 0.2) / 0.1686 = 0.4069 = 40.7%
11-156
4. Bayes’ rule
Answer 3: This question is determined by finding the unconditional
probability of the new manager outperforming the market. However,
now we will use 40.7% as the weight for the probability that the
manager is excellent and 59.3% as the weight for the probability that
the manager is average:
P(O) = P(O|E) × P(E) + P(O|A) × P(A) = 0.7 × 0.407 + 0.5 × 0.593 = 0.5814
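The full Bayesian update in this example can be reproduced with a short Python sketch (the variable names are our own):

```python
# Bayes' rule update for the fund-manager example.
p_e, p_a = 0.20, 0.80            # prior: excellent vs. average manager
p_o_e, p_o_a = 0.70, 0.50        # P(outperform in a year | manager type)

# P(outperform 3 years running | type), assuming independence across years
p_3_e, p_3_a = p_o_e ** 3, p_o_a ** 3
p_3 = p_3_e * p_e + p_3_a * p_a                  # total probability
post_e = p_3_e * p_e / p_3                       # Bayes' rule: P(E | 3 wins)
p_next = p_o_e * post_e + p_o_a * (1 - post_e)   # next-year forecast
print(round(p_3, 4), round(post_e, 3), round(p_next, 4))
```

The posterior 40.7% then replaces the 20% prior when forecasting next year's performance.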
12-156
5. Question 1
The following is a probability matrix for X = {1, 2, 3} and Y = {1, 2, 3}.
         X=1    X=2    X=3
Y=1      6%     15%    9%
Y=2      12%    30%    18%
Y=3      2%     5%     3%
Each of the following is true except:
A. X and Y are independent.
B. The covariance (X,Y) is non-zero.
C. The probability Y = 3 conditional on X = 1 is 10%.
D. The unconditional probability that X = 2 is 50%.
Correct Answer : B
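A quick way to verify the answer is to compute the marginals and check independence cell by cell; a minimal Python sketch (our own illustration):

```python
# Verify the probability-matrix question: marginals, independence, conditional.
joint = {(1, 1): 0.06, (2, 1): 0.15, (3, 1): 0.09,
         (1, 2): 0.12, (2, 2): 0.30, (3, 2): 0.18,
         (1, 3): 0.02, (2, 3): 0.05, (3, 3): 0.03}   # keys are (x, y)
px = {x: sum(p for (xx, y), p in joint.items() if xx == x) for x in (1, 2, 3)}
py = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in (1, 2, 3)}
independent = all(abs(joint[x, y] - px[x] * py[y]) < 1e-12
                  for x in (1, 2, 3) for y in (1, 2, 3))
cond = joint[1, 3] / px[1]            # P(Y = 3 | X = 1)
print(independent, px[2], round(cond, 2))
```

Every joint cell equals the product of its marginals, so X and Y are independent and their covariance is zero — statement B is the false one.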
13-156
Random Variable
and basic
statistics
14-156
1.Random Variable
Definition of a random variable
A random variable is a function X(ω) that maps each outcome ω in the
sample space Ω to a number x.
Two classes of random variables
Discrete random variable
Assigns a probability to a distinct set of values, which can be either
finite or contain a countably infinite set of values.
When the set of values is infinite, the set must be countable.
Probability mass function (PMF)
Cumulative distribution function (CDF): 𝐹𝑋 𝑥 = P X ≤ 𝑥
Continuous random variable
Produces values from an uncountable set.
Probability density function (PDF) is used instead of PMF
𝑃(𝑋 = 𝑥) = 0 even though x can occur.
15-156
1.Random Variable
Multivariate random variables extend the concept of a single random
variable to include measures of dependence between two or more random
variables.
Like their univariate counterparts, they can be discrete or continuous.
Probability matrix(probability table): When dealing with the joint
probabilities of two variables, it is often convenient to summarize the
various probabilities in a probability matrix.
Example: X is drawn from {1, 2} with equal probability; then Y is drawn
from {1, …, X} with equal probability.
         X=1     X=2
Y=1      0.50    0.25
Y=2      0.00    0.25
The probabilities in the cells are joint probabilities.
16-156
1.Random Variable
Marginal distributions:
The distribution of a single component of a bivariate random variable is
called a marginal distribution.
When a PMF is represented as a probability matrix, the two marginal
distributions are computed by either summing across columns or
summing down rows.
E.g. The added f_Y(y) column and f_X(x) row contain the marginal
probabilities; they are the marginal distributions of Y and X.
         X=1     X=2     f_Y(y)
Y=1      0.50    0.25    0.75
Y=2      0.00    0.25    0.25
f_X(x)   0.50    0.50
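The marginal sums above can be reproduced with a minimal Python sketch (joint PMF hard-coded from the table):

```python
# Marginal distributions from the joint PMF: sum across the other variable.
joint = {(1, 1): 0.50, (2, 1): 0.25, (1, 2): 0.00, (2, 2): 0.25}  # keys (x, y)
fx = {x: joint[x, 1] + joint[x, 2] for x in (1, 2)}   # sum over y
fy = {y: joint[1, y] + joint[2, y] for y in (1, 2)}   # sum over x
print(fx, fy)
```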
17-156
1.Random Variable
Conditional Distributions:
The conditional distribution summarizes the probability of the outcomes
for one random variable conditional on the other taking a specific value.
E.g. What is the conditional distribution of Y conditional on X = 2?
f_{Y|X}(y | x = 2) = f_{X,Y}(2, y) / f_X(2) = f_{X,Y}(2, y) / 0.5
The conditional distribution is then
         Y=1     Y=2
         50%     50%
18-156
1.Random Variable
Independence
The components of a bivariate random variable are independent if
f_{X,Y}(x, y) = f_X(x) × f_Y(y)
E.g. X and Y are not independent in the previous example:
f_X(1) × f_Y(1) = 0.50 × 0.75 = 0.375 ≠ f_{X,Y}(1, 1) = 0.50
         X=1     X=2     f_Y(y)
Y=1      0.50    0.25    0.75
Y=2      0.00    0.25    0.25
f_X(x)   0.50    0.50
19-156
2.Expectations and Moments
Moment: As stated previously, moments are a set of commonly used
descriptive measures that characterize important features of random
variables.
Two types of moments
Central moment: μ_k = E[(X − E[X])^k], (k ≥ 2)
Non-central moment: μ_k^NC = E[X^k], (k ≥ 1)
20-156
2.Expectation and conditional expectations
The expectation of a function g(X,Y) is a probability weighted average of the
function of the outcomes g(X, Y). The expectation is defined as:
E[g(X, Y)] = Σ_x Σ_y g(x, y) f_{X,Y}(x, y)
         X=1     X=2
Y=3      15%     10%
Y=4      60%     15%
Now consider the function g(X, Y) = X^Y. The expectation of g(X, Y) is therefore:
E[g(X, Y)] = Σ_{x∈{1,2}} Σ_{y∈{3,4}} g(x, y) f(x, y)
E[g(X, Y)] = 1³ × 0.15 + 1⁴ × 0.60 + 2³ × 0.10 + 2⁴ × 0.15 = 3.95
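A short sketch confirming the probability-weighted sum (the dictionary layout is our own):

```python
# Probability-weighted expectation of g(X, Y) = X**Y over the joint PMF.
joint = {(1, 3): 0.15, (2, 3): 0.10, (1, 4): 0.60, (2, 4): 0.15}  # keys (x, y)
e_g = sum((x ** y) * p for (x, y), p in joint.items())
print(e_g)   # 1**3*0.15 + 2**3*0.10 + 1**4*0.60 + 2**4*0.15
```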
21-156
2.Expectation and conditional expectations
A conditional expectation is an expectation when one random variable
takes a specific value or falls into a defined range of values.
E.g. In the previous example, the distribution of Y conditional on X = 2:
f_{Y|X}(y | x = 2) = f_{X,Y}(2, y) / f_X(2) = f_{X,Y}(2, y) / 0.5
The conditional distribution is then
         Y=1     Y=2
         50%     50%
The conditional expectation of Y is then
E[Y | X = 2] = 1 × 50% + 2 × 50% = 1.5
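The conditional expectation can be checked the same way; a minimal sketch assuming the joint PMF from the earlier matrix:

```python
# Conditional distribution and expectation of Y given X = 2.
joint_at_x2 = {(2, 1): 0.25, (2, 2): 0.25}       # slice of the joint PMF at X = 2
fx2 = sum(joint_at_x2.values())                  # f_X(2) = 0.5
cond = {y: p / fx2 for (x, y), p in joint_at_x2.items()}
e_y_given_x2 = sum(y * p for y, p in cond.items())
print(cond, e_y_given_x2)
```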
22-156
2.Expectations and Moments
Expected Value
A measure of central tendency – the first moment
23-156
2.Expectations and Moments
The median measures the 50% quantile of the distribution.
First, the data are sorted from smallest to largest.
When the sample size is odd, median(x) = x_((n+1)/2)
Two other commonly reported quantiles are the 25% and 75% quantiles.
First, the data are sorted from smallest to largest
Second, calculate the unit interval = 100%/(n − 1).
Find the 25% quantile and 75% quantile with linear interpolation.
These two quantile estimates ( 𝑞25 and 𝑞75 ) are frequently used together to
estimate the inter-quartile range:
𝐼𝑄𝑅 = 𝑞75 − 𝑞25
24-156
2. Variance and Moments
Variance
A measure of dispersion – the second moment
σ²_x = (1/n) Σᵢ₌₁ⁿ (Xᵢ − μ)² = E[(X − μ)²]
Above formula is the definition of variance. To compute the variance, we
use the following formula:
σ2 X = E X 2 − [E(X)]2
Measures how noisy or unpredictable that random variable is.
The positive square root of 𝜎𝑥2 , 𝜎, is known as the standard deviation,
also called volatility.
25-156
2. Variance and Moments
Properties of Variance
The variance of a constant is zero.
If a is constant, then: σ2 (aX) = a2 σ2 (X).
If b is a constant, then: σ2 (X + b) = σ2 (X).
If a and b are constant, then: σ2 (aX + b) = a2 σ2 (X).
If X and Y are independent random variables and a and b are constants,
then σ2 aX + bY = a2 σ2 X + b2 σ2 (Y).
If X and Y are two independent random variables, then:
σ²(X + Y) = σ²(X) + σ²(Y);  σ²(X − Y) = σ²(X) + σ²(Y)
The relationship between expectation and variance:
σ²(X) = E(X²) − [E(X)]²
26-156
3. Covariance and Moments
Covariance
Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y]
Sample covariance:
S_XY = (1/(n − 1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ)
Covariance measures how one random variable moves with another
random variable.
Covariance ranges from negative infinity to positive infinity.
27-156
3. Covariance and Moments
Properties of Covariance
If X and Y are independent random variables, their covariance is zero.
The relationship between covariance and variance:
σ²(X ± Y) = σ²_x + σ²_y ± 2Cov(X, Y)
If a, b and c are constant, then
Cov(a + bX, cY) = Cov(a, cY) + Cov(bX, cY) = b × c × Cov(X, Y)
The covariance of X and itself is the variance of X
Cov(X, X) = E[(X − E[X])(X − E[X])] = σ²(X)
28-156
4. Correlation Coefficient and Moments
Correlation coefficient
ρ_XY = Cov(X, Y)/(σ_x σ_y) = σ_xy/(σ_x σ_y) = E[(X − μ_x)(Y − μ_y)]/(σ_x σ_y)
Properties of Correlation coefficient
Correlation has no units and ranges from −1 to +1.
Correlation measures the linear relationship between two random
variables.
If two variables are independent, their covariance is zero, therefore, the
correlation coefficient will be zero. The converse, however, is not true.
For example, Y = X².
Variances of correlated variables:
σ²(X ± Y) = σ²_x + σ²_y ± 2ρ_XY σ_x σ_y
29-156
5. Skewness and Moments
Skewness
A measure of asymmetry of a PDF – the third moment.
S = E[(X − μ_x)³] / σ³_x = third central moment / cube of standard deviation
Symmetrical and nonsymmetrical distributions
Positively skewed (right skewed) and negatively skewed(left skewed)
[Figure: negatively skewed (mean < median < mode), symmetric (mean = median = mode), positively skewed (mode < median < mean)]
Positively skewed: mode < median < mean, with a fat right tail
Negatively skewed: mode > median > mean, with a fat left tail
30-156
6. Kurtosis and Moments
Kurtosis
A measure of tallness or flatness of a PDF – the fourth moment.
K = E[(X − μ_x)⁴] / (E[(X − μ_x)²])² = fourth central moment / square of second central moment
For a normal distribution, the K value is 3.
Excess kurtosis = kurtosis – 3
31-156
7. Moments and Linear Transformations
Let 𝑌 = 𝑎 + 𝑏𝑋, a and b are constants:
The mean of Y is
𝐸 𝑌 = 𝑎 + 𝑏𝐸 𝑋
The variance of Y is
V[Y] = b²V[X], or b²σ²_X
If b>0, the skewness and kurtosis of Y are identical to the skewness and
kurtosis of X.
If b<0, (and thus 𝑌 = 𝑎 + 𝑏𝑋 is a decreasing transformation), then the
skewness has the same magnitude but the opposite sign.
32-156
8. The BLUE mean estimator
A sample estimator is used to estimate a population parameter; a desirable
estimator is the Best Linear Unbiased Estimator (BLUE).
Linear: the estimator is calculated as a linear function of the sample data.
Unbiased: the expected value of the estimator equals the population
parameter.
Best: the estimator has the minimum variance among all linear unbiased
estimators.
The mean estimator is the Best Linear Unbiased Estimator (BLUE) of the
population mean when the data are iid.
33-156
8. Question 1
A model of the frequency of losses (L) per day assumes the following
discrete distribution: zero loss with probability of 20%; one loss with
probability of 30%; two losses with probability of 30%; three losses with
probability of 10%; and four losses with probability of 10%. What are,
respectively, the expected number of loss events and the standard
deviation of the number of loss events?
A. E(L) = 1.2 and σ = 1.44
B. E(L) = 1.6 and σ = 1.20
C. E(L) = 1.8 and σ = 2.33
D. E(L) = 2.2 and σ = 9.60
Correct Answer : B
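The answer follows from the definitional formulas for the mean and variance of a discrete distribution; a minimal Python check (PMF hard-coded from the question):

```python
# Expected loss count and its standard deviation for the discrete PMF.
import math

pmf = {0: 0.20, 1: 0.30, 2: 0.30, 3: 0.10, 4: 0.10}
mean = sum(k * p for k, p in pmf.items())                  # E(L)
var = sum((k - mean) ** 2 * p for k, p in pmf.items())     # E[(L - E(L))^2]
print(mean, math.sqrt(var))
```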
34-156
8. Question 2
An analyst is concerned with the symmetry and peakedness of a
distribution of returns over a period of time for a company she is
examining. She does some calculations and finds that the median return is
4.2%, the mean return is 3.7%, and the mode return is 4.8%. She also finds
that the measure of kurtosis is 2. Based on this information, the correct
characterization of the distribution of return over time is:
Skewness Kurtosis
A. Positive Leptokurtic
B. Positive Platykurtic
C. Negative Platykurtic
D. Negative Leptokurtic
Correct Answer : C
35-156
8. Question 3
Which one of the following statements about the correlation coefficient
is false?
A. It always ranges from -1 to +1.
B. A correlation coefficient of zero means that two random variables
are independent.
C. It is a measure of linear relationship between two random variables.
D. It can be calculated by scaling the covariance between two
random variables.
Correct Answer: B
36-156
Distribution
37-156
1. Binomial Distribution
Bernoulli Distribution
P(X = 1) = p P(X = 0) = 1 – p
Binomial Distribution
The probability of x successes in n trials
p(x) = P(X = x) = C(n, x) p^x (1 − p)^(n−x) = [n!/(x!(n − x)!)] p^x (1 − p)^(n−x)
Expectations and variances
Expectation Variance
Bernoulli random variable p p(1 – p)
Binomial random variable np np(1 – p)
38-156
2. Poisson Distribution
Poisson Distribution
When there are a large number of trials but a small probability of
success, Binomial calculations become impractical.
If we substitute λt/n for p and let n grow very large, the Binomial
distribution becomes the Poisson distribution.
p(k) = P(X = k) = (λ^k / k!) e^(−λ),  where λ = np
X refers to the number of success per unit.
λ indicates the rate of occurrence of the random events; i.e., it tells
us how many events occur on average per unit of time.
Mean Variance
λ λ
39-156
3. Normal Distribution
Normal Distribution
As n increases, the binomial distribution approaches Normal Distribution.
40-156
3. Normal Distribution
Normal distribution in practice
① Sums of independent normally distributed random variables are also
normally distributed.
② Standardization and Z-table Application
③ Key quantiles in normal distribution
④ Approximating discrete random variables to normal random variable
41-156
3. Normal Distribution
① Sums of independent normally distributed random variables are also
normally distributed
If X ~ N(μ1, σ1²), Y ~ N(μ2, σ2²) and they are independent, then
aX + bY ~ N(aμ1 + bμ2, a²σ1² + b²σ2²)
When applied to portfolio, a and b are usually asset weights
Example:
A $50 million prudent fund (PF) is merged with a $200 million
aggressive fund (AF). The return of PF ~ N(0.03, 0.07²) and the return of
AF ~ N(0.07, 0.15²). Assuming the returns are independent, what is the
distribution of the portfolio return?
Correct Answer: R_p ~ N(0.062, 0.1208²)
42-156
3. Normal Distribution
② Standardization and Z-table Application
Standardization: if 𝑋 ~ 𝑁(µ, 𝜎²), then
Z = (X − μ) / σ ~ N(0, 1)
Check the Z-table to find the CDF:
Φ((X − μ) / σ)
Example:
If 𝑋 ~ 𝑁(70, 9), compute the probability of X ≤ 64.12.
Solution:
𝑍 = (𝑋 − 𝜇)/𝜎 = (64.12 − 70)/3 = −1.96
𝑃(𝑍 ≤ −1.96) = 0.0250
𝑃(𝑋 ≤ 64.12) = 0.0250
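The Z-table lookup can be reproduced with the standard library's `statistics.NormalDist` (an exact-CDF substitute for the table):

```python
# Standardize X ~ N(70, 9) and look up the standard-normal CDF.
from statistics import NormalDist

mu, sigma = 70, 3                 # variance 9, so sigma = 3
z = (64.12 - mu) / sigma          # standardization: Z = (X - mu)/sigma
p = NormalDist().cdf(z)           # P(Z <= z) = P(X <= 64.12)
print(round(z, 2), round(p, 4))
```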
43-156
3. Normal Distribution
③ Key quantiles in normal distribution
Approximately 68% of all observations fall in the interval 𝜇 ± 𝜎
Approximately 90% of all observations fall in the interval 𝜇 ± 1.65𝜎
Approximately 95% of all observations fall in the interval 𝜇 ± 1.96𝜎
Approximately 99% of all observations fall in the interval 𝜇 ± 2.58𝜎
44-156
3. Normal Distribution
④ Approximating discrete random variables to normal random variable
A binomial random variable can approximate to a normal random
variable, 𝑁(𝑛𝑝, 𝑛𝑝(1 − 𝑝) if both
𝑛𝑝 ≥ 10
𝑛(1 − 𝑝) ≥ 10.
[Figure: binomial PMFs with n = 20, approaching the normal bell shape]
45-156
4. Student’s t distribution
Student’s t distribution
It is similar to the normal, except it exhibits slightly heavier tails.
t = (X̄ − μ_X) / (S_X/√n) ~ t_(n−1)
It is symmetrical and has a mean of zero.
As the d.f. increase, the t-distribution converges with the standard
normal distribution.
Both the normal and student’s t distribution characterize the sampling
distribution of the sample mean. The difference is that the normal is
used when we know the population variance. The student’s t is used
when we must rely on the sample variance. In practice, we don’t know
the population variance, so the student’s t distribution is typically
appropriate.
46-156
5. Lognormal Distribution
Lognormal Distribution
It is useful for modeling asset prices which never take negative values.
Right skewed.
Chi-Square (𝝌𝟐 ) Distribution
F-Distribution
47-156
6. LLN and CLT
Law of Large Numbers (LLN)
Kolmogorov's Strong Law of Large Numbers states that if X_i is a
sequence of iid random variables with E[X_i] ≡ μ (a finite parameter),
then:
μ̂_n = (1/n) Σᵢ₌₁ⁿ X_i → μ  (a.s.)
The symbol a.s. means "converges almost surely": the mean in large
samples μ̂_n (formally, as n → ∞) approaches the true population mean μ.
48-156
6. LLN and CLT
The Central Limit Theorem (CLT)
If X1, X2, …, Xn is an i.i.d. random sample from any population (regardless
of its distribution) with a finite mean μ and finite variance σ², then the
mean estimator μ̂_n tends to be normally distributed with mean μ and
variance σ²/n as the sample size increases:
μ̂_n = X̄ →d N(μ, σ²/n)
or
(μ̂_n − μ) / (σ/√n) →d N(0, 1)
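A Monte Carlo sketch of the CLT, using uniform(0, 1) draws (μ = 0.5, σ² = 1/12) as an assumed example; the seed, sample size, and repetition count are arbitrary choices:

```python
# Means of n uniform draws should look ~N(mu, sigma^2/n) for large n.
import random
import statistics

random.seed(42)
n, reps = 50, 2000
means = [statistics.fmean([random.random() for _ in range(n)])
         for _ in range(reps)]
# Mean of the sample means ~ 0.5; variance scaled by n ~ 1/12.
print(round(statistics.fmean(means), 2),
      round(statistics.pvariance(means) * n, 3))
```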
49-156
7. Question 1
1. On a multiple choice exam with four choices for each of six questions,
what is the probability that a student gets less than two questions
correct simply by guessing?
A. 0.46%
B. 23.73%
C. 35.60%
D. 53.39%
Correct Answer:D
p(X=0) = (3/4)⁶ = 17.80%
p(X=1) = 6 × (1/4) × (3/4)⁵ = 35.59%
The probability of getting less than two questions correct is
p(X=0) + p(X=1) = 53.39%.
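The same numbers fall out of the binomial PMF computed with `math.comb`; a minimal sketch:

```python
# Binomial PMF: P(fewer than two correct) when guessing n = 6
# four-choice questions (p = 1/4 per question).
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 6, 0.25
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))   # equals np
less_than_two = binom_pmf(0, n, p) + binom_pmf(1, n, p)
print(round(mean, 6), round(less_than_two, 4))
```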
50-156
7. Question 2
2. A call center receives an average of two phone calls per hour. The
probability that they will receive 20 calls in an 8-hour day is closest to:
A. 5.59%
B. 16.56%
C. 3.66%
D. 6.40%
Correct Answer: A
To solve this question, we first need to realize that the expected
number of phone calls in an 8-hour day is 16. Using the Poisson
distribution, we solve for the probability that X will be 20.
P(X = 20) = (16²⁰ × e⁻¹⁶) / 20! = 5.59%
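The Poisson probability above is easy to evaluate directly with the standard library:

```python
# Poisson probability for the call center: lambda = 2 calls/hour * 8 hours.
import math

lam = 2 * 8
p20 = lam ** 20 * math.exp(-lam) / math.factorial(20)
print(round(p20, 4))
```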
51-156
7. Question 3
3. Which of the following statements about the normal distribution is not
accurate?
A. Kurtosis equals three.
B. Skewness equals one.
C. The entire distribution can be characterized by two moments,
mean and variance.
D. The normal density function has the following expression:
f(x) = [1/√(2πσ²)] e^(−(x−μ)²/(2σ²))
Correct Answer:B
52-156
Hypothesis
Testing
53-156
1. Statistical Inference
What is Statistical Inference?
Drawing conclusions about a population from a sample: sampling and estimation.
54-156
2. Point and Confidence Interval Estimate
Point Estimation
Using a single numerical value to estimate the parameter of the
population.
sample estimator → population parameter:  X̄ → μ_X,  S² → σ²_X
[Figure: hypothesis-testing flowchart — Step 4: formulate a decision rule; Step 5: take a sample and arrive at a decision (reject or do not reject)]
58-156
3. Hypothesis Testing
Step 1: State the null (H0) and alternative hypotheses (Ha)
H0 : 𝜇𝑋 = 18.5, HA : 𝜇𝑋 ≠ 18.5
One-tailed test vs. Two-tailed test
59-156
3. Hypothesis Testing
Step 3: Choose the level of significance α and obtain the critical value
Level of significance (alpha)
Critical Value
The distribution of test statistic (z, t, χ2, F)
Degree of freedom
t(n-1), χ2 (n-1), F(n1-1,n2-1)
One-tailed or two-tailed test
For the example:
Alpha=5%
t_(α/2)(n − 1) = 2.052
60-156
3. Hypothesis Testing
Step 4: Formulate a decision rule
Reject H0 if |test statistic| > critical value.
Fail to reject H0 if |test statistic| < critical value.
The rejection region lies on the same side as HA.
61-156
4. Test of Single Population Mean
H0: μ = μ0
z-test vs. t-test
z-statistic: z = (x̄ − μ0) / (σ/√n)
t-statistic: t = (x̄ − μ0) / (s/√n)
62-156
5. Testing the equality of two means
H0 : μX = μY
Testing whether the means of two series are equal.
If the null hypothesis is true, then E[Z_i] = E[X_i] − E[Y_i] = μ_X − μ_Y = 0.
The test statistic is constructed as:
T = μ̂_Z / √(σ²_Z/n) = (μ̂_X − μ̂_Y) / √((σ²_X + σ²_Y − 2σ_XY)/n)
When X_i and Y_i are both iid and mutually independent, the test statistic
for testing that the means are equal is:
T = (μ̂_X − μ̂_Y) / √(σ²_X/n_X + σ²_Y/n_Y)
63-156
5. Testing the equality of two means
Example: Assume two distributions that measure the average rainfall in City
X and City Y, respectively. Their means, variances, sample sizes and
correlation are:
μ_X = 10cm; μ_Y = 14cm; σ²_X = 4; σ²_Y = 6; n_X = n_Y = 12; ρ_XY = 0.3
Question: Is the average rainfall in City X significantly different from
that in City Y?
H0: μ_X = μ_Y
The value of the test statistic is
T = (μ_X − μ_Y) / √((σ²_X + σ²_Y − 2σ_XY)/n)
  = (10 − 14) / √((4 + 6 − 2 × 0.3 × √4 × √6)/12) = −5.21
Therefore, the null would be rejected even at a 99.99% confidence level.
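The test statistic can be verified numerically; a minimal sketch with the example's values:

```python
# Test statistic for equal means with correlated series:
# T = (mx - my) / sqrt((var_x + var_y - 2*cov_xy)/n).
import math

mx, my = 10, 14
var_x, var_y, n, rho = 4, 6, 12, 0.3
cov_xy = rho * math.sqrt(var_x) * math.sqrt(var_y)   # sigma_XY from correlation
t = (mx - my) / math.sqrt((var_x + var_y - 2 * cov_xy) / n)
print(round(t, 2))
```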
64-156
6. Test of Single Population Variance
H0: 𝝈𝟐 = 𝝈𝟐𝟎
The Chi-Square test
χ² = (n − 1)s² / σ0²,  df = n − 1
Where:
𝑛 = sample size
𝑠 2 = sample variance
𝜎02 = hypothesized value for the population variance
65-156
7. Test of Variances Difference
H0: 𝛔𝟐𝟏 = 𝛔𝟐𝟐
The F-test
F = s1² / s2²,  df1 = n1 − 1, df2 = n2 − 1
Always put the larger variance in the numerator (s1² > s2²).
66-156
8. Summary of Hypothesis Testing
Test type     Assumptions                          H0          Test statistic                                  Distribution
Mean          Normally distributed population,     μ = μ0      z = (x̄ − μ0)/(σ/√n)                            N(0, 1)
hypothesis    known population variance
testing       Normally distributed population,     μ = μ0      t = (x̄ − μ0)/(s/√n)                            t(n − 1)
              unknown population variance
              Testing the equality of two means    μ1 = μ2     T = μ̂_Z/√(σ²_Z/n)                              t(n − 1)
                                                               = (μ̂_X − μ̂_Y)/√((σ²_X + σ²_Y − 2σ_XY)/n)
Variance      Normally distributed population      σ² = σ0²    χ² = (n − 1)s²/σ0²                              χ²(n − 1)
hypothesis    Two independent normally             σ1² = σ2²   F = s1²/s2²                                     F(n1 − 1, n2 − 1)
testing       distributed populations
67-156
9. P Value Approach
① P Value Approach
P value is the smallest level of significance for which the null
hypothesis can be rejected.
Return to our P/E example:
t = (X̄ − μ_X)/(S_X/√n) = (23.25 − 18.5)/(9.49/√28) = 2.65 ~ t(27) → p ≈ 0.015
68-156
9. P Value Approach
69-156
10. Type I and Type II Errors
Type I error
Reject the null hypothesis when it’s actually true.
Type II error
Fail to reject the null hypothesis when it’s actually false.
Significance level (α)
The probability of making a Type I error:
Significance level = P(Type I error)
Power of a test
The probability of correctly rejecting the null hypothesis when it is false:
Power of a test = 1 – P(Type II error)
70-156
11. Question 1
Analyst John is concerned that the average days sales outstanding has
increased above its historical average of 27 days. From a large sample
of 36 companies, he computes a sample mean DSO of 29 days with
sample standard deviation of 7. His one-sided alternative hypothesis,
stated with 95% confidence, is that DSO is greater than 27. Does he
reject the null?
A. No, do not reject one-sided null as the t-statistic is less than 1.65.
B. No, do not reject one-sided null as the t-statistic is less than 1.96.
C. Yes, do reject one-sided null as the t-statistic is greater than 1.65.
D. Yes, do reject one-sided null as the t-statistic is greater than 1.96.
Correct answer : C
71-156
11. Question 2
If the mean P/E of 30 stocks in a certain industrial sector is 18 and the
sample standard deviation is 3.5, standard error of the mean is closest
to:
A. 0.12
B. 0.34
C. 0.64
D. 1.56
Correct answer :C
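The standard error formula s/√n gives the answer directly:

```python
# Standard error of the sample mean: s / sqrt(n).
import math

se = 3.5 / math.sqrt(30)
print(round(se, 2))
```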
72-156
11. Question 3
Assume random variable is normally distributed with mean of 10 and a
variance of 25. Without using a calculator, what is the probability that X
falls within 1.75 and 21.65?
A. 68%
B. 94%
C. 95%
D. 99%
Correct Answer : B
73-156
11. Question 4
Assume the population of hedge fund returns has an unknown
distribution with mean of 8% and volatility of 10%. From a sample of
40 funds, what is the probability the sample mean return will exceed
10.6%?
A. 5%
B. 8%
C. 10%
D.12%
Correct Answer : A
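By the CLT the sample mean is approximately normal with standard error σ/√n; a minimal sketch (using `statistics.NormalDist` in place of a Z-table):

```python
# P(sample mean > 10.6%) via the CLT and a standard-normal lookup.
import math
from statistics import NormalDist

se = 0.10 / math.sqrt(40)          # sigma / sqrt(n)
z = (0.106 - 0.08) / se            # standardized distance from the mean
p_exceed = 1 - NormalDist().cdf(z)
print(round(z, 2), round(p_exceed, 2))
```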
74-156
11. Question 5
Which of the following statements regarding hypothesis testing is
correct?
A. Type II error refers to the failure to reject the Ha when it is actually
false.
B. Hypothesis testing is used to make inferences about the
parameters of a given population on the basis of statistics
computed for a sample that is drawn from another population.
C. All else being equal, the decrease in the chance of making a type I
error comes at the cost of increasing the probability of making a
type II error.
D. If the p-value is greater than the significance level, then the
statistics falls into the reject intervals.
Correct answer : C
75-156
Linear Regression
76-156
1. Elements of linear regression
Interpretation of regression function
An estimated slope coefficient b1 indicates that the dependent
variable will change by b1 units for every 1-unit change in the
independent variable.
The intercept term b0 is interpreted as the value of the dependent
variable when the independent variable is zero.
The error term is the portion of the dependent variable that cannot
be explained by the model.
This shock is assumed to have mean 0, so that E[Y] = E[b0 + b1X + ε]
= b0 + b1E[X]
Slope coefficient
𝑌𝑖 = b0 + 𝑏1 𝑋𝑖 + 𝜀𝑖
77-156
2. Key assumptions in linear regression
OLS regression requires a number of assumptions. Most of the major assumptions pertain
to the regression model’s residual term (i.e., error term).
A linear relationship exists between X and Y.
Independent variables are not random; independent variables are not
correlated with the error term; and independent variables are not
correlated with each other (only in multiple regression).
The expected value of the error term is zero (i.e., 𝐸 𝜀𝑖 = 0).
The variance of the error term is constant (i.e., the error terms are homoskedastic).
The error term is uncorrelated across observations (i.e., the error terms
are not autocorrelated).
The error term is normally distributed.
The key assumptions for the linear regression model are:
The regression errors, 𝜀𝑖 , have a mean of zero conditional on the regressors Xi
Constant Variance of Shocks
Variance of independent variable is strictly greater than 0 (σ2X > 0).
The sample observations are i.i.d. random draws from the population
Large outliers are unlikely
The explanatory variables are not perfectly linearly dependent.
78-156
3. Ordinary Least Squares
Ordinary least squares (OLS)
The OLS estimation is the process of estimating the population
parameters using the sample coefficients that minimize the sum of
squared residuals (i.e., the squared error terms).
The OLS sample coefficients are those that:
minimize Σεᵢ² = Σ[Yᵢ − (b0 + b1Xᵢ)]²
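This minimization has the familiar closed form b1 = Cov(X, Y)/Var(X) and b0 = Ȳ − b1·X̄; a minimal sketch on toy data (the data points are invented for illustration):

```python
# OLS slope and intercept from the closed-form moment formulas.
from statistics import mean

xs = [1, 2, 3, 4, 5]                 # toy data, assumed
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
mx, my = mean(xs), mean(ys)
b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
      / sum((x - mx) ** 2 for x in xs))     # cov(X, Y) / var(X)
b0 = my - b1 * mx                            # line passes through (x-bar, y-bar)
print(round(b1, 2), round(b0, 2))
```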
79-156
4. Inference in linear regression
The confidence interval for the regression coefficient, b1and b0
𝑏1 ± (Critical Value × 𝑠𝑏1 )
𝑏0 ± (𝐶𝑟𝑖𝑡𝑖𝑐𝑎𝑙 𝑉𝑎𝑙𝑢𝑒 × 𝑠𝑏0 )
The standard errors of the regression coefficients:
S_b1 = (1/√n) × (s / σ̂_X)
S_b0 = √( s²(μ̂²_X + σ̂²_X) / (n σ̂²_X) )
80-156
4. Inference in linear regression
Regression Coefficient Hypothesis Testing
A t-test may also be used to test the hypothesis that the true slope
coefficient b1, is equal to some hypothesized value, k (usually k=0), the
appropriate test procedure is:
State null hypothesis and alternative hypothesis
𝐻0: 𝑏1 = 0 𝐻𝑎 : 𝑏1 ≠ 0
Calculate the test statistic:
T = (b̂1 − b1) / s_b1
The test statistic can also be transformed into a p-value:
p-value = 2(1 − Φ(|T|))
Reject H0 if T falls into rejection region, or p-value is less than test
size.
81-156
4. Inference in linear regression
The relationship between each component
Σ(Yᵢ − Ȳ)² = Σ(Ŷᵢ − Ȳ)² + Σ(Yᵢ − Ŷᵢ)²
TSS (Total Sum of Squares) = ESS (Explained Sum of Squares) + RSS (Residual Sum of Squares)
[Figure: fitted line Ŷᵢ = b0 + b1Xᵢ; deviations Yᵢ − Ŷᵢ → RSS, Ŷᵢ − Ȳ → ESS, Yᵢ − Ȳ → TSS]
82-156
4. Inference in linear regression
The reporting matrix of each component: Analysis of Variance (ANOVA)
Table
df SS MSS
Explained k ESS ESS/k
Residual n-k-1 RSS RSS/(n-k-1)
Total n–1 TSS -
83-156
4. Inference in linear regression
Joint Hypothesis Testing
The t-test is not directly applicable when testing complex hypotheses
that involve more than one parameter , because the parameter
estimators can be correlated.
An F-test is used to test whether at least one slope coefficient is
significantly different from zero.
H0: b1 = b2 = b3 = … = bk = 0; H1: at least one bj ≠ 0 (j = 1 to k)
F = (ESS/k) / (RSS/(n − k − 1))
The test assesses the effectiveness of the model as a whole in explaining
the dependent variable.
Decision rule: reject H0, if F(test-statistic) > Fc(critical value).
84-156
4. Inference in linear regression
Selected Data from 30 college students
Age Height Weight Age Height Weight
21 163 60 20 153 42
22 164 56 20 156 44
21 165 60 21 156 38
23 168 55 21 157 48
21 169 60 21 158 52
21 170 54 23 158 45
23 170 80 22 159 43
23 170 64 22 160 50
22 171 67 21 160 45
22 172 65 21 160 52
23 172 60 23 160 50
21 172 60 22 161 50
23 173 60 21 161 45
22 173 62 21 162 55
21 174 65 20 162 60
85-156
4. Inference in linear regression
Summary output
Regression Statistics
Multiple R 0.8148004
R Square 0.6638996
Adjusted R Square 0.6390033
Standard Error 5.5120076
Observations 30
ANOVA
df SS MS F Significance F
Regression 2 1620.3799 810.18993 26.666575 4.04884E-07
Residual 27 820.32014 30.382227
Total 29 2440.7
              Coefficients   Standard Error   t Stat      P-value      Lower 95%       Upper 95%
Intercept -140.7652 29.755964 -4.730656 6.282E-05 -201.8194407 -79.71104917
Age -0.048528 1.1637722 -0.041699 0.9670455 -2.436391407 2.33933526
Height 1.1972821 0.1800553 6.6495241 3.896E-07 0.827839137 1.566725091
86-156
5. Measuring model fit
R2 (the Coefficient of Determination)
A more intuitive measure of the “goodness of fit” of the regression. It is
interpreted as a percentage of variation in the dependent variable
explained by the independent variable. Its limits are 0 ≤ R2 ≤ 1.
R² = ESS/TSS = 1 − RSS/TSS
Note that in a simple two-variable regression, the square root of R² is
the correlation coefficient (r) between Xi and Yi:
r² = R² → r = ±√R²
87-156
5. Measuring model fit
The limitations of R²
Adding a new variable to the model always increases the R².
The adjusted R² is a modified version of R² that does not necessarily
increase when a new independent variable is added.
Adjusted R² = 1 − [RSS/(n − k − 1)] / [TSS/(n − 1)]
            = 1 − [(n − 1)/(n − k − 1)] × (RSS/TSS)
            = 1 − [(n − 1)/(n − k − 1)] × (1 − R²)
Adjusted R² ≤ R²; adjusted R² may be less than zero.
R2 cannot be compared across models with different dependent
variables.
There is no such thing as a "good" value for R2
88-156
5. Measuring model fit
Calculating R2 and adjusted R2
The analyst runs a regression of monthly return on the five independent
variables within 60 months. The total sum of squares is 570, the residual
sum of squares is 180. Calculate R2 and adjusted R2.
Assuming that the analyst now adds four independent variables to the
regression, R2 increases to 70.0%. Identify the model the analyst is
most likely to use.
Answer:
R² = (570 − 180)/570 = 68.42%
Adjusted R² = 1 − (60 − 1)/(60 − 5 − 1) × (1 − 68.42%) = 65.5%
Adjusted R²′ = 1 − (60 − 1)/(60 − 9 − 1) × (1 − 70.0%) = 64.6%
Since 65.5% > 64.6%, the five-variable model is preferred.
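The arithmetic above can be checked with a short script (a minimal sketch; the figures are the ones given in the example):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (n-1)/(n-k-1) * (1 - R^2)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

tss, rss, n = 570, 180, 60
r2 = 1 - rss / tss                      # = ESS/TSS = 68.42%
adj5 = adjusted_r2(r2, n, k=5)          # five regressors
adj9 = adjusted_r2(0.70, n, k=9)        # after adding four more regressors
print(round(r2, 4), round(adj5, 3), round(adj9, 3))
```

The adjusted R² falls from 65.5% to 64.6% even though R² rises, which is why the five-variable model is preferred.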
89-156
6. Violation of the assumption
Conditional Heteroskedasticity
The heteroskedasticity is related to the level of (i.e., conditional on) the
independent variable.
Conditional heteroskedasticity does create significant problems for statistical
inference.
[Figure: Effect of Heteroskedasticity on Regression Analysis, showing low
residual variance at low values of X and high residual variance at high X]
The standard errors are usually unreliable estimates.
The coefficient estimates (the b1) aren’t affected.
If the standard errors are too small, but the coefficient estimates themselves
are not affected, the t-statistics will be too large and the null hypothesis of
no statistical significance is rejected too often.
How to test heteroskedasticity: White Test
90-156
6. Violation of the assumption
Multi-collinearity (Multiple-regression only)
Multicollinearity refers to the situation that two or more independent
variables are highly correlated with each other.
Three methods to detect multi-collinearity
1. T-tests indicate that none of the individual coefficients is
significantly different from zero, while the F-test indicates overall
significance and the R2 is high.
2. The absolute value of the sample correlation between any two
independent variables is greater than 0.7 (i.e., |r| > 0.7).
3. Common sense
Methods to correct multi-collinearity
Omit one or more of the correlated independent variables.
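Detection method 2 can be sketched with NumPy on hypothetical data (the 0.7 cutoff is the rule of thumb above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.standard_normal(n)
x2 = x1 + 0.1 * rng.standard_normal(n)   # nearly collinear with x1
x3 = rng.standard_normal(n)              # unrelated regressor

X = np.column_stack([x1, x2, x3])
corr = np.corrcoef(X, rowvar=False)      # pairwise sample correlations

# flag any pair of regressors with |r| > 0.7
flags = [(i, j) for i in range(3) for j in range(i + 1, 3)
         if abs(corr[i, j]) > 0.7]
print(flags)
```

Only the (x1, x2) pair is flagged; dropping one of the two is the correction suggested above.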
91-156
6. Violation of the assumption
Omitted Variable Bias
For omitted variable bias to arise, two things must be true:
At least one of the included regressors must be correlated with the
omitted variable.
The omitted variable must be a determinant of the dependent
variable Y.
Implications of Omitted Variable Bias
Omitted variable bias is a problem whether the sample size is large or
small.
Whether this bias is large or small in practice depends on the correlation
𝑟𝑋,𝜀 between the regressor and the error term.
The larger the correlation, the larger is the bias.
Corrections of Omitted Variable Bias
Add omitted variables to construct a multiple regression
92-156
7. Selection models
Two approaches to find the appropriate model complexity
① General-to-specific model selection
② M-fold cross-validation
93-156
7. Selection models
① General-to-specific model selection
1. First include all relevant variables.
2. Remove the variable whose coefficient has the smallest absolute
t-statistic (statistically insignificant).
3. Re-estimate using the remaining explanatory variables and remove
unqualified variables.
4. Repeat the steps mentioned above until the model contains no
coefficients that are statistically insignificant.
5. Common choices for 𝛼 are between 1% and 0.1% (critical t-values of at
least 2.57 or 3.29, respectively).
94-156
7. Selection models
② M-fold cross-validation
It is designed to select a model that performs well in fitting observations
not used to estimate the parameters (out-of-sample prediction).
Steps of m-fold cross-validation
1. The first step is to determine a set of candidate models. If a data
set has n candidate explanatory variables, then there are 2ⁿ
possible model specifications.
2. Split the data into m equal-sized blocks; parameters are
estimated using m−1 blocks (the training set) and residuals are
computed with data in the excluded block (the validation set).
3. Repeat the process of estimating parameters and computing
residual for a total of m times (ensure each block is used to
compute residual once).
4. Compute sum of squared errors for m times and choose the model
with the smallest out-of-sample sum of squared residuals.
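The four steps above can be sketched with NumPy for a single candidate model (a simple OLS fit on simulated data; the DGP and fold count are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 120, 5
x = rng.standard_normal(n)
y = 2.0 * x + rng.standard_normal(n)           # hypothetical DGP

idx = rng.permutation(n)
blocks = np.array_split(idx, m)                # m equal-sized blocks

sse = 0.0
used = []
for v in range(m):
    val = blocks[v]                            # validation block
    train = np.concatenate([blocks[b] for b in range(m) if b != v])
    # estimate OLS (with intercept) on the m-1 training blocks
    A = np.column_stack([np.ones(train.size), x[train]])
    beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
    pred = beta[0] + beta[1] * x[val]
    sse += np.sum((y[val] - pred) ** 2)        # out-of-sample SSE
    used.append(val)

used = np.concatenate(used)                    # each block validated once
print(sse)
```

Repeating this for every candidate specification and picking the smallest out-of-sample SSE completes step 4.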
95-156
8. Plot of Residuals and Outliers
Outlier
Outliers are values that, if removed from the sample, produce large
changes in the estimated coefficients.
Cook’s distance measures the sensitivity of the fitted values in a
regression to dropping a single observation j. It is defined as:
Dj = Σᵢ₌₁ⁿ (Ŷᵢ^(−j) − Ŷᵢ)² / (K·S²)
Ŷᵢ^(−j) is the fitted value of 𝑌𝑖 when observation j is dropped and the
model is estimated on the remaining n−1 observations.
K is the number of coefficients in the regression model.
S 2 is the estimate of error variance from the model using all
observations.
A Dj larger than 1 indicates that observation j has a large impact on the
estimated model parameters.
96-156
9. Question 1
1. A regression of a stock’s return (in percent) on an industry index’s return (in
percent) provides the following results:
Coefficient Standard Error
Intercept 2.1 2.01
Industry index 1.9 0.31
Degrees of Freedom SS
Explained 1 92.648
Residual 3 24.512
Total 4 117.160
Which of the following statements regarding the regression is incorrect?
A. The correlation coefficient between the X and Y variables is 0.889.
B. The industry index coefficient is significant at the 99% confidence interval.
C. If the return on the industry index is 4%, the stock’s expected return is 9.7%.
D. The variability of industry returns explains 21% of the variation of company
returns.
97-156
9. Question 1
Answer: D
r2 = R2 = 92.648/117.160 = 79%, so the variability of industry returns
explains 79% of the variation of company returns (not 21%).
t-stat (industry index) = 1.9/0.31 = 6.13, so the coefficient of
industry index is significant.
R = 2.1% + 1.9 × R(industry index) = 2.1% + 1.9 × 4 % = 9.7%
98-156
Time Series
99-156
1. Introduction to time series analysis
A time series is a set of observations on a variable’s outcomes in different
time periods, for example a set of monthly U.S. retail sales data.
100-156
1. Introduction to time series analysis
A time series can be decomposed into three distinct components
① The trend, which captures the changes in the level of the time series
over time.
② The seasonal component, which captures predictable changes in the
time series according to the time of year.
③ The cyclical component, which captures the cycles in the data.
101-156
2. Non-Stationary Time Series
Covariance-stationary time series have means, variances, and
autocovariances that do not depend on time. Any time series that is not
covariance-stationary is non-stationary.
Sources of non-stationarity
The three most pervasive sources of non-stationarity in time series are:
Time trends
Seasonalities
Unit roots (random walks)
102-156
2. Non-Stationary Time Series——Trend
Polynomial Trends
Linear time trend
Yt = δ0 + δ1 t + εt εt ~WN 0, σ2
This process is non-stationary because the mean depends on time :
𝐸 𝑌𝑡 = δ0 + δ1 t
Nonlinear Trends
The linear time trend model generalizes to a polynomial time
trend model by including higher powers of time :
Yt = δ0 + δ1 t + δ2 𝑡 2 + ⋯ + δm 𝑡 𝑚 + εt
Log-linear trend
Constant growth rates are more plausible than constant absolute growth
for most financial variables. These growth rates can be examined using a
log-linear model that relates the natural log of 𝑌𝑡 to a linear trend:
𝑙𝑛𝑌𝑡 = δ0 + δ1 t + εt
The constant growth rate of 𝑌𝑡 is δ1.
103-156
2. Non-Stationary Time Series——Seasonalities
Seasonalities induce non-stationary behavior in time series by relating the
mean of the process to the month or quarter of the year.
Seasonal Dummy Variables
The seasonality repeats each s periods and can be directly modeled as:
𝑌𝑡 = 𝛿 + 𝛾1 𝑙1𝑡 + 𝛾2 𝑙2𝑡 + ⋯ + 𝛾𝑠−1 𝑙𝑠−1𝑡 + 𝜀𝑡
𝑙𝑖𝑡 = 1 or 0
Note that dummy variable 𝑙𝑠𝑡 is omitted to avoid what is known as
the dummy variable trap, which occurs when dummy variables are
multicollinear.
Illustrates the use of dummy variables
𝑌𝑡 = 𝛿 + 𝛾1 𝐽𝑎𝑛𝑡 + 𝛾2 𝐹𝑒𝑏𝑡 + ⋯ + 𝛾11 𝑁𝑜𝑣𝑡 + 𝜀𝑡
𝑌𝑡 = 𝑎 𝑚𝑜𝑛𝑡ℎ𝑙𝑦 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 𝑜𝑓 𝑟𝑒𝑡𝑢𝑟𝑛𝑠
𝐽𝑎𝑛𝑡 = 1 if period t is in January, 𝐽𝑎𝑛𝑡 = 0 otherwise
···
𝑁𝑜𝑣𝑡 = 𝑂𝑐𝑡𝑡 = ··· = 𝐽𝑎𝑛𝑡 = 0 if period t is in December.
104-156
2. Non-Stationary Time Series——Random Walk
Random Walks
105-156
2. Non-Stationary Time Series——Random Walk
What does a random walk look like?
[Figure omitted: simulated random walk paths]
106-156
2. Non-Stationary Time Series——Random Walk
Unit Roots
Example: if the characteristic roots of a process are 1 and 0.8, the root
equal to 1 indicates the existence of a unit root.
107-156
2. Non-Stationary Time Series——Random Walk
Consider the situation
If 𝑌𝑡 is a random walk, then:
𝑌𝑡 = 𝑌𝑡−1 + 𝜀𝑡
𝑌𝑡 − 𝑌𝑡−1 = 𝑌𝑡−1 − 𝑌𝑡−1 + 𝜀𝑡
Δ𝑌𝑡 = 0 × 𝑌𝑡−1 + 𝜀𝑡
The value of γ is 0 when the process is a random walk.
Testing for Unit Roots: Augmented Dickey-Fuller (ADF) Test
Step 1: form the correct model
Δ𝑌𝑡 = 𝛾𝑌𝑡−1 + 𝜀𝑡
Step 2: State the Null H0 : γ = 0(Yt is a random walk) , H1 : γ < 0(Yt is
covariance-stationary)
Step 3: Perform the regular t test and draw a conclusion
A unit root can only be removed by differencing.
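The three steps can be sketched numerically with NumPy on simulated data. Note the assumptions here: the series is a hypothetical stationary AR(1), and −1.95 is the approximate 5% Dickey-Fuller critical value for the no-constant specification (the slide does not give critical values, so this cutoff is supplied for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
# simulate a stationary AR(1): Y_t = 0.5 Y_{t-1} + eps_t
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + rng.standard_normal()

# Step 1: form the model  dY_t = gamma * Y_{t-1} + eps_t
dy, ylag = np.diff(y), y[:-1]
gamma = (ylag @ dy) / (ylag @ ylag)            # OLS slope, no constant
resid = dy - gamma * ylag
se = np.sqrt(resid @ resid / (len(dy) - 1) / (ylag @ ylag))
# Steps 2-3: t-test of H0: gamma = 0 against H1: gamma < 0
t_stat = gamma / se
print(round(t_stat, 2))   # strongly negative: reject H0, no unit root
```

For a true random walk the estimated γ would be close to 0 and the t-statistic would not fall below the critical value.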
108-156
3. Stationary Time Series
Stationary Time Series
In a stationary time series, the trend and seasonality components have
been eliminated; only the cyclical component remains.
Cyclical component is modeled by both the shocks to the process
and the memory (i.e., persistence) of the process.
109-156
3. Stationary Time Series——Covariance Stationarity
Why is stationarity so important?
The ability of a model to forecast a time series depends crucially on
whether the past resembles the future.
If the past data can’t explain the data in the future, the time series
models are useless.
Stationarity is a key concept that formalizes the structure of a time series
and justifies the use of historical data to build models.
If the underlying probabilistic structure of the series were changing
over time(not stationary), there would be no way to predict the
future accurately on the basis of the past.
Covariance stationarity is the criterion used to assess whether a time series is stationary.
Covariance stationarity depends on the first two moments of a time
series: the mean and the autocovariances.
110-156
3. Stationary Time Series——Covariance Stationarity
Covariance Stationary: A time series is covariance stationary if its first two
moments satisfy three key properties.
The mean is constant and does not change over time.
E Yt = μ for all t
The autocovariance is finite, does not change over time, and depends
only on the distance h between observations:
𝐶𝑜𝑣(Yt , Yt−ℎ) = 𝛾ℎ for all t
The variance is finite and does not change over time (the special
case where h = 0):
V(Yt) = 𝛾0 < ∞ for all t
111-156
3. Stationary Time Series——Covariance Stationarity
Autocovariance
Autocovariance is defined as the covariance between a stationary time
series at different points in time.
The ℎ𝑡ℎ autocovariance is defined as:
γt,h = E[(Yt − E[Yt])(Yt−h − E[Yt−h])]
When h = 0, γt,0 is the variance of Yt:
γt,0 = E[(Yt − E[Yt])²]
The autocovariance function is a function of h that returns the
autocovariance between Yt and Yt−h
𝛾 ℎ =γℎ
The symmetry follows from the property that the autocovariance
depends only on h:
Cov Yt , Yt−h = Cov Yt+h , Yt
112-156
3. Stationary Time Series——Covariance Stationarity
Autocorrelation function(ACF)
The autocorrelations are just the “simple” or “regular” correlations
between 𝑌𝑡 and 𝑌t−ℎ , so the function can be written as follow:
𝜌(h) = cov(𝑌𝑡 , 𝑌𝑡−ℎ) / √(var(𝑌𝑡) var(𝑌𝑡−ℎ)) = 𝛾(ℎ)/√(𝛾(0)𝛾(0)) = 𝛾(ℎ)/𝛾(0)
Partial Autocorrelation function(PACF)
The partial autocorrelation function measures the relationship between
𝑌t and 𝑌t−ℎ after removing the effects of 𝑌t−1 , ⋯ , 𝑌t−ℎ+1.
In other words, the partial autocorrelation function measures the
relationship only between 𝐘𝐭 and 𝐘𝐭−𝐡.
Given that
Yt = b0 + b1 Yt−1 + b2 Yt−2 + ⋯ + bh Yt−h + εt
α(h) = bℎ
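The sample ACF can be computed directly from the autocovariance definition (a sketch on a simulated AR(1) with φ = 0.6, whose theoretical ACF is ρ(h) = 0.6ʰ; the parameter choice is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n, phi = 100_000, 0.6
# simulate a stationary AR(1): Y_t = 0.6 Y_{t-1} + eps_t
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.standard_normal()

def acf(y, h):
    """Sample autocorrelation rho(h) = gamma(h) / gamma(0)."""
    d = y - y.mean()
    g0 = d @ d / len(y)                      # gamma(0) = variance
    gh = d[h:] @ d[:-h] / len(y) if h else g0
    return gh / g0

print([round(acf(y, h), 2) for h in range(4)])
```

The estimates come out close to 1, 0.6, 0.36, 0.22, the gradual geometric decay expected of an AR process.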
113-156
3. Stationary Time Series——White Noise
White noise
White noise is the fundamental building block used to model shocks in
any time-series model; all of the stationary time series models below
are built on a white noise process.
𝜀𝑡 ~𝑊𝑁 0, 𝜎 2
A white noise process is covariance-stationary because of three properties
Mean zero
Constant and finite variance, 𝜎 2
No autocorrelation or autocovariance, 𝛾 ℎ = 𝜌 ℎ = 0 𝑓𝑜𝑟 ℎ ≠ 0
Independent white noise
If 𝜀 is serially independent, then we say that 𝜀 is independent white noise.
𝜀𝑡 ~ 𝑖𝑖𝑑(0, 𝜎²)
Note that white noise itself has no autocorrelation but is not
necessarily independent.
Gaussian white noise(normal white noise)
A special case of iid noise, where 𝜀𝑡 ~ 𝑖𝑖𝑑 𝑁(0, 𝜎²)
114-156
3. Stationary Time Series——AR, MA and ARMA
There are three famous models to model stationary time series process,
each built on white noise process.
① AR(autoregressive model)
② MA(moving average model)
③ ARMA(autoregressive & moving average model)
115-156
3. Stationary Time Series——AR, MA and ARMA
AR(p) Model
Yt = δ + ϕ1 Yt−1 + ϕ2 Yt−2 +. . . +ϕp Yt−p + εt ; εt ~WN 0, σ2
How to determine whether an AR(p) model is stationary?
Step 1, find the lag polynomial in AR(P) model
The lag operator L shifts the time index of an observation, so that
LYt = Yt−1 ; LP Yt = Yt−P
The AR(p) model can be written as
1 − ϕ1 L − ϕ2 L2 −. . . −ϕp Lp Yt = δ + εt
Lag polynomial: Φ L = 1 − ϕ1 L − ϕ2 L2 −. . . −ϕp Lp
116-156
3. Stationary Time Series——AR, MA and ARMA
How to determine whether an AR(p) model is stationary?
Step 2, let z=1/L, rewrite the lag polynomial to get characteristic
equation:
Φ z = z p − ϕ1 z p−1 − ϕ2 z p−2 −. . . −ϕp
Step 3, make Φ z = 0 and solve for the characteristic roots of the
function.
Step 4, if the roots of the function are all less than 1 in absolute value,
the AR(p) model is stationary.
The long-run mean of AR(p) is
E[Yt] = μ = δ/(1 − ϕ1 − ϕ2 − ⋯ − ϕ𝑝)
117-156
3. Stationary Time Series——AR, MA and ARMA
Test the stationarity of an AR(2) model.
Yt = 1.4 × Yt−1 − 0.45 × Yt−2 +εt
Correct answer
Step 1, find the lag polynomial in AR(2) model
Φ L = 1 − 1.4L + 0.45L2
Step 2, let z=1/L, rewrite the lag polynomial to get characteristic
equation
Φ(z) = z² − 1.4z + 0.45
Step 3, make Φ z = 0 and solve for the characteristic roots of the
function
z1=0.9 and z2=0.5
Step 4, the roots of the function are all less than 1 in absolute value, the
AR(2) model is stationary.
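Steps 3 and 4 can be verified numerically by solving the characteristic equation with NumPy:

```python
import numpy as np

# characteristic equation of Y_t = 1.4 Y_{t-1} - 0.45 Y_{t-2} + eps_t:
#   z**2 - 1.4*z + 0.45 = 0
roots = np.roots([1.0, -1.4, 0.45]).real
stationary = bool(np.all(np.abs(roots) < 1))
print(np.sort(roots), stationary)   # roots 0.5 and 0.9, both inside the unit circle
```

Both roots are less than 1 in absolute value, confirming that the AR(2) model is stationary.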
118-156
3. Stationary Time Series——AR, MA and ARMA
② MA Model: The current value of the observed series is expressed as a
function of current and lagged unobservable shocks.
MA(1) Model
𝑌𝑡 = 𝜇 + 𝜀𝑡 + 𝜃𝜀𝑡−1 = μ + (1 + 𝜃𝐿)𝜀𝑡 ; 𝜀𝑡 ~𝑊𝑁(0, 𝜎²)
μ is the long-run mean; θ is the MA parameter.
θ acts as a weight and determines the strength of the effect of the
previous shock.
MA(q) Model
𝑦𝑡 = 𝜇 + 𝜀𝑡 + 𝜃𝟏 𝜀𝑡−1 +. . . +𝜃𝑞 𝜀𝑡−𝑞 ; 𝜀𝑡 ~𝑊𝑁 0, 𝜎 2
119-156
3. Stationary Time Series——AR, MA and ARMA
Characteristic of MA(1)
Moving averages are always covariance-stationary.
The property of coefficient,𝜃
If the coefficient 𝜃 > 0 , then MA(1) process is persistent because
two consecutive values are positively correlated.
If the coefficient 𝜃 < 0 , then the MA(1) process mean-reverts
aggressively, because the effect of the previous shock is reversed in
the current period.
Difference between MA(1) and AR(1)
The structure of the MA(1) process, in which only the first lag of the
shock appears on the right, forces it to have a very short memory, and
hence weak dynamics, regardless of the parameter value.
An AR(1) process, by contrast, has a much longer memory because the
lagged values of the series carry information from all previous periods.
120-156
3. Stationary Time Series——AR, MA and ARMA
For MA(1) Model
E(𝑌𝑡 ) = 𝜇
V(𝑌𝑡 ) = (1 + 𝜃 2 )𝜎 2
𝜌(ℎ) = 1, ℎ = 0
𝜌(ℎ) = 𝜃/(1 + 𝜃²), ℎ = 1
𝜌(ℎ) = 0, ℎ ≥ 2
121-156
3. Stationary Time Series——AR, MA and ARMA
③ ARMA Model: Autoregressive Moving Average (ARMA) processes combine AR
and MA processes.
ARMA(1,1) model
𝑌𝑡 = δ + ϕ1 𝑌𝑡−1 + 𝜃𝜀𝑡−1 + 𝜀𝑡
𝜀𝑡 ~𝑊𝑁 0, 𝜎 2
The mean of ARMA(1,1) is
E[Yt] = μ = δ/(1 − ϕ1)
The variance of ARMA(1,1) is
V[Yt] = 𝛾0 = 𝜎²(1 + 2ϕ1𝜃 + 𝜃²)/(1 − ϕ1²)
ARMA(p,q) models are often both highly accurate and highly parsimonious.
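The two moment formulas can be checked against a long simulated ARMA(1,1) path (a sketch; the parameter values δ = 1, φ₁ = 0.5, θ = 0.4, σ = 1 are hypothetical):

```python
import numpy as np

delta, phi, theta, sigma = 1.0, 0.5, 0.4, 1.0
mu = delta / (1 - phi)                                    # theoretical mean
var = sigma**2 * (1 + 2 * phi * theta + theta**2) / (1 - phi**2)

rng = np.random.default_rng(11)
n = 200_000
eps = sigma * rng.standard_normal(n + 1)
y = np.empty(n)
prev = mu                                                 # start at the mean
for t in range(n):
    y[t] = delta + phi * prev + eps[t + 1] + theta * eps[t]
    prev = y[t]

print(round(mu, 3), round(y.mean(), 3))   # theoretical vs. sample mean
print(round(var, 3), round(y.var(), 3))   # theoretical vs. sample variance
```

With these parameters the theoretical mean is 2.0 and the theoretical variance is 2.08, and the sample moments land close to both.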
123-156
3. Stationary Time Series——AR, MA and ARMA
Gradual decay of an ACF or PACF
[Figure: sample ACF/PACF plots decaying gradually toward zero over many lags]
Sharp cutoff of an ACF or PACF
[Figure: sample ACF/PACF plot cutting off sharply after a few lags]
124-156
4. Hypothesis test for autocorrelation
Hypothesis test for autocorrelation
If an ARMA model applies, the ACF must gradually decay to 0.
Box-Pierce Statistic & Ljung-Box Statistic
The joint null hypothesis is H0 : ρ1 = ρ2 = ⋯ = ρℎ = 0
Box-Pierce Statistic and Ljung-Box Statistic (follow a χ2 distribution)
𝑄𝐵𝑃 = 𝑇 Σᵢ₌₁ʰ 𝜌ᵢ² ;  𝑄𝐿𝐵 = 𝑇 Σᵢ₌₁ʰ [(𝑇 + 2)/(𝑇 − 𝑖)] 𝜌ᵢ²
Reject the null when the Q-statistic is large (look up the critical value in the χ2 table).
Ljung-Box statistic is a version of Box-Pierce statistic and works
better in smaller samples.
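Both Q-statistics can be computed straight from the formulas (a sketch on simulated white noise; T = 500 and h = 10 are hypothetical choices):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
T, h = 500, 10
x = rng.standard_normal(T)                 # white noise under H0

# sample autocorrelations rho_1, ..., rho_h
d = x - x.mean()
g0 = d @ d / T
rho = np.array([(d[i:] @ d[:-i] / T) / g0 for i in range(1, h + 1)])

q_bp = T * np.sum(rho**2)                                  # Box-Pierce
q_lb = T * np.sum((T + 2) / (T - np.arange(1, h + 1)) * rho**2)  # Ljung-Box
crit = chi2.ppf(0.95, df=h)                # reject H0 when Q > crit
print(round(q_bp, 2), round(q_lb, 2), round(crit, 2))
```

Since each weight (T + 2)/(T − i) exceeds 1, the Ljung-Box statistic is always at least as large as the Box-Pierce statistic; the small-sample correction is what makes it preferable for shorter series.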
125-156
Measuring Returns,
Volatility and
Correlation
126-156
1. Simple Returns
Simple Returns (R t )
The return on an asset bought at time t−1 and sold at time t can
be expressed as the percentage change in the market variable between
the end of day t−1 and the end of day t.
Pt − Pt−1
Rt =
Pt−1
The return of an asset over multiple periods is the product of the simple
returns in each period :
1 + R_T = ∏ₜ₌₁ᵀ (1 + R_t)
127-156
2. Continuously Compounded Returns
Continuously Compounded Returns (rt )
Continuously compound returns is also known as log returns. These are
computed as the difference of the natural logarithm of the price :
rt = lnPt − lnPt−1
The relationship between simple and log return
1 + R t = 𝑒 rt
The main advantage of log returns is that the total return over multiple
periods is just the sum of the single period log return:
r_T = Σₜ₌₁ᵀ r_t
However, the accuracy of the log-return approximation to the simple
return is poor when the simple return is large.
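The multi-period identities for both return definitions can be checked on a small hypothetical price path:

```python
import numpy as np

# hypothetical daily prices
p = np.array([100.0, 110.0, 99.0, 104.0])

simple = p[1:] / p[:-1] - 1          # R_t = (P_t - P_{t-1}) / P_{t-1}
logret = np.diff(np.log(p))          # r_t = ln P_t - ln P_{t-1}

total_simple = np.prod(1 + simple) - 1      # multi-period: product
total_log = logret.sum()                    # multi-period: sum
print(round(total_simple, 4))               # 0.04  (= 104/100 - 1)
print(round(np.exp(total_log) - 1, 4))      # 0.04  (same, via 1 + R = e^r)
```

Both routes recover the same 4% total return, illustrating the relationship 1 + R_t = e^{r_t}.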
128-156
3. Measuring Volatility and Risk
Measuring Volatility
A volatility of a financial asset is usually measured by the standard
deviation of its returns.
The measure of volatility scales with the square root of the holding
period. When volatility is measured daily, it is common to convert
daily volatility to annualized volatility by scaling by √252.
σannual = √252 × σdaily
The variance (also called the variance rate) of returns is estimated using
the standard estimator:
σ̂² = (1/T) Σₜ₌₁ᵀ (rₜ − μ̂)²
129-156
4.The Distribution of Financial Returns
Non-Normal Distributions
A normal distribution is symmetric and thin-tailed, and so has no
skewness or excess kurtosis.
However, many return series are both skewed and fat-tailed.
E.g., equity prices tend to decline in stress periods, while gold serves
as a flight-to-safety asset and so moves in the opposite direction.
Two methods to test normality of a distribution
① The Jarque-Bera Test
② Power Laws
130-156
4.The Distribution of Financial Returns
① The Jarque-Bera Test
It is used to formally test whether the sample skewness and kurtosis are
compatible with an assumption that the returns are normally distributed.
H0 : Skewness = 0 and Kurtosis = 3
The test statistic is
JB = (T − 1)[S²/6 + (K − 3)²/24]
where T is the sample size, S is the sample skewness, K is the sample
kurtosis, and JB follows a χ2 distribution with 2 degrees of freedom.
The critical values are 5.99 for a test size of 5% and 9.21 for a
test size of 1%.
When test statistic is above these values, the null that the data are
normally distributed is rejected.
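A sketch of the statistic on simulated data, contrasting a thin-tailed normal sample with a fat-tailed Student's t sample (the sample size and t distribution's 3 degrees of freedom are hypothetical choices):

```python
import numpy as np
from scipy.stats import skew, kurtosis

def jarque_bera(x):
    """JB = (T-1) * (S**2/6 + (K-3)**2/24), compared against chi2(2)."""
    T = len(x)
    s = skew(x)
    k = kurtosis(x, fisher=False)        # raw kurtosis, 3 under normality
    return (T - 1) * (s**2 / 6 + (k - 3)**2 / 24)

rng = np.random.default_rng(9)
jb_norm = jarque_bera(rng.standard_normal(1000))   # normal draws
jb_fat = jarque_bera(rng.standard_t(3, 1000))      # fat-tailed t(3) draws
print(round(jb_norm, 2), round(jb_fat, 2))
```

The fat-tailed sample produces a far larger JB statistic, so its normality null is rejected while the normal sample's statistic typically stays below the critical values.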
131-156
4.The Distribution of Financial Returns
② Power Laws
Normal random variables have thin tails, so the probability of a
return larger than Kσ declines rapidly as K increases, whereas many other
distributions have tails that decline less quickly for large deviations.
The simplest method to compare tail behavior is to examine the
natural log of the tail probability, lnP(X > 𝑥), which measures the
chance of observing an extreme tail value.
132-156
5. Correlation Versus Dependence
Dependence
Two random variables, X and Y are independent if their joint density is
equal to the product of their marginal densities:
fX,Y x, y = fX x × fY y
Financial assets are highly dependent and exhibit both linear and
nonlinear dependence.
The linear correlation estimator (Pearson’s correlation) measures
linear dependence.
ρXY = Cov(X, Y)/(σX σY)
Nonlinear dependence takes many forms and cannot be
summarized by a single statistic.
Rank correlation (also known as Spearman's correlation)
Kendall’s 𝝉
133-156
6. Rank Correlation——Spearman’s Correlation
Performance of a Portfolio with Two Assets
[Table omitted: five paired returns of assets X and Y]
134-156
6. Rank Correlation——Spearman’s Correlation
Order the return set pairs of X and Y with respect to the set X.
Then derive the ranks of Xi and Yi.
Derive the difference of the ranks and square the difference.
135-156
6. Rank Correlation——Spearman’s Correlation
Spearman correlation coefficient
𝜌𝑠 = 1 − 6 Σᵢ₌₁ⁿ 𝑑ᵢ² / [𝑛(𝑛² − 1)]
We derive:
𝜌𝑆 = 1 − (6 × 36)/(5 × (5² − 1)) = −0.8
136-156
6. Rank Correlation——Kendall’s Ƭ
A concordant pair is a pair of observations, each on two variables, {X1, Y1}
and {X2, Y2}, having the property that (X2 − X1)(Y2 − Y1) > 0; a
discordant pair has (X2 − X1)(Y2 − Y1) < 0.
137-156
6. Rank Correlation——Kendall’s Ƭ
Recall the previous example
2 concordant pairs: {(1,4),(2,5)}, {(4,1),(5,2)}
8 discordant pairs: {(1,4),(4,1)}, {(1,4),(5,2)}, {(2,5),(4,1)}, {(2,5),(5,2)},
{(1,4),(3,3)}, {(2,5),(3,3)}, {(3,3),(4,1)}, {(3,3),(5,2)}
0 neither concordant nor discordant pairs
138-156
6. Rank Correlation——Kendall’s Ƭ
Kendall’s Ƭ:
τ = (nc − nd) / [n(n − 1)/2]
nc: the number of concordant data pairs.
nd: the number of discordant pairs.
We derive:
τ = (2 − 8) / [5 × (5 − 1)/2] = −0.6
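Both rank correlations can be verified with SciPy using the rank pairs from the example, (1,4), (2,5), (3,3), (4,1), (5,2):

```python
from scipy.stats import spearmanr, kendalltau

x = [1, 2, 3, 4, 5]   # ranks of asset X
y = [4, 5, 3, 1, 2]   # ranks of asset Y

rho_s, _ = spearmanr(x, y)   # Spearman's rank correlation
tau, _ = kendalltau(x, y)    # Kendall's tau
print(rho_s, tau)            # -0.8 and -0.6
```

Both library results match the hand calculations above.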
139-156
6. Rank Correlation——Kendall’s Ƭ
The problem with applying ordinal rank correlations to cardinal
observations is that ordinal correlations are less sensitive to outliers.
Suppose we double the outliers of the returns of asset X.
That results in a change of the Pearson correlation coefficient from
−0.7402 to −0.6108, while the rank correlations are unchanged because
the ranks themselves do not change.
A special problem with Kendall's τ arises when many pairs are neither
concordant nor discordant, since these pairs are omitted from the calculation.
If, of the 10 observation pairs, four are neither concordant nor discordant,
just six pairs are left to be evaluated.
140-156
7. The Relationship between Correlations
Relationship between Pearson's correlation (𝝆), Spearman's rank correlation (𝝆𝑺 ),
and Kendall's 𝝉 for a bivariate normal distribution.
The rank correlation is virtually identical to the linear correlation for normal
random variables.
Kendall's 𝜏 is increasing in 𝜌 , although the relationship is not linear.
Kendall's 𝜏 produces values much closer to zero than the linear correlation for
most of the range of 𝜌.
Estimates of 𝜏 with the same sign as the sample correlation, but that are smaller
in magnitude, do not provide evidence of nonlinearity.
141-156
Simulation and
Bootstrapping
142-156
1. Monte Carlo Simulation
Monte Carlo: Monte Carlo was derived from the name of a famous casino
established in 1862 in the south of France (actually, in Monaco).
143-156
1. Monte Carlo Simulation
Improving the Accuracy of Simulations
① Increasing the number of repetitions, N.
The standard method of estimating an expected value relies on the
LLN and CLT. In that case, the standard error of the estimated
expected value is proportional to 1/√𝑁.
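The 1/√N scaling can be demonstrated empirically (a sketch that estimates E[X²] = 1 for X ~ N(0,1); the repetition counts are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

def mc_estimates(n_draws, reps=300):
    """Repeated Monte Carlo estimates of E[X^2] for X ~ N(0,1)."""
    return np.array([np.mean(rng.standard_normal(n_draws) ** 2)
                     for _ in range(reps)])

se_small = mc_estimates(100).std()       # standard error with N = 100
se_big = mc_estimates(10_000).std()      # standard error with N = 10,000
print(round(se_small / se_big, 1))       # near sqrt(10000/100) = 10
```

Multiplying N by 100 shrinks the standard error by roughly a factor of 10, as 1/√N predicts.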
144-156
1. Monte Carlo Simulation
Improving the Accuracy of Simulations
② Antithetic Variables
The antithetic variables add a second set of random variables that
are constructed to have a negative correlation with the iid
variables used in the simulation.
③ Control Variates
The control variates help to reduce the Monte Carlo variation owing
to particular sets of random draws by using the same draws on a
related problem whose solution is known.
145-156
1. Monte Carlo Simulation
Limitation of Simulations
The biggest challenge when using simulation to approximate moments
is the specification of the DGP.
If the DGP does not adequately describe the observed data, then
the approximation of the moment may be unreliable, so specific
assumptions about the observed data must be made.
The other important consideration is the computational cost.
146-156
2. Bootstrapping
Bootstrapping directly uses the observed data to simulate a sample that
has similar characteristics.
Specifically, it does this without directly modeling the observed data or
making specific assumptions about their distribution.
Two types of bootstrapping
i.i.d. bootstrap
Circular block bootstrap(CBB)
147-156
2. Bootstrapping
Example: Bootstrapping is used to simulate data from a data set of {X1,X2,X3……X100}
If the data set is iid, then iid bootstrap is applied
Define simulation sample size m=5
Random sampling with replacement
Sample 1: {X1,X2,X34,X56,X99}
Sample 2: {X3,X5,X44,X56,X67}
Sample 3: {X7,X8,X31,X67,X68}
If the data set is not iid, then Circular block bootstrap(CBB) is applied
Define block size q=5 (rule-of-thumb, q = √𝑁)
Samples blocks of size q with replacement
Sample 1: {X1,X2,X3,X4,X5}
Sample 2: {X2,X3,X4,X5,X6}
Sample 3: {X3,X4,X5,X6,X7}
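The two sampling schemes can be sketched with NumPy (hypothetical data standing in for {X1, …, X100}; for brevity a single circular block is drawn, whereas a full CBB resample would concatenate enough blocks to reach the original sample size):

```python
import numpy as np

rng = np.random.default_rng(4)
data = np.arange(1, 101)            # stands in for {X1, ..., X100}
N = len(data)

# i.i.d. bootstrap: sample m observations with replacement
m = 5
iid_sample = rng.choice(data, size=m, replace=True)

# circular block bootstrap: draw one contiguous block of size q,
# wrapping around the end of the sample (rule of thumb q ~ sqrt(N))
q = int(np.sqrt(N))                 # = 10
start = rng.integers(N)             # random block starting point
cbb_sample = data[(start + np.arange(q)) % N]

print(iid_sample)
print(cbb_sample)
```

The iid bootstrap scrambles the ordering entirely, while the circular block preserves the within-block time dependence that non-iid data require.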
148-156
The Standard Normal Distribution
149-156
The Standard Normal Distribution
150-156
Student’s t distribution Table
151-156
Student’s t distribution Table
152-156
Chi-Square Distribution Table
153-156
F-Distribution Table
154-156
It’s not the end but just the beginning.
When you feel it is too late, it is actually the earliest time to start.
155-156
Feedback
If you believe there is an error in a GFEDU course handout, question bank,
video, or other material, please let us know. We will verify every
submission and reply as quickly as possible.
How do you reach us?
Send the issue you found to us by email, including:
Your name or online-school account
Your class (e.g., 2005 FRM Part I long-term program)
The subject of the issue (if unknown, the chapter and knowledge point) and the page number
A detailed description of the issue and your view of it
Please email: academic.support@gfedu.net
Thank you very much for supporting GFEDU; every piece of feedback helps us
improve. We will also open other feedback channels (such as WeChat) in the
future.
156-156