“There is no way to predict whether the price of stocks and bonds will go up or down
over the next few days or weeks. But it is quite possible to foresee the broad course of
the prices of these assets over longer time periods, such as the next three to five years...”
2013 Nobel Prize Committee
Chapter 1 Empirical finance
the linear combination, there is an error vector. The error vector has mean zero and a
variance-covariance matrix given by
$$Var(\varepsilon) = E(\varepsilon\varepsilon') = \sigma^2 I_n = \begin{bmatrix} \sigma^2 & 0 & \cdots & \cdots & 0 \\ 0 & \sigma^2 & 0 & \cdots & \vdots \\ \vdots & 0 & \sigma^2 & 0 & \vdots \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & \cdots & 0 & \sigma^2 \end{bmatrix}$$
Note: the first column of the X matrix could just be a column of ones. This is the case
when there is an intercept in the regression (β1 would therefore be the intercept).
Given the assumptions, it is easy to show that $E(Y) = X\beta$ and $Var(Y) = \sigma^2 I_n$. In
fact,
$$\hat\beta = \arg\min_{\beta} \sum_{i=1}^{n}(y_i - x_i'\beta)^2 = \arg\min_{\beta}(Y - X\beta)'(Y - X\beta) = (X'X)^{-1}X'Y$$
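As a concrete illustration of the closed-form solution $\hat\beta = (X'X)^{-1}X'Y$, here is a minimal pure-Python sketch for the intercept-plus-one-regressor case (the function name and data are hypothetical, chosen so that the line $y = 1 + 2x$ is recovered exactly):

```python
# Illustrative sketch (not from the text): OLS via the normal equations
# beta_hat = (X'X)^{-1} X'Y for X = [1 | x], using a hand-coded 2x2 inverse.

def ols_2var(x, y):
    """OLS estimates (b1, b2) for y_i = b1 + b2 * x_i."""
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sy = sum(y)
    sxy = sum(v * w for v, w in zip(x, y))
    # X'X = [[n, sx], [sx, sxx]] and X'Y = [sy, sxy].
    det = n * sxx - sx * sx
    b1 = (sxx * sy - sx * sxy) / det
    b2 = (-sx * sy + n * sxy) / det
    return b1, b2

# Data generated exactly from y = 1 + 2x: OLS must recover (1, 2).
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
b1, b2 = ols_2var(x, y)
```

Inverting $X'X$ explicitly is fine for a 2 × 2 system; for larger $k$ one would solve the normal equations with a linear solver instead.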
In explicit form,
$$\hat\beta = \begin{bmatrix} \sum_{i=1}^{n} x_{1i}^2 & \sum_{i=1}^{n} x_{1i}x_{2i} & \cdots & \cdots \\ \sum_{i=1}^{n} x_{1i}x_{2i} & \sum_{i=1}^{n} x_{2i}^2 & \cdots & \cdots \\ \vdots & \vdots & \ddots & \vdots \\ \cdots & \cdots & \cdots & \sum_{i=1}^{n} x_{ki}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum_{i=1}^{n} x_{1i}y_i \\ \sum_{i=1}^{n} x_{2i}y_i \\ \vdots \\ \sum_{i=1}^{n} x_{ki}y_i \end{bmatrix}.$$
The least squares method chooses the vector $\beta$ by minimizing the squared differences
around the multivariate line $x'\beta$.
Proof. Define $C(\beta) = (Y - X\beta)'(Y - X\beta)$. Then,
$$\frac{\partial C(\beta)}{\partial \beta} = 0 \Rightarrow -2X'(Y - X\beta) = 0$$
and
$$\frac{\partial^2 C(\beta)}{\partial \beta \partial \beta'} = 2X'X > 0.$$
By setting the first derivative equal to zero, we obtain the OLS estimator $\hat\beta$. The second
derivative is positive. Hence, we confirm that we have a minimum.
Fitted values ($\hat{Y}$):
$$\hat{Y} = \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \hat{y}_3 \\ \vdots \\ \hat{y}_n \end{bmatrix} = \begin{bmatrix} \hat\beta_1 x_{11} + \hat\beta_2 x_{21} + \ldots + \hat\beta_k x_{k1} \\ \hat\beta_1 x_{12} + \hat\beta_2 x_{22} + \ldots + \hat\beta_k x_{k2} \\ \hat\beta_1 x_{13} + \hat\beta_2 x_{23} + \ldots + \hat\beta_k x_{k3} \\ \vdots \\ \hat\beta_1 x_{1n} + \hat\beta_2 x_{2n} + \ldots + \hat\beta_k x_{kn} \end{bmatrix} = X\hat\beta.$$
Residuals ($\hat\varepsilon$). These are the differences between the true $Y$ values and the fitted values
$\hat{Y}$:
$$\hat\varepsilon = Y - \hat{Y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{bmatrix} - \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \hat{y}_3 \\ \vdots \\ \hat{y}_n \end{bmatrix} = Y - X\hat\beta.$$
1.2 Properties
From the first-order conditions of the minimization problem we can write
$$X'(Y - X\hat\beta) = 0 \Rightarrow X'\hat\varepsilon = 0$$
This is like saying that the residuals are orthogonal to the $X$ observations! Since the
fitted values are linear combinations of the $X$ observations (see above), it also says that
the residuals are orthogonal to the fitted values. In other words,
$$X'\hat\varepsilon = 0$$
and
$$\hat{Y}'\hat\varepsilon = 0.$$
Important: Note that, if there is an intercept in the regression, then the first element in
$X'\hat\varepsilon$ becomes $\sum_{i=1}^{n}\hat\varepsilon_i = 0$. In other words, the sample mean of the residuals is zero, if there
is an intercept in the regression!
$$\hat{Y} = X\hat\beta = X(X'X)^{-1}X'Y = PY$$
and
$$\hat\varepsilon = Y - \hat{Y} = (I_n - X(X'X)^{-1}X')Y = MY,$$
where $M$ and $P$ are symmetric and idempotent matrices. They are symmetric, since
$M = M'$ and $P = P'$ (try showing it ...). They are idempotent since $M = MM$ and
$P = PP$ (try showing it ...).
Geometrically, $P$ is the matrix which projects $Y$ on the space spanned by the columns
of $X$ (recall, $\hat{Y}$ is a linear combination of the regressors). $M$ is a matrix which projects
$Y$ on the space orthogonal to the space spanned by the columns of $X$ (recall $\hat\varepsilon$ and $\hat{Y}$ are
orthogonal).
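The algebra of $P$ and $M$ can be verified numerically. The sketch below (illustrative data; the matrix helpers are hand-rolled to stay dependency-free) builds both matrices for a small $X$ and exposes the fitted values, the residuals, and $X'\hat\varepsilon$:

```python
# Numerical check (illustrative) that P = X(X'X)^{-1}X' and M = I - P are
# symmetric and idempotent, and that the residuals MY are orthogonal to X.

def t(a):                      # transpose of a list-of-lists matrix
    return [list(r) for r in zip(*a)]

def mm(a, b):                  # matrix multiplication
    bt = t(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def inv2(a):                   # inverse of a 2x2 matrix
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # intercept + one regressor
Y = [[2.0], [2.0], [4.0], [8.0]]
n = len(X)
I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

P = mm(mm(X, inv2(mm(t(X), X))), t(X))                  # projection matrix
M = [[I[i][j] - P[i][j] for j in range(n)] for i in range(n)]

Yhat = mm(P, Y)                # fitted values, a point in the column space of X
resid = mm(M, Y)               # residuals, orthogonal to that space
XtE = mm(t(X), resid)          # X'e, numerically zero
```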
$$X = [\underset{n\times k_1}{X_1} \,|\, \underset{n\times k_2}{X_2}],$$
where $k = k_1 + k_2$. We are simply separating the full matrix $X$ into two sub-matrices.
Then,
6
Chapter 1 Empirical finance
$$\hat\beta = (X'X)^{-1}X'Y = \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}^{-1} \begin{bmatrix} X_1'Y \\ X_2'Y \end{bmatrix}.$$
(1) If $X_1'X_2 = 0$, the off-diagonal blocks vanish, so that $\hat\beta_1 = (X_1'X_1)^{-1}X_1'Y$ and $\hat\beta_2 = (X_2'X_2)^{-1}X_2'Y$.
Thus, we can run two separate regressions (one on $X_1$ and one on $X_2$) to obtain the least
squares estimates.
(2) Assume $X_1'X_2 \neq 0$. Then, we can show that
$$\begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \end{bmatrix} = \begin{bmatrix} (X_1'M_2X_1)^{-1}X_1'M_2Y \\ (X_2'M_1X_2)^{-1}X_2'M_1Y \end{bmatrix},$$
where $M_2 = I_n - X_2(X_2'X_2)^{-1}X_2'$ and $M_1 = I_n - X_1(X_1'X_1)^{-1}X_1'$.
Intuition: Focus on $\hat\beta_1$. This is like running several regressions. First, regress the first
column of $X_1$ on $X_2$. Obtain the residuals. Now, regress the second column of $X_1$ on $X_2$.
Obtain the residuals. Keep going until you reach the last column of $X_1$. Collect the $k_1$
columns of residuals in a new matrix $\tilde{X}_1 = M_2X_1$. Regress $Y$ on the matrix of residuals
to obtain $\hat\beta_1$.
$$\hat\beta_1 = (\tilde{X}_1'\tilde{X}_1)^{-1}\tilde{X}_1'Y = (X_1'M_2X_1)^{-1}X_1'M_2Y.$$
In English, first you want to purge $X_1$ of the effect of $X_2$ and compute the component of
$X_1$ which is orthogonal to $X_2$ (the residual matrix), then you want to regress $Y$ on the
residual matrix.
Important: multiple regression does this automatically. You never really go through
these steps. If you know X2 and only care about β1 , just put X2 in your regression!
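This partialling-out recipe can be checked numerically. In the sketch below (made-up data; two regressors, no intercept, so each auxiliary regression has a closed form), the coefficient from regressing $Y$ on $\tilde{X}_1 = M_2X_1$ matches the full multiple-regression coefficient:

```python
# Illustrative check of partialling-out: beta_1 from the residual regression
# equals beta_1 from the full two-regressor (no intercept) regression.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 1.0, 2.0, 2.0]
y = [2 * a + 3 * b for a, b in zip(x1, x2)]   # true coefficients: (2, 3)

# Full regression: solve the 2x2 normal equations (X'X) beta = X'Y.
a11, a12, a22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
det = a11 * a22 - a12 ** 2
b1_full = (a22 * dot(x1, y) - a12 * dot(x2, y)) / det

# FWL-style steps: purge x1 of x2 (x1_tilde = M2 x1), then regress Y on it.
c = dot(x1, x2) / dot(x2, x2)                 # slope of x1 on x2
x1_tilde = [a - c * b for a, b in zip(x1, x2)]
b1_fwl = dot(x1_tilde, y) / dot(x1_tilde, x1_tilde)
```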
Proof.
" #
β1
Xβ = [X1 |X2 ] = X1 β1 + X2 β2
β2
= (M2 + P2 )X1 β1 + X2 β2
= M2 X1 β1 + P2 X1 β1 + X2 β2
= M2 X1 β1 + X2 (β2 + (X20 X2 )−1 X20 X1 β1 )
= M2 X1 β1 + X2 c
" #
β1
= [M2 X1 |X2 ] .
c
Notice that $\beta_1$ has not changed. However, the regressors (and the second set of parameters) have changed. The two blocks are now orthogonal! Hence, I can find $\beta_1$ by just
running a regression of $Y$ on $M_2X_1 = \tilde{X}_1$:
$$\hat\beta_1 = (\tilde{X}_1'\tilde{X}_1)^{-1}\tilde{X}_1'Y = (X_1'M_2'M_2X_1)^{-1}X_1'M_2'M_2Y = (X_1'M_2M_2X_1)^{-1}X_1'M_2M_2Y = (X_1'M_2X_1)^{-1}X_1'M_2Y,$$
by the symmetry and idempotency of $M_2$.
Write
$$\hat\beta = (X'X)^{-1}X'Y$$
$$\Rightarrow \hat\beta = (X'X)^{-1}X'(X\beta + \varepsilon) = (X'X)^{-1}(X'X)\beta + (X'X)^{-1}X'\varepsilon$$
$$\Rightarrow \hat\beta = \beta + (X'X)^{-1}X'\varepsilon.$$
Hence,
$$E(\hat\beta) = E(\beta + (X'X)^{-1}X'\varepsilon) = \beta + (X'X)^{-1}X'E(\varepsilon) = \beta.$$
The OLS estimator is unbiased. Interpret: whatever the true parameter $\beta$ is, if the
model is true (i.e., if $Y = X\beta + \varepsilon$), then the OLS estimator will deliver the right parameter
(on average). This said, there is some sampling variation around the expectation. Hence,
we need to talk about the variance of $\hat\beta$.
$$Var(\hat\beta) = E[(\hat\beta - E(\hat\beta))(\hat\beta - E(\hat\beta))'] = E[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}] = \sigma^2(X'X)^{-1}.$$
Interpret. The variance of $\hat\beta$ depends directly on the variance of the error terms $\sigma^2$ and
inversely on the "variability" of the $X$ observations, i.e., $\frac{X'X}{n}$. It also depends inversely
on the number of observations. Notice that, when the number of observations increases
without bound (i.e., when $n \to \infty$), the distribution of the $\hat\beta$ estimator becomes more and
more concentrated around the expected value $\beta$. We will return to this idea in Chapter
2.
For any other linear unbiased estimator $\tilde\beta = AY$ (the Gauss-Markov result),
$$Var(\tilde\beta) \geq Var(\hat\beta).$$
$$E(\tilde\beta) = E(AY) = E(AX\beta + A\varepsilon) = AX\beta + AE(\varepsilon) = AX\beta.$$
Unbiasedness therefore requires $AX = I_k$. Given $AX = I_k$,
$$Var(\tilde\beta) = E[(\tilde\beta - E(\tilde\beta))(\tilde\beta - E(\tilde\beta))'] = E(A\varepsilon\varepsilon'A') = \sigma^2AA'.$$
We want to show that
$$Var(\tilde\beta) \geq Var(\hat\beta),$$
i.e., that $AA' - (X'X)^{-1}$ is a positive semi-definite matrix.
Write
$$AA' - (X'X)^{-1} = AA' - AX(X'X)^{-1}X'A'$$
(using $AX = I_k$, so that $(X'X)^{-1} = AX(X'X)^{-1}(AX)' = AX(X'X)^{-1}X'A'$)
$$= A(I_n - X(X'X)^{-1}X')A' = AMA' = AMMA' = AMM'A'.$$
Now, for any $k \times 1$ vector $z$,
$$z'AM(AM)'z = \tilde{z}'\tilde{z} = \sum_{i=1}^{n}\tilde{z}_i^2 \geq 0,$$
where $\tilde{z} = (AM)'z$. Hence, $AA' - (X'X)^{-1}$ is positive semi-definite, as required.
2.2 Estimation of σ 2
We (almost) use the empirical variance of the estimated residuals:
$$\hat\sigma^2 = \frac{\sum_{i=1}^{n}\hat\varepsilon_i^2}{n-k} = \frac{\hat\varepsilon'\hat\varepsilon}{n-k}.$$
Statistical property: $\hat\sigma^2$ is unbiased for $\sigma^2$ (i.e., $E(\hat\sigma^2) = \sigma^2$).
Proof. Recall
$$\hat\varepsilon = MY$$
or
$$\hat\varepsilon = M(X\beta + \varepsilon) = M\varepsilon.$$
Thus,
$$E(\hat\sigma^2) = \frac{1}{n-k}E(\hat\varepsilon'\hat\varepsilon) = \frac{1}{n-k}E(\varepsilon'M'M\varepsilon)$$
$$= \frac{1}{n-k}E(\varepsilon'M\varepsilon) = \frac{1}{n-k}E(\mathrm{tr}(\varepsilon'M\varepsilon))$$
$$= \frac{1}{n-k}E(\mathrm{tr}(M\varepsilon\varepsilon')) = \frac{1}{n-k}\mathrm{tr}(ME(\varepsilon\varepsilon'))$$
$$= \frac{\sigma^2}{n-k}\mathrm{tr}(MI_n) = \frac{\sigma^2}{n-k}\mathrm{tr}(M)$$
$$= \frac{\sigma^2}{n-k}\mathrm{tr}(I_n - X(X'X)^{-1}X') = \frac{\sigma^2}{n-k}\left(\mathrm{tr}(I_n) - \mathrm{tr}(X(X'X)^{-1}X')\right)$$
$$= \frac{\sigma^2}{n-k}\left(n - \mathrm{tr}((X'X)^{-1}X'X)\right) = \frac{\sigma^2}{n-k}\left(n - \mathrm{tr}(I_k)\right)$$
$$= \frac{\sigma^2}{n-k}(n-k) = \sigma^2.$$
The result relies on the symmetry and idempotency of M . It also relies on the properties
of the trace (for a review, refer to Chapter 0).
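The unbiasedness of $\hat\sigma^2$ lends itself to a quick Monte Carlo check. The sketch below assumes a simple regression with $n = 20$ and true $\sigma^2 = 1$ (all settings are illustrative); averaging $\hat\sigma^2 = SSE/(n-k)$ across simulated samples should land close to 1:

```python
# Monte Carlo sketch (assumed setup): average of sigma_hat^2 over many
# simulated samples from y = 1 + 2x + eps, eps ~ N(0, 1), should be near 1.
import random

random.seed(0)
n, k, reps = 20, 2, 2000
x = [float(i) for i in range(n)]
xbar = sum(x) / n
sxx = sum((v - xbar) ** 2 for v in x)

total = 0.0
for _ in range(reps):
    eps = [random.gauss(0.0, 1.0) for _ in range(n)]
    y = [1.0 + 2.0 * v + e for v, e in zip(x, eps)]
    ybar = sum(y) / n
    b2 = sum((v - xbar) * (w - ybar) for v, w in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    sse = sum((w - b1 - b2 * v) ** 2 for v, w in zip(x, y))
    total += sse / (n - k)      # sigma_hat^2 for this sample

sigma2_hat_avg = total / reps   # should be close to the true sigma^2 = 1
```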
where
$$\varepsilon \overset{d}{\to} N(0, \sigma^2I_n).$$
The symbol $\overset{d}{\to}$ signifies "distributed as". Because the error terms are normally distributed
and $Y$ is a linear combination of normal random variables (with $X\beta$ deterministic), we
have
$$Y \overset{d}{\to} N(X\beta, \sigma^2I_n).$$
Note: we are imposing strong restrictions on the error terms. Not only are we saying that
they are mean zero, homoskedastic (same variance) and uncorrelated, we are also saying
that they are normally distributed. This will lead to an exact inferential theory. We
will see later what we mean by "exact". In Chapter 2, normality will be relaxed. In
Chapter 3, we will relax normality, homoskedasticity and uncorrelatedness.
(1) Single linear restriction:
$$H_0: c'\beta = \gamma$$
or, equivalently,
$$H_0: \sum_{j=1}^{k} c_j\beta_j = \gamma.$$
Example: $H_0: \beta_j = 0$.
Write:
$$c = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} \quad \text{(the 1 is in the } j\text{th spot)}$$
and $\gamma = 0$.
(2) Multiple linear restrictions:
$$H_0: \underset{q\times k}{R}\,\underset{k\times 1}{\beta} = \underset{q\times 1}{r}$$
Example: $H_0: \beta_2 = \beta_3 = \ldots = \beta_k = 0$.
Write:
$$\underset{(k-1)\times k}{R} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & \cdots & & 0 & 1 \end{bmatrix}$$
and
$$\underset{(k-1)\times 1}{r} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$
3.2 Implementation
Recall:
$$Y \overset{d}{\to} N(X\beta, \sigma^2I_n).$$
Write
$$\hat\beta = (X'X)^{-1}X'Y.$$
Since $\hat\beta$ is a linear combination of normal random variables (the $Y$s), it is also a normal
random variable. Hence,
$$\hat\beta \overset{d}{\to} N(\beta, \sigma^2(X'X)^{-1})$$
Consider the single restriction $H_0: c'\beta = \gamma$. Then,
$$c'\hat\beta \overset{d}{\to} N(c'\beta, \sigma^2c'(X'X)^{-1}c)$$
or
$$c'\hat\beta - c'\beta \overset{d}{\to} N(0, \sigma^2c'(X'X)^{-1}c)$$
or
$$\frac{c'\hat\beta - c'\beta}{\sigma\sqrt{c'(X'X)^{-1}c}} \overset{d}{\to} N(0, 1).$$
Under the null,
$$\frac{c'\hat\beta - \gamma}{\sigma\sqrt{c'(X'X)^{-1}c}} \overset{d}{\underset{H_0}{\to}} N(0, 1).$$
This would be our test statistic, if we knew $\sigma$. If we knew $\sigma$, we could test the null hypothesis $c'\beta = \gamma$ by checking if the ratio $\frac{c'\hat\beta - \gamma}{\sigma\sqrt{c'(X'X)^{-1}c}}$ falls in the tails of the normal distribution
(i.e., if it is larger than 2, or smaller than -2, for a 5% level test). Unfortunately, we do
not know $\sigma$. We estimate $\sigma$ using $\hat\sigma = \sqrt{\frac{\hat\varepsilon'\hat\varepsilon}{n-k}}$.
We will show that, when we replace $\sigma$ with $\hat\sigma$, the ratio is not standard normal
anymore. It is t-distributed with $n-k$ degrees of freedom. The following result will lead
to the finding.
▶First aside:
$$\frac{\hat\varepsilon'\hat\varepsilon}{\sigma^2} \overset{d}{\to} \chi^2_{n-k},$$
where $\chi^2_{n-k}$ is a chi-squared random variable with $n-k$ degrees of freedom.
Proof:
Recall, $\hat\varepsilon = M\varepsilon$. Hence,
$$\frac{\hat\varepsilon'\hat\varepsilon}{\sigma^2} = \frac{\varepsilon'M'M\varepsilon}{\sigma^2} = \frac{\varepsilon'M\varepsilon}{\sigma^2} = \frac{\varepsilon'Q\Lambda Q'\varepsilon}{\sigma^2}$$
by the Jordan decomposition of the idempotent matrix M (please refer to Chapter 0).
$Q$ is the matrix containing the eigenvectors of $M$. Note that $QQ' = I_n$. $\Lambda$ is the
matrix containing the eigenvalues of $M$ on the diagonal and zeros everywhere else. By
normality of the error terms,
$$\frac{Q'\varepsilon}{\sigma} \overset{d}{\to} N\left(0, \frac{1}{\sigma^2}Q'(\sigma^2I_n)Q\right) = N(0, I_n).$$
Thus, call $\frac{Q'\varepsilon}{\sigma} = Z \overset{d}{\to} N(0, I_n)$. This implies,
$$\frac{\varepsilon'Q\Lambda Q'\varepsilon}{\sigma^2} = Z'\Lambda Z = \sum_{i=1}^{n-k} z_i^2 \overset{d}{\to} \chi^2_{n-k},$$
since $M$ is idempotent with rank $n-k$: $n-k$ of its eigenvalues equal 1 and the remaining $k$ equal 0.◀
▶Second aside: the ratio of a standard normal random variable to the square root of an
independent chi-squared random variable divided by its degrees of freedom is t-distributed:
$$\frac{N(0,1)}{\sqrt{\frac{\chi^2_{n-k}}{n-k}}} = t_{n-k}.$$◀
$$\frac{c'\hat\beta - \gamma}{\sigma\sqrt{c'(X'X)^{-1}c}} \overset{d}{\to} N(0, 1).$$
Write
$$\frac{c'\hat\beta - \gamma}{\hat\sigma\sqrt{c'(X'X)^{-1}c}} = \frac{\frac{c'\hat\beta - \gamma}{\sigma\sqrt{c'(X'X)^{-1}c}}}{\sqrt{\frac{\hat\varepsilon'\hat\varepsilon}{\sigma^2(n-k)}}} \overset{d}{\to} \frac{N(0,1)}{\sqrt{\frac{\chi^2_{n-k}}{n-k}}} = t_{n-k}.$$
Interpret. When we replace $\sigma$ with its estimator $\hat\sigma$, we effectively compute the ratio
between a normal random variable and the square root of an independent chi-squared
random variable divided by the number of degrees of freedom ($n-k$, in this case). As in
the second aside, this ratio is distributed as a t-student distribution with $n-k$ degrees
of freedom. Thus,
$$\frac{c'\hat\beta - \gamma}{\hat\sigma\sqrt{c'(X'X)^{-1}c}} \overset{d}{\to} t_{n-k}.$$
We now test the null hypothesis $c'\beta = \gamma$ by checking if the ratio $\frac{c'\hat\beta - \gamma}{\hat\sigma\sqrt{c'(X'X)^{-1}c}}$ falls in the
tails of the t distribution with $n-k$ degrees of freedom (i.e., if it is larger than $t_{0.025,n-k}$
- i.e., slightly larger than 2 - or smaller than $-t_{0.025,n-k}$ - i.e., slightly smaller than -2 - for
a 5% level test).
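Here is a worked sketch of the t ratio on illustrative data (not the chapter's spread data). For $X = [1\,|\,x]$, the second diagonal entry of $(X'X)^{-1}$ is $n/(n\sum x_i^2 - (\sum x_i)^2)$, which the code uses directly:

```python
# Illustrative t statistic for H0: beta_2 = 0 in a simple regression.
from math import sqrt

x = [0.0, 1.0, 2.0, 3.0]
y = [2.0, 2.0, 4.0, 8.0]
n, k = len(x), 2

xbar, ybar = sum(x) / n, sum(y) / n
b2 = sum((v - xbar) * (w - ybar) for v, w in zip(x, y)) / sum((v - xbar) ** 2 for v in x)
b1 = ybar - b2 * xbar

sse = sum((w - b1 - b2 * v) ** 2 for v, w in zip(x, y))
sigma_hat = sqrt(sse / (n - k))                      # estimate of sigma

v22 = n / (n * sum(v * v for v in x) - sum(x) ** 2)  # (X'X)^{-1}_{2,2}
t_stat = b2 / (sigma_hat * sqrt(v22))                # compare with +/- t_{0.025, n-k}
```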
Example: for $H_0: \beta_j = 0$, the ratio becomes $t = \frac{\hat\beta_j}{\hat\sigma\sqrt{(X'X)^{-1}_{jj}}}$, where $(X'X)^{-1}_{jj}$ is the $j$th spot on the diagonal of the matrix $(X'X)^{-1}$.
$$H_0: \underset{q\times k}{R}\,\underset{k\times 1}{\beta} = \underset{q\times 1}{r}.$$
Then,
$$R\hat\beta \overset{d}{\to} N(R\beta, \sigma^2R(X'X)^{-1}R')$$
or
$$R\hat\beta - R\beta \overset{d}{\to} N(0, \sigma^2R(X'X)^{-1}R')$$
or
$$Z_q = \sigma^{-1}\left(R(X'X)^{-1}R'\right)^{-1/2}(R\hat\beta - R\beta) \overset{d}{\to} N(0, I_q).$$
$$Z_q'Z_q = \sigma^{-2}(R\hat\beta - R\beta)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - R\beta) \overset{d}{\to} \chi^2_q$$
and, under the null,
$$Z_q'Z_q = \sigma^{-2}(R\hat\beta - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r) \overset{d}{\underset{H_0}{\to}} \chi^2_q.$$
This last result is obvious. The inner product of a vector of $q$ independent standard normal
random variables with itself is just a chi-squared random variable with $q$ degrees of freedom.
At this point, we could use the 95th percentile of the chi-squared distribution with q
degrees of freedom (χ20.95,q ) to test the null hypothesis. If
$$\sigma^{-2}(R\hat\beta - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r) \geq \chi^2_{0.95,q},$$
then we would reject the null hypothesis. The problem, again, is that we do not know σ.
Just like earlier, we will show that when we replace $\sigma$ with $\hat\sigma$, the distribution of
the test statistic changes. In this case, it changes to that of an F random variable with
degrees of freedom $q$, $n-k$ (when the test statistic is also divided by the
number of restrictions $q$).
▶Third aside: Consider a chi-squared random variable with $q$ degrees of freedom, $\chi^2_q$.
Consider a chi-squared random variable with $n-k$ degrees of freedom, $\chi^2_{n-k}$. Assume the
two random variables are independent. Then,
$$\frac{\chi^2_q/q}{\chi^2_{n-k}/(n-k)} \overset{d}{\to} F_{q,n-k},$$
an F distribution with $q$ degrees of freedom in the numerator and $n-k$ degrees of freedom
in the denominator.◀
Now, write
$$\frac{\sigma^{-2}(R\hat\beta - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r)/q}{\frac{\hat\varepsilon'\hat\varepsilon}{\sigma^2(n-k)}} = \hat\sigma^{-2}(R\hat\beta - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r)/q \overset{d}{\to} F_{q,n-k}.$$
Thus, when we replace $\sigma$ with $\hat\sigma$ (and divide by $q$), rather than using the 95th percentile
of the chi-squared distribution, we use the 95th percentile of the F distribution to test.
Example: Classical F-test with an intercept ($H_0: \beta_2 = \beta_3 = \ldots = \beta_k = 0$).
The relevant statistic is
$$\hat\sigma^{-2}(R\hat\beta)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta)/(k-1) \overset{d}{\to} F_{k-1,n-k}$$
with
$$\underset{(k-1)\times k}{R} = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & \cdots & & 0 & 1 \end{bmatrix}.$$
Note: All of these tests are “exact” in the sense that they are valid for any number of
observations n. Exact tests can be derived only by imposing strong restrictions (like
normality) on the error terms. Without normality, this testing framework would not
hold. In Chapter 2, we will abandon normality of the error terms and derive tests which
are not “exact” (and are, therefore, not valid for any n) but “asymptotic”, i.e., they are
valid only when the number of observations goes off to infinity.
We are interested in the relation between the average bid-ask spreads on stocks and the
characteristics of the corresponding companies (Stoll, 2000). Download the file
spreads-microstructure.xls. The file contains information for the 100 stocks in the S&P 100 index.
Our variable of interest (the Y variable) is the bid-ask spread (constructed as an average
over the day) - or tradecost - of the S&P100 stocks. The explanatory, or X, variables
are:
2. log size - The log of the size of the stock. Size is total outstanding number of shares
multiplied by share price. Size is measured in thousands of dollars
3. log trades - This is the log of the average number of trades per day
4. log turn - This is the log of the ratio of the average number of shares traded per
day (in dollars) over the number of shares outstanding (in dollars)
The same data is used in Bandi and Russell (2007). Consider the following theories of
the determinants of bid-ask spreads.
1. Asymmetric information. Stocks with greater degrees of asymmetry in infor-
mation (regarding their fundamental value) tend to have wider bid-ask spreads. The
number of analysts following a stock is viewed as an asymmetric information proxy. The
larger it is, the lower private information, the smaller the spreads. Log turn-over is, also,
seen as an asymmetric information proxy. The larger it is, the larger private information,
the larger the spreads. (As Stoll, 1989, points out, without informed trading, stocks
would be traded in proportion to their shares outstanding. Trading rates in excess of this
proportion should be associated with informed trading.)
2. Liquidity. Stocks that trade more frequently and have larger market capitalization
(i.e., more liquid stocks) tend to have lower bid-ask spreads. The larger log trades and
log size, the larger liquidity, the smaller the spreads. Log turn-over is, also, sometimes
seen as a liquidity proxy. The larger it is, the larger liquidity, the smaller the spreads.
3. Fundamental volatility. Stocks that have a higher volatility of fundamental
values tend to have larger bid-ask spreads. Higher uncertainty about the underlying
stock’s value implies higher potential for adverse price moves and, hence, higher inventory
risk, mostly in the presence of large imbalances to offset (Ho and Stoll, 1981).
[Histogram of TRADECOST. Sample 1-100, 100 observations. Mean 0.000643, median 0.000533,
maximum 0.002967, minimum 0.000319, std. dev. 0.000399, skewness 3.886016, kurtosis 21.20235.
Jarque-Bera 1632.208 (probability 0.000000).]
[Histogram of LOG_TRADECOST. Sample 1-100, 100 observations. Mean -7.451748, median -7.536629,
maximum -5.820198, minimum -8.049928, std. dev. 0.403460, skewness 1.624503, kurtosis 6.613926.
Jarque-Bera 98.40212 (probability 0.000000).]
H0 : β3 = 0.
Write
$$c = \begin{bmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} \quad \text{(the 1 is in the 3rd spot)}$$
and $\gamma = 0$. Thus,
$$t = \frac{\hat\beta_3 - 0}{\hat\sigma\sqrt{(X'X)^{-1}_{3,3}}} = \frac{1.022999 - 0}{0.053245} = 19.21,$$
where $(X'X)^{-1}_{3,3}$ is the 3rd value on the diagonal of the matrix $(X'X)^{-1}$.
Since |19.21| > 2, the parameter associated with volatility is “statistically different”
from zero. In fact, volatility is the most statistically significant of all assumed
predictors.
Note: if one wished to use the critical values of the t distribution with n − k degrees
of freedom (in our case, 100 - 6 = 94), these critical values would be close to -2
and 2 (since the t density function would be very similar to the normal density
function).
Alternatively, one could use a “multiple restriction” test to test a single restriction.
This would effectively amount to using a “one-sided” test rather than a “two-sided”
test. Write
$$H_0: \underset{q\times k}{R}\,\underset{k\times 1}{\beta} = \underset{q\times 1}{r},$$
where $q = 1$, the vector $R$ is the same as the vector $c$ above (transposed, of course),
and the scalar $r$ is the same as the scalar $\gamma$ above, namely 0. Hence,
$$H_0: \underset{1\times k}{c'}\,\underset{k\times 1}{\beta} = \gamma = 0.$$
The statistic is
$$\hat\sigma^{-2}\hat\beta_3\left((X'X)^{-1}_{3,3}\right)^{-1}\hat\beta_3/1 \overset{d}{\to} F_{1,94}.$$
Notice that the F statistic would now effectively be the square of the t statistic.
Here is the output.
[Wald Test output - Equation: Untitled]
In the output above, one could ignore - for the time being - the remaining test
(called Chi-square). We will return to it in the next chapter.
5. Test the assumption that the coefficient associated with log volatility
is equal to 1. If this is the case, how would you interpret the relation
between daily volatility and bid-ask spreads?
Again, this is a single restriction test. We are testing if c(3) = 1. In other words:
H0 : β3 = 1.
Write
$$c = \begin{bmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} \quad \text{(the 1 is in the 3rd spot)}$$
and $\gamma = 1$. Specifically, write
$$t = \frac{\hat\beta_3 - 1}{\hat\sigma\sqrt{(X'X)^{-1}_{3,3}}} = \frac{1.022999 - 1}{0.053245} = 0.43.$$
Hence, we “fail” to reject. The true volatility slope could be 1. Of course, as earlier,
we could have used a single-sided F test as well.
What does it mean to have a slope equal to 1? The slope is
$$\frac{\partial \log(\text{tradecosts})}{\partial \log(\text{volatility})} = \frac{\frac{\partial\, \text{tradecosts}}{\text{tradecosts}}}{\frac{\partial\, \text{volatility}}{\text{volatility}}} = 1.$$
Hence, since the regression is log/log (“logarithm on logarithm”), the slope has an
interpretation in terms of elasticity. A 1% increase in volatility translates into a 1%
increase in tradecosts.
6. Test the assumption that the coefficients associated with log size and log
trades are equal to each other. Be as precise as possible.
We use a classical F test with 1 restriction. See below in bold. We “fail” to reject.
Clear from the p-value, right?
[Wald Test output - Equation: Untitled]
7. Test the assumption that the coefficients associated with log turnover
and number of analysts are jointly equal to zero.
We use a classical F test with 2 restrictions. See below in bold. We reject. Again,
look at the p-value ...
[Wald Test output - Equation: Untitled]
8. Let us use our model to predict what the spread will look like tomor-
row. Use a regression which does not include the number of analysts
to predict. Consider a stock which has a log size of 10.5. Suppose that
for this stock we expect that for the following day the log turnover will
be -1.1, the log of the number of trades will be 7.6, and the log of the
standard deviation will be -3.5. Predict what the spread will be for this
stock tomorrow. (Note that since the regression is run with log spreads
you will have to make a transformation to convert your prediction for
the log spread into a prediction for the actual spread ...).
Here is the regression output excluding the number of analysts. All other parameter
estimates are robust to this exclusion, i.e., similar to previous results.
$$\widehat{\log(\text{tradecosts})} = -0.829 - 0.14\times 10.5 + 1.023\times(-3.5) - 0.17\times 7.6 - 0.099\times(-1.1) = -7.0626.$$
Now, to obtain a prediction for tradecosts rather than for log(tradecosts) we need
to exponentiate. We write
$$\widehat{\text{tradecosts}} = e^{-7.0626} = 0.00085655,$$
which is a value larger than the historical mean (see the histogram in point 1 above).
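The prediction arithmetic can be reproduced directly (coefficient estimates as reported in the text):

```python
# Reproducing the chapter's prediction with the reported coefficient estimates.
from math import exp

log_spread = (-0.829
              - 0.140 * 10.5      # log size
              + 1.023 * (-3.5)    # log volatility
              - 0.170 * 7.6       # log trades
              - 0.099 * (-1.1))   # log turnover

spread = exp(log_spread)          # undo the log transformation
```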
5 Appendix I:
5.1 Another useful idempotent matrix
Consider
$$L = I_n - i(i'i)^{-1}i' = I_n - \frac{ii'}{n},$$
where $i$ is the $n \times 1$ vector of ones, or
$$L = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{bmatrix} - \frac{1}{n}\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & & & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix}.$$
The matrix $L$ transforms any $n \times 1$ column vector $y$ into deviations from the mean. In fact,
$$Ly = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} - \frac{1}{n}\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & & & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} - \begin{bmatrix} \bar{Y} \\ \bar{Y} \\ \vdots \\ \bar{Y} \end{bmatrix} = \begin{bmatrix} y_1 - \bar{Y} \\ y_2 - \bar{Y} \\ \vdots \\ y_n - \bar{Y} \end{bmatrix}.$$
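A direct check (illustrative vector) that $L$ demeans:

```python
# Build L = I_n - ii'/n explicitly and verify that Ly subtracts the mean.
n = 4
y = [1.0, 2.0, 3.0, 6.0]

L = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]
Ly = [sum(L[i][j] * y[j] for j in range(n)) for i in range(n)]

ybar = sum(y) / n
deviations = [v - ybar for v in y]   # should coincide with Ly entry by entry
```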
$$Y = X\beta + \varepsilon,$$
where
$$X = \begin{bmatrix} 1 & x_{11} \\ 1 & x_{12} \\ \vdots & \vdots \\ 1 & x_{1n} \end{bmatrix}$$
and
$$\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}.$$
By the partitioned-regression result,
$$\hat\beta_2 = (X_2'M_1X_2)^{-1}X_2'M_1Y.$$
With $X_1 = i$ (the column of ones), $M_1 = L$ and
$$\hat\beta_2 = (X_2'LX_2)^{-1}(X_2'LY) = (X_2'LLX_2)^{-1}(X_2'LLY) = (X_2'L'LX_2)^{-1}(X_2'L'LY) = ((LX_2)'(LX_2))^{-1}(LX_2)'(LY) = \frac{\sum_{i=1}^{n}(y_i - \bar{Y})(x_i - \bar{X})}{\sum_{i=1}^{n}(x_i - \bar{X})^2},$$
which has a very familiar form from univariate regression analysis, right? Also, this is
how you compute the beta of an asset ... recall?
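As a sketch of the asset-beta remark (the returns below are made up for illustration), the demeaned cross-product ratio is exactly how a market beta is computed:

```python
# Hypothetical returns: an asset moving exactly twice the market has beta = 2.
m = [0.01, 0.02, -0.01, 0.02]          # market returns (illustrative)
r = [2.0 * v for v in m]               # asset returns with true beta = 2

mbar = sum(m) / len(m)
rbar = sum(r) / len(r)
beta = (sum((a - rbar) * (b - mbar) for a, b in zip(r, m))
        / sum((b - mbar) ** 2 for b in m))
```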
Now write
$$Y = \hat{Y} + \hat\varepsilon.$$
Hence,
$$(L\hat{Y})'\hat\varepsilon = \hat{Y}'L'\hat\varepsilon = \hat{Y}'L\hat\varepsilon = \hat{Y}'\hat\varepsilon = 0.$$
Thus,
$$\sum_{i=1}^{n}(y_i - \bar{Y})^2 = (LY)'(LY) = (L\hat{Y})'(L\hat{Y}) + \hat\varepsilon'\hat\varepsilon = \sum_{i=1}^{n}(\hat{y}_i - \bar{Y})^2 + \sum_{i=1}^{n}\hat\varepsilon_i^2.$$
We expressed the variance of the $Y$ observations (the total sum of squares or SST) as the
sum of the variance of the fitted values (the regression sum of squares or SSR) and the
variance of the residuals (the residual sum of squares or SSE). Now write
$$1 = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{Y})^2}{\sum_{i=1}^{n}(y_i - \bar{Y})^2} + \frac{\sum_{i=1}^{n}\hat\varepsilon_i^2}{\sum_{i=1}^{n}(y_i - \bar{Y})^2} = \frac{SSR}{SST} + \frac{SSE}{SST}.$$
Define
$$R^2 = \frac{SSR}{SST}.$$
Naturally,
$$0 \leq R^2 \leq 1.$$
The closer the R2 is to 1 the better the fit (the larger the variance of Y that is explained
by the regression or, equivalently, the smaller SSE, the better the fit). The closer the
R2 is to 0, the worse the fit (the larger is SSE).
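The decomposition $SST = SSR + SSE$ and the equivalence of the two ways of writing the $R^2$ can be verified on a small illustrative dataset:

```python
# Numerical check (illustrative data) of SST = SSR + SSE and R^2 = SSR/SST
# = 1 - SSE/SST for a simple regression with an intercept.
x = [0.0, 1.0, 2.0, 3.0]
y = [2.0, 2.0, 4.0, 8.0]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
b2 = sum((v - xbar) * (w - ybar) for v, w in zip(x, y)) / sum((v - xbar) ** 2 for v in x)
b1 = ybar - b2 * xbar
yhat = [b1 + b2 * v for v in x]

sst = sum((w - ybar) ** 2 for w in y)             # total sum of squares (24)
ssr = sum((f - ybar) ** 2 for f in yhat)          # regression sum of squares (20)
sse = sum((w - f) ** 2 for w, f in zip(y, yhat))  # residual sum of squares (4)

r2 = ssr / sst                                    # 5/6
r2_alt = 1.0 - sse / sst                          # identical by the decomposition
```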
(1) First alternative (and equivalent) way to write the $R^2$:
$$R^2 = 1 - \frac{SSE}{SST}.$$
This expression is useful. Note: $SSE = \sum_{i=1}^{n}\hat\varepsilon_i^2 = \sum_{i=1}^{n}\left(y_i - \hat\beta'x_i\right)^2$. Recall, OLS finds $\hat\beta$
which minimizes SSE. Hence, SSE from a regression with k regressors can never be
smaller than SSE from a regression with k + 1 regressors. Why? Well, because I could
always set the extra parameter equal to zero and, at the very least, achieve the value
that I would obtain with only $k$ regressors. Hence, the $R^2$ from a regression with $k+1$
regressors will never be smaller than the $R^2$ from a regression with $k$ regressors. This
is mechanical. It is also problematic since the increase might not have anything to do
with the actual explanatory power of the extra regressor. In sum: just adding regressors
(irrespective of their explanatory power) increases the $R^2$. Thus, the $R^2$ cannot be a
perfect measure of goodness of fit.
where $s_{\hat{y}}^2$ is the sample variance of the fitted values, $s_y^2$ is the sample variance of the $y$
observations, and $s_{\hat{y}y}$ is the sample covariance between fitted values and true values. But,
$$s_{\hat{y}y} = cov(Y, \hat{Y}) = cov(\hat{Y} + \hat\varepsilon, \hat{Y}) = var(\hat{Y}) = s_{\hat{y}}^2.$$
Hence,
the degrees-of-freedom-corrected (adjusted) $R^2$ can be written as
$$\bar{R}^2 = 1 - \frac{SSE/(n-k)}{SST/(n-1)} = 1 - \frac{\hat\sigma^2}{s_y^2}.$$
$$\hat\sigma^{-2}(R\hat\beta - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r)/q = \frac{(\hat\varepsilon^{*\prime}\hat\varepsilon^* - \hat\varepsilon'\hat\varepsilon)/q}{\hat\varepsilon'\hat\varepsilon/(n-k)} \overset{d}{\to} F_{q,n-k},$$
where the $\hat\varepsilon^*$s are the estimated residuals from a regression which imposes the restriction
$R\beta = r$.
Proof.
We want to show that $\hat\sigma^{-2}(R\hat\beta - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r)/q$ and $\frac{(\hat\varepsilon^{*\prime}\hat\varepsilon^* - \hat\varepsilon'\hat\varepsilon)/q}{\hat\varepsilon'\hat\varepsilon/(n-k)}$ are
identical. Note that
$$\hat\sigma^{-2}(R\hat\beta - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r)/q = \frac{(R\hat\beta - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r)/q}{(\hat\varepsilon'\hat\varepsilon)/(n-k)}.$$
Hence, it suffices to show that
$$(R\hat\beta - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r) = \hat\varepsilon^{*\prime}\hat\varepsilon^* - \hat\varepsilon'\hat\varepsilon.$$
Consider the restricted minimization problem with criterion $C(\beta, \lambda) = (Y - X\beta)'(Y - X\beta) + 2\lambda'(R\beta - r)$, where $\lambda$ is a $q \times 1$ vector of Lagrange multipliers. The first-order conditions (evaluated at the restricted estimates $\beta^*$) are
$$\frac{\partial C(\beta, \lambda)}{\partial \beta} = 0 \Rightarrow -X'(Y - X\beta^*) + R'\lambda = 0,$$
$$\frac{\partial C(\beta, \lambda)}{\partial \lambda} = 0 \Rightarrow R\beta^* = r.$$
Thus,
$$R'\lambda = X'Y - (X'X)\beta^*$$
or
$$(X'X)^{-1}R'\lambda = (X'X)^{-1}X'Y - \beta^* = \hat\beta - \beta^*. \quad (1)$$
In addition,
$$\hat\varepsilon^* = Y - X\beta^* \quad \text{and} \quad \hat\varepsilon = Y - X\hat\beta.$$
Now, consider
$$\hat\varepsilon^{*\prime}\hat\varepsilon^* = (Y - X\beta^*)'(Y - X\beta^*) = (Y - X\hat\beta + X\hat\beta - X\beta^*)'(Y - X\hat\beta + X\hat\beta - X\beta^*)$$
$$= (Y - X\hat\beta)'(Y - X\hat\beta) + (\hat\beta - \beta^*)'X'X(\hat\beta - \beta^*) + 2(\hat\beta - \beta^*)'X'(Y - X\hat\beta).$$
Thus, since $X'(Y - X\hat\beta) = X'\hat\varepsilon = 0$,
$$\hat\varepsilon^{*\prime}\hat\varepsilon^* - \hat\varepsilon'\hat\varepsilon = (\hat\beta - \beta^*)'X'X(\hat\beta - \beta^*) = \lambda'R(X'X)^{-1}R'\lambda$$
by (1). Premultiplying (1) by $R$ gives $R(X'X)^{-1}R'\lambda = R\hat\beta - R\beta^* = R\hat\beta - r$, so that $\lambda = \left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r)$ and
$$\hat\varepsilon^{*\prime}\hat\varepsilon^* - \hat\varepsilon'\hat\varepsilon = (R\hat\beta - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r).$$
Done.
Theorem.
We can write
$$\hat\sigma^{-2}(R\hat\beta - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat\beta - r)/q = \frac{(\hat\varepsilon^{*\prime}\hat\varepsilon^* - \hat\varepsilon'\hat\varepsilon)/q}{\hat\varepsilon'\hat\varepsilon/(n-k)} = \frac{(R^2 - R^{*2})/q}{\frac{1-R^2}{n-k}} \overset{d}{\to} F_{q,n-k},$$
where the $\hat\varepsilon^*$s are the estimated residuals from a regression which imposes the restriction
$R\beta = r$ and $R^{*2}$ is the $R^2$ from the same regression.
Proof. Recall,
$$R^2 = 1 - \frac{\hat\varepsilon'\hat\varepsilon}{SST}, \qquad R^{*2} = 1 - \frac{\hat\varepsilon^{*\prime}\hat\varepsilon^*}{SST},$$
and $SST$ is the same in the restricted and unrestricted regressions. Then
$$\frac{(\hat\varepsilon^{*\prime}\hat\varepsilon^* - \hat\varepsilon'\hat\varepsilon)/q}{\hat\varepsilon'\hat\varepsilon/(n-k)} = \frac{(R^2 - R^{*2})/q}{\frac{1-R^2}{n-k}}.$$
Done.
Note: it is trivial to compute $R^2$ and $R^{*2}$ from virtually all unrestricted and restricted
regressions (provided the restrictions are linear). Hence, this is a very useful way to re-express our general tests. The classical F test for $H_0: \beta_2 = \beta_3 = \ldots = \beta_k = 0$, for example,
is
$$\frac{R^2/(k-1)}{\frac{1-R^2}{n-k}} \overset{d}{\to} F_{k-1,n-k}.$$
Why? $R^{*2}$ from the restricted regression is zero (the restricted regression only contains
an intercept). The number of restrictions $q$ is, of course, equal to $k - 1$.
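Tying the last results together on illustrative data (simple regression, $k = 2$): the $R^2$ form of the F test for $H_0: \beta_2 = 0$ uses $q = k - 1 = 1$ restriction, and the resulting F statistic is the square of the slope's t statistic:

```python
# Illustrative check: the R^2-based F statistic equals t^2 for one restriction.
from math import sqrt

x = [0.0, 1.0, 2.0, 3.0]
y = [2.0, 2.0, 4.0, 8.0]
n, k = len(x), 2

xbar, ybar = sum(x) / n, sum(y) / n
b2 = sum((v - xbar) * (w - ybar) for v, w in zip(x, y)) / sum((v - xbar) ** 2 for v in x)
b1 = ybar - b2 * xbar
sse = sum((w - b1 - b2 * v) ** 2 for v, w in zip(x, y))
sst = sum((w - ybar) ** 2 for w in y)
r2 = 1.0 - sse / sst

f_stat = (r2 / (k - 1)) / ((1.0 - r2) / (n - k))     # R^2 form of the F test

# t statistic for the slope, as in the single-restriction test.
sigma_hat = sqrt(sse / (n - k))
v22 = n / (n * sum(v * v for v in x) - sum(x) ** 2)  # (X'X)^{-1}_{2,2}
t_stat = b2 / (sigma_hat * sqrt(v22))
```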
References
[1] Bandi, F.M. and J.R. Russell (2007). Full-information transaction costs. Working
paper.
[2] Ho, T. and H.R. Stoll (1981). Optimal dealer pricing under transactions and return
uncertainty. Journal of Financial Economics 9, 47-73.
[3] Stoll, H.R. (1989). Inferring the components of the bid-ask spread: theory and em-
pirical tests. Journal of Finance 44, 115-134.