
Chapter 1 Empirical finance

Figure 1: Predicting returns over alternative horizons.

“There is no way to predict whether the price of stocks and bonds will go up or down
over the next few days or weeks. But it is quite possible to foresee the broad course of
the prices of these assets over longer time periods, such as the next three to five years...”
2013 Nobel Prize Committee


Multiple linear regression with pre-determined regressors

We consider the model


Y = Xβ + ε,

where X is a fixed (non-stochastic) n × k matrix of regressors (n observations and k regressors) and ε is an n-vector of error terms.
Assumptions:

1. E(ε) = 0. On average, the errors are zero.

2. Var(ε) = σ²I_n, where I_n is the identity matrix of dimension n. (a) The errors are homoskedastic (same variances on the diagonal of Var(ε)) and (b) uncorrelated (the off-diagonal elements, i.e., the covariances, are equal to zero).

More explicitly, write

$$
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{pmatrix}
=
\begin{pmatrix}
x_{11} & x_{21} & \dots & x_{k1} \\
x_{12} & x_{22} & \dots & x_{k2} \\
x_{13} & x_{23} & \dots & x_{k3} \\
\vdots & \vdots &  & \vdots \\
x_{1n} & x_{2n} & \dots & x_{kn}
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_n \end{pmatrix}
$$
or

$$
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{pmatrix}
=
\begin{pmatrix}
\beta_1 x_{11} + \beta_2 x_{21} + \dots + \beta_k x_{k1} \\
\beta_1 x_{12} + \beta_2 x_{22} + \dots + \beta_k x_{k2} \\
\beta_1 x_{13} + \beta_2 x_{23} + \dots + \beta_k x_{k3} \\
\vdots \\
\beta_1 x_{1n} + \beta_2 x_{2n} + \dots + \beta_k x_{kn}
\end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_n \end{pmatrix},
$$
where, again, E(ε) = 0 and V ar(ε) = σ 2 In . All we are saying is that the Y observations
are linear combinations of the k regressors contained in the fixed matrix X. On top of


the linear combination, there is an error vector. The error vector has mean zero and a variance-covariance matrix given by

$$
Var(\varepsilon) = E(\varepsilon\varepsilon') = \sigma^2 I_n =
\begin{pmatrix}
\sigma^2 & 0 & \dots & \dots & 0 \\
0 & \sigma^2 & 0 & \dots & \vdots \\
\vdots & 0 & \sigma^2 & 0 & \vdots \\
\vdots & \dots & \ddots & \ddots & 0 \\
0 & \dots & \dots & 0 & \sigma^2
\end{pmatrix}.
$$
Note: the first column of the X matrix could just be a column of ones. This is the case
when there is an intercept in the regression (β1 would therefore be the intercept).
Given the assumptions, it is easy to show that E(Y) = Xβ and Var(Y) = σ²I_n. In fact,

E(Y) = E(Xβ + ε) = E(Xβ) + E(ε) = Xβ,

Var(Y) = Var(Xβ + ε) = Var(ε) = σ²I_n.

1 The ordinary least squares (OLS) method


We need to estimate the parameter vector β.

$$
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} (y_i - x_i'\beta)^2
= \arg\min_{\beta}\, (Y - X\beta)'(Y - X\beta)
= (X'X)^{-1}X'Y
$$


$$
\hat{\beta} =
\begin{pmatrix}
\sum_{i=1}^{n} x_{1i}^2 & \sum_{i=1}^{n} x_{1i}x_{2i} & \dots & \dots \\
\sum_{i=1}^{n} x_{1i}x_{2i} & \sum_{i=1}^{n} x_{2i}^2 & \dots & \dots \\
\vdots & \vdots & \ddots & \vdots \\
\dots & \dots & \dots & \sum_{i=1}^{n} x_{ki}^2
\end{pmatrix}^{-1}
\begin{pmatrix}
\sum_{i=1}^{n} x_{1i}y_i \\
\sum_{i=1}^{n} x_{2i}y_i \\
\vdots \\
\sum_{i=1}^{n} x_{ki}y_i
\end{pmatrix}.
$$

The least squares method chooses the vector β by minimizing the squared differences around the multivariate line x'β.
Proof.

C(β) = (Y − Xβ)'(Y − Xβ)

Hence,

$$
\frac{\partial C(\beta)}{\partial \beta} = 0 \;\Rightarrow\; -2X'(Y - X\beta) = 0
$$

$$
\frac{\partial^2 C(\beta)}{\partial \beta \partial \beta'} = 2X'X > 0.
$$

By setting the first derivative equal to zero, we obtain the OLS βb estimator. The second
derivative is positive. Hence, we confirm that we have a minimum.
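The derivation above can be checked numerically. The following is a minimal sketch (not part of the original notes) that computes β̂ = (X'X)⁻¹X'Y via the normal equations on simulated data and cross-checks it against NumPy's built-in least-squares routine; all data and parameter values are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
# First column of X is a column of ones (the intercept case discussed above)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])          # hypothetical true parameters
eps = rng.normal(scale=0.1, size=n)        # homoskedastic, mean-zero errors
Y = X @ beta + eps

# Normal equations: beta_hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse, which is the numerically preferred way to implement the closed-form formula.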

1.1 Definitions: fitted values and estimated residuals


Fitted values (Ŷ). These are the values on the estimated line:

$$
\hat{Y} =
\begin{pmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \hat{y}_3 \\ \vdots \\ \hat{y}_n \end{pmatrix}
=
\begin{pmatrix}
\hat{\beta}_1 x_{11} + \hat{\beta}_2 x_{21} + \dots + \hat{\beta}_k x_{k1} \\
\hat{\beta}_1 x_{12} + \hat{\beta}_2 x_{22} + \dots + \hat{\beta}_k x_{k2} \\
\hat{\beta}_1 x_{13} + \hat{\beta}_2 x_{23} + \dots + \hat{\beta}_k x_{k3} \\
\vdots \\
\hat{\beta}_1 x_{1n} + \hat{\beta}_2 x_{2n} + \dots + \hat{\beta}_k x_{kn}
\end{pmatrix}
= X\hat{\beta}.
$$

Residuals (ε̂). These are the differences between the true Y values and the fitted values Ŷ:

$$
\hat{\varepsilon} = Y - \hat{Y} =
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{pmatrix}
-
\begin{pmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \hat{y}_3 \\ \vdots \\ \hat{y}_n \end{pmatrix}
= Y - X\hat{\beta}.
$$

1.2 Properties
From the first-order conditions of the minimization problem we can write

$$
X'(Y - X\hat{\beta}) = 0 \;\Rightarrow\; X'\hat{\varepsilon} = 0
$$

or, equivalently (less compactly, but more intelligibly),

$$
\begin{pmatrix}
\sum_{i=1}^{n} x_{1i}\hat{\varepsilon}_i \\
\sum_{i=1}^{n} x_{2i}\hat{\varepsilon}_i \\
\vdots \\
\sum_{i=1}^{n} x_{ki}\hat{\varepsilon}_i
\end{pmatrix} = 0.
$$

This is like saying that the residuals are orthogonal to the X observations! Since the fitted values are linear combinations of the X observations (see above), it also says that the residuals are orthogonal to the fitted values. In other words,

X'ε̂ = 0

and

Ŷ'ε̂ = 0.

Important: Note that, if there is an intercept in the regression, then the first element in X'ε̂ becomes ∑_{i=1}^{n} ε̂_i = 0. In other words, the sample mean of the residuals is zero, if there is an intercept in the regression!

1.3 Properties: continued


Write

Ŷ = Xβ̂ = X(X'X)⁻¹X'Y = PY

and

ε̂ = Y − Ŷ = Y − X(X'X)⁻¹X'Y = (I − X(X'X)⁻¹X')Y = MY,

where M and P are symmetric and idempotent matrices. They are symmetric, since M = M' and P = P' (try showing it ...). They are idempotent, since M = MM and P = PP (try showing it ...).
Geometrically, P is the matrix which projects Y on the space spanned by the columns of X (recall, Ŷ is a linear combination of the regressors). M is the matrix which projects Y on the space orthogonal to the space spanned by the columns of X (recall, ε̂ and Ŷ are orthogonal).
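The properties of P and M can be verified numerically. Below is a minimal sketch (not part of the original notes) with simulated data: it builds P and M, checks symmetry and idempotency, and confirms the orthogonality results of the previous subsection.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
X = rng.normal(size=(n, k))   # arbitrary simulated regressors
Y = rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection onto the column space of X
M = np.eye(n) - P                      # projection onto the orthogonal complement

# Symmetry: P = P', M = M'
assert np.allclose(P, P.T) and np.allclose(M, M.T)
# Idempotency: PP = P, MM = M
assert np.allclose(P @ P, P) and np.allclose(M @ M, M)

Y_hat = P @ Y       # fitted values
resid = M @ Y       # residuals
assert np.allclose(X.T @ resid, 0)    # residuals orthogonal to X
assert np.allclose(Y_hat @ resid, 0)  # fitted values orthogonal to residuals
```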

1.4 Partitioned matrices


Write

X = [X1 | X2],

where X1 is n × k1, X2 is n × k2, and k = k1 + k2. We are simply separating the full matrix X into two sub-matrices.
Then,


$$
\hat{\beta} = (X'X)^{-1}X'Y =
\begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}^{-1}
\begin{pmatrix} X_1'Y \\ X_2'Y \end{pmatrix}.
$$

There are two cases.


(1) Assume X1'X2 = 0 (the regressors in the first block and in the second block are orthogonal). Then,

$$
\hat{\beta} =
\begin{pmatrix} X_1'X_1 & 0 \\ 0 & X_2'X_2 \end{pmatrix}^{-1}
\begin{pmatrix} X_1'Y \\ X_2'Y \end{pmatrix}
=
\begin{pmatrix} (X_1'X_1)^{-1} & 0 \\ 0 & (X_2'X_2)^{-1} \end{pmatrix}
\begin{pmatrix} X_1'Y \\ X_2'Y \end{pmatrix}
=
\begin{pmatrix} (X_1'X_1)^{-1}X_1'Y \\ (X_2'X_2)^{-1}X_2'Y \end{pmatrix}
=
\begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix}.
$$

Thus, we can run two separate regressions (one on X1 and one on X2 ) to obtain the least
squares estimates.
(2) Assume X1'X2 ≠ 0. Then, we can show that

$$
\begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix}
=
\begin{pmatrix} (X_1'M_2X_1)^{-1}X_1'M_2Y \\ (X_2'M_1X_2)^{-1}X_2'M_1Y \end{pmatrix},
$$

where M2 = I_n − X2(X2'X2)⁻¹X2' and M1 = I_n − X1(X1'X1)⁻¹X1'.
Intuition: Focus on β̂1. This is like running several regressions. First, regress the first column of X1 on X2 and obtain the residuals. Next, regress the second column of X1 on X2 and obtain the residuals. Keep going until you reach the last column of X1. Collect the k1 columns of residuals in a new matrix X̃1 = M2X1. Finally, regress Y on the matrix of residuals to obtain β̂1.


$$
\hat{\beta}_1 = (\tilde{X}_1'\tilde{X}_1)^{-1}\tilde{X}_1'Y
= (X_1'M_2X_1)^{-1}X_1'M_2Y.
$$

In English, first you want to purge X1 of the effect of X2 and compute the component of X1 which is orthogonal to X2 (the residual matrix); then you want to regress Y on the residual matrix.
Important: multiple regression does this automatically. You never really go through these steps. If you know X2 and only care about β1, just put X2 in your regression!
Proof.

$$
X\beta = [X_1 | X_2]\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = X_1\beta_1 + X_2\beta_2
= (M_2 + P_2)X_1\beta_1 + X_2\beta_2
$$
$$
= M_2X_1\beta_1 + P_2X_1\beta_1 + X_2\beta_2
= M_2X_1\beta_1 + X_2\left(\beta_2 + (X_2'X_2)^{-1}X_2'X_1\beta_1\right)
$$
$$
= M_2X_1\beta_1 + X_2c
= [M_2X_1 | X_2]\begin{pmatrix} \beta_1 \\ c \end{pmatrix}.
$$

Notice that β1 has not changed. However, the regressors (and the second set of parameters) have changed. The two blocks are now orthogonal! Hence, I can find β1 by just running a regression of Y on M2X1 = X̃1:

$$
\hat{\beta}_1 = (\tilde{X}_1'\tilde{X}_1)^{-1}\tilde{X}_1'Y
= (X_1'M_2'M_2X_1)^{-1}X_1'M_2'M_2Y
= (X_1'M_2M_2X_1)^{-1}X_1'M_2M_2Y
= (X_1'M_2X_1)^{-1}X_1'M_2Y,
$$

by the symmetry and idempotency of M2 .
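The partitioned-regression (Frisch-Waugh-Lovell) result above can be illustrated numerically. A minimal sketch with simulated data (not part of the original notes): the coefficients on X1 from the full regression coincide with the coefficients from regressing Y on the purged regressors M2X1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X1 = rng.normal(size=(n, 2))                         # block of interest
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])  # controls (with intercept)
X = np.hstack([X1, X2])
Y = X @ np.array([1.0, -1.0, 0.5, 2.0]) + rng.normal(size=n)

# beta_1 from the full multiple regression
beta_full = np.linalg.solve(X.T @ X, X.T @ Y)[:2]

# Purge X1 of the effect of X2, then regress Y on the residual matrix
M2 = np.eye(n) - X2 @ np.linalg.inv(X2.T @ X2) @ X2.T
X1_tilde = M2 @ X1
beta_fwl = np.linalg.solve(X1_tilde.T @ X1_tilde, X1_tilde.T @ Y)
```

The two estimates agree to machine precision, exactly as the algebra predicts.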

2 The statistical properties of OLS

Write

β̂ = (X'X)⁻¹X'Y = (X'X)⁻¹X'(Xβ + ε) = (X'X)⁻¹(X'X)β + (X'X)⁻¹X'ε = β + (X'X)⁻¹X'ε.

(1) The expected value of β̂.

E(β̂) = E(β + (X'X)⁻¹X'ε) = β + (X'X)⁻¹X'E(ε) = β.

The OLS estimator is unbiased. Interpret: whatever the true parameter β is, if the model is true (i.e., if Y = Xβ + ε), then the OLS estimator will deliver the right parameter (on average). This said, there is some sampling variation around the expectation. Hence, we need to talk about the variance of β̂.


(2) The variance of β̂.

$$
Var(\hat{\beta}) = E[(\hat{\beta} - E(\hat{\beta}))(\hat{\beta} - E(\hat{\beta}))']
= E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)']
$$
$$
= E\left[(X'X)^{-1}X'\varepsilon \left((X'X)^{-1}X'\varepsilon\right)'\right]
= E\left[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}\right]
$$
$$
= (X'X)^{-1}X'E(\varepsilon\varepsilon')X(X'X)^{-1}
= \sigma^2(X'X)^{-1}X'I_nX(X'X)^{-1}
= \sigma^2(X'X)^{-1}X'X(X'X)^{-1}
= \sigma^2(X'X)^{-1}.
$$

We can also write

$$
Var(\hat{\beta}) = \frac{\sigma^2}{n}\left(\frac{X'X}{n}\right)^{-1}.
$$

Interpret. The variance of β̂ depends directly on the variance of the error terms σ² and inversely on the "variability" of the X observations, i.e., X'X/n. It also depends inversely on the number of observations. Notice that, when the number of observations increases without bound (i.e., when n → ∞), the distribution of the β̂ estimator becomes more and more concentrated around the expected value β. We will return to this idea in Chapter 2.
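The two results above, E(β̂) = β and Var(β̂) = σ²(X'X)⁻¹, can be checked with a small Monte Carlo experiment. The sketch below (not part of the original notes, all values simulated) holds X fixed across replications, redraws the errors each time, and compares the sampling moments of β̂ with the theoretical formulas.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 50, 2.0
# Fixed (non-stochastic) regressor matrix with an intercept
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 0.5])                 # hypothetical true parameters
V_theory = sigma**2 * np.linalg.inv(X.T @ X)

R = 5000
draws = np.empty((R, 2))
for r in range(R):
    Y = X @ beta + rng.normal(scale=sigma, size=n)  # new error draw, same X
    draws[r] = np.linalg.solve(X.T @ X, X.T @ Y)

mean_mc = draws.mean(axis=0)   # should be close to beta (unbiasedness)
V_mc = np.cov(draws.T)         # should be close to sigma^2 (X'X)^{-1}
```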

2.1 The Gauss-Markov Theorem


The OLS estimator β̂ is BLUE (best linear unbiased estimator). For any estimator β̃ which is linear (in the observations Y) and unbiased, it turns out that

Var(β̃) ≥ Var(β̂).

Proof. Consider the generic estimator β̃ = AY, where A is a k × n matrix. Note that β̃ is linear in Y. Let us compute its expected value.


E(β̃) = E(AY) = E(AXβ + Aε) = AXβ + AE(ε) = AXβ.

Thus, AX = I_k for β̃ to be unbiased. So, β̃ = AY (with the restriction AX = I_k) is a generally specified unbiased and linear estimator of β. Note:

Var(β̃) = E((β̃ − E(β̃))(β̃ − E(β̃))') = E((β̃ − β)(β̃ − β)') = E(Aεε'A') = σ²AI_nA' = σ²AA'.

Now, we need to show that

Var(β̃) ≥ Var(β̂).

Write

AA' − (X'X)⁻¹ = AA' − AX(X'X)⁻¹X'A' = A(I_n − X(X'X)⁻¹X')A' = AMA' = AMMA' = AMM'A',


by the fact that AX = I_k and M is symmetric and idempotent. Is AMM'A' positive semi-definite? Write

$$
z'AM(AM)'z = \tilde{z}'\tilde{z} = \sum_{i=1}^{n} \tilde{z}_i^2 \ge 0,
$$

for any real vector z ∈ R^k. Because z'AMM'A'z ≥ 0 for any conformable real z, AMM'A' is positive semi-definite and AA' − (X'X)⁻¹ ≥ 0.

2.2 Estimation of σ²

We (almost) use the empirical variance of the estimated residuals:

$$
\hat{\sigma}^2 = \frac{\sum_{i=1}^{n}\hat{\varepsilon}_i^2}{n-k} = \frac{\hat{\varepsilon}'\hat{\varepsilon}}{n-k}.
$$

Statistical property: σ̂² is unbiased for σ² (i.e., E(σ̂²) = σ²).
Proof. Recall

ε̂ = MY

or

ε̂ = M(Xβ + ε) = Mε.

Thus,

$$
E(\hat{\sigma}^2) = \frac{1}{n-k}E(\hat{\varepsilon}'\hat{\varepsilon}) = \frac{1}{n-k}E(\varepsilon'M'M\varepsilon)
= \frac{1}{n-k}E(\varepsilon'M\varepsilon) = \frac{1}{n-k}E(\mathrm{tr}(\varepsilon'M\varepsilon))
$$
$$
= \frac{1}{n-k}E(\mathrm{tr}(M\varepsilon\varepsilon')) = \frac{1}{n-k}\mathrm{tr}\left(M E(\varepsilon\varepsilon')\right)
= \frac{\sigma^2}{n-k}\mathrm{tr}(M I_n) = \frac{\sigma^2}{n-k}\mathrm{tr}(M)
$$
$$
= \frac{\sigma^2}{n-k}\mathrm{tr}\left(I_n - X(X'X)^{-1}X'\right)
= \frac{\sigma^2}{n-k}\left(\mathrm{tr}(I_n) - \mathrm{tr}(X(X'X)^{-1}X')\right)
$$
$$
= \frac{\sigma^2}{n-k}\left(n - \mathrm{tr}\left((X'X)^{-1}X'X\right)\right)
= \frac{\sigma^2}{n-k}\left(n - \mathrm{tr}(I_k)\right)
= \frac{\sigma^2}{n-k}(n-k) = \sigma^2.
$$

The result relies on the symmetry and idempotency of M . It also relies on the properties
of the trace (for a review, refer to Chapter 0).
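The unbiasedness of σ̂² can also be seen by simulation. A minimal sketch (not part of the original notes, all values invented): with fixed X, draw the errors many times, compute ε̂'ε̂/(n − k) each time, and average. The key step ε̂ = Mε from the proof is used directly.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, sigma2 = 30, 4, 3.0
X = rng.normal(size=(n, k))
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # residual-maker matrix

estimates = []
for _ in range(20000):
    eps = rng.normal(scale=np.sqrt(sigma2), size=n)
    e_hat = M @ eps                                # residuals: e_hat = M * eps
    estimates.append(e_hat @ e_hat / (n - k))      # sigma_hat^2 with n-k correction

mean_est = np.mean(estimates)   # should be close to sigma2 across replications
```

Dividing by n instead of n − k would bias the estimator downward, which is exactly what the trace argument above quantifies.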

3 Exact inferential theory: testing


Write

Y = Xβ + ε,

where

ε →ᵈ N(0, σ²I_n).

The symbol →ᵈ signifies "distributed as". Because the error terms are normally distributed and Y is a linear combination of normal random variables (with Xβ deterministic), we have

Y →ᵈ N(Xβ, σ²I_n).

Note: we are imposing strong restrictions on the error terms. Not only are we saying that they are mean zero, homoskedastic (same variance) and uncorrelated, we are also saying that they are normally distributed. This will lead to an exact inferential theory. We will see later what we mean by "exact". In Chapter 2, normality will be relaxed. In Chapter 3, we will relax normality, homoskedasticity and uncorrelatedness.


3.1 Classical testing problems

(1) Single linear restriction:

H0 : c'β = γ

or, equivalently,

$$
H_0 : \sum_{j=1}^{k} c_j\beta_j = \gamma.
$$

Example: Standard t-test on the j-th parameter.

H0 : βj = 0.

Write:

$$
c = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \text{ (this is the } j\text{th spot)} \\ \vdots \\ 0 \end{pmatrix}
$$
and γ = 0.
(2) Multiple linear restrictions:

H0 : Rβ = r,  with R (q × k), β (k × 1) and r (q × 1),

with q ≤ k. Here q is the number of restrictions; k is, as always, the number of parameters.

Example: Standard F-test on the slope parameters (excluding the intercept).

H0 : β2 = β3 = ... = βk = 0.

Write:

$$
R_{(k-1)\times k} = \begin{pmatrix}
0 & 1 & 0 & \dots &  \\
0 & 0 & 1 & 0 & \dots \\
\vdots &  &  & \ddots &  \\
0 & \dots &  & 0 & 1
\end{pmatrix}
$$

and

$$
r_{(k-1)\times 1} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.
$$

3.2 Implementation

Recall:

Y →ᵈ N(Xβ, σ²I_n).

Write

β̂ = (X'X)⁻¹X'Y.

Since β̂ is a linear combination of normal random variables (the Y observations), it is also a normal random variable. Hence,

β̂ →ᵈ N(β, σ²(X'X)⁻¹).

3.2.1 Single linear restriction

H0 : c'β = γ.

Construction of the test:

c'β̂ →ᵈ N(c'β, σ²c'(X'X)⁻¹c)


or

c'β̂ − c'β →ᵈ N(0, σ²c'(X'X)⁻¹c)

or

$$
\frac{c'\hat{\beta} - c'\beta}{\sigma\sqrt{c'(X'X)^{-1}c}} \;\overset{d}{\to}\; N(0, 1)
$$

and, under the null hypothesis H0 : c'β = γ,

$$
\frac{c'\hat{\beta} - \gamma}{\sigma\sqrt{c'(X'X)^{-1}c}} \;\overset{d}{\to}\; N(0, 1).
$$

This would be our test statistic, if we knew σ. If we knew σ, we could test the null hypothesis c'β = γ by checking whether this ratio falls in the tails of the normal distribution (i.e., whether it is larger than 2, or smaller than -2, for a 5% level test). Unfortunately, we do not know σ. We estimate σ using σ̂ = √(ε̂'ε̂/(n − k)).
We will show that, when we replace σ with σ̂, the ratio is not standard normal anymore. It is t-distributed with n − k degrees of freedom. The following result will lead to the finding.

► First aside:

$$
\frac{\hat{\varepsilon}'\hat{\varepsilon}}{\sigma^2} \;\overset{d}{\to}\; \chi^2_{n-k},
$$

where χ²_{n−k} is a chi-squared random variable with n − k degrees of freedom.
Proof:
Recall, ε̂ = Mε. Hence,

$$
\frac{\hat{\varepsilon}'\hat{\varepsilon}}{\sigma^2} = \frac{\varepsilon'M'M\varepsilon}{\sigma^2} = \frac{\varepsilon'M\varepsilon}{\sigma^2} = \frac{\varepsilon'Q\Lambda Q'\varepsilon}{\sigma^2}
$$

by the Jordan decomposition of the idempotent matrix M (please refer to Chapter 0). Q is the matrix containing the eigenvectors of M. Note that QQ' = I_n. Λ is the matrix containing the eigenvalues of M on the diagonal and zeros everywhere else. By


idempotency, the eigenvalues of M are either one or zero. Since the trace of M is n − k, it turns out that the number of ones is n − k. Now, notice that

$$
\frac{Q'\varepsilon}{\sigma} \;\overset{d}{\to}\; N\!\left(0, \frac{1}{\sigma^2}Q'(\sigma^2 I_n)Q\right) = N(0, I_n).
$$

Thus, call Q'ε/σ = Z →ᵈ N(0, I_n). This implies

$$
\frac{\varepsilon'Q\Lambda Q'\varepsilon}{\sigma^2} = Z'\Lambda Z = \sum_{i=1}^{n-k} z_i^2 \;\overset{d}{\to}\; \chi^2_{n-k},
$$

since the sum of n − k squared independent standard normal random variables is a chi-squared random variable with n − k degrees of freedom. ◄

► Second aside: Consider a standard normal random variable. Consider a chi-squared random variable with n − k degrees of freedom. Assume the two random variables are independent. Then,

$$
\frac{N(0,1)}{\sqrt{\chi^2_{n-k}/(n-k)}} = t_{n-k},
$$

a t distribution with n − k degrees of freedom. ◄

Let us now go back to

$$
\frac{c'\hat{\beta} - \gamma}{\sigma\sqrt{c'(X'X)^{-1}c}} \;\overset{d}{\to}\; N(0, 1).
$$

Write

$$
\frac{c'\hat{\beta} - \gamma}{\hat{\sigma}\sqrt{c'(X'X)^{-1}c}}
= \frac{\dfrac{c'\hat{\beta} - \gamma}{\sigma\sqrt{c'(X'X)^{-1}c}}}{\sqrt{\dfrac{\hat{\varepsilon}'\hat{\varepsilon}}{\sigma^2(n-k)}}}
\;\overset{d}{\to}\; \frac{N(0,1)}{\sqrt{\chi^2_{n-k}/(n-k)}} = t_{n-k}.
$$

Interpret. When we replace σ with its estimator σ̂, we effectively compute the ratio between a normal random variable and the square root of an independent chi-squared random variable divided by its number of degrees of freedom (n − k, in this case). As in


the second aside, this ratio is distributed as a Student's t with n − k degrees of freedom. Thus,

$$
\frac{c'\hat{\beta} - \gamma}{\hat{\sigma}\sqrt{c'(X'X)^{-1}c}} \;\overset{d}{\to}\; t_{n-k}.
$$

We now test the null hypothesis c'β = γ by checking whether this ratio falls in the tails of the t distribution with n − k degrees of freedom (i.e., whether it is larger than t_{0.025,n−k} - i.e., slightly larger than 2 - or smaller than −t_{0.025,n−k} - i.e., slightly smaller than -2 - for a 5% level test).

Example: Classical t-test (H0 : βj = 0).

The relevant statistic is

$$
t = \frac{\hat{\beta}_j - 0}{\hat{\sigma}\sqrt{(X'X)^{-1}_{jj}}},
$$

where (X'X)^{-1}_{jj} is the j-th element on the diagonal of the matrix (X'X)⁻¹.
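The t statistic above is easy to compute from scratch. A minimal sketch (not part of the original notes, simulated data): estimate β̂ and σ̂, then form t = β̂_j / (σ̂ √((X'X)⁻¹_jj)) for one slope coefficient.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
# Hypothetical truth: second regressor matters (beta = 1), third does not
Y = X @ np.array([0.5, 1.0, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
resid = Y - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (n - k))   # sigma_hat with n-k correction

j = 1  # test H0: beta_2 = 0 (0-indexed position 1)
t_stat = beta_hat[j] / (sigma_hat * np.sqrt(XtX_inv[j, j]))
# |t_stat| greater than roughly 2 rejects H0 at the 5% level (t with n-k dof)
```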

3.2.2 Multiple linear restrictions

H0 : Rβ = r,  with R (q × k), β (k × 1) and r (q × 1).

Construction of the test:

Rβ̂ →ᵈ N(Rβ, σ²R(X'X)⁻¹R')

or

Rβ̂ − Rβ →ᵈ N(0, σ²R(X'X)⁻¹R')

or

$$
Z_q = \sigma^{-1}\left(R(X'X)^{-1}R'\right)^{-1/2}(R\hat{\beta} - R\beta) \;\overset{d}{\to}\; N(0, I_q).
$$

This implies that


$$
Z_q'Z_q = \sigma^{-2}(R\hat{\beta} - R\beta)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat{\beta} - R\beta) \;\overset{d}{\to}\; \chi^2_q
$$

and, under the null hypothesis,

$$
Z_q'Z_q = \sigma^{-2}(R\hat{\beta} - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat{\beta} - r) \;\overset{d}{\to}\; \chi^2_q.
$$

This last result is immediate: the sum of squares of q independent standard normal random variables is a chi-squared random variable with q degrees of freedom.
At this point, we could use the 95th percentile of the chi-squared distribution with q degrees of freedom (χ²_{0.95,q}) to test the null hypothesis. If

$$
\sigma^{-2}(R\hat{\beta} - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat{\beta} - r) \ge \chi^2_{0.95,q},
$$

then we would reject the null hypothesis. The problem, again, is that we do not know σ.
Just like earlier, we will show that when we replace σ with σ̂, the distribution of the test statistic changes. In this case, it changes to that of an F random variable with q and n − k degrees of freedom (when the test statistic is also divided by the number of restrictions q).

► Third aside: Consider a chi-squared random variable with q degrees of freedom, χ²_q. Consider a chi-squared random variable with n − k degrees of freedom, χ²_{n−k}. Assume the two random variables are independent. Then,

$$
\frac{\chi^2_q/q}{\chi^2_{n-k}/(n-k)} \;\overset{d}{\to}\; F_{q,n-k},
$$

an F distribution with q degrees of freedom in the numerator and n − k degrees of freedom in the denominator. ◄

Now, write

$$
\frac{\sigma^{-2}(R\hat{\beta} - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat{\beta} - r)/q}{\dfrac{\hat{\varepsilon}'\hat{\varepsilon}}{\sigma^2(n-k)}}
= \hat{\sigma}^{-2}(R\hat{\beta} - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat{\beta} - r)/q \;\overset{d}{\to}\; F_{q,n-k}.
$$

Thus, when we replace σ with σ̂ (and divide by q), rather than using the 95th percentile of the chi-squared distribution, we use the 95th percentile of the F distribution to test.
Example: Classical F-test with an intercept (H0 : β2 = β3 = ... = βk = 0).
The relevant statistic is

$$
\hat{\sigma}^{-2}(R\hat{\beta})'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat{\beta})/(k-1) \;\overset{d}{\to}\; F_{k-1,n-k}
$$

with

$$
R_{(k-1)\times k} = \begin{pmatrix}
0 & 1 & 0 & \dots &  \\
0 & 0 & 1 & 0 & \dots \\
\vdots &  &  & \ddots &  \\
0 & \dots &  & 0 & 1
\end{pmatrix}.
$$

Note: All of these tests are “exact” in the sense that they are valid for any number of
observations n. Exact tests can be derived only by imposing strong restrictions (like
normality) on the error terms. Without normality, this testing framework would not
hold. In Chapter 2, we will abandon normality of the error terms and derive tests which
are not “exact” (and are, therefore, not valid for any n) but “asymptotic”, i.e., they are
valid only when the number of observations goes off to infinity.
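The F statistic of the previous subsection can be computed directly. A minimal sketch (not part of the original notes, simulated data): test that the two slope coefficients are jointly zero with q = 2 restrictions, replacing σ with σ̂.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
# Hypothetical truth: both slopes are nonzero, so H0 should be rejected
Y = X @ np.array([0.3, 1.5, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / (n - k)

# H0: R beta = r, with q = 2 restrictions on the slopes
R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
r = np.zeros(2)
q = R.shape[0]

diff = R @ beta_hat - r
F_stat = (diff @ np.linalg.solve(R @ XtX_inv @ R.T, diff)) / (q * sigma2_hat)
# Compare F_stat with the 95th percentile of F(q, n-k); here it rejects easily
```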

4 Regression analysis, liquidity and asymmetric information

We are interested in the relation between the average bid-ask spreads on stocks and the characteristics of the corresponding companies (Stoll, 2000). Download the file spreads-microstructure.xls. The file contains information for the 100 stocks in the S&P 100 index. Our variable of interest (the Y variable) is the bid-ask spread (constructed as an average over the day) - or tradecost - of the S&P 100 stocks. The explanatory, or X, variables are:

1. log volatility - The log of the daily return standard deviation

2. log size - The log of the size of the stock. Size is total outstanding number of shares
multiplied by share price. Size is measured in thousands of dollars

3. log trades - This is the log of the average number of trades per day

4. log turn - This is the log of the ratio of the average number of shares traded per
day (in dollars) over the number of shares outstanding (in dollars)

5. NumberAnalysts - This is the number of analysts following the stock

The same data is used in Bandi and Russell (2007). Consider the following theories of
the determinants of bid-ask spreads.
1. Asymmetric information. Stocks with greater degrees of asymmetry in infor-
mation (regarding their fundamental value) tend to have wider bid-ask spreads. The
number of analysts following a stock is viewed as an asymmetric information proxy. The
larger it is, the lower private information, the smaller the spreads. Log turn-over is, also,
seen as an asymmetric information proxy. The larger it is, the larger private information,
the larger the spreads. (As Stoll, 1989, points out, without informed trading, stocks
would be traded in proportion to their shares outstanding. Trading rates in excess of this
proportion should be associated with informed trading.)
2. Liquidity. Stocks that trade more frequently and have larger market capitalization
(i.e., more liquid stocks) tend to have lower bid-ask spreads. The larger log trades and
log size, the larger liquidity, the smaller the spreads. Log turn-over is, also, sometimes
seen as a liquidity proxy. The larger it is, the larger liquidity, the smaller the spreads.
3. Fundamental volatility. Stocks that have a higher volatility of fundamental
values tend to have larger bid-ask spreads. Higher uncertainty about the underlying
stock’s value implies higher potential for adverse price moves and, hence, higher inventory
risk, mostly in the presence of large imbalances to offset (Ho and Stoll, 1981).


Asymmetric information is a term used to describe a situation where some market participants are better informed than others about the value of the asset. The greater the degree of asymmetric information, the wider the spreads should be, as the market makers (who are not fully informed) charge a higher price when selling (raising the ask) and a lower price when buying (lowering the bid) to insulate themselves from losing money to potentially better informed traders. We do not get to see the degree of asymmetric information between market participants, but we do know the following: 1) The larger the number of analysts following a stock, the lower we expect asymmetric information to be for that stock. Analysts provide information about the stock, thereby uncovering its fundamental value. 2) The greater the turn-over of the stock, the higher we expect asymmetric information to be for that stock. Intuitively, the greater the turnover, the faster individuals are getting in and out of investment positions in the stock. They do this more when they believe the current price does not accurately reflect the fundamental value of the stock - that is, when they believe they possess asymmetric (superior) information. As indicated, turn-over is also viewed, by some, as a liquidity proxy.

1. Generate a histogram of the bid-ask spreads. What do you notice?

[Histogram of TRADECOST, sample 1-100, 100 observations. Summary statistics: mean 0.000643, median 0.000533, maximum 0.002967, minimum 0.000319, std. dev. 0.000399, skewness 3.886, kurtosis 21.202, Jarque-Bera 1632.208 (probability 0.000000).]

Figure 2: Histogram of the spreads

There is apparent skewness.


2. Take a logarithmic transformation of the bid-ask spreads. Plot a histogram again. What do you notice now?

[Histogram of LOG_TRADECOST, sample 1-100, 100 observations. Summary statistics: mean -7.451748, median -7.536629, maximum -5.820198, minimum -8.049928, std. dev. 0.403460, skewness 1.625, kurtosis 6.614, Jarque-Bera 98.402 (probability 0.000000).]

Figure 3: Histogram of the log(spreads)

A bit more "normal" or "Gaussian".

3. Run a least-squares regression of the log bid-ask spread on the 5 explanatory variables.


Dependent Variable: LOG TRADECOST
Method: Least Squares (Gauss-Newton / Marquardt steps)
Sample: 1 100
Included observations: 100
LOG TRADECOST = C(1) + C(2)*LOG SIZE + C(3)*LOG VOLATILITY + C(4)*NUMBERANALYSTS + C(5)*LOG TRADES + C(6)*LOG TURN

            Coefficient   Std. Error   t-Statistic   Prob.
C(1)        -0.829315     0.442769     -1.873022     0.0642
C(2)        -0.140288     0.023910     -5.867276     0.0000
C(3)         1.022999     0.053245     19.21317      0.0000
C(4)         5.04E-05     0.002913      0.017313     0.9862
C(5)        -0.169363     0.035596     -4.757966     0.0000
C(6)        -0.098714     0.032502     -3.037139     0.0031

R-squared             0.882467    Mean dependent var     -7.451748
Adjusted R-squared    0.876216    S.D. dependent var      0.403460
S.E. of regression    0.141950    Akaike info criterion  -1.008566
Sum squared resid     1.894069    Schwarz criterion      -0.852256
Log likelihood        56.42829    Hannan-Quinn criter.   -0.945304
F-statistic           141.1555    Durbin-Watson stat      1.894852
Prob(F-statistic)     0.000000

4. Interpret the coefficients economically (in light of the three theories above) and statistically. Do you find your results surprising?
The theories have some empirical validation. Importantly, all variables are statistically significant, with the exception of the number of analysts. Take volatility, for instance. To test statistical significance, one could run the following "single restriction" test. We are testing whether c(3) = 0. In other words, we are testing the null

H0 : β3 = 0.

Write

$$
c = \begin{pmatrix} 0 \\ 0 \\ 1 \text{ (this is the 3rd spot)} \\ \vdots \\ 0 \end{pmatrix}
$$

and γ = 0. Thus,

$$
t = \frac{\hat{\beta}_3 - 0}{\hat{\sigma}\sqrt{(X'X)^{-1}_{3,3}}}
= \frac{1.022999 - 0}{0.053245} = 19.21,
$$

where (X'X)^{-1}_{3,3} is the 3rd element on the diagonal of the matrix (X'X)⁻¹.
Since |19.21| > 2, the parameter associated with volatility is "statistically different" from zero. In fact, volatility is the most statistically significant of all assumed predictors.
Note: if one wished to use the critical values of the t distribution with n − k degrees of freedom (in our case, 100 − 6 = 94), these critical values would be close to -2 and 2 (since the t density function would be very similar to the normal density function).
Alternatively, one could use a "multiple restriction" test to test a single restriction. This would effectively amount to using a "one-sided" test rather than a "two-sided" test. Write

H0 : Rβ = r,  with R (q × k), β (k × 1) and r (q × 1),

where q = 1, the vector R is the same as the vector c above (transposed, of course), and the scalar r is the same as the scalar γ above, namely 0. Hence,

H0 : c'β = γ = 0.

The statistic would be:

$$
\hat{\sigma}^{-2}\,\hat{\beta}_3\left((X'X)^{-1}_{3,3}\right)^{-1}\hat{\beta}_3/1 \;\overset{d}{\to}\; F_{1,94}.
$$

Notice that the F statistic would now effectively be the square of the t statistic.
Here is the output.
Wald Test:
Equation: Untitled

Test Statistic    Value       df        Probability
t-statistic       19.21317    94        0.0000
F-statistic       369.1457    (1, 94)   0.0000
Chi-square        369.1457    1         0.0000

Null Hypothesis: C(3)=0
Null Hypothesis Summary:

Normalized Restriction (= 0)    Value       Std. Err.
C(3)                            1.022999    0.053245

Restrictions are linear in coefficients.

In the output above, one could ignore - for the time being - the remaining test (called Chi-square). We will return to it in the next chapter.

5. Test the assumption that the coefficient associated with log volatility
is equal to 1. If this is the case, how would you interpret the relation
between daily volatility and bid-ask spreads?


Again, this is a single restriction test. We are testing whether c(3) = 1. In other words:

H0 : β3 = 1.

Write

$$
c = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \text{ (this is the 3rd spot)} \\ \vdots \\ 0 \end{pmatrix}
$$

and γ = 1. Specifically, write

$$
t = \frac{\hat{\beta}_3 - 1}{\hat{\sigma}\sqrt{(X'X)^{-1}_{3,3}}}
= \frac{1.022999 - 1}{0.053245} = 0.43.
$$

Hence, we "fail" to reject. The true volatility slope could be 1. Of course, as earlier, we could have used a one-sided F test as well.
What does it mean to have a slope equal to 1? The slope is

$$
\frac{\partial \log(\text{tradecosts})}{\partial \log(\text{volatility})}
= \frac{\partial \text{tradecosts}/\text{tradecosts}}{\partial \text{volatility}/\text{volatility}} = 1.
$$

Hence, since the regression is log/log ("logarithm on logarithm"), the slope has an interpretation in terms of elasticity. A 1% increase in volatility translates into a 1% increase in tradecosts.


6. Test the assumption that the coefficients associated with log size and log trades are equal to each other. Be as precise as possible.
We use a classical F test with 1 restriction. See below. We "fail" to reject. Clear from the p-value, right?

Wald Test:
Equation: Untitled

Test Statistic    Value       df        Probability
t-statistic       0.672016    94        0.5032
F-statistic       0.451606    (1, 94)   0.5032
Chi-square        0.451606    1         0.5016

Null Hypothesis: C(2)=C(5)
Null Hypothesis Summary:

Normalized Restriction (= 0)    Value       Std. Err.
C(2) - C(5)                     0.029075    0.043265

Restrictions are linear in coefficients.

7. Test the assumption that the coefficients associated with log turnover and number of analysts are jointly equal to zero.
We use a classical F test with 2 restrictions. See below. We reject. Again, look at the p-value ...

Wald Test:
Equation: Untitled

Test Statistic    Value       df        Probability
F-statistic       4.979229    (2, 94)   0.0088
Chi-square        9.958458    2         0.0069

Null Hypothesis: C(4)=0, C(6)=0
Null Hypothesis Summary:

Normalized Restriction (= 0)    Value       Std. Err.
C(4)                            5.04E-05    0.002913
C(6)                           -0.098714    0.032502

Restrictions are linear in coefficients.

8. Let us use our model to predict what the spread will look like tomor-
row. Use a regression which does not include the number of analysts
to predict. Consider a stock which has a log size of 10.5. Suppose that
for this stock we expect that for the following day the log turnover will
be -1.1, the log of the number of trades will be 7.6, and the log of the
standard deviation will be -3.5. Predict what the spread will be for this
stock tomorrow. (Note that since the regression is run with log spreads
you will have to make a transformation to convert your prediction for
the log spread into a prediction for the actual spread ...).
Here is the regression output excluding the number of analysts. All other parameter
estimates are robust to this exclusion, i.e., similar to previous results.


Dependent Variable: LOG TRADECOST
Method: Least Squares (Gauss-Newton / Marquardt steps)
Sample: 1 100
Included observations: 100
LOG TRADECOST = C(1) + C(2)*LOG SIZE + C(3)*LOG VOLATILITY + C(4)*LOG TRADES + C(5)*LOG TURN

            Coefficient   Std. Error   t-Statistic   Prob.
C(1)        -0.828920     0.439847     -1.884563     0.0625
C(2)        -0.140277     0.023775     -5.900212     0.0000
C(3)         1.023106     0.052606     19.44864      0.0000
C(4)        -0.169296     0.035201     -4.809382     0.0000
C(5)        -0.098864     0.031164     -3.172391     0.0020

R-squared             0.882467    Mean dependent var     -7.451748
Adjusted R-squared    0.877518    S.D. dependent var      0.403460
S.E. of regression    0.141201    Akaike info criterion  -1.028563
Sum squared resid     1.894075    Schwarz criterion      -0.898304
Log likelihood        56.42813    Hannan-Quinn criter.   -0.975845
F-statistic           178.3208    Durbin-Watson stat      1.895160
Prob(F-statistic)     0.000000

Let us do the prediction:

$$
\widehat{\log(\text{tradecosts})} = -0.829 - 0.14 \times 10.5 + 1.023 \times (-3.5) - 0.17 \times 7.6 - 0.099 \times (-1.1) = -7.0626.
$$

Now, to obtain a prediction for tradecosts rather than for log(tradecosts), we need to exponentiate. We write

$$
\widehat{\text{tradecosts}} = e^{-7.0626} = 0.00085655,
$$

which is a value larger than the historical mean (see the histogram in point 1 above).
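The prediction step can be scripted directly. A minimal sketch (not part of the original notes) using the rounded coefficient estimates quoted in the calculation above; the regressor values are the ones given in the exercise.

```python
import math

# Rounded coefficient estimates from the regression without the number of analysts
c = [-0.829, -0.14, 1.023, -0.17, -0.099]
log_size, log_vol, log_trades, log_turn = 10.5, -3.5, 7.6, -1.1

log_spread = (c[0] + c[1] * log_size + c[2] * log_vol
              + c[3] * log_trades + c[4] * log_turn)
spread = math.exp(log_spread)   # undo the log transformation
```

Note that exponentiating the predicted log spread is the simple back-transformation used in the text; it ignores the Jensen-inequality adjustment that a formal conditional-mean prediction would require.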


5 Appendix I

5.1 Another useful idempotent matrix

Consider

L = I_n − i(i'i)⁻¹i',

where i is an n × 1 column vector of ones. L is, of course, symmetric and idempotent. More explicitly,

L = I_n − (ii')/n

or

$$
L = \begin{pmatrix}
1 & 0 & \dots & \dots \\
 & 1 &  &  \\
 &  & \ddots &  \\
 &  &  & 1
\end{pmatrix}
- \frac{1}{n}\begin{pmatrix}
1 & 1 & \dots & 1 \\
1 & 1 & \dots & 1 \\
\vdots &  &  & \vdots \\
1 & 1 & \dots & 1
\end{pmatrix}.
$$
The matrix L transforms any n × 1 column vector y into deviations from the mean. In fact,

$$
Ly = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
- \frac{1}{n}\begin{pmatrix}
1 & 1 & \dots & 1 \\
1 & 1 & \dots & 1 \\
\vdots &  &  & \vdots \\
1 & 1 & \dots & 1
\end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
= \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
- \begin{pmatrix} \bar{Y} \\ \bar{Y} \\ \vdots \\ \bar{Y} \end{pmatrix}
= \begin{pmatrix} y_1 - \bar{Y} \\ y_2 - \bar{Y} \\ \vdots \\ y_n - \bar{Y} \end{pmatrix}.
$$


5.2 More on partitioned regressions (the scalar case with an intercept)

Consider the case

Y = Xβ + ε,

where

$$
X = \begin{pmatrix} 1 & x_{11} \\ 1 & x_{12} \\ \vdots & \vdots \\ 1 & x_{1n} \end{pmatrix}
\quad\text{and}\quad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}.
$$

This is the standard scalar case. What is β̂2? It is

β̂2 = (X2'M1X2)⁻¹X2'M1Y

but, in this case, M1 = L. Hence,

$$
\hat{\beta}_2 = (X_2'LX_2)^{-1}(X_2'LY)
= (X_2'LLX_2)^{-1}(X_2'LLY)
= (X_2'L'LX_2)^{-1}(X_2'L'LY)
$$
$$
= \left((LX_2)'(LX_2)\right)^{-1}(LX_2)'(LY)
= \frac{\sum_{i=1}^{n}(x_i - \bar{X})(y_i - \bar{Y})}{\sum_{i=1}^{n}(x_i - \bar{X})^2},
$$


which has a very familiar form from univariate regression analysis, right? Also, this is
how you compute the beta of an asset ... recall?
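The equivalence between the partitioned formula and the familiar covariance-over-variance slope can be checked on simulated data (an illustrative sketch; the data and seed are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.5 + 2.0 * x + rng.normal(size=100)

# Full OLS with an intercept: beta_hat = (X'X)^{-1} X'Y
X = np.column_stack([np.ones_like(x), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Slope via the demeaned (partitioned) formula
slope = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)

print(np.isclose(beta_hat[1], slope))  # True: the two slopes coincide
```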

6 Appendix II: Miscellanea


6.1 The R2
Write

$$\sum_{i=1}^{n} (y_i - \bar{Y})^2 = (LY)'(LY),$$

but

$$Y = \hat{Y} + \hat{\varepsilon}.$$

Hence,

$$\begin{aligned}
(LY)'(LY) &= (L(\hat{Y} + \hat{\varepsilon}))'(L(\hat{Y} + \hat{\varepsilon})) \\
&= (L\hat{Y})'(L\hat{Y}) + (L\hat{\varepsilon})'(L\hat{\varepsilon}) + 2(L\hat{Y})'(L\hat{\varepsilon}) \\
&= (L\hat{Y})'(L\hat{Y}) + \hat{\varepsilon}'\hat{\varepsilon} + 2(L\hat{Y})'\hat{\varepsilon},
\end{aligned}$$

since, with an intercept, the mean of the residuals is zero. In addition

$$(L\hat{Y})'\hat{\varepsilon} = \hat{Y}'L'\hat{\varepsilon} = \hat{Y}'L\hat{\varepsilon} = \hat{Y}'\hat{\varepsilon} = 0.$$

Thus,

$$\sum_{i=1}^{n} (y_i - \bar{Y})^2 = (LY)'(LY) = (L\hat{Y})'(L\hat{Y}) + \hat{\varepsilon}'\hat{\varepsilon} = \sum_{i=1}^{n} (\hat{y}_i - \bar{Y})^2 + \sum_{i=1}^{n} \hat{\varepsilon}_i^2.$$

We expressed the total variation of the Y observations (the total sum of squares, or SST) as the sum of the variation of the fitted values (the regression sum of squares, or SSR) and the variation of the residuals (the residual sum of squares, or SSE). Now write

$$1 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (y_i - \bar{Y})^2} + \frac{\sum_{i=1}^{n} \hat{\varepsilon}_i^2}{\sum_{i=1}^{n} (y_i - \bar{Y})^2} = \frac{SSR}{SST} + \frac{SSE}{SST}.$$

Define

$$R^2 = \frac{SSR}{SST}.$$

Naturally,

$$0 \le R^2 \le 1.$$

The closer the R2 is to 1 the better the fit (the larger the variance of Y that is explained
by the regression or, equivalently, the smaller SSE, the better the fit). The closer the
R2 is to 0, the worse the fit (the larger is SSE).
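The decomposition and the resulting $R^2$ are straightforward to verify on simulated data (an illustrative sketch, not the tradecosts data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
resid = y - y_hat

SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((y_hat - y.mean()) ** 2)  # with an intercept, fitted values have the same mean as y
SSE = np.sum(resid ** 2)

print(np.isclose(SST, SSR + SSE))  # True: SST = SSR + SSE
print(SSR / SST, 1 - SSE / SST)    # two identical ways of writing R^2
```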
(1) First alternative (and equivalent) way to write the R2 :

$$R^2 = 1 - \frac{SSE}{SST}.$$
This expression is useful. Note: $SSE = \sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \sum_{i=1}^{n} \left(y_i - \hat{\beta}' x_i\right)^2$. Recall, OLS finds $\hat{\beta}$ which minimizes SSE. Hence, SSE from a regression with k regressors can never be smaller than SSE from a regression with k + 1 regressors. Why? Well, because I could always set the extra parameter equal to zero and, at the very least, achieve the value that I would obtain with only k regressors. Hence, the $R^2$ from a regression with k + 1 regressors can never be smaller than the $R^2$ from a regression with k regressors. This is mechanical. It is also problematic since the increase might not have anything to do with the actual explanatory power of the extra regressor. In sum: just adding regressors (irrespective of their explanatory power) weakly increases the $R^2$. Thus, the $R^2$ cannot be a perfect measure of goodness of fit.

(2) Second alternative (and equivalent) way to write the R2 :


$$R^2 = \frac{SSR}{SST} = \frac{\frac{1}{n-1}\sum_{i=1}^{n} (\hat{y}_i - \bar{Y})^2}{\frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{Y})^2} = \frac{s_{\hat{y}}^2}{s_y^2} = \frac{s_{\hat{y}}^2 s_{\hat{y}y}}{s_y^2 s_{\hat{y}y}},$$

where $s_{\hat{y}}^2$ is the sample variance of the fitted values, $s_y^2$ is the sample variance of the y observations, and $s_{\hat{y}y}$ is the sample covariance between fitted values and true values. But,

$$s_{\hat{y}y} = cov(Y, \hat{Y}) = cov(\hat{Y} + \hat{\varepsilon}, \hat{Y}) = var(\hat{Y}) = s_{\hat{y}}^2.$$

Hence,

$$R^2 = \frac{s_{\hat{y}}^2 s_{\hat{y}y}}{s_y^2 s_{\hat{y}y}} = \frac{s_{\hat{y}}^2 s_{\hat{y}y}}{s_y^2 s_{\hat{y}}^2} = \frac{s_{\hat{y}y}^2}{s_y^2 s_{\hat{y}}^2} = r_{\hat{y}y}^2.$$
The $R^2$ is just the squared empirical correlation between the fitted values and the Y values. In the scalar case, $R^2 = r_{yx}^2$, the squared empirical correlation between the regressor x and the regressand y. This observation is also useful: the $R^2$ is just a descriptive statistic. In the scalar case, it does not contain more information than the correlation between Y and X. In fact, it is the correlation between Y and X raised to the second power and could, therefore, be computed before inference begins.
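Both correlation identities can be checked numerically (simulated data, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=150)
y = 0.3 + 1.2 * x + rng.normal(size=150)

# OLS fit with an intercept
X = np.column_stack([np.ones_like(x), x])
y_hat = X @ np.linalg.solve(X.T @ X, X.T @ y)

R2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
r_yhat_y = np.corrcoef(y_hat, y)[0, 1]  # correlation between fitted and actual
r_yx = np.corrcoef(y, x)[0, 1]          # correlation between y and x

print(np.isclose(R2, r_yhat_y ** 2))  # True
print(np.isclose(R2, r_yx ** 2))      # True in the scalar case
```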


6.2 The adjusted R2


As we discussed, the value of the R2 increases mechanically with the number of regressors. Idea: penalize the goodness-of-fit measure for using a large number of regressors k, which might spuriously increase the R2. The adjusted R2 contains a penalty. It is defined as

$$\bar{R}^2 = 1 - \frac{SSE/(n-k)}{SST/(n-1)} = 1 - \frac{\hat{\sigma}^2}{s_y^2}.$$

A larger k decreases SSE. We discussed this effect earlier. However, if $k \uparrow$, then $n - k \downarrow$, $SSE/(n-k) \uparrow$, and $\bar{R}^2 \downarrow$, keeping SSE fixed. The two effects might, at least partially, compensate each other. Thus, dividing by $n - k$ introduces a penalty.
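A minimal sketch of the two definitions, on simulated data and assuming k counts all regressors including the intercept (illustrative, not from the text):

```python
import numpy as np

def r2_and_adjusted(y, X):
    """OLS R^2 and adjusted R^2; X must include the column of ones."""
    n, k = X.shape
    resid = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
    SSE = resid @ resid
    SST = np.sum((y - y.mean()) ** 2)
    return 1 - SSE / SST, 1 - (SSE / (n - k)) / (SST / (n - 1))

rng = np.random.default_rng(3)
n = 100
y = rng.normal(size=n)
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = np.column_stack([X1, rng.normal(size=n)])  # add a useless regressor

r2_small, adj_small = r2_and_adjusted(y, X1)
r2_big, adj_big = r2_and_adjusted(y, X2)

# R^2 never falls when a regressor is added; the adjusted R^2 can
print(r2_small, adj_small)
print(r2_big, adj_big)
```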

6.3 Re-writing hypothesis testing in a friendlier (for empirical work) format
Assume one wants to test the general restriction H0 : Rβ = r.
Theorem.

$$\hat{\sigma}^{-2}(R\hat{\beta} - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat{\beta} - r)/q = \frac{\left(\hat{\varepsilon}^{*\prime}\hat{\varepsilon}^{*} - \hat{\varepsilon}'\hat{\varepsilon}\right)/q}{\hat{\varepsilon}'\hat{\varepsilon}/(n-k)} \xrightarrow{d} F_{q,n-k},$$

where the $\hat{\varepsilon}^{*}$'s are the estimated residuals from a regression which imposes the restriction $R\beta = r$.
Proof.


We want to show that $\hat{\sigma}^{-2}(R\hat{\beta} - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat{\beta} - r)/q$ and $\frac{\left(\hat{\varepsilon}^{*\prime}\hat{\varepsilon}^{*} - \hat{\varepsilon}'\hat{\varepsilon}\right)/q}{\hat{\varepsilon}'\hat{\varepsilon}/(n-k)}$ are identical. Note that

$$\hat{\sigma}^{-2}(R\hat{\beta} - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat{\beta} - r)/q = \frac{(R\hat{\beta} - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat{\beta} - r)/q}{\hat{\varepsilon}'\hat{\varepsilon}/(n-k)}.$$

Hence, we only need to show that

$$(R\hat{\beta} - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat{\beta} - r) = \hat{\varepsilon}^{*\prime}\hat{\varepsilon}^{*} - \hat{\varepsilon}'\hat{\varepsilon}.$$

First, compute the constrained OLS estimator $\beta^{*}$:

$$\beta^{*} = \arg\min_{\beta} C(\beta, \lambda) = (Y - X\beta)'(Y - X\beta) + \lambda'(R\beta - r).$$

The first-order conditions are:

$$\frac{\partial C(\beta, \lambda)}{\partial \beta} = 0 \Rightarrow -X'(Y - X\beta^{*}) + R'\lambda = 0,$$
$$\frac{\partial C(\beta, \lambda)}{\partial \lambda} = 0 \Rightarrow R\beta^{*} = r.$$

Thus,

$$R'\lambda = X'Y - (X'X)\beta^{*}$$

and

$$(X'X)^{-1}R'\lambda = (X'X)^{-1}X'Y - (X'X)^{-1}(X'X)\beta^{*}$$

or

$$(X'X)^{-1}R'\lambda = \hat{\beta} - \beta^{*}. \quad (1)$$


In addition,

$$R(X'X)^{-1}R'\lambda = R(\hat{\beta} - \beta^{*}) = R\hat{\beta} - r$$

and

$$\lambda = \left(R(X'X)^{-1}R'\right)^{-1}(R\hat{\beta} - r). \quad (2)$$

Now, consider

$$\begin{aligned}
\hat{\varepsilon}^{*\prime}\hat{\varepsilon}^{*} &= (Y - X\beta^{*})'(Y - X\beta^{*}) \\
&= (Y - X\hat{\beta} + X\hat{\beta} - X\beta^{*})'(Y - X\hat{\beta} + X\hat{\beta} - X\beta^{*}) \\
&= (Y - X\hat{\beta})'(Y - X\hat{\beta}) + (\hat{\beta} - \beta^{*})'X'X(\hat{\beta} - \beta^{*}) + 2(\hat{\beta} - \beta^{*})'X'(Y - X\hat{\beta}) \\
&= \hat{\varepsilon}'\hat{\varepsilon} + (\hat{\beta} - \beta^{*})'X'X(\hat{\beta} - \beta^{*}) + 2(\hat{\beta} - \beta^{*})'X'\hat{\varepsilon} \\
&= \hat{\varepsilon}'\hat{\varepsilon} + (\hat{\beta} - \beta^{*})'X'X(\hat{\beta} - \beta^{*}).
\end{aligned}$$

Thus,

$$\begin{aligned}
\hat{\varepsilon}^{*\prime}\hat{\varepsilon}^{*} - \hat{\varepsilon}'\hat{\varepsilon} &= (\hat{\beta} - \beta^{*})'X'X(\hat{\beta} - \beta^{*}) \\
&= \lambda' R(X'X)^{-1} X'X (X'X)^{-1} R'\lambda \\
&= \lambda' R(X'X)^{-1} R'\lambda \\
&= (R\hat{\beta} - r)'\left(R(X'X)^{-1}R'\right)^{-1} R(X'X)^{-1}R' \left(R(X'X)^{-1}R'\right)^{-1}(R\hat{\beta} - r) \\
&= (R\hat{\beta} - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat{\beta} - r).
\end{aligned}$$

Done.
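The identity proved above can be verified numerically. The sketch below tests the (made-up) restriction that both slopes are zero on simulated data, so the restricted regression contains only an intercept:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.4, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat                 # unrestricted residuals

# Restriction R beta = r selecting the two slopes, with r = 0
R = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
r = np.zeros(2)

# Left side: (R beta_hat - r)' (R (X'X)^{-1} R')^{-1} (R beta_hat - r)
d = R @ beta_hat - r
lhs = d @ np.linalg.solve(R @ XtX_inv @ R.T, d)

# Right side: restricted SSE (intercept-only regression) minus unrestricted SSE
e_star = y - y.mean()
rhs = e_star @ e_star - e @ e

print(np.isclose(lhs, rhs))  # True
```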
Theorem.
We can write


$$\hat{\sigma}^{-2}(R\hat{\beta} - r)'\left(R(X'X)^{-1}R'\right)^{-1}(R\hat{\beta} - r)/q = \frac{\left(\hat{\varepsilon}^{*\prime}\hat{\varepsilon}^{*} - \hat{\varepsilon}'\hat{\varepsilon}\right)/q}{\hat{\varepsilon}'\hat{\varepsilon}/(n-k)} = \frac{\left(R^2 - R^{*2}\right)/q}{\left(1 - R^2\right)/(n-k)} \xrightarrow{d} F_{q,n-k},$$

where the $\hat{\varepsilon}^{*}$'s are the estimated residuals from a regression which imposes the restriction $R\beta = r$ and $R^{*2}$ is the $R^2$ from the same regression.
Proof. Recall,

$$R^2 = 1 - \frac{\hat{\varepsilon}'\hat{\varepsilon}}{SST}, \qquad R^{*2} = 1 - \frac{\hat{\varepsilon}^{*\prime}\hat{\varepsilon}^{*}}{SST},$$

and

$$\hat{\varepsilon}'\hat{\varepsilon} = SST - (SST)R^2, \qquad \hat{\varepsilon}^{*\prime}\hat{\varepsilon}^{*} = SST - (SST)R^{*2}.$$

Then

$$\frac{\left(\hat{\varepsilon}^{*\prime}\hat{\varepsilon}^{*} - \hat{\varepsilon}'\hat{\varepsilon}\right)/q}{\hat{\varepsilon}'\hat{\varepsilon}/(n-k)} = \frac{\left(SST - (SST)R^{*2} - SST + (SST)R^2\right)/q}{\left(SST - (SST)R^2\right)/(n-k)} = \frac{\left(1 - R^{*2} - 1 + R^2\right)/q}{\left(1 - R^2\right)/(n-k)} = \frac{\left(R^2 - R^{*2}\right)/q}{\left(1 - R^2\right)/(n-k)}.$$

Done.
Note: it is trivial to compute $R^2$ and $R^{*2}$ from virtually all unrestricted and restricted regressions (provided the restrictions are linear). Hence, this is a very useful way to re-express our general tests. The classical F test for $H_0: \beta_2 = \beta_3 = ... = \beta_k = 0$, for example, is

$$\frac{R^2/(k-1)}{\left(1 - R^2\right)/(n-k)} \xrightarrow{d} F_{k-1,n-k}.$$

Why? $R^{*2}$ from the restricted regression is zero (the restricted regression only contains an intercept). The number of restrictions q is, of course, equal to $k - 1$.
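The SSE-based and $R^2$-based forms of this joint significance test coincide by construction, which is easy to confirm on simulated data (illustrative sketch; no p-value is computed to keep the example self-contained):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 150, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([0.5, 0.3, -0.2, 0.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
SST = np.sum((y - y.mean()) ** 2)
R2 = 1 - (e @ e) / SST

q = k - 1  # number of restrictions: all slopes equal to zero

# SSE form: the restricted model is intercept-only, so restricted SSE = SST
F_sse = ((SST - e @ e) / q) / ((e @ e) / (n - k))

# R^2 form: R*^2 = 0 for the intercept-only regression
F_r2 = (R2 / q) / ((1 - R2) / (n - k))

print(np.isclose(F_sse, F_r2))  # True
```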

