Applied Financial Econometrics Using Stata 3. Linear Factor Models

Stan Hurn
Queensland University of Technology

Introduction to .do Files

The Problem
One of the most common problems in empirical asset pricing concerns the
estimation and evaluation of linear factor models. There is a large
literature on the econometric techniques used to estimate and evaluate these
models, which addresses the following questions:
- how to estimate the parameters
- how to calculate standard errors of the pricing errors
- how to test the model

The Data
The data are monthly percentage returns for the period July 1926 to
December 2013 (T = 1050) on 25 portfolios (r1 to r25) sorted by
size and book-to-market value, together with the risk-free rate (US Treasury
bill rate) and the return on the market (S&P500 index). The data are
freely available from Ken French's website:
http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
The series in the file fama_french.dta are:

r1-r25 = monthly returns to the portfolios
rm_rf = excess market return
rf = risk-free rate of return
I will upload another file famafrench.dta which will contain some
additional series for you to play around with.

A First .do File
set more off
version 13
clear *

// set up a log file
capture log close capm
log using capm, name(capm) replace

// set current working directory
cd ~/Dropbox/Teaching/Singapore/do

// load the data and add a few labels
use fama_french.dta, clear
label variable rm "Market Return"
label variable rf "Risk Free Rate"

// format date variable and set data set as time series
format %td dateid01
tsset dateid01

When Run ...
. set more off

. version 13
. clear *
.
. // set up a log file
. capture log close capm
. log using capm, name(capm) replace
name: capm
log: /Users/stanhurn/Dropbox/TEACHING/SIngapore/do/capm.smcl
log type: smcl
opened on: 13 Mar 2014, 17:24:36
.
. // set current working directory
. cd ~/Dropbox/Teaching/Singapore/do
/Users/stanhurn/Dropbox/TEACHING/SIngapore/do
.
. // load the data and add a few labels
. use fama_french.dta, clear
. label variable rm "Market Return"
. label variable rf "Risk Free Rate"
.

Some plots
// format date variable and set data set as time series
format %td dateid01
tsset dateid01

// plot return on market and rf on same graph using same y-axes
twoway (tsline rm) (tsline rf), name(factors0, replace) ///
    tlabel(,angle(forty_five) format(%tdCCYY)) xtitle("")
graph export "../factors0.pdf", as(pdf) replace

// plot return on market and rf on same graph using different y-axes
twoway (tsline rm, yaxis(1)) (tsline rf, yaxis(2)), name(factors, replace) ///
    tlabel(,angle(forty_five) format(%tdCCYY)) xtitle("")
graph export "../factors.pdf", as(pdf) replace

Plot of the Market Return and Risk Free Rate
[Figure: time-series plot of Market Return and Risk Free Rate on a common y-axis (-40 to 40), 1930 to 2010]

Plot of the Market Return and Risk Free Rate
[Figure: time-series plot with Market Return on the left y-axis (-40 to 40) and Risk Free Rate on the right y-axis (0 to 1.5), 1930 to 2010]

Estimating a Simple CAPM

One Factor Pricing Model
Define the excess returns $z_{it} = r_{it} - r_{ft}$. If the pricing factor, $f_t$, is also an
excess return then the fundamental pricing model states that the expected excess
returns are linear in the betas
$$E(z_{it}) = \beta_i E(f_t).$$
This model is usually evaluated in the form of a time-series linear regression
$$z_{it} = \alpha_i + \beta_i f_t + u_{it}.$$
Comparing the model and the expectation of the time-series regression, it
follows that all the regression intercepts $\alpha_i$ should be zero. In other words
the regression intercepts are equal to the pricing errors.
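
To make the time-series regression concrete, here is a minimal sketch in plain Python (not Stata) that forms excess returns and computes the OLS intercept and slope for one portfolio. The function name ols_alpha_beta and all of the numbers are hypothetical and chosen only for illustration; they are not taken from the Fama-French file.

```python
def ols_alpha_beta(z, f):
    """Return (alpha, beta) from regressing z on a constant and f by OLS."""
    T = len(z)
    f_bar = sum(f) / T
    z_bar = sum(z) / T
    s_ff = sum((ft - f_bar) ** 2 for ft in f)
    s_fz = sum((ft - f_bar) * (zt - z_bar) for ft, zt in zip(f, z))
    beta = s_fz / s_ff
    alpha = z_bar - beta * f_bar  # the intercept is the pricing error
    return alpha, beta

# Hypothetical monthly data (percent): portfolio return, risk-free rate, factor
r = [2.1, -0.7, 1.6, 0.4, 3.0, -1.9]
rf = [0.3, 0.3, 0.2, 0.2, 0.3, 0.3]
f = [1.0, -0.5, 0.8, 0.1, 1.5, -1.0]

# Excess returns: portfolio return minus the risk-free rate
z = [rt - rft for rt, rft in zip(r, rf)]
alpha, beta = ols_alpha_beta(z, f)
print(f"alpha (pricing error) = {alpha:.4f}, beta = {beta:.4f}")
```

If the factor model held exactly in sample, alpha would be zero; the tests later in these slides ask whether the estimated intercepts are statistically distinguishable from zero.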

When Run ...
. // estimate CAPM for first portfolio and test alpha = 0 and beta = 1
. reg z1 rm_rf
Source SS df MS Number of obs = 1021
F( 1, 1019) = 1103.83
Model 79600.6792 1 79600.6792 Prob > F = 0.0000
Residual 73483.5518 1019 72.1133973 R-squared = 0.5200
Adj R-squared = 0.5195
Total 153084.231 1020 150.082579 Root MSE = 8.492
z1 Coef. Std. Err. t P>|t| [95% Conf. Interval]
rm_rf 1.63492 .0492092 33.22 0.000 1.538357 1.731483

_cons -.5916513 .2676044 -2.21 0.027 -1.11677 -.0665325
.
. // test the model
. test _cons
( 1) _cons = 0
F( 1, 1019) = 4.89
Prob > F = 0.0273
. test rm_rf=1
( 1) rm_rf = 1
F( 1, 1019) = 166.47
Prob > F = 0.0000
. test (_cons=0) (rm_rf=1)
( 1) _cons = 0
( 2) rm_rf = 1
F( 2, 1019) = 83.49
Prob > F = 0.0000
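
The two single-restriction F statistics can be checked by hand, because an F test of one linear restriction is the square of the corresponding t ratio. The sketch below, in plain Python rather than Stata, uses only the coefficients and standard errors printed in the regression table; the helper name f_single is hypothetical.

```python
# Values copied from the regression output for portfolio 1
b_cons, se_cons = -0.5916513, 0.2676044
b_mkt, se_mkt = 1.63492, 0.0492092

def f_single(estimate, std_err, hypothesised=0.0):
    """F statistic for one linear restriction: the squared t ratio."""
    t = (estimate - hypothesised) / std_err
    return t * t

f_alpha = f_single(b_cons, se_cons)      # H0: _cons = 0
f_beta = f_single(b_mkt, se_mkt, 1.0)    # H0: rm_rf = 1
print(f"F(alpha = 0) = {f_alpha:.2f}")
print(f"F(beta = 1)  = {f_beta:.2f}")
```

The joint test of both restrictions cannot be reproduced this way, since it also depends on the covariance between the two coefficient estimates, which the table does not report.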

Estimation of the CAPM
There are at least four ways to estimate the simple CAPM for all 25
portfolios in Stata:
1. Equation-by-equation OLS. Loop over the excess returns and estimate
   each equation.
2. Use the mvreg command, which performs equation-by-equation OLS
   automatically.
3. Use the sureg command, which performs seemingly unrelated
   regressions.
4. Reshape the data to long format and use the statsby prefix.

The Commands
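// generate excess returns
local N = 25
forvalues i = 1/`N' {
    qui gen z`i' = r`i' - rf
}
// the hyphen selects a range in dataset order; r10 comes right after r1,
// so r1-r9 covers every return series
drop r1-r9

// at least four ways to do this estimation
// 1. equation-by-equation OLS
forvalues i = 1/`N' {
    qui regress z`i' rm_rf
}
// 2. multivariate regression
qui mvreg z* = rm_rf
// 3. seemingly unrelated regressions
qui sureg (z* = rm_rf)
// 4. long format plus the statsby prefix
qui reshape long z, i(dateid01) j(portfolio)
qui statsby _b _se, by(portfolio) saving(simplecapm, replace): reg z rm_rf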

The Reshape Command
. reshape long z, i(dateid01) j(portfolio)

(note: j = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25)
Data wide -> long
Number of obs. 1021 -> 25525

Number of variables 29 -> 6
j variable (25 values) -> portfolio
xij variables:
z1 z2 ... z25 -> z

The Data in Long Format

Testing Pricing Errors

Basic Results
Recall from the results of the classical two-variable regression model
$$\hat{\alpha} \sim N\left(0,\ \frac{1}{T}\left[1 + \frac{\bar{f}^2}{\mathrm{var}(f)}\right] s^2\right)$$
where $s^2$ is the variance of the residuals of the regression.

The Wald test of the restriction that $\alpha_i$ is zero (no pricing error in the $i$th
equation) is then given by dividing the coefficient estimate squared by its
variance
$$J = T\left[1 + \frac{\bar{f}^2}{\mathrm{var}(f)}\right]^{-1} \frac{\hat{\alpha}_i^2}{s^2} \sim \chi^2_1$$
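
The step from the variance of the intercept to the test statistic can be written out explicitly. The lines below only restate the two results on this slide; the shorthand $\widehat{\mathrm{var}}(\hat{\alpha}_i)$ is introduced here for convenience and is not notation from the original.

```latex
\widehat{\mathrm{var}}(\hat{\alpha}_i)
  = \frac{s^2}{T}\left[1 + \frac{\bar{f}^2}{\mathrm{var}(f)}\right]
\quad\Longrightarrow\quad
J = \frac{\hat{\alpha}_i^2}{\widehat{\mathrm{var}}(\hat{\alpha}_i)}
  = T\left[1 + \frac{\bar{f}^2}{\mathrm{var}(f)}\right]^{-1}\frac{\hat{\alpha}_i^2}{s^2}
```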

Joint Wald Test
We also want to know if all the pricing errors are jointly equal to zero. We
now have to think of the time-series regressions as a panel regression with
correlated errors, $E(u_{it} u_{jt}) \neq 0$. The classic form of the test assumes no
autocorrelation or heteroskedasticity so the Wald test of the joint
restrictions is given by
$$J = T\left[1 + \frac{\bar{f}^2}{\mathrm{var}(f)}\right]^{-1} \hat{\alpha}' \hat{\Sigma}^{-1} \hat{\alpha} = a_T\, \hat{\alpha}' \hat{\Sigma}^{-1} \hat{\alpha} \overset{a}{\sim} \chi^2_N$$
with $\hat{\alpha} = [\hat{\alpha}_1\ \hat{\alpha}_2\ \cdots\ \hat{\alpha}_N]'$ and $\hat{\Sigma}$ the residual covariance matrix. For
convenience, the test is often written just with a positive scaling constant
$a_T$ which depends on the sample size and the factor.

Gibbons, Ross, Shanken Test
The Wald test is asymptotically valid. A finite-sample F test is also
available, known as the Gibbons, Ross, Shanken or GRS test, given by
$$GRS = \frac{T-N-1}{N}\left[1 + \frac{\bar{f}^2}{\mathrm{var}(f)}\right]^{-1} \hat{\alpha}' \hat{\Sigma}^{-1} \hat{\alpha} \sim F_{N,\,T-N-1}$$
The F distribution recognises the sample variation in the estimation of $\hat{\Sigma}$,
which is not accounted for in the asymptotic Wald version. This distribution
requires that the errors are normally distributed as well as uncorrelated and
homoskedastic.

Multiple Factors
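The test does generalise to the case of multiple factors. Assuming normal
iid errors the test statistic is
$$GRS = \frac{T-N-K}{N}\left[1 + \bar{f}' \hat{\Omega}^{-1} \bar{f}\right]^{-1} \hat{\alpha}' \hat{\Sigma}^{-1} \hat{\alpha} \sim F_{N,\,T-N-K}$$
in which
N = number of assets
K = number of factors
$\bar{f} = E_T(f_t)$
$\hat{\Omega} = \frac{1}{T}\sum_{t=1}^{T} (f_t - \bar{f})(f_t - \bar{f})'$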

GRS in Stata (Wald Version)
. // Gibbons Ross Shanken test (using seemingly unrelated regression estimator)

. qui sureg (z* = rm_rf)
.
. // Wald version
. qui test _cons
. qui sca grsW = r(chi2)
. qui sca pval = r(p)
.
. di as text "Degrees of freedom = " as res r(df)
Degrees of freedom = 25
. di as text "Gibbons Ross Shanken test (Wald Version) = " as res grsW
Gibbons Ross Shanken test (Wald Version) = 96.631577
. di as text "p-value = " as res pval
p-value = 2.302e-10

GRS in Stata (F Version)
. // F version
. sca tmp0 = (`T'-`N'-1)/`N'
. sca tmp1 = grsW/`T'
. sca grsF = tmp0 * tmp1
. sca pvF = Ftail(`N',`T'-`N'-1,grsF)
.
. di as text "Gibbons Ross Shanken test (F Version) = " as res grsF
Gibbons Ross Shanken test (F Version) = 3.7668333
. di as text "p-value = " as res pvF
p-value = 1.958e-09
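
The rescaling from the Wald version to the F version is simple arithmetic and can be checked outside Stata. The sketch below, in plain Python, assumes the Wald statistic has the form T[1 + ...]^{-1} alpha' Sigma^{-1} alpha used on the earlier slides, so dividing by T and multiplying by (T - N - K)/N gives the F statistic. The function name grs_f_from_wald is hypothetical, and the input values are copied from the output above.

```python
def grs_f_from_wald(wald, T, N, K=1):
    """Rescale a Wald statistic for zero intercepts to the GRS F statistic."""
    return (T - N - K) / N * (wald / T)

# Values copied from the Stata output
grs_w = 96.631577   # Wald version reported after sureg
T, N = 1021, 25     # time-series observations and number of portfolios

grs_f = grs_f_from_wald(grs_w, T, N)
print(f"GRS (F version) = {grs_f:.7f} with ({N}, {T - N - 1}) degrees of freedom")
```

The p-value needs the upper tail of the F distribution (Ftail in Stata), which the Python standard library does not provide, so it is left out of this sketch.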

GRS in Mata (Wald Version)
. // Estimate seemingly unrelated regression model (cheating!)

. qui sureg (z* = rm_rf)
.
. // now call mata: e(b) is 1x50, so it must be reshaped
. // to have 25 rows using rowshape()
. mata:
mata (type end to exit)
: aT = st_numscalar("aT")
: sigma = st_matrix("e(Sigma)")
: nf = strtoreal(st_local("N"))
: b = st_matrix("e(b)")
: bmat = (1::nf),rowshape(b,nf)
: st_matrix("suregb", bmat)
: end

GRS in Mata (Wald Version)
. // need to drop variables so matrix does not take dimensions of data set in memory
. drop *
.
. // stata view of the matrix has 3 columns (company #, slope and constant)
. // name them and use the names to break the matrix into variables
. mat colnames suregb = company beta alpha
. qui svmat suregb, names(col)
. mata
: st_view(alpha=.,.,"alpha")
: J = aT * alpha' * invsym(sigma) * alpha
: J
96.6328742
: end
The value returned for the GRS test is 96.6328742, which is (almost) identical
to that obtained previously.

Cross Section Regressions

Price of Risk
The central question of interest is why average returns vary across assets.
The answer is that the expected returns should be high if the asset has a
high exposure to the factors that carry large risk premia. Recall the
fundamental pricing model with a single factor in which the excess returns
are linear in the betas
$$E(z_{it}) = \beta_i E(f_t).$$
Since the factor, $f_t$, is also an excess return, the model applies to the
factor as well
$$E(f_t) = 1 \times \lambda$$
where $\lambda$ is the price of risk (risk premium) associated with the factor so
that
$$E(z_{it}) = \beta_i \lambda.$$

Two-pass Regression
A natural idea is then to store estimates of $\beta_i$ from the time-series
regressions and then estimate the factor risk premium $\lambda$ from a
cross-sectional regression of average returns on the $\hat{\beta}_i$
$$E_T(z_{it}) = \hat{\beta}_i \lambda + \alpha_i.$$
The cross-sectional regression residuals $\alpha_i$ are the pricing errors.
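
As a rough illustration of the second pass, the sketch below (plain Python, not Stata) runs the cross-sectional regression without a constant, so the slope is the sum of beta times average return divided by the sum of squared betas. The function name second_pass and the four betas and average returns are hypothetical values made up for this example.

```python
def second_pass(mean_excess, betas):
    """Cross-sectional OLS of average excess returns on betas, no intercept.

    Returns the estimated price of risk and the pricing errors (residuals).
    """
    s_bz = sum(b * z for b, z in zip(betas, mean_excess))
    s_bb = sum(b * b for b in betas)
    lam = s_bz / s_bb
    errors = [z - lam * b for b, z in zip(betas, mean_excess)]
    return lam, errors

# Hypothetical first-pass betas and average excess returns (percent per month)
betas = [0.8, 1.0, 1.2, 1.5]
mean_excess = [0.5, 0.7, 0.7, 1.0]

lam, errors = second_pass(mean_excess, betas)
print(f"price of risk = {lam:.4f}")
print("pricing errors:", [round(e, 4) for e in errors])
```

Omitting the constant imposes the restriction that a zero-beta asset earns a zero excess return, which is why the pricing errors here are the raw residuals rather than deviations around a fitted intercept.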

Using collapse
A powerful Stata command which can be used to implement the two-pass
estimator is
collapse clist [if] [in] [weight] [, options]
collapse converts the dataset in memory into a dataset of means, sums,
medians, etc. or any summary statistic contained in clist, which must refer
to numeric variables exclusively.

First Pass
. // reshape the data to long form
. reshape long z, i(dateid01) j(portfolio)
(note: j = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25)
Data wide -> long
Number of obs. 1021 -> 25525

Number of variables 30 -> 7
j variable (25 values) -> portfolio
xij variables:
z1 z2 ... z25 -> z
. save "./working/famafrenchlong.dta", replace

file ./working/famafrenchlong.dta saved
.
. // first pass regression
. statsby _b, by(portfolio) saving("./working/firstpass",replace) nodots: reg z rm_rf
command: regress z rm_rf
by: portfolio
.
. // keep estimated betas
. use "./working/firstpass.dta", clear
(statsby: regress)
. ren _b_rm_rf betas
. drop _b_cons
. save "./working/coefs.dta", replace
file ./working/coefs.dta saved

Using Collapse
. // now collapse the data to a pure cross section
. use "./working/famafrenchlong.dta", clear
. collapse (mean) z, by(portfolio)
.
. merge 1:1 portfolio using "./working/coefs"
Result # of obs.
not matched 0
matched 25 (_merge==3)
. drop _merge
.
. // second pass regression
. reg z betas, noconstant
Source SS df MS Number of obs = 25
F( 1, 24) = 31.28
Model 4.82079646 1 4.82079646 Prob > F = 0.0000
Residual 3.69828055 24 .154095023 R-squared = 0.5659
Adj R-squared = 0.5478
Total 8.51907702 25 .340763081 Root MSE = .39255
z Coef. Std. Err. t P>|t| [95% Conf. Interval]
betas 1.555446 .2780928 5.59 0.000 .9814903 2.129401

Fama-MacBeth Regressions
The Fama-MacBeth (1973) approach estimates cross-section
regressions for each time period
$$z_{it} = \lambda_t \beta_i + u_{it}.$$
Having obtained these estimates, the Fama-MacBeth procedure then
computes
$$\hat{\lambda} = \frac{1}{T}\sum_{t=1}^{T} \hat{\lambda}_t$$
as the estimated price of risk.
The standard errors of these parameters are obtained from the sample standard
deviations of the cross-sectional regression estimates, defined as
$$\hat{\sigma}^2(\hat{\lambda}) = \frac{1}{T}\left[\frac{1}{T}\sum_{t=1}^{T} (\hat{\lambda}_t - \hat{\lambda})^2\right] = \frac{1}{T^2}\sum_{t=1}^{T} (\hat{\lambda}_t - \hat{\lambda})^2.$$

Second Pass Regression
. // this time merge the estimated betas into the long data set
. merge m:1 portfolio using "./working/coefs.dta"
Result # of obs.
not matched 0
matched 25,525 (_merge==3)
. drop _merge
. save "./working/famafrenchlong.dta", replace
file ./working/famafrenchlong.dta saved
.
. // run the regressions for each time period
. statsby _b, by(dateid01) saving("./working/famamacbeth",replace) nodots: reg z betas, nocon
command: regress z betas, noconstant
by: dateid01
.
. use "./working/famamacbeth.dta", clear
(statsby: regress)
. sum _b_betas
Variable Obs Mean Std. Dev. Min Max
_b_betas 1021 1.555446 11.53178 -36.97755 143.2732
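
The summary table gives the Fama-MacBeth point estimate directly (the mean of the period-by-period slopes) but not its standard error. The sketch below, in plain Python, applies the averaging and the 1/T^2 variance formula to a short hypothetical series, and then makes a rough check against the reported numbers. The function name fama_macbeth is hypothetical. Stata's Std. Dev. uses a T - 1 divisor, so it is rescaled here to the 1/T convention first.

```python
import math

def fama_macbeth(lambdas):
    """Average the period-by-period estimates and compute the standard error
    from the 1/T**2 variance formula."""
    T = len(lambdas)
    lam_hat = sum(lambdas) / T
    var_hat = sum((lt - lam_hat) ** 2 for lt in lambdas) / T ** 2
    return lam_hat, math.sqrt(var_hat)

# Hypothetical cross-sectional slope estimates for four periods
lam_hat, se = fama_macbeth([1.0, 2.0, 3.0, 4.0])
print(f"lambda_hat = {lam_hat:.3f}, s.e. = {se:.3f}")

# Rough check against the summary table (values copied from the output)
T, mean_b, sd_b = 1021, 1.555446, 11.53178
se_reported = sd_b * math.sqrt((T - 1) / T) / math.sqrt(T)
t_ratio = mean_b / se_reported
print(f"s.e. = {se_reported:.4f}, t ratio = {t_ratio:.2f}")
```

With T this large the choice between the T and T - 1 divisors makes almost no difference to the standard error.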

Interesting New Developments

Large Numbers of Assets
The general Wald form of the Gibbons, Ross, Shanken test for zero pricing
errors ($\alpha = 0$) in a linear factor model is
$$J = a_T\, \hat{\alpha}' \hat{\Sigma}^{-1} \hat{\alpha} \sim \chi^2_N$$
in which $\hat{\Sigma}$ is the estimated covariance matrix of the errors, $a_T$ is a
positive scaling constant and N is the number of assets being tested.
This test is applicable, however, only when the number of assets N is
much smaller than the length of the time series T. When N > T the
sample covariance $\hat{\Sigma}$ becomes degenerate. In practice, one typically picks a
testing period of T = 60 monthly observations and does not extend the testing
period any further, because the factor pricing model is technically a
one-period model whose factor loadings can be time-varying. If you are
looking at lots of assets this constitutes a problem.

Pesaran and Yamagata (2012)
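To overcome the difficulty, Pesaran and Yamagata (2012, PY test) suggest
ignoring the correlations among assets and constructing a test statistic
under working independence by setting $V = \mathrm{diag}(\hat{\Sigma})^{-1}$. They derive the
following result for the distribution of the standardised quadratic form
$$J_s = \frac{a_T\, \hat{\alpha}' V \hat{\alpha} - N}{\sqrt{2N(1 + e_T)}} \to N(0, 1)$$
with
$$e_T = \frac{1}{N}\sum_{i \neq j} \hat{\rho}_{ij}^2\, I(\hat{\rho}_{ij}^2 > c_T)$$
$$c_T = \frac{1}{T}\,\Phi^{-1}(1 - c/N)$$
$$c \in (0, 0.5),$$
where $\hat{\rho}_{ij}$ is the sample correlation between the residuals of assets i and j
and $\Phi^{-1}$ is the inverse of the standard normal distribution function.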

PY (Basic Version)
. // basic version of Pesaran Yamagata test with sqrt(2*N) as the scaling factor
. mata:
: st_view(s2=.,.,"s2")
: st_view(alpha=.,.,"alpha")
: N = strtoreal(st_local("N"))
: aT = st_numscalar("aT")
: Jnum = aT :* alpha' * invsym(diag(s2)) * alpha - N
: Jden = sqrt(2 * N)
: J = Jnum :/ Jden
: pval = 1 - normal(abs(J))
: Jnum, Jden, J, pval
1 2 3 4
1 -6.500208749 7.071067812 -.9192683372 .1789776174
: end
. // Notes:
. // 1. the test strongly rejects for whole sample
. // 2. this value is calculated using T = N so it works when GRS would fail
. // 3. now we have a problem with low power
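
The standardisation in the basic version can be reproduced from the reported numerator alone. The sketch below, in plain Python, divides by sqrt(2N) and computes the p-value the same way the Mata code does, as 1 - Phi(|J|). The function name py_basic is hypothetical; the numerator and N = 25 are taken from the output above.

```python
import math
from statistics import NormalDist

def py_basic(j_num, n_assets):
    """Standardise the centred quadratic form by sqrt(2N) and return the
    statistic with the p-value computed as 1 - Phi(|J|)."""
    j = j_num / math.sqrt(2 * n_assets)
    p_value = 1 - NormalDist().cdf(abs(j))
    return j, p_value

# Numerator reported in the Mata output; N = 25 portfolios
j, p_value = py_basic(-6.500208749, 25)
print(f"J = {j:.6f}, p-value = {p_value:.6f}")
```

Taking the absolute value before the normal tail makes this a one-sided p-value on |J|; a negative numerator, as here, means the quadratic form is below its null expectation of N.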

PY (Advanced Version)
. // more advanced implementation of the test

. // uses sqrt(2*N*(1+eT)) as the scaling factor in the denominator
. mata:
: //Janum = aT :* alpha' * invsym(diag(s2)) * alpha - N
: Jaden = sqrt(2 * N * (1 + eT))
: Ja = Jnum :/ Jaden
: pval1 = 1 - normal(abs(Ja))
:
: Jnum, Jaden, Ja, pval1
1 2 3 4
1 -6.500208749 8.712846629 -.7460487974 .227818969
: end
. // Notes:
. // 1. the numerator of the test is identical to the previous one
. // 2. eT=0.5 so denominator is only slightly affected
. // 3. value of this adjustment is questionable (need some MC evidence)

Power Enhancements
The PY test, or any other genuine quadratic statistic, is powerful only
when a non-negligible fraction of assets are mispriced. Indeed, the factor
N above reflects the noise accumulation in estimating N parameters in the
vector $\alpha$.
A new working paper by Fan, Liao and Yao (2013) proposes an interesting
method which uses power enhancements (PEM) to improve the power
of the asset pricing test statistic as follows:
1. Compute $J_1$, a test statistic that has the correct asymptotic size (e.g.,
   GRS, PY) but which may suffer from low power.
2. Compute a PEM component test $J_0$ that has two properties:
   1. $J_0 \overset{p}{\to} 0$ under $H_0$.
   2. $J_0$ does not converge to 0, and may even diverge, when the true parameters
      fall into a subset of the alternative hypothesis.
3. Compute the PEM test $J = J_0 + J_1$.

Suggestion: Screened Wald Test
Of course the trick here is to find a statistic $J_0$ which has these nice
properties. Fan, Liao and Yao (2013) propose a screened Wald test
$$J_0 = N a_T\, \hat{\alpha}_s' \hat{\Sigma}_s^{-1} \hat{\alpha}_s$$
in which $\hat{\alpha}_s$ is the subset of the original vector of estimated
$\hat{\alpha}$ whose values
exceed some threshold value $\delta_T$ and $\hat{\Sigma}_s$ is the corresponding submatrix of
the original weight matrix $\hat{\Sigma}$.
Screening procedure
$$\delta_T = \log(\log T)\sqrt{\frac{\log N}{T}}$$
Choose $\hat{\alpha}_i$ if
$$\frac{|\hat{\alpha}_i|}{\hat{\sigma}_i} > \delta_T$$
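
The screening step can be sketched in a few lines of plain Python. This is an illustration only, not the authors' implementation: it assumes each intercept is standardised by its own standard error before being compared with the threshold, and the function names and the three intercepts below are hypothetical.

```python
import math

def screening_threshold(T, N):
    """Threshold log(log T) * sqrt(log N / T)."""
    return math.log(math.log(T)) * math.sqrt(math.log(N) / T)

def screen(alphas, std_errs, T):
    """Return the indices i whose |alpha_i| / se_i exceeds the threshold.

    Standardising by se_i is an assumption made for this sketch.
    """
    delta = screening_threshold(T, len(alphas))
    return [i for i, (a, s) in enumerate(zip(alphas, std_errs))
            if abs(a) / s > delta]

# Threshold for the sample size in these slides: T = 1021 months, N = 25 portfolios
print(f"threshold = {screening_threshold(1021, 25):.4f}")

# Hypothetical intercepts and standard errors for three assets
kept = screen([0.9, 0.02, -0.6], [0.3, 0.3, 0.3], T=120)
print("assets retained by the screen:", kept)
```

Only the retained assets enter the screened Wald statistic, so under the null, when no standardised intercept clears the threshold, the statistic is zero and the size of the combined test is left unchanged.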
