Week 5 Empirical - Methods 2

The CAPM and Anomalies: Empirical Research
Guanglian Hu
University of Sydney
[S1 2020]
Guanglian Hu (University of Sydney) S1 2020 [S1 2020] 1 / 20

Roadmap
Overview of a few commonly used WRDS databases
Portfolio sorts and regression-based tests of asset pricing models

WRDS
Wharton Research Data Services (WRDS) is a web-based business data
research service from The Wharton School at the University of
Pennsylvania. WRDS databases are widely used in academic finance.
CRSP
Compustat

CRSP
The Center for Research in Security Prices (CRSP) U.S. Stock Database
contains comprehensive information on equity securities listed on the major
stock exchanges in the U.S. (e.g., NYSE and NASDAQ). It also provides
information on stock market indices.
The CRSP database contains information on prices and quotes, holding-period
returns, trading volume, shares outstanding, identifiers, and distributions.
It comes with a monthly data file (end of month) and a daily data file (end
of day).
CRSP annual products are updated each year in February, and consist of
data for the entire previous year.

CRSP
Commonly used variables:
permno: primary identifier in CRSP. It is a unique permanent security
identification number assigned by CRSP to each security.
permco: a unique permanent company identification number assigned by CRSP to
each company with issues on a CRSP file. In most cases, there is a one-to-one
correspondence between permno and permco, but a permco can sometimes be
associated with multiple permnos.
ret: stock return including dividends. This is the key variable in CRSP. Note that
returns are already adjusted for splits.
shrcd: share code. It is common to focus on U.S. common equity (10 and 11).
prc: closing price. If the closing price is not available, CRSP uses the bid-ask average
(denoted with a negative sign). It is common to exclude penny stocks (e.g., abs(prc) < 1).
vol: trading volume
shrout: number of shares outstanding (in thousands). market cap = abs(prc)*shrout
siccd: Standard Industrial Classification (SIC) code, which is used to group
companies with similar products or services. Some studies exclude utilities (SIC
codes between 4900 and 4999) and financials (SIC codes between 6000 and 6999).
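
The screens described above could be applied along the following lines. This is a minimal sketch on a made-up five-row table, not a real CRSP extract: the column names follow the CRSP variables listed on this slide, while the function name `apply_common_filters`, the `min_price` threshold, and the helper columns `price` and `mktcap` are assumptions introduced for illustration.

```python
import pandas as pd

# Toy rows standing in for a CRSP monthly extract (all values are made up).
crsp = pd.DataFrame({
    "permno": [10001, 10002, 10003, 10004, 10005],
    "shrcd":  [11, 10, 31, 11, 11],
    "siccd":  [3571, 4911, 2834, 6021, 1311],
    "prc":    [25.0, 40.0, 12.0, 55.0, -0.5],   # negative sign flags a bid-ask average
    "shrout": [1000, 2000, 1500, 3000, 500],
})

def apply_common_filters(df, min_price=1.0):
    """Keep common equity; drop penny stocks, utilities and financials."""
    out = df.copy()
    out["price"] = out["prc"].abs()                # remove the sign flag
    out["mktcap"] = out["price"] * out["shrout"]   # market cap = abs(prc) * shrout
    is_common = out["shrcd"].isin([10, 11])
    not_penny = out["price"] >= min_price
    is_utility = out["siccd"].between(4900, 4999)
    is_financial = out["siccd"].between(6000, 6999)
    return out[is_common & not_penny & ~is_utility & ~is_financial]

filtered = apply_common_filters(crsp)
print(filtered[["permno", "price", "mktcap"]])
```

Taking the absolute value of prc before applying the price screen matters: a raw comparison such as prc < 1 would also discard every observation priced from the bid-ask average, regardless of its level.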

CRSP: Monthly Stock File

CRSP: Daily Stock File

Using CRSP Data
Three ways to access CRSP data
Web Queries
The WRDS Cloud
Connecting Remotely. You can connect to WRDS and access WRDS datasets
in the following interactive applications:
SAS via PC-SAS/Connect
Python via Jupyter Notebooks or Spyder
R via RStudio
Matlab
Stata

Compustat
Compustat provides information on company fundamentals, including data
items from income statements, balance sheets, and statements of cash flows.
Compustat North America, Compustat Global, Compustat Bank
Compustat North America is the most widely used. It is very common to
merge Compustat with CRSP in the cross-sectional asset pricing literature.
It comes with a quarterly dataset (fundq) and an annual dataset (funda),
updated daily.

Compustat
Variables included in Compustat (over 500):

gvkey: a permanent identifier assigned to each company. GVKEYs
do not change, nor are they reused.
datadate: fiscal year end date
fyear: fiscal year
fyr: fiscal year end month
at: total assets
lt: total liabilities
cogs: cost of goods sold
sale: net sales
many others

funda

Portfolio Sorts
Economic theory, or empirical conjecture, often yields a prediction that
expected returns should be increasing (or decreasing) in some characteristic.
e.g., beta, lottery features
Portfolio sorts are widely used in the literature, for a number of reasons:
simple and intuitive
do not require assuming a linear relationship between expected
returns and the factor (less sensitive to outliers)
the return spread between the top and bottom portfolios can be
interpreted as the profit from a trading strategy

Portfolio Sorts
A univariate portfolio sort is usually conducted as follows:
individual stocks are sorted according to a given characteristic (e.g.,
size, past returns, etc.) measured over some period
these stocks are then grouped into N portfolios (usually 5 or 10)
average returns on these portfolios are then computed over a
subsequent period, or over the same period in which the characteristic is
measured
the significance of the relationship is judged by whether the two
extreme portfolios have significantly different average returns
To control for other known cross-sectional effects, one can do a
double sort or run Fama-MacBeth regressions.
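
The sorting steps above can be sketched as follows. This is an illustration on simulated data, not a replication of any study: the panel is generated so that next-month returns rise with the characteristic, and the names `panel`, `char`, `ret_next`, and `port` are hypothetical. Portfolios are equal-weighted and formed on quintiles; value weighting or NYSE breakpoints would need extra inputs.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_months, n_stocks, n_port = 120, 200, 5

# Simulated stock-month panel: a characteristic and the following month's return.
# The positive link between the two is an assumption built into this toy data.
month = np.repeat(np.arange(n_months), n_stocks)
char = rng.normal(size=n_months * n_stocks)
ret_next = 0.01 * char + rng.normal(scale=0.05, size=n_months * n_stocks)
panel = pd.DataFrame({"month": month, "char": char, "ret_next": ret_next})

# Steps 1-2: each month, sort on the characteristic and assign quintiles 0..4.
panel["port"] = (
    panel.groupby("month")["char"]
    .transform(lambda x: pd.qcut(x, n_port, labels=False))
    .astype(int)
)

# Step 3: equal-weighted average return of each portfolio in the subsequent month.
port_ret = panel.groupby(["month", "port"])["ret_next"].mean().unstack("port")

# Step 4: high-minus-low spread and a simple t-statistic on its time-series mean.
spread = port_ret[n_port - 1] - port_ret[0]
t_stat = spread.mean() / (spread.std(ddof=1) / np.sqrt(len(spread)))

print(port_ret.mean())
print("spread:", spread.mean(), "t-stat:", t_stat)
```

The t-statistic here assumes the monthly spreads are serially uncorrelated; with overlapping holding periods a Newey-West adjustment is the usual alternative.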

Time Series Regressions
Recall the CAPM is given by:
$E[r_i] - r_f = \beta_i \left( E[r_m] - r_f \right)$
We can test the CAPM by running OLS time-series regressions:
$R^{ei}_t = \alpha_i + \beta_i f_t + \varepsilon^i_t, \quad t = 1, 2, \ldots, T$
where $R^{ei}_t$ is the excess return of stock $i$, and $f_t$ is the excess return of
the market.
According to the CAPM, $\alpha_i$, sometimes referred to as the pricing error,
should be equal to zero.
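
A minimal sketch of this time-series regression for a single asset, using simulated excess returns in which the true alpha is zero. The parameter values and variable names are assumptions chosen for illustration, and the standard error is the classical OLS one (homoskedastic, uncorrelated errors).

```python
import numpy as np

rng = np.random.default_rng(1)
T = 600
f = rng.normal(0.005, 0.04, size=T)               # market excess return (simulated)
true_beta = 1.2
r = true_beta * f + rng.normal(0, 0.02, size=T)   # asset excess return, true alpha = 0

# OLS of r on a constant and f: coefficients are (alpha, beta).
X = np.column_stack([np.ones(T), f])
coef, *_ = np.linalg.lstsq(X, r, rcond=None)
alpha_hat, beta_hat = coef

# Classical OLS covariance matrix and the t-statistic for alpha = 0.
resid = r - X @ coef
sigma2 = resid @ resid / (T - 2)
cov = sigma2 * np.linalg.inv(X.T @ X)
t_alpha = alpha_hat / np.sqrt(cov[0, 0])

print("alpha:", alpha_hat, "beta:", beta_hat, "t(alpha):", t_alpha)
```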

Time Series Regressions
We also want to know whether all the pricing errors are jointly equal
to zero.
Assuming no autocorrelation or heteroskedasticity, the following test
statistic has an asymptotic chi-square distribution with N degrees of
freedom:
$T \left[ 1 + \left( \frac{E_T(f)}{\hat{\sigma}(f)} \right)^2 \right]^{-1} \hat{\alpha}' \hat{\Sigma}^{-1} \hat{\alpha} \sim \chi^2_N$
Note that $\hat{\alpha}$ is the vector of estimated alphas, $\hat{\Sigma}$ is the
variance-covariance matrix of the residuals, and $E_T(f)$ and $\hat{\sigma}(f)$ are the
sample mean and standard deviation of the market excess return.
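
The joint test can be sketched as below, assuming the statistic scales the quadratic form of the alphas in the inverse residual covariance matrix by T divided by one plus the squared sample Sharpe ratio of the market. The function name `capm_chi2_stat` and the simulated inputs are hypothetical; the example compares a sample generated under the CAPM with the same sample after adding a constant alpha to every asset.

```python
import numpy as np

def capm_chi2_stat(R, f):
    """Joint test of zero alphas. R: T x N excess returns, f: length-T market excess return."""
    T, N = R.shape
    X = np.column_stack([np.ones(T), f])
    B, *_ = np.linalg.lstsq(X, R, rcond=None)   # row 0: alphas, row 1: betas
    alpha = B[0]
    resid = R - X @ B
    Sigma = resid.T @ resid / T                 # residual covariance matrix
    sharpe2 = (f.mean() / f.std()) ** 2         # squared sample Sharpe ratio of the market
    stat = T * alpha @ np.linalg.solve(Sigma, alpha) / (1.0 + sharpe2)
    return stat, alpha

rng = np.random.default_rng(2)
T, N = 600, 5
f = rng.normal(0.005, 0.04, size=T)
betas = np.linspace(0.6, 1.4, N)
R_null = np.outer(f, betas) + rng.normal(0, 0.02, size=(T, N))  # CAPM holds
R_alt = R_null + 0.01                                           # every alpha shifted by 1%

stat_null, _ = capm_chi2_stat(R_null, f)
stat_alt, _ = capm_chi2_stat(R_alt, f)
print("null:", stat_null, "alternative:", stat_alt)
```

With N = 5 the 5% critical value of the chi-square distribution is about 11.07, so the statistic is compared against that threshold.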

Cross-Sectional Regressions
Recall the CAPM is given by:
$E[r_i] - r_f = \beta_i \left( E[r_m] - r_f \right)$
We can also test the CAPM by running a cross-sectional regression of
average returns on the betas.
First, find estimates of the betas from time-series regressions:
$R^{ei}_t = a_i + \beta_i f_t + \varepsilon^i_t, \quad t = 1, 2, \ldots, T$, for each $i$
where $R^{ei}_t$ is the excess return of stock $i$, and $f_t$ is the excess return of
the market.

Second, with the estimated betas, run a regression across assets of
average returns on the betas:
$E(R^{ei}) = \lambda \beta_i + \alpha_i, \quad i = 1, 2, \ldots, N$
where $E(R^{ei})$ is the average excess return of stock $i$. Note that you can run
the above regression with or without an intercept. The residuals $\alpha_i$ are
the pricing errors.
According to the CAPM, $\lambda$ should be positive and close to the market
risk premium. Also, the residuals $\alpha_i$ of the second-stage regression
should be equal to zero.
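
The two-pass procedure can be sketched as follows on simulated data in which the CAPM holds. All parameter values and names are assumptions for illustration, and the second pass is run without an intercept, which is one of the two options mentioned above.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 600, 25
f = rng.normal(0.006, 0.04, size=T)             # market excess return (simulated)
betas_true = np.linspace(0.5, 1.5, N)
R = np.outer(f, betas_true) + rng.normal(0, 0.02, size=(T, N))

# First pass: time-series regression of each asset on the market gives the betas.
X = np.column_stack([np.ones(T), f])
B, *_ = np.linalg.lstsq(X, R, rcond=None)
beta_hat = B[1]

# Second pass: regress average excess returns on the betas, no intercept.
mean_R = R.mean(axis=0)
lam_hat = (beta_hat @ mean_R) / (beta_hat @ beta_hat)
alpha_hat = mean_R - lam_hat * beta_hat          # cross-sectional residuals = pricing errors

print("lambda:", lam_hat, "sample market premium:", f.mean())
print("largest |alpha|:", np.abs(alpha_hat).max())
```

The plain OLS standard errors of this second pass ignore both the cross-sectional correlation of the residuals and the fact that the betas are estimated, which is part of the motivation for the Fama-MacBeth procedure.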


Fama-MacBeth Regressions
Fama and MacBeth (1973) suggest an alternative procedure for running
cross-sectional regressions, as well as for producing standard
errors and test statistics.
First, as before, find beta estimates with a time-series regression.
Second, instead of estimating a single cross-sectional regression with
the sample averages, we now run a cross-sectional regression at each
time period:
$R^{ei}_t = \lambda_t \beta_i + \alpha_{it}, \quad i = 1, 2, \ldots, N$, for each $t = 1, 2, \ldots, T$ (1)

Fama-MacBeth Regressions
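We estimate $\lambda$ and $\alpha_i$ as the time-series averages of the cross-sectional
regression estimates:
$\hat{\lambda} = \frac{1}{T} \sum_{t=1}^{T} \hat{\lambda}_t, \qquad \hat{\alpha}_i = \frac{1}{T} \sum_{t=1}^{T} \hat{\alpha}_{i,t}$
We can use the standard deviations of the cross-sectional regression
estimates to generate the sampling errors of these estimates:
$\sigma^2(\hat{\lambda}) = \frac{1}{T^2} \sum_{t=1}^{T} (\hat{\lambda}_t - \hat{\lambda})^2, \qquad \sigma^2(\hat{\alpha}_i) = \frac{1}{T^2} \sum_{t=1}^{T} (\hat{\alpha}_{i,t} - \hat{\alpha}_i)^2$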
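
A minimal sketch of the Fama-MacBeth procedure on simulated data, assuming full-sample betas held fixed over time and period-by-period regressions without an intercept. The variable names and parameter values are hypothetical. The sampling variances divide the sum of squared deviations by T squared, as in the formulas above.

```python
import numpy as np

rng = np.random.default_rng(4)
T, N = 600, 25
f = rng.normal(0.006, 0.04, size=T)             # market excess return (simulated)
betas_true = np.linspace(0.5, 1.5, N)
R = np.outer(f, betas_true) + rng.normal(0, 0.02, size=(T, N))

# Step 1: time-series regressions give one beta per asset.
X = np.column_stack([np.ones(T), f])
B, *_ = np.linalg.lstsq(X, R, rcond=None)
beta_hat = B[1]

# Step 2: one cross-sectional regression per period (no intercept).
lam_t = R @ beta_hat / (beta_hat @ beta_hat)    # length-T series of lambda_t
alpha_it = R - np.outer(lam_t, beta_hat)        # T x N residuals alpha_{i,t}

# Step 3: time-series averages and their sampling variances.
lam_hat = lam_t.mean()
alpha_hat = alpha_it.mean(axis=0)
var_lam = ((lam_t - lam_hat) ** 2).sum() / T**2
var_alpha = ((alpha_it - alpha_hat) ** 2).sum(axis=0) / T**2
t_lam = lam_hat / np.sqrt(var_lam)

print("lambda:", lam_hat, "se:", np.sqrt(var_lam), "t:", t_lam)
```

Because the betas here do not vary over time, the averaged estimate of lambda coincides with the one obtained from a single cross-sectional regression on sample average returns; only the standard errors differ.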
