
The CAPM and Anomalies: Empirical Research

Guanglian Hu

University of Sydney

[S1 2020]

Guanglian Hu (University of Sydney) S1 2020 [S1 2020] 1 / 20


Roadmap

Overview of a few commonly used WRDS databases

Portfolio sorts and regression-based tests of asset pricing models



WRDS

Wharton Research Data Services (WRDS) is a web-based business data
research service from The Wharton School at the University of
Pennsylvania. WRDS databases are widely used in academic finance.

CRSP

Compustat



CRSP

The Center for Research in Security Prices (CRSP) U.S. Stock Database
contains comprehensive information on equity securities listed on the major
stock exchanges in the U.S. (e.g., NYSE and NASDAQ). It also provides
information on stock market indices.

The CRSP database contains information on prices and quotes, holding-period
returns, trading volume, shares outstanding, identifiers, and distributions.

It comes with a monthly data file (end of month) and a daily data file (end
of day).

CRSP annual products are updated each year in February, and consist of
data for the entire previous year.



CRSP
Commonly used variables:
permno: the primary identifier in CRSP. It is a unique permanent security
identification number assigned by CRSP to each security.
permco: a unique permanent company identification number assigned by CRSP to
each company with issues on a CRSP file. In most cases, there is a one-to-one
correspondence between permno and permco, but a permco can sometimes be
associated with multiple permnos.
ret: stock return including dividends. This is the key variable in CRSP. Note that
returns are already adjusted for splits.
shrcd: share code. It is common to focus on U.S. common equity (codes 10 and 11).
prc: closing price. If the closing price is not available, CRSP uses the bid-ask
average (denoted with a negative sign). It is common to exclude penny stocks
(e.g., abs(prc) < 1).
vol: trading volume.
shrout: number of shares outstanding. market cap = abs(prc) * shrout.
siccd: the Standard Industrial Classification (SIC) code, which is used to group
companies with similar products or services. Some studies exclude utilities (SIC
codes between 4900 and 4999) and financials (SIC codes between 6000 and 6999).
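
The filters listed above can be sketched in pandas. This is a minimal illustration, not a canonical sample construction: the function name `apply_common_filters`, the derived columns `price` and `mktcap`, and the toy rows are made up for the example, and the penny-stock cutoff of 1 is only the example threshold mentioned above.

```python
import pandas as pd


def apply_common_filters(msf: pd.DataFrame, min_price: float = 1.0) -> pd.DataFrame:
    """Keep common stocks (shrcd 10/11); drop penny stocks, utilities, financials."""
    out = msf.copy()
    # prc is negative when CRSP reports a bid-ask average, so take the absolute value.
    out["price"] = out["prc"].abs()
    # CRSP records shrout in thousands of shares, so mktcap is in thousands of dollars.
    out["mktcap"] = out["price"] * out["shrout"]
    is_common = out["shrcd"].isin([10, 11])
    not_penny = out["price"] >= min_price
    is_utility = out["siccd"].between(4900, 4999)
    is_financial = out["siccd"].between(6000, 6999)
    return out[is_common & not_penny & ~is_utility & ~is_financial]


# Toy rows with made-up numbers (not actual CRSP data).
toy = pd.DataFrame({
    "permno": [1, 2, 3, 4, 5],
    "shrcd": [10, 11, 12, 10, 11],
    "prc": [25.0, -0.5, 40.0, -12.5, 30.0],
    "shrout": [1000, 2000, 500, 800, 600],
    "siccd": [3571, 2834, 7372, 2000, 6021],
})
filtered = apply_common_filters(toy)
print(filtered[["permno", "price", "mktcap"]])
```

In the toy data, permno 2 is dropped as a penny stock, permno 3 for its share code, and permno 5 as a financial; permno 4 survives even though its price is stored with a negative sign.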



CRSP: Monthly Stock File



CRSP: Daily Stock File



Using CRSP Data

Three ways to access CRSP data

Web Queries

The WRDS Cloud

Connecting Remotely. You can connect to WRDS and access WRDS datasets
in the following interactive applications:
SAS via PC-SAS/Connect
Python via Jupyter Notebooks or Spyder
R via RStudio
Matlab
Stata
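
For the Python route, a remote query might look like the sketch below. It assumes the `wrds` Python package is installed, that you have a WRDS account, and that the monthly stock file is exposed as the table `crsp.msf`; none of this is stated on the slide, and the helper names `build_msf_query` and `fetch_msf` are made up for the example. Only the query-building step runs without credentials.

```python
from datetime import date


def build_msf_query(start: str, end: str) -> str:
    """Return a SQL query for a few CRSP monthly stock file columns."""
    # Parse the inputs so that only valid ISO dates end up in the query text.
    start_d, end_d = date.fromisoformat(start), date.fromisoformat(end)
    return (
        "select permno, date, ret, prc, shrout "
        "from crsp.msf "
        f"where date between '{start_d}' and '{end_d}'"
    )


def fetch_msf(start: str, end: str):
    """Run the query on WRDS (assumes the `wrds` package and a WRDS account)."""
    import wrds  # imported lazily so the rest of the file runs without it

    db = wrds.Connection()  # prompts for WRDS credentials
    try:
        return db.raw_sql(build_msf_query(start, end), date_cols=["date"])
    finally:
        db.close()


query = build_msf_query("2019-01-01", "2019-12-31")
print(query)
```

Calling `fetch_msf("2019-01-01", "2019-12-31")` would return a pandas DataFrame if the connection succeeds.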



Compustat

Compustat provides information on company fundamentals, including
data items from income statements, balance sheets, and statements of
cash flows.

Compustat North America, Compustat Global, Compustat Bank

Compustat North America is the most widely used. It is very common to
merge Compustat with CRSP in the cross-sectional asset pricing
literature.

It comes with a quarterly dataset (fundq) and an annual dataset (funda),
both updated daily.



Compustat

Variables included in Compustat (over 500):
gvkey: a permanent identifier assigned to each company. GVKEYs
do not change, nor are they reused.
datadate: fiscal year end date
fyear: fiscal year
fyr: fiscal year end month
at: total assets
lt: total liabilities
cogs: cost of goods sold
sale: net sales
many others



funda



Portfolio Sorts

Economic theory, or empirical conjecture, often yields a prediction
that expected returns should be increasing (or decreasing) in some
characteristic.
e.g., beta, lottery features

Portfolio sorts are widely used in the literature, for a number of
reasons:
they are simple and intuitive
they do not require assuming a linear relationship between expected
returns and the factor (and are less sensitive to outliers)
the return spread between the top and bottom portfolios can be
interpreted as the profit from a trading strategy



Portfolio Sorts

A univariate portfolio sort is usually conducted as follows:
individual stocks are sorted according to a given characteristic (e.g.,
size, past returns, etc.) measured over some period
these stocks are then grouped into N portfolios (usually 5 or 10)
average returns on these portfolios are then computed over a
subsequent period or the same period over which the characteristic is
measured
the significance of the relationship is judged by whether the two
extreme portfolios have significantly different average returns

To control for other existing cross-sectional effects, one can do a
double sort or a Fama-MacBeth regression.
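
The univariate sort can be sketched in pandas on simulated data. Everything here is an illustrative assumption: the panel is randomly generated with returns that rise with the characteristic by construction, the column names (`char`, `ret_next`, `port`) are made up, portfolios are equal-weighted, and the t-statistic ignores any autocorrelation in the spread.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_months, n_stocks, n_port = 120, 200, 5

# Simulated panel: next-month return rises with the characteristic by construction.
panel = pd.DataFrame({
    "month": np.repeat(np.arange(n_months), n_stocks),
    "char": rng.normal(size=n_months * n_stocks),
})
panel["ret_next"] = 0.005 * panel["char"] + rng.normal(scale=0.05, size=len(panel))

# Sort on the characteristic within each month and assign quintile portfolios 0..4.
panel["port"] = (
    panel.groupby("month")["char"]
    .transform(lambda x: pd.qcut(x, n_port, labels=False))
    .astype(int)
)

# Equal-weighted portfolio return in the subsequent period, month by month.
port_ret = panel.groupby(["month", "port"])["ret_next"].mean().unstack("port")

# Judge significance from the time series of the top-minus-bottom spread.
spread = port_ret[n_port - 1] - port_ret[0]
t_stat = spread.mean() / (spread.std(ddof=1) / np.sqrt(len(spread)))

print(port_ret.mean())
print(f"high-minus-low mean: {spread.mean():.4f}, t-stat: {t_stat:.2f}")
```

With real data, the characteristic would be measured before the return month, and value weighting by market cap is a common alternative to the equal weighting used here.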



Time Series Regressions

Recall the CAPM is given by:

$$E[r_i] - r_f = \beta_i \left( E[r_m] - r_f \right)$$

We can test the CAPM by running OLS time-series regressions:

$$R^{ei}_t = \alpha_i + \beta_i f_t + \varepsilon^i_t, \quad t = 1, 2, \ldots, T$$

where $R^{ei}_t$ is the excess return of stock i, and $f_t$ is the excess return of
the market.

According to the CAPM, $\alpha_i$, sometimes referred to as the pricing error,
should be equal to zero.
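
A minimal NumPy sketch of this time-series regression for one asset follows. The data are simulated with a true alpha of zero and a true beta of 1.2; these values, the sample size, and the use of classical (homoskedastic) OLS standard errors are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 600

# Simulated monthly data: market excess return f and one stock's excess return.
f = rng.normal(0.006, 0.045, size=T)
eps = rng.normal(0.0, 0.03, size=T)
r_e = 0.0 + 1.2 * f + eps  # true alpha = 0, true beta = 1.2

# OLS of the stock's excess return on a constant and the market excess return.
X = np.column_stack([np.ones(T), f])
coef, *_ = np.linalg.lstsq(X, r_e, rcond=None)
alpha_hat, beta_hat = coef

# Classical OLS standard errors (no autocorrelation or heteroskedasticity).
resid = r_e - X @ coef
sigma2 = resid @ resid / (T - 2)
cov = sigma2 * np.linalg.inv(X.T @ X)
t_alpha = alpha_hat / np.sqrt(cov[0, 0])

print(f"alpha = {alpha_hat:.5f}, beta = {beta_hat:.3f}, t(alpha) = {t_alpha:.2f}")
```

A t-statistic on alpha that is large in absolute value would be evidence against the CAPM for that asset.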



Time Series Regressions

We also want to know whether all the pricing errors are jointly equal
to zero.

Assuming no autocorrelation or heteroskedasticity, the following test
statistic has an asymptotic chi-square distribution with N degrees of
freedom:

$$T \left[ 1 + \left( \frac{E_T(f)}{\hat{\sigma}(f)} \right)^2 \right]^{-1} \hat{\alpha}' \hat{\Sigma}^{-1} \hat{\alpha} \sim \chi^2_N$$

Note that $\hat{\alpha}$ is the vector of estimated alphas, $\hat{\Sigma}$ is the
variance-covariance matrix of the residuals, and $E_T(f)$ and $\hat{\sigma}(f)$ are
the sample mean and standard deviation of the market excess return.
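
The joint test can be sketched as follows, using the standard asymptotic statistic T [1 + (E_T(f) / σ̂(f))²]⁻¹ α̂' Σ̂⁻¹ α̂ built from the quantities named above. The data are simulated so that the CAPM holds, and the number of assets, the betas, and the hard-coded 5% critical value for 5 degrees of freedom (about 11.07) are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 600, 5

# Simulated excess returns for N test assets; the CAPM holds (all alphas are zero).
f = rng.normal(0.006, 0.045, size=T)
betas = np.linspace(0.6, 1.4, N)
R = f[:, None] * betas + rng.normal(0.0, 0.03, size=(T, N))

# Time-series regressions for all assets at once (coef has shape 2 x N).
X = np.column_stack([np.ones(T), f])
coef, *_ = np.linalg.lstsq(X, R, rcond=None)
alpha_hat = coef[0]

# Residual covariance matrix and squared Sharpe ratio of the market.
resid = R - X @ coef
Sigma_hat = resid.T @ resid / T
sharpe2 = (f.mean() / f.std()) ** 2

# Asymptotic test statistic, chi-square with N degrees of freedom under the null.
J = T * alpha_hat @ np.linalg.solve(Sigma_hat, alpha_hat) / (1.0 + sharpe2)

CRIT_95 = 11.07  # 95th percentile of chi-square with 5 degrees of freedom
print(f"J = {J:.2f}, reject at 5%: {J > CRIT_95}")
```

If SciPy is available, `scipy.stats.chi2.sf(J, N)` gives the p-value instead of a comparison with a fixed critical value.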



Cross-Sectional Regressions

Recall the CAPM is given by:

$$E[r_i] - r_f = \beta_i \left( E[r_m] - r_f \right)$$

We can also test the CAPM by running a cross-sectional regression of
average returns on the betas.

First, find estimates of the betas from time-series regressions:

$$R^{ei}_t = a_i + \beta_i f_t + \varepsilon^i_t, \quad t = 1, 2, \ldots, T, \text{ for each } i$$

where $R^{ei}_t$ is the excess return of stock i, and $f_t$ is the excess return of
the market.



Cross-Sectional Regressions

Second, with the estimated betas, run a regression across assets of
average returns on the betas:

$$E(R^{ei}) = \lambda \beta_i + \alpha_i, \quad i = 1, 2, \ldots, N$$

where $E(R^{ei})$ is the average excess return of stock i. Note that you can run
the above regression with or without an intercept. The residuals $\alpha_i$ are
the pricing errors.
According to the CAPM, $\lambda$ should be positive and close to the market
risk premium. Also, the residuals of the second-stage regression, $\alpha_i$,
should be equal to zero.
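
The two passes can be sketched in NumPy as below. The returns are simulated so that the CAPM holds, the second pass is run without an intercept (one of the two options mentioned above), and the sample sizes and beta range are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 600, 25

# Simulated excess returns for N assets with betas spread between 0.5 and 1.5.
f = rng.normal(0.006, 0.045, size=T)
true_beta = np.linspace(0.5, 1.5, N)
R = f[:, None] * true_beta + rng.normal(0.0, 0.03, size=(T, N))

# First pass: time-series regression of each asset on the market gives beta_hat.
X = np.column_stack([np.ones(T), f])
beta_hat = np.linalg.lstsq(X, R, rcond=None)[0][1]

# Second pass: regress average excess returns on beta_hat, without an intercept.
mean_ret = R.mean(axis=0)
lam_hat = (beta_hat @ mean_ret) / (beta_hat @ beta_hat)

# The cross-sectional residuals are the estimated pricing errors.
alpha_hat = mean_ret - lam_hat * beta_hat

print(f"lambda_hat = {lam_hat:.4f}, sample market premium = {f.mean():.4f}")
print(f"largest |alpha| = {np.abs(alpha_hat).max():.4f}")
```

Because the simulated data satisfy the CAPM, the estimated λ should land near the sample mean of the market excess return and the pricing errors should be small.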





Fama-MacBeth Regressions

Fama and MacBeth (1973) suggest an alternative procedure for
running cross-sectional regressions, as well as for producing standard
errors and test statistics.

First, as before, find beta estimates with a time-series regression.

Second, instead of estimating a single cross-sectional regression with
the sample averages, we now run a cross-sectional regression at each
time period:

$$R^{ei}_t = \lambda_t \beta_i + \alpha_{i,t}, \quad i = 1, 2, \ldots, N, \text{ for each } t = 1, 2, \ldots, T \qquad (1)$$



Fama-MacBeth Regressions

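We estimate $\lambda$ and $\alpha_i$ as the time-series averages of the
cross-sectional regression estimates,

$$\hat{\lambda} = \frac{1}{T} \sum_{t=1}^{T} \hat{\lambda}_t, \qquad \hat{\alpha}_i = \frac{1}{T} \sum_{t=1}^{T} \hat{\alpha}_{i,t}$$

We can use the standard deviations of the cross-sectional regression
estimates to generate the sampling errors of these estimates:

$$\sigma^2(\hat{\lambda}) = \frac{1}{T^2} \sum_{t=1}^{T} \left( \hat{\lambda}_t - \hat{\lambda} \right)^2, \qquad \sigma^2(\hat{\alpha}_i) = \frac{1}{T^2} \sum_{t=1}^{T} \left( \hat{\alpha}_{i,t} - \hat{\alpha}_i \right)^2$$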
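
The Fama-MacBeth steps can be sketched in NumPy as below. The returns are simulated so that the CAPM holds, betas are estimated once over the full sample, and each cross-sectional regression is run without an intercept. These choices are assumptions of the example, and applications often use rolling betas instead.

```python
import numpy as np

rng = np.random.default_rng(4)
T, N = 600, 25

# Simulated excess returns for N assets; the CAPM holds by construction.
f = rng.normal(0.006, 0.045, size=T)
true_beta = np.linspace(0.5, 1.5, N)
R = f[:, None] * true_beta + rng.normal(0.0, 0.03, size=(T, N))

# Step 1: full-sample time-series regressions give the beta estimates.
X = np.column_stack([np.ones(T), f])
beta_hat = np.linalg.lstsq(X, R, rcond=None)[0][1]

# Step 2: one cross-sectional regression (no intercept) per period t.
lam_t = R @ beta_hat / (beta_hat @ beta_hat)  # length T
alpha_it = R - np.outer(lam_t, beta_hat)      # T x N residuals

# Step 3: time-series averages of the period-by-period estimates.
lam_hat = lam_t.mean()
alpha_hat = alpha_it.mean(axis=0)

# Step 4: sampling variances from the time-series variation of the estimates.
var_lam = ((lam_t - lam_hat) ** 2).sum() / T**2
var_alpha = ((alpha_it - alpha_hat) ** 2).sum(axis=0) / T**2
t_lam = lam_hat / np.sqrt(var_lam)

print(f"lambda_hat = {lam_hat:.4f}, se = {np.sqrt(var_lam):.4f}, t = {t_lam:.2f}")
```

In this setup, where the same betas are used in every period, the averaged λ̂ coincides with the estimate from a single cross-sectional regression on sample average returns; the procedure differs in how the standard errors are obtained.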

