
**Statistical Pair Trading on International ETFs**

Rebecca Wu

Troy Shu

STAT 434 Final Project Report

Steele

December 18, 2012


I. Executive Summary

Pair trading international ETFs with a non-adaptive strategy does not seem to perform well over longer time frames, because the relative dominance of mean reversion and momentum in international ETFs changes over time. However, after we applied an adaptive “filter” to our pair trading strategy, our returns improved dramatically, which suggests that successfully capturing these shifts between mean reversion and momentum can be profitable.

Our first step was to conduct exploratory data analysis on the price and return data for 22 international ETFs. We did not find anything out of the ordinary: the international ETF prices are highly autocorrelated, not normally distributed and not stationary, while the ETF returns are autocorrelated, heavy-tailed and stationary.

Next, we backtested our international ETF pair trading strategy. Our strategy used the Augmented Dickey-Fuller stationarity test to select only cointegrated ETF pairs as potential trades. After regressing the price of one ETF on the price of the other over a rolling 120-day formation period, the strategy ordered the ETF pairs by the magnitude of the current residual on the 120th day and selected the 5 pairs with the largest residual (divergence) to trade for the next 20 days.

Initial results were poor: the strategy produced a full-period Sharpe Ratio of -0.16 and a maximum drawdown of -53.3%. Plotting the rolling Sharpe Ratio showed that it oscillated around 0.00, so the risk-reward profile of our initial strategy remained poor throughout the sample. We considered using a GARCH(1,1) model to obtain a clearer picture of the standard deviation of our strategy returns, and thus of the rolling Sharpe Ratio. However, the residuals of the fitted GARCH model were heavy-tailed rather than normal, which ruled out using that model for the standard deviation of our strategy’s returns. We also conducted an analysis using different Kelly criterion bets and, as expected, the resulting terminal wealth and compound annualized growth rate were higher than those of the strategy without Kelly bets.

In looking for ways to improve our initial international ETF pairs trading strategy, we noticed that there seemed to be “regime shifts” over time between the dominance of mean reversion and the dominance of momentum in the returns of the ETFs. We applied a moving average “filter” to the initial trading strategy to reverse the direction of the pair trades whenever such a regime shift appeared to have occurred. Our pair trading strategy’s returns improved dramatically, producing a full-period Sharpe Ratio close to 1 and a maximum drawdown of -35%.

II. Premise: “Pairs Trading on International ETFs” Paper

We decided to base our project on the premise of the quantitative finance research paper titled “Pairs Trading on International ETFs” by Schizas, Thomakos, and Wang. In their paper, the authors developed an international ETF pair trading strategy that produced spectacular results but did not seem to have a strong statistical foundation.

The authors of the paper used 23 international ETFs, representing countries such as the USA, Germany, Brazil and Japan, as well as smaller countries like Belgium and Malaysia. They implemented their backtest using a rolling window: a 120-day formation period, in which they ranked all pairs of international ETFs and selected the top five to trade in a simple 1-to-1 ratio, followed by a 20-day trading period, in which they calculated the ex-post returns of the pairs selected in the formation period. Rolling both windows forward together by 20 days produced ex-post returns for another 20 days.
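The rolling scheme can be sketched as a generator of day-index ranges. This is a minimal sketch, not the authors' code: `rolling_windows` is a hypothetical helper, days are integer indices into the price history, and we assume a trailing partial trading window is simply dropped.

```python
def rolling_windows(n_days, formation=120, trading=20):
    """Yield (formation_days, trading_days) index ranges for a rolling backtest.

    A `formation`-day window is followed immediately by a `trading`-day
    window; both then roll forward by `trading` days, so consecutive trading
    windows tile the sample without overlapping. A trailing partial trading
    window is dropped (an assumption of this sketch).
    """
    start = 0
    while start + formation + trading <= n_days:
        formation_days = range(start, start + formation)
        trading_days = range(start + formation, start + formation + trading)
        yield formation_days, trading_days
        start += trading


# Example on a hypothetical 200-day sample
windows = list(rolling_windows(200))
for formation_days, trading_days in windows:
    print(f"form {formation_days.start}-{formation_days.stop - 1}, "
          f"trade {trading_days.start}-{trading_days.stop - 1}")
# trading windows start on days 120, 140, 160 and 180
```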

To order the ETF pairs, the authors used the average absolute difference between the cumulative returns of the two ETFs, measured from the beginning of the 120-day formation period. In doing so, they were essentially betting that two international ETFs whose prices have diverged a lot will tend to converge in the future. However, they offered neither a fundamental economic reason nor statistical evidence to explain such convergence behavior.
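The ranking metric, as we summarize it above, can be sketched in a few lines. The helper names and the ticker-to-price-list input are our own hypothetical choices, and the toy prices are made up purely for illustration.

```python
from itertools import combinations


def cumulative_returns(prices):
    """Cumulative return on each day relative to the first day of the window."""
    p0 = prices[0]
    return [p / p0 - 1.0 for p in prices]


def avg_abs_gap(prices_a, prices_b):
    """Average absolute difference between two cumulative-return paths."""
    ca, cb = cumulative_returns(prices_a), cumulative_returns(prices_b)
    return sum(abs(a - b) for a, b in zip(ca, cb)) / len(ca)


def rank_pairs_by_gap(formation_prices, top_n=5):
    """Rank every ETF pair by the gap metric, largest divergence first.

    `formation_prices` maps a ticker to its closing prices over the
    formation period (hypothetical input layout).
    """
    scores = {
        (a, b): avg_abs_gap(formation_prices[a], formation_prices[b])
        for a, b in combinations(sorted(formation_prices), 2)
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy prices for three hypothetical tickers over a 3-day formation period
prices = {"A": [100, 110, 120], "B": [50, 50, 50], "C": [20, 20, 21]}
print(rank_pairs_by_gap(prices, top_n=2))  # [('A', 'B'), ('A', 'C')]
```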

When assessing the performance of their strategy, the authors neglected to provide basic performance metrics such as monthly return, compound annualized growth rate, or maximum drawdown. They only provided a single bar chart of monthly returns and a few equity curves.


Their results seemed too good to be true, given that very few months had negative returns, even throughout the 2008 financial crisis. Furthermore, the negative monthly returns never fell below -5%, while the positive returns frequently exceeded +5%, at some points even reaching 10% or 20%. The equity curves also seemed suspect, since the portfolio of the top five pairs consistently beat the market over the entire period.

The goal of our project was to develop a more statistically sound international ETF pair trading strategy by only trading cointegrated ETF pairs. This avoids basing trading decisions on spurious regressions: regressing one non-stationary price series on another tends to produce a highly significant alpha and beta even when the two ETFs are completely independent of each other. In addition, rather than only trading each pair in a 1-to-1 ratio, we used the Engle-Granger two-step method to estimate the cointegration ratio, and we constructed our trades by taking opposite positions in ETF y and ETF x, with the x leg sized in proportion to that ratio.

III. Description of Our Strategy

In our project, we used the same type of rolling backtest that Schizas and his co-authors used; however, we implemented the Engle-Granger two-step method to select the pairs to trade. First, for each ETF pair, we regressed the price of one ETF on the price of the other over the 120-day formation period. Next, we ran a Dickey-Fuller stationarity test on the residuals from that regression to check whether the spread between the two prices was stationary. If the residuals were not stationary, we eliminated the pair, since it does not make statistical sense to pair trade two ETFs whose prices may drift apart indefinitely.
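The two Engle-Granger steps can be illustrated with a self-contained sketch. It is not the code used for our backtest: it fits the regression with an intercept, runs a plain (non-augmented) Dickey-Fuller regression on the residuals, and compares the t-statistic with an assumed 5% Engle-Granger critical value of about -3.34 (ordinary Dickey-Fuller critical values are too lenient for estimated residuals). All function names are hypothetical, and the data are simulated.

```python
import math
import random


def ols(y, x):
    """OLS of y on x with an intercept; returns (alpha, beta)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return my - beta * mx, beta


def dickey_fuller_t(e):
    """t-statistic of gamma in  e_t - e_{t-1} = gamma * e_{t-1} + u_t.

    Plain Dickey-Fuller regression: no constant (OLS residuals have mean
    zero) and no lagged differences, i.e. not the augmented version.
    """
    lagged = e[:-1]
    diff = [b - a for a, b in zip(e[:-1], e[1:])]
    sxx = sum(v * v for v in lagged)
    gamma = sum(d * v for d, v in zip(diff, lagged)) / sxx
    resid = [d - gamma * v for d, v in zip(diff, lagged)]
    s2 = sum(r * r for r in resid) / (len(resid) - 1)
    return gamma / math.sqrt(s2 / sxx)


# Assumed approximate 5% Engle-Granger critical value (two series, constant).
EG_CRITICAL_5PCT = -3.34


def engle_granger(y, x, critical=EG_CRITICAL_5PCT):
    """Step 1: regress price y on price x. Step 2: unit-root test on residuals."""
    alpha, beta = ols(y, x)
    resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    t_stat = dickey_fuller_t(resid)
    return {"alpha": alpha, "beta": beta, "resid": resid,
            "df_t": t_stat, "cointegrated": t_stat < critical}


# Simulated example: x is a random walk and y = 1 + 2x + stationary noise,
# so the pair is cointegrated by construction.
rng = random.Random(42)
x, level = [], 100.0
for _ in range(500):
    level += rng.gauss(0, 1)
    x.append(level)
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]
result = engle_granger(y, x)
print(round(result["beta"], 3), round(result["df_t"], 1), result["cointegrated"])
```

A pair is kept only when the residual t-statistic is more negative than the critical value; otherwise it is dropped before ranking.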

After determining the valid pairs to potentially trade, we ranked the remaining pairs by the absolute magnitude of the last residual in the 120-day formation period (i.e., the most recent residual). We then selected the five pairs with the greatest divergence to trade for the following 20 days.

We constructed dollar-neutral positions in the selected ETF pairs using the estimated betas from the regressions, and we made sure that each trade was placed in the correct direction. For example, suppose we chose to trade the SPY-EWM (iShares MSCI Malaysia Index) pair because it had a very negative residual of -1 (a very large absolute magnitude) after regressing y = EWM price on x = SPY price. By definition, the residual was calculated as $y - \beta x$, so a very negative residual meant that the $\beta x$ leg was “overpriced” compared to the y leg. In this case, we shorted our x leg (SPY) and went long our y leg (EWM).
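Ranking and trade direction can be combined in one hypothetical helper, `select_trades`. This sketch assumes the betas and last residuals have already been estimated, and it sizes each trade as one unit of y against $\beta$ units of x; scaling the legs to be exactly dollar neutral is not shown. The tickers other than SPY and EWM, and all numbers, are made up.

```python
def select_trades(candidates, top_n=5):
    """Rank cointegrated pairs by |last residual| and set each trade's direction.

    `candidates` maps (y_ticker, x_ticker) -> (beta, last_residual), where the
    residual is y - beta * x on the last day of the formation period.
    Returns (pair, units_y, units_x): one unit of y against beta units of x.
    """
    ranked = sorted(candidates.items(), key=lambda kv: abs(kv[1][1]),
                    reverse=True)
    trades = []
    for (y_tkr, x_tkr), (beta, resid) in ranked[:top_n]:
        if resid < 0:
            # beta * x is "overpriced" relative to y: long y, short x
            units_y, units_x = 1.0, -beta
        else:
            # y is "overpriced" relative to beta * x: short y, long x
            units_y, units_x = -1.0, beta
        trades.append(((y_tkr, x_tkr), units_y, units_x))
    return trades


# Hypothetical betas and residuals (the -1 residual mirrors the SPY-EWM example)
candidates = {
    ("EWM", "SPY"): (0.08, -1.0),
    ("EWG", "EWJ"): (1.2, 0.4),
    ("EWZ", "EWW"): (0.9, 0.1),
}
for trade in select_trades(candidates, top_n=2):
    print(trade)
# (('EWM', 'SPY'), 1.0, -0.08)
# (('EWG', 'EWJ'), -1.0, 1.2)
```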

Once we had the trading-period returns (the next 20 days) for the five pairs, we took the arithmetic average of the pair returns on each day to obtain our portfolio’s return for that day; in other words, we equally weighted the five pairs in our portfolio.

One of our exit criteria was as follows: if, within the 20-day trading period, the price residual of a pair reversed sign relative to the original price residual, we exited that pair (and did not rebalance the portfolio). For example, consider the SPY-EWM pair from before, whose original residual was quite negative, -1. The price residual was recalculated every trading day as $y - \beta x$, i.e., the price of EWM minus $\beta$ times the price of SPY (since we defined x as the price of SPY and y as the price of EWM). If one day the price residual became 0.01, its sign was now opposite to that of the original residual (-1), so we exited on the next day’s close (we only used closing prices). Intuitively, this made sense: we entered the pair trade to capture the divergence in price, as measured by the residual, expecting the divergence to close and the residual to cross zero; the difference between the original residual and zero would be our profit. Once the residual switched signs, we would be trading in the wrong direction. Since by then we had captured most of the profit in the “spread” (as measured by the residual one day before the entry date), we exited as soon as possible. We exited all positions in any pairs that we were still trading once the 20-day trading period was over.
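The sign-reversal exit rule reduces to finding the first day on which the residual's sign flips. The sketch below is a hypothetical helper that works on day indices within one 20-day trading period; treating a flip on the final day as an exit at that day's close is our assumption.

```python
def exit_day(entry_residual, trading_residuals):
    """Index of the trading day on whose close a pair is exited.

    `trading_residuals[t]` is y - beta * x at the close of trading day t.
    If the residual's sign reverses relative to `entry_residual`, exit at the
    next day's close; otherwise (or if the flip happens on the last day,
    an assumption) exit at the close of the final day of the period.
    """
    last = len(trading_residuals) - 1
    for t, r in enumerate(trading_residuals):
        if r * entry_residual < 0:  # sign has reversed
            return min(t + 1, last)
    return last


# SPY-EWM style example: entry residual -1, residual turns positive on day 2
print(exit_day(-1.0, [-0.8, -0.3, 0.01, 0.2, 0.1]))  # 3
print(exit_day(-1.0, [-0.9] * 20))                    # 19 (held to the end)
```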

Using a rolling backtest reduced the data-mining risk that comes with choosing a fixed training and testing period. Such a rolling backtest also made particular sense for our project for two reasons: (1) our trading strategy was relatively short-term, so it would have been inappropriate to include historical data from too far back in time, and (2) a pairs trading strategy by nature requires screening all the pairs and trading only those that seem most promising. In our case, the pairs were ordered by a univariate metric, namely the magnitude of the price regression residual.


IV. EDA of the International ETFs Data

Our exploratory data analysis of the international ETFs data produced no surprises: both the ETF price and return series displayed the typical behavior of financial series. As expected, the ETF prices were highly autocorrelated and not stationary, while the returns exhibited heavy tails, autocorrelation, and stationarity.

Of the 23 ETFs that Schizas and his colleagues used, we collected closing price data for 22 ETFs spanning April 01, 1996 to December 31, 2011. We omitted the 23rd ETF, EZU (iShares MSCI EMU Index), because its data was not found on CRSP. While the majority of ETF records started on April 01, 1996, two ETFs (Korea and Taiwan) had data starting on May 10, 2000 and June 20, 2000. From the 22 ETFs we could form a maximum of 231 pairs.
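The pair count is the number of unordered choices of 2 ETFs out of 22, $\binom{22}{2} = 22 \cdot 21 / 2 = 231$; before the Korea and Taiwan series begin, only $\binom{20}{2} = 190$ pairs are available. A quick check in Python, using placeholder tickers:

```python
from itertools import combinations
from math import comb

n_etfs = 22
tickers = [f"ETF{i:02d}" for i in range(n_etfs)]  # placeholder tickers
pairs = list(combinations(tickers, 2))              # unordered pairs
print(len(pairs), comb(n_etfs, 2))                  # 231 231
```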

Normality:

When we tested the international ETF return data for normality, both the Shapiro-Wilk and Jarque-Bera tests yielded p-values of 0.00, strongly rejecting the null hypothesis of a normal distribution. The Jarque-Bera test statistic was very large for each return series, indicating the presence of heavy tails. The normal QQ plots for several of the largest ETFs make the heavy-tailed distributions easy to see.

[Figure: Normal QQ plots of SPY, EWG (Germany) and EWJ (Japan) returns; x-axis: quantiles of standard normal, y-axis: empirical quantiles of returns]
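For reference, the Jarque-Bera statistic combines sample skewness $S$ and kurtosis $K$ as $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$, so fat tails (large $K$) inflate it quickly. The sketch below computes it from moment estimators with the standard library only; it is an illustration on made-up samples, not the implementation we used.

```python
import math


def jarque_bera(x):
    """Jarque-Bera statistic JB = n/6 * (S^2 + (K - 3)^2 / 4) and its p-value.

    S and K are the sample skewness and kurtosis (moment estimators). Under
    normality JB is asymptotically chi-square with 2 degrees of freedom, whose
    survival function is exp(-JB / 2).
    """
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)


# Heavy-tailed toy sample: mostly quiet days plus two extreme moves
heavy = [0.0] * 98 + [10.0, -10.0]
print(jarque_bera(heavy))            # JB is about 9204, p-value is essentially 0
print(jarque_bera([-1.0, 1.0] * 2))  # JB = 2/3 for this tiny light-tailed sample
```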

Independence:

Next, we conducted the Ljung-Box test to check for autocorrelation in the returns of all 22 ETFs. The histogram of the Ljung-Box p-values shows that all of them were very close to zero. Therefore, we strongly rejected the null hypothesis of no autocorrelation for the return series of every international ETF.

[Figure: Histogram of p-values for Ljung-Box tests on ETF returns; x-axis: Ljung-Box p-value (0 to 0.002), y-axis: number of ETFs]
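The Ljung-Box statistic is $Q = n(n+2)\sum_{k=1}^{h} \hat\rho_k^2/(n-k)$, compared with a chi-square distribution with $h$ degrees of freedom. The sketch below is a standard-library illustration; `chi2_sf_even_df` is a hypothetical helper that uses the closed-form chi-square tail for even degrees of freedom, which is why the default `max_lag` of 10 is even. The number of lags used in our actual tests is not stated here.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations rho_1 .. rho_max_lag."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    return [sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def chi2_sf_even_df(q, df):
    """P(chi-square_df > q) for even df, via the closed-form Poisson sum."""
    if df % 2:
        raise ValueError("closed form used here needs an even df")
    half = q / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def ljung_box(x, max_lag=10):
    """Ljung-Box Q = n(n+2) * sum_k rho_k^2 / (n-k), with a chi-square p-value."""
    n = len(x)
    rho = acf(x, max_lag)
    q = n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))
    return q, chi2_sf_even_df(q, max_lag)


# A strongly mean-reverting toy series: up, down, up, down, ...
zigzag = [0.01, -0.01] * 50
print(ljung_box(zigzag))  # very large Q, p-value essentially 0
```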

Autocorrelation:

In the ACF plots for several of the largest ETFs, the lag-1 coefficient had the largest magnitude. Since the lag-1 coefficient was negative, the ETFs seemed to be mean reverting in the short term. Some lags between 10 and 20 also had large positive values, which may signal medium-term momentum, although lags that far out may be too distant to be meaningful.

[Figure: Autocorrelation (ACF) plots of returns for SPY, EWG and one further large ETF, lags 0 to 30]

We then collected the AR(1) coefficients obtained by modeling each ETF’s returns as an AR(1) process. The histogram of these coefficients shows that all of them were negative, meaning that every ETF in our universe tended to display mean reversion in the short term.
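Each AR(1) coefficient is the slope from regressing a day's return on the previous day's return. A minimal sketch with a hypothetical helper and simulated returns (the true coefficient of -0.1 is an arbitrary choice for illustration):

```python
import random


def ar1_coefficient(r):
    """OLS estimate of phi in r_t = c + phi * r_{t-1} + e_t."""
    x, y = r[:-1], r[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


# Simulated returns with mild short-term mean reversion (phi = -0.1)
rng = random.Random(7)
r, prev = [], 0.0
for _ in range(5000):
    prev = -0.1 * prev + rng.gauss(0, 0.01)
    r.append(prev)
print(round(ar1_coefficient(r), 3))  # close to -0.1
```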

Stationarity:

We conducted the Augmented Dickey-Fuller test on the ETF returns to check whether they contained a unit root. The histogram of the test statistics shows that the Augmented Dickey-Fuller statistics for all international ETFs were highly negative: the most negative was around -73, and the least negative was -17. This result indicated that none of the ETF return series contained a unit root, so all of them were stationary, which is expected for financial return series.

[Figure: Histogram of AR(1) coefficients on ETF returns; x-axis: AR(1) coefficient (-0.15 to 0.0), y-axis: number of ETFs]

[Figure: Histogram of Augmented Dickey-Fuller test statistics; x-axis: ADF test statistic (-80 to 0), y-axis: number of ETFs]

V. Overview of Our Strategy’s Performance

After confirming that our data was clean, we backtested our pair trading strategy to see how it would have performed over time. We ran our rolling backtest over the period from September 04, 2004 to December 31, 2011. We chose September 04, 2004 as the start date because the previous trading day was the last day on which at least one of the 22 ETFs had zero trading volume; we did not want to trade any illiquid ETFs.

We plotted the strategy’s equity curve (the cumulative growth of one dollar invested in the strategy). The strategy has a compound annualized growth rate of -3.7%, a maximum drawdown of -53.3%, and a full-period annualized Sharpe Ratio of -0.16. The full-period annualized Sharpe Ratio was calculated by dividing the mean daily return in excess of 10-year Treasuries by the daily standard deviation, and then annualizing this quotient by multiplying by 250/√250, i.e., √250 ≈ 15.8.
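The three headline metrics can be sketched as follows. Assumptions: returns are simple daily returns, the year has 250 trading days (matching the annualization above), the standard deviation is the sample standard deviation, and `daily_rf` is a hypothetical list of daily risk-free returns (converting 10-year Treasury yields to daily returns is not shown). The function names are our own.

```python
import math

TRADING_DAYS = 250  # annualization convention used in this report


def equity_curve(daily_returns, start=1.0):
    """Cumulative growth of `start` dollars invested in the strategy."""
    curve, level = [], start
    for r in daily_returns:
        level *= 1.0 + r
        curve.append(level)
    return curve


def cagr(daily_returns):
    """Compound annualized growth rate, assuming TRADING_DAYS days per year."""
    years = len(daily_returns) / TRADING_DAYS
    return equity_curve(daily_returns)[-1] ** (1.0 / years) - 1.0


def max_drawdown(daily_returns):
    """Worst peak-to-trough decline of the equity curve (a negative number)."""
    peak, worst = 1.0, 0.0
    for level in equity_curve(daily_returns):
        peak = max(peak, level)
        worst = min(worst, level / peak - 1.0)
    return worst


def annualized_sharpe(daily_returns, daily_rf):
    """Mean daily excess return / daily sd, times 250 / sqrt(250) = sqrt(250)."""
    excess = [r - f for r, f in zip(daily_returns, daily_rf)]
    n = len(excess)
    mean = sum(excess) / n
    sd = math.sqrt(sum((e - mean) ** 2 for e in excess) / (n - 1))
    return mean / sd * TRADING_DAYS / math.sqrt(TRADING_DAYS)


print(max_drawdown([0.1, -0.5, 0.2]))  # -0.5: from the 1.1 peak down to 0.55
```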

[Figure: Equity curve of our pair trading strategy (growth of one dollar)]

Interestingly, the equity curve shows that our international ETF pairs trading strategy consistently lost money from early 2005 to early 2008. The strategy’s returns then jumped upward erratically from mid 2008 to mid 2011. From mid 2011 onward, however, the returns became negative again.

These results illustrate the shifting dominance between mean reversion and momentum in our pair trading strategy. The fact that our strategy consistently lost money from 2005 to 2008 meant that we were consistently making the wrong bets: instead of converging as we bet they would, the selected ETF pairs seemed to diverge even more after we selected them. In other words, there appeared to be momentum in the international ETF pairs we traded during that period.

However, there seemed to be a regime shift after 2008, as our pair trading strategy’s returns improved. This suggests that international ETFs became more mean reverting than trending in the short to intermediate term. This makes intuitive sense: the world’s economies were in crisis for a couple of years after 2008, so they, or at least their markets, probably tended to move together. However, our project was an empirical one, so we leave the economic story behind the performance of our trading strategy as a topic for future research.

VI. EDA of Strategy Returns

The average strategy returns had a heavy-tailed distribution and contained autocorrelation, which is typical of most financial returns. The ranked pair returns displayed the same statistical characteristics as the average returns, although there did not appear to be a relationship between the rank of a pair and its level of autocorrelation or stationarity.

Normality:

Both the Shapiro-Wilk and Jarque-Bera normality tests yielded p-values of 0.0000, providing strong evidence that the returns were not normally distributed. The Jarque-Bera test statistic had a very high value of 10,586.97, signifying the presence of heavy tails. The normal QQ plot of the strategy returns confirmed this observation by showing that the returns followed a heavy-tailed distribution.

Independence:

The Ljung-Box test resulted in a p-value of 0.0000, strongly indicating that the returns contained autocorrelation and were not independent. The ACF plot supported this conclusion, showing significant autocorrelation at lags 1, 5 and 6. Fitting AR(p) models (p = 1 to 6) to the average returns revealed that the AR(1) model provided the best fit, with the AR(6) model a close second, since these two had the lowest AIC values (see the table below). This outcome was consistent with the ACF plot.

| AR(p) | AIC |
|---|---|
| 1 | -8877 |
| 2 | -8870 |
| 3 | -8865 |
| 4 | -8858 |
| 5 | -8868 |
| 6 | -8872 |

Stationarity:

Finally, the Dickey-Fuller test resulted in a p-value of 1.01e-16, which indicated that the returns were stationary and did not contain a unit root. This was expected: price series typically behave like random walks and contain a unit root, whereas returns are essentially first differences of (log) prices and are typically stationary.

Analysis of Ranked Pair Returns:

After performing the same analysis on the 5 ranked pair return series, we found that each series also had heavy tails and autocorrelation, much like the average strategy returns. Some ranked pair returns contained more autocorrelation than others, but there did not appear to be a relationship between the rank of the pair and the degree of autocorrelation or stationarity. The pairs with ranks 0 and 2 had only a few statistically significant lags, while ranks 1, 3 and 4 had significant autocorrelation at nearly every lag up to lag 20. The pairs with ranks 1 and 3 had the lowest Dickey-Fuller p-values, on the order of $10^{-19}$, while ranks 0, 2 and 4 had higher p-values, on the order of $10^{-16}$.

VII. Analysis of Strategy Performance: Rolling Sharpe Ratio

Analyzing the 20-day rolling Sharpe ratio of our trading strategy revealed that its performance was not very good, since the Sharpe ratio oscillated around 0.00 across time. Calculating the rolling Sharpe ratio with the GARCH conditional standard deviation, rather than the rolling standard deviation, produced a wider range of outliers and a narrower spread between the first and third quartiles, and did not change this conclusion about the strategy’s performance. A closer look revealed that the GARCH model was not appropriate for the trading strategy returns at all.

Sharpe Ratio Using Unconditional Standard Deviation:

We first calculated the rolling Sharpe ratio by dividing the rolling mean by the rolling standard deviation. The plot of the rolling mean against the rolling standard deviation showed a positive relationship between risk and reward, since the mean return tended to increase as the standard deviation increased. The plot of the rolling Sharpe ratio using the rolling standard deviation showed that the series fluctuated around 0.00.

The box plot of the rolling Sharpe ratio confirmed that it had a mean of 0.00, with the values between the first and third quartiles ranging from -0.01 to +0.01. Based on this result, our trading strategy did not seem to have much value.
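The 20-day rolling Sharpe ratio (rolling mean over rolling standard deviation, not annualized) can be sketched with the standard `statistics` module. The function name and the toy data are hypothetical, and whether excess or raw returns enter the ratio is left to the caller.

```python
import statistics


def rolling_sharpe(returns, window=20):
    """Rolling mean / rolling sample standard deviation (not annualized)."""
    out = []
    for end in range(window, len(returns) + 1):
        chunk = returns[end - window:end]
        sd = statistics.stdev(chunk)
        out.append(statistics.fmean(chunk) / sd if sd > 0 else float("nan"))
    return out


# Toy returns that alternate +1% / -1%: every 20-day window has mean 0
toy = [0.01, -0.01] * 12
print(rolling_sharpe(toy))  # five zeros, one per complete 20-day window
```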


Sharpe Ratio Using Conditional Standard Deviation:

The plot of the average strategy returns showed some volatility clustering from Q4 2008 to Q2 2009, so we fitted a GARCH(1,1) model to the returns and recalculated the rolling Sharpe ratio by dividing the rolling mean by the GARCH conditional standard deviation. We expected this to improve our results, since the conditional standard deviation should account for volatility clustering better than the rolling standard deviation. Instead, the results were worse: the mean of the rolling Sharpe ratio remained at 0.00, while the spread between the first and third quartiles shrank even further, as the box plot shows. The box plots also showed more outliers when using the conditional standard deviation than when using the unconditional standard deviation; one particular outlier is visible in Q3 2010 in the plot of the rolling Sharpe ratio using the conditional standard deviation.

Comparing Conditional vs. Unconditional Standard Deviation:

Comparing the conditional standard deviation with the unconditional rolling standard deviation revealed that the conditional standard deviation varied much more and had higher peaks: its variance was 0.00018, versus 0.00013 for the unconditional standard deviation. This explains why the Sharpe ratios calculated with the conditional standard deviation were smaller in magnitude than those calculated with the unconditional standard deviation, since the standard deviation is the denominator of the ratio.
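For reference, a GARCH(1,1) model replaces the rolling standard deviation with the recursion $\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2$. The sketch below only evaluates that recursion for given parameters; estimation (normally by maximum likelihood) is not shown, and the parameter values are made up. We assume `eps` holds returns minus the mean-equation constant, which is our reading (an assumption) of the “C” coefficient discussed in the GARCH evaluation.

```python
import math


def garch11_sd(eps, omega, alpha, beta, sigma2_0=None):
    """Conditional standard deviations from the GARCH(1,1) variance recursion

        sigma2_t = omega + alpha * eps_{t-1}**2 + beta * sigma2_{t-1}

    `eps` are mean-adjusted returns (returns minus the assumed mean constant).
    Parameters are taken as given here, not estimated.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha, beta >= 0, alpha + beta < 1")
    if sigma2_0 is None:
        sigma2_0 = omega / (1.0 - alpha - beta)  # unconditional variance
    sigma2 = [sigma2_0]
    for e in eps[:-1]:
        sigma2.append(omega + alpha * e * e + beta * sigma2[-1])
    return [math.sqrt(s) for s in sigma2]


# Hypothetical parameters: a shock of 3 raises next-day variance to
# 0.1 + 0.1 * 9 + 0.8 * 0.9 = 1.72
sds = garch11_sd([0.0, 3.0, 0.0], omega=0.1, alpha=0.1, beta=0.8, sigma2_0=1.0)
print([round(s * s, 2) for s in sds])  # [1.0, 0.9, 1.72]
```

A conditional-volatility Sharpe ratio would then divide each day's rolling mean by that day's entry in `sds`.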


Box plots of the unconditional and conditional standard deviations further supported this observation, showing that the conditional standard deviation had many more outliers on the upper end than the unconditional standard deviation.

Evaluation of GARCH(1,1) Model:

To evaluate whether a GARCH model was appropriate in the first place, we first confirmed that there was significant autocorrelation in the trading strategy’s average squared returns, which suggested that the returns might display time-varying conditional heteroskedasticity. Additionally, the Lagrange-Multiplier test produced a p-value of 0.0002, indicating an ARCH effect in the residuals. However, despite this evidence supporting the use of a GARCH model, the normal QQ plot of the GARCH residuals showed that they were far from normally distributed, which meant that a GARCH(1,1) model with normal innovations was not appropriate for modeling the standard deviation of the trading strategy’s returns. Furthermore, the Ljung-Box test still found autocorrelation in both the residuals and the squared residuals (p-values of 0.0000), which meant that the GARCH model did not successfully capture the serial correlation structure of the conditional standard deviation. Finally, the “C” coefficient in the GARCH model was not statistically significant (p-value of 0.5662), so we could not reject the hypothesis that it is zero.

VIII. Analysis of Strategy Performance: Kelly Betting

We also analyzed our strategy’s performance by studying the wealth process of an investor who used Kelly betting to make daily investments in our trading strategy; that is, the investor levered our strategy’s daily return (the average return of the five pairs traded) each day according to the Kelly criterion. We simulated both the full and fractional versions of the Kelly criterion under varying restrictions. Among the full-Kelly simulations, the long-only, unleveraged strategy performed better than the long-short, leveraged strategy, while the fractional version considerably outperformed the full version altogether.

Regardless of their relative performance, all the Kelly strategies did poorly and had negative CAGR values. However, each CAGR was still higher than our trading strategy’s CAGR of -3.70%. This is consistent with the Kelly criterion’s objective of maximizing the long-run growth rate of wealth rather than short-term wealth. Overall, the Kelly simulation results further confirmed the weak performance of our trading strategy, since even betting to maximize long-run wealth could not produce positive returns.

Full Kelly Criterion (Long-Only, Unleveraged)

In this scenario, the investor began with an initial wealth of 100 and only made unleveraged bets when the expected return (estimated from time 0 to t) was positive. The plot of the wealth time series showed that no bets were made for a long period, because the expected return was consistently negative until approximately day 1200 (in 2008). After day 1200, wealth spiked briefly before plummeting to a value of 88. The simulation resulted in a CAGR of -0.65%. The summary statistics for the daily wealth returns showed both a negative mean and a negative Sharpe Ratio, and the largest daily loss (-8.03%) was roughly 3.4 times the size of the largest daily gain (+2.39%). Since the expected returns were so consistently negative, we also tried the long-short, leveraged version of the Kelly strategy, which takes unleveraged short positions when expected returns are negative; as discussed next, this strategy did not fare any better.

Full Kelly Criterion (Long-Short, Leveraged)

We updated the investor’s strategy to include unleveraged short positions as well as leveraged long positions of up to twice the investor’s total wealth. The wealth time series showed visibly more variance, and more bets were made because short positions were now allowed. However, ending at a value of 84, the strategy produced a CAGR of -1.53%, making it even less effective than the long-only strategy. The summary statistics for the daily wealth returns showed that the standard deviation roughly doubled, while the mean return and CAGR roughly doubled in the negative direction. The maximum return also roughly doubled, but the minimum return stayed close to that of the long-only strategy. Since the maximum daily loss stayed roughly the same, this suggests that the additional bets more often went against us than in the long-only Kelly strategy.

Fractional Kelly Criterion (Long-Short, Leveraged)

The full Kelly criterion makes the strong assumption that historical returns are indicative of future expected returns and variance. Fractional Kelly mitigates that risk by scaling down the size of each bet by a factor f.
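The betting rules can be sketched as one wealth simulation with different bet limits. This is a minimal sketch under stated assumptions, not our simulation code: the Kelly fraction is approximated by $\mu/\sigma^2$ (the usual small-return approximation), estimated from all returns before day t with a hypothetical 20-day warm-up; the long-only unleveraged case clips the bet to [0, 1], the long-short leveraged case clips it to [-1, 2], and fractional Kelly multiplies the bet by f before clipping. The toy returns are made up.

```python
import statistics


def kelly_wealth(returns, fraction=1.0, lower=0.0, upper=1.0,
                 w0=100.0, warmup=20):
    """Wealth path of an investor who re-sizes a bet on the strategy daily.

    On day t the bet is fraction * mu / sigma^2, with mu and sigma^2 estimated
    from the strategy returns observed before day t, then clipped to
    [lower, upper] (as a multiple of current wealth). No bet is placed until
    `warmup` observations are available (hypothetical warm-up rule).
    """
    wealth = [w0]
    for t, r in enumerate(returns):
        history = returns[:t]
        bet = 0.0
        if len(history) >= warmup:
            var = statistics.variance(history)
            if var > 0:
                bet = fraction * statistics.fmean(history) / var
        bet = min(max(bet, lower), upper)
        wealth.append(wealth[-1] * (1.0 + bet * r))
    return wealth


# Toy strategy returns with a negative mean
toy = [-0.01, 0.005] * 20
long_only = kelly_wealth(toy, lower=0.0, upper=1.0)    # never bets here
long_short = kelly_wealth(toy, lower=-1.0, upper=2.0)  # ends up fully short
print(long_only[-1], round(long_short[-1], 2))         # 100.0 105.06
```

Under these assumptions, fractional Kelly with f = 0.20 corresponds to `kelly_wealth(returns, fraction=0.2, lower=-1.0, upper=2.0)`.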

Summary statistics of the daily wealth returns for the Kelly simulations:

| Simulation | Max | Min | Mean (μ) | SD (σ) | CAGR |
|---|---|---|---|---|---|
| Full Kelly (long-only, unleveraged) | 2.39% | -8.03% | -0.01% | 0.27% | -0.65% |
| Full Kelly (long-short, leveraged) | 4.28% | -8.04% | -0.03% | 0.52% | -1.53% |
| Fractional Kelly, f = 0.20 (long-short, leveraged) | 0.86% | -1.61% | -0.03% | 0.10% | -0.29% |

After testing a few values of f (0 < f < 1), we found that decreasing f clearly improved the stability of the returns, at the cost of smaller potential returns. For f = 0.20, the strategy had a CAGR of -0.29% and a mean return of -0.03%. Decreasing f to 0.05 produced a higher CAGR of -0.07%, a narrower range between the minimum and maximum daily returns than the -1.61% to +0.86% range at f = 0.20, a lower standard deviation of 0.03%, but also a very small expected return close to 0. On the other hand, increasing f to 0.50 produced a lower CAGR of -0.74%, a wider range between the minimum and maximum, from -4.02% to +2.14%, a higher standard deviation of 0.26%, and a more negative expected return of -0.05%.

IX. Improving Our Strategy’s Performance

Since our international ETF pairs trading strategy seemed to lose money most of the time, we looked at improving its performance, first by reversing the direction of the positions we take, and then by applying a moving average filter to the strategy’s equity curve.

Reversing the Strategy:

By reversing the direction of our original pair trades, we were now betting that the international ETF pairs would continue to diverge after we selected them. Since we selected pairs based on the magnitude of their divergence (as measured by the residual), the bet is essentially that there is momentum in the divergence of international ETF pairs. The reversed strategy had a compound annualized growth rate of 3.8%, a maximum drawdown of -66.7%, and a full-period annualized Sharpe Ratio of 0.16.


We noticed that the reversed strategy’s returns tended to trend over the medium to long term, and a moving average filter on the equity curve seemed a good way to capture this trending behavior. Specifically, we calculated the moving average of the strategy’s equity curve (cumulative growth). If the strategy was underperforming its average performance, we shorted the strategy; since the “strategy” under consideration is the reverse of our original pair trading strategy, shorting it means taking the original, unreversed trades. Likewise, if the strategy was performing above its average, we went long the strategy.
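The filter can be sketched as a function that flips the sign of each day's strategy return according to where the equity curve sits relative to its moving average. Two assumptions are ours: the signal from day t-1's close is applied to day t's return (to avoid look-ahead), and the strategy is left flat until the moving average exists. The function name and toy returns are hypothetical.

```python
def ma_filtered_returns(returns, window=200):
    """Trade a strategy long or short depending on its own equity curve.

    Go long the strategy on day t if its equity curve at the close of day
    t-1 is at or above the moving average of the last `window` closes
    (days t-window .. t-1); otherwise go short. Using day t-1 information
    avoids look-ahead (assumption). Days before the average exists are
    left flat (0.0), also an assumption.
    """
    equity, level = [], 1.0
    for r in returns:
        level *= 1.0 + r
        equity.append(level)
    filtered = []
    for t, r in enumerate(returns):
        if t < window:
            filtered.append(0.0)
            continue
        ma = sum(equity[t - window:t]) / window
        sign = 1.0 if equity[t - 1] >= ma else -1.0
        filtered.append(sign * r)
    return filtered


# The "reversed" strategy is the original pair-trading returns with signs flipped
original = [0.01] * 6                      # hypothetical daily returns
reversed_returns = [-r for r in original]
print(ma_filtered_returns(reversed_returns, window=3))
# [0.0, 0.0, 0.0, 0.01, 0.01, 0.01]: the falling curve sits below its average
```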

Moving Average Filter:

We tested a 200-day moving average as the trade filter described above. The first graph plots the reversed strategy’s equity curve along with its 200-day moving average; the second plots the reversed strategy’s equity curve after filtering the trades by the 200-day moving average as described in the previous paragraph. The reversed pair trading strategy with the 200-day moving average filter had a 23.8% compound annualized growth rate, a -35.2% maximum drawdown, and a full-period annualized Sharpe ratio of 0.95.
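For reference, the three headline metrics can be computed from a daily return series roughly as follows. The report does not state its exact conventions, so this sketch assumes 252 trading days per year, simple daily returns, a sample standard deviation, and a zero risk-free rate by default; all function names are illustrative.

```python
import math

# Sketch of CAGR, maximum drawdown and annualized Sharpe ratio.
# Assumed conventions: 252 trading days/year, simple daily returns,
# sample standard deviation, zero risk-free rate unless daily_rf is given.


def cagr(daily_returns, periods_per_year=252):
    """Compound annualized growth rate of a series of simple daily returns."""
    if not daily_returns:
        raise ValueError("need at least one return")
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    years = len(daily_returns) / periods_per_year
    return growth ** (1.0 / years) - 1.0


def max_drawdown(daily_returns):
    """Largest peak-to-trough decline of the equity curve (zero or negative)."""
    level = peak = 1.0
    worst = 0.0
    for r in daily_returns:
        level *= 1.0 + r
        peak = max(peak, level)
        worst = min(worst, level / peak - 1.0)
    return worst


def annualized_sharpe(daily_returns, periods_per_year=252, daily_rf=0.0):
    """Mean daily excess return over its sample std, scaled by sqrt(periods)."""
    if len(daily_returns) < 2:
        raise ValueError("need at least two returns")
    excess = [r - daily_rf for r in daily_returns]
    mean = sum(excess) / len(excess)
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)
```

Maximum drawdown is returned as a negative number to match the sign convention used for the figures quoted above (e.g. -35.2%).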

[Figure: Reversed Strategy Equity Curve]


The filtered strategy still appeared very volatile: its daily return standard deviation was roughly the same as that of the unfiltered strategy, about 1.79%. However, its compound annualized growth rate was relatively high at 23.8%, which suggests that, with the 200-day moving average filter, we are on average correctly capturing the regime shifts between momentum and mean reversion in our international ETF pairs.
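Two short calculations put the 1.79% figure in context. The first line annualizes it, assuming 252 trading days and roughly independent daily returns. The remaining lines sketch why the filter barely changes volatility, under the assumption that the filtered strategy is always either fully long or fully short the reversed strategy; here $s_t$ is our notation for that day's position sign. Squaring removes the sign, so the two variances differ only through their (small) daily means.

```latex
\begin{aligned}
\sigma_{\text{annual}} &\approx \sigma_{\text{daily}}\sqrt{252} = 1.79\% \times 15.87 \approx 28.4\% \\
\tilde{r}_t &= s_t\, r_t, \quad s_t \in \{+1,-1\} \;\Longrightarrow\; \tilde{r}_t^{\,2} = r_t^{2} \\
\operatorname{Var}(\tilde{r}) &= \mathbb{E}[r^{2}] - \big(\mathbb{E}[\tilde{r}]\big)^{2}
\quad\text{vs.}\quad
\operatorname{Var}(r) = \mathbb{E}[r^{2}] - \big(\mathbb{E}[r]\big)^{2}
\end{aligned}
```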

[Figure: Reversed Strategy Equity Curve Plotted with MA 200 (series: Reversed Strategy Equity Curve, MA200)]

[Figure: Equity Curve After Applying MA 200 Filter]

X. Final Considerations

There are several considerations to take into account in the trading strategy we developed; these can be interpreted either as determinants of risk in our strategy or as jumping-off points for extensions of our analysis. First, our trading strategy does not factor in transaction costs, which may be significant since we trade fairly frequently: we open 5 new pair positions every 20 days. Because each pair has two legs, opening these positions takes 10 trades, or 20 trades including both entry and exit. This averages to about one trade every trading day, which may be too frequent for an individual speculator but is within the realm of possibility for a large institution such as a hedge fund. In addition to transaction costs, there is slippage due to illiquidity. We noticed that some of the international ETFs had average daily volume of only tens of thousands of shares for about the first year of our backtesting period, so an institution could have had trouble trading large quantities of these ETFs in the early years without moving the market too much. Including transaction costs and slippage would reduce the overall profitability of our strategy.
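The trade-count arithmetic above, and a rough way to translate it into a yearly cost drag, can be sketched as follows. The schedule parameters come from the text (5 new pairs every 20 days, two legs per pair, entry plus exit); the cost per trade and the trade size as a fraction of capital are hypothetical inputs, not figures from our backtest.

```python
# Back-of-the-envelope trade count for the rebalancing schedule in the text.
# The inputs to annual_cost_drag are hypothetical placeholders.


def trades_per_day(new_pairs_per_cycle=5, legs_per_pair=2, days_per_cycle=20):
    """Average single-ETF trades per trading day (each leg is entered and exited)."""
    trades_per_cycle = new_pairs_per_cycle * legs_per_pair * 2  # 2 = entry + exit
    return trades_per_cycle / days_per_cycle


def annual_cost_drag(cost_bps_per_trade, notional_fraction_per_trade, trading_days=252):
    """Approximate fraction of capital lost to costs per year.

    cost_bps_per_trade: hypothetical all-in cost per trade, in basis points.
    notional_fraction_per_trade: hypothetical trade size as a fraction of capital.
    """
    trades_per_year = trades_per_day() * trading_days
    return trades_per_year * notional_fraction_per_trade * cost_bps_per_trade / 10_000
```

For example, with a hypothetical 10 bps per trade and each trade sized at 10% of capital, `annual_cost_drag(10, 0.1)` gives about 0.025, i.e. roughly 2.5% of capital per year before slippage.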

Second, our trading strategy ranks the pairs based only on the last residual of the 120-day formation period. This ensures that we trade the pairs with the greatest residual (the greatest “divergence”) right before we enter the trades, since we are betting that the pairs will converge in the near future. We considered using an exponential moving average of the formation-period residuals (weighting more recent residuals more heavily) instead of basing our trading decision on a single data point, but decided against it because we did not expect highly fluctuating residuals to be a problem, given that our exploratory data analysis of the international ETF data came out clean. Extensions of our work could use an exponential moving average or another method of incorporating past residuals from the formation period.
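To make the difference between the two ranking rules concrete, here is a small sketch contrasting the last-residual rule with the exponential moving average alternative. The EMA smoothing convention (alpha = 2/(span+1), seeded at the first residual), the `span` value, and the pair labels are assumptions for illustration only.

```python
# Sketch comparing the report's ranking rule (last residual only) with the
# exponential-moving-average alternative it mentions. Pair labels, the EMA
# convention and the `span` value are hypothetical.


def ema_last(values, span):
    """Final value of an EMA with alpha = 2 / (span + 1), seeded at values[0]."""
    alpha = 2.0 / (span + 1)
    avg = values[0]
    for v in values[1:]:
        avg = alpha * v + (1.0 - alpha) * avg
    return avg


def rank_pairs(residual_paths, k=5, span=None):
    """Return the k pair labels with the largest divergence score.

    span=None scores each pair by |last residual| (the report's rule);
    otherwise by |EMA of the formation-period residuals|.
    """
    def score(path):
        return abs(path[-1]) if span is None else abs(ema_last(path, span))

    return sorted(residual_paths, key=lambda p: score(residual_paths[p]), reverse=True)[:k]
```

For example, with `{"AB": [0, 0, 0, 3.0], "CD": [2.0, 2.0, 2.0, 2.0]}`, the last-residual rule ranks the one-day spike in `AB` first, while an EMA with `span=3` ranks the persistent divergence in `CD` first.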

Third, our initial strategy’s equity curve suggests that international ETFs tend to diverge during good economic times and converge during bad ones. Our trading strategy performed poorly until the 2008 financial crisis, then began performing well as the world’s economies moved together during the crisis, until its performance started declining again partway through 2011. Whether this relationship between economic conditions and ETF pair convergence holds is a research question worth investigating.
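One possible way to start investigating this question, which is our suggestion rather than an analysis performed in this report, is to track how strongly the ETFs move together over time, for example with a rolling average of pairwise return correlations. The sketch below assumes equal-length daily return series keyed by hypothetical tickers; rising values would indicate the kind of co-movement described above.

```python
import math
from itertools import combinations

# Sketch: rolling average pairwise correlation as a co-movement measure.
# Assumptions: `returns_by_etf` maps hypothetical tickers to equal-length
# lists of daily returns; a zero-variance window yields NaN.


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def rolling_avg_pairwise_corr(returns_by_etf, window):
    """Average correlation across all ETF pairs for each trailing window."""
    names = sorted(returns_by_etf)
    n = len(returns_by_etf[names[0]])
    out = []
    for end in range(window, n + 1):
        corrs = [
            correlation(returns_by_etf[a][end - window:end],
                        returns_by_etf[b][end - window:end])
            for a, b in combinations(names, 2)
        ]
        out.append(sum(corrs) / len(corrs))
    return out
```

The window length is a free parameter; a window comparable to the 120-day formation period would be a natural first choice, though that too is an assumption.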

Lastly, there is the peso problem: historical data may not reflect all risks, especially those that materialize in the future. An example of this phenomenon can be seen in the backtest period from 2004 to 2008. The returns to our strategy were very consistent during that period and volatility was low (in this specific case, the international ETF pairs tended to diverge even further after we picked them). Putting ourselves in 2008, with only the historical data available up to that point, we would not have known when, if ever, the persistent momentum in international ETF pairs would break down. In fact, it broke down immediately during the financial crisis, when international ETF pairs suddenly became mean reverting (i.e. they started moving together) and our bets on ETF pair convergence started making money. Our models could not have foreseen this shift, because nothing in the historical data predicted it. This is why adaptive models and trading strategies are valuable in today’s volatile and unpredictable markets.
