
Statistical Arbitrage for Mid-frequency Trading

Nicolas Kseib, Xiaolin Lin, Lorenzo Limonta, Mike Phulsuksombati
June 11, 2014
The main goal of this project is to generate and exploit trading signals from
real-life high-frequency/mid-frequency trading data. With the aid of Thesys, we are
able to use real-life trading data to explore and evaluate algorithms based on
statistical arbitrage. We use a PCA analysis to isolate residual signals. Using multiple
data mining techniques, we develop market-neutral trading strategies. The parameters
of the different learning methods are updated using walk-forward optimization. Finally,
we simulate the trading strategies using real data and evaluate their performance. The
results show that our methods, which use different models as well as the raw residual
signal, can generate profitable strategies in pre-managed data.



In the field of investment, statistical arbitrage refers to strategies attempting to profit from
pricing inefficiencies in the market, identified through mathematical models. The basic
assumption of any such strategy is that prices of similar securities will move towards a
historical average. It encompasses a variety of strategies and investment programs whose
common features are:
• Trading signals are systematic
• Trading book is market-neutral
• The mechanism for generating excess returns is statistical
The idea is to make many bets with positive expected returns, taking advantage of diversification across stocks, to produce a low-volatility investment strategy which is uncorrelated
with the market.
Historically, the father of modern statistical arbitrage techniques is pairs trading, a strategy where two securities with similar return behavior are first identified and then traded.
Once their respective values diverge significantly from the expected mean, one goes long on
the underperforming security whilst going short on the security performing better than expected. This is done under the assumption that in the long term their prices will converge
back to their mean.

In this paper we follow the natural extension of such a strategy: rather than simply choosing a pair, we trade groups of stocks against other groups of stocks, thus implementing a generalized pairs-trading technique. The innovative aspect of our technique is the trading time horizon, which is extremely different from that of usual arbitrage strategies. Rather than using time windows of weeks or months, we behave as high-frequency trading (HFT) firms, looking for imbalances in the short term, on the order of a few minutes at most.

In this section, we introduce one way to construct the residual signals. This signal will be the basis of our trading strategy. In section 2, we present the walk forward optimization (WFO), the method we use to update our parameter, as well as our trading strategy and some signal filtering methods used to improve our performance. The simulation results and daily returns from our strategy are shown in section 3. We conclude and discuss our challenges in section 4.

1.1 PCA Analysis

As one can imagine, fundamental to the correct implementation of such a strategy is understanding the correlation between the price movements of the different assets that make up our book, as it will allow us to distinguish which information is relevant and which is noise. A first approach for extracting our signal of interest from data is to use Principal Components Analysis (PCA). We follow the approach in [AL10] and [Lal+99]. This approach uses historical share-price data on a cross-section of $N$ stocks going back $M$ days in history. For simplicity of exposition, the cross-section is assumed to be identical to the investment universe, although this need not be the case in practice. Let us represent the stocks' return data, on any given date $t_0$, going back $M + 1$ days as a matrix

$$R_{ik} = \frac{S_{i(t_0-(k-1)\Delta t)} - S_{i(t_0-k\Delta t)}}{S_{i(t_0-k\Delta t)}}, \quad k = 1, \dots, M, \quad i = 1, \dots, N, \tag{1}$$

where $S_{it}$ is the price of stock $i$ at time $t$ adjusted for dividends and $\Delta t = 1$ minute. Since some stocks are more volatile than others, it is convenient to work with the standardized returns matrix $Y$,

$$Y_{ik} = \frac{R_{ik} - \bar{R}_i}{\hat{\sigma}_i} \tag{2}$$

where

$$\bar{R}_i = \frac{1}{M} \sum_{k=1}^{M} R_{ik} \tag{3}$$

and

$$\bar{\sigma}_i^2 = \frac{1}{M-1} \sum_{k=1}^{M} (R_{ik} - \bar{R}_i)^2, \tag{4}$$

with $\hat{\sigma}_i$ and $\bar{\sigma}_i$ denoting the same sample standard deviation.
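The construction in equations (1) to (4) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function names `simple_returns` and `standardize` are hypothetical, prices are assumed to be stored as an array of shape (N, M + 1) in chronological order (so the column index runs forward in time, the reverse of the index $k$ in equation (1)), and no stock is assumed to have a constant price over the window.

```python
import numpy as np

def simple_returns(prices):
    """One-period returns from an (N, M + 1) price array, oldest column first.

    Hypothetical helper: column order is chronological, the reverse of k in eq. (1).
    """
    prices = np.asarray(prices, dtype=float)
    return prices[:, 1:] / prices[:, :-1] - 1.0

def standardize(returns):
    """Standardized returns Y of eq. (2): subtract each stock's mean return
    (eq. 3) and divide by its sample standard deviation (eq. 4, ddof=1)."""
    mean = returns.mean(axis=1, keepdims=True)
    std = returns.std(axis=1, ddof=1, keepdims=True)  # assumed non-zero
    return (returns - mean) / std
```

Each row of the standardized matrix then has zero mean and unit sample variance, which is what makes the diagonal of the empirical correlation matrix equal to one.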

The empirical correlation matrix $C$ of the data is defined by

$$\rho_{ij} = \frac{1}{M-1} \sum_{k=1}^{M} Y_{ik} Y_{jk}, \tag{5}$$

which is symmetric and positive semi-definite. Notice that, for any index $i$, we have

$$\rho_{ii} = \frac{1}{M-1} \sum_{k=1}^{M} (Y_{ik})^2 = \frac{1}{M-1} \, \frac{\sum_{k=1}^{M} (R_{ik} - \bar{R}_i)^2}{\bar{\sigma}_i^2} = 1. \tag{6}$$

The commonly used solution to extract meaningful information from the data is Principal Components Analysis. We consider the eigenvectors and eigenvalues of the empirical correlation matrix and rank the eigenvalues in decreasing order:

$$N \ge \lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \dots \ge \lambda_N \ge 0. \tag{7}$$

We denote the corresponding eigenvectors by

$$v^{(j)} = \left(v_1^{(j)}, \dots, v_N^{(j)}\right), \quad j = 1, \dots, N. \tag{8}$$

We denote the density of eigenvalues of the empirical correlation matrix by

$$\rho_C(\lambda) = \frac{1}{N} \frac{dn(\lambda)}{d\lambda} \tag{9}$$

where $n(\lambda)$ is the number of eigenvalues of $C$ less than $\lambda$. Interestingly, if $Y$ is a $T \times N$ random matrix, $\rho_C(\lambda)$ is exactly known in the limit $N \to \infty$, $T \to \infty$ with $Q = T/N \ge 1$ fixed, and reads:

$$\rho_C(\lambda) = \frac{Q}{2\pi\sigma^2} \frac{\sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}}{\lambda} \tag{10}$$

$$\lambda_{\min}^{\max} = \sigma^2 \left(1 + \frac{1}{Q} \pm 2\sqrt{\frac{1}{Q}}\right) \tag{11}$$

with $\lambda \in [\lambda_{\min}, \lambda_{\max}]$ and where $\sigma^2$ is equal to the variance of the elements of $Y$.

Using $\lambda_{\min}$ and $\lambda_{\max}$, we identify the significant eigenvalues in the above sense, i.e. those that fall outside the band predicted for a purely random matrix. For each index $j$ of a significant eigenvalue, we consider the corresponding "eigenportfolio", which is such that the respective amount invested in each of the stocks is defined as

$$Q_i^{(j)} = \frac{v_i^{(j)}}{\hat{\sigma}_i}. \tag{12}$$

The eigenportfolio returns are therefore

$$F_{jk} = \sum_{i=1}^{N} \frac{v_i^{(j)}}{\hat{\sigma}_i} R_{ik}. \tag{13}$$

The residual signal can then be generated as follows:

$$\beta = (F^T F)^{-1} F^T C \tag{14}$$

$$\hat{C} = F \beta \tag{15}$$

$$\text{Residual} = C - \hat{C} \tag{16}$$
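The chain from equation (5) to equation (16) can be illustrated with the sketch below. It is written under several assumptions that are not stated in the text: the names `marchenko_pastur_bounds` and `pca_residuals` are hypothetical, only eigenvalues above $\lambda_{\max}$ are treated as significant, and the quantity regressed on the eigenportfolio returns in equations (14) to (16) is taken to be the matrix of raw returns (the usual choice in [AL10]).

```python
import numpy as np

def marchenko_pastur_bounds(q, sigma2=1.0):
    """Edges of the random-matrix eigenvalue band, eq. (11), with q = T / N."""
    lam_min = sigma2 * (1 + 1 / q - 2 * np.sqrt(1 / q))
    lam_max = sigma2 * (1 + 1 / q + 2 * np.sqrt(1 / q))
    return lam_min, lam_max

def pca_residuals(returns, sigma2=1.0):
    """Residual returns after removing significant eigenportfolios.

    returns: (N, T) array of raw returns. Returns (residuals (N, T),
    factor returns (T, m), eigenvalues in ascending order).
    """
    n, t = returns.shape
    mean = returns.mean(axis=1, keepdims=True)
    std = returns.std(axis=1, ddof=1, keepdims=True)
    y = (returns - mean) / std
    corr = y @ y.T / (t - 1)                      # eq. (5)
    eigvals, eigvecs = np.linalg.eigh(corr)       # ascending eigenvalues
    _, lam_max = marchenko_pastur_bounds(t / n, sigma2)
    significant = eigvals > lam_max               # assumption: upper edge only
    weights = eigvecs[:, significant] / std       # eq. (12), shape (N, m)
    factors = returns.T @ weights                 # eq. (13), shape (T, m)
    if factors.shape[1] == 0:                     # nothing to project out
        return returns.copy(), factors, eigvals
    # Least-squares fit of returns on factor returns, eqs. (14)-(15)
    beta, *_ = np.linalg.lstsq(factors, returns.T, rcond=None)
    residuals = returns.T - factors @ beta        # eq. (16)
    return residuals.T, factors, eigvals
```

The parameter `sigma2` plays the role of $\sigma^2$ in equation (11): raising it widens the noise band and leaves fewer eigenportfolios, which is why the residuals depend on its value.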

1.2 Residual Prediction with Data Mining Techniques

We also need to select a model to predict the residual signal. We use standard data mining techniques, namely least squares, random forest, elastic net regression and multinomial logistic regression.

1.3 Main Challenges

• One first challenge that is built in with Random Matrix Theory is that we will have many zero returns as we use smaller time differences. This makes it hard to compute the SVD that we need for our eigenvalues/eigenvectors.
• The residual signals we computed are very small and are easily perturbed by finite-precision computation.
• The biggest challenge is that the residual signals are sensitive to the $\sigma^2$ we choose for the distribution of eigenvalues.
• There is an inverse relationship between time cost and parameter tuning. Ideally, we want to tune more parameters and get more stable, accurate results. But we will then spend more time tuning them, which is not ideal if we are executing in fast time.

2 Trading Strategy

2.1 Parameter Estimation

The walk forward optimization (WFO) methodology will be used to update the choice of the variance of the elements in the standardized returns matrix. This variance is used to compute the eigenvalue spectrum of the empirical correlation matrix. This will allow us to update our model and account for any non-stationarity or new information in the process. The performance of the parameter to be optimized will be judged "a posteriori" in terms of the robustness or stability of the obtained optimal parameter maximizing a certain objective function. In this report, we choose the Sharpe ratio to be our objective function. It should be noted that it is essential to find a good, stable optimization procedure in order to fit the parameters used in the modeling of the mean reversion process.

As we said, the optimization is performed using the Sharpe ratio as an objective function, starting with a 7M$ investment (10K$ and following a buy/short trading strategy on the 70 stocks in the XLK technology ETF). The classical WFO algorithm was used, and we start by building our model using an initial amount of data satisfying the $T/N > 1$ condition. Using the optimal model obtained in this initial period of data we make our first out-of-sample predictions. After the out-of-sample prediction period ends, this segment of data is added to our in-sample database and we build another predictive model with a different $\sigma^2$.

Figure 1: Walk Forward Optimization

The first predictive model is built using data from the first 191 minutes of the trading day, which gives $Q = T/N \approx 2.73$, thus satisfying the conditions allowing us to apply equations (10) and (11). This model is used to predict the next period consisting of 120 minutes. When the 120 minutes are over they are added to our sample and used on top of the initial data to build a new predictive model. This process is repeated until the end of the day. Using periods of size 120 we end up building two different models each day, having distinct values of $\sigma^2$.

From a preliminary analysis it seems that the choice of the length of the training periods is crucial for a correct parametrization of $\sigma^2$; indeed, if you choose a large number of minutes you might run the risk of over-fitting, whereas if you choose a small sample the statistical significance can greatly deteriorate. Figures 2 and 3 show the variation of the Sharpe ratio with respect to $\sigma^2$ for both periods during a certain day. We show the results for two days where for the first one profits were realized and for the second one losses were incurred.

Figure 2: The 25th of February plot of the Sharpe ratio versus $\sigma^2$ to compute the trading signals.
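The expanding-window schedule described above (an initial 191-minute fit, then 120-minute out-of-sample periods that are folded back into the sample) can be sketched as follows. This is an illustration under stated assumptions, not the authors' notebook: `backtest` is a hypothetical callback returning the per-minute strategy returns obtained on minutes `[start, end)` for a given $\sigma^2$, the Sharpe ratio is computed per minute without annualization or a risk-free rate, and the candidate grid of $\sigma^2$ values is supplied by the caller.

```python
import numpy as np

def sharpe_ratio(pnl):
    """Mean over sample standard deviation of per-period returns (not annualized)."""
    pnl = np.asarray(pnl, dtype=float)
    sd = pnl.std(ddof=1)
    return 0.0 if sd == 0 else pnl.mean() / sd

def walk_forward(n_minutes, backtest, sigma2_grid, initial=191, step=120):
    """Return a list of (train_end, test_end, best_sigma2) tuples.

    For each expanding in-sample window [0, train_end), pick the sigma2 that
    maximizes the in-sample Sharpe ratio; it is then the value to apply on the
    following out-of-sample window [train_end, test_end).
    """
    schedule = []
    train_end = initial
    while train_end < n_minutes:
        test_end = min(train_end + step, n_minutes)
        scores = [sharpe_ratio(backtest(0, train_end, s2)) for s2 in sigma2_grid]
        best = sigma2_grid[int(np.argmax(scores))]
        schedule.append((train_end, test_end, best))
        train_end = test_end  # fold the out-of-sample data into the sample
    return schedule
```

Assuming a trading session of 390 minutes, this schedule yields exactly two fits per day (at minutes 191 and 311), which is consistent with the two models per day mentioned in the text.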

Figure 3: The 24th of February plot of the Sharpe ratio versus $\sigma^2$ to compute the trading signals.

A possible future extension of this work is to test an important hypothesis. The idea is to try to understand whether the stability or robustness of the optimization procedure has any impact on the profits and losses. The hypothesis would stipulate that the absence of a good cluster of positive $\sigma^2$ values can be considered as evidence against the predictive power of the model, and hence an indication of a possible bearish day. Indeed, for the profitable day we can see in figure 2 that there is a cluster of positive values of $\sigma^2$ (achieving a high value of the Sharpe ratio) and a cluster of negative values, for both considered periods. It was interesting to note that the positive cluster of values was around $\sigma^2 = [0.75, 0.85]$, in accordance with the results obtained by Laloux et al. (2008). This could explain the good results seen in the accumulated wealth plot for the raw residuals approach on the 25th of February data. On the other hand, losses were incurred for the 24th of February data when using the same signal generation approach. Looking at the graph of $\sigma^2$ versus the Sharpe ratio again, we see that the cluster of positive values of $\sigma^2$ is absent and that the objective function was highly oscillatory. This should be considered as a warning that the methodology might try to overfit the available data. We provide the ipython notebook to demonstrate this part and the WFO at https://github.

2.2 Signal Filtering Techniques

The signal is obtained from the residual from section 2. The signal to buy the stock is when the residual is positive and the signal to short sell the stock is when the residual is negative. However, the residuals may contain some noise and we may end up with a non-market-neutral strategy. Thus, to achieve a more accurate signal and trade with a market-neutral strategy, we need to filter out some residuals before transforming them into signals.

Figure 4: Signal Filtering Techniques

2.2.1 Residual Filtering

When we get the residuals, we want to extract the strong signal from them; therefore, we ignore the small residuals with low magnitude by setting them to zero. We pick the ratio $\rho$ to filter out and set up the banners by

positive banner = ρ · max(positive residual) + (ρ − 1) · min(positive residual)
negative banner = ρ · max(negative residual) + (ρ − 1) · min(negative residual)

The residuals that lie between these banners are filtered out by setting them to zero.

2.2.2 Active Stock Filtering

We also consider only stocks that are actively traded, by considering the ratio of zeros in the return data. If the returns of a stock contain a higher ratio of zeros than the threshold $\tau$, we will not consider that stock in our analysis.

2.2.3 Sorting Residual

The goal of this part is to obtain a market-neutral portfolio. We sort the filtered residuals in order and count the number of positive residuals and negative residuals. Then we set the low-magnitude residuals to zero until the numbers of positive and negative residuals are equal.

2.2.4 Generate Signal

After all the methods described above, we generate the signal by considering the sign of the residual: if it is positive we get the signal 1 to buy the stock; if it is negative we get the signal −1 to sell the stock.
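The four filtering steps can be sketched as below. This is a minimal illustration, not the authors' implementation. All function names are hypothetical, and the banner is interpreted as the magnitude lying a fraction $\rho$ of the way from the smallest to the largest residual magnitude on each side; that is an assumption about the intent of the banner formula, not a literal transcription of it.

```python
import numpy as np

def filter_residuals(resid, rho):
    """Zero out weak residuals on each side (assumed reading of the banner rule)."""
    resid = np.asarray(resid, dtype=float)
    out = resid.copy()
    for side in (resid > 0, resid < 0):
        if side.any():
            mags = np.abs(resid[side])
            banner = mags.min() + rho * (mags.max() - mags.min())
            out[side & (np.abs(resid) < banner)] = 0.0
    return out

def active_mask(returns, tau):
    """True for stocks whose share of zero returns does not exceed tau."""
    zero_ratio = (np.asarray(returns) == 0).mean(axis=1)
    return zero_ratio <= tau

def balance(resid):
    """Drop the weakest residuals on the larger side until both counts match."""
    out = np.asarray(resid, dtype=float).copy()
    while (out > 0).sum() != (out < 0).sum():
        larger = out > 0 if (out > 0).sum() > (out < 0).sum() else out < 0
        idx = np.flatnonzero(larger)
        out[idx[np.argmin(np.abs(out[idx]))]] = 0.0
    return out

def to_signal(resid):
    """+1 to buy, -1 to sell, 0 for no position."""
    return np.sign(resid).astype(int)
```

For instance, with residuals `[0.1, 0.5, 1.0, -0.2, -0.9, 0.05]` and `rho = 0.3`, filtering keeps `0.5`, `1.0` and `-0.9`; balancing then drops `0.5`, leaving one long and one short position.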

3 Results

3.1 Simulation Settings

We are provided with high-frequency data by Thesys from Feb 24, 2014 to Feb 28, 2014. Based on this data set, we conducted simulations to tune and evaluate different methods. There are two kinds of simulations with different settings. The first one is for the comparison across the different methods proposed before. The parameters are selected by tuning in the first part of the simulations for comparing different methods. We used the filtering parameter $\rho = 0.3$ (except for logistic regression, which is not stable with regard to the filtering parameter, so we perform logistic regression without filtering) and the active stock parameter $\tau = 0.5$ for fair comparisons across all methods. Daily profit (investment returns before and after the last minute of the day), computational cost (inner-loop time), and minimum and maximum wealth are evaluated and compared. The other one is for the comparison of different parameters in the parameter tuning, using only Raw Residual signals.

3.2 Simulation Results

First of all, we demonstrate the computational cost of the different methods, which can guide the feasibility of implementation in high-frequency trading (shown in Table 1 (a) below). Among all the methods, Raw Residual Signal and Least Square based methods are the most efficient, and Random Forest is the most time-consuming. Logistic Regression costs from 60 to 170 milliseconds in each inner loop. Random Forest costs from 130 to 280 milliseconds. Elastic Net costs 50 to 200 milliseconds. Least Square costs 16 to 115 milliseconds, while Raw Residual Signal costs 5 to 15 milliseconds. Thus, all the proposed and developed methods can finish evaluation, prediction and execution within 300 milliseconds, which makes them feasible for high-frequency trading.

Secondly, our simulation compares across different methods and shows that there is some unstable factor in the last minute of the net wealth we invested (due to dumping all the positions). Daily profits for each method are shown in Table 1 (b) and Table 1 (c) below. The highest daily profit is achieved using Elastic Net, while the largest daily loss is incurred using Logistic Regression. Least Square tends to result in a small profit or loss.

Thirdly, the proposed methods are, in general, robust in terms of daily profit. However, across all the models we used, we cannot identify one model that is always profitable. But we are optimistic about the results we get, since we do not see any model that should be discarded. We need to perform more backtesting to further discover the detailed character of each method and decide which method to implement based on different settings.

Last but not least, in the raw residual simulations, if we choose the variance parameter for the eigenvalue distribution, we are profitable on every day of the given data.

Strategy              Min Inner-Loop Time (ms)   Max Inner-Loop Time (ms)
Raw Residual          5                          15
Least Square          16                         115
Logistic Regression   60                         170
Elastic Net           60                         200
Random Forest         130                        280
(a) Inner-loop time cost for different methods

(b) Investment returns the minute before the last minute of the day (rows: Elastic Net, Least Square, Random Forest, Logistic Regression, Raw Residual; columns: 1st Day to 5th Day)

(c) Investment returns on the last minute of the day (rows: Elastic Net, Least Square, Random Forest, Logistic Regression, Raw Residual; columns: 1st Day to 5th Day)

Table 1: Performance comparison of each strategy

4 Conclusion

As can be seen from table one, we are able to generate positive (P) returns on any of the days considered, though this ability to make a profit depends strongly on the chosen optimization method. A closer look at the table reveals that any given method is unable to generate profit on more than four days out of the five considered. This implies that we are just as likely to generate negative (N) or positive returns on any given day; thus, relying on a single strategy seems to be unwise and too risky. This suggests that in a real-case scenario, in order to correctly implement a winning HFT stats-arb strategy, we will have to correctly choose which strategy to use out of the five studied and, if more than one is chosen, what weight to assign to each in order to maximize returns while minimizing risk. A first possibility could simply be choosing a static optimal weight for each of the presented optimization strategies, though from a cursory look at table one, under the study undertaken, it would seem to be better to limit future analysis to strategies 1-3-5 or 3-4-5, since on any given day at least two of the strategies give positive returns. Alternatively, a continuously updating weighting process could be applied. As figures 5 through 11 show, there seem to be clear daily trends depending on the optimization strategy chosen. This feature could be exploited to maximize return by increasing the amount of money invested through a single strategy throughout the day as it makes a profit, while filtering out the negative effect due to negative-return strategies.

In summary, the results presented so far show the feasibility of implementing a HFT stats-arb strategy, under the mindful consideration of a correct computation of our book signal as well as a careful implementation of our optimization process.

References

[AL10] Marco Avellaneda and Jeong-Hyun Lee. "Statistical arbitrage in the US equities market". In: Quantitative Finance 10.7 (2010), pp. 761–782.
[Lal+99] Laurent Laloux et al. "Noise dressing of financial correlation matrices". In: Physical Review Letters 83.7 (1999), p. 1467.

A Appendix

Figure 5: Raw Residuals. Simulation of 7M investment, wealth versus time. Panels: (a) Day 24, (b) Day 25, (c) Day 26, (d) Day 27, (e) Day 28.

Figure 6: Raw Residuals after filtered with ρ = 0.3. Simulation of 7M investment, wealth versus time. Panels: (a) Day 24, (b) Day 25, (c) Day 26, (d) Day 27, (e) Day 28.

Figure 7: Elastic Net. Simulation of 7M investment, wealth versus time. Panels: (a) Day 24, (b) Day 25, (c) Day 26, (d) Day 27, (e) Day 28.

Figure 8: Least Square. Simulation of 7M investment, wealth versus time. Panels: (a) Day 24, (b) Day 25, (c) Day 26, (d) Day 27, (e) Day 28.

Figure 9: Random Forest. Simulation of 7M investment, wealth versus time. Panels: (a) Day 24, (b) Day 25, (c) Day 26, (d) Day 27, (e) Day 28.

Figure 10: Random Forest after filtered with ρ = 0.3. Simulation of 7M investment, wealth versus time. Panels: (a) Day 24, (b) Day 25, (c) Day 26, (d) Day 27, (e) Day 28.

Figure 11: Logistic Regression. Simulation of 7M investment, wealth versus time. Panels: (a) Day 24, (b) Day 25, (c) Day 26, (d) Day 27, (e) Day 28.