SSRN Id2828363 PDF

Order Anticipation around Predictable Trades∗
Mehmet Sağlam†
Lindner College of Business
University of Cincinnati
email: mehmet.saglam@uc.edu
Initial Version: August 2016

This Version: August 2018
Abstract
I study the presence of order anticipation strategies by examining predictable patterns in
large order trades. I construct three simple signals based on child-order execution patterns and
find empirical evidence that stronger signals are correlated with higher execution costs. I use
the SEC’s ban on unfiltered access and increase in noise trading as shocks to order anticipatory
activities of algorithmic traders and show that the price impact of predictability is smaller when
order anticipation becomes difficult. The empirical findings are mostly consistent with the back-
running theory which predicts delayed price impact as strategic traders learn about the large
order gradually.
Keywords: Order Anticipation, Algorithmic Trading, Predatory Trading, Back-running

JEL Classification: G12, G14.
∗
I am grateful for helpful comments from Robert Battalio, Hendrik Bessembinder, Jonathan Brogaard (EFM
discussant), Shane Conway, Brian Hatch, Terry Hendershott, Peter Hoffmann, Albert Menkveld, Maureen O’Hara,
Richard Philip (FIRN discussant), Vikas Raman (FMA discussant), Andriy Shkilko, Elvira Sojli, Mao Ye, and
Haoxiang Zhu and conference participants at Econometrics of Financial Markets, FIRN 2017 Sydney Market Microstructure
Meeting and FMA 2017.
†
Please address correspondence to: Mehmet Sağlam, University of Cincinnati, 2925 Campus Green Drive,
Cincinnati, OH, 45221-0195, Phone: (513) 556-9108; Fax: (513) 556-0979; Email: mehmet.saglam@uc.edu
1. Introduction
The recent advances in trading technology combined with machine learning theory have given
sophisticated investors tools to extract valuable signals about asset prices and make dynamic
trading decisions at high frequency. Usually referred to as high-frequency traders (HFTs), these
investors follow various trading strategies ranging from market making to arbitrage trading between
multiple assets or venues. In response to their rising trading activity in multiple markets, several
academic studies have sought to decipher the costs and benefits associated with this new form of trading.1
The empirical evidence has been mostly positive, pointing to smaller bid-ask spreads and faster
price discovery. However, the implications are still controversial, especially in public perception,
thanks in large part to the appearance of Michael Lewis and his book, Flash Boys, in mainstream
media arguing that “the stock market is rigged.” At the center of this debate was the potential
ability of these strategic algorithmic traders to use order anticipation strategies to sniff out large
orders and exploit this information to profit at the expense of other investors.
Schedule-based algorithms aiming to match the time-weighted average price (TWAP) or
volume-weighted average price (VWAP) in the market are believed to be the main source of information
leakage in large order executions. In a recent survey by ITG, a financial technology company
implementing algorithmic execution services, roughly 50% of buy-side investors reported that
their biggest source of information leakage occurs in schedule-based algorithms.2 In some cases,
predictable patterns emerging from these algorithms can even be discerned by human traders. On
July 19, 2012, four large-cap stocks, Coca-Cola, IBM, McDonald’s and Apple, displayed identical
price patterns potentially due to a schedule-based algorithm.3 In Figure 1, I plot the trade prices
for Coca-Cola. In odd half-hour intervals, the price is roughly decreasing and in even half-hour
intervals the price is increasing. Interestingly, the peaks occur roughly at the half-hour mark whereas
the lows come at the start of the hour. These price dynamics can be exploited by strategic traders.
1
For example, Hendershott et al. (2011) provide empirical evidence that algorithmic trading (AT) reduces spreads
and adverse selection using an exogenous event in NYSE’s quote dissemination. Brogaard et al. (2014b) find that
HFTs increase price efficiency by trading in the direction of permanent price changes. Jones (2013) and Menkveld
(2016) survey the empirical and theoretical literature in high-frequency trading.
2
“Put a Lid on it: New Way to Measure Information Leakage”, ITG, August 2017.
3
“Sawtooth Trading Hits Coke, IBM, McDonald’s, and Apple Shares”, The Wall Street Journal, July 19, 2012.
[Figure 1 plot: Price (vertical axis, 76.4 to 77.6) against Hours (horizontal axis, 0 to 6.5)]
Figure 1: This figure illustrates the trade prices for Coca-Cola on July 19, 2012 from the TAQ database. There
is a strong sawtooth trading pattern potentially caused by schedule-based algorithms.
For example, in anticipation of the continuing pattern, the low price corresponding to t = 4.5 occurs
a few minutes earlier, leading to a trade priced at $76.75, which is significantly lower than its past
30-minute average price. Overall, this real-world example provides evidence that some strategic
traders may take advantage of such patterns with further help from machine-learning techniques.
Investors and policymakers are concerned with these order anticipation strategies due to poten-
tial negative effects on price discovery and liquidity. If an adversary trader can infer the presence
of a large buy order submitted by a long-term investor for liquidity reasons, he can trade along
with the investor during the initial period of the execution to overshoot the price and then sell
back to the investor his accumulated position at the elevated prices to achieve roughly riskless
profits. In return, this means larger execution costs for the long-term investors as examined in
Brunnermeier and Pedersen (2005). If the long-term investor is trading due to private information
that he generated with costly effort, then a strategic adversary can extract this information using
order anticipation strategies and share in the resulting profit at the expense of the investor. This
implies that the incentive of the investor to invest in information acquisition would decrease in the
presence of order anticipation. In this case, as argued by Stiglitz (2014) and Weller (2015), the
markets can be less informative if algorithmic traders can share the information rents of the fun-
damental investors who spend resources to obtain information about the real economy. In order to
differentiate these two different motivations, I will use “predatory trading” and “back-running” (as
introduced by Yang and Zhu (2015)) to refer to the first and second types of anticipatory trading,
respectively. In predatory trading, the resulting price impact is expected to be temporary as the
investor is not informed whereas in the presence of back-running, the price impact is permanent
due to the information-based trading.
In theory, order anticipation strategies may also lead to desirable market liquidity conditions
usually referred to as “sunshine trading.” This possibility is often ignored in the academic dis-
cussions surrounding the effects of order anticipation. If a large order execution is submitted by
an uninformed investor, predictable executions may actually motivate market makers to provide
additional liquidity knowing that there is no adverse selection risk. Admati and Pfleiderer (1991)
formalize this theory and illustrate that public announcement of an uninformed liquidation may
reduce trading costs. They directly assume that market makers have perfect knowledge about the
informationless liquidation ex ante. In the context of large order executions, anticipatory traders
can gradually learn whether the large execution is informed or not by analyzing the price impact of
the past trades. Depending on the accuracy of this exploration process, algorithmic traders may be
incentivized to provide more liquidity during the lifetime of the execution. In the empirical liter-
ature, only the perfect information case has been extensively studied. For example, Bessembinder
et al. (2016) find supporting evidence of sunshine trading in large executions occurring in crude oil
ETF rolls.
Given the mixed implications of predictable trading both theoretically and empirically, it is
important to study the net effects arising from recurring patterns in the order flow data. In this
paper, I empirically investigate whether predictable patterns in large order executions lead to higher
or lower trading costs. Using natural experiments, I construct the potential channels between order
anticipation and predictable patterns. I then examine the consistency of my empirical findings
with predatory trading, back-running and sunshine trading. The empirical analysis utilizes more
than 20,000 parent-orders constituting more than 2.5 million child-order executions. My sample
includes 15 months of data on liquid S&P 500 stocks from January 1, 2011 to March 31, 2012. The
dataset consists of large orders submitted by 146 distinct investors comprised of mostly institutional
portfolio managers. All orders in the dataset are executed to match the VWAP realized during the
lifetime of the parent-order. Average order size is roughly $1 million and corresponds to roughly
1.8% of the volume traded during the execution.
If algorithmic traders follow anticipatory trading, their activities should be particularly easy
to detect in predictable executions. Here, I define execution predictability as the likelihood that a
strategic trader can succeed in inferring the presence of a large order execution. I provide
three simple signals that quantify this measure of predictability utilizing statistics from the child-
orders of the execution. These signals are constructed based on the recurring intuition about
minimizing information leakage.
The first signal is computed using the volatility of the size of the child-order trades. For
example, if a large order is executed in a series of equal trade sizes, e.g., 150 shares, the pattern
recognition algorithms may deduce the existence of a large order with high likelihood. Second,
using a similar intuition, I consider the regularity of the trading intervals as a signal that can leak
order-size information and propose a signal that computes the volatility of time intervals between
successive trades. This is motivated by the empirical evidence in the literature that execution
algorithms exhibit robust clock-time periodicity, the tendency to make trades around full-seconds
or half-seconds as identified by Hasbrouck and Saar (2013). It is also possible to consider the
order size and trading frequency at the same time and propose an aggregate measure involving
the volatility of trading rate. Thus, the final signal is based on the correlation between executed
quantity and elapsed time and will be very close to 1 for executions with almost constant trading
rate.
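The three signals described above can be sketched roughly as follows. This is an illustration only, not the exact construction used in the paper (the precise definitions are given in Section 4). It assumes that dispersion is measured by the coefficient of variation and that the third signal correlates cumulative executed quantity with elapsed time. The function name `predictability_signals` and its arguments are hypothetical.

```python
import numpy as np

def predictability_signals(times, sizes):
    """Illustrative predictability signals for one parent-order.

    times: child-trade timestamps in seconds, sorted ascending.
    sizes: child-trade sizes in shares (same length as times).
    Assumption: dispersion is the coefficient of variation; the
    paper's exact normalisation may differ.
    """
    times = np.asarray(times, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    gaps = np.diff(times)                    # time between successive trades
    size_vol = sizes.std() / sizes.mean()    # low value: near-equal trade sizes
    gap_vol = gaps.std() / gaps.mean()       # low value: near-regular trading clock
    elapsed = times - times[0]
    executed = np.cumsum(sizes)              # cumulative executed quantity
    rate_corr = np.corrcoef(elapsed, executed)[0, 1]  # near 1: constant trading rate
    return size_vol, gap_vol, rate_corr

# A perfectly regular execution: 150 shares every 60 seconds.
regular = predictability_signals(np.arange(0, 600, 60), np.full(10, 150))
```

For the regular execution, both volatility measures are zero and the correlation is essentially 1, which corresponds to the most predictable case discussed in the text.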
It is worthwhile to emphasize that in the optimal execution literature, some of these measures
are argued to be factors that actually decrease execution costs. However, these results are based on
the absence of any other competing traders in the marketplace. For example, in the seminal paper
on optimal liquidation of a large block of shares, Bertsimas and Lo (1998) illustrate that an
equal-partitioning policy is optimal if the price impact is permanent and linear. This suggests that the
volatility of the traded quantities of child-orders or trading intervals should be kept to a minimum. Similar
deterministic algorithms are proposed in order to accommodate U-shaped volume profiles in the
markets, e.g., VWAP. All three signals are motivated by information leakage due to
deterministic patterns, and the presence of strategic traders with order anticipation skills would be
the most significant factor increasing the cost of predictable executions.
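The equal-partitioning result can be checked numerically with a small sketch. It assumes a stylised model in which each purchase of x shares raises the price permanently by theta times x and executes at the post-impact price, with zero-mean noise omitted. The function name `expected_cost` and the parameter values are illustrative assumptions, not taken from the paper.

```python
def expected_cost(slices, p0, theta):
    """Expected cost of buying `slices` in sequence under linear permanent impact.

    Each trade of x shares moves the price up by theta * x permanently and
    executes at the post-impact price (zero-mean noise is omitted).
    """
    price, cost = p0, 0.0
    for x in slices:
        price += theta * x   # permanent, linear price impact
        cost += x * price    # pay the post-impact price
    return cost

# Buy 1,000 shares in four child trades, equally split versus front-loaded.
equal = expected_cost([250, 250, 250, 250], p0=50.0, theta=1e-4)
front_loaded = expected_cost([700, 100, 100, 100], p0=50.0, theta=1e-4)
```

In this model the cost equals p0 times the total quantity plus theta/2 times the sum of the squared total and the squared slice sizes, so the equal split is cheapest when no other trader reacts to the pattern. The order anticipation argument concerns what happens once that assumption fails.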
This paper contributes to the literature by providing robust signals that can quantify execution
predictability and analyzing their impact on execution costs using liquid S&P 500 stocks. Since the
execution strategy is known and unique along with a single broker, the dataset allows me to clearly
verify the impact of predictable executions. The empirical findings are consistent with earlier
literature studying the link between HFT activity and institutional trading costs and uncover
another direct channel for the cost increase through execution predictability. Using a diverse
universe of institutional investors in terms of short-term trading skill, I find evidence of back-running
strategies and the cost increase is economically significant. Analyzing uninformed executions, I do
not find evidence of sunshine trading, i.e., predictable uninformed executions do not have lower
trading costs. Specifically, the empirical analysis provides four main contributions.
First, the signals are significantly correlated with execution costs after controlling for the stan-
dard determinants of price impact implying the potential presence of order anticipation. As the
predictability of the execution goes up, I find that the execution cost measured by its implemen-
tation shortfall, the percentage deviation of the average execution price from its starting price,
increases by economically substantial amounts. The median execution cost is 2.7 bps and I find
that a one-standard deviation increase in my signals increases the execution cost by a range of 1.6
to 1.8 bps after controlling for a rich set of execution level statistics including the use of marketable
and passive limit orders. Further, to control for potential contemporaneous correlation between
illiquidity and the signals, I show that lagged signals also predict higher execution costs.
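As a concrete illustration of the cost measure, implementation shortfall in basis points can be computed as below. The sketch assumes a share-weighted average execution price and a sign convention under which a positive number is a cost for both buys and sells. The function name and the sample numbers are hypothetical.

```python
def implementation_shortfall_bps(prices, sizes, start_price, is_buy):
    """Implementation shortfall of one execution in basis points.

    Assumptions: the average execution price is share-weighted, and the
    result is signed so that a positive number is a cost for buys and sells.
    """
    total_shares = sum(sizes)
    avg_price = sum(p * q for p, q in zip(prices, sizes)) / total_shares
    side = 1.0 if is_buy else -1.0
    return side * (avg_price - start_price) / start_price * 1e4

# A buy order filled at an average of 50.0135 against a 50.00 starting price.
cost = implementation_shortfall_bps([50.01, 50.02], [650, 350], 50.00, True)
```

With these made-up fills the cost is about 2.7 bps, the same order of magnitude as the median execution cost reported above.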
Second, I exploit the SEC ban on unfiltered access and exogenous increase in noise trading to
directly link the increase in trading costs to successful order anticipation ability. In the presence
of the SEC ban, supervised access to the market centers imposes additional controls and latency.
These speed bumps slow down the order submission of some fast algorithmic traders, limiting their
order anticipation abilities. I verify that proxies of AT activity point to a significant reduction
after the ban and show that when order anticipation becomes more difficult after the ban, the price
impact of predictability drops. Further, order anticipation ability will be weaker when there is an
increase in noise trading volume. Using the potential behavioral bias of traders to round prices, I
document that when noise trading disguises the patterns leaking from large order executions, the
cost of predictability again decreases.
Third, I decompose the realized cost increase into permanent and temporary components for
each signal. I find that the permanent component explains most of the cost increase suggesting that
these signals may be leaking the potential private information of the investor to the back-runners.
Further, I find that the signals are correlated with the cost increase realized at the later stage of
the execution implying a learning period and delayed reaction by back-runners. These results are
broadly consistent with back-running theory.
Fourth, I find that the broker responsible for the executions is mostly successful in achieving the
main objective of the execution strategy, i.e., minimizing the deviation between average execution
price and the market VWAP. I observe that the correlation between implementation shortfall and
VWAP slippage is negligible. This finding implies that execution predictability may still emerge in
equilibrium despite sharing informational rents with strategic traders.
These contributions have important implications for market structure. In the presence of order
anticipation, fundamental investors may either shy away from information acquisition or engage
in costly encryption strategies to minimize information leakage. In this case, policymakers can
potentially minimize this aggregate social cost as a centralized social planner. If algorithmic traders
are using public order flow data such as trade sizes and timestamps in the SIP, these data might
be rearranged to minimize their signal-to-noise ratio. For example, trade sizes and their timestamps
can be either published with noise or aggregated across different trades by the reporting market
venues. Specifically, Harris (2013) proposes that markets should report only approximate trade
sizes within various buckets or only aggregated volumes at 5-minute time intervals.
The methodology and the findings of the paper can also be utilized in the study of the Consolidated
Audit Trail (CAT) that will be updated significantly in November 2018. On November 15,
2016, the SEC approved a plan to implement a software system that includes information about
the identity of the traders.4 Thus, it is expected that the regulators will soon be able to test
theories related to order anticipation using an extensive dataset. The proposed signals in this paper and
their underlying intuition can be influential for future studies.
This paper is primarily related to the growing empirical literature analyzing the relationship
between institutional trading costs and HFTs’ activities. Van Kervel and Menkveld (2015) study the
cost dynamics of large orders by examining the trading behavior of HFTs during their execution.
They find that HFTs initially provide liquidity to the execution but eventually switch to trading in
the same direction as the large order. My paper differs from this study by providing a specific
mechanism on how the HFTs may learn about the presence of the large order. Van Kervel and
Menkveld (2015) use trading data reported from four institutional investors based in Sweden who
might have used various trading algorithms due to different brokers or trading needs. Instead, my
dataset uses trading data from a much larger set of 146 distinct investors that utilize a specific
broker’s single trading algorithm. The benefit of focusing on one algorithm is that it removes
the variation in execution costs due to heterogeneity across brokers and their various algorithms.
The larger investor base helps me to study the impact across a heterogeneous group of informed and
uninformed traders. I focus on the footprints left by the trading algorithm whereas Van Kervel and
Menkveld (2015) examine how HFTs’ trading activity evolves during the execution.
Brogaard et al. (2014a) examine the shocks to the latency of the London Stock Exchange that
increase HFT activity over time and study the variation of institutional trading costs around these
changes using a dataset provided by Abel Noser, a consulting firm specialized in the analysis
of institutional trading costs.5 They do not find any clear evidence of change in trading costs
during these latency upgrades. In a closely related study to Brogaard et al. (2014a), Tong (2015)
documents evidence that institutional trading cost is higher as the HFT activity increases using Abel
Noser and NASDAQ HFT datasets. Tong (2015) also analyzes the relationship at the stock-day
or parent-order level due to unavailability of the child-order data. Instead, my analysis focuses on
4
http://www.wsj.com/articles/sec-to-vote-on-consolidated-audit-trail-to-detect-market-manipulation-1479240411
5
Hu et al. (Forthcoming) provide detailed information about the Abel Noser dataset and survey the academic
papers that utilize this dataset.
identifying predictable patterns using child-order data and illustrates an alternative perspective that
institutional investors may ease exploration activities of the algorithmic traders with predictable
trading.
Hirschey (2013) finds empirical evidence that HFTs may increase the trading cost of
non-HFTs by trading ahead of them. He obtains his dataset from NASDAQ, which labels trading
firms as either HFT or non-HFT. The data is focused on this distinction and does not allow him
to identify a specific instance of a large order execution and its corresponding cost. For this reason,
his paper does not consider whether non-HFTs leave footprints while trading large quantities of shares.
Korajczyk and Murphy (2015) use order-level data with masked trader ids provided by the
Investment Industry Regulatory Organization of Canada, a regulatory organization for Canada’s
equity markets. As in the case of Hirschey (2013), the dataset does not fully reveal how a parent
order is split into child orders but they try to identify both large institutional orders and the set
of HFTs using a reasonable set of assumptions. Using these labels, they find that HFTs initially
provide liquidity to the large order but then compete with it due to inventory management and
back-running. My dataset exactly identifies the series of the child-orders of the parent order.
Constructing the signals for predictability, I document another channel for the potential increase
in execution costs.
The rest of the paper is organized as follows: Section 2 reviews the relevant theoretical models
that allow me to form my hypotheses regarding the empirical analysis on order anticipation. In
Section 3, I describe the dataset and provide its summary statistics. Section 4 includes the detailed
information about the construction of the signals and uses multivariate analysis to test whether
the presence of the signals leads to higher execution costs. Section 5 examines the impact of
reduced order anticipatory activities on the cost of predictable executions after a regulatory shock
slowed down the order submissions of a group of algorithmic traders. Section 6 illustrates that
when patterns cannot be detected easily, the cost of predictability drops. Section 7 examines the
relationship between the empirical results and the theoretical models summarized in Section 2.
Section 8 examines the performance of the algorithm and illustrates that execution predictability
may emerge in equilibrium under misaligned objectives. Finally, I conclude in Section 9.
2. Theoretical Framework and Hypotheses Development
In this section, I review the relevant theoretical models that guide my empirical analysis on order
anticipation. I summarize my expectations from the perspectives of three competing theories:
sunshine trading, predatory trading and back-running. Sunshine trading implies lower trading costs
in predictable executions whereas the remaining two imply higher trading costs. One difference
between predatory trading and back-running lies in the nature of the price impact: a temporary
impact due to urgent liquidity trading in the former and a permanent impact due to private
information in the latter.
Another difference between predatory trading and back-running is the timing of the price impact.
Early impact on prices is expected in predatory trading whereas there will be delayed price impact
in back-running due to initial learning about the large order.
Brunnermeier and Pedersen (2005) provide a model of predatory trading in which a distressed
large investor is forced to sell his position and other strategic adversaries aim to exploit this
liquidity need. Instead of providing liquidity, predatory traders trade in the same direction initially
and cause the price to decrease further. At further depressed prices, predatory traders then buy
from the investor, which ultimately increases the total liquidation cost of the investor.6
Admati and Pfleiderer (1991) provide an alternative theory of forced liquidation based on sun-
shine trading in which investors can potentially signal that their trade is not motivated by private
information. This announcement can induce other strategic traders to provide liquidity and may
actually reduce trading costs. Specifically, Degryse et al. (2014) illustrate that order splitting can
serve as a noisy form of preannouncing trades. Bessembinder et al. (2016) extend the model of
Brunnermeier and Pedersen (2005) for resilient markets in which the immediate price impact of
trades may be transitory. In this model, in addition to the same-side trading before the liquidation,
the strategic anticipators trade in the opposite direction to the liquidator and decrease the
liquidator’s transitory price impact if the market is largely resilient. This benefit to the liquidator from
strategic trading persists at any level of market resiliency if there are multiple strategic traders. In
the context of large order executions, these theories suggest potential decrease in trading costs for
6
Moallemi et al. (2012) consider a model with a predatory trader who does not have perfect information about
the large order but learns about it from the noisy price impact of the past trades.
predictable executions.
If predatory trading increases the cost of the execution for an uninformed order, one would
expect this cost to be transitory. This implies that if predictable executions induce the stock price
to go up during a large buy order, it will be expected to revert back to its pre-execution level after a
short period. However, if the large order is submitted by an informed investor, the price changes due
to predictable patterns should be permanent. Such a permanent price impact would be consistent
with the back-running theory presented in Madrigal (1996) and Yang and Zhu (2015). These papers
use two-period Kyle models in which an informed investor trades on private information in the first
period and strategic traders receive an imperfect signal about this in the second period. They then
exploit this signal to trade in the same direction implied by the private information of the investor.
In this sense, adversary traders are trying to “steal” the private information of the investor by
inferring from the past order flow.
Yang and Zhu (2015) illustrate that the optimal strategy for the investor facing the risk of leaking
valuable order flow information is to randomize his first period trade. Back-running theory implies
that any additional price impact due to a predictable signal in a large order execution should be
permanent as the prices ultimately reflect the private information of the investor. Although both
predatory trading and back-running imply higher trading costs, they have particularly different
implications with regard to initial price movements during the execution. In predatory trading,
one would expect larger price impact during the initial period as adversary traders have perfect
information about the liquidity trade of the distressed investor and they would trade in the same
direction as the large order at the beginning of the execution. However, in back-running, there is
an initial learning period, so the price impact due to the adversary traders would be delayed until
the later stages of the execution. This difference can therefore also be exploited to distinguish between
these two theories.
The theoretical model in Yang and Zhu (2015) directly links the success of the back-runner
to the volatility of the signal. When the signal’s volatility is higher, the back-running ability and
the resulting profit become lower. Therefore, this model implies that even though an algorithm
may be using the same instructions from the same code, market conditions can determine the net
impact on trading costs. For example, the algorithm’s footprints can be detected with less accuracy
during an increased activity from noise traders. Relatedly, predictable signals can be exploited with
higher success if the adversary trader is faster both in processing the patterns and trading on them.
Aït-Sahalia and Sağlam (2013) study this effect in the context of market-making employed by a
high-frequency trader. The high-frequency market maker is able to predict a low-frequency trader’s
liquidity need with an imperfect signal. When the market maker is confident that he is going to buy
from an impatient trader, he lowers his bid quote before the trader’s order reaches the market. Aït-Sahalia
and Sağlam (2013) show that when the market maker is faster, such strategic quote widening also
occurs more frequently. This theory implies that predictable signals will lead to lower trading costs
when the adversary traders lose their speed advantage. Overall, these two papers directly imply
that when order anticipation becomes more difficult because of noisier signals or slower trading
technology, the cost of predictability decreases.
3. Data
I compile the data from several sources. Stock returns, volume, outstanding shares and prices
come from the Center for Research in Security Prices (CRSP). Intraday trade and quote data come
from the Trade and Quote (TAQ) database. The proprietary data on institutional large trades is
provided by the global execution desk of a large investment bank. In the next section, I describe
this dataset.
3.1. Proprietary Execution Data
For my empirical study, I use detailed execution data from the historical order databases of a large
investment bank (“the broker”) in the US. The broker is one of the top five banks in the United
States with respect to total assets and is one of the top five providers of execution services by
market share.
The orders originate mainly from institutional portfolio managers but the investor identities are
masked. All of the large orders in this dataset are executed according to a single execution strategy,
VWAP. This algorithm is the most commonly employed strategy, constituting roughly 50% of all
of the broker’s execution volume. According to this strategy, parent-orders are executed in smaller
child-order trades over the course of a trading day to achieve an average execution price that is as
close as possible to the volume-weighted average price observed in the whole market during this
trading period.
Through personal communication with the data provider, I obtained the implementation details
of the VWAP algorithm. First, the client submits a large order with either a target completion time
or an urgency score in the broker’s order submission platform. The algorithm then slices the large
order using the historical volume curve over the past month between the initiation of the order and
targeted completion time. The algorithm does not explicitly model volume spikes around scheduled
announcements. In this set-up, the client may affect the initiation of the VWAP algorithm with
order size, start time and targeted end time, but once the order is initiated, the client does not have
any control over how each child order’s size, price, or timing is selected. In summary, the client
may shape the parameters of the VWAP algorithm, implying that execution costs can depend on
the client identity.
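The slicing step described above can be sketched as follows. This is a simplified stand-in, not the broker's actual algorithm, whose internals are not disclosed beyond the description in the text. It assumes the historical volume curve is given as average volume per intraday bin, that the parent-order is split in proportion to that curve inside the execution window, and that rounding remainders go to the last bin. The function name `vwap_schedule` and the sample curve are hypothetical.

```python
def vwap_schedule(order_shares, volume_curve, start_bin, end_bin):
    """Split a parent-order across time bins in proportion to historical volume.

    volume_curve: average historical volume per intraday bin (e.g. over the
    past month); the execution window is bins start_bin .. end_bin - 1.
    Hypothetical helper; rounding rule is an assumption.
    """
    window = volume_curve[start_bin:end_bin]
    total = sum(window)
    schedule = [int(order_shares * v / total) for v in window]  # round down
    schedule[-1] += order_shares - sum(schedule)  # remainder goes to the last bin
    return schedule

# U-shaped curve over 13 half-hour bins; execute 10,000 shares over bins 2 to 5.
curve = [12, 9, 7, 6, 5, 5, 5, 5, 6, 7, 8, 10, 15]
slices = vwap_schedule(10_000, curve, 2, 6)
```

The client's choices (order size, start time, target end time) enter only through the arguments, while the split itself is deterministic given the curve. That determinism is the kind of footprint the predictability signals are designed to capture.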
This dataset provides rich attributes at the parent- and child-order level. At the parent-order
level, most of the statistics are based on the execution horizon. These statistics include order size,
direction of the order (buy or sell), order start and end times, participation rate (the ratio of order
size to the total volume during the trading interval), average execution price, proportional bid-ask
spread and mid-quote volatility based on the duration of the execution. For each parent-order, I also
have information at the child-order level. Child-order level statistics include the time (timestamped
to the millisecond), size, venue (market center) and price of each child trade.
Merging the information on child order executions with the quote data from TAQ, I infer
whether the trade execution was a result of the algorithm’s submission of a passive non-marketable
order or an aggressive marketable order. I label a buy (sell) execution at the child-level as a passive
order if the trade price was below (above) the NBBO mid-point at the time of the fill. Similarly,
I label a buy (sell) execution at the child-level as an aggressive order if the trade price was above
(below) the NBBO mid-point at the time of the fill. The remaining executions are filled at the
NBBO mid-point. I then create the corresponding parent-order level statistics, the passive order (PO)
ratio and the aggressive order (AO) ratio, by computing the fraction of the shares executed via each
order type.
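The labeling rule above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the code used in the paper: the `Fill` record, its field names, and the function names are hypothetical, and the NBBO mid-point is assumed to have already been matched to each fill from the quote data.

```python
from dataclasses import dataclass

@dataclass
class Fill:
    side: str      # "buy" or "sell" (direction of the parent order)
    price: float   # execution price of the child trade
    mid: float     # NBBO mid-point at the time of the fill (assumed pre-merged)
    shares: int    # executed quantity

def classify(fill):
    """Label one child fill as 'passive', 'aggressive', or 'mid'."""
    # Exact float equality is used for brevity; integer ticks are safer in practice.
    if fill.price == fill.mid:
        return "mid"
    below_mid = fill.price < fill.mid
    if fill.side == "buy":
        return "passive" if below_mid else "aggressive"
    return "aggressive" if below_mid else "passive"

def order_type_ratios(fills):
    """Share-weighted passive (PO) and aggressive (AO) ratios of a parent order."""
    total = sum(f.shares for f in fills)
    po = sum(f.shares for f in fills if classify(f) == "passive") / total
    ao = sum(f.shares for f in fills if classify(f) == "aggressive") / total
    return po, ao
```

The two ratios need not sum to one, because shares filled exactly at the mid-point belong to neither category.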
The traded asset universe includes 498 stocks from the S&P 500 index. I keep executions with a
duration greater than 10 minutes but no longer than 6.5 hours, the duration of a regular trading day. All
executions occur between January 2011 and March 2012, inclusive. I also exclude executions that
have fewer than 5 child-order trades or a value of less than $50,000 at the arrival time of the order,
which corresponds to approximately 1,500 executions. I finally exclude an additional 200 executions
with missing entries of participation rate, spread, volatility, or duration.
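As a rough illustration, the sample filters described above could be applied to one parent order as follows. The function and argument names are hypothetical, and missing control values are assumed to be encoded as `None`.

```python
def keep_execution(duration_min, n_child_trades, arrival_value_usd, controls):
    """Return True if a parent order passes the sample filters described above.

    controls: dict holding participation rate, spread, volatility and duration;
    a value of None marks a missing entry (an assumption of this sketch).
    """
    # Duration must exceed 10 minutes and be at most 6.5 hours (390 minutes).
    if not (10 < duration_min <= 390):
        return False
    # Drop orders with fewer than 5 child trades or less than $50,000 in value.
    if n_child_trades < 5 or arrival_value_usd < 50_000:
        return False
    # Drop orders with any missing control variable.
    return all(value is not None for value in controls.values())
```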
This is the first research paper that is completely based on this dataset of detailed child-order
trades. The dataset has been partially used in Sağlam et al. (2014) and Sağlam and Tuzun (2018).
In Sağlam et al. (2014), the child-order dataset has been used to examine whether skilled investors
prefer to trade in dark pools. In Sağlam and Tuzun (2018), the dataset has been used to examine
the price resiliency of stocks with high ETF ownership.
3.2. Summary Statistics
The final sample consists of 20,335 executions coming from 9,856 buy and 10,479 sell orders on
498 stocks. There are 146 distinct investors submitting the orders. Each investor has at least 1
execution and at most 477 executions. Table 1 provides additional summary statistics for my final
execution data.
On average, a parent-order has a value of roughly $1 million and is executed in 128 child trades
with an average participation rate of 1.8%. The mean duration of the executions is a little above
3 hours. Overall, these statistics are similar in terms of order of magnitude when compared to other
datasets studied in the contemporaneous literature, for example, the Canadian and the Swedish
datasets used by Korajczyk and Murphy (2015) and Van Kervel and Menkveld (2015), respectively.
3.3. Measurement of Institutional Trading Costs
In this section, I introduce the most commonly used cost measure for institutional trading, imple-
mentation shortfall (IS). Perold (1988) introduced this measure to quantify the difference between
the performance of a theoretical and the implemented portfolio. Over the years, IS has gained
popularity especially as a proxy for institutional trading cost. It is computed as the normalized
difference between the average execution price and the price of the asset prior to the start of the
execution. Formally, the IS of the ith parent-order is given by
\[
\mathrm{IS}_i = \operatorname{sgn}(Q_i)\,\frac{P_i^{\mathrm{avg}} - P_{i,0}}{P_{i,0}}, \tag{1}
\]
where Qi is the order size with Qi > 0 (Qi < 0) for buy (sell) orders, Piavg is the volume-weighted
execution price of the parent-order and Pi,0 is the mid-quote price of the security (arrival price)
when the parent order starts being executed.
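A minimal sketch of Equation (1) in Python, expressed in basis points. The function name and argument layout are illustrative assumptions; the arrival price is taken as given.

```python
def implementation_shortfall_bps(prices, shares, arrival_mid, is_buy):
    """IS of one parent order in basis points, following Equation (1).

    prices, shares: child-trade prices and quantities of the parent order.
    arrival_mid:    mid-quote when the parent order starts being executed.
    """
    total = sum(shares)
    # Volume-weighted average execution price of the parent order.
    avg_price = sum(p * s for p, s in zip(prices, shares)) / total
    sign = 1.0 if is_buy else -1.0
    return sign * (avg_price - arrival_mid) / arrival_mid * 1e4
```

For a buy order, paying above the arrival mid-quote yields a positive IS; for a sell order, the sign flips so that selling below the arrival mid-quote is also a positive cost.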
Table 1 provides the summary statistics of IS. The mean (median) IS is roughly 3.1 (2.7)
bps. The empirical literature on institutional trading costs often uses participation rate, bid-offer
spreads, volatility, order duration, turnover and market capitalization as the main drivers of IS.
For example, Almgren et al. (2005) include participation rate, bid-offer spread and volatility in
analyzing the variation in IS. Van Kervel and Menkveld (2015) use order duration, volatility and
turnover as control variables. Tong (2015) uses the logarithm of market capitalization as an additional
control variable. Most of these studies also use stock and client fixed effects. Following these
studies, I will use these variables as controls throughout my empirical analysis.
4. Identification of Predictable Executions
In this section, I consider certain execution characteristics of parent-orders which will potentially
allow strategic algorithmic traders to realize that there is a large order being traded. If the child-order
executions display a particular pattern with regard to randomness in order sizes or trading
intervals, strategic adversaries may exploit this information to obtain imperfect signals about the
presence of impatient, informed or uninformed investors. If these signals were at all informative,
one would expect them to have an ultimate effect on the price impact of the large order.
4.1. Persistent Trade Size
A strategic trader may infer from the noisy order flow that there is a large order being executed if
a series of child-orders is executed in a highly deterministic schedule, e.g., same trade size on each
child order. I expect that such executions can be subject to higher cost due to their predictable
nature as they may be exploited to generate imperfect signals about the information or urgency
level of the investors. In an opposing scenario, if the investor is uninformed, this predictability may
incentivize some algorithmic traders to provide liquidity as well. In order to test these hypotheses,
I will consider the standard deviation of the child-order sizes as a measure of potential information
leakage. Formally, for the ith parent-order in my data, I define

\[
\mathrm{NegChildOrderVol}_i = -\sqrt{\frac{1}{N_i - 1}\sum_{j=1}^{N_i}\left(q_{i,j} - \bar{q}_i\right)^2}, \tag{2}
\]
where $q_{i,1}, q_{i,2}, \ldots, q_{i,N_i}$ denote the percentage of the parent order executed with each child order
and $\bar{q}_i$ is the corresponding average of these values. Thus, NegChildOrderVol measures the
sign-adjusted volatility of relative child-order sizes, which is comparable across executions. I add a
leading negative sign to match my prior that higher NegChildOrderVol (i.e., less randomness in
order size) should be correlated with higher costs.
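The signal in Equation (2) can be sketched with the Python standard library. The function name is an assumption; `statistics.stdev` is the sample standard deviation with an N − 1 denominator, matching the equation.

```python
import statistics

def neg_child_order_vol(child_sizes):
    """Negative sample standard deviation of relative child-order sizes.

    child_sizes: executed quantity of each child order (at least two entries).
    Sizes are converted to percentages of the parent order, so the measure
    is comparable across executions of different total size.
    """
    total = sum(child_sizes)
    pct = [100.0 * size / total for size in child_sizes]
    return -statistics.stdev(pct)
```

An execution that always trades the same child size attains the maximum value of zero, while irregular sizes push the signal further below zero.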
If strategic traders were able to detect the magnitude of NegChildOrderVol, I would expect
that an execution with high (low) NegChildOrderVol would be more (less) costly to trade. Figure
2 illustrates this conjecture from two sample executions from the dataset. On the left, I have
an execution that is in the top quintile when sorted on NegChildOrderVol. I observe that out of
more than 70 child orders, only 4 do not have a size of 100. Therefore, this execution,
relatively speaking, has a very high value of NegChildOrderVol. On the right, I have an execution
that is in the bottom quintile of the NegChildOrderVol statistic. I observe that this execution is fairly
random in terms of traded child-order sizes and ultimately has a low NegChildOrderVol. Comparing
the IS values, I find that the execution with high NegChildOrderVol has a much higher cost, which
is monotonically increasing during the lifetime of the execution. Of course, IS is a noisy measure
that can be affected by many other factors, but nevertheless this simple visual evidence illustrates
the intuition of the signal construction based on leaking valuable order flow information.
[Figure 2 panels: high NegChildOrderVol (left) and low NegChildOrderVol (right). Horizontal axis: Child Order. Vertical axes: Quantity and Implementation Shortfall (bps). Series: Child Order Size and Cumulative IS.]
Figure 2: This figure illustrates two executions with high (left) and low (right) values of NegChildOrderVol
and compares their IS trajectory during the lifetime of the order. This figure serves as motivating visual
evidence for the formal multivariate analysis. I expect the execution with high NegChildOrderVol to
be costlier to execute.
4.2. Trading in Constant Intervals
Strategic traders may infer the presence of a large order if its child orders are being traded in nearly
constant trading intervals. For example, Hasbrouck and Saar (2013) report that agency algorithms
exhibit robust clock-time periodicity, the tendency to make trades around full-seconds or half-
seconds. Motivated by this finding, it is reasonable to expect that adversary traders analyzing
the time-stamps of the trades may obtain imperfect signals about the presence of informed or
impatient investors. Similar to the earlier measure on child-order sizes, I will consider the standard
deviation of the time between two consecutive child-orders as a measure of potential information
leakage.
[Figure 3 panels: high NegIntervalVol (left) and low NegIntervalVol (right). Horizontal axis: Child Order Trade. Vertical axis: Time Elapsed since Last Trade (s). Series: Trade Intervals and Cumulative IS.]
Figure 3: This figure illustrates two executions with high (left) and low (right) values of NegIntervalVol
and compares their IS trajectory during the lifetime of the order. This figure serves as motivating visual
evidence for the formal multivariate analysis. I expect the execution with high NegIntervalVol to be
costlier to execute.
Formally, for the ith parent-order in my data, I define

\[
\mathrm{NegIntervalVol}_i = -\sqrt{\frac{1}{N_i - 2}\sum_{j=1}^{N_i - 1}\left(\Delta t_{i,j+1} - \overline{\Delta t}_i\right)^2}, \tag{3}
\]
where $\Delta t_{i,j+1} \triangleq t_{i,j+1} - t_{i,j}$ denotes the time elapsed between consecutive child-orders as a fraction of total
duration and $\overline{\Delta t}_i$ is the mean value of these time intervals. Thus, NegIntervalVol measures the
sign-adjusted volatility of normalized trading intervals between consecutive child-orders. I again
add a leading negative sign to match my prior that higher NegIntervalVol (i.e., nearly constant
trading intervals) should be correlated with higher costs. With this definition, high (low) values of
NegIntervalVol are associated with high (low) periodicity of child-order trades.
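A minimal sketch of Equation (3) follows. The function name is hypothetical, and "total duration" is assumed here to be the time between the first and last child trade; the paper may normalize by the parent order's start-to-end horizon instead.

```python
import statistics

def neg_interval_vol(timestamps):
    """Negative sample standard deviation of normalized inter-trade intervals.

    timestamps: sorted child-trade times in seconds (at least three entries).
    With N child trades there are N - 1 intervals, so the sample standard
    deviation divides by N - 2, as in Equation (3).
    """
    # Assumption of this sketch: duration runs from the first to the last child trade.
    duration = timestamps[-1] - timestamps[0]
    gaps = [(later - earlier) / duration
            for earlier, later in zip(timestamps, timestamps[1:])]
    return -statistics.stdev(gaps)
```

Perfectly periodic trading, such as one child order every 10 seconds, gives a value of zero, which is the highest value the signal can take.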
Similar to the intuition developed for NegChildOrderVol, strategic traders may be able to infer
that a large order is being executed if they can also detect that a series of orders exhibit consistent
periodicity. I hypothesize that an execution with high NegIntervalVol may allow strategic traders
to form a strong belief about the presence of a large order. Figure 3 illustrates this conjecture
visually from two sample executions. On the left, I have an execution from the top quintile of
executions sorted on NegIntervalVol. I observe that each child order is separated by exactly 10
seconds throughout the execution of 23 child-orders. Therefore, this execution has zero volatility
with regards to trading intervals. On the right, I have an execution that is in the bottom quintile
of NegIntervalVol statistic. I observe that trading intervals of this execution seem to fluctuate
highly. I again find that the execution with high NegIntervalVol has roughly increasing IS during
the lifetime of the execution whereas the other execution ends with negative cost with more volatile
IS trajectory. This simple visual evidence is again consistent with the potential signal extraction
by algorithmic traders exploiting deterministic patterns.
4.3. Constant Trading Rate
Another measure of interest to adversary algorithmic traders may be the constancy of the execu-
tion’s trading rate. The earlier measures do not account for the joint dynamics of the executed
quantity and its timing and thus may not reflect much information on the dynamic nature of the
trading rate. I expect that executions with roughly constant trading rates may be again subject to
higher cost due to potential signal extractions by strategic algorithmic traders. As an approximate
measure for this tendency, I consider the correlation between cumulative executed quantity and the
elapsed time since the start of the execution. Formally, I define this correlation, QtyTimeCorrel, as
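\[
\mathrm{QtyTimeCorrel}_i = \frac{\sum_{j=1}^{N_i}\left(q^{\mathrm{cum}}_{i,j} - \bar{q}^{\mathrm{cum}}_i\right)\left(t_{i,j} - \bar{t}_i\right)}{\sqrt{\sum_{j=1}^{N_i}\left(q^{\mathrm{cum}}_{i,j} - \bar{q}^{\mathrm{cum}}_i\right)^2}\,\sqrt{\sum_{j=1}^{N_i}\left(t_{i,j} - \bar{t}_i\right)^2}}, \tag{4}
\]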
where $q^{\mathrm{cum}}_{i,j} \triangleq \sum_{k=1}^{j} q_{i,k}$ denotes the cumulative sum of $q_{i,j}$ and $\bar{q}^{\mathrm{cum}}_i$ denotes its corresponding mean.
Note that $t_{i,j}$ measures the time elapsed between the start of the ith execution and its
jth child-order trade. Thus, QtyTimeCorrel measures the correlation between cumulative executed
quantity and total time and high values of QtyTimeCorrel essentially signify a deterministic trading
schedule with fixed trading rate.
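Equation (4) is a Pearson correlation between cumulative executed quantity and elapsed time. A small sketch, with hypothetical function and argument names:

```python
import math
from itertools import accumulate

def qty_time_correl(child_sizes, elapsed_times):
    """Correlation between cumulative executed quantity and elapsed time.

    child_sizes:   executed quantity of each child order, in trade order.
    elapsed_times: time since the start of the execution for each child trade.
    """
    cum_qty = list(accumulate(child_sizes))
    n = len(cum_qty)
    q_bar = sum(cum_qty) / n
    t_bar = sum(elapsed_times) / n
    cov = sum((q - q_bar) * (t - t_bar) for q, t in zip(cum_qty, elapsed_times))
    var_q = sum((q - q_bar) ** 2 for q in cum_qty)
    var_t = sum((t - t_bar) ** 2 for t in elapsed_times)
    return cov / math.sqrt(var_q * var_t)
```

Equal child sizes at equal time steps trace a straight line in the quantity-time plane and return a correlation of one; a back-loaded schedule returns a lower value.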
Consistent with my earlier intuition, strategic traders will be able to exploit more information
leaking from an execution with high QtyTimeCorrel. Figure 4 illustrates this expectation visually
from my dataset. On the left, I have an execution from the top quintile of executions sorted on
QtyTimeCorrel. I observe that the correlation between executed quantity and time is nearly perfect,
i.e., QtyTimeCorrel is roughly one. On the right, I have an execution that is in the bottom quintile
of the QtyTimeCorrel statistic. I observe that the trading rate for this execution is much higher in the
second half of the execution. Consistent with my findings so far, I again observe that the IS of the
predictable execution is higher than that of its unpredictable counterpart.
[Figure 4 panels: high QtyTimeCorrel (left) and low QtyTimeCorrel (right). Horizontal axis: Time (mins). Vertical axis: Quantity. Series: Cumulative Quantity and Cumulative IS.]
Figure 4: This figure illustrates two executions with high (left) and low (right) values of QtyTimeCorrel
and compares their IS trajectory during the lifetime of the order. This figure serves as motivating visual
evidence for the formal multivariate analysis. I expect the execution with high QtyTimeCorrel to be
costlier to execute.
4.4. Multivariate Regressions
Execution cost can be a function of multiple trade-level and stock-level characteristics. Thus, to test
formally whether each signal is associated with higher IS, I run the following multivariate regression
at the execution level with a rich set of control variables:
\[
\mathrm{IS}_i = \alpha + \beta\,\mathrm{Signal}_i + \sum_{j}\delta_j\,\mathrm{Control}_{j,i} + \sum_{s=1}^{S}\gamma_s\,\mathbb{I}_{\{m(i)=s\}} + \sum_{k=1}^{K}\nu_k\,\mathbb{I}_{\{c(i)=k\}} + \sum_{h=1}^{13}\zeta_h\,\mathbb{I}_{\{\text{Active in } h\text{th half-hour}\}} + \epsilon_i, \tag{5}
\]
where the mapping $m: i \mapsto s$ identifies the executed stock s, s = 1, . . . , 498, and the mapping
$c: i \mapsto k$ identifies the client k submitting the order, k = 1, . . . , 146. In addition to
these stock and client dummies, I control for intraday volume patterns using a dummy for every 30
minutes. Further, I also consider execution-level control variables including participation rate, the
ratio of passive and aggressive orders, bid-offer spread, mid-quote volatility, execution duration,
turnover, and the logarithm of market capitalization.
Participation rate and order duration can control for the urgency of the trade, and client fixed
effects can control for the different trading strategies or the skill level of the investor that may
be correlated with the price movements during the execution. Controlling for the ratio of passive
or aggressive orders is also important as fills due to passive orders can have more randomness in
inter-trade time intervals and may be negatively correlated with the signals overall. Since passive
orders would earn a fraction of the bid-ask spread, if lower realizations of the signals are
driven by passive orders, I may obtain a spurious relationship between the signals and execution
costs. Similarly, aggressive orders are costlier and may be positively correlated with the signals, and
thus they may lead to an omitted variable bias if they are not properly controlled for.
Table 2 reports the regression results with standard errors clustered at the calendar day level.7
In the first three columns, I observe that the estimated coefficient for each signal is positive and
statistically significant aligning with my expectations from the earlier visual evidence. The coeffi-
cients are also economically large. In the final three columns, I standardize all of the continuous
independent variables and find that one-standard deviation increase in each signal is associated with
an IS increase of approximately 1.6 to 1.8 bps. Given that the median implementation shortfall is
2.7 bps, the coefficients on the signals are economically significant.
4.5. Lagged Signals and Execution Costs
Despite controlling for various execution-level statistics, one potential concern with the earlier
multivariate analysis is that there can be a contemporaneous missing variable that is driving both
the signals and IS. In order to mitigate this concern, I examine the relationship between the lagged
signals and the future execution costs by constructing the signals from the first half of the execution
and testing for a potential cost increase in the remaining portion of the execution. This is a more
natural test as the information leakage to potential adversary traders may take some time, which
is consistent with the learning mechanism in Yang and Zhu (2015).
7 Throughout the analysis, I adjust standard errors by clustering on calendar day. Double clustering on calendar
day and executed stock provides approximately identical standard errors.
Formally, I partition the execution into two parts by assigning the child-order trades from the
first 50% of the order size to the Initial bin and the remaining 50% to the Final bin.8 I then
construct a new set of three signals by using child-order trades only from the Initial bin and define
them InitNegChildOrderVol, InitNegIntervalVol and InitQtyTimeCorrel, respectively. Finally, I
compute the average trade prices in the Initial and Final bins of the execution to define the price
impact measures corresponding to the initial and final periods:
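\[
\mathrm{InitIS}_i = \operatorname{sgn}(Q_i)\,\frac{\bar{P}_{i,1} - P_{i,0}}{P_{i,0}}, \qquad \mathrm{FinalIS}_i = \operatorname{sgn}(Q_i)\,\frac{\bar{P}_{i,2} - \bar{P}_{i,1}}{\bar{P}_{i,1}}, \tag{6}
\]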
where P̄i,1 is the average trade price for the Initial bin and P̄i,2 is the average trade price for the
remaining Final bin.
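The partition and the two cost measures in Equation (6) can be sketched as below. The split takes the longest prefix of child orders whose cumulative size does not exceed 50% of the parent order. Function names are hypothetical, and the bin prices are assumed to be share-weighted averages, which the text does not state explicitly.

```python
def split_initial_final(shares):
    """Number of leading child orders assigned to the Initial bin."""
    half = sum(shares) / 2
    cumulative, count = 0, 0
    for size in shares:
        if cumulative + size > half:
            break
        cumulative += size
        count += 1
    return count

def average_price(prices, shares):
    """Share-weighted average trade price (weighting is an assumption)."""
    return sum(p * s for p, s in zip(prices, shares)) / sum(shares)

def init_final_is(prices, shares, arrival_mid, is_buy):
    """InitIS and FinalIS of Equation (6), as fractions rather than bps."""
    k = split_initial_final(shares)
    # k == 0 (first child order above 50%) leaves the Initial bin empty;
    # such orders would need separate handling and are not covered here.
    sign = 1.0 if is_buy else -1.0
    p1 = average_price(prices[:k], shares[:k])
    p2 = average_price(prices[k:], shares[k:])
    return sign * (p1 - arrival_mid) / arrival_mid, sign * (p2 - p1) / p1
```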
I regress InitIS and FinalIS on the new signals constructed from the first half of the execution
using the same set of control variables:
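\[
\mathrm{Cost}_i = \alpha + \beta\,\mathrm{InitSignal}_i + \sum_{j}\delta_j\,\mathrm{Control}_{j,i} + \sum_{s=1}^{S}\gamma_s\,\mathbb{I}_{\{m(i)=s\}} + \sum_{k=1}^{K}\nu_k\,\mathbb{I}_{\{c(i)=k\}} + \sum_{h=1}^{13}\zeta_h\,\mathbb{I}_{\{\text{Active in } h\text{th half-hour}\}} + \epsilon_i, \tag{7}
\]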
where Cost is either InitIS or FinalIS and InitSignal is either InitNegChildOrderVol, InitNegInter-
valVol or InitQtyTimeCorrel.
Table 3 reports the regression results. I find that InitIS is not correlated with any of the three
signals constructed from the initial stage. Even the signs of the coefficients are mixed. These
findings alleviate the endogeneity concern in the main regressions: if the signals were correlated
with an omitted illiquidity variable, that variable would have increased (decreased) the
average price observed in the initial stage of the execution for a buy (sell) order, implying a positive
correlation between the signals and InitIS. Interestingly, all of the signals are significantly correlated
with FinalIS. Overall, this finding suggests that signals are gradually processed by adversary traders,
consistent with the learning mechanism in the back-running theory.
8 Not all of the executions can be exactly split into two equal portions. Formally, the Initial bin for the ith execution
includes the maximum number of child orders from the start of the execution such that the cumulative sum of these
orders is still less than or equal to 50% of the total order.
4.6. Deterministic Patterns
Deterministic patterns in the algorithm may leak order flow signals to the strategic traders. In
Figure 3, each child order was separated by exactly 10 seconds throughout the execution of 23 child-
orders. In this section, I examine whether similar deterministic patterns, which can be exploited
by strategic traders, exist in the data.
First, I examine the distribution of the time periods between two consecutive trades in a parent-
order execution. I compute the trading intervals at the precision of hundredths of a second and
focus on intervals that are greater than 200 milliseconds. In this universe, the most frequent trading
interval is surprisingly 600.00 seconds, i.e., exactly 10 minutes. There are 4,476 consecutive trades
which are spaced with this time interval. Further, the second-most frequently employed trading
interval is 599.99 seconds which is again approximately 10 minutes. This trading interval occurs
703 times in the data. To interpret the abnormality of these frequencies, note that there are only 4
consecutive trades which are spaced apart with 599.00 seconds. These two statistics strongly imply
the imperfect randomization of the trading algorithm.9
Second, I study whether trading intervals occur more frequently around round seconds. To
examine this issue formally, I compare the number of consecutive trades that are spaced with X.00
seconds versus X.50 seconds where X is an integer. If the algorithm were to be fully randomized,
one would expect to see roughly equal number of consecutive trades with both of these spacings.
Table 4 reports the frequencies of the number of consecutive trades corresponding to these two
trading interval lengths. X is chosen to be from 1 second to 30 seconds. Contrary to the null
hypothesis of perfect randomization, I find that in all cases of X, the number of consecutive trades
with trading intervals of round seconds is always more frequent. The mean difference in frequency
is 97.2 with a corresponding t-statistic of 8.1.10
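The round-second comparison can be sketched as follows. The text does not say which test produces the reported t-statistic, so the paired t-statistic on the per-X count differences below is an assumption, as are the function names.

```python
import math
from collections import Counter

def round_vs_half_counts(intervals_sec, max_x=30):
    """Counts of intervals equal to X.00 and X.50 seconds for X = 1..max_x."""
    # Integer centiseconds avoid floating-point equality problems.
    counts = Counter(round(dt * 100) for dt in intervals_sec)
    return [(counts[x * 100], counts[x * 100 + 50]) for x in range(1, max_x + 1)]

def paired_t_stat(pairs):
    """Mean of (round - half) count differences and its t-statistic."""
    diffs = [round_count - half_count for round_count, half_count in pairs]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the differences; it must be non-zero for the ratio below.
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean, mean / math.sqrt(var / n)
```

Under full randomization the two counts should be roughly equal for every X, so the mean difference should be close to zero.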

Overall, these analyses strongly suggest that the trading algorithm displays imperfect randomization
across trading intervals that can be exploited by anticipatory traders.
9 The third-most frequent trading interval is 1.00 second with 632 occurrences.
10 The findings are identical if I exclude parent-orders that start at a round second (e.g., 9:30:00 AM) from the
analysis. The findings are robust to the use of other non-round intervals such as X.01.
4.7. Signal Heterogeneity
Although all of the executions are traded according to a single execution strategy, there can still
be heterogeneity in how the parent order is traded due to client instructions (e.g., order size),
market conditions and the algorithm’s intrinsic randomness. In this section, I investigate what drives
this heterogeneity in detail and examine the correlation between signals and order-level statistics.
One reason for the heterogeneity may be the imperfect ability to predict volume patterns. If
the broker were to perfectly predict the future volume profile in the market, achieving zero slippage
against market VWAP would be trivial. However, the broker can only inaccurately predict the
upcoming volume patterns by using the historical volume profiles from the previous month. To
assess the predictive power of this approach, using TAQ data, Figure 5 plots the histogram of the
prediction error, the percentage difference between realized volume and average volume. This plot
implies that predicting the future volume patterns using historical volume curves is indeed very
difficult and this would be a major source of heterogeneity between executions.
Relatedly, client instructions in the pre-trade phase can affect the predictability of the execution.
If the client chooses a large order size, the algorithm may not have enough flexibility in randomizing
the order. Finally, market conditions during the execution may directly affect the algorithm’s child-
order selections. For example, when the volume profile is roughly constant, the execution schedule
could also mimic a similar profile and lead to more predictability. On the other hand, in these quiet
times, the algorithm may try to randomize the execution schedule as well in order to minimize the
information leakage. Note that market conditions can also change the exploitability of the signals.
For example, when noise trading is higher, algorithmic traders may not fully detect the patterns. I
examine such issues in detail in Section 5 and Section 6.
To better understand the drivers of the signals, I study each signal in detail by regressing it on
the control variables I have utilized in the multivariate regressions. I also add stock fixed-effects,
client and intraday dummies.
[Figure 5: histogram. Horizontal axis: Prediction Error (%). Vertical axis: Frequency.]
Figure 5: Histogram of the percentage difference between realized volume during the execution and the
average volume observed during the same interval in the past month.
Table 5 reports the regression results. Interestingly, all signals have the same sign in each
coefficient implying the robustness of the intuition behind the signal construction. For each signal,
the coefficient on participation rate is positive suggesting that the signals are correlated with the
order size and they measure the footprints of the order. This finding highlights the role of client
instructions on execution predictability. All signals load positively on Aggressive Ratio and Passive
Ratio implying that predictability is lower when the executions occur at the mid-quote.
Spreads are negatively correlated with the signals implying that liquidity shocks cannot fully
explain the presence of execution predictability. Volatility is positively correlated with the signals
suggesting that the algorithm shies away from volatile trading in times of price uncertainty. This
could be a strategic choice as adversary traders may also find it difficult to extract precise signals
during these volatile periods. Finally, small-cap stocks seem to have higher predictability as measured by
NegChildOrderVol and NegIntervalVol. Overall, these findings suggest that market conditions may
also drive the heterogeneity in the signals and imply that the algorithm could be leaving more
footprints in certain set of executions.
4.8. Relationship with Publicly Available Data
Execution predictability signals are constructed using proprietary data. Since these signals are not
publicly available, it is not directly conclusive that strategic traders can exploit these signals to
detect predictable large executions. In order to address this concern, I show in this section that
similar easy-to-construct signals can proxy the proprietary signals implying that strategic traders
are indeed capable of detecting these signals.
Publicly available data in TAQ do not provide any information about the presence of large
orders or how they are split into child orders. However, if the participation rate of the order
is high, most of the reported trades will be the broker’s trades. If the client is also using
multiple brokers, due to potential correlation between VWAP algorithms across brokers, my private
signal may also be correlated with its public counterpart. For these reasons, I utilize all reported
trades in TAQ to construct the publicly replicable version of the signals using the same definitions.
Let PubNegChildOrderVol, PubNegIntervalVol, and PubQtyTimeCorrel be the publicly avail-
able counterparts of NegChildOrderVol, NegIntervalVol, and QtyTimeCorrel, respectively. Table
6 illustrates the Spearman and Pearson correlations between the public and proprietary signals.
Spearman (Pearson) correlations are between 0.25 (0.16) and 0.48 (0.39) and they are statistically
significant. As illustrated in Table 7, the correlation between the signals can go up to 0.77 when
executions with participation rates of at least 0.6% (the median) are considered. Overall, these
values point to a high positive rank correlation between public and proprietary signals.
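For readers who want to reproduce this type of comparison, Pearson and Spearman correlations between a proprietary signal and its public counterpart can be computed as in the sketch below. The function names are illustrative; ties receive average ranks, a common convention that the text does not specify.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    var_x = sum((a - x_bar) ** 2 for a in x)
    var_y = sum((b - y_bar) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = average_rank
        start = end + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

A Spearman value above the Pearson value, as reported in the text, is what one would expect when the two signals are monotonically but not linearly related.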
4.9. Buy versus Sell Orders
There is an important stream of the microstructure literature that studies the potential asymmetry
between the cost of buy and sell orders. In the earliest example, Kraus and Stoll (1972) find that
block purchases have larger permanent price impact than block sales. In a review paper, Macey and
O’Hara (1997) highlight that the direction of the order is an important determinant of execution
costs. Saar (2001) develops a theoretical model to explain why buy orders could be costlier than
sell orders. The model relates the asymmetric effect to the historical price performance of the stock.
After a long period of price run-ups, the model predicts a smaller asymmetry between buys and
sells. Chiyachantana et al. (2017) find supporting evidence of this theory by illustrating that price
impact asymmetry varies based on the history of the price run-up. Chiyachantana et al. (2004)
document that the asymmetry depends on the contemporaneous market condition and report that
sells (buys) have larger price impact than buys (sells) in bear (bull) markets. Hu (2009) finds that
the buy–sell asymmetry is a function of the type of the benchmark used. If one uses pre-trade
benchmarks, buys (sells) have higher implicit trading costs during rising (falling) markets but this
relationship reverses when post-trade benchmarks are used. During-trade measures such as VWAP
slippage are neutral to the market movements.
In my dataset, I have perfect knowledge of the trade direction, thus, I can directly test whether
the cost of information leakage due to predictable signals is different across buy and sell orders. To
test this hypothesis, I separately run the multivariate regression in Equation (5) for buy and sell
orders. Table 8 reports the regression results for each signal and trade direction. The coefficients
on the signals are positive in all cases and statistically significant in five out of six cases. Only the
coefficient on NegChildOrderVol in the case of buy orders is insignificant.
Further, I statistically test whether the coefficients are different in buys than in sells. From buy
orders to sell orders, the coefficient on NegChildOrderVol increases by 94.69 with the corresponding
t-statistic of 1.19; the coefficient on NegIntervalVol decreases by 16.55 with the corresponding t-
statistic of 0.68 and the coefficient on QtyTimeCorrel increases by 7.68 with the corresponding
t-statistic of 0.15. These statistics imply that for all three signals, the difference is statistically
insignificant suggesting that predictable signals are costly in both buys and sells without any
significant asymmetry. This finding on lack of asymmetry contributes to the literature studying
differences in execution costs between buy and sell orders. As mentioned earlier in this section,
there have been several studies documenting buy-sell asymmetry in execution costs. Focusing on a
particular component of execution costs, the cost of predictability, I find that no matter the trade
direction of the order, footprints of the algorithm can be exploited by strategic traders.
4.10. Stock-by-stock Analysis
The cost of predictable trades can be stock specific. If only a few stocks suffer from order antici-
pation activities, it would not be correct to generalize the findings. To address this issue, I run my
main regression on a stock-by-stock basis. I focus on 385 stocks with more than 20 parent-order
executions. The findings are robust to different choices of minimum parent-order executions at the
stock level. For this group of stocks, I estimate the following regression for all executions of stock
k:
\[
\mathrm{IS}_{i,k} = \alpha_k + \beta_k\,\mathrm{Signal}_{i,k} + \sum_{j}\delta_{j,k}\,\mathrm{Control}_{j,i,k} + \epsilon_{i,k}, \tag{8}
\]
where the execution-level control variables include participation rate, the ratio of passive and
aggressive orders, bid-offer spread, mid-quote volatility, execution duration, turnover, and the logarithm of
market capitalization.
Table 9 reports the summary of the β coefficients estimated for every stock. For each signal,
the average β is positive and statistically significant. The fraction of positive coefficients ranges
from 62% to 68%. Overall, these findings confirm that the cost increase due to predictability is not
driven by a small group of stocks.
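A summary of this kind can be sketched as below. The text does not state how the significance of the average β is assessed, so the cross-sectional t-statistic here is an assumption, and the function name is hypothetical.

```python
import math

def summarize_betas(betas):
    """Mean, cross-sectional t-statistic, and share of positive coefficients.

    betas: one estimated signal coefficient per stock, from Equation (8).
    """
    n = len(betas)
    mean = sum(betas) / n
    # Sample variance across stocks; assumes stock-level estimates are independent.
    var = sum((b - mean) ** 2 for b in betas) / (n - 1)
    t_stat = mean / math.sqrt(var / n)
    frac_positive = sum(b > 0 for b in betas) / n
    return mean, t_stat, frac_positive
```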
4.11. Further Robustness Checks
I perform a number of robustness checks to make sure that my findings are not sensitive to alterna-
tive specifications or any potential data-related biases. First, to control for a potential time trend
or endogenous changes in the trading algorithm, I run my regressions with month dummies. I do
not find any significant change in signal coefficients.
My findings are also largely unchanged if I extend the set of control variables with the ratio
executed in dark pools, inverse of the stock price, number of child orders, or past stock returns.
The cost estimates also do not change if I use a restricted dataset in which I exclude executions
occurring during the first and last 30 minutes. I undertake this robustness check to mitigate the
potential bias in my signals due to higher volume at the beginning and end of the trading day.
5. Evidence from the SEC’s Ban on Unfiltered Access
In this section, I investigate whether the predictable patterns are harder to exploit when a particular
group of algorithmic traders faces technological rigidity in implementing their trading strategies.
This hypothesis emerges from the theoretical model of Aït-Sahalia and Sağlam (2013) in which a
high-frequency market maker takes advantage of a signal about an investor’s urgency to trade. The
model predicts that when the market maker is faster, he can track this signal at a higher level of
accuracy and be more successful in predatory quoting, i.e., offering lower prices to an impatient
buyer. To test this implication, I utilize a regulatory shock implemented in November 2011 that
limits the order submission activities of some algorithmic traders that utilize direct market access
through their brokers. Thus, I expect that order anticipatory activities of some algorithmic traders
will weaken after the regulatory change. Consequently, this new regulation allows me to directly
study the relationship between the cost of predictability and order anticipation ability.
5.1. Regulating Unfiltered Access
On November 30, 2011, the SEC required brokers with market access to stop giving their customers
a direct link to market centers without any supervisory control.11 Before this enforcement,
clients of broker-dealers were able to route their orders to a market center in an unfiltered
way to achieve faster order placement and trade execution. Under the ban, supervised
access to the market centers imposes additional controls and latency and hence slows down the order
submission of some fast algorithmic traders.
The ban did not affect large HFT firms that were registered as broker-dealers, but the impact of
the ban was still expected to be significant because unfiltered access had been prevalent before the ban,
accounting for nearly 40% of trading volume in the U.S. according to Aite Group, a Boston-based
research firm.12 Further evidence of the widespread use of unfiltered access is that a
regional and relatively small brokerage firm, Wedbush Securities, consistently ranked as Nasdaq’s
largest liquidity provider before the ban, with most of the volume generated by unfiltered-access
clients.13 After the implementation of the ban, an affected high-frequency trading firm could trade
on an exchange either by becoming a registered broker-dealer with the SEC or by becoming a member
of an exchange. Both of these options are costly to implement. Due to the large market share of
unfiltered access and the compliance costs of bypassing the regulation, I expect the ban’s effect
on all AT activity to be economically significant.
11 https://www.sec.gov/rules/final/2011/34-64748fr.pdf
12 “Study Lays Bare Breadth of ‘Naked’ Access”, The Wall Street Journal, December 15, 2009.
5.2. The Impact of the Ban on Algorithmic Traders
Order anticipatory activities are not directly observable and thus it is impossible to quantify the
impact of the ban solely on strategic trading. However, given that order anticipation is a subset of
overall AT activity, I expect it to be correlated with well-established AT proxies. Recent empirical
evidence documents that average trade sizes and trade-to-order ratios are negatively correlated
with broad AT activity (e.g., Hendershott et al. (2011), Hagströmer and Norden (2013), O’Hara
et al. (2014) and Weller (2015)). In this section, I use the TAQ database to compute these two
proxies of AT activity and merge them with my execution dataset. Since the ban was enforced
on November 30, 2011, I use the trade and quote data from the beginning of November 2011 to
the end of December 2011. The findings are robust to different durations of pre-ban and post-ban
periods.
Formally, I run the following regressions for both proxies at the stock-day level to test whether
the algorithmic trading activity dropped after the ban:
\[
\mathrm{AT}_{i,t} = \alpha + \beta\,\mathrm{Ban}_t + \sum_j \delta_j\,\mathrm{Control}_{j,i,t} + \mu_i + \epsilon_{i,t}, \tag{9}
\]
where $\mathrm{AT}_{i,t}$ is either given by the average trade size or the trade-to-order ratio (%), $\mathrm{Ban}_t$ is a
binary variable with the value of 0 before the ban and 1 after the ban and $\mu_i$ denotes the stock
fixed-effects. Control variables include the bid-offer spread, mid-quote volatility and the logarithm
of market capitalization of the stock.
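A minimal sketch of how the stock fixed effects in equation (9) can be absorbed: demeaning both sides by stock and running pooled OLS gives the same slope as including one dummy per stock. The sketch keeps only the Ban regressor (the controls are omitted), and the function name and toy panel are hypothetical.

```python
from collections import defaultdict

def within_slope(stock_ids, x, y):
    """Fixed-effects (within) slope: demean x and y by stock, then pooled OLS."""
    groups = defaultdict(list)
    for k, stock in enumerate(stock_ids):
        groups[stock].append(k)
    xd, yd = [0.0] * len(x), [0.0] * len(y)
    for idx in groups.values():
        mx = sum(x[k] for k in idx) / len(idx)
        my = sum(y[k] for k in idx) / len(idx)
        for k in idx:
            xd[k], yd[k] = x[k] - mx, y[k] - my
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

# Hypothetical toy panel: two stocks, two days before (Ban=0) and two after (Ban=1)
stocks = ["AAA"] * 4 + ["BBB"] * 4
ban = [0, 0, 1, 1, 0, 0, 1, 1]
avg_trade_size = [200, 204, 214, 218, 500, 510, 520, 530]
beta_ban = within_slope(stocks, ban, avg_trade_size)
```

In the toy panel the average trade size rises by 14 shares for one stock and 20 for the other, so the within estimate is their average, 17 shares, regardless of the large level difference between the two stocks.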
Table 10 reports the regression results. I find that the coefficients on Ban are positive
and highly statistically significant for both proxies, suggesting that AT activity dropped after the
ban. The coefficients are also economically significant. The median trade size is approximately 200
shares and the median trade-to-order ratio is 2.35% before the ban. Thus, the coefficients imply a
7% increase in the average trade size and a 22% increase in the trade-to-order ratio compared to these
pre-ban values.
13 “SEC’s ‘Naked’ Proposal May Hurt Small Dealers”, Securities Industry News, January 25, 2010.
These findings are consistent with the analysis in Chakrabarty et al. (2014), who also study this
regulatory change in detail. Using 150 stocks equally chosen from groups of most active, normal
and least active in terms of market capitalization and trading volume, they find that the quote-to-trade
ratio declines by 15.2% and the number of quote submissions declines by 25.6% after the ban.
Furthermore, using NASDAQ’s TotalView-ITCH data feed, they report that the reaction times
to order book events slow down to 344 milliseconds from 221 milliseconds after the regulation. All of this
empirical evidence suggests that order anticipation has become more difficult after the ban.
5.3. The Price Impact of Predictability after the Ban
I have confirmed with the data provider that the broker has not changed the VWAP algorithm in
response to this new regulation; hence, the ban on unfiltered access gives me the opportunity
to examine the sensitivity of the price impact of predictability to order anticipation. Consistent with
the drop in AT activity, I expect that the predictable signals leaking from large order executions
will now be examined by a constrained group of algorithmic traders. The additional latencies due
to imposed pre-trade checks will slow down the order anticipatory activities and the algorithm’s
footprints may not be fully exploited by the algorithmic traders. Consequently, I hypothesize that
my three signals will lead to smaller execution costs in the post-ban period.
I test this hypothesis by interacting the independent variables in my original regression in
equation 5 with post-ban dummies, Ban. Following Chakrabarty et al. (2014), I use execution
data from October 3, 2011 to January 31, 2012 that straddles the implementation of the ban on
November 30, 2011. With these definitions, I have 2,245 executions in the pre-ban period and 2,870
in the post-ban period. The findings are robust to various definitions of pre- and post-ban periods.
Formally, I run the following multivariate regression:
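\begin{align*}
\mathrm{IS}_i = {} & \alpha + \beta\,\mathrm{Signal}_i + \vartheta\,\mathrm{Ban}_i + \theta\,\mathrm{Ban}_i \times \mathrm{Signal}_i + \sum_j \delta_j\,\mathrm{ControlAndDummies}_{j,i} \\
& + \sum_j \kappa_j\,\mathrm{Ban}_i \times \mathrm{ControlAndDummies}_{j,i} + \epsilon_i,
\end{align*}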
where $\mathrm{Ban}_i$ is equal to 0 for executions occurring before the ban and 1 for executions occurring
after the ban, and ControlAndDummies denotes all of the control variables and the stock, client
and intraday dummies. Note that the ban may increase or decrease the overall trading costs, which
will be reflected in the estimated coefficient on Ban. However, if order anticipation does not
exacerbate the cost of predictable executions, the interaction coefficients between the signal and
post-ban dummy would be indistinguishable from zero.
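Because the dummy is interacted with every regressor, the OLS point estimates of this specification coincide with those from separate pre-ban and post-ban regressions, and θ is simply the post-ban signal slope minus the pre-ban slope. The sketch below illustrates this in the simplest case of a single signal with no controls; the function names and toy numbers are hypothetical, and the standard errors of the paper's regressions are not reproduced.

```python
from statistics import mean

def ols(x, y):
    """Intercept and slope of a univariate OLS fit of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def fully_interacted(signal, cost, dummy):
    """Coefficients (alpha, beta, vartheta, theta) of
    cost = alpha + beta*signal + vartheta*dummy + theta*dummy*signal,
    obtained from split-sample fits (valid because the model is fully interacted)."""
    pre = [(s, c) for s, c, d in zip(signal, cost, dummy) if d == 0]
    post = [(s, c) for s, c, d in zip(signal, cost, dummy) if d == 1]
    a0, b0 = ols(*zip(*pre))
    a1, b1 = ols(*zip(*post))
    return a0, b0, a1 - a0, b1 - b0

# Hypothetical toy executions: signal, IS (bps), and post-ban dummy
signal = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
is_bps = [1.0, 3.0, 5.0, 7.0, 2.0, 3.0, 4.0, 5.0]
ban = [0, 0, 0, 0, 1, 1, 1, 1]
alpha, beta, vartheta, theta = fully_interacted(signal, is_bps, ban)
```

A negative `theta`, as in this toy sample, corresponds to the case in which a given level of predictability is less costly after the ban.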
Table 11 reports the regression results. I find that the coefficients on the interaction terms
between the signals and the ban period are all negative. The coefficients are statistically signifi-
cant for NegIntervalVol and QtyTimeCorrel. These results suggest that the ban has reduced the
exploitability of the signals when a group of algorithmic traders has slower order submission tech-
nology. Overall, these findings underline the important role of algorithmic traders for the positive
correlation between predictability signals and execution costs. As order anticipation becomes more
difficult after the ban, the same level of predictability in the trading algorithm leads to smaller
costs.
6. Evidence from Low Signal-to-Noise Periods
Anticipating orders would be difficult when order flow signals emerging from large order executions
get confounded with additional noise. The theoretical model in Yang and Zhu (2015) directly links
the back-running ability to the volatility of the signal. When the signal’s volatility is higher, the
back-runner’s expected profit becomes lower. Therefore, this model implies that even though an
algorithm is leaving similar predictable patterns, strategic traders may not exploit them fully as
they may not have perfect access to them. In this section, I investigate whether the potential
volatility in the signals reduces the increase in execution costs.
To test this hypothesis formally, I study the price impact of predictability around higher unin-
formed trading activity. The algorithm’s footprints can be detected with less accuracy if there is a
large amount of noise trading in the market as these trades would lower the signal-to-noise ratios of
the patterns. Empirically, it is hard to identify periods in which high volume is due to non-informational
motives. To overcome this challenge, I exploit the potential behavioral bias of traders to round
prices. The price clustering literature documents that the propensity to trade is higher around round
prices (see, e.g., Osborne (1962)) and retail human traders seem to be more sensitive to this bias (see,
e.g., Chiao and Wang (2009)). I conjecture that when the arrival mid-price is close to a round price,
there will be more non-informational trading during the execution period that may cloud the signal
extraction of algorithmic traders. Thus, examining these noise trading periods in detail allows me
to directly link predictable signals to execution costs through the variation in order anticipation
ability of algorithmic traders.
6.1. Trading Volume around Round Prices
The decimal part of the arrival mid-price, the average of the prevailing bid and offer price in the
market, can take the following values: {.000, .005, .010, . . . , .985, .990, .995}. If the decimals of the
arrival mid-price are in the set of {.000, .005, .010, .990, .995}, I expect that there will be more
noise trading during these executions. Formally, let the ith execution have the binary attribute
Noisyi = 1, if its arrival mid-price is close to a whole dollar, i.e., its decimal is in the set of
{.000, .005, .010, .990, .995}. Otherwise, the binary attribute is Noisyi = 0 for the ith execution.
There are 522 executions with Noisyi = 1.
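The definition of the Noisy attribute can be sketched as a small function. This is an illustrative sketch, not the paper's code: the function name is hypothetical, and the price is converted to integer tenths of a cent to avoid floating-point comparisons on the decimal part.

```python
# Decimal parts (in tenths of a cent) treated as close to a whole dollar
ROUND_MILLS = {0, 5, 10, 990, 995}

def is_noisy(arrival_mid):
    """Return 1 if the decimal part of the arrival mid-price lies in
    {.000, .005, .010, .990, .995}, and 0 otherwise."""
    mills = round(arrival_mid * 1000) % 1000  # decimal part as an integer
    return 1 if mills in ROUND_MILLS else 0

# Hypothetical arrival mid-prices
flags = [is_noisy(p) for p in (25.00, 24.995, 25.01, 25.015, 24.985)]
```

Since the mid-price decimals lie on a half-cent grid of 200 values, the five flagged decimals cover 2.5% of that grid.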
First, I verify formally that when the arrival mid-price is close to whole prices, there is an
increase in volume realized during the execution period. Let $\mathrm{TotQ}_i$ be the total volume realized
during the execution period excluding the client’s order. I run the following regression with standard
controls and dummies to test the hypothesis:
\[
\log(\mathrm{TotQ}_i) = \alpha + \beta\,\mathrm{Noisy}_i + \sum_j \delta_j\,\mathrm{ControlAndDummies}_{j,i} + \epsilon_i. \tag{10}
\]
Table 12 reports that the coefficient on Noisy is positive and statistically significant. The results
are similar if I use $Q_i / \mathrm{TotQ}_i$ on the left hand side. In this case, the coefficient is negative suggesting
that the volume increase is still present when potential change in the client’s order size is taken
into account. These findings provide evidence of abnormal noise trading activity during executions
with round stock prices.
6.2. The Price Impact of Predictability during Abnormal Trading
I have confirmed with the data provider that the VWAP algorithm does not use a special routine
for the round arrival mid-prices, so these periods of higher noise trading allow me to test the
relationship between the price impact of predictability and the ease of order anticipation. Due to the
noise introduced by additional trading volume, I expect that the patterns leaking from large order
executions will now be detected by strategic traders with lower accuracy. For example, detecting
orders with equidistant trading intervals is much more difficult due to the combinatorial nature
of the pattern recognition problem. As order anticipation becomes more difficult, I expect the predictable
signals to be less costly.
I test this hypothesis by interacting the independent variables in my original regression in equa-
tion 5 with dummies of abnormal noise trading, Noisy. Formally, I run the following multivariate
regression:
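\begin{align*}
\mathrm{IS}_i = {} & \alpha + \beta\,\mathrm{Signal}_i + \vartheta\,\mathrm{Noisy}_i + \theta\,\mathrm{Noisy}_i \times \mathrm{Signal}_i + \sum_j \delta_j\,\mathrm{ControlAndDummies}_{j,i} \\
& + \sum_j \kappa_j\,\mathrm{Noisy}_i \times \mathrm{ControlAndDummies}_{j,i} + \epsilon_i,
\end{align*}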
where $\mathrm{Noisy}_i$ is equal to 1 for executions with arrival mid-price having a decimal in the set of
{.000, .005, .010, .990, .995} and 0 otherwise. I again use ControlAndDummies to denote all of the
control variables, stock, client and intraday dummies. If order anticipation exacerbates the cost of
predictable executions, the interaction coefficients between the signal and Noisy dummies would
be negative.
Table 13 reports the regression results. I find that the coefficients on the interaction terms
with the noisy volume dummies are all negative. The coefficients are statistically significant for
NegIntervalVol and QtyTimeCorrel. Overall, these findings support the order anticipation channel.
As pattern recognition becomes more difficult due to the complexity of processing additional trading
volume, the same level of predictability in the trading algorithm leads to smaller costs.
7. Further Consistency with Existing Theories
I have already examined various hypotheses emerging from the existing theoretical models, but
in this section, I would like to run additional analyses to further differentiate between the main
competing theories outlined in Section 2. Due to the cost increase, sunshine trading does not
seem to be consistent with my evidence. On the other hand, the findings can still be consistent with
predatory trading and back-running. Recall that predatory traders take advantage of the
urgency or impatience of uninformed investors by trading in the same direction as the investor at
the beginning of the execution. In the back-running theory, an adversary trader first detects the
presence of private information by exploiting an order flow signal and aims to profit by trading in
the same direction at relatively later stages of the execution.
The execution data cover 146 different investors, who may differ with regard
to their investment objectives. Consistent with the empirical study in Sağlam et al. (2014), I
expect that these investors will be very heterogeneous in their short-term beliefs about the fundamental
value of the asset. Thus, my dataset is well suited to testing competing theories of order anticipation.
Suppose that the investor initiating the execution has private information about the
fundamental value of the stock. According to the back-running theory, if the execution becomes
predictable (as gauged by the signals), the adversary algorithmic traders also trade on this infor-
mation during the later stages of the execution and the price impact of the execution increases. In
this case, one would expect this price change to be permanent. On the contrary, assume that the
investor has no private information about the fundamental value of the asset. If predatory traders
use order anticipation strategies during a predictable execution of a large order, then the potential
price change would be immediate and expected to be temporary. Finally, sunshine trading may
arise if strategic traders increase their liquidity provision to the executions of uninformed investors
and lower the price impact of the large order. Following the approach in Van Kervel and Menkveld
(2015), I first provide a measure for permanent price impact in order to differentiate the informed
and uninformed investors and use it in the decomposition of total price impact.
7.1. Price Impact Measures
The overall price impact (OPI ) of an execution is defined as the percentage return realized between
the start and end of the execution. One can break down this total cost measure into permanent
and temporary components. Let the permanent price impact (PPI ) of an execution be defined as
the percentage return realized between the start of the execution and the fundamental value of the
asset realized on the next trading day. Then, the temporary price impact (TPI ) is just equal to
the difference between OPI and PPI. Formally,
\begin{align*}
\mathrm{OPI}_i &= \operatorname{sgn}(Q_i)\,\frac{P_{i,T_i} - P_{i,0}}{P_{i,0}}, \\
\mathrm{PPI}_i &= \operatorname{sgn}(Q_i)\,\frac{X_{m(i),d(i)+1} - P_{i,0}}{P_{i,0}}, \tag{11} \\
\mathrm{TPI}_i &= \mathrm{OPI}_i - \mathrm{PPI}_i,
\end{align*}
where the mapping $i \xrightarrow{d} u$ is used to identify the date of the execution, $P_{i,0}$ and $P_{i,T_i}$ denote the
mid-quote prices at the start and end of the execution, and $X_{j,u}$ is the fundamental value of the
asset $j$ on day $u$. I will proxy this fundamental value by computing the volume-weighted average
price from the TAQ database using all trades and their corresponding sizes.
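The three measures in equation (11) can be computed directly from the start and end mid-quotes and the next-day fundamental value proxy. The sketch below is illustrative: the function and argument names are hypothetical, the order size is assumed to be signed (positive for buys, negative for sells), and the returns are left as fractions rather than basis points.

```python
def sgn(q):
    """Sign of a non-zero signed order size: +1 for buys, -1 for sells."""
    return 1 if q > 0 else -1

def price_impacts(q, p_start, p_end, next_day_vwap):
    """Return (OPI, PPI, TPI) for one execution, as fractional returns."""
    side = sgn(q)
    opi = side * (p_end - p_start) / p_start          # overall price impact
    ppi = side * (next_day_vwap - p_start) / p_start  # permanent price impact
    tpi = opi - ppi                                   # temporary price impact
    return opi, ppi, tpi

# Hypothetical buy order: price rises 20 bps during the execution and
# half of the move has reverted by the next day.
opi, ppi, tpi = price_impacts(q=1000, p_start=50.00, p_end=50.10, next_day_vwap=50.05)
```

For this hypothetical buy order the overall impact is about 20 bps, split evenly into permanent and temporary components.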
The competing theories are based on the presence or absence of private information, and thus
I now examine investor heterogeneity in the trader universe by studying the distribution of average
PPI at the investor level.
7.2. Investor Heterogeneity
Using PPI as a proxy for the investor’s informed trading, I compute the distribution of
information-based trading by computing the average price impact at the investor level:
\[
\mathrm{AvgPPI}_I = \frac{\sum_{i=1}^{N} \mathrm{PPI}_i\,\mathbb{1}_{\{f(i)=I\}}}{\sum_{i=1}^{N} \mathbb{1}_{\{f(i)=I\}}}, \tag{12}
\]
where the mapping $i \xrightarrow{f} I$ is used to identify the investor who submitted the order and
$I = 1, \ldots, 146$.
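Equation (12) is a group mean of PPI by investor. A minimal sketch is given below, with a t-statistic of the usual mean-over-standard-error form added as an assumption, since the paper does not spell out how the t-statistics in Table 14 are computed. The function name and toy values are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

def avg_ppi_by_investor(ppi, investor):
    """Map each investor to (AvgPPI, t-statistic of AvgPPI against zero)."""
    groups = defaultdict(list)
    for value, inv in zip(ppi, investor):
        groups[inv].append(value)
    result = {}
    for inv, values in groups.items():
        avg = mean(values)
        # The t-statistic needs at least two executions and non-zero dispersion.
        if len(values) > 1 and stdev(values) > 0:
            t_stat = avg / (stdev(values) / sqrt(len(values)))
        else:
            t_stat = float("nan")
        result[inv] = (avg, t_stat)
    return result

# Hypothetical PPI values (bps) for two investors
ppi_bps = [10.0, 20.0, 30.0, -5.0, 5.0]
investor_id = [1, 1, 1, 2, 2]
table = avg_ppi_by_investor(ppi_bps, investor_id)
```

In this toy input the first investor has a clearly positive average PPI while the second averages zero, mirroring the informed and uninformed groups discussed in the text.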
Table 14 reports each investor’s AvgPPI with its corresponding t-statistic. Recall that on
average each investor has approximately 139 executions in the dataset. I observe that there is
wide heterogeneity in AvgPPI values. Specifically, I find that 33 investors have positive AvgPPI
with statistical significance at the 10% level. This group of investors constitutes roughly 23% of my
investor universe. Similarly, there are 26 investors with negative AvgPPI at the 10% significance
level. I find that the maximum (minimum, mean and median) average return that an investor in
this group achieves per execution is approximately 260 (-991, -8 and 2) bps. These statistics show
that my investor universe is suitable to study the applicability of competing theories in predictable
executions.
7.3. Predatory Trading, Back-running or Sunshine Trading
I first test the consistency of the cost increase with predatory trading and back-running. I use my
standard regression model with stock fixed effects to study the correlation between price impact
measures and the cost of predictable executions. For each of my price impact measures, PI, and
execution predictability signals, Signal, I estimate the following regression models:
\[
\mathrm{PI}_i = \alpha + \beta\,\mathrm{Signal}_i + \sum_j \delta_j\,\mathrm{ControlAndDummies}_{j,i} + \epsilon_i, \tag{13}
\]
where PI is OPI, PPI or TPI and Signal equals either NegChildOrderVol, NegIntervalVol or
QtyTimeCorrel and each regression includes the standard set of control variables, stock, client
and intraday dummies.
Table 15 reports the regression results for each price impact measure. First, I observe that all of
the signals are significant in explaining the variation in OPI, which corroborates the IS analysis.
The regressions illustrate that none of the signals is significant in explaining the variation in TPI.
On the other hand, NegIntervalVol and QtyTimeCorrel are statistically significant in explaining
the variation in PPI at the 10% and 1% levels, respectively.
This evidence is broadly consistent with back-running as it predicts same-side trading with
informed investors. However, Brunnermeier and Pedersen (2005) also assume permanent price
impact in price dynamics so documenting the significant correlation between signals and PPI may
not fully differentiate between the theories of predatory trading and back-running. Fortunately,
these theories also have different implications with regard to initial price movements during the
execution. In predatory trading, adversary traders would trade in the same direction as the large
order during the initial period whereas in back-running, adversary trading would be delayed due to
the potential learning period about the large order. My analysis in Section 4.5 sheds light on the
mechanism by examining the cost increase across the initial and final periods. Recall that signals
constructed from the initial period were not correlated with the initial cost increase but they were
significantly correlated with final price impact measures. These findings point to initial learning by
adversary traders and thus they are not consistent with the predictions of the predatory trading
theory.
Overall, these results suggest that the order flow information leaking from the executions can in-
duce back-running activity by algorithmic traders. Signals are positively correlated with permanent
price movements and signals constructed from the first half of the execution point to higher execu-
tion costs in the second half of the execution aligning with the learning mechanism in back-running
theory.
Finally, I investigate whether there can be potential sunshine trading in uninformed executions.
To test this formally, I examine the joint impact of informed trading and the signals on IS by
separately controlling for each variable. Using the PPI as a proxy for informed trading, I test this
hypothesis by running my main regressions with an interaction term between the signals and PPI.
Formally, I estimate the following regressions:
\[
\mathrm{IS}_i = \alpha + \theta\,\mathrm{PPI}_i \times \mathrm{Signal}_i + \beta\,\mathrm{Signal}_i + \vartheta\,\mathrm{PPI}_i + \sum_j \delta_j\,\mathrm{ControlAndDummies}_{j,i} + \epsilon_i,
\]
where ControlAndDummies denotes all of the control variables, stock, client and intraday dummies.
Table 17 reports the regression results. Consistent with the back-running theory, I find that the
interaction terms between PPI and NegChildOrderVol and QtyTimeCorrel are significant suggest-
ing that the predictable executions are costlier if they are more informed. However, I also observe
that the coefficients on the signals are still positive and significant suggesting that sunshine trading
is not applicable. This finding further implies that back-running theory can partially explain the
increase in execution costs due to predictable patterns.
7.4. Spread Costs or Adverse Price Movements
The average trading cost, IS, can be decomposed into two components: spread costs (SC) and adverse
price movements (APM). For example, spread costs would be negative if the algorithm were to fill
all of the child orders with passive limit orders. Let $N_i$ be the number of child-order trades in the
$i$th execution and $Q_{i,j}$ be the $j$th child-order size (in shares) of the $i$th execution priced at $P_{i,j}$ and
let $M_{i,j}$ be the mid-point of the best available quotes at the same second of the fill (i.e., NBBO
mid-quote price). Then, I can decompose IS into these two components as follows:
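\begin{align*}
\mathrm{IS}_i &= \operatorname{sgn}(Q_i)\,\frac{P_i^{\mathrm{avg}} - P_{i,0}}{P_{i,0}} \tag{14} \\
&= \operatorname{sgn}(Q_i)\,\frac{\frac{1}{Q_i}\sum_{j=1}^{N_i} P_{i,j} Q_{i,j} - P_{i,0}}{P_{i,0}} \\
&= \operatorname{sgn}(Q_i)\,\frac{\sum_{j=1}^{N_i} \frac{P_{i,j} - P_{i,0}}{P_{i,0}}\,Q_{i,j}}{Q_i} \\
&= \operatorname{sgn}(Q_i) \left( \sum_{j=1}^{N_i} \frac{P_{i,j} - M_{i,j} + M_{i,j} - P_{i,0}}{P_{i,0}}\,\frac{Q_{i,j}}{Q_i} \right) \\
&= \underbrace{\operatorname{sgn}(Q_i) \left( \sum_{j=1}^{N_i} \frac{P_{i,j} - M_{i,j}}{P_{i,0}}\,\frac{Q_{i,j}}{Q_i} \right)}_{\triangleq\,\mathrm{SC}_i}
 + \underbrace{\operatorname{sgn}(Q_i) \left( \sum_{j=1}^{N_i} \frac{M_{i,j} - P_{i,0}}{P_{i,0}}\,\frac{Q_{i,j}}{Q_i} \right)}_{\triangleq\,\mathrm{APM}_i}.
\end{align*}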
In my dataset, I find that the average SC and APM are 0.68 bps and 2.44 bps, respectively,
suggesting that on average APM constitutes approximately 78% of the total IS.
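The decomposition in equation (14) can be checked numerically. The sketch below is illustrative only: the function name and fills are hypothetical, and for simplicity it takes unsigned child-order sizes together with a separate side indicator (+1 for a buy, -1 for a sell) in place of the signed quantities in the equation.

```python
def decompose_is(fills, p0, side):
    """Split implementation shortfall into spread cost and adverse price movement.

    fills: list of (fill_price, mid_quote_at_fill, shares) with unsigned shares
    p0:    arrival mid-quote price
    side:  +1 for a buy execution, -1 for a sell execution
    Returns (IS, SC, APM) as fractional returns.
    """
    total = sum(q for _, _, q in fills)
    avg_price = sum(p * q for p, _, q in fills) / total
    is_total = side * (avg_price - p0) / p0
    sc = side * sum((p - m) / p0 * q for p, m, q in fills) / total
    apm = side * sum((m - p0) / p0 * q for p, m, q in fills) / total
    return is_total, sc, apm

# Hypothetical buy execution: one aggressive fill above the mid-quote and
# one passive fill below the mid-quote.
fills = [(100.02, 100.01, 300), (100.03, 100.04, 700)]
is_total, sc, apm = decompose_is(fills, p0=100.00, side=+1)

# Share of APM in total IS implied by the reported sample averages (in bps)
apm_share = 2.44 / (0.68 + 2.44)
```

Here the passive fill dominates, so the spread cost is slightly negative (about -0.4 bps) while the adverse price movement is about 3.1 bps, and the two sum to the 2.7 bps shortfall. The last line reproduces the roughly 78% share quoted in the text from the reported averages.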
To examine the relationship between signals and the components of the IS, I run the following
multivariate regressions at the execution level:
\[
\mathrm{IS\_Part}_i = \alpha + \beta\,\mathrm{Signal}_i + \sum_j \delta_j\,\mathrm{ControlAndDummies}_{j,i} + \epsilon_i, \tag{15}
\]
where IS_Part is either SC or APM, Signal equals either NegChildOrderVol, NegIntervalVol or
QtyTimeCorrel and each regression includes the standard set of control variables, stock, client and
intraday dummies.
Table 16 reports the regression results. I find that all signals have insignificant coefficients when
the cost component is SC but the coefficients become positive and significant in the case of APM.
Consistent with the earlier analysis, these findings suggest that predictable signals increase the
price impact of the execution rather than the spread costs and align with the predictions from the
back-running theory.
8. Meeting the Benchmark and Equilibrium Implications
In this section, I provide empirical evidence that the broker is mostly successful in achieving the
market benchmark price for the average execution price. This finding is important because it
illustrates that the objective functions of the broker and the algorithmic traders engaging in order
anticipation are not exactly the opposite, i.e., the trading game between them is not zero-sum.
This highlights that even if the broker is very skilled, execution predictability can still emerge
in equilibrium due to the misalignment of interests. Furthermore, this evidence reinforces the
well-established proposition that the VWAP algorithm is a suboptimal choice if the investor is highly
informed.
In a VWAP execution algorithm, the main objective is to minimize the VWAP slippage (VS)
rather than the IS. Recall that in computing VS, the volume-weighted average price realized during
the execution (including all trades) is the benchmark price rather than the arrival mid-quote price,
i.e.,
\[
\mathrm{VS}_i = \operatorname{sgn}(Q_i)\,\frac{P_i^{\mathrm{avg}} - \mathrm{VWAP}_i}{\mathrm{VWAP}_i}, \tag{16}
\]
where $\mathrm{VWAP}_i$ is the realized volume-weighted average price over the $i$th parent-order period. The
mean (median) VS is roughly 1.6 (1.1) bps. Surprisingly, the correlation between VS and IS,
despite being small in magnitude, is actually negative at −0.06. Figure 6 illustrates the lack of
a strong relationship visually. This plot indicates that execution predictability may lead to higher IS
on average, but may not prevent the broker from achieving its objective, i.e., meeting the market
benchmark price.
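The VWAP slippage in equation (16) differs from IS only in its benchmark price. A minimal sketch follows; the function names and trades are hypothetical, sizes are unsigned with a separate side indicator, and the market trade list is assumed to include the execution's own child fills, in line with the "including all trades" definition above.

```python
def vwap(trades):
    """Volume-weighted average price of (price, shares) pairs."""
    volume = sum(q for _, q in trades)
    return sum(p * q for p, q in trades) / volume

def vwap_slippage(child_fills, market_trades, side):
    """VS as a fractional return: execution price relative to the interval VWAP.

    side is +1 for a buy execution and -1 for a sell execution.
    """
    benchmark = vwap(market_trades)
    return side * (vwap(child_fills) - benchmark) / benchmark

# Hypothetical buy execution and the other trades printed in the same interval
child_fills = [(100.02, 300), (100.03, 700)]
other_trades = [(100.00, 1000), (100.04, 1000)]
vs = vwap_slippage(child_fills, child_fills + other_trades, side=+1)
```

In this example the execution's average price is 100.027 against an interval VWAP of about 100.022, so the slippage is roughly half a basis point even though the shortfall against the arrival price could be much larger.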
These findings illustrate that the cost increase due to execution predictability is less pronounced
on the primary objective of the algorithm. However, this lack of impact still has interesting
implications, as it illustrates that the predictable patterns are not completely at odds with the
broker’s performance. This is consistent with the hypothesis that even if the broker is very skilled,
execution predictability may still emerge because the performance of the algorithm does not suffer
from this information leakage. This feature can also decrease the incentive of the brokers to change
their algorithms even if they are aware that some strategic traders can detect predictability patterns.
Figure 6: Scatter plot of VS and IS. [Vertical axis: VS (bps); horizontal axis: IS (bps).]
9. Conclusion
This paper tests the presence of order anticipation strategies by examining predictable signals
appearing in large order executions. In this framework, I consider three easy-to-construct signals
based on the child-order executions of the parent-order and establish that higher predictability
implied by the signals results in higher trading costs. The signals are all constructed with the basic
intuition that deterministic patterns are easy for sophisticated algorithmic traders to decipher.
The cost increase due to these predictable patterns in each signal points to a robust conclusion that
strategic traders exploit the urgency or the information of the investor using order anticipation
strategies. I exploit the SEC’s ban on unfiltered access and an exogenous increase in noise trading
volume as shocks to order anticipatory activities of algorithmic traders and illustrate that the
price impact of predictability is smaller when order anticipation becomes more difficult.
My findings are mostly inconsistent with the predictions of sunshine trading, whereas they are
consistent with back-running. Back-running occurs when the execution is submitted by an investor
who has private information about the fundamental value of the asset and algorithmic traders try
to infer this information from the order flow signals to share the profits from the permanent
price impact. Decomposing the price changes into temporary or permanent, initial or delayed
responses and spread costs versus adverse price movements, my empirical analysis reveals that
the back-running theory seems to be more consistent with the empirical findings.
This paper highlights the insensitivity of VWAP execution strategy to predictable patterns.
The empirical evidence highlights that this algorithm may be successful in meeting its benchmark
price but may leave substantial footprints that can be exploited by adversary traders. Despite its
drawbacks, this algorithm is still widely used in the industry for its ease in performance benchmarking
and low-cost benefits. This framework can also be used to reassess the effectiveness of
the VWAP algorithm, especially in informed executions.
Given the absence of sunshine trading, at least at the aggregate level, my results have important
implications for policymakers. In the context of back-running, price discovery can be argued
to be faster whereas predatory trading may delay price discovery. For example, Aït-Sahalia
and Sağlam (2013) find that in the presence of a skilled predatory high-frequency market maker,
market liquidity also drops. It is also not evident whether back-running is a good practice. Stiglitz
(2014) shows an important negative implication of back-running, arguing that it can disincentivize
costly information acquisition about the real economy.
Finally, the empirical evidence may have potential implications in the design of financial mar-
kets. One potential policy recommendation emerging from the analysis is to randomize trade
quantities and their timestamps in public data feeds. When the signal-to-noise ratio is very low,
algorithmic traders will be potentially discouraged from engaging in order anticipation. For exam-
ple, Harris (2013) proposes that markets should report only approximate trade sizes within various
buckets or only aggregated volumes at 5-minute time intervals to prevent harmful HFT activity.
Controlled pilot studies can ultimately evaluate the effectiveness of such policies with regards to
their unintended consequences.
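To make the aggregated-reporting idea concrete, the sketch below sums trade sizes into 5-minute buckets so that individual child-order sizes and timestamps are no longer visible in the reported feed. It is a hypothetical illustration of the type of proposal described above, not a specification from Harris (2013); the function name, the timestamp convention (seconds after midnight), and the toy trades are assumptions.

```python
from collections import defaultdict

def aggregate_volume(trades, bucket_seconds=300):
    """Sum trade sizes into fixed time buckets.

    trades: iterable of (timestamp_in_seconds_after_midnight, shares)
    Returns {bucket_start_in_seconds: total_shares}, ordered by time.
    """
    buckets = defaultdict(int)
    for ts, shares in trades:
        bucket_start = (ts // bucket_seconds) * bucket_seconds
        buckets[bucket_start] += shares
    return dict(sorted(buckets.items()))

# Hypothetical trades: three equally sized, equally spaced child orders
# starting at 09:30:00, followed by two unrelated trades.
trades = [(34200, 100), (34260, 100), (34320, 100), (34500, 400), (34799, 50)]
reported = aggregate_volume(trades)
```

After aggregation the three equally spaced 100-share trades appear only as a single 300-share bucket, which removes the equidistant-interval footprint from the public record.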
References
Admati, Anat R., and Paul Pfleiderer. “Sunshine Trading and Financial Market Equilibrium.”
Review of Financial Studies, 4 (1991), 443–481.
Aït-Sahalia, Yacine, and Mehmet Sağlam. “High Frequency Market Making: Optimal Quoting.”
(2013).
Almgren, R.; C. Thum; E. Hauptmann; and H. Li. “Direct Estimation of Equity Market Impact.”
Risk, 18 (2005), 58–62.
Bertsimas, Dimitris, and Andrew W Lo. “Optimal Control of Execution Costs.” Journal of Financial
Markets, 1 (1998), 1–50.
Bessembinder, Hendrik; Allen Carrion; Laura Tuttle; and Kumar Venkataraman. “Liquidity, Re-
siliency and Market Quality Around Predictable Trades: Theory and Evidence.” Journal of
Financial Economics, 121 (2016), 142–166.
Brogaard, Jonathan; Terrence Hendershott; Stefan Hunt; and Carla Ysusi. “High-Frequency Trad-
ing and the Execution Costs of Institutional Investors.” Financial Review, 49 (2014), 345–369.
Brogaard, Jonathan; Terrence Hendershott; and Ryan Riordan. “High-Frequency Trading and
Price Discovery.” Review of Financial Studies, 27 (2014), 2267–2306.
Brunnermeier, Markus K, and Lasse Heje Pedersen. “Predatory Trading.” The Journal of Finance,
60 (2005), 1825–1863.
Chakrabarty, Bidisha; Pankaj K Jain; Andriy Shkilko; and Konstantin Sokolov. “Speed of Market
Access and Market Quality: Evidence From the SEC Naked Access Ban.” Available at SSRN
2328231, (2014).
Chiao, Chaoshin, and Zi-May Wang. “Price Clustering: Evidence Using Comprehensive Limit-
Order Data.” Financial Review, 44 (2009), 1–29.
Chiyachantana, Chiraphol; Pankaj K Jain; Christine Jiang; and Vivek Sharma. “Permanent Price
Impact Asymmetry of Trades with Institutional Constraints.” Journal of Financial Markets,
36 (2017), 1–16.
Chiyachantana, Chiraphol N; Pankaj K Jain; Christine Jiang; and Robert A Wood. “International
Evidence on Institutional Trading Behavior and Price Impact.” The Journal of Finance, 59 (2004),
869–898.
Degryse, Hans; Frank de Jong; and Vincent van Kervel. “Does Order Splitting Signal Uninformed
Order Flow?” (2014).
Hagströmer, Björn, and Lars Norden. “The Diversity of High-Frequency Traders.” Journal of
Financial Markets, 16 (2013), 741–770.
Harris, Larry. “What to Do About High-Frequency Trading.” Financial Analysts Journal, 69 (2013).
Hasbrouck, Joel, and Gideon Saar. “Low-Latency Trading.” Journal of Financial Markets,
16 (2013), 646–679.
Hendershott, Terrence; Charles M Jones; and Albert J Menkveld. “Does Algorithmic Trading
Improve Liquidity?” The Journal of Finance, 66 (2011), 1–33.
Hirschey, Nicholas. “Do High-Frequency Traders Anticipate Buying and Selling Pressure?” Available
at SSRN 2238516, (2013).
Hu, Gang. “Measures of Implicit Trading Costs and Buy-Sell Asymmetry.” Journal of Financial
Markets, 12 (2009), 418–437.
Hu, Gang; Koren Jo; Yi Alex Wang; and Jing Xie. “Institutional Trading and Abel Noser Data.”
Journal of Corporate Finance, (Forthcoming).
Jones, Charles M. “What Do We Know About High-Frequency Trading?” Columbia Business School
Research Paper, (2013).
Korajczyk, Robert A, and Dermot Murphy. “High Frequency Market Making to Large Institutional
Trades.” Available at SSRN 2567016, (2015).
Kraus, Alan, and Hans R Stoll. “Price Impacts of Block Trading on the New York Stock Exchange.”
The Journal of Finance, 27 (1972), 569–588.
Macey, Jonathan R, and Maureen O’Hara. “The Law and Economics of Best Execution.” Journal
of Financial Intermediation, 6 (1997), 188–223.
Madrigal, Vicente. “Non-Fundamental Speculation.” The Journal of Finance, 51 (1996), 553–578.
Menkveld, Albert J. “The Economics of High-Frequency Trading: Taking Stock.” Available at
SSRN, (2016).
Moallemi, Ciamac C; Beomsoo Park; and Benjamin Van Roy. “Strategic Execution in the Presence
of an Uninformed Arbitrageur.” Journal of Financial Markets, 15 (2012), 361–391.
O’Hara, Maureen; Chen Yao; and Mao Ye. “What’s Not There: Odd Lots and Market Data.” The
Journal of Finance, 69 (2014), 2199–2236.
Osborne, Maury FM. “Periodic Structure in the Brownian Motion of Stock Prices.” Operations
Research, 10 (1962), 345–379.
Perold, Andre F. “The Implementation Shortfall: Paper Versus Reality.” The Journal of Portfolio
Management, 14 (1988), 4–9.
Saar, Gideon. “Price Impact Asymmetry of Block Trades: An Institutional Trading Explanation.”
The Review of Financial Studies, 14 (2001), 1153–1181.
Sağlam, Mehmet, and Tugkan Tuzun. “Do ETFs Increase Liquidity?” Available at SSRN 3142081,
(2018).
Sağlam, Mehmet; Ciamac C Moallemi; and Michael Sotiropoulos. “Short-Term Trading Skill: An
Analysis of Investor Heterogeneity and Execution Quality.” Available at SSRN 2463952, (2014).
Stiglitz, Joseph E. “Tapping the Brakes: Are Less Active Markets Safer and Better for the Economy?” In
Federal Reserve Bank of Atlanta 2014 Financial Markets Conference: Tuning Financial Regulation
for Stability and Efficiency, April, volume 15, 2014.
Tong, Lin. “A Blessing or a Curse? The Impact of High Frequency Trading on Institutional Investors.”
SSRN Working Paper, (2015).
Van Kervel, Vincent, and Albert J Menkveld. “High-Frequency Trading Around Large Institutional
Orders.” Available at SSRN 2619686, (2015).
Weller, Brian M. “Efficient Prices At Any Cost: Does Algorithmic Trading Deter Information
Acquisition?” Available at SSRN 2662254, (2015).
Yang, Liyan, and Haoxiang Zhu. “Back-Running: Seeking and Hiding Fundamental Information in
Order Flows.” (2015).
Table 1: Summary statistics for the main attributes in my execution data. Participation rate is equal to the ratio of executed volume to
total volume during the lifetime of the order. The bid-ask spread is normalized using the mid-quote price. Order duration is expressed as
a fraction of a full trading day. The duration of a full trading day in U.S. equity markets is 6.5 hours.
Statistic Mean Min Pctl(25) Median Pctl(75) Max
Order Value ($ M) 1.015 0.050 0.131 0.343 1.001 62.864
Participation Rate 0.018 0.00001 0.002 0.006 0.019 0.521
Number of Child-Order Trades 127.801 5 26 60 148 4,533
Spread (bps) 3.960 0.711 2.347 3.219 4.623 45.128
Volatility 0.015 0.001 0.008 0.012 0.018 0.274
Order Duration 0.525 0.026 0.159 0.519 0.899 1.000
Implementation Shortfall (bps) 3.12 −678.30 −27.89 2.65 35.24 698.10
Table 2: This table presents the regression results where IS is the dependent variable. The main inde-
pendent variables are three execution signals: NegChildOrderVol measures the sign-adjusted volatility of
relative child-order sizes. NegIntervalVol measures the sign-adjusted volatility of normalized trading
intervals between consecutive child-orders. QtyTimeCorrel measures the correlation between cumulative
executed quantity and total time. I add participation rate, the ratio of aggressive and passive orders in the
parent-order, bid-offer spread, mid-quote volatility, execution duration, turnover, and logarithm of market
capitalization as control variables. Participation rate is the ratio of order size to the total volume during
the trading interval. Aggressive (passive) order ratio is the fraction of child-orders of the parent-order
that pays (earns) additional spread. Bid-offer spread is normalized using the mid-quote price. Volatility
is computed using five-second mid-quote returns. Order duration is expressed as a fraction of
a full trading day. Turnover is the total volume during the execution interval divided by the number of
shares outstanding. Log Market Cap is the logarithm of the market capitalization computed using the
arrival price of the order. All regressions include stock, client and intraday dummies. In columns (4)-(6),
I standardize all of the continuous independent variables. Standard errors are given in parentheses and
are adjusted by clustering on calendar day.
Dependent variable: IS
                    (1)        (2)        (3)        (4)        (5)        (6)
NegChildOrderVol    88.52∗∗                          1.61∗∗
                    (42.29)                          (0.77)
NegIntervalVol                 53.83∗∗                          1.77∗∗
                               (21.83)                          (0.72)
QtyTimeCorrel                             70.94∗∗∗                         1.75∗∗∗
                                          (25.03)                          (0.62)
Participation Rate  41.08∗∗    39.31∗     46.96∗∗    1.39∗∗     1.33∗      1.58∗∗
                    (20.73)    (21.16)    (20.19)    (0.70)     (0.71)     (0.68)
Aggressive Ratio    7.70∗      8.53∗      8.80∗      1.82∗      2.01∗      2.08∗
                    (4.55)     (4.58)     (4.64)     (1.07)     (1.08)     (1.09)
Passive Ratio       6.63       6.96       7.76       1.40       1.46       1.63
                    (5.42)     (5.40)     (5.41)     (1.14)     (1.14)     (1.14)
Spread              0.12       0.12       0.10       0.32       0.34       0.28
                    (0.67)     (0.67)     (0.67)     (1.89)     (1.88)     (1.88)
Volatility          −1.05      −5.35      −0.85      −0.01      −0.06      −0.01
                    (290.31)   (291.73)   (291.61)   (3.17)     (3.18)     (3.18)
Order Duration      0.99       0.33       1.57       0.35       0.12       0.56
                    (38.60)    (38.49)    (38.59)    (13.75)    (13.71)    (13.75)
Turnover            0.34       0.34       0.34       2.67       2.68       2.68
                    (0.33)     (0.33)     (0.33)     (2.65)     (2.65)     (2.66)
Log Market Cap      4.32       4.39       3.76       4.84       4.92       4.21
                    (7.57)     (7.57)     (7.59)     (8.48)     (8.48)     (8.50)
Standardized        No         No         No         Yes        Yes        Yes
Observations        20,335     20,335     20,335     20,335     20,335     20,335
Adjusted R2         0.07       0.07       0.07       0.07       0.07       0.07
*** p < 0.01, ** p < 0.05, * p < 0.10
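The three execution signals described in the caption of Table 2 can be illustrated with a short sketch. The caption gives only verbal descriptions, so the normalizations used here (child size relative to the parent-order size, trading interval relative to the mean interval) and all function names are assumptions made for illustration, not the paper's exact definitions. The sign convention follows the caption: volatilities are negated so that a more regular execution yields a larger signal.

```python
import math
from statistics import pstdev


def neg_child_order_vol(sizes):
    """Negative std. dev. of child sizes relative to the parent size (assumed normalization)."""
    parent = sum(sizes)
    return -pstdev([s / parent for s in sizes])


def neg_interval_vol(times):
    """Negative std. dev. of inter-trade intervals scaled by their mean (assumed normalization)."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return -pstdev([g / mean_gap for g in gaps])


def qty_time_correl(sizes, times):
    """Pearson correlation between cumulative executed quantity and elapsed time."""
    cum, running = [], 0.0
    for s in sizes:
        running += s
        cum.append(running)
    elapsed = [t - times[0] for t in times]
    mx, my = sum(cum) / len(cum), sum(elapsed) / len(elapsed)
    cov = sum((x - mx) * (y - my) for x, y in zip(cum, elapsed))
    vx = sum((x - mx) ** 2 for x in cum)
    vy = sum((y - my) ** 2 for y in elapsed)
    return cov / math.sqrt(vx * vy)


# Hypothetical parent orders: child sizes in shares, timestamps in seconds.
regular = ([100, 100, 100, 100, 100], [0, 10, 20, 30, 40])
irregular = ([50, 300, 20, 100, 30], [0, 3, 40, 41, 90])

for name, (sizes, times) in (("regular", regular), ("irregular", irregular)):
    print(name,
          neg_child_order_vol(sizes),
          neg_interval_vol(times),
          qty_time_correl(sizes, times))
```

In this toy example the evenly sized, evenly spaced order attains the largest possible value of each signal (zero volatility and a correlation of one), while the irregular order scores lower on all three.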
Table 3: This table presents the regression results where the initial and final price impact, InitIS and
FinalIS, are the dependent variables. The main independent variables are three execution signals: Init-
NegChildOrderVol measures the sign-adjusted volatility of relative child-order sizes in the first half of
the execution. InitNegIntervalVol measures the sign-adjusted volatility of normalized trading intervals
between consecutive child-orders in the first half of the execution. InitQtyTimeCorrel measures
the correlation between cumulative executed quantity and total time in the first half of the execution. I
add participation rate, the ratio of aggressive and passive orders, bid-offer spread, mid-quote volatility,
execution duration, turnover, and logarithm of market capitalization as control variables. All regressions
include stock, client and intraday dummies. Standard errors are given in parentheses and are adjusted by
clustering on calendar day.
Dependent Variables:
                      InitIS                           FinalIS
                      (1)        (2)        (3)        (4)        (5)        (6)
InitNegChildOrderVol  −4.04                            59.06∗∗
                      (15.89)                          (25.76)
InitNegIntervalVol               10.76                            26.00∗∗
                                 (9.62)                           (12.39)
InitQtyTimeCorrel                           16.62                            54.86∗∗
                                            (16.61)                          (25.31)
Participation Rate    33.58∗∗    30.25∗     33.24∗∗    23.03      21.96      29.41∗
                      (15.61)    (15.94)    (15.49)    (18.20)    (18.48)    (17.48)
Aggressive Ratio      8.77∗∗∗    8.34∗∗     8.56∗∗     −0.02      0.36       0.78
                      (3.40)     (3.40)     (3.41)     (4.03)     (4.07)     (4.08)
Passive Ratio         7.77∗∗     7.09∗      7.52∗∗     0.49       0.35       1.25
                      (3.67)     (3.67)     (3.66)     (4.76)     (4.86)     (4.84)
Spread                0.25       0.28       0.27       −0.32      −0.35      −0.36
                      (0.53)     (0.53)     (0.53)     (0.46)     (0.46)     (0.46)
Volatility            108.19     103.12     104.64     −171.82    −174.34    −173.04
                      (165.91)   (165.41)   (165.54)   (287.13)   (289.19)   (288.68)
Order Duration        −5.96      −8.01      −7.01      16.15      14.98      17.11
                      (26.39)    (26.33)    (26.33)    (42.03)    (42.14)    (41.95)
Turnover              0.13       0.13       0.13       0.41       0.41       0.41
                      (0.21)     (0.21)     (0.21)     (0.32)     (0.32)     (0.32)
Log Market Cap        6.76       7.14       6.90       −4.72      −4.73      −5.20
                      (5.29)     (5.30)     (5.32)     (7.16)     (7.14)     (7.22)
Observations          20,296     20,296     20,296     20,296     20,296     20,296
Adjusted R2           0.07       0.07       0.07       0.07       0.07       0.07
*** p < 0.01, ** p < 0.05, * p < 0.10
Table 4: Frequency of consecutive child order trades with trading intervals of X.00 seconds versus X.50 seconds.
X is chosen to be from 1 second to 30 seconds. In the final column, I report the difference in frequencies.
X.00 seconds Frequency X.50 seconds Frequency Difference

1.00 632 1.50 451 181
2.00 554 2.50 293 261
3.00 406 3.50 247 159
4.00 359 4.50 197 162
5.00 424 5.50 159 265
6.00 305 6.50 169 136
7.00 233 7.50 135 98
8.00 232 8.50 98 134
9.00 213 9.50 96 117
10.00 296 10.50 109 187
11.00 198 11.50 89 109
12.00 198 12.50 70 128
13.00 138 13.50 85 53
14.00 162 14.50 85 77
15.00 175 15.50 79 96
16.00 145 16.50 107 38
17.00 155 17.50 82 73
18.00 147 18.50 69 78
19.00 125 19.50 93 32
20.00 156 20.50 87 69
21.00 132 21.50 73 59
22.00 116 22.50 83 33
23.00 107 23.50 75 32
24.00 116 24.50 75 41
25.00 133 25.50 88 45
26.00 159 26.50 78 81
27.00 96 27.50 86 10
28.00 142 28.50 78 64
29.00 109 29.50 78 31
30.00 157 30.50 90 67
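The comparison in Table 4 can be summarized by the mean of the Difference column and a t-statistic for that mean. The sketch below recomputes both from the tabulated frequencies. Treating the 30 differences as a simple sample and using a one-sample t-statistic is an assumption about how such a test could be run, not a statement of the author's exact procedure.

```python
import math
from statistics import mean, stdev

# Frequencies from Table 4 for X = 1, ..., 30 seconds.
freq_x00 = [632, 554, 406, 359, 424, 305, 233, 232, 213, 296,
            198, 198, 138, 162, 175, 145, 155, 147, 125, 156,
            132, 116, 107, 116, 133, 159, 96, 142, 109, 157]
freq_x50 = [451, 293, 247, 197, 159, 169, 135, 98, 96, 109,
            89, 70, 85, 85, 79, 107, 82, 69, 93, 87,
            73, 83, 75, 75, 88, 78, 86, 78, 78, 90]

# Difference in frequency between whole-second and half-second intervals.
diffs = [a - b for a, b in zip(freq_x00, freq_x50)]

mean_diff = mean(diffs)
# One-sample t-statistic for the null hypothesis of a zero mean difference.
t_stat = mean_diff / (stdev(diffs) / math.sqrt(len(diffs)))

print(f"mean difference = {mean_diff:.1f}, t-statistic = {t_stat:.1f}")
```

With the Table 4 values, the mean difference is 97.2 and the t-statistic is about 8.1, so whole-second trading intervals are clearly more frequent than their half-second neighbors.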
Table 5: This table presents the regression results where dependent variables are three execution signals:
NegChildOrderVol measures the sign-adjusted volatility of relative child-order sizes. NegIntervalVol
measures the sign-adjusted volatility of normalized trading intervals between consecutive child-orders.
QtyTimeCorrel measures the correlation between cumulative executed quantity and total time. I regress
these signals on participation rate, the ratio of aggressive and passive orders, bid-offer spread, mid-quote
volatility, turnover, and logarithm of market capitalization. I also add stock fixed-effects, client and
intraday dummies. Standard errors are given in parentheses and are adjusted by clustering on calendar
day.
Dependent variable:
                    NegChildOrderVol  NegIntervalVol    QtyTimeCorrel
Participation Rate  0.07∗∗∗           0.15∗∗∗           0.01
                    (0.01)            (0.01)            (0.01)
Aggressive Ratio    0.02∗∗∗           0.01∗∗∗           0.01∗∗∗
                    (0.002)           (0.003)           (0.002)
Passive Ratio       0.02∗∗∗           0.03∗∗∗           0.01∗∗∗
                    (0.002)           (0.003)           (0.002)
Spread              −0.0005∗∗∗        −0.001∗∗∗         −0.0003∗∗
                    (0.0001)          (0.0002)          (0.0001)
Volatility          0.09∗             0.22∗∗∗           0.11∗∗∗
                    (0.05)            (0.07)            (0.03)
Turnover            −0.0000           −0.0001           −0.0001∗∗
                    (0.0000)          (0.0001)          (0.0000)
Log Market Cap      −0.01∗∗∗          −0.01∗∗∗          −0.0005
                    (0.001)           (0.002)           (0.001)
Observations        20,335            20,335            20,335
Adjusted R2         0.37              0.44              0.33
*** p < 0.01, ** p < 0.05, * p < 0.10
Table 6: Spearman and Pearson correlations between public and proprietary signals using all executions.
Public Proprietary Spearman Correlation Pearson Correlation
PubNegChildOrderVol NegChildOrderVol 0.48∗∗∗ 0.39∗∗∗
PubNegIntervalVol NegIntervalVol 0.25∗∗∗ 0.16∗∗∗
PubQtyTimeCorrel QtyTimeCorrel 0.48∗∗∗ 0.28∗∗∗
*** p < 0.01, ** p < 0.05, * p < 0.10
Table 7: Spearman and Pearson correlations between public and proprietary signals when executions with partic-
ipation rates of at least 0.6% (the median) are considered.
Public Proprietary Spearman Correlation Pearson Correlation
PubNegChildOrderVol NegChildOrderVol 0.77∗∗∗ 0.58∗∗∗
PubNegIntervalVol NegIntervalVol 0.33∗∗∗ 0.25∗∗∗
PubQtyTimeCorrel QtyTimeCorrel 0.50∗∗∗ 0.29∗∗∗
*** p < 0.01, ** p < 0.05, * p < 0.10
Table 8: Analysis for buy and sell orders. This table presents the regression results where IS is the
dependent variable in separate datasets of buy and sell orders. The main independent variables are
three execution signals: NegChildOrderVol measures the sign-adjusted volatility of relative child-order
sizes. NegIntervalVol measures the sign-adjusted volatility of normalized trading intervals between
consecutive child-orders. QtyTimeCorrel measures the correlation between cumulative executed quantity
and total time. I add participation rate, bid-offer spread, the ratio of aggressive and passive orders, mid-
quote volatility, execution duration, turnover, and logarithm of market capitalization as control variables
but only the coefficients on the execution signals are shown. All regressions include stock, client and intraday
dummies. Standard errors are given in parentheses and are adjusted by clustering on calendar day.
Dependent Variable: IS
Buy Sell Buy Sell Buy Sell
(1) (2) (3) (4) (5) (6)
∗∗
(62.99) (47.10)
NegIntervalVol 79.65∗∗∗ 63.10∗∗

(27.20) (31.85)
QtyTimeCorrel 90.54∗∗ 98.22∗∗∗

(41.12) (31.18)
Observations 9,856 10,479 9,856 10,479 9,856 10,479

Adjusted R2 0.08 0.12 0.02 0.12 0.08 0.12
*** p < 0.01, ** p < 0.05, * p < 0.10
Table 9: Stock-by-stock analysis. IS is regressed on execution control variables on a stock-by-stock basis. 385
stocks with more than 20 parent-order executions are used in the analysis. I use participation rate, the ratio of
passive and aggressive orders, bid-offer spread, mid-quote volatility, execution duration, turnover, logarithm of
market capitalization as control variables. For each signal, I report the average β coefficient across stocks and the
corresponding t-statistic. In the third row, the fraction of positive coefficients is given.

(1) (2) (3)
Mean β 251.43 160.84 137.74
t(β) 3.26 3.43 2.09
% positive 0.66 0.68 0.62
Table 10: This table presents the regression results where dependent variables are two proxies of AT
activity: average trade size or trade-to-order ratio (%). I use execution data from November 2011 and
December 2011. Ban is a binary variable with the value of 0 before the ban and 1 after the ban. I
add bid-offer spread, mid-quote volatility, and logarithm of market capitalization and stock fixed effects.
Standard errors are given in parentheses and are adjusted by clustering on calendar day.
Dependent variable:
                Average Trade Size  Trade-to-Order Ratio (%)
Ban             14.15∗∗∗            0.52∗∗∗
                (4.17)              (0.18)
Spread          0.07∗∗              0.003∗∗∗
                (0.03)              (0.001)
Volatility      2,118.07∗           42.53
                (1,139.72)          (38.59)
Log Market Cap  −44.40              −0.35
                (67.80)             (1.22)
Constant        128.49∗∗∗           0.47
                (48.79)             (1.07)
Stock FE        Yes                 Yes
Observations    2,141               2,141
Adjusted R2     0.97                0.57
*** p < 0.01, ** p < 0.05, * p < 0.10
Table 11: This table presents the regression results where IS is the dependent variable. The main in-
dependent variables are three execution signals and post-ban dummy, Ban and their interaction terms:
NegChildOrderVol measures the sign-adjusted volatility of relative child-order sizes. NegIntervalVol
measures the sign-adjusted volatility of normalized trading intervals between consecutive child-orders.
QtyTimeCorrel measures the correlation between cumulative executed quantity and total time. Ban is
a binary variable with the value of 0 before the ban and 1 after the ban. I add participation rate, the
ratio of aggressive and passive orders, bid-offer spread, mid-quote volatility, execution duration, turnover,
and logarithm of market capitalization as control variables. For conciseness, I only report the coefficients
of interest. All regressions include stock, client and intraday dummies. Standard errors are given in
parentheses and are adjusted by clustering on calendar day.
                          (1)        (2)        (3)
NegChildOrderVol          151.82
                          (137.11)
NegIntervalVol                       216.36∗∗
                                     (86.03)
QtyTimeCorrel                                   245.30∗∗∗
                                                (52.27)
Ban × NegChildOrderVol    −143.25
                          (154.12)
Ban × NegIntervalVol                 −204.09∗∗
                                     (100.88)
Ban × QtyTimeCorrel                             −204.21∗∗∗
                                                (77.65)
Ban × Participation Rate  55.34      81.55      49.04
                          (101.09)   (102.76)   (100.11)
Ban × Aggressive Ratio    18.39      17.86      16.71
                          (21.61)    (21.23)    (21.30)
Ban × Passive Ratio       −29.48     −25.19     −31.48
                          (24.41)    (24.13)    (23.26)
Observations              5,115      5,115      5,115
Adjusted R2               0.10       0.10       0.10
*** p < 0.01, ** p < 0.05, * p < 0.10
Table 12: This table presents the regression results where dependent variables are logarithm of interval
volume (excluding client’s order) and the ratio of client’s order size to interval volume. Noisy is a binary
variable with the value of 1 if the arrival mid-price has a decimal in the set of {.000, .005, .010, .990, .995}
and 0 otherwise. I add bid-offer spread, mid-quote volatility, execution duration, and logarithm of market
capitalization as control variables. All regressions include stock, client and intraday dummies. Standard
errors are given in parentheses and are adjusted by clustering on calendar day.
Dependent variable:
                log(TotQ_i)   Q_i / TotQ_i
                (1)           (2)
Noisy           0.05∗∗        −0.003∗∗
                (0.02)        (0.001)
Spread          −0.01         −0.0001
                (0.01)        (0.0003)
Volatility      22.35∗∗∗      −0.34∗∗∗
                (1.87)        (0.03)
Order Duration  1.78∗∗∗       −0.02
                (0.46)        (0.02)
Log Market Cap  −0.20∗∗∗      −0.004∗
                (0.06)        (0.002)
Observations    20,335        20,335
Adjusted R2     0.51          0.34
*** p < 0.01, ** p < 0.05, * p < 0.10
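The Noisy dummy described in the caption of Table 12 depends only on the fractional part of the arrival mid-price. A minimal sketch of that classification follows. It assumes mid-prices are quoted to three decimals and passed as strings, so that the comparison is exact rather than subject to floating-point rounding. The function name `is_noisy` and the sample prices are hypothetical.

```python
from decimal import Decimal

# Fractional parts of the arrival mid-price that mark an execution as "noisy",
# taken from the set listed in the table caption.
NOISY_DECIMALS = {
    Decimal("0.000"), Decimal("0.005"), Decimal("0.010"),
    Decimal("0.990"), Decimal("0.995"),
}


def is_noisy(mid_price: str) -> int:
    """Return 1 if the fractional part of the mid-price is in the noisy set, else 0."""
    price = Decimal(mid_price)
    fractional = price - int(price)  # positive prices assumed
    return 1 if fractional in NOISY_DECIMALS else 0


# Hypothetical arrival mid-prices.
for p in ["25.005", "25.990", "25.000", "25.470", "25.015"]:
    print(p, is_noisy(p))
```

Using `Decimal` rather than `float` is a design choice for this sketch: a value such as 25.005 has no exact binary representation, so a float-based comparison against 0.005 could misclassify it.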
Table 13: This table presents the regression results where IS is the dependent variable. The main in-
dependent variables are three execution signals and noise trading dummy, Noisy and their interaction
terms: NegChildOrderVol measures the sign-adjusted volatility of relative child-order sizes. NegIntervalVol
measures the sign-adjusted volatility of normalized trading intervals between consecutive
child-orders. QtyTimeCorrel measures the correlation between cumulative executed quantity and total
time. Noisy is a binary variable with the value of 1 if the arrival mid-price has a decimal in the set of
{.000, .005, .010, .990, .995} and 0 otherwise. I add participation rate, bid-offer spread, the ratio of aggres-
sive and passive orders, mid-quote volatility, execution duration, and logarithm of market capitalization
as control variables. All regressions include stock, client and intraday dummies. For conciseness, I only
report the coefficients of interest. Standard errors are given in parentheses and are adjusted by clustering
on calendar day.
                            (1)        (2)        (3)
Noisy × NegChildOrderVol    −65.09
                            (291.10)
Noisy × NegIntervalVol                 −378.63∗∗
                                       (165.94)
Noisy × QtyTimeCorrel                             −220.21∗∗
                                                  (93.33)
Noisy × Participation Rate  466.14∗∗   508.72∗∗   492.33∗∗
                            (235.65)   (232.29)   (225.75)
Noisy × Aggressive Ratio    −28.50     −10.34     −19.65
                            (38.98)    (36.80)    (37.06)
Noisy × Passive Ratio       −58.13     −32.40     −44.20
                            (45.34)    (47.47)    (40.09)
Observations                20,335     20,335     20,335
Adjusted R2                 0.07       0.07       0.07
*** p < 0.01, ** p < 0.05, * p < 0.10
Table 14: This table reports each investor’s average PPI with its corresponding t-statistic. The dataset has 146 distinct investors. Client
identifiers are masked with numerical aliases. I define PPI according to Equation 11 and compute the average according to Equation 12.
C78 has only one execution and thus the t-statistic is not defined.
ID AvgPPI t-stat ID AvgPPI t-stat ID AvgPPI t-stat ID AvgPPI t-stat ID AvgPPI t-stat
(bps) (bps) (bps) (bps) (bps)
C26 259.91 11.46 C123 33.52 3.70 C88 6.63 0.22 C84 -7.65 -0.55 C9 -37.30 -1.03
C64 162.30 7.88 C7 32.87 2.07 C31 6.30 0.13 C19 -9.39 -0.45 C41 -38.73 -0.95
C138 123.73 3.96 C30 31.85 1.87 C15 6.25 0.32 C75 -10.20 -0.43 C76 -41.25 -3.45
C28 123.56 3.82 C6 27.93 1.53 C22 5.73 0.06 C102 -11.64 -0.94 C131 -41.32 -4.05
C70 116.60 6.03 C81 26.07 2.16 C106 5.72 0.37 C144 -11.65 -0.88 C77 -46.09 -2.45
C73 115.85 6.72 C99 23.20 1.84 C67 4.83 0.24 C46 -13.14 -0.24 C12 -47.50 -1.76
C51 107.63 4.71 C117 22.48 0.92 C132 4.52 0.45 C135 -13.54 -0.62 C38 -49.04 -2.47
C96 93.51 2.88 C114 21.87 2.24 C125 4.46 0.40 C142 -13.66 -1.32 C57 -49.23 -3.04
C50 92.57 4.18 C133 20.09 1.78 C136 4.33 0.54 C90 -13.76 -0.67 C134 -53.27 -4.11
C119 87.90 2.81 C89 19.88 1.05 C101 4.17 0.26 C44 -14.17 -1.01 C3 -61.05 -1.82
C93 85.77 7.16 C86 19.46 0.83 C137 4.17 0.25 C118 -14.78 -1.22 C120 -62.99 -3.66
C34 84.86 4.92 C141 19.20 2.77 C69 3.41 0.23 C122 -14.88 -1.13 C23 -63.89 -3.21
C65 80.12 1.66 C2 18.68 0.29 C140 3.08 0.11 C52 -15.83 -1.03 C129 -78.95 -6.85
C71 79.31 2.73 C59 17.96 1.24 C130 1.68 0.15 C108 -15.85 -1.38 C25 -81.14 -3.73
C60 79.17 6.36 C55 16.80 0.60 C143 1.17 0.05 C62 -16.26 -0.44 C78 -84.55 N/A
C107 78.10 2.27 C5 15.23 0.18 C74 0.69 0.05 C37 -16.40 -0.86 C54 -88.37 -1.11
C92 67.32 1.45 C14 15.13 0.79 C13 -0.70 -0.04 C79 -17.77 -1.05 C95 -92.75 -5.20
C83 65.24 2.23 C21 15.07 0.62 C29 -1.08 -0.06 C43 -17.87 -0.53 C20 -94.73 -4.28
C115 64.31 7.54 C126 14.52 1.65 C1 -1.47 -0.07 C87 -18.79 -1.71 C16 -94.98 -7.78
C48 63.99 3.17 C63 13.83 0.60 C139 -1.62 -0.17 C24 -19.60 -1.53 C58 -97.25 -3.94
C61 59.98 2.88 C85 13.41 0.98 C80 -2.85 -0.26 C36 -21.52 -0.38 C4 -115.97 -6.57
C103 55.55 1.42 C98 12.54 1.05 C97 -3.01 -0.31 C27 -22.28 -0.53 C112 -173.99 -7.42
C8 54.43 2.96 C40 10.94 0.78 C145 -3.11 -0.33 C32 -25.79 -0.86 C56 -263.00 -5.86
C100 49.03 2.27 C72 10.62 0.98 C110 -4.03 -0.26 C49 -26.86 -1.49 C53 -268.00 -13.53
C17 48.23 1.07 C116 10.46 1.15 C146 -4.81 -0.16 C104 -27.03 -0.48 C18 -511.08 -25.06
C105 40.97 2.23 C113 9.79 0.93 C47 -4.94 -0.24 C45 -28.07 -1.21 C33 -991.23 -1.04
C42 38.69 1.87 C35 9.25 0.42 C91 -5.33 -0.36 C109 -30.43 -2.14
C124 35.10 1.17 C127 8.83 0.24 C94 -6.22 -0.52 C68 -35.21 -2.53
C11 34.61 1.38 C128 7.20 0.55 C82 -6.39 -0.50 C39 -36.52 -2.31
C10 34.05 1.43 C121 7.16 0.46 C111 -6.45 -0.31 C66 -36.56 -2.76
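The per-investor averages and t-statistics in Table 14 can be illustrated with a small sketch. The paper's Equations 11 and 12 are not restated here, so a plain sample mean and a one-sample t-statistic are used as an assumed stand-in. The helper name `avg_and_tstat` and the values for the client labeled "A" are hypothetical. The single-execution case mirrors C78, whose average equals its one observed PPI and whose t-statistic is undefined.

```python
import math
from statistics import mean, stdev


def avg_and_tstat(ppi_values):
    """Return (average PPI, t-statistic); the t-statistic is None when undefined."""
    n = len(ppi_values)
    avg = mean(ppi_values)
    if n < 2:
        # A single execution has no sample standard deviation.
        return avg, None
    std_err = stdev(ppi_values) / math.sqrt(n)
    if std_err == 0:
        return avg, None
    return avg, avg / std_err


# PPI values in bps, keyed by masked client ID.
executions = {
    "A": [10.0, 20.0, 30.0],   # hypothetical client with three executions
    "C78": [-84.55],           # one execution, as noted in the table caption
}

for client, values in executions.items():
    print(client, avg_and_tstat(values))
```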
Table 15: This table presents the regression results where OPI, PPI and TPI are the dependent variables respectively. The main independent
variables are three execution signals: NegChildOrderVol measures the sign-adjusted volatility of relative child-order sizes. NegIntervalVol
measures the sign-adjusted volatility of normalized trading intervals between consecutive child-orders. QtyTimeCorrel measures the
correlation between cumulative executed quantity and total time. I add participation rate, bid-offer spread, the ratio of aggressive and
passive orders, mid-quote volatility, execution duration, turnover, and logarithm of market capitalization as control variables but only
the coefficients on the execution signals are shown. All regressions include stock, client and intraday dummies. Standard errors are given in
parentheses and are adjusted by clustering on calendar day.
                  OPI        PPI        TPI        OPI        PPI        TPI        OPI        PPI        TPI
                  (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)        (9)
NegChildOrderVol  161.69∗∗∗  92.32      69.37
                  (61.14)    (118.87)   (87.80)
NegIntervalVol                                     84.20∗     64.15∗     20.05
                                                   (49.20)    (37.36)    (50.67)
QtyTimeCorrel                                                                       104.30∗∗∗  167.24∗∗   −62.93
                                                                                    (34.60)    (84.70)    (68.46)
Observations      20,335     20,335     20,335     20,335     20,335     20,335     20,335     20,335     20,335
Adjusted R2       0.10       0.07       0.07       0.10       0.07       0.07       0.10       0.07       0.07
*** p < 0.01, ** p < 0.05, * p < 0.10
Table 16: This table presents the regression results where IS is the dependent variable. The main indepen-
dent variables are three execution signals and their interaction with PPI, the permanent price impact of the
order (in bps) that can proxy informed trading. NegChildOrderVol measures the sign-adjusted volatility of
relative child-order sizes. NegIntervalVol measures the sign-adjusted volatility of normalized trading
intervals between consecutive child-orders. QtyTimeCorrel measures the correlation between cumulative
executed quantity and total time. I add participation rate, bid-offer spread, the ratio of aggressive and
passive orders, mid-quote volatility, execution duration, turnover, and logarithm of market capitalization
as control variables. All regressions include stock, client and intraday dummies. Standard errors are given
in parentheses and are adjusted by clustering on calendar day.
                         (1)        (2)        (3)
PPI × NegChildOrderVol   0.87∗∗
                         (0.40)
PPI × NegIntervalVol                0.15
                                    (0.24)
PPI × QtyTimeCorrel                            1.07∗∗
                                               (0.45)
NegChildOrderVol         68.76∗∗
                         (33.45)
NegIntervalVol                      37.54∗∗
                                    (17.36)
QtyTimeCorrel                                  66.34∗∗
                                               (30.61)
PPI                      0.25∗∗∗    0.24∗∗∗    −0.81∗
                         (0.01)     (0.02)     (0.44)
Observations             20,335     20,335     20,335
Adjusted R2              0.38       0.38       0.38
*** p < 0.01, ** p < 0.05, * p < 0.10
Table 17: This table presents the regression results where SC and APM are the dependent variables respectively. The main independent
variables are three execution signals: NegChildOrderVol measures the sign-adjusted volatility of relative child-order sizes. NegIntervalVol
measures the sign-adjusted volatility of normalized trading intervals between consecutive child-orders. QtyTimeCorrel measures the
correlation between cumulative executed quantity and total time. I add participation rate, bid-offer spread, the ratio of aggressive and
passive orders, mid-quote volatility, execution duration, turnover, and logarithm of market capitalization as control variables but only
the coefficients on the execution signals are shown. All regressions include stock, client and intraday dummies. Standard errors are given in
parentheses and are adjusted by clustering on calendar day.
SC APM SC APM SC APM
(1) (2) (3) (4) (5) (6)
∗∗
(3.87) (43.51)
NegIntervalVol 0.16 54.35∗∗

(0.48) (21.84)
QtyTimeCorrel −1.54 72.43∗∗∗

(2.02) (25.38)
Observations 20,335 20,335 20,335 20,335 20,335 20,335

Adjusted R2 0.12 0.07 0.11 0.07 0.12 0.07
*** p < 0.01, ** p < 0.05, * p < 0.10
