
Applied Finance

Statistical tools in algorithmic trading

Piotr Wojcik

WNE UW

Lecture #1, 20.11.2019


High-frequency data definition, characteristics and sources

Piotr Wojcik (WNE UW) Lecture #1, 20.11.2019 1 / 68


Table of contents

1 Organizational matters

2 Statistical properties of returns

3 High-frequency data

4 Empirical characteristics of high-frequency data

5 High-frequency data sources



Aim of this part of the course

The aims of this part of the Applied Finance course are:
to give a basic theoretical background for high-frequency algorithmic trading,
to learn the characteristics of high-frequency data and the steps needed to prepare the data and aggregate it to the desired frequency,
to show how to build simple trading strategies of one's own and verify their profitability,
to learn the specifics of backtesting/validation of machine learning algorithms applied to time series data.
The R environment will be used for practical examples; prior knowledge of R is expected.



Lectures and labs

This part of the course will consist of four lectures:
1 High-frequency data
2 Types of strategies, backtesting (validation)
3 Building an automated strategy
4 Statistical arbitrage and event arbitrage
Lectures will be accompanied by two practical lab sessions with R:
1 Dealing with time series data, including intraday data and data aggregation
2 Applying simple strategies



Literature

Aldridge, I. (2013), High-Frequency Trading: A Practical Guide to Algorithmic Strategies and Trading Systems
Chan, E. (2008), Quantitative Trading: How to Build Your Own Algorithmic Trading Business
Chan, E. (2013), Algorithmic Trading: Winning Strategies and Their Rationale
Fabozzi, F.J., Focardi, S.M. and Kolm, P.N. (2010), Quantitative Equity Investing: Techniques and Strategies



Statistical properties of returns



Returns

financial data are typically analyzed using returns,
a return is the difference between two subsequent price quotes, normalized by the earlier price level,
being independent of the price level, returns are convenient for direct performance comparisons across various financial instruments.



Simple and logarithmic return

A simple return measure can be computed as:

$$R_t = \frac{P_t - P_{t-1}}{P_{t-1}} = \frac{P_t}{P_{t-1}} - 1,$$

where
$R_t$ is the return for period $t$,
$P_t$ is the price of the financial instrument in period $t$,
$P_{t-1}$ is the price of the financial instrument in period $t-1$.

Despite the intuitiveness of simple returns, much of the financial literature relies on log returns, which are defined as:

$$r_t = \log\frac{P_t}{P_{t-1}} = \log P_t - \log P_{t-1}.$$
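
The two definitions above can be checked numerically. The following is a minimal sketch in Python (chosen here only for illustration); the price list and the function names `simple_returns` and `log_returns` are hypothetical and not part of the lecture material.

```python
import math


def simple_returns(prices):
    """R_t = P_t / P_{t-1} - 1 for each pair of consecutive prices."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def log_returns(prices):
    """r_t = log(P_t) - log(P_{t-1}) for each pair of consecutive prices."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]


# Hypothetical price quotes, for illustration only.
prices = [100.0, 101.0, 99.5, 100.5]

R = simple_returns(prices)
r = log_returns(prices)

# The two measures are linked by r_t = log(1 + R_t),
# so they are close to each other when returns are small.
for R_t, r_t in zip(R, r):
    print(f"simple: {R_t:+.6f}   log: {r_t:+.6f}")
```

For the small price moves typical of high-frequency data the two columns differ only in the later decimal places.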



Why log returns?

Log returns are often preferred to simple returns for the following reasons:
if log returns are assumed to follow a normal distribution, the underlying gross simple returns (1 + R_t) and the asset prices themselves follow a lognormal distribution,
a lognormal distribution better reflects the actual distribution of asset prices (e.g. asset prices are generally positive),
lognormal distributions have fatter tails than normal distributions (as do the distributions of asset prices),
although not perfect at modeling the fat tails of asset prices, lognormal distributions approximate them better than normal distributions.
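
The first point can be made explicit with a short derivation. It is a sketch under the added assumption that log returns are independent and identically normal, $r_t \sim N(\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are population parameters and $P_0$ is an initial price (a symbol introduced here for illustration).

```latex
\begin{align*}
1 + R_t &= \frac{P_t}{P_{t-1}} = e^{r_t}, \\
P_T &= P_0 \prod_{t=1}^{T} (1 + R_t) = P_0 \exp\Big(\sum_{t=1}^{T} r_t\Big), \\
\sum_{t=1}^{T} r_t &\sim N(T\mu,\; T\sigma^2)
\quad\Longrightarrow\quad
\log\frac{P_T}{P_0} \sim N(T\mu,\; T\sigma^2).
\end{align*}
```

Because the exponential of a normal variable is lognormal, both the gross return $1 + R_t$ and the price ratio $P_T / P_0$ are lognormal under this assumption, and therefore strictly positive.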



Average return

both simple and log returns can be averaged over time to obtain lower-frequency return estimates,
averages of simple and log returns can be computed as the usual arithmetic averages:

$$E(R) = \frac{1}{T}\sum_{t=1}^{T} R_t$$

$$\mu = \frac{1}{T}\sum_{t=1}^{T} r_t$$
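
One reason averaging works cleanly for log returns is that they add up across periods: the sum of the T one-period log returns equals the log return over the whole window, so T times the average recovers it exactly. A minimal Python sketch with made-up prices (all names are illustrative):

```python
import math

# Hypothetical price quotes, for illustration only.
prices = [100.0, 101.0, 99.5, 100.5]

log_rets = [math.log(b / a) for a, b in zip(prices, prices[1:])]
T = len(log_rets)

# Arithmetic average of log returns, as in the formula for mu.
mu = sum(log_rets) / T

# Log return over the whole window, computed directly from the end points.
total_log_return = math.log(prices[-1] / prices[0])

# Additivity: T * mu reproduces the whole-window log return.
print(T * mu, total_log_return)
```

The same identity does not hold for simple returns, which compound multiplicatively rather than add.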



Volatility

variation in sequential returns is known as volatility, which can be measured in a variety of ways,
the simplest measure of volatility is the variance of simple or log returns:

$$\mathrm{var}(R) = \frac{1}{T-1}\sum_{t=1}^{T} (R_t - E(R))^2$$

$$\sigma^2 = \frac{1}{T-1}\sum_{t=1}^{T} (r_t - \mu)^2$$
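
The variance formula above, with its T − 1 denominator, can be sketched in a few lines of Python. The return values are invented for illustration, and `sample_variance` is a hypothetical helper; the standard library's `statistics.variance` uses the same denominator and serves as a cross-check.

```python
import statistics


def sample_variance(returns):
    """Variance with the T - 1 denominator, as in the formula above."""
    T = len(returns)
    mean = sum(returns) / T
    return sum((x - mean) ** 2 for x in returns) / (T - 1)


# Hypothetical returns, for illustration only.
returns = [0.010, -0.015, 0.010, 0.002, -0.004]

var_r = sample_variance(returns)
vol = var_r ** 0.5  # standard deviation, in the same units as the returns

print(var_r, statistics.variance(returns), vol)
```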



Other common statistics

Other common statistics used to describe distributions of prices, simple returns or log returns are skewness and kurtosis:
skewness measures whether a distribution skews towards either the positive or the negative side of the mean, as compared with the standard normal distribution,
kurtosis is a measure of the fatness of the tails of a distribution; the fatter the tails of a return distribution, the higher the chance of an extreme positive or negative return.
Extreme negative returns can be particularly damaging to a trading strategy, potentially wiping out all previous profits.



Skewness and kurtosis

Skewness can be measured as:

$$S(R) = \frac{1}{T-1}\,\frac{\sum_{t=1}^{T} (R_t - E(R))^3}{(\mathrm{var}(R))^{3/2}}$$

The skewness of the standard normal distribution is 0.

Kurtosis can be computed as follows:

$$K(R) = \frac{1}{T-1}\,\frac{\sum_{t=1}^{T} (R_t - E(R))^4}{(\mathrm{var}(R))^{2}}$$

The standard normal distribution has a kurtosis of 3.
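
A direct transcription of the two formulas into Python may help when checking them by hand. This is a sketch that follows the slide's normalization (1/(T − 1) and the sample variance); library implementations often use slightly different small-sample conventions, so their results can differ. The helper names are hypothetical.

```python
def _standardized_moment(returns, k):
    """(1 / (T - 1)) * sum((R_t - E(R))**k) / var(R)**(k / 2)."""
    T = len(returns)
    mean = sum(returns) / T
    var = sum((x - mean) ** 2 for x in returns) / (T - 1)
    return sum((x - mean) ** k for x in returns) / ((T - 1) * var ** (k / 2))


def skewness(returns):
    """Third standardized moment, S(R)."""
    return _standardized_moment(returns, 3)


def kurtosis(returns):
    """Fourth standardized moment, K(R) (not excess kurtosis)."""
    return _standardized_moment(returns, 4)


# A small symmetric sample: deviations cancel in the cubed sum.
sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(skewness(sample), kurtosis(sample))
```

A symmetric sample like the one above has zero skewness; its kurtosis is well below 3 because the sample has no extreme observations.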



Autocorrelation

another metric useful for describing distributions of returns is autocorrelation,
it is a measure of serial dependence between subsequent returns sampled at a specific frequency,
the autocorrelation of order p can be computed as:

$$\rho(p) = \frac{\sum_{t=p+1}^{T} (R_t - E(R))(R_{t-p} - E(R))}{\sqrt{\sum_{t=p+1}^{T} (R_t - E(R))^2}\,\sqrt{\sum_{t=p+1}^{T} (R_{t-p} - E(R))^2}}$$

the equation above uses simple returns to compute autocorrelation, but returns of any type can be used instead.
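
The autocorrelation of order p can be sketched in Python as follows. It is an illustration under the assumption that E(R) is estimated by the mean of the full sample and that the denominator contains the squared deviations (the usual correlation normalization); the function name and the test series are hypothetical.

```python
import math


def autocorrelation(returns, p):
    """Order-p autocorrelation of a return series, sums running over t = p+1..T."""
    T = len(returns)
    if not 0 < p < T:
        raise ValueError("p must satisfy 0 < p < len(returns)")
    mean = sum(returns) / T
    current = [x - mean for x in returns[p:]]      # R_t - E(R)
    lagged = [x - mean for x in returns[:T - p]]   # R_{t-p} - E(R)
    numerator = sum(a * b for a, b in zip(current, lagged))
    denominator = (math.sqrt(sum(a * a for a in current))
                   * math.sqrt(sum(b * b for b in lagged)))
    return numerator / denominator


# A perfectly alternating series: every move is followed by a reversal.
alternating = [0.01, -0.01] * 10
print(autocorrelation(alternating, 1))  # strongly negative (reversal)
print(autocorrelation(alternating, 2))  # strongly positive
```

A constant series makes the denominator zero, so real code would need to guard against that case.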



Autocorrelation in trading

autocorrelation allows us to check whether there are any persistent momentum/reversal relationships in the data that we could trade upon,
for example, it is a well-known stylized fact that a large swing, or momentum, in the price of a financial security is typically followed by a reversal,
using autocorrelation at different frequencies, we can establish whether such patterns persist and whether we can trade upon them.



High-frequency data



What is high-frequency data?

20 years ago it was daily data: large data sets consisted of 1000s of stocks over 20–30 years (5–10 million data rows),
now it is tick-by-tick or transaction-level data on prices, quotes, volume and the order book: large data sets consist of 1000s of stocks over 10–15 years (billions of data rows).



Tick data

The highest-frequency data is a collection of sequential “ticks”: arrivals of the latest quote, trade, price, and volume information. Tick data usually has the following properties:
a timestamp,
a financial security identification code,
an indicator of what information it carries:
bid price,
ask price,
available bid volume,
available ask volume,
last trade price,
last trade size,
option-specific data, such as implied volatility.



A timestamp

a timestamp records the date and time at which the quote originated,
it may be the time at which the exchange or the broker-dealer released the quote, or the time at which the trading system received it,
the quote travel time from the exchange or the broker-dealer to the trading system can be as short as 20 milliseconds,
therefore all sophisticated systems include milliseconds in their timestamps.



Bid and ask prices and volumes

the bid quote/price is the highest price that buyers are currently willing to pay, i.e. the best price at which the security can be sold immediately,
the ask quote/price is the lowest price that sellers are currently willing to accept, i.e. the best price at which the security can be bought at any particular time,
available bid and ask volumes indicate the total demand and supply, respectively, at the bid and ask prices.



Bid-ask spread

the bid-ask spread is the difference between the ask quote and the bid quote at any given time,
the bid-ask spread is the cost of instantaneously buying and then selling the security,
the higher the bid-ask spread, the higher the gain the security must produce in order to cover the spread along with other transaction costs,
most low-frequency price changes are large enough to make the bid-ask spread negligible in comparison,
in tick data, incremental price changes can be comparable to or smaller than the bid-ask spread.
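
The spread arithmetic can be illustrated with a short Python sketch. The quote values and helper names below are hypothetical, and the relative spread (spread divided by the mid price) is an extra convenience measure not defined on the slide.

```python
def quoted_spread(bid, ask):
    """Ask minus bid; a negative value would indicate a crossed (invalid) quote."""
    if ask < bid:
        raise ValueError("crossed quote: ask is below bid")
    return ask - bid


def mid_price(bid, ask):
    """Midpoint between the best bid and the best ask."""
    return (bid + ask) / 2.0


# Hypothetical best quotes, for illustration only.
bid, ask = 99.98, 100.02

spread = quoted_spread(bid, ask)
relative_spread = spread / mid_price(bid, ask)

# Buying at the ask and immediately selling at the bid loses the spread
# per unit traded, before any other transaction costs.
round_trip_pnl = bid - ask

print(spread, relative_spread, round_trip_pnl)
```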



Sample bid and ask quotes



Histograms of bid and ask spreads



Last trade price and size

the last trade price shows the price at which the last trade in the
security cleared,
last trade price can differ from the bid and ask,
the differences can arise when a customer posts a favorable limit
order that is immediately matched by the broker without broadcasting
the customer’s quote,
last trade size shows the actual size of the last executed trade.



Empirical characteristics of high-frequency data


High-frequency data have some unique characteristics:
large volumes,
discrete-valued prices,
existence of a daily periodic (diurnal) pattern.



Large volumes

Reuters transmitted more than 275,000 prices per day for the foreign exchange spot market,
the median stock in the Russell 3000 produces approximately 2,100 ticks per trading day,
the number of observations in a single day of tick-by-tick data can be equivalent to 30 years of daily observations,
the quality of the data does not always match its quantity,
as a consequence, there is considerable potential for errors in the data; the “bad ticks” need to be cleaned or filtered out prior to further analysis.
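
What counts as a “bad tick” depends on the data and the instrument, and the lecture does not prescribe a filter. Purely as an illustration, the Python sketch below drops non-positive prices and prices that deviate from the median of the last few accepted prices by more than a threshold; the function name, window length and 5% threshold are all assumptions.

```python
import statistics


def filter_bad_ticks(prices, window=5, max_rel_dev=0.05):
    """Keep prices that are positive and close to the recent median.

    window and max_rel_dev are illustrative choices, not recommended values.
    """
    clean = []
    for price in prices:
        if price <= 0:
            continue  # a zero or negative price is treated as an error
        recent = clean[-window:]
        if recent:
            reference = statistics.median(recent)
            if abs(price - reference) / reference > max_rel_dev:
                continue  # too far from recently accepted prices
        clean.append(price)
    return clean


# Hypothetical tick prices with a misplaced decimal point and a zero price.
ticks = [100.0, 100.1, 100.05, 10.01, 100.1, 0.0, 100.15]
print(filter_bad_ticks(ticks))
```

A rule this simple would also reject a genuine price jump and every tick after it, so any real filter has to be validated against the specific data set.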



Discrete price changes

changes in transaction prices are discrete and can only take values from a fixed set,
this is because exchanges impose rules that restrict price changes in order to retain stability and functionality,
the smallest allowable price change is called a tick,
price changes must be multiples of the tick,
on some exchanges there are also other limits on intraday price changes,
the discreteness leads to a high degree of kurtosis.
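
Expressing price changes as whole numbers of ticks can be sketched as below. The prices and the 0.01 tick size are hypothetical; prices are passed as strings and handled with `Decimal` so that values such as 0.01 are represented exactly rather than as binary floating-point approximations.

```python
from collections import Counter
from decimal import Decimal


def changes_in_ticks(prices, tick_size):
    """Consecutive price changes as integer multiples of the tick size.

    prices and tick_size should be strings (or Decimals) to stay exact.
    """
    tick = Decimal(tick_size)
    changes = []
    for previous, current in zip(prices, prices[1:]):
        n_ticks = (Decimal(current) - Decimal(previous)) / tick
        if n_ticks != n_ticks.to_integral_value():
            raise ValueError(f"change {previous} -> {current} is off the tick grid")
        changes.append(int(n_ticks))
    return changes


# Hypothetical transaction prices with a tick size of 0.01.
prices = ["10.00", "10.01", "10.01", "9.99"]
changes = changes_in_ticks(prices, "0.01")

# Frequency table of price changes measured in ticks.
table = Counter(changes)
print(changes, dict(table))
```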



Histograms of price changes (in ticks)



Table of price changes (in ticks)

63 trading days and 60,328 transactions,
price changes between trading days were ignored,
the distribution of positive and negative price changes was approximately symmetric, with most changes equal to zero or one tick.



Empirical summary statistics for log returns

Kurtosis increases with the frequency of the data.


Periodic patterns

high-frequency data often contain strong periodic/diurnal patterns (seasonality),
transaction volume and volatility follow a so-called U-curve pattern: they are markedly higher just after the open and shortly before the close,
in the foreign exchange markets, where there is no open and close, volume and volatility are systematically high in the active periods of the day, when the trading hours of global markets overlap.
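
An intraday pattern of this kind is usually made visible by summing volume within fixed time buckets. The Python sketch below does this by hour for a handful of invented trades; the function name and the (timestamp, volume) input format are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def volume_by_hour(trades):
    """Total traded volume per hour of day from (timestamp, volume) pairs."""
    totals = defaultdict(int)
    for timestamp, volume in trades:
        totals[timestamp.hour] += volume
    return dict(sorted(totals.items()))


# Hypothetical trades: busy near the open and the close, quiet around midday.
trades = [
    (datetime(2019, 11, 20, 9, 5), 500),
    (datetime(2019, 11, 20, 9, 40), 300),
    (datetime(2019, 11, 20, 12, 15), 100),
    (datetime(2019, 11, 20, 16, 50), 450),
]

profile = volume_by_hour(trades)
print(profile)
```

With real data, the profile would be averaged over many trading days before looking for the U-shape.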



U-shape for volume

a similar U-shape can also be found in volatility, returns and the bid-ask spread,
in most cases, the minimum is observed between 11:30 am and 1:30 pm.



U-shape reasons

The reason for the U-shaped curve is simple to explain:
by the beginning of a trading day, all the information that arrived overnight has been absorbed by the market,
market participants have analyzed the information and decided on their trading strategy for the coming day,
at the open they adjust their positions and simultaneously submit orders to open new positions or close old ones,
that is why more transactions take place and drive volatility up,
during the rest of the day they prefer to “wait and see”, especially during the lunch break,
just before the close, market participants make their final adjustments according to the information they received during the trading day, which pushes market volatility to another high.



How the global FX market works

over-the-counter (OTC) or off-exchange trading means trading financial instruments directly between two parties,
due to its OTC nature, the FX market has no central marketplace for currency transactions and therefore no single quote,
there are three main trading centers: London, New York and Tokyo,
currency trading happens continuously throughout the day around the world,
as the trading session in Asia ends, the European session begins, followed by the North American session and finally the Asian session again,
strong seasonality can be found in the OTC market, although not the typical U-shaped curve.



Global FX market – analysis difficulties

analyzing the global FX market is a difficult task,
apart from being in different time zones, different countries also have different public holidays,
another problem is daylight saving time: many countries, including most Asian countries, do not use daylight saving time at all,
these differences must be taken into account in cross-market comparisons.



Seasonalities in FX global market

empirical studies confirm the existence of significant seasonality in the global FX market,
for USD/EUR, for example, the main daily maximum of volatility occurs between 14:00 GMT and 16:00 GMT, when both the European and the American markets are active,
the main daily minimum is between 3:00 GMT and 4:00 GMT, when both the European and American markets are closed and the Asian markets are on their lunch break,
other currencies show very similar periodic patterns,
market activity is high when the relevant markets are open and actively traded, especially when the active periods of two or more relevant markets overlap.



Volume in FX market

late afternoon in the EU is an active time for both the EU and US markets,
the Euro/Dollar exchange rate often exhibits its highest volatility during this period.



Intra-week pattern

intraweek high-frequency data also show significant seasonality,
almost all exchanges are closed on weekends and holidays, so no trading takes place then,
the level of activity differs considerably across weekdays,
a day-of-the-week effect can be observed across different exchanges: in general there is a minimum of activity on Monday and a maximum of activity on the last two working days of the week,
more precisely, market activity increases gradually from Monday to Friday,
similar periodic patterns can also be found in returns, volatility and spreads.



Intra-week volume – W-shaped curve



High-frequency data sources



Commercial sources for historical high-frequency data

NYSE Trade and Quotes (TAQ),
Bloomberg (American equities) – Rblpapi – R interface,
Olsen & Associates (FX),
Interactive Brokers – IBrokers – R interface.



Commercial sources for historical high-frequency data

tickdata
https://www.tickdata.com/historical-market-data-products/
quantquote
https://quantquote.com/
Πtrading.com
http://pitrading.com/historical-data.html
kibot
http://www.kibot.com/

https://quantquote.com/products_tick-data.php#sample – sample data

http://www.kibot.com/Buy.aspx#free_historical_data/ – free samples


Sample FREE sources for historical high-frequency (intraday) data

www.dukascopy.com (all major currency pairs, some commodities, indices and many stocks),
https://www.alphavantage.co/ (1-minute data for the past 10–15 trading days),
bossa.pl (Polish stocks, bonds and futures),
https://www.truefx.com/ (FX market).

https://www.dukascopy.com/swiss/english/marketwatch/historical/

http://bossa.pl/notowania/pliki/intraday/

http://bossa.pl/notowania/pliki/intraday/metastock/


Thank you for your attention!
