by
Neel Hajare
S.B., Massachusetts Institute of Technology (2013)
Submitted to the Department of Electrical Engineering and Computer
Science
in partial fulfillment of the requirements for the degree of
Master of Engineering in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
September 2018
Copyright 2018 Neel Hajare. All rights reserved.
The author hereby grants to MIT permission to reproduce and to
distribute publicly paper and electronic copies of this thesis document in
whole or in part in any medium now known or hereafter created.
Author
Department of Electrical Engineering and Computer Science
September 4, 2018
Certified by
Prof. Haoxiang Zhu, Thesis Supervisor
September 4, 2018
Accepted by
Katrina LaCurts, Chair, Masters of Engineering Thesis Committee
Pricing and Arbitrage in Cryptocurrency Markets
by
Neel Hajare
Abstract
Acknowledgments
I would like to thank the many people without whose help this thesis would not have been
possible.
First, I would like to express my sincerest gratitude to my thesis supervisor, Professor Haoxiang
Zhu, for advising me throughout this work. I truly appreciate his willingness to let me
explore and run with my ideas and his incredible patience and understanding as this thesis
came to fruition. His feedback and guidance were critical, and his encouragement pushed
I am incredibly thankful for all the guidance and assistance I have received from Anne
Hunter throughout my MIT career. I would not have been able to complete either of
my degrees without her help. She has truly been instrumental in helping me realize my
dreams.
This thesis could not have been completed without the continual love and support from
those closest to me throughout this process. I will be forever grateful to Lisl Esherick,
Zach Zappala, Joseph Ong, and Belinda Gu for being there for me as I worked towards
this goal.
Key contributions from Jimmy Myatt and Colin McSwiggen were pivotal in getting me
unstuck and allowing me to continue progressing. Kind words of advice from Ilica Maha-
jan and Ryder Moody offered comfort at moments when I felt most discouraged.
Finally, I would not be where I am today without my family. My parents and my sister
have provided so much for me throughout my life, and this thesis is a testament to their
Contents
1 Introduction
1.1 Motivation
1.2 Contributions
1.3 Outline
2 Background
2.1 Cryptocurrencies
2.2.1 Exchanges
2.2.2 Regulation
2.2.3 Arbitrage
2.3.1 FIX
2.3.3 WebSocket
3 Data Collection
3.1.1 Currency pairs
3.1.2 Exchanges
3.2.1 GDAX
3.2.2 Bitfinex
3.2.3 Bitstamp
3.5.1 GDAX
3.5.2 Bitfinex
3.5.3 Bitstamp
5 Arbitrage
5.6 Predicting arbitrage opportunities based on trading volume
6.5 Comparing arbitrage opportunities with deposit and withdrawal friction
7 Conclusion
List of Figures
3-20 Final Bitstamp ETH Update Time Deltas
4-12 Bitstamp BCH Bid/Ask Spread
List of Tables
4.9 Bitfinex BCH Bid/Ask Spread Data
5.4 Regression results using 5-second intervals, indifferent to arbitrage direction
5.17 Regression results for arbitrage window based on trading volume for BTC
5.18 Regression results for arbitrage window based on trading volume for ETH
5.19 Regression results for arbitrage window based on trading volume for LTC
5.20 Regression results for arbitrage window based on trading volume for BCH
List of Listings
3.4 JavaScript code for reading from GDAX ticker update stream
3.5 Specification for request and response for Bitfinex ticker update subscription
3.7 JavaScript code for reading from Bitfinex ticker update stream
3.8 JavaScript code for reading from Bitstamp ticker update stream
3.9 Specification for request and response for Bitfinex ticker update subscription
3.11 JavaScript code for reading from Bitfinex trades update stream
3.16 JavaScript code for reading from GDAX level2 update stream
3.17 Specification for request and response for Bitfinex order books update subscription
3.19 JavaScript code for reading from Bitfinex order books update stream
3.21 JavaScript code for reading from Bitstamp ticker update stream
A.1 JavaScript code for reading from GDAX ticker update stream with logging
A.2 Final JavaScript code for reading from GDAX ticker update stream
A.3 JavaScript code for reading from Bitfinex ticker update stream with logging
A.4 Final JavaScript code for reading from Bitfinex ticker update stream
A.6 Updated Python code to process Bitfinex Order Book updates
A.7 Python code to pre-process time deltas between Bitfinex Order Book updates
Chapter 1
Introduction
During 2017, cryptocurrencies moved from the technology fringe to the mainstream.
Near the start of the year, Bitcoin, the oldest and most well known cryptocurrency, sur-
passed its previous all-time high price of $1000 (last reached in 2013) and continued to rise
throughout the year to nearly $20,000 in December [15]. This price implied a total market
cap of over $300,000,000,000 [26]. Traditional television news outlets began covering price
movements daily and included segments explaining Bitcoin and other cryptocurrencies to
viewers [34]. During the course of the year, Coinbase, a leading cryptocurrency broker in
the United States, grew to have more customers than Charles Schwab [12]. Wall Street
financial firms went from publicly deriding cryptocurrencies to scrambling to create new
desks to trade them [61]. Bitcoin futures began trading on the Chicago Mercantile Ex-
change [11].
1.1 Motivation
Much has been published about the theoretical underpinnings of cryptocurrencies, their
though, as of the start of this work, relatively little has been said about their pricing and
the functioning of crypto-to-fiat markets [47, 19, 55, 54]. Cryptocurrencies have seen dif-
ferent prices across exchanges in the past, but it might have been expected that through
2017, as more infrastructure was developed, exchanges became more established, and ma-
jor financial institutions entered the space, prices would converge. At the end of 2017,
however, there was even more media coverage than ever about price disparities across ex-
The price premium for Bitcoin in certain countries garnered a lot of attention. In India,
Zimbabwe, and South Korea, premiums reached more than 20% over prices in the United
States [16, 8, 67]. In those cases, it seemed that the premiums were caused by local mone-
Conventional wisdom might suggest that such price differences would not exist on US Dol-
lar denominated exchanges that have fully automated trading. While exchanges make
prices publicly available, there is no freely available dataset for fine-grained prices across
exchanges to investigate this, nor is there any publicly available tool to collect this data.
Beyond rising popularity and rapidly developing infrastructure, another theme for cryp-
tocurrencies in 2017 was the rise of “altcoins,” cryptocurrencies other than Bitcoin. Even
though other cryptocurrencies have existed for years, at the start of 2017, Bitcoin com-
prised over 87% of the total value of all cryptocurrencies; by the end of the year, though,
it was only 37% [36]. One of the motivations driving adoption of altcoins was concern
about the scalability of the Bitcoin technology and network. Indeed, this concern became
evident in December of 2017 as the average Bitcoin transaction fees topped $28 and some
transactions took days to complete [39, 56]. Core selling points of many of the alterna-
tive cryptocurrencies are technological differences from Bitcoin that promise faster and/or
cheaper transactions. Indeed, these other cryptocurrencies do, in practice, have faster and
cheaper transactions and these properties make them more suitable as media of exchange
than Bitcoin. For example, coffee shops that have stopped taking Bitcoin due to transac-
tion friction are readily accepting other cryptocurrency alternatives. However, there is no
information as to how these properties affect the exchange prices and markets for these
cryptocurrencies.
To the best of our knowledge, the only prior paper that studies arbitrage in cryptocur-
rency markets is [45]. Though they are also interested in arbitrage opportunities across
cryptocurrency exchanges, our approaches differ. At the highest level, we are primarily
interested in the differences between cryptocurrencies and how they differ in trading pat-
terns and arbitrageability, and we make decisions throughout this work with that intent in
mind. Comparatively, they take a broader view of the markets they survey and examine
how many factors, such as foreign currency capital controls, play a role in arbitrage oppor-
Their work examines a larger number of exchanges with more variety in trading environ-
ments over a longer period of time. The cryptocurrencies they look at, too, vary on several
dimensions. Their dataset is obtained from a third-party firm and it comes with second-
level resolution for timestamps, apparently provided by each exchange independently. The
arbitrage index they use is computed on a per-minute basis, comparing the highest price
across all exchanges in a minute with the lowest price available during that minute. The
order book data they use is also snapshotted once per minute. From this, they aggregate
to multi-minute or even day-long timescales and examine patterns at these frequencies.
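The per-minute comparison described above can be illustrated with a short sketch. It assumes the index is the ratio of the highest to the lowest price observed across all exchanges within a calendar minute, which is only one reading of the description; the function name `arbitrageIndexByMinute` and the input shape are hypothetical and are not taken from [45].

```javascript
// Hypothetical input: an array of { exchange, timestampMs, price } observations.
// Returns a Map from minute start (ms since epoch) to (max price / min price).
function arbitrageIndexByMinute(observations) {
  const MINUTE_MS = 60 * 1000;
  const extremes = new Map(); // minute start -> { min, max }

  for (const { timestampMs, price } of observations) {
    const minute = Math.floor(timestampMs / MINUTE_MS) * MINUTE_MS;
    const current = extremes.get(minute);
    if (current === undefined) {
      extremes.set(minute, { min: price, max: price });
    } else {
      current.min = Math.min(current.min, price);
      current.max = Math.max(current.max, price);
    }
  }

  const index = new Map();
  for (const [minute, { min, max }] of extremes) {
    index.set(minute, max / min);
  }
  return index;
}
```

Under this definition, an index of 1 means every observed price in that minute was identical, and larger values indicate a wider gap between the most expensive and cheapest venue.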
Because of our primary interest in differences between cryptocurrencies, we limit our fo-
cus to markets we presume to be the most likely to be free of exogenous barriers to ar-
bitrage. We include only cryptocurrencies that are open, well-understood, and are not
largely controlled by a single entity with significant market power. We examine only the
biggest and most liquid exchanges for US-dollar-denominated markets and limit our scope
to exchanges that both allow US Dollar deposits from US-based investors and provide for
because of the facilities for automated trading on these exchanges, we aim to collect data
do this, we seek to design and implement an original system for collecting and aggregating
this data directly from the exchanges. We can then be confident in our ability to exam-
ine patterns in market activity for time intervals on the order of seconds. Furthermore,
throughout our work, we account for the presence of exchange fees and their role as a bar-
rier to arbitrage.
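To make the role of fees concrete, the following sketch checks whether buying at one exchange's best ask and selling at another exchange's best bid is profitable once fees are paid. The function names and the fee model (a single proportional fee charged on each leg) are illustrative assumptions, not the exact procedure used later in this work.

```javascript
// buyAsk:  best ask on the exchange where we buy (USD per coin).
// sellBid: best bid on the exchange where we sell (USD per coin).
// buyFee, sellFee: proportional fees, e.g. 0.003 for 0.30% (assumed fee model).
// Returns the profit per coin in USD after fees.
function netArbitrageProfit(buyAsk, sellBid, buyFee, sellFee) {
  const cost = buyAsk * (1 + buyFee);       // amount paid, including the fee
  const proceeds = sellBid * (1 - sellFee); // amount received, after the fee
  return proceeds - cost;
}

// An opportunity exists only if the profit after fees is strictly positive.
function hasArbitrage(buyAsk, sellBid, buyFee, sellFee) {
  return netArbitrageProfit(buyAsk, sellBid, buyFee, sellFee) > 0;
}
```

With a 0.30% fee on both legs, buying at 100 and selling at 100.50 loses money, while selling at 101 is profitable. A price gap therefore has to exceed roughly the sum of the two fees before it counts as an opportunity.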
1.2 Contributions
In this work we present the design and implementation of a system for collecting real-time
pricing data from multiple cryptocurrency exchanges, techniques for offline processing of
these raw data streams into a normalized time-series format suitable for analysis, and fi-
nally, the results of statistical analysis of pricing and trading activity of various cryptocur-
We detail our implementation, which uses WebSocket APIs to stream trade and order book
update data from three of the largest cryptocurrency exchanges (Bitfinex, GDAX¹, and
Bitstamp) for the US Dollar denominated markets for four of the
that, at peak periods, we may see as many as hundreds of updates per second for a sin-
gle trading pair on an exchange. We observe that latency varies by exchange from tens of
milliseconds to over a second.
¹ GDAX has since been renamed to Coinbase Pro, but since it was still called GDAX when the data
was collected, we will refer to it as such.
We detail the development of processing techniques for transforming these raw exchange
data streams into a format suitable for determining trades during arbitrary intervals and
best bid/ask price points at arbitrary times for each exchange. We document the behavior
observed from the exchange APIs and the heuristics created for compensating for invalid
Using our processed data, we show that average daily trading volume varies from less than
$6.5 million to over $250 million depending on the exchange and cryptocurrency examined.
For a given cryptocurrency on a given exchange, we show that bid/ask spreads vary from
Looking across exchanges, we show that arbitrage opportunities, net of the highest possible
exchange fees, exist in 0.35% to 40.38% of 5-second intervals depending on the cryptocur-
rency and exchanges considered. When we exclude Bitfinex, we find that this range nar-
rows to 0.35% to 1.03%. Moreover, we show that cryptocurrencies with higher market caps
and trading volume exhibit lower frequencies of arbitrage opportunities and that these op-
portunities, when they do arise, last for shorter periods of time. We further explore several
approaches for finding relationships between trading volume and arbitrage opportunities
across exchanges. We find that trading behavior across exchanges differs greatly during
these periods, but, using logistic regression techniques, we find only weak evidence link-
second intervals.
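The "share of 5-second intervals" statistic can be sketched as follows. This is a minimal illustration that assumes an interval counts as containing an opportunity if at least one sample inside it is flagged; the function name, the sample shape, and that counting rule are assumptions rather than the definition used in Chapter 5.

```javascript
// samples: array of { timestampMs, opportunity } where opportunity is a boolean.
// Returns the fraction of fixed-length intervals in [startMs, endMs) that
// contain at least one flagged sample.
function shareOfIntervalsWithOpportunity(samples, startMs, endMs, intervalMs = 5000) {
  const total = Math.ceil((endMs - startMs) / intervalMs);
  if (total <= 0) return 0;

  const flagged = new Set(); // indices of intervals with an opportunity
  for (const { timestampMs, opportunity } of samples) {
    if (!opportunity || timestampMs < startMs || timestampMs >= endMs) continue;
    flagged.add(Math.floor((timestampMs - startMs) / intervalMs));
  }
  return flagged.size / total;
}
```

Multiple flagged samples within one interval count once, so the result is a share of intervals rather than a share of observations.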
1.3 Outline
types of APIs offered by cryptocurrency exchanges. Chapter 3 provides details about data
collection and processing. Chapter 4 presents the aggregation of collected trading volume
and bid/ask data. Chapter 5 presents the analysis of the data for evidence of arbitrage
Chapter 2
Background
2.1 Cryptocurrencies
The first cryptocurrency, Bitcoin, was introduced in 2009 [47]. Since then, many more
variants have been introduced. While cryptocurrencies vary in specific properties and de-
tails, they share a core group of characteristics. Cryptocurrencies are software-defined cur-
rencies. Mechanisms for ownership, transactions, and varying money supply are all defined
in software. The state of the entire system (distribution of currency amongst owners, total
altered through distributed consensus protocols that progressively add more “blocks” to
the blockchain [66, 62]. Security (such as the prevention of arbitrary currency creation or
details and subtleties, but we only provide brief explanations of certain concepts that are
2.1.1 Transaction time
For a cryptocurrency transaction to occur, the state of the system must be updated to
task that is designed to require a certain amount of time in expectation (this amount of
time, however, varies between different cryptocurrencies). Even then, because of the dis-
that block in the blockchain (history of blocks), no record of the transaction will exist.
Disputes are resolved by selecting the longest chain. It is therefore customary to not trust
a transaction until a certain number of blocks have been created on top of the block con-
taining the transaction. As this number grows, the probability of the transaction being
removed from history becomes vanishingly small [48]. Therefore, minimum cryptocurrency
transaction times are governed by the “block time” and the number of additional block
confirmations required by the parties to trust the permanence of the transaction. Addi-
tional time beyond the minimum transaction time may be required if the candidate trans-
action is not immediately chosen for inclusion into the next block. This is a possibility
because the number of transactions that can be included in a block is finite; if more than
this number of transactions are broadcast for inclusion, some selection must occur.
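As a rough illustration of this relationship, the expected minimum wait can be approximated as the block time multiplied by the number of blocks needed: the block containing the transaction plus the additional confirmations. The function name is hypothetical, and the sketch assumes the transaction is included in the very next block.

```javascript
// blockTimeSeconds: expected time between blocks for the cryptocurrency.
// additionalConfirmations: blocks required on top of the one holding the transaction.
// Returns the expected minimum time, in seconds, before the transaction is trusted.
function expectedMinTransactionSeconds(blockTimeSeconds, additionalConfirmations) {
  if (blockTimeSeconds <= 0 || additionalConfirmations < 0) {
    throw new RangeError('block time must be positive and confirmations non-negative');
  }
  return blockTimeSeconds * (1 + additionalConfirmations);
}
```

For example, with a 10-minute block time and five additional confirmations, the expected minimum is 3600 seconds, or one hour. Any delay in being selected for a block adds to this figure.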
Since a transaction can only occur through inclusion in a block, and finding a block is
a transaction in a block. In the case that there are more candidate transactions than what
is possible to include in a block, transactions offering the highest fees are usually included
first since most miners are profit-maximizing and exhibit greedy behavior. It is therefore
generally possible to ensure that a transaction is included in the next block by offering a
sufficiently high fee. Since this is effectively an auction, the fee required depends on the
time.
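The greedy, highest-fee-first selection described above can be sketched as follows. This is a simplified model that assumes each transaction occupies one slot in a block with a fixed number of slots (real networks limit block size or similar resources instead); the function names and the transaction shape are hypothetical.

```javascript
// candidates: array of { id, fee } transactions waiting for inclusion.
// capacity: maximum number of transactions a block can hold (simplifying assumption).
// Returns the transactions a profit-maximizing miner would pick, highest fee first.
function selectTransactions(candidates, capacity) {
  return [...candidates]            // copy so the caller's array is not reordered
    .sort((a, b) => b.fee - a.fee)  // greedy: highest fee first
    .slice(0, Math.max(0, capacity));
}

// The lowest fee that still made it into the block. A new transaction would need
// to offer more than this to displace one of the chosen transactions.
function marginalFee(candidates, capacity) {
  if (capacity <= 0) return Infinity; // nothing can be included at any fee
  const chosen = selectTransactions(candidates, capacity);
  return chosen.length < capacity ? 0 : chosen[chosen.length - 1].fee;
}
```

When the block is not full the marginal fee is zero in this model, which matches the intuition that fees only rise once demand for block space exceeds supply.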
2.2.1 Exchanges
The cryptocurrency markets we explore in this work are exchanges. There are, of course,
other options for buying and selling cryptocurrencies. Some companies offer opaque price
quotes for currency exchange, and it is also possible to engage in one-off private sales as in
OTC markets. In any case, there is much more information offered by the exchanges, and
they most closely mirror the markets for other financial products we wish to compare to,
We consider exchanges that serve as matching platforms, allowing users to deposit both
cryptocurrencies and fiat currencies (such as US dollars) and place orders to trade cer-
tain currencies against one another. These exchanges hold users’ currencies in exchange
accounts and match buyers with sellers. They typically generate revenue by charging fees
for each match. The fee structure varies by exchange. Exchanges may also make money by
earning interest on fiat deposits or by charging other fees, such as deposit or withdrawal
fees that vary by payment method (for example, with credit card fees often being the high-
est).
30-Day Trading Volume Maker Fee Taker Fee
$0-$10,000,000 0.30% 0%
$10,000,000-$100,000,000 0.20% 0%
$100,000,000+ 0.10% 0%
30-Day Trading Volume Maker Fee Taker Fee
Cryptocurrency exchanges typically trade 24 hours per day, 7 days per week. Because
each exchange has its own order book, prices for currencies may vary across exchanges.
Exchanges are operated independently by private companies, and each exchange may
offer different trading pairs, fee structures, and order types. Market, limit, maker-only,
2.2.2 Regulation
Cryptocurrencies are relatively new, and the industries developing around them are even
newer. As the idea itself is so new, there are often situations in which it is unclear what, if
any, laws or regulations apply. The result, so far, has been a rapidly changing environment
that is quite different from more mature, stable, and regulated financial markets, such as
Many exchanges allow anyone to create an account, and there is no formalized regulated
process like that for opening a brokerage account. Similarly, on the other side, there is lit-
tle stopping someone from creating a new exchange. As one might expect, this area has
been rife with fraud and theft [63, 13, 51, 21, 17]. Exchanges have been hacked and de-
posits have vanished [46, 38]. Some have been found to be insolvent, creating a run on
what deposits were left [14, 25]. Others have been shut down by governments [9, 49, 52].
2.2.3 Arbitrage
from other financial markets. An arbitrageur would need to have accounts on multiple ex-
changes, and, in the cases we examine, US Dollar and cryptocurrency deposits across each
account. This exposes the arbitrageur to the risk of holding cryptocurrencies which are
very volatile and the risk of having uninsured exchange deposits. In executing trades, ar-
bitrageurs incur exchange fees based on the exchange and their 30-day trading volume.
They then would need to be able to move US Dollars and cryptocurrencies between their
accounts on the different exchanges. Moving US Dollars would require involving a bank
as an intermediary and would subject them to fees and processing time of both exchanges
as well as the bank. Transferring cryptocurrencies from one exchange to another can be done
directly, but it requires a variable and inconsistent amount of time and a variable fee
depending on the cryptocurrency and the state of the network when the transfer occurs.
There are various API formats that exchanges use to provide programmatic access to market
participants. We briefly survey the basics of the different formats and their pros and
2.3.1 FIX
FIX (Financial Information eXchange) is the industry-standard protocol for the dissem-
ination of market data and order placement in traditional financial markets [43]. It has
been in use since 1992 and is used across equities, fixed income, derivatives, and foreign
exchange markets.
The FIX protocol runs on top of persistent TCP connections and allows for data messages
to be initiated either by the client or by the server. This is appropriate and advantageous
in a financial market setting in that the client can make requests, such as asking for or-
der book updates for a trading pair or placing an order, and the server can respond with
data as it becomes available such as new orders added to the book or completion status
of a placed order. Since the default behavior is for the connection to be persistent, under
While adoption is widespread within the financial markets industry, the FIX protocol is
not used in other contexts, and most code making use of this protocol is proprietary and
not open source or freely available. This presents an additional challenge and obstacle to
using such an API to collect market data as a FIX client would need to be implemented
FIX APIs are offered by only a few cryptocurrency exchanges. They are
primarily offered by exchanges catering to institutional accounts, and some (for example,
HTTP is the application-layer protocol on top of which the world wide web is built.
HTTP is also generally run on top of TCP, but the connection is only maintained for the
duration of a single request from a client and the corresponding response from a server.
This is appropriate in the context of navigating to a web page in a browser, but is less
than ideal for maintaining a real-time stream of financial market data. A client can re-
quest the most recent market state from the server, but after the server responds, the con-
nection is closed. A new request for market state requires the setup of a new connection.
This repeated overhead cost adds latency and processing time. Furthermore, HTTP is
stateless. Each new connection could be routed to a different server, and that server will
not know which updates the client has already received. This can be mitigated, say, if up-
dates are assigned monotonically increasing identifiers and the client includes with each
request the last identifier it has received. This, too, however, requires extra bandwidth.
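The identifier-based mitigation can be sketched without any networking. In this minimal illustration, a plain function stands in for a stateless server and the client carries the only state, the last identifier it has seen; the names `handleRequest` and `PollingClient` are hypothetical.

```javascript
// Stateless "server": given the last id the client reports, return newer updates.
// Any server replica holding the same update log would give the same answer.
function handleRequest(allUpdates, lastSeenId) {
  return allUpdates.filter((update) => update.id > lastSeenId);
}

// Client: remembers the id of the last update it received and sends it each time.
class PollingClient {
  constructor() {
    this.lastSeenId = 0; // assumes ids start above 0 and increase monotonically
    this.received = [];
  }

  // fetchUpdates stands in for one HTTP request and response round trip.
  poll(fetchUpdates) {
    const updates = fetchUpdates(this.lastSeenId);
    for (const update of updates) {
      this.received.push(update);
      this.lastSeenId = Math.max(this.lastSeenId, update.id);
    }
    return updates.length;
  }
}
```

Each call to `poll` represents a full request, so the client still pays the connection setup cost every time and only learns of new updates when it asks.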
Given its usage for the web, HTTP is universally supported, and libraries supporting
HTTP are widely available in essentially any mainstream programming language. In terms
of adoption by cryptocurrency exchanges, many, if not all, offer REST APIs over HTTP.
2.3.3 WebSocket
The WebSocket protocol is much newer than FIX or HTTP. It was standardized in 2011.
It is also built on top of TCP and allows for full-duplex communication over a long-lasting
persistent connection. It was designed for low-overhead real-time data transfer for modern
web applications.
Though not specifically designed for financial market settings, the WebSocket protocol
tween the client and server, so there is less overhead compared to the repeated setup and
teardown required for each new request with HTTP. Furthermore, either the client or the
server can initiate data transmission, so the server can push new market updates to the
client as they become available rather than having to wait to respond to the next request
from the client. Connection persistence also allows for continuous transmission of messages
Being a modern web standard, the WebSocket protocol is also well-supported. Though not
as universal as HTTP, WebSocket libraries are both readily available and mature. As an
Because of the comparative advantages offered, we use WebSocket APIs exclusively for our
data collection.
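The push model can be sketched against the standard WebSocket interface (`addEventListener` and `send`). The helper name `attachStream` is hypothetical, and the sketch assumes messages arrive as JSON text; it accepts any object exposing that interface, so it makes no assumption about a particular exchange or library.

```javascript
// socket: any object with the standard WebSocket methods addEventListener and send.
// subscribeMessage: the subscription request to send once the connection opens.
// onUpdate: called for every message the server pushes, with a local receipt time.
function attachStream(socket, subscribeMessage, onUpdate) {
  socket.addEventListener('open', () => {
    // One subscription request; after this the server pushes updates unprompted.
    socket.send(JSON.stringify(subscribeMessage));
  });
  socket.addEventListener('message', (event) => {
    onUpdate({ receivedAt: new Date(), data: JSON.parse(event.data) });
  });
}
```

In contrast to polling, the client sends a single request and then only listens, which is what removes the per-update connection overhead.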
Chapter 3
Data Collection
There are thousands of cryptocurrencies and hundreds of exchanges, some of which offer as
many as hundreds of trading pairs. Therefore, to narrow our scope, we must choose which
From a market data perspective, we are interested in choosing currency pairs and ex-
changes that carry the highest daily trading volume in order to be able to study the most
active and important markets. We are also concerned with technical considerations; it is
important to select exchanges for which reliable continuous data collection is possible.
Ease of technical implementation is also a factor. Finally, the choices of currency pairs
and exchanges are not independent in that exchanges each only offer a limited set of cur-
3.1.1 Currency pairs
When considering currency pairs, we only consider US Dollar denominated trading pairs.
For most of the top exchanges by trading volume, the top trading pairs by volume have
US Dollars as the fiat currency. The exchanges for which this is not true are all based in
South Korea (with top trading pairs by volume denominated in South Korean Won) and
for data collection implementation. The position of the US Dollar as a widely-used reserve
currency is also an important factor. Trading pairs that involve two cryptocurrencies
rather than one fiat currency and one cryptocurrency are not included as such pairs would
have more confounding effects present in trading. For example, we assume that an arbi-
trageur is ultimately interested in a profit in fiat currency and that both having to hold
multiple cryptocurrencies and then having to make an additional trade to convert to fiat
An issue with choosing the top trading pairs by volume is that the ranking differs between
exchanges. At this point, since we want US Dollars as one currency, we consider the mar-
ket cap rankings of the various cryptocurrencies. The top six cryptocurrencies by market
cap at the start of this work are Bitcoin, Ripple, Ethereum, Bitcoin Cash, Cardano, and
company, Ripple Labs, Inc. which controls many of the nodes in the network and a signif-
icant amount of the cryptocurrency itself, so it is quite different from the other cryptocur-
rencies that are more distributed. We eliminate Cardano from consideration because of its
age; it was only released at the end of September 2017 and is therefore less likely to be as
well studied and understood by the market as more mature cryptocurrencies. For exam-
ple, it seems more likely for a debilitating bug or security flaw to exist in its codebase than
those of cryptocurrencies that have undergone more scrutiny [10, 24, 65]. We consider ex-
cluding Bitcoin Cash on similar grounds, but, because of its relationship to Bitcoin, we
include it. Bitcoin Cash is a “fork” of Bitcoin that includes the entire history of the Bitcoin
blockchain prior to its inception, and Bitcoin Cash includes very few changes in its
functionality and implementation. Bitcoin Cash was introduced with the express purpose
of having lower transaction times and fees than Bitcoin, and this makes it an interesting
3.1.2 Exchanges
The top three exchanges by trading volume offering US Dollar denominated trading pairs
at the beginning of this work are Binance, OKEx, and Bitfinex. However, as we begin
work on implementation for data collection for Binance, we find that Binance and OKEx
do not actually offer trading against US Dollars, but rather only offer trading against
Dollar reserves the company holds. However, at the time of exchange selection, it does
not appear to be possible to freely convert between US Dollars and USDT, and there are
widespread rumors that the company does not actually have the reserve funds to back
USDT as they claim to [40, 28, 3, 18, 41, 35]. Because of this risk, we exclude Binance and
The next three exchanges by US Dollar denominated trading volume are Kraken, GDAX,
and Bitstamp. Kraken does not have a WebSocket API available for data collection, but
Bitfinex, GDAX, and Bitstamp do. Given the technical advantages of a WebSocket API
compared to a REST API as offered by Kraken as its only option, we move forward with
Bitfinex, GDAX, and Bitstamp. Availability of WebSocket APIs provides for simpler im-
All three of these exchanges offer the chosen currency pairs: BTC/USD, ETH/USD, LTC/
Across these exchanges we expect external barriers to arbitrage to be low. All three ex-
changes accept US Dollar deposits through a variety of means, including bank wires.
Cryptocurrencies are movable across exchange accounts as well. Bitstamp has been op-
erating since 2011, and Bitfinex and the predecessor to GDAX (Coinbase) launched in
2012 [37, 32, 6]. The operator of GDAX (Coinbase, Inc.) is based in San Francisco, CA
in the US and is presumably subject to US regulations [33]. It is, however, more difficult
to find concrete information about Bitfinex and Bitstamp. While it appears that Bitfinex
is registered in the British Virgin Islands, there is no reliable information as to where their
money, employees, or infrastructure are located [57]. Bitstamp discloses that it has entities
in Luxembourg, the UK, and the US, though again, it is not clear where their people or
servers are [7]. Although Bitstamp does not list any Slovenian address, there are reports of its
presence there, and it appears to be actively hiring personnel for positions in Slovenia
[59, 1].
Each of the three exchanges has an interface providing a streaming ticker, so we use that
to collect pricing data for each of the currencies across the exchanges. Each exchange has
3.2.1 GDAX
For GDAX, the request and response specifications for ticker updates are as follows:
// Request
// Subscribe to BTC-USD and LTC-USD ticker updates
{
    "type": "subscribe",
    "product_ids": [
        "BTC-USD",
        "LTC-USD"
    ],
    "channels": [
        "ticker"
    ]
}
Listing 3.1: Specification for request for GDAX ticker update subscription
// Response
{
    "type": "subscriptions",
    "channels": [
        {
            "name": "ticker",
            "product_ids": [
                "BTC-USD",
                "LTC-USD"
            ]
        }
    ]
}
Listing 3.2: Specification for response for GDAX ticker update subscription
{
    "type": "ticker",
    "trade_id": 20153558,
    "sequence": 3262786978,
    "time": "2017-09-02T17:05:49.250000Z",
    "product_id": "BTC-USD",
    "price": "4388.01000000",
    "side": "buy", // Taker side
    "last_size": "0.03000000",
    "best_bid": "4388",
    "best_ask": "4388.01"
}
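A message of this shape can be turned into a record as sketched below. This is a minimal illustration, not the collection code used in this work: the function name `toTickerRecord` and the output field names are hypothetical. It keeps only messages of type `ticker` that carry a `time` field, converts the string-encoded numbers, and stores the collector's own receipt time next to the exchange's timestamp.

```javascript
// rawMessage: JSON text received from the feed.
// receivedAt: the collector's own clock reading, kept separately from the
//             exchange-supplied "time" field.
// Returns a normalized record, or null for any message that is not a ticker update.
function toTickerRecord(rawMessage, receivedAt = new Date()) {
  const msg = JSON.parse(rawMessage);
  if (msg.type !== 'ticker' || !msg.time) return null;

  const bestBid = Number(msg.best_bid);
  const bestAsk = Number(msg.best_ask);
  return {
    receivedAt,
    exchangeTime: msg.time, // left as the exchange's own string
    productId: msg.product_id,
    side: msg.side,         // taker side
    price: Number(msg.price),
    size: Number(msg.last_size),
    bestBid,
    bestAsk,
    spread: bestAsk - bestBid,
  };
}
```

Messages such as the subscription acknowledgement return `null` and can simply be skipped by the caller.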
As we can see, with GDAX, updates are received for each trade and each update con-
tains the time, price, side, and size of the trade along with the post-trade best bid and ask
prices. To read from this stream, we use the following JavaScript code:
Listing 3.4: JavaScript code for reading from GDAX ticker update stream
We record time separately because we do not know whether the clocks between the ex-
changes are synchronized, so the only known reference point is on the machine where the
data is being collected and recorded. We include the conditional specifying type and time
to ensure that we only record the updates of interest and ignore extraneous unexpected
3.2.2 Bitfinex
Subscribing to the Bitfinex WebSocket Ticker API feed for the BTC/USD currency pair
// request
{
    "event": "subscribe",
    "channel": "ticker",
    "pair": "BTCUSD"
}

// response
{
    "event": "subscribed",
    "channel": "ticker",
    "chanId": "<CHANNEL_ID>",
    "pair": "BTCUSD"
}
Listing 3.5: Specification for request and response for Bitfinex ticker update subscription
Once subscribed, data messages sent on the channel come in one of the following two
formats:
// snapshot
[
    "<CHANNEL_ID>",
    "<BID>",
    "<BID_SIZE>",
    "<ASK>",
    "<ASK_SIZE>",
    "<DAILY_CHANGE>",
    "<DAILY_CHANGE_PERC>",
    "<LAST_PRICE>",
    "<VOLUME>",
    "<HIGH>",
    "<LOW>"
]

// updates
[
    "<CHANNEL_ID>",
    "<BID>",
    "<BID_SIZE>",
    "<ASK>",
    "<ASK_SIZE>",
    "<DAILY_CHANGE>",
    "<DAILY_CHANGE_PERC>",
    "<LAST_PRICE>",
    "<VOLUME>",
    "<HIGH>",
    "<LOW>"
]
Note that there is no difference in the specification Bitfinex gives for the snapshot mes-
sage compared to the update message. It is not clear why any distinction is made between
Field              Type   Description
DAILY_CHANGE       float  Amount that the last price has changed since yesterday
DAILY_CHANGE_PERC  float  Amount that the price has changed expressed in percentage terms
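Because the ticker messages are positional arrays, a collector has to map positions to names itself. The following is a minimal Python sketch of that mapping, assuming the field order shown in the snapshot and update listings above. The names TICKER_FIELDS and parse_ticker are hypothetical; they are not part of the Bitfinex API or of the collection code used in this work, and the example values are made up.

```python
# Field order copied from the snapshot/update listing; names are illustrative.
TICKER_FIELDS = (
    "channel_id", "bid", "bid_size", "ask", "ask_size",
    "daily_change", "daily_change_perc", "last_price",
    "volume", "high", "low",
)


def parse_ticker(message):
    """Map a positional ticker message to a dict keyed by field name."""
    if len(message) != len(TICKER_FIELDS):
        # Anything that does not match the documented layout is rejected.
        raise ValueError(
            f"expected {len(TICKER_FIELDS)} fields, got {len(message)}"
        )
    return dict(zip(TICKER_FIELDS, message))


# Hypothetical message values, for illustration only.
example = [5, 6500.0, 1.2, 6500.1, 0.8, -12.5, -0.0019, 6500.1, 10234.5, 6600.0, 6400.0]
ticker = parse_ticker(example)
print(ticker["bid"], ticker["ask"])
```

Since the snapshot and update layouts are identical in the specification, one parser can serve both.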
// ...
const ws = bfx.ws(1);

ws.on('open', () => {
    ws.subscribeTicker('BTCUSD');
    ws.subscribeTicker('ETHUSD');
    ws.subscribeTicker('LTCUSD');
    ws.subscribeTicker('BCHUSD');
});

// ...
    const logItem = {
        date: new Date(),
        pair: pair,
        data: ticker
    };
    console.log(JSON.stringify(logItem));
});

ws.open();
Listing 3.7: JavaScript code for reading from Bitfinex ticker update stream
As we can see, messages from Bitfinex do not include any timestamp, so the only refer-
3.2.3 Bitstamp
live_trades_{currency_pair}
bcheur, bchbtc
EVENT trade
Field Description
// ...
        date: new Date(),
        pair: 'LTCUSD',
        data: data
    };
    console.log(JSON.stringify(logItem));
});
Listing 3.8: JavaScript code for reading from Bitstamp ticker update stream
GDAX and Bitstamp provide timestamps with each data point in the stream. In our
pipeline, we also record our own timestamp with each data point. For Bitfinex, since no
timestamp is provided, our own timestamp is the only reference we have. For making com-
parisons across exchanges, it is not clear which timestamps we should use. Using our own
recorded timestamps provides a guarantee of consistency of clock, though each data point
would include delays between the actual time of the trade on the exchange and the time
recorded, and these delays would vary by exchange. Alternatively, using timestamps as re-
ported by the exchanges would eliminate these delays, but there is no guarantee of clock
Our initial implementation using the code described above struggles to remain stable and
encounters frequent crashes. We describe the mitigation strategies used and their technical
After collecting multiple days' worth of data, we perform a summary analysis to gain
preliminary insight into the differences between the sources and confirm the validity
of the collected data. In this process we first look at the frequency of updates received and
We create histograms for each trading pair on each exchange using 100ms wide bins and
observe the distribution of time deltas in between consecutive updates received for each
data stream.
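The binning described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the analysis code used in this work: it assumes the recorded receive times for one data stream are available as a list of millisecond timestamps in arrival order, and the function name delta_histogram is hypothetical.

```python
from collections import Counter

BIN_WIDTH_MS = 100  # bin width used for the update time delta histograms


def delta_histogram(timestamps_ms, bin_width_ms=BIN_WIDTH_MS):
    """Count gaps between consecutive updates per bin.

    Key k covers deltas in [k * bin_width_ms, (k + 1) * bin_width_ms).
    """
    deltas = (
        later - earlier
        for earlier, later in zip(timestamps_ms, timestamps_ms[1:])
    )
    return Counter(int(delta // bin_width_ms) for delta in deltas)


# Hypothetical receive times in milliseconds.
received = [0, 40, 130, 390, 10_450]
hist = delta_histogram(received)
print(sorted(hist.items()))  # [(0, 2), (2, 1), (100, 1)]
```

In this toy input, the 10,060ms gap lands in bin 100, which is the kind of isolated long-delay bin discussed for the exchange histograms.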
We note that for each of the three GDAX data streams, the distributions look similar.
The fullest bin is the first (0-100ms), with a significant decline after that, though there
is an uptick between 10 and 13 seconds to a frequency not otherwise seen above 1.5 seconds.
We do not have a clear explanation for this phenomenon, but we suspect that it is
due to some timeout or retry logic that is activated after a period of 10 seconds.
Figure 3-1: Preliminary GDAX BTC Update Time Deltas
Figure 3-2: Preliminary GDAX ETH Update Time Deltas
Figure 3-3: Preliminary GDAX LTC Update Time Deltas
The histograms for the Bitfinex data streams are very different. Here we see large peaks
at each of 15, 30, 45, and 60 seconds and smaller peaks 0.5s or 0.6s on either side of these
large peaks with steady fall off on both sides. Given such regular intervals, one possible
reason is that the large peaks are due to Bitfinex intentionally aiming to provide updates
every 15 seconds and that the decreasing peaks at 30s, 45s, and 60s are caused by some of
the updates that are meant to be every 15s not arriving. The normal-like fall off on either
side of the peaks makes sense if we assume that their "errors" when attempting to send
data every 15 seconds are approximately normally distributed, but the smaller peaks 0.5-
Figure 3-4: Preliminary Bitfinex BTC Update Time Deltas
Figure 3-5: Preliminary Bitfinex ETH Update Time Deltas
Figure 3-6: Preliminary Bitfinex LTC Update Time Deltas
The Bitstamp data returns to a more expected pattern. Here we see the most samples in
the 100ms-200ms bin and the second-most in the 0-100ms bin. There is a general fall-off
thereafter with small spikes at 5s and 10s. We again suspect that the spikes at 5s and 10s
reflect thresholds and are likely caused by timeout or retry logic present in the system.
Observing slightly slower update times for Bitstamp compared to GDAX is expected given
that GDAX sees more trading volume (assuming trade sizes across the two exchanges
are comparable).
Figure 3-7: Preliminary Bitstamp BTC Update Time Deltas
Figure 3-8: Preliminary Bitstamp ETH Update Time Deltas
Figure 3-9: Preliminary Bitstamp LTC Update Time Deltas
Though the most striking resemblance when looking at the histograms is the consistency
on each exchange across the various trading pairs, we do also observe differences between
the trading pairs on each exchange and note that there does seem to be a difference in fre-
quency of trades.
The concerning realization from these observations, however, is that it appears that up-
dates from Bitfinex across all trading pairs are significantly less frequent than from the
other two exchanges. This is particularly surprising because Bitfinex is the exchange with
the highest trading volume, and a cursory qualitative examination of the user-facing trad-
ing interface suggests that many trades are happening per second.
Figure 3-10: Bitfinex Web Trading Interface
Furthermore, the distribution of update time deltas makes it appear as though they are
intentionally only providing updates every 15 seconds rather than providing a real-time
stream of trades as we require.
In order to not be limited by the apparent artificial throttling Bitfinex is placing on their
Ticker API, we explore other options. First we write code to instead read from Bitfinex’s
The Trades channel has the following subscription request and response specifications:
// request
{
    "event": "subscribe",
    "channel": "trades",
    "pair": "BTCUSD"
}
// response
{
    "event": "subscribed",
    "channel": "trades",
    "chanId": "<CHANNEL_ID>",
    "pair": "<PAIR>"
}
Listing 3.9: Specification for request and response for Bitfinex trades update subscription
// snapshot
[
    "<CHANNEL_ID>",
    [
        [
            "<SEQ> OR <ID>",
            "<TIMESTAMP>",
            "<PRICE>",
            "<AMOUNT>"
        ],
        [
            "..."
        ]
    ]
]
// updates
[
    "<CHANNEL_ID>",
    "te",
    "<SEQ>",
    "<TIMESTAMP>",
    "<PRICE>",
    "<AMOUNT>"
]

[
    "<CHANNEL_ID>",
    "tu",
    "<SEQ>",
    "<ID>",
    "<TIMESTAMP>",
    "<PRICE>",
    "<AMOUNT>"
]
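The three message shapes in this specification can be told apart by the second element of the array. The following Python sketch shows one way to normalize them into records; it is a hedged illustration based only on the layouts listed above, and the function name parse_trades_message and the record keys are hypothetical rather than part of the Bitfinex API or of our collection code.

```python
def parse_trades_message(message):
    """Return a list of trade records from a snapshot, "te", or "tu" message."""
    payload = message[1:]  # drop the leading CHANNEL_ID
    if not payload:
        return []
    if len(payload) == 1 and isinstance(payload[0], list):
        # Snapshot: [CHANNEL_ID, [[SEQ or ID, TIMESTAMP, PRICE, AMOUNT], ...]]
        return [
            {"kind": "snapshot", "seq_or_id": row[0], "timestamp": row[1],
             "price": row[2], "amount": row[3]}
            for row in payload[0]
        ]
    kind = payload[0]
    if kind == "te":
        seq, timestamp, price, amount = payload[1:]
        return [{"kind": "te", "seq": seq, "timestamp": timestamp,
                 "price": price, "amount": amount}]
    if kind == "tu":
        seq, trade_id, timestamp, price, amount = payload[1:]
        return [{"kind": "tu", "seq": seq, "id": trade_id,
                 "timestamp": timestamp, "price": price, "amount": amount}]
    return []  # message types not covered by the listing are ignored


# Hypothetical messages, for illustration only.
print(parse_trades_message([7, "te", "1234-BTCUSD", 1525392002, 9702.0, 0.1]))
```

Both "te" and "tu" messages carry a price and an amount, so a collector that records both kinds would presumably need to deduplicate on the sequence value before counting trades.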
Based on this spec, we adapt the Bitfinex data collection code as shown:
const BFX = require('bitfinex-api-node');

// ...
const ws = bfx.ws(1);

ws.on('open', () => {
    ws.subscribeTrades('BTCUSD');
    ws.subscribeTrades('ETHUSD');
    ws.subscribeTrades('LTCUSD');
    ws.subscribeTrades('BCHUSD');
});

// ...
ws.on('close', () => {
    console.error('close');
    setTimeout(() => {
        console.error('reconnecting');
        ws.open();
    }, 500);
});

// ...

ws.open();
Listing 3.11: JavaScript code for reading from Bitfinex trades update stream
We run this new code for Bitfinex alongside the existing code being used to collect data
After several days, we construct new histograms to compare the update time deltas of our
new dataset. The GDAX and Bitstamp datasets look similar to our previous observations,
but now the Bitfinex histograms look very different. We observe a more expected pattern
where we most frequently see samples in the 0-100ms range and a drop-off thereafter. It
is, if anything, more consistent with our expectations than the GDAX or Bitstamp
distributions in that we see a frequency almost monotonically decreasing with increasing
time intervals.
Figure 3-11: Final GDAX BTC Update Time Deltas
Figure 3-12: Final GDAX ETH Update Time Deltas
Figure 3-13: Final GDAX LTC Update Time Deltas
Figure 3-14: Final GDAX BCH Update Time Deltas
Figure 3-15: Final Bitfinex BTC Update Time Deltas
Figure 3-16: Final Bitfinex ETH Update Time Deltas
Figure 3-17: Final Bitfinex LTC Update Time Deltas
Figure 3-18: Final Bitfinex BCH Update Time Deltas
Figure 3-19: Final Bitstamp BTC Update Time Deltas
Figure 3-20: Final Bitstamp ETH Update Time Deltas
Figure 3-21: Final Bitstamp LTC Update Time Deltas
Figure 3-22: Final Bitstamp BCH Update Time Deltas
Given the constant stream of updates from all three exchanges across all of the trading
pairs, we are now confident in the integrity of the data streams for further use in our anal-
ysis.
Since the new Bitfinex feed includes timestamps for each trade with each update, it is now
possible to both record the time the trade actually took place as reported by the exchange
as well as the time the record of the trade entered our data recording pipeline. The
difference between these two timestamps is a conflated measure of both latency (the delay
between when the trade occurred and when we learn of it) and the clock skew between
the exchange's clock and ours. For each trading pair on each exchange, we construct a
histogram with 5ms wide bins to examine the distribution of reported vs recorded trade
times. Here, again, we observe pronounced differences between exchanges as well as minor
For GDAX, with each of the trading pairs, we observe generally low latency, often under
40ms, with a small smattering of samples spread out over a long tail. We observe differ-
ences between trading pairs in that for BTC and BCH, only roughly 1.5% of samples ex-
hibit delays of <5ms compared to over 5% for ETH and nearly 20% for LTC. It is hard to
say with any confidence what causes these differences, but one possibility is that GDAX
vertically shards its infrastructure by trading pair, and, due to more trading activity with
BTC and BCH, latencies are higher compared to their ETH and LTC markets.
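Threshold shares such as the under-5ms figures quoted above can be computed directly from paired timestamps. The Python sketch below is illustrative only: it assumes each record is a pair of millisecond timestamps (time reported by the exchange, time recorded by our pipeline), and the function names and sample values are hypothetical.

```python
def recorded_minus_reported(records):
    """Delta per record; it mixes network latency with clock skew."""
    return [recorded - reported for reported, recorded in records]


def share_below(deltas_ms, threshold_ms):
    """Fraction of samples whose delta is strictly under threshold_ms."""
    if not deltas_ms:
        return 0.0
    return sum(1 for delta in deltas_ms if delta < threshold_ms) / len(deltas_ms)


# Hypothetical (reported_ms, recorded_ms) pairs.
records = [(1000, 1003), (2000, 2012), (3000, 3038), (4000, 4150)]
deltas = recorded_minus_reported(records)
print(share_below(deltas, 5))   # 0.25
print(share_below(deltas, 40))  # 0.75
```

A negative delta is possible under this definition and would indicate clock skew rather than negative latency, which is why the two effects cannot be separated from this measure alone.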
Figure 3-24: GDAX ETH Reported vs Recorded Time Deltas
Figure 3-25: GDAX LTC Reported vs Recorded Time Deltas
Figure 3-26: GDAX BCH Reported vs Recorded Time Deltas
The distributions for Bitfinex look dramatically different. There are no samples for any
trading pair showing delays less than 60ms, and most samples exhibit delays between
100ms and 1100ms with a fairly uniform distribution within that range. We then observe
a rapid decline with a long tail extending past 2 seconds. In contrast to GDAX, there ap-
pears to be almost no difference between the different trading pairs on this exchange. The
only observable point of note is that there appear to be spikes on the LTC data stream
every 100ms along the long tail (1.5s, 1.6s, 1.7s, ...). There is no clear reason why trades
would be bunched along these boundaries, so once again we assume that this behavior is
Figure 3-27: Bitfinex BTC Reported vs Recorded Time Deltas
Figure 3-28: Bitfinex ETH Reported vs Recorded Time Deltas
Figure 3-29: Bitfinex LTC Reported vs Recorded Time Deltas
Figure 3-30: Bitfinex BCH Reported vs Recorded Time Deltas
The histograms for Bitstamp are very similar to those for Bitfinex, albeit slightly shifted.
We observe no samples with delays of less than 230ms, and most samples seem to fall be-
tween 250ms and 1250ms with a fairly uniform spread within that range. Once again,
there is a rapid decline followed by a long tail extending past 2s. In slight contrast, for the
ETH, LTC, and BCH data streams, we observe slight peaks between 340ms and 370ms.
Figure 3-31: Bitstamp BTC Reported vs Recorded Time Deltas
Figure 3-32: Bitstamp ETH Reported vs Recorded Time Deltas
Figure 3-33: Bitstamp LTC Reported vs Recorded Time Deltas
Figure 3-34: Bitstamp BCH Reported vs Recorded Time Deltas
In all, we observe significant differences across exchanges, and minor differences across
trading pairs. We presume a large factor in the differences across exchanges is physical
Northern Virginia within the Amazon Web Services us-east-1 region [30] which is also
where our infrastructure is located. This offers an explanation as to why, without any seri-
ous consideration given to network optimization, we are able to observe trades just tens
of milliseconds after they occur. Bitfinex and Bitstamp meanwhile are not US compa-
nies and, while there is no reliable information as to where their servers are located, we
presume that they likely do not have US-based infrastructure. Thus, receiving data from
them requires more network hops and more distance traveled leading to higher latencies.
While some difference in the distributions is likely attributable to network latency, the
particular shape of the Bitfinex and Bitstamp distributions arouses suspicion. In both cases,
we observe a relatively uniform distribution over a one-second-wide interval compared to
sharp drop-offs centered around a particular peak. We suspect that the response times
are being purposefully shaped by the exchanges. It is possible this is a security or techni-
cal consideration to smooth out traffic and avoid bursts of data inundating their servers
at once. Another possibility is that they intentionally introduce these delays in the data
streams for their general clientele and charge a premium to institutional customers willing
to pay for low latency access in order to better employ high frequency trading strategies.
This would be a variation on techniques for exchanges described in the financial literature
[50].
In any case, we must choose a method to use for our analysis going forward. Absent any
indication of reliable time coordination between the exchanges, it seems most prudent to
use timestamps as recorded by the data collection pipeline as our time of record. With
this method, despite the fact that we know the data we are considering from Bitfinex and
Bitstamp is "stale" compared to GDAX, we are more certain that the timestamps are
consistent throughout the dataset. Furthermore, using this method is most reflective of the
experience of a US-based market participant. Given the physical separation between the
exchanges, it would seem to be impossible for any actor to be able to act with zero or min-
In addition to collecting data about the trades executed on the exchanges, we are also in-
terested in pricing and how it varies over time across exchanges and trading pairs. Accord-
ingly, we seek to record data allowing us to determine the best bid and best ask price for
3.5.1 GDAX
Although the GDAX Ticker channel supplies data for the best bid and best ask along with
each trade, these data do not provide a full picture in that updates are only provided with
each executed trade rather than with each change to the order book. To access changes
to the best bid and best ask in between executed trades, we must subscribe to the level2
channel.
The request and response specifications for level2 updates are as follows:
// Request
// Subscribe to BTC-USD and LTC-USD level2 updates
{
    "type": "subscribe",
    "product_ids": [
        "BTC-USD",
        "LTC-USD"
    ],
    "channels": [
        "level2"
    ]
}
Listing 3.12: Specification for request for GDAX level2 update subscription
// Response
{
    "type": "subscriptions",
    "channels": [
        {
            "name": "level2",
            "product_ids": [
                "BTC-USD",
                "LTC-USD"
            ]
        }
    ]
}
Listing 3.13: Specification for response for GDAX level2 update subscription
{
    "type": "snapshot",
    "product_id": "BTC-USD",
    "bids": [["6500.11", "0.45054140"]],
    "asks": [["6500.15", "0.57753524"]]
}

{
    "type": "l2update",
    "product_id": "BTC-USD",
    "changes": [
        ["buy", "6500.09", "0.84702376"],
        ["sell", "6507.00", "1.88933140"],
        ["sell", "6505.54", "1.12386524"],
        ["sell", "6504.38", "0"]
    ]
}
// ...
    ['BTC-USD', 'ETH-USD', 'LTC-USD', 'BCH-USD'],
    'wss://ws-feed.gdax.com',
    null,
    { channels: ['level2'] }
);

// ...
websocket.on('close', () => {
    websocket.connect();
});
Listing 3.16: JavaScript code for reading from GDAX level2 update stream
Even though we collect this data, given the frequency of bid/ask updates available from
the ticker channel, we decide not to implement the data processing necessary
to reconstruct the best bid and best ask prices from the order book updates.
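For reference, such a reconstruction could be sketched as follows. This Python sketch was not part of the pipeline described here; it assumes the snapshot and l2update layouts shown in the listings above and assumes that a size of "0" in a change means the price level should be removed. The class name Level2Book is hypothetical.

```python
from decimal import Decimal


class Level2Book:
    """Price-level book for one product, rebuilt from level2 messages."""

    def __init__(self):
        self.bids = {}  # price -> size
        self.asks = {}  # price -> size

    def apply(self, message):
        if message["type"] == "snapshot":
            self.bids = {Decimal(p): Decimal(s) for p, s in message["bids"]}
            self.asks = {Decimal(p): Decimal(s) for p, s in message["asks"]}
        elif message["type"] == "l2update":
            for side, price, size in message["changes"]:
                levels = self.bids if side == "buy" else self.asks
                if Decimal(size) == 0:
                    # Assumed convention: zero size removes the level.
                    levels.pop(Decimal(price), None)
                else:
                    levels[Decimal(price)] = Decimal(size)

    def best_bid_ask(self):
        best_bid = max(self.bids) if self.bids else None
        best_ask = min(self.asks) if self.asks else None
        return best_bid, best_ask


book = Level2Book()
book.apply({"type": "snapshot", "product_id": "BTC-USD",
            "bids": [["6500.11", "0.45054140"]],
            "asks": [["6500.15", "0.57753524"]]})
book.apply({"type": "l2update", "product_id": "BTC-USD",
            "changes": [["buy", "6500.09", "0.84702376"],
                        ["sell", "6507.00", "1.88933140"],
                        ["sell", "6505.54", "1.12386524"],
                        ["sell", "6504.38", "0"]]})
print(book.best_bid_ask())
```

Prices are parsed as Decimal so that string prices such as "6507.00" compare exactly; one book instance would be needed per product.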
3.5.2 Bitfinex
To collect bid/ask data from Bitfinex, we need to read from the Order Books channel. The
interface for subscribing to this feed has the following request and response specifications:
// request
{
    "event": "subscribe",
    "channel": "book",
    "pair": "<PAIR>",
    "prec": "<PRECISION>",
    "freq": "<FREQUENCY>",
    "length": "<LENGTH>"
}
// response
{
    "event": "subscribed",
    "channel": "book",
    "chanId": "<CHANNEL_ID>",
    "pair": "<PAIR>",
    "prec": "<PRECISION>",
    "freq": "<FREQUENCY>",
    "len": "<LENGTH>"
}
Listing 3.17: Specification for request and response for Bitfinex order books update sub-
scription
Once subscribed, data messages sent on the channel come in one of the following two
formats:
// snapshot
[
    "<CHANNEL_ID>",
    [
        [
            "<PRICE>",
            "<COUNT>",
            "<AMOUNT>"
        ],
        ...
    ]
]
// updates
[
    "<CHANNEL_ID>",
    "<PRICE>",
    "<COUNT>",
    "<AMOUNT>"
]
P0 for precision and F0 for frequency. Length is left at the default 25 since we are only
const BFX = require('bitfinex-api-node');

// ...
const ws = bfx.ws(1);

ws.on('open', () => {
    ws.subscribeOrderBook('BTCUSD');
    ws.subscribeOrderBook('ETHUSD');
    ws.subscribeOrderBook('LTCUSD');
    ws.subscribeOrderBook('BCHUSD');
});

// ...
ws.on('close', () => {
    console.error('close');
    setTimeout(() => {
        console.error('reconnecting');
        ws.open();
    }, 500);
});

// ...

ws.open();
Listing 3.19: JavaScript code for reading from Bitfinex order books update stream
This code simply records the data as it is received. The format of the data provided by
Bitfinex is split into an initial snapshot and incremental updates thereafter; therefore, pro-
cessing has to be implemented and undertaken in order to reconstruct the best bid and
best ask prices at any given time. Examining the raw data stream alone would only show
updates to the order book at that time. This would provide the amount available at a spe-
cific price point at that instant in time, but without knowledge of the existing state of the
order book, it would be impossible to determine if that were the best price point on either
To reconstruct the state of the order book at each update time, the stream of data needs
ward: begin with an order book defined by the initial snapshot and record the best bid
and best ask at that time; then, apply each update and record the best bid and best ask
at that timestamp, continuing until all the data has been processed.
import math

# State shared across updates. These initial values are assumed here;
# they are not shown in the original listing.
bidsdict = {}  # price -> {'count', 'amount'} for the bid side
asksdict = {}  # price -> {'count', 'amount'} for the ask side
bestbid = {'price': -math.inf}
bestask = {'price': math.inf}
output = []    # one record per change in the best bid or best ask price


def process_update(update):
    global bestbid, bestask

    if update['data']['amount'] > 0:
        # A positive amount applies to the bid side.
        if update['data']['count'] == 0:
            # A count of zero removes the price level.
            del bidsdict[update['data']['price']]
        else:
            bidsdict[update['data']['price']] = {
                'count': update['data']['count'],
                'amount': update['data']['amount'],
            }
        if bidsdict:
            curbestbid = max(bidsdict)
            if curbestbid != bestbid['price']:
                bestbid = {
                    'price': curbestbid,
                    'amount': bidsdict[curbestbid]['amount'],
                    'count': bidsdict[curbestbid]['count'],
                }
                output.append({
                    'date': update['date'],
                    'pair': update['pair'],
                    'best_bid': bestbid,
                    'best_ask': bestask,
                })
        else:
            # No bids remain on the book.
            bestbid = {
                'price': -math.inf,
                'date': update['date'],
            }
    elif update['data']['amount'] < 0:
        # A negative amount applies to the ask side.
        if update['data']['count'] == 0:
            del asksdict[update['data']['price']]
        else:
            asksdict[update['data']['price']] = {
                'count': update['data']['count'],
                'amount': update['data']['amount'],
            }
        if asksdict:
            curbestask = min(asksdict)
            if curbestask != bestask['price']:
                bestask = {
                    'price': curbestask,
                    'amount': asksdict[curbestask]['amount'],
                    'count': asksdict[curbestask]['count'],
                }
                output.append({
                    'date': update['date'],
                    'pair': update['pair'],
                    'best_bid': bestbid,
                    'best_ask': bestask,
                })
        else:
            # No asks remain on the book.
            bestask = {
                'price': math.inf,
                'date': update['date'],
            }
In practice, however, there are additional challenges detailed in Appendix A.2. After solv-
ing these, we are then able to output best bid and best ask prices at each update time for
3.5.3 Bitstamp
For Bitstamp, to collect bid/ask data, we use the Live Order Book stream which has the
following specification:
order_book_{currency_pair}
The currency_pair placeholder can be replaced with the following values that
EVENT data
The field descriptions for this stream are as follows:
Field Description
Accordingly, here is the JavaScript code that is used to collect this data:
// ...
    console.log(JSON.stringify(logItem));
});
Listing 3.21: JavaScript code for reading from Bitstamp order book update stream
Note that since we know we are only interested in the best bid and best ask, we truncate
the data before recording it. This is possible with Bitstamp where it was not with GDAX
or Bitfinex because each message includes the most recent 100 best bids and asks rather
than incremental updates from an initial snapshot. This means that the data as recorded
Furthermore, missed or out-of-order messages from Bitstamp do not pose data integrity
issues. Since each message provides the complete state, there is no possibility of inconsis-
tency. Out-of-order messages can be readily detected by comparing the supplied times-
tamp with the timestamps of previously received messages, and reconstruction of the
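Because each Bitstamp message carries the full visible book, the best bid and best ask can be read from a single message, and stale messages can be dropped by timestamp. The Python sketch below illustrates both steps. The field names bids, asks, and timestamp, the [price, amount] string pairs, and the function names are assumptions made for illustration; they are not taken from the field table above.

```python
def best_bid_ask(message):
    """Best bid and ask from one full-book message."""
    best_bid = max(float(price) for price, _ in message["bids"])
    best_ask = min(float(price) for price, _ in message["asks"])
    return best_bid, best_ask


def drop_out_of_order(messages):
    """Keep messages whose timestamp is not older than any kept before."""
    kept, latest = [], None
    for message in messages:
        timestamp = int(message["timestamp"])
        if latest is None or timestamp >= latest:
            kept.append(message)
            latest = timestamp
    return kept


# Hypothetical messages, for illustration only.
stream = [
    {"timestamp": "100", "bids": [["9700.00", "1.0"]], "asks": [["9701.50", "0.4"]]},
    {"timestamp": "99", "bids": [["9699.00", "2.0"]], "asks": [["9702.00", "0.1"]]},
    {"timestamp": "101", "bids": [["9700.50", "0.3"], ["9700.00", "1.0"]],
     "asks": [["9701.00", "0.2"], ["9701.50", "0.4"]]},
]
for message in drop_out_of_order(stream):
    print(message["timestamp"], best_bid_ask(message))
```

No state has to be carried between messages other than the latest timestamp, which is the property that makes missed messages harmless for this feed.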
Chapter 4
Our data collection implementation and data processing pipeline allow us a view into each
match that occurs as well as the best bid and best ask price on each exchange for each of
the trading pairs at any given time during our sample period which spans from midnight
(00:00:00) on May 4, 2018 until midnight May 9, 2018. Using this aggregated data, we be-
gin by presenting summaries of trading volume and bid/ask spreads for each trading pair
on each exchange. We then continue with building regression models to test hypotheses
Defining a match, r, to have price $p_r$ and size $s_r$, we can compute the total trade volume
for a given trading pair on each exchange as $\sum_r p_r s_r$.
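As a worked illustration of this sum, the following Python sketch computes the USD volume for a small set of matches. The function names and the sample prices and sizes are hypothetical and are not drawn from our dataset.

```python
def usd_volume(matches):
    """Total volume: the sum over matches r of price p_r times size s_r."""
    return sum(price * size for price, size in matches)


def average_daily_volume(matches, days):
    """Total volume divided by the number of days in the sample."""
    return usd_volume(matches) / days


# Hypothetical (price, size) matches for one trading pair on one exchange.
matches = [(6500.0, 0.5), (6501.0, 0.25), (6499.0, 1.0)]
print(usd_volume(matches))  # 11374.25
```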
We observe that Bitfinex sees the most USD trading volume across all cryptocurrencies
and that BTC is the highest-volume traded cryptocurrency across all three exchanges.
Bitfinex GDAX Bitstamp
Table 4.1: Average daily trading volume by exchange and currency pair
To standardize the data streams and make them more manageable and comparable, we
first sample in 1-second increments across our 5-day dataset. We use whole-second
increments beginning at 00:00:00 on the first day of our sample. If we say that $t_i$ represents
the time i seconds (zero-indexed) after the start of the interval and that each orderbook
update, u, has timestamp $\tau_u$, we define the best bid and best ask at time $t_i$ to be the prices
amounts—rather we use the same tick size the exchange does for each trading pair ($0.01
in all cases except for BTC/USD and BCH/USD on Bitfinex which use $0.10 tick sizes).
Since we simultaneously collect data for all trading pairs over the same interval, the num-
ber of samples in each case is the same (432,000), but we present percentages for readabil-
ity. In many cases the plots become unwieldy due to both being dominated by the first
two tick size levels and by very long tails, so, in each case, we have removed the first two
tick sizes and only plot spreads up to $10 and instead elect to show this in accompanying
tables below.
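The sampling and tabulation steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used to produce the tables: it assumes that the quote at each sample time is the one carried by the latest update at or before that time, it represents prices as integer cents to avoid floating-point rounding, and the function names are hypothetical. A 5-day sample at one sample per second yields 5 x 86,400 = 432,000 samples.

```python
from bisect import bisect_right
from collections import Counter


def sample_quotes(updates, n_seconds):
    """Sample (bid, ask) at each whole second 0..n_seconds-1.

    updates: list of (tau_seconds, bid_cents, ask_cents), sorted by tau.
    Assumption: the quote at second i is the latest update with tau <= i.
    """
    taus = [tau for tau, _, _ in updates]
    samples = []
    for i in range(n_seconds):
        k = bisect_right(taus, i)  # number of updates with tau <= i
        samples.append(updates[k - 1][1:] if k else None)
    return samples


def spread_percentages(samples, tick_cents=1):
    """Percentage of samples at each spread, keyed by spread in ticks."""
    quoted = [sample for sample in samples if sample is not None]
    counts = Counter((ask - bid) // tick_cents for bid, ask in quoted)
    return {ticks: 100 * count / len(quoted) for ticks, count in counts.items()}


# Hypothetical updates: (seconds after start, best bid, best ask) in cents.
updates = [(0.0, 650000, 650001), (1.4, 650000, 650002), (3.2, 650001, 650002)]
samples = sample_quotes(updates, 5)
print(spread_percentages(samples))  # {1: 60.0, 2: 40.0}
```

Passing tick_cents=10 would express spreads in $0.10 ticks, matching the trading pairs that use the larger tick size.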
We begin with the bid/ask spreads for the BTC/USD market on GDAX. Here we observe
tight spreads with over 94% of samples exhibiting the minimum $0.01 spread. We observe
generally decaying frequencies of wider spreads, but we do note spikes at $1.01, $3.01, and
thresholds, analogous to behavior described in public markets for other financial assets
[5, 58].
$0.01 94.14%
$0.02 0.41%
For ETH we are slightly less likely to see the minimum $0.01 spreads, but we observe a
much smoother and faster drop-off and shorter tail. We never observe a spread greater
than $3.73. It is surprising that these distributions look so different given that they are
the two most active markets on GDAX and involve the two most widely traded cryptocur-
rencies.
$0.01 72.71%
$0.02 1.79%
For the LTC/USD trading pair we observe a more compressed distribution with no sam-
$0.01 80.39%
$0.02 2.60%
The data for BCH shows many fewer samples with the minimum $0.01 spread, though
generally a pattern similar to BTC with a slow drop-off, long tail, and spike at $1.01.
Bid/Ask Spread Samples
$0.01 43.27%
$0.02 1.80%
On Bitfinex, BTC trades in $0.10 ticks. Despite Bitfinex charging higher fees for matched
maker orders for traders with less than $7,500,000 in 30-day trading volume, we still ob-
serve the minimum $0.10 spread for over 86% of samples. We also see a much smoother
distribution of spreads than we do for GDAX and do not observe the round-number
spikes.
Bid/Ask Spread Samples
$0.10 86.23%
$0.20 1.89%
For ETH/USD the distribution of bid/ask spreads on Bitfinex is largely similar to that on
Bid/Ask Spread Samples
$0.01 75.71%
$0.02 3.04%
For LTC, the plotted distributions appear similar between Bitfinex and GDAX in that
they are compressed, but there is a significant difference in that Bitfinex exhibits $0.01
spreads in just over 40% of samples compared to over 80% on GDAX. We also observe no
Bid/Ask Spread Samples
$0.01 42.28%
$0.02 7.14%
Bitfinex also uses a $0.10 minimum tick size for BCH, and here we find a different pattern
with fewer than 30% of samples exhibiting $0.10 spreads and more than 3% of samples
exhibiting spreads of each of $0.20, $0.30, $0.40, $0.50, $0.60, $0.70, $0.80, $0.90, $1.00,
and $1.10. The frequency of spreads decreases with sizes greater than $1, though much
more slowly than we see in the other distributions. It is not clear what causes these dif-
ferences, as they are not repeated in other cases. The BCH/USD market on Bitfinex is
in between ETH/USD and LTC/USD volume-wise, so that does not seem to be a factor.
The fee structure is identical to other trading pairs, and the tick size is the same as for
BTC/USD.
$0.01 27.77%
$0.02 4.36%
The distribution of bid/ask spreads for BTC/USD on Bitstamp is dramatically different.
Only 3.63% of samples exhibit the minimum $0.01 spread, and there is no other spread
level that is observed in more than 1% of samples. After $0.01 and $0.02, the third most
observed bid/ask spread is $4.98. We observe spikes around each whole-dollar level similar
to those seen on GDAX, but these are even more pronounced. Furthermore, over 14% of
samples exhibit spreads greater than $10. It is unclear what causes such a dramatic varia-
tion.
$0.01 3.22%
$0.02 0.40%
Figure 4-9: Bitstamp BTC Bid/Ask Spread
See Table 4.10 for truncated data
Examining the ETH/USD data for Bitstamp also reveals a different pattern. Fewer than
3% of samples exhibit the minimum $0.01 spread, and the second-most frequent spread ob-
served is $0.97. Following this, the third, fourth, and fifth most frequent observed spreads
$0.01 3.63%
$0.02 0.59%
Figure 4-10: Bitstamp ETH Bid/Ask Spread
See Table 4.11 for truncated data
When looking at LTC, we see that the lowest possible spread is not the spread that is ob-
served most frequently. Rather, we observe a distribution centered around $0.30. The only
mark of similarity to the other exchanges is that there are no samples which show a spread
$0.01 2.24%
$0.02 0.55%
Figure 4-11: Bitstamp LTC Bid/Ask Spread
See Table 4.12 for truncated data
For BCH we observe $0.01 spreads for 3.11% of samples. The second most frequently ob-
served spread is $0.99 followed by $1.99 and $2.97. This appears most similar to BTC on
Bitstamp where we observe a wide distribution and spikes around whole-dollar spread lev-
els.
$0.01 3.11%
$0.02 0.40%
Figure 4-12: Bitstamp BCH Bid/Ask Spread
See Table 4.13 for truncated data
Across all trading pairs, Bitstamp sees many fewer instances of bid/ask spreads at the
minimum tick size. Though a contributing factor may be that Bitstamp sees lower trad-
ing activity than the other exchanges, it appears more likely that this is related to their
fee structure which always levies a charge on both makers and takers. This is supported
by the fact that this pattern exists even for BTC/USD even though Bitstamp sees more
BTC/USD trading volume than GDAX which has $0.01-wide spreads in 94.14% of sam-
ples.
It is not clear why we observe spikes at whole-number dollar amount spreads for some
trading pairs (BTC and BCH) on some exchanges (GDAX and Bitstamp) and not oth-
ers. On GDAX and Bitstamp, BCH is close in trading volume to LTC, and these volumes
are much lower than that of BTC on either exchange, so that does not appear to be a fac-
tor. It is possible that these two exchanges see more activity from traders who exhibit psy-
chological tendencies toward trades at these anchoring points. Still, the phenomenon does
not appear to be exhibited for ETH or LTC, so we would also have to further assume that
market participants with these characteristics are also more drawn to BTC and BCH in
particular. A draw to BTC is plausible given its position as the most widely traded and
well-known cryptocurrency. It is less obvious why this would also be true for BCH, though
it is possible that confusion between BTC and BCH among newcomers to cryptocurrency
and retail investors plays a nontrivial role. It is also true that when BCH was created by
forking BTC on August 1, 2017, everyone with BTC at the time of the fork then also had
an equivalent number of units of BCH. It is therefore possible that traders exhibiting this
behavior trade BCH as well as BTC as a result of being "given" BCH at conception time.
Chapter 5
Arbitrage
their characteristics, and how they vary across trading pairs. We therefore seek to use our
We first examine the 5-second-sampled bid/ask data for each currency pair for each ex-
change pair and compute the percentage of 5-second intervals that begin with arbitrage
possible net of the highest possible exchange fees. For a cryptocurrency c and pair of
exchanges (i, j) with bids at time t (seconds after the start of the sample), $b^c_{it}$ and $b^c_{jt}$,
asks $a^c_{it}$ and $a^c_{jt}$, and maximum taker fees $f_i$ and $f_j$, we compute this fraction of
intervals as

$$\frac{5}{n} \sum_{t=0}^{\frac{n}{5}-1} R^c_{i,j}(5t)$$

where

$$R^c_{i,j}(t) = \begin{cases} 1 & \text{if } b^c_{it}(1 - f_i) > a^c_{jt}(1 + f_j) \text{ or } b^c_{jt}(1 - f_j) > a^c_{it}(1 + f_i) \\ 0 & \text{otherwise} \end{cases}$$
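The indicator and the resulting fraction can be sketched in Python as follows. This is an illustrative sketch, not the analysis code used in this work: it assumes the quotes for each exchange are available as lists of (bid, ask) pairs sampled once per second, and the function names, fee values, and prices are hypothetical.

```python
def arbitrage_indicator(bid_i, ask_i, fee_i, bid_j, ask_j, fee_j):
    """Return 1 if selling on one exchange and buying on the other clears both fees."""
    sell_i_buy_j = bid_i * (1 - fee_i) > ask_j * (1 + fee_j)
    sell_j_buy_i = bid_j * (1 - fee_j) > ask_i * (1 + fee_i)
    return 1 if sell_i_buy_j or sell_j_buy_i else 0


def arbitrage_fraction(quotes_i, quotes_j, fee_i, fee_j, step=5):
    """Mean of the indicator over every step-th one-second sample.

    Equals (5 / n) * sum of R(5t) when n is a multiple of step.
    """
    n = min(len(quotes_i), len(quotes_j))
    starts = range(0, n, step)
    if not starts:
        return 0.0
    hits = sum(
        arbitrage_indicator(*quotes_i[t], fee_i, *quotes_j[t], fee_j)
        for t in starts
    )
    return hits / len(starts)


# Hypothetical ten seconds of (bid, ask) quotes on two exchanges.
quotes_i = [(101.0, 101.5)] * 5 + [(100.0, 100.1)] * 5
quotes_j = [(99.0, 100.0)] * 5 + [(100.0, 100.1)] * 5
print(arbitrage_fraction(quotes_i, quotes_j, fee_i=0.002, fee_j=0.002))  # 0.5
```

In the example, only the interval starting at t = 0 qualifies: selling at 101.0 net of the fee yields 100.798, which exceeds the fee-inclusive purchase cost of 100.2 on the other exchange.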
The percentages for the exchange pairs involving Bitfinex are so high as to arouse suspi-
cion that the bid/ask data may be inaccurate. To confirm, we compare the bid/ask data
from Bitfinex to the trades data collected separately, and we find that the prices at which
The stark difference in pricing on Bitfinex compared to the other two exchanges sug-
gests additional friction in using this exchange compared to the others despite it being
the highest-volume exchange of the three and offering a competitive fee structure. This
would be consistent with additional perceived risk based on reports of suspected fraudulent
behavior on their part [40]. There are numerous reports of users being unable to withdraw
fiat currencies, particularly US Dollars, from Bitfinex [23]. There are also reports of
of time [64, 60, 2, 4]. Additionally, most recently, there is convincing research suggesting
that Bitfinex is involved in cryptocurrency price manipulation [22]. All of these would ex-
plain why arbitrageurs would be less willing to engage with Bitfinex and why there would
be such persistent price differences.
Continuing, we see that the frequencies for the currency pairs between GDAX and Bitstamp
do appear consistent with arbitrage activity. At the outset of this work, we had thought
arbitrage activity might vary by cryptocurrency and be related to the transaction time and
transaction cost associated with each cryptocurrency. Higher transaction times or higher
transaction costs might add friction and risk, discouraging arbitrage traders.
This, however, appears not to be the case. Bitcoin, with both the highest transaction time
and the highest transaction cost, has the fewest incidences of arbitrageable price differences
and the lowest mean arbitrage interval time. The relationship, rather, appears to be
between arbitrage opportunity and aggregate trading volume. Both GDAX and Bitstamp
have the same relative ordering of these currency pairs by average daily volume: BTC/
USD, ETH/USD, BCH/USD, and LTC/USD. This is also the same as the ordering of
Based on the above findings, we then further examine arbitrage window length. Using 1-
second samples of the bid/ask data and discarding arbitrage windows lasting fewer than
3 seconds, we compute the mean window length per currency pair considering only the
Length (seconds)
BTC/USD 21.50
ETH/USD 22.86
BCH/USD 23.25
LTC/USD 31.12
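As a rough sketch of how mean window lengths like those in the table above could be computed from 1-second samples, the following Python groups consecutive samples flagged as arbitrage into windows and discards those shorter than 3 seconds. The function names and the flat list of 0/1 flags are assumptions made for illustration only.

```python
def arbitrage_windows(flags, min_length=3):
    """Return the lengths (in seconds) of runs of consecutive 1s.

    flags is assumed to hold one 0/1 arbitrage indicator per 1-second
    sample. Runs shorter than min_length are discarded.
    """
    lengths = []
    run = 0
    for flag in flags:
        if flag:
            run += 1
        else:
            if run >= min_length:
                lengths.append(run)
            run = 0
    if run >= min_length:  # a window still open at the end of the sample
        lengths.append(run)
    return lengths


def mean_window_length(flags, min_length=3):
    """Mean length of the retained windows, or 0.0 if there are none."""
    lengths = arbitrage_windows(flags, min_length)
    return sum(lengths) / len(lengths) if lengths else 0.0
```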
This same ordering holds true for both frequency of arbitrage opportunity and mean arbi-
trage window length. These results suggest that the more valuable or more actively traded
a cryptocurrency is, the more arbitrage activity it draws, and these factors overshadow
differences
Given the bid/ask and trades data assembled across the exchanges, we attempt to find
evidence of arbitrage activity by examining the bid and ask prices on each exchange at
each time t and the net volume during the subsequent time interval.
Hypothesis 1: For a given cryptocurrency, if the bid on the first exchange exceeds the ask
on the second exchange, then we expect to see net sell on the first exchange and net buy on
If there were active arbitrageurs participating in the market, we would expect that, if, at
time t, the arbitrage condition were true, specifically that the bid on one exchange ex-
ceeded the ask on another, we would see negative net volume on the first exchange and
positive net volume on the second. We construct several logistic regression models in the
form

\[ \Pr(y = 1) = \frac{1}{1 + e^{-(\beta x + \alpha)}} \]
to test this. In order to align our trades data with our sampled bid/ask data, we must ag-
\[ u = \begin{cases} 1 & \text{if buyer is taker} \\ -1 & \text{if seller is taker} \end{cases} \]

\[ V_t = \sum_{r \,\mid\, t \le \tau_r < t+1} p_r s_r u_r \]
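A minimal sketch of this aggregation, assuming trades arrive as tuples of timestamp (seconds after the start of the sample), price, size, and a flag for whether the buyer was the taker. The tuple layout and function name are hypothetical.

```python
import math
from collections import defaultdict


def net_volume_per_second(trades):
    """Aggregate signed notional volume into 1-second buckets.

    Each trade is assumed to be (timestamp, price, size, buyer_is_taker).
    A trade with timestamp tau falls into bucket t when t <= tau < t + 1.
    """
    volume = defaultdict(float)
    for timestamp, price, size, buyer_is_taker in trades:
        u = 1 if buyer_is_taker else -1  # +1 for taker buys, -1 for taker sells
        volume[math.floor(timestamp)] += price * size * u
    return dict(volume)
```

A positive value for a bucket indicates net taker buying during that second, and a negative value indicates net taker selling.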
In the first case, we construct one model for each cryptocurrency and each pairwise
permutation of exchanges. We once again say that exchanges $i$ and $j$ at time $t$ have bids
$b^c_{it}$ and $b^c_{jt}$ and asks $a^c_{it}$ and $a^c_{jt}$ for cryptocurrency $c$. We are
interested in varying the time interval length beyond the 1-second samples we have already
defined, so, given that we want to aggregate $n$ samples into $m$ intervals of width
$w = \frac{n}{m}$, we define our x values to be

\[ x_k = \max(b^c_{i,kw} - a^c_{j,kw},\, 0) \]

and our y values to be

\[ y_k = \begin{cases} 1 & \text{if } \sum_{\ell=kw}^{(k+1)w-1} V^c_{i\ell} < 0 \text{ and } \sum_{\ell=kw}^{(k+1)w-1} V^c_{j\ell} > 0 \\ 0 & \text{otherwise} \end{cases} \]
as an indicator as to whether or not trading activity reflected arbitrage exploitation. We
use an interval period of a minimum of 5 seconds to account for the varying delays in data
streams from the exchanges. This results in 24 models (6 permutations of 3 exchanges and
4 cryptocurrencies) for each interval length. The expectation here is that the resulting
logistic regression curve would be centered around the price delta at which arbitrageurs
determined there was sufficient profit incentive to engage. Although we observe positive
coefficients and low p-values suggesting a relationship between pricing and subsequent net
trading volume, in each of these cases the resulting model predicts 0 for nearly
We present the regression results below including the threshold arbitrage price delta
directioned trading activity. We also present the probability outputs of each model for
the 0.1th percentile and 99.9th percentile x values (i.e. price discrepancies). From these
we can see that even though each model shows a positive correlation between magnitude
Even if we are less confident in the Bitfinex data and only look at the Bitstamp/GDAX
results, it appears that using this method we are unable to build models that successfully
We attempt this using interval periods of 5, 10, 20, and 30 seconds, exploring interval peri-
exchanges. We only include the results from the 5-second interval models below as they
are the most predictive—using wider interval lengths results in noisier outputs and even
Currency  Exchanges  const  coef  coef p-value  Threshold  0.1th %ile prediction (x)  99.9th %ile prediction (x)  Samples
BTC GDAX/Bitfinex -2.1543 0.0406 0.003 $53.06 0.1039 ($0.00) 0.1427 ($8.91) 86,400
BTC Bitfinex/GDAX -2.3124 0.0027 0.000 $856.44 0.0901 ($0.00) 0.1198 ($117.70) 86,400
BTC GDAX/Bitstamp -2.3051 0.0491 0.000 $46.95 0.0907 ($0.00) 0.2933 ($29.05) 86,400
BTC Bitstamp/GDAX -2.4290 0.0136 0.000 $178.60 0.0810 ($0.00) 0.1419 ($46.26) 86,400
BTC Bitfinex/Bitstamp -2.0960 0.0050 0.000 $419.20 0.1095 ($0.00) 0.1777 ($111.83) 86,400
BTC Bitstamp/Bitfinex -1.9963 0.2205 0.000 $9.05 0.1196 ($0.00) 0.3453 ($6.15) 86,400
ETH GDAX/Bitfinex -2.1965 0.3173 0.000 $6.92 0.1001 ($0.00) 0.1729 ($1.99) 86,400
ETH Bitfinex/GDAX -2.3670 0.0298 0.000 $79.43 0.0857 ($0.00) 0.1161 ($11.32) 86,400
ETH GDAX/Bitstamp -3.2522 0.8102 0.000 $4.01 0.0372 ($0.00) 0.2722 ($2.80) 86,400
ETH Bitstamp/GDAX -3.5825 0.3311 0.000 $10.82 0.0271 ($0.00) 0.1271 ($5.00) 86,400
ETH Bitfinex/Bitstamp -3.2871 0.1096 0.000 $29.99 0.0360 ($0.00) 0.1055 ($10.49) 86,400
ETH Bitstamp/Bitfinex -3.1290 0.7055 0.000 $4.44 0.0419 ($0.00) 0.1370 ($1.83) 86,400
LTC GDAX/Bitfinex -3.0095 1.5110 0.000 $1.99 0.0470 ($0.00) 0.0887 ($0.45) 86,400
LTC Bitfinex/GDAX -3.1883 0.5254 0.000 $6.07 0.0396 ($0.00) 0.1425 ($2.65) 86,400
LTC GDAX/Bitstamp -4.1180 2.4607 0.000 $1.67 0.0160 ($0.00) 0.0977 ($0.77) 86,400
LTC Bitstamp/GDAX -4.1422 1.5386 0.000 $2.69 0.0156 ($0.00) 0.0967 ($1.24) 86,400
LTC Bitfinex/Bitstamp -4.6585 1.4725 0.000 $3.18 0.0094 ($0.00) 0.1224 ($1.83) 86,400
LTC Bitstamp/Bitfinex -4.1281 12.7227 0.000 $0.32 0.0159 ($0.00) 0.1229 ($0.17) 86,400
BCH GDAX/Bitfinex -2.8973 0.1866 0.000 $15.53 0.0523 ($0.00) 0.1167 ($4.68) 86,400
BCH Bitfinex/GDAX -3.2958 0.0670 0.000 $49.19 0.0357 ($0.00) 0.1554 ($23.92) 86,400
BCH GDAX/Bitstamp -4.3352 0.2951 0.000 $14.69 0.0129 ($0.00) 0.1560 ($8.97) 86,400
BCH Bitstamp/GDAX -4.5634 0.2611 0.000 $17.48 0.0103 ($0.00) 0.1243 ($10.00) 86,400
BCH Bitfinex/Bitstamp -4.7983 0.1489 0.000 $32.23 0.0082 ($0.00) 0.1769 ($21.90) 86,400
BCH Bitstamp/Bitfinex -4.1415 0.6965 0.000 $5.95 0.1156 ($0.00) 0.2411 ($4.30) 86,400
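The Threshold column is consistent with the price gap at which the fitted logistic curve crosses 0.5, that is, where $\beta x + \alpha = 0$. The short Python sketch below shows that relationship and how a prediction at a given price gap follows from the fitted constant and coefficient. The function names are hypothetical, and the example values are read from the BTC GDAX/Bitstamp row above.

```python
import math


def predict(const, coef, x):
    """Logistic prediction Pr(y = 1) for price gap x."""
    return 1.0 / (1.0 + math.exp(-(coef * x + const)))


def threshold(const, coef):
    """Price gap at which the prediction equals 0.5 (coef * x + const = 0)."""
    return -const / coef


# BTC GDAX/Bitstamp: const = -2.3051, coef = 0.0491
gap = threshold(-2.3051, 0.0491)          # roughly $46.95
low = predict(-2.3051, 0.0491, 0.00)      # roughly 0.0907
high = predict(-2.3051, 0.0491, 29.05)    # roughly 0.2933
```

Because the 99.9th percentile price gap ($29.05) is below the implied threshold, the model predicts 0 for nearly the entire observed range.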
Next, noting that at a given time t, it is impossible for an arbitrage opportunity to exist
in both directions for a given pair of exchanges, the construction is simplified by reducing
exchange permutations to combinations. Rather than keeping exchanges i and j fixed for
the training of a model, we instead consider an unordered pair of exchanges, and, at each
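time $t$ assign exchanges $i$ and $j$ such that $b^c_{it} - a^c_{jt} > b^c_{jt} - a^c_{it}$. The
case in which these quantities are equal is irrelevant as it implies that no arbitrage
opportunity exists.

\[ x_k = \max(b^c_{i,kw} - a^c_{j,kw},\, 0) \]

\[ y_k = \begin{cases} 1 & \text{if } \sum_{\ell=kw}^{(k+1)w-1} V^c_{i\ell} < 0 \text{ and } \sum_{\ell=kw}^{(k+1)w-1} V^c_{j\ell} > 0 \\ 0 & \text{otherwise} \end{cases} \]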
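The direction-agnostic construction can be sketched as follows. This assumes that the price gap is floored at zero so that x is non-negative (consistent with the x values reported in the tables), and that per-second signed volumes for the interval are available as lists. The function name and argument layout are hypothetical.

```python
def build_sample(bid_a, ask_a, bid_b, ask_b, net_vol_a, net_vol_b):
    """Build one (x, y) sample for an unordered pair of exchanges A and B.

    bid_*/ask_* are the quotes at the start of the interval; net_vol_* are
    lists of per-second signed volumes within the interval.
    """
    # Label as i the exchange whose bid most exceeds the other's ask.
    if bid_a - ask_b >= bid_b - ask_a:
        gap, vol_i, vol_j = bid_a - ask_b, net_vol_a, net_vol_b
    else:
        gap, vol_i, vol_j = bid_b - ask_a, net_vol_b, net_vol_a

    x = max(gap, 0.0)  # assumed: zero when no arbitrage gap exists
    # y = 1 when there is net selling on i and net buying on j.
    y = 1 if sum(vol_i) < 0 and sum(vol_j) > 0 else 0
    return x, y
```

Swapping the two exchanges in the arguments yields the same sample, which is what reduces the 24 permutation models to 12 combination models.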
The potential for lost resolution through this simplification exists if there are real dif-
This may exist, for example, if one exchange makes depositing USD more expensive or
time consuming (or even impossible) compared to depositing cryptocurrencies (or vice
versa). This construction results in 12 models, but the results have similar outcomes.
Each model predicts 0 for nearly the entire range of input data. However, we do observe
wider separations between the 0.1th percentile and 99.9th percentile predictions for the
Currency  Exchanges  const  coef  coef p-value  Threshold  0.1th %ile prediction (x)  99.9th %ile prediction (x)  Samples
BTC GDAX/Bitfinex -2.2700 0.0019 0.000 $1194.74 0.0936 ($0.00) 0.1147 ($117.70) 86,400
BTC GDAX/Bitstamp -2.8166 0.0408 0.000 $69.03 0.0564 ($0.00) 0.2891 ($46.94) 86,400
BTC Bitfinex/Bitstamp -2.1515 0.0063 0.000 $341.51 0.1042 ($0.00) 0.1904 ($111.83) 86,400
ETH GDAX/Bitfinex -2.3849 0.0337 0.000 $70.77 0.0843 ($0.00) 0.1188 ($11.32) 86,400
ETH GDAX/Bitstamp -3.9157 0.5264 0.000 $7.44 0.0195 ($0.00) 0.2169 ($5.00) 86,400
ETH Bitfinex/Bitstamp -3.4033 0.1374 0.000 $24.77 0.0322 ($0.00) 0.1232 ($10.49) 86,400
LTC GDAX/Bitfinex -3.2747 0.5969 0.000 $5.49 0.0364 ($0.00) 0.1555 ($2.65) 86,400
LTC GDAX/Bitstamp -4.8525 2.7746 0.000 $1.75 0.0077 ($0.00) 0.1959 ($1.24) 86,400
LTC Bitfinex/Bitstamp -4.9412 1.7496 0.000 $2.82 0.0071 ($0.00) 0.1485 ($1.83) 86,400
BCH GDAX/Bitfinex -3.4781 0.0840 0.000 $41.41 0.0299 ($0.00) 0.1872 ($23.92) 86,400
BCH GDAX/Bitstamp -5.1460 0.3898 0.000 $13.20 0.0058 ($0.00) 0.2486 ($10.36) 86,400
BCH Bitfinex/Bitstamp -4.8133 0.1530 0.000 $31.46 0.0081 ($0.00) 0.1881 ($21.90) 86,400
Table 5.4: Regression results using 5-second intervals, indifferent to arbitrage direction
The next consideration is that, by providing every time interval, many of which have no
potential for arbitrage and thus arbitrage magnitudes (i.e. x values) of 0, the models are
biased too strongly toward 0 outputs. Accordingly, we attempt to train models excluding
intervals with x values of 0. This, however, yields the same result of nearly no predictions
of arbitrage activity and no model predicting arbitrage trading activity for the 99.9th per-
centile arbitrage differential. Compared to the last construction, we see smaller separa-
tions between predictions for the 0.1th percentile and 99.9th percentile samples, even for
GDAX/Bitstamp comparisons.
Currency  Exchanges  const  coef  coef p-value  Threshold  0.1th %ile prediction (x)  99.9th %ile prediction (x)  Samples
BTC GDAX/Bitfinex -2.2545 0.0016 0.000 $1409.06 0.0950 ($0.10) 0.1127 ($117.80) 85,973
BTC GDAX/Bitstamp -2.3788 0.0176 0.000 $135.16 0.0130 ($0.01) 0.1646 ($48.27) 68,118
BTC Bitfinex/Bitstamp -1.9962 0.0029 0.000 $688.35 0.0102 ($0.04) 0.1214 ($112.21) 81,396
ETH GDAX/Bitfinex -2.3193 0.0222 0.000 $104.47 0.0896 ($0.01) 0.1123 ($11.35) 84,466
ETH GDAX/Bitstamp -3.2916 0.2596 0.000 $12.68 0.0359 ($0.01) 0.1229 ($5.11) 57,601
ETH Bitfinex/Bitstamp -3.1734 0.0893 0.000 $35.54 0.0402 ($0.01) 0.0982 ($10.71) 77,442
LTC GDAX/Bitfinex -3.1483 0.4997 0.000 $6.30 0.0414 ($0.01) 0.1402 ($2.67) 81,470
LTC GDAX/Bitstamp -4.1487 1.6875 0.000 $2.46 0.0158 ($0.01) 0.1169 ($1.85) 49,457
LTC Bitfinex/Bitstamp -4.5924 1.4126 0.000 $3.25 0.0102 ($0.01) 0.1214 ($1.26) 68,881
BCH GDAX/Bitfinex -3.2473 0.0637 0.000 $50.98 0.0374 ($0.01) 0.1519 ($23.97) 77,768
BCH GDAX/Bitstamp -4.3330 0.2462 0.000 $17.60 0.0130 ($0.01) 0.1646 ($11.00) 43,562
BCH Bitfinex/Bitstamp -4.5068 0.1260 0.000 $35.77 0.0109 ($0.01) 0.1501 ($22.00) 68,959
Table 5.5: Regression results using 5-second intervals, excluding samples without arbitrage
pricing
Taking the idea further, we note that even if we remove samples with x values of 0, we still
have many data points that do not represent arbitrage opportunities due to exchange fees.
All three exchanges charge fees for taker orders, even for traders exceeding their highest
30-day volume thresholds. We therefore again build 12 models, this time excluding data
points in which the arbitrage price difference is assuredly not large enough to be prof-
itable.
Here, many of the results are once again inconclusive, but we note that the model for BTC
on GDAX/Bitstamp does predict arbitrage-directioned trade for more than 0.1% of sam-
ples. Given the exclusionary criteria, the total number of samples considered has been
substantially reduced, and in total we see predictions of 0.5 or greater for only 11 samples.
The threshold arbitrage gap implied by the model is $55.87.
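Currency  Exchanges  const  coef  coef p-value  Threshold  0.1th %ile prediction (x)  99.9th %ile prediction (x)  Samples
BTC GDAX/Bitfinex -2.1621 0.0001 0.861 $21,621 0.1032 ($0.04) 0.1043 ($100.11) 57,143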
BTC GDAX/Bitstamp -2.2126 0.0396 0.000 $55.87 0.0986 ($0.00) 0.5654 ($62.54) 7532
BTC Bitfinex/Bitstamp -1.9447 0.0029 0.000 $670.59 0.1251 ($0.03) 0.1599 ($97.77) 43,384
ETH GDAX/Bitfinex -2.3734 0.0421 0.000 $56.38 0.0852 ($0.00) 0.1242 ($9.99) 60,318
ETH GDAX/Bitstamp -2.8681 0.2854 0.000 $10.05 0.0538 ($0.00) 0.1533 ($4.06) 13,811
ETH Bitfinex/Bitstamp -3.1823 0.1314 0.000 $24.22 0.0398 ($0.00) 0.1258 ($9.46) 47,970
LTC GDAX/Bitfinex -2.9469 0.4665 0.000 $6.32 0.0499 ($0.00) 0.1425 ($2.47) 53,439
LTC GDAX/Bitstamp -3.6090 1.7994 0.000 $2.01 0.0264 ($0.00) 0.1633 ($1.10) 13,239
LTC Bitfinex/Bitstamp -4.3194 1.7054 0.000 $2.53 0.0131 ($0.00) 0.1576 ($1.55) 45,252
BCH GDAX/Bitfinex -3.1848 0.0808 0.000 $39.42 0.0397 ($0.02) 0.1814 ($20.75) 56,852
BCH GDAX/Bitstamp -3.6202 0.2761 0.000 $13.11 0.0261 ($0.00) 0.2710 ($9.53) 10,935
BCH Bitfinex/Bitstamp -4.1359 0.1331 0.000 $31.07 0.0109 ($0.00) 0.1501 ($19.05) 41,157
Table 5.6: Regression results using 5-second intervals, excluding samples without arbitrage
pricing net of the minimum exchange fees
We take this approach further and attempt a construction including only data points for
which arbitrage opportunities exist net of the highest possible exchange fees. That is to
say, if exchanges i and j have maximum taker fees fi and fj , we compute x values as xk =
We observe that predictions for 99.9th percentile arbitrage opportunities exceed 0.5 for each
cryptocurrency between Bitstamp and GDAX. However, since such large arbitrage opportunities
are relatively rare between Bitstamp and GDAX, the sample sizes have become much smaller,
and we no longer see results that are statistically significant at the 5% level for BTC or
ETH.
For LTC and BCH we observe thresholds of $0.69 and $12.28, respectively. However, even
though the 99.9th percentile prediction exceeds 0.5, this is a very low bar. For LTC we
observe only 10 samples for which predictions exceed 0.5, and for BCH, we observe only 2.
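Currency  Exchanges  const  coef  coef p-value  Threshold  0.1th %ile prediction (x)  99.9th %ile prediction (x)  Samples
BTC GDAX/Bitfinex -2.1894 0.0012 0.424 $1824.50 0.1007 ($0.03) 0.1089 ($75.61) 23,876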
BTC GDAX/Bitstamp 0.1402 0.0302 0.377 $4.64 0.5426 ($1.02) 0.8224 ($98.30) 27
BTC Bitfinex/Bitstamp -2.2208 0.0178 0.000 $124.76 0.0979 ($0.03) 0.3846 ($46.12) 19,049
ETH GDAX/Bitfinex -2.3045 0.0530 0.000 $43.48 0.0908 ($0.00) 0.1329 ($8.09) 34,892
ETH GDAX/Bitstamp -1.5970 0.4925 0.145 $3.24 0.1692 ($0.00) 0.5104 ($3.32) 203
ETH Bitfinex/Bitstamp -3.0596 0.1782 0.000 $17.17 0.0448 ($0.00) 0.1616 ($7.93) 22,404
LTC GDAX/Bitfinex -2.8832 0.7217 0.000 $4.00 0.0530 ($0.00) 0.1911 ($2.00) 33,726
LTC GDAX/Bitstamp -2.9163 4.2313 0.000 $0.69 0.0514 ($0.00) 0.6804 ($0.87) 886
LTC Bitfinex/Bitstamp -3.8906 2.2953 0.000 $1.70 0.0200 ($0.00) 0.2167 ($1.14) 19,136
BCH GDAX/Bitfinex -2.7721 0.0814 0.000 $34.06 0.0588 ($0.00) 0.1901 ($16.24) 28,304
BCH GDAX/Bitstamp -1.7542 0.1428 0.038 $12.28 0.1475 ($0.00) 0.7982 ($22.31) 306
BCH Bitfinex/Bitstamp -3.5254 0.1272 0.000 $27.72 0.0286 ($0.00) 0.1730 ($15.41) 16,852
Table 5.7: Regression results using 5-second intervals, excluding samples without arbitrage
pricing net of the maximum exchange fees
Even setting aside the Bitfinex data, we are unable to confidently identify meaningful rela-
tionships between trading activity and prices suggesting arbitrage opportunity. It may be
the case that our attempts to look at net volume are impacted by the relative differences
in volume seen on the two exchanges for each currency pair. With the volumes for BCH/
USD and LTC/USD so much higher on GDAX than on Bitstamp, it may be that trades
to “correct” the price on Bitstamp have little impact on GDAX and therefore do not move
net volume in the direction we would expect. Additionally, since these trading pairs also
exist on other exchanges, and related pairs (such as BCH/BTC or LTC/BTC) even exist
on these exchanges, only looking at volume for these pairs on these exchanges may be too
The next approach taken is to investigate the relationship between the net volumes on the
highest and lowest price exchanges in the time intervals following the emergence of an ar-
bitrage opportunity.
Hypothesis 2: For a given cryptocurrency, if the bid on the first exchange exceeds the
ask on the second exchange, then we expect a more negative correlation between the signed
volumes on the two exchanges in the following time interval compared to samples that do
Instead of limiting the period we examine to just a single time interval following the ap-
pearance of an arbitrage opportunity, in this approach we examine the following five pe-
riods. For each cryptocurrency, we evaluate the bid/ask data for all three exchanges si-
multaneously (rather than pairwise as had been described previously). At each time t, the
exchange with the highest bid price is denoted i and the exchange with the lowest ask
price is denoted $j$. We then collect net volumes $V_i$ and $V_j$ for the next 5 time inter-
We expect $V_i$ and $V_j$ to be more negatively correlated, due to selling on the high-priced
exchange and buying on the low-priced exchange, with a return toward non-arbitrage
correlation coefficients as time intervals increase, as the arbitrage opportunity is
exploited away and the volume in subsequent periods is dominated by other types of
trading activity.
When looking at periods following arbitrage opportunities, for each of the time intervals,
$V_i$ and $V_j$ show very small positive correlations for all four of the cryptocurrencies.
This is in sharp contrast to the much more positive correlations shown for other periods. We
use Fisher's r-to-z transformation and then compute the z test statistic to show the
statistical significance of these differences. Note that while, in most cases, we do see
statistically significant differences between the correlations, we do not observe convergence
over the five subsequent time intervals. This suggests that trading behavior, even over 25
seconds following the appearance of an arbitrage opportunity, does not exhibit evidence of
exploitation.
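For reference, the comparison of two independent correlation coefficients via Fisher's r-to-z transformation can be sketched as below. This is the standard textbook form of the test; the function name and the sample sizes in the usage note are hypothetical, and the sketch assumes the two samples are independent.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Test whether two independent correlations differ.

    Applies Fisher's transformation z = atanh(r) to each coefficient and
    returns the z statistic together with a two-sided p-value.
    """
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

With illustrative inputs such as `compare_correlations(0.05, 1000, 0.8, 1000)`, the statistic is strongly negative and the p-value is effectively zero, which is the pattern expected when correlations following arbitrage opportunities are far lower than those in other periods.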
Arbitrage No Arbitrage Comparison
Given our prior experience with the apparent noisiness in the Bitfinex data, we also com-
pute these figures excluding Bitfinex. The results are even more conclusive.
Arbitrage No Arbitrage Comparison
We see that, during ordinary periods, net volumes on GDAX and Bitstamp are strongly
correlated, with r-values greater than 0.8 for all trading pairs. By contrast, in periods
following arbitrage opportunities, the correlation coefficients drop to less than 0.06 in all
cases. This shows that arbitrage opportunities are associated with differences in trading
5.5 Predicting arbitrage window length based on trading volume
Since we find little evidence of arbitrage exploitation in trading volume
during future time periods, the next approach is to investigate whether the duration of
an arbitrage opportunity can be predicted using the trading volume on
those exchanges during the affected time window. The theory being tested is that if there
is little trading volume on the relevant exchanges when an arbitrage opportunity exists, it
is likely to continue to exist, whereas high volume would be indicative of more liquidity and
Hypothesis 3: For a given cryptocurrency, the length of the arbitrage window between the
two exchanges is shorter if the per-second trading volume (unsigned) on the two exchanges
For this, we construct a linear regression model. The dependent variable is the length of
the arbitrage window in seconds, and the independent variable is the mean of the unsigned
trade volumes on the two exchanges during the period. We use volume per second to account
for the fact that longer windows would naturally be expected to have more volume.
Arbitrage windows lasting fewer than 3 seconds are not considered, because they risk
being artifacts of the varying time delays in streaming data from the different exchanges.
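A minimal sketch of this regression using ordinary least squares with a single predictor is shown below. It is an illustration rather than the fitting routine used in this work; the function names are hypothetical, and it assumes the predictor values are not all identical.

```python
def volume_per_second(volume_i, volume_j, window_length):
    """Mean unsigned volume across the two exchanges, per second of window."""
    return (volume_i + volume_j) / 2.0 / window_length


def ols_fit(xs, ys):
    """Fit y = slope * x + intercept by least squares.

    Returns (slope, intercept, r_squared). Here xs would hold the volume
    per second for each arbitrage window and ys the window lengths in seconds.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared
```

Under Hypothesis 3 the fitted slope would be negative, and the r-squared value indicates how much of the variation in window length the volume actually explains.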
coef p-value
Even though we see a much lower p-value for BCH, we note the model has $r^2 = 0.059$, so
ing volume
Next, we try to see if the opening of an arbitrage window can be predicted using the trad-
then arbitrage window is more likely to open in the following few seconds.
For this, we include only time samples that reflected arbitrage opportunities where no
such opportunity existed at the time of the previous sample. We build a logistic regres-
sion model in which the x values are the sum of the unsigned trade sizes on the relevant
exchanges during the prior 5 seconds and the y values are 1 if an arbitrage opportunity
existed at the sample time and 0 otherwise. For cryptocurrency $c$, exchange pair $(i, j)$,
time $t$, and matches $r$ each having price $p_r$, size $s_r$, and timestamp $\tau_r$, we have:

\[ x^c_{(i,j)t} = \sum_{r \,\mid\, t-5 \le \tau_r < t} p_r s_r \]

\[ y^c_{(i,j)t} = \begin{cases} 1 & \text{if } b^c_{it}(1 - f_i) > a^c_{jt}(1 + f_j) \text{ or } b^c_{jt}(1 - f_j) > a^c_{it}(1 + f_i) \\ 0 & \text{otherwise} \end{cases} \]
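The x value above is a trailing 5-second sum of unsigned notional volume. A small Python sketch follows, assuming the trades from both exchanges have been merged into one list of (timestamp, price, size) tuples; the names are hypothetical. A helper for locating the samples at which a window opens is included as well.

```python
def prior_volume(trades, t, lookback=5):
    """Sum of price * size for trades with t - lookback <= timestamp < t."""
    return sum(
        price * size
        for timestamp, price, size in trades
        if t - lookback <= timestamp < t
    )


def window_open_times(flags):
    """Indices where an arbitrage opportunity exists but did not at the
    previous sample. flags holds one 0/1 indicator per sample."""
    return [t for t in range(1, len(flags)) if flags[t] and not flags[t - 1]]
```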
However, with the results of the model, we do not find evidence of such a relationship.
The fitted coefficients of zero for x suggest that trading volume in the preceding interval
coef p-value
x 0.0000 0.0000
Table 5.17: Regression results for arbitrage window based on trading volume for BTC
coef p-value
x 0.0000 0.0000
Table 5.18: Regression results for arbitrage window based on trading volume for ETH
coef p-value
x 0.0000 0.0000
Table 5.19: Regression results for arbitrage window based on trading volume for LTC
coef p-value
x 0.0000 0.0000
Table 5.20: Regression results for arbitrage window based on trading volume for BCH
Chapter 6
Future Work
What we have learned through this work, particularly with regard to the existence of long-
term price discrepancies for cryptocurrencies across exchanges that are indicative of avail-
In the currency pair selection phase of this work, we decided to only consider US
Dollar-denominated trading pairs. This was based on the observation that US Dollar-
denominated pairs are typically the highest volume trading pairs and we were optimizing
for data points collected per unit time. In addition, noting the US Dollar’s position as a
commonly used reserve currency made this seem like a natural choice. Given the results
6.2 Longer Sample Interval
The sample we have used in this work has a duration of five days. This seemed reasonable
when our prior belief was that arbitrage opportunities would be closed within seconds.
However, now that we have shown that arbitrage opportunities exist for much longer, even
it would be interesting to explore a much longer sample. With a longer sample, it
would also be possible to note the effects of returns and volatility in the prices of cryp-
tocurrencies.
trage through triangular trading across multiple currency pairs? For example, is it possible
to perform simultaneous trades on a single exchange to buy Bitcoin with US dollars, buy
Litecoin with Bitcoin, and sell Litecoin for US dollars with the end result being no net po-
At the outset of this work, we assumed that this would be impossible, but these assumptions
now appear to be incorrect. First, we believed that cryptocurrency markets, while
still relatively young, were mature enough that arbitrage opportunities would actively be
clear that these are not being exploited. Second, from a casual inspection of the web trad-
ing interfaces of exchanges, we assumed that trades happen frequently enough that there
would be substantial barriers to entry to operate a low enough latency trading system to
engage in these types of arbitrage trades. However, the data collected shows that, without
any consideration toward latency optimization, we were able to build a system to collect
market data with much lower latency than the general frequency of trades. As such, this is
APIs
At the exchange-selection phase of this work, it was discussed how only exchanges with
WebSocket APIs were considered. This was because such interfaces provided stronger as-
surances of continuous data collection and were easier to implement against. It would be
interesting to also collect data from exchanges with less suitable interfaces and determine
The results presented showed significant differences in arbitrage opportunities for the same
currency pairs between different pairs of exchanges. It was suggested based on uncon-
firmed secondhand reports that this exchange also did not provide strong guarantees of
timely or reliable deposits or withdrawals. It would seem intuitive that this unreliability
would add significant friction to achieving price parity across exchanges. It would be in-
teresting to collect data on deposit and withdrawal reliability and latency across many
exchanges and see how these factors affect apparent arbitrage opportunities.
6.6 Comparing arbitrage opportunities with exchange consumer confidence
There are a number of cryptocurrency exchanges available with significant overlap in cur-
rency pairs offered. Given the nascent state of the industry, the relative lack of regula-
tory certainty, and notable events of fraud in the past, it might be the case that would-
be-arbitrageurs see significant risk in doing business with some exchanges. The discount
factor applied due to this risk may outweigh potential profits from what would otherwise
across exchange pairs while noting either confidence in those exchanges or signals thereof
(such as presence of significant funding from reputable investors, public partnerships with
etc.).
Chapter 7
Conclusion
In this work we designed and implemented a system for collecting real-time pricing and
trading data from multiple cryptocurrency exchanges, GDAX, Bitfinex, and Bitstamp. We
collected continuous data streams for order book changes and trades for trading pairs in-
the US Dollar. We showed our system to be capable of handling hundreds of updates per
second and robust to undocumented server behavior and unexpected network conditions.
We implemented a data pipeline for processing the raw data streams into a time-series for-
mat. We developed heuristics to infer missing data points due to exchange server errors.
We normalized the data from the different exchanges to be able to compare them.
We analyzed the resulting data and presented bid/ask spreads and trading volume. We
showed how these vary across the different cryptocurrencies and exchanges. We demon-
strated how the market environment on Bitfinex differs from other exchanges and how this
affected our analysis. We showed that, when excluding Bitfinex, trading pairs involving
cryptocurrencies exhibiting more trading volume and higher market capitalizations were
tween price differences suggesting arbitrage opportunity and several variables relating to
trading activity. We presented evidence from the signed trading volume to show that trad-
ing behavior differs in periods following arbitrage opportunities, though we were only able
to find weak evidence that trading behavior during the seconds following the presence of
pricings.
Appendix A
Within the first few hours of beginning ticker data collection from the three exchanges, we
observe program crashes that result in data collection interruption. We discuss the issues
A.1.1 GDAX
In the first 24 hours of data collection, the stream from GDAX experiences three inter-
ruptions due to program crashes. Given this frequency, it appears to be a recurrent issue
that requires attention. The code is updated to add logging when the connection is closed
or experiences an error (neither of which is expected behavior based on the GDAX API
specification).
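const Gdax = require('gdax');

// Subscribe to the ticker channel for the four USD trading pairs.
const websocket = new Gdax.WebsocketClient(
  ['BTC-USD', 'ETH-USD', 'LTC-USD', 'BCH-USD'],
  'wss://ws-feed.gdax.com',
  null,
  { channels: ['ticker'] }
);

// ...

// Log closes so that unexpected interruptions can be diagnosed.
websocket.on('close', () => {
  console.error('close');
});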
Listing A.1: JavaScript code for reading from GDAX ticker update stream with logging
With this logging, it is determined that these WebSocket connections are being
with ECONNRESET errors which suggests that, on occasion, the connection is being closed
forcibly without respect to the WebSocket protocol. This also appears to be similar to
problematic behavior that others note experiencing with the GDAX API [53, 44]. We
const Gdax = require('gdax');

// ...

// Reconnect automatically whenever the connection closes.
websocket.on('close', () => {
  websocket.connect();
});
Listing A.2: Final JavaScript code for reading from GDAX ticker update stream
This approach works in all cases except for GDAX’s planned outages for maintenance.
Since these planned outages are infrequent and also result in temporary changes to trad-
ing rules, the sample we use for analysis is taken from a time period that does not contain
A.1.2 Bitfinex
The connection with Bitfinex also experienced similar recurrent, unexpected crashes. We
const BFX = require('bitfinex-api-node');

// ...

const ws = bfx.ws(1);

// Subscribe to ticker updates for the four USD trading pairs once connected.
ws.on('open', () => {
  ws.subscribeTicker('BTCUSD');
  ws.subscribeTicker('ETHUSD');
  ws.subscribeTicker('LTCUSD');
  ws.subscribeTicker('BCHUSD');
});

// ...

// Log closes so that unexpected interruptions can be diagnosed.
ws.on('close', () => {
  console.error('close');
});

// ...

ws.open();
Listing A.3: JavaScript code for reading from Bitfinex ticker update stream with logging
This logging reveals three types of behavior that cause crashes. Two of the types are
similar to the issues with GDAX: WebSocket closes without warning, and ECONNRESET errors
where the underlying connection is forcibly closed. The most frequent type of halt, though,
occurs when the server closes the WebSocket connection after sending an info message
informing the client of an impending server stop and requesting that the client reconnect.
In light of these discoveries, we employ a similar mitigation strategy of automatically
reconnecting upon close. However, this needs to be adjusted slightly to include a
500 millisecond delay between close and reconnect to make this work
// ...

const ws = bfx.ws(1);

// Subscribe to ticker updates for the four USD trading pairs once connected.
ws.on('open', () => {
  ws.subscribeTicker('BTCUSD');
  ws.subscribeTicker('ETHUSD');
  ws.subscribeTicker('LTCUSD');
  ws.subscribeTicker('BCHUSD');
});

// ...

// Reopen the connection 500 ms after any close.
ws.on('close', () => {
  setTimeout(() => {
    ws.open();
  }, 500);
});

ws.open();
Listing A.4: Final JavaScript code for reading from Bitfinex ticker update stream
With these changes, we are able to maintain continuous data collection from Bitfinex.
A.1.3 Bitstamp
Although there are no issues with data collection from Bitstamp in the same time frame,
we deem it prudent to investigate similar mitigation strategies in the event that spurious
disconnection issues arise in the future. However, unlike GDAX and Bitfinex, Bitstamp’s
WebSocket API is provided through a third-party company called Pusher that offers its
own publish/subscribe messaging service with an abstraction level and library on top of
native WebSockets. We can see this by examining the differences in code between what is
used for Bitstamp and what is used for the other exchanges. The result is that it is not
possible to monitor for WebSocket closures or errors in the same way it is in the other
cases. While this initially seems worrisome, we eventually observe that the Bitstamp code
To reconstruct the state of the order book at each update time, we first attempt to use the
def process_update(update):
    global bestbid, bestask

    if update['data']['amount'] > 0:
        if update['data']['count'] == 0:
            del bidsdict[update['data']['price']]
        else:
            bidsdict[update['data']['price']] = {
                'count': update['data']['count'],
                'amount': update['data']['amount'],
            }
        if bidsdict:
            curbestbid = max(bidsdict)
            if curbestbid != bestbid['price']:
                bestbid = {
                    'price': curbestbid,
                    'amount': bidsdict[curbestbid]['amount'],
                    'count': bidsdict[curbestbid]['count'],
                }
                output.append({
                    'date': update['date'],
                    'pair': update['pair'],
                    'best_bid': bestbid,
                    'best_ask': bestask,
                })
        else:
            bestbid = {
                'price': -math.inf,
                'date': update['date'],
            }
    elif update['data']['amount'] < 0:
        if update['data']['count'] == 0:
            del asksdict[update['data']['price']]
        else:
            asksdict[update['data']['price']] = {
                'count': update['data']['count'],
                'amount': update['data']['amount'],
            }
        if asksdict:
            curbestask = min(asksdict)
            if curbestask != bestask['price']:
                bestask = {
                    'price': curbestask,
                    'amount': asksdict[curbestask]['amount'],
                    'count': asksdict[curbestask]['count'],
                }
                output.append({
                    'date': update['date'],
                    'pair': update['pair'],
                    'best_bid': bestbid,
                    'best_ask': bestask,
                })
        else:
            bestask = {
                'price': math.inf,
                'date': update['date'],
            }
However, this code does not work on the data stream that we collect. The first error that
we encounter is an update indicating that a price level should be removed from the order
book even though that price level is not present in the state of the order book immediately
prior to receiving the update. At first, the assumption is that there is a bug in the code
processing the data stream, but examining all of the recorded data between the beginning of
the stream and the problematic update reveals that the price level in question has never
been in use. At this point, with the assumption that this is a single error, we modify the
code to ignore this particular update. However, proceeding further, we eventually encounter
another such invalid update referring to a deletion for a nonexistent price level. This
requires us to change the code to handle this type of error in the general case as follows:
def process_update(update):
    global bestbid, bestask

    if update['data']['amount'] > 0:
        if update['data']['count'] == 0:
            if update['data']['price'] in bidsdict:
                del bidsdict[update['data']['price']]
        else:
            bidsdict[update['data']['price']] = {
                'count': update['data']['count'],
                'amount': update['data']['amount'],
            }
        if bidsdict:
            curbestbid = max(bidsdict)
            if curbestbid != bestbid['price']:
                bestbid = {
                    'price': curbestbid,
                    'amount': bidsdict[curbestbid]['amount'],
                    'count': bidsdict[curbestbid]['count'],
                }
                output.append({
                    'date': update['date'],
                    'pair': update['pair'],
                    'best_bid': bestbid,
                    'best_ask': bestask,
                })
        else:
            bestbid = {
                'price': -math.inf,
                'date': update['date'],
            }
    elif update['data']['amount'] < 0:
        if update['data']['count'] == 0:
            if update['data']['price'] in asksdict:
                del asksdict[update['data']['price']]
        else:
            asksdict[update['data']['price']] = {
                'count': update['data']['count'],
                'amount': update['data']['amount'],
            }
        if asksdict:
            curbestask = min(asksdict)
            if curbestask != bestask['price']:
                bestask = {
                    'price': curbestask,
                    'amount': asksdict[curbestask]['amount'],
                    'count': asksdict[curbestask]['count'],
                }
                output.append({
                    'date': update['date'],
                    'pair': update['pair'],
                    'best_bid': bestbid,
                    'best_ask': bestask,
                })
        else:
            bestask = {
                'price': math.inf,
                'date': update['date'],
            }
Listing A.6: Updated Python code to process Bitfinex Order Book updates
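The membership guard in Listing A.6 could also be wrapped in a small helper that reports
whether a delete matched an existing price level, which would make it straightforward to
count how often such invalid updates occur. The following is a sketch only; `remove_level`
and `spurious_deletes` are hypothetical names that do not appear in the listings, and the
sample prices are made up for illustration.

```python
def remove_level(book, price):
    """Delete price from book if present; return True if it existed."""
    # dict.pop with a default never raises KeyError for a missing key.
    # Book values are dicts (never None), so None reliably means "absent".
    return book.pop(price, None) is not None


# Illustrative data: one resting bid level, then two delete requests.
bids = {6500.0: {'count': 2, 'amount': 1.5}}
spurious_deletes = 0
for price in (6500.0, 6499.0):  # the second price was never in the book
    if not remove_level(bids, price):
        spurious_deletes += 1
```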
It is unclear whether we see these invalid updates as a result of missing previous additions
to the order book at the relevant price level or if the deletes are spuriously inserted into
With these modifications in place, it is possible to run this code over the entire Bitfinex
order book dataset to compute the best bid and best ask prices at each point in time.
However, while confirming the validity of this output, we note that there are instances
where the best bid is greater than the best ask even though this should not be possible.
Further analysis reveals that this condition exists for hours at a time continuously. Upon
inspection of the raw data updates from Bitfinex, it appears that this is happening be-
cause a single price level remains on the bid side of the order book that is much higher
than the rest (and higher than many, if not all, of the ask price levels present). We assume
that this is caused by a missed delete message for that price level. A manual inspection of
the recorded data stream shows that no such update is present. On the chance that this
is a rare error, we attempt inserting such a delete into the stream, but this only results in
analogous issues later in the data stream. We therefore determine that this is a pervasive
issue likely caused by bugs with Bitfinex that we need to account for.
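A validity check of this kind can be sketched as follows, assuming output records shaped like
those appended in Listing A.6 and timestamps already converted to numeric seconds. The
function name `crossed_intervals` and the numeric `date` field are assumptions made for
illustration. The check flags any record where the best bid is greater than or equal to the
best ask.

```python
def crossed_intervals(records):
    """Return (start, end) pairs during which best bid >= best ask.

    records is an iterable of dicts with a numeric 'date' (seconds) and
    'best_bid' / 'best_ask' dicts that each carry a 'price'. An interval
    still open when the data ends is closed at the last observed date.
    """
    intervals = []
    start = None
    last_date = None
    for rec in records:
        last_date = rec['date']
        crossed = rec['best_bid']['price'] >= rec['best_ask']['price']
        if crossed and start is None:
            # The book has just become crossed.
            start = rec['date']
        elif not crossed and start is not None:
            # The book has just returned to a valid state.
            intervals.append((start, rec['date']))
            start = None
    if start is not None:
        intervals.append((start, last_date))
    return intervals
```

Summing `end - start` over the returned pairs gives the total time the reconstructed book
spends in an impossible state, which is one way to see whether the problem is transient or
persists for hours.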
Dealing with this issue is more complex than the case of a delete for a record that does
not exist, as discussed previously. In this case, we need to formulate a heuristic to
determine when an update should have been sent even though we receive no such update. The
first step in our approach is to assume that the invariant of the best bid being strictly
less than the best ask should always hold. When the invariant is violated, we assume that
the newer update (bid or ask price level update) is valid and that there has been a missed
update deleting the older of the two price levels. However, in our manual inspection in the
previous phase, we note that there are often cases in which both sides of the order book
visible to us (25 price levels in each direction) are cleared in a quick succession of
updates with the same timestamp. Timestamps are recorded to the nearest millisecond, which
is to say that we receive up to 50 updates within one millisecond. Since this data
restoration involves inferring missed data points, we deem it desirable to minimize the
number of such inferred data points we add to the stream. It therefore seems best to
process updates we receive together in batch before testing the invariant and
We first employ this method to batch updates with the same timestamp. After this is done,
we analyze the amount of time between successive updates. We determine that the most common
delta between successive updates is 0 ms. The second most common delta is 1 ms, and 2 ms is
third. When we examine the logs around instances of these small deltas, it appears as
though these cases represent a single logical update from the Bitfinex servers, as we
observe both sides of the order book being replaced over the course of a few milliseconds.
Based on these observations, we decide to process a stream of updates as a single batch as
long as the time delta between successive updates does not exceed 5 ms. With this threshold,
it is possible to maintain our invariant with an average of fewer than one inferred update
per million updates processed.
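The batching rule and the invariant check described above can be sketched as follows. This is
a simplified illustration rather than the code used to produce our results: it assumes flat
update records with an integer `time_ms` field, tracks only the time each price level was
last updated, and uses the hypothetical names `batch_by_gap` and `apply_batch`. When a bid
and an ask level carry the same update time, the sketch arbitrarily removes the bid.

```python
def batch_by_gap(updates, max_gap_ms=5):
    """Group consecutive updates whose spacing does not exceed max_gap_ms."""
    batch = []
    for update in updates:
        # Compare against the previous update, not the start of the batch.
        if batch and update['time_ms'] - batch[-1]['time_ms'] > max_gap_ms:
            yield batch
            batch = []
        batch.append(update)
    if batch:
        yield batch


def apply_batch(bids, asks, batch):
    """Apply one batch, then delete stale levels until best bid < best ask.

    bids and asks map price -> time_ms of the last update for that level.
    Returns the number of inferred (missed) deletes.
    """
    for u in batch:
        # Positive amounts are bids and negative amounts are asks.
        book = bids if u['amount'] > 0 else asks
        if u['count'] == 0:
            book.pop(u['price'], None)  # tolerate deletes for absent levels
        else:
            book[u['price']] = u['time_ms']
    inferred = 0
    while bids and asks and max(bids) >= min(asks):
        best_bid, best_ask = max(bids), min(asks)
        # Assume the more recently updated level is valid and that the
        # delete for the older level was missed (ties remove the bid).
        if bids[best_bid] <= asks[best_ask]:
            del bids[best_bid]
        else:
            del asks[best_ask]
        inferred += 1
    return inferred
```

A caller would start from empty `bids` and `asks` dictionaries and sum the return values of
`apply_batch` over `batch_by_gap(updates)` to obtain the total number of inferred deletes.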
As we develop these data cleansing techniques, testing each new iteration of code over
even just one day’s worth of updates for a single currency pair can take hours. This is
that the most costly computation step is parsing the timestamp embedded in each up-
date. This becomes necessary when we decide to use the time deltas between successive
updates to determine batching. To make our processing more efficient, we use separate
code to handle this timestamp parsing and compute the relevant time deltas as shown.
import dateutil.parser
import json
import sys

last_date = None
last_date_str = ''

    date = dateutil.parser.parse(date_str)
    if last_date is not None:
        diff = date - last_date
        print(diff.total_seconds())

    last_date_str = date_str
    last_date = date
Listing A.7: Python code to pre-process time deltas between Bitfinex Order Book updates
We then read these time deltas in parallel when processing the data stream so that we can
avoid timestamp parsing in this step. In all, we are able to consolidate more than 1 mil-
lion updates per day per currency pair to approximately 100,000 updates per day per cur-
rency pair. We are then able to output best bid and best ask prices at each update time
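The two-pass idea can be illustrated with the self-contained sketch below. It is not the code
from Listing A.7: it assumes ISO 8601 timestamp strings without a time zone suffix and uses
the standard library's `datetime.fromisoformat` in place of `dateutil`. The function names
`time_deltas` and `with_deltas` are hypothetical, and skipping the parse for a repeated
timestamp string is an assumption about how such repeats might be handled.

```python
import itertools
from datetime import datetime


def time_deltas(date_strings):
    """Yield seconds elapsed between each timestamp and the previous one.

    Nothing is yielded for the first timestamp. A string identical to the
    previous one gets a delta of 0.0 without being parsed again.
    """
    last_str = None
    last_date = None
    for date_str in date_strings:
        if date_str == last_str:
            yield 0.0
            continue
        date = datetime.fromisoformat(date_str)
        if last_date is not None:
            yield (date - last_date).total_seconds()
        last_str, last_date = date_str, date


def with_deltas(updates, deltas):
    """Pair every update after the first with its precomputed delta."""
    return zip(itertools.islice(updates, 1, None), deltas)
```

In a second pass, iterating over `with_deltas(updates, deltas)` supplies each update together
with the time since its predecessor, so the batching step never needs to parse a timestamp.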
Bibliography
[5] Utpal Bhattacharya, Craig W Holden, and Stacey Jacobsen. Penny wise, dollar
foolish: Buy–sell imbalances on and around round numbers. Management Science,
58(2):413–431, 2012.
[9] Russell Brandom and Sarah Jeong. Why the feds took down one of Bit-
coin’s largest exchanges. https://www.theverge.com/2017/7/29/16060344/
btce-bitcoin-exchange-takedown-mt-gox-theft-law-enforcement, 2017.
[11] Evelyn Cheng. Bitcoin debuts on the world’s largest futures ex-
change, and prices fall slightly. https://www.cnbc.com/2017/12/17/
149
worlds-largest-futures-exchange-set-to-launch-bitcoin-futures-sunday-night.
html, 2017.
[12] Evelyn Cheng. Bitcoin exchange Coinbase has more users than
stock brokerage Schwab. https://www.cnbc.com/2017/11/27/
bitcoin-exchange-coinbase-has-more-users-than-stock-brokerage-schwab.
html, 2017.
[17] John Detrixhe. A South Korean bitcoin exchange has filed for
bankruptcy after being hacked again. https://qz.com/1160573/
bitcoin-exchange-youbit-files-for-bankruptcy-in-south-korea-after-latest-hack/,
2017.
[18] John Detrixhe and Joon Ian Wong. Bitcoin could fall below $5,000 if this
report on a mysterious cryptotoken is right. https://qz.com/1196866/
bitcoin-prices-could-be-40-lower-because-tether-propped-it-up/, 2018.
[19] Ittay Eyal and Emin Gün Sirer. Majority is not enough: Bitcoin mining is vulnerable.
In International conference on financial cryptography and data security, pages 436–
454. Springer, 2014.
[20] Stephen Gandel. Bitcoin’s Price Isn’t Always What You Think It
Is. https://www.bloomberg.com/gadfly/articles/2017-12-08/
bitcoin-s-price-isn-t-always-what-you-think-it-is, 2017.
150
[22] John M Griffin and Amin Shams. Is Bitcoin Really Un-Tethered? 2018.
[24] Ethan Heilman, Neha Narula, Thaddeus Dryja, and Madars Virza. IOTA Vulner-
ability Report: Cryptanalysis of the Curl Hash Function Enabling Practical Signa-
ture Forgery Attacks on the IOTA Cryptocurrency. https://github.com/mit-dci/
tangled-curl/blob/master/vuln-iota.md, 2017.
[25] Stan Higgins. Cryptsy Threatens Bankruptcy, Claims Millions Lost in Bitcoin Heist.
https://www.coindesk.com/cryptsy-bankruptcy-millions-bitcoin-stolen/,
2016.
[26] Stan Higgins. $300 Billion: Bitcoin Price Boosts Crypto Mar-
ket Value to Record High. https://www.coindesk.com/
300-billion-bitcoin-price-boosts-crypto-market-value-record-high/,
2017.
[27] Stan Higgins. As Bitcoin Soars, Prices Diverge Wildly Across Exchanges. https://
www.coindesk.com/bitcoin-soars-prices-diverge-wildly-across-exchanges/,
2017.
[32] Coinbase Inc. You Can Now Buy And Sell Bitcoin By Connect-
ing Any U.S. Bank Account. https://blog.coinbase.com/
you-can-now-buy-and-sell-bitcoin-by-connecting-any-u-s-bank-account-72457ab182c5,
2012.
[34] KGO Television Inc. Bitcoin expert explains the cryptocurrency. http://abc7news.
com/finance/bitcoin-expert-explains-the-cryptocurrency/2801193/, 2017.
[35] Arjun Kharpal. All you need to know about tether, the cryptocurrency that could
have ’devastating’ effects on the market. https://www.cnbc.com/2018/02/02/
tether-what-you-need-to-know-about-the-cryptocurrency-worrying-markets.
html, 2018.
151
[36] Arjun Kharpal. Bitcoin’s dominance of the cryptocurrency mar-
ket is at its lowest level ever. https://www.cnbc.com/2018/01/02/
bitcoin-dominance-of-cryptocurrency-market-lowest-level-ever.html, 2018.
[37] Nejc Kodrič. www.BITSTAMP.net Bitcoin exchange site for USD/BTC. https:
//bitcointalk.org/index.php?topic=38711.0, 2011.
[38] Timothy B. Lee. A brief history of Bitcoin hacks and frauds. https://arstechnica.
com/tech-policy/2017/12/a-brief-history-of-bitcoin-hacks-and-frauds/,
2017.
[39] Timothy B. Lee. Skyrocketing fees are fundamentally changing bitcoin. https://
arstechnica.com/tech-policy/2017/12/bitcoin-fees-rising-high/, 2017.
[40] Timothy B. Lee. Why experts are worried about Tether, a dollar-pegged
cryptocurrency. https://arstechnica.com/tech-policy/2018/02/
tether-says-its-cryptocurrency-is-worth-2-billion-but-its-audit-failed/,
2018.
[45] Igor Makarov and Antoinette Schoar. Trading and Arbitrage in Cryptocurrency Mar-
kets. 2018.
[46] Robert McMillan. The Inside Story of Mt. Gox, Bitcoin’s $460 Million Disaster.
https://www.wired.com/2014/03/bitcoin-exchange/, 2014.
[48] Arvind Narayanan, Joseph Bonneau, Edward Felten, Andrew Miller, and Steven
Goldfeder. Bitcoin and Cryptocurrency Technologies: A Comprehensive Introduction.
Princeton University Press, 2016.
[49] BBC News. China orders Bitcoin exchanges in capital city to close. https://www.
bbc.com/news/business-41320568, 2017.
[50] Maureen O’Hara. High-frequency trading and its impact on markets. Financial Ana-
lysts Journal, 69(2), 2013.
152
[51] Robt Price. One of the world’s biggest bitcoin exchanges
has been hacked. http://www.businessinsider.com/
south-korean-bitcoin-exchange-bithumb-hacked-ethereum-2017-7, 2017.
[52] Kenneth Rapoza. Good Luck Buying Bitcoin In India As Central Banker
Bans. https://www.forbes.com/sites/kenrapoza/2018/04/05/
good-luck-buying-bitcoin-in-india-as-central-banker-bans/, 2018.
[54] Fergal Reid and Martin Harrigan. An analysis of anonymity in the bitcoin system. In
Security and privacy in social networks, pages 197–223. Springer, 2013.
[55] Dorit Ron and Adi Shamir. Quantitative analysis of the full bitcoin transaction
graph. In International Conference on Financial Cryptography and Data Security,
pages 6–24. Springer, 2013.
[58] Robert J Shiller. Irrational exuberance: Revised and expanded third edition. Princeton
university press, 2015.
[59] Lauren Shin. Bitstamp Becomes First Nationally Licensed Bitcoin Exchange; License
Applies In 28 EU Countries. https://www.forbes.com/sites/laurashin/2016/04/
25/7886/, 2016.
[61] Hugh Son, Dakin Campbell, and Sonali Basak. Goldman Is Setting Up a Cryptocur-
rency Trading Desk. https://www.bloomberg.com/news/articles/2017-12-21/
goldman-is-said-to-be-building-a-cryptocurrency-trading-desk, 2017.
153
[64] Swartzcenter. HAS ANYONE BEEN ABLE TO WITHDRAW FROM BITFINEX?
https://www.reddit.com/r/bitfinex/comments/7g92u8/has_anyone_been_able_
to_withdraw_from_bitfinex/, 2017.
[65] Neer Varshney. This hacker made $120K in a week by finding bugs in EOS cryptocur-
rency. https://github.com/mit-dci/tangled-curl/blob/master/vuln-iota.md,
2018.
[66] Gavin Wood. Ethereum: A secure decentralised generalised transaction ledger, 2014.
154