You are on page 1of 9

Sequence-Based Target Coin Prediction for Cryptocurrency

Pump-and-Dump
Sihao Hu1 , Zhen Zhang1 , Shengliang Lu1 , Bingsheng He1 , Zhao Li2
1 National
University of Singapore, 2 Zhejiang University
{sihao.hu,zhen,lusl,dcsheb}@nus.edu.sg,zhao_li@zju.edu.cn

ABSTRACT
As the pump-and-dump schemes (P&Ds) proliferate in the cryp-
tocurrency market, it becomes imperative to detect such fraudulent
arXiv:2204.12929v1 [q-fin.ST] 21 Apr 2022

activities in advance, to inform potentially susceptible investors


before they become victims.
In this paper, we focus on the target coin prediction task, i.e.,
to predict the pump probability of all coins listed in the target ex-
change before a pump. We conduct a comprehensive study of the
latest P&Ds, investigate 709 events organized in Telegram chan-
nels from Jan. 2019 to Jan. 2022, and unearth some abnormal yet Figure 1: The $NAS pump-and-dump coordinated by the
interesting patterns of P&Ds. Empirical analysis demonstrates that Telegram channel "Binance Trading Signals".
pumped coins exhibit intra-channel homogeneity and inter-channel As P&Ds occur frequently and attract more public attention,
heterogeneity, which inspires us to develop a novel sequence-based it becomes imperative to detect such fraudulent activities[13, 27]
neural network named SNN. Specifically, SNN encodes each chan- and inform potentially susceptible investors before they become
nel’s pump history as a sequence representation via a positional victims. Existing efforts[14–16, 20] mainly fall into the category
attention mechanism, which filters useful information and allevi- of P&D post-detection, which aims to detect whether a P&D has
ates the noise introduced when the sequence length is long. We happened or is happening. However, we argue that this paradigm
also identify and address the coin-side cold-start problem in a prac- fails to meet the practical needs since the coin price can easily reach
tical setting. Extensive experiments show a lift of 1.6% AUC and its peak in less than one minute after a P&D begins, thus leaving
41.0% Hit Ratio@3 brought by our method, making it well-suited no time to alert investors. In this paper, we focus on the target
for real-world application. As a side contribution, we release the coin prediction task, i.e., to predict the pump likelihood of all coins
source code of our entire data science pipeline on GitHub1 , along listed in the exchange one hour before a P&D begins, given the
with the dataset tailored for studying the latest P&Ds. scheduled time and target exchange announced in advance. The
target coin prediction task is a challenging and interesting data
1 INTRODUCTION science problem. First, in order to achieve accurate prediction, we
Pump-and-dump (P&D) is a manipulative scheme that attempt to rely on heterogeneous data sources, including structure data such
boost an asset price before selling the cheaply purchased assets at as the historical statistics for trading price and volume, and text
a higher price. Although it originates from the stock market[23], data that collected from Telegram. Different data sources requires
the burgeoning popularity of cryptocurrencies[5, 10, 24] has re- different tools and algorithms for extracting the useful and timely
sulted in the proliferation of P&Ds within the industry. The event information for the target coin prediction task. Second, we need to
of organizing cryptocurrency P&Ds mainly took place in encrypted have an efficient data science pipeline in order to timely alert the
messaging platforms such as Telegram. Organizers create pump potential coins targeted by P&Ds.
channels to attract speculators, where only organizers can post mes- To this end, we propose an efficient data science pipeline and
sages for other members to read. For a planned pump, the organizer present it in Figure 2, which mainly consists of two parts: data col-
posts an announcement several days in advance, which clearly lection and target coin prediction. The data collection module aims
states the target date, time, and exchange. As the time approaches to find out historical P&Ds as many as possible. It explores pump
the pump time, the organizers use a countdown strategy to remind channels based on verified seed pump channels, gathers tremendous
members intermittently, not disclosing the target coin until the text messages from Telegram, and identifies useful pump messages
scheduled time. As demonstrated in Figure 1, a P&D coordinated by keyword filtering strategy and effective machine learning al-
by the Telegram channel "Binance Trading Signals" (a channel with gorithms. The data collection module works in an offline manner,
more than 300,000 subscribers) took place in Binance on August maintains a P&D dataset, and updates it on a regular basis. The
22, 2021, at 17:00 UTC. After the coin NAS was disclosed, the price target coin prediction module aims to make effective and efficient
inflated immediately and reached 250% in one minute, and then predictions given the pump messages identified in the data collec-
dropped shortly afterward. Although the whole process lasted only tion stage. It collects multi-modal features from heterogeneous data
five minutes, a staggering wealth of around 27 million dollars was sources, including market statistics for trading price and volume,
transferred between the participants. and coin id embeddings pre-trained on a large Telegram corpus.
Taking the above as the input, the prediction model outputs the
1 https://github.com/Bayi-Hu/Pump-and-Dump-Detection-on-Cryptocurrency pump probability of all coins one hour before the scheduled pump
Conference’17, July 2017, Washington, DC, USA Hu et al.

Figure 2: The data science pipeline that consists of two parts: data collection and target coin prediction.
time. The target coin prediction module can achieve a real-time Contributions: To summarize, the contributions are as follows:
response to ensure the timeliness of the whole pipeline. • Pipeline. We build a data science pipeline that extracts useful and
Empirically, our data pipeline filters 709 P&Ds out of 4,674,822 timely information from heterogeneous data sources, to support
Telegram messages posted from Jan. 2019 to Jan. 2022. In order effective and efficient target coin prediction.
to unearth some useful signals and characteristics of P&Ds to im- • Analysis. We present a comprehensive study on the latest P&Ds,
prove our prediction model, we present a comprehensive study and and uncover some interesting characteristics of P&Ds such as
discover that: 1) Coins with middle-cap value and high discussion intra-channel homogeneity and inter-channel heterogeneity.
degree in social media and forums are more likely to be targeted, • Model. We propose the SNN model for target coin prediction
indicating that the choice of the coin is well thought out; 2) The for the first time, which significantly improves the performance
buy-in behavior of organizers starts within 60 hours before the of the task. We also firstly identify and address the coin-side
scheduled pump time, and the pre-pump phenomenon confirms the cold-start problem in a practical setting.
existence of the hierarchy of Telegram channels. Precursors like the
Additionally, we open-source the entire data science pipeline to
boost in price and trading volume caused by insiders (organizers
facilitate the community’s research on the latest P&Ds.
and VIP members) can be a powerful signal for detecting P&D coins
in advance; 3) Coins pumped by the same channel exhibit homo- 2 A TYPICAL PUMP-AND-DUMP PROCESS
geneity, not only in terms of statistics like market cap, but also in Set up: A successful pump requires a large number of participants
semantics. In contrast, different channels exhibit heterogeneity in to buy the same cryptocoin at the same time. To this end, the
selection tactics. organizer would construct a public channel, which is a specific
Analysis above sheds light on the design of our prediction method. form of group chat that only allows the organizer to post messages.
There exists one recent work focusing on target coin prediction[26], Telegram is the most popular choice due to its encryption and
which regards P&Ds as isolated, unrelated events. Nevertheless, anonymity. Then the organizer recruits as many subscribers as
according to our analysis, there is some relationship between coins possible by posting invitation links on social media or forums like
pumped by the same channel, i.e., a channel’s pump history rep- Reddit and Twitter, and other pump channels for advertisement.
resents a specific selection pattern, which somehow reflects the Moreover, we also notice that many pump organizers create another
future target coin. Motivated by this, for each sample, we construct private VIP channels, which only accept paid members to subscribe.
its channel’s pump history as a sequence, and extract a sequence In return, VIP members will be informed of the target coin name
embedding to represent its selection pattern by an elaborately de- minutes or even hours before the coin name is released on the
signed sequence-based deep neural network (SNN). A positional public channel.
attention mechanism is proposed to filter useful information out of Pump Announcement: When the public channel gets enough
the sequence and alleviate the noise introduced when the sequence members, the pump organizer releases a pump announcement sev-
length is quite long. We also present a solution for addressing the eral days before the official pump time. The announcement usually
coin-side cold start problem in a practical setting. To our best knowl- includes three important information: the precise time to initiate
edge, the deep-learning technique is introduced in this task for the the pump, the target exchange, and the pairing coin. Then as the
first time. We conduct comprehensive experiments on a real-world named pump time approaches, organizers keep reminding mem-
dataset, and observe a lift of 1.6% AUC and 41.0% HR@3, showing bers of the pump event by counting down. The counting frequency
the effectiveness of our method. The overall method achieves 0.943 gradually increases from once a day to once a few hours, and then
AUC, 43.0% Hit Ratio@3, and 53.2% Hit Ratio@5, suggesting our once a few minutes before the pump time. During this process, the
method is well-suited for the real-world application. organizers advise members to transfer sufficient pairing coin into
Sequence-Based Target Coin Prediction for Cryptocurrency Pump-and-Dump Conference’17, July 2017, Washington, DC, USA

the target exchange and post some "pump rules" like "Buy fast," "Do Table 1: Pump Message Detection Performance
not sell immediately," "Spread the word out over the social media Model AUC Precision Recall F1
to make pump last as long as possible," etc. LR 0.988 0.892 0.913 0.902
Pre-pump: As mentioned above, VIP members know the target RF 0.994 0.901 0.939 0.920
coin name in advance, so they can buy coins at a lower price and
Table 2: P&D Dataset Statics
wait to sell them at a higher price at the pump time. If VIP members
buy a large volume of coins in a short time, it can lead to a price # Sample # Event # Channel # Coin # Exchange

hike before the pump time, which we call "pre-pump." 1,335 709 108 278 18
Pump: Upon just a few minutes before the pre-arranged pump The workflow of our pump message detection is presented in
time, the organizers will end the countdown with a typical message Figure 2: Firstly, we use a keyword matching strategy to increase the
like: "The next message will be the coin name!" And then, the coin proportion of pump messages: if a message mentions any coin name
name will be posted at the pump time, generally the uppercase or exchange name or keywords such as "pump," "target," "hold," "sell,"
symbol name of the coin, like "FIC,". Then the coin price surges etc., then it is reserved, otherwise is filtered out. This simple but
immediately just in one minute, and sometimes it can reach several useful strategy selects 2,193,549 messages out of 4,674,822 messages.
times the original price. Secondly, we randomly label 5,050 messages as pump and not-
Dump: Though the organizer urges members to hold the coin and pump messages. Positive samples include pump announcements,
do not sell it immediately, participants all know that the price will pump countdowns, target coin releases, and post-pump reviews. We
immediately get down, and they do not want to sell later than remove punctuations, stop words, URLs, and emojis in the message
others. So just a few moments after the price reaches its peak, it and represent each message as a TF-IDF vector. We use 70% labeled
starts to drop, which in turn leads to more participants panic-selling. samples to train the Random Forest (RF) and the Logistic Regression
Eventually, the coin price will bounce back to the original value (LR) model, then test them on the 30% left samples. The results
before the P&D or even lower. are presented in Table 1, where precision, recall, and F1 score are
Post-pump Review: Usually, within a few hours after the P&D, the calculated based on a relatively low threshold of 0.2 to identify
organizer reviews the pump results in the channel, which mainly as many pump messages as possible. Considering a single pump
includes the trading volume, the start price, and the peak price, to message is not enough to give out the complete information of a
tout how much the coin price was increased by the pump. We also P&D, we aggregate messages from the same channel into sessions
notice that some organizers did not post the review as the pump if the time interval between two adjacent messages is less than 24
did not reach their expected levels. hours. In this case, a session is the minimum time unit for a channel
3 DATA COLLECTION to hold a P&D, and we assume that a P&D occurs at most once in a
session (which is verified by us manually). Finally, we get 1,335 P&D
3.1 Channel Exploration samples out of 2,006 sessions based on the 37,525 pump messages
We focus on exploring the pump channels in Telegram, as it is the identified by the RF model, and present the statistics in Table 2,
biggest platform for P&Ds to thrive[21, 26]. Specifically, we adopt where a sample is presented in a quintuple form, i.e., (channel_id,
the snowball strategy[21] to explore as many channels as possible, timestamp, exchange, pairing coin, target coin).
starting from a known seed of pump channels. This is based on
an observation that channel admins post invitation links across 4 PUMP-AND-DUMP SCHEMES ANALYSIS
other channels to attract more subscribers to participate in P&Ds. We conduct a series of analysis studies in coin, event, and channel
We collect 1,142 original channels from PumpOlymp[3], a website levels to answer the following research questions:
that provides verified P&D channels in Telegram. Then we use • Q1: What kind of coins are vulnerable to the P&D?
the Telethon API[4] to check their status. Among those channels, • Q2: Can P&Ds be pre-detected in advance?
nearly half (564/1,142) have been deleted due to inactivity for over • Q3: Do the selection strategies vary from channel to channel?
6 months. We retrieve messages and extract the invitation links
from the left 578 channels. As a result, we get 750 new invitation 4.1 Coin-Level Analysis (Q1)
links following this strategy. After filtering out the expired links To investigate Q1, we use the CoinGecko[2] API to retrieve coins’
and links to group chats or individuals, we get 137 channels that historical daily information like market capitalization, volume data,
are still inactive and not duplicated with the seed channels. We and social media indices, etc. For coins that are pumped, we collect
retrieve a total of 4,674,822 messages that are posted on these 715 the data from three days before the pump time, as the data is more
channels between Jan. 1, 2019, and Jan. 13, 2021. stable before P&Ds. For a fair comparison, we randomly retrieve
historical data of the top 4000 coins ranked on the CoinGecko web
3.2 Pump Message Detection from Jan. 1, 2019, to Jan. 13, 2021. Note that almost all pumped
The amount of data we collect is tremendous that manual an- coins are ranked above 4000. Figure 3 demonstrates the distribu-
notation becomes prohibitively expensive and time-consuming. tions of pumped coins and the top 4000 coins on different feature
Generally speaking, most of the Telegram messages are about fields. Due to the space limitation, we only report four features,
cryptocurrency-related news, buy/sell signals, and advertisements i.e., the market cap, Alexa rank (an index represents the global web
that are not related to P&Ds. Fortunately, most of the pump-relevant popularity), the number of Reddit subscribers, and the number of
messages follow certain specific patterns and can be easily recog- Twitter followers. From Figure 3(a) and (b), we observe that the
nized, making them detectable by the machine learning methods. distributions of market cap and Alexa rank of pumped coins are
Conference’17, July 2017, Washington, DC, USA Hu et al.

8.2

8.0

Price in BTC (1e-5)


7.8

7.6

7.4

7.2

7.0

6.8

6.6
-72:00 -66:00 -60:00 -54:00 -48:00 -42:00 -36:00 -30:00 -24:00 -18:00 -12:00 -6:00 0:00 6:00 12:00 18:00 24:00

(a) Average price in BTC over time


1.4

(a) Log market cap in USD (b) Log Alexa rank

Trading Volume (1e6)


1.2

1.0

0.8

0.6

0.4

0.2

0
-72:00 -66:00 -60:00 -54:00 -48:00 -42:00 -36:00 -30:00 -24:00 -18:00 -12:00 -6:00 0:00 6:00 12:00 18:00 24:00

(b) Average trading volume in target coin over time


12 2.00

Trading Volume in EDO (1e5)


10 1.75

Return Rate (%)


1.50
8
1.25
6

(c) Log number of reddit subscribers (d) Log number of twitter followers 4
1.00

0.75

Figure 3: Distributions of pumped coins and top-4000 coins 2 0.50

0.25
0

on four types of feature fields. 72h 60h 48h 36h 24h 12h 6h 3h 1h
0
-12:00 -9:00 -6:00 -3:00 0:00 3:00 6:00 9:00 12:00

(c) Average return within the time window (d) An example of pre-pump
closest to that of top1001-2000 coins, which indicates that P&D from x+1 hours to 1 hour before the pump
organizers prefer to target middle-cap coins. This is because it is Figure 4: The observational study of P&D events.
much more difficult to pump up the prices of large-cap coins[17, 26],
especially for pump channels with a small number of participants. Yobit was established much earlier in 2014. In June 2018, Yobit had
Meanwhile, organizers also seldom pick coins with low market cap 1,128 coins and 6,652 pairs, yet now it is down to 445 coins and
and web popularity (Alexa rank), which normally reflects low trad- 3,589 pairs, a drop of more than 50% in the number of coins. The
ing liquidity. An extreme case is if the market is frozen, there will advantages of Binance in terms of coin number, user volume, and
be no other traders being attracted to get involved in the pump, so commissions have made it become the preferred exchange for pump
wealth transfer will only happen inside the group, which can make organizers. The distribution drift has also brought about a change
common members unwilling to participate in the pump since in in the pattern of P&D attacks. Previous studies find that the average
this case, they most likely lose money. Another observation comes return rate of attacks on Binance is relatively the lowest, about 29%
from From Figure 3(c) and (d), where the number distributions of of Yobit [17, 26], involving an average of 1.42 channels per attack.
Reddit subscribers and Twitter followers are close to that of top1- Since the trading volume on Binance is much larger than on other
1000 coins but with more minor variances, suggesting that pumped exchanges, it requires more participants for a successful P&D. In
coins have the highest discussions on social media. According to comparison, we observe in our dataset that each P&D in Binance
our statistics, the probability of a coin being pumped again is 60.1%, involves 2.25 channels on average, suggesting that pump organizers
so even though we collect the data three days before the pump improve return rates by federating multiple channels.
time, the pumped coins might have already been pumped. After Precursors to P&Ds: As described in Section 2, after a organizer
being firstly pumped, they were widely discussed on social media, decides which coin to pump, he/she gradually buys in and releases
and some members of the pump group will spread misinformation the coin name in the private channel to let VIP members buy. Finally,
about the coin to lure potential victims. the coin name is announced on the public channel, and numerous
common members pump the coin price "to the moon." Before the
4.2 Event-Level Analysis (Q2) scheduled pump time, the purchase behavior of organizer and VIP
Drift of Event Distribution by Exchange: Among the 709 P&Ds members may cause a significant market movement for a middle-
in our dataset, 62.8% (445) took place in Binance, 20.6% (146) in Yobit, cap coin. To study the precursors to P&Ds, we plot the time series
8.7%(62) in Hotbit and 3.0%(21) in Kucoin, which differs significantly of price and trading volume 72 hours before and 24 hours after
from previous literature: Xu et al.[26] report 51% of P&Ds took the pump time in Figure 4(a) and (b). The time series are averaged
place in Cryptopia, 27% in Yobit, and only 17% in Binance from from 445 P&Ds that paired BTC and occurred in Binance. For each
October 2018 to February 2019; Li et al.[17] report 52.6% of P&Ds P&D, we choose the earliest announce time in multiple channels
happened in Bittrex, 32.2% in Yobit, and only 15.2% in Binance as the pump time, and retrieve historical OHLCV (open, high, low,
from May 2017 to August 2018; It is noteworthy that the ratio of close, volume) data minute by minute via Binance API[1]. The most
P&Ds happened in Binance drastically increased (from 15.2% to obvious observation is that the coin price gradually increases even
62.8%) during the past few years. We identify reasons leading to this tens of hours prior to the pump time. From Figure 4(b), we observe
change: the two most obvious reasons are Bittrex banned P&Ds in the start point of buy-in is about 57 hours before the pump time, and
Nov. 2017[6] and Cryptopia fell into liquidation after a severe hack before that point, the averaging trading volume stays at a relatively
in Jan. 2019[7]. In addition, the reversal in the ratio of P&Ds on low level. Correspondingly, the average coin price in Figure 4(a)
Binance and Yobit is also noteworthy. Binance was established in gradually starts to rise. We also notice that several hikes appear
2017, and the number of coins was relatively small at the beginning. from 48 hours to 1 hour before the pump time, which are very
According to statistics[8], there were 150 coins and 382 pairs in Jun. similar to the largest hike of trading volume around the pump time.
2018, rising to 363 coins and 1,428 pairs by Mar. 2022. In comparison, We conjecture these hikes are very likely caused by the pre-pump,
Sequence-Based Target Coin Prediction for Cryptocurrency Pump-and-Dump Conference’17, July 2017, Washington, DC, USA

which is that VIP members buy a large number of coins in a short


time. Figure 4(d) gives a verified pre-pump event in a VIP channel
that happened 5 hours before the pump time. The boost in price
and trading volume can be a powerful signal for us to detect P&Ds
in advance. For example, we calculate the average returns within
Channel index
the time window from 𝑥 + 1 hours to 1 hour before the pump time
(c) Market cap (USD) of pumped coins across channels
(𝑥 = 1, 3, 6, 12, 24, 36, 48, 60, 72). In comparison, we randomly select
coins and timestamps to calculate statistics in the same way. As
shown in Figure 4(c), we notice that the highest average return
reaches 9.5% when 𝑥 = 60, and the average returns of randomly
selected coins are nearly zero.
Channel index
(b) Alexa rank of pumped coins across channels
4.3 Channel-Level Analysis (Q3)
To answer Q3, we investigate coin selection strategies of different
channels w.r.t. several characteristics. Due to the space limitation,
we only report three features. Figure 5(a) presents a scatter plot
of pumped coin’s market cap across different channels. Each dot Channel index
(c) The number of Reddit subscribers of pumped coins across channels
in the figure represents a pumped coin. The X-axis is the channel
index, and the Y-axis is the market capitalization value. Although Figure 5: Scatter plots of three features of pumped coins
the market cap of most of the coins falls into the range of 106 to across different channels.
109 , for a given channel, coins’ market cap is not evenly scattered Same channel
Pumped set
but exhibits homogeneity in a particular range. This observation All coins

indicates the variety in coin selection strategies among different


channels. The reason behind this is simple: The success of a P&D
heavily relies on the participation of channel members. A bigger
channel generally has more choices for coins to pump and tends
to target bigger coins to generate higher trading volume, which is
difficult to pump for those small channels. Figure 5(b) and (c) present Figure 6: Semantic similarity distributions of coin pairs se-
Alexa rank and number of Reddit subscribers of pumped coins, lected under three strategies.
which show similar aggregation effects as Figure 5(a), indicating
that selection strategies vary across different channels.
5 TARGET COIN PREDICTION
Semantic Similarity: Besides verifying a channel’s selection strat- Based on observations in the previous section, we present a sequence-
egy exhibiting the aggregation effects on numeric features, we fur- based neural network to leverage the relationship between a chan-
ther study whether coins pumped by the same channel tend to nel’s historical pumped coins and the target coin. We also identify
share similar semantics. Specifically, we pre-train SkipGram[19] and address the coin-side cold-start problem to further improve the
embeddings on a huge cryptocurrency-related corpus and use the performance.
embedding of the coin symbol to represent that coin. Figure 6 shows 5.1 Feature Generation
cosine similarity distributions of pairs selected under three strate- The features used in our approach can be divided into three cat-
gies: 1. select from the same channel’s pump history; 2. select from egories: channel, target coin, and sequence. For target coin fea-
the pumped coins; 3. select from all coins randomly. Obviously, tures, we use both stable statistics collected three days before
the semantic similarity distribution of coins pumped by the same (e.g., market_cap, Alex_rank, etc.), and market movement signals
channel exhibit not only an aggregation effect (i.e., with a minor 1 hour prior to the pump time. Different ranges of time windows
variance), but also a significantly higher average score (0.92), com- (𝑥 = [1, 3, 6, 12, 24, 48, 60, 72]) are used to calculate statistics such
pared to that of pairs selected from pumped coins (0.80) and all as price, returns vary from lengths of time. For sequence features,
coins (0.72). This empirical study further proves that a channel has we aggregate historically pumped coins based on channel id and
its distinct coin selection pattern. sort them according to pump time to construct the sequence. Basic
Summary: We summarize the key findings: features like coin_id and historical statistics are incorporated for
• A1: Coins with middle-cap value and high discussion degree in each pumped coin in the sequence. For each sample, we constrain
social media and forums are more likely to be targeted. the maximum length to 𝑁 to unify the sequence length, i.e., if a
• A2: P&Ds are predictable as the purchase behavior of organizers sequence is longer than 𝑁 , we truncate its length to 𝑁 ; If a sequence
and VIP members can cause significant market movement before is shorter than 𝑁 , we pad its length to 𝑁 with the null value, which
the scheduled pump time. will be masked in the calculation.
• A3: A channel has its distinct coin selection strategy, which varies
from other channels. Coins selected by the same channel show 5.2 SNN: Sequence-Based DNN
homogeneity in statistics and semantics. The overall structure of the proposed SNN is demonstrated in Fig-
ure 7. The key idea is to generate a refined sequence embedding
Conference’17, July 2017, Washington, DC, USA Hu et al.

Output
field is that attention distributions of a field across positions may
MLP be different from another feature field.
𝑗
For the 𝑗-th feature field of 𝑖-th position ℎ𝑖 , we normalize its
Concat&Flatten
𝑗
corresponding attention score 𝑒 𝛼𝑖 across all the positions of the
𝑗
sequence, and multiply the normalized attention score with ℎ𝑖 :
Sequence Pooling
Í𝑁 𝛼 𝑗 𝑗
𝑗 𝑒 𝑖 · ℎ𝑖
P1 P2 PN ℎ𝑠 = 𝑖=1
Í𝑁 𝛼 𝑗 (5)
𝑖=1 𝑒
𝑖

Concat Concat Concat Concat Concat


𝒉𝑠 = ℎ𝑠1 ⊕ ℎ𝑠2 ⊕ ... ⊕ ℎ𝑠𝐾 (6)
Í𝑁 𝑗
𝛼𝑖
where 𝑖=1 𝑒 is used to normalize the weights related to the 𝑗-th
feature fields across all the positions. We obtain the final sequence
Channel Coin 1 Coin 2 Coin N Target coin representation 𝒉𝑠 by concatenating representations of all feature
Sequence fields as in Eq. 6.
MLP: The embedding and attention layers are mainly based on
Figure 7: Sequence-based Deep Neural Network
linear projections. We adopt several fully connected layers and
to represent the channel’s coin selection pattern. Specifically, SNN activation functions to endow the model with non-linearity and
mainly consists of several parts: interactions between different feature dimensions. The output 𝑦ˆ
Embedding Layer: The input of SNN contains categorical id fea- represents the predicted pump probability:
tures, e.g., channel id and coin id. These id features cannot be
directly input into the model because of their high dimensionality. 𝑦ˆ = MLP (𝒉𝑐 ⊕ 𝒉𝑡 ⊕ 𝒉𝑠 ) (7)
In this case, we adopt embedding techniques to embed the sparse Loss: The objective function used in SNN is the negative log-
features into low-dimensional dense vectors, which significantly likelihood function defined as:
eases computing. To reduce the redundancy of parameters, we make 1 ∑︁
the sequence coin id and target coin id share the same latent space. 𝐿=− (𝑦 log 𝑦ˆ + (1 − 𝑦) log(1 − 𝑦))
ˆ (8)
|D|
The generated embeddings are concatenated with other numeric ˆ ∈D
( 𝑦,𝑦)
features together to generate the overall features for the channel, where D is the training set, and 𝑦 ∈ {1, 0} is the ground-truth label
coins in sequence, and target coin. The channel embedding 𝒉𝑐 and that denotes whether it is a P&D or not.
target coin embedding 𝒉𝑡 can be formalized as:
𝒉𝑐 = ℎ𝑐1 ⊕ ℎ𝑐2 ⊕ ... ⊕ ℎ𝑐𝑋 (1)
5.3 Coin-Side Cold-Start Problem
To be consistent with the real-world application, i.e., to predict
𝒉𝑡 = ℎ𝑡1 ⊕ ℎ𝑡2 ⊕ ... ⊕ ℎ𝑌𝑡 (2) the future target coins, we test our method on a testing set where
𝑗 𝑗 the samples are collected later than the training set. This strategy
where ℎ𝑐 and ℎ𝑡 are the 𝑗-th feature field attached to 𝒉𝑐 and 𝒉𝑡 ,
also prevents future information leakage as sequences of some
respectively. ⊕ is the concatenation operator, 𝑋 and 𝑌 are the num-
latter samples may include label information of some former sam-
ber of feature fields for channel and target coin. The representation
ples. Under such a setting, we observe that the coin-side cold-start
of the 𝑖-th coin in sequence can be formalized as:
problem occurs in an end-to-end training manner. Similar to the
𝒉𝑖 = ℎ𝑖1 ⊕ ℎ𝑖2 ⊕ ... ⊕ ℎ𝑖𝐾 , 𝑖 ∈ {1, 2, ..., 𝑁 } (3) item-side cold start problem in recommendation field[12], the coin-
𝑗 side cold-start problem occurs in two situations: 1. coins pumped
where ℎ𝑖 denotes the 𝑗-th feature field attached to 𝒉𝑖 and 𝐾 is the in the testing set are never be pumped in the training set; 2. coins
number of feature fields for coin in sequence. in the testing set are never exist in the training set. Coins falling
Positional Attention: It is intuitive to believe that the closer the into one of these cases can make the model hard to classify them
two pump events occur, the higher the degree of correlation be- correctly. The underlying reason is that model cannot generate
tween them, i.e., the order of a pumped coin in sequence impacts robust id representations for these coins. Figure 8(a) shows the
the final results. In the NLP field, Transformer[25] uses positional distributions of embedding ℓ1 norm of positive and negative coins
encoding to make the model aware of the order of a token in a in the training set, where two distributions have a pronounced
sentence. Different from that, we propose a positional attention difference; Figure 8(b) illustrates the distributions of embedding ℓ1
mechanism to further refine the input coin embedding. We generate norm of four types of coins in the testing set. Untrain denotes the
a positional attention vector 𝒑𝑖 for coin embedding 𝒉𝑖 as follows: coins never exist in the training set, thus their id embeddings still
remain the same as random initialization. Positive1 represents the
 1 2 𝐾

𝒑𝑖 = 𝑒 𝛼𝑖 , 𝑒 𝛼𝑖 , ..., 𝑒 𝛼𝑖 , 𝑖 ∈ {1, 2, ..., 𝑁 } (4)
positive coins that are pumped in the training set, and Positive2 is
𝑗 the distribution of positive coins that have never been pumped in
where 𝑒 𝛼𝑖 is the attention weights corresponds to the 𝑗-th feature
𝑗 the training set, which shows a similar distribution as the negative
field of 𝒉𝑖 , and 𝛼𝑖 is a learned parameter during training. We use coins. The untrain and positive2 distributions correspond to the two
𝑗 𝑗
𝑒 𝛼𝑖 instead of 𝛼𝑖 to ensure that the value is greater than 0. The situations mentioned above and are the primary factors leading to
reason to independently assign an attention score to each feature the coin-side cold-start problem.
Sequence-Based Target Coin Prediction for Cryptocurrency Pump-and-Dump Conference’17, July 2017, Washington, DC, USA

Table 3: Dataset Description


Pumped? Training Validation Testing Total Ratio
TRUE 548 100 200 948 0.48%
FALSE 106,900 24,666 64,099 195,665 99.52%
Total 107,448 24,766 64,299 196,613 100.0%

Table 4: Performance Comparison on Testing Set


(a) l1 norm of E2E (b) l1 norm of E2E
embedding in training set embedding in testing set Metric RS LR SVM RF DNN SNN𝑉 SNN Imp.
AUC 0.500 0.888 0.906 0.914 0.920 0.927 0.935 1.6%
HR@1 0.003 0.145 0.141 0.167 0.225 0.233 0.237 5.3%
HR@3 0.009 0.255 0.269 0.313 0.278 0.339 0.392 41.0%
HR@5 0.016 0.291 0.357 0.378 0.383 0.432 0.495 29.2%
HR@10 0.031 0.357 0.454 0.499 0.498 0.558 0.596 19.7%
HR@20 0.062 0.581 0.617 0.648 0.626 0.652 0.695 11.0%
HR@30 0.093 0.635 0.674 0.674 0.727 0.740 0.749 3.0%

(c) l1 norm of semantic (d) l1 norm of semantic


the DNN, SNN𝑉 and SNN just the same architecture, i.e., a three-
embedding in training set embedding in training set layer MLP with 128, 64, and 32 hidden units based on empirical
Figure 8: ℓ1 norm distributions of two types of coin id em- hyper-parameter tuning.
bedding in training and testing sets. Evaluation Metrics: AUC is a widely-used metric that evaluates
classification performance and is not affected by the distribution
To solve the problem, instead of training end-to-end (E2E) coin
skew. Furthermore, we employ a metric Hit Ratio (HR@𝑘) to char-
id embeddings, we use a well-trained word embedding of a coin
acterize model performance at the list level. Specifically, we rank
symbol to represent that coin. Word embeddings pre-trained on
a positive sample and its corresponding negative samples as a list
a large corpus can cover almost all of the coin symbols, making
according to pump probabilities predicted by the model and use
each embedding vector sufficiently trained and containing semantic
HR@𝑘 to indicate whether the ground truth coin is included in the
information. Specifically, we adopt some well-known word embed-
top 𝑘 samples. The reported HR@𝑘 is averaged across all the lists
ding techniques, e.g., SkipGram[19], CBoW[18], to pre-train the
with 𝑘 = 1, 3, 5, 10, 20, 30. HR@𝑘 is well suited to evaluate a model
embeddings on a large corpus of Telegram messages we collected
in a real-world setting.
from the cryptocurrency-related channels or groups, not limited to
the pump channels. Figure 8(c) and Figure 8(d) shows the ℓ1 norm 6.2 Overall Performance Comparison
distributions of SkipGram embeddings of positive and negative Table 4 summarizes the results of all methods on the testing set.
coins in the training set and testing set, respectively. Apparently, The last column is the improvements of SNN relative to its base
two distributions are consistent in the two sets, indicating that DNN model. From the table, it can be observed that: Due to the
word embedding can mitigate the cold-start problem. extreme category imbalance, the random selection strategy is very
ineffective, with only 1.6% hit ratio of selecting 5 coins among all
6 EXPERIMENTS listed coins, indicating the difficulty of this prediction task. Among
6.1 Experiment Settings all the baseline methods, deep learning methods (e.g., DNN, SNN𝑉 ,
SNN) outperforms non-deep learning models. The main improve-
Dataset: To cease data heterogeneity, we focus on predicting coins ment of DNN comes from its superior representative ability for
pumped in Binance and paired with BTC. Following this setting, high-dimensional features like channel id and coin id. Among deep
we select 948 samples (71%) out of the total 1,335 samples as the learning baselines, we observe that SNN𝑉 outperforms DNN in
positive samples. For each positive sample, we label other eligible terms of all evaluation metrics. The main improvement of SNN𝑉
coins listed on Binance at the pump time as negative coins and is that it models channels’ pump history in a sequential way. This
generate features for them to construct the negative samples. We observation verifies that considering sequential information is ben-
split the training, validation, and testing sets by the timestamp eficial for improving performances in target coin prediction. Fur-
"2021-01-19 00:00:00" and "2021-05-10 00:00:00". Table 3 shows the thermore, SNN performs distinctly better than SNN𝑉 , suggesting
statistics of dataset used in the experiment. The difference between that positional attention is a more powerful tool for sequential
positive and negative ratios in three sets is caused by the different modeling. According to the results, it is obvious that SNN performs
numbers of coins listed on Binance in different periods. best among all methods in terms of all evaluation metrics. It gains
Competitors: We compare the proposed SNN with three types 1.6% AUC, 41.0% HR@3, 29.2% HR@5, and 19.7% HR@10 improve-
of baselines. The first type is random selection (RS), to show the ments against its base DNN model. The HR@5 of 49.5% indicates
poorest performance under an extremely class imbalance situa- that SNN can achieve a relatively high probability when only se-
tion; The second type is machine-learning methods on handcrafted lect the top-5 coins predicted, making it suitable for the real-world
features, like Logistic Regression (LR), Support Vector Machine application.
(SVM) and Random Forest (RF). RF is considered as the state of the
art approach [26]; The third type of methods include a DNN that 6.3 Impact of Sequence
takes the same features as input as SNN except the sequence, and a Impact of Maximum Length: We study how the sequence length
vanilla SNN (SNN𝑉 ) that generate sequence embedding by mean affects the prediction performance. Figure 9 shows AUC and HR@3,
pooling without the positional attention mechanism. We implement HR@5 for SNN and SNN𝑉 with the maximum length 𝑁 varying
Conference’17, July 2017, Washington, DC, USA Hu et al.

Table 5: Coin Embedding Test on Testing Set


Metric E2E CBOW SG SNN SNN𝐶 SNN𝑆

HR@3

HR@5
AUC

AUC 0.656 0.837 0.852 0.935 0.940 0.943


HR@1 0.0 0.035 0.043 0.237 0.244 0.250
HR@3 0.0 0.090 0.115 0.392 0.423 0.430
HR@5 0.013 0.133 0.176 0.495 0.517 0.532
HR@10 0.057 0.253 0.286 0.596 0.613 0.621
Length Length Length HR@20 0.101 0.362 0.376 0.695 0.713 0.711
Figure 9: Effect of the length 𝑙 on AUC, HR@3 and HR@5. HR@30 0.242 0.472 0.487 0.749 0.767 0.770

0.2
7 RELATED WORK
0.1 Anatomy: Kamps et al.[14] first introduce cryptocurrency P&D
and define its life cycle as accumulation, pump, and dump, and
Features

0.0
distinguish it from traditional P&Ds in the stock market. Following
-0.1
this setting, Xu et al.[26] present a comprehensive analysis of 412
P&Ds that occurred during Jun. 2018 and Feb. 2019, emphasizing
1 2 3 4 5 6 7 8 9 10
-0.2
on describing how a P&D activity is coordinated on Telegram. Li et
Position al.[17] study P&Ds from a statistical perspective and point out the
Figure 10: Heat-map of positional attention weights of SNN. detrimental influence of P&Ds on the development of the cryptocur-
from 0 to 40 while keeping other optimal hyper-parameters un- rency market. Some characteristics of P&Ds are also revealed and
changed. When 𝑁 equals 0, both SNN and SNN𝑉 degenerate into confirmed by the subsequent studies[9, 11, 16]. In addition, some
DNN. The most obvious observation from these sub-figures is that works study P&Ds from a perspective of social media: Nizzoli et
the performance of two models reach peaks when 𝑁 = 20 and then al.[21] highlight the vast presence of Twitter bots and their key part
start to decrease when 𝑁 continuously increases, suggesting that in spreading the invitation links of pump channels in Telegram. Sim-
incorporating coin information that far away from the pump time ilar results revealed by [22] show the widespread misinformation
may introduce noise. The second observation is that SNN decays from Telegram channels to mainstream social medias.
significantly slower than SNN𝑉 , which indicates that the positional P&D Post-detection: This task targets to detect whether a P&D
attention can effectively filter useful information, making SNN has happened or is happening, given the statistics like price, trading
more robust in tackling noise. volume, or signals from external sources, like social media. Kamps
Impact of Position 𝑝: To reveal meaningful patterns of different et al.[14] conduct anomaly detection on the moving averages of
positions in the sequence, we visualize part of the original attention price and trading volume data. La et al.[15, 16] improve this method
matrix (𝛼 1, 𝛼 2, ..., 𝛼 𝑁 ), 𝑁 = 10, where 𝛼𝑖 = (𝛼𝑖1, 𝛼𝑖2, ..., 𝛼𝑖𝐾 )𝑇 . The by introducing much more fine-grained features, like market buy
heat-map is shown in Figure 10, where the Y-axis is the feature field, orders. Besides only based on market statistics data, Mirtaheri et
and the X-axis is the position in the sequence, 1 is the closet position. al.[20] take into account the statistics of tweets and users that
The lower the position, the closer it to the pump time. The most mentioned cryptocurrency to make the detection. Even though
obvious observation is that the attention scores of lower positions some of the approaches[15, 16] argue they can achieve real-time
are generally larger than that of higher positions, which indicates response, the post-detection paradigm fails to meet the practical
that the contributions of most recent P&Ds are more significant needs since the coin price can easily reach its peak in less than one
than P&Ds happened long-ago. (For a channel, the time interval minute after a P&D begins.
between 10 P&Ds usually takes 1-2 months.) This observation is Target Coin Prediction: Xu et al. [26] firstly introduce this task
also consistent with the conclusion drawn from Figure 9. and use a RF model to predict the target coin, treating P&Ds as
isolated, unrelated events and not solve the cold-start problem.
Compared with them, our work takes more in-depth analysis and
6.4 Coin Embedding Test more advanced learning models, which significantly outperforms
We investigate the improvement by solving the coin-side cold-start theirs[26].
problem. Table 5 summarizes the results of six approaches on the
testing set. E2E denotes a DNN model that only takes coin id as 8 CONCLUSION
input, CW and SG are the same DNN model trained on CBoW In this paper, we present a novel data science pipeline to tackle the
and SkipGram embedding, respectively. SNN𝐶 and SNN𝑆 share problem of target coin prediction. The pipeline effectively identifies
the same architecture of SNN but the original coin id embeddings 709 P&Ds from millions of Telegram messages and extracts valu-
are replaced with CBoW and SkipGram embedding, respectively. able and timely information from heterogeneous data sources to
It is not surprising to see E2E gives the worst performance since support efficient target coin prediction. In the analysis, we discover
it suffers from the cold-start problem. In comparison, CW and SG some interesting phenomena, such as pumped coins exhibiting
both show much superior performance, especially SG achieves the intra-channel homogeneity and inter-channel heterogeneity, which
AUC value of 0.852, suggesting that pre-trained embedding can motivates us to propose SNN to exploit this part of information.
solve the cold-start problem. This conclusion is also confirmed by Consequently, our method significantly improves over the base
the observation that SNN𝐶 and SNN𝑆 achieve positive lift against model, suggesting the proposed pipeline is well-suited for real-
the original SNN. world application.
Sequence-Based Target Coin Prediction for Cryptocurrency Pump-and-Dump Conference’17, July 2017, Washington, DC, USA

REFERENCES International Conference on Computer Communications and Networks (ICCCN),


[1] Binance kline api. https://data.binance.vision/?prefix=data/spot/monthly/klines. pages 1–9. IEEE, 2020.
[2] Coingecko api. https://www.coingecko.com/en/api. [16] M. La Morgia, A. Mei, F. Sassi, and J. Stefa. The doge of wall street: Analysis
[3] Pumpolymp. https://pumpolymp.com. and detection of pump and dump cryptocurrency manipulations. arXiv preprint
[4] Telethon api. https://docs.telethon.dev. arXiv:2105.00733, 2021.
[5] G. Attanasio, L. Cagliero, P. Garza, and E. Baralis. Quantitative cryptocurrency [17] T. Li, D. Shin, and B. Wang. Cryptocurrency pump-and-dump schemes. Available
trading: exploring the use of machine learning techniques. In Proceedings of the at SSRN 3267041, 2021.
5th Workshop on Data Science for Macro-modeling with Financial and Economic [18] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word
Datasets, pages 1–6, 2019. representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
[6] Bittrex. Bittrex global trading rules. https://bittrexglobal.zendesk.com/hc/en- [19] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed repre-
us/articles/360009687839-Bittrex-Global-trading-rules. sentations of words and phrases and their compositionality. Advances in neural
[7] CoinDesk. Hacked crypto exchange cryptopia files for us bankruptcy protection, information processing systems, 26, 2013.
2019. https://www.coindesk.com/markets/2019/05/27/hacked-crypto-exchange- [20] M. Mirtaheri, S. Abu-El-Haija, F. Morstatter, G. Ver Steeg, and A. Galstyan. Iden-
cryptopia-files-for-us-bankruptcy-protection. tifying and analyzing cryptocurrency manipulations in social media. IEEE Trans-
[8] Cryptocompare. June 2018 exchange review, 2018. https://www.cryptocompare. actions on Computational Social Systems, 8(3):607–617, 2021.
com/media/34077436/exchange_review_2018_06.pdf. [21] L. Nizzoli, S. Tardelli, M. Avvenuti, S. Cresci, M. Tesconi, and E. Ferrara. Charting
[9] A. Dhawan and T. J. Putnin, š. A new wolf in town? pump-and-dump manipulation the landscape of online cryptocurrency manipulation. IEEE Access, 8:113230–
in cryptocurrency markets. Pump-and-Dump Manipulation in Cryptocurrency 113245, 2020.
Markets (November 17, 2020), 2020. [22] D. Pacheco, P.-M. Hui, C. Torres-Lugo, B. T. Truong, A. Flammini, and F. Menczer.
[10] S. Gupta, J. Hellings, S. Rahnama, and M. Sadoghi. Building high throughput Uncovering coordinated networks on social media: methods and case studies. In
permissioned blockchain fabrics: challenges and opportunities. Proceedings of Proceedings of the AAAI international conference on web and social media (ICWSM),
the VLDB Endowment, 13(12):3441–3444, 2020. 2021.
[11] J. Hamrick, F. Rouhi, A. Mukherjee, A. Feder, N. Gandal, T. Moore, and M. Vasek. [23] T. Renault. Pump-and-dump or news? stock market manipulation on social
The economics of cryptocurrency pump and dump schemes. Available at SSRN media. 2014.
3310307, 2018. [24] P. Ruan, T. T. Anh Dinh, Q. Lin, M. Zhang, G. Chen, and B. Chin Ooi. Revealing
[12] S. Hu, Y. Cao, Y. Gong, Z. Li, Y. Yang, Q. Liu, W. Ou, and S. Ji. Gift: Graph-guided every story of data in blockchain systems. ACM Sigmod Record, 49(1):70–77, 2020.
feature transfer for cold-start video click-through rate prediction. arXiv preprint [25] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, et al. Attention is all you need.
arXiv:2202.11525, 2022. arXiv:1706.03762, 2017.
[13] S. Hu, X. Zhang, J. Zhou, S. Ji, et al. Turbo: Fraud detection in deposit-free leasing [26] J. Xu and B. Livshits. The anatomy of a cryptocurrency {Pump-and-Dump }
service via real-time behavior network mining. In ICDE, 2021. scheme. In 28th USENIX Security Symposium (USENIX Security 19), pages 1609–
[14] J. Kamps and B. Kleinberg. To the moon: defining and detecting cryptocurrency 1625, 2019.
pump-and-dumps. Crime Science, 7(1):1–18, 2018. [27] C. Ye, Y. Li, B. He, Z. Li, and J. Sun. Gpu-accelerated graph label propagation for
[15] M. La Morgia, A. Mei, F. Sassi, and J. Stefa. Pump and dumps in the bitcoin real-time fraud detection. In Proceedings of the 2021 International Conference on
era: Real time detection of cryptocurrency market manipulations. In 2020 29th Management of Data, pages 2348–2356, 2021.

You might also like