
2022 IEEE 19th International Conference on Mobile Ad Hoc and Smart Systems (MASS)

Gas Price Prediction Based on Machine Learning Combined with Ethereum Mempool
1st Dongwan Lan, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China, lan.dw@nuaa.edu.cn
2nd Hao Wang, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China, wangh24@nuaa.edu.cn
3rd Changchun Yin, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China, ycc0801@nuaa.edu.cn
4th Lu Zhou, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China, lu.zhou@nuaa.edu.cn
5th Chunpeng Ge, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China, gecp@nuaa.edu.cn
6th Xiaozhen Lu, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China, luxiaozhen@nuaa.edu.cn

978-1-6654-7180-0/22/$31.00 ©2022 IEEE | DOI: 10.1109/MASS56207.2022.00057

Abstract—Gas is the internal pricing (metering) system for running a contract or, more generally, any transaction in Ethereum. With the growing popularity of Ethereum, the deficiencies of its current transaction pricing mechanism, the First Price Auction, are being amplified. The fee paid to miners is the gas used multiplied by the gas price. Hence, designing an effective and accurate gas price prediction method is of great significance for improving the efficiency, transparency and security of the Ethereum transaction mechanism. With the Ethereum “London” Hard Fork update, EIP-1559 was introduced to change the historical gas mechanism and make transaction fees less volatile and more predictable. Therefore, we propose a machine-learning-based method that predicts the gas price of the next blocks under the post-EIP-1559 mechanism, combined with a dynamic feature extracted from the mempool. Specifically, we consider the pending transactions and their gas prices in the mempool and, for the first time, take them as a machine learning feature. Owing to the changes brought by EIP-1559, we also refine more features than related works. We use machine learning models combined with the mempool features for prediction. Experiments on the dataset show that our model combined with the mempool data achieves good prediction performance, in particular significantly improving the two indicators MAE and RMSE. Furthermore, we analyze and discuss the challenges of our scheme and the potential effects of our work.

Index Terms—Ethereum, Gas Price, Machine Learning, Prediction.

I. INTRODUCTION

New transactions initiated in Ethereum need to be confirmed by miners before being included in new blocks. Incentives push miners with computing power to validate these transactions and include them in a block. The rewards for miners include the block revenue and the transaction fee. Ethereum supports smart contracts, which greatly increases the size and complexity of transactions. This requires Ethereum to have a definite resource calculation mechanism to quantify the resources consumed by transactions, and the gas mechanism is designed to satisfy this demand. Transactions on the Ethereum network and the various operations of smart contracts consume different amounts of gas. The transaction fee is determined by the gas consumption and the price of each unit of gas, denominated in Ether, which is called the gas price. Under Ethereum's First Price Auction pricing mechanism, miners prioritize transactions with a higher gas price, which means users need to raise their gas price to get their transactions confirmed earlier. However, with the popularity of Ethereum and the flourishing of DeFi and DApps, the gas price has fluctuated substantially and once reached 500 gwei on May 1, 2022 [1]. In such an environment of unstable gas prices, it is difficult for users to choose an appropriate gas price that balances their time and financial costs.

Ethereum core developers and several researchers have paid much attention to this issue and proposed solutions for improvement, including upgrading the gas fee structure. On August 5th, 2021, Ethereum activated a major backward-compatible update named the “London” Hard Fork [2]. In this update, a proposal named EIP-1559 makes a significant change to the historical gas fee mechanism, motivated by efficiency and economic considerations. The calculation method of the gas fee is adjusted to $\mathrm{Gas} \cdot (\mathrm{BaseFee} + \mathrm{PriorityFee})$,

2155-6814/22/$31.00 ©2022 IEEE
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SURATHKAL. Downloaded on September 14,2023 at 04:47:14 UTC from IEEE Xplore. Restrictions apply.
in which the base fee is a relatively stable value that can move up or down each block according to a fixed formula. It is burnt and not given to the miners. Users can specify the maximum fee per gas they are willing to give to miners to incentivize them to include their transactions. The purpose of EIP-1559 is to improve the gas fee mechanism and make transaction fees less volatile and more predictable.

However, EIP-1559 has not achieved the desired effect. According to Footprint Analytics [3], after the update, the fees paid by users are mainly base fees, and the priority fees paid to miners are small and relatively stable. But the plateau did not last long, and the price of Ethereum transactions began to fluctuate again. Observation shows that the gas price still fluctuates widely and has even soared several times. Therefore, predicting the gas price makes a lot of economic sense for users in this volatile transaction system.

A. Related Works

At present, several studies and online recommendation tools for the gas price exist. The Ethereum client Geth gives a recommendation based on the gas price of previous blocks [4]. Gasnow [5] makes gas price predictions based on the mempool data in the mining pool, which is more accurate than predictions based on historical data. However, Gasnow officially stopped all services on October 16, 2021. EthGasStation uses a Poisson regression model to make gas price predictions [6]. These tools can provide real-time gas price predictions, but there is still a gap between the predicted data and the actual value. Liu et al. [7] proposed a machine learning regression-based gas price forecasting approach. Its models are based on the transaction structure before the Ethereum London update. Seven of their models give unsatisfactory prediction results, so we selected the model with better prediction performance for our comparative experiments. Werner et al. [8] proposed a deep learning-based price prediction model and an algorithm parameterized by a user-specific urgency value to recommend the gas price, with gas price predictions for different levels. However, all of these works only consider the factors before the EIP-1559 update and may not be effective in the new transaction system. Moreover, these works each focus on only one aspect, and a comprehensive, multi-model method for gas price prediction is lacking.

B. Motivation

Making the gas price a visible and transparent index can help users easily find the right gas price for their individual demands. In the short term, predicting the gas price accurately can help recommend gas prices for several positions and reduce the gas fee. In the long run, if such prediction is widely used, it also has the potential to improve the stability of the gas price. So far, there is no gas price prediction for the adjusted gas mechanism, and previous works lack a comprehensive method. All of these considerations motivate us to predict the Ethereum gas price with a method that:

1) captures features from the current gas mechanism after the London Hard Fork update;
2) takes the mempool into consideration and combines it with the machine learning algorithm.

C. Our Contribution

This paper predicts the gas price by combining machine learning models with the current state of transactions in Ethereum's mempool. Features were selected through analysis of the transaction structure, combined with the changes in the gas fee structure after the London update. While crawling the feature data, we collect the gas price predictions based on the Ethereum mempool over the same period. We merge the feature data to generate a dataset covering March 17, 2022, to April 13, 2022. The experimental results show that the models combined with the mempool feature input can predict the median gas price of the next block well. Furthermore, an overall analysis of the gas price trend shows that the median gas price after the update is mainly in the low price range.

Our contributions can be summarized as follows:

1) for the first time, we consider the real-time mempool data and take them as features for the LSTM (long short-term memory) and XGBoost (eXtreme Gradient Boosting) algorithms to propose a mempool-based machine learning gas price prediction model.
2) we consider the changes to the gas mechanism after EIP-1559 and derive more significant features to make our model more effective.
3) we collect real-world data from March 17, 2022 to April 13, 2022 and evaluate our model on it. We compare with related works and demonstrate that our model achieves better prediction performance, in particular significantly improving the two indicators MAE and RMSE.
4) we also analyze and discuss the challenges of our scheme and the potential effects of our work.

II. PRELIMINARY

To better understand the changing trend of the Ethereum gas price and the reasons for its change, this section first describes the basic transaction structure of Ethereum, the process from transaction initiation to confirmation, and the calculation method of gas fees. We then introduce the concepts and functions of the Ethereum mempool in detail. Finally, we present the rationale for the time series model we use in our scheme.

A. Ethereum Gas Mechanism

In the transaction structure of Ethereum, the gas price is the price that users are willing to pay for gas, and the gas limit is the maximum amount of gas that each transaction can consume. Ethereum calculates gas according to the various operations in the transaction. Not only transactions but also blocks have a gas limit. Users set the gas price according to their own needs. The user confirms the transaction

after setting the parameters, and the transaction is sent to the Ethereum network by the wallet node. Miners read transactions in the Ethereum mempool and verify them. The gas fee is what motivates miners to choose transactions. According to the principle of incentive compatibility, rational miners prefer to process transactions with a high gas price. After the London update, the gas fee is calculated from the base fee and the priority fee. However, it can still be understood as $\mathrm{GasLimit} \cdot \mathrm{GasPrice}$, where the gas price is now composed of the base fee and the priority fee:

$$\mathrm{GasFee} = \mathrm{GasLimit} \cdot \mathrm{GasPrice} \tag{1}$$

$$\mathrm{GasPrice} = \mathrm{basefee} + \mathrm{priorityfee} \tag{2}$$

$$\mathrm{priorityfee} = \min(\mathrm{MPFPG},\ \mathrm{MFPG} - \mathrm{basefee}) \tag{3}$$

Here, the gas limit is the maximum amount of gas that the user is willing to pay for. Max fee per gas is the maximum gas price that a user is willing to pay to perform all operations in the transaction, and we call it MFPG for short. Max priority fee per gas is the highest priority fee per unit of gas that the user is willing to pay to complete this transaction, and we call it MPFPG for short.

The base fee is calculated automatically according to the block size. It increases automatically when the block size is more than 15M gas [9] and decreases automatically when the block size is less than 15M gas. The base fee is eventually burned. The priority fee can be understood as a queuing fee, which is the part of the fee finally paid to miners. The parameters that the user actually needs to set are the max priority fee per gas and the max fee per gas. The excess is refunded to the user if the max fee per gas is more than the sum of the base fee and the priority fee.

B. Ethereum Mempool

Under normal circumstances, rational miners work to maximize their benefit. Miners select transactions with higher transaction fees from the Ethereum mempool for verification and packaging. Before the transaction content is written into a block, it is stored in the mempool [10]. The mempool is the last step before the transaction is confirmed in a block.

A mempool in Ethereum refers to a set of in-memory data structures within an Ethereum node. Candidate transactions are stored in the mempool before being mined by miners. It is referred to as the “transaction pool” in Geth and the “transaction queue” in Parity. It can be understood as a waiting area before a block accepts a transaction.

Valid transactions sent to an Ethereum node first go to the mempool, but there is no unified mempool. Instead, each node has its own mempool, which it tries to keep in sync with other peers over the Ethereum network. Since network communication is not always reliable or timely, each node has a slightly different mempool. Additionally, nodes have different rules for which transactions they accept. Under normal circumstances, a transaction leaves the mempool when a block includes it. However, transactions can also be replaced due to the acceleration behavior of others or discarded due to the mempool configuration of the node. Therefore, setting a higher gas price increases a transaction's probability of being confirmed.

C. LSTM Neural Network

The LSTM model is specially designed to solve the long-term dependency problem in general RNNs [11]. An RNN is a neural network that contains recurrent cycles that allow information to persist. The cycle is represented by a continuous sequence. A recurrent neural network can be seen as several identical basic units connected in a chain, where each basic unit passes information to the following one.

LSTM is designed to make up for the shortcomings of the traditional RNN and can be regarded as a special RNN. Compared with a conventional RNN, LSTM inherently supports long-term dependencies well. The LSTM model has two central components: the memory cell and the non-linear gating units. The memory cell maintains the system's state, and the non-linear gating units regulate the information flowing into and out of the memory cell at each point in time [12]. The LSTM neural network is shown in Fig. 1. Each cycle of an RNN can be unrolled into repeating basic units, and LSTM unrolls in the same way. The internal structure of the basic repeating unit in a traditional RNN is straightforward, with only a single simple neural network layer. The basic repeating unit of LSTM, in contrast, uses four neural network layers that interact with each other in a special way.

[Figure: LSTM cell with inputs $x_t$, $h_{t-1}$ and $C_{t-1}$; three $\sigma$ layers and one tanh layer combined through element-wise $\times$ and $+$ operations; outputs $h_t$ and $C_t$.]

Fig. 1. LSTM Neural Network.

This cell operates on three vectors, the input at the current time $t$, the output of the previous time $t-1$, and the cell state, to obtain the output and state at the present time.

D. XGBoost Regression Prediction

XGBoost is an efficient implementation of the gradient boosting machine learning algorithm, specifically of gradient-boosted decision trees, and performs well on many machine learning problems. The main idea is to make a joint decision by chaining multiple decision tree models [13]. We call this chaining method iterative error prediction. After each

decision tree is trained, a residual error remains. The next decision tree is trained on this error, its predictions again leave an error, and so on. Each tree thus predicts the error of the previous tree, and the sum of the predictions of all trees is the final prediction result. The XGBoost model can be simply expressed as:

$$f_T(x) = \sum_{t=1}^{T} h_t(x) \tag{4}$$

Here, $f_t(x)$ represents the sum of the first $t$ trees, and $h_t(x)$ represents the result of the $t$-th decision tree. It can also be expressed recursively:

$$f_t(x) = f_{t-1}(x) + h_t(x) \tag{5}$$

III. A MULTI-MODEL PREDICTION METHOD FOR DYNAMIC ADJUSTMENT OF MEMPOOLS

In this section, we introduce our model and explain its parts. Fig. 2 shows the overall structure of the entire prediction model. First, we make predictions based on the mempool through two different methods. Then, the prediction data are merged with other transaction data crawled through the etherscan API to form the original dataset. We extract feature data after preprocessing the original dataset and feed them into the machine learning models. Finally, the LSTM model performs point-by-point prediction and sequence prediction respectively, and the XGBoost model performs classification prediction.

A. Gas Price Prediction Method Based on Mempool

The regression prediction method based on machine learning considers the correlation of past data with the prediction target. According to the transaction structure and gas mechanism, we selected features that correlate with the gas price as strongly as possible. However, the gas price can also be affected by quite a few external factors. If users are eager to get their transactions confirmed without caring about other cost factors, the gas price is affected by subjective human factors. If the current Ethereum price is too high, users may lower their bids, and the gas price is then affected by objective factors. From this perspective, the gas price can also be regarded as a relatively independent quantity. The pending transactions in the mempool in real time therefore become our essential reference factor.

We design two schemes to make a prediction based on the mempool. The first predicts based on the overall state of the current mempool, and the second predicts based on the condition of the last 100 blocks. For the first method, we connect to a remote node through Infura and obtain the transactions currently in the cache according to the keyword gas price. We check each transaction's validity period in the mempool, that is, how long it can survive there. Next, we get the block that is currently pending. If it contains no transactions, that is, it is an empty block, a null value and a block number are returned. If there are transactions in the pending block, they are sorted in descending order by gas price. We record the current timestamp and return the second-to-last value in the sorted transactions as the predicted value for the next block.

The second method processes all transactions of the last 100 blocks and predicts the next block according to the overall situation of the transactions. The specific steps are as follows: first, extract all transactions of 200 blocks in the form (block_num, GasPrice), then group all transactions' data by gas price, count the number in each group, and convert the result to the form (count, GasPrice). Then sort all gas prices in ascending order and calculate the cumulative sum of the corresponding counts, so that the cumulative sum at the highest gas price is the total number of transactions. Finally, divide each cumulative sum by the total number of transactions and record the result as pct. Given the stability of the on-chain inclusion speed, we take the gas price at which pct > 80% as the price for fast inclusion. The complete workflow is shown in Fig. 3.

As shown in Fig. 4, we compare the predicted results with the actual gas price. The mempool-based gas price predictions of both methods carry a certain error. However, their trend is very consistent with the actual value. Therefore, if the data obtained by the two prediction methods are fed into the machine learning model as features, the correlation between the features and the prediction target should be extremely high. Meanwhile, considering the impact of other transaction factors on the gas price, we decided to collect the data predicted by the two mempool-based methods and combine them with the other feature data on their intersection. We record the feature data of the first method as mem1_gp and that of the second method as mem2_gp.

B. Feature Extraction

According to the Ethereum transaction structure and gas fee structure described in Section II, we select the following features as factors affecting the fluctuation of the gas price: GasLimit, GasUsed, base fee per gas (BFPG), avg max fee per gas (AMFPG), avg max priority fee per gas (AMPFPG), difficulty, and transactions (txs). Each feature is described in detail below:

• GasLimit: The gas limit defined in the user transaction refers to the maximum gas that the user is willing to pay for in order to execute this transaction [14]. The gas limit of the block is defined as the maximum amount of gas allowed in the entire block, that is, the upper limit on the gas sum of all transactions.
• GasUsed: Every command in the EVM has a set gas cost, and the gas used is the total amount of gas consumed by all commands executed in this transaction. Both gas limit and gas used are quantities.
• Base fee per gas: It refers to the minimum effective gas price the user needs to pay in the transaction. This parameter is newly introduced by the EIP-1559 proposal in the London update. The base fee is calculated as $\mathrm{GasUsed} \cdot \mathrm{BFPG}$.

[Figure: three panels, (a) Prediction Based on Mempool, (b) Feature Extraction, (c) Prediction Based on Machine Learning. Labels: MEM POOL, txs, blocks, timestamp, Block_num, mem_gp, crawl, Data Collecting, Data Merge, Multi-Model Prediction, LSTM (Point-by-Point, Sequence), XGBoost, Predicted Gas price. Note: txs represent transactions.]

Fig. 2. Overall Structure of Prediction Model.
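
The first mempool-based method in panel (a), as described in Section III-A, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `predict_from_pending` is hypothetical, fetching the pending block from a remote node is left out, and the fallback for a pending block with a single transaction is our own assumption.

```python
from typing import Optional, Sequence


def predict_from_pending(gas_prices_gwei: Sequence[float]) -> Optional[float]:
    """Sketch of mempool method 1 (hypothetical helper, not from the paper).

    Takes the gas prices of the transactions in the currently pending block,
    sorts them in descending order, and returns the second-to-last value as
    the predicted gas price of the next block.
    """
    if not gas_prices_gwei:
        # Empty pending block: the paper returns a null value in this case.
        return None
    ranked = sorted(gas_prices_gwei, reverse=True)
    if len(ranked) == 1:
        # Assumption: with a single transaction, fall back to its gas price.
        return ranked[0]
    return ranked[-2]


if __name__ == "__main__":
    print(predict_from_pending([50.0, 38.0, 44.0, 60.0]))
```

Returning a value near the bottom of the descending ranking approximates the lowest price that still made it into the pending block while ignoring the single cheapest outlier.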

[Figure: the transactions of each block, as (Block_num, GasPrice) pairs, are grouped into (Count, GasPrice) pairs with counts $x_1, \dots, x_n$, accumulated into (Cum, GasPrice) with $c_j = \sum_{i=1}^{j} x_i$, and normalized into (PCT, GasPrice) with $p_j = c_j / \sum_{i=1}^{n} x_i$, which gives the Next Block Prediction.]

Fig. 3. Block Transaction Analysis and Prediction Workflow.
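
The workflow of Fig. 3 can be sketched in a few lines. The sketch below is illustrative only: the function name `predict_from_recent_blocks` and the `threshold` parameter are hypothetical, and reading "pct > 80%" as "the first gas price, in ascending order, whose cumulative share strictly exceeds 0.8" is our assumption about the selection rule.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def predict_from_recent_blocks(
    txs: Iterable[Tuple[int, float]], threshold: float = 0.8
) -> Optional[float]:
    """Sketch of mempool method 2 (hypothetical helper, not from the paper).

    txs holds (block_num, gas_price) pairs from the recent blocks.
    """
    # Group by gas price and count: the (count, GasPrice) form.
    counts = Counter(price for _, price in txs)
    total = sum(counts.values())
    if total == 0:
        return None  # no transactions to analyse

    cumulative = 0
    for price in sorted(counts):  # ascending gas price
        cumulative += counts[price]  # c_j = x_1 + ... + x_j
        pct = cumulative / total  # p_j = c_j / sum of all x_i
        if pct > threshold:
            return price
    return None  # only reachable when threshold >= 1


if __name__ == "__main__":
    sample = [(1, 10.0), (1, 20.0), (1, 20.0), (2, 30.0), (2, 30.0),
              (2, 30.0), (3, 40.0), (3, 40.0), (3, 50.0), (3, 60.0)]
    print(predict_from_recent_blocks(sample))
```

In the sample, the cumulative share reaches exactly 0.8 at 40 gwei and 0.9 at 50 gwei, so the strict comparison selects 50 gwei.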

Fig. 4. Prediction Based on Mempool.


• Max fee per gas: It refers to the maximum gas price that the user is willing to pay to perform all operations in the transaction. This parameter needs to be set in each transaction, so we take its average value over all transactions in a block as the feature, which is recorded as avg max fee per gas and referred to as AMFPG.
• Max priority fee per gas: It refers to the highest priority fee per unit of gas that the user is willing to pay [15]. When max fee per gas and max priority fee per gas are set to the same value, max fee per gas is equivalent to the gas price before the update. Similarly, this parameter needs to be set for each transaction in Ethereum, so we also take its average value over all transactions in one block as the feature input, which is recorded as avg max priority fee per gas and referred to as AMPFPG.
• Difficulty: It refers to the mining difficulty of the current Ethereum block. The mining difficulty is adjusted dynamically according to the current mining situation of Ethereum, so that the block time stays within a stable interval as much as possible. The mining difficulty can also indirectly reflect the competition among current transactions [16].
• Transactions count: It refers to the number of transactions contained in the current block, referred to as txs.

C. Gas Price Prediction Method Based on Machine Learning

Analysis of the transaction structure of Ethereum and the gas mechanism reveals the following characteristics. First, the base fee is adjusted according to the usage of the previous block: the size of the last block is compared with the target block size, and if the last block exceeds the target, the base fee increases by up to 12.5%. Second, changes in the base fee also indirectly affect the settings of the priority fee. Ultimately, this affects the user's settings for the gas price. We can therefore reasonably believe that the state of the blocks before the target block affects the gas price of the target block. Therefore, we select a deep LSTM neural network for time series forecasting.

First, we load the merged dataset from a CSV file into a pandas dataframe, which is used to build the numpy array fed to the LSTM. The Keras LSTM layer takes a three-dimensional (N, W, F) numpy array [17], where N is the number of neurons in the training sequence, W is the sequence length, and F is the number of features per sequence. We choose N as 70, W as 50, and F as 3, 4, and 5. After loading the data and building the model, we train and test the model on the dataset. For the output, the model can make predictions in two modes: the first makes predictions on a
predictions in two modes: the first makes predictions on a

point-by-point basis, predicting only one point at a time, and each following window uses the complete test data to predict the next point. The second mode predicts a complete sequence, using only a portion of the training data set to initialize the training window and moving the window as the model predicts the next point, as in the point-by-point mode. In the second-step prediction, only one data point comes from the previous prediction. In the third-step prediction, two data points come from previous predictions, and so on. After 50 steps, the window consists entirely of previous predictions [17]. A simplified schematic diagram is shown in Fig. 5.

[Figure: the LSTM Neural Network maps input windows to outputs in two modes, Point by Point and Sequence Prediction.]

Fig. 5. Simplified LSTM Prediction.

We compare the functions, advantages and disadvantages of multiple time series forecasting models. In terms of the ability to process abnormal and missing data, XGBoost shows good performance and portability. According to the collected data, Ethereum blocks occasionally contain extremely high values, or miners choose not to package any transactions and mine only empty blocks, resulting in missing gas price values. Considering these factors, we additionally choose the XGBoost regression prediction model, which copes well with abnormal data.

IV. EXPERIMENTS

In this section, we evaluate the performance of the model. After flattening the data obtained by the two mempool-based prediction methods and aligning them with the feature data crawled from etherscan [18], we feed them into the XGBoost and LSTM models respectively. For XGBoost, null values are filled with the average value of the feature column, and the ratio of the training set to the test set is set to 7.5:2.5. We use the XGBoost regression model for training and prediction. We use the mean absolute error, mean square error, and $R^2$ score as evaluation criteria to judge the experimental results.

A. Datasets and Preprocessing

We collect Ethereum transaction data from March 17, 2022, to April 13, 2022, crawling the data through the API provided by etherscan according to the established features [18]. We then process the raw data into a usable dataset. At this stage, our primary work is to unify data units, fill in missing information and relabel some features.

A total of 159,792 pieces of raw data were collected, with one block recorded as one piece of data.

After obtaining the original data, we first convert the values to decimal, then process the null values in the data, and finally unify the unit to Gwei. The max fee per gas and max priority fee per gas of all transactions are aggregated by block number as feature inputs, and the average and median gas price are computed at the same time. The median gas price is used as the prediction target. As for empty blocks, miners occasionally mine a block without packaging any transactions. The gas price of such data is 0. Such data are infrequent and are a normal phenomenon.

While crawling the etherscan data, we also collect the gas price data obtained by the two mempool-based prediction methods. We refer to the data crawled through etherscan as the original data source. The two types of data predicted from the Ethereum mempool are mem1_gp and mem2_gp. We merge mem1_gp with the original data source by timestamp and merge mem2_gp with the original data source by block number. The final combined data format is shown in Table I.

Max fee per gas and max priority fee per gas are missing from some transaction data because those transactions do not adopt the gas mechanism introduced by the London update. For them, the gas price is not divided into two parts, and the base fee is the gas price. Averaging normally divides by the total number of transactions in the block, so it is only necessary to subtract the number of transactions that do not use the improved gas mechanism from the total number of transactions and then average.

B. Evaluation Indicators

We use three of the most popular metrics, MAE, RMSE, and $R^2$ score, which are defined as follows:

• MAE: Mean Absolute Error.

$$\mathrm{MAE} = \frac{1}{m} \sum_{i=1}^{m} \left| y_i - \hat{y}_i \right| \tag{6}$$

MAE is the mean of the absolute errors, which reflects the actual size of the prediction error well.

• MSE: Mean Squared Error, the average of the squared differences between the true value and the predicted value.

$$\mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 \tag{7}$$

• RMSE: Root Mean Square Error.

$$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2} \tag{8}$$

• $R^2$ score

$$R^2 = 1 - \frac{\sum_{i=1}^{m} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{m} (y_i - \bar{y})^2} \in [0, 1] \tag{9}$$
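
As a hedged illustration of Eqs. (6)-(9), the metrics can be computed with the standard library alone. The function names below are our own and the sample values are made up for the arithmetic check. In practice a library such as scikit-learn would typically be used instead.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, Eq. (6)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error, Eq. (7)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, Eq. (8)."""
    return math.sqrt(mse(y_true, y_pred))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, Eq. (9)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        # R^2 is undefined when the true values are constant.
        raise ValueError("R^2 is undefined for a constant target")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.0, 2.0, 3.0, 6.0]
    print(mae(y_true, y_pred), mse(y_true, y_pred),
          rmse(y_true, y_pred), r2_score(y_true, y_pred))
```

For the sample values, the single error of 2 gives MAE = 0.5, MSE = RMSE = 1.0, and $R^2 = 1 - 4/5 = 0.2$. Note that RMSE can never be smaller than MAE on the same data.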

TABLE I
EXAMPLE OF PREPROCESSED DATA

BlockNum  GasLimit  GasUsed   BFPG     D    timestamp   tx   AMFPG     AMPFPG   MEM1  MEM2  MGP
14414732  30087858  30046709  42.7700  129  1647664473  591  64.8997   2.5284   38    44    44.1700
14414733  30058477  15337995  48.1016  129  1647664497  238  59.0064   4.3336   38    49    49.5016
14414734  30087829  20536989  48.2251  129  1647664514  428  64.7157   3.7728   38    49    49.6251
14414735  30117210  30110033  50.4262  129  1647664531  374  110.2117  11.3203  42    51    52.4262
...

D: difficulty; tx: transactions count; MEM1, MEM2: mem1_gp, mem2_gp; MGP: median gas price. Gas prices are in Gwei.
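
Rows like those in Table I could be assembled by joining the crawled block data with the two mempool predictions, mem1_gp by timestamp and mem2_gp by block number. The sketch below uses plain dictionaries for clarity. The function name `merge_features` and the inner-join behaviour (blocks without both mempool values are dropped) are assumptions based on the description of merging on the intersection. With pandas, two `DataFrame.merge` calls would serve the same purpose.

```python
from typing import Dict, List


def merge_features(
    blocks: List[dict],
    mem1_by_timestamp: Dict[int, float],
    mem2_by_block: Dict[int, float],
) -> List[dict]:
    """Hypothetical merge step: attach MEM1 by timestamp and MEM2 by block number."""
    merged = []
    for row in blocks:
        ts, num = row["timestamp"], row["BlockNum"]
        # Assumption: keep only blocks present in all three sources (inner join).
        if ts in mem1_by_timestamp and num in mem2_by_block:
            merged.append(
                {**row, "MEM1": mem1_by_timestamp[ts], "MEM2": mem2_by_block[num]}
            )
    return merged


if __name__ == "__main__":
    # Block values taken from the first two rows of Table I.
    blocks = [
        {"BlockNum": 14414732, "timestamp": 1647664473, "BFPG": 42.7700},
        {"BlockNum": 14414733, "timestamp": 1647664497, "BFPG": 48.1016},
    ]
    mem1 = {1647664473: 38}  # second block deliberately left without MEM1
    mem2 = {14414732: 44, 14414733: 49}
    print(merge_features(blocks, mem1, mem2))
```

In the usage example the second block is dropped because it has no mem1_gp entry, which mirrors merging on the intersection of the data sources.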

In Eq. (9), ŷ is the predicted output value of the model, and ȳ represents the mean of the actual values. The error measures above have different meanings depending on the model, so a specific number on its own is hard to interpret. The value of R² is between 0 and 1, and it reflects the relative degree of the regression contribution, that is, the percentage of the change in the dependent variable that the regression relationship can explain.

The advantage of R² is that the results are normalized, making it easier to see the gaps between the models. Typically, for the training dataset, the range of values is [0,1]. For the testing set, the value may also be negative [19]. Generally speaking, the higher the value, the better the independent variables explain the dependent variable, and the higher the percentage of the total change that is caused by the independent variables.

C. Results and Analysis

We obtain the Ethereum mempool information by connecting to an Infura remote node. All the experiments were conducted on a laptop with an 11th Gen Intel Core i5-1135G7 @ 2.40GHz, 16 GB memory, and a 512 GB SSD. Fig. 6 shows the predicted and true gas prices for each model. Predicted sample points 300 to 400 were selected for plotting the LSTM and MEM-LSTM models, and predicted sample points 0 to 100 were selected for plotting the XGB and MEM-XGB models; it can be seen that each model fits well. Table II shows the values of MAE, RMSE, and R² for all current models on the training set and test set. Based on the overall results, the models combined with the mempool features perform better. From the perspective of MAE and RMSE, LSTM predicts better. From an R² perspective, the XGBoost model has better predictive capabilities. The input of the mempool features has a greater impact on the XGBoost model.

TABLE II
ERROR RESULTS OF PREDICTION

            Training Set             Testing Set
Model       MAE     RMSE    R2       MAE     RMSE    R2
LSTM        0.089   0.324   0.160    0.082   0.119   0.821
XGB         0.910   1.507   0.998    0.535   3.007   0.955
MEM-LSTM    0.087   0.334   0.108    0.064   0.108   0.843
MEM-XGB     0.271   0.409   0.998    0.390   0.950   0.962

According to the prediction work of Liu et al. [7] for gas price in 2019, we used the data of the same period for comparative experiments. The prediction method for the mempool requires a real-time connection to nodes to collect the current Ethereum mempool data. Therefore, the comparative experiments can only be carried out within the range of features that can be collected. Table III shows the comparison of the indicators of the two models. From the MAE perspective, our model has improved considerably. From an RMSE perspective, there is less room for improvement without including the mempool input.

TABLE III
ERROR DATA COMPARISON

            Training Set     Testing Set
Model       MAE     RMSE     MAE     RMSE
LSTM        0.09    0.32     0.08    0.12
XGB         0.91    1.51     0.54    3.01
MLR-LSTM    3.21    1.60     3.04    1.76
MLR-XGB     2.66    1.51     2.75    1.54

As for the reasons, we note that the work of Liu et al. predates the London update, whereas our model is designed around the London update transaction structure, so the factors it considers may be more comprehensive and better suited to predicting the gas price.

Since our model needs to access the data in the mempool in real time, we also take the model of Liu et al. and collect the relevant data between April 18, 2022, and May 13, 2022, for training and testing, and run contrast experiments against our model. We selected the two models with relatively good performance according to the original paper for comparison. The experimental results are shown in Table IV.

TABLE IV
ERROR DATA COMPARISON BASED ON MEMPOOL

                   Training Set     Testing Set
Model              MAE     RMSE     MAE     RMSE
MLR-LSTM (Liu's)   3.68    1.97     4.02    2.07
MLR-XGB (Liu's)    2.72    1.78     2.83    1.84
MEM-LSTM (Ours)    0.09    0.33     0.06    0.11
MEM-XGB (Ours)     0.27    0.41     0.39    0.95

It can be seen that the machine learning regression prediction models combined with the mempool features show a notable improvement over the original machine learning models.

V. DISCUSSION

In this section, we focus on some of the challenges encountered by our scheme and their potential causes. In addition,

we also point out other potential additional impacts that our method may bring.

Fig. 6. Overall Prediction of Gas Price.

A. Anomaly Detection

We find that our model does not predict well when faced with a data surge. We have tried some solutions, such as adding a feature to evaluate the possible impact of certain unexpected events on gas price, or detecting behaviors such as malicious price hikes that increase gas prices for a certain period. However, the experimental results show that the model still cannot accurately predict the mutation data. The frequency of such abnormal data is about 0.1%-0.2%. Although infrequent, it is still a challenge that our model needs to consider and address.

We tracked and analyzed all transactions in several blocks before and after these mutation data, as well as events around that moment that may have affected the gas price, and tried to discover the reasons and patterns behind them. We first selected 41 abnormal data points from more than 40,000 pieces of data. We considered the related transaction data of the 5 blocks before and after each of them and the external events that occurred at the corresponding time. After the analysis, we summarize three potential causes of gas price mutation:

1) These abnormal transactions may come from the smart contracts of some token systems, such as NFTs or other decentralized finance projects. They often choose to significantly increase the gas price for specific incentives so that their transactions can be included on the chain as soon as possible to ensure that their financial needs are confirmed.
2) Some anomaly data is due to the bidding waves of some arbitrage bots. The prominent feature of this situation is that, for two or more addresses, transactions with the same address and nonce appear multiple times in the pending mempool at a specific moment with gradually increasing gas prices. In order to succeed in the zero-sum game, these arbitrage bots will increase the gas price wildly, sometimes hundreds of times, in order to beat their opponents, which is described in detail in [20].
3) A tiny portion of the mutation data has no apparent pattern. We speculate that this may be due to personal reasons, such as curious attempts or misoperations when using Ethereum for the first time.

The first two types account for 36 of the 41 abnormal data points and can be classified as uncontrolled bidding behaviors, resulting in a surge in gas prices. The starting points of these behaviors differ, but they share a common feature: the sender of the transaction does not care about the cost increase caused by the higher gas price. They believe that transaction confirmation speed is more important than the economic cost. These are complex challenges for our predictions because we cannot intervene in or predict the subjective intentions of others. In essence, these abnormal data appear due to the contingency brought about by the Ethereum transaction bidding mechanism. However, we can also analyze these behaviors from other angles. We can mark these active addresses and increase the weight of this feature when the mark appears to offset the impact of subjective factors on gas price, which we leave as future work.

B. Impacts on Transaction Fee Mechanism

Ethereum historically priced transaction fees using a simple auction mechanism, where users send transactions with bids ("gas prices"), miners choose the transactions with the highest bids, and transactions that get included pay the bid that they specify [15]. Although it can effectively satisfy the demand for many transactions to be uploaded quickly, a system that leaves the uploading order of transactions entirely in the hands of the miners gives them too much power, which may lead to unfair behaviors such as frontrunning [20] or freeloading [21]. Meanwhile, since transaction fees are a vital source of income for miners, rational miners naturally hope the gas price is as high as possible, because transaction fee = gas price * gas used. What we want is a consensus mechanism that is completely transparent and perfectly symmetric in information for all.

On the face of it, our work is only a prediction of the gas price and has little impact on the gas price itself. However, the gas price is unknown information for users, while miners, with access to the mempool, have more information about transactions and gas prices than users. Therefore, there is a potential possibility of attack. The real-time prediction of gas price provided by our work gives many users a possible indicator of gas price, like a lighthouse, amplifying the transparency of this unknown property and thus potentially limiting some unfair behavior. Furthermore, based on our predictions, users can ideally better choose the timing of their transactions and more precisely choose the gas price threshold needed to achieve their goals. This further reduces

the magnitude of gas price fluctuations, thus indirectly helping to achieve better stability of gas price.

VI. CONCLUSION

In this paper, we propose a multi-model gas price prediction method combined with Ethereum mempool feature input. While collecting Ethereum block transaction data, we read the transaction gas prices in the mempool. This data is used to predict the gas price of the next block through two non-machine learning methods. The two kinds of prediction data are collected, processed, and fused with the selected feature data. Finally, the fused feature data is input into the two models. Experiments show that the models combined with mempool feature input perform well. Compared with the models without mempool feature input, the prediction ability is improved: in Table II, the test-set MAE drops from 0.535 to 0.390 for XGBoost and from 0.082 to 0.064 for LSTM. Compared with the existing gas price prediction work, the prediction ability is also improved: in Table IV, the test-set MAE is 0.06 and 0.39 for our models versus 4.02 and 2.83 for the compared models.

VII. ACKNOWLEDGEMENTS

This work was supported by the National Key R&D Program of China (Grant No. 2020YFB1005900, 2020B0101090002), the National Key R&D Program of Guangdong Province (Grant No. 2020B0101090002), the National Natural Science Foundation of China (Grant No. 62032025, 62071222, U21A201710, U20A201092), and the Natural Science Foundation of Jiangsu Province (Grant No. BK20200418, BK20220880).

REFERENCES

[1] “Average daily gas price of ethereum from august 2015 to may 16, 2022,” [Online], 2022, https://www.statista.com/statistics/1221821/gas-price-ethereum/.
[2] T. Beiko, “London mainnet announcement,” [Online], 2021, https://blog.ethereum.org/2021/07/15/london-mainnet-announcement/.
[3] “Footprint analytics,” [Online], 2022, https://www.footprint.network/.
[4] “etherchain.org,” [Online], 2022, https://etherchain.org/tools/gasnow/.
[5] “go-ethereum,” [Online], 2022, https://github.com/ethereum/go-ethereum/.
[6] “Ethgasstation,” [Online], 2022, https://ethgasstation.info/.
[7] F. Liu, X. Wang, Z. Li, J. Xu, and Y. Gao, “Effective gasprice prediction for carrying out economical ethereum transaction,” in 2019 6th International Conference on Dependable Systems and Their Applications (DSA). IEEE, 2020, pp. 329–334.
[8] S. M. Werner, P. J. Pritz, and D. Perez, “Step on the gas? a better approach for recommending the ethereum gas price,” in Mathematical Research for Blockchain Economy. Springer, 2020, pp. 161–177.
[9] Y. Liu, Y. Lu, K. Nayak, F. Zhang, L. Zhang, and Y. Zhao, “Empirical analysis of eip-1559: Transaction fees, waiting time, and consensus security,” arXiv preprint arXiv:2201.05574, 2022.
[10] G. Chen, B. Xu, M. Lu, and N.-S. Chen, “Exploring blockchain technology and its potential applications for education,” Smart Learning Environments, vol. 5, no. 1, pp. 1–10, 2018.
[11] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[12] C. Olah, “Understanding lstm networks,” Colah. github. io, 2015, http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
[13] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 2016, pp. 785–794.
[14] J. Aungiers, “Time series prediction using lstm deep neural networks,” [Online], 2018, https://www.altumintelligence.com/articles/a/Time-Series-Prediction-Using-LSTM-Deep-Neural-Networks.
[15] G. Wood et al., “Ethereum: A secure decentralised generalised transaction ledger,” Ethereum project yellow paper, vol. 151, no. 2014, pp. 1–32, 2014.
[16] V. Buterin, E. Conner, R. Dudley, M. Slipper, I. Norden, and A. Bakhta, “Eip-1559: Fee market change for eth 1.0 chain,” [Online], 2019, https://github.com/ethereum/EIPs/blob/master/EIPS/eip-1559.md.
[17] H. Jang and J. Lee, “An empirical study on modeling and prediction of bitcoin prices with bayesian neural networks based on blockchain information,” IEEE Access, vol. 6, pp. 5427–5437, 2017.
[18] “Etherscan,” [Online], 2022, https://etherscan.io/.
[19] “sklearn.metrics, r2 score,” [Online], 2022, https://scikit-learn.org/.
[20] P. Daian, S. Goldfeder, T. Kell, Y. Li, X. Zhao, I. Bentov, L. Breidenbach, and A. Juels, “Flash boys 2.0: Frontrunning in decentralized exchanges, miner extractable value, and consensus instability,” in 2020 IEEE Symposium on Security and Privacy (SP). IEEE, 2020, pp. 910–927.
[21] F. Zhang, E. Cecchetti, K. Croman, A. Juels, and E. Shi, “Town crier: An authenticated data feed for smart contracts,” in Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, 2016, pp. 270–282.
