Nanjing University of Aeronautics and Astronautics Nanjing University of Aeronautics and Astronautics
Nanjing, China Nanjing, China
lan.dw@nuaa.edu.cn wangh24@nuaa.edu.cn
Abstract: Gas is the internal pricing (metering) system for running a contract, or in general any transaction, in Ethereum. With the popularity of Ethereum, the deficiencies of the current Ethereum transaction pricing mechanism, First Price Auctions, are being amplified. The fee paid to miners is the gas used multiplied by the gas price. Hence, designing an effective and accurate gas price prediction method is of great significance for improving the efficiency, transparency and security of the Ethereum transaction mechanism. In the Ethereum "London" Hard Fork update, EIP-1559 was introduced to change the historical gas mechanism and make transaction fees less volatile and more predictable. Therefore, we propose a machine-learning-based method for predicting the gas price of the next blocks after the adoption of EIP-1559, combined with a dynamic feature extracted from the mempool. Specifically, we consider the pending transactions and their gas prices in the mempool and, for the first time, take them as a machine learning feature. Owing to the update brought by EIP-1559, we refine more features than related works. We use machine learning models combined with the mempool features for prediction. Experiments on the dataset show that our model combined with the mempool data achieves good prediction performance, in particular a significant improvement in the two indicators MAE and RMSE. Furthermore, we analyze and discuss the challenges of our scheme and the potential effects of our work.

Index Terms: Ethereum, Gas Price, Machine Learning, Prediction.

I. INTRODUCTION

New transactions initiated in Ethereum need to be confirmed by miners before being included in new blocks. Incentives push miners with computing power to validate these transactions and include them in a block. The rewards for the miners include the block revenue and the transaction fee. Ethereum supports smart contracts, which greatly increases the size and complexity of transactions. This requires Ethereum to have a definite resource calculation mechanism to quantify the resources consumed by transactions. Hence, the gas mechanism is designed to satisfy this demand. Transactions on the Ethereum network, and the various operations of smart contracts, consume different amounts of gas. The transaction fee is determined by the gas consumption and the price of each unit of gas denominated in Ether, which is called the gas price.

Under the Ethereum transaction pricing mechanism, First Price Auctions, miners give priority to transactions with a higher gas price, which means users need to raise their gas price to have their transactions confirmed earlier. However, with the popularity of Ethereum and the flourishing of DeFi and Dapps, the gas price has fluctuated substantially and once reached 500 gwei on May 1, 2022 [1]. In such an environment with an unstable gas price, it is difficult for users to choose an appropriate gas price that strikes a balance between their time and financial costs.

Ethereum core developers and several researchers have paid much attention to this issue and have proposed solutions for improvement, including upgrading the gas fee structure. On August 5th, 2021, Ethereum activated a major update named the "London" Hard Fork [2]. In this update, a proposal named EIP-1559 makes a significant change to the historical gas fee mechanism, motivated by efficiency and economic considerations. The calculation method of the gas fee is adjusted to Gas · (BaseFee + PriorityFee),
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SURATHKAL. Downloaded on September 14,2023 at 04:47:14 UTC from IEEE Xplore. Restrictions apply.
after setting the parameters, and the transaction is sent to the Ethereum network by the wallet node. Miners read transactions in the Ethereum mempool and verify them. The gas fee is what motivates miners to choose transactions. According to the principle of incentive compatibility, rational miners prefer to process transactions with a high gas price.

After the London update, the gas fee is calculated from the base fee and the priority fee. However, it can still be understood as GasLimit · GasPrice, where the gas price is now composed of the base fee and the priority fee:

\[ \mathit{GasFee} = \mathit{GasLimit} \cdot \mathit{GasPrice} \tag{1} \]

\[ \mathit{GasPrice} = \mathit{basefee} + \mathit{priorityfee} \tag{2} \]

\[ \mathit{priorityfee} = \min(\mathit{MPFPG},\ \mathit{MFPG} - \mathit{basefee}) \tag{3} \]

Here, the gas limit is the maximum amount of gas that the user is willing to pay for. Max fee per gas is the maximum gas price that a user is willing to pay to perform all operations in the transaction, and we call it MFPG for short. Max priority fee per gas is the highest gas price value of the priority fee that the user is willing to pay to complete this transaction, and we call it MPFPG for short.

The base fee is calculated automatically according to the block size. The base fee automatically increases when the block size is more than 15M gas [9]. When the block size is less than 15M gas, the base fee automatically decreases. The base fee is eventually burned (destroyed). The priority fee can be understood as a queuing fee, which is the part of the fee finally paid to miners. The parameters that the user actually needs to set are the max priority fee per gas and the max fee per gas. If the max fee per gas is more than the sum of the base fee and the priority fee, the excess is returned to the user.

normal circumstances, the transaction will leave the mempool when the block receives it. Still, it is also possible for transactions to be replaced due to the acceleration behavior of others or discarded due to the mempool configuration of the node. Therefore, a transaction increases its probability of being confirmed by setting a higher gas price.

C. LSTM Neural Network

The LSTM model is specially designed to solve the long-term dependency problem of general RNNs [11]. An RNN is a neural network that contains recurrent cycles that allow information to persist. The cycle is represented by a continuous sequence. A recurrent neural network can be seen as several identical basic units connected in a chain, where each basic unit can pass information to the following one.

LSTM can be regarded as a special RNN designed to make up for the shortcomings of traditional RNNs. Compared with a conventional RNN, LSTM inherently has good support for long-term dependencies. The LSTM model has two core components: the memory cell and the non-linear gating units. The memory cell maintains the system's state, and the non-linear gating units regulate the information flowing into and out of the memory cell at each point in time [12]. The LSTM neural network is shown in Fig. 1. Each cycle of an RNN can be unrolled into an infinite number of repeating basic units, and LSTM likewise unrolls the cycle into repeating basic units. The internal structure of the basic repeating unit in a traditional RNN is straightforward, with only a single simple neural network layer. In contrast, the basic repeating unit of LSTM uses four neural network layers that interact with each other in a particular way.
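As a concrete reading of Eqs. (1)-(3) and of the base-fee behaviour described in the gas fee discussion, the following is a minimal sketch rather than the authors' implementation. The names `FeeBid`, `priority_fee`, `gas_price`, `gas_fee`, `refund_per_gas` and `next_base_fee` are hypothetical. The base-fee update rule (gas target equal to half the block gas limit, change bounded by 1/8 per block) is taken from the EIP-1559 specification [16], not from the text above, and it is written with floating-point gwei for readability, whereas clients use integer wei arithmetic.

```python
from dataclasses import dataclass

# Constants from the EIP-1559 specification (assumption: not stated in this paper).
BASE_FEE_MAX_CHANGE_DENOMINATOR = 8
ELASTICITY_MULTIPLIER = 2  # gas target = gas limit / 2


@dataclass
class FeeBid:
    """What the user sets on a transaction (values in gwei)."""
    max_fee_per_gas: float           # MFPG
    max_priority_fee_per_gas: float  # MPFPG


def priority_fee(bid: FeeBid, base_fee: float) -> float:
    """Eq. (3): priority fee actually paid per unit of gas."""
    if bid.max_fee_per_gas < base_fee:
        raise ValueError("max fee per gas is below the base fee")
    return min(bid.max_priority_fee_per_gas, bid.max_fee_per_gas - base_fee)


def gas_price(bid: FeeBid, base_fee: float) -> float:
    """Eq. (2): effective gas price = base fee + priority fee."""
    return base_fee + priority_fee(bid, base_fee)


def gas_fee(gas: float, bid: FeeBid, base_fee: float) -> float:
    """Eq. (1): total fee for a given amount of gas (the paper writes GasLimit)."""
    return gas * gas_price(bid, base_fee)


def refund_per_gas(bid: FeeBid, base_fee: float) -> float:
    """Excess of the max fee over base fee + priority fee, returned to the user."""
    return bid.max_fee_per_gas - gas_price(bid, base_fee)


def next_base_fee(base_fee: float, gas_used: int, gas_limit: int) -> float:
    """Base fee of the next block: rises above the gas target, falls below it."""
    target = gas_limit / ELASTICITY_MULTIPLIER
    change = (gas_used - target) / target / BASE_FEE_MAX_CHANGE_DENOMINATOR
    return base_fee * (1 + change)
```

For example, with the first row of Table I (base fee 42.7700 gwei, gas used 30046709, gas limit 30087858), `next_base_fee` gives a value close to the 48.1016 gwei listed for the following block.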
decision tree is trained, there will be an error. The next decision tree is trained on this error, its prediction again leaves an error, and so on. Each tree is used to predict the error of the previous tree, and the sum of the prediction results of all trees is the final prediction result. The XGBoost model can be simply expressed as:

\[ f_T(x) = \sum_{t=1}^{T} h_t(x) \tag{4} \]

Here, $f_t(x)$ represents the sum of the first $t$ trees, $h_t(x)$ represents the result of the $t$-th decision tree, and $T$ is the total number of trees. It can also be expressed recursively:

\[ f_t(x) = f_{t-1}(x) + h_t(x) \tag{5} \]

III. A MULTI-MODEL PREDICTION METHOD FOR DYNAMIC ADJUSTMENT OF MEMPOOLS

In this section, we introduce our model and explain its individual parts. Fig. 2 shows the overall structure of the entire prediction model. First, we make predictions based on the mempool through two different methods. Then, the prediction data are merged with other transaction data crawled through the etherscan API to form the original dataset. We extract feature data after preprocessing the original dataset and feed them into the machine learning models. Finally, the LSTM model performs point-by-point prediction and sequence prediction respectively, and the XGBoost model performs classification prediction.

A. Gas Price Prediction Method Based on mempool

The regression prediction method based on machine learning considers the correlation of past data with the prediction target. According to the transaction structure and gas mechanism, we selected features whose correlation with the gas price is as high as possible. However, the gas price can also be affected by quite a few external factors. If users are eager to get their transactions confirmed without caring about other cost factors, the gas price is affected by human subjective factors. If the current Ethereum price is too high, users may lower their bids, and the gas price is then affected by objective factors. From this perspective, the gas price can also be regarded as a relatively independent quantity. The pending transactions in the mempool in real time therefore become our essential reference factor.

We design two schemes to make a prediction based on the mempool. The first predicts based on the overall state of the current mempool, and the second predicts based on the condition of the last 100 blocks. For the first method, we connect to a remote node through infura and obtain the transactions currently in the cache according to the keyword gas price. We check each transaction's validity period in the mempool, that is, how long it can survive in the mempool. Next, we get the block that is currently pending. If it is an empty block with no transactions in it, a null value and a block number are returned. If there are transactions in the pending block, they are sorted in descending order by gas price. We record the current timestamp and return the second-to-last value of the sorted transactions as the predicted value of the next block.

The second method is to process all transactions of the last 100 blocks from now and predict the next block according to the overall situation of the transactions. The specific steps are as follows: first, extract all transactions of 200 blocks in the form (block_num, gas_price), then group all transactions by gas price, count the transactions in each group, and convert the data to the form (count, gas_price). Then sort all gas prices in ascending order and calculate the cumulative sum of the corresponding counts, so that the cumulative sum at the highest gas price is the total number of transactions. Finally, divide each cumulative sum by the total number of transactions, and record the result as pct. Based on the stability of the on-chain confirmation speed, we take the gas price at which pct > 80% as the gas price for fast confirmation. The complete workflow is shown in Fig. 3.

As shown in Fig. 4, we compare the predicted results with the actual gas price. The mempool-based gas price predictions of the two methods exhibit a certain error. However, it can be seen that their trend is very consistent with the actual values. Therefore, if the data obtained by the two prediction methods are input into the machine learning model as features, the correlation between the features and the prediction target should be extremely high. Meanwhile, considering the impact of other transaction factors on the gas price, we decided to collect the data predicted by the two mempool-based methods and combine them with the other feature data according to the intersection item. We record the feature data of the first method as mem1_gp and that of the second method as mem2_gp.

B. Feature Extraction

According to the Ethereum transaction structure and gas fee structure described in Section II, we select the following features as factors affecting the fluctuation of the gas price: GasLimit, GasUsed, base fee per gas (BFPG), avg max fee per gas (AMFPG), avg max priority fee per gas (AMPFPG), difficulty, and transactions (txs). Each feature is described in detail below:

• GasLimit: The gas limit defined in the user transaction refers to the maximum gas that the user is willing to pay in order to execute this transaction [14]. The gas limit in the block is defined as the maximum amount of gas allowed in the entire block, that is, the upper limit on the sum of the gas of all transactions.
• GasUsed: All commands in the EVM are assigned a gas cost, and the gas used is the total amount of gas consumed by all commands executed in this transaction. Both gas limit and gas used are quantities of gas.
• Base fee per gas: It refers to the minimum effective gas price the user needs to pay in the transaction. This parameter is newly introduced by the EIP-1559 proposal in the London update. The calculation method of the base fee is GasUsed · BFPG
[Fig. 2: overall structure of the prediction model. Panels: Data Collecting (crawled block data with timestamp, txs, ...; MEMPOOL data with Block_num, mem_gp over blocks), Data Merge, Feature Extraction, Multi-Model Prediction (LSTM: Point-by-Point and Sequence; XGBoost), Next Block Prediction of the gas price (GP). Notation: txs represents transactions.]
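The two mempool-based estimators that produce mem1_gp and mem2_gp in Section III-A can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names `mem1_gp` and `mem2_gp` are borrowed from the feature names, the inputs are plain lists of gas prices (fetching them from a node is left out), the fallback for a pending block with a single transaction is an assumption of the sketch, and the cumulative share is counted upward from the lowest gas price.

```python
from collections import Counter
from itertools import accumulate
from typing import Iterable, Optional, Sequence


def mem1_gp(pending_gas_prices: Sequence[float]) -> Optional[float]:
    """First scheme: sort the pending block's gas prices in descending
    order and return the second-to-last value."""
    if not pending_gas_prices:
        return None  # empty pending block: no estimate
    ordered = sorted(pending_gas_prices, reverse=True)
    # Assumption: with only one pending transaction, use its price.
    return ordered[-2] if len(ordered) >= 2 else ordered[0]


def mem2_gp(gas_prices: Iterable[float], threshold: float = 0.8) -> Optional[float]:
    """Second scheme: group recent transactions by gas price, sort the
    prices in ascending order, accumulate the counts, and return the first
    price whose cumulative share pct exceeds the threshold."""
    counts = Counter(gas_prices)          # (count, gas_price) grouping
    total = sum(counts.values())
    if total == 0:
        return None
    prices = sorted(counts)               # ascending gas prices
    running_counts = accumulate(counts[p] for p in prices)
    for price, running in zip(prices, running_counts):
        pct = running / total
        if pct > threshold:
            return price
    return prices[-1]  # only reached if threshold >= 1
```

For instance, for pending prices 60, 50, 45 and 42 gwei the first function returns 45; for four transactions at 40, five at 45 and one at 50 gwei the second returns 45, since 90% of the transactions are priced at or below it.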
point-by-point basis, predicting only one point at a time, and the following window uses the complete test data to predict the next point. The second method is to predict a complete sequence, using only a portion of the training data set to initialize the training window and moving the window as the model predicts the next point, as in the point-by-point mode. In the second-step prediction, only one piece of data comes from the previous prediction. In the third-step prediction, two pieces of data come from previous predictions, and so on for 50 steps, after which the data all come from previous predictions [17]. A simplified schematic diagram is shown in Fig. 5.

After obtaining the original data, we first convert the values to decimal, then process the null values in the data, and finally unify the unit to Gwei. The max fee per gas and max priority fee of all transactions are taken as feature inputs indexed by block number, and the average and median gas price are computed at the same time. The median gas price is used as the prediction target. As for empty block data, miners occasionally mine an empty block without packaging any transactions. The gas price of such data is 0. The frequency of such data is not high, and this is a normal phenomenon.

At the same time as the etherscan data crawling, the gas price data obtained by the two mempool-based prediction methods are collected. We refer to the data crawled through etherscan as the original data source. The two types of data predicted based on the Ethereum mempool are mem1_gp and mem2_gp. We merge the data of mem1_gp with the original

[Fig. 5: simplified schematic of the two prediction modes of the LSTM neural network: Point-by-Point prediction and Sequence Prediction.]
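The difference between the two prediction modes of Fig. 5 can be sketched independently of any particular network. In the sketch below, `model` is any callable that maps a window of past values to the next value; `trend_model` is a hypothetical stand-in for a trained LSTM, and all function names are illustrative rather than taken from the paper.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], float]


def predict_point_by_point(model: Predictor, series: Sequence[float],
                           window: int) -> List[float]:
    """Every prediction is made from a window of true observations."""
    return [model(series[i:i + window]) for i in range(len(series) - window)]


def predict_sequence(model: Predictor, seed: Sequence[float],
                     steps: int) -> List[float]:
    """Start from a window of true data and feed each prediction back in.
    After len(seed) steps the window holds only predicted values."""
    frame = list(seed)
    predictions = []
    for _ in range(steps):
        nxt = model(frame)
        predictions.append(nxt)
        frame = frame[1:] + [nxt]  # slide the window over the new prediction
    return predictions


def trend_model(window: Sequence[float]) -> float:
    """Stand-in predictor (assumption): extrapolate the last step linearly."""
    return window[-1] + (window[-1] - window[-2])
```

With a window of 50, as in the text, `predict_sequence` relies on one predicted value at the second step, two at the third, and only on predicted values once 50 steps have passed, which is why its errors can accumulate while point-by-point prediction is corrected by the true data at every step.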
TABLE I
EXAMPLE OF PREPROCESSED DATA

BlockNum  gas limit  gas used  BFPG     D    timestamp   tx   AMFPG     AMPFPG   MEM1  MEM2  MGP
14414732  30087858   30046709  42.7700  129  1647664473  591  64.8997   2.5284   38    44    44.1700
14414733  30058477   15337995  48.1016  129  1647664497  238  59.0064   4.3336   38    49    49.5016
14414734  30087829   20536989  48.2251  129  1647664514  428  64.7157   3.7728   38    49    49.6251
14414735  30117210   30110033  50.4262  129  1647664531  374  110.2117  11.3203  42    51    52.4262
...
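Per-block rows such as those in Table I can be derived from raw transaction data roughly as follows. This is a hedged sketch of the preprocessing steps described in the text (conversion to decimal, null handling, unit unification to Gwei, per-block average and median). The helper names, the dictionary keys `gasPrice`, `maxFeePerGas` and `maxPriorityFeePerGas`, the hexadecimal wei input format, and the choice to drop null values are assumptions of the sketch, not details given in the paper.

```python
from statistics import mean, median
from typing import Dict, List, Optional

WEI_PER_GWEI = 10**9


def hex_wei_to_gwei(value: Optional[str]) -> Optional[float]:
    """Convert a hex string in wei to a decimal value in gwei; None for nulls."""
    if value is None or value == "":
        return None
    return int(value, 16) / WEI_PER_GWEI


def block_features(block_num: int, txs: List[Dict[str, str]]) -> Dict[str, float]:
    """Aggregate the transactions of one block into a feature row."""
    def column(key: str) -> List[float]:
        converted = (hex_wei_to_gwei(tx.get(key)) for tx in txs)
        return [v for v in converted if v is not None]  # assumption: drop nulls

    gas_prices = column("gasPrice")
    max_fees = column("maxFeePerGas")
    max_priority_fees = column("maxPriorityFeePerGas")
    return {
        "BlockNum": block_num,
        "tx": len(txs),
        "AMFPG": mean(max_fees) if max_fees else 0.0,
        "AMPFPG": mean(max_priority_fees) if max_priority_fees else 0.0,
        "AGP": mean(gas_prices) if gas_prices else 0.0,    # average gas price
        "MGP": median(gas_prices) if gas_prices else 0.0,  # prediction target
    }
```

An empty block yields a gas price of 0 for both the average and the median, matching the treatment of empty blocks described in the text.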
In the above formula, ŷ is the predicted output value of the model, and ȳ represents the average of the actual values. The above measures have different meanings depending on the model, so the specific numbers are hard to interpret on their own. The value of R² is between 0 and 1, and it reflects the relative degree of the regression contribution, that is, the percentage of the change in the dependent variable that the regression relationship can explain.

The advantage of R² is that the results are normalized, making it easier to see the gaps between the models. Typically, for the training dataset, the range of values is [0, 1]. For the testing set, the value may also be negative [19]. Generally speaking, the higher the value, the better the independent variables explain the dependent variable, and the higher the percentage of the total change that is caused by the independent variables.

Following the gas price prediction work of Liu et al. [7] in 2019, we used data from the same period for comparative experiments. The prediction method for the mempool requires a real-time connection to nodes to collect the current Ethereum mempool data. Therefore, the comparative experiments can only be carried out within the range of features that can be collected. Table III shows the comparison of the indicators of the two models. From the MAE perspective, our model has improved considerably. From the RMSE perspective, there is less room for improvement without including the mempool input.

TABLE III
ERROR DATA COMPARISON

Model      Training Set     Testing Set
           MAE     RMSE     MAE     RMSE
LSTM       0.09    0.32     0.08    0.12
XGB        0.91    1.51     0.54    3.01
MLR-LSTM   3.21    1.60     3.04    1.76
MLR-XGB    2.66    1.51     2.75    1.54

V. DISCUSSION

In this section, we focus on some of the challenges encountered by our scheme and their potential causes. In addition,
we also point out other potential impacts that our method may bring.

Fig. 6. Overall Prediction of Gas Price.

A. Anomaly Detection

We find that our model does not predict well when faced with a sudden surge in the data. We have tried some solutions, such as adding a feature to evaluate the possible impact of certain unexpected events on the gas price, or detecting behaviors such as malicious price hikes that increase gas prices for a certain period. However, the experimental results still cannot accurately predict the mutation data. The frequency of such abnormal data is about 0.1%-0.2%. Although infrequent, it is still a challenge that our model needs to consider and address.

We tracked and analyzed all transactions in several blocks before and after these mutation data, as well as events that occurred around the same time and may have affected the gas price, and tried to discover the reasons and patterns behind them. We first selected 41 abnormal data points from more than 40,000 data points. We considered the related transaction data of the 5 blocks before and after each of them and the external events that occurred at the corresponding time. After the analysis, we summarize three potential causes of gas price mutation:

1) These abnormal transactions may come from the smart contracts of some token systems, such as NFT or other decentralized financial projects. They often choose to significantly increase the gas price for specific incentives so that their transactions can be included on the chain as soon as possible, ensuring that their financial needs are confirmed.
2) Some anomalous data are due to bidding waves of arbitrage bots. The prominent feature of this situation is that two or more transactions with the same address and nonce appear multiple times in the pending mempool at a specific moment with gradually increasing gas prices. In order to succeed in the zero-sum game, these arbitrage bots increase the gas price wildly, sometimes by hundreds of times, in order to beat their opponents, as described in detail in [20].
3) A tiny portion of the mutation data has no apparent pattern. We speculate that this may be due to personal reasons, such as curious attempts or misoperations when using Ethereum for the first time.

The first two types account for 36 of the 41 abnormal data points and can be classified as uncontrolled bidding behaviors, resulting in a surge in gas prices. The starting points of these behaviors are different, but they all have a common feature: the sender of the transaction does not care about the cost increase caused by the increase in gas price. They believe that transaction confirmation speed is more important than the economic cost of high consumption. These are complex challenges for our predictions because we can neither intervene in nor predict the subjective intentions of others. In essence, these abnormal data appear due to the contingency brought about by the Ethereum transaction bidding mechanism. However, we can also analyze these behaviors from other angles. We can mark these active addresses and increase the weight of this feature when the mark appears, to offset the impact of subjective factors on the gas price, which is left as future work.

B. Impacts on Transaction Fee Mechanism

Ethereum historically priced transaction fees using a simple auction mechanism, where users send transactions with bids ("gas prices"), miners choose the transactions with the highest bids, and transactions that get included pay the bid that they specify [15]. Although it can effectively satisfy the demand for many transactions to be uploaded quickly, a system that leaves the uploading order of transactions entirely in the hands of the miners gives them too much power, which may lead to several unfair behaviors such as frontrunning [20] or freeloading [21]. Meanwhile, since transaction fees are a vital source of income for miners, rational miners naturally hope the gas price can be as high as possible, because transaction fees = gas price * gas used. What we want is a consensus mechanism that is completely transparent and perfectly symmetric in information for all.

On the face of it, our work is only a prediction of the gas price and has little impact on the gas price itself. However, the gas price is unknown information for users, while miners, having access to the mempool, have more information about transactions and gas prices than users. Therefore, there is a potential possibility of attack. The real-time prediction of the gas price brought by our work gives many users a possible indicator of the gas price, like a lighthouse, increasing the transparency of this unknown property and thus potentially limiting some unfair behavior. Furthermore, based on our predictions, users can ideally better choose the timing of their transactions and more precisely choose the threshold of gas price needed to achieve their goals. This further reduces
the magnitude of gas price fluctuations, thus indirectly helping to achieve better stability of the gas price.

VI. CONCLUSION

In this paper, we propose a multi-model gas price prediction method combined with Ethereum mempool feature input. While collecting Ethereum block transaction data, we read the gas prices of transactions in the mempool. These data are used to predict the gas price of the next block through two non-machine-learning methods. The two kinds of prediction data are collected, processed, and fused with the selected feature data. Finally, the fused feature data are input into the two models. Experiments show that the model combined with the mempool feature input achieves excellent performance. Compared with the model without mempool feature input, the prediction ability is improved to a certain extent. Compared with existing gas price prediction work, it also shows a good improvement in prediction ability.

VII. ACKNOWLEDGEMENTS

This work was supported by the National Key R&D Program of China (Grant No. 2020YFB1005900, 2020B0101090002), the National Key R&D Program of Guangdong Province (Grant No. 2020B0101090002), the National Natural Science Foundation of China (Grant No. 62032025, 62071222, U21A201710, U20A201092), and the Natural Science Foundation of Jiangsu Province (Grant No. BK20200418, BK20220880).

REFERENCES

[1] "Average daily gas price of ethereum from august 2015 to may 16, 2022," [Online], 2022, https://www.statista.com/statistics/1221821/gas-price-ethereum/.
[2] T. Beiko, "London mainnet announcement," [Online], 2021, https://blog.ethereum.org/2021/07/15/london-mainnet-announcement/.
[3] "Footprint analytics," [Online], 2022, https://www.footprint.network/.
[4] "etherchain.org," [Online], 2022, https://etherchain.org/tools/gasnow/.
[5] "go-ethereum," [Online], 2022, https://github.com/ethereum/go-ethereum/.
[6] "Ethgasstation," [Online], 2022, https://ethgasstation.info/.
[7] F. Liu, X. Wang, Z. Li, J. Xu, and Y. Gao, "Effective gasprice prediction for carrying out economical ethereum transaction," in 2019 6th International Conference on Dependable Systems and Their Applications (DSA). IEEE, 2020, pp. 329–334.
[8] S. M. Werner, P. J. Pritz, and D. Perez, "Step on the gas? a better approach for recommending the ethereum gas price," in Mathematical Research for Blockchain Economy. Springer, 2020, pp. 161–177.
[9] Y. Liu, Y. Lu, K. Nayak, F. Zhang, L. Zhang, and Y. Zhao, "Empirical analysis of eip-1559: Transaction fees, waiting time, and consensus security," arXiv preprint arXiv:2201.05574, 2022.
[10] G. Chen, B. Xu, M. Lu, and N.-S. Chen, "Exploring blockchain technology and its potential applications for education," Smart Learning Environments, vol. 5, no. 1, pp. 1–10, 2018.
[11] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[12] C. Olah, "Understanding lstm networks," colah.github.io, 2015, http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
[13] T. Chen and C. Guestrin, "Xgboost: A scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 785–794.
[14] J. Aungiers, "Time series prediction using lstm deep neural networks," [Online], 2018, https://www.altumintelligence.com/articles/a/Time-Series-Prediction-Using-LSTM-Deep-Neural-Networks.
[15] G. Wood et al., "Ethereum: A secure decentralised generalised transaction ledger," Ethereum project yellow paper, vol. 151, no. 2014, pp. 1–32, 2014.
[16] V. Buterin, E. Conner, R. Dudley, M. Slipper, I. Norden, and A. Bakhta, "Eip-1559: Fee market change for eth 1.0 chain," [Online], 2019, https://github.com/ethereum/EIPs/blob/master/EIPS/eip-1559.md.
[17] H. Jang and J. Lee, "An empirical study on modeling and prediction of bitcoin prices with bayesian neural networks based on blockchain information," IEEE Access, vol. 6, pp. 5427–5437, 2017.
[18] "Etherscan," [Online], 2022, https://etherscan.io/.
[19] "sklearn.metrics, r2 score," [Online], 2022, https://scikit-learn.org/.
[20] P. Daian, S. Goldfeder, T. Kell, Y. Li, X. Zhao, I. Bentov, L. Breidenbach, and A. Juels, "Flash boys 2.0: Frontrunning in decentralized exchanges, miner extractable value, and consensus instability," in 2020 IEEE Symposium on Security and Privacy (SP). IEEE, 2020, pp. 910–927.
[21] F. Zhang, E. Cecchetti, K. Croman, A. Juels, and E. Shi, "Town crier: An authenticated data feed for smart contracts," in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016, pp. 270–282.