
2019 3rd International Conference on Circuits, System and Simulation

Smart Contract Defect Detection Based on Parallel Symbolic Execution

Zemin Tian
School of Computer Science and Technology
Beijing Institute of Technology
Beijing, 100081, China
e-mail: 2120161050@bit.edu.cn

Abstract—There are more than 1 million smart contracts in Ethereum, and the number of ethers managed by smart contracts has exceeded 100 million, but security vulnerabilities in smart contracts seriously jeopardize the financial security of Ethereum users. Existing methods that detect defects in smart contract bytecode with symbolic execution cannot ensure both accuracy and real-time detection. In this paper, a smart contract bytecode defect detection algorithm based on parallel symbolic execution is proposed. We split a smart contract into units of functions by analyzing the smart contract function selection process. A symbolic execution tree is constructed for each function to predict the function's execution paths. Then we partition the symbolic execution tree evenly into multiple subtrees. Finally, a process pool performs parallel symbolic execution on those subtrees to reduce the analysis time of smart contract defect detection. Experimental results show that our method significantly improves detection efficiency compared with the existing symbolic execution method. The speedup ratio is up to 3.1x on a 4-core computer. Moreover, the method introduces no false positives or false negatives.

Keywords-parallel optimization; smart contract; symbolic execution; defect detection; constraint solving

978-1-7281-3657-8/19/$31.00 ©2019 IEEE 127

I. INTRODUCTION

The smart contract was first proposed by Nick Szabo in 1995 [1]; it refers to a set of contracts defined in digital form. Currently, smart contracts have been implemented in various cryptocurrencies. Ethereum implements smart contracts in a nearly Turing-complete programming language. After writing a smart contract, the user compiles it into EVM bytecode and then sends the compiled contract to the Ethereum blockchain in a transaction. After these steps, the smart contract is deployed. A smart contract is also invoked through a transaction. Now, there are more than 1 million smart contracts in Ethereum, and the number of ethers managed by smart contracts has exceeded 100 million.

As the number of smart contracts increases, the number of defects in smart contracts has also increased significantly. The reasons include that some developers do not fully understand the characteristics of the smart contract language Solidity, and that the code is not audited before being uploaded. According to [2], after testing nearly 1 million smart contracts, researchers found that 34,200 smart contracts were defective, and the number of ethers managed by those contracts reached 4,905. The DAO attack resulted in the theft of ethers worth $60 million and prompted Ethereum's blockchain to fork [3]. Two defects in the Parity wallet resulted in the theft of approximately $30 million in ethers and the freezing of $150 million in ethers [4].

Researchers have done some remarkable work on smart contract defect detection. Mavridou et al. propose FSolidM [5] to automatically generate secure smart contract code. FSolidM avoids the defects introduced when developers write code by hand. However, since a deployed smart contract cannot be modified, this tool cannot be applied to smart contracts already deployed in Ethereum. Jiang et al. generate different inputs from the smart contract ABI for fuzzing to discover different types of defects [6]. This method is time-consuming because the contract has to be executed once for each input. Furthermore, the code coverage of the fuzzing inputs seriously affects detection accuracy, so dynamic symbolic execution is needed to improve the validity of test cases [7]. Its shortcoming is that the instrumentation or logging process of dynamic symbolic execution slows down the analysis. Formal verification has also been used in smart contract defect detection. Kalra et al. develop the tool ZEUS [8], which uses formal verification to detect defects in smart contract source code. ZEUS inserts assert statements into the source code and then converts the contract to LLVM bytecode. Finally, it uses its verifier to check whether the inserted assertions hold, achieving security and correctness verification. This method requires the source code of the contract. As of November 2018, there are over 1 million smart contracts in Ethereum, but only 40,000 of them are open source, less than 5% [9]. Thus, this method is not applicable to most smart contracts running in Ethereum.

Symbolic execution was first presented in 1976 in [10]. With the development of computer architecture and Satisfiability Modulo Theories (SMT) solvers, symbolic execution is widely used in program defect detection [11][12]. Compared with other defect detection methods, symbolic execution can directly detect defects in compiled executable files. It can traverse each execution branch to achieve high coverage and discover underlying problems through simulated execution. Recently, researchers have presented several smart contract defect detection tools based on static symbolic execution, of which OYENTE [13] is representative. OYENTE performs a depth-first traversal, guided by the control flow graph, to explore the entire execution space of a smart contract with symbolic execution.

In the process of analysis, an SMT solver is invoked whenever a conditional branch is encountered [14]. As the size of the
code increases, the analysis time of the program increases significantly. Therefore, it becomes extremely difficult to reach deep execution states and achieve high code coverage within a limited time. Table I shows the analysis time of 10 smart contracts using OYENTE.

TABLE I. ANALYSIS TIME OF 10 SMART CONTRACTS USING OYENTE

ID   Time/s     ID   Time/s
1    30.317     6    28.781
2    30.871     7    28.722
3    30.725     8    27.794
4    29.767     9    26.678
5    29.490     10   26.251

As shown in Table I, the analysis time of every contract exceeds 20 seconds (from 26.251 s to 30.871 s). In fact, Ethereum's block interval is only about 12 to 17 seconds [15]. If the current symbolic execution method is used to detect defects before a transaction is initiated, the transaction cannot be packaged into the blockchain in time.

In view of these shortcomings of previous smart contract defect detection methods based on symbolic execution, this paper exploits the characteristics of smart contract bytecode and proposes a parallel symbolic execution method that makes full use of multi-core computers. Our method reduces the time consumed by symbolic execution, thereby reducing the analysis time of smart contract defect detection without introducing false positives or false negatives.

II. DEFECT DETECTION OF SMART CONTRACT BASED ON SYMBOLIC EXECUTION

A. Symbolic Execution
The principle of symbolic execution is to represent program inputs with symbols and the variables derived from those inputs with symbolic expressions, then simulate the execution of program instructions while collecting the path constraints of branches. When a branch is encountered, its path constraint is solved by an SMT solver. If the constraint is satisfiable, the branch can be executed.

Symbolic execution can be divided into static symbolic execution [16] and dynamic symbolic execution [17]. Dynamic symbolic execution uses concrete values to guide program execution and collects path constraints during real execution. When a branch is encountered, its constraints are negated one by one and solved to generate new inputs for the corresponding branch, until all execution paths of the program have been traversed. However, dynamic symbolic execution needs to collect information at run time through a virtual machine, dynamic binary instrumentation, or debugging techniques [19], which usually slows down program execution. Moreover, each time a new test case is generated, the program has to be re-run, which further increases the analysis time. Comparatively speaking, static symbolic execution analyzes programs much faster, because it requires neither real execution of the program nor dynamic instrumentation, and branch path states can be saved and traversed one by one without repeatedly executing the program.

B. OYENTE
OYENTE is a defect detection system for Ethereum smart contracts. It symbolizes program inputs and Ethereum state variables, traverses the execution paths of the program depth-first under the guidance of the control flow graph, and collects the constraint conditions of defects and branch paths during symbolic execution. An SMT solver then solves these constraints, completing defect detection along all execution paths of the contract. The general workflow is shown in Figure 1.

[Figure 1 diagram. Preprocessing module: smart contract bytecode, Decompiler, CFG builder. Symbolic execution module: Load basic block; Symbolically execute instructions; Solve defect conditions; Solve path constraints; Collect path constraints; Symbolic Ethereum state; End of symbolic execution. Defect detection.]
Figure 1. The general workflow of OYENTE.

OYENTE consists of three parts: preprocessing, symbolic execution, and defect detection. During preprocessing, the bytecode of a smart contract is converted into a control flow graph that guides symbolic execution. During symbolic execution, OYENTE simulates the execution of the program and solves the constraint conditions of branch paths and of the defects to be detected. Simulated execution proceeds in units of basic blocks, and the instructions in each basic block are executed in sequence. If an instruction is related to a defect, a defect condition constraint is generated and solved by the SMT solver. When the end of a basic block is reached, if its last instruction is an unconditional jump, the jump target basic block is executed next. Otherwise, if the last instruction is a conditional jump, the SMT solver decides whether each of the two paths can be executed, and the feasible paths are executed one by one. During the defect detection phase, defects are reported by collecting the defect condition constraints solved in the previous step.

To find the program hotspot, we profiled OYENTE to obtain the execution time of the above phases; the statistics are shown in Figure 2.
It can be seen from Figure 2 that symbolic execution occupies on average 96.1% of the total analysis
time, while preprocessing and defect detection take 2.2% and 1.7%, respectively. Therefore, this paper mainly focuses on how to parallelize the symbolic execution process to reduce the analysis time of symbolic execution.

Figure 2. The execution time of all modules.

III. DEFECT DETECTION OF SMART CONTRACT USING PARALLEL SYMBOLIC EXECUTION ALGORITHM

In this paper, we propose a parallel optimization method to improve the performance of symbolic execution on multi-core processors. To make full use of the computing power of multi-core processors, we partition the symbolic execution work into multiple tasks and execute them in parallel with multiple processes. Our method significantly reduces the time cost of symbolic execution without introducing false positives or false negatives.

A. Collecting Function Entries
After many experiments on the execution of smart contract instructions, we found that a smart contract program contains a function selection process that is executed to locate the called function. When a smart contract is invoked, the program always starts at its first instruction. To execute a specific function, the contract first checks the size of the call data: if it is less than 4 bytes, the program jumps to the fallback function. Otherwise, the program reads the first 4 bytes of the call data, which hold the hash of the callee function (its function selector), and compares this value with the hashes of the contract's functions one by one; when a match is found, the contract jumps to the corresponding function and executes its instructions. Figure 3 shows this process.

Figure 3. The selection process of smart contract function.

The basic block at the entry point of each function can therefore be obtained by analyzing the jump relationships between the basic blocks of the function selection process. After the smart contract bytecode is decompiled, a control flow graph is constructed. The entry blocks of all functions are collected by analyzing the control flow graph from the initial basic block and tracking the jump relationships between basic blocks. The algorithm is shown in Algorithm 1.

Algorithm 1 starts from the initial basic block. When the last instruction of a basic block is a conditional jump and the basic block at the jump target is not yet in the set F, the algorithm inserts that target block into F and proceeds to the next basic block. This process is repeated until the basic block at the jump target already exists in F, at which point collection is complete.

After Algorithm 1 finishes, all function entries have been collected into the set F. Since every function can be the starting point of one execution of the smart contract, the entry blocks of all functions can serve as starting locations for symbolic execution. Therefore, once the function entries are collected, symbolic execution can be parallelized at the function level by starting symbolic execution from the entry points of all functions in parallel.

B. Constructing the Symbolic Execution Tree
Although function-level parallelism achieves coarse-grained parallel symbolic execution, the number of basic blocks differs from function to function, and individual functions may contain loops and a large number of basic blocks. To achieve better parallelism and reduce the time cost of parallel symbolic execution, tasks must be partitioned at the basic block level.

The basic goal of basic block-level task partitioning is to make the number of basic blocks executed per task roughly equal, which requires counting the basic blocks covered by each branch path. This paper predicts the number of execution paths of each function and the number of basic blocks on each execution path by constructing a symbolic execution tree from the control flow graph of each function. The algorithm is shown in Algorithm 2.
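Returning to the function selection process of Section III.A, the following minimal Python sketch models the dispatcher and the entry collection of Algorithm 1. It is an illustration under stated assumptions, not the authors' implementation: the CFG is assumed to be already built from decompiled bytecode, and `BasicBlock`, `select_function`, `collect_function_entries`, and the dispatcher offsets are hypothetical. The two selectors are the ERC-20 `transfer(address,uint256)` and `balanceOf(address)` selectors, used only as sample data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

FALLBACK = "fallback"

# Sample (selector, name) pairs, compared in this order by the dispatcher model.
ERC20_FUNCTIONS: List[Tuple[bytes, str]] = [
    (bytes.fromhex("a9059cbb"), "transfer"),   # transfer(address,uint256)
    (bytes.fromhex("70a08231"), "balanceOf"),  # balanceOf(address)
]

def select_function(calldata: bytes, functions: List[Tuple[bytes, str]]) -> str:
    """Model of the dispatcher: short call data goes to the fallback,
    otherwise the first 4 bytes are matched against each selector in turn."""
    if len(calldata) < 4:
        return FALLBACK
    selector = calldata[:4]
    for function_selector, name in functions:
        if selector == function_selector:
            return name
    return FALLBACK  # no match: fallback (a contract without one would revert)

@dataclass
class BasicBlock:
    last_op: str                        # mnemonic of the block's last instruction
    jump_target: Optional[int] = None   # target offset when the block ends with a jump
    fallthrough: Optional[int] = None   # offset of the next block in program order

def collect_function_entries(cfg: Dict[int, BasicBlock], start: int = 0) -> Set[int]:
    """Sketch of Algorithm 1: walk the dispatcher and gather JUMPI targets into F."""
    entries: Set[int] = set()
    visited: Set[int] = set()           # guard against cycles in malformed CFGs
    current: Optional[int] = start
    while current is not None and current not in visited:
        visited.add(current)
        block = cfg[current]
        if block.last_op == "JUMPI":
            if block.jump_target in entries:  # target already in F: collection complete
                break
            entries.add(block.jump_target)
        current = block.fallthrough           # proceed to the next basic block
    return entries

# Hypothetical dispatcher CFG; offsets are illustrative, not from a real contract.
DISPATCHER = {
    0x00: BasicBlock("JUMPI", jump_target=0x50, fallthrough=0x0D),  # CALLDATASIZE < 4 -> fallback
    0x0D: BasicBlock("JUMPI", jump_target=0x60, fallthrough=0x1E),  # selector == transfer
    0x1E: BasicBlock("JUMPI", jump_target=0x90, fallthrough=0x29),  # selector == balanceOf
    0x29: BasicBlock("JUMPI", jump_target=0x50, fallthrough=None),  # re-targets fallback: stop rule
}
```

Here `collect_function_entries(DISPATCHER)` yields `{0x50, 0x60, 0x90}`, i.e. the fallback plus two function entries, and `select_function(bytes.fromhex("70a08231") + bytes(32), ERC20_FUNCTIONS)` returns `"balanceOf"`.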
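The traversal of Algorithm 2, whose rules (a tree node at every conditional jump, bounded edge traversal, subsequent-block counting) are detailed in the description that follows, can be sketched in Python as below. This is one possible reading, not the authors' code: the `Block`/`TreeNode` classes and the per-path `edge_limit` counter are hypothetical, and counting a node's own conditional-jump block in its block list, as well as turning a path that ends without a conditional jump into a leaf node, are assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Block:
    last_op: str       # mnemonic of the block's last instruction
    succs: List[str]   # JUMPI: [jump target, fall-through]; other jumps: [next]; exits: []

@dataclass
class TreeNode:
    blocks: List[str]                                    # block list from parent node to here
    children: List["TreeNode"] = field(default_factory=list)
    subsequent: int = 0                                  # blocks executed from this node onward

def build_symbolic_execution_tree(cfg: Dict[str, Block], entry: str,
                                  edge_limit: int = 1) -> TreeNode:
    """Depth-first traversal turning one function's CFG into a symbolic execution tree."""

    def walk(start: str, edge_counts: Counter) -> TreeNode:
        path: List[str] = []
        current = start
        while True:
            path.append(current)
            block = cfg[current]
            if block.last_op == "JUMPI":                 # conditional jump: new tree node
                node = TreeNode(blocks=path)
                for succ in block.succs:
                    edge = (current, succ)
                    if edge_counts[edge] >= edge_limit:
                        continue                         # edge used up on this path: prune
                    counts = edge_counts.copy()          # each branch keeps its own counts
                    counts[edge] += 1
                    node.children.append(walk(succ, counts))
                # S(node) = |block list| + S(left child) + S(right child)
                node.subsequent = len(path) + sum(c.subsequent for c in node.children)
                return node
            if not block.succs:                          # STOP/RETURN/REVERT: leaf node
                return TreeNode(blocks=path, subsequent=len(path))
            edge = (current, block.succs[0])
            if edge_counts[edge] >= edge_limit:          # back edge used up: end the path
                return TreeNode(blocks=path, subsequent=len(path))
            edge_counts[edge] += 1
            current = block.succs[0]

    return walk(entry, Counter())

# Hypothetical function: an if/else (A) followed by a loop (L) whose body jumps back.
CFG = {
    "A": Block("JUMPI", ["B", "C"]),
    "B": Block("JUMP", ["L"]),
    "C": Block("JUMP", ["L"]),
    "L": Block("JUMPI", ["BODY", "EXIT"]),
    "BODY": Block("JUMP", ["L"]),   # back edge
    "EXIT": Block("RETURN", []),
}
```

With `edge_limit=1` each loop body is entered at most once per path, so `build_symbolic_execution_tree(CFG, "A")` describes four paths and its root has `subsequent == 13`: blocks shared by several paths are counted once per tree node rather than once per path.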
Algorithm 2 traverses the control flow graph in depth-first order to count the basic blocks on each path. When the control flow graph has back edges, the counting method differs, because a back edge in the control flow graph represents a loop in the program. When the traversal reaches a back edge, it enters the loop body, and the traversal could keep cycling through the basic blocks of the loop body. The proposed algorithm avoids this by limiting the number of times each edge is traversed. During the traversal, each basic block whose last instruction is a conditional jump becomes a node of the symbolic execution tree, and the other basic blocks are recorded in the block list of a node.

After the traversal, the algorithm has converted the control flow graph into a symbolic execution tree. Each tree node contains a basic block ending with a conditional jump instruction, and the basic blocks on the traversal path between the node and its parent node are collected into the node's basic block list. The number of basic blocks to be executed from a node onward is the sum of the numbers of subsequent basic blocks of its left and right child nodes plus the length of the node's basic block list.

C. Dividing the Symbolic Execution Tree
To achieve better parallelization, the load must be balanced during parallel execution [18]. Therefore, the symbolic execution tree needs to be divided evenly according to the number of subsequent basic blocks. Algorithm 3 describes the partitioning process of the symbolic execution tree (SET).

Algorithm 3 Partition SET
partitionSymTree( )
Data: SET T, subtree limit N, deviation threshold
Result: Partitioned path set P
01:
02: // the number of nodes in T
03: if
04:     current path selection;
05:     return;
06: else
07:     partitionSymTree( T.leftChild );
08:     partitionSymTree( T.rightChild );

The number of basic blocks per task is determined by the total number of blocks to be executed and the number of processes. The total number of basic blocks to be executed equals the number of subsequent blocks of the root node of the symbolic execution tree. The number of basic blocks per task is approximately the total number of basic blocks divided by the number of processes. However, since the number of basic blocks in each branch of the symbolic execution tree is not uniform, a threshold is added so that the number of basic blocks in each task floats within a reasonable range.

Algorithm 3 performs a depth-first traversal of the symbolic execution tree from the root node. When the number of subsequent basic blocks of a node satisfies the division condition, the execution path from the root node to the current node is split off as a subtree, and the traversal returns to the parent node and continues with the other nodes.

Finally, the symbolic execution tree is divided into subtrees that share the same root node and have a similar number of basic blocks to execute; each subtree is essentially one execution task.

D. Parallel Execution
After the previous step, the symbolic execution tree of each function is converted into a task queue, so the symbolic execution work of the whole smart contract is divided into multiple small symbolic execution trees, which form the task queues. This paper processes these task queues with a process pool. When a process executes a task, it starts from the root node of the symbolic execution tree and follows a specific branch, thereby completing the traversal of a specific symbolic execution subtree. The parallel execution process is shown in Figure 4.

Figure 4. The parallel execution of symbolic execution tree.
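The partitioning of Section III.C and the process-pool execution just described can be sketched together in Python. The division condition is an assumption, since the exact test in Algorithm 3 is not recoverable from the listing: with target N = ⌈S(root)/p⌉ for p processes and deviation threshold δ, a node becomes a task when S(node) ≤ N + δ or when it is a leaf; otherwise its children are visited depth-first. `Task`, `partition`, `execute_task` (which only counts blocks instead of executing them symbolically), and `run_tasks` are hypothetical names.

```python
import math
import multiprocessing
from dataclasses import dataclass, field
from multiprocessing.dummy import Pool as ThreadPool  # same Pool API, backed by threads
from typing import Callable, List

@dataclass
class TreeNode:
    blocks: List[str]                                    # block list of this node
    children: List["TreeNode"] = field(default_factory=list)

    @property
    def subsequent(self) -> int:
        """S(n): length of the block list plus S of every child."""
        return len(self.blocks) + sum(c.subsequent for c in self.children)

@dataclass
class Task:
    prefix: List[str]   # blocks replayed from the root to rebuild the symbolic state
    subtree: TreeNode   # subtree this task explores

def partition(root: TreeNode, processes: int, delta: int) -> List[Task]:
    """Split the tree into tasks of about S(root) / processes basic blocks each."""
    target = math.ceil(root.subsequent / processes)
    tasks: List[Task] = []

    def visit(node: TreeNode, prefix: List[str]) -> None:
        # Assumed division condition: small enough, or impossible to split further.
        if node.subsequent <= target + delta or not node.children:
            tasks.append(Task(prefix=prefix, subtree=node))
            return
        for child in node.children:                      # left subtree first
            visit(child, prefix + node.blocks)

    visit(root, [])
    return tasks

def execute_task(task: Task) -> int:
    """Stand-in worker returning how many basic blocks the task would execute.

    A real worker would symbolically execute `prefix` from the root and then
    explore every path of `subtree`; this sketch only counts the blocks.
    """
    return len(task.prefix) + task.subtree.subsequent

def run_tasks(tasks: List[Task], processes: int = 4,
              pool_factory: Callable = multiprocessing.Pool) -> List[int]:
    """Run all tasks in a pool; results are returned in task order."""
    with pool_factory(processes) as pool:
        return pool.map(execute_task, tasks)

# Hypothetical tree with S(root) = 12.
ROOT = TreeNode(["A"], [
    TreeNode(["B"], [TreeNode(["C", "D", "E"]), TreeNode(["F", "G"])]),
    TreeNode(["H", "I"], [TreeNode(["J"]), TreeNode(["K", "L"])]),
])
```

With four processes and δ = 1, `partition(ROOT, 4, 1)` yields four tasks that execute 5, 4, 4 and 5 blocks. The 6 executions beyond S(root) = 12 come from replaying shared prefixes (A four times, and B, H and I twice each, versus once each in a sequential run), the same kind of repetition that the experiments measure. In a real run, `run_tasks` would be called with the default `multiprocessing.Pool` under an `if __name__ == "__main__":` guard; the thread-backed `ThreadPool` is convenient for debugging.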
With the parallel method proposed in this paper, the hotspot of each program can be distributed evenly across processes and executed in parallel, thereby reducing the time of symbolic execution.

Since the core of symbolic execution is simulating program execution, the state of any node in the symbolic execution tree depends on the results of the preceding symbolic execution. Therefore, each task must be executed from the root node so that node states do not change after parallelization, which guarantees that our parallel method introduces no false positives or false negatives into symbolic execution detection.

Because each task is executed from the beginning of the function, the basic blocks near the root node of the symbolic execution tree are symbolically executed repeatedly by each process. However, since the number of repeatedly executed basic blocks is much smaller than the total number of basic blocks each task must symbolically execute, this repeated execution caused by the partitioned symbolic execution tree has little influence on parallel efficiency.

IV. EXPERIMENTAL RESULT AND ANALYSIS

A. Experimental Environment
To evaluate the acceleration achieved by our defect detection method based on parallel symbolic execution, 20 smart contracts were extracted from Ethereum for testing. The experimental environment and system configuration are shown in Table II.

TABLE II. EXPERIMENTAL ENVIRONMENT.

Item                                               Description
Hardware: CPU                                      Intel Xeon E3-1225 v5, 4 cores
Hardware: Memory                                   DDR3, 8GB
Software: Operating System                         Debian 8
Software: Number of processes of parallel method   4
Software: SMT Solver                               Z3 4.50
Software: Python                                   2.7
Software: Ethereum Client                          Go-ethereum 1.8.23
Software: Ethereum Virtual Machine                 EVM 1.73

This paper chooses two common smart contract defects for evaluation.
1) Reentrancy. When a smart contract calls another smart contract, it waits for the call to return, while the callee may exploit the caller's intermediate state to launch an attack. When the CALL instruction is symbolically executed, if the updated variables cannot affect the current path constraints, there is a reentrancy defect.
2) Timestamp dependency. When the timestamp of the current block affects key operations of the contract, such as ether transfers, there is a timestamp dependency defect. Symbolic execution determines whether the defect exists by symbolizing the block timestamp and checking whether this symbol appears in the path constraint of the transfer operation. If it does, the defect exists.

B. Experimental Results

[Figure 5 plot: analysis time (s, 0 to 30) of the parallel method and OYENTE for smart contracts 1 to 20]
Figure 5. Analysis time of parallel method and OYENTE.

TABLE III. STATISTICS OF EXPERIMENTAL RESULTS OF PARALLEL METHOD.

Item                                                                       Value
Average analysis time                                                      8.35s
Maximum speedup ratio                                                      3.1
Average speedup ratio                                                      2.34
Number of executions of basic blocks                                       64131
Number of repeated executions of basic blocks by symbolic execution tree   5019

Figure 5 shows the analysis time of our parallel method and of OYENTE. Our parallel method takes much less time than OYENTE; the statistics of its experimental results are shown in Table III. The average analysis time is 8.35s and the average speedup ratio is 2.34, while the maximum speedup ratio reaches 3.1. The total number of symbolically executed basic blocks counted in the table is much larger than the total number of basic blocks in the 20 contracts, because the control flow graphs of the smart contracts contain back edges. The table also shows the number of basic blocks executed repeatedly because of the symbolic execution tree, which is 7.8% (5019/64131) of all basic block executions. Thus, the repeated executions caused by the symbolic execution tree do not noticeably reduce parallel efficiency.

TABLE IV. EXPERIMENTAL RESULTS OF THE DEFECT DETECTION WITH TWO METHODS

                       OYENTE   Parallel method
Reentrancy             2        2
Timestamp dependency   1        1

Table IV shows the results of defect detection with OYENTE and with the parallel method. The results detected by our method are consistent with
those of OYENTE, which indicates that our parallel method does not increase the false positive rate or the false negative rate while accelerating the analysis process.

V. CONCLUSION
This paper proposes a parallel algorithm for symbolic execution-based defect detection of smart contracts. We implement the parallel algorithm on top of OYENTE, an open-source system for detecting defects in smart contracts, and evaluate it on 20 smart contracts. The experimental results show that our parallel method makes full use of multi-core computers without introducing extra false positives or false negatives. On a computer with a 4-core CPU, the speedup ratio of our parallel algorithm reaches 3.1, and the average speedup ratio is 2.34. In future work, we plan to reduce the amount of symbolized data by using the real data of smart contracts in Ethereum, and to replace the symbolic execution tree with inter-process communication to eliminate redundant repeated executions.

REFERENCES
[1] Szabo, Nick. "Smart contracts: building blocks for digital markets." EXTROPY: The Journal of Transhumanist Thought, (16) 18 (1996).
[2] Nikolić, Ivica, et al. "Finding the greedy, prodigal, and suicidal contracts at scale." Proceedings of the 34th Annual Computer Security Applications Conference. ACM, 2018.
[3] Atzei, Nicola, Massimo Bartoletti, and Tiziana Cimoli. "A survey of attacks on Ethereum smart contracts (SoK)." International Conference on Principles of Security and Trust. Springer, Berlin, Heidelberg, 2017.
[4] McCorry, Patrick, Malte Möser, and Syed Taha Ali. "Why preventing a cryptocurrency exchange heist isn't good enough." Cambridge International Workshop on Security Protocols. Springer, Cham, 2018.
[5] Mavridou, Anastasia, and Aron Laszka. "Designing secure Ethereum smart contracts: A finite state machine based approach." arXiv preprint arXiv:1711.09327 (2017).
[6] Jiang, Bo, Ye Liu, and W. K. Chan. "ContractFuzzer: Fuzzing smart contracts for vulnerability detection." Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. ACM, 2018.
[7] Wu, Zhi-yong, et al. "Survey of fuzzing." Application Research of Computers 27.3 (2010): 829-832.
[8] Kalra, Sukrit, et al. "ZEUS: Analyzing safety of smart contracts." 25th Annual Network and Distributed System Security Symposium (NDSS'18). 2018.
[9] Barton, James. "How Many Ethereum Smart Contracts Are There?" CoinDiligent, 8 Nov. 2018, coindiligent.com/how-many-ethereum-smart-contracts.
[10] King, James C. "Symbolic execution and program testing." Communications of the ACM 19.7 (1976): 385-394.
[11] Baldoni, Roberto, et al. "A survey of symbolic execution techniques." ACM Computing Surveys (CSUR) 51.3 (2018): 50.
[12] Siegel, Stephen F., et al. "Combining symbolic execution with model checking to verify parallel numerical programs." ACM Transactions on Software Engineering and Methodology (TOSEM) 17.2 (2008): 10.
[13] Luu, Loi, et al. "Making smart contracts smarter." Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016.
[14] Moser, Andreas, Christopher Kruegel, and Engin Kirda. "Exploring multiple execution paths for malware analysis." 2007 IEEE Symposium on Security and Privacy (SP'07). IEEE, 2007.
[15] Buterin, Vitalik. "Toward a 12-second block time." Ethereum Blog (2014).
[16] Young, Michal, and Richard N. Taylor. "Combining static concurrency analysis with symbolic execution." IEEE Transactions on Software Engineering 14.10 (1988): 1499-1511.
[17] Galeotti, Juan Pablo, Gordon Fraser, and Andrea Arcuri. "Improving search-based test suite generation with dynamic symbolic execution." 2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 2013.
[18] Whitman, Scott. "Dynamic load balancing for parallel polygon rendering." IEEE Computer Graphics and Applications 14.4 (1994): 41-48.
[19] Pǎsǎreanu, Corina S., et al. "Combining unit-level symbolic execution and system-level concrete execution for testing NASA software." Proceedings of the 2008 International Symposium on Software Testing and Analysis. ACM, 2008.
