
DESIGNING A DATABASE FOR ETHEREUM SMART CONTRACT

VULNERABILITY ANALYSES

BSCS COMPUTER SCIENCE

Hamza Ahmed BSCS17023

Session: 2017-2021

DEPARTMENT OF COMPUTER SCIENCE

INFORMATION TECHNOLOGY UNIVERSITY

LAHORE, PAKISTAN

DESIGNING A DATABASE FOR ETHEREUM SMART CONTRACT
VULNERABILITY ANALYSES

A thesis submitted in partial fulfillment of the requirements for the

Degree of Bachelor of Science in

Computer Science

Hamza Ahmed BSCS17023 2017-2021

Dr. Muhammad Umar Janjua

Mr. Waqar Ahmed

Dr. Waqas Sultani

Dr. Khurram Bhatti

Declaration

This thesis is a presentation of my original research work. Wherever contributions of others are involved, every effort has been
made to indicate this clearly, with due reference to the literature and acknowledgement of collaborative research and
discussions. I also declare that this work is the result of my own investigations, except where identified by references, and
that it is free from plagiarism of the work of others.

Signature: ………….

Student Name Hamza Ahmed

Date: 10/09/21

The undersigned hereby certify that they have read and recommend this project entitled “DESIGNING A
DATABASE FOR ETHEREUM SMART CONTRACT VULNERABILITY ANALYSES” by Hamza Ahmed
for the degree of Bachelor of Science in Computer Science.

_________________________________

Dr. Muhammad Umar Janjua, Supervisor

_________________________________

Mr. Waqar Ahmad, Final Year Project Committee Member

_________________________________

Dr. Waqas Sultani, Final Year Project Committee Member

_________________________________

Dr. Khurram Bhatti, Final Year Project Committee Member

Acknowledgement

I would like to thank Allah Almighty and acknowledge my research supervisor, Dr. Umar, who took me under his wing for
this project. My gratitude also goes out to Ms. Tania Saleem and Ms. Maha Ayub, who helped me during the thesis.

Dedication

I would like to dedicate this thesis to my parents and my sibling, without whose support this would not have been
possible.

Table of Contents

Part I (Comparison of Vulnerability Analysis Tools)

Introduction

Terminology

Literature Review

Motivation

Methodology

Tools

    Tools by technique

Vulnerabilities

Evaluation

Conclusion

Part II (Developing a Database for Transper and Leveraging It for Analysis of Security Vulnerabilities in Ethereum Smart
Contracts)

Introduction

Literature Review

Problem Statement

Implementation

    Schema

    Schema explanation

Performance Analysis

Conclusion

References
List of Tables

Table 1

Table 2

Table 3

Table 4

Table 5

Table 6

List of Figures

Fig. 1

Fig. 2

Fig. 3

Fig. 4

Fig. 5

Fig. 6

Fig. 7

List of Abbreviations

ETH-Ether

DApps-Decentralized Applications

SWC-Smart Contract Weakness Classification Registry

DL-Deep Learning

ML-Machine Learning

CFG-Control Flow Graph

EVM-Ethereum Virtual Machine

AST-Abstract Syntax Tree

ER-Entity Relationship

Part I

Comparison of vulnerability analysis tools for Ethereum Smart Contracts

Abstract

Smart contracts, introduced with Ethereum, brought generalized computing to the Blockchain.
However, because smart contracts are programmable, they also opened up the opportunity for these
contracts/programs to be exploited. Multiple hacks throughout the Ethereum environment over the years led to
the rise of tools that detect vulnerabilities, bugs and bad coding practices.

This paper focuses on vulnerability analysis tools that are open source and have working implementations.
The survey is standardized by choosing the 37 vulnerabilities listed in the SWC Registry (Smart Contract
Weakness Classification Registry). We conduct the survey over metrics that are important to the evaluation
of a vulnerability analysis tool, as well as the technique it uses. We also answer predefined research questions
intended to help developers, penetration testers and researchers. The paper intends to provide a holistic view
of the ecosystem in order to help the average user choose a tool for their particular needs.

Introduction

Blockchain was first introduced by Satoshi Nakamoto in 2009 as the technology underlying Bitcoin, the
first cryptocurrency, and has since supported other cryptocurrencies as well [1]. Blockchain is a distributed ledger
technology that runs on nodes, i.e. computers that run instances of the Bitcoin program. It uses a distributed
consensus protocol known as proof of work (PoW). Miners are the nodes on which the mining program for Bitcoin,
or for any other cryptocurrency, runs. The mining program solves cryptographic puzzles in order to generate
blocks, i.e. groups of transactions that have been bundled together and made part of the Blockchain. Each block
contains the hash of the previous block, so the blocks form a linked-list-like chain, while the transactions inside
each block are summarized by a Merkle tree. Ethereum is the second largest Blockchain platform, introduced by
Vitalik Buterin in 2014 [2]. It has its own cryptocurrency, Ether. Ethereum also uses a proof of work consensus
algorithm similar to Bitcoin's. However, as of the writing of this paper, it is planned to shift to a Proof of Stake
algorithm [3].
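To make the hash-linking and the proof-of-work puzzle described above concrete, the following is a minimal Python sketch. It is a toy under stated assumptions: the `Block` layout is hypothetical, hashing is a single SHA-256 over a JSON encoding, and difficulty is expressed as a number of leading hex zeros; none of this is Bitcoin's or Ethereum's actual block format.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Block:
    """Toy block for illustration only; not the real Bitcoin or Ethereum header layout."""
    index: int
    prev_hash: str
    transactions: list
    nonce: int = 0
    block_hash: str = field(default="", init=False)

    def compute_hash(self) -> str:
        # Hash every field except the stored hash itself.
        payload = json.dumps([self.index, self.prev_hash, self.transactions, self.nonce])
        return hashlib.sha256(payload.encode()).hexdigest()


def mine(block: Block, difficulty: int) -> Block:
    """Proof of work: increment the nonce until the hash starts with `difficulty` zeros."""
    target = "0" * difficulty
    while not block.compute_hash().startswith(target):
        block.nonce += 1
    block.block_hash = block.compute_hash()
    return block


def is_valid_chain(chain: list, difficulty: int) -> bool:
    """Every block must meet the difficulty and point at the previous block's hash."""
    for i, block in enumerate(chain):
        if block.block_hash != block.compute_hash():
            return False  # contents changed after the block was mined
        if not block.block_hash.startswith("0" * difficulty):
            return False  # no valid proof of work
        if i > 0 and block.prev_hash != chain[i - 1].block_hash:
            return False  # link to the previous block is broken
    return True


DIFFICULTY = 3  # number of leading hex zeros required (toy value)
genesis = mine(Block(0, "0" * 64, ["coinbase -> alice: 50"]), DIFFICULTY)
second = mine(Block(1, genesis.block_hash, ["alice -> bob: 10"]), DIFFICULTY)
chain = [genesis, second]
print(is_valid_chain(chain, DIFFICULTY))  # True

genesis.transactions[0] = "coinbase -> mallory: 50"  # tamper with history
print(is_valid_chain(chain, DIFFICULTY))  # False
```

Because each block stores its predecessor's hash, changing an old transaction invalidates that block's hash and breaks the link from every later block, which is why rewriting history requires redoing the proof of work.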

Ethereum allows developers to write smart contracts and run them on the Ethereum Virtual Machine
(EVM), a global computer comprising all the nodes running the Ethereum program; these nodes make up the
Ethereum network [4]. Smart contracts were first proposed by Szabo [5] in 1996 and were later implemented as
part of the Ethereum platform. These contracts introduced generalized programming and made it possible to
develop DApps (Decentralized Applications) such as games, contracts and other applications. Ethereum has its
own programming language, Solidity, for coding these smart contracts.

Smart contracts are essentially program code that is immutable and public. Once a smart contract is
deployed to the Ethereum network, it is registered with an address and cannot be modified, due to the nature
of the Blockchain. Yet it is still a computer program and can be hacked like any other. Since smart contracts deal
in ETH, the Ethereum cryptocurrency, hacking them results in real monetary loss, and this financial reward makes
hacks more likely. Examples include: i) the DAO attack in 2016, which resulted in a loss of 50M dollars [6];
ii) the Rubixi and Parity wallet attacks [7], [8], [9], resulting in a loss of 3.8M dollars; and iii) the 51 percent
attack [10], which resulted in a loss of 5.6M dollars.

These exploits have resulted in millions of dollars of losses; in 2020 alone, attacks resulted in losses of
3.8 billion dollars [11]. This motivated developers to build tools for checking the Solidity code of smart
contracts. Many tools have been built for smart contract verification and vulnerability analysis. These tools use
a variety of techniques and differ in the vulnerabilities they target.

Some of these tools use static analysis techniques, such as [12] and [13], while others use dynamic
analysis techniques, such as [14], [15] and [16]. Still others use Deep Learning and Machine Learning techniques,
such as [17] and [18]. The existing analysis techniques fall into these three major categories. The paper [19]
details many tools but selects only a few of them for the testing phase; testing 9 out of 35 tools does not give a
holistic overview of the tools.

Terminology

A. Blockchain

Blockchain is a type of DLT (Distributed Ledger Technology) that is immutable and pseudonymous. It was
first introduced in a paper by Satoshi Nakamoto, in 2009, as the technology that made Bitcoin possible. The
blockchain is a linked-list-like data structure; it is secure because the data is publicly shared and, once
time-stamped and hash-linked into the chain, effectively immutable. Blocks are approved by a consensus
algorithm run by the nodes. The structure links blocks, i.e. groups of transactions, each containing the
cryptographic hash of the previous block, a timestamp, and data, which is essentially the transactions. The
transactions within a block are represented as a Merkle tree. Under proof of work, adding a block to the
blockchain requires solving a mathematical puzzle. The blockchain is, however, vulnerable to a 51% attack: even
though it is cryptographically secure and immutable, an attacker controlling more than half of the network's
mining power could get corrupt transactions approved.
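The Merkle tree mentioned above can be illustrated with a short Python sketch that computes a Merkle root over a block's transactions. It is a simplified version under stated assumptions: it uses a single SHA-256 (Bitcoin actually uses double SHA-256, and Ethereum uses Merkle Patricia tries instead), and an odd node at any level is paired with itself, following Bitcoin's convention.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(transactions: list[bytes]) -> bytes:
    """Hash the leaves, then repeatedly hash adjacent pairs until one node remains."""
    if not transactions:
        raise ValueError("a block needs at least one transaction")
    level = [sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


txs = [b"alice->bob:10", b"bob->carol:4", b"carol->dave:1"]
root = merkle_root(txs)
tampered = merkle_root([b"alice->bob:99", b"bob->carol:4", b"carol->dave:1"])
print(root != tampered)  # True: changing any transaction changes the root
```

Storing only the 32-byte root in the block header is enough to commit to every transaction in the block, since any change to a leaf propagates up to the root.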

B. Ethereum

Ethereum is a contemporary of Bitcoin that also uses a Blockchain. It introduced general purpose
computing on the Blockchain through Smart Contracts. Ethereum runs as a global VM (Virtual Machine), where a
state change is made part of the Blockchain by executing it on all the nodes. Ethereum was developed by Vitalik
Buterin. Unlike Bitcoin, it has two types of account: a programmable account, which is the smart contract, and a
user account used for dealing with ETH. The latter is only used for transactions, while the former holds code
written to run on the EVM (Ethereum Virtual Machine). Otherwise, Ethereum is similar to the Bitcoin network.
What it adds is Gas: each step of a contract's computation has a cost in gas, and the fee paid for this gas is
awarded to the miner who mines the transaction.
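As a worked example of the gas mechanism, under the legacy (pre-EIP-1559) fee model the fee is simply the gas used multiplied by the gas price. The 21,000 gas figure is the fixed cost of a plain ETH transfer; the 50 gwei gas price is an arbitrary illustrative value.

```python
from decimal import Decimal

WEI_PER_GWEI = 10**9
WEI_PER_ETH = 10**18


def legacy_tx_fee_wei(gas_used: int, gas_price_gwei: int) -> int:
    """Legacy (pre-EIP-1559) fee: every unit of gas consumed is paid at gas_price."""
    return gas_used * gas_price_gwei * WEI_PER_GWEI


fee = legacy_tx_fee_wei(gas_used=21_000, gas_price_gwei=50)
print(fee)                                  # 1050000000000000 (wei)
print(Decimal(fee) / Decimal(WEI_PER_ETH))  # 0.00105 (ETH)
```

Since the London upgrade (EIP-1559, August 2021), the price per unit of gas is split into a base fee, which is burned, and a priority tip, which goes to the miner, so only the tip part of the fee now reaches the miner.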

C. Smart Contract

Smart contracts were first proposed by Szabo. A smart contract is written in Solidity and compiled to bytecode
that runs on the EVM. Once a contract is part of the Blockchain, it executes automatically whenever a user account
invokes one of its functions with the appropriate arguments. Like any other program, a contract can have
vulnerabilities. Because the Blockchain is immutable, these vulnerabilities become a permanent part of a deployed
contract and cannot be removed. There are techniques to improve the code; the most common is to run
vulnerability scanners on a smart contract to catch bugs proactively.

Literature Review

A few studies have compared tools, as shown in Table I. However, they have limitations, which we discuss
in the following paragraphs.

Rick et al. [20] compare only static analysis tools. The study limits itself to 3 tools and only 5
vulnerabilities, comparing the vulnerabilities each tool claims to detect against those it actually detects. It also
details best practices for smart contracts. However, the tools belong to a single category, their number is too
limited, and there is no standardized list of bugs or vulnerabilities to check against. Tools Compared: Mythril,
Oyente, Porosity.

The survey in [21] covers only 9 tools and 8 vulnerabilities. Again, both numbers are limited, and more tools
were available at the time of research. The paper gives a detailed summary of each tool and vulnerability and
divides the tools into two categories, dynamic and static. It also considers EOSIO smart contract bugs and
vulnerabilities. Tools Compared: Oyente, Securify, Mythril, Manticore, teEther, MAIAN, ContractFuzzer,
EVulHunter, EOSafe.

The paper [22] considers 35 tools, the highest number of any paper, but narrows them down to only 9. It
also uses the largest set of smart contracts, 42,000, to test the tools. Its selection criteria are that a tool is
publicly available and easy to use. SmartBugs is used to automate the process of testing the tools. The study is
limited by the number of tools (9) and vulnerabilities (10) that it compares against, and its vulnerability list is
less accurate and not standardized. An extensive analysis is conducted, and specific research questions are also
answered. Tools Compared: HoneyBadger, Maian, Manticore, Mythril, Osiris, Oyente, Securify, Slither,
SmartCheck.

The survey in [23] takes into consideration the most vulnerabilities and divides them according to the
Ethereum layer at which they occur. It is the most detailed survey of existing vulnerabilities, since it defines
which vulnerabilities can occur at which level. The paper also describes strategies for resolving the
vulnerabilities. However, it does not compare any tools.

Furthermore, [24] compares the highest number of tools. The tools are compared by conducting in-depth
analyses against metrics that include the techniques and the instrumentation used by each tool. The paper also
summarises how each tool works and explains the terminology of the techniques. However, it considers
frameworks that do not conduct any actual analysis but only improve understanding of the code. Tools Compared:
contractLarva, E-EVM, Erays, Ethir, EtherTrust, FSolidM, KEVM, MAIAN, Manticore, Mythril, Osiris, Oyente,
Porosity, Rattle, Remix-IDE, Securify, SmartCheck, Solgraph, SolMet, Vandal, Ether, Gasper, ReGuard, SASC,
sCompile, teEther, ZEUS.

Finally, [25] compares 7 tools against 16 vulnerabilities. It compares only publicly available tools but, like
[24], also considers open source frameworks that only improve understanding. On the other hand, it expands upon
the vulnerabilities it considers with details and examples, and relates them to attacks that actually happened.
Tools Compared: Oyente, ZEUS, Vandal, Ethir, Securify, MAIAN, Gasper.

Motivation

The motivation behind our research is to fill the gap left by other research. We do this by reviewing
tools that are available to the public rather than only described in papers, i.e. tools with working
implementations, regardless of their analysis method, and by standardizing the vulnerabilities they are compared
against. We use the SWC Registry [26] as our guide in this matter. Since we are independent researchers rather
than tool authors, our comparison is as impartial as possible. Motivated by this knowledge gap, we aim to provide
the average person who wants to test their contract with the best possible choice of tool or, failing that, the best
possible approach to follow. Furthermore, we want to guide developers towards the best method to implement for
a specific type of vulnerability. We achieve this through the inter-technique comparison of tools in the Evaluation
section.

Methodology

As part of our paper, we have studied and reviewed 40 tools/frameworks. Our criteria for including a tool
in our survey are that it has a working implementation and detects vulnerabilities. Tools that only analyse code
without actually detecting security issues are excluded; this rules out tools that only transform code or output a
CFG (Control Flow Graph) for better comprehension without any actual vulnerability detection. We chose tools
regardless of their techniques. Since we can only compare publicly available tools, 19 of the original corpus of 40
are included in our survey. The paper uses the SWC Registry [26] (Smart Contract Weakness Classification
Registry), which documents vulnerabilities, as the comparison metric for tools. It contains a list of potential
contract weaknesses and vulnerabilities, along with explanations, references and vulnerability examples.

There is limited comparison between these tools and techniques that could help the developer,
practitioner or researcher. This is due to: i) Lack of inter-technique comparisons between dynamic and static
analysis tools. More often than not, tools are compared within the same technique, i.e. static analysis or
dynamic analysis. ii) Lack of reviews and comparisons by parties independent of the tools' authors. An author
comparing his/her tool with others would naturally want their own to stand out. iii) Lack of a standardized set of
vulnerabilities to test against. Most authors simply choose the most common or popular vulnerabilities to check
against. iv) Lack of testing of a sizeable number of open source tools.

Many authors include tools that are not publicly available, relying only on their academic papers. In our
research, we aim to survey publicly available tools in order to find the best ones, compared against a
standardized metric. We also intend to answer the following research questions as part of our research:
RQ1: Which tool(s) cover the most vulnerabilities?
RQ2: Which tool(s) have the lowest false positive rate?
RQ3: What vulnerabilities are not considered by any tool, and why?
RQ4: What level of analysis gives better results?
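RQ1 and RQ3 reduce to set operations over a tool-by-vulnerability matrix. A minimal Python sketch, assuming each tool's results are recorded as the set of SWC IDs it detects (SWC-100 to SWC-136 are the 37 registry entries); the tool names and detection sets below are hypothetical placeholders, not our measured results.

```python
# SWC-100 .. SWC-136: the 37 registry entries used as the comparison metric.
SWC_IDS = set(range(100, 137))

# Hypothetical results: tool name -> SWC IDs it detected. Placeholders only.
detections = {
    "tool_a": {101, 104, 105, 106, 107},
    "tool_b": {101, 107, 115},
    "tool_c": {128},
}


def rank_by_coverage(results: dict[str, set[int]]) -> list[tuple[str, int]]:
    """RQ1: order tools by how many registry entries they detect."""
    return sorted(
        ((tool, len(ids & SWC_IDS)) for tool, ids in results.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


def uncovered(results: dict[str, set[int]]) -> set[int]:
    """RQ3: registry entries detected by no tool at all."""
    covered = set().union(*results.values())
    return SWC_IDS - covered


def tools_per_vulnerability(results: dict[str, set[int]]) -> dict[int, int]:
    """Number of tools detecting each SWC ID (a per-vulnerability count)."""
    return {swc: sum(swc in ids for ids in results.values()) for swc in sorted(SWC_IDS)}


print(rank_by_coverage(detections))  # [('tool_a', 5), ('tool_b', 3), ('tool_c', 1)]
print(len(uncovered(detections)))    # 30
```

Intersecting with `SWC_IDS` keeps the comparison standardized: detections outside the registry list do not inflate a tool's score.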

Tools

Fig. 1
A. Taxonomy

Tables 3, 4 and 5 list the tools we compare, all of which are publicly available. They break down the
techniques used by each tool (where there is more than one), the level the analysis runs on, and the way the
code is transformed so that the technique(s) can run successfully. They also specify the subtype of each
technique; for example, Formal Verification is further divided into model checking and abstract interpretation.

B. Vulnerabilities

We compare the tools against the 37 vulnerabilities that we took from [15] and see which tools can
detect the most of them. The results do not indicate everything a tool can detect, only what it can detect from
the given list. Many tools can detect vulnerabilities that are not on the list; we have not included these, since
we wanted to standardize the process.

Tools by technique

Some tools use both dynamic and static analysis; we have grouped them separately.

1) Dynamic Analyses: [14] A fuzzer that uses the static analysis tool Slither to find important functions
and variables, namely the ones that handle ETH. Random, ABI-based transactions are then generated and the
fuzzing process is started.
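The property-based fuzzing loop described above can be sketched in Python. This is a toy under clear assumptions: a hypothetical `Vault` class with a planted bug stands in for a compiled contract, its public methods stand in for the ABI, and the invariant is an ordinary Python method; it does not reproduce the tool's actual implementation or Slither's analysis.

```python
import random


class Vault:
    """Hypothetical stand-in for a contract that handles ETH (with a planted bug)."""

    def __init__(self) -> None:
        self.balances: dict[str, int] = {}
        self.total = 0

    def deposit(self, sender: str, amount: int) -> None:
        self.balances[sender] = self.balances.get(sender, 0) + amount
        self.total += amount

    def withdraw(self, sender: str, amount: int) -> None:
        # Planted bug: no check that the sender actually holds `amount`.
        self.balances[sender] = self.balances.get(sender, 0) - amount
        self.total -= amount

    def invariant_no_negative_balance(self) -> bool:
        return all(v >= 0 for v in self.balances.values())


def fuzz(runs: int, seq_len: int, seed: int = 0):
    """Generate random call sequences; return the first one that breaks the invariant."""
    rng = random.Random(seed)
    senders = ["alice", "bob"]
    # In the described tool, a static pass would pick out the ETH-handling functions.
    functions = ["deposit", "withdraw"]
    for _ in range(runs):
        vault = Vault()
        trace = []
        for _ in range(seq_len):
            call = (rng.choice(functions), rng.choice(senders), rng.randint(0, 100))
            getattr(vault, call[0])(call[1], call[2])
            trace.append(call)
            if not vault.invariant_no_negative_balance():
                return trace  # counterexample: the call sequence that violated the property
    return None


counterexample = fuzz(runs=100, seq_len=5)
print(counterexample)
```

Returning the whole call sequence, rather than only the failing call, matters because contract bugs often depend on state built up by earlier transactions.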

[27] ContractFuzzer is a static-analysis-based fuzzer. It runs static analysis on the ABI interface and
bytecode to extract each function's signature and argument data types. It then generates inputs and starts the
fuzzing process.
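The first step described for ContractFuzzer, extracting function signatures and argument types from the ABI and generating type-appropriate inputs, can be sketched with the standard Solidity JSON ABI format. The ABI below is a hypothetical example, only three Solidity types are handled, and the boundary-value bias is an assumption of this sketch; it is an illustration, not ContractFuzzer's code.

```python
import json
import random

# Hypothetical ABI of a small token contract, in the standard Solidity JSON ABI format.
ABI_JSON = """
[
  {"type": "function", "name": "transfer",
   "inputs": [{"name": "to", "type": "address"}, {"name": "value", "type": "uint256"}]},
  {"type": "function", "name": "setPaused",
   "inputs": [{"name": "paused", "type": "bool"}]},
  {"type": "event", "name": "Transfer", "inputs": []}
]
"""


def function_signatures(abi: list[dict]) -> list[tuple[str, list[str]]]:
    """Return (canonical signature, argument types) for every function entry."""
    sigs = []
    for entry in abi:
        if entry.get("type") != "function":
            continue  # skip events, constructors, fallback, ...
        types = [arg["type"] for arg in entry.get("inputs", [])]
        sigs.append((f"{entry['name']}({','.join(types)})", types))
    return sigs


def random_value(sol_type: str, rng: random.Random):
    """Generate a random input for a small subset of Solidity types."""
    if sol_type == "uint256":
        # Bias toward boundary values, which often expose arithmetic bugs.
        return rng.choice([0, 1, 2**256 - 1, rng.randrange(2**256)])
    if sol_type == "address":
        return "0x" + "".join(rng.choice("0123456789abcdef") for _ in range(40))
    if sol_type == "bool":
        return rng.choice([True, False])
    raise NotImplementedError(sol_type)


rng = random.Random(1)
abi = json.loads(ABI_JSON)
for signature, types in function_signatures(abi):
    args = [random_value(t, rng) for t in types]
    print(signature, args)
```

The canonical signature string, e.g. `transfer(address,uint256)`, is what Solidity hashes to obtain a function's selector, so extracting it from the ABI is enough to build valid calls.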

2) Dynamic Analyses + Static Analysis: [35] EthRacer uses concolic execution (symbolic execution with
concrete validation), with the Z3 solver for SMT solving. It is optimized with three different strategies unique to
EthRacer, and it also uses fuzzing.

[16] Combines Mythril and Slither, i.e. dynamic symbolic analysis and static analysis, and mitigates
combinatorial explosion using a heuristic method.
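The combinatorial explosion comes from the number of possible transaction sequences: with n public functions and sequences of length k there are n^k candidates. A minimal Python sketch of one heuristic of this kind, dependency-based pruning, assuming (as a simplification, not a description of [16]'s exact algorithm) that a static pass supplies the state variables each function reads and writes; the function names and sets are hypothetical.

```python
from itertools import product

# Hypothetical output of a static pass: state variables each function reads / writes.
READS = {
    "deposit": {"balances"},
    "withdraw": {"balances", "paused"},
    "pause": set(),
    "name": set(),
}
WRITES = {
    "deposit": {"balances"},
    "withdraw": {"balances"},
    "pause": {"paused"},
    "name": set(),
}


def depends_on(later: str, earlier: str) -> bool:
    """`later` can observe `earlier` only if it reads something `earlier` writes."""
    return bool(READS[later] & WRITES[earlier])


def pruned_sequences(functions: list[str], depth: int):
    """Yield only sequences in which every call depends on the call before it."""
    for seq in product(functions, repeat=depth):
        if all(depends_on(seq[i], seq[i - 1]) for i in range(1, depth)):
            yield seq


funcs = list(READS)
all_count = len(funcs) ** 3
kept = list(pruned_sequences(funcs, 3))
print(all_count, len(kept))  # 64 10
```

Sequences such as `name, deposit, deposit` are dropped because `deposit` cannot observe anything `name` did, so exploring them symbolically would only repeat work.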

[15] Mythril includes a number of modules as part of the package. It is the most popular tool overall and
has an online paid version called MythX. It uses concolic analysis, with the Z3 solver for SMT solving.
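The concolic loop used by [35] and [15] (run concretely, record the branch conditions taken, negate one, and ask a solver for an input that follows the new path) can be sketched in Python. To stay self-contained, this sketch swaps the Z3 SMT solver for a brute-force search over a small input range, and `target` is a hypothetical stand-in for a contract function; real tools work on symbolic EVM states.

```python
def target(x: int, path: list) -> str:
    """Hypothetical contract logic, instrumented to record each branch decision."""
    taken = x > 10
    path.append((lambda v: v > 10, taken))
    if taken:
        taken = x * 3 == 57
        path.append((lambda v: v * 3 == 57, taken))
        if taken:
            return "bug"  # stands in for a vulnerable state reachable only when x == 19
    return "ok"


def solve(constraints, domain=range(256)):
    """Stand-in for an SMT solver such as Z3: brute-force a value meeting every constraint."""
    for v in domain:
        if all(pred(v) == wanted for pred, wanted in constraints):
            return v
    return None


def concolic(seed_input: int, max_iters: int = 10) -> dict:
    """Explore paths by flipping recorded branch conditions and solving for new inputs."""
    worklist, seen_paths, results = [seed_input], set(), {}
    while worklist and max_iters > 0:
        max_iters -= 1
        x = worklist.pop()
        path = []
        results[x] = target(x, path)  # concrete execution, path conditions recorded
        outcomes = tuple(wanted for _, wanted in path)
        if outcomes in seen_paths:
            continue
        seen_paths.add(outcomes)
        for i in range(len(path)):
            # Keep the branches before i, negate branch i, and ask the solver for an input.
            flipped = path[:i] + [(path[i][0], not path[i][1])]
            new_input = solve(flipped)
            if new_input is not None:
                worklist.append(new_input)
    return results


print(concolic(seed_input=0))  # {0: 'ok', 11: 'ok', 19: 'bug'}
```

Random fuzzing would have to hit `x == 19` by chance; solving the flipped path condition reaches it in three executions, which is the advantage concolic testing is credited with here.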

3) Static Analyses: [38] A honeypot analysis tool that analyses honeypots at three different levels: the
EVM, the Solidity compiler and the Ethereum blockchain explorer. It outlines the types of identifiers for a
honeypot and how to mitigate them, and incorporates these techniques into the tool.

[47] This tool uses a symbolic execution engine that preserves the semantic information of the contract. It
then analyses this information against predefined security properties and reports any violations.

[56] The tool decompiles the EVM bytecode into a CFG and then into further abstractions, recovers
functions and important points in the code, generates exploits, and then tests them.
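Recovering a CFG from EVM bytecode, the first step described for [56], starts by splitting the bytecode into basic blocks. A simplified Python sketch: it skips PUSH immediates, starts a block at every JUMPDEST, ends one after every terminating or jumping opcode, and resolves only jumps whose target is pushed immediately before them (without checking that the target is a JUMPDEST); production decompilers resolve many more cases. The example bytecode is hand-written for illustration.

```python
from typing import Optional

PUSH1, PUSH32 = 0x60, 0x7F
JUMPDEST, JUMP, JUMPI = 0x5B, 0x56, 0x57
TERMINATORS = {0x00, 0xF3, 0xFD, 0xFE, 0xFF}  # STOP, RETURN, REVERT, INVALID, SELFDESTRUCT


def disassemble(code: bytes) -> list[tuple[int, int, Optional[int]]]:
    """Return (offset, opcode, push_argument) triples, skipping PUSH immediates."""
    instrs, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if PUSH1 <= op <= PUSH32:
            size = op - PUSH1 + 1
            arg = int.from_bytes(code[pc + 1 : pc + 1 + size], "big")
            instrs.append((pc, op, arg))
            pc += 1 + size
        else:
            instrs.append((pc, op, None))
            pc += 1
    return instrs


def build_cfg(code: bytes) -> dict[int, list[int]]:
    """Map each basic block's start offset to the start offsets of its successors."""
    # 1) Split the instruction stream into basic blocks.
    blocks, current = [], []
    for ins in disassemble(code):
        if ins[1] == JUMPDEST and current:
            blocks.append(current)  # a jump target always begins a new block
            current = []
        current.append(ins)
        if ins[1] in TERMINATORS or ins[1] in (JUMP, JUMPI):
            blocks.append(current)  # control may leave here, so the block ends
            current = []
    if current:
        blocks.append(current)

    # 2) Add edges for statically known jump targets and for fall-through.
    starts = [block[0][0] for block in blocks]
    cfg = {}
    for i, block in enumerate(blocks):
        last_op = block[-1][1]
        succ = []
        if last_op in (JUMP, JUMPI) and len(block) >= 2 and block[-2][2] is not None:
            succ.append(block[-2][2])  # target pushed right before the jump
        if last_op not in TERMINATORS and last_op != JUMP and i + 1 < len(blocks):
            succ.append(starts[i + 1])  # JUMPI or plain block falls through
        cfg[block[0][0]] = succ
    return cfg


# PUSH1 1; PUSH1 8; JUMPI; PUSH1 0; STOP; JUMPDEST; STOP  (hand-written example)
print(build_cfg(bytes.fromhex("60016008576000005b00")))  # {0: [8, 5], 5: [], 8: []}
```

Skipping PUSH immediates is essential: a data byte equal to 0x5b inside a PUSH argument is not a real JUMPDEST and must not start a block.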

[57] The tool runs analyses to extract relations from the code and checks them against predefined
properties; both are expressed in the declarative (Datalog) language of Soufflé in order to find vulnerabilities.
It uses concolic execution.
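The approach of extracting facts and checking declarative properties over them can be illustrated with a small Datalog-style fixpoint evaluated in Python instead of Soufflé. The facts (a toy instruction-level `follows` relation), the derived `reaches` relation, and the violation pattern (a storage write reachable after an external call, a common reentrancy heuristic) are assumptions chosen for illustration, not the tool's actual rules.

```python
# Facts: instruction labels, their opcode kind, and the control-flow successor relation.
KIND = {"i1": "SLOAD", "i2": "CALL", "i3": "SSTORE", "i4": "STOP"}
FOLLOWS = {("i1", "i2"), ("i2", "i3"), ("i3", "i4")}


def reaches(follows: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Least fixpoint of:
         reaches(a, b) :- follows(a, b).
         reaches(a, c) :- reaches(a, b), follows(b, c).
    """
    derived = set(follows)
    while True:
        new = {(a, c) for (a, b) in derived for (b2, c) in follows if b == b2}
        if new <= derived:
            return derived
        derived |= new


def violations(kind: dict[str, str], follows: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """violation(c, s) :- kind(c, CALL), kind(s, SSTORE), reaches(c, s)."""
    return {
        (a, b)
        for (a, b) in reaches(follows)
        if kind[a] == "CALL" and kind[b] == "SSTORE"
    }


print(violations(KIND, FOLLOWS))  # {('i2', 'i3')}
```

Separating facts from rules is the design choice that matters: new properties are added as new rules over the same extracted facts, without touching the extraction step.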

[52] The tool uses static analysis and converts the code into an XML-based form. Code analysis is then
done on this XML form, finding patterns similar to those of known vulnerabilities using XPath queries.
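The idea of representing code as XML and matching vulnerability patterns with XPath can be sketched with Python's `xml.etree.ElementTree`, which supports a subset of XPath. The XML schema below is hypothetical (the tool itself derives its XML from a full Solidity parse tree), and the single pattern shown, `tx.origin` used for authorization inside a `require`, corresponds to SWC-115.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of a tiny contract's parse tree.
CONTRACT_XML = """
<contract name="Wallet">
  <function name="withdraw">
    <requireCall>
      <comparison op="==">
        <memberAccess object="tx" member="origin"/>
        <identifier name="owner"/>
      </comparison>
    </requireCall>
  </function>
  <function name="deposit">
    <assignment>
      <memberAccess object="msg" member="value"/>
    </assignment>
  </function>
</contract>
"""

# Pattern: tx.origin read anywhere inside a require(...) check.
TX_ORIGIN_AUTH = ".//requireCall//memberAccess[@object='tx'][@member='origin']"


def find_tx_origin_auth(xml_text: str) -> list[str]:
    """Return the names of functions whose require(...) reads tx.origin."""
    root = ET.fromstring(xml_text)
    return [fn.get("name") for fn in root.iter("function") if fn.findall(TX_ORIGIN_AUTH)]


print(find_tx_origin_auth(CONTRACT_XML))  # ['withdraw']
```

Because each rule is just a query string, new vulnerability patterns can be added without changing the analyser, which also explains why such pattern matching is fast but prone to false positives.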

[12] A static analyzer for out-of-gas vulnerabilities that comes packed with a bytecode decompiler. It is
the most advanced tool in its class.

[50] This tool/framework works at the AST level. It allows the AST to be modified and the code then to be
recompiled with instrumentation, or the analysis to be performed directly on the AST.

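[39] A concolic analysis tool for suicidal, prodigal and greedy contracts, i.e. contracts that can be killed by an arbitrary user, that can leak Ether to arbitrary addresses, or that lock Ether indefinitely.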

[40] Uses symbolic analysis and constraint solving, also known as dynamic symbolic analysis. Advanced
users can instrument the tool as well as customise the analysis.

[57] A tool that covers a range of issues using model checking and an intermediate representation (IR)
known as Boogie. It can also produce counterexamples in addition to proofs, and works at the Solidity level.

[41] It uses taint analysis in addition to symbolic analysis, together with the Z3 solver, to find integer
bugs. Its combination with a taint engine sets it apart.
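The taint-tracking part can be sketched as forward propagation over a toy three-address program: values derived from transaction input are tainted, taint flows through arithmetic, and an arithmetic result on tainted data that reaches a storage write is flagged as a potential integer bug. The instruction format and the source/sink choices are assumptions for illustration; the tool described combines taint tracking with symbolic execution and Z3, which this sketch does not attempt.

```python
# Toy three-address program: (destination, operation, operands).
PROGRAM = [
    ("amount", "CALLDATALOAD", []),        # attacker-controlled input (taint source)
    ("price", "CONST", []),
    ("fee", "CONST", []),
    ("cost", "MUL", ["amount", "price"]),  # tainted multiplication: may overflow
    ("total", "ADD", ["price", "fee"]),    # untainted arithmetic
    (None, "SSTORE", ["cost"]),            # sink: tainted arithmetic result stored
    (None, "SSTORE", ["total"]),
]
SOURCES = {"CALLDATALOAD", "CALLVALUE"}
ARITH = {"ADD", "SUB", "MUL"}


def find_integer_bug_candidates(program: list) -> list[int]:
    """Return indices of storage writes that store arithmetic on tainted data."""
    tainted: set[str] = set()
    risky: set[str] = set()  # results of arithmetic on at least one tainted operand
    findings = []
    for index, (dest, op, args) in enumerate(program):
        if op in SOURCES:
            tainted.add(dest)
        elif op in ARITH and any(a in tainted for a in args):
            tainted.add(dest)  # taint propagates through arithmetic
            risky.add(dest)
        elif op == "SSTORE" and any(a in risky for a in args):
            findings.append(index)
    return findings


print(find_integer_bug_candidates(PROGRAM))  # [5]
```

Restricting the expensive symbolic check to these flagged instructions, instead of every arithmetic operation, is what makes combining a taint engine with a solver attractive.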

[13] Oyente is a symbolic analysis tool. It covers diverse but limited vulnerability types.

4) Deep Learning/Machine Learning: [18] A tool that extends Oyente, using it to label the vulnerabilities in
the training set, and then runs machine learning models such as KNN to find vulnerabilities in the test set.
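The classification step can be sketched as a k-nearest-neighbours classifier over per-contract feature vectors. In this Python sketch the features (counts of a few opcodes) and the labelled training contracts are hypothetical placeholders, standing in for the labels a tool such as Oyente would produce; it illustrates KNN itself, not the tool's actual feature set.

```python
import math
from collections import Counter

# Hypothetical training set: (feature vector, label from a labelling tool).
# Features: [CALL count, SSTORE count, ADD count]
TRAIN = [
    ([5, 4, 1], "reentrancy"),
    ([6, 5, 0], "reentrancy"),
    ([0, 1, 9], "integer_overflow"),
    ([1, 0, 8], "integer_overflow"),
    ([0, 0, 1], "safe"),
    ([1, 1, 1], "safe"),
]


def knn_predict(x: list[float], train: list, k: int = 3) -> str:
    """Label x by majority vote among its k nearest training vectors (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict([5, 5, 1], TRAIN))  # reentrancy
print(knn_predict([0, 1, 7], TRAIN))  # integer_overflow
```

The quality of such a classifier is bounded by the labels it learns from, so false positives from the labelling tool carry over into the trained model.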

[17] A tool that represents contracts as vectors built from information extracted from them, generated so
that they capture the structural and semantic information of the contract. These vectors are used to find bugs
and clone code by comparing a contract against a database of contracts for similarity.
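The clone-detection step amounts to comparing a contract's embedding against a database of embeddings of known contracts. A minimal Python sketch using cosine similarity and a similarity threshold; the three-dimensional vectors, contract names and the 0.95 threshold are hypothetical (real embeddings are learned and much higher-dimensional).

```python
import math

# Hypothetical database: known contract -> (embedding, known bug or None).
DATABASE = {
    "TokenA": ([0.9, 0.1, 0.3], "SWC-101 integer overflow"),
    "WalletB": ([0.1, 0.8, 0.5], "SWC-107 reentrancy"),
    "SafeC": ([0.2, 0.2, 0.9], None),
}


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def find_clones(query: list[float], threshold: float = 0.95) -> list[tuple]:
    """Return (name, similarity, known bug) for every sufficiently similar contract."""
    matches = [
        (name, round(cosine(query, emb), 3), bug)
        for name, (emb, bug) in DATABASE.items()
        if cosine(query, emb) >= threshold
    ]
    return sorted(matches, key=lambda m: m[1], reverse=True)


print(find_clones([0.88, 0.12, 0.28]))
```

A match against a contract with a known bug flags the new contract without running any new analysis, which is where the time savings attributed to this approach come from.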

Vulnerabilities

Fig. 2: Vulnerabilities I
Fig. 3: Vulnerabilities II

Evaluation

RQ1: Which tool(s) cover the most vulnerabilities?

From our analysis, Mythril and MPro cover the most vulnerabilities. SmartCheck comes in second, but it has
a high false positive rate, which puts the vulnerabilities it detects in an untrustworthy position due to the lack of
accuracy. Tools that check custom properties can, in principle, cover any vulnerability, but the properties need to
be programmed into the tool and are not part of the software by default.

RQ2: Which tool(s) have the lowest false positive rate?

Mythril, which is considered the most popular tool not only because of the modules it includes but also
because of its low false positive rate. MPro, which builds upon it, inherits the same property.

RQ3: What vulnerabilities are not considered by any tool, and why?

Multiple vulnerabilities are not considered by any tool: SWC-109, 111, 117, 119, 120, 122, 123, 124, 127,
129, 130, 131, 133, 134, 135 and 136. The vulnerabilities that receive less focus are those that have either not
been exploited yet or are not severe.

RQ4: What level of analysis gives better results?

Working at the bytecode level provides an edge. Abstracting the bytecode and producing the information
needed to run analyses is certainly expensive, but the granularity this provides far exceeds what can be achieved
at the source code level. Therefore, bytecode analysis tools are often more precise and produce fewer false
positives, as seen in Mythril and MadMax. The latter may deal with only a specific type of vulnerability, but its
false positive rate may be much lower than what one might expect from, say, SmartCheck. Although SmartCheck
covers many vulnerabilities, on par with Mythril, up to 50 percent of its findings may be false positives.

Fig. 4: Number of tools detecting each vulnerability

Conclusion

At the beginning of the paper, we outlined two aims: to evaluate the tools, and to evaluate the techniques
in general. With the first, we aimed to provide insight into the available tools and to help anyone looking to use
them. With the second, we wanted to help anyone who wants to use an approach to develop a new tool. We
summarise these in the following sections.

A. Techniques

The techniques in general have their advantages and disadvantages. Static analysis can give an in-depth
view of the code structure and of vulnerabilities that might not be caught by dynamic analysis. However, it can be
tedious. Dynamic analysis techniques can be faster and can expose logical bugs, i.e. those encountered in
real-world, run-time situations, but they may miss bugs that static analysis finds. Thus, neither should be
overlooked. In our opinion, a comprehensive analysis of a smart contract should involve both, to cover the
maximum number of vulnerabilities.

Furthermore, the DL/ML tools we summarised might be even more efficient at saving time. The tool
SmartEmbed uses a database of smart contracts with marked bugs to find clone code. This catches the known bugs
present and cuts down time substantially compared with devising a novel analysis, which might take too much time.

Each dynamic and static analysis technique we came across has certain issues that cannot be solved without
refining it or combining it with techniques from the other category. For example, one might assume that
standalone fuzzing would cover all the inputs. That might be possible, but it is expensive in computation power
and time, and it can result in combinatorial explosion. Used together with symbolic analysis and constraint
solving, however, fuzzing needs far fewer inputs to test the code. Symbolic analysis sits on the border between
dynamic and static analysis; combining it with concrete execution and constraint solving, known as concolic
testing, certainly gives an edge. It reduces the number of paths to explore by deriving a formula for each path,
then generating constraints and testing them with fuzzing. The same approach is replicated in Oyente, using taint
analysis instead of fuzzing. Furthermore, a heuristic method can refine this further by exploring only the paths
that involve certain functions or, say, Ether; this is the approach employed by MPro, a tool we reviewed.

On the other hand, DL techniques, as we saw in SmartEmbed, and to some extent ML techniques, as in
ContractWard, also give bug detection an edge. They cut down detection time by reducing the effort spent on
devising a novel method for locating vulnerabilities in a program, drawing instead on the mind of the many: data
from other labelled smart contracts, as SmartEmbed does with Oyente, is used to run the analysis. Formal
verification techniques such as model checking, combined with symbolic execution and applied dynamically,
improve detection precision, as seen in a tool like Mythril, but they do not solve combinatorial explosion
completely. MPro deals with this using heuristics.

B. Tools

Regarding the tools, we have two recommendations. The first is to choose the tool that covers the maximum
number of vulnerabilities, such as Mythril or SmartCheck; the problem then is how to deal with false positives.
The second applies to a scenario prone to a specific type of vulnerability: if, say, a developer's style tends to
overlook out-of-gas situations, they should choose a tool that focuses in depth on that vulnerability, such as
MadMax, or combine tools depending on the issue. For example, Slither is combined with Echidna to provide rich
static information, which is then used for fuzzing. Echidna also supports custom properties, which allows the
analysis to be customised for a specific type of contract. Such freedom to adapt a tool is very valuable, as it also
saves the time that modifying the tool itself would take. Oyente being used with SmartEmbed is another example
of this. In our opinion, MPro is the tool that stands out in general for vulnerability analysis, as it combines the
Mythril engine with Slither's static analysis while also cutting down on combinatorial explosion using its
heuristic method.

Speaking of path explosion, running taint analysis only on the paths or functions that deal with Ether would
also prove useful when there are few dependent smart contracts and few transactions. When dealing with a specific
vulnerability, niche-specific tools are, in our opinion, better suited than MPro. Examples are MadMax for out-of-gas
vulnerabilities and EasyFlow for overflow vulnerabilities. We have not discussed EasyFlow because we could not
obtain the tool.

Part II

Developing a database for Transper to leverage it for security analysis of smart contracts

Abstract

Transper is a smart contract storage extraction and upgrade tool. It analyses Ethereum smart contracts and
extracts the data of their state variables, especially mappings and their keys with the proper associations, in
order to upgrade or migrate the smart contract. However, the analysis and extraction results are not stored or
managed. If the same smart contract is run again, it has to go through the whole procedure again, which affects the
performance of Transper.

Also, Transper's extraction procedure fetches live data from the Etherscan website and presents it to the
user without keeping any record. This project designs and implements a database for Transper using Flask and
PostgreSQL, with a login page for user authentication. The database improves Transper's performance and provides
proper management of smart contract state variables.

Introduction

Ethereum Blockchain

Ethereum is the second largest blockchain platform, and its cryptocurrency is Ether. It uses a proof-of-work
consensus algorithm similar to Bitcoin's. However, it is planned to shift to a proof-of-stake algorithm [3].
Ethereum allows developers to write smart contracts and run them on the Ethereum Virtual Machine (EVM). The EVM is
a global computer composed of all Ethereum nodes.

Ethereum Smart Contracts:

Smart contracts were first proposed by Szabo [3] in 1996. They were later implemented as part of the
Ethereum platform, which introduced general-purpose programming and made it possible to develop DApps
(decentralized applications). Solidity is the main programming language used to write these smart contracts.

Smart contracts are essentially program code that is immutable and public. Once a smart contract is deployed
to the Ethereum network, it is registered at an address. Due to the nature of the blockchain, it cannot be modified
after that.

Each smart contract that runs on the EVM maintains its own permanent storage. This storage is a large array
in which each slot holds 32 bytes. The slots are assigned to the contract's state variables in order of declaration.
The number of slots a variable occupies depends on its type and size. There are four kinds of variables in the
Solidity language: elementary, mapping, user-defined, and array.

Literature Review

The immutability of smart contracts is a significant challenge for upgrading them after deployment.
Transper is a tool that analyzes and extracts smart contract storage so that contracts can be migrated and
upgraded. It uses static source code analysis to perform key approximation for mapping structures. It also handles
regular variables as well as user-defined variables.

Transper extracts the storage state of an Ethereum smart contract. It allows analysis, packing of the state,
and then upgradation where possible. It uses four main algorithms:

(1) Slot Calculation: a static, source-code-based analysis algorithm that calculates the slots of all stored
regular variables.

(2) Key Approximation: a static, CFG-based algorithm that approximates the possible set of keys used to
index mapping-type variables, and the origin of their corresponding definitions.

(3) State Extraction: extracts the values of all state variables. It uses the output of the Slot Calculation
and Key Approximation algorithms to retrieve the data of the smart contract deployed on the Ethereum blockchain.

(4) Upgradation: automatically upgrades the given smart contract to a new version. It preserves and
migrates the current state into the newer smart contract by automatically initializing it with the extracted values.

Transper Workflow

Fig. 5: Transper Workflow

Analysis: To calculate the slots of the variables, the algorithm iterates over the AST of the smart contract
and identifies all declarations of state variables. Solidity assigns slots to variables in the same order in which
they are declared.

The algorithm uses the following rules to assign the slots:
• If the variable is an elementary type that takes 32 bytes, a mapping, or a fully dynamic array, one whole
slot is allocated to it.

• If it is an elementary type whose size is less than 32 bytes, it takes only the bytes it requires.

• If the current slot already uses some bytes and the new elementary variable fits in the remaining space,
the bytes it requires are allocated from that remaining space.

• If the current slot already uses some bytes and the new elementary variable cannot fit in the remaining
space, a new slot is allocated to the variable.
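The four rules above can be sketched in a few lines of Python. This is a minimal illustration, not Transper's implementation. It assumes each state variable has already been reduced to a byte size, with mappings and fully dynamic arrays counted as 32 bytes. The names `StateVar`, `SlotAssignment` and `assign_slots` are hypothetical. Structs, statically sized arrays and inheritance are deliberately ignored. In Solidity, structs and static arrays always start a new slot, and so does the variable that follows them.

```python
from dataclasses import dataclass
from typing import List

SLOT_SIZE = 32  # every EVM storage slot holds 32 bytes


@dataclass
class StateVar:
    name: str
    size: int  # bytes; use 32 for a mapping or a fully dynamic array


@dataclass
class SlotAssignment:
    name: str
    slot: int
    offset: int  # byte offset inside the slot, counted from the low-order end


def assign_slots(variables: List[StateVar]) -> List[SlotAssignment]:
    """Assign storage slots in declaration order, packing small variables."""
    assignments: List[SlotAssignment] = []
    slot, used = 0, 0  # current slot index and bytes already used in it
    for var in variables:
        if not 0 < var.size <= SLOT_SIZE:
            raise ValueError(f"{var.name}: size must be 1..{SLOT_SIZE} bytes")
        # Rule 4: the variable does not fit in the partly used slot, so open a new one.
        if used > 0 and used + var.size > SLOT_SIZE:
            slot, used = slot + 1, 0
        # Rules 1-3: take exactly the bytes the variable needs in the current slot.
        assignments.append(SlotAssignment(var.name, slot, used))
        used += var.size
        if used == SLOT_SIZE:  # slot is full; the next variable starts a fresh one
            slot, used = slot + 1, 0
    return assignments
```

For example, consider the declarations `uint128 a; uint128 b; uint256 c; bool d; address e; uint64 f;` followed by a mapping `m`. Here `a` and `b` share slot 0, and `c` takes slot 1. The variables `d`, `e` and `f` are packed into slot 2 at byte offsets 0, 1 and 21. The mapping `m` starts slot 3.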

A mapping does not store its keys in the storage trie. Only the values are stored, at locations derived by
hashing each key together with the mapping's slot, so the keys cannot be read back from storage. Identifying the
mapping keys at static time is therefore necessary to recover a mapping variable. For this purpose, Transper uses a
static, source-code-based key approximation algorithm to approximate the set of keys for each mapping variable. The
algorithm also finds each mapping key's origin. It determines whether the key was declared as a state variable,
passed as an argument, or generated at run time.
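Solidity's documented storage layout makes the need for key approximation concrete. For a mapping at slot `p`, the value for a value-type key `k` is stored at slot `keccak256(h(k) . p)`. Here `h(k)` pads the key to 32 bytes and `.` is concatenation. The minimal sketch below builds that preimage for unsigned integer and address keys only. Signed, `string` and `bytes` keys are encoded differently and are left out. The function names are hypothetical.

The hash function is passed in as a parameter because Python's `hashlib.sha3_256` is NIST SHA-3. SHA-3 uses different padding from Ethereum's Keccak-256 and yields different digests. A real Keccak-256 has to come from a library, for example pycryptodome's `Crypto.Hash.keccak` module.

```python
from typing import Callable

Hash = Callable[[bytes], bytes]


def mapping_entry_preimage(key: int, mapping_slot: int) -> bytes:
    """Concatenate the 32-byte padded key with the mapping's 32-byte slot number.

    Valid for unsigned integer and address keys (addresses as integers).
    """
    return key.to_bytes(32, "big") + mapping_slot.to_bytes(32, "big")


def mapping_entry_slot(key: int, mapping_slot: int, keccak256: Hash) -> int:
    """Storage slot holding mapping[key]; keccak256 must be real Keccak-256."""
    digest = keccak256(mapping_entry_preimage(key, mapping_slot))
    return int.from_bytes(digest, "big")
```

Because the slot is a hash, reading storage never reveals `k`. The keys have to be reconstructed from the source code, which is what key approximation does.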

The two main algorithms of the key approximation analysis are reaching definition analysis and a
backtracking algorithm. Before they run, some pre-processing is performed:

(1) Generate a CFG for each function of the smart contract.

(2) For each function, traverse each block of code to find expressions that modify the contract's mappings,
and mark those functions.

Reaching definition analysis and backtracking are then performed on the marked functions to approximate
the possible set of keys for each mapping.
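Step (2) of the pre-processing can be illustrated with a toy representation. This sketch assumes a hypothetical, heavily simplified statement form rather than a real Solidity AST or CFG. Each basic block is a list of `(target, index)` pairs. For example, `("balances", "to")` stands for `balances[to] = ...`, and `("total", None)` stands for `total = ...`. A function is marked if any block writes to an indexed mapping.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical simplified statement: (assigned variable, index expression or None).
Statement = Tuple[str, Optional[str]]
CFG = Dict[str, List[Statement]]  # basic-block id -> statements in that block


def mark_mapping_writers(functions: Dict[str, CFG], mappings: Set[str]) -> Set[str]:
    """Return the names of functions containing at least one mapping write."""
    marked: Set[str] = set()
    for fname, cfg in functions.items():
        for statements in cfg.values():
            if any(target in mappings and index is not None for target, index in statements):
                marked.add(fname)
                break  # one mapping write is enough to mark the function
    return marked
```

Only the marked functions go on to the more expensive reaching definition and backtracking passes, which keeps the analysis focused.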

Reaching Definition Analysis

When an operation modifies a variable at a certain location in the program, it defines the variable's value
at that location. When an operation reads the variable, it uses the value defined at that location, or at other
locations whose definitions may reach it. Reaching definition analysis approximates the set of possible definitions
that reach each use of a variable. It uses a worklist over the blocks of code to collect all definitions flowing in
from the various incoming paths. The key approximation algorithm uses this reaching definition information to find
the keys of all mapping-type variables.
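The worklist formulation described above is the textbook data-flow algorithm. A minimal Python version is sketched below under simplifying assumptions. It is not Transper's code. Blocks are string ids, `succ` gives CFG successors, and `gen[b]` holds the definitions made in block `b` that survive to its end, at most one per variable. The result is `IN[b]`, the set of definitions reaching the start of each block.

```python
from typing import Dict, List, Set, Tuple

Definition = Tuple[str, str]  # (definition id, variable it defines)


def reaching_definitions(
    succ: Dict[str, List[str]],
    gen: Dict[str, Set[Definition]],
) -> Dict[str, Set[Definition]]:
    """Worklist reaching-definition analysis; returns IN[b] for every block b."""
    pred: Dict[str, List[str]] = {b: [] for b in succ}
    for b, successors in succ.items():
        for s in successors:
            pred[s].append(b)

    all_defs: Set[Definition] = set().union(*gen.values())
    # A block kills every other definition of the variables it redefines.
    kill: Dict[str, Set[Definition]] = {}
    for b in succ:
        redefined = {var for _, var in gen[b]}
        kill[b] = {d for d in all_defs if d[1] in redefined} - gen[b]

    in_sets: Dict[str, Set[Definition]] = {b: set() for b in succ}
    out_sets: Dict[str, Set[Definition]] = {b: set(gen[b]) for b in succ}
    worklist = list(succ)
    while worklist:
        b = worklist.pop(0)
        # Merge the definitions flowing in from every incoming path.
        in_sets[b] = set().union(*(out_sets[p] for p in pred[b]))
        new_out = gen[b] | (in_sets[b] - kill[b])
        if new_out != out_sets[b]:
            out_sets[b] = new_out
            # Successors must be revisited because their input changed.
            worklist.extend(s for s in succ[b] if s not in worklist)
    return in_sets
```

As an example, consider `address key = msg.sender;` (definition `d1`), followed by `if (c) { key = owner; }` (`d2`), and then `balances[key] += amount;`. The analysis reports both `d1` and `d2` as reaching the block that writes the mapping. Both are therefore candidate keys.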

Backtracking algorithm

The reaching definition analysis generates the set of definitions reaching every node of the CFG. These
sets may contain variables whose exact values, unlike constants, are unknown at compile time. The origin of every
definition therefore has to be tracked and, where possible, resolved to a constant value. This is done by the
backtracking algorithm.
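One way to picture the backtracking step is as a graph walk. Each definition that merely copies another variable is followed back through the definitions reaching that copy. The walk stops at a root origin. The sketch below is illustrative only. Its definition table format is hypothetical. Its root kinds follow the origins named earlier: a constant, a state variable, a function argument, or a value generated at run time such as `msg.sender`. A visited set stops the walk from looping forever on cycles created by loops.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical definition table: definition id -> (kind, payload), where kind is
#   "const"   payload is the literal value
#   "state"   payload is the state variable name
#   "param"   payload is the function argument name
#   "runtime" payload is the run-time source, e.g. "msg.sender"
#   "copy"    payload is a frozenset of definition ids reaching the copied variable
Origin = Tuple[str, object]


def backtrack(def_id: str, defs: Dict[str, Tuple[str, object]],
              seen: Optional[Set[str]] = None) -> Set[Origin]:
    """Resolve a definition to the set of root origins it may come from."""
    if seen is None:
        seen = set()
    if def_id in seen:  # already visited (e.g. through a loop): nothing new
        return set()
    seen.add(def_id)
    kind, payload = defs[def_id]
    if kind != "copy":
        return {(kind, payload)}
    origins: Set[Origin] = set()
    for reaching in payload:  # follow every definition that reaches the copy
        origins |= backtrack(reaching, defs, seen)
    return origins
```

Origins of kind `const` and `state` can be resolved to concrete keys. For a `param` origin, the candidate values would have to come from elsewhere, for example from the arguments of past transactions.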

Extraction

The extraction algorithm combines the outputs of the previous two algorithms and performs the following
tasks:

• Read the values from the slots calculated by the slot calculation algorithm.

• Split the values between variables if more than one variable shares the same slot.

• Format the extracted values according to their respective data types.

• Using the keys reported by the key approximation algorithm, read the values stored under those mapping
keys from the blockchain.
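The splitting and formatting tasks can be sketched as bit arithmetic on one 32-byte storage word. The word is given as the kind of hex string returned by the Ethereum JSON-RPC method `eth_getStorageAt`. Solidity stores the first variable in a slot at the low-order end, so a variable at byte `offset` with `size` bytes is recovered by shifting right and masking. The helper names and the `(name, type, offset, size)` layout format are hypothetical. Only `bool`, `address` and unsigned integer types are decoded. Signed integers, `bytesN` and enums are left out of this sketch.

```python
from typing import Dict, List, Tuple, Union

Value = Union[int, bool, str]


def decode(raw: int, type_name: str) -> Value:
    """Format a raw unsigned value for a small subset of Solidity types."""
    if type_name == "bool":
        return raw != 0
    if type_name == "address":
        return f"0x{raw:040x}"  # 20 bytes -> 40 hex digits
    if type_name.startswith("uint"):
        return raw
    raise ValueError(f"type not handled in this sketch: {type_name}")


def split_slot(word_hex: str, layout: List[Tuple[str, str, int, int]]) -> Dict[str, Value]:
    """Split one 32-byte storage word among the variables packed into it.

    layout entries are (name, type, offset, size), with offset and size in bytes
    and offsets counted from the low-order (rightmost) end of the word.
    """
    word = int(word_hex, 16)  # int() accepts an optional "0x" prefix with base 16
    values: Dict[str, Value] = {}
    for name, type_name, offset, size in layout:
        raw = (word >> (8 * offset)) & ((1 << (8 * size)) - 1)
        values[name] = decode(raw, type_name)
    return values
```

Applied to a slot holding `bool d`, `address e` and `uint64 f` at byte offsets 0, 1 and 21, the function returns one formatted value per variable.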

Upgradation

The upgradation algorithm inserts the extracted state into the new smart contract. Upgradation takes the
new smart contract code, the previous smart contract, and the extracted state as input. It outputs the new smart
contract code with its variables initialized to the extracted state. When the updated code is deployed, it sets the
state variables to the values of the previous contract, thus preserving the state during upgradation. First, an AST
is generated for the old and new smart contracts. Next, all state variables are extracted from both contracts. The
intersection of the two contracts' variables is then calculated, and the values of the intersected variables are
taken from the extracted state. These values are then initialized in the new smart contract's constructor. Finally,
the new smart contract code is generated from the AST.
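The intersection and constructor-initialisation steps can be sketched as plain string generation. This is a simplified illustration, not Transper's AST-based implementation. It assumes that a variable belongs to the intersection only if it has the same name and the same declared type in both versions. The type check is an assumption of this sketch. It also assumes extracted mappings are given as Python dicts of recovered keys. The function names are hypothetical, and non-integer, non-boolean values are assumed to already be valid Solidity literals.

```python
from typing import Dict, List, Union

Scalar = Union[int, bool, str]


def to_literal(value: Scalar) -> str:
    """Render a value as a Solidity literal (bool is checked before int)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    return value  # assumed to be a ready-made literal, e.g. a checksummed address


def constructor_initialisers(
    old_vars: Dict[str, str],
    new_vars: Dict[str, str],
    extracted: Dict[str, Union[Scalar, Dict[Scalar, Scalar]]],
) -> List[str]:
    """Constructor assignments for variables present in both contract versions."""
    lines: List[str] = []
    for name, type_name in new_vars.items():  # keep the new contract's order
        if old_vars.get(name) != type_name or name not in extracted:
            continue  # not in the intersection, or no extracted value
        value = extracted[name]
        if isinstance(value, dict):  # mapping: one assignment per recovered key
            lines.extend(f"{name}[{to_literal(k)}] = {to_literal(v)};" for k, v in value.items())
        else:
            lines.append(f"{name} = {to_literal(value)};")
    return lines
```

Variables that exist only in the old contract are dropped, and those new to the upgraded contract keep their default values. Mapping entries can only be migrated for the keys that key approximation recovered.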

Problem Statement

The Transper web app is a smart contract analysis and upgradation tool. It is a client-facing application. It
runs analyses on smart contracts and has to fetch data from the Etherscan website. The time needed to fetch this
data and analyse a smart contract grows with the length of the contract's code. This causes unnecessary delays for
frequent users and for repeated analyses of the same contract, whether by the same user or by different ones.
Because nothing is stored, an analysis has to be re-run for the same smart contract every time. The resulting long
waiting times give a poor user experience. In this project, we design a database and integrate it with Transper to
remove this redundancy.

Implementation

Technologies used:

Python - General programming language: used for programming the backend and the Transper code itself.

Flask - Backend coding: the framework used for the backend.

PostgreSQL - Database: the relational (SQL) database of choice for the project.

React - Frontend coding: used to design the web interface.

Schema

Fig. 6: Schema

Schema Explanation
Analysis:

In the analysis step, Transper needs the source code of the smart contract and provides the keys of the
mapping variables.

The “Analysis” table stores the results of the key approximation analysis. In this table, “Contract Address”
serves as the primary key. It is also passed to the “Mapping” table as a foreign key, and the “Mapping” table holds
the complete information about each mapping variable. The state variable and state value fields are also filled in
with the corresponding values.
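The primary key and foreign key relationship described above can be sketched as DDL. The sketch below uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a server. All column names are hypothetical, because the actual schema appears only in Fig. 6. It also includes a hypothetical `mapping_key` table for the keys and key values filled in during extraction. `ON DELETE CASCADE` lets stale analyses be removed together with their mapping data. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

# Hypothetical columns; the project's real schema (Fig. 6) runs on PostgreSQL.
SCHEMA = """
CREATE TABLE analysis (
    contract_address TEXT PRIMARY KEY,
    source_code      TEXT NOT NULL,
    functions        TEXT,
    analysed_at      TEXT NOT NULL,
    extracted_at     TEXT
);
CREATE TABLE mapping (
    id               INTEGER PRIMARY KEY,
    contract_address TEXT NOT NULL
        REFERENCES analysis(contract_address) ON DELETE CASCADE,
    variable_name    TEXT NOT NULL,
    key_type         TEXT NOT NULL,
    value_type       TEXT NOT NULL
);
CREATE TABLE mapping_key (
    mapping_id INTEGER NOT NULL REFERENCES mapping(id) ON DELETE CASCADE,
    key_value  TEXT NOT NULL,
    value      TEXT,
    PRIMARY KEY (mapping_id, key_value)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the tables; SQLite enforces foreign keys only when this pragma is on."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```

In PostgreSQL the same DDL works essentially unchanged, apart from using `SERIAL` or an identity column for `mapping.id`. Foreign keys are always enforced there.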

Extraction:

The Analysis table is updated with the extraction time. The key values are also extracted and stored in the
keys and key values table. The functions section of the Analysis table is also updated with the functions extracted
from the code.

Upgradation:

The Upgrade table is updated with the contract code being upgraded to, as well as the address of the
previous contract. In this step, the Contract table saves the time of the upgradation.

Use Cases

1. Analysing a contract

The user analyses a contract by pressing Analysis and then Extract. Its data is stored in the database along
with a timestamp.

2. Re-Analysing a contract

The user re-analyses a contract, or analyses one that is already in the database. The database returns the
stored values, and the user proceeds to the Pack stage. If the values are older than 24 hours, they are deleted and
the app fetches new values from the web.
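The 24-hour rule in this use case amounts to a time-to-live cache keyed by contract address. Below is a minimal sketch, again using `sqlite3` as a stand-in for PostgreSQL. The table name, its columns and the `run_analysis` callback are hypothetical. The callback represents fetching the contract from Etherscan and running Transper's analysis. The current time is passed in explicitly so the expiry behaviour is easy to test.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Callable, Tuple

MAX_AGE = timedelta(hours=24)  # stored results older than this are discarded


def create_cache(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS analysis_cache ("
        " contract_address TEXT PRIMARY KEY,"
        " result TEXT NOT NULL,"
        " analysed_at TEXT NOT NULL)"
    )


def get_analysis(
    conn: sqlite3.Connection,
    address: str,
    run_analysis: Callable[[str], str],
    now: datetime,
) -> Tuple[str, bool]:
    """Return (result, served_from_database)."""
    row = conn.execute(
        "SELECT result, analysed_at FROM analysis_cache WHERE contract_address = ?",
        (address,),
    ).fetchone()
    if row is not None:
        result, analysed_at = row
        if now - datetime.fromisoformat(analysed_at) < MAX_AGE:
            return result, True  # fresh enough: skip fetching and re-analysis
        # Stale: delete it, then fall through to a fresh analysis.
        conn.execute("DELETE FROM analysis_cache WHERE contract_address = ?", (address,))
    result = run_analysis(address)  # e.g. fetch from Etherscan and analyse
    conn.execute(
        "INSERT INTO analysis_cache (contract_address, result, analysed_at) VALUES (?, ?, ?)",
        (address, result, now.isoformat()),
    )
    conn.commit()
    return result, False
```

A second request within 24 hours is served from the database without calling `run_analysis`. A request after 24 hours deletes the stale row and re-analyses the contract.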

Advantages of a Database

1. Better User Experience

Because data is saved in the database, re-analysing a contract, whether by the same user or a different one,
gives a faster response time and thus a better user experience.

2. Finding vulnerability trends

With a database, Transper can save data and code that could help us find vulnerabilities in contracts. It
saves code belonging to contracts that may or may not have been published, as in the case of an upgraded contract.
We can therefore run external analysis tools on this code to find the vulnerabilities that may have forced a
developer to use Transper. This could help us identify trends in common vulnerabilities in the Ethereum ecosystem
and develop secure coding standards.

Performance Analysis

Fig. 7: Database Performance Analysis

System Specifications (Used for Testing):

4-core Intel Core i7 processor

8 GB RAM

Ubuntu 20.04 LTS

We tested the web app with the integrated database on the provided sample contracts and obtained the
following results. We measured the analysis time for each contract with and without the database.

The analysis times were lower for all the sample contracts when the database was used. For a contract that has
not been analysed before, or when no database is used, the total analysis time is the blue bar plus the green bar.
The grey bar shows the total analysis time with the database, which shows the reduction in analysis time.

Conclusion

Transper with a database not only offers a better user experience through improved response times, but also
provides an opportunity for data collection, so that real-time analysis can be performed on the contracts. This
would allow the app to improve and evolve in the features it provides, beyond the basic role it was built for.

References

[1] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” 2008. [Online]. Available:
https://bitcoin.org/bitcoin.pdf (accessed 17.07.2019).

[2] G. Wood et al., “Ethereum: A secure decentralised generalised transaction ledger,” Ethereum project yellow
paper, vol. 151, no. 2014, pp. 1–32, 2014.

[3] N. Szabo, “Smart contracts: Building blocks for digital markets,” 1996.

[4] jumpstartmag.com/understanding-51-attacks-on-blockchains

[5] parity.io/the-multi-sig-hack-a-postmortem/

[6] parity.io/a-postmortem-on-the-parity-multi-sig-library-self-destruct/

[7] blog.ethereum.org/2016/06/19/thinkingsmartcontractsecurity/

[8] blog.ethereum.org/2016/06/19/thinkingsmartcontractsecurity/

[9] aithority.com/technology/blockchain/blockchain-hackers-stole-3-8-billion-in-122-attacks-in-2020/

[10] N. Grech, M. Kong, A. Jurisevic, L. Brent, B. Scholz, and Y. Smaragdakis, “MadMax: Surviving
out-of-gas conditions in Ethereum smart contracts,” Proc. ACM Program. Lang., vol. 2, no. OOPSLA, Oct. 2018.
[Online]. Available: https://doi.org/10.1145/3276486

[11] L. Luu, D.-H. Chu, H. Olickel, P. Saxena, and A. Hobor, “Making smart contracts smarter,” in Proceedings
of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’16. New York,
NY, USA: Association for Computing Machinery, 2016, pp. 254–269. [Online]. Available:
https://doi.org/10.1145/2976749.2978309

[12] G. Grieco, W. Song, A. Cygan, J. Feist, and A. Groce, “Echidna: Effective, usable, and fast fuzzing for
smart contracts,” in Proceedings of the 29th ACM SIGSOFT International Symposium on Software
Testing and Analysis, ser. ISSTA 2020. New York, NY, USA: Association for Computing Machinery,
2020, pp. 557–560. [Online]. Available: https://doi.org/10.1145/3395363.3404366

[13] conference.hitb.org/hitbsecconf2018ams/materials/WHITEPAPERS/WHITEPAPER

[14] W. Zhang, S. Banescu, L. Pasos, S. Stewart, and V. Ganesh, “MPro: Combining static and symbolic
analysis for scalable testing of smart contract,” in 2019 IEEE 30th International Symposium on Software
Reliability Engineering (ISSRE), 2019, pp. 456–462.

[15] swcregistry.io

[16] W. Zhang, S. Banescu, L. Pasos, S. Stewart, and V. Ganesh, “Mpro: Combining static and symbolic
analysis for scalable testing of smart contract,” in 2019 IEEE 30th International Symposium on Software
Reliability Engineering (ISSRE), 2019, pp. 456–462.

[17] Z. Gao, V. Jayasundara, L. Jiang, X. Xia, D. Lo, and J. Grundy, “Smartembed: A tool for clone and bug
detection in smart contracts through structural code embedding,” in 2019 IEEE International Conference on
Software Maintenance and Evolution (ICSME), 2019, pp. 394–397.

[18] W. Wang, J. Song, G. Xu, Y. Li, H. Wang, and C. Su, “Contractward: Automated vulnerability detection
models for ethereum smart contracts,” 2020

[21] “Comparison of static analysis tools for ethereum smart contracts,” 2018. [Online]. Available:
https://telluur.com/utwente/bachelor/Module

[21] J. Xu, F. Dang, X. Ding, and M. Zhou, in 2020 IEEE 3rd International Conference on Information Systems
and Computer Aided Education (ICISCAE).

[22] B. Jiang, Y. Liu, and W. K. Chan, “ContractFuzzer: Fuzzing smart contracts for vulnerability detection,”
in 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2018, pp. 259–269.

[23] G. Pace and J. Ellul, “Runtime verification of Ethereum smart contracts,” 2018. DOI:
10.1109/EDCC.2018.00036

[24] W. Wang, J. Song, G. Xu, Y. Li, H. Wang, and C. Su, “ContractWard: Automated vulnerability detection
models for ethereum smart contracts,” IEEE Transactions on Network Science and Engineering, pp. 1–1, 2020.

[25] Y. Xue, J. Ye, M. Ma, L. Ma, Y. Li, H. Wang, Y. Lin, T. Peng, and Y. Liu, “Doublade: Unknown
vulnerability detection in smart contracts via abstract signature matching and refined detection rules,” 2019.

[26] R. Norvill, B. B. F. Pontiveros, R. State, and A. Cullen, “Visual emulation for ethereum’s virtual machine,”
in NOMS 2018 - 2018 IEEE/IFIP Network Operations and Management Symposium, 2018, pp. 1–4.

[27] J. Gao, H. Liu, C. Liu, Q. Li, Z. Guan, and Z. Chen, “EasyFlow: Keep ethereum away from overflow,”
in 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-
Companion), 2019, pp. 23–26.

[28] Y. Zhou, D. Kumar, S. Bakshi, J. Mason, A. Miller, and M. Bailey, “Erays: Reverse engineering ethereum’s
opaque smart contracts,” in 27th USENIX Security Symposium (USENIX Security 18). Baltimore, MD: USENIX
Association, Aug. 2018, pp. 1371–1385. [Online]. Available:
https://www.usenix.org/conference/usenixsecurity18/presentation/zhou

[29] Albert E., Gordillo P., Livshits B., Rubio A., Sergey I. (2018) EthIR: A Framework for High-Level
Analysis of Ethereum Bytecode. In: Lahiri S., Wang C. (eds) Automated Technology for Verification and
Analysis. ATVA 2018. Lecture Notes in Computer Science, vol 11138. Springer, Cham.
https://doi.org/10.1007/978-3-030-01090-4_30

[30] C. Schneidewind, I. Grishchenko, M. Scherer, and M. Maffei, “eThor: Practical and provably sound
static analysis of ethereum smart contracts,” in Proceedings of the 2020 ACM SIGSAC Conference on
Computer and Communications Security, ser. CCS ’20. New York, NY, USA: Association for Computing
Machinery, 2020, pp. 621–640. [Online]. Available: https://doi.org/10.1145/3372297.3417250

[31] A. Kolluri, I. Nikolic, I. Sergey, A. Hobor, and P. Saxena, “Exploiting the laws of order in smart
contracts,” in Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and
Analysis, ser. ISSTA 2019. New York, NY, USA: Association for Computing Machinery, 2019, pp. 363–373.
[Online]. Available: https://doi.org/10.1145/3293882.3330560

[32] Mavridou A., Laszka A. (2018) Designing Secure Ethereum Smart Contracts: A Finite State Machine
Based Approach. In: Meiklejohn S., Sako K. (eds) Financial Cryptography and Data Security. FC 2018. Lecture
Notes in Computer Science, vol 10957. Springer, Berlin, Heidelberg.
https://doi.org/10.1007/978-3-662-58387-6_28

[33] V. Wüstholz and M. Christakis, “Harvey: A greybox fuzzer for smart contracts,” ser. ESEC/FSE 2020. New
York, NY, USA: Association for Computing Machinery, 2020, pp. 1398–1409. [Online]. Available:
https://doi.org/10.1145/3368089.3417064

[34] C. F. Torres, M. Steichen, and R. State, “The art of the scam: Demystifying honeypots in ethereum
smart contracts,” in 28th USENIX Security Symposium (USENIX Security 19). Santa Clara, CA: USENIX
Association, Aug. 2019, pp. 1591–1607. [Online]. Available:
https://www.usenix.org/conference/usenixsecurity19/presentation/Ferreira

[35] I. Nikolic, A. Kolluri, I. Sergey, P. Saxena, and A. Hobor, “Finding the greedy, prodigal, and suicidal
contracts at scale,” 2018.

[36] M. Mossberg, F. Manzano, E. Hennenfent, A. Groce, G. Grieco, J. Feist, T. Brunson, and A. Dinaburg,
“Manticore: A user-friendly symbolic execution framework for binaries and smart contracts,” in 2019
34th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2019, pp. 1186–1189.

[37] C. F. Torres, J. Schütte, and R. State, “Osiris: Hunting for integer bugs in ethereum smart contracts,” in
Proceedings of the 34th Annual Computer Security Applications Conference, ser. ACSAC ’18. New York, NY, USA:
Association for Computing Machinery, 2018, pp. 664–676. [Online]. Available:
https://doi.org/10.1145/3274694.3274737

[38] A. Pinna, S. Ibba, G. Baralla, R. Tonelli, and M. Marchesi, “A massive analysis of ethereum smart
contracts empirical study and code metrics,” IEEE Access, 06 2019.

[39] C. Liu, H. Liu, Z. Cao, Z. Chen, B. Chen, and B. Roscoe, “ReGuard: Finding reentrancy bugs in smart
contracts,” in 2018 IEEE/ACM 40th International Conference on Software Engineering: Companion (ICSE-
Companion), 2018, pp. 65–68.

[40] E. Zhou, S. Hua, B. Pi, J. Sun, Y. Nomura, K. Yamashita, and H. Kurihara, “Security assurance for
smart contract,” in 2018 9th IFIP International Conference on New Technologies, Mobility and Security (NTMS),
2018, pp. 1–5.

[41] J. Chang, B. Gao, H. Xiao, J. Sun, Y. Cai, and Z. Yang, “sCompile: Critical path identification and analysis
for smart contracts,” 2019.

[42] H. Liu, C. Liu, W. Zhao, Y. Jiang, and J. Sun, “S-gram: Towards semantic-aware security auditing for
ethereum smart contracts,” in 2018 33rd IEEE/ACM International Conference on Automated Software
Engineering (ASE), 2018, pp. 814–819.

[43] P. Tsankov, A. Dan, D. Drachsler-Cohen, A. Gervais, F. Bünzli, and M. Vechev, “Securify: Practical
security analysis of smart contracts,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and
Communications Security, ser. CCS ’18. New York, NY, USA: Association for Computing Machinery, 2018,
pp. 67–82. [Online]. Available: https://doi.org/10.1145/3243734.3243780
[44] M. Rodler, W. Li, G. O. Karame, and L. Davi, “Sereum: Protecting existing smart contracts against
reentrancy attacks,” 2018.

[45] T. D. Nguyen, L. H. Pham, J. Sun, Y. Lin, and Q. T. Minh, “sFuzz: An efficient adaptive fuzzer
for solidity smart contracts,” in Proceedings of the ACM/IEEE 42nd International Conference on Software
Engineering, ser. ICSE ’20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 778–788.
[Online]. Available: https://doi.org/10.1145/3377811.3380334

[46] C. Peng, S. Akca, and A. Rajan, “SIF: A framework for solidity contract instrumentation and analysis,”
in 2019 26th Asia-Pacific Software Engineering Conference (APSEC), 2019, pp. 466–473.

[47] J. Feist, G. Grieco, and A. Groce, “Slither: A static analysis framework for smart contracts,” in 2019
IEEE/ACM 2nd International Workshop on Emerging Trends in Software Engineering for Blockchain
(WETSEB), 2019, pp. 8–15.

[48] S. Tikhomirov, E. Voskresenskaya, I. Ivanitskiy, R. Takhaviev, E. Marchenko, and Y. Alexandrov,
“SmartCheck: Static analysis of ethereum smart contracts,” in Proceedings of the 1st International Workshop on
Emerging Trends in Software Engineering for Blockchain, ser. WETSEB ’18. New York, NY, USA: Association for
Computing Machinery, 2018, pp. 9–16. [Online]. Available: https://doi.org/10.1145/3194113.3194115

[49] Z. Gao, V. Jayasundara, L. Jiang, X. Xia, D. Lo, and J. Grundy, “SmartEmbed: A tool for clone and bug
detection in smart contracts through structural code embedding,” in 2019 IEEE International Conference on
Software Maintenance and Evolution (ICSME), 2019, pp. 394–397.

[50] A. Ghaleb and K. Pattabiraman, “How effective are smart contract analysis tools? Evaluating smart contract
static analysis tools using bug injection,” Proceedings of the 29th ACM SIGSOFT International Symposium
on Software Testing and Analysis, Jul 2020. [Online]. Available:
http://dx.doi.org/10.1145/3395363.3397385

[51] P. Zhang, F. Xiao, and X. Luo, “SolidityCheck: Quickly detecting smart contract problems through regular
expressions,” 2019.

[52] P. Hegedus, “Towards analyzing the complexity landscape of solidity based ethereum smart contracts,”
in 2018 IEEE/ACM 1st International Workshop on Emerging Trends in Software Engineering for
Blockchain (WETSEB), 2018, pp. 35–39.

[53] J. Krupp and C. Rossow, “teEther: Gnawing at ethereum to automatically exploit smart contracts,”
in 27th USENIX Security Symposium (USENIX Security 18). Baltimore, MD: USENIX Association, Aug. 2018,
pp. 1317–1333. [Online]. Available: https://www.usenix.org/conference/usenixsecurity18/presentation/krupp

[54] L. Brent, A. Jurisevic, M. Kong, E. Liu, F. Gauthier, V. Gramoli, R. Holz, and B. Scholz, “Vandal: A
scalable security analysis framework for smart contracts,” 2018.

[55] Y. Wang, S. K. Lahiri, S. Chen, R. Pan, I. Dillig, C. Born, and I. Naseer, “Formal specification and
verification of smart contracts for Azure blockchain,” 2018.

[56] S. Kalra, S. Goel, M. Dhawan, and S. Sharma, “Zeus: Analyzing safety of smart contracts,” 01 2018.

[57] L. Brent, A. Jurisevic, M. Kong, E. Liu, F. Gauthier, V. Gramoli, R. Holz, and B. Scholz, “Vandal: A
scalable security analysis framework for smart contracts,” 2018.

[58] https://swcregistry.io/docs/SWC-116#description

[59] https://swcregistry.io/docs/SWC-117#description

[60] https://swcregistry.io/docs/SWC-118#description

[61] https://swcregistry.io/docs/SWC-119#description

[62] https://swcregistry.io/docs/SWC-114

[63] https://swcregistry.io/docs/SWC-126#description

[64] https://swcregistry.io/docs/SWC-127#description

[65] https://swcregistry.io/docs/SWC-128#description
