
An Empirical Study of Real-World WebAssembly Binaries

Security, Languages, Use Cases

Aaron Hilbig (aaron@hilbigpost.de), University of Stuttgart, Germany
Daniel Lehmann (mail@dlehmann.eu), University of Stuttgart, Germany
Michael Pradel (michael@binaervarianz.de), University of Stuttgart, Germany

ABSTRACT
WebAssembly has emerged as a low-level language for the web and beyond. Despite its popularity in different domains, little is known about WebAssembly binaries that occur in the wild. This paper presents a comprehensive empirical study of 8,461 unique WebAssembly binaries gathered from a wide range of sources, including source code repositories, package managers, and live websites. We study the security properties, source languages, and use cases of the binaries and how they influence the security of the WebAssembly ecosystem. Our findings update some previously held assumptions about real-world WebAssembly and highlight problems that call for future research. For example, we show that vulnerabilities that propagate from insecure source languages potentially affect a wide range of binaries (e.g., two thirds of the binaries are compiled from memory-unsafe languages, such as C and C++) and that 21% of all binaries import potentially dangerous APIs from their host environment. We also show that cryptomining, which once accounted for the majority of all WebAssembly code, has been marginalized (less than 1% of all binaries found on the web) and gives way to a diverse set of use cases. Finally, 29% of all binaries on the web are minified, calling for techniques to decompile and reverse engineer WebAssembly. Overall, our results show that WebAssembly has left its infancy and is growing up into a language that powers a diverse ecosystem, with new challenges and opportunities for security researchers and practitioners. Besides these insights, we also share the dataset underlying our study, which is 58 times larger than the largest previously reported benchmark.

CCS CONCEPTS
• Security and privacy → Software and application security.

ACM Reference Format:
Aaron Hilbig, Daniel Lehmann, and Michael Pradel. 2021. An Empirical Study of Real-World WebAssembly Binaries: Security, Languages, Use Cases. In Proceedings of the Web Conference 2021 (WWW ’21), April 19–23, 2021, Ljubljana, Slovenia. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3442381.3450138

This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.
WWW ’21, April 19–23, 2021, Ljubljana, Slovenia
© 2021 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
ACM ISBN 978-1-4503-8312-7/21/04.
https://doi.org/10.1145/3442381.3450138

1 INTRODUCTION
WebAssembly is a fast, compact, low-level byte code language originally intended for client-side execution in web browsers. It is widely supported and available in 93% of all global browser installations as of February 2021.¹ Beyond client-side web applications, WebAssembly is also running on Node.js and even stand-alone runtimes. Despite its growing popularity, the WebAssembly ecosystem is severely understudied. To date, little is known about how the language is used, for what purposes, and how this affects the security of WebAssembly-based applications. In particular, we are interested in the following research questions:

¹ https://caniuse.com/?search=WebAssembly

RQ1: Source languages and tools. WebAssembly is a compilation target, and in principle any programming language can be compiled to it. What languages are actually compiled to WebAssembly, how much do they contribute to the overall population, and what tools are used to produce the binaries? Answering these questions is relevant for understanding the impact of issues that specific source languages may have and for guiding future work toward source languages and toolchains prevalent in practice.

RQ2: Vulnerabilities propagated from insecure source languages. Recent work has shown that memory vulnerabilities in insecure source languages, such as C and C++, may be exploited in WebAssembly binaries, sometimes even more easily than in native binaries [18]. How large is the attack surface offered by real-world WebAssembly binaries compiled from insecure languages, e.g., in terms of dangerous APIs these binaries import from JavaScript or in terms of vulnerable memory allocators they ship? Answering this question will increase our understanding of the threat posed by vulnerabilities compiled to the web.

RQ3: Cryptomining. Previous results show [24], and recent work assumes [15, 25, 34, 43], that WebAssembly is frequently used for cryptojacking, i.e., cryptomining performed in the browser of an unsuspecting client. Is cryptomining still an important threat today?

RQ4: Use cases. As a general purpose language, WebAssembly can serve many purposes in web applications and beyond. What are the typical use cases of WebAssembly? Given that the language is becoming more widely adopted, it is important to understand what its use cases are and how this affects the security of the web.

RQ5: Minification and names. The ability to understand WebAssembly binaries, e.g., for auditing third-party code or for reverse engineering malware, depends on whether binaries contain meaningful names for program elements, e.g., functions. Do real-world WebAssembly binaries contain meaningful names or are they obfuscated through minification?

Answering these and other questions requires a set of WebAssembly binaries that is (i) representative of how WebAssembly is used in

the wild and (ii) large enough to cover the diversity of real-world WebAssembly usage. Currently, no such set of binaries exists. The closest existing work is by Musch et al. [24], who report on a study of WebAssembly usage in the top one million websites. While inspiring, their study falls short in two respects. First, it has been performed at a point in time when WebAssembly was still in its infancy, with usage biased to early adopters, e.g., cryptominers, and a single toolchain dominating the ecosystem. Since then, many changes have happened, including higher browser adoption, alternative compilers that have become available, the shutdown of Coinhive (a common cryptomining platform) [42], and the realization that vulnerabilities in insecure source languages can also be exploited in WebAssembly [18]. Second, the methodology proposed by Musch et al. [24] focuses only on binaries found on the web, and only on those that are executed when just visiting a website. By only looking into client-side web applications, WebAssembly on other platforms is disregarded, e.g., on Node.js, browser extensions, and stand-alone WebAssembly runtime engines.

This paper presents a comprehensive empirical study of real-world WebAssembly binaries. The core of our work is WasmBench, a diverse set of 8,461 unique binaries gathered from a variety of sources, including querying source code repositories and package managers, searching the HTTP Archive, and crawling the web. The binaries found through our methodology show that considering only a single one of these data sources would miss a significant fraction of the WebAssembly ecosystem. While we obviously cannot guarantee to cover all kinds of real-world WebAssembly usages, WasmBench provides not only a 58× larger benchmark, but also a more diverse set of WebAssembly binaries, than the largest previously reported benchmark [24].

We use WasmBench to address the above research questions through a combination of manual inspection, custom static analysis tools, and statistical analyses. Our findings include:

• Real-world WebAssembly binaries are compiled from a variety of source languages, including systems programming languages, such as C, C++, Rust, and Go, higher level languages, such as AssemblyScript (a variant of TypeScript), and some rather unexpected languages, such as COBOL and Kotlin.
• The majority of binaries is compiled from memory-unsafe languages, from which vulnerabilities may propagate into WebAssembly binaries [18].
• 65% of all binaries and 44% of all functions in them use the “unmanaged stack”, a portion of linear memory that is unprotected by the virtual machine and that can be exploited by attackers.
• 21% of all binaries import potentially dangerous APIs from their host environment, e.g., the infamous eval, APIs to modify the DOM from JavaScript, or system call-like APIs to interact with the network and file system on platforms outside the browser. An attacker compromising a binary may abuse such APIs to trigger unexpected behavior.
• Contrary to earlier findings [24], cryptomining has dropped significantly in relevance, comprising only 1% of all binaries. Instead, we find applications with up to many millions of instructions that cover diverse use cases, including visualization, interactive shells for programming languages, media players, game engines, data compression, and natural language processing.
• 28.8% of all binaries on the web are minified, calling for future work on decompiling and reverse engineering WebAssembly, to ensure that security analysts can understand web applications despite the presence of low level components.

Overall, our findings show that WebAssembly is “growing up”, which leads to a larger and much more diverse ecosystem than in its early days. From a security perspective, this diversity has several implications. First, the fact that there are now many legitimate applications, and proportionally much fewer malicious ones, shifts the focus from detecting malicious code to handling vulnerable code. Second, the large fraction of binaries that originate from “insecure” source languages, in particular C and C++, shows the risk that their problems, e.g., memory vulnerabilities, will now propagate to the web. Mitigations against such memory vulnerabilities [18] are becoming an important goal to keep the web safe. Third, the many different compilation toolchains and their variants, e.g., in terms of memory allocators compiled into the binaries, create a potentially large attack surface. Automated tools to analyze and improve the security of WebAssembly binaries are needed.

In summary, this paper contributes:

• The first comprehensive study of WebAssembly binaries gathered from multiple sources, including client-side web applications, package managers, and source code repositories;
• A combination of automated program analyses, manual inspection, and statistical analysis to answer research questions about the security, source languages, and use cases of WebAssembly;
• Empirical evidence and insights about security-related properties of real-world WebAssembly, some of which update earlier findings and many of which call for future work on mitigation techniques and analysis tools;
• By far the largest benchmark of WebAssembly binaries, which we make available as a basis for other studies and as a benchmark for future tools: https://github.com/sola-st/WasmBench .

2 BACKGROUND
File formats and modules. WebAssembly is a low-level bytecode language, executed on a stack-based virtual machine. Figure 1 shows a small code example in C, the corresponding code in WebAssembly’s .wat text format, and the same code in WebAssembly’s .wasm binary format. The latter is usually used to distribute WebAssembly. Instructions are arranged into functions, and functions are arranged into modules, which correspond to files. We use the terms “module” and “binary” interchangeably. Each module is divided into multiple sections, including import and export sections that declare functions shared with the environment, a code section that defines functions and their bodies, and a data section that initializes memory. Functions, types, and variables are referenced by indices.

(a) C Source Code.

    int increment(int x)
    {
        return x + 1;
    }

(b) WebAssembly text format.

    (func (param i32)
          (result i32)
        local.get 0
        i32.const 1
        i32.add
    )

(c) WebAssembly binary format.

    [...]    // header with type info
    20 00    // local.get 0
    41 01    // i32.const 1
    6a       // i32.add
    0b       // end

Figure 1: Example of a function compiled to WebAssembly.

Host environments. WebAssembly runs within a host environment, such as the browser, Node.js, or a standalone WebAssembly runtime like wasmer or wasmtime. The host environment instantiates the WebAssembly binary and can provide imported functions to the module. For example, in a client-side web application, JavaScript code can instantiate binaries through the WebAssembly.instantiate and WebAssembly.Module APIs.

Linear memory. Each module instance has a linear memory section, which can be thought of as an array of consecutive bytes. It stores all data handled by the module, except for locals and globals (which can only be of four primitive types), including all dynamically allocated data structures. The i32 datatype is used for pointers into linear memory. The linear memory has two implications. First, programs that want to associate non-primitive data with functions typically do so through a so-called “unmanaged stack”, which simply is a region in the linear memory used to represent the function stack. This stack is called “unmanaged” because it is not protected by the virtual machine and under full control of the program. As a result, memory-unsafe behavior from source languages, e.g., C and C++, can also affect WebAssembly binaries. Second, non-trivial applications often have their own memory allocator compiled into the binary. Both the unmanaged stack and custom memory allocators may allow attackers to exploit a WebAssembly binary [18].

Compilers, tools, and runtimes. Various compilers target WebAssembly, e.g., the Emscripten compiler for C and C++, the Rust compiler, the AssemblyScript compiler, or the Asterius compiler for Haskell. Some compilers share a common framework, e.g., Emscripten and the Rust compiler are based on LLVM, while the AssemblyScript compiler and Asterius are based on Binaryen. Because the bare WebAssembly language does not provide any built-in library, compilers often combine the compiled code with a runtime environment. For example, the Emscripten runtime environment and the WebAssembly system interface (WASI) both offer a set of low-level system calls, e.g., to support file I/O and networking.

3 METHODOLOGY
Our methodology is split into three phases (Figure 2). In the collection phase, we obtain a large set of WebAssembly binaries from a variety of sources. We select sources to cover WebAssembly in different contexts and at different stages of deployment. Sections 3.1 to 3.4 present how we collect WebAssembly binaries from source code repositories, package managers that distribute deployed software, archived and live websites, and through manual search, respectively. Figure 3 gives an overview of the different sources we collect binaries from. Alongside each binary, we also collect metadata, e.g., on which website a binary was found. All activities related to collecting binaries were done between April and September 2020. Overall, the collection phase results in 51,148 binaries, including duplicates and binaries that are not representative of real-world usages of WebAssembly. Section 3.5 presents the filtering phase, where we filter and deduplicate these binaries into a set of 8,461 unique binaries that serve as the basis for our study. Finally, the third phase analyzes the set of binaries through a combination of static code analysis, analysis of metadata associated with the binaries, and manual inspection. We present the analysis phase along with its results in Section 4.

To the best of our knowledge, no prior work has gathered WebAssembly binaries from such a diverse set of sources. As a result, the number of binaries we obtain is 58 times larger than the largest set studied so far (147 unique binaries) [24]. Our experimental results (Section 4.2) show that the sources we collect WebAssembly binaries from complement each other, i.e., considering all of them is crucial to obtain a representative dataset.

Figure 2: Overview of the phases of our methodology. Phase 1 (Collection: querying, crawling, other) draws on the sources (GitHub, NPM, WAPM, Addons, Web, ...) and yields Wasm binaries and metadata from all sources; Phase 2 (Dedup. & Filtering) yields the representative dataset; Phase 3 (Analysis: static analysis, metadata, manual) addresses the RQs.

Figure 3: Sources from which we collect binaries.
Source Code
    GitHub: 3,148 WebAssembly-related repositories
Package Managers
    NPM: top 1,000 depended-on packages; 2,350 WebAssembly-related packages
    WAPM: all 103 packages
    Firefox Addons: top 2,500 addons by average daily users
Web
    HTTP Archive: 855 URLs to likely WebAssembly files
    Own Crawling: Tranco list (top 1,000,000 websites); HTTP Archive (40,000 URLs with WebAssembly in JS code); 3 WebAssembly top lists
Other
    Manual Survey

3.1 Collecting Binaries from Repositories
Our first method for collecting binaries looks into source code repositories. Even though WebAssembly is a binary format, developers often store binaries in source code repositories, e.g., to ease the installation of a project or to include third-party libraries. To gather such binaries, we clone all public repositories that are in the top 1,000 results of four queries to the GitHub search API:

• Repositories where “wasm” or “WebAssembly” is in the repository name or description (i.e., two queries).
• Repositories that are tagged with “WebAssembly” as one of the used programming languages.
• Repositories tagged with the topic “WebAssembly”.

Overall, the queries result in 3,148 repositories, which we clone, and then search for files ending in .wasm.

3.2 Collecting Binaries from Package Managers
Once developers deploy a WebAssembly-based application, it is often made available through a package manager. We consider three software ecosystems that use WebAssembly.
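Both the repository search of Section 3.1 and the package searches in this section come down to scanning an unpacked directory tree for files ending in .wasm. A minimal sketch of that step might look as follows. The function names are hypothetical, and the additional magic-number check is an assumption borrowed from the download step in Section 3.3.1; the text does not state that it is applied to repositories or packages.

```python
import os

# WebAssembly's magic number: the first four bytes of every valid binary.
WASM_MAGIC = b"\0asm"


def find_wasm_files(root):
    """Return the sorted paths of all files below `root` whose name ends in .wasm."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".wasm"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


def has_wasm_magic(path):
    """Check whether the file at `path` starts with the WebAssembly magic number.

    Assumption: this check is taken from Section 3.3.1 and is optional here.
    """
    with open(path, "rb") as handle:
        return handle.read(4) == WASM_MAGIC


if __name__ == "__main__":
    # Hypothetical usage: scan the current directory, e.g., a cloned repository.
    for path in find_wasm_files("."):
        print(path, "valid magic" if has_wasm_magic(path) else "no magic")
```

Searching by extension is cheap and matches the described procedure; adding the magic-number check afterwards would weed out placeholder or truncated files that merely carry the .wasm suffix.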

3.2.1 npm Packages. The Node Package Manager (npm) distributes JavaScript code, some of which relies on WebAssembly modules. Packages distributed via npm are typically used in server-side applications with Node.js or on the client side. To find npm packages that contain WebAssembly binaries, we gather two sets of packages. First, from the full registry file of npm we compute the top 1,000 most depended-upon packages. Second, we query npm for all packages that match at least one of the keywords “wasm” and “WebAssembly”, which yields 2,350 packages. We install these packages and their transitive dependencies, and then search the resulting 7,198 packages for .wasm files.

3.2.2 wapm Packages. The WebAssembly Package Manager (wapm) specializes in distributing WebAssembly code. Most of the wapm packages are intended to run on standalone WebAssembly runtime engines. Unlike for npm, we can afford to analyze all 103 available packages. We install all packages and again extract all .wasm files.

3.2.3 Firefox Browser Add-ons. Browser extensions, traditionally implemented in JavaScript, nowadays can also make use of WebAssembly code. To gather binaries used in browser extensions, we download the top 2,500 Firefox add-ons from addons.mozilla.org, as measured by average daily users. We then unpack the extensions’ XPI archives, and search again for .wasm files.

3.3 Collecting Binaries from Websites
Collecting WebAssembly binaries from the web involves several challenges. First, as the web is too big to be searched in its entirety, finding suitable starting points for exploring it is crucial. Second, even when visiting a WebAssembly-powered website, it is non-trivial to identify and collect WebAssembly binaries from it. Some sites embed WebAssembly modules into JavaScript source code, e.g., as base64-encoded strings that are decoded and instantiated at runtime. For such sites, we must detect WebAssembly modules when they are executed. Other websites spawn requests for WebAssembly modules but never execute them during our collection process, e.g., because execution relies on specific user inputs. A purely dynamic methodology would miss such binaries.

We address the first challenge, finding good websites as starting points, through two techniques. On the one hand, we can build on results from the HTTP Archive for finding sites known to contain WebAssembly binaries (Section 3.3.1). On the other hand, for our own crawling, we systematically start from potentially WebAssembly-related seed URLs (Section 3.3.2). To address the second challenge of detecting WebAssembly binaries during crawling, we analyze all websites through a combination of static and dynamic detection techniques.

3.3.1 Direct Downloads Guided by HTTP Archive. The HTTP Archive project² regularly crawls the web and makes the requests and responses available. Starting from URLs obtained from the Chrome User Experience Report, the project currently covers over 5 million top-level domains, monthly. We here focus on websites crawled using the desktop version of Google Chrome, which we access via Google’s BigQuery database system.

We search the responses stored in the HTTP Archive tables for likely WebAssembly binaries and then directly download the corresponding files. To this end, we query two tables, from May and June 2020, which contain information about all requests made while crawling the websites, and the corresponding responses. These tables, called summary_requests, are 434.4 GB and 476.7 GB in size. We filter all requests in the tables by the MIME type of the requested resource, keeping only those commonly used to serve WebAssembly, such as application/wasm and application/octet-stream, and where .wasm appears in the URL. These queries result in a set of 855 URLs. We download files from each of these URLs using wget and keep all that start with \0asm, WebAssembly’s magic number.

² https://httparchive.org/

3.3.2 Web Crawling. The HTTP Archive-guided search covers a wide range of top-level domains, but it may miss WebAssembly binaries on websites not covered by crawling a generic list of websites and binaries that one cannot identify based on their MIME type. To collect additional binaries, we also perform our own web crawling. There are three components to our crawling: the seed list, the crawling algorithm, and methods for detecting WebAssembly.

Seed lists. Any kind of web crawling requires a seed list of URLs to start from. We consider three seed lists, one generic list of popular websites and two lists targeted specifically at WebAssembly:

• Top one million websites. As a generic set of websites to explore, we start crawling from the one million most popular websites on the Tranco list [28], a top list more resilient to manipulation.
• “WebAssembly” in JavaScript files. WebAssembly binaries on websites must be executed by some surrounding JavaScript code, e.g., by calling WebAssembly.instantiate. To identify websites with such JavaScript code, we query a table provided by the HTTP Archive that stores the full bodies of all HTTP responses up to some size. We search this table, which has a total size of 9.32 TB, with Google BigQuery for all JavaScript responses that contain WebAssembly and add the URLs of the corresponding websites to our seed list, which results in about 40,000 URLs.
• WebAssembly top lists. As the most targeted seed list, we start crawling from three hand-curated lists of noteworthy WebAssembly-related websites. These websites cover projects using WebAssembly³, tools and demos⁴, and WebAssembly-based games⁵.

³ https://madewithwebassembly.com/
⁴ https://github.com/mbasso/awesome-wasm
⁵ https://www.webassemblygames.com/

Crawling algorithm. Given a seed list, our crawler visits each URL on the list and recursively follows links on the visited websites. The crawler visits each URL, with up to one retry. If the website loads successfully, the crawler waits until either the “DOM content loaded” event is fired and all network connections have become idle, or until a 30-second timeout occurs. The crawler collects all WebAssembly binaries loaded or executed in this time (details below). For each visited website, the crawler extracts more URLs to explore from the href attribute of all <a>-tags on the site.

To control the number of sites to visit, the crawler is configured with two parameters: the recursion depth d, which bounds how

many links away from the seed URLs to explore, and the exploration breadth b, which bounds how many links to follow on each explored site. If a site has more than b links, the crawler picks b of them at random. For the first two seed lists, we set d = b = 2, i.e., the crawler visits at most seven sites per URL in the seed list. Because the third seed list is the most focused one, we explore it more thoroughly with d = 7 and b = 3, and repeat the exploration with 16 separate crawler instances. We chose those parameters based on preliminary experiments, to find most binaries in a given time budget.

Identifying WebAssembly binaries. For each website visited by the crawler, we use a combination of two techniques to identify WebAssembly binaries on the site. Our first detection mechanism intercepts the website's traffic using a local proxy that inspects the headers and contents of all requests and responses. To identify WebAssembly modules, we check if the content-type header matches application/wasm or application/octet-stream, or if the URL contains .wasm, and then ensure that the response payload starts with the proper magic number. If this is the case, we store the loaded file as a WebAssembly binary. The key advantage of this detection mechanism is that it detects WebAssembly modules even if they are not executed during the crawler's visit of the website.

The second detection mechanism tracks calls to APIs used for instantiating WebAssembly modules, as proposed in prior work [24]. We transparently overwrite built-in JavaScript functions, such as WebAssembly.instantiate, and analyze their invocations. In contrast to the first detection mechanism, this mechanism can detect WebAssembly binaries that occur inline in JavaScript code, if executed.

3.4 Collecting Binaries Manually

In addition to automatically collecting WebAssembly binaries, we also gather a small number of binaries manually. On the one hand, we collect binaries through manual interaction with the web in daily browsing between April and September 2020. On the other hand, we asked WebAssembly developers on reddit.com/r/WebAssembly in June 2020 for binaries they are willing to share. As discussed in the results, these two manual collection methods complement our automatically collected binaries with otherwise missed examples.

3.5 Deduplication and Filtering

After collecting binaries and associated metadata from the aforementioned sources, we remove duplicates and filter binaries that are not representative of real-world applications. To deduplicate binaries, we compare files based on their SHA256 hash and remove identical files. Unless mentioned otherwise, our study focuses on the deduplicated dataset. In addition to deduplication, we remove binaries that are non-representative of real-world applications, because they fall into at least one of the following categories. Binaries that occur multiple times, e.g., across different sources, are only removed if all of their occurrences were filtered out.

• Generated binary variants: Some GitHub repositories contain binaries generated by research tools, e.g., to fuzz-test WebAssembly implementations, to superoptimize WebAssembly code [4], or to perform code diversification [2]. Since these tools turn a single binary into many, only slightly different variants, we remove the generated variants. We identify those variants by filename (e.g., *.opt.wasm) and path (e.g., binaries in afl_out/).
• Test suites: On GitHub and in some npm packages, we find binaries that are used as test inputs for WebAssembly-related tools, e.g., parsers, compilers, and virtual machines. A large portion are binaries from the official specification test suite, which often test only a single instruction or language construct. We identify them by typical repositories and paths (e.g., files in spectest/).
• Tutorial projects: Many npm projects and some GitHub repositories are instances of users following WebAssembly tutorials for particular tool chains.⁶ Those binaries are small and all very similar. We identify them based on common binary names (e.g., hello_world_bg.wasm) and project names (e.g., test-wasm@0.0.1).
• Small and invalid binaries: Finally, we remove binaries that contain ten or fewer instructions, and binaries that cannot be validated by the reference WebAssembly binary toolkit (WABT), even with all language extensions enabled.

6 For example, the rustwasm book: https://rustwasm.github.io/docs/book/.

4 RESULTS

Based on our WasmBench dataset of real-world WebAssembly binaries, we address the research questions (RQs) described in the introduction. For each RQ, we detail the analyses performed on the dataset, the direct results, and then interpret those to obtain insights, i.e., take-away points, often with a focus on security.

4.1 Implementation and Experimental Setup

The crawler is implemented based on Puppeteer and Puppeteer Cluster, two Node.js libraries for controlling instances of the Chromium browser, here version 83.0.4103.0. The static analyses described in the following are implemented in several Rust programs to statically extract relevant features, such as instructions, names, etc. from the binaries, complemented by Python scripts that perform the final analyses. For parsing binaries, we use the wasmparser library, a project by the Bytecode Alliance. All experiments were run on an Ubuntu 18.04 machine with two Intel Xeon CPUs at 2.2 GHz running 48 threads, equipped with 256 GB of memory. The crawling was performed in chunks of 50,000 websites, where each chunk took about 7 hours to finish, with a total of about 10 days for all crawling. The static analyses usually finish within several minutes for the entire dataset. Our entire dataset and the implementation are available for others to build on at https://github.com/sola-st/WasmBench.

4.2 Overview of Dataset

Table 1 gives an overview of our dataset. For each source, the table shows how many binaries we found, and how many remain after deduplication and filtering. The last column shows how many binaries are found only via a single source, illustrating the importance of particular collection methods for obtaining a diverse dataset.

Sources. The largest contributions to the dataset are the GitHub repositories and packages from npm. Given that WebAssembly binaries on websites or in arbitrary packages are still relatively scarce, selecting repositories and packages related to WebAssembly
WWW ’21, April 19–23, 2021, Ljubljana, Slovenia Aaron Hilbig, Daniel Lehmann, and Michael Pradel

is an effective way of finding binaries. At the same time, the sources where we do not query for WebAssembly specifically (i.e., most of the web crawling, the top packages from npm, and Firefox add-ons) still show that WebAssembly binaries are found in the wild in popular projects used by millions of users. For the web, we also see that all four of our sources are essential for finding a diverse set of WebAssembly binaries. Only crawling the top one million websites would miss at least 225 unique binaries. The fact that the seed lists from HTTP Archive and the WebAssembly top lists are much smaller than the list of the top one million websites, yet the number of found binaries is similar or even higher, shows that a targeted seed list for crawling is key to finding otherwise missed binaries.

Table 1: Contribution of different sources to the dataset.

                                          Found Binaries
Source (see Figure 3)               Total   Unique Filtered     Only
GitHub, search: wasm               44,218   21,117    6,830    6,641
NPM, top depended-upon                 14        8        8        3
NPM, search: wasm                   3,488    2,452    1,163    1,036
WAPM, all                             122      113      108       81
Firefox add-ons, top by users          29       17       17       15
Web, HTTP Archive                     261      141      141       54
Web, crawling, with seed list:      2,923      432      412      298
  HTTP Archive                      2,046      268      254      128
  Tranco top websites                 769      167      164       86
  WebAssembly top lists               108       89       76       43
Manual                                 93       89       88       57
All Sources                        51,148   23,413    8,461

Insight 1. All methods we use to collect binaries contribute in a non-negligible way. Combining different sources and collection techniques is crucial for obtaining a representative dataset.

Filtering. Deduplication and filtering non-representative binaries (Section 3.5) significantly reduces the dataset (Table 2). In particular, one repository that contains generated variants of input binaries is important to filter, as it otherwise accounts for over 8,000 unique binaries. From the second category, test suites, we also see that test binaries are commonly reused across many projects (26,138 occurrences, but only 5,283 unique binaries), and that most of them are from the official specification test suite. The last filter removes binaries from a few, very similar tutorials, including more than 500 binaries from projects called hello-wasm.

Table 2: Binaries filtered out due to different criteria.

                                        Removed Binaries
Filter                                    Total   Unique
Generated binary variants, of those:      9,279    8,048
  in CROW [2] repository                  8,025    7,987
Test suites and files, of those:         26,138    5,283
  variants of WebAssembly spec suite     25,133    4,931
Invalid WebAssembly binaries             13,593    3,513
Small binaries: <10 instructions         10,146    1,881
Tutorial projects, of those:                848      633
  hello-wasm projects                       695      506
All Filters                              37,808   14,952

Binary sizes. As a first proxy for the diversity of the collected binaries, we look into their sizes and instruction count. Figure 4 shows the histogram and cumulative distribution of binary sizes in bytes. The distribution of the number of instructions is similar in shape and omitted for space reasons. While there are many small binaries, there is a long and heavy tail towards larger sizes. Two thirds of the binaries are larger than 20 KB and have more than 8,700 instructions. The median binary is 37.1 KB large and has 14,885 instructions. The largest binaries are a WebAssembly port of TiDB⁷, a distributed SQL database written in Go, with 75.1 MB and 16.9M instructions, found as a wapm package; opencascade.js⁸ (65.8 MB, 22.2M instructions), a WebAssembly port of an open source C++ CAD library, found on npm and GitHub; and finally the Clang compiler, itself compiled to WebAssembly (46.7 MB, 12.6M instructions), found on wapm and GitHub.

7 https://github.com/pingcap/tidb
8 https://github.com/donalffons/opencascade.js

Figure 4: Distribution of binary sizes. (a) Histogram of the lower 80% of the sizes of the binaries (number of binaries over size in kilobytes). (b) Cumulative distribution of the binary sizes in log2 scale (percentile of binaries over size in bytes).

Insight 2. Complex, real-world applications with millions of lines of code are compiled to WebAssembly.

4.3 RQ1: Source Languages and Tools

Given WebAssembly's goal of being a universal bytecode, we study which languages are compiled to it in practice, and which toolchains are used to do so.

Analysis. It is non-trivial to infer from a binary which source language and compiler has produced it. We rely on several complementary methods. First, we check the producers section, where some toolchains explicitly encode the source language(s) a program is compiled from. Second, our analysis searches for characteristic function names that appear in the import section, the export section, or the optional name section. For example, _ZdaPv is the name-mangled delete[] operator of C++; or runtime.gostring is a Go runtime library function. Overall, we identify characteristic function names for C++, C, Rust, Go, AssemblyScript, Kotlin, and FStar. Third, the analysis searches for characteristic strings among all text sequences of >3 ASCII characters in the data section. For example, being core types, Result::unwrap and Option::unwrap frequently appear in error messages of the Rust standard library. We identify characteristic strings for C++, Rust, Matlab, and COBOL. Fourth, if none of the above work, we analyze sibling files of binaries collected from code repositories and package managers. Sibling file here means a file in the same directory that shares the file name except for the extension. We take into account extensions for C, C++, Rust, Go, AssemblyScript/TypeScript, the WebAssembly text format (.wat/.wast), and several smaller languages. Finally, for some source code repositories and packages with multiple unidentified binaries, we manually inspect source code, build scripts, and binaries. For each of the automated methods above, we manually inspect binaries and the predictions to confirm that our heuristics are precise. For binaries where multiple methods identify the source language, we confirm that the predictions are consistent.

Results. Figure 5a shows the inferred source languages. We see that almost two thirds (64.2%) of the binaries are compiled from C, C++, or a combination of both. Given that these are memory-unsafe languages, plagued with decades of vulnerabilities [41] and that WebAssembly binaries are not automatically safe from exploitation [18], this result is highly worrying.
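The name- and string-based heuristics from the analysis above can be illustrated with a minimal sketch. This is not the implementation used in the study, which parses binaries with wasmparser; the sketch makes two simplifying assumptions. First, it scans the raw bytes of the whole file for runs of more than three printable ASCII characters instead of reading the import, export, name, and data sections separately. Second, the marker table MARKERS and the function names is_wasm and guess_languages are hypothetical and contain only the examples named in the text.

```python
import re
from typing import Set

# Every WebAssembly binary starts with these four bytes ("\0asm").
WASM_MAGIC = b"\x00asm"

# Hypothetical marker table, seeded only with the examples from the text.
MARKERS = {
    b"_ZdaPv": "C++",            # name-mangled operator delete[]
    b"runtime.gostring": "Go",   # Go runtime library function
    b"Result::unwrap": "Rust",   # appears in Rust standard library errors
    b"Option::unwrap": "Rust",
}

# Runs of more than three printable ASCII characters.
ASCII_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def is_wasm(payload: bytes) -> bool:
    """Return True if the payload starts with the WebAssembly magic number."""
    return payload.startswith(WASM_MAGIC)


def guess_languages(payload: bytes) -> Set[str]:
    """Return every language whose marker occurs in an ASCII run of the payload."""
    if not is_wasm(payload):
        raise ValueError("not a WebAssembly binary")
    found = set()
    for match in ASCII_RUN.finditer(payload):
        text = match.group()
        for marker, language in MARKERS.items():
            if marker in text:
                found.add(language)
    return found
```

Returning a set rather than a single label mirrors the observation that several methods can apply to one binary, and it leaves room for the consistency check between predictions described above. An empty set corresponds to a binary for which this heuristic yields no prediction.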
Insight 3. Almost two-thirds of all collected binaries are compiled from memory-unsafe source languages. These results and those in the next section call for techniques to analyze and ensure WebAssembly's binary security.

Rust comes in second place with 14.9% of all binaries, followed by AssemblyScript (3%) and Go (1.7%) as source languages with official WebAssembly support. Finally, there is a longer tail of other languages, often used in single projects: Matlab⁹ (0.69%), FStar¹⁰ (0.33%), CHIP-8¹¹ (0.26%), several binaries compiled from toy languages, and even a single instance of COBOL. A small portion of binaries (1.1%) is translated directly from the WebAssembly text format, i.e., likely to be written by hand. Finally, for 3.3% of all binaries we could not assign a source language, but since they contain fewer than 100 instructions, they are also likely to be written manually.

Insight 4. In addition to C/C++, various other languages are compiled to WebAssembly, including languages with garbage collection and heavier runtimes (Go, Matlab). This result matches WebAssembly's goal of serving as a universal bytecode. It also means binary analysis will become more important, since source code is not always available, and even if it is, implementing separate analyses for many languages is impractical.

Figure 5: Source languages and methods for inferring them. (a) Source languages: C++ (3,986; 47.1%), Rust (1,261; 14.9%), Unknown (763; 9.0%), C/C++ (760; 9.0%), C (682; 8.1%), Other (324; 3.8%), <100 instructions (283; 3.3%), AssemblyScript (256; 3%), Go (146; 1.7%). (b) Analysis methods (multiple can apply per binary): Names (5,495; 64.9%), Strings (5,156; 60.9%), Producers Section (1,710; 20.2%), File Extension (304; 3.6%), Projects (245; 2.9%).

Figure 5b shows which of our methods are most effective at inferring the source language. 20.2% of the binaries contain a producers section, from which the source code language can be directly obtained. Characteristic names and strings are also important inference methods, since they apply to 64.9% and 60.9% of binaries, respectively. Overall, our methods infer the source language for 91% (7,698) of the 8,461 unique binaries.

We also analyze the tools used to produce the binaries. For 20.2% of the binaries, the producers section explicitly mentions them in the processed-by field. 10.8% of all binaries explicitly mention being produced by Clang. No binaries mention Emscripten because it does not emit a producers section, unlike newer versions of Clang. These results show that Emscripten is no longer the only way to compile C and C++ to WebAssembly. Other tools that appear in the producers section are rustc (9.5%), wasm-bindgen¹² (7.9%), a JavaScript host-code generator, and walrus¹³ (7.5%), a binary transformation library, and the official Go compiler (0.4%). Since all compilers for Rust, C, and C++ to WebAssembly are based on LLVM, we can also derive that 79.1% of the binaries are produced with the help of LLVM.

Insight 5. Almost 80% of all binaries are compiled with the help of the LLVM toolchain. This implies that security mitigations, such as stack canaries, would have a large effect on the ecosystem if implemented in this toolchain.

9 https://github.com/Sable/matwably
10 https://github.com/FStarLang/kremlin
11 An 8-bit VM language from the 1970s, https://github.com/pepyakin/emchipten
12 https://github.com/rustwasm/wasm-bindgen
13 https://github.com/rustwasm/walrus

4.4 RQ2: Vulnerabilities Propagated from Source Languages

Recent work shows that memory vulnerabilities in unsafe source languages can propagate to WebAssembly binaries and may sometimes be exploited even more easily than for native binaries [18]. While this prior work evaluates the risks of such attacks on a small set of 26 binaries, most of which are compiled C/C++ benchmarks, it remains unclear to what extent propagated vulnerabilities may affect real-world WebAssembly binaries.

We address this question by studying three important characteristics of binaries that attackers can abuse: (i) uses of the unmanaged stack, i.e., an unprotected representation of the function stack in WebAssembly's linear memory (Section 2), which attackers can use for stack-based buffer overflows and stack overflows (Section 4.4.1); (ii) unsafe memory allocators compiled into a binary, which attackers can abuse as a memory write primitive (Section 4.4.2); and (iii) accesses to potentially dangerous APIs imported from the host environment (Section 4.4.3). For (i), [18] reports results on 26 binaries only. We refine their static analysis and consider a 325× larger dataset. For characteristics (ii) and (iii), this work is the first to systematically evaluate their prevalence in real-world WebAssembly binaries.
binaries.
binaries. We
We study
study allattackers
binaries,can
binaries, abuse: (i) of
irrespective
irrespective uses
of theofsource
the the unmanaged
source language,
language,
bly’s
bly’s goalgoal of of serving
serving as as aa universal
universal bytecode. bytecode. It It also
also means
means binarybinary stack,
because i.e., an unprotected representation of the function stackall in
because the problem of propagated vulnerabilities may affect
the problem of propagated vulnerabilities may affect all
analysis
analysis will will become
become more more important,
important, since since sourcesource code code is is not
not al-al- WebAssembly’s linear memory (Section 2),in
languages
languages with
with memory-unsafe
memory-unsafe behavior,
behavior, inwhich
particular
particular attackers C,
C, C++, can but
C++, use
but
ways
ways available,
available, and and even even ifif itit is,is, implementing
implementing separate separate analysesanalyses for stack-based
also, e.g., buffer overflows and stack overflows (Section 4.4.1);
also, e.g., Rust,
Rust, as as its
its unsafe
unsafe keyword
keyword is
is commonly
commonly used
used [7]
[7] andand can
can
for
for many
many languages
languages is is impractical.
impractical. (ii)
cause unsafe memory
memory-safety allocators
related compiled
vulnerabilities into [46]. a binary, which at-
cause memory-safety related vulnerabilities [46].
We also analyze the tools used to produce the binaries. For 20.2% tackers can abuse as a memory write primitive (Section 4.4.2); and
We
We also
also analyze
analyze the the tools
tools usedused to to produce
produce the the binaries.
binaries. For For 20.2%
20.2% 4.4.1
4.4.1 Usage
Usage to
(iii) accesses of the
the Unmanaged
of potentially
Unmanaged dangerousStack.
Stack.APIs The so-called
The imported
so-called from “unmanaged
“unmanaged
the host
of the binaries, the producers section explicitly mentions them in stack” is
of
of the
the binaries,
binaries, the the producers
producers section section explicitly
explicitly mentionsmentions them them in in stack”
environment is aa region
region
(Section within
within the
4.4.3).theFor linear
linear
(i) memory
[18]memoryreports of aa WebAssembly
ofresults WebAssembly
on 26 bina-
the processed-by field. 10.8% of all binaries explicitly mention being program
the processed-by field.
the processed-by field. 10.8%
10.8% of of all
all binaries
binaries explicitly
produced by Clang. No binaries mention Emscripten because it does
explicitly mention
mention being being ries only.that
program We holds,
that holds, e.g.,
refine e.g., non-primitive
their static analysisdata
non-primitive and with
data with
consider function
function a 325× livetime.
livetime.
larger
produced
produced by by Clang.
Clang. No No binaries
binaries mention
mention Emscripten
Emscripten because because itit does does This
This
dataset.design
designFor is
is motivated
motivated
characteristics by
by the
the
(ii) fact
andfact that
that
(iii), all
all
this non-scalar
non-scalar
work is the data
data
first and
and
to all
all
sys-
not emit a producer section, unlike newer versions of Clang. These data of
not
not emit
emit aa producer
producer section, section, unlikeunlike newer newer versions
versions of of Clang.
Clang. These These data of which
tematically which an
an address
evaluate address their is taken
taken cannot
isprevalence cannot in be
be put
put in
real-world in WebAssembly’s
WebAssembly’s
WebAssembly
results show that Emscripten is no longer the only way to compile locals or
results
results show show that that Emscripten
Emscripten is is no
no longer
longer the the only
only way way to to compile
compile locals
binaries. or globals,
globals,
We study but
but allmust
must instead
binaries,instead reside
reside in
irrespective inof linear
linear memory.
the sourcememory. Given
Given
language,
C and C++ to WebAssembly. Other tools that appear in the producer that buffer overflows on the unmanaged stack can overwrite across
C and C++ to WebAssembly. Other tools
C and C++ to WebAssembly. Other tools12that appear in the producer that appear in the producer that
because buffer the overflows
problem on
of the unmanaged
propagated stack can overwrite
vulnerabilities may across
affect all
section are rustc (9.5%), wasm-bindgen12 (7.9%), a JavaScript host- stack frames and even
section
section are rustc (9.5%),
are rustc (9.5%), wasm-bindgen
wasm-bindgen 13
12 (7.9%),
(7.9%), aa JavaScript
JavaScript host- host- stack
languages frames with even into
andmemory-unsafe into supposedly
supposedlybehavior, “constant”
“constant”
in particular data,
data,C, this
this C++,makes
makes but
code generator, and walrus (7.5%)13 , a binary transformation library, the
code
code generator,
generator, and walrus (7.5%)
and walrus (7.5%)13,, aa binary
and the official Go compiler (0.4%). Since all compilers for Rust, C,
binary transformation
transformation library, library, also,unmanaged
the e.g., Rust, asstack
unmanaged stack aa more
its unsafe more dangerous
dangerous
keyword exploitation
exploitation
is commonly usedtarget[7] andthan
target than
can
and
and the the official
official Go Go compiler
compiler (0.4%). (0.4%). Since Since all all compilers
compilers for for Rust,
Rust, C, C, even
even
cause in
in native
native programs
memory-safety programs related[18]. For
For this
[18].vulnerabilitiesthis reason,
reason, [46]. we
we evaluate
evaluate how how
and C++ to WebAssembly are based on LLVM, we can also derive many
and
and C++ C++ to to WebAssembly
WebAssembly are are based
based on on LLVM,
LLVM, we we cancan also also derive
derive many binariesbinaries use use itit in in practice.
practice.
that 79.1% of the binaries are produced with the help of LLVM. 4.4.1 Usage of the Unmanaged Stack. The so-called “unmanaged
that
that 79.1%
79.1% of of the
the binaries
binaries are are produced
produced with with the the help
help of of LLVM.
LLVM. stack”
Analysis.is a region
Analysis. To
To analyze withinthe
analyze thetheusagelinearof
usage ofmemory
the
the unmanaged
unmanagedof a WebAssembly stack,
stack, our our
9 https://github.com/Sable/matwably
Insight
Insight 5.
5. Almost
Almost 80% 80% of of all
all binaries
binaries are are compiled
compiled with with the the help
help program
static that holds, e.g., non-primitive
static analysis performs two steps. First, it tries to identify the
analysis performs two steps. First, data
it trieswith to function
identify livetime.
the stack
stack
10 https://github.com/FStarLang/kremlin
of the LLVM toolchain. This design
pointer
pointer to is motivated
to determine
determine whether
whether by the fact that
aa binary
binary uses
uses allan non-scalar
an unmanaged
unmanaged datastack and all
stack at
at
of
11 Anthe8-bit
LLVM toolchain.
VM language fromThis1970s,implies
This implies that
that security
security mitigations,
https://github.com/pepyakin/emchipten mitigations, such such
12 https://github.com/rustwasm/wasm-bindgen
as
as stack canaries, would have a large
stack canaries, would have a large effect on the ecosystem if effect on the ecosystem if data
all.
all. Out
Outof which
of
of all
all an
global
global address
variables,
variables,is takenthe
the cannot
analysis
analysis be put
selects
selects in WebAssembly’s
the
the one
one that
that
13 https://github.com/rustwasm/walrus
implemented
implemented in
in this
this toolchain.
toolchain. •locals
• has or
has type
type globals,
i32,, i.e.,
i32 butthe
i.e., must
the type
type instead
of
of all reside in linear memory. Given
all pointers,
pointers,
•• is is declared
declared mutable,mutable, to to exclude
exclude constantsconstants like STACK_MAX,,
like STACK_MAX
99https://github.com/Sable/matwably
https://github.com/Sable/matwably •• is is the
the mostmost read read and and written
written global,
global, as as determined
determined by by thethe number
number
10
10https://github.com/FStarLang/kremlin 14
14,, and
11
https://github.com/FStarLang/kremlin of
of global.get
global.get ×× global.set
global.set instructions
instructions and
11 An
An 8-bit
8-bit VM
VM language
language from
from 1970s,
1970s, https://github.com/pepyakin/emchipten
https://github.com/pepyakin/emchipten
12
12https://github.com/rustwasm/wasm-bindgen 14
14The
https://github.com/rustwasm/wasm-bindgen The product
product ofof the
the counts
counts prefers
prefers globals
globals which
which are
are similarly
similarly often
often read
read and
and written.
written.
13
13https://github.com/rustwasm/walrus This
https://github.com/rustwasm/walrus This is
is true
true for
for the
the stack
stack pointer,
pointer, but
but not
not for
for other
other frequently
frequently accessed
accessed pointers.
pointers.
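Analysis. To analyze the usage of the unmanaged stack, our static analysis performs two steps. First, it tries to identify the stack pointer to determine whether a binary uses an unmanaged stack at all. Out of all global variables, the analysis selects the one that
• has type i32, i.e., the type of all pointers,
• is declared mutable, to exclude constants like STACK_MAX,
• is the most read and written global, as determined by the product of the numbers of global.get and global.set instructions¹⁴, and
• has at least three reads and at least three writes, to avoid false positives in small binaries.
We manually validate that these heuristics identify the stack pointer reliably on randomly sampled binaries. If the analysis cannot identify a stack pointer, it conservatively assumes that the binary does not use an unmanaged stack. Second, once the stack pointer is identified, the analysis counts the number of functions in the binary that access the stack pointer somewhere in their body. Our implementation builds upon the prototype analysis provided by Lehmann et al.¹⁵, but uses a different WebAssembly parser to also handle language extensions and makes the analysis robust enough to run on thousands of real-world binaries.

¹⁴ The product of the counts prefers globals which are similarly often read and written. This is true for the stack pointer, but not for other frequently accessed pointers.
¹⁵ https://github.com/sola-st/wasm-binary-security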
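
The two analysis steps described above can be illustrated with a short Python sketch. It is not the authors’ implementation (which builds on the prototype by Lehmann et al.). The GlobalInfo record, the function names, and the assumption that per-global read/write counts and per-function sets of accessed globals have already been extracted from the binary are hypothetical and only serve to make the selection criteria concrete.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set


@dataclass(frozen=True)
class GlobalInfo:
    """Summary of one WebAssembly global (hypothetical record for illustration)."""
    index: int
    value_type: str  # e.g. "i32", "i64", "f32", "f64"
    mutable: bool
    reads: int       # number of global.get instructions targeting this global
    writes: int      # number of global.set instructions targeting this global


def find_stack_pointer(globals_: Sequence[GlobalInfo],
                       min_accesses: int = 3) -> Optional[GlobalInfo]:
    """Step 1: pick the most likely stack pointer, or None if no global qualifies."""
    candidates = [
        g for g in globals_
        if g.value_type == "i32"      # the type of all pointers
        and g.mutable                 # excludes constants such as STACK_MAX
        and g.reads >= min_accesses   # at least three reads ...
        and g.writes >= min_accesses  # ... and at least three writes
    ]
    if not candidates:
        # Conservative default: assume the binary has no unmanaged stack.
        return None
    # The product of the counts prefers globals that are read and written
    # similarly often, which is typical for a stack pointer.
    return max(candidates, key=lambda g: g.reads * g.writes)


def fraction_of_functions_using(sp_index: int,
                                accessed_globals: Sequence[Set[int]]) -> float:
    """Step 2: share of functions whose body accesses the stack pointer.

    accessed_globals holds, per function, the set of global indices that the
    function reads or writes (assumed to be precomputed from the binary).
    """
    if not accessed_globals:
        return 0.0
    using = sum(1 for accessed in accessed_globals if sp_index in accessed)
    return using / len(accessed_globals)
```

Taking the maximum over the product of reads and writes, rather than over their sum, is what keeps a frequently read but rarely written global from being mistaken for the stack pointer.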
Results. Figure 6a shows that almost two thirds (65%) of all binaries use the unmanaged stack. While most of them use the global with index 0 as their stack pointer, our heuristics to identify the stack pointer are important, since 15.2% of all binaries use another global variable. For 2% of the binaries, the analysis can clearly determine that they have no unmanaged stack in linear memory, simply because there is no linear memory at all. For 23.5% of the binaries, there is a linear memory section, but no mutable i32 global that could be a stack pointer. Finally, 9.5% of all binaries have at least one candidate mutable global pointer, but it is not accessed often enough for our analysis to consider it the stack pointer. Interestingly, AssemblyScript programs are in the last category, since the AssemblyScript runtime seems not to support stack allocation.

To better understand how much a binary uses the unmanaged stack, Figure 6b shows how many of the functions in a binary access the stack pointer at least once. Consistent with Figure 6a, in 35% of all binaries no function uses the stack pointer because none is present. In the median binary, already 33% of all functions use the stack pointer, and in some binaries almost every function uses the unmanaged stack. On average across all binaries with an unmanaged stack, 44% of their functions make use of it.

Figure 6: Unmanaged stack usage in binaries. (a) Proportion of binaries with identified stack pointer, and if not why. (b) Cumulative distribution of the proportion of functions in a binary that use the unmanaged stack (x-axis: percentile of binaries; y-axis: functions that access the unmanaged stack).

Insight 6. Many binaries (65%) and functions in those binaries (44%) use the unmanaged stack, which attackers may abuse for runtime exploitation. This result extends earlier findings [18] to a much larger and more diverse set of real-world binaries.

4.4.2 Statically Linked Allocators. WebAssembly’s memory organization is very low-level. Besides the single linear memory section, which can be expanded at runtime with the memory.grow instruction, no help with allocating memory is provided by the language. Subdividing the linear memory, e.g., to avoid fragmentation and to reuse space of deallocated objects, needs to be handled by an allocator that is statically linked into the binary. Especially for binaries on the web and for smart contract platforms, code size is an important consideration, so developers can choose a lightweight allocator instead of the default allocator provided by the compiler. Prior work has shown [18] that those smaller allocators can lack important mitigations against heap metadata corruption and yield powerful arbitrary write primitives for an attacker. However, it remains unclear what allocators developers use in practice.

Analysis. To identify allocators in a binary, we rely on similar heuristics as for source language detection (Section 4.3). That is, we first identify allocators by characteristic function names in binaries, if those are available. Then, we inspect frequent strings in the data section of binaries, e.g., for error messages of certain allocators.

Results. Figure 7 shows our results, grouped into three categories. In blue, we mark default allocators provided by programming languages and compilers, in different shades of red we mark other allocators that we identified, and in gray the binaries for which we could not identify an allocator. In terms of default allocators, we see that 16.9% of all binaries use dlmalloc, the default allocator provided by Emscripten, Clang, and the Rust compiler when targeting WebAssembly. Go and AssemblyScript allocators are present roughly in the proportion of their respective languages.

Among the non-default allocators, two particular ones dominate, being in 32.7% and 4.3% of our binaries. They are both from EOSIO, a smart contract platform that uses WebAssembly as its bytecode. Those contracts can be written in C++ and compiled with Emscripten. However, most of them are not using Emscripten’s default allocator. While we did not perform an in-depth security analysis of EOSIO malloc and simple_malloc, we can attest that both are considerably shorter in terms of code and do not feature any assertions that would guard against metadata corruption. In our dataset, we also find wee_alloc (62 binaries) and emmalloc (11 binaries), two small allocators for Rust and Emscripten respectively, that were already found to be vulnerable to heap metadata corruption attacks [18]. Other interesting custom allocators are Boehm GC (a mark-and-sweep garbage collector) and gperftools, in several binaries collected from Google domains.

Figure 7: Allocators in binaries, multiple can apply.

Insight 7. WebAssembly binaries come with a variety of memory allocators, including many custom allocators (38.6%), increasing the risk of including vulnerable allocators. If code size is the motivation for using custom allocators, a more secure alternative could be a memory allocation or garbage collection API provided by the host environment [33].
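
The two-stage allocator identification described in the Analysis paragraph of this section (function names first, then strings in the data section) could be sketched as follows. This is only an illustration under stated assumptions: the paper does not list the concrete names or strings it matches, so both signature tables below are hypothetical placeholders, and the inputs are assumed to have been extracted from the binary beforehand.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical signatures, for illustration only. The study does not publish
# the exact function names or data-section strings it matches.
FUNCTION_NAME_HINTS: Dict[str, Tuple[str, ...]] = {
    "dlmalloc": ("dlmalloc", "dlfree"),
    "wee_alloc": ("wee_alloc",),
    "emmalloc": ("emmalloc",),
}
DATA_STRING_HINTS: Dict[str, Tuple[str, ...]] = {
    "Boehm GC": ("GC Warning",),
}


def identify_allocator(function_names: Sequence[str],
                       data_strings: Sequence[str]) -> Optional[str]:
    """Return the name of the first allocator whose signature matches, else None."""
    # Stage 1: characteristic function names, if the binary still carries them.
    for allocator, needles in FUNCTION_NAME_HINTS.items():
        if any(needle in name for name in function_names for needle in needles):
            return allocator
    # Stage 2: characteristic strings in the data section, e.g. error messages.
    for allocator, needles in DATA_STRING_HINTS.items():
        if any(needle in text for text in data_strings for needle in needles):
            return allocator
    return None  # corresponds to the "unknown" group in the results
```

Checking names before strings mirrors the order given in the text; since Figure 7 notes that multiple allocators can apply to one binary, a fuller version would collect all matches instead of returning the first.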
4.4.3 Imports of Security-Critical APIs from Host Environment. To exploit a WebAssembly binary, an attack proceeds in two steps. The first step is compromising the state or behavior of the WebAssembly binary itself, e.g., by exploiting an unsafe allocator (Section 4.4.2) or a buffer overflow on the unmanaged stack (Section 4.4.1). The second step is actually performing the malicious action on the underlying system. The only way to do so, assuming VM implementations are bug-free and host security¹⁶ is perfect (which they are not [1, 36]), is to call functions imported into the WebAssembly binary from the host environment. For example, an attacker could pass an injected string on the unmanaged stack to an imported function, e.g., JavaScript’s eval. To estimate how often WebAssembly binaries use such security-critical host APIs, we thus analyze their imports.

Analysis. We identify imported security-critical APIs based on their import name in the WebAssembly binary. Going by name (instead of implementation) is necessary because (1) the implementation of an imported function is supplied by the host only at instantiation time, so it is not available when given only the binary; and (2) there are many host environments, not all of which are using JavaScript. WASI, for example, defines imports that can be implemented by different standalone WebAssembly VMs in native code. We thus identify import names for which the host implementation is likely a security-critical function. E.g., the import emscripten_run_script in WebAssembly binaries is typically bound to Emscripten-generated JavaScript code that calls eval. We match imports against 18 patterns in five categories known to be potentially security-critical APIs:
• Code execution. Imports like eval, exec, or emscripten_run_script.
• Network access. Imports containing XHR, request, http, or fetch.
• File I/O. Imports containing file, fd, or path.
• DOM interaction. Imports containing document, html, body, or element could manipulate the DOM, which can lead to XSS.
• Dynamic linking. dlopen, dlsym, and dlclose allow loading additional code at runtime, which can lead to code injection.
To avoid spurious matches, especially for short patterns like fd, we tokenize import names based on camel-case and non-alphabet characters, and then check for a pattern to occur verbatim in the token sequence. E.g., fd_write matches our file I/O category, but bufdelete does not. We manually inspect matches to ensure they are plausible and remove benign matches otherwise.

Table 3: Imports matching potentially security-critical APIs.

Category         Patterns                                   Matching imports  Binaries      %
Code execution   eval, exec, execve, emscripten_run_script               383       160   1.9%
Network access   xhr, request, http, fetch                               944       172   2.0%
File I/O         file, fd, path                                        7,532     1,610  19.0%
DOM interaction  document, html, body, element                         1,720       212   2.5%
Dynamic linking  dlopen, dlsym, dlclose                                  352       138   1.6%
At least one                                                          10,468     1,797  21.2%

Results. Table 3 shows the results of our name-based import analysis. The first two columns show the category and patterns we match import names against. In the third column, we count imported functions that match at least one pattern. The last two columns show the number of binaries with at least one matching import, and which fraction of the filtered dataset this corresponds to. We see that imports related to file I/O, the most common category, are present in almost every fifth binary. Interestingly, even though WebAssembly was originally not meant to replace JavaScript, but rather to serve compute-intensive applications, 212 binaries still likely interact with the DOM from WebAssembly, which attackers could use for cross-site scripting. In the last row, we see that overall 21.2% of all binaries import at least one potentially security-critical API.

Insight 8. Many binaries (21.2%) import potentially dangerous APIs from their host environment, which may allow compromised binaries, e.g., to inject arbitrary code or to write to the file system.

¹⁶ Host security: isolation of WebAssembly execution from the underlying host, e.g., ensuring that a WebAssembly program can write only to its designated memory region.
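
The token-based import matching described in the Analysis paragraph of this section can be made concrete with a small Python sketch. The pattern lists are taken from Table 3; everything else is an assumption made for illustration, in particular the tokenizer regular expression, the case-insensitive comparison, and the function names. The manual inspection step that removes benign matches is not modeled.

```python
import re
from typing import Dict, List, Set

# Patterns per category as listed in Table 3 (18 patterns in five categories).
PATTERNS: Dict[str, List[str]] = {
    "Code execution": ["eval", "exec", "execve", "emscripten_run_script"],
    "Network access": ["xhr", "request", "http", "fetch"],
    "File I/O": ["file", "fd", "path"],
    "DOM interaction": ["document", "html", "body", "element"],
    "Dynamic linking": ["dlopen", "dlsym", "dlclose"],
}

# Assumed tokenizer: an acronym run (e.g. "XML"), or an optionally capitalized
# lowercase word. Non-alphabetic characters never match and thus act as separators.
_TOKEN_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def tokenize(name: str) -> List[str]:
    """Split an import name on camel-case boundaries and non-alphabetic characters."""
    return [token.lower() for token in _TOKEN_RE.findall(name)]


def _contains(tokens: List[str], needle: List[str]) -> bool:
    """True if needle occurs as a contiguous run inside tokens."""
    n = len(needle)
    return any(tokens[i:i + n] == needle for i in range(len(tokens) - n + 1))


def matching_categories(import_name: str) -> Set[str]:
    """Return all categories with a pattern occurring verbatim in the token sequence."""
    tokens = tokenize(import_name)
    return {
        category
        for category, patterns in PATTERNS.items()
        if any(_contains(tokens, tokenize(pattern)) for pattern in patterns)
    }
```

Matching on whole tokens rather than substrings is what separates the two examples from the text: fd_write tokenizes to fd and write and therefore hits the file I/O pattern fd, whereas bufdelete stays a single token and matches nothing.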
first step is compromising
mentation of an imported thefunction
state or behavior of the
is supplied by WebAssembly
the host only
binary itself, e.g., by exploiting
at instantiation-time, so it is notan available
unsafe allocator (Section
when given 4.4.2)
only the
or a buffer
binary; andoverflow
(2) there on
are the
many unmanaged stack (Section
host environments, 4.4.1).
not all The
of which 4.5
4.5 RQ3:
RQ3: Cryptomining
Cryptomining
second
are using step is actuallyWASI
JavaScript. performing the malicious
for example, action tothat
defines imports the can
un- A
A study
study ofof real-world
real-world usesuses of
of WebAssembly
WebAssembly performed
performed inin early
early
derlying system. The
be implemented only way
by different to do so, assuming
standalone WebAssemblyVM implementa-
VMs in na- 2019
2019 [24]
[24] reports
reports cryptomining
cryptomining to to be
be one
one of
of the
the most
most common
common use use
tions are bug-free
tive code. We thusand identify import16names
host security is perfect
for (which
which thetheyhost
are not
im- cases
cases ofof WebAssembly
WebAssembly on on the
the web.
web. That
That study
study found
found 55.7%
55.7% of
of the
the
[1, 36]), is to call
plementation functions
is likely imported into function.
a security-critical the WebAssembly binary
E.g., the import analyzed
analyzed websites
websites to to use
use WebAssembly
WebAssembly for for cryptojacking,
cryptojacking, i.e.,
i.e., the
the
from the host environment. For example,binaries
in WebAssembly an attacker could pass
is typically boundan practice
emscripten_run_script practice ofof using
using aa website
website visitor’s
visitor’s hardware
hardware resources
resources for
for mining
mining
injected string on the unmanaged
to Emscripten-generated JavaScriptstack to ancalls
code that imported
eval. Wefunction,
match cryptocurrencies
cryptocurrencies without
without their
their consent.
consent. Identifying
Identifying and
and controlling
controlling
e.g., JavaScript’s
imports against eval . To estimate
18 patterns how
in five often WebAssembly
categories known to bebinaries
poten- cryptominers
cryptominers on on the
the web
web has
has been
been the
the focus
focus ofof several
several recent
recent pieces
pieces
use
tiallysuch security-critical
security-critical host APIs, we thus analyze their imports.
APIs: of
of work
work [15,
[15, 25,
25, 34,
34, 43].
43]. In
In this
this research
research question,
question, wewe study
study whether
whether
Analysis. We identify
• Code execution. imported
Imports security-critical
like eval APIs based on
, exec, or emscripten_run_script . cryptomining
cryptomining is is still
still an
an important
important threat
threat today.
today. We
We address
address this
this
their•import
Network name in the
access. WebAssembly
Imports containing binary. Going ,by
XHR, request httpname
, or question
question in in two
two ways.
ways. First,
First, we
we analyze
analyze those
those binaries
binaries we
we collected
collected
(insteadfetch
of implementation)
. is necessary because, (1) the imple- from
from the
the web
web for
for signs
signs ofof being
being cryptominers.
cryptominers. Second,
Second, we
we directly
directly
mentation of an
• File I/O. imported
Imports function
containing is ,supplied
file fd, or path by. the host only compare
compare the binaries gathered in earlier work with our dataset.
the binaries gathered in earlier work with our dataset.
at instantiation-time,
• DOM interaction.soImports it is not availabledocument
containing when given , htmlonly
, bodythe
, or 4.5.1 Analyzing Binaries Found on the Web. To understand the
binary;element
and (2) there 4.5.1 Analyzing Binaries Found on the Web. To understand the
could are many host
manipulate theenvironments,
DOM, which not can all ofto
lead which
XSS. prevalence of cryptomining today, we analyze all binaries collected
are using JavaScript. WASI for ,example, defines imports prevalence of cryptomining today, we analyze all binaries collected
• Dynamic linking. dlopen dlsym, and dlclose allowthat can
loading from the web, i.e., direct downloads guided by HTTP Archive and
be implemented from the web, i.e., direct downloads guided by HTTP Archive and
additionalby different
code standalone
at runtime, which WebAssembly
can lead to code VMs in na-
injection. the results of our own crawling, using VirusTotal. The VirusTotal
tiveTocode. the results of our own crawling, using VirusTotal. The VirusTotal
avoidWe thus identify
spurious matches, import names
especially forfor which
short the host
patterns likeim-
fd, API allows to upload and scan files with up to 70 independent third-
plementation is likely a security-critical function. E.g., the import
emscripten_run_script in WebAssembly binaries is typically bound to Emscripten-generated
JavaScript code that calls eval. We match imports against 18 patterns in five categories
known to be potentially security-critical APIs:
• Code execution. Imports like eval, exec, or emscripten_run_script.
• Network access. Imports containing XHR, request, http, or fetch.
• File I/O. Imports containing file, fd, or path.
• DOM interaction. Imports containing document, html, body, or element could manipulate
  the DOM, which can lead to XSS.
• Dynamic linking. dlopen, dlsym, and dlclose allow loading additional code at runtime,
  which can lead to code injection.
We tokenize import names based on camel case and non-alphabetic characters, and then check
whether a pattern occurs verbatim in the token sequence. E.g., fd_write matches our file I/O
category.

16 Host security: isolation of WebAssembly execution from the underlying host, e.g.,
ensuring that a WebAssembly program can write only to its designated memory region.

API allows uploading and scanning files with up to 70 independent third-party antivirus
scanners and malware detectors, and reports back the number of positive results. Among the
352 analyzed binaries, VirusTotal reports four files to contain malicious content. Three of
them are likely to be the same program, as they have similar sizes (68.8 ± 0.7 kB) and the
same distribution of instructions. These three files are detected by 26 or more scanning
tools employed by VirusTotal. One of the files is collected from http://monero.cit.net,
which further supports the presumption that the binary is a cryptominer, as “Monero” is the
name of a common cryptocurrency. Moreover, one of the three binaries is identical to a
binary that we also collect from a GitHub repository called “CryptoNoter”17, which is an
open-source Monero cryptominer. The fourth file reported by VirusTotal tests positive with
only one scanner. Our manual analysis shows that the report for this binary is likely to be
a false positive.
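The import-name matching described above (tokenize each import name, then look for a pattern as a verbatim run of tokens) could be sketched as follows. This is only an illustration under stated assumptions: the pattern table is built from the example keywords in the category list and is not the paper's actual set of 18 patterns, and the names PATTERNS, tokenize, contains, and categorize are hypothetical.

```python
import re

# Illustrative pattern table (an assumption): categories and keywords follow the
# list in the text, but the study's full set of 18 patterns is not reproduced here.
PATTERNS = {
    "code execution": [["eval"], ["exec"], ["emscripten", "run", "script"]],
    "network access": [["xhr"], ["request"], ["http"], ["fetch"]],
    "file i/o": [["file"], ["fd"], ["path"]],
    "dom interaction": [["document"], ["html"], ["body"], ["element"]],
    "dynamic linking": [["dlopen"], ["dlsym"], ["dlclose"]],
}


def tokenize(name):
    """Split a name on non-alphabetic characters and camel-case boundaries."""
    tokens = []
    for chunk in re.split(r"[^A-Za-z]+", name):
        # Either a run of capitals (an acronym) or an optionally capitalized word.
        tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", chunk)
    return [t.lower() for t in tokens]


def contains(tokens, pattern):
    """True if `pattern` occurs as a contiguous run inside `tokens`."""
    n = len(pattern)
    return any(tokens[i:i + n] == pattern for i in range(len(tokens) - n + 1))


def categorize(name):
    """Return the sorted list of categories whose patterns match the import name."""
    tokens = tokenize(name)
    return sorted(
        category
        for category, patterns in PATTERNS.items()
        if any(contains(tokens, p) for p in patterns)
    )


for name in ["fd_write", "emscripten_run_script", "profdata_flush"]:
    print(name, "->", categorize(name))
```

Matching on whole tokens rather than substrings is what keeps a made-up name such as profdata_flush out of the file I/O category, even though it contains the letters "fd".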
4.5.2 Comparison with Dataset by Musch et al. [24]. The previous study is based on 147
unique WebAssembly binaries, which the authors kindly shared with us. The intersection of
their dataset with ours contains 23 binaries, i.e., 16% of their dataset and 0.2% of our
dataset. To better understand these binaries, we manually examine them and visit the
corresponding websites. We find four of the 23 binaries to be suspicious. Two of them are
among the files flagged by VirusTotal, as discussed above. For one file from a website that
declares itself to be a “blockchain explorer”, we could not observe any suspicious activity
when visiting the source website, but also could not identify its functionality, and thus
declare it to be suspicious. For the last suspicious binary, visiting the corresponding
website increases CPU load to 70%. The website offers a service to mine cryptocurrency and
openly advertises the fact that one can start mining immediately in the browser. That is,
the binary is an example of cryptomining but not cryptojacking.

In summary, we identify only four binaries from our “web” dataset as possible cryptominers
(about 1% of the dataset), three of which appeared to be inactive when visiting the website.
While our analysis may miss cryptomining binaries, a risk one could reduce through
additional analysis techniques beyond those provided via VirusTotal [24, 43], the prevalence
of cryptomining seems to have dropped significantly over the past one to two years. This
result is also confirmed by a manual analysis of WebAssembly binaries found on the web
(Section 4.6.2) and is in line with other reports [42] that cryptomining became less
appealing after one of the major cryptomining script providers, Coinhive, shut down.
Varlioglu et al. find a 99% decrease in sites using cryptomining among sites that had made
use of it before. The low number of cryptominers found by our analysis confirms this trend
and shows its declining influence on the WebAssembly ecosystem.

Insight 9. We find WebAssembly-based cryptominers to have significantly dropped in
importance compared to the results of an earlier study [24]. This finding motivates security
research to shift the focus from malicious WebAssembly to vulnerabilities in WebAssembly
binaries.

4.6 RQ4: Use Cases on the Web
Given the decreased prevalence of cryptominers, we study what other use cases WebAssembly
has. The following focuses on the web because it is the most prominent target platform of
WebAssembly and because websites are complete applications that we can manually analyze with
reasonable effort. We address the question in two ways. First, we study binaries that occur
across multiple websites, which helps understand libraries and other widely reused
components (Section 4.6.1). Second, we inspect a random sample of 100 unique binaries, which
helps understand the application domains of WebAssembly (Section 4.6.2). To capture the full
picture of WebAssembly use cases, the results are on unfiltered binaries.

4.6.1 Binaries Found on Multiple Websites. Out of all 476 unique binaries found on the web,
70 are reused across at least two different top-level domains. Figure 8 shows a histogram of
how often binaries occur on multiple domains. The data follows a long-tail distribution,
i.e., a few binaries occur on many websites, while many other binaries recur a few times or
only once. The top-most widely distributed binary occurs 371 times, i.e., in 28% of all
domains where we detect WebAssembly binaries.

Figure 8: Binaries found on multiple websites. [Histogram. x-axis: unique binaries, ordered
by number of occurrences; y-axis: number of domains; legend: Test for WebAssembly support,
Hyphenopoly, Long.js, Other.]

To better understand the most recurring binaries, we analyze them through a combination of
automated clustering and manual inspection. The automated clustering represents each binary
as a set of byte n-grams [21], summarizes the number of n-gram occurrences in a binary into
a characteristic vector, and then clusters binaries based on the pairwise cosine similarity
of their vectors. We then inspect the top-most binaries in Figure 8, using the clusters to
quickly identify variants of the same binary. Our analysis shows the following to be the
most widely occurring WebAssembly binaries.

Testing for WebAssembly support. At least 509 domains (38.5% of all domains that use
WebAssembly) are serving a WebAssembly binary that tests whether the browser supports
WebAssembly at all. We found two variants of such binaries, both of which are rather small:
a six-instruction binary with a single function called test and an eight-byte binary that
only contains the WebAssembly magic number followed by the language version. Websites
serving these test binaries often also serve larger binaries, i.e., they first test whether
WebAssembly is supported, and if it is, load a more complex binary. For example, at least
397 domains that serve a test binary also serve the Hyphenopoly library discussed next.

Hyphenopoly. At least 462 domains (34.9% of all domains that use WebAssembly) serve binaries
that are part of the Hyphenopoly.js JavaScript library, which uses WebAssembly to implement
its core functionality. Hyphenopoly is a polyfill that “hyphenates text if the user agent
does not support CSS-hyphenation”.18 Our clustering identifies 24 variants of this binary,
which are all generated from the same underlying library to support different natural
languages.

64-bit integer arithmetic in long.js. At least 331 domains (25% of all domains that use
WebAssembly) serve a binary that is part of long.js, a JavaScript library for 64-bit integer
computations.19 The library is commonly used in video players.

Draco library for 3D data compression. At least 25 domains (1.8% of all domains that use
WebAssembly) serve a binary that belongs to the Draco library, which supports compressing
and decompressing 3D data.20 These binaries commonly occur on websites with 3D demos or
integrated 3D assets.

17 https://github.com/JayWalker512/CryptoNoter
18 https://github.com/mnater/Hyphenopoly
19 https://github.com/dcodeIO/long.js
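The clustering step of Section 4.6.1 (byte n-grams, a count vector per binary, pairwise cosine similarity) can be illustrated with a minimal sketch. The n-gram length of 4, the similarity threshold of 0.9, and the greedy one-pass grouping are assumptions made for this example only; the text does not state which parameters or clustering algorithm the study uses, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngram_vector(data: bytes, n: int = 4) -> Counter:
    """Count how often each byte n-gram occurs in a binary."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def cluster(binaries: dict, threshold: float = 0.9, n: int = 4) -> list:
    """Greedy grouping (an assumption, not the paper's algorithm): a binary joins
    the first cluster whose representative is similar enough, else starts a new one."""
    clusters = []  # list of (representative vector, member names)
    for name, data in binaries.items():
        vec = ngram_vector(data, n)
        for representative, members in clusters:
            if cosine_similarity(vec, representative) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((vec, [name]))
    return [members for _, members in clusters]


# Synthetic stand-ins for real binaries: two near-identical variants and one other.
header = b"\x00asm\x01\x00\x00\x00"
base = header + bytes(range(64)) * 4
variant = base + b"de-DE"
other = header + bytes(reversed(range(64))) * 4

print(cluster({"hyph-en": base, "hyph-de": variant, "unrelated": other}))
# → [['hyph-en', 'hyph-de'], ['unrelated']]
```

With real data, the clusters would only be a starting point for the manual inspection that the section describes, e.g., to spot the language-specific variants of one library quickly.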
20 https://github.com/google/draco

Insight 10. The most widely occurring binaries on the web are dynamic tests for WebAssembly
support and JavaScript-WebAssembly libraries that perform computation-heavy tasks.

4.6.2 Manual Inspection of a Random Sample. To better understand the long tail of binaries
found on a few or only a single website, we also inspect a random sample of 100 unique
binaries found on the web. We exclude binaries detected only via the WebAssembly top lists
(Section 3.3.2) to avoid biasing the results toward pre-selected application domains. By
inspecting the binaries, the corresponding websites, and how the websites use the binaries,
we identify the purpose of 84 out of the 100 binaries. Table 4 summarizes the application
domains that the binaries are used in. The most common domains are games, accounting for a
quarter of all binaries. Text processing and applications in visualization and animation are
also relatively common, with 11/100 binaries each. The remaining list shows the diversity of
application domains WebAssembly is used in, ranging from online demos of programming
languages, over support for creating and scanning barcodes to document viewers.

Table 4: Application domains of 100 randomly sampled, unique WebAssembly binaries found on
the web.

Application domain                        # Binaries    Application domain           # Binaries
Games                                     25            Online gambling              2
Text processing                           11            Barcodes and QR codes        2
Visualization / Animation                 11            Room planning / Furniture    2
Media processing / Player                 9             Blogging                     2
Demo, e.g., of a programming language     7             Cryptocurrency wallet        2
Wasm tutorial or test                     5             Regular expressions          1
Chat                                      3             Hashing                      1
                                                        PDF viewer                   1

Insight 11. The application domains of WebAssembly binaries on the web reflect the diversity
of the web itself, showing that WebAssembly is used in a wide range of applications.

4.7 RQ5: Minification and Names
As a binary format with only a low-level textual representation, WebAssembly binaries cannot
be as easily inspected and understood as source code, e.g., in JavaScript. The ability to
understand a WebAssembly binary is relevant for auditing third-party code and reverse
engineering malware. For example, when a frequently depended-upon npm package contains a
WebAssembly binary, the package distribution platform may want to check that it does not
perform malicious actions, such as stealing cryptocurrency.21 Because meaningful names,
e.g., for functions, are helpful for understanding code [17], especially in binaries, we
study to what extent WebAssembly binaries provide meaningful names.

21 https://blog.npmjs.org/post/185397814280/plot-to-steal-cryptocurrency-foiled-by-the-npm

Analysis. We perform a static analysis to assess two name-related characteristics of
binaries. First, the analysis checks whether a binary contains a names section. While by
default binaries contain names only for imported and exported program elements, the optional
names section maps all function indices to identifiers. Second, the analysis checks whether
the names of imported and exported elements are minified. Both to save space and to
obfuscate the code, compilers may shorten names down to single or two-letter names devoid of
information. The analysis considers a WebAssembly binary as minified if it contains more
than ten import or export names (to exclude small, potentially hand-written modules), but
the average length of those names is ≤ 4 (to account for some imports that are never
minified by Emscripten, thus increasing the average).

Results. Our results show an interesting difference between the full dataset and binaries
found on the web. In the full dataset, many binaries contain a names section (19.6%) but
only 4.1% of the binaries are minified. In contrast, among all binaries found on the web,
only 13.3% contain a names section but 28.8% are minified. These results show that for a
significant fraction of websites, not only minified JavaScript code [37], but also minified
WebAssembly binaries make it harder to understand what code is running on the client side.

Insight 12. Many WebAssembly binaries on the web (28.8%) are minified and do not contain
useful names. To help security analysts understand third-party code, future work on
decompiling and reverse engineering WebAssembly is needed.

5 RELATED WORK
WebAssembly in general. WebAssembly has been formally defined [10], including a mechanized
proof of the soundness of its type system [44]. Since the initial version of the language,
several language extensions have been proposed [6, 32, 33].

Cryptomining. Cryptojacking, i.e., websites that use the unsuspecting client’s computing
resources for mining cryptocurrencies, has been among the first applications of WebAssembly
[15, 25, 34]. Several techniques detect and defend against cryptojacking [14, 43]. Section
4.5 studies how prevalent this threat is, showing that its importance has decreased over
time. Wang et al. [43] discuss limitations of VirusTotal in identifying cryptomining, which
may impact the validity of our results. However, our manual inspection of binaries confirms
the low prevalence of cryptominers among today’s WebAssembly binaries, which is also
supported by Varlioglu et al.’s observations about the decline of cryptojacking [42].

WebAssembly attacks. Beyond cryptomining, other kinds of attacks based on WebAssembly exist.
Lehmann et al. show that vulnerabilities that propagate from memory-unsafe source languages
may also be exploited in WebAssembly [18]. Others report examples of such attacks [3, 22].
Custom memory allocators are potentially not hardened [5, 22]. Section 4.4 shows that
several of the risks reported by prior work affect a wide range of binaries. Other lines of
attack include malicious WebAssembly binaries, e.g., to escape the browser sandbox [1, 36],
attacks that use side channels [9], and attacks based on speculative execution [20].

WebAssembly defenses. A defense against application-level attacks is to enforce security
policies on untrusted WebAssembly binaries through taint tracking [8, 40]. WebAssembly also
serves as a technology to implement defenses, e.g., to sandbox libraries executed in a
browser [26], to implement formally verified cryptography [30], or to ensure constant-time
operations for cryptographic primitives [45]. Our results call for additional mitigations,
e.g., to defend against vulnerabilities propagated from source languages, and provide
guidelines for developing such techniques, e.g., by showing which toolchains are most
commonly used.
Studies of WebAssembly. Musch et al. [24] systematically collect WebAssembly from the web
and report cryptomining to be one of its prime use cases. Our work extends their findings in
several ways: (i) by considering a wider range of sources to gather binaries, which results
in 58× more binaries; (ii) by showing that other applications than cryptomining have become
much more prevalent; and (iii) by studying several properties of WebAssembly not considered
before, e.g., security properties and toolchains. Other work studies the performance of
WebAssembly and compares it to native performance [12], however again on a small set of
binaries.

WebAssembly benchmarks. Prior work on WebAssembly often relies on benchmark suites that may
not well represent the diversity of real-world WebAssembly binaries, such as PolyBenchC,
SciMark, and Ostrich [13], i.e., benchmarks of numerical or scientific computations
[10, 11, 19], SPEC CPU, i.e., benchmarks of complex C/C++ programs that are not typically
compiled to WebAssembly [12, 18], and small sets of hand-picked applications [18]. This
paper instead presents a set of thousands of real-world binaries collected from various
sources, which we make available for future research.

Studies of other languages and ecosystems. Beyond WebAssembly, other studies investigate
JavaScript and web security in general, including studies of minified and obfuscated code in
the web [37], of the use of eval [31], of the communication between websites and embedded
frames with 3rd-party content [38], of outdated libraries in the web [16], of trust
relationships between websites that include remote libraries and their corresponding library
providers [27], of implicit type conversions in JavaScript code [29], of ReDoS
vulnerabilities in JavaScript-based web servers [39], of XSS vulnerabilities [23], and of
performance issues in JavaScript [35]. Inspired

REFERENCES
[1] Georgi Geshev Alex Plaskett, Fabian Beterke. 2018. Apple Safari – Wasm Section Exploit.
[2] Javier Cabrera Arteaga, Orestis Floros Malivitsis, Oscar Luis Vera Pérez, Benoit Baudry,
    and Martin Monperrus. 2020. CROW: Code Diversification for WebAssembly. arXiv preprint
    arXiv:2008.07185 (2020).
[3] John Bergbom. 2018. Memory safety: old vulnerabilities become new with WebAssembly.
[4] Javier Cabrera Arteaga, Shrinish Donde, Jian Gu, Orestis Floros, Lucas Satabin, Benoit
    Baudry, and Martin Monperrus. 2020. Superoptimization of WebAssembly bytecode. In ICPS
    Companion 2020. 36–40.
[5] Frank Denis. 2018. WebAssembly doesn’t make unsafe languages safe (yet).
[6] Craig Disselkoen, John Renner, Conrad Watt, Tal Garfinkel, Amit Levy, and Deian Stefan.
    2019. Position Paper: Progressive Memory Safety for WebAssembly. In HASP.
[7] Ana Nora Evans, Bradford Campbell, and Mary Lou Soffa. 2020. Is Rust used Safely by
    Software Developers?. In ICSE. 246–257.
[8] William Fu, Raymond Lin, and Daniel Inge. 2018. TaintAssembly: Taint-Based Information
    Flow Control Tracking for WebAssembly. CoRR abs/1802.01050 (2018).
[9] Daniel Genkin, Lev Pachmanov, Eran Tromer, and Yuval Yarom. 2018. Drive-By
    Key-Extraction Cache Attacks from Portable Code. In ACNS.
[10] Andreas Haas, Andreas Rossberg, Derek L Schuff, Ben L Titzer, Michael Holman, Dan
    Gohman, Luke Wagner, Alon Zakai, and JF Bastien. 2017. Bringing the web up to speed with
    WebAssembly. In PLDI.
[11] David Herrera, Hanfeng Chen, Erick Lavoie, and Laurie Hendren. 2018. Numerical
    computing on the web: benchmarking for the future. In DLS. ACM, 88–100.
[12] Abhinav Jangda, Bobby Powers, Emery D. Berger, and Arjun Guha. 2019. Not So Fast:
    Analyzing the Performance of WebAssembly vs. Native Code. In 2019 USENIX ATC. 107–120.
[13] Faiz Khan, Vincent Foley-Bourgon, Sujay Kathrotia, Erick Lavoie, and Laurie J. Hendren.
    2014. Using JavaScript and WebCL for numerical computations: a comparative study of
    native and web technologies. In DLS’14. ACM, 91–102.
[14] Amin Kharraz, Zane Ma, Paul Murley, Charles Lever, Joshua Mason, Andrew Miller, Nikita
    Borisov, Manos Antonakakis, and Michael Bailey. 2019. Outguard: Detecting in-browser
    covert cryptocurrency mining in the wild. In WWW ’19.
[15] Radhesh Krishnan Konoth, Emanuele Vineti, Veelasha Moonsamy, Martina Lindorfer,
    Christopher Kruegel, Herbert Bos, and Giovanni Vigna. 2018. MineSweeper: An In-depth
    Look into Drive-by Cryptocurrency Mining and Its Defense. In CCS 2018.
[16] Tobias Lauinger, Abdelberi Chaabane, Sajjad Arshad, William Robertson, Christo Wilson,
    and Engin Kirda. 2017. Thou Shalt Not Depend on Me: Analysing the Use of Outdated
    JavaScript Libraries on the Web. In NDSS 2017.
[17] Dawn Lawrie, Christopher Morrell, Henry Feild, and David Binkley. 2006. What’s in a
    Name? A Study of Identifiers. In ICPC. 3–12.
[18] Daniel Lehmann, Johannes Kinder, and Michael Pradel. 2020. Everything Old is New Again:
    Binary Security of WebAssembly (USENIX Security 2020). 217–234.
[19] Daniel Lehmann and Michael Pradel. 2019. Wasabi: A framework for dynamically analyzing
    webassembly (ASPLOS 2019). 1045–1058.
[20] Giorgi Maisuradze and Christian Rossow. 2018. Ret2spec: Speculative Execution
by all that work, this paper fills in important gaps in the existing Using Return Stack Buffers. In CCS.
knowledge about security properties of real-world WebAssembly. [21] Christopher D Manning, Hinrich Schütze, and Prabhakar Raghavan. 2008. Intro-
duction to information retrieval. Cambridge university press.
[22] Brian McFadden, Tyler Lukasiewicz, Jeff Dileo, and Justin Engler. 2018. Security
6 CONCLUSION Chasms of WASM. NCC Group Whitepaper.
This paper presents a comprehensive empirical study of security [23] William Melicher, Anupam Das, Mahmood Sharif, Lujo Bauer, and Limin Jia.
2018. Riding out DOMsday: Towards Detecting and Preventing DOM Cross-Site
properties, languages, and use cases of a diverse set of real-world Scripting. Network and Distributed System Security Symposium (NDSS).
WebAssembly binaries. After gathering binaries from several sources, [24] Marius Musch, Christian Wressnegger, Martin Johns, and Konrad Rieck. 2019.
New Kid on the Web: A Study on the Prevalence of WebAssembly in the Wild
ranging from source code repositories over packages managers to (DIMVA 2019). Springer, 23–42.
live websites, we analyze them through a combination of static [25] Marius Musch, Christian Wressnegger, Martin Johns, and Konrad Rieck. 2019.
code analysis, manual inspection, and statistical analysis. Our study Thieves in the Browser: Web-based Cryptojacking in the Wild. In ARES.
[26] Shravan Narayan, Craig Disselkoen, Tal Garfinkel, Nathan Froyd, Eric Rahm,
shows that WebAssembly has grown into a diverse ecosystem with Sorin Lerner, Hovav Shacham, and Deian Stefan. 2020. Retrofitting Fine Grain
new challenges and opportunities for security researchers and prac- Isolation in the Firefox Renderer. In USENIX Security.
titioners, e.g., in analyzing vulnerabilities in WebAssembly binaries, [27] Nick Nikiforakis, Luca Invernizzi, Alexandros Kapravelos, Steven Van Acker,
Wouter Joosen, Christopher Kruegel, Frank Piessens, and Giovanni Vigna. 2012.
in hardening binaries against exploitation, and in helping security You are what you include: large-scale evaluation of remote JavaScript inclusions.
analysts reverse engineer binaries. We make the binaries underlying In CCS. 736–747.
[28] Victor Le Pochat, Tom Van Goethem, Samaneh Tajalizadehkhoob, Maciej Kor-
our study, which yields by far the largest benchmark of WebAssem- czyński, and Wouter Joosen. 2018. Tranco: A research-oriented top sites ranking
bly binaries to date, available to support future work. hardened against manipulation. arXiv preprint arXiv:1802.01156 (2018).

ACKNOWLEDGMENTS
This work was supported by the European Research Council (ERC,
grant agreement 851895), and by the German Research Foundation
within the ConcSys and Perf4JS projects.
[29] Michael Pradel and Koushik Sen. 2015. The Good, the Bad, and the Ugly: An Empirical Study of Implicit Type Conversions in JavaScript. In ECOOP.
[30] J. Protzenko, B. Beurdouche, D. Merigoux, and K. Bhargavan. 2019. Formally Verified Cryptographic Web Applications in WebAssembly. In SP.
[31] Gregor Richards, Andreas Gal, Brendan Eich, and Jan Vitek. 2011. Automated construction of JavaScript benchmarks. In OOPSLA. 677–694.
[32] Andreas Rossberg. 2019. Multiple per-module memories for Wasm.
[33] Andreas Rossberg. 2019. Proposal for adding basic reference types.
[34] Jan Rüth, Torsten Zimmermann, Konrad Wolsing, and Oliver Hohlfeld. 2018. Digging into Browser-based Crypto Mining. In IMC.
[35] Marija Selakovic and Michael Pradel. 2016. Performance Issues and Optimizations in JavaScript: An Empirical Study. In ICSE.
[36] Natalie Silvanovich. 2018. The Problems and Promise of WebAssembly.
[37] Philippe Skolka, Cristian-Alexandru Staicu, and Michael Pradel. 2019. Anything to Hide? Studying Minified and Obfuscated Code in the Web. In WWW.
[38] Sooel Son and Vitaly Shmatikov. 2013. The Postman Always Rings Twice: Attacking and Defending postMessage in HTML5 Websites. In NDSS.
[39] Cristian-Alexandru Staicu and Michael Pradel. 2018. Freezing the Web: A Study of ReDoS Vulnerabilities in JavaScript-based Web Servers. In USENIX Sec.
[40] Aron Szanto, Timothy Tamm, and Artidoro Pagnoni. 2018. Taint Tracking for WebAssembly. https://arxiv.org/abs/1807.08349. arXiv:1807.08349 [cs.CR]
[41] Laszlo Szekeres, Mathias Payer, Tao Wei, and Dawn Song. 2013. SoK: Eternal War in Memory. In 2013 IEEE Symposium on Security and Privacy (SP 2013).
[42] Said Varlioglu, Bilal Gonen, Murat Ozer, and Mehmet Bastug. 2020. Is Cryptojacking Dead after Coinhive Shutdown? In ICICT. IEEE, 385–389.
[43] Wenhao Wang, Benjamin Ferrell, Xiaoyang Xu, Kevin W Hamlen, and Shuang Hao. 2018. Seismic: Secure in-lined script monitors for interrupting cryptojacks. In European Symposium on Research in Computer Security. Springer, 122–142.
[44] Conrad Watt. 2018. Mechanising and verifying the WebAssembly specification. In CPP 2018. 53–65.
[45] Conrad Watt, John Renner, Natalie Popescu, Sunjay Cauligi, and Deian Stefan. 2019. CT-Wasm: Type-Driven Secure Cryptography for the Web Ecosystem. POPL (2019).
[46] Hui Xu, Zhuangbin Chen, Mingshen Sun, and Yangfan Zhou. 2020. Memory-Safety Challenge Considered Solved? An Empirical Study with All Rust CVEs. arXiv preprint arXiv:2003.03296 (2020).
