arXiv:1911.00621v1 [cs.CR] 2 Nov 2019

Abstract—Fuzzing technologies have evolved at a fast pace in recent years, revealing bugs in programs with ever increasing depth and speed. Applications working with complex formats are however more difficult to take on, as inputs need to meet certain format-specific characteristics to get through the initial parsing stage and reach deeper behaviors of the program.

Unlike prior proposals based on manually written format specifications, in this paper we present a technique to automatically generate and mutate inputs for unknown chunk-based binary formats. We propose a technique to identify dependencies between input bytes and comparison instructions, and later use them to assign tags that characterize the processing logic of the program. Tags become the building block for structure-aware mutations involving chunks and fields of the input.

We show that our technique performs comparably to structure-aware fuzzing proposals that require human assistance. Our prototype implementation WEIZZ revealed 11 unknown bugs in widely used programs.

1. Introduction

Recent years have witnessed a spike of activity in the development of efficient techniques for fuzz testing, also known as fuzzing, from both academic and industrial actors. In particular, the coverage-based grey-box fuzzing (CGF) approach has proven to be very effective for finding bugs often indicative of weaknesses from a security standpoint. The availability of the AFL fuzzing framework [1] paved the way to a large body of works proposing not only more efficient implementations, but also new techniques to deal with common fuzzing roadblocks represented by magic numbers and checksum tests in programs. Nonetheless, there are still several popular application scenarios that make hard cases even for the most advanced CGF fuzzers.

CGF fuzzers operate by mutating inputs at the granularity of their bit and byte representations, deeming mutated inputs interesting when they lead execution to explore new program portions. While this approach works very well for compact and unstructured inputs [2], it can lose efficacy when dealing with highly structured inputs that must conform to some grammar or other type of specification.

Intuitively, "undisciplined" mutations make a fuzzer spend important time generating many inputs that the initial parsing stages of such a program typically reject, resulting in little to no code coverage improvement. Researchers thus have added a user-supplied specification to the picture to produce and prioritize meaningful inputs: enhanced CGF embodiments of this kind are available for both grammar-based [2], [3] and chunk-based [4] formats.

The shortcomings of this approach are readily apparent. Applications typically do not come with format specifications suitable to this end. Asking users to write one is at odds with a driving factor behind the success of CGF fuzzers, that is, they work with a minimum amount of prior knowledge [5]. Such a request can be expensive, and hardly applies to security contexts where users deal with proprietary or undocumented formats [6].

The second limitation is that testing only inputs that perfectly adhere to the specification would miss imprecisions in the implementation, while inputs that are to some degree outside it may instead exercise them [6]. In our experience such bugs are present also in applications that operate on multiple input formats and share code for their processing.

Recently Blazytko et al. [6] have shown in the context of grammar-based formats how to extend the grey-box fuzzing process to resemble the workings of a grammar-based solution without requiring the user to supply one. In a similar vein, in our work we explore automatic fuzzing of chunk-based structures found in many binary formats [4], [7].

Our approach. The main feature of our proposal can be summarized as: we attempt to learn what the input structure may look like based on how the program handles its bytes.

Nuances of this idea are present in [6], where code coverage guides grammar structure inference, and to some extent also in a few general-purpose fuzzing additions. For instance, taint tracking can reveal which input bytes take part in comparisons against magic sequences [8], [9]. Similarly, input-to-state correspondence [5] can also identify checksum fields using values observed as comparison operands.

We build on the intuition that among comparisons operating on input-derived data, we can deem some as tightly coupled to a single portion of the input structure. We also experimentally observed that the order in which a program exercises these checks can reveal structural details such as the location for the type specifier of a chunk. We tag input bytes with the most representative comparison operations for their processing, and devise techniques to infer plausible boundaries for chunks and fields. Thanks to this automatic pipeline we can apply structure-aware mutations for chunk-based grey-box fuzzing [4] without the need for a user-supplied format specification.

We start by determining candidate instructions that we can consider relevant with respect to an input byte. Instead of resorting to taint tracking, we flip bits in every input byte, execute the program, and build a dependency vector
for operands at comparison instruction sites. We analyze dependencies to identify input-to-state relationships and roadblocks, and to assign tags to input bytes. Tag assignment builds on spatial and temporal properties: we leverage the fact that a program typically uses distinct instructions in the form of comparisons to parse distinct items; to break ties we prioritize older instructions, since format validation normally happens early in the execution. Tags drive the inference process of an approximate chunk-based structure of the input, enabling subsequent structure-aware mutations.

Unlike scenarios for reconstructing specifications of input formats [7], [10], [11], experimental results suggest that, as observed for grammar-based formats [6], this inference process does not have to be fully accurate: fuzzing can deal with some amount of noise and imprecision. Another important consideration is that a specification does not prescribe what its implementation should look like. Developers can share code among functionalities, introducing subtle bugs, but also devise optimized variants of one functionality, for instance specialized on value ranges of some relevant input field. As we assign tags and attempt inference over one input instance at a time, we have a chance to identify and exercise such patterns when aiming for code coverage improvements.

The contributions of this work can be summarized as:
• a dependency identification technique embedded in the deterministic mutation stage of grey-box fuzzing;
• a tag assignment mechanism that uses dependencies to overcome roadblocks and to back a structure inference scheme for chunk-based formats;
• a prototype implementation of the approach on top of QEMU and AFL that we call WEIZZ;
• an experimental investigation where WEIZZ beats or matches a chunk-based CGF proposal that accesses a specification, and outperforms general-purpose fuzzers over the applications we considered;
• a discussion of some real-world bugs we discovered.

To facilitate analyses and extensions of our approach, we will make WEIZZ available as open source.

2. State of the Art

The last years have seen a large body of fuzzing-related works [12]. Grey-box fuzzers have gradually replaced initial black-box fuzzing proposals, where mutation decisions happen without taking into account their impact on the execution path [13]. Coverage-based grey-box fuzzing (CGF) uses lightweight instrumentation to measure code coverage, which discriminates whether mutated inputs are sufficiently "interesting" by looking at control-flow changes.

CGF is very effective in discovering bugs in real software [4], yet it may struggle in the presence of roadblocks (like checksums and magic numbers) [5], or when mutating structured grammar-based [6] or chunk-based [4] input formats. In the following we describe the workings of the popular CGF engine that WEIZZ builds upon, and recent proposals that cope with the challenges listed above. We conclude by discussing where WEIZZ fits in this landscape.

2.1. Coverage-based Fuzzing

American Fuzzy Lop (AFL) [1] is a very well-known CGF fuzzer, and several research works have built on it to improve its effectiveness. To build new inputs, AFL draws from a queue of initial user-supplied seeds and previously generated inputs, and runs two mutation stages. Both the deterministic and the havoc nondeterministic stage look for coverage improvements and crashes. AFL adds to the queue inputs that improve the coverage of the program, and reports crashing inputs to users as proofs for bugs.

Coverage. AFL adds instrumentation to a binary program to intercept when a branch is hit during the execution. To efficiently track hit counts, it uses a coverage map of 2^16 entries and associates a counter with the hash of the branch, built using its source and destination basic block addresses. AFL considers an input interesting when the execution path reaches a branch that yields a previously unseen hit count. To limit the number of monitored inputs and discard likely similar executions, hit counts undergo a normalization to power-of-two buckets before lookup.

Mutations. In the deterministic stage AFL sequentially scans the input, applying a set of mutations at each position. These include bit or byte flipping, arithmetic increments and decrements, and substitution with common constants (e.g., 0, -1, MAX_INT) or values from a user-supplied dictionary. AFL tests each mutation in isolation by executing the program on the derived input and inspecting the coverage, then it reverts the change and moves to the next transformation.

In the havoc stage AFL applies a nondeterministic sequence (or stack) of mutations before attempting execution. Mutations come in a random number (1 to 256) and consist in flips, increments, decrements, deletions, etc. at a random position in the input. AFL attempts alternate mutation sequences on the currently selected input: a power scheduler chooses the number of inputs to generate at this stage based on the program running time and the size of the coverage trace generated over the single input.

2.2. Roadblocks

In the fuzzing jargon, roadblocks are comparison patterns over the input that are intrinsically hard to overcome through blind mutations: the two most common embodiments are magic numbers, often found for instance in header fields, and checksums, typically used to verify data integrity. In the Appendix we report examples of both from the chunk parsing code of a PNG processing software (Section A.1).

Format-specific dictionaries may help with magic numbers, yet the fuzzer has to figure out where to place such values. Researchers thus came up with a few approaches to handle magic numbers and checksums in grey-box fuzzers.

Sub-instruction Profiling. While there is no obvious way to figure out how a large amount of logic can be encoded in a single comparison, one could break multi-byte comparisons into single-byte [14] (or even single-bit [15]) checks to better track progress when attempting to match constant values. LAF-INTEL [14] and COMPARECOVERAGE [16] are compiler extensions to produce binaries for
such fuzzing. HONGGFUZZ [17] implements this technique for fuzzing programs when the source is available, while AFL++ [18] can automatically transform binaries during fuzzing. STEELIX [19] resorts to static analysis to filter out likely uninteresting comparisons from profiling. Overall, sub-instruction profiling is valuable to overcome tests against magic numbers, but ineffective with checksums.

Taint Analysis. Dynamic taint analysis (DTA) [20] can track when and which input parts affect program instructions. VUZZER [8] uses DTA on binary programs to bypass branches that check magic numbers, identified as comparison instructions where one operand is input-dependent and the other is constant: VUZZER places the constant value in the input portion that propagates directly to the former operand. ANGORA [9] brings two technical improvements: it can identify magic numbers that are not contiguous in the input thanks to a multi-byte form of DTA, and uses gradient descent to mutate tainted input bytes efficiently. It however requires compiler transformations to enable such mutations.

Symbolic Execution. While effective for magic bytes, DTA techniques can at best identify checksum tests, but not provide sufficient information to address them. For this reason several proposals (e.g., [21], [22], [23], [24]) use symbolic execution [25] to identify and try to solve complex input constraints involved in generic roadblocks. TAINTSCOPE [23] proposes a methodology to identify possible checksums with DTA, patch them away for the sake of fuzzing, and later try to repair inputs with symbolic execution. The technique requires the user to provide specific seeds and is subject to false positives [21], [24].

Yet input constraints may be just too hard or long to solve, leading to timeouts without any actual progress. T-FUZZ [24] divides checks into two categories: non-critical, when they involve magic numbers, and critical, when they describe checksums, suggesting symbolic execution to deal with the first kind, and manual analysis for the second. DRILLER [21] follows a different design, using symbolic execution as an alternate strategy to explore inputs, switching to it when fuzzing exhausts a time budget obtaining no coverage improvement. QSYM [22] adopts a similar scheme, choosing concolic execution implemented via dynamic binary instrumentation [26] to trade exhaustiveness for speed.

Approximate Analyses. A recent trend is to explore solutions that approximately extract the information that DTA or symbolic execution can bring without incurring their overheads. REDQUEEN [5] builds on the observation that often input bytes flow directly, or after simple encodings (e.g., swaps for endianness), into instruction operands. The authors show how to use this input-to-state correspondence instead of DTA and symbolic execution to deal with magic bytes, multi-byte compares, and checksums. REDQUEEN in particular approximates DTA by changing input bytes with random values (colorization) to see which comparison operands change, suggesting a dependency. When both operands of a comparison change, but only for one there is an input-to-state relationship, REDQUEEN deems it a likely checksum test. It then patches the operation, and later attempts to repair the input with educated guesses consisting in mutations of the observed comparison operand values. A topological sort of patches addresses nested checksums.

In the context of concolic fuzzers, ECLIPSER [27] relaxes the path constraints over each input byte. It runs a program multiple times, mutating individual input bytes to identify the affected branches. Then it collects constraints resulting from them, considering however only linear and monotone relationships, as other constraint types would likely require a full-fledged SMT solver. ECLIPSER then selects one of the branches and flips its constraints to generate a new input, mimicking dynamic symbolic execution [25].

2.3. Format-aware Fuzzing

Classic CGF techniques lose part of their efficacy when dealing with structured input formats found in files. As mutations happen on bit-level representations of inputs, they can hardly bring the structural changes required to explore new compartments of the data processing logic of an application. Bringing format awareness into play can boost CGF in this setting. In the literature we can distinguish techniques targeting either grammar-based formats, where inputs comply with a language grammar, or chunk-based ones, where inputs follow a tree organization with data chunks resembling a C structure to form individual nodes. The latter category seems prevalent in real-world software [4].

Grammar-based Fuzzing. LANGFUZZ [28] is a black-box fuzzer that, using a Javascript grammar specification, generates valid inputs for a language interpreter by combining sample code fragments (taken, e.g., from bug reports) and test cases. NAUTILUS [3] and SUPERION [2] are recent grey-box fuzzer proposals that can effectively test language interpreters without requiring a large corpus of valid inputs or fragments, but only an ANTLR grammar file.

GRIMOIRE [6] removes the grammar specification requirement. Based on REDQUEEN, it identifies fragments from an initial set of inputs that trigger new coverage, and strips them of parts that would not cause a coverage loss. GRIMOIRE notes information for such gaps, and later attempts to recursively splice in parts seen in other positions, thus mimicking grammar combinations.

Chunk-based Fuzzing. In the context of black-box techniques, SPIKE [29] lets users describe the network protocol in use for an application to improve the chance of finding bugs. PEACH [30] generalizes this idea, applying format-aware mutations on an initial set of valid inputs using a user-defined input specification dubbed peach pit. As they are input-aware, some literature calls such black-box fuzzers smart [4]. One could argue that a smart grey-box variant would outperform them, as due to the lack of feedback (as in explored code) they do not keep mutating interesting inputs.

AFLSMART [4] validates this speculation by adding smart (or high-order) mutations to AFL that add, delete, and splice chunks in an input. Using a peach pit, AFLSMART maintains a virtual structure of the current input, represented conceptually by a tree whose internal nodes are chunks and leaves are attributes. Chunks are characterized by initial and end position in the input, a format-specific
TABLE 1: Comparison with related approaches.

Fuzzer      Binary  Magic bytes  Checksums  Chunk-based  Grammar-based  Automatic
AFL++         ✓         ✓            ✗          ✗             ✗             ✓
ANGORA        ✗         ✓            ✗          ✗             ✗             ✓
ECLIPSER      ✓         ✓            ✗          ✗             ✗             ✓
REDQUEEN      ✓         ✓            ✓          ✗             ✗             ✓
STEELIX       ✓         ✓            ✗          ✗             ✗             ✓
NAUTILUS      ✗         ✓            ✗          ✗             ✓             ✗
SUPERION      ✓         ✓            ✗          ✗             ✓             ✗
GRIMOIRE      ✓         ✓            ✓          ✗             ✓             ✓
AFLSMART      ✓         ✓            ✗          ✓             ✗             ✗
WEIZZ         ✓         ✓            ✓          ✓             ✗             ✓
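The virtual chunk structure and the smart mutations described above for AFLSMART can be sketched as follows. This is our own illustration, not AFLSMART's actual code; names such as Chunk and ctype are hypothetical.

```python
# A minimal sketch of a "virtual structure" for chunk-based smart mutations:
# a tree of chunks, each with start/end offsets in the input and a type.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Chunk:
    start: int   # inclusive offset of the chunk in the input
    end: int     # exclusive offset of the chunk in the input
    ctype: str   # format-specific chunk type (hypothetical field name)
    children: List["Chunk"] = field(default_factory=list)

def delete_chunk(data: bytearray, chunk: Chunk) -> bytearray:
    """Smart 'delete' mutation: remove a whole chunk from the input."""
    return data[:chunk.start] + data[chunk.end:]

def splice_chunk(data: bytearray, dst: Chunk, donor: bytes) -> bytearray:
    """Smart 'splice' mutation: replace a chunk's bytes with a donor chunk."""
    return data[:dst.start] + donor + data[dst.end:]
```

A structure-aware deletion removes a whole subtree worth of bytes at once, a change that byte-level havoc mutations are unlikely to produce in one step.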
3.1.3. Tag Placement. WEIZZ succinctly describes dependency information between input bytes and performed comparisons by annotating such bytes with tags. Later steps like input repairing for checksums and structural information inference build heavily on the availability of such tags.

For the b-th input byte, structure Tags[b] stores as fields:
• id: (the address of) the comparison instruction chosen as most significant among those b influences;
• ts: the timestamp of the id instruction when first met;
• parent: the comparison instruction that led WEIZZ to the most recent tag assignment prior to Tags[b];
• dependsOn: when b contains a checksum value, the comparison instruction for the innermost nested checksum that verifies the integrity of b, if any;
• flags: stores the operand affected by b, if it is I2S, or if it is part of a checksum field;
• numDeps: the number of input bytes that the operand in flags depends on.

The tag assignment process relies on spatial and temporal locality in how comparison instructions get executed. WEIZZ tries to infer structural properties of the input based on the intuition that a program typically uses distinct instructions to process structurally distinct items. We can thus tag input bytes as related when they reach one same comparison directly or through derived data. When more candidates are found, we use temporal information to prioritize instructions with a lower timestamp, as input format validation normally takes place in parsing code from the early stages of the execution. As we elaborate later, we extend this scheme to account for checksums and to reassign tags when heuristics spot a likely better candidate than the current choice.

Procedure PLACETAGS (Algorithm 5 in the Appendix) iterates over comparison sites sorted by when first met in

[Figure: tag assignment example over six consecutive input bytes, with id row B A A A A D, ts row 1 5 5 5 5 16, and start/end markers delimiting the inferred field.]

function FIELDMUTATION(Tags, I):
1  b ← pick u.a.r. from {0 ... len(I)}
2  for i ∈ {b ... len(I)-1} do
3    if Tags[i].id == 0 then continue
4    start ← GOLEFTWHILESAMETAG(Tags, i)
5    with probability Pr_I2S, or if !I2S(Tags[start]) then
6      end ← FINDFIELDEND(Tags, start, 0)
7      I ← MUTATE(I, start, end); break
8  return I
Algorithm 3: Field identification and mutation.

3.1.4. Checksum Validation and Input Repair. The last conceptual step of the surgical stage involves checksum validation and input repair with respect to checksums patched in the program during previous iterations. In the process WEIZZ also discards false positives among such checksums.

Procedure FIXCHECKSUMS (Algorithm 6 in the Appendix) filters tags marked as involved in checksum fields for which a patch is active, and processes them in the order described in the previous section. WEIZZ initially executes the patched program, obtaining a comparison table CT and the hash h of the coverage map as execution path footprint. Then it iterates over each filtered tag as follows. It first extracts the input-derived operand value of the comparison instruction (i.e., the checksum computed by the program) and where in the input the contents of the other operand (which is I2S, see Section 3.1.2) are located, then it replaces the input bytes with the expected value (lines 4-5). WEIZZ makes a new instrumented execution over the repaired input, disabling the patch. If it handled the checksum correctly, the resulting hash h′ will be identical to h, implying no execution divergence happened; otherwise the checksum is a false positive and we have to bail out.

After processing all the tags, we may still have patches left in the program for checksums not reflected by tags. WEIZZ may in fact erroneously treat a comparison instruction as a checksum due to inaccurate dependency information. WEIZZ removes all such patches, and if after that the footprint for the execution does not change, the input repair process succeeded. This happens typically with patches for branch outcomes that turn out to be already "implied" by the repaired tag-dependent checksums.

Otherwise WEIZZ tries to determine which patch not reflected by tags may individually cause a divergence: it tries patches one at a time and compares footprints to spot offending ones. This happens when WEIZZ initially finds a checksum for an input and the analysis of a different input reveals an overfitting, so we mark it as a false positive.

At the end of the process WEIZZ re-applies patches that were not a false positive: this will benefit both the second stage and future surgical iterations over other inputs. WEIZZ then also applies patches for checksums newly discovered by MARKCHECKSUMS in the current surgical stage, so when WEIZZ analyzes the input again (or similar ones from FUZZOPERAND) it will be able to overcome them.
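The field-boundary logic behind Algorithm 3 can be sketched as below. This is our own minimal model, with helper names mirroring GoLeftWhileSameTag and FindFieldEnd; the Tag class is a simplification of the Tags[b] structure described above.

```python
# Sketch: bytes tagged with the same comparison id are treated as one field,
# so field boundaries fall where the id changes between adjacent bytes.
from dataclasses import dataclass

@dataclass
class Tag:
    id: int   # comparison instruction chosen for this byte (0 = untagged)
    ts: int   # timestamp when that comparison was first met

def go_left_while_same_tag(tags, i):
    """Walk left from position i while the tag id stays the same."""
    while i > 0 and tags[i - 1].id == tags[i].id:
        i -= 1
    return i

def find_field_end(tags, start):
    """Walk right from start until the tag id changes (inclusive end)."""
    end = start
    while end + 1 < len(tags) and tags[end + 1].id == tags[start].id:
        end += 1
    return end
```

For instance, with ids [7, 7, 9, 9, 9, 3] a position inside the run of 9s resolves to the field spanning bytes 2 through 4, which a structure-aware mutation can then rewrite as a unit.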
We measured the code coverage achieved by: (a) AFLSMART with stacked mutations but without deterministic stages, as suggested by its documentation, (b) WEIZZ in its standard configuration, and (c) a variant WEIZZ† where we disable FUZZOPERANDS, checksum patching, and input repairing. WEIZZ† allows us to focus on the effects of tag-based mutations, without benefits from our roadblock bypassing techniques. We use AFL test cases as seeds, except for wavpack and ffmpeg, where we use minimal syntactically valid files [33].

Figure 5: Basic block coverage over time (5 hours). [Plot residue omitted; panels include decompress, djpeg, libpng, and ffmpeg, with series for WEIZZ, AFLSMART, and WEIZZ†.]

Figure 5 plots the median basic block coverage after 5 hours. Compared to AFLSMART, WEIZZ brings appreciably higher coverage on 3 out of 6 subjects (readelf, libpng, ffmpeg), slightly higher on wavpack, and comparable on decompress and djpeg. In discussing these results, we will first consider where WEIZZ† lies, and then discuss the benefits from the additional features of WEIZZ.

AFLSMART mutations are driven by a format specification, while we rely on how a program handles the input bytes. Hence, WEIZZ† can reveal behaviors that characterize the actual implementation and that may not necessarily be anticipated by the specification. The first implication is that WEIZZ† mutates only portions that the program has already processed in the execution, as tags derive from executed comparisons. The second is that imprecision in the inference of structural facts may actually benefit WEIZZ†. The authors of AFLSMART acknowledge the importance of relaxed specifications to expose imprecise implementations [4]: we will return to this in Section 6.

When we consider the techniques disabled in WEIZZ†, WEIZZ brings higher coverage for two reasons. I2S facts help WEIZZ replace magic bytes only in the right spots, unlike dictionaries, which can also be incomplete. This turns out important also in multi-format applications like ffmpeg. Then, checksum bypassing is crucial to generate valid inputs for programs that check data integrity, such as libpng, which computes CRC-32 values over data.

Table 3 compares WEIZZ with the best alternative for code coverage at the end of the experiment. Interestingly, WEIZZ† is a better alternative than AFLSMART for 3 subjects. We also consider the collected crashes, found only for wavpack and ffmpeg. In the first case, the three fuzzers found the same bugs. For ffmpeg, WEIZZ found a bug from a division by zero in a code section handling multiple formats: we reported the bug and its developers promptly fixed it. The I2S-related features of WEIZZ turned out very effective to generate inputs that deviate significantly from the initial seed, e.g., mutating an AVI file into an MPEG-4 one. In Section 5.2 we will back this claim with a few case studies.

5.1.2. Grey-box Fuzzers. For this experiment we analyze the bottom 8 programs from Table 2. These are popular open-source applications handling chunk-based formats that the community has heavily tested (e.g., by the OSS-Fuzz project for continuous fuzzing [34]), and were used in past evaluations of state-of-the-art fuzzers like [4], [5], [8], [19].

We tested the following fuzzers: (a) AFL 2.53b in QEMU mode, (b) AFL++ 2.54c in QEMU mode enabling the CompareCoverage feature, and (c) ECLIPSER with its latest release when we ran our tests (commit 8a00591) and its default configuration. As WEIZZ targets binary programs, we rule out fuzzers like ANGORA that require source instrumentation.
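The checksum repair step can be illustrated with PNG, whose chunks carry a CRC-32 computed over the chunk type and data (libpng is the case mentioned above). The sketch below is ours, not WEIZZ's code: it shows the effect of repairing a stale checksum field, not how WEIZZ locates it.

```python
# Sketch: recompute a PNG-style chunk CRC-32 and write the expected value
# back into the input, so that integrity validation no longer rejects it.
import struct
import zlib

def repair_png_chunk_crc(data: bytearray, chunk_off: int) -> bytearray:
    """chunk_off points at the 4-byte big-endian length field of a chunk."""
    length = struct.unpack_from(">I", data, chunk_off)[0]
    # The CRC covers the 4-byte type plus the chunk data.
    body = bytes(data[chunk_off + 4 : chunk_off + 8 + length])
    crc = zlib.crc32(body) & 0xFFFFFFFF
    # Overwrite the 4-byte CRC field that follows the data.
    struct.pack_into(">I", data, chunk_off + 8 + length, crc)
    return data
```

A mutated chunk body invalidates the stored CRC; rewriting the field with the recomputed value lets the input survive the integrity check and reach deeper code.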
We compare the fuzzers in terms of coverage over time. Consistently with previous evaluations [27], we use as initial seed a string made of the ASCII character "0" repeated 72 times. However, for libpng and tcpdump we fall back to a valid initial input, as no fuzzer (excluding WEIZZ on libpng) achieved significant coverage with artificial seeds. We also provide AFL with format-specific dictionaries to aid it with magic numbers.

[Figure 6: median basic block coverage over time per target (djpeg, libpng, objdump, mpg321, oggdec, readelf, tcpdump, gif2rgb); plot residue omitted.]

Figure 6 plots the median basic block coverage over time. WEIZZ on 5 out of 8 targets (libpng, oggdec, tcpdump, objdump, and readelf) achieves significantly higher code coverage than other fuzzers. The first three in the list above process formats with checksum fields that only WEIZZ repairs, although ECLIPSER appears to catch up over time for oggdec. The structural mutation capabilities combined with this factor may explain the gap between WEIZZ and other fuzzers. For objdump and readelf, given how REDQUEEN outperforms other fuzzers on them in its evaluation [5], we may speculate that I2S facts are boosting the work of WEIZZ.

On mpg321 and gif2rgb, AFL-based fuzzers perform very similarly, with a large margin over ECLIPSER, confirming that the AFL standard mutations can be effective on some chunk-based formats. Finally, WEIZZ leads for djpeg, but the other fuzzers are not too far from it.

TABLE 4: WEIZZ vs. best alternative for targets of Figure 6. Columns have the same meaning used in Table 3.

PROGRAM   BEST ALTERNATIVE   BB % W.R.T. BEST   CRASHES FOUND BY
djpeg     AFL                105%               none
libpng    AFL                180%               none
objdump   AFL++              120%               WEIZZ
mpg321    AFL++              100%               WEIZZ, AFL++
oggdec    ECLIPSER           108%               none
readelf   AFL++              175%               none
tcpdump   ECLIPSER           117%               none
gif2rgb   AFL                100%               none

Table 4 compares WEIZZ with the best alternative when the budget expires. Interestingly, AFL is the best alternative for 3 programs. When taking crashes into account, WEIZZ and AFL++ generated the same crashing inputs for mpg321, while only WEIZZ revealed a crash for objdump.

5.2. RQ2: Zero-day Bugs
[10] M. Höschele and A. Zeller, “Mining input grammars from [26] D. C. D’Elia, E. Coppa, S. Nicchi, F. Palmaro, and L. Cavallaro,
dynamic taints,” in Proceedings of the 31st IEEE/ACM International “Sok: Using dynamic binary instrumentation for security (and
Conference on Automated Software Engineering, ser. ASE 2016, how you may get caught red handed),” in Proceedings of the
2016, pp. 720–725. [Online]. Available: http://doi.acm.org/10.1145/ 2019 ACM Asia Conference on Computer and Communications
2970276.2970321 Security, ser. Asia CCS ’19, 2019, pp. 15–27. [Online]. Available:
http://doi.acm.org/10.1145/3321705.3329819
[11] M. Höschele and A. Zeller, “Mining input grammars with
[27] J. Choi, J. Jang, C. Han, and S. K. Cha, “Grey-box concolic testing
AUTOGRAM,” in Proceedings of the 39th International Conference
on binary code,” in Proceedings of the 41st International Conference
on Software Engineering Companion, ser. ICSE-C ’17, 2017, pp.
on Software Engineering, ser. ICSE ’19, 2019, pp. 736–747.
31–34. [Online]. Available: https://doi.org/10.1109/ICSE-C.2017.14
[Online]. Available: https://doi.org/10.1109/ICSE.2019.00082
[12] A. Zeller, R. Gopinath, M. Böhme, G. Fraser, and C. Holler, “The [28] C. Holler, K. Herzig, and A. Zeller, “Fuzzing with code fragments,”
Fuzzing Book,” https://www.fuzzingbook.org/, 2019, [Online; ac- in Proceedings of the 21st USENIX Conference on Security
cessed 10-Sep-2019]. Symposium, ser. SEC’12, 2012, pp. 38–38. [Online]. Available:
[13] A. Takanen, J. D. Demott, and C. Miller, Fuzzing for Software http://dl.acm.org/citation.cfm?id=2362793.2362831
Security Testing and Quality Assurance, 2nd ed. Norwood, MA, [29] D. Aitel, “The Advantages of Block-Based Protocol Analy-
USA: Artech House, Inc., 2018. sis for Security Testing,” https://www.immunitysec.com/downloads/
[14] “Circumventing Fuzzing Roadblocks with Compiler advantages_of_block_based_analysis.html, 2002, [Online; accessed
Transformations,” https://lafintel.wordpress.com/2016/08/15/ 10-Sep-2019].
circumventing-fuzzing-roadblocks-with-compiler-transformations/, [30] M. Eddington, “Peach fuzzing platform,” http://community.
2016, [Online; accessed 10-Sep-2019]. peachfuzzer.com/WhatIsPeach.html, [Online; accessed 10-Sep-
2019].
[15] T. Ormandy, “Making Software Dumberer,” http://taviso.decsystem.
org/making_software_dumber.pdf, 2009, [Online; accessed 10-Sep- [31] “libprotobuf-mutator,” https://github.com/google/libprotobuf-mutator,
2019]. [Online; accessed 10-Sep-2019].
[32] D. C. D’Elia, C. Demetrescu, and I. Finocchi, “Mining hot calling contexts in small space,” Software: Practice and Experience, vol. 46, pp. 1131–1152, 2016. [Online]. Available: https://doi.org/10.1002/spe.2348
[33] M. Bynens, “Smallest possible syntactically valid files of different types,” https://github.com/mathiasbynens/small, 2019, [Online; accessed 10-Sep-2019].
[34] “Google OSS-Fuzz: continuous fuzzing of open source software,” https://github.com/google/oss-fuzz, 2019, [Online; accessed 10-Sep-2019].
[35] Blackngel, “MALLOC DES-MALEFICARUM,” http://phrack.org/issues/66/10.html, 2019, [Online; accessed 10-Sep-2019].
[36] B. Yadegari and S. Debray, “Bit-level taint analysis,” in 2014 IEEE 14th International Working Conference on Source Code Analysis and Manipulation (SCAM), 2014, pp. 255–264. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/SCAM.2014.43
[37] B. Yadegari, B. Johannesmeyer, B. Whitely, and S. Debray, “A generic approach to automatic deobfuscation of executable code,” in 2015 IEEE Symposium on Security and Privacy (SP), 2015, pp. 674–691. [Online]. Available: http://dx.doi.org/10.1109/SP.2015.47

A. Appendix

A.1 Roadblocks from lodepng

As an example of roadblocks found in real-world software, the following excerpt from the lodepng image encoder and decoder looks for the presence of the magic string IHDR denoting the namesake PNG chunk type and checks the integrity of its data against a CRC-32 checksum:

    if (!lodepng_chunk_type_equals(in + 8, "IHDR")) {
      /* error: it doesn't start with an IHDR chunk! */
      CERROR_RETURN_ERROR(state->error, 29);
    }
    ...
    unsigned CRC = lodepng_read32bitInt(&in[29]);
    unsigned checksum = lodepng_crc32(&in[12], 17);
    if (CRC != checksum) {
      /* invalid CRC */
      CERROR_RETURN_ERROR(state->error, 57);
    }

A.2 Additional Algorithms

Tag Placement. Algorithm 5 provides pseudocode for the PLACETAGS procedure, while we refer the reader to Section 3.1.3 for a discussion of its main steps. An interesting detail presented here is related to the condition n′ > 4 when computing the reassign flag. This results from experimental observations, with 4 being the most common field size in the formats we analyzed. Hence, WEIZZ will choose as characterizing instruction for a byte the first comparison that depends on no more than 4 bytes, ruling out possibly “shorter” later comparisons as they may be accidental.

function PLACETAGS(Tags, Deps, s, CT, CI, R, o, I):
 1  for b ∈ {0 ... len(I)-1}, op ∈ {op1, op2} do
 2    if ¬ ⋁_j Deps(s, j, op)[b] then continue
 3    n ← Σ_k (1 if ⋁_j Deps(s, j, op)[k] else 0)
 4    n′ ← Tags[b].numDeps
 5    reassign ← (¬ISCKSM(Tags[b]) ∧ n′ > 4 ∧ n < n′)
 6    if b ∉ Tags ∨ ISVALIDCKSM(CI, s) ∨ reassign then
 7      Tags[b] ← SETTAG(Tags[b], CT, s, op, CI, R, n, o)
 8  return Tags

Algorithm 5: PLACETAGS procedure.

function FIXCHECKSUMS(CI, I, Tags, o):
 1  Tags_ck ← FILTERANDSORTTAGS(Tags, o)
 2  CT, h ← RUNINSTR_ck(I)
 3  for k ∈ {0 ... |Tags_ck|-1} do
 4    v, idx ← GETCMPOPOFTAG(CT, Tags_ck[k], CI, R)
 5    I ← FIXINPUT(I, v, idx)
 6    UNPATCHPROGRAM(Tags_ck[k], CI)
 7    CT, h′ ← RUNINSTR_ck(I)
 8    if h ≠ h′ then
 9      CI ← MARKCKSMFP(CI, Tags_ck[k].id)
10      return CI, I, false
11  UNPATCHUNTAGGEDCHECKSUMS(CI, Tags_ck)
12  h′ ← RUNLIGHTINSTR_ck(I)
13  if h == h′ then return CI, I, true
14  foreach a ∈ dom(CI) | ∄ t: Tags_ck[t].id == a do
15    PATCHPROGRAM(C, a)
16    h″ ← RUNLIGHTINSTR_ck(I)
17    if h″ ≠ h′ then CI ← MARKCKSMFP(CI, a)
18    UNPATCHPROGRAM(C, a)
19  return CI, I, false

Algorithm 6: FIXCHECKSUMS procedure.

function FINDFIELDEND(Tags, k, depth):
 1  end ← GORIGHTWHILESAMETAG(Tags, k)
 2  if depth < 8 ∧ Tags[end+1].ts == Tags[k].ts+1 then
 3    end ← FINDFIELDEND(Tags, end+1, depth+1)
 4  return end

Algorithm 7: FINDFIELDEND procedure.

function GETRANDOMCHUNK(Tags):
 1  with probability Pr_chunk12 do
 2    k ← RAND(len(Tags))
 3    start ← GOLEFTWHILESAMETAG(Tags, k)
 4    end ← FINDCHUNKEND(Tags, k, 0)
 5    return (start, end)
 6  else
 7    id ← pick u.a.r. from ⋃_i {Tags[i].id}
 8    start ← 0, L ← ∅
 9    while start < |Tags| do
10      start ← SEARCHTAGRIGHT(Tags, start, id)
11      end ← FINDCHUNKEND(Tags, start)
12      L ← L ∪ {(start, end)}, start ← end + 1
13    return l ∈ L chosen u.a.r.

Algorithm 8: GETRANDOMCHUNK procedure.

Field Identification. Algorithm 7 shows how to implement the FINDFIELDEND procedure that we use to detect fields complying with one of the patterns from Figure 3, which we discussed in Section 3.2.1. To limit the recursion at line 3 for field extension, we use 8 as maximum depth (line 2).
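To make the recursion concrete, the field-extension pattern of FINDFIELDEND can be sketched in Python as follows. The Tag record (with a ts timestamp and a comparison id) and the run-skipping helper are simplified stand-ins of our own, not the actual WEIZZ data structures:

```python
from dataclasses import dataclass

MAX_DEPTH = 8  # depth bound used in Algorithm 7

@dataclass
class Tag:
    ts: int   # timestamp of the characterizing comparison
    id: int   # identifier of the comparison instruction

def go_right_while_same_tag(tags, k):
    # Last index of the run of bytes sharing the tag at position k.
    end = k
    while end + 1 < len(tags) and tags[end + 1].id == tags[k].id:
        end += 1
    return end

def find_field_end(tags, k, depth=0):
    # Extend the field while the next run carries the successor timestamp,
    # up to a maximum recursion depth of 8.
    end = go_right_while_same_tag(tags, k)
    if depth < MAX_DEPTH and end + 1 < len(tags) \
            and tags[end + 1].ts == tags[k].ts + 1:
        end = find_field_end(tags, end + 1, depth + 1)
    return end

# A 2-byte run (id 7) extended by the run tagged with the next timestamp:
tags = [Tag(ts=3, id=7), Tag(ts=3, id=7), Tag(ts=4, id=9), Tag(ts=10, id=2)]
print(find_field_end(tags, 0))  # -> 2
```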
Checksum Handling. In Algorithm 6 we report detailed pseudocode for the FIXCHECKSUMS procedure that we illustrated in its main steps throughout Section 3.1.4.

Chunk Identification. As discussed in Section 3.2.2, in some formats there might be dominant chunk types, and choosing the initial position randomly would reveal a chunk of non-dominant type with a smaller probability. WEIZZ chooses between the original and the alternative heuristic with a probability Pr_chunk12, as described in Algorithm 8.
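A minimal Python sketch of the two-heuristic selection of Algorithm 8, under the simplifying assumption that a chunk is a maximal run of bytes sharing a tag id; the value chosen for Pr_chunk12 below is hypothetical:

```python
import random

PR_CHUNK12 = 0.75  # hypothetical value for the Pr_chunk12 knob

def runs(tag_ids):
    # Maximal runs of identical tag ids; a stand-in for real chunk bounds.
    out, start = [], 0
    for i in range(1, len(tag_ids) + 1):
        if i == len(tag_ids) or tag_ids[i] != tag_ids[start]:
            out.append((start, i - 1, tag_ids[start]))
            start = i
    return out

def get_random_chunk(tag_ids):
    if random.random() < PR_CHUNK12:
        # Original heuristic: random position, expanded to its run;
        # dominant chunk types are picked more often this way.
        k = random.randrange(len(tag_ids))
        for s, e, _ in runs(tag_ids):
            if s <= k <= e:
                return (s, e)
    # Alternative heuristic: pick a tag id u.a.r., then one of its runs
    # u.a.r., giving non-dominant chunk types a fair chance of selection.
    tid = random.choice(sorted(set(tag_ids)))
    return random.choice([(s, e) for s, e, t in runs(tag_ids) if t == tid])
```

With tag_ids = [1, 1, 1, 1, 2, 2], the first heuristic returns the long chunk (0, 3) twice as often as (4, 5), while the second returns each with equal probability.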
A.3 Implementation Tuning

In the following we present additional optimizations that we integrated in our implementation.

Loop Bucketization. In addition to newly encountered edges, AFL deems a path interesting when some branch hit count changes (Section 2.1). To avoid keeping too many inputs, AFL verifies whether the counter falls in a previously unseen power-of-two interval. However, as pointed out by the authors of LAF-INTEL [14], this approach falls short in deeming progress with patterns like the following:

    for (int i = 0; i < strlen(magic); ++i)
        if (magic[i] != input[i]) return;

To mitigate this problem, in the surgical phase WEIZZ considers interesting also inputs that lead to the largest hit count observed for an edge across the entire stage.

Optimizing I2S Fuzzing. Another tuning for the surgical stage involves the FUZZOPERANDS step. WEIZZ decides to skip a comparison site when previous iterations over it were unable to generate any coverage-improving paths. In particular, the probability of executing FUZZOPERANDS on a site is initially one and decreases with the ratio between failed and total attempts.

A.4 Additional Experimental Data

Table 6 and Table 7 provide the 60%-confidence intervals for the experiments reporting the basic block code coverage discussed in Section 5.1.

TABLE 6: Basic block coverage (60% confidence intervals) after 5-hour fuzzing of the top 6 subjects of Table 2.

PROGRAMS     WEIZZ         AFLSMART     WEIZZ†
wavpack      1824-1887     1738-1813    1614-1749
readelf      7298-7370     6087-6188    6586-6731
decompress   5831-6276     6027-6569    5376-5685
djpeg        2109-2137     2214-2221    2121-2169
libpng       1620-1688     1000-1035    1188-1231
ffmpeg       15946-17885   9352-9923    14515-14885

TABLE 7: Basic block coverage (60% confidence intervals) after 24-hour fuzzing of the bottom 8 subjects of Table 2.

PROGRAMS   WEIZZ       ECLIPSER    AFL++       AFL
djpeg      612-614     492-532     561-577     581-592
libpng     1747-1804   704-711     877-901     987-989
objdump    3366-4235   2549-2648   2756-3748   2451-2723
mpg321     428-451     204-204     426-427     204-204
oggdec     369-372     332-346     236-244     211-211
readelf    7428-7603   2542-2871   4265-5424   2982-3091
tcpdump    7662-7833   6591-6720   5033-5453   4471-4576
gif2rgb    453-464     357-407     451-454     457-465
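A small Python sketch of the skip policy described under Optimizing I2S Fuzzing; the counter names and the decay formula prob = 1 - failed/total are our reading of the description above, not code from WEIZZ:

```python
import random

class CmpSiteStats:
    # Per-comparison-site counters (names are our own).
    def __init__(self):
        self.total = 0
        self.failed = 0

    def probability(self):
        # Initially 1.0; decays with the ratio of failed to total attempts.
        if self.total == 0:
            return 1.0
        return 1.0 - self.failed / self.total

    def record(self, improved_coverage):
        # Update counters after one FuzzOperands attempt on this site.
        self.total += 1
        if not improved_coverage:
            self.failed += 1

def should_fuzz_operands(stats, rng=random.random):
    # Skip the site with growing probability as failures accumulate.
    return rng() < stats.probability()
```

Sites whose comparisons never yield coverage-improving paths are thus fuzzed less and less often, while a single success keeps a site alive for further attempts.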