Verifying C11 Programs Operationally: Simon Doherty Brijesh Dongol

Verifying C11 Programs Operationally
Simon Doherty Brijesh Dongol

Department of Computer Science Department of Computer Science
University of Sheffield University of Surrey
s.doherty@sheffield.ac.uk b.dongol@surrey.ac.uk
Heike Wehrheim John Derrick

Department of Computer Science Department of Computer Science
arXiv:1811.09143v1 [cs.PL] 22 Nov 2018
University of Paderborn University of Sheffield

wehrheim@upb.de j.derrick@sheffield.ac.uk
Abstract been a substantial effort to develop an operational seman-

This paper develops an operational semantics for a release- tics: for weak memory models in general [12, 13, 15] and for
acquire fragment of the C11 memory model with relaxed C11 specifically [19, 20].
accesses. We show that the semantics is both sound and Our key goal in this paper is to develop an operational
complete with respect to the axiomatic model. The seman- model that supports verification of weak memory C11 pro-
tics relies on a per-thread notion of observability, which al- grams. Like many programming languages, C11 has several
lows one to reason about a weak memory C11 program in advanced features, e.g., speculation, that contributes to the
program order. On top of this, we develop a proof calculus complexity of the logics for reasoning about them. Some
for invariant-based reasoning, which we use to verify the operational models (e.g., [20]) attempt to deal with the full
release-acquire version of Peterson’s mutual exclusion algo- complexity of the language and its behaviour. Other mod-
rithm. els focus on a well-behaved and well-understood fragment
(e.g., [13, 15]). In order to support an intuitive verification
Keywords Operational semantics, C11, Verification, Sound- method, we take the latter course. We do not handle some-
ness and Completeness forms of speculation (thin-air reads), release sequences, non-
atomic accesses or sequentially consistent accesses. This leaves
1 Introduction us with the so-called RAR fragment [4] of C11 (see Section 4.1),
where sb ∪ rf is acyclic, and thus dependencies between op-
Intensive research on the correctness of shared-memory con-
erations are easier to manage. All read/write/update opera-
current programs over the last three decades has resulted
tions are either relaxed or synchronised via release-acquire
in a variety of tools and techniques. However, the vast ma-
annotations. Acyclicity of sb ∪ rf precludes behaviours al-
jority of these have been developed on the assumption of
lowed by hardware architectures such as Power [16]. Thus,
sequential consistency [18]. Programs running on modern
to ensure programs proved correct by our logic remain sound,
hardware execute using weak memory models [1], requiring
one must ensure adequate fencing of independent instruc-
many of these techniques to be reworked.
tions during compilation (see [16] for details).
This paper is focused on the C11 memory model, which
This paper comprises three main contributions. The first
has been the topic of several recent papers (e.g., [3–6, 9, 12–
contribution is an operational semantics for the RAR frag-
14, 17, 19, 20]). Typically the C11 memory model is described
ment that we prove to be both sound and complete with re-
using an axiomatic semantics [3–5, 17] via a two-step proce-
spect to the axiomatic definition. Our semantics (like [13,
dure. (1) Construct candidate executions of a program com-
22]) allows each thread to have its own (per-thread) obser-
prising low-level (e.g., read/write operations) in which reads
vations of memory. We build on the recently proposed ex-
may return an arbitrary value. (2) Apply a number of axioms
tended coherence order [17] (which is the transitive closure of
over the memory model to rule out invalid candidate execu-
the communication relation in [2]). The extended coherence
tions. Such axioms may state, for instance, that every read is
order describes the order of reads and writes to a variable
validated by a write that has written the value read. Of par-
(see Example 3.3), which in turn enables one to define how
ticular interest are axioms that exclude certain cycles from
events may be introduced in a valid C11 execution without
arising. However precise, axiomatic definitions are unsuit-
violating validity of the axioms.
able for program verification (in particular, those involving
We combine the extended coherence order with the causal-
invariant-based reasoning), which requires one to consider
ity relation of C11 (formalised by happens-before) to define
the step-wise execution of a program. There has therefore
the set of writes already encountered by each thread. This
set is in turn used to define the writes observable by the
Conference’17, July 2017, Washington, DC, USA
thread (see Section 3.2). Our operational semantics naturally
2019.
1
Conference’17, July 2017, Washington, DC, USA S. Doherty, B. Dongol, H. Wehrheim, and J. Derrick
builds on observability: reads are validated on-the-fly (as Algorithm 1 Peterson’s algorithm with release-acquire
opposed to a post-hoc manner in the axiomatic semantics). Init: flag 1 = false ∧ flag 2 = false ∧ turn = 1
Thus, each state constructed using the transition relations 1: thread 1 1: thread 2
of our operational semantics is a valid C11 state (see Sec- 2: flag 1 := true ; 2: flag 2 := true;
tion 4.2). Moreover, we show that any candidate execution 3: turn.swap(2)RA ; 3: turn.swap(1)RA ;
that is valid according to the axiomatic semantics can be 4: while (flag 2 = true)A 4: while (flag 1 = true)A
generated by our operational semantics. ∧ turn = 2 ∧ turn = 1
do skip do skip
The second contribution is a verification technique that
builds on the operational semantics to enable inductive rea- 5: Critical section ; 5: Critical section ;
6: flag 1 :=R false; 6: flag 2 :=R false;
soning over the program steps. One difficulty in using an
operational semantics of weak-memory to support verifi-
cation is the fact that the state spaces of such operational
models are far more complicated than the state space that Commands have their standard meanings. The only ex-
one would use for a verification over sequentially consis- ceptions are the synchronising annotations, release R, ac-
tent memory, where the shared store can be represented quire A and release-acquire RA (which we describe in detail
using a simple mapping from variables to values. We ad- below), and the command swap, which generates a read-
dress this issue by developing a notation that builds on con- modify-write update event, atomically swapping the vari-
ventional reasoning (over sequentially consistent memory). able x with value n. Note that (for simplicity), we only present
For example, we include assertions that ensure a thread will a release-acquire version of the swap operation, but leave
read a particular value in a C11 state and assertions that en- in the RA annotation for emphasis. Furthermore, we assume
sure happens-before order between writes to different vari- unannotated accesses are relaxed, i.e., data races do not give
ables. The former is analogous to equations on variables and rise to undefined behaviour; however it is straightforward
their values in the conventional setting; the latter has no di- to extend the semantics to incorporate non-atomic accesses
rect analogue in a sequentially consistent setting (the clos- (which potentially generate undefined behaviour).
est analogue is the use of auxiliary variables [21] to record Example 2.1 (Peterson’s algorithm). The running example
whether certain operations have already occurred). for this paper will be the classic Peterson’s mutual exclusion
Our third contribution is the demonstration of the utility algorithm for two threads (see Algorithm 1) implemented
of our verification method by proving the mutual exclusion using release-acquire annotations (this algorithm is taken
property of a C11 version of Peterson’s algorithm [24]. from [24]). As with Peterson’s original algorithm, variable
flagi is used to indicate whether thread i intends to enter
2 Command Language its critical section and a shared variable turn is used to “give
This section describes our command language and defines way” when both threads intend to enter their critical sec-
its uninterpreted operational semantics; namely, an operational tions at the same time.
semantics that generate the read, write or update action for The difference in the C11 implementation is with the syn-
each step of the corresponding command. These actions are chronisation annotations. The flag variable is set to true (line 2)
in turn used to generate state transitions in Section 3, where using relaxed atomics (which does not induce any synchro-
the reads and writes are interpreted in a C11 state. Such a nisation), but is set to false (line 6) using a release annotation.
decoupled approach is inspired by the approach taken by The intention of the latter is to synchronise this write to flag
Lahav et al. [14]. with the read of flag at line 4 in the other thread. The value
of turn is set using a swap command, which induces release-
acquire synchronisation. Note that the read of turn within
2.1 Syntax the guard of the busy wait loop (line 4) is relaxed. However,
The syntax of commands (for a single thread) is defined by as we shall see, the algorithm still satisfies the mutual exclu-
the following grammar, where Exp and Com define expression property.
sions and commands, respectively. We assume that ⊖ is a
unary operator (e.g., ¬), ⊗ is a binary operator (e.g., ∧, ∨), B 2.2 Uninterpreted semantics
is an expression (of type Exp) that evaluates to a boolean, x The uninterpreted operational semantics of commands is
is a variable (of type Var) and n is a value (of type Val). given by a relation −→ ⊆ Com × Actτ × Com, where
Ø
Exp ::= Val | ExpA | ⊖Exp | Exp ⊗ Exp Act = {rd(x, n), rd A(x, n), wr (x, n),
x ∈Var;m,n ∈Val wr R (x, n), upd RA(x, m, n)}
Com ::= skip | x .swap(n)RA | x := Exp | x :=R Exp |
Com; Com | if B then Com else Com | τ < Act is a silent action and Actτ = Act ∪ {τ }. We write
a
while B do Com C −→ C ′ for (C, a, C ′) ∈ −→.
2
Verifying C11 Programs Operationally Conference’17, July 2017, Washington, DC, USA
x ∈ fv(E) n ∈ Val x ∈ fv(E) n ∈ Val fv(E) , ∅ fv(E 1 ) , ∅ fv(E 1 ) = ∅

a = rd(x, n) a = rd A (x, n) eval(E, a, E ′ ) eval(E 1, a, E 1′ ) eval(E 2, a, E 2′ )
eval(E, a, E[n/x]) eval(E A, a, E[n/x]) eval(⊖E, a, ⊖E ′ ) eval(E 1 ⊗ E 2 , a, E 1′ ⊗ E 2 ) eval(E 1 ⊗ E 2 , a, E 1 ⊗ E 2′ )
Figure 1. Expression evaluation
An expression evaluation step is formalised by a relation We formalise C11 states in Section 3.1 and define an oper-
eval(E, a, E ′), where E, E ′ are expressions and a is a read ac- ational event semantics based on observability (Section 3.2).
tion that is generated by the evaluation step (see Figure 1). This event semantics in turn gives rise to an interpreted se-
We assume fv(E) returns the set of free variables in E. Note mantics (Section 3.3).
that eval(E, a, E ′) is only defined when fv(E) , ∅. Moreover,
in the presence of a binary operator, expression evaluation is 3.1 C11 States and Basic Orders
assumed to take place from left to right. The notation E[n/x] The formalisation in this section follows the existing litera-
stands for expression E with variable x replaced by value n. ture on axiomatic C11 semantics [5, 17]. First we give some
The uninterpreted operational semantics for commands preliminary definitions.
is given in Figure 2. Again, most of these rules are straight-
forward. We assume [[E]] denotes the value of (variable-free) Notation. For an action a ∈ Act, we let var(a) ∈ Var be the
expression E. An assignment x := E generates a read action variable read (or written to), rdval(a) ∈ Val be the value read
whenever fv(E) , ∅ and a write action whenever fv(E) = ∅. and wrval(a) ∈ Val be the value written. We extend actions
A swap command generates an update action, and guard to events of type Evt = G × Actτ × T , where G is the set of
evaluation either generates a read or a silent action. tags used to uniquely identify events in an execution. For an
Note that the uninterpreted operational semantics allows event (д, a, t), where д is a tag, a is an action, and t is a thread
any value to be read. Thus, we have the following property. identifier, we define tag(e) = д, act(e) = a, tid(e) = t, and
r d (x,m) (using lifting) var(e) = var(act(e)), wrval(e) = wrval(act(e)),
Proposition 2.2. For all m, m ′ ∈ Val, if C −−−−−−→ C ′, then rdval(e) = rdval(act(e)). For a relation R ⊆ Evt × Evt, we let
r d (x,m′ ) r d A (x,m) r d A (x,m′ )
C −−−−−−−→ C ′ ; if C −−−−−−−→ C ′ , then C −−−−−−−−→ C ′ ; and if R |t and R |v be the restriction of R to events of thread t, and
upd RA (x,m,n) upd RA (x,m′,n) variable v, respectively.
C −−−−−−−−−−−→ C ′ , then C −−−−−−−−−−−→ C ′ . We let U denote the RMW update events, and distinguish
For simplicity, we assume concurrency at the top level the sets WrR ⊇ U (write release), RdA ⊇ U (read acquire),
only. We let T be the set of all threads and use function of WrX (write relaxed) and RdX (read relaxed). Finally, we de-
type Prog : T → Com to model a program comprising mul- fine Rd = RdA ∪ RdX (all reads) and Wr = WrR ∪ WrX (all
tiple threads. The uninterpreted operational semantics of a writes).
program is given by a relation −→ ⊆ Prog × T × Actτ ×
a Definition 3.1. A C11 state is a triple D = ((D, sb), rf, mo)
Prog (using overloading). As before, we write P −→t P ′
for (P, t, a, P ′ ) ∈ −→. An evaluation step of a program P is comprising a set of events D paired with a sequenced-before
given by the rule Prog (Figure 2), which relies on the unin- relation sb ⊆ D × D, a reads-from relation rf ⊆ Wr × Rd and
terpreted operational semantics of a command to generate a modification order mo ⊆ Wr × Wr.
an action a and command C from the command P(t). The
program after taking a transition is the program P but with We let Σ denote the set of all C11 states. The three rela-
t mapped to the new command C. tions in a C11 state ((D, sb), rf, mo) reflect different relation-
Since threads execute independently in the uninterpreted ships between operations. The sequenced-before relation sb
semantics, all actions commute. records the program order within one thread; sb |t is a strict
a1 a2 total order for each thread t. The reads-from relation rf pro-
Proposition 2.3. If P −→t1 P1 and P1 −→t2 P ′ and t 1 , t 2 , vides the justification for the values being read: every read
a2 a1
then there exists a P2 such that P −→t2 P2 and P2 −→t1 P ′ . must have a corresponding action that writes the value be-
ing read. The modification order mo describes an ordering
3 An Operational Semantics for RAR C11 of the writes on variables; mo |v is a strict total order for each
We now extend the semantics from Section 2 and interpret variable v ∈ Var.
read, write and update actions in the C11 memory model. Weak memory models are often defined in terms of a
We develop an operational semantics that takes inspiration happens-before order (denoted hb), which formalises a no-
from the axiomatic descriptions [4, 5, 17]. In Section 4.2, we tion of causality. In C11, an event occuring in a thread before
show that the operational model is in fact equivalent to a another event in the same thread induces sequenced-before
reformulation (inspired by [17]) of the RAR fragment of the order (denoted sb), which in turn induces happens before
RC11 semantics [4]. order. Moreover, reads-from edges induce happens-before
3
a
eval(E, a, E ′ ) fv(E) = ∅ a = wr (x, [[E]]) fv(E) = ∅ a = wr R (x, [[E]]) C 1 −→ C 1′
a a a τ a
x := E −→ x := E ′ x := E −→ skip x :=R E −→ skip skip; C −→ C C 1 ; C 2 −→ C 1′ ; C 2
m, n ∈ Val
a = upd RA (x, m, n) eval(B, a, B ′) [[B]] = true [[B]] = false
a a τ τ
x.swap(n)RA −→ skip if B then C 1 else C 2 −→ if B then C 1 else C 2 −→ C 1 if B then C 1 else C 2 −→ C 2
if B ′ then C 1 else C 2
eval(B, a, B ′) [[B]] = true [[B]] = false
a τ τ
while B do C −→ while B ′ do C while B do C −→ C; while B do C while B do C −→ skip
a
P(t) −→ C
Prog
a
P −→t P[t 7→ C]
Figure 2. Uninterpreted operational semantics of commands and programs
order when the corresponding actions in the edge are syn- the synchronised read rd 3A (x, 2) is justified by the sw from
chronising actions (i.e., a release and an acquire). This is for- wr 2R (x, 2) and fixed before upd 1RA (x, 2, 4) via the fr relation.
malised by an additional synchronises-with relation (denoted Update events are related by both mo and rf to the immedi-
sw). Formally, we define ately preceding write, and possibly related to later writes/updates
sw = rf ∩ (WrR × RdA ) hb = (sb ∪ sw)+ . by mo and fr. If the write being read is releasing, then an up-
date induces an sw (e.g., see upd 1RA (x, 2, 4)).
As is standard in the literature, we assume all variables are
initialised by a special thread 0 ∈ T . Define the set of ini- In addition, our semantics uses the extended coherence or-
tialising writes to be IWr = {w ∈ Wr | tid(w) = 0}. The der2 [17], denoted eco, which is an order that fixes the order
initial states of our operational model are those of the form of reads and writes to each variable (see Example 3.3 below).
σ0 = ((I, ∅), ∅, ∅) where I ⊆ IWr, and for each variable x, Formally we define:
there is exactly one write w ∈ I such that var(w) = x. For a eco = (fr ∪ mo ∪ rf)+
state σ = ((D, _), _, _), let I σ = D ∩ IWr.
The relation fr = (rf −1 ; mo)\Id (where ; is relational com- Example 3.3. For executions of a C11 program, eco over a
position) is the “from-read” relation1 that relates each read single variable takes the following form, where w 1 , . . . , w 5
to all writes that are mo-after the write the read has read are writes and r 1 , r 1′ etc are reads and u is an update.
from. We must subtract Id (identity) edges from rf −1 ; mo to r1 r2 r3 r4
cope with update events, which have the potential to induce r 1′
r 2′ r 4′
reflexivity in fr [4, 17]. rf fr rf fr rf fr rf fr
r 1′′ rf
fr
Example 3.2. An example C11 state is given below, where w1 w2 w3 u w4
mo mo mo mo
threads 1-4 have executed some actions. Since the actions
are unique, we elide the tags from each event, and we iden- Reads r 1 , r 1′ and r 1′′ read from the write w 1 , inducing from-
tify the thread id with the action itself, e.g., wr 1 (y, 1) is the read edges to w 2 (the write that immediately follows w 1 in
action wr (y, 1) executed by thread 1. mo). The update u induces an rf from w 3 (the write event
wr 0 (x, 0), wr 0(y, 0), wr 0(z, 0) immediately before u in mo) and an fr to w 4 (the write event
sb, mo sb immediately after u in mo).
sb, mo
sb, mo
rf
upd 1RA (x, 2, 4) wr 2 (y, 1) wr 3 (z, 3) rd 4 (z, 3) 3.2 Event Semantics and Observability
sb sb mo, rf sb
mo, sw Recalling that Σ denotes the set of all possible C11 states and
wr 2R (x, 2) rd 3A (x, 2) upd 4RA (y, 0, 5) Wr is the set of all writes (including updates), each step of
sw
fr mo, fr the event semantics is formalised by the transition relation
RA ⊆ Σ × Wr⊥ × Evt × Σ (see Figure 3), where we have
w,e ′
The initialising writes are sb-before all thread actions, but Wr⊥ = Wr ∪ {⊥} and ⊥ < Wr. Again, we write σ RA σ
are not ordered amongst themselves. Relation sb also de- ′
for (σ , w, e, σ ) ∈ RA .
scribes the order for each thread. Relation mo describes the w,e ′
For each rule σ RA σ , w is the write being observed
order of modifications for each variable. The unsynchronised by the event e. Strictly speaking, the event semantics could
read rd 4 (z, 3) is justified by the rf from wr 3 (z, 3), whereas
2 Thenon-transitive version of this order is commonly referred to as the
1 fr is also referred to as “reads-before” [17]. com order [2].
4
a ∈ {rd(x, n), rd A (x, n)} var(w) = x wrval(w) = n of the update itself. We therefore define the set of covered
w ∈ OWσ (t) rf ′ = rf ∪ {(w, e)} mo ′ = mo writes as follows:
Read
w,e
((D, sb), rf, mo) RA ((D, sb) + e, rf ′, mo ′)
CWσ = {w ∈ Wr ∩ D | ∃u ∈ U. (w, u) ∈ rf}
a ∈ {wr (x, n),wr R (x, n)} w ∈ OWσ (t)\CWσ
var(w) = x rf ′ = rf mo′ = mo[w, e] Example 3.4. Consider the C11 state σ in Example 3.2. Given
Write
w,e
that I = {wr 0 (x, 0), wr 0(y, 0), wr 0(z, 0)} is the set of initialis-
((D, sb), rf, mo) RA ((D, sb) + e, rf ′, mo ′ ) ing writes, the encountered writes for each thread are as
a = upd RA(x, m, n) w ∈ OWσ (t)\CWσ var(w) = x follows:
wrval(w) = m rf ′ = rf ∪ {(w, e)} mo′ = mo[w, e] EWσ (1) = I ∪ {wr 2R (x, 2), upd 1RA(x, 2, 4)}
RMW
w,e
((D, sb), rf, mo) RA ((D, sb) + e, rf ′, mo′ ) EWσ (2) = I ∪ {wr 2 (y, 1), wr 2R (x, 2), upd 4RA(y, 0, 5)}
Figure 3. Event semantics assuming σ = ((D, sb), rf, mo), EWσ (3) = I ∪ {wr 2 (y, 1), wr 2R (x, 2), wr 3(z, 3), upd 4RA (y, 0, 5)}
e = (д, a, t) and д < taдs(D) EWσ (4) = I ∪ {wr 3 (z, 3), upd 4RA (y, 0, 5)}
The observable writes are hence
be defined without the w. However, making this observed
write explicit is useful for the verification (Section 5). OWσ (1) = {wr 0 (y, 0), wr 0(z, 0), wr 2(y, 1), wr 3 (z, 3),
We now describe each of the rules in Figure 3. Executing upd 1RA (x, 2, 4), upd 4RA(y, 0, 5)}
each event e updates (D, sb) to: OWσ (2) = {wr 0 (z, 0), wr 2(y, 1), wr 3(z, 3), upd 1RA (x, 2, 4)}
OWσ (3) = {wr 2 (y, 1), wr 2R (x, 2), wr 3(z, 3), upd 1RA (x, 2, 4)}

D ∪ {e},
(D, sb) + e =
sb ∪ ({e ′ ∈ D | tid(e ′) ∈ {tid(e), 0}} × {e}) OWσ (4) = {wr 0 (x, 0), wr 2(y, 1), wr 2R (x, 2), wr 3(z, 3),
Thus, the initial writes are sb-prior to every non-initialising upd 1RA (x, 2, 4), upd 4RA(y, 0, 5)}
event. Relations rf and mo are updated according to the
The covered writes are CWσ = {wr 0 (y, 0), wr 2R(x, 2)}.
write events in D that are observable to the thread executing
the given event. To this end, we must distinguish three sets Observable writes are used to resolve the read events in
of writes: encountered writes and observable writes, which each thread. Namely, a thread t can read from any write
are specific to each thread, and covered writes, which are the event in OWσ (t). This is reflected in the Read rule, where
set of writes that are immediately followed, in reads-from the rf component is updated to record an rf from some ob-
order, by an update event. servable write w to the read event e, provided w writes to the
The set of encountered writes are the writes that thread t is variable that e reads and the value read matches the value
aware of (either directly or indirectly) in state σ = ((D, sb), rf, mo),written.
and are given by: To explain the write and update semantics, we require
EWσ (t) = {w ∈ Wr ∩ D | ∃e ∈ D. tid(e) = t ∧ some more formal machinery. The observable and covered
? ? writes together determine the allowable updates to the mo
(w, e) ∈ eco ; hb }
relation after executing a write event. Unlike SC, a write
where R ? is the reflexive closure of relation R. Thus, for each event to variable x is not simply appended to the end of
w ∈ EWσ (t), there must exist an event e of thread t such mo |x . Instead we allow a thread t that performs a write e (or
that w is either eco- or hb- or eco; hb-prior to e. Note that update) to x to insert e after any observable write w in mo |x
EWσ (t) = ∅ if the thread t has not executed any actions; that is not a covered write. This condition is sufficient to
as soon as the thread executes its first action, we have I ⊆ ensure no cyclic dependencies arise as a result of performing
EWσ (t). the write.
From these, we determine the observable writes, which are Given that R[x] is the relational image of x in R, we de-
the writes that thread t can observe in its next read. These fine R ⇓x = {x } ∪ R −1 [x] to be the set of all elements in R
are defined as: that relate to x (inclusive). The insertion of a write event e
directly after a write w in mo is given by
OWσ (t) = {w ∈ Wr ∩ D | ∀w ′ ∈ EWσ (t). (w, w ′) < mo}
mo[w, e] = mo ∪ (mo⇓w × {e}) ∪ ({e} × mo[w])
Thus, observable writes are not succeeded by any encoun-
tered write in modification order, i.e., the thread has not The rules Write and RMW update mo in the same way.
seen another write overwriting the value being read. For the write event e executed by thread t, they pick some w
Finally, to guarantee atomicity of the update events, there that writes to the same variable as e, is observable to t and
cannot be any write operations (in modification order) be- not covered by an update event, then insert e immediately
tween the write that an update reads from and the write after w in mo.
5
Example 3.5. For the execution in Example 3.4, no thread We therefore conclude that thread 2’s guard will evaluate to
may introduce a write between wr 1R (x, 2) and upd 1RA (x, 4, 5), true, causing it to spin at line 4. In contrast, thread 1 can
or between wr 0 (y, 0) and upd 5RA (y, 0, 7). read from either wr 0 (flag 2 , false) or wr 2 (flag 2 , true) since it
has not yet encountered the event wr 2 (flag 2 , true). Similarly,
3.3 Interpreted Semantics since it has not yet encountered upd 2RA (turn, 2, 1), it can read
We now combine the event semantics with the uninterpreted from both upd 1RA (turn, 1, 2) and upd 2RA (turn, 2, 1). Thread 1
semantics to give an interpreted semantics for the language therefore could spin at line 4 or exit the busy loop. Note that
in Section 2 overall. We give two generic rules that allows once it has read a new value for flag 2 or turn, the previous
different memory models to be plugged in for the event se- value (in mo-order) can no longer be read.
mantics.
To this end, we define a configuration to be a pair (P, σ ), This example demonstrates how the basic synchronisa-
consisting of a program P and a state σ of the memory model. tion principle of Peterson’s algorithm is guaranteed by the
The command part of a configuration triggers events that release-acquire annotations. Namely, (1) the updates on turn
are agnostic to values. However, the memory model will are totally ordered via hb due to the release-acquire anno-
only allow certain values in read events. This idea is cap- tation on statement swap, and (2) the thread that is first to
tured by the following two rules combining the uninterpreted execute swap, may miss to see that the other thread has set
program semantics (i.e., rule Prog) from Section 2.2 and an its flag.
event semantics in some memory model M:
a w,e 4 Validation of Operational Semantics
P −→t P ′ σ M σ
′
τ We now justify our operational semantics by showing it to
P −→t P ′ act(e) = a tid(e) = t
be sound and complete with respect to an existing axiomatic
⊥,τ w,e
(P, σ ) =⇒M (P ′, σ ) (P, σ ) =⇒M (P ′ , σ ′) version of the C11 memory model. There are several ver-
sions of the C11 axiomatic semantics that might be regarded
The first rule describes a τ -step and does not change the
as both standard and complete [4, 5, 17]. Our semantics deals
state. The second states that a thread can execute action a
only with the release, acquire and relaxed annotations on
in the current state σ only if the event semantics of the mem-
operations. We call this the RAR fragment of C11. The stan-
ory model in consideration permits it.
dard C11 semantics also specifies the behaviour of opera-
Example 3.6. Consider the state of Peterson’s algorithm tions carrying sequentially consistent and non-atomic anno-
(Algorithm 1) in RA C11 that results when thread 1 has reached tations. We ignore these annotations here. Our semantics
the guard at line 4, and thread 2 is about to execute line 3. Ex- closely resembles the RAR fragment of [4] and [17]. Like
ecution of this step introduces the boxed event upd 2RA (turn, 2, 1). [4], we use the convention that update operations are repre-
(We use the box for emphasis; it does not carry any special sented as a single event, rather than a read/write pair. Like
semantic meaning.) [17], we adopt the constraint that sb∪rf is acyclic, and make
wr 0 (flag 1 , false) wr 0 (turn, 1) wr 0 (flag 2 , false) use of the extended coherence order.3 The axiomatic seman-
tics is given in Section 4.1. Soundness and completeness of
sb sb sb
sb, mo sb, mo the memory model is presented in Section 4.2.
sb
wr 1 (flag 1 , true) mo, rf wr 2 (flag 2 , true) 4.1 Background: RAR Fragment of RC11
sb sb Axiomatic semantics start with pre-executions, which are can-
didates for valid C11 executions. A number of axioms are
upd 1RA (turn, 1, 2) upd 2RA (turn, 2, 1) used to define which of these candidates are considered real
mo, sw, fr
executions. Pre-executions only contain a set of events and
In the state without the boxed event, thread 2 can read from program order (as represented by the sequenced-before re-
wr 0 (turn, 1) via a read event, but it cannot do so via an up- lation). We call such a pair (D, sb) a pre-execution state. New
date because wr 0 (turn, 1) is covered by the existing update events can be added to pre-execution states using the + op-
upd 1RA (turn, 1, 2). Hence the update of thread 1 (when the erator in the same way as in Figure 3. Thus, if (D, sb)
⊥,e
PE
event in the box occurs) updates turn from 2 to 1, which (D ′, sb′) then (D ′, sb′ ) = (D, sb) + e. These pre-execution
creates mo, sw and fr edges from upd 1RA (turn, 1, 2). steps are combined with the steps of a program as before,
Now consider a continuation from the state with the boxed w,e
i.e., using the rules in Section 3.3, i.e., we replace M by
event, where the threads read the values in their respec-
tive guards. Thread 2 has encountered wr 1 (flag 1 , true), and 3 In the appendix, we prove that our axiomatic model is equivalent to a
hence, is no longer able to observe wr 0 (flag 1 , false). Simi- variant of the RAR fragment of [4]. This proof is supported by a mechani-
larly, since thread 2 has encountered upd 2RA (turn, 2, 1) it is sation in Memalloy [23], which shows our models is equivalent to the RAR
no longer able to observe wr 0 (turn, 1) or upd 1RA (turn, 1, 2). fragment for models upto size 7.
6
⊥,e ⊥,e
PE . Since the first event in PE is always ⊥ (no write Example 4.5. Consider the following simple program with
events are observed), we write e
for
⊥,e two threads.
PE PE .
e1 e2 thread 1: z := x thread 2: x := 5
Proposition 4.1. If γ PE γ 1 and γ 1 PE γ ′ and tid(e 1 ) ,
e e We have mapping P0 = {1 7→ z := x, 2 7→ x := 5}. The
tid(e 2 ), then there exists a γ 2 s.t. γ 2 PE γ 2 and γ 2 1 PE γ ′ .
following pre-execution is possible:
Once a candidate pre-execution (D, sb) is computed, it is r d 1 (x, 5) w r 1(z, 5) w r 2(x, 5)
augmented with the relations rf and mo. δ 0 ======⇒PE δ 1 ======⇒PE δ 2 ======⇒PE δ 3
Definition 4.2. A C11 execution ((D, sb), rf, mo) is valid iff where δi = (Pi , γi ). The pre-execution state δ 3 can be justi-
each of the following axioms hold: fied using the following C11 state
SB-Total. Sequenced-before is a total order over the events rf sb
of each (non-initialising) thread and orders all initialising wr 2 (x, 5) rd 1 (x, 5) wr 1 (z, 5)
writes before all other events. Formally, for any e, e ′ ∈ D, The sequence of events is however not possible in the RA
((e, e ′) ∈ sb ⇒ tid(e) = 0 ∨ tid(e) = tid(e ′ )) ∧ semantics since we cannot have a read without the prior
(tid(e) = 0 ∧ tid(e ′) , 0 ⇒ (e, e ′) ∈ sb) ∧ write that it reads from, and hence the first transition cannot
(tid(e) , 0 ∧ tid(e) = tid(e ′) ∧ e ,e ′ ⇒ (e, e ′) ∈ sb ∪ sb−1 ) . be emulated. Still, the operational semantics can reach the
same final C11 state by executing
MO-Valid. Modification order is a strict order on Wr ∩ D
w r 2 (x, 5) r d 1 (x, 5)
consisting of a disjoint union of relations {mo |x }x ∈Var which (P0 , σ0 ) ======⇒RA (P1′ , σ1′) ======⇒RA
are themselves total. That is, for any w, w ′ ∈ Wr ∩ D, w r 1(z, 5)
(P2′ , σ2′) ======⇒RA (P3 , σ3 )
((w, w ′)∈ mo ⇒ var(e) = var(e ′ )) ∧
(tid(w) = 0 ∧ tid(w ′) , 0 ∧ var(w) = var(w ′) ⇒ which is also a sequence of steps in ==⇒PE .
(w, w ′) ∈ mo) ∧ The “reordering” of events described in Example 4.5 is al-
′
(tid(w) , 0 ∧ tid(w ) , 0 ∧ ways possible: for every sequence of steps of pre-executions,
var(w) = var(w ′) ∧ w , w ′ ⇒ (w, w ′) ∈ mo ∪ mo−1 ) . we can find a corresponding permutation of these steps in
RF-Complete. Each read matches exactly one write in the which reads are ordered after their writes (and the program
execution, i.e., for every e ∈ Rd ∩ D there is exactly one order within threads is preserved).
w ∈ Wr ∩ D such that (w, e) ∈ rf, and for every (e, e ′) ∈ rf, Putting together Propositions 2.3 and 4.1, we have the fol-
lowing result.
e ∈ Wr ∧ e ′ ∈ Rd ∧ var(e) = var(e ′ ) ∧ wrval(e) = rdval(e ′ ). e1 e2
Proposition 4.6. If (P, γ ) =⇒PE (P1 , γ 1 ) and (P1 , γ 1 ) =⇒PE
No-Thin-Air. The relation sb ∪ rf is acyclic. (P ′, γ ′ ) where tid(e 1 ) , tid(e 2 ), then there exists a program
Coherence. The relations hb; eco? and eco are irreflexive. e2
P2 and a pre-execution state γ 2 such that (P, γ ) =⇒PE (P2 , γ 2 )
Definition 4.3. A pre-execution state γ is justifiable iff there e1
and (P2 , γ 2 ) =⇒PE (P ′, γ ′).
exist relations rf and mo such that (γ , rf, mo) is valid.
This proposition is used to prove a permutation theorem
4.2 Soundness and Completeness for independent elements. We say that sequence e 1e 2 . . . en
Having defined a new operational semantics for C11, the is a linearization of a strict order ≺ iff dom(≺) ∪ ran(≺) =
next step is now the comparison with the existing axiomatic {e 1 , e 2 , . . . , en } and for any ei , e j , we have ei ≺ e j ⇒ i < j.
semantics. In the following, we prove the before given op- e1 e2 ek
Lemma 4.7. Let Q = (P0 , γ 0 ) =⇒PE (P1 , γ 1 ) =⇒PE . . . ==⇒PE
erational and axiomatic semantics to be equal. We start by
(Pk , γk ) and γk = (D k , sbk ). Then for every linearization f 1 , . . . , fk
showing that the executions of the operational semantics
of sbk , there exist programs P1′ , . . . , Pn−1
′ and pre-execution
are all consistent. ′ ′
states γ 1 , . . . , γn−1 such that
Theorem 4.4. Let σ = ((D, sb), rf, mo) be a C11 state reach- f1 f2 fk
able from σ0 using relation RA . Then σ satisfies SB-Total, ⇒PE (P1′ , γ 1′) =
(P0 , γ 0 ) = ⇒PE . . . =⇒PE (Pk , γk ) .
MO-Valid, RF-Complete, No-Thin-Air and Coherence.
We now show that for every justifiable pre-execution there
We next show that all consistent executions of a program is an execution of the C11 semantics that ends in the C11
are reachable in our operational semantics. We do so in two state justifying the pre-execution. The theorem uses a no-
steps. First, we consider the runs of a program on the mem- tion that restricts pre-executions and C11 executions to a
ory model. Since the axiomatic semantics in its pre-execution set of events. For a set of events E ⊆ D, we define:
allows for reads before the appropriate writes, not every se- (D, sb)↓E = (E, sb ∩ (E × E))
quence of events possible for pre-executions is also possible
in the operational semantics. (γ , rf, mo)↓E = (γ ↓E , rf ∩ (E × E), mo ∩ (E × E))
7
In the completeness proof, we assume that the given pre- Definition 5.1. Let t be a thread, σ a state and v a value.
e1 e2 ek σ
execution sequence (P0 , γ 0 ) =⇒PE (P1 , γ 1 ) =⇒PE . . . ==⇒PE The determinate-value assertion x =t v holds iff
(Pk , γk ) has been reordered such that e 1 . . . ek is a lineariza- OWσ (t) |x = {σ .last(x)} (1)
tion of sbk ∪rf k , where rf k is the reads-from relation used in
v = wrval(σ .last(x)) (2)
the justification of γk . Such a linearization is possible since
?
sbk ∪ rf k is acyclic (axiom No-Thin-Air). σ .last(x) ∈ I σ ∪ {e | ∃e ′ . tid(e) = t ∧ (e, e ′) ∈ σ .hb } (3)
Condition (3) states that σ .last(x) is either an operation of
e1 e2 ek the initialising thread, an operation of t, or happens-before
Theorem 4.8. Suppose (P0 , γ 0 ) =⇒PE (P1 , γ 1 ) =⇒PE . . . ==⇒PE
(Pk , γk ) such that γk = (D k , sbk ) is justifiable with justifi- an operation of t.
cation σk = (γk , rf k , mok ) and e 1 , . . . , ek is a linearization Example 5.2. To illustrate the determinate-value assertion,
e1 e2 ek
of sbk ∪ rf k . Then (P0 , σ0 ) =⇒RA (P1 , σ1 ) =⇒RA . . . ==⇒RA consider the two states below. In each case, assume there
(Pk , σk ), where σi = (γk , rf k , mok )↓{e1, ...,ei } , 0 < i < k. are writes (not shown) to variable x that are mo-prior to the
write to x. Also assume that each write is the last write in
mo order.
5 Verification wr 1 (x, 2) rd 2A (y, 1) wr 0 (x, 2)
rf
rd 1 (x, 2) rd 2A (y, 1)
We now describe our verification method (Section 5.1), build- sb sb
sw sw
ing on the operational semantics. In Section 5.2, we apply it wr 1R (y, 1) wr 1R (y, 1)
to our case study, Peterson’s mutual exclusion algorithm.
For the state on the left, after the boxed operation, thread 2
σ
satisfies x =2 2, but for the state on the right, thread 2 does
5.1 Verification Method not. In each case, the only write to x that thread 2 can ob-
Our verification method is built around two kinds of asser- serve is the illustrated write to x, but thread 2 satisfies a
tions for describing states of the operational semantics. The corresponding determinate value assertion only on the left
first kind, determinate-value assertions, are used to describe state. This is because on the left we have (wr 1(x, 2), rd 2(x, 2)) ∈
the values that a read operation might return. As such, these hb, but the unsychronised rf edge on the right means that
assertions are analogous to equations that specify the val- there is no analagous hb edge.
ues of variables in a conventional (i.e., sequentially consis- In our verification, determinate-value assertions support
tent) setting in which the state of an algorithm can be repre- clean interaction with variable-ordering assertions, which
sented as a store that maps variables to values. The second we describe shortly. Note that because our operational model
kind of assertion, variable-ordering assertions, has no direct prevents update operations reading from covered writes (see
analogue in the conventional setting. Variable-ordering as- Section 3), i.e., are more restricted than read operations, an
sertions provide a way to describe how information about a update operation on a variable x may only be able to read
variable propagates between threads. σ
from the last write to x even if x = v is false for all v. Below,
we show how to handle important instances of this situa-
Determinate-values. In the following, we assume that σ = tion.
((D, sb), rf, mo) is a valid C11 state. We let σ .last(x) be the The next two lemmas are immediate from the definition
write or update to x in D that is not succeeded by another σ
of =t . Lemma 5.3 below ensures that the value returned
write or update in mo |x . Note that σ .last(x) is well-defined by a reading transition using the semantics in Figure 3 is
in any valid state σ . When X is a set of operations and x is consistent with the determinate-value assertion. Lemma 5.4
a variable, X |x = {e ∈ X | var(e) = x }. For the determi- ensures that when a determinate-value assertion holds for
nate value assertions, consider some thread t and variable two threads reading from the same variable, they return the
x. In some states of the operational semantics, there is ex- same values for the variable.
actly one write that t can read-from when reading x. This
is true precisely when OWσ (t) |x = {σ .last(x)} (recall that Lemma 5.3 (Determinate-Value Read). For any Read or RMW
m,e σ
σ .last(x) is never covered, and so σ .last(x) can always be transition (P, σ ) =⇒RA (P ′ , σ ′), if var(e) =tid(e ) v, then rdval(e) =
observed in a transition). Under such a condition, the value v.
returned by a read of x in thread t must be wrval(σ .last(x)).
This ultimately provides us with a weak memory analogue Lemma 5.4 (Determinate-Value Agreement). For threads t, t ′ ,
σ σ
of an equation asserting that a given variable has a given variable x, and values v, v ′, if x =t v and x =t ′ v ′ then v = v ′,
value in a conventional sequentially consistent setting. and thus t and t ′ agree on the value of x.
Determinate-value assertions differ from their conventional
counterparts in that they are relative to a particular thread.
8
It is almost definitive of weak-memory systems that distinct the fact that for such a variable, any modification but the
threads can have different views of the memory state. last is covered. Thus, we have the following lemma.
Variable-ordering. How can we ensure that distinct threads
can agree on (or share) sufficient determinate-value asser- Lemma 5.6 (Last Modification Transition). Let t = tid(e)
tions to support a verification? We address this problem by and x = var(e) for some event e. For any reachable transition
m,e σ
using another class of assertion: variable-order assertions, (P, σ ) =⇒RA (P ′ , σ ′), m = σ .last(x) if either x =t v, for some
which orders two variables whenever the last writes to the value v, or x is an update only variable in σ .
variables are causally (i.e., hb) ordered.
Definition 5.5. Let x, y be variables and σ a state with hb In other cases, other kinds of invariants can be used to guar-
σ
relation σ .hb. The variable-order assertion x → y holds iff antee this last-modification property.
(σ .last(x), σ .last(y)) ∈ σ .hb .
Example 5.7. Consider the following message-passing in-
For example, the state σ in the left of Example 5.2 without teraction between two threads:
σ σ
the boxed event satisfies x → y. When x → y, a determinate- Init: f = 0 ∧ d = 0
σ
value assertion x =t v can be “copied” to another thread t ′ , thread 1 thread 2
whenever t ′ performs an acquiring read that reads-from the 1 : d := 5; 1 : while !f A do skip;
last modification of y and this write is releasing. It is easy to 2 : f :=R 1; 2 : r := d;
see that in a state σ ′ after such a synchronisation, σ ′ .last(x)
σ′ Here, thread 1 sets the data variable d to 5, and then indi-
is happens-before an operation of t ′ , and thus x = t ′ v. cates that the data is ready by setting the flag variable f to
Inference rules. Figure 4 presents a set of rules that pre- 1. Thread 2 awaits this condition, and then consumes the
cisely captures reasoning principles for determinate-value data. In order to show that this simple program is correct,
and variable-order assertions. The “copying” of determinate we must be able to prove that thread 2 always reads the cor-
value assertions is captured in rule Transfer4 . For the left rect value at line 2.
state in Example 5.2 we can see this copying: when the boxed We sketch a proof that for any state σ ′ , where thread 2 is
event rd 2A (y, 1) occurs (leading to state σ ′ ), the determinate σ′
at line 2, we have d = 2 5. First note that this program satis-
σ σ
value assertion x =1 2 is “copied” to thread 2 giving x =2 2 fies the invariant that for each write w satisfying var(w) = f
by rule Transfer. Rule WOrd shows how we introduce vari- and wrval(w) = 1, w is a releasing write of thread 1 and
able ordering assertions: a variable ordering assertion can last(f ) = w. Using rules ModLast and WOrd, after execut-
be introduced every time a thread writes to one variable (y σ
ing line 2 of thread 1, the resulting state σ satisfies d =1 5
in the rule), while having a determinate value assertion on σ
and d → f . This fact, along with the invariant above satisfy
another variable (x in the rule). Note that this rule would
the premises of the Transfer rule where x is d and y is f .
not be sound, without Condition (3) of Definition 5.1: since σ′
the existence of an hb edge from σ ′ .last(x) to σ ′ .last(y). Thus, when the loop exits, into state σ ′, we have d = 2 5, as
Last modification transitions. Observe that the rules in required.
Figure 4 are all conditioned on the modification that is ob-
served in the transition being the last modification to the Equipped with these techniques, we now show that Pe-
given variable. Thus, we must be able to prove that a given terson’s algorithm with the synchronisation annotations as
read or update observes the last modification. There are sev- given in Section 2 guarantees mutual exclusion.
σ
eral ways to do this. It is easy to see that if x =t v for some
thread t in some state σ then t can only read the last write 5.2 An Example Verification: Peterson’s Algorithm
to x. We formalise this claim in Lemma 5.6 below, and in our We turn now to the verification of the version of Peterson’s
case study we show how to use it in verification. Mutual Exclusion algorithm given in Algorithm 1. Our ver-
Update operations provide another way to guarantee that ification consists of proving a mutual exclusion invariant
a given operation observes the last modification at a given (Theorem 5.8) stating that there is no reachable state in which
variable. Given a C11 state ((D, _), _, mo), an update-only both processes are in their respective critical sections.
variable is any variable x such that for all modifications To state our invariants, we make use of an auxiliary pro-
m ∈ D with x = var(m), either m is an update or m ∈ IWr. gram counter function, which for each thread, returns the
Note that initially, every variable is an update only variable. line number of Algorithm 1 that the thread is currently exe-
In the operational semantics, update-only variables have the cuting. More precisely, for each configuration (P, σ ) of Peter-
property that any new update or write can only be added to son’s algorithm, and t a thread with t ∈ {1, 2}, the expres-
the end of the modification order. This is a consequence of sion P.pc t returns i when P(t) is the part of the program
4 We show soundness of these proof rules in the appendix. starting on line i.
9
σ
x = var(e) y = var(e) x → y m ∈ WrR|y
σ e ∈ U |y
σ0 = ((I, ∅), ∅, ∅) e ∈ Wr |x x =t v (m, e) ∈ sw
σ
I ⊆ IWr m = σ .last(x) m = σ .last(y) x →y
Init ModLast Transfer UOrd
σ0 σ′ σ′ σ′
x = t wrval(σ0 .last(x)) x = tid(e ) wrval(e) x = tid(e ) v x →y
e < Wr |x x = var(e) e ∈ RdA |x x , y e ∈ Wr |y e < Wr | {x,y }

σ σ σ
x =t v m ∈ WrR |x m = σ .last(x) x =tid(e ) v m = σ .last(y) x →y
NoMod AcqRd WOrd NoModOrd
σ′ σ′ σ′ σ′
x =t v x = tid(e ) rdval(e) x →y x →y
m,e
Figure 4. Rules for determinate-value and variable-order assertions. We assume σ , m, e, σ ′ satisfy (_, σ ) =⇒RA (_, σ ′).
The mutual exclusion property for Algorithm 1 is proved Theorem 5.8 (Mutual exclusion). For each reachable config-
in Theorem 5.8, which relies on the following invariants. uration (P, σ ), P.pc 1 , 5 ∨ P.pc 2 , 5.
turn is an update-only location (4) Proof. Assume that P.pc 1 = 5 and P.pc 2 = 5. Then, by Prop-
σ σ σ σ
turn =1 2 ∨ turn =2 1 (5) erty (9) above, we have turn =1 2 and turn =2 1. But this is
σ impossible by Lemma 5.4. Thus, P.pc 1 , 5 or P.pc 2 , 5.
P.pc t ∈ {3, 4, 5, 6} =⇒ flagt =t true (6)
σ
P.pc t ∈ {4, 5, 6} =⇒ flagt → turn (7) 6 Conclusion and Related Work
P.pc t ∈ {4, 5, 6} ∧ P.pc tˆ ∈ {4, 5, 6} =⇒ (8) We have developed an operational semantics for the RAR
σ σ fragment of the C11 memory model, which has been shown
flagtˆ =t true ∨ turn =tˆ t
σ to be both sound and complete with respect to the axiomatic
P.pc t = 5 ∧ P.pc tˆ ∈ {4, 5, 6} =⇒ turn =t̂ t (9) description. Thus, every state generated by the operational
σ
P.pc t = 2 =⇒ flagt = false (10) semantics is guaranteed to be one allowed by the axiomatic
semantics. Moreover, any execution that is valid with re-
As in the classical (sequentially consistent) setting, we prove spect to the axiomatic semantics can be generated by the op-
that these invariants hold for the initial configuration and erational semantics. Our semantics relies on a thread-local
for each transition of the algorithm. For space reasons we view of observability6 , which is defined in terms of eco and
only provide details for one of these cases, i.e., where the hb orders. We have developed a proof technique for our
first test at line 4 is evaluated to false, causing it to enter the operational semantics with a notation that follows conven-
critical section.5 tional proofs of sequentially consistent memory as much as
possible. Finally, we have applied this technique to an exam-
Proof. We consider the first test at line 4, flagt = false, in the ple verification.
case where the test fails (the success case is very simple). There is a large body of related work; here, we provide a
Let (P ′ , σ ′) be the configuration after the step in question brief snapshot. There are several works aimed at providing
Assume that P.pc t = 4, P ′ .pct = 5, and e = Rt (flag t̂ , false). operational semantics for a larger subset of C11, including
Because e is not a write and the value of pct̂ does not models that aim to address the so-called thin-air problem
change, it is easy to use the NoMod and NoModOrd rules (that we rule out by the No-Thin-Air axiom), which invari-
to show that each invariant except for (9) is preserved. We ably lead to more complex semantics. Nienhuis et al. [20]
now prove that (9) is preserved. We do so by proving that provide a semantics that supports inductive reasoning, but
σ′
turn = tˆ t under the assumption that P ′ .pctˆ ∈ {4, 5, 6}. Be- they are forced to consider an order that does not include sb.
cause P.pc tˆ = P ′ .pctˆ , we have P.pc tˆ ∈ {4, 5, 6}. Further- This complicates a verification technique that follows pro-
more, by Lemma 5.3 and the fact that e = Rt (flagtˆ , false) and gram order. Kang et al. [13] develop an operational model
σ
m = σ .last(flagtˆ ) the assertion flag tˆ =t true is false. Thus aimed at handling cycles in sb ∪ rf. Again, their sophisti-
σ
by Invariant 8, we have turn =t̂ t. Then, from rule NoMod, cated model handles a larger subset of the C11 language,
σ′ but at the cost of a more complicated state space and transi-
and the fact that e is not a write, we have turn = tˆ t, as re- tion relation. Lahav et al. [14] provide an operational model
quired. for a stronger release-acquire model, where sb ∪ rf ∪ mo is
required to be acyclic.
These invariants are sufficient to prove that Peterson’s
Kang et al. [13] provide a basic program logic for prov-
Algorithm satisfies the mutual exclusion property.
ing invariants; using their semantics in verification remains
5 The full proof is available in the extended version. 6 Our notion of observability differs from those defined in [8, 12].
10
an open problem. Jagadeesan et al. [10] develop an opera- [13] J. Kang, C.-K. Hur, O. Lahav, V. Vafeiadis, and D. Dreyer. 2017.
tional semantics for capable of coping with out-of-order ex- A promising semantics for relaxed-memory concurrency. In POPL.
ecutions for the Java memory model. However, their work ACM, 175–189.
[14] O. Lahav, N. Giannarakis, and V. Vafeiadis. 2016. Taming release-
aims to support Java compiler optimisations and they do not acquire consistency. In POPL, R. Bodík and R. Majumdar (Eds.). ACM,
consider program verification. One avenue for future work 649–662.
is to see how our notions of determinate-value and variable- [15] O. Lahav and V. Vafeiadis. 2015. Owicki-Gries Reasoning for Weak
ordering assertions might be applied to a more sophisticated Memory Models. In ICALP (2) (Lecture Notes in Computer Science),
semantics [10, 13]. Vol. 9135. Springer, 311–323.
[16] O. Lahav and V. Vafeiadis. 2016. Explaining Relaxed Memory Models
Concurrent separation logic (CSL) provides a different ap- with Program Transformations. In FM (LNCS), J. S. Fitzgerald, C. L.
proach to verification, and several frameworks have been de- Heitmeyer, S. Gnesi, and A. Philippou (Eds.), Vol. 9995. 479–495.
veloped for dealing with C11-style weak memory [6, 7, 11, [17] O. Lahav, V. Vafeiadis, J. Kang, C.-K. Hur, and D. Dreyer. 2017. Repair-
12]. These frameworks typically deal with a fragment of C11 ing sequential consistency in C/C++11. In PLDI, A. Cohen and M. T.
Vechev (Eds.). ACM, 618–632.
containing release/acquire operations but that is not compa-
[18] L. Lamport. 1979. How to Make a Multiprocessor Computer That
rable to the fragment of our model. Weak-memory CSL has Correctly Executes Multiprocess Programs. IEEE Trans. Computers
been a very active area of research for several years, and we 28, 9 (1979), 690–691.
refer the reader to the introduction of [12] for an excellent [19] C. Lidbury and A. F. Donaldson. 2017. Dynamic race detection for
review. C++11. In POPL. ACM, 443–457.
[20] K. Nienhuis, K. Memarian, and P. Sewell. 2016. An operational seman-
tics for C/C++11 concurrency. In OOPSLA. ACM, 111–128.
Acknowledgments [21] S. S. Owicki. 1975. Axiomatic Proof Techniques for Parallel Programs.
Garland Publishing, New York.
This work has been supported by EPSRC grants EP/R032556/1
[22] A. Podkopaev, I. Sergey, and A. Nanevski. 2016. Operational Aspects
and EP/M017044/1, and DFG grant WE 2290/12-1. We thank of C/C++ Concurrency. CoRR abs/1606.01400 (2016). arXiv:1606.01400
John Wickerson for helpful discussions. We also thank our [23] J. Wickerson, M. Batty, T. Sorensen, and G. A. Constantinides. 2017.
anonymous reviewers and shepherd (Viktor Vafeiadis) for Automatically comparing memory consistency models. In POPL.
their comments, which have helped improve the paper. ACM, 190–204.
[24] A. Williams. 2018. https://www.justsoftwaresolutions.co.uk/threading/petersons_lock_
Accessed: 2018-06-20.
References
[1] S. V. Adve and K. Gharachorloo. 1996. Shared Memory Consistency
Models: A Tutorial. IEEE Computer 29, 12 (1996), 66–76.
[2] J. Alglave, L. Maranget, and M. Tautschnig. 2014. Herding Cats: Mod-
elling, Simulation, Testing, and Data Mining for Weak Memory. ACM
Trans. Program. Lang. Syst. 36, 2 (2014), 7:1–7:74.
[3] M. Batty, M. Dodds, and A. Gotsman. 2013. Library abstraction for
C/C++ concurrency. In POPL, R. Giacobazzi and R. Cousot (Eds.).
ACM, 235–248.
[4] M. Batty, A. F. Donaldson, and J. Wickerson. 2016. Overhauling SC
atomics in C11 and OpenCL. In POPL. ACM, 634–648.
[5] M. Batty, S. Owens, S. Sarkar, P. Sewell, and T. Weber. 2011. Mathe-
matizing C++ concurrency. In POPL, T. Ball and M. Sagiv (Eds.). ACM,
55–66.
[6] M. Doko and V. Vafeiadis. 2016. A Program Logic for C11 Memory
Fences. In VMCAI (LNCS), Vol. 9583. Springer, 413–430.
[7] M. Doko and V. Vafeiadis. 2017. Tackling Real-Life Relaxed Concur-
rency with FSL++. In ESOP. 448–475.
[8] D. S. Fava, M. Steffen, and V. Stolz. 2018. Operational Semantics of a
Weak Memory Model with Channel Synchronization. In FM (LNCS),
Vol. 10951. Springer, 258–276.
[9] M. He, V. Vafeiadis, S. Qin, and J. F. Ferreira. 2016. Reasoning about
Fences and Relaxed Atomics. In PDP. IEEE Computer Society, 520–
527.
[10] R. Jagadeesan, C. Pitcher, and J. Riely. 2010. Generative Operational
Semantics for Relaxed Memory Models. In ESOP (Lecture Notes in
Computer Science), Vol. 6012. Springer, 307–326.
[11] R. Jung, D. Swasey, F. Sieczkowski, K. Svendsen, A. Turon, L. Birkedal,
and D. Dreyer. 2015. Iris: Monoids and Invariants as an Orthogonal
Basis for Concurrent Reasoning. In POPL. ACM, 637–650.
[12] J.-O. Kaiser, H.-H. Dang, D. Dreyer, O. Lahav, and V. Vafeiadis. 2017.
Strong Logic for Weak Memory: Reasoning About Release-Acquire
Consistency in Iris. In ECOOP. 17:1–17:29.
11
A Proofs of Section 4.2 and (e, e ′′ ) ∈ ecoi +1 . However, this would mean
Theorem 4.4. Let σ = ((D, sb), rf, mo) be a C11 state reach- we have w < OWσi (t), and hence the edge (w, e) ∈
able from σ0 using relation RA . Then σ satisfies SB-Total, moi +1 cannot exist.
MO-Valid, RF-Complete, No-Thin-Air and Coherence. 2. There is a path with edges (w ′, e ′′ ) ∈ ecoi?+1 and
(e ′′, w) ∈ hbi +1 . This potentially creates a reflexive
Proof of Theorem 4.4. By induction on the number of steps edge, via the composition of edges (e ′′, w) ∈ hbi +1
executed to reach σ . and (w, e ′′ ) ∈ ecoi +1 . However, this also means that
Induction base. The initial state σ0 satisfies all conditions we have edges (w, w ′) ∈ moi , (w ′, e ′′ ) ∈ ecoi? and
as all relations are empty and there are no read event in σ0 . (e ′′, w) ∈ hbi , i.e., hbi ; ecoi? is reflexive, which is a
contradiction.
Induction step. Let σi be a C11 state reachable in i steps
3. There is a path with edges (w ′, e ′′ ) ∈ ecoi?+1 and
that fulfils the C11 consistency conditions. Let σi e C 11
(e ′′, r ) ∈ hbi +1 . This case is similar to the one above.
σi +1 . We need to show that σi +1 satisfies all conditions.
• e is an update event. This introduces edges (e ′, e) ∈
SB-Total: Follows from definition of + and induction hy- sbi +1 , (w, e) ∈ rf i +1 , (w, e) ∈ moi +1 . The proof is simi-
pothesis. lar to the proofs of the read and write cases.
MO-Valid: Follows from definition of mo[w, e] and induc- We now show ecoi +1 is irreflexive, assuming ecoi is irreflex-
tion hypothesis.
ive. We perform case analysis on the type of event e.
RF-Complete: Follows from rules Read and RMW and in-
• e is a read event. This introduces eco edges (w, e) ∈
duction hypothesis.
rf i +1 and (e, w ′) ∈ fri +1 for each w ′ such that (w, w ′) ∈
No-Thin-Air: Let σi = ((D i , sbi ), rf i , moi ), assume (by in- moi . If ecoi +1 is reflexive, we must have an edge (w ′, w) ∈
duction hypothesis) that sbi ∪ rf i is acyclic and consider the ecoi +1 . But this means we have edges (w ′, w) ∈ ecoi
introduction of element e to σi . For each rule in Figure 3, e and (w, w ′) ∈ moi ⊆ ecoi +1 , i.e., ecoi is reflexive,
is maximal in sbi +1 . Thus sbi +1 ∪ rf i is acyclic. Moreover, if which is a contradiction.
e is a write rf i +1 = rf i , and if e is a read or an update, e is • e is a write event. This introduces eco edges (w, e) ∈
maximal in rf i +1 , and hence, sbi +1 ∪ rf i +1 is acyclic. moi +1 and (r , e) ∈ fri +1 . If e is maximal in moi +1 we are
Coherence. Assume hbi ; ecoi? is irreflexive. Consider case done as ecoi +1 cannot be reflexive. Otherwise, there
distinction on the type of event e. is an path that leaves e via an edge (e, w ′) ∈ moi +1 .
• e is a read event. This introduces edges (e ′, e) ∈ sbi +1 , Then we either have a path with edge (w ′, w) ∈ ecoi +1
(w, e) ∈ rf i +1 and (e, w ′) ∈ fri +1 for each w ′ such that or a path with edge (w ′, r ) ∈ ecoi +1 . However, both
(w, w ′) ∈ moi . If we have a cycle in hbi +1 ; ecoi?+1 , the contradict the assumption that ecoi is irreflexive.
cycle has to pass through e, and therefore also leave e • e is an update event. This introduces eco edges (w, e) ∈
via some (e, w ′) ∈ fri +1 edge (since these are the only rf i +1 , (w, e) ∈ moi +1 , as well as (e, w ′) ∈ moi +1 and
outgoing edges from e). There are two cases: (e, w ′) ∈ fri +1 for each w ′ such that (w, w ′) ∈ moi .
1. There is a path with edges (w ′, e ′′ ) ∈ ecoi?+1 and If ecoi +1 is reflexive, we must have an edge (w ′, w) ∈
(e ′′, e ′ ) ∈ hbi?+1 for some e ′′ ∈ D i . ecoi +1 . But this means we have edges (w ′, w) ∈ ecoi
2. There is a path with edges (w ′, e ′′ ) ∈ ecoi?+1 and and (w, w ′) ∈ moi ⊆ ecoi +1 , i.e., ecoi is reflexive,
(e ′′, w) ∈ hbi?+1 and (w, e) ∈ swi?+1 (due to the events which is a contradiction.
w and e synchronising via R and A synchronisation).
Both cases potentially create a reflexive edge via the
composition of edges (e ′′ , e) ∈ hbi +1 and (e, e ′′ ) ∈ e1 e2
Lemma 4.7. Let Q = (P0 , γ 0 ) =⇒PE (P1 , γ 1 ) =⇒PE . . . ==⇒PE
ek
ecoi +1 . However, we now have w ′ ∈ EWσi (t) and since (Pk , γk ) and γk = (D k , sbk ). Then for every linearization f 1 , . . . , fk
(w, w ′) ∈ mo, we have w < OWσi (t) and hence the of sbk , there exist programs P1′ , . . . , Pn−1
′ and pre-execution
edge (w, e) ∈ rf i +1 cannot exist, giving rise to a con- ′ ′
states γ 1 , . . . , γn−1 such that
tradiction.
• e is a write event. This introduces edges (e ′, e) ∈ sbi +1 , f1 f2 fk
⇒PE (P1′ , γ 1′) =
(P0 , γ 0 ) = ⇒PE . . . =⇒PE (Pk , γk ) .
(w, e) ∈ moi +1 and (r , e) ∈ fri +1 for any read r in D i
that reads var(e). If e is maximal in moi +1 we are done
as hbi +1 ; ecoi?+1 cannot be reflexive. Otherwise, there
must be an edge that leaves e via an edge (e, w ′) ∈ Proof of Lemma 4.7. Let F = f 1 , . . . , fk and E = e 1 , . . . , ek ,
moi +1 . We have two cases: and let δi represent the pair (Pi , γi ). Suppose f 1 , the first
1. There is a path with edges (w ′, e ′′ ) ∈ ecoi?+1 and element of F is the element ei , the ith element of E. We show
(e ′′, e ′ ) ∈ hbi?+1 . This potentially creates a reflexive that Q can be transformed into a valid sequence of PE steps
edge, via the composition of edges (e ′′ , e) ∈ hbi +1 such that ei is the first event considered. By definition, we
12
have that Q is the sequence: where σi = (γk , rf k , mok )↓{e1, ...,ei } , 0 < i < k.
e1 e i −1 ei ek
δ 0 =⇒PE . . . δi −2 ===⇒PE δi −1 =⇒PE δi . . . ==⇒PE δk . Proof of Theorem 4.8. By induction on the number of steps.
Since both E and F are a linearizations of sbk , we have the Base case. The initial configurations agree and hence the
property: claim holds for 0 steps.
Induction step. Let the above claim hold for sequences up
∀j. 0 ≤ j ≤ i − 1 ⇒ tid(e j ) , tid(ei ) (11)
to length j. We perform a case split on the type of event
By (11), we have in particular that tid(ei −1 ) , tid(ei ). Thus e j+1 < D j .
ei
by Proposition 4.6 there must exist a δi′−1 such that δi −2 =⇒PE 1. e j+1 is a read event of a thread t, i.e., tid(e j+1 ) = t. Let
e i −1
δi′−1 ===⇒PE δi , and hence, a valid pre-execution sequence w be the write event that e j+1 reads from, i.e., (w, e j+1 ) ∈
e1 ei e i −1 ek
rf k . We known w ∈ D j since we consider elements in
δ 0 =⇒PE . . . δi −2 =⇒PE δi′−1 ===⇒PE δi . . . ==⇒PE δk . sbk ∪ rf k order.
Again by property (11), we have that tid(ei −2 ) , tid(ei ) and We need to show that w ∈ OWσ j (t). The proof is by
the process above can be repeated so that we obtain: contradiction. Assume w < OWσ j (t), then there ex-
e1 ei e i −2 e i −1 ek
ists a w ′ ∈ EWσ j (t) such that (w, w ′) ∈ moj . Hence
δ 0 =⇒PE . . . =⇒PE δi′−2 ===⇒PE δi′−1 ===⇒PE δi . . . ==⇒PE δk . (e j+1 , w ′) ∈ frk and there exists some e such that (w ′, e) ∈
Further repeating this process, we obtain: ecok? and (e, e j+1 ) ∈ hbk? . There are three possibilities:
ei e1 e i −2 e i −1 • (w ′, e), (e, e j+1 ) ∈ Id. This is an immediate contra-
δ 0 =⇒PE δ 1′ =⇒PE δ 2′ . . . δi′−2 ===⇒PE δi′−1 ===⇒PE diction since it implies w ′ = e j+1 .
e i +1 ek
δi ===⇒PE . . . ==⇒PE δk . • (w ′, e) ∈ ecok and (e, e j+1 ) ∈ Id, i.e., e = e j+1 . This
contradicts the assumption that ecok is irreflexive.
which (since ei = f 1 ) is equivalent to:
• (w ′, e) ∈ ecok? and (e, e j+1 ) ∈ hbk . We then have
f1 e1 e i −2 e i −1
⇒PE δ 1′ =⇒PE δ 2′ . . . δi′−2 ===⇒PE δi′−1 ===⇒PE
δ0 = (e j+1 , e) ∈ ecok resulting in a contradiction to the
e i +1 ek assumption that hbk ; ecok? is irreflexive.
δi ===⇒PE . . . ==⇒PE δk . The contradictory scenario is depicted by the follow-
We can now repeat the entire process for f 2 using the prop- ing diagram:
erty and percolate the element it corresponds to in E it to the w rf k frk
correct position in F since f 2 , ei , i.e. f 2 corresponds to an
element in {e 1 , . . . ek }\{ei }. Assuming that f 2 corresponds mok
to position i ′ in E, we have property ∀j. 1 ≤ j ≤ i ′ − 1 ⇒ w′ e e j+1
tid(e j ) , tid(ei ′ ) (analogous to (11)), where the lower index
ecok? hbk?
is increased by 1 and upper index is adjusted to i ′ − 1. Once 2. Suppose e j+1 is a write event and w is the immediate
f 2 is in position, we can repeat for f 3 and so forth. predecessor of e j+1 in mok . Note that w may either be
We now show that for every justifiable pre-execution there a write or an update event. We must show that it is
e j +1
is an execution of the C11 semantics that ends in the C11 possible to take a RA step such that e j+1 is placed
state justifying the pre-execution. The theorem uses a no- immediately after w. To this end, we must show that
tion that restricts pre-executions and C11 executions to a w ∈ OWσ j (t).
set of events. For a set of events E ⊆ D, we define: Suppose not, i.e., w < OWσ j (t). Then there exists an
(D, sb)↓E = (E, sb ∩ (E × E)) event w ′ ∈ EWσ j (t) such that (w, w ′) ∈ mo and an
(γ , rf, mo)↓E = (γ ↓E , rf ∩ (E × E), mo ∩ (E × E)) event e such that (w ′, e) ∈ ecok? and (e, e j+1 ) ∈ hbk? .
Since we have assumed w is an immediate predecessor
In the completeness proof, we assume that the given pre- of e j+1 in mok and that (w, w ′) ∈ mok , we must have
e1 e2 ek
execution sequence (P0 , γ 0 ) =⇒PE (P1 , γ 1 ) =⇒PE . . . ==⇒PE (e j+1 , w ′). There are three possibilities:
(Pk , γk ) has been reordered such that e 1 . . . ek is a lineariza- • (w ′, e), (e, e j+1 ) ∈ Id. This is an immediate contradic-
tion of sbk ∪rf k , where rf k is the reads-from relation used in tion since it implies w ′ = e j+1 and we have assumed
the justification of γk . Such a linearization is possible since w ′ ∈ D j , e j+1 < D j .
sbk ∪ rf k is acyclic (axiom No-Thin-Air). • (w ′, e) ∈ ecok and (e, e j+1 ) ∈ Id, i.e., e = e j+1 . This
e1 e2 ek contradicts the assumption that ecok is irreflexive.
Theorem 4.8.Suppose (P0 , γ 0 ) =⇒PE (P1 , γ 1 ) =⇒PE . . . ==⇒PE
• (w ′, e) ∈ ecok? and (e, e j+1 ) ∈ hbk . We then have
(Pk , γk ) such that γk = (D k , sbk ) is justifiable with justifica-
(e j+1 , e) ∈ ecok resulting in a contradiction to the
tion σk = (γk , rf k , mok ) and e 1 , . . . , ek is a linearization of
assumption that hbk ; ecok? is irreflexive.
sbk ∪ rf k . Then
e1 e2 ek
The contradictory scenario is depicted by the follow-
(P0 , σ0 ) =⇒RA (P1 , σ1 ) =⇒RA . . . ==⇒RA (Pk , σk ) ing diagram:
13
w mok mok Proof. If property 1 holds, then OWσ (t) |x = {σ .last(x)}, and
thus m = σ .last(x). If property 2 holds, then every modifica-
mok
w′ e e j+1 tion to x is covered, except the last. Thus, because m is not
ecok? hbk? covered, m = σ .last(x).
We also need to show that w selected is not covered. B.2 Soundness of determinate-value and
Again assume the contrary: there exists some update variable-order assertions
event u such that (w, u) ∈ rf k . Then (w, u) ∈ mok For simplicity, we copy Figure 4 as Figure 5. We refer to the
as well. Hence there is an edge (u, e j+1 ) ∈ frk . Since set in condition (3) of Definition 5.1 as the happens-before
the update u and e j+1 write to the same location, they cone of t in σ , and hence define:
need to be mo-ordered. Here we have two cases:
σ .hbc(t) = I σ ∪ {e | ∃e ′ . tid(e) = t ∧ (e, e ′ ) ∈ σ .hb? }
• If (u, e j+1 ) ∈ mok , then w is not the immediate pre-
decessor of e j+1 in mok . Lemma B.1. Init is valid.
• If (e j+1 , u) ∈ mok , then the frk edge and the mok
Proof. We have σ0 = ((I, ∅), ∅, ∅). We have three sub proofs
edge together form a cyle, contradicting irreflexiv-
ity of ecok . 1. Since mo = ∅, we have OWσ0 (t) = I , and also |I |x | = 1
3. Suppose e j+1 is an update event and w is the imme- and hence I |x = {σ0 .last(x)} = OWσ0 (t) |x .
diate predecessor of e j+1 in mok . We must show that 2. Trivial by definition.
e j +1 3. Immediate since σ .last(x) ∈ I .
it is possible to take a RA step such that e j+1 is
placed immediately after w. This case is a combina-
tion of the read and write cases, namely if we assume Lemma B.2 (Establish Determinate-Values). For any reach-
w < OWσ j (t), then there must exist a w ′ and e as m,e
able transition (P, σ ) =⇒RA (P ′, σ ′), the rules NoMod, Mod-
shown in the diagram below, which is a contradiction. Last, Transfer and AcqRead from Figure 5 hold.
w mok mok , frk
Proof.
mok NoMod. This is easy to check since σ .last(x) = σ ′ .last(x)
w′ e e j+1 and OWσ (t) |x = OWσ ′ (t) |x .
ecok? hbk?
ModLast. Since m = σ .last(x), the new modification e is
added to the end of mo |x , so that σ ′ .last(x) = e. Because
e ∈ OWσ ′ (t) |x and e is mo-after every other modification
B Proofs for Section 5 in σ ′ .mo |x , OWσ ′ (t) |x = {e}. Finally, because tid(e) = t,
B.1 Proofs of lemmas e ∈ {e | ∃e ′ . tid(e) = t ∧ (e, e ′ ) ∈ σ .hb? }.
σ
Lemma 5.3. (Determinate-Value Read) For any Read or RMW Transfer. First note that because x → y we have
m,e ′ ′ σ
transition (P, σ ) =⇒RA (P , σ ), if var(e) =tid(e ) v, then rdval(e) = (σ .last(x), σ .last(y)) ∈ σ .hb . (12)
v. Because hb is irreflexive, we also have x , y, and thus by
σ
Proof. By the definition of var(e) =tid(e ) v, we know m = NoMod:
σ .last(x). Both the Read or RMW transitions stipulate that σ ′ .last(x) = σ .last(x) . (13)
′
By (12) and (13), we have (σ .last(x), σ .last(y)) ∈ σ .hb. More-
the value read is the rdval(e) = wrval(m) = v.
over,
Lemma 5.4. (Determinate-Value Agreement) For any threads (σ ′ .last(x), σ .last(y)) ∈ σ .hb
′ ′ σ σ ′
t, t location x, and values v, v , if x =t v and x =t ′ v then ⇒ (σ ′ .last(x), σ .last(y)) ∈ σ ′ .hb because σ .hb ⊆ σ ′ .hb
v = v ′, and thus t and t ′ agree on the value of x.
⇒ (σ .last(y), e) ∈ σ ′ .hb ∧ assumption
σ σ
Proof. By x =t v and x =t ′ v ′, we have OWσ (t) = OWσ (t ′ ) = (σ ′ .last(x), e) ∈ σ ′ .hb (σ .last(y), e) ∈ sw
{σ .last(x)}, and thus v = v ′. Therefore,
σ ′ .last(x) ∈ σ ′ .hbc(tid(e))
Lemma 5.6. Let t = tid(e) and x = var(e) for some event
m,e This proves the third property of the determinate-value as-
e. For any reachable transition (P, σ ) =⇒RA (P ′ , σ ′), m = sertion. We prove the two remaining properties of the determinate-
σ .last(x) if either of the following conditions hold: value assertion:
σ
1. x =t v, for some value v, or • Since σ ′ .last(x) ∈ σ ′ .hbc(tid(e)), we have σ ′ .last(x) ∈
2. x is an update only location in σ . EWσ (t), and therefore σ ′ .OW(t) |x = {σ ′ .last(x)}.
14
σ
x = var(e) y = var(e) x → y m ∈ WrR|y
σ e ∈ U |y
σ0 = ((I, ∅), ∅, ∅) e ∈ Wr |x x =t v (m, e) ∈ sw
σ
I ⊆ IWr m = σ .last(x) m = σ .last(y) x →y
Init ModLast Transfer UOrd
σ0 σ′ σ′ σ′
x = t wrval(σ0 .last(x)) x = tid(e ) wrval(e) x = tid(e ) v x →y
e < Wr |x x = var(e) e ∈ RdA |x x , y e ∈ Wr |y e < Wr | {x,y }

σ σ σ
x =t v m ∈ WrR |x m = σ .last(x) x =tid(e ) v m = σ .last(y) x →y
NoMod AcqRd WOrd NoModOrd
σ′ σ′ σ′ σ′
x =t v x = tid(e ) rdval(e) x →y x →y
m,e
Figure 5. Rules for determinate-value and variable-order assertions. We assume σ , m, e, σ ′ satisfy (_, σ ) =⇒RA (_, σ ′).
• By (13), wrval(σ ′ .last(x)) = v iff wrval(σ .last(x)) = v, it follows that (σ ′ .last(x), σ ′.last(y)) ∈ σ ′ .hb, or equivalently
σ σ′
which is true because x =t ′ v. x → y.
AcqRd. We know σ ′ .mo |x = σ .mo |x and thus σ ′ .last(x) =
σ .last(x). Therefore by the assumption m = σ .last(x), we UpdOrd. There are two cases to consider. First, assume m ,
σ
have m = σ ′ .last(x). Because m ∈ EWσ ′ (t) and m is maxi- σ .last(y). In this case, σ ′ .last(y) = σ .last(y). Because x → y,
mal in σ ′ .mo |x we have σ ′ .OW(t) |x = {m} = {σ ′ .last(x)} we have (σ .last(x), σ .last(y)) ∈ σ .hb ⊆ σ ′ .hb, and thus
by definition of OWσ ′ . The fact that rdval(e) = wrval(m) fol- σ
(σ .last(x), σ ′.last(y)) ∈ σ ′ .hb. Because x → y (by the ir-
lows from the premises of rules Read and RMW. Finally, we reflexivity of hb), x and y are distinct variables and thus e <
have (m, e) ∈ sw ⊆ σ ′ .hb thus σ ′ .last(x) ∈ σ .hbc(t). Wr |x . Therefore, σ ′ .last(x) = σ .last(x), so (σ .last(x), σ ′.last(y)) ∈
σ ′ .hb as required.
Lemma B.3 (Establish Location-Order). For any reachable
m,e Second, assume m = σ .last(y). In this case σ ′ .last(y) = e.
transition (P, σ ) =⇒RA (P ′ , σ ′), the rules WriteOrd and Furthermore, because m ∈ WrR |y and e is an update (which
NoModOrd hold. is acquiring) we have (m, e) ∈ sw and therefore (m, e) ∈
σ
Proof. σ ′ .hb. Because x → y we have (σ .last(x), σ .last(y)) ∈ σ .hb ⊆
WriteOrd. Note that σ .last(x) = σ ′ .last(x) since x , y and σ ′ .hb and thus, (σ .last(x), m) ∈ σ ′ .hb so (σ .last(x), e) ∈
σ
e ∈ Wr |y . By x =tid(e ) v, we have σ .last(x) ∈ σ .hbc(tid(e)). σ ′ .hb by transitivity. Finally, because σ ′ .last(x) = σ .last(x)
Expanding the definition of hbc and reformulating slightly, and σ ′ .last(y) = e we have (σ ′ .last(x), σ ′.last(y)) ∈ σ ′ .hb
we see that as required.
σ .hbc(tid(e)) = I σ ∪ {e ′ | tid(e ′) = tid(e)} ∪

{e ′ | ∃e ′′ . tid(e ′′ ) = tid(e) ∧ (e ′, e ′′ ) ∈ σ .hb} C Relationship with Canonical C11
In this section, we describe the relationship between the ver-
Thus, there are three cases to consider. 1. σ .last(x) ∈ I σ . In
σ′ sion of the C11 semantics given in Section 4 and that of
this case, we get x → y since we will have [4], on which it is closely based. The semantics of [4] uses
(σ ′ .last(x), σ ′.last(y)) ∈ σ ′ .sb ⊆ σ ′ .hb. a notion of candidate execution as described below. We fo-
cus on the relationship between our notion of validity (Def-
2. tid(σ .last(x)) = tid(e). By transition rules Write and inition 4.2) of this paper, and their notion of consistency,
RMW of our operational semantics, (σ .last(x), e) ∈ σ ′ .sb which we call canonical consistency in this appendix. We
σ′ prove that, for any candidate execution, validity without the
and hence (σ ′ .last(x), e) ∈ σ ′ .hb, or equivalently x → y.
NoThinAir axiom and a version of canonical consistency
3. There exists an e ′ with tid(e ′) = tid(e) and (σ .last(x), e ′) ∈ (described below) are equivalent.
σ .hb. Because σ ′ .last(x) = σ .last(x) and σ .hb ⊆ σ ′ .hb we Batty et al [4] use a notion of candidate execution (Defini-
have (σ ′ .last(x), e ′) ∈ σ ′ .hb. By the modification transitions tion 7, [4]), which gives certain well-formedness conditions
of our operational semantics we also have (e ′, e) ∈ σ ′ .sb, on executions. For the purposes of this appendix, we define
and thus (putting the two together) we have (σ ′ .last(x), e) ∈ candidate executions as follows:
σ ′ .hb as required.
NoModOrd. This is easy to check since because e is not a Definition C.1 (Candidate Execution). A tuple ((D, sb), rf, mo)
modification of x or y we have σ ′ .last(x) = σ .last(x) and is a candidate execution if it satisfies the conjunction of the
σ ′ .last(y) = σ .last(y). Now, because (σ .last(x), σ .last(y)) ∈ conditions RF-Complete, MO-Valid and SB-Total of Defi-
σ nition 4.2 in our current paper.
σ .hb (the content of x → y) and the fact that σ .hb ⊆ σ ′ .hb,
15
Minor variations in presentation prevent us from claim- that explicitly requires the irreflexivity of the rf relation.
ing that the definition just given is strictly equivalent to Def- This second change is purely presentational, and does not
inition 7 of [4]. Principally, [4] employs an equivalence re- change the strength of the semantics.
lation to determine when two operations are on the same
Definition C.3 (Weak Canonical RAR Consistency). A can-
thread, whereas we index operations with a thread identi-
didate execution D = (D, sb, rf, mo) is canonically consis-
fier. Another difference is that Batty et al. [4] define the hb
tent if all the following conditions hold
relation such that initialising writes are hb-prior to all other
events, whereas we stipulate that initialising writes are sb- irrefl(hb) (HB)
prior to all other events (thus ensuring the hb-ordering indi- −1 ? ?
irrefl((rf ) ; mo; rf ; hb) (COH)
rectly). With these caveats aside, the definition of candidate
execution given here is essentially the same as that of [4]. irrefl(rf; hb) (RF)
Let (D, sb, rf, mo) be a candidate execution. As is true for irrefl(rf) (RFI)
all versions of the C11 memory model, canonical consis- −1
irrefl((mo; mo; rf ) ∪ (mo; rf)) (UPD)
tency is defined in terms of the happens-before relation, which
in turn is defined in terms of the synchronises-with relation. where hb is defined from D as usual.
The synchronises-with relation of [4], which we call canon-
As we shall see, the validity condition irrefl(eco? ; hb) (as
ical synchronises-with and denote by swC is slightly larger
used in the coherence condition of Definition 4.2 in our pa-
than our definition
per) captures the collective effect of conditions HB, COH
sw ⊆ swC and RF. The condition UPD, which we sometimes call update
The extra edges in swC relate to the so-called release se- atomicity, requires that each update appears in mo-order im-
quences, which we have ignored in our presentation. The mediately after the write that the update reads from. As we
effect of this relaxation is that our version defines a weaker shall see, the validity condition irrefl(eco) implies update
semantics, with more valid executions. atomicity, and for any candidate execution, the update atom-
The happens-before relation in [4], which we call canon- icity property implies irrefl(eco).
ical happens-before and denote hbC , is defined as follows The following lemma follows easily from the fact that
hbC = (sb ∪ (I × ¬I ) ∪ swC )+ hb ⊆ hbC .
where ¬I is the complement of the set of initialising writes. Lemma C.4. For any candidate execution
In our version of the semantics, I × ¬I ⊆ sb, and thus sb ∪ D = ((D, sb), rf, mo),
(I × ¬I ) = sb so
if D is canonical consistent, then it is weakly canonical consis-
hbC = (sb ∪ swC )+ tent.
similar to our definition. Thus, because sw ⊆ swC , hb ⊆
From now on, we consider only weak canonical consis-
hbC .
tency. Thus, when we refer to properties HB, COH, RF, and
We now present the definition of consistency given in [4]
UPD we mean those of weak canonical consistency.
as it relates to the RAR fragment.
For the remainder of the section, we work towards prov-
Definition C.2 (Canonical RAR Consistency). A candidate ing the following theorem
execution D = (D, sb, rf, mo) is canonically consistent if all
Theorem C.5. For any candidate execution
the following conditions hold
D = ((D, sb), rf, mo),
irrefl(hbC ) (HB-C)
irrefl((rf −1 )? ; mo; rf ? ; hbC ) (COH-C) D is weakly canonical consistent iff D satisfies Coherence of
Definition 4.2 on page 7.
irrefl(rf; hbC ) (RF-C)
As we shall see, much of our proof is about reformulating
irrefl(rf ∪ (mo; mo; rf −1 ) ∪ (mo; rf)) (UPD-C)
the various relations and axioms that make-up the canonical
where hbC is defined from D as above. memory model.
The following lemma provides a more convenient form
To account for the fact that sw ⊆ swC , and thus hb ⊆ hbC ,
for the UPD property.
we give a slightly weaker notion of canonical consistency,
called weak canonical RAR consistency, which we prove equiv- Lemma C.6. For any candidate execution
alent to our notion of validity. This weaker condition is ob-
D = ((D, sb), rf, mo),
tained from the stronger by replacing hbC by hb. Also, to
simplify presentation of the proof, we split the condition the UPD condition (that is, irrefl((mo; mo; rf −1 ) ∪ (mo; rf))) is
RF-C into two conditions: one called RF, and one called RFI equivalent to irrefl(fr; mo) ∧ irrefl(rf; mo).
16
Proof. First note that for any relations r , s we have both vi fr; fr ⊆ fr
irrefl(r ; s) ⇔ irrefl(s; r ) Proof. • i) Consider (x, y) ∈ rf and (y, z) ∈ fr. Because
irrefl(r ∪ s) ⇔ irrefl(r ) ∧ irrefl(s). rf is one-to-many, rf −1 (y) = x. Because (y, z) ∈ fr,
(rf −1 (y), z) ∈ mo. Therefore, (x, z) ∈ mo as required.
Using these equivalences, UPD is equivalent to
• ii) Consider (x, y) ∈ rf and (y, z) ∈ mo. Because y
irrefl(rf −1 ; mo; mo) ∧ irrefl(rf; mo). has an incoming rf edge it is a read, because it has an
It remains to show that irrefl(rf −1 ; mo; mo) is equivalent to outgoing mo edge, it is a modification, and so y is an
irrefl(fr; mo). Because fr ⊆ rf −1 ; mo, we have update. Thus, by Lemma C.7ii, (x, y) ∈ mo and then
the result follows by transitivity.
fr; mo ⊆ rf −1 ; mo; mo • iii) Consider (x, y) ∈ rf and (y, z) ∈ rf. Because y has
−1 both incoming and outgoing rf edges, it is an update.
and thus if irrefl(rf ; mo; mo) then irrefl(fr; mo).
Finally, we show that if there is a cycle in rf −1 ; mo; mo Thus, by Lemma C.7ii, (x, y) ∈ mo and then the result
then there is also one in fr; mo. Assume that (x, x) ∈ rf −1 ; mo; mo. is immediate.
Then there is some y such that (x, y) ∈ rf −1 ; mo and (y, x) ∈ • iv) Consider (x, y) ∈ mo and (y, z) ∈ fr. Because y has
mo. There are two cases to consider. In the first case, y = x. an incoming mo edge it is a modification, because it
But this is impossible because then we would have (x, x) ∈ has an outgoing fr edge, it is a read, and thus y is an
mo, contrary to the irreflexivity of mo. In the second case, update. Thus, by Lemma C.7i, (y, z) ∈ mo and now
x , y, but then (x, y) ∈ (rf −1 ; mo) − Id = fr so (x, x) ∈ fr; mo (x, z) ∈ mo by transitivity.
and we are done. • v) Let (x, z) ∈ fr; mo. Thus, there is some y , x such
that (x, y) ∈ rf −1 ; mo and (y, z) ∈ mo. Let w be unique
The first lemma says that each update operation can only such that (w, x) ∈ rf and so (w, y) ∈ mo. By transi-
read from its immediate mo predecessor. tivity of mo we have (w, z) ∈ mo and thus (x, z) ∈
Lemma C.7 (Update orderings). For any candidate execu- rf −1 ; mo. It remains to show that x , z. Assume oth-
tion (D, sb, rf, mo), satisfying UPD the following properties erwise. Then x is an update (because it has both an
hold for any update u ∈ D and event x ∈ D: incoming rf edge and an incoming mo edge). Now, by
i Lemma C.7i, because (x, y) ∈ fr, (x, y) ∈ mo and thus
(x, z) ∈ mo. But now x , y by the irreflexivity of mo.
(u, x) ∈ fr =⇒ (u, x) ∈ mo
• vi) Consider (x, y) ∈ fr and (y, z) ∈ fr. Because y
ii has an incoming fr edge it is a modification, and be-
(x, u) ∈ rf =⇒ (x, u) ∈ mo cause y has an outgoing fr edge it is a read. Thus,
Proof. Note first that mo must order u and x (in some direc- y is an update and by Lemma C.7i, (y, z) ∈ mo. So
tion). This is because var(u) = var(x), u is a modification, x now (x, z) ∈ fr; mo so by Property v of this Lemma,
is a modification (because it either has an incoming fr edge (x, z) ∈ fr.
or an outgoing rf edge) and mo is total over modifications
to the same location. Therefore, it is sufficient to derive a The next lemma presents a “closed-form” for the eco re-
contradiction from the assumption that mo orders the two lation, in which eco is defined without use of a transitive
operations the “wrong” way. closure, providing a simple set of cases that must be con-
Assume for Property i that (u, x) ∈ fr and (x, u) ∈ mo. sidered when analysing the relation. This is inspired by a
But then (u, u) ∈ fr; mo contrary to the UPD property, as similar expression in [17].
formulated in Lemma C.6.
Assume for Property ii that (x, u) ∈ rf and (u, x) ∈ mo. Lemma C.9 (eco cases). For any semi-consistent execution
But then (x, x) ∈ rf; mo contrary to the UPD property, as (D, sb, rf, mo), with update atomicity
formulated in Lemma C.6. eco = rf ∪ mo ∪ fr ∪ (mo; rf) ∪ (fr; rf)
We next need some properties about the structure of eco. Proof. Let
Lemma C.8 (Coherence inclusions). For any candidate ex- eco′ = rf ∪ mo ∪ fr ∪ (mo; rf) ∪ (fr; rf)
ecution (D, sb, rf, mo), that satisfies the UPD property the fol- We show that eco′ = eco.
lowing inclusions hold: Recall that eco = (fr ∪ mo ∪ rf)+ . It is easy to see that
i rf; fr ⊆ mo eco′ ⊆ eco (as each option in the union defining eco′ is
ii rf; mo ⊆ mo included in one or two steps of eco).
iii rf; rf ⊆ mo; rf We prove that eco ⊆ eco′ by induction.
iv mo; fr ⊆ mo Let p be a (nonempty) path through the transitive closure
v fr; mo ⊆ fr eco. Thus for each i such that i + 1 < |p| (where |p| is the
17
length of p) (pi , pi +1) ∈ fr ∪ mo ∪ rf (indexing from 0). We • mo; rf; fr ⊆ eco′ . But
prove by induction on the length of p that (p0, p |p |−1 ) ∈ eco′ , mo; rf; fr ⊆ mo; mo by Lemma C.8i
which is sufficient to show that eco ⊆ eco′ . For the base case,
p contains two elements, p0 and p1 , so we must prove that ⊆ mo by transitivity
′
(p0 , p1 ) ∈ eco′ . But this follows from the fact that ⊆ eco by definition of eco′
fr ∪ mo ∪ rf ⊆ rf ∪ mo ∪ fr ∪ (mo; rf) ∪ (fr; rf) • mo; rf; mo ⊆ eco′ . But
which is clear by inspection. For the induction, assume there mo; rf; mo ⊆ mo; mo by Lemma C.8ii
is some p ′ and x such that p ′ = p; hxi and both ⊆ mo by transitivity
′
(p0 , p |p |−1 ) ∈ eco ′
⊆ eco by definition of eco′
(p |p |−1 , x) ∈ fr ∪ mo ∪ rf • mo; rf; rf ⊆ eco′ . But
We must prove that (p0 , x) ∈ eco′. It is sufficient to show mo; rf; rf ⊆ mo; mo; rf by Lemma C.8iii
that ⊆ mo; rf by transitivity
eco′ ; (fr ∪ mo ∪ rf) ⊆ eco′ ′
But by distributivity of ; over ∪, this is equivalent to
• fr; rf; fr ⊆ eco′ . But
′ ′ ′ ′
(eco ; fr) ∪ (eco ; mo) ∪ (eco ; rf) ⊆ eco
fr; rf; fr ⊆ fr; mo by Lemma C.8i
Expanding the definition of eco′ and applying distributivity
⊆ fr by Lemma C.8v
once again we obtain 15 cases to check: five options in the ′
union defining eco′ combined with each of the three rela- ⊆ eco by definition of eco′
tions fr, mo, rf. The cases, and their proofs are as follows: • fr; rf; mo ⊆ eco′ . But
• rf; fr ⊆ eco′ . But fr; rf; mo ⊆ fr; mo by Lemma C.8ii
rf; fr ⊆ mo by Lemma C.8i ⊆ fr by Lemma C.8v
′ ′ ′
⊆ eco by definition of eco ⊆ eco by definition of eco′
• rf; mo ⊆ eco′ . But • fr; rf; rf ⊆ eco′ . But
rf; mo ⊆ mo by Lemma C.8ii fr; rf; rf ⊆ fr; mo; rf by Lemma C.8iii
′
⊆ eco by definition of eco′ ⊆ fr; rf by Lemma C.8v
′
• rf; rf ⊆ eco′ . But ⊆ eco by definition of eco′
rf; rf ⊆ mo; rf by Lemma C.8iii This completes our proof.
⊆ eco′ by definition of eco′ Lemma C.10 (Weak Canonical RAR Consistency implies
• mo; fr ⊆ eco′ . But eco-irreflexivity). For a candidate execution
mo; fr ⊆ mo by Lemma C.8iv D = ((D, sb), rf, mo),
⊆ eco′ by definition of eco′ if D is weakly canonical consistent then D satisfies irrefl(eco).
• mo; mo ⊆ eco′ . But Proof. Recall from Lemma C.9 that

mo; mo ⊆ mo by transitivity eco = rf ∪ mo ∪ fr ∪ (mo; rf) ∪ (fr; rf)
⊆ eco′ by definition of eco′ Assume for a contradiction that there is some (x, x) ∈ eco.
There are five cases to consider: one for each option of the
• mo; rf ⊆ eco′ . But this is true by definition of eco′ . union. It cannot be that (x, x) ∈ rf, (x, x) ∈ fr, (x, x) ∈ mo
• fr; fr ⊆ eco′ . But edges, because all these relations are irreflexive. Thus the
fr; fr ⊆ fr by Lemma C.8vi pair (x, x) must appear in one of the following: mo; rf or
⊆ eco′ by definition of eco′ fr; rf. We treat each case separately.
In the first case, we have (x, x) ∈ mo; rf for some x ∈ D.
• fr; mo ⊆ eco′ . But The relation mo; rf goes from modifying operations to read-
fr; mo ⊆ fr by Lemma C.8v ing operations, so again x must be an update. Let w ′ be the
′ modification satisfying (x, w ′) ∈ mo and (w ′, x) ∈ rf (the ex-
istence of this operation is guaranteed by the definition of
• fr; rf ⊆ eco′ . But this is true by definition of eco′ . relational composition). But now, by Property Lemma C.7i,
18
(w ′, x) ∈ mo and thus (x, x) ∈ mo contrary to the irreflexiv- Because irrefl(eco? ; hb) and eco; hb ⊆ eco? ; hb we have
ity of mo. irrefl(eco; hb). By Lemma C.11, this implies the conjunction
In the second case, we have (x, x) ∈ fr; rf. Let w be the of COH and RF. This completes our proof.
modification satisfying (x, w) ∈ fr and (w, x) ∈ rf (again,
the existence of this modification is guaranteed by relational
Lemma C.14 (Coherence implies Update Atomicity). For a
composition). Because
candidate execution D = ((D, sb), rf, mo), if D satisfies irrefl(eco? )
fr = rf −1 ; mo \ Id then D satisfies the update atomicity property UPD.
there is some modification w ′ satisfying (w ′, x) ∈ rf, (w ′, w) ∈ Proof. By Lemma C.6, UPD is equivalent to irrefl(fr; mo) ∧
mo and w ′ , w. But because rf is one-to-many rf −1 (x) = w irrefl(rf; mo). But fr; mo ⊆ eco, so because irrefl(eco) we
and rf −1 (x) = w ′ , and thus w = w ′ , a contradiction. This have irrefl(fr; mo). Likewise, rf; mo ⊆ eco so irrefl(rf; mo).
completes our proof. This is sufficient to prove UPD.
Lemma C.11. For a candidate execution The four lemmas C.10, C.12, C.13 and C.13 together imply
C.15 can now be used to prove our main theorem.
D = ((D, sb), rf, mo),
Theorem C.15. For any candidate execution
irrefl(eco; hb) is equivalent to the conjunction of COH and RF
D = ((D, sb), rf, mo),
(defined in Definition C.3).
D is weakly canonical consistent iff D satisfies SC Coherence
Proof. Let R = (rf −1 )?; mo; rf ? so that COH is equivalent to of Definition 4.2 on page 7.
irrefl(R; hb). Now, note that
Proof. For the left-to-right direction, see Lemmas C.10 and
(rf −1 )?; mo; rf ? = (mo ∪ fr); (rf ∪ Id) (14) C.12. For the right-to-left direction, see Lemmas C.13 and
def of fr, ref. clos. C.13.
= (mo ∪ (mo; rf) ∪ fr ∪ (fr; rf)) (15)
D Proof of Peterson’s algorithm
distrib of ; over ∪
A configuration (P, σ ) is an initial configuration of Peterson’s
Therefore, recalling from Lemma C.9 that algorithm if σ is an initial state of our semantics and the
following conditions hold:
eco = rf ∪ mo ∪ fr ∪ (mo; rf) ∪ (fr; rf)
P.pc t = 2 (16)
we have eco = R ∪ rf. Thus, because ; distributes over ∪,
eco; hb = (R; hb) ∪ (rf; hb), and so irrefl(eco; hb) is equiv- wrval(σ .last(turn)) ∈ {1, 2} (17)
alent to irrefl(R; hb ∪ rf; hb). But, because irrefl(r ∪ s) ⇔ wrval(σ .last(flagt )) = false (18)
irrefl(r ) ∧ irrefl(s) for any relations r , s, this is equivalent to for each t ∈ {1, 2}. The last condition here is not strictly
the conjunction of COH and RF. necessary for our proof, but it is needed to ensure that Pe-
terson’s algorithm makes progress.
Lemma C.12 (Weak Canonical RAR Consistency implies
eco; hb-irreflexivity). For a candidate execution Lemma D.1 (Peterson’s C11 Invariants). If (P, σ ) is a state
reachable of Peterson’s algorithm, then (P, σ ) satisfies the fol-
D = ((D, sb), rf, mo),
lowing for each t, tˆ ∈ {1, 2}.
whenever D is weakly canonical consistent, we have that D turn is an update-only location (19)
satisfies irrefl(eco? ; hb). σ σ
turn =1 2 ∨ turn =2 1 (20)
Proof. First, note that Property HB of weakly canonical con- σ
sistency ensures that irrefl(hb), so if there is a cycle in eco? ; hb P.pc t ∈ {3, 4, 5, 6} =⇒ flagt =t true (21)
then there is a cycle in eco; hb (so we must actually take σ
P.pc t ∈ {4, 5, 6} =⇒ flag t → turn (22)
an eco step). We prove that this later is impossible. But by
Lemma C.11, because D satisfies COH and RF we have irrefl(eco; hb) P.pc t ∈ {4, 5, 6} ∧ P.pc tˆ ∈ {4, 5, 6} =⇒ (23)
σ σ
as required. flag t̂ =t true ∨ turn =tˆ t
σ
Lemma C.13 (Coherence implies canonical coherence). For P.pc t = 5 ∧ P.pc tˆ ∈ {4, 5, 6} =⇒ turn =tˆ t (24)
σ
a candidate execution D = ((D, sb), rf, mo), if D satisfies irrefl(eco? ; hb) P.pc t = 2 =⇒ flagt = false (25)
then D satisfies all of HB, COH, and RF above.
Proof. We first prove that each property holds in the initial
Proof. Assume irrefl(eco? ; hb). Because irrefl(eco? ; hb) and configuration. Let (P, σ ) be an initial state. Thus, for each t,
hb ⊆ eco? ; hb we have irrefl(hb) as required for HB. σ .pct = 2. This is sufficient to show that all of the invariants
19
σ′
21, 22, 23 and 24 are true initially, as these invariants all (23’) [P ′ .pctˆ ∈ {4, 5, 6} ∧ P ′ .pct ∈ {4, 5, 6} =⇒ flag t = tˆ
assume that at least one thread t has P.pc t , 2. We show σ′
true ∨ turn = t t]. It is again sufficient that P ′ .pct <
that each remaining invariant holds as follows:
{4, 5, 6}.
(19) By definition, every location is update-only in an initial σ′
(24) [P ′ .pct = 5 ∧ P ′ .pctˆ ∈ {4, 5, 6} =⇒ turn = tˆ t]. It is
state sufficient that P ′ .pct , 5.
(20) This follows from Init, and the initial condition 17, σ′
σ σ (24’) [P ′ .pct̂ = 5 ∧ P ′ .pct ∈ {4, 5, 6} =⇒ turn = t t]. It is
turn = 0 or turn = 1.
again sufficient that P ′ .pct < {4, 5, 6}.
(25) This follows from Init, and the initial condition σ
(25) [P.pc t = 2 =⇒ flag t = false]. It is sufficient that
wrval(σ0 .last(f laдt )) = false. P ′ .pct = 3.
σ
(25’) P.pc tˆ = 2 =⇒ flag t̂ = false. From rule NoMod and
σ
We now prove, for each transition that each property is the fact that e < Wr |flag t̂ , it follows that if flag tˆ = false,
m,e σ′
preserved. Fix a transition (P, σ ) =⇒RA (P ′ , σ ′) with (P, σ ) then flag tˆ = false, which is sufficient to prove that the
satisfying the invariants of Figure D.1. Also, fix a thread t invariant is preserved.
(thus fixing tˆ), which is the thread executing the operation
represented by the transition. For each transition, we prove For the remaining cases, we do not explicitly state the in-
that each invariant is preserved. Where appropriate, we do variant that we are proving. The mapping from labels to in-
so for both t and tˆ. Invariants applied to tˆ are marked with a variants remains as above.
primed label. We ignore the execution of line 5, as the criti- Case 2: P.pc t = 3, and P ′ .pct = 4 and e = Ut (turn, v, t̂)
cal section does not modify the variables used in Peterson’s for some v. By Lemma 5.6, because e is an update and turn
algorithm. is an update-only location, m = σ .last(turn).
Case 1: P.pc t = 2, and P ′ .pct = 3 and e = Wt (flagt , true). (19) This follows because e is an update.
It follows from Lemma 5.6, and invariant 25 that (20) From the rule ModLast, and that e ∈ Wr |turn and
σ′
m = σ .last(flagt ). m = σ .last(turn) it follows that turn = t wrval(e) = tˆ,
which is sufficient.
(19) [turn is an update-only location in σ ′ ]. This follows (21) Note that by Invariant 21 applied to (P, σ ), we have
σ
because turn is an update-only location in σ , and e < flagt =t true. Then, from the rule NoMod, and the
σ′
Wr |turn . fact that e < Wr |flag t it follows that flagt = t true as
σ′ σ′ required.
(20) [turn = 1 2∨turn = 2 1]. From the rule NoMod, and the
σ
fact that e < Wr |turn it follows that if turn =1 2 (resp. (21’) From rule NoMod and the fact that e < Wr |flagt̂ , it
σ σ′
σ σ′ σ′ follows that if flagtˆ =tˆ true, then flag tˆ = tˆ true, which
turn =2 1) then turn = 1 2 (resp. turn = 2 1), which is
sufficient to prove that the invariant is preserved. is sufficient to prove that the invariant is preserved.
σ′ (22) Note that by Invariant 21 applied to (P, σ ), we have
(21) [P ′ .pct ∈ {3, 4, 5, 6} =⇒ flagt = t true]. From rule σ
flagt =t true. Then, from the rule WriteOrd, and
ModLast, the fact that e ∈ Wr |flagt and that m =
σ′
the fact that turn and flagt are distinct variables, e ∈
σ .last(flagt ) it follows that flag t = wrval(e) = true. σ′
σ′
Wr |turn and m = σ .last(turn), we have flagt → turn as
(21’) [P ′ .pctˆ ∈ {3, 4, 5, 6} =⇒ flagtˆ = tˆ true]. From rule required.
NoMod and the fact that e < Wr |flag t̂ , it follows that (22’) Note first that because turn is update only, m is an
σ σ′ update and thus m ∈ WrR |turn . Then, from rule Up-
if flagtˆ =tˆ true, then flag tˆ = tˆ true, which is sufficient
to prove that the invariant is preserved. dOrd and the fact that e ∈ U |turn it follows that if
σ′ σ σ′
(22) [P ′ .pct ∈ {4, 5, 6} =⇒ flagt → turn]. Similar to the flagtˆ → turn, then flagtˆ → turn, which is sufficient
proof for Invariant 21, P ′ .pct = 2 < {4, 5, 6}. to prove that the invariant is preserved.
σ′ (23) In the proof that this transition preserves Invariant 20,
(22’) [P ′ .pct̂ ∈ {4, 5, 6} =⇒ flag tˆ → turn]. From rule σ′
NoMod-Ord and the fact that e < Wr |flag t̂ ∪ Wr |turn , it we proved that turn = t wrval(e) = tˆ, which is suffi-
σ σ′ cient to prove that this current invariant is preserved.
follows that if flag tˆ → turn, then flag tˆ → turn, which σ′
(23’) Again, we know that turn = t wrval(e) = tˆ, which
is sufficient to prove that the invariant is preserved.
σ′ is sufficient to prove that this invariant is preserved
(23) [P ′ .pct ∈ {4, 5, 6} ∧ P ′ .pctˆ ∈ {4, 5, 6} =⇒ flag tˆ = t (bearing in mind that t and tˆ are transposed in this
σ′
true∨turn = tˆ t]. As before, it is sufficient that P ′ .pct < invariant).
{4, 5}. (24) It is sufficient that P ′ .pct , 5.
20
σ′
(24’) Again, the fact that turn = t wrval(e) = tˆ is enough. (20) This invariant is preserved by rule NoMod and the fact
(25) It is sufficient that P ′ .pct , 2. that e < Wr |turn .
(25’) From rule NoMod and the fact that e < Wr |flag t̂ , it (21) Note that P ′ .pct < {3, 4, 5, 6}, which is sufficient to
σ′ show that this invariant is preserved.
follows that if flagtˆ false, then flag tˆ = tˆ false, which is
(21’) This invariant is preserved by rule NoMod and the
sufficient to prove that the invariant is preserved.
fact that e < Wrflagt̂ .
Case 3: In this case, we consider the first test at line 4 (22) Note that P ′ .pct < {4, 5}, which is sufficient to show
flagt = false. If this test returns true, then nothing about that this invariant is preserved.
the state changes except that t moves to the second test (22’) This invariant is preserved by rule NoMod and the
in the condition. Because nothing about the state is chang- fact that e < Wr |flag t̂ ∪ Wr |turn .
ing, application of the rules NoMod and NoModOrd can be (23) Note that P ′ .pct < {4, 5, 6}, which is sufficient to show
used to show that all the invariants are preserved in a stan- that this invariant is preserved.
dard way. Therefore, we only consider in detail the situation (23’) Again, it is sufficient to note that P ′ .pct < {4, 5, 6}
when the test returns false. Thus, assume that P.pc t = 4, and (bearing in mind that t and tˆ are transposed in 23’).
P ′ .pct = 5, and e = Rt (flagtˆ , false). (24) This invariant is preserved by rule NoMod and the fact
Because e is not a write and the value of pct̂ does not that e < Wr |turn .
change, it is straightforward to use the rules NoMod and (24’) This invariant is preserved by rule NoMod and the
NoModOrd to show that each invariant except for 24 and fact that e < Wr |turn .
24’ are preserved. Because it is simpler, we first prove that (25) From rule ModLast and the fact that e ∈ Wr |flag t , m =
24’ is preserved. Briefly, P ′ .pct̂ = P.pc tˆ , and because e < σ′
σ σ′
σ .last(flagt ) and wrval(e) = false, we have flag t = t
Wr |flag t we have flagt =t̂ true =⇒ flagt = tˆ true (by rule false as required.
σ
NoMod), and because e < Wr |turn we have turn =tˆ tˆ =⇒ (25’) This invariant is preserved by rule NoMod and the
′
σ
turn = tˆ tˆ. These three properties are sufficient to show that fact that e < Wr |flag t̂ .
24’ is preserved. This completes our proof.
We now prove that 24 is preserved. We do so by proving
σ′
that turn = tˆ t under the assumption that P ′ .pctˆ ∈ {4, 5, 6}.
E Mechanisation in MemAlloy
Because P.pc t̂ = P ′ .pct̂ , we have P.pc tˆ ∈ {4, 5, 6}. Thus, The .cat files for MemAlloy
because P.pc t = 4, Invariant 23 guarantees that https://github.com/johnwickerson/memalloy
σ σ are given below.
flag t̂ =t true ∨ turn =tˆ = t
• c11_rar.cat contains our formalisation of the RAR
σ
But the disjunct flagtˆ =t must be false. If it were true, Lemma fragment.
5.3 the read e would have to return true, contrary to the hy- • c11_simp_2.cat is the same as c11_simp.cat, dis-
σ
pothesis that e = Rt (flag t̂ , false). Thus turn =tˆ = t. Then, tributed with MemAlloy, but imports c11_base_rar.cat
from rule NoMod, and the fact that e is not a write, we have instead of c11_base.cat.
σ′ • c11_base_rar.cat contains definitions common to
turn = tˆ t as required.
Case 4: In this case, we consider the second test at line both c11_rar.cat and c11_simp_2.cat. It is essen-
4 turn = tˆ. As before, if this test returns true, then all the tially the file distributed with MemAlloy, but with a
invariants are straight-forwardly preserved. So assume that simplified sw relation that ignores release sequences.
P.pc t = 4, and P ′ .pct = 5, and e = Rt (turn, t). It also omits events such as SC events that are not part
Again, because e is not a write and the value of pctˆ does of our C11 model.
not change, it is easy to show that each invariant except No differences were found between c11_rar.cat and c11_simp_2.cat
for 24 is preserved. We show that Invariant 24 is preserved for models up to size 7.
σ′ As a sanity check, both c11_rar.cat and c11_simp_2.cat
by proving that turn = tˆ t. By Lemma 5.3, and the fact that
σ were compared against c11_lidbury.cat,which formalises
e = Rt (turn, t) the assertion turn =t tˆ is false. Thus, by
σ Lidbury and Donaldson [19], revealing the exact same sets
Invariant 20, turn =tˆ t. Then, from rule NoMod, and the
σ′
of counterexamples.
fact that e is not a write, we have turn = tˆ t as required.
Case 5: P.pc t = 6, and P ′ .pct = 2 and e = Wt (flagt , false). E.1 File c11_rar.cat
It follows from Lemma 5.6, and Invariant 21 that (* This file imports c11_rar_base.cat and
m = σ .last(flagt ). rephrases the acyclicity axiom in terms of eco *)
(19) This invariant is preserved because e < Wr |turn . "C"

21
include "c11_rar_base.cat" let ur = (cnf & thd) \ (po | po^-1)

undefined_unless empty ur as Ur
let eco = (rf | co | fr)+
(* coherence, etc *)
irreflexive hb as hb_irr acyclic hbl | rf | co | fr as HbCom
irreflexive hb ; eco as hb_eco_irr
irreflexive eco as eco_irr (* no "if(r==0)" *)
deadness_requires empty if_zero as No_If_Zero
E.2 File c11_simp_2.cat
(* no unsequenced races *)
(* This is the C11 file distributed with deadness_requires empty ur as Dead_Ur
Memalloy, but imports c11_rar_base.cat
instead of c11_base.cat *) (* coherence edges are forced *)
deadness_requires empty unforced_co as Forced_Co
"C"
include "c11_rar_base.cat" (* external control dependency *)
let cde = ((rf \ thd) | ctrl)* ; ctrl
acyclic scp as Ssimp (* dependable release sequence *)
let drs = rs \ ([R]; !cde)
E.3 File c11_base_rar.cat (* dependable synchronises-with *)
"C" let dsw =
include "basic.cat" sw & (((fsb?; [REL]; drs?) \ (!ctrl; !cde)) ; rf)
(* dependable happens-before *)
(* Modifications to c11_base.cat *) let dhb = po?; (dsw;ctrl)*
(* self-satisfying cycle *)
(* synchronises with (sw) simplified to let ssc = id & cde
elide release sequences *) (* potential data race *)
let sw = [REL]; rf; [ACQ] let pdr = cnf \ (A*A)
(* reads-from on non-atomic location *)
empty [SC] as omitSC let narf = rf & (NAL*NAL)
empty [NAL] as omitNAL
empty [F] as omitF deadness_requires
empty [R & W] \ [REL & ACQ] as RAOnlyRMW empty pdr \ (dhb | dhb^-1 | narf;ssc | ssc;narf^-1)
as Dead_Pdr
(* Definitions below are from c11_base.cat *)
let scb = fsb?; (co | fr | hb); sbf?
let fsb = [F]; po let scp = (scb & (SC * SC)) \ id
let sbf = po; [F]
(* release sequence *)
let rs = poloc*; rf*
(* happens before *)
let hb = (po | sw)+
let hbl = hb & loc
(* conflict *)
let cnf = (((W*M) | (M*W)) & loc) \ id
(* data race *)
let dr = (cnf \ (A*A)) \ thd \ (hb | hb^-1)
undefined_unless empty dr as Dr
(* unsequenced race *)
22

Verifying C11 Programs Operationally: Simon Doherty Brijesh Dongol

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Verifying C11 Programs Operationally: Simon Doherty Brijesh Dongol

Uploaded by

Copyright:

Available Formats

Verifying C11 Programs Operationally

Simon Doherty Brijesh Dongol

Heike Wehrheim John Derrick

University of Paderborn University of Sheﬃeld

Abstract been a substantial eﬀort to develop an operational seman-

x ∈ fv(E) n ∈ Val x ∈ fv(E) n ∈ Val fv(E) , ∅ fv(E 1 ) , ∅ fv(E 1 ) = ∅

Figure 1. Expression evaluation

e < Wr |x x = var(e) e ∈ RdA |x x , y e ∈ Wr |y e < Wr | {x,y }

e < Wr |x x = var(e) e ∈ RdA |x x , y e ∈ Wr |y e < Wr | {x,y }

σ .hbc(tid(e)) = I σ ∪ {e ′ | tid(e ′) = tid(e)} ∪

mo; fr ⊆ mo by Lemma C.8iv D = ((D, sb), rf, mo),

⊆ eco′ by definition of eco′ if D is weakly canonical consistent then D satisfies irrefl(eco).

• mo; mo ⊆ eco′ . But Proof. Recall from Lemma C.9 that

(19) This invariant is preserved because e < Wr |turn . "C"

include "c11_rar_base.cat" let ur = (cnf & thd) \ (po | po^-1)

You might also like