
Fetch&Increment, Queues, Sets and Stacks with Multiplicity:
A Relaxation that Allows Set-Linearizable Implementations from Read/Write Operations∗

Armando Castañeda† Sergio Rajsbaum‡ Michel Raynal§

Abstract
Considering asynchronous shared memory systems in which any number of processes may
crash, this work identifies and gives relaxations for fetch&increment, queues, sets and stacks that
can be non-blocking or wait-free implemented using only Read/Write operations and without
Read-After-Write synchronization patterns. Set-linearizability, a generalization of linearizability
designed to specify concurrent behaviors, is used to express these relaxations formally, and
precisely identify the subset of executions which preserve the original sequential behavior. The
specifications allow for an item/value to be returned more than once by different operations,
but only in case of concurrency; we call such a property multiplicity. Hence, these definitions
give rise to new notions of the objects suited to concurrency. As far as we know, this work is the
first that provides relaxations of objects with consensus number two which can be implemented
on top of Read/Write registers only.

1 Introduction
1.1 Context
In asynchronous crash-prone systems where processes communicate by accessing a shared memory,
linearizable [24] implementations of concurrent counters, queues, stacks, sets, and other concur-
rent objects, need extensive synchronization among processes [32], which in turn might jeopardize
performance and scalability. It has been formally shown that this cost is sometimes unavoid-
able, under various specific assumptions [8, 9, 17]. However, often applications do not require all
guarantees offered by linearizable implementations [39]. Thus, much research has focused on im-
proving performance of concurrent objects by relaxing their semantics. Furthermore, several works
have focused on relaxations for queues and stacks, achieving significant performance improvements
(e.g. [19, 21, 27, 39]).
It is impossible, however, to implement queues and stacks, and in general any object with consensus number two, using only Read/Write operations, without relaxing their specification. This

∗A preliminary version of the results in this paper was presented at OPODIS 2020 [14]. Supported by UNAM-PAPIIT grants IN106520 and IN108720, and French grant Byblos ANR-20-CE25-0002-01.
†Instituto de Matemáticas, Universidad Nacional Autónoma de México, Mexico; armando.castaneda@im.unam.mx
‡Instituto de Matemáticas, Universidad Nacional Autónoma de México, Mexico; rajsbaum@im.unam.mx
§Univ Rennes, Inria, CNRS, IRISA, Rennes, France & Department of Computing, Polytechnic University, Hong Kong; michel.raynal@irisa.fr

is because the consensus number of Read/Write operations is only one [22], hence too weak to
implement any of these objects. Thus, any such implementation requires the use of atomic Read-
Modify-Write operations, such as Compare&Swap or Test&Set, which in principle are slower than
simple Read/Write operations.1 To the best of our knowledge, even implementations of relaxed
versions of queues or stacks do not avoid the use of Read-Modify-Write operations.
In this article, we are interested in exploring whether there are meaningful relaxations of objects
with consensus number greater than one that can be implemented using only simple Read/Write
operations; namely, whether there are non-trivial relaxations with consensus number one. Hence, this
work is a theoretical investigation of the power of the Read/Write model for concurrent objects with
formalized specifications.

1.2 The Multiplicity Relaxation


We identify and formally define relaxations of fetch&increment, queues, sets and stacks (all of
them with consensus number two) that can be implemented using only Read/Write operations.
The relaxation is called multiplicity. Intuitively, it allows a set of operations to logically happen
concurrently if, from any state of the object, each of them transitions the object in the same way
and returns the same value. Thus, multiplicity applied to queues
allows an item to be returned by several dequeue operations, and applied to fetch&increment
similarly allows several operations to return the same integer value. It additionally requires that
the relaxation happens only if the operations are concurrent. As already argued [31], this style of
relaxation (namely, several operations returning the same value) applied to queue- and stack-type
concurrent objects can be useful in a wide range of applications, such as parallel garbage collection,
fixed point computations in program analysis, constraint solvers (e.g. SAT solvers), state space
search exploration in model checking, as well as integer and mixed programming solvers.
One of the main challenges in designing concurrent data structures lies in the difficulty of
formally specifying what is meant by “concurrent specification”. To provide such formal specifi-
cations, we use the set-linearizability formalism [13, 33], a specification method that is useful to
specify the behavior of a set-sequential object with concurrent patterns of operation invocations,
instead of only in sequential patterns. Using this specification method, we are able to precisely
state in which concurrent executions the relaxed behavior of the object is allowed to take place, and
demand a strict behavior in other executions, especially when operation invocations are sequential.
More precisely, these specifications force any implementation to exhibit the classical sequential
behavior in the absence of concurrency. Hence the specifications provide us with well-defined concurrent
fetch&increment, queues, sets and stacks suited to crash-tolerant read/write based systems.
We present first set-linearizable implementations of sets and fetch&increment with multiplicity
from Read/Write operations. For the case of queues and stacks, we follow a modular approach that
exploits the composability property of set-linearizability [13], namely, we show set-linearizable
implementations of queues and stacks with multiplicity, assuming sets and fetch&increment with
multiplicity as base objects. All implementations are wait-free [22], except the one for the queue
which is only non-blocking [24]. By composability, these four implementations provide Read/Write
set-linearizable implementations of queues and stacks with multiplicity.
An interesting feature of the proposed implementations is the lack of Read-After-Write patterns.
Read-After-Write, which is the basis of the flag principle [23], is a useful synchronization mechanism
1 High contention, however, can make Read/Write operations slower.

in which a process first writes in a shared variable and then reads a different shared variable. This
mechanism is widely used, for example, in the famous Lamport’s bakery mutual exclusion algorithm
(see for example [23, 37, 42]). In our implementations, every operation performs a sequence of
reads, followed possibly by a sequence of writes, hence avoiding Read-After-Write. It is well-known
that Read-After-Write patterns are expensive to implement in real multicore architectures, and
avoiding such patterns can increase performance, e.g. [31]. Furthermore, it has been shown that any
non-blocking linearizable implementation of a queue or a stack (as well as other concurrent objects)
must use either Read-Modify-Write operations or Read-After-Write patterns [9]. Therefore, queues
and stacks with multiplicity are non-trivial relaxations that evade this impossibility result.
Altogether, our results show that very simple synchronization mechanisms, i.e. Read/Write
with no Read-After-Write, are able to implement non-trivial concurrent objects.
Since we are interested in the computability power of Read/Write operations to implement
concurrent objects (that otherwise are impossible), our algorithms are presented in an idealized
shared-memory computational model with arrays of infinite length. We hope these algorithms will
help to develop a better understanding of fundamentals that can derive solutions for real multicore
architectures, with good performance and scalability.

1.3 Related Work


By the universality of consensus [22], we know that every concurrent object specified with a se-
quential specification can be wait-free linearizable implemented using Read/Write registers and base
objects with consensus number ∞, e.g. Compare&Swap [23, 37, 42]. However, the resulting imple-
mentation might not be efficient because first, as it is universal, the construction does not exploit
the semantics of the particular object, and Compare&Swap may be an expensive base operation.
Moreover, such an approach would prevent us from investigating the power and the limits of the
Read/Write world (as was done for the Snapshot object, for which there are several linearizable wait-
free Read/Write efficient implementations, e.g. [1, 7, 10, 25]), and finding accordingly meaningful
specifications of concurrent objects with efficient Read/Write-implementations.
It has been frequently pointed out that classic concurrent data structures have to be relaxed
in order to support scalability, and examples are known showing how natural relaxations on the
ordering guarantees of queues or stacks can result in higher performance and greater scalability [39].
Thus, for the past ten years there has been a surge of interest in relaxed concurrent data structures
from practitioners (e.g. [34]). Also, theoreticians have identified inherent limitations in achieving
high scalability in the implementation of linearizable objects [8, 9, 17].
Some articles relax the sequential specification of traditional data structures, while others relax
their correctness condition requirements. As an example of relaxing the requirement of a sequential
data structure, [21, 26, 27, 35] present a k-FIFO queue (called out-of-order in [21]) in which, roughly
speaking, elements may be dequeued out of FIFO order up to a constant k ≥ 0. A family of relaxed
queues and stacks is introduced in [40], and studied from a computability point of view (consensus
numbers). The k-stuttering relaxation of a queue/stack is defined in [21]: an item can be
returned by a dequeue/pop operation without actually removing the item, up to k ≥ 0 times, even
in sequential executions. Our queue/stack with multiplicity is a stronger version of k-stuttering,
in the sense that an item can be returned by two operations if and only if the operations are
concurrent. Relaxed priority queues (in the flavor of [40]) and associated performance experiments
are presented in [6, 43].

Other works design a weakening of the consistency condition. For instance, quasi-linearizability [3]
models relaxed data structures through a distance function from valid sequential
executions. This work provides examples of quasi-linearizable concurrent implementations that
outperform state of the art standard implementations. A quantitative relaxation framework to
formally specify relaxed objects is introduced in [19, 21] where relaxed queues, stacks and priority
queues are studied. This framework is more powerful than quasi-linearizability. It is shown in [41]
that linearizability and three data type relaxations studied in [21], k-out-of-order, k-lateness, and
k-stuttering, can also be defined as consistency conditions. Local linearizability [18] is a relaxed
consistency condition that is applicable to set-type concurrent data structures like pools, queues,
and stacks. The notion of distributional linearizability [5] captures randomized relaxations. This
formalism is applied to MultiQueues [38], a family of concurrent data structures implementing re-
laxed concurrent priority queues. The previous works use relaxed specifications, but still sequential,
while we relax the specification to make it concurrent (using set-linearizability).
Idempotent work-stealing [31] is a relaxation of work-stealing where every item is taken at least
once instead of exactly once. Work-stealing is a popular technique to implement load balancing
in a distributed manner, in which each process maintains its own set of tasks where only the
owner can insert tasks but any process can extract tasks from the set; the tasks are taken in some
order, e.g. FIFO or LIFO. When the order is FIFO/LIFO, the sets are nothing else than single-
enqueuer/pusher multi-dequeuer/popper queues/stacks. It has been shown that the idempotent
relaxation provides weaker guarantees than multiplicity [12]. In particular, in idempotent work-
stealing an item can be extracted an unbounded number of times by non-concurrent operations.
Concurrency-awareness [20] is a formalism equivalent to set-linearizability. Modular formal
verification techniques are provided in [20], that are used to obtain a fully modular linearizability
proof of the elimination stack [15], which uses set-linearizable/concurrency-aware exchanger objects.
Our model of computation with set-linearizable objects is similar to that in [20] with concurrency-
aware objects.

1.4 Organization
The article is organized as follows. Section 2 presents the standard asynchronous model of com-
putation with atomic sequential base objects, while Section 3 presents the correctness conditions
considered here, namely, linearizability and set-linearizability. Sets and Fetch&Inc with multiplic-
ity are defined and Read/Write set-linearizable implemented in Section 4. To construct on top of
these implementations, Section 5 defines a generalization of the standard model where base objects
are set-sequential instead of sequential. Then, queues and stacks with multiplicity are defined and
Read/Write set-linearizable implemented in Section 6, following a modular approach. The paper
concludes with a final discussion in Section 7.

2 Model of Computation with Atomic/Linearizable Base Objects


We consider first the standard concurrent system model with n asynchronous processes, p0 , . . . , pn−1 ,
which may crash at any time during an execution, namely, a process that crashes stops taking steps.
The index of process pi is i. Processes communicate with each other by invoking atomic operations
on shared base objects. Atomicity means that every operation occurs instantaneously, and even if

several processes concurrently invoke operations of the same object, the operations are serialized,
in some order.
A base object can provide atomic Read/Write operations (such an object is henceforth called
a register ), or more powerful atomic Read-Modify-Write operations, such as Fetch&Inc, Swap or
Compare&Swap. All base objects are formally specified with a sequential state machine (details
in Section 3). Here we just mention the base objects used in our algorithms. The operation
R.Swap(x) atomically reads the current value of R, sets its value to x and returns R’s old value,
R.Fetch&Inc() atomically adds 1 to the current value of R and returns the previous value, while
R.Compare&Swap(new, old) is a conditional replacement operation that atomically checks if the
current value of R is equal to old, and if so, replaces it with new and returns true; otherwise, R
remains unchanged and the operation returns false.
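These base-object semantics can be illustrated with a minimal model (a sketch in Python, where a lock stands in for hardware atomicity; the `Register` class and method names are ours, not from the paper):

```python
import threading

class Register:
    """A model of an atomic base object; the lock models atomicity."""
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def write(self, x):
        with self._lock:
            self._value = x

    def swap(self, x):
        # Atomically set the value to x and return the old value.
        with self._lock:
            old, self._value = self._value, x
            return old

    def fetch_and_inc(self):
        # Atomically add 1 and return the previous value.
        with self._lock:
            old = self._value
            self._value += 1
            return old

    def compare_and_swap(self, new, old):
        # Replace the value with new only if it currently equals old.
        with self._lock:
            if self._value == old:
                self._value = new
                return True
            return False
```

The argument order of `compare_and_swap` follows the paper's Compare&Swap(new, old) convention.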
A (high-level) concurrent object is defined by a state machine consisting of a set of states, a finite
set of operations, and a set of transitions between states. The specification does not necessarily have
to be sequential, namely, state transitions might involve several invocations. Section 3 formalizes
this notion and the different types of objects considered in this paper.
An implementation of a concurrent object T is a distributed algorithm A consisting of local
state machines A1 , . . . , An . Local machine Ai specifies which operations on base objects pi executes
in order to return a response when it invokes a high-level operation of T . Each of these base-object
operation invocations is a step.
An execution of A is a possibly infinite sequence of steps, namely, executions of base objects
operations, plus invocations and responses to high-level operations of the concurrent object T , with
the following well-formedness properties:

1. Each process is sequential. It first invokes a high-level operation, and only when it has a
corresponding response, it can invoke another high-level operation.

2. A process takes steps only between an invocation and a response.

3. For any invocation to an operation op, denoted invi (op), of a process pi , the steps of pi
between that invocation and its corresponding response (if there is one), denoted resi (op),
are steps that are specified by Ai when pi invokes op.

An operation in an execution is complete if both its invocation and response appear in the
execution. An operation is pending if only its invocation appears in the execution. It is assumed
that after a process completes an operation, non-deterministically picks the operation it executes
next. For sake of simplicity, and without loss of generality, we identify the invocation of an operation
with its first step, and its response with its last step.
A process is correct in an infinite execution if it takes infinitely many steps. An implemen-
tation is wait-free if in every infinite execution, every correct process completes infinitely many
operations [22]. An implementation is non-blocking if in every infinite execution, infinitely many
operations complete [24]. Thus, a wait-free implementation is non-blocking but not necessarily vice
versa.
An implementation uses Read-After-Write if it has an execution in which one of its high-level
operations writes in a shared variable and then reads a different shared variable. We will provide
Read/Write implementations that avoid this synchronization pattern. Concretely, any operation
first performs a sequence of reads, followed possibly by a sequence of writes.

The consensus number of a shared object O is the maximum number of processes that can
solve the well-known consensus problem, using any number of instances of O in addition to any
number of Read/Write registers [22]. Consensus numbers induce the consensus hierarchy, where
objects are classified according to their consensus numbers. The simple Read/Write operations stand
at the bottom of the hierarchy, with consensus number one. At the top of the hierarchy we find
operations with infinite consensus number, like Compare&Swap, that provide the maximum possible
coordination.

3 Correctness Conditions
3.1 The Linearizability Correctness Condition
Linearizability [24] is the standard notion used to define a correct concurrent implementation of
an object defined by a sequential specification. It generalizes the notion of atomicity of registers,
where all operations happen one after the other, even if several operations are invoked concurrently.
Intuitively, an execution is linearizable if operations can be ordered sequentially, without reordering
non-overlapping operations, so that their responses satisfy the specification of the implemented
object.
A sequential specification of a concurrent object T is a state machine specified through a tran-
sition function δ. Given a state q and an invocation invi (op) of process pi , δ(q, invi (op)) returns
the tuple (q′, resi (op)) (or a set of tuples if the machine is non-deterministic), indicating that the
machine moves to state q′ and the response to op is resi (op). The sequences of invocation-response
tuples, ⟨invi (op) : resi (op)⟩, produced by the state machine are its sequential executions. For the sake
of clarity, a tuple ⟨invi (op) : resi (op)⟩ is simply denoted op. Also, subscripts of invocations and
responses are omitted.
It is not difficult to provide sequential specifications of the base objects with operations Fetch&Inc,
Swap or Compare&Swap, that are only informally defined in Section 2.
To formalize linearizability we define a partial order <E on the complete operations of an
execution E: op <E op′ if and only if res(op) precedes inv(op′) in E. Two complete operations are
concurrent, denoted op ||E op′, if they are incomparable by <E . The execution is sequential if <E
is a total order.
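These relations can be computed directly from the positions of invocations and responses in E (a hypothetical helper; representing each operation as an (invocation, response) position pair is our choice, not the paper's):

```python
def precedes(op1, op2):
    # An operation is a pair (inv_position, res_position) in the execution E.
    # op1 <_E op2 iff the response of op1 precedes the invocation of op2.
    return op1[1] < op2[0]

def concurrent(op1, op2):
    # Two complete operations are concurrent iff incomparable by <_E.
    return not precedes(op1, op2) and not precedes(op2, op1)

# Example: a spans steps 0..3, b spans 2..5 (overlapping a), c spans 6..7.
a, b, c = (0, 3), (2, 5), (6, 7)
```

Here a and b are concurrent, while both precede c in real time.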

Definition 1 (Linearizability). Let A be an implementation of a concurrent object T . A finite
execution E of A is linearizable if there is a sequential execution S of T such that

1. S contains every complete operation of E and might contain some pending operations.2 Inputs
and outputs of invocations and responses in S agree with inputs and outputs in E.

2. For every two complete operations op and op′ in E, if op <E op′, then op <S op′.

We say that A is linearizable if each of its executions is linearizable.

An important property of linearizability is that of being composable (also called local ) [24].
This property implies that if an atomic base object X of the model is replaced with a non-atomic
wait-free linearizable implementation A of X, any execution of the system with A is equivalent to
an execution where A is atomic, since A gives the “illusion” that all its operation invocations are
2 Note that this avoids the need of defining linearizability using the notion of completions of executions.

atomic. Formally, for every execution E with A, there is an execution F with X (instead of A)
such that every non-atomic operation op of A with inputs args and response res is mapped to an
atomic step ⟨X.op(args) : res⟩, satisfying that <F ⊆ <E , i.e. F respects the real-time order in E.
This property allows us to design algorithms in a modular manner, as any linearization of F
directly gives a linearization of E; hence we can reason directly about F .

3.2 The Set-Linearizability Correctness Condition


To formally specify our concurrent objects, we use the formalism provided by the set-linearizability
consistency condition [13, 33]. Roughly speaking, set-linearizability allows us to linearize several
operations at the same point, namely, all these operations are executed concurrently. Figure 1
schematizes the differences between the two consistency conditions where each double-end arrow
represents an operation execution. It is known that set-linearizability has strictly more expressive
power than linearizability. In particular, our concurrent queues and stacks with multiplicity
cannot be succinctly specified with the linearizability formalism.
A set-sequential specification of a concurrent object differs from a sequential specification in that
δ receives as input the current state q of the machine and a set Inv = {invid1 (op1 ), . . . , invidt (opt )}
of operation invocations, and δ(q, Inv) returns (q′, Res), where q′ is the next state and Res =
{resid1 (op1 ), . . . , residt (opt )} are the responses to the invocations in Inv; each idi denotes the
index of the invoking/responding process. Intuitively, all operations op1 , . . . , opt are performed con-
currently and move the machine from state q to q′. The sequence of sets Inv, Res is a concurrency
class of the machine. The sequences of concurrency classes produced by the state machine are its
set-sequential executions. In our set-sequential specifications, invocations will be subscripted with
the index of the invoking process only when there is more than one invocation in a concurrency
class. Observe that a set-sequential specification in which all concurrency classes have a single
element corresponds to a sequential specification.
Given a set-sequential execution S of a set-sequential object, the partial order <S on the oper-
ations of S is defined as above: op <S op′ if and only if res(op) precedes inv(op′) in S, namely,
the concurrency class of op appears before the concurrency class of op′.

Figure 1: Linearizability requires a total order on the operations while set-linearizability allows several
operations to be linearized at the same linearization point.

Definition 2 (Set-linearizability). Let A be an implementation of a concurrent object T . A finite
execution E of A is set-linearizable if there is a set-sequential execution S of T such that
1. S contains every completed operation of E and might contain some pending operations. Inputs
and outputs of invocations and responses in S agree with inputs and outputs in E.

2. For every two completed operations op and op′ in E, if op <E op′, then op <S op′.
We say that A is set-linearizable if each of its executions is set-linearizable.
Like linearizability, the set-linearizability formalism is composable [13]. We will use this property
in Section 5 to define a variant of the standard model with base objects that are set-atomic, i.e.,
several operations might happen concurrently at the same set-step of the base object. The base
objects are then replaced with wait-free set-linearizable implementations of them, to obtain our
set-linearizable queues and stacks with multiplicity in a modular manner.

4 Sets and Fetch&Inc with Multiplicity


We derive our set-linearizable queues and stacks with multiplicity from set-linearizable implementa-
tions of sets and readable Fetch&Inc, both with multiplicity. This section presents the set-sequential
specifications of the objects and provides wait-free set-linearizable implementations of them that use
only Read/Write operations and no Read-After-Write patterns. In Section 6, we exploit the com-
posability property of set-linearizability to derive our queues and stacks with multiplicity.

4.1 Sets with Multiplicity


In a set with multiplicity, concurrent Take operations are allowed to return the same item, and no
item in the set is lost; moreover, if a Take operation finds that the set is empty, it also returns the number
of items that have been put in the set (and also taken from the set) so far. This last property will be
useful in our set-linearizable implementation of a queue with multiplicity. The formal set-sequential
specification is:
Definition 3 (The Set-Sequential Set with Multiplicity). The universe of items that can be put in
the set is N = {1, 2, . . .}, and the set of states Q is the Cartesian product of the set of all
finite subsets of N with {0, 1, 2, . . .}. The initial state is (∅, 0). The transitions are the following:
1. For every (S, q) ∈ Q,

δ((S, q), Put(x)) = ((S ∪ {x}, q + 1), ⟨Put(x) : true⟩).

2. For every (S, q) ∈ Q with S ≠ ∅, and t ∈ {1, . . . , n} distinct process indexes id1 , . . . , idt ,

δ((S, q), {Takeid1 (), . . . , Takeidt ()}) = {((S \ {x}, q), {⟨Takeid1 () : x⟩, . . . , ⟨Takeidt () : x⟩}) | x ∈ S}.

3. For every (∅, q) ∈ Q,

δ((∅, q), Take()) = ((∅, q), ⟨Take() : (∅, q)⟩).

Note that the second transition is non-deterministic and corresponds to concurrent operations
that can take any of the items in the set.
Figure 2 contains a simple Read/Write wait-free set-linearizable implementation of a set with
multiplicity. It assumes that every item is put in the set at most once and every process invokes
Put at most once. The object with these assumptions is easy to implement3 , and it suffices for our
3 Although the object can be Read/Write implemented without them.

Shared Variables:
M [1, . . . , n] : array of Read/Write base objects initialized to ⊥

Operation Put(xi ) is
(01) M [i].Write(xi )
(02) return true
end Put

Operation Take() is
(03) mi ← Read all entries of M , in some order
(04) while true do
(05) if ∃j, mi [j] ≠ ⊥, ⊤ then
(06) M [j].Write(⊤)
(07) return mi [j]
(08) end if
(09) m′i ← Read all entries of M , in some order
(10) if ∀j, mi [j] = m′i [j] then
(11) qi ← # entries of mi containing ⊤
(12) return (∅, qi )
(13) end if
(14) mi ← m′i
(15) end while
end Take

Figure 2: A wait-free set-linearizable implementation of a set with multiplicity (code for process pi ). The
implementation assumes that every item is put in the set at most once and every process invokes Put at
most once.

purposes in the next sections. In the implementation, every process has a dedicated entry in a
shared array M where it writes its item when it invokes Put. When a process invokes Take, it scans
M until it either finds an item to be returned, after marking the item's position in M as taken
(line 06), or returns the number of items that have been inserted in the set so far, if it discovers
that the set is empty (i.e. it obtains a clean double collect in line 09).
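Under the same assumptions, the algorithm of Figure 2 can be transliterated as follows (a sketch in Python; the names MultiplicitySet, EMPTY and TAKEN are ours, a TAKEN sentinel marks removed entries, and per-cell atomicity is assumed rather than modeled):

```python
# A sketch of the set-with-multiplicity algorithm of Figure 2.
EMPTY, TAKEN = None, object()

class MultiplicitySet:
    def __init__(self, n):
        self.M = [EMPTY] * n            # one dedicated entry per process

    def put(self, i, x):
        self.M[i] = x                   # line 01: write own entry
        return True                     # line 02

    def take(self):
        m = list(self.M)                # line 03: first collect
        while True:                     # line 04
            for j, v in enumerate(m):   # line 05: look for an item
                if v is not EMPTY and v is not TAKEN:
                    self.M[j] = TAKEN   # line 06: mark position as taken
                    return m[j]         # line 07: may duplicate under races
            m2 = list(self.M)           # line 09: second collect
            if m == m2:                 # line 10: clean double collect
                q = sum(1 for v in m if v is TAKEN)  # line 11
                return (frozenset(), q)              # line 12: empty + count
            m = m2                      # line 14
```

In a sequential run each item is taken exactly once; duplication can only arise when two Take operations race between lines 03 and 06, mirroring the multiplicity relaxation.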

Theorem 1. The algorithm in Figure 2 is a wait-free set-linearizable implementation of a set with
multiplicity that uses only Read/Write operations and no Read-After-Write patterns, assuming that
every item is put in the set at most once and every process invokes Put at most once.

Proof. The implementation uses only Read/Write operations, and does not use Read-After-Write pat-
terns, as Put only writes and Take performs a sequence of reads followed possibly by a write. As for
progress, Put is obviously wait-free. It directly follows from the algorithm that a Take operation
continues taking steps only as long as more items are put in and taken from the set (hence the condition
in line 10 does not hold). Since, by assumption, in every infinite execution there are finitely many
Put operations, eventually every Take operation returns. Thus, Take is wait-free too.
We now argue that the implementation is set-linearizable. Let E be any finite execution of the
implementation. Since the algorithm is wait-free, there is an extension of E in which all its pending
operations are completed, and no new operation is started. Any set-linearization of such extension
is a set-linearization of E. Thus, without loss of generality, we can assume that all operations in E
are completed.
We obtain a set-linearization SE of E. In the set-linearization, at any time the state of the
object is encoded in M : the items in the set are the items in M , i.e. the values distinct from ⊥
and ⊤, and the number of items that have been inserted so far is the number of items in M plus
the number of items that have been taken, i.e. the entries containing ⊤.

Every Put operation is set-linearized at its Write step, in a concurrency class for itself. Let op be
any Take operation in E that returns in line 12, and consider the first Read step e of op among the
n Read steps that it takes in line 09, before it returns. Since the condition in line 10 is true (in the
while loop iteration e is part of), M contains no item at step e (i.e. each of its entries contains ⊥
or ⊤). Thus, op is set-linearized at its step e, in a concurrency class for itself. We now set-linearize
all Take operations that return items. Consider the set Ax with all Take operations in E that
return the same item x. By assumption, every item is put in the set at most once, and hence every
operation in Ax reads x from the same entry M [j]. Observe that all Read steps of the operations
in Ax that read x from M [j] appear in E before the first Write step, denoted ex , that appears in E,
among the Write steps of the operations in Ax that write ⊤ in M [j]; note that ex is the only step
among those Write steps that actually removes x from the set. We set-linearize all operations in Ax
at the step ex , i.e. the operations form a concurrency class placed at ex . The concurrency classes
just defined give the set-sequential execution SE . By construction it is a set-sequential execution of
the set with multiplicity because (1) every Take operation that returns in line 12 is set-linearized
at a step where M contains no item and the number of its entries containing ⊤ is precisely the number
of items that have been inserted (and removed) so far, and (2) a Take operation can return an item x if and
only if x is written first in M , by a Put operation.
To conclude the proof, we argue that SE is a set-linearization of E. By construction, E and SE
have the same operations with same responses. Consider operations op and op′ such that op <E op′.
In SE , every operation is set-linearized at a step that lies between the invocation and response of
the operation. Thus, the concurrency class of op appears before the concurrency class of op′ in SE ,
i.e. op <SE op′. We conclude that SE is a set-linearization of E.
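The Put/Take algorithm the proof above refers to can be sketched as follows. This is a reconstruction from the proof's description, not the paper's exact pseudocode; all names, and the split of Take into an explicit scan phase and a remove phase, are ours. Each entry of M holds ⊥ (never used), an item, or ⊤ (item already removed); Put performs a single Write, and Take performs n Reads followed by at most one Write, so no Read-After-Write pattern arises.

```python
BOTTOM, TOP = "BOT", "TOP"   # stand-ins for the markers ⊥ and ⊤

class SetWithMultiplicity:
    """Hypothetical reconstruction of the Read/Write set with multiplicity."""

    def __init__(self, n):
        self.M = [BOTTOM] * n              # one entry per Put slot

    def put(self, i, x):
        self.M[i] = x                      # the single Write step of Put
        return True

    def scan(self):
        return list(self.M)                # the n Read steps of Take (line 09)

    def finish_take(self, snapshot):
        for j, v in enumerate(snapshot):
            if v not in (BOTTOM, TOP):     # an item was seen in the snapshot
                self.M[j] = TOP            # the Write step that removes it
                return v
        return "empty"                     # lines 10/12: the set looked empty

s = SetWithMultiplicity(2)
s.put(0, "x")
snap_a = s.scan()                          # Take A scans before B removes
snap_b = s.scan()                          # Take B scans: both still see "x"
assert s.finish_take(snap_a) == "x"
assert s.finish_take(snap_b) == "x"        # multiplicity: both Takes return x
```

Replaying the two scans before either remove step is exactly the interleaving in which two Take operations return the same item; in any sequential replay each item is returned at most once.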

4.2 Fetch&Inc with Multiplicity


Fetch&Inc with multiplicity allows distinct operations to obtain the same value, but only if they
are concurrent. Formally, the set-sequential specification appears next; it also provides a Read
operation, which will be useful in the set-linearizable implementations of queues and stacks with
multiplicity.
Definition 4 (The Set-Sequential Readable Fetch&Inc with Multiplicity). The set of states is
N = {1, 2, . . .}, with the initial state being 1. The transitions are the following:
1. For every state q ∈ N and t ∈ {1, . . . , n} distinct process indexes id1 , . . . , idt ,

δ(q, {Fetch&Incid1 (), . . . , Fetch&Incidt ()}) = (q + 1, {⟨Fetch&Incid1 () : q⟩, . . . , ⟨Fetch&Incidt () : q⟩}).

2. For every state q ∈ N,

δ(q, Read()) = (q, ⟨Read() : q⟩).

Our set-linearizable implementation of Fetch&Inc with multiplicity uses a shared array M of n
Read/Write base objects, each initialized to 1. When calling Fetch&Inc, process pi first
reads (in some order) all entries of M , stores the maximum value it reads in a local variable xi ,
then calls M [i].Write(xi + 1) and finally returns xi . A process calling Read simply reads all entries
of M and returns the maximum among them.
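A minimal sequential simulation of this algorithm (class and method names are ours): splitting Fetch&Inc into its read phase and its write phase makes it easy to replay the interleaving in which two concurrent operations return the same value.

```python
class ReadableFetchIncMult:
    """Sketch of the array-based readable Fetch&Inc with multiplicity."""

    def __init__(self, n):
        self.M = [1] * n                 # each entry initialized to 1

    def fi_read_phase(self):
        return max(self.M)               # read all entries, keep the maximum

    def fi_write_phase(self, i, x):
        self.M[i] = x + 1                # write max+1 to the caller's entry
        return x                         # Fetch&Inc returns the maximum read

    def read(self):
        return max(self.M)               # Read: maximum over all entries

o = ReadableFetchIncMult(2)
# Sequential use: distinct values, as in the unrelaxed object.
assert o.fi_write_phase(0, o.fi_read_phase()) == 1
assert o.fi_write_phase(1, o.fi_read_phase()) == 2
# Concurrent use: both read phases happen before either write phase.
x0, x1 = o.fi_read_phase(), o.fi_read_phase()
assert o.fi_write_phase(0, x0) == o.fi_write_phase(1, x1) == 3
assert o.read() == 4
```

The duplicated return value 3 appears only because the two operations overlap, matching Definition 4, where a concurrency class of Fetch&Inc operations all receive the current state q.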
Theorem 2. There is a wait-free and set-linearizable implementation of readable Fetch&Inc with
multiplicity that uses only Read/Write operations and does not use Read-After-Write patterns.

Proof. Consider the Read/Write implementation described just above. Clearly, the implementation
is wait-free. It does not use Read-After-Write patterns because Fetch&Inc performs a sequence of reads
followed by a write, and Read performs a sequence of reads. We argue that it is set-linearizable
too.
Let E be any finite execution of the implementation. Since the algorithm is wait-free, there
is an extension of E in which all its pending operations are completed, and no new operation is
started. Any set-linearization of such an extension is a set-linearization of E. Thus, without loss of
generality, we can assume that all operations in E are completed.
We obtain a set-linearization SE of E. In the set-linearization, at any time the state of the
object is the maximum value in M . Consider the set Aq with all Fetch&Inc operations in E that
return the same value q; hence all these operations write q + 1 in M . Let eq be the step among the
Write steps of the operations in Aq that appears first in E; let op denote the operation eq belongs
to. We claim that eq lies between the invocation and response of any operation op′ in Aq distinct
from op: eq cannot appear before the invocation of op′ because then the response of op′ would be at
least q + 1, instead of q, and eq cannot appear after the response of op′ because then eq would not be
the first step that writes q + 1, contradicting the definition of eq . We set-linearize all operations in
Aq at step eq , i.e. the operations form a concurrency class placed at eq .
We now linearize (i.e. in a concurrency class for itself) each Read operation in E. Consider any
such operation op, and let q denote the value it returns. Let e be the first Read step of op, and let
f denote the first Read step of op that reads q. Note that at step e, no entry of M contains a value
greater than q, because otherwise op would later read, and hence return, a value greater than q.
Operation op is linearized as follows. If at e some entry of M contains q (hence the maximum value
in M at e is exactly q), then op is linearized at e. Otherwise, op is linearized right before the first
step (not of op) that writes a value greater than q in M and lies between e and f , or at f if no such
step exists; in either case, the maximum value in M at the chosen point is exactly q. If several Read
operations are linearized at the same place, any sequential order is chosen.
The concurrency classes define the set-sequential execution SE . We next show that SE is a
set-linearization of E.
First, we argue that SE is a set-sequential execution of readable Fetch&Inc with multiplicity.
Let X be the maximum value in M at the end of E. By reverse induction on q ∈ {1, . . . , X − 1},
we can argue that there is a step eq in E, as defined above: the only way an entry of M stores X
at some point is that an operation performs Write(X), and hence eX−1 exists; the operation
eX−1 belongs to reads X − 1 from an entry of M , and hence M stores X − 1 at some point, and
the argument continues using the same reasoning. Observe now that eq appears in E before eq+1 :
an operation can perform Write(q + 2) only if it reads q + 1 from an entry of M first. Thus we have
that in E, e1 appears first, next e2 , then e3 , and so on up to eX−1 . By construction, the Fetch&Inc
operations returning q are set-linearized at the first step that writes q + 1 in M , and every Read
operation returning q is linearized at a point where the maximum value in M is exactly q. Thus
we have that SE is a set-sequential execution of the Fetch&Inc with multiplicity.
To conclude that SE is a set-linearization of E, we first observe that, by construction, E and SE
have the same operations with same responses. Consider operations op and op′ such that op <E op′.
In SE , every operation is set-linearized at a step that lies between the invocation and response of
the operation. Thus, the concurrency class of op in SE appears before the concurrency class of op′
in SE , i.e. op <SE op′. Therefore, SE is a set-linearization of E.

5 Model of Computation with Set-Atomic/Set-Linearizable Base
Objects
In the rest of the paper we consider a generalization of the standard model described in Section 2,
where base objects are set-atomic instead of atomic. Base objects are now specified with
set-sequential state machines (see Section 3), and hence several operations might be executed
concurrently in the same set-step of the object (i.e. a set of operations of the same base object),
but only if the specification of the object allows it. More specifically, if at a given time of an execution several
processes are about to invoke operations of the same set-atomic base object X, and the operations
are allowed to be executed concurrently, then the next set-step of X might include all operations;
X is allowed to serialize the operations too.
The notions of implementation, pending and complete operations, linearizability, set-linearizabi-
lity, wait-freedom and non-blockingness are the same in the generalized model. The only difference
is that now an execution of an implementation is a possibly infinite sequence of set-steps of base
objects, plus sets of invocations of high-level operations and sets of responses to high-level operations
of the high-level object to be implemented. The executions satisfy the usual well-formedness
properties. The partial relation <E over the set of completed operations of an execution E is defined
similarly.
It is important to mention that a process is “unaware” of whether its operations on base objects
belong to set-steps with other operations. That is, in an execution E, the view of a process pi in E
is only the sequence of pi ’s operations on base objects that appear in E. It can however be the case
that the specification of a base object provides information telling a process that a given operation
appears in a set-step with several operations (e.g. we can define a version of a set with multiplicity
in which Take additionally returns a boolean indicating if the returned item is taken by more than
one Take operation).
In our set-linearizable queue and stack with multiplicity implementations, the set-atomic base
object will be the set with multiplicity and the readable Fetch&Inc with multiplicity defined in
Section 4. By the specification of readable Fetch&Inc with multiplicity, set-steps with Read opera-
tions can contain only a single operation. Similarly, by the specification of a set with multiplicity,
set-steps with Put operations, or with Take operations that return empty, are singletons. By the composability
property of set-linearizability, from the wait-free Read/Write implementations of these objects in
Section 4, we can obtain fully Read/Write set-linearizable implementations of queues and stacks
with multiplicity.

6 Stacks and Queues with Multiplicity


In this section we explore relaxations of queues and stacks that can be implemented in a set-linearizable
manner using only Read/Write operations. As already anticipated, the implementations are derived in a
modular manner from the set-linearizable implementations in Section 4. The case of the stack is
presented first and then the case of the queue.

6.1 Stacks with Multiplicity
Roughly speaking, our concurrent stack allows concurrent Pop operations to obtain the same item,
but all items are returned in LIFO order, and no pushed item is lost. Formally, our stack is specified
as follows:
Definition 5 (The Set-Sequential Stack with Multiplicity). The universe of items that can be
pushed is N = {1, 2, . . .}, and the set of states Q is the infinite set of strings N∗ . The initial state
is the empty string, denoted ε. In state q, the first element in q represents the top of the stack,
which might be empty if q is the empty string. The transitions are the following:
1. For every state q ∈ Q,

δ(q, Push(x)) = (x · q, ⟨Push(x) : true⟩).

2. For every state x · q ∈ Q \ {ε} and t ∈ {1, . . . , n} distinct process indexes id1 , . . . , idt ,

δ(x · q, {Popid1 (), . . . , Popidt ()}) = (q, {⟨Popid1 () : x⟩, . . . , ⟨Popidt () : x⟩}).

3.
δ(ε, Pop()) = (ε, ⟨Pop() : ε⟩).
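Definition 5 translates directly into a transition function. Below is a minimal executable rendering under our own encoding (not the paper's notation): a state is a Python list with index 0 as the top, ε is the string "empty", and a concurrency class is a list of operation tuples.

```python
def delta_stack(state, ops):
    """Transition function of the stack with multiplicity (our encoding).
    state: list of items, index 0 = top; ops: one concurrency class."""
    kinds = {op[0] for op in ops}
    if kinds == {"Push"}:
        assert len(ops) == 1               # Push classes are singletons
        return [ops[0][1]] + state, [True]
    assert kinds == {"Pop"}
    if state:                              # t >= 1 concurrent Pops: same top
        return state[1:], [state[0]] * len(ops)
    assert len(ops) == 1                   # Pop on the empty stack: singleton
    return state, ["empty"]

st, _ = delta_stack([], [("Push", 7)])
st, _ = delta_stack(st, [("Push", 9)])            # stack is now [9, 7]
st, resp = delta_stack(st, [("Pop",), ("Pop",)])  # two concurrent Pops
assert resp == [9, 9] and st == [7]               # both obtain the top item 9
st, resp = delta_stack(st, [("Pop",)])
assert resp == [7] and st == []
```

Only rule 2 admits non-singleton concurrency classes, which is exactly where the multiplicity relaxation shows up: all Pops in one class receive the single current top, and the item is removed once.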

We obtain a set-linearizable implementation of a stack with multiplicity from the simple
linearizable wait-free stack implementation of Afek, Gafni and Morisson [2], which uses Fetch&Inc
and Test&Set base objects, whose consensus number is 2. Figure 3 contains a slight variant of this
algorithm that uses Swap and readable Fetch&Inc objects, both with consensus number 2.4 A Push
operation reserves a slot in Items by atomically reading and incrementing T op (Line 01) and then
places its item in the corresponding position (Line 02). A Pop operation simply reads the T op of the
stack (Line 04) and scans down Items from that position (Line 05), trying to obtain an item (a
non-⊥ value) with the help of a Swap operation (Lines 06 and 07); if the operation cannot get an item,
it returns empty (denoted ε, Line 09). In what follows, we call this implementation Seq-Stack. It
is worth mentioning that, although Seq-Stack has a simple structure, its linearizability proof is far
from trivial, the difficult part being proving that items are taken in LIFO order.
The algorithm in Figure 4 is a wait-free set-linearizable implementation of the stack with mul-
tiplicity, which we call Set-Seq-Stack. The implementation is a simple modification of Seq-Stack,
where the readable Fetch&Inc base object in T op is replaced with a readable Fetch&Inc with mul-
tiplicity base object in T op sets, and the Swap base objects in Items are replaced by sets with
multiplicity base objects in Sets. Since several processes can get the same value from T op sets in
Line 01 (as Fetch&Inc has multiplicity), several Push operations can store their items in the same
set in Sets, in Line 02; note that the set-sequential specification of T op sets implies that this can
happen only if the Push operations are concurrent (as the steps in Line 01 of the operations must
belong to the same set-step of T op sets). Similarly, since the sets in Sets have multiplicity, it can
happen that several Pop operations get the same item from the same set, in Line 06; again, the
set-sequential specification of sets with multiplicity directly implies that those Pop operations are
concurrent.
For simplicity and without loss of generality, the proof of the following theorem assumes that
every item is pushed at most once in every execution.
4 The authors themselves explain in [2] how to replace Test&Set with Swap.

Shared Variables:
T op : readable Fetch&Inc base object initialized to 1
Items[1, . . .] : array of Swap base objects initialized to ⊥

Operation Push(xi ) is
(01) topi ← T op.Fetch&Inc()
(02) Items[topi ].Write(xi )
(03) return true
end Push

Operation Pop() is
(04) topi ← T op.Read() − 1
(05) for ri ← topi down to 1 do
(06) xi ← Items[ri ].Swap(⊥)
(07) if xi ≠ ⊥ then return xi end if
(08) end for
(09) return ε
end Pop

Figure 3: Stack implementation Seq-Stack of Afek, Gafni and Morisson [2] (code for process pi ).

Shared Variables:
T op sets : readable Fetch&Inc with multiplicity base object initialized to 1
Sets[1, . . .] : array of sets with multiplicity base objects initialized to ∅

Operation Push(x) is
(01) topi ← T op sets.Fetch&Inc()
(02) Sets[topi ].Put(x)
(03) return true
end Push

Operation Pop() is
(04) topi ← T op sets.Read() − 1
(05) for ri ← topi down to 1 do
(06) xi ← Sets[ri ].Take()
(07) if xi ≠ (∅, ·) then return xi end if
(08) end for
(09) return ε
end Pop

Figure 4: Set-Seq-Stack: a set-linearizable wait-free stack with multiplicity (code for process pi ).

Theorem 3. The algorithm Set-Seq-Stack (Figure 4) is a wait-free set-linearizable implementation
of the stack with multiplicity, from readable Fetch&Inc with multiplicity and sets with multiplicity
set-sequential base objects.
Proof. It follows directly from the pseudocode that the implementation is wait-free. Thus, we focus
on proving that the implementation is set-linearizable. Let E be any execution of Set-Seq-Stack.
Since the algorithm is wait-free, there is an extension of E in which all its pending operations
are completed, and no new operation is started. Any set-linearization of such an extension is a
set-linearization of E. Thus, without loss of generality, we can assume all operations in E are completed.

Removing multiplicity of Pop operations. To set-linearize E, we first obtain a related
execution F of Set-Seq-Stack. Let x be an item that is popped by several Pop operations. As already
explained, all these operations must be concurrent, as the set-sequential specification of a set with
multiplicity implies that their Take steps in Line 06 that obtain x belong to the same set-step of the
same base object in Sets. We remove the steps, invocations and responses of all these operations,

14
except for the one whose Read step in Line 04 appears last in E; namely, we keep only the Pop
operation that spans the shortest invocation-response interval in E (recall that invocations and
responses of operations are identified with their first and last steps, as explained in Section 2). We
do the same for each such item x. Let F denote the resulting sequence of steps, invocations and
responses. Observe that F is actually an execution of Set-Seq-Stack: the removed Pop operations
only read T op sets, and F still contains a Pop operation that pops the item of each removed Pop
operation.

Linearizability implies set-linearizability. For the time being, suppose that F is set-linearizable
and let SetLin(F ) be any set-linearization of it. We make two claims about F . The first one is that
SetLin(F ) is actually a linearization of a stack (i.e. each concurrency class is a singleton), namely,
with no relaxation. The claim directly follows from the fact that every item in F is popped at most
once and the set-sequential specification of a stack with multiplicity allows concurrency classes with
more than one operation only in the case of Pop operations that take the same item.
The second claim is that a set-linearization SetLin(E) of E can be obtained from SetLin(F )
by adding every Pop operation removed from E to the concurrency class of SetLin(F ) with the
Pop operation returning the same item. The main observation to prove the claim is that, by
construction, the invocation-response interval of every Pop operation removed from E, contains
the invocation-response interval of the Pop in F that returns the same item. To prove the claim,
consider two operations op and op′ of E such that op <E op′. We will argue that the concurrency
class of op appears before the concurrency class of op′ in SetLin(E), namely, op <SetLin(E) op′, from
which follows that SetLin(E) is indeed a set-linearization of E (as E and SetLin(E) have the same
set of operations and with the same responses, by construction, and SetLin(E) is a set-sequential
execution of a stack with multiplicity, since SetLin(F ) is a sequential execution of the stack). We
have four cases:

1. op and op′ appear in F . We have that op <F op′, and hence op <SetLin(F ) op′, since SetLin(F )
is a linearization of F . By definition of SetLin(E), op <SetLin(E) op′.

2. op appears in F and op′ does not appear in F . It must be that op′ is a Pop operation that
returns an item, say x. Let op″ be the Pop operation of F that returns x. As already explained,
in E, the invocation-response interval of op′ contains the invocation-response interval of op″.
Then, op <E op″, which implies that op <F op″. Thus, op <SetLin(F ) op″, since SetLin(F ) is
a linearization of F , and then op <SetLin(E) op″, by definition of SetLin(E). Since op′ belongs
to the concurrency class of op″ in SetLin(E), we finally have op <SetLin(E) op′.

3. op does not appear in F and op′ appears in F . Then, op is a Pop operation that returns
an item, say x; let op″ be the Pop operation of F that returns x. In E, the responses of op
and op″ occur at the same Take set-step in Line 06 (recall that invocations and responses
of operations are identified with their first and last steps), and hence op″ <E op′, which
implies that op″ <F op′. Thus, op″ <SetLin(F ) op′, since SetLin(F ) is a linearization of F , and
then op″ <SetLin(E) op′, by definition of SetLin(E). Since op belongs to the concurrency class
of op″ in SetLin(E), we have op <SetLin(E) op′.

4. op and op′ do not appear in F . In this case op and op′ are Pop operations that return
distinct items, say x and y, respectively. Let op″ and op‴ be the Pop operations of F that
return x and y, respectively. Using a similar reasoning as in the previous two cases, we

can show that op″ <SetLin(E) op‴, and then op <SetLin(E) op′, as op and op′ belong to the
concurrency classes of op″ and op‴ in SetLin(E), respectively.

From Set-Seq-Stack to Seq-Stack. Thus, to conclude our set-linearizability proof of Set-Seq-Stack,
we just need to argue that F is indeed linearizable. We do so by a sort of reduction to the
linearizability proof of Seq-Stack. We transform F into an execution H of Seq-Stack such that every
linearization Lin(H) of H gives a linearization Lin(F ) of F ; since Seq-Stack is linearizable, Lin(H)
exists, and hence Lin(F ) exists too, from which we conclude that Set-Seq-Stack is set-linearizable.
Figure 5: Graphical description of the transformation from F to H.

The main idea in the transformation is that the number of Push operations that have executed
Line 01 of Set-Seq-Stack so far gives the current value of T op in Seq-Stack, and the contents of Sets
in Set-Seq-Stack encode the content of the Items array in Seq-Stack. The idea of the construction
is schematized with an example in Figure 5.

An example of the transformation. In the example, the state of T op sets and Sets at the
end of F appears at the left. In F , only six Push operations happen, some of them pending and some
completed; three of the Push operations get one from T op sets in Line 01, one operation gets two,
and two operations get three. A non-underlined item in a set of Sets indicates that the item has
actually been put in the set, i.e., the Push operation pushing the item is complete, hence it has
already executed Line 02, whereas an underlined item indicates that the item has not been put in
the set yet, namely, the Push operation pushing the item is pending, and then it has not executed
Line 02. The state of T op and Items at the end of H appears at the right of the figure. In H, the
same six Push operations happen too, and hence T op contains seven at the end of H. Each set in
Sets defines a segment of Items; the length of a segment equals the number of Push operations
that put their item in the corresponding set. As before, an underlined item in Items indicates
that the Push operation pushing that item is pending and has not executed its step in Line 02 of
Seq-Stack so far. To complete the example, H is obtained from F as follows. Consider the set-step
T op sets.Fetch&Inc in F that returns 1; let e denote that step. Then, the steps T op sets.Fetch&Inc
in Line 01 of Push(a), Push(b) and Push(c) belong to e (i.e. they are concurrent). In H, e is

replaced with T op.Fetch&Inc of Push(c) that returns 1, followed by T op.Fetch&Inc of Push(a) that
returns 2, and finally T op.Fetch&Inc of Push(b) that returns 3. These steps correspond to steps
in Line 01 of Seq-Stack, and the specific ordering is because c, a and b are stored in the example
in Items[1], Items[2] and Items[3], respectively. We do a similar replacement with the other
set-steps T op sets.Fetch&Inc in F . Consider now the step Sets[1].Put(a) in Line 02 of Push(a) in F
(which is assumed to exist as a is not underlined); this step is replaced with the Items[2].Write(a)
of Push(a), which corresponds to a step in Line 02 of Seq-Stack. We do a similar replacement with
all Sets[·].Put(·) steps in F . This gives the execution H of Seq-Stack. A main idea is that each set
in an entry of Sets in F is represented by its induced segment of Items in H.
[Figure: an execution fragment F (top) with the steps Sets[1].Put(a), Sets[1].Put(c), Sets[1].Put(b) and
three Sets[1].Take() steps returning (∅, 0), a and c; below it, the corresponding fragment of H, in which
each Take is replaced by a descending sequence of Items[·].Swap(⊥) steps.]

Figure 6: Extracting the order of the items from F .

A natural question that arises after our example is the order of the items in a segment of Items
defined by a set in Sets. As a concrete case in our example, the items in Sets[1] are placed c, a and
b in Items[1], Items[2] and Items[3] in H, respectively. Why did we pick this specific order? In the
example, there is no particular reason: since there are no Pop operations in F , any possible order
of a, b and c in the segment Items[1, 2, 3] gives an execution of Seq-Stack; it similarly happens with
the items in Sets[3]. However, if there are Pop operations that get any of these items, such order is
important for H to be an actual execution of Seq-Stack. To exemplify this, consider the situation
depicted in Figure 6. Now F , at the top of the figure, contains steps Sets[1].Take that correspond
to Line 06, and belong to distinct Pop operations; the first step does not get an item (and hence
the corresponding Pop operation returns ε), while the second and the third do get items. We look
at these Sets[1].Take steps to obtain a partial order ≺ on the items in the set Sets[1], and then
extend ≺ to a total order; such total order gives the order of the items in the segment Items[1, 2, 3].
Consider the step ⟨Sets[1].Take() : a⟩. Right before that step, Sets[1] contains a and c, and
since a is taken first among the two items, we will add the relation c ≺ a. Consider the next step
⟨Sets[1].Take() : c⟩. Right before that step, Sets[1] contains only c, and hence the step does not add
relations. We now consider any total order that includes the relation c ≺ a, for example c ≺ a ≺ b.
Then, we assign c, a and b to Items[1], Items[2] and Items[3], respectively, which agrees with the
total order c ≺ a ≺ b.
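The extraction of ≺ from the Take trace, and the choice of a linear extension, can be sketched with a hypothetical helper (names and the trace encoding are ours). Each trace entry records the contents of the set right before a Take step and the item taken, if any; the extension is computed Kahn-style, repeatedly emitting an item with no pending predecessor.

```python
def item_order(items, takes):
    """Derive the partial order ≺ from a trace of Take steps and return a
    linear extension of it.
    takes: list of (set contents right before the step, item taken or None)."""
    prec = set()
    for contents, taken in takes:
        if taken is not None:
            for y in contents - {taken}:
                prec.add((y, taken))             # record y ≺ taken
    remaining, order = list(items), []
    while remaining:                             # topological sort of prec
        z = next(x for x in remaining
                 if not any((y, x) in prec for y in remaining))
        remaining.remove(z)
        order.append(z)
    return order

# The example from the text: Sets[1] = {a, b, c}; the first Take returns
# empty, then a is taken while {a, c} are present (so c ≺ a), then c.
trace = [(set(), None), ({"a", "c"}, "a"), ({"c"}, "c")]
order = item_order(["a", "b", "c"], trace)
assert order.index("c") < order.index("a")       # any extension respects c ≺ a
```

Items earlier in the returned list get the lower (deeper) indices of the segment, matching the assignment of c, a and b to Items[1], Items[2] and Items[3].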
To obtain H, each step Sets[1].Take() of F is replaced in H with a sequence of Swap steps
that correspond to steps in Line 06 of Seq-Stack, and involve some or all of the entries in the
segment Items[1, 2, 3]. The resulting execution H is depicted at the bottom of Figure 6. The
step ⟨Sets[1].Take() : (∅, 0)⟩ is replaced with the steps ⟨Items[3].Swap(⊥) : ⊥⟩, ⟨Items[2].Swap(⊥) : ⊥⟩
and ⟨Items[1].Swap(⊥) : ⊥⟩, in that order (abbreviated in Figure 6 with Items[3, 2, 1].Swap(⊥) :
⊥, ⊥, ⊥); note that the values obtained by these operations are consistent with the contents of
Items[1, 2, 3] at that step. Similarly, the step ⟨Sets[1].Take() : a⟩ is replaced with ⟨Items[3].Swap(⊥) :

⊥⟩, ⟨Items[2].Swap(⊥) : a⟩, and finally ⟨Sets[1].Take() : c⟩ is replaced with ⟨Items[3].Swap(⊥) : ⊥⟩,
⟨Items[2].Swap(⊥) : ⊥⟩ and ⟨Items[1].Swap(⊥) : c⟩.

The transformation in detail. We formalize the transformation just described. First we present
some definitions, considering the state of the shared base objects at the end of F . Let T op sets
itself denote its content at the end of F . For each q ≥ 1, let Sq be the set containing every item
x such that Push(x) appears in F and the operation obtains q in its step in Line 01. We have
that: (1) for every Push(x) in F , there is a set Sq containing x, by definition of the sets Sq ; (2) for
every q ≥ T op sets, Sq is empty, by the specification of readable Fetch&Inc with multiplicity; (3) for
every q < T op sets, Sq is non-empty, by the specification of readable Fetch&Inc with multiplicity;
(4) for every two distinct q, r < T op sets, Sq and Sr are disjoint, since every item is pushed at
most once, by assumption. For each x ∈ Sq , let set(x) be the integer q. Also let |S≤q | denote the
sum |S1 | + |S2 | + . . . + |Sq |, and let |S0 | denote 0. From now on, we consider only sets Sq with
1 ≤ q < T op sets. Each such set defines a segment of Items[1, 2, . . . , |S≤T op sets−1 |], whose length
equals the cardinality of the set. Specifically, Sq defines the segment Items[|S≤q−1 | + 1, . . . , |S≤q |];
alternatively, we say that the segment is defined by Sq .
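For instance, in the example above |S1 | = 3, |S2 | = 1 and |S3 | = 2, so the induced segments are Items[1, 2, 3], Items[4] and Items[5, 6]. A small helper (the name is ours) computing the segment boundaries from the set sizes:

```python
def segments(set_sizes):
    """Return the 1-based segment [|S<=q-1|+1, |S<=q|] of Items for each S_q."""
    segs, start = [], 1
    for size in set_sizes:                 # set_sizes[q-1] = |S_q|
        segs.append((start, start + size - 1))
        start += size                      # running prefix sum |S<=q|
    return segs

# |S1| = 3 ({a, b, c}), |S2| = 1 ({d}), |S3| = 2 ({e, f})
assert segments([3, 1, 2]) == [(1, 3), (4, 4), (5, 6)]
```

Note that the segment boundaries are just prefix sums of the set cardinalities, so the segments partition Items[1, . . . , |S≤T op sets−1 |] exactly as the definition requires.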
We now assign each item in Sq to an entry in the segment defined by Sq . To do so, we define
a strict partial order ≺ over the items in Sq , and then extend it to a total order from which the
assignment is directly obtained. We define ≺ by considering the steps Sets[q].Take in F , that
correspond to Line 06. Let e be any of these steps that returns an item, say x, and let (T, r)
be the state of the set Sets[q] right before e. Then, x ∈ T . For every y ∈ T distinct from x, we
add the relation y ≺ x. We do the same with any such step e. All Sets[q].Take steps that return
(∅, ·) do not add any relation. We have that ≺ is irreflexive (as every item is pushed at most once,
by assumption) and antisymmetric (as once y ≺ x is set, x is removed from Sets[q] and never
pushed again). Consider the transitive extension of ≺, and consider any linear extension of the
resulting strict partial order. By abuse of notation, let ≺ denote that total order. Let us denote
the items in Sq as x0 , x1 , . . . , x|Sq |−1 such that x0 ≺ x1 ≺ . . . ≺ x|Sq |−1 . Then, each xi is assigned
to Items[|S≤q−1 | + 1 + i]; namely, x0 is assigned to Items[|S≤q−1 | + 1], x1 to Items[|S≤q−1 | + 2],
and so on, up to x|Sq |−1 that is assigned to Items[|S≤q−1 | + 1 + |Sq | − 1] = Items[|S≤q |]. For each
xi ∈ Sq , let id(xi ) denote |S≤q−1 | + 1 + i, i.e. the index of the entry of Items that xi is assigned to.
We are now ready to explain in detail how F is transformed into H, arguing during the expla-
nation that H is actually an execution of Seq-Stack. We take any operation in F and replace its
steps with steps of Seq-Stack with consistent responses.

• Push operations. Consider any set-step T op sets.Fetch&Inc in F ; let e denote the step. Then,
e contains steps that correspond to Line 01 and are of some concurrent Push operations of
F . Observe that the set Sq as defined above contains the items pushed by these operations,
where q is the value that the Fetch&Inc’s return in e (hence e contains as many Fetch&Inc
operations as the number of items in Sq , and the state of T op sets right before e is q).
Let x0 , x1 , . . . , x|Sq |−1 denote the items in Sq such that id(x0 ) < id(x1 ) < . . . < id(x|Sq |−1 ).
By definition, id(xi ) = |S≤q−1 | + 1 + i. Now, in H, e is replaced with a sequence of steps
T op.Fetch&Inc, each of them corresponding to a step in Line 01 of Seq-Stack. Concretely, e
is replaced with the next sequence with |Sq | steps:

⟨T op.Fetch&Inc() : id(x0 )⟩, ⟨T op.Fetch&Inc() : id(x1 )⟩, . . . , ⟨T op.Fetch&Inc() : id(x|Sq |−1 )⟩,

where ⟨T op.Fetch&Inc() : id(xi )⟩ corresponds to the step of Push(xi ) in Line 01, in H. Observe
that in H, T op’s content before the sequence that replaces e is id(x0 ) = |S≤q−1 | + 1, and
its content after the sequence is id(x|Sq |−1 ) + 1 = |S≤q | + 1, which is consistent with the
specification of Fetch&Inc. Thus, T op follows the specification of Fetch&Inc in H.
Consider any Push(x) in F . We just said how its step in Line 01 is replaced in H. We
now deal with its step Sets[set(x)].Put(x) corresponding to Line 02, which will be denoted
e: the step is simply replaced with ⟨Items[id(x)].Write(x) : true⟩; this step is of Push(x) in
H and corresponds to Line 02 of Seq-Stack, and will be denoted H(e). Note that in H, the
step of Push(x) corresponding to Line 01 obtains id(x), as defined above, and hence it is
consistent with Seq-Stack that Push(x) writes in Items[id(x)]. Also observe that in F , the
set in Sets[set(x)] contains x right after e happens, and in H, the segment of Items defined
by Sset(x) contains x in its entry Items[id(x)] right after H(e) happens.

• Pop operations. Consider any Pop operation of F , and let e be its T op sets.Read step
corresponding to Line 04. In H, this step is replaced with ⟨T op.Read() : |S≤q | + 1⟩, where q + 1
is the content of T op sets at e; let H(e) denote that step of H, which corresponds to a step
of Line 04 of Seq-Stack. We argue that the response of H(e) in H is consistent. Prior to e in
F , there are q set-steps that correspond to T op sets.Fetch&Inc in Line 01; these q set-steps
are replaced in H as mentioned above. By construction, the content of T op in H right before
H(e) is |S≤q | + 1, from which follows that the response of H(e) in H is consistent.
Consider now any step Sets[q].Take of the same Pop operation, which corresponds to a
step in Line 06. Let e denote that step. Consider the set Sq as defined before, and let
x0 , x1 , . . . , x|Sq |−1 denote the items in Sq such that id(x0 ) < id(x1 ) < . . . < id(x|Sq |−1 ). Suppose
first that e returns y. By definition, y ∈ Sq . Let xi be such that y = xi . Then, e is replaced
in H with the next sequence of steps, which will be denoted H(e):

⟨Items[id(x|Sq |−1 )].Swap(⊥) : ⊥⟩, . . . , ⟨Items[id(xi+1 )].Swap(⊥) : ⊥⟩, ⟨Items[id(xi )].Swap(⊥) : xi ⟩.

These steps correspond to steps in Line 06 of Seq-Stack, all of them of the same Pop operation
in H. The set Sets[q] contains xi right before e, in F . Thus, the response in the step
⟨Items[id(xi )].Swap(⊥) : xi ⟩ in H(e) is consistent; note that xi does not appear in the segment
defined by Sq after H(e) in H. We argue that the responses in the steps in H(e) before
⟨Items[id(xi )].Swap(⊥) : xi ⟩ are consistent too. The only way one of these responses is
inconsistent is if in F , Sets[q] contains an item z right before e, and z = xj with id(xj ) >
id(xi ). Namely, ⟨Items[id(xj )].Swap(⊥) : ⊥⟩ appears before ⟨Items[id(xi )].Swap(⊥) : xi ⟩
in H(e) (and hence the response in ⟨Items[id(xj )].Swap(⊥) : ⊥⟩ is incorrect in H because
Items[id(xj )] = xj right before H(e)). But this cannot happen: since z is in Sets[q] right
before e and xi is taken at e, we have xj ≺ xi by definition of ≺, and hence the index assigned
to xj is smaller than the index assigned to xi , namely, id(xj ) < id(xi ), from which follows
that ⟨Items[id(xj )].Swap(⊥) : ⊥⟩ does not even appear in H(e). Thus, all responses in H(e)
that replace e are consistent.
To conclude the transformation, suppose that e returns (∅, ·). Then, e is replaced in H with
the sequence of steps:

⟨Items[id(x|Sq |−1 )].Swap(⊥) : ⊥⟩, . . . , ⟨Items[id(x1 )].Swap(⊥) : ⊥⟩, ⟨Items[id(x0 )].Swap(⊥) : ⊥⟩.

As Sets[q] is empty right before e in F , all responses in the sequence are consistent.

Therefore, F is transformed into an execution H of Seq-Stack. By construction, F and H have
the same operations and with the same responses. Moreover, <F = <H , since only some steps of F
are replaced to obtain H. Thus, any linearization of H is a linearization of F . This completes the
proof of the theorem.

The composability property of set-linearizability implies that a fully Read/Write and wait-
free set-linearizable implementation of a stack with multiplicity can be obtained from Set-Seq-Stack
and the wait-free Read/Write set-linearizable implementations in Section 4. Furthermore, when
replacing those implementations in Set-Seq-Stack, Push executes a sequence of reads followed by
two writes, and Pop executes a sequence of reads, possibly followed by one write. We therefore
have:

Corollary 1. Set-sequential stacks with multiplicity can be wait-free set-linearizable implemented
using only Read/Write base objects and no Read-After-Write patterns.

6.2 Queues with Multiplicity


The specification of set-sequential queues with multiplicity is very similar to that of set-sequential
stacks with multiplicity:

Definition 6 (The Set-Sequential Queue with Multiplicity). The universe of items that can be
enqueued is N = {1, 2, . . .}, and the set of states Q is the infinite set of strings N∗ . The initial
state is the empty string, denoted ε. In state q, the first element of q represents the head of the
queue; the queue is empty if q is the empty string. The transitions are the following:

1. For every state q ∈ Q,

δ(q, Enq(x)) = (q · x, ⟨Enq(x) : true⟩).

2. For every state x · q ∈ Q \ {ε}, every t ∈ {1, . . . , n} and distinct process indexes id1 , . . . , idt ,

δ(x · q, {Deqid1 (), . . . , Deqidt ()}) = (q, {⟨Deqid1 () : x⟩, . . . , ⟨Deqidt () : x⟩}).

3. For the empty state,

δ(ε, Deq()) = (ε, ⟨Deq() : ε⟩).
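To make the transitions concrete, the following toy model (a Python sketch; all function names are ours, not the paper's) walks Definition 6 on a small execution. It shows the defining relaxation: a set of pairwise-concurrent Deq operations all obtain the head item, while the state loses that item only once.

```python
# Toy model of the set-sequential queue-with-multiplicity transitions
# of Definition 6. States are tuples of items; one transition may
# consume a *set* of pairwise-concurrent Deq invocations.

EMPTY = None  # stands for the empty-queue response (epsilon)

def delta_enq(state, x):
    """Item 1: Enq(x) appends x at the tail and always returns true."""
    return state + (x,), True

def delta_deq_set(state, t):
    """Item 2: t pairwise-concurrent Deq operations all receive the
    current head of a non-empty queue. Item 3: on the empty queue
    each Deq receives the empty response."""
    if not state:
        return state, [EMPTY] * t
    return state[1:], [state[0]] * t

state = ()
state, _ = delta_enq(state, 1)
state, _ = delta_enq(state, 2)
# Multiplicity: two concurrent dequeuers may both obtain head item 1,
# yet the item is removed from the state exactly once.
state, resp = delta_deq_set(state, 2)
print(resp, state)   # [1, 1] (2,)
```

A later Deq that runs alone (a singleton set) then obtains the next item exactly once, so sequential executions behave like an ordinary queue.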

We now consider the linearizable queue implementation in Figure 7, which uses objects with
consensus number two. The idea of the implementation, which we call Seq-Queue, is similar to that
of Seq-Stack in the previous section. Differently from Seq-Stack, whose operations are wait-free,
Seq-Queue has a wait-free Enqueue operation, denoted Enq, and a non-blocking Dequeue operation,
denoted Deq.5
Seq-Queue is a slight modification of the non-blocking queue implementation of Li [28], which
in turn is a variation of the blocking queue implementation of Herlihy and Wing [24]. Each Enq
operation simply reserves a slot for its item by performing Fetch&Inc on the tail of the queue,
Line 01, and then stores the item in Items, Line 02. A Deq operation repeatedly tries to obtain an
item by scanning Items from position 1 to the tail of the queue (from its perspective), Line 08; every
time it sees that an item has been stored in an entry of Items, Lines 10 and 11, it tries to obtain the
item by atomically replacing it with ⊤, which signals that the item stored in that entry has been
taken, Line 12. While scanning, the operation records the number of items that have been taken
(from its perspective), Line 14, and if this number equals the number of items taken in the previous
scan and Tail has not changed its value (cf. double clean scan), it declares the queue empty, Line 17.
Despite its simplicity, Seq-Queue's linearizability proof is far from trivial.

5
It is an open question whether there are wait-free linearizable queue implementations using only objects with
consensus number two (see [4, 11, 16, 29, 30]).

Shared Variables:
Tail : Fetch&Inc base object initialized to 1
Items[1, . . .] : array of Swap base objects initialized to ⊥

Operation Enq(xi ) is
(01) taili ← Tail.Fetch&Inc()
(02) Items[taili ].Write(xi )
(03) return true
end Enq

Operation Deq() is
(04) taken′i ← 0
(05) tail′i ← 0
(06) while true do
(07) takeni ← 0
(08) taili ← Tail.Read() − 1
(09) for ri ← 1 up to taili do
(10) xi ← Items[ri ].Read()
(11) if xi ≠ ⊥ then
(12) xi ← Items[ri ].Swap(⊤)
(13) if xi ≠ ⊤ then return xi end if
(14) takeni ← takeni + 1
(15) end if
(16) end for
(17) if takeni = taken′i and taili = tail′i then return ε end if
(18) taken′i ← takeni
(19) tail′i ← taili
(20) end while
end Deq

Figure 7: Non-blocking linearizable queue Seq-Queue from base objects with consensus number 2 (code for
pi ).
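For readers who prefer executable code, the following is a minimal Python transcription of Seq-Queue; it is an illustrative sketch, not the paper's artifact. The atomic Fetch&Inc and Swap base objects the algorithm assumes are simulated here with lock-protected classes, and the sentinel TAKEN plays the role of ⊤.

```python
import threading

TAKEN = object()  # plays the role of the "taken" marker ⊤ (Line 12)

class FetchInc:
    """Lock-based stand-in for an atomic Fetch&Inc base object."""
    def __init__(self, value=1):
        self._v, self._lock = value, threading.Lock()
    def fetch_inc(self):
        with self._lock:
            v = self._v
            self._v += 1
            return v
    def read(self):
        with self._lock:
            return self._v

class SwapReg:
    """Lock-based stand-in for an atomic Swap base object; None plays ⊥."""
    def __init__(self):
        self._v, self._lock = None, threading.Lock()
    def write(self, x):
        with self._lock:
            self._v = x
    def read(self):
        with self._lock:
            return self._v
    def swap(self, x):
        with self._lock:
            old, self._v = self._v, x
            return old

class SeqQueue:
    def __init__(self):
        self.tail = FetchInc(1)
        self.items = {}                       # unbounded Items[1..]
    def _slot(self, r):
        return self.items.setdefault(r, SwapReg())
    def enq(self, x):
        t = self.tail.fetch_inc()             # Line 01: reserve a slot
        self._slot(t).write(x)                # Line 02: publish the item
        return True
    def deq(self):
        taken_prev = tail_prev = 0
        while True:
            taken = 0
            tail = self.tail.read() - 1       # Line 08
            for r in range(1, tail + 1):      # Lines 09-16: scan slots
                if self._slot(r).read() is not None:      # Line 11
                    x = self._slot(r).swap(TAKEN)         # Line 12
                    if x is not TAKEN:
                        return x                          # Line 13
                    taken += 1                            # Line 14
            if taken == taken_prev and tail == tail_prev:
                return None                   # Line 17: double clean scan
            taken_prev, tail_prev = taken, tail
```

When accessed sequentially the sketch behaves like an ordinary FIFO queue, with None standing for the empty response ε.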

The algorithm in Figure 8 is a set-linearizable non-blocking implementation of a queue with
multiplicity, which we call Set-Seq-Queue. The algorithm is a variant of Seq-Queue that replaces
Tail with a readable Fetch&Inc with multiplicity base object, Tail_sets, and replaces the array
Items of Swap base objects with an array Sets of sets with multiplicity base objects. Note
that the value of takeni is updated when Sets[ri ].Take() is empty, in Line 11 (recall that q denotes
the number of items that have been inserted in and removed from the set so far). The correctness
proof of Set-Seq-Queue is similar to the correctness proof of Set-Seq-Stack in Theorem 3.

Theorem 4. The algorithm Set-Seq-Queue (Figure 8) is a non-blocking set-linearizable imple-
mentation of the queue with multiplicity, from readable Fetch&Inc with multiplicity and sets with
multiplicity set-sequential base objects.

Proof. First, observe that the Enq method is wait-free. To prove that Deq is non-blocking, it is
enough to observe that the only way a Deq operation never terminates is that it sets larger values in
takeni or taili at the end of each iteration of the while loop, which can only happen if there are
new Enq operations and Deq operations, implying that infinitely many operations are completed.

Shared Variables:
Tail_sets : readable Fetch&Inc with multiplicity base object initialized to 1
Sets[1, . . .] : array of sets with multiplicity base objects initialized to ∅

Operation Enq(x) is
(01) taili ← Tail_sets.Fetch&Inc()
(02) Sets[taili ].Write(x)
(03) return true
end Enq

Operation Deq() is
(04) taken′i ← 0
(05) tail′i ← 0
(06) while true do
(07) takeni ← 0
(08) taili ← Tail_sets.Read() − 1
(09) for ri ← 1 up to taili do
(10) xi ← Sets[ri ].Take()
(11) if xi = (∅, q) then takeni ← takeni + q
(12) else return xi end if
(13) end for
(14) if takeni = taken′i and taili = tail′i then return ε end if
(15) taken′i ← takeni
(16) tail′i ← taili
(17) end while
end Deq

Figure 8: Read/Write non-blocking set-concurrent queue Set-Seq-Queue with multiplicity (code for pi ).
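A single-process Python sketch of Set-Seq-Queue is given below; all class and method names are ours. The class MultSet models a set-with-multiplicity base object whose Take() returns either a stored item or the pair (∅, q), with q counting the items already inserted in and removed from the set. A sequential model cannot exhibit the multiplicity behavior itself (the same item handed to several concurrent Take operations); it only illustrates the control flow and the (∅, q) bookkeeping.

```python
class MultSet:
    """Toy set-with-multiplicity base object (sequential model)."""
    def __init__(self):
        self._items = []
        self._removed = 0       # items inserted and already removed
    def write(self, x):         # used by Enq, Line 02
        self._items.append(x)
    def take(self):             # Line 10
        if self._items:
            self._removed += 1
            return self._items.pop(0)
        return (set(), self._removed)   # the empty response (∅, q)

class SetSeqQueue:
    """Sequential sketch of Figure 8; items must not be tuples, since
    a tuple response is how this model encodes (∅, q)."""
    def __init__(self):
        self._tail = 1          # models Tail_sets
        self._sets = {}
    def _slot(self, r):
        return self._sets.setdefault(r, MultSet())
    def enq(self, x):
        t, self._tail = self._tail, self._tail + 1   # Fetch&Inc, Line 01
        self._slot(t).write(x)                       # Line 02
        return True
    def deq(self):
        taken_prev = tail_prev = 0
        while True:
            taken = 0
            tail = self._tail - 1                    # Line 08
            for r in range(1, tail + 1):             # Line 09
                x = self._slot(r).take()             # Line 10
                if isinstance(x, tuple):             # empty: (∅, q)
                    taken += x[1]                    # Line 11
                else:
                    return x                         # Line 12
            if taken == taken_prev and tail == tail_prev:
                return None                          # Line 14: ε
            taken_prev, tail_prev = taken, tail
```

As with the Seq-Queue sketch, sequential accesses behave like an ordinary FIFO queue, matching the observation that objects with multiplicity are sequential objects when accessed sequentially.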

The set-linearizability proof is similar to that in the proof of Theorem 3. From any execution
E with no pending operations, an execution F where every item is taken at most once is similarly
obtained. It is the case again that every linearization of F gives a set-linearization of E. Next,
F is transformed in a similar way to obtain an execution H of Seq-Queue; the differences in the
transformation are the following:

1. Once the relation ≺ over the elements of a set Sq is defined, the items in Sq are assigned to the
corresponding segment of Items in opposite order, since Deq scans Items in index-ascending
order in Seq-Queue, namely, from 1 up to taili .

2. The local variable takeni is updated right after the sequence of steps that replaces a step
Sets[ri ].Take() in Line 10 that returns (∅, q); concretely, q is added to takeni .

Finally, any linearization of H gives a linearization of F , which concludes the proof.

Again, the composability property of set-linearizability and our Read/Write implementations in
Section 4 directly give the following:

Corollary 2. Set-sequential queues with multiplicity can be non-blocking set-linearizable imple-
mented using only Read/Write base objects and no Read-After-Write patterns.

7 Final Discussion
Considering classical data structures initially defined for sequential computing, this work has in-
troduced new well-defined specifications suitable for a concurrent system. Also, it has investigated
algorithms that implement them on top of “as weak as possible” synchronization mechanisms.

It has introduced the notion of set-sequential sets, queues, stacks and Fetch&Inc with multi-
plicity; these are concurrent versions of their sequential counterparts in which concurrent operations
are allowed to obtain the same item/value. Non-blocking and wait-free set-linearizable implementations
were presented, all based only on Read/Write operations and without Read-After-Write patterns;
arguably these are among the simplest synchronization mechanisms in concurrent systems. The
queue and stack with multiplicity set-linearizable implementations were derived in a modular man-
ner from set-linearizable implementations of Fetch&Inc and sets with multiplicity. These are the
first Read/Write implementations of relaxations of objects with consensus number two.
Objects defined by a sequential specification are at the center of distributed computing [36].
The present article has shown that set-linearizability allows us to consider objects as inherently
concurrent, and not only as sequential objects whose specification has been “massaged” in order to
use them in a concurrent context. Sequential objects constitute a well-known but strict subset of
concurrent objects. Remarkably, the concurrent objects presented in this article behave as sequen-
tial objects when accessed sequentially, hence they naturally adapt to the concurrency patterns
that occur in their executions.
This work also can be seen as extending the results described in [13, 33] where the notion of
set-linearizability [33] was investigated. It has shown that linearizability and set-linearizability
constitute a hierarchy of consistency conditions that allow us to formally express the behavior of
non-trivial (and still meaningful) concurrent queues and stacks on top of simple base objects such
as Read/Write registers.
An interesting extension to this work is to explore whether the notion of multiplicity can lead to
practical efficient implementations. In this direction, [12] has proposed fully Read/Write fence-free
work-stealing implementations with multiplicity, which have stronger guarantees than previous
work-stealing relaxations. Another interesting extension is to explore whether there are other
relaxations of concurrent data structures that allow implementations without requiring the stronger
computational power provided by atomic Read-Modify-Write operations.

References
[1] Afek Y., Attiya H., Dolev D., Gafni E., Merritt M., and Shavit N., Atomic snapshots of shared memory.
Journal of the ACM, 40(4):873-890 (1993)

[2] Afek Y., Gafni E., and Morrison A., Common2 extended to stacks and unbounded concurrency.
Distributed Computing, 20(4):239-252 (2007)

[3] Afek Y., Korland G., and Yanovsky E., Quasi-linearizability: relaxed consistency for improved concur-
rency. Proc. 14th Int’l Conference on Principles of Distributed Systems, (OPODIS’10), Springer LNCS
6490, pp. 395-410 (2010)

[4] Afek Y., Weisberger E., and Weisman H., A completeness theorem for a class of synchronization objects.
Proc. 12th ACM Symposium on Principles of Distributed Computing (PODC’93), ACM Press, pp. 159–
170 (1993)

[5] Alistarh D., Brown T., Kopinsky J., Li J. and Nadiradze G., Distributionally linearizable data struc-
tures. Proc. 30th Symposium on Parallelism in Algorithms and Architectures (SPAA'18), ACM
Press, pp. 133–142 (2018)

[6] Alistarh D., Kopinsky J., Li J., and Shavit N., The SprayList: a scalable relaxed priority queue. Proc.
20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP’15),
ACM Press, pp. 11–20 (2015)

[7] Aspnes J., Attiya H., Censor-Hillel K., and Ellen F., Limited-use atomic snapshots with polylogarithmic
step complexity. Journal of the ACM, 62(1), pp. 3:1–3:22 (2015)

[8] Attiya H., Guerraoui R., Hendler D., and Kuznetsov P., The complexity of obstruction-free implemen-
tations. Journal of the ACM, 56(4), Article 24, 33 pages (2009)

[9] Attiya H., Guerraoui R., Hendler D., Kuznetsov P., Michael M.M., and Vechev M.T., Laws of order:
expensive synchronization in concurrent algorithms cannot be eliminated. Proc. 38th ACM SIGPLAN-
SIGACT Symposium on Principles of Programming Languages (POPL’11), ACM Press, pp. 487-498
(2011)

[10] Attiya H., Herlihy M., and Rachman O., Atomic snapshots using lattice agreement. Distributed Com-
puting, 8(3):121–132 (1995)

[11] Attiya H., Castañeda A., Hendler D., Nontrivial and universal helping for wait-free queues and stacks.
Journal of Parallel Distributed Computing, 121:1-14 (2018)

[12] Castañeda A., Piña M. A., Fully read/write fence-free work-stealing with multiplicity. Proc. 35th In-
ternational Symposium on Distributed Computing (DISC’21), LIPIcs Vol. 209, pp. 16:1–16:20 (2021)

[13] Castañeda A., Rajsbaum S., and Raynal M., Unifying concurrent objects and distributed tasks: interval-
linearizability. Journal of the ACM, 65(6), Article 45, 42 pages (2018)

[14] Castañeda A., Rajsbaum S., and Raynal M., Relaxed queues and stacks from read/write operations
Proc. 24th Int’l Conference Principles of Distributed Systems (OPODIS’20), LIPIcs Vol. 184, pp. 13:1-
13:19 (2020)

[15] Hendler D., Shavit N., and Yerushalmi L., A scalable lock-free stack algorithm. Journal of Parallel and
Distributed Computing, 70(1):1-12 (2010)

[16] Eisenstat D., Two-enqueuer queue in Common2. arXiv:0805.0444v2, 12 pages (2009)

[17] Ellen F., Hendler D., and Shavit N., On the inherent sequentiality of concurrent objects. SIAM
Journal on Computing, 41(3):519-536 (2012)

[18] Haas A., Henzinger T.A., Holzer A., Kirsch Ch.M, Lippautz M., Payer H., Sezgin A., Sokolova A., and
Veith H., Local linearizability for concurrent set-type data structures. Proc. 27th Int’l Conference on
Concurrency Theory, (CONCUR’16), LIPIcs Vol. 59, pages 6:1–6:15 (2016)

[19] Haas A., Lippautz M., Henzinger T.A., Payer H., Sokolova A., Kirsch C.M., and Sezgin A., Distributed
queues in shared memory: multicore performance and scalability through quantitative relaxation. Com-
puting Frontiers Conference (CF’13), ACM Press, pp.17:1–17:9 (2013)

[20] Hemed N., Rinetzky N., and Vafeiadis V., Modular verification of concurrent-aware linearizability. Proc.
29th Int’l Conference on Distributed Computing (DISC’15), Springer LNCS 9363, pp. 371–387 (2015)

[21] Henzinger T.A., Kirsch C.M., Payer H., Sezgin A., and Sokolova A., Quantitative relaxation of concur-
rent data structures. Proc. 40th ACM SIGPLAN-SIGACT Symposium on Principles of Programming
Languages (POPL'13), ACM Press, pp. 17:1–17:9 (2013)

[22] Herlihy M.P., Wait-free synchronization. ACM Transactions on Programming Languages and Systems,
13(1):124-149 (1991)

[23] Herlihy M.P. and Shavit N., The art of multiprocessor programming. Morgan Kaufmann, 508 pages,
ISBN 978-0-12-370591-4 (2008)

[24] Herlihy M.P. and Wing J.M., Linearizability: a correctness condition for concurrent objects. ACM
Transactions on Programming Languages and Systems, 12(3):463-492 (1990)

[25] Imbs D. and Raynal M., Help when needed, but no more: efficient read/write partial snapshot. Journal
of Parallel and Distributed Computing, 72(1):1-13 (2012)

[26] Kirsch C.M., Lippautz M., and Payer H., Fast and scalable lock-free FIFO queues. Proc. 12th Int. Confer-
ence on Parallel Computing Technologies (PaCT'13), Springer LNCS 7979, pp. 208–223 (2013)

[27] Kirsch C.M., Payer H., Röck H., and Sokolova A., Performance, scalability, and semantics
of concurrent FIFO queues. Proc. 12th Int. Conference on Algorithms and Architectures for Parallel
Processing (ICAAPP'12), Springer LNCS 7439, pp. 273–287 (2012)

[28] Li Z., Non-blocking implementations of queues in asynchronous distributed shared-memory systems.
Tech report, Department of Computer Science, University of Toronto (2001)

[29] Matei D., A single-enqueuer wait-free queue implementation. Proc. 18th Int’l Conference on Distributed
Computing (DISC’04), Springer LNCS 3274, pp. 132–143 (2004)

[30] Matei D., Brodsky A., and Ellen F., Restricted stack implementations. 19th Int’l Conference on Dis-
tributed Computing (DISC’05), Springer LNCS 3724, pp. 137–151 (2005)

[31] Michael M.M., Vechev M.T., and Saraswat S.A., Idempotent work stealing. Proc. 14th ACM SIGPLAN
Symposium on Principles and Practice of Parallel Programming, (PPOPP’09), ACM Press, pp. 45–54
(2009)

[32] Moir M. and Shavit N., Concurrent data structures. Handbook of Data Structures and Applications,
chapter 47, Chapman and Hall/CRC Press, 33 pages (2007)

[33] Neiger G., Brief announcement: Set linearizability. Proc. 13th annual ACM symposium on Principles
of distributed computing (PODC’94), ACM Press, page 396 (1994)

[34] Nguyen D., Lenharth A., and Pingali K., A lightweight infrastructure for graph analytics. Proc. 24th
ACM Symposium on Operating Systems Principles (SOSP’13), ACM Press, pp. 456–471 (2013)

[35] Payer H., Röck H., Kirsch C.M., and Sokolova A., Brief announcement: Scalability versus seman-
tics of concurrent FIFO queues. Proc. 30th ACM Symposium on Principles of Distributed Computing
(PODC’11), ACM Press, pp. 331–332 (2011)

[36] Rajsbaum S. and Raynal M., Mastering concurrent computing through sequential thinking. Communi-
cations of the ACM, 63(1):78–87 (2020)

[37] Raynal M., Concurrent programming: algorithms, principles and foundations. Springer, 515 pages,
ISBN 978-3-642-32026-2 (2013)

[38] Rihani H., Sanders P. and Dementiev R., Brief Announcement: MultiQueues: simple relaxed concur-
rent priority queues. Proc. 27th ACM Symposium on Parallelism in Algorithms and Architectures
(SPAA'15), ACM Press, pp. 80–82 (2015)

[39] Shavit N., Data structures in the multicore age. Communications of the ACM, 54(3):76–84 (2011)

[40] Shavit N. and Taubenfeld G., The computability of relaxed data structures: queues and stacks as
examples. Distributed Computing, 29(5):395–407 (2016)

[41] Talmage E. and Welch J.L., Relaxed data types as consistency conditions. Algorithms, 11(5)61, 18
pages (2018)

[42] Taubenfeld G., Synchronization algorithms and concurrent programming. Pearson Education/Prentice
Hall, 423 pages, ISBN 0-131-97259-6 (2006)

[43] Zhou T., Michael M.M., and Spear M.F., A practical, scalable, relaxed priority queue. Proc. 48th Int’l
Conference on Parallel Processing (ICPP’19), pp. 57:1–57:10 (2019)

