
Lightweight Asynchronous Snapshots for Distributed Dataflows

Paris Carbone (1), Gyula Fóra (2), Stephan Ewen (3), Seif Haridi (1,2), Kostas Tzoumas (3)

(1) KTH Royal Institute of Technology - {parisc,haridi}@kth.se
(2) Swedish Institute of Computer Science - {gyfora, seif}@sics.se
(3) Data Artisans GmbH - {stephan, kostas}@data-artisans.com

Abstract

Distributed stateful stream processing enables the deployment and execution of large scale continuous computations in the cloud, targeting both low latency and high throughput. One of the most fundamental challenges of this paradigm is providing processing guarantees under potential failures. Existing approaches rely on periodic global state snapshots that can be used for failure recovery. Those approaches suffer from two main drawbacks. First, they often stall the overall computation, which impacts ingestion. Second, they eagerly persist all records in transit along with the operator states, which results in larger snapshots than required. In this work we propose Asynchronous Barrier Snapshotting (ABS), a lightweight algorithm suited for modern dataflow execution engines that minimises space requirements. ABS persists only operator states on acyclic execution topologies, while keeping a minimal record log on cyclic dataflows. We implemented ABS on Apache Flink, a distributed analytics engine that supports stateful stream processing. Our evaluation shows that our algorithm does not have a heavy impact on the execution, maintaining linear scalability and performing well with frequent snapshots.

Keywords: fault tolerance, distributed computing, stream processing, dataflow, cloud computing, state management

1. Introduction

Distributed dataflow processing is an emerging paradigm for data intensive computing which allows continuous computations on data in high volumes, targeting low end-to-end latency while guaranteeing high throughput. Several time-critical applications could benefit from dataflow processing systems such as Apache Flink [1] and Naiad [11], especially in the domains of real-time analysis (e.g. predictive analytics and complex event processing). Fault tolerance is of paramount importance in such systems, as failures cannot be afforded in most real-world use cases. Currently known approaches that guarantee exactly-once semantics on stateful processing systems rely on global, consistent snapshots of the execution state. However, there are two main drawbacks that make their application inefficient for real-time stream processing. Synchronous snapshotting techniques stop the overall execution of a distributed computation in order to obtain a consistent view of the overall state. Furthermore, to our knowledge all of the existing algorithms for distributed snapshots include records that are in transit in channels or unprocessed messages throughout the execution graph as part of the snapshotted state. Most often this makes the snapshotted state larger than required.

In this work, we focus on providing lightweight snapshotting, specifically targeted at distributed stateful dataflow systems, with low impact on performance. Our solution provides asynchronous state snapshots with low space costs that contain only operator states in acyclic execution topologies. Additionally, we cover the case of cyclic execution graphs by applying downstream backup on selected parts of the topology while keeping the snapshot state to a minimum. Our technique does not halt the streaming operation and only introduces a small runtime overhead. The contributions of this paper can be summarised as follows:

• We propose and implement an asynchronous snapshotting algorithm that achieves minimal snapshots on acyclic execution graphs.
• We describe and implement a generalisation of our algorithm that works on cyclic execution graphs.
• We show the benefits of our approach compared to the state-of-the-art, using Apache Flink Streaming as a base system for comparisons.

The rest of the paper is organised as follows: Section 2 gives an overview of existing approaches for distributed global snapshots in stateful dataflow systems. Section 3 provides an overview of the Apache Flink processing and execution model, followed by Section 4 where we describe our main approach to global snapshotting in detail. Our recovery scheme is described briefly in Section 5. Finally, Section 6 summarises our implementation, followed by our evaluation in Section 7 and future work and conclusion in Section 8.

2. Related Work

Several recovery mechanisms have been proposed during the last decade for systems that do continuous processing [4, 11]. Systems that emulate continuous processing on top of stateless distributed batch computations, such as Discretized Streams and Comet [6, 15], rely on state recomputation. On the other hand, stateful dataflow systems such as Naiad, SDGs, Piccolo and SEEP [3, 5, 11, 12], which are our main focus in this work, use checkpointing to obtain consistent snapshots of the global execution for failure recovery.

The problem of consistent global snapshots in distributed environments, as introduced by Chandy and Lamport [4], has been researched extensively throughout the last decades [4, 7, 8]. A global snapshot theoretically reflects the overall state of an execution, or a possible state at a specific instance of its operation. A simple but costly approach employed by Naiad [11] is to perform a synchronous snapshot in three steps: first halting the overall computation of the execution graph, then performing the snapshot, and finally instructing each task to continue its operation once the global snapshot is complete. This approach has a high impact on both throughput and space requirements due to the need to block the whole computation, while also relying on upstream backup that logs emitted records at the producer side. Another popular approach, originally proposed by Chandy and Lamport [4], that is deployed in many systems today is to perform snapshots asynchronously while eagerly doing upstream backup [4, 5, 10]. This is achieved by distributing markers throughout the execution graph that trigger the persistence of operator and channel state. This approach, though, still suffers from additional space requirements due to the need of an upstream backup and, as a result, higher recovery times caused by the reprocessing of backup records. Our approach extends the original asynchronous snapshotting idea of Chandy and Lamport; however, it requires no backup logging of records for acyclic graphs, while keeping only very selective backup records on cyclic execution graphs.

3. Background: The Apache Flink System

Our current work is guided by the need for fault tolerance on Apache Flink Streaming, a distributed stream analytics system that is part of the Apache Flink stack (formerly Stratosphere [2]). Apache Flink is architected around a generic runtime engine that uniformly processes both batch and streaming jobs composed of stateful interconnected tasks. Analytics jobs in Flink are compiled into directed graphs of tasks. Data elements are fetched from external sources and routed through the task graph in a pipelined fashion. Tasks continuously manipulate their internal state based on the received inputs and generate new outputs.

3.1 The Streaming Programming Model

The Apache Flink API for stream processing allows the composition of complex streaming analytics jobs by exposing unbounded partitioned data streams (partially ordered sequences of records) as its core data abstraction, called DataStreams. DataStreams can be created from external sources (e.g. message queues, socket streams, custom generators) or by invoking operations on other DataStreams. DataStreams support several operators such as map, filter and reduce in the form of higher order functions that are applied incrementally per record and generate new DataStreams. Every operator can be parallelised by placing parallel instances to run on different partitions of the respective stream, thus allowing the distributed execution of stream transformations.

The code in Example 1 shows how to implement a simple incremental word count in Apache Flink. In this program words are read from a text file and the current count for each word is printed to the standard output. This is a stateful streaming program, as sources need to be aware of their current file offset and counters need to maintain the current count for each word as their internal state.
val env: StreamExecutionEnvironment = ...
env.setParallelism(2)

val wordStream = env.readTextFile(path)
val countStream = wordStream.groupBy(_).count
countStream.print

Example 1: Incremental Word Count

Figure 1: The execution graph for incremental word count

3.2 Distributed Dataflow Execution

When a user executes an application, all DataStream operators compile into an execution graph that is in principle a directed graph G = (T, E), similarly to Naiad [11], where vertices T represent tasks and edges E represent data channels between tasks. An execution graph is depicted in Fig. 1 for the incremental word count example. As shown, every instance of an operator is encapsulated in a respective task. Tasks can be further classified as sources when they have no input channels and as sinks when they have no output channels. Furthermore, M denotes the set of all records transferred by tasks during their parallel execution. Each task t ∈ T encapsulates the independent execution of an operator instance and is composed of the following: (1) a set of input and output channels, It, Ot ⊆ E; (2) an operator state st; and (3) a user defined function (UDF) ft. Data ingestion is pull-based: during its execution each task consumes input records, updates its operator state and generates new records according to its user defined function. More specifically, for each record r ∈ M received by a task t ∈ T, a new state st′ is produced along with a set of output records D ⊆ M according to its UDF ft : st, r ↦ ⟨st′, D⟩.
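To make this task model concrete, the following Scala sketch (illustrative only; the Record, Task and CountTask names are not part of the Apache Flink API) shows one state transition in the spirit of ft : st, r ↦ ⟨st′, D⟩, using the word-count state of Example 1.

// A minimal sketch of the task model: a task holds an operator state and a
// user defined function mapping (state, record) to a new state plus outputs.
case class Record(key: String, value: Long)

trait Task[S] {
  def state: S
  // udf: (state, record) -> (new state, output records), i.e. ft : st, r -> <st', D>
  def udf(state: S, record: Record): (S, Seq[Record])
}

// Example: an incremental word-count task whose state is a per-key counter.
class CountTask(var state: Map[String, Long] = Map.empty)
    extends Task[Map[String, Long]] {
  def udf(s: Map[String, Long], r: Record): (Map[String, Long], Seq[Record]) = {
    val updated = s.updated(r.key, s.getOrElse(r.key, 0L) + 1L)
    (updated, Seq(Record(r.key, updated(r.key))))
  }
}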
4. Asynchronous Barrier Snapshotting

In order to provide consistent results, distributed processing systems need to be resilient to task failures. A way of providing this resilience is to periodically capture snapshots of the execution graph which can be used later to recover from failures. A snapshot is a global state of the execution graph, capturing all necessary information to restart the computation from that specific execution state.

4.1 Problem Definition

We define a global snapshot G∗ = (T∗, E∗) of an execution graph G = (T, E) as a set of all task and edge states, T∗ and E∗ respectively. In more detail, T∗ consists of all operator states st∗ ∈ T∗, ∀t ∈ T, while E∗ is a set of all channel states e∗ ∈ E∗, where e∗ consists of the records that are in transit on e.

We require that certain properties hold for each snapshot G∗ in order to guarantee correct results after recovery, namely termination and feasibility as described by Tel [14]. Termination guarantees that a snapshot algorithm eventually finishes in finite time after its initiation if all processes are alive. Feasibility expresses the meaningfulness of a snapshot, i.e. that during the snapshotting process no information has been lost regarding the computation. Formally, this implies that causal order [9] is maintained in the snapshot, such that records delivered in tasks are also sent from the viewpoint of the snapshot.
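The definitions above can be summarised by a small Scala sketch (hypothetical types, for illustration only); as the next subsection shows, ABS keeps the channel part of the snapshot empty on acyclic graphs.

// G* = (T*, E*): operator states per task plus in-transit records per channel.
case class GlobalSnapshot[S, R](
  taskStates: Map[String, S],          // T*: one operator state st* per task t
  channelStates: Map[String, Seq[R]]   // E*: records in transit on each channel e
)

object GlobalSnapshot {
  // For ABS on acyclic graphs the channel part is empty, so a snapshot
  // reduces to the operator states alone.
  def acyclic[S, R](taskStates: Map[String, S]): GlobalSnapshot[S, R] =
    GlobalSnapshot(taskStates, Map.empty)
}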
4.2 ABS for Acyclic Dataflows

It is feasible to do snapshots without persisting channel states when the execution is divided into stages. Stages divide the injected data streams and all associated computations into a series of possible executions where all prior inputs and generated outputs have been fully processed. The set of operator states at the end of a stage reflects the whole execution history; therefore, it can be used on its own as a snapshot. The core idea behind our algorithm is to create identical snapshots with staged snapshotting while keeping a continuous data ingestion.

In our approach, stages are emulated in a continuous dataflow execution by special barrier markers injected periodically into the input data streams, which are pushed throughout the whole execution graph down to the sinks. Global snapshots are incrementally constructed as each task receives the barriers indicating execution stages. We further make the following assumptions for our algorithm:
• Network channels are quasi-reliable, respect a FIFO delivery order and can be blocked and unblocked. When a channel is blocked all messages are buffered but not delivered until it gets unblocked.
• Tasks can trigger operations on their channel components such as block, unblock and send messages. Broadcasting messages is also supported on all output channels.
• Messages injected in source tasks (i.e. stage barriers) are resolved into a "Nil" input channel.

Figure 2: Asynchronous barrier snapshots for acyclic graphs

Algorithm 1 Asynchronous Barrier Snapshotting for Acyclic Execution Graphs
1: upon event ⟨Init | input channels, output channels, fun, init state⟩ do
2:     state := init state; blocked inputs := ∅;
3:     inputs := input channels;
4:     outputs := output channels; udf := fun;
5:
6: upon event ⟨receive | input, ⟨barrier⟩⟩ do
7:     if input ≠ Nil then
8:         blocked inputs := blocked inputs ∪ {input};
9:         trigger ⟨block | input⟩;
10:    if blocked inputs = inputs then
11:        blocked inputs := ∅;
12:        broadcast ⟨send | outputs, ⟨barrier⟩⟩;
13:        trigger ⟨snapshot | state⟩;
14:        for each inputs as input
15:            trigger ⟨unblock | input⟩;
16:
17:
18: upon event ⟨receive | input, msg⟩ do
19:    {state′, out records} := udf(msg, state);
20:    state := state′;
21:    for each out records as {output, out record}
22:        trigger ⟨send | output, out record⟩;

The ABS Algorithm 1 proceeds as follows (depicted in Fig. 2): A central coordinator periodically injects stage barriers to all the sources. When a source receives a barrier it takes a snapshot of its current state, then broadcasts the barrier to all its outputs (Fig. 2(a)). When a non-source task receives a barrier from one of its inputs, it blocks that input until it receives a barrier from all inputs (line 9, Fig. 2(b)). When barriers have been received from all the inputs, the task takes a snapshot of its current state and broadcasts the barrier to its outputs (lines 12-13, Fig. 2(c)). Then, the task unblocks its input channels to continue its computation (line 15, Fig. 2(d)). The complete global snapshot G∗ = (T∗, E∗) will consist solely of all operator states T∗, where E∗ = ∅.
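The following Scala sketch illustrates the barrier-alignment step performed by a single non-source task, mirroring lines 6-15 of Algorithm 1. The Channel type and its block/unblock operations stand in for the channel components assumed above; they are hypothetical placeholders, not the actual Flink implementation.

trait Channel { def block(): Unit; def unblock(): Unit }

class AcyclicTask[S](
    inputs: Set[Channel],
    outputs: Seq[Channel],
    var state: S,
    persist: S => Unit,                // triggers <snapshot | state>
    broadcastBarrier: Seq[Channel] => Unit) {

  private var blockedInputs = Set.empty[Channel]

  def onBarrier(input: Channel): Unit = {
    blockedInputs += input
    input.block()                      // buffer further records on this input
    if (blockedInputs == inputs) {     // barriers received from all inputs
      blockedInputs = Set.empty
      broadcastBarrier(outputs)        // forward the barrier downstream
      persist(state)                   // snapshot contains only operator state
      inputs.foreach(_.unblock())      // resume normal processing
    }
  }
}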
Proof Sketch 1: As mentioned earlier, a snapshot algorithm should guarantee termination and feasibility.

Termination is guaranteed by the channel and acyclic execution graph properties. Reliability of the channels ensures that every barrier sent will eventually be received as long as the tasks are alive. Furthermore, as there is always a path from a source, every task in a DAG topology will eventually receive barriers from all its input channels and take a snapshot.

For feasibility it suffices to show that the operator states in a global snapshot reflect only the history of records processed up to the last stage. This is guaranteed by the FIFO ordering property of the channels and the blocking of input channels upon barriers, ensuring that no post-shot records of a stage (records succeeding a barrier) are processed before a snapshot is taken.

1 We omit the formal proof of the algorithms in this version of the paper due to space limitations.
Figure 3: Asynchronous barrier snapshots for cyclic graphs

4.3 ABS for Cyclic Dataflows

In the presence of directed cycles in the execution graph, the ABS algorithm presented before would not terminate, resulting in a deadlock, as tasks in a cycle would wait indefinitely to receive barriers from all their inputs. Additionally, records that are arbitrarily in transit within cycles would not be included in the snapshot, thus violating feasibility. It is therefore needed to consistently include all records generated within a cycle in the snapshot for feasibility, and to put these records back in transit upon recovery. Our approach to deal with cyclic graphs extends the basic algorithm without inducing any additional channel blocking, as seen in Algorithm 2. First, we identify back-edges L on loops in the execution graph by static analysis. From control flow graph theory, a back-edge in a directed graph is an edge that points to a vertex that has already been visited during a depth-first search. The execution graph G(T, E \ L) is a DAG containing all tasks in the topology. From the perspective of this DAG the algorithm operates as before; nevertheless, we additionally apply downstream backup of records received from identified back-edges over the duration of a snapshot. This is achieved by each task t that is a consumer of back-edges Lt ⊆ It creating a backup log of all records received from Lt from the moment it forwards barriers until receiving them back from Lt. Barriers push all records in transit within loops into the downstream logs, so they are included exactly once in the consistent snapshot.

Algorithm 2 Asynchronous Barrier Snapshotting for Cyclic Execution Graphs
1: upon event ⟨Init | input channels, backedge channels, output channels, fun, init state⟩ do
2:     state := init state; marked := ∅;
3:     inputs := input channels; logging := False;
4:     outputs := output channels; udf := fun;
5:     loop inputs := backedge channels;
6:     state copy := Nil; backup log := [];
7:
8: upon event ⟨receive | input, ⟨barrier⟩⟩ do
9:     marked := marked ∪ {input};
10:    regular := inputs \ loop inputs;
11:    if input ≠ Nil AND input ∉ loop inputs then
12:        trigger ⟨block | input⟩;
13:    if ¬logging AND marked = regular then
14:        state copy := state; logging := True;
15:        broadcast ⟨send | outputs, ⟨barrier⟩⟩;
16:        for each inputs as input
17:            trigger ⟨unblock | input⟩;
18:
19:    if marked = input channels then
20:        trigger ⟨snapshot | {state copy, backup log}⟩;
21:        marked := ∅; logging := False;
22:        state copy := Nil; backup log := [];
23:
24: upon event ⟨receive | input, msg⟩ do
25:    if logging AND input ∈ loop inputs then
26:        backup log := backup log :: [msg];
27:    {state′, out records} := udf(msg, state);
28:    state := state′;
29:    for each out records as {output, out record}
30:        trigger ⟨send | output, out record⟩;

The ABS Algorithm 2 proceeds as follows (depicted in Fig. 3) in more detail: Tasks with back-edge inputs create a local copy of their state once all their regular (e ∉ L) channels have delivered barriers (line 14, Fig. 3(b)). Moreover, from this point they log all records delivered from their back-edges until they receive stage barriers from them (line 26). This allows, as seen in Fig. 3(c), all pre-shot records that are in transit within loops to be included in the current snapshot. Note that the final global snapshot G∗ = (T∗, L∗) contains all task states T∗ and only the back-edge records in transit, L∗ ⊂ E∗.

Proof Sketch: Again, we need to prove that termination and feasibility are guaranteed in this version of the algorithm.

As in 4.2, termination is guaranteed because every task will eventually receive barriers from all its inputs (including the back-edge channels) and complete its snapshot. By broadcasting the barrier as soon as it has been received from all regular inputs, we avoid the deadlock condition mentioned previously.

The FIFO ordering property still holds for back-edges, and the following properties prove feasibility: (1) Each task state included in the snapshot is a state copy of the respective task, taken before processing any post-shot events from barriers received on regular inputs. (2) The downstream log included in the snapshot is complete and contains all pending post-shot records prior to barriers received on back-edges, due to FIFO guarantees.
5. Failure Recovery

While not being the main focus of this work, a working failure recovery scheme motivates the application of our snapshotting approach. Thus, we provide a brief explanation here regarding its operation. There are several failure recovery schemes that work with consistent snapshots. In its simplest form, the whole execution graph can be restarted from the last global snapshot as follows: every task t (1) retrieves from persistent storage its associated state for the snapshot st and sets it as its initial state, (2) recovers its backup log and processes all contained records, and (3) starts ingesting records from its input channels.

Figure 4: Upstream Dependency Recovery Scheme

A partial graph recovery scheme is also possible, similarly to TimeStream [13], by rescheduling only upstream task dependencies (tasks that hold output channels to the failed tasks) and their respective upstream tasks up to the sources. An example recovery plan is shown in Fig. 4. In order to offer exactly-once semantics, duplicate records should be ignored in all downstream nodes to avoid recomputation. To achieve this, we can follow a similar scheme to SDGs [5] and mark records with sequence numbers from the sources; thus, every downstream node can discard records with sequence numbers less than what it has already processed.
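A minimal Scala sketch of this sequence-number based de-duplication is given below; the types and names are hypothetical and only illustrate the discard rule described above.

// Sources stamp records with increasing sequence numbers; after recovery each
// downstream task drops records it has already processed.
case class SequencedRecord[R](sourceId: String, seqNr: Long, payload: R)

class DedupFilter {
  // highest sequence number already processed, per source task
  private var processed = Map.empty[String, Long]

  def accept[R](r: SequencedRecord[R]): Boolean = {
    val last = processed.getOrElse(r.sourceId, -1L)
    if (r.seqNr <= last) false                          // replayed duplicate, drop it
    else { processed += (r.sourceId -> r.seqNr); true } // new record, remember it
  }
}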
6. Implementation

We contributed the implementation of the ABS algorithm to Apache Flink in order to provide exactly-once processing semantics for the streaming runtime. In our current implementation, blocked channels store all incoming records on disk instead of keeping them in memory, to increase scalability. While this technique ensures robustness, it increases the runtime impact of the ABS algorithm.

In order to distinguish operator state from data, we introduced an explicit OperatorState interface which contains methods for updating and checkpointing the state. We provided OperatorState implementations for the stateful runtime operators supported by Apache Flink, such as offset-based sources or aggregations.

Snapshot coordination is implemented as an actor process on the job manager that keeps a global state for the execution graph of a single job. The coordinator periodically injects stage barriers to all sources of the execution graph. Upon reconfiguration, the last globally snapshotted state is restored in the operators from a distributed in-memory persistent storage.
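The following Scala sketch gives a rough idea of what an interface with update and checkpoint methods could look like; it is a hypothetical illustration, not the actual OperatorState interface shipped with Apache Flink.

trait OperatorStateSketch[S, IN] {
  def value: S                          // current state
  def update(input: IN): Unit           // fold an input record into the state
  def checkpoint(): S                   // copy persisted when a snapshot is taken
  def restore(saved: S): Unit           // reload the state upon recovery
}

// Example: an offset-based source state, one of the stateful operators
// mentioned above (names are illustrative).
class SourceOffsetState(private var offset: Long = 0L)
    extends OperatorStateSketch[Long, Long] {
  def value: Long = offset
  def update(readBytes: Long): Unit = offset += readBytes
  def checkpoint(): Long = offset
  def restore(saved: Long): Unit = offset = saved
}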
7. Evaluation

The goal of our evaluation is to compare the runtime overhead of ABS with the globally synchronised snapshot algorithm employed in Naiad [11], and also to test the scalability of the algorithm to a larger number of nodes.

Figure 5: Execution topology used for evaluation
Figure 6: Runtime impact comparison on varying snapshot intervals (net runtime in seconds vs. snapshot interval, for ABS, synchronised snapshotting and the baseline)
Figure 7: Runtime with ABS on different cluster sizes (net runtime in seconds vs. cluster size, for 3 sec snapshots and the baseline)

7.1 Experiment Setup

The execution topology (Fig. 5) used for evaluation consists of 6 distinct operators with parallelism equal to the number of cluster nodes, which translates into 6*cluster size task vertices. The execution contains 3 full network shuffles in order to accentuate the possible impact of channel blocking in ABS. Sources generate a total of 1 billion records which are distributed uniformly among the source instances. The states of the operators in the topology were per-key aggregates and source offsets. The experiments were run on an Amazon EC2 cluster using up to 40 m3.medium instances.

We measured the runtime overhead of our evaluation job running under different snapshotting schemes, namely ABS and synchronised snapshotting [11], with varying snapshot intervals. We implemented the synchronous snapshotting algorithm used in Naiad [11] on Apache Flink in order to have an identical execution backend for the comparison. This experiment was run using a 10 node cluster. To evaluate the scalability of our algorithm, we processed a fixed amount of input records (1 billion) while increasing the parallelism of our topology from 5 up to 40 nodes.

7.2 Results

In Fig. 6 we depict the runtime impact of the two algorithms against the baseline (no fault tolerance). The large performance impact of synchronous snapshotting is especially visible when the snapshot interval is small. That is due to the fact that the system spends more time not processing any data in order to obtain global snapshots. ABS has a much lower impact on the runtime, as it runs continuously without blocking the overall execution while maintaining a rather stable throughput rate. For larger snapshot intervals the impact of the synchronised algorithm is less significant, since it operates in bursts (of 1-2 sec in our experiments) while letting the system run at its normal throughput during the rest of its execution. Nevertheless, bursts can often violate SLAs in terms of real-time guarantees for many applications that are latency-critical, such as intrusion detection pipelines. Thus, such applications would further benefit from the performance of ABS. In Fig. 7 we compare the scalability of the topology running ABS with 3 second snapshot intervals against the baseline (without fault tolerance). It is clear that both the baseline job and ABS achieved linear scalability.

8. Future Work and Conclusion

In future work we are planning to explore the possibilities of further lowering the impact of ABS by decoupling snapshotting state and operational state. That allows for purely asynchronous state management, as tasks can continuously process records while persisting snapshots. In such a scheme there is also the need to synchronise pre-shot and post-shot records with the respective state, which can be resolved by marking the records depending on the snapshot they belong to. As this approach would increase the computational, space and network I/O requirements of the algorithm, we plan to compare its performance to our current ABS implementation. Finally, we plan to investigate different recovery techniques that maintain exactly-once semantics while minimising the need for reconfiguration by operating on a per-task granularity.

In summary, we focused on the problem of performing periodic global snapshots on distributed dataflow systems. We introduced ABS, a new snapshotting technique that achieves good throughput. ABS is the first algorithm that considers the minimum state possible for acyclic execution topologies. Furthermore, we extended ABS to work with cyclic execution graphs by storing only the records that need to be reprocessed upon recovery. We implemented ABS on Apache Flink and evaluated our approach against synchronous snapshotting. At this early stage ABS shows good results, having low impact on the overall execution throughput, with linear scalability.
References

[1] Apache Flink. https://flink.apache.org/.
[2] A. Alexandrov, R. Bergmann, S. Ewen, J.-C. Freytag, F. Hueske, A. Heise, O. Kao, M. Leich, U. Leser, V. Markl, et al. The Stratosphere platform for big data analytics. The VLDB Journal, 23(6):939-964, 2014.
[3] R. Castro Fernandez, M. Migliavacca, E. Kalyvianaki, and P. Pietzuch. Integrating scale out and fault tolerance in stream processing using operator state management. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 725-736. ACM, 2013.
[4] K. M. Chandy and L. Lamport. Distributed snapshots: determining global states of distributed systems. ACM Transactions on Computer Systems (TOCS), 3(1):63-75, 1985.
[5] R. C. Fernandez, M. Migliavacca, E. Kalyvianaki, and P. Pietzuch. Making state explicit for imperative big data processing. In USENIX ATC, 2014.
[6] B. He, M. Yang, Z. Guo, R. Chen, B. Su, W. Lin, and L. Zhou. Comet: batched stream processing for data intensive distributed computing. In Proceedings of the 1st ACM Symposium on Cloud Computing, pages 63-74. ACM, 2010.
[7] A. D. Kshemkalyani, M. Raynal, and M. Singhal. An introduction to snapshot algorithms in distributed computing. Distributed Systems Engineering, 2(4):224, 1995.
[8] T. H. Lai and T. H. Yang. On distributed snapshots. Information Processing Letters, 25(3):153-158, 1987.
[9] L. Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558-565, 1978.
[10] Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment, 5(8):716-727, 2012.
[11] D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. Naiad: a timely dataflow system. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 439-455. ACM, 2013.
[12] R. Power and J. Li. Piccolo: Building fast, distributed programs with partitioned tables. In OSDI, volume 10, pages 1-14, 2010.
[13] Z. Qian, Y. He, C. Su, Z. Wu, H. Zhu, T. Zhang, L. Zhou, Y. Yu, and Z. Zhang. TimeStream: Reliable stream computation in the cloud. In Proceedings of the 8th ACM European Conference on Computer Systems, pages 1-14. ACM, 2013.
[14] G. Tel. Introduction to Distributed Algorithms. Cambridge University Press, 2000.
[15] M. Zaharia, T. Das, H. Li, S. Shenker, and I. Stoica. Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters. In Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Computing, pages 10-10. USENIX Association, 2012.
