• Embed Doc
  • Readcast
  • Collections
  • CommentGo Back
Download
 
A Design Framework for Highly Concurrent Systems
Matt Welsh, Steven D. Gribble, Eric A. Brewer, and David Culler
Computer Science Division University of California, Berkeley Berkeley, CA 94720 USA
{
 mdw,gribble,brewer,culler
}
@cs.berkeley.edu
Abstract
Building highly concurrent systems, such as large-scaleInternet services, requires managing many information flowsat once and maintaining peak throughput when demand ex-ceeds resource availability. In addition, any platform sup-porting Internet services must provide high availability andbe able to cope with burstiness of load. Many approachesto building concurrent systems have been proposed, whichgenerally fall into the two categories of threaded and event-driven programming. We propose that threads and eventsare actually on the ends of a design spectrum, and thatthe best implementation strategy for these applications issomewhere in between.We present a general-purpose design framework for build-ing highly concurrent systems, based on three design com-ponents —
tasks
,
queues
, and
thread pools
— which encapsu-late the concurrency, performance, fault isolation, and soft-ware engineering benefits of both threads and events. Wepresent a set of design patterns that can be applied to mapan application onto an implementation using these compo-nents. In addition, we provide an analysis of several systems(including an Internet services platform and a highly avail-able, distributed, persistent data store) constructed usingour framework, demonstrating its benefit for building andreasoning about concurrent applications.
1 Introduction
Large Internet services must deal with concurrencyat an unprecedented scale. The number of concurrentsessions and hits per day to Internet sites translatesinto an even higher number of I/O and network re-quests, placing enormous demands on underlying re-sources. Microsoft’s web sites receive over 300 millionhits with 4.1 million users a day; Lycos has over 82 mil-lion page views and more than a million users daily. Asthe demand for Internet services grows, as does theirfunctionality, new system design techniques must beused to manage this load.In addition to high concurrency, Internet serviceshave three other properties which necessitate a freshlook at how these systems are designed: burstiness,continuous demand, and human-scale access latency.Burstiness of load is fundamental to the Internet; deal-ing with overload conditions must be designed into thesystem from the beginning. Internet services must alsoexhibit very high availability, with a downtime of nomore than a few minutes a year. Finally, because ac-cess latencies for Internet services are at human scaleand are limited by WAN and modem access times, animportant engineering tradeoff to make is to optimizefor high throughput rather than low latency.Building highly concurrent systems is inherently dif-ficult. Structuring code to achieve high throughputis not well-supported by existing programming mod-els. While threads are a commonly used device forexpressing concurrency, the high resource usage andscalability limits of many thread implementations hasled many developers to prefer an event-driven ap-proach. However, these event-driven systems are gen-erally built from scratch for particular applications,and depend on mechanisms not well-supported by mostlanguages and operating systems. In addition, us-ing event-driven programming for concurrency can bemore complex to develop and debug than threads.That threads and events are best viewed as the op-posite ends of a design spectrum; the key to developinghighly concurrent systems is to operate in the
middle
of this spectrum. Event-driven techniques are usefulfor obtaining high concurrency, but when building realsystems, threads are valuable (and in many cases re-quired) for exploiting multiprocessor parallelism anddealing with blocking I/O mechanisms. Most devel-opers are aware that this spectrum exists, by utilizingboth thread and event-oriented approaches for concur-rency. However, the dimensions of this spectrum arenot currently well understood.We propose a general-purpose design framework for
 
building highly concurrent systems. The key idea be-hind our framework is to use event-driven program-ming for high throughput, but leverage threads (inlimited quantities) for parallelism and ease of program-ming. In addition, our framework addresses the otherrequirements for these applications: high availabilityand maintenance of high throughput under load. Theformer is achieved by introducing fault boundaries be-tween application components; the latter by condition-ing the load placed on system resources.This framework provides a means to reason aboutthe structural and performance characteristics of thesystem as a whole. We analyze several different sys-tems in terms of the framework, including a distributedpersistent store and a scalable Internet services plat-form. This analysis demonstrates that our designframework provides a useful model for building andreasoning about concurrent systems.
2 Motivation: Robust Throughput
To explore the space of concurrent programmingstyles, consider a hypothetical server (as illustrated inFigure 1) that receives
A
tasks per second from a num-ber of clients, imposes a server-side delay of 
L
secondsper task before returning a response, but overlaps asmany tasks as possible. We denote the task completionrate of the server as
. A concrete example of such aserver would be a web proxy cache; if a request to thecache misses, there is a large latency while the page isfetched from the authoritative server, but during thattime the task doesn’t consume CPU cycles. For eachresponse that a client receives, it immediately issuesanother task to the server; this is therefore a closed-loop system.There are two prevalent strategies for handlingconcurrency in modern systems:
threads
and
events
.Threading allows programmers to write straight-linecode and rely on the operating system to overlap com-putation and I/O by transparently switching acrossthreads. The alternative,
events
, allows programmersto manage concurrency explicitly by structuring codeas a single-threaded handler that reacts to events (suchas non-blocking I/O completions, application-specificmessages, or timer events). We explore each of thesein turn, and then formulate a robust hybrid design pat-tern, which leads to our general design framework.
2.1 Threaded Servers
A simple threaded implementation of this server(Figure 2) uses a single, dedicated thread to service
serverlatency:
L
sectask arrival rate:
A
tasks / sec# concurrenttasks in server:
A x L
taskscompletion rate:
S
tasks / secclosed loopimplies
S = A
Figure 1:
Concurrent server model:
The server receives
A
tasks per second, handles each task with a latency of 
L
seconds, and has a service response rate of 
tasks persecond. The system is closed loop: each service responsecauses another tasks to be injected into the server; thus,
=
A
in steady state.
the network, and hands off incoming tasks to individ-ual task-handling threads, which step through all of the stages of processing that task. One handler threadis created per task. An optimization of this simplescheme creates a pool of several threads in advanceand dispatches tasks to threads from this pool, therebyamortizing the high cost of thread creation and de-struction. In steady state, the number of threads
that execute concurrently in the server is
×
L
. Asthe per-task latency increases, there is a correspondingincrease in the number of concurrent threads needed toabsorb this latency while maintaining a fixed through-put, and likewise the number of threads scales linearlywith throughput for fixed latency.Threads have become the dominant form of ex-pressing concurrency. Thread support is standard-ized across most operating systems, and is so well-established that it is incorporated in modern lan-guages, such as Java [9]. Programmers are comfortablecoding in the sequential programming style of threadsand tools are relatively mature. In addition, threadsallow applications to scale with the number of proces-sors in an SMP system, as the operating system canschedule threads to execute concurrently on separateprocessors.Thread programming presents a number of correct-ness and tuning challenges. Synchronization primi-tives (such as locks, mutexes, or condition variables)are a common source of bugs. Lock contention cancause serious performance degradation as the numberof threads competing for a lock increases.
of 00

Leave a Comment

You must be to leave a comment.
Submit
Characters: ...
You must be to leave a comment.
Submit
Characters: ...