The problem is that the issues, concepts, and abstractions for the interaction of transactions and scalable systems have no names and are not crisply understood. When they get applied, they are inconsistently applied and sometimes come back to bite us. One goal of this paper is to launch a discussion which can increase awareness of these concepts and, hopefully, drive towards a common set of terms and an agreed approach to scalable programs.

This paper attempts to name and formalize some abstractions implicitly in use for years to implement scalable systems.
Think about Almost-Infinite Scaling of Applications

To frame the discussion on scaling, this paper presents an informal thought experiment on the impact of almost-infinite scaling. I assume the number of customers, purchasable entities, orders, shipments, health-care-patients, taxpayers, bank accounts, and all other business concepts manipulated by the application grow significantly larger over time. Typically, the individual things do not get significantly larger; we simply get more and more of them. It really doesn’t matter what resource on the computer is saturated first; the increase in demand will drive us to spread what formerly ran on a small set of machines to run over a larger set of machines…
Almost-infinite scaling is a loose, imprecise, and deliberately amorphous way to motivate the need to be very clear about when and where we can know something fits on one machine and what to do if we cannot ensure it does fit on one machine. Furthermore, we want to scale almost linearly with the load (both data and computation).
Note: To be clear, almost-infinite scaling conceptually assumes tens of thousands or hundreds of thousands of machines: too many to make them behave like one “big” machine.

Note: On scaling almost linearly, scaling at N log N for some big log would be really nice…

Describe a Few Common Patterns for Scalable Apps

What are the impacts of almost-infinite scaling on the business logic? I am asserting that scaling implies using a new abstraction called an “entity” as you write your program. An entity lives on a single machine at a time and the application can only manipulate one entity at a time. A consequence of almost-infinite scaling is that this programmatic abstraction must be exposed to the developer of business logic.

By naming and discussing this as-yet-unnamed concept, it is hoped that we can agree on a consistent programmatic approach and a consistent understanding of the issues involved in building scalable systems.

Furthermore, the use of entities has implications on the messaging patterns used to connect the entities. These lead to the creation of state machines that cope with the message delivery inconsistencies foisted upon the innocent application developer as they attempt to build scalable solutions to business problems.
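The entity abstraction described above can be made concrete with a small sketch. This is only an illustration under stated assumptions, not an API the paper defines: the names `Entity`, `EntityStore`, `update`, and `read` are hypothetical, and a single in-memory dictionary stands in for the machines. What it tries to show is the constraint itself: each atomic step receives the state of exactly one entity.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Entity:
    """A unit of data that lives on exactly one machine at a time."""
    key: str
    state: Dict[str, Any] = field(default_factory=dict)


class EntityStore:
    """Hypothetical store that hands the application one entity at a time."""

    def __init__(self) -> None:
        # Assumption: one in-memory dict stands in for many machines.
        self._entities: Dict[str, Entity] = {}

    def update(self, key: str, change: Callable[[Dict[str, Any]], None]) -> None:
        """Apply `change` atomically to the single entity named by `key`.

        The callback sees only this entity's state, so one atomic step
        cannot span two entities.
        """
        entity = self._entities.setdefault(key, Entity(key))
        draft = copy.deepcopy(entity.state)  # work on a private copy
        change(draft)                        # an exception here leaves state untouched
        entity.state = draft                 # commit the new state

    def read(self, key: str) -> Dict[str, Any]:
        """Return a copy of one entity's state (empty if the entity is unknown)."""
        entity = self._entities.get(key)
        return copy.deepcopy(entity.state) if entity else {}
```

A call such as `store.update("order-17", lambda s: s.update(status="placed"))` touches one entity only. Changing a second entity requires a second, separate `update`, which is where the messaging patterns mentioned above come in.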
Let’s start out with three assumptions which are asserted and not justified. We simply assume these are true based on experience.
Layers of the Application and Scale-Agnosticism

Let’s start by presuming (at least) two layers in each scalable application. These layers differ in their perception of scaling. They may have other differences, but those are not relevant to this discussion.

The lower layer of the application understands the fact that more computers get added to make the system scale. In addition to other work, it manages the mapping of the upper layer’s code to the physical machines and their locations. The lower layer is scale-aware in that it understands this mapping. We are presuming that the lower layer provides a scale-agnostic programming abstraction to the upper layer.

Using this scale-agnostic programming abstraction, the upper layer of application code is written without worrying about scaling issues. By sticking to the scale-agnostic programming abstraction, we can write application code that is not worried about the changes happening when the application is deployed against ever increasing load.

Over time, the lower layer of these applications may evolve to become new platforms or middleware which simplify the creation of scale-agnostic applications (similar to the past scenarios when CICS and other TP-Monitors evolved to simplify the creation of applications for block-mode terminals).

The focus of this discussion is on the possibilities posed by these nascent scale-agnostic APIs.
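The split between the two layers can be sketched as follows. Everything here is an assumption made for illustration: the class `ScaleAwareLayer`, its `put`, `get`, and `add_machine` methods, the function `record_order`, and the naive hash-modulo placement are all hypothetical, and nested dictionaries stand in for real machines. The point is only that the upper-layer function never names a machine.

```python
import hashlib
from typing import Dict, List, Optional


class ScaleAwareLayer:
    """Lower layer: knows the machines and maps keys onto them."""

    def __init__(self, machines: List[str]) -> None:
        if not machines:
            raise ValueError("at least one machine is required")
        self._machines = list(machines)
        # Assumption: one dict per machine stands in for real storage.
        self._data: Dict[str, Dict[str, str]] = {m: {} for m in self._machines}

    def _machine_for(self, key: str) -> str:
        # Naive placement: stable hash of the key modulo the machine count.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._machines)
        return self._machines[index]

    def add_machine(self, name: str) -> None:
        """Grow the system; existing keys are re-placed on the new layout."""
        if name in self._machines:
            raise ValueError(f"machine {name!r} already exists")
        items = [(k, v) for store in self._data.values() for k, v in store.items()]
        self._machines.append(name)
        self._data = {m: {} for m in self._machines}
        for key, value in items:
            self._data[self._machine_for(key)][key] = value

    def put(self, key: str, value: str) -> None:
        self._data[self._machine_for(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data[self._machine_for(key)].get(key)


def record_order(layer: ScaleAwareLayer, order_id: str, status: str) -> None:
    """Upper layer: scale-agnostic business logic that never names a machine."""
    layer.put(order_id, status)
```

Calling `add_machine` changes where keys live, yet `record_order` and any later `get` keep working unchanged. A real lower layer would move data far more carefully than this full re-placement does.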
Scopes of Transactional Serializability

Lots of academic work has been done on the notion of providing transactional serializability across distributed systems. This includes 2PC (two-phase commit), which can easily block when nodes are unavailable, and other protocols, such as the Paxos algorithm, which do not block in the face of node failures.
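To make the blocking behaviour of two-phase commit concrete, here is a deliberately simplified, single-process sketch. It is an illustration, not a faithful implementation: the names `State`, `Participant`, and `two_phase_commit` are hypothetical, an unreachable participant is simulated by raising `ConnectionError` instead of timing out, and the coordinator failure is simulated with a flag.

```python
from enum import Enum
from typing import List


class State(Enum):
    WORKING = "working"
    PREPARED = "prepared"    # voted yes; must now wait for the coordinator
    COMMITTED = "committed"
    ABORTED = "aborted"


class Participant:
    def __init__(self, name: str, available: bool = True) -> None:
        self.name = name
        self.available = available
        self.state = State.WORKING

    def prepare(self) -> bool:
        """Phase 1: vote yes and promise to follow the coordinator's decision."""
        if not self.available:
            raise ConnectionError(f"{self.name} is unreachable")
        self.state = State.PREPARED
        return True

    def finish(self, commit: bool) -> None:
        """Phase 2: apply the coordinator's decision."""
        self.state = State.COMMITTED if commit else State.ABORTED


def two_phase_commit(participants: List[Participant],
                     coordinator_fails_after_prepare: bool = False) -> str:
    # Phase 1: collect a vote from every participant.
    for p in participants:
        try:
            p.prepare()
        except ConnectionError:
            # A missing vote counts as "no": abort everyone we can reach.
            for q in participants:
                if q.available:
                    q.finish(commit=False)
            return "aborted"
    if coordinator_fails_after_prepare:
        # Every participant stays PREPARED and cannot decide on its own.
        return "blocked"
    # Phase 2: everyone voted yes, so tell everyone to commit.
    for p in participants:
        p.finish(commit=True)
    return "committed"
```

In the `"blocked"` case every participant is left in `State.PREPARED`, holding its promise until the coordinator returns. That is the kind of blocking the text refers to.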
Note: Google’s MapReduce is an example of a scale-agnostic programming abstraction.
[Figure: two layers. Upper layer: Scale-Agnostic Code. Lower layer: Scale-Aware Code, Implementing Support for the Scale-Agnostic Programming Abstraction.]