
The InfoQ eMag / Issue #87 / November 2020

Edge Cloud

Is Edge Computing a Thing?

Deploying Edge Cloud Solutions without Sacrificing Security

State at the Edge: an Interview with Peter Bourgon

FACILITATING THE SPREAD OF KNOWLEDGE AND INNOVATION IN PROFESSIONAL SOFTWARE DEVELOPMENT



Edge Cloud

IN THIS ISSUE

Is Edge Computing a Thing? 06
Deploying Edge Cloud Solutions without Sacrificing Security 11
The Modern Edge 17
Edge Computing and Flow Evolution 22
State at the Edge: an Interview with Peter Bourgon 27

PRODUCTION EDITOR Ana Ciobotaru / COPY EDITORS Lawrence Nyveen & Susan Conant / DESIGN Dragos Balasoiu
GENERAL FEEDBACK feedback@infoq.com / ADVERTISING sales@infoq.com / EDITORIAL editors@infoq.com
CONTRIBUTORS

Simon Crosby

is the CTO of SWIM.AI. Previously, Simon was a co-founder and CTO of Bromium, a security technology company. At Bromium, Simon built a highly secure virtualized system to protect applications. Prior to Bromium, Crosby was the co-founder and CTO of XenSource before its acquisition by Citrix, and later served as the CTO of the Virtualization and Management Division at Citrix. Previously, Crosby was a principal engineer at Intel. Crosby was also the founder of CPlane, a network-optimization software vendor. Simon Crosby was named as one of InfoWorld’s Top 25 CTOs.

Sam Bocetta

is a former security analyst, having spent the bulk of his career as a network engineer for the Navy. He is now semi-retired, and educates the public about security and privacy technology. Much of Sam’s work involved penetration testing ballistic systems. He analyzed networks looking for entry points, then created security-vulnerability assessments based on his findings. Further, he helped plan, manage, and execute sophisticated “ethical” hacking exercises to identify vulnerabilities and reduce the risk posture of enterprise systems used by the Navy (both on land and at sea).

Zack Bloom

is the Head of Developer Marketing at Cloudflare. He is the creator of open source projects which have gathered over forty thousand stars on GitHub.

James Urquhart

is a Global Field CTO with VMware Tanzu. Mr. Urquhart brings over 25 years of experience in distributed applications development, deployment, and operations, focusing on software as a complex adaptive system, cloud native applications and platforms, and automation. Mr. Urquhart has also written and spoken extensively about software agility and the business opportunities it affords.
A LETTER FROM THE EDITOR

“Edge” isn’t a new thing to most technologists. However, if you think it just relates to Internet-of-Things (IoT), it’s time to take a fresh look. Whether you’re at a startup or an established enterprise, edge computing has become more relevant. Why? You’re generating data in more places: SaaS systems, public clouds, remote devices, on-premises, and partner data centers. You’re also creating systems that execute logic on mobile phones or vehicles, then at the CDN, and finally in your application. These distributed systems solve one set of problems, and can create another set of problems. At InfoQ, we sought to learn more from the people building and evaluating these edge-powered systems.

In “Is Edge Computing a Thing?”, industry leader Simon Crosby proposes that edge computing requires us to rethink how we process data. Specifically, Crosby lays out an approach powered by stateful, real-time streaming, and what’s needed to make that a reality.

Extending your solutions to the edge might involve rethinking your security strategy. In “Deploying Edge Cloud Solutions without Sacrificing Security”, Sam Bocetta first explores the security challenges with an edge solution. Then he assesses the “interconnected processes” that you’ll need to consider when building a secure edge solution at scale.

“The Modern Edge”, from Zack Bloom at Cloudflare, looks at practical examples of how companies are using edge computing. He explains how edge computing powers the JavaScript package management system npm. That’s a big use case, but Bloom also tells about two “regular” companies that use edge computing in their own ways to serve their customers.

Industry vet James Urquhart sees edge as increasingly essential, in a thoughtful piece entitled “Edge Computing and Flow Evolution.” Urquhart draws parallels between distributed computing and real-world systems: think cities, electrical grids, and the body’s circulatory system.

Finally, it’s important to look at where the next innovations may be. In an interview with Fastly’s Peter Bourgon, we learn about state processing at the edge. Bourgon proposes a handful of ideas that are designed to respect the unique challenges of distributed data processing, while stripping away unnecessary complexity.

This series of articles touches on many of the key aspects of designing and delivering a solution that uses edge computing. We hope you enjoy it, and that it sparks new ideas and debates with your colleagues.

Richard Seroter

is Director of Outbound Product Management at Google Cloud, with a master’s degree in Engineering from the University of Colorado. He’s also an instructor at Pluralsight, the lead InfoQ.com editor for cloud computing, a frequent public speaker, the author of multiple books on software design and development, and a former 12-time Microsoft MVP for cloud. As Director of Outbound Product Management at Google Cloud, Richard leads a team focused on products and customer success for app modernization (e.g. Anthos). Richard maintains a regularly updated blog on topics of architecture and solution design and can be found on Twitter as @rseroter.

Is Edge Computing a Thing?


by Simon Crosby, CTO of SWIM.AI.

Enterprise architects are comfortable using public cloud services to help quickly build applications. The (REST API + stateless microservice + database) model for cloud-native apps is a pattern that has been key to scaling the cloud (any server will do) and increased use of abstractions such as Kubernetes further simplifies the operational deployment and management of microservices. But there’s a class of applications for which today’s cloud-native stacks are an awkward fit.

I’m not referring to lift+shift legacy applications that are expensive to run in the cloud (and will take forever to rewrite). Instead, my focus is a class of applications that are increasingly important to enterprise customers that analyze streaming data – from production, products, supply chain partners, employees and more – to deliver real-time insights to help drive the business. Increasingly these apps are put into the “edge computing” category. In this piece I’ll dissect the requirements for these apps and (hopefully) convince you that, for streaming data apps at least, the “edge” is not a place. Instead it’s a new way to compute on streaming data.

There are many uses for advanced computing embedded in next-gen products “at the edge”, from cars to compressors. The engineers that develop them will use the best CPUs, ML acceleration, hypervisors, Linux and other technology to build vertically integrated solutions: better cars, compressors and drones. Is this “edge computing”? Sure, but not in a generic sense – these are tightly integrated solutions rather than computing systems
that could be used for a broad set of applications – interesting but only narrowly. But what about the data that these and other connected products produce? “Smart” devices (with a CPU and lots to say) are being connected at a rate of about 2M/hour, and there already are a billion mobile devices in our hands – so increasingly networks are becoming flooded with streaming data. Applications that drink from this firehose are becoming common, and we need tools to help devs create them.

The term “edge computing” implies a generic capability that is different from cloud computing. While there are often requirements such as data volume reduction, latency or security/compliance concerns that dictate an on-prem component of an application, other than these, does edge computing have unique requirements? It does: Real-time analysis of streaming data demands that we kick the REST + database habit. But there is nothing that is unique to the physical edge. This is great news because it means that “edge applications” can run on cloud infrastructure, or on-prem. “Edge computing” is definitely a thing, but it’s about processing streaming data from the edge, as opposed to running the application at the physical edge. Edge applications that process streaming data from real world things have to:

Continuously and statefully analyze new data: Stateless microservices can only update or consume from a database. That means their performance is dominated by database response times. Often many database accesses will be required to compute a contextual result, making performance non-real-time. Stateful processing is key.

Always have an answer: The “store then analyze” approach needs to become “analyze, react, then (maybe) store”, without database roundtrips. Storing raw data for later analysis is slow and batch focused, and moreover most raw data is only ephemerally useful.

Process all data: Sub-sampling is inadequate. The software stack needs to process every event, on-the-fly, and share insights in real-time.

Analyze and visualize in context: An event is of value only in the context in which it was generated, so the interrelationship between event sources is key, at a granular level.

These requirements for stream processing software are independent of “where” the solution runs. If the data sources are fixed, then it makes sense to co-locate some compute nearby – particularly to deal with data reduction. But a cloud-hosted stack is needed for any solution that processes data from mobile devices.

OK, what’s different? Whereas cloud computing builds on a powerful triumvirate: Stateless APIs and micro-services, and stateful databases, streaming applications need different infrastructure abstractions:

“Things” are stateful: Whereas the stateless model of the web serves traditional cloud services well because it lets any server (including “serverless” Lambda / Functions) process an event, this has a massive drawback when processing streaming data: For each new event, an app must load (code and) the previous state from a database, compute a new value, and store the new state back in the database. This bounds performance to the round-trip time to the database, which inevitably is a million times slower than the CPU. We need a computing architecture that can deal with real world state changes – “things” change state often and changes to one thing propagate to others.

Algorithms must be adapted for boundless data: Algorithms that compute analytics, learn or predict must be adapted to deal with boundless data – computing on every new event.

Context is vital: Events aren’t meaningful in isolation. Real-world relatedness of things
such as containment, proximity and adjacency are key in applications that reason about events in context. So, edge computation is necessarily “in (a dynamically changing) data graph” rather than “over (a pre-built) graph”.

A key observation is that “things” in the real-world change state concurrently and independently, and their state changes are context specific – based on the state of (other things in) their environment. It is the state changes in things that are critical for applications, and not raw data. Moreover, whereas databases are good at capturing relationships that are fixed (buses have engines) they are poor at capturing dynamically computed relationships (a truck with bad braking behavior is near an inspector). Real-world relationships between data sources are fluid, and the application should respond differently based on computed relationships such as bad braking behavior. Finally, effects of changes are immediate, local and contextual (the inspector is notified to stop the truck). The dynamic nature of relationships suggests a graph database – and indeed a graph of related “things” is what is needed. But in this case, to satisfy the need to process continuously, the graph itself needs to be fluid and computation must occur “in the graph”.

In summary: Edge computing at any scale demands stateful processing, and not the usual stateless microservice model that relies on REST and databases. But rather than make engineers scratch their heads and learn something new, we have to provide an approach that is easy to adopt – using familiar dev and devops metaphors to make it easy for engineers to quickly deliver stateful applications that offer real-time insights at scale. Below we discuss a powerful “streaming-native” application platform called SwimOS, an Apache 2 licensed platform loosely modeled on the actor model, that builds graphs of linked actors to statefully analyze sources of streaming data in real-time. Developers create simple object-oriented applications in Java or JavaScript. Streaming data builds an automatically scaled-out graph of stateful, concurrent objects – called web agents – actors that are effectively “digital twins” of data sources.

Each web agent actively processes raw data from a single source and keeps its state in memory (as well as persisting its state in the background to protect against failures). Web agents are thus stateful, concurrent digital twins of real-world data sources that at all times mirror the relevant state of the real-world things they represent. The diagram shows web agents created as digital twins of sensors in a smart city environment.

Web agents link to each other based on context computed from changes in the data, dynamically building a graph that reflects real-world relationships like proximity, containment, or even correlation. Linked agents see each other’s state changes in real-time. Agents can concurrently compute on their own state and that of agents they are linked to. They analyze, learn and predict, and continuously stream enriched insights to UIs, brokers, data lakes and enterprise applications.

The diagram shows that the sensors at an intersection link to the intersection digital twin. Intersections link to their neighbors, enabling them to see state changes in real-time. The links
are dynamically computed: membership/containment and proximity are used to build the graph using insights computed by the digital twins from their own data. SwimOS benefits from in-memory, stateful, concurrent computation that yields several orders of magnitude performance improvement over database-centric analysis simply because all state is in-memory and web agents can compute immediately when data arrives or a contextual state change in another linked agent occurs. Streaming implementations of key analytical, learning and prediction algorithms are included in SwimOS, but it is also easy to interface to existing platforms such as Spark.

SwimOS moves analysis, learning and prediction into the dynamically constructed graph of web agents. An application is thus a dynamically constructed graph built from data. This dramatically simplifies application creation: The developer simply describes the objects and their inputs, and the calculations that they use to determine how to link. When
data arrives, SwimOS creates a web agent for each data source, each of which independently and concurrently computes as data flows over it, and in turn streams its insights – such as predictions or analytical insights. Each is responsible for consuming raw data from its real-world sibling, and links dynamically to other agents based on computed relationships. As data flows over the graph, each web agent computes insights using its own state and that of other agents to which it is linked.

You can see SwimOS in action in an application that consumes over 4TB per day of data from the traffic infrastructure in Palo Alto, CA, to predict the future state of each intersection (click on the blue dots to see the predicted phases, and the colors for down-timers of each light). This application runs in many cities in the USA, and delivers predictions, on a per intersection basis, via an Azure hosted API to customers that need to route their vehicles through each city. The source code for the application is part of the SwimOS GitHub repo. Starting with SwimOS is easy – the site is complete with docs and tutorials.

In SwimOS the graph is a living, dynamically changing structure. In the example earlier (a truck with bad braking behavior is near an inspector) the relationship is dynamically computed and ephemeral; the link between the inspector and the truck is thus also ephemeral, built when the truck enters a geo-fence around the inspector, and broken when it leaves.

An application in SwimOS is a dynamic graph of linked web agents that continuously and concurrently compute as data flows over the graph. As each web agent modifies its own state, that state is immediately visible to linked web agents in the graph. This is achieved through streaming – a link is in effect a streaming API, and takes the form of a URI – just like a REST API call. SwimOS uses a protocol called WARP, which runs over WebSockets, to synchronize state and deliver streamed insights. The key difference is this: each web agent transforms its own raw data into state changes, and those state changes are streamed to linked web agents in the graph. They in turn compute based on their own states, and the states of agents to which they are linked.

A SwimOS application is built from the data, given a simple object definition for each source. A single app is easy to build. But there is another benefit in this approach: If an application is reused in multiple sites, no changes are required. For example, data in a smart city application is used to build an application that predicts future traffic behavior in the city. But note that there is no specific model required for each city. Instead, only the local context for each intersection is used, by the digital twin of the intersection itself, to learn and predict. In each city, the app for the city is built from the data, without needing to change a line of code.

A word about transformations: In traditional analytical stacks – for example using Spark – the data is transported to the application, which is forced to transform data to state, for each thing, and save it to a database, before the analytical app can operate on the state of the real world sources. SwimOS transforms raw data to the relevant state of the source in each web agent. The data-to-state transformation is done by each digital twin and the graph of web agents is a mirror of the states of the real-world sources. Then, each digital twin can compute locally, at high resolution, using its own state and the states of linked web agents. This proceeds concurrently, at CPU and memory speed, without needing to access a database. The performance increase is huge, and the reductions of infrastructure required are commensurate. We frequently find that a SwimOS application uses less than 5% of the infrastructure of a database-centric implementation of the same application.

Finally, SwimOS digital twins are real-time streaming mirrors of real-world things – things in the edge environment. They compute and stream insights on the
fly. Visualization tools are vital for such applications. SwimOS includes a set of JS and TypeScript bindings that enable developers to quickly develop browser-based UIs that update in real-time, simply because they subscribe to the streamed insights computed by web agents.

In summary: Edge Computing is definitely a thing, but the computing need not occur at the edge. Instead what is needed is an ability to compute (anywhere) on streaming data from large numbers of dynamically changing devices, in the edge environment. This in turn demands an architectural pattern for stateful, distributed computing. SwimOS is an example of a stateful, real-time platform for applications that process real-time streaming data.

TL;DR

• “Edge” is not a place, but rather, a new way to compute on streaming data.

• Real-time analysis of streaming data demands that we kick the REST + Database habit.

• Cloud computing is built on stateless microservices and stateful databases, whereas edge requires stateful “things”, boundless data streams, and dynamic context.

• SwimOS is an example of a stateful, real-time platform for applications that process real-time streaming data.
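
To make the article's central contrast concrete, here is a minimal sketch in Python of stateful agents that keep their state in memory, update a streaming statistic on every event, and relink by proximity. It is not SwimOS code and does not use the SwimOS API: the `Agent` and `RunningStats` classes, the `relink` and `alerts` functions, and both thresholds are hypothetical names and values chosen for illustration. Welford's online algorithm stands in for "an algorithm adapted to boundless data", and the truck/inspector geo-fence follows the example used in the article.

```python
import math
from dataclasses import dataclass, field

BAD_BRAKING_M = 40.0   # assumed threshold: mean braking distance in metres
GEOFENCE_M = 100.0     # assumed geo-fence radius around each inspector


@dataclass
class RunningStats:
    """Welford's online mean/variance: constant memory on an unbounded stream."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


@dataclass
class Agent:
    """Hypothetical in-memory twin of one data source (a truck or an inspector)."""
    name: str
    kind: str
    position: tuple = (0.0, 0.0)
    braking: RunningStats = field(default_factory=RunningStats)
    links: set = field(default_factory=set)  # names of currently linked agents

    def on_event(self, position, braking_distance=None):
        # Turn a raw event into state; no database round trip is involved.
        self.position = position
        if braking_distance is not None:
            self.braking.update(braking_distance)


def relink(agents: dict) -> None:
    """Rebuild the ephemeral proximity links from the agents' current state."""
    for agent in agents.values():
        agent.links.clear()
    names = sorted(agents)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(agents[a].position, agents[b].position) <= GEOFENCE_M:
                agents[a].links.add(b)
                agents[b].links.add(a)


def alerts(agents: dict) -> list:
    """Return (inspector, truck) pairs where a badly braking truck is linked."""
    found = []
    for truck in agents.values():
        bad = truck.braking.count > 0 and truck.braking.mean > BAD_BRAKING_M
        if truck.kind == "truck" and bad:
            for other in sorted(truck.links):
                if agents[other].kind == "inspector":
                    found.append((other, truck.name))
    return found


if __name__ == "__main__":
    agents = {
        "T1": Agent("T1", "truck"),
        "I1": Agent("I1", "inspector", position=(50.0, 0.0)),
    }
    agents["T1"].on_event((0.0, 0.0), braking_distance=45.0)
    agents["T1"].on_event((10.0, 0.0), braking_distance=50.0)
    relink(agents)
    print(alerts(agents))   # truck is inside the geo-fence
    agents["T1"].on_event((500.0, 0.0))
    relink(agents)
    print(alerts(agents))   # link is broken once the truck leaves
```

The pairwise `relink` pass is deliberately naive (quadratic in the number of agents) and runs in a single process; a real platform would distribute agents and stream state changes between them, which this sketch does not attempt.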


Deploying Edge Cloud Solutions without Sacrificing Security

by Sam Bocetta, Security Researcher

Edge cloud was a major topic of debate at RSA this year. Multiple panels were devoted to the subject, and even in those that weren’t the utility of edge solutions was often raised. At the same time, however, a tension was apparent: operations and dev staff were quick to stress the performance gains of edge cloud infrastructures, and cybersecurity pros raised concerns about the security implications of these same architectures.

This tension has been apparent for a while. In their 2020 Outlook report, Carbon Black pointed to a bit of a rift between IT and security teams regarding resource allocation in cloud edge structures.

Edge, Cloud, and Edge Cloud

In order to see the challenges involved in deploying edge cloud solutions whilst retaining strong security, it’s worth reminding yourself why edge cloud solutions such as Software-as-a-Service (SaaS) were initially developed. Such solutions are becoming virtually commonplace, to the point that SaaS in particular is projected to account for almost all of the software needs for 86% of companies within the next two years (it already is being
used to a lesser extent by 90% of companies right now).

Some security firms will tell you that their edge cloud SaaS solutions are only designed to ensure security, but that’s not quite true. In reality, edge cloud systems were developed with one simple factor in mind: bandwidth. That’s why, for instance, the Open Glossary of Edge Computing, an open source effort led by the Linux Foundation’s LF Edge group, defines edge cloud systems primarily in terms of performance: “by shortening the distance between devices and the cloud resources that serve them,” the glossary explains, “and also reducing network hops, edge computing mitigates the latency and bandwidth constraints of today’s Internet, ushering in new classes of applications.”

In other words, as the number of IoT devices connected to networks began to increase exponentially around five or so years ago, many systems engineers found that their cloud providers were not keeping up with the increased computing load. The solution was to insert another level of processing between devices and cloud storage providers, and thereby reduce the data loads that cloud services had to process. Only later, in fact, were edge cloud systems thought about as a tool to secure the devices they interact with. And that’s primarily why cybersecurity pros don’t trust them.

These concerns are well illustrated by the ongoing worry that the most high-profile example of edge cloud systems – autonomous vehicles – can be easily hacked. The ease with which data used by autonomous vehicles can be accessed and manipulated has been a concern for years, and as a result many of the security protocols used in edge cloud systems have been designed, primarily, to protect autonomous vehicles.

The Challenges

There are a few problems with edge cloud solutions from a security perspective. Some are technical, and some relate to the way in which these services are used within a typical organization.

Architecture

First, let’s think about the structure of edge cloud systems. In most implementations, edges are within organizations’ computing boundaries, and so they will be protected by a wide variety of tools that focus on perimeter scanning and intrusion detection. However, that’s not quite the whole story: in most systems, there will also be a tunnel straight from the edge to cloud storage.

Sending data from the edge to the cloud in a secure way is fairly straightforward, because organizations will control the infrastructure that is used to encrypt and verify it. The problem arises when the cloud needs to send data back to the edge for processing. The challenge here is to ensure that this data is authenticated and verified, and is therefore safe to enter into an organization’s internal systems.

Fragmentation

Second, and most obviously, edge cloud systems fragment data. Having each device connected directly to cloud services might incur a performance loss, but at least this data is centralized, and can be covered by a single cloud security policy. Because edge cloud servers – almost by definition – need to be connected to many different devices, they represent a nightmare when it comes to securing these same connections.

Fragmentation is not only a problem when it comes to protecting data, though. With a growing number of IoT devices running via edge processing, each needs to be authenticated and follow a privacy policy that allows network admins to keep control of their data. The edge cloud model makes it inherently difficult to apply global privacy policies to each device, since each is communicating independently.

Physical Security

A third issue with edge cloud systems is that locking down physical access to these devices can be a challenge. The devices typically used in edge cloud infrastructures are designed, after all, to be portable, and as such
are more susceptible to physical tampering than standard data devices.

An example of this is the “micro data centers” that many telecommunications providers are now making use of. These centers sometimes sit at the base of cell towers, and pre-process data before feeding it either back to consumer devices or into corporate data systems. Micro data centers like this can dramatically improve the performance of cell networks, but they are also vulnerable to physical tampering.

Sprawl

All of these issues are compounded by the tendency of edge cloud systems to grow beyond the boundaries they were originally designed to operate within. In large organizations, building edge cloud functionality can be an invitation for other engineers, from other parts of your organization, to shift their computing demands to your edge cloud system.

Overcoming this challenge requires a dual approach. First, management needs to be made aware of the limitations of edge cloud systems, both in terms of computing power and security, in order to prevent many new devices being connected to them. Second, engineers should design edge cloud systems with a view to the future, and make sure that the security that is built into these systems is easily understandable for other employees working with them.

User Error

The problem of “sprawl” is related to another: that many IT professionals simply don’t take IoT device security seriously. Despite the well-documented security issues that these devices present, many people simply don’t realize the level of connectivity – and the level of cybersecurity risk – that they provide.

In this context, it is all too common for IoT devices to be connected to networks (and connected together) in poorly secured horizontal structures. Not only does this make them more susceptible to attack, but it also allows intruders a huge degree of lateral movement once they are inside IoT networks.

So Why Use Edge Cloud?

Given all these security risks, and given that a recent study by TechRepublic found that two-thirds of IT teams considered edge computing as more of a threat than an opportunity, it’s worth wondering why we need edge cloud solutions at all. This is, in fact, a very pertinent question, because some analysts have argued that the advent of 5G networks, coupled with the increased computing power of contemporary IoT devices, means that most of the processing currently done by edge cloud systems can now be done by devices themselves.

That doesn’t seem to be borne out by the facts, though. In a recent report Futuriom writes that 5G will actually be a catalyst for edge-compute technology. “Applications using 5G technology will change traffic demand patterns, providing the biggest driver for edge computing in mobile cellular networks,” the report states.
In other words: whilst connec- by edge cloud systems can be over security. However, in

The InfoQ eMag / Issue #87 / November 2020


tion and cloud technologies are broken into a number of intercon- traditional cloud systems, the
developing rapidly, demand for nected processes. processing required to run
them is increasing even faster. these devices can be managed
This is a particular problem when Decentralization and Resilience centrally. As IoT devices begin
it comes to managing online First, it’s worth pointing out that to utilize edge cloud solutions,
backup services, because with- one of the features that makes this exposes them to increased
out proper oversight a cloud edge edge cloud infrastructures so threats.
system can end up undermining hard to secure – the fragmen-
the integrity of backup policies tation of data – can also make The most commonly suggested
implemented by individual teams. them more resilient. This is solution to this problem is to in-
because, as Proteus Duxbury, a crease the security of IoT devic-
Put simply, companies can’t transformation expert at PA Con- es themselves. However, at the
afford to give up their cloud- sulting, said recently, “instead moment many embedded devic-
based systems or devices. of one or two or even three data es lack the computing power to
Cloud connected VoIP systems centers, where if they’re close encrypt data before sending it to
can save businesses 70% of their enough together that, say, a big either cloud or edge cloud sys-
total phone bills on average, and storm could impact them all, you tems. As a result, network engi-
companies that turn to cloud have distributed data and com- neers have been forced to rely on
computing to fulfill their software pute on the edge, which makes it other forms of security. 
needs have seen major increases much more resilient to malicious
in productivity to help drive busi- and nonmalicious events.”  Full Spectrum Security
ness growth. At the same time, Securing edge cloud systems
the bandwidth available to these In some ways, then, pushing is ultimately a problem of scale
same companies lags far behind data to the edge can mean that rather than of essence. Secu-
the amount of data they need to attacks on organizations are less rity professionals already have
process on the cloud. Edge cloud effective, because they are not access to many of the tools that
computing, in this context, seems able to compromise a centralized are required to protect these
like an obvious choice. data storage system which holds systems, but will need to huge-
every piece of sensitive data. ly extend their reach in order to
Ensuring Security On the other hand, and as seen protect data on the edge. 
This means that, for now, security above, this fragmentation can
teams are stuck with edge cloud make the application of global In fact, in many ways securing
solutions, and will have to work security measures more difficult. edge cloud systems requires
out how to harden them further network engineers to return to
against cyberattack. Crucial to The IoT and Encryption the basic principles of network
this attempt will be the deploy- Another issue that is raised by security, but then to apply them
ment of perimeter scanning the widespread adoption of edge outside the systems that they
systems that are able to analyse cloud systems is the security of directly manage. These elements
not just standard network data, the IoT itself. Concerns about the include:
but a huge variety of other forms security of IoT devices are not
of data such as that produced by new, of course: it has long been • Perimeter scanning tech-
embedded IoT devices. Overcom- noted that the design of these niques that use encrypted
ing the security challenges posed devices prioritizes connectivity tunnels, firewalls, and access

control policies to protect the data held in edge cloud systems.

• Securing applications running on the edge in the same ways that applications running within your organization are already secured.

• Upgrading threat detection capabilities so that intrusion can be detected not just in relation to cloud or in-house systems, but also for the edge.

• Automated patching that allows network managers to trust that both software and firmware automatically receive security updates.

Secure Access Service Edge (SASE)
All of these approaches and tools have been combined by Gartner into a new category of hardware and services that are specifically designed to improve edge cloud security. In 2019, the firm coined a new term – Secure Access Service Edge (SASE) – to define these systems.

Gartner has defined SASE as a combination of multiple existing technologies. The new paradigm, they say, “combines network security functions (such as SWG, CASB, FWaaS and ZTNA), with WAN capabilities (i.e., SDWAN) to support the dynamic secure access needs of organizations. These capabilities are delivered primarily aaS and based upon the identity of the entity, real time context and security/compliance policies.” At a basic level, SASE combines SD-WAN, SWG, CASB, ZTNA and FWaaS as core abilities, with the ability to identify sensitive data or malware and the ability to decrypt content at line speed, with continuous monitoring of sessions for risk and trust levels.

Though SASE is still a new approach, Gartner has high hopes for the new technology. They predict that, by 2024, at least 40% of enterprises will have in place strategies to adopt this approach.

The End Of Zero Trust?
Securing edge cloud systems also involves overturning some basic misconceptions about threat hunting. Namely, it might be that the increased popularity of edge cloud solutions overturns another piece of received wisdom, the superiority of the zero trust model. Many of these new systems, or at least the devices that they interface with, will be extremely difficult to bring into single sign-on and user access control processes.

Instead, ensuring security in edge cloud solutions might require a more pragmatic approach, in which individual networks are segmented and protected individually. This, in turn, requires that networks be configured to automatically perform authentication and verification steps on every connected device, at a frequency that ensures that the data being handled stays secure whilst not affecting global network performance.

Pushing The Edge
Whilst cloud edge computing offers many opportunities, it also comes with challenges. To make matters worse, these challenges come at a time when security teams are struggling to keep up with other developments – the necessity to go multi-cloud whilst still using cloud-native tools, and becoming involved in DevSecOps migrations.

For that reason, a major determining factor in the security of edge cloud systems will be the speed at which they are deployed by businesses. Though edge cloud can offer significant gains in terms of performance, it will not replace traditional cloud models where these are currently working well.

In some ways, this removes some of the pressure on security teams, who can afford to design each edge cloud system with security in mind at the earliest possible stage. On the other hand, as edge cloud systems grow in importance, security professionals will be in the unenviable position of having to secure cloud, edge cloud, and in-house systems simultaneously.

As with any piece of new technology, the level of security of cloud edge solutions is unlikely to become apparent anytime soon. But that doesn’t mean that we shouldn’t put in place tools and processes to protect these systems as far as is practical.

TL;DR

• Edge cloud systems face security issues when it comes to fragmenting data, locking down physical access, and the tendency of edge cloud systems to grow beyond the boundaries of what they were originally designed to operate in.

• Most security teams are stuck with edge cloud systems, and need to figure out how to harden them against attack rather than abandon them.

• Overcoming the security challenges of edge cloud systems will really come down to decentralization, encryption, and utilizing full spectrum security measures.

• A big determining factor for the security of edge cloud systems will be the speed at which businesses deploy them.

The Modern Edge


by Zack Bloom, Head of Developer Marketing at Cloudflare

Too often we hear fanciful stories of the purpose of ‘edge computing’. I was raised on tales of self-driving cars (which probably should be able to drive without an internet connection), AI-driven security cameras, and oil drilling platforms which can’t process their own data. These use-cases are figurative and fanciful, and while they may exist someday, they aren’t a real part of our lives and jobs. To combat that, I thought it would be fun to share some features of how interesting companies are using edge computing in ways you might not expect to solve real problems and help build a better Internet.

npm at the Edge
npm was acquired by GitHub a few weeks ago. What you might not know is every npm package you download is powered by edge computing!

“We can say pretty confidently there’s about 11 or 12 million JavaScript developers in the world. And we know because that’s how many people are using npm.” - Isaac Schlueter, Creator of npm

As is common with many services, npm began by carefully dividing the world into ‘static’ files and ‘dynamic’ responses like npmjs.org searches. Static files could be cached and delivered by a CDN, while dynamic responses had to travel all the way to a central origin to be responded to. In the words of Isaac Schlueter, creator of npm: “Once you have a CDN, you have a really high degree of control over [how files are delivered], especially if you’re talking about GETs and especially if you’re talking about fetches of downloads of archives that can be long lived… that’s just how NPM is architected.”

Unfortunately, using a CDN is not without consequences. Static files can’t be customized based on who is viewing them or what that viewer is looking for. Similarly, dynamic responses are slow, as they may have to travel halfway around the world, and hard to scale, as they rely on inflexible infrastructure. npm was built on JavaScript, but they found themselves with a full-time engineer devoted to writing not JS, but bespoke CDN configuration languages. As their service became more powerful with features like npm for Enterprise, it became necessary to deliver different packages to different users, something a CDN was not very good at.

npm uses Cloudflare Workers, which run JavaScript in Cloudflare’s points of presence around the world. Workers takes the V8 JavaScript and WebAssembly engine which runs as a part of Google Chrome and runs it on Cloudflare’s network of thousands of servers in hundreds of locations around the world. As V8 doesn’t need to launch a dedicated process or container (or Kubernetes pod) for each customer’s code, it’s possible to perform serverless computing at grand scale without the ‘cold starts’ which plague traditional serverless systems. Code running in a Worker starts in single-digit milliseconds, and requires as little as one tenth the memory overhead of a Node.js process.

Inside their Worker code npm makes decisions about whether a package is private or not. For private packages they ensure that the user has the appropriate authentication token (using the standardized WebCrypto API) before delivering the file.

To understand the transformation, let’s consider performing a similar type of authentication on a request using VCL, the configuration language of the Varnish Cache used by several CDNs, versus being able to write code with JavaScript and Cloudflare Workers. The VCL code has been abridged for space:

Figure 1

19
VCL Authentication Example

    sub vcl_recv {
      /* unset state tracking header to avoid client sending it */
      if (req.restarts == 0) {
        unset req.http.X-Authed;
      }

      if (!req.http.X-Authed) {
        /* stash the original URL and Host for later */
        set req.http.X-Orig-URL = req.url;
        /* set the URL to what the auth backend expects */
        set req.url = "/authenticate";
        /* Auth requests won't be cached, so pass */
        return(pass);
      }

      if (req.http.X-Authed == "true") {
        /* we're authed, so proceed with the request */
        /* reset the URL */
        set req.url = req.http.X-Orig-URL;
      } else {
        /* the auth backend refused the request, so 403 the client */
        error 403;
      }

    #CDN recv
    }
    ...etc...

Using JavaScript with Cloudflare Workers

    async function handle(request) {
      // Make an authentication request that is identical to the
      // original request, but a GET with no body.
      let authUrl = new URL(request.url)
      authUrl.pathname = "/authenticate"
      let authResponse = await fetch(authUrl, {
        ...request,
        method: "GET",
        body: null
      })

      if (authResponse.status === 200) {
        // Client is authenticated.
        return fetch(request)
      } else if (authResponse.status === 401) {
        // Authentication server wants credentials from the client.
        return authResponse
      } else {
        // Every other response from the authentication server becomes 403.
        return new Response(null, {
          status: 403,
          statusText: "Forbidden",
        })
      }
    }
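The article notes that npm's Worker code decides whether a package is private and, if so, checks the caller's token before delivering the file. The sketch below shows that decision as a plain function, reusing the 200/401/403 mapping from the Worker example above. It is an illustration only: the package names, the token table, and the function name are hypothetical, and a real Worker would verify tokens cryptographically (the article mentions WebCrypto) rather than look them up in an in-memory map.

```javascript
// Hypothetical data standing in for registry metadata and issued tokens.
const PRIVATE_PACKAGES = new Set(["@acme/billing"]);
const TOKEN_GRANTS = new Map([["token-alice", new Set(["@acme/billing"])]]);

// Decides which HTTP status a package download request should receive.
function authorizeDownload(packageName, token) {
  if (!PRIVATE_PACKAGES.has(packageName)) {
    return 200; // Public packages need no credentials.
  }
  if (!token) {
    return 401; // Private package: ask the client for credentials.
  }
  const grants = TOKEN_GRANTS.get(token);
  // A known token with access gets the file; anything else is refused.
  return grants !== undefined && grants.has(packageName) ? 200 : 403;
}

console.log(authorizeDownload("public-lib", null));             // 200
console.log(authorizeDownload("@acme/billing", null));          // 401
console.log(authorizeDownload("@acme/billing", "token-alice")); // 200
console.log(authorizeDownload("@acme/billing", "token-bob"));   // 403
```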
 

As a programmer, I certainly find the JavaScript easier to decipher. This means edge computing isn’t just relevant for esoteric use-cases, it’s relevant for each and every CDN requirement developers have. When compared to deploying similar code in a central location using something like Node.js, deploying to a CDN offers a massive scale advantage for npm. Any site being powered by a CDN is already being served by thousands of servers, which means any code they write is already horizontally scaled to handle their hundreds of thousands of requests per second.

Building Your Own Edge
Anycart looks like any other grocery delivery service. When you dig deeper you find one big exception: it’s very fast. Not the delivery, the website! It’s also growing very quickly. (See Figure 1, page 19)

This growth isn’t coming from paid ads or marketing, it’s coming purely from the speed of the site translating into a better experience and better SEO than its competitors.

CEO, Rafa Sanches, has built a unique setup which eschews any public cloud provider. Instead, he leases dedicated servers in dozens of locations around the world. In fact, he has over 30 dedicated points of presence for his application, and seven more for his database, entirely built and managed by him.

I will never, never, never, never, never, NEVER build on the cloud - Rafa

To Rafa, the public cloud is exceptionally expensive. He is able to get dedicated machines so affordably that his entire hosting bill is under $2000 a month, for a complex service with more than 200k users a day. The secret to getting such cheap hardware is being able to always accept the best (cheapest) deal. He can afford that because he doesn’t need to worry about the availability of any one of his instances. He uses MySQL with Galera Cluster to arrange all seven of his database points of presence into a master-master replication configuration. That configuration means he can write to any one of the instances and have it appear everywhere.

Careful programming has ensured that his system expects all processing to be done asynchronously, so the impact of replication lag, which would otherwise be an issue, is minimized. Batch jobs are processed on a 48-core machine which his team only pays $180 a month to run (the same machine on AWS would be over $3000). His users are also generally only touching their own records (like their cart and their recipes), meaning conflict resolution is rarely an issue between replicas. This system means his users get blazing fast reads and writes from anywhere on earth, and he gets to sleep well knowing it would take quite the act of god to take his database offline.

Similarly, his application servers are run on thirty dedicated machines in different corners of the world. Cloudflare global load balancing is used to map his incoming web traffic to the nearest host, and to perform health checks to remove hosts which might be offline.

Many of us make performance decisions based on ‘best practices’, generally accepted minimums required to build something of quality. For Rafa though, things are simpler: “How can we build a user experience that is so good that if Google doesn’t put us first, they’re doing a bad job.” As an example of what that looks like, they don’t use the industry standard tool for frontend programming: React. It’s simply too slow.

Outsmarting Performance
What can cleansing herbs teach you about performance? They taught Shalom Volchok that e-commerce platforms are not ready for the modern web. Shalom’s family ran a major herbal product company that wanted to perform A/B tests. An A/B test is where a web developer deploys two variants of a page, shows each variant to a portion of visitors, and tests which performs better. Unfortunately for them, every method of doing A/B tests with existing systems delivered subpar performance. In fact, every e-commerce vendor they tried simply didn’t have the capability to both do things dynamically and with performance.

So Shalom founded Outsmartly, a startup e-commerce platform which is built on the edge. Like many new ideas, they had to build the tools to build their product. Namely, they constructed a Cloudflare Workers-based hosting platform which allows for variants to be deployed and served directly from the edge. Features like multiple site variants being deployed at once and integration with most of the modern web landscape (React, Angular, Vue) all had to be built from scratch. With their platform they are able to host sites which load in tens of milliseconds almost anywhere on Earth, but which can have many of the capabilities of a traditional web server.

The dream location would be to render a full application within the CDN’s cache; effectively that’s exactly what Cloudflare Workers enables for our platform. For large ecommerce companies, this page load improvement can equal millions of dollars in incremental revenue. The performance is really that astonishing. - Shalom Volchok, co-founder Outsmartly

They now can deliver static websites that support an unlimited number of A/B tests (and the statistically relevant analytics to go with them). These sites are just as fast as any static site deployed directly to a CDN, but actually support the tools modern businesses rely on. It goes to show you what technology can do when it meets a problem ready to be solved.

The Future
The future of web development and computing is going to look a lot like the present. Hard working and passionate people will build amazing things, and they will often not get the credit they deserve. I believe that in that future websites will be faster for those people who are most disenfranchised by the Internet of today. Platforms will be more powerful and have fewer moving pieces, fewer APIs, and fewer languages. Ultimately edge computing isn’t going to solve someone else’s problems, it will solve yours!

TL;DR

• Edge computing isn’t just for esoteric applications, it’s a tool for every web developer

• It will take time for edge-computing-level performance to become a best practice

• npm uses Cloudflare Workers to power every single package downloaded

• Edge computing allows ‘static’ files to not be static anymore, which might eliminate the CDN as we know it

Edge Computing and Flow Evolution
by James Urquhart, Global Field CTO with VMware Tanzu

As every enterprise finds their software portfolio merging into a single, large scale distributed computing architecture, we are also seeing that architecture evolve. “New” patterns seem to be trending that are purported to accommodate the sheer scale of digitizing corporate operations. For some, the rise of edge computing would be one of the great changes to the way computing gets done since cloud computing replaced private data centers.

But the truth is edge computing is an easily predictable pattern as digital operations scale, especially for geographically distributed businesses, such as national retail chains, insurance brokerages, or even modern manufacturing supply chains. There is science that explains these patterns, coming from the field of complex adaptive systems. It is probably worth understanding the basics of this science, as it will help you make better decisions about what to run centrally, and what to run “on the edge”.

First, let’s take a second to look at this amorphous beast we call edge computing. Normally my next sentence would be something like “What exactly is edge computing?”, but having been through the cloud wars, I know better than to launch yet another debate on definitions and semantics. Instead, “the edge” can be seen as the set of deployment locations that are not in the data center. This is good enough for our purposes.

Now, let’s look at what complexity science can tell us about the evolution of the edge. Geoffrey West is a theoretical physicist and distinguished professor at the Santa Fe Institute, which has been the epicenter of complexity science for the last three decades. West studied the ways systems handle flow: the movement of shared resources between the agents of the system. Think blood vessels delivering oxygen to cells, the electric grid delivering electrons to machines and appliances, or even city infrastructure delivering passengers to homes and businesses.

What West noticed is that these systems seem to evolve in very analogous patterns. Our circulatory system, for example, delivers blood throughout our body using a system that has huge central blood vessels which deliver blood to smaller vessels, that in turn do the same, until you reach the capillaries that are tiny, but serve very specific sets of cells.

Our electric grid has evolved such that large, central generators feed a massive core transmission infrastructure (sometimes operating at tens of thousands of volts!), which in turn feeds more localized transmission infrastructure (working at hundreds or thousands of volts), which again passes through step-down transformers at the neighborhood level to become the local standard voltage that runs in our homes. (Even our homes are one further step-down point as the incoming current is controlled and distributed to allow the same utility lines to serve dozens of different devices simultaneously.)

Cities are especially interesting to West (and to me, frankly), as vehicular infrastructure has evolved into a very similar pattern. Interstates bring traffic to highways and major thoroughfares, which in turn feed local streets and neighborhood lanes. Air infrastructure has major international hubs feeding smaller national airports which may even feed tiny regional airports. The shipping industry has massive ships that feed railroads and giant semi-trucks that in turn feed local distribution warehouses that feed (often smaller) trucks that feed local stores.

This pattern of large, core “trunk” flows with “limbs, branches, and leaves” is incredibly common in complex systems that handle flow. So much so, in fact, that West and his colleagues discovered there are mathematical patterns in the way these systems are structured. While some elements scale very quickly, and others scale very slowly relative to the system as a whole, all scale according to a mathematical relationship known as a “power law”.

Power laws and limits
A power law is defined as follows by Wikipedia:

In statistics, a power law is a functional relationship between two quantities, where a relative change in one quantity results in a proportional relative change in the other quantity, independent of the initial size of those quantities: one quantity varies as a power of another. For instance, considering the area of a square in terms of the length of its side, if the length is doubled, the area is multiplied by a factor of four.

In flow systems, components have power law relationships with each other. The nature of those relationships can tell us a lot about how systems will continue to evolve over time. Systems where key resources scale at a power law less than one with respect to the infrastructure depended on to deliver those resources will hit limits that preclude further growth.

In mammals, metabolism increases with a ¾ power law relative to size. Thus, a larger animal has a higher overall metabolic rate (the energy used to maintain cellular function per unit of time) than its smaller counterparts, but the difference is not directly proportional. As the animal gets larger, the change in metabolic rate gets lower for each additional kilogram. So, on a per-cell basis, the metabolism actually gets lower in larger animals than in smaller animals, but because the number of cells is so much higher, it adds up to a larger metabolic rate.

Combine this with the interesting fact that all mammals have the same average number of heartbeats in their lifespan. Tiny animals that live short lives have to beat their tiny heart much faster to maintain blood flow through their blood vessels. Large animals can ship more blood a longer distance with more efficiency (thanks to a bigger pump), and thus the heart doesn’t have to beat as often to maintain the same blood pressure at the capillaries (where oxygen and carbon dioxide are exchanged with the cells). However, there is a limit there, too, as the physics of animal blood flow hit a wall at about the size of an elephant for land animals, or a blue whale for ocean dwelling mammals.

Interestingly, the slower metabolic rate per cell in larger animals explains another limit. Ever wonder why large animals live longer than small ones? Well, that slow cellular metabolic rate means cells in large animals wear out slower than those in smaller animals, thus contributing to a longer overall life span. (There are obviously many details that I am leaving out, but read West’s book if you are curious to know more.)

Power Laws and Computing
So, let’s apply this line of thinking to another system in which a resource (data) must flow between agents (computers) at great scale. The Internet is an amazing example of such a system, and the application portfolio (and supporting infrastructure) for the average Fortune 2000 company is a great subset. As the scale of these systems increases, how might we predict that the patterns of computing architecture will change with it?

We can start with the simple fact that two fundamental elements of distributed computing aren’t changing anytime soon: the time-per-instruction-executed on modern CPUs (which has levelled out in the last decade or so), and the speed of light. Anything we do to scale distributed computing will be limited by these constants. This means that a) the only way to gain more computing power is to add more processors, and b) operational latency will naturally suffer if you increase geographic distance between processors.

Thus, as computing power grows, each additional GB/s will yield a smaller gain in performance than the last. At some point, the gain in processing power will largely be displaced by the loss in network performance, and the application won’t be able to scale effectively any further.

Edge and Data Centers Are Both Essential
How do you address that? Well, the first obvious answer might be to reduce the distance between compute nodes that depend on each other. The more “chatter” between nodes, the less network distance there should be between the two (ideally). So centralizing everything in data centers should be perfect, right?

It turns out that some of the highest volumes of chatter in a distributed application are between the end nodes (things like mobile phones, laptops, and even digital sensors) and the servers that deliver data and user interfaces to them. For latency sensitive applications, having everyone globally reach one or two data centers results in some percentage of the population having less than optimal performance.

OK, so distribute everything to computers as close as possible to those end nodes, you might think. Well, now you have the problem that the backend services that depend most on each other are talking over longer distances (and through more network hops, etc.). This may make the entire system less efficient.

There is a third option, however. Place as much as possible of the system that directly interfaces with end nodes on computers that are as close to those end nodes as possible. Then, place the services (and data) that are heavily dependent on one another in data centers (hopefully also distributed, but not as extensively). This creates a “trunk, limb, branch” model that enables the direct interaction between end nodes to be handled by local edge nodes, but the exchange of shared data, events, and other interactions between services to be optimized in central computing locations (data centers).
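The speed-of-light limit mentioned in this section can be turned into a rough lower bound on round-trip time. The sketch below assumes light travels through optical fiber at roughly 200,000 km/s (about two thirds of its speed in a vacuum) and ignores routing, queuing, and processing delays, so real latencies will be higher. The distances and names are illustrative, not measurements.

```javascript
// Approximate speed of light in optical fiber, in km per second (assumption).
const FIBER_KM_PER_SECOND = 200000;

// Lower bound on round-trip time, in milliseconds, for a one-way distance in km.
function minRoundTripMs(distanceKm) {
  return (2 * distanceKm * 1000) / FIBER_KM_PER_SECOND;
}

const centralDataCenterKm = 10000; // hypothetical user far from a central site
const edgeNodeKm = 100;            // hypothetical user near an edge location

console.log(minRoundTripMs(centralDataCenterKm)); // 100
console.log(minRoundTripMs(edgeNodeKm));          // 1
```

Under these assumptions, no amount of added processing power brings the distant request below 100 ms, while the nearby edge node has a floor of about 1 ms. Keeping chatty end-node traffic at the edge and service-to-service traffic inside data centers is the trunk, limb, branch split described above.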

This mimics what happens in nature in very real ways, and is why I believe that the future architectures of enterprise and commercial systems will follow this pattern (though it will morph a little from this strict interpretation). We’ll see massive growth in edge computing because of it (though anyone using an application or content delivery network is already doing it). And, you’ll see data center business continue, though more and more of that business is likely to move to public cloud providers (with a few, very large scale exceptions).

While I assume that just about any company with distributed physical operations has some form of edge computing today, I would be surprised if more than 30% or so actually have an edge computing strategy. We all should, however, as this is a natural evolution of the world’s great digital complex adaptive system, Internet-based computing.

TL;DR

• Edge computing represents a major new architectural option for enterprise computing, but its rapid adoption should not be surprising.

• Geoffrey West of the Santa Fe Institute wrote a book, Scale: The Universal Laws of Growth, Innovation, Sustainability, and the Pace of Life in Organisms, Cities, Economies, and Companies, that explains the evolution of flow-centric systems like distributed computing

• Key elements of distributed computing, namely processing power and network performance, scale at a power law of less than one relative to one another

• The limitations of these key elements at scale make edge computing a logical addition to data centers and end-user computing for software at scale
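The ¾ power law for mammalian metabolism described in this article can be made concrete with a few lines of arithmetic. The sketch below works only with ratios, so no physiological constants are assumed; the function names are illustrative, and mass is used as a stand-in for cell count.

```javascript
// Exponent of the metabolic scaling law cited in the article.
const METABOLIC_EXPONENT = 3 / 4;

// How many times larger the whole-animal metabolic rate becomes when body
// mass is multiplied by massRatio.
function totalRateRatio(massRatio) {
  return Math.pow(massRatio, METABOLIC_EXPONENT);
}

// The same change expressed per unit of mass (a stand-in for "per cell").
function perUnitRateRatio(massRatio) {
  return totalRateRatio(massRatio) / massRatio;
}

const massRatio = 16; // an animal 16 times heavier
console.log(totalRateRatio(massRatio).toFixed(2));   // "8.00"
console.log(perUnitRateRatio(massRatio).toFixed(2)); // "0.50"
```

A 16-fold increase in mass gives only an 8-fold increase in total metabolic rate, so the rate per unit of mass halves. An exponent below one is what produces the diminishing returns, and eventually the limits, that the article describes.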

SPONSORED ARTICLE



Apps Are Becoming Distributed,
What about Your Infra?

You’ve read how modern companies are disrupting their legacy peers by pursuing a data-driven culture…

• Chick-fil-A’s edge use case allows them to transform their fast food restaurants; throwing jet fuel on top of their already impressive growth.

• An industry-first, software-only mobile infrastructure by Rakuten Mobile

• Tesla Autopilot functionality getting ever so close to Level 5, with new features and enhancements pushed to vehicles almost bi-weekly

There are a myriad of examples, but the common thread is this: these are software-first companies in their categories. And why are they software first? They are data-driven.

Their secret sauce is the continuous pursuit to process and extract insights from data to drive their business decisions and operational efficiencies. So if that’s the recipe, why aren’t incumbents doing it?

DISTRIBUTED DATA & APPS... OH MY!

THE TREND
Data is getting increasingly distributed in the enterprise, and for multiple reasons, including:

• Digital transformation of business processes wherever they’re needed: at the branch, the factory, the store… even the car or plane

• The distribution of workloads and clusters themselves due to modern app architecture and deployment trends, including micro-services, containers and multi-cloud

As a result, distributed apps and data in the enterprise typically require the following:

• Real-time responses
• Hyper-localization aka...edge
• Near-zero downtime
• Intelligent management and security

To address the above, there are technological and operational shifts required. Those changes can be summarized in this figure.

Please read the full-length version of this article here.

State at the Edge: an Interview with Peter Bourgon
by Richard Seroter, Director of Outbound Product Management at Google Cloud

At this year’s QCon London, Fastly’s Peter Bourgon did a well-received talk about the challenges of state management in distributed systems. Specifically, he talked about an architecture and communication model for a global-scale edge platform.

It was a talk that addressed a wide range of topics. Is there a central source of truth for the data? How should data be synchronized across the system? What’s the right data structure to store state? InfoQ reached out to Peter to further explore a handful of areas that we think are interesting to our readers.

InfoQ: Do you think “edge” is the next widely adopted architecture for modern systems? Or is it an important, but small niche suitable for a specific subset of systems? And if so, which ones?

Bourgon: I don’t think that “edge” is an architecture in itself, but rather it’s a component of an architecture. Users expect applications to be fast, and the speed of light imposes some unavoidable physical constraints on how we can meet those expectations. Past a certain point, the only way to decrease latency and improve experiences is to move your application, in whole or in part, physically closer to your users. And edge platforms are the way to do that.

It’s true that not all systems will take the same value from extending their state and logic out of the datacenter. There’s
definitely some re-thinking, making a traditional transaction operations. Over-simplifying, if



re-factoring, and re-architecting becomes cost-prohibitive. You you make sure the operations
work involved, and that’s always have to allow users to manipulate are associative, commutative,
an engineering decision of costs state locally, without establish- and idempotent, then CRDTs
and benefits. But, looking to the ing global consensus, and this allow you to apply them in any
future, I think that the role of the means opting in to non-tradition- order, including with duplicates,
edge in system design is going to al, typically eventually consistent, and get the same, deterministic
get bigger and more important. data systems. results at the end. Said another
way, CRDTs have built-in conflict
InfoQ: You say in your talk that InfoQ: In your talk, you go into resolution, so you don’t have
a “general purpose database” some depth on conflict-free rep- to do that messy work in your
isn’t a fit for stateful edge sys- licated data types (CRDTs). Can application. Formally, they exhibit
tems. Can you explain a bit why you explain more about these, something called strong eventual
someone wouldn’t want a sin- why they make sense for a state- consistency.
gle centralized database for an ful edge architecture, and how/
edge system? Is it entirely about what data is stored? This property, by itself, means
latency? any system built with CRDTs
Bourgon: Arguably the hardest can have absolutely trivial fault
Bourgon: I think latency is the part of distributed systems is management. Try to send the op-
biggest reason. If you’re ex- dealing with faults. Computers erations, in any order and without
tending your system out of the are ephemeral, networks are any coordination, to the nodes
datacenter and toward the edge, unreliable, topologies change that need them. If there’s a prob-
at least part of the reason is — the fallacies of distributed lem, just try again later. That’s it.
presumably that the round-trip computing are well-known, and As long as the operations even-
costs are too high otherwise. Us- accommodating them tends tually get where they need to go,
ers experience those costs with to dominate the engineering the system is guaranteed to be
static assets, with application effort of successful systems. correct. By choosing a smarter
logic, and also with state, so if And if your system is managing state primitive, we can build a
your transactions always go back state, things get much more much simpler and more reliable
to your origin, you’re not taking difficult: maintaining a useful system.
full advantage of the architecture. consistency model for users
requires extremely careful CRDTs enable a lot of cool use
But there’s another part, too, coordination, with stronger cases, like offline-first docu-
which is related to consistency. consistency typically demanding ment editing that can sync up
With a centralized database, it’s commensurate effort. This automatically. In the context of
relatively easy to express trans- inevitably corresponds to more edge state, they let us keep our
actions against a single, logical, bugs and less reliability. transactions local to the point of
coherent global “truth”. Because presence, to meet our latency re-
all the parts of the system aren’t CRDTs, or conflict-free replicated quirements, while still being able
separated by very much distance, data types, are a relatively novel to share state globally.
you can perform the necessary state primitive that give us a way
communication for that transac- to skirt around a lot of this com- Of course these benefits don’t
tion quickly, and stay within la- plexity. I think of them as careful- come for free. For one thing, it’s
tency budgets. But if you balkan- ly constructed data types, each not always easy, or even obvi-
ize your state all over the world, combined with a specific set of ous, how to model your data as

29
CRDTs. Simple-seeming operations like delete can have fiendishly complex CRDT equivalents. Also, the sometimes messy details of multiple parallel versions of state tend to surface in the APIs of these systems, and applications have to adapt to deal with them, which isn’t always easy. Most significantly, a single byte of usable, logical state in a CRDT requires many more bytes of actual memory, which can quickly render the economics of these systems infeasible.

As with any technology put to productive use, engineering compromises, trade-offs, and optimizations are required to make CRDTs viable.

InfoQ: You made the argument that synchronous calls for state synchronization might be the better option than an asynchronous, event-driven one. Why is that?

Bourgon: This was sort of a minor point, but an important one, related to the implementation of distributed systems. It’s essentially impossible to build confidence in the safety and reliability of large-scale systems like this without being able to run the system under deterministic test, or simulation. And it’s essentially impossible to simulate a system unless each component can be modeled as a plain, deterministic state machine. It’s certainly possible to get these properties using asynchronous events as the messaging pattern, but experience has taught me that it’s significantly easier with a synchronous RPC-style approach. Similar to how CRDTs eliminate entire classes of complexity related to fault handling, synchronous RPCs eliminate entire classes of complexity related to queueing theory. They get you automatic backpressure, they let you take advantage of the queues that already exist at various layers of the operating system and network stack, and, importantly, they make it a lot easier to build deterministic components.

InfoQ: You concluded your talk by saying that “consensus rounds, or leader election, or distributed locks, or distributed transactions” are dead ends, and that large scale systems are going to use simple communication, and be eventually consistent. Tell us more about why you think that’s the case.

Bourgon: I think the whole of human technological achievement has been an exercise in creating, reifying, and extending abstractions. The human capacity for understanding and managing complexity is essentially fixed; in order to make more and greater things possible, we have to use abstractions to “wall off” domains of complexity behind simpler models that can be more easily understood and built upon. For example, application developers today don’t need to know how their network interfaces translate binary data to electrical signals over copper, or Ethernet frames to binary, or datagrams to Ethernet frames, and so on; the OSI model’s abstractions enable the developer to spend their complexity budget on a much higher level. It’s basically stood the test of time, so I think it’s a pretty good set of abstractions.

Developers have also relied on the abstraction of a single, global truth in their data layer since essentially the very first database. It’s understandable and productive, so that makes sense. But I think it’s a bit like the Newtonian model for physics: it works, until it doesn’t. At really large scale, Newtonian physics no longer predicts real-world behavior, so we have to switch to the more complex relativistic model. Similarly, when we start extending our data layer across large physical distances, I believe the abstraction of a single, global truth begins to leak. An incredible amount of engineering effort is required to maintain the illusion of atomicity, using techniques like the ones I listed. At some point, the developers working in and around this layer of abstraction are going to exhaust their complexity budget.

Typical RDBMS systems and common distributed systems techniques are incredibly useful and productive up to a certain scale. But past that scale, the
complexity required to prop up the illusion of global atomicity becomes unreliable, unproductive, and ultimately unjustifiable. While there’s an upfront cost to thinking about multiple parallel universes of state in your application, I believe biting that bullet offsets orders of magnitude more hidden complexity in the alternate, leaky abstraction. And I believe that, eventually, we’re going to realize it’s the only way to build reliable, large-scale distributed systems.

For what it’s worth, this isn’t exactly novel thinking. The natural world is full of incredibly complex systems whose behaviors are emergent from simple primitives and rules. My favorite example is probably how groups of fireflies manage to synchronize their lights: no leader election or consensus rounds involved.

Peter Bourgon is currently leading research and development on a global infrastructure for state at the edge at Fastly, a CDN and edge cloud platform. He is the author of Go kit, the preeminent toolkit for microservices in Go; and several large-scale coordination-avoiding distributed systems, including Roshi (stream index) and OK Log (log aggregation).

TL;DR

• Latency is the biggest reason that a general purpose database isn’t a good fit for edge architectures.
• CRDTs, or conflict-free replicated data types, are a relatively novel state primitive that gives us a way to skirt around a lot of the complexity around consistency.
• Similar to how CRDTs eliminate entire classes of complexity related to fault handling, using synchronous remote procedure calls for synchronization eliminates entire classes of complexity related to queueing theory.
• Typical RDBMS systems and common distributed systems techniques are incredibly useful and productive up to a certain scale, and after that, the complexity becomes unjustifiable.
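
To make the CRDT properties described in the interview more concrete, here is a minimal sketch in Python of a grow-only counter (G-Counter), one of the simplest CRDTs. It is an illustration under stated assumptions, not Fastly's implementation: the `GCounter` class, the `converge` helper, and the node names `london`, `tokyo`, and `sydney` are all hypothetical. The sketch is state-based: each node increments only its own slot, and replicas merge by taking the per-node maximum, which is associative, commutative, and idempotent.

```python
from itertools import permutations


class GCounter:
    """Grow-only counter: one non-negative slot per node (hypothetical node IDs)."""

    def __init__(self, counts=None):
        self.counts = dict(counts or {})

    def increment(self, node_id, amount=1):
        # Each node only ever bumps its own slot.
        self.counts[node_id] = self.counts.get(node_id, 0) + amount

    def merge(self, other):
        # Element-wise max is associative, commutative, and idempotent.
        merged = dict(self.counts)
        for node_id, count in other.counts.items():
            merged[node_id] = max(merged.get(node_id, 0), count)
        return GCounter(merged)

    def value(self):
        return sum(self.counts.values())


def converge(replica_states):
    """Fold a sequence of replica states into one, in the order given."""
    result = GCounter()
    for state in replica_states:
        result = result.merge(state)
    return result


# Three points of presence accept writes locally, with no coordination.
london, tokyo, sydney = GCounter(), GCounter(), GCounter()
london.increment("london", 3)
tokyo.increment("tokyo", 5)
sydney.increment("sydney", 2)

states = [london, tokyo, sydney]

# Try every delivery order, each with one state redelivered as a duplicate.
results = {
    converge(list(order) + [order[0]]).value()
    for order in permutations(states)
}
print(results)  # prints {10}
```

Because the merge does not depend on arrival order and tolerates duplicates, a node that misses an update can simply be sent the state again later, which matches the "just try again later" fault handling Bourgon describes.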


Curious about previous issues?

Service Mesh Ultimate Guide (The InfoQ eMag / Issue #83 / March 2020)
This eMag aims to answer pertinent questions for software architects and technical leaders, such as: what is a service mesh?, do I need a service mesh?, and how do I evaluate the different service mesh offerings?

Taming Complex Systems in Production (The InfoQ eMag / Issue #77 / October 2019)
To tame complexity and its effects, organizations need a structured, multi-pronged, human-focused approach, that: makes operations work sustainable, centers decisions around customer experience, uses continuous testing, and includes chaos engineering and system observability. In this eMag, we cover all of these topics to help you tame the complexity in your system.

Microservices: Testing, Observing, and Understanding (The InfoQ eMag / Issue #81 / January 2020)
This eMag takes a deep dive into the techniques and culture changes required to successfully test, observe, and understand microservices.
