PRODUCTION EDITOR Ana Ciobotaru / COPY EDITORS Lawrence Nyveen & Susan Conant / DESIGN Dragos Balasoiu
GENERAL FEEDBACK feedback@infoq.com / ADVERTISING sales@infoq.com / EDITORIAL editors@infoq.com
A Letter from the Editor
Application Programming Interfaces (APIs) are seemingly everywhere. Thanks to the popularity of web-based products, cloud-based X-as-a-service offerings, and IoT, it is becoming increasingly important for engineers to understand all aspects of APIs, from design, to building, to operation.

Research shows that there is increasing demand for near real-time APIs, in which speed and flexibility of response is vitally important. This emag explores this emerging trend in more detail.

In "Real-Time APIs: Mike Amundsen on Designing for Speed and Observability", we learn how to meet the increasing demand placed on API-based systems, and explore three areas for consideration: architecting for performance, monitoring for performance, and managing for performance. Services must also be monitored, both from an operational perspective (SLIs) and from a business perspective (KPIs).

According to Akamai research, API calls now make up 83% of all web traffic. In "Four Case Studies for Implementing Real-Time APIs", Karthik Krishnaswamy explores the idea that competitive advantage is no longer won by simply having APIs; the key to gaining ground is based on the performance and the reliability of those APIs.

The four case studies all provide interesting lessons. However, a key takeaway is that when revenue is correlated with speed, performance trumps feature richness in API management solutions. Teams should recognize when this is the case, and design system applications and infrastructure accordingly.

Daniel Bryant works as a Product Architect at Datawire, is the News Manager at InfoQ, and is Chair for QCon London. His current technical expertise focuses on 'DevOps' tooling, cloud/container platforms, and microservice implementations. Daniel is a leader within the London Java Community (LJC), contributes to several open source projects, writes for well-known technical websites such as InfoQ, O'Reilly, and DZone, and regularly presents at international conferences such as QCon, JavaOne, and Devoxx.
The InfoQ eMag / Issue #86 / October 2020

The Challenges of Building a Reliable
Globally, there is an increasing appetite for data delivered in real time. Since both producers and consumers are more and more interested in faster experiences and instantaneous data transactions, we are witnessing the emergence of the real-time API.

This new type of event-driven API is suitable for a wide variety of use cases. It can be used to power real-time functionality and technologies such as chat, alerts, and notifications, or IoT devices. Real-time APIs can also be used to stream high volumes of data between different businesses or different components of a system.

This article starts by exploring the fundamental differences between the REST model and real-time APIs. Up next, we dive into some of the many engineering challenges and considerations involved in building a reliable and scalable event-driven ecosystem, such as choosing the right communication protocol and subscription model, managing client- and server-side complexity, or scaling to support high-volume data streams.

What exactly is a real-time API?
Usually, when we talk about data being delivered in real time, we think about speed. By this logic, one could assume that improving REST APIs to be more responsive and able to execute operations in real time (or as close as possible) makes them real-time APIs. However, that's just an improvement of an existing condition, not a fundamental change. Just because a traditional REST API can deliver data in real time does not make it a real-time API.
Streaming, Pub/Sub, and Push are patterns that can be successfully delivered via an event-driven architecture. This makes all of them fall under the umbrella of event-driven APIs.

Unlike the popular request-response model of REST APIs, event-driven APIs follow an asynchronous communication model. An event-driven architecture consists of the following main components:

• Event producers—they push data to channels whenever an event takes place.
• Channels—they push the data received from event producers to event consumers.
• Event consumers—they subscribe to channels and consume the data.

Let's look at a simple and fictional example to better understand how these components interact. Let's say we have a football app that uses a data stream to deliver real-time updates to end-users whenever something relevant happens on the field. If a goal is scored, the event is pushed to a channel. When a consumer uses the app, they connect to the respective channel, which then pushes the event to the client device.

Note that in an event-driven architecture, producers and consumers are decoupled. Components perform their task independently and are unaware of each other. This separation of concerns allows you to more reliably scale a real-time system, and it can prevent potential issues with one of the components from impacting the other ones.

Compared to REST, event-driven APIs invert complexity and put more responsibility on the shoulders of producers rather than consumers. (See Figure 2)

This complexity inversion relates to the very foundation of the way event-driven APIs are designed. While in a REST paradigm the consumer is always responsible for maintaining state and always has to trigger requests to get updates, in an event-driven system the producer is responsible for maintaining state and pushing updates to the consumer.
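The producer, channel, and consumer interaction described above can be sketched in a few lines of Scala. This is an illustrative in-memory toy, not any particular broker or library, and all names are invented for the example:

```scala
import scala.collection.mutable

// A toy channel: producers publish events to it, and it fans each
// event out to every subscribed consumer. Producers and consumers
// never reference each other directly -- they are decoupled.
class Channel[A] {
  private val subscribers = mutable.ListBuffer.empty[A => Unit]

  // Event consumers subscribe to the channel.
  def subscribe(handler: A => Unit): Unit = subscribers += handler

  // Event producers push data whenever an event takes place.
  def publish(event: A): Unit = subscribers.foreach(h => h(event))
}

object FootballAppExample {
  def main(args: Array[String]): Unit = {
    val goals = new Channel[String]
    // Two connected client devices subscribe to the goals channel.
    goals.subscribe(e => println(s"client A received: $e"))
    goals.subscribe(e => println(s"client B received: $e"))
    // The producer pushes an event; both consumers receive it.
    goals.publish("GOAL! 1-0")
  }
}
```

A real system would put a network and a broker between the two sides, but the contract is the same: the producer only knows the channel, never the individual consumers.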
Event-driven architecture considerations
Building a dependable event-driven architecture is by no means an easy feat. There is an entire array of engineering challenges you will have to face and decisions you will have to make. Among them, protocol fragmentation and choosing the right subscription model (client-initiated or server-initiated) for your specific use case are some of the most pressing things you need to consider.

While traditional REST APIs all use HTTP as the transport and protocol layer, the situation is much more complex when it comes to event-driven APIs. You can choose between multiple different protocols. Options include the simple webhook, the newer WebSub, popular open protocols such as WebSockets, MQTT, or SSE, or even streaming protocols, such as Kafka.

This diversity can be a double-edged sword—on one hand, you aren't restricted to only one protocol; on the other hand, you need to select the best one for your use case, which adds an additional layer of engineering complexity.

Besides choosing a protocol, you also have to think about subscription models: server-initiated (push-based) or client-initiated (pull-based). Note that some protocols can be used with both models, while some protocols only support one of the two subscription approaches. Of course, this brings even more engineering complexity to the table.

In a client-initiated model, the consumer is responsible for connecting and subscribing to an event-driven data stream. This model is simpler from a producer perspective: if no consumers are subscribed to the data stream, the producer has no work to do and relies on clients to decide when to reconnect. Additionally, complexities around maintaining state are also handled by consumers. Wildly popular and effective even when high volumes of data are involved, WebSockets represent the most common example of a client-initiated protocol.

In contrast, with a server-initiated approach, the producer is responsible for pushing data to consumers whenever an event occurs. This model is often preferable for consumers, especially when the volume of data increases, as they are not responsible for maintaining any state—this responsibility sits with the producer. The common webhook—which is great for pushing rather infrequent low-latency updates—is the most obvious example of a server-initiated protocol.

We are now going to dive into more details and explore the strengths, weaknesses, and engineering complexities of client-initiated and server-initiated subscriptions.

Client-initiated vs. server-initiated models—challenges and use cases
Client-initiated models are the best choice for last-mile delivery of data to end-user devices. These devices only need access to data when they are online (connected) and don't care what happens when they are disconnected. Due to this fact, the complexity of the producer is reduced, as the server-side doesn't need to be stateful. The complexity is kept at a low level even on the consumer side: generally, all client devices have to do is connect and subscribe to the channels they want to listen to for messages.

There are several client-initiated protocols you can choose from. The most popular and efficient ones are:

• WebSocket. Provides full-duplex communication channels over a single TCP connection. Much lower overhead than half-duplex alternatives such as HTTP polling. Great choice for financial tickers, location-based apps, and chat solutions.
• MQTT. The go-to protocol for streaming data between devices with limited CPU power and/or battery life, and networks with expensive or low bandwidth, unpredictable stability, or high latency. Great for IoT.
• SSE. Open, lightweight, subscribe-only protocol for event-driven data streams. Ideal for subscribing to data feeds, such as live sport updates.
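As a taste of why SSE is considered lightweight: its wire format is just blocks of `data:` lines separated by blank lines. The sketch below parses such a stream; it is a deliberately simplified illustration that ignores the `event:`, `id:`, and `retry:` fields of the real format:

```scala
// Simplified SSE parser: each event is a block of "data:" lines,
// and events are separated by a blank line.
object SseParser {
  def events(raw: String): List[String] =
    raw.split("\n\n").toList
      .map { block =>
        block.linesIterator
          .filter(_.startsWith("data:"))     // keep only data fields
          .map(_.stripPrefix("data:").trim)  // drop the field name
          .mkString("\n")                    // rejoin multi-line payloads
      }
      .filter(_.nonEmpty)
}
```

For instance, `SseParser.events("data: GOAL 1-0\n\ndata: FULL TIME\n\n")` yields the two payloads `GOAL 1-0` and `FULL TIME`, which is why a browser or thin client can consume such a feed with almost no machinery.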
For example, let's say you have two servers (A and B) that are consuming two streams of data. Let's imagine that server A fails. How does server B know it needs to pick up additional work? How does it even know where server A left off? This is just a basic example, but imagine how hard it would be to manage hundreds of servers consuming hundreds of data streams. As a general rule, when there's a lot of complexity involved, it's the producer's responsibility to handle it; data ingestion should be as simple as possible for the consumer.

That's why in the case of streaming data at scale you should adopt a server-initiated model. This way, the responsibility of sharding data across multiple connections and managing those connections rests with the producer. Things are kept rather simple on the consumer side—they would typically have to use a load balancer to distribute the incoming data streams to available nodes for consumption, but that's about as complex as it gets.

Webhooks are often used in server-initiated models. The webhook is a very popular pattern because it's simple and effective. As a consumer, you would have a load balancer that receives webhook requests and distributes them to servers to be processed. However, webhooks become less and less effective as the volume of data increases. Webhooks are HTTP-based, so there's an overhead with each webhook event (message) because each one triggers a new request. In addition, webhooks provide no integrity or message ordering guarantees.

That's why for streaming data at scale you should usually go with a streaming protocol such as AMQP, Kafka, or ActiveMQ, to name just a few. Streaming protocols generally have much lower overheads per message, and they provide ordering and integrity guarantees. They can even provide additional benefits—idempotency, for example. Last but not least, streaming protocols enable you to shard data before streaming it to consumers.

It's time to look at a real-life implementation of a server-initiated model. HubSpot is a well-known developer of marketing, sales, and customer service software. As part of its offering, HubSpot provides a chat service (Conversations) that enables communication between end-users. The organization is also interested in streaming all that chat data to other HubSpot services for onward processing and persistent storage. Using a client-initiated subscription to successfully stream high volumes of data to their internal message buses is not really an option. For this to happen, HubSpot would need to know what channels are active at any point in time, to pull data from them.

To avoid having to deal with complex engineering challenges, HubSpot decided to use Ably as a message broker that enables chat communication between end-users. Furthermore, Ably uses a server-initiated model to push chat data into Amazon Kinesis, which is the data processing component of HubSpot's message bus ecosystem. (See Figure 3)

Consumer complexity is kept to a minimum. HubSpot only has to expose a Kinesis endpoint, and Ably streams the chat data over as many connections as needed.

A brief conclusion
Hopefully, this article offers a taste of what real-time APIs are and helps readers navigate some of the many complexities and challenges of building an effective real-time architecture. It is naive to think that by improving a traditional REST API to be quicker and more responsive you get a real-time API. Real time in the context of APIs means so much more.

By design, real-time APIs are event-driven; this is a fundamental shift from the request-response pattern of RESTful services. In the event-driven paradigm, the responsibility is inverted, and the core of engineering complexities rests with the data producer, with the purpose of making data ingestion as easy as possible for the consumer.
In a recent apidays webinar, Mike Amundsen, trainer and author of the recent O'Reilly book API Traffic Management 101, presented "High Performing APIs: Architecting for Speed at Scale". Drawing on recent research by IDC, he argued that organizations will have to drive systemic changes to meet the upcoming increased demand for consumption of business services via APIs. This change in requirements relates to both an increased scale of operation and a decreased response time.

The changing nature of customer expectations, combined with new data-driven products and an increase in consumption at the edge (mobile, IoT, etc.), has meant that the need for low latency "real-time APIs" is rapidly becoming the norm.

The recent adoption of cloud technology and microservices-based architecture has enabled innovation and increased speed of development throughout the business world.

However, this technology and architecture style is not always conducive to creating performant applications; in the cloud nearly everything communicates over a virtualized network, and in a service-oriented architecture …
The vast majority of infrastructure components and services within a cloud platform are connected over a network—e.g. block stores are often implemented as network-attached storage (NAS). Colocation of components is not guaranteed—e.g. an API gateway might be located in a different physical data center than the virtual machine (VM) a backend application is running on. The global reach of cloud platforms opens opportunities for reaching new customers and also provides more effective disaster recovery options, but this also means that the distance between your customers and your services can increase.

Amundsen suggested that engineering teams redesign components into smaller services, following the best practices associated with the microservices architectural style. These services can be independently released and independently scaled. Embracing asynchronous methods of communicating, such as messaging and events, can reduce wait states. This offers the possibility of an API call rapidly returning an acknowledgment and decreasing latency from an end-user perspective, even if the actual work has been queued at the backend. Engineers will also need to build in transaction reversal and recovery processes to reduce the impact of inevitable failures. Additional thought leaders in this space, such as Bernd Ruecker, encourage engineers to build "smart APIs" and learn more about business process modeling and event sourcing.

For systems to perform as required, data read and write patterns will frequently have to be reengineered. Amundsen suggested judicious use of caching results, which can remove the need to constantly query upstream services. Data may also need to be "staged" appropriately throughout the entire end-to-end request handling process. For example, caching results and data in localized points of presence (PoPs) via content delivery networks (CDNs), caching in an API gateway, and replication of data stores across availability zones (local data centers) and globally. For some high transaction throughput use cases, writes may have to be streamed to meet demand, for example, writing data locally or via a high throughput distributed logging system like Apache Kafka for writing to an external data store at a later point in time.

Engineers may have to "rethink the network" (respecting the eight fallacies of distributed computing), and design their cloud infrastructure to follow best practices relevant to their cloud vendor and application architecture. Decreasing request and response size may also be required to meet demands. This may be engineered in tandem with the ability to increase the message volume. The industry may see the "return of the RPC," with strongly-typed communication contracts and high-performance binary protocols. As convenient as JSON over HTTP is, a lot of computation (and time) is used in serialization and deserialization, and the text-based message payloads sent …
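The acknowledge-then-queue idea mentioned earlier (an API call returning quickly while the actual work is processed asynchronously at the backend) can be sketched as follows. This is an illustrative toy, not taken from Amundsen's talk; all names are invented:

```scala
import java.util.concurrent.LinkedBlockingQueue

// The request handler acknowledges immediately; a worker processes
// the queued work later, off the request path.
object AsyncAckExample {
  private val queue = new LinkedBlockingQueue[String]()

  // End-user latency is the cost of enqueueing, not of the real work.
  def handleRequest(payload: String): String = {
    queue.put(payload)
    "202 Accepted"
  }

  // A background worker drains the queue and does the actual work.
  def drainOne(process: String => Unit): Unit =
    process(queue.take())
}
```

The trade-off is the one described in the text: the client gets a fast acknowledgment, but the system now needs recovery and reversal logic for work that fails after it was accepted.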
With infrastructure emitting metrics and services producing SLI- and KPI-based metrics, the corresponding data must be effectively captured, processed, and presented to key stakeholders for this to be acted on. This is often a cultural challenge as much as it is a technical challenge.

Managing for performance
Amundsen stated that the organization should aim to create a "dashboard culture." Monitoring top-line metrics, user traffic, and the continuous delivery process for "insight" into how a system is performing are vitally important. As stated by Dr. Nicole Forsgren and colleagues in Accelerate, the four key metrics that are correlated with high performing organizations are lead time, deployment frequency, mean time to restore (MTTR), and change failure
percentage. Engineers should be able to use internal tooling to effectively solve problems, such as controlling access to functionality, scaling services, and diagnosing errors.

The addition of security mitigation is often in conflict with performance—for example, adding layers of security verification can slow responses—but this is a delicate balancing act. Engineers should be able to understand systems, their threat models, and whether there are any indicators of current problems—e.g. an increased number of HTTP 503 error codes being generated, or a gateway being overloaded with traffic.

The elastic nature of cloud infrastructure typically allows almost unlimited scaling, albeit with potentially unlimited cost. However, the culture of managing for performance requires the ability to easily understand how current systems are being used. For example, engineers need to know whether scaling specific components is an appropriate response to seeing increased loads at any given point in time.

Diagnosing errors in a cloud-based microservices system typically requires an effective user experience (or developer experience) in addition to requiring a collection of tooling to collect, process, and display metrics, logs, and traces. Dashboards and tooling should show top-level metrics related to a customer's experience, and also allow engineers to test hypotheses for issues by drilling down to see individual service-to-service metrics. Being able to answer ad hoc questions of an observability or monitoring solution is currently an area of active research and product development.

Once the ability to gain insight and solve problems has been obtained, the next stage of managing for performance is the ability to anticipate. This is the capability to automatically recover from failure—building antifragile systems—and the ability to experiment with the system, such as examining fault-tolerance properties via the use of chaos engineering.
Load Testing APIs and Websites with Gatling
You open your software engineering logbook and start writing—"Research and Development Team's log. Never could we have foreseen such a failure with our application and frankly, I don't get it. We had it all; it was a masterpiece! Tests were green, metrics were good, users were happy, and yet when the traffic arrived in droves, we still failed."

Pausing to take a deep breath, you continue—"We thought we were prepared; we thought we had the hang of running a popular site. How did we do? Not even close. We initially thought that a link on TechCrunch wouldn't bring that much traffic. We also thought we could handle the spike in load after our TV spot ran. What was the point of all this testing if the application stopped responding right after launch, and we couldn't fix this regardless of the number of servers we threw at it, trying to salvage the situation as we could."

After a long pause, "It's already late, time goes by. There is so much left to tell you, so many things I wish I knew, so many mistakes I wish I never made." Unstoppable, you write, "If only there was something we could have done to make it work..."

And then, lucidity striking back at you—you wonder: why do we even do load testing in the first place? Is it to validate performance after a long stretch of development, or is it really to get useful feedback from our application and increase its scaling capabilities and performance? In both cases validation takes place, but in the latter, getting feedback is at the center of the process.

You see, this isn't really about load testing per se; the goal is focused on making sure your application won't crash when it goes live. You don't want to be writing "the cathedral" of load testing and end up with little time to improve performance overall. Often, working on improvements is where you want to spend most if not all of your time. Load testing is just a means to an end and nothing else.

If you are afraid because your deadline is tomorrow and you are looking for a quick win, then welcome, this is the article for you.

Writing the simplest possible simulation
In this article, we will be writing a bit of code, Scala code to be more precise, as we will be using Gatling. While the code might look scary, I can assure you it is not. In fact, Scala can be left aside for the moment, and we can think about Gatling as its own language:
import io.gatling.core.Predef._
import io.gatling.http.Predef._

class SimplestPossibleSimulation extends Simulation {

  val baseHttpProtocol =
    http.baseUrl("https://computer-database.gatling.io")

  val scn = scenario("simplest")
    .exec(
      http("Home")
        .get("/")
    )

  setUp(
    scn.inject(atOnceUsers(1))
  ).protocols(baseHttpProtocol)
}
When you run a Gatling simulation, it will launch all the scenarios configured inside at the same time, which is what we'll do right now.

Save the simulation as SimplestPossibleSimulation.scala inside the user-files/simulations folder, then launch it from the command line, at the root of the bundle folder.

Another option is to use a build tool, such as Maven or SBT. If this is the solution you would like to try, you can clone our demo repositories using git.
Configuring the Injection Profile
Looking back, when configuring our scenario we did the following:

scn.inject(atOnceUsers(1))

Which is great for debugging purposes, but not really thrilling in relation to load testing. The thing is, there is no right or wrong way to write an injection profile, but first things first.

An injection profile is a chain of rules, executed in the provided order, that describes at which rate you want your users to start their own scenario. While you might not know the exact profile you must write, you almost always have an idea of the expected behavior. This is because you are typically either anticipating a lot of incoming users at a specific point in time, or you are expecting more users to come as your business grows.

Questions to think about include: do you expect users to arrive all at the same time? This would be the case if you intend to offer a flash sale or your website will appear on TV soon. And do you have an idea of the pattern your users will follow? It could be that users arrive at specific hours, are spread across the day, or appear only within working hours, etc. This knowledge will give you an angle of attack, which we call a type of test. I will present three of them.

Stress testing
When we think about load testing, we often think about "stress testing," which it turns out is only a single type of test; a "flash sale" is the typical scenario it models.

It is difficult to say how many users you can launch at the same time without being too demanding on hardware. There are a lot of possible limitations: CPU, memory, bandwidth, the number of connections the Linux kernel can open, the number of available sockets on the machine, etc. It is easy to strain the hardware by putting in too high a value and ending up with nonsensical results.

This is where you could need multiple machines to run your test. To give an example, a Linux kernel, if optimized properly, can easily open 5k connections every second, 10k being already too much.

Splitting 10 thousand users over the span of a minute would still be considered a stress test, because of how strenuous the time constraint is:

scn.inject(
  rampUsers(10000) during 1.minute
)

Soak test
If the time span gets too long, the user journey will start feeling more familiar. Think "users per day," "users per week," and so on. However, imitating how users arrive along a day is not the purpose of a soak test, just the consequence.

When soak testing, what you want to test is the system behavior over a long stretch of time: how does the CPU behave? Can we see any memory leaks? How do the disks behave? The network?

A way of doing this is to model users arriving over a long period of time:
scn.inject(
  ...
)

Capacity test
Finally, you could simply be testing how much throughput your system can handle, in which case a capacity test is the way to go. Combining methods from the previous tests, the idea is to level the throughput for some arbitrary time, increase the load, level again, and continue until everything goes down and we get a limit. Something like:

scn.inject(
  constantUsersPerSec(10) during 5.minutes,
  rampUsersPerSec(10) to 20 during 30.seconds,
  constantUsersPerSec(20) during 5.minutes,
  rampUsersPerSec(20) to 30 during 30.seconds,
  ...
)

To get started, just run a capacity test that makes your application crash as soon as possible. You only need to add complexity to the scenario when everything seems to run smoothly.

Then, you need to look at the metrics. (See Figure 2)

What do all of these results even mean?
The above chart shows response time percentiles. When load testing, we could be tempted to use averages to analyze the global response time, but that would be error-prone. While an average can give you a quick overview of what happened in a run, it will hide under the rug all the things you actually want to look at. This is where percentiles come in handy.
Figure 1

Figure 2
Think of it this way: if the average response time is some amount of milliseconds, how does the experience feel in the worst case for 1% of your user base? Better or worse? How does it feel for 0.1% of your users? And so on, getting closer and closer to zero. The higher the amount of users and requests, the closer you'll need to get to zero in order to study extreme behaviors. To give you an example, if you had 3 million users performing a single request on your application, and 1% of them timed out, that would be 30 thousand users that weren't able to access your application.

Percentiles are usually used, which correspond to thinking about this the other way around. The 1% worst case for users is turned into "how does the experience feel at best for 99% of the users," 0.1% is turned into 99.9%, etc. A way to model the computation is to sort all the response times in ascending order and mark spots:

• 0% of the response times is the minimum, marking the lowest value
• 100% is the maximum
• 50% is the median

From here, you go to 99% and as close to 100% as possible, by adding nines, depending on how many users you have.
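The marking just described can be sketched as a nearest-rank percentile function. This is an illustration of the computation only, not Gatling's actual implementation:

```scala
object PercentileSketch {
  // Sort response times ascending, then read the value at rank
  // ceil(p/100 * n): 0% -> minimum, 50% -> median, 100% -> maximum.
  def percentile(responseTimesMs: Seq[Double], p: Double): Double = {
    require(responseTimesMs.nonEmpty && p >= 0 && p <= 100)
    val sorted = responseTimesMs.sorted
    val rank = math.ceil(p / 100.0 * sorted.size).toInt
    sorted(math.max(rank - 1, 0))
  }
}
```

With this in hand, a "99th percentile of 63ms" simply means that 99% of the sorted response times sit at or below 63ms.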
In the previous chart, we had a This is the most important part. If There is a catch though. Out of
The InfoQ eMag / Issue #86 / October 2020
…99th percentile of 63 milliseconds, but is that good or not? With the maximum being 65, it would seem so. Would it be better if it were 10 milliseconds, though?

Most of the time, metrics are contextual and don't have any meaning by themselves. Broadly speaking, a 10ms response time on localhost isn't an achievement, and it would be impossible from Paris to Australia due to the speed-of-light constraint. We have to ask, "in what conditions was the run actually performed?" This will greatly help us deduce whether or not the run was actually that good. These conditions include:

• What type of server is the application running on?
• Where is it located?
• What is the application doing?
• Is it behind a network?
• Does it have TLS?
• What is the scenario doing?
• How do you expect the application to behave?
• Does everything run on the same cloud provider in the same data center?
• Are there any kinds of latency to be expected? Think mobile (3G), long distance.
• Etc.

If you know the running conditions of a test, you can do wonders. You can (and should) test locally, with everything running on your computer, as long as you understand what it boils down to: no inference can be made on how it will run in production, but it allows you to do regression testing. Just compare the metrics between multiple runs, and they will tell you whether you made the performance better or worse.

You don't need a full-scale testing environment to do that. If you know your end-users are mobile users, testing with all machines located in a single data center could lead to a disaster. However, you don't need to be 100% realistic either. Just keep in mind what it implies and what you can deduce from the testing conditions: a data center being a huge local network, results can't be compared to mobile networks, and observed results would in fact be way better than reality.

Specifying the need
Not only should you be aware of the conditions under which the tests are run, but you should also decide beforehand what makes the test a success or a failure. To do that, we define criteria, such as:

1. Mean response time under 250ms
2. 2000 active users
3. Less than 1% failed requests

Of these three acceptance criteria, only one is actually useful to describe a system under load. Can you guess which, and what the others can be replaced with?

(1) "Mean response time under 250ms": It should come naturally from the previous section that while the average isn't bad, it isn't sufficient to describe user behavior; it should always be seconded with percentiles:

• Mean response time under 250ms
• 99th percentile under 250ms
• Max under 1000ms

(2) "2000 active users": This one is more tricky. Nowadays we are flooded with analytics showing us "active users" and such, so it is tempting to define acceptance criteria using this measurement. That is an issue, though. The only way to model an amount of active users directly is by creating a closed model. Here, you will end up with a queue of users, and this is where the issue lies. Just think about it: if your application were to slow down and users were in a queue, they would start piling up in the queue, but only a small amount (the maximum allowed in the application) would be "active" and performing requests. The amount of active users would stay the same, but users would be waiting outside, as they would at a shop.
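The percentile-based replacements in (1) map directly onto Gatling's assertions DSL; as a sketch (the injection profile and protocol setup here are illustrative):

```scala
setUp(scn.inject(constantUsersPerSec(200) during 10.minutes))
  .protocols(baseHttpProtocol)
  .assertions(
    global.responseTime.mean.lt(250),           // Mean response time under 250ms
    global.responseTime.percentile(99).lt(250), // 99th percentile under 250ms
    global.responseTime.max.lt(1000)            // Max under 1000ms
  )
```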
In real life, if a website goes down, people will simply leave. The "active users" criterion is better replaced with arrival-rate measurements, such as:

• Between 100 and 200 new users per second
• More than 100 requests per second

(3) "Less than 1% failed requests" was in fact the only criterion of the three that properly represents a system under load. However, it is not to be taken as a rule of thumb. Depending on the use case, 1% may be too high. Think of an e-commerce site: you might allow some pages to fail here and there, but having a failure at the end of the conversion funnel, right before buying, would be fatal to your business model. You would give this specific request a failure criterion of 0%, and the rest either a higher percentage or no criterion at all.

All of this leads to a specification based on the Given-When-Then model, and to a way of thinking about load testing in general, with everything we learned so far:

• Given: Injection Profile
• When: Scenario
• Then: Acceptance Criteria

Using the previous examples, it can look like this:

• Given: a load of 200 users per second
• When: users visit the home page
• Then: we got at least 100 requests per second

```scala
val baseHttpProtocol =
  http.baseUrl("https://computer-database.gatling.io")

// When
val scn = scenario("simplest")
  .exec(
    http("Home")
      .get("/")
  )

// Given
setUp(
  scn.inject(
    rampUsersPerSec(1) to 200 during 1.minute,
    constantUsersPerSec(200) during 9.minutes
  )
).protocols(baseHttpProtocol)
  // Then
  .assertions(
    global.requestsPerSec.gte(100),
    global.responseTime.percentile(99).lt(250),
    global.failedRequests.percent.lte(1)
  )
```

The Scenario (When) part didn't change (yet), but you will need the Injection Profile (Given) in the same place, plus a new part, called assertions, as the Then. What assertions do is compute metrics across the whole simulation and fail the "build" if they are under/over the requirements. Using a Maven project, for example, you'll see:
The InfoQ eMag / Issue #86 / October 2020

```
[INFO] ------------------------------------
```

Note that in the previous example we used the keyword global as a starting point, but you could be as precise as requiring a single request to have no failures and ignoring all the others:

Additional tooling will help catch the essentials that Gatling can't. As all metrics given by Gatling are from the point of view of the user, you'll need to make sure you are equipped with system and application monitoring on the other side of the spectrum. System monitoring is also very useful to have on the Gatling side. It will help you see if you …

2. Run a load test (capacity, for example)
3. Analyze and deduce a limit
4. Tune

• The Advanced Tutorial to go deeper
• The Cheat-Sheet that lists all the Gatling language keywords and their usage
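Returning to the per-request idea above: an assertion scoped to a single request, requiring, say, the Home request to be failure-free while ignoring all others, could look like this sketch (using Gatling's details() assertion path; the injection profile is illustrative):

```scala
setUp(scn.inject(constantUsersPerSec(200) during 10.minutes))
  .protocols(baseHttpProtocol)
  .assertions(
    // Only the "Home" request must have zero failed requests;
    // no assertion is made about any other request.
    details("Home").failedRequests.percent.is(0)
  )
```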
Four Case Studies for Implementing Real-Time APIs
by Karthik Krishnaswamy, Director, Product Marketing at F5
API calls now make up 83% of all web traffic. The age of the API has arrived, and companies should be well past the point of just having enthusiasm for developing APIs — they need them to survive in digital business. But in the digital era, it's easy for your customers and partners to switch services. Don't like your bank? Opening a new account in another bank is as simple as downloading an app. APIs have made it easy for us to consume services across all industries.

That means competitive advantage is no longer won by simply having the APIs; the key to gaining ground is based on the performance and the reliability of those APIs. According to our research at NGINX, you need to be able to process an API call in 30ms or less in order to deliver real-time experiences.

Here are four case studies of companies that created an API structure capable of delivering the real-time speed needed to win business in this highly competitive landscape.

Automate API management at scale for microservices: Greenfield online bank

Objective: Deliver real-time performance and reliability at scale to customers accessing applications from non-smart mobile devices.

Problem: Microservices-based applications delivering poor
performance and struggling with added latency when processing internal API calls for transactions.

Key takeaway: External services can consist of multiple internal API calls among microservices; slow-performing API infrastructure can degrade service levels.

Background and challenge
This green-field bank was launched in 2016 with the goal of providing banking services to poor and rural areas in South Africa. In order to compete with the incumbent banks, they decided to build a modern digital banking application from the ground up, using containers and microservices to provide digital banking services to unbanked or underbanked households.

They built a distributed architecture based on this microservices reference architecture. This provided the bank with the flexibility needed to deliver new services quickly enough to match the expectations of today's digital consumers, while limiting downtime. The bank chose this reference architecture as it was prescriptive and gave them a roadmap to start small – initially handing ingress traffic to a modest Kubernetes cluster – and grow to hundreds or even thousands of microservices connected via a service mesh.

However, pivoting to a microservices architecture introduced a lot of complexity around both API scalability and the increased inter-service communication (east-west traffic) that developers needed to allow microservices to communicate with each other.

This posed a significant challenge to the bank, as most of their customers were not accessing their services from smart devices but using non-smart mobile phones. As a result, the bank was not able to rely on customer computing power. Instead, a simple mobile phone would communicate with the bank via SMS, which would kick off a series of internal API calls to process the transaction and deliver the result back to the customer via SMS.

How real-time API management helped
Unreliable or slow performance can directly impact or even prevent the adoption of new digital services, making it difficult for a business to maximize the potential of new products and expand its offerings. Thus, it is not only crucial that an API processes calls at acceptable speeds, but it is equally important to have an API infrastructure in place that is able to route traffic to resources correctly, authenticate users, secure APIs, prioritize calls, provide proper bandwidth, and cache API responses.

Most traditional APIM solutions were made to handle traffic between servers in the data center and the client applications accessing those APIs externally (north-south traffic). They also need constant connectivity between the control plane and data plane, which requires using third-party modules, scripts, and local databases. Processing a single request creates significant overhead — and it only gets more complex when dealing with the east-west traffic associated with a distributed application.

Considering that a single transaction or request could require multiple internal API calls, the bank found it extremely difficult to deliver good user experiences to their customers. For example, the bank's existing API management solution was adding anywhere from 100 to 500ms of transaction latency to each call.

The bank had also established CI/CD tooling and frameworks to automate microservices development and deployment. They had a Site Reliability Engineering (SRE) team tasked with overseeing the entire system and needed a solution that could integrate easily with their CI/CD pipeline infrastructure.

Deploying an API management solution that decouples the data and control plane introduced less latency and offered high-performance API traffic mediation for both external service and inter-service communication. Runtime connectivity to the control plane was no longer needed for the data plane to process …
Point-of-sale (POS) systems will time out after a set limit expires. Cards will need to be run through again, but transactions will automatically fail to avoid any duplicate transactions being charged to the card.

At the same time, the organization was moving to Open Banking standards, which provide API specifications that enable sharing of customer-permissioned data and analytics with third-party developers and firms to build applications and services—increasing the volume of API calls into the hundreds of billions.

As a result, the company started looking for a solution that would be able to manage and scale to handle billions of API calls as fast as possible — the goal was set to sub-70ms latency per API call. This was deemed the threshold before customer experience would be impacted for POS transactions.

How real-time API management helped
The company's existing API solution was adding 500ms of latency to every API call, causing a nominal amount of transactions to fail. But even a small fraction of billions of transactions is a significant number, especially when they result in a loss of revenue.

It's common for customers to try paying again with another credit card when a transaction fails. If the second card they use is not issued by the same company, it could potentially mean millions in lost dollars—all due to timed-out API calls.

The company pioneered a real-time API reference architecture to ensure API calls were processed in as close to real time as possible. For example, they deployed clusters of two or more high-availability API gateways to improve the reliability and resiliency of their APIs. The company also chose to enable dynamic authentication by pre-provisioning authentication information (using API keys and JSON Web Tokens, or JWTs), which made authentication almost instantaneous. In addition, they chose to delegate authorization to the business-logic layer of their backend so that their API gateways were only responsible for handling authentication, resulting in faster response times to calls.

By following these best practices, the company was able to achieve response times that were consistently less than 10ms, exceeding the performance requirements by 85%. This delivered tangible, direct savings—helping not only to recover lost transactions but enabling them to process even more transactions than before. The company concluded that these performance gains were more critical to their business outcomes than some of the additional features of the incumbent solution, which included a richer developer portal, API design tools, and API transformation features.

The improved transaction latency and reliability had the added benefit of helping to create and solidify new revenue streams. As part of their open banking efforts, the company is now able to expose their core transactional engine to ISVs and developers and win more business as a result of the speed and reliability advantages they can demonstrate over their competitors.

Securely process billions of transactions using a single API management solution: US-based financial services company

Objective: A lightweight, cost-effective solution that can process and route API calls for REST APIs, SOAP APIs, and externally accessed services, and that meets strict federal financial compliance requirements.

Problem: The company had three existing solutions to manage each of the different types of traffic, which required configuring specific gateways multiple times in order to make changes and quickly achieve scale to serve the needs of internal and external consumers.

Key takeaway: High API transaction throughput (calls per second) is essential for rapid adoption.
Background and challenge
…
…handle around 360 billion API calls per month — or about 12 billion calls in a single day, with peak traffic of 2 million API calls per second.

Real-time APIs require a real-time API solution
The case studies above serve to demonstrate some of the most common ways we are helping organizations develop their API programs. We would love to hear from you about your real-time API needs and experiences with delivering APIs in real time. What API management challenges are you facing? How are you dealing with scaling your API infrastructure to meet your modernization needs?

TL;DR

• API calls now make up 83% of all web traffic. Competitive advantage is no longer won by simply having APIs; the key to gaining ground is based on the performance and the reliability of those APIs.

• Modern systems can consist of multiple internal API calls among microservices; slow-performing API infrastructure can degrade service levels throughout an organisation's systems.

• Processing internal API calls within the corporate firewall, instead of routing them through a cloud outside the corporate network, can improve performance significantly.
SPONSORED ARTICLE
As companies seek to compete in the digital era, APIs become a critical IT and business resource. Architecting the right underlying infrastructure ensures not only that your APIs are stable and secure, but also that they qualify as real-time APIs, able to process API calls end-to-end within 30 milliseconds (ms).

API architectures are broadly broken up into two components: the data plane, or API gateway, and the control plane, which includes policy and developer portal servers. A real-time API architecture depends mostly on the API gateway, which acts as a proxy to process API traffic. It's the critical link in the performance chain.

API gateways perform a variety of functions, including authenticating API calls, routing requests to the right backends, applying rate limits to prevent overburdening your systems, and handling errors and exceptions. Once you have decided to implement real-time APIs, what are the key characteristics of the API gateway architecture? How do you deploy API gateways?

This article addresses these questions, providing a real-time API reference architecture based on our work with NGINX's largest, most demanding customers. We encompass all aspects of the API management solution but go deeper on the API gateway, which is responsible for ensuring real-time performance thresholds are met.

• API gateway. A fast, lightweight data-plane component that processes API traffic. This is the most critical component in the real-time architecture.
• API policy server. A decoupled server that configures API gateways, as well as supplying API lifecycle management policies.
• API developer portal. A decoupled web server that provides documentation for rapid onboarding for developers who use the API.
• API security service. A separate web application firewall (WAF) and fraud detection component which provides security beyond the basic security mechanisms built into the API gateway.
• API identity service. A separate service that sets authentication and authorization policies for identity and access management and integrates with the API gateway and policy servers.
• DevOps tooling. A separate set of tools to integrate API management into CI/CD and developer pipelines.
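As an illustration of the data-plane piece (not taken from the article), a minimal NGINX-style gateway configuration covering routing, rate limiting, and JWT authentication might look like the sketch below. Note that auth_jwt is an NGINX Plus directive, and all names, addresses, and limits are placeholders:

```nginx
# Illustrative API gateway sketch; hostnames, paths, and rates are hypothetical.
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=100r/s;

upstream backend_api {
    server 10.0.0.10:8080;
}

server {
    listen 443 ssl;
    server_name api.example.com;

    location /api/ {
        limit_req zone=api_limit burst=20 nodelay;  # rate-limit each client
        auth_jwt "api";                             # validate JWTs (NGINX Plus)
        auth_jwt_key_file /etc/nginx/jwks.json;     # pre-provisioned signing keys
        proxy_pass http://backend_api;              # route to the backend
    }
}
```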
Real-Time APIs in the Context of Apache Kafka
by Robin Moffatt, Senior Developer Advocate at Confluent
• Event: CarParked – To store streams of
…cluster, and the topic (a way of organising data in Kafka, kind of like tables in an RDBMS) to which we want to send the event.

…meet legal compliance. Traditional messaging systems like RabbitMQ and ActiveMQ cannot support such requirements.
There are clients available for Kafka in many different languages. Here's an example of producing an event to Kafka using Go:

```go
package main

import (
	"fmt"

	// Uses the confluent-kafka-go client
	"github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
	// Connect to the Kafka broker (address illustrative)
	p, err := kafka.NewProducer(&kafka.ConfigMap{"bootstrap.servers": "localhost:9092"})
	if err != nil {
		panic(err)
	}
	defer p.Close()

	// Produce the car park event to the "carpark" topic
	topic := "carpark"
	if err := p.Produce(&kafka.Message{
		TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny},
		Value:          []byte(`{"name": "NCP Sheffield", "space": "A42", "occupied": true}`),
	}, nil); err != nil {
		fmt.Println(err)
	}

	// Block until outstanding messages are delivered (up to 5s)
	p.Flush(5000)
}
```
of itself, forming a Consumer Group that allows for processing of your data in parallel. Kafka will then allocate events to each consumer within the group based on the topic partitions available, and it will actively manage the group should members subsequently leave or join (e.g., in case one consumer instance crashed).

This means that multiple services can use the same data, without any interdependency between them. The same data can also be routed to datastores elsewhere using the Kafka Connect API, which is discussed below.

The Producer and Consumer APIs are available in libraries for Java, C/C++, Go, Python, Node.js, and many more. But what if your application wants to use HTTP instead of the native Kafka protocol? For this, there is a REST Proxy.

Using a REST API with Apache Kafka
Let's say we're writing an application for a device for a smart car park. A payload for the event recording the fact that a car has just occupied a space might look like this:

```json
{
  "name": "NCP Sheffield",
  "space": "A42",
  "occupied": true
}
```

We could put this event on a Kafka topic, which would also record the time of the event as part of the event's metadata. Producing data to Kafka using the Confluent REST Proxy is a straightforward REST call.

Any application can consume from this topic, using the native Consumer API that we saw above, or by using a REST call. Just like with the native Consumer API, consumers using the REST API are also members of a Consumer Group, which is termed a subscription. Thus with the REST API you have to declare both your consumer and subscription first:

```shell
curl -X POST -H "Content-Type: application/vnd.kafka.v2+json" \
  --data '{"name": "rmoff_consumer", "format": "json", "auto.offset.reset": "earliest"}' \
  http://localhost:8082/consumers/rmoff_consumer

curl -X POST -H "Content-Type: application/vnd.kafka.v2+json" \
  --data '{"topics":["carpark"]}' \
  http://localhost:8082/consumers/rmoff_consumer/instances/rmoff_consumer/subscription
```

Having done this, you can then read the events:

```shell
curl -X GET -H "Accept: application/vnd.kafka.json.v2+json" \
  http://localhost:8082/consumers/rmoff_consumer/instances/rmoff_consumer/records
```

```json
[
  {
    "topic": "carpark",
    "key": null,
    "value": {
      "name": "Sheffield NCP",
      "space": "A42",
      "occupied": true
    },
    "partition": 0,
    "offset": 0
  }
]
```
take a stream of events and look at the bigger picture—of all the cars coming and going, how many spaces are there free right now? Or perhaps we'd like to be able to subscribe to a stream of updates for a particular car park only?

To work with the data in ksqlDB, we first need to declare a schema:

```sql
CREATE STREAM CARPARK_EVENTS (NAME     VARCHAR,
                              SPACE    VARCHAR,
                              OCCUPIED BOOLEAN)
  WITH (KAFKA_TOPIC='carpark',
        VALUE_FORMAT='JSON');
```

…or directly by the client, as required. With this done, any client can now subscribe to a stream of changes from the original topic, but with a filter applied. For example, to get a notification when a space is released at a particular car park, they could run:

```shell
curl --http2 'http://localhost:8088/query-stream' \
  --data-raw '{"sql":"SELECT TIMESTAMPTOSTRING(ROWTIME,'\''yyyy-MM-dd HH:mm:ss'\'') AS EVENT_TS, SPACE FROM CARPARK_EVENTS WHERE NAME='\''Sheffield NCP'\'' AND OCCUPIED=false EMIT CHANGES;"}'
```

This call results in a streaming response from the server, with a header, and then, when any matching events from the source topic are received, these are sent to the client:
```
{"queryId":"383894a7-05ee-4ec8-bb3b-c5ad39811539","columnNames":["EVENT_TS","SPACE"],"columnTypes":["STRING","STRING"]}
…
["2020-08-05 16:02:33","A42"]
…
["2020-08-05 16:07:31","D72"]
…
```
We can use ksqlDB to define and populate new streams of data too. By prepending a SELECT statement
with CREATE STREAM streamname AS we can route the output of the continuous query to a Kafka topic.
Thus we can use ksqlDB to perform transformations, joins, filtering, and more on the events that we send
to Kafka. ksqlDB supports the concept of a table as a first-class object type, and we could use this to
enrich the car park events that we’re receiving with information about the car park itself:
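Such an enriching statement could look like the following sketch; it assumes the CARPARK_REFERENCE table (mentioned later in this article) is keyed by car park name, and the column names match the topic output shown below:

```sql
-- Sketch: enrich car park events with reference data about the car park.
CREATE STREAM CARPARK_ENRICHED AS
  SELECT E.NAME,
         E.SPACE,
         E.OCCUPIED,
         R.LOCATION,
         R.CAPACITY,
         -- +1 when a space is taken, -1 when it is released; summed later
         -- to give a running count of available spaces.
         CASE WHEN E.OCCUPIED = TRUE THEN 1 ELSE -1 END AS OCCUPIED_IND
  FROM CARPARK_EVENTS E
       INNER JOIN CARPARK_REFERENCE R ON E.NAME = R.NAME;
```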
You’ll notice we’ve also used a CASE statement to apply logic to the data enabling us to create a running
count of available spaces. The above CREATE STREAM populates a Kafka topic that looks like this:
```
+----------------+-------+----------+----------------------------------+----------+--------------+
|NAME            |SPACE  |OCCUPIED  |LOCATION                          |CAPACITY  |OCCUPIED_IND  |
+----------------+-------+----------+----------------------------------+----------+--------------+
|Sheffield NCP   |E48    |true      |{LAT=53.4265964, LON=-1.8426386}  |1000      |1             |
```
Finally, let’s see how we can create a stateful aggregation in ksqlDB and query it from a client. To create
the materialised view, you run SQL that includes aggregate functions:
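A sketch of what that aggregation could look like, assuming the CARPARK_EVENTS stream declared earlier (the table name is illustrative; the OCCUPIED_SPACES column matches the query results shown below):

```sql
-- Materialised view: net count of occupied spaces per car park.
-- Each "occupied" event adds 1; each "released" event subtracts 1.
CREATE TABLE CARPARK_OCCUPIED AS
  SELECT NAME,
         SUM(CASE WHEN OCCUPIED = TRUE THEN 1 ELSE -1 END) AS OCCUPIED_SPACES
  FROM CARPARK_EVENTS
  GROUP BY NAME;
```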
This state is maintained across the distributed ksqlDB nodes and can be queried directly using the REST API:
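Such a pull query could look like this sketch over the same query-stream endpoint (the table name is illustrative):

```shell
curl --http2 'http://localhost:8088/query-stream' \
  --data-raw '{"sql":"SELECT OCCUPIED_SPACES FROM CARPARK_OCCUPIED WHERE NAME='\''Sheffield NCP'\'';"}'
```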
Unlike the streaming response that we saw above, queries against the state (known as “pull queries”, as
opposed to “push queries”) return immediately and then exit:
```
{"queryId":null,"columnNames":["OCCUPIED_SPACES"],"columnTypes":["INTEGER"]}
[30]
```
If the application wants to get the latest figure, it can reissue the query, and the value may or may not have changed.
There is also a Java client for ksqlDB, and community-authored Python and Go clients.
Continuing from the example of an application that sends an event every time a car parks or leaves a
space, it’s likely that we’ll want to use this information elsewhere, such as:
Using Apache Kafka’s Connect API you can define streaming integrations with systems both in and out of
Kafka. For example, to stream the data from Kafka to S3 in real-time you could run:
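One way to express that integration is ksqlDB's connector syntax; a sketch using the Confluent S3 sink connector (the bucket, region, and flush size are placeholders):

```sql
-- Sketch: continuously stream the carpark topic to S3 as JSON.
CREATE SINK CONNECTOR SINK_S3_CARPARK WITH (
  'connector.class' = 'io.confluent.connect.s3.S3SinkConnector',
  'topics'          = 'carpark',
  's3.bucket.name'  = 'carpark-events-bucket',
  's3.region'       = 'us-east-1',
  'storage.class'   = 'io.confluent.connect.s3.storage.S3Storage',
  'format.class'    = 'io.confluent.connect.s3.format.json.JsonFormat',
  'flush.size'      = '1000'
);
```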
Now the same data that is driving the notifications to your application, and building the state that your application can query directly, is also streaming to S3. Each use is decoupled from the other. If we subsequently want to stream the same data to another target, such as Snowflake, we just add another Kafka Connect configuration; the other consumers are entirely unaffected. Kafka Connect can also stream data into Kafka. For example, the CARPARK_REFERENCE table that we use in ksqlDB above could be streamed using change data capture (CDC) from a database that acts as the system of record for this data.

Conclusion
Apache Kafka offers a scalable event streaming platform with which you can build applications around the powerful concept of events. By using events as the basis for connecting your applications and services, you benefit in many ways, including loose coupling, service autonomy, elasticity, flexible evolvability, and resilience.

You can use the APIs of Kafka and its surrounding ecosystem, including ksqlDB, for both subscription-based consumption as well as key/value lookups against materialised views, without the need for additional data stores. The APIs are available as native clients as well as over REST.

To learn more about Apache Kafka, visit developer.confluent.io. Confluent Platform is a distribution of Apache Kafka that includes all the components discussed in this article. It's available on-premises or as a managed service called Confluent Cloud. You can find the code samples for this article and a Docker Compose to run it yourself on GitHub. If you would like to learn more about building event-driven systems around Kafka, then be sure to read Ben Stopford's excellent book Designing Event-Driven Systems.
Curious about previous issues?

FACILITATING THE SPREAD OF KNOWLEDGE AND INNOVATION IN PROFESSIONAL SOFTWARE DEVELOPMENT

• The InfoQ eMag / Issue #83 / March 2020: Service Mesh Features; Service Mesh Implementations and Products; Exploring the (Possible) Future of Service Meshes. This eMag aims to answer pertinent questions for software architects and technical leaders, such as: what is a service mesh?, do I need a service mesh?, and how do I evaluate the different service mesh offerings?

• The InfoQ eMag / Issue #77 / October 2019: An Engineer's Guide to a Good Night's Sleep; Sustainable Operations in Complex Systems with Production Excellence; Obscuring Complexity. To tame complexity and its effects, organizations need a structured, multi-pronged, human-focused approach that makes operations work sustainable, centers decisions around customer experience, uses continuous testing, and includes chaos engineering and system observability. In this eMag, we cover all of these topics to help you tame the complexity in your system.

• The InfoQ eMag / Issue #81 / January 2020: Testing in Production—Quality Software, Faster; Tyler Treat on Microservice Observability; 12 Microservices Testing Techniques. This eMag takes a deep dive into the techniques and culture changes required to successfully test, observe, and understand microservices.