
Multi-Cloud Data Layer Essentials
Understanding Core Application Workloads and Best Practices

CONTENTS

•  Why a Multi-Cloud Data Layer?
•  Multi-Cloud Data Layer Workloads
•  Additional Considerations for Multi-Cloud Data Layers
•  Conclusion

DAI CLEGG
PRODUCT EVANGELIST, YUGABYTE

WHY A MULTI-CLOUD DATA LAYER?
For decades, separation of business logic from the persistent data it manipulates has been well established in software application architectures. This is because the same data can be used in many different ways to serve many different use cases, so it cannot be embedded in any one application. With the emergence of microservices, the need has come for this data to become a service.

But this is not monolithic data. Many microservices using a single database results in very tight coupling. This makes it impossible to deploy new or evolving services individually if they need to make database changes that will affect other services. Breaking complex service interdependencies requires a data layer of multiple databases, each serving a single microservice or possibly a few, tightly related microservices.

DELIVERING DATA SERVICES FOR ANY USE CASE
Cloud computing offers microservices the perfect operating environment: resilient, scalable, and distributable. But the database has lagged the embrace of the virtual world of the cloud. The traditional RDBMS, the workhorse of data services since before the turn of the millennium, is monolithic. You scale a traditional RDBMS by running it on more powerful hardware, but in the cloud, this has limitations. And resilience comes only from database replication and complex disaster recovery processes.

The response to this was a proliferation of NoSQL databases and other data management tools. But the monolithic RDBMS hung on because it can deliver ACID transactional consistency, which is vital for systems of record and systems of engagement. And lately, SQL databases are fighting back with cloud-native transactional capability.

But in reality, it is not a fight, a war of competing database models. It is a co-existence. ACID, SQL databases are still the best at transactional consistency, but NoSQL, graph, document, analytic, and streaming databases and technologies excel elsewhere in the spectrum of data services' needs. And mature technologies are available to address all these workloads.

As more applications are implemented in the cloud as microservices, the data services needed to support those applications will be delivered by different technologies and databases, each optimized for the needs of the microservices it supports.

The result is not an attempt to shoehorn all workloads into a single winner-takes-all database. The optimal result is a multi-cloud data layer with the capability to deliver the appropriate data services for any use case.

REFCARD | MARCH 2022 1



MULTI-CLOUD DATA LAYER CORE CAPABILITIES
A multi-cloud data layer delivers three core capabilities, both for existing applications, to help bridge digital transformation initiatives, and for new cloud-native applications and microservices. These core capabilities include:

1. Freedom from tradeoffs: With a multi-cloud data layer, developers can achieve low latency, ACID transactions, familiar interfaces, relational capabilities, horizontal scale, high performance, geographic distribution, and ease of deployment, all in one database system.

2. Simplified operations: A multi-cloud data layer should be easy to deploy and scale to help minimize disruption going from a traditional RDBMS to a cloud-native database for cloud-native applications. It also provides the flexibility to allow organizations to start running in minutes through a fully managed offering, or to deploy anywhere, including containers across public, private, and hybrid cloud environments. For all deployments, organizations should be able to start small and scale horizontally without impacting performance, uptime, or data integrity (i.e., scale out vs. scale up).

3. Built-in security and resiliency: Uncompromising security and availability are expected, with these core features designed into a multi-cloud data layer from the start to make them easy and seamless to enable.

MULTI-CLOUD DATA LAYER WORKLOADS
It is tempting to define workloads by the classes of data management tools available (e.g., SQL and NoSQL). That shortcut will suffice much of the time, but it is better practice to consider the needs of individual use cases. Independently of the business value that a use case implements, it will have a number of orthogonal data requirements.

For example: Is it interactive? Does it need to ingest high-arrival-rate events? Will it involve ad-hoc queries over large volumes of data? There are a number of such questions, and how they are answered for a specific use case indicates the appropriate data services it will need.

Several key factors define multi-cloud data layer workloads. Let's walk through them to better understand what they are and how they function.

LATENCY
How much time can elapse before the service is regarded as a failure? This might be a timeout in a time-critical API call, or it might be loss of customers over time on a high-volume, poor-performing website. Regardless of the latency in processing the request, significant network latency must not be injected.

•  This also means local processing. If there is shared data involved, this will mean distributed data, with data available close to where it is needed.

•  In addition, local processing means data not just distributed across cloud regions; it may also indicate edge computing capability.

TRANSACTIONAL CONSISTENCY
Transactional consistency is needed when two different entities must be updated and it is mandatory that both updates succeed or both fail. Monetary transactions are the most obvious example, but there are others, such as reserving stock for an order against inventory.

•  This means ACID compliance, usually a SQL database.

ARRIVAL VELOCITY
How fast do requests arrive? Internet of Things (IoT) applications often generate millions of events per second. The first requirement is to ingest them at sustained peak levels, but processing them may be a different operation.

A business use case defined as a monitoring, alerting, or dashboard application based on the event stream may be delivered as (at least) three services: ingest, analytic derivation of complex KPIs, and dashboard delivery of indicators and trends.

•  Rapid ingest will normally predicate stream processing capabilities enabled on a back-end persistent store that is appropriate for the processing requirements.

•  The appropriate back-end capability might be a NoSQL database, a transactional SQL database, or a data lake. The determining factor will probably relate to the further processing needs of the arriving data.

DATA ACCESS VOLUME
How much data needs to be accessed to service the request? Ad-hoc analytic queries will often consume huge volumes of data.

•  If the queries are ad hoc on structured data and data aggregations (such as OLAP cubes) cannot be pre-built to serve them, the most appropriate technology might be a massively parallel data warehouse or data lakehouse.

•  But if the queries are ad hoc on unstructured data, the most appropriate technology might well be enterprise search software.

•  If "families" of queries can be defined once and reused many times, then OLAP processing in many kinds of stores may meet the need.
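The both-must-succeed-or-both-must-fail behavior described under Transactional Consistency can be sketched with Python's built-in sqlite3 module. The table and column names are hypothetical, chosen to mirror the stock-reservation example:

```python
import sqlite3

def reserve_stock(conn: sqlite3.Connection, order_id: int, product: str, qty: int) -> bool:
    """Atomically create an order line and decrement inventory: both succeed or neither does."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on any exception
            conn.execute(
                "INSERT INTO order_lines (order_id, product, qty) VALUES (?, ?, ?)",
                (order_id, product, qty),
            )
            cur = conn.execute(
                "UPDATE inventory SET on_hand = on_hand - ? WHERE product = ? AND on_hand >= ?",
                (qty, product, qty),
            )
            if cur.rowcount == 0:  # insufficient stock: raise to force a rollback
                raise ValueError("insufficient stock")
        return True
    except ValueError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (product TEXT PRIMARY KEY, on_hand INTEGER)")
conn.execute("CREATE TABLE order_lines (order_id INTEGER, product TEXT, qty INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widget', 5)")

assert reserve_stock(conn, 1, "widget", 3)      # succeeds: both rows change
assert not reserve_stock(conn, 2, "widget", 9)  # fails: neither row changes
```

After the failed call, the rollback has also removed the partially inserted order line, so the order and inventory tables never disagree.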


DATA COMPLEXITY
Is the data to be accessed a complex structure?

•  In some cases, apparent complexity is a matter of looking at too granular a level. For example, this content is made up of sections containing paragraphs, sentences, words, and letters. Each letter has a font and many other properties. But take a step back, and it's a simple hierarchical structure, easily managed in a document database or a key-value pair structure.

•  Other cases may involve genuinely complex data structures. Relational databases have often been used for such structures, but nowadays, other technologies, such as graph databases, can offer significant performance improvements for particular use cases.

There are potentially other categories and ontologies. Even within this categorization, important refinements have been omitted for clarity and brevity (e.g., the differences between read-heavy and write-heavy workloads). However, this ontology can be regarded as a good rule of thumb. The critical point is that it is important to analyze the use case backlog by what is needed, not by what has been used in the past.

MAPPING USE CASES ONTO WORKLOADS
A use case may have contradictory requirements. In the streaming data dashboard example mentioned above, low latency and huge data volume access are genuinely not achievable together, but a reappraisal of the core requirement can usually define a series of services that will achieve the objective.

SELECTING DATA SERVICE TECHNOLOGIES TO SUPPORT WORKLOADS
To the man with only a hammer, everything looks like a nail. The art of the data layer architect is to have the right tool at hand, or to recognize the right tool and add it to their toolbox.

If we adopt the broad workload categorization above, we can map it onto the capabilities of the family of data management tools that we might expect to see in a data layer. Table 1, below, shows this mapping:

Table 1: Data engine features

| FEATURE                   | TRANSACTIONAL DATABASE | STREAMING DATA PLATFORM | ANALYTIC DATABASE/DATASTORE | GRAPH DATABASE | DOCUMENT DATABASE/KEY-VALUE STORE | ENTERPRISE SEARCH |
|---------------------------|------------------------|-------------------------|-----------------------------|----------------|-----------------------------------|-------------------|
| Low latency               | ✓                      |                         |                             | ✓              | ✓                                 |                   |
| Transactional consistency | ✓                      |                         |                             |                |                                   |                   |
| High arrival velocity     | in combination         | in combination          |                             |                | in combination                    |                   |
| High data volume          |                        |                         | ✓                           |                |                                   | ✓                 |
| Complex data              |                        |                         |                             | ✓              |                                   |                   |
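One way to read Table 1 is as a lookup from workload requirements to candidate engine classes. The following is a minimal sketch of that reading; the category names and the cell assignments are illustrative assumptions, not a definitive rubric:

```python
# Candidate engine classes per workload requirement, in the spirit of Table 1.
# The assignments here are illustrative assumptions, not a definitive rubric.
ENGINE_FEATURES = {
    "low_latency": {"transactional", "graph", "document/key-value"},
    "transactional_consistency": {"transactional"},
    "high_arrival_velocity": {"streaming + persistent store"},
    "high_data_volume": {"analytic", "enterprise search"},
    "complex_data": {"graph"},
}

def candidate_engines(requirements: list[str]) -> set[str]:
    """Engines that satisfy ALL stated requirements. An empty result means no
    single engine fits, and the use case must be split into cooperating
    services (see Mapping Use Cases onto Workloads)."""
    sets = [ENGINE_FEATURES[r] for r in requirements]
    return set.intersection(*sets) if sets else set()

# A payments use case needing low latency plus ACID consistency maps cleanly:
assert candidate_engines(["low_latency", "transactional_consistency"]) == {"transactional"}
# Low latency plus huge scan volume has no single winner, so split the use case:
assert candidate_engines(["low_latency", "high_data_volume"]) == set()
```

The empty-set case is the interesting one: it is the signal that a single use case hides two or more distinct workloads.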

ADDITIONAL CONSIDERATIONS FOR MULTI-CLOUD DATA LAYERS
There are a few other important considerations to keep in mind when defining multi-cloud data layer workloads.

SERVICES WITHIN THE DATA LAYER
Looking at Table 1 above, it becomes obvious that the analysis of workload requirements for a specific use case will often leave choices that must be made on a case-by-case basis. What is equally apparent is that for any typical set of enterprise use cases, no single data engine will suffice.

The need for a multi-cloud data layer is apparent. But it is not the case that all the component databases and other data engines in the data layer operate as silos, each with their own set of use cases. They must combine, for example, to serve high-arrival-velocity use cases, and they also need to share data.

One widely used pattern for sharing enterprise reference data in real time is a service request from one database instance, serving a particular subject domain (say, Orders), to another database, serving another domain like Customers. By treating data layer servers as cloud-native components and wrapping them in APIs, intra-data-layer service requests appear and behave exactly like application service requests. This makes for a consistent and manageable architecture. Figure 1 summarizes this configuration.
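The shape of this pattern, where one domain reaches another only through its API facade, can be sketched as follows. The service names, entities, and in-memory dictionaries are all hypothetical stand-ins for real domain databases:

```python
from dataclasses import dataclass

@dataclass
class Customer:
    customer_id: int
    name: str

class CustomerService:
    """API facade over the Customers domain database (an in-memory dict here)."""
    def __init__(self) -> None:
        self._db: dict[int, Customer] = {}

    def add(self, customer: Customer) -> None:
        self._db[customer.customer_id] = customer

    def get(self, customer_id: int) -> Customer:
        return self._db[customer_id]

class OrderService:
    """The Orders domain calls the Customers domain through its API,
    never by reaching into the Customers database directly."""
    def __init__(self, customers: CustomerService) -> None:
        self._customers = customers
        self._orders: list[tuple[int, str]] = []

    def create_order(self, customer_id: int, item: str) -> tuple[int, str]:
        customer = self._customers.get(customer_id)  # intra-data-layer service request
        order = (customer.customer_id, item)
        self._orders.append(order)
        return order

customers = CustomerService()
customers.add(Customer(42, "Ada"))
orders = OrderService(customers)
assert orders.create_order(42, "widget") == (42, "widget")
```

Because OrderService depends only on the CustomerService interface, the Customers store can be swapped for a different engine without touching the Orders domain, which is the point of the pattern.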


Figure 1

But if the topology spreads over multiple cloud regions, the latency introduced might be unacceptable in some cases. Even in a modest portfolio of use cases, there will be overlap in the domain schemas needed to support them. User, Customer, Product, Location, and others are entities that will be needed over and over again.

When separating the subject domains and using service requests to "join" them is too slow (and it will be in some cases), other methods must be considered. The right solution for a given workload depends on multiple factors, but the question to be asked of every use of every entity is: Does the transaction demand the latest version?

For example, to create an order for a customer, we only need the ID of the Customer, which is immutable, and possibly the Name for search and verification. Our service doesn't need the guaranteed latest version. Asynchronous replication of a distributed Customer database, with an availability zone close to us, will meet the need.

Or perhaps change data capture (CDC) integration with the core Customer schema might suffice. But it will not suffice if we're scheduling deliveries of Orders: We need to ensure the address is the latest version at the point at which we convey it to an external delivery service. In this use case, we probably want a data-layer-to-data-layer service invocation of the Customer Location entity (probably served by a dedicated Customer service). We can afford the latency involved because we don't have a web user waiting; this is a back-office function.

So, a topology that distributes the reference data (the most commonly reused entities), using asynchronous replication to each region, providing a local copy without imposing high latency, is a useful pattern. And because reference data (systems of record) generally changes relatively infrequently, asynchronously rippling changes across the network will almost certainly be sufficient.

Figure 2, below, shows this as a database configuration, representing in this case three domain databases, each replicated across three regions:

Figure 2

In this example, each database would need to be replicated in at least one other availability zone in each region, but the simplified Figure 2 illustrates the concept of distributed replicated databases.

AGILITY
We have been looking at an architectural view of the data layer, and it is important to have that vision. Building a multi-cloud data layer is not a matter for emergent design. The vision is fairly consistent, including the choices, upgrades, and replacement of specific tools. But we don't need all its capability on day one. Implementation of data layer capability can be driven by what the business use cases demand of it. Develop, communicate, and agree on the vision and strategy for the data layer, but implement as needed.

ORGANIZATIONAL VARIABILITY
The material that we have explored here suggests an architectural framework within which application developers must deliver. A well-trodden path makes it easier and quicker to travel from A to B. But modern application developers expect, with good reason, to pick the tools and the path that best meet their particular needs, for their particular objectives. The days of a central architects' group imposing a solution space for every problem space have not generally survived in agile cloud-native development.

However, ignoring the benefits of the well-trodden path, and failing to learn from the experience of others, is an equally undesirable anti-pattern. Data layer architects and developers, following the approach discussed here, will have to find their own balance. This balance will depend on many human and organizational variables beyond this discussion.
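The per-use freshness question raised earlier, "does the transaction demand the latest version?", can be sketched as a routing decision between a local asynchronous replica and the authoritative domain service. Everything here (the entity keys, the two read paths, the flag) is a hypothetical illustration of the pattern, not a prescribed API:

```python
from typing import Callable, Optional

def read_entity(
    entity: str,
    needs_latest: bool,
    local_replica: Callable[[str], Optional[str]],
    authoritative: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Route a read: a local replica is fine for immutable or slowly changing
    reference data; go to the authoritative service when staleness is unacceptable."""
    return authoritative(entity) if needs_latest else local_replica(entity)

# Async replication means the replica may lag the authoritative source.
replica = {"customer.name": "Ada", "customer.address": "old address"}
source = {"customer.name": "Ada", "customer.address": "new address"}

# Creating an order only needs the (immutable) customer identity: the replica is enough.
assert read_entity("customer.name", False, replica.get, source.get) == "Ada"
# Handing an address to a delivery service demands the latest version.
assert read_entity("customer.address", True, replica.get, source.get) == "new address"
```

The flag is decided per use of each entity, not per entity: the same Customer address that a replica can serve for display must come from the authoritative service when it is dispatched to a carrier.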


CONCLUSION
Microservices and cloud-native architectures deliver an effective engine for application modernization, but modernization of the data layer lags. What is more, the variety of different application types demands multiple workload types, each with their unique needs and capabilities. No single database can meet all these needs and deliver all these capabilities without unacceptable tradeoffs. As a result, we are entering the age of the multi-cloud data layer.

With a properly architected multi-cloud data layer, you can build your data foundation to handle all cloud workloads with ease across any location or region. It's a powerful solution for workloads that require resiliency from failure, low latency, and horizontal scaling to meet unbounded and potentially peaky demand.

WRITTEN BY DAI CLEGG, PRODUCT EVANGELIST, YUGABYTE

Dai Clegg has worked in Engineering & Product roles at industry giants Oracle and IBM, and at disruptive startups Netezza, Acunu, and Yugabyte. He has launched and grown database, analytics, middleware, and development tools products. Dai has deep experience on both sides of the NoSQL/SQL debate, which he brings to bear on the challenges of data in the cloud-native era.

Dai has a BSc from Birmingham University and an MSc from Birkbeck College (UK).
