DAI CLEGG
PRODUCT EVANGELIST, YUGABYTE
WHY A MULTI-CLOUD DATA LAYER?
For decades, the separation of business logic from the persistent data it manipulates has been well established in software application architectures. This is because the same data can be used in many different ways to serve many different use cases, so it cannot be embedded in any one application. With the emergence of microservices, the need has come for this data to become a service.

But this is not monolithic data. Many microservices using a single database results in very tight coupling. This makes it impossible to deploy new or evolving services individually if they need to make database changes that will affect other services. Breaking complex service interdependencies requires a data layer of multiple databases, each serving a single microservice or possibly a few, tightly related microservices.

But in reality, this is not a war of competing database models. It is a co-existence. ACID, SQL databases are still the best at transactional consistency. But NoSQL, graph, document, analytic, and streaming databases and technologies excel elsewhere in the spectrum of data services' needs. And mature technologies are available to address all these workloads.

As more applications are implemented in the cloud as microservices, the data services needed to support these applications will be delivered by different technologies and databases, each optimized for the needs of the microservices it is supporting.

The result is not an attempt to shoehorn all workloads into a single winner-takes-all database. The optimal result is a multi-cloud data layer with the capability to deliver the appropriate data services for any use case.
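As a minimal sketch of this decoupling, the following assumes two hypothetical services, each owning a private datastore and exposing data only through its API. All class and method names are invented for illustration:

```python
# Sketch: each microservice owns its own datastore; other services
# reach its data only through its API, never through its database.
# Names and in-memory "databases" are illustrative stand-ins.

class CustomerService:
    def __init__(self):
        self._db = {}  # private datastore; its schema can evolve freely

    def add_customer(self, customer_id, name):
        self._db[customer_id] = {"id": customer_id, "name": name}

    def get_customer(self, customer_id):  # the public API
        return dict(self._db[customer_id])

class OrderService:
    def __init__(self, customer_api):
        self._db = []                    # a separate, independently owned datastore
        self._customers = customer_api   # depends on the API, not the schema

    def create_order(self, customer_id, item):
        # Validate the customer via the service API, not via a shared table.
        customer = self._customers.get_customer(customer_id)
        order = {"customer_id": customer["id"], "item": item}
        self._db.append(order)
        return order

customers = CustomerService()
customers.add_customer("c1", "Ada")
orders = OrderService(customers)
print(orders.create_order("c1", "widget"))  # {'customer_id': 'c1', 'item': 'widget'}
```

Because OrderService never touches CustomerService's store directly, the customer schema can change without forcing a coordinated redeployment of the order service.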
DELIVERING DATA SERVICES FOR ANY USE CASE
Cloud computing offers microservices the perfect operating
environment: resilient, scalable, and distributable. But the database layer has lagged in embracing the virtual world of the cloud. The traditional RDBMS, the workhorse of data services since before the turn of the millennium, is monolithic: you scale it by running it on more powerful hardware. In the cloud, this has limitations, and resilience comes only from database replication and complex disaster recovery processes.
MULTI-CLOUD DATA LAYER CORE CAPABILITIES
A multi-cloud data layer delivers three core capabilities, both for existing applications, helping to bridge digital transformation initiatives, and for new cloud-native applications and microservices. These core capabilities include:

1. Freedom from tradeoffs: With a multi-cloud data layer, developers can achieve low latency, ACID transactions, familiar interfaces, relational capabilities, horizontal scale (rather than scale-up), high performance, geographic distribution, and ease of deployment.
   • This also means local processing. If there is shared data involved, this will mean distributed data, that is, data being available close to where it is needed.
   • In addition, local processing means data not just distributed in cloud regions; it may also indicate edge computing capability.

3. Built-in security and resiliency: Uncompromising security and availability are expected, with these core features designed into a multi-cloud data layer from the start.

There are a number of such questions, and how they are answered for a specific use case indicates the appropriate data services it will need.

There are several key factors that define multi-cloud data layer workloads. Let's walk through them to better understand what they are and how they function.

DATA COMPLEXITY
Is the data to be accessed a complex structure?

TRANSACTIONAL CONSISTENCY
Transactional consistency is the requirement that when two different entities must be updated, either both updates succeed or both fail.

An application based on the event stream may be delivered as (at least) two services: ingest, plus analytic derivation of complex KPIs and dashboard delivery of indicators and trends.

• The rapid ingest will normally predicate stream processing.
• If aggregations (such as OLAP cubes) cannot be pre-built to serve the queries, the most appropriate technology might be a massively parallel data warehouse or data lakehouse.
• But if the queries are ad hoc on unstructured data, the most appropriate technology might well be enterprise search software.

The critical point is that it is important to analyze the use case backlog by what is needed, not by what has been used in the past.
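The all-or-nothing behavior required by transactional consistency can be sketched with an ordinary SQL transaction. This example uses Python's built-in sqlite3 module purely as a stand-in for a transactional database, and the account-transfer scenario is invented:

```python
import sqlite3

# Two entities (a debit and a credit) must both succeed or both fail.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # opens a transaction: commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            cur = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,))
            if cur.fetchone()[0] < 0:
                raise ValueError("insufficient funds")  # forces rollback of BOTH updates
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
    except ValueError:
        pass  # neither update is visible after the rollback

transfer(conn, "alice", "bob", 30)   # succeeds: both rows updated
transfer(conn, "alice", "bob", 500)  # fails: neither row changes
print(dict(conn.execute("SELECT name, balance FROM accounts")))
# {'alice': 70, 'bob': 30}
```

The second transfer illustrates the point: the debit that briefly happened inside the failed transaction is rolled back along with everything else, so the two entities never diverge.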
Table 1

                            Transactional  Streaming      Analytic Database/  Document Database/  Graph     Enterprise
                            Database       Data Platform  Datastore           Key-Value Store     Database  Search
Low latency                 ✓              ✓                                  ✓
Transactional consistency   ✓
Complex data                                                                                      ✓
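One way to read a capability matrix like Table 1 is as a checklist for matching use case requirements to engine types. The sketch below is a toy illustration of that matching; the capability sets it assigns are simplified assumptions, not the table's authoritative content:

```python
# Toy requirement-to-engine matcher, loosely following the shape of Table 1.
# The capability sets below are illustrative assumptions.
CAPABILITIES = {
    "transactional database": {"low latency", "transactional consistency"},
    "streaming data platform": {"low latency"},
    "analytic database/datastore": {"complex queries"},
    "document database/key-value store": {"low latency"},
    "graph database": {"complex data"},
    "enterprise search": {"ad-hoc unstructured queries"},
}

def candidate_engines(required):
    """Return every engine type whose capability set covers all requirements."""
    return [name for name, caps in CAPABILITIES.items() if required <= caps]

print(candidate_engines({"low latency", "transactional consistency"}))
# ['transactional database']
```

The useful property of this framing is that it starts from what the use case needs, echoing the article's point about analyzing the backlog by requirements rather than by incumbent technology.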
ADDITIONAL CONSIDERATIONS FOR MULTI-CLOUD DATA LAYERS
There are a few other important considerations to keep in mind when defining multi-cloud data layer workloads.

SERVICES WITHIN THE DATA LAYER
Looking at Table 1 above, it becomes obvious that the analysis of workload requirements for a specific use case will often leave choices that must be taken on a case-by-case basis. What is equally apparent is that for any typical set of enterprise use cases, no single data engine will suffice.

The need for a multi-cloud data layer is apparent. But it is not the case that all the component databases and other data engines in the data layer operate as siloes, each with their own set of use cases. They must combine, for example, to serve high-arrival-velocity use cases, and they also need to share data.

One widely used pattern for sharing enterprise reference data in real time is a service request from one database instance, serving a particular subject domain (say, Orders), to another database, serving another domain like Customers. By treating data layer servers as cloud-native components and wrapping them in APIs, intra-service-layer requests appear and behave exactly as application service requests. This makes for a consistent and manageable architecture.

Figure 1 summarizes this configuration.
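A service request from the Orders domain to the Customers domain, made through an API wrapper rather than a cross-database join, might look like the following sketch. The endpoint URL and JSON shape are invented, and the HTTP call is stubbed out so the example runs standalone:

```python
import json

# Hypothetical base URL of the Customers data-layer API (invented).
CUSTOMERS_API = "https://customers.data-layer.internal"

def create_order(customer_id, item, fetch):
    """Create an order after resolving the customer via a service request.
    `fetch` performs the HTTP GET; it is injected so the sketch is testable
    without a network."""
    url = f"{CUSTOMERS_API}/customers/{customer_id}"
    customer = json.loads(fetch(url))          # cross-domain lookup via API
    return {"customer_id": customer["id"], "item": item}

# Stand-in for the Customers service, simulating its JSON response.
def fake_fetch(url):
    return json.dumps({"id": url.rsplit("/", 1)[-1], "name": "Ada"})

print(create_order("c1", "widget", fetch=fake_fetch))
# {'customer_id': 'c1', 'item': 'widget'}
```

From the caller's perspective this is indistinguishable from any other application-level service call, which is exactly the consistency the pattern is after.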
Figure 1

Figure 2
But if the topology spreads over multiple cloud regions, the latency introduced might be unacceptable in some cases. Even in a modest portfolio of use cases, there will be overlap in the domain schemas needed to support them. User, Customer, Product, Location, and others are entities that will be needed over and over again.

When separating the subject domains and using service requests to "join" them is too slow (and it will be in some cases), other methods must be considered. The right solution for a given workload depends on multiple factors, but the question to be asked of every use of every entity is: Does the transaction demand the latest version?

For example, to create an order for a customer, we only need the ID of the Customer, which is immutable, and possibly the Name for search and verification. Our service doesn't need the guaranteed latest version. Asynchronous replication of a distributed Customer database with an availability zone close to us will meet the need.

Or perhaps change-data-capture (CDC) integration with the core Customer schema might suffice. But this will not suffice if we're scheduling deliveries of Orders: we need to ensure the address is the latest version.

So, a topology that distributes the reference data (the most commonly reused entities), using asynchronous replication to each region, providing a local copy without imposing high latency, is a useful pattern. And because reference data (systems of record) generally changes relatively infrequently, asynchronously rippling change across the network will almost certainly be sufficient.

In this example, each database would need to be replicated in at least one other availability zone in each region. But the simplified Figure 2 illustrates the concept of distributed replicated databases.

AGILITY
We have been looking at an architectural view of the data layer. And it is important to have that vision. Building a multi-cloud data layer is not a matter for emergent design. The vision is fairly consistent, including the choices, upgrades, and replacement of specific tools. But we don't need all its capability on Day One. Implementation of data layer capability can be driven by what the business use cases demand of it.

Develop, communicate, and agree on the vision and strategy for the data layer, but implement as needed.

ORGANIZATIONAL VARIABILITY
The material that we have explored here suggests an architectural framework within which application developers must deliver. A well-trodden path makes it easier and quicker to travel from A to B.

However, ignoring the benefits of the well-trodden path, and failing to learn from the experience of others, is an equally undesirable anti-pattern. Data layer architects and developers, following the approach discussed here, will have to find their own balance. This balance will depend on many human and organizational variables beyond this discussion.
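Returning to the question of whether a transaction demands the latest version: that decision can be made explicit at each read. The sketch below simulates a primary store and a stale asynchronous replica with plain dictionaries; all the data and store names are invented:

```python
# Sketch: route reads by freshness requirement. Order creation can read
# the immutable Customer ID (and Name) from a local asynchronous replica;
# delivery scheduling, which needs the current address, reads the primary.
primary = {"c1": {"id": "c1", "name": "Ada", "address": "12 New Road"}}
replica = {"c1": {"id": "c1", "name": "Ada", "address": "3 Old Lane"}}  # stale copy

def read_customer(customer_id, need_latest):
    """Choose the data source based on whether this use of the entity
    demands the guaranteed-latest version."""
    source = primary if need_latest else replica
    return source[customer_id]

# Creating an order: ID is immutable, Name is for display -- replica is fine.
c = read_customer("c1", need_latest=False)
order = {"customer_id": c["id"], "item": "widget"}

# Scheduling a delivery: the address must be current -- read the primary.
c = read_customer("c1", need_latest=True)
delivery = {"customer_id": c["id"], "address": c["address"]}
print(delivery["address"])  # 12 New Road
```

Asking this freshness question per use of each entity, rather than per entity, is what lets the cheap local-replica read coexist with the occasional must-be-current read of the same record.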
CONCLUSION
Microservices and cloud-native architectures deliver an effective engine for application modernization, but modernization of the data layer lags. What is more, the variety of different application types demands multiple workload types, each with their own unique needs and capabilities. No single database can meet all these needs and deliver all these capabilities without unacceptable tradeoffs. As a result, we are entering the age of the multi-cloud data layer.

With a properly architected multi-cloud data layer, you can build your data foundation to handle all cloud workloads with ease across any location or region. It's a powerful solution for workloads that require resiliency from failure, low latency, and horizontal scaling to meet unbounded and potentially peaky demand.

WRITTEN BY DAI CLEGG, PRODUCT EVANGELIST, YUGABYTE
Dai has a BSc from Birmingham University and an MSc from Birkbeck College (UK).

Copyright © 2022 DZone, Inc. All rights reserved.