
The InfoQ eMag / Issue #74 / June 2019

DevOps for the Database

How to Source-Control Your Databases for DevOps
DevOps for the Database
Why and How Database Changes Should Be Included in the Deployment Pipeline

FACILITATING THE SPREAD OF KNOWLEDGE AND INNOVATION IN PROFESSIONAL SOFTWARE DEVELOPMENT

DevOps for the Database

IN THIS ISSUE

How to Source-Control Your Databases for DevOps  07
DevOps for the Database  12
Why and How Database Changes Should Be Included in the Deployment Pipeline  18
How Database Administration Fits into DevOps  25
Treating Shared Databases Like APIs in a DevOps World  28
Continuous Delivery for Databases: Microservices, Team Structures, and Conway's Law  30

GENERAL FEEDBACK feedback@infoq.com / ADVERTISING sales@infoq.com / EDITORIAL editors@infoq.com


CONTRIBUTORS

Jonathan Allen got his start working on MIS projects for a health clinic in the late '90s, bringing them up from Access and Excel to an enterprise solution by degrees. After spending five years writing automated trading systems for the financial sector, he became a consultant on a variety of projects including the UI for a robotic warehouse, the middle tier for cancer-research software, and the big-data needs of a major real-estate insurance company. In his free time, he enjoys studying and writing about martial arts from the 16th century.

Eduardo Piairo is a deployment-pipeline craftsman always ready to learn about source control, CI, and CD for databases, applications, and infrastructure. The deployment pipeline is his favorite technical and cultural tool. Piairo works at Basecone as an operations engineer with automation, collaboration, and communication as priorities.

Ben Linders is an Independent Consultant in Agile, Lean, Quality and Continuous Improvement, based in The Netherlands. Author of Getting Value out of Agile Retrospectives, Waardevolle Agile Retrospectives, What Drives Quality, The Agile Self-assessment Game, and Continuous Improvement. Creator of many Agile Coaching Tools, for example, the Agile Self-assessment Game. As an adviser, coach, and trainer he helps organizations by deploying effective software development and management practices.

Matthew Skelton has been building, deploying, and operating commercial software systems since 1998. Head of Consulting at Conflux, he specialises in Continuous Delivery, operability, and organisation design for software in manufacturing, ecommerce, and online services, including cloud, IoT, and embedded software.
A LETTER FROM THE EDITOR

Manuel Pais is a DevOps and Delivery Consultant, focused on teams and flow. Manuel helps organizations adopt test automation and continuous delivery, as well as understand DevOps from both technical and human perspectives. Co-curator of DevOpsTopologies.com. DevOps lead editor for InfoQ. Co-founder of the DevOps Lisbon meetup. Coauthor of the book "Team Topologies: Organizing Business and Technology Teams for Fast Flow". Tweets @manupaisable.

Sooner or later, database changes become the bottleneck for traditional organizations trying to accelerate software delivery with modern CI/CD deployment pipelines. After building, testing, and deploying application changes through the pipeline, environment configuration, infrastructure changes, and even automated security tests are usually next in line. Databases tend to get left behind (no one has heard of DevDBOps yet), partly because of concerns around testing data migration and performance. But there are often issues around ownership as well, with gatekeeping DBAs incentivized to ensure stability and security of central databases (customer data has become, after all, the most important asset for many organizations) while application teams are incentivized to move fast and deliver features, thus requiring frequent changes to the database.

In this eMag we provide you concrete directions on how to deal with the above concerns, and some new patterns for database development and operations in this fast-paced DevOps world. It's not as scary as it may seem once you hear it from the experts who've been there and done that.

Jonathan Allen gets us started with the basics: version controlling all database changes and making sure they are visible and deployed from a pipeline rather than via standalone manual processes. Changes might refer to schema and index updates but also procedures, views, triggers, configuration settings, reference data, and so on. The key is to ensure the ability to recreate a full database from scratch if need be - minus the data (which should come from backups).

Besides tooling (for CI/CD, deployment, monitoring, and observability), we must also address people (growing database skills for application engineers), culture (make sure the new ways of working are more resilient and easy to apply, thus avoiding a reversion to old habits), and process concerns (plan ahead and stick to it, but do it in an iterative fashion), says Baron Schwartz. The article is an adaptation from a talk where Schwartz highlights patterns and anti-patterns he's seen in 20 years of experience with databases and tools.

A short report on a talk by Simon Sabin at WinOps 2017 points out the benefit of treating shared databases as APIs. This includes practices like respecting consumer (application) driven contracts and reducing batch size for database changes, adding deployment information in the database itself, or improving security with a "private" vs "public" approach to database design. Plenty of food for thought!

Eduardo Piairo's article talks us through a staged approach to CI/CD for databases, highlighting the practices required for each stage. Continuous Integration means that database changes are validated through automated testing, while Continuous Delivery means that new database versions can be deployed from the pipeline. Piairo further identifies different pipeline design strategies for databases, based on the organization's maturity - from fully independent DB pipelines to joined pipelines (app and DB code) at the deployment stage to fully integrated app/DB pipelines, sharing a single CI/CD process.

Because CI/CD and DevOps are also about people and culture, we include an interview with the always pragmatic Dan North talking about the evolution of the DBA role (and team) and how it relates to dev and ops roles.

Matthew Skelton recommends a "DBA-as-a-Service" model supporting independent application teams with their own database development skills, especially when moving to a microservices approach. Skelton reminds us of the need to consider the importance of Conway's law when designing systems and the underlying data stores. In particular, the effectiveness of microservices can be hindered if the database design does not match the organization's team structures and responsibilities.

Deliver value quicker while keeping your data safe

Redgate helps IT teams balance the need to deliver software faster with the need to protect and preserve business critical data.

You and your business benefit from a DevOps approach to database development, while staying compliant and minimizing risk.

Find out more at www.red-gate.com/DevOps
How to Source-Control Your Databases for DevOps
by Jonathan Allen, Software Engineer and Lead Editor

A robust DevOps environment requires continuous integration for every component of the system, but too often the database is omitted from the equation, leading to problems that range from fragile production releases and inefficient development practices to simply making it harder to onboard new programmers.

Both relational and NoSQL databases have aspects to consider in making them part of a successful continuous-integration strategy.

Source control for schema

The first issue to address is source control and schemas. It is not appropriate to have developers apply database changes in an ad hoc manner. That would be the equivalent of changing JavaScript files by editing them directly on the production server.

When planning your source control for the database, make sure you capture everything. This includes, but is not limited to:

• tables or collections;
• constraints;
• indexes;
• views;
• stored procedures, functions, and triggers; and
• database configuration.
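Everything on that list can be expressed as a plain SQL script that lives in the repository. As a minimal sketch (the object names here are hypothetical), even indexes and database configuration are just text you can check in:

-- A supporting index, captured as a script:
CREATE NONCLUSTERED INDEX [IX_Member_LastName]
    ON [dbo].[Member] ([MemberLastName]);
GO

-- Database configuration can be scripted too (SQL Server 2016+):
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 4;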

You may think that because you are using a schema-less database, you don't need source control. But even then, you are going to need to account for indexes and overall database configuration settings. If your indexing scheme in your QA database is different than that in your production database, it will be difficult to perform meaningful performance tests.

There are two basic types of source control for databases, which I'll call "whole schema" and "change script".

"Whole-schema" source control is where your source control looks how you want your database to look. When using this pattern, you can see all the tables and views laid out exactly as they are meant to be, which can make it easier to understand the database without having to deploy it.

An example of whole-schema source control for SQL Server is SQL Server Data Tools (SSDT). In SSDT, all of your database objects are expressed in terms of CREATE scripts. This can be convenient when you want to prototype a new object using SQL, then paste the final draft directly into the database project under source control.

Another example of whole-schema source control is Entity Framework migrations. Instead of SQL scripts, the database is primarily represented by C#/VB classes. Again, you can get a good picture of the database by browsing the source code.

When working with whole-schema source control, you usually don't write your migration scripts directly. The deployment tools figure out what changes you need by comparing the current state of the database with the idealized version in source control. This allows you to rapidly make changes to the database and see the results. When using this type of tool, I rarely alter the database directly and instead allow the tooling to do most of the work.

Occasionally, the tooling isn't enough, even with pre- and post-deployment scripts. In those cases, a database developer or DBA will have to manually modify the generated migration script, which can break your continuous deployment scheme. This usually happens when there are major changes to a table's structure, as the generated migration script can be inefficient in these cases.

Another advantage of whole-schema source control is that it supports code analysis. For example, if you alter the name of a column but forget to change it in a view, SSDT will return a compile error. Just like static typing for application programming languages, this catches a lot of errors and prevents you from attempting to deploy clearly broken scripts.

The other option is change-script source control. Instead of storing the database objects themselves, you store the list of steps necessary to create the database objects. The one I've used successfully in the past is Liquibase, but they all work pretty much the same.

The main advantage of a change-script tool is your full control over the final migration script. This makes it easier to perform complex changes such as splitting or combining tables.

Unfortunately, there are several disadvantages to this style of source control. The first is the need to write the change scripts. While it does give you more control, it is also time consuming. It is much easier to just add a property to a C# class than to write out the ALTER TABLE script in longhand.

This becomes even more problematic when working with things like SchemaBinding. SchemaBinding enables some advanced features in SQL Server by locking down the table schema. If you make changes to a table or view's design, you have to first remove the SchemaBinding from any views that touch the table or view. And if those views are schema-bound by other views, you need to temporarily unbind those too. All of this boilerplate, while easy for a whole-schema source control tool to generate, is hard to do correctly by hand.
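For illustration, a minimal change script in Liquibase's formatted-SQL syntax might look like the following sketch (the author, changeset id, and column are hypothetical):

--liquibase formatted sql

--changeset jallen:add-member-phone
ALTER TABLE [dbo].[Member]
    ADD [MemberPhone] [NVARCHAR](20) NULL;
--rollback ALTER TABLE [dbo].[Member] DROP COLUMN [MemberPhone];

Each changeset records one step in the database's history, and the --rollback comment gives the tool an explicit way to undo it.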

Another disadvantage of change-script tools is they tend to be opaque. For example, if you want to know the columns of a table without deploying it, you need to read every change script that touches the table. This makes it easy to miss something.

When starting a new database project from scratch, I always choose whole-schema source control when available. This allows developers to work much faster and requires less knowledge to use successfully so long as you have one person to set it up.

For existing databases, especially ones in production for a number of years, change-script source control is often more appropriate. Whole-schema tools make certain assumptions about the database design, while change-script tools work with any database. Also, whole-schema tools are harder to build so they may simply not be available for your particular database.

Data management

Tables can be broadly classified as "managed", "user", or "hybrid" depending on the nature of the data they contain. The way you treat these tables is different depending on which category they fall into.

A common mistake when putting databases under source control is forgetting the data. Invariably there are going to be lookup tables that hold data that users are not meant to modify. For example, this could contain table-driven logic for business rules, the various status codes for a state machine, or just a list of key-value pairs that match an enum in the application code.

The data in this type of table should be treated as source code. Changes to it need to go through the same review and QA process you would use for any other code change. And it is especially important that the changes deploy automatically along with other application and database deployments, otherwise those changes may be accidentally forgotten.

In SSDT, I handle this using a post-deployment script. In this script, I populate a temporary table with what the data should look like, then use a MERGE statement to update the real table.

If your source-control tool doesn't handle this well, you could build a standalone tool to perform the data updates. What's important isn't how you do it but whether or not the process is easy to use and reliable.

User tables are those tables in which a user can add or modify data. This can include both tables they can modify directly (e.g., name and address) and tables modified indirectly by their actions (e.g., shipping receipts or logs).

Real user data is almost never directly added to source control. However, it is a good practice to have realistic-looking sample data available for development and testing. This could be directly in the database project, stored elsewhere in source control, or, if particularly large, kept on a separate file share.

A hybrid table is one that stores both managed and user data. There are two ways to partition one. In a column partition, users can modify some columns but not others. In this scenario, you treat the table just like a normal managed table but with the additional restriction that you never update a user-controlled column. Use a row partition when you have certain records that users cannot modify. A common scenario I run into is the need for hard-coded values in the users table. On larger systems, I may have a separate user ID for each microservice that can make changes independent from any real person. For example, a user may be called "Bank Data Importer".

The best way I have found for managing row-partitioned hybrid tables is via reserved keys. When I define the identity/auto-number column, I'll set the initial value at 1,000, leaving users 1 through 999 to be managed through source control. This requires a database that allows you to manually set a value in an identity column. In SQL Server, this is done via the SET IDENTITY_INSERT command.
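Putting the reserved-key and MERGE ideas together, a post-deployment script for a row-partitioned hybrid table might look like this sketch (the table and rows are hypothetical):

-- Allow explicit values in the identity column for the reserved range.
SET IDENTITY_INSERT [dbo].[Users] ON;

MERGE INTO [dbo].[Users] AS target
USING (VALUES
    (1, N'Bank Data Importer'),
    (2, N'Audit Service')
) AS source ([UserId], [UserName])
ON target.[UserId] = source.[UserId]
WHEN MATCHED THEN
    UPDATE SET [UserName] = source.[UserName]
WHEN NOT MATCHED THEN
    INSERT ([UserId], [UserName])
    VALUES (source.[UserId], source.[UserName]);

SET IDENTITY_INSERT [dbo].[Users] OFF;

Because the script is idempotent, it can run on every deployment without disturbing the user-created rows above the reserved range.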

Another option for dealing with this scenario is to have a column called "SystemControlled" or something to that effect. When this entry is set to 1/true, it means the application will not allow direct modifications. If set to 0/false, then the deployment scripts know to ignore it.

Individual developer databases

Once you have the schema and data under source control, you can take the next step and start setting up individual developer databases. Just as each developer should be able to run their own instance of a web server, each developer who may be modifying the database design needs to be able to run their own copy of the database.

This rule is frequently violated, to the detriment of the development team. Invariably, developers making changes to a shared environment are going to interfere with each other. They may even get into deployment duels, when two developers each try to deploy changes to the database. When this happens with a whole-schema tool, they will alternately revert the other's changes without even realizing it. If using a change-script tool, the database may be placed in an indeterminate state for which the migration scripts no longer work, requiring you to restore the database from backups.

Another issue is schema drift. This is when the development database no longer matches what's in source control. Over time, developer databases tend to accumulate non-production tables, test scripts, temporary views, and other cruft to be cleared out. This is much easier to do when each developer has their own database that they can reset whenever they want.

The final and most important issue is that service and UI developers need a stable platform to write their code. If the shared database is constantly in flux, they can't work efficiently. At one company I worked at, it was not unusual to see developers shout, "Services are down again!" and then spend an hour playing video games while the shared database was put back together.

Shared developer and integration databases

The number-one rule for a shared developer or integration database is that no one should directly modify the database. The only way a shared database should update is via the build server and the continuous-integration/deployment process. Not only does this prevent schema drift, it allows scheduled updates to reduce disruptions.

As a rule of thumb, I allow the shared developer database to update whenever a check-in is made to the relevant branch. This can be somewhat disruptive, but usually it is manageable. And you do need a place to verify the deployment scripts before they hit integration.

For my integration databases, I tend to schedule deployments to happen once per day along with any services. This provides a stable platform that UI developers can work with. Few things are as frustrating to UI developers as not knowing if code that suddenly starts failing was their mistake or a problem in the services/database.

Database security and source control

An often-overlooked aspect of database management is security — specifically, which users and roles have access to which tables, views, and stored procedures. In the all-too-familiar worst-case scenario, the application gets full access to the database. It can read and write to every column of every table, even ones it has no business touching. Data breaches often result from exploitations of flaws in a minor utility application with far more access rights than it needs.

The main objection to locking down the database is that developers don't know what will break. Because they have never locked down the database before, they literally have no idea what permissions the application actually needs.

The solution to this is to put the permissions in source control from day one. This way, when the developer tests the application, it will fail immediately if the permissions are incorrect. This, in turn, means all permissions are settled by the time the application gets to QA, and there is no guesswork or risk of a missing permission.
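In SQL Server, those permissions are themselves just scripts that can live in the database project. A minimal sketch with a hypothetical application role:

-- Grant the application only the specific rights it needs,
-- rather than blanket ownership of the database.
CREATE ROLE [OrderServiceRole];
GO
GRANT SELECT, INSERT ON [dbo].[Orders] TO [OrderServiceRole];
GRANT SELECT ON [dbo].[Member] TO [OrderServiceRole];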
Containerization

Depending on the nature of your project, containerization of the database is an optional step. I'll offer two use cases to illustrate why.

In the first use case, the project has a simple branching structure: there is a dev branch that feeds into a QA branch, which in turn feeds into staging and finally production. This can be handled by four shared databases, one for each stage in the pipeline.

In the second use case, we have a set of major feature branches. Each major feature branch is further subdivided into a dev and QA branch. Features have to pass QA before becoming candidates for merging into the main branch, so each major feature needs its own test environment.

In the first use case, containerization is probably a waste of time even if your web services do need containers. For the second use case, containerization of some sort is critical. If not real containers (e.g., Docker), then at least deployment scripts that can generate new environments as needed (e.g., AWS or Azure) when creating new major feature branches.

TL;DR

• Database schema, including indexes, needs to be in source control.
• Data that controls business logic, such as lookup tables, also needs to be in source control.
• Developers need a way to easily create local databases.
• Shared databases should only be updated via a build server.
DevOps for the Database
Presentation summary by Manuel Pais, DevOps and Delivery Consultant

The Presenter: Baron Schwartz is the CTO and founder of VividCortex. He has written a lot of open source software, and several books including High Performance MySQL. He's focused his career on learning and teaching about scalability, performance, and observability of systems generally (including the view that teams are systems and culture influences their performance), and databases specifically.

Baron Schwartz, CTO and founder of VividCortex, used real stories to explore why it is hard to apply DevOps principles and practices to databases, and how to get better at it, in his presentation at QCon San Francisco 2018.

He covered what the research shows about DevOps, databases, and company performance; current and emerging trends in building and managing data tiers; the traditional dedicated DBA role; and more.

This article is adapted from Schwartz's talk and draws on his 20 years of experience with databases to present patterns and anti-patterns for including databases in a DevOps world.

Three database DevOps stories

Schwartz's first story was about a company with a fast-growing customer base in a highly regulated industry. The company was throwing money at problems, hiring people and buying more hardware. They took the same approach to databases, promoting their single DBA to manager and charging her with hiring a team of developers to run the databases.
manager and charging her with hiring a team of developers to run the

The strategy was to multiply the database administration team in proportion to the development teams. Despite 5x to 8x growth, they did not see a need for fundamental change in the way of working but thought that they could adapt simply by multiplying teams in order to deal with the required effort. Over time, outages and incidents increased significantly. The DBA team was under pressure to keep up with the development teams' workloads. This was a traditional kind of DBA culture, which we'll look at later on.

The second database DevOps story was about a friend of Schwartz's who had joined one of Silicon Valley's fastest growing startups with a "move fast and break things" culture. When he joined, he was the sole DBA for about 20 developers. Within a couple of months, they had 100 developers working there. Developers were expected to come in and ship code on day one to prove they could keep up in this hyper-growth organization. However, Schwartz's friend was still the only DBA in house and new developers were mostly junior. Inevitably, outages became a frequent occurrence. Despite the friend's good intentions to set up database-change control and review processes, this simply was not going to work. Even if one single DBA were able to review database changes from 100 developers (highly unlikely), outages will still happen if devs push code over the wall to DBAs.

The third story is about a relatively small team that was running a large media organization. They definitely punched above their weight in the efficiency of their infrastructure. They managed it all with only two database-operations engineers, who decided to buy the database monitoring tool Schwartz's company (at the time) developed. About a dozen and a half developers overseas began using the software on a daily basis. Interestingly, the two operations engineers only used it on a weekly basis. They intentionally avoided becoming bottlenecks by letting their (qualified) developers monitor their own databases.

Broadening the database responsibility from only the DBAs to the entire team can be extremely efficient but also challenging in terms of getting developers to interact directly with live databases.

Database DevOps

For Schwartz, DevOps in the context of the database means first of all that developers have to own how their system operates against the database in production. That means they own the schema and related application workload and performance. Secondly, they have to be able to debug, troubleshoot, triage, and repair their own database outages. This is not to say that all developers have to be able to do this by themselves for the most obscure database outages at 3:00 a.m. on a Sunday. You should be able to call in reinforcements, and it's really valuable to have somebody who understands how buffer pools and latches work. But all developers have to be able to realize if a query is running too slowly or too frequently, along with some other basic aspects of performance monitoring.

Because the schema and the data model are also part of the codebase, they should go through the same deployment pipeline as the rest of the application. Database code needs to be version-controlled and deployed to production using the same tools and procedures as any other code. Application deployment should therefore include database changes and automated schema migrations.

Less critical but highly beneficial is to automatically rebuild pre-production environments that are production-like, possibly by restoring the previous night's backup — which also tests that backups are restorable. Finally, automation of database operations is necessary to support large-scale operations. Alternatively, you can use a service like Amazon's RDS that already supports that kind of scaling.
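As a sketch of the kind of basic query-performance check Schwartz expects any developer to be able to run (SQL Server shown here; the query is illustrative and not from the talk):

-- Which queries consume the most total time, and how often do they run?
SELECT TOP 10
    qs.execution_count,
    qs.total_elapsed_time / qs.execution_count AS avg_elapsed_microseconds,
    st.text AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time DESC;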

Database management is one of the last DevOps holdouts. Generally speaking, if you can apply DevOps principles, practices, cultures, habits, and so forth to the database, you get similar benefits as you do from the other parts of your applications.

One major gain is entering a virtuous cycle of stability, which leads to more speed, which in turn leads to more stability. Instability slows things down, which means that you can't fix problems as quickly and applications get less stable. DevOps definitely helps increase stability and speed.

Without DevOps, you have DBAs responsible for code that they can't control — because a query is code that runs in the database. When you deploy an application, you're deploying queries into the database as well. That often leads to putting in gatekeeping controls and processes, which creates a dependency for developers by offloading developer work onto the DBA team. The latter lack the time to focus on strategic activities like designing the next version of the platform and helping developers design better applications and schema. This means you have highly knowledgeable, hard-to-find database administrators who are doing unskilled routine work and, even worse, becoming a bottleneck to faster delivery.

With this approach, production feedback doesn't come back to development, making it even harder for developers to build applications that run well in production. They become limited in the speed at which they can learn, fix, and improve. Ultimately, developer productivity declines and they become dependent on DBAs to literally debug their application. Schwartz has seen many cases where application bugs could only be diagnosed by looking at database behavior, which really should be unacceptable.

Bringing DevOps to the database

There are four core elements to successfully bring DevOps to the database: people, culture, structure and process, and tooling. It also requires a continuous learning process — you can't do it with a single DevOps adoption project.

Tooling is the most concrete and immediate of elements to grasp, so let's start with it. You need tooling for deploying and releasing database changes in a continuous fashion. You need to eliminate manual interactions to be able to do frequent and automated changes to the database. Manual toil keeps you working in the current system instead of focusing on improving it.

It's easy to see how much manual work there is because it's typically a painful process that people are explicitly conscious of. By participating in retrospectives and conversations within the organization, you might be surprised to realize how much manual work is still required to keep the wheels on. That kind of work should be automated.

You also need tooling for database monitoring and observability, otherwise you can't improve their performance. Schwartz's definition of observability is an attribute or property of a system that allows it to be observed, besides all the required tooling to do so. For Schwartz, monitoring is the activity of continually testing a system for conditions that you have predetermined. Observability is about taking the telemetry from the system and guiding it into an analysis pipeline in order to be able to ask questions on the fly about how your systems are behaving. In other words, monitoring tells you if your systems are broken; observability tells you why they're broken and how to fix them.

All teams that Schwartz sees working under high scale with low downtime focus on knowledge sharing. They use ancillary tooling like wiki pages, documentation (e.g., runbooks), dashboards, chatbots, and so on. There's a lot of ancillary tooling that serves to tie people's experience, knowledge, and mental models about the system together in a shareable way that allows other people to rapidly benefit from it. These teams put in the time and effort to share knowledge.
Take this screenshot from Etsy's internal Deployinator tool for deployments as an example. Notice the links at the bottom pointing to logs, forums, and things to watch for after a deployment (are you seeing what you expected or seeing something unexpected?). This is all important, time-saving knowledge gained from past experiences that has been distilled into wiki pages, forums, and other sharing tools.

Team structures work best when teams are not segregated into, for example, the Go developers team or the JavaScript developers team or all the operations people in one team and all the QA people in another. Those are the infamous silos, which reduce knowledge sharing and create bottlenecks in delivery. Instead, teams should be service-oriented, autonomous, highly aligned, and trusted. Last but not least, they need to own what they build, to be responsible for the live performance of their services. But you need a process to ensure that you can do that safely.

The first and foremost rule of processes is: Don't bring down production. It's okay if things break sometimes, but be careful and diligent to avoid major issues. Schwartz likes this quote from Chaos Engineering: "Chaos engineering is not for breaking systems in production. If you know that your system is going to break, you don't need chaos, you need to fix it."

So you don't release Chaos Monkey on a system that you expect is going to break. You do it on a system that you expect to be able to stay up, then watch what happens and learn.

In order to get people on board, it helps to have a plan. Leadership, vision, and mission are important, but what is lacking most of the time is planning and execution. What are you going to do and in what order? How will that contribute to achieving your goals? You will need this kind of high-level, directed, thematic progression to visualize progress and a clear connection to your desired goals. It's also relevant (not redundant) to clarify what will remain unchanged. People fear change, so keeping some familiar reference points will help advance the process of change.

Even if the high-level plan is ambitious, begin the process with small wins and build on them. For instance, start with a single team. It's fine to have this team outperform the others for a period of time, delivering better software faster, with fewer outages and a lower rate of change failure. That gives you concrete selling points that other teams can appreciate as they try to figure out what is different in the outperforming team. Starting small and getting buy-in will earn you the right to continue the process.

Culture change is a big part of DevOps and is a bigger root problem than technology. The problem is that culture is emergent so you can't directly operate on it: you can't just go and change culture. Changing the incentives is what drives culture. Some people say "show me who gets rewarded or promoted, and I'll tell you what your real values are and what your real culture is."

Schwartz finds that statement to be truthful.

In order to change culture, you need to change the perceived value of work. Operations work shouldn't have low status. Carrying a pager shouldn't have low status. Being on call for your own code shouldn't have low status. Reverse these perceptions by increasing the status of the person who is owning the production behavior of your systems.

Look at what creates friction and resistance to the change that you want to create and then create a path of least resistance. Make the right thing the easiest thing to do and try to make the wrong thing or the old ways a little bit harder to do. Definitely get everybody on the same page as far as experiencing the benefits of the new ways of working and share the pain among everyone. Don't always call the same engineer every time there is a problem at 3:00 a.m.

At the same time, you need leadership support. And leadership comes from all levels, not just from the top of the org chart. You need real buy-in and support from peers and SMEs, not just with words but through their actions. If you have somebody passive-aggressively working against changes, the changes will never stick.

Finally, you need to build trust with other people and empathize with them. Psychological safety and trust mean three things to Schwartz: listening to people and saying, "I hear you, you matter to me, you exist;" empathizing with them and saying, "I care;" and supporting them and saying, "I've got your back." When you put those three things together, you can create psychological trust in a team. And this kind of safety is critical to enable risk taking and learning.

When it comes to people, you need experts. Increasing the autonomy of development teams doesn't mean you no longer need database experts. You need people who are going to be able to predict that a tool is going to cause an outage because it was written for Microsoft Access on a desktop and you are running PostgreSQL on a thousand nodes in the cloud. When Schwartz was doing performance consulting, often the performance problems that he saw were caused by automation, by monitoring systems that were causing undue burden on the systems. Many problems come from the tooling itself rather than the applications using them.

DBAs are subject-matter experts with deep domain knowledge; they should not be caretakers of the database. They should be helping everybody else to get better, enabling engineers in service teams to learn the necessary database skills to build and run their applications, including data stores, in production. Schwartz noted several successful cases where engineers use the database toolset extensively to monitor what is happening, even if they didn't understand specifics like buffer pools and latches and SQL transaction isolation levels.

Failure

Many database-automation and operational tools are fragile because they were not built by people with experience running databases at scale. This will cause critical outages and perhaps impact careers. Automation that tries to work outside a well-defined scope is more likely to experience these problems. On the other hand, giving up on automation and relying on manual toil will not help either. That will perpetuate the problems until they build to a crisis.

Another problem Schwartz has seen is the use of distinct sets of tooling for deploying code to production and for deploying database changes to production. That means you have two distributed systems that must coordinate with each other. Try to use the code-deployment tooling, automation, and infrastructure as code for the database as well, rather than building a shadow copy in parallel.

Culture change is hard. Too often people expect that bringing in a new vendor with modern tools will create culture change. But you can't really do that from the outside, so relying on these tools to transform the ways of working will likely fail and the initiators may have their reputations on the line.

Planning these initiatives also has its dangers. You might be pushing too fast, trying to get too much changed in a short period. You might be pushing speed of delivery at the expense of stability. By going all in, you are not giving people enough time to absorb changes. People need to pay down technical debt, to sit with their systems, to use the observability tooling to learn how the systems really run, to poke and ask little questions that trigger new and unexpected discoveries.

Challenges

There are many challenges on this journey in terms of changing politics, culture, and people. Advocating for change is a skill and a difficult one to learn. But it is critical for your career growth. Being able to clearly articulate what needs to be done, why it needs to be done, and the benefit is a skill that all of us should be working on.

There are also tooling challenges. Traditional databases are hard to operate in a cloud-native way. They were not built for non-blocking schema changes and failover is hard. When running them at scale, performance becomes critical and downtime must be minimal, and planned maintenance becomes nearly impossible.

When you bring DevOps to the database, the DBA becomes a database reliability engineer, as explained in Database Reliability Engineering, a highly recommended book. Similar to how sysadmins became site reliability engineers (SRE), you need to apply that same kind of a worldview change from database administration to database reliability engineering.

By adopting the approach Schwartz outlines, you get rewarding outcomes from the process itself, from a healthier culture that you build, and from the new tooling that supports your work. And individuals derive benefits and competences that boost their careers.

TL;DR

• Without DevOps, database changes get thrown over the wall to DBAs. The latter react by gatekeeping in order to avoid frequent outages but end up a bottleneck to faster delivery.
• DevOps for the database means developers have to own their schemas and related application workload and performance.
• Developers also have to be able to debug, troubleshoot, triage, and repair their own database outages as much as possible.
• Database changes need to be version-controlled and take the same path to production (i.e., a single automated deployment pipeline that can deploy schema and other changes) as any other changes.
• To bring DevOps to the databases, we need to consider four elements: people (enable engineers' database competency), culture (make the new ways of working more appealing than the old), process (plan ahead and follow through iteratively), and tools (deployment, monitoring, observability, CI/CD).
Why and How Database Changes Should Be Included in the Deployment Pipeline
by Eduardo Piairo, DevOps Coach

I've spoken several times on how databases and applications should coexist in the same deployment pipeline. In other words, they should share a single development lifecycle. You should not handle database development differently from application development.

A database change should be subject to the same activities — source control, continuous integration (CI), and continuous delivery (CD) — as application changes.

Disclaimer: Both CI and CD are sets of principles and practices that go beyond build, test, and deploy. The use of these terms throughout this article is an assumed oversimplification with the purpose of seeing CI and CD as goals to be achieved by making use of the deployment pipeline (but not restricted to that alone).

Data makes databases special. Data must persist before, during, and after the deployment. Rolling back a database is riskier than rolling back an application. If your application is down or buggy, you will annoy some of your customers and possibly lose money — but if you lose data, you will also lose your reputation and your customers' trust.

The fear of database changes is the main reason that most organizations handle database and application changes differently, which leads to two common scenarios: database changes are not included in the deployment pipeline, or database changes pass through a different deployment process.

By not including the database in the pipeline, most of the work related to database changes ends up being manual, with the associated costs and risks. On top of that, this:

• results in a lack of traceability of database changes (change history),
• prevents the full application of proper CI and CD practices, and
• instills fear of changes.

Sometimes, the database and the application have independent pipelines, which means that the development team and the DBA team need to keep synchronizing. This requires a lot of communication, which leads to misunderstandings and mistakes.

The end result is that databases become a bottleneck in an agile delivery process. To fix this, database changes need to become trivial, normal, and comprehensible. This is accomplished by storing changes in source control, automating deployment steps, and sharing knowledge and responsibility between the development teams and the DBA team. DBAs should not be the last stronghold of the database but the first to be consulted when designing a database change. Developers should understand that processing data is not the same as storing data and should consider both to justify the changes they make to the database.

Value of automation

Automation lets you control database development by making the deployment process repeatable, reliable, and consistent. Another perspective is that automation allows you to fail (and failures will inevitably happen) in a controlled and known way. Instead of recurrently having to (re)learn manual work, you improve and rely on automation.

Additionally, automation provides bonuses in that it removes or reduces human intervention and increases speed of response to change.

Where you start automation depends on your chosen approach. However, it's a good idea to move from left to right in the deployment pipeline and take baby steps.

The first step is to describe the database change using a script. Normally, a person does this.

The next step is to build and validate an artifact (e.g., a ZIP or NuGet package) containing the database changes. This step should be fully automated. In other words, you need to use scripts to build the package, run unit/integration tests, and send the package to an artifact-management system and/or to a deployment server/service. Normally, build servers include these steps by default.

The last step would be to automate the deployment of the database changes in the target server/database. As in the previous step, you can use the deployment server's built-in options or build your own customized scripts.

Scripts and source control

You can describe every database change with a SQL script, meaning that you can store them in source control — for example, in a Git repository. SQL scripts are the best way to document the changes made in the database.

CREATE TABLE [dbo].[Member](
    [MemberId] [INT] IDENTITY(1,1) NOT NULL,
    [MemberFirstName] [NVARCHAR](50) NULL,
    [MemberLastName] [NVARCHAR](50) NULL,
    CONSTRAINT [PK_Member] PRIMARY KEY CLUSTERED
    (
        [MemberId] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Source control is the first stage of the deployment pipeline and provides a number of benefits.

Traceability through change history allows you to discover what changed, when it changed, who changed it, and why.

By using a centralized or distributed mechanism for sharing the database code (SQL scripts) across the organization, the process for developing and deploying database changes is also shared. This supports sharing and learning. Since the organization is using a shared process to deploy database changes, it becomes easier to apply standards and conventions that reduce conflicts and promote best practices.

While this article focuses on SQL databases, NoSQL databases face the same change-management challenges (even if the approaches differ as schema validation is not possible on NoSQL). The common approach for NoSQL databases is to make the application (and development team) responsible for managing change, possibly at runtime rather than at deploy time.

After version-controlling database changes, you next need to determine if you should use a state-based or a migration-based approach to describe database changes.

The following script represents how the database should start from scratch, not the commands needed to migrate from the current database state. This is known as a declarative-state or desired-state approach.

CREATE TABLE [dbo].[Member](
    [MemberId] [INT] IDENTITY(1,1) NOT NULL,
    [MemberFirstName] [NVARCHAR](50) NULL,
    [MemberLastName] [NVARCHAR](50) NULL,
    [MemberEmail] [NVARCHAR](50) NULL,
    CONSTRAINT [PK_Member] PRIMARY KEY CLUSTERED
    (
        [MemberId] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

This script, on the other hand, represents how the database should change from its current state. This is known as an imperative or migration-based approach.

ALTER TABLE [dbo].[Member]
    ADD [MemberEmail] [NVARCHAR](50) NULL

In the state approach, the change script is automatically generated by a state comparator that looks at two different states and generates a delta script. In the migrations approach, a person creates the change script.

The state approach is usually faster in terms of change-script creation/generation, and the scripts are automatically created following best practices. However, the state comparator generates the delta script at delivery, as it needs to compare the state of the target database with the desired-state script coming from source control. This means there should be a manual review of the generated change script.

In the migrations approach, the quality of the change script entirely depends on the SQL skills of the script creator. However, this approach is more powerful because the script creator knows the context of the change, and since the change is described in an imperative way, it can be peer-reviewed early in the process.

State and migration approaches are not necessarily mutually exclusive. In fact, both approaches are useful and necessary at different times. However, combining both approaches in the same deployment pipeline can become very hard to manage.

Independently of the approach, the golden rule is to use small batches, i.e., keep changes as small as possible.

In one real-world migration-based project, we managed migrations with Flyway and we stuck to a simple rule: one script, one operation type, one object. This helped enforce the use of small migration scripts, which significantly reduced the number and complexity of merge conflicts.
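Following that rule, the Flyway changelog might contain migration scripts named like these (hypothetical examples; Flyway versioned migrations use the V<version>__<description>.sql convention):

V1.1__create_table_member.sql
V1.2__create_index_member_lastname.sql
V1.3__add_column_member_email.sql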

CI and CD

Achieving source control for database changes unlocks the second and third stages of the deployment pipeline. You can now apply CI and CD practices.

CI is when database changes are integrated and validated through testing. You must decide what to test, what tests to use (unit, integration, etc.), and when to test (before, during, and after deployment). Using small batches greatly reduces the associated risk.

This stage highly depends on the way business logic is managed between application and database.
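As one illustration of automated testing for database changes, here is a minimal unit test using the tSQLt framework for SQL Server (a sketch; the test class and target object are hypothetical):

EXEC tSQLt.NewTestClass 'MemberTests';
GO
CREATE PROCEDURE MemberTests.[test Member table exists]
AS
BEGIN
    EXEC tSQLt.AssertObjectExists @ObjectName = N'dbo.Member';
END;
GO
-- Run the whole test class as part of the CI stage:
EXEC tSQLt.Run 'MemberTests';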

CD focuses on delivering database changes to the target environment.

Metrics to consider include the time that the system needs to be offline (e.g., to make a backup), time to recover from a failed deployment (e.g., time to restore a backup), and the scope of effect: sharing a database between different applications increases the blast radius on the system when the database becomes unavailable.

Again, small batches mitigate the risk.

Deployment scenarios

Scenario 1, a fully independent pipeline, is common in organizations starting the process of database automation/integration with application development. The first instinct is to build a separate pipeline dedicated to the database in order to avoid affecting the existing application pipeline.

The database and application have completely independent deployment pipelines: different code repositories, different CI processes, different CD processes. The development team and the DBA team must work to synchronize. In this scenario, the teams involved in the process share no knowledge but instead form silos.

It's possible, however, that a database (or a part of it) lives independently from the application. For example, the database may not connect to any application and users can establish a direct connection to the database. Although this use case is not common, having an independent pipeline for the database would then be appropriate.

A second scenario is delivering the application and database together.

Since the goal is to keep database and application changes in sync, this scenario is the logical next step.

The database and application code live in different repositories and have independent CI processes (deployment packages get created separately), but share the same CD process. Delivery is the contact point between database and application.

Despite the synchronization improvement when comparing to scenario 1, this does not yet ensure synchronization between the correct versions of the database and the application. It only guarantees the latest versions of both get delivered together.

Scenario 3 is building and delivering the application and database together. This is an exotic scenario that normally only represents a short learning step on the way to scenario 4.

The database and application code still live in different repositories but the CI process is shared to some extent (e.g., the application CI process only starts after the database CI process succeeds). The success of each part is related or connected.

Scenario 4 is a fully integrated pipeline. The database and application code live in the same repository and share the same CI and CD processes.

You might think this scenario does not require synchronization, but what is taking place is in fact a continuous synchronization process that starts in source control and continues with code reviews of the database scripts.

Once database and application share the same deployment pipeline, every step of the pipeline can become a learning opportunity for all organization members.

It's worthwhile to mention that achieving synchronization between database and application does not do away with database maintenance or optimization operations. The deployment pipeline should support deployments when either only the database or only the application needs to change.

The deployment pipeline is a cultural and technical tool for managing changes in the software development process and should include databases, applications, and infrastructure.

Following the same deployment process for databases as well as applications aligns with Gene Kim's "three ways", the principles underpinning DevOps: it increases flow visibility, leading to a better understanding of how to safely and quickly change the system; it increases feedback and creates more learning opportunities at all levels; and sharing knowledge through the pipeline allows the organization to build a learning system, to avoid extensive documentation (often a source of waste), and to reduce the truck factor (relying on a system instead of relying on heroes).

The deployment pipeline is your software factory: the better your factory, the better your product, your software.

TL;DR

• Databases, although different from applications, can and should be included in the same development process as applications. This is called "database shift left".
• When all database changes are described with scripts, you can apply source control and the database development process can include continuous-integration and continuous-delivery activities, namely taking part in the deployment pipeline.
• Automation is the special ingredient that makes the deployment pipeline repeatable, reliable, and fast, reducing fear of database changes.
• Migration-based and state-based are two different approaches to describing a database change. Independent of the choice, small batches are always a good option.
• The deployment pipeline is a technical and cultural tool that should reflect DevOps values and practices according to the needs of each organization.

How Database Administration Fits into DevOps
by Ben Linders, Trainer / Coach / Author / Speaker

The Interviewee: Dan North writes software and coaches teams in Agile and Lean methods. He believes in putting people first and writing simple, pragmatic software. He believes that most problems that teams face are about communication, and all the others are too. This is why he puts so much emphasis on "getting the words right", and why he is so passionate about BDD, communication, and how people learn.

The Agile Consortium International and Unicom organized the DevOps Summit Brussels 2016 on February 4 in Brussels, where Dan North spoke about how database administration fits into the world of DevOps.

InfoQ interviewed him about the activities that database administrators (DBAs) perform and how they relate to the activities of developers and operations, how database administration is usually organized, how the database fits into DevOps or continuous delivery (CD), and what he expects the future to bring for database administration when organizations adopt DevOps.

InfoQ: What activities do DBAs typically perform?

Dan North: The DBA role typically spans multiple production environments, development teams, technologies, and stakeholders. They may be tuning a database one minute, applying a security patch the next, responding to a production issue, or answering developers' questions. They need to ensure backups and replication are configured correctly, to ensure the appropriate systems and users (and no one else!) have access to the right databases, and they need to be on hand to troubleshoot unusual system behavior.

25
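
As a hedged illustration of that kind of tuning (the table, index, and hint below are hypothetical, written in SQL Server syntax), the difference is often a single well-chosen index:

    -- A slow report query filtering a large table on unindexed columns.
    SELECT order_id, order_total
    FROM   orders
    WHERE  customer_id = 42
      AND  order_date >= '2019-01-01';

    -- The structural fix: an index that covers the access path.
    CREATE INDEX ix_orders_customer_date
        ON orders (customer_id, order_date)
        INCLUDE (order_total);

    -- More rarely, a hint that nudges the query execution planner.
    SELECT order_id, order_total
    FROM   orders WITH (INDEX(ix_orders_customer_date))
    WHERE  customer_id = 42
      AND  order_date >= '2019-01-01';
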
InfoQ: How do their activities relate to those done by developers and by operations?

North: Sadly, too few developers really understand what goes on in a relational database. Mapping layers like Hibernate or Microsoft's Entity Framework hide the internals from regular enterprise developers, whose bread and butter is C# or Java programming, and of course this is a double-edged sword. On one hand, it makes it easier to develop basic applications where the database schema maps onto equivalent OO data structures, but things quickly become complex when your desired domain model diverges from the database schema or when there are performance, availability, or scaling considerations. At this point, having a helpful DBA as part of the development team can be invaluable.

On the operations side, the DBA is often responsible for implementing a business's replication or availability strategy. It is operations' role to monitor the systems, diagnose issues, and keep the lights on, but the DBAs will be closely involved in monitoring and diagnosing database-related issues. They also define the database management and maintenance processes that the operations team carry out.

InfoQ: How is database administration usually organized? What are the benefits and pitfalls of organizing it that way?

North: Traditionally, the DBA role is another technology silo, turning requests or tickets into work. This means they can be lacking context about the broader business or technology needs and constraints, and are doing the best they can in an information vacuum. In my experience, DBAs are often in an on-call support role, so they are the ones being paged in the middle of the night when a developer's query exceeds some usage threshold. Because of this, they tend to be conservative, if not outright skeptical, about database changes coming through from developers.

Sometimes, there is a mix of production DBAs and development DBAs. The former tend to sit together, doing all the production maintenance work I described above. The latter are helping development teams interact with the database correctly, both in terms of schema design and querying. This model can work really well, especially where the production and development DBAs have an existing relationship of trust. The production DBAs know the development DBAs will ensure a level of quality and sanity in the schema design and database queries, and the development DBAs trust the production DBAs to configure and maintain the various database instances appropriately.

InfoQ: How does the database fit into DevOps or continuous delivery?

North: In a lot of organizations, it simply doesn't. Any database changes go through a separate out-of-band process independent of application code changes, and shepherding database-and-application changes through to production can be fraught.

Some organisations talk about "database as code", and have some kind of automated process that runs changes into the database as managed change scripts. These scripts live under source control along with the application code, which makes it easier to track and analyze changes. These change scripts, known as migration scripts or simply migrations, are the closest we have gotten to treating the database as code, but they still contain lots of unnecessary complexity.

InfoQ: What do you expect that the future will bring for database administration when organizations adopt DevOps? How can DBAs prepare themselves?

North: The ideal model is for DBAs to be an integral part of both development and operations teams. DevOps is about integrating the technical advances of agile development such as continuous integration, automation, and version control into an operations context while retaining the rigour and discipline of the lights-on mentality.

In the future, I would like database changes to be as simple as code changes. I don't want to write migration scripts by hand or keep an audit of which scripts have been run in and which ones haven't. I should be able to make whatever changes I choose to a development database and then check it in like I would with code. I don't hand-roll the diffs that go into my version control system; I just change the software and the VCS figures out what changed. The database tooling should figure out what has changed since the last version of the database and create the appropriate migrations. Some vendors such as Red Gate are making inroads in this space, but it feels like we still have a long way to go. Most of the current "agile" database tooling is about creating, applying, and managing migrations rather than genuinely treating the database as code.
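
What North describes (change a development database, check it in, and let tooling work out the migration) is essentially the state-based approach: the repository stores only the desired state of the schema, and a comparison tool generates the change script. A hedged sketch with illustrative names:

    -- Checked in: the desired state of the table, which is all the
    -- repository needs to say about it.
    CREATE TABLE customer (
        customer_id INT          NOT NULL PRIMARY KEY,
        first_name  VARCHAR(100) NOT NULL,
        last_name   VARCHAR(100) NOT NULL,
        email       VARCHAR(255) NULL  -- the newly added column
    );

    -- Generated, not hand-written: the diff a comparison tool would
    -- produce against the previously deployed version of the schema.
    ALTER TABLE customer
        ADD email VARCHAR(255) NULL;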

Treating Shared Databases Like APIs in a DevOps World
by Manuel Pais, DevOps and Delivery Consultant

Simon Sabin, principal consultant at Sabin.io, spoke at the WinOps London 2017 conference on how to include database changes in a continuous-deployment model. A key aspect when sharing databases across multiple services or applications is to treat them as APIs, from the perspective of the database owners.

Sabin suggested that mechanisms such as views, triggers, and stored procedures can be used to change the database's internal structure while keeping backwards compatibility with the applications' data operations. He gave the example of migrating a table of credit-card numbers from plain text to encrypted text, where the applications might still retrieve the numbers in plain text by querying a view while the actual table is encrypted and decrypted via a trigger, as illustrated below.
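
A minimal sketch of that pattern, standing in for the original diagram; the schema, table, and key names are illustrative, and it assumes a SQL Server symmetric key named CardKey has already been created and opened in the session:

    -- The real table now stores card numbers encrypted at rest.
    CREATE TABLE private.CardNumbers (
        CardId        INT IDENTITY(1,1) PRIMARY KEY,
        CardNumberEnc VARBINARY(256) NOT NULL
    );
    GO

    -- A view preserves the old plain-text "API" for readers.
    CREATE VIEW dbo.CardNumbers AS
    SELECT CardId,
           CONVERT(VARCHAR(19), DECRYPTBYKEY(CardNumberEnc)) AS CardNumber
    FROM   private.CardNumbers;
    GO

    -- An INSTEAD OF trigger preserves the old contract for writers:
    -- applications keep inserting plain text; the trigger encrypts it.
    CREATE TRIGGER dbo.CardNumbers_Insert
    ON dbo.CardNumbers
    INSTEAD OF INSERT AS
    BEGIN
        INSERT INTO private.CardNumbers (CardNumberEnc)
        SELECT ENCRYPTBYKEY(KEY_GUID('CardKey'), i.CardNumber)
        FROM   inserted AS i;
    END;

The applications' reads and writes are untouched while the internal representation changes underneath them.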

The database respects an implicit contract/interface with the applications whereby internal changes (often required for performance, security, and other improvements) should not impact the applications. Following the API analogy, breaking changes should be minimized and communicated with plenty of notice so that application developers can adapt and deploy forward-compatible updates before the database changes occur. Such an approach can take advantage of consumer-driven contracts for databases.

According to Sabin, this approach is needed to intermediate two distinct views when dealing with shared databases: application developers tend to see them as dumb storage while database administrators (DBAs) see them as repositories of critical business data. In an ideal world, each application team would own their database and the team would include a DBA role, but this is often hard to apply.

When applications themselves require schema or other changes, Sabin advises reducing batch size (because achieving reliability and consistency in databases is a complex task) and applying one small change at a time, together with the corresponding changed application code. The choice of migration mechanism (gold schema versus migration scripts) will depend on the type of change. For Sabin, the mechanisms are more complementary than opposed. For schema changes, a gold schema approach — where a target schema (desired state) is declared and a tool such as SQL Compare applies the diff between current and desired state — is adequate. More complex modifications might be better suited for migration scripts such as Flyway or SQL Change Automation.

Other recommendations from Sabin included designing the database to fit your tooling — not the other way around — and logging deployment information in the database itself so it's easy to trace and diagnose issues. Treating database changes as part of the delivery pipeline also promotes auditability and compliance. Finally, Sabin suggested that data security (at a time when major data breaches continue to take place) could be largely improved with a private versus public approach — for example, by having small accessible (public) databases that simply hold views to larger, secure databases (private) not directly reachable by the applications themselves.

Similarly to how the DevSecOps movement took some time to become established, the idea of including database changes in an integrated, continuous-deployment process has been around for some time with different designations (Sabin calls it "Data DevOps" and others call it "DataOps" or "database lifecycle management (DLM)"). Eduardo Piairo, another speaker at WinOps 2017 and author of an article above, said, "It's not about defining a new process for handling database changes; it's about integrating them in the service lifecycle along with application and infrastructure code."
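
The consumer-driven contract idea can be made executable: a check owned by a consuming team, run in the database's deployment pipeline, that fails the deployment if the interface the consumer depends on disappears. A hedged sketch in SQL Server syntax, with an illustrative object name and a hypothetical Billing service as the consumer:

    -- Contract check for the Billing service: fail the deployment if
    -- the view it consumes loses a column it depends on.
    IF NOT EXISTS (
        SELECT 1
        FROM   INFORMATION_SCHEMA.COLUMNS
        WHERE  TABLE_SCHEMA = 'dbo'
          AND  TABLE_NAME   = 'CardNumbers'
          AND  COLUMN_NAME  = 'CardNumber'
    )
        THROW 50001, 'Contract broken: dbo.CardNumbers.CardNumber is required by Billing.', 1;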

SPONSORED ARTICLE

4 top tips for starting your database DevOps journey

If you've read the Accelerate State of DevOps Report from DORA and the 2019 State of Database DevOps report from Redgate, you may feel a rumble of envy that some businesses are already living the benefits of implementing DevOps, achieving improvements to their metrics, streamlining processes, and saving time while delivering more value. Quite simply, they're 'doing it right'.

In short, 85% of organizations have adopted DevOps or plan to do so in the next two years, 77% of application developers are now responsible for database development, and applying DevOps practices like version control and continuous integration to the database is seen as key to driving high performance.

The Accelerate report talks about version control and the automation of database changes alongside the application, and concludes: When teams follow these practices, database changes don't slow them down, or cause problems when they perform code deployments.

As well as speeding up the delivery of value, it means better collaboration between DBAs and developers, a positive impact on meeting compliance requirements, and gaining freedom to innovate without fearing database deployments.

Because of the critical business data in databases, there's often a belief that implementing database DevOps can take years. The State of Database DevOps report, however, found that over 50% of organizations that have already adopted DevOps for application development estimate it would take less than six months to include the database.

So how can you make the shift and demonstrate that your software delivery processes are a key driver of business value and success? Here are 4 key tips for kickstarting your DevOps journey.

1. Engage stakeholders from across the organization
This is not a solo mission, and you won't be able to get far on your own. As the Database DevOps Report points out:

"A wide range of stakeholders are involved in implementing a DevOps initiative, from IT Directors or Managers to Developers and the C-level, all of whom need to understand the benefits to be gained."

It's important, therefore, to gather allies from around the business and ensure they understand how including the database in DevOps can address their specific concerns.

2. Demonstrate the ROI your business will gain
Do your research before reaching out to execs for support. By arming yourself with facts and figures that back up your request for the investment, you'll be better prepared to field any blockers before they arise.

In your efforts, don't just look at the financial Return on Investment that can be gained. Look too at the business benefits that adopting DevOps for the database can bring. We did some landmark research into this in a recent whitepaper that shows how the monetary ROI can be calculated, and balances it with the value-driven benefits for different stakeholders.

Please read the full-length version of this article here.

Continuous Delivery for Databases: Microservices, Team Structures, and Conway's Law
by Matthew Skelton, Head of Consulting at Conflux

The way we think about data and databases must adapt to dynamic cloud infrastructure and continuous delivery (CD). Multiple factors are driving significant changes to the way we design, build, and operate software systems: the need for rapid deployments and feedback from software changes, increasingly complex distributed systems, and powerful new tooling, for example. These changes require new ways of writing code, new team structures, and new ownership models for software systems, all of which have implications for data and databases.

In "Common database deployment blockers and continuous delivery headaches", I looked at some of the technical reasons why databases often become undeployable over time, and how this becomes a blocker for CD. This article will look at some of the factors driving the need for better deployability and will investigate the emerging pattern of microservices for building and deploying software.

Along the way, we'll consider why and how we can apply some of the lessons from microservices to databases in order to improve deployability. In particular, we'll see how the eventual consistency found in many microservices architectures could be applied to modern systems, avoiding the close coupling (and availability challenges) of immediate consistency. Finally, we'll look at how Conway's law limits the shape of systems that teams can build, alongside some new (or rediscovered) ownership patterns, human factors, and team topologies that help to improve deployability in a CD context.

Factors driving deployability
The ability to deploy new or changed functionality safely and rapidly to a production environment — deployability — has become increasingly important since the emergence of commodity cloud computing (virtualization combined with software-defined elastic infrastructure). There are plenty of common examples of where this is important: we might want to deploy a new application, a modified user journey, or a bug fix to a single feature. Regardless of the specific case, the crucial thing is that the change can be made independently and in an isolated way in order to avoid adversely affecting other applications and features, but without waiting for a large regression test of all connected systems.

Several factors drive this increased need for deployability. First, startup businesses can now build a viable product, using new cloud technologies, with staggering speed to rival those of large incumbent players. Since it began to offer films over the Internet via Amazon Web Services' cloud services, Netflix has grown to become a major competitor to both television companies and the film industry. With software designed and built to work with the good and bad features of public cloud systems, companies like Netflix are able to innovate at speeds significantly higher than organizations relying on systems built for the pre-cloud age.

A second reason why deployability is now hugely important is that if software systems are easier to deploy, deployments will happen more frequently and so more feedback will be available to development and operations teams. Faster feedback allows for more timely and rapid improvements, responding to sudden changes in the market or user base.

Faster feedback in turn provides an improved understanding of how the software really works for the teams involved, a third driver for good deployability. Modern software systems — which are increasingly distributed, asynchronous, and complex — are often not possible to understand fully prior to deployment, and their behavior is understood only by observation in production.

Finally, an ability to rapidly deploy changes is essential to defend against serious security vulnerabilities such as Heartbleed and Shellshock in core libraries, or platform vulnerabilities such as SQL injection in Drupal 7. Waiting for a two-week regression-test cycle is simply not acceptable in these cases, nor is manual patching of hundreds of servers: changes must be deployed using automation, driven by pre-tested scripts.

In short, organizations must place a high value on the need for change in modern software systems, and design systems that are amenable to rapid, regular, repeatable, and reliable change — an approach characterized as "continuous delivery" by Jez Humble and Dave Farley in their 2010 book of the same name.

The use of the microservices architectural pattern to compose modern software systems is one response to the need for software that we can easily change. The microservices style separates a software application into a number of discrete services, each providing an API and each with its own store or view of data that it needs.

Centralized data stores of the traditional RDBMS form are typically replaced with local data stores and either a central asynchronous publish/subscribe event store or an eventually consistent data store based on NoSQL document or column databases (or both). These non-relational technologies and data-update schemes decouple the different services both temporally and in terms of data schemas while allowing them to be composed to make larger workflows.

Within an organization, we would desperately avoid having to couple the deployment of our own new application to the deployment of new applications from our suppliers and our customers. In a similar way, there is increasingly a benefit to decoupling the deployment of one business service from that of other, unrelated services within the same organization.

Figure 1 / Separate organizations are hugely resistant to close coupling and immediate consistency in their respective systems.

In some respects, the microservices style looks somewhat like SOA, and there is arguably an "Emperor's New Clothes" argument about the term "microservices". However, where microservices departs radically from SOA is in the ownership model for the services: with microservices, only a single team (or perhaps even a single developer) will develop and change the code for a given service. The team is encouraged to take a deep ownership of the services they develop, with this ownership extending to deployment and operation of the services. The choice of technologies to use for building and operating the services is often left up to the team (within certain common cross-team parameters, such as log aggregation, monitoring, and error diagnosis).

The ownership of the service extends to decisions about when and what to deploy (in collaboration with the product or service owner) rather than orchestrating across multiple teams, which can be painful. This means that if a new feature or bug fix is ready to go into production, the team will deploy the change immediately, using monitoring and analysis to closely track performance and behavior and being ready to roll back (or roll forward) if necessary. We acknowledge the possibility of breaking things, but weigh that against the value of getting changes deployed into production frequently and rapidly, minimizing the size of the change set, and the subsequently lower risk of errors.

Microservices and databases
Common questions from people familiar with the RDBMS-based software systems of the 1990s and 2000s relate to the consistency and governance of data for software built using a microservices approach, with data distributed into multiple data stores based on different technologies. Likewise, the more operationally minded ask how database backups can be managed effectively in a world of polyglot persistence with databases created by development teams. These are important questions and not to be discounted, but I think these questions can sometimes miss the point of microservices and increased deployability; when thinking about microservices, we need to adopt a different sense of scale.

Consider a successful organisation with a typical web-based OLTP system for orders, built around 2001 and since evolved. Such a system likely has a large core database running on Oracle, DB2, or SQL Server plus numerous secondary databases. The databases may serve several separate software applications. Database operations might be coordinated across these databases using distributed transactions or cross-database calls.

However, the organization needs to send data to or read data from remote systems — such as those of a supplier — and it is likely to only sparingly use distributed transactions. Asynchronous data reconciliation is much more common — it might take minutes, hours, or days to process, and the organization deals with the small proportion of anomalies in back-office processing, refunds, etc. They use proven techniques such as master data management (MDM) to tie together identifiers from different databases, allowing audits and queries for common data across the enterprise.

Between the databases of different organizations, eventual consistency, not immediate consistency, is the norm. Consider the Internet domain-name system (DNS) as a great example of eventual consistency that has been proven to work robustly for decades.

There is nothing particularly special about the organizational boundary as the line at which the default data-update scheme switches to immediate consistency — it was merely a convenient place for that in the 1990s and 2000s. The boundary for immediate consistency has to lie fairly close to the operation being performed; otherwise, if we insisted on immediate consistency across all supplier organizations and their suppliers ad infinitum, we'd have to "stop the world" briefly just to withdraw cash from an ATM! There are times when immediate consistency is needed, but eventual consistency should be seen as the norm or default.

Figure 2 / For the vast majority of activities, eventual consistency is the norm. This approach is also perfect for most services and systems within an organization.

When considering microservices and eventual consistency in particular, we need to take the kinds of rules that we'd naturally apply to larger systems (say bookkeeping for a single organization or orders for a single online store) and apply these at a level or granularity an order of magnitude smaller: at the boundary of individual business services. In the same way that we have come to avoid distributed transactions across organizational boundaries, with a microservices architecture we avoid distributed transactions across separate business services, allowing event-driven asynchronous messaging to trigger workflows in related services.
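
One common way to realize that event-driven pattern at the database level is a transactional outbox, sketched below with illustrative table and column names: the business change and the event announcing it commit atomically in the local database, and a separate relay process publishes pending events to the other services, which become consistent eventually.

    -- Outbox table (illustrative definition); the orders table is
    -- assumed to exist already.
    CREATE TABLE outbox_events (
        event_id   BIGINT IDENTITY(1,1) PRIMARY KEY,
        event_type VARCHAR(50)  NOT NULL,
        payload    VARCHAR(MAX) NOT NULL,
        dispatched BIT NOT NULL DEFAULT 0
    );

    -- Write the order and the event that announces it in one local
    -- transaction; no distributed transaction spans the services.
    BEGIN TRANSACTION;

    INSERT INTO orders (order_id, customer_id, order_total)
    VALUES (1001, 42, 99.50);

    INSERT INTO outbox_events (event_type, payload)
    VALUES ('OrderPlaced', '{"orderId": 1001, "customerId": 42}');

    COMMIT;

    -- A relay polls for undispatched events and hands them to the
    -- message broker; consuming services react asynchronously.
    SELECT event_id, event_type, payload
    FROM   outbox_events
    WHERE  dispatched = 0
    ORDER  BY event_id;
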
So, we can see that there are similar patterns of data updates at work between organizations and between business services in a microservices architecture: asynchronous data reconciliation with eventual consistency, typically driven by specific data events. In both cases, we must design data workflows for this asynchronous, event-driven pattern, although the technologies we use for each can appear to be very different.

Another pattern that is shared between microservices and traditional RDBMS systems is the use of event streams for constructing a reliable view of the world. Every DBA is familiar with the transaction log used in relational databases and how it's fundamental to the consistency of the data store, allowing replication and failure recovery by replaying a serialized stream of events. We use the same premise as the basis for many microservices architectures, where the technique is known as "event sourcing". An event-sourced microservices system could arguably be seen as a kind of application-level transaction log, with various service updates triggered by events in the log. If event streams are a solid foundation for relational databases, why not also for business applications built on microservices?
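
A minimal sketch of that idea, with illustrative names: the event store is an append-only table, and an entity's current state is a replay of its events, just as a database recovers state by replaying its transaction log.

    -- Append-only event store: an application-level "transaction log".
    CREATE TABLE order_events (
        event_id    BIGINT IDENTITY(1,1) PRIMARY KEY,
        order_id    INT          NOT NULL,
        event_type  VARCHAR(50)  NOT NULL,  -- e.g., OrderPlaced, ItemAdded
        payload     VARCHAR(MAX) NOT NULL,  -- event data as JSON
        recorded_at DATETIME2    NOT NULL DEFAULT SYSUTCDATETIME()
    );

    -- Rebuild one order's view of the world by replaying its events
    -- in their serialized order.
    SELECT event_type, payload
    FROM   order_events
    WHERE  order_id = 1001
    ORDER  BY event_id;
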
Changes that make microservices easier
Application and network monitoring tools are significantly better in almost every respect compared to those of even a few years ago, particularly in terms of their programmability (configuration through scripts and APIs rather than arcane config files or GUIs). We have had to adopt a first-class approach to monitoring for modern web-connected systems because these systems are distributed, and will fail in ways we cannot predict.

Thankfully, the overhead of monitoring applications is proportionately smaller than in the past, with faster CPUs, networks, etc. combined with modern algorithms for dynamically reducing monitoring overhead. Enterprise-grade persistent storage (spinning disks, SSDs) is now also significantly cheaper per terabyte than 10 years ago. Yes, we have more data to store, but operating systems, typical runtimes, and support files have not grown so fast as to absorb the decrease in storage cost. Cheaper persistent storage means that denormalized data is less of a concern with respect to data sprawl, meaning that an increase in data caches or duplication in event stores or document stores is acceptable.

Monitoring tools for databases also have improved significantly in recent years. Runtime monitoring tools such as Quest Foglight, SolarWinds Database Performance Analyzer, and Red Gate SQL Monitor all help operations teams to keep tabs on database performance (even Oracle Enterprise Manager has some reasonable dashboarding features in versions 12 and 13). Tools such as DLM Dashboard from Red Gate simplify tracking changes to database schemas, reducing the burden on DBAs and allowing the automation of audit and change-request activities that can be time-consuming to do manually.

The increase in computing power that allows us to run much more sophisticated monitoring and metrics tooling against multiple discrete microservices can also be brought to bear on the challenge of monitoring the explosion of database instances and types which these microservices herald. Ultimately, this might result in what I call a "DBA as a service" model.

In one organization I worked for, we identified a large system component (part of the core transaction-processing flow) that was difficult to test and whose CI builds were often broken. Two different teams frequently committed changes to the Git repository, which led to a lack of ownership. We looked at the Git commits and realised that in effect the component was really formed of two separate chunks of functionality, and that one team tended to make changes to one chunk with the second team mostly changing the second chunk. We split the component in half, creating two separate Git repositories, allowing each team to focus on the functionality they cared about.

Rather than spending time deploying database schema changes manually to individual core relational databases, the next-generation DBA will likely be monitoring the production estate for new database instances, checking conformance to basic standards and performance, and triggering a workflow to automatically back up new databases. Data audit will be another important part of that DBA's role, making sure that there is traceability across the various data sources and technologies and flagging any gaps.

One of the advantages of microservices for the teams building and operating the software is that the microservices are easier to understand than the larger monolithic systems. Systems that are easier to comprehend and deploy (avoiding machine-level multitenancy) make teams much less fearful of change, and in turn allow them to feel more in control of the systems they deal with. For modern software systems whose behavior cannot easily be predicted, the removal of fear and increase in confidence are crucial psychological factors in the long-term success of such systems.

Microservices architectures aim for human-sized complexity, decoupling behavior from other systems by the use of asynchronous, lazy data loading, tuneable consistency, and queue-based event-sourced operations for background updates. The sense of ownership tends to keep the quality of code high, and avoids a large and unwieldy mass of tests accreting to the software.

Conway's law and the implications for CD
The way in which teams communicate and interact is the major determinant of the resulting software system architecture; this force (termed the "homomorphic force" by software guru Allan Kelly) is captured by Conway's law, published in 1968: organizations that design systems… are constrained to produce designs that are copies of the communication structures of these organizations.

Software architect Ruth Malan expanded on this in 2008: "if the architecture of the system and the architecture of the organization are at odds, the architecture of the organization wins."

In short, this means that we must design our organizations to match the software architecture that we know (or think) we need. If we value the microservices approach of multiple independent teams, each developing and operating their own set of discrete services, then we will need at least some database capability within each team. If we kept a centralized DBA team for these ideally independent teams, Conway's law tells us that we would end up with an architecture that reflects that communication pattern — probably a central database, shared by all services, something we wanted to avoid!

Figure 3 / Seen through the lens of Conway's Law, a workflow with central DBA and operations teams, rotated 90 degrees clockwise, reveals the inevitable application architecture — distributed front-end and application layers and a single, monolithic data layer.

Since realizing the significance of Conway's law several years ago, I have been increasingly interested in team topologies within organizations and how these help or hinder the effective development and operation of the software systems required by the organization. When talking to clients, I have often heard comments like this from overworked DBAs: "When we have multiple stories all working at the same time, it's hard to coordinate those changes properly."

After we start looking at software systems through the lens of Conway's law, we begin to see the team topologies reflected in the software, even if that results in an unplanned or undocumented software architecture. For organizations that have decided to adopt microservices in order to deploy features more rapidly and independently, there must be a change in the team structures if the new software architecture is to be effective.

The team building and operating the microservices must handle the vast majority of database changes, implying a database capability inside the team. That does not necessarily mean that each microservices team gets its own DBA, because many microservices will have relatively simple and small databases with flexible schemas and logic in the application to handle schema changes (or to support multiple schemas simultaneously). Perhaps one DBA can assist several different microservices product teams, although you'd need some care not to introduce coupling at the data level due to the shared DBA.

In practice, this might mean that every team has their own data expert or data owner, responsible for only their localized data experience and architecture (there are parallels between this and some of the potential DevOps team topologies I've discussed in the past). Other aspects of data management might be best handled by a separate team: the "DBA as a service" mentioned above. This overarching team would provide a service that included:

• database discovery (existence, characteristics, requirements),

• taking and testing backups of a range of supported databases,

• checking the database recovery modes,

• configuring the high-availability (HA) features of different databases based on tuneable configuration set by the product team, and

• auditing data from multiple databases for business-level consistency (such as postal formats).

Figure 4 / For a true microservices model, and the agility that implies, each service team needs to support its own data store, potentially with an overarching (or underlying) data team to provide a consistent environment and overview of the complete application.

Of course, each autonomous service team must also commit to not breaking functionality for other teams. In practice, this means that software written by one team will work with multiple versions of calling systems, either through implicit compatibility in the API ("being forgiving") or through use of some kind of explicit versioning scheme (preferably semantic versioning). When combined with a concertina approach — expanding the number of supported older clients, then contracting again as clients upgrade — this decouples each separate service from changes in related services, an essential aspect of enabling independent deployability.

A good way to think about a "DBA as a service" team is to imagine it as something you'd pay for from AWS. Amazon don't care about what's in your application but will coordinate your data/backups/resilience/performance. It's important to maintain only a minimal interaction surface between each microservice and the "DBA as a service" team (perhaps just an API) so that they're not pulled into the design and maintenance of individual services.

Figure 5 / One potential topology for a microservices/"DBA as a service" team structure.

There is no definitively correct team topology for database management within a microservices approach, but a traditional centralized DBA function is likely to be too constraining if the DBA team must approve all database changes. Each team topology has implications for how the database can change and evolve. Choose a model based on an understanding of its strengths and limitations, and keep the boundaries clear; adhere to the social contracts and seek collaboration at the right places.

Supporting practices for database CD
In addition to changes in technologies and teams, there are some essential good practices for effective CD with databases using microservices. Foremost is the need for a focus on the flow of business requirements with clear cross-program prioritization of multiple workstreams, so that teams are not swamped with conflicting requests from different budget holders.

The second core practice is to use version control for all database changes: schema, users, triggers, stored procedures, reference data, and so on. The only database changes that should be made in production are CRUD operations resulting from inbound or scheduled data updates; all other changes, including updates to indexes and performance tuning, should flow down from upstream environments, having been tested before being deployed (using automation) into production.

Practices such as consumer-driven contracts and concertina versioning for databases help to clarify the relationship between different teams and to decouple database changes from application changes, both of which can help an organization to gradually move towards a microservices architecture from one based on an older monolithic design.

It is also important to avoid technologies and tools that exist only in production (I call these "singleton tools"). Special production-only tools prevent product teams from owning their service, creating silos between development and the DBA function. Similarly, having database objects that exist only in production (such as triggers, agent jobs, backup routines, procedures, indexes, etc.) can prevent a product team from taking ownership of the services they are building.

Summary
The microservices approach to building software systems is partly a response to the need for more rapid deployment of discrete software functionality. Microservices typically use a data model that relies on asynchronous data updates, often with eventual consistency rather than immediate consistency, and often using an event-driven pub-sub scheme. A single team owns the group of services they develop and operate, and data is generally either kept local to a service (perhaps exposed via an API) or published to the event store, decoupling the service from other services.

Although at first the model looks very different from traditional RDBMS-based applications of the 1990s and 2000s, in fact the premises are familiar but simply applied at a different scale. Just as most organizations kept database transactions within their organization, using asynchronous data reconciliation and eventual consistency, so most microservices aim to keep database transactions within their service, relying on event-based messaging to coordinate workflows across multiple services. The event stream is then fundamental to both relational databases and to many microservices systems.

To support the microservices approach, we need new team structures. Accepting the implications of Conway's law, we should set up our teams so that the communication structures map onto the kind of software architecture we know we need: in this case, independent teams that consume certain database management activities as a service. The DBA role probably changes quite radically in terms of its responsibilities and involvement in product evolution, from being part of a central DBA function that reviews every change to either being part of the product team on a daily basis, or to a "DBA as a service" behind the scenes, essential to the cultivation and orchestration of a coherent data environment for the business.
