Professional Documents
Culture Documents
Mostafa Elhemali, Niall Gallagher, Nicholas Gordon, Joseph Idziorek, Richard Krog
Colin Lazier, Erben Mo, Akhilesh Mritunjai, Somu Perianayagam ,Tim Rath
Swami Sivasubramanian, James Christopher Sorenson III, Sroaj Sosothikul, Doug Terry, Akshat Vig
dynamodb-paper@amazon.com
Amazon Web Services
Abstract on its ability to serve requests with consistent low latency. For
DynamoDB customers, consistent performance at any scale
Amazon DynamoDB is a NoSQL cloud database service that
is often more important than median request service times be-
provides consistent performance at any scale. Hundreds of
cause unexpectedly high latency requests can amplify through
thousands of customers rely on DynamoDB for its fundamen-
higher layers of applications that depend on DynamoDB and
tal properties: consistent performance, availability, durability,
lead to a bad customer experience. The goal of the design of
and a fully managed serverless experience. In 2021, during the
DynamoDB is to complete all requests with low single-digit
66-hour Amazon Prime Day shopping event, Amazon systems
millisecond latencies. In addition, the large and diverse set
- including Alexa, the Amazon.com sites, and Amazon fulfill-
of customers who use DynamoDB rely on an ever-expanding
ment centers, made trillions of API calls to DynamoDB, peak-
feature set as shown in Figure 1. As DynamoDB has evolved
ing at 89.2 million requests per second, while experiencing
over the last ten years, a key challenge has been adding fea-
high availability with single-digit millisecond performance.
tures without impacting operational requirements. To benefit
Since the launch of DynamoDB in 2012, its design and im-
customers and application developers, DynamoDB uniquely
plementation have evolved in response to our experiences
integrates the following six fundamental system properties:
operating it. The system has successfully dealt with issues
DynamoDB is a fully managed cloud service. Using the
related to fairness, traffic imbalance across partitions, moni-
DynamoDB API, applications create tables and read and write
toring, and automated system operations without impacting
data without regard for where those tables are stored or how
availability or performance. Reliability is essential, as even
they’re managed. DynamoDB frees developers from the bur-
the slightest disruption can significantly impact customers.
den of patching software, managing hardware, configuring a
This paper presents our experience operating DynamoDB at
distributed database cluster, and managing ongoing cluster
a massive scale and how the architecture continues to evolve
operations. DynamoDB handles resource provisioning, au-
to meet the ever-increasing demands of customer workloads.
tomatically recovers from failures, encrypts data, manages
software upgrades, performs backups, and accomplishes other
1 Introduction tasks required of a fully-managed service.
DynamoDB employs a multi-tenant architecture. Dy-
Amazon DynamoDB is a NoSQL cloud database service that namoDB stores data from different customers on the same
supports fast and predictable performance at any scale. Dy- physical machines to ensure high utilization of resources, en-
namoDB is a foundational AWS service that serves hundreds abling us to pass the cost savings to our customers. Resource
of thousands of customers using a massive number of servers reservations, tight provisioning, and monitored usage provide
located in data centers around the world. DynamoDB powers isolation between the workloads of co-resident tables.
multiple high-traffic Amazon properties and systems includ- DynamoDB achieves boundless scale for tables. There
ing Alexa, the Amazon.com sites, and all Amazon fulfillment are no predefined limits for the amount of data each table
centers. Moreover, many AWS services such as AWS Lambda, can store. Tables grow elastically to meet the demand of the
AWS Lake Formation, and Amazon SageMaker are built on customers’ applications. DynamoDB is designed to scale the
DynamoDB, as well as hundreds of thousands of customer resources dedicated to a table from several servers to many
applications. thousands as needed. DynamoDB spreads an application’s
These applications and services have demanding opera- data across more servers as the amount of data storage and
tional requirements with respect to performance, reliability, the demand for throughput requirements grow.
durability, efficiency, and scale. The users of DynamoDB rely DynamoDB provides predictable performance. The simple
Figure 1: DynamoDB timeline
DynamoDB API with GetItem and PutItem operations allows • Performing continuous verification of data-at-rest is a
it to respond to requests with consistent low latency. An ap- reliable way to protect against both hardware failures
plication running in the same AWS Region as its data will and software bugs in order to meet high durability goals.
typically see average service-side latencies in the low single-
digit millisecond range for a 1 KB item. Most importantly, • Maintaining high availability as a system evolves re-
DynamoDB latencies are predictable. Even as tables grow quires careful operational discipline and tooling. Mech-
from a few megabytes to hundreds of terabytes, latencies re- anisms such as formal proofs of complex algorithms,
main stable due to the distributed nature of data placement game days (chaos and load tests), upgrade/downgrade
and request routing algorithms in DynamoDB. DynamoDB tests, and deployment safety provides the freedom to
handles any level of traffic through horizontal scaling and safely adjust and experiment with the code without the
automatically partitions and re-partitions data to meet an ap- fear of compromising correctness.
plication’s I/O performance requirements. • Designing systems for predictability over absolute ef-
DynamoDB is highly available. DynamoDB replicates data ficiency improves system stability. While components
across multiple data centers—called Availability Zones in such as caches can improve performance, do not allow
AWS—and automatically re-replicates in the case of disk them to hide the work that would be performed in their
or node failures to meet stringent availability and durability absence, ensuring that the system is always provisioned
requirements. Customers can also create global tables that are to handle the unexpected.
geo-replicated across selected Regions for disaster recovery
and provide low latency access from anywhere. DynamoDB The structure of this paper is as follows: Section 2 expands
offers an availability SLA of 99.99 for regular tables and on the history of DynamoDB and explains its origins, which
99.999 for global tables (where DynamoDB replicates across derive from the original Dynamo system. Section 3 presents
tables across multiple AWS Regions). the architectural overview of DynamoDB. Section 4 covers
DynamoDB supports flexible use cases. DynamoDB the journey of DynamoDB from provisioned to on-demand
doesn’t force developers into a particular data model or con- tables. Section 5 covers how DynamoDB ensures strong dura-
sistency model. DynamoDB tables don’t have a fixed schema bility. Section 6 describes the availability challenges faced
but instead allow each data item to contain any number of at- and how these challenges were handled. Section 7 provides
tributes with varying types, including multi-valued attributes. some experimental results based on the Yahoo! Cloud Serving
Tables use a key-value or document data model. Developers Benchmark (YCSB) benchmarks, and Section 8 concludes
can request strong or eventual consistency when reading items the paper.
from a table.
In this paper, we describe how DynamoDB evolved as a
2 History
distributed database service to meet the needs of its customers
without losing its key aspect of providing a single-tenant ex- The design of DynamoDB was motivated by our experiences
perience to every customer using a multi-tenant architecture. with its predecessor Dynamo [9], which was the first NoSQL
The paper explains the challenges faced by the system and database system developed at Amazon. Dynamo was created
how the service evolved to handle those challenges while con- in response to the need for a highly scalable, available, and
necting the required changes to a common theme of durability, durable key-value database for shopping cart data. In the early
availability, scalability, and predictable performance. years, Amazon learned that providing applications with direct
The paper captures the following lessons that we have access to traditional enterprise database instances led to scal-
learnt over the years ing bottlenecks such as connection management, interference
• Adapting to customers’ traffic patterns to reshape the between concurrent workloads, and operational problems with
physical partitioning scheme of the database tables im- tasks such as schema upgrades. Thus, a service-oriented ar-
proves customer experience. chitecture was adopted to encapsulate an application’s data
behind service-level APIs that allowed sufficient decoupling Operation Description
to address tasks like reconfiguration without having to disrupt Inserts a new item, or replaces an old item
PutItem
clients. with a new item.
High availability is a critical property of a database ser- Updates an existing item, or adds a new
UpdateItem
vice as any downtime can impact customers that depend on item to the table if it doesn’t already exist.
the data. Another critical requirement for Dynamo was pre- The DeleteItem operation deletes a single
DeleteItem
dictable performance so that applications could provide a item from the table by the primary key.
consistent experience to their users. To achieve these goals, The GetItem operation returns a set of at-
Amazon had to start from first principles when designing GetItem tributes for the item with the given primary
Dynamo. The adoption of Dynamo widened to serve several key.
use cases within Amazon because it was the only database
service that provided high reliability at scale. However, Dy- Table 1: DynamoDB CRUD APIs for items
namo still carried the operational complexity of self-managed
large database systems. Dynamo was a single-tenant system
that shared most of the name of the previous Dynamo system
and teams were responsible for managing their own Dynamo
but little of its architecture. Amazon DynamoDB was the
installations. Teams had to become experts on various parts of
result of everything we’d learned from building large-scale,
the database service and the resulting operational complexity
non-relational databases for Amazon.com and has evolved
became a barrier to adoption.
based on our experiences building highly scalable and reliable
During this period, Amazon launched new services (no-
cloud computing services at AWS.
tably Amazon S3 and Amazon SimpleDB) that focused on a
managed and elastic experience in order to remove this op-
erational burden. Amazon engineers preferred to use these 3 Architecture
services instead of managing their own systems like Dynamo,
even though the functionality of Dynamo was often better A DynamoDB table is a collection of items, and each item is
aligned with their applications’ needs. Managed elastic ser- a collection of attributes. Each item is uniquely identified by
vices freed developers from administrating databases and a primary key. The schema of the primary key is specified at
allowed them to focus on their applications. the table creation time. The primary key schema contains a
The first database-as-a-service from Amazon was Sim- partition key or a partition and sort key (a composite primary
pleDB [1], a fully managed elastic NoSQL database service. key). The partition key’s value is always used as an input to
SimpleDB provided multi-datacenter replication, high avail- an internal hash function. The output from the hash function
ability, and high durability without the need for customers to and the sort key value (if present) determines where the item
set up, configure, or patch their database. Like Dynamo, Sim- will be stored. Multiple items can have the same partition key
pleDB also had a very simple table interface with a restricted value in a table with a composite primary key. However, those
query set that served as a building block for many developers. items must have different sort key values.
While SimpleDB was successful and powered many applica- DynamoDB also supports secondary indexes to provide
tions, it had some limitations. One limitation was that tables enhanced querying capability. A table can have one or more
had a small capacity in terms of storage (10GB) and of request secondary indexes. A secondary index allows querying the
throughput. Another limitation was the unpredictable query data in the table using an alternate key, in addition to queries
and write latencies, which stemmed from the fact that all table against the primary key. DynamoDB provides a simple in-
attributes were indexed, and the index needed to be updated terface to store or retrieve items from a table or an index.
with every write. These limitations created a new kind of Table 1 contains the primary operations available to clients
operational burden for developers. They had to divide data for reading and writing items in DynamoDB tables. Any oper-
between multiple tables to meet their application’s storage ation that inserts, updates, or deletes an item can be specified
and throughput requirements. with a condition that must be satisfied in order for the op-
We realized that the goal of removing the limitations of Sim- eration to succeed. DynamoDB supports ACID transactions
pleDB and providing a scalable NoSQL database service with enabling applications to update multiple items while ensuring
predictable performance could not be met with the SimpleDB atomicity, consistency, isolation, and durability (ACID) across
APIs. We concluded that a better solution would combine the items without compromising the scalability, availability, and
best parts of the original Dynamo design (incremental scala- performance characteristics of DynamoDB tables.
bility and predictable high performance) with the best parts of A DynamoDB table is divided into multiple partitions to
SimpleDB (ease of administration of a cloud service, consis- handle the throughput and storage requirements of the table.
tency, and a table-based data model that is richer than a pure Each partition of the table hosts a disjoint and contiguous part
key-value store). These architectural discussions culminated of the table’s key-range. Each partition has multiple replicas
in Amazon DynamoDB, a public service launched in 2012 distributed across different Availability Zones for high avail-
Figure 2: Storage replica on a storage node
7 Micro benchmarks
8 Conclusion
To show that scale doesn’t affect the latencies observed by
applications, we ran YCSB [8] workloads of types A (50 DynamoDB has pioneered the space of cloud-native NoSQL
percent reads and 50 percent updates) and B (95 percent reads databases. It is a critical component of thousands of applica-
tions used daily for shopping, food, transportation, banking, [7] C. Constantinescu, I. Parulkar, R. Harper, and S. Micha-
entertainment, and so much more. Developers rely on its lak. Silent data corruption—myth or reality? In 2008
ability to scale data workloads while providing steady perfor- IEEE International Conference on Dependable Systems
mance, high availability, and low operational complexity. For and Networks With FTCS and DCC (DSN), pages 108–
more than 10 years, DynamoDB has maintained these key 109. IEEE, 2008.
properties and extended its appeal to application developers
with game-changing features such as on-demand capacity, [8] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan,
point-in-time backup and restore, multi-Region replication, and R. Sears. Benchmarking cloud serving systems
and atomic transactions. with ycsb. In Proceedings of the 1st ACM symposium
on Cloud computing, pages 143–154, 2010.