© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
• What is Amazon Aurora?
• Quick recap: Database internals & motivation for building Aurora
• Cloud-native database architecture
• Durability at scale
• Performance results
• Features & demos
  • Global databases
  • Fast database cloning
  • Database backtrack
What is Amazon Aurora?
Enterprise-class, cloud-native database
Quick recap: Write-ahead logging
Log records with before and after images are stored in a write-ahead log (WAL)
• DO: apply a change, taking a page from its old version through intermediate versions to a new version, and emit a log record
• REDO: reapply a log record to roll a page forward
• UNDO: use the before image in a log record to roll a page back
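The DO/REDO/UNDO recap above can be sketched in a few lines. This is a minimal illustration, not Aurora code: log records carrying before and after images are appended to the WAL ahead of the page write, which is what makes roll-forward and roll-back possible.

```python
# A minimal sketch, not Aurora code: log records carrying before and after
# images support DO (first write), REDO (roll forward), and UNDO (roll back).

class LogRecord:
    def __init__(self, page_id, before, after):
        self.page_id = page_id
        self.before = before   # before image of the page
        self.after = after     # after image of the page

pages = {"p1": "old"}
wal = []   # write-ahead log

def do_update(page_id, new_value):
    """DO: append the log record to the WAL before updating the page."""
    rec = LogRecord(page_id, pages[page_id], new_value)
    wal.append(rec)            # persisted ahead of the page write
    pages[page_id] = new_value
    return rec

def redo(rec):
    """REDO: reapply the after image, e.g. during crash recovery."""
    pages[rec.page_id] = rec.after

def undo(rec):
    """UNDO: restore the before image, e.g. when rolling back a transaction."""
    pages[rec.page_id] = rec.before

rec = do_update("p1", "new")
undo(rec)   # pages["p1"] back to "old"
redo(rec)   # pages["p1"] forward to "new"
```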
Quick recap: Crash recovery
[Diagram: pages on durable storage and log records on durable storage; after a crash, system recovery replays log records written since the last checkpoint (transactions Tx1, Tx2, Tx3) to rebuild page state]
Quick recap: I/Os required for persistence
[Diagram: pages and log records on durable attached storage]
• Checkpoint write: page sized, e.g. 16KB
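Some illustrative arithmetic shows why checkpointing dominates the I/O bill: a checkpoint rewrites whole pages even when only a few bytes on each changed. The log-record and dirty-page counts below are assumptions for illustration; only the 16 KB page size comes from the slide.

```python
# Illustrative arithmetic (assumed sizes, except the 16 KB page from the
# slide): checkpointing rewrites whole pages, while the log only writes
# small records describing each change.

PAGE_SIZE = 16 * 1024    # checkpoint writes are page sized, e.g. 16 KB
LOG_RECORD_SIZE = 100    # assumed size of one small log record

dirty_pages = 1_000                            # assumed pages touched since last checkpoint
checkpoint_bytes = dirty_pages * PAGE_SIZE     # whole pages rewritten
log_bytes = dirty_pages * LOG_RECORD_SIZE      # one small record per change

print(checkpoint_bytes // log_bytes)   # 163: >150x more bytes than logging alone
```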
Aurora approach: Log is the database
Aurora approach: compute & storage separation
Compute & storage have different lifetimes
• Compute instances: SQL, transactions, caching
• Storage: logging + storage
We built a log-structured distributed storage system that is multi-tenant, multi-attach, and purpose-built for databases
I/O flow in Amazon Aurora storage node
① Receive log records, add them to the in-memory incoming queue, and durably persist them to the hot log
② ACK to the database instance
③ Organize records (sort, group) and identify gaps in the log
④ Gossip with peer storage nodes to fill in holes
⑤ Coalesce log records into new data page versions
⑥ Periodically stage log and new page versions to S3 for continuous backup
⑦ Periodically garbage collect old versions
⑧ Periodically scrub: validate CRC codes on blocks
Note:
• All steps are asynchronous
• Only steps 1 and 2 are in the foreground latency path
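The flow above can be sketched as a tiny pipeline. This is a minimal illustration and not Aurora's implementation: persisting the record and ACKing (steps 1–2) are the only foreground work, while sorting, gap detection, gossip, and coalescing (steps 3–5) run in a background pass; staging to S3, GC, and CRC scrubbing (steps 6–8) are omitted.

```python
# A minimal sketch (not Aurora's implementation) of the storage-node I/O
# flow: steps 1-2 are foreground; steps 3-5 run asynchronously.

from collections import deque

incoming = deque()   # step 1: in-memory incoming queue
hot_log = []         # durable hot log (modeled as a plain list)
pages = {}           # materialized data pages

def receive(record):
    """Steps 1-2: queue and durably persist the record, then ACK."""
    incoming.append(record)
    hot_log.append(record)
    return "ACK"   # step 2: acknowledge to the database instance

def background_pass(peers):
    """Steps 3-5 (steps 6-8 -- S3 staging, GC, CRC scrubbing -- omitted)."""
    # Step 3: sort records by LSN and identify gaps in the log.
    records = sorted(incoming, key=lambda r: r["lsn"])
    seen = {r["lsn"] for r in records}
    gaps = set(range(min(seen), max(seen) + 1)) - seen if seen else set()
    # Step 4: gossip with peer storage nodes to fill in holes.
    for peer in peers:
        records += [r for r in peer if r["lsn"] in gaps]
    # Step 5: coalesce log records into new page versions, in LSN order.
    for r in sorted(records, key=lambda r: r["lsn"]):
        pages[r["page"]] = r["value"]
    incoming.clear()

receive({"lsn": 1, "page": "p1", "value": "a"})
receive({"lsn": 3, "page": "p1", "value": "c"})
peer_log = [{"lsn": 2, "page": "p1", "value": "b"}]   # a peer has the missing record
background_pass([peer_log])
# pages["p1"] == "c": all three records applied in LSN order
```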
Uncorrelated and independent failures
What about AZ + 1 failures? With 3 copies on storage nodes with SSDs:
• Losing an AZ loses 2/3 copies and loses quorum
• Losing an AZ plus one more node loses data
Aurora tolerates AZ + 1 failures
• Replicate 6 ways with 2 copies per AZ
• Write quorum of 4/6
What if there is an AZ + 1 failure?
• Still have 3 copies on storage nodes with SSDs: no data loss
• Rebuild the failed copy by copying from the 3 surviving copies
• Recover write availability
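The quorum arithmetic behind the slide fits in a few lines. This is a back-of-the-envelope sketch, not AWS code: 6 copies spread 2 per AZ, with the 4/6 write quorum stated above.

```python
# Back-of-the-envelope quorum arithmetic (sketch, not AWS code):
# 6 copies, 2 per AZ, and a 4/6 write quorum.

COPIES_PER_AZ = 2
NUM_AZS = 3
TOTAL_COPIES = COPIES_PER_AZ * NUM_AZS   # 6-way replication
WRITE_QUORUM = 4                         # writes need 4 of 6 copies

def surviving_copies(azs_lost, extra_nodes_lost):
    """Copies left after losing whole AZs plus additional single nodes."""
    return TOTAL_COPIES - azs_lost * COPIES_PER_AZ - extra_nodes_lost

# AZ + 1 failure: one entire availability zone plus one more storage node.
left = surviving_copies(azs_lost=1, extra_nodes_lost=1)
print(left)                  # 3 copies survive, so no data is lost
print(left >= WRITE_QUORUM)  # False: rebuild a copy to recover write availability
```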
Aurora uses segmented storage
➢ Partition volume into n fixed-size segments
• Replicate each segment 6 ways into a protection group (PG)
[Diagram: protection group members A B C D E F; Epoch 3: Node F is confirmed unhealthy; new quorum group with node G (A B C D E G) is active]
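The epoch transition in the diagram can be sketched as follows. This is an illustrative model, not Aurora's actual membership protocol: replacing an unhealthy node activates a new quorum group under a higher epoch, so requests tagged with a stale epoch can be fenced out.

```python
# A sketch of epoch-based membership for a protection group (illustrative,
# not Aurora's actual protocol): replacing an unhealthy node activates a
# new quorum group under a higher epoch.

class ProtectionGroup:
    def __init__(self, members, epoch=1):
        self.members = list(members)   # the 6 replicas of one segment
        self.epoch = epoch

    def replace(self, unhealthy, replacement):
        """Confirm a node unhealthy and activate the new quorum group."""
        self.members[self.members.index(unhealthy)] = replacement
        self.epoch += 1   # requests tagged with an older epoch are rejected

pg = ProtectionGroup(["A", "B", "C", "D", "E", "F"], epoch=2)
pg.replace("F", "G")   # epoch 3: node G is active in place of F
# pg.members == ["A", "B", "C", "D", "E", "G"], pg.epoch == 3
```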
Aurora I/O profile
• MySQL with replica: the primary instance in AZ 1 writes to an Amazon EBS volume and its EBS mirror, asynchronously replicates to a replica instance in AZ 2 with its own EBS volume and mirror, and backs up continuously to Amazon S3
• Aurora: the primary instance in AZ 1 and replica instances in AZ 2 and AZ 3 sit above the storage layer; writes are acknowledged on a 4/6 quorum, with continuous backup to Amazon S3
[Charts: "MySQL I/O profile for 30-min Sysbench run" vs. "Aurora I/O profile for 30-min Sysbench run"]
Global databases
[Diagram: primary-region storage fleet and replication server; secondary-region replication agent and storage fleet; continuous backup in both regions]
① The primary instance sends log records in parallel to storage nodes, replica instances, and the replication server
② The replication server streams log records to the replication agent in the secondary region
③ The replication agent sends log records in parallel to storage nodes and replica instances
④ The replication server pulls log records from storage nodes to catch up after outages
• High throughput: up to 150K writes/second with negligible performance impact
• Low replica lag: <1 second cross-region replica lag under heavy load
• Fast recovery: <1 minute to accept full read-write workloads after region failure
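Step 1's parallel fan-out is worth a small sketch: sending the same log stream to every target concurrently means total latency tracks the slowest single send, not the sum of sends. The helper names below (`send`, `fan_out`) are hypothetical, not part of any AWS API.

```python
# A sketch with hypothetical helper names (not an AWS API): the primary
# ships the same log records to all targets in parallel.

from concurrent.futures import ThreadPoolExecutor

def send(target, records):
    # Placeholder for a network send; returns what was shipped where.
    return (target, len(records))

def fan_out(records, targets):
    # Ship the same log stream to every target concurrently.
    with ThreadPoolExecutor(max_workers=len(targets)) as ex:
        return list(ex.map(lambda t: send(t, records), targets))

targets = ["storage-nodes", "replica-instances", "replication-server"]
results = fan_out(["lsn-1", "lsn-2"], targets)
# results preserves target order: each target received both records
```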
Global replication performance
Logical vs. physical replication
[Charts: QPS vs. replication lag; logical replication (lag axis 0–600 seconds) vs. physical replication (lag axis 0–5 seconds)]
Demo: cross-region replication lag of ~110ms (2019-11-28 07:48:15.614.124 → 2019-11-28 07:48:15.725.479)
Fast database cloning
Create a copy of a database without duplicate storage costs
• Creation of a clone is instantaneous because it doesn't require a deep copy
• Data copy happens only on write, when original and cloned volume data differ
Typical use cases
• Clone a production database to run tests (dev/test applications, benchmarks)
• Reorganize a database
[Diagram: production applications and clones attached to the same production database; the production volume holds pages 1–5, and the cloned volume shares them, adding page 6 on write]
Demo: database state (cloned)
• ~40M rows
• ~25GB space used
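The copy-on-write behavior described above can be sketched in a few lines. This is an illustrative model, not Aurora internals: a clone shares the parent volume's pages, so creating it is O(1), and new storage is consumed only for pages that diverge.

```python
# A minimal copy-on-write sketch (illustrative, not Aurora internals): the
# clone shares the parent volume's pages until one side writes.

class Volume:
    def __init__(self, pages=None, base=None):
        self.pages = pages if pages is not None else {}   # locally owned pages
        self.base = base                                  # shared parent volume

    def read(self, page_id):
        if page_id in self.pages:
            return self.pages[page_id]
        return self.base.read(page_id) if self.base else None

    def write(self, page_id, value):
        # Copy-on-write: the page diverges from the parent only now.
        self.pages[page_id] = value

    def clone(self):
        return Volume(base=self)   # O(1): no deep copy of the pages

prod = Volume({i: f"page-{i}" for i in range(1, 6)})   # pages 1-5, as in the diagram
clone = prod.clone()                                   # instantaneous
clone.write(6, "page-6")                               # only the clone gets page 6
# clone.read(3) == "page-3" (still shared); prod.read(6) is None (unaffected)
```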
Database backtrack
[Timeline: database state at t0, t1, t2, t3, t4]
Backtrack is a quick way to bring the database to a particular point in time without having to restore from backups
• Rewind the database to quickly recover from unintentional DML/DDL operations
• Rewind multiple times to determine the desired point in time in the database state; for example, quickly iterate over schema changes without having to restore multiple times
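The rewind idea can be sketched with timestamped page versions. This is a hypothetical illustration, not the Aurora implementation: keeping prior versions around lets the volume jump back to an earlier time in place, instead of restoring from a backup.

```python
# A sketch (hypothetical, not the Aurora implementation) of backtrack:
# retained page versions let the volume rewind in place, no restore needed.

class BacktrackVolume:
    def __init__(self):
        self.history = {}   # page_id -> list of (timestamp, value), in time order

    def write(self, ts, page_id, value):
        self.history.setdefault(page_id, []).append((ts, value))

    def read_as_of(self, ts, page_id):
        """Latest version written at or before ts."""
        latest = None
        for t, v in self.history.get(page_id, []):
            if t <= ts:
                latest = v
        return latest

    def backtrack(self, ts):
        """Rewind in place: drop every version newer than ts."""
        for page_id in self.history:
            self.history[page_id] = [(t, v) for t, v in self.history[page_id] if t <= ts]

vol = BacktrackVolume()
vol.write(1, "t1", "5 rows")
vol.write(2, "t1", "7 rows, extra column")   # the unintended DML/DDL change
vol.backtrack(1)                             # rewind past the mistake
# vol.read_as_of(2, "t1") == "5 rows": the change is gone
```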
Demo: backtrack
Database state (@07:05:07)
• Added a column
• Added two rows
Oops! Let's backtrack
Backtrack target: 07:06:00
Database state (after backtrack)