Jay Runkel
jay.runkel@mongodb.com
@jayrunkel
Agenda
• Customer Stories
• Sharding for Performance/Scale
– When to shard?
– How many shards do I need?
• Types of Sharding
• How to Pick a Shard Key
• Sharding for Other Reasons
Customer Stories
Foursquare
• 50M users.
• 6B check-ins to date (6M per day growth).
• 55M points of interest / venues.
• 1.7M merchants using the platform for marketing
• 11 MongoDB clusters
– 8 are sharded
CarFax
CarFax Shards
• 13 billion+ documents
– 1.5 billion documents added every year
• 1 vehicle history report is > 200 documents
• 12 Shards
• 9-node replica sets
• Replicas distributed across 3 data centers
What is Sharding?
Sharding Overview
[Diagram: Application and Driver send queries to Query Routers, which route to mongod shards by key range (0..100); read/write scalability]
Scaling: Sharding
[Diagram: additional mongod shards added to the cluster; read/write scalability]
How do I know I need to shard?
Does one server/replica…
• Handle my query throughput (operations per second)?
– Disk IOPS, RAM, Network
• Respond to queries fast enough (latency)?
– Disk IOPS, RAM, Network
How many shards do I need?
Disk Space: How Many Shards Do I Need?
• Sum of disk space across shards must be greater than the required storage size
Example
Storage size = 3 TB
Server disk capacity = 2 TB
2 Shards Required
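The arithmetic behind this example can be sketched in a few lines of Python. The function name `shards_needed` is an illustrative assumption, not part of any MongoDB tooling, and the sketch treats an exact fit as sufficient; a real deployment would normally leave headroom for growth.

```python
def shards_needed(required: int, per_server: int) -> int:
    """Smallest shard count whose combined capacity covers the requirement.

    Hypothetical helper: both arguments must use the same unit (TB here).
    """
    if required < 0 or per_server <= 0:
        raise ValueError("required must be >= 0 and per_server must be > 0")
    # Integer ceiling division; a cluster always has at least one shard.
    return max(1, -(-required // per_server))


# Slide example: 3 TB of data on servers with 2 TB of disk each.
print(shards_needed(3, 2))
```

The same division applies to any resource that adds up across shards, which is why the RAM and IOPS slides follow the same pattern.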
RAM: How Many Shards Do I Need?
• Working set in RAM
– Lower latency
– Higher throughput
Example
Working set / RAM per server = 428/128 = 3.34
4 Shards Required
Disk Throughput: How Many Shards Do I Need?
• Sum of IOPS across shards must be greater than the required IOPS
• IOPS are difficult to estimate
– Update doc
– Update indexes
– Append to journal
– Log entry?
Example
Required IOPS = 11000
Server disk IOPS = 5000
3 Shards Required
Types of Sharding
Sharding Types
• Range
• Tag-Aware
• Hashed
Range Sharding
Read/Write Scalability
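As a rough illustration of range sharding, the Python sketch below routes keys to shards by contiguous key range. The split points, the shard names, and both function names are hypothetical, and the 0..100 key space is borrowed from the overview diagram; this is not how mongos is implemented, only the idea of range-based routing.

```python
import bisect

# Hypothetical chunk boundaries over a key space of 0..100.
# Chunks: [0, 25), [25, 50), [50, 75), [75, 100]
SPLIT_POINTS = [25, 50, 75]
CHUNK_OWNERS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for_key(key: int) -> str:
    """Return the shard owning the chunk that contains `key`."""
    return CHUNK_OWNERS[bisect.bisect_right(SPLIT_POINTS, key)]


def shards_for_range(lo: int, hi: int) -> list:
    """Return the shards a query for keys in [lo, hi] must visit."""
    first = bisect.bisect_right(SPLIT_POINTS, lo)
    last = bisect.bisect_right(SPLIT_POINTS, hi)
    return CHUNK_OWNERS[first:last + 1]


print(shard_for_key(10))
print(shards_for_range(30, 40))
print(shards_for_range(20, 60))
```

Because neighbouring keys live in the same chunk, a narrow range query touches few shards, while a wide one fans out to more.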
Tag-Aware Sharding
Tag Ranges
Shard Tag   Start    End
Winter      23 Dec   21 Mar
Spring      22 Mar   21 Jun
Summer      21 Jun   23 Sep
Fall        24 Sep   22 Dec
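To make the tag ranges concrete, here is a minimal Python sketch that maps a date to its season tag. It is an assumption-laden simplification: it uses only the Start column and treats each range as running until the next range starts, which sidesteps the fact that 21 Jun appears as both an end and a start date. The names `STARTS`, `TAGS`, and `tag_for` are hypothetical.

```python
import bisect
from datetime import date

# (month, day) at which each tag range starts, in calendar order.
STARTS = [(3, 22), (6, 21), (9, 24), (12, 23)]
TAGS = ["Spring", "Summer", "Fall", "Winter"]


def tag_for(d: date) -> str:
    """Return the season tag whose range contains the date `d`."""
    idx = bisect.bisect_right(STARTS, (d.month, d.day)) - 1
    # idx == -1 for dates before 22 Mar; TAGS[-1] is "Winter", so the
    # winter range wraps around the turn of the year as the table implies.
    return TAGS[idx]


print(tag_for(date(2015, 1, 15)))
print(tag_for(date(2015, 7, 4)))
```

In a tag-aware cluster, each tag would then be associated with one or more shards, so documents in a given date range land on the shards carrying that tag.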
Hash-Sharding
Hashed shard key
• Pros:
– Evenly distributed writes
• Cons:
– Random data (and index) updates can be I/O intensive
– Range-based queries turn into scatter-gather
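Both the pro and the second con can be seen in a small simulation. The Python sketch below is a stand-in, not MongoDB's implementation: it hashes keys with MD5 and picks a shard by modulo, whereas a real cluster assigns ranges of hash values to chunks. `NUM_SHARDS` and `hashed_shard` are hypothetical names.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def hashed_shard(key: int) -> int:
    """Pick a shard from the hash of the key rather than the key itself."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Pro: monotonically increasing keys still spread across all shards.
writes = Counter(hashed_shard(k) for k in range(10_000))
print(sorted(writes.items()))

# Con: a query for 20 adjacent keys has to visit several shards,
# because adjacent keys no longer live next to each other.
touched = {hashed_shard(k) for k in range(100, 120)}
print(sorted(touched))
```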
Hashed sharding document distribution
How Do I Pick a Shard Key?
Shard Key characteristics
Low cardinality shard key
mongos
[ a, b )
Ascending shard key
mongos
[ ISODate(…), $maxKe
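The hot spot created by an ascending shard key can be sketched as follows. The chunk boundaries are hypothetical integers standing in for timestamps, and `chunk_index` is an illustrative helper; the point is only that every new key is larger than every existing boundary.

```python
import bisect
from collections import Counter

# Hypothetical boundaries of existing chunks; the last chunk is open-ended,
# covering everything from 3000 up to the maximum key.
SPLIT_POINTS = [1000, 2000, 3000]


def chunk_index(key: int) -> int:
    """Index of the range chunk that contains `key`."""
    return bisect.bisect_right(SPLIT_POINTS, key)


# New documents carry ever-increasing keys, as timestamps would.
new_keys = range(3000, 3500)
hits = Counter(chunk_index(k) for k in new_keys)
print(hits)
```

All 500 inserts land in the last chunk, so a single shard absorbs the entire write load until chunks are split and migrated.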
Reasons to Shard
Reasons to shard
• Scale
– Data volume
– Query volume
• Tiered Storage
[Diagram: three replica sets, each with its primary in a different data center]
Primary: LON; Secondaries: NYC, SYD
Primary: NYC; Secondaries: LON, SYD
Primary: SYD; Secondaries: LON, NYC
Tiered Storage
• Save hardware costs
• Put frequently accessed documents on fast servers
– Infrequently accessed documents on less capable servers
• Use tag-aware sharding
Fast Restore
• 40 TB Database
• 2 shards of 20 TB each
• Challenge
– Cannot meet restore SLA after data loss
Fast Restore
• 40 TB Database
• 4 shards of 10 TB each
• Solution
– Reduce the restore time by 50%
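The 50% figure follows if shards are restored in parallel and each shard restores at the same rate. A minimal Python sketch under those assumptions is below; the function `restore_hours` and the 1 TB/hour rate are hypothetical and chosen only to make the arithmetic visible.

```python
def restore_hours(total_tb: float, shards: int, tb_per_hour_per_shard: float) -> float:
    """Wall-clock restore time when all shards restore in parallel.

    Assumes data is evenly spread, so each shard restores total_tb / shards.
    """
    if shards <= 0 or tb_per_hour_per_shard <= 0:
        raise ValueError("shards and restore rate must be positive")
    return (total_tb / shards) / tb_per_hour_per_shard


RATE = 1.0  # hypothetical restore rate per shard, in TB per hour

two_shards = restore_hours(40, 2, RATE)   # 20 TB per shard
four_shards = restore_hours(40, 4, RATE)  # 10 TB per shard
print(two_shards, four_shards, four_shards / two_shards)
```

The ratio does not depend on the assumed rate: doubling the shard count halves the data each shard has to restore.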
Summary
Determining the # of shards
• To determine the required # of shards, determine
– Storage requirements
– Latency requirements
– Throughput requirements
• Derive total
– Disk capacity
– Disk throughput
– RAM
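Putting the three totals together, the shard count is driven by whichever resource needs the most shards. The Python sketch below reuses the numbers from the earlier examples purely for illustration; treating them as one deployment is an assumption, the RAM figures are taken to be working set and per-server RAM in the same unit, and the names `shards_for` and `requirements` are hypothetical.

```python
import math


def shards_for(required: float, per_shard: float) -> int:
    """Shards needed so that the summed per-shard capacity covers `required`."""
    return max(1, math.ceil(required / per_shard))


# resource -> (total requirement, capacity of one shard)
requirements = {
    "disk": (3, 2),           # TB
    "ram": (428, 128),        # working set vs. RAM per server
    "iops": (11000, 5000),    # disk operations per second
}

per_resource = {
    name: shards_for(required, capacity)
    for name, (required, capacity) in requirements.items()
}
shard_count = max(per_resource.values())

print(per_resource)
print(shard_count)
```

Here RAM is the binding constraint, so sizing for disk or IOPS alone would leave the working set larger than the cluster's memory.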
Reasons to shard
• Scalability
• Geo-aware clusters
• Tiered Storage
Sharding: Where to go from here…
• MongoDB Manual:
http://docs.mongodb.org/manual/sharding/
• Other Webinars:
– How to Achieve Scale With MongoDB
• White Papers
– MongoDB Performance Best Practices
– MongoDB Architecture Guide
Thank You