
#MongoDB

Sharding Methods for MongoDB

Jay Runkel
jay.runkel@mongodb.com
@jayrunkel
Agenda

• Customer Stories
• Sharding for Performance/Scale
– When to shard?
– How many shards do I need?
• Types of Sharding
• How to Pick a Shard Key
• Sharding for Other Reasons

Customer Stories
Foursquare

• 50M users.
• 6B check-ins to date (6M per day growth).
• 55M points of interest / venues.
• 1.7M merchants using the platform for marketing

• Operations Per Second: 300,000


• Documents: 5.5B
Foursquare clusters

• 11 MongoDB clusters
– 8 are sharded

• Largest cluster has 15 shards (check-ins)
– Sharded on user id

CarFax

• Large data set

CarFax Shards

• 13 billion+ documents
– 1.5 billion documents added every year
• 1 vehicle history report is > 200 documents

• 12 Shards
• 9-node replica sets
• Replicas distributed across 3 data centers

What is Sharding?
Sharding Overview

Application

Driver

Query Router   Query Router   …   Query Router

Shard 1     Shard 2     Shard 3     Shard N

Primary     Primary     Primary     Primary
Secondary   Secondary   Secondary   Secondary
Secondary   Secondary   Secondary   Secondary

Scaling: Sharding

Key Range
0..100

mongod

Read/Write Scalability

Scaling: Sharding

Key Range Key Range


0..50 51..100

mongod mongod

Read/Write Scalability

Scaling: Sharding

Key Range Key Range Key Range Key Range


0..25 26..50 51..75 76..100

mongod mongod mongod mongod

Read/Write Scalability
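
The key-range split on these slides can be sketched in a few lines of Python. This only illustrates the idea of range-based routing and is not how MongoDB implements it; the `route` helper and the `UPPER_BOUNDS` list are hypothetical names, and the bounds are copied from the four-shard slide.

```python
from bisect import bisect_left

# Inclusive upper bound of each shard's key range, from the slide:
# 0..25, 26..50, 51..75, 76..100
UPPER_BOUNDS = [25, 50, 75, 100]


def route(key: int) -> int:
    """Return the zero-based index of the shard whose range holds `key`."""
    if not 0 <= key <= UPPER_BOUNDS[-1]:
        raise ValueError(f"key {key} is outside the sharded range 0..100")
    # bisect_left finds the first upper bound that is >= key.
    return bisect_left(UPPER_BOUNDS, key)


for key in (0, 25, 26, 60, 100):
    print(f"key {key:>3} -> shard {route(key) + 1}")
```

Because each key maps to exactly one range, a read or write for a single key only has to visit one mongod, which is where the read/write scalability comes from.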

How do I know I need to shard?
Does one server/replica set…

• Have enough disk space to store all my data?
– Server specs: disk capacity
• Handle my query throughput (operations per second)?
– Server specs: disk IOPS, RAM, network
• Respond to queries fast enough (latency)?
– Server specs: disk IOPS, RAM, network

How many shards do I need?
Disk Space: How Many Shards Do I Need?
• Sum of disk space across shards must be greater than the required storage size

Example

Storage size = 3 TB
Server disk capacity = 2 TB

2 Shards Required

RAM: How Many Shards Do I Need?

• Working set should fit in RAM
– Sum of RAM across shards > working set
• Working set = indexes plus the set of frequently accessed documents
• Keeping the working set in RAM gives
– Shorter latency
– Higher throughput

RAM: How Many Shards Do I Need?

• Measuring Index Size and Working Set

db.stats() – total index size for the database (db.<collection>.stats() reports it per collection)
db.serverStatus({ workingSet: 1 }) – working set size estimate (MMAPv1 only; removed in MongoDB 3.0)

Example

Working set = 428 GB
Server RAM = 128 GB

428/128 ≈ 3.34

4 Shards Required

Disk Throughput: How Many Shards Do I Need?
• Sum of IOPS across shards must be greater than the required IOPS
• IOPS are difficult to estimate; a single write can involve
– Update doc
– Update indexes
– Append to journal
– Log entry?
• Best approach – build a prototype and measure

Example

Required IOPS = 11000
Server disk IOPS = 5000

3 Shards Required

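
One way to turn "build a prototype and measure" into a shard count is sketched below. It assumes IOPS grow linearly with operations per second, which a real workload may not do, and the prototype figures (1,100 IOPS at 1,000 ops/sec) are invented so that the result lines up with the 11,000 IOPS example. `shards_for_iops` is a hypothetical helper, not part of any MongoDB tool.

```python
import math


def shards_for_iops(measured_iops: float, measured_ops_per_sec: float,
                    target_ops_per_sec: float, server_iops: float) -> int:
    """Scale a prototype measurement linearly to the target load."""
    iops_per_op = measured_iops / measured_ops_per_sec
    required_iops = iops_per_op * target_ops_per_sec
    # Round up: a fraction of a shard still needs a whole server.
    return math.ceil(required_iops / server_iops)


# Hypothetical prototype: 1,100 IOPS observed at 1,000 ops/sec.
# A target of 10,000 ops/sec then needs about 11,000 IOPS.
print(shards_for_iops(1_100, 1_000, 10_000, 5_000))  # 3
```

Rounding up only guarantees the cluster meets the requirement, so in practice some headroom on top of this number is probably wise.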
Types of Sharding
Sharding Types

• Range
• Tag-Aware
• Hashed

Range Sharding

Key Range Key Range Key Range Key Range


0..25 26..50 51..75 76..100

mongod mongod mongod mongod

Read/Write Scalability

Tag-Aware Sharding
Tag Ranges

Shard Tag   Start    End
Winter      23 Dec   21 Mar
Spring      22 Mar   21 Jun
Summer      21 Jun   23 Sep
Fall        24 Sep   22 Dec

Shard Tags

Winter   Spring   Summer   Fall
mongod   mongod   mongod   mongod

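
The lookup implied by the tag-range table can be sketched as follows. This is an illustration only: `TAG_RANGES` and `tag_for` are hypothetical names, both ends of each range are treated as inclusive because that is how the table reads, and the first matching row wins (so 21 Jun, which appears in two rows, resolves to Spring). MongoDB's own tag ranges are defined on shard key values with an inclusive lower bound and an exclusive upper bound.

```python
from datetime import date

# (tag, (start month, start day), (end month, end day)), copied from the table.
TAG_RANGES = [
    ("Winter", (12, 23), (3, 21)),
    ("Spring", (3, 22), (6, 21)),
    ("Summer", (6, 21), (9, 23)),
    ("Fall", (9, 24), (12, 22)),
]


def tag_for(day: date) -> str:
    """Return the first tag whose range contains `day` (both ends inclusive)."""
    md = (day.month, day.day)
    for tag, start, end in TAG_RANGES:
        if start <= end:
            inside = start <= md <= end
        else:
            # The range wraps around the new year (Winter).
            inside = md >= start or md <= end
        if inside:
            return tag
    raise ValueError(f"no tag range covers {day}")


print(tag_for(date(2014, 1, 15)))  # Winter
print(tag_for(date(2014, 7, 4)))   # Summer
```

A document tagged this way would be stored on the mongod carrying the matching shard tag.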
Hash-Sharding

Hash Range Hash Range Hash Range Hash Range


0000..4444 4445..8000 8001..aaaa aaab..ffff

mongod mongod mongod mongod

Hashed shard key
• Pros:
– Evenly distributed writes
• Cons:
– Random data (and index) updates can be I/O intensive
– Range-based queries turn into scatter-gather

mongos

Shard 1 Shard 2 Shard 3 Shard N

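
The scatter-gather drawback can be seen in a small simulation. This is a simplified sketch, not MongoDB's algorithm: SHA-256 stands in for the real hash function, shards are picked by modulo rather than by hash ranges, and `range_shard`, `hashed_shard` and `shards_touched` are hypothetical helpers.

```python
import hashlib

NUM_SHARDS = 4
KEYS_PER_SHARD = 25  # range sharding: keys 0..24 on shard 0, 25..49 on shard 1, ...


def range_shard(key: int) -> int:
    """Range sharding: neighbouring keys live on the same shard."""
    return min(key // KEYS_PER_SHARD, NUM_SHARDS - 1)


def hashed_shard(key: int) -> int:
    """Hashed sharding: the shard is chosen from a hash of the key."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def shards_touched(keys, shard_fn) -> set:
    """Shards that a query for `keys` has to visit."""
    return {shard_fn(k) for k in keys}


query = range(30, 40)  # a range query for keys 30..39
print("range :", sorted(shards_touched(query, range_shard)))
print("hashed:", sorted(shards_touched(query, hashed_shard)))
```

With range sharding the ten neighbouring keys sit on one shard; with hashing they are spread out, so the same query has to be sent to several shards and the results merged.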
Range sharding document distribution

Hashed sharding document distribution

How Do I Pick a Shard Key?
Shard Key characteristics

• A good shard key has:


– sufficient cardinality
– distributed writes
– targeted reads ("query isolation")
• Shard key should be in every query if possible
– scatter gather otherwise
• Choosing a good shard key is important!
– affects performance and scalability
– changing it later is expensive

Low cardinality shard key

• Induces "jumbo chunks"


• Example: a boolean field

mongos
[ a, b )

Shard 1 Shard 2 Shard 3 Shard N

Ascending shard key

• Monotonically increasing shard key values cause "hot spots" on inserts
• Examples: timestamps, _id

mongos

[ ISODate(…), $maxKey )

Shard 1 Shard 2 Shard 3 Shard N

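
The insert hot spot can be reproduced with a toy model. The sketch below ignores chunk splitting and balancing, uses made-up chunk boundaries, and substitutes SHA-256 with a modulo for the real hashed-key placement; all names in it are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
import hashlib

NUM_SHARDS = 4
# Lower bounds of the four chunks that exist before the new inserts arrive;
# the last chunk runs from 750 up to the maximum key.
CHUNK_LOWER_BOUNDS = [0, 250, 500, 750]


def range_shard(key: int) -> int:
    """Shard holding the chunk whose range contains `key`."""
    return bisect_right(CHUNK_LOWER_BOUNDS, key) - 1


def hashed_shard(key: int) -> int:
    """Shard chosen from a hash of the key."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# New documents get ever-increasing keys (think timestamps).
new_keys = range(1_000, 11_000)

print("range :", sorted(Counter(map(range_shard, new_keys)).items()))
print("hashed:", sorted(Counter(map(hashed_shard, new_keys)).items()))
```

Under range sharding every new key is larger than all existing bounds, so all 10,000 inserts land on the shard that owns the top chunk, while the hashed placement spreads them across the four shards.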
Reasons to Shard
Reasons to shard

• Scale
– Data volume
– Query volume

• Global deployment with local writes


– Geography aware sharding

• Tiered Storage

• Fast backup restore


Global Deployment/Local Writes

Primary:LON   Secondary:NYC   Secondary:SYD

Primary:NYC   Secondary:LON   Secondary:SYD

Primary:SYD   Secondary:LON   Secondary:NYC

Tiered Storage
• Save hardware costs
• Put frequently accessed documents on fast servers
– Infrequently accessed documents on less capable servers
• Use tag-aware sharding

Current Current Archive Archive

mongod mongod mongod mongod

SSD SSD HDD HDD


Fast Restore

• 40 TB Database
• 2 shards of 20 TB each
• Challenge
– Cannot meet restore SLA after data loss

mongod mongod

20 TB 20 TB

Fast Restore

• 40 TB Database
• 4 shards of 10 TB each
• Solution
– Reduce the restore time by 50%

mongod mongod mongod mongod

10 TB 10 TB 10 TB 10 TB
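
The 50% figure follows if each shard is restored in parallel at the same rate, so the wall-clock time depends only on the size of one shard. The sketch below makes that assumption explicit; the restore rate of 1 TB per hour is invented for illustration and `restore_hours` is a hypothetical helper.

```python
def restore_hours(total_tb: float, num_shards: int, tb_per_hour: float) -> float:
    """Wall-clock restore time when every shard restores in parallel."""
    per_shard_tb = total_tb / num_shards
    return per_shard_tb / tb_per_hour


RATE = 1.0  # TB per hour per shard: a made-up figure for illustration

two = restore_hours(40, 2, RATE)
four = restore_hours(40, 4, RATE)
print(f"2 shards: {two:.0f} h, 4 shards: {four:.0f} h, "
      f"reduction: {1 - four / two:.0%}")
```

The reduction is the same whatever rate is plugged in, as long as the rate per shard does not drop when more shards restore at once.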

Summary
Determining the # of shards
• To determine the required # of shards, determine
– Storage requirements
– Latency requirements
– Throughput requirements

• Derive total
– Disk capacity
– Disk throughput
– RAM

• Calculate # of shards based upon individual server specs
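
The three steps can be combined into one calculation. The sketch below reuses the figures from the disk, RAM and IOPS examples as if they described a single cluster, which the slides do not state, and takes the largest of the three counts on the assumption that the cluster has to satisfy every constraint at once. `Sizing` is a hypothetical helper class.

```python
import math
from dataclasses import dataclass


@dataclass
class Sizing:
    required: float    # total the cluster must provide
    per_server: float  # what the server behind one shard provides

    def shards(self) -> int:
        """Smallest whole number of servers that covers the requirement."""
        return math.ceil(self.required / self.per_server)


# Figures from the examples earlier in the deck.
requirements = {
    "disk (TB)": Sizing(required=3, per_server=2),
    "RAM (GB)": Sizing(required=428, per_server=128),
    "disk IOPS": Sizing(required=11_000, per_server=5_000),
}

per_resource = {name: s.shards() for name, s in requirements.items()}
for name, n in per_resource.items():
    print(f"{name:<10} -> {n} shards")

# Every constraint must hold, so the largest count wins.
print("shards needed:", max(per_resource.values()))
```

With these inputs RAM is the limiting resource, so the combined answer is 4 shards even though disk space alone would need only 2.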
Leverage Sharding For

• Scalability

• Geo-aware clusters

• Tiered Storage

• Reduce backup restore times

Sharding: Where to go from here…

• MongoDB Manual:
http://docs.mongodb.org/manual/sharding/

• Other Webinars:
– How to Achieve Scale With MongoDB

• White Papers
– MongoDB Performance Best Practices
– MongoDB Architecture Guide

Thank You
