Mongo Scaling

Published by Alvin John Richards
Given at : FrOSCon
In this talk we will cover Schema Design, Replication & Sharding and how they affect building a scalable system with MongoDB. First we will cover the basic concepts and then go over some sample use cases and discuss the alternatives. After this talk you should leave with some great strategies for building your application in MongoDB that will scale as you need it to.

Published by: Alvin John Richards on Aug 21, 2011
Copyright: Attribution Non-commercial


Easy to Start, Easy to Develop, Easy to Scale
Alvin Richards
Senior Director Enterprise Engineering
alvin@10gen.com

Sunday, August 21, 2011


Topics we will cover fast!
• Vertical Scaling
• Horizontal Scaling with MongoDB
  • Schema & Index design
  • Auto Sharding
  • Replication

Scaling

• Operations/sec go up
• Storage needs go up
  • Capacity
  • IOPs
• Complexity goes up
  • Caching

How do you scale now?
• Optimization & Tuning
  • Schema & Index Design
  • O/S tuning
  • Hardware configuration
• Vertical scaling
  • Hardware is expensive
  • Hard to scale in cloud

[Chart: cost ($$$) plotted against throughput]

MongoDB Scaling - Single Node
[Diagram: read and write traffic both go to a single node, node_a1]

Read scaling - add Replicas
[Diagram: node_b1 and then node_c1 are added alongside node_a1; read and write traffic shown against the set]

Write scaling - Sharding
[Diagram: the set node_a1, node_b1, node_c1 becomes shard1]

Write scaling - add Shards
[Diagram: shard2 (node_a2, node_b2, node_c2) and then shard3 (node_a3, node_b3, node_c3) are added next to shard1]

Scaling with MongoDB
• Schema & Index Design
• Sharding
• Replication

Schema
• Data model affects performance
  • Embedding versus Linking
  • Partial versus full document writes
  • Partial versus full document reads
• Schema and schema usage are critical for scaling and performance
  • Roundtrips to database
  • Disk seek time
  • Size of data to read & write

Indexes
• Index common queries
• Do not over index
  • An index on (A) is redundant when an index on (A,B) exists, choose one
• Right-balanced indexes keep working set small
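
The prefix rule behind the (A) versus (A,B) bullet can be sketched in a few lines of JavaScript. This is an illustrative model only: an index is represented as a plain array of field names, sort direction is ignored, and `isPrefixOf` and `redundantIndexes` are hypothetical helper names, not MongoDB functions.

```javascript
// Illustrative model: an index is an ordered list of field names.
// Sort direction and index options are deliberately ignored.

// True when every field of `shorter` matches the leading fields of `longer`.
function isPrefixOf(shorter, longer) {
  return (
    shorter.length <= longer.length &&
    shorter.every((field, i) => field === longer[i])
  );
}

// Returns the indexes that are strict prefixes of some other index
// in the list, and are therefore candidates for removal.
function redundantIndexes(indexes) {
  return indexes.filter((idx, i) =>
    indexes.some(
      (other, j) =>
        i !== j && idx.length < other.length && isPrefixOf(idx, other)
    )
  );
}

const candidates = redundantIndexes([["a"], ["a", "b"], ["b"]]);
console.log(JSON.stringify(candidates)); // [["a"]]
```

Here the index on `b` alone is kept, because `b` is not the leading field of the compound index.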


Query for {a: 7}
With index:
                    [-∞, 5)      [5, 10)      [10, ∞)
  [-∞, 5) buckets   [5, 7)   [7, 9)   [9, 10)   [10, ∞) buckets
  {...} {...} {...} {...} {...} {...} {...} {...} {...} {...} {...}

Without index: scan
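
The difference between the two paths in the diagram can be sketched by counting how many entries each one examines. The code below is an assumption-laden stand-in: a binary search over a sorted array plays the role of the index (MongoDB's real index is a B-tree), the 1,000-document data set is made up, and values of `a` are assumed unique.

```javascript
// Hypothetical data set: 1,000 documents with a = 0..999, sorted by a.
const docs = Array.from({ length: 1000 }, (_, i) => ({ a: i }));

// Without an index: every document is examined.
function scan(allDocs, target) {
  let examined = 0;
  const hits = [];
  for (const d of allDocs) {
    examined++;
    if (d.a === target) hits.push(d);
  }
  return { hits, examined };
}

// With an "index": binary search over the sorted values narrows the
// range on each step, much like descending [5, 10) -> [7, 9).
function indexedFind(sortedDocs, target) {
  let lo = 0;
  let hi = sortedDocs.length - 1;
  let examined = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    examined++;
    const value = sortedDocs[mid].a;
    if (value === target) return { hits: [sortedDocs[mid]], examined };
    if (value < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return { hits: [], examined };
}

const viaScan = scan(docs, 7);
const viaIndex = indexedFind(docs, 7);
console.log(viaScan.examined, viaIndex.examined);
```

The scan always touches all 1,000 documents, while the search needs at most about ten steps for this data size.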

Indexing Embedded Documents & Multikeys
db.posts.save({
    title: "My First blog",
    tags: ["mongodb", "cool"],
    comments: [
        {author: "James", ts: new Date()}
    ]
});
db.posts.ensureIndex({"tags": 1})
db.posts.ensureIndex({"comments.author": 1})

Picking an Index
find({x: 10, y: "foo"})

[Diagram: three candidate plans (scan, index on x, index on y) are tried; once one finishes the others are terminated, and the winning plan is remembered]

What is Sharding
• Ad-hoc partitioning
• Consistent hashing
  • Amazon Dynamo
• Range based partitioning
  • Google BigTable
  • Yahoo! PNUTS
  • MongoDB

MongoDB Sharding
• Automatic partitioning and management
• Range based
• Convert to sharded system with no downtime
• Fully consistent

How MongoDB Sharding works
> db.runCommand( { addshard : "shard1" } );
> db.runCommand( { shardCollection : "mydb.blogs", key : { age : 1 } } )

Chunks: -∞..+∞

• Range keys from -∞ to +∞
• Ranges are stored as "chunks"

How MongoDB Sharding works
> db.posts.save( {age:40} )

Chunks: -∞..40 | 41..+∞

• Data is inserted
• Ranges are split into more "chunks"

How MongoDB Sharding works
> db.posts.save( {age:40} )
> db.posts.save( {age:50} )

Chunks: -∞..40 | 41..50 | 51..+∞

• More data is inserted
• Ranges are split into more "chunks"

How MongoDB Sharding works
> db.posts.save( {age:40} )
> db.posts.save( {age:50} )
> db.posts.save( {age:60} )

Chunks: -∞..40 | 41..50 | 51..60 | 61..+∞

How MongoDB Sharding works
shard1: -∞..40 | 41..50 | 51..60 | 61..+∞

How MongoDB Sharding works
> db.runCommand( { addshard : "shard2" } );

[Diagram: the chunks -∞..40, 41..50, 51..60 and 61..+∞ start on shard1 and are rebalanced; shard1 keeps -∞..40 and shard2 takes 41..50]

How MongoDB Sharding works
> db.runCommand( { addshard : "shard2" } );
> db.runCommand( { addshard : "shard3" } );

[Diagram: the chunks are spread across shard1, shard2 and shard3]
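
The split-and-move sequence walked through above can be modelled with a small in-memory sketch. Everything here is an assumption made for illustration: the half-open `[min, max)` ranges, the helper names `findChunk`, `splitChunk` and `moveChunk`, and the idea of splitting at hand-picked points (the real balancer decides splits and migrations itself).

```javascript
// Illustrative model only: each chunk covers shard-key values in
// [min, max) and is owned by exactly one shard.

// Find the chunk whose range contains the given shard-key value.
function findChunk(chunkList, key) {
  return chunkList.find((c) => key >= c.min && key < c.max);
}

// Split the chunk containing splitPoint into two chunks on the same shard.
function splitChunk(chunkList, splitPoint) {
  const i = chunkList.findIndex(
    (c) => splitPoint > c.min && splitPoint < c.max
  );
  if (i === -1) return chunkList; // splitPoint is already a boundary
  const c = chunkList[i];
  return [
    ...chunkList.slice(0, i),
    { min: c.min, max: splitPoint, shard: c.shard },
    { min: splitPoint, max: c.max, shard: c.shard },
    ...chunkList.slice(i + 1),
  ];
}

// Reassign the chunk starting at `min` to another shard.
function moveChunk(chunkList, min, toShard) {
  return chunkList.map((c) => (c.min === min ? { ...c, shard: toShard } : c));
}

// Start with one chunk covering the whole key range on shard1.
let chunks = [{ min: -Infinity, max: Infinity, shard: "shard1" }];

// Split at 41, 51 and 61, matching the ranges shown on the slides.
for (const point of [41, 51, 61]) chunks = splitChunk(chunks, point);

// After shard2 is added, the 41..50 chunk migrates to it.
chunks = moveChunk(chunks, 41, "shard2");

console.log(findChunk(chunks, 40).shard); // shard1
console.log(findChunk(chunks, 50).shard); // shard2
```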

Sharding Features
• Shard data without downtime
• Automatic balancing as data is written
• Commands routed (switched) to correct node
  • Inserts - must have the Shard Key
  • Updates - must have the Shard Key
  • Queries
    • With Shard Key - routed to nodes
    • Without Shard Key - scatter gather
  • Indexed Queries
    • With Shard Key - routed in order
    • Without Shard Key - distributed sort merge
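
The routed versus scatter-gather distinction can be sketched as a function that decides which shards a query must visit. This is a simplified assumption, not the router's actual logic: only equality matches on a single-field shard key are handled, and `chunkMap` and `targetShards` are hypothetical names.

```javascript
// Hypothetical chunk map for a collection sharded on `age`;
// ranges are half-open [min, max).
const chunkMap = [
  { min: -Infinity, max: 41, shard: "shard1" },
  { min: 41, max: 51, shard: "shard2" },
  { min: 51, max: Infinity, shard: "shard3" },
];

// Returns the list of shards a query has to be sent to.
function targetShards(query, shardKey, map) {
  if (Object.prototype.hasOwnProperty.call(query, shardKey)) {
    // Shard key present: route to the single shard owning that value.
    const value = query[shardKey];
    const chunk = map.find((c) => value >= c.min && value < c.max);
    return [chunk.shard];
  }
  // Shard key absent: scatter to every shard, then gather the results.
  return [...new Set(map.map((c) => c.shard))];
}

console.log(targetShards({ age: 45 }, "age", chunkMap)); // [ 'shard2' ]
console.log(targetShards({ name: "alvin" }, "age", chunkMap).length); // 3
```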

MongoDB Replication
• MongoDB replication is like MySQL replication
  • Asynchronous master/slave
• Variations:
  • Master / slave
  • Replica Sets

Replica Set features
• A cluster of N servers
• Any (one) node can be primary
• Consensus election of primary
• Automatic failover
• Automatic recovery
• All writes to primary
• Reads can be to primary (default) or a secondary

How MongoDB Replication works
[Diagram: Member 1, Member 2, Member 3]
• Set is made up of 2 or more nodes

How MongoDB Replication works
[Diagram: Member 2 is PRIMARY]
• Election establishes the PRIMARY
• Data replication from PRIMARY to SECONDARY

How MongoDB Replication works
[Diagram: Member 2 is DOWN; Member 1 and Member 3 negotiate a new master]
• PRIMARY may fail
• Automatic election of new PRIMARY

How MongoDB Replication works
[Diagram: Member 3 is PRIMARY; Member 2 is DOWN]
• New PRIMARY elected
• Replica set re-established

How MongoDB Replication works
[Diagram: Member 3 is PRIMARY; Member 2 is RECOVERING]
• Automatic recovery

How MongoDB Replication works
[Diagram: Member 3 is PRIMARY; Member 2 has rejoined the set]
• Replica set re-established
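
The failover sequence above can be approximated with a toy election. This is a heavily simplified sketch, not MongoDB's election protocol: it assumes a primary can only be chosen while a majority of all members is reachable, and it picks the reachable member with the highest `optime`, a made-up number standing in for how up to date each member is.

```javascript
// Hypothetical snapshot of the set after Member 2 (the old primary) fails.
const members = [
  { name: "Member 1", up: true, optime: 100 },
  { name: "Member 2", up: false, optime: 105 },
  { name: "Member 3", up: true, optime: 103 },
];

// Returns the name of the new primary, or null when no majority exists.
function electPrimary(memberList) {
  const reachable = memberList.filter((m) => m.up);
  const majority = Math.floor(memberList.length / 2) + 1;
  if (reachable.length < majority) return null; // set stays without a primary
  // Simplification: the most up-to-date reachable member wins.
  const winner = reachable.reduce((best, m) =>
    m.optime > best.optime ? m : best
  );
  return winner.name;
}

console.log(electPrimary(members)); // Member 3
```

With only one of three members reachable the function returns `null`, which mirrors why a set needs a majority before it can accept writes again.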

Creating a Replica Set
> cfg = {
      _id : "acme_a",
      members : [
          { _id : 0, host : "sf1.acme.com" },
          { _id : 1, host : "sf2.acme.com" },
          { _id : 2, host : "sf3.acme.com" } ] }
> use admin
> db.runCommand( { replSetInitiate : cfg } )

Replica Set Member Types
• Normal {priority:1}
• Passive {priority:0}
  • Cannot be elected as PRIMARY
• Arbiters
  • Can vote in an election
  • Do not hold any data
• Hidden {hidden:true}
• Tagging - New in 2.0
  • tags : {"dc": "ny", "rack": "r23s5"}

Using Replicas

slaveOk()
- driver will send read requests to Secondaries
- driver will always send writes to Primary

Java examples
- DB.slaveOk()
- Collection.slaveOk()
- find(q).addOption(Bytes.QUERYOPTION_SLAVEOK);

Safe Writes
• db.runCommand({getLastError: 1, w : 1})
  - ensure write is synchronous
  - command returns after primary has written to memory
• w=n or w='majority'
  - n is the number of nodes data must be replicated to
  - driver will always send writes to Primary
• w='myTag' [MongoDB 2.0]
  - Each member is "tagged" e.g. "US_EAST", "EMEA", "US_WEST"
  - Ensure that the write is executed in each tagged "region"
• fsync:true
  - Ensures changed disk blocks are flushed to disk
• j:true
  - Ensures changes are flushed to Journal
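
How `w=n` and `w='majority'` translate into a number of acknowledgements can be sketched as follows. This is an illustrative calculation only: `requiredAcks` and `isWriteAcknowledged` are hypothetical helpers, tag-based values such as `w='myTag'` are not modelled, and every member is assumed to count toward the majority.

```javascript
// Number of members that must hold the write before it is acknowledged.
function requiredAcks(w, memberCount) {
  if (w === "majority") return Math.floor(memberCount / 2) + 1;
  return w; // numeric w: that many members, the primary included
}

// True once enough members have confirmed the write.
function isWriteAcknowledged(acks, w, memberCount) {
  return acks >= requiredAcks(w, memberCount);
}

// Three-member set where only the primary has the write so far.
console.log(isWriteAcknowledged(1, 1, 3)); // true
console.log(isWriteAcknowledged(1, "majority", 3)); // false
console.log(isWriteAcknowledged(2, "majority", 3)); // true
```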

Replication features

• Reads from Primary are always consistent
• Reads from Secondaries are eventually consistent
• Automatic failover if a Primary fails
• Automatic recovery when a node joins the set

Scaling Use Case
• User profile information
• Multiple ways to identify a "user"
  • Facebook ID
  • Twitter Name
  • Email address
  • SSN# / National Identifier
• What is the best schema, index and sharding strategy?

Schema #1
> db.profiles.save(
    { _id : ...,
      facebook_name : "alvin.j.richards",
      twitter_name : "jonnyeight",
      linkedin_name : "alvinrichards",
      details : { loc: [50.78076, 7.181969], ...}
    })
> db.profiles.ensureIndex({facebook_name: 1})
> db.runCommand( { shardCollection : "social.profiles", key : { facebook_name : 1 } } )

Schema #1
Good:
• Schema is simple to understand
• Easy to add new identifiers, e.g. foursquare name
• Query is routed to a shard
  db.profiles.find({facebook_name: "alvin.j.richards"})

Bad:
• Each identifier needs a separate index
• More indexes means less data in memory
• Memory contention and disk paging
• Query is scatter/gathered across cluster
  db.profiles.find({linkedin_name: "alvinrichards"})

Schema #2
> db.profiles.save(
    { _id : ObjectId("1234"),
      details : {loc: [50.78076, 7.181969], ...}})
> db.identifiers.save(
    { _id : {type: "facebook_name", value: "alvin.j.richards"},
      profile: ObjectId("1234")})
> db.identifiers.save(
    { _id : {type: "twitter_name", value: "jonnyeight"},
      profile: ObjectId("1234")})
> db.runCommand( { shardCollection : "social.identifiers", key : { _id : 1 } } )
> db.runCommand( { shardCollection : "social.profiles", key : { _id : 1 } } )

Schema #2
Good:
• Easy to add new identifiers, e.g. foursquare name
• All queries are routed to a shard
  > db.identifiers.find(
      {_id : {type: "facebook_name", value: "alvin.j.richards"}})
  > db.identifiers.find(
      {_id : {type: "foursquare_id", value: "alvin10gen"}})

Bad:
• Schema is more complex
• Two lookups are required for each access (but both routed)
• Need to maintain links (data relationships)
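
The two-lookup access pattern of Schema #2 can be sketched with in-memory stand-ins for the two collections. Everything below is hypothetical scaffolding: plain `Map` objects replace the `identifiers` and `profiles` collections, the string `"1234"` stands in for the ObjectId, and `findProfile` is an illustrative helper, not a driver call.

```javascript
// In-memory stand-ins for the two collections (assumption for illustration).
const identifiers = new Map();
const profiles = new Map();

// Build a single lookup key from the {type, value} pair used as _id.
const idKey = (type, value) => `${type}:${value}`;

profiles.set("1234", { _id: "1234", details: { loc: [50.78076, 7.181969] } });
identifiers.set(idKey("facebook_name", "alvin.j.richards"), { profile: "1234" });
identifiers.set(idKey("twitter_name", "jonnyeight"), { profile: "1234" });

function findProfile(type, value) {
  // Lookup 1: identifier document, keyed by _id (the shard key).
  const ident = identifiers.get(idKey(type, value));
  if (!ident) return null;
  // Lookup 2: profile document, again keyed by _id (the shard key).
  return profiles.get(ident.profile) || null;
}

console.log(findProfile("twitter_name", "jonnyeight")._id); // 1234
console.log(findProfile("linkedin_name", "alvinrichards")); // null
```

Adding a new identifier type only means inserting another entry into `identifiers`; no new index is needed, which is the trade-off the slide describes.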


Summary
• Schema & Index design
  • Simplest way to scale
• Sharding
  • Automatically scale writes
• Replication
  • Automatically scale reads

download at mongodb.org
alvin@10gen.com
MongoDB Munich, Germany - October 10

conferences, appearances, and meetups
http://www.10gen.com/events

Facebook: http://bit.ly/mongoW
Twitter: @mongodb
LinkedIn: http://linkd.in/joinmongo
