Agenda
18:00 - 18:15 : Why MongoDB
18:15 - 18:45 : Schema Design
18:45 - 19:00 : Break
19:00 - 19:45 : Scaling
19:45 - 20:00 : Q & A
20:00 - 22:00 : After Party!
Google/DoubleClick, Oracle, Apple, NetApp
NYC, Palo Alto, London, Dublin & Sydney
110+ employees
Today's challenges
sharding
remove
transactions productivity
Why we exist
Finance
Gaming
Infrastructure
Real-time Analytics
Media
Mobile
Topics
Schema design is easy!
Data as Objects in code
Common patterns
- Single table inheritance
- One-to-Many & Many-to-Many
- Buckets
- Trees
- Queues
- Inventory
Terminology
RDBMS            MongoDB
Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking
Partition        Shard
Partition Key    Shard Key
embedding
embedding
linking
Design Session
Design documents that simply map to your application
> post = {author: "Hergé",
          date: ISODate("2011-09-18T09:56:06.298Z"),
          text: "Destination Moon",
          tags: ["comic", "adventure"]}
> db.posts.insert(post)
Query operators
Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, $lt, $lte, $gt, $gte, ...
http://nysi.org.uk/kids_stuff/rocket/rocket.htm
> db.posts.update( {text: "Destination Moon" }, { "$push": {comments: new_comment}, "$inc": {comments_count: 1}})
Common Patterns
http://www.flickr.com/photos/colinwarren/158628063
Inheritance
http://www.flickr.com/photos/dysonstarr/5098228295
circle  3.14
square  4
rect    10
// find shapes where radius > 0
> db.shapes.find({radius: {$gt: 0}})
// create index
> db.shapes.ensureIndex({radius: 1}, {sparse: true})
One to Many
http://www.flickr.com/photos/j-sh/6502708899/
One to Many relationships can specify:
- degree of association between objects
- containment
- life-cycle
Many to Many
http://www.flickr.com/photos/pats0n/6013379192
Many - Many
Example:
- Product can be in many categories
- Category can have many products
Many - Many
products:
  { _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] }

categories:
  { _id: 20, name: "comic", product_ids: [ 10, 11, 12 ] }

categories:
  { _id: 21, name: "movie", product_ids: [ 10 ] }

// All categories for a given product
> db.categories.find({product_ids: 10})
Alternative
products:
  { _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] }

categories:
  { _id: 20, name: "comic" }

// All products for a given category
> db.products.find({category_ids: 20})

// All categories for a given product
> product = db.products.findOne({_id: some_id})
> db.categories.find({_id: {$in: product.category_ids}})
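The two lookups in this one-sided design can be sketched in plain JavaScript. This is only an in-memory illustration and not the MongoDB API: the arrays stand in for the collections, and product 11, category 30's name, and the function names `productsInCategory` and `categoriesOfProduct` are assumptions made for the example.

```javascript
// In-memory stand-ins for the two collections (not the MongoDB API).
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "Hypothetical Product", category_ids: [20] }, // assumed sample row
];
const categories = [
  { _id: 20, name: "comic" },
  { _id: 30, name: "hypothetical category" }, // assumed sample row
];

// All products for a given category: match on the array stored in each product.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// All categories for a given product: fetch the product first,
// then look up every category whose _id appears in its category_ids.
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}
```

The trade-off is visible here: the relationship is stored only once, on the product, so finding categories for a product takes two steps instead of one.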
Trees
http://www.flickr.com/photos/cubagallery/5949819558
Hierarchical information
Full Tree in Document
{ comments: [
    { author: "Kyle", text: "...",
      replies: [
        { author: "James", text: "...", replies: [] }
      ] }
  ] }
Pros: Single Document, Performance, Intuitive
Cons: Hard to search, Partial Results, 16MB limit
Array of Ancestors
      A
     / \
    B   E
   / \   \
  C   D   F

- Store all Ancestors of a node
{ _id: "a" }
{ _id: "b", thread: [ "a" ], replyTo: "a" }
{ _id: "c", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "d", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "e", thread: [ "a" ], replyTo: "a" }
{ _id: "f", thread: [ "a", "e" ], replyTo: "e" }

// find all threads where "b" is in
> db.posts.find({thread: "b"})
// find replies to "e"
> db.posts.find({replyTo: "e"})
// find history of "f"
> threads = db.posts.findOne({_id: "f"}).thread
> db.posts.find({_id: {$in: threads}})
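The three ancestor-array queries can be checked against the sample documents with a small in-memory sketch. It uses plain JavaScript arrays rather than a MongoDB collection, and the function names `descendantsOf`, `repliesTo`, and `historyOf` are assumptions made for illustration.

```javascript
// The six sample posts, held in memory instead of a collection.
const posts = [
  { _id: "a" },
  { _id: "b", thread: ["a"], replyTo: "a" },
  { _id: "c", thread: ["a", "b"], replyTo: "b" },
  { _id: "d", thread: ["a", "b"], replyTo: "b" },
  { _id: "e", thread: ["a"], replyTo: "a" },
  { _id: "f", thread: ["a", "e"], replyTo: "e" },
];

// Posts that list `id` among their ancestors, i.e. everything below `id`.
function descendantsOf(id) {
  return posts.filter((p) => (p.thread || []).includes(id));
}

// Direct replies to `id`.
function repliesTo(id) {
  return posts.filter((p) => p.replyTo === id);
}

// The chain of ancestors leading to `id`: read its thread array,
// then fetch every post whose _id is in that array.
function historyOf(id) {
  const post = posts.find((p) => p._id === id);
  const thread = post && post.thread ? post.thread : [];
  return posts.filter((p) => thread.includes(p._id));
}
```

Because each node carries its full ancestor list, both "everything under b" and "everything above f" need no recursion.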
Trees as Paths
Store hierarchy as a path expression
- Separate each node by a delimiter, e.g. /
- Use text search to find parts of a tree

{ comments: [
    { author: "Kyle", text: "initial post", path: "/" },
    { author: "Jim", text: "Jim's comment", path: "/jim" },
    { author: "Kyle", text: "Kyle's reply to Jim", path: "/jim/kyle" }
  ] }

// Find the conversations Jim was part of
> db.posts.find({"comments.path": /^\/jim/})
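A prefix match on the stored path can be sketched in plain JavaScript as follows. This is an in-memory illustration only; the helper names `escapeRegExp` and `subtree` are assumptions, and the pattern anchors on a full path segment so that a path such as "/jimmy" would not match "/jim".

```javascript
// The sample comments, held in memory instead of a collection.
const comments = [
  { author: "Kyle", text: "initial post", path: "/" },
  { author: "Jim", text: "Jim's comment", path: "/jim" },
  { author: "Kyle", text: "Kyle's reply to Jim", path: "/jim/kyle" },
];

// Escape regex metacharacters so the prefix is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Comments whose path is `prefix` itself or lies below it.
function subtree(prefix) {
  const re = new RegExp("^" + escapeRegExp(prefix) + "(/|$)");
  return comments.filter((c) => re.test(c.path));
}
```

Anchoring the expression at the start of the string matters in practice, since a left-anchored prefix pattern is the kind a database index can help with.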
Queue
http://www.flickr.com/photos/deanspic/4960440218
Need to maintain order and state
Ensure that updates are atomic
db.jobs.save({ inprogress: false, priority: 1, ... });

// find highest priority job and mark as in-progress
job = db.jobs.findAndModify({
  query:  {inprogress: false},
  sort:   {priority: -1},
  update: {$set: {inprogress: true, started: new Date()}},
  new:    true})
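The claim-a-job step can be illustrated with an in-memory sketch. It is not the MongoDB API: the `jobs` array, its sample rows, and the function name `claimNextJob` are assumptions. The point it shows is that selecting and marking happen as one step, so two workers cannot claim the same job.

```javascript
// Hypothetical sample queue held in memory.
const jobs = [
  { _id: 1, inprogress: false, priority: 1 },
  { _id: 2, inprogress: false, priority: 5 },
  { _id: 3, inprogress: true, priority: 9 },
];

// Pick the highest-priority job that is not in progress and mark it
// as started. Selection and update run in one synchronous step, which
// is the in-memory analogue of a single atomic find-and-modify.
function claimNextJob(now = new Date()) {
  const candidates = jobs
    .filter((j) => !j.inprogress)
    .sort((a, b) => b.priority - a.priority);
  if (candidates.length === 0) return null; // nothing left to claim
  const job = candidates[0];
  job.inprogress = true;
  job.started = now;
  return job; // the updated job, like `new: true`
}
```

A worker loop would call `claimNextJob()` repeatedly and stop when it returns `null`.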
Inventory
User has a number of "votes" they can use
A finite stock that you can "sell"
A resource that can be "provisioned"
// Number of votes and who the user voted for
{ _id: "alvin", votes: 42, voted_for: [] }

// Subtract a vote and add the blog voted for
db.user.update(
  { _id: "alvin", votes: {$gt: 0}, voted_for: {$ne: "Destination Moon"} },
  { "$push": {voted_for: "Destination Moon"}, "$inc": {votes: -1} })
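The guard conditions in this update are what make the pattern safe: the change applies only if the user still has votes and has not already voted for the same item. A minimal in-memory sketch of that logic follows; the `users` array, the starting count of 2, and the function name `castVote` are assumptions for the example, not part of the original.

```javascript
// Hypothetical user record held in memory.
const users = [{ _id: "alvin", votes: 2, voted_for: [] }];

// Apply a vote only when every guard holds; return whether it was applied.
function castVote(userId, target) {
  const user = users.find(
    (u) => u._id === userId && u.votes > 0 && !u.voted_for.includes(target)
  );
  if (!user) return false; // no votes left, already voted, or unknown user
  user.voted_for.push(target); // record who was voted for
  user.votes -= 1; // spend one vote
  return true;
}
```

Checking the condition and applying the change in one step means the count can never drop below zero, even when requests arrive back to back.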
Large, deeply nested documents
One size fits all collections
One collection per user
Too many indexes; wrong keys indexed
Frequent queries do not use index
Summary
Schema design is different in MongoDB
Basic data design principles stay the same
Focus on how the application manipulates data
Rapidly evolve schema to meet your requirements
Enjoy your new freedom, use it wisely :-)
Part 3 - Scaling
Scaling
[Figure: writes against a single shard (A-Z), then two shards (A-M, N-Z), then three shards (A-H, I-Q, R-Z).]
[Figure: a single shard at a 3:1 data-to-memory ratio versus three shards (A-H, I-Q, R-Z) at 1:1.]
[Figure: replication, with reads and writes against 300 GB of data (A-Z).]
Sharding internals
MongoDB's sharding handles the scale problem by chunking:
- Break up pieces of data into smaller chunks, spread across many data nodes
- Each data node contains many chunks
- If a chunk gets too large or a node is overloaded, data can be rebalanced
[Figure: a large dataset with username as the primary key, divided into chunks by key range (s, t, u, v, w, x, y, z).]
Scaling
[Figure: the large dataset spread across Data Nodes 1 to 4, each holding 25% of the chunks.]
Representing data as chunks allows many levels of scale across n data nodes
Scaling
[Figure: chunks across Data Nodes 1 to 5 before rebalancing]
Data Node 2: b u z g
Data Node 3: t e f w
Data Node 4: v h y d
Data Node 5: (empty)
The goal is equilibrium - an equal distribution. As nodes are added (or even removed), chunks can be redistributed for balance.
Data Node 2: z g
Data Node 3: t e f
Data Node 4: v h d
Data Node 5: w b y x
Write to key "ziggy"
[Figure: the write goes to the z chunk on Data Node 2, which then splits into z1 and z2.]
Data Node 2: z2 u z1 g
Data Node 3: t e f
Data Node 4: v h d
Data Node 5: w b y x
Each new part of the Z chunk (left & right) now contains half of the keys
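As a rough illustration of a split, the sketch below divides a chunk's keys at the midpoint so that each half holds half of the keys. It is a simplification under stated assumptions: a chunk is modelled as a plain list of keys, the function name `splitChunk` is hypothetical, and a real system would choose split points by its own rules rather than by a simple key count.

```javascript
// Split a chunk (modelled as a list of keys) into a left and right half.
// Keys are sorted first so each half covers a contiguous key range.
function splitChunk(keys) {
  const sorted = [...keys].sort(); // copy, so the input is not mutated
  const mid = Math.ceil(sorted.length / 2);
  return { left: sorted.slice(0, mid), right: sorted.slice(mid) };
}

// Example keys are hypothetical, apart from "ziggy".
const { left, right } = splitChunk(["zoe", "zack", "ziggy", "zed"]);
```

Here `left` and `right` play the roles of z1 and z2: together they still cover every key the original chunk held.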
Data Node 2: z1 g
Data Node 3: t e f
Data Node 4: v h z2 d
Data Node 5: w b y x
As chunks continue to grow and split, they can be rebalanced to keep an equal share of data on each server.
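The idea of rebalancing toward an equal share can be sketched as a simple loop: keep moving one chunk from the fullest node to the emptiest node until their counts differ by at most one. This is a toy model, not MongoDB's balancer; the function name `rebalance`, the node names, and the count-based policy are all assumptions made for illustration.

```javascript
// Even out chunk counts across nodes. `nodes` maps a node name to its
// list of chunks; a new object is returned and the input is left as is.
function rebalance(nodes) {
  const result = {};
  for (const [name, chunks] of Object.entries(nodes)) result[name] = [...chunks];
  const names = Object.keys(result);
  if (names.length === 0) return result;

  while (true) {
    // Find the fullest and emptiest nodes by chunk count.
    let most = names[0];
    let least = names[0];
    for (const n of names) {
      if (result[n].length > result[most].length) most = n;
      if (result[n].length < result[least].length) least = n;
    }
    // Stop once the spread is at most one chunk.
    if (result[most].length - result[least].length <= 1) break;
    result[least].push(result[most].pop()); // migrate one chunk
  }
  return result;
}

// Starting layout with a newly added, empty node (names are hypothetical).
const balanced = rebalance({
  node2: ["b", "u", "z", "g"],
  node3: ["t", "e", "f", "w"],
  node4: ["v", "h", "y", "d"],
  node5: [],
});
```

With twelve chunks on four nodes, the loop ends with three chunks per node, which is the equilibrium the slides describe.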
Summary
Scaling is simple
Add capacity before you need it
System automatically re-balances your data
No downtime to add capacity
No code changes required
download at mongodb.org
alvin@10gen.com
http://bit.ly/mongo>
http://linkd.in/joinmongo