Schema Tricks & Tips

#MongoDBdays
Schema Design 4 Real World Use Cases

Alvin Richards
Technical Director, 10gen
@jonnyeight
alvin@10gen.com
alvinonmongodb.com
One size fits all?
Agenda
Why is schema design important
4 Real World Schemas
  Inbox
  History
  Indexed Attributes
  Multiple Identities
Conclusions
Why is Schema Design important?

Largest factor for a performant system
Schema design with MongoDB is different

RDBMS: "What answers do I have?"
MongoDB: "What question will I have?"
#1 - Message Inbox
Let's get Social
Sending Messages
Design Goals
Efficiently send new messages to recipients
Efficiently read inbox
Reading my Inbox
3 Approaches (there are more)

Fan out on Read
Fan out on Write
Fan out on Write with Bucketing
Fan out on read

// Shard on "from"
db.shardCollection( "mongodbdays.inbox", { from: 1 } )

// Make sure we have an index to handle inbox reads
db.inbox.ensureIndex( { to: 1, sent: 1 } )

msg = {
  from: "Joe",
  to: [ "Bob", "Jane" ],
  sent: new Date(),
  message: "Hi!"
}

// Send a message
db.inbox.save( msg )

// Read my inbox
db.inbox.find( { to: "Joe" } ).sort( { sent: -1 } )
Fan out on read - Send Message
[Diagram: Send Message; Shard 1, Shard 2, Shard 3]
Fan out on read - Inbox Read
[Diagram: Read Inbox; Shard 1, Shard 2, Shard 3]
Considerations
1 document per message sent
Multiple recipients in an array key
Reading an inbox is finding all messages with my own name in the recipient field
Requires scatter-gather on a sharded cluster
Then a lot of random IO on a shard to find everything
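To make the scatter-gather cost concrete, here is a minimal in-memory sketch in plain JavaScript (Node.js), not mongo shell code. `NUM_SHARDS`, `shardFor`, `send` and `readInbox` are hypothetical names, and the toy string hash only stands in for routing on the "from" shard key. Sending touches a single shard, while reading has to search every shard because "to" is not part of the shard key.

```javascript
// Hypothetical in-memory model of a sharded "inbox" collection (not the MongoDB API).
const NUM_SHARDS = 3;
const shards = Array.from({ length: NUM_SHARDS }, () => []);

// Toy string hash standing in for routing on the shard key (assumption).
function shardFor(key) {
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % NUM_SHARDS;
}

// Send: one document per message, placed by the "from" shard key.
function send(msg) {
  shards[shardFor(msg.from)].push(msg);
}

// Read: "to" is not the shard key, so every shard must be searched (scatter-gather).
function readInbox(user) {
  const found = [];
  let shardsTouched = 0;
  for (const shard of shards) {
    shardsTouched++;
    for (const doc of shard) {
      if (doc.to.includes(user)) found.push(doc);
    }
  }
  found.sort((a, b) => b.sent - a.sent); // newest first, like sort({ sent: -1 })
  return { messages: found, shardsTouched };
}

send({ from: "Joe", to: ["Bob", "Jane"], sent: 1, message: "Hi!" });
send({ from: "Ann", to: ["Bob"], sent: 2, message: "Lunch?" });

const bobInbox = readInbox("Bob");
console.log(bobInbox.messages.map((m) => m.message), bobInbox.shardsTouched);
// → [ 'Lunch?', 'Hi!' ] 3
```

However many shards there are, every inbox read visits all of them.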
Fan out on write

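// Shard on recipient and sent
db.shardCollection( "mongodbdays.inbox", { recipient: 1, sent: 1 } )

msg = {
  from: "Joe",
  to: [ "Bob", "Jane" ],
  sent: new Date(),
  message: "Hi!"
}

// Send a message: save a separate copy for each recipient
msg.to.forEach( function( recipient ) {
  // Build a new document each time so every copy gets its own _id
  db.inbox.save( {
    from: msg.from,
    to: msg.to,
    recipient: recipient,
    sent: msg.sent,
    message: msg.message
  } )
} )

// Read my inbox
db.inbox.find( { recipient: "Joe" } ).sort( { sent: -1 } )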
Fan out on write - Send Message
[Diagram: Send Message; Shard 1, Shard 2, Shard 3]
Fan out on write - Read Inbox
[Diagram: Read Inbox; Shard 1, Shard 2, Shard 3]
Considerations
1 document per recipient
Reading my inbox is just finding all of the messages with me as the recipient
Can shard on recipient, so inbox reads hit one shard
But still lots of random IO on the shard
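The same kind of toy model shows the trade-off of fan out on write. This is a sketch under assumptions, in plain JavaScript rather than mongo shell code: `shardFor` is a hypothetical stand-in for routing on the "recipient" shard key prefix. A read now goes to one shard, but the number of documents it fetches still equals the number of messages, which is where the random IO comes from.

```javascript
// Hypothetical in-memory model of a sharded "inbox" collection (not the MongoDB API).
const NUM_SHARDS = 3;
const shards = Array.from({ length: NUM_SHARDS }, () => []);

// Toy string hash standing in for routing on the "recipient" shard key prefix (assumption).
function shardFor(key) {
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % NUM_SHARDS;
}

// Send: write a separate copy of the message for each recipient.
function send(msg) {
  for (const recipient of msg.to) {
    shards[shardFor(recipient)].push({ ...msg, recipient });
  }
}

// Read: only the recipient's shard is queried, but each message is its own document.
function readInbox(user) {
  const messages = shards[shardFor(user)]
    .filter((doc) => doc.recipient === user)
    .sort((a, b) => b.sent - a.sent); // newest first
  return { messages, documentsRead: messages.length };
}

for (let i = 1; i <= 5; i++) {
  send({ from: "Joe", to: ["Bob", "Jane"], sent: i, message: `msg ${i}` });
}
const bob = readInbox("Bob");
console.log(bob.documentsRead, bob.messages[0].message); // → 5 msg 5
```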
Fan out on write with buckets

Each inbox document is an array of messages
Append a message onto the inbox of the recipient
Bucket inbox documents so there aren't too many messages per document
Can shard on recipient, so inbox reads hit one shard
A few documents to read the whole inbox
Fan out on write with buckets

// Shard on owner / sequence
db.shardCollection( "mongodbdays.inbox", { owner: 1, sequence: 1 } )
db.shardCollection( "mongodbdays.users", { user_name: 1 } )

msg = {
  from: "Joe",
  to: [ "Bob", "Jane" ],
  sent: new Date(),
  message: "Hi!"
}

// Send a message
msg.to.forEach( function( recipient ) {
  // Atomically bump the recipient's message counter (creating the user if needed)
  count = db.users.findAndModify( {
    query: { user_name: recipient },
    update: { "$inc": { "msg_count": 1 } },
    upsert: true,
    new: true
  } ).msg_count;

  // The counter starts at 1, so subtract 1 to put exactly 50 messages in each bucket
  sequence = Math.floor( ( count - 1 ) / 50 );

  // Append to the recipient's current bucket, creating it on first use
  db.inbox.update(
    { owner: recipient, sequence: sequence },
    { $push: { "messages": msg } },
    { upsert: true }
  );
} )

// Read my inbox: the two most recent buckets
db.inbox.find( { owner: "Joe" } ).sort( { sequence: -1 } ).limit( 2 )
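The bucketing arithmetic is easy to get off by one, so here is a small in-memory sketch in plain JavaScript (not the mongo shell). `msgCount` stands in for the `users.msg_count` counter and `inbox` for the bucket documents; both are hypothetical. Because a counter incremented with upsert starts at 1, the sketch computes the bucket as `Math.floor((count - 1) / BUCKET_SIZE)` so every bucket, including the first, holds exactly 50 messages.

```javascript
const BUCKET_SIZE = 50;
const msgCount = new Map(); // stands in for users.msg_count (hypothetical)
const inbox = new Map();    // "owner|sequence" -> bucket document (hypothetical)

function send(msg) {
  for (const owner of msg.to) {
    // Like findAndModify with $inc, upsert: true, new: true
    const count = (msgCount.get(owner) || 0) + 1;
    msgCount.set(owner, count);
    const sequence = Math.floor((count - 1) / BUCKET_SIZE);
    const key = `${owner}|${sequence}`;
    // Like update({ owner, sequence }, { $push: ... }, { upsert: true })
    if (!inbox.has(key)) inbox.set(key, { owner, sequence, messages: [] });
    inbox.get(key).messages.push(msg);
  }
}

// Like find({ owner }).sort({ sequence: -1 }).limit(n)
function readLatestBuckets(owner, n) {
  return [...inbox.values()]
    .filter((b) => b.owner === owner)
    .sort((a, b) => b.sequence - a.sequence)
    .slice(0, n);
}

for (let i = 1; i <= 120; i++) {
  send({ from: "Joe", to: ["Bob"], sent: i, message: `msg ${i}` });
}
const latest = readLatestBuckets("Bob", 2);
console.log(latest.map((b) => [b.sequence, b.messages.length])); // → [ [ 2, 20 ], [ 1, 50 ] ]
```

Reading the newest 70 messages here costs two document fetches instead of seventy.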
Bucketed fan out on write - Send
[Diagram: Send Message; Shard 1, Shard 2, Shard 3]
Bucketed fan out on write - Read
[Diagram: Read Inbox; Shard 1, Shard 2, Shard 3]
#2 History
Design Goals
Need to retain a limited amount of history, e.g.
  Hours, Days, Weeks
May be a legislative requirement (e.g. HIPAA, SOX, DPA)
Need to query efficiently by
  match
  ranges

3 Approaches (there are more)
Bucket by Number of messages
Fixed size Array
Bucket by Date + TTL Collections
Inbox Bucket by # messages

db.inbox.find()
{ owner: "Joe",
  sequence: 25,
  messages: [
    { from: "Joe",
      to: [ "Bob", "Jane" ],
      sent: ISODate("2013-03-01T09:59:42.689Z"),
      message: "Hi!"
    }
  ] }

// Query with a date range
db.inbox.find( { owner: "friend1",
  messages: { $elemMatch: { sent: { $gte: ISODate("2013-04-04") } } } } )

// Remove elements based on a date: pull messages older than the cutoff from every bucket
db.inbox.update(
  { owner: "friend1" },
  { $pull: { messages: { sent: { $lt: ISODate("2013-04-04") } } } },
  { multi: true }
)
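A retention pass over bucketed history has two steps: pull old messages out of every bucket, then delete buckets left empty. The sketch below models both in plain JavaScript under stated assumptions: the sample documents and the helper names `bucketsWithMessagesSince`, `pullOlderThan` and `removeEmptyBuckets` are hypothetical, and each helper's comment names the shell operation it imitates. For retention it pulls messages older than a cutoff (`$lt`).

```javascript
// Hypothetical bucket documents (sample data, not from the talk).
let inbox = [
  { owner: "friend1", sequence: 0, messages: [
    { message: "old", sent: new Date("2013-03-01T09:59:42.689Z") },
    { message: "new", sent: new Date("2013-04-05T10:00:00Z") },
  ] },
  { owner: "friend1", sequence: 1, messages: [
    { message: "older", sent: new Date("2013-02-01T00:00:00Z") },
  ] },
];

// Like find({ owner, messages: { $elemMatch: { sent: { $gte: from } } } })
function bucketsWithMessagesSince(owner, from) {
  return inbox.filter((b) => b.owner === owner && b.messages.some((m) => m.sent >= from));
}

// Like update({ owner }, { $pull: { messages: { sent: { $lt: cutoff } } } }, { multi: true })
function pullOlderThan(owner, cutoff) {
  let modified = 0;
  for (const b of inbox) {
    if (b.owner !== owner) continue;
    const kept = b.messages.filter((m) => m.sent >= cutoff);
    if (kept.length !== b.messages.length) modified++;
    b.messages = kept;
  }
  return modified;
}

// Like remove({ owner, messages: { $size: 0 } }): drop buckets that are now empty
function removeEmptyBuckets(owner) {
  const before = inbox.length;
  inbox = inbox.filter((b) => !(b.owner === owner && b.messages.length === 0));
  return before - inbox.length;
}

const cutoff = new Date("2013-04-04T00:00:00Z");
console.log(bucketsWithMessagesSince("friend1", cutoff).length); // → 1
console.log(pullOlderThan("friend1", cutoff), removeEmptyBuckets("friend1")); // → 2 1
```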
Considerations
Shrinking documents: space can be reclaimed with
db.runCommand( { compact: '<collection>' } )
Remove the document after the last element in the array has been removed

{ "_id" : , "messages" : [ ], "owner" : "friend1", "sequence" : 0 }
Maintain the latest Fixed Size Array

msg = {
  from: "Your Boss",
  to: [ "Bob" ],
  sent: new Date(),
  message: "CALL ME NOW!"
}

// 2.4 introduces $each, $sort and $slice for $push
db.messages.update(
  { _id: 1 },
  { $push: { messages: { $each: [ msg ], $sort: { sent: 1 }, $slice: -50 } } }
)
Considerations
Need to compute the size of the array based on the retention period
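Sizing the fixed array is simple arithmetic once you assume a message rate. The sketch below is plain JavaScript; `RETENTION_DAYS` and `EXPECTED_MESSAGES_PER_DAY` are made-up inputs (the talk gives no numbers), and `pushCapped` is a hypothetical in-memory equivalent of `$push` with `$each`, `$sort: { sent: 1 }` and a negative `$slice`, which keeps the newest elements.

```javascript
// Assumed retention inputs; the talk does not give concrete numbers.
const RETENTION_DAYS = 7;
const EXPECTED_MESSAGES_PER_DAY = 20;
const MAX_MESSAGES = RETENTION_DAYS * EXPECTED_MESSAGES_PER_DAY; // 140

// In-memory equivalent of
// { $push: { messages: { $each: newMessages, $sort: { sent: 1 }, $slice: -max } } }
function pushCapped(messages, newMessages, max) {
  const all = messages.concat(newMessages); // $each appends
  all.sort((a, b) => a.sent - b.sent);      // $sort: { sent: 1 }, oldest first
  return all.slice(-max);                   // $slice: -max keeps the newest `max`
}

let messages = [];
for (let sent = 1; sent <= 150; sent++) {
  messages = pushCapped(messages, [{ sent, message: `msg ${sent}` }], MAX_MESSAGES);
}
console.log(messages.length, messages[0].sent, messages[messages.length - 1].sent); // → 140 11 150
```

If the real message rate exceeds the estimate, the array covers less than the retention period, so the estimate should err high.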
TTL Collections
// messages: one doc per user per day
db.messages.findOne()
{ _id: 1, to: "Joe", sequence: ISODate("2013-02-04T00:00:00.392Z"), messages: [ ] }

// Auto expires data after 31536000 seconds = 1 year
db.messages.ensureIndex( { sequence: 1 }, { expireAfterSeconds: 31536000 } )
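The expiry time of a per-day document is its indexed date plus `expireAfterSeconds`. This plain JavaScript sketch (hypothetical helpers `expiresAt` and `isExpired`) checks that 31536000 seconds is 365 days and computes when the sample document becomes eligible for deletion. MongoDB's TTL background task runs about every 60 seconds, so actual removal can lag behind this time.

```javascript
// Values from the slide: TTL index on "sequence" with expireAfterSeconds = 31536000.
const EXPIRE_AFTER_SECONDS = 31536000;
const SECONDS_PER_DAY = 24 * 60 * 60;

// A document becomes eligible for deletion once its indexed date plus the TTL has passed.
function expiresAt(doc) {
  return new Date(doc.sequence.getTime() + EXPIRE_AFTER_SECONDS * 1000);
}

function isExpired(doc, now) {
  return now.getTime() > expiresAt(doc).getTime();
}

const doc = { _id: 1, to: "Joe", sequence: new Date("2013-02-04T00:00:00.392Z"), messages: [] };
console.log(EXPIRE_AFTER_SECONDS / SECONDS_PER_DAY); // → 365
console.log(expiresAt(doc).toISOString());           // → 2014-02-04T00:00:00.392Z
```

Because the whole day's messages share one document, they all expire together.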
#3 Indexed Attributes
Design Goal
Application needs to store a variable number of attributes, e.g.
  User-defined Form
  Meta Data tags
Queries needed
  Equality
  Range based
Need to be efficient, regardless of the number of attributes

2 Approaches (there are more)
Attributes as a Sub-Document
Attributes as Objects in an Array
Attributes as a Sub-Document
db.files.insert( { _id: "local.0", attr: { type: "text", size: 64, created: ISODate("2013-03-01T09:59:42.689Z") } } )
db.files.insert( { _id: "local.1", attr: { type: "text", size: 128 } } )
db.files.insert( { _id: "mongod", attr: { type: "binary", size: 256, created: ISODate("2013-04-01T18:13:42.689Z") } } )

// Need to create an index for each item in the sub-document
db.files.ensureIndex( { "attr.type": 1 } )
db.files.find( { "attr.type": "text" } )

// Can perform range queries
db.files.ensureIndex( { "attr.size": 1 } )
db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )
Considerations
Each attribute needs an index
Each time you extend, you add an index
Lots and lots of indexes
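One way to see the index growth is to derive the index list from the data. In this plain JavaScript sketch, `requiredIndexes` is a hypothetical helper that returns one `{ "attr.<key>": 1 }` spec per distinct attribute key, and the `notes.txt` document with an `owner` attribute is invented sample data showing what one new attribute costs.

```javascript
// Sample documents mirroring the db.files examples (dates as JS Date objects).
const files = [
  { _id: "local.0", attr: { type: "text", size: 64, created: new Date("2013-03-01T09:59:42.689Z") } },
  { _id: "local.1", attr: { type: "text", size: 128 } },
  { _id: "mongod", attr: { type: "binary", size: 256, created: new Date("2013-04-01T18:13:42.689Z") } },
];

// Hypothetical helper: one index spec per distinct attribute key,
// which the sub-document approach needs for indexed lookups on every attribute.
function requiredIndexes(docs) {
  const keys = new Set();
  for (const doc of docs) {
    for (const key of Object.keys(doc.attr)) keys.add(`attr.${key}`);
  }
  return [...keys].sort().map((key) => ({ [key]: 1 }));
}

console.log(requiredIndexes(files).length); // → 3

// Extending the schema with a new attribute (hypothetical sample) means yet another index.
files.push({ _id: "notes.txt", attr: { type: "text", owner: "joe" } });
console.log(requiredIndexes(files).length); // → 4
```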
Attributes as Objects in Array

db.files.insert( { _id: "local.0", attr: [ { type: "text" }, { size: 64 }, { created: ISODate("2013-03-01T09:59:42.689Z") } ] } )
db.files.insert( { _id: "local.1", attr: [ { type: "text" }, { size: 128 } ] } )
db.files.insert( { _id: "mongod", attr: [ { type: "binary" }, { size: 256 }, { created: ISODate("2013-04-01T18:13:42.689Z") } ] } )

// One multikey index covers every attribute
db.files.ensureIndex( { attr: 1 } )
Queries
// Range queries
db.files.find( { attr: { $gt: { size: 64 }, $lte: { size: 16384 } } } )
db.files.find( { attr: { $gte: { created: ISODate("2013-02-01T00:00:01.689Z") } } } )

// Multiple conditions: only the first predicate in the query can use the index,
// so ensure that it is the most selective.
// Index intersection will allow multiple indexes, see SERVER-3071
db.files.find( { $and: [
  { attr: { $gte: { created: ISODate("2013-02-01") } } },
  { attr: { $gt: { size: 128 }, $lte: { size: 16384 } } }
] } )

// Each $or clause can use an index
db.files.find( { $or: [
  { attr: { $gte: { created: ISODate("2013-02-01") } } },
  { attr: { $gt: { size: 128 }, $lte: { size: 16384 } } }
] } )
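Range bounds such as `{ size: 64 }` work because MongoDB compares embedded documents field by field: by the value's BSON type first, then the field name, then the value (per the MongoDB comparison/sort-order documentation). The plain JavaScript sketch below models that for single-field documents, covering only numbers, strings and dates (an assumption of the sketch). `matchesRange` requires one element to satisfy both bounds, which is what `$elemMatch` guarantees; on current MongoDB versions, a plain range on an array field may be satisfied by different elements, so check which semantics your query needs.

```javascript
// Subset of MongoDB's BSON type order used here: numbers < strings < dates.
function typeRank(value) {
  if (typeof value === "number") return 1;
  if (typeof value === "string") return 2;
  if (value instanceof Date) return 3;
  throw new Error("type not covered by this sketch");
}

// Compare single-field documents such as { size: 64 }:
// first the value's type, then the field name, then the value itself.
function compareSingleField(a, b) {
  const [keyA, valA] = Object.entries(a)[0];
  const [keyB, valB] = Object.entries(b)[0];
  const byType = typeRank(valA) - typeRank(valB);
  if (byType !== 0) return Math.sign(byType);
  if (keyA !== keyB) return keyA < keyB ? -1 : 1;
  if (valA < valB) return -1; // Dates compare by their time value
  if (valA > valB) return 1;
  return 0;
}

// $elemMatch-style range: a single element must satisfy both $gt lower and $lte upper.
function matchesRange(attr, lower, upper) {
  return attr.some(
    (el) => compareSingleField(el, lower) > 0 && compareSingleField(el, upper) <= 0
  );
}

const files = [
  { _id: "local.0", attr: [{ type: "text" }, { size: 64 }, { created: new Date("2013-03-01T09:59:42.689Z") }] },
  { _id: "local.1", attr: [{ type: "text" }, { size: 128 }] },
  { _id: "mongod", attr: [{ type: "binary" }, { size: 256 }, { created: new Date("2013-04-01T18:13:42.689Z") }] },
];

const hits = files.filter((f) => matchesRange(f.attr, { size: 64 }, { size: 16384 }));
console.log(hits.map((f) => f._id)); // → [ 'local.1', 'mongod' ]
```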
#4 Multiple Identities
Design Goal
Ability to look up by a number of different identities, e.g.
  Username
  Email address
  FB Handle
  LinkedIn URL

2 Approaches (there are more)
Identifiers in a single document
Separate Identifiers from Content
Single Document by User

db.users.findOne()
{ _id: "joe",
  email: "joe@example.com",
  fb: "joe.smith",     // facebook
  li: "joe.e.smith",   // linkedin
  other: { }
}

// Shard collection by _id
db.shardCollection( "mongodbdays.users", { _id: 1 } )

// Create indexes on each key
db.users.ensureIndex( { email: 1 } )
db.users.ensureIndex( { fb: 1 } )
db.users.ensureIndex( { li: 1 } )
Read by _id (shard key)

find( { _id: "joe" } )
[Diagram: Shard 1, Shard 2, Shard 3]
Read by email (non-shard key)

find( { email: "joe@example.com" } )
[Diagram: Shard 1, Shard 2, Shard 3]
Considerations
Lookup by shard key is routed to 1 shard
Lookup by other identifier is scatter-gathered across all shards
Secondary keys cannot have a unique index (on a sharded collection, a unique index must be prefixed by the shard key)
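Why no unique index on a secondary key? Each shard can only check the documents it holds. MongoDB refuses to create such an index rather than enforce it per shard; the plain JavaScript toy model below (hypothetical `shardFor` and `insertUser`, toy hash in place of real routing) shows the duplicate that per-shard enforcement would let through.

```javascript
// Hypothetical model: "users" sharded by _id, with a unique index on email
// that each shard can only enforce against its own documents.
const NUM_SHARDS = 3;
const shards = Array.from({ length: NUM_SHARDS }, () => ({ docs: [], emails: new Set() }));

// Toy string hash standing in for routing on the _id shard key (assumption).
function shardFor(key) {
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % NUM_SHARDS;
}

function insertUser(user) {
  const shard = shards[shardFor(user._id)];
  if (shard.emails.has(user.email)) {
    throw new Error(`duplicate key on this shard: ${user.email}`);
  }
  shard.emails.add(user.email);
  shard.docs.push(user);
}

// With this toy hash, "joe" and "jim" route to different shards,
// so the second insert is NOT rejected even though the email repeats.
insertUser({ _id: "joe", email: "joe@example.com" });
insertUser({ _id: "jim", email: "joe@example.com" });

const copies = shards.flatMap((s) => s.docs).filter((u) => u.email === "joe@example.com");
console.log(copies.length); // → 2
```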
Document per Identity

// Create unique index
db.identities.ensureIndex( { identifier : 1 }, { unique: true } )

// Create a document for each identity, pointing at the users document
db.identities.save( { identifier : { hndl: "joe" }, user: "1200-42" } )
db.identities.save( { identifier : { email: "joe@example.com" }, user: "1200-42" } )
db.identities.save( { identifier : { li: "joe.e.smith" }, user: "1200-42" } )

// Shard collection by identifier
db.shardCollection( "mongodbdays.identities", { identifier : 1 } )

// No extra index needed on users: _id always has a unique index
// Create a document that holds all the other user attributes
db.users.save( { _id: "1200-42", ... } )

// Shard collection by _id
db.shardCollection( "mongodbdays.users", { _id: 1 } )
Read requires 2 reads

db.identities.find( { "identifier" : { "hndl" : "joe" } } )
db.users.find( { _id: "1200-42" } )
[Diagram: Shard 1, Shard 2, Shard 3]
Solution
Lookup to Identities is a routed query
Lookup to Users is a routed query
Unique indexes available
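The two-read lookup can be sketched with two maps standing in for the collections. This is plain JavaScript under stated assumptions: `identities`, `users`, `identifierKey`, `addIdentity` and `findUser` are hypothetical, and the `name` field is sample data. Because the unique index is on the shard key ("identifier"), a duplicate identifier is always caught on the one shard that owns it.

```javascript
// Hypothetical in-memory stand-ins for the "identities" and "users" collections.
const identities = new Map(); // unique on identifier -> user _id
const users = new Map();      // _id -> user document

// Canonical key for an identifier sub-document such as { email: "joe@example.com" }.
function identifierKey(identifier) {
  const [type, value] = Object.entries(identifier)[0];
  return JSON.stringify([type, value]);
}

// Like inserting into identities with a unique index on "identifier".
function addIdentity(identifier, userId) {
  const key = identifierKey(identifier);
  if (identities.has(key)) throw new Error(`duplicate identifier ${key}`);
  identities.set(key, userId);
}

// Read 1: identities by identifier (routed); read 2: users by _id (routed).
function findUser(identifier) {
  const userId = identities.get(identifierKey(identifier));
  if (userId === undefined) return null;
  return users.get(userId) ?? null;
}

users.set("1200-42", { _id: "1200-42", name: "Joe" }); // "name" is sample data
addIdentity({ hndl: "joe" }, "1200-42");
addIdentity({ email: "joe@example.com" }, "1200-42");
addIdentity({ li: "joe.e.smith" }, "1200-42");

console.log(findUser({ email: "joe@example.com" })._id); // → 1200-42
```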
Conclusion
Summary
Multiple ways to model a domain problem
Understand the key use cases of your app
Balance between ease of query vs. ease of write
Random IO should be avoided
Thank You