1

Jared Rosoff
@forjared
Open source, high performance database
2
Challenges
• Variably typed data
• Complex objects
• High transaction rates
• Large data size
• High availability
Opportunities
• Agility
• Cost reduction
• Simplification
3
NEW ARCHITECTURES
• Systems scaling horizontally, not vertically
• Commodity servers
• Cloud Computing
VOLUME AND TYPE OF DATA
• Trillions of records
• 10's of millions of queries per second
• Volume of data
• Semi-structured and unstructured data
AGILE DEVELOPMENT
• Iterative and continuous
• New and emerging apps
4
DEVELOPER PRODUCTIVITY DECREASES
• Needed to add new software layers of ORM, Caching, Sharding and Message Queue
• Polymorphic, semi-structured and unstructured data not well supported
COST OF DATABASE INCREASES
• Increased database licensing cost
• Vertical, not horizontal, scaling
• High cost of SAN
[Timeline markers: PROJECT START, LAUNCH, +30 DAYS, +90 DAYS, +6 MONTHS, +1 YEAR]
[Timeline events: DENORMALIZE DATA MODEL, STOP USING JOINS, CUSTOM CACHING LAYER, CUSTOM SHARDING]
5
How we got here
6
• Variably typed data
• Complex data objects
• High transaction rates
• Large data size
• High availability
• Agile development
7
• Metadata management
• EAV anti-pattern
• Sparse table anti-pattern
8
Entity | Attribute    | Value
1      | Type         | Book
1      | Title        | Inside Intel
1      | Author       | Andy Grove
2      | Type         | Laptop
2      | Manufacturer | Dell
2      | Model        | Inspiron
2      | RAM          | 8gb
3      | Type         | Cereal
Difficult to query
Difficult to index
Expensive to query
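To make the "difficult to query" point concrete, here is a minimal sketch in plain JavaScript. It is not MongoDB code, and the names `eavRows` and `pivot` are illustrative only. It shows that Entity-Attribute-Value rows have to be pivoted back into one object per entity before even a simple filter can run, whereas a document already has that shape.

```javascript
// Rows from the Entity-Attribute-Value table above.
const eavRows = [
  { entity: 1, attribute: "Type", value: "Book" },
  { entity: 1, attribute: "Title", value: "Inside Intel" },
  { entity: 1, attribute: "Author", value: "Andy Grove" },
  { entity: 2, attribute: "Type", value: "Laptop" },
  { entity: 2, attribute: "Manufacturer", value: "Dell" },
  { entity: 2, attribute: "Model", value: "Inspiron" },
  { entity: 2, attribute: "RAM", value: "8gb" },
  { entity: 3, attribute: "Type", value: "Cereal" },
];

// Hypothetical helper: rebuild one object per entity from its rows.
function pivot(rows) {
  const byEntity = new Map();
  for (const { entity, attribute, value } of rows) {
    if (!byEntity.has(entity)) byEntity.set(entity, { _id: entity });
    byEntity.get(entity)[attribute] = value;
  }
  return [...byEntity.values()];
}

// Only after pivoting can we ask "which entities are laptops?".
const pivotedDocuments = pivot(eavRows);
const laptopDocuments = pivotedDocuments.filter((d) => d.Type === "Laptop");
console.log(laptopDocuments);
```

In a relational store the same pivot is typically one self-join per attribute, which is where the query and indexing cost comes from.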
9
Type   | Title         | Author     | Manufacturer | Model    | RAM | Screen Width
Book   | Inside Intel  | Andy Grove |              |          |     |
Laptop |               |            | Dell         | Inspiron | 8gb | 12"
TV     |               |            | Panasonic    | Viera    |     | 32"
MP3    | Margin Walker | Fugazi     | Dischord     |          |     |
Constant schema changes
Space inefficient
Overloaded fields
10
Type   | Manufacturer | Model    | RAM | Screen Width
Laptop | Dell         | Inspiron | 8gb | 12"

Manufacturer | Model | Screen Width
Panasonic    | Viera | 52"

Title         | Author | Manufacturer
Margin Walker | Fugazi | Dischord
Querying is difficult
Hard to add more types
11
Complex
objects
12
Challenges
• Constant load from client
• Far in excess of single server capacity
• Can never take the system down
[Diagram: many concurrent queries hitting a single Database]
13
Challenges
• Adding more storage over time
• Aging out data that's no longer needed
• Minimizing resource overhead of "cold" data
[Diagram: Recent Data on Fast Storage, Old Data on Archival Storage; Add Capacity]
14
15
16
• Rigid schemas
Variably typed
data
• Normalization can be hard
• Dependent on joins
Complex Objects
• Vertical scaling
• Poor data locality
High transaction
rate
• Difficult to maintain consistency & HA
• HA a bolt-on to many RDBMS
High Availability
• Schema changes
• Monolithic data model
Agile
Development
17
A new data model
18

var post = { author: "Jared",
             date: new Date(),
             text: "NoSQL Now 2012",
             tags: ["NoSQL", "MongoDB"] }

> db.posts.save(post)
19
> db.posts.find()

{ _id : ObjectId("4c4ba3c0672c683e3e8aabf3"),
  author : "Jared",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text : "NoSQL Now 2012",
  tags : [ "NoSQL", "MongoDB" ] }

Notes:
- _id is unique, but can be anything you'd like
20
Create index on any Field in Document

// 1 means ascending, -1 means descending

> db.posts.ensureIndex({author: 1})

> db.posts.find({author: 'Jared'})

{ _id : ObjectId("4c4ba3c0672c683e3e8aabf3"),
  author : "Jared",
  ... }
21
• Conditional Operators
  – $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
  – $lt, $lte, $gt, $gte

// find posts with any tags
> db.posts.find( {tags: {$exists: true }} )

// find posts matching a regular expression
> db.posts.find( {author: /^Jar*/i } )

// count posts by author
> db.posts.find( {author: 'Jared'} ).count()
22
• $set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit
> comment = { author: "Brendan",
              date: new Date(),
              text: "I want a freakin pony" }

// the update document wraps the operator: { $push: { field: value } }
> db.posts.update( { _id: "..." },
                   { $push: { comments: comment } } );

23
{ _id : ObjectId("4c4ba3c0672c683e3e8aabf3"),
  author : "Jared",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text : "NoSQL Now 2012",
  tags : [ "NoSQL", "MongoDB" ],
  comments : [
    {
      author : "Brendan",
      date : "Sat Jul 24 2010 20:31:03 GMT-0700 (PDT)",
      text : "I want a freakin pony"
    }
  ] }
24
// Index nested documents
> db.posts.ensureIndex( { "comments.author": 1 } )
> db.posts.find( { 'comments.author': 'Brendan' } )
// Index on tags
> db.posts.ensureIndex( { tags: 1 } )
> db.posts.find( { tags: 'MongoDB' } )

// geospatial index
> db.posts.ensureIndex( { "author.location": "2d" } )
> db.posts.find( { "author.location" : { $near : [22,42] } } )


25
db.posts.aggregate(
  { $project : {
      author : 1,
      tags : 1
  } },
  { $unwind : "$tags" },
  // group by the value of each tag
  { $group : {
      _id : { tags : "$tags" },
      authors : {
        $addToSet : "$author"
      } } }
);
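As a rough illustration of what this pipeline computes, the sketch below reproduces the three stages in plain JavaScript over an in-memory array. It does not call MongoDB; the function name `authorsByTag` and the second sample post are assumptions made for the example, and it groups by the value of each tag.

```javascript
// Sample input; the second post is invented for illustration.
const samplePosts = [
  { author: "Jared", text: "NoSQL Now 2012", tags: ["NoSQL", "MongoDB"] },
  { author: "Brendan", text: "ponies", tags: ["MongoDB"] },
];

function authorsByTag(docs) {
  const groups = new Map(); // tag value -> Set of authors
  for (const { author, tags = [] } of docs) { // $project: keep author and tags
    for (const tag of tags) {                  // $unwind: one pass per tag
      if (!groups.has(tag)) groups.set(tag, new Set());
      groups.get(tag).add(author);             // $group with $addToSet
    }
  }
  return [...groups].map(([tag, authors]) => ({
    _id: { tags: tag },
    authors: [...authors],
  }));
}

const tagGroups = authorsByTag(samplePosts);
console.log(JSON.stringify(tagGroups, null, 2));
```

Using a `Set` per group mirrors `$addToSet`: an author who uses the same tag in several posts is listed once.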

26
It's highly available
27
[Diagram: Driver; Primary (Read, Write); two Secondaries (Read, Read); Asynchronous Replication]

28
[Diagram: Driver; Primary, Secondary, Secondary; Read, Read]

29
[Diagram: Automatic Leader Election; Driver; Primary, Primary, Secondary; Read, Write, Read]

30
[Diagram: Driver; Secondary, Primary, Secondary; Read, Write, Read, Read]

31
With tunable consistency
32
[Diagram: Driver; Primary, Secondary, Secondary; Read, Write]

33
[Diagram: Driver; Primary, Secondary, Secondary; Read, Write, Read]
34
Durability
35
• Fire and forget
• Wait for error
• Wait for fsync
• Wait for journal sync
• Wait for replication
36
[Diagram: Driver sends write to Primary; Primary applies in memory]
37
[Diagram: Driver sends write, then getLastError, to Primary; Primary applies in memory]
38
[Diagram: Driver sends write, then getLastError with j:true, to Primary; Primary applies in memory and writes to journal]
39
[Diagram: Driver sends write, then getLastError with w:2, to Primary; Primary applies in memory and replicates to Secondary]
40
Value         | Meaning
<n:integer>   | Replicate to N members of replica set
"majority"    | Replicate to a majority of replica set members
<m:modeName>  | Use custom error mode name
41
{ _id: "someSet",
  members: [
    { _id:0, host:"A", tags: { dc: "ny"}},
    { _id:1, host:"B", tags: { dc: "ny"}},
    { _id:2, host:"C", tags: { dc: "sf"}},
    { _id:3, host:"D", tags: { dc: "sf"}},
    { _id:4, host:"E", tags: { dc: "cloud"}}
  ],
  settings: {
    getLastErrorModes: {
      veryImportant: { dc: 3 },
      sortOfImportant: { dc: 2 }
    }
  }
}
These are the modes you can use in write concern
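The sketch below is one way to read the `w` values and the custom modes together. It is plain JavaScript, not driver or server code, and `writeConcernSatisfied` is a hypothetical helper. It assumes a numeric `w` counts acknowledging members, that "majority" means more than half of all members, and that a mode such as `{ dc: 3 }` is met once the acknowledging members cover three distinct values of the `dc` tag.

```javascript
// Members and modes copied from the replica set configuration above.
const rsMembers = [
  { host: "A", tags: { dc: "ny" } },
  { host: "B", tags: { dc: "ny" } },
  { host: "C", tags: { dc: "sf" } },
  { host: "D", tags: { dc: "sf" } },
  { host: "E", tags: { dc: "cloud" } },
];
const getLastErrorModes = {
  veryImportant: { dc: 3 },
  sortOfImportant: { dc: 2 },
};

// Hypothetical helper: has a write acknowledged by `ackedHosts` met `w`?
function writeConcernSatisfied(w, ackedHosts) {
  if (Number.isInteger(w)) return ackedHosts.length >= w;
  if (w === "majority") {
    return ackedHosts.length >= Math.floor(rsMembers.length / 2) + 1;
  }
  const mode = getLastErrorModes[w];
  if (!mode) throw new Error("unknown write concern mode: " + w);
  // Each tag in the mode needs that many distinct values among the acks.
  return Object.entries(mode).every(([tag, distinctNeeded]) => {
    const seen = new Set();
    for (const m of rsMembers) {
      if (ackedHosts.includes(m.host) && tag in m.tags) seen.add(m.tags[tag]);
    }
    return seen.size >= distinctNeeded;
  });
}

console.log(writeConcernSatisfied("sortOfImportant", ["A", "B"])); // both hosts are in "ny"
console.log(writeConcernSatisfied("sortOfImportant", ["A", "C"])); // hosts in "ny" and "sf"
```

The point of tag-based modes is that two acknowledgements from the same data center do not count as two data centers.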
42
• Between 0..1000
• Highest member that is up to date wins
  – up to date == within 10 seconds of primary
• If a higher priority member catches up, it will force election and win
Primary
priority = 3
Secondary
priority = 2
Secondary
priority = 1
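The priority rules above can be sketched as a small selection function. This is a simplified model in plain JavaScript, not the real election protocol: it ignores voting and majorities, `pickPrimary` and the field names are made up for the example, and treating priority 0 as "never eligible" is an added assumption.

```javascript
const UP_TO_DATE_WINDOW_SECONDS = 10; // "within 10 seconds of primary"

// Hypothetical helper: highest-priority member that is reachable and fresh.
function pickPrimary(members) {
  const candidates = members.filter(
    (m) =>
      m.reachable &&
      m.priority > 0 &&
      m.secondsBehind <= UP_TO_DATE_WINDOW_SECONDS
  );
  if (candidates.length === 0) return null;
  return candidates.reduce((best, m) => (m.priority > best.priority ? m : best));
}

const replicaSet = [
  { host: "A", priority: 3, reachable: false, secondsBehind: 0 }, // old primary, down
  { host: "B", priority: 2, reachable: true, secondsBehind: 45 }, // lagging
  { host: "C", priority: 1, reachable: true, secondsBehind: 2 },
];

const firstWinner = pickPrimary(replicaSet);  // B is too far behind to win
replicaSet[1].secondsBehind = 3;              // B catches up
const secondWinner = pickPrimary(replicaSet); // B forces an election and wins
console.log(firstWinner.host, secondWinner.host);
```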

43
• Lags behind master by configurable time delay
• Automatically hidden from clients
• Protects against operator errors
  – Accidentally delete database
  – Application corrupts data
44
• Vote in elections
• Don't store a copy of data
• Use as tie breaker
45
Data Center
Primary
46
Data Center
Primary Secondary Secondary
Zone 1 Zone 2 Zone 3
47
Data Center
Primary Secondary
hidden = true
Secondary
backups
Zone 1 Zone 2 Zone 3
48
Active Data Center
Standby Data Center
Primary
priority = 1
Secondary
priority = 1
Secondary
priority = 0
Zone 1 Zone 2
49
West Coast DC Central DC East Coast DC
Secondary
priority = 1

Arbiter
Primary
priority = 2
Secondary
priority = 2

Secondary
priority = 1

Zone 1
Zone 2
Zone 1
Zone 2
50
Sharding
51
Shard
client client client client
mongos mongos
config
config
config
mongod
mongod
mongod
Shard
mongod
mongod
mongod
Shard
mongod
mongod
mongod
Config
Servers
52
{
  name: "Jared",
  email: "jsr@10gen.com"
}
{
  name: "Scott",
  email: "scott@10gen.com"
}
{
  name: "Dan",
  email: "dan@10gen.com"
}
> db.runCommand( { shardcollection: "test.users",
                   key: { email: 1 } } )
53
-∞ +∞
54
-∞ +∞
dan@10gen.com
jsr@10gen.com
scott@10gen.com
55
-∞ +∞
dan@10gen.com
jsr@10gen.com
scott@10gen.com
Split!
56
-∞ +∞
dan@10gen.com
jsr@10gen.com
scott@10gen.com
Split!
This is a chunk
This is a chunk
57
-∞ +∞
dan@10gen.com
jsr@10gen.com
scott@10gen.com
58
-∞ +∞
dan@10gen.com
jsr@10gen.com
scott@10gen.com
59
-∞ +∞
dan@10gen.com
jsr@10gen.com
scott@10gen.com
Split!
60
• Stored in the config servers
• Cached in MongoS
• Used to route requests and keep cluster balanced
Min Key         | Max Key         | Shard
-∞              | dan@10gen.com   | 1
dan@10gen.com   | jsr@10gen.com   | 1
jsr@10gen.com   | scott@10gen.com | 1
scott@10gen.com | +∞              | 1
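A minimal sketch of how a router could use this table: given a shard key value, find the chunk whose range contains it and read off the shard. This is plain JavaScript for illustration, not mongos internals. `MIN_KEY`, `MAX_KEY` and `findChunk` are hypothetical names, and each range is treated as including its min key and excluding its max key.

```javascript
// Sentinels standing in for -∞ and +∞ in the chunk table.
const MIN_KEY = Symbol("minKey");
const MAX_KEY = Symbol("maxKey");

// Chunk ranges from the table above (all still on shard 1).
const chunkTable = [
  { min: MIN_KEY, max: "dan@10gen.com", shard: 1 },
  { min: "dan@10gen.com", max: "jsr@10gen.com", shard: 1 },
  { min: "jsr@10gen.com", max: "scott@10gen.com", shard: 1 },
  { min: "scott@10gen.com", max: MAX_KEY, shard: 1 },
];

// Hypothetical helper: the chunk with min <= key < max.
function findChunk(key) {
  return chunkTable.find(
    (c) =>
      (c.min === MIN_KEY || c.min <= key) &&
      (c.max === MAX_KEY || key < c.max)
  );
}

const jsrChunk = findChunk("jsr@10gen.com");
console.log("jsr@10gen.com routes to shard", jsrChunk.shard);
```

Because the ranges tile the whole key space from -∞ to +∞, every key lands in exactly one chunk.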
61
[Diagram: mongos with balancer, three config servers, and chunks 1-48 distributed across Shard 1, Shard 2, Shard 3 and Shard 4. Callout: Chunks!]
62
[Diagram: mongos with balancer and three config servers. Shard 1 holds chunks 1-12; Shard 2 holds 21-24, Shard 3 holds 33-36 and Shard 4 holds 45-48. Callout: Imbalance]
63
[Diagram: same cluster. Balancer callout: Move chunk 1 to Shard 2]
64
[Diagram: same cluster, with chunk 1 shown leaving Shard 1]
65
[Diagram: same cluster. Callout: Chunk 1 now lives on Shard 2]
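The move shown in these frames can be sketched as one balancing round. This is a toy model in plain JavaScript, not the actual balancer policy: `balanceOnce` is a hypothetical helper, the `threshold` parameter and the value 8 used below are illustrative assumptions, and picking the first chunk on the fullest shard is chosen only to match the diagram.

```javascript
// Chunk placement from the "Imbalance" frame.
const clusterChunks = {
  "Shard 1": Array.from({ length: 12 }, (_, i) => i + 1), // chunks 1..12
  "Shard 2": [21, 22, 23, 24],
  "Shard 3": [33, 34, 35, 36],
  "Shard 4": [45, 46, 47, 48],
};

// Hypothetical helper: move one chunk from the fullest shard to the
// emptiest one if their chunk counts differ by at least `threshold`.
function balanceOnce(shards, threshold) {
  const names = Object.keys(shards);
  const most = names.reduce((a, b) => (shards[b].length > shards[a].length ? b : a));
  const least = names.reduce((a, b) => (shards[b].length < shards[a].length ? b : a));
  if (shards[most].length - shards[least].length < threshold) return null;
  const chunk = shards[most].shift();
  shards[least].push(chunk);
  return { chunk, from: most, to: least };
}

const firstMove = balanceOnce(clusterChunks, 8);
console.log(firstMove);
```

In the real system the new location is also recorded in the config servers so that mongos routes requests for the moved chunk to its new shard.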
66
By Shard Key: Routed
  db.users.find({email: "jsr@10gen.com"})
Sorted by shard key: Routed in order
  db.users.find().sort({email:-1})
Find by non shard key: Scatter Gather
  db.users.find({state:"CA"})
Sorted by non shard key: Distributed merge sort
  db.users.find().sort({state:1})
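The four cases on this slide reduce to two questions: does the query filter on the shard key, and does it sort on it? The sketch below encodes that in plain JavaScript. `describeRouting` is a hypothetical helper, not a mongos API, and treating any query that filters on the shard key as "routed" regardless of its sort is a simplifying assumption.

```javascript
const SHARD_KEY = "email"; // shard key of test.users

// Hypothetical helper: classify a query the way the slide does.
function describeRouting(query = {}, sort = {}) {
  const filtersOnShardKey = Object.keys(query).includes(SHARD_KEY);
  const sortFields = Object.keys(sort);
  if (filtersOnShardKey) return "routed";
  if (sortFields.length === 0) return "scatter gather";
  return sortFields[0] === SHARD_KEY ? "routed in order" : "distributed merge sort";
}

console.log(describeRouting({ email: "jsr@10gen.com" }));
console.log(describeRouting({}, { email: -1 }));
console.log(describeRouting({ state: "CA" }));
console.log(describeRouting({}, { state: 1 }));
```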
67
Inserts
  Requires shard key:
    db.users.insert({
      name: "Jared",
      email: "jsr@10gen.com"})
Removes
  Routed:
    db.users.remove({
      email: "jsr@10gen.com"})
  Scattered:
    db.users.remove({name: "Jared"})
Updates
  Routed:
    db.users.update(
      {email: "jsr@10gen.com"},
      {$set: { state: "CA"}})
  Scattered:
    db.users.update(
      {state: "FZ"},
      {$set:{ state: "CA"}}, false, true )
68
[Diagram: mongos above Shard 1, Shard 2 and Shard 3, with arrows numbered by the steps below]
1. Query arrives at MongoS
2. MongoS routes query to a single shard
3. Shard returns results of query
4. Results returned to client
69
[Diagram: mongos above Shard 1, Shard 2 and Shard 3, with arrows numbered by the steps below]
1. Query arrives at MongoS
2. MongoS broadcasts query to all shards
3. Each shard returns results for query
4. Results combined and returned to client
70
[Diagram: mongos above Shard 1, Shard 2 and Shard 3, with arrows numbered by the steps below]
1. Query arrives at MongoS
2. MongoS broadcasts query to all shards
3. Each shard locally sorts results
4. Results returned to mongos
5. MongoS merge sorts individual results
6. Combined sorted result returned to client
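The local-sort and merge steps of this distributed merge sort can be sketched in plain JavaScript. Nothing here is mongos code: the per-shard user lists are invented sample data, and `mergeSorted` is a hypothetical helper that repeatedly takes the smallest head element among the already-sorted shard results.

```javascript
// Invented sample data: what each shard holds before sorting.
const shardData = [
  [{ name: "Jared", state: "NY" }, { name: "Scott", state: "CA" }],
  [{ name: "Dan", state: "TX" }, { name: "Ann", state: "AZ" }],
  [{ name: "Bob", state: "WA" }, { name: "Eve", state: "CA" }],
];

const byState = (a, b) => (a.state < b.state ? -1 : a.state > b.state ? 1 : 0);

// Step 3: each shard sorts its own results.
const sortedPerShard = shardData.map((docs) => [...docs].sort(byState));

// Step 5: merge the sorted lists by always taking the smallest head.
function mergeSorted(lists, compare) {
  const positions = lists.map(() => 0);
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < lists.length; i++) {
      if (positions[i] >= lists[i].length) continue; // this list is exhausted
      if (best === -1 || compare(lists[i][positions[i]], lists[best][positions[best]]) < 0) {
        best = i;
      }
    }
    if (best === -1) return merged; // every list is exhausted
    merged.push(lists[best][positions[best]++]);
  }
}

const mergedUsers = mergeSorted(sortedPerShard, byState);
console.log(mergedUsers.map((u) => u.state).join(","));
```

The merge never re-sorts the full result set; it only compares the current head of each shard's list, which is why each shard sorting locally first pays off.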
71
Use cases
72
User Data Management
High Volume Data Feeds
Content Management
Operational Intelligence
Meta Data Management
73
• More machines, more sensors,
more data
• Variably structured
Machine
Generated
Data
• High frequency trading
Stock Market
Data
• Multiple sources of data
• Each changes their format
constantly
Social Media
Firehose
74
Data
Sources
Asynchronous writes
Flexible document
model can adapt to
changes in sensor
format
Write to memory with
periodic disk flush
Data
Sources
Data
Sources
Data
Sources
Scale writes over
multiple shards
75
• Large volume of state about users
• Very strict latency requirements
Ad Targeting
• Expose report data to millions of customers
• Report on large volumes of data
• Reports that update in real time
Real time
dashboards
• What are people talking about?
Social Media
Monitoring
76
Dashboards
API
Low latency reads
Parallelize queries
across replicas and
shards
In database
aggregation
Flexible schema
adapts to changing
input data
Can use same cluster
to collect, store, and
report on data
77
• Intuit hosts more than 300,000 websites
• Wanted to collect and analyze data to recommend conversion and lead generation improvements to customers
• With 10 years' worth of user data, it took several days to process the information using a relational database
Problem
Why MongoDB
• In one week Intuit was able to become proficient in MongoDB development
• Developed application features more quickly for MongoDB than for relational databases
• MongoDB was 2.5 times faster than MySQL
Impact
Intuit relies on a MongoDB-powered real-time analytics tool for small businesses to derive interesting and actionable patterns from their customers' website traffic
We did a prototype for one week, and within one week we had made big progress. Very big progress. It was so amazing that we decided, 'Let's go with this.' -Nirmala Ranganathan, Intuit
78
[Diagram: four numbered steps in a user's journey: See Ad, See Ad, Click, Convert]
{ cookie_id: "1234512413243",
  advertiser: {
    apple: {
      actions: [
        { impression: "ad1", time: 123 },
        { impression: "ad2", time: 232 },
        { click: "ad2", time: 235 },
        { add_to_cart: "laptop",
          sku: "asdf23f",
          time: 254 },
        { purchase: "laptop", time: 354 }
      ]
    }
  }
}
Rich profiles
collecting multiple
complex actions
Scale out to support
high throughput of
activities tracked
Indexing and
querying to support
matching, frequency
capping
Dynamic schemas
make it easy to track
vendor specific
attributes
79
• Meta data about artifacts
• Content in the library
Data
Archiving
• Have data sources that you don’t have access
to
• Stores meta-data on those stores and figure out
which ones have the content
Information
discovery
• Retina scans
• Finger prints
Biometrics
80
{ ISBN: "00e8da9b",
  type: "Book",
  country: "Egypt",
  title: "Ancient Egypt"
}
{ type: "Artefact",
  medium: "Ceramic",
  country: "Egypt",
  year: "3000 BC"
}
Flexible data model
for similar, but
different objects
Indexing and rich
query API for easy
searching and sorting
db.archives.
  find({ "country": "Egypt" });
81
• Managing 20TB of data (six billion images for millions of customers) partitioning by function.
• Home-grown key value store on top of their Oracle database offered sub-par performance
• Codebase for this hybrid store became hard to manage
• High licensing, HW costs

Problem
• JSON-based data structure
• Provided Shutterfly with an agile, high performance, scalable solution at a low cost.
• Works seamlessly with Shutterfly's services-based architecture
Why MongoDB
• 300% cost reduction and 900% performance improvement compared to previous Oracle implementation
• Accelerated time-to-market for nearly a dozen projects on MongoDB
• Improved performance by reducing average latency for inserts from 400ms to 2ms.

Impact
Shutterfly uses MongoDB to safeguard more than six billion images for millions of customers in the form of photos and videos, and turn everyday pictures into keepsakes
The "really killer reason" for using MongoDB is its rich JSON-based data structure, which offers Shutterfly an agile approach to develop software. With MongoDB, the Shutterfly team can quickly develop and deploy new applications, especially Web 2.0 and social features. -Kenny Gorman, Director of Data Services

82
• Comments and user generated
content
• Personalization of content, layout
News Site
• Generate layout on the fly for each
device that connects
• No need to cache static pages
Multi-Device
rendering

• Store large objects
• Simple modeling of metadata
Sharing
83
{ camera: "Nikon d4",
  location: [ -122.418333, 37.775 ]
}
{ camera: "Canon 5d mkII",
  people: [ "Jim", "Carol" ],
  taken_on: ISODate("2012-03-07T18:32:35.002Z")
}
{ origin: "facebook.com/photos/x?df23fsdf",
  license: "Creative Commons CC0",
  size: {
    dimensions: [ 124, 52 ],
    units: "pixels"
  }
}
Flexible data model
for similar, but
different objects
Horizontal scalability
for large data sets
Geo spatial indexing
for location based
searches
GridFS for large
object storage
84
• Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources
• Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts
• Initially launched entirely on MySQL but quickly hit performance road blocks

Problem
Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don't spend time worrying about the database, we can spend more time writing code for our application. -Tony Tam, Vice President of Engineering and Technical Co-founder
• Migrated 3 billion records in a single day with zero downtime
• MongoDB powers every website request: 20m API calls per day
• Ability to eliminate memcached layer, creating a simplified system that required fewer resources and was less prone to error.
Why MongoDB
• Reduced code by 75% compared to MySQL
• Fetch time cut from 400ms to 60ms
• Sustained insert speed of 8k words per second, with frequent bursts of up to 30k per second
• Significant cost savings and 13% reduction in servers

Impact
Wordnik uses MongoDB as the foundation for its "live" dictionary that stores its entire text corpus: 3.5T of data in 20 billion records
85
• Scale out to large graphs
• Easy to search and
process
Social
Graphs
• Authentication,
Authorization and
Accounting
Identity
Management
86
Social Graph
Documents enable
disk locality of all
profile data for a user
Sharding partitions
user profiles across
available servers
Native support for
Arrays makes it easy
to store connections
inside user profile
87
Review
88
• Variety, velocity and volume make it difficult
• Documents are easier for many use cases
89
• Distributed by default
• High availability
• Cloud deployment
90
• Document oriented data model
• Highly available deployments
• Strong consistency model
• Horizontally scalable architecture