
Senior Director of Performance Engineering, MongoDB

Alvin Richards
#MongoDBWorld
Mythbusting: Understanding How
We Measure the Performance of
MongoDB
Before we start
We are going to look a lot at
C++ kernel code
Java benchmarks
JavaScript tests
And lots of charts
And it's going to be awesome!
Measuring "Performance"
https://www.youtube.com/watch?v=7wm-pZp_mi0
Benchmarking
Some common traps
Performance measurement & diagnosis
What's next
Part One
Some Common Traps
The Milk Train Doesn't Stop Here Anymore
Tennessee Williams
"We all live in a house on fire, no fire department to
call; no way out, just the upstairs window to look out of
while the fire burns the house down with us trapped,
locked in it."
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
    for (int i = 0; i < documentsPerInsert; i++) {
        id++;
        BasicDBObject doc = new BasicDBObject();
        doc.put("_id", id);
        doc.put("k", rand.nextInt(numMaxInserts) + 1);
        String cVal = "!";
        doc.put("c", cVal);
        String padVal = "!";
        doc.put("pad", padVal);
        aDocs[i] = doc;
    }
    coll.insert(aDocs);
    numInserts += documentsPerInsert;
    globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
#1 Time taken to Insert x Documents
So that looks ok, right?
What else are you measuring?
Object creation and GC management?
Clock resolution?
Thread contention on nextInt()?
Time to synthesize data?
Thread contention on addAndGet()?
// Pre-create the objects outside the loop
BasicDBObject[] aDocs = new BasicDBObject[documentsPerInsert];
for (int i = 0; i < documentsPerInsert; i++) {
    BasicDBObject doc = new BasicDBObject();
    String cVal = "!";
    doc.put("c", cVal);
    String padVal = "!";
    doc.put("pad", padVal);
    aDocs[i] = doc;
}

Solution: Pre-Create the objects
Pre-create non-varying data outside the timing loop
Alternative
Pre-create the data in a file; load from file

// Use ThreadLocalRandom, or one instance of java.util.Random per thread
java.util.concurrent.ThreadLocalRandom rand = java.util.concurrent.ThreadLocalRandom.current();

for (long roundNum = 0; roundNum < numRounds; roundNum++) {
    for (int i = 0; i < documentsPerInsert; i++) {
        id++;
        BasicDBObject doc = aDocs[i];
        doc.put("_id", id);
        doc.put("k", rand.nextInt(numMaxInserts) + 1);
    }

    coll.insert(aDocs);
    numInserts += documentsPerInsert;
}
// Maintain the shared count outside the loop (roundNum is out of scope here)
globalInserts.addAndGet(documentsPerInsert * numRounds);

Solution: Remove contention
Remove contention on addAndGet()
Remove contention on nextInt() by making it thread-local
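The contention fix above can be sketched without a database. This is a minimal illustration, not the benchmark's actual code: the class name ContentionSketch and the methods runWorker and runAll are hypothetical, and the insert itself is left out. Each worker draws random numbers from its own ThreadLocalRandom, counts inserts in a local variable, and touches the shared AtomicLong only once.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

public class ContentionSketch {
    // Shared counter, updated once per worker instead of once per batch.
    static final AtomicLong globalInserts = new AtomicLong();

    // Hypothetical stand-in for one benchmark thread; no database is involved.
    static long runWorker(int numRounds, int documentsPerInsert, int numMaxInserts) {
        // current() is called on the worker thread, so the generator is per-thread.
        ThreadLocalRandom rand = ThreadLocalRandom.current();
        long localInserts = 0;
        long checksum = 0;
        for (int roundNum = 0; roundNum < numRounds; roundNum++) {
            for (int i = 0; i < documentsPerInsert; i++) {
                checksum += rand.nextInt(numMaxInserts) + 1; // value for field "k"
            }
            localInserts += documentsPerInsert; // thread-local, uncontended
        }
        globalInserts.addAndGet(localInserts); // the only contended update
        return checksum;
    }

    // Runs the workers in parallel and returns the total recorded in the shared counter.
    static long runAll(int threads, int numRounds, int documentsPerInsert) {
        globalInserts.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> runWorker(numRounds, documentsPerInsert, 1000));
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return globalInserts.get();
    }

    public static void main(String[] args) {
        System.out.println(runAll(4, 100, 10)); // prints 4000
    }
}
```

With 4 threads, 100 rounds and 10 documents per insert, the shared counter is updated 4 times rather than 400.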

long startTime = System.currentTimeMillis();
...
long endTime = System.currentTimeMillis();

long startTime = System.nanoTime();
...
long elapsedNanos = System.nanoTime() - startTime;

Solution: Timer resolution
nanoTime(): "resolution is at least as good as that of currentTimeMillis()"
currentTimeMillis(): "granularity of the value depends on the underlying operating system and may be larger"
Source
http://docs.oracle.com/javase/7/docs/api/java/lang/System.html
General Principle #1
Know what you are measuring
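One way to apply this principle is to keep everything except the operation under test outside the two clock reads. The sketch below is an assumption-laden illustration: TimedRegionSketch, buildBatch and timeInsertNanos are hypothetical names, and a plain Consumer stands in for the driver's insert call so that the example runs without a database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class TimedRegionSketch {
    // Setup: build the whole batch before the clock starts.
    static List<Map<String, Object>> buildBatch(int documentsPerInsert) {
        List<Map<String, Object>> batch = new ArrayList<>(documentsPerInsert);
        for (int i = 0; i < documentsPerInsert; i++) {
            Map<String, Object> doc = new HashMap<>();
            doc.put("_id", i);
            doc.put("c", "!");
            doc.put("pad", "!");
            batch.add(doc);
        }
        return batch;
    }

    // Timed region: only the call to the sink sits between the two clock reads.
    static long timeInsertNanos(List<Map<String, Object>> batch,
                                Consumer<List<Map<String, Object>>> sink) {
        long start = System.nanoTime();
        sink.accept(batch);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> batch = buildBatch(1000);
        List<Map<String, Object>> stored = new ArrayList<>();
        // stored::addAll is a hypothetical stand-in for coll.insert(...)
        long elapsedNanos = timeInsertNanos(batch, stored::addAll);
        System.out.println(stored.size() + " docs in " + elapsedNanos + " ns");
    }
}
```

Object creation and data synthesis still happen, but they no longer contribute to the reported time.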

BasicDBObject doc = new BasicDBObject();
doc.put("v", str); // str is a 2k string
for (int i = 0; i < 1000; i++) {
    doc.put("_id", i);
    coll.insert(doc);
}

BasicDBObject predicate = new BasicDBObject();

long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate);
DBObject foundObj;

while (cur.hasNext()) {
    foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
#2 Response time to return all results
So that looks ok, right?
What else are you measuring?
Each doc is 4080 bytes on disk with powerOf2Sizes
Unrestricted predicate?
Measuring
Time to parse & execute query
Time to retrieve all documents
But also
Cost of shipping ~4MB data through network stack
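The ~4MB figure is consistent with multiplying the 1000 inserted documents by the 4080 bytes quoted per document. The sketch below only reproduces that arithmetic; PayloadSketch and payloadBytes are hypothetical names, and the real number of bytes on the wire depends on the encoded document size, which may differ from the on-disk record size.

```java
public class PayloadSketch {
    // Total bytes for docCount documents of bytesPerDoc bytes each.
    static long payloadBytes(int docCount, int bytesPerDoc) {
        return (long) docCount * bytesPerDoc;
    }

    public static void main(String[] args) {
        long bytes = payloadBytes(1000, 4080);
        System.out.println(bytes + " bytes"); // prints "4080000 bytes"
    }
}
```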

BasicDBObject predicate = new BasicDBObject();
predicate.put("_id", new BasicDBObject("$gte", 10).append("$lte", 20));
BasicDBObject projection = new BasicDBObject();
projection.put("_id", 1);

long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate, projection );
DBObject foundObj;

while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
Solution: Limit the projection
Return fixed range
Only project _id
Only 46k transferred through network stack
General Principle #2
Measure only what you need to measure

Part Two
Performance measurement
& diagnosis
The Physical Principles of the Quantum Theory (1930)
Werner Heisenberg
"Every experiment destroys some of the knowledge of
the system which was obtained by previous
experiments."
Broad categories
Micro Benchmarks
Workloads
Micro benchmarks: mongo-perf
mongo-perf: goals
Measure
commands
Configure
Single mongod, ReplSet size (1 -> n), Sharding
Single vs. Multiple DB
O/S
Characterize
Throughput by thread count
Compare
What do you get?
[Chart, "Better" direction marked: measured improvement between rc0 and rc2]
tests.push( { name: "Commands.CountsIntIDRange",
pre: function( collection ) {
collection.drop();
for ( var i = 0; i < 1000; i++ ) {
collection.insert( { _id : i } );
}
collection.getDB().getLastError();
},
ops: [
{ op: "command",
ns : "testdb",
command : { count : "mycollection",
query : { _id : { "$gt" : 10, "$lt" : 100 } } } }
] } );

Benchmark source code
Code Change
Workloads
"public" workloads
YCSB
Sysbench
"real world" simulations
Inbox fan in/out
Message Stores
Content Management
Example: Bulk Load Performance
16m Documents
Better
55% degradation
2.6.0-rc1 vs 2.4.10
Ouch, where's the tree in the woods?
2.4.10 -> 2.6.0
4495 git commits
git-bisect
Bisect between good/bad hashes
git-bisect nominates a new githash
Build against githash
Re-run test
Confirm if this githash is good/bad
Rinse and repeat
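The bisect loop above is a binary search for the first bad commit. The following sketch models it under simplifying assumptions that are not from the slides: commits are treated as a linear sequence indexed 0 to 4495, every commit after the first bad one is also bad, and the "build and re-run test" step is a hypothetical predicate isBad. The names BisectSketch and firstBad are illustrative only.

```java
import java.util.function.IntPredicate;

public class BisectSketch {
    static int builds; // build-and-test cycles used by the last call to firstBad

    // Returns the index of the first bad commit, assuming index 'good' is good,
    // index 'bad' is bad, and every commit after the first bad one is also bad.
    static int firstBad(int good, int bad, IntPredicate isBad) {
        builds = 0;
        while (bad - good > 1) {
            int mid = good + (bad - good) / 2; // the commit bisect would nominate
            builds++;                          // build against it and re-run the test
            if (isBad.test(mid)) {
                bad = mid;
            } else {
                good = mid;
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        int culprit = 3000; // hypothetical position of the regression
        int found = firstBad(0, 4495, i -> i >= culprit);
        System.out.println("first bad commit index: " + found + ", builds: " + builds);
    }
}
```

Under these assumptions a range of 4495 commits needs at most 13 build-and-test cycles, since each cycle roughly halves the remaining range.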
Code Change - Bad Githash
Code Change - Fix
Bulk Load Performance - Fix
Better
11% improvement
2.6.1 vs 2.4.10
The problem with measurement
Observability
What can you observe on the system?
Effect
What effects does the observation cause?
mtools
MongoDB log file analysis
Filter logs for operations, events
Response time, lock durations
Plot
https://github.com/rueckstiess/mtools
Response Times > 100ms
Bulk Insert 2.6.0-rc0
Ops/Sec
Time
Response Times > 100ms
Bulk Insert 2.6.0-rc0 vs. 2.6.0-rc2
Floor raised
Code Change Yielding Policy
Code Change
Response Times
Bulk Insert 2.6.0 vs 2.6.1
Ceiling similar, lower floor, resulting in 40% improvement in throughput
Secondary effects of Yield policy change
Write lock time reduced
Order of magnitude reduction
of write lock duration
> db.serverStatus()
Yes - will cause a read lock to be acquired

> db.serverStatus({recordStats:0})
No - lock is not acquired

> mongostat
Yes - until SERVER-14008 is resolved, uses db.serverStatus()
Unexpected side effects of measurement?
CPU sampling
Get an impression of
Call Graphs
CPU time spent on node and called nodes
> sudo apt-get install google-perftools
> sudo apt-get install libunwind7-dev

> scons --use-cpu-profiler mongod



Setup & building with google-profiler
> mongod --dbpath <...>
Note: Do not use fork

> mongo
> use admin
> db.runCommand({_cpuProfilerStart: {profileFilename: 'foo.prof'}})
Execute some commands that you want to profile
> db.runCommand({_cpuProfilerStop: 1})

Start the profiling
Sample start vs. end of workload
Code change
Public Benchmarks: Not all forks are the same
YCSB
https://github.com/achille/YCSB
sysbench-mongodb
https://github.com/mdcallag/sysbench-mongodb
Part Three
And next?
Beavis & Butthead
"The future sucks. Change it."
"I'm way cool Beavis, but I cannot change the future."
What we are working on
mongo-perf
UI refactor
Adding more micro benchmarks
Workloads
Adding external benchmarks
Creating benchmarks for common use cases
Inbox fan in/out
Analytical dashboards
Stream / Feeds
Customers, Partners & Community
Here's how you can help change the future!
Got a great workload? Great benchmark?
Want to donate it?
alvin@mongodb.com
Don't be that benchmark
#1 Know what you are measuring

#2 Measure only what you need to measure
alvin@mongodb.com / @jonnyeight
Thank You
