How to build an app with Twitter-like throughput on just 9 servers...
Lew Cirne, Founder & CEO - New Relic
I’m Lew Cirne
@sweetlew
What our app does

APM as a Service

In-app agent instrumentation (bytecode instrumentation, etc.)

150,000+ app processes monitored, globally (10K customers)

Each process reports a few hundred metrics per minute

5 Languages (Ruby, Java, PHP, .NET, Python)


Each day we collect 20 billion measurements,
from 150,000 application processes,
for over 10,000 customers.

All on 9 servers.
We capture “Timeslices”
Each one is about 250 bytes
A single tweet is about the same size
Response Time
4 hours from 11:04 to 15:04
Count: 1242
Avg: 337 ms
Min: 0.63 ms
Max: 95669 ms
Std Dev: 782
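
A minimal sketch of how a timeslice like the one above could be accumulated, assuming it stores running sums rather than raw samples. The `Timeslice` class, its field names, and the `record`/`merge` methods are hypothetical and not taken from New Relic's code; the slide only shows the resulting Count, Avg, Min, Max, and Std Dev values.

```python
import math
from dataclasses import dataclass


@dataclass
class Timeslice:
    """Hypothetical fixed-size summary of one metric over one time window."""
    count: int = 0
    total_ms: float = 0.0
    sum_of_squares: float = 0.0
    min_ms: float = math.inf
    max_ms: float = -math.inf

    def record(self, duration_ms: float) -> None:
        """Fold one measurement into the running sums."""
        self.count += 1
        self.total_ms += duration_ms
        self.sum_of_squares += duration_ms * duration_ms
        self.min_ms = min(self.min_ms, duration_ms)
        self.max_ms = max(self.max_ms, duration_ms)

    def merge(self, other: "Timeslice") -> "Timeslice":
        """Combine two timeslices, e.g. one-minute slices into a longer window."""
        return Timeslice(
            count=self.count + other.count,
            total_ms=self.total_ms + other.total_ms,
            sum_of_squares=self.sum_of_squares + other.sum_of_squares,
            min_ms=min(self.min_ms, other.min_ms),
            max_ms=max(self.max_ms, other.max_ms),
        )

    @property
    def average_ms(self) -> float:
        return self.total_ms / self.count if self.count else 0.0

    @property
    def std_dev_ms(self) -> float:
        """Population standard deviation derived from the running sums."""
        if self.count == 0:
            return 0.0
        mean = self.average_ms
        variance = max(self.sum_of_squares / self.count - mean * mean, 0.0)
        return math.sqrt(variance)
```

Because only sums, a count, and extremes are kept, the record stays the same size no matter how many measurements it covers, and two slices can be merged without revisiting the raw data.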
timeslice insertion rate: 100K/second

>7 billion rows per day


Twitter peak insertion rate:
8K rows per second

9 Servers handle all data collection
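
The figures above can be cross-checked with a little arithmetic. This is a sketch only: the even split across servers is an assumption made for illustration, and the 250-byte size is the approximate timeslice size quoted earlier in the deck.

```python
SECONDS_PER_DAY = 24 * 60 * 60

timeslice_rate_per_sec = 100_000    # from the slide: 100K/second
twitter_peak_rate_per_sec = 8_000   # from the slide: 8K rows per second
server_count = 9                    # from the slide
approx_timeslice_bytes = 250        # from the slide: "about 250 bytes"

# Rows per day if the 100K/second rate were held for a full day.
rows_per_day = timeslice_rate_per_sec * SECONDS_PER_DAY

# How many times larger the timeslice rate is than the quoted Twitter peak.
rate_ratio = timeslice_rate_per_sec / twitter_peak_rate_per_sec

# Assumption: load is spread evenly over the 9 servers.
rate_per_server = timeslice_rate_per_sec / server_count

# Approximate raw payload written per second across all servers.
bytes_per_sec = timeslice_rate_per_sec * approx_timeslice_bytes

print(f"{rows_per_day:,} rows/day at a constant 100K/second")
print(f"{rate_ratio}x the quoted Twitter peak rate")
print(f"about {rate_per_server:,.0f} inserts/second per server")
print(f"about {bytes_per_sec / 1_000_000:.0f} MB/second of timeslice payload")
```

A constant 100K/second works out to 8.64 billion rows per day, which is consistent with the ">7 billion rows per day" figure.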
Collecting is one thing...
• We provide realtime monitoring
• One minute granularity
• Data is almost always stale
• Each user/account has different data
• Page caching and other easy solutions don’t work for us.
Our most popular page...

Average Full Page Load Time: 2.4 Sec
Main App Software stack
• User Interface & REST API: Rails 2.3
• Data Collectors: Servlets on Jetty
• Data Store: MySQL, sharded by accounts
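
The deck says only that MySQL is sharded by accounts; it does not say how an account is mapped to a shard. As one possible illustration, the sketch below uses a simple modulo mapping. The function name `shard_for_account`, the integer account IDs, and the choice of one shard per collector/DB server are all assumptions.

```python
NUM_SHARDS = 9  # assumption: one shard per collector/DB server


def shard_for_account(account_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Return the shard index that holds every row for this account.

    Modulo mapping is an illustrative assumption, not the documented scheme.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return account_id % num_shards


def group_by_shard(account_ids):
    """Group account IDs by their shard so each shard can be queried once."""
    groups = {}
    for account_id in account_ids:
        groups.setdefault(shard_for_account(account_id), []).append(account_id)
    return groups
```

Keeping all of an account's data on one shard means a dashboard query touches a single database. A fixed modulo does not balance very large accounts, so a real deployment might use a lookup table instead.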
Simplified architecture...
Customer’s environment (either cloud or datacenter)
• Reports to the collectors over HTTPS
9 Collector / Aggregator / DBs
• Sustained insertion rate of 100K per second
• 24 Core Intel Nehalem
• 48 GB RAM
• SAS attached RAID 5
• No Virtualization
2 Web App Servers
• 12 Core Intel Nehalem
• 48 GB RAM
Even more data!

On May 17, we launched Real User Monitoring


• Using Episodes to measure browser load time of every page view

• Browser reports data to our ‘Beacon’ servers

• Monitoring >1 Billion page views per week

• Doubled our total inbound HTTP requests in a MONTH


Beacon Architecture
• Real User Browsers: billions of metrics from across the globe
• RUM Beacons: Servlets capture and enqueue (in-memory); response time 0.15ms
• Asynchronously aggregate and forward Timeslices to our Collectors
• Currently at EC2
• Over 1 Billion user sessions measured for performance in first month.
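
The capture-and-enqueue pattern described above can be sketched as follows. This is a simplified illustration, not the beacon's actual code (the real beacons are Java servlets). The `Beacon` class, its method names, and the `forward` callback are hypothetical, and this version forwards once on shutdown where a real beacon would presumably flush on a timer.

```python
import queue
import threading
from collections import defaultdict

_STOP = object()  # sentinel that tells the worker to flush and exit


class Beacon:
    """Hypothetical beacon: enqueue on the request path, aggregate off it."""

    def __init__(self, forward):
        self._queue = queue.Queue()
        self._forward = forward  # callable that receives the aggregated data
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def handle_request(self, page: str, load_time_ms: float) -> None:
        """Request path: only an in-memory enqueue, so the response is fast."""
        self._queue.put((page, load_time_ms))

    def close(self) -> None:
        """Ask the worker to flush what it has aggregated, then wait for it."""
        self._queue.put(_STOP)
        self._worker.join()

    def _drain(self) -> None:
        """Background thread: aggregate per page, then forward the totals."""
        totals = defaultdict(lambda: [0, 0.0])  # page -> [count, total_ms]
        while True:
            item = self._queue.get()
            if item is _STOP:
                break
            page, load_time_ms = item
            totals[page][0] += 1
            totals[page][1] += load_time_ms
        self._forward({page: (c, t) for page, (c, t) in totals.items()})
```

For example, `Beacon(print)` followed by a few `handle_request` calls and `close()` prints one aggregated dictionary. The request handler never waits on aggregation or on the network, which is the property that makes a sub-millisecond response time plausible.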
Challenges
• Data Purging
• Determining what to pre-aggregate
• Large Accounts
• MySQL Optimization and Tuning
• I/O performance (virtualized to dedicated) ...
5 Lessons Learned
1. Keep it simple
2. Less is more
3. Trendy != Reliable
4. Plan for scale
5. Use the right technology for a given task
(Episodes, Java, Ruby, Nginx, Jetty, Rails, New Relic)
See New Relic
Monitor New Relic
at our booth