A BRIEF HISTORY
Josh Clemm
www.linkedin.com/in/joshclemm
“Scaling = replacing all the components of a car while driving it at 100mph”
[Chart: growth from 2003 to 2015; y-axis from 0M to 300M]
LINKEDIN SCALE TODAY
Let’s start from
the beginning
[Diagram labels: LEO, DB]
LINKEDIN’S ORIGINAL ARCHITECTURE
Circa 2003
So far so good, but two areas to improve:
[Diagram labels: LEO, RPC, Member Graph, Lucene]
Circa 2004
Getting better, but the single database was
under heavy load.
● Master/slave concept
● Writes go to main DB
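The two points above can be sketched as a toy read/write split. This is a minimal illustration only, not LinkedIn's implementation: the class `ReplicatedStore`, its methods, and the explicit `replicate()` step are all hypothetical names, and plain dictionaries stand in for the databases.

```python
import itertools


class ReplicatedStore:
    """Toy main/replica store: writes go to the main DB, reads go to replicas."""

    def __init__(self, replica_count=2):
        self.main = {}  # hypothetical main DB (dict stand-in)
        self.replicas = [{} for _ in range(replica_count)]
        self._pending = []  # changes written to main but not yet replicated
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # Every write hits the main DB and is queued for replication.
        self.main[key] = value
        self._pending.append((key, value))

    def replicate(self):
        # Stand-in for a relay step: apply queued changes to each replica in order.
        for key, value in self._pending:
            for replica in self.replicas:
                replica[key] = value
        self._pending.clear()

    def read(self, key, from_main=False):
        # Reads rotate over replicas and can be stale until replicate() has run.
        if from_main:
            return self.main.get(key)
        return self.replicas[next(self._next_replica)].get(key)
```

A read issued right after a write returns `None` from a replica until `replicate()` runs, which is the replication lag a design like this has to tolerate; passing `from_main=True` trades read scalability for freshness.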
[Diagram: Main DB → Databus relay → Replica DBs]
REPLICA DBs TAKEAWAYS
LINKEDIN WITH REPLICA DBs
[Diagram labels: LEO, RPC, Member Graph, Search, Profile, R/O, R/W, Connection Updates, Updates]
Circa 2006
As LinkedIn continued to grow, the
monolithic application Leo was becoming
problematic.
[Diagram labels: Public Profile Web App, LEO, Profile Service, Yet another Service]
Circa 2008 on
SERVICE ORIENTED ARCHITECTURE
[Diagram labels: Frontend Web App, Edu Data Service, Data Service, Kafka, Hadoop, DB, Voldemort]
SERVICE ORIENTED ARCHITECTURE COMPARISON
PROS
● Stateless services easily scale
● Decoupled domains
CONS
● Ops overhead
● Introduces backwards compatibility issues
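Why stateless services scale easily can be shown with a small sketch: if a handler keeps no per-instance state, any instance can serve any request, so capacity grows by adding instances behind a load balancer. The names `profile_service` and `RoundRobinBalancer` are hypothetical, and the dictionary datastore is an assumption made for illustration.

```python
import itertools


def profile_service(request, datastore):
    # Stateless handler: everything it needs comes from the request
    # or from the shared datastore, never from instance-local memory.
    member_id = request["member_id"]
    return {"member_id": member_id, "name": datastore[member_id]}


class RoundRobinBalancer:
    """Spreads requests evenly over N identical service instances."""

    def __init__(self, instance_count):
        self._ids = itertools.cycle(range(instance_count))
        self.calls = [0] * instance_count  # requests served per instance

    def handle(self, request, datastore):
        instance = next(self._ids)
        self.calls[instance] += 1
        response = profile_service(request, datastore)
        response["served_by"] = instance
        return response
```

Because every instance returns the same answer for the same request, raising `instance_count` changes only how the load is spread, not the results.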
756
Getting better, but LinkedIn was
experiencing hypergrowth...
CACHING
DB
● We use memcached, Couchbase, and even Voldemort
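One common way to place a cache in front of a database is the cache-aside pattern, sketched below. The slides do not say which pattern is used, so treat this as an assumption; `CacheAsideStore` is a hypothetical name, and dictionaries stand in for both the cache and the DB.

```python
class CacheAsideStore:
    """Cache-aside: check the cache first, fall back to the DB on a miss."""

    def __init__(self, db):
        self.db = db        # dict stand-in for the database
        self.cache = {}     # dict stand-in for a cache tier
        self.db_reads = 0   # counts how often a read reached the DB

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read from the DB and populate the cache.
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        # Write to the DB, then invalidate so the next read reloads fresh data.
        self.db[key] = value
        self.cache.pop(key, None)
```

Repeated reads of the same key reach the DB only once; a write drops the cached entry, so the next read pays one more DB read in exchange for not serving stale data.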
“There are only two hard problems in Computer Science: cache invalidation, naming things, and off-by-one errors.”
Kafka
BENEFITS
● Enabled near realtime access to any data source
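The idea behind that benefit can be illustrated with a toy append-only log in which each consumer tracks its own read position. This is an in-memory sketch of the general publish/subscribe log concept and not the Kafka client API; `TopicLog`, `publish`, and `poll` are hypothetical names.

```python
class TopicLog:
    """Toy append-only log with an independent read offset per consumer."""

    def __init__(self):
        self._records = []  # ordered, append-only record list
        self._offsets = {}  # consumer name -> next index to read

    def publish(self, record):
        # Append the record and return its position in the log.
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        # Return records this consumer has not seen yet, then advance its offset.
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch
```

Because each consumer keeps its own offset, a new consumer can read the full history while existing consumers only see new records, and producers never need to know who is reading.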
LEARN MORE
● Blog version of this slide deck
https://engineering.linkedin.com/architecture/brief-history-scaling-linkedin
● LinkedIn Open-Source
https://engineering.linkedin.com/open-source
○ Profile
http://engineering.linkedin.com/profile/engineering-new-linkedin-profile
● Play Framework
○ Introduction at LinkedIn https://engineering.linkedin.com/play/composable-and-streamable-play-apps
● System tuning
http://engineering.linkedin.com/performance/optimizing-linux-memory-management-low-latency-high-throughput-databases