
How to scale

(with Ruby on Rails)

George Palmer
george.palmer@gmail.com
3dogsbark.com
Overview
• One server
• Two servers
• Scaling the database
• Scaling the web server
• User clusters
• Final architecture
• Caching
• Cached architecture
• Links
• Questions
George Palmer
17th February 2007
How you start out
[Diagram: shared hosting, with web server and DB on one machine]

• Shared Hosting
• One web server and DB on same machine
• Application designed for one machine
• Volume of traffic will depend on host
Two servers

[Diagram: web server and DB on separate machines]

• Possibly still shared hosting
• Web server and DB on different machines
• Minimal changes to code
• Volume of traffic will depend on whether you have made it to dedicated machines
Scaling the database (1)
[Diagram: web server, master DB and three slaves]

• DB setup more suited to read-intensive applications (MySQL replication)
• Should be on dedicated hosts
• Minimal changes to code
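
A minimal sketch of how reads and writes could be split in a replication setup, assuming the application (or a layer beneath it) decides where each statement goes. `FakeConnection` and `ReplicatedRouter` are hypothetical names used only for illustration; they are not part of Rails or MySQL.

```ruby
# Hypothetical stand-in for a database connection; it only records queries.
class FakeConnection
  attr_reader :name, :log

  def initialize(name)
    @name = name
    @log = []
  end

  def execute(sql)
    @log << sql
    "#{name}: #{sql}"
  end
end

# Sends writes to the master and spreads reads across slaves round-robin.
class ReplicatedRouter
  def initialize(master, slaves)
    @master = master
    @slaves = slaves
    @next_slave = 0
  end

  def execute(sql)
    connection_for(sql).execute(sql)
  end

  private

  def connection_for(sql)
    return @master unless read_only?(sql) && !@slaves.empty?

    slave = @slaves[@next_slave % @slaves.size]
    @next_slave += 1
    slave
  end

  # Naive classification (an assumption): only plain SELECTs count as reads.
  def read_only?(sql)
    sql.strip.upcase.start_with?("SELECT")
  end
end

master = FakeConnection.new("master")
slaves = [FakeConnection.new("slave1"), FakeConnection.new("slave2")]
router = ReplicatedRouter.new(master, slaves)

router.execute("SELECT * FROM posts")
router.execute("SELECT * FROM users")
router.execute("UPDATE posts SET title = 'x' WHERE id = 1")
```

Slaves can lag behind the master, so a read issued straight after a write may need to go to the master instead; the sketch ignores that case.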

Scaling the database (2)
[Diagram: web server connected to a MySQL Cluster of two master DBs]

• DB setup more suited to equal read/write applications (MySQL Cluster)
• Should be on dedicated hosts
• Minimal changes to code

Scaling the web server
[Diagram: web server containing four worker threads, connected to the DB farm]

• Web server comprises “worker threads” that process work as it comes in

Load balancing
[Diagram: load balancer in front of three app servers, all connected to the DB farm]

• App server depends on the platform:
– Rails (Mongrel, FastCGI)
– PHP
– J2EE
• Some changes to code will be required
The story so far…

[Diagram: load balancer, three app servers, master DB with three slaves]

• App servers continue to scale but the database side is somewhat limited…

User Clusters
• For each user registered on the service
add an entry to a master database detailing
where their user data is stored
– UserID
– DB Cluster
– Basic authorisation details such as username,
password, any NLS settings

User Clusters (2)
[Diagram: App Server, Master DB, User Cluster 1, User Cluster 2]
• App Server queries the Master DB: SELECT * FROM users WHERE username='Bob' AND …
• Master DB replies: user_id=91732, db_cluster=2
• User clusters are themselves one of the two database setups outlined earlier
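
The two-step lookup can be sketched as follows, with in-memory hashes standing in for the master DB and the user clusters. The `user_id` and `db_cluster` values come from the slide; the stored user data (`display_name`, `posts`) and all constant and method names are made up for illustration.

```ruby
# Hypothetical in-memory stand-ins for the master DB and the user clusters.
DirectoryEntry = Struct.new(:user_id, :db_cluster, :username)

MASTER_DIRECTORY = {
  "Bob" => DirectoryEntry.new(91732, 2, "Bob")
}.freeze

USER_CLUSTERS = {
  1 => {},
  2 => { 91732 => { "display_name" => "Bob", "posts" => 12 } } # invented data
}.freeze

# Step 1: ask the master DB which cluster holds the user.
def locate_user(username)
  MASTER_DIRECTORY.fetch(username) { raise KeyError, "unknown user #{username}" }
end

# Step 2: read the user's data from that cluster, keyed by the master's user_id.
def load_user_data(username)
  entry = locate_user(username)
  cluster = USER_CLUSTERS.fetch(entry.db_cluster)
  cluster.fetch(entry.user_id)
end

data = load_user_data("Bob")
```

Every request pays for the master lookup first, so in practice the `user_id` and `db_cluster` pair is a natural thing to keep in the session.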

User Clusters (3)
• ID management becomes an issue
– Best to use master DB id as user_id in user cluster
– If you let the cluster allocate IDs, make sure it uses an offset and
increment (not auto_increment)
• Other DBs such as session must reference a
user by id and DB cluster
• Serious code changes may be required
• You will want the ability to move users
between clusters
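
A small sketch of the offset-and-increment idea, assuming two clusters: each cluster starts at its own offset and steps by the number of clusters, so the ID ranges never collide. `ClusterIdAllocator` is a hypothetical name, not a Rails or MySQL API.

```ruby
# Allocates IDs of the form offset, offset + increment, offset + 2 * increment, ...
class ClusterIdAllocator
  def initialize(offset:, increment:)
    raise ArgumentError, "offset must be in 1..increment" unless (1..increment).cover?(offset)

    @next_id = offset
    @increment = increment
  end

  def next_id
    id = @next_id
    @next_id += @increment
    id
  end
end

cluster_count = 2 # assumption for this example
cluster1 = ClusterIdAllocator.new(offset: 1, increment: cluster_count)
cluster2 = ClusterIdAllocator.new(offset: 2, increment: cluster_count)

ids1 = Array.new(3) { cluster1.next_id } # 1, 3, 5
ids2 = Array.new(3) { cluster2.next_id } # 2, 4, 6
```

The increment caps how many clusters can share the ID space, so it is worth choosing one larger than the current cluster count to leave room for growth.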

The final architecture
• As the number of app servers grows it’s a good idea
to add a database connection manager (e.g.
SQLRelay)
• Extract out session, search, translation
databases onto own machines
• Use MySQL cluster (or equivalent) for any
critical database
– In a replication setup you can make a slave a backup
master
• Add an NFS/SAN for static files
The final architecture (2)
[Diagram: load balancer in front of app servers 1 to 50, which share an NFS/SAN and reach the databases through a DB connection manager. Databases: two master DBs, a session DB, a search DB, an NLS DB, and user cluster masters 1 and 2, each with three slaves]

Issues
• Load balancer and database connection manager are
single points of failure
– Easily solved
• Two-phase commit (2PC) needed for some operations. For example a user
wants to be removed from the search database
– 2PC not supported in Rails
• Rails doesn’t support database switching for a given
model
– Can be done explicitly on each request, but this is expensive due to
connection establishment overhead
– Can be worked around with a connection manager, but a proper solution
is required (I may write a gem to do this)
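
One way to picture the connection-overhead point is to keep a single open connection per cluster and reuse it, instead of reconnecting on every request. This is only a sketch of the general idea, not the gem mentioned above and not how Rails manages connections; `ClusterConnection` and `ConnectionCache` are hypothetical.

```ruby
# Hypothetical connection whose construction stands in for an expensive connect.
class ClusterConnection
  @opened = 0

  class << self
    attr_accessor :opened
  end

  attr_reader :cluster_id

  def initialize(cluster_id)
    self.class.opened += 1
    @cluster_id = cluster_id
  end
end

# Keeps one open connection per cluster and hands it back on later requests.
class ConnectionCache
  def initialize
    @connections = {}
  end

  def for_cluster(cluster_id)
    @connections[cluster_id] ||= ClusterConnection.new(cluster_id)
  end
end

cache = ConnectionCache.new
# Five requests touching two clusters open only two connections.
[2, 1, 2, 2, 1].each { |cluster_id| cache.for_cluster(cluster_id) }
```

The cache is not thread-safe as written, and each app server process would hold its own set of connections, which is where a shared connection manager helps.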

Making the most of your assets
• In a lot of web applications a huge % of
the hits are read only. Hence the need for
caching:
– Squid
• A reverse-proxy (or webserver accelerator)
– Memcached
• Distributed memory caching solution

Squid
[Diagram: Squid in front of App Server 1, App Server 2, … and the NFS/SAN, with paths labelled "In cache" and "Not in cache"]

• Lookup of pages is in memory, storing of files is on disk
• Can also act as a load balancer
• Pages can be expired by sending a purge request to the
proxy (Squid's PURGE method)
Memcached
[Diagram: two physical machines, each running an app server and memcached; data not in memcached is fetched from the DB farm]
• Location of data is independent of the physical machine
• A really nice simple API
– SET
– GET
– DELETE
• In Rails only a few LOC will make a model cached
• Also useful for tracking cross-machine information – e.g. dodgy user behaviour
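
A rough sketch of how a cached model lookup might use the three operations above: GET first, fall back to the DB and SET on a miss, and DELETE when the record changes. `FakeMemcached`, `FakeDatabase` and `CachedUsers` are hypothetical in-process stand-ins, not a real memcached client or a Rails plugin.

```ruby
# Hypothetical in-process stand-in exposing the three operations named above.
class FakeMemcached
  def initialize
    @store = {}
  end

  def get(key)
    @store[key]
  end

  def set(key, value)
    @store[key] = value
  end

  def delete(key)
    @store.delete(key)
  end
end

# Hypothetical data source that counts how often it is queried.
class FakeDatabase
  attr_reader :queries

  def initialize(rows)
    @rows = rows
    @queries = 0
  end

  def find_user(id)
    @queries += 1
    @rows[id]
  end

  def update_user(id, attrs)
    @rows[id] = attrs
  end
end

class CachedUsers
  def initialize(cache, db)
    @cache = cache
    @db = db
  end

  # GET first; on a miss, read the DB and SET the result.
  def find(id)
    key = "user:#{id}"
    cached = @cache.get(key)
    return cached if cached

    user = @db.find_user(id)
    @cache.set(key, user) if user
    user
  end

  # Write to the DB, then DELETE the stale cache entry.
  def update(id, attrs)
    @db.update_user(id, attrs)
    @cache.delete("user:#{id}")
  end
end

db = FakeDatabase.new({ 1 => { name: "Bob" } })
users = CachedUsers.new(FakeMemcached.new, db)
users.find(1)                       # miss: reads the DB
users.find(1)                       # hit: served from the cache
users.update(1, { name: "Robert" }) # invalidates the cached entry
renamed = users.find(1)             # miss again: reads the DB
```

Deleting on update rather than overwriting keeps the write path simple; the next read repopulates the cache.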

Cached Architecture
• Introduce Squid
– Acts as load balancer (note there are higher
performing load balancers)
• Introduce memcached
– Can go on every machine that has spare
memory
• Best suited to application servers which have high
CPU usage but low memory requirements

Cached architecture
[Diagram: the final architecture with Squid in place of the load balancer and memcached (MC) running alongside each of app servers 1 to 50. The NFS/SAN, DB connection manager, two master DBs, session, search and NLS DBs, and user cluster masters 1 and 2 with three slaves each are unchanged]

Cached architecture
• Wikipedia quotes a cache hit rate of 78%
for Squid and 7% for memcached
– So only 15% of hits actually get to the DB!!
• Performance is a whole new ball game but
we recently gained 15-20% by optimising
our Rails configuration
– But don’t get carried away - at some point the
time you spend exceeds the money saved

Cached architecture – 1 machine
[Diagram: a box labelled "Physical Machine" containing Squid, memcached and app servers 1 to 5, alongside the NFS/SAN, DB connection manager, two master DBs, session, search and NLS DBs, and user cluster 1 with a master and three slaves]

How far can it go?
• For a truly global application with millions
of users, in order of ease:
– Have a cache on each continent
– Make user clusters based on user location
• Distribute the clusters physically around the world
– Introduce app servers on each continent
– If you must replicate your site globally then
use transaction replication software, eg
GoldenGate
Useful Links
• http://www.squid-cache.org/
• http://www.danga.com/memcached/
• http://sqlrelay.sourceforge.net/

• http://railsexpress.de/blog/

Questions?
