
Optimise and Scale MySQL

Mike Griffiths
Proven Scaling
About me and us
• Me:
 Used & administered MySQL databases for 10 years
 Consultant with Proven Scaling
 Used to be a DBA and Service Architect at Yahoo!
 I don’t always follow the party line
• Us:
 Founded in 2006
 Specialise in MySQL, but work with whole stack
 Primarily consult on architecture, design and
optimisation for large scalable systems
 We also do training, DBA work, audits, coding…

Copyright 2008 Proven Scaling Ltd / Proven Scaling LLC
Mike’s Mantras
• Scalability + Availability = Service Quality
• Be as lazy as possible
• Be paranoid
• You’re only as strong as your weakest link

Common Scenario #1
• “I don’t think scalability will ever be a problem for
me.”
 Denial
 Overconfidence
 No-one uses my product

Common Scenario #2
• “It’s all melting down! Help!”
• Find the cause
 External Factors
 Slashdot, Digg, Blogs
 Internal Factors
 New features
 Bad code
 Marketing

Pre-requisites
• Solid development process
• Version & Release control
• Test / Staging Environment
• Load testing strategy
• Good cross-functional communication

Common Scenario #3
• “I might have problems with scaling up in the
future…
 … I’ll fix them if they happen.”
 … I’ll fix them now. All of them.”

Extreme Approaches
• Wait & See
 Faster, cheaper
 Quality of service compromised
 Slower to react to changes in workload
 Sometimes chosen because of lack of skill or knowledge
 Prototypes are useful, but they are what they are!
• Fix All Now
 Slower, expensive
 Time to market increased
 Risk of wasted time
 Increased quality of service
 Better prepared for the unexpected

Sensible Approach
• Somewhere between the two extremes
• Architect for what you might conceivably face in
the future
• Implement now what you know you will face
• Always avoid dead ends and short cuts
 Even if it means much more effort now

MySQL Scalability
• Query Optimisation
• Functional Separation
• Replication
• Sharding
• Archiving
• Caching
• Isolation
• Error Handling
• Configuration
• Hardware
• Capacity Planning

Query Optimisation
• Ensure queries are correctly indexed
• Use the slow query log
• Learn EXPLAIN
• Look carefully at queries which can’t be optimised
 Rewrite queries
 Change schema
 Denormalise
 Split into multiple queries
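
The "index it, then check the plan" loop can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in so that it runs without a server. Against MySQL you would run EXPLAIN on the same SELECT and read its output instead. The table, column and index names are hypothetical.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return the plan steps for a query (SQLite stand-in for MySQL's EXPLAIN)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the plan description text.
    return [row[-1] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

query = "SELECT name FROM users WHERE email = ?"
before = plan(conn, query, ("user42@example.com",))

# Add the index the WHERE clause needs, then look at the plan again.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(conn, query, ("user42@example.com",))

print("before:", before)  # a scan of the whole table
print("after: ", after)   # a lookup that names idx_users_email
```

The same habit applies to every query that shows up in the slow query log: check the plan, change the index or the query, check the plan again.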

Query Optimisation
• Big is generally slower
 Longer to recover, query, maintain
• Partitioning in MySQL 5.1 can sometimes help
query performance
• Is your server optimally configured?

Functional Separation
• Common prototype/start-up scenario is to have
multiple functions on one database host
 Blog
 Forum
 Monitoring
 Application
• Split into different databases
• Move different databases to different hosts
• Can provide short-term breathing space

Replication
• Can help scalability problems with reads
• Doesn’t help with scaling writes
• Asynchronous nature adds complexity to
applications
• Can be used to increase availability
 Actually, replace “can” with “should” here
• Part of a scalability solution
 … but not the whole solution
 Be aware of its strengths and weaknesses
 Know what doesn’t work with replication

Replication: Read Scalability
• Single master (write point) replicates to multiple
read-only slaves
• Too many slaves can overload a master
 Use “relay” slaves to build a replication tree
• Inefficient (cost, storage, memory) strategy for
scaling reads in isolation
 Load balancing strategy is key
 Consider having different roles for slaves
 Combine with sensible caching elsewhere in stack
for best results
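
One simple way to give slaves different roles is to keep a separate pool per role and balance within each pool. The sketch below uses plain round-robin; the class, role names and host names are all hypothetical, and a real deployment would also need health checks and lag checks.

```python
import itertools

class SlavePool:
    """Round-robin over read-only slaves, grouped by role."""

    def __init__(self, slaves_by_role):
        # One independent cycle per role, so reporting never steals web capacity.
        self._cycles = {
            role: itertools.cycle(hosts) for role, hosts in slaves_by_role.items()
        }

    def pick(self, role):
        """Return the next slave host for the given role."""
        return next(self._cycles[role])

pool = SlavePool({
    "web": ["db-web-1", "db-web-2", "db-web-3"],   # hypothetical host names
    "reporting": ["db-report-1"],
})

picks = [pool.pick("web") for _ in range(4)]
print(picks)                   # wraps around to db-web-1 on the fourth pick
print(pool.pick("reporting"))  # slow reporting reads stay on their own slave
```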

Replication: Write Scalability
• Replication doesn’t help with scaling writes
• Write-saturated master normally means write-
saturated slaves
 Row-based replication in MySQL 5.1 can help
sometimes
• Updates to slaves done in single thread
 Overall write capacity on a slave is less than the
master
• Read performance on slaves drops rapidly as write
load increases
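
A back-of-the-envelope model shows why. Assume, purely for illustration, that every host can do a fixed number of operations per second and that a read costs the same as a write. Every slave has to apply every write, so only what is left over is available for reads. The numbers are hypothetical, and the model ignores the single-threaded apply, which makes the real picture worse.

```python
def spare_read_capacity(slaves, capacity_per_host, write_load):
    """Reads/sec left across all slaves once each has applied every write."""
    per_slave = max(capacity_per_host - write_load, 0)
    return slaves * per_slave

# Four slaves, each able to do 10,000 operations/sec (hypothetical figures).
results = {
    write_load: spare_read_capacity(slaves=4, capacity_per_host=10000, write_load=write_load)
    for write_load in (1000, 5000, 9000)
}
for write_load, reads in results.items():
    print(f"writes/sec={write_load}: spare reads/sec across slaves={reads}")
```

Adding slaves multiplies the leftover, but it never reduces the write load each slave has to carry.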

Replication: Asynchronous
• Applications need to be aware of replication delays
• Worst case: write followed by a read
• How to handle?
 Promotion of read-only database handles to
read/write when required
 Use binary log positions to work out if a slave has
new enough data
 Can other users be shown out-of-date data?
• Can be very difficult to add handling for replication
delays into existing applications
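
The binary log position check can be sketched like this. It assumes the application records the master's File and Position (from SHOW MASTER STATUS) after a write, and can read Relay_Master_Log_File and Exec_Master_Log_Pos (from SHOW SLAVE STATUS) for the slave it wants to use. The function names and the sample positions are hypothetical.

```python
def parse_binlog_file(name):
    """Extract the numeric sequence from a binary log file name like 'mysql-bin.000042'."""
    return int(name.rsplit(".", 1)[1])

def slave_has_caught_up(write_pos, slave_pos):
    """True if the slave has executed at least up to the master position of our write.

    Each position is a (file_name, byte_offset) pair.
    """
    write_key = (parse_binlog_file(write_pos[0]), write_pos[1])
    slave_key = (parse_binlog_file(slave_pos[0]), slave_pos[1])
    return slave_key >= write_key

def choose_handle(write_pos, slave_pos):
    """Read from the slave only when it is new enough; otherwise promote to the master."""
    if write_pos is None or slave_has_caught_up(write_pos, slave_pos):
        return "slave"
    return "master"

after_write = ("mysql-bin.000042", 1500)
print(choose_handle(after_write, ("mysql-bin.000042", 1200)))  # master: slave is behind
print(choose_handle(after_write, ("mysql-bin.000043", 4)))     # slave: already on a later file
print(choose_handle(None, ("mysql-bin.000041", 900)))          # slave: this user has not written
```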

Replication: Availability
• Deploy servers in pairs using master-master
replication
• Never use master-master as a way of scaling writes
• Use virtual IP addresses to control access - see
MMM, Flipper, Linux HA
• Use “inactive” machine for maintenance, backups,
slow reads for reporting

Sharding
• Splitting database into smaller chunks
• Only solution for scaling writes
• Needs careful planning
• Can be difficult to implement
• Even more difficult, if not impossible, to retrofit to
an existing system
• Architect your system with sharding in mind from
Day One
 … even if not immediately implemented or used

Sharding
• How to split the data?
 Hashing on user-supplied data
 Username, email address
 Splitting on other data
 Time
 Data dictionary
 Any combination of the above
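
Two of these options, hashing on user-supplied data and a data dictionary, might look like the sketch below. The shard names, function name and class are hypothetical. hashlib is used rather than Python's built-in hash(), because the built-in is randomised per process for strings and so cannot be used for stable placement.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2", "shard3"]  # hypothetical shard names

def shard_by_hash(username):
    """Hash user-supplied data to a shard; stable across processes and restarts."""
    digest = hashlib.md5(username.lower().encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

class ShardDirectory:
    """Data-dictionary approach: an explicit user -> shard lookup table."""

    def __init__(self):
        self._location = {}

    def assign(self, username, shard):
        self._location[username] = shard

    def lookup(self, username):
        return self._location[username]

    def move(self, username, new_shard):
        # Rebalancing is a directory update (plus copying the user's rows).
        self._location[username] = new_shard

print(shard_by_hash("alice") == shard_by_hash("Alice"))  # True: same user, same shard

directory = ShardDirectory()
directory.assign("alice", "shard1")
directory.move("alice", "shard3")
print(directory.lookup("alice"))  # shard3
```

The directory costs an extra lookup per request (a good candidate for caching), but it lets individual users be moved without touching anyone else.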

Sharding
• What data to split?
 Cross-shard queries are not impossible, but
logistically more difficult
 Use your application design to decide
• Popular choice: Primary/Secondary split
 Primary dataset
 Frequently used information (highly cacheable)
 Relationships
 Pointers to secondary data
 Secondary dataset(s)
 Vertically split data
 Less frequently used information

Sharding
• Keep shards to the right size
 Small enough to be manageable
 Large enough so you don’t need thousands
 Consistent size per shard
• Architectures should allow for adding shards
 Hash-based sharding often falls down here
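
Why hash-based sharding struggles when shards are added can be shown with a small experiment. This assumes the naive scheme of hashing the key and taking it modulo the number of shards; the key names are made up.

```python
import hashlib

def shard_for(key, shard_count):
    """Naive placement: hash the key and take it modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count

def fraction_moved(keys, old_count, new_count):
    """Share of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count))
    return moved / len(keys)

keys = [f"user{i}" for i in range(10000)]  # hypothetical user names
moved = fraction_moved(keys, 4, 5)
print(f"{moved:.0%} of keys change shard when going from 4 to 5 shards")
```

With modulo placement, a key stays put only when its hash gives the same remainder for both shard counts, so most of the data has to be moved to add a single shard. A lookup table or a scheme designed for resizing avoids this.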

Sharding
• Accessing data across shards is more difficult, but
not impossible
 Handle JOINs in your application
 Replicate primary data to secondary shards
 Maintain summary tables outside of shards
 Parallelisation of execution across shards can give
significant performance boost
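
A minimal scatter-gather sketch of the ideas above: the same query is sent to every shard in parallel, and the JOIN against primary data is done in the application. The shards here are in-memory lists standing in for real database connections, and all names and rows are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for two shards of an "orders" table.
SHARDS = {
    "shard0": [{"user_id": 1, "total": 30}, {"user_id": 2, "total": 5}],
    "shard1": [{"user_id": 3, "total": 12}, {"user_id": 1, "total": 8}],
}
# Primary data (small, cacheable) held outside the shards.
USERS = {1: "alice", 2: "bob", 3: "carol"}

def query_shard(shard_name, min_total):
    """Stand-in for running the same SELECT against one shard."""
    return [row for row in SHARDS[shard_name] if row["total"] >= min_total]

def orders_over(min_total):
    """Scatter the query to every shard in parallel, then gather and join."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(lambda name: query_shard(name, min_total), sorted(SHARDS)))
    rows = [row for part in partials for row in part]
    # The "JOIN" against user data happens here, in the application.
    joined = [(USERS[row["user_id"]], row["total"]) for row in rows]
    return sorted(joined, key=lambda pair: pair[1], reverse=True)

print(orders_over(8))  # [('alice', 30), ('carol', 12), ('alice', 8)]
```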

Archiving
• Big is generally slower
• Keep your data size as small as possible
• Move older & less frequently accessed data
 To slower and/or cheaper infrastructure
 To infrastructure with real or imagined lower service
level
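
A sketch of an archiving job, again using Python's built-in sqlite3 as a stand-in so that it runs anywhere. Both tables live in one database here, which lets a single transaction cover the move; moving rows to a separate archive host would need a copy, verify, then delete sequence instead. The table names, columns and dates are hypothetical.

```python
import sqlite3

def archive_older_than(conn, cutoff):
    """Move rows created before `cutoff` from messages to messages_archive."""
    with conn:  # one transaction: the copy and the delete succeed or fail together
        conn.execute(
            "INSERT INTO messages_archive SELECT * FROM messages WHERE created < ?",
            (cutoff,),
        )
        moved = conn.execute(
            "DELETE FROM messages WHERE created < ?", (cutoff,)
        ).rowcount
    return moved

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, created TEXT, body TEXT)")
conn.execute("CREATE TABLE messages_archive (id INTEGER PRIMARY KEY, created TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO messages (created, body) VALUES (?, ?)",
    [("2006-03-01", "old"), ("2007-11-15", "also old"), ("2008-06-01", "recent")],
)

moved = archive_older_than(conn, "2008-01-01")
print(moved)  # 2 rows moved; only the recent message stays in the live table
```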

Caching
• Caching closer to the user is more efficient
 Use a content delivery network
 Use squid
 Use memcached
 Maybe use MySQL’s query cache
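
The usual way to put memcached in front of MySQL is the cache-aside pattern: look in the cache first, fall back to the database on a miss, then populate the cache. The sketch below uses a tiny in-process class as a stand-in for memcached so that it is self-contained; the class, function and key names are hypothetical.

```python
import time

class TTLCache:
    """Tiny in-process stand-in for memcached: get/set with an expiry time."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            return None  # missing or expired
        return entry[0]

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

def get_profile(user_id, cache, load_from_db, ttl=60):
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    key = f"profile:{user_id}"
    value = cache.get(key)
    if value is None:
        value = load_from_db(user_id)
        cache.set(key, value, ttl)
    return value

db_calls = []

def load_from_db(user_id):
    db_calls.append(user_id)  # count how often MySQL would have been hit
    return {"id": user_id, "name": "alice"}

cache = TTLCache()
get_profile(7, cache, load_from_db)
get_profile(7, cache, load_from_db)
print(len(db_calls))  # 1: the second read was served from the cache
```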

Isolation & Error Handling
• Isolate your application from MySQL
• Design your application to run without the MySQL
server (as much as possible)
• Pre-produce full and/or partial web pages
• No need to process web access logs in real-time
• Cope (more) gracefully when there’s an outage
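
Coping gracefully can be as simple as falling back to a pre-produced page when the database call fails. A minimal sketch, with a hypothetical exception type and hypothetical fetch functions standing in for the real data layer:

```python
class DatabaseUnavailable(Exception):
    """Raised by the (hypothetical) data layer when MySQL cannot be reached."""

def render_front_page(fetch_live_items, prebuilt_html):
    """Serve fresh content when the database is up, a pre-produced page otherwise."""
    try:
        items = fetch_live_items()
    except DatabaseUnavailable:
        # Outage: fall back to the page generated ahead of time.
        return prebuilt_html, "stale"
    body = "".join(f"<li>{item}</li>" for item in items)
    return f"<ul>{body}</ul>", "live"

def healthy():
    return ["story one", "story two"]

def broken():
    raise DatabaseUnavailable("connection refused")

PREBUILT = "<ul><li>cached story</li></ul>"
print(render_front_page(healthy, PREBUILT))  # fresh page, marked "live"
print(render_front_page(broken, PREBUILT))   # pre-produced page, marked "stale"
```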

Configuration
• Ensure your servers are configured correctly
 Use LVM or ZFS snapshots for backups
 Ensure the battery-backed write cache (BBWC) is on and working
 Enables InnoDB to commit without disk head movement
 Make sure MySQL’s configured right
 Use InnoDB. MyISAM rarely the right choice.
 Give as much memory as possible to InnoDB
 Aim: Get all data in InnoDB Buffer Pool
• Ensure everything’s monitored
• Automate whatever you can

Hardware for MySQL
• Should I scale up or scale out?
• Each approach has benefits
 Don’t listen to sales & marketing people
• Each approach has problems
 Don’t listen to sales & marketing people
• Should I consider cloud computing?

Scale up
• Can be cheaper
 Smaller power and space usage
 Fewer machines to administer
• Many eggs in one basket
• MySQL’s own scaling problems add complexity
 Need to run multiple instances to take advantage of
massively parallel machines

Scale up
• Storage doesn’t scale up cheaply
 Significant storage infrastructure might be required
 Cost per I/O operation and terabyte likely to be
significantly higher
 … but, with a SAN, you only have one “live” copy of your
data
 Power & space savings could be cancelled out
 Single copy of data reduces maintainability
 I/O latency with SAN normally higher than with local
disk

Scale out
• Can be cheaper
 Lower initial cost
 Local storage cheaper
 … offset by duplication of data
• Can be more expensive
 Power, space, administration
• More frequent failures
 … but potentially less damaging

Cloud computing
• More appropriate in other parts of the application
stack
• Worth considering if you have peaky, CPU-
intensive workload
 Cater for your baseline yourself
 Use on-demand services for peaks
• Otherwise, avoid!
 Poor I/O performance will hurt you

Hardware
• Hardware strategy depends on:
 Application Architecture
 Predicted Growth Rates
 Funding

Capacity Planning
• Working out what you need to provide your
product(s) to your users within an acceptable
timeframe
• Constant process
• Use all the data available
 Extrapolate from monitoring data
 Use load testing data
 Use data gained from prototype testing
• Allow for hardware failure
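
A very rough sketch of the arithmetic: fit a straight line to past peak load, extend it forward, and size the tier with a spare host so that one failure is survivable. A straight line is only one possible growth model, and the traffic figures and per-host capacity (which would come from load testing) are hypothetical.

```python
import math

def linear_forecast(history, periods_ahead):
    """Least-squares straight line through evenly spaced samples, extended forward."""
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + periods_ahead)

def hosts_needed(peak_load, per_host_capacity, spares=1):
    """Hosts required to carry the peak, plus spares to allow for hardware failure."""
    return math.ceil(peak_load / per_host_capacity) + spares

monthly_peak_qps = [1000, 1200, 1400, 1600]  # hypothetical monitoring data
predicted = linear_forecast(monthly_peak_qps, periods_ahead=6)
hosts = hosts_needed(predicted, per_host_capacity=800)
print(predicted)  # 2800.0 queries/sec expected six months out
print(hosts)      # 5: four hosts to carry the load, plus one spare
```

Because capacity planning is a constant process, the forecast should be rerun whenever new monitoring or load testing data arrives.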

Capacity Planning

[Diagram. Labels: Past traffic data; Future traffic predictions; Traffic profile; Service Level Requirements; Performance data; 42 … and a new load balancing strategy]

Summary
• Sharding for scaling writes
• Replication and caching for scaling reads
• Big is bad. Small is super. Manageable is magic.
• Get the right hardware for the job
• Use your kit as efficiently as possible
 Optimise your queries
 Configure things correctly
• Monitor as much as possible
• Think about scaling now, not later

Thanks!

Want help to make it scale? Get in touch!
consulting@provenscaling.com
