Scaling out MySQL: Hardware today and tomorrow

Jeremy Cole, Eric Bergen {jeremy,eric}@provenscaling.com

Overview
• A look at hardware out there today
• What’s important for MySQL?

The big questions

What about 64-bit?
• Make absolutely everything 64-bit
• Every server you buy now will have 64-bit CPUs
• Except in a few corner cases, it won’t hurt, but may not help
• For MySQL servers, it will absolutely make your life easier
• Caveat: If you use third-party software, it may not work properly due to library issues etc.

How many cores?
• MySQL has problems scaling on many-core CPUs
• Peter Zaitsev and Mark Callaghan have addressed the issues many times in blog posts and conference sessions
• We normally recommend dual dual-core or dual quad-core
• Unless you are highly concurrent and CPU-bound, dual dual-core at a faster clock speed should perform better than dual quad-core at a slower clock speed

How much memory?
• As much as you can!
• Memory is quite cheap these days, as 4GB DIMMs have come down in price by about 80% or more
• Typical servers can hold up to 32GB; go for it!

Shared storage?
• This is usually the biggest question: Should I buy this big expensive SAN, or should I put some disks in RAID in each server?
• Shared storage places a lot of trust in a single system
• Reliability can be more difficult to achieve when a single system failure affects multiple other systems
• Storage shared across many tasks will make it very difficult to provide reliable service to MySQL
• I/O latency is much higher on SAN or NAS systems

Which vendor?
• Major server vendors: Dell, HP, IBM, Sun
• Smaller server vendors: SuperMicro, Rackable, Silicon Mechanics, iX Systems, etc.
• Bigger vendors can generally provide equipment much faster in a pinch
• Bigger vendors will have an easier time providing the same type of machines over a longer period of time
• Smaller vendors may be more willing to work with you on custom configurations or special needs

Acronym Soup

RAID
• “Redundant Array of Inexpensive Disks”
• Different RAID levels: 0, 1, 5, 10 are common
• For databases, 5 and 10 are the most common
• Can be connected via IDE, SATA, SCSI, SAS
• Can be internal or external (“shelf”)
• Can be implemented in hardware (LSI, 3ware, Adaptec, etc.) or software (Linux kernel, etc.)

RAID: Common Levels
• RAID 0 - Striping
• RAID 1 - Mirroring
• RAID 10 (1+0) - Mirroring + Striping
• RAID 0+1 - Striping + Mirroring
• RAID 5 - Distributed Parity
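
As a rough illustration of how the levels above trade capacity for redundancy, the following Python sketch computes usable capacity for a set of identical disks. The function name `usable_capacity_gb` is made up for this example, the RAID 1 case assumes a single mirror set, and any space a real controller reserves for metadata is ignored.

```python
def usable_capacity_gb(level: str, disks: int, disk_gb: float) -> float:
    """Usable capacity in GB for `disks` identical drives at a RAID level.

    Illustrative only: ignores controller metadata and hot spares.
    """
    if level == "0":
        # Striping only: all space is usable, no redundancy.
        if disks < 2:
            raise ValueError("RAID 0 needs at least 2 disks")
        return disks * disk_gb
    if level == "1":
        # Assumption: one mirror set, so every disk holds the same data.
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_gb
    if level in ("10", "0+1"):
        # Mirroring combined with striping: half the raw space is usable.
        if disks < 4 or disks % 2 != 0:
            raise ValueError("RAID 10 / 0+1 needs an even number of disks, at least 4")
        return (disks // 2) * disk_gb
    if level == "5":
        # Distributed parity costs the equivalent of one disk.
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    for level in ("0", "10", "5"):
        print(level, usable_capacity_gb(level, 8, 73))
```

With the eight 73GB disks of the example machine later in this deck, RAID 10 gives 292GB usable and RAID 5 gives 511GB.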

DAS
• “Direct-Attached Storage”
• Usually refers to a set of many RAIDed disks
• RAID isn’t necessarily a prerequisite to being DAS; you could have a JBOD DAS
• “Direct-Attached” because it’s attached to the host that will use the disks, not to a “headend” or other interim host

JBOD
• “Just a Bunch of Disks”
• Disks that are not RAIDed or part of a SAN or NAS system
• The OS will see each individual disk and is responsible for combining them if necessary (using e.g. software RAID or LVM)

BBWC, TB[B]U
• “Battery-Backed Write Cache”
• “Transportable Battery [Backup] Unit”
• A cache to hold writes while queuing them to be written to the actual disks
• Usually present in RAID cards
• Almost always present in SAN or other solutions

BBWC: Write Back vs. Through
• A BBWC can be in “write back” or “write through” mode
• “Write Back” acknowledges writes from the cache without writing the data to physical disk immediately (very dangerous without a working battery), but drastically increases performance on sequential, individually committed writes (such as binary logs, InnoDB logs)
• “Write Through” requires data to be written to the physical disk before acknowledging writes, which is slow
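
To get a feel for why write-through is slow for individually committed sequential writes, a back-of-the-envelope model may help. The Python sketch below assumes (as a simplification, not a measurement) that each committed log write must wait roughly one full platter rotation before the next one can land, with no group commit and no cache absorbing the write. The function names are illustrative only.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full platter rotation, in milliseconds."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return 60_000.0 / rpm


def max_serial_commits_per_second(rpm: float) -> float:
    """Rough ceiling on one-at-a-time commits in write-through mode.

    Assumption: every commit costs one full rotation of the disk.
    """
    return 1000.0 / rotation_time_ms(rpm)


if __name__ == "__main__":
    for rpm in (7200, 10_000, 15_000):
        print(rpm, rotation_time_ms(rpm), max_serial_commits_per_second(rpm))
```

Under that assumption a 15k RPM disk tops out at about 250 serial commits per second. A write-back cache acknowledges the write from memory, so it is not bound by this rotational limit.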

SAN
• “Storage Area Network”
• Generally either FC (Fibre Channel) or iSCSI (SCSI over IP, often via Gigabit Ethernet)
• Provides a volume to the host as a block device
• SANs are typically shared by many machines, but each volume on a SAN is normally only used by one host (“initiator”) at a time
• SANs may offer the host the ability to take copy-on-write snapshots

NAS
• “Network Attached Storage”
• Generally NFS and/or CIFS
• Provides the host a view of files via a high-level export protocol
• NASes are typically shared by many machines, and a single volume may be shared by many hosts
• NASes coordinate access to files

Out with the old: PATA, SCSI
• “Parallel ATA”
• Older host interface, primarily used in desktop machines
• “SCSI” :) (ok, technically “Small Computer System Interface”)
• Older host interface, primarily used in servers
• Allows for hot swapping
• High pin count, requires terminators, etc.

In with the new: SATA, SAS
• “Serial ATA”
• New version of ATA using a serial protocol at 1.5 Gbps and 3.0 Gbps
• Very low pin count, simple cables, hot swappable
• “Serial Attached SCSI”
• Same basic host interface as SATA
• SAS hosts can connect to SATA disks seamlessly
• SAS has additional features, such as multiple attachment, and a richer command set

SSD
• “Solid State Disk”
• Uses flash memory to store data
• Capable of very low latency for random “seek”
• Commercially available versions are much better suited to high random read environments than random writes
• Kevin Burton did lots of research on available SSDs, conclusion:
  - Not fast enough for high random write environments yet
  - InnoDB needs work to really take advantage

MySQL Stuff

Typical MySQL Requirements
• Assuming high write needs, fairly large database
• BBWC to allow InnoDB to commit without disk head movement
• Lots of memory to allow for a large InnoDB buffer pool
• Storage with low latency and high random write throughput
• Decent (but not awesome) CPUs

Memory Allocation
• Assuming an InnoDB-only system
• We normally recommend allocating system memory minus perhaps 2GB to the InnoDB buffer pool
• Very little memory needed for anything else really!
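
The rule of thumb above is simple arithmetic, sketched here in Python. The helper names and the default 2GB reserve are assumptions made for this illustration, and "GB" is treated as GiB; `innodb_buffer_pool_size` is the MySQL option the result would feed into.

```python
MIB_PER_GIB = 1024


def buffer_pool_mib(system_memory_gib: float, reserved_gib: float = 2.0) -> int:
    """Buffer pool size in MiB: system memory minus a small reserve.

    Assumes an InnoDB-only server, per the slide above.
    """
    if system_memory_gib <= reserved_gib:
        raise ValueError("system memory must exceed the reserved amount")
    return int((system_memory_gib - reserved_gib) * MIB_PER_GIB)


def my_cnf_line(system_memory_gib: float, reserved_gib: float = 2.0) -> str:
    """Format the result as a my.cnf setting."""
    size = buffer_pool_mib(system_memory_gib, reserved_gib)
    return f"innodb_buffer_pool_size = {size}M"


if __name__ == "__main__":
    print(my_cnf_line(32))
```

For a 32GB server this yields a 30GB buffer pool, written as 30720M.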

Shared vs. Independent
• Shared storage systems can be used in combination with Linux HA to achieve failover
• Independent storage can be used in combination with MySQL replication to achieve failover
• On shared storage systems, failover will require a recovery of MySQL databases
• On independent storage, failover can be nearly instantaneous

An Example Machine
• Dell PowerEdge 2950
• Dual Quad Core E5430 @ 2.66GHz
• 32GB 667MHz RAM
• 8 x 73GB 15k RPM 2.5” SAS
• Dual power supplies
• Rack mount kit

• List price: $8400
• Real price: ~$6k
• Typical power consumption: 440W (3.83A @ 115V)
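
The power figure above can be checked with a line of arithmetic, and extended to a yearly energy estimate. This Python sketch treats volts times amps as watts, which ignores power factor, and assumes the machine draws its typical load around the clock; both are simplifications, and the function names are made up for the example.

```python
HOURS_PER_YEAR = 24 * 365


def power_watts(amps: float, volts: float) -> float:
    """Approximate power draw, ignoring power factor."""
    return amps * volts


def annual_kwh(watts: float) -> float:
    """Energy used in a year at a constant draw of `watts`."""
    return watts * HOURS_PER_YEAR / 1000.0


if __name__ == "__main__":
    watts = power_watts(3.83, 115)
    print(round(watts), annual_kwh(440))
```

3.83A at 115V works out to roughly 440W, matching the slide, which is about 3,854 kWh per machine per year at constant typical load.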

Special Hardware

Kickfire
• Execute queries on a SQL-processing custom chip
• Massive access to large memories
• Very cool tech!

Violin Memory
• Half a TB of DRAM in a 2U
• Accessible as a block device
• Very cool tech as well!

High-speed Interconnects
• InfiniBand
• Dolphin Interconnect
• Both very interesting for clustered systems, providing low-latency, high-throughput network access
• Software has to be written specifically to use either