Falcon from the Beginning

Jim Starkey jstarkey@mysql.com

Why Falcon? Because the World is Changing!
- Hardware is evolving rapidly
- Customers need ACID transactions
  - Atomic: the books should balance
  - Consistent: the alternative is chaos
  - Isolated: preserve the programmer’s sanity (sic)
  - Durable: who wants to lose data?

Where Hardware is going
- CPUs breed like rabbits: more sockets, more cores per socket, more threads per core
- Memory is bigger, faster, and cheaper
- Disks are bigger and cheaper but not much faster
- (Boxes are cheaper and more plentiful, but that’s a different story)

Where Applications are going
- Batch: dead!
- Timesharing: dead!
- Departmental computing: dead!
- Client server: fading fast
- Application servers for most of us
- Web services for the really big guys

The Database challenge
- Traditional challenge: exhaust CPU, memory, and disk simultaneously
- Today’s challenge: exhaust CPU and memory and avoid the disk

Falcon tradeoffs
- Use memory (page cache) to avoid disk reads
- Use memory (record cache) to avoid page cache manipulation
- Use CPU to find the fastest path to a record
- Use CPU to minimize record size
- Synchronize most data structures with user-mode read/write locks
- Synchronize high-contention data structures with interlocked instructions

The Falcon architecture
- Incomplete in-memory database with disk backfill
- Multi-version concurrency control in memory
- Updates in memory until commit
- Group commits to a single serial log write
- Post-commit multi-threaded pipeline to move updates to disk

Incomplete in-memory database
- Selected records cached in memory
- Separate cache for disk pages
- A record cache hit is 15% of the cost of a page cache hit
- Record cache is more memory efficient than page cache

Record Encoding - Cache Efficiency
- Records encoded by value, not declaration
- String “abc” occupies the same space in varchar(3) or varchar(4096)
- The number 7 is encoded the same whether declared smallint, mediumint, int, bigint, decimal, or numeric
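
A rough sketch of what "encoded by value" can mean. This is not Falcon's actual record format: the function name `encode_value`, the tag bytes, and the one-byte length field are all assumptions made up for illustration. The point is only that the declared column type never enters the encoding.

```python
def encode_value(value):
    """Encode a value using only the bytes the value itself needs.

    Hypothetical format (not Falcon's): one tag byte, one length byte,
    then the payload. The declared column type is never consulted.
    """
    if isinstance(value, int):
        # Smallest byte count that holds the signed integer
        # (bit_length() excludes the sign bit, so add one bit and round up).
        length = max(1, (value.bit_length() + 8) // 8)
        return b"I" + bytes([length]) + value.to_bytes(length, "big", signed=True)
    if isinstance(value, str):
        payload = value.encode("utf-8")
        if len(payload) > 255:
            raise ValueError("sketch only handles strings up to 255 bytes")
        return b"S" + bytes([len(payload)]) + payload
    raise TypeError("sketch only handles int and str")


# "abc" takes 5 bytes whether the column is varchar(3) or varchar(4096);
# 7 takes 3 bytes whether the column is smallint or bigint.
abc_bytes = encode_value("abc")
seven_bytes = encode_value(7)
```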

Multi-Version Concurrency Control
- Update operations create new record versions
- New version is tagged with transaction id, points to old version
- System tracks which transactions should see which versions
- Readers don’t block writers
- Everyone sees a consistent view of the data
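
The version chain described above can be sketched as follows. This is a minimal illustration, not Falcon's code: `RecordVersion`, `visible_version`, and the idea of passing the reader a set of transaction ids that committed before it started are assumptions chosen to keep the example short.

```python
class RecordVersion:
    """One version of a record, tagged with the transaction that wrote it."""

    def __init__(self, data, txn_id, prior=None):
        self.data = data
        self.txn_id = txn_id
        self.prior = prior  # next-older version, or None


def visible_version(newest, reader_txn_id, committed_before_reader):
    """Walk from newest to oldest and return the first version the reader may see.

    A reader sees its own writes, plus versions written by transactions
    that had committed before the reader started (hypothetical rule).
    """
    version = newest
    while version is not None:
        if version.txn_id == reader_txn_id or version.txn_id in committed_before_reader:
            return version
        version = version.prior
    return None  # the record does not exist from this reader's point of view


# Transaction 1 created the record; transaction 5 updated it and is still open.
v1 = RecordVersion("balance=100", txn_id=1)
v2 = RecordVersion("balance=80", txn_id=5, prior=v1)
```

A reader that started while transaction 5 was still open follows the chain to `v1` without waiting on the writer, which is how readers avoid blocking writers.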

Updates Are in Memory Until Commit
- Updates held in memory pending commit (well, usually)
- Index changes held in memory pending commit (same caveat)
- Verb rollback is dirt cheap
- Transaction rollback is dirt cheap
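
Why rollback is cheap when nothing has reached disk: a minimal sketch, assuming pending updates sit in a simple in-memory list. The class and method names are hypothetical, and "verb" is taken here to mean a single statement within a transaction.

```python
class Transaction:
    """Hypothetical transaction that keeps all pending updates in memory."""

    def __init__(self):
        self.pending = []  # (key, value) updates, in the order they were made

    def start_verb(self):
        # A savepoint is just the current length of the pending list.
        return len(self.pending)

    def update(self, key, value):
        self.pending.append((key, value))

    def rollback_verb(self, savepoint):
        # Undo one statement: drop everything added since its savepoint.
        del self.pending[savepoint:]

    def rollback(self):
        # Undo the whole transaction: nothing on disk needs to be touched.
        self.pending.clear()
```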

At Commit…
- Pending record updates flushed to serial log
- Pending index updates flushed to serial log
- Commit record written to serial log
- Serial log flushed to the oxide
- And the transaction is committed!
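
The commit sequence, combined with the group commit mentioned under the architecture, might look like the sketch below. `SerialLog` and `group_commit` are invented names, and the in-memory lists merely stand in for a real log file and its flush to disk.

```python
class SerialLog:
    """Hypothetical append-only log; flush_count stands in for disk flushes."""

    def __init__(self):
        self.buffer = []    # records written but not yet flushed
        self.durable = []   # records that have reached the disk
        self.flush_count = 0

    def write(self, record):
        self.buffer.append(record)

    def flush(self):
        self.durable.extend(self.buffer)
        self.buffer.clear()
        self.flush_count += 1


def group_commit(log, transactions):
    """Commit several transactions with a single log flush.

    Each transaction is a (txn_id, record_updates, index_updates) tuple.
    """
    for txn_id, record_updates, index_updates in transactions:
        for update in record_updates:
            log.write(("record", txn_id, update))
        for update in index_updates:
            log.write(("index", txn_id, update))
        log.write(("commit", txn_id))
    log.flush()  # one flush makes every transaction in the group durable
    return [txn_id for txn_id, _, _ in transactions]
```

Batching the flush means the slow disk operation is paid once per group rather than once per transaction.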

Alas, Memory isn’t infinite, so
- A large transaction chills its uncommitted data (flushes it to the log early)
- Chilled records can be thawed (fetched back from the log)
- Scavenger periodically garbage collects unloved records
- When things get really bad, entire record chains are flushed to the backlog
- (Note: This is hard and we aren’t done.)
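
Chilling and thawing can be pictured roughly as below. This is a toy model, not Falcon's implementation: the class name, the record-count threshold, and the list standing in for the serial log are all assumptions.

```python
class ChillableTransaction:
    """Hypothetical transaction that spills pending records to a log when large."""

    def __init__(self, chill_threshold):
        self.chill_threshold = chill_threshold  # max pending records kept in memory
        self.in_memory = {}  # record_id -> data
        self.chilled = {}    # record_id -> position of the data in the log
        self.log = []        # stand-in for the serial log

    def update(self, record_id, data):
        self.chilled.pop(record_id, None)  # a newer version supersedes a chilled one
        self.in_memory[record_id] = data
        if len(self.in_memory) > self.chill_threshold:
            self.chill()

    def chill(self):
        # Write uncommitted data to the log early and keep only its position.
        for record_id, data in self.in_memory.items():
            self.chilled[record_id] = len(self.log)
            self.log.append(data)
        self.in_memory.clear()

    def fetch(self, record_id):
        if record_id in self.in_memory:
            return self.in_memory[record_id]
        # Thaw: read the record back from the log into memory.
        data = self.log[self.chilled.pop(record_id)]
        self.in_memory[record_id] = data
        return data
```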

Falcon Weaknesses
- Transactions are ACID but not serializable
- Latency advantage disappears at saturation
- Very large transactions degrade performance
- Optimized for Web, not batch

Falcon Strengths
- Runs like an in-memory database when data fits in cache
- Scales like a disk-based database when data doesn’t fit in cache
- Lowest possible latency for Web applications
- Absorbs huge spiky loads

Performance Measurement
- We generally benchmark against InnoDB (transactional engines)
- We use the DBT2 benchmark:
  - High contention
  - Write intensive: 40% of records touched are updated
  - Measures only performance at saturation
- DBT2 (we believe) is InnoDB’s best spot and Falcon’s worst

Benchmarking Results
- 16- and 8-CPU systems: Falcon exceeds InnoDB performance
- 4-CPU systems: Falcon exceeds InnoDB performance for moderate to large numbers of threads
- 2-CPU systems: rough parity, advantage to InnoDB
- 1-CPU systems: InnoDB wins
- Caveat: results subject to change! Both systems are moving targets!

When should you use what?
- If you don’t need ACID, MyISAM is probably fastest
- For uniprocessors and small-memory systems, InnoDB is a good choice
- For large-transaction batch work, InnoDB may be the best match
- For multi-core systems and large numbers of threads, Falcon is probably best
- For the Web, Falcon is hard to beat

Questions?