OSCON2007: Landscape of trx engines

Published by: Oleksiy Kovyrin on Feb 20, 2008
Copyright: Attribution Non-commercial

06/16/2009

Landscape of Open Source Transactional Storage Engines

Peter Zaitsev, Vadim Tkachenko
http://MySQLPerformanceBlog.com

About us
- Founded Percona Ltd: MySQL performance focused consulting
- Authors of http://www.MySQLPerformanceBlog.com
- Worked for MySQL AB for years
  - Peter: lead of the "High Performance Group"; Vadim: his right hand
- Long-time MySQL users for a bunch of personally involved projects

MySQL pluggable architecture

MySQL Transactional Engines
- BDB: legacy storage engine, removed in 5.1, not tested
- InnoDB: "most popular" (the only commonly used) storage engine, by Innobase Oy
- SolidDB: storage engine from Solid Information Technology
- PBXT: storage engine by SNAP Innovation (Paul McCullagh)
- Falcon: new storage engine by MySQL AB, project led by Jim Starkey
- NDB: MySQL Cluster is a whole other beast and not covered

InnoDB
- http://www.innodb.com/
- Mature storage engine; development started by Heikki Tuuri over 10 years ago
- Heikki was looking for ways to improve the performance of traditional databases
- Acquired by Oracle at the end of 2005
- The only transactional storage engine available in the MySQL 5.0 official release

SolidDB
- http://www.solidtech.com/solidDBforMySQL/
- Open-sourced in 2006
- Existing storage engine technology "integrated" with MySQL
- Focused on reliability and multiprocessor scalability
- Currently shipped as production ready

PrimeBase XT (PBXT)
- http://www.primebase.com/xt/
- Written mainly by Paul McCullagh since 2005
- Not a port of an existing storage engine to MySQL, but a new engine written from scratch
- Uses a number of unusual design decisions
- Only 50% transactional
- Focused on efficient BLOB storage: http://www.blobstreaming.org/

Falcon
- http://dev.mysql.com/doc/falcon/en/index.html
- Based on the "Netfrastructure" engine by Jim Starkey
- Purchased by MySQL AB in early 2006
- "Lightweight Design"
- Focused on the transactional needs of Web applications and efficient use of large amounts of memory

Design and Behavior

InnoDB design
- MVCC and very efficient row-level locks
- Clustering by primary key; writes go to the same pages
- Non-compressed secondary indexes with transaction info
- Single tablespace or tablespace per table
- Pessimistic locking
- Instant deadlock detection
- Fuzzy checkpointing
- "Double Write" for partial page write protection
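The effect of clustering by primary key can be sketched with a toy Python model (hypothetical class and method names; plain sorted lists stand in for B-trees, and this is not InnoDB's code). Rows live inside the primary index, so a lookup by primary key is a single search, while a secondary index stores primary-key values and every match costs a second search into the primary index.

```python
from bisect import bisect_left


class ClusteredTable:
    """Toy table clustered by primary key (hypothetical, for illustration)."""

    def __init__(self):
        self.pk_keys = []   # sorted primary keys
        self.rows = []      # full rows, stored in the same order as pk_keys
        self.sec = []       # sorted (country_id, pk) pairs: a secondary index
        self.searches = 0   # number of index searches performed

    def insert(self, pk, row):
        i = bisect_left(self.pk_keys, pk)
        self.pk_keys.insert(i, pk)
        self.rows.insert(i, row)
        entry = (row["country_id"], pk)
        self.sec.insert(bisect_left(self.sec, entry), entry)

    def get_by_pk(self, pk):
        # One search: the row itself lives in the primary index.
        self.searches += 1
        i = bisect_left(self.pk_keys, pk)
        if i < len(self.pk_keys) and self.pk_keys[i] == pk:
            return self.rows[i]
        return None

    def names_by_country(self, country_id):
        # One search in the secondary index, then one PK search per match.
        self.searches += 1
        i = bisect_left(self.sec, (country_id,))
        names = []
        while i < len(self.sec) and self.sec[i][0] == country_id:
            names.append(self.get_by_pk(self.sec[i][1])["name"])
            i += 1
        return names


t = ClusteredTable()
t.insert(2, {"name": "bob", "country_id": 1})
t.insert(1, {"name": "ann", "country_id": 1})
t.insert(3, {"name": "cy", "country_id": 7})
```

Counting searches shows why primary-key lookups are the sweet spot for engines clustered by primary key, and why reading a non-indexed column through a secondary index touches the primary index once per matching row.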

InnoDB
- DEADLOCK detection:
  Session 1: BEGIN;
  Session 2: BEGIN;
  Session 1: UPDATE test SET name='random1-1' WHERE id=1;
  Session 2: UPDATE test SET name='random2-1' WHERE id=2;
  Session 1: UPDATE test SET name='random1-2' WHERE id=2;
  Session 2: UPDATE test SET name='random2-2' WHERE id=1;
- InnoDB detects the deadlock instantly (error 1213) in the second session
- Pessimistic locking:
  - UPDATE of the same row in a concurrent transaction: the second transaction waits for COMMIT/ROLLBACK in the first
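Instant deadlock detection can be modelled as a check on a wait-for graph each time a transaction is about to block. The Python sketch below is a simplified illustration, not InnoDB's implementation: it handles only exclusive row locks, replays the four UPDATEs above as lock requests on rows 1 and 2, and always picks the requesting transaction as the victim (a real engine may choose its victim differently).

```python
class DeadlockError(Exception):
    """Stands in for MySQL error 1213 (deadlock found when trying to get lock)."""


class LockManager:
    """Toy exclusive row-lock manager with instant deadlock detection."""

    def __init__(self):
        self.owner = {}      # row id -> transaction holding the lock
        self.waits_for = {}  # waiting transaction -> transaction it waits on

    def _chain_reaches(self, start, target):
        # Follow the wait-for chain from `start`; each waiter waits on one holder.
        seen = set()
        node = start
        while node is not None and node not in seen:
            if node == target:
                return True
            seen.add(node)
            node = self.waits_for.get(node)
        return False

    def lock(self, txn, row):
        holder = self.owner.get(row)
        if holder is None or holder == txn:
            self.owner[row] = txn
            return "granted"
        # Waiting on `holder` closes a cycle if `holder` already waits on txn.
        if self._chain_reaches(holder, txn):
            self.rollback(txn)
            raise DeadlockError(f"{txn}: deadlock found when trying to get lock")
        self.waits_for[txn] = holder
        return "waiting"

    def rollback(self, txn):
        # Release everything txn holds (waking other waiters is not modelled).
        self.owner = {r: t for r, t in self.owner.items() if t != txn}
        self.waits_for.pop(txn, None)


lm = LockManager()
steps = [("S1", 1), ("S2", 2), ("S1", 2), ("S2", 1)]
outcomes = []
for txn, row in steps:
    try:
        outcomes.append(lm.lock(txn, row))
    except DeadlockError as err:
        outcomes.append(str(err))
print(outcomes)
```

The fourth request would make S2 wait on S1 while S1 already waits on S2, so the cycle is found at once and the second session gets the error, as we saw with InnoDB.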

InnoDB Strengths
- Powerful MVCC
- Good performance on a wide range of workloads
- Great stability
- Great data protection
- Primary key clustering allows a lot of optimizations
- Transaction info in secondary indexes allows fast index-only scans
- Adaptive hash indexes and other advanced techniques

InnoDB Weaknesses
- Slow development pace in recent years
- Still has scalability issues with multiple CPUs
- Unscalable auto-increment and broken group commit take very long to fix
- Large footprint, especially for secondary indexes
  - It turns out not to be so large once we compare
- Still messy integration with MySQL
  - How do you see how much space is free in the InnoDB tablespace?
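As far as we know, MySQL 5.0 and 5.1 expose InnoDB free space only as free text in the Comment column of SHOW TABLE STATUS, along the lines of "InnoDB free: 4096 kB". A monitoring tool therefore has to scrape that string. The Python helper below sketches this; the comment format it matches is an assumption to check against your server version, and the function name is hypothetical.

```python
import re

# Assumed Comment format from SHOW TABLE STATUS, e.g. "InnoDB free: 4096 kB".
FREE_RE = re.compile(r"InnoDB free: (\d+) kB")


def innodb_free_bytes(comment):
    """Return free tablespace bytes parsed from a Comment value, or None."""
    match = FREE_RE.search(comment or "")
    return int(match.group(1)) * 1024 if match else None


print(innodb_free_bytes("InnoDB free: 4096 kB"))  # 4194304
print(innodb_free_bytes("InnoDB free: 10240 kB; (`c_id`) REFER `db/customer`(`id`)"))
print(innodb_free_bytes("MyISAM table"))           # None
```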

SolidDB Design
- MVCC and row-level locking
- Clustering by primary key
- New data stored in new pages
- "Bonsai Tree" used for multiversioning
- OPTIMISTIC and PESSIMISTIC locking, specified at table level
- Online backup (not usable for slave creation)
- High-availability sync replication promised soon
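The slide names a "Bonsai Tree" for SolidDB's multiversioning; the toy Python store below does not model that structure. It only shows the general MVCC idea shared by the engines reviewed here: writers append new versions, and a reader keeps seeing the versions committed before its snapshot. Removing old versions (purge) is left out, and all names are hypothetical.

```python
class MVCCStore:
    """Generic multiversion store (a toy, not SolidDB's Bonsai Tree)."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # logical timestamp of the latest commit

    def snapshot(self):
        # A reader remembers the newest commit it is allowed to see.
        return self.clock

    def commit_write(self, key, value):
        # Writers append a new version instead of overwriting the old one.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read(self, key, snapshot_ts):
        # Newest version committed at or before the reader's snapshot.
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


db = MVCCStore()
db.commit_write("balance", 100)
reader = db.snapshot()           # a long-running reader starts here
db.commit_write("balance", 150)  # a later writer does not block the reader
print(db.read("balance", reader), db.read("balance", db.snapshot()))  # 100 150
```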

SolidDB - PESSIMISTIC
- DEADLOCK: detected in the first session after 20 sec of waiting
  - Timeout-based deadlock detection
- Concurrent UPDATE of the same row: the second session waits on the first
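A timeout-based scheme needs no wait-for graph at all. The sketch below assumes a 20-second limit, matching the wait we observed; the constant and function names are hypothetical and are not SolidDB configuration parameters.

```python
LOCK_WAIT_TIMEOUT = 20.0  # seconds; assumed from the wait observed with SolidDB


def timed_out(waiting_since, now, timeout=LOCK_WAIT_TIMEOUT):
    """Return sessions whose current lock wait has lasted at least `timeout`.

    waiting_since maps session -> clock time at which it started waiting.
    No wait-for graph is consulted: a deadlock is broken only because the
    session that has waited longest hits the timeout first.
    """
    return sorted(s for s, start in waiting_since.items() if now - start >= timeout)


# Session 1 blocks on row 2 at t=10s, Session 2 blocks on row 1 at t=11s:
# a deadlock, but nothing happens until the first wait reaches 20 seconds.
waits = {"Session 1": 10.0, "Session 2": 11.0}
for now in (15.0, 29.9, 30.0):
    print(now, timed_out(waits, now))
```

The deadlock is resolved only once Session 1 has waited 20 seconds, which is why the first session, not the second, receives the error.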

SolidDB - OPTIMISTIC
- DEADLOCK: detected in the second session immediately, but with error 1205 (Lock wait timeout exceeded)
- Concurrent UPDATE of the same row:
  SESSION 1: BEGIN;
  SESSION 2: BEGIN;
  SESSION 1: UPDATE test SET name = 'nd' WHERE id=2;
  SESSION 2: UPDATE test SET name = 'nd' WHERE id=2;
- In session 2 we got: ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
- This is OK for OPTIMISTIC engines, but may cause trouble in Web applications
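The fail-fast behaviour of optimistic locking can be sketched as a first-updater-wins check. This is a simplified Python model with hypothetical names, not SolidDB's or PBXT's code: it detects the conflict at UPDATE time (what we observed: an immediate error in the second session) and leaves the retry to the application.

```python
class ConflictError(Exception):
    """Stands in for the immediate error the second writer receives."""


class OptimisticStore:
    """Toy first-updater-wins store: conflicts fail fast instead of waiting."""

    def __init__(self, rows):
        self.data = dict(rows)   # committed values: row id -> value
        self.writer = {}         # row id -> transaction with an uncommitted change
        self.changes = {}        # transaction -> {row id: new value}

    def update(self, txn, row_id, value):
        other = self.writer.get(row_id)
        if other is not None and other != txn:
            # A pessimistic engine would block here until `other` finishes.
            raise ConflictError(f"{txn}: row {row_id} already changed by {other}")
        self.writer[row_id] = txn
        self.changes.setdefault(txn, {})[row_id] = value

    def commit(self, txn):
        for row_id, value in self.changes.pop(txn, {}).items():
            self.data[row_id] = value
            del self.writer[row_id]

    def rollback(self, txn):
        for row_id in self.changes.pop(txn, {}):
            del self.writer[row_id]


store = OptimisticStore({2: "old"})
store.update("SESSION 1", 2, "nd")
try:
    store.update("SESSION 2", 2, "nd")
except ConflictError as err:
    print(err)  # the application has to roll back and retry
    store.rollback("SESSION 2")
store.commit("SESSION 1")
store.update("SESSION 2", 2, "nd")  # the retry succeeds once SESSION 1 is done
store.commit("SESSION 2")
```

That retry path is the burden that makes optimistic engines awkward for Web applications written with pessimistic behaviour in mind.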

SolidDB Strengths and Weaknesses
- Too little production usage to really tell
- Of the storage engines reviewed, the most similar in design to InnoDB
- Choice of optimistic vs pessimistic locking is nice for some applications
- No instant deadlock detection
- So far available only as a special download (not even a plugin)

PBXT Design
- MVCC with row-level locking
- "Per Database" transactions
- No real durability yet, weak crash recovery
- OPTIMISTIC locking
- Write once, write sequentially to log
- Never update in place
- Data cache + key cache
- Efficient BLOB handling
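The "write once, write sequentially, never update in place" approach can be sketched as an append-only log plus an index of the latest offsets. This is a hypothetical Python toy, not PBXT's on-disk format; it also shows where purge work comes from.

```python
class AppendOnlyTable:
    """Toy 'write once, never update in place' table (hypothetical)."""

    def __init__(self):
        self.log = []     # sequential log of (key, value) records
        self.index = {}   # key -> offset of the latest record in self.log

    def write(self, key, value):
        # Inserts and updates both append; existing records are never modified.
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1

    def read(self, key):
        offset = self.index.get(key)
        return None if offset is None else self.log[offset][1]

    def dead_records(self):
        # Records no longer referenced by the index: the purge backlog.
        return len(self.log) - len(set(self.index.values()))


t = AppendOnlyTable()
t.write(1, "a")
t.write(1, "b")   # the update appends a new record; "a" stays in the log
t.write(2, "c")
print(t.read(1), len(t.log), t.dead_records())  # b 3 1
```

Every update leaves a dead record behind, so a background purge has to reclaim that space later, which is the potentially large purging overhead listed among PBXT's weaknesses.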

PBXT
- DEADLOCK: detected in the second session, error 1213
- Concurrent UPDATE of the same row is handled optimistically; the second session gets: ERROR 1020 (HY000): Record has changed since last read in table 'test2'

PBXT Strengths and Weaknesses
- Not yet commonly used in production (we tried, but got too many bugs)
- Very good performance for some workloads
- Efficient storage, close to MyISAM
- Focused on efficient BLOB handling; extra features like BlobStreaming
- Still mainly a one-man project
- Large ToDo list; a lot needs to be done, including recovery
- Potentially large purging overhead

Falcon Design
- MVCC, row-level locking (in practice, not in theory)
- PESSIMISTIC locking
- Not clustered by primary key
- Row cache (caches only the rows you need)
- "Optimal" index traversal
- "Data Compression": NULLs, empty strings
- Always needs to read row data (because of the index structure)

Falcon
- DEADLOCK: in session 2: ERROR 1020 (HY000): Record has changed since last read in table 'test2'
  - Ann Harrison tells us Falcon checks for cycles in the lock graph periodically rather than instantly on a row lock wait
- UPDATE: the second session waits

Falcon Strengths and Weaknesses
- Still alpha with many bugs; too early to judge
- Very active support from MySQL AB
- Fast development pace: bugs are being fixed quickly, major performance improvement during the last 3 months
- Good integration with MySQL; tables exposing performance data
- No primary key clustering or covering index support
- Different design decisions can complicate migration from InnoDB (though logical behavior has become closer)

There are lies, big lies and there are Benchmarks

Benchmarks: things to note
- Benchmarks may not be relevant to the performance of your application
- The early versions we tried for Falcon and PBXT may change their performance properties before production
- There is not much experience out there using Falcon, PBXT and Solid with MySQL, as they are barely used in production
- We did fewer benchmarks than we wanted: we spent a lot of time fighting/reporting bugs and checking fixes

Benchmarks
- Read-only on a typical table for a Web application
- DBT2: TPC-C emulation
- Dell DVD Store: emulation of an e-commerce site
- Sysbench: OLTP transactions
- Sql-bench: small data set, single user, typical query patterns

Box
- Dell PowerEdge 2950
- CentOS release 4.5
- 4 CPUs: model name: Intel(R) Xeon(R) CPU 5148 @ 2.33GHz, stepping: 6, cpu MHz: 2327.529, cache size: 4096 KB
- 16 GB of RAM
- RAID 10 (six 10K RPM 3.5" SAS hard drives)

MySQL Versions
- Yes, this means the version affects performance, not only the storage engine, but we could not get all storage engines working with the same MySQL version.
- InnoDB and PBXT: 5.1.19
- Falcon: 6.0.1-alpha, bk tree from 10 Jul
- SolidDB: 5.0.41-0073

Engines parameters
- 12 GB of RAM for buffers
- InnoDB: --innodb_buffer_pool_size=12G --innodb_flush_method=O_DIRECT --innodb-log-file-size=100M
- SolidDB: --soliddb-cache-size=12G
- Falcon: --falcon_min_record_memory=2G --falcon_max_record_memory=4G --falcon_page_cache_size=8G
- PBXT: pbxt_index_cache_size=8G pbxt_record_cache_size=4G
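A quick arithmetic check that every engine received the same 12 GB buffer budget. The sketch counts Falcon's falcon_max_record_memory (not the minimum) plus its page cache, and treats innodb-log-file-size as a log setting rather than a buffer; both are our reading of the slide, and the helper names are hypothetical.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(size):
    """Convert a MySQL-style size such as '12G' or '100M' to bytes."""
    return int(size[:-1]) * UNITS[size[-1].upper()]


# Cache settings from the slide; for Falcon we count the record-memory maximum.
ENGINE_CACHES = {
    "InnoDB": ["12G"],        # innodb_buffer_pool_size
    "SolidDB": ["12G"],       # soliddb-cache-size
    "Falcon": ["4G", "8G"],   # falcon_max_record_memory + falcon_page_cache_size
    "PBXT": ["8G", "4G"],     # pbxt_index_cache_size + pbxt_record_cache_size
}


def budget_gb(sizes):
    return sum(to_bytes(s) for s in sizes) / 1024 ** 3


for engine, sizes in ENGINE_CACHES.items():
    print(f"{engine}: {budget_gb(sizes):g} GB")  # every engine: 12 GB
```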

DBT2 Configuration Details
- DBT2: http://osdldbt.sourceforge.net/
- 10 concurrent users (about 2 for each CPU core and disk)
- "Zero Delay" to fully load the MySQL server
- In the 400W configuration, available memory was reduced to 4 GB by locking 12 GB of memory, to make the workload I/O bound
- Buffer sizes were reduced to 2 GB

DBT2 – 10 warehouses
[Chart: New Order Transactions Per Minute (NOTPM) for InnoDB, SolidDB, Falcon and PBXT; bar values shown: 17744, 8209, 6097]
- 10 warehouses, 10 clients (data size ~700M)
- Result in New Order Transactions Per Minute; more is better
- PBXT crashed
- An old version of Falcon had ~1100 NOTPM
  - Great improvement!

DBT2 – 400 warehouses
- Data size ~29GB
- SolidDB crashed after 336 mins
- Did not disable logs on SolidDB, to keep things comparable
[Chart: load time in minutes for InnoDB, PBXT and Falcon; bar values shown: 40, 63, 136]

DBT2, 400W, Data size
[Chart: size of loaded data in MB for InnoDB, SolidDB, PBXT and Falcon; bar values shown: 30726, 38266, 42191, 41770]
- Surprisingly large size from PBXT
- SolidDB: tables were loaded into MyISAM and then converted to SolidDB
  - It was crashing otherwise

DBT2, 400W, Results
- PBXT crashed
[Chart: NOTPM for InnoDB, SolidDB and Falcon; bar values shown: 178, 495, 1105]
- Result in New Order Transactions Per Minute; more is better

Dell DVD Store
- Data size: Medium, 1 GB (2,000,000 customers, 100,000 products)
- Falcon: crashed
- PBXT: a lot of errors
- Result in new orders per minute; more is better
[Chart: orders per minute for InnoDB and SolidDB; bar values shown: 17589, 7594]

sysbench
- Older Falcon used in this test. The new one crashes :(
- A couple of READ-ONLY queries against a typical table for Web applications (user account info):
  CREATE TABLE IF NOT EXISTS sbtest (
    id int(10) unsigned NOT NULL auto_increment,
    name varchar(64) NOT NULL default '',
    email varchar(64) NOT NULL default '',
    password varchar(64) NOT NULL default '',
    dob date default NULL,
    address varchar(128) NOT NULL default '',
    city varchar(64) NOT NULL default '',
    state_id tinyint(3) unsigned NOT NULL default '0',
    zip varchar(8) NOT NULL default '',
    country_id smallint(5) unsigned NOT NULL default '0',
    PRIMARY KEY (id),
    KEY `country_id` (country_id,state_id,city)
  )

sysbench, read by primary key
SELECT name FROM sbtest WHERE id=?
[Chart: queries/sec vs. clients (1, 4, 16, 64, 128, 256) for InnoDB, Falcon, SolidDB and PBXT]
- InnoDB and Solid have a sweet spot, being clustered by PK

sysbench, read by index
SELECT name FROM sbtest WHERE country_id=?
[Chart: queries/sec vs. clients (1, 4, 16, 64, 128, 256) for InnoDB, Falcon, SolidDB and PBXT]
- PBXT excels
- Falcon comes next

sysbench, read by covered index
SELECT state_id FROM sbtest WHERE country_id=?
[Chart: queries/sec vs. clients (1, 4, 16, 64, 128, 256) for InnoDB, Falcon, SolidDB and PBXT]
- PBXT is still best
- Falcon can't use a covered index
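The difference between "read by index" and "read by covered index" can be reproduced with Python's built-in sqlite3 module. SQLite only stands in for MySQL here, so the plan strings are SQLite's own; the table is a simplified port of sbtest and the index name country_idx is hypothetical. In MySQL, EXPLAIN reports a covering index as "Using index" in the Extra column.

```python
import sqlite3

# Simplified SQLite port of the sbtest table: MySQL-only syntax such as
# "unsigned" and "auto_increment" is dropped and only the queried columns kept.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sbtest ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL DEFAULT '',"
    " city TEXT NOT NULL DEFAULT '',"
    " state_id INTEGER NOT NULL DEFAULT 0,"
    " country_id INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("CREATE INDEX country_idx ON sbtest (country_id, state_id, city)")


def plan(sql):
    """Return the 'detail' strings of SQLite's EXPLAIN QUERY PLAN for `sql`."""
    return " / ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (1,)))


QUERIES = {
    "read by primary key": "SELECT name FROM sbtest WHERE id=?",
    "read by index": "SELECT name FROM sbtest WHERE country_id=?",
    "read by covered index": "SELECT state_id FROM sbtest WHERE country_id=?",
}
for label, sql in QUERIES.items():
    print(f"{label}: {plan(sql)}")
```

Only the third query can be answered from the (country_id, state_id, city) index alone; an engine without covering index support must read row data for it as well.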

sysbench, read by index, LIMIT 20
SELECT name FROM sbtest WHERE country_id=? LIMIT 20
[Chart: queries/sec vs. clients (1, 4, 16, 64, 128, 256) for InnoDB, Falcon, SolidDB and PBXT]
- Falcon does not optimize LIMIT
- InnoDB scales poorly
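One way an engine can lose on LIMIT queries is by collecting every matching index entry before returning the first row. The Python sketch compares a pipelined plan with a collect-then-limit plan on the same sorted index. It is a simplified model suggested by the "optimal index traversal" design bullet, not a description of Falcon's actual code, and all names are hypothetical.

```python
from bisect import bisect_left
from itertools import islice


def matching(index, key, stats):
    """Walk a sorted (key, pk) index from the first entry equal to `key`."""
    i = bisect_left(index, (key,))
    while i < len(index) and index[i][0] == key:
        stats["entries"] += 1
        yield index[i][1]
        i += 1


def limit_pipelined(index, key, limit, stats):
    # Pull index entries lazily and stop as soon as `limit` rows are produced.
    return list(islice(matching(index, key, stats), limit))


def limit_after_collect(index, key, limit, stats):
    # Gather every match first (e.g. into a record-number bitmap), then cut.
    return sorted(matching(index, key, stats))[:limit]


# 1000 rows share country_id=1; the query asks for 20 of them.
index = [(1, pk) for pk in range(1000)] + [(2, pk) for pk in range(1000, 1010)]
for plan in (limit_pipelined, limit_after_collect):
    stats = {"entries": 0}
    rows = plan(index, 1, 20, stats)
    print(plan.__name__, len(rows), stats["entries"])
```

Both plans return the same 20 rows, but the collect-then-limit plan visits all 1000 matching index entries, so its cost grows with the number of matches rather than with the LIMIT.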

Sysbench OLTP
- Data size: 100,000,000 rows, ~25GB
- Uniform distribution
- I/O-bound load
- Read/write transactions
- Reduced available memory by locking 12GB out of 16GB

Sysbench OLTP, time to load data
- Using multi-value INSERTs rather than LOAD DATA INFILE
- Solid and Falcon are even slower than InnoDB, which is known to be slow compared to MyISAM for data load
[Chart: load time in seconds for InnoDB, SolidDB, PBXT and Falcon; bar values shown: 1237, 1930, 2880, 3364]
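Multi-value INSERT batching is easy to sketch. The Python example below uses the built-in sqlite3 module as a stand-in for a MySQL client (MySQL drivers usually spell the placeholder %s rather than ?), loads only two sbtest columns, and uses a hypothetical batch size of 100 rows.

```python
import sqlite3


def batched_inserts(rows, batch_size):
    """Yield (sql, params) pairs that insert `rows` using multi-row VALUES lists."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        placeholders = ", ".join(["(?, ?)"] * len(batch))
        params = [value for row in batch for value in row]
        yield f"INSERT INTO sbtest (id, name) VALUES {placeholders}", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sbtest (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
rows = [(i, f"user{i}") for i in range(1, 251)]
statements = list(batched_inserts(rows, batch_size=100))
with conn:  # one transaction for the whole load
    for sql, params in statements:
        conn.execute(sql, params)
print(len(statements), conn.execute("SELECT COUNT(*) FROM sbtest").fetchone()[0])  # 3 250
```

Larger batches cut per-statement overhead, but each statement must still fit within server limits such as max_allowed_packet on MySQL.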

Sysbench OLTP, Data size
[Chart: data size in GB with CHAR vs. VARCHAR columns, for InnoDB, SolidDB, PBXT and Falcon]
- Comparison of storage for CHAR and VARCHAR columns in the table
- Falcon uses dynamic-length rows anyway
- PBXT surprisingly has the same huge size in both cases
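The size gap between CHAR and VARCHAR columns comes from padding. The Python arithmetic below assumes a single-byte character set, MySQL's 1-byte length prefix for VARCHAR columns of up to 255 bytes, and a hypothetical sample row; it ignores row headers, NULL flags and page overhead, so it only illustrates the direction of the effect.

```python
# Declared widths of the string columns in the sbtest table.
WIDTHS = {"name": 64, "email": 64, "password": 64,
          "address": 128, "city": 64, "zip": 8}


def char_row_bytes(widths):
    # CHAR(n) in a single-byte character set always takes n bytes.
    return sum(widths.values())


def varchar_row_bytes(widths, row):
    # VARCHAR(n) with n <= 255 bytes: 1 length byte plus the actual data.
    return sum(1 + len(row[col]) for col in widths)


# Hypothetical sample row, not taken from the benchmark data.
row = {"name": "John Smith", "email": "john@example.com", "password": "x" * 32,
       "address": "12 Main Street", "city": "Springfield", "zip": "12345"}
print(char_row_bytes(WIDTHS), varchar_row_bytes(WIDTHS, row))  # 392 94
```

An engine that stores dynamic-length rows regardless of the declared type, as Falcon does, pays roughly the VARCHAR cost in both cases.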

Sysbench OLTP, results
- Memory limited to 4GB, 2GB for buffers
- InnoDB and SolidDB benefit from clustering by primary key
- All but Falcon scale well for an I/O-bound workload with this number of hard drives
[Chart: I/O bound; transactions/sec vs. clients (1, 4, 64) for InnoDB, SolidDB, PBXT and Falcon]

Selected sql-bench results
Single operation repeated N times; total time in seconds, less is better.

Operation                            |       1 |       2 |       3
                                     | innodb_ | pbxt_fa | soliddb
alter_table_add (100)                |    8.00 |    3.00 |   32.00
count (100)                          |   12.00 |    8.00 |   28.00
count_distinct (1000)                |    6.00 |    8.00 |   74.00
count_distinct_2 (1000)              |   11.00 |   11.00 |   16.00
count_group_on_key_parts (1000)      |    7.00 |   10.00 |   83.00
count_on_key (50100)                 |   70.00 |   94.00 |  210.00
delete_all_many_keys (1)             |   17.00 |    2.00 |   28.00
insert (350768)                      |    6.00 |    5.00 |   21.00
outer_join (10)                      |   14.00 |    7.00 |   61.00
select_key2_return_prim (200000)     |   30.00 |   29.00 |   25.00
select_many_fields (2000)            |    8.00 |    6.00 |    5.00
update_big (10)                      |   18.00 |   56.00 |  727.00
update_of_key_big (501)              |   19.00 |    6.00 |  165.00
update_of_primary_key_many_keys (256 |   44.00 |   17.00 |   55.00
update_with_key_prefix (100000)      |   19.00 |    8.00 |   10.00
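Summing the selected operations gives a crude overall picture. The Python snippet below copies the table's numbers as-is, keeping the run names truncated exactly as in the header; the totals are unweighted and cover only these selected operations.

```python
# Selected sql-bench results: seconds, less is better.
# Run names are kept exactly as truncated in the table header.
RUNS = ("innodb_", "pbxt_fa", "soliddb")
RESULTS = {
    "alter_table_add (100)": (8, 3, 32),
    "count (100)": (12, 8, 28),
    "count_distinct (1000)": (6, 8, 74),
    "count_distinct_2 (1000)": (11, 11, 16),
    "count_group_on_key_parts (1000)": (7, 10, 83),
    "count_on_key (50100)": (70, 94, 210),
    "delete_all_many_keys (1)": (17, 2, 28),
    "insert (350768)": (6, 5, 21),
    "outer_join (10)": (14, 7, 61),
    "select_key2_return_prim (200000)": (30, 29, 25),
    "select_many_fields (2000)": (8, 6, 5),
    "update_big (10)": (18, 56, 727),
    "update_of_key_big (501)": (19, 6, 165),
    "update_of_primary_key_many_keys (256": (44, 17, 55),
    "update_with_key_prefix (100000)": (19, 8, 10),
}

totals = {run: sum(times[i] for times in RESULTS.values()) for i, run in enumerate(RUNS)}
wins = {run: 0 for run in RUNS}
for times in RESULTS.values():
    best = min(times)
    for run, seconds in zip(RUNS, times):
        if seconds == best:
            wins[run] += 1  # a tie credits every fastest run

print(totals)  # {'innodb_': 289, 'pbxt_fa': 270, 'soliddb': 1540}
print(wins)    # {'innodb_': 5, 'pbxt_fa': 9, 'soliddb': 2}
```

The soliddb total is dominated by update_big alone (727 of 1540 seconds), so the per-operation win counts are a useful second view.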

Conclusion
- All reviewed storage engines except InnoDB are currently too unstable for production use; SolidDB comes closest
- InnoDB is still the winner in the majority of tests
- Falcon has serious issues with LIMIT optimization and I/O-bound scalability
- PBXT and Falcon win in certain tests
- SolidDB is currently an outsider in terms of performance
- Need to revisit when production versions of all storage engines are ready

The End
Thanks for coming!
- Slides will be published at http://www.mysqlperformanceblog.com/
- Feel free to approach us with your questions
- MySQL Performance Optimization Consulting available: http://www.mysqlperformanceblog.com/mysql-consulting/

Sysbench OLTP, results, chart
- Data size comparable with memory size
[Chart: CPU bound; transactions/sec vs. clients (1, 4, 64) for InnoDB, SolidDB, PBXT and Falcon]
