GROUP ASSIGNMENT
TABLE OF CONTENTS
Introduction
1. Characteristics of Cassandra
1.1 Concept
1.2 Distributed and decentralized
1.3 Elastic scalability
1.4 High availability and fault tolerance
1.5 Tunable consistency
1.6 Row oriented
1.7 Schema free (not bound to a schema)
1.8 High performance
2. Cassandra architecture
2.1 Internode communication (the gossip protocol)
2.2 Cluster membership and seed nodes
2.3 Failure detection and recovery
2.4 Data partitioning in Cassandra
2.4.1 Partitioning in multi-data center clusters
2.4.2 Understanding the partitioner types
2.5 Replication in Cassandra
Replica placement strategy
2.6 Network topology
2.7 Snitches
Types of snitches
2.8 Client requests in Cassandra
2.8.1 Write requests
Multi-data center write requests
2.8.2 Read requests
2.9 Planning a Cassandra cluster deployment
2.9.1 Selecting hardware for enterprise implementations
2.9.2 Planning an Amazon EC2 cluster
2.9.3 Choosing node configuration options
3. The Cassandra data model
3.1 Comparing the Cassandra data model to a relational database
3.2 Keyspaces
3.3 Column Families
3.4 Columns
3.5 Special columns (Counter, Expiring, Super)
3.5.1 Expiring Columns
3.5.2 Counter Columns
3.5.3 Super Columns
3.6 Data types (Comparators and Validators)
3.6.1 Validators
3.6.2 Comparators
3.7 Column family compression
3.7.1 When to use compression
3.7.2 Configuring compression on a column family
3.8 Indexes in Cassandra
3.8.1 Primary indexes
3.8.2 Secondary indexes
3.8.3 Building and using secondary indexes
3.9 Planning the data model
3.9.1 Start with queries
3.9.2 Denormalize to optimize
3.9.3 Planning for concurrent writes
3.9.4 Using natural or surrogate row keys
3.9.5 UUID types for column names
Conclusion
References
INTRODUCTION
Today, Internet services must process very large volumes of data, and most of that data is stored in a distributed fashion across many servers. The relational databases deployed today handle many storage tasks very well, but because they are centralized they can run into problems when they need to scale. Users also try to avoid joins across tables, which means denormalizing the data; storing multiple copies of the data in this way completely breaks the original design, both in the database and in the application. In addition, we frequently have to work around distributed transactions, which easily become bottlenecks. None of these workarounds is directly supported by relational databases. As a result, relational database management systems (RDBMSs) are no longer a good fit for services of this kind, and people began developing new DBMSs designed to manage such large, distributed data sets. These DBMSs are commonly called NoSQL databases. A prominent representative of NoSQL is Cassandra, an open-source distributed database originally developed at Facebook. In 2008 Facebook released it to the open-source community, and it has been developed under Apache ever since. Cassandra is often described as a combination of Amazon's Dynamo and Google's BigTable. It is a fully distributed database with excellent fault tolerance. It is also highly elastic: read/write throughput grows roughly linearly as new nodes are added. In this report, our group presents the characteristics of the Cassandra distributed database, followed by the architecture and the data model that Cassandra uses.
... to serve requests. The software itself, however, must then have an internal mechanism for keeping its data in sync with the other nodes in the cluster.
Elastic scalability refers to a special property of horizontal scaling: your cluster can scale up and scale back down seamlessly. To do this, the cluster must be able to accept new nodes that join by receiving a copy of some or all of the data and then start serving new user requests, without major disruption or reconfiguration of the entire cluster. You do not need to restart your processes. You do not need to change your application queries. You do not need to rebalance the data manually. Just add another machine; Cassandra will find it and put it to work. Scaling down, of course, means removing some of the cluster's processing capacity. You might have to do this if you move part of your application to another platform, or if your application loses users and you need to start selling off hardware. Let's hope that doesn't happen. But if it does, you won't need to tear down the whole system to shrink it.
1.4 High availability and fault tolerance
In general architectural terms, the availability of a system is measured by its ability to fulfill requests. But computers can experience many kinds of failure, from hardware faults to network outages, and most computers cannot avoid them entirely. Some sophisticated machines can mitigate these failures themselves, because they contain redundant hardware and can report failure events and switch over to replacement components automatically. But anyone can accidentally break an Ethernet cable and isolate an entire data center that depends on a single machine. Therefore, for a system to be considered highly available, it usually has to consist of multiple networked computers, and the software running on them must be able to operate as a cluster and have some mechanism for detecting node failures and failing over requests to other parts of the system. Cassandra is highly available. You can replace failed nodes in the cluster with no downtime, and you can replicate data to multiple data centers to improve performance and to reduce downtime if one data center suffers a disaster such as a fire or flood.
1.5 Tunable consistency
Consistency essentially means that a read always returns the most recently written value.
Consider two customers trying to put the same item into their shopping carts on an e-commerce site. If I put the last item in stock into my cart an instant after you put it into yours, you should get the item, and I should be told that it is no longer available. This is guaranteed to happen when the state of a write is consistent across all nodes that hold that data. However, scaling data storage means trading off among consistency, availability, and partition tolerance (a distributed system can fully guarantee only two of these three properties). Cassandra favors availability over consistency. That is why Cassandra's consistency is tunable: users can adjust the consistency level to their needs, balancing it against availability. The consistency level is a setting the client specifies on every operation; it lets you decide how many replicas in the cluster must acknowledge a write, or respond to a read, for the operation to be considered successful.
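To make the trade-off concrete, here is a minimal Python sketch of the usual quorum arithmetic. The helper names `quorum` and `read_sees_latest_write` are hypothetical, not part of any Cassandra API. Assuming a write was acknowledged by W replicas and a read is answered by R replicas out of a replication factor RF, the two sets must share at least one replica, so the read sees the latest write, whenever R + W > RF.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a majority (QUORUM) for a given RF."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor: int, write_acks: int, read_replicas: int) -> bool:
    """True if every possible read set must overlap every possible write set.

    W replicas acknowledged the write and R replicas answer the read; the two
    groups are guaranteed to share at least one replica when R + W > RF.
    """
    return read_replicas + write_acks > replication_factor


RF = 3
for name, w, r in [("ONE/ONE", 1, 1), ("QUORUM/QUORUM", quorum(RF), quorum(RF)), ("ALL/ONE", RF, 1)]:
    print(f"{name}: W={w}, R={r}, strongly consistent={read_sees_latest_write(RF, w, r)}")
```

Under this arithmetic, QUORUM writes combined with QUORUM reads behave consistently, while ONE/ONE trades consistency for availability and lower latency.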
... complex, Cassandra lets you model the queries you need first and then supply the data to answer them.
... in Cassandra to avoid routing client requests to unreachable nodes. (Cassandra can also avoid routing requests to nodes that are alive but performing poorly, through the dynamic snitch.) The gossip process tracks heartbeats from other nodes both directly (nodes gossiping directly to it) and indirectly (information heard secondhand, thirdhand, and so on). Rather than using a fixed threshold, Cassandra uses an accrual detection mechanism to calculate a per-node threshold that takes into account network conditions, workload, and other conditions that might affect the perceived heartbeat rate. During gossip exchanges, every node maintains a sliding window of the inter-arrival times of gossip messages from the other nodes in the cluster. In Cassandra, the phi_convict_threshold setting adjusts the sensitivity of the failure detector. The default value is fine for most situations, but DataStax recommends increasing it to 12 on Amazon EC2 because of the network congestion frequently experienced on that platform.
Nodes can fail for many reasons, such as hardware failures or network outages. To permanently change a node's membership in a cluster, administrators use the nodetool utility to add or remove nodes explicitly. When a node comes back online after an outage, it may have missed writes for the replica data it maintains. Once the failure detector marks a node as down, missed writes are stored by the other replicas if hinted handoff is enabled. However, some writes may still be missed between the moment a node actually goes down and the moment it is detected as down. Also, if a node is down for longer than max_hint_window_in_ms (one hour by default), hints are no longer saved. For these reasons, it is best practice to run nodetool repair routinely on all nodes to ensure their data is consistent, and also to run repair after recovering a node that has been down for an extended period.
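The accrual idea can be sketched as follows. This is a simplified model written for this report, not Cassandra's source code: `PhiAccrualDetector` is a hypothetical class, and it assumes that heartbeat inter-arrival times are roughly exponentially distributed, in which case phi after dt seconds of silence is dt / (mean_interval * ln 10). The threshold of 8 matches Cassandra's default phi_convict_threshold; the window size of 1000 samples is an assumption.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Simplified phi accrual failure detector for one remote node (hypothetical).

    Assumes exponentially distributed heartbeat inter-arrival times, so
    phi = -log10(P(no heartbeat for dt)) = dt / (mean_interval * ln 10).
    """

    def __init__(self, window_size: int = 1000, phi_convict_threshold: float = 8.0):
        self.intervals = deque(maxlen=window_size)  # sliding window of inter-arrival times
        self.last_arrival = None
        self.phi_convict_threshold = phi_convict_threshold

    def heartbeat(self, now: float) -> None:
        """Record a gossip heartbeat received at time `now` (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now: float) -> float:
        """Suspicion level: grows the longer the node stays silent."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        return (now - self.last_arrival) / (mean * math.log(10))

    def is_down(self, now: float) -> bool:
        return self.phi(now) > self.phi_convict_threshold


detector = PhiAccrualDetector()
for t in range(10):
    detector.heartbeat(float(t))  # one heartbeat per second, t = 0..9
print(round(detector.phi(10.0), 3), detector.is_down(10.0))  # 0.434 False
print(detector.is_down(29.0))  # 20 s of silence: phi is about 8.69 > 8, so True
```

With a one-second heartbeat, a threshold of 8 convicts a node after roughly 18.4 seconds of silence (8 * ln 10), while a threshold of 12, as suggested for EC2, waits about 27.6 seconds.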
Column family data is partitioned across the nodes based on the row key. To find the node where the first replica of a row will live, the ring is walked clockwise until it reaches the first node whose token value is greater than or equal to the row key's token (in the example below, the row key itself). Each node is responsible for the region of the ring between its predecessor (exclusive) and itself (inclusive). With the nodes sorted in token order, the last node is considered the predecessor of the first node, which is why the arrangement is drawn as a ring. For example, consider a simple four-node cluster in which all row keys managed by the cluster are numbers in the range 0 to 100. Each node is assigned a token that represents a point in this range. In this simple example, the tokens are 0, 25, 50, and 75. The first node, with token 0, is responsible for the wrapping range (75-0]. The node with the lowest token also accepts row keys less than the lowest token and greater than the highest token.
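The clockwise lookup can be sketched in Python on the toy 0 to 100 key space from the example. `owner_token` is a hypothetical helper, and integer keys stand in for hashed row keys; each node owns the range from its predecessor (exclusive) up to its own token (inclusive).

```python
import bisect


def owner_token(node_tokens, row_key):
    """Return the token of the node that stores the first replica of `row_key`.

    Walk the ring clockwise: the owner is the first node whose token is >= the
    key. Keys above the highest token wrap around to the node with the lowest token.
    """
    ring = sorted(node_tokens)
    i = bisect.bisect_left(ring, row_key)  # index of the first token >= row_key
    return ring[i % len(ring)]             # i == len(ring) means "wrap around"


tokens = [0, 25, 50, 75]  # the four-node example from the text
for key in (10, 25, 60, 80, 99):
    print(key, "->", owner_token(tokens, key))  # 25, 25, 75, 0, 0
```

Using a binary search over the sorted tokens keeps each lookup logarithmic in the number of nodes.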
2.4.1 Partitioning in multi-data center clusters
In multi-data center deployments, replica placement is calculated per data center when using the NetworkTopologyStrategy replica placement strategy. In each data center (or replication group), the first replica for a particular row is determined by the token value assigned to a node. Additional replicas in the same data center are placed by walking the ring clockwise until reaching the first node in another rack. If you do not calculate partitioner tokens so that the data ranges are evenly distributed within each data center, you could end up with an uneven data distribution within a data center.
The goal is to ensure that the nodes in each data center have token assignments that evenly divide the overall range. Otherwise, some nodes in a data center could end up owning a disproportionate number of row keys. Each data center should be partitioned independently; however, token assignments across the whole cluster must not conflict with each other (each node must have a unique token). See Calculating Tokens for a Multi-Data Center Cluster for strategies on how to generate tokens for multi-data center clusters.
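One common approach can be sketched in Python, assuming RandomPartitioner's token range of 0 to 2**127: divide the whole range evenly within each data center as if it were its own ring, then shift each data center's tokens by a small offset so that no two nodes share a token. `multi_dc_tokens` is a hypothetical helper, and the offset of 100 is an arbitrary illustrative choice.

```python
RING_SIZE = 2**127  # assumed token space of RandomPartitioner (0 .. 2**127)


def multi_dc_tokens(nodes_per_dc, offset=100):
    """Evenly divide the full ring within each data center independently,
    then shift data center number d by d * offset so that no two nodes in
    the cluster end up with the same token."""
    tokens = {}
    for d, (dc, count) in enumerate(nodes_per_dc.items()):
        tokens[dc] = [i * RING_SIZE // count + d * offset for i in range(count)]
    return tokens


for dc, dc_tokens in multi_dc_tokens({"DC1": 3, "DC2": 2}).items():
    print(dc, dc_tokens)
```

Each data center on its own is perfectly balanced, and the offset is tiny compared with the ring size, so it does not noticeably skew the ranges.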
2.4.2 Understanding the partitioner types
Unlike almost every other configuration choice in Cassandra, the partitioner cannot be changed without reloading all of your data. It is therefore important to choose and configure the correct partitioner before initializing your cluster. Cassandra offers a number of partitioners out of the box, but the random partitioner is the best choice for most Cassandra deployments.
Random partitioner
The RandomPartitioner is the default partitioner for a Cassandra cluster and, in most cases, the right choice. The random partitioner uses consistent hashing to determine which node will store a particular row. Unlike naive modulus-by-node-count, consistent hashing ensures that when nodes are added to the cluster, the minimum possible amount of data is affected. To distribute data evenly across the nodes, a hashing algorithm creates an MD5 hash value of the row key. The possible range of hash values is from 0 to 2**127. Each node in the cluster is assigned a token that represents a hash value within this range. A node then owns the rows whose hash value is less than or equal to its token (and greater than its predecessor's token). For single data center deployments, tokens are calculated by dividing the hash range by the number of nodes in the cluster. For multi-data center deployments, tokens are calculated per data center (the hash range should be evenly divided among the nodes in each replication group).
The primary benefit of this approach is that once your tokens are set appropriately, data from all of your column families is evenly distributed across the cluster with no further effort. For example, one column family could use user names as row keys and another could use timestamps, but the row keys from each individual column family are still spread evenly. This also means that read and write requests to the cluster are evenly distributed. Another benefit of the random partitioner is that it simplifies load balancing a cluster: because each part of the hash range receives an equal number of rows on average, it is easier to assign tokens correctly to new nodes.
Ordered partitioners
Using an ordered partitioner ensures that row keys are stored in sorted order. Unless your application absolutely requires it, DataStax recommends choosing the random partitioner over an ordered partitioner. An ordered partitioner allows range scans over rows, meaning you can scan rows as though you were moving a cursor through a traditional index. For example, if your application uses user names as row keys, you can scan rows for users whose names fall between Jake and Joe. This type of query is not possible with randomly partitioned row keys, because the keys are stored in the order of their MD5 hash (not sequentially).
Ordered partitioners are not recommended for the following reasons:
- Sequential writes can cause hot spots. If your application tends to write or update a sequential block of rows at a time, the writes are not distributed across the cluster; they all go to one node. This is frequently a problem for applications dealing with timestamped data.
- More administrative overhead to load balance the cluster. An ordered partitioner requires administrators to calculate token ranges manually, based on their estimates of the row key distribution. In practice, this requires actively moving node tokens around to accommodate the actual distribution of data once it is loaded.
- Uneven load balancing for multiple column families. If your application has multiple column families, chances are that those column families have different row keys and different data distributions. An ordered partitioner that is balanced for one column family may cause hot spots and an uneven distribution for another column family in the same cluster.
Cassandra provides three ordered partitioners:
- ByteOrderedPartitioner: row keys are stored in order of their raw bytes rather than being converted to encoded strings. Tokens are calculated by looking at the actual values of your row key data and using a hexadecimal representation of the leading character(s) in a key. For example, if you wanted to partition rows alphabetically, you could assign an A token using its hexadecimal representation, 41.
- OrderPreservingPartitioner: row keys are stored in order based on their UTF-8 encoding. Row keys must be UTF-8 encoded strings.
- CollatingOrderPreservingPartitioner: row keys are stored in order based on en_US (American English) collation rules. Row keys must also be UTF-8 encoded strings.
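The difference between the random partitioner and an ordered partitioner can be illustrated with a small Python sketch. `md5_token` follows, as far as we know, the way RandomPartitioner turns a row key into a token (the MD5 digest read as a signed 128-bit integer, then made non-negative); the four-node rings, the byte tokens such as b"A" (hex 41) and the timestamp-like keys are illustrative assumptions. Sequential keys are typically scattered over all nodes by the hash, but under byte ordering they all land on one node, which is the hot-spot problem described above.

```python
import bisect
import hashlib

RING_SIZE = 2**127


def md5_token(row_key: bytes) -> int:
    """RandomPartitioner-style token: MD5 digest as a signed 128-bit
    big-endian integer, made non-negative (range 0 .. 2**127)."""
    return abs(int.from_bytes(hashlib.md5(row_key).digest(), "big", signed=True))


def owner(ring_tokens, token):
    """First node token >= token, wrapping around to the lowest token."""
    ring = sorted(ring_tokens)
    i = bisect.bisect_left(ring, token)
    return ring[i % len(ring)]


# Four nodes. Random partitioner: evenly spaced integer tokens.
random_ring = [i * RING_SIZE // 4 for i in range(4)]
# Byte-ordered partitioner: tokens are raw key bytes, e.g. b"A" (hex 41).
ordered_ring = [b"0", b"A", b"N", b"a"]

# 100 sequential, timestamp-like row keys written one after another.
keys = [f"2012-06-01 10:{i // 60:02d}:{i % 60:02d}".encode() for i in range(100)]

random_owners = {owner(random_ring, md5_token(k)) for k in keys}
ordered_owners = {owner(ordered_ring, k) for k in keys}
print("random partitioner, nodes used:", len(random_owners))
print("byte-ordered partitioner, nodes used:", len(ordered_owners))
print("Jake and Joe share a node when ordered:", owner(ordered_ring, b"Jake") == owner(ordered_ring, b"Joe"))
```

The last line shows the upside of ordering: a range of adjacent keys such as Jake..Joe stays together, which is what makes range scans possible.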
SimpleStrategy
SimpleStrategy is the default replica placement strategy when creating a keyspace with the Cassandra CLI. Other tools, such as CQL, require you to specify a strategy explicitly. SimpleStrategy places the first replica on the node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring, without considering rack or data center location.
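SimpleStrategy's rule translates almost directly into code. The sketch below is a minimal illustration: `simple_strategy_replicas` is a hypothetical helper, and small integer tokens stand in for real partitioner tokens.

```python
import bisect


def simple_strategy_replicas(ring, key_token, replication_factor):
    """ring: list of (token, node_name). Returns the nodes holding replicas.

    First replica: the node that owns key_token (first token >= key_token,
    wrapping around). Further replicas: the next nodes clockwise, ignoring
    racks and data centers entirely.
    """
    ring = sorted(ring)
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, key_token) % len(ring)
    count = min(replication_factor, len(ring))  # never more replicas than nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]
print(simple_strategy_replicas(ring, 60, 3))  # ['D', 'A', 'B']
```

Because nothing but ring order is considered, all three replicas could end up in the same rack, which is what NetworkTopologyStrategy tries to avoid.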
When deciding how many replicas to configure in each data center, the two primary considerations are (1) being able to satisfy reads locally, without incurring cross-data center latency, and (2) failure scenarios. The two most common ways to configure multiple data center clusters are:
- Two replicas in each data center. This configuration tolerates the failure of a single node per replication group and still allows local reads at a consistency level of ONE.
- Three replicas in each data center. This configuration suits real-time workloads: it tolerates the failure of one node per replication group even at the stronger LOCAL_QUORUM consistency level, since two of the three local replicas can still respond.
In Cassandra, the terms data center and replication group are essentially synonymous: a replication group is a set of nodes configured together for replication purposes. With NetworkTopologyStrategy, replica placement is determined independently within each data center (or replication group). The first replica in each data center is placed according to the partitioner (the same as with SimpleStrategy). Additional replicas in the same data center are placed by walking the ring clockwise until a node in a different rack from the previous replica is found. If there is no such node, additional replicas are placed in the same rack. NetworkTopologyStrategy prefers to place replicas on distinct racks if possible, because nodes in the same rack (or a similar physical grouping) can easily fail at the same time due to power, hardware, or network problems. For example, NetworkTopologyStrategy can place replicas across two data centers with a total replication factor of 4 (two replicas in data center 1 and two replicas in data center 2).
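A simplified Python sketch of this placement rule follows. `nts_replicas` is a hypothetical helper that leaves out details of the real NetworkTopologyStrategy implementation, but it follows the rule above: place replicas independently per data center, prefer racks not used yet while walking clockwise, and reuse racks (in ring order) only when no unused rack remains. The example uses two data centers with two racks each and two replicas per data center, a total replication factor of 4.

```python
import bisect


def nts_replicas(nodes, key_token, rf_per_dc):
    """Simplified rack-aware replica placement, done separately per data center.

    nodes: list of (token, data_center, rack, name) tuples.
    rf_per_dc: replicas wanted in each data center, e.g. {"DC1": 2, "DC2": 2}.
    """
    placement = {}
    for dc, rf in rf_per_dc.items():
        ring = sorted(n for n in nodes if n[1] == dc)  # this DC's nodes in token order
        tokens = [n[0] for n in ring]
        start = bisect.bisect_left(tokens, key_token) % len(ring)  # first replica
        walk = [ring[(start + i) % len(ring)] for i in range(len(ring))]  # clockwise

        first_per_rack, used_racks, others = [], set(), []
        for node in walk:
            if node[2] not in used_racks:  # a rack we have not used yet
                first_per_rack.append(node)
                used_racks.add(node[2])
            else:  # same rack as an earlier pick: keep as a fallback
                others.append(node)
        # Distinct racks first; only then reuse racks, still in ring order.
        placement[dc] = [n[3] for n in (first_per_rack + others)[:rf]]
    return placement


nodes = [
    (0, "DC1", "RAC1", "a"), (25, "DC1", "RAC1", "b"),
    (50, "DC1", "RAC2", "c"), (75, "DC1", "RAC2", "d"),
    (10, "DC2", "RAC1", "e"), (35, "DC2", "RAC1", "f"),
    (60, "DC2", "RAC2", "g"), (85, "DC2", "RAC2", "h"),
]
# Key token 30, two replicas per data center (total replication factor 4).
print(nts_replicas(nodes, 30, {"DC1": 2, "DC2": 2}))  # {'DC1': ['c', 'a'], 'DC2': ['f', 'g']}
```

In DC1 the walk starts at c (RAC2); d is passed over because it sits in the same rack, so the second replica goes to a in RAC1.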
2.7 Snitches
A snitch is a configurable component of a Cassandra cluster that defines how the nodes are grouped together within the overall network topology (racks and data centers). Cassandra uses this information to route inter-node requests as efficiently as possible. The snitch does not affect requests between client applications and Cassandra (it does not control which node a client connects to). Snitches are configured in the cassandra.yaml configuration file, and all nodes in a cluster should use the same snitch configuration. When assigning tokens, assign them to nodes in alternating racks, for example: rack1, rack2, rack3, rack1, rack2, rack3, and so on.
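The alternating-rack rule can be combined with evenly spaced tokens for a single data center (node i of N receives i * 2**127 / N under RandomPartitioner). The helpers `alternate_racks` and `assign_tokens` below are hypothetical; the resulting numbers are the kind of values one would put into each node's initial_token.

```python
from itertools import zip_longest

RING_SIZE = 2**127  # RandomPartitioner token range (0 .. 2**127)


def alternate_racks(nodes_by_rack):
    """Order nodes round-robin across racks: rack1, rack2, rack3, rack1, ..."""
    order = []
    for group in zip_longest(*nodes_by_rack.values()):
        order.extend(node for node in group if node is not None)  # racks may differ in size
    return order


def assign_tokens(nodes_in_ring_order):
    """Evenly spaced tokens for one data center: node i of N gets i * 2**127 // N."""
    n = len(nodes_in_ring_order)
    return {node: i * RING_SIZE // n for i, node in enumerate(nodes_in_ring_order)}


racks = {"rack1": ["n1", "n4"], "rack2": ["n2", "n5"], "rack3": ["n3", "n6"]}
order = alternate_racks(racks)
print(order)  # ['n1', 'n2', 'n3', 'n4', 'n5', 'n6']
for node, token in assign_tokens(order).items():
    print(f"{node}: initial_token = {token}")
```

Because neighbouring ring positions now belong to different racks, a rack-aware strategy finds a replica in another rack after only a short clockwise walk.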
Types of snitches
SimpleSnitch
SimpleSnitch (the default) is appropriate if you have no rack or data center information available. Single data center deployments (or deployments in a single zone of a public cloud) usually fall into this category. When using this snitch, use replication_factor=<#> when defining your keyspace strategy_options. This snitch does not recognize data center or rack information.
DseSimpleSnitch
DseSimpleSnitch is used in DataStax Enterprise (DSE) deployments. It is designed for configurations that combine Hadoop data analytics with real-time applications. It can be used for mixed-workload DSE clusters located in one physical data center. It can also be used for multi-data center DSE clusters that have exactly two data centers, with all analytics nodes in one data center and all real-time Cassandra nodes in the other. When using this snitch, use Analytics or Cassandra as the data center names when defining your keyspace.
RackInferringSnitch
RackInferringSnitch infers the network topology by analyzing the nodes' IP addresses. This snitch assumes that the second octet identifies the data center where a node is located, and the third octet identifies the rack.
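The octet convention is easy to mimic. The sketch below uses a hypothetical helper `infer_location` and Python's standard `ipaddress` module (which also rejects malformed addresses with a ValueError); it illustrates the rule, not the snitch's actual code.

```python
import ipaddress


def infer_location(ip: str) -> tuple:
    """RackInferringSnitch-style guess: 2nd octet = data center, 3rd octet = rack."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # validates the address
    return octets[1], octets[2]


for ip in ("10.20.30.41", "10.20.31.42", "10.21.30.43"):
    dc, rack = infer_location(ip)
    print(f"{ip}: data center {dc}, rack {rack}")
```

Rack "30" in data center "20" and rack "30" in data center "21" are different racks; the rack label only has meaning within its data center. The scheme works only if the addressing plan actually follows this layout, which is why PropertyFileSnitch exists for non-uniform IPs.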
PropertyFileSnitch
PropertyFileSnitch determines the location of nodes from user-defined descriptions in the cassandra-topology.properties file. This snitch is the best choice when your node IPs are not uniform or when you have complex replication grouping requirements. See Configuring the PropertyFileSnitch for more information. With this snitch, you can give your data centers any names you like in cassandra-topology.properties.
Ec2Snitch
Ec2Snitch is used for cluster deployments on Amazon EC2 where all nodes in the cluster are within a single region. Instead of inferring node location from the node's IP address, this snitch uses the AWS API to request the region and availability zone of a node. The region is treated as the data center, and the availability zones are treated as racks within that data center. For example, if a node is in us-east-1a, us-east is the data center name and 1a is the rack location.
Dynamic snitching
By default, all snitches also use a dynamic snitch layer that monitors read latency and, when possible, routes client requests away from poorly performing nodes. Dynamic snitching is enabled by default and is recommended for use in all deployments. Dynamic snitching is configured in the cassandra.yaml file of each node.
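The idea behind dynamic snitching can be illustrated with a toy model. `LatencyRanker` is hypothetical, and its exponentially weighted moving average (EWMA) with weight `alpha` is an assumption made for this sketch, not Cassandra's actual scoring formula; it only shows how observed read latency can reorder replicas.

```python
class LatencyRanker:
    """Toy model of dynamic snitching: keep a smoothed read latency per
    replica and prefer the fastest ones. The EWMA and `alpha` are
    illustrative assumptions, not Cassandra's exact algorithm."""

    def __init__(self, alpha: float = 0.75):
        self.alpha = alpha  # weight given to the newest latency sample
        self.scores = {}    # node -> smoothed latency in milliseconds

    def record(self, node: str, latency_ms: float) -> None:
        old = self.scores.get(node)
        if old is None:
            self.scores[node] = latency_ms
        else:
            self.scores[node] = self.alpha * latency_ms + (1 - self.alpha) * old

    def sort_by_proximity(self, replicas):
        """Fastest replicas first; nodes without samples yet are tried first."""
        return sorted(replicas, key=lambda n: self.scores.get(n, 0.0))


ranker = LatencyRanker()
for node, ms in [("A", 5.0), ("B", 50.0), ("C", 10.0)]:
    ranker.record(node, ms)
print(ranker.sort_by_proximity(["A", "B", "C"]))  # ['A', 'C', 'B']
ranker.record("A", 200.0)  # A suddenly becomes slow
print(ranker.sort_by_proximity(["A", "B", "C"]))  # ['C', 'B', 'A']
```

Smoothing keeps a single slow response from flipping the order while still reacting quickly to a replica that stays slow.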
Multi-data center write requests
In multi-data center deployments, Cassandra optimizes write performance by choosing one coordinator node in each remote data center to handle the requests to the replicas within that data center. The coordinator node contacted by the client application only needs to forward the write request to one node in each remote data center. If the consistency level is ONE or LOCAL_QUORUM, only the nodes in the same data center as the coordinator node must respond for the client request to succeed. This way, geographical latency does not affect client request response times.
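The reason LOCAL_QUORUM avoids cross-data center latency can be shown with a small calculation. `acks_needed` is a hypothetical helper covering only the three levels mentioned in this report, and the replication settings in the example are assumptions.

```python
def quorum(replication_factor: int) -> int:
    """A majority of the given number of replicas."""
    return replication_factor // 2 + 1


def acks_needed(rf_per_dc: dict, consistency: str, local_dc: str) -> tuple:
    """Return (replica acknowledgements the coordinator waits for,
    where those replicas may be located)."""
    if consistency == "ONE":
        return 1, "any data center"
    if consistency == "QUORUM":
        return quorum(sum(rf_per_dc.values())), "any data center"
    if consistency == "LOCAL_QUORUM":
        return quorum(rf_per_dc[local_dc]), local_dc
    raise ValueError(f"not covered by this sketch: {consistency}")


rf_per_dc = {"DC1": 3, "DC2": 3}  # hypothetical: RF 3 in each of two data centers
for level in ("ONE", "QUORUM", "LOCAL_QUORUM"):
    count, where = acks_needed(rf_per_dc, level, local_dc="DC1")
    print(f"{level}: wait for {count} replica(s) in {where}")
```

With three replicas per data center, QUORUM needs 4 acknowledgements, more than DC1 holds, so at least one must come from the remote data center; LOCAL_QUORUM needs only 2, and both can be local.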
2.8.2 Read requests
For reads, there are two types of read requests that a coordinator can send to a replica: a direct read request and a background read repair request. The number of replicas contacted by a direct read request is determined by the consistency level specified by the client. Background read repair requests are sent to any additional replicas that did not receive a direct request. Read repair requests ensure that the requested row is made consistent on all replicas.
Thus, the coordinator first contacts the replicas specified by the consistency level, sending the requests to the replicas that are currently responding fastest. The contacted nodes respond with the requested data; if multiple nodes are contacted, the rows from each replica are compared in memory to see whether they are consistent. If they are not, the coordinator uses the replica that has the most recent data (based on the timestamp) to forward the result back to the client.
To ensure that all replicas have the most recent version of frequently read data, the coordinator also contacts and compares the data from the remaining replicas that own the row, in the background, to check that they are consistent and not out of date. If they are out of date, they are updated with the most recently written values. This process is known as read repair. Read repair can be configured per column family (using read_repair_chance) and is enabled by default.
For example, in a cluster with a replication factor of 3 and a read consistency level of QUORUM, 2 of the 3 replicas for the given row are contacted to fulfill the direct read request. Supposing the contacted replicas had different versions of the row, the replica with the most recent version would return the requested data. The third replica is checked for consistency with the first two in the background, and, if needed, the most recent version is written to the out-of-date replicas.
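The read path and read repair described above can be modelled in a few lines. This is a deliberately simplified sketch: `Cell` and `read_with_repair` are hypothetical names, each replica is a plain dictionary, whole values are compared instead of individual columns, and repair always runs (as if read_repair_chance were 1.0).

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: str
    timestamp: int  # write timestamp supplied with each write


def read_with_repair(replicas, key, direct_count):
    """Simplified coordinator read.

    replicas: list of dicts (row key -> Cell), closest/fastest replica first.
    direct_count: replicas read directly, as required by the consistency level.
    Returns (value sent to the client, indexes of replicas that were repaired).
    """
    direct = [r[key] for r in replicas[:direct_count] if key in r]
    if not direct:
        return None, []
    answer = max(direct, key=lambda cell: cell.timestamp)  # newest timestamp wins

    # Read repair: compare every replica (including those not read directly)
    # and overwrite any stale or missing copy with the newest version seen.
    newest = max((r[key] for r in replicas if key in r), key=lambda cell: cell.timestamp)
    repaired = []
    for i, r in enumerate(replicas):
        if key not in r or r[key].timestamp < newest.timestamp:
            r[key] = newest
            repaired.append(i)
    return answer.value, repaired


# RF = 3, QUORUM: 2 of the 3 replicas are read directly.
replicas = [
    {"jsmith": Cell("new@example.com", 2)},
    {"jsmith": Cell("old@example.com", 1)},
    {"jsmith": Cell("old@example.com", 1)},
]
print(read_with_repair(replicas, "jsmith", direct_count=2))  # ('new@example.com', [1, 2])
```

The client gets the newest of the two directly read versions, and both stale replicas, including the one that was not read directly, are brought up to date.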
- For dedicated hardware, a minimum of 8GB of RAM is needed. DataStax recommends 16GB to 32GB.
- Java heap space should be set to a maximum of 8GB or half of your total RAM, whichever is lower.
- For a virtual environment such as Amazon EC2, use a minimum of 4GB.
CPU
Cassandra is highly concurrent and uses as many CPU cores as are available.
- For dedicated hardware, 8-core processors are the current price-performance sweet spot.
- For virtual environments, consider using a provider that allows CPU bursting, such as Rackspace Cloud Servers.
Disk
Cassandra writes data to disk for two purposes:
- All data is written to the commit log for durability.
- When thresholds are reached, Cassandra flushes in-memory data structures (memtables) to SSTable data files for persistent storage.
The commit log receives every write made to a Cassandra node, but it is only read during node startup. Commit logs are purged after the corresponding data is flushed. Conversely, SSTable (data file) writes occur asynchronously, and SSTables are read during client look-ups. Additionally, SSTables are periodically compacted. Compaction improves performance by merging and rewriting data and discarding old data. However, during compaction (or node repair), disk utilization and the size of the data directory can increase substantially. For this reason, DataStax recommends leaving an adequate amount of free disk space available on each node (50% [worst case] for tiered compaction, 10% for leveled compaction).
Number of nodes
The amount of data on each disk in the array is not as important as the total size per node. Using a greater number of smaller nodes is better than using fewer larger nodes, because larger nodes can become bottlenecks during compaction.
Network
Because Cassandra is a distributed data store, it puts load on the network to handle read/write requests and the replication of data across nodes. Be sure that your network can handle the traffic between nodes without bottlenecks.
- Recommended bandwidth is 1000 Mbit/s (Gigabit) or greater.
- Bind the internode communication interface (listen_address) to a specific NIC (network interface card).
- Bind the RPC server interface used by clients (rpc_address) to another NIC.
Cassandra is efficient at routing requests to the replicas that are geographically closest to the coordinator node handling the request. Cassandra picks a replica in the same rack if possible, and prefers replicas located in the same data center over replicas in a remote data center.
Cassandra uses the following ports. If you use a firewall, make sure that the nodes in a cluster can reach each other on these ports.
Public ports:
- 22: SSH (default)
Cassandra internode ports:
- 1024+: JMX reconnection/loopback ports
- 7000: Cassandra internode cluster communication
- 7199: Cassandra JMX monitoring port
- 9160: Cassandra client port (Thrift)
OpsCenter-specific ports:
- 50031: OpsCenter HTTP proxy for Job Tracker
- 61620: OpsCenter monitoring port
- 61621: OpsCenter agent port
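To verify the firewall rules above, a quick TCP reachability check can be run from one node against another. This is a generic sketch using Python's standard `socket` module, not a Cassandra or DataStax tool; `check_ports` and `CASSANDRA_PORTS` are hypothetical names, and a successful connection only shows that something is listening on the port, not that it is Cassandra.

```python
import socket

CASSANDRA_PORTS = {
    7000: "internode communication",
    7199: "JMX monitoring",
    9160: "client (Thrift)",
}


def check_ports(host, ports, timeout=2.0):
    """Try a TCP connection to each port; True means the port accepted it."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, timed out, unreachable, ...
            results[port] = False
    return results
```

For example, running `check_ports("10.0.0.2", CASSANDRA_PORTS)` on one node (10.0.0.2 being a hypothetical peer address) reports which of the listed ports on that peer accept connections; a False result points to a firewall rule or a service that is not running.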
Note: Generally, when you have firewalls between machines, it is difficult to run JMX across a network and maintain security. This is because JMX connects on port 7199, handshakes, and then uses any port within the 1024+ range. Instead, use SSH to execute commands remotely and connect to JMX locally, or use DataStax OpsCenter.
2.9.2 Planning an Amazon EC2 cluster
Cassandra clusters can be deployed on cloud infrastructures such as Amazon EC2. For Cassandra clusters on EC2, use large or extra-large instances with local storage. Combine the ephemeral disks with RAID0 and put both the data directory and the commit log on that RAID0 volume. In practice this has proved better than putting the commit log on the root volume (which is also a shared resource). For data redundancy, consider deploying your Cassandra cluster across multiple availability zones and using EBS volumes to store your Cassandra backup files.
EBS volumes are not recommended for Cassandra data storage, because their network performance and disk I/O are not a good fit for Cassandra, for the following reasons:
- EBS volumes contend directly for network throughput with standard packets. This means that EBS throughput is likely to fail if you saturate a network link.
- EBS volumes have unreliable performance. I/O performance can be exceptionally slow, causing the system to back up reads and writes until the entire cluster becomes unresponsive.
- Adding capacity by increasing the number of EBS volumes per host does not scale.
DataStax provides an Amazon Machine Image (AMI) that lets you quickly deploy a multi-node Cassandra cluster on Amazon EC2. The DataStax AMI initializes all nodes in one availability zone using the SimpleSnitch. If you want an EC2 cluster that spans multiple regions and availability zones, do not use the DataStax AMI. Instead, launch an EC2 instance for each Cassandra node and then configure the cluster as a multi-data center cluster.
Calculating usable disk capacity
To calculate how much data your Cassandra nodes can hold, calculate the usable disk capacity per node and then multiply that by the number of nodes in your cluster. Remember that in a production cluster, the commit log and data directories will typically be on different disks. This calculation estimates the usable capacity of the data volume.
Start with the raw capacity of the physical disks:
raw_capacity = disk_size * number_of_disks
Account for file system formatting overhead (roughly 10 percent) and the RAID level you are using. For example, with RAID-10 the calculation would be:
(raw_capacity * 0.9) / 2 = formatted_disk_space
During normal operations, Cassandra routinely requires disk capacity for compaction and repair operations. For optimal performance, DataStax recommends that you do not fill your disks to capacity, but run at 50-80 percent capacity. With this in mind, calculate the usable disk space as follows (this example uses 50 percent):
formatted_disk_space * 0.5 = usable_disk_space
Calculating user data size
As with all data storage systems, the size of your raw data will be larger once it is loaded into Cassandra because of storage overhead. On average, raw data takes up about twice as much space on disk after it is loaded into the database, but it could be much smaller or larger depending on the characteristics of your data and column families. The calculations in this section account for data persisted to disk, not for data stored in memory.
- Column overhead: every column in Cassandra incurs 15 bytes of overhead. Since each row in a column family can have different column names as well as different values, metadata is stored for each column. For counter columns and expiring columns, add an additional 8 bytes. Accordingly, the total size of a regular column is:
total_column_size = column_name_size + column_value_size + 15
- Row overhead: just like columns, every row also incurs some overhead when stored on disk. Every row in Cassandra incurs 23 bytes of overhead.
- Primary key index: every column family also maintains a primary index of its row keys. Primary index overhead becomes more significant when you have lots of skinny rows. The size of the primary row key index can be estimated as follows (in bytes):
primary_key_index = number_of_rows * (32 + average_key_size)
- Replication overhead: the replication factor plays a role in how much disk capacity is used. For a replication factor of 1, there is no overhead for replicas. If the replication factor is greater than 1, your total data storage requirement includes replication overhead:
replication_overhead = total_data_size * (replication_factor - 1)
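The formulas above can be chained into a small estimator. The function names are hypothetical, RAID-10 is assumed through `raid_divisor=2`, and the example workload (1,000 rows of 10 columns, 10-byte names, 20-byte values, 16-byte keys, replication factor 3) is made up for illustration.

```python
def usable_disk_space(disk_size_gb, number_of_disks, raid_divisor=2, target_utilization=0.5):
    """Usable data-volume capacity per node, following the steps above."""
    raw_capacity = disk_size_gb * number_of_disks
    formatted_disk_space = raw_capacity * 0.9 / raid_divisor  # ~10% FS overhead; RAID-10 halves it
    return formatted_disk_space * target_utilization          # leave room for compaction/repair


def column_size(name_size, value_size, counter_or_expiring=False):
    """On-disk bytes for one column: name + value + 15 bytes (+8 for counter/expiring)."""
    return name_size + value_size + 15 + (8 if counter_or_expiring else 0)


def on_disk_size(rows, columns_per_row, name_size, value_size, avg_key_size, replication_factor):
    """Estimated on-disk bytes for one column family, including replicas."""
    column_data = rows * columns_per_row * column_size(name_size, value_size)
    row_overhead = rows * 23
    primary_key_index = rows * (32 + avg_key_size)
    total_data_size = column_data + row_overhead + primary_key_index
    replication_overhead = total_data_size * (replication_factor - 1)
    return total_data_size + replication_overhead


print(usable_disk_space(1000, 4))  # 900.0 (GB usable out of 4000 GB raw)
print(on_disk_size(rows=1000, columns_per_row=10, name_size=10, value_size=20,
                   avg_key_size=16, replication_factor=3))  # 1563000 (bytes)
```

Dividing the second result by the first (after converting units) and by the number of nodes gives a rough check of whether the planned cluster is large enough.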
2.9.3 Choosing node configuration options
A major part of planning your Cassandra cluster deployment is understanding and setting the various node configuration properties. This section explains the configuration decisions that need to be made before deploying a Cassandra cluster, whether it is a single-node, multi-node, or multi-data center cluster. The properties mentioned in this section are set in the cassandra.yaml configuration file. Each node should be correctly configured before it is started for the first time.
Storage settings
By default, a node is configured to store the data it manages in /var/lib/cassandra. In a production cluster deployment, you should change the commitlog_directory so that it is on a different disk device than the data_file_directories.
Gossip settings
The gossip settings control a node's participation in a cluster and how the node is known to the cluster.
- cluster_name: the name of the cluster that this node is joining. It should be the same for every node in the cluster.
- listen_address: the IP address or hostname that other Cassandra nodes will use to connect to this node. It should be changed from localhost to the public address of the host.
- seeds: a list of node IP addresses used to bootstrap the gossip process. Every node should have the same list of seeds. In multi-data center clusters, the seed list should include a node from each data center.
- storage_port: the internode communication port (default is 7000).
- initial_token: used to determine the range of data this node is responsible for.
Purging gossip state on a node
Gossip information is persisted locally by each node so that it can be used immediately on the next restart without waiting for gossip. To clear the gossip history on node restart (for example, if node IP addresses have changed), add the following line to the cassandra-env.sh file. This file is located in /usr/share/cassandra or <install_location>/conf.
-Dcassandra.load_ring_state=false
Partitioner settings
When you deploy a Cassandra cluster, you need to make sure that each node is responsible for roughly an equal amount of data. This is also known as load balancing. It is done by configuring the partitioner for each node and correctly assigning each node an initial_token value. DataStax recommends using the RandomPartitioner (the default) for all cluster deployments. Assuming use of this partitioner, each node in the cluster is assigned a token that represents a hash value within the range 0 to 2**127. For clusters where all nodes are in a single data center, you can calculate tokens by dividing the range by the total number of nodes in the cluster. In multi-data center deployments, tokens should be calculated so that each data center is individually load balanced. See Calculating Tokens for the different approaches to generating tokens for nodes in single and multi-data center clusters.
Snitch settings
The snitch is responsible for knowing the location of nodes within your network topology. This affects where replicas are placed as well as how requests are routed between replicas. The endpoint_snitch property configures the snitch for a node. All nodes should have exactly the same snitch configuration. For a single data center (or single node) cluster, the default SimpleSnitch is usually sufficient. However, if you plan to expand your cluster to multiple racks and data centers at a later time, it is easier to choose a rack- and data center-aware snitch from the start.
Configuring the PropertyFileSnitch
The PropertyFileSnitch lets you give your data centers and racks any names you like. To use this snitch, you must define the network details of every node in the cluster in the cassandra-topology.properties configuration file. This file is located in /etc/cassandra/conf for packaged installs or in <install_location>/conf for binary installs. For example, suppose you have non-uniform IP addresses and two physical data centers with two racks each, plus a third logical data center that replicates analytics data:

# Data Center One
175.56.12.105=DC1:RAC1
175.50.13.200=DC1:RAC1
175.54.35.197=DC1:RAC1
120.53.24.101=DC1:RAC2
120.55.16.200=DC1:RAC2
120.57.102.103=DC1:RAC2
# Data Center Two
110.56.12.120=DC2:RAC1
110.50.13.201=DC2:RAC1
110.54.35.184=DC2:RAC1
50.33.23.120=DC2:RAC2
50.45.14.220=DC2:RAC2
50.17.10.203=DC2:RAC2
# Analytics Replication Group
172.106.12.120=DC3:RAC1
172.106.12.121=DC3:RAC1
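The sketch below shows how entries in the format above map each node to a data center and rack. It is a minimal illustration, not Cassandra's own parser. The function names are hypothetical, and only the "<ip>=<data center>:<rack>" lines and "#" comments shown above are handled.

```python
# Minimal sketch (assumption): read entries in the "<ip>=<data center>:<rack>"
# format used by cassandra-topology.properties and group nodes by location.
# This is an illustration, not Cassandra's own parser.
from collections import defaultdict

SAMPLE = """\
# Data Center One
175.56.12.105=DC1:RAC1
120.53.24.101=DC1:RAC2
# Data Center Two
110.56.12.120=DC2:RAC1
# Analytics Replication Group
172.106.12.120=DC3:RAC1
172.106.12.121=DC3:RAC1
"""


def parse_topology(text: str) -> dict[str, tuple[str, str]]:
    """Map each node IP address to its (data_center, rack) pair."""
    topology: dict[str, tuple[str, str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        ip, _, location = line.partition("=")
        dc, _, rack = location.partition(":")
        topology[ip.strip()] = (dc.strip(), rack.strip())
    return topology


def nodes_by_location(topology: dict[str, tuple[str, str]]) -> dict:
    """Group node IPs as {data_center: {rack: [ip, ...]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for ip, (dc, rack) in topology.items():
        grouped[dc][rack].append(ip)
    return grouped


topology = parse_topology(SAMPLE)
print(topology["175.56.12.105"])                   # ('DC1', 'RAC1')
print(nodes_by_location(topology)["DC3"]["RAC1"])  # ['172.106.12.120', '172.106.12.121']
```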
In Cassandra, the keyspace is the container for all application data, similar to a database or schema in a relational database. Inside the keyspace are one or more column family objects, which are analogous to tables. Column families contain columns, and a set of columns is identified by a row key supplied by the application. The rows in a column family do not need to have the same set of columns. Cassandra does not enforce relationships between column families the way a relational database does between tables. There are no foreign keys in Cassandra, and joining column families at query time is not supported. Each column family has a self-contained set of columns that are intended to be accessed together to satisfy specific queries from the application. For example, in the blog application example above, you might have one column family for users and one for blog entries, as in a relational model. You can then add other column families to support the queries the application needs to perform. To answer queries such as "which users subscribe to my blog", "show me all blog entries about fashion" or "show me the latest entries from the blogs I subscribe to", you would need to design additional column families that support those queries. Keep in mind that this requires some denormalization of the data.
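The sketch below models the keyspace, column family, row key and column hierarchy with plain Python dicts. Because there are no joins, the application writes the "who subscribes to my blog" relationship into a column family built for that query. All names (demo_keyspace, subscribers_by_blog, insert, subscribe) are hypothetical.

```python
# Minimal sketch (assumption): keyspace -> column family -> row key -> columns,
# modeled with plain dicts. All names here are hypothetical.

demo_keyspace = {
    "users": {},
    "blog_entries": {},
    "subscribers_by_blog": {},  # extra column family that answers one query
}


def insert(cf: str, row_key: str, columns: dict) -> None:
    """Add or overwrite columns in a row; rows may have different columns."""
    demo_keyspace[cf].setdefault(row_key, {}).update(columns)


def subscribe(user: str, blog_owner: str) -> None:
    # No joins: the application writes the relationship where the query
    # will read it (denormalization). Column name = subscriber, empty value.
    insert("subscribers_by_blog", blog_owner, {user: b""})


insert("users", "alice", {"email": "alice@example.com", "state": "CA"})
insert("users", "bob", {"email": "bob@example.com"})  # fewer columns is fine
subscribe("bob", "alice")

# "Which users subscribe to my blog?" becomes a single row lookup.
print(sorted(demo_keyspace["subscribers_by_blog"]["alice"]))  # ['bob']
```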
3.2 Keyspaces
In Cassandra, the keyspace is the container for your application data, similar to a schema in a relational database. Keyspaces are used to group column families together. Typically, a cluster has one keyspace per application. Replication is controlled per keyspace, so data with different replication requirements should reside in different keyspaces. Keyspaces are not designed to be a significant map layer within the data model. They are only a way to control data replication for a set of column families. Data definition language (DDL) commands for defining and altering keyspaces are provided in many client interfaces, such as the Cassandra CLI and CQL. For example, to define a keyspace in CQL:
CREATE KEYSPACE keyspace_name WITH strategy_class = 'SimpleStrategy' AND strategy_options:replication_factor=2;
Or in the Cassandra CLI:
CREATE KEYSPACE keyspace_name WITH placement_strategy = 'SimpleStrategy' AND strategy_options = [{replication_factor:2}];
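Replication is set per keyspace. The sketch below illustrates what replication_factor=2 with SimpleStrategy means: the node that owns a key's token holds the first replica, and the next node clockwise on the ring holds the second. This is a simplified model. It assumes a hand-picked ring of small tokens, ignores racks and data centers, and uses a hypothetical helper name.

```python
# Simplified sketch of SimpleStrategy-style replica placement (assumption:
# a single ring with no rack or data-center awareness).
from bisect import bisect_left


def replicas(key_token: int, ring: dict[int, str], replication_factor: int) -> list[str]:
    """Return the nodes that hold a key, walking the ring clockwise."""
    tokens = sorted(ring)
    if replication_factor > len(tokens):
        raise ValueError("replication_factor exceeds the number of nodes")
    # The first node whose token is >= the key's token owns the key;
    # if there is none, wrap around to the start of the ring.
    start = bisect_left(tokens, key_token) % len(tokens)
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(replication_factor)]


ring = {0: "node1", 25: "node2", 50: "node3", 75: "node4"}
print(replicas(30, ring, 2))  # ['node3', 'node4']
print(replicas(80, ring, 2))  # ['node1', 'node2'] (wraps around)
```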
...data type. There are two common column family design patterns in Cassandra: static and dynamic column families. A static column family uses a relatively fixed set of column names and is more similar to a relational database table. For example, a column family that stores user data might have columns for the user name, address, email, phone number and so on. Although the rows generally have the same set of columns, they are not required to have a value for every column. Static column families typically have column metadata predefined for each column.
A dynamic column family takes advantage of Cassandra's ability to store data under arbitrary column names supplied by the application. A dynamic column family lets you precompute result sets and store them in a single row for efficient data retrieval. Each row is a snapshot of data that satisfies a specific query, much like a materialized view. An example is a column family that tracks the users who subscribe to a particular user's blog.
Instead of defining metadata for individual columns, a dynamic column family defines the type information for column names and values. The actual column names and values are set by the application when a column is inserted. In every column family, each row is unique and is identified by its row key, similar to the primary key of a relational table. A column family is always partitioned on its row key, and the row key is always implicitly indexed. Row keys cannot be empty.
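The sketch below shows one row of a dynamic column family, using the blog-subscriber example. The row key is the blog owner, each column name is a subscriber supplied by the application, and the values are left empty. Column names are kept sorted as they are inserted, and an empty row key is rejected. The DynamicRow class is a hypothetical in-memory illustration, not Cassandra's storage format.

```python
# Minimal sketch (assumption): one row of a dynamic column family, where the
# application supplies arbitrary column names and columns stay sorted by name.
from bisect import insort


class DynamicRow:
    def __init__(self, row_key: str) -> None:
        if not row_key:
            raise ValueError("row keys cannot be empty")
        self.row_key = row_key
        self._names: list[str] = []          # column names in sorted order
        self._values: dict[str, bytes] = {}  # column name -> value

    def insert(self, name: str, value: bytes = b"") -> None:
        if name not in self._values:
            insort(self._names, name)  # keep the sort order on insert
        self._values[name] = value

    def column_names(self) -> list[str]:
        return list(self._names)


# Row key = blog owner; each column name = one subscriber (values unused).
row = DynamicRow("alice")
for subscriber in ["carol", "bob", "dave"]:
    row.insert(subscriber)
print(row.column_names())  # ['bob', 'carol', 'dave']
```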
3.4 Columns
The column is the smallest unit of data in Cassandra. It is a tuple consisting of a name, a value and a timestamp.
A column must have a name. The name can be a static label (such as "name" or "email"), or it can be set dynamically when the column is created by the application. Columns can be indexed on their name. However, one limitation of column indexes is that they do not support queries that require access to ordered data, such as time series data. In that case, a secondary index on a timestamp column is not sufficient, because you cannot control the sort order of columns with a secondary index. When sort order matters, manually maintaining a column family as an index is another way to look up column data in sorted order.

A column does not need to have a value. Sometimes all the information the application needs to satisfy a given query can be stored in the column name itself. For example, if you use a column family as a materialized view to look up rows in other column families, all you need to store is the row key you are looking for; the value can be empty.

Cassandra uses the column timestamp to determine the most recent update to a column. The timestamp is provided by the client application. The latest timestamp always wins when data is requested, so if multiple client sessions update the same column in a row at the same time, the most recent update is the one that persists.
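The timestamp rule above can be sketched as a last-write-wins merge over the versions of one column. This is a simplified model: the Column type and reconcile function are hypothetical, and ties are not broken the way Cassandra does it (here the first version with the highest timestamp is simply kept).

```python
# Minimal sketch (assumption): last-write-wins reconciliation of one column
# using client-supplied timestamps. Tie-breaking is simplified.
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # supplied by the client, e.g. microseconds since epoch


def reconcile(versions: list[Column]) -> Column:
    """Return the version with the highest timestamp."""
    return max(versions, key=lambda c: c.timestamp)


writes = [
    Column("email", "new@example.com", timestamp=2_000),
    Column("email", "old@example.com", timestamp=1_000),  # arrives later, but older
]
print(reconcile(writes).value)  # new@example.com
```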
Internally, a counter column has a somewhat more complex structure. Cassandra tracks the distributed state of the counter, as well as a server-generated timestamp when a counter column is deleted. For this reason, all nodes in the cluster must have their clocks synchronized using the Network Time Protocol (NTP). A counter column can be read or written at any consistency level. However, unlike normal columns, a write to a counter column requires a read in the background to ensure that the distributed counter values stay consistent across replicas. If you write at consistency level ONE, this implicit read does not affect write latency, which is why ONE is the most common consistency level for counters.

3.5.3 Super Columns
A Cassandra column family can contain either regular columns or super columns, which add another level of nesting to the regular column family structure. A super column consists of a (super) column name and an ordered map of sub-columns. A super column can specify a data type (comparator) for both the super column name and the sub-column names.
A super column is a way to group multiple columns under a common lookup value. The main use of super columns is to denormalize multiple rows from other column families into a single row, so that data can be retrieved as a materialized view. For example, suppose you want to create a materialized view of the blog entries written by the bloggers that a user subscribes to.
One limitation of super columns is that all sub-columns of a super column must be deserialized to read the value of a single sub-column, and you cannot create secondary indexes on the sub-columns of a super column. Super columns are therefore best suited to cases where the number of sub-columns is relatively small.
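The sketch below illustrates both the materialized-view use and the deserialization limitation. Each super column (one per followed blogger) is stored as a single serialized blob of sub-columns, so reading one sub-column means decoding the whole super column. JSON stands in for Cassandra's serialization here, and all names and data are hypothetical.

```python
# Minimal sketch (assumption): super columns stored as serialized blobs of
# sub-columns. JSON stands in for Cassandra's real serialization format.
import json


def write_super_column(row: dict, super_name: str, sub_columns: dict) -> None:
    # Sub-columns are kept in sorted order inside the blob.
    row[super_name] = json.dumps(sub_columns, sort_keys=True)


def read_sub_column(row: dict, super_name: str, sub_name: str) -> str:
    # The whole super column must be deserialized to read one value.
    sub_columns = json.loads(row[super_name])
    return sub_columns[sub_name]


# Row key "bob": one super column per blogger bob follows,
# sub-column name = entry date, value = entry title (hypothetical data).
bob_view: dict = {}
write_super_column(bob_view, "alice", {"2012-03-01": "Spring fashion", "2012-03-05": "Shoes"})
write_super_column(bob_view, "carol", {"2012-03-02": "Cassandra tips"})
print(read_sub_column(bob_view, "alice", "2012-03-05"))  # Shoes
```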
3.6.1 Validators
For all column families, the best practice is to define a data type for the row key using the key_validation_class property. For static column families, define each column and its data type with the column_metadata property when you define the column family. For dynamic column families (where column names are not known in advance), specify a default_validation_class instead of defining a data type for each column. Key and column validators can be added or changed in a column family definition at any time. If you specify an invalid validator on a column family, client requests that rely on that metadata may be confused, and inserts or updates whose data does not conform to the specified validator will be rejected.

3.6.2 Comparators
Within a row, columns are always stored in sorted order by column name. The comparator specifies the data type of the column names, and therefore the order in which columns are stored within a row. Unlike validators, the comparator cannot be changed after the column family is defined, so it is an important decision when defining a column family in Cassandra. Column names in static column families are typically strings, and the sort order of the columns does not matter. For dynamic column families, however, sort order is important. For example, in a column family that stores time series data (the column names are timestamps), the data must be in sorted order to slice result sets out of a row of columns.
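The sketch below shows why the comparator choice matters. The same column names sort differently under a string-like order and a numeric order, and a time-series "slice" only works because column names are kept sorted. It also includes a validator that rejects non-conforming values. The functions are hypothetical stand-ins that loosely mirror UTF8Type and LongType; they are not Cassandra's classes.

```python
# Minimal sketch (assumption): illustrative stand-ins for a LongType-like
# validator, UTF8Type- vs LongType-like comparators, and a column slice over
# sorted column names. These are not Cassandra's classes.
from bisect import bisect_left, bisect_right


def validate_long(value):
    """Reject values that do not conform to a 64-bit integer type."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{value!r} does not conform to LongType")
    if not -(2 ** 63) <= value < 2 ** 63:
        raise ValueError(f"{value!r} is out of range for LongType")
    return value


# The comparator fixes the stored order of column names within a row.
names = ["9", "10", "100"]
print(sorted(names))           # string order:  ['10', '100', '9']
print(sorted(names, key=int))  # numeric order: ['9', '10', '100']


def slice_columns(sorted_names, start, finish):
    """Return the column names in [start, finish], like a column slice."""
    lo = bisect_left(sorted_names, start)
    hi = bisect_right(sorted_names, finish)
    return sorted_names[lo:hi]


# Time-series row: column names are timestamps, kept in sorted order.
timestamps = [1330560000, 1330646400, 1330732800, 1330819200]
print(slice_columns(timestamps, 1330600000, 1330800000))  # [1330646400, 1330732800]
```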
Unlike in traditional databases, write performance in Cassandra is not hurt by compression. In practice, writes to compressed tables can show up to a 10% performance improvement. In a traditional relational database, writes require overwriting existing data files on disk. The database has to locate the relevant pages on disk, decompress them, overwrite the relevant data, and then compress them again, which is expensive in both CPU cycles and disk I/O. Because Cassandra SSTable data files are immutable (they are not rewritten after being written to disk), no recompression is needed to process writes. An SSTable is compressed only once, when it is written to disk. Depending on the characteristics of the data in the column family, compression can provide the following benefits:
- a 2x-4x reduction in data size
- a 25-35% improvement in read performance
- a 5-10% improvement in write performance

3.7.1 When to Use Compression
Compression is best suited to column families with many rows, where each row has the same columns or at least many columns in common. For example, a column family holding user data such as name and email is a good candidate for compression. The more similar the data across rows, the higher the compression ratio and the greater the gain in read performance. Compression is not a good fit for column families where each row has a different set of columns, or where there are only a few very wide rows. Dynamic column families like these will not achieve good compression ratios.

3.7.2 Configuring Compression on a Column Family
When you create or update a column family, you can make it a compressed column family by setting the compression_options attribute. You can enable compression when you create a new column family, or update an existing column family to add compression later. When you add compression to an existing column family, the SSTables already on disk are not compressed immediately. Any new SSTables that are created will be compressed, and existing SSTables will be compressed during Cassandra's normal compaction process. If necessary, you can force existing SSTables to be rewritten and compressed by using nodetool. For example, to create a new column family with compression enabled using the Cassandra CLI, you would do the following:
[default@demo] CREATE COLUMN FAMILY users
WITH key_validation_class=UTF8Type
AND column_metadata = [
{column_name: name, validation_class: UTF8Type},
{column_name: email, validation_class: UTF8Type},
{column_name: state, validation_class: UTF8Type},
{column_name: gender, validation_class: UTF8Type},
{column_name: birth_year, validation_class: UTF8Type}
]
AND compression_options = ...
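The claim in 3.7.1, that similar rows compress better than rows with little in common, can be checked roughly with zlib (DEFLATE). Here zlib stands in for Cassandra's SSTable compressors, and the sample rows are hypothetical, so the exact ratios will not match Cassandra's.

```python
# Rough illustration (assumption): zlib (DEFLATE) stands in for Cassandra's
# SSTable compressors; the row data below is hypothetical.
import os
import zlib


def ratio(data: bytes) -> float:
    """Original size divided by compressed size."""
    return len(data) / len(zlib.compress(data))


# 1,000 rows that share the same columns and similar values.
similar = b"".join(
    b"name=user%04d;email=user%04d@example.com;state=CA;gender=F;birth_year=1980\n" % (i, i)
    for i in range(1000)
)
# The same number of random bytes, standing in for rows with nothing in common.
dissimilar = os.urandom(len(similar))

print(f"similar rows:    {ratio(similar):.1f}x")
print(f"dissimilar rows: {ratio(dissimilar):.1f}x")
```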
3.8.2 Secondary Indexes
In Cassandra, secondary indexes are indexes on column values, as distinct from the primary row key index of a column family. Cassandra supports secondary indexes of type KEYS (similar to a hash index). Secondary indexes allow efficient queries for specific values using equality predicates (where column x = value y). In addition, queries on indexed values can apply further filters to the result set based on the values of other columns.

Cassandra's secondary indexes work best when many rows contain the indexed value. For example, suppose you have a users table with billions of users and want to look users up by the state they live in. Many users will share the same value in the state column (such as CA, NY or TX), which makes this a good candidate for a secondary index. On the other hand, if you want to look users up by email address (a value that is typically unique for each user), it may be more efficient to manually maintain a dynamic column family as a form of index. Even for columns containing unique data, using secondary indexes for convenience is often a sensible choice, as long as the query volume against the indexed column family is moderate and not under constant load.

Another advantage of secondary indexes is that the index is easy to populate and maintain. When you create a secondary index on an existing column, Cassandra indexes the existing data in the background. Client-maintained column families used as indexes must be created manually. For example, if the state column were indexed by creating a column family such as users_by_state, the client application would have to populate that column family with data from the users column family.

3.8.3 Building and Using Secondary Indexes
You can specify the KEYS index type when you create a column definition, or add it later to index an existing column. Secondary indexes are built automatically in the background, without blocking reads or writes. For example, in the Cassandra CLI you can create a secondary index on a column when defining a column family (note the index_type: KEYS setting on the state and birth_year columns):
[default@demo] create column family users with comparator=UTF8Type
... and column_metadata=[{column_name: full_name, validation_class: UTF8Type},
... {column_name: email, validation_class: UTF8Type},
... {column_name: birth_year, validation_class: LongType, index_type: KEYS},
... {column_name: state, validation_class: UTF8Type, index_type: KEYS}];
Or you can add an index to an existing column family:
[default@demo] update column family users with comparator=UTF8Type
... and column_metadata=[{column_name: full_name, validation_class: UTF8Type},
... {column_name: email, validation_class: UTF8Type},
... {column_name: birth_year, validation_class: LongType, index_type: KEYS},
... {column_name: state, validation_class: UTF8Type, index_type: KEYS}];
Because a secondary index exists on the state column, its values can be queried directly, for example to find the users who live in a given state:
[default@demo] get users where state = 'TX';
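The sketch below contrasts the two indexing approaches described in 3.8.2. A KEYS-style (hash) index maps each state value to a set of row keys and answers an equality query, with an extra filter applied on birth_year. A client-maintained users_by_email mapping stands in for a manually kept "column family as index". This is an in-memory illustration under those assumptions; the names and data are hypothetical.

```python
# Minimal sketch (assumption): a KEYS-style (hash) secondary index on "state",
# an equality query with an extra filter, and a client-maintained email index.
from collections import defaultdict

users = {
    "alice": {"state": "TX", "birth_year": 1980, "email": "alice@example.com"},
    "bob": {"state": "CA", "birth_year": 1975, "email": "bob@example.com"},
    "carol": {"state": "TX", "birth_year": 1990, "email": "carol@example.com"},
}

# Built-in style index: indexed value -> set of row keys, built from existing data.
state_index = defaultdict(set)
for key, cols in users.items():
    state_index[cols["state"]].add(key)


def get_where(state, min_birth_year=None):
    """Equality lookup on the indexed column, then filter on another column."""
    keys = state_index.get(state, set())
    return sorted(
        k for k in keys
        if min_birth_year is None or users[k]["birth_year"] >= min_birth_year
    )


# Client-maintained index: the application must write it alongside "users".
users_by_email = {cols["email"]: key for key, cols in users.items()}

print(get_where("TX"))                    # ['alice', 'carol']
print(get_where("TX", 1985))              # ['carol']
print(users_by_email["bob@example.com"])  # bob
```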
...only care about the last six months of data, but this should be a factor in your Cassandra data model design.

3.9.2 Denormalize to Optimize
In the relational world, the data model is usually designed with the goal of normalizing the data to minimize redundancy. Normalization typically involves creating smaller, well-structured tables and then defining relationships between them. At query time, related tables are joined to satisfy the request. Cassandra has no foreign key relationships like a relational database, which means you cannot join multiple column families to satisfy a given query. Cassandra performs best when the data needed to satisfy a given query is located in the same column family. Try to design your data model so that one or a few rows in a single column family answer each query. This trades disk space (one of the cheapest resources on a server) for fewer disk seeks and less network traffic.

3.9.3 Planning for Duplicate Writes
Within a column family, each row is identified by its row key, a string of virtually unlimited length. The key has no required form, but it must be unique within a column family. Unlike the primary key in a relational database, Cassandra does not enforce uniqueness. Inserting a duplicate row key upserts (inserts or updates) the columns in the insert statement instead of returning a uniqueness violation.

3.9.4 Using Natural or Surrogate Row Keys
One design question is whether to use natural or surrogate keys for a column family. A surrogate key is a generated key (such as a UUID) that uniquely identifies a row but has no relationship to the actual data in the row. In some column families, the data contains values that are guaranteed to be unique and are usually not updated after the row is created, for example the username in a users column family. Such a value is called a natural key. Natural keys make the data easier to read and remove the need for additional indexes or denormalization. However, unless your application guarantees uniqueness, there is a risk of overwriting column data. Natural keys also make it hard to update the row key. For example, if your row key is an email address and a user wants to change their email address, you would have to create a new row under the new address and copy all existing columns from the old row to the new one.
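The upsert behavior in 3.9.3 and the natural-key problem in 3.9.4 can be sketched with a plain dict of rows. A duplicate row key merges into the existing row instead of raising an error. Changing an email-address row key means copying every column to a new row and deleting the old one. The helper names and data are hypothetical.

```python
# Minimal sketch (assumption): rows as dicts, to show upsert on a duplicate
# row key and why changing a natural key (an email address) means copying.
users: dict[str, dict[str, str]] = {}


def insert(row_key: str, columns: dict[str, str]) -> None:
    # No uniqueness check: a duplicate key merges/overwrites columns (upsert).
    users.setdefault(row_key, {}).update(columns)


insert("jdoe@example.com", {"name": "John", "state": "TX"})
insert("jdoe@example.com", {"state": "CA"})  # no error; 'state' is overwritten
print(users["jdoe@example.com"])  # {'name': 'John', 'state': 'CA'}


def change_row_key(old_key: str, new_key: str) -> None:
    # A row key cannot be updated in place: copy all columns, then delete.
    users[new_key] = dict(users[old_key])
    del users[old_key]


change_row_key("jdoe@example.com", "john.doe@example.com")
print(sorted(users))  # ['john.doe@example.com']
```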
3.9.5 UUID Types for Column Names
The UUID comparator type (universally unique identifier) is used to avoid collisions between column names. For example, suppose you identify a column (such as a blog entry or a tweet) by its timestamp. If several clients write to the same row key at the same time, their timestamps may collide, and data may be overwritten unintentionally. Using UUIDType to represent a type-1 (time-based) UUID avoids such collisions.
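The sketch below shows the collision described above. Two writes in the same millisecond overwrite each other when the column name is that millisecond. When each column name is a type-1 UUID from Python's uuid.uuid1(), both writes survive. Across different machines, uniqueness of type-1 UUIDs also relies on the node (MAC) field, which this single-process sketch does not exercise.

```python
# Minimal sketch (assumption): column names keyed by a millisecond timestamp
# versus a type-1 (time-based) UUID when two writes land in the same millisecond.
import uuid

row_by_ms = {}
row_by_uuid = {}
same_ms = 1_330_560_000_000  # two clients writing in the same millisecond

for entry in ["entry from client A", "entry from client B"]:
    row_by_ms[same_ms] = entry         # the second write overwrites the first
    row_by_uuid[uuid.uuid1()] = entry  # each write gets a distinct column name

print(len(row_by_ms))    # 1
print(len(row_by_uuid))  # 2
print(all(u.version == 1 for u in row_by_uuid))  # True
```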
CONCLUSION
In this report, our group has presented an overview of the Cassandra distributed database system. Cassandra is a distributed database management system originally developed at Facebook. It was released as open source in July 2008 and later officially joined the Apache family. At the end of February 2010, Cassandra became an Apache Top-Level Project (TLP). Although it is still fairly new to the user community, Cassandra's technology has been widely adopted by large companies and organizations such as Cisco, Twitter and Digg. Cassandra opened the way for the next generation of databases: non-relational, distributed, open source and horizontally scalable. It can store and process anything from very small amounts of data up to petabytes in systems that tolerate heavy load and failures, while requiring only modest hardware resources. This report introduced the Cassandra distributed database, its architecture and its data model. The topic is fairly new, and the report draws mostly on foreign-language sources, so shortcomings are inevitable. We welcome feedback from our teacher and classmates to help us improve our knowledge further. Thank you sincerely.