
POSTS AND TELECOMMUNICATIONS INSTITUTE OF TECHNOLOGY

FACULTY OF INTERNATIONAL AND POSTGRADUATE EDUCATION


*********

GROUP ASSIGNMENT

THE CASSANDRA DISTRIBUTED DATABASE SYSTEM


Group members: Nguyn Trng Giang Nguyn Hng Hnh Vit Phng Th Nhn
Subject: Distributed Systems
Class: Data Transmission and Computer Networks M11CQCT01-B
Supervisor: TS. H Hi Nam

Hanoi, June 2012



TABLE OF CONTENTS

INTRODUCTION
1. Characteristics of Cassandra
    1.1 Definition
    1.2 Distributed and decentralized
    1.3 Elastic scalability
    1.4 High availability and fault tolerance
    1.5 Tunable consistency
    1.6 Row-oriented
    1.7 Schema-free
    1.8 High performance
2. Cassandra architecture
    2.1 Inter-node communication (the gossip protocol)
    2.2 Cluster membership and seed nodes
    2.3 Failure detection and recovery
    2.4 Data partitioning in Cassandra
        2.4.1 About partitioning in multiple data center clusters
        2.4.2 Understanding the partitioner types
    2.5 Replication in Cassandra
        Replica placement strategy
    2.6 Network topology
    2.7 Snitches
        Snitch types
    2.8 Client requests in Cassandra
        2.8.1 Write requests
        Multi-data-center write requests
        2.8.2 Read requests
    2.9 Planning a Cassandra cluster deployment
        2.9.1 Selecting hardware for enterprise implementations
        2.9.2 Planning an Amazon EC2 cluster
        2.9.3 Choosing node configuration options
3. The Cassandra data model
    3.1 Comparing the Cassandra data model to a relational database
    3.2 Keyspaces
    3.3 Column Families
    3.4 Columns
    3.5 Special columns (Counter, Expiring, Super)
        3.5.1 Expiring Columns
        3.5.2 Counter Columns
        3.5.3 Super Columns
    3.6 Data types (Comparators and Validators)
        3.6.1 Validators
        3.6.2 Comparators
    3.7 Column family compression
        3.7.1 When to use compression
        3.7.2 Configuring compression on a column family
    3.8 Indexes in Cassandra
        3.8.1 Primary index
        3.8.2 Secondary indexes
        3.8.3 Building and using secondary indexes
    3.9 Data model design
        3.9.1 Start from the queries
        3.9.2 Denormalize to optimize
        3.9.3 Planning for concurrent writes
        3.9.4 Using natural or surrogate row keys
        3.9.5 UUID types for column names
CONCLUSION
REFERENCES

INTRODUCTION

Today, Internet services have to process very large volumes of data, and most of that data is stored in a distributed fashion across many different servers. The relational databases in use today handle certain data storage tasks very well, but because they are centralized they can cause problems when it is time to scale. In addition, users often try to reduce the number of joins between tables, that is, they denormalize the data; storing multiple copies of the data then completely breaks the original design, both in the database and in the application. Moreover, we frequently need to work around distributed transactions, which easily become bottlenecks. These operations are usually not directly supported by any relational database.

Relational database management systems (RDBMS) have therefore turned out to be a poor fit for services of this kind, and people began to develop new DBMSs suited to managing such distributed data volumes. These DBMSs are commonly called NoSQL, and one prominent representative is Cassandra.

Cassandra is an open source distributed database system that was started at Facebook. In 2008, Facebook handed it over to the open source community, and it has been developed by Apache ever since. Cassandra is regarded as a combination of Amazon's Dynamo and Google's BigTable. It is a fully distributed database with very good fault tolerance. It is also very flexible: read and write throughput increases linearly as new infrastructure is added.

In this report, the group presents the characteristics of the Cassandra distributed database system, followed by the architecture and the data model that Cassandra uses.

1. Characteristics of Cassandra


1.1 Definition
"Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tunably consistent, column-oriented database that bases its distribution design on Amazon's Dynamo and its data model on Google's Bigtable. Created at Facebook, it is now used at some of the most popular sites on the Web."

1.2 Distributed and decentralized


Cassandra is distributed, which means it can run on many machines while appearing to users as a single unified whole. Cassandra is also decentralized, meaning it has no single point of failure. All nodes in a Cassandra cluster function identically; this is sometimes called "server symmetry". Because they all do the same work, there is no special host that coordinates activities, as there is in the master/slave setup found in MySQL, Bigtable, and many other databases.

Decentralization therefore has two key advantages: it is simpler to use than master/slave, and it helps you avoid outages. Because all nodes are the same, a decentralized store is easier to operate and maintain than a master/slave one. You need no special knowledge to scale: setting up 50 nodes is not much different from setting up one. Moreover, in a master/slave setup the master can become a single point of failure (SPOF). To avoid this, you often have to add complexity to the environment in the form of multiple masters. Because all replicas in Cassandra are identical, the failure of one node does not disrupt service.

In short, because Cassandra is distributed and decentralized, it has no single point of failure, which supports high availability.

1.3 Elastic scalability

Scalability is an architectural feature of a system that lets it continue serving a greater number of requests with little degradation in performance. Vertical scaling, which simply adds more hardware capacity and memory to an existing machine, is the easiest way to achieve this. Horizontal scaling means adding more machines that hold all or part of the data, so that no single machine has to bear the entire burden of serving requests. The software itself must then have an internal mechanism for keeping its data in sync with the other nodes in the cluster.

Elastic scalability refers to a special property of horizontal scaling: your cluster can seamlessly scale up and scale back down. To do this, the cluster must be able to accept new nodes that begin participating by getting a copy of some or all of the data and start serving new user requests without major disruption or reconfiguration of the entire cluster. You don't have to restart your process. You don't have to change your application queries. You don't have to manually rebalance the data. Just add another machine; Cassandra will find it and put it to work.

Scaling down, of course, means removing some of the processing capacity from the cluster. You might have to do this if you move part of your application to another platform, or if your application loses users and you need to start selling off hardware. Let's hope that doesn't happen. But if it does, you won't need to upend the entire system to scale back.

1.4 High availability and fault tolerance

In general architecture terms, the availability of a system is measured by its ability to fulfill requests. Computers can experience all manner of failure, from hardware faults to network disruption, and most computers cannot avoid such failures. There are, of course, sophisticated machines that can mitigate many of these failures themselves, because they include redundant hardware, can send notifications of failure events, and can hot swap their own components. But anyone can accidentally break an Ethernet cable, and that alone can cut off a single data center. So for a system to be highly available, it must typically include multiple networked computers, and the software they run must be capable of operating in a cluster and have some facility for recognizing node failures and failing requests over to another part of the system.

Cassandra is highly available. You can replace failed nodes in the cluster with no downtime, and you can replicate data to multiple data centers to improve performance and to prevent downtime if one data center experiences a catastrophe such as a fire or flood.

1.5 Tunable consistency

Consistency essentially means that a read always returns the most recently written value.
Consider two customers attempting to put the same item into their shopping carts on an e-commerce site. If I place the last item in stock into my cart an instant after you do, you should get the item added to your cart, and I should be informed that the item is no longer available. This is guaranteed to happen when the state of a write is consistent among all nodes that hold that data.

However, scaling a data store means trading off consistency, availability, and partition tolerance against each other (we can choose only two of the three). Cassandra favors availability over consistency. That is why Cassandra is described as tunably consistent: it lets you adjust the level of consistency you require, in balance with the level of availability. The consistency level is a setting that the client must specify on every operation. It lets you decide how many replicas in the cluster must acknowledge a write, or respond to a read, for the operation to be considered successful.
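The trade-off described above can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the rule that a read is guaranteed to see the latest write when R + W > N is standard quorum reasoning rather than something stated in this report. ONE, QUORUM, and ALL are consistency level names that appear later in the text.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must answer for an operation to succeed."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def reads_see_latest_write(read_level: str, write_level: str,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor
```

With a replication factor of 3, QUORUM reads combined with QUORUM writes overlap on at least one replica, whereas ONE combined with ONE favors availability and may return stale data.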

1.6 Row-oriented


Cassandra is often described as a row-oriented database, and this is not wrong. It is not relational, and it represents its data structures as sparse multidimensional hashtables. "Sparse" means that any given row can have one or more columns, but rows do not all need to have the same columns, as they must in a relational database. Each row has a unique key that makes its data accessible.

Because Cassandra stores data in multidimensional hash tables, you do not have to decide ahead of time exactly what your data structure must look like or which fields your records will need. This is very useful if you have only just started developing an application and features are added or removed frequently. It does not mean, however, that you need not think about your data structure at all. Cassandra makes you think about it differently. Instead of designing a data model up front and then designing queries around that model, as in a relational database, you are free to think about the queries first and then provide the data that answers them.
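As a rough mental model of the sparse multidimensional hashtable described above, a column family can be pictured as a dictionary of row keys, each mapping to its own dictionary of columns. This is only an in-memory analogy with made-up row keys and column names; it is not how Cassandra stores data on disk.

```python
# Column family pictured as: row key -> {column name: value}.
# Rows are sparse: each row holds only the columns it actually uses.
users = {
    "alice": {"email": "alice@example.com", "city": "Hanoi"},
    "bob": {"email": "bob@example.com"},            # no "city" column
    "carol": {"phone": "555-0100", "city": "Hue"},  # no "email" column
}


def get_column(column_family, row_key, column_name, default=None):
    """Look up one column of one row; missing rows or columns give `default`."""
    return column_family.get(row_key, {}).get(column_name, default)


# A new column can be added to a single row without declaring it anywhere.
users["bob"]["last_login"] = "2012-06-01"
```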

1.7 Schema-free


Cassandra requires you to define an outer container, called a keyspace, that contains column families. The keyspace is essentially just a logical namespace that holds column families and certain configuration properties. Beyond that, the data tables are sparse, so you can simply start adding data to them, using the columns you want, without defining the columns in advance. Instead of modeling data up front with expensive data modeling tools and then writing complex queries, Cassandra lets you model the queries you need and then provide the data to answer them.

1.8 High performance


Cassandra was designed to take full advantage of multiprocessor and multicore machines and to run across many machines in multiple data centers holding very large volumes of data. Cassandra performs well even under heavy load, and writes in particular are very fast. As you add more servers, you can maintain all of Cassandra's desirable properties without sacrificing performance.

2. Cassandra architecture


A Cassandra instance is a collection of independent nodes configured together into a cluster. Within a cluster, all nodes are peers, meaning there is no master node or central process manager. A node joins a Cassandra cluster based on certain aspects of its configuration. This section explains those aspects of the Cassandra cluster architecture.

2.1 Inter-node communication (the gossip protocol)


Cassandra uses a protocol called gossip to discover location and state information about the other nodes participating in a Cassandra cluster. Gossip is a peer-to-peer communication protocol in which nodes periodically exchange state information about themselves and about other nodes they know about. In Cassandra, the gossip process runs every second, and each node exchanges state messages with up to three other nodes in the cluster. Because nodes exchange information about themselves and about the other nodes they have heard about, all nodes quickly learn about every other node in the cluster. A gossip message has a version associated with it, so that during a gossip exchange older information is overwritten with the most current state for a particular node.
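The spreading behavior described above can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not Cassandra's gossip implementation: state is reduced to one version number per node, each exchange is a symmetric merge in which the higher version wins, and the function names are hypothetical.

```python
import random


def gossip_round(views, fanout, rng):
    """One round: every node exchanges state with up to `fanout` random peers."""
    names = list(views)
    for node in names:
        others = [name for name in names if name != node]
        for peer in rng.sample(others, k=min(fanout, len(others))):
            # Merge both views; for each known node the higher version wins.
            merged = dict(views[node])
            for subject, version in views[peer].items():
                if version > merged.get(subject, -1):
                    merged[subject] = version
            views[node] = dict(merged)
            views[peer] = dict(merged)


def rounds_until_converged(node_count, fanout=3, seed=0, max_rounds=100):
    """Count rounds until every node has heard about every other node."""
    rng = random.Random(seed)
    names = [f"node{i}" for i in range(node_count)]
    # Initially each node knows only its own state (version 1).
    views = {name: {name: 1} for name in names}
    for round_number in range(1, max_rounds + 1):
        gossip_round(views, fanout, rng)
        if all(len(view) == node_count for view in views.values()):
            return round_number
    raise RuntimeError("gossip did not converge within max_rounds")
```

Calling `rounds_until_converged(10)` shows how few rounds are needed for a small cluster when each node contacts three peers per round.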

2.2 Cluster membership and seed nodes


When a node first starts up, it looks at its configuration file to determine the name of the Cassandra cluster it belongs to and which nodes, called seeds, to contact to obtain information about the other nodes in the cluster. These cluster contact points are configured in the cassandra.yaml file on each node. To prevent partitions in gossip communication, all nodes in a cluster should have the same list of seed nodes in their configuration files. This matters most when a node first starts up. By default, a node remembers the other nodes it has gossiped with, even across restarts.

Note: seed nodes have no purpose other than bootstrapping the gossip process for new nodes joining the cluster.

2.3 Failure detection and recovery


Failure detection is a method by which a node determines locally, from gossip state, whether the other nodes in the system are up or down. Cassandra also uses failure detection information to avoid routing client requests to unreachable nodes. (Cassandra can also avoid routing requests to nodes that are alive but performing poorly, through the dynamic snitch.)

The gossip process tracks messages from other nodes both directly (nodes gossiping directly to it) and indirectly (information heard secondhand, thirdhand, and so on). Cassandra uses a mechanism that calculates a per-node threshold that takes into account network conditions, workload, and other conditions that might affect the exchange. During gossip exchanges, every node maintains a sliding window of the arrival times of gossip messages from the other nodes in the cluster. In Cassandra, the phi_convict_threshold configuration property adjusts the sensitivity of the failure detector. The default value is fine for most situations, but DataStax recommends increasing it to 12 for Amazon EC2 because of the network congestion frequently experienced on that platform.

Nodes can fail for many reasons, such as hardware failure or network outage. To permanently change a cluster's membership, administrators use the nodetool utility to add nodes to or remove nodes from a Cassandra cluster.

When a node comes back online after an outage, it may have missed writes for the replica data it maintains. Once the failure detector marks a node as down, missed writes are stored by other replicas if hinted handoff is enabled. However, writes can still be missed between the moment a node actually goes down and the moment it is detected as down. And if a node is down for longer than max_hint_window_in_ms (one hour by default), hints are no longer saved. For this reason, it is best practice to run nodetool repair routinely on all nodes to ensure they hold consistent data, and also to run repair after recovering a node that has been down for an extended period.
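To show how a sliding window of arrival times can be turned into a suspicion level that is compared against phi_convict_threshold, here is a simplified sketch. It assumes that gaps between heartbeats follow an exponential distribution, so that phi = elapsed / (mean interval x ln 10); the class and function names are hypothetical, and the default threshold of 8 is an assumption about Cassandra's configuration, not a value given in this report.

```python
import math
from collections import deque


class HeartbeatWindow:
    """Sliding window of intervals between gossip heartbeats from one peer."""

    def __init__(self, size=1000):
        self.intervals = deque(maxlen=size)
        self.last_arrival = None

    def record(self, now):
        """Register a heartbeat that arrived at time `now` (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now):
        """Suspicion level: -log10 of the chance the peer is merely late."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        return (now - self.last_arrival) / mean / math.log(10)


def is_down(window, now, phi_convict_threshold=8.0):
    """Convict the peer once phi exceeds the configured threshold."""
    return window.phi(now) > phi_convict_threshold
```

Under this model a higher threshold such as 12 simply means that a node must stay silent for longer, relative to its usual heartbeat interval, before it is marked as down.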

2.4 Data partitioning in Cassandra


When you start a Cassandra cluster, you must choose how data will be divided across the nodes in the cluster. This is done by choosing a partitioner for the cluster. In Cassandra, the total data managed by a cluster is represented as a circular space, or ring. The ring is divided into ranges according to the number of nodes, with each node responsible for one or more ranges of the overall data. Before a node can join the ring, it must be assigned a token. The token determines the node's position on the ring and the range of data it is responsible for.

Column family data is partitioned across the nodes based on the row key. To determine the node where the first replica of a row will live, the ring is walked clockwise until it locates the node with a token value greater than that of the row key. Each node is responsible for the region of the ring between itself and its predecessor. With the nodes sorted in token order, the last node is considered the predecessor of the first node.

For example, consider a simple four-node cluster in which all the data managed by the cluster is numbered in the range 0 to 100. Each node is assigned a token that represents a point in this range. In this simple example, the token values are 0, 25, 50, and 75. The first node, the one with token 0, is responsible for the wrapping range (75-0). The node with the lowest token also accepts row keys less than the lowest token and greater than the highest token.
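The four-node example above can be checked with a few lines of code. This is a minimal sketch with a hypothetical function name; it treats a node's token as the inclusive upper bound of its range, which is an assumption about the boundary case that the text leaves implicit.

```python
from bisect import bisect_left


def owner_token(node_tokens, key_token):
    """Return the token of the node that holds the first replica of a key."""
    tokens = sorted(node_tokens)
    # First token that is >= the key's token, walking clockwise.
    index = bisect_left(tokens, key_token)
    # Past the highest token, wrap around to the node with the lowest token.
    return tokens[index % len(tokens)]
```

For the tokens 0, 25, 50, and 75, a key at position 10 lands on the node with token 25, while a key at position 80 wraps around to the node with token 0.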

2.4.1 About partitioning in multiple data center clusters

In multiple data center deployments, replica placement is calculated per data center according to the NetworkTopologyStrategy policy. In each data center (or replication group), the first replica for a particular row is determined by the token value assigned to a node. Additional replicas in the same data center are placed by walking the ring clockwise until it reaches the first node in another rack. If you do not calculate partitioner tokens so that the data is evenly distributed within each data center, you may end up with uneven data distribution within a data center.

The goal is to ensure that the nodes in each data center have token assignments that evenly divide the overall range. Otherwise, the nodes in each data center could end up owning a disproportionate number of row keys. Each data center should be partitioned independently; however, token assignments within the cluster must not conflict with each other (each node must have a unique token). See "Calculating Tokens for a Multi-Data Center Cluster" for strategies on how to generate tokens for multiple data center clusters.
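One way to satisfy both requirements above (even spacing inside each data center, no duplicate tokens across the cluster) is to compute each data center's tokens as if it were its own ring and then shift each data center by a small offset. The sketch below assumes the 0 to 2**127 token range of the random partitioner; the function names and the choice of offsets 0, 1, 2, ... are illustrative assumptions, not a procedure taken from this report.

```python
RING_SIZE = 2 ** 127


def evenly_spaced_tokens(node_count, offset=0):
    """Split the token range evenly among `node_count` nodes."""
    return [(i * RING_SIZE // node_count + offset) % RING_SIZE
            for i in range(node_count)]


def multi_dc_tokens(nodes_per_dc):
    """Assign tokens per data center, offsetting each one to avoid clashes."""
    assignment = {}
    for dc_index, (dc_name, node_count) in enumerate(nodes_per_dc.items()):
        assignment[dc_name] = evenly_spaced_tokens(node_count, offset=dc_index)
    all_tokens = [t for tokens in assignment.values() for t in tokens]
    if len(set(all_tokens)) != len(all_tokens):
        raise ValueError("token conflict: choose different offsets")
    return assignment
```

For example, `multi_dc_tokens({"DC1": 3, "DC2": 3})` gives both data centers the same even spacing, with every DC2 token one position after the corresponding DC1 token.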


2.4.2 Understanding the partitioner types

Unlike almost every other configuration choice in Cassandra, the partitioner can only be changed by reloading all of the data. It is therefore important to choose and configure the correct partitioner before initializing the cluster. Cassandra offers a number of partitioners out of the box, but the random partitioner is the best choice for most Cassandra deployments.

Random partitioner

RandomPartitioner is the default partitioner for a Cassandra cluster, and in almost all cases it is the right choice. Random partitioning uses consistent hashing to determine which node stores which row. Unlike naive modulus-by-node-count, consistent hashing ensures that when nodes are added to the cluster, the smallest possible amount of data is affected. To distribute the data evenly across the nodes, a hashing algorithm creates an MD5 hash value of the row key. The possible range of hash values is 0 to 2**127. Each node in the cluster is assigned a token that represents a hash value within this range. A node then owns the rows whose hash value is less than its token.

For single data center deployments, tokens are calculated by dividing the hash range by the number of nodes in the cluster. For multiple data center deployments, tokens are calculated per data center (the hash range should be divided evenly among the nodes in each replication group).

The primary benefit of this approach is that once your tokens are set appropriately, data from all of your column families is evenly distributed across the cluster with no further effort. For example, one column family could use user names as row keys and another could use timestamps, yet the row keys of each individual column family are still spread evenly. This also means that read and write requests to the cluster are evenly distributed. Another benefit of the random partitioner is that it simplifies load balancing a cluster. Because each part of the hash range receives an equal number of rows on average, it is easier to assign tokens correctly to new nodes.

Ordered partitioners

An ordered partitioner ensures that row keys are stored in sorted order. DataStax recommends the random partitioner over an ordered partitioner unless your application really requires ordering.

Using an ordered partitioner allows range scans over rows, meaning you can scan rows as though you were moving a cursor through a traditional index. For example, if your application uses user names as row keys, you can scan the rows of users whose names fall between Jake and Joe. This type of query is not possible with randomly partitioned row keys, because the keys are stored in the order of their MD5 hash (not sequentially).

Ordered partitioners are not recommended for the following reasons:

- Sequential writes can cause hot spots. If your application tends to write or update a sequential block of rows at a time, the writes are not distributed across the cluster; they all go to one node. This is frequently a problem for applications that handle timestamped data.
- More administrative overhead to load balance the cluster. An ordered partitioner requires administrators to calculate token ranges manually, based on their estimates of the row key distribution. In practice, this means actively moving node tokens around to accommodate the actual distribution of the data once it is loaded.
- Uneven load balancing for multiple column families. If your application has multiple column families, chances are that those column families have different row keys and different distributions of data. An ordered partitioner that is balanced for one column family can cause uneven distribution for another column family in the same cluster.

Cassandra offers three choices of ordered partitioner:

- ByteOrderedPartitioner: row keys are stored in order of their raw bytes rather than being converted to encoded strings. Tokens are calculated by looking at the actual values of your row key data and using a hexadecimal representation of the leading characters in a key. For example, if you wanted to partition rows alphabetically, you could assign the token for "A" using its hexadecimal representation, 41.
- OrderPreservingPartitioner: row keys are stored in order based on their UTF-8 encoded value. Row keys must be UTF-8 encoded.
- CollatingOrderPreservingPartitioner: row keys are stored in order based on United States English collation. Row keys must also be UTF-8 encoded.
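The difference between the random and the byte-ordered partitioner can be sketched in a few lines. The first function reflects my understanding of how RandomPartitioner derives a token (the MD5 digest read as a signed big-endian integer, then made non-negative), which should be treated as an assumption; the second mirrors the "A is 41" hexadecimal example from the text. Both function names are hypothetical.

```python
import hashlib


def random_partitioner_token(row_key: bytes) -> int:
    """Hash a row key to a token in the range 0 to 2**127."""
    digest = hashlib.md5(row_key).digest()
    # Interpret the 16-byte digest as a signed integer and drop the sign.
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def byte_ordered_token(row_key: bytes) -> str:
    """Hexadecimal form of the raw key bytes, as used for ordered tokens."""
    return row_key.hex()
```

Hashed tokens bear no relation to the alphabetical order of the keys, which is why a scan from Jake to Joe only works when the tokens preserve key order, as the hexadecimal tokens do.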

2.5 Replication in Cassandra


Replication is the process of storing copies of data on multiple nodes to ensure reliability and fault tolerance. When you create a keyspace in Cassandra, you must decide on the replica placement strategy: the number of replicas and how those replicas are distributed across the nodes in the cluster. The replication strategy relies on the cluster configuration to help determine the physical location of the nodes and their proximity to each other.

The total number of replicas across the cluster is often referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row. A replication factor of 2 means that there are two copies of each row. All replicas are equally important; no replica is treated as primary or secondary in the way read and write requests are handled.

As a general rule, the replication factor should not exceed the number of nodes in the cluster. However, it is possible to increase the replication factor first and add the desired number of nodes afterwards. When the replication factor exceeds the number of nodes, writes are rejected, but reads are served as long as the desired consistency level can be met.

Replica placement strategy

The replica placement strategy determines how the replicas of a keyspace are distributed across the cluster. It is set when you create a keyspace. There are several strategies to choose from, depending on your goals and on the information you have about where the nodes are located.

SimpleStrategy

SimpleStrategy is the default replica placement strategy when you create a keyspace using the Cassandra CLI. Other interfaces, such as CQL, require you to specify a strategy explicitly. SimpleStrategy places the first replica on the node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring, without considering rack or data center location.
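The SimpleStrategy rule above amounts to "find the first replica, then keep walking clockwise". The following sketch illustrates it on a toy ring; the function name and the ring representation (a dictionary from token to node name) are assumptions made for the example.

```python
from bisect import bisect_left


def simple_strategy_replicas(ring, key_token, replication_factor):
    """Pick replicas by walking the ring clockwise from the key's position.

    `ring` maps each node's token to the node's name.
    """
    tokens = sorted(ring)
    # The first replica is the node chosen by the partitioner.
    start = bisect_left(tokens, key_token) % len(tokens)
    # There cannot be more replicas than nodes.
    count = min(replication_factor, len(tokens))
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(count)]
```

With nodes A, B, C, and D at tokens 0, 25, 50, and 75, a key at position 60 with a replication factor of 3 is stored on D, A, and B, regardless of which racks or data centers those nodes sit in.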

2.6 Network topology


NetworkTopologyStrategy is the preferred replication strategy when you have information about how nodes are grouped in your data center, or when your cluster is deployed (or planned to be deployed) across multiple data centers. This strategy lets you specify how many replicas you want in each data center.

When deciding how many replicas to configure in each data center, the primary considerations are (1) being able to satisfy reads within each data center, without incurring latency, and (2) failure scenarios. The two most common ways to configure multiple data center clusters are:

- Two replicas in each data center. This configuration tolerates the failure of a single node per replication group and still allows reads at a consistency level of ONE.
- Three replicas in each data center. This configuration is used to serve real-time access needs.

In Cassandra, the terms data center and replication group are closely related: a replication group is a group of nodes configured together for replication purposes. With NetworkTopologyStrategy, replica placement is determined independently within each data center (or replication group). The first replica in each data center is placed according to the partitioner (as with SimpleStrategy). Additional replicas in the same data center are then determined by walking the ring clockwise until a node in a different rack from the previous replica is found. If there is no such node, additional replicas are placed in the same rack. NetworkTopologyStrategy prefers to place replicas on distinct racks where possible, because nodes in the same rack (or a similar physical grouping) can easily fail at the same time due to power, hardware, or network problems.

[Figure: NetworkTopologyStrategy placing replicas across two data centers with a total replication factor of 4 (two replicas in Data Center 1 and two replicas in Data Center 2).]
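The placement rule described above can be approximated as follows. This is a simplified sketch, not Cassandra's actual algorithm: it prefers racks that have not yet received a replica and falls back to already used racks only when no other rack is available. The function name, the ring layout, and the node names in the usage note are hypothetical.

```python
from bisect import bisect_left


def nts_replicas(ring, key_token, replicas_per_dc):
    """Choose replicas per data center, preferring distinct racks.

    `ring` is a list of (token, node, data_center, rack) tuples.
    """
    ordered = sorted(ring)
    tokens = [entry[0] for entry in ordered]
    start = bisect_left(tokens, key_token) % len(ordered)
    # Clockwise walk starting at the key's position on the ring.
    walk = ordered[start:] + ordered[:start]
    placement = {}
    for dc, wanted in replicas_per_dc.items():
        candidates = [entry for entry in walk if entry[2] == dc]
        chosen, used_racks = [], set()
        # First pass: take nodes clockwise, skipping racks already used.
        for _, node, _, rack in candidates:
            if len(chosen) < wanted and rack not in used_racks:
                chosen.append(node)
                used_racks.add(rack)
        # Second pass: no unused rack is left, so reuse racks if needed.
        for _, node, _, _ in candidates:
            if len(chosen) < wanted and node not in chosen:
                chosen.append(node)
        placement[dc] = chosen
    return placement
```

Calling it with `{"DC1": 2, "DC2": 2}` on a ring whose nodes alternate between two data centers reproduces the "two replicas per data center" layout with a total replication factor of 4.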


2.7 Snitches
A snitch is a configurable component of a Cassandra cluster that is used to determine how nodes are grouped together within the overall network topology (rack and data center groupings). Cassandra uses this information to route requests between nodes as efficiently as possible. The snitch does not affect requests between the client application and Cassandra (it does not control which node a client connects to).

Snitches are configured for a Cassandra cluster in the cassandra.yaml configuration file. All nodes in a cluster should use the same snitch configuration. When assigning tokens, assign them to alternating racks, for example: rack1, rack2, rack3, rack1, rack2, rack3...


Snitch types

SimpleSnitch

SimpleSnitch (the default) is appropriate if you have no rack or data center information available. Single data center deployments (or deployments in a single zone of a public cloud) usually fall into this category. If you use this snitch, specify replication_factor=<#> when defining your keyspace. This snitch does not recognize data center or rack information.

DseSimpleSnitch

DseSimpleSnitch is used in DataStax Enterprise (DSE) deployments. It suits configurations that coordinate Hadoop data analytics with real-time applications. It can be used for mixed DSE clusters located in one physical data center. It can also be used for multiple data center DSE clusters that have exactly two data centers, with all analytics nodes in one data center and all Cassandra real-time nodes in the other. If you use this snitch, use Analytics or Cassandra as the data center names when defining your keyspace.

RackInferringSnitch

RackInferringSnitch infers the network topology by analyzing the IP addresses of the nodes. This snitch assumes that the second octet identifies the data center in which a node is located, and that the third octet identifies the rack.
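The octet convention described above is easy to express in code. This is only an illustration of the rule with a hypothetical function name and a made-up IPv4 address; it is not the snitch's own implementation.

```python
def infer_location(ip_address: str) -> dict:
    """Derive data center and rack from the 2nd and 3rd octets of an IPv4 address."""
    octets = ip_address.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip_address}")
    return {"data_center": octets[1], "rack": octets[2]}
```

Two nodes at 110.100.200.105 and 110.100.200.106 would therefore be treated as being in the same data center (100) and the same rack (200).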

PropertyFileSnitch

PropertyFileSnitch determines the location of nodes from a user-defined description in the file cassandra-topology.properties. This snitch is the best choice when your node IP addresses are not uniform or when you have complex replication requirements. See "Configuring the PropertyFileSnitch" for more information. With this snitch you can give your data centers any names you want, as defined in cassandra-topology.properties.

EC2Snitch

EC2Snitch is for cluster deployments on Amazon EC2 in which the cluster's nodes are spread across several availability zones. Instead of using a node's IP address to infer its location, this snitch uses the AWS API to request the region and availability zone of a node. The region is treated as the data center, and the availability zones are treated as racks within the data center. For example, if a node is in us-east-1a, us-east is the data center name and 1a is the rack location.

Dynamic snitching

By default, all snitches also use a dynamic snitch layer that monitors read latency and, when possible, routes client requests away from poorly performing nodes. The dynamic snitch is enabled by default and is recommended for use in all deployments. Dynamic snitching is configured in the cassandra.yaml file on each node.
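The EC2Snitch naming rule given above (us-east-1a becomes data center us-east and rack 1a) can be sketched as a simple string split. The function name is hypothetical, and the sketch assumes the availability zone string has already been obtained; it does not call the AWS API.

```python
def ec2_location(availability_zone: str) -> dict:
    """Split an availability zone name into data center and rack parts."""
    # Everything before the last hyphen is the data center, the rest is the rack.
    region_prefix, _, zone_suffix = availability_zone.rpartition("-")
    if not region_prefix or not zone_suffix:
        raise ValueError(f"unexpected availability zone: {availability_zone}")
    return {"data_center": region_prefix, "rack": zone_suffix}
```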


2.8 Client requests in Cassandra


All nodes in Cassandra are peers. A client read or write request can go to any node in the cluster. When a client connects to a node and issues a read or write request, that node serves as the coordinator for that operation. The coordinator acts as a proxy between the client application and the nodes (or replicas) that own the requested data. The coordinator determines which nodes in the ring should receive the request based on the cluster's configured partitioner and replica placement strategy.

2.8.1 Write requests

For writes, the coordinator sends the write request to all replicas that own the row being written. As long as all replica nodes are up and available, they receive the write regardless of the consistency level specified by the client. The write consistency level determines how many replica nodes must respond for the write to be considered successful.

For example, in a single data center cluster of 10 nodes with a replication factor of 3, an incoming write goes to all 3 nodes that own the requested row. If the write consistency level specified by the client is ONE, the first node to complete the write responds to the coordinator, which then relays the success message back to the client. A consistency level of ONE means that 2 of the 3 replicas could miss the write if they happen to be down when the request is made. If a replica misses a write, the row is made consistent later through one of Cassandra's built-in repair mechanisms.
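The write path in the example above can be mimicked with a small function. It is a sketch under simplifying assumptions: replicas are either up or down, every live replica acknowledges the write, and the function name and the shape of the returned dictionary are invented for illustration.

```python
def coordinate_write(replica_is_up, consistency_level):
    """Send a write to every replica and decide whether the client sees success.

    `replica_is_up` maps each replica's name to True (up) or False (down).
    """
    replication_factor = len(replica_is_up)
    required = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }[consistency_level]
    # The write goes to all replicas; only the live ones can acknowledge it.
    acknowledged = [name for name, up in replica_is_up.items() if up]
    missed = [name for name, up in replica_is_up.items() if not up]
    return {
        "success": len(acknowledged) >= required,
        "acknowledged": acknowledged,
        "needs_repair": missed,  # brought up to date later by repair
    }
```

With one replica up and two down, a write at ONE still succeeds and the two missed replicas are left for a later repair, whereas the same write at QUORUM is reported as failed.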


Multi-data-center write requests

In multiple data center deployments, Cassandra optimizes write performance by choosing one coordinator node in each remote data center to handle the requests to the replicas within that data center. The coordinator node contacted by the client application only needs to forward the write request to one node in each remote data center. If the consistency level is ONE or LOCAL_QUORUM, only the nodes in the same data center as the coordinator must respond for the client request to succeed. In this way, geographical distance does not affect client request response times.


2.8.2 Read Requests

For reads, the coordinator can send two kinds of request to a replica: a direct read request and a background read repair request. The number of replicas contacted by a direct read request is determined by the consistency level specified by the client. Background read repair requests are sent to any additional replicas that did not receive a direct request. Read repair ensures that the requested row is made consistent on all replicas.

The coordinator therefore first contacts the replicas required by the consistency level, sending the requests to the replicas that are currently responding fastest. The contacted nodes respond with the requested data. If several nodes are contacted, the rows from each replica are compared in memory to check whether they are consistent. If they are not, the coordinator uses the replica with the most recent data (based on the timestamp) to return the result to the client.

To ensure that all replicas hold the most recent version of frequently read data, the coordinator also contacts and compares the data from the replicas that did not receive a direct read request, checking whether they are consistent and up to date. Any out-of-date replica is updated with the most recently written values. This process is called read repair. Read repair can be configured per column family (using read_repair_chance) and is enabled by default.

For example, in a cluster with a replication factor of 3 and a read consistency level of QUORUM, 2 of the 3 replicas are contacted to serve the direct read request. If the contacted replicas hold different versions of the row, the replica with the most recent version returns the requested data. The third replica is checked for consistency against the first two and, if necessary, the most recent replica issues a write to update the out-of-date versions.
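The read path described above can be sketched as a small simulation. It is a simplified model under stated assumptions: each replica holds a single `(timestamp, value)` pair, the function name `read_with_repair` is hypothetical, and the background repair runs synchronously rather than asynchronously.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, str]  # (timestamp, value) held by one replica


def read_with_repair(replicas: Dict[str, Version],
                     contacted: List[str]) -> Tuple[str, List[str]]:
    """Serve a read from the contacted replicas, then repair stale replicas.

    Returns the value sent to the client and the names of repaired replicas.
    """
    # Direct read: compare the contacted replicas and keep the newest version.
    newest_contacted = max(replicas[name] for name in contacted)

    # Read repair: compare all replicas and overwrite those that are behind.
    latest = max(replicas.values())
    stale = sorted(name for name, version in replicas.items()
                   if version[0] < latest[0])
    for name in stale:
        replicas[name] = latest

    return newest_contacted[1], stale
```

For three replicas where only one holds the newest write, a QUORUM read that contacts that replica returns the new value, and the two remaining replicas are brought up to date.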

2.9 Planning a Cassandra Cluster Deployment


When planning a Cassandra cluster deployment, you need a good idea of the initial volume of data to be stored and a good estimate of the application's workload.

2.9.1 Selecting Hardware for Enterprise Installations

As with any application, choosing suitable hardware depends on balancing the following resources: memory, CPU, disk, number of nodes, and network.

Memory

The more memory a Cassandra node has, the better its read performance. More RAM allows larger caches and reduces disk I/O for reads. More RAM also lets memtables hold more recently written data. Larger memtables mean fewer SSTables flushed to disk and fewer files to scan during a read. The ideal amount of RAM depends on the expected size of the hot data.

- For dedicated hardware, a minimum of 8 GB of RAM is needed. DataStax recommends 16 GB to 32 GB.
- The Java heap should be set to at most 8 GB or half of the total RAM, whichever is lower.
- For virtual environments such as Amazon EC2, use a minimum of 4 GB.

CPU

Cassandra is highly concurrent and uses as many CPU cores as are available.
- For dedicated hardware, 8-core processors are the sweet spot between performance and price.
- For virtual environments, consider a provider that allows CPU bursting, such as Rackspace Cloud Servers.

Disk

Cassandra writes data to disk for two purposes:
- All data is written to the commit log for durability.
- When thresholds are reached, Cassandra flushes in-memory data structures to SSTable data files for persistent storage.

The commit log receives every write made to a Cassandra node, but it is read only during node startup. Commit log segments are purged after the corresponding data has been flushed. In contrast, SSTable (data file) writes happen asynchronously, and SSTables are read during client lookups. In addition, SSTables are periodically compacted. Compaction improves performance by merging and rewriting data and discarding old data. During compaction (or node repair), however, disk utilization and the size of the data directory can grow substantially. For this reason, DataStax recommends leaving adequate free disk space on each node (50% [worst case] for tiered compaction, 10% for leveled compaction).

Number of Nodes

The amount of data on each disk in the array matters less than the total size per node. Using a larger number of smaller nodes is better than using fewer larger nodes, because larger nodes can become bottlenecks during compaction.

Network

Because Cassandra is a distributed data store, it puts load on the network to handle read and write requests and to replicate data across nodes. Make sure the network can carry this traffic without bottlenecks.
- Bandwidth should be 1000 Mbit/s or greater.
- Bind the Thrift interface (listen_address) to one NIC (network interface card).
- Bind the RPC server interface (rpc_address) to another NIC.

Cassandra is efficient at routing requests to the replicas that are geographically closest to the coordinator handling the request. It picks a replica in the same rack if possible, and prefers replicas in the same data center over replicas in a remote data center.

Cassandra uses the following ports. If a firewall is in use, make sure that the nodes in a cluster can reach each other on these ports.

    Port     Description

    Public-facing ports
    22       SSH (default)

    OpsCenter-specific ports
    8888     OpsCenter website port

    Intranode ports, Cassandra-specific
    1024+    JMX reconnection/loopback ports
    7000     Cassandra intra-node port
    7199     Cassandra JMX monitoring port
    9160     Cassandra client (Thrift) port

    Intranode ports, OpsCenter-specific
    50031    OpsCenter HTTP proxy for Job Tracker
    61620    OpsCenter intra-node monitoring port
    61621    OpsCenter agent port

Note: In general, when there are firewalls between machines, it is difficult to run JMX across the network and keep it secure. This is because JMX connects on port 7199, performs a handshake, and then uses any port in the 1024+ range. Instead, use SSH to run commands remotely and connect to JMX locally, or use DataStax OpsCenter.

2.9.2 Planning an Amazon EC2 Cluster

Cassandra clusters can be deployed on cloud infrastructure such as Amazon EC2. For Cassandra clusters on EC2, use Large or Extra Large instances with local storage. Set up RAID0 and put both the data directory and the commit log on the RAID0 volume. In practice this has proved better than keeping the commit log on the root volume (which is a shared resource). For data redundancy, consider deploying the cluster across multiple availability zones and using EBS volumes to store Cassandra backup files.

EBS volumes are not recommended for storing Cassandra data, because their network performance and disk I/O are a poor fit for Cassandra, for the following reasons:
- EBS volumes compete directly with standard packets for network throughput. EBS throughput is therefore likely to fail if a network link is saturated.
- EBS volumes have unreliable performance. I/O can be exceptionally slow, causing reads and writes to back up until the entire cluster becomes unresponsive.
- Adding capacity by increasing the number of EBS volumes per host does not scale.

DataStax provides an Amazon Machine Image (AMI) that lets you quickly deploy a multi-node Cassandra cluster on Amazon EC2. The DataStax AMI initializes all nodes in a single availability zone using the SimpleSnitch. If you want an EC2 cluster that spans multiple regions and availability zones, do not use the DataStax AMI. Instead, initialize an EC2 instance for each Cassandra node and then configure the cluster as a multi data center cluster.

Calculating Usable Disk Capacity

To calculate how much data the Cassandra nodes can hold, calculate the usable disk capacity per node and multiply it by the number of nodes in the cluster. Remember that in a cluster the commit log and the data directories are normally placed on different disks. This calculation estimates the usable capacity of the data volume.

Start with the raw capacity of the physical disks:

    raw_capacity = disk_size * number_of_disks

Account for file system formatting overhead (roughly 10%) and the RAID level in use. For example, with RAID-10 the calculation is:

    (raw_capacity * 0.9) / 2 = formatted_disk_space

During normal operation, Cassandra regularly needs disk capacity for compaction and repair. For optimal performance, DataStax recommends that you do not fill your disks completely but run at 50-80% of capacity. The usable disk space is therefore calculated as follows (the example uses 50%):

    formatted_disk_space * 0.5 = usable_disk_space

Calculating User Data Size

As with all data storage systems, raw data becomes larger once it is loaded into Cassandra because of storage overhead. On average, raw data is about 2 times larger on disk after it is loaded into the database, but it can be much smaller or larger depending on the characteristics of the data and the column families. The calculations in this section cover data persisted to disk, not data held in memory.

- Column overhead: Every column in Cassandra incurs 15 bytes of overhead. Because each row in a column family can have different column names and different column values, metadata is stored for every column. For counter columns and expiring columns, add a further 8 bytes. The total size of a regular column is therefore:

    total_column_size = column_name_size + column_value_size + 15

- Row overhead: Like columns, every row incurs overhead when stored on disk. Every row in Cassandra incurs 23 bytes of overhead.

- Primary key index: Every column family also maintains a primary index of its row keys. The cost of this index becomes significant when you have many skinny rows. The size of the primary key index can be estimated as follows (in bytes):

    primary_key_index = number_of_rows * (32 + average_key_size)

- Replication overhead: The replication factor plays an important role in how much disk capacity is used. With a replication factor of 1 there is no replication overhead. If the replication factor is greater than 1, the total storage requirement includes the replication overhead:

    replication_overhead = total_data_size * (replication_factor - 1)
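The sizing formulas above can be collected into a few small functions. This is a sketch that only restates the formulas from the text; the function and parameter names are chosen for illustration, and the defaults (RAID-10 divisor of 2, 50% fill ratio) are the example values used above, not recommendations for every deployment.

```python
def usable_disk_space(disk_size, number_of_disks, raid_divisor=2, fill_ratio=0.5):
    """Usable capacity per node, in the same unit as disk_size."""
    raw_capacity = disk_size * number_of_disks
    # About 10% is lost to file system formatting; RAID-10 halves what is left.
    formatted_disk_space = (raw_capacity * 0.9) / raid_divisor
    return formatted_disk_space * fill_ratio


def total_column_size(column_name_size, column_value_size, counter_or_expiring=False):
    """Size of one column on disk, in bytes (15 bytes overhead, plus 8 if special)."""
    extra = 8 if counter_or_expiring else 0
    return column_name_size + column_value_size + 15 + extra


def primary_key_index(number_of_rows, average_key_size):
    """Estimated size of the primary row key index, in bytes."""
    return number_of_rows * (32 + average_key_size)


def replication_overhead(total_data_size, replication_factor):
    """Extra storage needed for the additional replicas."""
    return total_data_size * (replication_factor - 1)
```

For instance, four 500 GB disks in RAID-10 filled to 50% give roughly `usable_disk_space(500, 4)` = 450 GB of usable space per node.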


2.9.3 Choosing Node Configuration Options

A major part of planning a Cassandra cluster deployment is understanding and setting the various node configuration properties. This section explains the configuration decisions that must be made before deploying a Cassandra cluster, whether it is a multi data center, multi-node, or single-node cluster. The properties mentioned in this section are set in the cassandra.yaml configuration file. Each node should be configured correctly before it is started for the first time.

Storage Settings

By default, a node is configured to store the data it manages in /var/lib/cassandra. In a cluster deployment, you should change commitlog_directory so that it is on a different disk device from data_file_directories.

Gossip Settings

The gossip settings control a node's participation in a cluster and how the node is known to the cluster.

    Property         Description

    cluster_name     Name of the cluster that the node joins. It is the same
                     for every node in the cluster.
    listen_address   The IP address or hostname that other Cassandra nodes
                     use to connect to this node. It should be changed from
                     localhost to the public address of the host.
    seeds            A list of IP addresses of the nodes used to bootstrap
                     the gossip process. Every node must have the same list
                     of seeds. In multi data center clusters, the seed list
                     includes nodes from each data center.
    storage_port     The intra-node communication port (default 7000).
    initial_token    Used to define the range of data that the node is
                     responsible for.
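The initial_token values are usually computed rather than picked by hand. The sketch below assumes the RandomPartitioner, whose token range runs from 0 to 2**127, and a single data center in which the range is divided evenly among the nodes; the function name `initial_tokens` is hypothetical.

```python
def initial_tokens(node_count):
    """Evenly spaced initial_token values for a single data center cluster."""
    ring_size = 2 ** 127  # upper bound of the RandomPartitioner token range
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    for node, token in enumerate(initial_tokens(4)):
        print(f"node {node}: initial_token = {token}")
```

Multi data center clusters need a different scheme so that each data center is balanced on its own; that case is not covered by this sketch.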


Purging Gossip State on a Node

Gossip information is also stored locally by each node so that it can be used immediately on the next restart without waiting for gossip. To clear the gossip history when a node restarts (for example, when node IP addresses have changed), add the following line to the cassandra-env.sh file. This file is located in /usr/share/cassandra or <install_location>/conf.

    -Dcassandra.load_ring_state=false

Partitioner Settings

When deploying a Cassandra cluster, you need to make sure that each node is responsible for a roughly equal amount of data. This is also known as load balancing. It is achieved by configuring the partitioner for each node and correctly assigning each node an initial_token value.

DataStax recommends using the RandomPartitioner (the default) for all cluster deployments. With this partitioner, each node in the cluster is assigned a token that represents a hash value in the range 0 to 2**127.

For clusters in which all nodes are in a single data center, you can calculate the tokens by dividing the range by the total number of nodes in the cluster. In multi data center deployments, tokens should be calculated so that each data center is load balanced on its own. See Calculating Tokens for the different approaches to generating tokens for nodes in single and multi data center clusters.

Snitch Settings

The snitch knows the location of the nodes within the network topology. This affects where replicas are placed and how requests are routed between replicas. The endpoint_snitch property configures the snitch for a node. All nodes should have the same snitch configuration.

For single data center (or single node) clusters, the default SimpleSnitch is sufficient. However, if you plan to expand the cluster later to multiple data centers and racks, you should choose a data center and rack aware snitch from the start.

Configuring the PropertyFileSnitch


The PropertyFileSnitch lets you define data center and rack names to be whatever you want. Using this snitch requires you to define the network details of every node in the cluster in the cassandra-topology.properties configuration file. This file sits in the same configuration directory as cassandra.yaml: /etc/cassandra/conf for packaged installations or <install_location>/conf for binary installations.

For example, suppose you have non-uniform IP addresses and two physical data centers with two racks each, plus a third logical data center for replicating analytics data:

    # Data Center One
    175.56.12.105=DC1:RAC1
    175.50.13.200=DC1:RAC1
    175.54.35.197=DC1:RAC1
    120.53.24.101=DC1:RAC2
    120.55.16.200=DC1:RAC2
    120.57.102.103=DC1:RAC2

    # Data Center Two
    110.56.12.120=DC2:RAC1
    110.50.13.201=DC2:RAC1
    110.54.35.184=DC2:RAC1
    50.33.23.120=DC2:RAC2
    50.45.14.220=DC2:RAC2
    50.17.10.203=DC2:RAC2

    # Analytics Replication Group
    172.106.12.120=DC3:RAC1
    172.106.12.121=DC3:RAC1
    172.106.12.122=DC3:RAC1

    # default for unknown nodes
    default=DC3:RAC1
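To make the meaning of the topology file concrete, the following sketch reads entries in the `address=DC:RACK` format shown above and resolves a node's location, falling back to the `default` entry for unknown nodes. It only illustrates the file format; the names `parse_topology` and `locate` are hypothetical and this is not how the snitch itself is implemented.

```python
def parse_topology(text):
    """Map each address (or 'default') to a (data_center, rack) pair."""
    topology = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        address, _, location = line.partition("=")
        data_center, _, rack = location.partition(":")
        topology[address.strip()] = (data_center.strip(), rack.strip())
    return topology


def locate(topology, address):
    """Return the (data_center, rack) of a node, using 'default' if unknown."""
    return topology.get(address, topology.get("default"))
```

With the sample file, `locate(topology, "175.56.12.105")` yields `("DC1", "RAC1")`, and any address not listed resolves to `("DC3", "RAC1")`.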


3. The Cassandra Data Model


The Cassandra data model is a dynamic-schema, column-oriented data model. Unlike a relational database, you do not need to model all of the columns that the application will need up front, and each row does not need to have the same set of columns. Columns and their metadata can be added by the application when they are needed, without incurring downtime.

3.1 Comparing the Cassandra Data Model to a Relational Database


The Cassandra data model is designed for distributed data on a very large scale. Although it is natural to compare Cassandra with a relational database, the two are quite different. In a relational database, data is stored in tables, and the tables that make up an application are usually related to each other. Data is normalized to reduce redundant entries, and tables are joined on keys to satisfy a given query.

For example, consider an application that lets users create blog entries. In this application, blog entries are categorized by subject (sports, fashion, and so on). Users can also subscribe to the blogs of other users. In this example, the user id is the primary key of the user table and a foreign key in the blog and subscriber tables. Similarly, the category id is the primary key of the category table and a foreign key in the blog_entry table. With the relational model, SQL queries join several tables to answer questions such as "which users subscribe to my blog", "show me all the blog entries about fashion", or "show me the most recent entries of the blogs I subscribe to".

In Cassandra, the keyspace is the container for all of the application's data, similar to a database or schema in a relational database. Inside the keyspace are one or more column family objects, which are analogous to tables. Column families contain columns, and a set of related columns is identified by a row key supplied by the application. The rows in a column family are not required to have the same set of columns.

Cassandra does not enforce relationships between column families the way a relational database does between tables: there are no foreign keys in Cassandra, and joining column families at query time is not supported. Each column family has a self-contained set of columns that are intended to be accessed together to satisfy specific queries from the application.

For example, using the blog application above, you might have one column family for users and one for blog entries, as in the relational model. Other column families can then be added to support the queries the application needs to run. To answer queries such as "which users subscribe to my blog", "show me all the blog entries about fashion", or "show me the most recent entries of the blogs I subscribe to", you may need to design additional column families. Note that this requires some denormalization of the data.

3.2 Keyspaces
In Cassandra, the keyspace is the container for your application's data, similar to a schema in a relational database. A keyspace is used to group column families together. Typically, a cluster has one keyspace per application.

Replication is controlled on a per-keyspace basis, so data with different replication requirements should be placed in different keyspaces. A keyspace is not designed to be used as a significant mapping layer within the data model; it is only a way to control data replication for a set of column families.

The data definition language (DDL) commands for defining and altering keyspaces are provided by various client interfaces, such as the Cassandra CLI and CQL. For example, to define a keyspace in CQL:

    CREATE KEYSPACE keyspace_name WITH strategy_class = 'SimpleStrategy'
        AND strategy_options:replication_factor=2;

Or in the Cassandra CLI:

    CREATE KEYSPACE keyspace_name WITH placement_strategy = 'SimpleStrategy'
        AND strategy_options = [{replication_factor:2}];

3.3 Column Families


When comparing Cassandra with a relational database, a column family is like a table in that it contains columns and rows. However, a column family requires a major shift in thinking for those who are used to the relational world.

In a relational database, you define tables with fixed columns. The table defines the column names and their data types, and the application then supplies rows that conform to that schema: each row contains the same fixed set of columns.

In Cassandra, you define column families. Column families can (and should) define metadata about the columns, but the actual columns that make up a row are determined by the application. Each row can have a different number of columns.

Although column families are very flexible, in practice a column family is not entirely schema-less. Each column family should be designed to hold a single type of data. There are two common column family design patterns in Cassandra: static and dynamic column families.

A static column family uses a relatively fixed set of column names and is more similar to a relational table. For example, a column family that stores user data might have columns for the user name, address, email, and phone number. Although the rows share the same set of columns, they are not required to have a value for every column. Static column families usually have metadata predefined for each column.

A dynamic column family takes advantage of Cassandra's ability to use arbitrary, application-supplied column names to store data. A dynamic column family lets you precompute result sets and store them in a single row for efficient retrieval. Each row is a snapshot of the data that satisfies a particular query. An example is a column family that tracks the users who subscribe to a given user's blog.

Instead of defining metadata for individual columns, a dynamic column family defines the type information for column names and values, while the actual names and values are set by the application when a column is inserted.

In all column families, each row is unique and is identified by its row key, much like the primary key of a relational table. A column family is always partitioned by its row key, and the row key is always indexed. Row keys cannot be empty.

3.4 Columns
The column is the smallest unit of data in Cassandra. It is a tuple consisting of a name, a value, and a timestamp.

A column must have a name. The name can be a static label (such as name or email) or it can be set dynamically when the column is created by the application. Columns can be indexed by name. One limitation of column indexes, however, is that they do not support queries that need access to ordered data, such as time series data. In that case a secondary index on a timestamp column is not sufficient, because you cannot control column sort order with a secondary index. Where sort order matters, manually maintaining a column family as an index is another way to look up column data in sorted order.

A column is not required to have a value. Sometimes all the information the application needs to satisfy a query can be stored in the column name itself. For example, if you use a column family as a materialized view to look up rows from other column families, all you need to store is the row key you are looking for; the value can be empty.

Cassandra uses the column timestamp to determine the most recent update to a column. The timestamp is supplied by the application. The most recent timestamp always wins when data is requested: if several sessions update the same column in a row at the same time, the most recent update is the one that persists.
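The "latest timestamp wins" rule can be illustrated with a small sketch in which a row is a plain dictionary from column name to a `(value, timestamp)` pair. The function name `upsert` is hypothetical, and resolving equal timestamps in favor of the later call is a simplification made for this example.

```python
def upsert(row, name, value, timestamp):
    """Insert or update a column, keeping whichever write has the newer timestamp."""
    current = row.get(name)
    if current is None or timestamp >= current[1]:
        row[name] = (value, timestamp)
    return row


if __name__ == "__main__":
    row = {}
    upsert(row, "email", "first@example.com", 100)
    upsert(row, "email", "late-arrival@example.com", 90)   # older write is ignored
    upsert(row, "email", "second@example.com", 120)        # newer write replaces it
    print(row)
```

A write that arrives late but carries an older timestamp does not overwrite newer data, which is why client clocks matter when several sessions update the same column.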

3.5 Special Columns (Counter, Expiring, Super)


3.5.1 Expiring Columns

A column can have an optional expiration period called TTL (time to live). Whenever a column is inserted, the client request can specify an optional TTL value, defined in seconds, for that column. TTL columns are marked as deleted after the requested time has elapsed. Once marked as deleted, they are removed automatically during the normal compaction and repair processes.

You can use the CLI or CQL to set the TTL of a column. To change the TTL of an expiring column, you must re-insert the column with the new TTL. In Cassandra, inserting a column is really an insert-or-update operation, depending on whether an earlier version of the column exists. This means that to update the TTL of a column whose value you do not know, you must read the column and then re-insert it with the new TTL value.

TTL columns have a precision of one second, as calculated on the server. A very small TTL therefore probably makes little sense. In addition, the clocks on the servers must be synchronized; otherwise precision may be reduced, because the expiration time is computed on the host that receives the initial insert but is later interpreted by other hosts in the cluster.

An expiring column has an extra 8 bytes of overhead in memory and on disk (to record the TTL and the expiration time) compared with a standard column.

3.5.2 Counter Columns

A counter is a special kind of column used to store a number that is incremented each time a particular event or process occurs. For example, you might use a counter column to count the number of times a page is viewed.

Counter column families must use CounterColumnType as the data type of the column value. This means that, for now, counter columns can only be stored in dedicated column families; mixing them with regular columns is expected to be allowed in the future.

A counter column differs from a regular column in that, once it has been defined, the application can only update its value by incrementing or decrementing it. To update a counter column, the application passes the column name and the increment (or decrement) value; no timestamp is required.

Internally, the structure of a counter column is somewhat more complex. Cassandra tracks the distributed state of the counter as well as a server-generated timestamp when a counter column is deleted. For this reason, it is important that all nodes in the cluster have their clocks synchronized using the network time protocol (NTP).

A counter column can be read or written at any consistency level. It is important to understand, however, that unlike regular columns, a write to a counter column requires a read in the background to ensure that the distributed counter values remain consistent across replicas. If you write at consistency level ONE, this implicit read does not affect write latency, so ONE is the most common consistency level used with counters.

3.5.3 Super Columns

A Cassandra column family can contain either regular columns or super columns, which adds another level of nesting to the regular column family structure. A super column consists of a (super) column name and an ordered map of sub-columns. A super column can specify a data type (comparator) for both the super column name and the sub-column names.

A super column is a way to group multiple columns based on a common lookup value. The main use of super columns is to denormalize multiple rows from other column families into a single row, allowing data to be retrieved as a materialized view. For example, suppose you want to create a materialized view of the blog entries of the bloggers that a user subscribes to.

One limitation of super columns is that all sub-columns of a super column must be deserialized in order to read a single sub-column value, and you cannot create secondary indexes on the sub-columns of a super column. Super columns are therefore best suited to cases where the number of sub-columns is relatively small.

3.6 Data Types (Comparators and Validators)


In a relational database, you must specify a data type for each column when you define a table. The data type constrains the values that can be inserted into that column. For example, if a column is defined with an integer data type, you are not allowed to insert character data into it. Column names in a relational database are usually fixed labels assigned when the table is created.

In Cassandra, the data type of a column value (or row key) is called a validator. The data type of a column name is called a comparator. You can define data types when you create the column family schema, but Cassandra does not require it. Internally, Cassandra stores column names and values as hex byte arrays (BytesType). This is the default encoding used if no data types are defined in the column family schema (or specified by the client request).

Cassandra ships with a number of data types that can be used as both validators (the data type of row keys and column values) and comparators (the data type of column names). One exception is CounterColumnType, which can only be used as a column value (not for column names or row keys).

3.6.1 Validators

For all column families, it is best practice to define a data type for the row key using the key_validation_class property.

For static column families, you should define each column and its data type when you define the column family, using the column_metadata property.

For dynamic column families (where column names are not known ahead of time), you should specify a default_validation_class instead of defining a data type for each column.

Key and column validators can be added or changed in a column family definition at any time. If you specify an invalid validator on a column family, client requests that rely on that metadata may be confused, and inserts or updates that do not conform to the specified validator are rejected.

3.6.2 Comparators

Within a row, columns are always stored in sorted order by column name. The comparator specifies the data type of the column name as well as the sort order in which columns are stored within a row. Unlike validators, the comparator cannot be changed after the column family has been defined, so it is an important consideration when defining a column family in Cassandra.

The column names of a static column family are usually strings, and the sort order of the columns is not important. For dynamic column families, sort order matters. For example, in a column family that stores time series data (where the column names are timestamps), having the data in sorted order is required for slicing result sets out of a row of columns.
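The effect of the comparator on column order can be shown with a small sketch. It assumes two simplified comparators: one that orders names as UTF-8 strings and one that orders them as numbers. The function name `sort_columns` and the use of string-encoded numbers are illustrative only, not Cassandra's storage format.

```python
def sort_columns(names, comparator):
    """Return column names in the order the given comparator would store them."""
    key_functions = {
        "UTF8Type": lambda name: name.encode("utf-8"),  # byte-wise string order
        "LongType": lambda name: int(name),             # numeric order
    }
    return sorted(names, key=key_functions[comparator])


if __name__ == "__main__":
    timestamps = ["10", "9", "100"]
    print(sort_columns(timestamps, "UTF8Type"))
    print(sort_columns(timestamps, "LongType"))
```

The same three names sort differently under the two comparators, which is why a time series column family needs a comparator that matches the way its column names should be sliced.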

3.7 Column Family Compression


Data compression can be configured on a per column family basis. Compression maximizes the storage capacity of Cassandra nodes by reducing the volume of data on disk. In addition to saving storage space, compression also reduces disk I/O, particularly for read-dominated workloads.

Besides reducing data size, compression usually improves both read and write performance. Cassandra can quickly find the location of rows in the SSTable index and decompresses only the relevant chunks of data. This means that compression improves read performance not only by allowing more data to fit in memory, but also benefits workloads in which the data set being read does not fit into memory.

Unlike in traditional databases, write performance is not negatively affected by compression in Cassandra. Writes to compressed tables have in practice shown an improvement of up to 10%. In a traditional relational database, a write requires overwriting existing data files on disk. The database must locate the relevant pages on disk, decompress them, overwrite the relevant data, and then compress them again (an expensive operation in both CPU and disk I/O).

Because Cassandra SSTable data files are immutable (they are not rewritten once they have been flushed to disk), no decompression is needed to process a write. SSTables are compressed only once, when they are written to disk.

Depending on the data characteristics of the column family, compression can bring the following benefits:
- A 2x-4x reduction in data size
- A 25-35% improvement in read performance
- A 5-10% improvement in write performance

3.7.1 When to Use Compression

Compression is best suited to column families with many rows in which each row has the same columns, or at least many columns in common. For example, a column family containing user data such as name and email is a good candidate for compression. The more similar the data is across rows, the higher the compression ratio and the greater the gain in read performance.

Compression is not as effective for column families in which each row has a different set of columns, or in which there are only a few very wide rows. Dynamic column families of this kind do not achieve a good compression ratio.

3.7.2 Configuring Compression on a Column Family

When you create or update a column family, you can make it a compressed column family by setting the compression_options property. You can enable compression when you create a new column family, or update an existing column family to add compression later. When you add compression to an existing column family, the SSTables already on disk are not compressed immediately. Any new SSTables that are created will be compressed, and existing SSTables will be compressed during the normal Cassandra compaction process. If necessary, you can force existing SSTables to be rewritten and compressed using nodetool.

For example, to create a new column family with compression enabled using the Cassandra CLI:

    [default@demo] CREATE COLUMN FAMILY users
    WITH key_validation_class=UTF8Type
    AND column_metadata = [
    {column_name: name, validation_class: UTF8Type}
    {column_name: email, validation_class: UTF8Type}
    {column_name: state, validation_class: UTF8Type}
    {column_name: gender, validation_class: UTF8Type}
    {column_name: birth_year, validation_class: LongType}
    ]
    AND compression_options={sstable_compression:SnappyCompressor, chunk_length_kb:64};

3.8 Indexes in Cassandra


An index is a data structure that allows fast, efficient lookup of data matching a given condition.

3.8.1 Primary Indexes

In relational database design, a primary key is the unique key used to identify each row in a table. A primary key index, like any index, speeds up random access to the data in the table. The primary key also guarantees the uniqueness of records, and may control the order in which records are physically clustered or stored in the database.

In Cassandra, the primary index of a column family is the index of its row keys. Each node maintains this index for the data it manages.

Rows are assigned to nodes by the partitioner configured for the cluster and the replica placement strategy configured for the keyspace. The primary index in Cassandra allows rows to be looked up by row key. Because each node knows the range of keys it manages, requested rows can be located efficiently by scanning the row indexes only on the relevant replicas.

With randomly partitioned row keys (the default in Cassandra), row keys are partitioned by their MD5 hash and cannot be scanned in order as in a traditional B-tree index. Using an ordered partitioner does allow range queries over rows, but it is not recommended because of the difficulty of keeping data evenly distributed across nodes.
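The way a row key is mapped to a node under random partitioning can be sketched as follows. This is a simplified model, not the RandomPartitioner's exact arithmetic: the token is taken as the MD5 digest reduced modulo 2**127, each node is assumed to own the range ending at its own token, and the names `token_for` and `owner` are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token range assumed for the RandomPartitioner


def token_for(row_key):
    """Hash a row key to a token on the ring (simplified MD5-based mapping)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner(row_key, node_tokens):
    """Find the node responsible for a row key.

    node_tokens is a list of (token, node_name) pairs sorted by token. The
    owner is the first node whose token is >= the key's token, wrapping
    around to the first node at the end of the ring.
    """
    tokens = [token for token, _ in node_tokens]
    index = bisect.bisect_left(tokens, token_for(row_key)) % len(node_tokens)
    return node_tokens[index][1]
```

Because neighbouring keys hash to unrelated tokens, rows are spread evenly over the nodes, but a scan in key order is no longer possible, as noted above.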

3.8.2 Secondary Indexes

Secondary indexes in Cassandra are indexes on column values (as distinct from the primary row key index of a column family). Cassandra supports secondary indexes of type KEYS (similar to a hash index).

Secondary indexes allow efficient querying by specific values using equality predicates (where column x = value y). Queries on indexed values can also apply additional filters to the result set on the values of other columns.

Cassandra's secondary indexes work best when many rows contain the indexed value. For example, suppose you have a user table with billions of users and want to look up users by the state they live in. Many users share the same value for the state column (such as CA, NY, TX). This is a good candidate for a secondary index. On the other hand, if you want to look up users by email address (a value that is usually unique for each user), it may be more efficient to manually maintain a dynamic column family as a form of index. Even for columns containing unique data, using secondary indexes can still be reasonable, as long as the query volume on the indexed column family is moderate and not under constant load.

Another advantage of secondary indexes is the operational ease of maintaining the index. When you create a secondary index on a column, it indexes the existing data in the background. Column families maintained by the client as indexes must be created manually; for example, if the state column were indexed by creating a column family such as users_by_state, the client application would have to populate that column family with data from the users column family.

3.8.3 Building and Using Secondary Indexes

You can specify the KEYS index type when you create a column definition, or add it later to index an existing column. Secondary indexes are built automatically by a background process without blocking reads or writes.

For example, in the Cassandra CLI you can create a secondary index on a column when defining a column family (note the index_type: KEYS specification for the state and birth_year columns):

    [default@demo] create column family users with comparator=UTF8Type
    ... and column_metadata=[{column_name: full_name, validation_class: UTF8Type},
    ... {column_name: email, validation_class: UTF8Type},
    ... {column_name: birth_year, validation_class: LongType, index_type: KEYS},
    ... {column_name: state, validation_class: UTF8Type, index_type: KEYS}];

Or you can add an index to an existing column family:

    [default@demo] update column family users with comparator=UTF8Type
    ... and column_metadata=[{column_name: full_name, validation_class: UTF8Type},
    ... {column_name: email, validation_class: UTF8Type},
    ... {column_name: birth_year, validation_class: LongType, index_type: KEYS},
    ... {column_name: state, validation_class: UTF8Type, index_type: KEYS}];

Because a secondary index has been created on the state column, its values can be queried directly to find the users who live in a given state, for example:

    [default@demo] get users where state = 'TX';
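For comparison, the client-maintained alternative mentioned in section 3.8.2 (a column family such as users_by_state) can be sketched in memory. The sketch models column families as nested dictionaries and assumes a hypothetical helper `build_index`; it shows only the idea of mapping an indexed value to the row keys that contain it, not how Cassandra stores a KEYS index.

```python
from collections import defaultdict


def build_index(rows, column):
    """Map each value of the given column to the set of row keys that hold it."""
    index = defaultdict(set)
    for row_key, columns in rows.items():
        if column in columns:
            index[columns[column]].add(row_key)
    return index


if __name__ == "__main__":
    users = {
        "jsmith": {"full_name": "John Smith", "state": "TX"},
        "bdoe": {"full_name": "Beth Doe", "state": "TX"},
        "kli": {"full_name": "Kim Li", "state": "CA"},
    }
    users_by_state = build_index(users, "state")
    print(sorted(users_by_state["TX"]))
```

With a hand-built index like this, the application is responsible for updating `users_by_state` on every write to `users`, which is the maintenance burden the built-in secondary index removes.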

3.9 Designing a Data Model


Designing a data model in Cassandra involves different design considerations from those of a relational database. Ultimately, the data model you design depends on the data you want to store and how you plan to access it. There are, however, some general design considerations for planning a Cassandra data model.

3.9.1 Start with the Queries

The best way to approach data modeling for Cassandra is to start with the queries and work backwards from there. Think about the actions your application needs to perform and how you want to access the data, and then design column families to support those access patterns.

For example, start by listing all the use cases the application needs to support. Think about the data you want to store and the lookups the application needs to do. Note any requirements for ordering, filtering, and grouping data. For example, if you need events in chronological order, or you only care about the last 6 months of data, these should be factors in the design of the Cassandra data model.

3.9.2 Denormalize to Optimize

In the relational world, the data model is usually designed with the goal of normalizing the data to minimize redundancy. Normalization typically involves creating smaller, tightly structured tables and then defining relationships between them. At query time, the related tables are joined to satisfy the request.

Cassandra has no foreign key relationships like a relational database, which means you cannot join multiple column families to satisfy a given query. Cassandra performs best when the data needed to satisfy a query is located in a single column family. Try to design the data model so that one or a few rows in a single column family are used to answer each query. This sacrifices disk space (one of the cheapest resources on a server) in order to reduce the number of disk seeks and the amount of network traffic.

3.9.3 Planning for Concurrent Writes

Within a column family, every row is identified by its row key, a string of virtually unlimited length. The key has no required form, but it must be unique within the column family. Unlike a primary key in a relational database, Cassandra does not enforce uniqueness. Inserting a duplicate row key upserts (a combination of insert and update) the columns in the insert statement rather than returning a constraint violation error.

3.9.4 Using Natural or Surrogate Row Keys

One consideration is whether to use natural or surrogate keys for a column family. A surrogate key is a generated key (such as a UUID) that uniquely identifies a row but has no relation to the actual data in the row.

For some column families, the data may contain values that are guaranteed to be unique and that are not usually updated after the row is created, for example the username in a user column family. This is called a natural key. Natural keys make the data more readable and remove the need for additional indexes or denormalization. However, unless your application ensures uniqueness, there is a risk of overwriting column data.

Natural keys also do not allow the row key to be updated easily. For example, if the row key is an email address and a user wants to change that address, you may have to create a new row with the new email address and copy all of the existing columns from the old row to the new row.

3.9.5 UUID Types for Column Names

The UUID comparator types (universally unique identifiers) are used to avoid collisions in column names. For example, if you identify a column (such as a blog entry or a tweet) by its timestamp, several clients writing to the same row key at the same time can cause a timestamp collision, with the risk that data is unintentionally overwritten. Using a UUIDType that represents a time-based UUID can avoid such collisions.
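Time-based UUIDs can be generated with Python's standard `uuid` module, as in the sketch below. The helper names `new_column_name` and `columns_in_time_order` are hypothetical; the sketch only shows that two columns created in quick succession receive distinct names that still carry a timestamp, and it does not involve a Cassandra client.

```python
import uuid


def new_column_name():
    """Generate a version 1 (time-based) UUID to use as a column name."""
    return uuid.uuid1()


def columns_in_time_order(row):
    """Return the column names of a row ordered by their embedded timestamp."""
    return sorted(row, key=lambda name: name.time)


if __name__ == "__main__":
    row = {}
    row[new_column_name()] = "first blog entry"
    row[new_column_name()] = "second blog entry"
    for name in columns_in_time_order(row):
        print(name, row[name])
```

Because each generated name is unique, two clients writing entries at nearly the same moment add two separate columns instead of overwriting each other.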

CONCLUSION
In this report, our group has given an overview of the Cassandra distributed database system. Cassandra, a distributed database management system originally developed by Facebook, was released as open source in July 2008 and then officially joined the Apache family. At the end of February 2010, Cassandra became an Apache Top-Level Project (TLP). Although it is still fairly new to the user community, Cassandra's technology has been widely adopted by companies and organizations such as Cisco, Twitter, and Digg. Cassandra opens the way for the next generation of databases: non-relational, distributed, open source, and horizontally scalable, able to store and process anything from very small amounts up to petabytes of data in systems with high load capacity and fault tolerance and low hardware requirements.

The report introduces the Cassandra distributed database system, its architecture, and its data model. The subject is relatively new and most of the material is drawn from foreign-language documentation, so shortcomings are unavoidable. We look forward to comments from our teachers and classmates so that our knowledge can become more complete. Thank you very much.

REFERENCES


1. Apache Cassandra Documentation (http://www.datastax.com/docsource/pdf/cassandra10.pdf)
2. Cassandra: The Definitive Guide. Eben Hewitt.
3. Cassandra 2.0 Tutorial V1.0. Sébastien Jourdain, Fatiha Zeghir. 2005/06/01.
4. Cassandra: An Introduction. Mikio L. Braun, Leo Jugel. TU Berlin, twimpact. LinuxTag Berlin, 13 May 2011.
