
From the outside, files stored on HDFS look just like files stored
on Windows or Linux: we can create, delete, move, and rename them.
In practice, however, the data is split into blocks that are stored
across many DataNodes. Each block has several replicas (3 by default)
kept on different DataNodes, so that the system keeps working
normally if one DataNode fails. In addition, there is one (and only
one) NameNode, which manages the data and coordinates the commands
that operate on it.
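
The block-and-replica idea can be sketched in a few lines of Python. This is only an illustration: the round-robin placement, the 128 MB block size, and the DataNode names are assumptions made for the example, not how HDFS actually chooses replica locations.

```python
import math

def split_into_blocks(file_size, block_size):
    """Return the number of blocks needed to hold file_size bytes."""
    return math.ceil(file_size / block_size)

def place_replicas(num_blocks, datanodes, replication=3):
    """Assign each block to `replication` distinct DataNodes (round-robin).

    Round-robin is a simplification chosen for this sketch only.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            datanodes[(block + i) % len(datanodes)] for i in range(replication)
        ]
    return placement

MB = 1024 * 1024
datanodes = ["dn1", "dn2", "dn3", "dn4"]           # hypothetical node names
num_blocks = split_into_blocks(300 * MB, 128 * MB)  # assumed sizes
placement = place_replicas(num_blocks, datanodes)

# Simulate the failure of one DataNode: every block still has live replicas.
failed = "dn2"
survivors = {
    block: [node for node in nodes if node != failed]
    for block, nodes in placement.items()
}
print(placement)
print(survivors)
```

Because every block lives on three different nodes, losing any single node still leaves at least two copies of each block.
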
MapReduce, for its part, makes parallel processing convenient. A job
has at least three parts: a Map function that breaks the data down
into (key, value) pairs; a Reduce function that groups those pairs by
key and produces the result; and a Main function that coordinates the
two. Each Map or Reduce operation runs as a task on a TaskTracker.
TaskTrackers normally run on the DataNodes to reduce network traffic.
The JobTracker uses the block information to start tasks on suitable
DataNodes. The JobTracker does not have to run on the same machine as
the NameNode.
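
As a rough sketch of this locality idea, the function below prefers a node that already stores a block and falls back to any free node otherwise. The function name, the slot model, and the node names are all hypothetical; the real JobTracker scheduling logic is more involved.

```python
def schedule_tasks(block_locations, free_slots):
    """Assign each block's map task to a node holding the block when possible."""
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignments = {}
    for block, holders in block_locations.items():
        # Prefer nodes that already store the block (no network transfer).
        local = [node for node in holders if slots.get(node, 0) > 0]
        # Otherwise fall back to any node that still has a free slot.
        candidates = local or [node for node, free in slots.items() if free > 0]
        if not candidates:
            raise RuntimeError("no free task slots left")
        node = candidates[0]
        slots[node] -= 1
        assignments[block] = node
    return assignments

# Hypothetical cluster state: which nodes hold which block, one slot per node.
block_locations = {
    "b0": ["dn1", "dn2"],
    "b1": ["dn1", "dn3"],
    "b2": ["dn1", "dn2"],
}
free_slots = {"dn1": 1, "dn2": 1, "dn3": 1, "dn4": 1}
assignments = schedule_tasks(block_locations, free_slots)
print(assignments)
```

In this example every task lands on a node that already holds its block, so no block has to cross the network before being processed.
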
Consider a simple example: counting how often each word occurs in the
text hello world, hello hadoop. One TaskTracker maps the fragment
hello world and emits (hello, 1)(world, 1). Another TaskTracker maps
the fragment hello hadoop and emits (hello, 1)(hadoop, 1). A third
TaskTracker then reduces these pairs and produces the result
(hello, 2)(world, 1)(hadoop, 1).
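
The same word count can be reproduced in plain Python, without Hadoop, to make the two steps concrete. This is a single-process sketch: the function names are chosen for the example, and on a real cluster each fragment would be mapped by a separate task.

```python
from collections import defaultdict

def map_words(fragment):
    """Map step: emit a (word, 1) pair for every word in the fragment."""
    return [(word, 1) for word in fragment.split()]

def reduce_counts(pairs):
    """Reduce step: group the pairs by key and sum the values per key."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

fragments = ["hello world", "hello hadoop"]

# Map each fragment independently, then collect all emitted pairs.
mapped = [pair for fragment in fragments for pair in map_words(fragment)]

result = reduce_counts(mapped)
print(result)  # {'hello': 2, 'world': 1, 'hadoop': 1}
```
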

The figure below illustrates an example with weather data. Given an
archive of temperature readings taken at various times over several
years, we want to find the highest temperature for each year:

In the figure above, the raw data (logs recorded by the devices) is
parsed and converted into intermediate data; the map and reduce phases
then do their work to produce the required result. In practice, when
the processing requirements are complex, map and reduce may be invoked
many times until the required result is obtained.
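
A minimal Python sketch of the weather example follows. The record format (`YYYY-MM-DD,temperature`) and the sample readings are invented for illustration, since the original log layout is not given here.

```python
def map_record(line):
    """Map step: parse one raw log line into a (year, temperature) pair."""
    date, temperature = line.strip().split(",")
    return date[:4], float(temperature)

def reduce_max(pairs):
    """Reduce step: keep the highest temperature seen for each year."""
    highest = {}
    for year, temperature in pairs:
        if year not in highest or temperature > highest[year]:
            highest[year] = temperature
    return highest

# Made-up sample readings, in the assumed "date,temperature" format.
raw_log = [
    "1949-07-01,31.5",
    "1949-12-15,-2.0",
    "1950-06-30,29.0",
    "1950-08-11,33.2",
]

intermediate = [map_record(line) for line in raw_log]  # intermediate data
max_by_year = reduce_max(intermediate)
print(max_by_year)  # {'1949': 31.5, '1950': 33.2}
```

The list of (year, temperature) pairs plays the role of the intermediate data: the map step only parses, and the reduce step only compares values that share a key.
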
