Dr. Hồ Bảo Quốc
1011036
Contents
Introduction
Practical needs
What is Hadoop?
Development history
Practical needs
The nodes are PCs
What is Hadoop?
[...] distributed, with very large data.
Scale: terabytes of data, thousands of nodes.
Components:
Storage: HDFS (Hadoop Distributed Filesystem)
Processing: MapReduce
Supports the Map/Reduce programming model
Hadoop Common
HDFS?
How data is stored and how errors are repaired
[Figure labels: application; file system; physical hard disk]
[Figure labels: three file systems, each with its own physical hard disk]
Goals of HDFS
Weaknesses of HDFS
Concepts
Block: the smallest unit of data storage
Hadoop uses 64 MB per block by default
A file is split into multiple blocks
NameNode
Manages information about all files in the cluster
DataNode
Manages the data blocks
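To make the block concept concrete, here is a minimal sketch of how a file size maps onto fixed-size blocks. The 64 MB figure is the default quoted above; the function name `split_into_blocks` and the 200 MB example file are illustrative assumptions, not part of HDFS.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default block size quoted above


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return an (offset, length) pair for every block of a file.

    Hypothetical helper for illustration only; real HDFS clients do this
    internally while streaming data.
    """
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


blocks = split_into_blocks(200 * 1024 * 1024)   # a 200 MB file
print(len(blocks))                              # 4
print(blocks[-1][1] // (1024 * 1024))           # 8 (the last block holds 8 MB)
```

Only the final block is allowed to be smaller than the block size, so a 200 MB file occupies three full blocks and one 8 MB block.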
NameNode
Manages the locations of the blocks
DataNode
Manages the blocks
Data replication:
Each file has multiple replicas, which means multiple replicas of each block
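The idea that every block is stored several times can be sketched as a mapping from block to a list of DataNodes. This is a simplified round-robin placement under stated assumptions: the replication factor of 3, the node names, and the function `place_replicas` are illustrative, and real HDFS uses a rack-aware policy that is not modelled here.

```python
def place_replicas(block_ids, datanodes, replication=3):
    """Assign each block to `replication` distinct DataNodes, round-robin.

    Hypothetical placement rule for illustration; it is not the HDFS policy.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    placement = {}
    for i, block_id in enumerate(block_ids):
        # Consecutive nodes starting at position i, wrapping around the list.
        placement[block_id] = [
            datanodes[(i + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


placement = place_replicas(["blk_0", "blk_1"], ["dn1", "dn2", "dn3", "dn4"])
print(placement["blk_0"])   # ['dn1', 'dn2', 'dn3']
print(placement["blk_1"])   # ['dn2', 'dn3', 'dn4']
```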
Extremely important
Robustness of HDFS
Three main types of failure:
NameNode failure
DataNode failure
Network partition
Cluster rebalancing
Move blocks to another DataNode when free space falls below the configured threshold
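The rebalancing rule can be illustrated with a small greedy sketch. Everything here is an assumption made for the example: the uniform `capacity`, the `min_free` threshold, the choice of the emptiest node as target, and the function name `rebalance`. It is not the algorithm used by the HDFS balancer.

```python
def rebalance(nodes, capacity, min_free):
    """Move blocks off DataNodes whose free space is below `min_free`.

    nodes: dict mapping a DataNode name to a list of block sizes.
    Returns the list of (source, target, block_size) moves performed.
    """
    def free(name):
        return capacity - sum(nodes[name])

    moves = []
    for src in sorted(nodes):
        while free(src) < min_free and nodes[src]:
            others = [n for n in nodes if n != src]
            if not others:
                break
            dst = max(others, key=free)      # node with the most free space
            block = nodes[src][-1]
            # Stop if the move would push the target below the threshold too.
            if free(dst) - block < min_free:
                break
            nodes[src].pop()
            nodes[dst].append(block)
            moves.append((src, dst, block))
    return moves


cluster = {"dn1": [64, 64, 64], "dn2": [64], "dn3": []}
print(rebalance(cluster, capacity=200, min_free=50))   # [('dn1', 'dn3', 64)]
```

In the example, dn1 has only 8 units free, so one block moves to the empty dn3, after which every node is above the threshold.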
NameNode failure
Can render the HDFS system unusable
Keep copies of the FsImage and the EditLog
When the NameNode restarts, the system uses the most recent copy.
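Recovery from the most recent copy can be pictured as follows: choose the newest saved copy, load its FsImage (a snapshot of the namespace), and replay its EditLog (the changes recorded since that snapshot). The record layout used here (`txid`, `fsimage`, `editlog`, and the `create`/`delete` operations) is a hypothetical simplification, not the real on-disk format.

```python
def restore_namespace(copies):
    """Rebuild the file namespace from the most recent saved copy.

    copies: list of dicts with keys "txid" (int), "fsimage" (dict mapping
    path -> block list) and "editlog" (list of (op, path, blocks) tuples).
    """
    latest = max(copies, key=lambda c: c["txid"])   # most recent copy wins
    namespace = dict(latest["fsimage"])             # start from the snapshot
    for op, path, blocks in latest["editlog"]:      # replay logged changes
        if op == "create":
            namespace[path] = blocks
        elif op == "delete":
            namespace.pop(path, None)
    return namespace


copies = [
    {"txid": 5, "fsimage": {"/a": ["b1"]}, "editlog": []},
    {"txid": 9, "fsimage": {"/a": ["b1"]},
     "editlog": [("create", "/b", ["b2"]), ("delete", "/a", None)]},
]
print(restore_namespace(copies))   # {'/b': ['b2']}
```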
How it works
Reading data:
The client program sends its read request to the NameNode
[...] node.
How it works (cont.)
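How it works (cont.)
Writing data:
Data is written through a pipeline
The client program sends the write request to the NameNode
The NameNode checks write permission and makes sure the file does not already exist
The replicas of a block form a pipeline, and the data is written into them one after another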
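The write steps above can be sketched in memory as follows. The class and function names, the permission model (a set of allowed users), the round-robin choice of pipeline, and the `read_file` helper that reads from the first replica are all assumptions made for illustration; none of this is the HDFS client API.

```python
class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}            # block_id -> bytes

    def write(self, block_id, data, downstream):
        # Store the block locally, then forward it along the pipeline.
        self.blocks[block_id] = data
        if downstream:
            downstream[0].write(block_id, data, downstream[1:])


class NameNode:
    def __init__(self, datanodes, writers, replication=3):
        self.datanodes = datanodes  # list of DataNode objects
        self.writers = writers      # users allowed to write (assumed model)
        self.replication = replication
        self.files = {}             # path -> list of (block_id, pipeline)

    def create(self, path, user):
        if user not in self.writers:
            raise PermissionError(user)
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = []

    def allocate_block(self, path):
        index = len(self.files[path])
        n = len(self.datanodes)
        pipeline = [self.datanodes[(index + r) % n]
                    for r in range(self.replication)]
        block_id = f"{path}#{index}"
        self.files[path].append((block_id, pipeline))
        return block_id, pipeline


def write_file(namenode, path, user, data, block_size):
    namenode.create(path, user)     # permission and existence checks
    for start in range(0, len(data), block_size):
        block_id, pipeline = namenode.allocate_block(path)
        chunk = data[start:start + block_size]
        pipeline[0].write(block_id, chunk, pipeline[1:])


def read_file(namenode, path):
    # Look up block locations, then read each block from its first replica.
    return b"".join(pipeline[0].blocks[block_id]
                    for block_id, pipeline in namenode.files[path])


nodes = [DataNode(f"dn{i}") for i in range(3)]
namenode = NameNode(nodes, writers={"alice"})
write_file(namenode, "/a.txt", "alice", b"hello world", block_size=4)
print(read_file(namenode, "/a.txt"))   # b'hello world'
```

The client only hands each block to the first DataNode; every node forwards the data to the next one, which is what makes the write a pipeline rather than three separate uploads.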
How it works (cont.)
Map Reduce
What is Map Reduce?
Execution
Demo
Processing data at large scale
Want to use 1,000 CPUs
Want a simple management model
What is Map Reduce?
Programming model
MapReduce is built on the functional programming and parallel programming models
What is Map Reduce?
Reading large data
What is Map Reduce?
(input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2>
-> reduce -> <k3, v3> (output)
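The data flow above can be sketched as a small single-process driver. The function `run_mapreduce`, the notion of a "split" as a list of records handled by one map task, and the city/temperature sample data are assumptions made for illustration; a real cluster distributes these stages across machines.

```python
from collections import defaultdict


def run_mapreduce(splits, map_fn, reduce_fn, combine_fn=None):
    """Run map -> (combine) -> reduce over a list of input splits.

    splits:     list of lists of (k1, v1) records, one list per map task
    map_fn:     (k1, v1) -> iterable of (k2, v2)
    combine_fn: (k2, [v2, ...]) -> (k2, v2), applied per map task
    reduce_fn:  (k2, [v2, ...]) -> (k3, v3)
    """
    shuffled = defaultdict(list)                  # k2 -> values from all tasks
    for split in splits:
        local = defaultdict(list)                 # output of this map task
        for k1, v1 in split:
            for k2, v2 in map_fn(k1, v1):
                local[k2].append(v2)
        for k2, values in local.items():
            if combine_fn is not None:
                key, value = combine_fn(k2, values)
                shuffled[key].append(value)
            else:
                shuffled[k2].extend(values)
    return [reduce_fn(k2, shuffled[k2]) for k2 in sorted(shuffled)]


def parse_reading(_, line):
    city, temp = line.split(",")
    yield city, int(temp)


def max_temp(city, temps):
    return city, max(temps)


splits = [[(0, "hanoi,31"), (1, "hue,29")], [(2, "hanoi,35")]]
print(run_mapreduce(splits, parse_reading, max_temp, combine_fn=max_temp))
# [('hanoi', 35), ('hue', 29)]
```

Taking a maximum is safe to use as both combiner and reducer because applying it to partial results gives the same answer as applying it to all values at once.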
Map function
Each element of the input data is passed to the Map function as a <key,value> pair
The Map function emits one or more <key,value> pairs
Reduce function
Combines, processes, and transforms the values
The output is a processed <key,value> pair
Word count example
Mapper
Input: one line of text
Output: key: word, value: 1
Reducer
Input: key: word, values: the collection of counts for that word
Output: key: word, value: total
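The mapper and reducer described above can be written out as a minimal single-process sketch. The function names and the in-memory grouping step in `word_count` are assumptions standing in for the framework's shuffle; this is not the Hadoop API.

```python
from collections import defaultdict


def mapper(line):
    """Input: one line of text. Output: a (word, 1) pair per word."""
    for word in line.split():
        yield word, 1


def reducer(word, counts):
    """Input: a word and all of its counts. Output: (word, total)."""
    return word, sum(counts)


def word_count(lines):
    grouped = defaultdict(list)          # stands in for the shuffle phase
    for line in lines:
        for word, one in mapper(line):
            grouped[word].append(one)
    return dict(reducer(word, counts) for word, counts in grouped.items())


print(word_count(["hadoop map reduce", "map reduce map"]))
# {'hadoop': 1, 'map': 3, 'reduce': 2}
```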
All values are processed independently
MR execution
Execution (step 1)
[Figure labels: User Program; Input Data; Shard 0 to Shard 6]
Execution (step 2)
Map Reduce copies this program onto [...]
[Figure labels: User Program; Master; Workers]
Execution (step 3)
[Figure labels: Master; Idle Worker]
Execution (step 4)
[Figure labels: Shard 0; Map worker; Key/value pairs]
Execution (step 5)
[Figure labels: Master; Map worker; Local Storage]
Execution (step 6)
[Figure labels: Master; Disk locations; Reduce worker; remote Storage]
Execution (step 7)
[Figure labels: Sorts data; Partition; Output file; Reduce worker]
Execution (step 8)
[Figure labels: wakeup; User Program; Output files]
Is a framework
Uses HDFS
          DFS         MapReduce
Master    NameNode    JobTracker
Slave     DataNode    TaskTracker
Job Submission
Request an ID for the new job (1)
Check the input and output directories
Job initialization
Task assignment
Task execution
Job completion
[...] the final task
Fault tolerance
Task failure
The task throws an exception, is killed by the JVM, or hangs
The JobTracker hands the task to another TaskTracker, up to a set limit
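Re-running a failed task elsewhere, within a limit, can be sketched as a retry loop. The limit of 4 attempts, the tracker names, and the function `run_with_retries` are assumptions for illustration; only the exception case is modelled, not JVM kills or hangs.

```python
def run_with_retries(task, trackers, max_attempts=4):
    """Run `task` on successive trackers until it succeeds or the limit is hit.

    task:     callable taking a tracker name and returning a result
    trackers: list of tracker names to rotate through
    """
    errors = []
    for attempt in range(max_attempts):
        tracker = trackers[attempt % len(trackers)]
        try:
            return tracker, task(tracker)
        except Exception as exc:          # the task attempt failed
            errors.append((tracker, repr(exc)))
    raise RuntimeError(f"task failed after {max_attempts} attempts: {errors}")


def flaky_task(tracker):
    # Hypothetical task that always fails on tt1.
    if tracker == "tt1":
        raise ValueError("bad record")
    return "done"


print(run_with_retries(flaky_task, ["tt1", "tt2"]))   # ('tt2', 'done')
```

Once the attempt limit is exhausted the error is raised to the caller, which corresponds to the job being marked as failed.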
Fault tolerance
TaskTracker failure
The TaskTracker crashes, runs slowly, or does not send its reports to the JobTracker on time
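Detecting a TaskTracker that stops reporting can be sketched with a timestamp per tracker and a timeout. The 60-second timeout, the data layout, and the function `reassign_tasks` are assumptions for illustration, not JobTracker internals.

```python
def reassign_tasks(assignments, last_report, now, timeout):
    """Move tasks away from trackers whose last report is too old.

    assignments: dict task -> tracker name
    last_report: dict tracker name -> time of its last report (seconds)
    Returns a new task -> tracker mapping.
    """
    failed = {t for t, seen in last_report.items() if now - seen > timeout}
    healthy = sorted(set(last_report) - failed)
    if not healthy:
        raise RuntimeError("no healthy TaskTracker left")
    result = {}
    moved = 0
    for task, tracker in sorted(assignments.items()):
        if tracker in failed:
            # Spread the orphaned tasks over the healthy trackers.
            result[task] = healthy[moved % len(healthy)]
            moved += 1
        else:
            result[task] = tracker
    return result


last_report = {"tt1": 100, "tt2": 690, "tt3": 695}
assignments = {"m0": "tt1", "m1": "tt2", "m2": "tt1"}
print(reassign_tasks(assignments, last_report, now=700, timeout=60))
# {'m0': 'tt2', 'm1': 'tt2', 'm2': 'tt3'}
```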
Fault tolerance
JobTracker failure
Serious
No solution yet
Optimization
Optimization
Introduce a combiner function
It can run on the same machine as the mappers
It runs independently of the other mappers
A mini reducer: it shrinks the output of the Map phase and saves bandwidth
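The bandwidth saving from a combiner can be shown by counting the pairs that would leave a mapper with and without it. The functions `map_words` and `combine` and the sample sentence are illustrative assumptions.

```python
from collections import Counter


def map_words(line):
    """Map output for one line: a (word, 1) pair per word."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Sum the counts per word on the mapper's machine, before the shuffle."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


pairs = map_words("to be or not to be")
combined = combine(pairs)
print(len(pairs), len(combined))   # 6 4
print(combined)                    # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

Six pairs shrink to four before anything crosses the network, and the reducer still receives the same totals because addition can be applied to partial sums.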
Applications
Distributed data sorting
Web Ranking
Machine translation
Indexing
...
Summary
Focus on the main problem that needs solving
Map
Reduce
Main