Lab 11 Map Reduce
1 Introduction
To set up a Hadoop cluster, students choose at least 4 machines. Each machine plays one of the following roles:
- 1 machine as the NameNode: manages the namespace and controls access to the file system. There is exactly one NameNode in the cluster.
- 1 machine as the JobTracker: monitors the activity of the slave nodes. Only one machine acts as the JobTracker.
- DataNode: stores data files and manages its local storage. There can be many DataNodes.
- TaskTracker: slave nodes that execute the map and reduce tasks. There can be many TaskTrackers.
Each node in the cluster can be either a master or a slave. The NameNode and the JobTracker may run on the same machine, but for better system performance they should be two separate machines; both play the master role. The remaining machines are slaves and take on both the DataNode and TaskTracker roles.
Students are required to run the WordCount application in two modes, Pseudo-Distributed Operation and Fully-Distributed Operation, in order to understand clearly how the Map/Reduce model and the HDFS (Hadoop Distributed FileSystem) architecture work.
2 Content
2.1 Installing and using MapReduce as a cluster
Students choose at least 4 machines and perform the setup steps below:
- Download the Hadoop distribution from one of the following links:
o http://hadoop.apache.org
o http://www.cse.hcmut.edu.vn/~nathu/XLSS
- Configure the 3 main files in the hadoop-version/conf directory:
conf/core-site.xml: set up the NameNode
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://172.28.12.20:9000</value>
</property>
</configuration>
Students change the IP address to match the machine they have chosen!
conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
conf/mapred-site.xml: set up the JobTracker
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>172.28.12.45:9001</value> <!-- Students change the IP address accordingly -->
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/phuong/local</value> <!-- Students change the directory accordingly -->
</property>
<property>
<name>mapred.map.tasks</name>
<value>20</value> <!-- Students change the number of map tasks -->
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>4</value> <!-- Students change the number of reduce tasks -->
</property>
<property>
<name>mapred.task.tracker.http.address</name>
<value>172.28.12.45:50060</value> <!-- Students change the IP address accordingly -->
</property>
</configuration>
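For Fully-Distributed Operation, the start/stop scripts also read two plain-text host lists in the same conf directory. Note that, despite its name, conf/masters lists the host that runs the secondary NameNode, while conf/slaves lists the machines that run a DataNode and a TaskTracker, one host per line. The addresses below are placeholders only; students replace them with their own machines:
conf/masters:
172.28.12.20
conf/slaves:
172.28.12.21
172.28.12.22
172.28.12.23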
- Start the Hadoop MapReduce environment with the following commands:
On the NameNode, run:
o $ bin/hadoop namenode -format
o $ bin/start-dfs.sh
On the JobTracker, run:
o $ bin/start-mapred.sh
o Browse the following web pages to check whether Hadoop MapReduce is running:
NameNode: http://<NameNode IP address>:50070
JobTracker: http://<JobTracker IP address>:50030
- Run the sample application provided with Hadoop:
o $ bin/hadoop fs -put conf input
o $ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
o $ bin/hadoop fs -get output output
o $ cat output/*
- Shut down the Hadoop MapReduce environment
On the NameNode, run:
o $ bin/stop-dfs.sh
On the JobTracker, run:
o $ bin/stop-mapred.sh
2.2 Running the WordCount application
Students may use the classic WordCount.java source code from the Hadoop tutorial, shown below, or write their own.
package org.myorg;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class WordCount {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
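The counting logic above can be checked locally before submitting a job to the cluster. The sketch below (plain Java, no Hadoop dependency; the class name LocalWordCount is ours) mirrors the Mapper's tokenization and the Reducer's summation in a single in-memory pass:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Local sketch of the WordCount logic: the "map" step emits (word, 1)
// for each token, and the "reduce" step sums the ones per word. Here
// both steps are collapsed into one pass over a HashMap.
public class LocalWordCount {
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            // merge(k, 1, sum) increments the count for this word
            counts.merge(tokenizer.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = count("hello hadoop hello world");
        System.out.println(c.get("hello")); // prints 2
    }
}
```

Running this on a sample line and comparing against the job's part-00000 output is a quick sanity check of the cluster run.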
To run the application, students need to do the following:
2.2.1 Package WordCount.java into a .jar file, e.g. WordCount.jar:
$ mkdir wordcount_classes
$ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d wordcount_classes WordCount.java
$ jar cvf WordCount.jar -C wordcount_classes/ .
2.2.2 Running the program
Students create two files, file1 and file2, with arbitrary content, copy them into the input directory, and run the command below:
$ bin/hadoop jar WordCount.jar org.myorg.WordCount input output
Note: some useful commands for working with HDFS
$ bin/hadoop dfs -put <source> <dest> : provide input to the program
$ bin/hadoop dfs -get <source> <dest> : retrieve the program's output
$ bin/hadoop dfs -rmr <dir> : delete a directory
$ bin/hadoop dfs -rm <file> : delete a file
3 Exercises
1 Students run the WordCount version 2 program from the Hadoop tutorial.
2 Students write a program that computes PI using the Map/Reduce model.
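For exercise 2, one common approach is a Monte Carlo estimate: each map task generates random points in the unit square and counts how many fall inside the quarter circle; the reduce task sums the counts, and PI is approximated as 4 times the inside/total ratio. The sketch below (the class name PiEstimate and its parameters are ours, not part of the lab) shows only the core arithmetic that each task would perform, in plain Java:

```java
import java.util.Random;

// Monte Carlo core for a Map/Reduce PI job: a map task would run the
// sampling loop over its own share of points; the reduce task would
// sum the per-task "inside" counts before applying the 4 * ratio.
public class PiEstimate {
    public static double estimate(long samples, long seed) {
        Random rnd = new Random(seed);
        long inside = 0;
        for (long i = 0; i < samples; i++) {
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            // point lies inside the unit quarter circle
            if (x * x + y * y <= 1.0) inside++;
        }
        // area of quarter circle / area of unit square = pi / 4
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        System.out.println(PiEstimate.estimate(1_000_000L, 42L));
    }
}
```

In the distributed version, the number of map tasks and samples per task are job parameters; the estimate's accuracy grows with the total sample count.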