
Lab 11: Hadoop MapReduce (2)

1 Introduction
To set up a Hadoop cluster, students should select at least 4 machines. Each machine has one of the following roles:
- 1 machine as the NameNode: manages the filesystem namespace and controls access to the
system. There is exactly one NameNode in the cluster.
- 1 machine as the JobTracker: monitors the activity of the slave nodes. Only one machine acts as
the JobTracker.
- DataNode: stores the data files and manages its local storage area. There can be many
DataNodes.
- TaskTracker: slave nodes that execute the map and reduce tasks. There can be many
TaskTrackers.
Each node in the cluster is either a master or a slave. The NameNode and the JobTracker may run on the
same machine, but for better system performance they should be two separate machines; both play the
master role. The remaining machines are slaves and take on both the DataNode and TaskTracker roles
(a sketch of the conf/slaves file follows below).
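On Hadoop 1.x, the slave machines are usually listed in conf/slaves, one host per line; the start-dfs.sh and start-mapred.sh scripts read this file to launch a DataNode and a TaskTracker on every listed host. A minimal sketch, assuming two hypothetical slave IP addresses:

conf/slaves:
172.28.12.21
172.28.12.22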
Students are required to run the WordCount application in two modes, Pseudo-Distributed Operation and
Fully-Distributed Operation, to understand how the Map/Reduce model and the HDFS (Hadoop
Distributed FileSystem) architecture work.
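As a hedged reference for the Pseudo-Distributed mode, where all daemons run on a single machine: the same configuration files typically point at localhost and use a replication factor of 1, for example:

<!-- conf/core-site.xml (pseudo-distributed sketch) -->
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>

<!-- conf/hdfs-site.xml (pseudo-distributed sketch) -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>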

2 Content
2.1 Installing and using MapReduce as a Cluster
Students select at least 4 machines and perform the following setup steps:
- Download: the hadoop distribution from one of the following links
o http://hadoop.apache.org
o http://www.cse.hcmut.edu.vn/~nathu/XLSS
- Configure the 3 main files in the hadoop-version/conf directory:
conf/core-site.xml: set up the NameNode
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://172.28.12.20:9000</value>
</property>
</configuration>
Students should change the IP address to match the machine chosen!

conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
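Once HDFS is running (see the start-up commands below), the actual replication of the stored files can be verified with the fsck tool; a hedged example, run from the Hadoop home directory:

$ bin/hadoop fsck / -files -blocks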

conf/mapred-site.xml: set up the JobTracker
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>172.28.12.45:9001</value> <!-- Students: change the IP address accordingly -->
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/phuong/local</value> <!-- Students: change the directory accordingly -->
</property>

<property>
<name>mapred.map.tasks</name>
<value>20</value> <!-- Students: change the number of map tasks -->
</property>

<property>
<name>mapred.reduce.tasks</name>
<value>4</value> <!-- Students: change the number of reduce tasks -->
</property>

<property>
<name>mapred.task.tracker.http.address</name>
<value>172.28.12.45:50060</value> <!-- Students: change the IP address accordingly -->
</property>
</configuration>
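In a fully-distributed setup every node must see the same configuration, so these three files are normally copied to each machine after editing; a minimal sketch using rsync (the user name and install path are hypothetical):

$ rsync -av conf/ user@172.28.12.21:/home/user/hadoop/conf/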

- Start the hadoop mapreduce environment with the following commands:
On the NameNode, run:
o $ bin/hadoop namenode -format
o $ bin/start-dfs.sh
On the JobTracker, run:
o $ bin/start-mapred.sh
o Browse the following web pages to check whether hadoop mapreduce is up and
running:
NameNode: http://<NameNode IP address>:50070
JobTracker: http://<JobTracker IP address>:50030
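To confirm from the command line that all DataNodes have joined the cluster, the dfsadmin report can be consulted on the NameNode; it lists each live DataNode with its configured and remaining capacity:

$ bin/hadoop dfsadmin -report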
- Run the sample application shipped with hadoop:

o $ bin/hadoop fs -put conf input
o $ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
o $ bin/hadoop fs -get output output
o $ cat output/*
- Shut down the hadoop mapreduce environment
On the NameNode, run:
o $ bin/stop-dfs.sh
On the JobTracker, run:
o $ bin/stop-mapred.sh

2.2 Running the WordCount application
Students may use the WordCount.java source code below (from the Hadoop MapReduce tutorial) or write their own.
package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    // Mapper: emits (word, 1) for every token in the input line.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer: sums the counts collected for each word.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        // The reducer doubles as a combiner: summing counts is associative,
        // so partial sums can already be computed on the map side.
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}


To run the application, students need to perform the following steps:

2.2.1 Package the WordCount.java program into a .jar file, e.g. wordcount.jar

$ mkdir wordcount_classes
$ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d
wordcount_classes WordCount.java
$ jar -cvf wordcount.jar -C wordcount_classes/ .
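To double-check that the classes were packaged correctly, the jar contents can be listed with the standard JDK tool:

$ jar tf wordcount.jar

The listing should include org/myorg/WordCount.class together with the inner Map and Reduce classes.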

2.2.2 Running the program

Students create two files, file1 and file2, with arbitrary content, move them into the input directory, and
run the command below (a sketch of preparing the input follows after the command):

$ bin/hadoop jar wordcount.jar org.myorg.WordCount input output
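A minimal sketch of preparing the input beforehand (the file names and contents are arbitrary examples):

$ echo "Hello Hadoop Goodbye Hadoop" > file1
$ echo "Hello World Bye World" > file2
$ bin/hadoop dfs -mkdir input
$ bin/hadoop dfs -put file1 input
$ bin/hadoop dfs -put file2 input

After the job finishes, the word counts can be inspected with:

$ bin/hadoop dfs -cat output/part-*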

Note: some useful commands for working with HDFS

$ bin/hadoop dfs -put <local-source> <hdfs-dest> : provide input to the program
$ bin/hadoop dfs -get <hdfs-source> <local-dest> : retrieve the program's output
$ bin/hadoop dfs -rmr <dir> : remove a directory
$ bin/hadoop dfs -rm <file> : remove a file

3 Exercises
1. Students run the WordCount version 2 program (from the Hadoop tutorial).
2. Students write a program that computes PI using the Map/Reduce model.
