Lab 11 Map Reduce
1 Introduction
To set up a Hadoop cluster, students choose at least 4 machines. Each machine plays one of the following roles:
- 1 machine as the NameNode: manages the namespace and controls access to the file system. There is exactly one NameNode in the cluster.
- 1 machine as the JobTracker: monitors the activity of the slave nodes. Only one machine acts as the JobTracker.
- DataNode: stores data files and manages its local storage. There can be many DataNodes.
- TaskTracker: slave nodes that execute the map and reduce tasks. There can be many TaskTrackers.
Each node in the cluster can be either a master or a slave. The NameNode and the JobTracker may run on the same machine, but for better system performance they should be two separate machines; both play the master role. The remaining machines are slaves and take on both the DataNode and TaskTracker roles.
Students are required to run the WordCount application in two modes, Pseudo-Distributed Operation and Fully-Distributed Operation, in order to understand clearly how the Map/Reduce model and the HDFS (Hadoop Distributed FileSystem) architecture work.
2 Content
2.1 Installing and using MapReduce as a cluster
Students choose at least 4 machines and perform the setup steps below:
- Download the Hadoop distribution from one of the following links:
o http://hadoop.apache.org
o http://www.cse.hcmut.edu.vn/~nathu/XLSS
- Configure the 3 main files in the hadoop-version/conf directory:
conf/core-site.xml: set up the NameNode
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://172.28.12.20:9000</value>
</property>
</configuration>
Students change the IP address to match the machine they have chosen!
conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
conf/mapred-site.xml: set up the JobTracker
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>172.28.12.45:9001</value> <!-- Students change the IP address accordingly -->
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/phuong/local</value> <!-- Students change the directory accordingly -->
</property>
<property>
<name>mapred.map.tasks</name>
<value>20</value> <!-- Students change the number of map tasks -->
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>4</value> <!-- Students change the number of reduce tasks -->
</property>
<property>
<name>mapred.task.tracker.http.address</name>
<value>172.28.12.45:50060</value> <!-- Students change the IP address accordingly -->
</property>
</configuration>
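For Fully-Distributed Operation, the start/stop scripts also read two plain-text host lists in the same conf directory. Note that, despite its name, conf/masters lists the host that runs the secondary NameNode, while conf/slaves lists the machines that run a DataNode and a TaskTracker, one host per line. The addresses below are placeholders only; students replace them with their own machines:
conf/masters:
172.28.12.20
conf/slaves:
172.28.12.21
172.28.12.22
172.28.12.23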
- Start the Hadoop MapReduce environment with the following commands:
On the NameNode, run:
o $ bin/hadoop namenode -format
o $ bin/start-dfs.sh
On the JobTracker, run:
o $ bin/start-mapred.sh
o Browse the following web pages to check whether Hadoop MapReduce is running:
NameNode: http://<NameNode IP address>:50070
JobTracker: http://<JobTracker IP address>:50030
- Run the sample application provided with Hadoop:
o $ bin/hadoop fs -put conf input
o $ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
o $ bin/hadoop fs -get output output
o $ cat output/*
- Shut down the Hadoop MapReduce environment
On the NameNode, run:
o $ bin/stop-dfs.sh
On the JobTracker, run:
o $ bin/stop-mapred.sh
2.2 Running the WordCount application
Students may use the classic WordCount.java source code from the Hadoop tutorial, shown below, or write their own.
package org.myorg;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class WordCount {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
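The counting logic above can be checked locally before submitting a job to the cluster. The sketch below (plain Java, no Hadoop dependency; the class name LocalWordCount is ours) mirrors the Mapper's tokenization and the Reducer's summation in a single in-memory pass:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Local sketch of the WordCount logic: the "map" step emits (word, 1)
// for each token, and the "reduce" step sums the ones per word. Here
// both steps are collapsed into one pass over a HashMap.
public class LocalWordCount {
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            // merge(k, 1, sum) increments the count for this word
            counts.merge(tokenizer.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = count("hello hadoop hello world");
        System.out.println(c.get("hello")); // prints 2
    }
}
```

Running this on a sample line and comparing against the job's part-00000 output is a quick sanity check of the cluster run.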
To run the application, students need to do the following:
2.2.1 Package WordCount.java into a .jar file, e.g. WordCount.jar:
$ mkdir wordcount_classes
$ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d wordcount_classes WordCount.java
$ jar cvf WordCount.jar -C wordcount_classes/ .
2.2.2 Running the program
Students create two files, file1 and file2, with arbitrary content, copy them into the input directory, and run the command below:
$ bin/hadoop jar WordCount.jar org.myorg.WordCount input output
Note: some useful commands for working with HDFS
$ bin/hadoop dfs -put <source> <dest> : provide input to the program
$ bin/hadoop dfs -get <source> <dest> : retrieve the program's output
$ bin/hadoop dfs -rmr <dir> : delete a directory
$ bin/hadoop dfs -rm <file> : delete a file
3 Exercises
1 Students run the WordCount version 2 program from the Hadoop tutorial.
2 Students write a program that computes PI using the Map/Reduce model.
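For exercise 2, one common approach is a Monte Carlo estimate: each map task generates random points in the unit square and counts how many fall inside the quarter circle; the reduce task sums the counts, and PI is approximated as 4 times the inside/total ratio. The sketch below (the class name PiEstimate and its parameters are ours, not part of the lab) shows only the core arithmetic that each task would perform, in plain Java:

```java
import java.util.Random;

// Monte Carlo core for a Map/Reduce PI job: a map task would run the
// sampling loop over its own share of points; the reduce task would
// sum the per-task "inside" counts before applying the 4 * ratio.
public class PiEstimate {
    public static double estimate(long samples, long seed) {
        Random rnd = new Random(seed);
        long inside = 0;
        for (long i = 0; i < samples; i++) {
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            // point lies inside the unit quarter circle
            if (x * x + y * y <= 1.0) inside++;
        }
        // area of quarter circle / area of unit square = pi / 4
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        System.out.println(PiEstimate.estimate(1_000_000L, 42L));
    }
}
```

In the distributed version, the number of map tasks and samples per task are job parameters; the estimate's accuracy grows with the total sample count.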