
Hadoop

3. MapReduce





MapReduce was developed at Google:
Jeffrey Dean, Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. 2004.

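Application: Web search indexing

After the indexing system at Google was rewritten with MapReduce, one phase of the computation shrank from about 3800 lines of C++ to about 700 lines.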

MapReduce implementations
Google: C++
Apache Hadoop: Java
Erlang
NoSQL: MongoDB, CouchDB

MapReduce


, ,

MapReduce

Map

Map: toUpper(str)
,
!

Reduce

Reduce: +
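
The two functional building blocks named above can be illustrated in plain Java. This is a minimal sketch that does not use Hadoop: the class name FunctionalMapReduce and its methods are hypothetical and chosen only for illustration. Map applies toUpper to each element independently, and Reduce folds a list into one value with +.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration class, not part of Hadoop.
class FunctionalMapReduce {
    // Map: apply the same function (toUpperCase) to every element independently.
    static List<String> mapToUpper(List<String> words) {
        return words.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    // Reduce: fold all elements into a single value with a binary operator (+).
    static int reduceSum(List<Integer> numbers) {
        return numbers.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(mapToUpper(Arrays.asList("hello", "hadoop"))); // [HELLO, HADOOP]
        System.out.println(reduceSum(Arrays.asList(1, 2, 3, 4)));         // 10
    }
}
```

Because each Map call depends only on its own element, the calls can run in any order or in parallel, which is the property MapReduce relies on.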

MapReduce

MapReduce
Map Reduce
: -
Map Reduce
:


Reduce

Map

1:
issue open_bid open_ask bid ask
AFKS 0,95 1,3 0,95 1,3
AFLT 2,15 2,57 2,15 2,57
AKHA 0,28 0,72 0,28 0,72
AKRN 45,25 46,5 45 46
ALNU 700 700
AMEZ 0,475 0,515 0,475 0,515

1 http://ftp.rts.ru/pub/info/stats/

Reduce


Reduce

Example: WordCount

Input:
file1: Hello World Bye World
file2: Hello Hadoop Goodbye Hadoop

Result:
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2

Map for WordCount
Pseudocode:
map (filename, file-contents):
    for each word in file-contents:
        emit (word, 1)

Map output:
file1:
Hello 1
World 1
Bye 1
World 1
file2:
Hello 1
Hadoop 1
Goodbye 1
Hadoop 1

Reduce for WordCount
Pseudocode:
reduce (word, values):
    sum = 0
    for each value in values:
        sum = sum + value
    emit (word, sum)

Reduce output:
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2
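
The WordCount pseudocode can be traced end to end in a single process. The sketch below is plain Java without Hadoop, under the assumption that grouping values by key between Map and Reduce is done by a sorted in-memory map. The class WordCountSketch and the method names map, shuffle, reduce, and run are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process illustration; real Hadoop distributes these steps.
class WordCountSketch {
    // Map: emit (word, 1) for each whitespace-separated word.
    static List<Map.Entry<String, Integer>> map(String fileContents) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : fileContents.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle (assumed step): group values by key; TreeMap keeps keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce: sum all values collected for one word.
    static int reduce(String word, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum = sum + value;
        }
        return sum;
    }

    // Run Map on every file, group by word, then Reduce each group.
    static Map<String, Integer> run(List<String> files) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String contents : files) {
            pairs.addAll(map(contents));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
        System.out.println(run(Arrays.asList(
                "Hello World Bye World", "Hello Hadoop Goodbye Hadoop")));
    }
}
```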

MapReduce

[Figure: MapReduce data flow: HDFS -> Map -> MapReduce framework -> Reduce -> HDFS]

MapReduce

MapReduce



Map Reduce

MapReduce

MapReduce :

(
)
(
)



Hadoop

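Example: grep

Map: emits a line if it matches the search pattern.

Reduce: identity function, passes the Map output through unchanged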
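
A minimal sketch of the grep example, assuming the usual MapReduce formulation in which Map emits every line that matches a pattern and Reduce passes its input through unchanged. It is plain Java without Hadoop; the class GrepSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical single-process illustration of a grep job.
class GrepSketch {
    // Map: emit the line if it contains a match for the pattern, otherwise emit nothing.
    static List<String> map(String line, Pattern pattern) {
        List<String> out = new ArrayList<>();
        if (pattern.matcher(line).find()) {
            out.add(line);
        }
        return out;
    }

    // Reduce (assumed to be the identity): return the matched lines as they are.
    static List<String> reduce(List<String> matchedLines) {
        return matchedLines;
    }

    // Apply Map to every input line, then Reduce over all emitted lines.
    static List<String> run(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line, pattern));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        // prints [Hello World, Hello Hadoop]
        System.out.println(run(
                Arrays.asList("Hello World", "Bye World", "Hello Hadoop"), "Hello"));
    }
}
```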

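Example: URL access count

Task: count the number of accesses to each URL
Map:
Input: Web server log
Key: URL
Value: 1

Reduce:
Sums the values for each URL
Key: URL
Value: total number of accesses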

: ,

Map:
:
:
:

Reduce
:
:
:

:

Map:
:
:

Reduce:

MapReduce Hadoop

Hadoop MapReduce

Java
Map
Reduce
Streaming
: Linux Windows
(), Unix Java

Hadoop

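Hadoop terminology
Job: one run of a MapReduce program
Task: a part of a Job that executes Map or Reduce
Job Tracker: the Hadoop service that schedules Jobs and tracks their progress
Task Tracker: the service on a worker node that runs Tasks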

Hadoop
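import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class WordCount {
    // Map function
    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, IntWritable> { /* body shown separately */ }
    // Reduce function
    public static class Reduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable> { /* body shown separately */ }
    // main: configures and starts the Hadoop job
    public static void main(String[] args) throws Exception { /* body shown separately */ }
}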

WordCount Map Hadoop


public static class Map extends MapReduceBase implements
        Mapper<LongWritable, Text, Text, IntWritable> {
    // Constant value 1 emitted for every word
    private final static IntWritable one = new IntWritable(1);
    // Reusable output key
    private Text word = new Text();

    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}

WordCount Reduce Hadoop


public static class Reduce extends MapReduceBase implements
        Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // Sum all counts received for this word
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}

Hadoop
public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");
    // Types of the output key and value
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    // Map, Combiner, and Reduce classes
    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);
    // Input and output formats; paths come from the command line
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    // Submit the job and wait for it to finish
    JobClient.runJob(conf);
}

conf.setCombinerClass(Reduce.class);

The Combiner aggregates the output of Map locally, before it is passed to Reduce.
Example:

Map: (<Hello, 1>, <World, 1>, <Hello, 1>, <Hadoop, 1>)

Combiner: (<Hello, 2>, <World, 1>, <Hadoop, 1>)
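
The effect of the Combiner on the Map output above can be reproduced with a small local aggregation. This is a sketch in plain Java, not the Hadoop Combiner API: the class CombinerSketch and the method combine are hypothetical, and each input key is assumed to carry the value 1, as in WordCount.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of local aggregation on one Map node.
class CombinerSketch {
    // Each key in mapOutputKeys stands for a pair <key, 1> emitted by Map.
    // The pairs are summed per key before anything is sent to Reduce.
    static Map<String, Integer> combine(List<String> mapOutputKeys) {
        Map<String, Integer> combined = new LinkedHashMap<>(); // keeps first-seen order
        for (String key : mapOutputKeys) {
            combined.merge(key, 1, Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // prints {Hello=2, World=1, Hadoop=1}
        System.out.println(combine(Arrays.asList("Hello", "World", "Hello", "Hadoop")));
    }
}
```

Four pairs become three, so less data has to be transferred to Reduce. Reusing the Reduce class as the Combiner works for WordCount because addition is associative and commutative.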



Reducer,

MapReduce
Map
Map

,
Map

64

Hadoop Map
,

HDFS

MapReduce ,

MapReduce Map
64

HDFS (WORM)

HDFS block size: 64 MB.
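
As a rough illustration of how the 64 MB figure relates to the number of Map tasks, the sketch below assumes one Map task per HDFS block of the input file. This is a simplification: the real split computation in Hadoop depends on the input format and its configuration. The class SplitEstimate and the method estimateMapTasks are hypothetical.

```java
// Hypothetical estimate, assuming one Map task per HDFS block of input.
class SplitEstimate {
    static final long BLOCK_SIZE_BYTES = 64L * 1024 * 1024; // 64 MB

    // Number of blocks needed to hold the file, rounded up.
    static long estimateMapTasks(long fileSizeBytes, long blockSizeBytes) {
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long oneGigabyte = 1024L * 1024 * 1024;
        // 1 GB / 64 MB: prints 16
        System.out.println(estimateMapTasks(oneGigabyte, BLOCK_SIZE_BYTES));
        // 200 MB needs three full blocks and one partial block: prints 4
        System.out.println(estimateMapTasks(200L * 1024 * 1024, BLOCK_SIZE_BYTES));
    }
}
```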

MapReduce
,

HDFS

MapReduce

HBase, Hive, Pig, Mahout, etc.

Map
Reduce

MapReduce

Map,
Reduce

MapReduce

MapReduce



Map Reduce

MapReduce
,
,

MapReduce: Simplified Data Processing on Large Clusters
http://labs.google.com/papers/mapreduce.html

MapReduce Tutorial
http://hadoop.apache.org/common/docs/stable/mapred_tutorial.html

A Study of Skew in MapReduce Applications
http://nuage.cs.washington.edu/pubs/opencirrus2011.pdf
