3. MapReduce
Outline: MapReduce; Hadoop
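MapReduce at Google
Authors: Jeffrey Dean and Sanjay Ghemawat
Purpose: Web indexing
With MapReduce, one phase of Google's indexing code shrank from about 3800 lines of C++ to about 700 lines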
MapReduce implementations
Google: C++
Apache Hadoop: Java
Erlang
NoSQL: MongoDB, CouchDB
MapReduce
Map: applies a function to every element of a list, for example toUpper(str)
Reduce: folds the elements of a list into a single value, for example with +
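The two primitives can be sketched in plain Java with the standard streams API. This is only an illustration of the idea, not Hadoop code; the class name FunctionalMapReduce and its method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class, used only to illustrate the two primitives.
class FunctionalMapReduce {
    // Map: apply toUpperCase to every element independently of the others.
    static List<String> mapToUpper(List<String> words) {
        return words.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    // Reduce: fold all elements into a single value with +.
    static int reduceSum(List<Integer> numbers) {
        return numbers.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(mapToUpper(List.of("hello", "world"))); // [HELLO, WORLD]
        System.out.println(reduceSum(List.of(1, 2, 3, 4)));        // 10
    }
}
```

Because Map touches each element independently, the elements can be processed in parallel; Reduce with an associative operation such as + can combine partial results in any grouping.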
MapReduce
Map and Reduce: data is passed as key-value pairs
Example data (source: http://ftp.rts.ru/pub/info/stats/):
issue open_bid open_ask bid ask
AFKS 0,95 1,3 0,95 1,3
AFLT 2,15 2,57 2,15 2,57
AKHA 0,28 0,72 0,28 0,72
AKRN 45,25 46,5 45 46
ALNU 700 700
AMEZ 0,475 0,515 0,475 0,515
Example: WordCount
Input:
file1: Hello World Bye World
file2: Hello Hadoop Goodbye Hadoop
Output:
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2
Map function for WordCount
Pseudocode:
map (filename, file-contents):
    for each word in file-contents:
        emit (word, 1)
Map output:
file1:
Hello 1
World 1
Bye 1
World 1
file2:
Hello 1
Hadoop 1
Goodbye 1
Hadoop 1
Reduce function for WordCount
Pseudocode:
reduce (word, values):
    sum = 0
    for each value in values:
        sum = sum + value
    emit (word, sum)
Reduce output:
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2
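The WordCount pseudocode can be tried without a cluster. The following is a minimal in-memory sketch in plain Java, assuming that grouping the emitted pairs by key (done by the framework in a real system) can be imitated with a sorted map; the class WordCountSketch and its methods are hypothetical names and do not use the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory imitation of WordCount; no Hadoop classes are used.
class WordCountSketch {
    // Map: emit (word, 1) for every word in the file contents.
    static List<Map.Entry<String, Integer>> map(String fileContents) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : fileContents.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce: sum all values that were emitted for one word.
    static int reduce(String word, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum = sum + value;
        }
        return sum;
    }

    // Driver: run Map on every file, group the pairs by key, run Reduce per key.
    static SortedMap<String, Integer> run(List<String> files) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String file : files) {
            for (Map.Entry<String, Integer> pair : map(file)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        SortedMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> files = List.of("Hello World Bye World", "Hello Hadoop Goodbye Hadoop");
        // Prints: Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2 (one pair per line)
        run(files).forEach((word, count) -> System.out.println(word + " " + count));
    }
}
```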
MapReduce data flow
[Figure: input data in HDFS, Map tasks, Reduce tasks, output data in HDFS]
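Distributed grep
Map:
Emits a line if it matches the search pattern
Reduce:
Identity function: copies the intermediate data to the output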
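A grep expressed as Map and Reduce can be sketched in plain Java as follows. The sketch assumes the usual formulation in which Map emits only the matching lines and Reduce passes them through unchanged; GrepSketch and its methods are hypothetical names, and no Hadoop classes are involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical in-memory imitation of a MapReduce grep.
class GrepSketch {
    // Map: emit the line only if it contains a match for the pattern.
    static List<String> map(String line, Pattern pattern) {
        List<String> emitted = new ArrayList<>();
        if (pattern.matcher(line).find()) {
            emitted.add(line);
        }
        return emitted;
    }

    // Reduce: identity, the matching lines are copied to the output as they are.
    static List<String> reduce(List<String> lines) {
        return new ArrayList<>(lines);
    }

    // Driver: run Map on every input line, then Reduce on everything emitted.
    static List<String> run(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line, pattern));
        }
        return reduce(intermediate);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Hello World Bye World", "Hello Hadoop Goodbye Hadoop");
        System.out.println(run(lines, "Hadoop")); // [Hello Hadoop Goodbye Hadoop]
    }
}
```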
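Counting URL accesses
Map:
Input: Web server logs
Output key: URL
Output value: 1
Reduce:
Sums the values for each URL
Output key: URL
Output value: total number of accesses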
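Counting how often each URL occurs in a Web log follows the same pattern as WordCount. In the sketch below the log format ("client address, space, URL") is an assumption made only for the example, and UrlCountSketch with its methods is a hypothetical name; the grouping by key is again imitated with a sorted map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory imitation of URL counting; assumed log line format: "<client> <URL>".
class UrlCountSketch {
    // Map: emit (URL, 1) for one log line.
    static Map.Entry<String, Integer> map(String logLine) {
        String url = logLine.trim().split("\\s+")[1];
        return Map.entry(url, 1);
    }

    // Reduce: add up all the ones emitted for a single URL.
    static int reduce(String url, List<Integer> counts) {
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        return total;
    }

    // Driver: Map every line, group by URL, Reduce every group.
    static SortedMap<String, Integer> run(List<String> logLines) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : logLines) {
            Map.Entry<String, Integer> pair = map(line);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        SortedMap<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            totals.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "10.0.0.1 /index.html",
                "10.0.0.2 /about.html",
                "10.0.0.3 /index.html");
        System.out.println(run(log)); // {/about.html=1, /index.html=2}
    }
}
```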
MapReduce in Hadoop
Hadoop MapReduce programming interfaces:
Java: Map and Reduce classes
Streaming
Platforms: Linux, Windows, any Unix with Java
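Hadoop terminology
Job: one run of a MapReduce program
Task: a part of a Job, either a Map task or a Reduce task
Job Tracker: the Hadoop master service that schedules and monitors Jobs
Task Tracker: the service on a worker node that runs Tasks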
Hadoop
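import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class WordCount {
    // Map class
    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, IntWritable> {
        // map() method goes here
    }
    // Reduce class
    public static class Reduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable> {
        // reduce() method goes here
    }
    // main method: configures and starts the Hadoop job
    public static void main(String[] args) throws Exception {
        // job configuration goes here
    }
}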
Hadoop
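public static void main(String[] args) throws Exception {
    // Create the job configuration and name the job
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");
    // Types of the output keys and values
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    // Classes that implement Map, Combine and Reduce
    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);
    // Plain-text input and output formats
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    // Input and output paths are taken from the command line
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    // Submit the job and wait for it to finish
    JobClient.runJob(conf);
}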
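Combiner
conf.setCombinerClass(Reduce.class);
Processes the output of a Map task locally, before it is sent to Reduce
Purpose: reduce the amount of data transferred between Map and Reduce
Often the same class as the Reducer, which requires the reduce operation to be associative and commutative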
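The effect of a combiner can be illustrated by counting how many pairs would leave one Map task with and without local pre-aggregation. This is a plain-Java sketch under the assumption that the combiner uses the same summing logic as the WordCount reducer; CombinerSketch and its methods are hypothetical names, not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of what a combiner saves; no Hadoop classes are used.
class CombinerSketch {
    // Map output for one input split: one (word, 1) pair per word.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Combiner: sum the pairs of one Map task locally, one pair per distinct word remains.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return new ArrayList<>(sums.entrySet());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapped = map("Hello Hadoop Goodbye Hadoop");
        List<Map.Entry<String, Integer>> combined = combine(mapped);
        System.out.println(mapped.size());   // 4 pairs without a combiner
        System.out.println(combined.size()); // 3 pairs with a combiner
        System.out.println(combined);        // [Goodbye=1, Hadoop=2, Hello=1]
    }
}
```

Summation works as a combiner because partial sums can be added again by the reducer without changing the result; an operation such as computing an average directly would not have this property.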
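Map tasks and input data
The input data is divided into parts, and each part is processed by its own Map task
Default part size: 64 MB
Hadoop tries to run a Map task on the node that stores its part of the data in HDFS
MapReduce and HDFS
A Map task in MapReduce processes 64 MB of data
HDFS follows the write-once, read-many (WORM) model
The HDFS block size is 64 MB, so one Map task processes one HDFS block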
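With one Map task per 64 MB part, the number of Map tasks for a file can be estimated by a ceiling division. The sketch below is only a rough estimate: it assumes a fixed 64 MB part size and ignores the additional rules and settings that Hadoop applies when it actually splits the input. SplitCountSketch and mapTaskCount are hypothetical names.

```java
// Hypothetical estimate of the number of Map tasks for a single input file.
class SplitCountSketch {
    // Assumed part size: 64 MB, as on the slide.
    static final long SPLIT_SIZE_BYTES = 64L * 1024 * 1024;

    // Ceiling division: a partially filled last part still needs its own Map task.
    static long mapTaskCount(long fileSizeBytes) {
        if (fileSizeBytes <= 0) {
            return 0;
        }
        return (fileSizeBytes + SPLIT_SIZE_BYTES - 1) / SPLIT_SIZE_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(mapTaskCount(1024L * 1024 * 1024)); // 1 GB   -> 16
        System.out.println(mapTaskCount(100L * 1024 * 1024));  // 100 MB -> 2
    }
}
```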
MapReduce Tutorial
http://labs.google.com/papers/mapreduce.html
http://hadoop.apache.org/common/docs/stable/mapred_tutorial.html
http://nuage.cs.washington.edu/pubs/opencirrus2011.pdf