Hadoop 3

Hadoop
3.
MapReduce

MapReduce
MapReduce
MapReduce
MapReduce
Hadoop
MapReduce
MapReduce

MapReduce was developed at Google:
Jeffrey Dean, Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. 2004.
: Web

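MapReduce at Google:
Rewriting the indexing system with MapReduce reduced one phase of the computation from about 3800 lines of C++ to about 700 lines (Dean and Ghemawat, 2004).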

MapReduce implementations
Google: C++
Apache Hadoop: Java
Erlang
NoSQL with MapReduce support: MongoDB, CouchB

, ,

MapReduce

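Map applies a function to each element of a list
Example of Map: toUpper(str)
Reduce folds the elements of a list into a single value
Example of Reduce: +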
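The two functional primitives named above, toUpper as a Map and + as a Reduce, can be sketched with the Java Stream API. This is only an illustration of the idea, not Hadoop code; the class and method names (FunctionalMapReduce, mapToUpper, reduceSum) are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FunctionalMapReduce {
    // Map: apply a function (here toUpperCase) to every element independently.
    static List<String> mapToUpper(List<String> words) {
        return words.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Reduce: fold all elements into one value with a binary operation (here +).
    static int reduceSum(List<Integer> numbers) {
        return numbers.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(mapToUpper(Arrays.asList("hello", "world"))); // [HELLO, WORLD]
        System.out.println(reduceSum(Arrays.asList(1, 2, 3, 4)));        // 10
    }
}
```

Because Map treats each element on its own, the elements can be processed in any order or in parallel, which is the property MapReduce relies on.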

MapReduce
MapReduce
Map Reduce
: -
Map Reduce
:

Reduce
Map
1:
issue open_bid open_ask bid ask
AFKS 0,95 1,3 0,95 1,3
AFLT 2,15 2,57 2,15 2,57
AKHA 0,28 0,72 0,28 0,72
AKRN 45,25 46,5 45 46
ALNU 700 700
AMEZ 0,475 0,515 0,475 0,515
1 http://ftp.rts.ru/pub/info/stats/
Reduce

Reduce
Example: WordCount

Input:
file1: Hello World Bye World
file2: Hello Hadoop Goodbye Hadoop
Output:
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2
Map function for WordCount
Pseudocode:
map (filename, file-contents):
    for each word in file-contents:
        emit (word, 1)
Output:
file1:
Hello 1
World 1
Bye 1
World 1
file2:
Hello 1
Hadoop 1
Goodbye 1
Hadoop 1
Reduce function for WordCount
Pseudocode:
reduce (word, values):
    sum = 0
    for each value in values:
        sum = sum + value
    emit (word, sum)
Output:
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2
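The WordCount pseudocode can be tried without a cluster. The sketch below simulates the three steps in plain Java on the two sample files: map emits (word, 1) pairs, a grouping step collects the values for each word, and reduce sums them. The class LocalWordCount and its helper methods are hypothetical names for this illustration; in Hadoop the grouping is done by the framework itself.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class LocalWordCount {
    // Map: emit a (word, 1) pair for every word in the file contents.
    static List<Map.Entry<String, Integer>> map(String fileContents) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : fileContents.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Grouping step (done by the framework in Hadoop): collect all values per key, sorted by key.
    static SortedMap<String, List<Integer>> group(List<Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: sum all values that were emitted for one word.
    static int reduce(String word, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Run map over every file, group the pairs, then reduce each group.
    static SortedMap<String, Integer> run(String... files) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String file : files) {
            pairs.addAll(map(file));
        }
        SortedMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : group(pairs).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
        System.out.println(run("Hello World Bye World", "Hello Hadoop Goodbye Hadoop"));
    }
}
```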
MapReduce
MapReduce
HDFS

Map

Map
Map
-
MapReduce
-
Reduce

Reduce

HDFS
MapReduce
MapReduce

Map Reduce

MapReduce
MapReduce :

(
)
(
)

Hadoop
grep
:
Map:
.
:
:
Reduce:
URL
:
URL
Map:
Web- :
: URL
: 1
Reduce:
URL :
: URL
:
: ,

Map:
:
:
:
Reduce
:
:
:
:

Map:
:
:
Reduce:

MapReduce Hadoop
Hadoop MapReduce

Java
Map
Reduce
Streaming
: Linux Windows
(), Unix Java
Hadoop
Hadoop
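Job: a MapReduce program submitted for execution
Task: a part of a Job that performs Map or Reduce
Job Tracker: the Hadoop service that accepts Jobs and distributes their Tasks across the cluster
Task Tracker: the service that runs Tasks on a cluster node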
Hadoop
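import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class WordCount {
    // Map class
    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, IntWritable> { /* ... */ }
    // Reduce class
    public static class Reduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable> { /* ... */ }
    // main method: configures and starts the Hadoop job
    public static void main(String[] args) throws Exception { /* ... */ }
}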
WordCount Map Hadoop

public static class Map extends MapReduceBase implements
        Mapper<LongWritable, Text, Text, IntWritable> {
    // Reused for every emitted pair instead of allocating new objects per word
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // key: byte offset of the line in the input file; value: the line itself
    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        // Emit (word, 1) for every whitespace-separated token in the line
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}
WordCount Reduce Hadoop

public static class Reduce extends MapReduceBase implements
        Reducer<Text, IntWritable, Text, IntWritable> {
    // key: a word; values: all counts emitted for that word
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        // Emit (word, total count)
        output.collect(key, new IntWritable(sum));
    }
}
Hadoop
public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");
    // Types of the output key and value
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    // Reduce also serves as the combiner: summing partial counts gives the same total
    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);
    // Plain text files for input and output
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    // args[0]: input path, args[1]: output path
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    // Submit the job and wait for it to finish
    JobClient.runJob(conf);
}
conf.setCombinerClass(Reduce.class);

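The Combiner runs locally on the Map output, before the data is sent to Reduce.
Example:
Map: (<Hello, 1>, <World, 1>, <Hello, 1>, <Hadoop, 1>)
Combiner: (<Hello, 2>, <World, 1>, <Hadoop, 1>)
The combined pairs are passed to the Reducer, so less data is transferred over the network.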
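The effect of the combiner on the sample Map output can be reproduced in plain Java. The sketch below is not the Hadoop API: CombinerSketch, pair and combine are hypothetical names, and the code only shows the local pre-aggregation that setCombinerClass(Reduce.class) asks Hadoop to perform on each Map task's output.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CombinerSketch {
    // Helper that builds one (key, value) pair.
    static Map.Entry<String, Integer> pair(String key, int value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Combine: sum the values of equal keys within the output of a single Map task.
    // LinkedHashMap keeps keys in the order they were first seen.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> p : mapOutput) {
            combined.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = Arrays.asList(
                pair("Hello", 1), pair("World", 1), pair("Hello", 1), pair("Hadoop", 1));
        // Prints {Hello=2, World=1, Hadoop=1}: three pairs leave the node instead of four
        System.out.println(combine(mapOutput));
    }
}
```

Reusing the Reduce class as the combiner works for WordCount because addition is associative and commutative, so summing partial sums gives the same total.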

MapReduce
Map
Map

,
Map
64
Hadoop Map
,

HDFS
MapReduce ,

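Each MapReduce Map task processes a 64 MB portion of the input.
HDFS follows the write-once, read-many (WORM) model.
The HDFS block size is 64 MB.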
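Reading the figure 64 above as a 64 MB HDFS block that is also the portion of input given to one Map task, the number of Map tasks for a file can be estimated by simple division. The sketch below is a simplification under that assumption: the names InputSplits and numberOfSplits are hypothetical, and the real Hadoop input formats take configuration settings into account.

```java
class InputSplits {
    // Assumed block size: 64 MB, as on the slide.
    static final long BLOCK_SIZE_BYTES = 64L * 1024 * 1024;

    // Simplified estimate: one split (and one Map task) per started 64 MB block.
    static long numberOfSplits(long fileSizeBytes) {
        if (fileSizeBytes <= 0) {
            return 0; // simplification: an empty file yields no work here
        }
        return (fileSizeBytes + BLOCK_SIZE_BYTES - 1) / BLOCK_SIZE_BYTES; // ceiling division
    }

    public static void main(String[] args) {
        long oneGigabyte = 1024L * 1024 * 1024;
        System.out.println(numberOfSplits(oneGigabyte));         // 16
        System.out.println(numberOfSplits(100L * 1024 * 1024));  // 2
    }
}
```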

MapReduce
,

HDFS
MapReduce
HBase, Hive, Pig, Mahout .
Map
Reduce
MapReduce

Map,
Reduce
MapReduce
MapReduce

Map Reduce

MapReduce
,
,
MapReduce: Simplified Data Processing on Large Clusters
http://labs.google.com/papers/mapreduce.html
MapReduce Tutorial
http://hadoop.apache.org/common/docs/stable/mapred_tutorial.html
A Study of Skew in MapReduce Applications
http://nuage.cs.washington.edu/pubs/opencirrus2011.pdf
