W H I T E P A P E R

HADOOP PERFORMANCE TUNING

Impetus Technologies Inc.
www.impetus.com
October 2009

Abstract

This paper explains tuning of Hadoop configuration parameters which

Contents

Abstract
Document Flow
Definitions
Parameters affecting Performance
Conclusion
About Impetus
Definitions

The document highlights the following parameters used in the job:

• mapred.reduce.parallel.copies: The number of threads used to copy map outputs to the Reducer.

• mapred.compress.map.output: Specifies whether to compress the output of maps.

Map Operations: The Map task involves the
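As an illustrative sketch, the two properties defined above could appear in a job configuration like the following (a plain Python dict as a stand-in for a configuration object, not Hadoop's API; the value 5 is the default number of copier threads mentioned later in this paper):

```python
# Hypothetical stand-in for a Hadoop job configuration (a plain dict,
# not Hadoop's API), showing the two properties defined above.
job_conf = {
    # number of threads used to copy map outputs to the Reducer
    "mapred.reduce.parallel.copies": 5,
    # whether to compress the output of maps
    "mapred.compress.map.output": True,
}
print(job_conf)
```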
Output file partitions are made available to the Reducers over HTTP. The number of worker threads used to serve the file partitions is controlled by the tasktracker.http.threads property; this setting is per tasktracker, not per map task slot. The default of 40 may need increasing for large clusters running large jobs.

Reduce Operations: The Reducer has three phases:

1. Copy: Each map task's output for the corresponding Reducer is copied as soon as the map task completes. The reduce task has a small number of copier threads so that it can fetch map outputs in parallel. The default is 5 threads, but this can be changed by setting the mapred.reduce.parallel.copies property. The map output is copied to the reduce tasktracker's memory buffer, which is controlled by mapred.job.shuffle.input.buffer.percent (the proportion of the heap to use for this purpose). When the in-memory buffer reaches a threshold size (controlled by mapred.job.shuffle.merge.percent), or reaches a threshold number of map outputs (mapred.inmem.merge.threshold), it is merged and spilled to disk. As the copies accumulate on disk, a background thread merges them into larger, sorted files. This saves some time in subsequent merging.

2. Sort: This phase should actually be called the Merge phase, as the sorting is done at the map side. This phase starts when all the maps have completed and their output has been copied. Map outputs are merged maintaining their sorting order. This is done in rounds. For example, if there were 40 map outputs and the merge factor was 10 (the default, controlled by the io.sort.factor property, just as in the map's merge), there would be 4 rounds: the first round would merge 4 files, and each of the remaining 3 rounds would merge 10 files. The last batch of files is not merged but given directly to the reduce phase.

3. Reduce: During the reduce phase the reduce function is invoked for each key in the sorted output. The output of this phase is written directly to the output filesystem, typically HDFS.

Parameters affecting Performance

dfs.block.size: File system block size

• Default: 67108864 (bytes)

• Suggestions:

  o Small cluster and large data set: the default block size will create a large number of map tasks.

    e.g. Input data size = 160 GB and dfs.block.size = 64 MB, then minimum no. of maps = (160*1024)/64 = 2560 maps.

    If dfs.block.size = 128 MB, minimum no. of maps = (160*1024)/128 = 1280 maps.

    If dfs.block.size = 256 MB, minimum no. of maps = (160*1024)/256 = 640 maps.

    In a small cluster (6-7 nodes) the map task creation overhead is considerable, so dfs.block.size should be large in this case but small
mapred.compress.map.output: Map output compression

• Default: False

• Pros: Faster disk writes, saves disk space, less time in data transfer (from Mappers to Reducers).

• Cons: Overhead in compression at Mappers and decompression at Reducers.

• Suggestions: For large clusters and large jobs this property should be set to true. The compression codec can also be set through the property mapred.map.output.compression.codec (default is org.apache.hadoop.io.compress.DefaultCodec).

mapred.map/reduce.tasks.speculative.execution: Enable/disable task (map/reduce) speculative execution

• Cons: Increases the job time if the task progress is slow due to complex and

mapred.tasktracker.map/reduce.tasks.maximum: Maximum tasks (map/reduce) for a tasktracker

• Default: 2

• Suggestions:

  o This value should be set according to the hardware specification of the cluster nodes and the resource requirements of the tasks (map/reduce). e.g. a node has 8 GB main memory + 8-core CPU + swap space:

    Maximum memory required by a task ~ 500 MB
    Memory required by the tasktracker, Datanode and other processes ~ (1 + 1 + 1) = 3 GB
    Maximum tasks that can be run = (8-3) GB / 500 MB = 10

  o The memory available to each task (JVM) is controlled by the mapred.child.java.opts property. The default is -Xmx200m (200 MB). Other JVM options can also be provided in this property.

io.sort.mb: Buffer size (MB) for sorting

• Default: 100

• Suggestions:

mapred.job.reuse.jvm.num.tasks: Number of tasks to run per JVM

• Default: 1

• Suggestions: The overhead of JVM creation for each task is around 1 second. So for tasks which live for only seconds or a few minutes and have lengthy initialization, this value can be increased to gain performance.

mapred.reduce.parallel.copies:
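The two sizing rules above, task slots per node and JVM start-up amortisation, can be sketched as rough models. The function names are illustrative, and the 1-second start-up figure is taken from the text, not measured:

```python
import math

def max_tasks_per_node(ram_gb, daemons_gb, task_mb):
    """Task slots that fit in the RAM left over after the tasktracker,
    Datanode and other daemon processes are accounted for."""
    return ((ram_gb - daemons_gb) * 1024) // task_mb

def jvm_startup_overhead_s(n_tasks, tasks_per_jvm=1, startup_s=1.0):
    """Total JVM start-up time for a job, assuming ~1 s per JVM and each
    JVM reused for `tasks_per_jvm` tasks (illustrative model of
    mapred.job.reuse.jvm.num.tasks)."""
    return math.ceil(n_tasks / tasks_per_jvm) * startup_s

print(max_tasks_per_node(8, 3, 500))                  # (8-3) GB / 500 MB = 10
print(jvm_startup_overhead_s(600))                    # no reuse: 600.0 s
print(jvm_startup_overhead_s(600, tasks_per_jvm=20))  # reuse 20: 30.0 s
```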
effectively, only ¼ of the hard disk space is available for business use.

The default value for mapred.local.dir is ${hadoop.tmp.dir}/mapred/local. So if mapred.local.dir is not set, hadoop.tmp.dir must have enough space to hold the job's intermediate data. If the node doesn't have enough temporary space, the task attempt will fail and a new attempt starts, thus reducing the performance.

JVM tuning: Besides normal Java code optimizations, the JVM settings for each child task also matter. At the slave node end, the Tasktracker and data node use 1 GB RAM each. Effective use of the remaining RAM, as well as choosing the JVM size for child tasks, is important: the default max RAM for child tasks is 200 MB, which might be insufficient for many production-grade jobs. The JVM settings for

called in the reduce phase when all the maps have finished. So in large jobs where the Reducer loads data (>100 MB for business logic) in memory on initialization, performance can be increased by lazily initializing Reducers, i.e. loading the data in the reduce method, controlled by an initialize flag variable which assures that it is loaded only once.

By lazily initializing Reducers which require memory (for business logic) on initialization, the number of maps can be increased (controlled by mapred.tasktracker.map.tasks.maximum):

Maximum memory required by each map task = 400 MB
If the Reducer loads 400 MB of data (metadata
No. of reduce tasks to run = 4
Maximum memory required by Tasktracker + Datanode = 2 GB
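The initialize-flag pattern described above can be sketched as follows. This is plain Python, not Hadoop's Reducer API; the class and method names are illustrative:

```python
class LazyInitReducer:
    """Reducer sketch that defers loading heavy lookup data until the
    first reduce() call, guarded by an initialize flag so the load
    happens only once."""

    def __init__(self):
        self._metadata = None  # nothing loaded at construction time

    def _load_metadata(self):
        # Stand-in for loading >100 MB of business-logic lookup data.
        return {"ready": True}

    def reduce(self, key, values):
        if self._metadata is None:  # the initialize-flag check
            self._metadata = self._load_metadata()
        return key, sum(values)

r = LazyInitReducer()
print(r.reduce("clicks", [1, 2, 3]))  # ('clicks', 6)
```

Because idle Reducers never call reduce(), they never pay the memory cost, which is what lets more map slots run concurrently on the same node.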
Number of map tasks can be run = (8-2-

business logic, using around 1.2 GB of metadata (complex hierarchical look up/master data). Processing involves complex processing of metadata from raw metadata.

Approaches: The problem was divided into 5 Map Reduce jobs: a Metadata job, an Input processing job for parsing the input files