You are on page 1of 1

Shrinivas Joshi, Advanced Micro Devices, Inc. {shrinivas.joshi@amd.

com}
Introduction
Users who have deployed and tried tuning their Hadoop clusters for the first time will certainly attest to the fact that optimizing Hadoop clusters is a daunting task. Apart from the nature and implementation of Hadoop jobs, hardware, network infrastructure, OS, JVM, and Hadoop configuration properties all have a significant impact on performance and scalability of Hadoop workloads. Using TeraSort as a reference workload, this poster attempts to educate the audience on challenges involved in tuning performance of Hadoop setup, different tools available for monitoring and diagnosing performance bottlenecks, tuning best practices, empirical data on effect of various tunings on performance, and some future directions.

Hadoop Configuration Tuning
• Using maximum possible map and reduce slots, identify the optimal number of disks that maximizes I/O bandwidth. • Experiment with different HDFS block sizes. • Identify heap usage and GC characteristics of framework processes and lock in their heap and GC settings. • Start with biggest-payoff properties. • Enable map output compression. • Experiment with different codec choices. In our experience, LZO codec performed better across multiple workloads. • Experiment with JVM reuse policy. • This helps reduce disk & network I/O wait and increase CPU utilization.
Effect of enabling map compression using LZO
1.2 1 100.00% 1 100.00% 1 100.00% 1.2 1

Best Practices
• If memory is not a bottleneck, try to eliminate map-side spills by tuning io.sort.mb. At the least, reduce the number of spills. • In case there are no spills, tune io.sort.spill.percent. •This also helps reduce disk I/O wait time and increase CPU utilization.
Effect of eliminating map-side spills & tuning io.sort.factor

• Try to avoid or eliminate intermediate disk I/O operations on reduce side by tuning heap sizes of reduce JVMs. • If the reduce functionality is not heap heavy, adjust map output buffer size. • Tune framework-related resources such as task tracker threads, data node, and name node handler count.
Effect of tuning reduce-side configurations
1.2 100.00%

Hadoop:
• Identify the right number of data disks your job requires. • Observe Hadoop framework heap usage and GC patterns and lock in heap and GC JVM flags for these processes. • Using default settings, start with Hadoop configuration tuning. • Focus on properties shown to have a big impact, such as map/reduce output compression and JVM reuse policy. • Avoid or eliminate map-side spills. • Avoid or eliminate reduce-side disk I/O by tuning reduce JVM heap size, map output copier threads, merge factor, and tasktracker threads handling reduce-side requests. • If reduce functionality is not heapheavy, try to maximize the map output storage buffers. • Try to maximize CPU usage by tuning number of map and reduce JVMs. • Tune other framework-related components such as datanode and namenode handler count.

Scaling with number of data disks
1.2

94.00%

Execution time

Execution time

Execution time

0.8 0.6 0.4 0.2 0

Execution time

0.8 1 disk 43.00% 0.4 3 disks 5 disks

81.80% 0.8 0.6 0.4 0.2 0

0.8

71.80%

0.6

0.6

51.00%

Compression disabled Compression enabled

Map-side spills occuring Map-side spills eliminated

Without reduce side tuning With reduce side tuning

0.4

0.2

0.2

0

0

Number of data disks

Map output compression configuration

Map-side spill configurations

Reduce side buffer space configuration

JVM Configuration Tuning
• On 64-bit Oracle JDK 6 update 25, compressed pointers are ON by default. If you are using an older version of JDK and compressed pointers are disabled, experiment with enabling them. • Compressed pointers reduce memory footprint.

Challenges
• Hadoop is a large and complicated framework involving a number of entities interacting with each other across multiple hardware systems. • Performance of Hadoop jobs is sensitive to every component of the cluster stack: Hadoop configuration, JVM, OS, network infrastructure, underlying hardware, and possibly BIOS settings. • Hadoop supports a large number of configuration properties and a good chunk of these can potentially impact performance. • As with any large software system, diagnosing performance issues is a complicated task.

• Biased-locking feature of Oracle HotSpot JDK improves performance in scenarios with uncontended locks. • Given the architecture of Hadoop framework, biased locking should generally improve performance.

• Experiment with AggressiveOpts flag. • Experiment with UseCompressedStrings flag. • Experiment with UseStringCache flag. • Experiment with UseNUMA flag. •Experiment with UseLargePages flag.

• Verify whether the JVM is running out of code cache. Increase code cache size if necessary. • Perform detailed GC log analysis and tuning.

JVM:

• Experiment with performanceaffecting JVM flags such as biased locking, compressed pointers, AggressiveOpts, etc. • Perform a thorough Java GC analysis and tune GC related flags.

Operating System:
Effect of biased locking
1.2 1.2 100.00% 100.00% 102.00% 1

Effect of compressed pointers
1.2 1 Execution time 0.8 0.6 0.4 0.2 0 Compressed pointers configuration Compressed pointers disabled Compressed pointers enabled
100.00% 96.80%

Effect of JVM aggressive optimizations
1.2

Effect of GC tuning
100.00% 97.00%

1

94.50%

1

• Experiment with various OS-level tunings such as choice of FS, choice of I/O scheduler, and limits on OS resources.

Execution time

Execution time

0.8

Execution time

0.8 Without -XX:+AggressiveOpts flag With -XX:+AggressiveOpts flag 0.4

BIOS:
Without GC tuning With GC tuning

0.8

0.6

Biased locking disabled Biased locking enabled

0.6

0.6

0.4

0.4

• Finally, verify if any default BIOS settings are negatively affecting performance of your Hadoop jobs.

0.2

0.2

0.2

0

0

0

Biased locking configuration

Aggressive optimizations configuration

GC tuning

Tuning Highlights
OS Configuration Tuning
• On ClusterA, we achieved a speed-up factor of 5.6X. • On ClusterB, we achieved a speed-up factor of 2.2X.
• Linux kernels support 4 different types of I/O schedulers – CFQ, deadline, no-op, and anticipatory. • Experiment with different choices of I/O scheduler, especially CFQ and deadline. • Linux OS limits such as max open file descriptors and epoll limits can affect performance. • Experiment with these limits.
Total percentage gains as compared to the baseline execution time
0.30 80 4.59 1.26

Tools
• Ganglia and Nagios – effective in
monitoring cluster-wide statistics. • Hadoop Vaidya – offers guidelines on mitigating framework-level bottlenecks. • Hadoop Core – offers simple and insightful information on performanceaffecting events. • AMD CodeAnalystTM, Oracle® Solaris Studio and Intel® VTuneTM – profilers for in-depth source-level analysis. • AMD MultEvent, Linux perf, and OProfile – platform-level profilers for in-depth analysis of hardware-level performance events.

• Certain Linux distributions support EXT4 as the default file-system type. • If you are using another type of file system, experiment with EXT4 file system.

• By default, every file read operation triggers a disk write operation for maintaining last access time. • Disable this logging using noatime, nodirtaime FS attributes. • Experiment with other FS attribute tuning such as extent, flex_bg, barrier etc.
Effect of noatime file system attribute
1.2

70

60 3.30 50

8.30 BIOS

Effect of EXT4 file system
1.2 100.00% 91.00%

Effect of I/O scheduler choices
1.2 1.2 100.00% 85.00%

Effect of Linux OS resource settings
100.00% 101.00%

40 76.01

OS JVM Hadoop

1

1

100.00% 85.00%

1

1

Execution time

Execution time

Execution time

0.8

0.8

0.8

Execution time

0.8

30

0.6

Data disks using EXT3 FS Data disks using EXT4 FS

0.6

FS noatime attribute disabled FS noatime attribute enabled

0.6

"deadline" scheduler "CFQ" scheduler

0.6

Default open FD limit (1024) Open fd limit set to 16K

44.58 20

0.4

0.4

0.4

0.4

0.2

0.2

0.2

0.2

10

0

0

0

0

File system type

FS attributes

IO scheduler choices

Linux open file descriptors limit settings

0 64 GB TeraSort on ClusterA 1TB TeraSort on ClusterB

Cluster Configurations

Cluster A
• 6 DNs, 1 NN: 2 chips/6 cores per chip: AMD OpteronTM 2435 @2.6GHz • 16GB DDR2 800 RAM per node • 6 x 1TB Samsung Spinpoint F3 @7200 • Ubuntu 11.04 Server x64, Oracle JDK6 update 25 x64 • Dataset size – 64 GB

BIOS Configuration Tuning
• Native command queuing feature of modern hard drives helps improve I/O performance by optimizing drive head movement. • Experiment with AHCI option in BIOS, which can be used to enable NCQ mode.
• When all the CPU cores on the hardware are not fully utilized, the processor could be downgrading CPU frequency. • Experiment with ACPI and other power-related BIOS options.

• On some AMD processor-based systems, HyperTransportTM link speed and width are dynamically adjusted to reduce power consumption. • If memory bandwidth is a bottleneck, experiment with this option.

• Modern AMD processors support a feature called HT assist (a.k.a. probe filters). This feature reduces traffic on memory interconnects at the expense of some portion of L3 cache. • Experiment with this feature. Try disabling it if your job is cache-sensitive.

Conclusion & Future Direction
• Configuration tuning of all the components of Hadoop stack is an important exercise and can offer a huge performance payoff. • Different Hadoop workloads will have different characteristics, so it is important to experiment with different tuning options. • We want to perform a similar study in multi-tenant environments and in the cloud environment. • We want to explore JVM and JDK optimizations targeting peculiar characteristics of Hadoop framework and jobs running on top of it.

Cluster B
• 5 DNs: 2 chips/4 cores per chip: AMD Opteron 2356TM @2.3GHz • 1 NN: 4 chips/4 cores per chip: AMD Opteron 8356TM @ 2.6GHz • 64GB DDR2 667 and DDR2 800 RAM • 6 x 1TB Samsung Spinpoint F3 @7200 • Ubuntu 11.04 Server x64, Oracle JDK6 update 25 x64 • Dataset size – 1 TB
Execution time

Effect of power-saving settings in BIOS
1.2 100.00% 99.00% 1 1.2

Effecting of tuning HyperTransport settings
100.00% 98.00%

1

Execution time

0.8

0.8 HyperTransport link speed and width set to "auto" HyperTransport link speed set to 2.2GHz and width set to 16

0.6

Power saving enabled in BIOS Power savings disabled in BIOS

0.6

0.4

0.4

0.2

0.2

0

0

Power saving settings in BIOS

HyperTransport settings

© 2011 Advanced Micro Devices, Inc. AMD, the AMD Arrow logo, AMD Opteron and combinations thereof are trademarks of Advanced Micro Devices, Inc. HyperTransport is a trademark of HyperTransport Technology Consortium.