
TRAINING SHEET

Developer Training for Apache Hadoop

Take your knowledge to the next level with Cloudera’s Apache Hadoop
Training and Certification
Cloudera’s four-day developer training course delivers the key concepts and
expertise necessary to create robust data processing applications using Apache
Hadoop.

Through lecture and interactive, hands-on exercises, attendees will navigate the
Hadoop ecosystem, learning topics such as:

• MapReduce and the Hadoop Distributed File System (HDFS) and how to write
MapReduce code
• Best practices and considerations for Hadoop development, debugging
techniques and implementation of workflows and common algorithms
• How to leverage Hive, Pig, Sqoop, Flume, Oozie and other projects from the
Apache Hadoop ecosystem
• Optimal hardware configurations and network considerations for building out,
maintaining and monitoring your Hadoop cluster
• Advanced Hadoop API topics required for real-world data analysis

“Cloudera has true expertise in their ranks, offering intimate insight and
experience with the Apache Hadoop ecosystem.”
Justin Hancock, Director

Upon completion of the course, attendees are able to attempt the Cloudera
Certified Developer for Apache Hadoop (CCDH) exam. Certification is a great
differentiator; it helps establish individuals as leaders in their field, providing
customers with tangible evidence of their skills.

Audience
This course is intended for experienced developers who wish to write, maintain,
and/or optimize Apache Hadoop jobs. A background in Java is preferred, but
experience with another programming language, such as PHP, Python or C#, is
sufficient.

Cloudera, Inc. 210 Portage Avenue, Palo Alto, CA 94306 USA | 1-888-789-1488 or 1-650-362-0488 | cloudera.com
©2011 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and other countries. All other trademarks are the property of their
respective companies. Information is subject to change without notice.

Course Outline: Cloudera Developer Training for Apache Hadoop

• The Motivation For Hadoop
  o Problems with traditional large-scale systems
  o Requirements for a new approach

• Hadoop Basic Concepts
  o An Overview of Hadoop
  o The Hadoop Distributed File System
  o Hands-On Exercise
  o How MapReduce Works
  o Hands-On Exercise
  o Anatomy of a Hadoop Cluster
  o Other Hadoop Ecosystem Components

• Writing a MapReduce Program
  o The MapReduce Flow
  o Examining a Sample MapReduce Program
  o Basic MapReduce API Concepts
  o The Driver Code
  o The Mapper
  o The Reducer
  o Hadoop’s Streaming API
  o Using Eclipse for Rapid Development
  o Hands-On Exercise

• Integrating Hadoop Into The Workflow
  o Relational Database Management Systems
  o Storage Systems
  o Creating workflows with Oozie
  o Importing Data from RDBMSs With Sqoop
  o Hands-On Exercise
  o Importing Real-Time Data with Flume
  o Accessing HDFS Using FuseDFS and Hoop

• Delving Deeper Into The Hadoop API
  o Using Combiners
  o Using LocalJobRunner Mode for Faster Development
  o Reducing Intermediate Data with Combiners
  o The configure and close methods for MapReduce Setup and Teardown
  o Writing Partitioners for Better Load Balancing
  o Directly Accessing HDFS
  o Using The Distributed Cache
  o Hands-On Exercise

• Using Hive and Pig
  o Hive Basics
  o Pig Basics
  o Hands-On Exercise

• Common MapReduce Algorithms
  o Sorting and Searching
  o Indexing
  o Machine Learning with Mahout
  o Term Frequency - Inverse Document Frequency
  o Word Co-Occurrence
  o Hands-On Exercise

• Practical Development Tips and Techniques
  o Testing with MRUnit
  o Debugging MapReduce Code
  o Using LocalJobRunner Mode for Easier Debugging
  o Eclipse development techniques
  o Retrieving Job Information with Counters
  o Logging
  o Splittable File Formats
  o Determining the Optimal Number of Reducers
  o Map-Only MapReduce Jobs
  o Implementing Multiple Mappers using ChainMapper
  o Hands-On Exercise

• More Advanced MapReduce Programming
  o Custom Writables and WritableComparables
  o Saving Binary Data using SequenceFiles and Avro Files
  o Creating InputFormats and OutputFormats
  o Hands-On Exercise

• Joining Data Sets in MapReduce Jobs
  o Map-Side Joins
  o The Secondary Sort
  o Reduce-Side Joins
  o Hands-On Exercise

• Graph Manipulation in Hadoop
  o Introduction to graph techniques
  o Representing Graphs in Hadoop
  o Implementing a sample algorithm: Single Source Shortest Path

• Creating Workflows with Oozie
  o The Motivation for Oozie
  o Oozie’s Workflow Definition Format
  o Hands-On Exercise

• Cloudera Certified Developer for Apache Hadoop (CCDH) exam


Cloudera Certified Developer for Apache Hadoop (CCDH)


Establish yourself as a trusted and valuable resource by completing the online certification exam for Apache Hadoop developers.
The exam is demanding and is designed to test your fluency with concepts and terminology in the following areas:

Computing Environment: The current mix of computing resources and demands that motivates use of a technology like Apache Hadoop

Hadoop Distributed File System: How files are stored and managed in HDFS; the infrastructure that supports HDFS

MapReduce: The phases of execution and framework for running a MapReduce job. Expected properties of job runs based on number of mappers, number of reducers and distribution of data

Hadoop API: The Java classes that make up the API for developers who wish to write Apache Hadoop MapReduce jobs

Hadoop Platform: The basic purpose, design and operation of tools that augment the Apache Hadoop core to make a comprehensive platform, including Hadoop Streaming, fuse-dfs, Apache Hive, Apache Pig, Apache Flume, Apache Sqoop, Apache HBase, Apache Oozie and HUE
