September 2015
Hortonworks University
Hortonworks University provides an immersive and valuable real-world experience in scenario-based training courses.
Hadoop Certification
Join an exclusive group of professionals with demonstrated skills and the qualifications to
prove it. Hortonworks certified professionals are recognized as leaders in the field.
Access the Self-Paced Learning Library today

The self-paced learning library accelerates time to Hadoop competency. In addition, the learning library content is constantly being expanded, with new content added on an ongoing basis.

Access to the Hortonworks Self-Paced Learning Library is included as part of the Hortonworks Enterprise, Enterprise Plus & Premiere subscriptions for each named Support Contact. Additional Self-Paced Learning Library subscriptions can be purchased on a per-user basis for individuals who are not named Support Contacts.

Target Audience
The Hortonworks University Self-Paced Learning Library is designed for those new to Hadoop, as well as architects, developers, analysts, data scientists, and IT decision makers - essentially anyone with a need or desire to learn more about Hadoop and the Hortonworks Data Platform.

Prerequisites: None.

Hortonworks University
Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop deployments.
HDP Overview: Apache Hadoop Essentials

Overview
This course provides a technical overview of Apache Hadoop. It includes high-level information about concepts, architecture, operation, and uses of the Hortonworks Data Platform (HDP) and the Hadoop ecosystem. The course provides an optional primer for those who plan to attend a hands-on, instructor-led course.

Duration
8 Hours, Online.

Target Audience
Data architects, data integration architects, managers, C-level executives, decision makers, technical infrastructure team, and Hadoop administrators or developers who want to understand the fundamentals of Big Data and the Hadoop ecosystem.

Prerequisites
No previous Hadoop or programming knowledge is required. Students will need browser access to the Internet.

Course Objectives
• Describe what makes data "Big Data"
• List data types stored and analyzed in Hadoop
• Describe how Big Data and Hadoop fit into your current infrastructure and environment
• Describe fundamentals of:
   o the Hadoop Distributed File System (HDFS)
   o YARN
   o MapReduce
   o Hadoop frameworks (Pig, Hive, HCatalog, Storm, Solr, Spark, HBase, Oozie, Ambari, ZooKeeper, Sqoop, Flume, and Falcon)
• Recognize use cases for Hadoop
• Describe the business value of Hadoop
• Describe new technologies like Tez and the Knox Gateway

Format
• 100% self-paced, online exploration (for employees, partners or support subscription customers), or
• 100% instructor-led discussion

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.
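The map-shuffle-reduce flow behind several of these fundamentals can be seen in miniature. The sketch below is an illustrative pure-Python toy word count, not Hadoop code; a real job distributes each of these phases across the cluster:

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle/sort: group all emitted values by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["Hadoop stores data in HDFS", "YARN schedules Hadoop jobs"]
counts = reduce_phase(shuffle(map_phase(lines)))
```

A combiner, covered in the developer courses below, is simply this same reduce function run early on each mapper's local output to shrink the data sent across the network.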
Overview
This course is designed for big data analysts who want to use the HBase NoSQL database, which runs on top of HDFS to provide real-time read/write access to sparse datasets. Topics include HBase architecture, services, installation, and schema design.

Duration
2 days

Target Audience
Architects, software developers, and analysts responsible for implementing non-SQL databases in order to handle sparse data sets commonly found in big data use cases.

Prerequisites
Students must have basic familiarity with data management systems. Familiarity with Hadoop or databases is helpful but not required. Students new to Hadoop are encouraged to attend the HDP Overview: Apache Hadoop Essentials course.

Course Objectives
• How HBase integrates with Hadoop and HDFS
• Architectural components and core concepts of HBase
• HBase functionality
• Installing and configuring HBase
• HBase schema design
• Importing and exporting data
• Backup and recovery
• Monitoring and managing HBase
• How Apache Phoenix works with HBase
• How HBase integrates with Apache ZooKeeper
• HBase services and data operations
• Optimizing HBase access

Format
35% Lecture/Discussion
65% Hands-on Labs

Hands-On Labs
• Using Hadoop and MapReduce
• Using HBase
• Importing Data from MySQL to HBase
• Using Apache ZooKeeper
• Examining Configuration Files
• Using Backup and Snapshot
• HBase Shell Operations
• Creating Tables with Multiple Column Families
• Exploring HBase Schema
• Blocksize and Bloom Filters
• Exporting Data
• Using a Java Data Access Object Application to Interact with HBase
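One lab topic above, Bloom filters, rests on a single idea that is easy to demystify outside HBase: a bit array plus several hash functions that can say "definitely absent" but only "possibly present". The sketch below is a toy pure-Python filter; the size and hash scheme are illustrative, not HBase's implementation, which uses the same idea to skip store files that cannot contain a requested row key:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash functions set k bits per added key."""

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive k independent positions from a single stable hash.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means definitely absent; True means only *possibly* present.
        return all(self.bits[pos] for pos in self._positions(key))
```

Because a lookup that returns False skips the disk read entirely, enabling Bloom filters on a column family trades a little memory for fewer wasted block reads on sparse data.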
HDP Developer: Java

Overview
This advanced course provides Java programmers a deep-dive into Hadoop application development. Students will learn how to design and develop efficient and effective MapReduce applications for Hadoop using the Hortonworks Data Platform, including how to implement combiners, partitioners, secondary sorts, custom input and output formats, joining large datasets, unit testing, and developing UDFs for Pig and Hive. Labs are run on a 7-node HDP 2.1 cluster running in a virtual machine that students can keep for use after the training.

Duration
4 days

Target Audience
Experienced Java software engineers who need to develop Java MapReduce applications for Hadoop.

Prerequisites
Students must have experience developing Java applications and using a Java IDE. Labs are completed using the Eclipse IDE and Gradle. No prior Hadoop knowledge is required.

Course Objectives
• Describe Hadoop 2 and the Hadoop Distributed File System
• Describe the YARN framework
• Develop and run a Java MapReduce application on YARN
• Use combiners and in-map aggregation
• Write a custom partitioner to avoid data skew on reducers
• Perform a secondary sort
• Recognize use cases for built-in input and output formats
• Write a custom MapReduce input and output format
• Optimize a MapReduce job
• Configure MapReduce to optimize mappers and reducers
• Develop a custom RawComparator class
• Distribute files as LocalResources
• Describe and perform join techniques in Hadoop
• Perform unit tests using the UnitMR API
• Describe the basic architecture of HBase
• Write an HBase MapReduce application
• List use cases for Pig and Hive
• Write a simple Pig script to explore and transform big data
• Write a Pig UDF (User-Defined Function) in Java
• Write a Hive UDF in Java
• Use the JobControl class to create a MapReduce workflow
• Use Oozie to define and schedule workflows

Format
50% Lecture/Discussion
50% Hands-on Labs

Hands-On Labs
• Configuring a Hadoop Development Environment
• Putting data into HDFS using Java
• Writing a distributed grep MapReduce application
• Writing an inverted index MapReduce application
• Configuring and using a combiner
• Writing custom combiners and partitioners
• Globally sorting output using the TotalOrderPartitioner
• Writing a MapReduce job to sort data using a composite key
• Writing a custom InputFormat class
• Writing a custom OutputFormat class
• Computing a simple moving average of stock price data
• Using data compression
• Defining a RawComparator
• Performing a map-side join
• Using a Bloom filter
• Unit testing a MapReduce job
• Importing data into HBase
• Writing an HBase MapReduce job
• Writing User-Defined Pig and Hive functions
• Defining an Oozie workflow
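Secondary sort, one of the trickier objectives above, can be modeled outside Hadoop: build a composite key, sort on all of it, but partition and group on only its natural part, so the reducer receives each group's values already ordered. A toy Python model follows (in the course itself this is done in Java with a custom partitioner, sort comparator, and grouping comparator; the stock symbols and dates are invented sample data):

```python
from itertools import groupby

# Each record: (symbol, trade date, price). We want one group per symbol,
# with that symbol's dates arriving in order.
trades = [
    ("HDP", "2015-03-02", 24.10),
    ("AAPL", "2015-03-01", 129.50),
    ("HDP", "2015-03-01", 23.75),
    ("AAPL", "2015-03-02", 128.40),
]

def composite_key(trade):
    symbol, date, _price = trade
    return (symbol, date)      # the sort comparator sees both parts

def natural_key(trade):
    return trade[0]            # the partitioner/grouper sees only the symbol

# Sort on the composite key, then group contiguous records by natural key,
# which is exactly what a grouping comparator does on the reduce side.
ordered = sorted(trades, key=composite_key)
groups = {sym: [t[1] for t in grp]
          for sym, grp in groupby(ordered, key=natural_key)}
```

The framework never sorts values inside the reducer; the trick is that the shuffle's own sort, applied to the widened key, does the work for free.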
Overview
This course is designed for developers who want to create custom YARN applications for Apache Hadoop. It covers the YARN architecture, YARN development steps, writing a YARN client and ApplicationMaster, and launching Containers. The course uses Eclipse and Gradle connected remotely to a 7-node HDP cluster running in a virtual machine.

Duration
2 days

Target Audience
Java software engineers who need to develop YARN applications on Hadoop by writing YARN clients and ApplicationMasters.

Prerequisites
Students should be experienced Java developers who have attended HDP Developer: Java or HDP Developer: Pig and Hive, or who are experienced with Hadoop and MapReduce development.

Course Objectives
• Describe the YARN architecture
• Describe the YARN application lifecycle
• Write a YARN client application
• Run a YARN application on a Hadoop cluster
• Monitor the status of a running YARN application
• View the aggregated logs of a YARN application
• Configure a ContainerLaunchContext
• Use a LocalResource to share application files across a cluster
• Write a YARN ApplicationMaster
• Describe the differences between synchronous and asynchronous ApplicationMasters
• Allocate Containers in a cluster
• Launch Containers on NodeManagers
• Write a custom Container to perform specific business logic
• Explain the job schedulers of the ResourceManager
• Define queues for the Capacity Scheduler

Format
50% Lecture/Discussion
50% Hands-on Labs

Hands-On Labs
• Running a YARN Application
• Setting Up a YARN Development Environment
• Writing a YARN Client
• Submitting an ApplicationMaster
• Writing an ApplicationMaster
• Requesting Containers
• Running Containers
• Writing Custom Containers
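The application lifecycle covered above follows one choreography: a client submits an application, the ResourceManager starts its ApplicationMaster in a first container, and the AM then negotiates worker containers. The following is a loose toy model in Python of just that handshake (class names echo YARN roles, but registration, heartbeats, and NodeManagers are all omitted; slot counts are invented):

```python
class ResourceManager:
    """Toy RM: tracks free container slots for one cluster."""

    def __init__(self, cluster_slots):
        self.free_slots = cluster_slots
        self.apps = {}

    def submit_application(self, app_id, am):
        # The RM spends one container on the AM itself, then starts it.
        self.free_slots -= 1
        self.apps[app_id] = am
        return am.run(self)

    def allocate(self, requested):
        # Grant as many containers as the cluster can currently supply.
        granted = min(requested, self.free_slots)
        self.free_slots -= granted
        return ["container-%d" % i for i in range(granted)]

class ApplicationMaster:
    """Toy AM: asks the RM for workers and 'launches' each grant."""

    def __init__(self, containers_needed):
        self.containers_needed = containers_needed

    def run(self, rm):
        containers = rm.allocate(self.containers_needed)
        return ["launched %s" % c for c in containers]

rm = ResourceManager(cluster_slots=4)
result = rm.submit_application("app-01", ApplicationMaster(containers_needed=5))
# The AM's own container leaves 3 free slots, so only 3 workers launch.
```

The point the course drills into is visible even here: the AM, not the client, owns the negotiation, and it must cope with receiving fewer containers than it asked for.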
Overview
This course provides a technical introduction to the fundamentals of Apache Storm and Trident, including the concepts, terminology, architecture, installation, operation, and management of Storm and Trident. Simple Storm and Trident code excerpts are provided throughout the course. The course also includes an introduction to, and code samples for, Apache Kafka, a messaging system that is commonly used in concert with Storm and Trident.

Duration
Approximately 2 days

Target Audience
Data architects, data integration architects, technical infrastructure team, and Hadoop administrators or developers who want to understand the fundamentals of Storm and Trident.

Prerequisites
No previous Hadoop or programming knowledge is required. Students will need browser access to the Internet.

Course Objectives
• Recognize differences between batch and real-time data processing
• Define Storm elements, including tuples, streams, spouts, topologies, worker processes, executors, and stream groupings
• Explain and install Storm architectural components, including Nimbus, Supervisors, and the ZooKeeper cluster
• Recognize and interpret Java code for a spout, bolt, or topology
• Identify how to develop and submit a topology to a local or remote distributed cluster
• Recognize and explain the differences between reliable and unreliable Storm operation
• Manage and monitor Storm using the command-line client or the browser-based Storm User Interface (UI)
• Define Kafka topics, producers, consumers, and brokers
• Publish Kafka messages to Storm or Trident topologies
• Define Trident elements, including tuples, streams, batches, partitions, topologies, Trident spouts, and operations
• Recognize and interpret the code for Trident operations, including filters, functions, aggregations, merges, and joins
• Recognize the differences between the different types of Trident state
• Identify how Trident state supports exactly-once processing semantics and idempotent operation
• Recognize the differences in fault tolerance between different types of Trident spouts
• Recognize and interpret the code for Trident state-based operations

Format
Self-paced, online exploration, or instructor-led exploration and discussion
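The spout/bolt vocabulary above maps onto a simple dataflow shape. The toy Python model below wires a sentence spout through a split bolt into a counting bolt (real Storm spreads these across worker processes connected by stream groupings; here the "stream" is just a generator, and the sentences are invented samples):

```python
def sentence_spout():
    """Spout: the source of the stream. Storm tuples are named value
    lists; here each tuple is a 1-tuple holding a sentence."""
    for sentence in ["storm processes streams", "kafka feeds storm"]:
        yield (sentence,)

def split_bolt(stream):
    """Bolt: consumes sentence tuples, emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)

class CountBolt:
    """Terminal bolt: accumulates a running count per word."""

    def __init__(self):
        self.counts = {}

    def execute(self, stream):
        for (word,) in stream:
            self.counts[word] = self.counts.get(word, 0) + 1
        return self.counts

counts = CountBolt().execute(split_bolt(sentence_spout()))
```

In a real topology the word stream would be fields-grouped on the word so every occurrence of "storm" reaches the same CountBolt task, which is why the count stays consistent when the bolt runs as many parallel executors.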
HDP Operations: Hadoop Administration 1

Overview
This course is designed for administrators who will be managing the Hortonworks Data Platform (HDP) 2.3 with Ambari. It covers installation, configuration, and other typical cluster maintenance tasks.

Duration
4 days

Target Audience
IT administrators and operators responsible for installing, configuring and supporting an HDP 2.3 deployment in a Linux environment using Ambari.

Prerequisites
Attendees should be familiar with Hadoop and Linux environments.

Course Objectives
• Summarize an enterprise environment, including Big Data, Hadoop and the Hortonworks Data Platform (HDP)
• Install HDP
• Manage Ambari Users and Groups
• Manage Hadoop Services
• Use HDFS Storage
• Manage HDFS Storage
• Configure HDFS Storage
• Configure HDFS Transparent Data Encryption
• Configure the YARN ResourceManager
• Submit YARN Jobs
• Configure the YARN Capacity Scheduler
• Add and Remove Cluster Nodes
• Configure HDFS and YARN Rack Awareness
• Configure HDFS and YARN High Availability
• Monitor a Cluster
• Protect a Cluster with Backups

Format
60% Lecture/Discussion
40% Hands-on Labs

Hands-On Labs
• Introduction to the Lab Environment
• Performing an Interactive Ambari HDP Cluster Installation
• Configuring Ambari Users and Groups
• Managing Hadoop Services
• Using HDFS Files and Directories
• Using WebHDFS
• Configuring HDFS ACLs
• Managing HDFS
• Managing HDFS Quotas
• Configuring HDFS Transparent Data Encryption
• Configuring and Managing YARN
• Non-Ambari YARN Management
• Configuring YARN Failure Sensitivity, Work-Preserving Restarts, and Log Aggregation Settings
• Submitting YARN Jobs
• Configuring Different Workload Types
• Configuring Users and Groups for YARN Labs
• Configuring YARN Resource Behavior and Queues
• User, Group and Fine-Tuned Resource Management
• Adding Worker Nodes
• Configuring Rack Awareness
• Configuring HDFS High Availability
• Configuring YARN High Availability
• Configuring and Managing Ambari Alerts
• Configuring and Managing HDFS Snapshots
• Using Distributed Copy (DistCp)
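Rack awareness, configured in one of the labs above, drives where HDFS puts replicas: with the default replication factor of 3, one copy stays on the writer's node, and the other two land on a different rack for fault tolerance. A toy Python sketch of that default placement policy (the two-rack topology and node names are invented; real HDFS reads the topology from a configured rack-mapping script):

```python
# Invented cluster topology: rack name -> nodes on that rack.
topology = {
    "rack1": ["node1", "node2"],
    "rack2": ["node3", "node4"],
}

def place_replicas(writer_node):
    """Default HDFS policy in miniature: replica 1 on the writer's node,
    replicas 2 and 3 on two nodes of a single remote rack."""
    rack_of = {n: r for r, nodes in topology.items() for n in nodes}
    local_rack = rack_of[writer_node]
    remote_rack = next(r for r in topology if r != local_rack)
    second, third = topology[remote_rack][:2]
    return [writer_node, second, third]

replicas = place_replicas("node1")
```

The payoff of the layout is that losing an entire rack still leaves at least one live replica, while keeping two replicas rack-local bounds the cross-rack write traffic.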
Overview
This course is designed for administrators who are familiar with administering other Hadoop distributions and are migrating to the Hortonworks Data Platform (HDP). It covers installation, configuration, maintenance, security and performance topics.

Duration
2 days

Target Audience
Experienced Hadoop administrators and operators responsible for installing, configuring and supporting the Hortonworks Data Platform.

Prerequisites
Attendees should be familiar with Hadoop fundamentals, have experience administering a Hadoop cluster, and have experience installing and configuring Hadoop components such as Sqoop, Flume, Hive, Pig and Oozie.

Course Objectives
• Install and configure an HDP 2.x cluster
• Use Ambari to monitor and manage a cluster
• Mount HDFS to a local filesystem using the NFS Gateway
• Configure Hive for Tez
• Use Ambari to configure the schedulers of the ResourceManager
• Commission and decommission worker nodes using Ambari
• Use Falcon to define and process data pipelines
• Take snapshots using the HDFS snapshot feature
• Implement and configure NameNode HA using Ambari
• Secure an HDP cluster using Ambari
• Set up a Knox gateway

Format
50% Lecture/Discussion
50% Hands-on Labs

Hands-On Labs
• Installing HDP 2.x using Ambari
• Adding a new node to the cluster
• Stopping and starting HDP services
• Mounting HDFS to a local file system
• Configuring the capacity scheduler
• Using WebHDFS
• Dataset mirroring using Falcon
• Commissioning and decommissioning a worker node using Ambari
• Using HDFS snapshots
• Configuring NameNode HA using Ambari
• Securing an HDP cluster using Ambari
• Setting up a Knox gateway
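The capacity scheduler lab above centers on one configuration invariant: the capacities of sibling queues at each level of the queue tree are percentages of their parent and must sum to 100. A small Python sketch of that check (the queue names and percentages are illustrative, not a real Ambari configuration):

```python
# Invented queue tree: parent queue -> {child queue: capacity percent}.
queues = {
    "root": {"engineering": 60, "analytics": 40},
    "root.engineering": {"batch": 70, "adhoc": 30},
}

def capacities_valid(tree):
    """Sibling capacities under every parent must total exactly 100%."""
    return all(sum(children.values()) == 100 for children in tree.values())

ok = capacities_valid(queues)
```

Validating this before pushing a scheduler change is worthwhile because a tree whose siblings do not total 100 is rejected when the ResourceManager refreshes its queues.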
Overview
This course is designed for administrators who will be installing, configuring and managing HBase clusters. It covers installation with Ambari, configuration, security and troubleshooting of HBase implementations. The course includes an end-of-course project in which students work together to design and implement an HBase schema.

Duration
4 days

Target Audience
Architects, software developers, and analysts responsible for implementing non-SQL databases in order to handle sparse data sets commonly found in big data use cases.

Prerequisites
Students must have basic familiarity with data management systems. Familiarity with Hadoop or databases is helpful but not required. Students new to Hadoop are encouraged to take the HDP Overview: Apache Hadoop Essentials course.

Course Objectives
• Hadoop Primer
   o Hadoop, Hortonworks, and Big Data
   o HDFS and YARN
• Discussion: Running Applications in the Cloud
• Apache HBase Overview
• Provisioning the Cluster
• Using the HBase Shell
• Ingesting Data
• Operational Management
• Backup and Recovery
• Security
• Monitoring HBase and Diagnosing Problems
• Maintenance
• Troubleshooting

Format
50% Lecture/Discussion
50% Hands-on Labs
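A recurring decision in the end-of-course schema design project is the row key. Because HBase stores rows sorted by key, a monotonically increasing key (timestamps, sequence numbers) funnels every write to the last region. One common remedy is salting: prefix the key with a stable hash-derived bucket. A Python sketch of the technique (the bucket count and event IDs are illustrative; real schemas size the bucket count to the number of regions):

```python
import zlib

BUCKETS = 4  # illustrative; typically matched to the region count

def salted_key(event_id: str) -> str:
    """Prefix a stable hash-based salt so sequential IDs spread across
    regions instead of hotspotting the newest one."""
    bucket = zlib.crc32(event_id.encode()) % BUCKETS
    return "%d-%s" % (bucket, event_id)

# Sequential IDs now scatter across BUCKETS distinct key prefixes.
keys = sorted(salted_key("event-%05d" % i) for i in range(100))
```

The trade-off the project asks students to weigh is that salting spreads writes but turns a single-range scan into one scan per bucket, since consecutive events no longer sort adjacently.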
Exam Objectives
View the complete list of objectives below, which includes links to
the corresponding documentation and/or other resources.