
Training Catalog

Apache Hadoop Training from the Experts

Summer 2015 Training Catalog

September 2015
Hortonworks University
Hortonworks University provides an immersive and valuable real-world experience in scenario-based
training courses.

• Public, private on-site, and virtual-led courses
• Self-paced learning library
• Global delivery capability
• Performance-based certifications
• Academic program
• Industry-leading lecture and hands-on labs
• High-definition instructor-led training
• Individualized learning paths
Learning Paths

Hadoop Certification
Join an exclusive group of professionals with demonstrated skills and the qualifications to
prove it. Hortonworks certified professionals are recognized as leaders in the field.

Hortonworks Certified Developer:
• HDP Certified Developer (HDPCD)
• HDP Certified Developer: Java (HDPCD: Java)

Hortonworks Certified Administrator:
• HDP Certified Administrator (HDPCA)
Table of Contents

Hortonworks University Self-Paced Learning Library 7
HDP Overview: Essentials 8
HDP Analyst: Data Science 9
HDP Analyst: HBase Essentials 10
HDP Developer: Java 11
HDP Developer: Apache Pig and Hive
HDP Developer: Windows 12
HDP Developer: YARN 13
HDP Developer: Storm and Trident Fundamentals 14
HDP Operations: Hadoop Administration 1 15
HDP Operations: Migrating to the Hortonworks Data Platform 16
HDP Operations: Apache HBase Advanced Management 17
HDP Certified Administrator (HDPCA) 19
HDP Certified Developer (HDPCD) 20
Hortonworks University Academic Program 21
HDP Academic Analyst: Data Science 22
HDP Academic Developer: Apache Pig and Hive 23
HDP Academic Operations: Install and Manage with Apache Ambari 24
Hortonworks University Self-Paced Learning Library

Overview
The Hortonworks University Self-Paced Learning Library is an on-demand learning library that is accessed using a Hortonworks University account. Learners can view lessons anywhere, at any time, and complete lessons at their own pace. Lessons can be stopped and started, as needed, and completion is tracked via the Hortonworks University Learning Management System.

This learning library makes it easy for Hadoop administrators, data analysts, and developers to continuously learn and stay up-to-date on the Hortonworks Data Platform.

Hortonworks University courses are designed and developed by Hadoop experts and provide an immersive and valuable real-world experience. In our scenario-based training courses, we offer unmatched depth and expertise. We prepare you to be an expert with highly valued, practical skills and prepare you to successfully complete Hortonworks Technical Certifications.

The self-paced learning library accelerates time to Hadoop competency. In addition, the library is constantly being expanded, with new content added on an ongoing basis.

Target Audience
The Hortonworks University Self-Paced Learning Library is designed for those new to Hadoop, as well as architects, developers, analysts, data scientists, and IT decision makers: essentially anyone with a need or desire to learn more about Apache Hadoop and the Hortonworks Data Platform.

Prerequisites: None.

Self-Paced Learning Content includes:
• HDP Overview: Essentials
• HDP Developer: Apache Pig & Hive
• HDP Developer: Java
• HDP Developer: Windows
• HDP Developer: Developing Custom YARN Applications
• HDP Operations: Install and Manage with Apache Ambari
• HDP Operations: Migrating to the Hortonworks Data Platform
• HDP Analyst: Data Science
• HDP Analyst: HBase Essentials

Duration
Access to the Hortonworks University Self-Paced Learning Library is provided for a 12-month period per individual named user. The subscription includes access to over 200 hours of learning lessons.

Access the Self-Paced Learning Library today
Access to the Hortonworks Self-Paced Learning Library is included as part of the Hortonworks Enterprise, Enterprise Plus & Premiere Subscriptions for each named Support Contact. Additional Self-Paced Learning Library subscriptions can be purchased on a per-user basis for individuals who are not named Support Contacts.

Hortonworks University
Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop deployments.

For more information please contact trainingops@hortonworks.com

About Hortonworks
Hortonworks develops, distributes and supports the only 100 percent open source distribution of Apache Hadoop explicitly architected, built and tested for enterprise-grade deployments.

US: 1.855.846.7866
International: +1.408.916.4121
www.hortonworks.com
5470 Great America Parkway
Santa Clara, CA 95054 USA
HDP Overview: Apache Hadoop Essentials

Overview
This course provides a technical overview of Apache Hadoop. It includes high-level information about concepts, architecture, operation, and uses of the Hortonworks Data Platform (HDP) and the Hadoop ecosystem. The course provides an optional primer for those who plan to attend a hands-on, instructor-led course.

Course Objectives
• Describe what makes data "Big Data"
• List data types stored and analyzed in Hadoop
• Describe how Big Data and Hadoop fit into your current infrastructure and environment
• Describe fundamentals of:
  o the Hadoop Distributed File System (HDFS)
  o YARN
  o MapReduce
  o Hadoop frameworks (Pig, Hive, HCatalog, Storm, Solr, Spark, HBase, Oozie, Ambari, ZooKeeper, Sqoop, Flume, and Falcon)
• Recognize use cases for Hadoop
• Describe the business value of Hadoop
• Describe new technologies like Tez and the Knox Gateway

Hands-On Labs
• There are no labs for this course.

Duration
8 hours, online.

Target Audience
Data architects, data integration architects, managers, C-level executives, decision makers, technical infrastructure team, and Hadoop administrators or developers who want to understand the fundamentals of Big Data and the Hadoop ecosystem.

Prerequisites
No previous Hadoop or programming knowledge is required. Students will need browser access to the Internet.

Format
• 100% self-paced, online exploration (for employees, partners or support subscription customers), or
• 100% instructor-led discussion

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University
Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.
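The HDFS, YARN and MapReduce fundamentals this course introduces can be sketched with a small, purely local Python illustration of the MapReduce model: a map phase emits (word, 1) pairs, a shuffle groups them by key, and a reduce phase sums the counts. This is an illustrative sketch only, not Hadoop code or course material.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Shuffle + reduce: group pairs by word and sum the counts."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

lines = ["Hadoop stores data in HDFS", "YARN schedules Hadoop jobs"]
print(dict(reduce_phase(map_phase(lines))))  # "hadoop" appears twice
```

On a real cluster the map and reduce phases run as distributed tasks scheduled by YARN over data stored in HDFS; the local `sorted` call here stands in for Hadoop's shuffle-and-sort stage.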

HDP Analyst: Data Science

Overview
This course is designed for students preparing to become familiar with the processes and practice of data science, including machine learning and natural language processing. Included are: tools and programming languages (Python, IPython, Mahout, Pig, NumPy, Pandas, SciPy, Scikit-learn), the Natural Language Toolkit (NLTK), and Spark MLlib.

Target Audience
Computer science and data analytics students who need to apply data science and machine learning on Hadoop.

Course Objectives
• Recognize use cases for data science
• Describe the architecture of Hadoop and YARN
• Describe supervised and unsupervised learning differences
• List the six machine learning tasks
• Use Mahout to run a machine learning algorithm on Hadoop
• Use Pig to transform and prepare data on Hadoop
• Write a Python script
• Use NumPy to analyze big data
• Use the data structure classes in the pandas library
• Write a Python script that invokes SciPy machine learning
• Describe options for running Python code on a Hadoop cluster
• Write a Pig User-Defined Function in Python
• Use Pig streaming on Hadoop with a Python script
• Write a Python script that invokes scikit-learn
• Use the k-nearest neighbor algorithm to predict values
• Run a machine learning algorithm on a distributed data set
• Describe use cases for Natural Language Processing (NLP)
• Perform sentence segmentation on a large body of text
• Perform part-of-speech tagging
• Use the Natural Language Toolkit (NLTK)
• Describe the components of a Spark application
• Write a Spark application in Python
• Run machine learning algorithms using Spark MLlib

Hands-On Labs
• Setting Up a Development Environment
• Using HDFS Commands
• Using Mahout for Machine Learning
• Getting Started with Pig
• Exploring Data with Pig
• Using the IPython Notebook
• Data Analysis with Python
• Interpolating Data Points
• Define a Pig UDF in Python
• Streaming Python with Pig
• K-Nearest Neighbor and K-Means Clustering
• K-Means Clustering
• Using NLTK for Natural Language Processing
• Classifying Text using Naive Bayes
• Spark Programming and Spark MLlib
• Running Data Science Algorithms using Spark MLlib

Prerequisites
Students must have experience with at least one programming or scripting language, knowledge in statistics and/or mathematics, and a basic understanding of big data and Hadoop principles.

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University
Hortonworks University is your expert source for Apache Hadoop training and certification. Courses are available for developers, data analysts and administrators. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.
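The k-nearest-neighbor prediction task covered in this course can be sketched in a few lines of plain Python. In class the same idea is applied with scikit-learn and Spark MLlib on larger data; the data points and function name below are purely illustrative.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of ((x, y), label) pairs.
    Returns the majority label among the k points nearest to query."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))
    labels = [label for _, label in nearest[:k]]
    return Counter(labels).most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"),
         ((6, 5), "b"), ((1, 0), "a")]
print(knn_predict(train, (0.5, 0.5)))  # the 3 nearest neighbors are all "a"
```

`math.dist` requires Python 3.8+; for higher-dimensional features the same logic applies unchanged, since `math.dist` accepts points of any matching dimension.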

HDP Analyst: Apache HBase Essentials

Overview
This course is designed for big data analysts who want to use the HBase NoSQL database, which runs on top of HDFS to provide real-time read/write access to sparse datasets. Topics include HBase architecture, services, installation and schema design.

Course Objectives
• How HBase integrates with Hadoop and HDFS
• Architectural components and core concepts of HBase
• HBase functionality
• Installing and configuring HBase
• HBase schema design
• Importing and exporting data
• Backup and recovery
• Monitoring and managing HBase
• How Apache Phoenix works with HBase
• How HBase integrates with Apache ZooKeeper
• HBase services and data operations
• Optimizing HBase access

Hands-On Labs
• Using Hadoop and MapReduce
• Using HBase
• Importing Data from MySQL to HBase
• Using Apache ZooKeeper
• Examining Configuration Files
• Using Backup and Snapshot
• HBase Shell Operations
• Creating Tables with Multiple Column Families
• Exploring HBase Schema
• Blocksize and Bloom Filters
• Exporting Data
• Using a Java Data Access Object Application to Interact with HBase

Duration
2 days

Target Audience
Architects, software developers, and analysts responsible for implementing non-SQL databases in order to handle sparse data sets commonly found in big data use cases.

Prerequisites
Students must have basic familiarity with data management systems. Familiarity with Hadoop or databases is helpful but not required. Students new to Hadoop are encouraged to attend the HDP Overview: Apache Hadoop Essentials course.

Format
35% Lecture/Discussion
65% Hands-on Labs

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University
Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.
HDP Developer: Java

Overview
This advanced course provides Java programmers a deep-dive into Hadoop application development. Students will learn how to design and develop efficient and effective MapReduce applications for Hadoop using the Hortonworks Data Platform, including how to implement combiners, partitioners, secondary sorts, custom input and output formats, joining large datasets, unit testing, and developing UDFs for Pig and Hive. Labs are run on a 7-node HDP 2.1 cluster running in a virtual machine that students can keep for use after the training.

Duration
4 days

Target Audience
Experienced Java software engineers who need to develop Java MapReduce applications for Hadoop.

Course Objectives
• Describe Hadoop 2 and the Hadoop Distributed File System
• Describe the YARN framework
• Develop and run a Java MapReduce application on YARN
• Use combiners and in-map aggregation
• Write a custom partitioner to avoid data skew on reducers
• Perform a secondary sort
• Recognize use cases for built-in input and output formats
• Write a custom MapReduce input and output format
• Optimize a MapReduce job
• Configure MapReduce to optimize mappers and reducers
• Develop a custom RawComparator class
• Distribute files as LocalResources
• Describe and perform join techniques in Hadoop
• Perform unit tests using the UnitMR API
• Describe the basic architecture of HBase
• Write an HBase MapReduce application
• List use cases for Pig and Hive
• Write a simple Pig script to explore and transform big data
• Write a Pig UDF (User-Defined Function) in Java
• Write a Hive UDF in Java
• Use the JobControl class to create a MapReduce workflow
• Use Oozie to define and schedule workflows

Hands-On Labs
• Configuring a Hadoop Development Environment
• Putting data into HDFS using Java
• Write a distributed grep MapReduce application
• Write an inverted index MapReduce application
• Configure and use a combiner
• Writing custom combiners and partitioners
• Globally sort output using the TotalOrderPartitioner
• Writing a MapReduce job to sort data using a composite key
• Writing a custom InputFormat class
• Writing a custom OutputFormat class
• Compute a simple moving average of stock price data
• Use data compression
• Define a RawComparator
• Perform a map-side join
• Using a Bloom filter
• Unit testing a MapReduce job
• Importing data into HBase
• Writing an HBase MapReduce job
• Writing User-Defined Pig and Hive functions
• Defining an Oozie workflow

Prerequisites
Students must have experience developing Java applications and using a Java IDE. Labs are completed using the Eclipse IDE and Gradle. No prior Hadoop knowledge is required.

Format
50% Lecture/Discussion
50% Hands-on Labs

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University
Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.
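The in-map aggregation (combiner) technique this course implements in Java can be illustrated conceptually: instead of emitting one (word, 1) pair per word, a mapper aggregates counts locally so that far less data crosses the network during the shuffle. The sketch below is plain Python for illustration only; the course itself uses the Hadoop Java API.

```python
from collections import Counter

def mapper_with_combiner(lines):
    """Aggregate word counts locally before 'sending' them to reducers.
    Emitting (word, local_count) pairs instead of (word, 1) pairs is the
    essence of a combiner: it shrinks shuffle traffic."""
    local = Counter()
    for line in lines:
        local.update(word.lower() for word in line.split())
    return list(local.items())  # one pair per distinct word, not per word

print(mapper_with_combiner(["to be or not to be"]))
```

In real Hadoop jobs the same effect is achieved either by setting a combiner class on the job or by accumulating counts in the mapper and flushing them in `cleanup()`, both of which are covered in the labs above.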

HDP Developer: Apache Pig and Hive

Overview
This course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Pig and Hive. Topics include: Hadoop, YARN, HDFS, MapReduce, data ingestion, workflow definition, and using Pig and Hive to perform data analytics on Big Data. Labs are executed on a 7-node HDP cluster.

Duration
4 days

Target Audience
Software developers who need to understand and develop applications for Hadoop.

Course Objectives
• Describe Hadoop, YARN and use cases for Hadoop
• Describe Hadoop ecosystem tools and frameworks
• Describe the HDFS architecture
• Use the Hadoop client to input data into HDFS
• Transfer data between Hadoop and a relational database
• Explain the YARN and MapReduce architectures
• Run a MapReduce job on YARN
• Use Pig to explore and transform data in HDFS
• Understand how Hive tables are defined and implemented
• Use Hive to explore and analyze data sets
• Use the new Hive windowing functions
• Explain and use the various Hive file formats
• Create and populate a Hive table that uses ORC file formats
• Use Hive to run SQL-like queries to perform data analysis
• Use Hive to join datasets using a variety of techniques, including Map-side joins and Sort-Merge-Bucket joins
• Write efficient Hive queries
• Create ngrams and context ngrams using Hive
• Perform data analytics like quantiles and page rank on Big Data using the DataFu Pig library
• Explain the uses and purpose of HCatalog
• Use HCatalog with Pig and Hive
• Define a workflow using Oozie
• Schedule a recurring workflow using the Oozie Coordinator

Hands-On Labs
• Use HDFS commands to add/remove files and folders
• Use Sqoop to transfer data between HDFS and a RDBMS
• Run MapReduce and YARN application jobs
• Explore and transform data using Pig
• Split and join a dataset using Pig
• Use Pig to transform and export a dataset for use with Hive
• Use HCatLoader and HCatStorer
• Use Hive to discover useful information in a dataset
• Describe how Hive queries get executed as MapReduce jobs
• Perform a join of two datasets with Hive
• Use advanced Hive features: windowing, views, ORC files
• Use Hive analytics functions
• Write a custom reducer in Python
• Analyze and sessionize clickstream data
• Compute quantiles of NYSE stock prices
• Use Hive to compute ngrams on Avro-formatted files
• Define an Oozie workflow

Prerequisites
Students should be familiar with programming principles and have experience in software development. SQL knowledge is also helpful. No prior Hadoop knowledge is required.

Format
50% Lecture/Discussion
50% Hands-on Labs

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University
Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.
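One of the labs above writes a custom reducer in Python. A minimal sketch of that pattern, assuming the Hadoop Streaming convention of tab-separated "key\tvalue" lines delivered sorted by key: the reducer sums the values for each key. The exact fields used in the lab are an assumption here; only the general shape is shown.

```python
import io

def streaming_reduce(lines, out):
    """Sum integer values per key; input lines must be grouped by key,
    as Hadoop Streaming guarantees after its shuffle-and-sort phase."""
    current_key, total = None, 0
    for line in lines:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            if current_key is not None:
                out.write(f"{current_key}\t{total}\n")
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:  # flush the last key
        out.write(f"{current_key}\t{total}\n")

# In a real job this reads sys.stdin and writes sys.stdout; here we
# demonstrate with an in-memory buffer.
buf = io.StringIO()
streaming_reduce(["web\t1\n", "web\t4\n", "mobile\t2\n"], out=buf)
print(buf.getvalue())
```

On a cluster, a script like this is typically passed to the Hadoop Streaming jar via its `-reducer` option, with a matching `-mapper` script emitting the key/value pairs.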

HDP Developer: Windows

Overview
This course is designed for developers who create applications and analyze Big Data in Apache Hadoop on Windows using Pig and Hive. Topics include: Hadoop, YARN, the Hadoop Distributed File System (HDFS), MapReduce, Sqoop and the HiveODBC Driver.

Duration
4 days

Target Audience
Software developers who need to understand and develop applications for Hadoop 2.x on Windows.

Course Objectives
• Describe Hadoop and YARN
• Describe the Hadoop ecosystem
• List components and deployment options for HDP on Windows
• Describe the HDFS architecture
• Use the Hadoop client to input data into HDFS
• Transfer data between Hadoop and Microsoft SQL Server
• Describe the MapReduce and YARN architecture
• Run a MapReduce job on YARN
• Write a Pig script
• Define advanced Pig relations
• Use Pig to apply structure to unstructured Big Data
• Invoke a Pig User-Defined Function
• Use Pig to organize and analyze Big Data
• Describe how Hive tables are defined and implemented
• Use Hive windowing functions
• Define and use Hive file formats
• Create Hive tables that use the ORC file format
• Use Hive to run SQL-like queries to perform data analysis
• Use Hive to join datasets
• Create ngrams and context ngrams using Hive
• Perform data analytics
• Use HCatalog with Pig and Hive
• Install and configure the HiveODBC Driver for Windows
• Import data from Hadoop into Microsoft Excel
• Define a workflow using Oozie

Hands-On Labs
• Start HDP on Windows
• Add/remove files and folders from HDFS
• Transfer data between HDFS and Microsoft SQL Server
• Run a MapReduce job
• Using Pig to analyze data
• Retrieve HCatalog schemas from within a Pig script
• Using Hive tables and queries
• Advanced Hive features like windowing, views and ORC files
• Hive analytics functions using the Pig DataFu library
• Compute quantiles
• Use Hive to compute ngrams on Avro-formatted files
• Connect Microsoft Excel to Hadoop with the HiveODBC Driver
• Run a YARN application
• Define an Oozie workflow

Prerequisites
Students should be familiar with programming principles and have experience in software development. SQL knowledge and familiarity with Microsoft Windows is also helpful. No prior Hadoop knowledge is required.

Format
50% Lecture/Discussion
50% Hands-on Labs

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University
Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

HDP Developer: Custom YARN Applications

Overview
This course is designed for developers who want to create custom YARN applications for Apache Hadoop. It includes: the YARN architecture, YARN development steps, writing a YARN client and ApplicationMaster, and launching Containers. The course uses Eclipse and Gradle connected remotely to a 7-node HDP cluster running in a virtual machine.

Duration
2 days

Target Audience
Java software engineers who need to develop YARN applications on Hadoop by writing YARN clients and ApplicationMasters.

Prerequisites
Students should be experienced Java developers who have attended HDP Developer: Java OR HDP Developer: Apache Pig and Hive OR are experienced with Hadoop and MapReduce development.

Course Objectives
• Describe the YARN architecture
• Describe the YARN application lifecycle
• Write a YARN client application
• Run a YARN application on a Hadoop cluster
• Monitor the status of a running YARN application
• View the aggregated logs of a YARN application
• Configure a ContainerLaunchContext
• Use a LocalResource to share application files across a cluster
• Write a YARN ApplicationMaster
• Describe the differences between synchronous and asynchronous ApplicationMasters
• Allocate Containers in a cluster
• Launch Containers on NodeManagers
• Write a custom Container to perform specific business logic
• Explain the job schedulers of the ResourceManager
• Define queues for the Capacity Scheduler

Hands-On Labs
• Run a YARN Application
• Setup a YARN Development Environment
• Write a YARN Client
• Submit an ApplicationMaster
• Write an ApplicationMaster
• Requesting Containers
• Running Containers
• Writing Custom Containers

Format
50% Lecture/Discussion
50% Hands-on Labs

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University
Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

HDP Developer: Storm and Trident Fundamentals

Overview
This course provides a technical introduction to the fundamentals of Apache Storm and Trident, including the concepts, terminology, architecture, installation, operation, and management of Storm and Trident. Simple Storm and Trident code excerpts are provided throughout the course. The course also includes an introduction to, and code samples for, Apache Kafka. Apache Kafka is a messaging system that is commonly used in concert with Storm and Trident.

Duration
Approximately 2 days

Target Audience
Data architects, data integration architects, technical infrastructure team, and Hadoop administrators or developers who want to understand the fundamentals of Storm and Trident.

Prerequisites
No previous Hadoop or programming knowledge is required. Students will need browser access to the Internet.

Format
Self-paced, online exploration, or instructor-led exploration and discussion

Course Objectives
• Recognize differences between batch and real-time data processing
• Define Storm elements, including tuples, streams, spouts, topologies, worker processes, executors, and stream groupings
• Explain and install Storm architectural components, including Nimbus, Supervisors, and ZooKeeper cluster
• Recognize/interpret Java code for a spout, bolt, or topology
• Identify how to develop and submit a topology to a local or remote distributed cluster
• Recognize and explain the differences between reliable and unreliable Storm operation
• Manage and monitor Storm using the command-line client or browser-based Storm User Interface (UI)
• Define Kafka topics, producers, consumers, and brokers
• Publish Kafka messages to Storm or Trident topologies
• Define Trident elements, including tuples, streams, batches, partitions, topologies, Trident spouts, and operations
• Recognize and interpret the code for Trident operations, including filters, functions, aggregations, merges, and joins
• Recognize the differences between the different types of Trident state
• Identify how Trident state supports exactly-once processing semantics and idempotent operation
• Recognize the differences in fault tolerance between different types of Trident spouts
• Recognize and interpret the code for Trident state-based operations

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University
Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.
HDP Operations: Hadoop Administration 1
Overview Course Objectives
This course is designed for administrators who will be
managing the Hortonworks Data Platform (HDP) 2.3 with • Summarize and enterprise environment including Big Data,
Ambari. It covers installation, configuration, and other typical Hadoop and the Hortonworks Data Platform (HDP)
• Install HDP
cluster maintenance tasks.  
• Manage Ambari Users and Groups
• Manage Hadoop Services
Duration  
• Use HDFS Storage
4  days   • Manage HDFS Storage
• Configure HDFS Storage
• Configure HDFS Transparent Data Encryption
• Configure the YARN Resource Manager
• Submit YARN Jobs
• Configure the YARN Capacity Scheduler
• Add and Remove Cluster Nodes
• Configure HDFS and YARN Rack Awareness
• Configure HDFS and YARN High Availability
• Monitor a Cluster
• Protect a Cluster with Backups

Target Audience
IT administrators and operators responsible for installing, configuring and supporting an HDP 2.3 deployment in a Linux environment using Ambari.

Hands-On Labs
• Introduction to the Lab Environment
• Performing an Interactive Ambari HDP Cluster Installation
• Configuring Ambari Users and Groups
• Managing Hadoop Services
• Using HDFS Files and Directories
• Using WebHDFS
• Configuring HDFS ACLs
• Managing HDFS
• Managing HDFS Quotas
• Configuring HDFS Transparent Data Encryption
• Configuring and Managing YARN
• Non-Ambari YARN Management
• Configuring YARN Failure Sensitivity, Work Preserving Restarts, and Log Aggregation Settings
• Submitting YARN Jobs
• Configuring Different Workload Types
• Configuring Users and Groups for YARN Labs
• Configuring YARN Resource Behavior and Queues
• User, Group and Fine-Tuned Resource Management
• Adding Worker Nodes
• Configuring Rack Awareness
• Configuring HDFS High Availability
• Configuring YARN High Availability
• Configuring and Managing Ambari Alerts
• Configuring and Managing HDFS Snapshots
• Using Distributed Copy (DistCp)

Prerequisites
Attendees should be familiar with Hadoop and Linux environments.

Format
60% Lecture/Discussion
40% Hands-On Labs

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University
Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.
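The labs above include WebHDFS, the HDFS REST interface, where file-system operations map to HTTP requests of the form http://&lt;namenode&gt;:&lt;port&gt;/webhdfs/v1/&lt;path&gt;?op=... . As an illustrative sketch only (the helper name and hostname are hypothetical and not part of the course materials; 50070 is the usual HDP 2.x NameNode web UI port):

```python
from urllib.parse import urlencode

def webhdfs_url(host, port, path, op, user=None, **params):
    """Build a WebHDFS REST URL, e.g. op=LISTSTATUS to list a
    directory or op=OPEN to read a file (hypothetical helper)."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = {"op": op}
    if user:
        query["user.name"] = user  # WebHDFS pseudo-authentication
    query.update(params)
    return f"http://{host}:{port}/webhdfs/v1{path}?{urlencode(query)}"

# webhdfs_url("namenode.example.com", 50070, "/user/hdfs", "LISTSTATUS")
# → "http://namenode.example.com:50070/webhdfs/v1/user/hdfs?op=LISTSTATUS"
```

An HTTP GET on such a URL returns JSON status information (or, for OPEN, a redirect to a DataNode), which is why WebHDFS works from any HTTP client without Hadoop libraries installed.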

About Hortonworks
Hortonworks develops, distributes and supports the only 100 percent open source distribution of Apache Hadoop explicitly architected, built and tested for enterprise-grade deployments.

US: 1.855.846.7866
International: +1.408.916.4121
www.hortonworks.com
5470 Great America Parkway
Santa Clara, CA 95054 USA
HDP Operations: Migrating to the Hortonworks Data Platform

Overview
This course is designed for administrators who are familiar with administering other Hadoop distributions and are migrating to the Hortonworks Data Platform (HDP). It covers installation, configuration, maintenance, security and performance topics.

Duration
2 days

Target Audience
Experienced Hadoop administrators and operators responsible for installing, configuring and supporting the Hortonworks Data Platform.

Course Objectives
• Install and configure an HDP 2.x cluster
• Use Ambari to monitor and manage a cluster
• Mount HDFS to a local filesystem using the NFS Gateway
• Configure Hive for Tez
• Use Ambari to configure the ResourceManager schedulers
• Commission and decommission worker nodes using Ambari
• Use Falcon to define and process data pipelines
• Take snapshots using the HDFS snapshot feature
• Implement and configure NameNode HA using Ambari
• Secure an HDP cluster using Ambari
• Set up a Knox gateway

Hands-On Labs
• Install HDP 2.x using Ambari
• Add a new node to the cluster
• Stop and start HDP services
• Mount HDFS to a local file system
• Configure the capacity scheduler
• Use WebHDFS
• Mirror datasets using Falcon
• Commission and decommission a worker node using Ambari
• Use HDFS snapshots
• Configure NameNode HA using Ambari
• Secure an HDP cluster using Ambari
• Set up a Knox gateway

Prerequisites
Attendees should be familiar with Hadoop fundamentals, have experience administering a Hadoop cluster, and have installed and configured Hadoop components such as Sqoop, Flume, Hive, Pig and Oozie.

Format
50% Lecture/Discussion
50% Hands-On Labs

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University
Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

HDP Operations: Apache HBase Advanced Management

Overview
This course is designed for administrators who will be installing, configuring and managing HBase clusters. It covers installation with Ambari, configuration, security and troubleshooting HBase implementations. The course includes an end-of-course project in which students work together to design and implement an HBase schema.

Duration
4 days

Target Audience
Architects, software developers, and analysts responsible for implementing non-SQL databases in order to handle sparse data sets commonly found in big data use cases.

Course Objectives
• Hadoop Primer
  o Hadoop, Hortonworks, and Big Data
  o HDFS and YARN
• Discussion: Running Applications in the Cloud
• Apache HBase Overview
• Provisioning the Cluster
• Using the HBase Shell
• Ingesting Data
• Operational Management
• Backup and Recovery
• Security
• Monitoring HBase and Diagnosing Problems
• Maintenance
• Troubleshooting

Hands-On Labs
• Installing and Configuring HBase with Ambari
• Manually Installing HBase (Optional)
• Using Shell Commands
• Ingesting Data using ImportTSV
• Enabling HBase High Availability
• Viewing Log Files
• Configuring and Enabling Snapshots
• Configuring Cluster Replication
• Enabling Authentication and Authorization
• Diagnosing and Resolving Hot Spotting
• Region Splitting
• Monitoring JVM Garbage Collection
• End-of-Course Project: Designing an HBase Schema

Prerequisites
Students must have basic familiarity with data management systems. Familiarity with Hadoop or databases is helpful but not required. Students new to Hadoop are encouraged to take the HDP Overview: Apache Hadoop Essentials course.

Format
50% Lecture/Discussion
50% Hands-On Labs

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University
Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.
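A recurring theme in the hot-spotting and schema-design labs above is row-key design: sequential keys (such as timestamps) concentrate writes on a single region. One common mitigation is key salting. A minimal sketch of the idea (a hypothetical helper, not taken from the course materials):

```python
import hashlib

def salted_row_key(key, buckets=8):
    """Prefix a row key with a stable, hash-derived salt so that
    otherwise sequential keys spread across `buckets` regions."""
    # Hashing the key gives a deterministic bucket assignment,
    # so reads can recompute the same prefix for point lookups.
    salt = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % buckets
    return f"{salt:02d}-{key}"
```

The trade-off is that range scans over the original key order now require one scan per salt bucket, which is why key design starts from the table's expected access patterns.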

HDP Certified Administrator (HDPCA)
Certification Overview
Hortonworks has redesigned its certification program to create an industry-recognized certification where individuals prove their Hadoop knowledge by performing actual hands-on tasks on a Hortonworks Data Platform (HDP) cluster, as opposed to answering multiple-choice questions. The HDP Certified Administrator (HDPCA) exam is designed for Hadoop system administrators and operators responsible for installing, configuring and supporting an HDP cluster.

Purpose of the Exam
The purpose of this exam is to provide organizations that use Hadoop with a means of identifying suitably qualified staff to install, configure, secure and troubleshoot a Hortonworks Data Platform cluster using Apache Ambari.

Exam Description
The exam has five main categories of tasks that involve:
• Installation
• Configuration
• Troubleshooting
• High Availability
• Security

The exam is based on the Hortonworks Data Platform 2.2 installed and managed with Ambari 2.0.0.

Exam Objectives
View the complete list of objectives below, which includes links to the corresponding documentation and/or other resources.

Language
The exam is delivered in English.

Take the Exam Anytime, Anywhere
The HDPCA exam is available from any computer, anywhere, at any time. All you need is a webcam and a good Internet connection.

How to Register
Candidates need to create an account at www.examslocal.com. Once you are registered and logged in, select "Schedule an Exam", and then enter "Hortonworks" in the "Search Here" field to locate and select the HDP Certified Administrator exam. The cost of the exam is $250 USD.

Duration
2 hours

Description of the Minimally Qualified Candidate
The Minimally Qualified Candidate (MQC) for this certification has hands-on experience installing, configuring, securing and troubleshooting a Hadoop cluster, and can perform the objectives of the HDPCA exam.

Prerequisites
Candidates for the HDPCA exam should be able to perform each of the tasks in the list of exam objectives below. Candidates are also encouraged to attempt the practice exam. Visit www.hortonworks.com/training/class/hdp-certified-administrator-hdpca-exam/ for more details.

Hortonworks University
Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

HDP Certified Developer (HDPCD) Exam
Certification Overview
Hortonworks has redesigned its certification program to create an industry-recognized certification where individuals prove their Hadoop knowledge by performing actual hands-on tasks on a Hortonworks Data Platform (HDP) cluster, as opposed to answering multiple-choice questions. The HDP Certified Developer (HDPCD) exam is the first of our new hands-on, performance-based exams designed for Hadoop developers working with frameworks like Pig, Hive, Sqoop and Flume.

Purpose of the Exam
The purpose of this exam is to provide organizations that use Hadoop with a means of identifying suitably qualified staff to develop Hadoop applications for storing, processing, and analyzing data stored in Hadoop using the open-source tools of the Hortonworks Data Platform (HDP), including Pig, Hive, Sqoop and Flume.

Exam Description
The exam has three main categories of tasks that involve:
• Data ingestion
• Data transformation
• Data analysis

The exam is based on the Hortonworks Data Platform 2.2 installed and managed with Ambari 1.7.0, which includes Pig 0.14.0, Hive 0.14.0, Sqoop 1.4.5, and Flume 1.5.0. Each candidate will be given access to an HDP 2.2 cluster along with a list of tasks to be performed on that cluster.

Exam Objectives
View the complete list of objectives below, which includes links to the corresponding documentation and/or other resources.

Duration
2 hours

Description of the Minimally Qualified Candidate
The Minimally Qualified Candidate (MQC) for this certification can develop Hadoop applications for ingesting, transforming, and analyzing data stored in Hadoop using the open-source tools of the Hortonworks Data Platform, including Pig, Hive, Sqoop and Flume.

Prerequisites
Candidates for the HDPCD exam should be able to perform each of the tasks in the list of exam objectives below.

Language
The exam is delivered in English.

Hortonworks University
Hortonworks University is your expert source for Apache Hadoop training and certification. Public and private on-site courses are available for developers, administrators, data analysts and other IT professionals involved in implementing big data solutions. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

Hortonworks University Academic Program
Overview
The Big Data skills gap is real.

Every minute there are more than two million Google searches, roughly 685,000 Facebook updates, 200 million sent emails and 48 hours of video uploaded to YouTube. Companies collect this data about their customers, but many struggle to implement meaningful ways to analyze and process it. And while there are emerging big data solutions and tools to better understand business problems, there are not enough candidates in today's employment pool with the appropriate skills to implement them.

• More than 85% of Fortune 500 organizations will be unable to exploit big data analytics in 2015 (Gartner)
• 46% of companies report inadequate staffing for big data analytics (TDWI Research)
• By 2018 the US could face a shortfall of as many as 1.5 million analysts skilled in big data (McKinsey)

A Win-Win Situation

For Students
The Hortonworks University Academic Program enables students to obtain authorized training that will prepare them for certification, bolstering their employment opportunities with firms seeking skilled Hadoop professionals.

Students receive worldwide access to high-quality educational content, certification opportunities, and experience with Hortonworks technologies.

For Academic Institutions
Hortonworks University partners with accredited colleges and universities to meet those needs.

Academic partners receive support from Hortonworks for the inclusion of Hortonworks technologies in their course catalogs.

Academic Partners
Becoming an Academic Partner is easy:
• There is no cost to join
• Student materials are purchased directly from our book vendor
• Instructors may prepare to teach our materials at their own pace

Academic Partner Responsibilities
• Each academic institution is responsible for meeting classroom set-up requirements
• Students must be currently enrolled in the college or university
• Instructional hours must be spread across an entire semester
• Hortonworks course materials may not be altered, but institutions are free to add supplemental content

HDP Academic Analyst: Data Science

Overview
This course is designed for students preparing to become familiar with the processes and practice of data science, including machine learning and natural language processing. Included are tools and programming languages (Python, IPython, Mahout, Pig, NumPy, pandas, SciPy, scikit-learn), the Natural Language Toolkit (NLTK), and Spark MLlib.

Target Audience
Computer science and data analytics students who need to apply data science and machine learning on Hadoop.

Course Objectives
• Recognize use cases for data science
• Describe the architecture of Hadoop and YARN
• Describe the differences between supervised and unsupervised learning
• List the six machine learning tasks
• Use Mahout to run a machine learning algorithm on Hadoop
• Use Pig to transform and prepare data on Hadoop
• Write a Python script
• Use NumPy to analyze big data
• Use the data structure classes in the pandas library
• Write a Python script that invokes SciPy machine learning
• Describe options for running Python code on a Hadoop cluster
• Write a Pig User-Defined Function in Python
• Use Pig streaming on Hadoop with a Python script
• Write a Python script that invokes scikit-learn
• Use the k-nearest neighbor algorithm to predict values
• Run a machine learning algorithm on a distributed data set
• Describe use cases for Natural Language Processing (NLP)
• Perform sentence segmentation on a large body of text
• Perform part-of-speech tagging
• Use the Natural Language Toolkit (NLTK)
• Describe the components of a Spark application
• Write a Spark application in Python
• Run machine learning algorithms using Spark MLlib

Hands-On Labs
• Setting Up a Development Environment
• Using HDFS Commands
• Using Mahout for Machine Learning
• Getting Started with Pig
• Exploring Data with Pig
• Using the IPython Notebook
• Data Analysis with Python
• Interpolating Data Points
• Defining a Pig UDF in Python
• Streaming Python with Pig
• K-Nearest Neighbor and K-Means Clustering
• K-Means Clustering
• Using NLTK for Natural Language Processing
• Classifying Text using Naive Bayes
• Spark Programming and Spark MLlib
• Running Data Science Algorithms using Spark MLlib

Prerequisites
Students must have experience with at least one programming or scripting language, knowledge of statistics and/or mathematics, and a basic understanding of big data and Hadoop principles.

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University
Hortonworks University is your expert source for Apache Hadoop training and certification. Courses are available for developers, data analysts and administrators. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.
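Among the objectives above is using the k-nearest neighbor algorithm to predict values. The course relies on libraries such as scikit-learn for this; purely as an illustration of the underlying idea, here is a dependency-free sketch with made-up toy data (not taken from the course materials):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest
    training points, using Euclidean distance.
    `train` is a list of (features, label) pairs."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy training data: two well-separated clusters.
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
# knn_predict(train, (0.2, 0.2)) → "a"
```

k-NN has no training phase beyond storing the data, which is why at scale the interesting work shifts to distributing the distance computations, the topic of the Spark MLlib labs.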

HDP Academic Developer: Apache Pig and Hive

Overview
This course is designed for students preparing to become familiar with Big Data application development in Apache Hadoop using Pig and Hive. Topics include: Hadoop, YARN, HDFS, MapReduce, data ingestion, workflow definition and using Pig and Hive to perform data analytics on Big Data.

Target Audience
Computer science students who need to understand and develop applications for Hadoop.

Course Objectives
• Describe Hadoop, YARN and use cases for Hadoop
• Describe Hadoop ecosystem tools and frameworks
• Describe the HDFS architecture
• Use the Hadoop client to input data into HDFS
• Transfer data between Hadoop and a relational database
• Explain YARN and MapReduce architectures
• Run a MapReduce job on YARN
• Use Pig to explore and transform data in HDFS
• Understand how Hive tables are defined and implemented, and analyze data sets
• Use the new Hive windowing functions
• Explain and use the various Hive file formats
• Create and populate a Hive table that uses ORC file formats
• Use Hive to run SQL-like queries to perform data analysis
• Use Hive to join datasets using a variety of techniques, including Map-side joins and Sort-Merge-Bucket joins
• Write efficient Hive queries
• Create ngrams and context ngrams using Hive
• Perform data analytics like quantiles and page rank on Big Data using the DataFu Pig library
• Explain the uses and purpose of HCatalog
• Use HCatalog with Pig and Hive
• Define a workflow using Oozie
• Schedule a recurring workflow using the Oozie Coordinator

Hands-On Labs
• Use HDFS commands to add/remove files and folders
• Use Sqoop to transfer data between HDFS and an RDBMS
• Run MapReduce and YARN application jobs
• Explore and transform data using Pig
• Split and join a dataset using Pig
• Use Pig to transform and export a dataset for use with Hive
• Use HCatLoader and HCatStorer
• Use Hive to discover useful information in a dataset
• Describe how Hive queries get executed as MapReduce jobs
• Perform a join of two datasets with Hive
• Use advanced Hive features: windowing, views, ORC files
• Use Hive analytics functions
• Write a custom reducer in Python
• Analyze and sessionize clickstream data
• Compute quantiles of NYSE stock prices
• Use Hive to compute ngrams on Avro-formatted files
• Define an Oozie workflow

Prerequisites
Students should be familiar with programming principles and have experience in software development. SQL knowledge is also helpful. No prior Hadoop knowledge is required.

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University
Hortonworks University is your expert source for Apache Hadoop training and certification. Courses are available for developers, data analysts and administrators. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.
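The "Write a custom reducer in Python" lab refers to Hadoop Streaming, where a reducer reads key/value lines, already grouped and sorted by key, from standard input and writes one aggregated line per key to standard output. A minimal word-count-style reducer sketch (illustrative only, not the course's actual lab code):

```python
from itertools import groupby

def reduce_lines(lines):
    """Sum integer counts per key from sorted 'key<TAB>count'
    lines, yielding one 'key<TAB>total' line per key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    # Hadoop Streaming delivers all lines for a key contiguously,
    # so groupby can aggregate without buffering the whole input.
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"

# In a real Hadoop Streaming job this generator would be driven
# by standard input:  for line in reduce_lines(sys.stdin): print(line)
```

Because the contract is just lines on stdin/stdout, the same script can be tested locally with `sort | python reducer.py` before submitting it to the cluster.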

   

HDP Academic Operations: Install and Manage with Apache Ambari

Overview
This course is designed for students preparing to become administrators for the Hortonworks Data Platform (HDP). It covers installation, configuration, maintenance, security and performance topics.

Target Audience
Computer science students who need to learn about installing, configuring and supporting an Apache Hadoop 2.0 deployment in a Linux environment.

Course Objectives
• Describe tools and frameworks in the Hadoop ecosystem
• Describe the Hadoop Distributed File System (HDFS)
• Install and configure an HDP cluster
• Ensure data integrity
• Deploy and configure YARN on a cluster
• Schedule YARN jobs
• Configure and troubleshoot MapReduce jobs
• Describe enterprise data movement
• Use HDFS web services
• Configure HiveServer
• Transfer data with Sqoop
• Transfer data with Flume
• Process and manage data with Falcon
• Monitor HDP 2 services
• Perform backups and recovery
• Provide high availability through rack awareness
• Provide high availability through NameNode HA
• Use the Apache Knox gateway as a single authentication point
• Describe security requirements in an HDP cluster
• Commission and decommission nodes

Hands-On Labs
• Setting up the HDP environment
• Installing an HDP cluster
• Adding a new host to an HDP cluster
• Managing HDP services
• Using HDFS commands
• Demo: Understanding block storage
• Verifying data with block scanner and fsck
• Using WebHDFS
• Demo: Understanding MapReduce
• Using Hive tables
• Using Sqoop
• Installing and testing Flume
• Running an Oozie Workflow
• Defining and processing the data pipeline with Falcon
• Using HDFS Snapshots
• Configuring rack awareness
• Implementing NameNode HA
• Securing an HDP Cluster
• Setting up a Knox gateway

Prerequisites
Attendees should be familiar with Hadoop and Linux environments.

Certification
Hortonworks offers a comprehensive certification program that identifies you as an expert in Apache Hadoop. Visit hortonworks.com/training/certification for more information.

Hortonworks University
Hortonworks University is your expert source for Apache Hadoop training and certification. Courses are available for developers, data analysts and administrators. Classes combine presentation material with industry-leading hands-on labs that fully prepare students for real-world Hadoop scenarios.

Visit us online:
training.hortonworks.com
