
Big Data & Hadoop Certification

Course Catalogue

© EduPristine – www.edupristine.com
© EduPristine For [Big Data & Hadoop]
Introduction

Opinions are good. Data is better…



Some facts…
 Every day, people send 150 billion new email messages. The number of mobile devices already exceeds the world's population and is growing. With every keystroke and click, we are creating new data at a blistering pace.
 90% of the data in the world today has been created in the last two years alone.
 80% of data captured today is unstructured.
 This brave new world is a potential treasure trove for data scientists and analysts, who can comb through massive amounts of data for new insights, research breakthroughs and marketing strategies.
 But it also presents a problem for traditional relational databases and analytics tools, which were not built to handle such massive data.
 Another challenge is the mix of sources and formats, which include XML, log files, objects, text, binary and more.



Big Data is growing…



For example…

 According to IBM: "80% of data captured today is unstructured, from sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. All of this unstructured data is Big Data."



Demand for Hadoop

 With Hadoop, no data is too big. Hadoop has gained momentum mainly due to its ability to analyze unstructured Big Data and draw important predictions for businesses.
 Hadoop is the core platform for structuring Big Data, and solves the problem of making it useful for analytics purposes.
 Hadoop is more than just a faster, cheaper database and analytics tool. In some cases, the Hadoop framework lets users query datasets in previously unimaginable ways.
 Hadoop appeals to IT leaders because of the improved performance, scalability, flexibility, efficiency, extensibility and fault tolerance it offers.
 Big Data means Big Opportunities
• 16,000+ openings today (source: naukri.com)
• Big demand but few qualify: "For every 100 openings, there are only 2 qualified candidates" (fastcompany.com)
• Hadoop pioneers in the online world, including eBay, Facebook, LinkedIn, Netflix and Twitter, paved the way for companies in other data-intensive industries, and there are now huge opportunities in industries such as finance, technology, telecom and government. Increasingly, IT companies are finding a place for Hadoop in their data architecture plans.
Nowadays…

 Yahoo! has ~20,000 machines running Hadoop
 The largest clusters are currently 2,000 nodes
 Several petabytes of user data (compressed, unreplicated)
 Yahoo! runs hundreds of thousands of jobs every month



The power of Hadoop

 Hadoop brings the ability to cheaply process large amounts of data, regardless of its structure. By large, we mean from 10-100 gigabytes and above.

 USP:
1. Open source technology
2. No high-end server machines required



Top companies using Hadoop



Hottest Salaries in IT sector



Who can join this course?

 Software Engineers who are into ETL/programming and exploring job opportunities in Hadoop.
 Managers who are looking for the latest technologies to implement in their organization, to meet the current & upcoming challenges of data management.
 Any Graduate/Post-Graduate aspiring to a career in cutting-edge technologies.

Pre-requisites for the course:

 Prerequisites for learning Hadoop include hands-on experience in Core Java & Unix and good analytical skills to grasp and apply the concepts of Hadoop.
 EduPristine provides recordings of the complimentary course "Java Essentials for Hadoop" to all participants who enroll for the Hadoop training. This course helps you brush up the Java skills needed to write MapReduce programs.



Training Snapshot

Training Highlights
 3 days: online Java and Unix coverage
 11 days classroom training (55 hours)
 Online access: live instructor-based training
 10 hrs. live projects
 20 hrs. Cloudera preparation (not in PRO package)
 PowerPoint presentations covering all classes & Hadoop software CD
 Recorded videos covering all classes & the Hadoop Definitive book
 Access to repeat any session with the next batch
 Cloudera dumps & exam questions (not in PRO package)
 Quiz & assignments
 24x7 lifetime access to material
 24x7 live support / discussion forum
 Placement assistance / CV updates / interview question updates
Participants will be able to:

 Master the concepts of the Hadoop Distributed File System
 Set up a Hadoop cluster
 Write MapReduce code in Java
 Perform data analytics using Pig and Hive
 Understand data loading techniques using Sqoop and Flume
 Implement HBase, MapReduce integration, advanced usage and advanced indexing
 Have a good understanding of the ZooKeeper service
 Use Apache Oozie to schedule and manage Hadoop jobs
 Implement best practices for Hadoop development and debugging
 Develop a working Hadoop architecture
 Work on a real-life project on Big Data analytics and gain hands-on project experience



The Hadoop Bestiary


Flume: Collection and import of log and event data
HBase: Column-oriented database scaling to billions of rows
HDFS: Distributed redundant file system for Hadoop
Hive: Data warehouse with SQL-like access
MapReduce: Parallel computation on server clusters
Pig: High-level programming language for Hadoop computations
Sqoop: Imports data from relational databases
ZooKeeper: Configuration management and coordination

System requirements: 4 GB RAM, 64-bit processor



Day wise break up of Hadoop Program

Day 1 & 2: Basic Java & Introduction to Hadoop Technology (mode: online session)
• Content: Object Oriented, Portable, Multi-Threaded, Secure, Platform Independent
• Objective: Understand & brush up core Java concepts; introduction to Hadoop technology

Day 3: Introduction to Unix and Basics of Hadoop (mode: online session)
• Content: Unix
• Objective: How to operate a Unix system


Day wise break up of Hadoop Program – Cont'd…

Day 4: Understanding Pseudo Cluster Environment & Introduction to Hadoop Distributed File System (HDFS)
• Content: Cluster Specification; Hadoop Configuration (Configuration Management, Environment Settings, Important Hadoop Daemon Properties, Hadoop Daemon Addresses and Ports, Other Hadoop Properties); Basic Linux and HDFS Commands; Design of HDFS; HDFS Concepts; Command Line Interface; Hadoop File Systems; Java Interface; Data Flow (Anatomy of a File Read, Anatomy of a File Write, Coherency Model); Parallel Copying with DISTCP; Hadoop Archives
• Objective: To understand the different components of a Hadoop pseudo cluster and the different configuration files to be used in the cluster; to understand what HDFS is, its requirement for running Map-Reduce, and how it differs from other distributed file systems



Day wise break up of Hadoop Program – Cont'd…

Day 5 & 6: Understanding Map-Reduce Basics and Map-Reduce Types and Formats
• Content: Hadoop Data Types; Concept of Mappers; Concept of Reducers; The Execution Framework; Concept of Partitioners; Concept of Combiners; Distributed File System; Hadoop Cluster Architecture; MapReduce Types; Input Formats (Input Splits and Records, Text Input, Binary Input, Multiple Inputs); Output Formats (Text Output, Binary Output, Multiple Outputs)
• Objective: To get an idea of how the Map-Reduce framework works and why Map-Reduce is tightly coupled with HDFS
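The mapper/partitioner/combiner/reducer pipeline covered in these sessions can be illustrated with a minimal word-count sketch in plain Java. This is a conceptual illustration only: it simulates the map, shuffle and reduce phases inside one JVM, without the actual Hadoop `Mapper`/`Reducer` APIs taught in class.

```java
import java.util.*;

public class WordCountSketch {

    // Simulates the Map-Reduce phases for word count, in one JVM:
    // map emits (word, 1); shuffle groups by key; reduce sums the counts.
    static Map<String, Integer> wordCount(List<String> lines) {
        // Map phase: emit a (word, 1) pair for every word in every line
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines)
            for (String word : line.split("\\s+"))
                mapped.add(Map.entry(word, 1));

        // Shuffle/sort phase: group the emitted values by key
        // (in Hadoop, the framework and the partitioner do this across nodes)
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapped)
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());

        // Reduce phase: sum the grouped counts for each word
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, ones) ->
            counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("big data", "big hadoop")));
        // prints {big=2, data=1, hadoop=1}
    }
}
```

In real Hadoop each phase runs distributed over HDFS blocks, and a combiner would apply the same summing locally on each mapper before the shuffle.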



Day wise break up of Hadoop Program – Cont'd…

Day 7 & 8: SPARK
• Content:
 Modes of Spark
 Spark Installation Demo
 Overview of Spark on a Cluster
 Spark Standalone Cluster
 Invoking Spark Shell
 Creating the Spark Context
 Loading a File in Shell
 Performing Some Basic Operations on Files in Spark Shell
 Building a Spark Project with sbt
 Running a Spark Project with sbt
 Caching Overview
 Distributed Persistence
 Spark Streaming Overview
 RDDs
 Transformations in RDD
 Actions in RDD
 Loading Data in RDD
 Saving Data through RDD
 Key-Value Pair RDD
 MapReduce and Pair RDD Operations
 Java/Scala/Python and Hadoop Integration Hands-on
 Loading of Data
 Hive Queries through Spark
 Performance Tuning Tips in Spark
• Objective: How to run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk, using Spark
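As a preview of the RDD style taught here, the following is a conceptual plain-Java sketch in which `java.util.stream` stands in for Spark's RDD API: `map` and `filter` play the role of transformations, and `reduce` the role of an action that forces evaluation. This is an analogy only, not Spark code; real Spark runs the same shape of pipeline lazily and distributed across a cluster.

```java
import java.util.List;

public class RddStyleSketch {

    // Transformation chain ending in an action, mimicking e.g.
    //   sc.textFile(...).map(...).filter(...).reduce(...)
    static int totalLongWordLength(List<String> words) {
        return words.stream()             // stands in for an RDD of words
                .map(String::length)      // transformation: map each word to its length
                .filter(n -> n > 3)       // transformation: keep lengths greater than 3
                .reduce(0, Integer::sum); // action: sum them, forcing evaluation
    }

    public static void main(String[] args) {
        System.out.println(totalLongWordLength(List.of("big", "data", "spark")));
        // prints 9 (lengths 4 and 5 pass the filter)
    }
}
```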



Day wise break up of Hadoop Program – Cont'd…

Day 9: HIVE
• Content: Hive Architecture; Running Hive; Comparison with Traditional Databases (Schema on Read Versus Schema on Write, Updates, Transactions and Indexes); HiveQL (Data Types, Operators and Functions); Tables (Managed Tables and External Tables, Partitions and Buckets, Storage Formats, Importing Data, Altering Tables, Dropping Tables); Querying Data (Sorting and Aggregating, MapReduce Scripts, Joins & Subqueries & Views, Map-side and Reduce-side Joins to optimize queries); User Defined Functions; Appending Data into an Existing Hive Table; Custom Map/Reduce in Hive
• Objective: To understand Hive, how data can be loaded into Hive, how to query data from Hive, and so on



Day wise break up of Hadoop Program – Cont'd…

Day 10: PIG
• Content: Installing and Running Pig; Grunt; Pig's Data Model; Pig Latin; Developing & Testing Pig Latin Scripts; Writing Evaluation, Filter, Load & Store Functions
• Objective: To learn what Pig is, where Pig can be used, and how Pig is tightly coupled with Map-Reduce

Day 11: SQOOP
• Content: Database Imports; Working with Imported Data; Importing Large Objects; Performing Exports; Exports: A Deeper Look
• Objective: To understand Sqoop, how import and export are done to/from HDFS, and the internal architecture of Sqoop



Day wise break up of Hadoop Program – Cont'd…

Day 12: Live Project 1
• Content: We will provide data sets on which participants will work as part of the project; app development; running search queries
• Objective: To work on a real-life project

Day 13: HBASE
• Content: Introduction; Client API Basics; Client API Advanced Features; Client API Administrative Features; Available Clients; Architecture; MapReduce Integration; Advanced Usage; Advanced Indexing
• Objective: To understand HBase, how data can be loaded into HBase and queried from HBase using a client, and so on

Day 14: Live Project 2
• Content: We will provide data sets on which participants will work as part of the project; app development; running search queries
• Objective: To work on a real-life project



Course Highlights & Fees

Packages:
• Hadoop Pro: Rs. 25,000
• Hadoop Plus: Rs. 35,000
• Hadoop Premium: Rs. 35,000

Course highlights (inclusions vary by package):
 11 Days Classroom Training (55 Hours)
 12 Days Live Instructor-Based Training
 20 hrs. Live Projects
 PowerPoint Presentation covering all classes
 Recorded Videos covering Java and Unix sessions
 Recorded Videos of Live Instructor-Based Training
 Quiz & Assignment
 24x7 Lifetime access to material and Support
 Discussion Forum
 Certificate of completion & excellence from EduPristine
 Cloudera training workshop for 4 days (20 hrs. classroom)
 PPT for Cloudera exam preparation
 Material: (i) "Hadoop Definitive Guide"; (ii) Sample question papers for Cloudera Hadoop Developer

Cloudera CCDH examination:
• Exam registration: $295 fee included
Payment Mode

Procedure to ENROLL
 Online payment via net banking transfer, or by debit/credit card on www.edupristine.com (click Fees, then Buy)
 The bank details are given below:
• Bank Account Name: Neev Knowledge Management Pvt. Ltd
• Bank Name: HDFC
• Branch Address: Maneji Wadia Building, Ground Floor, Nanik Motwani Marg, Fort, Mumbai
• Account Number: 00602560008449
• Routing Number / Sort Code: 021000021
• Swift Code: HDFCINBB
• RTGS/NEFT IFSC Code: HDFC0000060
• Account Type: Current
• Address: 702, Raaj Chambers, Near Andheri Subway, Old Nagardas Road, Andheri East, Mumbai 69
 Cash payment (hand over to the venue co-ordinator & collect the receipt on the spot)
 Cheque payment in favor of "Neev Knowledge Management Pvt Ltd"
 For registration please contact: Vishal Shah – +91 – 750 6142 717



Thank you!

Contact:
EduPristine
702, Raaj Chambers, Old Nagardas Road, Andheri (E),
Mumbai-400 069. INDIA
www.edupristine.com
Ph. +91 22 4211 7474

