Course Curriculum - Big Data and Hadoop
Module 1
HADOOP ARCHITECTURE
Learning Objectives - In this module, you will understand what Big Data is, the
limitations of existing solutions to the Big Data problem, how Hadoop solves the Big
Data problem, the common Hadoop ecosystem components, Hadoop Architecture,
HDFS, the MapReduce framework, and the anatomy of a file write and read.
Topics:
What is Big Data
Hadoop Architecture
Hadoop ecosystem components
Hadoop Storage: HDFS
Hadoop Processing: MapReduce Framework
Hadoop Server Roles: NameNode, Secondary NameNode, and DataNode
Anatomy of File Write and Read.
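The HDFS storage model covered above can be sketched in plain Python: a file is split into fixed-size blocks and each block is stored on several DataNodes. This is a toy, single-process illustration, not real HDFS behavior; the node names are invented, and real HDFS uses rack-aware (not round-robin) replica placement.

```python
# Toy illustration of HDFS block splitting and replica placement.
# Block size and node names are illustrative assumptions.
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, a common HDFS default
REPLICATION = 3                 # default HDFS replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the number of blocks HDFS would create for a file."""
    return (file_size + block_size - 1) // block_size  # ceiling division

def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Round-robin placement sketch; real HDFS is rack-aware."""
    placement = {}
    for b in range(num_blocks):
        placement[b] = [datanodes[(b + r) % len(datanodes)]
                        for r in range(replication)]
    return placement

# A 300 MB file needs 3 blocks (128 MB + 128 MB + 44 MB).
blocks = split_into_blocks(300 * 1024 * 1024)
print(blocks)  # 3
```

The NameNode plays the role of the `placement` dictionary here: it holds only the block-to-DataNode mapping, while the DataNodes hold the actual block contents.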
Module 2
Topics:
Hadoop Cluster Architecture
Hadoop Cluster Configuration files
Hadoop Cluster Modes
Multi-Node Hadoop Cluster
A Typical Production Hadoop Cluster
MapReduce Job execution
Common Hadoop Shell commands
Data Loading Techniques:
o Flume
o Sqoop
o Hadoop Copy Commands
Hadoop Project: Data Loading
Module 3
HADOOP MAPREDUCE FRAMEWORK
Learning Objectives - In this module, you will understand Hadoop MapReduce
framework and how MapReduce works on data stored in HDFS. You will also learn the
different types of Input and Output formats in MapReduce framework and their usage.
Topics:
Hadoop Data Types
Hadoop MapReduce paradigm
Map and Reduce tasks
MapReduce Execution Framework
Partitioners and Combiners
Input Formats (Input Splits and Records, Text Input, Binary Input, Multiple Inputs)
Output Formats (Text Output, Binary Output, Multiple Outputs)
Hadoop Project: MapReduce Programming
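The map and reduce phases listed above can be sketched as a single-process word count in plain Python. This mirrors the data flow (map emits key-value pairs, a shuffle groups them by key, reduce aggregates each group) without any of Hadoop's distribution; the input lines are invented for illustration.

```python
# Minimal single-process sketch of the MapReduce word-count flow.
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the line.
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values for each key.
    return {key: sum(values) for key, values in groups.items()}

lines = ["Hadoop stores data in HDFS", "MapReduce processes data in HDFS"]
pairs = [p for line in lines for p in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts["hdfs"])  # 2
```

A combiner, one of the topics above, would simply run `reduce_phase` on each mapper's local output before the shuffle to cut network traffic.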
Module 4
ADVANCED MAPREDUCE
Learning Objectives - In this module, you will learn advanced MapReduce concepts
such as Counters, Schedulers, Custom Writables, Compression, Serialization, Tuning,
Error Handling, and how to deal with complex MapReduce programs.
Topics:
Counters
Custom Writables
Unit Testing: JUnit and MRUnit testing framework
Error Handling and Tuning
Advanced MapReduce
Hadoop Project: Advanced MapReduce programming and error handling.
Module 5
Topics:
Installing and Running Pig
Grunt
Pig's Data Model
Pig Latin
Developing & Testing Pig Latin Scripts
Writing Evaluation, Filter, and Load & Store Functions
Hadoop Project: Pig Scripting
Module 6
Topics:
Hive Architecture and Installation
Comparison with Traditional Database
HiveQL: Data Types, Operators and Functions
Hive Tables (Managed Tables and External Tables, Partitions and Buckets, Storage
Formats, Importing Data, Altering Tables, Dropping Tables)
Querying Data (Sorting and Aggregating, MapReduce Scripts, Joins & Subqueries,
Views, Map- and Reduce-side Joins to optimize queries).
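The map-side join mentioned in the topics above can be sketched in plain Python: when one table is small enough to fit in memory, each mapper loads it as a lookup dictionary and joins rows locally, so the large table never has to be shuffled to reducers. The table contents here are invented for illustration.

```python
# Sketch of a map-side (broadcast) join with an in-memory small table.
small_table = {1: "Electronics", 2: "Books"}  # category_id -> category name

def map_side_join(big_table_rows, lookup):
    """Join each (product, category_id) row against the in-memory table."""
    for product, category_id in big_table_rows:
        # No shuffle needed: the lookup happens inside the map task.
        yield (product, lookup.get(category_id, "UNKNOWN"))

rows = [("phone", 1), ("novel", 2), ("lamp", 3)]
joined = list(map_side_join(rows, small_table))
print(joined)
# [('phone', 'Electronics'), ('novel', 'Books'), ('lamp', 'UNKNOWN')]
```

A reduce-side join, by contrast, would tag rows from both tables with their join key, shuffle everything, and combine matching keys in the reducer, which is more general but far more expensive.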
Module 7
Topics:
Hive: Data manipulation with Hive
User Defined Functions
Appending Data into existing Hive Table
Custom Map/Reduce in Hive
Hadoop Project: Hive Scripting
HBase: Introduction to HBase
Client APIs and their features
Available Clients
HBase Architecture
MapReduce Integration.
Module 8
Topics:
HBase: Advanced Usage
Schema Design
Advanced Indexing
Coprocessors
Hadoop Project: HBase tables
The ZooKeeper Service: Data Model
Operations
Implementation
Consistency
Sessions
States.
Module 9
Topics:
Schedulers: Fair and Capacity
Hadoop 2.0 New Features: NameNode High Availability
HDFS Federation
MRv2
YARN
Running MRv1 in YARN
Upgrading your existing MRv1 code to MRv2
Programming in YARN framework.
Module 10
Some of the datasets you may work on as part of the project work:
Twitter Data Analysis: Download Twitter data, load it into HBase, and use Pig,
Hive, and MapReduce to gauge the popularity of selected hashtags
Stack Exchange Ranking and Percentile dataset: A dataset from Stack Overflow
containing the ranking and percentile details of users
Loan Dataset: Deals with users who have taken loans, along with their EMI details,
time period, etc.
Government datasets: for example, Worker Population Ratio (per 1000) for persons
of age 15-59 years according to the current weekly status approach, for each state/UT
Machine Learning datasets like the Badges dataset: each record is a +/- label
followed by a person's name
Weather Dataset: Contains weather details over a period of time, from which you
may find the hottest, coldest, or average temperature
In addition, you can choose your own dataset and create a project around that as well.
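The weather-dataset idea above can be sketched in a few lines of plain Python: given (date, temperature) records, find the hottest reading and the average. The records here are invented sample data; a real project would run the same aggregation as a MapReduce job, Pig script, or Hive query over HDFS.

```python
# Toy aggregation over invented (date, temperature) records.
records = [("2013-01-01", 18.5), ("2013-07-15", 41.2), ("2013-12-30", 4.0)]

# Hottest reading: the record with the maximum temperature.
hottest_date, hottest_temp = max(records, key=lambda r: r[1])

# Average temperature across all records.
average_temp = sum(t for _, t in records) / len(records)

print(hottest_date, hottest_temp)  # 2013-07-15 41.2
```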
Why Learn Hadoop?
It is becoming almost impossible for large companies to store, retrieve, and process
their ever-increasing data. If a company gets a handle on this, nothing can stop it from
becoming the next big success. The problem lies in using traditional systems to store
enormous data: though these systems were a success a few years ago, the growing
amount and complexity of data is quickly making them obsolete. The good news is
Hadoop, which is nothing less than a panacea for companies working with Big Data
across a variety of applications, and which has since become a de facto standard for
storing, handling, evaluating, and retrieving hundreds of terabytes, and even petabytes,
of data.
The importance of Hadoop is evident from the fact that many global MNCs, such as
Yahoo! and Facebook, use Hadoop and consider it an integral part of their operations.
On February 19, 2008, Yahoo! Inc. launched what was then the world's largest Hadoop
production application. The Yahoo! Search Webmap is a Hadoop application that runs
on a Linux cluster with more than 10,000 cores and generates data that is now used in
every Yahoo! Web search query.
Facebook, a $5.1 billion company, had over 1 billion active users as of 2012, according
to Wikipedia. Storing and managing data of such magnitude could have been a problem
even for a company like Facebook, but thanks to Apache Hadoop it is not: Facebook
uses Hadoop to keep track of every profile on the site, as well as all the data related to
each profile, such as pictures, posts, and comments.
Opportunities for Hadoopers are infinite - as a Hadoop architect, developer, tester,
and so on. If cracking and managing Big Data is your passion in life, then think no more:
join Edureka's Hadoop online course and carve a niche for yourself!
Happy Hadooping!