Big Data & Hadoop Class Syllabus

• Big Data Introduction and Hadoop Fundamentals
  o Data Storage and Analysis
  o Comparison with RDBMS
• Hadoop – A Brief History
• MapReduce – Part 1
  o Map and Reduce
  o Sample Program
  o Combiner
  o Partitioners and Custom Partitioner
• Hadoop Streaming & Pipes
• HDFS
  o Blocks
  o NameNode (NN) & DataNode (DN)
  o HDFS Federation & High Availability
• HDFS Clients
  o HDFS Command Line
  o HDFS CLI – File System Operations Lab
  o HDFS Web UI
  o HDFS Java Client
  o HDFS Java Client – File System Operations Lab
  o CRUD Operations using Java Client
  o Anatomy of File Read and File Write
  o DistCp
  o Cluster balancing
• YARN – Cluster Management (Hadoop 2.x)
  o How do YARN applications run?
  o YARN vs MapReduce
  o YARN Scheduling
    ▪ Capacity Scheduler
    ▪ Fair Scheduler
    ▪ FIFO Scheduler
• MapReduce – Part 2
  o Env Setup
  o Tool and ToolRunner
  o Mapper
  o Reducer
  o Driver program
  o How to package the job
  o MapReduce Web UI
  o How does a MapReduce job run?
  o Shuffle & Sort
  o Speculative Execution
• InputFormats
  o Input Splits and Record Reader
  o Default Input Formats
  o Implement Custom Input Format
• OutputFormats
  o Default Output Formats
  o Output Record Writer
• Compression
  o Map Output
  o Final Output
  o Splittable vs Non-Splittable
  o Compression Codecs
• Serialization
  o Data types – default
  o Writable vs WritableComparable
  o Custom Data types – Custom Writable/Comparable
• File-Based Data Structures
  o Sequence file
  o Reading and Writing into Sequence file
  o Map File
• Tuning MapReduce Jobs
• Advanced MapReduce
  o Counters
    ▪ Built-In Counters Classification
    ▪ User Defined Counters
  o Sorting
    ▪ Partial Sort
    ▪ Total Sort
    ▪ Secondary Sort

  o Joins
    ▪ Map-side joins
    ▪ Reduce-side joins
    ▪ Distributed Cache
• Hive
  o Comparison with RDBMS
  o HQL
  o Data types
  o Tables
  o Importing and Exporting
  o Partitioning and Bucketing – Advanced
  o Joins and Join Optimization
  o Functions – Built-in & User Defined
  o Advanced Optimization of HQL
  o Storage File Formats – Advanced
  o Loading and Storing Data
  o SerDes – Advanced
• Sqoop
  o Important basics
  o Import – Deep dive
  o Export – Deep dive
  o Sqoop Optimization – Incremental Load
• Pig
  o Important basics
  o Pig Latin
  o Data types
  o Functions – Built-in, User Defined
  o Loading and Storing Data
• Flume
  o Configure Flume and Import data
  o Architecture and Lab
• Oozie
  o Different workflow jobs
  o Oozie scheduler
  o Lab – covers advanced topics
• HBase
  o NoSQL databases Introduction
  o CAP theorem
  o HBase Architecture
  o HBase Clients – Java Client
  o Loading Data
  o UDF, UDAF, UDTFs
• ZooKeeper
  o ZooKeeper in HBase
  o How ZooKeeper is used in Production
• Ambari
  o Real-time Cluster deployment using Ambari
  o Monitoring the Cluster
• REST API
  o Introduction
  o Real-time use cases of how REST is used with Hadoop
• Labs:
  o Real-time use cases and data sets covered (10+ real-time datasets)
  o Word count, Sensors (Weather Sensors) dataset, Social Media data sets like YouTube, Twitter data analysis
  o Many more
  o Java and Unix Basics Lab
  o Hadoop, Hive, Sqoop, Oozie, HBase, Flume Installations – Pseudo & Cluster
• Master Project:
  o Real-time Data Warehouse migration
  o Real-time concepts covered are:
    ▪ Hive – Advanced topics
    ▪ Sqoop import/export
    ▪ Oozie Scheduling
    ▪ How Hadoop MR is used in DW
    ▪ RDBMS concepts
    ▪ ETL tool concepts
    ▪ Integration with Reporting tools
