Big Data & Hadoop Class Syllabus
o Joins
    Map-side joins
    Reduce-side joins
    Distributed Cache
Hive
o Comparison with RDBMS
o HQL
o Data types
o Tables
o Importing and Exporting
o Partitioning and Bucketing – Advanced
o Joins and Join Optimization
o Functions – Built-in & User Defined
o Advanced Optimization of HQL
o Storage File Formats – Advanced
o Loading and Storing Data
o SerDes – Advanced
Sqoop
o Important basics
o Import – Deep dive
o Export – Deep dive
o Sqoop Optimization – Incremental Load
o Many more
PIG
o Important basics
o Pig Latin
o Data types
o Functions – Built-in, User Defined
o Loading and Storing Data
Flume
o Configure Flume and Import data
o Architecture and LAB
Oozie
o Different workflow jobs
o Oozie scheduler
o LAB – covers advanced topics
HBase
o NoSQL databases Introduction
o CAP theorem
o HBase Architecture
o HBase Clients – Java Client
o Loading Data
o UDF, UDAF, UDTFs
Zookeeper
o Zookeeper in HBase
o How Zookeeper is used in Production
Ambari
o Real-time Cluster deployment Using Ambari
o Monitoring the Cluster
REST API
o Introduction
o Real-time Use cases of How REST is used with Hadoop
Labs:
o Real-time use cases and Data sets covered (10+ Real-time datasets)
o Word count, Sensors (Weather Sensors) dataset, Social Media data sets like YouTube, Twitter data analysis
o Java and Unix Basics Lab
o Hadoop, Hive, Sqoop, Oozie, HBase, Flume Installations – Pseudo & Cluster
Master Project:
o Real-time Data Warehouse migration
o Real-time concepts covered are:
    Hive – Advanced topics
    Sqoop import/export
    Oozie Scheduling
    How Hadoop MR is used in DW
    RDBMS concepts
    ETL tool concepts
    Integration with Reporting tools