Big Data - Hadoop & Spark Training Syllabus: Tamilboomi

What is Hadoop?
Hadoop is a platform written in Java with which we can process large amounts of data. The Hadoop ecosystem has many tools that make processing big data easy.
Let's learn how to do that end to end..!!

Objective:
Over the past years, Hadoop & Spark have seen enormous industry adoption, and the market faces a shortage of skills. To help bridge the gap, we have designed this course around industry expectations, with real-time examples. This course will help you understand the variety of big data application development options, and lets you develop your own application and performance-tune it.

This course is for:
• Professionals who want to learn & develop Hadoop & Spark applications.
• Professionals who want to pursue certification (Hortonworks: HDPCD, HDPCDSPARK) (Cloudera: CCA175, CCA159).
• Anyone interested in learning about the latest technology for career improvement.

Course Structure:
• This course is designed with 50% theory and 50% hands-on.
• You will be given a real-time POC to solve and learn from.

Hadoop – Project (English) – Click Here
Hadoop – Intro Session (Tamil) – Click Here
Spark – Intro Session (Tamil) – Click Here

After this class you will be able to:
• Have in-depth knowledge of Hadoop.
• Have hands-on experience with Hadoop.
• Complete a project on Hadoop independently.
• Know how to switch careers to Hadoop from any other technology.
• Develop your own Spark application.
• Understand the different components of Spark.
• Performance-tune a Spark application.
• Prepare for and complete the Hortonworks Spark developer certification (with a minimum of 1 month of practice).
• Build data pipelines using Spark APIs and DataFrames.
• Analyze Spark jobs using the UIs and logs.
• Create streaming jobs and run them on a YARN cluster.

Course Overview:
• Introduction to Hadoop.
• Hadoop Architecture In-depth travel.
• Map Reduce 1.0 & YARN.
• Pig & Hive.
• Sqoop & Flume.
• HBase, Oozie & ZooKeeper.
• Welcome to Spark.
• Programming with RDD.
• Spark SQL & DataFrames.
• Spark Job Execution.
• Cluster Architecture for Spark.
• Introduction to Kafka.
• Introduction to Spark Streaming.

Module 1: Introduction to Hadoop World:
• Dataaaaaaa.....Big data..!
• What is big data? 3 + 1 V's.
• What is Hadoop, why Hadoop & its history.
• Hadoop ecosystem: an overview (HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, HBase, etc.).
• Current requirements and future possibilities in Hadoop.
• RDBMS vs Hadoop.
• Wait.. finally, what is Hadoop not?
• Do we need Java to learn Hadoop?
• Hadoop installation.

Module 2: Hadoop Architecture In-depth travel:
• HDFS: an introduction.
• How is data stored in HDFS? (Travel of a byte.)
• Hadoop daemons:
  o NameNode.
  o DataNode.
  o JobTracker.
  o TaskTracker.
• Fault tolerance in Hadoop.
• HA mode in HDFS.
• How files are handled in projects (sample project scenario execution).

Module 3: Map Reduce 1.0 & YARN:
• MapReduce history.
• How MapReduce is used in projects.
• MapReduce architecture, key-value pairs.
• YARN 2.0 architecture.
• Java implementation of MapReduce (sample POC).
• Mapper, Reducer, Combiner: different combinations.

Module 4: Pig & Hive:
• Hive introduction.
• Hive data model.
• Hive implementation of a sample project.
• Pig introduction.
• Pig data structures.
• Pig implementation of a sample project.
• How are Pig & Hive used in real-time projects?
• Module 4 assignment.

Module 5: Sqoop & Flume:
• Flume introduction.
• Flume configuration.
• Flume sample project.
• Sqoop introduction.
• Sqoop configuration.
• Sqoop sample project.

Module 6: HBase, Oozie & ZooKeeper:
• Oozie introduction.
• Oozie overview and configuration.
• ZooKeeper overview.
• HBase introduction.
• HBase overview.
• Spark overview.

SPARK
Intro Session (Tamil) – Click Here

Module 1: Welcome to Spark:
• Welcome to the world of Spark.
• Bye Bye Hadoop? (Hadoop vs Spark.)
• Spark components:
  o Spark Core
  o Spark SQL
  o GraphX
  o MLlib
• Spark use cases in real time.
Hands on:
• Installing and configuring Spark on your machine.
• Running a sample program in Spark.
• Executing a Spark use case.

Module 2: Programming with RDD:
• What is an RDD?
• Why RDD?
• How an RDD gets executed in a Spark application.
• Transformations in RDD.
• Actions in RDD.
• RDD programming APIs.
Hands on:
• Creating an RDD from a data file.
• Applying transformations & actions on an RDD.
• Interactive queries using RDD.

Module 3: Spark SQL/DataFrames:
• Spark SQL/DataFrame uses.
• DataFrame / SQL APIs.
• Spark & Hive integration.
• Catalyst query optimization.
Hands on:
• Create a DataFrame from a file.
• Create a DataFrame from a table.
• Caching and reusing DataFrames.
• Query with the DataFrame API and SQL.

Module 4: Spark Execution & Optimization:
• Jobs, stages & tasks.
• Partitions and shuffles.
• Data locality.
• Job performance (tuning).
Hands on:
• Visualizing DAG execution.
• Measuring memory usage.
• Understanding performance.

Module 5: Introduction to Kafka:
• Introduction to Kafka.
• Kafka architecture.
• Producers and consumers in Kafka.
• Working with Kafka.
Hands on:
• Installing & configuring Kafka.
• Producing and consuming messages.

Module 6: Spark Streaming:
• Introduction to Spark Streaming.
• DStream APIs and stateful streams.
• Reliability and fault recovery.
Hands on:
• Creating a DStream from a source.
• Integration of Kafka and Spark Streaming.
• Developing a Kafka-Spark application.
• Viewing streaming jobs in the Web UI.
----------------------------------------------------------

For more details:
Mail: Arumugam@tamilboomi.com,
arumugamsip@gmail.com
Whatsapp: +91 9619663272
Visit: www.tamilboomi.com
For the Cloudera VM and free big data startup kit: Startup kit link Click here.
Happy Learning..!