
1st step: Data ingestion - It is important to know data ingestion and workflow management if you want to maintain efficient data pipelines. Learn Sqoop, Flume, and Oozie in this module.

Import and export data using NiFi
Import streaming data using Apache Kafka (see the producer sketch after this list)
Workflow management using Oozie
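
To make the streaming-ingestion item concrete, here is a minimal sketch of a Kafka producer using the kafka-python client. The broker address (localhost:9092) and the topic name (sensor-events) are placeholder assumptions, not values from the course.

```python
import json
import time

from kafka import KafkaProducer  # pip install kafka-python

# Placeholder broker address for this sketch.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Publish a few JSON events, one per second, to simulate a stream.
for i in range(5):
    event = {"sensor_id": i, "reading": 20.0 + i, "ts": time.time()}
    producer.send("sensor-events", event)
    time.sleep(1)

producer.flush()  # block until all buffered records reach the broker
```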

2nd step, phase 1: Big data processing - After learning how to collect and manage data, learn how to process it with MapReduce, Pig, Hive, HQL, and HBase.

MapReduce phases (see the word-count sketch after this list)
Data processing methods
Handling big data with Apache Pig
Architecture of Hive
Database creation and table operations
Hive Query Language (HQL) statements
HBase NoSQL database and integration with Hadoop
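
As a concrete illustration of the map and reduce phases, here is a self-contained Python word count in the Hadoop Streaming style. It is a sketch only: on a real cluster the mapper and reducer run as separate processes and Hadoop performs the shuffle-and-sort between them, which this script simulates in-process.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce phase: pairs arrive sorted by key; sum the counts per word."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # sorted() stands in for Hadoop's shuffle-and-sort step.
    for word, total in reducer(sorted(mapper(sys.stdin))):
        print(f"{word}\t{total}")
```

Run it standalone with, for example, `echo "to be or not to be" | python wordcount.py`; on a cluster the same mapper and reducer logic would be submitted through the Hadoop Streaming jar.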

2nd step, phase 2: Real-time analytics and streaming data

Add value to your analysis by performing it in real time. Learn the fundamentals of Kafka in this module.

Kafka components & use cases
Configuration of producers and consumers (see the consumer sketch after this list)
Reading messages from Kafka
Kafka Streams: features, concepts, and architecture
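
Pairing with the producer sketch above, here is a minimal Kafka consumer, again using kafka-python. The topic, broker address, and consumer group name are placeholder assumptions.

```python
import json

from kafka import KafkaConsumer  # pip install kafka-python

# Placeholder topic, broker, and group id for this sketch. The group_id
# lets Kafka balance the topic's partitions across consumer instances.
consumer = KafkaConsumer(
    "sensor-events",
    bootstrap_servers="localhost:9092",
    group_id="analytics-demo",
    auto_offset_reset="earliest",  # start from the oldest retained message
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

# Each record carries its partition, offset, and the decoded value.
for msg in consumer:
    print(f"partition={msg.partition} offset={msg.offset} value={msg.value}")
```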

2nd step, phase 3: Big data analytics with Spark

Finally, learn big data analytics end to end with Spark. This extensive module will give you all the practice you need to process and analyze data sets to solve problems.

Get your Spark cluster running
Spark DataFrames and RDDs
Analyze and process real-world data sets with Spark SQL and Spark Streaming (see the sketch after this list)
Regression, clustering, and classification algorithms using Spark MLlib
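
To tie the Spark topics together, here is a minimal PySpark sketch that builds a DataFrame, queries it with Spark SQL, and fits an MLlib classifier. The four-row data set is made up purely for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

# Start a local session; "local[*]" uses every core on this machine.
spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()

# A tiny, made-up data set standing in for a real-world one.
df = spark.createDataFrame(
    [(1.0, 2.0, 0.0), (2.0, 1.5, 0.0), (8.0, 9.0, 1.0), (9.5, 8.5, 1.0)],
    ["x1", "x2", "label"],
)

# Spark SQL: register the DataFrame as a view and query it with SQL.
df.createOrReplaceTempView("points")
spark.sql("SELECT label, COUNT(*) AS n FROM points GROUP BY label").show()

# MLlib: assemble the feature columns into a vector, then fit a classifier.
assembled = VectorAssembler(inputCols=["x1", "x2"], outputCol="features").transform(df)
model = LogisticRegression(labelCol="label").fit(assembled)
model.transform(assembled).select("label", "prediction").show()

spark.stop()
```

Spark Streaming is omitted here to keep the sketch short; the same session object exposes it via spark.readStream.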
