INT313: BIG DATA PROCESSING FRAMEWORK

L:2 T:0 P:2 Credits:3

Unit I

Introduction to Big Data Processing Frameworks : Introduction to Processing Engines and Processing
Frameworks, Introduction to Batch Processing Systems, Introduction to Stream-only frameworks,
Introduction to Hybrid frameworks, Comparison of frameworks

Unit II

Working with Hybrid Framework: Apache Spark : Introduction to Apache Spark, Features of Apache
Spark, Components of Apache Spark, Sentiment Analysis using Apache Spark, Installation of Apache
Spark, Working with Apache Spark using Scala, Introduction to Apache Spark Programming

Unit III

Working with Hybrid Framework: Apache Flink : Introduction to Apache Flink, Working with Apache
Flink Ecosystem, Features of Apache Flink, Apache Flink Architecture, Installation and
Configuration of Apache Flink on Ubuntu, Apache Flink Shell Commands

Unit IV

Working with Stream-only framework: Apache Storm : Introduction to Apache Storm, Building blocks
of Storm Topologies, Apache Storm - Cluster Architecture, Apache Storm - Workflow, Apache Storm
Installation, Possible Use Cases of Apache Storm

Unit V

Working with Stream-only framework: Apache Samza : Introduction to Apache Samza, Apache Samza
Architecture: Streaming Layer, Apache Samza Architecture: Execution Layer, Apache Samza
Architecture: Processing Layer, Introduction to hello-samza (starter project for Apache Samza jobs)

Unit VI

Apache Samza: Working with Apache Kafka: Apache Kafka - Cluster Architecture, Apache Kafka - Basic
Operations, Apache Kafka - Simple Producer Example, Apache Kafka - Consumer Group Example, Apache
Kafka - Integration with Storm, Apache Kafka - Integration with Spark, Real Time Application (Twitter)

List of Practicals/Experiments

1. Setting up Eclipse for Apache Storm and preparing it for the first program.
2. Setting up a Maven project to demonstrate spouts and bolts using Apache Storm.
3. Sentiment Analysis using Apache Spark.
4. Demonstrate the use of a mini-reducer, i.e. a combiner, in Apache Hadoop MapReduce.
5. Demonstrate the use of GraphX in Apache Spark.
6. Demonstrate the use of Spark Streaming in Apache Spark.
7. Demonstrate the use of Producer and Consumer in Apache Kafka.
8. Create an Apache Flink project in Eclipse.
9. Implementing different types of Joins in Apache Spark.
10. Apache Flink - Running a Flink Program.

References:

1. Big Data Simplified by Sourabh Mukherjee, Amit Kumar Das, Sayan Goswami, Pearson, India

2. Data Analytics with Spark Using Python by Jeffrey Aven, Pearson, India

3. Big Data Fundamentals by Thomas Erl, Pearson, India
