COURSE OBJECTIVES
To learn:
1. Compare various file systems and use an appropriate file system for storing different types of
data.
2. Demonstrate the concepts of Hadoop ecosystem for storing and processing of unstructured
data.
3. Apply the knowledge of programming to process the stored data using Hadoop tools and
generate reports.
4. Connect to web data sources for data gathering, and integrate data sources with Hadoop
components to process streaming data.
5. Tabulate and examine the results generated using Hadoop components.
COURSE OUTCOMES
1. Make use of a distributed file system like HDFS to store unstructured data in a Hadoop cluster.
2. Apply the knowledge of the MapReduce programming model to process unstructured data and
achieve appropriate results.
3. Write Pig scripts to solve MapReduce programming problems on huge volumes of data.
4. Tabulate unstructured data from files and generate reports using the Hive component.
5. Connect to web data sources like Twitter for gathering data, and integrate data sources with
Hadoop components like Flume to process streaming data.
MAPREDUCE JOB LIFE CYCLE: How MapReduce works; understanding the Mapper, Combiner,
Partitioner, Shuffle & Sort, and Reduce phases of a MapReduce application; developing MapReduce
jobs based on requirements using given datasets such as the weather dataset.
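The phases above can be sketched in plain Python as a single-process simulation. This is not Hadoop code; the dataset, function names, and the use of per-year maximum temperature (echoing the weather-dataset example) are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical mini weather dataset: (year, temperature) records.
RECORDS = [
    ("1950", 0), ("1950", 22), ("1950", -11),
    ("1949", 111), ("1949", 78),
]

def mapper(record):
    # Map phase: emit (key, value) pairs -- here (year, temp).
    year, temp = record
    yield (year, temp)

def shuffle_and_sort(pairs):
    # Shuffle & Sort phase: group all values by key, keys in sorted order.
    # (In Hadoop, the Partitioner decides which reducer gets each key.)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reducer(key, values):
    # Reduce phase: compute the maximum temperature per year.
    # A Combiner for this job would apply the same max() locally
    # on each mapper's output to cut shuffle traffic.
    yield (key, max(values))

def run_job(records):
    # Drive the simulated job end to end: map -> shuffle & sort -> reduce.
    mapped = [pair for rec in records for pair in mapper(rec)]
    result = {}
    for key, values in shuffle_and_sort(mapped):
        for out_key, out_value in reducer(key, values):
            result[out_key] = out_value
    return result
```

Running `run_job(RECORDS)` yields the maximum temperature recorded in each year.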
UNIT-III Classes: 12
INTRODUCTION TO PIG
Understanding Pig and the Pig platform, introduction to the Pig Latin language and execution engine,
running Pig in different modes, the Pig Grunt shell and its usage.
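As a sketch of the two common execution modes: `pig -x local` starts the Grunt shell against the local file system, while plain `pig` runs against HDFS in MapReduce mode. The session below is illustrative; the file name and schema are assumptions.

```
$ pig -x local
grunt> records = LOAD 'weather.txt' AS (year:chararray, temp:int);
grunt> grouped = GROUP records BY year;
grunt> maxima  = FOREACH grouped GENERATE group, MAX(records.temp);
grunt> DUMP maxima;
```

`DUMP` prints the relation to the console, which is convenient for interactive exploration in the Grunt shell; `STORE` would write it out instead.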
WRITING PIG SCRIPTS USING PIG LATIN: Writing Pig scripts and saving them in a text editor,
running Pig scripts from the command line.
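A minimal Pig Latin script of the kind described above might look as follows; the input path, delimiter, and schema are hypothetical.

```
-- max_temp.pig: compute the maximum temperature per year
records  = LOAD 'weather.txt' USING PigStorage('\t')
           AS (year:chararray, temp:int);
filtered = FILTER records BY temp IS NOT NULL;
grouped  = GROUP filtered BY year;
maxima   = FOREACH grouped GENERATE group AS year, MAX(filtered.temp) AS max_temp;
STORE maxima INTO 'output/max_temp';
```

Saved as `max_temp.pig`, it can be run from the command line with `pig max_temp.pig` (MapReduce mode) or `pig -x local max_temp.pig` (local mode).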
HIVE DDL, DML AND HIVE SCRIPTS: Hive statements; understanding and working with Hive
Data Definition Language (DDL) and Data Manipulation Language (DML) statements; creating Hive
scripts and running them from the Hive terminal and the command line.
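A small HiveQL script combining DDL and DML statements could look like this; the table name, file path, and schema are illustrative assumptions.

```sql
-- DDL: define a table over tab-separated text data
CREATE TABLE IF NOT EXISTS weather (year STRING, temp INT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- DML: load a local file into the table
LOAD DATA LOCAL INPATH '/tmp/weather.txt' INTO TABLE weather;

-- Query: report the maximum temperature per year
SELECT year, MAX(temp) AS max_temp
FROM weather
GROUP BY year;
```

Saved as a script file, it can be run non-interactively with `hive -f max_temp.hql`, or the statements can be entered one at a time at the Hive terminal.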
FLUME: Introduction to the Flume agent; understanding the Flume components Source, Channel, and Sink;
writing a Flume configuration file; running the Flume configuration file to ingest data into HDFS.
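A minimal Flume configuration wiring one source to one sink through a channel might be sketched as below; the agent name `a1`, the log path, and the HDFS URL are hypothetical.

```
# Name the components of agent a1
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Source: tail a log file with the exec source (illustrative path)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app.log
a1.sources.r1.channels = c1

# Channel: in-memory buffer between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Sink: write events into HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events
a1.sinks.k1.channel = c1
```

The agent is started against this file with `flume-ng agent --name a1 --conf-file flume.conf`, after which events flow from the source, through the channel, into HDFS.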
OOZIE: Introduction to Oozie; understanding workflows and how to create a workflow using the
workflow definition language in XML; running a basic Oozie workflow that executes a MapReduce job.
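A basic `workflow.xml` with a single MapReduce action could be sketched as follows; the workflow name, mapper/reducer class names, and paths are illustrative assumptions.

```xml
<workflow-app name="max-temp-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="mr-node"/>
  <action name="mr-node">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.mapper.class</name>
          <value>MaxTempMapper</value>
        </property>
        <property>
          <name>mapred.reducer.class</name>
          <value>MaxTempReducer</value>
        </property>
        <property>
          <name>mapred.input.dir</name>
          <value>/user/input/weather</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>/user/output/max-temp</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>MapReduce job failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

With the workflow deployed to HDFS and a `job.properties` supplying `jobTracker` and `nameNode`, it can be submitted with `oozie job -config job.properties -run`.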