
Women Engg. College, Ajmer

Presented by: Monalisa Meena
Assistant Professor
Dept. of Computer Engineering
Big Data Analytics
Credit: 3
Max. Marks: 150 (IA: 30, ETE: 120)
3L+0T+0P
End Term Exam: 3 Hours
 With the right analytics, big data can deliver richer insight, since it draws from multiple sources and transactions to uncover hidden patterns and relationships.
 Prescriptive – reveals what actions should be taken. This is the most valuable kind of analysis and usually results in rules and recommendations for next steps.
 Predictive – what might happen. The deliverable is usually a predictive forecast.
 Diagnostic – a look at past performance to determine what happened and why. The result of the analysis is often an analytic dashboard.
 Descriptive – what is happening now, based on incoming data. To mine the analytics, you typically use a real-time dashboard and/or email reports.
 Objective, scope and outcome of the course.
 Big Data features and challenges, Problems with Traditional Large-Scale Systems, Sources of Big Data, 3 V's of Big Data, Types of Data. Working with Big Data: Google File System, Hadoop Distributed File System (HDFS) - Building blocks of Hadoop (NameNode, DataNode, Secondary NameNode, JobTracker, TaskTracker), Introducing and Configuring a Hadoop cluster (Local, Pseudo-distributed and Fully Distributed modes), Configuring XML files.
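As a rough illustration of the "Configuring XML files" topic, the sketch below (not part of the course notes) shows how a Hadoop program picks up the properties set in core-site.xml through the Configuration class; the hdfs://localhost:9000 value in the comments is only an assumed example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ShowClusterConfig {
    public static void main(String[] args) throws Exception {
        // Loads core-default.xml and core-site.xml from the classpath.
        Configuration conf = new Configuration();

        // fs.defaultFS (fs.default.name in older Hadoop 1.x releases) decides the mode:
        // file:/// for local mode, something like hdfs://localhost:9000 (assumed here)
        // for pseudo-distributed or fully distributed mode.
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS", "file:///"));

        // The FileSystem object returned depends on that setting.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Working directory: " + fs.getWorkingDirectory());
    }
}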
A Weather Dataset. Understanding the Hadoop API for the MapReduce Framework (Old and New). Basic programs of Hadoop MapReduce: Driver code, Mapper code, Reducer code, Record Reader, Combiner, Partitioner.
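The following is a minimal sketch (not taken from the course material) of the Driver, Mapper, Reducer and Combiner pieces listed above, written against the new org.apache.hadoop.mapreduce API; the WordCount example and its input/output paths are illustrative assumptions.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper code: emit (word, 1) for every word in the input line.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reducer code (also usable as a combiner): sum the counts for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    // Driver code: wires the job together and submits it.
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Such a program is typically packaged into a jar and submitted with something like hadoop jar wc.jar WordCount input output (the jar and path names are assumed).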
 The Writable Interface, WritableComparable and comparators. Writable Classes: Writable wrappers for Java primitives, Text, BytesWritable, NullWritable, ObjectWritable and GenericWritable, Writable collections. Implementing a Custom Writable: implementing a RawComparator for speed, custom comparators.
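Below is a minimal sketch of implementing a custom Writable, here a hypothetical composite (station, year) key that implements WritableComparable; the class and field names are assumptions made for illustration, and a RawComparator for speed would be supplied separately in the same spirit.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class StationYearWritable implements WritableComparable<StationYearWritable> {
    private final Text station = new Text();
    private int year;

    public void set(String stationId, int y) {
        station.set(stationId);
        year = y;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        station.write(out);     // serialize each field in a fixed order
        out.writeInt(year);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        station.readFields(in); // deserialize in exactly the same order
        year = in.readInt();
    }

    @Override
    public int compareTo(StationYearWritable other) {
        int cmp = station.compareTo(other.station);
        return (cmp != 0) ? cmp : Integer.compare(year, other.year);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof StationYearWritable)) return false;
        StationYearWritable s = (StationYearWritable) o;
        return station.equals(s.station) && year == s.year;
    }

    @Override
    public int hashCode() {
        return station.hashCode() * 163 + year;
    }

    @Override
    public String toString() {
        return station + "\t" + year;
    }
}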
 Hadoop Programming Made Easier: Admiring the Pig Architecture, Going with the Pig Latin Application Flow, Working through the ABCs of Pig Latin, Evaluating Local and Distributed Modes of Running Pig Scripts, Checking Out the Pig Script Interfaces, Scripting with Pig Latin.
 Part of the Hadoop ecosystem
 Developed by Yahoo
 High-level data flow system
 Provides an abstraction over MapReduce
 LOAD
 FOREACH
 FILTER
 JOIN
 ORDER BY
 STORE
 DISTINCT
 GROUP
 COGROUP
 Load
 Transform
 Dump or store.
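To make the operators and the load, transform, dump/store flow concrete, here is a minimal sketch that registers Pig Latin statements through Pig's embedded Java API (PigServer); the students.txt file, its schema and the filter condition are illustrative assumptions.

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigFlowSketch {
    public static void main(String[] args) throws Exception {
        // ExecType.LOCAL corresponds to "pig -x local";
        // ExecType.MAPREDUCE would run the jobs on the Hadoop cluster instead.
        PigServer pig = new PigServer(ExecType.LOCAL);

        // LOAD: read the input and give it a schema.
        pig.registerQuery("students = LOAD 'students.txt' USING PigStorage(',') "
                + "AS (name:chararray, dept:chararray, marks:int);");

        // FILTER + GROUP + FOREACH: the transform step.
        pig.registerQuery("passed  = FILTER students BY marks >= 40;");
        pig.registerQuery("by_dept = GROUP passed BY dept;");
        pig.registerQuery("avg_marks = FOREACH by_dept GENERATE group, AVG(passed.marks);");

        // STORE: write the result (DUMP would print it instead).
        pig.store("avg_marks", "avg_marks_out");
    }
}

The same statements could equally be typed interactively in the Grunt shell or saved in a .pig script file.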
 Uses Pig Latin
 Requires JRE
[Pig architecture diagram: a Pig script, along with any UDFs present in the local file system (LFS), is translated by the Pig Latin compiler into the execution of MapReduce; input files are read from HDFS and the output file is stored back in HDFS.]
 To run Pig in local mode, we need access to a single machine; all files and jars that are going to be processed should be installed and run in the local environment.
 This mode is used when there is a smaller set of data, for testing the code.
 MapReduce is simulated locally with the LocalJobRunner class of Hadoop.
pig -x local
 To run Pig in distributed mode, we need access to a Hadoop cluster and an HDFS installation.
 MapReduce mode is the default mode.
 In this mode, Pig translates the queries into MapReduce jobs and runs them on the Hadoop cluster. The cluster can be a pseudo-distributed or a fully distributed cluster.
pig
pig -x mapreduce
 Saying Hello to Hive, Seeing How the Hive is Put Together, Getting Started with Apache Hive, Examining the Hive Clients, Working with Hive Data Types, Creating and Managing Databases and Tables, Seeing How the Hive Data Manipulation Language Works, Querying and Analyzing Data.
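As a small illustration of creating a table and querying it with HiveQL, the sketch below goes through the HiveServer2 JDBC client; the connection URL, table schema and query are assumptions made for illustration only.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuerySketch {
    public static void main(String[] args) throws Exception {
        // Assumes HiveServer2 is listening on its default port 10000.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement()) {

            // DDL: create a managed table (illustrative schema).
            stmt.execute("CREATE TABLE IF NOT EXISTS students "
                    + "(name STRING, dept STRING, marks INT) "
                    + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");

            // Query: HiveQL looks like SQL and is compiled into jobs that run on the cluster.
            ResultSet rs = stmt.executeQuery(
                    "SELECT dept, AVG(marks) FROM students GROUP BY dept");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
            }
        }
    }
}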
