
BIG DATA ANALYTICS

Semester III Hours of Instruction/week: 4

17MITC15 No of credits: 3

Objective

To explore the fundamental concepts of big data analytics, including stream filtering, Hadoop,
and visual data analysis techniques.

UNIT I INTRODUCTION TO BIG DATA 12


Introduction to Big Data Platform – Challenges of Conventional Systems - Intelligent data
analysis – Nature of Data - Analytic Processes and Tools - Analysis vs Reporting - Modern Data
Analytic Tools - Statistical Concepts: Sampling Distributions - Re-Sampling - Statistical
Inference - Prediction Error.

UNIT II MINING DATA STREAMS 12


Introduction to Stream Concepts – Stream Data Model and Architecture – Stream Computing –
Sampling Data in a Stream – Filtering Streams – Counting Distinct Elements in a Stream –
Estimating Moments – Counting Ones in a Window – Decaying Window – Real Time
Analytics Platform (RTAP) Applications – Case Studies – Real Time Sentiment Analysis, Stock
Market Predictions.
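
Illustrative sketch for the "Filtering Streams" topic: a Bloom filter is one common way to filter a stream against a large set of allowed keys using a fixed amount of memory. The Python sketch below is a minimal example and not prescribed course material. The class name, the bit-array size, the number of hash functions, and the sample e-mail keys are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with k hash functions (sizes are illustrative)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive k positions by salting a SHA-256 digest with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely never added"; True means "possibly added".
        return all(self.bits[pos] for pos in self._positions(item))


def filter_stream(stream, allowed):
    """Yield only the stream elements that pass the Bloom filter."""
    for item in stream:
        if allowed.might_contain(item):
            yield item


# Hypothetical sample data: keys that should pass the filter.
allowed = BloomFilter()
for key in ["alice@example.com", "bob@example.com"]:
    allowed.add(key)

stream = ["alice@example.com", "mallory@example.com", "bob@example.com"]
passed = list(filter_stream(stream, allowed))
print(passed)
```

Every key that was added always passes. A key that was never added is rejected unless all of its bit positions happen to be set, so the filter can produce false positives but never false negatives.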

UNIT III HADOOP 12


History of Hadoop – The Hadoop Distributed File System – Components of Hadoop – Analyzing
the Data with Hadoop – Scaling Out – Hadoop Streaming – Design of HDFS – Java Interfaces to
HDFS – Basics – Developing a MapReduce Application – How MapReduce Works – Anatomy of a
MapReduce Job Run – Failures – Job Scheduling – Shuffle and Sort – Task Execution – MapReduce
Types and Formats – MapReduce Features

UNIT IV HADOOP ENVIRONMENT 12


Setting up a Hadoop Cluster - Cluster Specification - Cluster Setup and Installation - Hadoop
Configuration - Security in Hadoop - Administering Hadoop - HDFS - Monitoring - Maintenance -
Hadoop Benchmarks - Hadoop in the Cloud

UNIT V FRAMEWORKS 12
Applications on Big Data Using Pig and Hive – Data processing operators in Pig – Hive services
– HiveQL – Querying Data in Hive - Fundamentals of HBase and ZooKeeper - IBM InfoSphere
BigInsights and Streams. Visualizations - Visual data analysis techniques, interaction techniques;
Systems and applications

Total Hours: 60
REFERENCES

1. Michael Berthold, David J. Hand, “Intelligent Data Analysis”, Springer, 2007.
2. Tom White, “Hadoop: The Definitive Guide”, Third Edition, O’Reilly Media, 2012.
3. Chris Eaton, Dirk deRoos, Tom Deutsch, George Lapis, Paul Zikopoulos,
“Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data”,
McGraw-Hill Publishing, 2012.
4. Anand Rajaraman and Jeffrey David Ullman, “Mining of Massive Datasets”, Cambridge
University Press, 2012.
5. Bill Franks, “Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data
Streams with Advanced Analytics”, John Wiley & Sons, 2012.
6. Glenn J. Myatt, “Making Sense of Data”, John Wiley & Sons, 2007.
7. Pete Warden, “Big Data Glossary”, O’Reilly, 2011.
8. Jiawei Han, Micheline Kamber, “Data Mining: Concepts and Techniques”, Second
Edition, Elsevier, Reprinted 2008.
9. Da Ruan, Guoqing Chen, Etienne E. Kerre, Geert Wets, “Intelligent Data Mining”,
Springer, 2007.
10. Paul Zikopoulos, Dirk deRoos, Krishnan Parasuraman, Thomas Deutsch, James Giles,
David Corrigan, “Harness the Power of Big Data: The IBM Big Data Platform”, Tata
McGraw Hill Publications, 2012.
11. Paul Zikopoulos, Chris Eaton, “Understanding Big Data: Analytics for Enterprise Class
Hadoop and Streaming Data”, Tata McGraw Hill Publications, 2011.

DATA ANALYTICS - PRACTICAL IV

Semester III Hours of Instruction/week: 6

17MITC18 No of credits: 4

List of Programs

1. Implement the following data structures:
a) Linked Lists
b) Stacks
c) Queues
d) Set
e) Map
2. Set up and install Hadoop in its different operating modes:
a) Standalone
b) Pseudo-distributed
c) Fully distributed
3. Use web-based tools to monitor your Hadoop setup.
4. Implement the following file management tasks in Hadoop:
a) Adding files and directories
b) Retrieving files
c) Deleting files

Hint: A typical Hadoop workflow creates data files (such as log files) elsewhere and copies them
into HDFS using one of the above command line utilities.
5. Implement a MapReduce application for word counting on a Hadoop cluster.

6. Write a MapReduce program that mines weather data.

Weather sensors collecting data every hour at many locations across the globe gather a
large volume of log data, which is a good candidate for analysis with MapReduce, since it
is semi-structured and record-oriented.
7. Implement matrix multiplication with Hadoop MapReduce.
8. Install and run Pig, then write Pig Latin scripts to sort, group, join, project, and filter your
data.
9. Install and run Hive, then use Hive to create, alter, and drop databases, tables, views,
functions, and indexes.
10. Convert unstructured data into NoSQL data and perform operations such as NoSQL queries
through an API.
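
Illustrative sketch for programs 5 to 7: before running on a cluster, the map, shuffle, and reduce steps can be prototyped in memory. The Python sketch below is one possible approach under stated assumptions, not a Hadoop implementation. The helper `run_mapreduce`, all function names, the weather record layout `station,year,temperature`, and the sample data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Run map, shuffle (group values by key), and reduce in memory."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


# Program 5: word count.
def wordcount_mapper(line):
    for word in line.lower().split():
        yield word, 1


def wordcount_reducer(word, counts):
    return sum(counts)


# Program 6: maximum temperature per year.
# Assumed (hypothetical) record layout: "station,year,temperature".
def weather_mapper(line):
    parts = line.split(",")
    if len(parts) != 3:
        return  # skip malformed records
    _station, year, temp = parts
    try:
        yield year, float(temp)
    except ValueError:
        return  # skip records with a non-numeric reading


def weather_reducer(year, temps):
    return max(temps)


# Program 7: one-pass matrix multiplication C = A x B.
# Each input entry is (matrix_name, row, col, value).
def make_matmul_mapper(rows_a, cols_b):
    def mapper(entry):
        name, row, col, value = entry
        if name == "A":
            for k in range(cols_b):      # A[row][col] feeds every C[row][k]
                yield (row, k), ("A", col, value)
        else:
            for i in range(rows_a):      # B[row][col] feeds every C[i][col]
                yield (i, col), ("B", row, value)
    return mapper


def matmul_reducer(cell, tagged):
    a = {j: v for name, j, v in tagged if name == "A"}
    b = {j: v for name, j, v in tagged if name == "B"}
    return sum(a[j] * b[j] for j in a if j in b)


word_counts = run_mapreduce(["big data big", "data streams"],
                            wordcount_mapper, wordcount_reducer)
max_temps = run_mapreduce(
    ["s1,1949,11.1", "s2,1949,7.8", "s1,1950,2.2", "bad line", "s3,1950,n/a"],
    weather_mapper, weather_reducer)
entries = [("A", 0, 0, 1), ("A", 0, 1, 2), ("A", 1, 0, 3), ("A", 1, 1, 4),
           ("B", 0, 0, 5), ("B", 0, 1, 6), ("B", 1, 0, 7), ("B", 1, 1, 8)]
product = run_mapreduce(entries, make_matmul_mapper(2, 2), matmul_reducer)

print(word_counts)  # {'big': 2, 'data': 2, 'streams': 1}
print(max_temps)    # {'1949': 11.1, '1950': 2.2}
print(product)      # {(0, 0): 19, (0, 1): 22, (1, 0): 43, (1, 1): 50}
```

Keeping the mapper and reducer as plain functions separates the per-record logic from the execution engine, so the same logic can be tested on a few lines of input and later ported to a cluster job.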