
CE441: Big Data Analytics

Credit and Hours:

Teaching Scheme   Theory   Practical   Total   Credit
Hours/week        3        4           7
Marks             100      100         200

A. Objective of the Course:

The main objectives of offering the course Big Data Analytics are to:
• Teach how to develop algorithms for the statistical analysis of big data;
• Provide knowledge of big data applications;
• Teach how to use the fundamental principles of predictive analytics;
• Enable students to evaluate and apply appropriate principles, techniques and theories to
large-scale data science problems.

B. Outline of the Course:

Sr. No.  Title of the unit                                                Minimum number of hours
1        Big Data and Analytics                                           02
2        Data Collection, Sampling and Preprocessing                      06
3        Predictive Analytics, Descriptive Analytics, Survival Analysis   08
4        Introduction to Hadoop and Hadoop Architecture                   08
5        HDFS, Hive and HiveQL, HBase                                     08
6        Apache Spark and MongoDB                                         08
7        Big Data Applications and Visualization                          05

Total hours (Theory): 45
Total hours (Lab): 30
Total hours: 75
C. Detailed Syllabus:

1. Big Data and Analytics 02 hours 4%
Introduction to Big Data, Big Data Characteristics, Types of Big
Data, Traditional Versus Big Data Approach, Technologies
Available for Big Data, Infrastructure for Big Data, Use of Data
Analytics, Big Data Challenges.
2. Data Collection, Sampling and Preprocessing 06 hours 13%
Types of Data Sources, Sampling, Types of Data Elements, Visual
Data Exploration and Exploratory Statistical Analysis, Missing
Values, Outlier Detection and Treatment, Standardizing Data,
Categorization, Weights of Evidence Coding, Variable Selection,
Segmentation.
3. Predictive Analytics, Descriptive Analytics and Survival Analysis 08 hours 18%
Predictive Analytics: Target Definition, Linear Regression,
Logistic Regression, Decision Trees, Neural Networks, Support
Vector Machines, Ensemble Methods, Multiclass Classification
Techniques, Evaluating Predictive Models.
Descriptive Analytics: Association Rules, Sequence Rules,
Segmentation.
Survival Analysis: Survival Analysis Measurements, Kaplan-Meier
Analysis, Parametric Survival Analysis, Proportional Hazards
Regression, Extensions of Survival Analysis Models, Evaluating
Survival Analysis Models.
4. Introduction to Hadoop and Hadoop Architecture 08 hours 18%
Big Data: Apache Hadoop and the Hadoop Ecosystem; Moving
Data in and out of Hadoop: Understanding Inputs and Outputs
of MapReduce; Data Serialization.
5. HDFS, Hive and HiveQL, HBase 08 hours 18%
HDFS: Overview, Installation and Shell, Java API. Hive:
Architecture and Installation, Comparison with Traditional
Databases, Querying Data with HiveQL, Sorting and Aggregating,
MapReduce Scripts, Joins and Subqueries. HBase: Concepts,
Advanced Usage, Schema Design, Advanced Indexing. Pig.
ZooKeeper: How It Helps in Monitoring a Cluster, How HBase
Uses ZooKeeper, and How to Build Applications with ZooKeeper.
6. Apache Spark and MongoDB 08 hours 18%
Introduction to Data Analysis with Spark, Downloading Spark
and Getting Started, Programming with RDDs, Spark SQL, Spark
Streaming.

Introduction to MongoDB: Key Features, Core Server Tools,
MongoDB through the JavaScript Shell, Creating and Querying
with Indexes, Document-Oriented Data and Principles of Schema
Design, Constructing Queries on Databases, Collections and
Documents, MongoDB Query Language.
7. Graph Analytics and Data Visualization 05 hours 11%
Apache Spark GraphX: Property Graph, Graph Operators,
Subgraphs, Triplets.
Neo4j: Modeling Data with Neo4j; Cypher Query Language:
General Clauses, Read and Write Clauses.
Big Data Visualization with D3.js, Kibana and Grafana.

D. Instructional Method and Pedagogy:

At the start of the course, the course delivery pattern and the prerequisites of the subject will be
discussed.

• Lectures will be conducted with the aid of a multimedia projector, blackboard, overhead projector
(OHP), etc.
• Attendance is compulsory in lectures and laboratory sessions and carries a weightage of 5 marks.
• Two internal exams will be conducted, and their average will be converted to an equivalent of
15 marks as part of the internal theory evaluation.
• Assignments based on the course content will be given to students at the end of each unit/topic
and evaluated at regular intervals. They carry a weightage of 5 marks as part of the internal
theory evaluation.
• Surprise tests, quizzes and seminars will be conducted; these carry 5 marks as part of the
internal theory evaluation.
• The course includes a laboratory, where students have an opportunity to build an appreciation
of the concepts taught in lectures.
• Experiments and tutorials related to the course content will be carried out in the laboratory.

E. Student Learning Outcomes:

• Students will be able to build and maintain reliable, scalable, distributed systems with
Apache Hadoop and Spark.
• Students will be able to write MapReduce-based applications.
• Students will be able to design and build MongoDB-based big data applications and use the
MongoDB Query Language.
• Students will learn the differences between conventional SQL and NoSQL query languages, as
well as graph processing and visualization.
• Students will learn practical tips and techniques for big data use cases and solutions.

F. Recommended Study Material:

• Text Books:
1. Bart Baesens, Analytics in a Big Data World: The Essential Guide to Data Science and
Its Applications, Wiley, 2014.
• Reference Books:
1. Dirk deRoos et al., Hadoop for Dummies, Dreamtech Press, 2014.
2. Chuck Lam, Hadoop in Action, Manning, December 2010.
3. Jure Leskovec, Anand Rajaraman and Jeffrey D. Ullman, Mining of Massive Datasets, Cambridge
University Press.
4. I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques,
Morgan Kaufmann.
• Web Materials:
1. http://www.bigdatauniversity.com
