
Department of Computer Science

Interdisciplinary Masters Programme

Syllabus

MSc (Data Analytics)


2021-2022

CHRIST (Deemed to be University), Bangalore.


Karnataka, India
www.christuniversity.in
Syllabus for MSc (Data Analytics) 2021-2022

Department Overview

The Department of Computer Science of CHRIST (Deemed to be University) strives to shape
outstanding computer professionals with ethical and human values to reshape the nation’s destiny.
The training imparted aims to prepare young minds for the challenging opportunities in the IT
industry with a global awareness rooted in the Indian soil, nourished and supported by experts
in the field.

Vision

The Department of Computer Science endeavors to imbibe the vision of the University
“Excellence and Service”. The department is committed to this philosophy which pervades
every aspect and functioning of the department.

Mission

“To develop IT professionals with ethical and human values”. To accomplish our mission, the
department encourages students to apply their acquired knowledge and skills towards
professional achievements in their career. The department also moulds the students to be
socially responsible and ethically sound.

Introduction to the Programme

MSc in Data Analytics is a six-trimester interdisciplinary postgraduate degree programme
conducted by the Department of Computer Science. The programme is designed for working
professionals and graduates who want to launch a career in the in-demand and lucrative
field of data analytics. As organizations look for ways to exploit the power of big data,
technology professionals who are experienced in analytics are in high demand. The
programme aims to offer thorough knowledge of the theory and practice of data analytics so
that learners can become leading practitioners in the field. It accommodates a
wide audience of learners whose specific interests in data analytics may be either technical or
business focused.

Programme Objectives

• To enable learners to develop knowledge and skills in current and emerging areas of
data analytics.
• To critically assess and evaluate business and technical strategies for data analytics.
• To demonstrate expert knowledge of data analysis, statistics, tools, techniques and
technologies of data analytics.
• To develop project-management, critical-thinking, problem-solving and decision-making
skills.
• To formulate and implement a novel research idea and conduct research in the field of
data analytics.

Ethics and Human Values

1. Only proprietary or open source software is used for academic teaching and
learning purposes.
2. Copying programs from the internet, friends or other sources is strictly discouraged,
since it impairs the development of programming skills.
3. Unique, domain-based practical exercises ensure that students do not engage in
code plagiarism.
4. Projects undertaken by students during the course are done in teams to improve
collaborative work and synergy between team members.
5. Projects involve modularization, which encourages students to take individual
responsibility for common goals.
6. Passion for excellence is promoted among the students, be it in software development
or project documentation.
7. Giving due credit to sources in seminars and research assignments is promoted
among the students.
8. The course and its design enforce the practice of good referencing technique to improve
the sense of integrity.
9. Courses involving group discussions and debates on ethical practices and human values
are designed to sensitize the students in dealing with customers and members within the
organization.

Programme Outcomes:
On successful completion of the MSc programme students will be able to

PO1: Engage in continuous reflective learning in the context of technology and scientific
advancement.
PO2: Identify the need for and scope of interdisciplinary research.
PO3: Enhance research culture and uphold scientific integrity and objectivity.
PO4: Understand professional, ethical and social responsibilities.
PO5: Understand the importance and judicious use of technology for the sustainability of
the environment.
PO6: Enhance disciplinary competency, employability and leadership skills.

Programme Specific Outcomes:

PSO1: Problem Analysis and Design: Ability to identify, analyze and design solutions for data
analytics problems using fundamental principles of mathematics, statistics, computing sciences, and
relevant domain disciplines.
PSO2: Modern software tool usage: Acquire skills in handling data analytics programming tools
for problem solving and solution analysis in domain-specific problems.
PSO3: Societal and Environmental Concern: Utilize data analytics theories for societal and
environmental concerns.
PSO4: Professional Ethics: Understand and commit to professional ethics and cyber regulations,
responsibilities, and norms of professional computing practices.
PSO5: Applications in Multidisciplinary domains: Understand the role of statistical approaches and
apply them to solve real-life problems in the fields of data analytics.
PSO6: Project Management: Apply research-based knowledge to analyse and solve advanced
problems in data analytics.

Programme Structure of MSc (Data Analytics) - Trimester-wise

Trimester I

Course Code  Course Title                  Course Type  No. of Hrs/Week  Marks  Credits
MDA131       Principles of Data Analytics  Core         04               100    04
MDA171       Statistical Methods using R   Core         05               100    04
MDA172       Python for Data Analytics     Core         05               100    04
Total                                                   14               300    12

Trimester II

Course Code  Course Title                                Course Type  No. of Hrs/Week  Marks  Credits
MDA231       Mathematical Foundation for Data Analytics  Core         04               100    04
MDA271       Database Technologies                       Core         05               100    04
MDA272       Data Mining                                 Core         05               100    04
Total                                                                 14               300    12

Trimester III

Course Code  Course Title             Course Type  No. of Hrs/Week  Marks  Credits
MDA331       Artificial Intelligence  Core         04               100    04
MDA371       Regression Modelling     Core         05               100    04
MDA372       Big Data Analytics       Core         05               100    04
Total                                              14               300    12

Trimester IV

Course Code  Course Title                 Course Type  No. of Hrs/Week  Marks  Credits
MDA471       Machine Learning             Core         05               100    04
MDA472       Natural Language Processing  DSE          05               100    04
             Generic Elective - I         GE           04               100    04
Total                                                  14               300    12

Trimester V

Course Code  Course Title                       Course Type  No. of Hrs/Week  Marks  Credits
MDA571       Data Visualization                 Core         05               100    04
MDA572       Neural Networks and Deep Learning  DSE          05               100    04
             Generic Elective - II              GE           04               100    04
Total                                                        14               300    12

Trimester VI

Course Code  Course Title            Course Type  No. of Hrs/Week  Marks  Credits
MDA681       Project                 Core         08               100    04
             Generic Elective - III  GE           04               100    04
             Generic Elective - IV   GE           04               100    04
Total                                             16               300    12

Elective Courses offered by Computer Science Department


Discipline Specific Elective

Course Code  Course Title                       Course Type  No. of Hrs/Week  Marks  Credits
MDA472       Natural Language Processing        DSE          05               100    04
MDA572       Neural Networks and Deep Learning  DSE          05               100    04

Generic Elective Courses

Course Code  Course Title                       Course Type  No. of Hrs/Week  Marks  Credits
MDA461       Business Intelligence              GE           04               100    04
MDA561       Internet of Things                 GE           04               100    04
MDA661       Web Analytics                      GE           04               100    04
MDA662       Cloud Analytics                    GE           04               100    04

Trimester – I

MDA131: PRINCIPLES OF DATA ANALYTICS

Total Teaching Hours for Semester: 60


Max Marks: 100 Credits: 04

Course Objectives

To provide a strong foundation in data analytics and its application areas, and to build an
understanding of the underlying core concepts and emerging technologies in data analytics.

Course Outcomes

CO1: Explore the fundamental concepts of data analytics
CO2: Understand data analysis techniques for applications handling large data
CO3: Understand various machine learning algorithms used in the data analytics process
CO4: Visualize and present the inference using various tools
CO5: Learn to think through the ethics surrounding privacy, data sharing and algorithmic
decision-making

Unit-1 Teaching Hours: 12

INTRODUCTION
Data Analytics - Types – Phases - Quality and Quantity of data – Measurement - Exploratory
data analysis - Business Intelligence

Unit-2 Teaching Hours: 12

BIG DATA
Big Data and Cloud technologies - Introduction to HADOOP: Big Data, Apache Hadoop,
MapReduce - Data Serialization - Data Extraction - Stacking Data - Dealing with data.

Unit-3 Teaching Hours: 12

DATA VISUALIZATION
Introduction to data visualization – Data visualization options – Filters – Dashboard
development tools – Creating an interactive dashboard with dc.js - summary.

Unit-4 Teaching Hours: 12

ANALYTICS AND MACHINE LEARNING
Machine learning – Modeling Process – Training model – Validating model – Predicting new
observations – Supervised learning algorithms – Unsupervised learning algorithms.

Unit-5 Teaching Hours: 12

ETHICS AND RECENT TRENDS


Data Science Ethics – Doing good data science – Owners of the data - Valuing different
aspects of privacy - Getting informed consent - The Five Cs – Diversity – Inclusion – Future
Trends.

Essential Reading:

[1] Davy Cielen, Arno D. B. Meysman, Mohamed Ali, Introducing Data Science, Manning
Publications Co., 1st edition, 2016.
[2] Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to
Statistical Learning: with Applications in R, Springer, 1st edition, 2013.
[3] Bart Baesens, Analytics in a Big Data World: The Essential Guide to Data Science and its
Applications, Wiley.
[4] DJ Patil, Hilary Mason, Mike Loukides, Ethics and Data Science, O’Reilly, 1st edition,
2018.

Recommended Reading:

[1] Dr Anil Maheshwari, Data Analytics Made Accessible, Publisher: Amazon.com Services
LLC.
[2] Joel Grus, Data Science from Scratch: First Principles with Python, O’Reilly, 1st edition,
2015.
[3] Cathy O'Neil, Rachel Schutt, Doing Data Science: Straight Talk from the Frontline,
O’Reilly, 1st edition, 2013.
[4] Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman, Mining of Massive Datasets,
Cambridge University Press, 2nd edition, 2014.
[5] Eric Siegel, Predictive Analytics The Power to Predict Who Will Click, Buy, Lie, or Die,
2nd Ed., Wiley.

MDA171: STATISTICAL METHODS USING R

Total Teaching Hours for Semester: 75


Max Marks: 100 Credits: 04

Course Objectives

This course equips students to visualize and analyse data using R and to
communicate statistical results correctly.

Course Outcomes

CO1: Understand R and RStudio
CO2: Create reports using R Markdown
CO3: Analyse data for a given problem
CO4: Apply probability and statistics in real-life problems
CO5: Draw scientific inference from data using R

Unit-1 Teaching Hours: 15

R AND RSTUDIO
Getting started with R - installing R and RStudio - getting help - installing and loading
packages - simple arithmetic calculations - data structure – expressions - conditional statements
– functions – loops - R Markdown - introduction to Statistics - probability and data with R.
Lab Exercises
1. R program to illustrate different data structures
2. Defining functions and making a report in R Markdown

Unit-2 Teaching Hours: 15

EXPLORATORY DATA ANALYSIS


Visualizing numerical data - graphing systems available in R - descriptive Statistics - measures
of central tendency and dispersion – correlation - transforming data - exploring categorical
variables.
Lab Exercises
3. Loading dataset and visualizing data
4. Producing descriptive statistics measures

Unit-3 Teaching Hours: 15

PROBABILITY AND PROBABILITY DISTRIBUTIONS
Introduction - disjoint events - general addition rule – independence - probability examples -
disjoint vs. independent - conditional probability - probability trees - normal distribution -
evaluating the normal distribution - working with the normal distribution - binomial
distribution - normal approximation to binomial - working with the binomial distribution.

Lab Exercises
5. Computing probabilities in R
6. Functions for probability distributions in R

Unit-4 Teaching Hours: 15

ESTIMATION
Introduction to Inference - sampling from population - maximum likelihood estimator - least
squares estimator - confidence interval (CI) (for a mean) - accuracy vs. precision - required
sample size for mean, CI (for the mean) examples.
Lab Exercises
7. Finding ML estimates and least squares estimates
8. Constructing confidence interval

Unit-5 Teaching Hours: 15

TESTING OF HYPOTHESIS
Introduction - hypothesis testing (HT) - decision errors - large sample and small sample tests -
inference for other estimators - significance vs. confidence level - statistical vs. practical
significance - inference for proportions.
Lab Exercises
9. Carrying out large sample tests in R
10. Small-sample tests in R: t-test and paired t-test

Essential Reading:

[1] Grolemund G., Hands-on programming with R: write your own functions and simulations,
O'Reilly Media Inc., 2014.
[2] James G., Witten D., Hastie T., & Tibshirani R., An introduction to statistical learning: with
Applications in R, Springer, 2013.

Recommended Reading:

[1] Gupta S. C., & Kapoor V. K., Fundamentals of Mathematical Statistics, Sultan Chand &
Sons, 2018.
[2] Peng R. D., Exploratory data analysis with R, Lulu.com, 2012.
[3] Peng R. D., R programming for data science, Leanpub, 2016.
[4] Teetor P., R cookbook: Proven recipes for data analysis, statistics, and graphics, O'Reilly
Media Inc., 2011.
[5] Crawley M. J., The R book, John Wiley & Sons, 2012.

MDA172: PYTHON FOR DATA ANALYTICS


Total Teaching Hours for Semester: 75
Max Marks: 100 Credits: 04

Course Objectives

The objective of this course is to provide comprehensive knowledge of the Python programming
paradigms required for Data Analytics.

Course Outcomes

CO1: Demonstrate the use of built-in objects of Python
CO2: Demonstrate significant experience with the Python program development environment
CO3: Implement numerical programming, data handling and visualization through
NumPy, Pandas and Matplotlib modules.

Unit-1 Teaching Hours: 15

INTRODUCTION TO PYTHON
Structure of Python Program - Underlying mechanism of Module Execution - Branching and
Looping - Problem Solving Using Branches and Loops - Functions - Lists and Mutability -
Problem Solving Using Lists and Functions
Lab Exercises
1. Demonstrate usage of branching and looping statements
2. Demonstrate Recursive functions
3. Demonstrate Lists

Unit-2 Teaching Hours: 15

SEQUENCE DATATYPES AND OBJECT-ORIENTED PROGRAMMING
Sequences, Mapping and Sets - Dictionaries - Classes: Classes and Instances - Inheritance -
Exception Handling - Introduction to Regular Expressions using “re” module.
Lab Exercises
4. Demonstrate Tuples and Sets
5. Demonstrate Dictionaries
6. Demonstrate inheritance and exception handling
7. Demonstrate use of “re”.
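
As an illustration of the kind of program Lab Exercise 7 asks for, the following minimal sketch uses Python's standard `re` module to pull course codes out of free text. The pattern (the prefix MDA followed by exactly three digits) and the helper name `find_course_codes` are assumptions made for this example, not part of the prescribed exercise.

```python
import re

# Assumed pattern: the literal prefix "MDA" followed by exactly three digits,
# delimited by word boundaries so that longer tokens are not matched.
COURSE_CODE = re.compile(r"\bMDA\d{3}\b")


def find_course_codes(text):
    """Return every course code found in the text, in order of appearance."""
    return COURSE_CODE.findall(text)


if __name__ == "__main__":
    sample = "Trimester I offers MDA131, MDA171 and MDA172."
    print(find_course_codes(sample))
```

Compiling the pattern once and reusing it keeps the matching rule in a single place, which helps when the same expression is applied to many lines.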

Unit-3 Teaching Hours: 15

USING NUMPY
Basics of NumPy - Computation on NumPy - Aggregations - Computation on Arrays -
Comparisons, Masks and Boolean Arrays - Fancy Indexing - Sorting Arrays - Structured Data:
NumPy’s Structured Array.

Lab Exercises
8. Demonstrate Aggregation
9. Demonstrate Indexing and Sorting

Unit-4 Teaching Hours: 15

DATA MANIPULATION WITH PANDAS


Introduction to Pandas Objects - Data indexing and Selection - Operating on Data in Pandas -
Handling Missing Data - Hierarchical Indexing - Combining Data Sets - Aggregation and
Grouping - Pivot Tables.
Lab Exercises
10. Demonstrate handling of missing data
11. Demonstrate hierarchical indexing

Unit-5 Teaching Hours: 15

VISUALIZATION AND MATPLOTLIB
Basic functions of matplotlib - Simple Line Plot, Scatter Plot - Density and Contour Plots -
Histograms, Binnings and Density - Customizing Plot Legends, Colour Bars -
Three-Dimensional Plotting in Matplotlib.
Lab Exercises
12. Demonstrate Scatter Plot
13. Demonstrate 3D plotting

Essential Reading:

[1] Jake VanderPlas, Python Data Science Handbook - Essential Tools for Working with Data,
O’Reilly Media Inc., 2016.
[2] Zhang, Y., An Introduction to Python and Computer Programming, Springer Publications,
2016.

Recommended Reading:

[1] Joel Grus, Data Science from Scratch: First Principles with Python, O’Reilly Media, 2016.
[2] T.R.Padmanabhan, Programming with Python, Springer Publications, 2016.

Trimester – II

MDA231: MATHEMATICAL FOUNDATION FOR DATA ANALYTICS

Total Teaching Hours for Semester: 60


Max Marks: 100 Credits: 04

Course Objectives

Linear Algebra plays a fundamental role in the theory of Data Science. This course aims at
introducing the basic notions of vector spaces, Linear Algebra and the use of Linear Algebra in
applications to Data Science.

Course Outcomes

CO1: Understand the properties of Vector spaces
CO2: Use the properties of Linear Maps in solving problems on Linear Algebra
CO3: Demonstrate proficiency in Eigenvalues, Eigenvectors and Inner Product
Spaces
CO4: Apply mathematics for some applications in Data Science

Unit-1 Teaching Hours: 15

INTRODUCTION TO VECTOR SPACES
Vector Spaces: R^n and C^n, lists, F^n and digression on fields, Definition of Vector spaces,
Subspaces, sums of Subspaces, Direct Sums, Span and Linear Independence, bases, dimension.

Unit-2 Teaching Hours: 20

LINEAR MAPS
Definition of Linear Maps - Algebraic Operations on L(V, W) - Null spaces and
Injectivity - Range and Surjectivity - Fundamental Theorems of Linear Maps - Representing a
Linear Map by a Matrix - Invertible Linear Maps - Isomorphic Vector spaces - Linear Map as
Matrix Multiplication - Operators - Products of Vector Spaces - Product of Direct Sum -
Quotients of Vector spaces.

Unit-3 Teaching Hours: 10

EIGENVALUES, EIGENVECTORS, AND INNER PRODUCT SPACES
Eigenvalues and Eigenvectors - Eigenvectors and Upper Triangular matrices – Eigenspaces
and Diagonal Matrices - Inner Products and Norms - Linear functionals on Inner Product
spaces.

Unit-4 Teaching Hours: 15

MATHEMATICS APPLIED TO DATA SCIENCE


Singular value decomposition - Handwritten digits and simple algorithm - Classification of
handwritten digits using SVD bases - Tangent distance - Text Mining.

Essential Reading:

[1] S. Axler, Linear algebra done right, Springer, 2017.
[2] Lars Eldén, Matrix methods in data mining and pattern recognition, Society for Industrial
and Applied Mathematics, 2007.

Recommended Reading:

[1] E. Davis, Linear algebra and probability for computer science applications, CRC Press,
2012.
[2] J. V. Kepner and J. R. Gilbert, Graph algorithms in the language of linear algebra, Society
for Industrial and Applied Mathematics, 2011.
[3] D. A. Simovici, Linear algebra tools for data mining, World Scientific Publishing, 2012.
[4] P. N. Klein, Coding the matrix: linear algebra through applications to computer science,
Newtonian Press, 2015.

MDA271: DATABASE TECHNOLOGIES


Total Teaching Hours for Semester: 75
Max Marks: 100 Credits: 04

Course Objectives

The main objective of this course is to provide fundamental knowledge of, and practical
experience with, database concepts. It covers the concepts and terminology that facilitate
constructing database tables and writing effective queries. It also aims to build an
understanding of the data warehouse and its functions.

Course Outcomes

CO1: Demonstrate various databases
CO2: Compose effective queries
CO3: Distinguish database from data warehouse and examine its applications

Unit-1 Teaching Hours: 15

INTRODUCTION
Concept & Overview of DBMS, Data Models, Database Languages, Database Administrator,
Database Users, Three Schema architecture of DBMS. Basic concepts, Design Issues, Mapping
Constraints, Keys, Entity-Relationship Diagram, Weak Entity Sets, Extended E-R features

Lab Exercises
1. Data Definition
2. Table Creation
3. Constraints

Unit-2 Teaching Hours: 15

RELATIONAL MODEL AND DATABASE DESIGN
SQL and Integrity Constraints, Concept of DDL, DML, DCL. Basic Structure, Set operations,
Aggregate Functions, Null Values, Domain Constraints, Referential Integrity Constraints,
Assertions, Views, Nested Subqueries, Functional Dependency, Different anomalies in
designing a Database, Normalization: using functional dependencies, Boyce-Codd Normal
Form, 4NF, 5NF

Lab Exercises
4. Insert, Select, Update & Delete Commands
5. Nested Queries & Join Queries
6. Views

Unit-3 Teaching Hours: 15

DATA WAREHOUSE: THE BUILDING BLOCKS
Defining Features, Data Warehouses and Data Marts, Architectural Types, Overview of the
Components, Metadata in the Data Warehouse, Data Design and Data Preparation: Principles of
Dimensional Modeling, From Requirements to Data Design, The Star Schema, Star Schema
Keys, Advantages of the Star Schema, Star Schema: Examples, Dimensional Modeling:
Advanced Topics, Updates to the Dimension Tables, Miscellaneous Dimensions, The
Snowflake Schema, Aggregate Fact Tables, Families of Stars

Lab Exercises
7. Importing source data structures
8. Design Target Data Structures

Unit-4 Teaching Hours: 15

REQUIREMENTS, REALITIES, ARCHITECTURE AND DATA FLOW


Requirements, ETL Data Structures, Extracting, Cleaning and Conforming, Delivering
Dimension Tables, Delivering Fact Tables (CH:1,2,3,4,5,6)

Lab Exercises
9. Create target structure
10. Design and build the ETL mapping
11. Perform the ETL process and transform into data map

Unit-5 Teaching Hours: 15

IMPLEMENTATION, OPERATIONS AND ETL SYSTEMS
Development, Operations, Metadata, Real-Time ETL Systems. (CH:7,8,9,11)

Lab Exercises
12. Create the cube and process it
13. Generating Reports
14. Creating the Pivot table and pivot chart using some existing data

Essential Reading:
[1] Henry F. Korth and Abraham Silberschatz, “Database System Concepts”, McGraw Hill.
[2] Thomas Connolly and Carolyn Begg, “Database Systems: A Practical Approach to Design,
Implementation and Management”, Third Edition, Pearson Education, 2007.
[3] Ralph Kimball and Margy Ross, “The Data Warehouse Toolkit: The Complete Guide to
Dimensional Modeling”, 2nd edition, John Wiley & Sons, Inc., New York, USA, 2002.

Recommended Reading:
[1] Lior Rokach and Oded Maimon, Data Mining and Knowledge Discovery Handbook,
Springer, 2nd edition, 2010.

MDA272: DATA MINING


Total Teaching Hours for Semester: 75
Max Marks: 100 Credits: 04

Course Objectives

To preprocess and analyze data, to choose relevant models and algorithms for respective
applications and to develop research interest towards advances in data mining

Course Outcomes

CO1: Understand different types of data to be mined
CO2: Categorize the scenario for applying different data mining techniques
CO3: Evaluate different models used for classification and Clustering
CO4: Focus towards research and innovation

Unit-1 Teaching Hours: 15

INTRODUCTION AND DATA PREPROCESSING
Data Mining – Kinds of data to be mined – Kinds of patterns to be mined – Technologies –
Targeted Applications – Major Issues in Data Mining – Data Objects and Attribute Types –
Measuring Data similarity and dissimilarity – Data Cleaning – Data Integration – Data
Reduction – Data Transformation – Data Discretization
Lab Exercises:
1. Identify a dataset and preprocess it using normalization techniques
2. Explore data reduction techniques

Unit-2 Teaching Hours: 15

MINING FREQUENT PATTERNS AND ADVANCED PATTERN MINING


Basic Concepts – Frequent Itemset Mining Methods – Pattern Evaluation Methods – Pattern
Mining in Multilevel, Multidimensional space – Constraint-Based Frequent Pattern Mining –
Mining Compressed or Approximate Patterns – Pattern Exploration and Application
Lab Exercises:
3. Identify frequent itemsets using Apriori Algorithm
4. Generate FP Tree for a transaction dataset
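
To make Lab Exercise 3 concrete, here is a minimal pure-Python sketch of the Apriori idea: count candidate itemsets level by level and keep only those meeting a minimum support count. The function name `apriori`, the use of an absolute support count instead of a fraction, and the toy transactions are assumptions chosen for illustration; a lab solution may well rely on a library instead.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset.

    min_support is an absolute count (an assumption made for this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Support counting: number of transactions containing each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    frequent = count({frozenset([item]) for t in transactions for item in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter"},
               {"bread", "milk", "butter"}, {"milk"}]
    for itemset, support in apriori(baskets, min_support=2).items():
        print(sorted(itemset), support)
```

The prune step is what distinguishes Apriori from brute-force counting: a candidate is dropped without scanning the data when any of its subsets is already known to be infrequent.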

Unit-3 Teaching Hours: 15

CLASSIFICATION TECHNIQUES
Basic Concepts – Decision Tree Induction – Bayes Classification Methods – Rule-Based
Classification – Model Evaluation and Selection – Techniques to Improve Classification
Accuracy – Bayesian Belief Networks – Classification by Backpropagation – Support Vector
Machines
Lab Exercises:
5. Construct Decision Tree for a dataset and identify the order of attributes
6. Apply Bayes Classification

Unit-4 Teaching Hours: 15

CLUSTERING TECHNIQUES
Cluster Analysis – Partitioning Methods - Hierarchical Methods – Density-Based Methods
(Includes all clustering techniques under the given categories in the Text Book)
Lab Exercises:
7. Demonstrate Naïve Bayes Classifier
8. Apply K-Means Clustering for given number of clusters

Unit-5 Teaching Hours: 15

OUTLIER DETECTION AND APPLICATIONS
Outliers and Outlier Analysis – Clustering-Based Approach – Classification-Based Approach –
Mining Complex Data Types – Data Mining Applications
Lab Exercises:
9. Demonstrate Hierarchical clustering for a large dataset
10. Case studies and assignment

Essential Reading:
[1] Jiawei Han, Micheline Kamber, Jian Pei, Data Mining: Concepts and Techniques, Morgan
Kaufmann Publishers, Third Edition, 2012.
[2] Arun K Pujari, Data Mining Techniques, Second Edition, Universities Press India Pvt. Ltd.,
2010.

Recommended Reading:
[1] Daniel T. Larose, Chantal D. Larose, Data Mining and Predictive Analytics (Wiley Series
on Methods and Applications in Data Mining), Wiley Publications.
[2] Ian H. Witten, Eibe Frank, Mark A. Hall, Data Mining: Practical Machine Learning Tools
and Techniques, Morgan Kaufmann Publishers, Third Edition, 2014.

Web Resources:
[1] https://data-flair.training/blogs/data-mining-tutorial/
[2] https://www.tutorialride.com/data-mining/data-mining-tutorial.htm

Trimester - III
MDA331: ARTIFICIAL INTELLIGENCE
Total Teaching Hours for Semester: 60
Max Marks: 100 Credits: 04

Course Objectives
This course aims at developing an understanding about the issues involved in defining and
simulating perception, identifying the problems where AI is required and the different methods
available, to compare and contrast different AI techniques available, to define and explain
learning algorithms and to provide the student additional experience in the analysis and
evaluation of complicated systems.

Course Outcomes
CO1: Express the modern view of AI and its foundation
CO2: Illustrate Search Strategies with algorithms and Problems
CO3: Implement Propositional logic and apply inference rules
CO4: Apply suitable techniques for NLP and Game Playing

Unit – 1 Teaching Hours: 12


INTRODUCTION
Introduction to AI, The Foundations of AI, AI Technique - Tic-Tac-Toe. Problem
characteristics, Production system characteristics, Production systems: 8-puzzle problem.
Searching: Uninformed search strategies – Breadth first search, depth first search.

Unit – 2 Teaching Hours: 12


LOCAL SEARCH ALGORITHMS
Generate and Test, Hill climbing, simulated annealing search, Constraint satisfaction problems,
Greedy best first search, A* search, AO* search. Toy problems

Unit – 3 Teaching Hours: 12


KNOWLEDGE REPRESENTATION
First order logic. Inference in first order logic, propositional vs. first order inference,
unification & lifting, Clausal form conversion, Forward chaining, Backward chaining,
Resolution.
SELF LEARNING
Propositional logic - syntax & semantics

Unit – 4 Teaching Hours: 12


GAME PLAYING
Overview, Minimax algorithm, Alpha-Beta pruning, Additional Refinements. Probabilistic
Reasoning: Ad Hoc Methods, Expert System, Expert System Shells
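
The Minimax algorithm and Alpha-Beta pruning listed in this unit can be sketched in a few lines of Python. The representation below (a game tree given as nested lists whose leaves are numeric evaluation values) and the function name `alphabeta` are assumptions made for illustration only; a real game would generate child positions from a move generator.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Return the minimax value of a game tree, skipping branches that cannot matter.

    node is either a number (leaf evaluation) or a list of child nodes.
    """
    if not isinstance(node, list):
        return node  # leaf: static evaluation value
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # MIN already has a better alternative elsewhere
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # MAX already has a better alternative elsewhere
    return best


if __name__ == "__main__":
    # Depth-2 tree: MAX chooses among three MIN nodes.
    tree = [[3, 5], [2, 9], [0, 1]]
    print(alphabeta(tree, maximizing=True))
```

Pruning never changes the value returned at the root; it only avoids evaluating leaves (here 9 and 1) that cannot influence the final choice.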

Unit – 5 Teaching Hours: 12


NATURAL LANGUAGE PROCESSING
Introduction, Practical Applications of NLP, Syntax processing, Semantic Analysis, Pragmatic
and Discourse Processing: Analysis, Perception.

Essential Reading:
[1] E. Rich and K. Knight, Artificial Intelligence, 3rd Edition. New York: TMH, 2019.
[2] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 3rd Edition. Pearson
Education, 2019.

Recommended Reading:
[1] Eugene Charniak and Drew McDermott, Introduction to Artificial Intelligence, 2nd Edition.
Singapore: Pearson Education, 2005.
[2] George F Luger, Artificial Intelligence: Structures and Strategies for Complex
Problem Solving, 4th Edition. Singapore: Pearson Education, 2008, ISBN-13 9780321545893
[3] N. J. Nilsson, Artificial Intelligence: A New Synthesis, 1st Edition. USA:
Morgan Kaufmann, 2000.
[4] Introduction to artificial intelligence by Patterson, ISBN-13: 978-0134771007

Web Resources:
[1] https://ai.google/education/
[2] https://intellipaat.com/blog/tutorial/artificial-intelligence-tutorial/
[3] https://www.javatpoint.com/artificial-intelligence-tutorial

MDA371: REGRESSION MODELLING


Total Teaching Hours for Semester: 75
Max Marks: 100 Credits: 04

Course Objectives

This course equips students to assess the relationship between variables in a data set and a
continuous response variable. In this course, students learn to fit simple and multiple linear
regression models using the R program.

Course Outcomes

CO1: Understand simple and multiple linear regression models.
CO2: Analyze relationships between multiple variables.
CO3: Build linear models and predict the study variable using the R program
CO4: Validate regression models
CO5: Model categorical and count data

Unit-1 Teaching Hours: 15

INTRODUCTION
Introduction to regression: regression through the origin, linear least squares, regression to the
mean, basic definitions: notation for data, the empirical mean, the empirical standard deviation
and variance, normalization, empirical covariance, some facts about correlation.

Lab Exercises in R
1. Visualizing data for model fitting using R
2. Finding least squares estimates for parameters in the simple linear model
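
The computation behind Lab Exercise 2 can be written out directly from the closed-form least squares formulas, slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The course prescribes R for the lab itself; the sketch below uses plain Python only to make the arithmetic explicit, and the function name `least_squares_fit` and the sample data are assumptions for illustration.

```python
def least_squares_fit(x, y):
    """Return (intercept, slope) of the least squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sxy: sum of cross-products of deviations; Sxx: sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Points lying exactly on y = 1 + 2x, so the fit should recover that line.
    intercept, slope = least_squares_fit([1, 2, 3, 4], [3, 5, 7, 9])
    print(intercept, slope)
```

The sketch assumes the x values are not all identical, since Sxx would then be zero and the slope undefined.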

Unit-2 Teaching Hours: 15

SIMPLE LINEAR REGRESSION MODEL
Simple linear model with normal errors, regression parameters: interpretation, properties,
estimation and testing of hypotheses, prediction using the regression model. R-squared.

Lab Exercises in R

3. Building a basic linear regression model for the association between a single
explanatory variable and a response variable.
4. Finding interval estimates and testing hypotheses in a simple linear model.

Unit-3 Teaching Hours: 15

MULTIVARIABLE REGRESSION ANALYSIS

Multivariable linear regression model, estimation, example with two variables, simple linear
regression: the general case, interpretation of the coefficients, fitted values, residuals and
residual variation
Lab Exercises in R
5. Building a multiple linear regression model for the association between explanatory
variables and a response variable.
6. Finding interval estimates and testing hypotheses in multiple linear models.

Unit-4 Teaching Hours: 15

RESIDUALS, VARIATION, DIAGNOSTICS AND MODEL SELECTION


Residuals, influential, high leverage and outlying points, residuals, leverage and influence
measures, model selection: the Rumsfeldian triplet, general rules, R squared and adjusted R
squared, variance inflation factor, the impact of over- and under-fitting on residual variance
estimation, covariate model selection
Lab Exercises in R
7. Residual analysis of linear regression model
8. Model selection and nested model testing

Unit-5 Teaching Hours: 15

GENERALIZED LINEAR MODELS


Logistic regression: modelling binary response, estimation, odds, modelling the odds,
interpreting logistic regression, Poisson distribution and Poisson regression: modelling count
data, estimation, Poisson distribution, linear regression, Poisson regression, mean-variance
relationship, rates.
Lab Exercises in R
9. Building a logistic regression model for the categorical response variable
10. Modelling count data using a Poisson regression model.

Essential Reading:

[1] Fox, J., & Weisberg, S., An R companion to applied regression, Sage publications, 2018.
[2] Caffo, B., Regression models for data science in R, Leanpub, 2015.

Recommended Reading:

[1] Ciaburro, G., Regression Analysis with R: Design and develop statistical nodes to identify
unique relationships within data at scale, Packt Publishing Ltd, 2018.
[2] Sheather, S., A modern approach to regression with R, Springer Science & Business Media,
2009.
[3] Lilja, D. J., Linear Regression Using R: An Introduction to Data Modeling, University of
Minnesota Libraries Publishing, 2016.

MDA372: BIG DATA ANALYTICS


Total Teaching Hours for Semester: 75
Max Marks: 100 Credits: 04

Course Objectives

This course introduces Big Data as it arises in real-time applications and shows how it is
processed using emerging technologies. It breaks down the complexity of processing Big Data
by taking a practical approach to developing Java applications on top of the Hadoop platform.
It describes the Hadoop architecture and how to work with the Hadoop Distributed File System
(HDFS) and HBase on the Ubuntu platform.

Course Outcomes
CO1: Understand Big Data concepts in real-time scenarios
CO2: Understand the architecture of Hadoop through practical work
CO3: Apply the MapReduce concept to implementations in the cloud

Unit-1 Teaching Hours: 15

INTRODUCTION
Distributed file system – Big Data and its importance, Four Vs, Drivers for Big data, Big data
analytics, Big data applications, Algorithms using map reduce, Matrix-Vector Multiplication
by Map Reduce.
Apache Hadoop– Moving Data in and out of Hadoop – Understanding inputs and outputs of
MapReduce - Data Serialization, Problems with traditional large-scale systems-Requirements
for a new approach-Hadoop – Scaling-Distributed Framework-Hadoop v/s RDBMS-Brief
history of Hadoop.
Lab Exercises:
1. Word count application in Hadoop.
2. Sorting the data using MapReduce.
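
The word count exercise follows the map, shuffle and reduce phases of MapReduce. The block below is a minimal in-memory Python sketch of those phases, not Hadoop code; the function names are assumptions for illustration, and a real job would be written against the Hadoop Java API.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    # Keys are visited in sorted order, as a reducer would see them
    return dict(reduce_phase(k, v) for k, v in sorted(shuffle(pairs).items()))


counts = word_count(["big data big", "data lake"])
print(counts)
```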

Unit-2 Teaching Hours: 15

CONFIGURATIONS OF HADOOP
Hadoop Processes (NN, SNN, JT, DN, TT)-Temporary directory – UI-Common errors when
running Hadoop cluster, solutions.
Setting up Hadoop on a local Ubuntu host: Prerequisites, downloading Hadoop, setting up
SSH, configuring the pseudo-distributed mode, HDFS directory, NameNode, Examples of
MapReduce, Using Elastic MapReduce, Comparison of local versus EMR Hadoop.
Understanding MapReduce: Key/value pairs, The Hadoop Java API for MapReduce, Writing
MapReduce programs, Hadoop-specific data types, Input/output.
Developing MapReduce Programs: Using languages other than Java with Hadoop, Analysing a
large dataset.

Lab Exercises:
3. Finding max and min value in Hadoop.
4. Implementation of decision tree algorithms using MapReduce.

Unit-3 Teaching Hours: 15

ADVANCED MAPREDUCE TECHNIQUES


Simple, advanced, and in-between Joins, Graph algorithms, using language-independent data
structures.
Hadoop configuration properties - Setting up a cluster, Cluster access control, managing the
NameNode, Managing HDFS, MapReduce management, Scaling.
Lab Exercises:
5. Implementation of K-means Clustering using MapReduce.
6. Generation of Frequent Itemset using MapReduce.

Unit-4 Teaching Hours: 15

HADOOP STREAMING
Hadoop Streaming - Streaming Command Options - Specifying a Java Class as the
Mapper/Reducer - Packaging Files With Job Submissions - Specifying Other Plug-ins for Jobs.
Lab Exercises:
7. Count the number of missing and invalid values through joining two large given
datasets.
8. Using Hadoop's MapReduce, evaluate the number of products sold in each country on an
online shopping portal (dataset is given).
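
Hadoop Streaming lets the mapper and reducer be any programs that read lines and write tab-separated key/value lines. The block below is a hedged Python sketch of the products-per-country exercise in that style; the CSV layout (country in the second column) and the function names are assumptions, and in a real streaming job each function would read from standard input and print to standard output.

```python
from itertools import groupby


def mapper(lines, country_col=1):
    """Emit 'country<TAB>1' for each sales record (hypothetical CSV layout)."""
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) > country_col and fields[country_col]:
            yield f"{fields[country_col]}\t1"


def reducer(sorted_lines):
    """Sum the counts per country; input must already be sorted by key."""
    parsed = (line.split("\t") for line in sorted_lines)
    for country, group in groupby(parsed, key=lambda kv: kv[0]):
        yield country, sum(int(value) for _, value in group)


# Made-up records; sorted() stands in for the framework's sort between phases
records = ["p1,India", "p2,USA", "p3,India"]
sales = dict(reducer(sorted(mapper(records))))
print(sales)
```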

Unit-5 Teaching Hours: 15

HIVE & PIG


Architecture, Installation, Configuration, Hive vs RDBMS, Tables, DDL & DML, Partitioning
& Bucketing, Hive Web Interface, Pig, Use case of Pig, Pig Components, Data Model, Pig
Latin.
Lab Exercises:
9. Analyze the sentiment of product reviews using a MapReduce technique provided by
Apache Hadoop.
10. Trend Analysis based on Access Pattern over Web Logs using Hadoop.

Essential Reading:
[1] Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, Professional Hadoop Solutions,
Wiley, 2015.
[2] Tom White, Hadoop: The Definitive Guide, O’Reilly Media Inc., 2015.
[3] Garry Turkington, Hadoop Beginner's Guide, Packt Publishing, 2013.

Recommended Reading:
[1] Pethuru Raj, Anupama Raman, Dhivya Nagaraj and Siddhartha Duggirala,
High-Performance Big-Data Analytics: Computing Systems and Approaches, Springer, 2015.
[2] Jonathan R. Owens, Jon Lentz and Brian Femiano, Hadoop Real-World Solutions
Cookbook, Packt Publishing, 2013.
[3] Tom White, Hadoop: The Definitive Guide, O’Reilly, 2012.

Trimester - IV

MDA471: MACHINE LEARNING


Total Teaching Hours for Semester: 75
Max Marks: 100 Credits: 04

Course Objectives

The objective of this course is to provide introduction to the principles and design of machine
learning algorithms. The course is aimed at providing foundations for conceptual aspects of
machine learning algorithms along with their applications to solve real world problems.

Course Outcomes

CO1: Understand the basic principles of machine learning techniques.
CO2: Understand how machine learning problems are formulated and solved.
CO3: Apply machine learning algorithms to solve real world problems.

Unit-1 Teaching Hours: 15

INTRODUCTION
Machine Learning-Examples of Machine Learning Applications-Learning Associations-Classification-
Regression-Unsupervised Learning-Reinforcement Learning. Supervised Learning: Learning a
class from examples-Probably Approximately Correct (PAC) Learning-Noise-Learning Multiple
classes-Regression-Model Selection and Generalization.
Introduction to Parametric methods-Maximum Likelihood Estimation: Bernoulli Density-
Multinomial Density-Gaussian Density. Nonparametric Density Estimation: Histogram
Estimator-Kernel Estimator-K-Nearest Neighbour Estimator.
Lab Exercises:
1. Data Exploration using parametric Methods
2. Data Exploration using non-parametric Methods
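
The contrast between parametric and nonparametric density estimation can be sketched as follows. This is a minimal Python illustration under stated assumptions: gaussian_mle returns the maximum likelihood estimates for a Gaussian density, and kernel_density is a Gaussian kernel estimator with a bandwidth h chosen by hand. The function names and the sample values are made up for the example.

```python
import math


def gaussian_mle(sample):
    """Maximum likelihood estimates of the mean and variance of a Gaussian."""
    n = len(sample)
    mean = sum(sample) / n
    # The MLE of the variance divides by n, not n - 1
    variance = sum((x - mean) ** 2 for x in sample) / n
    return mean, variance


def kernel_density(x, sample, h):
    """Kernel estimate p(x) = 1/(N*h) * sum K((x - x_t) / h), Gaussian K."""
    norm = 1.0 / math.sqrt(2.0 * math.pi)
    total = sum(norm * math.exp(-0.5 * ((x - xt) / h) ** 2) for xt in sample)
    return total / (len(sample) * h)


sample = [1.0, 2.0, 3.0, 4.0]  # made-up data
mean, variance = gaussian_mle(sample)
print(mean, variance, kernel_density(2.5, sample, h=1.0))
```

A smaller h gives a spikier estimate and a larger h a smoother one, which is the trade-off the lab on nonparametric methods would explore.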

Unit-2 Teaching Hours: 15

DIMENSIONALITY REDUCTION
Dimensionality Reduction: Introduction- Subset Selection-Principal Component Analysis,
Feature Embedding-Factor Analysis-Singular Value Decomposition-Multidimensional Scaling-
Linear Discriminant Analysis- Bayesian Decision Theory
Lab Exercises:
3. Regression analysis
4. Data reduction using Principal Component Analysis
5. Data reduction using multi-dimensional scaling

Unit-3 Teaching Hours: 15

SUPERVISED LEARNING – I

Linear Discrimination: Introduction- Generalizing the Linear Model-Geometry of the Linear
Discriminant- Pairwise Separation-Gradient Descent-Logistic Discrimination.

Kernel Machines
Introduction- optimal separating hyperplane- ν-SVM, kernel trick- vectorial kernels-
defining kernels- multiclass kernel machines- one-class kernel machines
Lab Exercises:
6. Linear discrimination
7. Logistic discrimination
8. Classification using kernel machines
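
Logistic discrimination trained by gradient descent, as listed in this unit, can be sketched as below. This is a minimal two-class Python illustration, not a reference implementation; the learning rate, epoch count, function names and toy data are all assumptions.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def train_logistic(X, r, lr=0.1, epochs=5000):
    """Batch gradient descent on the cross-entropy error of a linear model."""
    d, n = len(X[0]), len(X)
    w, w0 = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_w0 = [0.0] * d, 0.0
        for x, target in zip(X, r):
            y = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w0)
            err = target - y  # (r - y) drives the update
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_w0 += err
        w = [wj + lr * g / n for wj, g in zip(w, grad_w)]
        w0 += lr * grad_w0 / n
    return w, w0


def predict(w, w0, x):
    """Choose class 1 when the linear discriminant is non-negative."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + w0 >= 0 else 0


# Made-up, linearly separable toy data
X = [[0, 0], [0, 1], [1, 0], [1, 1], [3, 3], [3, 4], [4, 3], [4, 4]]
r = [0, 0, 0, 0, 1, 1, 1, 1]
w, w0 = train_logistic(X, r)
print([predict(w, w0, x) for x in X])
```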

Unit-4 Teaching Hours: 15

SUPERVISED LEARNING – II
Multilayer perceptron
Introduction, training a perceptron- learning Boolean functions- multilayer perceptron-
backpropagation algorithm- training procedures.

Combining Multiple Learners
Rationale-Generating diverse learners- Model combination schemes- voting, Bagging-
Boosting- fine tuning an Ensemble

Lab Exercises:
9. Classification using MLP
10. Ensemble Learning

Unit-5 Teaching Hours: 15

UNSUPERVISED LEARNING
Clustering
Introduction-Mixture Densities, K-Means Clustering- Expectation-Maximization algorithm-
Mixtures of Latent Variable Models-Supervised Learning after Clustering-Spectral Clustering-
Hierarchical Clustering- Choosing the number of Clusters
Lab Exercises:
11. K means clustering
12. Hierarchical clustering
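
The k-means procedure alternates between assigning points to the nearest centroid and recomputing each centroid as the mean of its points. The block below is a minimal Python sketch of that loop; the fixed initial centroids, the function names and the data are assumptions made so the example is deterministic.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, initial_centroids, max_iter=100):
    centroids = [tuple(c) for c in initial_centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        updated = [mean_point(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]  # made-up data
centroids, clusters = kmeans(points, [(1, 1), (8, 8)])
print(centroids)
```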

Essential Reading:
[1] E. Alpaydin, Introduction to Machine Learning, 3rd Edition, MIT Press, 2014.

Recommended Reading:
[1] C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2016.
[2] T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data
Mining, Inference and Prediction, Springer, 2nd Edition, 2009
[3] K. P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.

MDA472: NATURAL LANGUAGE PROCESSING

Total Teaching Hours for Semester: 75
Max Marks: 100 Credits: 04

Course Objectives

The goal is to familiarize students with the study of human language from a
computational perspective. The course covers syntactic, semantic and discourse processing
models, emphasizing machine learning concepts.

Course Outcomes
CO1: Understand various approaches on syntax and semantics in NLP
CO2: Apply various methods to discourse, generation, dialogue and summarization using
NLP.
CO3: Analyze various methodologies used in Machine Translation and machine learning
techniques used in NLP, including unsupervised models, and analyze real-time applications

Unit-1 Teaching Hours: 15

INTRODUCTION
Introduction to NLP- Background and overview- NLP Applications- Why NLP is hard: Ambiguity-
Algorithms and models, Knowledge Bottlenecks in NLP- Introduction to NLTK, Case study
Lab Exercises
1. Write a program to tokenize text
2. Write a program to count word frequency and to remove stop words
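
The two exercises above can be sketched with the Python standard library alone. This is a minimal illustration, not the expected lab solution: the regular-expression tokenizer and the tiny stop-word set are assumptions, and in the lab NLTK's word_tokenize and its stopwords corpus would be the more usual choice.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, far from complete)
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count tokens after dropping stop words."""
    return Counter(t for t in tokenize(text) if t not in stop_words)


freq = word_frequencies("The cat and the hat. The cat sat!")
print(freq.most_common())
```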

Unit-2 Teaching Hours: 15

PARSING AND SYNTAX


Word Level Analysis: Regular Expressions, Text Normalization, Edit Distance, Parsing and
Syntax- Spelling, Error Detection and correction-Words and Word classes- Part-of-Speech
Tagging, Naive Bayes and Sentiment Classification: Case study

Lab Exercises
3. Write a program to tokenize Non-English Languages
4. Write a program to get synonyms from WordNet
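
Edit distance, listed in this unit, is usually computed by dynamic programming. The block below is a minimal Python sketch of the Levenshtein variant in which insertion, deletion and substitution each cost 1; that cost choice and the function name are assumptions (some textbooks charge 2 for a substitution).

```python
def edit_distance(source, target):
    """Minimum number of insertions, deletions and substitutions (unit costs)."""
    # prev[j] holds the distance between the processed prefix of source
    # and the first j characters of target
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        curr = [i]
        for j, t in enumerate(target, start=1):
            substitution = prev[j - 1] + (0 if s == t else 1)
            curr.append(min(prev[j] + 1,      # deletion
                            curr[j - 1] + 1,  # insertion
                            substitution))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))
```

Spelling correction, also listed in this unit, can rank candidate words by this distance from the misspelled input.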

Unit-3 Teaching Hours: 15

SMOOTHED ESTIMATION AND LANGUAGE MODELLING


N-gram Language Models: N-Grams, Evaluating Language Models -The language modelling
problem
SEMANTIC ANALYSIS AND DISCOURSE PROCESSING

Semantic Analysis: Meaning Representation-Lexical Semantics- Ambiguity-Word Sense
Disambiguation. Discourse Processing: cohesion-Reference Resolution- Discourse Coherence
and Structure.
Lab Exercises:
5. Write a program to get Antonyms from WordNet
6. Write a program for stemming Non-English words
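
The smoothed estimation named in this unit's title can be illustrated with a bigram model using add-one (Laplace) smoothing. The block below is a minimal Python sketch; the sentence boundary markers, the function names and the two-sentence corpus are assumptions made for the example.

```python
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one estimate: (count(prev, word) + 1) / (count(prev) + V)."""
    vocab_size = len(unigrams)  # V, here including the boundary markers
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


unigrams, bigrams = train_bigram(["i am sam", "sam i am"])
print(bigram_prob("i", "am", unigrams, bigrams))
print(bigram_prob("i", "sam", unigrams, bigrams))
```

The unseen bigram ("i", "sam") receives a small non-zero probability, which is the purpose of smoothing.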

Unit-4 Teaching Hours: 15

NATURAL LANGUAGE GENERATION AND MACHINE TRANSLATION


Natural Language Generation: Architecture of NLG Systems, Applications
Machine Translation: Problems in Machine Translation- Machine Translation Approaches-
Evaluation of Machine Translation systems.
Case study: Characteristics of Indian Languages
Lab Exercises:
7. Write a program for lemmatizing words Using WordNet
8. Write a program to differentiate stemming and lemmatizing words

Unit-5 Teaching Hours: 15

INFORMATION RETRIEVAL AND LEXICAL RESOURCES


Information Retrieval: Design features of Information Retrieval Systems-Classical, Non-
classical, Alternative Models of Information Retrieval – Evaluation. Lexical Resources: Word
Embeddings - Word2vec - GloVe.
UNSUPERVISED METHODS IN NLP: Graphical Models for Sequence Labelling in NLP
Lab Exercises:
9. Write a program for POS Tagging or Word Embeddings.
10. Case study-based program (IBM) or Sentiment analysis

Essential Reading:
[1] Daniel Jurafsky and James H. Martin, Speech and Language Processing, 2nd Edition,
Prentice Hall, 2013.
[2] Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language
Processing, Cambridge, MA: MIT Press, 1999.

Recommended Reading:
[1] Roland R. Hausser, Foundations of Computational Linguistics: Human-computer
Communication in Natural Language, Springer, 2014.
[2] Steven Bird, Ewan Klein and Edward Loper, Natural Language Processing with Python,
O’Reilly Media; First edition, 2009.

Web resources:
[1] https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
[2] https://nptel.ac.in/courses/106101007/
[3] NLTK – Natural Language Tool Kit- http://www.nltk.org

Trimester - V
MDA571: DATA VISUALIZATION
Total Teaching Hours for Semester: 75
Max Marks: 100 Credits: 04

Course Objectives
The main objective of this course is to introduce the basics of data visualization and to
convey its importance, together with the design and use of visual components. It provides
knowledge of various visualization structures such as tables, spatial data, time-varying data,
trees and networks.

Course Outcomes
CO1: Understand the visual representation of data
CO2: Apply the visual mapping and reference model
CO3: Analyze the one, two and multi-dimensional data for the data visualization process
CO4: Evaluate the visualization of groups, trees, graphs, clusters, networks and software
CO5: Construct the effective model for data visualization by using various techniques

Unit – 1 Teaching Hours: 15


Introduction
Introduction of visual perception - visual representation of data - Gestalt principles -
information overloads
Visual representation
Creating visual representations
Lab Exercises:
1. Data classification
2. Data segmentation

Unit – 2 Teaching Hours: 15


Visual representation
Visualization reference model - visual mapping - visual analytics - Design of visualization
applications
Classification
Classification of visualization systems - Interaction and visualization techniques - misleading
Lab Exercises:
3. Data compression
4. Visualization of 2D and 3D

Unit – 3 Teaching Hours: 15


Classification
Visualization of one, two and multi-dimensional data, text and text documents
Visualization elements
Visualization of groups, trees, graphs, clusters, networks, software - Metaphorical visualization

Lab Exercises:
5. Image visualization
6. Heat maps, Dot distribution maps, cartograms

Unit – 4 Teaching Hours: 15


Visualization Application
Visualization of volumetric data, vector fields, processes and simulations - Visualization of
maps, geographic information - GIS systems - collaborative visualizations - evaluating
visualizations
Lab Exercises:
7. Diagrams visualization
8. Collaborative visualization

Unit – 5 Teaching Hours: 15


Online Visualization Tools
Plotly, Sisense, PowerBI, IBM Watson analytics, Kibana, Grafana, D3.js, Fusion charts,
Tableau public, charted, Google charts, Flot, chartist.js, High charts, Datawrapper, Dygraphs,
Raw, Timeline JS, Polymaps.
Lab Exercises:
9. Different charts
10. Geographical maps and information

Essential Reading:
[1] Ward, Grinstein, Keim, Interactive Data Visualization: Foundations, Techniques, and
Applications. Natick: A K Peters Ltd, 2015
[2] Kieran Healy, Data Visualization: A Practical Introduction, 2018

Recommended Reading:
[1] Jos Dirksen, Expert Data Visualization, Packt Publishing Ltd, 2017
[2] Stephanie Evergreen, Effective Data Visualization: The Right Chart for the Right Data, 2016
[3] Claus O. Wilke, Fundamentals of Data Visualization: A Primer on Making Informative and
Compelling Figures, O’Reilly, 2019

MDA572: NEURAL NETWORKS AND DEEP LEARNING


Total Teaching Hours for Semester: 75
Max Marks: 100 Credits: 04

Course Objectives
Understand the concepts and models of the neural networks and deep learning and its
applications.

Course Outcomes
CO1: Understand the major technology trends in neural networks and deep learning
CO2: Build, train and apply neural networks and fully connected deep neural networks
CO3: Implement efficient (vectorized) neural networks for real time application

Unit – 1 Teaching Hours: 15

INTRODUCTION TO ARTIFICIAL NEURAL NETWORKS


Neural Networks-Application Scope of Neural Networks- Fundamental Concept of ANN: The
Artificial Neural Network-Biological Neural Network-Comparison between Biological Neuron
and Artificial Neuron-Evolution of Neural Network. Basic models of ANN-Learning Methods-
Activation Functions-Important Terminologies of ANN
Lab Exercises:
1. a. Calculate the output of a simple neuron using binary and bipolar sigmoidal activation
functions.
b. Classify the given input vectors into 4 categories using perceptron network with two
input neurons and two output neurons.
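
Exercise 1a can be sketched as follows. The binary sigmoid maps the net input to (0, 1) and the bipolar sigmoid maps it to (-1, 1). This is a minimal Python illustration; the inputs, weights, bias and function names are assumed values chosen for the example.

```python
import math


def binary_sigmoid(net):
    """f(net) = 1 / (1 + e^-net), with output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


def bipolar_sigmoid(net):
    """f(net) = 2 / (1 + e^-net) - 1, with output in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-net)) - 1.0


def net_input(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias."""
    return bias + sum(x * w for x, w in zip(inputs, weights))


# Assumed illustrative values
inputs = [0.8, 0.6, 0.4]
weights = [0.1, 0.3, -0.2]
bias = 0.35

net = net_input(inputs, weights, bias)
print(f"net={net:.2f}")
print(f"binary={binary_sigmoid(net):.4f}, bipolar={bipolar_sigmoid(net):.4f}")
```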

Unit – 2 Teaching Hours: 15

SUPERVISED LEARNING NETWORK


Shallow neural networks- Perceptron Networks-Theory-Perceptron Learning Rule-
Architecture-Flowchart for training Process-Perceptron Training Algorithm for Single and
Multiple Output Classes.
Back Propagation Network- Theory-Architecture-Flowchart for training process-Training
Algorithm-Learning Factors for Back-Propagation Network.
Radial Basis Function Network RBFN: Theory, Architecture, Flowchart and Algorithm.
Lab Exercises:
2. a. Classification of AND or OR problem using perceptron network.
b. Classification of an XOR problem using the multilayer perceptron Network.
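
The perceptron learning rule for exercise 2a can be sketched as below, using bipolar inputs and targets for the AND function. This is a minimal Python illustration under stated assumptions: the learning rate of 1, the zero threshold, the zero initial weights and the function names are choices made for the example.

```python
def activation(net, theta=0.0):
    """Three-level step function with threshold theta."""
    if net > theta:
        return 1
    if net < -theta:
        return -1
    return 0


def train_perceptron(samples, lr=1.0, theta=0.0, max_epochs=100):
    """Update weights only when the output differs from the target."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(max_epochs):
        changed = False
        for x, t in samples:
            y = activation(b + sum(wi * xi for wi, xi in zip(w, x)), theta)
            if y != t:
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
                changed = True
        if not changed:  # an epoch with no updates means training is done
            break
    return w, b


# AND function with bipolar inputs and targets
and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = train_perceptron(and_samples)
print(w, b)
```

A single perceptron cannot do the same for XOR because XOR is not linearly separable, which is why exercise 2b uses a multilayer network.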

Unit – 3 Teaching Hours: 15

CONVOLUTIONAL NEURAL NETWORK


Introduction - Components of CNN Architecture - Rectified Linear Unit (ReLU) Layer -
Exponential Linear Unit (ELU, or SELU) - Unique Properties of CNN -Architectures of CNN -
Applications of CNN.

Lab Exercises:
3. Implementation of BPN for training a single-hidden-layer back propagation network.
4. Implementation of BPN for training a multi-hidden-layer back propagation network.

Unit – 4 Teaching Hours: 15

RECURRENT NEURAL NETWORK


Introduction- The Architecture of Recurrent Neural Network- The Challenges of Training
Recurrent Networks- Echo-State Networks- Long Short-Term Memory (LSTM) - Applications
of RNN.
Lab Exercises:
5. Implementation of Convolution Neural Network
6. Implementation of RNN

Unit – 5 Teaching Hours: 15


AUTO ENCODER AND RESTRICTED BOLTZMANN MACHINE
Introduction - Features of Autoencoder - Types of Autoencoder
Restricted Boltzmann Machine - Boltzmann Machine - RBM Architecture - Example - Types of
RBM
Lab Exercises:
7. Implementation of auto encoder
8. Implementation of Restricted Boltzmann Machine

Essential Reading:

[1] S. N. Sivanandam, S. N. Deepa, Principles of Soft Computing, Wiley-India, 3rd Edition,
2018.
[2] Dr. S Lovelyn Rose, Dr. L Ashok Kumar, Dr. D Karthika Renuka, Deep Learning Using
Python, Wiley-India, 1st Edition, 2019.

Recommended Reading:

[1] Charu C. Aggarwal, Neural Networks and Deep Learning, Springer, September 2018.
[2] Francois Chollet, Deep Learning with Python, Manning Publications; 1st edition, 2017
[3] John D. Kelleher, Deep Learning (MIT Press Essential Knowledge series), The MIT Press,
2019.

Web Resources:
[1] www.coursera.org
[2] http://neuralnetworksanddeeplearning.com

Trimester – VI

MDA681: PROJECT

Total Teaching Hours for Semester:


Max Marks: 100 Credits: 04

Course Objectives

This course helps students become globally competent and inculcates entrepreneurial
skills.

Course Outcomes
CO1: Develop real-time projects
CO2: Practice different data science principles and strategies in the project.

It is a full-time project to be taken up either in industry or in an R&D organization.

Generic Elective Courses


MDA461: BUSINESS INTELLIGENCE
Total Teaching Hours for Semester: 60
Max Marks: 100 Credits: 04

Course Objectives
This course introduces the concept of Business Intelligence for better business decisions.
It also gives practical knowledge of implementing Business Intelligence concepts.

Course Outcomes
CO1: Understand the fundamentals of business intelligence and link data mining with business
intelligence.
CO2: Apply various modeling techniques and business intelligence methods to various situations using
data mining principles and techniques
CO3: Implement data analysis techniques to make better business decisions and demonstrate the
impact of business reporting, information visualization, and dashboards

Unit – 1 Teaching Hours: 12


DECISION SUPPORT SYSTEMS AND BUSINESS INTELLIGENCE
The Concept of Decision Support Systems – A Framework for Business Intelligence -
Effective and timely decisions – A Work System View of Decision Support – The Major Tools
and Techniques of Managerial Decision Support - Data, information and knowledge – Role of
mathematical models – Business intelligence architectures: Cycle of a business intelligence
analysis – Enabling factors in business intelligence projects – Development of a business
intelligence system

Unit – 2 Teaching Hours: 12

BASICS OF DATA INTEGRATION ETL


Concepts of data integration - need and advantages of using data integration - introduction to
common data integration approaches - introduction to ETL - introduction to data quality, data
profiling concepts and applications

Unit – 3 Teaching Hours: 12

INTRODUCTION TO MULTI-DIMENSIONAL DATA MODELING


Introduction to data and dimension modeling - multidimensional data model - ER Modeling vs.
multi-dimensional modeling - concepts of dimensions, facts, cubes, attribute, hierarchies, star
and snowflake schema - introduction to business metrics and KPIs - creating cubes using SSAS

Unit – 4 Teaching Hours: 12

KNOWLEDGE MANAGEMENT AND KNOWLEDGE DELIVERY


Introduction to Knowledge Management – Knowledge Management Activities – Approaches
to Knowledge Management - Information Technology(IT) in Knowledge Management - The
business intelligence user types - Standard reports - Interactive Analysis and Ad Hoc Querying
- Parameterized Reports and Self-Service Reporting - Dimensional analysis -
Alerts/Notifications - Visualization: Charts, Graphs, Widgets, Scorecards and Dashboards -
Geographic Visualization - Integrated Analytics - Considerations: Optimizing the Presentation
for the Right Message

Unit – 5 Teaching Hours: 12

DATA MINING FUNCTIONALITIES


Association rules mining - Mining Association rules from single level, multilevel transaction
databases - Classification and prediction - Decision tree induction - Bayesian Classification -
k-nearest neighbour classification - Cluster analysis - Types of data in clustering,
categorization of clustering methods
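
Association rule mining rests on two measures, support and confidence. The block below is a minimal Python sketch of how they are computed for a single-level transaction database; the function names and the transactions are made up for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Made-up transaction database
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
```

A rule such as bread -> milk is typically kept only when both measures exceed user-chosen minimum thresholds.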

Essential Reading:
[1] Efraim Turban, Ramesh Sharda, Dursun Delen, Decision Support and Business Intelligence
Systems, 9th Edition, Pearson 2013.
[2] Cindi Howson, Successful Business Intelligence: Unlock the Value of BI & Big Data,
Second Edition, Nov 2013.
[3] Gert H.N. Laursen, Jesper Thorlund, Business Analytics for Managers: Taking Business
Intelligence beyond Reporting, Sep 2013

Recommended Reading:
[1] Carlo Vercellis, Business Intelligence: Data Mining and Optimization for Decision
Making, Wiley Publications, 2009.
[2] David Loshin, Business Intelligence: The Savvy Manager’s Guide, Second Edition,
Morgan Kaufmann, 2012.
[3] Ralph Kimball, Margy Ross, Warren Thornthwaite, Joy Mundy, Bob Becker, The Data
Warehouse Lifecycle Toolkit, Wiley Publication Inc., 2007.
[4] G.K.Gupta, Introduction to Data Mining with case studies, Prentice Hall of India, 2011.

MDA561: INTERNET OF THINGS

Total Teaching Hours for Semester: 60
Max Marks: 100 Credits: 04

Course Objectives
The explosive growth of the “Internet of Things” is changing our world and the rapid growth
of IoT components is allowing people to innovate new designs and products at home. Wireless
Sensor Networks form the basis of the Internet of Things. To keep pace with recent
applications in the field of IoT, this course provides a deeper understanding of the underlying
concepts of IoT and Wireless Sensor Networks.

Course Outcomes
CO1: Understand the concepts of IoT and IoT enabling technologies
CO2: Gain knowledge of IoT programming and be able to develop IoT applications
CO3: Identify different issues in wireless ad hoc and sensor networks
CO4: Develop an understanding of sensor network architectures from a design and
performance perspective
CO5: Understand the layered approach in sensor networks and WSN protocols

Unit-1 Teaching Hours: 12

INTRODUCTION TO IoT
Introduction to IoT - Definition and Characteristics, Physical Design - Things, Protocols,
Logical Design - Functional Blocks, Communication Models, Communication APIs -
Introduction to measuring physical quantities, IoT Enabling Technologies - Wireless Sensor
Networks, Cloud Computing, Big Data Analytics, Communication Protocols, Embedded
Systems - IoT Levels and Deployment Templates.

Unit-2 Teaching Hours: 12

IoT PROGRAMMING
Introduction to Smart Systems using IoT - IoT Design Methodology- IoT Boards (Raspberry Pi,
Arduino) and IDE - Case Study: Weather Monitoring- Logical Design using Python, Data
types & Data Structures- Control Flow, Functions- Modules- Packages, File Handling -
Date/Time Operations, Classes- Python Packages of Interest for IoT.

Unit-3 Teaching Hours: 12

IoT APPLICATIONS
Home Automation – Smart Cities- Environment, Energy- Retail, Logistics- Agriculture,
Industry- Health and Lifestyle- IoT and M2M.

Unit-4 Teaching Hours: 12

NETWORK OF WIRELESS SENSOR NODES


Sensing and Sensors - Wireless Sensor Networks, Challenges and Constraints - Applications:
Structural Health Monitoring, Traffic Control, Health Care - Node Architecture - Operating
system.

Unit-5 Teaching Hours: 12

MAC, ROUTING AND TRANSPORT CONTROL IN WSN


Introduction – Fundamentals of MAC Protocols – MAC protocols for WSN – Sensor MAC
Case Study – Routing Challenges and Design Issues – Routing Strategies – Transport Control
Protocols – Transport Protocol Design Issues – Performance of Transport Protocols

Essential Reading:
[1] Arshdeep Bahga and Vijay Madisetti, Internet of Things: Hands-on Approach, Hyderabad
University Press, 2015.
[2] Kazem Sohraby, Daniel Minoli and Taieb Znati, Wireless Sensor Networks: Technology,
Protocols and Applications, Wiley Publications, 2010.
[3] Waltenegus Dargie and Christian Poellabauer, Fundamentals of Wireless Sensor Networks:
Theory and Practice, John Wiley and Sons Ltd., 2010.

Recommended Reading:
[1] Edgar Callaway, Wireless Sensor Networks: Architecture and Protocols, Auerbach
Publications, 2003.
[2] Michael Miller, The Internet of Things, Pearson Education, 2015.
[3] Holger Karl and Andreas Willig, Protocols and Architectures for Wireless Sensor
Networks, John Wiley & Sons Inc., 2005.
[4] Erdal Çayırcı and Chunming Rong, Security in Wireless Ad Hoc and Sensor Networks, John
Wiley and Sons, 2009.
[5] Carlos De Morais Cordeiro and Dharma Prakash Agrawal, Ad Hoc and Sensor Networks:
Theory and Applications, World Scientific Publishing, 2011.
[6] Waltenegus Dargie and Christian Poellabauer, Fundamentals of Wireless Sensor Networks:
Theory and Practice, John Wiley and Sons, 2010
[7] Adrian Perrig and J. D. Tygar, Secure Broadcast Communication: In Wired and Wireless
Networks, Springer, 2006.

MDA661: WEB ANALYTICS

Total Teaching Hours for Semester: 60
Max Marks: 100 Credits: 04

Course Objectives
The objective of this course is to provide an overview of Web analytics and its importance,
and to help students understand the role of Web analytics. The course also explores effective
Web analytics strategies and their implementation.

Course Outcomes
CO1: Understand the concept and importance of Web analytics in an organization and the role
of Web analytics in collecting, analyzing and reporting website traffic.
CO2: Identify key tools and diagnostics associated with Web analytics.
CO3: Explore effective Web analytics strategies and implementation, and understand the
importance of Web analytics as a tool for e-Commerce, business research, and market research.

Unit-1 Teaching Hours: 10


INTRODUCTION TO WEB ANALYTICS
Introduction to Web Analytics: Web Analytics Approach – A Model of Analysis – Context
matters – Data Contradiction – Working of Web Analytics: Log file analysis – Page tagging –
Metrics and Dimensions – Interacting with data in Google Analytics

Unit-2 Teaching Hours: 12


LEARNING ABOUT USERS THROUGH WEB ANALYTICS
Goals: Introduction – Goals and Conversions – Conversion Rate – Goal reports in Google
Analytics – Performance Indicators – Analyzing Web Users: Learning about users – Traffic
Analysis – Analyzing user content – Click-Path analysis – Segmentation

Unit-3 Teaching Hours: 12


GOOGLE ANALYTICS
Different analytical tools - Key features and capabilities of Google analytics- How Google
analytics works - Implementing Google analytics - Getting up and running with Google
analytics -Navigating Google analytics – Using Google analytics reports -Google metrics -
Using visitor data to drive website improvement- Focusing on key performance indicators-
Integrating Google analytics with third-Party applications

Unit-4 Teaching Hours: 12


OVERVIEW OF QUALITATIVE ANALYSIS
Lab Usability Testing- Heuristic Evaluations- Site Visits- Surveys (Questionnaires) - Testing
and Experimentation: A/B Testing and Multivariate Testing-Competitive Intelligence
Analysis - Search Analytics: Performing Internal Site Search Analytics, Search Engine
Optimization (SEO) and Pay per Click (PPC)-Website Optimization against KPIs- Content
optimization- Funnel/Goal optimization - Text Analytics: Natural Language Processing (NLP)-
Supervised Machine Learning (ML) Algorithms-API and Web data scraping using R and
Python

Unit-5 Teaching Hours: 14


VISUAL ANALYTICS:
Drill down and hierarchies-Sorting-Grouping- Additional Ways to Group- Creating Sets-
Analysis with Cubes and MDX- Filtering for Top and Top N-
Using the Filter Shelf- The Formatting Pane- Trend Lines- Forecasting- Formatting-
Parameters -
SOCIAL NETWORK ANALYSIS:
Types of social network-Graph Visualization-Network Relationships-Network structures:
equivalence-Network Evolution-Diffusion in networks- Descriptive Modeling-Predictive
Modeling-Customer Profiling-Network targeting

Essential Reading:
[1] Beasley M, (2013), Practical web analytics for user experience: How analytics can help you
understand your users. Newnes, 1st edition, Morgan Kaufmann.
[2] Sponder M, (2013), Social media analytics: Effective tools for building, interpreting, and
using metrics, 1st edition, McGraw Hill Professional.
[3] Clifton B, (2012), Advanced Web Metrics with Google Analytics, 3rd edition, John Wiley
& Sons.

Recommended Reading:
[1] Peterson E. T, (2004), Web Analytics Demystified: A Marketer's Guide to Understanding
How Your Web Site Affects Your Business. Ingram.
[2] Sostre P, LeClaire J, (2007), Web Analytics for dummies, John Wiley & Sons.
[3] Burby J, Atchison S, (2007), Actionable web analytics: using data to make smart business
decisions, John Wiley & Sons.
[4] Dykes B, (2011), Web analytics action hero: Using analysis to gain insight and optimize
your business, Adobe Press.

Web resources:
[1] https://analytics.google.com/analytics/web/
[2] https://www.optimizely.com/optimization-glossary/web-analytics/
[3] https://www.tutorialspoint.com/web_analytics/web_analytics_introduction.htm

MDA662: CLOUD ANALYTICS

Total Teaching Hours for Semester: 60
Max Marks: 100 Credits: 04

Course Objectives
The objective of this course is to explore the basics of cloud analytics and the major cloud
solutions. Students will learn how to analyze extremely large data sets, and to create visual
representations of that data. The course also aims to provide students with hands-on
experience working with data at scale.

Course Outcomes
CO1: Interpret the deployment and service models of cloud applications.
CO2: Describe big data analytical concepts.
CO3: Ingest, store, and secure data.
CO4: Process and Visualize structured and unstructured data.

Unit-1 Teaching Hours: 12

INTRODUCTION
Introduction to cloud computing - Major benefits of cloud computing - Cloud computing
deployment models - Private cloud - Public cloud - Hybrid cloud - Types of cloud computing
services - Infrastructure as a Service (IaaS) - Platform as a Service (PaaS) - Software as a
Service (SaaS) - Emerging cloud technologies and services - Different ways to secure the
cloud - Risks and challenges with the cloud - What is cloud analytics? - Parameters to
consider before adopting a cloud strategy - Technologies utilized by cloud computing

Unit-2 Teaching Hours: 12

CLOUD ENABLING TECHNOLOGIES


Virtualization - Load Balancing - Scalability & Elasticity - Deployment - Replication -
Monitoring - Software Defined Networking - Network Function Virtualization - MapReduce -
Identity and Access Management - Service Level Agreements - Billing

Unit-3 Teaching Hours: 12

BASIC CLOUD SERVICES & PLATFORMS


Compute Services
Amazon Elastic Compute Cloud - Google Compute Engine - Windows Azure Virtual
Machines
Storage Services
Amazon Simple Storage Service - Google Cloud Storage - Windows Azure Storage
Database Services
Amazon Relational Database Service (RDS) - Amazon DynamoDB - Google Cloud SQL - Google
Cloud Datastore - Windows Azure SQL Database - Windows Azure Table Service


Unit-4 Teaching Hours: 12

DATA INGESTION AND STORING


Cloud Dataflow - The Dataflow programming model - Cloud Pub/Sub - Cloud Storage - Cloud
SQL - Cloud Bigtable - Cloud Spanner - Cloud Datastore - Persistent disks

PROCESSING AND VISUALIZING

Google BigQuery - Cloud Dataproc - Google Cloud Datalab - Google Data Studio

Unit-5 Teaching Hours: 12

MACHINE LEARNING, DEEP LEARNING AND AI


Artificial intelligence services - Machine learning - Cloud Natural Language API -
TensorFlow - Cloud Speech API - Cloud Translation API - Cloud Vision API - Cloud Video
Intelligence - Dialogflow - AutoML

Essential Reading:
[1] Sanket Thodge, Cloud Analytics with Google Cloud Platform, Packt Publishing, 2018.
[2] Arshdeep Bahga and Vijay Madisetti, Cloud Computing - A Hands-On Approach,
CreateSpace Independent Publishing Platform, 2014.

Recommended Reading:
[1] Deven Shah, Kailash Jayaswal, Donald J. Houde, Jagannath Kallakurchi, Cloud Computing
- Black Book, Wiley, 2014.
[2] Thomas Erl, Ricardo Puttini, Zaigham Mahmood, Cloud Computing: Concepts,
Technology & Architecture, Prentice Hall, 2014.

Web resources:
[1] https://www.w3schools.in/cloud-computing/cloud-computing/
[2] https://docs.aws.amazon.com
[3] https://cloud.google.com/docs
[4] https://docs.microsoft.com/en-us/azure
