Prof. K. Srinivas
Course Coordinator
Course Objectives:
Find a meaningful pattern in data
Implement analytic algorithms to solve real-life problems
Graphically interpret data
Handle large scale analytics projects from various domains
COURSE OUTCOMES
Upon successful completion of the course, the student will be able to:
CO1 Understand the concepts of Data mining and Big Data Analytics
CO2 Apply machine learning algorithms for data analytics
CO3 Analyze various text categorization algorithms
CO4 Use technology and tools to solve Big Data Analytics problems
PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2
CO1 3
CO2 3 1 1 1
CO3 2 1 1 1
CO4 2 2 3 1
Course Outcome Indicators (COIs):
Program Outcomes
PO2: Problem analysis: Identify, formulate, review research literature, and analyze
complex engineering problems reaching substantiated conclusions using first
principles of mathematics, natural sciences, and engineering sciences.
PO5: Modern tool usage: Create, select, and apply appropriate techniques, resources,
and modern engineering and IT tools including prediction and modeling to
complex engineering activities with an understanding of the limitations.
PO6: The engineer and society: Apply reasoning informed by the contextual
knowledge to assess societal, health, safety, legal and cultural issues and the
consequent responsibilities relevant to the professional engineering practice.
PO7: Environment and sustainability: Understand the impact of the professional
engineering solutions in societal and environmental contexts, and demonstrate the
knowledge of, and need for sustainable development.
PO8: Ethics: Apply ethical principles and commit to professional ethics and
responsibilities and norms of the engineering practice.
PO12: Lifelong learning: Recognize the need for and have the preparation and ability to
engage in independent and life-long learning in the broadest context of
technological change.
PSO1: Develop software applications/solutions as per the needs of Industry and society
PSO2: Adopt new and fast emerging technologies in computer science and engineering
COURSE CONTENT
UNIT I
Data Mining: Data Mining, Kinds of Patterns Can Be Mined, Applications of data mining.
Data pre-processing: Data Cleaning: Missing Values, Noisy Data, Data Cleaning as a Process;
Data Integration: Entity Identification Problem, Redundancy and Correlation Analysis, Tuple
Duplication, Data Value Conflict Detection and Resolution; Data Transformation and Data
Discretization: Data Transformation Strategies Overview, Data Transformation by Normalization,
Discretization by Binning, Discretization by Histogram Analysis.
Introduction to Big Data Analytics: Big Data Overview, State of the Practice in Analytics, Key
Roles for the New Big Data Ecosystem, Examples of Big Data Analytics
Data Analytics Lifecycle: Data Analytics Lifecycle Overview, Discovery, Data Preparation, Model
Planning, Model Building, Communicate Results, Operationalize
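As an illustration of two Unit I topics, data transformation by normalization and discretization by binning, the following is a minimal sketch in plain Python. The function names and the sample price list are hypothetical and chosen only for this example. It assumes min-max normalization and equal-width bins, which are only one of several variants covered under these topics.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: map them to new_min to avoid dividing by zero.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def equal_width_bins(values, k):
    """Assign each value to one of k equal-width bins, numbered 0 to k-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / k
    # The maximum value would fall just past the last bin, so clamp it to k-1.
    return [min(int((v - lo) / width), k - 1) for v in values]


# Hypothetical sample data, used only for illustration.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
normalized = min_max_normalize(prices)
bins = equal_width_bins(prices, 3)
print(normalized)
print(bins)
```

With three bins over the range 4 to 34, each bin is 10 units wide, so the values 4 and 8 share the first bin and 24 through 34 share the last.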
UNIT II
Association Rules: Apriori Algorithm, Evaluation of Candidate Rules, Applications of Association
Rules, Transactions in a Grocery Store, Validation and Testing;
Regression: Linear Regression, Logistic Regression
Advanced Analytical Theory and Methods-Classification: Decision Trees, Naïve Bayes;
Classification by Back propagation
Advanced Analytical Theory and Methods-Clustering: major categories of clustering methods,
k-means, k-nearest neighbor; DBSCAN
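To make the Unit II topic "Evaluation of Candidate Rules" concrete, here is a minimal sketch that computes support, confidence, and lift for one association rule. The five grocery transactions and the helper names are hypothetical. A full Apriori implementation would also generate and prune candidate itemsets, which this sketch does not attempt.

```python
# Hypothetical grocery transactions, used only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """Estimate P(rhs | lhs); assumes lhs occurs in at least one transaction."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """Confidence of lhs -> rhs relative to the baseline support of rhs."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


rule_lhs, rule_rhs = {"diapers"}, {"beer"}
print(support(rule_lhs | rule_rhs, transactions))
print(confidence(rule_lhs, rule_rhs, transactions))
print(lift(rule_lhs, rule_rhs, transactions))
```

For the rule {diapers} -> {beer}, the support is about 0.6, the confidence about 0.75, and the lift about 1.25. A lift above 1 suggests the two items occur together more often than they would if they were independent.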
UNIT III
Advanced Analytical Theory and Methods-Time Series Analysis: Overview of Time Series
Analysis, ARIMA Model.
Advanced Analytical Theory and Methods-Text Analysis: Text Analysis Steps, Text Analysis
Example, Collecting Raw Text, Representing Text, Term Frequency—Inverse Document
Frequency (TFIDF), Categorizing Documents by Topics, Determining Sentiments
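The TFIDF weighting listed in Unit III can be sketched as follows. This version assumes raw term counts for term frequency and a natural-log inverse document frequency, log(N / df). Textbooks and libraries use several variants, so treat this as one possible formulation. The function name and the three short documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for a list of tokenized documents.

    Returns one dict per document, mapping each term to its weight.
    """
    n_docs = len(documents)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical documents, tokenized by whitespace for simplicity.
docs = [
    "the phone has a great screen".split(),
    "the battery life of the phone is poor".split(),
    "great battery great price".split(),
]
scores = tf_idf(docs)
for doc_scores in scores:
    print(sorted(doc_scores.items(), key=lambda kv: -kv[1])[:3])
```

Under this formulation, a term that appears in every document gets a weight of zero. That is the intended effect: such terms do little to distinguish one document from another.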
UNIT IV
Advanced Analytics-Technology and Tools: MapReduce and Hadoop: Analytics for Unstructured
Data, The Hadoop Ecosystem;
In-Database Analytics: SQL Essentials, In-Database Text Analysis.
Putting It All Together: Communicating and operationalizing an Analytics Project, Creating the
final deliverables, and Data Visualization basics.
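The MapReduce paradigm named in Unit IV can be illustrated with an in-memory word count. This is a sketch of the map, shuffle, and reduce steps in plain Python. It does not use the Hadoop API, and the function names and input lines are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    emitted = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


# Hypothetical input lines, used only for illustration.
lines = ["big data needs big tools", "Hadoop stores big data"]
counts = word_count(lines)
print(counts)
```

In a real cluster the map and reduce functions run in parallel on different nodes and the framework performs the shuffle. Here all three steps run sequentially in one process so the data flow is easy to follow.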
TEXT BOOKS
[1] Data Science and Big Data Analytics, EMC Education Services, John Wiley, 2015 [Units II, III, IV]
[2] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 3rd ed., Elsevier
Publishers [Unit I]
REFERENCE BOOKS
[1] Simon Walkowiak, Big Data Analytics with R: Leverage R Programming to Uncover Hidden
Patterns in Your Big Data, Packt Publishing, 2016
[2] Nathan Marz and James Warren, Big Data: Principles and Best Practices of Scalable Real-Time
Data Systems, DreamTech Press, 2015
[3] Benjamin Bengfort and Jenny Kim, Data Analytics with Hadoop: An Introduction for Data
Scientists, O'Reilly, 1st Edition, 2016
E-RESOURCES AND OTHER DIGITAL MATERIAL
[1] Prof. D. Janaki Ram and S. Srinath, IIT Madras, Data Mining and Knowledge Discovery,
https://freevideolectures.com/course/2280/database-design/35, last accessed on 11th April 2020
[2] Prof. Nandan Sudharsanam and Prof. B. Ravindran, IIT Madras, Introduction to Data
Analytics, http://nptel.ac.in/courses/110106064/23, last accessed on 11th April 2020
Sess No | CO | COI | BTL Level | Topic(s) | Session Outcomes | Book – T1, [CH No], [Page No] | Teaching Learning Methods | Active Learning Methods | Evaluation Components
1 | 1 | 1 | 1 | Data Mining: Data Mining, Kinds of Patterns Can Be Mined - Concept/Class Description: Characterization and Discrimination, Classification, Prediction | Understand the kinds of patterns of data mining | T2, 1.2, 5; T2, 1.4, 15-18 | Board/PPT |  | A1, S1, HA, SE Exam
2 | 1 | 1 | 1 | Cluster Analysis and Outlier Analysis, Applications of data mining | Understand the kinds of patterns and applications of data mining | T2, 1.4, 19-20 | Board/PPT | Quiz | A1, S1, HA, SE Exam
3 | 1 | 1 | 1 | Data Preparation: Data Cleaning: Missing Values, Noisy Data, Data Cleaning as a Process | Understand data cleaning methods | T2, 3.2, 88-91 | Board/PPT |  | A1, S1, HA, SE Exam
4 | 1 | 2 | 2 | Data Integration: Entity Identification Problem, Redundancy and Correlation Analysis, Tuple Duplication, Data Value Conflict Detection and Resolution | Understand and analyse data integration techniques | T2, 3.3, 93-99 | Board/PPT | Paper Work | A1, S1, HA, SE Exam
5 | 1 | 2 | 2 | Data Transformation and Data Discretization: Data Transformation Strategies Overview, Normalization, Discretization by Binning and Histogram Analysis | Understand and analyse data transformation and discretization methods | T2, 3.5, 111-115 | Board/PPT | Paper Work | A1, S1, HA, SE Exam
6 | 1 | 1 | 1 | Introduction to Big Data Analytics: Big Data Overview, State of the Practice in Analytics, Key Roles for the New Big Data Ecosystem, Example of Big Data Analytics | Understand fundamentals of Big Data analytics | T1, 1.2, 29-31; T1, 1.4, 41 | Board/PPT |  | A1, S1, HA, SE Exam
7 | 1 | 1 | 1 | Data Analytics Lifecycle: Data Analytics Lifecycle Overview, Key Roles for a Successful Analytics Project, Background and Overview of Data Analytics Lifecycle | Understand life cycle of Big Data analytics | T1, 2.1, 47-49 | Board/PPT |  | A1, S1, HA, SE Exam
8 | 1 | 1 | 1 | Discovery: Learning the Business Domain, Resources: Framing the Problem, Identifying Key Stakeholders, Interviewing the Analytics Sponsor, Developing Initial Hypotheses, Identifying Potential Data Sources | Understand life cycle of Big Data analytics | T1, 2.2, 53-58 | Board/PPT |  | A1, S1, HA, SE Exam
9 | 1 | 1 | 1 | Model Planning: Data Exploration and Variable Selection, Model Selection, Common Tools for the Model Planning Phase | Understand life cycle of Big Data analytics | T1, 2.4, 68-75 | Board/PPT |  | A1, S1, HA, SE Exam
10 | 1 | 1 | 1 | Communicate Results and Operationalize | Understand life cycle of Big Data analytics | T1, 2.6, 76-78 | Board/PPT | Quiz | A1, S1, HA, SE Exam
11 | 2 | 1 | 1 | Association Rules: Overview, Apriori Algorithm | Understand Apriori algorithm | T1, 5, 175-179 | Board/PPT |  | S1, HA, SE Exam
12 | 2 | 3 | 3 | Evaluation of Candidate Rules, Applications of Association Rules, Transactions in a Grocery Store, The Groceries Dataset, Validation and Testing | Apply the Apriori algorithm and evaluate association rules | T1, 5.3, 180-183, 196 | Board/PPT | Case study | S1, HA, SE Exam
13 | 2 | 1 | 1 | Regression: Linear Regression, Use Cases, Model Description | Understand Linear Regression and its use cases | T1, 6.1, 204-205 | Board/PPT | Case study | S1, HA, SE Exam
14 | 2 | 3 | 3 | Logistic Regression, Use Cases, Model Description, Additional Regression Models | Understand and analyze various types of Regression models | T1, 6.2, 222 | Board/PPT | Quiz | S1, HA, SE Exam
15 | 2 | 3 | 3 | Advanced Analytical Theory and Methods-Classification: Decision Trees, Decision Tree Induction | Apply decision tree algorithm to classify the data | T2, 6.3, 291-292 | Board/PPT | Case study | S1, HA, SE Exam
16 | 2 | 2 | 2 | Attribute Selection Measure | Understand and analyze various types of attribute selection measures | T2, 6.3.2, 296 | Board/PPT | Memory Matrix | S1, HA, SE Exam
17 | 2 | 3 | 3 | Naïve Bayes, Bayes' Theorem, Naïve Bayesian Classification | Understand and apply Naïve Bayesian classification algorithm | T2, 6.4, 310 | Board/PPT | Case study | S1, HA, SE Exam
18 | 2 | 2 | 2 | Classification by Backpropagation, A Multilayer Feed-Forward Neural Network, Defining a Network Topology, Backpropagation | Analyze the Backpropagation algorithm to classify the data | T2, 6.6, 327-329 | Board/PPT |  | S1, HA, SE Exam
19 | 2 | 3 | 3 | Case study of Backpropagation | Apply Backpropagation to classify the data |  | Board/PPT | Case study | S1, HA, SE Exam
20 | 2 | 1 | 1 | Advanced Analytical Theory and Methods-Clustering: major categories of clustering methods | Understand major categories of clustering methods | T2, 7.3, 398 | Board/PPT |  | S1, HA, SE Exam
21 | 2 | 3 | 3 | K-means and case study | Apply k-means algorithm to classify the data | T2, 7.4.1, 402 | Board/PPT | Case study | A2, S2, HA, SE Exam
22 | 2 | 3 | 3 | K-Nearest Neighbor and DBSCAN | Apply KNN and DBSCAN to classify the data | T2, 6.9.1, 377 and T2, 7.6.1, 418 | Board/PPT | Quiz | A2, S2, HA, SE Exam
23 | 3 | 1 | 1 | Advanced Analytical Theory and Methods-Time Series Analysis: Overview of Time Series Analysis | Understand Overview of Time Series Analysis | T1, 8.1, 282 | Board/PPT |  | A2, S2, HA, SE Exam
24 | 3 | 2 | 2 | Box-Jenkins Methodology, ARIMA Model, Autocorrelation Function (ACF) | Analyse the ARIMA Model | T1, 8.1.1, 283-286 | Board/PPT |  | A2, S2, HA, SE Exam
25 | 3 | 2 | 2 | Autoregressive Model, Moving Average Models | Analyse the Autoregressive and Moving Average models | T1, 8.2.3, 287-289 | Board/PPT |  | A2, S2, HA, SE Exam
26 | 3 | 3 | 3 | ARMA and ARIMA Models | Analyze ARMA and ARIMA models | T1, 8.2.4, 290 | Board/PPT | Quiz | A2, S2, HA, SE Exam
27 | 3 | 1 | 1 | Advanced Analytical Theory and Methods-Text Analysis: Text Analysis Steps, Text Analysis Example | Understand Text Analysis Steps and examples | T1, 9.1, 310-311 | Board/PPT |  | A2, S2, HA, SE Exam
28 | 3 | 2 | 2 | Collecting Raw Text, Representing Text | Understand Text Analysis Steps | T1, 9.2, 314-318 | Board/PPT |  | A2, S2, HA, SE Exam
29 | 3 | 1 | 1 | Term Frequency-Inverse Document Frequency (TFIDF), Categorizing Documents by Topics | Understand TFIDF and Categorizing Documents by Topics | T1, 9.5, 324; T1, 9.6, 329 | Board/PPT |  | A2, S2, HA, SE Exam
30 | 3 | 3 | 3 | Determining Sentiments | Apply text analysis to determine sentiments | T1, 9.7, 333 | Board/PPT | Case Study | A2, S2, HA, SE Exam
31 | 4 | 1 | 1 | MapReduce and Hadoop: Analytics for Unstructured Data, Use Cases, Apache Hadoop | Understand fundamentals of Hadoop | T1, 10.1, 353; T1, 10.1.3, 356 | Board/PPT |  | S2, HA, SE Exam
32 | 4 | 2 | 2 | The Hadoop Ecosystem: Pig | Understand and analyze Hadoop Ecosystem | T1, 10.2, 364 | Board/PPT |  | S2, HA, SE Exam
33 | 4 | 2 | 2 | Hive and HBase | Understand and analyze Hadoop Ecosystem | T1, 10.2, 366-369 | Board/PPT | Quiz | S2, HA, SE Exam
34 | 4 | 3 | 3 | In-Database Analytics: SQL Essentials, Joins | Use SQL in In-Database Analytics | T1, 11, 389-391 | Board/PPT | Paper Work | S2, HA, SE Exam
35 | 4 | 3 | 3 | Set Operations, Grouping Extensions | Use SQL in In-Database Analytics | T1, 11, 393-395 | Board/PPT | Paper Work | S2, HA, SE Exam
36 | 4 | 1 | 1 | In-Database Text Analysis | Understand In-Database Text Analysis | T1, 12, 400 | Board/PPT |  | S2, HA, SE Exam
37 | 4 | 2 | 2 | Putting It All Together: Communicating and operationalizing an Analytics Project, Creating the final deliverables | Analyze the Data Analytics life cycle and create the final deliverables | T1, 12.1, 422; T1, 12.1, 425 | Board/PPT |  | S2, HA, SE Exam
38 | 4 | 3 | 3 | Developing Core Material for Multiple Audiences, Project Goals, Main Findings | Developing Core Material for Multiple Audiences, Project Goals, Main Findings | T1, 12.1, 426-430 | Board/PPT |  | S2, HA, SE Exam
39 | 4 | 1 | 1 | Approach, Model Description, Key Points Supported with Data | Understand the key points of data | T1, 12.1, 432-434 | Board/PPT |  | S2, HA, SE Exam
40 | 4 | 2 | 2 | Data Visualization basics, Key Points Supported with Data, Evolution of a Graph | Understand and use data visualization techniques | T1, 12.3, 441-443 | Board/PPT | Quiz | S2, HA, SE Exam
41 | 4 | 2 | 2 | Common Representation Methods, How to Clean Up a Graphic, Additional Considerations | Understand and use data visualization techniques | T1, 12.3, 451-457 | Board/PPT |  | S2, HA, SE Exam
PRACTICAL COMPONENT
List of experiments to be completed in Open Lab Sessions:
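Day | Component | Hour 1 (8.40 – 9.40) | Hour 2 (9.40 – 10.40) | Hour 3 (10.40 – 11.40) | Hour 4 (11.40 – 12.40) | Hour 5 (12.40 – 1.40) | Hour 6 (1.40 – 2.40) | Hour 7 (2.40 – 3.40) | Hour 8 (3.40 – 4.40) | Hour 9 (8.40 – 9.40)
Mon | Theory |  |  |  |  |  |  |  |  |
Mon | Lab |  |  |  |  |  |  |  |  |
Tue | Theory |  |  |  |  |  |  |  |  |
Tue | Lab |  |  |  |  |  |  |  |  |
Wed | Theory |  |  |  |  |  |  |  |  |
Wed | Lab |  |  |  |  |  |  |  |  |
Thur | Theory |  |  |  |  |  |  |  |  |
Thur | Lab |  |  |  |  |  |  |  |  |
Fri | Theory |  |  |  |  |  |  |  |  |
Fri | Lab |  |  |  |  |  |  |  |  |
Sat | Theory |  |  |  |  |  |  |  |  |
Sat | Lab |  |  |  |  |  |  |  |  |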
REMEDIAL CLASSES:
Remedial classes supplement the course handout and may include special lectures and discussions. These will be planned and
the schedule notified accordingly.
SELF-LEARNING:
Assignments promote self-learning through a survey of content from multiple sources.
3 marks in each theory course shall be given for regularity in a graded manner, as given in
Table 3.
PLAGIARISM POLICY
Use of unfair means in any of the evaluation components will be dealt with strictly, and the case will be reported to the
examination committee.
GENERAL INSTRUCTIONS
Students should come prepared for classes and carry the text book(s) or material(s) as prescribed by the Course Faculty to the
class.
NOTICES
All notices will be communicated through the institution email.
All notices concerning the course will be displayed on the respective Notice Boards.
HEAD OF DEPARTMENT: