
Course Handout

Academic Year: 2020-21


17CS3702A - DATA ANALYTICS
IV /IV B. Tech Program, First Semester

Prof. K. Srinivas
Course Coordinator

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


VELAGAPUDI RAMAKRISHNA SIDDHARTHA ENGINEERING COLLEGE
(AUTONOMOUS)
(AFFILIATED TO JNTUK, KAKINADA) KANURU, VIJAYAWADA – 520007
17CS3702A DATA ANALYTICS

Course Category: Programme Elective
Credits: 3
Course Type: Theory
Lecture-Tutorial-Practice: 3-0-0
Prerequisites: Programming in C, Theory of Computation
Continuous Evaluation: 30
Semester End Evaluation: 70
Total Marks: 100
Course Coordinator: Prof. K. Srinivas
Team of Instructors: Prof. K. Srinivas, Mr. G Arun Kumar

Course Objectives:
- Find meaningful patterns in data
- Implement analytic algorithms to solve real-life problems
- Graphically interpret data
- Handle large-scale analytics projects from various domains

COURSE OUTCOMES

Upon successful completion of the course, the student will be able to:

CO1 Understand the concepts of Data Mining and Big Data Analytics
CO2 Apply machine learning algorithms for data analytics
CO3 Analyze various text categorization algorithms
CO4 Use technology and tools to solve Big Data Analytics problems

Contribution of Course Outcomes towards achievement of Program Outcomes


(1 – Low, 2 - Medium, 3 – High)

PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2
CO1 3
CO2 3 1 1 1
CO3 2 1 1 1

CO4 2 2 3 1
Course Outcome Indicators (COIs):

Course Outcome No. | Highest BTL | COI-1 (BTL1) | COI-2 (BTL2) | COI-3 (BTL3)
CO 1 | 2 | 1 | 2 |
CO 2 | 3 | 1 | 2 | 3
CO 3 | 3 | 1 | 2 | 3
CO 4 | 3 | 1 | 2 | 3

PROGRAM OUTCOMES & PROGRAM SPECIFIC OUTCOMES (POs/PSOs)

Program Outcomes

PO1: Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.

PO2: Problem analysis: Identify, formulate, review research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.

PO3: Design/development of solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations.

PO4: Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.

PO5: Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modeling to complex engineering activities with an understanding of the limitations.

PO6: The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.

PO7: Environment and sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for, sustainable development.

PO8: Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.

PO9: Individual and teamwork: Function effectively as an individual, and as a member or leader in diverse teams, and in multi-disciplinary settings.

PO10: Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as, being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.

PO11: Project management and finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one’s own work, as a member and leader in a team, to manage projects and in multi-disciplinary environments.

PO12: Lifelong learning: Recognize the need for and have the preparation and ability to engage in independent and life-long learning in the broadest context of technological change.

PROGRAM SPECIFIC OUTCOMES (PSOs):

PSO1: Develop software applications/solutions as per the needs of Industry and society

PSO2: Adopt new and fast emerging technologies in computer science and engineering

COURSE CONTENT
UNIT I
Data Mining: Data Mining, Kinds of Patterns Can Be Mined, Applications of data mining.
Data pre-processing: Data Cleaning: Missing Values, Noisy Data, Data Cleaning as a Process;
Data Integration: Entity Identification Problem, Redundancy and Correlation Analysis, Tuple
Duplication, Data Value Conflict Detection and Resolution; Data Transformation and Data
Discretization: Data Transformation Strategies Overview, Data Transformation by Normalization,
Discretization by Binning, Discretization by Histogram Analysis.
Introduction to Big Data Analytics: Big Data Overview, State of the Practice in Analytics, Key
Roles for the New Big Data Ecosystem, Examples of Big Data Analytics
Data Analytics Lifecycle: Data Analytics Lifecycle Overview, Discovery, Data Preparation, Model
Planning, Model Building, Communicate Results, Operationalize
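
As an illustration of two Unit I topics, Data Transformation by Normalization and Discretization by Binning, the following is a minimal sketch in plain Python. It assumes min-max normalization and equal-width binning; the function names and the sample `ages` list are hypothetical and are not taken from the prescribed text books.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to new_min to avoid division by zero.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def equal_width_bins(values, k):
    """Assign each value to one of k equal-width bins, numbered 0..k-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / k
    # The maximum value would fall just past the last bin, so clamp it to k-1.
    return [min(int((v - lo) / width), k - 1) for v in values]


ages = [20, 30, 40, 50, 60]  # hypothetical sample attribute
normalized = min_max_normalize(ages)
bins = equal_width_bins(ages, 2)
print(normalized)
print(bins)
```

Other normalization schemes (for example z-score) and equal-frequency binning follow the same pattern with a different mapping rule.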

UNIT II
Association Rules: Apriori Algorithm, Evaluation of Candidate Rules, Applications of Association
Rules, Transactions in a Grocery Store, Validation and Testing;
Regression: Linear Regression, Logistic Regression
Advanced Analytical Theory and Methods-Classification: Decision Trees, Naïve Bayes;
Classification by Backpropagation
Advanced Analytical Theory and Methods-Clustering: major categories of clustering methods,
k-means, k-nearest neighbor; DBSCAN
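
The Apriori topic in Unit II can be illustrated with a short level-wise sketch: candidate itemsets of size k+1 are built only from frequent itemsets of size k, and rules are then evaluated by confidence. This is a simplified teaching version under stated assumptions, not the text book's implementation; the function names and the tiny `baskets` dataset are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Keep only the size-k candidates that meet the support threshold.
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Join step: merge pairs of frequent k-itemsets into (k+1)-candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-sized subset of a candidate must be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


# Hypothetical grocery baskets, one set of items per transaction.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
frequent = apriori(baskets, min_support=0.5)
rule_conf = confidence(frozenset({"butter"}), frozenset({"bread"}), baskets)
```

With a minimum support of 0.5, the pair {milk, butter} is dropped at the second level, so the three-item candidate is pruned without being counted.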

UNIT III
Advanced Analytical Theory and Methods-Time Series Analysis: Overview of Time Series
Analysis, ARIMA Model.
Advanced Analytical Theory and Methods-Text Analysis: Text Analysis Steps, Text Analysis
Example, Collecting Raw Text, Representing Text, Term Frequency—Inverse Document
Frequency (TFIDF), Categorizing Documents by Topics, Determining Sentiments
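
For the TFIDF topic in Unit III, a minimal sketch of one common weighting variant is shown here: term frequency as a share of the document length, multiplied by the logarithm of (number of documents / number of documents containing the term). Several TFIDF variants exist, so this particular formula is an assumption, and the three sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "big data analytics".split(),
    "big data tools".split(),
    "text analytics".split(),
]
scores = tf_idf(docs)
```

A term that appears in every document receives weight 0 under this variant, while a term unique to one document (such as "tools" here) is weighted highest within that document.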

UNIT IV
Advanced Analytics - Technology and Tools: MapReduce and Hadoop: Analytics for Unstructured
Data, The Hadoop Ecosystem;
In-Database Analytics: SQL Essentials, In-Database Text Analysis.
Putting It All Together: Communicating and operationalizing an Analytics Project, Creating the
final deliverables, and Data Visualization basics.
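
The MapReduce paradigm named in Unit IV can be sketched without a cluster. The example below simulates the map, shuffle and reduce phases of a word count in plain Python; it does not use the Hadoop API, and the function names and input lines are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


lines = ["big data analytics", "Big Data tools"]  # hypothetical input split
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(pairs).items())
print(counts)
```

In a real Hadoop job the mappers and reducers run on different nodes and the framework performs the shuffle; here all three phases run in one process purely to show the data flow.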

TEXT BOOKS
[1] EMC Education Services, Data Science and Big Data Analytics, John Wiley, 2015 [Units II, III, IV]
[2] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 3rd ed., Elsevier
Publishers [Unit I]
REFERENCE BOOKS
[1] Simon Walkowiak, Big Data Analytics with R: Leverage R Programming to Uncover Hidden
Patterns in Your Big Data, Packt Publishing, 2016
[2] Nathan Marz and James Warren, Big Data: Principles and Best Practices of Scalable Real-Time Data
Systems, DreamTech Press, 2015
[3] Benjamin Bengfort and Jenny Kim, Data Analytics with Hadoop: An Introduction for Data
Scientists, O'Reilly, 1st Edition, 2016
E-RESOURCES AND OTHER DIGITAL MATERIAL
[1] Prof. D. Janaki Ram and S. Srinath, IIT Madras, Data Mining and Knowledge Discovery,
https://freevideolectures.com/course/2280/database-design/35, last accessed on 11th April 2020
[2] Prof. Nandan Sudharsanam and Prof. B. Ravindran, IIT Madras, Introduction to Data
Analytics, http://nptel.ac.in/courses/110106064/23, last accessed on 11th April 2020

COURSE DELIVERY PLAN:

Sess No | CO | COI | BTL Level | Topic(s) | Session Outcomes | Book, [CH No], [Page No] | Teaching Learning Methods | Active Learning Methods | Evaluation Components
1 | 1 | 1 | 1 | Data Mining: Data Mining, Kinds of Patterns Can Be Mined - Concept/Class Description: Characterization and Discrimination, Classification, Prediction | Understand the kinds of patterns of data mining | T2, 1.2, 5; T2, 1.4, 15-18 | Board/PPT | | A1, S1, HA, SE Exam
2 | 1 | 1 | 1 | Cluster Analysis and Outlier Analysis, Applications of data mining | Understand the kinds of patterns and applications of data mining | T2, 1.4, 19-20 | Board/PPT | Quiz | A1, S1, HA, SE Exam
3 | 1 | 1 | 1 | Data Preparation: Data Cleaning: Missing Values, Noisy Data, Data Cleaning as a Process | Understand data cleaning methods | T2, 3.2, 88-91 | Board/PPT | | A1, S1, HA, SE Exam
4 | 1 | 2 | 2 | Data Integration: Entity Identification Problem, Redundancy and Correlation Analysis, Tuple Duplication, Data Value Conflict Detection and Resolution | Understand and analyse data integration techniques | T2, 3.3, 93-99 | Board/PPT | Paper Work | A1, S1, HA, SE Exam
5 | 1 | 2 | 2 | Data Transformation and Data Discretization: Data Transformation Strategies Overview, Normalization, Discretization by Binning and Histogram Analysis | Understand and analyse data transformation and discretization methods | T2, 3.5, 111-115 | Board/PPT | Paper Work | A1, S1, HA, SE Exam
6 | 1 | 1 | 1 | Introduction to Big Data Analytics: Big Data Overview, State of the Practice in Analytics, Key Roles for the New Big Data Ecosystem, Example of Big Data Analytics | Understand fundamentals of Big Data analytics | T1, 1.2, 29-31; T1, 1.4, 41 | Board/PPT | | A1, S1, HA, SE Exam
7 | 1 | 1 | 1 | Data Analytics Lifecycle: Data Analytics Lifecycle Overview, Key Roles for a Successful Analytics Project, Background and Overview of Data Analytics Lifecycle | Understand life cycle of Big Data analytics | T1, 2.1, 47-49 | Board/PPT | | A1, S1, HA, SE Exam
8 | 1 | 1 | 1 | Discovery: Learning the Business Domain, Resources, Framing the Problem, Identifying Key Stakeholders, Interviewing the Analytics Sponsor, Developing Initial Hypotheses, Identifying Potential Data Sources | Understand life cycle of Big Data analytics | T1, 2.2, 53-58 | Board/PPT | | A1, S1, HA, SE Exam
9 | 1 | 1 | 1 | Model Planning: Data Exploration and Variable Selection, Model Selection, Common Tools for the Model Planning Phase | Understand life cycle of Big Data analytics | T1, 2.4, 68-75 | Board/PPT | | A1, S1, HA, SE Exam
10 | 1 | 1 | 1 | Communicate Results and Operationalize | Understand life cycle of Big Data analytics | T1, 2.6, 76-78 | Board/PPT | Quiz | A1, S1, HA, SE Exam
11 | 2 | 1 | 1 | Association Rules: Overview, Apriori Algorithm | Understand Apriori algorithm | T1, 5, 175-179 | Board/PPT | | S1, HA, SE Exam
12 | 2 | 3 | 3 | Evaluation of Candidate Rules, Applications of Association Rules, Transactions in a Grocery Store, The Groceries Dataset, Validation and Testing | Apply Apriori algorithm and evaluate association rules | T1, 5.3, 180-183, 196 | Board/PPT | Case study | S1, HA, SE Exam
13 | 2 | 1 | 1 | Regression: Linear Regression, Use Cases, Model Description | Understand Linear Regression and its use cases | T1, 6.1, 204-205 | Board/PPT | Case study | S1, HA, SE Exam
14 | 2 | 3 | 3 | Logistic Regression, Use Cases, Model Description, Additional Regression Models | Understand and analyze various types of Regression models | T1, 6.2, 222 | Board/PPT | Quiz | S1, HA, SE Exam
15 | 2 | 3 | 3 | Advanced Analytical Theory and Methods-Classification: Decision Trees, Decision Tree Induction | Apply decision tree algorithm to classify the data | T2, 6.3, 291-292 | Board/PPT | Case study | S1, HA, SE Exam
16 | 2 | 2 | 2 | Attribute Selection Measure | Understand and analyze various types of attribute selection measures | T2, 6.3.2, 296 | Board/PPT | Memory Matrix | S1, HA, SE Exam
17 | 2 | 3 | 3 | Naïve Bayes, Bayes' Theorem, Naïve Bayesian Classification | Understand and apply Naïve Bayesian classification algorithm | T2, 6.4, 310 | Board/PPT | Case study | S1, HA, SE Exam
18 | 2 | 2 | 2 | Classification by Backpropagation, A Multilayer Feed-Forward Neural Network, Defining a Network Topology, Backpropagation | Analyze the Backpropagation algorithm to classify the data | T2, 6.6, 327-329 | Board/PPT | | S1, HA, SE Exam
19 | 2 | 3 | 3 | Case study of Backpropagation | Apply Backpropagation to classify the data | | Board/PPT | Case study | S1, HA, SE Exam
20 | 2 | 1 | 1 | Advanced Analytical Theory and Methods-Clustering: major categories of clustering methods | Understand major categories of clustering methods | T2, 7.3, 398 | Board/PPT | | S1, HA, SE Exam
21 | 2 | 3 | 3 | K-means and case study | Apply k-means algorithm to cluster the data | T2, 7.4.1, 402 | Board/PPT | Case study | A2, S2, HA, SE Exam
22 | 2 | 3 | 3 | K-Nearest Neighbor and DBSCAN | Apply KNN and DBSCAN to classify and cluster the data | T2, 6.9.1, 377 and T2, 7.6.1, 418 | Board/PPT | Quiz | A2, S2, HA, SE Exam
23 | 3 | 1 | 1 | Advanced Analytical Theory and Methods-Time Series Analysis: Overview of Time Series Analysis | Understand Overview of Time Series Analysis | T1, 8.1, 282 | Board/PPT | | A2, S2, HA, SE Exam
24 | 3 | 2 | 2 | Box-Jenkins Methodology, ARIMA Model, Autocorrelation Function (ACF) | Analyse the ARIMA Model | T1, 8.1.1, 283-286 | Board/PPT | | A2, S2, HA, SE Exam
25 | 3 | 2 | 2 | Autoregressive Model, Moving Average Models | Analyse the Autoregressive and Moving Average models | T1, 8.2.3, 287-289 | Board/PPT | | A2, S2, HA, SE Exam
26 | 3 | 3 | 3 | ARMA and ARIMA Models | Analyze ARMA and ARIMA models | T1, 8.2.4, 290 | Board/PPT | Quiz | A2, S2, HA, SE Exam
27 | 3 | 1 | 1 | Advanced Analytical Theory and Methods-Text Analysis: Text Analysis Steps, Text Analysis Example | Understand Text Analysis Steps and examples | T1, 9.1, 310-311 | Board/PPT | | A2, S2, HA, SE Exam
28 | 3 | 2 | 2 | Collecting Raw Text, Representing Text | Understand Text Analysis Steps | T1, 9.2, 314-318 | Board/PPT | | A2, S2, HA, SE Exam
29 | 3 | 1 | 1 | Term Frequency-Inverse Document Frequency (TFIDF), Categorizing Documents by Topics | Understand TFIDF and Categorizing Documents by Topics | T1, 9.5, 324; T1, 9.6, 329 | Board/PPT | | A2, S2, HA, SE Exam
30 | 3 | 3 | 3 | Determining Sentiments | Apply text analysis to determine sentiments | T1, 9.7, 333 | Board/PPT | Case Study | A2, S2, HA, SE Exam
31 | 4 | 1 | 1 | MapReduce and Hadoop: Analytics for Unstructured Data, Use Cases, Apache Hadoop | Understand fundamentals of Hadoop | T1, 10.1, 353; T1, 10.1.3, 356 | Board/PPT | | S2, HA, SE Exam
32 | 4 | 2 | 2 | The Hadoop Ecosystem: Pig | Understand and analyze Hadoop Ecosystem | T1, 10.2, 364 | Board/PPT | | S2, HA, SE Exam
33 | 4 | 2 | 2 | Hive and HBase | Understand and analyze Hadoop Ecosystem | T1, 10.2, 366-369 | Board/PPT | Quiz | S2, HA, SE Exam
34 | 4 | 3 | 3 | In-Database Analytics: SQL Essentials, Joins | Use SQL in In-Database Analytics | T1, 11, 389-391 | Board/PPT | Paper Work | S2, HA, SE Exam
35 | 4 | 3 | 3 | Set Operations, Grouping Extensions | Use SQL in In-Database Analytics | T1, 11, 393-395 | Board/PPT | Paper Work | S2, HA, SE Exam
36 | 4 | 1 | 1 | In-Database Text Analysis | Understand In-Database Text Analysis | T1, 12, 400 | Board/PPT | | S2, HA, SE Exam
37 | 4 | 2 | 2 | Putting It All Together: Communicating and operationalizing an Analytics Project, Creating the final deliverables | Analyze the Data Analytics life cycle and create the final deliverables | T1, 12.1, 422; T1, 12.1, 425 | Board/PPT | | S2, HA, SE Exam
38 | 4 | 3 | 3 | Developing Core Material for Multiple Audiences, Project Goals, Main Findings | Developing Core Material for Multiple Audiences, Project Goals, Main Findings | T1, 12.1, 426-430 | Board/PPT | | S2, HA, SE Exam
39 | 4 | 1 | 1 | Approach, Model Description, Key Points Supported with Data | Understand the key points of data | T1, 12.1, 432-434 | Board/PPT | | S2, HA, SE Exam
40 | 4 | 2 | 2 | Data Visualization basics, Key Points Supported with Data, Evolution of a Graph | Understand and use data visualization techniques | T1, 12.3, 441-443 | Board/PPT | Quiz | S2, HA, SE Exam
41 | 4 | 2 | 2 | Common Representation Methods, How to Clean Up a Graphic, Additional Considerations | Understand and use data visualization techniques | T1, 12.3, 451-457 | Board/PPT | | S2, HA, SE Exam

PRACTICAL COMPONENT
List of experiments to be completed in Open Lab Sessions:

Lab session No | List of Experiments
1 | Preprocessing: Removal of a specified attribute, discretization of a continuous-valued attribute, standardization and normalization of data.
2 | Association Mining: Finding Association Rules using the Apriori principle
3 | Classification: Use a classification technique to classify the given dataset
4 | Clustering: Apply a clustering technique to group the given dataset
5 | Time Series: Apply time series techniques for prediction.
6 | Text Analysis: Use text analysis methods for sentiment analysis
7 | Hadoop file management: Adding files and directories, retrieving files, deleting files
8 | Word Count application: MapReduce program to understand the MapReduce paradigm
9 | Pig Latin scripts: To sort, group, join for a given dataset
10 | NoSQL database - Apache HBase: To set up the HBase shell environment and to create tables, insert rows, display contents
11 | Database manipulation using Hive: To create, alter, drop databases and views
12 | Functions and indexes in Hive
13 | Data Analytics Lab Project
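
As a possible starting point for the clustering experiment (lab session 4), the sketch below shows the k-means loop on one-dimensional data: assign each point to its nearest centroid, recompute the centroids as cluster means, and stop when they no longer change. The function name, the fixed initial centroids and the sample points are assumptions made for illustration; a lab solution would normally work on multi-dimensional data and choose the initial centroids differently.

```python
def k_means_1d(points, centroids, max_iter=100):
    """Cluster 1-D points around the given initial centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # hypothetical data
centroids, clusters = k_means_1d(points, centroids=[1.0, 12.0])
print(centroids, clusters)
```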
COURSE TIME TABLE
Course Conduct
Theory: Lecture | 2 Sections | 72 Students each | 3 Lectures per week | Class Room | Course Coordinator
Practical: 2 Sections | 72 Students each | 1 per week, 2 hrs. each | 2 Batches | 3 Instructors | 72 Computers | 90 minutes Experiment | 30 minutes Evaluation for 25 students per instructor

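Day | Component | Hour 1 | Hour 2 | Hour 3 | Hour 4 | Hour 5 | Hour 6 | Hour 7 | Hour 8 | Hour 9
 | | 8.40 - 9.40 | 9.40 - 10.40 | 10.40 - 11.40 | 11.40 - 12.40 | 12.40 - 1.40 | 1.40 - 2.40 | 2.40 - 3.40 | 3.40 - 4.40 | 8.40 - 9.40
Mon | Theory | | | | | | | | |
Mon | Lab | | | | | | | | |
Tue | Theory | | | | | | | | |
Tue | Lab | | | | | | | | |
Wed | Theory | | | | | | | | |
Wed | Lab | | | | | | | | |
Thur | Theory | | | | | | | | |
Thur | Lab | | | | | | | | |
Fri | Theory | | | | | | | | |
Fri | Lab | | | | | | | | |
Sat | Theory | | | | | | | | |
Sat | Lab | | | | | | | | |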

REMEDIAL CLASSES:
Supplements to the course handout, which may include special lectures and discussions, will be planned and the
schedule notified accordingly.

SELF-LEARNING:
Assignments to promote self-learning, survey of contents from multiple sources.

S.No | Topics | CO | ALM | References/MOOCS
1 | Introduction to Machine Learning Algorithms: Linear Regression | 2 | | https://towardsdatascience.com/introduction-to-machine-learning-algorithms-linear-regression-14c4e325882a
2 | Linear Regression using Python | 2 | | https://towardsdatascience.com/linear-regression-using-python b136c91bf0a2
3 | Understand and Implement the Backpropagation Algorithm From Scratch In Python | 2 | | http://www.adeveloperdiary.com/data-science/machine-learning/understand-and-implement-the-backpropagation-algorithm-from-scratch-in-python/
4 | In Depth: k-Means Clustering | 3 | | https://jakevdp.github.io/PythonDataScienceHandbook/05.11 means.html

DELIVERY DETAILS OF CONTENT BEYOND SYLLABUS:


Content beyond the syllabus (if any) will be delivered to all students as planned, with the schedule
notified accordingly.

S.No | Advanced Topics, Additional Reading, Research papers and any | CO | POs & PSOs | ALM | References/MOOCS
1 | Part 1: Statistics and Probability in Data Science | Data Science 2020 | | | | https://medium.com/analytics-vidhya/statistics and-probability-in-data-science-data-science 5cbd41856cd3
2 | Part 2: Statistics and Probability in Data Science | Data Science 2020 | | | | https://medium.com/analytics-vidhya/part statistics-and-probability-in-data-science-data science-2020-ed74652b8318

THEORY COURSE WITH 3 CREDITS - EVALUATION PLAN:

Evaluation Type | Evaluation Component | Marks | Assessment Dates | Duration (Hours) | CO1 | CO2 | CO3 | CO4
 | Blooms Taxonomy Level | | | | | | |
In-Semester Summative Evaluation (Total = 30 %) | Assignment-I | Max Marks: 10 M | Dates | 45 Min | | | |
 | Sessional-I | Max Marks: 12 M | Dates | 60 Min | | | |
 | Assignment-II | Max Marks: 10 M | Dates | 45 Min | | | |
 | Sessional-II | Max Marks: 12 M | Dates | 60 Min | | | |
 | Home Assignment | Max Marks: 5 M | Dates | | | | |
 | Attendance | Max Marks: 3 M | Continuous evaluation | | | | |
End-Semester Summative Evaluation (Total = 70 %) | Semester End Exam | Max Marks: 70 M | Dates | 3 hrs | | | |
ATTENDANCE POLICY

Three marks in each theory course are awarded for regular attendance, graded as given in
Table 3.

PLAGIARISM POLICY
Use of unfair means in any of the evaluation components will be dealt with strictly, and the case will be reported to the
examination committee.

COURSE TEAM MEMBERS, CHAMBER CONSULTATION HOURS AND CHAMBER VENUE DETAILS:
Each instructor will specify his / her chamber consultation hours during which the student can contact him / her in his / her
chamber for consultation.

S.No. | Name of Faculty | Chamber Consultation Day(s) | Chamber Consultation Timings for each day | Chamber Consultation Room No: | Signature of Course faculty
1 | Dr K. Srinivas | Working days | 4.00 pm to 5 pm | |
2 | Mr. G Arun Kumar | Working days | 4.00 pm to 5 pm | |

GENERAL INSTRUCTIONS
Students should come prepared for classes and carry the text book(s) or material(s) as prescribed by the Course Faculty to the
class.

NOTICES
All notices will be communicated through the institution email.
All notices concerning the course will be displayed on the respective Notice Boards.

Signature of COURSE COORDINATOR:

Signature of Department Prof. Incharge Academics & Vetting Team Member:

HEAD OF DEPARTMENT:
