You are on page 1of 13

(Established under the Presidency University Act, 2013 of the Karnataka Act 41 of 2013)

ACA-2 [2021] COURSE HAND OUT

SCHOOL: School of Engineering DEPT.: CSE DATE OF ISSUE: 7/9/2022

NAME OF THE PROGRAM: B. Tech. (Computer Science and Engineering)


P.R.C. APPROVAL REF.: PU/AC-18.8/CSE 16/CSE/2020-24
SEMESTER/YEAR: V semester/III year
COURSE TITLE & CODE: CSE2021 - Data Mining
COURSE CREDIT STRUCTURE: 3-0-3
CONTACT HOURS: 41 Classes
INSTRUCTOR INCHARGE: Ms. Thabassum Khan.S
COURSE INSTRUCTOR(S): Dr.T.K.Thivakaran, Dr.Alamelu Mangai Jothidurai, Dr.Gowthul Alam
M M, Mr. Amogh Pramod Kulkarni, Ms. Sapna R, Ms.Pavithra N, Ms.
Shruthi U, Ms.C.M.Manasa.

PROGRAM OUTCOMES:

PO-1: Engineering knowledge: Apply the knowledge of mathematics, science, engineering


fundamentals, and an engineering specialization to the solution of complex engineering problems.

PO-2: Problem analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of mathematics, natural
sciences, and engineering sciences.

PO-3: Design/development of solutions: Design solutions for complex engineering problems and design
system components or processes that meet the specified needs with appropriate consideration for the
public health and safety, and the cultural, societal, and environmental considerations.

PO-4: Conduct investigations of complex problems: Use research-based knowledge and research
methods including design of experiments, analysis and interpretation of data, and synthesis of the
information to provide valid conclusions.

PO-5: Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities with an
understanding of the limitations
PO-6: The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal,
health, safety, legal and cultural issues, and the consequent responsibilities relevant to the professional
engineering practice.

PO-7: Environment and sustainability: Understand the impact of the professional engineering solutions in
societal and environmental contexts, and demonstrate the knowledge of and need for sustainable
development.

PO-8: Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of
the engineering practice.

PO-9: Individual and teamwork: Function effectively as an individual, and as a member or leader in diverse
teams, and in multidisciplinary settings.

PO-10: Communication: Communicate effectively on complex engineering activities with the engineering
community and with society at large, such as, being able to comprehend and write effective reports and
design documentation, make effective presentations, and give and receive clear instructions.

PO-11: Project management and finance: Demonstrate knowledge and understanding of the engineering
and management principles and apply these to one's own work, as a member and leader in a team, to
manage projects and in multidisciplinary environments.

PO-12: Life-long learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.

COURSE PREREQUISITES: -NIL -


COURSE DESCRIPTION: This course introduces an extensive study on data pre-processing and
classification algorithms. This course will help the students in selecting suitable data mining
algorithms to solve the real time problems, and to discover frequent item sets by association rule
algorithm. The course emphasizes the recent trends in spatial mining. It interacts the students to
study the different Clustering algorithms.

Type of Skills : Employability , Real-time Knowledge


Nature of Course: Application Based

COURSE OUTCOMES: On successful completion of the course the students shall be able to:
1. Describe the basic concepts and issues involved in Data Mining.
[Knowledge]

2. Discuss different preprocessing techniques on Data Analysis.


[Comprehension]
3. Identify frequent item sets by using Association rule algorithms.
[Application]

4. Apply different Classification algorithms in data mining. [Application]


5. Implement various clustering techniques on Data. [Application]
MAPPING OF C.O. WITH P.O.: [H-HIGH, M- MODERATE, L-LOW]

COs PO-01 PO-2 PO-03 PO-04 PO-05 PO-06 PO-07 PO-08 PO-09 PO-10 PO-11 PO-12
CO1 H L L L - - - - - - - M
CO2 H M M M - - - - - - - M
CO3 H H H H - - - - - - - M
CO4 H H H H - - - - - - - M
CO5 H M L L - - - - - - - M

COURSE CONTENT (SYLLABUS):

Module 1: Introduction to Data Mining [ 06 Hrs] [Knowledge]

Introduction to Data mining: Definition, KDD, Challenges, Data Mining Tasks - Data Mining Goals–
Stages of the Data Mining Process–Data Mining Techniques– Applications – Major Issues in Data
mining.
Module 2: Data Preprocessing [07 Hrs][Comprehension]

Types of data – Data Quality – Data Pre-processing Techniques – Similarity and Dissimilarity
measures.

Module 3: Data Mining – Frequent Patterns [07 Hrs][Application]

Motivation and terminology: Basic idea - Item sets – Generating frequent item sets and rules
efficiently – Apriori Algorithm – FP Growth.
Assignment: Apply the Apriori algorithms for finding the frequent Item set in the given TDB.

Module 4: Classification [08 Hrs][Application]

Basic concepts – Decision tree Induction – Bayes classification methods – Rule based classification –
Classification by Back Propagation – Lazy learners.
Assignment:
1) Find the Gini Index value of the attributes.
2) Classify the given model using Decision tree algorithm.

Module 5: Cluster Analysis Methods and Pattern Mining [08 Hrs][Application]

Cluster Analysis-Partitioning methods – Hierarchical methods – Basics of Density based method –


Pattern mining: A Road Map – Spatial Mining.

DELIVERY PROCEDURE (PEDAGOGY):

Self-learning: Data Quality, Types of Data, Text Mining, Web Mining, Data Mining Applications.
Participative Learning: Applying cluster and classifier algorithm on real –time applications (Group
discussions).
REFERENCE MATERIALS:

Text Books:

T1. Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts and Techniques”,
Morgan Kaufmann Publishers, Third Edition, 2012.

Reference Books:

R1. Tan P. N, Steinbach M and Kumar V, “Introduction to Data Mining”, Pearson Education, 2016.

R2. G K Gupta, “Introduction to Data Mining with Case Studies”, Third Edition, PHI, 2014.

R3. Alex Berson and Stephen J. Smith, “Data Warehousing, Data Mining and OLAP”, Tata McGraw Hill.

Web Based Resources and E-books:

W1. https://onlinecourses.swayam2.ac.in/cec20_cs12/preview
Text book of Data Mining: Concepts and Techniques, Jiawei Han, Micheline Kamber and Jian
Pei, Morgan Kaufmann Publishers, 2012.

W2. https://puniversity.informaticsglobal.com:2284/ehost/detail/detail?vid=7&sid=e2d7362a-
fd3049a98f0393e963521dbd%40redis&bdata=JnNpdGU9ZWhvc3QtbGl2ZQ%3d%3d#AN=377411
&db=nlebk

GUIDELINES TO STUDENTS:
1. Students are required to strictly adhere to assignment deadlines.
2. Students are required to actively participate in classroom discussions and other activities which is
planned in and out of the classroom.

COURSE SCHEDULE:

Sl. STARTING CONCLUDING TOTAL NUMBER OF


ACTIVITY
No. DATE DATE PERIODS
01 Program integration 12/09/2022 12/09/2022 01
06
02 Module: 01 13/09/2022 26/09/2022 [06 Lectures +
01 Course Integration]
07
04 Module: 02 27/09/2022 14/10/2022 [06 Lectures +
01 Course Integration]
05 Midterm 03/11/2022 07/11/2022 --
01
06 Discussion of Midterm paper 10/11/2022 11/11/2022
[01 Lectures]
07
07 Module: 03 18/10/2022 16/11/2022 [07 Lectures +
01 Course Integration]

08 Assignment 1 --

08
09 Module: 04 17/11/2022 08/12/2022 [08 Lectures +
01 Course Integration]
Assignment 2
10 --
(Article Review)
08
11 Module: 05 09/12/2022 26/12/2022 [08 Lectures +
01 Course Integration]
12 Quiz 1 --
13 Revision 27/12/2022 29/12/2022 --
14 End Term 05/01/2023 25/01/2023 --
Total No. of Hours 41

SCHEDULE OF INSTRUCTION:

Course
Sl. Session no
Lesson Title Topics Outcome Delivery Mode Reference
No [with date]
Number
CO1, CO2, Classroom
L1 Program
1 Program Integration CO3, CO4, Lecture/ PPT T1, R1
[12/09/2022] Integration
CO5 presentation
Classroom
Introduction to Introduction to Data mining:
2 L2 CO1 Lecture/ PPT T1, R1, R2
Module 1 Definition, KDD, Challenges
presentation
Classroom
Tasks
3 L3 Data Mining Tasks CO1 Lecture/ PPT T1
Related to DM
presentation
Tasks Classroom
Data Mining Tasks CO1
4 L4 Related to DM Lecture/ PPT T1
presentation
Foundations Classroom
Data Mining Goals, Stages of the
5 L5 and Goals in CO1 Lecture/ PPT T1
Data Mining Process
DM presentation
Techniques & Classroom
Data Mining Techniques,
6 L6 Applications in CO1 Lecture/ PPT T1
Applications
DM presentation
Classroom
7 L7 Issues in DM Major Issues in Data mining. CO1 Lecture/ PPT T1
presentation
Classroom
8 L8 Types of Issues Major Issues in Data mining. CO1 Lecture/ PPT T1
presentation

Revision of Module-1 Course integration

Classroom
Introduction to Introduction to Data Pre-
9 L9 CO2 Lecture/ PPT T1
Module 2 processing, Data Quality.
presentation
Classroom
Types in data Types of data Pre-Processing,
10 L10 CO2 Lecture/ PPT T1
Pre-Processing Aggregation, sampling
presentation
Pre-processing Classroom
Dimensionality Reduction,
11 L11 techniques CO2 Lecture/ PPT T1
Feature Subset Selection
presentation
Classroom
Steps related to
12 L12 Feature Creation CO2 Lecture/ PPT T1, R1
Pre-processing
presentation
Data Discretization and Classroom
Pre-processing T1, R1, R2,
13 L13 Binarization, Variable CO2 Lecture/ PPT
techniques W1, W3
Transformation presentation
Measures of Similarity & Dissimilarity Classroom
T1, R1, R2,
14 L14 Similarities Measures CO2 Lecture/ PPT
W1, W3
presentation
Comparison of Similarity & Classroom
Comparison of T1, R1, R2,
15 L15 Dissimilarity Measures CO2 Lecture/ PPT
Measures W1
presentation

Revision of Module-2 Course integration

Introduction to Data Mining – Frequent Classroom


T1, R1, R2,
16 L16 Module 3 Patterns, Basic Concepts. CO3 Lecture/ PPT
W1
presentation
Classroom
Basics, Item Frequent Patterns, Basic idea - T1, R1, R2,
17 L17 CO3 Lecture/ PPT
Sets Item sets W1
presentation
Classroom
Item Sets & Generating frequent item sets
18 L18 CO3 Lecture/ PPT R1, R2, W1
Rules Efficiency and rules efficiently
presentation
Algorithm with Classroom
19 L19 problem & Apriori Algorithm CO3 Lecture/ PPT R1, R2, W1
Solution presentation
Classroom
Key Example T1, R1, R2,
20 L20 Examples of Apriori Algorithm CO3 Lecture/ PPT
with Solution W1
presentation
Classroom
21 L21 FP Growth FP Growth CO1, CO2 Lecture/ PPT T1, R1
presentation
By using
FP Growth Finding frequent Item Sets, Classroom
T1, R1, R2,
22 L22 Algorithm Generating of Strong Association CO3 Lecture/ PPT
W1
finding Item Rules presentation
sets

Classroom
23 L23 Midterm Discussion of Midterm paper
Lecture

Revision of Module-3 Course integration

Classroom
Introduction to Introduction to Classification, T1, R1, R2,
24 L24 CO4 Lecture/ PPT
Module 4 Basic concepts W1
presentation
Classroom
Decision R1, R2, W1,
25 L25 Decision tree Induction CO4 Lecture/ PPT
Trees W2
presentation
Classroom R1, R2, W1,
Decision Decision tree Induction CO4
26 L26 Lecture/ PPT W2
Trees
presentation
Classroom
Classification Decision tree Induction,
27 L27 CO3, CO4 Lecture/ PPT T1, R3
Methods Bayes classification methods
presentation
Bayes Classroom
Algorithm for Bayes
28 L28 Classification CO4 Lecture/ PPT T1, R3
classification methods
Method presentation
Classroom
29 L29 Rule Based Rule based classification CO4 Lecture/ PPT T1, R3
presentation
Classroom
Rule Based Example for Rule Based
30 L30 CO4 Lecture/ PPT T1, R3
classification classification
presentation
Back Classroom
T1, R3
31 L31 Propagation Classification by Back Propagation CO4 Lecture/ PPT
Method presentation
Classroom
Lazy learners
32 L32 Methods CO4 Lecture/ PPT T1, R3
presentation

Revision of Module-4 Course integration

Classroom
Introduction to Introduction to Cluster Analysis
33 L33 Methods and Pattern Mining CO5 Lecture/ PPT T1
Module 5
presentation
Classroom
Cluster Analysis, Partitioning
34 L34 Analysis CO5 Lecture/ PPT T1
Methods.
presentation
Classroom
35 L35 Method 1 Partitioning methods CO5 Lecture/ PPT T1
presentation

Partitioning methods, Participative


36 L36 Method 2 CO5 T1, R2
Hierarchical methods Learning

Participative
37 L37 Method 2 Hierarchical methods CO5 T1, R2
Learning

Density Based Participative


38 L38 Basics of Density based method CO5 T1, W2
Method Learning

Classroom
Pattern
39 L39 Pattern mining: A Road Map CO5 Lecture/ PPT T1
mining
presentation
Classroom
Spatial
40 L40 Spatial Mining CO5 Lecture/ PPT T1
Mining
presentation
Classroom
41 L41 Revision Revision CO5 Lecture/ PPT T1
presentation

ASSESSMENT SCHEDULE:

Course
Assessment Duration Venue, DATE &
Sl. No. Contents outcome Marks Weightage
type In Hours TIME
Number
Midterm Module 03/11/2022 to
1 CO 1, CO 2 2 hours 50 Marks 25%
Examination 1,2 07/11/2022
Module DOI: 16/11/2022
2 Assignment 1 CO 2, CO 3 -- 15 Marks 7.5%
2,3 DOS: 21/11/2022
Assignment 2
DOI:28/11/2022
3 (Article Module 4 CO 4 20 Marks 10%
-- DOS: 01/12/2022
Review)
Module
4 Quiz 1 CO 4, CO 5 -- 15 Marks 7.5% 23/12/2022
4, 5
CO1,
Module CO 2, 100 05/01/2023 to
5 End Term 3 hours 50%
1,2,3,4,5 CO 3, CO 4, Marks 25/01/2023
CO 5

COURSE CLEARANCE CRITERIA:

1. Course Clearance Criteria will be as per Program Regulations.

CONTACT TIMINGS IN THE CHAMBER FOR ANY DISCUSSIONS:

1. Will be informed by the respective faculty in the class (or will be given in the TT).

SAMPLE THOUGHT PROVOKING QUESTIONS:

COURSE
SL BLOOM’S
QUESTION MARKS OUTCOME
NO LEVEL
NO.
Suppose that you are employed as a datamining
consultant for an Internets Search engine company,
describe how data mining can help the company by
1. 5 marks CO 1 L1
giving Specific examples of how techniques, such as
clustering, classification, Association rule mining, and
anomaly detection can be applied.
Which of the following quantities are likely to
show more temporal autocorrelation: daily rainfall or
2. 5 Marks CO1 L2
daily temperature? Why?

Consider the following set of frequent3-itemsets:


{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}, {2,
3. 3, 5}, 5 marks CO3 L3
{3, 4, 5}.
Assume that there are only five items in the data set.
List all candidate 4-itemsets obtained by a candidate
generation procedure using the Fk−1 × F1 merging
strategy.
Listallcandidate4-
itemsetsobtainedbythecandidategenerationprocedure
in Apriori.
List all candidate 4-itemsets that survive the candidate
pruning step of the Apriori algorithm.
Discuss which algorithm is an influential algorithm for
4. mining frequent item sets for Boolean association 5 marks CO3 L3
rules? Explain with an example?
Discuss about the outliers? Explain the weakness and
5. 4 marks CO5 L4
strengths in hierarchical clustering methods?
Explain about IF-THEN rules used for classification with
6. an example and also discuss about sequential covering 5 marks CO3 L1
algorithm?
Explain how to mine frequent item sets using vertical 10
7. CO4 L1
data format? marks

Target set for course Outcome attainment:

Target set for


Sl. No. C.O. No. Course Outcomes attainment in
percentage
Explain the basic concepts and issues involved in Data
01 CO1 75%
Mining.
Discuss different preprocessing techniques on Data
02 CO2 70%
Analysis
Discover frequent item sets by using Association rule
03 CO3 60%
algorithms
Apply different Classification and Clustering
04 CO4 60%
techniques used in data mining
Describe the advances in data mining for the selected real
05 CO5 60%
time application.

Signature of the Course Instructor


This course has been duly verified Approved by the D.A.C.

Signature of the Chairperson D.A.C.

Course Completion Remarks &Self-Assessment.

Activity Scheduled Actual


Sl.
as listed in the course Completion Completion Remarks
No.
Schedule Date Date

1. Program integration (L1) 12/09/2022


2. Module: 01

3. Module: 02

4. Mid-Term Examination 03/11/2022 07/11/2022


Review and Discussion
5. of Mid-Term Exam
Papers

6. Module:03

7. Assignment 1

8. Module:04

Assignment 2
9.
(Article Review)

10. Module:05

11. Quiz 1

Any specific suggestion/Observations on content/coverage/pedagogical methods used etc.:

Course Outcome Attainment:

Target set for Actual C.O. Remarks on attainment &


Sl.
C.O. No. Course Outcomes attainment in attainment in Measures to enhance the
No.
Percentage Percentage attainment
Explain the basic concepts
01 CO1 and issues involved in Data 75%
Mining.
Discuss different
preprocessing techniques on
02 CO2 70%
Data
Analysis
Discover frequent item sets
03 CO3 by using Association rule 60%
algorithms
Apply different
Classification and
04 CO4 Clustering 60%
techniques used in data
mining
Describe the advances in data
05 CO5 mining for the selected real 60%
time application.

Name and signature of the faculty member:

D.A.C. observation and approval:

You might also like