DM Handout
In addition to Part I (General Handout for all courses, appended to the timetable), this portion gives
further specific details regarding the course.
Course Objective:
To gain a comprehensive understanding of various data mining techniques (theoretical and practical
aspects) and the ability to compare their merits and demerits for solving real-world problems.
Course Description:
This course explores the concepts and techniques of data mining, a promising and flourishing frontier in
database systems. The scope of the course covers basic data mining tasks like data pre-processing,
exploratory data analysis, data quality measures, classification, clustering, and anomaly detection
techniques. This course is designed to provide students with a broad understanding of the design and
use of different data mining algorithms. The course also aims to provide a holistic view of data
mining, covering its database, statistical, algorithmic, and application perspectives.
Furthermore, the course aims to give students hands-on experience with data mining algorithms.
Text Book:
T1 Pang-Ning Tan, Michael Steinbach, Vipin Kumar, “Introduction to Data Mining”, Pearson,
2016
Reference Books:
R1 Han J & Kamber M, “Data Mining: Concepts and Techniques”, Morgan Kaufmann
Publishers, 2011
R2 Hand D, Mannila H, & Smyth P, “Principles of Data Mining”, MIT Press, 2004
R3 Pujari A K, “Data Mining Techniques”, University Press (India), 2013
R4 Anand Rajaraman and Jeffrey Ullman, “Mining of Massive Datasets”, 2011
Learning Objectives
LO1 Students will gain an understanding of Data Mining as a whole and its components.
LO2 Students will know data pre-processing techniques, their issues, and possible
conventional solutions: noise reduction, data reduction, handling of missing values, etc.
LO3 Students will have a detailed understanding of clustering and classification methods, their
limitations and applications.
LO4 Students will acquire knowledge about data warehousing, decision making, and association
rule mining algorithms.
LO5 After the course completion, students will be able to design and build real-world
applications using data mining algorithms.
Course Plan:
Lecture No.   Topics
1             Introduction, Motivation, Plan, Evaluation, Policies
2-3           Introduction to Data Mining
              What is Data Mining
              Motivation & challenges
              Data Mining Tasks
4-7           Data
              Types of Data
              Data quality
              Data Preprocessing
              Measures of Similarity & Dissimilarity
              Exploratory Data Analysis (EDA)
8-15          Cluster Analysis: Basic concepts and algorithms
              Overview
              K-Means
              Agglomerative and Divisive hierarchical clustering
              DBSCAN
              BIRCH Algorithm
              CURE Algorithm
              Bradley-Fayyad-Reina (BFR) Algorithm
              Cluster evaluation
16-18         Cluster Analysis: Additional Issues and Algorithms
              Characteristics of Data, Clusters and Clustering Algorithms
              Prototype-based Clustering
              Density-based Clustering
              Graph-based Clustering
19-21         Classification
              Basics
              General approach to solving a classification problem
              Decision Tree
              Model overfitting
              Evaluating the performance of a classifier
              Methods of comparing classifiers
22            Course Pre-Summary for Mid-Semester Exam
23-28         Classification: Alternative Techniques
              Rule-based classifiers
              Nearest-neighbour classifiers
              Bayesian Classifiers
              Support Vector Machines
              Ensemble methods
29-30         Mining Frequent Itemsets
              Applications of Frequent Itemsets
              Association Rule Mining
              Apriori and FP-Growth algorithms
              Confidence, interest, lift measures
              Rule Evaluation
31-35         Finding Similar Items
              Application of Nearest Neighbor Search
              Jaccard and Cosine Similarity
              Shingles
              Min-hashing
36-37         Locality Sensitive Hashing (LSH)
              Distance Measures
              Theory of LSH
              LSH Families for other Distance Measures
              Applications of LSH
38-39         Anomaly Detection
              Preliminaries
              Statistical Approaches
              Proximity-based outlier detection
              Density-based outlier detection
              Clustering-based Techniques
40-43         Adversarial Machine Learning
              Poisoning Attacks
              Evasion Attacks
44            Course Summary, Review for End-Semester Exam
Evaluation:
Component              Nature        Examination Schedule                          Weightage
Quiz – I/II/III        Closed Book   TBA (as per the timetable)                    10%
Mid Semester           Closed Book   07/03/20, Saturday (11:00 AM - 12:30 PM)      30%
Assignment – I/II/III  Open Book     TBA (as per the timetable)                    20%
Comprehensive Test     TBA           04/05/20, Monday (AN)                         40%
Office Hours:
Hemant Rathore: Thursday, 10 AM to 12 PM
Announcements:
All notices concerning this course will be displayed on the course page of the Photon server.
o http://photon.bits-goa.ac.in/lms/
Follow the ID/ARC notices as well.
Make-up Policy:
Quiz / Assignment: No Makeup
Mid-Semester/Comprehensive Makeup:
o Only with prior permission (in writing)
o Given only on justifiable grounds
o Will not be given to attend a marriage, function, etc.
Instructor in-charge
CS F415