SAK 5609

DATA MINING
Assoc. Prof. Dr. Md. Nasir bin Sulaiman
nasir@fsktm.upm.edu.my
03-89466507
012-6323430
Synopsis
▪ Credit: 3 (3+0)
▪ Contact hours: 3 x 1 hour per week
▪ Semester: I
▪ The course emphasizes the concepts of data mining. It covers
the principles of data mining, data mining functions,
data mining processes, and data mining techniques
such as K-nearest neighbour and clustering
algorithms, rule induction, decision tree
algorithms, association rule mining, neural
networks and genetic algorithms, together with data mining
examples. Industrial and scientific applications
are also presented.
Assessment & References
▪ Assessment:
– Exercises (10%)
– Project I (15%) + presentation I (5%), Week 7
– Project II (15%) + presentation II (5%), Week 14
– Mid-exam (20%, 1 hour), Week 6
– Final exam (30%, 1.5 hours), Weeks 15-17

▪ References:
– Jiawei Han & Micheline Kamber (2001), "Data Mining: Concepts
and Techniques", Morgan Kaufmann.
– Michael J. A. Berry & Gordon S. Linoff (2004), "Data Mining
Techniques (2nd edition)", Wiley.
– Other related articles


Course Contents
Chapter 1 Introduction
– Motivation
– Origin of data mining
– What data mining is/isn't
– The KDD process
– Types of data
– Data mining tasks
• Association rule mining, sequential rules, clustering,
classification, anomaly detection
Chapter 2 Data issues
– What is a data set?
– Types of attributes
– Transformation for different types
– Types of data
• Structured data, record data, data matrix, document
data, transaction data, graph data, ordered data
– Data quality
• Noise and outliers, missing values,
inconsistent/duplicate data
Chapter 3 Data preprocessing
– Why Data Preprocessing?
– Why Is Data Preprocessing Important?
– Major Tasks in Data Preprocessing
• Data Cleaning
• Data integration
• Data transformation
• Data reduction
• Data discretization
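
As a rough illustration of two of the tasks listed above (data transformation and data discretization), the sketch below rescales a numeric attribute with min-max normalization and then assigns it to equal-width bins. The function names and the sample ages are hypothetical and chosen only for illustration; they are not taken from the course material.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to new_min to avoid dividing by zero.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals (0-based index)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [20, 30, 40, 60]  # made-up sample attribute
normalized = min_max_normalize(ages)  # values now lie in [0, 1]
bins = equal_width_bins(ages, 2)      # two intervals: [20, 40) and [40, 60]
```

Equal-width binning is only one possible discretization; equal-frequency binning is a common alternative when the values are skewed.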
Chapter 4 Association rule mining
– Introduction
– The Model
– Goal and Key Features
– Mining Algorithms
– Problems with the Association Rule Model
– Issues of association rules
– Other Main Works on Association Rules
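
To make the association rule model a little more concrete, the sketch below computes the two standard measures of a rule A -> C over a set of transactions: support (how often A and C occur together) and confidence (how often C occurs when A does). The helper names and the basket data are hypothetical, and this is only the measure computation, not a full mining algorithm such as Apriori.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    supp_a = support(transactions, antecedent)
    if supp_a == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return support(transactions, set(antecedent) | set(consequent)) / supp_a


# Made-up market-basket data for the rule {bread} -> {milk}.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_support = support(baskets, {"bread", "milk"})          # 2 of 4 baskets
rule_confidence = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
```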
Chapter 5 Classification
– Overview
– An example application
– Definition
– Classification Model
– General Approach
– Classification: a two-step process
– Classification Techniques
– Evaluating classification methods
– Decision tree based classification, rule-based classifiers,
nearest-neighbour classifiers, etc.
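
As one possible illustration of the nearest-neighbour classifiers named above, the sketch below predicts a class label by majority vote among the k closest training points, using Euclidean distance. The function name, the choice of k = 3, and the toy data are assumptions made for this example only.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for query by majority vote among its k nearest points.

    train is a list of (feature_tuple, label) pairs.
    """
    # Sort the training points by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Count the labels of the k closest points and return the most frequent one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up two-dimensional training set with two classes.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 5.5), "B"), ((6.5, 7.0), "B"),
]
near_a = knn_predict(train, (1.2, 1.4))
near_b = knn_predict(train, (6.4, 6.1))
```

Unlike a decision tree, this classifier builds no explicit model: all the work happens at prediction time, which is why it is often called a lazy learner.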
Chapter 6 Clustering
– Introduction
– What is/is not cluster analysis?
– Examples of clustering applications
– Concepts of clustering
– Types of data in clustering analysis
– Types of clustering – hierarchical, partitional
– Major Clustering Techniques
– Types of clusters
– Clustering algorithms
Chapter 7 Anomaly Detection
– Applications
– Causes of anomalies
– Approaches to anomaly detection
• Statistical
• Proximity-based outlier detection
• Density-based outlier detection
• Clustering-based techniques
– Issues in dealing with anomalies
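
A minimal sketch of the statistical approach listed above, assuming the data are roughly normally distributed: a value is flagged as an anomaly when its z-score (distance from the mean in standard deviations) exceeds a threshold. The function name, the threshold of 2.0, and the readings are hypothetical.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        # All values identical: nothing can be an outlier.
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Made-up sensor readings with one obviously unusual value.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
flagged = zscore_outliers(readings, threshold=2.0)
```

Note that an extreme value inflates both the mean and the standard deviation, so on small samples a strict threshold such as 3.0 can fail to flag it. Here the reading 50 has a z-score just under 3.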

Chapter 8 Visualization
– What is visualization?
– Motivation for visualization
– General categories of visualization
– Representation
– Arrangement
– Selection
– Do’s and don’ts
– Visualization techniques
Chapter 9 Text mining, web mining
– Introduction
– Text processing
– Relevance judgement
– Web Search
– Search engines