Professional Documents
Culture Documents
Data Mining and Knowledge Discovery in Databases (KDD) State of The Art
Data Mining and Knowledge Discovery in Databases (KDD) State of The Art
1
Conference overview
User
Shell System-
Knowledge Enginee
Base r
User Interface
DataMining
Inference
Data/Exp
ert
What is KDD?
Why is KDD necessary
The KDD process
KDD operations and methods
Prediction
What? Opaque
Description
Why? Transparent
Verification driven
Validating hypothesis
Querying and reporting (spreadsheets, pivot
tables)
Multidimensional analysis (dimensional
summaries); On Line Analytical Processing
Statistical analysis
Discovery driven
Exploratory data analysis
Predictive modeling
Database segmentation
Link analysis
Deviation detection
DataMining
Transformation
Preprocessing Knowledge
Selection
Patterns
Transformed
Preprocessed Data
Target Data
Original
Data
Data
AI
Machine learning
Statistics
Databases and data warehousing
High performance computing
Visualization
T. Nouri Data Mining
15
Need for data mining tools
Scalability
Efficient and sufficient sampling
In-memory vs. disk-based processing
High performance computing
Automation
Ease of use
Using prior knowledge
T. Nouri Data Mining
23
Types of data mining tasks
Association rules
Sequence mining
Classification(decision tree etc.)
Clustering
Deviation detection
K-nearest neighbors
If A and B then C
If A and not B then C
If A and B and C then D etc.
701 38 NewYork
701 38 London
701 38 Cairo
11 531 NewYork
11 531 Cairo
2 43 Sports High
3 68 Family Low High
4 32 Truck Low
5 20 Family High
High Low
Distance-based
Numeric
Euclidean distance (root of sum of squared
differences along each dimension)
Angle between two vectors
Categorical
Number of common features (categorical)
Partition-based
Enumerate partitions and score each
T. Nouri Data Mining
44
K-means algorithm
Initial seeds
New centers
Final centers
outlier
Neighborhood
5 of class
3 of class