Professional Documents
Culture Documents
Some Definitions
• Data : Data are any facts, numbers, or text that can be
processed by a computer.
– operational or transactional data such as, sales, cost, inventory,
payroll, and accounting
– nonoperational data, such as industry sales, forecast data, and
macro economic data
– meta data - data about the data itself, such as logical database
design or data dictionary definitions
Monitor
Metadata & OLAP Server
Other
sources Integrator
Analysis
Operational Extract Query
DBs Transform Data Serve Reports
Load
Refresh
Warehouse Data mining
Data Marts
Office Day
Month
5/14/22 Data Mining: Concepts and Techniques 6
A Sample Data Cube
TV
od
PC U.S.A
Pr
VCR
Country
sum
Canada
Mexico
sum
• Other names
– Knowledge discovery (mining) in databases (KDD),
knowledge extraction, data/pattern analysis, data
archeology, data dredging, information harvesting, business
intelligence, etc.
The KDD process
The non-trivial process of identifying valid, novel,
potentially useful, and ultimately understandable
patterns in data
Multiple process
non-trivial process
Justified patterns/models
valid
novel Previously unknown
Pattern Evaluation
Data Mining
Task-relevant Data
Data Cleaning
Data Integration
Databases
Architecture: Typical Data Mining System
Pattern Evaluation
Knowled
Data Mining Engine ge-Base
• Cluster Analysis
• Outlier Analysis
Which Technologies are used?
• Statistics
– Studies the collection , analysis, interpretation or
explanation and presentation of data
– Statistical model are widely used to model data
• Machine Learning
– Investigate how computers can learn based on data
– Computer is program to automatically learn to
recognize complex pattern and make intelligent
decision base on data
• Supervised learning
• Unsupervised learning
• Semi-supervised learning
Which Technologies are used?
• Business Intelligence
– Provide historical, current, and predictive views of
business operations
• Web search engines
– It is a specialized computer server that searches
for information on the web
Major issues in data Mining
It is partition into five groups
1. Mining methodology
2. User interaction
3. Efficiency and scalability
4. Diversity of data types
5. Data mining and society
Major issues in data Mining