
TOPIC 1b

HISTORY, EVOLUTION AND CLASSIFICATION OF DATA MINING
OBJECTIVES
• To introduce Data Mining (DM) and its relationship with data and knowledge
• To discuss the history, evolution and motivation of DM
• To discuss DM techniques, tasks, applications and some major issues
HISTORY OF DATA MINING

• The term “data mining” appeared around 1990 in the database community.
• Gregory Piatetsky-Shapiro coined the term “Knowledge Discovery in Databases” for the first workshop on the topic (KDD-1989), and this term became more popular in the AI and machine learning communities.
• Currently, “Data Mining” and “KDD” are used interchangeably.
• The terms “Predictive Analytics” (since about 2007) and “Data Science” (since 2011) have also been used to describe this field.
(Source: Coenen, 2011)
ORIGIN OF DATA MINING
• Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems
• Traditional techniques may be unsuitable due to data that is
o Large-scale
o High dimensional
o Heterogeneous
o Complex
o Distributed
• A key component of the emerging field of data science and data-driven discovery

(Figure: Data Mining at the intersection of Statistics; AI, Machine Learning and Pattern Recognition; and Database systems)
THE EVOLUTION OF DATA MINING
Data Collection (1960s)
• Enabling technologies: Computers, tapes, disks
• Business question: “What was my total revenue in the last five years?”
• Characteristics: Retrospective, static data delivery

Data Access (1980s)
• Enabling technologies: RDBMS, SQL, ODBC
• Business question: “What were unit sales in New England last March?”
• Characteristics: Retrospective, dynamic data delivery at record level

Data Warehousing (1990s)
• Enabling technologies: OLAP, multidimensional databases, data warehouses
• Business question: “What were unit sales in New England last March? Drill down to Boston.”
• Characteristics: Retrospective, dynamic data delivery at multiple levels

Data Mining (Emerging Today)
• Enabling technologies: Advanced algorithms, multiprocessor computers, massive databases
• Business question: “What’s likely to happen to Boston unit sales next month? Why?”
• Characteristics: Prospective, proactive information delivery

Source: www.thearling.com
MOTIVATION OF DATA MINING
Growth of data in both commercial and scientific databases due to advances in data generation and collection technologies

• Commercial Viewpoint
o Lots of data is being collected and warehoused, e.g. by e-commerce platforms (Amazon, Shopee, Lazada)
o Computers have become cheaper and more powerful

• Scientific Viewpoint
o Data is collected and stored at enormous speeds
o Data mining helps scientists in automated analysis of massive datasets

(Figure source: https://www.ncdc.noaa.gov/sotc/global/202003)
KNOWLEDGE DISCOVERY (KDD) PROCESS
• This is a view from typical database systems and data warehousing communities
• Data mining plays an essential role in the knowledge discovery process

(Figure: stages of the KDD process, in order)
Databases → Data Cleaning → Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation
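The stages named in the KDD process above (cleaning, integration, selection, mining, pattern evaluation) can be sketched as a chain of small functions. This is only a toy illustration, not a method from the cited textbooks: the records, the function names and the `min_total` threshold are all hypothetical, and the "mining" step is a simple per-item total rather than a real mining algorithm.

```python
from collections import Counter

# Hypothetical raw records from two source "databases" (illustrative only).
store_a = [
    {"customer": "c1", "item": "bread", "qty": 2},
    {"customer": "c2", "item": "milk", "qty": None},  # missing value
    {"customer": "c1", "item": "milk", "qty": 1},
]
store_b = [
    {"customer": "c3", "item": "bread", "qty": 1},
    {"customer": "c3", "item": "milk", "qty": 3},
]

def clean(records):
    """Data cleaning: drop records whose quantity is missing."""
    return [r for r in records if r["qty"] is not None]

def integrate(*sources):
    """Data integration: merge cleaned sources into one 'warehouse' list."""
    return [r for source in sources for r in source]

def select(warehouse, items):
    """Selection: keep only the task-relevant records."""
    return [r for r in warehouse if r["item"] in items]

def mine(records):
    """Data mining (toy stand-in): total quantity sold per item."""
    totals = Counter()
    for r in records:
        totals[r["item"]] += r["qty"]
    return totals

def evaluate(patterns, min_total):
    """Pattern evaluation: keep only patterns that meet a threshold."""
    return {item: total for item, total in patterns.items() if total >= min_total}

warehouse = integrate(clean(store_a), clean(store_b))
relevant = select(warehouse, {"bread", "milk"})
patterns = mine(relevant)
knowledge = evaluate(patterns, min_total=4)
print(knowledge)
```

Each function consumes the output of the previous one, which mirrors how data mining sits as one step inside the larger process rather than being the whole of it.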
DATA MINING: ONE STEP OF KDD

(Figure labels, outermost to innermost: Knowledge Discovery in Databases; Data mining; Task and Techniques)
CLASSIFICATION OF DATA MINING SYSTEMS

Kinds of Database
• Relational
• Data warehouse
• Transactional DB
• Advanced DB system
• Flat files
• WWW

Kinds of Knowledge
• Categorizing data (Classification)
• Find relationship (Association)
• Subdivide similar data (Clustering)
• Make prediction
• …

Techniques used
• Machine learning
• Pattern recognition
• Neural Network
• Naïve-Bayes
• K-nearest neighbour
• Rough Set
• Statistics

Application adapted
• Finance
• Marketing
• Medical
• Stock
• Telecommunication
WHY DM? POTENTIAL APPLICATIONS
Data analysis and decision support
1. Market analysis and management
Target marketing, customer relationship management (CRM),
market basket analysis, cross selling, market segmentation
2. Risk analysis and management
Forecasting, customer retention, improved underwriting, quality
control, competitive analysis
3. Fraud detection and detection of unusual patterns (outliers)

Other Applications
4. Text mining (newsgroups, email, documents) and Web mining
5. Stream data mining
6. DNA and bio-data analysis
MARKET ANALYSIS & MANAGEMENT

CUSTOMER PROFILING
Clustering or classifying the customers based on the products they purchase

CUSTOMER REQUIREMENT ANALYSIS
1. Identifying the best products for different customers
2. Predicting what factors will attract new customers

PROVISION OF SUMMARY INFORMATION
1. Multidimensional summary reports
2. Statistical summary information (data central tendency and variation)
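As a small illustration of "statistical summary information (data central tendency and variation)", the sketch below computes a few common summary measures with Python's standard `statistics` module. The `spend` values are hypothetical monthly spending figures made up for this example, and the choice of measures (mean, median, mode, range, sample standard deviation) is an assumption about what such a summary might contain.

```python
import statistics

# Hypothetical monthly spend per customer (illustrative values only).
spend = [120.0, 80.0, 100.0, 100.0, 150.0]

summary = {
    # Central tendency
    "mean": statistics.mean(spend),
    "median": statistics.median(spend),
    "mode": statistics.mode(spend),
    # Variation
    "range": max(spend) - min(spend),
    "stdev": statistics.stdev(spend),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The first three measures describe where the typical customer sits, while the range and standard deviation describe how widely customers differ from that typical value.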
REFERENCES

1. Tan, Steinbach, Karpatne, Kumar, Lecture Notes, Chapter 1, Introduction to Data Mining, 2nd Edition, 2018.
2. Pang-Ning Tan, Michael Steinbach & Vipin Kumar, Introduction to Data Mining, Addison Wesley, 2019.
3. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 3rd Edition, Morgan Kaufmann, 2012.
4. Frans Coenen, Data mining: past, present and future, Knowledge Engineering Review, 26(1), 25-29, 2011.
5. Gregory Piatetsky-Shapiro, Data Science: Past, Present, and Future, KDnuggets, 2016.
THANK YOU
Shuzlina Abdul Rahman | Sofianita Mutalib | Siti Nur Kamaliah Kamarudin | Farah Syazwani Mohd Rashid
