Chap1 Introduction

Uploaded by

viet.nguyenba4605

Data Mining and Application

Lecture 01. Introduction to Data Mining

Content
 Why data mining?
 What is data mining?
 Knowledge Discovery (KDD) Process
 A Multi-Dimensional View of Data Mining
 Data Mining Tasks
 Major Issues in Data Mining
 A Brief History of Data Mining and Data Mining Society
 Summary

Why Data Mining?
 The Explosive Growth of Data: from terabytes to petabytes
 Data collection and data availability
 Automated data collection tools, database systems, Web, computerized society
 Major sources of abundant data
 Business: Web, e-commerce, transactions, stocks, …
 Science: Remote sensing, bioinformatics, scientific simulation, …
 Society and everyone: news, digital cameras, YouTube
 Computers have become cheaper and more powerful
 Expectations
 Gathered data will have value either for the purpose collected or for a purpose not envisioned.
 Data rich but information poor!
 What do those data mean?
 How to analyze data?
 Traditional techniques are infeasible for raw data
Origins of Data Mining
 We are drowning in data but starving for knowledge!
 “Necessity is the mother of invention” (attributed to Plato)
 Data mining: automated analysis of massive data sets
 We are data rich, but information poor.
Origins of Data Mining
 Draws ideas from machine learning/AI, pattern recognition, statistics, and
database systems

 Traditional techniques may be unsuitable due to data that is

 Large-scale
 High dimensional
 Heterogeneous
 Complex
 Distributed

 A key component of the emerging field of data science and data-driven discovery

Data Mining and Related Fields
[Figure: data mining at the center, surrounded by machine learning, pattern recognition, statistics, applications, visualization, algorithms, database technology, and high-performance computing]
Why Data Mining?
 Great opportunities to improve productivity in all walks of life
 Great Opportunities to Solve Society’s Major Problems

What Is Data Mining?
 Data mining (knowledge discovery from data)
 Extraction of interesting (non-trivial, implicit, previously unknown, and potentially
useful) patterns or knowledge from huge amounts of data
 Alternative names
 Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern
analysis, data archeology, data dredging, information harvesting, business intelligence,
etc.
 Is everything “data mining”?
 When is data mining used?

What Is Data Mining?
 1. Show Mr. Smith's amount of money on January 5.
 2. How many foreign investors bought stock X last month?
 3. Show every stock in the database whose face value has increased.
 4. What characterizes the stocks whose price has risen?
 5. What can be expected of stock X in the coming week?
 6. In the coming month, how many trade union members will be unable to repay their debts?
 7. What characterizes the people who buy product Y?
Potential Applications
 Data analysis and decision support
 Market analysis and management
 Target marketing, customer relationship management (CRM), market basket analysis, cross
selling, market segmentation
 Risk analysis and management
 Forecasting, customer retention, improved underwriting, quality control, competitive
analysis
 Fraud detection and detection of unusual patterns (outliers)
 Other Applications
 Text mining (news group, email, documents) and Web mining
 Stream data mining
 Bioinformatics and bio-data analysis

Knowledge Discovery (KDD) Process
 This is a view from typical database systems and data warehousing communities
 Data mining plays an essential role in the knowledge discovery process
[Figure: KDD process with numbered stages 1 to 5: Data Sources --> Data Cleaning and Data Integration --> Data Warehouse --> Selection/Transformation --> Task-relevant Data --> Data Mining --> Patterns --> Pattern Evaluation/Presentation]
Knowledge Discovery (KDD) Process (cont.)
 Learning the application domain
 relevant prior knowledge and goals of application
 Identifying a target data set: data selection
 Data processing
 Data cleaning (remove noise and inconsistent data)
 Data integration (multiple data sources may be combined)
 Data selection (data relevant to the analysis task are retrieved from the database)
 Data transformation (data transformed or consolidated into forms appropriate for mining)
(Done with data preprocessing)
 Data mining (an essential process where intelligent methods are applied to extract
data patterns)
 Pattern evaluation (identify the truly interesting patterns)
 Knowledge presentation (mined knowledge is presented to the user with
visualization or representation techniques)
 Use of discovered knowledge
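The preprocessing steps listed above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the two in-memory sources (sales_db, crm_db), every function name, and the data are hypothetical, and the final "mining" step is a stand-in (group averages), not a real mining algorithm.

```python
from statistics import mean

# Hypothetical raw records from two sources; None marks a missing value.
sales_db = [
    {"customer": "a", "amount": 120.0},
    {"customer": "b", "amount": None},   # incomplete record
    {"customer": "c", "amount": 80.0},
]
crm_db = [
    {"customer": "a", "region": "north"},
    {"customer": "c", "region": "south"},
]

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(left, right, key):
    """Data integration: join two sources on a shared key."""
    index = {r[key]: r for r in right}
    return [{**rec, **index[rec[key]]} for rec in left if rec[key] in index]

def select(records, fields):
    """Data selection: keep only the task-relevant attributes."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records, field):
    """Data transformation: min-max scale one numeric attribute to [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [{**r, field: (r[field] - lo) / span} for r in records]

def mine(records, group_field, value_field):
    """Stand-in for the mining step: average value per group."""
    groups = {}
    for r in records:
        groups.setdefault(r[group_field], []).append(r[value_field])
    return {g: mean(vs) for g, vs in groups.items()}

prepared = transform(
    select(integrate(clean(sales_db), crm_db, "customer"), ["region", "amount"]),
    "amount",
)
patterns = mine(prepared, "region", "amount")
print(patterns)
```

Each stage takes and returns a plain list of records, so stages can be reordered or replaced without touching the others.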
KDD Process: A View from ML and Statistics
Input Data --> Data Pre-Processing --> Data Mining --> Post-Processing
 Data pre-processing: data integration, normalization, feature selection, dimension reduction
 Data mining: pattern discovery, association & correlation, classification, clustering, outlier analysis, …
 Post-processing: pattern evaluation, pattern selection, pattern interpretation, pattern visualization
 This is a view from typical machine learning and statistics communities
Multi-Dimensional View of Data Mining
 Data to be mined
 Database data (extended-relational, object-oriented, heterogeneous, legacy), data warehouse,
transactional data, stream, spatiotemporal, time-series, sequence, text and web, multi-media, graphs
& social and information networks
 Knowledge to be mined (or: Data mining functions)
 Characterization, discrimination, association, classification, clustering, trend/deviation, outlier
analysis, etc.
 Descriptive vs. predictive data mining
 Multiple/integrated functions and mining at multiple levels
 Techniques utilized
 Data-intensive, data warehouse (OLAP), machine learning, statistics, pattern recognition,
visualization, high-performance, etc.
 Applications adapted
 Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text
mining, Web mining, etc.
Data Mining Tasks
 Prediction Tasks
 Use some variables to predict unknown or future values of other variables
 Description Tasks
 Find human-interpretable patterns that describe the data.

Common data mining tasks

 Classification [Predictive]
 Clustering [Descriptive]
 Association Rule Discovery [Descriptive]
 Sequential Pattern Discovery [Descriptive]
 Regression [Predictive]
 Deviation Detection [Predictive]

Scenario 1
Is the person using card ID = 1234 really the cardholder, or a thief?
Scenario 2
Tid  Refund  Marital Status  Taxable Income  Evade
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
Is Mr. A (Tid = 100) likely to evade taxes?
Scenario 3
Will STB stock go up tomorrow?
Scenario 4
Cohort  StudentID  Course1  Course2  …  Graduated
2004    1          9.0      8.5      …  Yes
2004    2          6.5      8.0      …  Yes
2004    3          4.0      2.5      …  No
2004    8          5.5      3.5      …  No
2004    14         5.0      5.5      …  Yes
…       …          …        …        …  …
2005    90         7.0      6.0      …  Yes (80%)
2006    24         9.5      7.5      …  Yes (90%)
2007    82         5.5      4.5      …  No (45%)
2008    47         2.0      3.0      …  No (97%)
…       …          …        …        …  …
How can we determine the likelihood that a current student will graduate?
Data Mining Tasks
Data
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
11   No      Married         60K             No
12   Yes     Divorced        220K            No
13   No      Single          85K             Yes
14   No      Married         75K             No
15   No      Single          90K             Yes
Main data mining tasks: Classification
 Classification - is the task of generalizing known structure to apply to new
data. For example, an e-mail program might attempt to classify an e-mail as
"legitimate" or as "spam".
 In machine learning and statistics, classification is the problem of identifying
to which of a set of categories a new observation belongs, on the basis of a
training set of data containing observations whose category membership is
known.
 In the terminology of machine learning, classification is considered an
instance of supervised learning, i.e. learning where a training set of correctly
identified observations is available. The corresponding unsupervised
procedure is known as clustering, and involves grouping data into categories
based on some measure of inherent similarity or distance.
Main data mining tasks: Classification
 Given a collection of records (training set)
 Each record contains a set of attributes; one of the attributes is the class.
 Find a model for class attribute as a function of the
values of other attributes.
 Goal: previously unseen records should be assigned a
class as accurately as possible.
 A test set is used to determine the accuracy of the model. Usually, the given data set is
divided into training and test sets, with training set used to build the model and test set
used to validate it.
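A minimal sketch of the training/test split described above. The classifier here is a 1-nearest-neighbor rule chosen only for brevity; the data, the 30% test fraction, and the function names are assumptions for illustration, not part of the lecture.

```python
import random

def nearest_neighbor_predict(train, x):
    """Return the class of the training record closest to x (1-NN)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(train, key=lambda rec: sq_dist(rec[0], x))
    return label

def holdout_accuracy(records, test_fraction=0.3, seed=0):
    """Split records into training and test sets, build on one, score on the other."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return correct / len(test)

# Hypothetical, well-separated two-class data: (attributes, class)
records = [((float(i), float(i)), "low") for i in range(10)] + \
          [((float(i), float(i)), "high") for i in range(100, 110)]

print(holdout_accuracy(records))
```

The test records never take part in building the model, which is what makes the measured accuracy an estimate for previously unseen records.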

Classification Example
Model for predicting credit worthiness
Tid  Employed  Level of Education  # years at present address  Credit Worthy (Class)
1    Yes       Graduate            5                           Yes
2    Yes       High School         2                           No
3    No        Undergrad           1                           No
4    Yes       High School         10                          Yes
…    …         …                   …                           …
Decision tree:
Employed?
  No --> No
  Yes --> Education?
    Graduate --> Number of years: > 3 yr --> Yes; < 3 yr --> No
    {High school, Undergrad} --> Number of years: > 7 yrs --> Yes; < 7 yrs --> No
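The credit-worthiness model on this slide can be written as nested conditions, reading the tree as: Employed = No gives No; otherwise Graduate splits on more than 3 years at the present address, and {High School, Undergrad} splits on more than 7 years. This is a sketch, and one detail is an assumption: the slide does not say what happens when the number of years equals a threshold, so the code treats such values as not exceeding it.

```python
def credit_worthy(employed, education, years_at_address):
    """Decision tree from the slide, written as nested conditions.

    Assumption (not stated on the slide): a value exactly equal to a
    threshold is treated as NOT exceeding it.
    """
    if not employed:
        return "No"
    if education == "Graduate":
        return "Yes" if years_at_address > 3 else "No"
    # education is "High School" or "Undergrad"
    return "Yes" if years_at_address > 7 else "No"

# The four training records shown on the slide.
training_set = [
    (True, "Graduate", 5, "Yes"),
    (True, "High School", 2, "No"),
    (False, "Undergrad", 1, "No"),
    (True, "High School", 10, "Yes"),
]
for employed, education, years, expected in training_set:
    print(employed, education, years, credit_worthy(employed, education, years), expected)
```

The tree reproduces the class of all four listed training records.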
Classification Example
Training Set
Tid  Employed  Level of Education  # years at present address  Credit Worthy
1    Yes       Graduate            5                           Yes
2    Yes       High School         2                           No
3    No        Undergrad           1                           No
4    Yes       High School         10                          Yes
…    …         …                   …                           …
Test Set
Tid  Employed  Level of Education  # years at present address  Credit Worthy
1    Yes       Undergrad           7                           ?
2    No        Graduate            3                           ?
3    Yes       High School         2                           ?
…    …         …                   …                           …
Training Set --> Learn Classifier --> Model; the Model is then applied to the Test Set
Examples of Classification Task
 Classifying credit card transactions as legitimate or
fraudulent
 Classifying land covers (water bodies, urban areas,
forests, etc.) using satellite data
 Categorizing news stories as finance, weather,
entertainment, sports, etc
 Identifying intruders in the cyberspace
 Predicting tumor cells as benign or malignant
 Classifying secondary structures of protein as
alpha-helix, beta-sheet, or random coil

Classification: Application 1
 Direct Marketing
 Goal: Reduce cost of mailing by targeting a set of
consumers likely to buy a new cell-phone product.
 Approach:
 Use the data for a similar product introduced before.
 We know which customers decided to buy and which decided
otherwise. This {buy, don’t buy} decision forms the class attribute.
 Collect various demographic, lifestyle, and company-interaction
related information about all such customers.
 Type of business, where they stay, how much they earn, etc.
 Use this information as input attributes to learn a classifier model.
Classification: Application 2
 Fraud Detection
 Goal: Predict fraudulent cases in credit card transactions.
 Approach:
 Use credit card transactions and the information on its account-holder
as attributes.
 When a customer buys, what he buys, how often he pays on time, etc.
 Label past transactions as fraud or fair transactions. This forms the
class attribute.
 Learn a model for the class of the transactions.
 Use this model to detect fraud by observing credit card transactions
on an account.

Classification: Application 3
 Customer Attrition/Churn:
 Goal: To predict whether a customer is likely to be lost
to a competitor.
 Approach:
 Use detailed record of transactions with each of the past and present
customers, to find attributes.
 How often the customer calls, where he calls, what time-of-the day he calls most,
his financial status, marital status, etc.
 Label the customers as loyal or disloyal.
 Find a model for loyalty.
Main data mining tasks: Classification
Common classification algorithms include:
 Linear classifiers: Fisher's linear discriminant, Logistic regression, Naive
Bayes classifier, Perceptron
 Support vector machines: Least squares support vector machines
 Quadratic classifiers
 Kernel estimation: k-nearest neighbor
 Boosting (meta-algorithm)
 Decision trees: Random forests
 Neural networks
 Learning vector quantization

Main data mining tasks - Deviation/Anomaly/Change Detection
 Detect significant deviations from normal
behavior
 Applications:
 Credit Card Fraud Detection
 Network Intrusion Detection
 Identify anomalous behavior from sensor
networks for monitoring and surveillance.
 Detecting changes in the global forest cover.
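One simple way to flag "significant deviations from normal behavior" is a z-score test. The sketch below assumes roughly bell-shaped data; the threshold of 2.5 standard deviations, the function name, and the spending figures are all hypothetical. Real fraud or intrusion detectors use far richer models.

```python
from statistics import mean, pstdev

def find_deviations(values, threshold=2.5):
    """Return (index, value) pairs lying more than `threshold`
    population standard deviations away from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical daily card spending with one unusual transaction at the end.
spending = [20, 22, 19, 21, 23, 20, 18, 22, 21, 20, 500]
print(find_deviations(spending))
```

A single extreme value inflates both the mean and the standard deviation, so with few observations a robust alternative such as the median may be preferable.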

Main data mining tasks - Association rule
 Association rule learning (Dependency modelling) - Searches for
relationships between variables.
 Association rule learning is a method for discovering interesting
relations between variables in large databases. It is intended to
identify strong rules discovered in databases using some
measures of interestingness.
 For example, a supermarket might gather data on customer
purchasing habits. Using association rule learning, the
supermarket can determine which products are frequently bought
together and use this information for marketing purposes. This is
sometimes referred to as market basket analysis.
Main data mining tasks - Association rule
 In order to select interesting rules from the set of all possible rules,
constraints on various measures of significance and interest are used. The
best-known constraints are minimum thresholds on support and confidence.
 Association rules are usually required to satisfy a user-specified minimum
support and a user-specified minimum confidence at the same time.
Association rule generation is usually split up into two separate steps:
1. A minimum support threshold is applied to find all frequent item-sets in a
database.
2. A minimum confidence constraint is applied to these frequent item-sets in
order to form rules.
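For reference, the two measures used in these steps are usually defined as follows. The notation is an assumption, not taken from the slides: T is the set of transactions, X and Y are disjoint item-sets, and minsup and minconf are the user-specified thresholds.

```latex
\[
\operatorname{supp}(X) = \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert},
\qquad
\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}
\]
\[
\text{keep } X \Rightarrow Y \quad\text{if}\quad
\operatorname{supp}(X \cup Y) \ge \mathit{minsup}
\ \text{ and } \
\operatorname{conf}(X \Rightarrow Y) \ge \mathit{minconf}
\]
```

Support measures how often the whole rule applies; confidence measures how often the consequent holds when the antecedent does.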

Association Rule Discovery: Definition
 Given a set of records, each of which contains some number of
items from a given collection
 Produce dependency rules that predict the occurrence of an item
based on occurrences of other items.
TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk
Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
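The five transactions above are small enough to mine by brute force. The sketch below follows the usual two steps (frequent item-sets by minimum support, then rules by minimum confidence); it enumerates every candidate item-set, which is only workable for tiny data and is not Apriori or any other optimized algorithm. Function names and thresholds are illustrative assumptions.

```python
from itertools import combinations

transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions)

def support(itemset):
    """Fraction of transactions that contain `itemset`."""
    return count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)

def frequent_itemsets(min_support):
    """Step 1: all item-sets meeting the minimum support (brute force)."""
    items = sorted(set().union(*transactions))
    return [
        frozenset(c)
        for size in range(1, len(items) + 1)
        for c in combinations(items, size)
        if support(frozenset(c)) >= min_support
    ]

def rules(min_support, min_confidence):
    """Step 2: split each frequent item-set into antecedent --> consequent."""
    found = []
    for itemset in frequent_itemsets(min_support):
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                rhs = itemset - lhs
                if confidence(lhs, rhs) >= min_confidence:
                    found.append((set(lhs), set(rhs)))
    return found

print(support({"Milk", "Coke"}), confidence({"Milk"}, {"Coke"}))
print(support({"Diaper", "Milk", "Beer"}), confidence({"Diaper", "Milk"}, {"Beer"}))
```

With a minimum support of 0.4 and a minimum confidence of 0.6, both rules listed on the slide are among those returned by rules(0.4, 0.6).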

Main data mining tasks - Association rule
 Many algorithms for generating association rules were presented over time.
 Apriori algorithm
 Eclat algorithm (Equivalence Class Transformation)
 FP-growth algorithm (FP: Frequent Pattern), AprioriDP
 …
 Other types of association mining
 Multi-Relation Association Rules
 Context Based Association Rules
 …

Association Analysis: Applications
 Market-basket analysis
 Rules are used for sales promotion, shelf management, and inventory management

 Telecommunication alarm diagnosis

 Rules are used to find combination of alarms that occur together frequently in the
same time period

 Medical Informatics
 Rules are used to find combination of patient symptoms and test results associated
with certain diseases

Association Rule Discovery: Application
 Supermarket shelf management.
 Goal: To identify items that are bought together by sufficiently many
customers.
 Approach: Process the point-of-sale data collected with barcode scanners
to find dependencies among items.
 A classic rule:
 If a customer buys diapers and milk, then he is very likely to buy beer.
Main data mining tasks - Clustering
 Clustering - is the task of discovering groups and structures in the data that
are in some way or another "similar", without using known structures in the
data.
 Cluster analysis or clustering is the task of grouping a set of objects in such a
way that objects in the same group (called a cluster) are more similar (in
some sense or another) to each other than to those in other groups (clusters).
 Clustering is a main task of exploratory data mining, and a common
technique for statistical data analysis, used in many fields, including
machine learning, pattern recognition, image analysis, information retrieval,
and bioinformatics.

Main data mining tasks - Clustering
 Finding groups of objects such that the objects in a group will be similar (or
related) to one another and different from (or unrelated to) the objects in other
groups
[Figure: Euclidean distance-based clustering in 3-D space. Intra-cluster distances are minimized; inter-cluster distances are maximized.]
Main data mining tasks – Clustering (cont.)
 Clustering algorithms can be categorized based on their cluster model
 Connectivity based clustering (hierarchical clustering)
 Centroid-based clustering
 Distribution-based clustering
 Density-based clustering
 In recent years considerable effort has been put into improving the
performance of existing algorithms.
 What research is being done on clustering algorithms?
Main data mining tasks – Clustering
 Given a set of data points, each having a set of
attributes, and a similarity measure among them, find
clusters such that
 Data points in one cluster are more similar to one another.
 Data points in separate clusters are less similar to one another.
 Similarity Measures:
 Euclidean Distance if attributes are continuous.
 Other Problem-specific Measures.
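As an illustration of clustering with Euclidean distance, here is a minimal k-means sketch (a centroid-based method). The points, the explicitly chosen initial centroids, and the fixed iteration count are assumptions made to keep the example deterministic; practical implementations pick starting centroids more carefully and stop on convergence.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points with continuous attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, centroids, iterations=10):
    """Repeat: assign each point to its nearest centroid, then move each
    centroid to the mean of the points assigned to it."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

Points within one returned cluster are closer to their own centroid than to the other, which is the "more similar to one another" condition stated above.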

Applications of Cluster Analysis
Understanding
 Customer profiling for targeted marketing
 Group related documents for browsing
 Group genes and proteins that have similar functionality
 Group stocks with similar price fluctuations
Summarization
 Reduce the size of large data sets
Courtesy: Michael Eisen
[Figure: Clusters for Raw SST and Raw NPP. Axes: longitude (-180 to 180) and latitude (-90 to 90). Legend: Land Cluster 1, Land Cluster 2, Sea Cluster 1, Sea Cluster 2, Ice or No NPP.]
Use of K-means to partition Sea Surface Temperature (SST) and Net Primary Production (NPP) into clusters that reflect the Northern and Southern Hemispheres.
Clustering: Application 1
 Market Segmentation:
 Goal: subdivide a market into distinct subsets of customers
where any subset may conceivably be selected as a market
target to be reached with a distinct marketing mix.
 Approach:
 Collect different attributes of customers based on their geographical and lifestyle
related information.
 Find clusters of similar customers.
 Measure the clustering quality by observing buying patterns of customers in same
cluster vs. those from different clusters.

Clustering: Application 2
 Document Clustering:
 Goal: To find groups of documents that are similar to each
other based on the important terms appearing in them.
 Approach: To identify frequently occurring terms in each
document. Form a similarity measure based on the frequencies
of different terms. Use it to cluster.
 Gain: Information Retrieval can utilize the clusters to relate a
new document or search term to clustered documents.
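A sketch of a term-frequency similarity measure for documents. Cosine similarity is one common choice and is an assumption here, since the slide does not name a specific measure; the whitespace tokenizer, the function names, and the three sample documents are also illustrative.

```python
import math
from collections import Counter

def term_frequencies(text):
    """Count how often each lowercase, whitespace-separated term occurs."""
    return Counter(text.lower().split())

def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical documents: two about markets, one about sports.
docs = {
    "d1": "stock market prices rise as market rallies",
    "d2": "market prices fall and stock traders worry",
    "d3": "team wins the football match",
}
tfs = {name: term_frequencies(text) for name, text in docs.items()}
print(cosine_similarity(tfs["d1"], tfs["d2"]))
print(cosine_similarity(tfs["d1"], tfs["d3"]))
```

Any clustering method that accepts a pairwise similarity can then group d1 with d2 and keep d3 apart; a new document or search term is compared to the clusters in the same way.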

Main data mining tasks - Regression
Predict a value of a given continuous valued variable based on the
values of other variables, assuming a linear or nonlinear model of
dependency.
Extensively studied in statistics, neural network fields.
Examples:
Predicting sales amounts of new product based on advetising
expenditure.
Predicting wind velocities as a function of temperature,
humidity, air pressure, etc.
Time series prediction of stock market indices.
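A minimal sketch of the linear case with a single predictor, fitted by ordinary least squares. The advertising and sales figures are made up for illustration (they lie exactly on a line), and the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the model y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: advertising expenditure and resulting sales.
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]   # constructed as 1 + 2 * advertising

intercept, slope = fit_line(advertising, sales)
predicted = intercept + slope * 6.0   # predicted sales at expenditure 6.0
print(intercept, slope, predicted)
```

Unlike classification, the predicted quantity is a continuous value rather than a class label.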
Major Issues in Data Mining
 Mining methodology and User interaction
 Mining different kinds of knowledge
 DM should cover a wide spectrum of data analysis and knowledge discovery tasks
 Enables the database to be used in different ways
 Require the development of numerous data mining techniques
 Interactive mining of knowledge at multiple levels of abstraction
 Difficult to know exactly what will be discovered
 Allow users to focus the search, refine data mining requests
 Incorporation of background knowledge
 Guide the discovery process
 Allow discovered patterns to be expressed in concise terms and different levels of abstraction
 Data mining query languages and ad hoc data mining
 High-level query languages need to be developed
 Should be integrated with a DB/DW query language
Major Issues in Data Mining
 Presentation and visualization of results
 Knowledge should be easily understood and directly usable
 High level languages, visual representations or other expressive forms
 Require the DM system to adopt the above techniques

 Handling noisy or incomplete data

 Require data cleaning methods and data analysis methods that can handle noise

 Pattern evaluation – the interestingness problem

 How to develop techniques to assess the interestingness of discovered patterns,
especially with subjective measures based on user beliefs or expectations
Major Issues in Data Mining
 Performance Issues
 Efficiency and scalability
 Huge amount of data
 Running time must be predictable and acceptable
 Parallel, distributed and incremental mining algorithms
 Divide the data into partitions and process them in parallel
 Incorporate database updates without having to mine the entire data again from
scratch

 Diversity of Database Types

 Other databases that contain complex data objects, multimedia data,
spatial data, etc.
 Expect to have different DM systems for different kinds of data
 Heterogeneous databases and global information systems
 Web mining becomes a very challenging and fast-evolving field in data mining

A Brief History of Data Mining Society
 1989 IJCAI Workshop on Knowledge Discovery in Databases
 Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
 1991-1994 Workshops on Knowledge Discovery in Databases
 Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth,
and R. Uthurusamy, 1996)
 1995-1998 International Conferences on Knowledge Discovery in Databases and Data Mining
(KDD’95-98)
 Journal of Data Mining and Knowledge Discovery (1997)
 ACM SIGKDD conferences since 1998 and SIGKDD Explorations
 More conferences on data mining
 PAKDD (1997), PKDD (1997), SIAM-Data Mining (2001), (IEEE) ICDM (2001), WSDM
(2008), etc.
 ACM Transactions on KDD (2007)
Conferences and Journals on Data Mining
 KDD Conferences
 ACM SIGKDD Int. Conf. on Knowledge Discovery in Databases and
Data Mining (KDD)
 SIAM Data Mining Conf. (SDM)
 (IEEE) Int. Conf. on Data Mining (ICDM)
 Conf. on Principles and practices of Knowledge Discovery and Data
Mining (PKDD)
 Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)
 Other related conferences
 ACM SIGMOD, VLDB, (IEEE) ICDE
 WWW, SIGIR, ICML, CVPR, NIPS
 Journals
 Data Mining and Knowledge Discovery (DAMI or DMKD)
 IEEE Trans. On Knowledge and Data Eng. (TKDE)
 KDD Explorations
 ACM Trans. on KDD
Summary
 Data mining: discovering interesting patterns and knowledge from
massive amounts of data
 A KDD process includes data cleaning, data integration, data selection,
transformation, data mining, pattern evaluation, and knowledge
presentation
 Mining can be performed on a variety of data
 Data mining functionalities: characterization, discrimination, association,
classification, clustering, trend and outlier analysis, etc.
 Data mining technologies and applications
 Major issues in data mining
References
1. Tan, Steinbach, Karpatne, Kumar. Introduction to Data Mining, 2nd Edition, 2018.
2. Jiawei Han, Micheline Kamber. Data Mining: Concepts and Techniques, Third Edition, Morgan Kaufmann Publishers, 2012.
3. Fayyad et al. Advances in Knowledge Discovery and Data Mining, 1996.
Exercises
1. What is data mining? Give an illustrative example.
2. What kinds of data and information can be used in the KDD process?
3. Give a real-world example in which applying data mining led to business
success (other than the examples already given in the lecture).
 Hint: the problem of increasing revenue in the retail market; the problem of
planning advertising and promotions.
 What kind of data was collected? Which data mining task was used?
Could it be replaced by data querying or simple statistical
analysis?
Note: Find a real-world application example and include the address of a document or
website that describes the application.
