Meaning of data mining
Extracting information from huge sets of data
Unsuspected relationships
Summarize the data in novel ways that are both understandable
and useful to the data owner
Popularly known as Knowledge Discovery in Databases (KDD)
Extracted knowledge can be used for any of the following applications
Market Analysis
Fraud Detection
Customer Retention
Production Control
Science Exploration
Knowledge discovery steps
Data cleaning
Data integration
Multiple data sources may be combined in a common source
Data selection
Data relevant to the analysis task are retrieved from the database
Data transformation
Data are transformed and consolidated into forms appropriate for
mining
Data mining
Intelligent methods are applied to extract data patterns
Pattern evaluation
Identifying the truly interesting patterns representing knowledge, based
on interestingness measures
Knowledge representation
Visualization and knowledge representation techniques are used to
present mined knowledge to users
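The knowledge discovery steps above can be sketched as a chain of small functions. This is a minimal illustration only, not a prescribed implementation: the record layout, the function names, and the "count of purchases" pattern are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical raw records from two sources (illustrative data only).
store_a = [{"customer": "ann", "item": "milk", "price": "2.5"},
           {"customer": "bob", "item": None, "price": "1.0"}]
store_b = [{"customer": "ann", "item": "bread", "price": "3.0"},
           {"customer": "cat", "item": "milk", "price": "2.5"}]

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, fields):
    """Data selection: keep only the attributes relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Data transformation: convert price strings to floats."""
    return [{**r, "price": float(r["price"])} for r in records]

def mine(records):
    """Data mining: count how often each item was bought."""
    return Counter(r["item"] for r in records)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep patterns that meet a simple threshold."""
    return {item: n for item, n in patterns.items() if n >= min_count}

def present(patterns):
    """Knowledge representation: render patterns as readable lines."""
    return [f"{item}: bought {n} times" for item, n in sorted(patterns.items())]

def run_pipeline():
    data = integrate(clean(store_a), clean(store_b))
    data = transform(select(data, ["item", "price"]))
    return present(evaluate(mine(data), min_count=2))

print(run_pipeline())
```

In practice each step is far more involved; the point here is only the order in which the steps feed one another.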
What kinds of data can be mined?
Data mining can be applied to any kind of data as long as the data are meaningful for a
target application
The most basic forms of data for mining applications are:
Relational database
o Collection of tables, each of which is assigned a unique name
o Each table consists of a set of attributes and usually stores a large set of
tuples
o Most commonly available and richest information repositories for searching
trends
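As a small illustration of named tables, attributes, and tuples, the sketch below builds two tables in an in-memory SQLite database and searches for a simple trend (average purchase per city). The table names, columns, and rows are hypothetical and chosen only for the example.

```python
import sqlite3

def average_purchase_by_city():
    # In-memory database, so nothing is written to disk.
    conn = sqlite3.connect(":memory:")
    # Each table has a unique name and a set of attributes.
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.execute("CREATE TABLE purchase (customer_id INTEGER, amount REAL)")
    # Each inserted row is one tuple of the table.
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                     [(1, "Ann", "Oslo"), (2, "Bob", "Oslo"), (3, "Cat", "Rome")])
    conn.executemany("INSERT INTO purchase VALUES (?, ?)",
                     [(1, 10.0), (2, 30.0), (3, 5.0), (3, 15.0)])
    # Join the tables and summarize purchases per city.
    rows = conn.execute(
        "SELECT c.city, AVG(p.amount) FROM customer c "
        "JOIN purchase p ON p.customer_id = c.id "
        "GROUP BY c.city ORDER BY c.city"
    ).fetchall()
    conn.close()
    return rows

print(average_purchase_by_city())
```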
Data Warehouses
Transactional data
o Set of records representing transactions
o Each with a time stamp, an identifier and a set of items
o Descriptive data for the items may also be associated with the transaction files
o Typical data mining analysis on transactional data is
Market basket analysis or association rules
Multimedia databases
o Include video, images, audio and text media
Spatial databases
o Store geographical information like maps, and global or regional
positioning
Time-series databases
o Contain time related data like stock market data or logged activities
Use the model to predict the class of objects whose class label is unknown
Data mining functionalities
Characterization
o Summarization of general features of objects in a target class
Discrimination
o Comparison of the general features of objects between two classes referred
to as the target class and the contrasting class
Association analysis
o Studies the frequency of items occurring together in transactional databases
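One common way to quantify "items occurring together" is with the support and confidence of a rule such as bread -> milk. The sketch below computes both over a handful of made-up transactions; the data and function names are assumptions for illustration, and real association mining (for example with the Apriori algorithm) searches many candidate itemsets rather than checking one by hand.

```python
# Hypothetical market baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction
    that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

print(support({"bread", "milk"}, transactions))       # 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # about 0.667
```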
Classification
o Organization of data in given classes
o Use a training set where all objects are already associated with known
class labels
o Classification algorithm learns from the training set and builds a model
o Classification model can be represented in various forms
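To make the train-then-predict idea concrete, here is a minimal sketch that uses a nearest-centroid classifier. This particular model is an assumption chosen for brevity, and it is only one of many possible model forms; the training data and labels are invented for the example.

```python
from math import dist

def train(training_set):
    """Learn a model from labeled objects: one centroid (mean point) per class."""
    sums, counts = {}, {}
    for features, label in training_set:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(x / counts[label] for x in acc)
            for label, acc in sums.items()}

def predict(model, features):
    """Assign the class whose centroid is closest to the new object."""
    return min(model, key=lambda label: dist(model[label], features))

# Hypothetical training set: (features, known class label).
training_set = [((1, 1), "low"), ((2, 1), "low"),
                ((8, 9), "high"), ((9, 8), "high")]
model = train(training_set)
print(predict(model, (2, 2)))  # low
print(predict(model, (7, 8)))  # high
```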
Prediction
o The major idea is to use a large number of past values to estimate
probable future values
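As one simple instance of prediction from past values, the sketch below fits a least-squares straight line to a series and extrapolates one step ahead. The linear trend is an assumption made for the example; real prediction methods are chosen to suit the data.

```python
def forecast_next(values):
    """Fit y = intercept + slope * x to past values (x = 0, 1, 2, ...)
    and return the fitted value at the next position."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two past values")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept + slope * n

print(forecast_next([10, 12, 14, 16]))  # 18.0
```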
Clustering
o Organization of data in classes
o In clustering, class labels are unknown and it is up to the clustering
algorithm to discover acceptable classes
o Clustering approaches are based on
Maximizing the similarity between objects in the same class (intra-class
similarity)
Minimizing the similarity between objects of different classes (inter-class
similarity)
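The sketch below uses k-means, one widely used clustering algorithm, to show classes being discovered without labels. The points, the choice of k, and the deterministic starting centroids are assumptions made so that the example is reproducible; practical implementations usually pick starting centroids at random.

```python
from math import dist

def k_means(points, k, iterations=10):
    """Group points into k clusters by repeatedly assigning each point to
    its nearest centroid and recomputing each centroid as the cluster mean."""
    centroids = list(points[:k])  # deterministic start: the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters

# Hypothetical unlabeled points forming two visible groups.
points = [(1, 1), (9, 9), (1, 2), (8, 9), (2, 1), (9, 8)]
for cluster in k_means(points, k=2):
    print(cluster)
```

Points that end up in the same cluster are close to one another, while points in different clusters are far apart, which is the intra-class and inter-class criterion stated above.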
Outlier analysis
o Data elements that cannot be grouped in a given class or cluster
o Data objects that do not comply with the general behavior or model of the
data
o Many data mining methods discard outliers as noise or exceptions
o However, in some applications (e.g., fraud detection) the rare events
can be more interesting than the more regularly occurring ones
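A very simple way to flag values that do not follow the general behavior of the data is a z-score test, sketched below. The threshold of 2.0 and the sample amounts are assumptions for illustration; real fraud detection relies on much richer models.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return values lying more than threshold standard deviations
    from the mean of the data."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts with one unusual value.
amounts = [20, 22, 19, 21, 20, 23, 500]
print(find_outliers(amounts))  # [500]
```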
Technologies used in data mining
Data mining has incorporated many techniques from other domains
Essence of data mining
Moving toward the Information Age
o Vast amounts of data are collected daily, and analyzing such data is an important need
o Explosively growing, widely available, and gigantic body of data makes our
o Computer hardware technology
available for
Transaction management
Information retrieval
Data analysis
WWW
The abundance of data, coupled with the need for powerful data analysis
tools
o Described as a data rich but information poor situation
o Fast-growing, tremendous amount of data, collected and stored in large
and numerous data repositories
Exceeded our human ability for comprehension without powerful
tools
o Widening gap between data and information calls for
Systematic development of data mining tools
That can turn data tombs into “golden nuggets” of knowledge
Relationship between Data mining, Data warehousing and OLAP
o Data warehouses are constructed via a process of data cleaning, data integration,
o The following figure shows typical framework for construction and use of a data
o Data in a data warehouse are organized around major subjects
o Data cube allows the precomputation and fast access of summarized data
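The idea of precomputing summarized data can be sketched by totalling a measure for every combination of dimensions ahead of time, so that later lookups are simple dictionary accesses. The sales records, dimension names, and the dictionary layout of the cube are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact records with two dimensions and one measure.
sales = [
    {"region": "north", "year": 2023, "amount": 100},
    {"region": "north", "year": 2024, "amount": 150},
    {"region": "south", "year": 2023, "amount": 80},
    {"region": "south", "year": 2024, "amount": 120},
]

def build_cube(rows, dimensions, measure):
    """Precompute the summed measure for every subset of dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                totals[tuple(row[d] for d in dims)] += row[measure]
            cube[dims] = dict(totals)
    return cube

cube = build_cube(sales, ["region", "year"], "amount")
print(cube[()][()])                              # grand total: 450
print(cube[("region",)][("north",)])             # 250
print(cube[("region", "year")][("south", 2023)]) # 80
```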
OLAP (Online analytical processing)
o Use background knowledge regarding the domain of the data being studied
Issues in data mining
Many pending issues have to be addressed in data mining
Performance issues
o The issues of scalability and efficiency of the data mining methods when
Data source issues
o Many issues related to data sources exist, such as
Diversity of data types
Data glut problem
Storing different types of data in a variety of repositories
o Different kinds of data and sources may require distinct algorithms and
methodologies
o Proliferation of heterogeneous data sources poses important challenges for
the database community and data mining community
Security and social issues
o Sensitive and private information about individuals or companies is
gathered and stored
o This information is collected for
Customer profiling, user behavior understanding, correlating personal data
with other information
Screen real-estate
Thank You