
Overview

Data Mining Primitives


Chapter 4

Data mining without user interaction is usually not helpful.
Users direct a data mining task by specifying a few data mining primitives:

specification of data to be mined
  set of data in which the user is interested
kinds of knowledge to be mined
background knowledge useful in guiding the discovery process
specification of how knowledge should be visualized
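
As a rough illustration, the primitives listed above can be gathered into one request object. This is only a sketch: the class name `MiningTask`, its field names, and the sample values are hypothetical and do not come from any particular data mining system.

```python
from dataclasses import dataclass, field


@dataclass
class MiningTask:
    """Hypothetical container for the primitives of one data mining request."""
    relevant_data: str                      # which data to mine
    knowledge_kind: str                     # e.g. "association", "classification"
    background_knowledge: list = field(default_factory=list)  # e.g. concept hierarchies
    presentation: str = "table"             # how the results should be shown


# Example request; all values are invented for illustration.
task = MiningTask(
    relevant_data="items where type = home entertainment",
    knowledge_kind="association",
    background_knowledge=["location hierarchy"],
    presentation="rules",
)
print(task)
```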

Pieces of a Data Mining Task

What data to mine
  list of relevant attributes
Kinds of knowledge to be mined
  characterization
  discrimination
  association
  classification
  clustering
  evolution analysis
Background knowledge
  concept hierarchies
Interestingness measures
  separate uninteresting patterns from knowledge
Presentation and visualization of patterns

Task Relevant Data

Minable view of the data
  name of database or warehouse
  name of tables or cubes
  conditions for selecting useful data (e.g., type = home entertainment, type = fruit)
  attributes or dimensions (e.g., name and price)
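
The task-relevant data described above amounts to a selection (the conditions) followed by a projection (the attributes or dimensions). The sketch below shows this on a small in-memory table. The rows, the helper name `minable_view`, and its parameters are assumptions made for illustration; a real system would express the same thing as a query against the named database or warehouse.

```python
# Hypothetical rows standing in for a warehouse table; the values are invented.
items = [
    {"name": "TV", "price": 400, "type": "home entertainment"},
    {"name": "apple", "price": 1, "type": "fruit"},
    {"name": "VCR", "price": 150, "type": "home entertainment"},
]


def minable_view(rows, condition, attributes):
    """Keep rows that satisfy the condition, projected onto the chosen attributes."""
    return [{a: row[a] for a in attributes} for row in rows if condition(row)]


# Condition: type = home entertainment; attributes: name and price.
view = minable_view(
    items,
    condition=lambda row: row["type"] == "home entertainment",
    attributes=["name", "price"],
)
print(view)
```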

Kind of Knowledge to be Mined

Templates or metapatterns may be used to specify the form of the results:
P(X: customer, W) AND Q(X, Y) => buys(X, Z)
A rule that matches this template:
age(X, 30..39) AND income(X, 40K..49K) => buys(X, VCR) [2.2%, 60%]
[2.2%, 60%] indicates that the rule holds with 60% confidence and that such cases represent 2.2% of all transactions (the support).
A task might instead specify classifying an input file of customers as likely to buy or not likely to buy.
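
The two bracketed numbers attached to the rule can be computed directly from the data. The following is a minimal sketch that assumes the age range is 30 to 39 and the income range is 40K to 49K. The customer records and the function name `support_and_confidence` are invented for illustration, so the resulting figures differ from the 2.2% and 60% quoted above.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent.

    support    = fraction of all transactions matching both sides
    confidence = fraction of antecedent matches that also match the consequent
    """
    matches_antecedent = [t for t in transactions if antecedent(t)]
    matches_both = [t for t in matches_antecedent if consequent(t)]
    support = len(matches_both) / len(transactions)
    confidence = (
        len(matches_both) / len(matches_antecedent) if matches_antecedent else 0.0
    )
    return support, confidence


# Invented customer records, one per transaction.
customers = [
    {"age": 34, "income": 45_000, "buys": {"VCR"}},
    {"age": 31, "income": 42_000, "buys": {"TV"}},
    {"age": 38, "income": 48_000, "buys": {"VCR", "TV"}},
    {"age": 52, "income": 45_000, "buys": {"VCR"}},
    {"age": 25, "income": 30_000, "buys": set()},
]

support, confidence = support_and_confidence(
    customers,
    antecedent=lambda c: 30 <= c["age"] <= 39 and 40_000 <= c["income"] <= 49_999,
    consequent=lambda c: "VCR" in c["buys"],
)
print(f"support={support:.0%}, confidence={confidence:.0%}")
```

With these five records, three customers satisfy the left-hand side and two of them buy a VCR, so the support is 2 of 5 and the confidence is 2 of 3.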

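Background Knowledge: Concept Hierarchies
A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts.
Dimensions that commonly have concept hierarchies:
location
time
product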
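
One way to picture such a sequence of mappings is as a chain of lookup tables, applied one after another to move a value up the hierarchy. The sketch below does this for a location dimension. The levels (city, then province or state, then country), the sample sales figures, and the function name `roll_up` are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical location hierarchy: city -> province/state -> country.
CITY_TO_PROVINCE = {
    "Vancouver": "British Columbia",
    "Toronto": "Ontario",
    "Chicago": "Illinois",
}
PROVINCE_TO_COUNTRY = {
    "British Columbia": "Canada",
    "Ontario": "Canada",
    "Illinois": "USA",
}


def roll_up(city, levels=1):
    """Map a city to a higher-level concept by applying `levels` mappings in order."""
    value = city
    for mapping in (CITY_TO_PROVINCE, PROVINCE_TO_COUNTRY)[:levels]:
        value = mapping[value]
    return value


# Aggregate invented city-level sales at the country level.
sales_by_city = {"Vancouver": 10, "Toronto": 7, "Chicago": 5}
sales_by_country = Counter()
for city, amount in sales_by_city.items():
    sales_by_country[roll_up(city, levels=2)] += amount
print(dict(sales_by_country))
```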

Types of hierarchies

schema hierarchy
set-grouping hierarchy
operation-derived hierarchy
rule-based hierarchy

Concept Hierarchies
Schema
a total or partial order among the attributes of a schema, usually those of a warehouse dimension (time, location, etc.)

Set-grouping
values for a given attribute are grouped into sets of constants or ranges of values

Operation-derived
derived automatically by operations such as clustering or extraction

Rule-based
the hierarchy, or part of it, may be defined by a set of rules
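
Two of these hierarchy types are easy to sketch in code. The first function below is a set-grouping hierarchy that places an age into a ten-year range. The second is a rule-based hierarchy that assigns a product to a profit-margin level. The function names, the ten-year grouping, and the margin thresholds are assumptions for illustration only.

```python
def age_range(age):
    """Set-grouping: map an exact age to its ten-year range, e.g. 34 -> '30..39'."""
    low = (age // 10) * 10
    return f"{low}..{low + 9}"


def profit_margin_level(price, cost):
    """Rule-based: classify a product by rules on price minus cost.

    The thresholds (50 and 250) are illustrative assumptions.
    """
    margin = price - cost
    if margin < 50:
        return "low_profit_margin"
    if margin <= 250:
        return "medium_profit_margin"
    return "high_profit_margin"


print(age_range(34))
print(profit_margin_level(price=400, cost=300))
```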
