
Data Mining Functionalities:

- Characterization and Discrimination


- Associations

- Classification and Prediction

- Cluster Analysis

- Outlier Analysis

- Evolution Analysis

Dr Senthilkumar N C, Asso Prof, SITE


➢ Classification
➢ Regression
➢ Cluster Analysis
➢ Association Analysis

Dr. RANICHANDRA - VIT - VELLORE


Classification

[Figure: a weather report classified as Sunny, Windy, Rainy, or Cloudy]



➢ Predict Category (Categorical Variable)

Classification Examples
➢ Classify tumour as benign or malignant
➢ Predict if it will rain tomorrow
➢ Determine if loan application is high-, medium-, or low-risk
➢ Identify sentiment as positive, negative, or neutral



Regression

[Figure: example houses valued at 20L, 30L, 40L, and 1C, and a new house plan (data) whose value is to be predicted]



➢ Predict Numeric Value (Numeric Variable)

Regression Examples
➢ Estimate demand for a product based on time of year
➢ Predict score on a test
➢ Determine likelihood of drug effectiveness for patient
➢ Predict amount of rain



Cluster Analysis

[Figure: individuals grouped as Teenager, Adult, and Senior]



➢ Organize similar items into groups

Clustering Examples
➢ Identify areas of similar topography (desert, grass, etc.)
➢ Categorize different types of tissues from medical images
➢ Determine different groups of weather patterns
➢ Discover crime hot spots



Association Analysis

[Figure: customer purchases on Tatacliq]



➢ Find rules to capture associations between items.

Association Examples
➢ Recommend items based on purchase/browsing history
➢ Have sales on related items often purchased together
➢ Identify web pages accessed together



■ Data Mining Functionalities:
■ Characterization and Discrimination
- Data characterization is a summarization of the general characteristics or features of a target class of data.

E.g., to study the characteristics of software products whose sales increased by 10% in the last year.

- Data discrimination is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes.

E.g., the user may like to compare the general features of software products whose sales increased by 10% in the last year with those whose sales decreased by at least 30% during the same period.
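
The two definitions above can be illustrated with a minimal sketch. The product records, the field names (`price`, `sales_change`), and the `summarize` helper are hypothetical and chosen only for illustration; the thresholds mirror the 10% increase and 30% decrease from the example.

```python
from statistics import mean

# Hypothetical product records (not from the slides).
products = [
    {"name": "A", "price": 50, "sales_change": 0.12},
    {"name": "B", "price": 60, "sales_change": 0.15},
    {"name": "C", "price": 200, "sales_change": -0.35},
    {"name": "D", "price": 180, "sales_change": -0.40},
    {"name": "E", "price": 100, "sales_change": 0.02},
]

def summarize(rows):
    """Summarize one class of data by its size and mean price."""
    return {"count": len(rows), "mean_price": mean(r["price"] for r in rows)}

# Target class: sales increased by at least 10%.
target = [p for p in products if p["sales_change"] >= 0.10]
# Contrasting class: sales decreased by at least 30%.
contrast = [p for p in products if p["sales_change"] <= -0.30]

# Characterization: summarize the target class on its own.
characterization = summarize(target)
# Discrimination: place the target summary next to the contrasting one.
discrimination = {"target": characterization, "contrast": summarize(contrast)}

print(characterization)
print(discrimination)
```

Characterization looks at one class in isolation, while discrimination only becomes meaningful once the same summary is computed for a contrasting class.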
Mining Frequent Patterns, Associations, and Correlations
- Frequent itemset

- (Frequent) sequential pattern

- A substructure can refer to different structural forms, such as graphs, trees, or lattices, which may be combined with itemsets or subsequences.

- If a substructure occurs frequently, it is called a (frequent) structured pattern.



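- Association analysis
buys(X, “computer”) => buys(X, “software”) [support = 1%, confidence = 50%]

“computer => software [1%, 50%]”

Support is the fraction of all transactions that contain both items; confidence is the fraction of transactions containing the left-hand side that also contain the right-hand side.

age(X, “20…29”) ^ income(X, “20K…29K”) => buys(X, “CD player”) [support = 2%, confidence = 60%]

Association rules are discarded as uninteresting if they do not satisfy both a minimum support threshold and a minimum confidence threshold.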
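
As a minimal sketch of how the support and confidence of a rule can be computed and checked against thresholds: the toy transactions, the function name `rule_stats`, and the threshold values are assumptions made for illustration, not taken from the slides.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(transactions)
    # Transactions containing the left-hand side of the rule.
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    # Transactions containing both sides of the rule.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n_total
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Hypothetical toy transactions.
transactions = [
    {"computer", "software"},
    {"computer"},
    {"computer", "software", "printer"},
    {"printer"},
]

support, confidence = rule_stats(transactions, {"computer"}, {"software"})

# Illustrative thresholds: a rule is kept only if it satisfies both.
min_support, min_confidence = 0.25, 0.5
keep_rule = support >= min_support and confidence >= min_confidence
print(support, confidence, keep_rule)
```

Here the rule appears in 2 of 4 transactions (support 0.5), and 2 of the 3 computer buyers also bought software (confidence about 0.67), so the rule passes both thresholds.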



Classification and Prediction

Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown.

The derived model may be represented in various forms, such as classification (IF-THEN) rules, decision trees, mathematical formulae, or neural networks.
■ A decision tree is a flow-chart-like tree structure, where each node denotes a test on an attribute value, each branch represents an outcome of the test, and tree leaves represent classes or class distributions.
■ Decision trees can easily be converted to classification rules.
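
The conversion from a decision tree to classification rules can be sketched as follows. The nested-dictionary tree layout, the attribute names (`age`, `student`), the class labels, and the function `tree_to_rules` are hypothetical choices for illustration: each root-to-leaf path becomes one IF-THEN rule.

```python
def tree_to_rules(node, conditions=()):
    """Walk a decision tree and return one IF-THEN rule per leaf."""
    if not isinstance(node, dict):
        # Leaf: the node is a class label.
        body = " AND ".join(conditions) or "TRUE"
        return [f"IF {body} THEN class = {node}"]
    rules = []
    # Internal node: a test on an attribute, one branch per outcome.
    for outcome, child in node["branches"].items():
        test = f"{node['attribute']} = {outcome}"
        rules.extend(tree_to_rules(child, conditions + (test,)))
    return rules

# Hypothetical tree: internal nodes test an attribute, leaves are classes.
tree = {
    "attribute": "age",
    "branches": {
        "youth": {
            "attribute": "student",
            "branches": {"yes": "buys", "no": "does_not_buy"},
        },
        "middle_aged": "buys",
        "senior": "does_not_buy",
    },
}

rules = tree_to_rules(tree)
for rule in rules:
    print(rule)
```

The tree has four leaves, so the sketch yields four rules, one per path from the root to a leaf.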
■ A neural network, when used for classification, is
typically a collection of neuron-like processing units
with weighted connections between the units.



Regression
Whereas classification predicts categorical (discrete, unordered) labels, regression models continuous-valued functions.
- Used to predict missing or unavailable numerical data values

- Regression analysis is a statistical methodology commonly used for numeric prediction.

Linear Regression:

$y = w_0 + w_1 x$

Multiple Linear Regression (available in packages such as SAS, SPSS, and S-Plus):

$y = w_0 + w_1 x_1 + w_2 x_2$

Nonlinear Regression (e.g., polynomial):

$y = w_0 + w_1 x + w_2 x^2 + w_3 x^3$
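
As a minimal sketch of the simple linear case, the coefficients w0 and w1 can be estimated by ordinary least squares. The slides do not name an estimation method, so least squares is an assumption here, as are the toy data and the function name `fit_line`.

```python
def fit_line(xs, ys):
    """Least-squares estimates (w0, w1) for the model y = w0 + w1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    w0 = mean_y - w1 * mean_x
    return w0, w1

# Hypothetical data generated from y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w0, w1 = fit_line(xs, ys)
# Predict a numeric value for an unseen x.
prediction = w0 + w1 * 5.0
print(w0, w1, prediction)
```

Because the toy data lie exactly on a line, the fit recovers w0 = 1 and w1 = 2, and the prediction for x = 5 is 11.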



Cluster Analysis:
Unlike classification and prediction, which analyze class-labeled data objects, clustering analyzes data objects without consulting a known class label.

The objects are clustered or grouped based on

- maximizing the intraclass similarity: objects within a cluster have high similarity in comparison to one another

- minimizing the interclass similarity: objects are very dissimilar to objects in other clusters.
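
One way to make the grouping principle concrete is a tiny one-dimensional k-means sketch. k-means is one common clustering method and is not named in the slides; the ages, the initial centers, and the function name `kmeans_1d` are assumptions for illustration.

```python
def kmeans_1d(values, centers, n_iter=10):
    """Tiny 1-D k-means: returns (centers, clusters) after n_iter rounds."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(n_iter):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical ages; no class labels are given to the algorithm.
ages = [13, 15, 17, 34, 38, 41, 67, 70, 74]
centers, clusters = kmeans_1d(ages, centers=[10, 40, 80])
print(centers)
print(clusters)
```

With these well-separated ages the values fall into three groups that could afterwards be described as teenager, adult, and senior; the labels are an interpretation added after clustering, not an input to it.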

Outlier Analysis:

A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are outliers.

Most data mining methods discard outliers as noise or exceptions. In some applications, however, the rare events are more interesting than the regular ones:

- fraud detection

The analysis of outlier data is referred to as outlier mining.

Outliers may be detected by

- statistical tests (which assume a distribution or probability model for the data)

- distance measures (objects that are a substantial distance from any cluster)

- deviation-based methods (examining differences in the main characteristics of objects in a group)
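
As a rough sketch of the statistical approach, a z-score test flags values that lie far from the mean in units of standard deviation. The z-score rule, the threshold of 2.0, the transaction amounts, and the function name `zscore_outliers` are assumptions for illustration; the slides do not prescribe a specific test.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical transaction amounts with one unusually large charge.
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 500]
outliers = zscore_outliers(amounts)
print(outliers)
```

In a fraud-detection setting the flagged value would be inspected rather than discarded. Note that a single extreme value also inflates the mean and standard deviation, which is one reason more robust tests are often preferred in practice.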



Evolution Analysis:

Data evolution analysis describes and models regularities or trends for objects whose behavior changes over time.
It may include
- Characterization and discrimination
- Association and correlation analysis
- Classification and prediction
- Clustering of time-related data

Such analysis includes
- Time-series data analysis
- Sequence or periodicity pattern matching
- Similarity-based data analysis
