You are on page 1of 17

Data Mining

IS314

Dr. Ayman Alhelbawy 20 March 2022


Data Mining Tasks (cont.)
Clustering

1. Identifying a set of similarity groups in the data


Data Mining Tasks (cont.)
Cluster Analysis

• Unsupervised learning (i.e., Class label is unknown)


• Group data to form new categories (i.e., clusters), e.g.,
cluster houses to find distribution patterns
• Principle: Maximizing intra-class similarity & minimizing
interclass similarity
• Many methods and applications

Data Mining Tasks (cont.)


Clustering Application Example
Data Mining Tasks (cont.)
Outlier Analysis

• Outlier analysis
• Outlier: A data object that does not comply with the
general behavior of the data
• Noise or exception? ― One person’s garbage could be
another person’s treasure
• Methods: by product of clustering or regression analysis,

• Useful in fraud detection, rare events analysis

Data Mining Tasks (cont.)


Deviation/Anomaly/Change Detection

Discovering the most significant changes in data


Data Mining Tasks (cont.)
Regression
Data Mining Functions
Association and Correlation Analysis (cont.)
Time and Ordering: Sequential Pattern,
Trend and Evolution Analysis

• Sequence, trend and evolution analysis


• Trend, time-series, and deviation analysis: e.g.,
regression and value prediction
• •Sequential pattern mining
e.g., first buy digital camera, then buy large SD memory cards
• Periodicity analysis
• •Motifs and biological sequence analysis
Approximate and consecutive motifs
• Similarity-based analysis
• Mining data streams
• Ordered, time-varying, potentially infinite, data
streams

Structure and Network Analysis

• Graph mining
• Finding frequent subgraphs (e.g., chemical compounds), trees
(XML), substructures (web fragments)
• Information network analysis
• Social networks: actors (objects, nodes) and relationships (edges)
• e.g., author networks in CS, terrorist networks
• Multiple heterogeneous networks
• A person could be multiple information networks: friends, family,
classmates, …
• Links carry a lot of semantic information: Link mining
• Web mining
• Web is a big information network: from PageRank to Google
• Analysis of Web information networks
• Web community discovery, opinion mining, usage mining, …

Data Mining Applications

Marketing, customer profiling and retention, identifying


potential customers, market segmentation.

Fraud detection : identifying credit card fraud, intrusion


detection

Biological data mining

Text and web mining

Scientific data analysis

Stock market analysis

Any application that involves a large amount of data …


Data Mining Challenges

• Scalability
•Complex and Heterogeneous Data
•Data Quality
•Data Ownership and Distribution
•Privacy Preservation
•Streaming Data

Data Mining Challenges (cont.)


Scalability

Scalability means the ability of the algorithm to


work well on small and large data.

In other words, the running time of a data


mining algorithm must be predictable and
acceptable in large databases.

Data Mining Challenges (cont.)


Complex and Heterogeneous Data

Recent years have seen the emergence of new


and complex source of information like for
example semi-structured data and hyperlinks,
XML documents. Relationship between data
should be considered like graph connectivity,
temporal and sequence data.
Data Mining Challenges (cont.)
Data Quality

It is about the quality of data that will affect the resulting pattern.
Factors affecting quality are: Noise and outliers, duplicate data,
missing values and inconsistent data

Data Ownership and Distribution

Sometimes data needed for an analysis is not stored in one


location or owned by one organization.
Challenges:
(1) reduce amount of communication needed to perform computatio
(2) consolidate data mining results from multiple source
(3) data security issu

Complex and Heterogeneous Data


Privacy Preservation:

Data mining is always accused to threaten the


individual privacy. Data mining is a source of
information leakag

Streaming Data:

Data stream is continuous no stored data like


network traf c. This is a big challenge for data
miners and there are a lot of algorithms introduced
for such kind of data
fi
e

Thank You.
Questions????

You might also like