
B. P. Poddar Institute of Management & Technology


Department of Computer Science and Engineering

PPT Presentation
DATA WAREHOUSING AND DATA MINING (PEC-IT602B)
Even Semester 2024

Outlier analysis
PRESENTED BY-
Introduction to Outlier Analysis
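Outlier analysis involves the identification and examination of data points
that differ significantly from the majority of the dataset. Understanding
and addressing outliers is crucial in data analysis and decision-making: a few
extreme values can distort summary statistics and models, and they can also
signal errors, fraud, or other events worth investigating.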
Types of Outliers

Global Outliers
Global outliers, also called point outliers, are the simplest form of outlier.
A data point that deviates from all the rest of the data points in a given data
set is known as a global outlier.

Collective Outliers
When a group of data points in a given data set deviates from the rest of the
data set, the group is called a collective outlier, even if the individual
points are not outliers on their own.

Contextual Outliers
A contextual outlier deviates significantly from the rest of the data only
within a specific context, such as a time period or a location. Contextual
outlier analysis enables users to examine outliers in different contexts and
conditions, which can be useful in various applications.
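To make the difference between global and contextual outliers concrete, here is a minimal sketch. It assumes each reading carries a context label (here a season) and applies a robust rule, the modified z-score based on the median absolute deviation with the commonly used cutoff of 3.5, once over all values and once within each context group. The data, the function names and the threshold are illustrative assumptions, not a method prescribed by these slides.

```python
from collections import defaultdict
from statistics import median

def mad_flags(values, threshold=3.5):
    """True where the modified z-score 0.6745 * |x - median| / MAD exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values equal the median; the score is undefined,
        # so this sketch flags nothing.
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]

def global_outliers(readings, threshold=3.5):
    """Indices of readings that are extreme relative to ALL values, ignoring context."""
    flags = mad_flags([value for _, value in readings], threshold)
    return [i for i, flagged in enumerate(flags) if flagged]

def contextual_outliers(readings, threshold=3.5):
    """Indices of readings that are extreme relative to their own context group."""
    groups = defaultdict(list)  # context label -> list of (index, value)
    for i, (context, value) in enumerate(readings):
        groups[context].append((i, value))
    result = []
    for members in groups.values():
        flags = mad_flags([value for _, value in members], threshold)
        result.extend(i for (i, _), flagged in zip(members, flags) if flagged)
    return sorted(result)

# Hypothetical temperature readings (deg C) labelled with a season.
readings = [("winter", 2), ("winter", 0), ("winter", -1), ("winter", 1),
            ("winter", 3), ("winter", 20),
            ("summer", 24), ("summer", 26), ("summer", 25), ("summer", 23),
            ("summer", 27), ("summer", 20)]

print(global_outliers(readings))      # []  : 20 is ordinary across the whole year
print(contextual_outliers(readings))  # [5] : 20 is unusual for a winter reading
```

The same value, 20, is flagged in the winter group but not in the summer group, which is exactly what separates a contextual outlier from a global one.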
Importance of Outlier Detection in Data Warehousing and Data Mining

1 Data Integrity
Outlier detection helps maintain the accuracy and reliability of data stored in
warehouses by surfacing erroneous or inconsistent records, thereby supporting data integrity.

2 Enhanced Decision Making
Identifying outliers aids in making informed decisions, leading to improved business
strategies and risk management.

3 Improved Model Performance
Removing erroneous outliers, or otherwise treating extreme values, contributes to the
development of more accurate and reliable models in data mining.
Common Techniques for Outlier Detection

Statistical Methods
Utilizing statistical measures such as the Z-score and standard deviation to
identify outliers in datasets.

Machine Learning Approaches
Implementing algorithms like Isolation Forest and Local Outlier Factor for
outlier detection in large datasets.

Visualization Techniques
Utilizing visual representations of data, such as box plots and scatter plots,
to identify patterns and anomalies, aiding in outlier detection.
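The Z-score rule listed under statistical methods can be sketched as follows: standardize each value by the mean and the (population) standard deviation and flag those lying more than a chosen number of standard deviations away. The threshold of 3, the function name and the sample data are illustrative assumptions.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Indices of values whose |z| = |x - mean| / std exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical daily order counts with one suspicious spike.
daily_orders = [10, 12, 11, 13, 12, 11, 10, 12, 11, 13, 12, 11, 95]
print(zscore_outliers(daily_orders))  # [12] : z is about 3.46 for the value 95
```

Because the mean and standard deviation are themselves pulled toward extreme values, a large outlier can mask others. With very small samples the rule may flag nothing at all: for n points the largest attainable |z| (population standard deviation) is (n - 1)/√n.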
Statistical Methods for Outlier Analysis

Data Preprocessing
Applying normalization and transformation techniques to prepare the data for
statistical analysis.

Identification Techniques
Using measures like the median absolute deviation (MAD) to identify and label
outliers within the dataset.

Assessment and Treatment
Evaluating the impact of outliers and determining whether to remove,
transform, or retain them for analysis.
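Identification with the median absolute deviation is commonly done through the modified z-score, M = 0.6745 (x - median) / MAD, flagging values with |M| > 3.5. The sketch below assumes that formulation and cutoff; the function names and sensor data are hypothetical.

```python
from statistics import median

def modified_zscores(values):
    """Modified z-scores M = 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values equal the median; M is undefined, so this
        # sketch returns zeros (a known limitation of the plain MAD rule).
        return [0.0] * len(values)
    return [0.6745 * (v - med) / mad for v in values]

def mad_outliers(values, threshold=3.5):
    """Indices whose |M| exceeds the threshold."""
    return [i for i, m in enumerate(modified_zscores(values)) if abs(m) > threshold]

# Hypothetical sensor readings containing two faulty spikes.
sensor = [10, 11, 12, 11, 10, 12, 100, 105]
print(mad_outliers(sensor))  # [6, 7]
```

Both spikes are flagged because the median and MAD barely move when extreme values are added. By contrast, a Z-score rule with a threshold of 3 cannot flag anything in a sample of eight values, since the largest attainable |z| there is 7/√8 ≈ 2.47.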
Machine Learning Approaches for Outlier Detection
1 Isolation Forest
Builds random trees that isolate points by splitting on randomly chosen features
and values; outliers are isolated with fewer splits, making the method efficient
for large datasets.

2 Local Outlier Factor
Computes the local density deviation of a data point with
respect to its neighbors to identify anomalies.

3 One-class SVM
Treats the training data as a single "normal" class, learns a boundary around it,
and identifies observations that fall outside that boundary as outliers.
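The Local Outlier Factor idea can be sketched without any library. For real workloads scikit-learn provides `IsolationForest`, `LocalOutlierFactor` and `OneClassSVM`; the standard-library version below only illustrates the LOF computation (k-distance, reachability distance, local reachability density, ratio of densities). It assumes distinct points and breaks ties at the k-th distance by index, a simplification of the original definition. The function name and points are hypothetical.

```python
import math

def lof_scores(points, k=2):
    """Local Outlier Factor for each point (assumes distinct points, k < len(points))."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point, excluding the point itself.
    # Simplification: ties at the k-th distance are broken by index.
    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    def reach_dist(i, o):
        # Reachability distance of point i from its neighbour o.
        return max(k_distance[o], dist[i][o])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(i, o) for o in neighbours[i]) for i in range(n)]

    # LOF: mean density of the neighbours divided by the point's own density.
    return [sum(lrd[o] for o in neighbours[i]) / (k * lrd[i]) for i in range(n)]

# Hypothetical 2-D data: a tight cluster plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
scores = lof_scores(points, k=2)
print([round(s, 2) for s in scores])  # (5, 5) scores about 6.5; the cluster stays near 1
```

A score close to 1 means a point is about as dense as its neighbourhood; scores well above 1 mark points in sparser regions than their neighbours, which is how LOF picks out local anomalies.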
Challenges and Limitations in Outlier Analysis

Noisy Data
Noisy and irrelevant data points may be mistaken for outliers, which makes
accurate identification difficult.

High-Dimensional Data
Detecting outliers in datasets with numerous dimensions is challenging, as high
dimensionality increases the complexity of the analysis.

Scalability
Outlier detection methods must be efficient enough to handle large and dynamic
datasets without significant performance degradation.
Applications of Outlier Analysis in Data Warehousing and Data Mining

Financial Fraud Detection
Identifying unusual patterns in transaction data to detect potential fraudulent activities.

Healthcare Analytics
Analyzing patient data to identify outliers that may indicate medical errors or unusual
health conditions.

Supply Chain Management
Detecting irregularities in supply chain data to optimize inventory management and
reduce losses.
Conclusion and Key Takeaways
1 Continuous Monitoring
Emphasizing the importance of ongoing surveillance and analysis to detect and manage
outliers.

2 Collaborative Efforts
Highlighting the need for cross-functional collaboration to effectively address outliers
in complex datasets.

3 Value of Insights
Underlining the business value of the insights that outlier analysis can uncover.

THANK YOU
