Unit 2 Data Mining

This document discusses different techniques used for data mining and big data analytics, including classification analysis, association rule learning, anomaly detection, clustering analysis, and regression analysis. Classification analysis segments data into predefined classes or groups. Association rule learning identifies frequent patterns and relationships between variables in large datasets. Anomaly detection finds outliers and unexpected observations. Clustering analysis groups similar objects together without predefined classes. Regression analysis identifies relationships between dependent and independent variables for prediction purposes.

Uploaded by

rawatamansingh480

INTRODUCTION TO BIG DATA
DATA MINING

DATA MINING TECHNIQUES AND BDA
DATA MINING TECHNIQUES ARE CLOSELY BOUND UP WITH BIG DATA ANALYTICS (BDA). THE MAIN
TECHNIQUES ARE:
1. ASSOCIATION RULE LEARNING
2. ANOMALY OR OUTLIER DETECTION
3. CLUSTERING ANALYSIS
4. REGRESSION ANALYSIS
5. CLASSIFICATION ANALYSIS
CLASSIFICATION ANALYSIS

THIS ANALYSIS IS USED TO RETRIEVE IMPORTANT AND RELEVANT INFORMATION ABOUT DATA AND
METADATA.
IT IS USED TO ASSIGN DATA TO DIFFERENT CLASSES. CLASSIFICATION IS SIMILAR TO CLUSTERING
IN THAT IT ALSO SEGMENTS DATA RECORDS INTO DIFFERENT SEGMENTS, CALLED CLASSES. BUT
UNLIKE CLUSTERING, HERE THE DATA ANALYST KNOWS THE DIFFERENT CLASSES IN ADVANCE.
IN CLASSIFICATION ANALYSIS YOU APPLY ALGORITHMS TO DECIDE HOW NEW DATA SHOULD BE
CLASSIFIED. A CLASSIC EXAMPLE OF CLASSIFICATION ANALYSIS IS OUTLOOK EMAIL: OUTLOOK
USES CERTAIN ALGORITHMS TO CHARACTERIZE AN EMAIL AS LEGITIMATE OR SPAM.
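
AS A ROUGH ILLUSTRATION OF THE SPAM EXAMPLE, THE SKETCH BELOW TRAINS A TINY NAIVE BAYES
CLASSIFIER ON LABELED EMAILS AND THEN CLASSIFIES NEW ONES. NAIVE BAYES IS ONLY ONE
POSSIBLE ALGORITHM AND IS NOT CLAIMED TO BE WHAT OUTLOOK USES; THE TRAINING EMAILS AND
THE NAMES `train` AND `classify` ARE HYPOTHETICAL.

```python
import math
from collections import Counter

# Hypothetical labeled training emails (illustrative only, not from the slides).
TRAIN = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "legitimate"),
    ("project report attached", "legitimate"),
]

def train(examples):
    """Count words per class and emails per class (the predefined classes)."""
    word_counts = {}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    return word_counts, class_counts

def classify(text, word_counts, class_counts):
    """Return the class with the highest log-probability (Laplace smoothing)."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Start from the class prior, then add one log-likelihood per word.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(counts.values())
        for word in text.split():
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

word_counts, class_counts = train(TRAIN)
print(classify("claim your free money", word_counts, class_counts))
print(classify("monday project meeting", word_counts, class_counts))
```

THE CLASSES ("spam", "legitimate") ARE FIXED BEFORE TRAINING, WHICH IS WHAT SEPARATES
CLASSIFICATION FROM CLUSTERING.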
ASSOCIATION RULE LEARNING

IT REFERS TO A METHOD THAT CAN HELP YOU IDENTIFY INTERESTING RELATIONS
(DEPENDENCY MODELING) BETWEEN DIFFERENT VARIABLES IN LARGE DATABASES.
THIS TECHNIQUE CAN HELP YOU UNCOVER HIDDEN PATTERNS IN THE DATA THAT CAN BE
USED TO IDENTIFY VARIABLES WITHIN THE DATA AND THE CO-OCCURRENCE OF DIFFERENT
VARIABLES THAT APPEAR VERY FREQUENTLY IN THE DATASET.
ASSOCIATION RULES ARE USEFUL FOR EXAMINING AND FORECASTING CUSTOMER BEHAVIOR.
THEY ARE HIGHLY RECOMMENDED FOR RETAIL INDUSTRY ANALYSIS.
THIS TECHNIQUE IS USED FOR SHOPPING BASKET DATA ANALYSIS, PRODUCT
CLUSTERING, CATALOG DESIGN AND STORE LAYOUT. IN INFORMATION TECHNOLOGY (IT),
PROGRAMMERS USE ASSOCIATION RULES TO BUILD PROGRAMS CAPABLE OF MACHINE LEARNING.
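
FOR SHOPPING BASKET DATA, THE STRENGTH OF A RULE SUCH AS "BUTTER => BREAD" IS COMMONLY
MEASURED BY ITS SUPPORT AND CONFIDENCE. THE SKETCH BELOW COMPUTES BOTH; THE BASKETS AND
THE NAMES `support` AND `confidence` ARE HYPOTHETICAL, AND A REAL SYSTEM WOULD USUALLY
SEARCH FOR FREQUENT ITEMSETS WITH AN ALGORITHM SUCH AS APRIORI RATHER THAN TEST RULES
ONE AT A TIME.

```python
# Hypothetical shopping baskets (illustrative only, not from the slides).
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def count_containing(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket)

def support(itemset, baskets):
    """Fraction of baskets that contain the whole itemset."""
    return count_containing(itemset, baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also
    contain the consequent. Returns 0.0 if the antecedent never occurs."""
    with_antecedent = count_containing(antecedent, baskets)
    if with_antecedent == 0:
        return 0.0
    both = count_containing(set(antecedent) | set(consequent), baskets)
    return both / with_antecedent

print("support(bread, butter):", support({"bread", "butter"}, BASKETS))
print("confidence(butter => bread):", confidence({"butter"}, {"bread"}, BASKETS))
print("confidence(bread => butter):", confidence({"bread"}, {"butter"}, BASKETS))
```

NOTE THAT CONFIDENCE IS NOT SYMMETRIC: "BUTTER => BREAD" AND "BREAD => BUTTER" CAN SCORE
DIFFERENTLY ON THE SAME DATA.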
ANOMALY OR OUTLIER DETECTION

THIS REFERS TO THE OBSERVATION OF DATA ITEMS IN A DATASET THAT DO NOT MATCH AN
EXPECTED PATTERN OR AN EXPECTED BEHAVIOR. ANOMALIES ARE ALSO KNOWN AS OUTLIERS,
NOVELTIES, NOISE, DEVIATIONS AND EXCEPTIONS.
OFTEN THEY PROVIDE CRITICAL AND ACTIONABLE INFORMATION. AN ANOMALY IS AN ITEM THAT
DEVIATES CONSIDERABLY FROM THE COMMON AVERAGE WITHIN A DATASET OR A COMBINATION
OF DATA.
THESE ITEMS ARE STATISTICALLY DISTANT FROM THE REST OF THE DATA, WHICH INDICATES THAT
SOMETHING OUT OF THE ORDINARY HAS HAPPENED AND REQUIRES ADDITIONAL ATTENTION.
THIS TECHNIQUE CAN BE USED IN A VARIETY OF DOMAINS, SUCH AS INTRUSION DETECTION,
SYSTEM HEALTH MONITORING, FRAUD DETECTION, FAULT DETECTION, EVENT DETECTION IN
SENSOR NETWORKS, AND DETECTING ECOSYSTEM DISTURBANCES. ANALYSTS OFTEN REMOVE THE
ANOMALOUS DATA FROM THE DATASET TO DISCOVER RESULTS WITH INCREASED ACCURACY.
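
ONE SIMPLE WAY TO FLAG ITEMS THAT DEVIATE CONSIDERABLY FROM THE AVERAGE IS A Z-SCORE
TEST, SKETCHED BELOW. THE READINGS, THE THRESHOLD OF 2.0 AND THE NAME `find_anomalies`
ARE ASSUMPTIONS FOR ILLUSTRATION; OTHER DETECTORS (FOR EXAMPLE MEDIAN-BASED ONES) MAY
SUIT SKEWED DATA BETTER.

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations
    from the mean (a plain z-score test)."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical sensor readings with one suspicious value.
READINGS = [10, 11, 9, 10, 12, 10, 11, 50]

anomalies = find_anomalies(READINGS)
# Analysts often drop the anomalous items before further analysis.
cleaned = [v for v in READINGS if v not in anomalies]

print("anomalies:", anomalies)
print("cleaned:", cleaned)
```

BECAUSE A LARGE OUTLIER ALSO INFLATES THE STANDARD DEVIATION, THE THRESHOLD MAY NEED TO
BE LOWER FOR SMALL SAMPLES THAN THE VALUES COMMONLY QUOTED FOR LARGE ONES.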
CLUSTERING ANALYSIS

A CLUSTER IS A COLLECTION OF DATA OBJECTS THAT ARE SIMILAR TO ONE ANOTHER WITHIN THE
SAME GROUP AND DISSIMILAR OR UNRELATED TO THE OBJECTS IN OTHER GROUPS OR CLUSTERS.
CLUSTERING ANALYSIS IS THE PROCESS OF DISCOVERING GROUPS AND CLUSTERS IN THE DATA
IN SUCH A WAY THAT THE DEGREE OF ASSOCIATION BETWEEN TWO OBJECTS IS HIGHEST IF
THEY BELONG TO THE SAME GROUP AND LOWEST OTHERWISE. THE RESULT OF THIS ANALYSIS CAN
BE USED FOR CUSTOMER PROFILING.
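
AS A MINIMAL SKETCH OF DISCOVERING SUCH GROUPS, THE CODE BELOW RUNS A PLAIN K-MEANS LOOP
ON CUSTOMER RECORDS. K-MEANS IS ONE COMMON CLUSTERING ALGORITHM, CHOSEN HERE ONLY FOR
ILLUSTRATION; THE CUSTOMER DATA, THE STARTING CENTROIDS AND THE NAME `kmeans` ARE
HYPOTHETICAL.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, and repeat."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids

# Hypothetical customers: (visits per month, average spend).
CUSTOMERS = [(1, 20), (2, 25), (1, 22), (8, 90), (9, 95), (10, 100)]

# Assumption: start from two of the data points as initial centroids.
clusters, centroids = kmeans(CUSTOMERS, [CUSTOMERS[0], CUSTOMERS[3]])
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

NO CLASS LABELS ARE SUPPLIED; THE TWO CUSTOMER PROFILES EMERGE ONLY FROM THE DISTANCES
BETWEEN RECORDS.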
REGRESSION ANALYSIS

IN STATISTICAL TERMS, REGRESSION ANALYSIS IS THE PROCESS OF IDENTIFYING AND
ANALYZING THE RELATIONSHIP AMONG VARIABLES.
IT CAN HELP YOU UNDERSTAND HOW THE CHARACTERISTIC VALUE OF THE DEPENDENT VARIABLE
CHANGES IF ANY ONE OF THE INDEPENDENT VARIABLES IS VARIED. THIS MEANS ONE VARIABLE
IS DEPENDENT ON ANOTHER, BUT NOT VICE VERSA.
IT IS GENERALLY USED FOR PREDICTION AND FORECASTING.
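
THE SIMPLEST CASE, ONE INDEPENDENT AND ONE DEPENDENT VARIABLE, CAN BE SKETCHED AS AN
ORDINARY LEAST-SQUARES LINE FIT FOLLOWED BY A FORECAST. THE DATA (ADVERTISING SPEND
VERSUS SALES) AND THE NAMES `fit_line` AND `predict` ARE HYPOTHETICAL.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Forecast the dependent variable for a new independent value."""
    return slope * x + intercept

# Hypothetical data: advertising spend (independent) vs. sales (dependent).
SPEND = [1, 2, 3, 4, 5]
SALES = [2.1, 4.0, 6.2, 7.9, 10.1]

slope, intercept = fit_line(SPEND, SALES)
print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
print("forecast for spend 6:", round(predict(6, slope, intercept), 2))
```

THE FIT IS DIRECTIONAL: SALES IS MODELED AS DEPENDING ON SPEND, AND SWAPPING THE TWO
LISTS WOULD GENERALLY GIVE A DIFFERENT LINE.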
