
Principal Component Analysis, or PCA

• PCA is a dimensionality-reduction method.

• It is often used to reduce the dimensionality of large data sets.

How?
• By transforming a large set of variables into a smaller one that still
contains most of the information in the large set.
Principal Component Analysis, or PCA
• Reducing the number of variables of a data set naturally comes at the
expense of accuracy.

• The trick in dimensionality reduction is to trade a little accuracy
for simplicity.

• Smaller data sets are easier to explore and visualize, and they make
analysis much easier and faster for machine learning algorithms, which
have fewer extraneous variables to process.
Idea of PCA
• Reduce the number of variables of a data set, while preserving as
much information as possible (a minimal example follows below).
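As a concrete illustration (not from the original slides), here is a minimal sketch using scikit-learn's PCA; the data matrix X and all variable names are hypothetical:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data set: 200 samples described by 10 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

# Reduce to 2 principal components while keeping as much
# variance (information) as possible.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (200, 2)
print(pca.explained_variance_ratio_)  # fraction of variance kept per component
```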
Principal Component Analysis (PCA)

• Given a set of points, how do we know if they can be compressed like in
the previous example?
– The answer is to look into the correlation between the points.
– The tool for doing this is called PCA.
PCA
• By finding the eigenvalues and eigenvectors of the covariance matrix, we
find that the eigenvectors with the largest eigenvalues correspond to the
directions of greatest variance in the dataset.
• The eigenvector with the largest eigenvalue is the first principal
component (see the sketch below).
• PCA is a useful statistical technique that has found application in:
– fields such as face recognition and image compression
– finding patterns in data of high dimension.
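To make the eigen-decomposition concrete, here is a minimal NumPy sketch (illustrative, not from the slides); the data matrix X and all names are assumptions:

```python
import numpy as np

# Hypothetical data: 100 samples, 3 correlated variables.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
X[:, 1] += 0.9 * X[:, 0]          # make two variables correlated

# 1. Center the data and compute the covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# 2. Eigen-decompose the covariance matrix (eigh suits symmetric matrices).
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 3. Sort by eigenvalue, largest first; the leading eigenvector is the
#    first principal component.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# 4. Project onto the top 2 components: 3 variables reduced to 2.
X_reduced = X_centered @ eigenvectors[:, :2]
print(X_reduced.shape)   # (100, 2)
```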
Imbalanced Data Set
• Classification predictive modeling involves predicting a class label for
a given observation.

• An imbalanced classification problem is a classification problem where
the distribution of examples across the known classes is biased or skewed.

• The distribution can vary from a slight bias to a severe imbalance where
there is one example in the minority class for hundreds, thousands, or
millions of examples in the majority class or classes.
Example
• Cancer Prediction

No Cancer – 900 records --- Majority Class
Yes Cancer – 100 records --- Minority Class

If 1000 records are given, biased towards No Cancer, a model that always
predicts "No Cancer" still achieves 90% accuracy.

Most algorithms work towards the majority class.

In business problems, the minority class is often the focus class,
e.g. Spam vs. Non-Spam.

If accuracy is taken as the metric, algorithms tend to be biased towards
the majority class (see the sketch below).
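A minimal sketch of this accuracy pitfall, using the 900/100 counts above (illustrative, not from the slides):

```python
import numpy as np

# 900 "No Cancer" (0, majority) and 100 "Cancer" (1, minority) labels.
y_true = np.array([0] * 900 + [1] * 100)

# A useless model that always predicts the majority class.
y_pred = np.zeros_like(y_true)

accuracy = (y_true == y_pred).mean()
minority_recall = (y_pred[y_true == 1] == 1).mean()

print(f"accuracy: {accuracy:.0%}")               # 90% despite learning nothing
print(f"minority recall: {minority_recall:.0%}") # 0% of cancers detected
```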


Methods to handle
• Undersampling

Randomly keep only 100 of the 900 No Cancer (NC) records, alongside all
100 Cancer (C) records:

100 – NC
100 – C
====
200 – perfectly balanced
====

• In ML, data is very important; losing 800 majority-class records is not
recommended (a sketch follows below).
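A minimal random-undersampling sketch in NumPy (illustrative; libraries such as imbalanced-learn provide a RandomUnderSampler built on the same idea):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced data: 900 majority (0) and 100 minority (1) rows.
X = rng.normal(size=(1000, 5))
y = np.array([0] * 900 + [1] * 100)

minority_idx = np.flatnonzero(y == 1)
majority_idx = np.flatnonzero(y == 0)

# Randomly keep only as many majority rows as there are minority rows;
# the other 800 majority rows are discarded (lost data).
kept_majority = rng.choice(majority_idx, size=len(minority_idx), replace=False)
kept = np.concatenate([kept_majority, minority_idx])

X_balanced, y_balanced = X[kept], y[kept]
print(np.bincount(y_balanced))   # [100 100] -> 200, perfectly balanced
```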
Methods to handle
• Oversampling

900 – NC
900 – C
====
1800 – perfectly balanced --- focus is on the minority class
====

Cancer (minority) class: take records at random (e.g. 30 at a time) and
duplicate them until the count reaches 900 (random duplication). Some
records may be duplicated more than others; of the 900 minority records,
800 are duplicates.

• In ML, data is very important and losing data is not recommended;
oversampling keeps every original record (a sketch follows below).
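A minimal random-oversampling sketch (illustrative; imbalanced-learn's RandomOverSampler implements the same idea):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced data: 900 majority (0) and 100 minority (1) rows.
X = rng.normal(size=(1000, 5))
y = np.array([0] * 900 + [1] * 100)

minority_idx = np.flatnonzero(y == 1)
majority_idx = np.flatnonzero(y == 0)

# Keep the 100 originals and add 800 random duplicates so the minority
# count reaches 900; some records get duplicated more often than others.
extra = rng.choice(minority_idx, size=len(majority_idx) - len(minority_idx),
                   replace=True)
kept = np.concatenate([majority_idx, minority_idx, extra])

X_balanced, y_balanced = X[kept], y[kept]
print(np.bincount(y_balanced))   # [900 900] -> 1800, perfectly balanced
```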
Undersampling vs Oversampling
Methods to handle
• SMOTE (Synthetic Minority Oversampling Technique)

SMOTE
• Take the difference between two minority-class vectors, multiply it by
a random number between 0 and 1, and plot the new data point at the
result.

• The new point is a synthetic data point.

• Repeat the process till you reach the desired number of points
(a sketch follows below).
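A minimal sketch of the SMOTE interpolation step (illustrative; the full algorithm, e.g. imbalanced-learn's SMOTE, picks the second vector from the k nearest neighbours of the first):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical minority-class points: 5 samples with 2 features.
minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
                     [1.1, 1.3], [0.9, 0.7]])

def smote_point(x, neighbour, rng):
    """Synthesize a point on the line segment between x and neighbour:
    new = x + gap * (neighbour - x), with gap drawn uniformly from [0, 1]."""
    gap = rng.uniform(0.0, 1.0)
    return x + gap * (neighbour - x)

# Repeat the process till we reach the desired number of synthetic points.
synthetic = []
for _ in range(10):
    i, j = rng.choice(len(minority), size=2, replace=False)
    synthetic.append(smote_point(minority[i], minority[j], rng))

print(np.array(synthetic).shape)   # (10, 2) new synthetic minority points
```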
