Welcome to Scribd!

Imbalanced Classes Cheatsheet

Uploaded by

0% found this document useful (0 votes)

5 views1 page

This Python cheatsheet provides methods for handling machine learning datasets that have disproportionate class ratios, known as imbalanced classes. These include downsampling the majority class, upsampling the minority class, changing performance metrics from accuracy to ROC AUC score, using cost-sensitive algorithms like SVC with class weights, and tree-based algorithms like random forests. The full tutorial with explanations and an example dataset can be found online at elitedatascience.com.

Original Description:

Data Science Shortcuts

Original Title

Imbalanced-Classes-Cheatsheet

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

0% found this document useful (0 votes)

5 views1 page

Imbalanced Classes Cheatsheet

Uploaded by

wormagscopybiz9

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Jump to Page

You are on page 1of 1

Search inside document

Python Cheatsheet:

Handling Imbalanced Classes

This Python cheatsheet will cover some of the most useful methods for handling machine learning datasets that
have a disproportionate ratio of observations in each class. These “imbalanced” classes render standard accuracy
metrics useless.

To see the most up-to-date full tutorial and download the sample dataset, visit the online tutorial at
elitedatascience.com.

SETUP Down-sample Majority Class

Make sure the following are installed on your computer: df_majority = df[df.balance==0]
df_minority = df[df.balance==1]
• Python 2.7+ or Python 3
• NumPy
• Pandas
df_majority_downsampled = resample(df_majority,
• Scikit-Learn (a.k.a. sklearn)
replace=False,
n_samples=49,
Load Sample Dataset random_state=123)
import pandas as pd
import numpy as np
df_downsampled = pd.concat([df_majority_downsampled, df_minority])

df = pd.read_csv(‘balance-scale.data’,
names=[‘balance’, ‘var1’, ‘var2’, ‘var3’, ‘var4’])
Change Your Performance Metric
*Up-to-date link to the sample dataset can be found here.
from sklearn.metrics import roc_auc_score

prob_y_2 = clf_2.predict_proba(X)
Up-sample Minority Class prob_y_2 = [p[1] for p in prob_y_2]
df_majority = df[df.balance==0]
print( roc_auc_score(y, prob_y_2) )
df_minority = df[df.balance==1]

df_minority_upsampled = resample(df_minority, Use Cost-Sensitive Algorithms

replace=False, from sklearn.svm import SVC

n_samples=49, clf = SVC(kernel=’linear’, class_weight=’balanced’, probability=True)

random_state=123)

df_upsampled = pd.concat([df_majority, df_minority_upsampled]) Use Tree-Based AlgoriTHMS

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier()

Honorable Mentions

• Create Synthetic Samples (Data Augmentation) - A close cousin of upsampling.

• Combine Minority Classes - Group together similar classes.
• Reframe as Anomaly Detection - Treat minority classes as outliers.

To see the most up-to-date full tutorial, explanations, and additional context, visit the online tutorial at elitedatascience.com.
We also have plenty of other tutorials and guides.

ELITEDATASCIENCE.COM

The Book of (PLC & SCADA) Dosing System by HMI
Document102 pages
The Book of (PLC & SCADA) Dosing System by HMI
Awidhi Kresnawan
No ratings yet
Cohen Book
Document122 pages
Cohen Book
Dat Nguyen
100% (1)
08 Qi No Fault Found Intermittent Failure
Document12 pages
08 Qi No Fault Found Intermittent Failure
Jack Xuan
100% (1)
Machine Learning With SQL
Document12 pages
Machine Learning With SQL
prince krish
100% (1)
Machine Learning Algorithms PDF
Document148 pages
Machine Learning Algorithms PDF
jeff omanga
No ratings yet
Machine Learning LAB: Practical-1
Document24 pages
Machine Learning LAB: Practical-1
Tsering Jhakree
100% (1)
PACS Fundamentals
Document79 pages
PACS Fundamentals
Syed Asad
100% (3)
Manual SmartClass Ethernet
Document118 pages
Manual SmartClass Ethernet
Gustavo Carpio
100% (1)
R12 Compensation Activity Guide
Document224 pages
R12 Compensation Activity Guide
personifiedgenius
0% (1)
Amazing Java: Learn Java Quickly
From Everand
Amazing Java: Learn Java Quickly
Andrei Besedin
No ratings yet
Big Data Merged
Document7 pages
Big Data Merged
Ingame Id
No ratings yet
Chapter 3
Document43 pages
Chapter 3
2022ac05722
No ratings yet
20MIS1025 - DecisionTree - Ipynb - Colaboratory
Document4 pages
20MIS1025 - DecisionTree - Ipynb - Colaboratory
Sandip Das
No ratings yet
SL Classification For Data Science..
Document4 pages
SL Classification For Data Science..
shivaybhargava33
No ratings yet
Iris - Regression - Jupyter Notebook
Document5 pages
Iris - Regression - Jupyter Notebook
Bibhudutta Swain
No ratings yet
DL 8
Document6 pages
DL 8
Siddesh Pingale
No ratings yet
Machine
Document45 pages
Machine
Gagan Sharma
100% (1)
Regularization: Updates To Assignment
Document21 pages
Regularization: Updates To Assignment
Yun Su
No ratings yet
Pattern Report Final
Document10 pages
Pattern Report Final
MD Rubel Amin
No ratings yet
ML - Practical File
Document15 pages
ML - Practical File
Jatin Mathur
No ratings yet
Class 14 - Basic Coding in Python - 5
Document24 pages
Class 14 - Basic Coding in Python - 5
Naimur Rahman
No ratings yet
PL Ii
Document43 pages
PL Ii
Rohan 7
No ratings yet
Import As Import As Import As: # Importing The Libraries
Document3 pages
Import As Import As Import As: # Importing The Libraries
19-361 Sai Prathik
No ratings yet
Correction
Document3 pages
Correction
bougmazisoufyane
No ratings yet
Lecture 5
Document33 pages
Lecture 5
立哲
No ratings yet
Deep Learning With Python File
Document22 pages
Deep Learning With Python File
Arnav Shrivastava
No ratings yet
MLprac 3
Document2 pages
MLprac 3
T42 BEIT3OO43 Kundalia Harsh H.
No ratings yet
Take It Easy: Created Status Last Read
Document55 pages
Take It Easy: Created Status Last Read
Sandhya
No ratings yet
Decision Tree, Random Forest
Document37 pages
Decision Tree, Random Forest
Akshay kashyap
No ratings yet
Aggialavura - Python Linear Regression Model
Document1 page
Aggialavura - Python Linear Regression Model
himtajay
No ratings yet
ML Record
Document18 pages
ML Record
harshitsr1234
No ratings yet
Experiment No 3 ML
Document6 pages
Experiment No 3 ML
ANIKET THAKUR [UCOE- 4388]
No ratings yet
Experiment No 3 ML
Document6 pages
Experiment No 3 ML
ANIKET THAKUR [UCOE- 4388]
No ratings yet
ML Lab6
Document4 pages
ML Lab6
Pankaj Mandloi
No ratings yet
K-Nearest Neighbor On Python Ken Ocuma
Document9 pages
K-Nearest Neighbor On Python Ken Ocuma
Aliyha Dionio
100% (2)
Data Analysis W Pandas
Document4 pages
Data Analysis W Pandas
x7jn4sxdn9
No ratings yet
DWDM Lab 3
Document10 pages
DWDM Lab 3
shreyastha2058
No ratings yet
Codes
Document6 pages
Codes
Vamshi Krishna
No ratings yet
ML Lab Test I232
Document4 pages
ML Lab Test I232
Namra Shah
No ratings yet
Assignment 3 Q1
Document5 pages
Assignment 3 Q1
Pratyush
No ratings yet
Machine Learning
Document54 pages
Machine Learning
Jacob
No ratings yet
FRA Cheat Sheet Week2
Document2 pages
FRA Cheat Sheet Week2
meena13121993
No ratings yet
Machine Learnin
Document23 pages
Machine Learnin
Manoj Kumar 1183
100% (1)
Exp 5
Document8 pages
Exp 5
jay
No ratings yet
Machine Learning
Document43 pages
Machine Learning
Mayank Tripathi
No ratings yet
FML File Final
Document36 pages
FML File Final
Kunal Saini
No ratings yet
Linear Regression: Scikit-Learn
Document3 pages
Linear Regression: Scikit-Learn
sripathi
No ratings yet
Linear Regression: Scikit-Learn
Document3 pages
Linear Regression: Scikit-Learn
sripathi
No ratings yet
ArbolesRF UNI
Document3 pages
ArbolesRF UNI
ALISON ELIZABETH HUAPAYA CAYCHO
No ratings yet
Python For Machine Leraning
Document30 pages
Python For Machine Leraning
Sajeev G P
No ratings yet
MLA Lab 6:-Implementation of Decision Tree
Document16 pages
MLA Lab 6:-Implementation of Decision Tree
tushar3patil03
No ratings yet
PRACTICE Supervised Learning
Document5 pages
PRACTICE Supervised Learning
Abdi Omer Ebrahim
No ratings yet
Mark Goldstein mg3479 A2 Code
Document4 pages
Mark Goldstein mg3479 A2 Code
joziah43
No ratings yet
المستند
Document2 pages
المستند
هــبه قـاسم
No ratings yet
Import Pandas As PD DF PD - Read - CSV ("Titanic - Train - CSV") DF - Head
Document20 pages
Import Pandas As PD DF PD - Read - CSV ("Titanic - Train - CSV") DF - Head
Saloni Tuli
No ratings yet
WLeaf Disease Classification - ResNet50.ipynb - Colaboratory
Document12 pages
WLeaf Disease Classification - ResNet50.ipynb - Colaboratory
Tefe
No ratings yet
Ai Kode Pert 12
Document2 pages
Ai Kode Pert 12
Pidut 02
No ratings yet
Assignment 3 DS5620
Document11 pages
Assignment 3 DS5620
humaragpt
No ratings yet
DeepLearningForVisionSystems Ch5 AlexNet
Document32 pages
DeepLearningForVisionSystems Ch5 AlexNet
mkkadambi
No ratings yet
Linearregression 4
Document3 pages
Linearregression 4
api-559045701
No ratings yet
Autoencoder - MPL - Basic - Ipynb - Colaboratory PDF
Document21 pages
Autoencoder - MPL - Basic - Ipynb - Colaboratory PDF
ushasree
No ratings yet
sectionSVM PDF
Document10 pages
sectionSVM PDF
Arif Rahman Hakim
No ratings yet
Classification
Document40 pages
Classification
niranjan
No ratings yet
CTRL
Document2 pages
CTRL
jaffar bikat
No ratings yet
Kabir Khan 1147 - 4
Document4 pages
Kabir Khan 1147 - 4
mohammed.ibrahimdurrani.bscs-2020b
No ratings yet
5 Ejercicio - Experimentación Con Los Modelos de Regresión Más Eficaces - Training - Microsoft Learn Ingles
Document9 pages
5 Ejercicio - Experimentación Con Los Modelos de Regresión Más Eficaces - Training - Microsoft Learn Ingles
acxel david castillo casas
No ratings yet
Video TimingController v6.1 LogiCORE IP Pg016 - V - TC
Document89 pages
Video TimingController v6.1 LogiCORE IP Pg016 - V - TC
ibanitescu
No ratings yet
Portofolio - Powerpoint
Document24 pages
Portofolio - Powerpoint
Kevin Nuraditama
No ratings yet
9239 04 Outline Proposal Form
Document2 pages
9239 04 Outline Proposal Form
bonalambok
No ratings yet
PM85 - Manual
Document68 pages
PM85 - Manual
Mc3l
No ratings yet
Marc Kagan Resume
Document2 pages
Marc Kagan Resume
marckagan
No ratings yet
ThinkPad L460 Platform Specifications
Document1 page
ThinkPad L460 Platform Specifications
Aket Lohia
No ratings yet
Introduction To Powerpoint
Document3 pages
Introduction To Powerpoint
igwe nnabuike
No ratings yet
Chap 1 CPP 3 RD
Document40 pages
Chap 1 CPP 3 RD
Florentino Elegado
No ratings yet
Internship Report: Sher Afgun Metla 2011394 Faculty of Computer Science & Engineering
Document8 pages
Internship Report: Sher Afgun Metla 2011394 Faculty of Computer Science & Engineering
MuhammadWaqasAfridi
No ratings yet
Prismatic Design
Document12 pages
Prismatic Design
Persia Juliet
No ratings yet
Raspberry Pi Build HAT
Document1 page
Raspberry Pi Build HAT
Steve Attwood
No ratings yet
DVR
Document130 pages
DVR
John Glenn Lambayon
No ratings yet
Subject Code: 80359 Subject Name: Data Warehousing and Data Mining Common Subject Code (If Any)
Document9 pages
Subject Code: 80359 Subject Name: Data Warehousing and Data Mining Common Subject Code (If Any)
Manish Jagtap
No ratings yet
Browse The Book: First-Hand Knowledge
Document49 pages
Browse The Book: First-Hand Knowledge
Ashish
No ratings yet
S8906: Fast Data Pipelines For Deep Learning Training: Przemek Tredak, Simon Layton
Document41 pages
S8906: Fast Data Pipelines For Deep Learning Training: Przemek Tredak, Simon Layton
Ryan Bloxham
No ratings yet
Cálculo Bombas
Document10 pages
Cálculo Bombas
Catalina Acevedo
No ratings yet
Datasheet LiYCY Cable PDF
Document1 page
Datasheet LiYCY Cable PDF
ResaKandhy
100% (1)
FPGA Based Efficient Implementation of Viterbi Decoder: Anubhuti Khare
Document6 pages
FPGA Based Efficient Implementation of Viterbi Decoder: Anubhuti Khare
Krishnateja Yarrapatruni
No ratings yet
About Tenorshare 4ddig: Key Features
Document53 pages
About Tenorshare 4ddig: Key Features
Juankar Apolo
No ratings yet
Corel Draw New Product Flyer Tutorial 130528105758 Phpapp01 PDF
Document31 pages
Corel Draw New Product Flyer Tutorial 130528105758 Phpapp01 PDF
Sima Dragos
No ratings yet
Aircraft Weighing Set - Ac40lp
Document2 pages
Aircraft Weighing Set - Ac40lp
Mohamed Benrahal
No ratings yet
CH 11 Quiz
Document8 pages
CH 11 Quiz
dadu89
No ratings yet
VHDL Statements
Document34 pages
VHDL Statements
P. VENKATESHWARI
No ratings yet
Using Apache JMeter To Perform Load Testing Agains
Document10 pages
Using Apache JMeter To Perform Load Testing Agains
patrick_wangrui
100% (2)