Welcome to Scribd!

Data Cleaning With Python Cheat Sheet Anello

Uploaded by

0% found this document useful (0 votes)

10 views4 pages

This cheat sheet provides techniques for cleaning data in Python including: 1) Dealing with missing data through methods like dropna(), fillna(), and interpolation 2) Finding and dropping duplicates 3) Detecting outliers through descriptive statistics, boxplots, and outlier removal thresholds 4) Encoding categorical features using techniques like one-hot and label encoding 5) Transforming data through standardization, rescaling, and robust scaling.

Original Description:

Original Title

Data_Cleaning_with_Python_Cheat_Sheet_Anello

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

0% found this document useful (0 votes)

10 views4 pages

Data Cleaning With Python Cheat Sheet Anello

Uploaded by

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Jump to Page

You are on page 1of 4

Search inside document

Data Cleaning Cheat Sheet in

Python - By Eugenia Anello

Table of Contents:
1. Dealing with Missing Data
2. Dealing with Duplicates
3. Outlier Detection
4. Encode Categorical Features
5. Transformation
1. Dealing with Missing data
Check missing data in each column of the dataset

df.isnull().sum()

Delete missing data

df.dropna(how='all')

Drop columns that have missing values

df.dropna(how='columns')

Drop specific columns that have missing values

df.dropna(subset=[‘municipal,'city'])

Replace missing values with Mean/Median/Mode

df[‘price’].fillna(df[‘price’].mean())
df[‘age’].fillna(df[‘age’].median())
df[‘type_building’].fillna(df[‘type_building’].mode())

Forward Fill - Fill missing values with values before them

df.fillna(method='ffill')

Backward Fill - FIll missing values with values after them

df.fillna(method='bfill')

Fill missing values using the interpolation method

df['stock_price'] =
df['stock_price'].interpolate(method='polynomial',order=2)

2. Dealing with Duplicates

Check if there are duplicates

df.duplicated().sum()
Extract duplicate rows from the dataframe

df[df.duplicated()]

Drop duplicates

df.drop_duplicates()

Aggregate data

df.groupby('id').agg({'price':'mean'}).reset_index()

3. Outlier detection
Detect range of values for each column of the dataset

df.describe([x*0.1 for x in range(10)])

Display boxplot to display the distribution of a column

import seaborn as sns

sns.boxplot(x=df['age'])

Display histogram to display the distribution of a column

sns.displot(data=df[‘column1’])

Remove outliers

df = df[df['age']<df[‘age'].quantile(0.9)]

Outlier detection with machine learning models, like Isolation Forest

if = IsolationForest(random_state=42)
if.fit(X)
y_pred = if.predict(X)
4. Encode categorical features
Apply one-hot-encoding to a categorical column

from sklearn.prepreprocessing import OneHotEncoder

ohe = OneHotEncoder()
encoded_data =
pd.DataFrame(ohe.fit_transform(df[[‘type_building’,’color’]]).toarray())
new_df = df.join(encoded_data)

Apple label-encoding to a categorical column

from sklearn.prepreprocessing import LabelEncoder

le = LabelEncoder()
df[‘column1’] = le.fit_transform(df[‘price_levels’])

5. Transformation
Standardize features by removing the mean and scaling to unit variance

from sklearn.processing import StandardScaler

X_std = StandardScaler().transform(X)

Rescale features into the range [0,1]

from sklearn.processing import MinMaxScaler

X_mms = MinMaxScaler().transform(X)

Scale features exploiting statistics that are robust to outliers

from sklearn.processing import RobustScaler
X_rs = RobustScaler().transform(X)

Advanced C Concepts and Programming: First Edition
From Everand
Advanced C Concepts and Programming: First Edition
Gayatri
Rating: 3 out of 5 stars
3/5 (1)
Data Clearning
Document7 pages
Data Clearning
lequangtrung010389
No ratings yet
Introduction to PHP, Part 2, Second Edition
From Everand
Introduction to PHP, Part 2, Second Edition
Adam Majczak
No ratings yet
Pandas: Reference Sheet
Document9 pages
Pandas: Reference Sheet
deveshag
No ratings yet
Excel Techniques
From Everand
Excel Techniques
Online Trainees
Rating: 2 out of 5 stars
2/5 (1)
Commands SQL, Python (BASICS)
Document7 pages
Commands SQL, Python (BASICS)
Kuldeep Gangwar
No ratings yet
The Essential R Reference
From Everand
The Essential R Reference
Mark Gardener
No ratings yet
Pandas
Document5 pages
Pandas
Smart Crazy
No ratings yet
Python Cheat Sheet Code Academy
Document1 page
Python Cheat Sheet Code Academy
Jaroslav Kostelník
100% (1)
Pandas Cheat Sheet PDF
Document1 page
Pandas Cheat Sheet PDF
moraleseconomia
50% (2)
Real Estate
Document10 pages
Real Estate
Muskaan
No ratings yet
Pandas Commands
Document3 pages
Pandas Commands
vijayagajula1359
No ratings yet
Data Science Cheat Sheet: KEY Imports
Document1 page
Data Science Cheat Sheet: KEY Imports
Mihai Dragomir
100% (1)
Practicum Data Analysis Takeaways Course1 Theme6 Us
Document2 pages
Practicum Data Analysis Takeaways Course1 Theme6 Us
Thai Chi
No ratings yet
Pandas Cheat Sheet
Document6 pages
Pandas Cheat Sheet
shan halder
100% (2)
Literacy Rate Analysis Coding and Output
Document22 pages
Literacy Rate Analysis Coding and Output
Suchita
No ratings yet
Practicle6 (Code)
Document4 pages
Practicle6 (Code)
Pallavi Gaikwad
No ratings yet
Imp Details
Document6 pages
Imp Details
Jyotirmay Sahu
No ratings yet
C121 Exp2
Document23 pages
C121 Exp2
Devanshu Maheshwari
No ratings yet
Pandas - PySpark Equivalents-1
Document3 pages
Pandas - PySpark Equivalents-1
Rufai
No ratings yet
Data Exploration Preparation
Document12 pages
Data Exploration Preparation
hamidsithole65
No ratings yet
Python Cheat Sheet: Pandas - Numpy - Sklearn Matplotlib - Seaborn BS4 - Selenium - Scrapy
Document9 pages
Python Cheat Sheet: Pandas - Numpy - Sklearn Matplotlib - Seaborn BS4 - Selenium - Scrapy
rameshb87
100% (3)
Machine Learning Notes: 2. All The Commands For Eda
Document5 pages
Machine Learning Notes: 2. All The Commands For Eda
naveen katta
100% (1)
C121 Exp1
Document32 pages
C121 Exp1
Devanshu Maheshwari
No ratings yet
EDA Cheat Sheet - Exploratory Data Analysis
Document2 pages
EDA Cheat Sheet - Exploratory Data Analysis
Vanshika Rastogi
No ratings yet
Cheat Sheet: The Pandas Dataframe Object: Preliminaries Get Your Data Into A Dataframe
Document10 pages
Cheat Sheet: The Pandas Dataframe Object: Preliminaries Get Your Data Into A Dataframe
Raju Rimal
100% (1)
Form1: "Ingrese Nombre"
Document1 page
Form1: "Ingrese Nombre"
alejandrolsm1
No ratings yet
FakeNewsDetection Student
Document7 pages
FakeNewsDetection Student
nehaila
No ratings yet
Mercedes-Benz Greener Manufacturing Ai
Document16 pages
Mercedes-Benz Greener Manufacturing Ai
Puji
0% (1)
Learn Pandas Dataframe Object Cheat Sheet Part-8
Document8 pages
Learn Pandas Dataframe Object Cheat Sheet Part-8
Satyabrat Das
No ratings yet
Data Cleaning & Preparation
Document2 pages
Data Cleaning & Preparation
Nisha R S
No ratings yet
Pandas DataFrame Notes
Document6 pages
Pandas DataFrame Notes
Nhan Nguyen
100% (1)
Data Analysis W Pandas
Document4 pages
Data Analysis W Pandas
x7jn4sxdn9
No ratings yet
Descriptive Statistics With Pandas: Data Handling Using Pandas - II
Document37 pages
Descriptive Statistics With Pandas: Data Handling Using Pandas - II
B. Jennifer
100% (1)
Python Pandas Demo PDF
Document23 pages
Python Pandas Demo PDF
Rakshit Kukreja
100% (2)
Pandas Usefull Code
Document2 pages
Pandas Usefull Code
أحمد موريس
No ratings yet
How To Use Popular Data Structures and Algorithms in Python ?
Document11 pages
How To Use Popular Data Structures and Algorithms in Python ?
Satish Singh
100% (1)
PYTHON PANDAS Cheat Sheet
Document2 pages
PYTHON PANDAS Cheat Sheet
Indri Dayanah Ayulani
No ratings yet
Pandas Library Problems For Parctice
Document13 pages
Pandas Library Problems For Parctice
MAXI 32
No ratings yet
Python Cheat Sheet For Data Analysis
Document2 pages
Python Cheat Sheet For Data Analysis
Abdullah amin
No ratings yet
Datavischeatsheet
Document2 pages
Datavischeatsheet
rcg97.hd
No ratings yet
AL Notes
Document61 pages
AL Notes
Shikha Jayaswal
No ratings yet
ML 1-10
Document53 pages
ML 1-10
22128008
No ratings yet
Assignment 1 - LP1
Document14 pages
Assignment 1 - LP1
bbad070105
No ratings yet
Cognitive Class - Answers Data Analysis With Python
Document6 pages
Cognitive Class - Answers Data Analysis With Python
Sloan Ian Ariff
No ratings yet
Rec 11
Document49 pages
Rec 11
Tamer Sultany
No ratings yet
PW2 DataCleaning
Document6 pages
PW2 DataCleaning
hhaline9
No ratings yet
Document Py
Document8 pages
Document Py
Udaya sankari sankari
No ratings yet
CSC KV CHN - Stack With Worksheet
Document10 pages
CSC KV CHN - Stack With Worksheet
Thenmozhi Govindaraj
No ratings yet
Multiple - Linear - Regression - AirBNB - Student - File0.2 - New (1) .Ipynb - Colaboratory
Document8 pages
Multiple - Linear - Regression - AirBNB - Student - File0.2 - New (1) .Ipynb - Colaboratory
SHEKHAR SWAMI
No ratings yet
Data Wrangling PDF
Document14 pages
Data Wrangling PDF
Edgar Guerrero
No ratings yet
Important Pandas Operations 1697910759
Document6 pages
Important Pandas Operations 1697910759
qhksfqnqsr
No ratings yet
Python Quiz
Document220 pages
Python Quiz
Ranvitha G
No ratings yet
Untitled2.ipynb - Colaboratory
Document2 pages
Untitled2.ipynb - Colaboratory
rvvnbrao
No ratings yet
Prac 303
Document19 pages
Prac 303
VIGHNESH RAJEEVAN
No ratings yet
String (Pandas) - Removing $ After Int Sales ( Revenue') Sales ( Revenue') .STR - Strip ( $') #Convert String To Int
Document12 pages
String (Pandas) - Removing $ After Int Sales ( Revenue') Sales ( Revenue') .STR - Strip ( $') #Convert String To Int
MAHMUDA AKTER KEYA
No ratings yet
MAU Practica4
Document16 pages
MAU Practica4
Paul Pinto J
No ratings yet
Pandas: Import
Document13 pages
Pandas: Import
hello
100% (1)
HIV Regression Source Code
Document26 pages
HIV Regression Source Code
Văn Thịnh
No ratings yet
Mathematics: Quarter 1 - Module 7 Polynomial Equation
Document10 pages
Mathematics: Quarter 1 - Module 7 Polynomial Equation
Limar Anasco Escaso
No ratings yet
S.No. Name of The Agency Contact Details: M/s M.P. Printers
Document3 pages
S.No. Name of The Agency Contact Details: M/s M.P. Printers
VIKKAS AGARWAL
No ratings yet
Report Project
Document8 pages
Report Project
Nadeem
No ratings yet
The Greatest Inventions in The Past 1000 Years
Document2 pages
The Greatest Inventions in The Past 1000 Years
Rosario Cabarrubias
No ratings yet
Information and Communication Technology
Document18 pages
Information and Communication Technology
Avantee Singh
No ratings yet
مختبر ابلايد 1
Document6 pages
مختبر ابلايد 1
عبد الرحمن أبوخاطر, أبو رزق
No ratings yet
Navfac Far East Waterfrnt Sec Sept 2019 A
Document37 pages
Navfac Far East Waterfrnt Sec Sept 2019 A
MICHAEL JOHN Tolonghari
No ratings yet
Juan Dela Cruz: UP Mindanao Security Personnel Profile
Document4 pages
Juan Dela Cruz: UP Mindanao Security Personnel Profile
invictuslorenzo
No ratings yet
Implementing Client Virtualization and Cloud Computing
Document33 pages
Implementing Client Virtualization and Cloud Computing
Ethan Boois
No ratings yet
Leanna Balinado - Puzzle Design
Document13 pages
Leanna Balinado - Puzzle Design
api-529497466
No ratings yet
Recurrent Neural Networks: Anahita Zarei, PH.D
Document37 pages
Recurrent Neural Networks: Anahita Zarei, PH.D
Nick
No ratings yet
The Role of Computers in The Development of Odia Language
Document5 pages
The Role of Computers in The Development of Odia Language
IJRASETPublications
No ratings yet
Valliammai Engineering College: SRM Nagar, Kattankulathur - 603 203
Document9 pages
Valliammai Engineering College: SRM Nagar, Kattankulathur - 603 203
nandini
No ratings yet
Practical 5 - Text Processing PDF
Document2 pages
Practical 5 - Text Processing PDF
Darian Chetty
No ratings yet
Mini Project Doc 2
Document25 pages
Mini Project Doc 2
Likhil Goud
No ratings yet
Revised Inauguration Function
Document2 pages
Revised Inauguration Function
Usha Krishna
No ratings yet
Repair The Great Conduit - Hob Walkthrough - Neoseeker
Document7 pages
Repair The Great Conduit - Hob Walkthrough - Neoseeker
Enla Mira Tetengo
No ratings yet
Sil4s en Silence Full
Document23 pages
Sil4s en Silence Full
Ashish Pawar
No ratings yet
Section 1 PDF
Document2 pages
Section 1 PDF
Robert Fergusto
No ratings yet
Gujarat Technological University
Document2 pages
Gujarat Technological University
Vijay Dharajiya
No ratings yet
Final Thesis On 3phase Fault Analysis
Document73 pages
Final Thesis On 3phase Fault Analysis
abel tibebu
No ratings yet
6.4 Worksheet
Document6 pages
6.4 Worksheet
mohanad sayed
No ratings yet
Rab Foodcourt Alt 1
Document176 pages
Rab Foodcourt Alt 1
cita
No ratings yet
B.tech (EEE 2011), Fresher Resume
Document3 pages
B.tech (EEE 2011), Fresher Resume
Madhar Sha
68% (28)
Danfoss Modbus2
Document4 pages
Danfoss Modbus2
erick paredes
No ratings yet
Disomat Opus: System Manual
Document36 pages
Disomat Opus: System Manual
rajban cement
No ratings yet
GPOA (PR Consultant)
Document7 pages
GPOA (PR Consultant)
Jeydrew TV
No ratings yet
Manual Warp Pyxis
Document13 pages
Manual Warp Pyxis
Pablo Ruiz
No ratings yet
Electric Substation Practices PDF
Document11 pages
Electric Substation Practices PDF
Shelke Pankaj
No ratings yet
Tge463 Pnqx778ya
Document96 pages
Tge463 Pnqx778ya
Charles Lee
No ratings yet
Summary of Mustafa Suleyman's The Coming Wave
From Everand
Summary of Mustafa Suleyman's The Coming Wave
Milkyway Media
No ratings yet
100M Offers Made Easy: Create Your Own Irresistible Offers by Turning ChatGPT into Alex Hormozi
From Everand
100M Offers Made Easy: Create Your Own Irresistible Offers by Turning ChatGPT into Alex Hormozi
Ben Preston
No ratings yet
ChatGPT Side Hustles 2024 - Unlock the Digital Goldmine and Get AI Working for You Fast with More Than 85 Side Hustle Ideas to Boost Passive Income, Create New Cash Flow, and Get Ahead of the Curve
From Everand
ChatGPT Side Hustles 2024 - Unlock the Digital Goldmine and Get AI Working for You Fast with More Than 85 Side Hustle Ideas to Boost Passive Income, Create New Cash Flow, and Get Ahead of the Curve
Alec Rowe
No ratings yet
Generative AI: The Insights You Need from Harvard Business Review
From Everand
Generative AI: The Insights You Need from Harvard Business Review
Harvard Business Review
Rating: 4.5 out of 5 stars
4.5/5 (2)
Scary Smart: The Future of Artificial Intelligence and How You Can Save Our World
From Everand
Scary Smart: The Future of Artificial Intelligence and How You Can Save Our World
Mo Gawdat
Rating: 4.5 out of 5 stars
4.5/5 (55)
HBR's 10 Must Reads on AI, Analytics, and the New Machine Age
From Everand
HBR's 10 Must Reads on AI, Analytics, and the New Machine Age
Michael E. Porter
Rating: 4.5 out of 5 stars
4.5/5 (69)
ChatGPT Money Machine 2024 - The Ultimate Chatbot Cheat Sheet to Go From Clueless Noob to Prompt Prodigy Fast! Complete AI Beginner’s Course to Catch the GPT Gold Rush Before It Leaves You Behind
From Everand
ChatGPT Money Machine 2024 - The Ultimate Chatbot Cheat Sheet to Go From Clueless Noob to Prompt Prodigy Fast! Complete AI Beginner’s Course to Catch the GPT Gold Rush Before It Leaves You Behind
Alec Rowe
No ratings yet
Mastering Large Language Models: Advanced techniques, applications, cutting-edge methods, and top LLMs (English Edition)
From Everand
Mastering Large Language Models: Advanced techniques, applications, cutting-edge methods, and top LLMs (English Edition)
Sanket Subhash Khandare
No ratings yet
Four Battlegrounds: Power in the Age of Artificial Intelligence
From Everand
Four Battlegrounds: Power in the Age of Artificial Intelligence
Paul Scharre
Rating: 5 out of 5 stars
5/5 (5)
The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
From Everand
The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
Pedro Domingos
Rating: 4.5 out of 5 stars
4.5/5 (107)
Deep Utopia: Life and Meaning in a Solved World
From Everand
Deep Utopia: Life and Meaning in a Solved World
Nick Bostrom
No ratings yet
Artificial Intelligence: The Insights You Need from Harvard Business Review
From Everand
Artificial Intelligence: The Insights You Need from Harvard Business Review
Thomas H. Davenport
Rating: 4.5 out of 5 stars
4.5/5 (104)
ChatGPT: Is the future already here?
From Everand
ChatGPT: Is the future already here?
Rodrigo Serzedello
Rating: 4 out of 5 stars
4/5 (21)
Working with AI: Real Stories of Human-Machine Collaboration (Management on the Cutting Edge)
From Everand
Working with AI: Real Stories of Human-Machine Collaboration (Management on the Cutting Edge)
Thomas H. Davenport
Rating: 5 out of 5 stars
5/5 (5)
ChatGPT Millionaire 2024 - Bot-Driven Side Hustles, Prompt Engineering Shortcut Secrets, and Automated Income Streams that Print Money While You Sleep. The Ultimate Beginner’s Guide for AI Business
From Everand
ChatGPT Millionaire 2024 - Bot-Driven Side Hustles, Prompt Engineering Shortcut Secrets, and Automated Income Streams that Print Money While You Sleep. The Ultimate Beginner’s Guide for AI Business
Alec Rowe
No ratings yet
Who's Afraid of AI?: Fear and Promise in the Age of Thinking Machines
From Everand
Who's Afraid of AI?: Fear and Promise in the Age of Thinking Machines
Thomas Ramge
Rating: 4.5 out of 5 stars
4.5/5 (13)
Writing AI Prompts For Dummies
From Everand
Writing AI Prompts For Dummies
Stephanie Diamond
No ratings yet
Artificial Intelligence: A Guide for Thinking Humans
From Everand
Artificial Intelligence: A Guide for Thinking Humans
Melanie Mitchell
Rating: 4.5 out of 5 stars
4.5/5 (30)
Python Machine Learning - Third Edition: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2, 3rd Edition
From Everand
Python Machine Learning - Third Edition: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2, 3rd Edition
Sebastian Raschka
Rating: 5 out of 5 stars
5/5 (2)
Your AI Survival Guide: Scraped Knees, Bruised Elbows, and Lessons Learned from Real-World AI Deployments
From Everand
Your AI Survival Guide: Scraped Knees, Bruised Elbows, and Lessons Learned from Real-World AI Deployments
Sol Rashidi
No ratings yet
Make Money with ChatGPT: Your Guide to Making Passive Income Online with Ease using AI: AI Wealth Mastery
From Everand
Make Money with ChatGPT: Your Guide to Making Passive Income Online with Ease using AI: AI Wealth Mastery
Ben Preston
No ratings yet
A Brief History of Artificial Intelligence: What It Is, Where We Are, and Where We Are Going
From Everand
A Brief History of Artificial Intelligence: What It Is, Where We Are, and Where We Are Going
Michael Wooldridge
Rating: 4.5 out of 5 stars
4.5/5 (3)
T-Minus AI: Humanity's Countdown to Artificial Intelligence and the New Pursuit of Global Power
From Everand
T-Minus AI: Humanity's Countdown to Artificial Intelligence and the New Pursuit of Global Power
Michael Kanaan
Rating: 4 out of 5 stars
4/5 (4)
The AI Advantage: How to Put the Artificial Intelligence Revolution to Work
From Everand
The AI Advantage: How to Put the Artificial Intelligence Revolution to Work
Thomas H. Davenport
Rating: 4 out of 5 stars
4/5 (7)
The Roadmap to AI Mastery: A Guide to Building and Scaling Projects
From Everand
The Roadmap to AI Mastery: A Guide to Building and Scaling Projects
Somdip Dey
No ratings yet
Hands-On System Design: Learn System Design, Scaling Applications, Software Development Design Patterns with Real Use-Cases
From Everand
Hands-On System Design: Learn System Design, Scaling Applications, Software Development Design Patterns with Real Use-Cases
Harsh Kumar Ramchandani
No ratings yet
HBR Guide to AI Basics for Managers
From Everand
HBR Guide to AI Basics for Managers
Harvard Business Review
No ratings yet
System Design Interview: 300 Questions And Answers: Prepare And Pass
From Everand
System Design Interview: 300 Questions And Answers: Prepare And Pass
Rob Botwright
No ratings yet
Architects of Intelligence: The truth about AI from the people building it
From Everand
Architects of Intelligence: The truth about AI from the people building it
Martin Ford
Rating: 4.5 out of 5 stars
4.5/5 (21)
3550+ Most Effective ChatGPT Prompts
From Everand
3550+ Most Effective ChatGPT Prompts
Om Prakash Saini
No ratings yet