Practicum Data Analysis Takeaways Course1 Theme6 Us

The document discusses various data preprocessing techniques in Python including: 1. The set_axis(), isnull(), isna(), fillna(), and replace() methods for modifying, finding, and filling missing values. 2. The dropna(), duplicated(), drop_duplicates(), and unique() methods for deleting rows with missing values and identifying duplicate rows. 3. Guidelines for column names and handling missing values such as deleting them or filling them in using available data.

Uploaded by

Thai Chi

Data preprocessing

Syntax
The set_axis() method for modifying column names
In df = df.set_axis(['a', 'b', 'c'], axis = 'columns')
# the first argument is a list of new column names
# the axis argument with the value 'columns' applies the change to the columns
# set_axis() returns a new DataFrame, so assign the result back;
# the 'inplace' argument was removed in pandas 2.0
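A minimal sketch of renaming columns with set_axis(). The table and its column names are made up for illustration, and the snippet assigns the result back instead of using 'inplace', which should work in both older and current pandas versions.

```python
import pandas as pd

# Hypothetical table with inconsistent column names (illustrative data only).
df = pd.DataFrame(
    [[1, "Rock", "Moscow"], [2, "Pop", "Boston"]],
    columns=["  userID", "Genre Name", "City  "],
)

# set_axis() returns a new DataFrame, so the result is assigned back to df.
df = df.set_axis(["user_id", "genre_name", "city"], axis="columns")

print(df.columns.tolist())  # ['user_id', 'genre_name', 'city']
```

The list passed to set_axis() must contain exactly one name per column; otherwise pandas raises an error.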

The isnull() and isna() methods for finding missing values
In df.isnull()
   df.isna()
   df.isnull().sum()
   df.isna().sum()

The fillna() method for filling in missing values
In df = df.fillna(0)
# the argument is the new value that will replace all the missing values

The dropna() method for deleting missing values
In df.dropna()
# delete all rows containing at least one missing value
In df.dropna(subset = ['a','b','c'], inplace = True)
# the 'subset' argument is the names of the columns in which you need to find missing values
In df.dropna(axis = 'columns', inplace = True)
# the axis argument with the value 'columns' deletes columns with at least one missing value
The duplicated() method for finding duplicates
In df.duplicated()
# along with the method sum(), returns the total number of duplicates
In df.duplicated().sum()

The drop_duplicates() method for removing duplicates
In df.drop_duplicates().reset_index(drop = True)
# the argument drop with the value True avoids creating a column with old index values

'''
When drop_duplicates() removes repeating rows, their indices are removed
along with them, which is why reset_index() is called afterwards.
'''

The unique() method for seeing all unique values in a column
In df['column'].unique()
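A short sketch of the duplicate-handling methods on invented data. It also shows why reset_index() is useful: without it, the index keeps gaps where the duplicate rows used to be.

```python
import pandas as pd

# Illustrative table: rows 2 and 4 repeat row 1.
df = pd.DataFrame(
    {
        "user_id": [1, 2, 2, 3, 2],
        "city": ["Boston", "Austin", "Austin", "Boston", "Austin"],
    }
)

# Each repeat after the first occurrence counts as a duplicate.
n_duplicates = df.duplicated().sum()

# Without reset_index() the surviving rows keep their old labels: 0, 1, 3.
gappy_index = df.drop_duplicates().index.tolist()

# With reset_index(drop=True) the index is renumbered: 0, 1, 2.
deduped = df.drop_duplicates().reset_index(drop=True)

# Unique values of one column, in order of first appearance.
cities = df["city"].unique()

print(n_duplicates, gappy_index, deduped.index.tolist(), list(cities))
```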

The replace() method for replacing values in a table or column:

In df.replace('first_value', 'second_value')

# the first argument is the current value

# the second argument is the new value
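As a hedged illustration of replace() on a single column versus a whole table, the sketch below uses made-up genre and mood values. With plain string arguments, replace() matches whole cell values, not substrings.

```python
import pandas as pd

# Illustrative data: 'hip' appears in both columns.
df = pd.DataFrame(
    {
        "genre": ["hip", "rock", "hip-hop", "hip"],
        "mood": ["hip", "calm", "calm", "loud"],
    }
)

# Replace in one column only; the 'hip' in 'mood' is left alone.
df["genre"] = df["genre"].replace("hip", "hip-hop")

# Replace across the whole table; returns a new DataFrame.
relaxed = df.replace("calm", "relaxed")

print(df["genre"].tolist())
print(relaxed["mood"].tolist())
```

Because the match is on the full cell value, the existing 'hip-hop' entry is not turned into 'hip-hop-hop'.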
Glossary

Preprocessing
Preparing data for subsequent analysis. The idea is to find and eliminate potential problems in the data.

GIGO (garbage in, garbage out)
The principle that when you have poor input data, even the best analytical algorithm will return poor results.

A table that makes it easy to analyze data:
• each column stores the values for one variable
• each row contains one observation, to which the values for the different variables are tied

Column names
• without spaces at the beginning, at the end, or in the middle
• multiple words are separated by underscores
• in the same language and case
• briefly describe the kind of information each column contains

Missing values come in different forms:
• most often, they're None or NaN
• placeholders of a generally accepted standard, sometimes one you don't know about, but which the compilers stick to. Most often, they're n/a, na, NA, and N.N. or NN
• a random value the creators of a source data table have decided to use

Missing values can be deleted or filled in using available data:
• the upside to deleting them is that it's a simple process. It also makes sure that the remaining data is clean and matches all the requirements. Potential downsides: losing important information and reducing accuracy.
• filling in missing values lets you save the most data. An obvious drawback is that the filled-in values are only estimates based on existing data, so the results can be poor.

There are different kinds of duplicates:
• two or more rows containing identical information. Lots of repetitions pad out tables, forcing us to spend more time processing data.
• categories with different names but identical subject matter (Politics and Political Situation, for instance). Disguised repetitions can cause serious roadblocks for analysis that are difficult to pinpoint.
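Two of the glossary ideas, placeholder missing values and disguised duplicates, can be sketched together. The table, the placeholder list, and the choice of 'Politics' as the canonical category name are assumptions made for illustration, not part of the original cheat sheet.

```python
import numpy as np
import pandas as pd

# Illustrative data with a placeholder ('n/a') and a disguised duplicate category.
df = pd.DataFrame(
    {
        "topic": ["Politics", "Political Situation", "Sports", "n/a", "Politics"],
        "views": [120, 120, 80, 15, 120],
    }
)

# Only row 4 is an exact repeat of row 0 at this point.
duplicates_before = df.duplicated().sum()

# Turn common placeholder strings into real missing values.
placeholders = ["n/a", "na", "NA", "N.N.", "NN"]
df["topic"] = df["topic"].replace(placeholders, np.nan)
missing_topics = df["topic"].isna().sum()

# Merge the disguised duplicate category into one name.
df["topic"] = df["topic"].replace("Political Situation", "Politics")

# Row 1 now also repeats row 0, so the hidden duplicate becomes visible.
duplicates_after = df.duplicated().sum()

print(duplicates_before, missing_topics, duplicates_after)
```

Checking df['topic'].unique() before choosing which names to merge is a reasonable first step, since disguised duplicates only show up when the distinct values are inspected.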
