EDA - Session-7 - Convert Categorical To Numerical

The document discusses techniques for encoding categorical variables in machine learning. It shows how to map categorical variables in a dataset to numerical values using dictionaries. Specifically, it maps the 'continent' and 'case_status' columns to integer values. It then loops through all other categorical columns, gets unique values for each, maps them to integers, and applies this mapping to the dataframe. This encoding process prepares the categorical variables for machine learning models which require numerical input.

Uploaded by

jeeshu048

In [1]: # Import the packages

# read the data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

path=r"C:\Users\omkar\OneDrive\Documents\Data science\Naresh IT\Datafiles\Vi
visa_df=pd.read_csv(path)
visa_df.head(3)

Out[1]: case_id continent education_of_employee has_job_experience requires_job_training no_o

0 EZYV01 Asia High School N N

1 EZYV02 Asia Master's Y N

2 EZYV03 Asia Bachelor's N Y

 

In machine learning, it is very important to convert categorical data to numerical data.

Machine learning models are built on mathematics.
They take their input only in the form of numbers.
To convert categories to numbers, we have some encoding techniques:
Label encoding
    map
    np.where
    using the sklearn package: LabelEncoder
One-hot encoding
    using the pandas package: pd.get_dummies
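
The list above names np.where and pd.get_dummies. As a minimal sketch of what these two calls do, the example below uses a small made-up DataFrame (the `toy` frame and its values are assumptions for illustration, not the visa dataset).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the visa dataset)
toy = pd.DataFrame({
    "has_job_experience": ["N", "Y", "N", "Y"],
    "continent": ["Asia", "Africa", "Asia", "Europe"],
})

# np.where: label encoding for a two-label column (Y -> 1, anything else -> 0)
toy["has_job_experience_num"] = np.where(toy["has_job_experience"] == "Y", 1, 0)

# pd.get_dummies: one-hot encoding, one 0/1 indicator column per label
dummies = pd.get_dummies(toy["continent"], prefix="continent", dtype=int)

print(toy["has_job_experience_num"].tolist())
print(dummies.columns.tolist())
```

np.where suits columns with only two labels. pd.get_dummies avoids implying an order between labels, at the cost of one extra column per label.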

map
Before applying the map method, first get the unique labels of the column.
For example, case_status is a categorical column.
It has two unique labels:
    Denied
    Certified
Create a dictionary with each label as a key and a number as its value:
d={'Certified':0,'Denied':1}
Then map this dictionary onto the case_status column.

In [2]: visa_df['case_status'].unique()

Out[2]: array(['Denied', 'Certified'], dtype=object)

In [4]: d={'Certified':0,'Denied':1}
visa_df['case_status']=visa_df['case_status'].map(d)
In [5]: visa_df

Out[5]: case_id continent education_of_employee has_job_experience requires_job_trainin

0 EZYV01 Asia High School N

1 EZYV02 Asia Master's Y

2 EZYV03 Asia Bachelor's N

3 EZYV04 Asia Bachelor's N

4 EZYV05 Africa Master's Y

... ... ... ... ...

25475 EZYV25476 Asia Bachelor's Y

25476 EZYV25477 Asia High School Y

25477 EZYV25478 Asia Master's Y

25478 EZYV25479 Asia Master's Y

25479 EZYV25480 Asia Bachelor's Y

25480 rows × 12 columns

 

In [7]: # Whenever you get an error, re-read the data

# and run all the steps together
path=r"C:\Users\omkar\OneDrive\Documents\Data science\Naresh IT\Datafiles\Vi
visa_df=pd.read_csv(path)
d={'Certified':0,'Denied':1}
visa_df['case_status']=visa_df['case_status'].map(d)
visa_df

Out[7]: case_id continent education_of_employee has_job_experience requires_job_trainin

0 EZYV01 Asia High School N

1 EZYV02 Asia Master's Y

2 EZYV03 Asia Bachelor's N

3 EZYV04 Asia Bachelor's N

4 EZYV05 Africa Master's Y

... ... ... ... ...

25475 EZYV25476 Asia Bachelor's Y

25476 EZYV25477 Asia High School Y

25477 EZYV25478 Asia Master's Y

25478 EZYV25479 Asia Master's Y

25479 EZYV25480 Asia Bachelor's Y

25480 rows × 12 columns
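
One likely reason for re-reading the data before mapping again: once a column has been mapped, its values are numbers, so running the same map a second time finds none of the dictionary keys and fills the column with NaN. A minimal sketch on a made-up Series (the `status` values are an assumption for illustration):

```python
import pandas as pd

status = pd.Series(["Denied", "Certified", "Denied"])  # hypothetical sample
d = {"Certified": 0, "Denied": 1}

once = status.map(d)   # labels are replaced by their numbers
twice = once.map(d)    # 0 and 1 are not keys of d, so every value becomes NaN

print(once.tolist())
print(twice.isna().all())
```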

 
In [10]: d={}
labels=visa_df['continent'].unique()
# build {label: number}, numbering labels in order of first appearance
for i in range(len(labels)):
    d[labels[i]]=i

visa_df['continent']=visa_df['continent'].map(d)
visa_df

Out[10]: case_id continent education_of_employee has_job_experience requires_job_trainin

0 EZYV01 0 High School N

1 EZYV02 0 Master's Y

2 EZYV03 0 Bachelor's N

3 EZYV04 0 Bachelor's N

4 EZYV05 1 Master's Y

... ... ... ... ...

25475 EZYV25476 0 Bachelor's Y

25476 EZYV25477 0 High School Y

25477 EZYV25478 0 Master's Y

25478 EZYV25479 0 Master's Y

25479 EZYV25480 0 Bachelor's Y

25480 rows × 12 columns
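
The same dictionary can be built more compactly with enumerate, and pandas also offers pd.factorize, which returns the integer codes and the unique labels in one call. This is a sketch of alternatives on made-up values, not a replacement for the cell above:

```python
import pandas as pd

continent = pd.Series(["Asia", "Asia", "Africa", "Asia", "Europe"])  # hypothetical sample

# enumerate yields (number, label) pairs, so no range(len(...)) is needed
d = {label: i for i, label in enumerate(continent.unique())}
mapped = continent.map(d)

# pd.factorize numbers the labels in order of first appearance
codes, uniques = pd.factorize(continent)

print(mapped.tolist())
print(codes.tolist(), list(uniques))
```

Both approaches number the labels in order of first appearance, so they give the same codes here.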

 

In [14]: path=r"C:\Users\omkar\OneDrive\Documents\Data science\Naresh IT\Datafiles\Vi

visa_df=pd.read_csv(path)
visa_df

Out[14]: case_id continent education_of_employee has_job_experience requires_job_trainin

0 EZYV01 Asia High School N

1 EZYV02 Asia Master's Y

2 EZYV03 Asia Bachelor's N

3 EZYV04 Asia Bachelor's N

4 EZYV05 Africa Master's Y

... ... ... ... ...

25475 EZYV25476 Asia Bachelor's Y

25476 EZYV25477 Asia High School Y

25477 EZYV25478 Asia Master's Y

25478 EZYV25479 Asia Master's Y

25479 EZYV25480 Asia Bachelor's Y

25480 rows × 12 columns

 
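In [ ]: # case_id is also a categorical column,
# but it has 25480 unique labels (one per row),
# so the loop would run for i in range(25480)
# and give every case_id its own number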

In [25]: # Read the data

path=r"C:\Users\omkar\OneDrive\Documents\Data science\Naresh IT\Datafiles\Vi
visa_df=pd.read_csv(path)
cat_cols=visa_df.select_dtypes(include='object').columns

for j in cat_cols[1:]: # j = column name; [1:] skips case_id
    d={} # start a fresh dictionary for each column
    labels=visa_df[j].unique()
    for i in range(len(labels)): # i = number assigned to the label
        d[labels[i]]=i
    visa_df[j]=visa_df[j].map(d)

visa_df

Out[25]: case_id continent education_of_employee has_job_experience requires_job_trainin

0 EZYV01 0 0 0

1 EZYV02 0 1 1

2 EZYV03 0 2 0

3 EZYV04 0 2 0

4 EZYV05 1 1 1

... ... ... ... ...

25475 EZYV25476 0 2 1

25476 EZYV25477 0 0 1

25477 EZYV25478 0 1 1

25478 EZYV25479 0 1 1

25479 EZYV25480 0 2 1

25480 rows × 12 columns
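
After a loop like this, the dictionary for each column is gone, so the numbers cannot be translated back to labels. A minimal sketch that keeps one dictionary per column is shown below. The `toy` frame, the `mappings` name, and the explicit column list are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data (not the visa dataset)
toy = pd.DataFrame({
    "case_id": ["A1", "A2", "A3"],
    "continent": ["Asia", "Africa", "Asia"],
    "case_status": ["Denied", "Certified", "Denied"],
})

cols_to_encode = ["continent", "case_status"]  # case_id is left out on purpose
mappings = {}  # column name -> {label: number}

for col in cols_to_encode:
    mappings[col] = {label: i for i, label in enumerate(toy[col].unique())}
    toy[col] = toy[col].map(mappings[col])

# Decode one column again by inverting its stored dictionary
inverse = {number: label for label, number in mappings["continent"].items()}
decoded = toy["continent"].map(inverse)

print(mappings)
print(decoded.tolist())
```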

 

In [16]: cat_cols=visa_df.select_dtypes(include='object').columns
cat_cols[1:]

Out[16]: Index(['continent', 'education_of_employee', 'has_job_experience',
       'requires_job_training', 'region_of_employment', 'unit_of_wage',
       'full_time_position', 'case_status'],
      dtype='object')

In [ ]: # ID columns are usually dropped before modeling:

# a value that is unique to every row gives the model no useful information
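
A minimal sketch of that step, assuming a made-up frame with an ID column: check that the column really has one distinct value per row, then drop it.

```python
import pandas as pd

# Hypothetical toy data (not the visa dataset)
toy = pd.DataFrame({
    "case_id": ["EZYV01", "EZYV02", "EZYV03"],
    "continent": ["Asia", "Asia", "Africa"],
})

# An ID-like column has as many unique values as the frame has rows
is_id_like = toy["case_id"].nunique() == len(toy)

# Drop it only when the check confirms it
features = toy.drop(columns=["case_id"]) if is_id_like else toy

print(is_id_like, list(features.columns))
```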

LabelEncoder

LabelEncoder is a class available in the sklearn (scikit-learn) package.

Scikit-learn is at the heart of ML in Python.
Import the class
Create an instance and save it in a variable
Apply fit_transform

In [26]: # Read the data again

path=r"C:\Users\omkar\OneDrive\Documents\Data science\Naresh IT\Datafiles\Vi
visa_df=pd.read_csv(path)

In [29]: from sklearn.preprocessing import LabelEncoder # import the class

le=LabelEncoder() # create an encoder instance and save it
visa_df['case_status']=le.fit_transform(visa_df['case_status'])
visa_df

Out[29]: case_id continent education_of_employee has_job_experience requires_job_trainin

0 EZYV01 Asia High School N

1 EZYV02 Asia Master's Y

2 EZYV03 Asia Bachelor's N

3 EZYV04 Asia Bachelor's N

4 EZYV05 Africa Master's Y

... ... ... ... ...

25475 EZYV25476 Asia Bachelor's Y

25476 EZYV25477 Asia High School Y

25477 EZYV25478 Asia Master's Y

25478 EZYV25479 Asia Master's Y

25479 EZYV25480 Asia Bachelor's Y

25480 rows × 12 columns
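
Unlike the hand-written dictionary, a fitted LabelEncoder remembers its labels. The sketch below, on a made-up Series, shows where to find them (classes_) and how to convert numbers back (inverse_transform). LabelEncoder sorts the labels, so here Certified becomes 0 and Denied becomes 1.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

status = pd.Series(["Denied", "Certified", "Denied"])  # hypothetical sample

le = LabelEncoder()
encoded = le.fit_transform(status)        # learn the labels and encode in one step

print(le.classes_)                        # labels in sorted order; position = code
restored = le.inverse_transform(encoded)  # numbers back to the original labels
print(encoded.tolist(), list(restored))
```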

 

In [30]: path=r"C:\Users\omkar\OneDrive\Documents\Data science\Naresh IT\Datafiles\Vi

visa_df=pd.read_csv(path)
cat_cols= visa_df.select_dtypes(include='object').columns
cat_cols # includes case_id, which we avoid encoding

Out[30]: Index(['case_id', 'continent', 'education_of_employee', 'has_job_experience',
       'requires_job_training', 'region_of_employment', 'unit_of_wage',
       'full_time_position', 'case_status'],
      dtype='object')

In [ ]:
