Practical 3 - Categorical Feature Engineering

Artificial Intelligence and Machine Learning
PRACTICAL 3
Data Engineering - Feature Engineering
Categorical Feature Imputation and Encoding
Prepared by Nima Dema
29 November 2023
Table of Contents
0. Learning Objectives
1. Imputing Categorical features
1.1. Import required Libraries (Already done for you)
1.2. Load data from CSV file
1.3. Create new cdf containing categorical features
1.4. Check null values in cdf
1.5. Impute Categorical features
1.6. Drop features
2. Encoding Categorical features
2.1. Encode nominal features
2.2. Encode ordinal features
2.3. Encode target feature
2.4. Combine all encoded categorical features.
0. Learning Objectives
In this week’s practical session on data engineering, the focus is on categorical
feature engineering techniques, a crucial part of preparing data for machine
learning models. The goal is to equip participants with the skills needed to handle
categorical features effectively, including dealing with missing values and choosing
feature representations that improve model performance.
By the end of the lab, you should be able to:
➔ Apply feature engineering techniques for categorical features.
➔ Prepare categorical features for training machine learning models.
1. Imputing Categorical features

INSTRUCTIONS:
➔ Load data from the loan_train.csv file into a dataframe named df. Check the
datatypes of all the features. Are all the features stored in the expected type?
If not, use the astype method to convert them to the required types.
➔ From df, create a dataframe named cdf which contains only the categorical
features.
➔ Use an appropriate method to check for null values in the given dataset.
➔ Use sklearn's SimpleImputer to impute the missing values using the most
suitable strategy.

1.1. Import required Libraries (Already done for you)

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
1.2. Load data from CSV file

…….
…….
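As a rough sketch of this step (not the expected solution), the snippet below reads a tiny in-memory CSV and inspects its datatypes. The rows, the column names and the choice to treat Credit_History as categorical are assumptions made for illustration; in the lab you would read loan_train.csv itself.

```python
import io
import pandas as pd

# Hypothetical stand-in for loan_train.csv so the sketch runs on its own;
# in the lab you would call pd.read_csv("loan_train.csv") instead.
sample_csv = io.StringIO(
    "Loan_ID,Gender,Married,Dependents,Credit_History,Loan_Status\n"
    "LP001,Male,No,0,1.0,Y\n"
    "LP002,Female,Yes,1,0.0,N\n"
    "LP003,Male,Yes,3+,1.0,Y\n"
)
df = pd.read_csv(sample_csv)

# Inspect the type pandas inferred for every column
print(df.dtypes)

# Assumption: Credit_History is a 0/1 flag that was read as float,
# so it is converted here with astype and treated as categorical.
df["Credit_History"] = df["Credit_History"].astype("object")
print(df.dtypes)
```

Whether a numeric-looking column should really be categorical depends on what it means, so check each feature against the dataset description before converting it.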
1.3. Create new cdf containing categorical features

For this task, use the select_dtypes() method of the pandas dataframe with its
include parameter.
…….
…….
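One possible way to do this is sketched below on a small made-up dataframe (the rows and column names are assumptions). The sketch assumes text columns are stored with the object dtype; newer pandas releases may infer a dedicated string dtype instead, so the columns are cast explicitly to keep the selection predictable.

```python
import pandas as pd

# Hypothetical rows standing in for the loan data (column names assumed)
df = pd.DataFrame({
    "Gender": ["Male", "Female", None],
    "Education": ["Graduate", "Not Graduate", "Graduate"],
    "ApplicantIncome": [5849, 4583, 3000],
})

# Assumption: text columns use the object dtype; the cast makes that explicit
df = df.astype({"Gender": "object", "Education": "object"})

# Keep only the object-typed (categorical) columns
cdf = df.select_dtypes(include="object")
print(cdf.columns.tolist())
```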
1.4. Check null values in cdf

Use the isna() or isnull() method to check for null values. You may find the sum()
method useful here.
…….
…….
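A minimal sketch of the idea, using made-up values: isna() marks each missing cell as True, and sum() then counts the True values column by column.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical dataframe with a few missing entries
cdf = pd.DataFrame({
    "Gender": ["Male", np.nan, "Female", "Male"],
    "Married": ["Yes", "No", np.nan, np.nan],
})

# Count missing values per column
null_counts = cdf.isna().sum()
print(null_counts)
```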
1.5. Impute Categorical features

For this task, you need SimpleImputer from sklearn (the import is already done for
you). After the imputation is complete, check for null values again to verify that
the imputation step succeeded.
from sklearn.impute import SimpleImputer

…….
…….
…….
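The sketch below shows one way this could look, assuming the most_frequent strategy (filling each gap with the column's mode) is the one you pick. The data is made up, and wrapping the result back into a dataframe is a choice of this sketch, since fit_transform returns a NumPy array here.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical categorical dataframe with missing values
cdf = pd.DataFrame({
    "Gender": ["Male", np.nan, "Female", "Male"],
    "Married": ["Yes", "Yes", np.nan, "No"],
}, dtype="object")

# Assumption: replace each missing value with the most frequent category
imputer = SimpleImputer(strategy="most_frequent")
imputed = imputer.fit_transform(cdf)

# Rebuild the dataframe, keeping the original column names and index
cdf = pd.DataFrame(imputed, columns=cdf.columns, index=cdf.index)

# Verify that no null values remain
print(cdf.isna().sum())
```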

1.6. Drop features

The Loan_ID column contains a unique value for every training example, so it carries
no information a model can learn from. Let's drop it from our dataframe. Use the
drop() method for this task.
…….
…….
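As an illustration on made-up rows, the sketch below first confirms that every Loan_ID is distinct and then removes the column. Passing columns=[...] to drop() is one of several equivalent ways to name the column.

```python
import pandas as pd

# Hypothetical categorical dataframe that still contains the identifier
cdf = pd.DataFrame({
    "Loan_ID": ["LP001", "LP002", "LP003"],
    "Gender": ["Male", "Female", "Male"],
})

# Every row has its own ID, so the column is an identifier, not a feature
all_unique = cdf["Loan_ID"].nunique() == len(cdf)
print(all_unique)

# Drop the identifier column
cdf = cdf.drop(columns=["Loan_ID"])
print(cdf.columns.tolist())
```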
2. Encoding Categorical features

INSTRUCTIONS:
➔ Choose all nominal features in your cdf and apply one-hot encoding. Create
nominaldf which contains the encoded nominal features.
➔ Choose all ordinal features in your cdf and apply OrdinalEncoder to encode
them. Create ordinaldf which contains the encoded ordinal features.
➔ Use LabelEncoder to encode the target feature.
2.1. Encode nominal features

The values of features such as Gender, Married and Self_Employed do not possess any
order. They are nominal features. Encode them using OneHotEncoder from sklearn.
For this task, you need to import OneHotEncoder (already done for you). Then
convert the encoded features back to a dataframe, since sklearn returns an array
(a sparse matrix by default for OneHotEncoder) rather than a dataframe.
from sklearn.preprocessing import OneHotEncoder

#Create one hot encoder object
ohe = OneHotEncoder()
#use fit_transform method to actually transform your categorical
#features into numbers
…….
…….
#Now convert encoded features into dataframe
…….

…….
2.2. Encode ordinal features

The values of features such as Dependents, Education and Property_Area possess a
natural ordering. They are ordinal features. Encode them using OrdinalEncoder from
sklearn. For this task, you need to import OrdinalEncoder (already done for you).
Then convert the encoded features back to a dataframe, since sklearn returns an
array.
from sklearn.preprocessing import OrdinalEncoder
categories_val = [['0','1','2','3+'],['Not Graduate','Graduate'],
                  ['Rural','Semiurban','Urban']]
#create ordinalencoder's object
oe = OrdinalEncoder(categories = categories_val)
#use fit_transform method to actually transform your categorical
#features into numbers
…….
…….
#Now convert encoded features into dataframe
…….
…….
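The sketch below shows how the remaining steps might look on made-up rows. The list ordinal_cols is an assumption of this sketch; its column order must match the order of the lists in categories_val, because each inner list gives the ranking (lowest to highest) for one column.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical imputed categorical dataframe
cdf = pd.DataFrame({
    "Dependents": ["0", "3+", "1", "2"],
    "Education": ["Graduate", "Not Graduate", "Graduate", "Graduate"],
    "Property_Area": ["Urban", "Rural", "Semiurban", "Urban"],
}, dtype="object")

ordinal_cols = ["Dependents", "Education", "Property_Area"]  # assumed order

# One ordered category list per column, in the same order as ordinal_cols
categories_val = [['0', '1', '2', '3+'],
                  ['Not Graduate', 'Graduate'],
                  ['Rural', 'Semiurban', 'Urban']]

oe = OrdinalEncoder(categories=categories_val)
encoded = oe.fit_transform(cdf[ordinal_cols])  # NumPy array of floats

# Convert the encoded array back into a dataframe
ordinaldf = pd.DataFrame(encoded, columns=ordinal_cols, index=cdf.index)
print(ordinaldf)
```

Each value is replaced by its position in the corresponding list, so '3+' becomes 3.0 and 'Rural' becomes 0.0.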
2.3. Encode target feature

Encoding the target feature is optional, since many algorithms implemented in
sklearn accept it in categorical format. However, in this step, we are going to
encode the target using LabelEncoder. Note that you don’t have to convert the
encoded target to a dataframe.
from sklearn.preprocessing import LabelEncoder

#create Labelencoder's object
le = LabelEncoder()
#use fit_transform method to actually transform your target feature

…….
…….
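As a hedged illustration, the snippet below encodes a made-up target column. The column name Loan_Status and its Y/N values are assumptions; use whichever column is the target in your dataframe.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical target column (name and values assumed)
loan_status = pd.Series(["Y", "N", "Y", "Y"], name="Loan_Status")

# Create the encoder and transform the target into integer labels
le = LabelEncoder()
target = le.fit_transform(loan_status)

# classes_ lists the original labels; a label's position is its integer code
print(le.classes_)
print(target)
```

LabelEncoder assigns codes in sorted order of the labels, so it is meant for targets rather than for ordinal input features.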

2.4. Combine all encoded categorical features.

After encoding our categorical features, we have two separate dataframes that store
the nominal and ordinal features. Since we need all our features for training an ML
model, let's combine them, and also add the target to the combined dataframe. The
dataframe which contains all encoded categorical features is called categoricaldf.
For this task, use the pandas concat() method.
#Since nominaldf and ordinaldf are dataframes, we can concatenate them
categoricaldf = pd.concat([nominaldf,ordinaldf],axis=1)
#let's add our encoded target variable to categoricaldf
categoricaldf['Target'] = target
categoricaldf.head()
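One detail worth checking when combining: concat() with axis=1 lines rows up by index label, not by position. The sketch below uses made-up encoded frames whose index skips a label (as could happen if rows were removed earlier) to show what goes wrong when one frame's index has been reset. All values here are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical encoded frames sharing the same non-contiguous index
nominaldf = pd.DataFrame({"Gender_Male": [1.0, 0.0, 1.0]}, index=[0, 2, 3])
ordinaldf = pd.DataFrame({"Education": [1.0, 0.0, 1.0]}, index=[0, 2, 3])
target = np.array([1, 0, 1])

# Matching indexes: rows line up and nothing is lost
categoricaldf = pd.concat([nominaldf, ordinaldf], axis=1)
categoricaldf["Target"] = target  # a NumPy array is assigned by position
print(categoricaldf)

# Mismatched indexes: concat takes the union of labels and fills gaps with NaN
misaligned = pd.concat([nominaldf, ordinaldf.reset_index(drop=True)], axis=1)
print(misaligned)
```

Passing index=cdf.index when you build nominaldf and ordinaldf is one way to keep the indexes consistent.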
Note: We will reuse categoricaldf together with the numerical features, which we
will work on in next week's lab practical.
THANK YOU ☺
