PRACTICAL 3
Table of Contents
0. Learning Objectives
1. Imputing Categorical Features
1.1. Import Required Libraries (already done for you)
0. Learning Objectives
This week's practical session on data engineering focuses on categorical
feature engineering techniques, a crucial part of preparing data for machine
learning models. The goal is to equip participants with the skills needed to
handle categorical features effectively: dealing with missing values and
choosing feature representations that improve model performance.
➔ Load the data from the loan_train.csv file into a dataframe named df. Check the
datatypes of all the features. Is every feature stored with the expected type?
If not, use the astype method to convert features to the required types.
➔ From df, create a dataframe named cdf that contains only the categorical
features.
➔ Use an appropriate method to check for null values in the dataset.
➔ Use sklearn's SimpleImputer to impute the missing values using the most suitable
strategy.
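The steps above can be sketched as follows. This is a minimal illustration, not the reference solution: because loan_train.csv is not reproduced here, the sketch builds a small stand-in dataframe whose column names and values are assumptions, and it assumes "most_frequent" is a reasonable strategy for categorical columns.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# In the practical you would load the real file instead:
#   df = pd.read_csv("loan_train.csv")
# Toy stand-in; column names and values are hypothetical.
df = pd.DataFrame({
    "Gender": ["Male", "Female", np.nan, "Male", "Male"],
    "Dependents": ["0", "3+", "1", np.nan, "0"],
    "Education": ["Graduate", "Not Graduate", "Graduate", "Graduate", "Graduate"],
    "Credit_History": [1.0, 0.0, 1.0, 1.0, np.nan],
    "LoanAmount": [128.0, 66.0, 120.0, 141.0, 95.0],
})

# Inspect the inferred datatypes.
print(df.dtypes)

# Assumption: Credit_History is a 0/1 flag, so treat it as categorical
# rather than as a continuous number.
df["Credit_History"] = df["Credit_History"].astype("object")

# Keep only the non-numeric (categorical) features. Casting to object
# keeps missing entries as np.nan, which SimpleImputer looks for by default.
cdf = df.select_dtypes(exclude="number").astype(object)

# Count the null values per column.
print(cdf.isnull().sum())

# Fill each column's missing entries with its most frequent value.
imputer = SimpleImputer(strategy="most_frequent")
cdf_imputed = pd.DataFrame(
    imputer.fit_transform(cdf), columns=cdf.columns, index=cdf.index
)
print(cdf_imputed.isnull().sum())
```

The "mean" and "median" strategies only apply to numeric data, which is why "most_frequent" (or "constant" with a fill_value) is the usual choice for categorical columns.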
…….
➔ Choose all nominal features in your cdf and apply one-hot encoding. Create
nominaldf, which contains the encoded nominal features.
➔ Choose all ordinal features in your cdf and apply OrdinalEncoder to encode
them. Create ordinaldf, which contains the encoded ordinal features.
➔ Use LabelEncoder to encode the target feature.
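As a rough sketch of the one-hot and target-encoding tasks, the example below uses pandas get_dummies for the nominal features and LabelEncoder for the target. The column names (Gender, Married, Loan_Status) and the toy values are assumptions standing in for the real cdf, and sklearn's OneHotEncoder would be an equally valid choice for the first step.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for an already imputed cdf; names and values are hypothetical.
cdf = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Male"],
    "Married": ["Yes", "No", "Yes", "No"],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})

# Assumed nominal (unordered) features.
nominal_cols = ["Gender", "Married"]

# One-hot encode: one 0/1 column per category of each nominal feature.
nominaldf = pd.get_dummies(cdf[nominal_cols], dtype=int)
print(nominaldf)

# Encode the target; LabelEncoder assigns integers to the sorted class labels.
le = LabelEncoder()
target = le.fit_transform(cdf["Loan_Status"])
print(le.classes_, target)
```

LabelEncoder is intended for the target only; input features with a meaningful order are better served by OrdinalEncoder, where the order can be stated explicitly.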
…….
categories_val = [['0','1','2','3+'],['Not Graduate','Graduate'],['Rural','Semiurban','Urban']]
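One way categories_val might be used is to pass it to OrdinalEncoder so that each feature's categories are mapped to integers in the listed order. The sketch below assumes the three lists correspond to columns named Dependents, Education and Property_Area, in that order; those names and the toy rows are assumptions, not taken from the handout.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy stand-in for the ordinal part of cdf; names and values are hypothetical.
cdf = pd.DataFrame({
    "Dependents": ["0", "3+", "1", "2"],
    "Education": ["Graduate", "Not Graduate", "Graduate", "Graduate"],
    "Property_Area": ["Urban", "Rural", "Semiurban", "Urban"],
})
ordinal_cols = ["Dependents", "Education", "Property_Area"]

# One list per column, in the same order as ordinal_cols; the position of
# each category in its list becomes its encoded value.
categories_val = [['0', '1', '2', '3+'],
                  ['Not Graduate', 'Graduate'],
                  ['Rural', 'Semiurban', 'Urban']]

encoder = OrdinalEncoder(categories=categories_val)
ordinaldf = pd.DataFrame(
    encoder.fit_transform(cdf[ordinal_cols]),
    columns=ordinal_cols,
    index=cdf.index,
)
print(ordinaldf)
```

Without the categories argument, OrdinalEncoder orders categories by sorting them, which would not necessarily match the intended ranking (for example, 'Graduate' would sort before 'Not Graduate').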
Note: We will reuse categoricaldf together with the numerical features, which we
will work on in next week's lab practical.
THANK YOU ☺