Feature Engineering
Better Features Make Better Models
Which is better?
1. A good algorithm with low-quality columns
2. A bad algorithm with high-quality columns
What is FE?
Using domain knowledge to extract features from raw data.
These features can then be used to improve machine learning algorithms.
Categories of FE
• Feature Transformation
• Feature Construction
• Feature Selection
• Feature Extraction
Feature Transformation
• Handle Missing Values
• Handle Categorical Features
• Identify Outliers
• Feature Scaling
Fill Missing Values
Handle Categorical Data (One-Hot Encoding)
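One-hot encoding replaces a categorical column with one binary column per category. A minimal sketch using pandas, with a made-up `colour` column as example data:

```python
import pandas as pd

# Hypothetical example data: a single categorical column
df = pd.DataFrame({"colour": ["red", "green", "blue", "green"]})

# One-hot encoding: one binary (0/1) column per category
encoded = pd.get_dummies(df, columns=["colour"])
print(encoded.columns.tolist())
# -> ['colour_blue', 'colour_green', 'colour_red']
```

`sklearn.preprocessing.OneHotEncoder` does the same job and is preferred inside a scikit-learn pipeline, since it remembers the categories seen during `fit`.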
Identify Outliers
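A common way to identify outliers is the IQR rule: flag points outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR]. A small sketch with made-up data:

```python
import numpy as np

# Hypothetical data with one obvious outlier (95)
values = np.array([10, 12, 11, 13, 12, 11, 95])

# IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]
print(outliers)  # -> [95]
```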
Feature Scaling
Feature Construction – Building New Features
Example: Creating a new food recipe
The basic ingredients are like the raw data you start with.
Making a new sauce or grating the cheese is like feature construction, where you
combine or transform the raw data to make new features that could help
your machine learning model (the recipe) perform better and impress
your guests (achieve better accuracy).
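In code, feature construction is often just combining existing columns into a new one. A minimal sketch with hypothetical height/weight data, constructing a BMI feature:

```python
import pandas as pd

# Hypothetical raw data: two basic "ingredients"
df = pd.DataFrame({"height_m": [1.70, 1.60], "weight_kg": [68.0, 80.0]})

# Construct a new feature by combining the raw columns
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2
print(df["bmi"].round(2).tolist())  # -> [23.53, 31.25]
```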
Feature Extraction
Handle Missing Values
• Remove the entire row (not recommended)
• Impute (fill) the values (univariate / multivariate)
Univariate – Simple Imputer Class
Numerical: Mean, Median, Random Value, End of Distribution
Categorical: Mode (Most Frequent), "Missing" (as a new category)
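A minimal sketch of univariate imputation with scikit-learn's `SimpleImputer`, filling a missing value with the column mean (the data here is made up):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical numerical column with one missing value
X = np.array([[1.0], [2.0], [np.nan], [5.0]])

# Univariate: fill NaNs using only this column's own statistics;
# strategy can also be "median" or "most_frequent" (for categorical data)
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)  # NaN becomes mean(1, 2, 5) = 8/3
```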
Multivariate
KNN Imputer
Iterative Imputer
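Unlike the univariate case, multivariate imputers estimate a missing value from the other columns as well. A sketch of both, on a small made-up dataset where the second column is exactly twice the first:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import KNNImputer, IterativeImputer

# Hypothetical data: column 2 = 2 * column 1, with one value missing
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, np.nan], [4.0, 8.0]])

# KNN: average the feature over the k nearest complete rows
knn = KNNImputer(n_neighbors=2)
X_knn = knn.fit_transform(X)

# Iterative: model each feature as a regression on the others
iterative = IterativeImputer(random_state=0)
X_iter = iterative.fit_transform(X)
```

Both should fill the missing value close to 6, since the neighbouring rows (and the linear relationship) point there.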
Complete Case Analysis (CCA)
AKA "List-Wise Deletion" – assumes the values are missing at random
(MCAR – Missing Completely At Random)
Deletes every observation where the value of ANY variable is missing.
In other words, the entire row is dropped if even one column has a missing value.
When to use CCA?
• MCAR
• Less than 5% of the data is missing
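In pandas, CCA is a one-liner. A minimal sketch with a made-up frame where one row has a missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the middle row has a missing "age"
df = pd.DataFrame({"age": [25.0, np.nan, 40.0], "salary": [50, 60, 70]})

# Complete Case Analysis: drop any row with at least one missing value
complete = df.dropna()
print(len(complete))  # -> 2
```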
Need for Feature Scaling
Imagine you're trying to compare two items, like a pair of shoes. You're
looking at both their size and price to decide which one is better for you.
If you only focus on the price, which has a big range (from really cheap
to very expensive), you might ignore the size, which is just as important
but doesn't change as much (only a few sizes available).
In the world of machine learning, when we compare things (like data
points), we also look at different features (like size and price). But if one
feature (like price) varies a lot more than another (like size), our
comparison might unfairly focus too much on the feature that changes a
lot.
Min-Max Normalisation
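Min-max normalisation rescales a feature to [0, 1] via X' = (X − X_min) / (X_max − X_min). A sketch with a made-up column, using scikit-learn's `MinMaxScaler`:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical column with a wide range
X = np.array([[10.0], [20.0], [30.0], [50.0]])

# X' = (X - min) / (max - min), so min maps to 0 and max to 1
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled.ravel())  # -> [0, 0.25, 0.5, 1]
```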
Standardization
Z = (X − μ) / σ
X → value (data point)
μ → mean of the data
σ (sigma) → standard deviation (SD)
For the new (standardized) column: Mean = 0; SD = 1
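A minimal sketch of standardization with scikit-learn's `StandardScaler`, verifying on a made-up column that the result has mean 0 and SD 1:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical column
X = np.array([[2.0], [4.0], [6.0], [8.0]])

# Z = (X - mean) / std, applied column-wise
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

print(X_std.mean())  # -> 0.0 (up to floating point)
print(X_std.std())   # -> 1.0
```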