Feature Engineering
Better Features Make Better Models
Which is better?
1. A good algorithm with low-quality columns
2. A bad algorithm with high-quality columns
What is FE?
Using domain knowledge to extract features from raw data.
These features can then be used to improve machine learning algorithms.
Categories of FE
• Feature Transformation
• Feature Construction
• Feature Selection
• Feature Extraction
Feature Transformation
• Handle Missing Values
• Handle Categorical Features
• Identify Outliers
• Feature Scaling
Fill Missing Values
Handle Categorical Data (One-Hot Encoding)
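One-hot encoding replaces a categorical column with one binary column per category. A minimal sketch using pandas, with a made-up `colour` column as example data:

```python
import pandas as pd

# Hypothetical example data: a single categorical column
df = pd.DataFrame({"colour": ["red", "green", "blue", "green"]})

# One-hot encoding: one binary (0/1) column per category
encoded = pd.get_dummies(df, columns=["colour"])
print(encoded.columns.tolist())
# -> ['colour_blue', 'colour_green', 'colour_red']
```

`sklearn.preprocessing.OneHotEncoder` does the same job and is preferred inside a scikit-learn pipeline, since it remembers the categories seen during `fit`.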
Identify Outliers
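A common way to identify outliers is the IQR rule: flag points outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR]. A small sketch with made-up data:

```python
import numpy as np

# Hypothetical data with one obvious outlier (95)
values = np.array([10, 12, 11, 13, 12, 11, 95])

# IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]
print(outliers)  # -> [95]
```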
Feature Scaling
Feature Construction – Building New Features
Example: Creating a new food recipe
The basic ingredients are like the raw data you start with.
Making a new sauce or grating the cheese is like feature construction, where you
combine or transform the raw data to make new features that could help
your machine learning model (the recipe) perform better and impress
your guests (achieve better accuracy).
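In code, feature construction is often just combining existing columns into a new one. A minimal sketch with hypothetical height/weight data, constructing a BMI feature:

```python
import pandas as pd

# Hypothetical raw data: two basic "ingredients"
df = pd.DataFrame({"height_m": [1.70, 1.60], "weight_kg": [68.0, 80.0]})

# Construct a new feature by combining the raw columns
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2
print(df["bmi"].round(2).tolist())  # -> [23.53, 31.25]
```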
Feature Extraction
Handle Missing Values
• Remove the entire row (not recommended)
• Impute (fill) the values (univariate / multivariate)
Univariate – Simple Imputer Class
Numerical: Mean, Median, Random Value, End of Distribution
Categorical: Mode (Most Frequent), "Missing" (as a new category)
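A minimal sketch of univariate imputation with scikit-learn's `SimpleImputer`, filling a missing value with the column mean (the data here is made up):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical numerical column with one missing value
X = np.array([[1.0], [2.0], [np.nan], [5.0]])

# Univariate: fill NaNs using only this column's own statistics;
# strategy can also be "median" or "most_frequent" (for categorical data)
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)  # NaN becomes mean(1, 2, 5) = 8/3
```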
Multivariate
KNN Imputer
Iterative Imputer
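Unlike the univariate case, multivariate imputers estimate a missing value from the other columns as well. A sketch of both, on a small made-up dataset where the second column is exactly twice the first:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import KNNImputer, IterativeImputer

# Hypothetical data: column 2 = 2 * column 1, with one value missing
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, np.nan], [4.0, 8.0]])

# KNN: average the feature over the k nearest complete rows
knn = KNNImputer(n_neighbors=2)
X_knn = knn.fit_transform(X)

# Iterative: model each feature as a regression on the others
iterative = IterativeImputer(random_state=0)
X_iter = iterative.fit_transform(X)
```

Both should fill the missing value close to 6, since the neighbouring rows (and the linear relationship) point there.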
Complete Case Analysis (CCA)
AKA "List-Wise Deletion" – assumes the values are missing at random
(MCAR – Missing Completely At Random)
Deletes every observation where the value of ANY variable is missing.
In other words, the entire row is dropped if even one column has a missing value.
When to use CCA?
• MCAR
• Less than 5% of the data is missing
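In pandas, CCA is a one-liner. A minimal sketch with a made-up frame where one row has a missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the middle row has a missing "age"
df = pd.DataFrame({"age": [25.0, np.nan, 40.0], "salary": [50, 60, 70]})

# Complete Case Analysis: drop any row with at least one missing value
complete = df.dropna()
print(len(complete))  # -> 2
```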
Need for Feature Scaling
Imagine you're trying to compare two items, like a pair of shoes. You're
looking at both their size and price to decide which one is better for you.
If you only focus on the price, which has a big range (from really cheap
to very expensive), you might ignore the size, which is just as important
but doesn't change as much (only a few sizes available).
In the world of machine learning, when we compare things (like data
points), we also look at different features (like size and price). But if one
feature (like price) varies a lot more than another (like size), our
comparison might unfairly focus too much on the feature that changes a
lot.
Min-Max Normalisation
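Min-max normalisation rescales a feature to [0, 1] via X' = (X − X_min) / (X_max − X_min). A sketch with a made-up column, using scikit-learn's `MinMaxScaler`:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical column with a wide range
X = np.array([[10.0], [20.0], [30.0], [50.0]])

# X' = (X - min) / (max - min), so min maps to 0 and max to 1
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled.ravel())  # -> [0, 0.25, 0.5, 1]
```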
Standardization
Z = (X − μ) / σ
X → value (data point)
μ → mean of the data
σ (sigma) → standard deviation (SD)
For the new (standardized) column: Mean = 0; SD = 1
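A minimal sketch of standardization with scikit-learn's `StandardScaler`, verifying on a made-up column that the result has mean 0 and SD 1:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical column
X = np.array([[2.0], [4.0], [6.0], [8.0]])

# Z = (X - mean) / std, applied column-wise
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

print(X_std.mean())  # -> 0.0 (up to floating point)
print(X_std.std())   # -> 1.0
```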