
Feature Engineering for Machine Learning
• Feature engineering is the pre-processing step of machine learning that transforms raw data into features which can be used to build a predictive model.
• In other words, it is the process of selecting, extracting, and
transforming the most relevant features from the available
data to build more accurate and efficient machine learning
models.
• Feature engineering involves a set of techniques that enable us
to create new features by combining or transforming the
existing ones.
Need for Feature Engineering
• To improve the performance of machine learning models by providing them with relevant and informative input data.
• Feature engineering can also help in addressing issues such as overfitting, underfitting, and high dimensionality.
• Feature engineering is a crucial step in preparing data for analysis and decision-making in various fields, such as finance, healthcare, marketing, and social sciences.
Processes Involved in Feature
Engineering
• Feature engineering in Machine learning consists of mainly 5
processes:
1) Feature Creation,
2) Feature Transformation,
3) Feature Extraction,
4) Feature Selection, and
5) Feature Scaling.
• The success of a machine learning model largely depends on
the quality of the features used in the model.
1. Feature Creation
• Feature Creation is the process of generating new features based on domain knowledge or by observing patterns in the data.
• New features are created by combining existing features using operations such as addition, subtraction, and ratios, and these new features offer great flexibility.
Types of Feature Creation:
• Domain-Specific: creating new features based on domain knowledge.
• Data-Driven: creating new features by observing patterns in the data.
• Synthetic: generating new features by combining existing features (a minimal sketch follows this list).
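For illustration, a minimal sketch of synthetic feature creation with pandas; the DataFrame and column names (Fare, SibSp, Parch, in the spirit of the Titanic data used later) are assumptions:

import pandas as pd

# Hypothetical Titanic-style data
df = pd.DataFrame({'Fare': [7.25, 71.28, 8.05],
                   'SibSp': [1, 1, 0],    # siblings/spouses aboard
                   'Parch': [0, 0, 2]})   # parents/children aboard

# Synthetic feature via addition
df['FamilySize'] = df['SibSp'] + df['Parch'] + 1
# Synthetic feature via a ratio
df['FarePerPerson'] = df['Fare'] / df['FamilySize']
print(df)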
Benefits of Feature Creation
• Improves Model Performance
• Increases Model Robustness
• Improves Model Interpretability
• Increases Model Flexibility
2. Feature Transformation
• Feature Transformation is the process of transforming the features into a more suitable representation for the machine learning model.
Types of Feature Transformation:
• Normalization: rescaling the features to a similar range, such as between 0 and 1, to prevent some features from dominating others.
• Scaling: rescaling the features to a similar scale, such as a standard deviation of 1, to make sure the model considers all features equally.
• Encoding: transforming categorical features into a numerical representation. Examples are one-hot encoding and label encoding.
• Transformation: transforming the features using mathematical operations to change their distribution or scale. Examples are logarithmic, square root, and reciprocal transformations.
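A minimal sketch of two of these transformations, using scikit-learn and NumPy on an illustrative Fare column:

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({'Fare': [7.25, 71.28, 8.05, 512.33]})

# Normalization: rescale into the [0, 1] range
df['Fare_minmax'] = MinMaxScaler().fit_transform(df[['Fare']])

# Mathematical transformation: log1p compresses the long right tail
df['Fare_log'] = np.log1p(df['Fare'])
print(df)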
3. Feature Extraction
• Feature Extraction is the process of creating new features from existing ones to provide more relevant information to the machine learning model.
• The main aim of this step is to reduce the volume of data so that it
can be easily used and managed for data modelling.
• Feature extraction methods include cluster analysis, text analytics,
edge detection algorithms, and principal components analysis (PCA).
Types of Feature Extraction
• Dimensionality Reduction: reducing the number of features by transforming the data into a lower-dimensional space while retaining important information. Examples are PCA and t-SNE.
• Feature Combination: combining two or more existing features to create a new one, for example the interaction between two features.
• Feature Aggregation: aggregating features to create a new one, for example calculating the mean, sum, or count of a set of features.
• Feature Transformation: transforming existing features into a new representation, for example a log transformation of a feature with a skewed distribution.
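A minimal dimensionality-reduction sketch with scikit-learn's PCA, using synthetic data for illustration:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))          # 100 samples, 10 features

pca = PCA(n_components=3)               # keep the top 3 components
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                  # (100, 3)
print(pca.explained_variance_ratio_)    # variance captured per component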
4. Feature Selection
• Feature selection is a way of selecting the subset of the most relevant features from the original feature set by removing redundant, irrelevant, or noisy features.
Types of Feature Selection
• Filter Method: based on a statistical measure of the relationship between the feature and the target variable. Features with a high correlation are selected.
• Wrapper Method: based on evaluating feature subsets with a specific machine learning algorithm. The feature subset that results in the best performance is selected.
• Embedded Method: based on performing feature selection as part of the training process of the machine learning algorithm.
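A minimal filter-method sketch using SelectKBest from scikit-learn on a built-in dataset:

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)

# Keep the 5 features most strongly related to the target (ANOVA F-test)
selector = SelectKBest(score_func=f_classif, k=5)
X_selected = selector.fit_transform(X, y)

print(X.shape, '->', X_selected.shape)   # (569, 30) -> (569, 5)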
5. Feature Scaling
• Feature Scaling is the process of transforming the features so that they have a similar scale.
• Types of Feature Scaling:
• Min-Max Scaling: rescaling the features, usually to the range [0, 1], by subtracting the minimum value and dividing by the range.
• Standard Scaling: rescaling the features to have a mean of 0 and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation.
• Robust Scaling: rescaling the features to be robust to outliers by subtracting the median and dividing by the interquartile range.
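A minimal sketch of all three scalers in scikit-learn:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler

X = np.array([[1.0], [2.0], [3.0], [100.0]])      # 100 is an outlier

print(MinMaxScaler().fit_transform(X).ravel())    # squeezed into [0, 1]
print(StandardScaler().fit_transform(X).ravel())  # mean 0, std 1
print(RobustScaler().fit_transform(X).ravel())    # median/IQR, less outlier-driven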
Steps to Feature Engineering
• Data Cleaning (removing or correcting any errors or inconsistencies)
• Data Transformation (normalization, standardization, and log transformation)
• Feature Extraction (principal component analysis (PCA), text parsing, and image processing)
• Feature Selection (correlation analysis, mutual information, and stepwise regression)
• Feature Iteration (adding new features, removing redundant features, and transforming features in different ways; binning, the process of grouping continuous features into discrete bins, is sketched below)
• Feature Split (splitting a single variable into multiple variables)
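A minimal binning sketch with pandas (the Age column and bin edges are illustrative):

import pandas as pd

df = pd.DataFrame({'Age': [4, 22, 38, 59, 80]})

# Binning: group a continuous feature into discrete, labeled bins
df['AgeGroup'] = pd.cut(df['Age'],
                        bins=[0, 12, 35, 60, 120],
                        labels=['child', 'young', 'adult', 'senior'])
print(df)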


Feature Engineering Techniques
• Missing data imputation
• Categorical encoding
• Variable transformation
• Outlier engineering
Missing Data Imputation
1. Complete case analysis
2. Mean / Median / Mode imputation
3. Missing value indicator
Complete Case Analysis for Missing Data Imputation
• Remove all the observations that contain missing values.
• Can only be used when only a few observations have missing values.
# Check how many observations we would drop (data1 is the Titanic DataFrame loaded earlier)
print('total passengers with values in all variables:', data1.dropna().shape[0])
print('total passengers in the Titanic:', data1.shape[0])
print('percentage of data without missing values:', data1.dropna().shape[0] / data1.shape[0])
So, we have complete information for only 20% of our observations in the Titanic dataset. Thus, the complete case analysis method is not an option for this dataset.
Mean / Median / Mode for Missing Data Imputation
• Missing values can also be replaced with the mean, median, or mode of the variable (feature), as in the sketch below.
• Output: 0, i.e. no null values remain in the Age feature after imputation.
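A minimal median-imputation sketch (assuming the same Titanic DataFrame data1 with an Age column):

# Replace missing ages with the median age
data1['Age'] = data1['Age'].fillna(data1['Age'].median())

# Verify: prints 0 when no null values are left in Age
print(data1['Age'].isnull().sum())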


Missing Value Indicator for Missing Data Imputation
• This technique involves adding a binary variable to indicate whether the value is missing for a certain observation.
• In the sketch below, an Age_NA variable is created to capture the missingness.
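A minimal sketch with NumPy and pandas (again assuming the Titanic DataFrame data1):

import numpy as np

# Binary indicator: 1 where Age is missing, 0 otherwise
data1['Age_NA'] = np.where(data1['Age'].isnull(), 1, 0)
print(data1[['Age', 'Age_NA']].head())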
Categorical Encoding in Feature Engineering
• There are multiple techniques for encoding categorical variables:
1. One-Hot encoding (OHE)
2. Ordinal encoding
3. Count and Frequency encoding
4. Target encoding / Mean encoding
One-Hot Encoding
• It is a commonly used technique for encoding categorical variables. It creates a binary variable for each category present in the categorical variable.
• Each binary variable is 1 if the category is present for an observation and 0 if it is absent. Each new variable is called a dummy variable or binary variable.
• For a two-category variable such as Sex, only 1 dummy variable is needed, as in the sketch below.
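A minimal sketch with pandas get_dummies (a Sex column is assumed, as in the Titanic data):

import pandas as pd

df = pd.DataFrame({'Sex': ['male', 'female', 'female', 'male']})

# drop_first=True keeps a single dummy variable for a two-category feature
print(pd.get_dummies(df['Sex'], drop_first=True))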
Ordinal Encoding
• In this case, a simple way to encode is to replace the labels with ordinal numbers, as in the sample sketch below.
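A minimal sketch mapping ordered labels to integers with pandas (the Quality labels are illustrative):

import pandas as pd

df = pd.DataFrame({'Quality': ['low', 'medium', 'high', 'medium']})

# Replace each label with an ordinal number that respects the natural order
df['Quality_encoded'] = df['Quality'].map({'low': 1, 'medium': 2, 'high': 3})
print(df)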
Count and Frequency Encoding
• In this encoding technique, categories are replaced by the count of the observations that show that category in the dataset.
• Replacement can also be done with the frequency, i.e. the percentage of observations in the dataset that show that category.
• For example, if 30 of 100 observations are male, we can replace "male" with 30 (count) or 0.3 (frequency), as in the sketch below.
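A minimal sketch with pandas:

import pandas as pd

df = pd.DataFrame({'Sex': ['male', 'male', 'male', 'female', 'female']})

# Count encoding: category -> number of observations with that category
df['Sex_count'] = df['Sex'].map(df['Sex'].value_counts())

# Frequency encoding: category -> fraction of observations with that category
df['Sex_freq'] = df['Sex'].map(df['Sex'].value_counts(normalize=True))
print(df)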
Target / Mean Encoding
• Replace each category of a variable with the mean value of the target for the observations that show that category.
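A minimal sketch with pandas (the City feature and binary target are illustrative); in practice the means are computed on the training data only, to avoid target leakage:

import pandas as pd

df = pd.DataFrame({'City':   ['A', 'A', 'B', 'B', 'B'],
                   'Target': [1, 0, 1, 1, 0]})

# Mean of the target per category, mapped back onto the feature
df['City_encoded'] = df['City'].map(df.groupby('City')['Target'].mean())
print(df)   # A -> 0.5, B -> about 0.67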
Variable Transformation
• Machine learning algorithms like linear and logistic regression tend to work better when the variables are approximately normally distributed.
• If a variable is not normally distributed, sometimes it is possible to find a mathematical transformation so that the transformed variable is Gaussian.
• Commonly used mathematical transformations are:
1. Logarithm transformation – log(x)
2. Square root transformation – sqrt(x)
3. Reciprocal transformation – 1 / x
4. Exponential transformation – exp(x)
• Loading numerical features of the Titanic dataset.
• Now, to visualize the distribution of the Age variable, we plot a histogram and a Q-Q plot, as in the sketch below.
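A minimal sketch (the CSV filename is an assumption; any local copy of the Titanic data works):

import matplotlib.pyplot as plt
import pandas as pd
import scipy.stats as stats

# Load numerical features of the Titanic dataset
data = pd.read_csv('titanic.csv', usecols=['Age', 'Fare'])

def diagnostic_plots(df, variable):
    # Histogram on the left, Q-Q plot against a normal distribution on the right
    plt.figure(figsize=(10, 4))
    plt.subplot(1, 2, 1)
    df[variable].hist(bins=30)
    plt.title('Histogram of ' + variable)
    plt.subplot(1, 2, 2)
    stats.probplot(df[variable].dropna(), dist='norm', plot=plt)
    plt.show()

diagnostic_plots(data, 'Age')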
Now, let's apply each transformation and compare the transformed Age variable:
• Logarithmic transformation – log(x)
• Square root transformation – sqrt(x)
• Reciprocal transformation – 1 / x
• Exponential transformation – exp(x)
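A minimal sketch continuing from the previous one (log and reciprocal require strictly positive values, which holds for Age in this dataset):

import numpy as np

age = data['Age'].dropna()

transformed = pd.DataFrame({'Age_log':   np.log(age),   # logarithmic
                            'Age_sqrt':  np.sqrt(age),  # square root
                            'Age_recip': 1 / age,       # reciprocal
                            'Age_exp':   np.exp(age)})  # exponential

# Re-use the diagnostic plots to compare the transformed distributions, e.g.:
diagnostic_plots(transformed, 'Age_log')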
