
Feature Creation

What is feature creation?

Feature creation is the process of creating a new set of attributes
from the original attributes that captures the important information
in a data set much more effectively.
→ The number of new attributes can be smaller than the number of
original attributes.
→ Existing features can be combined via arithmetic operations to
create new derived features that have greater predictive power.

For example, below are the prices of properties in city X, showing
the area of each house and its total price.

Now we'll add a new column, cost per square foot.

With this new feature, meaningful and relevant patterns can be
extracted more easily. When we plot the data, we'll notice that one
price is significantly different from the rest, so the problem is easy
to spot in the visualization.
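
As a rough sketch of this step, the snippet below derives the cost per square foot with pandas. The listings, the column names (area_sqft, price) and the 1.5x-median threshold are all made up for illustration; they are not taken from the original table.

```python
import pandas as pd

# Hypothetical listings; the values are invented for illustration only.
homes = pd.DataFrame({
    "area_sqft": [1000, 1500, 2000, 1200],
    "price": [5_000_000, 7_500_000, 10_000_000, 12_000_000],
})

# Derived feature: total price divided by area.
homes["cost_per_sqft"] = homes["price"] / homes["area_sqft"]

# Flag rows whose cost per square foot is well above the median.
# The 1.5x factor is an arbitrary assumption, not a standard rule.
median_cost = homes["cost_per_sqft"].median()
homes["looks_unusual"] = homes["cost_per_sqft"] > 1.5 * median_cost

print(homes)
```

In the raw price column the last listing does not stand out, because larger houses cost more anyway; in the derived column it is the only row flagged.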

There are three related methodologies for creating new attributes:


→ Feature Extraction
→ Mapping data to new space
→ Feature Construction
Feature Extraction
Feature extraction reduces the dimensionality of a dataset by
combining the existing features into new ones, so that fewer features
remain. This brings the data down to a size that algorithms can
process, ideally without distorting the original relationships or
losing relevant information.

For example, consider a set of photographs, where each photograph is
to be classified according to whether or not it contains a human face.
The original (raw) data is a set of pixels; let's say each image is
256 x 256 pixels.

Using feature extraction, we can process the data by combining
attributes (pixels) to make higher-level features, such as the
presence or absence of certain types of edges and areas that are
highly correlated with the presence of human faces. This will not
only produce a more relevant set of features but also reduce the
dimensionality of the dataset, making it more manageable.

This way, a much broader set of classification techniques can be
applied to this dataset.
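
To make the idea concrete, here is a minimal sketch that collapses the 65,536 pixels of a 256 x 256 grayscale image into two edge-strength features using NumPy. The function name edge_features and the synthetic image are assumptions for illustration; real face detection relies on far richer features than these two averages.

```python
import numpy as np

def edge_features(image: np.ndarray) -> np.ndarray:
    """Summarise a 2-D grayscale image as two edge-strength features."""
    img = image.astype(float)
    # Differences between neighbouring pixels respond to intensity changes.
    change_along_rows = np.abs(np.diff(img, axis=1))  # large at vertical edges
    change_along_cols = np.abs(np.diff(img, axis=0))  # large at horizontal edges
    return np.array([change_along_rows.mean(), change_along_cols.mean()])

# Synthetic 256 x 256 image: left half dark, right half bright,
# which gives exactly one vertical edge down the middle.
image = np.zeros((256, 256))
image[:, 128:] = 1.0

features = edge_features(image)
print(image.size, "pixels ->", features.size, "features:", features)
```

Only the first feature is non-zero here, since the synthetic image has a vertical edge and no horizontal one.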
Mapping data to new space

A totally different view of the data can reveal interesting and important
features. Consider a time-based dataframe.
● We can extract parts of the date into separate columns such as
year, month, day, week, etc.
● We can find the number of days between two dates.
● We can create a feature that says whether the day is a weekend or
a weekday.
● We can create a feature that says whether the day is a holiday.

This can be achieved with pandas, using the 'strftime' method and
'DatetimeIndex'.

Below is an example where we have extracted the month and week of the
year. Similarly, we can create more features like day, year, weekend, etc.

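import pandas as pd

# Read the file
df = pd.read_csv('housing_price.csv')

# Parse the date column into datetime values
df['date'] = pd.to_datetime(df['date'])

# Month number (1-12)
df['month'] = pd.DatetimeIndex(df['date']).month

# ISO week of the year (DatetimeIndex.week was removed in pandas 2.0)
df['week'] = df['date'].dt.isocalendar().week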
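
The remaining date features from the list above can be sketched in the same way. The example below builds its own small dataframe so that it runs without a CSV file; the column names (listed, sold) and the holiday list are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical data; 'listed' and 'sold' are assumed column names.
sales = pd.DataFrame({
    "listed": pd.to_datetime(["2023-01-02", "2023-03-10"]),
    "sold": pd.to_datetime(["2023-01-07", "2023-04-10"]),
})

# Parts of the date as separate columns.
sales["year"] = sales["listed"].dt.year
sales["day"] = sales["listed"].dt.day

# Number of days between two dates.
sales["days_on_market"] = (sales["sold"] - sales["listed"]).dt.days

# Weekend flag: dayofweek is 0 for Monday through 6 for Sunday.
sales["sold_on_weekend"] = sales["sold"].dt.dayofweek >= 5

# Holiday flag, checked against a placeholder list of dates.
holiday_dates = pd.to_datetime(["2023-01-02"])
sales["listed_on_holiday"] = sales["listed"].isin(holiday_dates)

print(sales)
```

The .dt accessor is used here instead of wrapping the column in DatetimeIndex; both expose the same date components, and .dt keeps the result aligned with the dataframe's own index.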
Feature Construction

Feature construction means deriving higher-level features by
combining already available features to extract more useful
information.

Feature construction has long been considered a powerful tool for
increasing both accuracy and understanding of structure, particularly
in high-dimensional problems.

For example, consider a dataset of historical artifacts which, along
with other information, contains the volume and mass of each
artifact. For simplicity, assume that these artifacts are made of a
small number of materials (wood, clay, bronze, gold) and that we want
to classify the artifacts with respect to the material of which they
are made. In this case a density feature, i.e.

Density = mass / volume

would most directly yield an accurate classification.
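
A minimal sketch of this idea follows. The artifact measurements are invented, and the reference densities (in g/cm3) are rough typical values assumed for illustration; wood and clay in particular vary widely in practice. The helper nearest_material is a hypothetical name.

```python
import pandas as pd

# Hypothetical artifacts; masses and volumes are invented.
artifacts = pd.DataFrame({
    "mass_g": [1930.0, 880.0, 60.0],
    "volume_cm3": [100.0, 100.0, 100.0],
})

# Constructed feature: density = mass / volume.
artifacts["density"] = artifacts["mass_g"] / artifacts["volume_cm3"]

# Approximate reference densities in g/cm3 (assumed typical values).
reference = {"wood": 0.6, "clay": 1.8, "bronze": 8.8, "gold": 19.3}

def nearest_material(density: float) -> str:
    """Return the material whose reference density is closest."""
    return min(reference, key=lambda name: abs(reference[name] - density))

artifacts["material_guess"] = artifacts["density"].apply(nearest_material)
print(artifacts)
```

Neither mass nor volume separates the materials on its own, since all three artifacts here share the same volume only by construction; the ratio is what carries the signal.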

Another example: if we have a patient dataset with the attributes
Name, Patient Id, Height, Weight, and Age, and we are interested in
the weight category (overweight, underweight, normal weight) a
patient falls into, then the feature BMI (Body Mass Index), i.e.

BMI = weight (kg) / (height (m))²

will be more useful to us, and we can calculate it from the height
and weight attributes.
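
The BMI feature can be sketched as follows. The patient records and column names are hypothetical, and the cutoffs of 18.5 and 25 are the commonly used adult thresholds, assumed here because the text does not specify them.

```python
import pandas as pd

# Hypothetical patients; heights in metres, weights in kilograms.
patients = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "height_m": [1.60, 1.75, 1.80],
    "weight_kg": [45.0, 70.0, 100.0],
})

# Constructed feature: BMI = weight / height squared.
patients["bmi"] = patients["weight_kg"] / patients["height_m"] ** 2

# Bin BMI into the three categories. right=False makes each bin
# include its lower bound, e.g. [18.5, 25) for normal weight.
patients["weight_category"] = pd.cut(
    patients["bmi"],
    bins=[0, 18.5, 25, float("inf")],
    labels=["underweight", "normal weight", "overweight"],
    right=False,
)

print(patients)
```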
