Introduction
Values in a dataset may differ in magnitude, range, or scale.
• Data transformation techniques bring the features of the data to the same scale, magnitude, or range.
• Some features in a dataset have high-magnitude values (e.g., annual salary).
• Others have small values (e.g., number of years worked at a company).
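To see why this matters, here is a minimal sketch comparing two hypothetical employees with a plain Euclidean distance. The data and variable names are illustrative assumptions, not from the source.

```python
import numpy as np

# Hypothetical employees: [annual salary, years worked at the company]
a = np.array([50_000.0, 2.0])
b = np.array([52_000.0, 20.0])

salary_gap = abs(a[0] - b[0])  # 2,000
tenure_gap = abs(a[1] - b[1])  # 18

# Unscaled Euclidean distance between the two employees
raw_distance = np.linalg.norm(a - b)

print(f"salary gap: {salary_gap}, tenure gap: {tenure_gap}")
print(f"raw distance: {raw_distance:.2f}")
```

The distance comes out at roughly 2000.08, almost exactly the salary gap: an 18-year difference in tenure barely registers because salary is measured on a much larger scale. Bringing both features to a common scale removes that imbalance.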
Scaling and Standardization
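As a reference point, standardization and min-max scaling are conventionally defined per feature as below, where μ and σ are the feature's mean and standard deviation and min(x), max(x) are its smallest and largest values.

```latex
z = \frac{x - \mu}{\sigma}
\qquad\qquad
x' = \frac{x - \min(x)}{\max(x) - \min(x)}
```

Standardization centres each feature at 0 with unit standard deviation; min-max scaling maps each feature onto the range [0, 1].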
Implementing Scaling Using the StandardScaler Method
Import the libraries:
• import pandas as pd
• from sklearn import preprocessing
Check for missing data (info lists, per column, whether any value is missing and the column's dtype):
• null_ = df.isna().any()
• dtypes = df.dtypes
• info = pd.concat([null_, dtypes], axis=1, keys=['Null', 'type'])
Perform standard scaling (all columns of df must be numeric):
• std_scale = preprocessing.StandardScaler().fit_transform(df)
• scaled_frame = pd.DataFrame(std_scale, columns=df.columns)
• scaled_frame.head()
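A minimal, self-contained sketch of the same steps, assuming a small hypothetical DataFrame in place of the unspecified df. It also checks StandardScaler against the formula computed by hand; note that StandardScaler uses the population standard deviation (ddof=0), not pandas' default sample standard deviation.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical all-numeric data standing in for df
df = pd.DataFrame({
    "annual_salary": [40_000, 55_000, 70_000, 85_000],
    "years_worked": [1, 3, 5, 7],
})

# Library version: each column becomes (x - mean) / std
std_scale = preprocessing.StandardScaler().fit_transform(df)
scaled_frame = pd.DataFrame(std_scale, columns=df.columns)

# Manual version for comparison; ddof=0 matches StandardScaler
manual = (df - df.mean()) / df.std(ddof=0)

print(scaled_frame.round(3))
print(np.allclose(scaled_frame, manual))
```

After scaling, every column has mean 0 and standard deviation 1, so salary and tenure contribute on equal footing.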
Implementing Scaling Using the MinMaxScaler Method
Import the libraries:
• import pandas as pd
• from sklearn import preprocessing
Check for missing data (info lists, per column, whether any value is missing and the column's dtype):
• null_ = df.isna().any()
• dtypes = df.dtypes
• info = pd.concat([null_, dtypes], axis=1, keys=['Null', 'type'])
Perform min-max scaling (all columns of df must be numeric):
• minmax_scale = preprocessing.MinMaxScaler().fit_transform(df)
• scaled_frame = pd.DataFrame(minmax_scale, columns=df.columns)
• scaled_frame.head()
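A minimal sketch of min-max scaling on a hypothetical DataFrame (the column names and values are assumptions), checked against the formula computed by hand. With MinMaxScaler's default feature_range of (0, 1), each column's minimum maps to 0 and its maximum to 1.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical all-numeric data standing in for df
df = pd.DataFrame({
    "annual_salary": [40_000, 55_000, 70_000, 100_000],
    "years_worked": [2, 3, 10, 12],
})

# Library version: each column becomes (x - min) / (max - min)
minmax_scale = preprocessing.MinMaxScaler().fit_transform(df)
scaled_frame = pd.DataFrame(minmax_scale, columns=df.columns)

# Manual version for comparison
manual = (df - df.min()) / (df.max() - df.min())

print(scaled_frame)
print(np.allclose(scaled_frame, manual))
```

Unlike standardization, min-max scaling keeps values inside a fixed range, but a single extreme value stretches the range and compresses all other values toward one end.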
Data Discretization
Data discretization is the process of converting continuous data into discrete buckets by grouping its values.
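A minimal sketch of two common ways to do this in pandas, assuming hypothetical tenure values and hypothetical bucket labels: pd.cut groups values by explicit bin edges, while pd.qcut groups them into buckets holding roughly equal numbers of values.

```python
import pandas as pd

# Hypothetical continuous values: years worked at a company
years = pd.Series([0.5, 2.0, 3.5, 7.0, 12.0, 25.0], name="years_worked")

# Fixed edges; with the default right=True the bins are (0, 2], (2, 5], (5, 10], (10, 40]
buckets = pd.cut(
    years,
    bins=[0, 2, 5, 10, 40],
    labels=["new", "junior", "mid", "senior"],
)

# Equal-frequency buckets: three quantile-based groups
terciles = pd.qcut(years, q=3, labels=["low", "medium", "high"])

print(pd.DataFrame({"years": years, "bucket": buckets, "tercile": terciles}))
```

Fixed edges suit cases where the boundaries carry meaning (for example, seniority bands), while quantile buckets keep group sizes balanced when the data is skewed.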