
Data in Different Scales

Introduction
• Values in a dataset might be of various magnitudes, ranges, or scales.
• Data transformation techniques are used to bring the features of the data to the same scale, magnitude, or range.
• Some features in the dataset will have high-magnitude values (e.g. annual salary).
• Others have smaller values (e.g. number of years worked at a company).
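To make the scale gap concrete, here is a minimal sketch with a hypothetical two-column dataset (the column names and values are illustrative, not from the slides): one feature lives around 10^4 while the other lives around 10^0.

```python
import pandas as pd

# Hypothetical dataset: two features on very different scales
df = pd.DataFrame({
    'annual_salary': [55000, 72000, 48000, 91000],   # magnitude ~10^4
    'years_at_company': [2, 7, 1, 12],               # magnitude ~10^0
})

# The spread of the two columns differs by orders of magnitude,
# which is what scaling techniques are meant to correct
print(df.describe().loc[['mean', 'std']])
```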

Scaling Standardization

Implementing Scaling Using the Standard Scaler Method

Import the libraries:

• import pandas as pd
• from sklearn import preprocessing

Check for any missing data:

• null_ = df.isna().any()
• dtypes = df.dtypes
• info = pd.concat([null_, dtypes], axis=1, keys=['Null', 'type'])

Perform standard scaling:

• std_scale = preprocessing.StandardScaler().fit_transform(df)
• scaled_frame = pd.DataFrame(std_scale, columns=df.columns)
• scaled_frame.head()
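The steps above can be run end to end as follows; the small DataFrame is a hypothetical stand-in for whatever all-numeric `df` the slides assume is already loaded.

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical numeric data standing in for the slides' df
df = pd.DataFrame({
    'salary': [55000, 72000, 48000, 91000],
    'years': [2, 7, 1, 12],
})

# StandardScaler centres each column to mean 0 and unit variance
std_scale = preprocessing.StandardScaler().fit_transform(df)
scaled_frame = pd.DataFrame(std_scale, columns=df.columns)
print(scaled_frame.round(3))
```

After scaling, both columns share the same scale regardless of their original magnitudes.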
Implementing Scaling Using the MinMax Scaler Method

Import the libraries:

• import pandas as pd
• from sklearn import preprocessing

Check for any missing data:

• null_ = df.isna().any()
• dtypes = df.dtypes
• info = pd.concat([null_, dtypes], axis=1, keys=['Null', 'type'])

Perform min-max scaling:

• minmax_scale = preprocessing.MinMaxScaler().fit_transform(df)
• scaled_frame = pd.DataFrame(minmax_scale, columns=df.columns)
• scaled_frame.head()
Data Discretization

Discretization is the process of grouping continuous data into discrete buckets.

Discretization is used for easier maintainability of the data.

Training a model with discrete data becomes faster and more effective than attempting it with continuous data.

Continuous data contains more information, and a huge amount of data can slow the model down.

Methods used for data discretization are binning and using a histogram.
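Both methods can be sketched with pandas and NumPy; the sample values below are illustrative. Equal-width binning splits the range into buckets of equal size, while histogram-based binning derives the bucket edges from the data's histogram (with uniform bins, the two coincide).

```python
import numpy as np
import pandas as pd

# Hypothetical continuous values to discretize
values = pd.Series([3, 7, 12, 18, 25, 31, 44, 58, 63, 77])

# Binning: split the value range into 4 equal-width buckets
equal_width = pd.cut(values, bins=4)

# Histogram method: compute edges from the data's histogram,
# then cut the values along those edges
edges = np.histogram_bin_edges(values, bins=4)
hist_bins = pd.cut(values, bins=edges, include_lowest=True)
```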
Discretization of Continuous Data

Import the library:

• import pandas as pd

Display the first five records:

• df.head()

Perform bucketing using pd.cut():

• df['bucket'] = pd.cut(df['marks'], 5, labels=['Poor', 'Below_average', 'Average', 'Above_Average', 'Excellent'])
• df.head(10)
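Putting the steps together as a runnable sketch; the `marks` values below are hypothetical, since the slides assume `df` is already loaded.

```python
import pandas as pd

# Hypothetical marks column standing in for the slides' df
df = pd.DataFrame({'marks': [35, 48, 55, 62, 70, 78, 84, 91, 97, 42]})

# Cut the continuous marks into 5 equal-width labelled buckets
df['bucket'] = pd.cut(
    df['marks'], 5,
    labels=['Poor', 'Below_average', 'Average', 'Above_Average', 'Excellent'],
)
print(df.head(10))
```

Each row's `bucket` value now records which of the five discrete ranges its mark falls into.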
