
Binning

The binning method can be used for smoothing the data.


Real-world data is usually noisy. Data smoothing is a data pre-processing
technique that uses an algorithm to remove noise from a data set, which
lets important patterns stand out. Binning offers three smoothing methods:
Smoothing by bin means: each value in a bin is replaced by the mean value
of the bin.
Smoothing by bin medians: each value in a bin is replaced by the median
value of the bin.
Smoothing by bin boundaries: the minimum and maximum values in a bin are
identified as the bin boundaries, and each value in the bin is replaced by
the closer boundary value.

Unsorted data for price in dollars


Before sorting: 8, 16, 9, 15, 21, 21, 24, 30, 26, 27, 30, 34
First, sort the data.
After sorting: 8, 9, 15, 16, 21, 21, 24, 26, 27, 30, 30, 34
Bin depth: the number of values in each bin. Here the depth is 4, so the
12 values form 3 bins.

Partitioning the sorted data into equal-frequency bins


Bin 1: 8, 9, 15, 16
Bin 2: 21, 21, 24, 26
Bin 3: 27, 30, 30, 34
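
The partitioning step can be sketched in Python as follows. This is only a minimal illustration: the function name equal_frequency_bins and its depth parameter are hypothetical names chosen for this example, and it assumes the number of values is a multiple of the depth.

```python
def equal_frequency_bins(values, depth):
    """Sort the values and cut them into consecutive bins of `depth` items.

    Hypothetical helper for illustration; assumes len(values) is a
    multiple of `depth` (otherwise the last bin is simply shorter).
    """
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


prices = [8, 16, 9, 15, 21, 21, 24, 30, 26, 27, 30, 34]
bins = equal_frequency_bins(prices, depth=4)

for number, bin_values in enumerate(bins, start=1):
    print(f"Bin {number}: {bin_values}")
```

With the price data and a depth of 4, this produces the same three bins listed above.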

(i) Smoothing by bin means


Each value in a bin is replaced by the arithmetic mean of the bin.


For Bin 1:
(8 + 9 + 15 + 16) / 4 = 12
(4 is the number of values in the bin: 8, 9, 15, 16)
Bin 1 = 12, 12, 12, 12

For Bin 2:
(21 + 21 + 24 + 26) / 4 = 23
Bin 2 = 23, 23, 23, 23

For Bin 3:
(27 + 30 + 30 + 34) / 4 = 30.25, rounded to 30
Bin 3 = 30, 30, 30, 30
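
The mean calculation can be checked with a short Python sketch. The function name smooth_by_means is a hypothetical name for this example. Rounding to a whole number is shown as a separate, optional step, since the exact mean of Bin 3 is 30.25.

```python
def smooth_by_means(bins):
    """Replace every value in each bin with that bin's arithmetic mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


bins = [[8, 9, 15, 16], [21, 21, 24, 26], [27, 30, 30, 34]]

exact = smooth_by_means(bins)
# Optional step: round each mean to a whole number, as in the worked example.
rounded = [[round(v) for v in b] for b in exact]

for number, (e, r) in enumerate(zip(exact, rounded), start=1):
    print(f"Bin {number}: exact mean {e[0]}, smoothed values {r}")
```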
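(ii) Smoothing by bin medians

Each value in a bin is replaced by the median of the bin. With four values
in a bin, the median is the average of the two middle values.

Bin 1: (9 + 15) / 2 = 12
Bin 1 = 12, 12, 12, 12

Bin 2: (21 + 24) / 2 = 22.5
Bin 2 = 22.5, 22.5, 22.5, 22.5

Bin 3: (30 + 30) / 2 = 30
Bin 3 = 30, 30, 30, 30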
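
A minimal Python sketch of the median step, using the standard library's statistics.median. The function name smooth_by_medians is a hypothetical name for this example.

```python
from statistics import median


def smooth_by_medians(bins):
    """Replace every value in each bin with that bin's median."""
    return [[median(b)] * len(b) for b in bins]


bins = [[8, 9, 15, 16], [21, 21, 24, 26], [27, 30, 30, 34]]
smoothed = smooth_by_medians(bins)

for number, bin_values in enumerate(smoothed, start=1):
    print(f"Bin {number}: {bin_values}")
```

For bins with an even number of values, statistics.median averages the two middle values, which matches the results above.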

(iii) Smoothing by bin boundaries


The minimum and maximum values in a bin are identified as the bin
boundaries. Each value in the bin is then replaced by the closer boundary
value.

For Bin 1:
Bin 1: 8, 9, 15, 16

Min: 8

Max: 16

(i) Take each value that lies between the minimum (8) and the maximum (16).
(ii) Replace it with whichever boundary is closer.

9 is closer to 8 (distance 1) than to 16 (distance 7), so it becomes 8.

Hence 8, 8, 15, 16

15 is closer to 16 (distance 1) than to 8 (distance 7), so it becomes 16.

Hence Bin 1 = 8, 8, 16, 16

For Bin 2:
Bin 2: 21, 21, 24, 26

24 is closer to 26 (distance 2) than to 21 (distance 3), so it becomes 26.

Hence Bin 2 = 21, 21, 26, 26

For Bin 3:
Bin 3: 27, 30, 30, 34

30 is closer to 27 (distance 3) than to 34 (distance 4), so both 30s
become 27.

Hence Bin 3 = 27, 27, 27, 34
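
The boundary rule can be sketched in Python as follows. The function name smooth_by_boundaries is a hypothetical name for this example. The rule for a value exactly halfway between the two boundaries is an assumption: this sketch sends ties to the minimum, which agrees with the age example later in this document, where 8 in the bin 3, 7, 8, 13 becomes 3.

```python
def smooth_by_boundaries(bins):
    """Replace each value with the nearer of its bin's minimum or maximum.

    Assumption: a value equally distant from both boundaries goes to the
    minimum.
    """
    smoothed = []
    for b in bins:
        low, high = min(b), max(b)
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed


bins = [[8, 9, 15, 16], [21, 21, 24, 26], [27, 30, 30, 34]]
result = smooth_by_boundaries(bins)

for number, bin_values in enumerate(result, start=1):
    print(f"Bin {number}: {bin_values}")
```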

How to smooth data by bin boundaries?


Pick the minimum and maximum values of the bin. The minimum stays on the
left side and the maximum on the right side.
What happens to the middle values?
Each middle value moves to whichever boundary, minimum or maximum, is at
the smaller distance.
Unsorted data for price in dollars:
Before sorting: 8, 16, 9, 15, 21, 21, 24, 30, 26, 27, 30, 34
First, sort the data.
After sorting: 8, 9, 15, 16, 21, 21, 24, 26, 27, 30, 30, 34

Equal-frequency bins before smoothing by bin boundaries:

Bin 1: 8, 9, 15, 16
Bin 2: 21, 21, 24, 26
Bin 3: 27, 30, 30, 34

After smoothing by bin boundaries:

Bin 1: 8, 8, 16, 16
Bin 2: 21, 21, 26, 26
Bin 3: 27, 27, 27, 34

Advantages (Pros) of data smoothing

Data smoothing makes important hidden patterns in the data set easier to
see.
Data smoothing can help predict trends, and good predictions support
making the right decisions at the right time.

Data smoothing reduces the influence of noise on results derived from the
data.

Cons of data smoothing


Data smoothing doesn’t always provide a clear explanation of the patterns
in the data.
Some data points may be ignored because the method emphasizes others.

Example of binning for data smoothing


Sorted data for Age: 3, 7, 8, 13, 22, 22, 22, 26, 26, 28, 30, 37

How to partition the data into equal-frequency bins?

Bin 1: 3, 7, 8, 13
Bin 2: 22, 22, 22, 26
Bin 3: 26, 28, 30, 37

How to smooth the data by bin means?

Bin 1: 8, 8, 8, 8 (mean 7.75, rounded)
Bin 2: 23, 23, 23, 23
Bin 3: 30, 30, 30, 30 (mean 30.25, rounded)

How to smooth the data by bin boundaries?

Bin 1: 3, 3, 3, 13 (8 is equally far from 3 and 13; here the tie goes to
the minimum)
Bin 2: 22, 22, 22, 26
Bin 3: 26, 26, 26, 37
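
The age example can be reproduced end to end with the short Python sketch below. It assumes a bin depth of 4 (three bins of four values, as shown above), rounds the means to whole numbers, and sends a value that is equally far from both boundaries to the minimum. All variable names are illustrative.

```python
ages = [3, 7, 8, 13, 22, 22, 22, 26, 26, 28, 30, 37]
depth = 4  # assumed bin depth: three bins of four values each

ordered = sorted(ages)
bins = [ordered[i:i + depth] for i in range(0, len(ordered), depth)]

# Bin means: exact values first, then rounded to whole numbers.
exact_means = [sum(b) / len(b) for b in bins]
rounded_means = [round(m) for m in exact_means]

# Bin boundaries: nearer of min/max, ties assumed to go to the minimum.
by_boundaries = [
    [min(b) if v - min(b) <= max(b) - v else max(b) for v in b]
    for b in bins
]

print("Bins:          ", bins)
print("Exact means:   ", exact_means)
print("Rounded means: ", rounded_means)
print("By boundaries: ", by_boundaries)
```

The exact means show that the values 8 and 30 in the bin-means result are rounded figures rather than exact averages.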

Question
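1. Suppose an attribute, height, has the values
100, 102, 110, 113, 123, 143, 116
Sorted:
100, 102, 110, 113, 116, 123, 143
Assume the number of bins is 2. Seven values cannot be split evenly, so
one bin holds 3 values and the other holds 4:
Bin 1: 100, 102, 110
Bin 2: 113, 116, 123, 143
The smoothing methods are then applied in the same way as above.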
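
One possible way to split values into a given number of bins when they do not divide evenly is sketched below. The function name split_into_bins is a hypothetical name for this example. Giving the extra values to the last bins is an assumption, chosen only because it reproduces the 3 and 4 split shown above; other conventions are equally valid.

```python
def split_into_bins(values, n_bins):
    """Sort the values and split them into n_bins nearly equal bins.

    Assumption: when the count is not divisible by n_bins, the leftover
    values are given to the last bins, one each.
    """
    ordered = sorted(values)
    base, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        # The last `extra` bins each receive one additional value.
        size = base + (1 if i >= n_bins - extra else 0)
        bins.append(ordered[start:start + size])
        start += size
    return bins


heights = [100, 102, 110, 113, 123, 143, 116]
height_bins = split_into_bins(heights, n_bins=2)

for number, bin_values in enumerate(height_bins, start=1):
    print(f"Bin {number}: {bin_values}")
```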

Reference:
https://t4tutorials.com/binning-methods-for-data-smoothing-in-data-mining/
