
1 What is dispersion of data?

2 Why is it important?
3 How do you measure dispersion of data?
4 Is dispersion good or bad?
5 How do outliers influence the data?
6 Calculating dispersion of data in Excel

1 What is dispersion of data?

Dispersion is the spread or variability of data.

Wider Dispersion

Narrow Dispersion

How do you measure the dispersion of data?

Dispersion of data is measured mainly by 2 methods:

1. Standard Deviation
2. Interquartile Range

Standard deviation measures how much each observation differs from the mean value of the group.

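σ - Population Standard Deviation

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

where $x_i$ is each observation, $\mu$ is the mean, and $N$ is the total number of observations.

s - Sample Standard Deviation

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

where $x_i$ is each observation, $\bar{x}$ is the mean, and $n$ is the total number of observations.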

Population refers to the entire data set, and sample refers to a subset of the population.

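Most of the time, we will be dealing with sample data, as capturing the entire population is not practical. For example, if we want to analyse the food habits of Indians, we will survey 100 or 200 people and draw inferences about the entire population, as it is not feasible to survey the entire Indian population.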

Let us measure the sample standard deviation for the following data set.
The test scores of 10 students in a class are recorded as follows:

Test Scores
80
70
90
60
50
80
90
70
60
50
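The calculation for these ten scores can also be sketched in code. The following is a minimal Python illustration, not part of the original spreadsheet; the variable names are chosen here for readability, and the standard-library `statistics.stdev` call is used only as a cross-check of the manual steps.

```python
import math
import statistics

# Test scores of the 10 students, as listed above.
scores = [80, 70, 90, 60, 50, 80, 90, 70, 60, 50]

# Step 1: mean of the data.
mean = sum(scores) / len(scores)

# Step 2: each observation minus the mean, squared, then summed.
squared_deviations = [(x - mean) ** 2 for x in scores]
sum_of_squares = sum(squared_deviations)

# Step 3: divide by N for a population, or by N - 1 for a sample,
# and take the square root.
population_sd = math.sqrt(sum_of_squares / len(scores))
sample_sd = math.sqrt(sum_of_squares / (len(scores) - 1))

print("Mean:", mean)
print("Sum of squares:", sum_of_squares)
print("Population SD:", round(population_sd, 1))
print("Sample SD:", round(sample_sd, 1))

# Cross-check the manual sample calculation against the standard library.
assert math.isclose(sample_sd, statistics.stdev(scores))
```

Dividing by N - 1 instead of N gives a slightly larger value, which is why the sample figure is a little higher than the population figure for the same data.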

Measuring Interquartile Range


To measure the interquartile range, the data is divided into 4 parts:
First 25% of data (up to the 1st quartile)
Next 25% of data, or 50% of cumulative data (up to the median)
Next 25% of data, or 75% of cumulative data (up to the 3rd quartile)
Last 25% of data (3rd quartile to the highest value of the data)

First quartile: the lowest 25% of the numbers

Second quartile: greater than 25% of the data and less than or equal to 50% of the data
Third quartile: greater than 50% of the data and less than or equal to 75% of the data

Interquartile Range = Third Quartile - First Quartile

Leaving out the extreme values, the interquartile range considers the width of the data from 25% to 75% and uses it to measure dispersion.
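As an illustration, the interquartile range of the ten test scores used earlier can be computed with Python's standard library. This is only a sketch: quartiles can be calculated under several conventions that give slightly different results, and the `method="inclusive"` setting below is an assumption made for this example, not something the text specifies.

```python
import statistics

# Same test scores as in the standard deviation example.
scores = [80, 70, 90, 60, 50, 80, 90, 70, 60, 50]

# statistics.quantiles with n=4 returns the three cut points
# that split the data into four parts: Q1, median, Q3.
# method="inclusive" is an assumed convention for this sketch.
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")

# Interquartile Range = Third Quartile - First Quartile
iqr = q3 - q1

print("First quartile:", q1)
print("Median:", median)
print("Third quartile:", q3)
print("Interquartile range:", iqr)
```

Because only the middle 50% of the data enters the calculation, changing the lowest or highest score would leave this result unchanged.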

What is an outlier? How does an outlier influence standard deviation?

Outlier - any data point that is extremely far away from the main set of data

Method to identify outliers

There is no single mathematical definition of an outlier.
The following is one of the numerical methods used to identify outliers in the data.
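One widely used numerical rule, assumed here because it may not be the exact method the author had in mind, flags any value more than 1.5 times the interquartile range below the first quartile or above the third quartile. The sketch below applies that rule to the earlier test scores plus one hypothetical extreme score of 200, and also shows how that single value changes the sample standard deviation. The extra score, the 1.5 multiplier, and the `method="inclusive"` quartile convention are all assumptions for illustration.

```python
import statistics

# Earlier test scores plus one hypothetical extreme value (200).
scores = [80, 70, 90, 60, 50, 80, 90, 70, 60, 50]
scores_with_outlier = scores + [200]

# Quartiles and interquartile range (assumed "inclusive" convention).
q1, _, q3 = statistics.quantiles(scores_with_outlier, n=4, method="inclusive")
iqr = q3 - q1

# Assumed rule: anything beyond 1.5 * IQR from the quartiles is an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in scores_with_outlier if x < lower_fence or x > upper_fence]

# Compare the sample standard deviation with and without the extreme value.
sd_without = statistics.stdev(scores)
sd_with = statistics.stdev(scores_with_outlier)

print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
print("Sample SD without outlier:", round(sd_without, 1))
print("Sample SD with outlier:", round(sd_with, 1))
```

Because standard deviation squares each distance from the mean, a single extreme value inflates it noticeably, whereas the interquartile range ignores the extremes.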
Why is measuring dispersion of data important?

To make your predictions accurate, measures of central tendency (mean, median, mode) should be interpreted along with measures of dispersion.

Example: there are 2 bus routes.
Bus A arrives at the bus stop between 7.45 and 8.15 am.
Bus B arrives at the bus stop between 7.00 and 9.00 am.
The average time at which both Bus A and Bus B arrive at the bus stop is 8.00 am.
Though the average arrival times of Bus A and Bus B are the same, catching Bus A is easier because it arrives within a narrow time window (i.e. the arrival of Bus A is more predictable than that of Bus B).

Similarly, data with narrow dispersion is easier to predict, and without a measure of dispersion, the mean or median value is meaningless.
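The bus example can be made concrete with a small sketch. The arrival times below are hypothetical values invented for illustration (minutes before or after 8.00 am), chosen only so that both buses stay within the windows described and share the same average.

```python
import statistics

# Hypothetical arrival times in minutes relative to 8.00 am
# (negative = before 8.00, positive = after 8.00).
bus_a = [-15, -10, -5, 0, 5, 10, 15]    # within 7.45 to 8.15
bus_b = [-60, -40, -20, 0, 20, 40, 60]  # within 7.00 to 9.00

mean_a = statistics.mean(bus_a)
mean_b = statistics.mean(bus_b)

sd_a = statistics.stdev(bus_a)
sd_b = statistics.stdev(bus_b)

print("Average arrival, Bus A:", mean_a, "Bus B:", mean_b)
print("Sample SD, Bus A:", round(sd_a, 1), "Bus B:", round(sd_b, 1))
```

Both averages come out the same, but the standard deviation for Bus B is several times larger, which is the numerical counterpart of saying Bus A is more predictable.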

Mean of the data (average) = (80+70+90+60+50+80+90+70+60+50)/10 = 70

Test score    Each observation minus mean    Square of previous column
80            10 (i.e. 80-70)                100
70            0                              0
90            20                             400
60            -10                            100
50            -20                            400
80            10                             100
90            20                             400
70            0                              0
60            -10                            100
50            -20                            400

Sum of squares = 100+0+400+100+400+100+400+0+100+400 = 2000
Total number of observations = 10
Total number of observations - 1 = 9

Sample standard deviation = square root of (2000÷9) = 14.9
4 1.3
5 1.1
1 1.2
2 1.3
6 1.4
8 1.5
12 1.6
14 1.45
3.4 1.55
5.6 1.65