
LECTURE 5

THE MEASURES OF DISPERSION AND SKEWNESS


Learning Outcomes

At the end of this Lecture, you should be able to:


1. Explain the meaning of dispersion through examples.
2. Define the various measures of dispersion (range, mean deviation, variance and standard deviation) and compute them.
3. Compute the combined standard deviation.
4. Explain the concept of skewness in statistics.
5. Compute various measures of skewness.
Introduction

In the previous lecture, you learned when and how to compute several measures of central tendency. This lecture explores two other important properties of data: dispersion and skewness. Despite their great utility in statistical analysis, averages have their own limitations.
If we are given only the average of a series of observations, we cannot form a complete idea of the distribution, since many distributions may have the same average and yet differ widely from each other in other ways.
The following example illustrates this point.

Let us consider the following three series A, B and C of 9 items each.

Series   Items                                 Total   Mean   Mode   Median
A        15, 15, 15, 15, 15, 15, 15, 15, 15    135     15     15     15
B        11, 12, 13, 15, 15, 15, 17, 18, 19    135     15     15     15
C        3, 6, 9, 15, 15, 18, 20, 22, 27       135     15     15     15

All the three series A, B, and C, have the same size (n=9) and the same
mean, mode and median, i.e., 15. Thus, if we are given that the
mean, mode or median of a series of 9 observations is 15, we cannot
determine if we are talking of the series A, B, or C.
In fact any series of 9 items which total 135 will give mean 15.
Thus, we may have a large number of series with entirely different
structures and compositions but having the same mean. The measures
of central tendencies (i.e. mean, mode or median) indicate the
general magnitude of the data and locate only the center of a
distribution of measures. They do not establish the degree of
variability or the spread out or scatter of the individual items and their
deviation from (or the difference with) the mean.
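The point can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module (the variable names are illustrative, not part of the lecture); it shows that the three series share the same total, mean, median and mode while their spread, measured here simply as largest minus smallest, differs.

```python
from statistics import mean, median, mode

# The three series from the table above.
series = {
    "A": [15] * 9,
    "B": [11, 12, 13, 15, 15, 15, 17, 18, 19],
    "C": [3, 6, 9, 15, 15, 18, 20, 22, 27],
}

summary = {}
for name, values in series.items():
    summary[name] = {
        "total": sum(values),
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        # Largest minus smallest value: a first, crude look at spread.
        "spread": max(values) - min(values),
    }
    print(name, summary[name])
```

The averages agree for all three series, but the spread is 0 for A, 8 for B and 24 for C, which is exactly the information the averages hide.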
Definition of dispersion

The literal meaning of dispersion is ‘scatteredness’. We study dispersion to get an idea of the homogeneity (compactness) or heterogeneity (scatter) of the distribution.
In the above illustration, we say that the series A is stationary, i.e.,
it is constant and shows no variability. Series B is slightly dispersed
and series C is relatively more dispersed. We say that series B is
more homogeneous (or uniform) as compared with series C or the
series C is more heterogeneous than series B.
Spiegel notes that ‘the degree to which numerical data tend to spread about an average value is the variation or dispersion of the data’.
Characteristics of an Ideal Measure of Dispersion

• It should be rigidly defined.
• It should be easy to calculate and easy to understand.
• It should be based on all the observations.
• It should be amenable to further mathematical treatment.
• It should not be affected much by extreme observations.
• It should have sampling stability.
Absolute Vs. Relative Measures of Dispersion

Measures of dispersion that are expressed in the original units of a series are termed absolute measures. Such measures are not suitable for comparing the variability of two distributions that are expressed in different units of measurement.
On the other hand, relative measures of dispersion are
obtained as ratios or percentages and are thus pure numbers
independent of the units of measurement. For comparing the
variability of the two distributions (even if they are measured
in the same units), we compute the relative measures of
dispersion instead of absolute measures of dispersion.
1. The Range

• It is the simplest of all the measures of dispersion. It is defined as the difference between the two extreme observations of the distribution. In other words, the range is the difference between the greatest (maximum) and the smallest (minimum) observation of the distribution. Thus:
• Range = Xm – Xo (that is, Range = L – S, largest minus smallest),
• where Xm is the largest observation (maximum value) and
• Xo is the smallest observation (minimum value).
• In the case of a grouped frequency distribution (for discrete values) or a continuous frequency distribution, the range is defined as the difference between the upper limit of the highest class and the lower limit of the lowest class. It can also be calculated as the difference between the mid points of the highest class and the lowest class. Here, the frequencies of the classes are immaterial.
• To compute the coefficient of range (for comparison purposes), we have:

Coefficient of Range = (Xm – Xo)/(Xm + Xo)

• where Xm = maximum value and Xo = minimum value.
Range is an index of variability. The larger the range, the more variable the group; the smaller the range, the more homogeneous the group. Range is the most general measure of ‘spread’ or ‘scatter’ of scores (or measures). When we wish to make a rough comparison of the variability of two or more groups, we may compute the range.
The range as computed above is a crude, absolute measure of dispersion and is unfit for the purposes of comparison, especially when the series are in two different units. For the purpose of comparison, the coefficient of range is calculated by dividing the range by the sum of the largest and the smallest items.
• Example: Find the range and the coefficient of range of the following prices of shares of ABC Company Ltd.

Day:            Monday   Tuesday   Wednesday   Thursday   Friday   Saturday
Price (KShs):   200      210       208         160        200      250

• Solution:
• Range = Xm – Xo, where Xm = 250 and Xo = 160.
• Range = 250 – 160 = KShs 90.
• Coefficient of range = (Xm – Xo)/(Xm + Xo) = (250 – 160)/(250 + 160) = 90/410 = 0.22
Example

• Let us take two sets of observations. Set A contains the marks of five students in mathematics out of 25 marks, and set B contains the marks of the same students in English out of 100 marks.
• Set A: 10, 15, 18, 20, 20
• Set B: 30, 35, 40, 45, 50
• The values of the ranges and coefficients of range are calculated as:

                       Range           Coefficient of Range
Set A (Mathematics)    20 – 10 = 10    (20 – 10)/(20 + 10) = 0.33
Set B (English)        50 – 30 = 20    (50 – 30)/(50 + 30) = 0.25
In set A the range is 10 and in set B the range is 20. It may appear that there is greater dispersion in set B, but this is not true.
The range of 20 in set B is for larger observations and the range of 10 in set A is for smaller observations, so 20 and 10 cannot be compared directly. Their base is not the same: the marks in mathematics are out of 25 and the marks in English are out of 100. Thus, it makes no sense to compare 10 with 20.
When we convert these two values into coefficients of range, we see that the coefficient of range for set A (0.33) is greater than that of set B (0.25). Thus there is greater dispersion or variation in set A. The marks of the students in English are more stable than their marks in mathematics.
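The two formulas used in these examples can be sketched in a few lines of Python. This is only an illustration: the helper names `value_range` and `coefficient_of_range` are made up for this sketch, and the data are the share prices and the two sets of marks given above.

```python
def value_range(values):
    """Absolute measure: largest observation minus smallest observation."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative measure: (Xm - Xo) / (Xm + Xo), a pure number."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


share_prices = [200, 210, 208, 160, 200, 250]  # KShs, Monday to Saturday
set_a = [10, 15, 18, 20, 20]                   # mathematics, out of 25
set_b = [30, 35, 40, 45, 50]                   # English, out of 100

for label, data in [("Shares", share_prices), ("Set A", set_a), ("Set B", set_b)]:
    print(f"{label}: range = {value_range(data)}, "
          f"coefficient of range = {coefficient_of_range(data):.2f}")
```

Set B has the larger range but the smaller coefficient, which is why the relative measure is the one to use when the scales differ.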
Example

The following table gives the weights of the students of a university. Calculate the range and the coefficient of range.

Weight (Kg)          60 - 62   63 - 65   66 - 68   69 - 71   72 - 74
Number of Students   5         18        42        27        8
Solution

Weight (Kg)   Class Boundaries   Mid Value   No. of Students
60 - 62       59.5 - 62.5        61          5
63 - 65       62.5 - 65.5        64          18
66 - 68       65.5 - 68.5        67          42
69 - 71       68.5 - 71.5        70          27
72 - 74       71.5 - 74.5        73          8
• Method 1: Using class boundaries
• Here Xm = the upper class boundary of the highest class = 74.5
• and Xo = the lower class boundary of the lowest class = 59.5
• Range = Xm – Xo = 74.5 – 59.5 = 15
• Coefficient of range = (Xm – Xo)/(Xm + Xo) = (74.5 – 59.5)/(74.5 + 59.5) = 15/134 = 0.1119
• Method 2: Using class mid points
• Here Xm = the mid value of the highest class = 73
• and Xo = the mid value of the lowest class = 61
• Range = Xm – Xo = 73 – 61 = 12
• Coefficient of range = (Xm – Xo)/(Xm + Xo) = (73 – 61)/(73 + 61) = 12/134 = 0.0896
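Both methods can be reproduced with a short script. The sketch below assumes the class limits of the weight table above and that class boundaries lie half the gap between consecutive classes beyond the stated limits (0.5 kg here); the variable names are illustrative.

```python
# Class limits (lower, upper) from the weight table, in kg.
classes = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]

# Assumption: boundaries sit half the gap between consecutive classes
# beyond the stated limits (the gap between 62 and 63 is 1, so 0.5).
half_gap = (classes[1][0] - classes[0][1]) / 2

# Method 1: class boundaries of the lowest and highest class.
lower_boundary = classes[0][0] - half_gap
upper_boundary = classes[-1][1] + half_gap
range_boundaries = upper_boundary - lower_boundary
coef_boundaries = range_boundaries / (upper_boundary + lower_boundary)

# Method 2: mid points of the lowest and highest class.
mid_low = sum(classes[0]) / 2
mid_high = sum(classes[-1]) / 2
range_midpoints = mid_high - mid_low
coef_midpoints = range_midpoints / (mid_high + mid_low)

print(f"Boundaries: range = {range_boundaries}, coefficient = {coef_boundaries:.4f}")
print(f"Mid points: range = {range_midpoints}, coefficient = {coef_midpoints:.4f}")
```

Note that the frequencies never enter the calculation, which matches the remark that they are immaterial for the range.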
Activity

• Find the range and coefficient of range of the data in the following distribution using both methods.
Solution - Method 1: Using class boundaries

• In this case, the upper true limit of the highest class 70-79 is Xm = 79.5 and the lower true limit of the lowest class 20-29 is Xo = 19.5.
• Therefore, Range R = Xm – Xo = 79.5 – 19.5 = 60.00
• Coefficient of range = (Xm – Xo)/(Xm + Xo) = (79.5 – 19.5)/(79.5 + 19.5) = 60/99 = 0.6061
Solution - Method 2: Using class mid points

• Here Xm = the mid value of the highest class = 74.5
• and Xo = the mid value of the lowest class = 24.5
• Range = Xm – Xo = 74.5 – 24.5 = 50
• Coefficient of range = (Xm – Xo)/(Xm + Xo) = (74.5 – 24.5)/(74.5 + 24.5) = 50/99 = 0.5051
Merits of Range

• 1. Range is easily calculated and readily understood.
• 2. It is the simplest measure of variability and provides a quick estimate of variability.
• 3. It is used in quality control to check the quality of a product. Range plays an important role in preparing R-charts, which help to maintain quality.
• 4. Expectations about the prices of gold and shares are also formed from the range within which prices have moved over recent periods.
• 5. The meteorological department also makes weather forecasts by keeping the range of temperature in view.
Demerits of Range

• 1. The range does not change at all even if every value between the two extremes is changed.
• 2. It is not based on all the observations of the series. It takes only the highest and the lowest scores into account.
• 3. For open-end intervals, the range is indeterminate because the lower limit of the first interval and the upper limit of the last interval are not given.
• 4. It is affected very greatly by fluctuations in sampling, so its value is never stable. In a class where the heights of students normally range from 150 cm to 180 cm, the range is 30 cm; if a student whose height is 90 cm is admitted, the range shoots up to 90 cm (from 90 cm to 180 cm).
• 5. The range does not represent the series and its dispersion truly. An asymmetrical and a symmetrical distribution can have the same range but not the same dispersion. It is of limited accuracy and should be used with caution.
Uses of Range: In spite of these limitations, the range has applications in a number of fields:

• It is used in studying the variations in the prices of stocks (i.e., stock market fluctuations) and other commodities.
• It is used in industry for the statistical quality control of manufactured products through the construction of the R-chart, i.e., the control chart for range.
• It is used very conveniently by the meteorological department, e.g., for the maximum and minimum temperatures of the day.
• It is the most widely used measure of variability in day-to-day life, for example the difference between the highest-paid and the lowest-paid workers.
When to use Range?

• 1. When the data are too scant or too scattered to justify the computation of a more precise measure of variability.
• 2. When a knowledge of the extreme scores or of the total spread is all that is required.
2. Quartile Deviation and its Coefficient

Range is the interval or distance on the scale of measurement which includes 100 percent of the cases. The limitations of the range are due to its dependence on the two extreme values only.
There are some measures of dispersion which are independent of these two extreme values. The most common of these is the quartile deviation, which is based upon the interval containing the middle 50 percent of the cases in a given distribution.
One quarter means 1/4th of something, as when a scale is divided into four equal parts. “The quartile deviation or Q is one-half the scale distance between the 75th and 25th percentiles in a frequency distribution.”
Quartile deviation is based on the lower quartile Q1 and the upper quartile Q3.
The difference Q3 – Q1 is called the inter-quartile range.
The difference Q3 – Q1 divided by 2 is called the semi-inter-quartile range or the quartile deviation. Symbolically:

Q.D. = (Q3 – Q1)/2

The quartile deviation is a slightly better measure of absolute dispersion than the range, but it ignores the observations in the tails. If we take different samples from a population and calculate their quartile deviations, the values are likely to differ considerably. This is called sampling fluctuation, and it is one reason why the quartile deviation is not a popular measure of dispersion. The quartile deviation calculated from sample data does not help us to draw any conclusion (inference) about the quartile deviation in the population.
• The Quartile Deviation (Q) is one half the scale distance between the Third Quartile (Q3) and the First Quartile (Q1). For grouped data, the third quartile is:

Q3 = L + ((3N/4 – F)/fq) × i

• where L = lower limit of the class in which Q3 lies,
• 3N/4 = 3/4 of N or 75% of N,
• F = total of all frequencies below ‘L’,
• fq = frequency of the class in which Q3 lies,
• and i = size or length of the class interval.
• The first quartile is:

Q1 = L + ((N/4 – F)/fq) × i

• where L = lower limit of the class in which Q1 lies,
• N/4 = one fourth (or 25%) of N,
• F = total of all frequencies below ‘L’,
• fq = frequency of the class in which Q1 lies,
• and i = size or length of the class interval in which Q1 lies.
• Inter-Quartile Range:
• The range between the third quartile and the first quartile is known as the inter-quartile range. Symbolically, inter-quartile range = Q3 – Q1.
• Semi-Interquartile Range:
• It is half the distance between the third quartile and the first quartile.
• Thus, S.I.R. = (Q3 – Q1)/2
• Q or Quartile Deviation is otherwise known as the semi-interquartile range (or S.I.R.).
• Thus, Q = (Q3 – Q1)/2
Note: If we compare the formulas for Q3 and Q1 with the formula for the median, the following observations become clear:

• i. For the median we use N/2, whereas for Q1 we use N/4 and for Q3 we use 3N/4.
• ii. For the median we use fm to denote the frequency of the class in which the median lies, but for Q1 and Q3 we use fq to denote the frequency of the class in which Q1 or Q3 lies.
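The interpolation rule for grouped data, L + ((pN – F)/fq) × i with p = 1/4 for Q1 and p = 3/4 for Q3, can be sketched as below. This is an illustration only: the function name `grouped_quantile` is hypothetical, and the data are the class boundaries and frequencies of the weight table used in the range example.

```python
def grouped_quantile(boundaries, frequencies, fraction):
    """Interpolated quantile for grouped data: L + ((fraction*N - F) / fq) * i."""
    n = sum(frequencies)
    target = fraction * n          # N/4 for Q1, 3N/4 for Q3
    cumulative = 0                 # F: total of frequencies below L
    for (lower, upper), f in zip(boundaries, frequencies):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")


# Class boundaries and frequencies from the weight table (N = 100).
boundaries = [(59.5, 62.5), (62.5, 65.5), (65.5, 68.5), (68.5, 71.5), (71.5, 74.5)]
frequencies = [5, 18, 42, 27, 8]

q1 = grouped_quantile(boundaries, frequencies, 0.25)
q3 = grouped_quantile(boundaries, frequencies, 0.75)
quartile_deviation = (q3 - q1) / 2

print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, Q.D. = {quartile_deviation:.2f}")
```

Calling the same function with `fraction=0.5` gives the median, which mirrors the note above: only N/2, N/4 or 3N/4 and the class frequency change.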
Coefficient of quartile deviation

A relative measure of dispersion based on the quartile deviation is called the coefficient of quartile deviation:

Coefficient of Q.D. = (Q3 – Q1)/(Q3 + Q1)

It is a pure number free of any units of measurement. It can be used for comparing the dispersion of two or more sets of data.
Quartile Deviation and its Coefficient - Ungrouped Data

Example: Calculate the Quartile Deviation and the Coefficient of Quartile Deviation from the following data:
20, 8, 12, 10, 14, 18, 16.

Solution
Arrange the data in ascending order: 8, 10, 12, 14, 16, 18, 20.
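A minimal sketch of this calculation follows. It assumes the common textbook convention that Q1 is the value of the ((n + 1)/4)-th item and Q3 the value of the (3(n + 1)/4)-th item of the ordered data; other conventions exist and can give slightly different quartiles. The helper `item_at_position` is hypothetical.

```python
def item_at_position(sorted_values, position):
    """Value at a 1-based position; fractional positions are interpolated linearly."""
    lower_index = int(position)            # whole part of the position
    fraction = position - lower_index      # fractional part, 0 if the position is whole
    value = sorted_values[lower_index - 1]
    if fraction > 0:
        value += fraction * (sorted_values[lower_index] - value)
    return value


data = sorted([20, 8, 12, 10, 14, 18, 16])
n = len(data)

# Assumed convention: Q1 at position (n + 1)/4, Q3 at position 3(n + 1)/4.
q1 = item_at_position(data, (n + 1) / 4)
q3 = item_at_position(data, 3 * (n + 1) / 4)

quartile_deviation = (q3 - q1) / 2
coefficient_of_qd = (q3 - q1) / (q3 + q1)

print(f"Q1 = {q1}, Q3 = {q3}")
print(f"Q.D. = {quartile_deviation}, coefficient = {coefficient_of_qd:.3f}")
```

Under this convention the quartiles are the 2nd and 6th ordered items, so no interpolation is needed for n = 7.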
• Here, n = 7, and the first and third quartiles are read from the ordered data.
2.Quartile Deviation and its
Coefficient- Discrete Series
2.Quartile Deviation and its
Coefficient- Continuous Series
Activity:

• Calculate the Quartile Deviation and the Coefficient of Quartile Deviation.

Solution
Merits of quartile deviation:

• Quartile deviation is simple to calculate and easy to understand.
• It is more representative and trustworthy than the range. It can be used to study dispersion in the case of open-ended class intervals.
• It is a good index of score density at the middle of the distribution.
• Quartile deviation is rigidly defined.
• The presence of extreme observations has no impact on the quartile deviation, since it is based on the central fifty percent of the observations.
• Wherever the median is preferred as a measure of central tendency, the quartile deviation is preferred as the measure of dispersion.
Demerits of quartile deviation

• 1. It is not based on all the observations of the data. It ignores the first 25% and the last 25% of the scores.
• 2. Further algebraic treatment is not possible in the case of Q. It is only a positional measure. It does not study the variation of the values of a variable from any average; it merely indicates a distance on a scale.
• 3. It is affected by fluctuations of sampling. Its value can be altered by a change in the value of a single score.
Uses of quartile deviation

• When extreme scores would distort the S.D., or the scores are widely scattered, Q is used as the measure of variability.
• When our primary interest is in the concentration around the median (the middle 50% of cases), Q is used.
• When the median is the chosen measure of central tendency.
• Quartile deviation is the best measure of dispersion for open-end classifications.
3. The Average Deviation (A.D.)

• Neither the range nor the quartile deviation takes all the individual scores into account. We can overcome some of the serious shortcomings of the range and the quartile deviation by using another measure of dispersion called the average deviation or mean deviation.
• “Average deviation is the arithmetic mean of all the deviations of different scores from the mean value of the scores without regard for the sign of the deviation.”
• Average deviation is the arithmetic mean of the deviations of a series computed from some measure of central tendency (mean, median or mode), all the deviations being considered positive.
• In other words, the average of the absolute deviations of all the values from the arithmetic mean is known as the mean deviation or average deviation.
• No account is taken of signs, and all deviations, whether +ve or –ve, are treated as positive.

AD = ∑|x| / N

Where AD = average deviation
∑ = capital sigma, meaning ‘sum total of’
| | = modulus (mod for short), meaning the sign is disregarded
x = deviation, (X – M)
N = number of scores
Computation of Average Deviation

• There are two situations for computing the average deviation:
• (a) When data are ungrouped.
• (b) When data are grouped.
Computation of AD from ungrouped data.
Example:

Find the AD of the following 10 scores:
23, 34, 16, 27, 28, 39, 45, 26, 18, 27

• Solution:
• Step-1: Find the mean of the scores with the formula M = ∑X/N. Here ∑X = 283 and N = 10, so M = 28.3.
• Step-2: Find the deviation of each score by subtracting the mean from the score.
• Step-3: Find the absolute deviations, as shown in table 9.2, and then ∑|x|. Here ∑|x| = 66.2.
• Step-4: Put the values in the formula: AD = ∑|x|/N = 66.2/10.
• The A.D. = 6.62.

Example

• Calculate the mean deviation about the arithmetic mean of the following numbers: 35, 70, 15, 75, 30, 50
• Solution:
• Let us arrange the numbers in increasing order as 15, 30, 35, 50, 70, 75 and compute their AM as:
• AM = (15 + 30 + 35 + 50 + 70 + 75)/6 = 275/6 = 45.83.
Next, we take the absolute deviation of each number from the AM.
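The computation AD = ∑|x|/N can be sketched in Python as follows. The function name `mean_deviation` is illustrative; the two data sets are the 10 scores and the 6 numbers from the examples above, and deviations are taken from the arithmetic mean.

```python
def mean_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    m = sum(values) / len(values)
    return sum(abs(v - m) for v in values) / len(values)


scores = [23, 34, 16, 27, 28, 39, 45, 26, 18, 27]
numbers = [35, 70, 15, 75, 30, 50]

for label, data in [("10 scores", scores), ("6 numbers", numbers)]:
    m = sum(data) / len(data)
    print(f"{label}: mean = {m:.2f}, mean deviation = {mean_deviation(data):.2f}")
```

For the six numbers the absolute deviations add up to 115, so the mean deviation is 115/6, about 19.17.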
Activity

Find the mean deviation for the following set of variates:
X = 55, 45, 39, 41, 40, 48, 42, 53, 41, 56

Solution

• Here we have
Computation of AD from grouped data: Discrete Series

• Calculate the Mean Deviation for the following data:

• Solution:
• To calculate the MD of the given distribution, we construct the following table:
Computation of AD from grouped data: Continuous Series

• Find the mean deviation for the following frequency distribution:

• Step-1: Find the mean of the distribution. Mean = 32.
• Step-2: Find the midpoint of each class interval, as in column 3 of table 4.4.
• Step-3: Find the absolute deviation |d| of each midpoint from the mean, as in column 4.
• Step-4: Find |fd| by multiplying f by |d|, as shown in column 5, and find ∑|fd|.
• Step-5: Put the above values in the formula.
• The formula for AD from grouped data is:

AD = ∑|fd| / N
Coefficient of Mean Deviation

It is calculated to compare the data of two series. The coefficient of mean deviation is calculated by dividing the mean deviation by the average from which the deviations were taken:

Coefficient of M.D. = M.D. / Average

If the deviations are taken from the mean, we divide by the mean; if they are taken from the mode, we divide by the mode; and if they are taken from the median, we divide the mean deviation by the median.
Activity

Compute the M.D. and the coefficient of M.D. from the mean and from the median for the following data:

Marks             0 – 10   10 – 20   20 – 30   30 – 40   40 – 50
No. of Students   6        28        51        11        4

Solution
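One way to work through this activity is sketched below. It is a hedged illustration rather than the lecture's own solution: class midpoints stand in for the individual marks, the median is found by the usual interpolation formula for grouped data, and all function and variable names are made up for this sketch.

```python
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [6, 28, 51, 11, 4]

midpoints = [(lower + upper) / 2 for lower, upper in classes]
n = sum(frequencies)

# Arithmetic mean from midpoints: sum(f * X) / N.
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n


def grouped_median(classes, frequencies):
    """Median by interpolation: L + ((N/2 - F) / fm) * i."""
    target = sum(frequencies) / 2
    cumulative = 0
    for (lower, upper), f in zip(classes, frequencies):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f


def mean_deviation_about(center):
    """Sum of f * |X - center| divided by N."""
    return sum(f * abs(x - center) for f, x in zip(frequencies, midpoints)) / n


median = grouped_median(classes, frequencies)
md_mean = mean_deviation_about(mean)
md_median = mean_deviation_about(median)

print(f"Mean = {mean:.2f}, M.D. = {md_mean:.3f}, coefficient = {md_mean / mean:.3f}")
print(f"Median = {median:.2f}, M.D. = {md_median:.3f}, coefficient = {md_median / median:.3f}")
```

The mean deviation about the median comes out slightly smaller than the one about the mean, which is the general behaviour: the sum of absolute deviations is smallest when taken from the median.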
Merits of A.D.

• Average deviation is rigidly defined and its value is precise and definite.
• It is easy to calculate numerically and simple to understand.
• It is based on all the observations.
• It is less affected by the values of extreme scores.
• On many occasions it represents the degree of variability or the extent of dispersion of the given values of a variable fairly well, because it takes every observation into account separately.
Demerits of A.D.

• 1. The most serious drawback of the average deviation is that it ignores the algebraic signs of the deviations, which is against the fundamental rules of mathematics.
• 2. Further algebraic treatment is not possible in the case of AD.
• 3. It is very rarely used, because the standard deviation is generally preferred as a measure of dispersion.
• 4. When calculated from the mode, AD does not give an accurate measure of dispersion.
Uses of Average Deviation

• 1. Average deviation is used when it is desired to weight all the deviations from the mean according to their size.
• 2. When extreme scores would unduly influence the standard deviation, AD is the better measure of dispersion.
• 3. AD is used when we want to know the extent to which the measures are spread out on either side of the mean.
4. The Standard Deviation (SD)

The range takes into account only the highest score and the lowest score.
The quartile deviation takes into account only the middle 50% of the scores, and in the case of the average deviation we ignore the signs.
Therefore, in order to overcome all these difficulties, we use another measure of dispersion called the standard deviation. It is commonly used in experimental research as it is the most stable index of variability.
Symbolically, it is written as σ (the Greek small letter sigma).
• It is also the most important measure, being the only measure of dispersion considered here that is amenable to further algebraic treatment.
• Here also, the deviations of all the values from the mean of the distribution are considered. This measure suffers from the fewest drawbacks and provides accurate results.
• It removes the drawback of ignoring the algebraic signs while calculating the deviations of the items from the average. Instead of neglecting the signs, we square the deviations, thereby making all of them positive.
It differs from the AD in several respects:

• i. In computing AD or MD we disregard signs, whereas in finding SD we avoid the difficulty of signs by squaring the separate deviations.
• ii. The squared deviations used in computing SD are always taken from the mean, never from the median or mode.

• “Standard deviation or S.D. is the square root of the mean of the squared deviations of the individual scores from the mean of the distribution.”
Standard deviation is the square root of the average value of the squared deviations of the scores from their arithmetic mean. The SD is computed by summing the squared deviations of each measure from the mean, dividing by the number of cases and extracting the square root:

σ = √(∑x²/N), where x = X – M

To be clearer, in computing the SD we square all the deviations separately, find their sum, divide the sum by the total number of scores and then find the square root of the mean of the squared deviations. That is why it is also called the ‘root mean square deviation’.
The square of the standard deviation is known as the variance (σ²). It is referred to as the mean square deviation. It is also called the second moment of dispersion.
Steps in the calculation of SD from Ungrouped Data

• 1. First of all, calculate the mean of the data.
• 2. Find the deviation from the mean for each value in the series.
• 3. Square each deviation from the mean computed in step 2.
• 4. Find the sum of the squared deviations computed in step 3.
• 5. Divide the sum of the squared deviations by n, the number of observations in the series. At this point, you have computed the VARIANCE OF THE SERIES.
• 6. Take the positive square root of the quotient computed in step 5; the result is the standard deviation of the series.
Computation of SD from Ungrouped Data

• Example: Find the SD of the following data:
• 6, 8, 10, 12, 5, 8, 9, 17, 20, 11.
• Solution:
• Step-1: Find the mean of the scores. Here ∑X = 106 and N = 10, so the mean is 10.6.
• Step-2: Find the deviation (x) of each score from the mean. In this case, since the mean is 10.6, we subtract 10.6 from each score (X).
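The steps listed for ungrouped data can be followed literally in code. The sketch below works on the ten scores exactly as listed in the example and divides by n (not n – 1), as the steps prescribe; `statistics.pstdev` from the standard library uses the same divisor and serves as a cross-check.

```python
from math import sqrt
from statistics import pstdev

scores = [6, 8, 10, 12, 5, 8, 9, 17, 20, 11]
n = len(scores)

mean = sum(scores) / n                          # step 1: mean of the data
deviations = [x - mean for x in scores]         # step 2: deviation of each score
squared = [d ** 2 for d in deviations]          # step 3: square each deviation
sum_of_squares = sum(squared)                   # step 4: sum of squared deviations
variance = sum_of_squares / n                   # step 5: divide by n (the variance)
sd = sqrt(variance)                             # step 6: positive square root

print(f"mean = {mean}, variance = {variance:.2f}, SD = {sd:.2f}")
print(f"cross-check with statistics.pstdev: {pstdev(scores):.2f}")
```

If the sample formula with divisor n – 1 were wanted instead, `statistics.stdev` would be the matching library call; the lecture's steps use n.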
Activity

• Calculate the SD of the following numbers:
• 20, 85, 120, 60 and 40
• Solution
• We first find the AM of the given numbers.
• Then we calculate the desired SD from the squared deviations about the AM.

Computation of SD from Grouped Data

• From grouped data, the standard deviation can be calculated by two methods:
• (A) Long Method.
• (B) Short or Coded Method.

Steps for Calculating Standard Deviation (SD) by Long Method

• 1. Find the midpoints (X) of the intervals, or use the variate given.
• 2. Find the mean of the distribution.
• 3. Find the deviations (x) of the midpoints (or variate) from the mean.
• 4. Square all the deviations from the mean to obtain x².
• 5. Find fx² (multiply each squared deviation by its respective frequency).
• 6. Add all the fx² to obtain ∑fx².
• 7. Divide the sum ∑fx² by the total number of cases (∑fx²/N). THE VALUE YOU OBTAIN IS THE VARIANCE OF THE DISTRIBUTION.
• 8. Find the square root of the quotient. The result is the standard deviation or SD.
Steps for calculating standard deviation (SD) by short or coded method:

• 1. Find the mean of the distribution, ∑fX/N, where X is the midpoint (or variate).
• 2. Find fX² for each class and the sum ∑fX².
• 3. Find ∑fX²/N.
• 4. Subtract the square of ∑fX/N from ∑fX²/N.
• 5. Find the square root of the remainder; the result is the SD.

FORMULA

• In the case of GROUPED DATA, the standard deviation is calculated using the formulas given below.

Long method: σ = √(∑fx²/N), where x = X – M
Short method: σ = √(∑fX²/N – (∑fX/N)²)
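As a check that the long and short methods agree, the sketch below applies both to the same grouped data. The data are an assumption made for illustration: the midpoints and frequencies of the weight table from the range example, not the distribution used in the lecture's own SD example.

```python
from math import sqrt

# Midpoints (X) and frequencies (f) of the weight classes 60-62, ..., 72-74.
midpoints = [61, 64, 67, 70, 73]
frequencies = [5, 18, 42, 27, 8]
n = sum(frequencies)

mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Long method: deviations x = X - M, then sqrt(sum(f * x^2) / N).
variance_long = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints)) / n
sd_long = sqrt(variance_long)

# Short method: sqrt(sum(f * X^2) / N - (sum(f * X) / N)^2).
mean_of_squares = sum(f * x ** 2 for f, x in zip(frequencies, midpoints)) / n
variance_short = mean_of_squares - mean ** 2
sd_short = sqrt(variance_short)

print(f"mean = {mean:.2f}")
print(f"long method:  variance = {variance_long:.4f}, SD = {sd_long:.4f}")
print(f"short method: variance = {variance_short:.4f}, SD = {sd_short:.4f}")
```

The short method avoids computing a deviation for every class, which is why it is quicker by hand, but it subtracts two large numbers and so needs the intermediate values kept to enough decimal places.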
Long Method/Direct Method: Example

• Find the S.D. for the following distribution.

• Here also, the first step is to find the mean M, for which we take the mid-points of the class intervals (c.i.’s), denoted by X’, and find the products fX’. The mean is given by ∑fX’/N.
• The second step is to find the deviations of the mid-points of the class intervals X’ from the mean, i.e. X’ – M, denoted by d.
• The third step is to square the deviations and find the product of each squared deviation and the corresponding frequency.

Solution

To solve the above problem, the c.i.’s are written in column 1, the frequencies in column 2, the mid-points X’ in column 3, the products fX’ in column 4, the deviations d of X’ from the mean in column 5, the squared deviations d² in column 6, and the products fd² in column 7.
Thus, the required standard deviation is 4.74.
Combined Standard Deviation (σcomb):

• When two sets of scores have been combined into a single lot, it is possible to calculate the σ of the total distribution from the σ’s of the two component distributions.
• The formula is:

σcomb = √[ (N1(σ1² + d1²) + N2(σ2² + d2²)) / (N1 + N2) ]

• where σ1 = SD of distribution 1
• σ2 = SD of distribution 2
• d1 = (M1 – Mcomb)
• d2 = (M2 – Mcomb)
• N1 = No. of cases in distribution 1.
• N2 = No. of cases in distribution 2.
• Example: Suppose we are given the means and SD’s on an Achievement Test for two classes differing in size, and are asked to find the σ of the combined group.

Solution

• First, we find the combined mean Mcomb, and then d1 and d2.

• The formula can be extended to any number of distributions. For example, in the case of three distributions, it will be

σcomb = √[ (N1(σ1² + d1²) + N2(σ2² + d2²) + N3(σ3² + d3²)) / (N1 + N2 + N3) ]
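A small sketch can confirm that the combined formula reproduces the SD of the pooled raw scores. The two classes below are hypothetical numbers invented for illustration (the lecture's own example data are not reproduced here), and `combined_sd` is an illustrative helper that accepts any number of groups.

```python
from math import sqrt
from statistics import fmean, pstdev


def combined_sd(groups):
    """groups: list of (N, M, sigma) tuples, sigma computed with divisor N."""
    total_n = sum(n for n, _, _ in groups)
    combined_mean = sum(n * m for n, m, _ in groups) / total_n
    # Each group contributes N * (sigma^2 + d^2), with d = M - Mcomb.
    pooled = sum(n * (sd ** 2 + (m - combined_mean) ** 2) for n, m, sd in groups)
    return sqrt(pooled / total_n)


# Hypothetical achievement-test scores for two classes of different size.
class_1 = [40, 50, 60, 70]
class_2 = [55, 65, 75, 85, 95, 45]

groups = [(len(g), fmean(g), pstdev(g)) for g in (class_1, class_2)]
from_formula = combined_sd(groups)
from_raw_scores = pstdev(class_1 + class_2)

print(f"combined SD from the formula:    {from_formula:.4f}")
print(f"SD of the pooled raw scores:     {from_raw_scores:.4f}")
```

The practical value of the formula is that only N, M and σ of each group are needed, so the combined SD can be found even when the raw scores are no longer available.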
Merits of S.D

• Standard deviation is rigidly defined and its value is always definite.
• It is the most widely used and important measure of dispersion. It occupies a central position in statistics.
• It is based on all the observations of the data.
• It is capable of further algebraic treatment and possesses many mathematical properties.
• Unlike Q and AD, it is less affected by fluctuations of sampling.
• Unlike AD, it does not ignore the negative signs. By squaring the deviations it overcomes this difficulty.
• It is the most reliable and accurate measure of variability. It always goes with the mean, which is the most stable measure of central tendency.
• S.D. gives a measure that has a comparable meaning from one test to another. Moreover, the units of the normal curve are expressed in σ units.
Demerits of S.D:

• 1. It involves complicated and laborious numerical calculations, especially when the data set is large.
• 2. The concept of SD is neither easy to grasp nor simple to calculate.
• 3. It is considerably affected by the extreme values of the given variable.
• 4. S.D. gives more weight to extreme scores and less to those nearer the mean. This is because the squares of large deviations are proportionately greater than the squares of comparatively small deviations.
When to Use the Standard Deviation (SD)

• 1. S.D. is used when we want the measure of variability having the greatest stability.
• 2. S.D. is used when extreme deviations should exert a proportionally greater effect on the measure of variability.
• 3. S.D. is used for calculating further statistics such as the coefficient of correlation, standard scores, standard errors, Analysis of Variance, Analysis of Co-variance, etc.
• 4. S.D. is used when scores are to be properly interpreted with reference to the normal curve.
• 5. S.D. is used when we want to determine the reliability and validity of test scores.
Measurements of Relative Dispersion (Coefficient of Variation)

• The measures of dispersion give us an idea of the extent to which scores are scattered around their central value. Therefore, two frequency distributions having the same central value can be compared directly with the help of the various measures of dispersion.
• There are situations in which two or more distributions having unequal means or different units of measurement are to be compared in respect of their scatteredness or variability. For making such comparisons we use coefficients of relative dispersion, or the coefficient of variation (C.V.).
Computation of Coefficient of Variation:

C.V. = (σ / Mean) × 100

Merits of Coefficient of Variation:

• 1. The coefficient of variation is independent of the unit of measurement, as it is expressed as a percentage.
• 2. It facilitates the comparison of data sets with different units of measurement or significantly different means.
Demerits of Coefficient of Variation:

• 1. The coefficient of variation cannot be computed if the mean of a data set is zero.
• 2. It is misleading when there are both positive and negative values in a data set.
• 3. It cannot be used to determine the confidence interval for the mean, as the standard deviation can.
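A short sketch of the coefficient of variation, computed as (σ / mean) × 100 with σ taken with divisor N, is given below. It reuses the mathematics and English marks from the range example (Set A and Set B); the function name is illustrative, and the guard for a zero mean reflects the first demerit listed above.

```python
from statistics import fmean, pstdev


def coefficient_of_variation(values):
    """C.V. as a percentage: (population SD / mean) * 100."""
    m = fmean(values)
    if m == 0:
        raise ValueError("C.V. is undefined when the mean is zero")
    return pstdev(values) / m * 100


set_a = [10, 15, 18, 20, 20]   # mathematics marks, out of 25
set_b = [30, 35, 40, 45, 50]   # English marks, out of 100

cv_a = coefficient_of_variation(set_a)
cv_b = coefficient_of_variation(set_b)

print(f"Set A: SD = {pstdev(set_a):.2f}, C.V. = {cv_a:.2f}%")
print(f"Set B: SD = {pstdev(set_b):.2f}, C.V. = {cv_b:.2f}%")
```

Set B has the larger standard deviation but the smaller C.V., in line with the conclusion reached earlier from the coefficients of range.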
Lorenz Curve

• The Lorenz curve is a graphical method of studying the dispersion of data, named after Dr. Max O. Lorenz, who developed it in 1905 to study the dispersion of wealth graphically. In order to construct a Lorenz curve, the items as well as the frequencies are cumulated and each total is taken as 100 percent.
• Percentages are then calculated for the cumulated values and plotted on graph paper. If the frequencies are equally distributed, the points lie on a straight line. This line is known as the line of equal distribution or the line of equality.
However, if the distribution is unequal, the curve lies away from the line of equality. The farther the curve is from the line of equal distribution, the higher is the inequality or dispersion. Given below is the Lorenz curve depicting income distribution among households.
As shown by the figure, the further the Lorenz curve is from the "line of perfect equality" (the diagonal), the more diverse is the sample and the more unevenly the values are spread out. This is very useful for estimating how wealth is distributed among a population: if a country's Lorenz curve is distant from the line of perfect equality, it means that a small percentage of the population controls most of the wealth and that the country's income distribution is uneven.
 In a perfectly equal country, 60% of the population would
earn 60% of the country's wealth, but in this example:
 60% of the population of country X earns 20% of the
country's wealth
 60% of the population of country Y earns 15% of the
country's wealth
 This means that the income distribution in country Y is
more unequal than in country X.
To draw a Lorenz curve, follow these steps:
 Gather the data (e.g. census data from two cities).
 For each set of data, rank the categories and order them by rank in a
table.
 Convert each value into a percentage of the total.
 Calculate the running totals (i.e. the cumulative %, by adding the % of
one line to the ones before).
 Graph ranks (horizontal) against cumulative % (vertical).
 Draw the "even distribution line" running from (rank = 0, % = 0) to
(rank = max, % = 100%), which represents the curve if all the categories
were the same size.
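The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the three-category survey are hypothetical, and the running total is kept on the raw counts before converting to a percentage, which gives the same curve as adding rounded percentages but avoids accumulating rounding error.

```python
def lorenz_points(counts):
    """Return (rank, cumulative %) points for a Lorenz curve.

    `counts` holds the size of each category; categories are ranked
    from largest to smallest, as in the steps above.
    """
    ranked = sorted(counts, reverse=True)      # rank the categories
    total = sum(ranked)
    points = [(0, 0.0)]                        # the curve starts at the origin
    running = 0
    for rank, value in enumerate(ranked, start=1):
        running += value                       # running total of the counts
        points.append((rank, 100 * running / total))
    return points

def even_distribution_line(n_categories):
    """Points of the straight line from (0, 0%) to (max rank, 100%)."""
    return [(r, 100 * r / n_categories) for r in range(n_categories + 1)]

# Hypothetical survey with three categories.
points = lorenz_points([20, 50, 30])
line = even_distribution_line(3)
print(points)   # [(0, 0.0), (1, 50.0), (2, 80.0), (3, 100.0)]
```

Plotting `points` and `line` on the same axes (rank horizontal, cumulative % vertical) gives the Lorenz curve and the even distribution line.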
Example: comparison of employment
between city block #1 and city block #2
 Employment survey in city block #1:
Occupation         Number employed   Rank   Frequency (= % of total)   Cumulative frequency %
Office workers     195               1      60%                        60%
Retail workers     34                2      10.5%                      70.5%
Waiters            25                3      7.6%                       78.1%
Teachers           23                4      7.1%                       85.2%
Public employees   18                5      5.5%                       90.7%
Managers           16                6      4.8%                       95.5%
Doctors            7                 7      2.1%                       97.6%
Unemployed         6                 8      1.8%                       99.4%
CEOs               1                 9      0.6%                       100%
 Employment survey in city block #2:
Occupation         Number employed   Rank   Frequency (= % of total)   Cumulative frequency %
Waiters            19                1      26.3%                      26.3%
Managers           18                2      25%                        51.3%
Office workers     14                3      19.4%                      70.7%
CEOs               6                 4      8.3%                       79.0%
Doctors            5                 5      6.9%                       85.9%
Retail workers     4                 6      5.5%                       91.4%
Public employees   3                 7      4.2%                       95.6%
Unemployed         2                 8      2.9%                       98.5%
Teachers           1                 9      1.5%                       100%
[Figure: Lorenz curves for city block #1 (green) and city block #2 (red), with the even distribution line (blue).]
Graphic interpretation: The Lorenz curve for city block #2 (red) is
closer to the even distribution line (blue) than the curve for city block #1
(green): this means that the various types of jobs are more evenly distributed
in city block #2, while more people tend to do the same kind of work in city
block #1 (e.g. 60% of them hold the type of work found at rank #1, i.e. office
workers).
However, in both cases there is a significant deviation from the "ideal" line
of even distribution, which means that neither block has much diversity in
its types of jobs: just two types of jobs employ 50% (city block #2) to 70%
(city block #1) of all people, while other types of jobs are much less
represented.
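The comparison can also be checked numerically. The sketch below is an illustration only: the function names are hypothetical, and the largest vertical gap from the even distribution line is just one simple, assumed way of putting a number on "closer to the line". Two counts are assumptions, taken as the values implied by the listed percentages: 18 public employees in block #1 (5.5% of 325) and 4 retail workers in block #2 (5.5% of 72).

```python
# Counts per occupation, ranked from largest to smallest.
# Assumed values: 18 (public employees, block #1), 4 (retail workers, block #2).
block1 = [195, 34, 25, 23, 18, 16, 7, 6, 1]
block2 = [19, 18, 14, 6, 5, 4, 3, 2, 1]

def top_share(counts, k):
    """Percentage of all people found in the k largest categories."""
    ranked = sorted(counts, reverse=True)
    return 100 * sum(ranked[:k]) / sum(ranked)

def max_gap_from_even_line(counts):
    """Largest vertical distance between the Lorenz curve and the even line."""
    ranked = sorted(counts, reverse=True)
    total, n = sum(ranked), len(ranked)
    running, gap = 0, 0.0
    for rank, value in enumerate(ranked, start=1):
        running += value
        cumulative = 100 * running / total     # Lorenz curve at this rank
        even = 100 * rank / n                  # even distribution line
        gap = max(gap, cumulative - even)
    return gap

share1, share2 = top_share(block1, 2), top_share(block2, 2)
gap1, gap2 = max_gap_from_even_line(block1), max_gap_from_even_line(block2)
print(round(share1, 1), round(share2, 1))
print(round(gap1, 1), round(gap2, 1))
```

The two largest job types account for roughly 70% of block #1 and just over 50% of block #2, and the gap is smaller for block #2, in line with the graphic interpretation. Small differences from the table's cumulative column come from rounding in the listed percentages.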
SKEWNESS
 The literal meaning of skewness is ‘lack of symmetry’. We
study skewness to get an idea of the shape of the
curve that can be drawn from a given
frequency distribution. It helps us determine the
nature and extent of the concentration of observations
towards the higher or lower values of the variable. A
distribution with an asymmetric tail extending out to the
right is referred to as “positively skewed” or “skewed to
the right,” while a distribution with an asymmetric tail
extending out to the left is referred to as “negatively
skewed” or “skewed to the left.”
Types of Skewness
Figure 1. Sketches showing the general position of mean, median, and mode in a population.
(a) Skewed left: long tail points left. (b) Skewed right: long tail points right.
 In a negatively skewed distribution, the curve has a longer tail
towards the left; the value of the mode is the maximum and that of the
mean the least, and the median lies in between the two.
 In a positively skewed distribution, the curve has a longer tail
towards the right; the value of the mean is the maximum and that of the
mode the least, and the median lies in between.
 Normal or symmetrical distribution: the tails are balanced. The spread of
the frequencies is the same on both sides of the centre point of the curve.
The curve drawn for such a distribution is bell-shaped. The values of mean,
median and mode are equal.
Existence of skewness
 To ascertain whether a distribution is skewed or not, the
following tests are applied.
 Skewness is present if:
 the mean, median and mode are not equal;
 the curve is not bell-shaped;
 the quartiles are not equidistant from the median;
 the sum of deviations from the median and mode is not zero; and
 the sums of frequencies on the two sides of the mode are not
equal.
Tests of Skewness
 The following tests indicate whether skewness exists in a frequency
distribution:
 1. In a skewed distribution, the values of mean, median and mode do not
coincide. The mean and mode are pulled apart and the median lies
between them. In a moderately skewed distribution,
Median - Mode = 2/3 (Mean - Mode).
 2. Quartiles will not be equidistant from the median.
 3. When the asymmetrical distribution is drawn on graph paper, it
will not give a bell-shaped curve.
 4. The sum of the positive deviations from the median is not equal to the
sum of the negative deviations.
 5. Frequencies are not equal at points of equal deviation from the
mode.
Nature of Skewness
 Skewness can be positive, negative or zero.
 1. When the values of mean, median and mode
are equal, there is no skewness.
 2. When mean > median > mode, skewness will be
positive.
 3. When mean < median < mode, skewness will be
negative.
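The three rules above can be written as a small Python function. This is a sketch only: the function name and the fourth return value, used for orderings the rules do not cover, are assumptions and not part of the lecture.

```python
def nature_of_skewness(mean, median, mode):
    """Classify skewness from the ordering of mean, median and mode."""
    if mean == median == mode:
        return "zero"
    if mean > median > mode:
        return "positive"
    if mean < median < mode:
        return "negative"
    # Orderings not covered by the three rules above (assumed label).
    return "undetermined"

# Series with mean = median = mode = 15, as in the introduction.
print(nature_of_skewness(15, 15, 15))   # zero
# Hypothetical values.
print(nature_of_skewness(50, 48, 44))   # positive
print(nature_of_skewness(44, 48, 50))   # negative
```

With computed (floating-point) averages, an exact equality test is fragile, so in practice a small tolerance would be needed for the "zero" case.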
Characteristics of a good measure of skewness
 1. It should be a pure number, in the sense that its
value should be independent of the unit of the
series and also of the degree of variation in the series.
 2. It should have a value of zero when the
distribution is symmetrical.
 3. It should have a meaningful scale of
measurement so that we can easily interpret
the measured value.
Mathematical measures of skewness can
be calculated by:
(a) Bowley’s Method
(b) Karl Pearson’s Method
(c) Kelley’s Method
Bowley’s Method

Bowley’s Skewness =
(Q1+Q3–2Q2) / (Q3-Q1).
 Where Q1 is the lower quartile, Q3 is the
upper quartile, and Q2 is the median.
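A minimal Python sketch of Bowley's formula, taking the three quartiles as inputs. The function name and the quartile values are hypothetical.

```python
def bowley_skewness(q1, q2, q3):
    """Bowley's skewness: (Q1 + Q3 - 2*Q2) / (Q3 - Q1)."""
    if q3 == q1:
        raise ValueError("Q3 and Q1 are equal; the coefficient is undefined")
    return (q1 + q3 - 2 * q2) / (q3 - q1)

# Hypothetical quartiles: the median sits closer to Q1 than to Q3.
sk_right = bowley_skewness(20, 30, 50)

# Hypothetical quartiles equidistant from the median.
sk_symmetric = bowley_skewness(20, 35, 50)
print(round(sk_right, 3), sk_symmetric)
```

The first call gives a positive value and the second gives zero. Because the median lies between Q1 and Q3, the coefficient always falls between -1 and +1.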
Karl Pearson’s Measure of Skewness
 (i) Where the relationship of mean and mode is established:
Pearson’s Coefficient of Skewness #1 uses the mode. The formula
is:
 Sk = (Mean - Mode) / σ
 (ii) Where the relationship between mean and mode is not
established:
Pearson’s Coefficient of Skewness #2 uses the median. The
formula is:
 Sk = 3(Mean - Median) / σ
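Both Pearson coefficients can be sketched in Python as below. The function names and the input values are hypothetical. The values are chosen so that Mean - Mode = 3(Mean - Median), the usual empirical relation for moderately skewed distributions, in which case the two formulas agree.

```python
def pearson_skewness_mode(mean, mode, sd):
    """Pearson's coefficient #1: (Mean - Mode) / sd."""
    return (mean - mode) / sd

def pearson_skewness_median(mean, median, sd):
    """Pearson's coefficient #2: 3 * (Mean - Median) / sd."""
    return 3 * (mean - median) / sd

# Hypothetical summary values for one distribution.
mean, median, mode, sd = 50, 48, 44, 10

sk1 = pearson_skewness_mode(mean, mode, sd)
sk2 = pearson_skewness_median(mean, median, sd)
print(sk1, sk2)
```

For real data the two coefficients usually differ somewhat, since the empirical relation holds only approximately.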
Kelley’s Measure of Skewness
 Kelley’s measure of skewness is given in terms of percentiles (P) and
deciles (D).
 Kelley’s absolute measure of skewness (Sk) is:
 Sk = P90 + P10 - 2P50 = D9 + D1 - 2D5
 Kelley’s relative measure of skewness (Sk) is:
 Sk = (P90 + P10 - 2P50) / (P90 - P10)
 or Sk = (D9 + D1 - 2D5) / (D9 - D1)
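The two Kelley measures can be sketched in Python as follows. The function names and the percentile values are hypothetical. Since D1 = P10, D5 = P50 and D9 = P90, the decile form needs no separate function.

```python
def kelley_absolute(p10, p50, p90):
    """Kelley's absolute measure: P90 + P10 - 2*P50."""
    return p90 + p10 - 2 * p50

def kelley_relative(p10, p50, p90):
    """Kelley's relative measure: (P90 + P10 - 2*P50) / (P90 - P10)."""
    if p90 == p10:
        raise ValueError("P90 and P10 are equal; the measure is undefined")
    return kelley_absolute(p10, p50, p90) / (p90 - p10)

# Hypothetical percentiles: the median is closer to P10 than to P90.
sk_abs = kelley_absolute(10, 30, 70)
sk_rel = kelley_relative(10, 30, 70)
print(sk_abs, round(sk_rel, 3))
```

The absolute measure carries the unit of the data, while the relative measure is a pure number and can be compared across series.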


 
Activity: Find the skewness of the distribution
based on the three measures discussed.
Weight (gm)   Number of items   Less-than c.f.
90-100        5                 5
100-110       8                 13
110-120       10                23
120-130       15                38
130-140       9                 47
140-150       7                 54
150-160       6                 60
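One possible way to check your answers to the activity is sketched below in Python. It rests on several assumptions that the lecture does not spell out: quartiles and percentiles are found by linear interpolation within the class containing the required cumulative frequency, the mean and σ are computed from class mid-points, the mode is interpolated within the modal class, and Pearson's coefficient #1 is used. All names are hypothetical.

```python
from math import sqrt

# (lower limit, upper limit, frequency) for each class in the table above
classes = [(90, 100, 5), (100, 110, 8), (110, 120, 10), (120, 130, 15),
           (130, 140, 9), (140, 150, 7), (150, 160, 6)]
freqs = [f for _, _, f in classes]
n = sum(freqs)

def grouped_quantile(k, parts):
    """k-th of `parts` quantiles (e.g. k=1, parts=4 is Q1), interpolated."""
    target = n * k / parts
    cum = 0                                  # less-than c.f. before the class
    for lower, upper, f in classes:
        if cum + f >= target:
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    raise ValueError("k must lie between 0 and parts")

q1, q2, q3 = (grouped_quantile(k, 4) for k in (1, 2, 3))
p10, p90 = grouped_quantile(1, 10), grouped_quantile(9, 10)

# Mean and standard deviation from class mid-points.
mids = [(lower + upper) / 2 for lower, upper, _ in classes]
mean = sum(m * f for m, f in zip(mids, freqs)) / n
sd = sqrt(sum(f * (m - mean) ** 2 for m, f in zip(mids, freqs)) / n)

# Mode interpolated in the modal class (assumes it is not the first or last).
i = freqs.index(max(freqs))
f1, f0, f2 = freqs[i], freqs[i - 1], freqs[i + 1]
lower, upper, _ = classes[i]
mode = lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)

bowley = (q1 + q3 - 2 * q2) / (q3 - q1)
pearson = (mean - mode) / sd
kelley = (p90 + p10 - 2 * q2) / (p90 - p10)
print(round(bowley, 4), round(pearson, 4), round(kelley, 4))
```

Under these assumptions all three coefficients come out small and positive, which agrees with mean > median > mode for this table.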
