
Activity 5: Measures of Dispersion

and Measures of Shape

Rommel Enopil Estrellado


BSA 2D
Measures of Dispersion

Introduction
Measures of dispersion are non-negative real values that describe how widely the data are
spread around a central value. These metrics help identify how stretched or compressed the
given data set is. The five most common measures of dispersion are the range, variance,
standard deviation, mean deviation, and quartile deviation.
Measures of dispersion are most useful when they help us understand the distribution of the data:
the more diverse the data, the larger the value of the measure of dispersion. This section covers
measures of dispersion, their types with examples, and several important related topics.
Dispersion describes how data can vary. In statistics, the concept of dispersion defines how
spread out the data is; measures of dispersion are therefore the specific quantities used to
assess that spread.

Definition
Measures of dispersion are statistical measurements. They give us an idea of how the values of a
set of data are scattered: dispersion is small when the values are tightly clustered and large when
they are spread out. Common quantitative measures of dispersion include the range, interquartile
range, variance, and standard deviation.

Types

Range: It is simply the difference between the maximum value and the minimum value
given in a data set. Example: 1, 3, 5, 6, 7 => Range = 7 − 1 = 6

Variance: Subtract the mean from each value in the set, square each difference, add the
squares, and finally divide by the total number of values in the data set to get the variance.
Variance: σ² = ∑(X − μ)² / N

Standard Deviation: The square root of the variance is known as the standard deviation,
i.e. S.D. = √(σ²) = σ.
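As a quick sketch, the three measures above can be computed in plain Python (standard library only), reusing the five-value data set from the range example:

```python
# Range, population variance, and population standard deviation
# for the small data set 1, 3, 5, 6, 7 used in the range example.
import math

data = [1, 3, 5, 6, 7]

value_range = max(data) - min(data)                        # 7 - 1 = 6
mean = sum(data) / len(data)                               # μ = 4.4
variance = sum((x - mean) ** 2 for x in data) / len(data)  # σ² = Σ(X−μ)²/N
std_dev = math.sqrt(variance)                              # σ = √σ²

print(value_range, variance, std_dev)
```

Note that the variance and standard deviation here divide by N (population form), matching the σ² formula above.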
Formula

The standard deviation formula is used to measure how dispersed the values of a data set
are. In simple words, the standard deviation is the typical deviation of the values or data
from their mean. A lower standard deviation means the values are very close to their
average, whereas a higher value means the values are far from the mean. It should be
noted that the standard deviation can never be negative.

Standard Deviation is of two types:

1. Population Standard Deviation


2. Sample Standard Deviation

Formulas for Standard Deviation

Notations for Standard Deviation

• σ = Standard Deviation
• xi = Terms Given in the Data
• x̄ = Mean
• n = Total number of Terms
Standard Deviation Formula Based on Discrete Frequency Distribution
For discrete frequency distribution of the type:
x: x1, x2, x3, … xn and
f: f1, f2, f3, … fn
The formula for standard deviation becomes:
σ = √[ (1/N) ∑ᵢ₌₁ⁿ fᵢ(xᵢ − x̄)² ]
Here, N is given as:
N = ∑ᵢ₌₁ⁿ fᵢ
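To make the frequency-distribution formula concrete, here is a short Python sketch; the x and f values are hypothetical, chosen only to illustrate the calculation:

```python
# Standard deviation of a discrete frequency distribution:
# σ = √[ (1/N) Σ fᵢ(xᵢ − x̄)² ], with N = Σ fᵢ.
import math

x = [2, 4, 6, 8]   # hypothetical values x1..xn
f = [1, 3, 4, 2]   # hypothetical frequencies f1..fn

N = sum(f)                                        # N = Σ fᵢ = 10
mean = sum(xi * fi for xi, fi in zip(x, f)) / N   # x̄ = 5.4
sigma = math.sqrt(
    sum(fi * (xi - mean) ** 2 for xi, fi in zip(x, f)) / N
)
print(round(sigma, 4))
```

Each value xᵢ is weighted by how often it occurs (fᵢ), so N counts observations, not distinct values.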

Standard Deviation Formula for Grouped Data


There is another standard deviation formula which is derived from the variance: the standard
deviation is the square root of the variance. This formula is given as:
σ = √(Variance) = √[ ∑ fᵢ(xᵢ − x̄)² / N ]

Examples on Measures of Dispersion (ungrouped data)

The table below shows the masses, in kg, of 10 pupils:

52  62  12  9  8  75  44  33  19  16
State the difference in mass, in kg, of the pupils.
Solution:

Largest mass = 75

Smallest mass = 8

Difference in mass
= 75 − 8
= 67 kg
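The same range calculation can be written as a two-line Python sketch over the ten masses from the table:

```python
# Difference in mass (the range) for the ten pupils' masses in kg.
masses = [52, 62, 12, 9, 8, 75, 44, 33, 19, 16]

difference = max(masses) - min(masses)   # 75 - 8
print(difference)  # 67
```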

Examples on Measures of Dispersion (grouped data)

Let’s consider two varieties of coffee – X & Y with different yields. Coffee X and Y have the
following yields for a period of six months:

To know the spread of each variety of coffee, let’s calculate its range.

Range (R) = Largest value (L) – Smallest Value (S)

As mentioned before, the higher the range, the greater the data spread. Thus,

• X has a lower range. It means it has less scattered data or a more homogeneous data set.
• Y has a higher range. It represents a highly scattered data set or a more heterogeneous
data set.

Therefore, X has a lower spread than Y. Lower spread means better yield, and a higher spread
represents lower yield. Hence, higher dispersion in data means lesser returns, and lower
dispersion in the data set means higher returns.
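The original yield table is not reproduced above, so the six monthly yields below are hypothetical, chosen only to illustrate the range comparison between a homogeneous and a heterogeneous data set:

```python
# Hypothetical monthly yields for coffee varieties X and Y over six months.
x_yields = [24, 26, 25, 27, 25, 26]   # tightly clustered (homogeneous)
y_yields = [10, 35, 18, 40, 12, 30]   # widely scattered (heterogeneous)

range_x = max(x_yields) - min(x_yields)   # 27 - 24 = 3
range_y = max(y_yields) - min(y_yields)   # 40 - 10 = 30

# X's smaller range indicates less scattered (more homogeneous) data than Y's.
print(range_x, range_y)
```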
Measures of Shape

Introduction
Measures of shape, as descriptive statistics, let us understand how the data points in a dataset are
distributed and help us spot patterns that may be hidden until the data are plotted on a graph.
This section discusses the data and the various shapes it might take.
It is crucial to remember that only some types of data can be described by their shape: quantitative
data, which has a logical order and some sort of "weight," as opposed to qualitative data, whose
values have no such "weight" and cannot be employed.
The shape of the data can be understood by considering the distribution of the data points in
space. This distribution falls into two categories: symmetrical distributions and asymmetrical
(skewed) distributions. Each of these is examined below.

Definition
Measures of shape describe the distribution of the data in a dataset. For quantitative data, the
distribution's shape implies a logical order to the values, so the 'low' and 'high' end values can
be identified on the x-axis.

Types
Rise and drop of distribution in measures of shape

For distributions summarizing data from continuous measurement scales, statistics can be used to
describe how the distribution rises and drops.

Symmetric

A distribution is symmetric if it has the same shape on both sides of the center.
Moreover, symmetric distributions with only one peak are known as normal distributions.

Skewness

Skewness refers to the degree of asymmetry in a distribution. And, asymmetry reflects extreme
scores in a distribution. Moreover, it includes positive and negative skewness.

• Positively skewed
A positively skewed distribution has a tail extending out to the right, so the mean is greater than
the median. The mean is sensitive to every score in the distribution, and it is subject to large
shifts when the sample is small and contains extreme scores.

• Negatively skewed

A negatively skewed distribution has an extended tail pointing to the left and reflects a bunching
of values in the upper part of the distribution, with fewer scores at the lower end of the
measurement scale.

Examples and Formulas

College Men’s Heights

Here are grouped data for heights of 100 randomly selected male students, adapted from Spiegel
and Stephens (1999, 68).

A histogram shows that the data are skewed left, not symmetric.

But how highly skewed are they, compared to other data sets? To answer this question, you have
to compute the skewness.

Begin with the sample size and sample mean. (The sample size was given, but it never hurts to
check.)
n = 5+18+42+27+8 = 100
x̅ = (61×5 + 64×18 + 67×42 + 70×27 + 73×8) ÷ 100
x̅ = (305 + 1152 + 2814 + 1890 + 584) ÷ 100
x̅ = 6745 ÷ 100 = 67.45
Now, with the mean in hand, you can compute the skewness. (Of course, in real life you’d
probably use Excel or a statistics package, but it’s good to know where the numbers come from.)
Class Mark, x   Frequency, f      xf   (x−x̅)   (x−x̅)²f   (x−x̅)³f
61                         5     305    −6.45     208.01   −1341.68
64                        18    1152    −3.45     214.25    −739.15
67                        42    2814    −0.45       8.51      −3.83
70                        27    1890     2.55     175.57     447.70
73                         8     584     5.55     246.42    1367.63
∑                        100    6745      n/a     852.75    −269.33
x̅, m2, m3                      67.45      n/a     8.5275    −2.6933

Finally, the skewness is


g1 = m3 / m2^(3/2) = −2.6933 / 8.5275^(3/2) = −0.1082

But wait, there’s more! That would be the skewness if you had data for the whole population.
But obviously there are more than 100 male students in the world, or even in almost any school,
so what you have here is a sample, not the population. You must compute the sample skewness:

G1 = [√(n(n−1)) / (n−2)] × g1 = [√(100×99) / 98] × [−2.6933 / 8.5275^(3/2)] = −0.1098
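The whole calculation can be checked with a short Python sketch using the class marks and frequencies from the table:

```python
# Population skewness g1 and sample skewness G1 for the grouped heights.
import math

x = [61, 64, 67, 70, 73]   # class marks
f = [5, 18, 42, 27, 8]     # frequencies

n = sum(f)                                                   # 100
mean = sum(xi * fi for xi, fi in zip(x, f)) / n              # x̄ = 67.45
m2 = sum(fi * (xi - mean) ** 2 for xi, fi in zip(x, f)) / n  # 8.5275
m3 = sum(fi * (xi - mean) ** 3 for xi, fi in zip(x, f)) / n  # ≈ −2.6933

g1 = m3 / m2 ** 1.5                          # population skewness
G1 = g1 * math.sqrt(n * (n - 1)) / (n - 2)   # adjusted sample skewness

print(round(g1, 4), round(G1, 4))  # -0.1082 -0.1098
```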

Interpreting

If skewness is positive, the data are positively skewed or skewed right, meaning that the right tail
of the distribution is longer than the left. If skewness is negative, the data are negatively skewed
or skewed left, meaning that the left tail is longer.
If skewness = 0, the data are perfectly symmetrical. But a skewness of exactly zero is quite
unlikely for real-world data, so how can you interpret the skewness number? There’s no one
agreed interpretation, but for what it’s worth Bulmer (1979) — a classic — suggests this rule of
thumb:
• If skewness is less than −1 or greater than +1, the distribution can be
called highly skewed.
• If skewness is between −1 and −½ or between +½ and +1, the distribution can
be called moderately skewed.
• If skewness is between −½ and +½, the distribution can be
called approximately symmetric.
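Bulmer's rule of thumb can be written as a tiny helper function (the function name is my own, for illustration):

```python
# Classify a skewness value using Bulmer's rule of thumb.
def skewness_label(g1: float) -> str:
    if abs(g1) > 1:
        return "highly skewed"
    if abs(g1) > 0.5:
        return "moderately skewed"
    return "approximately symmetric"

print(skewness_label(-0.1098))  # approximately symmetric
```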

With a skewness of −0.1098, the sample data for student heights are approximately symmetric.

Kurtosis is unfortunately harder to picture than skewness, but these illustrations, suggested
by Wikipedia, should help. All three of these distributions have mean of 0, standard deviation of
1, and skewness of 0, and all are plotted on the same horizontal and vertical scale. Look at the
progression from left to right, as kurtosis increases.

The normal distribution will probably be the subject of roughly the second half of your
course; the logistic distribution is another one used in mathematical modeling. Don’t
worry at this stage about what these distributions mean; they’re just handy examples that
illustrate what I want to illustrate.

Figure: three distributions, each with mean 0, standard deviation 1, and skewness 0:
• Uniform(min=−√3, max=√3): kurtosis = 1.8, excess = −1.2
• Normal(μ=0, σ=1): kurtosis = 3, excess = 0
• Logistic(α=0, β=0.55153): kurtosis = 4.2, excess = 1.2

Moving from the illustrated uniform distribution to a normal distribution, you see that the
“shoulders” have transferred some of their mass to the center and the tails. In other words, the
intermediate values have become less likely and the central and extreme values have become
more likely. The kurtosis increases while the standard deviation stays the same, because more of
the variation is due to extreme values.
Moving from the normal distribution to the illustrated logistic distribution, the trend
continues. There is even less in the shoulders and even more in the tails, and the central peak is
higher and narrower.

How far can this go? What are the smallest and largest possible values of kurtosis? The
smallest possible kurtosis is 1 (excess kurtosis −2), and the largest is ∞, as shown here:
Figure: the extremes of kurtosis:
• Discrete distribution with two equally likely values: kurtosis = 1, excess = −2
• Student’s t (df=4): kurtosis = ∞, excess = ∞

A discrete distribution with two equally likely outcomes, such as winning or losing on the flip of
a coin, has the lowest possible kurtosis. It has no central peak and no real tails, and you could
say that it’s “all shoulder” — it’s as platykurtic as a distribution can be. At the other extreme,
Student’s t distribution with four degrees of freedom has infinite kurtosis. A distribution can’t
be any more leptokurtic than this.
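The two-point case can be verified directly; here is a minimal Python sketch computing kurtosis as m4/m2²:

```python
# Kurtosis (m4 / m2²) of a distribution with two equally likely values,
# e.g. winning or losing a coin flip: the smallest possible kurtosis.
values = [-1, 1]

mean = sum(values) / len(values)                         # 0.0
m2 = sum((v - mean) ** 2 for v in values) / len(values)  # 1.0
m4 = sum((v - mean) ** 4 for v in values) / len(values)  # 1.0

kurtosis = m4 / m2 ** 2   # 1.0 (the minimum)
excess = kurtosis - 3     # -2.0
print(kurtosis, excess)
```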
References:
https://www.cuemath.com/data/measures-of-dispersion/
https://byjus.com/maths/dispersion/
https://byjus.com/standard-deviation-formula/
https://www.datavedas.com/measures-of-shape/
https://www.vskills.in/certification/tutorial/measures-of-shape/
https://brownmath.com/stat/shape.htm
https://www.wallstreetmojo.com/dispersion/
https://app.pandai.org/note/read/kssm-mt-10-08/kssm-f4-mm-08/bab-8-sukatan-serakan-data-tak-terkumpul
