BS 3
BS/IIFT/Harsh/3 2
Session Objective
To distinguish between measures of central tendency,
measures of variation & measures of shape
To calculate descriptive summary measures for a
population
To differentiate between sample & population variance &
standard deviation
To understand the meaning of standard deviation as it is
applied by using the empirical rule & Chebyshev’s theorem
Content
To understand the following concepts:
Measures of Central Tendency
Mean, Median, Mode, Midrange, Midhinge, Quartile, Percentiles
Measures of Variability
Range, Interquartile Range, Variance and Standard Deviation, Coefficient of Variation
Measures of Shape
Symmetric, Skewed, using Box-and-Whisker Plots, Kurtosis
Differentiate between sample and population variance and standard deviation
Locating Extreme Outliers: Z-Score
Understand the meaning of standard deviation as it is applied using the
empirical rule and Chebyshev’s theorem
Summary Measures
Central Tendency: Mean, Median, Mode, Midrange, Midhinge
Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Central Tendency & Dispersion
Central tendency is the midpoint of a distribution. A measure of central
tendency is also known as a measure of location.
Dispersion is the spread of the data in a distribution, i.e. the extent to which
the observations are spread out.
Curves representing the data points in a data set may be either
symmetrical or skewed.
Symmetrical: a characteristic of a distribution in which each half is the
mirror image of the other half.
Skewness: the extent to which the data points of a distribution are
concentrated at one end or the other; a lack of symmetry.
Measures of Central Tendency
Mean, Median, Mode, Midrange, Midhinge
The Mean (Arithmetic Average)
[Figure: two dot plots on number lines. Left panel, axis 0 to 10: Mean = 5. Right panel, axis 0 to 14: Mean = 6.]
Contd.
Population Mean
$\mu = \frac{\sum x}{N}$, where $\sum x$ is the sum of all observations and
N is the number of elements in the population.
Sample Mean
$\bar{x} = \frac{\sum x}{n}$, where $\sum x$ is the sum of all observations and
n is the number of elements in the sample.
Weighted Average for grouped data
$\bar{x} = \frac{\sum f x}{\sum f}$, where f is the frequency of the class with midpoint x.
$\bar{x} = A + h \cdot \frac{\sum f d}{\sum f}$, where A is an arbitrary midpoint,
h is the class interval, and $d = \frac{x - A}{h}$.
Advantages and Disadvantages of
Arithmetic Mean
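Mean - Ungrouped Data
For a population: $\mu = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \frac{\sum X}{N}$
For a sample: $\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{\sum X}{n}$
Mean - Grouped Data
For a population: $\mu = \frac{\sum fX}{N}$
For a sample: $\bar{X} = \frac{\sum fX}{n}$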
Table | Approximation of the Arithmetic Mean from a Frequency Distribution

Class (net profit in     Absolute class frequency      Class
millions of dollars)     (number of companies), f      midpoint, X          fX
-1,250 to under 0                  6                       -625          -3,750
0 to under 1,250                  49                        625          30,625
1,250 to under 2,500              18                      1,875          33,750
2,500 to under 3,750              15                      3,125          46,875
3,750 to under 5,000               3                      4,375          13,125
5,000 to under 6,250               2                      5,625          11,250
6,250 to under 7,500               4                      6,875          27,500
7,500 to under 8,750               2                      8,125          16,250
8,750 to under 10,000              1                      9,375           9,375
Total                     Σf = N = 100                              ΣfX = 185,000
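The approximation in the table can be checked with a few lines of Python. This is a minimal sketch that copies the midpoints and frequencies from the table; the variable names are illustrative only.

```python
# Class midpoints (X) and absolute frequencies (f) copied from the table.
midpoints = [-625, 625, 1875, 3125, 4375, 5625, 6875, 8125, 9375]
frequencies = [6, 49, 18, 15, 3, 2, 4, 2, 1]

total_f = sum(frequencies)                                      # N = sum of f
total_fx = sum(f * x for f, x in zip(frequencies, midpoints))   # sum of fX
grouped_mean = total_fx / total_f                               # approximate mean

print(total_f, total_fx, grouped_mean)  # 100 185000 1850.0
```

The approximate mean net profit is therefore 1,850 million dollars; it is only an approximation because every company in a class is treated as if it sat at the class midpoint.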
Ans.
A = 28, h = 8
Class Interval   Mid-Value (x)   Frequency (f)   d = (x-A)/h    fd
0-8                   4               8              -3         -24
8-16                 12               7              -2         -14
16-24                20              16              -1         -16
24-32                28              24               0           0
32-40                36              15               1          15
40-48                44               7               2          14
Total                                77                         -25
$\bar{x} = A + h \cdot \frac{\sum fd}{\sum f} = 28 + 8 \cdot \frac{-25}{77} = 28 - \frac{200}{77} \approx 25.403$
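As a cross-check, the step-deviation formula and the direct grouped-mean formula should agree. The sketch below uses the mid-values and frequencies from the answer table, with A = 28 and h = 8 as on the slide; the variable names are illustrative.

```python
# Mid-values and frequencies from the answer table.
midpoints = [4, 12, 20, 28, 36, 44]
frequencies = [8, 7, 16, 24, 15, 7]
A = 28   # arbitrary (assumed) mean, taken from the slide
h = 8    # class interval

# Step deviations d = (x - A) / h for each class.
d = [(x - A) / h for x in midpoints]

sum_f = sum(frequencies)
sum_fd = sum(f * di for f, di in zip(frequencies, d))

mean_step = A + h * sum_fd / sum_f                                      # step-deviation method
mean_direct = sum(f * x for f, x in zip(frequencies, midpoints)) / sum_f  # direct method

print(sum_f, sum_fd, round(mean_step, 3), round(mean_direct, 3))
```

Both routes give the same value because the step-deviation form is only an algebraic rearrangement that keeps the numbers small for hand calculation.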
Geometric Mean
The geometric mean (GM) is another measure of central tendency.
It measures the average rate of change or growth of a
quantity, computed by taking the nth root of the product of n
values representing change.
A good working hint: use the GM when calculating the
average percentage change in some variable over time,
for example the average inflation rate or the average growth rate of
savings bank interest.
$GM = \sqrt[n]{\prod x}$
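A minimal sketch of the geometric mean applied to growth rates. The three yearly growth factors below are hypothetical values chosen for illustration (5%, 10% and 20% growth); they do not come from the slides.

```python
import math

# Hypothetical yearly growth factors: 1.05 means 5% growth in that year.
growth_factors = [1.05, 1.10, 1.20]
n = len(growth_factors)

# GM = nth root of the product of the n growth factors.
gm = math.prod(growth_factors) ** (1 / n)

# Average percentage growth per year implied by the GM.
avg_growth_pct = (gm - 1) * 100

# Arithmetic mean of the factors, shown only for comparison.
am = sum(growth_factors) / n

print(round(gm, 4), round(avg_growth_pct, 2), round(am, 4))
```

Compounding at the GM rate for n years reproduces the total growth exactly, which the arithmetic mean of the factors does not; this is why the GM is preferred for average rates of change.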
The Median
An important Measure of Central Tendency
In an ordered array, the median is the middle
number.
If n is odd, the median is the middle number.
If n is even, the median is the average of the 2
middle numbers.
Not Affected by Extreme Values
[Figure: two dot plots on number lines. Left panel, axis 0 to 10: Median = 5. Right panel, axis 0 to 14: Median = 5.]
Median-Contd.
The median is the middle point of the data set, a measure of location
that divides the data set into halves.
Median = the $\frac{n+1}{2}$th item in the data array.
Extreme values do not affect the median as strongly as they affect the
mean. The median is easy to understand and can be calculated for any kind
of data, even for grouped data with open-ended classes.
We can find the median of a qualitative description, such as the sharpness
of an image.
However, the data need to be arrayed first (which is time consuming), so
the mean is usually easier to compute than the median.
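The odd/even rule for the median can be sketched as below. The function name and the small data sets are illustrative assumptions, not taken from the slides.

```python
def median_of(values):
    """Median of an ungrouped data set: middle item of the ordered array."""
    ordered = sorted(values)          # the data must be arrayed first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # n odd: the middle number
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: average of the 2 middle numbers

odd_case = median_of([3, 1, 2])            # middle of 1 2 3
even_case = median_of([4, 1, 3, 2])        # average of 2 and 3
with_outlier = median_of([1, 2, 3, 400])   # extreme value leaves the median unchanged

print(odd_case, even_case, with_outlier)  # 2 2.5 2.5
```

Replacing 4 with 400 changes the mean from 2.5 to 101.5 but leaves the median at 2.5, which illustrates why the median is not pulled by extreme values.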
Median - Ungrouped Data
For a population: $M = X_{(N+1)/2}$
For a sample: $m = X_{(n+1)/2}$
X = population (or sample) value
N = number of observations in population
n = number of observations in sample
subscript = position of X in ordered array
Median - Grouped Data
For a population: $M = L + \frac{(N/2) - F}{f} \cdot w$
For a sample: $m = L + \frac{(n/2) - F}{f} \cdot w$
L = the median class’s lower limit
f = its absolute frequency
w = its width
F = the sum of frequencies up to (but not including) those of the median class
The Median divides the area in the graph in half
To locate the Median
Numerical Problem-Median-Discrete distribution
Obtain the median for the following frequency distribution:
x     f
1     8
2    10
3    11
4    16
5    20
6    25
7    15
8     9
9     6
In the case of a discrete distribution:
1. Find the cumulative frequencies.
2. Determine N/2, where N = Σf.
3. Find the cumulative frequency just greater than N/2.
4. The corresponding value of x is the median.
Ans.
x      f     cf
1      8      8
2     10     18
3     11     29
4     16     45
5     20     65
6     25     90
7     15    105
8      9    114
9      6    120
Total  N=120
N/2 = 60. The cumulative frequency (cf) just greater than N/2 = 60 is 65,
and the value of x corresponding to 65 is 5; therefore 5 is the median.
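The four steps for a discrete distribution can be sketched in Python as follows, using the x and f columns of this problem. The variable names are illustrative.

```python
from itertools import accumulate

# Values and frequencies from the problem statement.
x_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
freqs = [8, 10, 11, 16, 20, 25, 15, 9, 6]

cum_freqs = list(accumulate(freqs))   # step 1: cumulative frequencies
N = cum_freqs[-1]                     # N = sum of f
half = N / 2                          # step 2: N/2

# Steps 3 and 4: first x whose cumulative frequency exceeds N/2.
median_x = next(x for x, cf in zip(x_values, cum_freqs) if cf > half)

print(cum_freqs)
print(N, half, median_x)  # 120 60.0 5
```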
Median of continuous frequency distribution
Find the median wage of the following distribution:
Wages in Rs.    No. of laborers
20-30                 3
30-40                 5
40-50                20
50-60                10
60-70                 5
Median - Grouped Data
For a population: $M = L + \frac{(N/2) - F}{f} \cdot w$
L = the median class’s lower limit
f = frequency of median class
w = its width
F = the sum of frequencies up to (but not including) those of the median class
Ans.
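A minimal sketch that applies the grouped-data median formula to the wage table above. The loop for locating the median class and all variable names are assumptions made for illustration; only the class limits and frequencies come from the problem.

```python
# (lower limit, upper limit, frequency) for each wage class, from the problem.
classes = [(20, 30, 3), (30, 40, 5), (40, 50, 20), (50, 60, 10), (60, 70, 5)]

N = sum(f for _, _, f in classes)   # total number of laborers
F = 0                               # cumulative frequency before the median class

for lower, upper, f in classes:
    if F + f >= N / 2:              # this class contains the N/2-th item
        L, w = lower, upper - lower
        median_wage = L + ((N / 2 - F) / f) * w   # M = L + ((N/2 - F) / f) * w
        break
    F += f

print(N, L, F, f, w, median_wage)
```

With these figures N = 43, so N/2 = 21.5 falls in the 40-50 class (L = 40, F = 8, f = 20, w = 10), and the formula gives a median wage of about Rs. 46.75.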
[Figure: two dot plots on number lines, one labelled Mode = 9 and one labelled No Mode.]
A bimodal distribution- number of fish caught
Mode-advantages & disadvantages
The mode, like the median, can be used for both qualitative and
quantitative data.
It is not affected by extreme values.
It can be used with open-ended classes.
However, it is not used as often as the median or the mean.
A data set may have no mode, either because no value occurs more than
once or because every value occurs the same number of times.
Mean, Mode, Median Comparison
For a symmetrical distribution that has one mode, the mean, mode
and median are all equal.
For a positively skewed distribution, the mode is at the highest point, the median
is to its right, and the mean is to the right of both.
For a negatively skewed distribution, the mode is at the highest point, the median
is to its left, and the mean is to the left of both.
When the population is skewed, the median is often the best measure of location:
it is not influenced by the frequency of occurrence of a single value,
as the mode is, nor pulled by extreme values, as the mean is.
Mean, Mode, Median Comparison
[Figure: three frequency curves: negatively skewed, symmetrical, positively skewed.]
The Shape of a Frequency Curve: Skewness
The Shape of a Frequency Curve: Kurtosis
[Figure: two dot plots on number lines (axis 0 to 10), each labelled Midrange = 5.]
Quartiles
X    0    1    2    3    4    5    6    7    8
f    1    9   26   59   72   52   29    7    1
Solution (Q1, Median and Q3 located on the table):
X    0    1    2    3    4    5    6    7    8
f    1    9   26   59   72   52   29    7    1
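One way to locate Q1, the median and Q3 in this table is to reuse the cumulative-frequency method from the discrete median problem, looking up N/4, N/2 and 3N/4. The sketch below assumes that convention; the helper name `value_at` is hypothetical.

```python
from itertools import accumulate

# X values and frequencies from the quartiles table.
x_values = list(range(9))
freqs = [1, 9, 26, 59, 72, 52, 29, 7, 1]

cum_freqs = list(accumulate(freqs))
N = cum_freqs[-1]

def value_at(position):
    """Return the x whose cumulative frequency first exceeds the given position."""
    return next(x for x, cf in zip(x_values, cum_freqs) if cf > position)

q1 = value_at(N / 4)        # first quartile
median = value_at(N / 2)    # second quartile
q3 = value_at(3 * N / 4)    # third quartile

print(N, cum_freqs)
print(q1, median, q3)  # 3 4 5
```

Under this convention N = 256, so the three positions are 64, 128 and 192, which fall at X = 3, 4 and 5 respectively.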
Summary Measures
Central Tendency: Mean ($\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$), Median, Mode, Midrange, Midhinge
Variation: Range, Variance ($s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$), Standard Deviation, Coefficient of Variation
The Concept of Dispersion
Dispersion = variety, diversity, amount of variation between
scores.
The greater the dispersion of a variable, the greater the
range of scores, and the greater the differences between
scores.
Central Tendency & Dispersion
Measures of Central Tendency can be informative, but may not give a
complete picture.
Example: assume there are 14 students in each of 3 classes, with the
following scores (out of a possible 10) on a test:
Class   Scores                                  Mean   Median
1       6 6 6 6 6 6 6 6 6 6 6 6 6 6              6       6
2       1 2 2 3 4 4 4 8 8 9 9 10 10 10           6       6
3       4 4 5 5 5 6 6 6 6 7 7 7 8 8              6       6
Mean=Median ... but distributions are very different
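The three classes can be compared numerically with Python's standard `statistics` module. This is a sketch using the scores from the example; the range and sample standard deviation are added here to show how a measure of dispersion separates distributions that share the same mean and median.

```python
import statistics

# Test scores for the three classes of 14 students.
scores = {
    1: [6] * 14,
    2: [1, 2, 2, 3, 4, 4, 4, 8, 8, 9, 9, 10, 10, 10],
    3: [4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8],
}

summary = {}
for cls, values in scores.items():
    summary[cls] = {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),   # sample standard deviation (n - 1)
    }

for cls, stats in summary.items():
    print(cls, stats)
```

All three classes have a mean and median of 6, yet the range is 0, 9 and 4 respectively, so the central-tendency measures alone hide how differently the scores are spread.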
The Concept of Dispersion: Examples
Typically, a large city will have more diversity than a small town.
Some cities (New Delhi, Mumbai, Kolkata) are more culturally diverse
than others (Shri Nagar, Pauri, Darjeeling).
The Concept of Dispersion
Measures of Variation
The Range
Measure of Variation
Difference between the largest and smallest observations:
$\text{Range} = X_{largest} - X_{smallest}$
Quick and easy indication of variability
Ignores how the data are distributed and is heavily influenced by extreme values:
[Figure: two dot plots on a 7 to 12 axis; Range = 12 - 7 = 5 in both panels.]
Interquartile Range(IQR)
The interquartile range measures approximately how far from the
median we must go on either side before we can include one half of the
values in the data set.
Also known as the midspread:
spread in the middle 50%
Difference between the third and first quartiles:
Interquartile Range = Q3 - Q1
Not affected by extreme values
Example, n = 9:
Data in ordered array: 11 12 13 16 16 17 17 18 21
Q1 = 12.5, Q3 = 17.5
Q3 - Q1 = 17.5 - 12.5 = 5
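The quartiles in this example correspond to the (n + 1)/4 and 3(n + 1)/4 positions in the ordered array. As a sketch, Python's `statistics.quantiles` with `method="exclusive"` uses that same positioning rule and reproduces the slide's figures; other software may use different quartile conventions and give slightly different values.

```python
import statistics

# Ordered array from the example (n = 9).
data = [11, 12, 13, 16, 16, 17, 17, 18, 21]

# Quartiles located at the (n + 1) * p positions, interpolating between neighbours.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

iqr = q3 - q1   # interquartile range = Q3 - Q1

print(q1, q2, q3, iqr)  # 12.5 16.0 17.5 5.0
```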
Interquartile Range(IQR)
Disadvantages of using the range and the IQR:
(1) they ignore a lot of information: each measure uses only two
scores;
(2) they give no idea of how much the scores differ from the
center of the distribution.
Mean Absolute Deviation - Ungrouped Data
For a population: $MAD = \frac{\sum |X - \mu|}{N}$
For a sample: $MAD = \frac{\sum |X - \bar{X}|}{n}$
numerators = the sums of absolute differences between each observed
population or sample value (X) and the population ($\mu$) or sample
($\bar{X}$) mean
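A short sketch of the mean absolute deviation for a sample. The eight values are the ones used in the deck's sample standard deviation example; treating them as the data here is an illustrative choice.

```python
# Eight sample values (the same data as the sample standard deviation example).
data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
mean = sum(data) / n

# Sum of absolute differences between each value and the sample mean.
abs_deviations = [abs(x - mean) for x in data]
mad = sum(abs_deviations) / n

print(mean, sum(abs_deviations), mad)  # 16.0 26.0 3.25
```

Taking absolute values prevents positive and negative deviations from cancelling, which would otherwise always sum to zero around the mean.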
Variance
Computational form:
$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n-1}$
Standard Deviation
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n-1}}$
Sample Standard Deviation
$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}}$
For the sample: use n - 1 in the denominator.
Data: $X_i$: 10 12 14 15 17 18 18 24 (n = 8, $\bar{X}$ = 16)
$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{8-1}} = \sqrt{\frac{130}{7}} \approx 4.3095$
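The calculation can be reproduced step by step in Python. This sketch uses all eight data values, whose squared deviations from the mean of 16 sum to 130, and compares the manual result with `statistics.stdev`, which also divides by n - 1.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the sample mean (all eight terms).
sum_sq_dev = sum((x - mean) ** 2 for x in data)

# Sample standard deviation: divide by n - 1, then take the square root.
s = math.sqrt(sum_sq_dev / (n - 1))

print(mean, sum_sq_dev, round(s, 4))  # 16.0 130.0 4.3095
```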
Measures of Variability
Example: Partners in Accounting Firms
An analyst takes a sample of the number of partners in six of the largest accounting firms
in the U.S. What are the sample variance and sample standard deviation?
Measures of Variability
Example: Partners in Accounting Firms
$x_i$            $(x_i - \bar{x})^2$
3327             1,205,604
3200               942,841
3135               820,836
2178                 2,601
799              2,044,900
735              2,232,036
TOTAL 13,374     7,248,818
$\bar{x} = \frac{\sum x_i}{n} = \frac{13,374}{6} = 2229.00$
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{7,248,818}{5} = 1,449,763.6$
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} = \sqrt{1,449,763.6} = 1,204.06$
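The sketch below recomputes this example and checks that the definitional and computational forms of the variance agree. One assumption is flagged in the code: the fourth value, 2178, is inferred from the stated totals (it is the only value consistent with both a sum of 13,374 and a sum of squared deviations of 7,248,818).

```python
import math

# Number of partners in six firms; 2178 is inferred from the stated totals.
partners = [3327, 3200, 3135, 2178, 799, 735]

n = len(partners)
total = sum(partners)
mean = total / n

# Definitional form: sum of squared deviations from the mean.
ss_definitional = sum((x - mean) ** 2 for x in partners)

# Computational form: sum of x^2 minus (sum of x)^2 / n.
ss_computational = sum(x * x for x in partners) - total ** 2 / n

variance = ss_definitional / (n - 1)   # sample variance
std_dev = math.sqrt(variance)          # sample standard deviation

print(total, mean, ss_definitional, variance, round(std_dev, 2))
```

The computational form avoids calculating each deviation separately, which made it the usual choice for hand or calculator work.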
Comparing Standard Deviations
[Figure: three dot plots on an 11 to 21 axis, each with Mean = 15.5.]
Data A: s = 3.338
Data B: s = 0.9258
Data C: s = 4.57
Standard Deviation
We squared the deviations only to keep their sum from being zero
(a property of the mean). Now that we have a non-zero number,
take the square root to get the statistic s, the standard deviation:
$s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$, so $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}}$
Variance - Ungrouped Data
For a population: $\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{\sum X^2 - N\mu^2}{N}$
For a sample: $s^2 = \frac{\sum (X - \bar{X})^2}{n-1} = \frac{\sum X^2 - n\bar{X}^2}{n-1}$
numerators = the sums of squared deviations between each population
or sample value (X) and the population ($\mu$) or sample ($\bar{X}$) mean
Standard Deviation
To find the SD:
Subtract the mean from each score.
Square the deviations.
Sum the squared deviations.
Divide the sum of the squared deviations by N (for a sample, by n - 1).
Find the square root of the result.
Use of Standard Deviation
Normal Frequency Distribution
Z Scores
Z score: the number of standard deviations a value (x) lies above or
below the mean of a set of numbers.
The z score translates a value’s raw distance from the mean
into units of standard deviation:
$Z = \frac{x - \mu}{\sigma}$
Negative z scores indicate that the raw value (x) is below the mean;
positive z scores indicate x values above the mean.
Measures of Variability
For a normally distributed population with a mean of 50 and a standard deviation of 10, an x value
of 70 has a z score of 2:
$z = \frac{70 - 50}{10} = 2$
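The z-score calculation is a one-line function. The sketch below uses the slide's population (mean 50, standard deviation 10); the function name and the second value, 30, are illustrative additions.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

z_above = z_score(70, mu=50, sigma=10)   # value above the mean
z_below = z_score(30, mu=50, sigma=10)   # value below the mean

print(z_above, z_below)  # 2.0 -2.0
```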
Coefficient of Variation
Measure of relative variation
Always a percentage
Shows variation relative to the mean
Used to compare 2 or more groups
Formula (for a sample): $CV = \frac{S}{\bar{X}} \cdot 100\%$
Comparing Coefficient of Variation
Coefficient of Variation: $CV = \frac{S}{\bar{X}} \cdot 100\%$
Stock A: CV = 10%
Stock B: CV = 5%
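The underlying stock prices did not survive in this slide, so the sketch below uses hypothetical figures chosen only because they reproduce the stated CVs: Stock A with a mean price of 50 and a standard deviation of 5, and Stock B with a mean price of 100 and the same standard deviation of 5.

```python
def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: standard deviation relative to the mean."""
    return std_dev / mean * 100

# Hypothetical prices: same standard deviation, different mean price.
cv_a = coefficient_of_variation(std_dev=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)   # Stock B

print(round(cv_a, 2), round(cv_b, 2))  # 10.0 5.0
```

Under these assumed figures both stocks have the same absolute variation, but Stock A is twice as variable relative to its mean, which is the comparison the CV is designed to make.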
Shape
Coefficient of Skewness
Coefficient of skewness (Sk): compares the mean
and median in light of the magnitude of the standard deviation;
$\mu$ is the mean;
Md is the median;
Sk is the coefficient of skewness;
σ is the standard deviation
$S_k = \frac{3(\mu - M_d)}{\sigma}$
Coefficient of Skewness
$S_k = \frac{3(\mu - M_d)}{\sigma}$
If Sk < 0, the distribution is negatively skewed (skewed to
the left).
If Sk = 0, the distribution is symmetric (not skewed). If Sk is
close to 0, it is almost symmetric.
If Sk > 0, the distribution is positively skewed (skewed to
the right).
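A minimal sketch of the coefficient of skewness, computed as three times the difference between mean and median divided by the population standard deviation. The function name and both small data sets are hypothetical, chosen to show one right-skewed and one symmetric case.

```python
import statistics

def coefficient_of_skewness(values):
    """Sk = 3 * (mean - median) / standard deviation (population form)."""
    mu = statistics.mean(values)
    md = statistics.median(values)
    sigma = statistics.pstdev(values)   # population standard deviation
    return 3 * (mu - md) / sigma

right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]   # one large value pulls the mean right
symmetric = [1, 2, 3, 4, 5]                # mean equals median

sk_right = coefficient_of_skewness(right_skewed)
sk_sym = coefficient_of_skewness(symmetric)

print(round(sk_right, 3), sk_sym)
```

In the first data set the mean (3.5) exceeds the median (3), so Sk is positive; in the second the mean and median coincide, so Sk is zero.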
Box-and-Whisker Plot
Graphical display of data using the 5-number summary
[Figure: box-and-whisker plot on an axis from 4 to 12.]
Distribution Shape &
Box-and-Whisker Plots
Learning
Discussed Measures of Central Tendency
Mean, Median, Mode, Midrange, Midhinge
Quartiles
Addressed Measures of Variation
The Range, Interquartile Range, Variance,
Standard Deviation, Coefficient of Variation
Determined Shape of Distributions
Symmetric, Skewed, Box-and-Whisker Plot
Thanks