
CHAPTER 3 -

MEASURES OF CENTRAL TENDENCY


MEASURES OF VARIATION
MEASURES OF SHAPE
09 September 2023 15:33

MEASURES OF CENTRAL TENDENCY: UNGROUPED DATA

Measures of central tendency yield information about “particular places or locations in a group of numbers.”
They yield information about the centre, or middle part, of a group of numbers.

Common Measures of Location


• Mode
• Median
• Mean
• Percentiles
• Quartiles

MODE:
Mode - the most frequently occurring value in a data set
• Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)
• Can be used to determine what categories occur most frequently
• Sometimes no mode exists (when no value is duplicated)
• A data set with two modes is bimodal
• A data set with more than two modes is multimodal

APPLICATION: In the world of business, the concept of mode is often used in determining sizes. As an example, manufacturers who produce cheap
rubber flip-flops that are sold for as little as $1.00 around the world might only produce them in one size in order to save on machine setup costs. In
determining the one size to produce, the manufacturer would most likely produce flip-flops in the modal size.
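
A minimal Python sketch of finding the mode, assuming the standard-library `statistics` module (Python 3.8+). The lists `sizes_sold` and `colours` are made-up illustration data, not taken from these notes.

```python
from statistics import multimode

# Hypothetical flip-flop sizes sold in one day (illustration data only).
sizes_sold = [8, 9, 9, 10, 9, 8, 11, 9, 10]

# multimode returns every value tied for the highest frequency.
modes = multimode(sizes_sold)

# Nominal data works too; a two-way tie means the data set is bimodal.
colours = ["red", "blue", "red", "green", "blue"]
colour_modes = multimode(colours)

# With no duplicates every value is tied, so no single mode stands out.
no_duplicates = multimode([1, 2, 3])

print(modes, colour_modes, no_duplicates)
```

A manufacturer choosing one size to produce would, in this made-up data, pick the single value returned in `modes`.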

MEDIAN:
Median - middle value in an ordered array of numbers.
• Half the data are above it, half the data are below it

• Mathematically, it is the $\frac{n+1}{2}$-th ordered observation

○ For an array with an odd number of terms, the median is the middle number

 n = 11 => $\frac{n+1}{2} = \frac{12}{2}$ = 6th ordered observation

○ For an array with an even number of terms, the median is the average of the middle two numbers

 n = 10 => $\frac{n+1}{2} = \frac{11}{2}$ = 5.5th = average of the 5th and 6th ordered observations
The median is unaffected by extreme values. It is used for measuring salaries, age, etc.
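
The position rule above can be sketched in Python as follows. The function names `median_position` and `median` are illustrative choices, the list is assumed to be non-empty, and the salary figures in the usage note are made up.

```python
def median_position(n):
    """Return the (n + 1) / 2 position of the median in an ordered array."""
    return (n + 1) / 2


def median(values):
    """Median of a non-empty list, using the (n + 1) / 2 position rule."""
    ordered = sorted(values)
    pos = median_position(len(ordered))
    if pos.is_integer():
        # Odd n: the median is the single middle observation (1-based position).
        return ordered[int(pos) - 1]
    # Even n: average the two observations on either side of the .5 position.
    lower = ordered[int(pos) - 1]
    upper = ordered[int(pos)]
    return (lower + upper) / 2


# Hypothetical salaries (in thousands) with one extreme value.
salaries = [30, 32, 35, 38, 400]
print(median_position(11), median_position(10), median(salaries))
```

In the made-up salary list the extreme value 400 does not move the median, which stays at the middle observation.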

MEAN
Mean is the average of a group of numbers
• Applicable for interval and ratio data
• Not applicable for nominal or ordinal data
• Affected by each value in the data set, including extreme values. This can be a disadvantage: very large or very small values pull the mean higher or lower, distorting the assessment of the sample or the population.
• Computed by summing all values in the data set and dividing the sum by the number of values in the data set
• The population mean is represented by the Greek letter mu, 'µ'. The sample mean is represented by x-bar, 'x̅'.
• POPULATION MEAN: $\mu = \frac{\sum x}{N} = \frac{x_1 + x_2 + x_3 + \cdots + x_N}{N}$
• SAMPLE MEAN: $\bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$
• 'N' is the number of terms in the population, and 'n' is the number of terms in the sample.
• The sample mean considers only a selected number of observations drawn from the population data. The population mean, on the other hand, considers all the observations in the population to compute the average value.
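
A short Python sketch of the two mean formulas, which differ only in which observations are summed. The `population` and `sample` lists are hypothetical illustration data, and `mean` is a helper defined here rather than a library call.

```python
def mean(values):
    """Sum all values and divide by how many there are."""
    return sum(values) / len(values)


# Hypothetical population of N = 5 values (illustration data only).
population = [10, 12, 14, 16, 18]
mu = mean(population)            # population mean, uses all N values

# A sample of n = 3 observations drawn from that population.
sample = [10, 12, 18]
x_bar = mean(sample)             # sample mean, uses only the n sampled values

# One extreme value pulls the mean sharply upward.
with_outlier = mean(population + [200])

print(mu, round(x_bar, 4), with_outlier)
```

Here the sample mean lands close to, but not exactly on, the population mean, and a single extreme value more than triples the mean.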

WHY ARE SEPARATE POPULATION AND SAMPLE FORMULAE NEEDED?

• For small populations we can calculate the population mean, standard deviation, etc. directly. This is not possible when we have millions of products or hundreds of millions of vehicles, and we cannot approach every single person on the planet, in a country, or even in a town. So, for practical reasons, we take a sample, test our hypotheses or arrive at a result by surveying or researching that sample, and then infer results for the whole population. Hence sample and population need separate consideration, and we have separate formulae for them.

| Aspect | Sample Mean | Population Mean |
|---|---|---|
| Estimation | The sample mean is used to estimate the population mean and make inferences about the population based on the sample. | The population mean is a known or unknown value that is of interest and may not require estimation. |
| Sampling Error | The sample mean is subject to sampling error, which is the difference between the sample mean and the population mean. | The population mean does not have sampling error as it represents the true average of the entire population. |
| Bias | The sample mean may be biased due to the sampling method used or the characteristics of the sample. | The population mean is unbiased as it represents the true average of the entire population. |
| Statistical Inference | The sample mean is used in statistical inference techniques, such as hypothesis testing and confidence interval estimation. | The population mean may serve as a reference point for statistical comparisons or as a benchmark. |
| Precision | The sample mean is typically less precise than the population mean due to the smaller sample size and variability. | The population mean is often more precise as it considers all the observations in the population. |
| Central Limit Theorem | The sample mean tends to follow a normal distribution as the sample size increases, according to the Central Limit Theorem. | The population mean does not require the Central Limit Theorem as it represents the true average of the population. |
| Accuracy | The sample mean may or may not be accurate in estimating the population mean, depending on the representativeness and sampling method. | The population mean is the accurate and true average of the entire population. |
| Sampling Frame | The sample mean is based on a specific sampling frame or the set of individuals or units from which the sample is drawn. | The population mean considers all individuals or units in the population. |
| Statistical Properties | The sample mean has statistical properties, such as variance, standard deviation, and confidence interval, which are calculated based on the sample. | The population mean has statistical properties that describe the variability and distribution of the values in the population. |
| Efficiency | The sample mean may be less efficient than the population mean in estimating the true average due to the smaller sample size. | The population mean is the most efficient estimator of the true average as it considers all the observations in the population. |
| Characteristics | The sample mean may not accurately represent all the characteristics and parameters of the population mean. | The population mean represents all the characteristics and parameters of the population. |
| Statistical Tests | The sample mean is used in various statistical tests, such as t-tests and ANOVA, to assess differences between groups or variables. | The population mean may be used as a reference point in statistical tests or comparisons. |

PROBLEM 3.1:
Arrange in ascending order:
2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 7, 8, 8, 8, 9

There are 15 terms.


Since there are an odd number of terms, the median is the middle number.
Using the formula, the median is located at the $\frac{n+1}{2}$-th term $= \frac{15+1}{2}$ = 8th term.
The 8th term = 4, so the median = 4.

Mode: 4 is the most frequently occurring value (it occurs four times), so the mode = 4.

EXCEL FORMULA FOR MEAN : =AVERAGE(RANGE)


EXCEL FORMULA FOR MEDIAN: =MEDIAN(RANGE)
EXCEL FORMULA FOR MODE: =MODE(1st Number, 2nd Number, …) ; a cell range can be passed as well, e.g. =MODE(RANGE)

PROBLEM 3.5 Average closing price of a group of stocks on the New York stock exchange : 21,21,21,22,23,25,28,29,33,35,38,56,61
Find Mean, Median and Mode

Arranging data in ascending order: 21,21,21,22,23,25,28,29,33,35,38,56,61


n = 13, $\sum x$ = 21+21+21+22+23+25+28+29+33+35+38+56+61 = 413
Mean = $\bar{x} = \frac{\sum x}{n}$ = 413/13 = 31.7692

Median = $\frac{n+1}{2}$-th observation, which is (13+1)/2 = 7th

Hence the 7th observation, i.e. 28, is the median of all observations.

Mode = 21, since it occurs most often (3 times).
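
As a cross-check of Problem 3.5, here is a small Python sketch using the standard-library `statistics` module. It plays the same role as the Excel formulas and is not part of the original solution. Summing all thirteen prices gives 413, so the mean is 413/13.

```python
from statistics import mean, median, mode

# Closing prices from Problem 3.5 (all thirteen values).
prices = [21, 21, 21, 22, 23, 25, 28, 29, 33, 35, 38, 56, 61]

total = sum(prices)          # sum of x
avg = mean(prices)           # sum of x divided by n
mid = median(prices)         # (n + 1) / 2 = 7th ordered observation
most_common = mode(prices)   # most frequently occurring value

print(len(prices), total, round(avg, 4), mid, most_common)
```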


Question from ppt:


Collect a sample of students' shoe sizes:
8, 9, 9, 8, 10, 11, 10, 9, 8, 9, 10, 11, 11, 10, 10, 10, 10, 10, 10, 11, 11, 12, 11, 10, 9, 7, 8, 7, 7
Determine the mean, median and mode.

Mean: $\bar{x} = \frac{\sum x}{n}$ = 276/29 = 9.5172

Median: Arranging data in ascending order -


7 7 7 8 8 8 8 9 9 9 9 9 10 10
10 10 10 10 10 10 10 10 11 11 11 11 11 11
12
n = 29

Median: $\frac{n+1}{2}$-th observation, which is the 15th observation, i.e. 10

Mode: 10, since it occurs most often (10 times).

What do you think is the right measure of size of shoes for the class?
A: A shoe in the median or modal size (10) would fit more students than one in the mean size (9.52).

Key Takeaway:
• When a data set contains a large number of small values and a few large ones, relying on the mode or the median can be misleading: both will be small, and the large outliers will be left out and will not contribute to the picture.
• For the salary or net worth of the people of India, the mode or the median would be a good measure.
• The purpose also matters: when a tax rule is to be applied, the mean would be a good measure, but when a subsidy is to be distributed, the mode or the median may be a better choice.
• A shoe company or a T-shirt company may prefer the mode to the median or the mean, since it identifies the most bought or most demanded product.
• In general, if there are outliers, the median is preferred to the mean

The number of U.S. cars in service by top car rental companies in a recent year according to Auto Rental News follows.
Company Number of Cars in Service
Enterprise 643,000; Hertz 327,000; National/Alamo 233,000; Avis 204,000; Dollar/Thrifty 167,000; Budget 144,000; Advantage 20,000; U-Save 12,000;
Payless 10,000; ACE 9,000; Fox 9,000; Rent-A-Wreck 7,000; Triangle 6,000
Compute the mode, the median, and the mean.
A:
| DATA: Company | Cars in Service | ASCENDING ORDER ARRANGED DATA: Company | Cars in Service |
|---|---|---|---|
| Enterprise | 643,000 | Triangle | 6,000 |
| Hertz | 327,000 | Rent-A-Wreck | 7,000 |
| National/Alamo | 233,000 | ACE | 9,000 |
| Avis | 204,000 | Fox | 9,000 |
| Dollar/Thrifty | 167,000 | Payless | 10,000 |
| Budget | 144,000 | U-Save | 12,000 |
| Advantage | 20,000 | Advantage | 20,000 |
| U-Save | 12,000 | Budget | 144,000 |
| Payless | 10,000 | Dollar/Thrifty | 167,000 |
| ACE | 9,000 | Avis | 204,000 |
| Fox | 9,000 | National/Alamo | 233,000 |
| Rent-A-Wreck | 7,000 | Hertz | 327,000 |
| Triangle | 6,000 | Enterprise | 643,000 |

Mean = $\bar{x} = \frac{\sum x}{n}$ = 1,791,000 / 13 = 137,769.2308

Median = $\frac{n+1}{2}$-th observation, which is the 7th observation, i.e. 20,000

Mode = 9,000, since it occurs twice
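
The car rental figures make a useful check on how outliers separate the mean from the median. The sketch below uses Python's standard-library `statistics` module; the dictionary name `cars_in_service` is an illustrative choice.

```python
from statistics import mean, median, mode

# Number of U.S. cars in service, as listed in the question.
cars_in_service = {
    "Enterprise": 643_000, "Hertz": 327_000, "National/Alamo": 233_000,
    "Avis": 204_000, "Dollar/Thrifty": 167_000, "Budget": 144_000,
    "Advantage": 20_000, "U-Save": 12_000, "Payless": 10_000,
    "ACE": 9_000, "Fox": 9_000, "Rent-A-Wreck": 7_000, "Triangle": 6_000,
}

values = list(cars_in_service.values())
total = sum(values)
avg = mean(values)            # pulled upward by the few very large fleets
mid = median(values)          # 7th of the 13 ordered observations
most_common = mode(values)    # the only value that repeats

print(total, round(avg, 4), mid, most_common)
```

The mean is several times larger than the median here because a handful of very large fleets dominate the sum, which is the situation the key takeaways describe.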

Percentile:
Percentiles are measures of central tendency that divide a group of data into 100 parts. There are 99 percentiles because it takes 99 dividers to
separate a group of data into 100 parts.

The 75th percentile of a group of data is a value such that at least 75% of the values in the group are below it and no more than 25% are above it.

Percentiles are “stair-step” values

Steps in Determining the Location of a Percentile


1. Organize the numbers into an ascending-order array.
2. Calculate the percentile location (i) by:

$i = \frac{P}{100}(N)$ where,
P = the percentile of interest
i = percentile location
N = number in the data set

3. Determine the location by either (a) or (b).


a. If i is a whole number, the Pth percentile is the average of the value at the ith location and the value at the (i + 1)st
location.
b. If i is not a whole number, the Pth percentile value is located at the whole number part of i + 1.

For example, suppose you want to determine the 80th percentile of 1240 numbers. P is 80 and N is 1240. First, order the numbers from lowest to
highest. Next, calculate the location of the 80th percentile.
$i = \frac{80}{100}(1240) = 992$
Because i = 992 is a whole number, follow the directions in step 3(a). The 80th percentile is the average of the 992nd number and the 993rd number.

$P_{80} = \frac{\text{992nd number} + \text{993rd number}}{2}$

Q: Determine the 30th percentile of the following numbers: 14, 12, 19, 23,5,13,28,17

Arranging data in ascending order: 5, 12, 13, 14, 17, 19,23,28

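P = 30, N = 8

$i = \frac{30}{100}(8) = 2.4$

Since i is not a whole number, step 3(b) applies: the 30th percentile is located at the whole-number part of i plus 1, i.e. 2 + 1 = 3. The 3rd number in the ordered data set is 13, so the 30th percentile is 13.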
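
The three steps for locating a percentile can be sketched in Python as below. The function name `percentile` is an illustrative choice, and the sketch assumes P is a whole number between 1 and 99. Library routines such as Excel's PERCENTILE or `statistics.quantiles` use different interpolation rules and may return slightly different values.

```python
def percentile(values, p):
    """Pth percentile using the location rule from the steps above.

    Assumes p is an integer with 0 < p < 100.
    """
    ordered = sorted(values)                 # step 1: ascending-order array
    # step 2: i = (P / 100) * N, kept as whole part and remainder
    i, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        # step 3(a): i is whole, average the ith and (i + 1)st values
        return (ordered[i - 1] + ordered[i]) / 2
    # step 3(b): i is not whole, take the value at (whole part of i) + 1;
    # 1-based position i + 1 is 0-based index i
    return ordered[i]


print(percentile([14, 12, 19, 23, 5, 13, 28, 17], 30))
```

Working with `divmod` on `p * N` avoids floating-point surprises when deciding whether i is a whole number.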

Quartiles
Quartiles are measures of central tendency that divide a group of data into four subgroups or parts. The three quartiles are denoted as Q1, Q2, and
Q3.
• The first quartile, Q1, separates the first, or lowest, one-fourth of the data from the upper three-fourths and is equal to the 25th percentile.
• The second quartile, Q2, separates the second quarter of the data from the third quarter. Q2 is located at the 50th percentile and equals the
median of the data.
• The third quartile, Q3, divides the first three-quarters of the data from the last quarter and is equal to the value of the 75th percentile.

MEASURES OF VARIABILITY: UNGROUPED DATA
RANGE
INTERQUARTILE RANGE
DEVIATION FROM MEAN, ABSOLUTE DEVIATION, SQUARED DEVIATION & VARIANCE
STANDARD DEVIATION & MEANING OF STANDARD DEVIATION
EMPIRICAL RULE
CHEBYSHEV'S THEOREM
POPULATION VERSUS SAMPLE STANDARD DEVIATION AND VARIANCE
Z-SCORES
COEFFICIENT OF VARIATION

Measures of central tendency yield information about the centre or middle part of a data set. However, business researchers can use another group
of analytic tools, measures of variability, to describe the spread or the dispersion of a set of data.

Three Distributions with the Same Mean but Different Dispersions

RANGE: HIGHEST- LOWEST VALUE


• The range is the difference between the largest value of a data set and the smallest value of the set.
• It is a crude measure of variability: it is affected by extreme values, and hence its application as a measure of variability is limited.
• One important use of the range is in quality assurance, where the range is used to construct control charts.

INTERQUARTILE RANGE: Q3-Q1


• The interquartile range is the range of values between the first and third quartile. Essentially, it is the range of the middle 50% of the data and is
determined by computing the value of Q3- Q1
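
A hedged Python sketch of the range and the interquartile range, reusing the eight numbers from the 30th-percentile question. Quartiles are located with the same percentile-location rule given earlier in these notes. The helper `percentile` is defined here for self-containment, and other software may use a different quartile convention.

```python
def percentile(values, p):
    """Pth percentile by the location rule i = (P / 100) * N (integer p, 0 < p < 100)."""
    ordered = sorted(values)
    i, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        # i is whole: average the ith and (i + 1)st ordered values
        return (ordered[i - 1] + ordered[i]) / 2
    # i is not whole: value at position (whole part of i) + 1
    return ordered[i]


data = [14, 12, 19, 23, 5, 13, 28, 17]

data_range = max(data) - min(data)   # highest value minus lowest value
q1 = percentile(data, 25)            # first quartile = 25th percentile
q2 = percentile(data, 50)            # second quartile = median
q3 = percentile(data, 75)            # third quartile = 75th percentile
iqr = q3 - q1                        # spread of the middle 50% of the data

print(data_range, q1, q2, q3, iqr)
```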

| Number, $x$ | Deviation from Mean, $x-\mu$ | Absolute Deviation, $\lvert x-\mu \rvert$ | Squared Deviation, $(x-\mu)^2$ |
|---|---|---|---|
| 5 | -8 | 8 | 64 |
| 9 | -4 | 4 | 16 |
| 16 | 3 | 3 | 9 |
| 17 | 4 | 4 | 16 |
| 18 | 5 | 5 | 25 |
| $\sum x = 65$ | $\sum (x-\mu) = 0$ | $\sum \lvert x-\mu \rvert = 24$ | $SS_x = \sum (x-\mu)^2 = 130$ |

N = number of data entries = 5
$SS_x$ = sum of squared deviations = $\sum (x-\mu)^2$

Mean: $\mu = \frac{\sum x}{N} = \frac{65}{5} = 13$
Mean Absolute Deviation: $MAD = \frac{\sum \lvert x-\mu \rvert}{N} = \frac{24}{5} = 4.8$
Variance: $\sigma^2 = \frac{\sum (x-\mu)^2}{N} = \frac{SS_x}{N} = \frac{130}{5} = 26$
Standard Deviation: $\sigma = \sqrt{\frac{\sum (x-\mu)^2}{N}} = \sqrt{26} = 5.1$
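
The deviation table can be reproduced with a few lines of Python. This is a sketch that treats the five numbers as a whole population (dividing by N); `pvariance` and `pstdev` from the standard-library `statistics` module serve only as a cross-check.

```python
from math import sqrt
from statistics import pstdev, pvariance

data = [5, 9, 16, 17, 18]
N = len(data)
mu = sum(data) / N                              # population mean

deviations = [x - mu for x in data]             # x - mu, always sums to 0
mad = sum(abs(d) for d in deviations) / N       # mean absolute deviation
ss_x = sum(d ** 2 for d in deviations)          # sum of squared deviations
variance = ss_x / N                             # population variance
std_dev = sqrt(variance)                        # population standard deviation

# Cross-check against the library's population formulas.
library_variance = pvariance(data)
library_std_dev = pstdev(data)

print(mu, sum(deviations), mad, ss_x, variance, round(std_dev, 1))
```

The plain deviations sum to zero, which is why either absolute values or squares are needed to measure spread.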

EMPIRICAL RULE
The empirical rule is an important rule of thumb used to state the approximate percentage of values that lie within a given number of standard deviations from the mean of a set of data, if the data are normally distributed.

| Distance from Mean | Values within Distance |
|---|---|
| μ ± 1σ | 68% |
| μ ± 2σ | 95% |
| μ ± 3σ | 99.7% |
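
The empirical-rule percentages can be checked against the normal distribution itself. For a normal distribution, the proportion within k standard deviations of the mean equals erf(k/√2). The sketch below uses only Python's `math` module, and `share_within` is an illustrative name.

```python
from math import erf, sqrt


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k) * 100:.1f}%")
```

The computed shares are roughly 68.3%, 95.4% and 99.7%, which the rule of thumb rounds to 68%, 95% and 99.7%.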

CHEBYSHEV'S THEOREM
• The empirical rule applies only when data are known to be approximately normally distributed.
• Chebyshev’s theorem applies to all distributions regardless of their shape and thus can be used whenever the data distribution shape is
unknown or is nonnormal.
• Chebyshev’s theorem states that at least $1 - \frac{1}{k^2}$ of the values will fall within ±k standard deviations of the mean (μ ± kσ) regardless of the shape of the distribution, for k > 1.
• Specifically, Chebyshev’s theorem says that at least 75% of all values are within ±2σ of the mean regardless of the shape of a distribution, because if k = 2, then $1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4} = .75$.
• According to Chebyshev’s theorem, the percentage of values within three standard deviations of the mean is at least 89% ($1 - \frac{1}{3^2} \approx .89$), in contrast to 99.7% for the empirical rule.

CHEBYSHEV'S THEOREM APPLICATION :


Q: In the computing industry the average age of professional employees tends to be younger than in many other business professions. Suppose the
average age of a professional employed by a particular computer firm is 28 with a standard deviation of 6 years. A histogram of professional
employee ages with this firm reveals that the data are not normally distributed but rather are amassed in the 20s and that few workers are over 40.
Apply Chebyshev’s theorem to determine within what range of ages would at least 80% of the workers’ ages fall.

A: Chebyshev's theorem states that at least $1 - \frac{1}{k^2}$ of the values will fall within ±k standard deviations of the mean (μ ± kσ). Setting this proportion to 0.80:
$1 - \frac{1}{k^2} = 0.80$
Therefore, $\frac{1}{k^2} = 0.20$, so $k^2 = 5$ and $k = 2.24$.

Now, μ = 28, σ = 6 and k = 2.24. Therefore, at least 80% of the values will lie within μ ± kσ = 28 ± 2.24 × 6 = 28 ± 13.44, i.e. from 14.56 to 41.44 years of age.
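
A small Python sketch of Chebyshev's bound and of the age-range calculation above. The function names are illustrative, and k is rounded to two decimals only to mirror the hand calculation.

```python
from math import sqrt


def chebyshev_min_share(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound 1 - 1/k^2 is only informative for k > 1")
    return 1 - 1 / k ** 2


def k_for_share(share):
    """Solve 1 - 1/k^2 = share for k."""
    return sqrt(1 / (1 - share))


mu, sigma = 28, 6                     # mean age and standard deviation
k = round(k_for_share(0.80), 2)       # rounded as in the hand calculation
low, high = mu - k * sigma, mu + k * sigma

print(chebyshev_min_share(2), k, round(low, 2), round(high, 2))
```

Using the unrounded k = √5 would narrow the interval very slightly; the rounded value is kept so the result matches the worked answer.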

SAMPLE STANDARD DEVIATION AND VARIANCE:

• The sample variance is denoted by s² and the sample standard deviation by s, as against the population variance σ² and the population standard deviation σ.
• The main use of sample variances and standard deviations is as estimators of population variances and standard deviations, since in practical cases we cannot obtain data for the whole population and must work from sample measurements to population measurements.
• Both the sample variance and sample standard deviation use n − 1 in the denominator instead of n, because using n in the denominator of a sample variance results in a statistic that tends to underestimate the population variance.
• Instead of μ we use x̅ for the sample mean. Other than these two changes, the formulae are the same.

SAMPLE VARIANCE: $s^2 = \frac{\sum (x-\bar{x})^2}{n-1}$
SAMPLE STANDARD DEVIATION: $s = \sqrt{\frac{\sum (x-\bar{x})^2}{n-1}}$

The following data are for six accounting firms:

| Firm | $x$ | $x-\bar{x}$ | $(x-\bar{x})^2$ |
|---|---|---|---|
| 1 | 2654 | 1066.67 | 1137777.78 |
| 2 | 2108 | 520.67 | 271093.78 |
| 3 | 2069 | 481.67 | 232002.78 |
| 4 | 1664 | 76.67 | 5877.78 |
| 5 | 720 | -867.33 | 752267.11 |
| 6 | 309 | -1278.33 | 1634136.11 |
| | $\sum x = 9524.00$ | | $\sum (x-\bar{x})^2 = 4033155.33$ |

x̅ = 9524/6 = 1587.33

SAMPLE VARIANCE: $s^2 = \frac{\sum (x-\bar{x})^2}{n-1} = \frac{4033155.33}{5} = 806631.07$
SAMPLE STANDARD DEVIATION: $s = \sqrt{s^2} = \sqrt{\frac{\sum (x-\bar{x})^2}{n-1}} = \sqrt{806631.07} = 898.13$

Notice: We used x̅ instead of μ and n − 1 (5) instead of N (6) here.
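
The accounting-firm calculation can be verified with the Python sketch below. It computes the sample variance by hand with the n − 1 denominator and then cross-checks against `statistics.variance` and `statistics.stdev`, which also divide by n − 1. Variable names are illustrative.

```python
from statistics import stdev, variance

x = [2654, 2108, 2069, 1664, 720, 309]
n = len(x)
x_bar = sum(x) / n                             # sample mean

ss = sum((v - x_bar) ** 2 for v in x)          # sum of squared deviations
s2 = ss / (n - 1)                              # sample variance, n - 1 denominator
s = s2 ** 0.5                                  # sample standard deviation

# Dividing by n instead gives a smaller value, the underestimate noted above.
variance_with_n = ss / n

print(round(x_bar, 2), round(ss, 2), round(s2, 2), round(s, 2))
```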

Z SCORE
• A z score converts a value's (x's) raw distance from the mean into units of standard deviations: it tells how far a certain value x lies above or below the mean, in units of standard deviation.
• The z distribution is a normal distribution with a mean of 0 and a standard deviation of 1

• For a population: $z = \frac{x-\mu}{\sigma}$
• For a sample: $z = \frac{x-\bar{x}}{s}$ (in words: (value − mean) / standard deviation)
• If a z score is negative, the raw value (x) is below the mean. If the z score is positive, the raw value (x) is above the mean.
• For a normally distributed data set with μ = 50 and σ = 10, suppose a statistician wants to find the z score for a value of 70. Then
$z = \frac{x-\mu}{\sigma} = \frac{70-50}{10} = 2$
Here a positive z score of 2 indicates that the data point 70 lies two standard deviations above the mean.

• Between z = −1.00 and z = +1.00 are approximately 68% of the values.
• Between z = −2.00 and z = +2.00 are approximately 95% of the values.
• Between z = −3.00 and z = +3.00 are approximately 99.7% of the values.

Q: What is the probability of obtaining a score greater than 700 on a GMAT test that has a mean of 494 and a standard deviation of 100? Assume
GMAT scores are normally distributed.
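
One way to work this question, sketched in Python under the stated normality assumption: convert 700 to a z score, then take the area to the right of it. `NormalDist` is from the standard-library `statistics` module (Python 3.8+) and stands in for a z table or spreadsheet lookup.

```python
from statistics import NormalDist

mu, sigma = 494, 100          # GMAT mean and standard deviation from the question
x = 700

z = (x - mu) / sigma          # raw distance from the mean in standard deviations

# P(X > 700) = 1 - P(X <= 700) under the normal assumption
p_greater = 1 - NormalDist(mu, sigma).cdf(x)

print(round(z, 2), round(p_greater, 4))
```

The z score is (700 − 494)/100 = 2.06, and the area to the right of z = 2.06 is about 0.0197, so roughly 2% of scores exceed 700.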


COEFFICIENT OF VARIATION:
• The coefficient of variation is a statistic that is the ratio of the standard deviation to the mean, expressed as a percentage, and is denoted CV.
• $CV = \frac{\sigma}{\mu}(100)$
• CV tells what percentage of the mean the standard deviation is.
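
A brief Python sketch of the coefficient of variation, applied to the five-number data set (5, 9, 16, 17, 18) used earlier for variance, where μ = 13 and σ ≈ 5.1. The function name is illustrative and the population formulas are assumed.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population CV in percent: (sigma / mu) * 100."""
    return pstdev(values) / mean(values) * 100


data = [5, 9, 16, 17, 18]
cv = coefficient_of_variation(data)

print(round(cv, 2))
```

For this data the standard deviation is a little under 40% of the mean. Because CV is unit-free, it allows comparison of variability between data sets measured on different scales.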
