Measures of central tendency yield information about “particular places or locations in a group of numbers.”
They yield information about the centre, or middle part, of a group of numbers.
MODE:
Mode - the most frequently occurring value in a data set
• Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)
• Can be used to determine what categories occur most frequently
• Sometimes, no mode exists (no duplicates)
• 2 Modes in dataset - Bimodal
• More than 2 Modes - Multimodal
APPLICATION: In the world of business, the concept of mode is often used in determining sizes. As an example, manufacturers who produce cheap
rubber flip-flops that are sold for as little as $1.00 around the world might only produce them in one size in order to save on machine setup costs. In
determining the one size to produce, the manufacturer would most likely produce flip-flops in the modal size.
MEDIAN:
Median - middle value in an ordered array of numbers.
• Half the data are above it, half the data are below it
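• Position of the median in an ordered array of n values: $\frac{n+1}{2}$. For example, n = 10 gives $\frac{10+1}{2} = \frac{11}{2} = 5.5$, so the median is the average of the 5th and 6th ordered observations.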
The median is unaffected by the extreme values. Used for measuring salaries, age, etc.
MEAN
Mean is the average of a group of numbers
• Applicable for interval and ratio data
• Not applicable for nominal or ordinal data
• Affected by each value in the data set, including extreme values. This can be a disadvantage: very large or very small values pull the
mean higher or lower and distort the assessment of the sample or the population.
• Computed by summing all values in the data set and dividing the sum by the number of values in the data set
• The population mean is represented by the Greek letter mu, 'µ'. The sample mean is represented by x-bar, 'x̅'.
• POPULATION MEAN: $\mu = \frac{\sum x}{N} = \frac{x_1 + x_2 + x_3 + \cdots + x_N}{N}$
• SAMPLE MEAN: $\bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$
• 'N' is the number of terms in the population, and 'n' is the number of terms in the sample.
• The sample mean only considers a selected number of observations—drawn from the population data. The population mean, on the other
hand, considers all the observations in the population—to compute the average value.
WHY POPULATION AND SAMPLE FORMULAE ARE DIFFERENT AND NEEDED AS SUCH?
• For small populations we can calculate the population mean, standard deviation, etc. directly. This is not possible when we have millions of
products or hundreds of millions of vehicles, and we cannot approach every single person on the planet, in a country, or even in a town. For
practical reasons we therefore take a sample, survey or study that sample to test our hypotheses or arrive at a result, and then infer results
for the whole population. Hence the sample and the population need separate consideration, and we have separate formulae for them.
Comparison of the sample mean and the population mean:
• Estimation
  Sample mean: used to estimate the population mean and to make inferences about the population based on the sample.
  Population mean: a known or unknown value that is of interest; when known, it does not require estimation.
• Sampling Error
  Sample mean: subject to sampling error, which is the difference between the sample mean and the population mean.
  Population mean: has no sampling error, as it represents the true average of the entire population.
• Bias
  Sample mean: may be biased due to the sampling method used or the characteristics of the sample.
  Population mean: not subject to sampling bias, as it represents the true average of the entire population.
• Statistical Inference
  Sample mean: used in statistical inference techniques, such as hypothesis testing and confidence interval estimation.
  Population mean: may serve as a reference point for statistical comparisons or as a benchmark.
• Precision
  Sample mean: typically less precise, because it is based on a smaller number of observations and varies from sample to sample.
  Population mean: a single fixed value, as it considers all the observations in the population.
• Central Limit Theorem
  Sample mean: its sampling distribution tends towards a normal distribution as the sample size increases, according to the Central Limit Theorem.
  Population mean: does not require the Central Limit Theorem, as it represents the true average of the population.
• Accuracy
  Sample mean: may or may not be accurate in estimating the population mean, depending on the representativeness of the sample and the sampling method.
  Population mean: the accurate and true average of the entire population.
• Sampling Frame
  Sample mean: based on a specific sampling frame, i.e. the set of individuals or units from which the sample is drawn.
  Population mean: considers all individuals or units in the population.
• Statistical Properties
  Sample mean: has statistical properties, such as variance, standard deviation, and confidence interval, which are calculated from the sample.
  Population mean: accompanied by population parameters that describe the variability and distribution of the values in the population.
• Efficiency
  Sample mean: an estimate of the true average whose efficiency depends on the sample size; smaller samples give less efficient estimates.
  Population mean: not an estimate; it is the true average, as it considers all the observations in the population.
• Characteristics
  Sample mean: may not accurately represent all the characteristics and parameters of the population.
  Population mean: reflects all the observations in the population.
• Statistical Tests
  Sample mean: used in various statistical tests, such as t-tests and ANOVA, to assess differences between groups or variables.
  Population mean: may be used as a reference point in statistical tests or comparisons.
PROBLEM 3.1:
Arrange in ascending order:
2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 7, 8, 8, 8, 9
The mode = 4
4 is the most frequently occurring value (it occurs four times).
PROBLEM 3.5:
Average closing prices of a group of stocks on the New York Stock Exchange: 21, 21, 21, 22, 23, 25, 28, 29, 33, 35, 38, 56, 61
Find the mean, median and mode.
Mode = 21, since it occurs the greatest number of times (3).
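The notes give only the mode for this problem. As a minimal sketch, all three measures can be checked with Python's standard `statistics` module; the variable names are illustrative and not part of the original notes.

```python
from statistics import mean, median, mode

# Closing prices from Problem 3.5 (already in ascending order)
prices = [21, 21, 21, 22, 23, 25, 28, 29, 33, 35, 38, 56, 61]

stock_mean = mean(prices)      # sum of values / number of values = 413 / 13
stock_median = median(prices)  # 7th of the 13 ordered values
stock_mode = mode(prices)      # most frequently occurring value

print(f"Mean   = {stock_mean:.2f}")
print(f"Median = {stock_median}")
print(f"Mode   = {stock_mode}")
```

The mean (about 31.77) lies above the median (28) because the two large values, 56 and 61, pull it upwards.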
Q: What do you think is the right measure of shoe size for the class?
A: A shoe of the median or modal size would fit more people than a shoe of the mean size.
Key Takeaway:
• When a data set contains a large number of small values and a few very large ones, relying on the mode or the median can be misleading: both
will be small, and the large outlying values will not contribute to the picture.
• For the salary or net worth of people in India, the mode or the median would be a good choice.
• The purpose also matters: when a tax rule is to be applied, the mean would be suitable, but when a subsidy is to be distributed, the mode or
the median may be a better choice.
• A shoe company or a T-shirt manufacturer may prefer the mode to the median or the mean, since the mode identifies the most bought or most
demanded product.
• In general, if there are outliers, the median is preferred to the mean
The number of U.S. cars in service by top car rental companies in a recent year according to Auto Rental News follows.
Company Number of Cars in Service
Enterprise 643,000; Hertz 327,000; National/Alamo 233,000; Avis 204,000; Dollar/Thrifty 167,000; Budget 144,000; Advantage 20,000; U-Save 12,000;
Payless 10,000; ACE 9,000; Fox 9,000; Rent-A-Wreck 7,000; Triangle 6,000
Compute the mode, the median, and the mean.
A:
DATA:                         DATA ARRANGED IN ASCENDING ORDER:
Enterprise       643,000      Triangle         6,000
Hertz            327,000      Rent-A-Wreck     7,000
National/Alamo   233,000      ACE              9,000
Avis             204,000      Fox              9,000
Dollar/Thrifty   167,000      Payless          10,000
Budget           144,000      U-Save           12,000
Advantage        20,000       Advantage        20,000
U-Save           12,000       Budget           144,000
Payless          10,000       Dollar/Thrifty   167,000
ACE              9,000        Avis             204,000
Fox              9,000        National/Alamo   233,000
Rent-A-Wreck     7,000        Hertz            327,000
Triangle         6,000        Enterprise       643,000
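The ordered data are listed above, but the three measures themselves are not worked out in the notes. A minimal sketch using Python's standard `statistics` module is given here; the dictionary and variable names are illustrative.

```python
from statistics import mean, median, mode

# Number of cars in service, as listed in the problem statement
cars_in_service = {
    "Enterprise": 643_000, "Hertz": 327_000, "National/Alamo": 233_000,
    "Avis": 204_000, "Dollar/Thrifty": 167_000, "Budget": 144_000,
    "Advantage": 20_000, "U-Save": 12_000, "Payless": 10_000,
    "ACE": 9_000, "Fox": 9_000, "Rent-A-Wreck": 7_000, "Triangle": 6_000,
}

counts = sorted(cars_in_service.values())  # ascending order

cars_mode = mode(counts)      # only 9,000 occurs more than once
cars_median = median(counts)  # 7th of the 13 ordered values
cars_mean = mean(counts)      # 1,791,000 / 13

print(f"Mode   = {cars_mode:,}")
print(f"Median = {cars_median:,}")
print(f"Mean   = {cars_mean:,.2f}")
```

The mean (about 137,769) is far larger than the median (20,000) because a few very large companies dominate the total, which illustrates why the median is preferred when outliers are present.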
Percentile:
Percentiles are measures of central tendency that divide a group of data into 100 parts. There are 99 percentiles because it takes 99 dividers to
separate a group of data into 100 parts.
A 75th percentile of a group of data is a value that indicates that at least 75% of all values of that group are below 75th percentile and no more than
25% of that group of data are above it.
$i = \frac{P}{100}(N)$, where
P = the percentile of interest
i = percentile location
N = number in the data set
For example, suppose you want to determine the 80th percentile of 1240 numbers. P is 80 and N is 1240. First, order the numbers from lowest to
highest. Next, calculate the location of the 80th percentile.
$i = \frac{80}{100}(1240) = 992$
Because i = 992 is a whole number, the percentile is the average of the values at the ith and (i + 1)th locations. The 80th percentile is therefore the average of the 992nd number and the 993rd number.
Q: Determine the 30th percentile of the following numbers: 14, 12, 19, 23,5,13,28,17
P =30, N =8
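$i = \frac{30}{100}(8) = 2.4$
Arranged in ascending order, the numbers are 5, 12, 13, 14, 17, 19, 23, 28. Since i is not a whole number, the percentile is located at the next whole-number position, which is the 3rd number in the ordered data set, i.e. 13.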
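The location rule used in the two examples above can be sketched in Python as follows. The function name `percentile_by_location` is hypothetical, and the sketch assumes P is a whole number between 1 and 99; note that this textbook rule differs from the interpolation methods used by many software packages.

```python
def percentile_by_location(values, p):
    """Return the p-th percentile using the location rule i = (P/100) * N.

    Assumes p is an integer from 1 to 99.
    """
    ordered = sorted(values)          # step 1: order from lowest to highest
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)  # i = whole + remainder/100
    if remainder == 0:
        # i is a whole number: average the i-th and (i+1)-th ordered values
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not a whole number: take the value at the next whole position
    return ordered[whole]             # 0-based index `whole` is position whole + 1


numbers = [14, 12, 19, 23, 5, 13, 28, 17]
p30 = percentile_by_location(numbers, 30)
print(p30)
```

Integer arithmetic (`divmod`) is used to test whether i is a whole number, which avoids floating-point rounding problems in the comparison.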
Quartiles
Quartiles are measures of central tendency that divide a group of data into four subgroups or parts. The three quartiles are denoted as Q1, Q2, and
Q3.
• The first quartile,Q1, separates the first, or lowest, one-fourth of the data from the upper three-fourths and is equal to the 25th percentile.
• The second quartile, Q2, separates the second quarter of the data from the third quarter. Q2 is located at the 50th percentile and equals the
median of the data.
• The third quartile, Q3, divides the first three-quarters of the data from the last quarter and is equal to the value of the 75th percentile.
MEASURES OF VARIABILITY: UNGROUPED DATA
RANGE
INTERQUARTILE RANGE
DEVIATION FROM MEAN, ABSOLUTE DEVIATION, SQUARED DEVIATION & VARIANCE
STANDARD DEVIATION & MEANING OF STANDARD DEVIATION
EMPIRICAL RULE
CHEBYSHEV'S THEOREM
POPULATION VERSUS SAMPLE STANDARD DEVIATION AND VARIANCE
Z-SCORES
COEFFICIENT OF VARIATION
Measures of central tendency yield information about the centre or middle part of a data set. However, business researchers can use another group
of analytic tools, measures of variability, to describe the spread or the dispersion of a set of data.
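Worked example (N = 5 values summing to 65):
$\mu = \frac{65}{5} = 13$
Mean Absolute Deviation: $MAD = \frac{\sum |x - \mu|}{N} = \frac{24}{5} = 4.8$
Variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{130}{5} = 26$
Standard Deviation: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{26} = 5.1$
N: number of data entries
$SS_x$ = sum of squared deviations = $\sum (x - \mu)^2$
Variance = $\sigma^2 = \frac{SS_x}{N}$
Standard Deviation = $\sigma = \sqrt{\frac{SS_x}{N}}$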
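The population formulas above can be sketched in a few lines of Python. The data values below are an assumption chosen only for illustration (five values with a mean of 13); they are not stated in these notes.

```python
from math import sqrt

# Hypothetical data set: five values whose sum is 65, so the mean is 13
data = [5, 9, 16, 17, 18]

N = len(data)                                # number of data entries
mu = sum(data) / N                           # population mean
mad = sum(abs(x - mu) for x in data) / N     # mean absolute deviation
ss_x = sum((x - mu) ** 2 for x in data)      # sum of squared deviations
variance = ss_x / N                          # population variance
sigma = sqrt(variance)                       # population standard deviation

print(f"mean = {mu}, MAD = {mad}, variance = {variance}, sigma = {sigma:.1f}")
```

Squaring the deviations before averaging keeps positive and negative deviations from cancelling; taking the square root at the end returns the measure to the original units.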
EMPIRICAL RULE
The empirical rule is an important rule of thumb that is used to state the approximate percentage of values that lie within a given number of
standard deviations from the mean of a set of data if the data are normally distributed: approximately 68% of values lie within ±1σ of the mean,
95% within ±2σ, and 99.7% within ±3σ.
CHEBYSHEV'S THEOREM
• The empirical rule applies only when data are known to be approximately normally distributed.
• Chebyshev’s theorem applies to all distributions regardless of their shape and thus can be used whenever the data distribution shape is
unknown or is nonnormal.
• Chebyshev’s theorem states that a proportion of at least $1 - \frac{1}{k^2}$ of the values will fall within ±k standard deviations of the mean (μ ± kσ),
regardless of the shape of the distribution (for k > 1).
• Specifically, Chebyshev’s theorem says that at least 75% of all values are within ±2σ of the mean regardless of the shape of a distribution,
because if k = 2, then $1 - \frac{1}{k^2} = 1 - \frac{1}{4} = \frac{3}{4} = 0.75$.
• According to Chebyshev’s theorem, the percentage of values within three standard deviations of the mean is at least 89%, in contrast to 99.7%
for the empirical rule.
A: Chebyshev's theorem states that a proportion of at least $1 - \frac{1}{k^2}$ of the values will fall within ±k standard deviations of the mean (μ ± kσ). Setting this proportion to 80%:
$1 - \frac{1}{k^2} = 0.8$
Therefore, $k^2 = 5$ and k = 2.24.
Now, μ = 28, σ = 6 and k = 2.24. Therefore, at least 80% (0.8) of the values will lie within μ ± kσ = 28 ± 2.24 × 6 = 28 ± 13.44 = 14.56 to 41.44 years of age.
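As a minimal sketch, the same calculation can be done in Python by solving $1 - \frac{1}{k^2} = p$ for k. The function name `chebyshev_k` is hypothetical. Because the code keeps k unrounded (√5 ≈ 2.236 rather than 2.24), the interval it prints differs slightly in the second decimal place from the hand calculation.

```python
from math import sqrt


def chebyshev_k(proportion):
    """Solve 1 - 1/k**2 = proportion for k (valid for 0 < proportion < 1)."""
    return sqrt(1 / (1 - proportion))


mu, sigma = 28, 6            # mean and standard deviation from the example
k = chebyshev_k(0.80)        # number of standard deviations for at least 80%

lower = mu - k * sigma
upper = mu + k * sigma

print(f"k = {k:.2f}")
print(f"At least 80% of values lie between {lower:.2f} and {upper:.2f}")
```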
SAMPLE STANDARD DEVIATION AND VARIANCE:
• The sample variance is denoted by $s^2$ and the sample standard deviation by s, as against the population variance $\sigma^2$ and the population standard
deviation σ.
• The main use of sample variances and standard deviations is as estimators of population variances and standard deviations, since in practical
cases we cannot obtain data for the whole population and must work from sample measurements to population measurements.
• Both the sample variance and the sample standard deviation use n − 1 in the denominator instead of n, because using n in the denominator of a
sample variance results in a statistic that tends to underestimate the population variance.
• Instead of μ we use x̅, the sample mean. Apart from these two changes, the formulae are the same.
SAMPLE VARIANCE: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
SAMPLE STANDARD DEVIATION: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Worked example:
SAMPLE VARIANCE: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{4033155.33}{5} = 806631.07$
SAMPLE STANDARD DEVIATION: $s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{806631.07} = 898.13$
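The effect of dividing by n − 1 rather than n can be seen with a short sketch using Python's standard `statistics` module, where `pvariance`/`pstdev` divide by N and `variance`/`stdev` divide by n − 1. The data values are hypothetical and chosen only for illustration.

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical sample of five observations (illustrative only)
sample = [5, 9, 16, 17, 18]

pop_var = pvariance(sample)   # divides the sum of squared deviations by n
samp_var = variance(sample)   # divides the sum of squared deviations by n - 1

print(f"Treating data as a population: variance = {pop_var}, sd = {pstdev(sample):.2f}")
print(f"Treating data as a sample:     variance = {samp_var}, sd = {stdev(sample):.2f}")
```

The same sum of squared deviations (130) is divided by 5 in the first case and by 4 in the second, so the sample variance is always the larger of the two.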
Z SCORE
• A z score converts a value's (x's) raw distance from the mean into units of standard deviations. It tells how far a certain value x is above or
below the mean, measured in standard deviations.
• The z distribution is a normal distribution with a mean of 0 and a standard deviation of 1.
• For a population: $z = \frac{x - \mu}{\sigma}$
• For a sample: $z = \frac{x - \bar{x}}{s}$
• In words: (value − mean) / standard deviation.
• If a z score is negative, the raw value (x) is below the mean. If the z score is positive, the raw value (x) is above the mean.
• For a normally distributed data set with μ = 50 and σ = 10, suppose a statistician wants to find the z score for a value of 70. Then,
$z = \frac{x - \mu}{\sigma} = \frac{70 - 50}{10} = 2$
Here a positive z score of 2 indicates that the data point 70 lies two standard deviations above the mean.
Q: What is the probability of obtaining a score greater than 700 on a GMAT test that has a mean of 494 and a standard deviation of 100? Assume
GMAT scores are normally distributed.
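One way to work this out, sketched here with Python's standard `statistics.NormalDist` class (an assumption; the notes do not say which tool to use), is to convert 700 to a z score and then take the upper-tail area of the normal distribution.

```python
from statistics import NormalDist

mu, sigma = 494, 100     # GMAT mean and standard deviation from the question
x = 700                  # score of interest

z = (x - mu) / sigma     # z score: distance from the mean in standard deviations

# P(X > 700) = 1 - P(X <= 700) under the normal distribution
p_greater = 1 - NormalDist(mu, sigma).cdf(x)

print(f"z = {z:.2f}")
print(f"P(score > {x}) = {p_greater:.4f}")
```

The z score is 2.06, and the probability of a score above 700 is roughly 0.02, i.e. about 2% of test takers.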
COEFFICIENT OF VARIATION:
• The coefficient of variation is a statistic that is the ratio of the standard deviation to the mean, expressed as a percentage, and is denoted CV.
• $CV = \frac{\sigma}{\mu}(100)$
• CV tells what percentage of the mean the standard deviation is.
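As an illustration, the formula can be applied to the population values from the earlier worked example in these notes (μ = 13, σ ≈ 5.1); the rounding to one decimal place is a choice made here.

```latex
CV = \frac{\sigma}{\mu}(100) = \frac{5.1}{13}(100) \approx 39.2\%
```

In that example the standard deviation is roughly 39% of the mean.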