Applied Statistics for Decision-Making

The document discusses key concepts in statistics and data analysis. It introduces important terms like population, sample, parameter, statistic, outliers, and data distributions. It explains different measures of central tendency - mode, mean, and median. The mode is the most common value, the mean is the average, and the median divides the data set in half. The mean can be affected by outliers while the median and mode are not. Understanding these concepts is important for analyzing data sets and making decisions.

Uploaded by

devashreechande

APPLIED STATISTICS FOR DECISION-MAKING

The fascinating world of numbers…

Workbook Part 1
Prof. S. Arvind
NUMBERS ARE IMPORTANT, BUT…
Factfulness by Hans Rosling

I don’t love numbers. I am a huge, huge fan of data, but I don’t love it. It has its limits. I love data only
when it helps me to understand the reality behind the numbers, i.e., people’s lives. In my research, I
have needed the data to test my hypotheses, but the hypotheses themselves often emerged from
talking to, listening to, and observing people. Though we absolutely need numbers to understand the
world, we should be highly skeptical about conclusions derived purely from number crunching.

The prime minister of Mozambique from 1994 to 2004, Pascoal Mocumbi, visited Stockholm in 2002
and told me that his country was making great economic progress. I asked him how he knew that;
after all, the quality of the economic statistics in Mozambique was probably not very good. Had he
looked at GDP per capita?

“I do look at those figures,” he said, “but they are not so accurate. So, I have also made it a habit to
watch the marches on 1st May every year. They are a popular tradition in our country. And I look at
people’s feet and what kind of shoes they have. I know that people do their best to look good on that
day. I know that they cannot borrow their friend’s shoes, because their friend will be out marching, too.
So, I look. And I can see, if they walk barefoot, or if they have bad shoes or if they have good shoes.
And I can compare with what I saw last year.

“Also, when I travel across the country, I look at the construction going on. If the grass is growing over
new foundations, that is bad. But if they keep adding new bricks to the building, then I know people
have money to invest, not just to consume day-to-day.”

A wise prime minister looks at the numbers, but not only at the numbers.

And, of course, some of the most valued and important aspects of human development cannot be
measured in numbers at all. We can estimate suffering from disease using numbers. We can measure
improvements in material living conditions using numbers. But the end goal of economic growth is
individual freedom and culture, and these values are difficult to capture with numbers. The idea of
measuring human progress in numbers seems completely bizarre to many people. I often agree. The
numbers will never tell the full story on what life on Earth is all about.

The world cannot be understood without numbers. But the world cannot be understood with numbers
alone.
TERMS WE NEED TO KNOW…
HISTOGRAM
A bar chart where vertical bars represent the frequency or percentage of the class
(group or individual member).
The variable of interest is displayed along the X-axis.
Note: There should be no gaps between adjacent bars.

DATA DISTRIBUTION OR DISTRIBUTION

A graphical display of statistical data under study.
The variable of interest is displayed along the X-axis.
The frequency or percentage of different values of the variable is displayed along
the Y-axis.
Note: A histogram is one example of a data distribution.
ANALYSING DATA SETS
TERMS WE NEED TO KNOW…
STATISTICS
The branch of mathematics that deals with collecting, analysing,
presenting and interpreting data/information for decision-making.

POPULATION
The set of all the items or individuals of interest in a particular study.
Example: All the students who have graduated from our school since
inception.

SAMPLE
A collection of a portion of the population selected for analysis. Thus, a
sample is a sub-set of the population.
Example: 80 students selected at random from the 4,800 students who
graduated from our school since inception (population).

REPRESENTATIVE SAMPLE
A sample that is expected to be similar to the population in its
characteristics. Inferences based upon the sample are expected to be
reasonably accurate for the population as well.
PICKING SAMPLES AT RANDOM

2-DIGIT RANDOM
NUMBER TABLE

EXCEL FUNCTION:
RANDBETWEEN(lowest, highest)
Example: RANDBETWEEN(00,99)
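
The same idea can be sketched outside Excel. The following is a minimal Python illustration, assuming the 4,800 graduates are identified by hypothetical roll numbers 1 to 4800 (the roll numbers and the seed are assumptions, not part of the workbook). Note that repeated calls to a RANDBETWEEN-style function can return the same number twice, whereas drawing a sample without replacement cannot.

```python
import random

# Hypothetical population: roll numbers 1..4800 standing in for the 4,800 graduates.
population = list(range(1, 4801))

# Fixed seed only so that the sketch gives the same result on every run.
rng = random.Random(42)

# Counterpart of RANDBETWEEN(0, 99): one integer, both ends inclusive.
two_digit = rng.randint(0, 99)

# Draw 80 distinct students; sample() never picks the same item twice.
sample = rng.sample(population, 80)

print(two_digit, sorted(sample)[:5])
```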
TERMS WE NEED TO KNOW…
PARAMETER (term starts with ‘P’)
A descriptive measure for a population.
Example: Average marks scored in the first quiz of this [entire] batch of
students.

STATISTIC (term starts with ‘S’)

A descriptive measure for a sample.
Example: Average marks scored in the first quiz of a sample from this
batch of students.
PARAMETER AND STATISTIC

Parameter: Average marks scored by all students of this batch.
Statistic: Average marks scored by a sample of students of this batch.
TERMS WE NEED TO KNOW…
OUTLIER
A data-point/observation whose value lies at an ‘abnormal’ distance from
the rest of the data-set.
Point to ponder over: What is meant by ‘abnormal distance’?
Example: Student marks in an exam that qualify for ‘A+’ or ‘F’ grades.

SKEWNESS
The asymmetry of a data distribution.
INDEX OF SKEWNESS
A measure of the degree of asymmetry of a data distribution.
IDENTIFYING OUTLIERS

Abnormal distance?
UNDERSTANDING DISTRIBUTION
OF DATA
Properties to describe data distribution

1 Central Tendency
2 Dispersion
3 Skewness
CENTRAL TENDENCY

Measures of central tendency

1 Mode
2 Mean (arithmetic)
3 Median
CENTRAL TENDENCY - MODE
The Mode is the score or qualitative category that
occurs most frequently in the data set.

Example:
On studying trends over three years in the sales of
white shirts of sizes 35”, 37”, 39” and 41”, a garment
manufacturer finds that the 37” size is the fastest moving,
accounting for 43% of total sales.

THE MODE TELLS US WHAT IS MOST TYPICAL.

Examples:
1 Most families in Europe have four members.
2 Graduates’ favourite colour in 2021 is maroon.
Important for Marketing!
CENTRAL TENDENCY - MODE
NOTE
The Mode is usually arrived at by inspection, not
computation.

Example of grades in a class:

B, A, B, B, C, A, B, B, C, B, A, B, B

VALUE FREQUENCY
A 3
B 8
C 2
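
Although the Mode is usually found by inspection, the frequency table above can also be reproduced with a short script. This is a minimal Python sketch using the grades listed in the example; the variable names are illustrative only. It returns a list of modes so that a data set with several equally frequent values is handled as well.

```python
from collections import Counter

# Grades from the class example above.
grades = ["B", "A", "B", "B", "C", "A", "B", "B", "C", "B", "A", "B", "B"]

# Frequency of each value, as in the VALUE / FREQUENCY table.
freq = Counter(grades)

# Every value that reaches the highest frequency is a mode.
top = max(freq.values())
modes = sorted(g for g, count in freq.items() if count == top)

print(dict(freq), modes)
```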
Quiz 1:
Can a data set have more than one Mode?
Quiz 2:
Can a data set have no Mode?

Every value has the same frequency!
Quiz 3:
Can the Mode be an extreme value?

Data Set
A+ A+ A+ A B+ B+ B C+ C+ C D

The concept of ‘Central Tendency’?

CENTRAL TENDENCY - MEAN
The (Arithmetic) Mean is the sum of scores divided by
the number of scores.

It usually represents the “average”.

Caution:
Check if the mean is to be computed without or with
weights (simple versus weighted mean).
If weights are assigned, we would be computing the
‘weighted mean’ or ‘weighted average’.
Note – Recent values are usually given more
importance than older ones.
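
To illustrate the difference between the simple and the weighted mean, here is a minimal Python sketch. The sales figures and the weights 1, 2, 3 are hypothetical values chosen only for illustration, with the most recent observation given the largest weight.

```python
def simple_mean(values):
    """Sum of scores divided by the number of scores."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Sum of (value x weight) divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical monthly sales, oldest first (not taken from the workbook).
sales = [100, 120, 140]
# Hypothetical weights: the most recent month counts three times as much as the oldest.
weights = [1, 2, 3]

print(simple_mean(sales), weighted_mean(sales, weights))
```

Because the larger weights sit on the more recent (and here larger) values, the weighted mean comes out above the simple mean.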
CENTRAL TENDENCY - MEAN
NOTE 1
A numerical data set has only one Mean.

NOTE 2
The Mean cannot be applied to qualitative data.
CENTRAL TENDENCY - MEDIAN
The Median is the single point in a distribution
that divides the data set into two groups of equal
frequency.

It represents the “middlemost” point of the distribution.

Caution:
The median can often be observed; sometimes it
has to be computed.
Median value is not affected by extreme values [outliers].
CENTRAL TENDENCY - MEDIAN
NOTE
In an ordered data set,

If n is odd, the median is the ((n + 1)/2)th score from either end of the line.

If n is even, the median is the midway point (mean) between the (n/2)th score and the ((n/2) + 1)th score from either end of the line.
CENTRAL TENDENCY - MEDIAN
Example: n = 7
Values – 8, 13, 15, 19, 26, 33, 37
Median is the ((n + 1)/2)th value: 4th value = 19

Example: n = 8
Values – 8, 13, 15, 19, 26, 33, 37, 43
Median is the mean of the middle two values
= Mean of 4th and 5th values = 22.5
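
The odd and even cases can be checked with a few lines of code. This is a minimal Python sketch of the rule stated in the note, applied to the two example data sets; the function name is illustrative.

```python
def median(values):
    """Median by position: middle value if n is odd, mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # The ((n + 1)/2)th value sits at 0-based index n // 2.
        return ordered[mid]
    # Mean of the (n/2)th and ((n/2) + 1)th values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([8, 13, 15, 19, 26, 33, 37]))       # n = 7
print(median([8, 13, 15, 19, 26, 33, 37, 43]))   # n = 8
```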
COMPARING MEAN & MEDIAN

Outliers, if present.

Mean divides the data set into two equal parts by Value.
Mean is affected by the presence of extreme values [outliers].

Median divides the data set into two equal parts by Frequency
[number of data points].
Median is unaffected by the presence of extreme values [outliers].
LOCATING CENTRAL TENDENCIES

A data distribution may have multiple Modes, but can have only one Mean and one Median.
MEAN, MEDIAN & MODE
MODE # Applies to both quantitative & qualitative data.
# A data set may contain no modes.
# A data set may contain multiple modes.
# The mode may be an extreme data point.
Use to determine the most ‘typical’ occurrence.
MEAN # Applies to quantitative data only.
# A data set can contain only one mean.
# Dependent on the value of each data point.
# The mean is affected by extreme values.
Divides the data set into two equal parts by Value.
MEDIAN # Applies to quantitative data only.
# A data set can contain only one median.
# Not dependent on the values of data points.
# The median is not affected by extreme values.
Divides the data set into two equal parts by Number
of Values.
LOCATING CENTRAL TENDENCIES

For Symmetrical Distributions,

Mean = Median = Mode (unimodal cases)
LOCATING CENTRAL TENDENCIES

Significant tail on left Equal tails Significant tail on right

LOCATING CENTRAL TENDENCIES

Negatively or Left Skewed Distribution:

Mean is ‘pulled’ to the left by data points in the significant tail.
Median and Mode are unaffected by the values of the data points.
LOCATING CENTRAL TENDENCIES

Positively or Right Skewed Distribution:

Mean is ‘pulled’ to the right by data points in the significant tail.
Median and Mode are unaffected by the values of the data points.
LOCATING CENTRAL TENDENCIES

Mean & Median

What’s that?
PRACTICAL WORKING RULE
When dealing with a skewed quantitative
distribution, consider using Median instead
of Mean for the average.
WHAT WOULD THE DATA DISTRIBUTION OF USAIN BOLT’S
100 m TIMINGS LOOK LIKE (LAST 50 RACES)?
THE 1% SUPER-RICH
In this case, are the values being considered or the number of values?
SAMPLE EXERCISE
30 applicants for a driving license scored the following marks in
the written test (in ascending order).

C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
R1 4 6 6 9 11 11 13 16 17 19
R2 23 27 28 31 32 35 37 37 42 43
R3 46 47 51 53 54 58 59 64 72 78

The driving school issues grades from A (highest) to F (lowest), where A and F grades are awarded to outliers on a relative scale.

Required of you:
1) Compute the Mode, Mean and Median of the data set.
2) Identify the outliers in the data set.
TERMS WE NEED TO KNOW…
Dispersion/Variation/Variability of a Data Distribution

[Figure: two histograms of student heights (the variable of interest). “Degrees of variation” shows heights spread from 5’ 0” to 6’ 6”; “Zero variation” shows a single bar at 5’ 9” with frequency = 16.]
TERMS WE NEED TO KNOW…
Dispersion/Variation/Variability of a Data Distribution

RANGE
Difference between the largest and smallest values in the data set.
Note: a) Considers only extreme values; ignores all others.
b) Is never negative; it is zero only when all values are equal.
c) Simple to compute.

VARIANCE
Is the average (mean) of the squared deviations of observations from the
mean in the data set.
Note: a) The higher the variance, the more widely the data are dispersed about the mean.
b) If all observations in the data set have the same value, the
variance is zero.
c) Its unit of measure is the square of the data’s unit, which is hard to interpret directly.

STANDARD DEVIATION
Is the square root of variance.
Note: a) Its unit of measure is the same as the average and deviation.
b) Considers all the data points in its computation.
NOTATIONS
For Mean and Standard Deviation

                      POPULATION [Greek]    SAMPLE [English]
Mean                  $\mu$                 $\bar{x}$
Standard Deviation    $\sigma$              $s$
Variance              $\sigma^2$            $s^2$
FORMULAE FOR VARIANCE

For data relating to a population,

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

For data relating to a sample,

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

$x_i$ is the value of each data point in the data set.

$N$ is the population size.
$n$ is the sample size.
$\mu$ is the mean of the population (parameter).
$\bar{x}$ is the mean of the sample (statistic).
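
The two variance formulae differ only in the divisor. The following minimal Python sketch applies both to the seven values used earlier in the median example (treating them first as a population, then as a sample, purely for illustration) and cross-checks the results against the standard library's `statistics.pvariance` and `statistics.variance`.

```python
import statistics


def population_variance(xs):
    """Mean of squared deviations from the mean, dividing by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    """Sum of squared deviations from the sample mean, dividing by (n - 1)."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


data = [8, 13, 15, 19, 26, 33, 37]

print(population_variance(data), statistics.pvariance(data))
print(sample_variance(data), statistics.variance(data))
```

For the same data, the sample formula always gives the larger figure, because the same sum of squared deviations is divided by a smaller number.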
WHY USE (n – 1) FOR SAMPLES?

Bessel’s Correction
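To compensate for the tendency of a sample to understate the population’s
variability: deviations are measured from the sample mean, which lies closer
to the sample’s own data points than the population mean does.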
DEGREES OF FREEDOM
The number of variables in a system that are free to
vary without violating any constraint.

In other words, how many variables does one need to specify in order to define the system?

To locate a point in space, df = 3

FORMULAE FOR STANDARD DEVIATION

To compute the standard deviation of a data set, first compute the variance.
Then compute the square root of variance.

For data relating to a population,

Standard deviation σ = √σ2

For data relating to a sample,

Standard deviation s = √s2
1 To compute Standard Deviation, we need to first
compute the Variance.

2 Two or more Standard Deviations cannot be added to compute the combined value.
COEFFICIENT OF VARIATION
COV = SD/MEAN

Variance and Standard Deviation : Absolute measures of data dispersion

Example: Standard deviation of scores in two exams

Statistics: 16% Research Methods: 13%

Mean of scores in the two exams

Statistics: 76% Research Methods: 58%

Coefficient of Variation of scores in the two exams

Statistics: 16%/76% Research Methods: 13%/58%
= 0.211 = 0.224

RELATIVE MEASURES ARE USUALLY MORE MEANINGFUL THAN ABSOLUTE MEASURES.
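
The comparison of the two exams can be reproduced as follows. This is a minimal Python sketch using the figures from the example (standard deviations of 16% and 13%, means of 76% and 58%); the function name is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Relative dispersion: standard deviation divided by the mean."""
    return sd / mean


stats_cov = coefficient_of_variation(16, 76)      # Statistics exam
research_cov = coefficient_of_variation(13, 58)   # Research Methods exam

print(round(stats_cov, 3), round(research_cov, 3))
```

Although Statistics has the larger standard deviation in absolute terms, Research Methods shows slightly more variation relative to its mean.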
COEFFICIENT OF VARIATION

EXAMPLE:
Estimated profit in a project [3 years] = $10,000,000
Standard Deviation of expected profit = $16,000

Coefficient of Variation = 0.16%

Significant?
COMBINING TWO DATA SETS

MEANS AND STANDARD DEVIATIONS ARE NOT ADDITIVE!
FOR POPULATIONS

Population 1
Number of data points = $N_1$
Mean = $\mu_1$
Standard Deviation = $\sigma_1$

Population 2
Number of data points = $N_2$
Mean = $\mu_2$
Standard Deviation = $\sigma_2$

Combined Mean (weighted average)

$$\mu = \frac{N_1 \mu_1 + N_2 \mu_2}{N_1 + N_2}$$
FOR POPULATIONS
Combined Variance (weighted average)

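Combined Variance =

$$\sigma^2 = \frac{N_1 \sigma_1^2 + N_2 \sigma_2^2 + N_1 (\mu_1 - \mu)^2 + N_2 (\mu_2 - \mu)^2}{N_1 + N_2}$$

Re-written as:

$$\sigma^2 = \frac{N_1 [\sigma_1^2 + (\mu_1 - \mu)^2] + N_2 [\sigma_2^2 + (\mu_2 - \mu)^2]}{N_1 + N_2}$$

Here $\mu$ is the combined mean of the two populations.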
FOR SAMPLES
Sample 1
Number of data points = $n_1$
Mean = $\bar{x}_1$
Standard Deviation = $s_1$

Sample 2
Number of data points = $n_2$
Mean = $\bar{x}_2$
Standard Deviation = $s_2$

Combined Mean (weighted average)

$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$
FOR SAMPLES
Combined Variance (weighted average)

Combined Variance =

$$s^2 = \frac{n_1 s_1^2 + n_2 s_2^2 + n_1 (\bar{x}_1 - \bar{x})^2 + n_2 (\bar{x}_2 - \bar{x})^2}{n_1 + n_2}$$

Re-written as:

$$s^2 = \frac{n_1 [s_1^2 + (\bar{x}_1 - \bar{x})^2] + n_2 [s_2^2 + (\bar{x}_2 - \bar{x})^2]}{n_1 + n_2}$$

Here $\bar{x}$ is the combined mean of the two samples.
EXERCISE: COMBINING TWO DATA SETS
Students’ marks for the final exam of Marketing Management for the first
two batches of EMBA are shown below.

Required of you:
1 Compute the mean and standard deviation of the marks for the two
batches separately.
2 Compute the mean and standard deviation of the two batches combined.

BATCH 1 [N = 14]
73 67 68 74 70 72 73 72
75 73 75 69 69 56

BATCH 2 [N = 12]
74 50 72 69 71 72 72 68
72 70 70 68
EXERCISE: COMBINING TWO DATA SETS

BATCH 1 [N = 14]:
Total = 986 Mean = 70.43 Std. Dev. = 4.70

BATCH 2 [N = 12]:
Total = 828 Mean = 69.00 Std. Dev. = 5.99

COMBINED MEAN:
[986 + 828]/26 = 69.77

COMBINED VARIANCE:
$$\frac{14 \times [4.70^2 + (70.43 - 69.77)^2] + 12 \times [5.99^2 + (69.00 - 69.77)^2]}{14 + 12} = 28.96$$

COMBINED STANDARD DEVIATION:

$$\sqrt{28.96} = 5.38$$
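
The worked solution can be verified with a short script. This minimal Python sketch treats each batch as a population (dividing by N, which is consistent with the standard deviations of 4.70 and 5.99 quoted in the solution), applies the combined-variance formula, and compares the result with the variance computed directly from all 26 marks. The function name `combine` is illustrative.

```python
import math
import statistics

batch1 = [73, 67, 68, 74, 70, 72, 73, 72, 75, 73, 75, 69, 69, 56]
batch2 = [74, 50, 72, 69, 71, 72, 72, 68, 72, 70, 70, 68]


def combine(n1, mean1, var1, n2, mean2, var2):
    """Combined mean and variance of two data sets from their summary measures."""
    mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    var = (n1 * (var1 + (mean1 - mean) ** 2)
           + n2 * (var2 + (mean2 - mean) ** 2)) / (n1 + n2)
    return mean, var


mean, var = combine(
    len(batch1), statistics.mean(batch1), statistics.pvariance(batch1),
    len(batch2), statistics.mean(batch2), statistics.pvariance(batch2),
)
sd = math.sqrt(var)

# Direct computation on the pooled marks, for comparison.
direct_var = statistics.pvariance(batch1 + batch2)

print(round(mean, 2), round(var, 2), round(sd, 2), round(direct_var, 2))
```

With unrounded inputs the combined variance comes out marginally below the 28.96 shown in the solution, which uses the rounded means and standard deviations; the combined standard deviation still rounds to 5.38.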
CLASSIFICATION OF DATA

DATA

QUALITATIVE DATA QUANTITATIVE DATA

CONTINUOUS DISCRETE
DATA DATA
CLASSIFICATION OF DATA
CONTINUOUS DATA
As the name suggests, with continuous data, the variable can take any
value in a defined range without any breaks. In other words, there are
an ‘infinite’ number of possible values the variable can take.

Continuous data should be measured (not counted). To specify the value of a variable using continuous data, we should ask the question, “How much?”

Examples: Measurement of weight, height, temperature, purity

DISCRETE DATA
Discrete classes of data involve breaks between the classes. Usually, the
number of possible values the variable can take is limited and known.

Discrete data should be counted (not measured). To specify the value of a variable using discrete data, we should ask the question, “How many?”

Examples: Number of students present in class, number of textbooks prescribed for a course, seats on a plane
EXERCISES ON CENTRAL TENDENCY AND DISPERSION OF DATA

Exercise 1
These data are a sample of the daily production rate of fiberglass boats from Hydrosport
Limited.

17 21 18 27 17 21 20 22 18 23

The company’s production manager feels that a standard deviation of more than three
boats per day indicates unacceptable production-rate variations. Should she be
concerned based on the above data?

Exercise 2
The reading readiness of pre-school children in two neighbourhoods was determined
through sampling and the data sets are displayed below.

a) Compute the mode, mean and median for each neighbourhood separately.
b) Compute the range and standard deviation for each neighbourhood separately.
c) Are the two distributions symmetrical or skewed?

Neighbourhood A
30 33 32 31 35 33
32 29 33 30 32 28
31 31 29 31 26 30
32 30 33 32 27 32

Neighbourhood B
29 32 28 29 29
30 31 26 30 28
28 29 29 34 30
29 27 30 31 35

Exercise 3
The head chef of The Flying Taco has just received two dozen tomatoes from her
supplier, but she isn’t ready to accept them. She knows from the invoice that the
average weight of a tomato is 7.5 ounces, but she insists that all be of uniform weight.
She will accept them only if the average weight is 7.5 ounces and the standard
deviation is less than 0.5 ounce.

Based on the weights of the tomatoes, determine what the head chef’s decision is.

6.3 7.2 7.3 8.1 7.8 6.8 7.5 7.8 7.2 7.5 8.1 8.2
8.0 7.4 7.6 7.7 7.6 7.4 7.5 8.4 7.4 7.6 6.2 7.4

RANGE AND STANDARD DEVIATION
When should one use each of these measures?

Exercise 4
A population with 20 numbers drawn at random from 0 to 99 has been tabulated below.

17 63 23 84 6 47 38 29 73 19
61 84 92 81 43 4 13 28 89 38

Part A
Compute the range and standard deviation of the population.

Part B
From the population, a random sample of five numbers is drawn, as tabulated below.

92 81 84 6 4

Compute the range and standard deviation of the sample.

Do you find the computed values strange?

CASE ANALYSIS
Life Expectancy in Top 10 Countries

RANK  COUNTRY      HEALTHY YEARS  YEARS IN ILL-HEALTH  LIFE EXPECTANCY
1     Singapore    73.62          10.11                83.73
2     Japan        73.16          10.78                83.94
3     Spain        72.62          10.35                82.97
4     Switzerland  71.93          11.25                83.18
5     Italy        71.75          10.59                82.34
6     France       71.71          10.63                82.34
7     Australia    71.53          10.99                82.52
8     Norway       71.49          10.61                82.10
9     Iceland      71.48          10.79                82.27
10    Israel       71.44          10.70                82.14

Source: Global Burden of Disease Study, Institute for Health Metrics and Evaluation,
University of Washington

Comment on the data presented above.

UNDERSTANDING THE WORLD AS FOUR LEVELS
Factfulness by Hans Rosling

I am often quite rude during my presentations when people from the “developed world” use the
term “developing world”.

Afterward, people ask me, “So, what should we call them instead?”

But, listen carefully. It’s the same misconception: we and them. What should “we” call “them”
instead?

What we should do is stop dividing the countries of the world into two groups. It doesn’t make
sense anymore. It doesn’t help us to understand the world in a practical way. It doesn’t help
businesses find opportunities, and it doesn’t help aid money to find the poorest people.

But we need to do some kind of sorting to make sense of the world. We can’t give up our old
labels and replace them with… nothing. What should we do?

One reason the old labels are so popular is that they are so simple. But they are wrong! So, to
replace them, I will now suggest an equally simple but more relevant and useful way of dividing
up the world. Instead of dividing the world into two groups, I will divide it into four income levels,
as described below.

Each figure in the chart represents 1 billion people, and the seven figures show how the current
world population is spread out across four income levels, expressed in terms of dollar income
per day. You can see that most people are living in the two middle levels, where people have
most of their basic human needs met.

Are you excited? You should be. Because the four income levels are the first, most important
part of your new fact-based framework. They are one of the simple thinking tools I promised
would help you to guess better about the world. So, I want to try to explain what life is like on
each of these four levels.

Think of the four income levels as the levels of a computer game. Everyone wants to move from
Level 1 to Level 2 and upward through the levels from there. Only, it’s a very strange computer
game, because Level 1 is the hardest. Let’s play.
LEVEL 1
You start on Level 1 with $1 per day. Your five children have to spend hours walking barefoot
with your single plastic bucket, back and forth, to fetch water from a dirty mud hole an hour’s
walk away. On their way home, they gather firewood, and you prepare the same grey porridge
that you have been eating at every meal, every day, for your whole life – except during the
months when the meagre soil yielded no crops and you went to bed hungry. One day, your
youngest daughter develops a nasty cough. Smoke from the indoor fire is weakening her lungs.
You cannot afford antibiotics, and one month later, she is dead. This is extreme poverty. Yet,
you keep struggling on. If you are lucky, and the yields are good, you can maybe sell some
surplus crops and manage to earn more than $2 a day, which would move you to the next level.
Good luck!
[Roughly 1 billion people live like this today.]

LEVEL 2
You’ve made it. In fact, you’ve quadrupled your income and now you earn $4 a day. Three extra
dollars every day. What are you going to do with all this money? Now you can buy food that you
didn’t grow yourself, and you can afford chickens, which means eggs. You save some money
and buy sandals for your children, and a bike, and more plastic buckets. Now it takes you only
half an hour to fetch water for the day. You buy a gas stove so your children can attend school
instead of gathering wood. When there’s power, they can do their homework under a bulb. But
the electricity is too unstable for a freezer. You save up for mattresses so you don’t have to
sleep on the mud floor. Life is much better now, but still very uncertain. A single illness and you
would have to sell most of your possessions to buy medicine. That would throw you back to
Level 1 again. Another three dollars a day would be good, but to experience really drastic
improvement, you need to quadruple again. If you can land a job in the local garment industry,
you will be the first member of your family to bring home a salary.
[Roughly 3 billion people live like this today.]

LEVEL 3
Wow! You did it! You work multiple jobs, 16 hours a day, seven days a week, and manage to
quadruple your income again, to $16 a day. Your savings are impressive and you install a
cold-water tap. No more fetching water. With a stable electric line, the kids’ homework improves and
you can buy a fridge that lets you store food and serve different dishes each day. You save to
buy a motorcycle, which means you can travel to a better-paying job at a factory in town.

Unfortunately, you crash on your way there one day, and you have to use money you had saved
for your children’s education to pay the medical bills. You recover, and thanks to your savings,
you are not thrown back a level. Two of your children start high school. If they manage to finish,
they will be able to get better-paying jobs than you have ever had. To celebrate, you take the
whole family on its first-ever vacation, one afternoon to the beach, just for fun.
[Roughly, 2 billion people live like this today.]

LEVEL 4
You have more than $32 a day. You are a rich consumer and three more dollars a day makes
very little difference to your everyday life. That’s why you think three dollars, which can change
the life of someone living in extreme poverty, is not a lot of money. You have more than 12
years of education and you have been on an airplane on vacation. You can eat out once a
month and you can buy a car. Of course, you have hot and cold water indoors.

But you know about this level already. Since you are reading this passage, I’m pretty sure you
live in Level 4. I don’t have to describe it for you to understand. The difficulty, when you have
always known this high level of income, is to understand the huge differences between the other
three levels. People on Level 4 must struggle hard not to misunderstand the reality of the other
6 billion people in the world.
[Roughly, 1 billion people live like this today.]

I’ve described the progress up the levels as if one person managed to move through several
levels. That is very unusual. Often, it takes several generations for a family to move from Level
1 to Level 4. I hope though that you now have a clear picture of the kinds of lives people live on
different levels; a sense that it is possible to move through the levels, both for individuals and for
countries; and above all the understanding, that there are not just two kinds of lives.

Human history started with everyone on Level 1. For more than 100,000 years, nobody made it
up the levels and most children didn’t survive to become parents. Just 200 years ago, 85% of
the world population was still on Level 1, in extreme poverty.

Today, the vast majority of people are spread out in the middle, across Levels 2 and 3, with the
same range of standards of living as people in Western Europe and North America in the 1950s.
And this has been the case for many years.

The gap instinct of classifying all data into two categories is very strong. The first time I lectured
to the staff of the World Bank was in 1999. I told them the labels “developing” and “developed”
were no longer valid. It took the World Bank 17 years and 14 more of my lectures before it
finally announced publicly that it was dropping these terms and would from now on divide the
world into four income groups. The UN and most other global organisations have still not made
this change.

QUARTILES, DATA SET SUMMARIES AND OUTLIERS

QUARTILES
Often, to understand and analyse a set of data, it is useful to divide the set into four equal parts so
that each part contains about 25% of the values. Each of the three dividing points (three dividers are
required for four parts) is called a ‘quartile’ and is defined as follows:

First Quartile, Q1 (25th percentile)

Divides the smallest 25% of the values from the other 75% that are larger.
Q1 = (n + 1)/4 ranked value

Second Quartile, Q2 (50th percentile) – also the median

Divides the data set so that 50% of the values are smaller than or equal to the median and 50% are
larger than or equal to the median.

Third Quartile, Q3 (75th percentile)

Divides the smallest 75% of the values from the largest 25%.
Q3 = 3 x (n + 1)/4 ranked value

INTERQUARTILE RANGE (MIDSPREAD)

The interquartile range or midspread is the difference between the third and first quartiles in a data
set. It measures the spread of the middle 50% of the values in the data set. Therefore, it is not
influenced by extreme values (which may be ‘outliers’).
Interquartile range [IQR] = Q3 – Q1

EXAMPLE
A data set consists of 21 recordings, which are shown in the table below. Compute the following:
1 The first, second and third quartiles
2 The interquartile range

8408 1374 1872 8879 2459 11413 608

14138 6452 1850 2818 1356 10498 7478
4019 4341 739 2127 3653 5794 8305

Solution
As the first step, we arrange the data set in ascending order as shown below.
608 739 1356 1374 1850 1872 2127
2459 2818 3653 4019 4341 5794 6452
7478 8305 8408 8879 10498 11413 14138

Median or Q2 is the 11th value: 4019

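Q1 is the value ranked (n + 1)/4 = 5.5, i.e. midway between the 5th and 6th values (1850 and 1872): 1861
Q3 is the value ranked 3 x (n + 1)/4 = 16.5, i.e. midway between the 16th and 17th values (8305 and 8408): 8356.50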
Interquartile range [IQR]: [8356.50 - 1861] = 6495.50
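
The ranked-value rule can be sketched in code as follows. This is a minimal Python illustration; it assumes that a fractional rank is resolved by linear interpolation between the two neighbouring values (which gives the midpoint for a rank ending in .5, as in this example), and the helper name `ranked_value` is illustrative. Ranks below 1 or above n are not handled.

```python
def ranked_value(ordered, rank):
    """Value at a 1-based rank in sorted data; a fractional rank is interpolated linearly (assumption)."""
    whole = int(rank)              # whole part of the rank
    frac = rank - whole            # fractional part of the rank
    value = ordered[whole - 1]
    if frac > 0:
        value += frac * (ordered[whole] - ordered[whole - 1])
    return value


data = [8408, 1374, 1872, 8879, 2459, 11413, 608,
        14138, 6452, 1850, 2818, 1356, 10498, 7478,
        4019, 4341, 739, 2127, 3653, 5794, 8305]

ordered = sorted(data)
n = len(ordered)

q1 = ranked_value(ordered, (n + 1) / 4)        # rank 5.5
q2 = ranked_value(ordered, (n + 1) / 2)        # rank 11
q3 = ranked_value(ordered, 3 * (n + 1) / 4)    # rank 16.5
iqr = q3 - q1

print(q1, q2, q3, iqr)
```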

Page | 1
FIVE-NUMBER SUMMARY
It is often convenient to summarise a data set by specifying five numbers:
1 Smallest value
2 First quartile, Q1
3 Second quartile (median), Q2
4 Third quartile, Q3
5 Largest value

In the previous example, the five numbers (in sequence) are:

608, 1861, 4019, 8356.50, 14138

The five-number summary is useful in judging the shape of the data distribution, as described
below.

COMPARISON 1: Distance from Xsmallest to median versus distance from median to Xlargest
Left-skewed distribution: Distance from Xsmallest to median is greater than distance from median to Xlargest.
Symmetric distribution: The two distances are the same.
Right-skewed distribution: Distance from Xsmallest to median is less than distance from median to Xlargest.

COMPARISON 2: Distance from Xsmallest to Q1 versus distance from Q3 to Xlargest
Left-skewed distribution: Distance from Xsmallest to Q1 is greater than distance from Q3 to Xlargest.
Symmetric distribution: The two distances are the same.
Right-skewed distribution: Distance from Xsmallest to Q1 is less than distance from Q3 to Xlargest.

COMPARISON 3: Distance from Q1 to median versus distance from median to Q3
Left-skewed distribution: Distance from Q1 to median is greater than distance from median to Q3.
Symmetric distribution: The two distances are the same.
Right-skewed distribution: Distance from Q1 to median is less than distance from median to Q3.

In the previous example,

Comparison 1 [Entire range of data set]

[Median - Xsmallest] = [4019 – 608] = 3411
[Xlargest – Median] = [14138 – 4019] = 10119
This is right-skewed.

Comparison 2 [Tails]
[Q1 - Xsmallest] = [1861 – 608] = 1253
[Xlargest – Q3] = [14138 – 8356.50] = 5781.50
This is right-skewed.

Comparison 3 [Body]
[Median – Q1] = [4019 – 1861] = 2158
[Q3 – Median] = [8356.50 – 4019] = 4337.50
This is right-skewed.

Thus, from the five-number summary, we can judge the shape of the data set: is it symmetric or
skewed to the left or right.
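
The three comparisons can be expressed as a small routine. This is a minimal Python sketch applied to the five-number summary of the example (608, 1861, 4019, 8356.50, 14138); the function name and the labels "range", "tails" and "body" are illustrative. Real data rarely give exactly equal distances, so in practice a tolerance might be preferred to the strict equality used here.

```python
def judge_shape(smallest, q1, median, q3, largest):
    """Apply the three left-versus-right distance comparisons to a five-number summary."""
    comparisons = {
        "range": (median - smallest, largest - median),   # Comparison 1
        "tails": (q1 - smallest, largest - q3),           # Comparison 2
        "body": (median - q1, q3 - median),               # Comparison 3
    }
    verdicts = {}
    for name, (left, right) in comparisons.items():
        if left > right:
            verdicts[name] = "left-skewed"
        elif left < right:
            verdicts[name] = "right-skewed"
        else:
            verdicts[name] = "symmetric"
    return verdicts


verdicts = judge_shape(608, 1861, 4019, 8356.5, 14138)
print(verdicts)
```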

Page | 2
IDENTIFYING OUTLIERS
An objective method of identifying outliers in a data set is to compute threshold values on either side
of the median that are called ‘limits’. Data values that lie outside these two limits (Lower Limit and
Upper Limit) are identified as outliers.

We use the following formulae to compute the limits:

Lower Limit = Q1 – 1.5 x [Interquartile Range] = Q1 – 1.5 x [Q3 – Q1]
Upper Limit = Q3 + 1.5 x [Interquartile Range] = Q3 + 1.5 x [Q3 – Q1]

In our example,
Lower Limit = 1861 – 1.5 x [8356.50 – 1861] = - 7882.25
Upper Limit = 8356.50 + 1.5 x [8356.50 – 1861] = + 18099.75

Since none of the data points lies outside this range, we conclude that the data set does not contain
outliers on either side of the median.
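
The limits and the outlier check can be reproduced as follows. This is a minimal Python sketch using the quartiles already computed in the example (Q1 = 1861, Q3 = 8356.50); the function name and the parameter `k` for the 1.5 multiplier are illustrative.

```python
def outlier_limits(q1, q3, k=1.5):
    """Lower and upper limits: k interquartile ranges beyond Q1 and Q3."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


data = [608, 739, 1356, 1374, 1850, 1872, 2127,
        2459, 2818, 3653, 4019, 4341, 5794, 6452,
        7478, 8305, 8408, 8879, 10498, 11413, 14138]

lower, upper = outlier_limits(1861, 8356.5)

# Any value outside the two limits is flagged as an outlier.
outliers = [x for x in data if x < lower or x > upper]

print(lower, upper, outliers)
```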

THE BOXPLOT
A boxplot provides a graphical representation of the data based on the five-number summary. It
looks like this.

The ‘box’ refers to the interquartile range (Q3 – Q1), which also contains the median Q2. The lines
on either side of the box are referred to as ‘whiskers’. The ‘box and whiskers plot’ may be drawn in
three different ways:

1 Until the extreme values (minimum and maximum) in the data set
2 Until the lower and upper limits as computed in the previous section (as shown above)
3 Until the extreme values (minimum and maximum) in the data set that lie within the lower and
upper limits computed in the previous section (as shown in the next page).

Outliers are simply data points that lie outside the lower and upper limits in the plot.

Page | 3
The shape of the data set can be determined from the box and whiskers plot, as discussed on page
2. Applying Comparison 2 from the table on page 2 to the plot below, we can infer that the data set
is skewed to the right.

Page | 4
EXERCISE 1 ON QUARTILES, OUTLIERS & BOX-AND-WHISKERS PLOT

Naples, Florida, hosts a marathon in January each year. The event attracts top runners
from across the United States. In January last year, 22 men and 31 women entered the
19-24 age class.
Finish times in minutes were recorded as shown in the table below (in order of finish).

FINISH      MEN    WOMEN

     1    65.30   109.03
     2    66.27   111.22
     3    66.52   111.65
     4    66.85   111.93
     5    70.87   114.38
     6    87.18   118.33
     7    96.45   121.25
     8    98.52   122.08
     9   100.52   122.48
    10   108.18   122.62
    11   109.05   123.88
    12   110.23   125.78
    13   112.90   129.52
    14   113.52   129.87
    15   120.95   130.72
    16   127.98   131.67
    17   128.40   132.03
    18   130.90   133.20
    19   131.80   133.50
    20   138.63   136.57
    21   143.83   136.75
    22   148.70   138.20
    23            139.00
    24            147.18
    25            147.35
    26            147.50
    27            147.75
    28            153.88
    29            154.83
    30            189.27
    31            189.28

Required of you:
1 Prepare the five-number summary for men and women separately.
2 Draw the box-and-whiskers plots for men and women separately on a single graph
sheet. The end of the whiskers can be taken as the lower and upper limits for
identifying outliers.
3 Determine whether the two data sets (men and women) are symmetric or skewed.
4 Identify outliers, if any, in the two data sets.
Page | 5
EXERCISE 2 ON QUARTILES, OUTLIERS & BOX-AND-WHISKERS PLOT

ASSETS UNDER MANAGEMENT OF TOP 25 ASSET MANAGERS GLOBALLY

The table below ranks the world’s largest money managers by total assets under management [AUM],
in billions of US$, as of 31st December 2020.

RANK COMPANY COUNTRY TOTAL AUM [$ in bn]

1 Blackrock US 7,318
2 Vanguard Group US 6,100
3 UBS Group Switzerland 3,518
4 Fidelity Investments US 3,319
5 State Street Global Advisors US 3,054
6 Allianz Group Germany 2,530
7 JP Morgan US 2,511
8 Goldman Sachs US 2,057
9 Bank of New York Mellon US 1,961
10 PIMCO US 1,920
11 Morgan Stanley US 1,901
12 Amundi France 1,791
13 Capital Group US 1,700
14 Prudential Financial US 1,605
15 Credit Suisse Switzerland 1,521
16 Franklin Resources US 1,428
17 Deutsche Bank Germany 1,368
18 Northern Trust US 1,258
19 Legal & General Trust UK 1,232
20 BNP Paribas France 1,221
21 Bank of America US 1,220
22 T. Rowe Price US 1,218
23 Invesco Limited US 1,145
24 TIAA US 1,143
25 Wellington Management Co. US 1,100

Required of you:
1 Prepare the five-number summary for this data set.
2 Draw the box-and-whiskers plot for the data set.
3 Determine whether the data set is symmetric or skewed.
4 Identify outliers, if any, in the data set.

Page | 6

