Lecture 1 (10-9-22)

Management and presentation of data and Measures of
Central Tendency & Dispersion
Course: Quantitative Analysis for Decision Making

STA 411 WE1- 15527
Dr. Abdur Rasheed

Assistant Professor
School of Public Health
Dow University of Health Sciences
What is Statistics
Statistics is the science of collecting, monitoring, analyzing,
summarizing and interpreting data.
Business Statistics
Business statistics can be described as the collection,
summarization, analysis, and reporting of numerical
findings relevant to a business decision or situation.
Population: An entire collection of people, animals,
things, objects or measurements for the required study
is called a population.
• It is the entire group we are interested in, which we wish to
describe or draw conclusions about.
The population can be a group of business
transactions, companies, customers, or anything else we can
measure and want to know about.
Sample: A representative part selected from the
population is called a sample. By studying a sample
we hope to draw a valid conclusion about the larger
population.
• Population: All business transactions of a whole month.
• Sample: All transactions of a particular day might be a
sample.
Element: An individual unit of the population or
sample. For example
• A transaction
• A company
• A customer
Populations and Samples
• Studying whole populations is usually too expensive and
time-consuming, and thus impractical
• If a sample is representative of the population, then by
observing the sample we can learn something about the
population
– And thus by looking at the characteristics of the sample we may
learn something about the characteristics of the population
The “Universe” and the “Sample”
[Figure: the universe and the sample. The Universe: we can never
really understand what is going on here, it is just too big. The Sample:
a representative part of the universe, it is nice and small, and we can
understand this. Other labels in the figure: Participant Selection,
Statistics (the mathematical description of the sample), Analysis,
Inference.]
Types of data
Constants and Variables
Variable: A variable is a factor whose
values change from time to time, place to place
or individual to individual.
Variables are things that vary and change.
Variable
• Age of study participants
• Height of students
• No of Customers in a company
• No of daily transactions
• No of invoices
• Hair color
• Exam result
For example, suppose a company is being audited for invoice errors.
Instead of examining all 15,472 invoices produced by the company
during a given year, an auditor may select and examine just 100
invoices. If he is interested in the "invoice error status," he would
record (measure) the status (error or no error) of each sampled
invoice.
Population: All 15,472 invoices
Sample: Group of 100 invoices
Variable: Invoice error status
Element: Each invoice
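
The audit example can be sketched in Python. This is only an illustration under stated assumptions: the invoice IDs (1 to 15,472), the fixed random seed and the simulated 3% error rate are all hypothetical and do not come from the lecture. Only the population size (15,472) and the sample size (100) are taken from the example.

```python
import random

POPULATION_SIZE = 15_472   # all invoices produced in the year (from the example)
SAMPLE_SIZE = 100          # invoices the auditor examines (from the example)

# Hypothetical invoice IDs stand in for the elements of the population.
population = list(range(1, POPULATION_SIZE + 1))

rng = random.Random(42)    # fixed seed only so the sketch is reproducible
sample = rng.sample(population, SAMPLE_SIZE)   # simple random sample, no repeats

# The variable is the "invoice error status" of each sampled element.
# ASSUMPTION: statuses are simulated here with a made-up 3% error rate.
error_status = {invoice_id: rng.random() < 0.03 for invoice_id in sample}

errors_found = sum(error_status.values())
sample_error_rate = errors_found / SAMPLE_SIZE
print(f"{errors_found} of {SAMPLE_SIZE} sampled invoices had an error")
```

In a real audit the error status would be recorded by inspecting each sampled invoice rather than simulated.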

Generally there are 2 types of variables:
▪ An independent (or input) variable
▪ A dependent (or outcome) variable
Let’s look at each type…

Variables that affect sales include customer demographics, store location and
weather.
Customer demographics include age, occupation, family status, income level and
gender.
A store located in a densely populated metropolitan area may have higher sales
than a store in a sparsely populated rural area.
Similarly, customers may go shopping when the weather is pleasant, but few would
venture outside in stormy or snowy weather.
Some variables have a circular relationship with sales. For example, sales
depend on advertising, but the level of advertising expenses also depends
on sales.
Independent variables: Demographics, store location and weather.
Dependent variable: Sales
Types of Variables
Qualitative vs. Quantitative Variables
Quantitative: Values indicate a quantity or amount and
can be expressed numerically. Values can be arranged
according to magnitude
Examples:
• No of transactions
• No of employees in a company
• No of customers to a company
• No of years of education of the people
• No of Daily invoices
• Height of students
• Age of participants
Types of Variables
Qualitative vs. Quantitative Variables
Qualitative: Values are simply labels that distinguish one group
from another.
Examples:
• Sex
• Educational Level
• Occupation
• Religion
• Blood group
Note: Any numerical representation is only for coding / labeling
and not for comparison
Qualitative and Quantitative variables
Qualitative variables are sometimes called categorical variables.
If five-year-old participants were asked to name their
favorite color, then the variable would be qualitative. If
the time it took them to respond were measured, then
the variable would be quantitative.
Qualitative and quantitative variables may be
further subdivided:
Variable
• Qualitative: Nominal, Ordinal
• Quantitative: Discrete, Continuous
Quantitative: Discrete variables
➢ Assume only discrete values (integral values, i.e. whole
numbers).
➢ There is a definite gap between two values.
➢ Data that are described by a discrete variable are called
discrete data
Examples:
Number of children in a family,
Number of rooms in a school etc.
No of customers to a company
No of years of education of the people
No of Daily invoices
Quantitative: Continuous variables
➢ Can assume any value (whole or fractional)
➢ When we round the values of a continuous variable
to a certain decimal place, the variable becomes
discrete
Example
Height of a person up to a fraction of an inch,
weight up to grams etc.
Age up to months
Temperature
Qualitative: Nominal variables
As the name implies, a nominal variable consists of “naming”: it
classifies observations into various mutually exclusive categories.
Example
Eye color
Male-Female
Sick-well
Married-Single-Divorced
Blood group
Qualitative: Ordinal variables
A qualitative variable that incorporates an ordered
position, or ranking.
Example
Student grades (A+, A, B+, B, …)
Socioeconomic status (Low, Middle, High)
Education level (Low, High)
Olympic medals (Gold, Silver, Bronze)
Shoe quality (Ordinary, Good, Best)
Identify each variable as quantitative (discrete or continuous) or
qualitative (nominal or ordinal):
• Blood group
• Number of students
• Students grades
• Height of students
• No of flights
• Computerized National Identity Card No.
• Smoking status
• Gender
• Salary
• Race
• No of Transactions
Σ (Greek letter sigma) is shorthand for addition.
If $X_i$ is the $i$th observation,
then $X_1 + X_2 + X_3$
is the same as $\sum_{i=1}^{3} X_i$
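
As a small illustration of the summation notation, the sketch below adds three observations both term by term and with a loop over the index. The three values are hypothetical and chosen only for the example.

```python
# Three hypothetical observations X1, X2, X3.
X = [4, 7, 9]

# Written out term by term: X1 + X2 + X3
explicit_sum = X[0] + X[1] + X[2]

# Sigma notation: sum of Xi for i = 1..3 (Python indexes from 0, so i runs 0..2)
sigma_sum = sum(X[i] for i in range(3))

print(explicit_sum, sigma_sum)
```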
Measure        Sample   Population
Mean           X̄        μ
Stand. Dev.    S        σ
Variance       S²       σ²
Size           n        N
Numerical Data Properties
• Central Tendency: Mean, Median, Mode
• Dispersion: Range, Variance, Standard Deviation
In most cases the data have a tendency to gather
around a middle observed value.
In other words, some central value gives a special
characteristic of the data.
This phenomenon is referred to as central tendency.
The statistics we calculate for this purpose are termed
measures of central tendency, also known as measures of
location.
A measure of central tendency is a single value that
describes the way in which a group of data clusters around a
central value. In other words, it is a way to describe
the center of a data set.
Central tendency lets us know what is normal or
'average' for a set of data. It also condenses the data set
down to one representative value, which is useful when you
are working with large amounts of data.
Central tendency also allows you to compare one data set
to another. For example, let's say you have a sample of
girls and a sample of boys, and you are interested in
comparing their heights. By calculating the average height
for each sample, you could easily draw comparisons
between the girls and boys.
Central tendency is also useful when you want to compare one piece
of data to the entire data set. Let's say you received a 60% on your
last quiz, which is usually in the C range.
You go around and talk to your classmates and find out that the average
score on the quiz was 43%.
In this instance, your score was significantly higher than those of your
classmates. Since your teacher grades on a curve, your 60% becomes an
A.
• 1. Measure of Central Tendency
• 2. Most Common Measure
• 3. Acts as ‘Balance Point’
• 4. Affected by Extreme Values (‘Outliers’)
• 5. Formula (Sample Mean): $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
The following weights (in kg) are the data of 10 students from a
randomly selected class of IoBM
X = 45 48 52 54 51 59 60 58 57 48
n = number of observations
Here n = 10
Mean = 532 / 10 = 53.2
The mean weight of the 10 students is 53.2 kg

Now two of the values are replaced by extreme values (82 and 88):
X = 82 48 52 54 51 59 60 58 88 48
Mean = 600 / 10 = 60
The mean weight of the 10 students is 60 kg

Hence the mean is affected by
extreme values
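
The two mean calculations can be reproduced with a short Python sketch. The variable names are illustrative only, and the data are the two weight lists from the example.

```python
from statistics import mean

# Weights (kg) of the 10 students, and the same list with two extreme values.
weights = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]
weights_extreme = [82, 48, 52, 54, 51, 59, 60, 58, 88, 48]

mean_original = mean(weights)           # 532 / 10
mean_extreme = mean(weights_extreme)    # 600 / 10

# Changing only two observations moves the mean by several kilograms.
shift = mean_extreme - mean_original
print(mean_original, mean_extreme, shift)
```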
• 2. Middle Value In Ordered Sequence
– If Odd n, Middle Value of Sequence
Position of Median in Sequence
– If Even n, Average of 2 Middle Values

Position of Median in Sequence
• 4. Not Affected by Extreme Values

X = 45 48 52 54 51 59 60 58 57 48
Here n = 10
Median
Ordered values
X = 45 48 48 51 52 54 57 58 59 60
The number of observations (10) is even
(n/2)th term = 5th term = 52
and (n/2 + 1)th term = 6th term = 54
Median = (52 + 54) / 2 = 53 kg

X = 82 48 52 54 51 59 60 58 88 48
Here n = 10
Median
Ordered values
X = 48 48 51 52 54 58 59 60 82 88
The number of observations (10) is even
(n/2)th term = 5th term = 54
and (n/2 + 1)th term = 6th term = 58
Median = (54 + 58) / 2 = 56 kg

Hence the median is less
affected by extreme values
than the mean
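
The ordered-sequence rule for the median can be written out as a small function. This is a minimal sketch; the function name `median_of` is made up for illustration.

```python
def median_of(values):
    """Median via the ordered-sequence rule: the middle value if n is odd,
    the average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[n // 2]       # single middle value
    lower = ordered[n // 2 - 1]      # the (n/2)th term
    upper = ordered[n // 2]          # the (n/2 + 1)th term
    return (lower + upper) / 2

weights = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]
weights_extreme = [82, 48, 52, 54, 51, 59, 60, 58, 88, 48]

median_original = median_of(weights)
median_extreme = median_of(weights_extreme)
print(median_original, median_extreme)
```

The extreme values move the median by 3 kg, while the mean of the same data moves by 6.8 kg.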
• 2. Value That Occurs Most Often
• 3. Not Affected by Extreme Values
• 4. May Be No Mode or Several Modes
• 5. May Be Used for Numerical & Categorical
Data
X = 45 48 52 54 51 59 60 58 57 48
Mode
The most frequent value is 48
(48 occurs twice)
Hence the mode is 48 kg
X = 82 48 52 54 51 59 60 58 88 48
Mode
The most frequent value is 48
(48 occurs twice)
Hence the mode is 48 kg
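
A possible way to find the mode in Python is to count how often each value occurs. The helper `modes` below is hypothetical and returns every value with the highest count, so it also covers the cases of no mode and several modes. The blood-group list is made up to show that the same approach works for categorical data.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []   # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

weights = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]
weights_extreme = [82, 48, 52, 54, 51, 59, 60, 58, 88, 48]
blood_groups = ["A", "O", "B", "O", "AB"]   # hypothetical categorical data

print(modes(weights), modes(weights_extreme), modes(blood_groups))
```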
• The dispersion of data reveals how the observations
are spread out or scattered on each side of the
center.
• Measuring the dispersion, scatter, or variation of a
data set is as important as locating its central
tendency.
• If the dispersion is small, it indicates high uniformity
of the observations in the data set.
• Absence of dispersion in the data indicates perfect
uniformity. This situation arises when all
observations in the data are identical.
[Figure: three data sets with the same mean (Mean = 7) but different
amounts of spread around it.]

Consider the number of customers at two different outlets
on a particular day
Outlet A: Number of customers

68 68 69 70 71 71 71 72 73 74 74
Outlet B: Number of customers

64 64 70 70 71 71 71 74 75 75 76
Data set A Data set B
Mean = 71 Mean = 71
Median = 71 Median = 71
Mode = 71 Mode = 71
The three measures of central tendency have the same
value, 71. Nonetheless, it is clear that the two data sets are
quite different.
In particular, there is much more variation (dispersion)
in the number of customers at Outlet B than at
Outlet A.
To describe the difference in variation quantitatively, we
need a measure of dispersion
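
The comparison of the two outlets can be checked with a short sketch. The helper `summarize` is illustrative only. It reports the three measures of central tendency together with the smallest and largest values, which already hint at the different spread.

```python
from statistics import mean, median, mode

outlet_a = [68, 68, 69, 70, 71, 71, 71, 72, 73, 74, 74]
outlet_b = [64, 64, 70, 70, 71, 71, 71, 74, 75, 75, 76]

def summarize(values):
    """Central tendency plus the two extreme observations."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "smallest": min(values),
        "largest": max(values),
    }

summary_a = summarize(outlet_a)
summary_b = summarize(outlet_b)
for name, summary in (("A", summary_a), ("B", summary_b)):
    print(name, summary)
```

Both outlets agree on mean, median and mode, yet the values for Outlet B run from 64 to 76 while those for Outlet A run only from 68 to 74.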
The range is defined as the difference between the
largest value (XL) and the smallest value (XS)
in the set of data: Range = XL - XS

X = 45 48 52 54 51 59 60 58 57 48
Range = 60 - 45 = 15 kg
The range is used when
• you have ordinal data, or
• you are presenting your results to people with little or
no knowledge of statistics
The range is rarely used in scientific work as it is
fairly insensitive: it depends on only two values in the
set of data, XL and XS.
Two very different sets of data can have the same range:
1 1 1 1 9 vs 1 3 5 7 9
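
A minimal sketch of the range calculation, applied to the student weights and to the two small data sets above. The function name `data_range` is an illustrative choice.

```python
def data_range(values):
    """Range = largest value minus smallest value (XL - XS)."""
    return max(values) - min(values)

weights = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]
set_1 = [1, 1, 1, 1, 9]
set_2 = [1, 3, 5, 7, 9]

range_weights = data_range(weights)
range_1 = data_range(set_1)
range_2 = data_range(set_2)
print(range_weights, range_1, range_2)
```

Because only the two extreme values enter the calculation, `set_1` and `set_2` give the same result even though their middle values differ.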
If x1, x2, x3, …, xn are n values of a variable x, then
the variance can be defined as
$S^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$, where $\bar{x}$ is the mean.
First subtract the mean from each observation.
This difference is called a deviate (deviation).
The deviate tells us how far a given observation is from the
mean.
Square the deviates.
The variance is the mean of the squared deviates.
Observation   Deviate               Squared deviate
45            (45 - 53.2) = -8.2    (-8.2)^2 = 67.24
48            (48 - 53.2) = -5.2    (-5.2)^2 = 27.04
52            (52 - 53.2) = -1.2    (-1.2)^2 = 1.44
54            (54 - 53.2) = 0.8     (0.8)^2 = 0.64
51            (51 - 53.2) = -2.2    (-2.2)^2 = 4.84
59            (59 - 53.2) = 5.8     (5.8)^2 = 33.64
60            (60 - 53.2) = 6.8     (6.8)^2 = 46.24
58            (58 - 53.2) = 4.8     (4.8)^2 = 23.04
57            (57 - 53.2) = 3.8     (3.8)^2 = 14.44
48            (48 - 53.2) = -5.2    (-5.2)^2 = 27.04
Mean = 53.2   Sum = 0               Mean (squared deviates) = variance = 24.56 kg^2
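
The table can be reproduced step by step in Python. This sketch follows the same procedure (deviates, squared deviates, then their mean) and divides by n, as the table does. Note that `statistics.variance` in the standard library divides by n - 1 instead, while `statistics.pvariance` divides by n.

```python
from statistics import pvariance

weights = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]
n = len(weights)

mean_weight = sum(weights) / n                   # 53.2
deviates = [x - mean_weight for x in weights]    # observation minus mean
squared_deviates = [d ** 2 for d in deviates]

# Variance as the mean of the squared deviates (divisor n).
variance = sum(squared_deviates) / n

# Cross-check against the standard library's divisor-n variance.
library_variance = pvariance(weights)
print(round(variance, 2), round(library_variance, 2))
```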
A large variance means that the observations are
far from the mean value, and a small variance
means that the observations are near the mean
value.
When the deviations are squared in the variance, their unit of
measure is squared as well.
E.g. if people’s weights are measured in pounds, then the
variance of the weights is expressed in pounds^2 (or
squared pounds).
Since squared units of measure are often awkward to deal
with, the square root of the variance is often used instead.
The standard deviation is the square root of the variance.
X = 45 48 52 54 51 59 60 58 57 48
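
For the weights above, the standard deviation can be computed as follows. The sketch shows both standard-library variants: `pstdev` divides by n (matching the variance of 24.56 kg^2 worked out earlier), while `stdev` divides by n - 1. Which divisor the lecture intends for S is an assumption here, and the divisor-n version is the one consistent with the worked variance.

```python
from statistics import pstdev, stdev

weights = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]

sd_divisor_n = pstdev(weights)           # square root of 24.56
sd_divisor_n_minus_1 = stdev(weights)    # square root of 245.6 / 9

# Unlike the variance (kg^2), both results are back in kilograms.
print(round(sd_divisor_n, 2), round(sd_divisor_n_minus_1, 2))
```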
Coefficient of Variation
The coefficient of variation (CV) is the ratio of the standard
deviation to the mean, expressed as a percentage. It is a useful statistic
for comparing the degree of variation from one data series to another.
The CV can be computed to compare the variability of two data sets when
• the two data sets are in different units, or
• the two data sets have the same unit and the same standard deviation
as well.
For the weights of the 10 students:
S = 4.96
Mean = 53.2
CV = (4.96 / 53.2) × 100
CV = 9.32%
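
A short sketch of the CV calculation for the student weights. The function name is illustrative, and the standard deviation is computed with divisor n (an assumption consistent with the variance worked out earlier). Converting the weights to grams shows why the CV suits comparisons across different units: the result, about 9.3%, does not change.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """CV as a percentage: (standard deviation / mean) * 100, divisor-n SD."""
    return pstdev(values) / mean(values) * 100

weights_kg = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]
weights_g = [w * 1000 for w in weights_kg]   # same data in grams

cv_kg = coefficient_of_variation(weights_kg)
cv_g = coefficient_of_variation(weights_g)
print(round(cv_kg, 2), round(cv_g, 2))
```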
Consider the number of customers at two different outlets
on a particular day
Outlet A: Number of customers

68 68 69 70 71 71 71 72 73 74 74
Outlet B: Number of customers

64 64 70 70 71 71 71 74 75 75 76
Data set A Data set B
Mean = 71 Mean = 71
Median = 71 Median = 71
Mode = 71 Mode = 71
Now calculate the coefficient of variation of both data sets A and B
Box and Whisker Plot
A box and whisker plot (also called a box plot) displays the five-number
summary of a set of data. The five-number summary is the minimum, first
quartile, median, third quartile, and maximum.
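
The five-number summary can be sketched with the standard library. Quartile conventions differ between textbooks and software; the choice of `method="inclusive"` below is an assumption, and other methods can give slightly different first and third quartiles for the same data.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    ASSUMPTION: quartiles use the 'inclusive' interpolation method.
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

weights = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]
summary = five_number_summary(weights)
print(summary)
```

In a box plot the box would span Q1 to Q3 with a line at the median, and the whiskers would extend toward the minimum and maximum.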
• The term skewness refers to a lack of symmetry.
The lack of symmetry in a distribution is always
determined with reference to a normal (Gaussian)
distribution. Note that a normal distribution is
always symmetrical.
• The skewness may be either positive or negative.
When the skewness of a distribution is positive
(negative), the distribution is called a positively
(negatively) skewed distribution. Absence of
skewness makes a distribution symmetrical.
Skewness
• In a perfectly symmetric
distribution, the tails on either
side of the curve are exact
mirror images of each other.
• When the tail on the curve's
right-hand side is longer, the
distribution is called
positively or right skewed.
• When the tail on the curve's
left-hand side is longer, the
distribution is called
negatively or left skewed.
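
The lecture does not give a formula for skewness. One common choice, assumed here purely for illustration, is the moment coefficient: the third central moment divided by the 1.5 power of the second central moment. Its sign matches the descriptions above (zero for symmetric data, positive for a longer right tail, negative for a longer left tail). The three small data sets are made up.

```python
def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (divisor-n moments).

    ASSUMPTION: this particular formula is not taken from the lecture.
    """
    n = len(values)
    mean_value = sum(values) / n
    m2 = sum((x - mean_value) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean_value) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]        # hypothetical data, mirror-image tails
right_tail = [1, 1, 1, 2, 10]      # hypothetical data, long right tail
left_tail = [1, 9, 10, 10, 10]     # hypothetical data, long left tail

print(skewness(symmetric), skewness(right_tail), skewness(left_tail))
```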
Kurtosis
• When a distribution is peaked in the same
way as a normal distribution, it is said to
be mesokurtic (normal). The peak of this
distribution is neither high nor low.
• Leptokurtic distributions are identified by
peaks that are thin and tall. The tails of
these distributions are thick and heavy.
• Platykurtic distributions are characterized
by a certain flatness of the peak, and have
slender tails.
Kurtosis
If (-1 ≤ Kurtosis ≤ 1)- Mesokurtic (normal) distribution
If (Kurtosis > 1)- Leptokurtic distribution
If (Kurtosis < -1)- Platykurtic distribution
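
The three rules above can be sketched as a small classifier. The lecture does not say how kurtosis is computed. The sketch assumes excess kurtosis (fourth central moment divided by the squared second central moment, minus 3), since that measure is 0 for a normal distribution and so places it inside the mesokurtic band. Both function names and both data sets are hypothetical.

```python
def excess_kurtosis(values):
    """Excess kurtosis m4 / m2**2 - 3 (divisor-n moments).

    ASSUMPTION: the thresholds of -1 and 1 are read as applying to this measure.
    """
    n = len(values)
    mean_value = sum(values) / n
    m2 = sum((x - mean_value) ** 2 for x in values) / n
    m4 = sum((x - mean_value) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

def classify_kurtosis(kurtosis):
    """Apply the three rules: > 1 leptokurtic, < -1 platykurtic, else mesokurtic."""
    if kurtosis > 1:
        return "leptokurtic"
    if kurtosis < -1:
        return "platykurtic"
    return "mesokurtic"

flat = [1, 2, 3, 4, 5]               # hypothetical evenly spread data
heavy_tails = [0] * 8 + [-10, 10]    # hypothetical data with two far-out values

for data in (flat, heavy_tails):
    k = excess_kurtosis(data)
    print(round(k, 2), classify_kurtosis(k))
```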

For quantitative data
If the data are symmetrical or nearly so, the mean, median and mode
will be close to each other; in this case the mean is the measure
of central tendency that is usually reported.
If the data are positively or negatively skewed, the mean may
overestimate or underestimate the true central tendency
of the distribution; in this case the median is the more suitable
measure of central tendency.
For qualitative data
If the data being analyzed are qualitative, then the only measure of
central tendency that can be reported is the mode.
For quantitative data
If the data are symmetrical, the measures of variability usually
reported are the variance or the standard deviation, although the standard
deviation is more interpretable.
If the data are positively or negatively skewed, the measure of
variability that is appropriate for the data is the range.
For qualitative data
If the data being analyzed are qualitative, then there is no measure
of dispersion to report.
