
Management and presentation of data and Measures of

Central Tendency & Dispersion

Course: Quantitative Analysis for Decision Making


STA 411 WE1- 15527

Dr. Abdur Rasheed


Assistant Professor
School of Public Health
Dow University of Health Sciences
What is Statistics

The science of collecting, monitoring, analyzing, summarizing and interpreting data.
Business Statistics

Business statistics can be described as the collection, summarization, analysis, and reporting of numerical findings relevant to a business decision or situation.
Population: An entire collection of people, animals, things, objects or measurements for the required study is called a population.
• It is the entire group we are interested in, which we wish to describe or draw conclusions about.

The population can be a group of business transactions, companies, customers, or anything else we can measure and want to know about.
Sample: A representative part selected from the population is called a sample. By studying a sample, it is hoped to draw a valid conclusion about the larger population.

• Population: All business transactions of a whole month.
• Sample: All transactions of a particular day might be a sample.
Element: The individual units of the population and the sample. For example:
A transaction
A company
A customer
Populations and Samples
• Studying entire populations is often too expensive and time-consuming, and thus impractical

• If a sample is representative of the population, then by observing the sample we can learn something about the population

– And thus by looking at the characteristics of the sample we may learn something about the characteristics of the population
The “Universe” and the “Sample”

[Figure: The Universe (we can never really understand what is going on here, it is just too big) and The Sample (a representative part of the universe, it is nice and small, and we can understand this), linked by Participant Selection. Other labels: Statistics (the mathematical description of the sample), Analysis, inference.]
Types of data
Constant Variables
Variable: A variable is a factor whose values change from time to time, place to place, or individual to individual.
Variables are
things that
vary and change
Variable
• Age of study participants

• Height of students

• No of Customers in a company

• No of daily transactions

• No of invoices

• Hair color

• Exam result
For example, suppose a company is being audited for invoice errors.
Instead of examining all 15,472 invoices produced by the company
during a given year, an auditor may select and examine just 100
invoices. If he is interested in the "invoice error status," he would
record (measure) the status (error or no error) of each sampled
invoice.

Population: All 15,472 invoices

Sample: Group of 100 invoices

Variable: Invoice error status

Element: Each invoice
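
The audit example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the invoice numbers 1 to 15,472 are hypothetical stand-ins for the real invoices, the sample is drawn by simple random sampling (the slide does not say how the auditor selects invoices), and `invoice_error_status` is a made-up placeholder rule, not a real audit check.

```python
import random

# Hypothetical invoice numbers 1..15472 standing in for the population.
population = list(range(1, 15_473))

# Assumption: simple random sampling without replacement.
# The fixed seed only makes this sketch repeatable.
rng = random.Random(0)
sample = rng.sample(population, 100)


def invoice_error_status(invoice_number):
    """Placeholder rule for illustration only; a real audit inspects the invoice."""
    return "error" if invoice_number % 50 == 0 else "no error"


# Measure the variable (invoice error status) on each element of the sample.
statuses = [invoice_error_status(inv) for inv in sample]

print("Population size N =", len(population))
print("Sample size n =", len(sample))
print("Sampled invoices with an error:", statuses.count("error"))
```

Each entry of `sample` is one element, and `statuses` holds the value of the variable for that element.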


Generally there are 2 types of variables:

▪ An independent (or input) variable

▪ A dependent (or outcome) variable

Let’s look at each type…


Variables that affect sales include customer demographics, store location and
weather.
Customer demographics include age, occupation, family status, income level and
gender.

A store located in a densely populated metropolitan area may have higher sales
than a store in a sparsely populated rural area.
Similarly, customers may go shopping when the weather is pleasant, but few would
venture outside in stormy or snowy weather.

Some variables have a circular relationship with sales. For example, sales
depend on advertising, but the level of advertising expenses also depends
on sales.

Independent variables: Demographics, Store location and weather.


Dependent variable: Sales
Types of Variables
Qualitative vs. Quantitative Variables
Quantitative: Values indicate a quantity or amount and
can be expressed numerically. Values can be arranged
according to magnitude
Examples:
• No of transactions
• No of employees in a company
• No of customers to a company
• No of years of education of the people
• No of Daily invoices
• Height of students
• Age of participants
Types of Variables
Qualitative vs. Quantitative Variables

Qualitative: Simply labels to distinguish one group from another.
Examples:
• Sex
• Educational Level
• Occupation
• Religion
• Blood group

Note: Numerical representation is only for coding / labeling and not for comparison
Qualitative and Quantitative variables

Qualitative variables are sometimes called categorical variables.

If five-year-old participants were asked to name their favorite color, then the variable would be qualitative. If the time it took them to respond were measured, then the variable would be quantitative.
Qualitative and quantitative variables may be further subdivided:

Variable
  Qualitative: Nominal, Ordinal
  Quantitative: Discrete, Continuous
Quantitative: Discrete variables

➢ Assumes only discrete values (integral values, i.e. whole numbers).

➢ There is a definite gap between two values.

➢ The data which are described by a discrete variable are called discrete data

Examples:
Number of children in a family,
Number of rooms in a school etc.
No of customers to a company
No of years of education of the people
No of Daily invoices
Quantitative: continuous variables

➢ Can assume any value within its range, whole or fractional

➢ When we round the values of a continuous variable to a certain decimal place, the variable is effectively treated as a discrete variable

Example
Height of a person up to a fraction of an inch,
weight up to grams etc.
Age up to months
Temperature
Qualitative: Nominal variables

As the name implies, it consists of “naming” or classifying observations into various mutually exclusive categories.

Example
Eye color
Male-Female
Sick-well
Married-Single-Divorced
Blood group
Qualitative: Ordinal variables

A qualitative variable that incorporates an ordered position, or ranking.

Example
Students' grades (A+, A, B+, B, …)
Socio economic status (Low, Middle, High)
Education level (Low, High)
Olympic medals (Gold, Silver, Bronze)
Shoe quality (Ordinary, good, best)
Identify each variable as quantitative (discrete or continuous) or qualitative (nominal or ordinal):

• Blood group
• Number of students
• Students grades
• Height of students
• No of flights
• Computerized National Identity Card No.
• Smoking status
• Gender
• Salary
• Race
• No of Transactions
$\Sigma$ (Greek letter Sigma) is shorthand for addition.

If $X_i$ is the $i$th observation, then $X_1 + X_2 + X_3$ is the same as $\sum_{i=1}^{3} X_i$.
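
The Sigma shorthand corresponds directly to a loop or a built-in sum. A minimal sketch, where the three observations are assumed values chosen only for illustration:

```python
# Assumed illustrative observations X1, X2, X3.
X = [45, 48, 52]

# X1 + X2 + X3 written out term by term.
written_out = X[0] + X[1] + X[2]

# The same sum as a loop over i = 1, 2, 3.
total = 0
for x_i in X:
    total += x_i

# Python's built-in sum() plays the role of Sigma.
print(written_out, total, sum(X))
```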
Measure        Sample       Population

Mean           $\bar{X}$    $\mu$
Stand. Dev.    $S$          $\sigma$
Variance       $S^2$        $\sigma^2$
Size           $n$          $N$
Numerical Data Properties

Central Tendency: Mean, Median, Mode
Dispersion: Range, Variance, Standard Deviation
In most cases, data have a tendency to gather around a middle observed value. In other words, some central value gives a special characteristic of the data.

This phenomenon is referred to as central tendency.

The statistics we calculate for this purpose are termed measures of central tendency, also known as measures of location.
A measure of central tendency is a single value that describes the way in which a group of data clusters around a central value. In other words, it is a way to describe the center of a data set.
Central tendency lets us know what is normal or 'average' for a set of data. It also condenses the data set down to one representative value, which is useful when you are working with large amounts of data.

Central tendency also allows you to compare one data set to another. For example, let's say you have a sample of girls and a sample of boys, and you are interested in comparing their heights. By calculating the average height for each sample, you could easily draw comparisons between the girls and boys.
Central tendency is also useful when you want to compare one piece
of data to the entire data set. Let's say you received a 60% on your
last quiz, which is usually in the C range.

You go around and talk to your classmates and find out that the average
score on the quiz was 43%.

In this instance, your score was significantly higher than those of your
classmates. Since your teacher grades on a curve, your 60% becomes an
A.
• 1. Measure of Central Tendency
• 2. Most Common Measure
• 3. Acts as ‘Balance Point’
• 4. Affected by Extreme Values (‘Outliers’)
• 5. Formula (Sample Mean): $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Following weights (in kg) represent the data of 10 students of a
randomly selected class of IoBM
X = 45 48 52 54 51 59 60 58 57 48

n = number of observations

Here n = 10

Mean = (45 + 48 + 52 + 54 + 51 + 59 + 60 + 58 + 57 + 48) / 10 = 532 / 10 = 53.2

Mean weight of 10 students is 53.2 kg
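
The same calculation can be checked with a short Python sketch that uses the ten weights from the slide:

```python
# Weights (in kg) of the 10 students from the slide.
weights = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]

n = len(weights)          # number of observations
total = sum(weights)      # sum of all observations
mean = total / n          # sample mean

print("n =", n)
print("Sum =", total)
print("Mean =", mean, "kg")
```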


Following weights (in kg) represent the data of 10 students of a
randomly selected class of IoBM
X = 82 48 52 54 51 59 60 58 88 48

Mean = (82 + 48 + 52 + 54 + 51 + 59 + 60 + 58 + 88 + 48) / 10 = 600 / 10 = 60

Mean weight of 10 students is 60 kg


Hence mean is affected by
extreme values
• 1. Measure of Central Tendency
• 2. Middle Value In Ordered Sequence
– If Odd n, Middle Value of Sequence
Position of Median in Sequence: ((n + 1)/2)th term

– If Even n, Average of 2 Middle Values
Position of Median in Sequence: (n/2)th and (n/2 + 1)th terms

• 4. Not Affected by Extreme Values


Following weights (in kg) represent the data of 10 students of a
randomly selected class of IoBM
X = 45 48 52 54 51 59 60 58 57 48
n= number of data observation
Here n = 10
Median

Ordered values
X = 45 48 48 51 52 54 57 58 59 60
Number of observations (10) is even
(n/2)th term = 5th term = 52
and (n/2 + 1)th term = 6th term = 54
Median = (52 + 54) / 2 = 53 kg
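
The ordering and position rules can be sketched as a small Python function. The function name `median` is chosen here for illustration; it handles both the odd and the even case.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1)/2)th term (index mid when counting from 0).
        return ordered[mid]
    # Even n: average of the (n/2)th and (n/2 + 1)th terms.
    return (ordered[mid - 1] + ordered[mid]) / 2


weights = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]
print("Ordered values:", sorted(weights))
print("Median =", median(weights), "kg")
```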
Following weights (in kg) represent the data of 10 students of a
randomly selected class of IoBM
X = 82 48 52 54 51 59 60 58 88 48
n= number of data observation
Here n = 10
Median
Ordered values
X = 48 48 51 52 54 58 59 60 82 88
Number of observations (10) is even
(n/2)th term = 5th term = 54
and (n/2 + 1)th term = 6th term = 58

Median = (54 + 58) / 2 = 56 kg


Hence median is less
affected by extreme values
as compared to mean
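
To see how differently the two measures react, the sketch below compares both weight data sets from the slides using Python's standard `statistics` module. The variable names are illustrative.

```python
import statistics

# Original weights and the version containing the two extreme values (82 and 88).
original = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]
with_extremes = [82, 48, 52, 54, 51, 59, 60, 58, 88, 48]

# How far each measure moves when the extreme values are introduced.
mean_shift = statistics.mean(with_extremes) - statistics.mean(original)
median_shift = statistics.median(with_extremes) - statistics.median(original)

print("Change in mean:", round(mean_shift, 1), "kg")
print("Change in median:", round(median_shift, 1), "kg")
```

The mean moves by 6.8 kg (53.2 to 60), while the median moves by only 3 kg (53 to 56).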
• 1. Measure of Central Tendency
• 2. Value That Occurs Most Often
• 3. Not Affected by Extreme Values
• 4. May Be No Mode or Several Modes
• 5. May Be Used for Numerical & Categorical
Data
Following weights (in kg) represent the data of 10 students of a
randomly selected class of IoBM
X = 45 48 52 54 51 59 60 58 57 48

Mode
Most frequent value is 48
(48 occurs twice)
Hence mode is 48 kg
Following weights (in kg) represent the data of 10 students of a
randomly selected class of IoBM
X = 82 48 52 54 51 59 60 58 88 48

Mode
Most frequent value is 48
(48 occurs twice)
Hence mode is 48 kg
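
A minimal sketch of finding the mode by counting occurrences. The helper `modes` is a hypothetical name, and returning an empty list when no value repeats is one convention for "no mode", assumed here for illustration.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often; empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: no repeated value means no mode
    return [value for value, count in counts.items() if count == highest]


weights = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]
print("Mode(s) of weights:", modes(weights))

# The mode also works for categorical data (illustrative blood groups).
blood_groups = ["A", "B", "O", "B", "A"]
print("Mode(s) of blood groups:", modes(blood_groups))
```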
• The dispersion of data reveals how the observations
are spread out or scattered on each side of the
center.
• Measuring the dispersion, scatter, or variation of data is as important as locating the central tendency.
• If the dispersion is small, it indicates high uniformity
of the observations in the data set.
• Absence of dispersion in the data indicates perfect
uniformity. This situation arises when all
observations in the data are identical.
[Figure: three data sets with different spreads, each with Mean = 7]


Consider the number of customers at two different outlets
on a particular day

Outlet A: Number of customers


68 68 69 70 71 71 71 72 73 74 74

Outlet B: Number of customers


64 64 70 70 71 71 71 74 75 75 76

             Data set A    Data set B

Mean         71            71
Median       71            71
Mode         71            71

The three measures of central tendency have the same value, 71. Nonetheless, it is clear that the two data sets are quite different.

In particular, there is much more variation (dispersion) in the number of customers at Outlet B than at Outlet A.

To describe the difference in variation quantitatively, we need a measure of dispersion.
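
The claim can be checked with a short sketch: the three measures of central tendency agree for both outlets, yet the smallest and largest values differ. The helper name `centre` is illustrative.

```python
import statistics

outlet_a = [68, 68, 69, 70, 71, 71, 71, 72, 73, 74, 74]
outlet_b = [64, 64, 70, 70, 71, 71, 71, 74, 75, 75, 76]


def centre(values):
    """Mean, median and mode of a data set, in that order."""
    return (statistics.mean(values), statistics.median(values), statistics.mode(values))


centre_a = centre(outlet_a)
centre_b = centre(outlet_b)

# Smallest and largest value at each outlet, as a first look at the spread.
extremes_a = (min(outlet_a), max(outlet_a))
extremes_b = (min(outlet_b), max(outlet_b))

print("Outlet A (mean, median, mode):", centre_a, "min/max:", extremes_a)
print("Outlet B (mean, median, mode):", centre_b, "min/max:", extremes_b)
```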
The range is defined as the difference between the
largest score in the set of data and the smallest score
in the set of data, XL - XS

Following weights (in kg) represent the data of 10 students of a


randomly selected class of IoBM
X = 45 48 52 54 51 59 60 58 57 48

Range = 60 – 45 = 15
The range is used when
you have ordinal data or
you are presenting your results to people with little or
no knowledge of statistics
The range is rarely used in scientific work as it is
fairly insensitive
It depends on only two values in the set of data, XL and XS.

Two very different sets of data can have the same range:
1 1 1 1 9 vs 1 3 5 7 9
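
A minimal sketch of the range, showing why it is insensitive: it looks only at the largest and smallest values. The function name `data_range` is illustrative.

```python
def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


weights = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]
print("Range of weights:", data_range(weights), "kg")

# Two very different data sets with the same range.
set_1 = [1, 1, 1, 1, 9]
set_2 = [1, 3, 5, 7, 9]
print("Ranges:", data_range(set_1), data_range(set_2))
```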
If $x_1, x_2, x_3, \ldots, x_n$ are n values of a variable x with mean $\bar{x}$, then the variance can be defined as
$\text{Variance} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$
First subtract the mean from each of the observations.
This difference is called a deviation (or deviate).
The deviate tells us how far a given score is from the mean.
Square the deviates.
Variance is the mean of the squared deviations.
Observations    Deviates                Squared deviates
45              (45 - 53.2) = -8.2      (-8.2)^2 = 67.24
48              (48 - 53.2) = -5.2      (-5.2)^2 = 27.04
52              (52 - 53.2) = -1.2      (-1.2)^2 = 1.44
54              (54 - 53.2) = 0.8       (0.8)^2 = 0.64
51              (51 - 53.2) = -2.2      (-2.2)^2 = 4.84
59              (59 - 53.2) = 5.8       (5.8)^2 = 33.64
60              (60 - 53.2) = 6.8       (6.8)^2 = 46.24
58              (58 - 53.2) = 4.8       (4.8)^2 = 23.04
57              (57 - 53.2) = 3.8       (3.8)^2 = 14.44
48              (48 - 53.2) = -5.2      (-5.2)^2 = 27.04
Mean = 53.2     Sum = 0                 Sum = 245.60
Variance = mean of the squared deviations = 245.60 / 10 = 24.56 kg^2
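
The table can be reproduced step by step in Python. This sketch follows the slide's definition (mean of the squared deviations, so the divisor is n).

```python
weights = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]

n = len(weights)
mean = sum(weights) / n

# Step 1: subtract the mean from each observation.
deviations = [x - mean for x in weights]

# Step 2: square the deviates.
squared_deviations = [d ** 2 for d in deviations]

# Step 3: variance is the mean of the squared deviations (divisor n).
variance = sum(squared_deviations) / n

for x, d, sq in zip(weights, deviations, squared_deviations):
    print(f"{x:>3} {d:>6.1f} {sq:>7.2f}")
print("Sum of deviations:", round(sum(deviations), 10))
print("Variance:", round(variance, 2), "kg^2")
```

In Python's standard library, `statistics.pvariance` uses the same divisor n, whereas `statistics.variance` divides by n - 1 and would give about 27.29 for these data.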
Large variance means that observations are
far away from mean value and small variance
means that observations are near to mean
value.
When the deviations are squared in the variance, their unit of measure is squared as well.
E.g. if people’s weights are measured in pounds, then the variance of the weights would be expressed in pounds^2 (squared pounds).
Since squared units of measure are often awkward to deal with, the square root of the variance is often used instead.
The standard deviation is the square root of the variance.
Following weights (in kg) represent the data of 10 students of a
randomly selected class of IoBM
X = 45 48 52 54 51 59 60 58 57 48
Standard deviation: $S = \sqrt{24.56} \approx 4.96$ kg
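
A short sketch of the standard deviation for these weights, taken as the square root of the variance with divisor n (the convention used for the variance of 24.56 kg^2 computed earlier):

```python
import math
import statistics

weights = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]

# Variance with divisor n, then its square root.
variance = statistics.pvariance(weights)
std_dev = math.sqrt(variance)

print("Variance:", round(variance, 2), "kg^2")
print("Standard deviation:", round(std_dev, 2), "kg")
```

`statistics.pstdev(weights)` returns the same value in one call.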
Coefficient of Variation
The coefficient of variation (CV) is the ratio of the standard deviation to the mean, usually expressed as a percentage, and it is a useful statistic for comparing the degree of variation from one data series to another.

$CV = \frac{S}{\bar{X}} \times 100$

The CV is computed to compare the variability of two data sets under either of the following conditions:

• When the two data sets are in different units
• When the two data sets have the same unit and also the same standard deviation
S = 4.96
Mean = 53.2

CV = (4.96 / 53.2) × 100

CV ≈ 9.32%
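
A minimal sketch of the CV calculation for the student weights. The function name `coefficient_of_variation` is illustrative, and the standard deviation uses divisor n to match the variance of 24.56 kg^2 found earlier. The second call shows that the CV does not depend on the unit of measurement.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation (divisor n) as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(values) / mean * 100


weights_kg = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]
cv_kg = coefficient_of_variation(weights_kg)

# Same weights expressed in grams: the CV is unchanged.
weights_g = [w * 1000 for w in weights_kg]
cv_g = coefficient_of_variation(weights_g)

print("CV (kg):", round(cv_kg, 2), "%")
print("CV (g): ", round(cv_g, 2), "%")
```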
Consider the number of customers at two different outlets
on a particular day

Outlet A: Number of customers


68 68 69 70 71 71 71 72 73 74 74

Outlet B: Number of customers


64 64 70 70 71 71 71 74 75 75 76

             Data set A    Data set B

Mean         71            71
Median       71            71
Mode         71            71

Now calculate the coefficient of variation of both data sets A and B
Box and Whisker Plot

A box and whisker plot (also called a box plot) displays the five-number summary of a set of data. The five-number summary is the minimum, first quartile, median, third quartile, and maximum.
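
The five-number summary can be sketched with Python's standard `statistics.quantiles` (available from Python 3.8). Quartile conventions differ between textbooks and software, so the first and third quartiles from this sketch may not match a hand calculation that uses another rule; the default "exclusive" method is assumed here.

```python
import statistics


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum."""
    # n=4 asks for the three cut points that split the data into quarters.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, q2, q3, max(values))


weights = [45, 48, 52, 54, 51, 59, 60, 58, 57, 48]
summary = five_number_summary(weights)

for label, value in zip(["Minimum", "Q1", "Median", "Q3", "Maximum"], summary):
    print(f"{label}: {value}")
```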
• The term skewness refers to the lack of symmetry.
The lack of symmetry in a distribution is always
determined with reference to a normal or Gaussian
distribution. Note that a normal distribution is
always symmetrical.

• The skewness may be either positive or negative.


When the skewness of a distribution is positive
(negative), the distribution is called a positively
(negatively) skewed distribution. Absence of
skewness makes a distribution symmetrical.
Skewness
• In a perfectly symmetric distribution, the tails on either side of the curve are exact mirror images of each other.

• When the tail on the curve's right-hand side is longer, the distribution is called positively or right skewed.

• When the tail on the curve's left-hand side is longer, the distribution is called negatively or left skewed.
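
The slides do not give a formula for skewness. One common choice, assumed here purely for illustration, is the moment coefficient of skewness (third central moment divided by the 1.5th power of the second central moment). Its sign indicates the direction of the longer tail.

```python
def skewness(values):
    """Moment coefficient of skewness (an assumed formula, not from the slides)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are identical")
    return m3 / m2 ** 1.5


# Weights containing two large values (82 and 88): longer right-hand tail.
with_extremes = [82, 48, 52, 54, 51, 59, 60, 58, 88, 48]
symmetric = [1, 2, 3, 4, 5]

print("Skewness with extreme values:", round(skewness(with_extremes), 2))
print("Skewness of symmetric data:", round(skewness(symmetric), 2))
```

A positive result corresponds to a right-skewed data set, a negative result to a left-skewed one, and a value near zero to a roughly symmetric one.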
Kurtosis
• When a distribution is peaked in the same way as a normal distribution, it is said to be mesokurtic (normal). The peak of this distribution is neither high nor low.

• Leptokurtic distributions are identified by peaks that are thin and tall. The tails of these distributions are thick and heavy.

• Platykurtic distributions are characterized by a certain flatness to the peak, and have slender tails.
Kurtosis

If (-1 ≤ Kurtosis ≤ 1)- Mesokurtic (normal) distribution

If (Kurtosis > 1)- Leptokurtic distribution

If (Kurtosis < -1)- Platykurtic distribution
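
The thresholds above can be sketched as a small classifier. The slides do not say how kurtosis is computed; this sketch assumes excess kurtosis (fourth central moment divided by the squared second central moment, minus 3, so that a normal distribution scores 0). Both function names are illustrative.

```python
def excess_kurtosis(values):
    """Excess kurtosis (an assumed definition, not stated in the slides)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are identical")
    return m4 / m2 ** 2 - 3


def classify_kurtosis(kurtosis):
    """Apply the thresholds from the slide: -1 and 1."""
    if kurtosis > 1:
        return "leptokurtic"
    if kurtosis < -1:
        return "platykurtic"
    return "mesokurtic"


flat_data = [1, 2, 3, 4, 5]   # evenly spread values, flat shape
k = excess_kurtosis(flat_data)
print(round(k, 2), classify_kurtosis(k))
```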


For quantitative data
If the data are symmetrical or nearly so, then the mean, median and mode will be close to each other; in this case the mean is the measure of central tendency that is usually reported.

If the data are positively or negatively skewed, then the mean may overestimate or underestimate the true central tendency of the distribution; in this case the median is the suitable measure of central tendency.
For qualitative data
If the data being analyzed are qualitative, then the only measure of central tendency that can be reported is the mode.
For quantitative data
If the data are symmetrical, then the measure of variability usually reported is the variance or the standard deviation, although the standard deviation is more interpretable.

If the data are positively or negatively skewed, then the appropriate measure of variability is the range.

For qualitative data
If the data being analyzed are qualitative, then there is no measure of dispersion to report.
