
Describing Data

Numerical Measures
Measures of Location or Measures of Central Tendency
Lecture Two / 2022-2023

1-The Population Mean (µ)


Population Mean (µ): For raw data:
$$\mu = \frac{x_1 + x_2 + x_3 + \cdots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N}$$
Where:
µ: represents the population mean.
N: is the number of values in the population.
x: represents any particular value.
Σ: is the Greek capital letter “sigma” and indicates the operation of adding.
Σx: is the sum of the x values in the population.
Any measurable characteristic of a population is called a parameter. The mean of a
population is a parameter.
Parameter: A characteristic of a population.

2- The Sample Mean (𝑋̅):
Frequently we select a sample from the population in order to find something about a
specific characteristic of the population.

For raw data, sample mean (𝑋̅):


$$\bar{X} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where:
𝑋̅: is the sample mean. It is read “X bar”
n: is the number of values in the sample.
The mean of a sample, or any other measure based on sample data, is called a statistic.
Statistic: A characteristic of a sample.
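
The raw-data formula can be sketched in a few lines of Python. This is only an illustration: the function name `sample_mean` is an assumption of this sketch, and the data (3, 8, 4) are the values used later in this lecture.

```python
def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


print(sample_mean([3, 8, 4]))  # 5.0
```

The same function gives the population mean when it is handed every value in the population, because the two formulas differ only in notation (N and µ instead of n and 𝑋̅).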

Lecture 2 Page -1
For grouped data, the sample mean is given as:
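$$\bar{X} = \frac{x_1 f_1 + x_2 f_2 + x_3 f_3 + \cdots + x_K f_K}{f_1 + f_2 + f_3 + \cdots + f_K}$$

$$\bar{X} = \frac{\sum_{i=1}^{K} x_i f_i}{\sum_{i=1}^{K} f_i}$$

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{K} x_i f_i$$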
Where:
𝑥𝑖: is the class mid-point
𝑓𝑖: is the class frequency
n: is the number of values in the sample.

Example: Find the mean for the data in the table shown.

Class    Frequency
10-14    2
15-19    3
20-24    4
25-29    3
30-34    2
35-39    1
Solution:
We have to find the class mid-points.

Class    Class true limits    Frequency    Class mid-point
10-14    9.5-14.5             2            (9.5+14.5)/2 = 12
15-19    14.5-19.5            3            17
20-24    19.5-24.5            4            22
25-29    24.5-29.5            3            27
30-34    29.5-34.5            2            32
35-39    34.5-39.5            1            37

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{K} x_i f_i$$

$$\bar{X} = \frac{12 \cdot 2 + 17 \cdot 3 + 22 \cdot 4 + 27 \cdot 3 + 32 \cdot 2 + 37 \cdot 1}{2 + 3 + 4 + 3 + 2 + 1} = \frac{345}{15}$$

$$\bar{X} = 23$$
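
The grouped-data calculation above can be checked with a short Python sketch. The function name `grouped_mean` and the representation of classes as (lower, upper) pairs are assumptions made for illustration only.

```python
def grouped_mean(class_limits, frequencies):
    """Mean of grouped data: sum of (mid-point * frequency) divided by n."""
    # The mid-point of a class is the average of its two limits.
    midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
    n = sum(frequencies)
    return sum(x * f for x, f in zip(midpoints, frequencies)) / n


classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
freqs = [2, 3, 4, 3, 2, 1]
print(grouped_mean(classes, freqs))  # 23.0
```

Using the stated limits (10 and 14) or the true limits (9.5 and 14.5) gives the same mid-point, so either pair can be passed in.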

Properties of the Arithmetic Mean


1- Every set of interval or ratio level data has a mean.
2- All the values are included in computing the mean.
3- The mean is unique.
4- The sum of the deviations of each value from the mean is zero. Expressed symbolically:

$$\sum (x_i - \bar{X}) = 0$$

As an example, the mean of (3, 8, 4) is 5. Then:
$$\sum (x_i - \bar{X}) = (3 - 5) + (8 - 5) + (4 - 5) = -2 + 3 - 1 = 0$$
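
The zero-sum property can be illustrated numerically. The sketch below uses the same data (3, 8, 4); the helper name `deviations_from_mean` is an assumption of this example.

```python
def deviations_from_mean(values):
    """Return the deviation (x - mean) of every value from the mean."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]


devs = deviations_from_mean([3, 8, 4])
print(devs)       # [-2.0, 3.0, -1.0]
print(sum(devs))  # 0.0
```

For data whose mean is not exactly representable in floating point, the computed sum may differ from zero by a tiny rounding error, so a tolerance-based comparison is safer than testing for exact equality.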

5- If (𝑋̅1, 𝑋̅2, 𝑋̅3, ..., 𝑋̅B) are the means of B distributions with respective frequencies
(𝑛1, 𝑛2, 𝑛3, ..., 𝑛B), then the mean 𝑋̅ of the whole distribution, with total frequency
N = 𝑛1 + 𝑛2 + ... + 𝑛B, is given by:

$$\bar{X} = \frac{1}{N} \sum_{i=1}^{B} n_i \bar{X}_i, \qquad N = \sum_{i=1}^{B} n_i$$
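
As a quick check of property 5, the sketch below splits the data (3, 8, 4) into two groups, (3, 8) and (4), and recombines their means. The function name `combined_mean` and the particular split are assumptions chosen for illustration.

```python
def combined_mean(means, sizes):
    """Mean of the pooled data, given each group's mean and frequency."""
    total = sum(sizes)  # N = n_1 + n_2 + ... + n_B
    return sum(n * m for n, m in zip(sizes, means)) / total


# Group 1 = (3, 8) has mean 5.5 and size 2; group 2 = (4) has mean 4 and size 1.
print(combined_mean([5.5, 4], [2, 1]))  # 5.0
```

The result matches the mean of (3, 8, 4) computed directly, which is what the property states.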

The mean does have a weakness because it uses the value of every item in a sample, or
population, in its computation. If one or two of these values are either extremely large or
extremely small compared to the majority of data, the mean might not be an appropriate
average to represent the data.

The Weighted Mean: The weighted mean is a special case of the arithmetic mean. We
will refer to the weighted mean as 𝑋̅𝑤 . Any measure of importance could be used as a weight.
In general, the weighted mean of a set of numbers designated (𝑥1, 𝑥2, ..., 𝑥𝑛) with the
corresponding weights (𝑤1, 𝑤2, …., 𝑤𝑛) is computed as:

$$\bar{X}_w = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n}{w_1 + w_2 + w_3 + \cdots + w_n}$$

This may be shortened to:

$$\bar{X}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

Example: The Carter Construction Company pays its hourly employees $16.50, $17.50, or
$18.50 per hour. There are 26 hourly employees: 14 are paid at the $16.50 rate, 10 at the
$17.50 rate, and 2 at the $18.50 rate. What is the mean hourly rate paid to the 26 employees?

Solution:

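$$\bar{X}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

$$\bar{X}_w = \frac{14 \cdot 16.50 + 10 \cdot 17.50 + 2 \cdot 18.50}{14 + 10 + 2}$$

$$\bar{X}_w = \frac{443}{26} = \$17.038$$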

The weighted mean hourly wage is rounded to $17.04.
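
The same calculation can be sketched in Python, with the number of employees at each rate acting as the weights. The function name `weighted_mean` is an assumption of this example.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


rates = [16.50, 17.50, 18.50]  # hourly rates in dollars
employees = [14, 10, 2]        # number of employees paid at each rate
print(round(weighted_mean(rates, employees), 2))  # 17.04
```

Giving every value the same weight reduces this to the ordinary arithmetic mean.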

The Median

For data containing one or two very large or very small values the arithmetic mean may not
be representative. The center for such data can be better described by the median.

Median: The midpoint of the values after they have been ordered from the smallest to the
largest, or the largest to the smallest.

For the median, the data must be at least an ordinal level of measurement. The major
properties of the median are:
1- It is not affected by extremely large or small values. Therefore, the median is a
valuable measure of location when such values do occur.
2- It can be computed for ordinal level data or higher.
3- It is unique.
4- Can be computed for a frequency distribution with an open-ended class if the median
does not lie in an open-ended class.

For raw data: for n values arranged in ascending (or descending) order of magnitude, the
median is:
❖ The middle value if n is odd:

M = element[(n+1)/2]

❖ The arithmetic mean of the two middle values if n is even:

M = {element[n/2] + element[(n/2)+1]}/2


So, for an even number of observations, the median may not be one of the given values.

Example: Find the median for the data:
a. 55, 62, 75, 35, 47, 80, 50
b. 70, 56, 42, 86, 26, 23, 75, 62

Solution:
a. 35, 47, 50, 55, 62, 75, 80
M = 55

b. 23, 26, 42, 56, 62, 70, 75, 86
M = (56+62)/2 = 59
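
The odd and even cases can be sketched as one Python function. Note that Python lists are indexed from 0, so element[(n+1)/2] in the notation above corresponds to index n // 2 here. The function name `median` is an assumption of this example.

```python
def median(values):
    """Median of raw data: middle value (odd n) or mean of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values


print(median([55, 62, 75, 35, 47, 80, 50]))      # 55
print(median([70, 56, 42, 86, 26, 23, 75, 62]))  # 59.0
```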

For grouped data, the median is:


$$M = L + \left( \frac{\frac{n}{2} - CF}{f} \right) \cdot h$$
Where:
M: is the median of the frequency distribution
L: is the lower true limit of the class containing the median
n: is the total number of values
f: is the frequency of the median class
CF: is the cumulative frequency of the class preceding the median class
h: is the median class width

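Example: Find the median for the table shown:

Class    Frequency
10-14    2
15-19    3
20-24    4
25-29    3
30-34    2
35-39    1

Solution:
First find the cumulative frequency (CF) of each class, where CF_i = f_i + CF_(i-1):

Class    Class true limits    Frequency    CF
10-14    9.5-14.5             2            2
15-19    14.5-19.5            3            5
20-24    19.5-24.5            4            9
25-29    24.5-29.5            3            12
30-34    29.5-34.5            2            14
35-39    34.5-39.5            1            15

Here n = 15, so n/2 = 7.5. The median class is 20-24, the first class whose cumulative
frequency reaches 7.5, so L = 19.5, CF = 5, f = 4, and h = 5:

$$M = L + \left( \frac{\frac{n}{2} - CF}{f} \right) \cdot h = 19.5 + \left( \frac{7.5 - 5}{4} \right) \cdot 5 = 22.625$$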
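
The grouped-median formula can be sketched in Python as follows. The function name `grouped_median` and the representation of classes as (lower true limit, upper true limit) pairs are assumptions of this example; the median class is taken to be the first class whose cumulative frequency reaches n/2.

```python
def grouped_median(true_limits, frequencies):
    """Median of grouped data: M = L + ((n/2 - CF) / f) * h."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("the median of an empty distribution is undefined")
    half = n / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(true_limits, frequencies):
        if cf + f >= half:  # this is the median class
            return lower + (half - cf) / f * (upper - lower)
        cf += f
    raise ValueError("no median class found")


limits = [(9.5, 14.5), (14.5, 19.5), (19.5, 24.5), (24.5, 29.5), (29.5, 34.5), (34.5, 39.5)]
freqs = [2, 3, 4, 3, 2, 1]
print(grouped_median(limits, freqs))  # 22.625
```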

The Mode
Mode: The value of the observation that appears most frequently.
The major properties of the mode are:
1. It is not affected by extremely large or small values.
2. It can be computed for all levels of data (nominal, ordinal, interval, and ratio).
For raw data:
1- Some sets of data have no mode because no value appears more than the others.
Example: (19, 21, 23, 20)

2- For some data sets there is more than one mode.


Example: (22, 26, 27, 27, 31, 35, 35). Both 27 and 35 appear twice, more often than any other
value, so this data set has two modes (it is bimodal).

3- If the set of data has more than two modes, the distribution is referred to as being
multimodal. In such cases we would probably not consider any of the modes as being
representative of the central value of the data.
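
The raw-data cases can be sketched with a small Python function that returns every value tied for the highest count. The function name `modes` and the convention of returning an empty list when no value appears more often than the others are assumptions of this example.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] if every value is equally frequent."""
    counts = Counter(values)
    if not counts:
        return []
    # If several distinct values all share one count, none stands out: no mode.
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes([19, 21, 23, 20]))              # []
print(modes([22, 26, 27, 27, 31, 35, 35]))  # [27, 35]
```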

For grouped data:


$$Mo = L + \left( \frac{f_m - f_1}{2 f_m - f_1 - f_2} \right) \cdot h$$

Where:
L: true lower limit of the modal class.
𝑓𝑚: frequency of modal class (maximum frequency).
𝑓1: frequency of class preceding the modal class.
𝑓2: frequency of class following the modal class.
h: modal class interval.

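Example: Find the mode for the data in the table shown.

Class    Frequency
10-14    2
15-19    3
20-24    4
25-29    3
30-34    2
35-39    1

Solution:

Class    Class true limits    Frequency
10-14    9.5-14.5             2
15-19    14.5-19.5            3
20-24    19.5-24.5            4
25-29    24.5-29.5            3
30-34    29.5-34.5            2
35-39    34.5-39.5            1

The modal class is 20-24, which has the maximum frequency, so L = 19.5, f_m = 4, f_1 = 3,
f_2 = 3, and h = 5:

$$Mo = L + \left( \frac{f_m - f_1}{2 f_m - f_1 - f_2} \right) \cdot h$$

$$Mo = 19.5 + \left( \frac{4 - 3}{(2 \cdot 4) - 3 - 3} \right) \cdot 5$$

$$Mo = 22$$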
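
The grouped-mode formula can be sketched in Python. Two choices here are assumptions of this example and are not stated in the lecture: if the modal class is the first or last class, the missing neighbouring frequency is treated as 0, and if several classes tie for the maximum frequency, the first one is used. The function name `grouped_mode` is also illustrative.

```python
def grouped_mode(true_limits, frequencies):
    """Mode of grouped data: Mo = L + ((fm - f1) / (2*fm - f1 - f2)) * h."""
    # Index of the modal class (first class with the maximum frequency).
    m = max(range(len(frequencies)), key=lambda i: frequencies[i])
    fm = frequencies[m]
    # Assumption: a missing neighbouring class counts as frequency 0.
    f1 = frequencies[m - 1] if m > 0 else 0
    f2 = frequencies[m + 1] if m < len(frequencies) - 1 else 0
    denominator = 2 * fm - f1 - f2
    if denominator == 0:
        raise ValueError("the formula is undefined when fm = f1 = f2")
    lower, upper = true_limits[m]
    return lower + (fm - f1) / denominator * (upper - lower)


limits = [(9.5, 14.5), (14.5, 19.5), (19.5, 24.5), (24.5, 29.5), (29.5, 34.5), (34.5, 39.5)]
freqs = [2, 3, 4, 3, 2, 1]
print(grouped_mode(limits, freqs))  # 22.0
```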

Example: The following is the percent change in net income from last year to this year for
a sample of 12 construction companies in a certain state. Determine mean, median, and the
mode:

5 1 -10 -6 5 12 7 8 2 5 -1 11

Solution:

𝑋̅ = [5+1+(-10)+(-6)+5+12+7+8+2+5+(-1)+11]/12 = 39/12 = 3.25

To find the median, we have to sort the data from the smallest value to the largest value:
-10, -6, -1, 1, 2, 5, 5, 5, 7, 8, 11, 12
M = (5+5)/2 = 5
Mo = 5
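
These three results can be cross-checked with Python's standard `statistics` module, which provides `mean`, `median`, and `mode` functions. The variable name `changes` is an assumption of this sketch.

```python
import statistics

# Percent change in net income for the 12 construction companies.
changes = [5, 1, -10, -6, 5, 12, 7, 8, 2, 5, -1, 11]

print(statistics.mean(changes))    # 3.25
print(statistics.median(changes))  # 5.0
print(statistics.mode(changes))    # 5
```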

The Relative Positions of the Mean, Median, and Mode


1- A symmetric distribution has the same shape on either side of the center: if the polygon
were folded in half, the two halves would be identical. For any symmetric unimodal
distribution, the mode, median, and mean are located at the center and are always equal.

2- If a distribution is nonsymmetric, or skewed, the relation among the three measures
changes:

a- In a positively skewed distribution, the arithmetic mean is the largest of the three
measures. The median is generally the next largest measure, and the mode is the smallest
of the three measures.

b- In a negatively skewed distribution, the arithmetic mean is the lowest of the three
measures. The median is greater than the arithmetic mean, and the modal value is the
largest of the three measures.

Measures of Dispersion

A measure of location, such as the mean or the median, only describes the center of the data.
It is valuable from that standpoint, but it does not tell us anything about the spread of the
data. A small value for a measure of dispersion indicates that the data are clustered closely,
say, around the mean, so the mean is considered representative. A large value of the measure
of dispersion indicates that the data are widely scattered, so the mean is not reliable as a
representative value.
If two groups of students have the same average marks, we may like to know whether
one group consists of students of average and near-average intelligence while the other group
is made up of a large number of very bright and very dull students:
Group A: 75 85 95 105 115 125
Group B: 10 20 30 70 190 280

Both A and B have mean 100, yet they are different. Such variation is variously called
dispersion, spread, scatter, or variability. We will consider several measures of dispersion:

The Range

Range: The difference between the largest and the smallest values in a data set. It is the
simplest measure of dispersion.
Range = Largest value – Smallest value
Related measures take the difference between two positional values instead of the extremes:
Interquartile range = 𝑄3 – 𝑄1
Interdecile range = 𝐷9 – 𝐷1
Interpercentile range = 𝑃90 – 𝑃10
The range is a very useful measure in industrial engineering work, especially in statistical
quality control work.
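
As a small illustration, the range of the two student groups (Group A and Group B, both with mean 100) can be computed as follows. The function name `value_range` is an assumption of this sketch.

```python
def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


group_a = [75, 85, 95, 105, 115, 125]
group_b = [10, 20, 30, 70, 190, 280]
print(value_range(group_a))  # 50
print(value_range(group_b))  # 270
```

Although the two groups share the same mean, the much larger range of Group B shows that its marks are far more spread out.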

Mean Deviation
The arithmetic mean of the absolute values of the deviations from the arithmetic mean.

For raw data:

$$MD = \frac{\sum |X - \bar{X}|}{n}$$

Where:
MD: is the mean deviation.
X: is the value of each observation.
𝑋̅: is the arithmetic mean of the values.
n: is the number of observations in the sample.
| |: indicates the absolute value.
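
The definition of the mean deviation for raw data (the mean of the absolute deviations from the arithmetic mean) can be sketched in Python. The function name `mean_deviation` is an assumption of this example; the data are Group A and Group B from the start of this section.

```python
def mean_deviation(values):
    """Mean deviation: the average absolute deviation from the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


group_a = [75, 85, 95, 105, 115, 125]
group_b = [10, 20, 30, 70, 190, 280]
print(mean_deviation(group_a))  # 15.0
print(mean_deviation(group_b))  # 90.0
```

On average, a mark in Group A lies 15 away from the mean of 100, while a mark in Group B lies 90 away from it.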

For grouped data:

$$MD = \frac{\sum_{i=1}^{K} f_i \, |x_i - \bar{X}|}{\sum_{i=1}^{K} f_i}$$

Where:
𝑥𝑖: is the class mid-point
𝑓𝑖: is the class frequency
𝑋̅: is the arithmetic mean of the sample
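
A possible sketch for grouped data is shown below. It assumes the usual frequency-weighted form, in which each absolute deviation of a class mid-point from the mean is multiplied by the class frequency and the total is divided by the number of values. The function name `grouped_mean_deviation` is illustrative, and the data are the mid-points and frequencies of the table used earlier in this lecture (mean 23).

```python
def grouped_mean_deviation(midpoints, frequencies):
    """Mean deviation for grouped data (assumed form): sum(f * |x - mean|) / n."""
    n = sum(frequencies)
    mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n
    return sum(f * abs(x - mean) for x, f in zip(midpoints, frequencies)) / n


midpoints = [12, 17, 22, 27, 32, 37]
freqs = [2, 3, 4, 3, 2, 1]
print(round(grouped_mean_deviation(midpoints, freqs), 3))  # 5.867
```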

Advantages of mean deviation:


1- It uses all the values in the computation.
2- It is easy to understand.
Its main drawback is that it uses absolute values, so the deviations are not taken with their
proper signs.

Variance and Standard Deviation


Variance: The arithmetic mean of the squared deviations from the mean.
Standard Deviation: The square root of the variance.

Population Variance (σ²) for raw data:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

Where:
𝜎2: is the population variance.
X: is the value of an observation in the population.
𝜇: is the arithmetic mean of the population.
N: is the number of observations in the population.

Because the unit of the variance is the square of the unit of the variable, we use the square
root of the variance, which is called the standard deviation (σ):

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$

where 𝜎 is the population standard deviation.
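
The two population formulas can be sketched in Python. The function names `population_variance` and `population_std` are assumptions of this example, and the data (3, 8, 4) are treated here as a complete population purely for illustration.

```python
import math


def population_variance(values):
    """Population variance: the mean of the squared deviations from the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_std(values):
    """Population standard deviation: the square root of the variance."""
    return math.sqrt(population_variance(values))


data = [3, 8, 4]  # mean 5, squared deviations 4, 9, 1
print(round(population_variance(data), 3))  # 4.667
print(round(population_std(data), 3))       # 2.16
```

Python's standard `statistics` module offers `pvariance` and `pstdev` for the same population quantities, which can serve as a cross-check.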
