
CORRESPONDENCE LEARNING MODULE

MELS 1073 (Biostatistics & Epidemiology) – Lecture


AY 2022-2023

Lesson 3: Descriptive Statistics – Part 1

Topic: Measures of Central Tendency and Position

Learning Outcomes: At the end of this module, you are expected to:

1. Explain the uses of the different descriptive statistical tools.
2. Interpret the results of the different statistical tools.

LEARNING CONTENT

Introduction:

The previous lessons discussed the different methods of data collection as well as the different
methods of organizing, summarizing, and presenting data, which include tables and graphs. Although
tables and graphs are extremely useful in data presentation, they do not allow us to make concise,
quantitative statements that characterize a distribution as a whole. To do this, we have to use
descriptive statistics. In describing numerical data, the type of description is determined by the nature
of the data themselves and the objective(s) or purpose(s) of the description. In this topic, we are going
to discuss the measures of central tendency and position. Enjoy learning!

Lesson Proper:

Measures of Central Tendency

Descriptions of statistical data can be quite brief or elaborate depending on the nature of the
data or what we intend to do. Sometimes, presenting data as they are and letting them speak for
themselves may be quite satisfactory, but data further summarized by means of appropriate statistical
descriptions give more useful information. One of these appropriate statistical descriptions is the
measure of central tendency.

The measure of central tendency of a given set of data is the value around which the whole set
of data tends to cluster. It is represented by a single number which summarizes and describes the
whole set.

The most commonly used measures of central tendency are the mean, median, and mode.

The Mean or Average

MELS 1073 – Biostatistics & Epidemiology (Lecture) | 1


The arithmetic mean may be defined as the sum of the individual observed values divided by the
total number of observations. It is a computed average, and its magnitude is influenced by every value
in the set. It is the most commonly used measure of location, but it can be misleading when the
distribution contains extremely large or small values. The formula is:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Where: 𝑋̅ = sample mean; ∑ = symbol for “summation”; 𝑋𝑖 = ith individual observation; n = total
number of observations

Example:

A pediatrician had 9 patients on a particular clinic day. The weights (in kilograms) of her patients
on that day were as follows: 7, 17, 12.6, 15.7, 16, 16, 11.7, 17.5, and 12.6. Compute and interpret the
mean.

Solution:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{7 + 17 + \cdots + 12.6}{9} = \frac{126.1}{9} \approx 14.01\ \text{kg}$$

Interpretation:

On average, the patients weigh 14.01 kg.
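For readers who wish to check the arithmetic by computer, the formula can be sketched in Python. This is only an illustration: the function name `arithmetic_mean` is hypothetical, and Python's standard `statistics.mean` is used alongside it as a cross-check.

```python
from statistics import mean


def arithmetic_mean(values):
    """Sum of the individual observations divided by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty set is undefined")
    return sum(values) / len(values)


# Weights (kg) of the pediatrician's 9 patients in the example
weights = [7, 17, 12.6, 15.7, 16, 16, 11.7, 17.5, 12.6]

x_bar = arithmetic_mean(weights)
print(round(sum(weights), 1))              # 126.1
print(round(x_bar, 2))                     # 14.01
print(abs(x_bar - mean(weights)) < 1e-9)   # True
```

Rounding is applied only for display; the unrounded mean is 126.1/9 = 14.011... kg.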

The Median

The median is the midpoint of the distribution. Half of the values in the distribution fall below the
median and the other half above it. For distributions having an even number of arrayed observed
values, the median is the average of the two middlemost values; but for an odd number of arrayed
observations, it is the middlemost value.

The median is the most appropriate locator of the center since it is resistant to extreme
values. It is a positional average; hence, its value depends on its position relative to the number of
observations in the array and on the number of items in the distribution. The median is sometimes
denoted by 𝑋̃ or Mdn. The steps are:

1. Arrange the observations from lowest to highest or vice versa;
2. Find 𝑋̃.
a. If n/2 is an integer,
$$\tilde{X} = \frac{(n/2)^{\text{th}} + (n/2 + 1)^{\text{th}} \text{ ordered observations}}{2}$$
b. If n/2 is not an integer, 𝑋̃ = the ith ordered observation, where i is the closest integer greater than n/2.
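The two-case rule for the median can be sketched in Python as follows. The function name `median_by_rule` is hypothetical; positions are counted from 1 as in the module, so the code subtracts 1 when indexing Python lists. The data are the weights as arranged in the solution of Example 1 and the lengths of stay in Example 2.

```python
def median_by_rule(values):
    """Median by the two-case rule; positions are counted from 1."""
    if not values:
        raise ValueError("the median of an empty set is undefined")
    ordered = sorted(values)              # step 1: arrange from lowest to highest
    n = len(ordered)
    if n % 2 == 0:                        # case a: n/2 is an integer
        half = n // 2
        # average of the (n/2)th and (n/2 + 1)th ordered observations
        return (ordered[half - 1] + ordered[half]) / 2
    i = n // 2 + 1                        # case b: closest integer greater than n/2
    return ordered[i - 1]                 # the ith ordered observation


weights = [7, 11.7, 12.6, 12.6, 15.7, 16, 16, 17, 17.5, 17.7]   # Example 1, kg
stays = [29, 14, 11, 24, 14, 14, 28, 14, 18, 22, 14]             # Example 2, days

print(round(median_by_rule(weights), 2))  # 15.85
print(median_by_rule(stays))              # 14
```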

Example 1:

A pediatrician had 10 patients on a particular clinic day. The weights (in kilograms) of her patients
on that day were as follows: 7, 17, 12.6, 15.7, 16, 16, 11.7, 17.5, 12.6, and 17.7. Determine the median
weight of the patients. Interpret it.

Solution:

1. Arrange the weights of the patients as follows: 7, 11.7, 12.6, 12.6, 15.7, 16, 16, 17, 17.5, 17.7.
2. Solve for n/2:
$$\frac{n}{2} = \frac{10}{2} = 5$$
3. Since 5 is an integer,
$$\tilde{X} = \frac{5^{\text{th}} + 6^{\text{th}} \text{ ordered observations}}{2} = \frac{15.7 + 16.0}{2} = 15.85\ \text{kg}$$

Interpretation:

Half of the patients weigh less than or equal to 15.85 kg while the other half weigh more than
15.85 kg.

Example 2:

Consider the 11 patients admitted to the psychiatric ward of a general hospital, whose lengths of stay
(in days) were as follows: 29, 14, 11, 24, 14, 14, 28, 14, 18, 22, 14. Find and interpret the median
length of stay of the patients.

Solution:

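1. Arrange the lengths of stay as follows: 11, 14, 14, 14, 14, 14, 18, 22, 24, 28, 29.
2. Solve for n/2. Since it is not an integer, take the next ordered position:
$$\frac{n}{2} = \frac{11}{2} = 5.5 \;\Rightarrow\; \tilde{X} = 6^{\text{th}} \text{ ordered observation} = 14\ \text{days}$$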

Interpretation:

Half of the patients' lengths of stay are less than or equal to 14 days, while the other half are more
than 14 days.

The Mode

The mode, denoted by 𝑥̂, of a given set of ungrouped data is the value that occurs most frequently.
The mode is not a unique measure since two or more values may share the highest frequency in a given
distribution.
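The following Python sketch finds every mode of a data set, so that bimodal data return both values. The function name `modes` is hypothetical, and returning an empty list when no value repeats is an assumed way of expressing that the mode does not always exist. Python's `statistics.multimode` is similar, but it returns every value when none repeats. The data are the ten pediatric weights as arranged in Example 1 of the median.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency, in ascending order.

    An empty list means no value repeats, i.e. there is no mode
    (an assumed convention for this sketch).
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)


weights = [7, 11.7, 12.6, 12.6, 15.7, 16, 16, 17, 17.5, 17.7]   # kg
print(modes(weights))      # [12.6, 16]
print(modes([1, 2, 3]))    # []
```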

Example:

A pediatrician had 10 patients on a particular clinic day. The weights (in kilograms) of her patients
on that day were as follows: 7, 17, 12.6, 15.7, 16, 16, 11.7, 17.5, 12.6, and 17.7. Find the modal weight
of the patients. Interpret it.

Solution:
By inspection, the most frequent weights of the patients are 12.6 kg and 16 kg, each occurring twice;
hence the distribution is bimodal, with modes

𝑥̂ = 12.6 kg and 16 kg

Interpretation:

The most common weights of the patients are 12.6 kg and 16 kg.

Comparing the Mean, the Median, and the Mode

1. The mean is the most frequently used measure of location since it reflects every value and
has the characteristics of simplicity, uniqueness, and stability from sample to sample in a
distribution. However, when the distribution contains very large or very small values, it
can be misleading. The median, on the other hand, is the most appropriate locator
of central tendency in such cases since it is the midpoint of the distribution and is not influenced by
extreme values, large or small, but by the number of observations in a given set.
2. In a symmetrical distribution (normal curve), where there is only one mode, the mean, the
median, and the mode have equal values and coincide at the highest point on the graph
and they all lie on the axis of symmetry.
3. In every distribution, the mean and the median each have a unique value, while the mode,
unlike the mean and the median, does not always exist and is not always unique: two
or more values may share the highest frequency in a given distribution.
4. In asymmetrical (skewed) distributions, the positions of these measures vary. In a negatively
skewed (skewed to the left) distribution, the median lies to the left of the mode and the
mean to the left of the median, while in a positively skewed (skewed to the right) distribution, the
median lies to the right of the mode and the mean to the right of the median.
5. The mean is the most significant and widely used measure of average. The median, on
the other hand, can be determined even for qualitative data as long as they can be
ordered, while the mode is most preferable in getting the most typical average, since it
is the value that occurs most frequently in a series.
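Item 1 can be illustrated with the lengths of stay from Example 2 of the median. In this sketch the 29-day stay is replaced by a hypothetical value of 290 days (not part of the original data) to show that the mean shifts sharply while the median does not.

```python
from statistics import mean, median

stays = [29, 14, 11, 24, 14, 14, 28, 14, 18, 22, 14]   # days, Example 2 of the median
# Hypothetical: the 29-day stay replaced by an extreme value of 290 days
stays_with_outlier = [290 if d == 29 else d for d in stays]

print(round(mean(stays), 2), median(stays))                            # 18.36 14
print(round(mean(stays_with_outlier), 2), median(stays_with_outlier))  # 42.09 14
```

Only one of the eleven values changed, yet the mean more than doubled; the median stayed at the 6th ordered observation.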

Uses of the Mean, Median, and Mode

Mean. The mean is used for

1. Higher statistical computations;
2. Distributions where there are no extreme values, since it is easily affected by extremely
high or low values;
3. Distributions requiring the greatest reliability, since it includes all the given values in
its computation; and
4. Interval and ratio measurements.

Median. The median is used for

1. Ordinal and ranked measurements;
2. Distributions which are markedly skewed;
3. Distributions where the highest or lowest class interval or both are not defined, that is,
distributions using open-ended classes such as 100 and below or 60 and above; and
4. Determination of whether the cases fall within the lower half or the upper half of the
distribution (appropriate locator of central tendency).

Mode. The mode is used for

1. Nominal data;
2. Solving for the most typical average, since it is the value that occurs most frequently in a series;
and
3. A quick or rough estimate of a central value.

Limitations of the Mean, Median, and Mode

Mean. The limitations of the mean:

1. The mean is easily affected by extremely large or extremely small values;
2. When the clustering of values is not substantial, the resulting mean can be very
misleading. For example, the mean of 20 and 110, which are very far apart, is 65, a value close to
neither of them;
3. It is a poor measure of central position when the given values do not tend to cluster around
a central value.
4. It cannot be used by itself as a means of comparison between two or more distributions, since
they may have the same mean but entirely different other characteristics. For
example, the distributions 50, 55, 60 and 56, 55, 54 have the same mean (55) but have different
patterns of dispersion.

Median. The limitations of the median:

1. It is affected by the number of observations in a given set.
2. It cannot be determined unless the given set of observations is arrayed.
3. If there are many observations in a distribution, it becomes laborious to array them. Its
value is not as accurate as the mean since it is only an ordinal number.

Mode. The limitations of the mode:

1. It does not always exist.
2. It is not a unique value since two or more values may share the highest frequency in a given
distribution.
3. Its value easily changes since it depends on the method used in finding it.
4. It is just a rough estimate.

Skewness in Relation to the Mean, Median, and Mode

The mean, median and mode can describe the characteristics of a given distribution.

1. Symmetrical distribution (normal curve). There is only one mode; the mean, the median, and the
mode have equal values and coincide at the highest point on the graph.
2. Positively skewed distribution (skewed to the right). The distribution has some extremely large
values, so its tail extends to the right.
3. Negatively skewed distribution (skewed to the left). The distribution has some extremely small
values, so its tail extends to the left.

Whether the curve is symmetrical, positively skewed, or negatively skewed, the area under the
curve to the left of the median is equal to the area to the right of it; and no matter what the shape of
the curve, the mode is always located at the highest point.
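As a rough check of skewness, the position of the mean relative to the median can be compared, following the ordering described in the comparison of the three measures. The sketch below is a heuristic only, not a formal skewness coefficient, and it can mislead for some distributions; the function name `skew_direction` and the tolerance `tol` are hypothetical. It uses the weights from the mean example and the lengths of stay from Example 2 of the median.

```python
from statistics import mean, median


def skew_direction(values, tol=1e-9):
    """Classify skewness by whether the mean lies right or left of the median (heuristic)."""
    m, md = mean(values), median(values)
    if abs(m - md) <= tol:
        return "approximately symmetrical"
    return "positively skewed" if m > md else "negatively skewed"


stays = [29, 14, 11, 24, 14, 14, 28, 14, 18, 22, 14]    # days
weights = [7, 17, 12.6, 15.7, 16, 16, 11.7, 17.5, 12.6]  # kg

print(skew_direction(stays))    # positively skewed (mean 18.36 > median 14)
print(skew_direction(weights))  # negatively skewed (mean 14.01 < median 15.7)
```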

Measures of Position

The quantiles or fractiles are point measures. They include quartiles, deciles, and percentiles, which
divide the distribution into a given number of equal parts. The most commonly and widely used quantile
is the percentile. Quartiles and deciles are seldom used.

Quartiles

Quartiles are measures that divide the observations into four equal parts. Twenty-five percent
(25%) falls below the first quartile, fifty percent (50%) is below the second quartile, and seventy-five
percent (75%) is below the third quartile. Quartiles are computed in the same way as the median is
computed, since the second quartile is the same as the median. The interpretation of the obtained value
of the quartiles follows the interpretation of the median value.

The steps in finding the quartiles from raw data are as follows:

1. Arrange the observations from lowest to highest.
2. Determine 𝑄𝑘, where 𝑄𝑘 is the kth quartile and k = 1, 2, 3.
a. If nk/4 is an integer,
$$Q_k = \frac{(nk/4)^{\text{th}} + (nk/4 + 1)^{\text{th}} \text{ ordered observations}}{2}$$
b. If nk/4 is not an integer, 𝑄𝑘 = the ith ordered observation, where i is the closest integer greater
than nk/4.
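The quartile rule can be sketched in Python, using the ages from the example that follows. The function name `quartile` is hypothetical. Testing whether nk/4 is an integer with the remainder `nk % 4` keeps the check exact. Python's `statistics.quantiles` uses an interpolation method by default and may return values that differ from this rule.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) by the two-case rule; positions count from 1."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2, or 3")
    ordered = sorted(values)              # step 1: arrange from lowest to highest
    if not ordered:
        raise ValueError("quartiles of an empty set are undefined")
    nk = len(ordered) * k                 # the position is nk / 4
    if nk % 4 == 0:                       # case a: nk/4 is an integer
        p = nk // 4
        # average of the (nk/4)th and (nk/4 + 1)th ordered observations
        return (ordered[p - 1] + ordered[p]) / 2
    i = nk // 4 + 1                       # case b: closest integer greater than nk/4
    return ordered[i - 1]


ages = [15, 31, 75, 84, 19, 79, 74, 78, 79, 29]   # ages (years) of 10 patients
print([quartile(ages, k) for k in (1, 2, 3)])    # [29, 74.5, 79]
```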
Example:

The individual ages (in years) of 10 patients entering the general hospital are as follows: 15, 31,
75, 84, 19, 79, 74, 78, 79, and 29. Determine the quartiles.

Solution:

1. Arrange their individual ages as 15, 19, 29, 31, 74, 75, 78, 79, 79, and 84.
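2. Solve for the quartiles:
$$\frac{nk}{4} = \frac{(10)(1)}{4} = 2.5 \;\Rightarrow\; Q_1 = 3^{\text{rd}} \text{ ordered observation} = 29$$
$$\frac{nk}{4} = \frac{(10)(2)}{4} = 5 \;\Rightarrow\; Q_2 = \frac{5^{\text{th}} + 6^{\text{th}} \text{ ordered observations}}{2} = \frac{74 + 75}{2} = 74.5$$
$$\frac{nk}{4} = \frac{(10)(3)}{4} = 7.5 \;\Rightarrow\; Q_3 = 8^{\text{th}} \text{ ordered observation} = 79$$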

Interpretation:

𝑄1 = 29 means that one-fourth of the patients are of ages lower than or equal to 29 years, while
three-fourths are older. 𝑄2 = 74.5 means that half of the patients are of ages lower than or equal to
74.5 years, while the other half fall above it. 𝑄3 = 79 means that three-fourths of the patients are of
ages lower than or equal to 79 years, while one-fourth are older.

Deciles

The deciles are measures of position that divide the total number of observations into ten (10)
equal parts. There are nine (9) deciles. Ten percent (10%) falls below the first decile; 20% falls below
the second decile; 30% falls below the third decile; and so on. The fifth decile is the same as the median.
Thus, deciles are computed in exactly the same manner as the median. The interpretation
is the same as that of the median or the quartiles.

The steps in finding the deciles from raw data are as follows:

1. Arrange the observations from lowest to highest.
2. Determine 𝐷𝑘, where 𝐷𝑘 is the kth decile and k = 1, 2, 3, …, 9.
a. If nk/10 is an integer,
$$D_k = \frac{(nk/10)^{\text{th}} + (nk/10 + 1)^{\text{th}} \text{ ordered observations}}{2}$$
b. If nk/10 is not an integer, 𝐷𝑘 = the ith ordered observation, where i is the closest integer greater
than nk/10.
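Because deciles follow the same rule as the quartiles and the median, one hypothetical function, `fractile`, can cover all three by changing the number of equal parts (`parts`). The sketch below applies it to the survival times in the example that follows; it is an illustration of the module's rule, not a library routine.

```python
def fractile(values, k, parts):
    """k-th fractile when the data are split into `parts` equal parts.

    parts=10 gives deciles, parts=4 quartiles, and parts=2 with k=1 the median.
    Positions are counted from 1.
    """
    if not 1 <= k < parts:
        raise ValueError("k must satisfy 1 <= k < parts")
    ordered = sorted(values)              # step 1: arrange from lowest to highest
    if not ordered:
        raise ValueError("fractiles of an empty set are undefined")
    nk = len(ordered) * k                 # the position is nk / parts
    if nk % parts == 0:                   # case a: nk/parts is an integer
        p = nk // parts
        return (ordered[p - 1] + ordered[p]) / 2
    i = nk // parts + 1                   # case b: closest integer greater than nk/parts
    return ordered[i - 1]


survival = [135, 42, 32, 47, 59, 90, 86, 75, 96, 105]   # days after surgery, Hospital Y
print({f"D{k}": fractile(survival, k, 10) for k in (1, 5, 7, 9)})
# {'D1': 37.0, 'D5': 80.5, 'D7': 93.0, 'D9': 120.0}
```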
Example:
The survival times (in days) of 10 patients after surgery at Hospital Y are as follows: 135, 42, 32,
47, 59, 90, 86, 75, 96, and 105. Determine and interpret 𝐷1, 𝐷5, 𝐷7, and 𝐷9.

Solution:

1. Arrange the survival times of 10 patients as 32, 42, 47, 59, 75, 86, 90, 96, 105, 135.
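2. Solve for the deciles:
$$\frac{nk}{10} = \frac{(10)(1)}{10} = 1 \;\Rightarrow\; D_1 = \frac{1^{\text{st}} + 2^{\text{nd}} \text{ ordered observations}}{2} = \frac{32 + 42}{2} = 37$$
$$\frac{nk}{10} = \frac{(10)(5)}{10} = 5 \;\Rightarrow\; D_5 = \frac{5^{\text{th}} + 6^{\text{th}} \text{ ordered observations}}{2} = \frac{75 + 86}{2} = 80.5$$
$$\frac{nk}{10} = \frac{(10)(7)}{10} = 7 \;\Rightarrow\; D_7 = \frac{7^{\text{th}} + 8^{\text{th}} \text{ ordered observations}}{2} = \frac{90 + 96}{2} = 93$$
$$\frac{nk}{10} = \frac{(10)(9)}{10} = 9 \;\Rightarrow\; D_9 = \frac{9^{\text{th}} + 10^{\text{th}} \text{ ordered observations}}{2} = \frac{105 + 135}{2} = 120$$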

Interpretation:

𝐷1 = 37 means that 10% of the survival times of patients fall below or equal to 37 days while the
remaining 90% of patients fall above it. 𝐷5 = 80.5 means that half of the survival times of patients fall
below or equal to 80.5 days while the other half fall above it. 𝐷7 = 93 means that 70% of the survival
times of patients fall below or equal to 93 days while the remaining 30% of patients fall above it. 𝐷9 =
120 means that 90% of the survival times of patients fall below or equal to 120 days while the remaining
10% of patients fall above it.

Percentile

The percentiles are measures of position which divide the total number of observations into
exactly one hundred equal parts. There are 99 percentiles that determine the points below which
given percentages of observations fall. For example, the seventh percentile indicates that 7% of
the observations in the distribution lie at or below it while 93% lie above it.

The steps in finding the percentiles from raw data are as follows:

1. Arrange the observations from lowest to highest.
2. Determine 𝑃𝑘, where 𝑃𝑘 is the kth percentile and k = 1, 2, 3, …, 99.
a. If nk/100 is an integer,
$$P_k = \frac{(nk/100)^{\text{th}} + (nk/100 + 1)^{\text{th}} \text{ ordered observations}}{2}$$
b. If nk/100 is not an integer, 𝑃𝑘 = the ith ordered observation, where i is the closest integer greater
than nk/100.
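The percentile rule can be sketched the same way. The function name `percentile` is hypothetical; integer arithmetic on nk again decides between the two cases exactly. The data are the lengths of service of the nine faculty members in the example that follows.

```python
def percentile(values, k):
    """k-th percentile (k = 1, ..., 99) by the two-case rule; positions count from 1."""
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    ordered = sorted(values)              # step 1: arrange from lowest to highest
    if not ordered:
        raise ValueError("percentiles of an empty set are undefined")
    nk = len(ordered) * k                 # the position is nk / 100
    if nk % 100 == 0:                     # case a: nk/100 is an integer
        p = nk // 100
        return (ordered[p - 1] + ordered[p]) / 2
    i = nk // 100 + 1                     # case b: closest integer greater than nk/100
    return ordered[i - 1]


service = [10, 14, 22, 17, 15, 25, 22, 34, 30]   # years of service, 9 faculty members
for k in (5, 50, 95):
    print(f"P{k} =", percentile(service, k))
# P5 = 10
# P50 = 22
# P95 = 34
```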
Example:

Determine 𝑃5, 𝑃50, and 𝑃95 of the lengths of service (in years) of nine faculty members at the
College of Medicine: 10, 14, 22, 17, 15, 25, 22, 34, and 30. Interpret.

Solution:

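1. Arrange the lengths of service (in years) of the nine faculty members as 10, 14, 15, 17, 22,
22, 25, 30, and 34.
2. Solve for the percentiles:
$$\frac{nk}{100} = \frac{(9)(5)}{100} = 0.45 \;\Rightarrow\; P_5 = 1^{\text{st}} \text{ ordered observation} = 10$$
$$\frac{nk}{100} = \frac{(9)(50)}{100} = 4.5 \;\Rightarrow\; P_{50} = 5^{\text{th}} \text{ ordered observation} = 22$$
$$\frac{nk}{100} = \frac{(9)(95)}{100} = 8.55 \;\Rightarrow\; P_{95} = 9^{\text{th}} \text{ ordered observation} = 34$$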

Interpretation:

𝑃5 = 10 means that 5% of the length of service in years of faculty members of the College of
Medicine fall below or equal to 10 years while the remaining 95% of faculty members fall above it. 𝑃50
= 22 means that half of the length of service in years of the faculty members of the College of Medicine
fall below or equal to 22 years while the other half fall above it. 𝑃95 = 34 means that 95% of the length
of service in years of the faculty members of the College of Medicine fall below or equal to 34 years
while the remaining 5% of faculty members fall above it.

*** END of LESSON 3***
