
TOPIC 3

Numerical
Descriptive
Techniques

Measures of Central Tendency


(Ukuran Kecenderungan Memusat)

 An attribute of a distribution concerning where the values


of the distribution tend to congregate (berkumpul).

 Arithmetic Mean (min aritmetik) for ungrouped data:

Population mean (min populasi)

$\mu = \frac{\sum x_i}{N}$

Sample mean (min sampel)

$\bar{x} = \frac{\sum x_i}{n}$

$x_i$ = observation i
N = total observations in the population
n = total observations in the sample

Mean, Median and Mode

Mean (Min):

The mean is the average (purata) value of the data distribution.

Median:

The median is the midpoint of the data values after they have
been ordered from the smallest to the largest

Mode (Mod):

The mode is the value of the observation that appears most


frequently

Properties of the Mean,


Median & Mode

 Every set of interval and ratio-level data has a mean. It is


the most popular & useful measure of central tendency.
However, it is affected by outliers (nilai terpencil)
(i.e. unusually large or small data values).

 Median can be computed for ratio, interval, and ordinal-


level data. It is not affected by outliers.

 Mode can be computed for all levels of data. A set of


data may have more than one mode or there may be
no mode if no one value appears more than any other.
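
The properties above can be checked on a small data set. The sketch below uses Python's standard `statistics` module on hypothetical values (the numbers are not from the slides) and shows how a single outlier shifts the mean much more than the median or the mode.

```python
from statistics import mean, median, multimode

# Hypothetical values, chosen only for illustration.
data = [2, 3, 3, 5, 7]
with_outlier = data + [100]  # add one unusually large value

base = (mean(data), median(data), multimode(data))
shifted = (mean(with_outlier), median(with_outlier), multimode(with_outlier))

print(base)     # mean, median and mode(s) without the outlier
print(shifted)  # the mean moves far more than the median or the mode

# A data set can have more than one mode.
two_modes = multimode([1, 1, 2, 2, 3])
print(two_modes)
```

Note that `multimode` returns every value when no value repeats, which corresponds to the "no mode" case described above.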

Distribution Shape & Measures
of Central Tendency

[Figure: three distribution shapes with the positions of the mean, median and mode marked]

Left skewed (pencong kiri): Mean < Median < Mode
Symmetry (simetri): Mean = Median = Mode
Right skewed (pencong kanan): Mode < Median < Mean

Refer to Time Data in Topic 2

a) $\sum x_i = 4171.6$

Mean, $\bar{x} = \frac{4171.6}{80} = 52.145$

b) The location of the median is $\frac{80 + 1}{2} = 40.5$

40th observation = 53.2 ; 41st observation = 53.8

Median $= \frac{53.2 + 53.8}{2} = 53.5$

c) Mode = 62.2 (frequency of occurrence = 3)
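
The arithmetic in parts (a) and (b) can be reproduced with a few lines of Python. This is only a sketch: the raw Time Data are not listed here, so it starts from the summary figures quoted above (n = 80, the sum 4171.6, and the 40th and 41st ordered observations).

```python
n = 80            # number of observations
sum_x = 4171.6    # sum of the observations

mean_time = sum_x / n

# Position of the median in the ordered data: (n + 1) / 2.
median_location = (n + 1) / 2

# A location of 40.5 falls halfway between the 40th and 41st ordered values.
obs_40, obs_41 = 53.2, 53.8
median_time = (obs_40 + obs_41) / 2

print(round(mean_time, 3), median_location, round(median_time, 1))
```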

Interpretation of mean, median, mode

• On average, the amount of time needed to complete a


critical task on an assembly line for each worker was
52.145 seconds.

• 50% of the workers used more than 53.5 seconds to


complete a critical task on an assembly line.

• The most frequently observed time needed to complete a
critical task on an assembly line was 62.2 seconds
(it occurred 3 times among the 80 workers).

• The distribution of the amount of time needed to complete
a critical task on an assembly line was left-skewed,
since mean < median < mode.

Weighted Mean

Instead of each data point contributing equally to the final mean,
the weighted mean (min berpemberat) is calculated by giving
values in a data set different weights (pemberat) according to
some attribute of the data. These weights determine the
relative importance of each value in the average.

The weighted mean of a set of numbers x1 , x2 , ... , xn with
corresponding weights w1 , w2 , ... , wn is computed from the
following formula:

$\mu \text{ or } \bar{x} = \frac{\sum w_i x_i}{\sum w_i}$

Example 1:

All the purchases of a raw material over the past three months are
shown in the following table.

Purchase              1      2      3      4      5
Cost per pound ($)    3.00   3.40   2.80   2.90   3.25
Number of pounds      1200   500    2750   1000   800

Compute the mean cost per pound.

$\mu = \frac{1200(3.00) + 500(3.40) + \cdots + 800(3.25)}{1200 + 500 + 2750 + 1000 + 800} = \$2.96$
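
The same calculation can be sketched in Python. The function name `weighted_mean` is a hypothetical helper introduced for illustration; the data are the costs and quantities from the table in Example 1.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

costs = [3.00, 3.40, 2.80, 2.90, 3.25]   # cost per pound ($)
pounds = [1200, 500, 2750, 1000, 800]    # weights: number of pounds bought

mean_cost = weighted_mean(costs, pounds)
simple_mean = sum(costs) / len(costs)    # ignores the purchase sizes

print(round(mean_cost, 2), round(simple_mean, 2))
```

The unweighted mean of the five prices is 3.07, higher than the weighted 2.96, because the largest purchase (2750 pounds) was made at the lowest price.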

Geometric Mean

The geometric mean (min geometri) of a set of numbers

x1 , x2 , ... , xn

is computed from the following formula:

$\bar{x}_G = (x_1 \cdot x_2 \cdot \ldots \cdot x_n)^{1/n}$

It is the n-th root of the product of the n values.

The geometric mean is used to measure the average growth
rate (kadar pertumbuhan purata) or average rate of change
(kadar perubahan purata) in a variable over time.

Geometric Mean of Growth Rate

Let Ri denote the growth rate / rate of change (expressed in


decimal form) in period i , for i = 1, 2, ... , n

The geometric mean, RG , of the growth rate / rate of change is
defined such that

$(1 + R_G)^n = (1 + R_1)(1 + R_2) \cdots (1 + R_n)$

Solving for RG , we obtain the following formula:

$R_G = \sqrt[n]{(1 + R_1)(1 + R_2) \cdots (1 + R_n)} - 1$

The average growth rate / average rate of change is expressed
as a percentage.

Example 2:

Suppose you made a 6-year investment of RM5000, and the
annual rates of return (in percent) over the past six years were
-22.1, 28.7, 4.9, 0.5, -1.3 and 15.8 respectively. What is the
average annual rate of return over this period?

$R_G = \sqrt[6]{0.779 \times 1.287 \times 1.049 \times 1.005 \times 0.987 \times 1.158} - 1 = 0.032$ or 3.2%

The value at the end of the investment period is

$5000(1 + R_G)^6 = 5000(1 + 0.032)^6 \approx$ RM6040
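
Example 2 can be reproduced with a short Python sketch. The helper name `geometric_mean_rate` is an assumption made for illustration; it implements the formula for RG given earlier and expects rates in decimal form.

```python
import math

def geometric_mean_rate(rates):
    """Geometric mean of growth rates given in decimal form."""
    growth_factors = [1 + r for r in rates]
    if any(g <= 0 for g in growth_factors):
        raise ValueError("each 1 + R_i must be positive")
    return math.prod(growth_factors) ** (1 / len(rates)) - 1

returns_pct = [-22.1, 28.7, 4.9, 0.5, -1.3, 15.8]  # annual returns in percent
rates = [p / 100 for p in returns_pct]

r_g = geometric_mean_rate(rates)
final_value = 5000 * (1 + r_g) ** len(rates)       # value after six years

print(f"R_G = {r_g:.3f}, final value = RM{final_value:.0f}")
```

Compounding the initial RM5000 at RG for six periods gives the same final value as applying the six yearly returns one after another, which is what defines RG.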

Arithmetic Mean of Growth Rate

• The arithmetic mean of returns (or growth rates) is
the appropriate mean to calculate if we wish to
estimate the mean rate of return (or growth rate) for
any single period in the future.

• The arithmetic mean for Example 2 is 4.42%.

• This indicates that the estimated rate of return for the
coming year is 4.42%.
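
The sketch below computes the arithmetic mean of the Example 2 returns and, as an illustrative side calculation not taken from the slides, compounds RM5000 at that rate for six years. It suggests why the arithmetic mean is kept for single-period estimates: compounding it over several periods overstates the value actually reached.

```python
import math

returns_pct = [-22.1, 28.7, 4.9, 0.5, -1.3, 15.8]  # annual returns in percent

# Arithmetic mean: the single-period estimate.
arithmetic_mean_pct = sum(returns_pct) / len(returns_pct)

# Value actually reached by applying the six yearly returns in turn.
growth_factors = [1 + p / 100 for p in returns_pct]
actual_final = 5000 * math.prod(growth_factors)

# Value obtained by (incorrectly) compounding the arithmetic mean.
compounded_at_arithmetic = 5000 * (1 + arithmetic_mean_pct / 100) ** len(returns_pct)

print(round(arithmetic_mean_pct, 2))
print(round(actual_final), round(compounded_at_arithmetic))
```

Compounding at the arithmetic mean gives about RM6480, against roughly RM6040 actually reached.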

Measures of Dispersion
(Ukuran Serakan)

An attribute of a distribution concerning the spread (sebaran)


of the values from the mean.

[Figure: Distribution A and Distribution B plotted on the same scale (11 to 21). Both distributions have mean = 15.5.]

Range

 The range (julat) is the distance between the smallest


and the largest data value in the set

 range = largest observation – smallest observation

 Its major shortcoming is its failure to provide


information on the dispersion of the observations
between the two end points

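Variance

• Variance (varians) is one of the most frequently used
measures of dispersion.

• population: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{\sum x_i^2}{N} - \mu^2$

• sample: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right] = \frac{1}{n - 1}\left[\sum x_i^2 - n\bar{x}^2\right]$

$(x_i - \mu)^2$ and $(x_i - \bar{x})^2$ are known as squared deviations
(sisihan kuasa dua).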

Standard Deviation
(Sisihan Piawai)

• has the same units as the original data

• cannot be negative

• most widely reported measure of dispersion

• $\sigma = \sqrt{\sigma^2}$ or $s = \sqrt{s^2}$

Refer to Time Data in Topic 2

a) Range = 80.3 – 19.3 = 61

b) $\sum x_i = 4171.6$ and $\sum x_i^2 = 233098.92$

Variance, $s^2 = \frac{1}{79}\left[233098.92 - \frac{(4171.6)^2}{80}\right] = 197.0992$

c) Standard deviation, $s = \sqrt{197.0992} = 14.0392$
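
The shortcut formula for the sample variance needs only n, the sum of the observations and the sum of their squares. The following sketch applies it to the summary figures above; the function name `sample_variance_from_sums` is hypothetical, and the small list `check` is made-up data used only to compare the shortcut with `statistics.variance`.

```python
import math
from statistics import variance

def sample_variance_from_sums(n, sum_x, sum_x2):
    """Shortcut formula: [sum(x^2) - (sum x)^2 / n] / (n - 1)."""
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Summary figures quoted for the Time Data.
s2 = sample_variance_from_sums(80, 4171.6, 233098.92)
s = math.sqrt(s2)
print(round(s2, 4), round(s, 4))

# Cross-check on hypothetical data: the shortcut and the definition agree.
check = [2, 4, 4, 5]
shortcut = sample_variance_from_sums(len(check), sum(check), sum(x * x for x in check))
by_definition = variance(check)
```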

Excel Output

Data → Data Analysis

Time

Mean                 52.145
Standard Error       1.5696
Median               53.5
Mode                 62.2
Standard Deviation   14.0392
Sample Variance      197.0992
Kurtosis             -0.4658
Skewness             -0.2746
Range                61
Minimum              19.3
Maximum              80.3
Sum                  4171.6
Count                80

Note: Excel only shows the smallest data value if there is


more than one mode in the data.

Chebysheff’s Theorem
(Teorem Chebysheff)

• Knowing the mean and standard deviation allows us to extract
useful bits of information. A more general interpretation
of the standard deviation is derived from Chebysheff’s
Theorem, which applies to distributions of any shape,
including skewed distributions.

• For any set of observations, at least $\left(1 - \frac{1}{k^2}\right) \times 100\%$ of
the data values lie within k standard deviations of the
mean, $\mu \pm k\sigma$, for k > 1
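
The bound can be sketched as three small Python helpers. All function names are hypothetical, and the mean of 70 and standard deviation of 6 are the midterm-mark figures used in this topic.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def k_for_min_proportion(p):
    """Invert 1 - 1/k^2 = p to find k, for 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return (1 / (1 - p)) ** 0.5

def interval(mean, std_dev, k):
    """Return (mean - k*sd, mean + k*sd)."""
    return mean - k * std_dev, mean + k * std_dev

k = k_for_min_proportion(0.36)         # "at least 36%"
low, high = interval(70, 6, k)

k2 = (84.4 - 70) / 6                   # interval (55.6, 84.4) around 70
bound = chebyshev_min_proportion(k2)

print(round(k, 2), round(low, 1), round(high, 1))
print(round(k2, 1), round(bound * 100, 2))
```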

Example 3:

Suppose that the mean and standard deviation of midterm test
marks are 70 and 6, respectively. Assume that the distribution is
skewed.

a) At least 36% of the students score between what two
values?

Answer: $1 - \frac{1}{k^2} = 0.36$ gives k = 1.25, so at least 36% of the students score
between 70 − 1.25(6) = 62.5 and 70 + 1.25(6) = 77.5

b) What is the minimum proportion of students who score
between 55.6 and 84.4?

$\mu \pm k\sigma = (55.6, 84.4)$ gives k = 2.4 (Answer: at least $1 - \frac{1}{2.4^2}$ = 82.64%)

Empirical Rule (Peraturan Empirikal)

If the distribution is bell-shaped, the Empirical Rule
states that

• approximately 68% of all observations fall
within $\mu \pm 1\sigma$

• approximately 95% of all observations fall
within $\mu \pm 2\sigma$

• approximately 99.7% of all observations fall
within $\mu \pm 3\sigma$

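Illustration of Empirical Rule

[Figure: bell-shaped curve with the horizontal axis marked at μ − 3σ, μ − 2σ, μ − 1σ, μ, μ + 1σ, μ + 2σ and μ + 3σ]

Area between μ − 3σ and μ − 2σ: 2.35%
Area between μ − 2σ and μ − 1σ: 13.5%
Area between μ − 1σ and μ: 34%
Area between μ and μ + 1σ: 34%
Area between μ + 1σ and μ + 2σ: 13.5%
Area between μ + 2σ and μ + 3σ: 2.35%

Within μ ± 1σ: 68%
Within μ ± 2σ: 95%
Within μ ± 3σ: 99.7%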

Refer to Example 3

Suppose that the mean and standard deviation of midterm test
marks are 70 and 6, respectively. Assume that the distribution is
bell-shaped.

a) Approximately 95% of the students score between what
two values?

Answer: Approximately 95% of the students score
between 70 − 2(6) = 58 and 70 + 2(6) = 82

b) What proportion of students score (i) more than 88,
and (ii) between 52 and 76?

Answer: (i) 88 = μ + 3σ, so (100% − 99.7%)/2 = 0.15%, and (ii) 52 = μ − 3σ and 76 = μ + 1σ, so 2.35% + 13.5% + 34% + 34% = 83.85%
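
The answers above can be reproduced by adding up the band percentages of the Empirical Rule. The sketch below is a simple lookup, not a normal-distribution calculation; the names `BANDS`, `z_score` and `percent_between` are hypothetical, and the lookup only handles limits that fall exactly on a whole number of standard deviations between −3 and 3.

```python
# Approximate percentage of observations in each one-sigma band.
BANDS = {
    (-3, -2): 2.35, (-2, -1): 13.5, (-1, 0): 34.0,
    (0, 1): 34.0, (1, 2): 13.5, (2, 3): 2.35,
}

def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

def percent_between(z_low, z_high):
    """Sum the bands lying between two whole-number z-scores."""
    return sum(pct for (a, b), pct in BANDS.items() if z_low <= a and b <= z_high)

mean, sd = 70, 6

within_2sd = percent_between(-2, 2)                       # part (a)
above_88 = (100 - percent_between(-3, 3)) / 2             # part (b)(i): upper tail
between_52_76 = percent_between(z_score(52, mean, sd),
                                z_score(76, mean, sd))    # part (b)(ii)

print(round(within_2sd, 2), round(above_88, 2), round(between_52_76, 2))
```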

Coefficient of Variation, CV
(Pekali Variasi)

• The coefficient of variation is the ratio of the standard
deviation to the mean, expressed as a percentage

$CV = \frac{\sigma}{\mu} \times 100$ (population)

$CV = \frac{s}{\bar{x}} \times 100$ (sample)

• it measures the degree to which an
observed variable deviates from its average value

• it provides a proportionate measure of variation that
measures the relative dispersion (serakan relatif) of
the data

• useful for comparing the dispersion of two distributions that
have different units of measurement or whose data values (or
descriptive statistics) differ substantially in magnitude

• useful in applications such as comparing
stocks and other investment portfolios because it is a
way to gauge the risk involved with the holdings in
the portfolio

Example 4:

Compare the dispersion of the following data.

             HDI           LEXP         YSCH
             (in points)   (in years)   (in years)
Minimum      0.3539        52.2         4.87
Maximum      0.9525        84.1         22.92
Mean         0.709         72.159       13.255
Std. dev.    0.152         7.616        2.930

Answer:

• The units of measurement of the Human Development Index
(HDI) and life expectancy at birth (LEXP) are different.

From the relative dispersion perspective, the dispersion
of the Human Development Index (21.5%) is greater
than that of life expectancy at birth (10.6%).

• The data values and descriptive statistics of life expectancy
at birth (LEXP) and expected years of schooling (YSCH)
differ substantially in magnitude.

In relative terms, the dispersion of expected years of schooling
(22.1%) is greater than the dispersion of life
expectancy at birth (10.6%).
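
The coefficients of variation quoted in the answer can be recomputed from the table in Example 4. In this sketch the helper name `coefficient_of_variation` is hypothetical. Because the table shows rounded means and standard deviations, the HDI figure comes out near 21.4% rather than the quoted 21.5%; the difference is presumably due to rounding.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100

# (mean, standard deviation) as shown in the Example 4 table.
summary = {
    "HDI": (0.709, 0.152),
    "LEXP": (72.159, 7.616),
    "YSCH": (13.255, 2.930),
}

cv = {name: coefficient_of_variation(sd, m) for name, (m, sd) in summary.items()}

for name, value in cv.items():
    print(f"{name}: {value:.1f}%")
```

Although LEXP has by far the largest standard deviation in absolute terms, it has the smallest CV once each spread is scaled by its mean.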

Measures of Relative Standing
(Kedudukan Relatif)

 Method to determine the position of values that divide a


set of observations into equal parts

 Percentiles (Persentil), Pi divide the values into 100


parts of equal size (P1, P2, …, P99), each comprising 1%
of the observations.

So, the P-th percentile is the value for which P % are


less than that value and (100 – P)% are greater than
that value.


 Quartiles (Kuartil), Qi divide the values of a data set


into 4 subsets of equal size (Q1, Q2, Q3), each comprising
25% of the observations.

 lower or first quartile (kuartil pertama) , Q1 = P25


second quartile, Q2 = P50 = median
upper or third quartile (kuartil ketiga), Q3 = P75
 Refer to Time Data in Topic 2
Q1 = 42.225 ; Q3 = 62.075 ; P10 = 33.07 ; P90 = 70.37
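
The slides do not state how the percentile positions are found. The sketch below assumes the location rule L = (n + 1)P/100 with linear interpolation between neighbouring ordered values, which matches the (n + 1)/2 location used for the median earlier; other software may use a different convention. The function name `percentile` and the data are hypothetical.

```python
def percentile(data, p):
    """P-th percentile using the assumed location rule (n + 1) * p / 100."""
    values = sorted(data)
    n = len(values)
    location = (n + 1) * p / 100          # 1-based position in the ordered data
    if location <= 1:
        return values[0]
    if location >= n:
        return values[-1]
    lower_pos = int(location)             # whole part of the position
    fraction = location - lower_pos       # how far to move toward the next value
    lower = values[lower_pos - 1]
    upper = values[lower_pos]
    return lower + fraction * (upper - lower)

# Hypothetical data, for illustration only.
sample = [12, 15, 17, 20, 22, 25, 28, 30, 33]

q1 = percentile(sample, 25)
q2 = percentile(sample, 50)
q3 = percentile(sample, 75)
print(q1, q2, q3)
```

With nine values, the 25th percentile sits at position 2.5, halfway between the 2nd and 3rd ordered values.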

Interpretation of quartile & percentile

• Q1 : 25% of the workers used less than 42.225 seconds to


complete a critical task on an assembly line.

• P90 : 10% of the workers used more than 70.37 seconds to


complete a critical task on an assembly line.

Interquartile Range

• The quartiles can be used to create another measure
of dispersion, the interquartile range (julat antara
kuartil), which is defined as follows:

Interquartile range = Q3 − Q1

• The interquartile range measures the spread of the
middle 50% of the observations. Large values of this
statistic mean that the 1st and 3rd quartiles are far
apart, indicating a high level of dispersion. It is not
affected by outliers.
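
A short sketch shows the interquartile range for the Time Data quartiles quoted earlier (Q1 = 42.225, Q3 = 62.075) and, on hypothetical data, how it stays unchanged when one extreme value is introduced while the range does not. It uses `statistics.quantiles` with `method="exclusive"`; treating that convention as the one behind the slide figures is an assumption.

```python
from statistics import quantiles

def iqr(data):
    """Interquartile range Q3 - Q1, using the 'exclusive' quantile method."""
    q1, _, q3 = quantiles(data, n=4, method="exclusive")
    return q3 - q1

# Quartiles quoted for the Time Data.
time_iqr = 62.075 - 42.225

# Hypothetical data: replace the largest value with an outlier.
plain = [1, 2, 3, 4, 5, 6, 7]
with_outlier = [1, 2, 3, 4, 5, 6, 70]

iqr_plain, iqr_outlier = iqr(plain), iqr(with_outlier)
range_plain = max(plain) - min(plain)
range_outlier = max(with_outlier) - min(with_outlier)

print(round(time_iqr, 2))
print(iqr_plain, iqr_outlier, range_plain, range_outlier)
```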

Approximating the Mean and
Variance from Grouped Data

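Sample Mean: $\bar{x} = \frac{\sum f_i m_i}{n}$

Sample Variance: $s^2 = \frac{1}{n - 1}\left[\sum f_i m_i^2 - \frac{(\sum f_i m_i)^2}{n}\right]$

Population Mean: $\mu = \frac{\sum f_i m_i}{N}$

Population Variance: $\sigma^2 = \frac{\sum f_i m_i^2}{N} - \mu^2$

$f_i$ = frequency for class i
$m_i$ = midpoint (titik tengah) for class i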

Example 5:

A local hospital provided the following frequency distribution


summarizing the weights (in pounds) of all babies delivered over
the month of August, 2020.

Weight       Number of babies
2 - < 4      3
4 - < 6      8
6 - < 8      25
8 - < 10     30
10 - < 12    4

Find the approximate value of the mean and standard deviation
of weight.

Weight       Number of babies, f   Midpoint, m   fm    fm²
2 - < 4      3                     3             9     27
4 - < 6      8                     5             40    200
6 - < 8      25                    7             175   1225
8 - < 10     30                    9             270   2430
10 - < 12    4                     11            44    484
Total        70                                  538   4366

Approximate population mean, $\mu = \frac{538}{70} = 7.686$ pounds

Approximate population standard deviation,

$\sigma = \sqrt{\frac{4366}{70} - (7.686)^2} = 1.816$ pounds
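
The grouped-data approximation in Example 5 can be sketched in Python. The variable names are hypothetical; the class limits and frequencies are those of the table. Using the unrounded mean, the standard deviation comes out at about 1.817, slightly above the 1.816 in the worked answer, which substitutes the rounded mean 7.686.

```python
import math

# (lower limit, upper limit, frequency) for each weight class.
classes = [(2, 4, 3), (4, 6, 8), (6, 8, 25), (8, 10, 30), (10, 12, 4)]

total = sum(f for _, _, f in classes)                        # N
midpoints = [(low + high) / 2 for low, high, _ in classes]   # m_i
freqs = [f for _, _, f in classes]                           # f_i

sum_fm = sum(f * m for f, m in zip(freqs, midpoints))
sum_fm2 = sum(f * m ** 2 for f, m in zip(freqs, midpoints))

mu = sum_fm / total                                          # approximate mean
sigma = math.sqrt(sum_fm2 / total - mu ** 2)                 # approximate std. dev.

print(total, sum_fm, sum_fm2)
print(round(mu, 3), round(sigma, 3))
```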
