09/08/2023

HS 522

Topic 1, Mean

Numerical Measures
• Many numerical measures are used to
describe the data set. We will pick up what is
most important for us.
• Central tendency: which measurement
represents the data set?
• Dispersion/variability: how is the data
spread around the representative
measure?

09/08/2023 Topic 1: Mean 2

Why Important?
• What is the average infant mortality rate in
India?
• Which states show higher variability?
• Higher variability: lower stability.
• Similarly, in case of income distribution:
higher variability: higher inequality.


Measures of Central Tendency


• Key question: what is the ‘center’ of a
distribution?
• Aside: distribution means how the data is
arranged, but may also refer to frequency
distribution (which is also one arrangement).
• Because the data can be arranged in
different ways, the notion of a ‘center’ is
context specific.


Distribution-1 (Symmetric)
Most data points are around the red dot.
Frequencies fall off on both sides of the red dot.


An Example
x f
1 1
2 2
3 3
4 8
5 3
6 2
7 1


Distribution-2: Skewed (Negatively)

Many data points lie to the left of the blue dot.
The highest number of data points is at the blue dot.


An Example
X freq
1 1
2 2
3 1
4 3
5 2
6 30
7 1


Distribution-3 (Known as Bimodal)


An Example
X freq
1 1
2 5
3 2
4 0
5 1
6 7
7 1


Discussion
• For distribution 1, a center is probably well
defined.
• Not so for distributions 2 and 3.
• As there are different notions of center, there
are different notions of measure of central
tendency.
• Those are called mean, median and mode.
• We take up the issue of mean first.

Arithmetic Mean
• The arithmetic mean, or sometimes simply the mean, is
computed by adding the data points and dividing by the total
number of observations.
• In symbols, for a sample,

$$\bar{x} = \frac{\sum_i x_i}{n}$$

• The population mean is denoted by the Greek lower-case letter
‘mu’, μ.


Example

Y: 0 2 3 4
• Assuming this to be a sample, the sample
mean is

$$\bar{Y} = \frac{\sum_i Y_i}{n} = \frac{9}{4} = 2.25$$

• Assuming this to be a population, the
population mean is

$$\mu = \frac{\sum_i Y_i}{N} = \frac{9}{4} = 2.25$$
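
The definition above can be sketched in a few lines of Python. This is only an illustration: the helper name `arithmetic_mean` is an assumption made here, while `statistics.mean` is the standard-library function that does the same job.

```python
from statistics import mean


def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    values = list(values)
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


y = [0, 2, 3, 4]           # the data from the example
print(arithmetic_mean(y))  # 2.25
print(mean(y))             # 2.25, same result from the standard library
```

The formula is the same whether the data is treated as a sample or as a population; only the symbol (sample mean versus μ) changes.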


Finding Sum from Mean


• A set of data is given. It is known that the
arithmetic mean of these numbers is 8. There
are 15 observations. What is the sum of those
data points?

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}; \quad \bar{x} = 8,\ n = 15 \ \Rightarrow\ \sum_{i=1}^{n} x_i = ?$$

$$\sum_{i=1}^{n} x_i = n \times \bar{x} = 15 \times 8 = 120$$


Mean As Representation
• Suppose we have “n” observations.
• If we sum the mean n times, that will be equal
to the sum of the data set.
• If we erase every number and replace it with
the AM, the sum of the new values equals the
sum of the original data set.


Mean As Centre
• Subtract mean from each observation and
sum up.
Y          Y - (mean)
0          -2.25
2          -0.25
3          0.75
4          1.75
Sum = 9    Sum = ?
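
The missing sum in the table can be checked numerically. The following is a small sketch using the same four values; the variable names are chosen here for illustration only.

```python
y = [0, 2, 3, 4]
y_bar = sum(y) / len(y)              # 2.25

# Deviation of each observation from the mean.
deviations = [v - y_bar for v in y]
total = sum(deviations)

print(deviations)  # [-2.25, -0.25, 0.75, 1.75]
print(total)       # 0.0
```

The deviations below the mean exactly cancel the deviations above it, which is the sense in which the mean sits at the centre. With other data sets, floating-point rounding may give a value very close to zero rather than exactly zero.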


Mean: Change in Base (Example)

• Suppose a data set is given.
• Change in base simply means adding a constant to,
or subtracting a constant from, each data point.
X       Z=X-1
1       0
2       1
1       0
3       2
ΣX=7    ΣZ=3

$$\bar{x} = \frac{7}{4} = 1.75; \quad \bar{z} = \frac{3}{4} = 0.75; \quad \bar{z} = \bar{x} - 1$$


Mean: Change in Base Rule


• Suppose there is a data set (X). We add (or
subtract) a constant a to (or from) each data
point and produce Z, i.e. $Z_i = X_i \pm a$
• Then the mean increases (or decreases) by the
same constant, i.e.
$$\bar{Z} = \bar{X} \pm a$$
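
A quick numerical check of the change-in-base rule, as a sketch only. It reuses the data set X = (1, 2, 1, 3) with the constant 1 subtracted; the helper `mean` is defined here for illustration.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


x = [1, 2, 1, 3]
a = 1                    # constant subtracted from every data point (Z = X - 1)
z = [v - a for v in x]

print(mean(x))      # 1.75
print(mean(z))      # 0.75
print(mean(x) - a)  # 0.75, the mean shifts by the same constant
```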


Mean: Change in Scale (Example)

• Suppose a data set is given.
• Change in scale simply means multiplying or
dividing each data point by a constant.
X       Z=X/2     W=3X
1       0.5       3
2       1         6
1       0.5       3
3       1.5       9
ΣX=7    ΣZ=3.5    ΣW=21

$$\bar{x} = \frac{7}{4} = 1.75; \quad \bar{z} = \frac{3.5}{4} = 0.875 = \frac{\bar{x}}{2}; \quad \bar{w} = \frac{21}{4} = 5.25 = 3\bar{x}$$


Mean: Change in Scale Rule


• Suppose there is a data set (X). We multiply (or
divide) each data point by a constant a and
produce Z, i.e. $Z_i = a X_i$
• Then the mean is multiplied (or divided) by the
same constant, i.e.
$$\bar{Z} = a\bar{X}$$
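
The change-in-scale rule can be checked the same way. This sketch uses X = (1, 2, 1, 3) with Z = X/2 and W = 3X; the helper `mean` is again only an illustrative name.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


x = [1, 2, 1, 3]
z = [v / 2 for v in x]   # divide every data point by 2
w = [3 * v for v in x]   # multiply every data point by 3

print(mean(z), mean(x) / 2)  # 0.875 0.875
print(mean(w), 3 * mean(x))  # 5.25 5.25
```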


Motivation for The Next Section


• A society consists of many groups.
• Suppose we are interested in income.
• There is an average income for the whole society.
• There are average incomes for each group
within the society.
• How does the societal average relate to group
specific averages?


Example
• The scores relate to two groups within a class,
say Group X for English speakers (4 in number)
and Y for non-English speakers (5 in number)

Group          Scores        Sum
X (n_x = 4)    1 2 1 3 -     ΣX = 7
Y (n_y = 5)    2 4 6 6 2     ΣY = 20

Here, $\bar{x} = 1.75$; $\bar{y} = 4$


Total Class size


• For the whole class
Z (n_z = 9)    1 2 1 3 2 4 6 6 2    ΣZ = 27

• Thus $\bar{z} = \frac{27}{9} = 3$
• But notice that $\frac{4}{9} \times 1.75 + \frac{5}{9} \times 4 = 3$


In Words
Suppose there are two groups (1 and 2) in a
class/sample. Then
Mean of the class =
(fraction of the class in group 1) * (mean of
group 1) + (fraction of the class in group 2) *
(mean of group 2).
• Can be extended beyond two groups.


Another Example
• In a class of 100, 40 students are English and 60
are Spanish. The teacher gave an IQ test. The
English students’ mean is 30 and the Spanish
students’ mean is 50. What is the mean of the
class?


Solution
• Total number= (60+40)=100.
• English fraction = 40/100= 2/5
• Spanish fraction = 60/100 = 3/5
• Thus, class mean = (2/5)*30 + (3/5)*50 = ?
• The fractions 3/5, 2/5, etc. are called the weights
attached to each group.
• The class mean is therefore a weighted mean.
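
The weighted-mean rule can be sketched as a small function. The name `combined_mean` and its argument names are assumptions made for this illustration; the numbers come from the two examples in this topic.

```python
def combined_mean(group_sizes, group_means):
    """Mean of the pooled data, given each group's size and mean.

    Each group contributes size * mean to the overall sum, which is the
    same as weighting each group mean by the group's share of the total.
    """
    total_size = sum(group_sizes)
    total_sum = sum(n * m for n, m in zip(group_sizes, group_means))
    return total_sum / total_size


print(combined_mean([4, 5], [1.75, 4]))   # 3.0  (score example: groups X and Y)
print(combined_mean([40, 60], [30, 50]))  # 42.0 (IQ example: English and Spanish)
```

The function accepts any number of groups, which matches the remark that the rule extends beyond two groups.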


Notation

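$$\begin{aligned}
\bar{z} &= \frac{\sum_i z_i}{n_z} = \frac{\sum_i x_i + \sum_i y_i}{n_x + n_y} \\
&= \frac{n_x \bar{x} + n_y \bar{y}}{n_x + n_y} = \frac{n_x}{n_x + n_y}\,\bar{x} + \frac{n_y}{n_x + n_y}\,\bar{y}
\end{aligned}$$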


Example: Ungrouped Frequency Distribution

• Suppose the data is 1,1,1,2,2,3,3,4,4,4,7,7,8.
• We can construct a frequency distribution
table
Xi      Frequency (fi)    fi*Xi
1       3                 3
2       2                 4
3       2                 6
4       3                 12
7       2                 14
8       1                 8
        Σfi = 13 = n      Σfi*Xi = 47

$$\bar{x} = \frac{\sum_{i=1}^{6} f_i x_i}{n} = \frac{47}{13} \approx 3.62$$

Why The Formula Makes Sense?

111 2  2  3  3  4  4  4  7  7  8
x
13
1*3  2* 2  3*3  4*3  7 * 2  8*1

13
1* f1  2* f 2  3* f 3  4* f 4  7 * f5  8* f 6

13
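
The equivalence between the raw-data formula and the frequency-table formula can be checked with a short sketch. `collections.Counter` is used here only as a convenient way to build the frequency table; the variable names are illustrative.

```python
from collections import Counter

data = [1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 7, 7, 8]

freq = Counter(data)                 # maps each value Xi to its frequency fi
n = sum(freq.values())               # Σfi
weighted_sum = sum(x * f for x, f in freq.items())  # Σfi*Xi

mean_from_table = weighted_sum / n
mean_from_raw = sum(data) / len(data)

print(n, weighted_sum)               # 13 47
print(round(mean_from_table, 2))     # 3.62
print(mean_from_table == mean_from_raw)  # True
```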


Another Example
• X=(4,4,6,6,6,8,10,10)
Xi       fi         fi*Xi
4        2          8
6        3          18
8        1          8
10       2          20
         Σfi = 8    ΣfiXi = 54

$$\bar{x} = \frac{\sum_{i=1}^{4} f_i X_i}{n} = \frac{54}{8} = 6.75$$


Grouped Frequency Distribution


• Not all information is available.
Class Interval Frequency
0-10 2
10-20 5
20-30 4
30-40 1

• We find the approximate mean by treating the
midpoint of each class interval as the value of
every observation in that class, and computing the
frequency-weighted arithmetic mean of the midpoints.

Grouped Frequency Distribution: Approximate Mean

Class Interval    Midpoint (Xi)    Frequency (fi)    fi*Xi
0-10              5                2                 10
10-20             15               5                 75
20-30             25               4                 100
30-40             35               1                 35
                                   Σfi = ?           ΣfiXi = ?
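
The two totals and the approximate mean can be computed with a short sketch. The list-of-tuples layout for the class intervals is an assumption made here for illustration; the numbers are those in the table.

```python
# Each entry: ((lower limit, upper limit), frequency)
classes = [((0, 10), 2), ((10, 20), 5), ((20, 30), 4), ((30, 40), 1)]

total_f = sum(f for _, f in classes)                          # Σfi
total_fx = sum((lo + hi) / 2 * f for (lo, hi), f in classes)  # Σfi*Xi, Xi = midpoint

approx_mean = total_fx / total_f

print(total_f)                # 12
print(total_fx)               # 220.0
print(round(approx_mean, 2))  # 18.33
```

The result is only an approximation, because the actual values inside each class are not known.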


Open Ended Class: Mean Cannot be Computed
• Not all information is available.
Class Interval Frequency
Less than 10 2
10-20 5
20-30 4
More than 30 1

• The AM cannot be computed, because the open-ended
classes have no midpoint.


Discussion About Arithmetic Mean

• It is well defined for each data set, and provides a
single measure of central tendency.
• Calculation involves all data points.
• However, it is affected by extreme values (possible
outliers).
• In an economy, 100 people have income 10 (mean
income=10).
• In another economy, 99 people have income 1, but one
person has income 1001 (mean income=11).
• The mean will not be quite representative of the data
in the second case.
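
The two economies can be reproduced in a few lines. This is a sketch with illustrative variable names; it also counts how many people in the second economy earn less than the mean.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


economy_a = [10] * 100          # 100 people with income 10
economy_b = [1] * 99 + [1001]   # 99 people with income 1, one with 1001

mean_a = mean(economy_a)
mean_b = mean(economy_b)

# Number of people in the second economy whose income is below the mean.
below_mean_b = sum(1 for income in economy_b if income < mean_b)

print(mean_a, mean_b)  # 10.0 11.0
print(below_mean_b)    # 99
```

In the second economy 99 of the 100 people earn far less than the mean, which is why the mean is a poor representative there.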


Geometric Mean
• Definition.
• Computation
• The rationale.
• Importance in Social Sciences.


Geometric Mean: Definition


• Suppose there is a data set x1,…,x5.
• We want to know the number which, when
multiplied together 5 times, gives the product
x1*x2*…*x5. That number is the GM (G).
• That is,
G*G*G*G*G = x1*x2*x3*x4*x5
In the notation of Topic 0,
$G^5$ = x1*x2*x3*x4*x5

GM: Example-1
• What is the GM of 1, 3, 9?
• Here, x1*x2*x3 = 1*3*9 = 27 (n=3).
• So we want a number G such that
$G^3 = 27$
G = 3
Ans: The GM of 1, 3 and 9 is 3.


GM: Example-2
• What is the GM of 2, 8?
• Here, x1*x2 = 2*8 = 16 (n=2).
• So we want a number G such that
$G^2 = 16$
G = 4
Ans: The GM of 2 and 8 is 4.


GM: Example-3
• What is the GM of 2, 4, 8?
• Here, x1*x2*x3 = 2*4*8 = 64 (n=3).
• So we want a number G such that
$G^3 = 64$
G = 4
Ans: The GM of 2, 4 and 8 is 4.


Problem
• If you do not have a calculator, computing the
GM is difficult, even for simple cases.
• What is the GM of 1, 2, 4, 8?
• Here, x1*x2*x3*x4 = 1*2*4*8 = 64 (n=4).
• So we want a number G such that
$G^4 = 64$
The answer is not simple to find by hand (approximately 2.83).
Ans: The GM of 1, 2, 4 and 8 is approximately 2.83.
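
With a computer the n-th root is no longer a problem. The sketch below follows the definition directly; the function name `geometric_mean` is chosen here for illustration, and `math.prod` requires Python 3.8 or later. The standard library also offers `statistics.geometric_mean` for the same purpose.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("GM needs a non-empty set of positive values")
    return math.prod(values) ** (1 / len(values))


# The four data sets from the GM examples; results rounded to 2 decimals.
for data in ([1, 3, 9], [2, 8], [2, 4, 8], [1, 2, 4, 8]):
    print(data, round(geometric_mean(data), 2))
```

Rounded to two decimals, the four results are 3, 4, 4 and 2.83, matching the hand calculations above.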


AM as Representative Data
• The AM represents the data set in the sense that
if we replace every value of a data set by the
AM, then the sum of the data set is
preserved.
Example: (1,3,3,9). The sum of the data is 16.
The AM is 4.
If we have the data set (4,4,4,4), the sum is
preserved.


GM as Representative Data
• The GM represents the data set in the sense that
if we replace every value of a data set by the
GM, then the product of the data set is
preserved.
Example: (1,3,3,9). The product of the data is
81. The GM is 3 ($3^4 = 81$).
If we have the data set (3,3,3,3), the product is
preserved.


Example
• I have deposited 100 Rupees in a bank. The bank
offers an interest rate of 1% in the first year, 3% in
the next year, and 8% in the third year. What is
the average interest rate offered by the bank?
• Value of money after 1st year = 100(1+.01) = 101
• Value of money after 2nd year = 101(1+.03) = 104.03
• Value of money at the end:
104.03(1+.08) = 112.3524 (the bank will actually give
112.35).
• (Highly unrealistic, but let us keep the value.)


Question: AM
• Which interest rate, applied uniformly over the 3
years, will give us 112.3524? (That is the mean
interest rate quoted by the bank.)
• Use AM.
• Over the three years, the money has grown by
the factors 1.01, 1.03 and 1.08, respectively.
• AM = 1.04 → 4% interest rate each year
• After 3 years, we will get
100(1+.04)(1+.04)(1+.04) = 112.4864.
• Error = 112.4864 - 112.3524 = .134


Question: GM
• Use GM.
• Over the three years, the money has grown by
the factors 1.01, 1.03 and 1.08, respectively.
• GM of the factors: $G^3 = 1.01 \times 1.03 \times 1.08 = 1.1235$
→ G = 1.0396 (use a calculator), i.e. an interest rate of
3.96%.
• Over three years, we have
100*(1+.0396)*(1+.0396)*(1+.0396) = 112.3567
• Error = 112.3567 - 112.3524 = .0043 (this arises only
from rounding G to four decimal places)
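
The comparison between AM and GM for the bank example can be reproduced with a short sketch. Variable names are illustrative; `math.prod` requires Python 3.8 or later.

```python
import math

principal = 100
factors = [1.01, 1.03, 1.08]   # yearly growth factors (1%, 3%, 8%)
years = len(factors)

actual = principal * math.prod(factors)       # what the bank pays

am = sum(factors) / years                     # arithmetic mean of the factors
gm = math.prod(factors) ** (1 / years)        # geometric mean of the factors

via_am = principal * am ** years              # uniform rate from the AM
via_gm = principal * gm ** years              # uniform rate from the GM

print(round(actual, 4))   # 112.3524
print(round(via_am, 4))   # 112.4864
print(round(via_gm, 4))   # 112.3524
print(round(gm, 4))       # 1.0396
```

With the unrounded GM the final amount is reproduced up to floating-point precision, whereas the AM overstates it by about 0.134.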


Conclusion
• GM is appropriate if the data involves
calculation of mean growth rates.
• Inflation, GDP, population, and so on: the list is
long (especially in Development Studies).


The Story of HDI


• HDI involves aggregation of three parameters in
producing an overall index of well being (more on
this in later courses).
• HDI: 3 indicators. Health (H), Education (E), GDP
per capita (Y). Each indicator is between 0 and 1.
• Up to 2010-11, HDI was calculated as the AM:
I = (H+E+Y)/3.
• After that, HDI is calculated as the GM:
$I^3 = H \times E \times Y$ (or $I = (H \times E \times Y)^{1/3}$).

A Fictional Country
• H=.5, Y=.5, E=.5. AM=GM=.5
• Now suppose E and H both fall to .4, while the country
increases its Y score to .7 (think of relaxing child
labor laws, which takes a toll on health and
education, but boosts GDP).
• AM = .5 (as before).
• GM = .48 (approx.).
• GM would be even lower if some of these indicators
came close to zero, even if one boosted Y to 1.
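
The fictional country can be checked numerically. In the sketch below the function names are illustrative, and the last scenario (H = 0.01, E = 0.4, Y = 1) uses hypothetical values chosen here only to show an indicator close to zero.

```python
def hdi_am(h, e, y):
    """Index as the arithmetic mean of the three indicators."""
    return (h + e + y) / 3


def hdi_gm(h, e, y):
    """Index as the geometric mean of the three indicators."""
    return (h * e * y) ** (1 / 3)


scenarios = {
    "initial": (0.5, 0.5, 0.5),
    "after change": (0.4, 0.4, 0.7),
    "H near zero (hypothetical)": (0.01, 0.4, 1.0),
}

for name, (h, e, y) in scenarios.items():
    print(name, round(hdi_am(h, e, y), 2), round(hdi_gm(h, e, y), 2))
```

After the change the AM stays at about 0.5 while the GM drops to about 0.48. In the hypothetical third scenario the AM is still about 0.47, but the GM falls to about 0.16.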


Moral
• If one uses GM, then it is hard to substitute
and compensate between indicators,
compared to AM.
• Therefore, GM is used for calculation of HDI.
• There is more to HDI calculation than this.
• However, GM also suffers from the problem of
extremely large values (outliers).
• If any observation is 0 or negative, GM cannot
be calculated.


Reference
• GW, Chapt 3
