

Synthesis methods: aggregation and weighting

Aggregation methods
We must now choose an aggregation (merging) function that combines the individual indicators, once made dimensionless, into the final indicator:
$$s = f\big(w_1 g(x_1), \dots, w_m g(x_m)\big)$$
The most commonly used functions f() are linear additive functions:
$$s = w_1 g(x_1) + w_2 g(x_2) + \dots + w_m g(x_m)$$
In fact, so far we have used the sum function with unit weights (all weights w fixed equal to 1):
$$s = \sum_{j=1}^{m} g(x_{ij})$$
or the simple arithmetic mean function:
$$s = \frac{1}{m}\sum_{j=1}^{m} g(x_{ij})$$
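As a minimal sketch of the two aggregation functions just defined, linear additive aggregation and its arithmetic-mean special case can be written in Python as follows. The function names and the indicator values are hypothetical, for illustration only.

```python
def additive_aggregate(scores, weights=None):
    """Linear additive aggregation s = w_1 g(x_1) + ... + w_m g(x_m).

    `scores` are the already normalized values g(x_ij) for one unit i;
    with no weights given, unit weights (w_j = 1) are used, i.e. a plain sum.
    """
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("scores and weights must have the same length")
    return sum(w * g for w, g in zip(weights, scores))


def arithmetic_mean_aggregate(scores):
    """Simple arithmetic mean: the additive case with every w_j = 1/m."""
    m = len(scores)
    return additive_aggregate(scores, [1.0 / m] * m)


# Hypothetical normalized indicators g(x_i1), g(x_i2), g(x_i3) for one unit
g = [0.6, 0.8, 0.5]
print(additive_aggregate(g))         # sum with unit weights (approximately 1.9)
print(arithmetic_mean_aggregate(g))  # approximately 0.633
```

Any other weighting scheme is just a different `weights` list passed to the same additive function.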

In all the examples seen so far we have assumed that the aggregation (merging) function f() was the sum or the arithmetic mean.

In practice, we will see that some indicators use another type of mean, or a combination of several types of function, in particular:

the arithmetic mean (Human Development Index, HDI, old UNDP methodology)
the power mean (Human Poverty Index, HPI, old UNDP methodology)
... and, among the power means, the quadratic mean
the geometric mean (Human Development Index, HDI, UNDP)
the harmonic mean (Gender Inequality Index, GII, UNDP)

Human Development Indicators (old methodology, until 2009)

Human development: definition

From Wikipedia:
The Human Development Index (HDI) is a macroeconomic development indicator used by the United Nations, alongside GDP (Gross Domestic Product), to assess development in member countries.
Previously, only GDP was used.

GDP is based solely on economic growth and does not take into account the capital (especially natural capital) that is lost in the growth process.
It measures only total economic value (averaged over the income distribution).

This measure can be distorted because, in the average, the wealth of a very rich citizen is spread over many poor people, distorting the measure of the latter's standard of living.

In the Human Development Index, other factors have been introduced as indicators of well-being. These are factors that cannot be concentrated in a single individual: think of literacy and life expectancy.

The index ranges from 0 to 1, and countries are divided into high, medium, and low human development nations (old methodology).

United Nations human development indicators

The human development indicator proposed by the United Nations, the Human Development Index (HDI), is a composite measure of human development built on three basic dimensions:
Long and healthy life
Knowledge
Wealth (a decent standard of living)

United Nations human development indicators (old methodology)

The three basic components of the HDI are:

1. Life expectancy at birth
2. Adult literacy rate (weight w = 2/3) combined with the gross enrollment ratio in primary, secondary, and tertiary education (weight w = 1/3): Adult Literacy Index + GER
3. GDP per capita: GDP (PPP US$)

Before calculating the HDI, the basic components are rescaled to the interval [0, 1] using the minimum and maximum values defined for each indicator (goalposts).

In its first formulation, the index was obtained as the arithmetic mean of the three components.
Technical note 1

Calculating the human development indices

The diagrams summarize how the five human development indices used in the Human Development Report are constructed, highlighting both their similarities and their differences. The text on the following pages provides a detailed explanation.

[Diagram: HDI]
Dimension: A long and healthy life | Knowledge | A decent standard of living
Indicator: Life expectancy at birth | Adult literacy rate; Gross enrolment ratio (GER) | GDP per capita (PPP US$)
Dimension index: Life expectancy index | Education index (from the adult literacy index and the GER index) | GDP index
Composite: Human development index (HDI)

[Diagram: HPI-1]
Dimension: A long and healthy life | Knowledge | A decent standard of living
Indicator: Probability at birth of not surviving to age 40 | Adult illiteracy rate | Percentage of population without sustainable access to an improved water source; Percentage of children underweight for age
Dimension index: Deprivation in a decent standard of living
Composite: Human poverty index for developing countries (HPI-1)

Let's focus on the knowledge component.

The adult literacy rate is defined as the percentage of the population aged 15 and over that can read, write, and understand a simple sentence about everyday life (i.e., literate people). The adult illiteracy rate is the complement to 100 of this measure.

It shows the degree to which training programs aimed at improving the basic skills of the population have achieved their objectives.

Because of these characteristics, the index represents the knowledge base necessary for the further cultural growth of a country.
It is obtained by dividing the number of literate people aged 15+ by the population aged 15+ and multiplying by 100.

Formula:
$$LIT^{t}_{15+} = \frac{L^{t}_{15+}}{P^{t}_{15+}} \times 100$$

where $LIT^{t}_{15+}$ is the adult literacy rate (15+) in year t,
$L^{t}_{15+}$ is the number of literate people aged 15+ in year t,
$P^{t}_{15+}$ is the population aged 15+ in year t.

The sources used to calculate this quantity are either censuses or labor force surveys.

This indicator is assigned a weight of 2/3 in the definition of the knowledge component of the HDI.
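The literacy-rate formula can be sketched in Python as follows. The function name is illustrative, and the counts are hypothetical, chosen only to give a rate of 82.4%.

```python
def adult_literacy_rate(literate_15plus, population_15plus):
    """LIT^t_{15+} = L^t_{15+} / P^t_{15+} * 100, as a percentage."""
    if population_15plus <= 0:
        raise ValueError("population aged 15+ must be positive")
    if not 0 <= literate_15plus <= population_15plus:
        raise ValueError("literate count must lie between 0 and the population")
    return literate_15plus / population_15plus * 100


# Hypothetical counts for one year t
lit = adult_literacy_rate(824_000, 1_000_000)  # approximately 82.4
illiteracy = 100 - lit                        # complement to 100, approximately 17.6
```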

The Gross Enrollment Ratio (GER) relates the number of students enrolled in a given level of education to the population of the age group that officially corresponds to that level.

It indicates the capacity of an education system to enroll students of a particular age group.

It is obtained by dividing the number of students enrolled (in a given year t) at a given level h of education by the population in the age group corresponding to that level, and multiplying by 100.

Formula:
$$GER^{t}_{h} = \frac{E^{t}_{h}}{P^{t}_{h,a}} \times 100$$

where $GER^{t}_{h}$ is the Gross Enrollment Ratio for level h in year t,
$E^{t}_{h}$ is the number of students enrolled at level h in year t,
$P^{t}_{h,a}$ is the population in age group a in year t, which officially corresponds to education level h.

The sources used to calculate this quantity are either censuses or education statistics.
This indicator is assigned a weight of 1/3 in the definition of the knowledge component of the HDI.
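Here is a Python sketch of the GER formula. The function name and figures are hypothetical. The numerator counts every enrollee at level h regardless of age, including over-age and under-age students, while the denominator covers only the official age group. The ratio can therefore exceed 100, as some of the country values reported later show (e.g. 116 for Australia).

```python
def gross_enrolment_ratio(enrolled_at_level, population_official_age):
    """GER^t_h = E^t_h / P^t_{h,a} * 100.

    The numerator counts every student enrolled at level h, whatever their age,
    while the denominator is only the official age group a, so GER may exceed 100.
    """
    if population_official_age <= 0:
        raise ValueError("population of the official age group must be positive")
    if enrolled_at_level < 0:
        raise ValueError("enrolment cannot be negative")
    return enrolled_at_level / population_official_age * 100


# Hypothetical figures for one level h and year t
ger_typical = gross_enrolment_ratio(780, 1_000)     # approximately 78.0
ger_over_100 = gross_enrolment_ratio(1_160, 1_000)  # approximately 116.0, above 100
```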

United Nations human development indicators: e.g., South Africa

1. Calculation of the long and healthy life component

The life expectancy index measures a country's relative achievement in life expectancy at birth. South Africa had a life expectancy at birth of 48.4 years in 2003. With goalposts of 25 and 85 years:
$$\text{Life expectancy index} = \frac{48.4 - 25}{85 - 25} = 0.391$$
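The goalpost (min-max) rescaling used above can be sketched in Python as follows; the helper name is illustrative. With the rounded input 48.4, the formula gives 23.4 / 60 = 0.390. The published 0.391 was presumably computed from unrounded life-expectancy data.

```python
def goalpost_index(value, minimum, maximum):
    """Rescale `value` to [0, 1] with fixed goalposts: (x - min) / (max - min)."""
    if maximum <= minimum:
        raise ValueError("maximum goalpost must exceed minimum goalpost")
    return (value - minimum) / (maximum - minimum)


# Old-methodology goalposts for life expectancy at birth: 25 and 85 years
life_expectancy_index = goalpost_index(48.4, 25, 85)  # South Africa, 2003
print(f"{life_expectancy_index:.3f}")  # 0.390 with the rounded input
```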

[Figure: Calculating the HDI, illustrated with data for South Africa (Human Development Report 2005, Technical note 1). Step 1, calculating the life expectancy index: life expectancy 48.4 years in 2003, goalposts 25 and 85 years, index 0.391.]

2. Calculation of the knowledge component

The education index considers adult literacy and the combined gross enrolment ratio in primary, secondary and tertiary education. First, an index is computed for each of the two (goalposts 0 and 100). The two indices are then combined, with two-thirds weight given to adult literacy and one-third to combined gross enrolment. For South Africa, the adult literacy rate was 82.4% in 2003 and the combined gross enrolment ratio was 78% in the school year 2002/03:
$$\text{Adult literacy index} = \frac{82.4 - 0}{100 - 0} = 0.824$$
$$\text{Gross enrolment index} = \frac{78.0 - 0}{100 - 0} = 0.780$$
$$\text{Education index} = \tfrac{2}{3}(0.824) + \tfrac{1}{3}(0.780) = 0.809$$

[Figure: Calculating the HDI, step 2, calculating the education index (Human Development Report 2005, Technical note 1): adult literacy rate and gross enrolment ratio (%) with the resulting education index.]
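The education index can be reproduced with the Python sketch below. The helper name `education_index` is hypothetical; the goalposts of 0 and 100 and the weights of 2/3 and 1/3 are those stated above.

```python
def goalpost_index(value, minimum=0.0, maximum=100.0):
    """Min-max rescaling with fixed goalposts (0 and 100 for percentages)."""
    return (value - minimum) / (maximum - minimum)


def education_index(adult_literacy_rate, gross_enrolment_ratio):
    """Old-methodology education index: 2/3 literacy index + 1/3 enrolment index."""
    literacy_idx = goalpost_index(adult_literacy_rate)
    enrolment_idx = goalpost_index(gross_enrolment_ratio)
    return 2 / 3 * literacy_idx + 1 / 3 * enrolment_idx


# South Africa: literacy 82.4% (2003), combined GER 78% (2002/03)
edu = education_index(82.4, 78.0)
print(f"{edu:.3f}")  # 0.809
```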
3. Calculation of the wealth component

GDP per capita (in PPP US$) is a proxy for everything in human development that is not "captured" by the life expectancy and education components. An acceptable level of human development does not necessarily require an unlimited availability of income, so a logarithmic transformation is adopted. For South Africa, GDP per capita was $10,346 (PPP US$) in 2003. With goalposts of $100 and $40,000:
$$\text{GDP index} = \frac{\log(10{,}346) - \log(100)}{\log(40{,}000) - \log(100)} = 0.774$$

[Figure: Calculating the HDI, step 3, calculating the GDP index (Human Development Report 2005, Technical note 1): GDP per capita (PPP US$) on a log scale between the goalposts $100 and $40,000, with the resulting GDP index.]
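The log-scaled GDP index can be sketched in Python as follows; the function name is illustrative. The base of the logarithm cancels in the ratio, so natural logarithms work. The extra comparison, on hypothetical incomes, shows the point of the design. With the log transform, doubling income from $1,000 to $2,000 raises the index exactly as much as doubling it from $20,000 to $40,000.

```python
import math


def gdp_index(gdp_per_capita, minimum=100.0, maximum=40_000.0):
    """Old-methodology GDP index: log-scaled min-max with goalposts $100 and $40,000."""
    if gdp_per_capita <= 0:
        raise ValueError("GDP per capita must be positive")
    return (math.log(gdp_per_capita) - math.log(minimum)) / (
        math.log(maximum) - math.log(minimum)
    )


south_africa = gdp_index(10_346)
print(f"{south_africa:.3f}")  # 0.774

# Diminishing returns: equal proportional increases give equal index gains
gain_poor = gdp_index(2_000) - gdp_index(1_000)
gain_rich = gdp_index(40_000) - gdp_index(20_000)
print(f"{gain_poor:.4f} {gain_rich:.4f}")  # both equal log(2) / log(400)
```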
Effect of the logarithmic transformation (GNI per capita, PPP)

[Figure: log(GNI per capita, PPP$) plotted against GNI per capita in PPP$, from 0 to 140,000.]

4. Calculation of the HDI

We calculate the simple arithmetic mean of the three components:
$$HDI = \tfrac{1}{3}(\text{Life expectancy index}) + \tfrac{1}{3}(\text{Education index}) + \tfrac{1}{3}(\text{GDP index})$$
$$= \tfrac{1}{3}(0.391) + \tfrac{1}{3}(0.809) + \tfrac{1}{3}(0.774) = 0.658$$

[Figure: Calculating the HDI, steps 3 and 4 (Human Development Report 2005, p. 341): dimension indices for South Africa, life expectancy 0.391, education 0.809, GDP 0.774, and the resulting HDI of 0.658.]
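Putting the four steps together gives the minimal end-to-end Python sketch of the old-methodology HDI for South Africa below. The function names are hypothetical; the goalposts and weights are those given above. With the rounded inputs, the life expectancy index comes out as 0.390 rather than the published 0.391. The HDI is about 0.6579, which still rounds to the published 0.658.

```python
import math


def goalpost_index(value, minimum, maximum):
    """Min-max rescaling with fixed goalposts."""
    return (value - minimum) / (maximum - minimum)


def old_hdi(life_expectancy, adult_literacy, gross_enrolment, gdp_per_capita):
    """Old UNDP HDI: arithmetic mean of three goalpost-rescaled dimension indices."""
    life_idx = goalpost_index(life_expectancy, 25, 85)
    literacy_idx = goalpost_index(adult_literacy, 0, 100)
    enrolment_idx = goalpost_index(gross_enrolment, 0, 100)
    education_idx = (2 / 3) * literacy_idx + (1 / 3) * enrolment_idx
    # GDP is rescaled on a log scale between the $100 and $40,000 goalposts
    gdp_idx = goalpost_index(math.log(gdp_per_capita), math.log(100), math.log(40_000))
    dimensions = (life_idx, education_idx, gdp_idx)
    return sum(dimensions) / 3, dimensions


hdi, (life_idx, education_idx, gdp_idx) = old_hdi(48.4, 82.4, 78.0, 10_346)
print(f"life={life_idx:.3f} education={education_idx:.3f} gdp={gdp_idx:.3f}")
print(f"HDI={hdi:.3f}")  # 0.658
```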



Monitoring human development: enlarging people's choices

Table 1. Human development index: the first (top-ranked) countries

High human development

| HDI rank | Country | HDI value (2003) | Life expectancy at birth (years, 2003) | Adult literacy rate (% ages 15 and above, 2003) | Combined gross enrolment ratio for primary, secondary and tertiary schools (%, 2002/03) | GDP per capita (PPP US$, 2003) | Life expectancy index | Education index | GDP index |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Norway | 0.963 | 79.4 | .. | 101 | 37,670 | 0.91 | 0.99 | 0.9… |
| 2 | Iceland | 0.956 | 80.7 | .. | 96 | 31,243 | 0.93 | 0.98 | 0.9… |
| 3 | Australia | 0.955 | 80.3 | .. | 116 | 29,632 | 0.92 | 0.99 | 0.9… |
| 4 | Luxembourg | 0.949 | 78.5 | .. | 88 | 62,298 | 0.89 | 0.95 | 1.0… |
| 5 | Canada | 0.949 | 80.0 | .. | 94 | 30,677 | 0.92 | 0.97 | 0.9… |
| 6 | Sweden | 0.949 | 80.2 | .. | 114 | 26,750 | 0.92 | 0.99 | 0.9… |
| 7 | Switzerland | 0.947 | 80.5 | .. | 90 | 30,552 | 0.93 | 0.96 | 0.9… |
| 8 | Ireland | 0.946 | 77.7 | .. | 93 | 37,738 | 0.88 | 0.97 | 0.9… |
| 9 | Belgium | 0.945 | 78.9 | .. | 114 | 28,335 | 0.90 | 0.99 | 0.9… |
| 10 | United States | 0.944 | 77.4 | .. | 93 | 37,562 | 0.87 | 0.97 | 0.9… |
| 11 | Japan | 0.943 | 82.0 | .. | 84 | 27,967 | 0.95 | 0.94 | 0.9… |
| 12 | Netherlands | 0.943 | 78.4 | .. | 99 | 29,371 | 0.89 | 0.99 | 0.9… |
| 13 | Finland | 0.941 | 78.5 | .. | 108 | 27,619 | 0.89 | 0.99 | 0.9… |
| 14 | Denmark | 0.941 | 77.2 | .. | 102 | 31,465 | 0.87 | 0.99 | 0.9… |
| 15 | United Kingdom | 0.939 | 78.4 | .. | 123 | 27,147 | 0.89 | 0.99 | 0.9… |
| 16 | France | 0.938 | 79.5 | .. | 92 | 27,677 | 0.91 | 0.97 | 0.9… |
| 17 | Austria | 0.936 | 79.0 | .. | 89 | 30,094 | 0.90 | 0.96 | 0.9… |
| 18 | Italy | 0.934 | 80.1 | 98.5 | 87 | 27,119 | 0.92 | 0.95 | 0.9… |
| 19 | New Zealand | 0.933 | 79.1 | .. | 106 | 22,582 | 0.90 | 0.99 | 0.9… |
| 20 | Germany | 0.930 | 78.7 | .. | 89 | 27,756 | 0.90 | 0.96 | 0.9… |
| 21 | Spain | 0.928 | 79.5 | 97.7 | 94 | 22,391 | 0.91 | 0.97 | 0.9… |
| 22 | Hong Kong, China (SAR) | 0.916 | 81.6 | 93.5 | 74 | 27,179 | 0.94 | 0.87 | 0.9… |
| 23 | Israel | 0.915 | 79.7 | 96.9 | 91 | 20,033 | 0.91 | 0.95 | 0.8… |
| 24 | Greece | 0.912 | 78.3 | 91.0 | 92 | 19,954 | 0.89 | 0.97 | 0.8… |
| 25 | Singapore | 0.907 | 78.7 | 92.5 | 87 | 24,481 | 0.89 | 0.91 | 0.9… |
| 26 | Slovenia | 0.904 | 76.4 | 99.7 | 95 | 19,150 | 0.86 | 0.98 | 0.8… |
| 27 | Portugal | 0.904 | 77.2 | 92.5 | 94 | 18,126 | 0.87 | 0.97 | 0.8… |
| 28 | Korea, Rep. of | 0.901 | 77.0 | 97.9 | 93 | 17,971 | 0.87 | 0.97 | 0.8… |



Table 1 (continued). Human development index: ... and the last (lowest-ranked) countries, with regional aggregates

| HDI rank | Country | HDI value (2003) | Life expectancy at birth (years, 2003) | Adult literacy rate (% ages 15 and above, 2003) | Combined gross enrolment ratio for primary, secondary and tertiary schools (%, 2002/03) | GDP per capita (PPP US$, 2003) | Life expectancy index | Education index | GDP index |
|---|---|---|---|---|---|---|---|---|---|
| 160 | Angola | 0.445 | 40.8 | 66.8 | 30 | 2,344 | 0.26 | 0.54 | 0.5… |
| 161 | Eritrea | 0.444 | 53.8 | 56.7 | 35 | 849 | 0.48 | 0.49 | 0.3… |
| 162 | Benin | 0.431 | 54.0 | 33.6 | 55 | 1,115 | 0.48 | 0.41 | 0.4… |
| 163 | Côte d'Ivoire | 0.420 | 45.9 | 48.1 | 42 | 1,476 | 0.35 | 0.46 | 0.4… |
| 164 | Tanzania, U. Rep. of | 0.418 | 46.0 | 69.4 | 41 | 621 | 0.35 | 0.60 | 0.3… |
| 165 | Malawi | 0.404 | 39.7 | 64.1 | 72 | 605 | 0.24 | 0.67 | 0.3… |
| 166 | Zambia | 0.394 | 37.5 | 67.9 | 48 | 877 | 0.21 | 0.61 | 0.3… |
| 167 | Congo, Dem. Rep. of the | 0.385 | 43.1 | 65.3 | 28 | 697 | 0.30 | 0.53 | 0.3… |
| 168 | Mozambique | 0.379 | 41.9 | 46.5 | 43 | 1,117 | 0.28 | 0.45 | 0.4… |
| 169 | Burundi | 0.378 | 43.6 | 58.9 | 35 | 648 | 0.31 | 0.51 | 0.3… |
| 170 | Ethiopia | 0.367 | 47.6 | 41.5 | 36 | 711 | 0.38 | 0.40 | 0.3… |
| 171 | Central African Republic | 0.355 | 39.3 | 48.6 | 31 | 1,089 | 0.24 | 0.43 | 0.4… |
| 172 | Guinea-Bissau | 0.348 | 44.7 | 39.6 | 37 | 711 | 0.33 | 0.39 | 0.3… |
| 173 | Chad | 0.341 | 43.6 | 25.5 | 38 | 1,210 | 0.31 | 0.30 | 0.4… |
| 174 | Mali | 0.333 | 47.9 | 19.0 | 32 | 994 | 0.38 | 0.23 | 0.3… |
| 175 | Burkina Faso | 0.317 | 47.5 | 12.8 | 24 | 1,174 | 0.38 | 0.16 | 0.4… |
| 176 | Sierra Leone | 0.298 | 40.8 | 29.6 | 45 | 548 | 0.26 | 0.35 | 0.2… |
| 177 | Niger | 0.281 | 44.4 | 14.4 | 21 | 835 | 0.32 | 0.17 | 0.3… |
| | Developing countries | 0.694 | 65.0 | 76.6 | 63 | 4,359 | 0.67 | 0.72 | 0.7… |
| | Least developed countries | 0.518 | 52.2 | 54.2 | 45 | 1,328 | 0.45 | 0.50 | 0.6… |
| | Arab States | 0.679 | 67.0 | 64.1 | 62 | 5,685 | 0.70 | 0.61 | 0.7… |
| | East Asia and the Pacific | 0.768 | 70.5 | 90.4 | 69 | 5,100 | 0.76 | 0.83 | 0.7… |
| | Latin America and the Caribbean | 0.797 | 71.9 | 89.6 | 81 | 7,404 | 0.78 | 0.87 | 0.7… |
| | South Asia | 0.628 | 63.4 | 58.9 | 56 | 2,897 | 0.64 | 0.58 | 0.6… |
| | Sub-Saharan Africa | 0.515 | 46.1 | 61.3 | 50 | 1,856 | 0.35 | 0.56 | 0.6… |
| | Central and Eastern Europe and the CIS | 0.802 | 68.1 | 99.2 | 83 | 7,939 | 0.72 | 0.94 | 0.7… |
| | OECD | 0.892 | 77.7 | .. | 89 | 25,915 | 0.88 | 0.95 | 0.8… |



Let us evaluate the HDI from a theoretical point of view:

$$HDI = f\big(g_1(x_1), g_2(x_2), \dots, g_k(x_k)\big)$$

The aggregation function f() is the arithmetic mean of the three values obtained with g(x_i).

The long and healthy life and decent standard of living (GDP) components were converted into pure numbers with the following transformation. For GDP, it is applied to the logarithm of income:

$$g(x) = \frac{x - \min(x)}{\max(x) - \min(x)}$$

The knowledge component is, in turn, a function of two elementary components. Each of them is also obtained through a transformation of the same class:
$$g(x) = \frac{x - \min(x)}{\max(x) - \min(x)}$$

Averages

Other functions frequently used for indicator synthesis are:

the power mean (Human Poverty Index, HPI, old UNDP methodology)

the geometric mean (Human Development Index, HDI, UNDP)

the harmonic mean (Gender Inequality Index, GII, UNDP)

the quadratic mean

Power mean

The power mean of order α is the α-th root of the arithmetic mean of the α-th powers of the terms:

$$\bar{x}_{\alpha} = \sqrt[\alpha]{\frac{1}{n}\sum_{i=1}^{n} x_i^{\alpha}}$$

The power mean is expressed in the same unit of measurement as the data.
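Here is a Python sketch of the power mean of order α; the function name is illustrative. Familiar means are special cases:

- α = 1 gives the arithmetic mean.
- α = 2 gives the quadratic mean.
- α = −1 gives the harmonic mean.
- The limit α → 0 gives the geometric mean, which the sketch handles separately.

The sketch assumes non-negative data, and orders α ≤ 0 require strictly positive values.

```python
import math


def power_mean(values, alpha):
    """Power mean of order alpha for non-negative data:
    ((1/n) * sum(x_i ** alpha)) ** (1 / alpha); alpha == 0 is taken as its
    limit, the geometric mean.
    """
    xs = [float(x) for x in values]
    if not xs:
        raise ValueError("need at least one value")
    if any(x < 0 for x in xs):
        raise ValueError("this sketch assumes non-negative data")
    if alpha <= 0 and any(x == 0 for x in xs):
        raise ValueError("orders alpha <= 0 need strictly positive data")
    n = len(xs)
    if alpha == 0:
        return math.exp(sum(math.log(x) for x in xs) / n)
    return (sum(x ** alpha for x in xs) / n) ** (1 / alpha)


data = [2.0, 4.0, 8.0]  # hypothetical observations
for a in (-1, 0, 1, 2, 3):
    # the mean grows with the order: harmonic, geometric, arithmetic, quadratic, cubic
    print(a, round(power_mean(data, a), 4))
```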

Human Poverty Index (2008)



Human poverty index (HPI-1) for developing countries: a composite index measuring deprivations in the three basic dimensions captured in the human development index. These are a long and healthy life, knowledge, and a decent standard of living.
Source: Technical Note, Human Development Report 2005, 2007/2008

The HPI (Human Poverty Index) measures how deprived countries are in the three basic dimensions of development summarized in the HDI:
A long and healthy life: measured by the probability at birth of not surviving to age 40
Knowledge: exclusion from the world of reading and communication, measured by the percentage of illiterate adults
A decent standard of living: lack of access to essential goods, measured by the unweighted average of two indicators:

the percentage of the population without access to an improved water source (a minimum availability of 20 liters per person per day at a distance of less than 1 km; see the Human Development Report)

the percentage of children underweight for their age

Source: Technical Note, Human Development Report 2005, 2007/2008



The indicators are all normalized and take values between 0 and 100:
$$HPI\text{-}1 = \left[\tfrac{1}{3}\left(P_1^{\alpha} + P_2^{\alpha} + P_3^{\alpha}\right)\right]^{1/\alpha}$$
with α = 3:
$$HPI\text{-}1 = \sqrt[3]{\tfrac{1}{3}\left(P_1^{3} + P_2^{3} + P_3^{3}\right)}$$
where
P1 = probability at birth of not surviving to age 40 (per 100)
P2 = adult illiteracy rate (%)
P3 = unweighted average of the percentage of the population without access to an improved water source and the percentage of underweight children
Source: Technical Note, Human Development Report 2005, 2007/2008

For α = 1, HPI-1 is the arithmetic mean of the three dimensions.

As α increases, greater weight is attributed to the dimension in which deprivation is greatest.

A value of α = 3 was chosen to give greater, but not excessive, weight to the dimension in which the country is most penalized.
Let's compute HPI-1 for Bolivia:
P1 = 15.5%
P2 = 13.3%
P3 = 11.5%, the unweighted average of P31 = 15% and P32 = 8%
$$HPI\text{-}1 = \sqrt[3]{\tfrac{1}{3}\left(15.5^{3} + 13.3^{3} + 11.5^{3}\right)} = 13.6$$
Source: Technical Note, Human Development Report 2005, 2007/2008
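The HPI-1 computation for Bolivia can be sketched in Python with the values above; `hpi1` is an illustrative name. Setting alpha = 1 instead gives the plain arithmetic mean. The comparison shows how α = 3 pulls the index toward the largest deprivation.

```python
def hpi1(p1, p2, p3, alpha=3.0):
    """HPI-1 = [ (P1^a + P2^a + P3^a) / 3 ] ** (1/a), with P_k in percent (0-100)."""
    for p in (p1, p2, p3):
        if not 0 <= p <= 100:
            raise ValueError("deprivation indicators must lie in [0, 100]")
    return ((p1 ** alpha + p2 ** alpha + p3 ** alpha) / 3) ** (1 / alpha)


# Bolivia: P3 is the unweighted average of the two standard-of-living indicators
p3 = (15 + 8) / 2  # 11.5
print(f"alpha=3: {hpi1(15.5, 13.3, p3):.1f}")            # 13.6
print(f"alpha=1: {hpi1(15.5, 13.3, p3, alpha=1):.2f}")  # arithmetic mean, about 13.43
```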

HPI-2: for OECD countries

Human poverty index (HPI-2) for selected high-income OECD countries: it measures deprivations in the same dimensions as HPI-1 and also captures social exclusion.
The index is composed of four dimensions:
$$HPI\text{-}2 = \sqrt[3]{\tfrac{1}{4}\left(P_1^{3} + P_2^{3} + P_3^{3} + P_4^{3}\right)}$$
P1 = (long and healthy life) probability at birth of not surviving to age 60

P2 = (knowledge) percentage of adults aged 16 to 65 lacking functional literacy skills

P3 = (a decent standard of living) percentage of the population living below the income poverty line (50% of the median household disposable income)
P4 = (social exclusion) long-term unemployment rate
Source: Technical Note, Human Development Report 2005, 2007/2008

Quadratic mean

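Given a set of n observations x_1, ..., x_n, their quadratic mean is:
$$x_q = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^{2}}$$
The quadratic mean is the value which, substituted for all the terms, leaves the sum of their squares unchanged.

For positive data, the following relationship applies (harmonic ≤ geometric ≤ arithmetic ≤ quadratic mean):

$$x_a \le x_g \le \bar{x} \le x_q$$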
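A quick numerical check of the quadratic mean and of the ordering of the means follows, as a minimal Python sketch on hypothetical positive data; the function names are illustrative.

```python
import math


def harmonic_mean(xs):
    return len(xs) / sum(1 / x for x in xs)


def geometric_mean(xs):
    # exp of the mean log avoids overflow in the raw product
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def quadratic_mean(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))


data = [1.0, 2.0, 4.0, 9.0]  # hypothetical positive observations
means = [harmonic_mean(data), geometric_mean(data), arithmetic_mean(data), quadratic_mean(data)]
assert means == sorted(means), "x_a <= x_g <= mean <= x_q should hold"

# Defining property: n copies of x_q have the same sum of squares as the data
xq = quadratic_mean(data)
assert math.isclose(len(data) * xq ** 2, sum(x * x for x in data))
```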

The geometric mean




Given a set of observations x_1, x_2, ..., x_n, the geometric mean is given by
$$x_g = \sqrt[n]{x_1 \, x_2 \cdots x_n}$$

The geometric mean is the value x_g which, substituted for all the terms of the distribution x_1, x_2, ..., x_n, leaves the product of the terms unchanged:

$$x_g = \sqrt[n]{\prod_{i=1}^{n} x_i}$$
It can be used only when the data consist of positive values.

Fields of use:
- to determine the average rate of increase (or decrease) of a phenomenon
- when data vary according to a geometric progression
- when individual values are derived from ratios
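The geometric mean can be sketched in Python as follows. Computing it as the exponential of the mean logarithm is a standard numerical choice, because it avoids overflow or underflow of the raw product for long series. Python 3.8+ also provides `statistics.geometric_mean`, used here as a cross-check. The data are a hypothetical geometric progression, for which the geometric mean coincides with the middle term.

```python
import math
import statistics


def geometric_mean(values):
    """n-th root of the product of n strictly positive values, via logarithms."""
    xs = list(values)
    if not xs or any(x <= 0 for x in xs):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


progression = [2.0, 6.0, 18.0, 54.0, 162.0]  # hypothetical, ratio 3
xg = geometric_mean(progression)
print(round(xg, 6))                                      # 18.0, the middle term
print(round(statistics.geometric_mean(progression), 6))  # cross-check

# Defining property: replacing every term by x_g leaves the product unchanged
assert math.isclose(xg ** len(progression), math.prod(progression))
```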

Example
For the standard-of-living indicators of the Italian regions, transformed into index numbers, it makes sense to use the geometric mean as a summary measure.

The Pearson correlation coefficient between the composite indicator obtained with the arithmetic mean, $S(\bar{x})$, and the composite indicator obtained with the geometric mean, $S(x_g)$, is 0.997.

Human Development Index: new methodology



The Human Development Index (HDI) is a summary measure of achievements in three key dimensions of human development:
a long and healthy life

access to knowledge

a decent standard of living

The HDI is the geometric mean of the normalized indices I for each of the three dimensions.
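The new-methodology aggregation, HDI = (I_health · I_education · I_income)^(1/3), can be sketched in Python as follows. The dimension-index values are hypothetical, and the sketch assumes each index already lies in (0, 1]. How each I is computed (goalposts and transformations) is left aside.

Comparing the result with the arithmetic mean shows the point of the design. The geometric mean penalizes uneven achievement, so high performance in one dimension no longer fully compensates for poor performance in another.

```python
def hdi_geometric(i_health, i_education, i_income):
    """New-methodology HDI: geometric mean of the three normalized dimension indices."""
    for idx in (i_health, i_education, i_income):
        if not 0 < idx <= 1:
            raise ValueError("dimension indices must lie in (0, 1]")
    return (i_health * i_education * i_income) ** (1 / 3)


def hdi_arithmetic(i_health, i_education, i_income):
    """Old-methodology aggregation, for comparison."""
    return (i_health + i_education + i_income) / 3


balanced = (0.7, 0.7, 0.7)      # hypothetical country with even achievements
unbalanced = (0.95, 0.9, 0.25)  # hypothetical country, same arithmetic mean of 0.7
for name, dims in (("balanced", balanced), ("unbalanced", unbalanced)):
    print(name, round(hdi_arithmetic(*dims), 3), round(hdi_geometric(*dims), 3))
```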

Page 2 of the technical notes gives a particularly interesting explanation of how the goalposts are determined:
http://hdr.undp.org/sites/default/files/hdr2018_technical_notes.pdf

When the geometric mean should be used...

An investment of 1 euro returned 10% in the first year (1.10), 13% in the second (1.13), 30% in the third (1.30), and 60% in the fourth (1.60).

What is the average return on the investment over the 4 years?

N.B. The geometric mean is used to answer questions such as this.
We need the constant interest rate which, applied to the invested capital in each of the four years, would yield the same final amount:

$$(1.10 \times 1.13 \times 1.30 \times 1.60)^{1/4} = 1.268$$

It is wrong to calculate the arithmetic mean of the rates of return (growth factors)... let's see why:

$$(1.10 + 1.13 + 1.30 + 1.60)/4 = 1.2825$$

Determination of the amount at the end of the four years:

Year I: 1 + (1 × 0.10) = 1.10
Year II: 1.10 + (1.10 × 0.13) = 1.243
Year III: 1.243 + (1.243 × 0.30) = 1.616
Year IV: 1.616 + (1.616 × 0.60) = 2.585

What is the value of the constant average annual rate that guarantees the same final amount?
$$x_g = (1.10 \times 1.13 \times 1.30 \times 1.60)^{1/4} = 1.268$$

Year I: 1 + (1 × 0.268) = 1.268

Year II: 1.268 + (1.268 × 0.268) = 1.608
Year III: 1.608 + (1.608 × 0.268) = 2.039
Year IV: 2.039 + (2.039 × 0.268) = 2.585
The average annual rate of increase which guarantees the same amount at the end of the period is 0.268 (or 26.8%).
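The Python sketch below reproduces the example. It compounds the four yearly growth factors and takes their geometric mean. It then checks that compounding at the arithmetic mean of the factors would overstate the final amount.

```python
import math

factors = [1.10, 1.13, 1.30, 1.60]  # yearly growth factors from the example

final_amount = math.prod(factors)         # 1 euro compounded over 4 years
geo = final_amount ** (1 / len(factors))  # geometric mean of the factors
arith = sum(factors) / len(factors)       # arithmetic mean of the factors

print(f"final amount: {final_amount:.3f}")  # 2.585
print(f"geometric mean: {geo:.3f}")         # 1.268
print(f"arithmetic mean: {arith:.4f}")      # 1.2825
# Compounding at the geometric mean reproduces the final amount exactly;
# compounding at the arithmetic mean overshoots it.
print(f"{geo ** 4:.3f} vs {arith ** 4:.3f}")
```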

Harmonic mean

The harmonic mean of a set of observations x_1, ..., x_n is the reciprocal of the arithmetic mean of the reciprocals of the terms:

$$x_a = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} \tag{3}$$

Starting from the definition of an analytic mean, the harmonic mean is the value x_a which, substituted for the terms x_1, ..., x_n, leaves the sum of the reciprocals of the terms unchanged:

$$\sum_{i=1}^{n} \frac{1}{x_i}$$
It cannot be calculated if at least one of the terms is equal to 0.
It makes sense to calculate it when the reciprocals of the terms have a meaning (as we will see in the example).

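For a frequency distribution in which the values $x_1, \dots, x_k$ are observed with frequencies $n_1, \dots, n_k$ (with $n = \sum_i n_i$), the harmonic mean is given by
$$\bar{x}_a = \frac{n}{\frac{n_1}{x_1} + \frac{n_2}{x_2} + \dots + \frac{n_k}{x_k}}$$
If we use the relative frequency distribution, setting $f_i = n_i/n$,
$$\bar{x}_a = \frac{1}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \dots + \frac{f_k}{x_k}}$$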

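The harmonic mean of the distribution 2, 3, 3, 4, 4, 4, 4, 4, 7, 7 (n = 10) is
$$\bar{x}_a = \frac{10}{\frac{1}{2} + \frac{2}{3} + \frac{5}{4} + \frac{2}{7}} = 3.70$$
In terms of relative frequencies $f_i$:
$$\bar{x}_a = \frac{1}{\frac{0.1}{2} + \frac{0.2}{3} + \frac{0.5}{4} + \frac{0.2}{7}} = 3.70$$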
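The two formulas for a frequency distribution can be sketched in Python as below, using the values 2, 3, 4, 7 with frequencies 1, 2, 5, 2 implied by the example. The helper names are hypothetical.

```python
def harmonic_mean_freq(values, freqs):
    """Harmonic mean of a frequency distribution: n / sum(n_i / x_i)."""
    if any(x == 0 for x in values):
        raise ValueError("harmonic mean is undefined when a value is 0")
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(values, freqs))

def harmonic_mean_rel(values, rel_freqs):
    """Same mean from relative frequencies f_i = n_i / n: 1 / sum(f_i / x_i)."""
    return 1 / sum(f / x for x, f in zip(values, rel_freqs))

values = [2, 3, 4, 7]
freqs = [1, 2, 5, 2]                        # n = 10
rel = [f / sum(freqs) for f in freqs]       # 0.1, 0.2, 0.5, 0.2

print(round(harmonic_mean_freq(values, freqs), 2))   # 3.7
print(round(harmonic_mean_rel(values, rel), 2))      # 3.7
```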

When to use the harmonic mean

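The numbers of hours taken by 4 secretaries of a company to prepare a call for tenders are

2.5, 2.0, 1.5, 3.0

Determine the average time (in hours) taken to prepare a call for tenders:
$$\bar{x}_a = \frac{4}{\frac{1}{2.5} + \frac{1}{2} + \frac{1}{1.5} + \frac{1}{3}} = \frac{4}{1.9} = 2.1$$

The reciprocals of the terms are the hourly productivities of the 4 secretaries (calls prepared per hour):

0.4, 0.5, 0.67, 0.33

The harmonic mean is appropriate because it averages these productivities arithmetically (1.9/4 = 0.475 calls per hour) and then converts the result back into hours per call (1/0.475 ≈ 2.1).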
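The reasoning of the secretaries example in a short Python sketch, assuming (as the harmonic mean implicitly does) that each secretary works the same amount of time: average the productivities, then invert. If instead each secretary prepared the same number of calls, the arithmetic mean of the hours would be the right summary.

```python
hours = [2.5, 2.0, 1.5, 3.0]               # hours per call, one value per secretary
productivity = [1 / h for h in hours]      # calls prepared per hour

# Average productivity is the arithmetic mean of the reciprocals ...
mean_productivity = sum(productivity) / len(productivity)   # 1.9 / 4 = 0.475
# ... and its reciprocal is the harmonic mean of the hours
avg_time = 1 / mean_productivity
print(round(avg_time, 2))                  # 2.11

# Arithmetic mean of the hours, for comparison
print(sum(hours) / len(hours))             # 2.25
```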

Harmonic mean properties

It always lies between the minimum and the maximum of the distribution.

Whatever the distribution of the (positive) variable, the following relationship always holds:
$$\bar{x}_a \le \bar{x}_g \le \bar{x}$$
It is consistent with scale (similarity) transformations: if we apply the transformation $y_i = b x_i$ to all observations $x_i$ of a variable, the harmonic mean of $y = bx$ is
$$\bar{y}_a = b\,\bar{x}_a$$
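The properties listed above can be checked numerically with a small Python sketch on random positive data (a sanity check, not a proof); the tolerance only absorbs floating-point rounding.

```python
import math
import random

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # exponentiated mean of the logs (requires positive values)
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def harmonic_mean(xs):
    # inverse of the arithmetic mean of the reciprocals
    return len(xs) / sum(1 / x for x in xs)

random.seed(1)
for _ in range(1000):
    xs = [random.uniform(0.1, 100.0) for _ in range(random.randint(1, 20))]
    h, g, a = harmonic_mean(xs), geometric_mean(xs), arithmetic_mean(xs)
    tol = 1e-9 * max(xs)                     # tolerance for floating-point rounding
    # min <= harmonic <= geometric <= arithmetic <= max
    assert min(xs) - tol <= h <= g + tol
    assert g <= a + tol
    assert a <= max(xs) + tol
    # scale transformation y = b * x multiplies the harmonic mean by b
    b = 3.5
    assert math.isclose(harmonic_mean([b * x for x in xs]), b * h)
print("all checks passed")
```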

Gender-Related Development Index - Old Methodology

Gender-Related Development Index

Reflects inequalities between men and women in the three dimensions measured by the HDI:
A long and healthy life
Knowledge
A decent standard of living
Source: Technical Note, Human Development Report 2005, 2007/2008

1 For each dimension, dimension indices are calculated for both genders by rescaling (with fixed minimum and maximum).

At the end of the first step, two indices are obtained for each dimension (female and male life expectancy index, female and male education index, female and male income index).

2 The indices for the two genders are aggregated with a function that penalizes differences between the results achieved by men and women. For each dimension we obtain an equally distributed index:
$$\text{EDI} = \left(\frac{\text{female share}}{\text{female index}} + \frac{\text{male share}}{\text{male index}}\right)^{-1}$$
which is the harmonic mean of the female and male indices, weighted by population shares. This introduces a penalty for inequality between the two genders, because the harmonic mean gives greater weight to the lower value (i.e. to the group with lower results).

Source: Technical Note, Human Development Report 2005, 2007/2008
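A minimal Python sketch of the equally distributed index as a population-weighted harmonic mean, showing the penalty: with the same arithmetic mean (0.5), unequal female and male indices produce a lower EDI than equal ones. The function name is hypothetical.

```python
def equally_distributed_index(female_index, male_index, female_share, male_share):
    """Population-weighted harmonic mean of the female and male indices
    (the 'equally distributed index' of the old GDI methodology)."""
    return 1 / (female_share / female_index + male_share / male_index)

# Equal achievements: no penalty, the EDI equals the common value
print(round(equally_distributed_index(0.5, 0.5, 0.5, 0.5), 3))   # 0.5
# Unequal achievements with the same arithmetic mean (0.5): the EDI is lower
print(round(equally_distributed_index(0.3, 0.7, 0.5, 0.5), 3))   # 0.42
```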



The Gender-Related Development Index is the unweighted mean of the three Equally Distributed Indices (EDI):
$$GDI = \frac{1}{3}\,\text{EDI}_{leb} + \frac{1}{3}\,\text{EDI}_{edu} + \frac{1}{3}\,\text{EDI}_{income}$$



Calculation of the indicator for Botswana


1 Equally distributed life expectancy index
Female: 48.4 years (min 27.5, max 87.5), population share 0.504
Male: 47.6 years (min 22.5, max 82.5), population share 0.496
$$\text{LifeExpInd}_F = \frac{48.4 - 27.5}{87.5 - 27.5} = 0.348$$
$$\text{LifeExpInd}_M = \frac{47.6 - 22.5}{82.5 - 22.5} = 0.418$$
$$\text{EDI}_{leb} = \left(\frac{0.504}{0.348} + \frac{0.496}{0.418}\right)^{-1} = 0.380$$
Source: Technical Note, Human Development Report 2005, 2007/2008

Calculation of the indicator for Botswana


2 Equally distributed education index
Female adult literacy rate 81.8% (min 0, max 100), population share 0.504 (female adult literacy index 0.818)
Female gross enrolment ratio 70.1% (min 0, max 100), population share 0.504 (female gross enrolment index 0.701)
Male adult literacy rate 80.4% (min 0, max 100), population share 0.496 (male adult literacy index 0.804)
Male gross enrolment ratio 69.0% (min 0, max 100), population share 0.496 (male gross enrolment index 0.690)
$$\text{EducationIndex} = \frac{2}{3}\,\text{AdultLiteracyIndex} + \frac{1}{3}\,\text{GrossEnrolmentIndex}$$
$$\text{FemaleEducationIndex} = \frac{2}{3}(0.818) + \frac{1}{3}(0.701) = 0.779$$
$$\text{MaleEducationIndex} = \frac{2}{3}(0.804) + \frac{1}{3}(0.690) = 0.766$$
$$\text{EDI}_{edu} = \left(\frac{0.504}{0.779} + \frac{0.496}{0.766}\right)^{-1} = 0.773$$
Source: Technical Note, Human Development Report 2005, 2007/2008

Calculation of the indicator for Botswana


3 Equally distributed income index
Female: 5913 (PPP US$) (min 100, max 40000), population share 0.504
Male: 19094 (PPP US$) (min 100, max 40000), population share 0.496
$$\text{IncomeInd}_F = \frac{\log(5913) - \log(100)}{\log(40000) - \log(100)} = 0.681$$
$$\text{IncomeInd}_M = \frac{\log(19094) - \log(100)}{\log(40000) - \log(100)} = 0.877$$
$$\text{EDI}_{income} = \left(\frac{0.504}{0.681} + \frac{0.496}{0.877}\right)^{-1} = 0.766$$
Source: Technical Note, Human Development Report 2005, 2007/2008

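4 The GDI is the mean of the indices for the 3 dimensions

$$GDI = \frac{1}{3}\,\text{EDI}_{leb} + \frac{1}{3}\,\text{EDI}_{edu} + \frac{1}{3}\,\text{EDI}_{income}$$
$$GDI = \frac{1}{3}(0.380) + \frac{1}{3}(0.773) + \frac{1}{3}(0.766) \approx 0.64$$
(0.639 when the unrounded components are used)

Source: Technical Note, Human Development Report 2005, 2007/2008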
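The whole Botswana calculation can be sketched end to end in Python with the figures and goalposts used in the formulas above (female life-expectancy goalposts 27.5-87.5, male 22.5-82.5). Natural logarithms are used for income; any base gives the same ratio. Working from unrounded intermediate values, the results agree with the slides up to rounding (the education EDI comes out as 0.772 rather than 0.773). Helper names are hypothetical.

```python
import math

def rescale(x, lo, hi):
    """Dimension index by rescaling with fixed goalposts."""
    return (x - lo) / (hi - lo)

def log_rescale(x, lo, hi):
    """Income index: rescaling applied to the logarithms of income."""
    return (math.log(x) - math.log(lo)) / (math.log(hi) - math.log(lo))

def edi(f_idx, m_idx, f_share, m_share):
    """Equally distributed index: population-weighted harmonic mean."""
    return 1 / (f_share / f_idx + m_share / m_idx)

F_SHARE, M_SHARE = 0.504, 0.496

# 1. Life expectancy
life = edi(rescale(48.4, 27.5, 87.5), rescale(47.6, 22.5, 82.5), F_SHARE, M_SHARE)

# 2. Education: 2/3 adult literacy + 1/3 gross enrolment (rates as proportions)
edu_f = 2 / 3 * 0.818 + 1 / 3 * 0.701
edu_m = 2 / 3 * 0.804 + 1 / 3 * 0.690
edu = edi(edu_f, edu_m, F_SHARE, M_SHARE)

# 3. Income in PPP US$, goalposts 100-40000, on a log scale
inc = edi(log_rescale(5913, 100, 40000), log_rescale(19094, 100, 40000), F_SHARE, M_SHARE)

# 4. GDI: unweighted mean of the three equally distributed indices
gdi = (life + edu + inc) / 3
print(round(life, 3), round(edu, 3), round(inc, 3), round(gdi, 3))
# 0.38 0.772 0.766 0.639
```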

Summary of analytic means

Arithmetic mean → expresses the concept of equal distribution (equidistribution)

Geometric mean → fields of use: average rate of price increase, population growth rate, observations that vary in geometric progression, observations that are the result of ratios

Harmonic mean → average travel speed, purchasing power of money, average productivity

Quadratic mean → eliminates the influence of signs; it is chosen when one wants to give more importance to the observations that deviate most from the typical values of the distribution
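The four means in the summary are special cases of the power mean of order r, $M_r = \left(\frac{1}{n}\sum x_i^r\right)^{1/r}$: r = -1 gives the harmonic mean, r = 1 the arithmetic mean, r = 2 the quadratic mean, and the limit r → 0 the geometric mean. A short Python sketch, applied for illustration to the secretaries' hours from the harmonic-mean example; the printed values increase with r.

```python
import math

def power_mean(xs, r):
    """Power mean of order r of strictly positive values.
    r = -1 harmonic, r = 0 geometric (taken as the limit), r = 1 arithmetic, r = 2 quadratic."""
    n = len(xs)
    if r == 0:
        return math.exp(sum(math.log(x) for x in xs) / n)
    return (sum(x ** r for x in xs) / n) ** (1 / r)

data = [2.5, 2.0, 1.5, 3.0]
for name, r in [("harmonic", -1), ("geometric", 0), ("arithmetic", 1), ("quadratic", 2)]:
    print(f"{name:>10}: {power_mean(data, r):.3f}")
```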

Other synthesis methods



Other widely used methods for aggregating simple indicators are:
The method of penalties by coefficient of variation, known as the Mazziotta-Pareto Index (Mazziotta and Pareto, 2007)
The Adjusted Mazziotta-Pareto Index: a variant of the Mazziotta-Pareto Index used to obtain summary indices for each dimension of the BES (Equitable and Sustainable Well-being).

The Penalty Method for CV - Mazziotta Pareto Index

The method of penalties by coefficient of variation (Mazziotta and Pareto, 2007) provides a synthetic measure of a concept operationalized in several components, under the assumption that no indicator can be substituted by the others (formative approach).

This involves introducing a penalty for units with an unbalanced profile across the components.

The imbalance is measured by the variability between the indicators (previously made dimensionless) of the same statistical unit, i.e. the variability between the values in the same row of the data matrix.

The calculation of the synthetic index involves the following steps

1 Standardization of the indicators
Given a matrix $X = \{x_{ij}\}$ with n rows (territorial units) and m columns (indicators), we compute the matrix $Z = \{z_{ij}\}$:
$$z_{ij} = \frac{x_{ij} - M_{x_j}}{S_{x_j}}\, S + M$$
where
$$M_{x_j} = \frac{\sum_i x_{ij}}{n}, \qquad S_{x_j} = \sqrt{\frac{\sum_i \left(x_{ij} - M_{x_j}\right)^2}{n}}$$
The values are transformed into a standardized variable with mean 100 and standard deviation 10:

S = 10 and M = 100

2 Calculation of the horizontal (row) variability between the values of z for the same unit. Given the matrix $Z = \{z_{ij}\}$, we calculate for each unit the coefficient of variation (CV):
$$CV_i = \frac{S_{z_i}}{M_{z_i}}$$
$CV_i$ is the ratio between the standard deviation and the mean of the values $z_{ij}$ of the i-th unit, where
$$M_{z_i} = \frac{\sum_j z_{ij}}{m}, \qquad S_{z_i} = \sqrt{\frac{\sum_j \left(z_{ij} - M_{z_i}\right)^2}{m}}$$

3 Construction of the synthetic index of the i-th unit ($MPCV_i$) using the formula:
$$MPCV_i = M_{z_i}\left(1 - CV_i^2\right) = M_{z_i} - S_{z_i}\, CV_i$$
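The three steps can be sketched in Python as follows, assuming all indicators have positive polarity, standard deviations with divisor n (as in the formulas above) and the penalized form $MPCV_i = M_{z_i}(1 - CV_i^2)$. The toy matrix is hypothetical: units A and B have unbalanced profiles and the same mean as the balanced unit C, so they are penalized.

```python
import math

def mpcv(X):
    """Mazziotta-Pareto index (penalty by coefficient of variation).
    X: list of rows (units), each with m indicator values of positive polarity.
    Returns one composite score per unit."""
    n, m = len(X), len(X[0])
    S, M = 10.0, 100.0
    # 1. Column standardization to mean 100 and standard deviation 10 (divisor n)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - mu) ** 2 for v in c) / n) for c, mu in zip(cols, means)]
    Z = [[(row[j] - means[j]) / sds[j] * S + M for j in range(m)] for row in X]
    scores = []
    for z in Z:
        # 2. Horizontal (row) mean, standard deviation and coefficient of variation
        mz = sum(z) / m
        sz = math.sqrt(sum((v - mz) ** 2 for v in z) / m)
        cv = sz / mz
        # 3. Penalized mean: M_z (1 - cv^2) = M_z - S_z * cv
        scores.append(mz * (1 - cv ** 2))
    return scores

X = [[10, 20], [20, 10], [15, 15]]        # hypothetical data: 3 units, 2 indicators
print([round(s, 2) for s in mpcv(X)])     # [98.5, 98.5, 100.0]
```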

The Mazziotta-Pareto Index is a function for the synthesis of a set of elementary indicators which assumes that no component can be substituted by the others.

It adopts a non-compensatory approach that requires a balanced distribution across all elementary components.

The index is based on a standardization of the elementary indicators at the reference time, which makes the indicators independent of their units of measurement and variability.

As a consequence, comparisons over time can only be made in relative terms, with respect to the mean at each time (the index is not suitable for absolute comparisons over time).

Focus: what is the coefficient of variation (CV)?

The coefficient of variation (CV)

The variance and standard deviation are affected by the unit of measurement and by the order of magnitude of the data, so they cannot be used directly to compare variability across different groups or different variables.
To compare the variability of two distributions of a character x with μ > 0, we can use the coefficient of variation:
$$CV = \frac{\sigma}{\mu} \cdot 100$$
The coefficient of variation equals 0 when σ = 0.

It can therefore be used to determine whether one distribution is more variable than another, but it tells us nothing about the intensity of this variability (i.e. whether we are far from or close to the case of maximum variability).
It takes negative values if the mean is negative; in this case the absolute value of the mean is used.
It takes infinite values (it is undefined) if the mean is equal to 0.

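Example: the coefficient of variation

Income of a company's junior and senior employees

Employees   x = income
Junior      1100  1000  900   1300  1450  1550
Senior      3000  4000  2800  4500  3500  4200

μ1 = 1216.7   σ1 = 235.70

μ2 = 3666.7   σ2 = 620.9
$$CV_1 = \frac{\sigma_1}{\mu_1} \cdot 100 = \frac{235.70}{1216.7} \cdot 100 = 19.4\%$$
$$CV_2 = \frac{\sigma_2}{\mu_2} \cdot 100 = \frac{620.9}{3666.7} \cdot 100 = 16.9\%$$
Although the senior incomes have the larger standard deviation, the junior incomes are relatively more variable.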
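A Python sketch of the CV using the standard-library `statistics` module, with the population standard deviation (`pstdev`, divisor n), which is what reproduces the σ values of the example. It uses six incomes per group: the sixth values (1550 and 4200) are reconstructed from the reported means and standard deviations, so treat them as an assumption.

```python
import statistics

def coefficient_of_variation(xs):
    """CV in percent: population standard deviation over the absolute mean."""
    mu = statistics.fmean(xs)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return statistics.pstdev(xs) / abs(mu) * 100

junior = [1100, 1000, 900, 1300, 1450, 1550]
senior = [3000, 4000, 2800, 4500, 3500, 4200]
print(round(coefficient_of_variation(junior), 1))   # 19.4
print(round(coefficient_of_variation(senior), 1))   # 16.9
```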



Adjusted Mazziotta Pareto Index

It consists of aggregating, through the arithmetic mean, the elementary indicators transformed with the rescaling method.

The mean obtained is penalized by the horizontal variability of the indicators, using the CV penalty method.

Adjusted Mazziotta Pareto Index

To make absolute comparisons over time possible, a different normalization procedure is adopted: the elementary indicators are rescaled with respect to two goalposts,

a minimum and a maximum that are representative of the entire time period considered (e.g. several years).

Adjusted Mazziotta Pareto Index

The steps for calculating the Adjusted Mazziotta-Pareto Index are as follows (Mazziotta and Pareto, 2015)

1 The rescaling technique is applied to the values $x_{ijt}$ of each indicator $x_j$ in a given year t:
$$R_{ijt} = \frac{x_{ijt} - \text{Min}_{x_j}}{\text{Max}_{x_j} - \text{Min}_{x_j}}$$
where $\text{Min}_{x_j}$ and $\text{Max}_{x_j}$ are the goalposts

Adjusted Mazziotta Pareto Index

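1.a Determination of the goalposts:
$$\text{Min}_{x_j} = \text{Ref}_{x_j} - \Delta_{x_j}, \qquad \text{Max}_{x_j} = \text{Ref}_{x_j} + \Delta_{x_j}$$
where $\text{Ref}_{x_j}$ is the reference value for indicator $x_j$ (e.g. Italy 2010) and $\Delta_{x_j}$ is half the range of variation of the observed values:
$$\Delta_{x_j} = \frac{\text{Sup}_{x_j} - \text{Inf}_{x_j}}{2}$$
with $\text{Sup}_{x_j}$ and $\text{Inf}_{x_j}$ the maximum and minimum of the indicator $x_j$ over the entire time period considered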

2 In the BES methodology, to obtain indicator values centred on 100, on the same scale as a standardization with standard deviation 10, the following transformation is applied:
$$r_{ijt} = R_{ijt} \cdot 60 + 70$$
In the BES indicators the goalposts are defined so that the total for Italy in the base year equals 100.

In this way, regions with an overall level of indicators higher than the value for Italy in the base year will have scores above 100, while those with a lower level will have scores below 100.

For an indicator with negative polarity, the complement to 200 is calculated: $r_{ijt}$ is replaced by $200 - r_{ijt}$.

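3 The synthetic index of the i-th unit ($AMPI_i$) is obtained through the Mazziotta-Pareto formula:
$$AMPI_i = M_{r_i}\left(1 - cv_{r_i}^2\right) = M_{r_i} - S_{r_i}\, cv_{r_i}$$
where $M_{r_i}$, $S_{r_i}$ and $cv_{r_i} = S_{r_i}/M_{r_i}$ are the mean, standard deviation and coefficient of variation of the normalized values of unit i
BES 2015 Report (pp. 49-54)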
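A minimal Python sketch of the AMPI steps: goalposts Ref ± Δ, rescaling to the 70-130 range (the reference value maps to 100), the 200 − r flip for negative polarity, and the penalized mean. It assumes that, after the flip, all indicators have positive polarity, so the penalty is subtracted. The reference values, ranges and regional values are hypothetical.

```python
import math

def goalposts(ref, sup, inf):
    """Goalposts Ref -/+ Delta, with Delta = (Sup - Inf) / 2 over the whole period."""
    delta = (sup - inf) / 2
    return ref - delta, ref + delta

def normalize(x, lo, hi, positive=True):
    """Rescale between the goalposts to 70-130 (reference = 100); flip negative polarity."""
    r = (x - lo) / (hi - lo) * 60 + 70
    return r if positive else 200 - r

def ampi(r_values):
    """Adjusted MPI of one unit: mean penalized by horizontal variability, M (1 - cv^2)."""
    m = len(r_values)
    mean = sum(r_values) / m
    sd = math.sqrt(sum((v - mean) ** 2 for v in r_values) / m)
    cv = sd / mean
    return mean * (1 - cv ** 2)

lo1, hi1 = goalposts(ref=50, sup=70, inf=30)    # hypothetical indicator, positive polarity
lo2, hi2 = goalposts(ref=10, sup=16, inf=4)     # hypothetical indicator, negative polarity
region_a = [normalize(60, lo1, hi1), normalize(7, lo2, hi2, positive=False)]
region_b = [normalize(60, lo1, hi1), normalize(13, lo2, hi2, positive=False)]
print(region_a, round(ampi(region_a), 2))   # [115.0, 115.0] 115.0
print(region_b, round(ampi(region_b), 2))   # [115.0, 85.0] 97.75
```

Region B has the same first indicator as region A but an unbalanced profile, so its mean of 100 is penalized to 97.75.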

The reference matrix is "Indicators". The polarity of the negative indicators was inverted using the transformation 200 − r_ij.

BES 2015
The AMPI combines a mean component and a penalty component, and indicates how each unit ranks with respect to the goalposts.

The choice of the synthesis method is based on the assumption of a formative measurement model, in which the indicators are seen as the cause, rather than the effect, of the latent variable.

The elementary indicators are not interchangeable (omitting an indicator means omitting part of the construct), and the correlations between them are not explained by the measurement model.
Gender Inequality Index


The Gender Inequality Index is a composite measure of gender inequality based on three dimensions:
reproductive health
empowerment
labour market

Reproductive health is measured by two indicators: the maternal mortality ratio and the adolescent birth rate.

Empowerment is measured by the share of parliamentary seats held by women and by the shares of the population with at least some secondary education, by gender.

The labour market dimension is measured by labour force participation, by gender.

A low GII value indicates low inequality between women and men, and vice versa.

The index varies between 0 (perfect equality between the two genders) and 1 (maximum inequality, to the disadvantage of one of the two groups).

The final index is the result of several aggregation functions:
The geometric mean is used to aggregate the indices for the different dimensions, separately for women and men
The harmonic mean is used to aggregate the indices for the two genders

Source: http://hdr.undp.org/sites/default/files/hdr2018_technical_notes.pdf
An introduction to the analysis of the relationships between more variables


Isabella Sulis

Socio-Economic Indicators

yr. 2019/2020

Structure

Bivariate relationships and limitations of bivariate analysis
Introduction to multivariate relationships
Types of multivariate relationships
Notes on trivariate regression and partial correlation coefficients
Reference text: Chapters 9-10-11, Agresti-Finlay

Linear Regression

The following variables were collected for the 50 US states:

homicide rate: number of homicides per 100,000 inhabitants
violent crime rate: number of violent crimes per 100,000 population
percentage of people with at least upper secondary education
percentage of people living in metropolitan areas
percentage of people with income below the poverty level
percentage of single-parent families

The crime rate and the homicide rate are natural response variables, while the poverty rate and the percentage of single-parent families are natural explanatory variables.
In relationships between quantitative variables we analyze three aspects:
1 whether there is an association, and whether it is linear;
2 the strength of the linear association, through the correlation;
3 dependence: through the regression equation we predict the value of the response variable from the value taken by the explanatory variable.

Linear functions

Studying the dependence between two variables

We analyze how y = homicide rate varies across states with the level of x = percentage of the population below the poverty level.

The simplest type of link between two variables x and y is a straight line. In this case we say that y is a linear function of x.

The equation y = α + βx expresses the observations of y as a linear function of the observations of x. Its graph is a line with slope β (beta) and intercept α (alpha).

Linear functions

Example | A linear function

The formula y = 3 + 2x is a linear function of the type y = α + βx with α = 3 and β = 2.

The intercept on the y-axis is 3 and the slope is 2: x = 0 gives y = 3 + 2(0) = 3, while x = 1 gives y = 3 + 2(1) = 5.

[Figure: the line y = 3 + 2x plotted for x between 0 and 6]

Intercept and Slope Interpretation

When x = 0, the equation y = α + βx simplifies to

y = α + β(0) = α

The constant α in this equation is the value of y when x = 0.

Since points on the y-axis have x = 0, the line crosses the y-axis at height α.

Therefore α is called the y-intercept.


Intercept and Slope Interpretation

The slope β (angular coefficient) expresses the change in y for a unit increase in x.
That is, for two values of x that differ by 1.0 (such as x = 0 and x = 1), the values of y differ by β.
For the line y = 3 + 2x, we have y = 3 when x = 0 and y = 5 when x = 1. These values of y differ by β = 5 − 3 = 2.
For two values of x that differ by 10 units, the corresponding values of y differ by 10β.

Intercept and Slope Interpretation

Example | Lines for predicting the crime rate

For the 50 US states we consider the variables y = crime rate and x = poverty rate.

The line y = 210 + 25x approximates the relationship between the two variables.

The y-intercept equals 210: it is the crime rate corresponding to a poverty rate x = 0.

The slope equals 25: when the percentage of the population with income below the poverty level increases by 1, the crime rate increases by about 25 crimes per year per 100,000 inhabitants.

Intercept and Slope Interpretation

Example | Lines for predicting the crime rate

If x = percentage of the population living in urban areas, the line that approximates the relationship is y = 26 + 8x.

The slope is smaller than when the predictor is the poverty rate: an increase of 1 in the percentage of the population below the poverty level corresponds to a greater change in the crime rate than an increase of 1 in the percentage of the population living in urban areas.

Intercept and Slope Interpretation

The equation y = 1756 − 16x, which has slope −16, approximates the relationship between y = crime rate and x = percentage of residents with a high school diploma: each additional percentage point of diploma holders corresponds to about 16 fewer crimes per 100,000 inhabitants.

When β = 0, the graph is a horizontal line: the value of y is constant and does not vary with x.

Models are simple approximations of reality

A model is a simple approximation of the relationship between variables in the population.

The linear function is the simplest mathematical function to describe this relationship in the case of two quantitative variables.

[Figure: scatter plot of homicide rate against poverty rate, with the residual for Louisiana (LA) marked]

Models are simple approximations of reality

For a given value of x, the model y = α + βx predicts a value of y.

The values of α and β are estimated with an estimation method, and the prediction equation is

ŷ = a + bx

where ŷ is the expected (mean) value of y, E(y), for a given value of x.
The closer the predicted values are to the observed values, the better the model.

Forecasting errors: residuals


The prediction equation ŷ = −0.86 + 0.58x predicts the homicide rate from x = poverty rate.

Comparing actual and predicted homicide rates shows how good the prediction equation is.

For example, Massachusetts has x = 10.7 and y = 3.9; its predicted value is

ŷ = −0.86 + 0.58(10.7) = 5.4, so its residual is y − ŷ = 3.9 − 5.4 = −1.5

For Louisiana, x = 26.4 and ŷ = −0.86 + 0.58(26.4) = 14.6

The observed value is y = 20.3, so

the residual is y − ŷ = 20.3 − 14.6 = 5.7

Prediction errors are also called residuals

Forecasting errors: residuals

In a scatter plot, the residual for an observation is the vertical distance between its point and the prediction line.

[Figure: scatter plot of homicide rate against poverty rate, with the residual for Louisiana (LA) marked]

Each observation has a residual. If the prediction line falls close to the points in the scatter plot, the residuals are small.

The prediction equation has the least squares property

We summarize the size of the residuals through the sum of their squares:
$$SSE = \sum (y - \hat{y})^2$$
This quantity describes the variation of the data around the prediction line.

The better the prediction equation, the smaller the residuals and therefore the smaller the SSE.

The prediction equation specified by the formulas for the estimates a and b of α and β,
$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x}$$
has the smallest SSE of all possible linear prediction equations.

a and b are obtained by minimizing the sum of squared residuals.
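A minimal Python sketch of the least-squares estimates for ŷ = a + bx, on hypothetical toy data, together with a check that shifting the fitted line increases the SSE. The helper names are illustrative.

```python
def least_squares(x, y):
    """Least-squares estimates (a, b) for y-hat = a + b x:
    b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2), a = y_bar - b * x_bar."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def sse(x, y, a, b):
    """Sum of squared residuals of the line a + b x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]                  # hypothetical toy data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares(x, y)
best = sse(x, y, a, b)
# Any other line has a larger SSE
assert sse(x, y, a + 0.1, b) > best and sse(x, y, a, b - 0.05) > best
print(round(a, 3), round(b, 3))      # 0.05 1.99
```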

r-squared: proportional reduction in prediction error

The coefficient r² (also written R²) is an index between 0 and 1 that indicates the goodness of fit of the linear regression model to the data (0 = worst value, 1 = best value): the smaller the sum of squared residuals, the closer r² is to 1.
$$r^2 = \frac{TSS - SSE}{TSS} = \frac{\sum (y - \bar{y})^2 - \sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}$$
where $TSS = \sum (y - \bar{y})^2$ is the total sum of squares and $SSE = \sum (y - \hat{y})^2$.

TSS − SSE = SSR, the regression sum of squares, so
$$r^2 = \frac{SSR}{TSS} = \frac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2}$$
This quantity is called r-squared, or coefficient of proportional reduction in prediction error. It equals the square of the correlation coefficient r.
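A Python sketch of the decomposition TSS = SSR + SSE for a simple least-squares fit, checking that (TSS − SSE)/TSS equals the squared Pearson correlation. The toy data are hypothetical; the last line redoes the homicide example from the sums of squares reported in it.

```python
def r_squared_decomposition(x, y):
    """Fit y-hat = a + b x by least squares and return TSS, SSE, SSR, r^2 and r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    y_hat = [a + b * xi for xi in x]
    tss = syy                                               # total sum of squares
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # residual sum of squares
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # regression sum of squares
    r = sxy / (sxx * syy) ** 0.5                            # Pearson correlation
    return tss, sse, ssr, (tss - sse) / tss, r

tss, sse, ssr, r2, r = r_squared_decomposition([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(tss, 3), round(sse, 3), round(ssr, 3), round(r2, 3), round(r ** 2, 3))
# 6.0 2.4 3.6 0.6 0.6

# Homicide example: r^2 from the reported sums of squares
print(round((777.749 - 470.406) / 777.749, 3))   # 0.395
```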

r-squared: proportional reduction in prediction error

Example | r² for homicide rate and poverty rate

The correlation between poverty rate and homicide rate for the 50 U.S. states is r = 0.629, so r² = (0.629)² = 0.395.

To predict the homicide rate, the linear prediction equation ŷ = −0.86 + 0.58x yields 39.5% less error than ȳ:

Source        Sum of Squares
Regression    307.342
Residual      470.406
Total         777.749
$$r^2 = \frac{TSS - SSE}{TSS} = \frac{777.7 - 470.4}{777.7} = \frac{307.3}{777.7} = 0.395$$

r-squared: proportional reduction in prediction error

The total sum of squares, $TSS = \sum (y - \bar{y})^2$, summarizes the variability of the observations y, since it is the numerator of the variance of y.
$SSE = \sum (y - \hat{y})^2$ summarizes the variability around the prediction equation, i.e. the variability of the conditional distributions.
r² is often presented as the proportion of the variability in y explained by the variable x. For example, r² = 0.39 shows that the poverty rate explains about 39% of the variance in the homicide rate.
Bivariate Relationships and Limits

Fertility and use of contraceptive methods

The Robey data set contains the following information for 50 countries:
1 Percentage of married women of childbearing age using contraceptive methods.
2 The country's total fertility rate (average number of children per woman).
3 A variable indicating the geographic region of the country.
Source: Robey, B., Shea, M. A., Rutstein, O. and Morris, L. (1992) The reproductive revolution: New survey findings. Population Reports. Technical Report (in Fox, J. (2008) Applied Regression Analysis and Generalized Linear Models, Second Edition. Sage.)

What is the relationship between contraceptive use and fertility?

COUNTRY      REGION   TFR    CONTRACEPTORS (%)
Botswana     Africa   4.80   35
Burundi      Africa   6.50   9
Cameroon     Africa   5.90   16
Ghana        Africa   6.10   13
Kenya        Africa   6.50   27
Liberia      Africa   6.40   6
Mali         Africa   6.80   5
Mauritius    Africa   2.20   75
Niger        Africa   7.30   4
Nigeria      Africa   5.70   6
Senegal      Africa   6.40   12
Sudan        Africa   4.80   9
Swaziland    Africa   5.00   21
Tanzania     Africa   6.10   10
Togo         Africa   6.10   12
Uganda       Africa   7.20   5
Zambia       Africa   6.30   15
Zimbabwe     Africa   5.30   45

We now analyze the relationship observed in the 50 countries between the following variables: X = percentage of married women of childbearing age who use contraceptive methods and Y = average number of children per woman.

[Figure: scatter plot of y = Total Fertility Rate against x = Percent of contraceptors among married women of childbearing age]

From the scatter plot we find a negative association (discordance) between the two variables (Data source: Robey, B., Shea, M. A., Rutstein, O. and Morris, L. (1992) The reproductive revolution: New survey findings. Population Reports. Technical Report M-11.)

The relationship between y and x is asymmetric (y = response or dependent variable, x = explanatory variable).
The linear correlation coefficient between the two variables in the 50 countries is −0.92

Prediction equation

y = α + βx
ŷ = a + bx = 6.88 − 0.06x

a = 6.88, b = −0.06

[Figure: scatter plot of y = Total Fertility Rate against percent of contraceptors, with the fitted line]

Parameter        Estimate
(INTERCEPT)      6.88
CONTRACEPTORS    −0.06
Root Mean Square Error (s): 0.5745 on 48 degrees of freedom
R-squared: 0.847

Root Mean Square Error: describes the variability of the residuals around the regression line.
$$s = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}}$$

R-squared: the share of the variability in y explained by differences in the values of x
$$r^2 = \frac{TSS - SSE}{TSS} = \frac{\sum (y - \bar{y})^2 - \sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}$$

TSS = SSR + SSE



In Senegal, where CONTRACEPTORS = 12%, the expected value of the fertility rate is

E(y) = 6.88 − 0.06(12) = 6.16

and the prediction error is y − ŷ = 6.40 − 6.16 = 0.24
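A short Python sketch of the Senegal prediction using the rounded coefficients from the regression output (a = 6.88, b = −0.06), plus the residual standard error $s = \sqrt{SSE/(n-2)}$. Since the full Robey data are not listed here, `residual_standard_error` is illustrated on hypothetical toy values whose least-squares line is ŷ = 2.2 + 0.6x.

```python
import math

A, B = 6.88, -0.06            # rounded estimates from the regression output

def predict(contraceptors):
    """Expected TFR for a given percentage of contraceptors."""
    return A + B * contraceptors

y_hat = predict(12)            # Senegal: CONTRACEPTORS = 12%
residual = 6.40 - y_hat        # observed TFR minus predicted TFR
print(round(y_hat, 2), round(residual, 2))   # 6.16 0.24

def residual_standard_error(y, y_hat):
    """s = sqrt(SSE / (n - 2)): spread of the residuals around a fitted simple regression line."""
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return math.sqrt(sse / (len(y) - 2))

# Hypothetical toy data: y = [2, 4, 5, 4, 5] at x = 1..5, fitted values from y-hat = 2.2 + 0.6x
print(round(residual_standard_error([2, 4, 5, 4, 5], [2.8, 3.4, 4.0, 4.6, 5.2]), 3))
```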



Limitations of bivariate analysis

We now analyze the relationship observed in 188 states between the following variables: X = people in the population who use the internet (% of total), Y = average number of children per woman (TFR) (Source: United Nations Development Report, 2015)

[Figure: Internet users vs fertility rates; x = Internet users (% of population), y = Fertility rate]



Has the diffusion of the internet caused a fall in births across countries?

Can we give alternative explanations?

Before drawing such (meaningless) conclusions, we should ask other questions: what is the relationship between these two variables and the countries' level of human development?

We now analyze the relationship observed in 188 states between the following variables (Source: United Nations Development Report, 2015):
Y = average number of children per woman (TFR)
X1 = percentage of people in the population using the Internet
X2 = country's level of human development



The United Nations Development Programme classifies countries into 4 categories on the basis of the Human Development Index (HDI). The HDI takes values between 0 and 1, where 0 is the minimum value and 1 the maximum.

VERY HIGH HUMAN DEVELOPMENT: 0.800 and above

HIGH HUMAN DEVELOPMENT: 0.700-0.799

MEDIUM HUMAN DEVELOPMENT: 0.550-0.699

LOW HUMAN DEVELOPMENT: below 0.550
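The four cut-offs above can be expressed as a simple lookup. Below is a minimal Python sketch. The function name `hdi_category` is hypothetical. The sketch assumes HDI values are reported to three decimals, so a value such as 0.7995 would fall into HIGH.

```python
def hdi_category(hdi):
    """Map an HDI value in [0, 1] to one of the four UNDP groups listed above."""
    if not 0.0 <= hdi <= 1.0:
        raise ValueError("HDI must lie between 0 and 1")
    if hdi >= 0.800:
        return "VERY HIGH"
    if hdi >= 0.700:
        return "HIGH"
    if hdi >= 0.550:
        return "MEDIUM"
    return "LOW"

print(hdi_category(0.62))  # MEDIUM
```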



[Figure: scatterplot of TFR (y-axis) against People who use the Internet (% of population, x-axis), with countries marked by HDI category]

[Figure: four scatterplots of TFR against INTERNETUSERS (%), one for each HDI group (LOW, MEDIUM, HIGH, VERY HIGH). The within-group correlations printed on the panels are r = -0.11 and r = +0.116]

The level of Internet use in a country is related to its level of human development. There is a positive relationship between the two variables.
...the correlation between HDI and INTERNETUSERS is 0.91

The fertility rate is negatively correlated with human development.
...the correlation between HDI and TFR is -0.84

To study the relationship among these three variables properly, we must use trivariate analysis.

For example, we can use the partial correlation to study the correlation between TFR and INTERNETUSERS among countries with the same level of HDI.
...the partial correlation coefficient is 0.11.
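The slides do not show how the partial correlation is computed. A common choice is the first-order partial correlation formula:

$r_{yx_1 \cdot x_2} = \frac{r_{yx_1} - r_{yx_2} r_{x_1x_2}}{\sqrt{(1 - r_{yx_2}^2)(1 - r_{x_1x_2}^2)}}$

Below is a minimal Python sketch of that formula. The inputs are hypothetical. The HDI correlations above (-0.84 and 0.91) are not enough to reproduce 0.11, because the formula also needs the simple TFR and INTERNETUSERS correlation, which the slides do not report.

```python
from math import sqrt

def partial_correlation(r_yx1, r_yx2, r_x1x2):
    """First-order partial correlation of y and x1, controlling for x2."""
    numerator = r_yx1 - r_yx2 * r_x1x2   # remove the part of r_yx1 that runs through x2
    denominator = sqrt((1 - r_yx2 ** 2) * (1 - r_x1x2 ** 2))
    return numerator / denominator

# Hypothetical inputs, not the TFR/INTERNETUSERS data
print(round(partial_correlation(0.5, 0.5, 0.5), 4))  # 0.3333
```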
Introduction to trivariate relationships

When we evaluate the causal relationship X → Y, it is important to verify that three criteria are satisfied:
1 The two variables are associated
2 The X variable chronologically precedes the Y variable
3 No alternative explanations are plausible
Association - Chronological order - Exclusion of alternative explanation

1 The first thing to do is to verify that X and Y are associated. If X → Y, the values of Y must vary when the values of X vary.

However, association does not imply causation.

2 Another important element is chronological order. In a study of parents' educational level (X) and children's educational level (Y), we would say that
X → Y
We reach the same conclusion in a study of household income (X) and spending on vacation travel (Y).
However, it is not always easy to identify a cause-and-effect relationship. For example, when studying the educational qualifications of two parents, there is in general no causal relationship between the two.
Before carrying out an analysis, it is important to identify which variables are in a causal relationship and which instead have a symmetric relationship with each other, with respect to the phenomenon we want to study.

3 Once we have identified a causal relationship, we must make sure that the relationship X → Y cannot be explained in any other way. There may be alternative explanations, or the relationship may be spurious.
Statistical control in social research


Statistical control can be performed by means of a trivariate analysis.

In a trivariate analysis, we study the influence of several explanatory variables (X1, X2) on the response variable. The aim is to identify the net effect of each variable on Y.

A variable is said to be controlled when its influence is removed.

In the example of TFR and INTERNETUSERS, we studied the relationship between these two variables while controlling for the level of the Human Development Index (HDI).
Beware of "lurking" variables

Confounding Variables

A variable that influences the relationship between two variables is called a lurking variable or confounding variable.
The association of X1 (INTERNETUSERS) with Y (TFR) is explained by X2 (HDI).

In the social sciences, all socio-demographic characteristics of the individual are treated as confounding or control variables: age, gender, income, education level, etc.

Statistical control is carried out by splitting the observations into groups, each corresponding to a different level of the variable to be controlled. For example, to control for education, individuals are grouped according to their educational attainment.
Types of multivariate relationships

We can identify five types of multivariate relationships between a response variable Y and two explanatory variables X1 and X2:
1 Spurious Associations
2 Chained relationships
3 Multiple Causes
4 Suppression variables
5 Interactions
Spurious associations

An association between X1 and Y is defined as spurious if two conditions hold: both variables depend on a third variable X2, and their association disappears when X2 is controlled.
The X1 → Y association disappears when we control for the levels of X2.
[Diagram: X2 → X1 and X2 → Y]

In practice, changes in X2 produce changes in both Y and X1. The two are therefore associated, but only because of their common association with X2.

1 X2 has a strong influence on Y (HDI → TFR) ($r_{yx_2} = -0.84$)
2 X2 is also associated with X1 (HDI → INTUSERS) ($r_{x_1x_2} = 0.90$)
3 The association between X1 and Y disappears when controlling for X2: $r_{x_1y \cdot x_2} = 0.11$
[Diagram: X2 → X1 and X2 → Y]
Chained relationships


Chained relationships also show an association that disappears when we control for a third variable:

X1 → X2 → Y

EDU → INCOME → LEB

Here X1 influences X2, which in turn influences Y.
X1 is an indirect cause of Y. X2 is called an intervening (or mediating) variable.

For example, higher education (EDUCATION, X1) is usually associated with higher life expectancy (LEB, Y).

Longer life expectancy could be due to higher incomes (INCOME), which are generally associated with higher levels of education.
[Diagram: EDU (X1) → INCOME (X2) → LIFEEXP (Y); EDU → INCOME INEQUALITY → MORTALITY]

Greater inequality in the income distribution is associated with higher mortality rates (Agresti Ch. 10).
The relationship seems to disappear when controlling for the percentage of residents with a high school diploma.
Higher education rates (EDUCATION) could be associated with less inequality in the distribution of income, which in turn leads to lower mortality rates (MORTALITY).
Multiple Causes

The most frequent situation is that Y is explained by several variables. We therefore speak of multiple causes.
[Diagram: X1 → Y and X2 → Y]
- For example, differences in Y = fertility rates across countries can be explained by several factors: X1 = differences in women's education levels across countries; X2 = differences in women's labor market participation rates; X3 = availability of child care services across countries. What is the relationship of X3 to X2?

Variables that are causally related to Y can be independent of each other.

Consider a study of delinquency, where Y = number of deviant and illegal behaviors performed by a person. The factors considered to affect the values of Y are X1 = GENDER, X2 = RACE and X3 = AGE.

The three variables are independent of each other, but we expect Y to take different values as GENDER, RACE and AGE vary.

The study will also consider other explanatory variables, which may be associated with each other:
X4 = EDUCATION, the educational qualifications
X5 = SES, the socio-economic status of the family of origin
X6 = CRIMINALITY, the crime rate in the area of residence

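[Diagram: GENDER, RACE, AGE, EDUCATION, INCOME and CRIME as causes of Y, with a question mark on how EDUCATION, INCOME and CRIME are related to each other]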

Socioeconomic status and educational attainment can both explain differences in the number of deviant behaviors, but these two factors are likely associated.

So, when we control for education level (X4 = EDUCATION), we expect the relationship between socio-economic status (X5 = SES) and the number of deviant behaviors Y to become weaker.

In other words, we expect the association between X5 = SES and Y to decrease when we control for levels of education (X4 = EDUCATION).

We can also expect that the socioeconomic status of the family of origin (X5 = SES) has:
1 a direct effect on Y
2 an indirect effect through education (X4 = EDUCATION)
[Diagram: X5 → Y and X5 → X4 → Y]

X5 = SES influences X4 = EDUCATION, which in turn influences Y.
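The split between direct and indirect effects can be illustrated with simulated data. Below is a minimal Python sketch that assumes linear relationships with hypothetical path coefficients: SES → EDUCATION = 0.5, SES → Y = 0.3 (direct) and EDUCATION → Y = 0.4. Under these assumptions, the total effect of SES is the direct effect plus the product along the indirect path, 0.3 + 0.5 × 0.4 = 0.5. Regressing Y on SES alone recovers the total effect. Adding EDUCATION as a second predictor leaves only the direct effect. The helper `ols` is illustrative and relies on NumPy's `numpy.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# Hypothetical linear path coefficients (illustrative, not estimated from real data)
ses_to_edu, ses_direct, edu_to_y = 0.5, 0.3, 0.4

ses = rng.normal(size=n)                                      # X5 = SES
edu = ses_to_edu * ses + rng.normal(size=n)                   # X4 = EDUCATION
y = ses_direct * ses + edu_to_y * edu + rng.normal(size=n)    # Y = deviant behaviors

def ols(y, *xs):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

total_effect = ols(y, ses)[1]        # SES alone: direct + indirect (theory: 0.5)
direct_effect = ols(y, ses, edu)[1]  # SES controlling for EDUCATION (theory: 0.3)
print(round(total_effect, 2), round(direct_effect, 2))
```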

In social research, most response variables are influenced by multiple factors, which have both direct and indirect effects on the values we observe.
Suppression variables

In the examples seen so far, we analyzed how the relationship between two variables changes or disappears when we keep a third variable under control.

It can also happen that two variables show no association at first, and the association emerges only when we keep a third variable under control. This third variable is called a suppressor variable.
For example, the relationship between education (EDUCATION) and income (INCOME) emerges when controlling for age (AGE).

Agresti, Finlay (2012), p. 315


1 INCOME (Y) and EDUCATION (X1): null association

                Income
Education    Low    High
Low          250    250
High         250    250

2 EDUCATION (X1) and AGE (X2): negative association, -0.4. The value comes from an association index for qualitative variables with ordered categories, which should be interpreted like a correlation coefficient.

                Education
Age          Low    High
Low          150    350
High         350    150

3 INCOME (Y) and AGE (X2): positive association, +0.4

                Income
Age          Low    High   % HIGH
Low          350    150    30%
High         150    350    70%

Trivariate analysis: we analyze the relationship between INCOME (Y) and EDUCATION (X1), controlling for AGE (X2).
1 Relationship between INCOME (Y) and EDUCATION (X1) in the subgroup AGE = LOW

                Income
Education    Low    High   % HIGH
Low          125     25    16.7%
High         225    125    35.7%

2 Relationship between INCOME (Y) and EDUCATION (X1) in the subgroup AGE = HIGH

                Income
Education    Low    High   % HIGH
Low          125    225    64.3%
High          25    125    83.3%

Only by controlling for X2 = AGE does the relationship between Y and X1 emerge. Within each age group, the share of high incomes is larger among the more educated.
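The suppression effect can be checked directly from the counts in the tables above. Below is a minimal Python sketch. It measures association with the phi coefficient, which is an assumption: the symbol of the index used on the slides was lost in extraction. For a 2×2 table, Kendall's tau-b reduces to the same formula, and it reproduces the -0.4 and +0.4 values reported above. The function name `phi` is illustrative.

```python
from math import sqrt

def phi(table):
    """Phi coefficient for a 2x2 table [[a, b], [c, d]], rows and columns ordered (Low, High)."""
    (a, b), (c, d) = table
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Rows = EDUCATION (Low, High), columns = INCOME (Low, High), counts from the tables above
age_low = [[125, 25], [225, 125]]
age_high = [[125, 225], [25, 125]]
pooled = [[age_low[i][j] + age_high[i][j] for j in range(2)] for i in range(2)]

print(round(phi(pooled), 3))                  # 0.0   no association overall
print(round(phi(age_low), 3))                 # 0.19  positive within AGE = LOW
print(round(phi(age_high), 3))                # 0.19  positive within AGE = HIGH
print(round(phi([[150, 350], [350, 150]]), 3))  # -0.4  AGE vs EDUCATION
print(round(phi([[350, 150], [150, 350]]), 3))  # 0.4   AGE vs INCOME
```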
Statistical interaction

When the effect of a predictor X1 on Y changes as the value of another predictor X2 changes, we say that there is an interaction between X1 and X2.

There is a statistical interaction if the effect of the number of years of EDUCATION (X1) on INCOME (Y) varies with GENDER (X2).
We analyze the relationship between EDUCATION X1 and INCOME Y (in thousands of dollars) for MEN and WOMEN in the same age range (X3 = AGE):
X2 = MEN: $E(Y) = 10 + 4X_1$
X2 = WOMEN: $E(Y) = 10 + 2X_1$
Each additional year of education raises a woman's expected income by 2,000 dollars and a man's by 4,000 dollars....
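The two equations above can be combined into a single model with an interaction term: $E(Y) = 10 + 2X_1 + 2(X_1 \cdot MAN)$. Below is a minimal Python sketch. It assumes a hypothetical 0/1 indicator MAN for GENDER. Both the coding and the function name `expected_income` are illustrative.

```python
def expected_income(years_edu, is_man):
    """E(Y) in thousands of dollars: 10 + 2*X1 for women, 10 + 4*X1 for men.

    Written as one model with an interaction term, 10 + 2*X1 + 2*(X1 * MAN),
    where MAN is a hypothetical 0/1 coding of GENDER.
    """
    man = 1 if is_man else 0
    return 10 + 2 * years_edu + 2 * years_edu * man

for years in (12, 13):
    print(years, expected_income(years, False), expected_income(years, True))
# 12 34 58
# 13 36 62
```

The gap between men and women widens by 2 (thousand dollars) for every extra year of education. In this example, that widening gap is what the interaction means.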
[Diagram: X1 → Y, with X2 modifying the effect of X1 on Y]

From what we have seen so far, it appears that:

Even when variables show no association, it is important to rule out suppressor variables.

When the explanatory variables are associated with each other and with the dependent variable, confounding arises.

Relationships between variables are often of several types at once, and it is not always possible to identify which type of multivariate relationship exists between them.

The relationships we observe often have several causes.

For this reason we resort to multiple regression....
TRIVARIATE Regression Model

We generalize the bivariate model by considering two explanatory variables, x1 and x2:

$E(y) = \alpha + \beta_1 x_1 + \beta_2 x_2$

where $\alpha$, $\beta_1$ and $\beta_2$ are the parameters of the model.
We consider two explanatory variables X1 and X2 and a dependent variable Y.
Each explanatory variable has its own coefficient $\beta$. For example, $\beta_2$ expresses how much the expected value of Y changes after a one-unit change in x2, holding x1 constant (that is, for observations with the same value of x1).
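The parameters $\alpha$, $\beta_1$ and $\beta_2$ are usually estimated by least squares. Below is a minimal Python sketch using NumPy's `numpy.linalg.lstsq` on hypothetical, noise-free data generated from y = 1 + 2x1 + 3x2. The fit should recover exactly those coefficients. The function name `fit_trivariate` is illustrative.

```python
import numpy as np

def fit_trivariate(y, x1, x2):
    """Least-squares estimates (alpha, beta1, beta2) of E(y) = alpha + beta1*x1 + beta2*x2."""
    X = np.column_stack([np.ones(len(y)), x1, x2])   # design matrix with an intercept column
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef

# Hypothetical, noise-free data generated from y = 1 + 2*x1 + 3*x2
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 1 + 2 * x1 + 3 * x2
coef = fit_trivariate(y, x1, x2)
print(np.round(coef, 6))  # recovers alpha = 1, beta1 = 2, beta2 = 3
```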

Example | Education Levels and Crime

For each of the 67 counties in the state of Florida, the following variables were considered:
Y = Crime Rate (annual number of crimes per 1000 inhabitants)
X1 = EDUCATION RATE (% of adults with at least a high school diploma)
X2 = URBANIZATION RATE (% of the population living in urban areas)

As in the bivariate regression model, we assess the goodness of fit of the model using the index $R^2$.
In multiple regression models, the adjusted $R^2$ (a variant of $R^2$) is used.
$R^2$ measures the proportional reduction in error that occurs when we use X1 and X2 to predict Y.

It corresponds to the square of the multiple correlation coefficient, that is, the correlation coefficient between Y and $\hat{Y}$.

The bivariate relationship between Crime Rate and Education Rate is approximated by

$E(Y) = -51.3 + 1.5X_1$

Surprisingly, the association is moderately positive:

$r_{yx_1} = 0.47$

In reality, URBANIZATION is a causal factor common to both variables (EDUCATION RATE and CRIME RATE).
The relationship between Crime Rate and the two predictors considered together is:

$E(y) = 58.9 - 0.6X_1 + 0.7X_2$

Consider a county with EDUCATION RATE X1 = 70 (the median level) and URBANIZATION RATE X2 = 50. Its expected crime rate (Y) is

$E(y) = 58.9 - 0.6(70) + 0.7(50) = 51.9$



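For a county with EDUCATION RATE X1 = 80 and URBANIZATION RATE X2 = 50, the expected crime rate (Y) is

$E(y) = 58.9 - 0.6(80) + 0.7(50) = 45.9$

At the same level of urbanization, ten more points of EDUCATION RATE lower the expected crime rate by $10 \times 0.6 = 6$ crimes per 1000 inhabitants.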
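Both predictions can be reproduced with a small Python function built from the fitted coefficients above. The function name `expected_crime` is illustrative.

```python
# Coefficients of the fitted trivariate model for the Florida counties
ALPHA, B_EDU, B_URB = 58.9, -0.6, 0.7

def expected_crime(edu_rate, urban_rate):
    """Expected crimes per 1000 inhabitants: 58.9 - 0.6*X1 + 0.7*X2."""
    return ALPHA + B_EDU * edu_rate + B_URB * urban_rate

print(round(expected_crime(70, 50), 1))  # 51.9
print(round(expected_crime(80, 50), 1))  # 45.9
# Ten more points of EDUCATION RATE at fixed urbanization: 10 * (-0.6) = -6.0
print(round(expected_crime(80, 50) - expected_crime(70, 50), 1))  # -6.0
```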



We study the effect of X1 while holding X2 fixed.

Let's fix X2 at its mean level of 50. The relationship becomes

$E(Y) = 58.9 - 0.6X_1 + 0.7(50) = 58.9 - 0.6X_1 + 35.0$

$E(Y) = 93.9 - 0.6X_1$

Controlling for X2 by setting it to 50, the relationship between CRIME RATE and EDUCATION RATE is negative rather than positive.
The slope has decreased and changed sign, from +1.5 in the bivariate model to -0.6 in the trivariate model.
For this level of URBANIZATION RATE there is a negative relationship between EDUCATION RATE and CRIME RATE. Since the model contains no interaction term, the same slope of -0.6 holds at any fixed level of X2.

In summary, education has a negative effect on the crime rate when we control for the level of urbanization.
This phenomenon is an example of Simpson's Paradox.
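The sign reversal described above can be reproduced with simulated data. Below is a minimal Python sketch with hypothetical data, not the Florida counties. Urbanization raises both education and crime, while education lowers crime at any fixed level of urbanization. The bivariate slope of crime on education comes out positive. Adding urbanization as a second predictor brings back the negative within-urbanization slope, set here to -0.6. All coefficients and the helper `ols` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
# Hypothetical data: urbanization raises both education and crime,
# while education lowers crime at any fixed level of urbanization.
urban = rng.uniform(0, 100, size=n)
edu = 50 + 0.4 * urban + rng.normal(0, 5, size=n)
crime = 60 - 0.6 * edu + 0.9 * urban + rng.normal(0, 5, size=n)

def ols(y, *xs):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

bivariate_slope = ols(crime, edu)[1]           # positive, driven by urbanization
controlled_slope = ols(crime, edu, urban)[1]   # about -0.6 once urbanization is controlled
print(round(bivariate_slope, 2), round(controlled_slope, 2))
```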
As in the bivariate regression model, we assess the goodness of fit of the model using the index $R^2$.
In multiple regression models, $R^2$ measures the proportional reduction in error that occurs when we use X1 and X2 to predict Y.

It corresponds to the square of the multiple correlation coefficient, that is, the correlation coefficient between Y and $\hat{Y}$.

In models with more than one explanatory variable, an adjusted version (adjusted $R^2$) is used. It accounts for the number of explanatory variables by introducing a penalty factor.
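The slides describe the penalty but not the formula. The usual definition for n observations and k explanatory variables is $R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - k - 1}$. Below is a minimal Python sketch of that formula. The input R² = 0.5 is hypothetical. The values n = 67 and k = 2 mirror the Florida example.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(0.5, 67, 2))  # 0.484375
```

The adjusted value is always at most R² when R² is below 1. The gap grows as k increases relative to n.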
