Comparisons by ratios  Comparisons by differences
http://dati.istat.it/Index.aspx?DataSetCode=DCIS_FECONDITA1
Average Ratios
They can be calculated as the ratio of a quantity and/or a
frequency referring to one or more collectives
Comparisons by differences
$$\frac{x_2 - x_1}{\frac{1}{2}(x_1 + x_2)} \times 100$$
e.g. To compare the population density (inhabitants per km2)
in two countries
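As a minimal sketch of a comparison by differences, the relative difference between two values can be expressed as a percentage of their mean, $(x_2 - x_1) / [\tfrac{1}{2}(x_1 + x_2)] \times 100$. The function name and the two density values below are hypothetical and serve only as an illustration.

```python
def relative_difference_pct(x1: float, x2: float) -> float:
    """Difference x2 - x1 expressed as a percentage of the mean of x1 and x2."""
    mean = (x1 + x2) / 2
    return (x2 - x1) / mean * 100


# Hypothetical population densities (inhabitants per km^2), not real data.
density_a = 100.0
density_b = 150.0
print(round(relative_difference_pct(density_a, density_b), 1))
```

Dividing by the mean rather than by one of the two values makes the result symmetric in size: swapping the two countries only changes the sign.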
Part II
Topics:
Inequality in transferable quantitative traits
Inequality in qualitative variables  Inequality in transferable quantitative characters  Graphical representation of the concentration index
Examples
If we examine the distribution of university students of two
universities with respect to the region of origin and we observe
that the students of the first university all come from the same
region, while the students of the second university come from
different Italian regions, we can say that in the second university
there is more inequality of students with respect to geographical
origin than in the first...
Information on the origin of students enrolled in a university can
be used as an indicator of the university's ability to attract
students from other regions (attractiveness indicator). The
indicator will assume a value equal to 0 when all those enrolled
in the university come from a single region, and a maximum
value when they come in equal measure from all regions.
Similarly, if we compare two states (A and B) with respect to the
distribution of citizens' income: if in state A all citizens
have the same income, while in state B the distribution of
income is U-shaped, we will say that in state B there is more
inequality in the distribution of income than in state A.
In case (1) all statistical units show the same modality.
In case (2) the statistical units are spread across the different
modalities.
Heterogeneity indices
A heterogeneity index should grow as the distribution moves away
from situation (1) and closer to situation (2).
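One index that behaves this way is Gini's heterogeneity index, $1 - \sum_k f_k^2$, divided by its maximum $(K - 1)/K$ so that it ranges from 0 to 1. The slides do not say which index is meant here, so this choice, the function name and the counts below are assumptions made for illustration.

```python
def normalized_heterogeneity(counts):
    """Gini heterogeneity index 1 - sum(f_k^2), rescaled to the range 0-1.

    counts: absolute frequencies, one entry per possible category
    (categories with no units must be listed with a count of 0).
    """
    total = sum(counts)
    k = len(counts)
    if total <= 0 or k < 2:
        raise ValueError("need at least two categories and one unit")
    index = 1 - sum((c / total) ** 2 for c in counts)
    return index / ((k - 1) / k)


# Hypothetical enrolment counts by region of origin (4 regions).
print(normalized_heterogeneity([100, 0, 0, 0]))    # situation (1)
print(normalized_heterogeneity([25, 25, 25, 25]))  # situation (2)
```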
Background
How does wealth vary across social groups, across countries, and
over time?
How is wealth inequality measured? What information should a
synthetic index of inequality provide?
All three distributions have mean 2.9 and the sum of incomes is
14.5
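In distribution (a), for $i = 1, 2, 3, 4, 5$ the cumulative amounts $A_i$ are:
$$A_1 = 1.2$$
$$A_2 = 1.2 + 1.7 = 2.9$$
$$A_3 = 1.2 + 1.7 + 2.9 = 5.8$$
$$\dots$$
$$A_5 = 1.2 + 1.7 + 2.9 + 3.6 + 5.1 = 14.5$$
The ratio
$$Q_i = \frac{A_i}{A_n}$$
indicates the fraction of the total amount held by the poorest $i$ units.
The ratio
$$P_i = \frac{i}{n}$$
indicates the fraction of the poorest $i$ units out of the total $n$ units:
$$P_1 = 1/5,\quad P_2 = 2/5,\quad P_3 = 3/5,\quad P_4 = 4/5,\quad P_5 = 5/5$$

Rank   x_i    A_i    P_i
1      1.2    1.2    0.20
2      1.7    2.9    0.40
3      2.9    5.8    0.60
4      3.6    9.4    0.80
5      5.1   14.5    1.00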
40% of the poorest units own 20% of the total income, 60% of
the poorest units own 40% of the total income
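The cumulative shares quoted above can be reproduced with a few lines of Python. This is only a sketch: the income values are those of distribution (a), while the variable names are chosen here for illustration.

```python
from itertools import accumulate

# Incomes of distribution (a), sorted from poorest to richest.
incomes = sorted([1.2, 1.7, 2.9, 3.6, 5.1])
n = len(incomes)

cumulative = list(accumulate(incomes))   # A_i
total = cumulative[-1]                   # A_n
P = [(i + 1) / n for i in range(n)]      # P_i = i / n
Q = [a / total for a in cumulative]      # Q_i = A_i / A_n

for p, q in zip(P, Q):
    print(f"poorest {p:.0%} of units hold {q:.0%} of total income")
```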
Rank   x_i    A_i    Q_i    P_i
1      1.2    1.2    0.08   0.20
2      1.7    2.9    0.20   0.40
3      2.9    5.8    0.40   0.60
4      3.6    9.4    0.65   0.80
5      5.1   14.5    1.00   1.00

Whenever $P_i = Q_i$, the difference $P_i - Q_i = 0$.
Minimum concentration
Rank   x_i    A_i    Q_i    P_i
1      2.9    2.9    0.20   0.20
2      2.9    5.8    0.40   0.40
3      2.9    8.7    0.60   0.60
4      2.9   11.6    0.80   0.80
5      2.9   14.5    1.00   1.00

$$\sum (P_i - Q_i) = 0$$
Maximum concentration = MAXIMUM VARIABILITY: one unit possesses the
total; the other $n - 1$ units possess a null amount of the character:
$$x_1 = x_2 = \dots = x_{n-1} = 0, \quad x_n = A_n$$
and
$$P_i = \frac{i}{n}, \quad Q_1 = Q_2 = \dots = Q_{n-1} = 0, \quad Q_n = 1$$
so that, for $i < n$,
$$P_i - Q_i = P_i = \frac{i}{n}$$
and for $i = n$
$$P_i - Q_i = 0$$
Maximum concentration

Rank   x_i    A_i    Q_i    P_i
1      0      0      0.00   0.20
2      0      0      0.00   0.40
3      0      0      0.00   0.60
4      0      0      0.00   0.80
5     14.5   14.5    1.00   1.00

$$\sum_{i=1}^{n-1} (P_i - Q_i) = \sum_{i=1}^{n-1} P_i$$
The more concentrated the character, the greater the differences
$P_i - Q_i$.
The maximum value is $\sum_{i=1}^{n-1} P_i$.
The more the character is concentrated, the greater the differences
$P_i - Q_i$.
The concentration measure should take into account all the differences
$P_i - Q_i$ except the last one (which is always equal to 0).
The simplest formula you can use is
$$G = \sum_{i=1}^{n-1} (P_i - Q_i)$$
which is minimal, and equal to 0, if
$$P_1 - Q_1 = 0;\ P_2 - Q_2 = 0;\ \dots;\ P_{n-1} - Q_{n-1} = 0$$
and maximal under maximum concentration, when
$$\sum_{i=1}^{n-1} (P_i - Q_i) = \sum_{i=1}^{n-1} P_i$$
For distribution (a):
$$g = \frac{\sum_{i=1}^{n-1} (P_i - Q_i)}{\sum_{i=1}^{n-1} P_i} = \frac{0.67}{2.00} = 0.335$$
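A short sketch of the ratio $g = \sum_{i=1}^{n-1}(P_i - Q_i) / \sum_{i=1}^{n-1} P_i$ follows. The function name is an assumption made for illustration; the inputs are distribution (a) and the two extreme cases. Because the code does not round $Q_i$ to two decimals, the result for distribution (a) is about 0.334 rather than exactly 0.335.

```python
def concentration_ratio(values):
    """g = sum(P_i - Q_i) / sum(P_i), both sums over i = 1 .. n-1."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n < 2 or total <= 0:
        raise ValueError("need at least two units and a positive total")
    running = 0.0
    numerator = 0.0
    denominator = 0.0
    for i, x in enumerate(xs[:-1], start=1):  # the last difference is always 0
        running += x
        p = i / n              # P_i
        q = running / total    # Q_i
        numerator += p - q
        denominator += p
    return numerator / denominator


print(round(concentration_ratio([1.2, 1.7, 2.9, 3.6, 5.1]), 3))  # distribution (a)
print(round(concentration_ratio([2.9] * 5), 3))                  # minimum concentration
print(round(concentration_ratio([0, 0, 0, 0, 14.5]), 3))         # maximum concentration
```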
Maximum concentration

Rank   x_i    A_i    Q_i    P_i
1      0      0      0.00   0.20
2      0      0      0.00   0.40
3      0      0      0.00   0.60
4      0      0      0.00   0.80
5     14.5   14.5    1.00   1.00

$$\sum_{i=1}^{4} P_i = 0.20 + 0.40 + 0.60 + 0.80 = 2.0$$
$$g = \frac{\sum_{i=1}^{n-1} (P_i - Q_i)}{\sum_{i=1}^{n-1} P_i} = \frac{2.0}{2.0} = 1$$
Minimum concentration

Rank   x_i    A_i    Q_i    P_i
1      2.9    2.9    0.20   0.20
2      2.9    5.8    0.40   0.40
3      2.9    8.7    0.60   0.60
4      2.9   11.6    0.80   0.80
5      2.9   14.5    1.00   1.00

$$g = \frac{\sum_{i=1}^{n-1} (P_i - Q_i)}{\sum_{i=1}^{n-1} P_i} = \frac{0}{2.0} = 0$$
LORENZ CURVE
[Figure: Lorenz curve. The vertical axis shows $Q_i = A_i / A_n$ on a scale from 0 to 1, with $Q_1$ and $Q_2$ marked. Labels: concentration area $S$; concentration polyline.]
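Area $S$ = area of the triangle of maximum concentration minus the area
under the concentration polyline.
The area under the polyline is made up of a triangle of area
$$\frac{1}{2} P_1 Q_1$$
and $n - 1$ trapezoids with bases $Q_i$ and $Q_{i+1}$ and height $P_{i+1} - P_i$, of total area
$$\frac{1}{2} \sum_{i=1}^{n-1} (P_{i+1} - P_i)(Q_{i+1} + Q_i)$$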
$$S = \frac{1}{2} - \frac{1}{2} P_1 Q_1 - \frac{1}{2} \sum_{i=1}^{n-1} (P_{i+1} - P_i)(Q_{i+1} + Q_i)$$
$$S = \frac{1}{2} - \frac{1}{2} \sum_{i=0}^{n-1} (P_{i+1} - P_i)(Q_{i+1} + Q_i)$$
with $P_0 = Q_0 = 0$.
Under maximum concentration:
$$S = \frac{1}{2} \cdot \frac{n - 1}{n}$$
Concentration polyline: summary
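However, since for large $n$ the two triangles $OB'C$ and $OBC$
are approximately equal, Gini proposed taking $\frac{1}{2}$ as the maximum value.
So, dividing the area $S$ by its maximum $\frac{1}{2}$:
$$R = \frac{\frac{1}{2} - \frac{1}{2} \sum_{i=0}^{n-1} (P_{i+1} - P_i)(Q_{i+1} + Q_i)}{\frac{1}{2}} = 1 - \sum_{i=0}^{n-1} (P_{i+1} - P_i)(Q_{i+1} + Q_i)$$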
2 Independence from the mean: if all incomes are multiplied by a positive constant, the
value of the index $I$ does not change. Given the two distributions $(5, 10, 3)$ and
$(10, 20, 6)$, the index of inequality is the same for both: $I(5, 10, 3) = I(10, 20, 6)$.
The index does not satisfy the principle of diminishing transfers [4.1].
The Gini index also does not satisfy the property of exact
decomposability between population groups [5].
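The ratio $R = 1 - \sum_{i=0}^{n-1} (P_{i+1} - P_i)(Q_{i+1} + Q_i)$, with $P_0 = Q_0 = 0$, can be sketched in a few lines, which also makes the independence from the mean easy to check numerically. The function name is an assumption made for illustration; the two distributions are the ones quoted above.

```python
def gini_ratio(values):
    """R = 1 - sum over i of (P_{i+1} - P_i) * (Q_{i+1} + Q_i), P_0 = Q_0 = 0."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n < 1 or total <= 0:
        raise ValueError("need at least one unit and a positive total")
    prev_p = prev_q = 0.0
    running = 0.0
    trapezoids = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        p, q = i / n, running / total
        trapezoids += (p - prev_p) * (q + prev_q)
        prev_p, prev_q = p, q
    return 1 - trapezoids


a = gini_ratio([5, 10, 3])
b = gini_ratio([10, 20, 6])   # every income doubled
print(round(a, 4), round(b, 4))
```

With this normalisation the value under maximum concentration is $(n - 1)/n$ rather than 1, which is the price of taking $\frac{1}{2}$ as the maximum area.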
Gini index measures the extent to which the distribution of income (or, in
some cases, consumption expenditure) among individuals or households
within an economy deviates from a perfectly equal distribution. A Lorenz
curve plots the cumulative percentages of total income received against the
cumulative number of recipients, starting with the poorest individual or
household. The Gini index measures the area between the Lorenz curve and
a hypothetical line of absolute equality, expressed as a percentage of the
maximum area under the line. Thus a Gini index of 0 represents perfect
equality, while an index of 100 implies perfect inequality.
http://www.oecd.org/social/income-distribution-database.htm
To download key indicators click on
http://www.oecd.org/social/soc/IDD-Key-Indicators.pdf
To download the tables of data in excel click on
http://www.oecd.org/social/OECD2016-Inequality-Update-Figures.xlsx
Composite indicators Transformation of variables
Part I
References:
1 Delvecchio (1995), Measurement Scales and Social Indicators, Chap. 5,
pp. 117-141 and pp. 158-160.
Alternative text in English: Nardo M., Saisana M.,
Tarantola S., Hoffman A., Giovannini E. (2005), Handbook on
Constructing Composite Indicators: Methodology and User Guide,
OECD, pp. 1-49.
2 Leti G., Cerbara L. (2009), Elements of Descriptive Statistics, Chap. 10,
"The averages of distributions according to a character", pp. 185-189.
Composite indicators
Simple or composite?
This is the case when several indicators are available that can
be used to define the phenomenon, and the ranking of the
units depends on which of the available indicators is selected
to monitor the phenomenon.
Income: employment income? Property income?
We denote by $x_{ij}$ the value observed for the $i$-th unit on the
$j$-th indicator.
The composite indicator for the $i$-th unit is obtained by
synthesizing the values observed on several simple
indicators
$$x_{i1}, x_{i2}, \dots, x_{im}$$
$$s_i = f(x_{i1}, \dots, x_{im})$$
If we use the simple (unweighted) arithmetic mean of the
$m$ indicators we will have
$$s_i = \frac{\sum_{j=1}^{m} x_{ij}}{m}$$
We then obtain $n$ values of the composite indicator, $s = (s_1, \dots, s_n)$.
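As a minimal sketch, the unweighted arithmetic mean can be applied row by row to an $n \times m$ table of indicator values. The function name and the small data table are hypothetical, and the sketch assumes the indicators are already expressed on comparable scales.

```python
def composite_mean(rows):
    """s_i = (x_i1 + ... + x_im) / m for each unit i (one row per unit)."""
    return [sum(row) / len(row) for row in rows]


# Hypothetical values: 3 units (rows) observed on m = 3 indicators (columns).
x = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [0.5, 0.5, 2.0],
]
s = composite_mean(x)
print(s)
```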
$$x_{i1}, \dots, x_{im}$$
$$\downarrow$$
$$g(x_{i1}), \dots, g(x_{im})$$
$g(x)$ is a transformation of the original data, made with the aim of
obtaining measurements that all have the same direction and
the same unit of measurement.
Example:
Suppose we measure the socioeconomic status of
households through two indicators
socio-economic status =f(economic status, social status)
The two variables are not directly measurable, so we use two
indicators x1 and x2
$x_1$ = annual household income (in thousands of euros) → $x_1$
$x_2$ = education of the parent with the highest level of education
(number of years taken to obtain the degree) → $x_2$
Suppose:
for the x1 indicator I observe values between 5400-154000 euros
with an average of 36000 euros
for x2 I observe values between 5-25 years
I want to synthesize the values with a composite indicator, using
the arithmetic mean as a synthesis function
$$z = g(x) = \frac{x - \bar{x}}{\sigma}$$
Transformation of variables
1 Ordinal approach
2 Cardinal approach
Values of elementary indicators transformed into index numbers
Negative indicators
For Piedmont:
$$X^*_{11} = 1000 - 8.2 = 991.8$$
$$X^*_2 = 10000 - X_2$$
In the next table the direction of all negative indicators has
been changed.
$$X^*_1 = 1000 - X_1$$
$$X^* = 1/X$$
[Figure: plot of the transformed indicator $X_5^*$ (vertical axis, about 0.7 to 1.9) against $X_5$ (horizontal axis, 0.50 to 1.00).]
Ranking
For each elementary indicator the statistical units are ordered in
ascending order and each unit is assigned a value equal to the
order number (or rank) that the unit occupies in the ranking
g (x) = rank(x)
If it is a negative indicator and there is no change of direction,
the units will be sorted in descending order.
When several units have the same value, the average rank is
assigned to them.
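The ranking rule described above, including the average rank for tied units and the descending order for negative indicators, can be sketched as follows. The function name and the sample values are hypothetical.

```python
def average_ranks(values, descending=False):
    """Rank values from 1 to n; tied values share the average of their ranks.

    Use descending=True for a negative indicator whose direction
    has not been changed.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


print(average_ranks([10, 20, 20, 30]))
print(average_ranks([10, 20, 20, 30], descending=True))
```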
Sum of ranks
$$t_i = \sum_{j=1}^{m} g_{ij}$$
$$2 + 1 + 4 + 5 + \dots + 6 + 1 + 3 + 7 + 6 = 57.5$$
$$s_i = \frac{t_i - \min(s)}{\max(s) - \min(s)}$$
or, between 0 and 100:
$$s'_i = \frac{t_i - \min(s)}{\max(s) - \min(s)} \times 100$$
Neither composite indicator ($s$ or $s'$) takes into account
the values assumed by the statistical units on each of the
elementary indicators that contributed to it, but
only their relative position.
To remember:
1 You lose all information about the levels.
2 The transformation is not suitable for temporal comparisons.
ESC.PERCENT.RANGO('indicators+'!B$2:B$22, 'indicators+'!B2)*100
(ESC.PERCENT.RANGO is the Italian name of the Excel function PERCENTRANK.EXC.)
matrix: the values of variable X1 are found in the Excel sheet 'Indicators+', in column
B, rows 2 to 22 (B2:B22)
x: the observation whose percentile rank is wanted, here the first observation (x11, which is in
cell B2)
ROUND(num, num_digits)
(ARROTONDA in Italian-language Excel.)
num: number to be rounded.
num_digits: number of digits to which the num argument is rounded.
If num_digits is greater than 0 (zero), num will be rounded to
the specified number of decimal places.
If num_digits is equal to 0, num will be rounded to the nearest
integer.
If num_digits is less than 0, num will be rounded to the left of the
decimal point.
Synthesis
$$s_i = \sum_{j=1}^{m} p_{ij}$$
$$\frac{s_i - \min(s)}{\max(s) - \min(s)}$$
$$g(x) = \frac{x}{\bar{x}}$$
In the example relating to the quality of life in Italian regions
n = 20 and m = 16
For each unit the arithmetic mean (row average) of the index
numbers is calculated.
The region with the lowest value is Basilicata (0.78), while the
region with the highest value is Liguria (1.17).
Re-scaling
In the lab you will see how to apply the function in Excel
The value of R1 for Piedmont was rescaled using the following function
=('indicators+'!B2-MIN('indicators+'!B$2:B$21))/(MAX('indicators+'!B$2:B$21)-MIN('indicators+'!B$2:B$21))
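The same re-scaling, $(x - \min) / (\max - \min)$ computed within one indicator, can be sketched in Python. The function name and the sample values are hypothetical; the logic mirrors the Excel formula, applied to a whole column at once.

```python
def rescale(values):
    """Min-max re-scaling of one indicator to the range 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal: the range is zero")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical column of indicator values (one entry per region).
column = [2.0, 4.0, 6.0]
print(rescale(column))
```

The unit with the smallest value gets 0 and the unit with the largest gets 1, so the result depends on the two extreme observations.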
$$s_i = \frac{\sum_{j=1}^{m} R_{ij}}{m}, \quad i = 1, \dots, n$$
Standardization
Each value of the elementary indicator ($x_{ij}$) is transformed
into a standardized deviation (i.e. z-score)
$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{\sigma_j}$$
where
$$\sigma_j = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}$$
Therefore
$$g(x) = \frac{x - \bar{x}}{\sigma}$$
This measures how far the individual observations $x_{ij}$ are from the
mean of variable $j$, in units of standard deviation.
Range of variation of the new variables → $-\infty < g(x) < +\infty$
The indicators are transformed into a common scale with
mean 0 and variance 1.
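A minimal sketch of the standardization, using the standard deviation with divisor $n$ as in the definition of $\sigma_j$. The function name and the sample values are hypothetical.

```python
import math


def z_scores(values):
    """z_i = (x_i - mean) / sigma, with sigma computed with divisor n."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sigma == 0:
        raise ValueError("all values are equal: sigma is zero")
    return [(v - mean) / sigma for v in values]


# Hypothetical indicator values: mean 5, sigma 2.
z = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
print(z)
```

After the transformation the values have mean 0 and variance 1, whatever the original unit of measurement.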
For Piedmont
$$z_{11} = \frac{991.8 - 991.63}{\sigma_1} = 0.0939$$
$$P_{ij} = \frac{x_{ij}}{\sum_{i=1}^{n} x_{ij}} \times 100$$
For Piedmont:
$$P_{11} = \frac{991.8}{19832.6} \times 100$$
$$s_i = \frac{\sum_{j=1}^{m} P_{ij}}{m}, \quad i = 1, \dots, n$$
$s_i$ will assume values between 0 and 100.
The indicator Q that is derived takes values between 0-100 (or between 0-1 if relative frequencies or
proportions are used)
Households by ability to make ends meet

Category                                 p_k (%)   q_k (weight)   F_o x 100   F_r x 100
easily and very easily                     3.2        0.00           3.2        100.0
with some difficulty and quite easily     69.4        0.33          72.6        100.0
with difficulty                           19.5        0.66          92.1        100.0
with great difficulty                      7.9        1.00         100.0        100.0
total                                    100.0

p_k: observed frequencies (%). F_o: observed cumulative frequencies. F_r: reference theoretical cumulative frequencies (theoretical situation of minimal difficulty).
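$$z' = \frac{\sum_{k=1}^{K-1} |F_{ok} - F_{rk}|}{K - 1}$$
(Leti, 1983)
$K$ = the number of categories
$F_{ok}$ = cumulative frequency of category $k$ in the observed distribution
$F_{rk}$ = cumulative frequency of category $k$ in the reference (theoretical) distribution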
Households by ability to make ends meet: cumulative percentage frequencies, observed ($P_o$ x 100) and theoretical under minimal difficulty ($P_r$ x 100)

Category                                 P_o x 100   P_r x 100
easily and very easily                      3.2        100.0
with some difficulty and quite easily      72.6        100.0
with difficulty                            92.1        100.0
with great difficulty                     100.0        100.0

$K - 1 = 3$, $z' = 132.1 / 3 = 44.03$
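The calculation of $z'$ can be sketched as follows: the first $K - 1$ observed cumulative percentages are compared with a reference distribution whose cumulative percentage is 100 in every category (the theoretical situation of minimal difficulty). The function name is an assumption made for illustration; the input is the observed distribution used above.

```python
from itertools import accumulate


def hardship_index(percentages):
    """z' = sum over k = 1 .. K-1 of |F_ok - 100| / (K - 1).

    percentages: observed percentage frequencies, ordered from the
    category of least difficulty to the category of greatest difficulty.
    """
    k = len(percentages)
    if k < 2:
        raise ValueError("need at least two categories")
    cumulative = list(accumulate(percentages))  # F_ok x 100
    # The last cumulative value is always 100, so it is left out.
    return sum(abs(f - 100.0) for f in cumulative[:-1]) / (k - 1)


observed = [3.2, 69.4, 19.5, 7.9]
print(round(hardship_index(observed), 2))
```

If the categories are listed from greatest to least difficulty, the list has to be reversed before it is passed to the function.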
Derive the value of the hardship index z' for households with a different number of members
(territory: Italy; time: 2017).

Perceived economic situation, by number of household members (%)

Household     with great   with         with some difficulty   easily and
members       difficulty   difficulty   and quite easily       very easily   z'
one              7.9         22.1            67.0                 3.0
two              5.9         16.3            74.3                 3.5
three            8.3         17.1            71.3                 3.3
four             9.4         20.6            67.0                 3.0
five or more    13.5         26.2            58.1                 2.2
total            7.9         19.5            69.5                 3.2
Comparison of rankings
$$\rho_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
where $d_i$ is the difference between the ranks of unit $i$ in the two rankings.
$$-1 \le \rho_s \le 1$$
Let's take the assessment of language and math skills for 9
countries as an example
$$\rho_s = 1 - \frac{6 \cdot 27}{9(81 - 1)} = 0.775 \qquad (2)$$
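The Spearman coefficient without ties, $1 - 6 \sum d_i^2 / [n(n^2 - 1)]$, can be sketched in a few lines. The function names and the three-unit rankings are hypothetical; the last call reuses the figures of the example ($n = 9$ and a sum of squared rank differences of 27).

```python
def rho_from_sum_d2(sum_d2, n):
    """Spearman coefficient from the sum of squared rank differences (no ties)."""
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def spearman_rho(ranks_a, ranks_b):
    """Spearman coefficient between two rankings of the same n units (no ties)."""
    if len(ranks_a) != len(ranks_b) or len(ranks_a) < 2:
        raise ValueError("rankings must have the same length, at least 2")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return rho_from_sum_d2(sum_d2, len(ranks_a))


print(spearman_rho([1, 2, 3], [1, 2, 3]))  # identical rankings
print(spearman_rho([1, 2, 3], [3, 2, 1]))  # reversed rankings
print(round(rho_from_sum_d2(27, 9), 3))    # figures from the example
```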
When the number of ties is high, their value is not negligible and
should be taken into account in the calculation of the index. A
modified version of the Spearman index is used that takes into
account the number of ties in both rankings (the formula is quite
complicated to remember).
Part I