
by

Tanmoy Talukdar, Avik Sarkar, Ritam Bhaumik and Riddhipratim Basu
Measuring inequality has long been
an important problem in
economic statistics.

When the data are univariate, we know
of various parametric inequality
measures, which we generalize for
multivariate applications.

However, it seems that distribution-


On why we prefer a nonparametric inequality measure

Nonparametric measures, being
distribution-free, do not depend on
distributional assumptions (which need not be valid),
unlike parametric measures.

They will, moreover, be robust to small
errors in measurement which might be
present in the data.
The problem that motivated us:

Suppose we have data on all
states of India for more than one
variable (attribute) indicating the
socio-economic growth of a state. We
are interested in finding whether
there is a significant amount of inequality
among the states in this respect.

As we want to construct a nonparametric


measure, it is most natural to look at the
»» Let us suppose we have n > 1 blocks,
denoted by B1,B2,…,Bn.
»» Let us assume we also have m > 1
categories.
We assume that all the categories are equally
important.
»» For each of the m categories we have a
ranking of the n blocks.
The nature of the data may tempt us to use
A necessary assumption at this
stage will be that
there are no ties.

Under the above assumption


each rank vector will be a
permutation of {1,2,...,n}.
Denote the rank vectors as σ1,σ2,…,σm.
Amxn = [σ1’,σ2’ ,…σm’ ]’ is the Rank matrix.

1 5 2 3 4
 
3 4 1 5 2
Example: 3 4 1 5 2
 
5 2 1 4 3

The above is a typical example of a rank matrix


with four categories and five blocks.
Definition:
A Rank matrix Amxn = [σ1’,σ2’ ,…σm’]’ is said to
be a Complete Inequality configuration if all the
rows of A are identical, i.e., σ1 =σ2 =… =σm.
(NOTE: Complete inequality configuration is not
unique.)
Here is a typical 3example:
4 1 5 2
 
 3 4 1 5 2 
3 4 1 5 2
 
3 4 1 5 2
Null hypothesis:

H0 : P(σi = σ) = 1/n! for all
permutations σ of {1,2,…,n}.
Given a rank matrix A, our objective
is to find a measure of inequality
corresponding to that matrix.
Using the earlier definition
of Complete Inequality, we
propose a measure of inequality
based on some notion of
“distance” of the given rank matrix
A from a Complete Inequality configuration.
Let Sn denote the set of all permutations of
{1, 2, ..., n}.
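Definitions:
$d : S_n \times S_n \to \mathbb{R}^{+} \cup \{0\}$ is a distance function if, for all $\sigma, \tau \in S_n$:
1. $d(\sigma, \tau) = 0$ iff $\sigma = \tau$, and
2. $d(\sigma, \tau) = d(\tau, \sigma)$.
d is called a metric if, in addition to conditions 1 and 2
above, it satisfies the triangle inequality, i.e.,
\[ \forall\, \sigma, \tau, \rho \in S_n, \quad d(\sigma, \tau) \le d(\sigma, \rho) + d(\rho, \tau) . \]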
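1. Spearman’s Distance:
\[ d_1(\sigma, \tau) = \sum_{k=1}^{n} \bigl( \sigma(k) - \tau(k) \bigr)^2 . \]

2. Spearman’s Footrule*:
\[ d_2(\sigma, \tau) = \sum_{k=1}^{n} \bigl| \sigma(k) - \tau(k) \bigr| . \]

3. Kendall’s Distance*:
\[ d_K(\sigma, \tau) = \sum_{k<l} \mathbf{1}\bigl\{ (\sigma(k) - \sigma(l)) (\tau(k) - \tau(l)) < 0 \bigr\} . \]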

4. Cayley’s Distance*:
Cayley's Distance dC between two permutations σ and τ is
given by the minimum number of transpositions needed to
reach σ from τ.

[The * denotes the distances that are metrics as well.]
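The four distances above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: permutations are assumed to be tuples of ranks, the function names are made up for this example, and Cayley's distance is computed through the standard identity "n minus the number of cycles".

```python
from itertools import combinations

def spearman_distance(s, t):
    """d1: sum of squared rank differences (not a metric)."""
    return sum((a - b) ** 2 for a, b in zip(s, t))

def spearman_footrule(s, t):
    """d2: sum of absolute rank differences."""
    return sum(abs(a - b) for a, b in zip(s, t))

def kendall_distance(s, t):
    """d_K: number of position pairs (k, l) that s and t order oppositely."""
    return sum(
        (s[k] - s[l]) * (t[k] - t[l]) < 0
        for k, l in combinations(range(len(s)), 2)
    )

def cayley_distance(s, t):
    """d_C: n minus the number of cycles of the value map s[k] -> t[k]."""
    step = dict(zip(s, t))
    seen, cycles = set(), 0
    for start in step:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:
            seen.add(x)
            x = step[x]
    return len(s) - cycles

sigma, tau = (3, 1, 4, 2), (1, 2, 3, 4)
print(spearman_distance(sigma, tau), spearman_footrule(sigma, tau),
      kendall_distance(sigma, tau), cayley_distance(sigma, tau))
```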


Given a Rank matrix Amxn = [σ1’,σ2’ ,…,σm’]’ and a distance d
on Sn, we propose the following D-Measures:

(i) \[ D_1^{d}(A) = \min_{\sigma \in S_n} \sum_{1 \le i \le m} d(\sigma_i, \sigma) . \]

(ii) \[ D_2^{d}(A) = \sum_{1 \le i < j \le m} d(\sigma_i, \sigma_j) . \]

[NB: Both the D-Measures above attain the value 0 if and only if A is
a Complete Inequality configuration.]
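A minimal sketch of the two D-Measures, assuming the rank matrix is a list of rank tuples and using Spearman's Footrule as the distance. The first measure is computed by brute force over all of Sn, which is only feasible for small n; the function names are illustrative, and the example matrix is the earlier four-category, five-block example read as four rows.

```python
from itertools import combinations, permutations

def footrule(s, t):
    """Spearman's Footrule between two rank tuples."""
    return sum(abs(a - b) for a, b in zip(s, t))

def d1_measure(A, d):
    """D1: smallest total distance from the rows of A to a single permutation."""
    n = len(A[0])
    return min(
        sum(d(row, sigma) for row in A)
        for sigma in permutations(range(1, n + 1))
    )

def d2_measure(A, d):
    """D2: sum of distances over all unordered pairs of rows of A."""
    return sum(d(r, s) for r, s in combinations(A, 2))

A = [(1, 5, 2, 3, 4), (3, 4, 1, 5, 2), (3, 4, 1, 5, 2), (5, 2, 1, 4, 3)]
complete = [(3, 4, 1, 5, 2)] * 4   # a Complete Inequality configuration
print(d1_measure(A, footrule), d2_measure(A, footrule))
print(d1_measure(complete, footrule), d2_measure(complete, footrule))
```

The brute-force minimum makes the cost of the first measure visible: it scans n! candidate permutations, while the second measure needs only the pairwise distances between rows.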
Theorem 1:
For any distance d on Sn and any rank matrix Amxn:
(i) \[ D_2^{d}(A) \ge \frac{m}{2}\, D_1^{d}(A) . \]
(ii) If d is a metric, then
\[ D_2^{d}(A) \le (m-1)\, D_1^{d}(A) . \]

Thus, if d is a metric, then for a fixed matrix A, both
the D-Measures are of the same order. But $D_2^{d}$ is far
easier to compute for well-behaved d than $D_1^{d}$.
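The two inequalities of Theorem 1 can be checked numerically on random rank matrices. The sketch below is a sanity check under stated assumptions, not a proof: it uses small sizes (m = n = 4) so that the first measure can be brute-forced, Spearman's Distance as a non-metric distance and Spearman's Footrule as a metric, with illustrative function names.

```python
import random
from itertools import combinations, permutations

def spearman_distance(s, t):
    return sum((a - b) ** 2 for a, b in zip(s, t))   # a distance, not a metric

def footrule(s, t):
    return sum(abs(a - b) for a, b in zip(s, t))     # a metric

def d1_measure(A, d):
    n = len(A[0])
    return min(sum(d(row, sigma) for row in A)
               for sigma in permutations(range(1, n + 1)))

def d2_measure(A, d):
    return sum(d(r, s) for r, s in combinations(A, 2))

rng = random.Random(0)
m, n = 4, 4
for _ in range(200):
    A = [tuple(rng.sample(range(1, n + 1), n)) for _ in range(m)]
    for d, is_metric in ((spearman_distance, False), (footrule, True)):
        d1, d2 = d1_measure(A, d), d2_measure(A, d)
        assert 2 * d2 >= m * d1            # Theorem 1 (i), any distance
        if is_metric:
            assert d2 <= (m - 1) * d1      # Theorem 1 (ii), metrics only
print("Theorem 1 held on every sampled matrix")
```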
»» We want to choose a suitable
distance function on Sn which will be
sensitive to inequality in the rank matrix
A.
»» Spearman's Distance and Spearman's
Footrule are both restrictions of
distance functions on Rn to Sn , and
hence, are not reflective of the special
structure of Sn and A.
»» Cayley’s Distance gives equal
Call υ(i) the (i)-value adjacent transposition if, for all σ
in Sn, υ(i) acting upon σ swaps the values i and (i+1).

• Call υ a value adjacent transposition if υ = υ(i)
for some i.
Example:

Place Adjacent Transposition 3 1 4 2


Value Adjacent Transposition 3 1 4 2
We choose a suitable metric on Sn ,d* as follows:
d* (σ,τ) is defined as the minimum number of
value adjacent transpositions needed to reach τ
starting from σ.
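Consider the problem of calculating d* (σ,τ)
where τ = (1,2,3,4) and σ = (3,1,4,2).
Starting from τ, three value adjacent transpositions reach σ:

Step 1: 1 2 3 4 → 2 1 3 4 (swap values 1 and 2)
Step 2: 2 1 3 4 → 2 1 4 3 (swap values 3 and 4)
Step 3: 2 1 4 3 → 3 1 4 2 (swap values 2 and 3)

So, d* (σ,τ) = 3.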
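The definition of d* can be evaluated directly by a breadth-first search over value adjacent transpositions. This is a sketch for small n only (the search space has n! states), with permutations assumed to be tuples of ranks and function names chosen for illustration.

```python
from collections import deque

def value_adjacent_neighbors(sigma):
    """All permutations reachable from sigma by swapping the values i and i+1."""
    for i in range(1, len(sigma)):
        swap = {i: i + 1, i + 1: i}
        yield tuple(swap.get(v, v) for v in sigma)

def d_star(sigma, tau):
    """Minimum number of value adjacent transpositions from sigma to tau (BFS)."""
    sigma, tau = tuple(sigma), tuple(tau)
    dist = {sigma: 0}
    queue = deque([sigma])
    while queue:
        cur = queue.popleft()
        if cur == tau:
            return dist[cur]
        for nxt in value_adjacent_neighbors(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    raise ValueError("sigma and tau must be permutations of the same set")

print(d_star((3, 1, 4, 2), (1, 2, 3, 4)))
```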
We propose the following D-Measure:
D* = $D_2^{d^*}$,
i.e. for a rank matrix Amxn,
\[ D^*(A) = D_2^{d^*}(A) = \sum_{1 \le i < j \le m} d^*(\sigma_i, \sigma_j) . \]
Proposition 1: Let $\sigma, \sigma' \in S_n$. Then
\[ d^*(\sigma, \sigma') = d_K(\sigma, \sigma') . \]

Proposition 2: Let $\sigma, \tau \in S_n$. Then
\[ d^*(\sigma, \tau) = d^*(\sigma\rho, \tau\rho) \quad \forall\, \rho \in S_n . \]

Proposition 3:
\[ D^*(A) = \sum_{i<j} \sum_{k<l} \mathbf{1}\bigl\{ (\sigma_i(k) - \sigma_i(l)) (\sigma_j(k) - \sigma_j(l)) < 0 \bigr\} . \]
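Proposition 3 turns D* into a plain count of discordant combinations: a pair of categories and a pair of blocks that the two categories order oppositely. A minimal sketch, assuming the rank matrix is a list of rank tuples (one per category); the function name is illustrative.

```python
from itertools import combinations

def d_star_measure(A):
    """D*(A): number of (row pair, column pair) combinations ranked oppositely."""
    n = len(A[0])
    return sum(
        (r[k] - r[l]) * (s[k] - s[l]) < 0
        for r, s in combinations(A, 2)
        for k, l in combinations(range(n), 2)
    )

A = [(1, 5, 2, 3, 4), (3, 4, 1, 5, 2), (3, 4, 1, 5, 2), (5, 2, 1, 4, 3)]
print(d_star_measure(A))
print(d_star_measure([(3, 4, 1, 5, 2)] * 4))   # identical rows
```

The count takes on the order of m²n² operations, so no search over Sn is needed.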
Theorem 2: Let Amxn be a rank matrix. Then,
1. D*(A)=0 iff A is a Complete Inequality configuration, and
2. D* is invariant under row and column permutations.

Theorem 3: Let Amxn be a rank matrix. Let Bi dominate Bj w.r.t.
A. Let r be fixed, 1 ≤ r ≤ m. Let A* be the matrix obtained from A
by swapping σr(i) and σr(j). Then,
D*(A) < D*(A*).

Corollary: Let a block Bi be ranked 1 (or for that matter, n) in all
categories of A. Let A1 be obtained by interchanging the rank of Bi with
any other block, in any category in A. Then,
D*(A) < D*(A1).
A natural upper bound for D* is
\[ \binom{m}{2} \binom{n}{2} . \]
Theorem 4: Let Amxn be a rank matrix. Then,
\[
D^*(A) \le
\begin{cases}
\dfrac{m^2 - 1}{4} \dbinom{n}{2}, & \text{if } m \text{ is odd;} \\[1ex]
\dfrac{m^2}{4} \dbinom{n}{2}, & \text{if } m \text{ is even.}
\end{cases}
\]
Contd.

Attainment of the improved upper bound: a construction.
\[
A_1 = \begin{pmatrix}
1 & 2 & 3 & 4 \\
1 & 2 & 3 & 4 \\
4 & 3 & 2 & 1 \\
4 & 3 & 2 & 1
\end{pmatrix},
\qquad
A_2 = \begin{pmatrix}
1 & 2 & 3 & 4 & 5 \\
1 & 2 & 3 & 4 & 5 \\
1 & 2 & 3 & 4 & 5 \\
5 & 4 & 3 & 2 & 1 \\
5 & 4 & 3 & 2 & 1
\end{pmatrix}
\]

For both rank matrices above, the
aforesaid upper bound is attained.
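The attainment claim can be checked by counting discordant combinations for the two constructions (two identity rows plus two reversed rows, and three identity rows plus two reversed rows) and comparing with the improved bound. A sketch with illustrative function names; `upper_bound` writes both cases of the bound as floor(m²/4) times C(n, 2).

```python
from itertools import combinations
from math import comb

def d_star_measure(A):
    """Number of (row pair, column pair) combinations ranked oppositely."""
    n = len(A[0])
    return sum(
        (r[k] - r[l]) * (s[k] - s[l]) < 0
        for r, s in combinations(A, 2)
        for k, l in combinations(range(n), 2)
    )

def upper_bound(m, n):
    """Improved bound: floor(m^2 / 4) * C(n, 2), covering odd and even m."""
    return (m * m // 4) * comb(n, 2)

A1 = [(1, 2, 3, 4)] * 2 + [(4, 3, 2, 1)] * 2
A2 = [(1, 2, 3, 4, 5)] * 3 + [(5, 4, 3, 2, 1)] * 2
for A in (A1, A2):
    m, n = len(A), len(A[0])
    # D*, improved bound, natural bound C(m,2) * C(n,2)
    print(d_star_measure(A), upper_bound(m, n), comb(m, 2) * comb(n, 2))
```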
Definition:
For a rank matrix Amxn we define its Inequality Coefficient, I, as
follows:
\[ I = \frac{1}{\binom{m}{2} \binom{n}{2}} \sum_{i<j} \sum_{k<l} \mathbf{1}\bigl\{ (\sigma_i(k) - \sigma_i(l)) (\sigma_j(k) - \sigma_j(l)) > 0 \bigr\} . \]
Proposition 4:
Let Amxn be a rank matrix, and D* and I be as defined before.
Then,
\[ I = 1 - \frac{D^*}{\binom{m}{2} \binom{n}{2}} . \]
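Theorem 5:
For a rank matrix Amxn with Inequality Coefficient I, we have

(i) I ≤ 1, with equality iff A is a Complete Inequality configuration.

(ii)
\[
I \ge
\begin{cases}
\dfrac{1}{2} - \dfrac{1}{2m}, & \text{if } m \text{ is odd;} \\[1ex]
\dfrac{1}{2} - \dfrac{1}{2(m-1)}, & \text{if } m \text{ is even.}
\end{cases}
\]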
Theorem 5: (contd.)
For a rank matrix Amxn with Inequality Coefficient I, we have

(iii) I is invariant under row and column permutations.

(iv) Let Bi dominate Bj in A. Let A1 be the rank matrix obtained
from A by swapping ranks of Bi and Bj in any one category Ck.
Let the inequality coefficient for A1 be I1. Then, I1 ≤ I.
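The Inequality Coefficient, its relation to D* and the bounds of Theorem 5 can be checked with exact fractions. The sketch below assumes the rank matrix is a list of rank tuples without ties; the function names are illustrative. The test matrices are two constructions built from identity and reversed rows (which sit on the lower bound) and one Complete Inequality configuration (which gives I = 1).

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def pair_counts(A):
    """Return (concordant, discordant) counts over row pairs and column pairs."""
    n = len(A[0])
    products = [
        (r[k] - r[l]) * (s[k] - s[l])
        for r, s in combinations(A, 2)
        for k, l in combinations(range(n), 2)
    ]
    return sum(p > 0 for p in products), sum(p < 0 for p in products)

def inequality_coefficient(A):
    """I: share of concordant combinations, as an exact fraction."""
    m, n = len(A), len(A[0])
    return Fraction(pair_counts(A)[0], comb(m, 2) * comb(n, 2))

def lower_bound(m):
    """1/2 - 1/(2m) for odd m, 1/2 - 1/(2(m-1)) for even m."""
    return Fraction(1, 2) - Fraction(1, 2 * m if m % 2 else 2 * (m - 1))

A1 = [(1, 2, 3, 4)] * 2 + [(4, 3, 2, 1)] * 2
A2 = [(1, 2, 3, 4, 5)] * 3 + [(5, 4, 3, 2, 1)] * 2
for A in (A1, A2, [(3, 4, 1, 5, 2)] * 4):
    m, n = len(A), len(A[0])
    total = comb(m, 2) * comb(n, 2)
    I = inequality_coefficient(A)
    assert I == 1 - Fraction(pair_counts(A)[1], total)   # I = 1 - D*/total
    assert lower_bound(m) <= I <= 1
    print(I, lower_bound(m))
```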
Theorem 6:

For a rank matrix Amxn with Inequality Coefficient I, if
σ1,σ2,…,σm are i.i.d. random permutations of {1,2,…,n}, then,
\[ E(I) = \frac{1}{2} . \]
Theorem 7:

For a rank matrix Amxn with Inequality Coefficient I, if
σ1,σ2,…,σm are i.i.d. random permutations of {1,2,…,n}, then,
\[ \operatorname{var}(I) = \frac{2n + 5}{36 \binom{m}{2} \binom{n}{2}} . \]
Corollary:
Let n be fixed. Then as m goes to infinity, we have
\[ I_m \xrightarrow{\;P\;} \frac{1}{2} . \]
The distribution of I under H0 is a slightly right tailed one.
As m or n increases, it rapidly becomes concentrated around 0.5.
The graph of the simulated distribution of I for m=5 and n=29 is provided
below.
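The null distribution of I can be approximated by simulation: draw each row as an independent uniform random permutation, compute I, and repeat. The sketch below compares the simulated mean and variance with 1/2 and (2n + 5) / (36 C(m,2) C(n,2)). The function names, the seed and the number of repetitions are illustrative choices, and the upper-tail Monte Carlo p-value is an assumption about how such a test might be run, not a description of how the reported p-values were obtained.

```python
import random
from itertools import combinations
from math import comb
from statistics import mean, pvariance

def inequality_coefficient(A):
    """Share of (row pair, column pair) combinations ranked the same way."""
    m, n = len(A), len(A[0])
    concordant = sum(
        (r[k] - r[l]) * (s[k] - s[l]) > 0
        for r, s in combinations(A, 2)
        for k, l in combinations(range(n), 2)
    )
    return concordant / (comb(m, 2) * comb(n, 2))

def simulate_null(m, n, reps, seed=0):
    """Draw `reps` values of I with each row an independent uniform permutation."""
    rng = random.Random(seed)
    ranks = list(range(1, n + 1))
    return [
        inequality_coefficient([rng.sample(ranks, n) for _ in range(m)])
        for _ in range(reps)
    ]

def upper_tail_p_value(observed, null_draws):
    """Share of simulated values at least as large as the observed I (assumed test)."""
    return sum(x >= observed for x in null_draws) / len(null_draws)

m, n = 5, 29
draws = simulate_null(m, n, reps=500)
print("mean:", mean(draws), "theory:", 0.5)
print("variance:", pvariance(draws),
      "theory:", (2 * n + 5) / (36 * comb(m, 2) * comb(n, 2)))
```

With more repetitions the simulated moments settle closer to the theoretical values, at a cost that grows with C(m,2) C(n,2) per draw.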
Collection of Data

•We use the results of 59th and 61st rounds of NSSO household
survey.

•The 59th round survey was carried out in 2003 and the 61st in
2004-05.

•We have excluded the Union Territories from our study.


Inequality among States in India
Variable Selection

The values of the following variables or attributes are
used to act as categories to rank the states:

1. MPCE (Monthly Per-capita Consumption Expenditure),
2. Level of Education,
3. Employment,
4. Primary Source of Lighting, and
5. Area of Land Possessed.
Results: 59th Round
•Data were not available for all the states.

•We used the data for 17 states for which data on all
categories were available.

•For this round we have a 5X17 matrix.


Results: 59th Round Contd.

I=0.591

P-value=0.003
Results: 61st Round
•Data were available for all the states.

•We used the data for 28 states and Delhi.

•For this round we have a 5X29 matrix.


Results: 61st Round Contd.

I=0.605

P-value=0.00001
Results: 61st Round
TRUNCATED
•Data were available for all the states in the 61st round,
but only for 17 states in the 59th.

•To compare the two, we truncate the 61st round data
so as to include only those states that were included in
the 59th Round study.

•For this round also, we have a 5X17 matrix.


Results: 61st Round
TRUNCATED Contd.

I=0.575

P-value=0.010
Comparison with other Statistics
Other statistics used for comparing are:
(1) Friedman Statistic:
\[ D'' = \sum_{j=1}^{n} \left[ \sum_{i=1}^{m} \left( \sigma_i(j) - \frac{n+1}{2} \right) \right]^2 \]
(2) Statistic used by Sarkar et al.:
\[ D' = D_1^{d_C} \]
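Comparison with D’
\[
A_1 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 2 & 3 & 1 \end{pmatrix},
\qquad
A_2 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 4 & 3 \end{pmatrix}
\]
\[ D'(A_1) = 1, \quad D'(A_2) = 2 \]
\[ D^*(A_1) = 5, \quad D^*(A_2) = 2 \]

A1 has more inequality w.r.t. D’.

But w.r.t. D*, more inequality is present in A2.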
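The four values in this comparison can be recomputed with a short sketch. Two assumptions are made here: the second row of A1 is taken to be (4, 2, 3, 1), which is the only row consistent with the stated values D'(A1) = 1 and D*(A1) = 5, and D' is read as the min-sum measure with Cayley's distance, brute-forced over S4. Function names are illustrative.

```python
from itertools import combinations, permutations

def cayley_distance(s, t):
    """n minus the number of cycles of the value map s[k] -> t[k]."""
    step = dict(zip(s, t))
    seen, cycles = set(), 0
    for start in step:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:
            seen.add(x)
            x = step[x]
    return len(s) - cycles

def d_prime(A):
    """D': smallest total Cayley distance from the rows of A to one permutation."""
    n = len(A[0])
    return min(sum(cayley_distance(row, sigma) for row in A)
               for sigma in permutations(range(1, n + 1)))

def d_star_measure(A):
    """D*: number of (row pair, column pair) combinations ranked oppositely."""
    n = len(A[0])
    return sum(
        (r[k] - r[l]) * (s[k] - s[l]) < 0
        for r, s in combinations(A, 2)
        for k, l in combinations(range(n), 2)
    )

A1 = [(1, 2, 3, 4), (4, 2, 3, 1)]   # second row is an assumption, see above
A2 = [(1, 2, 3, 4), (2, 1, 4, 3)]
for name, A in (("A1", A1), ("A2", A2)):
    print(name, "D' =", d_prime(A), "D* =", d_star_measure(A))
```

A smaller value means the matrix is closer to a Complete Inequality configuration, so the two measures rank A1 and A2 in opposite orders.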


Comparison with D’’
\[
A_1 = \begin{pmatrix}
1 & 2 & 3 & 4 \\
1 & 2 & 3 & 4 \\
4 & 3 & 2 & 1 \\
4 & 3 & 2 & 1
\end{pmatrix},
\qquad
A_2 = \begin{pmatrix}
1 & 4 & 3 & 2 \\
2 & 1 & 4 & 3 \\
3 & 2 & 1 & 4 \\
4 & 3 & 2 & 1
\end{pmatrix}
\]
\[ D''(A_1) = D''(A_2) = 0 \]
So D’’ cannot distinguish between A1 and A2.
\[ D^*(A_1) = 24, \quad D^*(A_2) = 20 \]
A2 has more inequality w.r.t. D*.
In real-life scenarios we often end up with situations where ties
exist between the ranks of blocks, or where the data are
incomplete.
In such cases, we give a natural extension to our measure by
using the formula
\[ I = \frac{1}{\binom{m}{2} \binom{n}{2}} \sum_{i<j} \sum_{k<l} \mathbf{1}\bigl\{ (\sigma_i(k) - \sigma_i(l)) (\sigma_j(k) - \sigma_j(l)) > 0 \bigr\} . \]
In case of ties we replace the indicator function by 0.5.
In case of incomplete data, we ignore those cases and scale by
the number of meaningful observation pairs.
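One possible reading of this extension is sketched below. The assumptions are mine: a missing rank is encoded as `None`, tied blocks carry equal (for example averaged) ranks, any combination with a zero product counts 0.5, and the divisor is the number of combinations in which all four ranks are present. The function name is illustrative.

```python
from itertools import combinations

def extended_coefficient(A):
    """I with ties scored 0.5 and combinations involving a missing rank skipped."""
    n = len(A[0])
    score, counted = 0.0, 0
    for r, s in combinations(A, 2):
        for k, l in combinations(range(n), 2):
            if None in (r[k], r[l], s[k], s[l]):
                continue                  # incomplete data: ignore this combination
            counted += 1
            product = (r[k] - r[l]) * (s[k] - s[l])
            if product > 0:
                score += 1.0              # both categories order the blocks alike
            elif product == 0:
                score += 0.5              # tie in at least one category
    if counted == 0:
        raise ValueError("no usable observation pairs")
    return score / counted

print(extended_coefficient([(1, 2, 3), (1, 2, 3)]))          # no ties, no gaps
print(extended_coefficient([(1, 2.5, 2.5), (1, 2, 3)]))      # one tie
print(extended_coefficient([(1, 2, None), (1, 2, 3)]))       # one missing rank
```

With complete, untied data the function reduces to the original Inequality Coefficient, since every combination is counted and scores either 0 or 1.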
»» Large sample distribution of I
»» Further investigation of the combinatorial properties of
D*.
»» Effect of an outlier block.
»» Effect of different clusters in categories.
»» Exploring the case when all the categories are not of
equal importance.
