A Nonparametric Measure of Inequality

by
Tanmoy Talukdar, Avik Sarkar, Ritam Bhaumik

and
Riddhipratim Basu
Measuring inequality has long been
an important problem in
economic statistics.
When the data is univariate, we know
of various parametric inequality
measures, which we generalize for
multivariate applications.
However, it seems that distribution-

On why we prefer a nonparametric inequality measure
Non-parametric measures, being
distribution-free, do not depend on
distributional assumptions (which are not
necessarily valid), as parametric measures do.
They are, moreover, robust to small
errors in measurement which might be
present in the data.
The problem that we had as our motivation:
Suppose we have data available on all
states of India on more than one
variable/attribute indicating the
socio-economic growth of each state, and
we are interested in finding whether
there is a significant amount of inequality
present among the states in this respect.
As we want to construct a nonparametric

measure, it is most natural to look at the
»» Let us suppose we have n > 1 blocks,
denoted by B1,B2,…,Bn.
»» Let us assume we also have m > 1
categories.
We assume that all the categories are equally
important.
»» For each of the m categories we have a
ranking of the n blocks.
The nature of the data may tempt us to use
A necessary assumption at this
stage is that there are no ties.
Under the above assumption,
each rank vector is a
permutation of {1,2,...,n}.
Denote the rank vectors as σ1,σ2,…,σm.
Amxn = [σ1’,σ2’,…,σm’]’ is the Rank matrix.
1 5 2 3 4
 
3 4 1 5 2
Example: 3 4 1 5 2
 
5 2 1 4 3
The above is a typical example of a rank matrix

with four categories and five blocks.
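As an illustration of how such a rank matrix could be built from raw data, here is a minimal Python sketch. It assumes one row of raw scores per category, no ties, and that the largest score receives rank 1; the ranking direction, the function names and the sample scores are all illustrative assumptions, not taken from the study.

```python
def rank_vector(values):
    """Return the ranks of distinct values; the largest value gets rank 1 (assumed convention)."""
    if len(set(values)) != len(values):
        raise ValueError("ties are not allowed at this stage")
    # Block indices ordered from the largest score to the smallest.
    order = sorted(range(len(values)), key=lambda k: values[k], reverse=True)
    ranks = [0] * len(values)
    for rank, block in enumerate(order, start=1):
        ranks[block] = rank
    return ranks


def rank_matrix(data):
    """One rank vector per category (row); columns correspond to blocks."""
    return [rank_vector(row) for row in data]


# Hypothetical raw scores for two categories and five blocks.
scores = [
    [10.5, 3.2, 8.1, 7.7, 4.0],
    [2.0, 1.5, 9.0, 0.5, 3.0],
]
A = rank_matrix(scores)
print(A)
```

Each row of the result is a permutation of {1,...,n}, as required of a rank vector.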
Definition:
A Rank matrix Amxn = [σ1’,σ2’ ,…σm’]’ is said to
be a Complete Inequality configuration if all the
rows of A are identical, i.e., σ1 =σ2 =… =σm.
(NOTE: Complete inequality configuration is not
unique.)
Here is a typical 3example:
4 1 5 2
 
 3 4 1 5 2 
3 4 1 5 2
 
3 4 1 5 2
Null hypothesis:
H0 : P(σi = σ) = 1/n! for all
permutations σ of {1,2,…,n}.
Given a rank matrix A, our objective
is to find a measure of inequality
corresponding to that matrix.
With respect to the earlier definition
of Complete Inequality, we
propose a measure of inequality
based on some notion of
“distance” of the given rank matrix
A from a Complete Inequality configuration.
Let Sn denote the set of all permutations of
{1, 2, ..., n}.
Definitions:
$d : S_n \times S_n \to \mathbb{R}^{+} \cup \{0\}$ is a distance function if, for all $\sigma, \tau \in S_n$:
1. $d(\sigma, \tau) = 0$ iff $\sigma = \tau$, and
2. $d(\sigma, \tau) = d(\tau, \sigma)$.
d is called a metric if, in addition to conditions 1 and 2
above, it satisfies the triangle inequality, i.e.,
$$\forall\, \sigma, \tau, \rho \in S_n, \quad d(\sigma, \tau) \le d(\sigma, \rho) + d(\rho, \tau).$$
1. Spearman’s Distance:
$$d_1(\sigma, \tau) = \sum_{k=1}^{n} \bigl(\sigma(k) - \tau(k)\bigr)^2.$$
2. Spearman’s Footrule*:
$$d_2(\sigma, \tau) = \sum_{k=1}^{n} \bigl|\sigma(k) - \tau(k)\bigr|.$$
3. Kendall’s Distance*:
$$d_K(\sigma, \tau) = \sum_{k<l} 1\{(\sigma(k) - \sigma(l))(\tau(k) - \tau(l)) < 0\}.$$
4. Cayley’s Distance*:
Cayley's Distance dC between two permutations σ and τ is
given by the minimum number of transpositions needed to
reach σ from τ.
[The * denotes the distances that are metrics as well.]
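A minimal Python sketch of the four distances listed above, with permutations given as equal-length sequences of ranks. The function names are illustrative. The Cayley distance is computed here through the standard cycle-count identity (n minus the number of cycles of the permutation taking τ to σ) rather than by searching over transpositions.

```python
from itertools import combinations


def spearman_distance(s, t):
    """Sum of squared rank differences."""
    return sum((a - b) ** 2 for a, b in zip(s, t))


def spearman_footrule(s, t):
    """Sum of absolute rank differences."""
    return sum(abs(a - b) for a, b in zip(s, t))


def kendall_distance(s, t):
    """Number of position pairs (k, l) that s and t order in opposite ways."""
    return sum(
        1
        for k, l in combinations(range(len(s)), 2)
        if (s[k] - s[l]) * (t[k] - t[l]) < 0
    )


def cayley_distance(s, t):
    """Minimum number of transpositions: n minus the number of cycles of t -> s."""
    mapping = {b: a for a, b in zip(s, t)}
    seen, cycles = set(), 0
    for start in mapping:
        if start not in seen:
            cycles += 1
            x = start
            while x not in seen:
                seen.add(x)
                x = mapping[x]
    return len(s) - cycles


sigma, tau = (3, 1, 4, 2), (1, 2, 3, 4)
print(spearman_distance(sigma, tau), spearman_footrule(sigma, tau),
      kendall_distance(sigma, tau), cayley_distance(sigma, tau))
```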

Given a Rank matrix Amxn = [σ1’,σ2’ ,…,σm’]’ and a distance d
on Sn, we propose the following D-Measures:
$$\text{(i)}\quad D^1_d(A) = \min_{\sigma \in S_n} \sum_{1 \le i \le m} d(\sigma_i, \sigma).$$
$$\text{(ii)}\quad D^2_d(A) = \sum_{1 \le i < j \le m} d(\sigma_i, \sigma_j).$$
[NB: Both the D-Measures above attain the value 0 if and only if A is
a Complete Inequality configuration.]
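The two D-Measures can be sketched in Python as follows. This is an illustration under stated assumptions: the first measure is taken as the minimum, over all permutations σ, of the total distance from the rows to σ, and is found by brute force over all n! permutations (only feasible for small n); the second is the sum of pairwise distances between rows. Kendall's distance is used as the example d, and all names are illustrative.

```python
from itertools import combinations, permutations


def kendall_distance(s, t):
    """Number of position pairs ordered oppositely by s and t."""
    return sum(
        1
        for k, l in combinations(range(len(s)), 2)
        if (s[k] - s[l]) * (t[k] - t[l]) < 0
    )


def d1_measure(A, dist):
    """Brute-force minimum over all candidate permutations (n! of them)."""
    n = len(A[0])
    return min(
        sum(dist(row, cand) for row in A)
        for cand in permutations(range(1, n + 1))
    )


def d2_measure(A, dist):
    """Sum of distances over all unordered pairs of rows."""
    return sum(dist(r, s) for r, s in combinations(A, 2))


A = [[1, 5, 2, 3, 4], [3, 4, 1, 5, 2], [3, 4, 1, 5, 2], [5, 2, 1, 4, 3]]
complete = [[3, 4, 1, 5, 2]] * 4
print(d1_measure(A, kendall_distance), d2_measure(A, kendall_distance))
print(d1_measure(complete, kendall_distance), d2_measure(complete, kendall_distance))
```

The pairwise measure needs only m(m-1)/2 distance evaluations, whereas the brute-force minimum grows with n!.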
Theorem 1:
For any distance d on Sn and any rank matrix Amxn:
(i) $D^2_d(A) \ge \dfrac{m}{2}\, D^1_d(A)$.
(ii) If d is a metric, then $D^2_d(A) \le (m-1)\, D^1_d(A)$.
Thus, if d is a metric, then for a fixed matrix A, both
the D-Measures are of the same order. But $D^2_d$ is far
easier to compute for well-behaved d than $D^1_d$.
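A short derivation sketch of the two bounds in Theorem 1, written with $D^1_d$ for the minimum-type D-Measure and $D^2_d$ for the pairwise-sum D-Measure. The symbol $\sigma^*$, denoting a permutation attaining the minimum in $D^1_d$, is introduced here only for illustration.

```latex
% (i) Each row \sigma_j is itself a candidate in the minimum defining D^1_d,
%     and d(\sigma_j,\sigma_j)=0 with d symmetric, so
\[
2\,D^2_d(A) \;=\; \sum_{j=1}^{m}\sum_{i=1}^{m} d(\sigma_i,\sigma_j)
\;\ge\; \sum_{j=1}^{m} D^1_d(A) \;=\; m\,D^1_d(A).
\]
% (ii) If d is a metric, apply the triangle inequality through \sigma^*;
%      every index i occurs in exactly m-1 of the pairs, so
\[
D^2_d(A) \;\le\; \sum_{1\le i<j\le m}\bigl[d(\sigma_i,\sigma^*)+d(\sigma^*,\sigma_j)\bigr]
\;=\; (m-1)\sum_{i=1}^{m} d(\sigma_i,\sigma^*) \;=\; (m-1)\,D^1_d(A).
\]
```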
»» We want to choose a suitable
distance function on Sn which will be
sensitive to inequality in the rank matrix
A.
»» Spearman's Distance and Spearman's
Footrule are both restrictions of
distance functions on Rn to Sn, and
hence do not reflect the special
structure of Sn and A.
»» Cayley’s Distance gives equal
• Call υ(i) the (i)-value adjacent transposition if, for all σ
in Sn, υ(i) acting upon σ swaps the values i and (i+1).
• Call υ a value adjacent transposition if υ = υ(i)
for some i.
Example:
Place Adjacent Transposition 3 1 4 2

Value Adjacent Transposition 3 1 4 2
We choose a suitable metric d* on Sn as follows:
d*(σ,τ) is defined as the minimum number of
value adjacent transpositions needed to reach τ
starting from σ.
Consider the problem of calculating d*(σ,τ)
where τ = (1,2,3,4) and σ = (3,1,4,2). Since d* is symmetric, we may start from τ:
Step 1: 1 2 3 4 → 2 1 3 4 (swap values 1 and 2)
Step 2: 2 1 3 4 → 2 1 4 3 (swap values 3 and 4)
Step 3: 2 1 4 3 → 3 1 4 2 (swap values 2 and 3)
So, d*(σ,τ) = 3.
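The definition of d* can be checked directly on small cases. The sketch below, with illustrative function names, computes d* by breadth-first search over value adjacent transpositions and compares it with the count of position pairs that the two permutations order in opposite ways. That the two agree is checked here by exhaustive enumeration for n = 4 only, not proved in general.

```python
from collections import deque
from itertools import combinations, permutations


def value_adjacent_neighbours(perm):
    """All permutations reachable by swapping the values i and i+1 for some i."""
    n = len(perm)
    for i in range(1, n):
        swap = {i: i + 1, i + 1: i}
        yield tuple(swap.get(v, v) for v in perm)


def d_star_bfs(sigma, tau):
    """Minimum number of value adjacent transpositions from sigma to tau."""
    sigma, tau = tuple(sigma), tuple(tau)
    dist = {sigma: 0}
    queue = deque([sigma])
    while queue:
        cur = queue.popleft()
        if cur == tau:
            return dist[cur]
        for nxt in value_adjacent_neighbours(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    raise ValueError("sigma and tau must be permutations of the same values")


def discordant_pairs(sigma, tau):
    """Number of position pairs ordered oppositely by sigma and tau."""
    return sum(
        1
        for k, l in combinations(range(len(sigma)), 2)
        if (sigma[k] - sigma[l]) * (tau[k] - tau[l]) < 0
    )


example = d_star_bfs((3, 1, 4, 2), (1, 2, 3, 4))
s4 = list(permutations(range(1, 5)))
all_agree = all(d_star_bfs(p, q) == discordant_pairs(p, q) for p in s4 for q in s4)
print(example, all_agree)
```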
We propose the following D-Measure:
$$D^* = D^2_{d^*},$$
i.e., for a rank matrix Amxn,
$$D^*(A) = D^2_{d^*}(A) = \sum_{1 \le i < j \le m} d^*(\sigma_i, \sigma_j).$$
Proposition 1: Let $\sigma, \sigma' \in S_n$. Then
$$d^*(\sigma, \sigma') = d_K(\sigma, \sigma').$$
Proposition 2: Let $\sigma, \tau \in S_n$. Then
$$d^*(\sigma, \tau) = d^*(\sigma \circ \rho, \tau \circ \rho) \quad \forall\, \rho \in S_n.$$
Proposition 3:
$$D^*(A) = \sum_{i<j} \sum_{k<l} 1\{(\sigma_i(k) - \sigma_i(l))(\sigma_j(k) - \sigma_j(l)) < 0\}.$$
Theorem 2: Let Amxn be a rank matrix. Then,
1. D*(A)=0 iff A is a Complete Inequality configuration, and
2. D* is invariant under row and column permutations.
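A small Python sketch of D*, computed as the number of (row pair, column pair) combinations in which the two rows order the two blocks oppositely, which is the reading of Proposition 3 assumed here. The example also illustrates the two parts of Theorem 2 on one matrix; the function name and the particular row and column reorderings are illustrative.

```python
from itertools import combinations


def d_star_measure(A):
    """Count (row pair, column pair) combinations that are discordant."""
    n = len(A[0])
    return sum(
        1
        for r, s in combinations(A, 2)
        for k, l in combinations(range(n), 2)
        if (r[k] - r[l]) * (s[k] - s[l]) < 0
    )


A = [[1, 5, 2, 3, 4], [3, 4, 1, 5, 2], [3, 4, 1, 5, 2], [5, 2, 1, 4, 3]]

# Reorder the categories (rows) and, separately, relabel the blocks (columns).
rows_shuffled = [A[2], A[0], A[3], A[1]]
col_order = [4, 0, 3, 1, 2]
cols_shuffled = [[row[c] for c in col_order] for row in A]

complete = [[3, 4, 1, 5, 2]] * 4

print(d_star_measure(A), d_star_measure(rows_shuffled), d_star_measure(cols_shuffled))
print(d_star_measure(complete))
```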
Theorem 3: Let Amxn be a rank matrix. Let Bi dominate Bj w.r.t.
A. Let r be fixed, 1 ≤ r ≤ m. Let A* be the matrix obtained from A
by swapping σr(i) and σr(j). Then,
D*(A) < D*(A*).
Corollary: Let a block Bi be ranked 1 (or, for that matter, n) in all
categories of A. Let A1 be obtained by interchanging the rank of Bi with
that of any other block, in any category in A. Then,
D*(A) < D*(A1).
A natural upper bound for D* is $\binom{m}{2}\binom{n}{2}$.
Theorem 4: Let Amxn be a rank matrix. Then,
$$D^*(A) \le \begin{cases} \dfrac{m^2 - 1}{4}\dbinom{n}{2}, & \text{if } m \text{ is odd;} \\[1ex] \dfrac{m^2}{4}\dbinom{n}{2}, & \text{if } m \text{ is even.} \end{cases}$$
Attainment of the improved upper bound:
a construction
$$A_1 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \\ 4 & 3 & 2 & 1 \\ 4 & 3 & 2 & 1 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 2 & 3 & 4 & 5 \\ 1 & 2 & 3 & 4 & 5 \\ 5 & 4 & 3 & 2 & 1 \\ 5 & 4 & 3 & 2 & 1 \end{pmatrix}$$
For both the rank matrices above, the
aforesaid upper bound is attained.
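The attainment claim can be checked numerically. The sketch below assumes D* is the count of discordant (row pair, column pair) combinations and writes the improved bound of Theorem 4 as floor(m²/4) times n(n-1)/2, which equals (m²-1)/4 for odd m and m²/4 for even m. It also confirms by exhaustive search, for m = n = 3 only, that no rank matrix exceeds the bound. Function names are illustrative.

```python
from itertools import combinations, permutations, product
from math import comb


def d_star_measure(A):
    """Count (row pair, column pair) combinations that are discordant."""
    n = len(A[0])
    return sum(
        1
        for r, s in combinations(A, 2)
        for k, l in combinations(range(n), 2)
        if (r[k] - r[l]) * (s[k] - s[l]) < 0
    )


def improved_bound(m, n):
    """floor(m^2 / 4) * C(n, 2); covers both the odd and the even case."""
    return (m * m // 4) * comb(n, 2)


A1 = [[1, 2, 3, 4], [1, 2, 3, 4], [4, 3, 2, 1], [4, 3, 2, 1]]
A2 = [[1, 2, 3, 4, 5]] * 3 + [[5, 4, 3, 2, 1]] * 2

# Exhaustive maximum over all 3 x 3 rank matrices (6^3 = 216 cases).
perms3 = list(permutations(range(1, 4)))
max_3x3 = max(d_star_measure(A) for A in product(perms3, repeat=3))

print(d_star_measure(A1), improved_bound(4, 4))
print(d_star_measure(A2), improved_bound(5, 5))
print(max_3x3, improved_bound(3, 3))
```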
Definition:
For a rank matrix Amxn we define its Inequality Coefficient, I, as
follows:
$$I = \frac{1}{\binom{m}{2}\binom{n}{2}} \sum_{i<j} \sum_{k<l} 1\{(\sigma_i(k) - \sigma_i(l))(\sigma_j(k) - \sigma_j(l)) > 0\}.$$
Proposition 4:
Let Amxn be a rank matrix, and D* and I be as defined before.
Then,
$$I = 1 - \frac{D^*}{\binom{m}{2}\binom{n}{2}}.$$
Theorem 5:
For a rank matrix Amxn with Inequality Coefficient I, we have
(i) I ≤ 1, with equality iff A is a Complete Inequality configuration.
(ii)
$$I \ge \begin{cases} \dfrac{1}{2} - \dfrac{1}{2m}, & \text{if } m \text{ is odd;} \\[1ex] \dfrac{1}{2} - \dfrac{1}{2(m-1)}, & \text{if } m \text{ is even.} \end{cases}$$
(iii) I is invariant under row and column permutations.
(iv) Let Bi dominate Bj in A. Let A1 be the rank matrix obtained
from A by swapping ranks of Bi and Bj in any one category Ck.
Let the inequality coefficient for A1 be I1. Then, I1 ≤ I.
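A minimal Python sketch of the Inequality Coefficient, assuming I is the proportion of concordant (row pair, column pair) combinations. It uses exact fractions to check the relation with D* from Proposition 4 and the two extremes in Theorem 5 (i) and (ii) on small matrices. Function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def pair_counts(A):
    """Return (concordant, discordant) counts over row pairs and column pairs."""
    n = len(A[0])
    conc = disc = 0
    for r, s in combinations(A, 2):
        for k, l in combinations(range(n), 2):
            prod = (r[k] - r[l]) * (s[k] - s[l])
            if prod > 0:
                conc += 1
            elif prod < 0:
                disc += 1
    return conc, disc


def inequality_coefficient(A):
    m, n = len(A), len(A[0])
    conc, _ = pair_counts(A)
    return Fraction(conc, comb(m, 2) * comb(n, 2))


def lower_bound(m):
    """Lower bound on I from Theorem 5 (ii)."""
    denom = 2 * m if m % 2 == 1 else 2 * (m - 1)
    return Fraction(1, 2) - Fraction(1, denom)


A1 = [[1, 2, 3, 4], [1, 2, 3, 4], [4, 3, 2, 1], [4, 3, 2, 1]]
complete = [[3, 4, 1, 5, 2]] * 4

I1 = inequality_coefficient(A1)
d_star = pair_counts(A1)[1]
print(I1, 1 - Fraction(d_star, comb(4, 2) * comb(4, 2)), lower_bound(4))
print(inequality_coefficient(complete))
```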
Theorem 6:
For a rank matrix Amxn with Inequality Coefficient I, if
σ1,σ2,…,σm are i.i.d. uniformly random permutations of {1,2,…,n}, then
$$E(I) = \frac{1}{2}.$$
Theorem 7:
For a rank matrix Amxn with Inequality Coefficient I, if
σ1,σ2,…,σm are i.i.d. uniformly random permutations of {1,2,…,n}, then
$$\operatorname{var}(I) = \frac{2n + 5}{36\binom{m}{2}\binom{n}{2}}.$$
Corollary:
Let n be fixed. Then as m goes to infinity, we have
$$I_m \xrightarrow{P} \frac{1}{2}.$$
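For very small m and n, the mean and variance of I under uniformly random rows can be verified exactly by enumerating every possible rank matrix. The sketch below does this with exact fractions and compares the result with 1/2 and with (2n+5) / (36 · C(m,2) · C(n,2)); it assumes I is the proportion of concordant (row pair, column pair) combinations, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, permutations, product
from math import comb


def inequality_coefficient(A):
    """Proportion of concordant (row pair, column pair) combinations."""
    m, n = len(A), len(A[0])
    conc = sum(
        1
        for r, s in combinations(A, 2)
        for k, l in combinations(range(n), 2)
        if (r[k] - r[l]) * (s[k] - s[l]) > 0
    )
    return Fraction(conc, comb(m, 2) * comb(n, 2))


def exact_moments(m, n):
    """Exact mean and variance of I over all (n!)^m equally likely rank matrices."""
    perms = list(permutations(range(1, n + 1)))
    values = [inequality_coefficient(A) for A in product(perms, repeat=m)]
    mean = sum(values, Fraction(0)) / len(values)
    var = sum(((v - mean) ** 2 for v in values), Fraction(0)) / len(values)
    return mean, var


def stated_variance(m, n):
    return Fraction(2 * n + 5, 36 * comb(m, 2) * comb(n, 2))


for m, n in [(2, 3), (3, 3), (2, 4)]:
    print(m, n, exact_moments(m, n), stated_variance(m, n))
```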
The distribution of I under H0 is slightly right-tailed.
As m or n increases, it rapidly becomes concentrated around 0.5.
The graph of the simulated distribution of I for m=5 and n=29 is provided
below.
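The source does not state how the null distribution was simulated or how P-values were derived from it. One common approach, assumed here purely for illustration, is a Monte Carlo upper-tail test: draw each row as an independent uniformly random permutation, recompute the statistic, and report the proportion of simulated values at least as large as the observed one (with the usual +1 correction). The function names, the number of simulations and the seed are all hypothetical choices.

```python
import random
from itertools import combinations


def concordant_count(A):
    """Numerator of I: concordant (row pair, column pair) combinations."""
    n = len(A[0])
    return sum(
        1
        for r, s in combinations(A, 2)
        for k, l in combinations(range(n), 2)
        if (r[k] - r[l]) * (s[k] - s[l]) > 0
    )


def simulated_p_value(A, n_sim=2000, seed=0):
    """Upper-tail Monte Carlo P-value under H0 (assumed procedure)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    observed = concordant_count(A)
    base = list(range(1, n + 1))
    hits = 0
    for _ in range(n_sim):
        # rng.sample(base, n) returns a uniformly random permutation of base.
        sim = [rng.sample(base, n) for _ in range(m)]
        if concordant_count(sim) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


complete = [[1, 2, 3, 4, 5, 6]] * 3
p = simulated_p_value(complete, n_sim=500)
print(p)
```

Comparing the integer concordant counts rather than the normalised coefficients avoids floating-point ties, since the normalising constant is the same for every simulated matrix.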
Collection of Data
• We use the results of the 59th and 61st rounds of the NSSO household
survey.
• The 59th round survey was carried out in 2003 and the 61st in
2004-05.
• We have excluded the Union Territories from our study.

Inequality among States in India
Variable Selection
The values of the following variables or attributes are
used as categories to rank the states:
1. MPCE (Monthly Per-capita Consumption Expenditure),
2. Level of Education,
3. Employment,
4. Primary Source of Lighting, and
5. Area of Land Possessed.
Results: 59th Round
• Data were not available for all the states.
• We used the data for the 17 states for which data on all
categories were available.
• For this round we have a 5 × 17 matrix.
• I = 0.591
• P-value = 0.003
Results: 61st Round
• Data were available for all the states.
• We used the data for 28 states and Delhi.
• For this round we have a 5 × 29 matrix.
• I = 0.605
• P-value = 0.00001
Results: 61st Round (Truncated)
• Data were available for all the states in the 61st round,
but only for 17 states in the 59th.
• To compare the two, we truncate the 61st round data
so as to include only those states that were included in
the 59th Round study.
• For this round also, we have a 5 × 17 matrix.
• I = 0.575
• P-value = 0.010
Comparison with other Statistics
Other statistics used for comparing are:
(1) Friedman Statistic:
$$D'' = \sum_{j=1}^{n} \Bigl( \sum_{i=1}^{m} \Bigl( \sigma_i(j) - \frac{n+1}{2} \Bigr) \Bigr)^2$$
(2) Statistic used by Sarkar et al.:
$$D' = D^1_{d_C}$$
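Comparison with D’
$$A_1 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 2 & 3 & 1 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 4 & 3 \end{pmatrix}$$
$$D'(A_1) = 1, \quad D'(A_2) = 2$$
$$D^*(A_1) = 5, \quad D^*(A_2) = 2$$
A1 has more inequality w.r.t. D’.
But w.r.t. D*, more inequality is present in A2.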
Comparison with D’’
$$A_1 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \\ 4 & 3 & 2 & 1 \\ 4 & 3 & 2 & 1 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 1 & 4 & 3 & 2 \\ 2 & 1 & 4 & 3 \\ 3 & 2 & 1 & 4 \\ 4 & 3 & 2 & 1 \end{pmatrix}$$
$$D''(A_1) = D''(A_2) = 0$$
So D’’ cannot distinguish between A1 and A2.
$$D^*(A_1) = 24, \quad D^*(A_2) = 20$$
A2 has more inequality w.r.t. D*.
In real life scenarios we often end up with situations where ties
exist between the ranks of blocks, or where the data is
incomplete.
In such cases, we give a natural extension to our measure by
using the formula
$$I = \frac{1}{\binom{m}{2}\binom{n}{2}} \sum_{i<j} \sum_{k<l} 1\{(\sigma_i(k) - \sigma_i(l))(\sigma_j(k) - \sigma_j(l)) > 0\}.$$
In case of ties we replace the indicator function by 0.5.
In case of incomplete data, we ignore those cases and scale by
the number of meaningful observation pairs.
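One possible reading of this extension, sketched in Python. The assumptions are: a missing rank is encoded as None; a (row pair, column pair) combination is skipped whenever any of its four ranks is missing; a combination in which either row ties the two blocks contributes 0.5; and the total is divided by the number of combinations actually used. The encoding and the function name are illustrative, and the source may intend a different treatment of partially tied pairs.

```python
from itertools import combinations


def extended_inequality_coefficient(A):
    """I with ties counted as 0.5 and incomplete combinations skipped."""
    n = len(A[0])
    total, used = 0.0, 0
    for r, s in combinations(A, 2):
        for k, l in combinations(range(n), 2):
            if None in (r[k], r[l], s[k], s[l]):
                continue  # incomplete data: ignore this combination
            prod = (r[k] - r[l]) * (s[k] - s[l])
            if prod > 0:
                total += 1.0
            elif prod == 0:
                total += 0.5  # tie in at least one of the two rows
            used += 1
    if used == 0:
        raise ValueError("no complete observation pairs")
    return total / used


no_ties = [[1, 2, 3], [1, 2, 3]]
with_tie = [[1, 2, 3], [1.5, 1.5, 3]]
with_gap = [[1, 2, 3], [1, None, 2]]

print(extended_inequality_coefficient(no_ties))
print(extended_inequality_coefficient(with_tie))
print(extended_inequality_coefficient(with_gap))
```

With no ties and no missing entries this reduces to the original coefficient, since every combination is used and each contributes 0 or 1.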
»» Large sample distribution of I
»» Further investigation of the combinatorial properties of
D*.
»» Effect of an outlier block.
»» Effect of different clusters in categories.
»» Exploring the case when all the categories are not of
equal importance.
