
The Estimation of the Lorenz Curve and Gini Index

Author(s): Joseph L. Gastwirth


Source: The Review of Economics and Statistics, Vol. 54, No. 3 (Aug., 1972), pp. 306-316
Published by: The MIT Press
Stable URL: http://www.jstor.org/stable/1937992
Accessed: 20-08-2014 23:45 UTC


THE ESTIMATION OF THE LORENZ CURVE AND GINI INDEX*

Joseph L. Gastwirth

I Introduction

Most of the measures of income inequality are derived from the Lorenz curve; indeed Morgan (1962) states that the Gini index is the best single measure of inequality. The present article reviews some of the theoretical properties of the Lorenz curve, relates them to characteristics of the frequency function underlying the income distribution and develops methods for obtaining accurate bounds on the Gini index which do not depend on curve fitting. In the process we should also like to lay to rest some myths concerning the Gini index such as: (a) its relative insensitivity (Éltető and Frigyes, 1968), (b) difficulty in computation (1968), and (c) problems related to the inclusion of negative incomes (Budd, 1970).

The basic idea of our approach is to obtain upper and lower bounds to the Gini index from data which are grouped in intervals when the mean income in each interval is known. The usual method (Morgan, 1962) of estimating the Gini index yields a lower bound by assuming that all incomes in any interval equal the average income. We derive an upper bound to the grouping correction (Goldsmith, et al., 1954, p. 10) and hence to the Gini index by distributing the income to maximize the spread within each group. On the 1967 Internal Revenue Service tax data, the difference between our bounds is less than 0.006. As most income distributions come from a frequency function (density) which decreases in the large income range, we develop improved bounds for the Gini index based on this assumption. Fortunately, this assumption can be checked from the data so that we can use the sharper bounds only for the appropriate intervals. Using this second method the difference between our bounds is less than 0.002. Because Soltow (1965) detects a change in the Gini index of 0.8 of one per cent or about 0.003 or 0.004, our bound seems quite adequate. In section VI we extend our method to obtain upper and lower curves for the Lorenz curve.

After reviewing the basic properties of the Lorenz curve we proceed to derive bounds on the mean difference and Gini index. In section IV we analyze an actual sample and show that the method used by the Census Bureau (1967) often leads to estimates which are outside the mathematically possible bounds we derived. Finally, in an appendix we show that the Pareto law does not give a good fit to current United States tax data.

Received for publication November 11, 1970. Revision accepted for publication February 23, 1972.
* Research supported by the National Science Foundation Research Grant No. GP-20527 awarded to the Department of Statistics, The Johns Hopkins University.

II Properties of the Lorenz Curve and Associated Measures of Inequality

Given a set of $n$ ordered numbers, $x_1 \le x_2 \le \dots \le x_n$, the empirical Lorenz curve generated by them is defined at the points $i/n$, $i = 0, \dots, n$, by $L(0) = 0$ and $L(i/n) = s_i/s_n$, where $s_i = x_1 + \dots + x_i$. The empirical Lorenz curve, $L(p)$, is defined for all $p$ in the interval $(0,1)$ by linear interpolation and represents the fraction of the total variable measured (e.g., income) that the holders of the smallest $p$th fraction possess.

For theoretical purposes it is useful to consider the numbers $x_i$ as a sample drawn from a distribution function $F(x)$. Throughout this article we shall assume that $F(x)$ increases on its support (the values of $x$ for which $0 < F(x) < 1$) and the mean $\mu$ of $F(x)$ exists. The first assumption implies that $F^{-1}(p)$ is well defined and is the population $p$th quantile. Given any distribution function (d.f.) $F(x)$, the theoretical Lorenz curve corresponding to it is defined by

$$L(p) = \mu^{-1}\int_0^p F^{-1}(t)\,dt. \tag{1}$$

In table 1 we present a short table of the Lorenz curves generated by several common distributions.
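The definition of the empirical Lorenz curve can be sketched in a few lines of Python. This is only an illustration of the construction described above, not code from the article: the function names `empirical_lorenz` and `lorenz_at` and the five incomes are hypothetical.

```python
from itertools import accumulate


def empirical_lorenz(values):
    """Return the vertices (i/n, s_i/s_n), i = 0..n, of the empirical Lorenz curve."""
    xs = sorted(values)                           # the definition assumes ordered numbers
    n = len(xs)
    partial_sums = [0.0] + list(accumulate(xs))   # s_0 = 0, s_i = x_1 + ... + x_i
    total = partial_sums[-1]
    return [(i / n, partial_sums[i] / total) for i in range(n + 1)]


def lorenz_at(vertices, p):
    """Evaluate the curve at 0 <= p <= 1 by linear interpolation between vertices."""
    for (p0, l0), (p1, l1) in zip(vertices, vertices[1:]):
        if p0 <= p <= p1:
            return l0 + (l1 - l0) * (p - p0) / (p1 - p0)
    raise ValueError("p must lie in [0, 1]")


incomes = [1, 2, 3, 4, 10]                        # hypothetical incomes
vertices = empirical_lorenz(incomes)
half = lorenz_at(vertices, 0.5)                   # income share of the poorer half
```

For these made-up incomes the poorer half of the units holds a little under a quarter of the total, which is the kind of statement the curve is designed to express.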

TABLE 1. - THE LORENZ CURVES GENERATED BY SOME COMMON DISTRIBUTIONS

Distribution           C.D.F.                                                    Lorenz Curve
Equal                  $F(x) = 0,\ x < \mu;\ F(x) = 1,\ x \ge \mu$               $L(p) = p$
Exponential            $F(x) = 1 - e^{-\lambda x},\ x > 0$                       $p + (1-p)\ln(1-p)$
Shifted Exponential    $F(x) = 1 - e^{-\lambda(x-a)},\ x > a$                    $p + (1+\lambda a)^{-1}(1-p)\ln(1-p)$
General Uniform        $F(x) = (x-a)/\theta,\ a < x < a+\theta$                  $(ap + \theta p^2/2)/(a + \theta/2)$
Pareto                 $F(x) = 1 - (a/x)^{\alpha},\ x > a,\ \alpha > 1$          $1 - (1-p)^{(\alpha-1)/\alpha}$
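As a rough numerical check of formula (1) against two rows of table 1, the sketch below integrates the quantile function with a midpoint rule and compares the result with the closed forms for the exponential and Pareto laws. The parameter values (`lam`, `a`, `alpha`), the evaluation point `p`, and the helper name `lorenz_from_quantile` are assumptions chosen for illustration.

```python
import math


def lorenz_from_quantile(quantile, mean, p, steps=20000):
    """Approximate L(p) = mean^{-1} * integral_0^p quantile(t) dt by the midpoint rule."""
    h = p / steps
    return sum(quantile((j + 0.5) * h) for j in range(steps)) * h / mean


lam = 2.0                                    # assumed exponential rate, mean 1/lam
a, alpha = 1.0, 3.0                          # assumed Pareto scale and shape (alpha > 1)
pareto_mean = alpha * a / (alpha - 1.0)

p = 0.7
exp_numeric = lorenz_from_quantile(lambda t: -math.log(1.0 - t) / lam, 1.0 / lam, p)
exp_closed = p + (1.0 - p) * math.log(1.0 - p)            # table 1, exponential row

par_numeric = lorenz_from_quantile(lambda t: a * (1.0 - t) ** (-1.0 / alpha), pareto_mean, p)
par_closed = 1.0 - (1.0 - p) ** ((alpha - 1.0) / alpha)   # table 1, Pareto row
```

Note that the exponential Lorenz curve does not depend on the rate, which is why no parameter appears in that row of the table.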

Two simple facts concerning the Lorenz curve are derivable from formula (1) and the fact that $F^{-1}(t)$ is nondecreasing. We state

Lemma 1. Let $L(p)$ be the Lorenz curve corresponding to a d.f. $F(x)$. Then $L(p)$ is convex and its derivative, $L'(p)$, equals one at $p = F(\mu)$.

The most common measure of inequality, the Gini index $G$, is the ratio of the area between the Lorenz curve $L(p)$ and the 45° line (see figure 1) to the area under the 45° line (which is 1/2). The area, $A$, between the Lorenz curve and the straight line is called the area of concentration.

FIGURE 1. - A LORENZ CURVE (THE SHADED AREA IS THE AREA OF CONCENTRATION)

An alternative formula for the Gini index, $G$, is based on the mean difference, $\Delta$, of the underlying d.f. $F(x)$ and is given in

Lemma 2: (Kendall and Stuart, 1963). The Gini index of the Lorenz curve $L(p)$ generated by a d.f. $F(x)$ is $\Delta/(2\mu)$, where

$$\Delta = \int\!\!\int |x-y|\,dF(x)\,dF(y) = 2\int F(x)[1-F(x)]\,dx = 4\int x\,[F(x)-1/2]\,dF(x). \tag{2}$$

The formula $G = \Delta/(2\mu)$ shows that the Gini index measures relative inequality as it is the ratio of a measure of dispersion, the mean difference, to the average value ($\mu$). Other measures are the coefficient of variation $\sigma/\mu$, and half the relative mean deviation $\delta/(2\mu)$, where

$$\delta = \int |x-\mu|\,dF(x) = 2\int_{\mu}^{\infty} (x-\mu)\,dF(x)$$

is the mean deviation. The relative mean deviation ($\delta/\mu$) is related to several other measures of relative inequality which we now review.

The area $A$ is the area under the curve $p - L(p)$. As $L(p)$ is convex, $p - L(p)$ is concave and vanishes at $p = 0$ or 1. Thus, there is a value $p'$ (called the point of maximum discrepancy between the Lorenz curve and line of equality) satisfying

$$p' - L(p') \ge p - L(p) \quad \text{for all } p. \tag{3}$$

The relevance of the point of maximum discrepancy is stated in

Lemma 3: The point $p'$ equals $F(\mu)$ and the value of the maximum discrepancy, $p' - L(p')$, equals $\delta/(2\mu)$.

The lemma shows that $p'$ is the fraction of the population receiving less than the "average" while the value of the maximum discrepancy, $p' - L(p')$, is half the relative mean deviation. In 1951, Schutz proposed to measure inequality by comparing the derivative, $L'(p)$, of the Lorenz curve to the derivative (one) of the line of total equality. His measure, $S$, is the area between $L'(p)$ and 1 in the region $(0, p')$, which reduces to $\delta/(2\mu)$.
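Lemmas 2 and 3 can be checked on a small sample. The sketch below, which uses hypothetical incomes and function names of our own choosing, computes the Gini index both as $\Delta/(2\mu)$ and from the area of concentration of the empirical Lorenz polygon, and compares the maximum discrepancy with half the relative mean deviation. It assumes the empirical mean difference is normalized by $n^2$.

```python
def gini_from_mean_difference(xs):
    """G = Delta / (2 mu), with Delta = n^{-2} * sum over all pairs of |x_i - x_j|."""
    n = len(xs)
    mu = sum(xs) / n
    delta = sum(abs(x - y) for x in xs for y in xs) / n ** 2
    return delta / (2.0 * mu)


def gini_from_area(xs):
    """G = 1 - 2 * (area under the empirical Lorenz polygon)."""
    xs = sorted(xs)
    n, total = len(xs), sum(xs)
    cum, twice_area = 0.0, 0.0
    for x in xs:
        prev = cum
        cum += x / total
        twice_area += (prev + cum) / n        # trapezoid of width 1/n, doubled
    return 1.0 - twice_area


def max_discrepancy(xs):
    """Largest value of p - L(p) over the vertices of the empirical Lorenz curve."""
    xs = sorted(xs)
    n, total = len(xs), sum(xs)
    cum, best = 0.0, 0.0
    for i, x in enumerate(xs, start=1):
        cum += x / total
        best = max(best, i / n - cum)
    return best


def half_relative_mean_deviation(xs):
    """delta / (2 mu), with delta the mean absolute deviation from the mean."""
    n = len(xs)
    mu = sum(xs) / n
    return sum(abs(x - mu) for x in xs) / n / (2.0 * mu)


incomes = [1, 2, 3, 4, 10]                    # hypothetical incomes
g_delta = gini_from_mean_difference(incomes)
g_area = gini_from_area(incomes)
pietra = max_discrepancy(incomes)
```

Because $p - L(p)$ is concave and piecewise linear for a sample, its maximum is attained at a vertex, so scanning the vertices is enough.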


The measure $\delta/(2\mu)$ was proposed by Yntema (1933) and Pietra in the 1930's. Pietra measured inequality by the area of the largest triangle that could be inscribed in the area of concentration. The ratio of the area of the Pietra triangle to the area (1/2) under the 45° line of "perfect equality" equals half the relative mean deviation. Yntema's measure essentially is $p' - L(p')$ or $\delta/(2\mu)$. More recently Éltető and Frigyes (1968) proposed related measures based on the Lorenz curve. We now study the estimation of the Gini index in detail as the same ideas are extended in section VI to derive bounds for the entire Lorenz curve from which one can obtain bounds on the other measures.

III Useful Inequalities for the Mean Difference and Gini Index

In this section we derive several bounds on $\Delta$ and $G$. By assuming weak properties of the d.f. $F(x)$ or its derivative $f(x)$, the density function, we can often obtain bounds which we use in the next section to estimate the Gini index.

Our first result gives a general bound on the mean difference for any d.f. $F(x)$ supported on a finite interval $(a,b)$. From formula (3), inequality 105 in Hardy et al. (1952) and the fact that $F(x)$ increases, one can prove

Lemma 4: For any d.f. $F(x)$ supported on $(a,b)$ with mean $\mu$,

$$0 \le \Delta \le 2(\mu-a)(b-\mu)/(b-a). \tag{4}$$

For intervals which are "open ended," using the second formula for $\Delta$ in (2) and the fact that $F(x) \le 1$, one can derive

Lemma 5: If $F(x)$ is a d.f. supported on $[a, \infty)$ with finite mean $\mu$, then $\Delta \le 2(\mu - a)$.

Remark: For the Pareto law with parameter $\alpha$, $\Delta/(2\mu) = (2\alpha-1)^{-1}$ and approaches one as $\alpha$ approaches one from above, so the bound is sharp.

Both the upper and lower bounds of Lemma 4 can be strengthened if one is willing to assume that the density function decreases, i.e., the d.f. $F(x)$ is concave, in the interval $(a,b)$. A bound which is derived from a result of Gauss (Kendall and Stuart, 1963, p. 92) is stated in

Lemma 6: Let $F(x)$ be a concave d.f. supported on $(a,b)$; then

$$\Delta \ge (4/3)(\mu-a)^2/(b-a). \tag{5}$$

A further improvement can be obtained by showing that the uniform distribution on any interval of the form $(a, 2\mu-a)$ has the smallest mean difference in the class of all concave d.f.'s supported in $(a, \infty)$ with mean $\mu$. In order to establish this result we require a modification of a result of Chow and Studden (1969).

Lemma 7: Let $h$ and $g$ be two nonincreasing functions on the real line such that

$$\int h(x)\,dx = \int g(x)\,dx \tag{6}$$

and suppose that $h - g$ changes sign once from plus to minus, i.e., there is a $t_0$ such that

$$[h(t)-g(t)](t-t_0) \le 0 \quad \text{for all } t. \tag{7}$$

Then, for any concave function $\phi$,

$$\int \phi[h(t)]\,dt \le \int \phi[g(t)]\,dt. \tag{8}$$

Our main application is

Theorem 1: Let $F(x)$ be a concave d.f. supported on an interval $(a, \infty)$ and let $F(x)$ have mean $\mu$. Let $F_0(x)$ be the uniform d.f. on $(a, 2\mu - a)$ with the same mean $\mu$ as $F$ has. Then the mean difference of $F(x)$ is greater than or equal to the mean difference, $2(\mu - a)/3$, of $F_0(x)$.

Proof: The result follows once it is shown that

$$\int F(x)[1-F(x)]\,dx \ge \int F_0(x)[1-F_0(x)]\,dx, \tag{9}$$

where $F_0(x) = 0$ when $x < a$, $(x-a)/(2\mu-2a)$ when $a \le x < 2\mu-a$ and one when $x \ge 2\mu-a$. In Lemma 7, set $g(x) = 1-F(x)$ and $h(x) = 1-F_0(x)$. Since $F_0(x)$ and $F(x)$ have the same mean $\mu$, condition (6) holds. As $F(x)$ is concave, $g$ is convex and $h(x)$ is a straight line so that $h$ crosses $g$ exactly once from above. The function $\phi(t) = t(1-t)$ is concave so that (8) implies that

$$(1/3)(\mu-a) = \int F_0(t)[1-F_0(t)]\,dt \le \int F(t)[1-F(t)]\,dt. \tag{10}$$

By analogous methods we can obtain Theorem 2.
Theorem 2: If $F(x)$ is concave on $(a,b)$, then

$$\tfrac{2}{3}(\mu-a) \le \Delta \le 2(\mu-a)(b-a)^{-1}\left[(b-\mu) - \tfrac{1}{3}(\mu-a)\right]. \tag{11}$$

It should be noted that in contrast to Theorem 1 the upper bound (11) depends on the finiteness of the interval $(a,b)$. When $F(x)$ is convex on $(a,b)$, the mean difference is bounded by

$$\tfrac{2}{3}(b-\mu) \le \Delta \le \tfrac{2}{3}(b-\mu)(b-a)^{-1}\left[4(\mu-a) - (b-a)\right]. \tag{12}$$

For the open ended interval $(a, \infty)$ the upper bound for $\Delta$ given in Lemma 5 cannot be improved. Consideration of densities with a decreasing hazard rate permits strengthening the lower bound of Theorem 1. As the Pareto law and the tails of the lognormal (Barlow and Proschan, 1965) and Fisk-Champernowne Law (Fisk, 1961 and Champernowne, 1952) have this property, it is a reasonable assumption. We recall the

Definition: A d.f. $F(x)$ supported on $(a, \infty)$ has the decreasing hazard rate (D.H.R.) property if $-\log[1 - F(x)]$ is concave for $x \ge a$. When the density function $f(x) = F'(x)$ exists, then the function

$$q(x) = f(x)[1-F(x)]^{-1} \tag{13}$$

is nonincreasing.

Using Lemma 5 of Hardy et al., (1952) one can prove

Theorem 3: If $F(x)$ is a d.f. on $(a, \infty)$ with the D.H.R. property, density $f(x)$ and finite mean $\mu$, then

$$(\mu-a) \le \Delta \le 2(\mu-a). \tag{14}$$

IV Estimation of the Gini Index

Given a Lorenz curve $L(p)$ the standard method (Census, 1967 and Morgan, 1962) of estimating the Gini index is to approximate the area of concentration by choosing $k$ fractiles $0 = p_0 < p_1 < p_2 < \dots < p_k < p_{k+1} = 1$ and computing the area of the polygon with vertices $(0,0)$, $(p_1, L(p_1))$, ..., $(p_k, L(p_k))$ and $(1,1)$. This procedure leads to an under-estimate of both $A$ and $G$ since the straight line connecting $(p_i, L(p_i))$ to $(p_{i+1}, L(p_{i+1}))$ lies above the convex curve $L(p)$. Thus, the standard procedure yields the following lower bound for $G$:

$$G \ge 1 - \sum_{i=0}^{k} (p_{i+1} - p_i)\,[L(p_i) + L(p_{i+1})]. \tag{15}$$

In order to assess the accuracy of (15) we need an upper bound for $G$. As the Lorenz curve is convex, its derivative increases and by constructing the tangents to the curve at the points $(p_i, L(p_i))$ we can bound the curve from below. This approach is developed in section VI. For our purposes an approach based on the formula $G = \Delta/(2\mu)$ is more convenient.

Given $n$ ordered numbers which are grouped (preserving the ordering) into $(k+1)$ subgroups

$$x_1, \dots, x_{m_1};\ x_{m_1+1}, \dots, x_{m_2};\ \dots;\ x_{m_k+1}, \dots, x_n$$

where $m_1 = np_1$, $m_2 = np_2$, ..., $m_k = np_k$ and $0 < p_1 < p_2 < \dots < p_k < p_{k+1} = 1$, then the empirical mean difference $\Delta^*$ of the original numbers equals (Yntema, 1933)

$$\Delta^* = n^{-2}\sum_{i \ne j} |x_i - x_j| = \sum_{i \ne j} \gamma_i\gamma_j\,|\mu_i - \mu_j| + \sum_{i=1}^{k+1} \gamma_i^2\,\Delta^*_i \tag{16}$$

where $\mu_j$ is the mean of the $j$th group and $\Delta^*_i$ is the mean difference of the $i$th group, and $\gamma_i$ is the proportion of observations in the $i$th group (i.e., $\gamma_1 = p_1$, $\gamma_2 = p_2 - p_1$, ..., $\gamma_{k+1} = 1 - p_k$). Of course, $G = \Delta^*/(2\mu)$. Formula (16) shows that the mean difference of the original numbers can be regarded as the sum of the "mean difference between groups" and a correction term which weights the mean difference ($\Delta^*_i$) within each group by the factor $\gamma_i^2$.

When all the observations in each of the $(k+1)$ groups are equal, the Gini index reduces (after some algebra) to the lower bound (15). Thus, the standard method of estimating the Gini index neglects the differences in income within the groups and underestimates $G$ by

$$D = (2\mu)^{-1}\sum_{i=1}^{k+1} \gamma_i^2\,\Delta^*_i. \tag{17}$$

The factor $D$ is known as the "grouping correction" (Goldsmith et al., 1954) and almost all the interpolation formulas attempt to estimate it by fitting a specific density in each interval. The main idea of our approach is to obtain bounds on $D$ under minimal assumptions on the density function in each interval. Let all the observations in the $i$th group lie between $a_{i-1}$ and $a_i$ and let $\mu_i$ be the group mean. Lemma 4 yields bounds on $\Delta^*_i$, the mean difference of the observations in the $i$th group, regardless of the form of the underlying density. Thus, the Gini index always lies between

$$G_L \le G \le G_L + D = G_U \tag{18}$$

where

$$D = \mu^{-1}\sum_{i=1}^{k+1} \gamma_i^2\,(\mu_i - a_{i-1})(a_i - \mu_i)(a_i - a_{i-1})^{-1}. \tag{19}$$
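The bounds (18) and (19) can be sketched directly from grouped data. The example below applies them to the ten-group C.P.S. figures of table 2 (group shares in per cent and group means in dollars), with boundaries in dollars. Two choices here are assumptions of this sketch rather than statements from the article: the shares are renormalized to sum to one, and the open-ended last group is bounded with Lemma 5, $\Delta \le 2(\mu_i - a_{i-1})$, since (19) needs a finite upper endpoint. The function name `crude_gini_bounds` is hypothetical.

```python
import math


def crude_gini_bounds(boundaries, shares, means):
    """Return (G_L, G_U) from group boundaries a_0 < ... < a_{k+1}, shares and means."""
    total_share = sum(shares)
    gammas = [s / total_share for s in shares]          # proportions gamma_i
    mu = sum(g * m for g, m in zip(gammas, means))      # overall mean income

    # Lower bound (15): polygon through the points (p_i, L(p_i)).
    cum, acc = 0.0, 0.0
    for g, m in zip(gammas, means):
        prev = cum
        cum += g * m / mu                               # L(p_i)
        acc += g * (prev + cum)
    g_lower = 1.0 - acc

    # Upper bound on the grouping correction, as in (19).
    d = 0.0
    for g, m, lo, hi in zip(gammas, means, boundaries, boundaries[1:]):
        if math.isinf(hi):
            spread = m - lo                             # Lemma 5 (open-ended group)
        else:
            spread = (m - lo) * (hi - m) / (hi - lo)    # Lemma 4 (finite interval)
        d += g * g * spread
    d /= mu
    return g_lower, g_lower + d


# Table 2: per cent of families and mean income (dollars) in ten groups.
boundaries = [0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 10000, 15000, math.inf]
shares = [4.824, 8.253, 7.215, 6.902, 6.615, 7.598, 7.847, 21.404, 19.111, 10.241]
means = [541.41, 1463.63, 2445.72, 3438.90, 4437.32,
         5401.18, 6392.92, 8304.54, 11904.33, 22261.50]

g_lower, g_upper = crude_gini_bounds(boundaries, shares, means)
```

Under these assumptions the two values should come out close to the Method 1 entries that table 3 reports for ten groups (.3883 and .4083), and they bracket the exact sample value 0.4014 quoted in the text.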

TABLE 2. - C.P.S. INCOME DISTRIBUTION IN TEN GROUPS

Income Interval             Per Cent of    Per Cent of    Mean Income
(thousands of dollars)      Families       Income         (dollars)
0-1                          4.824           .323            541.41
1-2                          8.253          1.492          1,463.63
2-3                          7.215          2.179          2,445.72
3-4                          6.902          2.931          3,438.90
4-5                          6.615          3.625          4,437.32
5-6                          7.598          5.068          5,401.18
6-7                          7.847          6.195          6,392.92
7-10                        21.404         21.950          8,304.54
10-15                       19.111         28.094         11,904.33
over 15                     10.241         28.154         22,261.50

One interesting application of formula (17) is its use in designing the grouping intervals needed to obtain a desired degree of accuracy. Since the theoretical $p$th quantile of the population is $F^{-1}(p)$, setting $a_i = F^{-1}(p_i)$ one can use (17) to determine the choice of fractiles $\{p_i\}$ or population quantiles $\{a_i\}$ which minimize the "grouping correction." An example of this type of result is

Proposition 1: If the underlying d.f. $F(x)$ is the uniform distribution on an interval $(a,b)$, then the optimum bounds using $k$ fractiles ($0 = p_0 < p_1 < \dots < p_k < 1$) or $(k+1)$ groups are achieved when $p_i = i/(k+1)$ and

$$D = \frac{(1/2)(b-a)}{(b+a)(k+1)^2}. \tag{20}$$

The above result implies that the standard practice of using quintiles or deciles, i.e., equally spaced fractiles $\{p_i\}$, is not an optimal choice for income distributions which typically are skewed to the right.

In practice the number of groups required to achieve close bounds on $G$ is rather large (at least 20) as the group boundaries ($a_i$) are not chosen with the purpose of minimizing $\Delta$ or $D$. So far, no prior knowledge concerning the shape of the income distribution has been used. As most frequency functions that have been fit to income data decrease in the high income range, we can sharpen the bounds on the "within group mean differences," $\Delta^*_i$, by using the bound (derived from (11))

$$\tfrac{2}{3}(\mu_i - a_{i-1}) \le \Delta^*_i \le 2(\mu_i - a_{i-1})(a_i - a_{i-1})^{-1}\left[(a_i - \mu_i) - \tfrac{1}{3}(\mu_i - a_{i-1})\right] \tag{21}$$

on the intervals $(a_{i-1}, a_i)$ on which the density decreases. Finally, one can test the assumption that the frequency function decreases in $(a_{i-1}, a_i)$ by requiring that $\mu_i < (1/2)(a_{i-1} + a_i)$ and that

$$n_{i-1}(a_{i-1} - a_{i-2})^{-1} > n_i(a_i - a_{i-1})^{-1} > n_{i+1}(a_{i+1} - a_i)^{-1}, \tag{22}$$

where $n_i$ is the number of observations in the interval $(a_{i-1}, a_i)$. In the last interval $(a_k, \infty)$ one may be willing to use bounds obtained from Theorem 3.

In order to judge our method we tested it on data, presented in table 2, given to us by Dr. Benjamin Tepping of the Census Bureau. He computed the exact Gini index for the CPS sample (1968) and formed two groupings (into 10 and 28 subgroups) of the data.

The Gini index computed by Dr. Tepping from the entire sample of approximately 60,000 incomes was 0.4014. We computed the Gini index using three different procedures. Method 1 used the crude bounds (19). No matter what the underlying density is, the Gini index of the given numbers must lie between these bounds. Method 2 tests for decreasing density and replaces the crude bound on the within group mean differences used in Method 1 by (21) where appropriate. The third method was the same as Method 2 except that we assumed that the
density had a decreasing hazard rate in the last interval. The fourth method studied (the Census Bureau's) does not use the means of each group but assumes that the mean income of each group is at the midpoint of the interval and fits a Pareto tail to the last (open-ended) interval. In table 3, we present the results of computing our bounds on Dr. Tepping's data using both the 10 group decomposition and another grouping into 28 intervals.

TABLE 3. - ESTIMATES AND BOUNDS ON THE GINI INDEX

Procedure        Data in 10 Groups        Data in 28 Groups
Method No.       GL         GU            GL         GU
1                .3883      .4083         .4001      .4020
2                .3928      .40605        .4005      .40175
3                .3975      .40605        .40055     .40175
4                .4009      .4009         .40525     .40525

From table 3, it is seen that the first three of our methods give bounds which bracket the true value (0.4014). Moreover, with a large number of groups the interval obtained by Method 1, 0.4001 < G < 0.4020, is of sufficient accuracy to detect small changes in the Gini index. When the data were grouped in 10 intervals the bounds derived from Method 1 differed by 0.02; however, the bounds using Methods 2 and 3 reduced this difference to 0.01325 and 0.00855, respectively. The value of using some extra assumptions is apparent if the data are coarsely grouped.

One interesting result of our study is the rather poor performance of Method 4 which does not use the individual group means. That estimate was more accurate in the case of 10 groups than in the case of 28 groups. Indeed, the estimated Gini index using 28 groups lies outside the bounds given by Method 1 and is, therefore, an impossible value.

As a next step we included negative incomes in our study and used 29 intervals. As long as the average income of the population is positive, this causes no difficulty in the computation of our bounds. Using Method 1, the Gini index of the Census Bureau's data was bounded by 0.4024 < G < 0.4039 while Method 2 yielded the bounds 0.4027 < G < 0.4037. The inclusion of negative income increases both sets of bounds by about 0.002 which is slightly larger than the difference between the bounds.

V Analysis of Income Tax Data

The IRS summarizes income tax data by grouping the data into intervals and estimating the average income of each group. By estimating the Gini index directly and deriving bounds on the "grouping correction" we avoid the problem noted by Budd (1970): "Published size distributions give frequencies for dollar income size brackets that remain relatively constant from year to year; as a result, sizes and positions of the quantile readings for relative distributions derived from them vary considerably."

In table 4 we present an analysis of all income tax returns for the years 1955 thru 1969 which reported a positive adjusted gross income. The second column gives the number of grouping intervals wherein the data were summarized. Columns 3 and 4 give the lower and upper bounds for the Gini index using Method 1 while the bounds derived from Method 2 are given in columns 5 and 6. Column 7 gives the single estimate derived by Method 4.

TABLE 4. - "ESTIMATES" OF THE GINI INDEX FROM IRS DATA (POSITIVE INCOME)

          Number of     Method 1            Method 2            Method 4
Year      Groups        GL       GU         GL       GU
1955      26            .4256    .4283      .4266    .4278      .4329
1956      25            .4254    .4281      .4265    .4276      .4349
1957      25            .4253    .4280      .4263    .4275      .4346
1958      25            .4307    .4335      .4318    .4330      .4407
1959      25            .4351    .4379      .4363    .4374      .4454
1960      25            .4336    .4366      .4351    .4360      .4442
1961      29            .4412    .4433      .4421    .4429      .4465
1962      29            .4401    .4422      .4411    .4417      .4481
1963      29            .4423    .4443      .4433    .4438      .4501
1964      18            .4440    .4492      .4465    .4482      .4670
1965      18            .4504    .4559      .4530    .4549      .4747
1966      19            .4536    .4596      .4565    .4584      .4734
1967      19            .4574    .4638      .4608    .4624      .4785
1968      21            .4622    .4686      .4659    .4670      .4721
1969      21            .4597    .4669      .4638    .4651      .4669

The figures in table 4 show how the number of groups affects the accuracy of the estimated Gini index. Since the standard procedure is the lower bound ($G_L$) of Method 1 we note that when the number of groups is large (25 or
more), it underestimates the true Gini index by only about 0.002 or 0.003. When the number of groups used was 18 or 19, this estimate was short by about 0.005 or 0.006. Since Soltow (1960) was trying to detect a change in the Gini index of about one-half of one per cent a year, it appears that Morgan's suggestion that 8 groups would suffice (1962, p. 28) is not correct. The bounds derived by Method 2, however, always differed by less than 0.002 and usually one can distinguish between years. The years 1955 thru 1957 seem to have a "nearly constant" Gini index; however, since 1962 the Gini index appears to have increased year by year.

Another result of this analysis is the poor performance of Method 4 which always was larger than the upper bound given by Method 1. This occurs because the typical frequency function of income (Lydall, 1968) is unimodal, rising up to a modal value and decreasing thereafter. In the low-income range, the frequency function rises so the true group mean is greater than the midpoint of those intervals. The reverse is true for intervals past the modal income. Thus, the procedure underestimates the income of the low income groups and overestimates the income in the higher brackets, thereby overestimating the "relative inequality."

Finally we mention that Method 4 shows that the Gini index fell in 1968 and 1969 while our method shows the index didn't fall until 1969. Since the recent boom picked up steam in 1967-1968 it is reasonable that inequality increased (on IRS data) in 1968 as more low income recipients earned enough money to file returns and report their incomes.

In order to compare our bounds with the recent estimates of Budd (1970) we had to include negative incomes. In the computation of our bounds we used only the crude bound (14) on the within group mean difference of the negative income group. Before presenting the various bounds on IRS data we recall that Budd's estimates are based on interpolating a special kind of Lorenz curve and then computing the area of concentration. He fits a function which is a polynomial for $p \le p_0$ (where $p_0$ is determined from the data and is usually in the range 0.7 to 0.8) and a function suggested by Gini (1936), which is equivalent to fitting a Pareto tail to the upper incomes.

TABLE 5. - "ESTIMATES" OF THE GINI INDEX FROM IRS DATA (NEGATIVE INCOME INCLUDED)

          Number of     Method 1            Method 2
Year      Groups        GL       GU         GL       GU
1955      27            .4350    .4377      .4360    .4372
1956      26            .4339    .4366      .4349    .4361
1957      26            .4343    .4370      .4354    .4366
1958      26            .4396    .4423      .4406    .4418
1959      26            .4463    .4491      .4475    .4486
1960      26            .4426    .4456      .4440    .4450
1961      30            .4498    .4519      .4508    .4515
1962      30            .4487    .4507      .4497    .4503
1963      30            .4519    .4539      .4528    .4534
1964      19            .4533    .4585      .4558    .4574
1965      19            .4586    .4641      .4612    .4630
1966      20            .4622    .4682      .4651    .4670
1967      20            .4655    .4719      .4688    .4705
1968      22            .4699    .4764      .4736    .4748
1969      22            .4678    .4750      .4718    .4732

It is interesting to contrast our results to those of Budd. His estimates of the Gini index for the years 1955, 1960, 1964, 1966 and 1967 were 0.435, 0.443, 0.461, 0.464 and 0.468. The most significant result is that the value for 1964 (0.461) lies outside the interval (given by Method 1) in which the Gini index must lie. The estimates for the other years lie inside the intervals obtained by Method 1 but not by Method 2.

Since our approach enables one to detect rather small changes in the Gini index without fitting curves and as the computer time to run our program for both Methods 1 and 2 for thirteen years of data was under eighteen seconds on the IBM 360, it should be of practical use.

VI The Comparison of Lorenz Curves

Given two Lorenz curves $L_1(p)$ and $L_2(p)$, the distribution generating $L_2(p)$ is more concentrated than the distribution generating $L_1(p)$ whenever $L_1(p) \le L_2(p)$ for all $p$. This implies that the corresponding Gini indexes $G_1$ and $G_2$ satisfy $G_2 \le G_1$. In this section we study conditions on the underlying d.f.'s $F_1$ and $F_2$ which allow us to conclude that one Lorenz curve lies above another. It turns out that the d.f.'s generating the extreme Gini indices also generate the extreme Lorenz curves. This allows us to generalize the methods of section IV to obtain bounding curves to the true Lorenz curve for grouped data.

In order to compare the "relative spread" of two positive r.v.'s $X_1$ and $X_2$ with corresponding d.f.'s $F_1$ and $F_2$ and Lorenz curves $L_1$ and $L_2$ it is convenient to standardize them by requiring that their means be equal. Since all the proofs of the results of this section are based on the methods used in (Barlow and Proschan, 1965) we omit them and only state the generalizations of the results of section III.

Our first result generalizes Lemma 4 and yields bounds on the Lorenz curve for arbitrary distributions. Specifically, we have

Theorem 4: Let $F(x)$ be a d.f. with mean $\mu$ and support $(a,b)$. Then its Lorenz curve, $L(p)$, satisfies

$$B(p) \le L(p) \le p, \tag{23}$$

where

$$B(p) = \begin{cases} a\mu^{-1}p, & p \le r \\ \mu^{-1}(ar) + \mu^{-1}b(p-r), & p > r \end{cases} \tag{24}$$

and $r$ is determined by the relation $ra + (1-r)b = \mu$. The r.v. $X$ generating the Lorenz curve $B(p)$ takes on the value $a$ with probability $r$ and the value $b$ with probability $(1 - r)$.

For densities which decrease on finite intervals the analogs of Theorems 1 and 2 are given by

Theorem 5: Let $F(x)$ be a concave d.f. supported on $(a,b)$ with mean $\mu$ and let $U(x)$ be the uniform density on $(a, 2\mu-a)$ and consider the d.f. equal to

$$\begin{cases} 0, & x < a \\ f + (1-f)(x-a)/(b-a), & a \le x < b \\ 1, & b \le x. \end{cases} \tag{25}$$

Then the Lorenz curve generated by the d.f. $F(x)$ obeys

$$B(p) \le L(p) \le [ap + (\mu-a)p^2]\,\mu^{-1}, \tag{26}$$

where

$$B(p) = \begin{cases} \mu^{-1}ap, & p \le f \\ \mu^{-1}\left[ap + (a^* - a)(p-f)^2(1-f)^{-1}\right], & p > f \end{cases} \tag{27}$$

and $f = 1 - 2(\mu-a)/(b-a)$ and $a^* = (a+b)/2$.

In order to generalize Theorem 3 we recall the concept of a "matching exponential distribution" for any d.f. $F$ with D.H.R. supported on $(\theta, \infty)$ with mean $\mu$. This is the exponential density with the same mean and support as $F$. Its d.f. is

$$E(x) = 1 - \exp\{-(x-\theta)/(\mu-\theta)\}, \quad x > \theta, \tag{28}$$

and its Lorenz curve is

$$L(p) = p + (1 - \theta\mu^{-1})(1-p)\ln(1-p). \tag{29}$$

The appropriate generalization of Theorem 3 is

Theorem 7: Let $H(x)$ be a D.H.R. law on $(\theta, \infty)$ with density $h(x)$ and let $E(x)$ be its matching exponential. Then the Lorenz curve $L(p)$ generated by $H(x)$ satisfies $L(p) \le L^*(p)$, where $L^*(p)$ is the Lorenz curve generated by $E(x)$.

Remark: When comparing a D.H.R. law with its matching exponential law it is essential that both the origins and the means of both d.f.'s be identical. Theorem 7 only gives an "upper bound" to the Lorenz curve. Unfortunately no good "lower bound" exists because the family of r.v.'s supported on $(0, \infty)$ with mean $\mu$ given by the d.f.'s

$$G_\epsilon(x) = 1 - \epsilon e^{-\alpha x}, \quad x > 0, \tag{30}$$

where $\alpha = \epsilon\mu^{-1}$, have the D.H.R. property. Each d.f. of this family corresponds to a r.v. which equals 0 with probability $1 - \epsilon$ and is an exponential r.v. with mean $\mu\epsilon^{-1}$ with probability $\epsilon$. As $\epsilon$ approaches 0, most of the population receives no income while the small fraction, $\epsilon$, receiving income get large incomes. Thus the Lorenz curves generated by $G_\epsilon(x)$ approach the most extreme Lorenz curve possible, namely, $L(p) = 0$ for $p < 1$ and $L(1) = 1$.

We now discuss how Theorems 4 and 5 can be used to obtain bounding curves to the true Lorenz curve in all but the open-ended group. As usual, assume that the data have been grouped into intervals where boundaries are given by $0 = a_0 < a_1 < \dots < a_k < a_{k+1}$, the mean income in the interval $(a_{i-1}, a_i)$ is $\mu_i$, the number of incomes in the interval $(a_{i-1}, a_i)$ is $n_i$ and the fractiles $p_i = F(a_i) = (n_1 + \dots + n_i)/(n_1 + \dots + n_{k+1})$ are used to estimate

314

THE REVIEW OF ECONOMICS AND STATISTICS

the Lorenzcurve. Thus, we have (k + 1) val- uniformlaw withmean Aft= (ai,+ai)/2


ues, L(pj), of theLorenzcurve. In the region Theorem5 yieldsthe lowerboundary
to incomesin the in(pi1l,pi) corresponding
L(p*)]X
L(p*i) + [L(pi)
terval (ai-1,ai) Theorem4 yields the upper
boundaryline
L(pi-1) + [L(pi) -L(pi-1)]

P-PP
pi-Pi-i

(31)

ai-i

(,*.

so

)(pp*

(36b)
and, if thedensitycan.be assumedto decrease
Remarks: (a) The readershouldnote that
in (a1_jai) Theorem5 yieldstheupperboundthe
fraction
f and thepointp*i of interpolation
arycurve
are different
in the two formulas(34) and
L L,(~i~)
+LL(pi)
L(pi--)]X
(36).
in the case of decreasingdenIndeed
[p))
Px
sitiesthef of (33) is largerthanthef of (35).
PI pPi 1 +(
pi_1)2
al
i)(pi(b) The boundson theLorenzcurvegenerated
ai-+ ( ~iA-)
by (31) and (34) generateverygeneralbound(32) ary curves. They once weredrawnby Hanna
To obtain the corresponding
lower bound- et al. (1948) butbecausethenumberof grouparies to the Lorenz curve,we note that the ingsused was verysmalltheentireidea seems
lowerboundgivenby Theorem4 corresponds to have been dropped. (c) The boundsgiven
to givinga fractionf of thepopulationincome by (31) and (34) also allow us to derive
ai-1 and a fraction(1-f) of incomeai where boundson thederivativeor slopeof theLorenz
(33) curvewhichcan be applied to obtainbounds
If- 1 - (,i-aj_)(ai-aj_)-1.
Letting p*i pi-i + f(pi - P-i), the Lorenz on other measures of inequality. (d) The
curve is bounded frombelow in (pi -p,-)
boundsobtainedforthoseintervalswherethe
densitydecreases,(32) and (36), are an imby the lines
provement
over the corresponding
lines (31)
+
L(pi-1)
[L(p*i) -L(pi-1)]
and
(34)
as
the
curve
(32)
lies
below
(31)
(p-pi-l) Wi-i-Pi_)-1,
p
while
(36)
is
above
(34).
<
<
(34a)
Pi_1
p*l
and
VII Conclusions
L(p *) + [L(pi) - L(p*t)] (p-p*i) (pi-p*j)-l,
andFutureProblems
-

P*j ? P

Pi

(34b)

This papershowsthatthe Giniindexcan be


accuratelyestimatedwithoutfittingcurvesto
data wheneverthe data is groupedproperly.
Nevertheless,
some futureproblemsremain.
From a statisticalviewpointwe need to assess theeffectthatestimating
thegroupmeans
has on ourbounds. The IRS sampleis so large
we ignoredthisin our analysis.
For analyticalpurposesthe variationin the
concept of income measuredmay be severe
(see Budd, 1970). We saw that the effectof
includingnegativeincomeswas much greater
on IRS data than on the Census data. This
suggeststhatwe mightobtaina betterpicture
of whatis happeningif we considerjust wage
L(pi-1) + [L(p*i) -L(pi-)]
incomesin industrialareas. Aggregateincomes
(P-Pi.1) (P*i-P1 ) -1
(36a) fromtax returnscoverso manyworkersin so
where L(p*) = L(p-1) + jL-Jfaij_.In the manyincomparablejobs thattheymay not be
region(p*j,pj), the underlying
densityis the as accuratean indicatoras one wouldlike. In-1- fai_1and )u is
whereL(p*i ) = L(pi-1) +
the mean incomeof the entirepopulation,i.e.,
If the densitycan be assumed
Ft= E nilFtiAni.
to decreasein (a_1,aj) Theorem5 impliesthat
the most "unequal" distribution
of incomeis
generatedby givinga fractionf of the population the lowest income possible,ai1, and
assumingthatthe remaining
incomeis distributed accordingto a uniformlaw on (a_1,aj).
Then f satisfiestheequation
f =1 - 2(stt-aj_1)/(aj-a1_j).
(35)
Definingthe interpolation
point p*i now by
P*i = pi-1 + f(pj-pj_1) the Lorenz curve in
(Pi-1,P*i) is boundedbelowby

This content downloaded from 146.155.17.149 on Wed, 20 Aug 2014 23:45:26 UTC
All use subject to JSTOR Terms and Conditions

ESTIMATION OF THE LORENZ CURVE AND GINI INDEX

315

deed the Census Bureau (1967) foundthat course of this investigation.Also I wish to
the Gini indexvaries greatlyamongdifferentthankMr. J. T. Smithand Mr. David Kasik
occupations.
of The JohnsHopkins Universitywho wrote
Finally it is a pleasureto thankDr. Ben- the computerprogramsforall our procedures
jamin Tepping of the Bureau of the Census and tests. Their cooperationexemplified
not
not only for providingdata for our study only the idea that teachingand researchare
but forhis constantencouragement
duringthe mutuallyrelatedbut thattheyare fun.

APPENDIX
Does the Pareto Law Fit United States Income Data?
Since the Census Bureau fits a Pareto law to the open-ended group, we made a comparison between the average income assigned to the over-$20,000 group by this procedure and the value estimated by the IRS from tax data. It appears that the Pareto fit was not bad as late as 1955 but is no longer appropriate. Our purpose is not to determine whether the Pareto law can be fitted to the "tail" of the income distribution but to justify the desirability of our approach, which avoids fitting curves and estimating parameters.
Recall that if income is distributed according to a Pareto law on \((A, \infty)\), the proportion, \(Q\), of the total population with incomes \(\ge x\) is \((A/x)^{\alpha}\) so that

\[ \alpha = \ln Q / (\ln A - \ln x) \qquad (37) \]

and the mean income of the group receiving at least \(x\) is \(x\alpha/(\alpha - 1)\).
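As a minimal sketch of the two formulas just stated, the Python functions below compute the Pareto exponent from equation (37) and the implied mean of the group receiving at least \(x\). The function names, the argument names, and the sample inputs are illustrative assumptions; they are not the programs used in the study.

```python
import math


def pareto_alpha(q: float, origin: float, x: float) -> float:
    """Pareto exponent from equation (37): alpha = ln Q / (ln A - ln x).

    q      -- proportion Q of the population with incomes at least x
    origin -- starting point A of the Pareto fit
    x      -- income level at which Q is observed (x > A)
    """
    if not (0.0 < q < 1.0):
        raise ValueError("q must lie strictly between 0 and 1")
    if not (0.0 < origin < x):
        raise ValueError("need 0 < origin < x")
    return math.log(q) / (math.log(origin) - math.log(x))


def pareto_tail_mean(x: float, alpha: float) -> float:
    """Mean income of the group receiving at least x: x * alpha / (alpha - 1)."""
    if alpha <= 1.0:
        raise ValueError("the tail mean is finite only for alpha > 1")
    return x * alpha / (alpha - 1.0)


if __name__ == "__main__":
    # Hypothetical input: a quarter of the incomes above A = $10,000
    # are at least x = $20,000.
    alpha = pareto_alpha(q=0.25, origin=10_000.0, x=20_000.0)
    print(alpha, pareto_tail_mean(20_000.0, alpha))
```

With the hypothetical inputs above the exponent comes out at approximately 2, and an exponent of exactly 2 gives a tail mean of twice \(x\). The guard on `alpha <= 1.0` reflects the fact that a Pareto tail has no finite mean in that range.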

In table 6, we present the estimated value of \(\alpha\) and the mean income received by those earning at least $20,000 (i.e., \(x = 20{,}000\)) when the value \(A\), specifying the starting point of the Pareto fit, is taken as $10,000 and $15,000. The last column is the actual mean of the group obtained from the IRS data. It is interesting that the estimates using $15,000 as the origin are slightly closer to the true value in recent years than the estimates using $10,000; however, the fit in recent years leaves a lot to be desired.
As another example of the arbitrariness of the Pareto fit to the tail of income data, we estimated \(\alpha\) from the CPS-P-60 series No. 59 (April 18, 1969) data. We set \(x\) = $25,000 and the origin \(A\) of the Pareto law at $12,000 and $15,000. When \(A\) = $12,000, the estimate of \(\alpha\) was 2.04316, which yielded an estimate of the average income of $48,965 in the group receiving at least $25,000. When \(A\) = $15,000, the estimate of \(\alpha\) was 2.38315, yielding an estimated group mean of $43,126 for those incomes greater than $25,000. This illustrates how sensitive a Pareto tail is to the choice of the origin of the fit. This problem is especially severe with economic data since the grouping intervals are not determined with the objective of fitting a curve to the last group. Thus, the "ideal origin" may be in the middle of a group.

TABLE 6. - THE PARETO FIT TO IRS DATA
(A and x in thousands of dollars; means in dollars)

           A = 15, x = 20          A = 10, x = 20
Year     Alpha     Est. mean     Alpha     Est. mean     Real Data
1967     2.7702    31298.00      2.7549    31396.74      37524.98
1966     2.7172    31646.61      2.8226    30973.69      37323.54
1964     2.4094    34190.23      2.7205    31624.68      37232.96
1963     2.5047    33292.03      2.7959    31136.73      36678.53
1961     2.2991    35395.74      2.6294    32274.78      38195.41
1960     2.2830    35588.41      2.6356    32228.01      37630.12
1959     2.1518    37363.87      2.5146    33205.04      38918.30
1957     1.9453    41158.25      2.2730    35710.69      38560.75
1956     1.7757    45784.49      2.1430    37498.46      39252.67
1955     1.7119    48094.72      2.0000    40000.00      39646.30
REFERENCES
Aitchison, J., and J. A. C. Brown, The Lognormal Distribution (Cambridge: Cambridge University Press, 1957).
Barlow, R. E., and F. Proschan, Mathematical Theory of Reliability (New York: John Wiley and Sons, Inc., 1965).
Bowman, M. J., "A Graphical Analysis of Personal Income Distribution in the United States," American Economic Review, 35 (Sept. 1945), 606-628.
Budd, E. C., "Postwar Changes in the Size Distribution of Income in the United States," American Economic Review, 60 (May 1970), 247-260.
Bureau of the Census, Trends in the Income of Families and Persons in the United States, 1947-1964, Technical Paper No. 17 (U.S. Government Printing Office, 1967).
Champernowne, D. G., "The Graduation of Income Distributions," Econometrica, 20 (1952), 591-615.
Chow, Y. S., and W. J. Studden, "Monotonicity of the Variance Under Truncation and Variations of Jensen's Inequality," Annals of Mathematical Statistics, 40 (June 1969), 1106-1108.
Éltető, O., and E. Frigyes, "New Income Inequality Measures as Efficient Tools for Causal Analysis and Planning," Econometrica, 36 (Apr. 1968), 383-396.
Fisk, P. R., "The Graduation of Income Distributions," Econometrica, 29 (Apr. 1961), 171-185.
Gini, C., "On the Measure of Concentration with Special Reference to Income and Wealth," Abstracts of papers presented at the Cowles Commission Research Conference on Economics and Statistics (Colorado Springs: Colorado College Press, 1936).
Goldsmith, S., G. Jaszi, H. Kaitz, and M. Liebenberg, "Size Distribution of Income Since the Mid-Thirties," this REVIEW, 36 (Feb. 1954), 1-32.
Hanna, F. A., J. A. Pechman, and S. M. Lerner, Analysis of Wisconsin Income. Conference on Research in Income and Wealth, 9 (New York: National Bureau of Economic Research, 1948).
Hardy, G. H., J. E. Littlewood, and G. Polya, Inequalities (Cambridge: Cambridge University Press, 1952).
Internal Revenue Service, Statistics of Income: Individual Income Tax Returns for 1955 thru 1967 (U.S. Government Printing Office).
Kendall, M. G., and A. Stuart, The Advanced Theory of Statistics 1, 2nd ed. (London: Charles Griffin and Company, 1963).
Liebenberg, M., and H. Kaitz, "An Income Size Distribution from Income Tax and Survey Data, 1944," in Studies in Income and Wealth, 13 (Cambridge: Riverside Press for NBER, 1951), 443-444.
Lydall, H. F., The Structure of Earnings (Oxford: Clarendon Press, 1968).
Mendershausen, H., Changes in Income Distribution During the Great Depression. Studies in Income and Wealth, 7 (New York: H. Wolff for NBER, 1946).
Morgan, J., "The Anatomy of Income Distribution," this REVIEW, 44 (Aug. 1962), 270-282.
Schutz, R. R., "On the Measurement of Income Inequality," American Economic Review (Mar. 1951), 107-122.
Soltow, L., "The Distribution of Income Related to Changes in the Distribution of Education, Age and Occupation," this REVIEW, 42 (Nov. 1960), 450-454.
Soltow, L., "The Share of Lower Income Groups in Income," this REVIEW, 47 (Nov. 1965), 429-433.
Taguchi, T., "Concentration-Curve Methods and Structures of Skew Populations," Annals of the Institute of Statistical Mathematics, 20 (1968).
Yntema, D., "Measures of the Inequality in the Personal Distribution of Wealth or Income," Journal of the American Statistical Association, 28 (1933), 423-433.
