Suman Rakshit
M.Sc. Statistics I.I.T. Kanpur, India
Declaration
I hereby declare that this thesis contains no material which has been accepted for the
award of any other degree or diploma in any university or equivalent institution, and that,
to the best of my knowledge and belief, this thesis contains no material previously published or written by
another person, except where due reference is made in the text of the thesis.
SUMAN RAKSHIT
Abstract
Acknowledgements
Contents
List of Tables
List of Figures
Chapter 1
Simultaneous test of Non-inferiority and
Superiority : Lorenz curve perspective
Introduction
1.1 Motivation
Lorenz dominance:
Recent interest in poverty issues is centered on the unambiguous ranking of income
distributions. Lorenz dominance provides a unanimous ranking of populations based on income
inequality (Atkinson (1970)).[1] Extensive research has been devoted to testing for Lorenz
dominance and stochastic dominance (see, for instance, Kaur, Rao and Singh (1994), McFadden
(1989), Davidson and Duclos (2006), Anderson (1996), Barrett and Donald (2003)). While
Lorenz dominance continues to play the central role in ranking populations, experience
shows that in many cases of practical interest the Lorenz curves being compared
intersect, and hence there is no Lorenz dominance. Therefore, there is a definite need to
identify the limitations of this criterion and to develop new statistical methodology that
can rank populations under much broader circumstances.
New methodology:
A related statistical inference problem has been studied in the recent medical statistics
literature for comparing two treatments when the response variable is multivariate
(see, for instance, Bloch, Lai and Tubert-Bitter (2001), Hung, Wang, Tsong, Lawrence and
O'Neill (2003), O'Brien (1984), Laska, Tang and Meisner (2002), Tamhane and Logan (2004),
Perlman and Wu (2004)). It is very rare that one treatment dominates another in terms of
every outcome variable of interest. In such cases, one may conclude that there is a
difference of practical significance when treatment-1 performs better than treatment-2 with
respect to at least one variable and is not substantially worse with respect to any variable.
In this chapter, we adopt this approach to suggest a new way of formulating an income
inequality testing procedure, which we call near Lorenz dominance. This criterion is weaker
than Lorenz dominance.
[1] For ease of understanding, Lorenz dominance is taken to mean Lorenz curve dominance,
i.e. population-1 dominates population-2 in the Lorenz order if the Lorenz curve of
population-1 lies completely above the Lorenz curve of population-2.
[2] Davies and Hoy (1995) made the aversion to downside inequality (ADI) criterion more fully
operational by providing a simple procedure for establishing whether any two distributions
whose Lorenz curves cross a finite number of times can be ranked under ADI.
The need for statistical methods to establish that one treatment is better than another
arises frequently in many areas of study. In most existing empirical studies, the treatment
effect is captured by a scalar parameter and a studentized t-statistic is used. Often the
treatment effect is reduced to a scalar parameter for simplicity. However, there are many
practical settings where the parameter representing the treatment effect is either a
finite-dimensional vector or a smooth function. In such cases, the methodological issues
can be challenging.
Suppose that income data are available only in grouped form, for example only for the
income groups $(t_0, t_1], \ldots, (t_k, t_{k+1}]$. Let $\theta_i = F_1(t_i) - F_2(t_i)$,
$i = 1, \ldots, k$, where $F_1$ and $F_2$ denote the income distribution functions of
population-1 and population-2. In this case, to establish that population-2 dominates
population-1 in terms of income level, we need to establish that $\theta_i \ge 0$ for every
$i$, with strict inequality holding for at least one $i$ ($i = 1, \ldots, k$).
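As a minimal sketch of how the differences θ_i could be estimated from grouped data, the Python snippet below takes group counts for the two populations and returns the differences of the empirical distribution functions at the interior boundaries t_1, ..., t_k. The function name `cdf_differences` and the counts are hypothetical and chosen only for illustration; they do not come from the thesis.

```python
from itertools import accumulate


def cdf_differences(counts1, counts2):
    """Estimate theta_i = F1(t_i) - F2(t_i) at the interior boundaries t_1..t_k.

    counts1, counts2: observed counts in the k+1 income groups
    (t_0, t_1], ..., (t_k, t_{k+1}] for populations 1 and 2.
    """
    if len(counts1) != len(counts2):
        raise ValueError("both populations must use the same income groups")
    n1, n2 = sum(counts1), sum(counts2)
    # Cumulative proportions give the empirical CDF at t_1, ..., t_{k+1};
    # the last value is always 1, so it is dropped.
    cdf1 = [c / n1 for c in accumulate(counts1)][:-1]
    cdf2 = [c / n2 for c in accumulate(counts2)][:-1]
    return [f1 - f2 for f1, f2 in zip(cdf1, cdf2)]


# Hypothetical counts for four income groups (k = 3 interior boundaries).
theta_hat = cdf_differences([40, 30, 20, 10], [25, 30, 25, 20])
```

Non-negative estimates at every boundary are only sample-level evidence; whether the inequalities hold in the population is what the formal test is meant to decide.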
most cases, stochastic functions such as the Lorenz curve, response functions, ROC curves, etc.)
over another under comparison.
The underlying assumption of all of these earlier works was that the test statistic is
pivotal, i.e. the corresponding asymptotic covariance matrix, say $V$, does not depend on the
parameter value $\theta$. This made their testing problem quite simple and the bootstrap
algorithm straightforward. However, the testing problem we focus on is not pivotal in
nature (see Holly and Monfort (1980), Kodde and Palm (1987), Wolak (1980)). Therefore, the
bootstrap procedure described in Tamhane and Logan (2004) and Bloch et al. (2001, 2007)
does not directly solve the problem of the unknown least favorable null configuration.
Accordingly, we rely on the local nature of the hypothesis test to handle the otherwise
intractable least favorable null configuration problem. This makes the problem nonstandard
and more challenging. Thus, there is a definite need to develop new, improved inferential
methodologies for problems like ours. This chapter focuses on these issues.
well as analyze its power properties. The study compares the performance of the UI-IU
test with that of the Perlman and Wu test.
Section 6 provides an empirical study to illustrate the methodological ideas. Income
inequality among households throughout Australia is compared between the years 2001 and
2002, using annual disposable income estimates.
In Section 7, the main issues are summarized and a conclusion is drawn, based on the
findings and recent developments in the area.
Sections 8, 9 and 10 form the appendices containing the proofs, tables and figures, respectively.
A simulation study is conducted to investigate the type-I error rate of the UI-IU test
and to analyze its power properties. An important aim of this study is to assess the
accuracy of the theory developed in the earlier sections. Accordingly, the null
configuration $\theta = \delta_n$ is investigated for various populations. In line with
Proposition 1.5.2, the type-I error rates for configurations on the boundaries are
investigated and reported. To evaluate the performance of the UI-IU test, a comparison is
made with the Perlman and Wu test.
Lorenz Curve:
The main objective of this study is to demonstrate the applicability of the testing procedure
to the comparison of income inequality. Lorenz curves are used to compare income
inequality between two populations. Formally, the Lorenz curve corresponding to
an income distribution $F$ is defined as
\[
L(p) = \frac{\int_0^{F^{-1}(p)} u \, dF(u)}{\int_0^{\infty} u \, dF(u)}, \qquad p \in [0, 1]. \tag{1}
\]
$L(p)$ represents the total income received by the bottom $100p\%$ of the population, expressed
as a proportion of the total income of the population. The empirical estimate of the
Lorenz ordinate is obtained from the sample income data. Let $X_{(1)} \le \cdots \le X_{(n)}$
be the ordered observations of a random sample. Then the sample estimate of the Lorenz
ordinate is given by
\[
\hat{L}(p) = \frac{X_{(1)} + \cdots + X_{(r)}}{X_{(1)} + \cdots + X_{(n)}}, \tag{2}
\]
where $r$ is the number of ordered observations falling in the bottom $100p\%$ of the sample.
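The sample Lorenz ordinate in (2) can be sketched in a few lines of Python. The text does not state how r is obtained from p, so the convention r = floor(n p) used below is an assumption, as are the function name `empirical_lorenz` and the toy income values.

```python
import math


def empirical_lorenz(incomes, p):
    """Sample Lorenz ordinate: share of total income held by the bottom 100p%."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    x = sorted(incomes)
    # Assumed convention (not stated in the text): r = floor(n * p).
    r = math.floor(len(x) * p)
    return sum(x[:r]) / sum(x)


# Hypothetical sample of five incomes.
sample = [12.0, 3.0, 7.0, 20.0, 8.0]
ordinates = [empirical_lorenz(sample, p) for p in (0.2, 0.4, 0.6, 0.8, 1.0)]
```

Sorting once and reusing cumulative sums would be more efficient when many ordinates are needed; the version above favors readability.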
Let us first consider the scenario in which the income data are generated from a discrete
income distribution. The income groups for the discrete income distributions are obtained
using the Singh-Maddala distribution. The 3-parameter Singh-Maddala distribution is well
known for providing a good fit to income data (Brachmann et al. (1996)). The functional
form of the Lorenz curve of the Singh-Maddala distribution is used to obtain the
income groups. The Lorenz ordinate of a Singh-Maddala distribution is expressed as
\[
L(p) = \frac{\int_0^{1-(1-p)^{1/c}} t^{1/b} (1-t)^{c-1/b-1} \, dt}{\int_0^{1} t^{1/b} (1-t)^{c-1/b-1} \, dt}, \qquad 0 \le p \le 1. \tag{3}
\]
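The ratio in (3) is a regularized incomplete beta function with parameters 1 + 1/b and c − 1/b, evaluated at 1 − (1 − p)^{1/c}. The sketch below evaluates it numerically with the Python standard library only. To avoid the endpoint singularity of the integrand when c − 1/b < 1, it applies the substitution s = (1 − t)^{c − 1/b}; this substitution, the midpoint rule, the function name `singh_maddala_lorenz` and the `steps` parameter are choices made for this illustration, not part of the thesis.

```python
def singh_maddala_lorenz(p, b, c, steps=20000):
    """Lorenz ordinate L(p) of a Singh-Maddala distribution with shapes b and c.

    Requires c > 1/b so that the mean is finite.
    """
    beta = c - 1.0 / b
    if beta <= 0:
        raise ValueError("need c > 1/b for a finite mean")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")

    def integral(lower):
        # Midpoint rule for the integral of (1 - s**(1/beta))**(1/b) over [lower, 1].
        # After substituting s = (1 - t)**beta, both integrals in (3) take this
        # form up to the common factor 1/beta, which cancels in the ratio.
        h = (1.0 - lower) / steps
        total = 0.0
        for i in range(steps):
            s = lower + (i + 0.5) * h
            total += max(0.0, 1.0 - s ** (1.0 / beta)) ** (1.0 / b)
        return h * total

    # The upper limit t = 1 - (1 - p)**(1/c) maps to s = (1 - p)**(beta / c).
    return integral((1.0 - p) ** (beta / c)) / integral(0.0)
```

For b = 1 and c = 2 the integrand is linear and the ratio reduces to (1 − √(1 − p))², which gives a convenient check of the numerical routine.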
The discrete income distributions are determined by the following design. The Lorenz
ordinates are obtained at $m$ comparison points on the x-axis. The discrete income
distribution is assumed to be a multinomial distribution with $m + 1$ income levels, and the
income levels are calculated under the requirement that the distribution reproduces the same
Lorenz ordinates at those $m$ grid points.
Proposition 1.7.1:
Let $\{L_1, \ldots, L_m\}$ be the Lorenz ordinates obtained from the Singh-Maddala distribution
at $0 < x_1 < \cdots < x_m < 1$. The income levels corresponding to the discrete distribution are given as
where $\beta > 0$ is any positive number that sets the scale of the incomes.
Lorenz ordinates for population 1 are obtained using the aforementioned technique.
However, to select the Lorenz ordinates for population 2, the following result is used.
Proposition 1.7.2:
Let $g : [0, 1] \to [0, 1]$ be an increasing convex function satisfying $g(0) = 0$ and $g(1) = 1$,
and let $L(x)$, $x \in [0, 1]$, be a Lorenz curve. Then $L^*(x) = L(g(x))$ is also a Lorenz curve
and $L^*(x) \le L(x)$ for all $x \in [0, 1]$.
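The ordering in the proposition follows because a convex g with g(0) = 0 and g(1) = 1 satisfies g(x) ≤ x, and a Lorenz curve is non-decreasing. The sketch below illustrates this numerically for a power transformation g(x) = x^exponent with exponent ≥ 1 (the choice g(x) = x^{1+η} in the text corresponds to exponent = 1 + η). The baseline curve, the exponent value 1.2 and the function names are hypothetical and serve only as an illustration.

```python
def transformed_lorenz(lorenz, exponent):
    """Return L*(x) = L(g(x)) with g(x) = x**exponent, exponent >= 1."""
    if exponent < 1:
        raise ValueError("exponent must be at least 1 so that g is convex")
    return lambda x: lorenz(x ** exponent)


def base_lorenz(x):
    # Hypothetical baseline Lorenz curve used only for illustration:
    # increasing and convex on [0, 1], with L(0) = 0 and L(1) = 1.
    return (1.0 - (1.0 - x) ** 0.5) ** 2


shifted = transformed_lorenz(base_lorenz, exponent=1.2)
grid = [i / 10 for i in range(11)]
# Pointwise gap L(x) - L*(x); the proposition says it is never negative.
gaps = [base_lorenz(x) - shifted(x) for x in grid]
```

An exponent closer to 1 moves the transformed curve closer to the baseline, so the gap between the two populations shrinks.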
Consequently, the Lorenz ordinate for population 2 is defined as $L_2(p) = L_1(g(p))$, where
$g(x) = x^{1+\eta}$ has been chosen. For this part of the simulation, $\varepsilon_n$ was chosen
as half of $\delta_n$. The simulations are carried out for different combinations of sample
size and $\eta$ to evaluate the accuracy of the asymptotic least favorable null configuration
given in Proposition 1.5.3. For each Singh-Maddala distribution, four combinations of
$(n, \eta)$ are considered: (100, 1.4), (500, 1.2), (3000, 1.1) and (10000, 1.05). Furthermore,
for each simulation, 1000 independent sets of data are generated, and five hundred bootstrap
samples are generated from each of the 1000 sets of data, as described in Section 1.6. The
nominal level of the test is 0.05 in every case.
Results:
The simulation results for the null configuration $\theta = \delta_n$ are summarized in
Tables 1.11.1 to 1.11.4. The results are fairly consistent with the theory developed in
Section 1.5. Even when the sample size is only 100, the empirical level of the test stays
reasonably close to the nominal level, and the level error is not affected substantially by
the choice of $\eta$. The rejection rate is almost equal to the nominal level for small
$\eta$ and large $n$. On the other hand, rejection rates for smaller sample sizes fall
somewhat below the significance level of the test, mostly lying between 0.030 and 0.040.
The difference between the results for different values of $m$ is small, with the rejection
rates for $m = 9$ being slightly closer to the nominal level than those for $m = 5$. In
addition, the difference in results between the UI-IU and the Perlman and Wu tests is small,
although the rejection rate of the Perlman and Wu statistic tends to be slightly higher.
Overall, both tests control the type-I error rate in all the cases considered.
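When reading empirical rejection rates against the nominal level, it can help to keep the Monte Carlo error in view. The following back-of-the-envelope calculation assumes that the 1000 replications are independent and that the true rejection probability equals the nominal level α = 0.05; the symbols \hat{\alpha} (empirical rejection rate) and R (number of replications) are introduced here for illustration only.

```latex
\[
\operatorname{se}(\hat{\alpha})
  = \sqrt{\frac{\alpha (1 - \alpha)}{R}}
  = \sqrt{\frac{0.05 \times 0.95}{1000}}
  \approx 0.0069,
\qquad
0.05 \pm 1.96 \times 0.0069 \approx (0.0365,\ 0.0635).
\]
```

Under this approximation, an empirical rate inside roughly 0.0365 to 0.0635 is compatible with an exact level of 0.05 up to simulation noise.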
$\{\theta_t = -\varepsilon_{nt}$ for exactly one $t$, and $\theta_j > \delta_{nj}$ for all $j \ne t$; $t = 1, \ldots, m\}$
These configurations are asymptotically least favorable if the distance between $\theta_j$
and $\delta_{nj}$ increases to infinity (in the limit) for all $j \ne t$. However, in the case
of Lorenz differences, configurations of this nature are not feasible. Consequently,
simulations are carried out for configurations with large differences between $\theta_j$ and
$\delta_{nj}$ for $j \ne t$.
In addition, null configurations with $\theta_t = -\varepsilon_{nt}$ for two or more $t$ are
also included in the simulation study. The construction of these configurations requires a
laborious design plan.
Design plan:
The income groups for both populations are obtained using the Singh-Maddala distribution.
All the configurations in the present scenario are associated with intersecting Lorenz
curves. Therefore, the parameter values for the two populations are chosen in such a way
that the corresponding population Lorenz curves intersect.
Let $(b_i, c_i)$ be the parameters corresponding to the $i$th population, $i = 1, 2$. Then the two
populations' Lorenz curves intersect if any one of the following occurs
However, $\delta_n$ is obtained as in the previous section, using Proposition 1.7.1. It is
calculated with respect to population 1 by choosing small $\eta$ values between 1.01 and
1.05. The margin $\varepsilon_n$ is set in accordance with the configurations under
consideration. A sample size of 5000 is used for this part of the simulation study. The
large sample size and the small value of $\eta$ are chosen in line with the theory. The
number of bootstrap repetitions and the number of generated data sets are kept the same as before.
Results:
The results of the study are reported in Tables 1.11.5 to 1.11.8. Parameter values for the
populations are reported under the column heading $(b_i, c_i)$ in the tables. Clearly, the
estimated levels of the bootstrap test are much lower than the nominal level. Specifically,
the rejection rates for configurations with fewer $\theta_t$'s larger than $\delta_t$'s are
substantially lower than the significance level. Rejection rates for configurations
featuring two or fewer such points are between 0.01 and 0.02, and the rate increases up to
0.03 for configurations with more than two such points.
The empirical level of the test is also influenced by the magnitude of the difference between
$\theta$ and $\delta_n$. The rejection rates are between 0.01 and 0.03 when the differences
between the $\theta_j$'s and the $\delta_{nj}$'s are not substantial. However, the rejection
rate approaches the nominal level when the difference between $\theta$ and $\delta_n$ is
large, in particular for configurations with $\theta_j$ greater than five times $\delta_{nj}$.
Overall, the error rates do not exceed the nominal level for any of the configurations studied.
Alongside grouped income data, continuous income data are also encountered quite
frequently in economic studies (Bhattacharya et al. (2007)). Continuous income
data are generated from the Singh-Maddala distribution for both populations. Simulations
are conducted using the same Singh-Maddala distributions that are used in the
study of discrete data. The comparison margins of the test, $\varepsilon_n$ and $\delta_n$,
are also taken to be the same as before. All the simulation results are summarized in
Tables 1.11.9 and 1.11.10.
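Continuous incomes of this kind can be drawn by inverting the distribution function. The sketch below assumes the parametrization F(x) = 1 − (1 + (x/scale)^b)^{−c}, which is consistent with the Lorenz ordinate in (3); the function name `sample_singh_maddala`, the `scale` and `seed` arguments and the parameter values are illustrative assumptions rather than the settings used in the thesis.

```python
import random


def sample_singh_maddala(n, b, c, scale=1.0, seed=None):
    """Draw n incomes from F(x) = 1 - (1 + (x/scale)**b)**(-c) by inversion."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        u = rng.random()  # uniform on [0, 1)
        # Solve F(x) = u for x: x = scale * ((1 - u)**(-1/c) - 1)**(1/b).
        draws.append(scale * ((1.0 - u) ** (-1.0 / c) - 1.0) ** (1.0 / b))
    return draws


# Hypothetical parameter values; the sample size of 5000 mirrors the text.
incomes = sample_singh_maddala(5000, b=2.0, c=1.5, seed=1)
```

Fixing the seed makes a simulation run reproducible, which is convenient when comparing rejection rates across configurations.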
Simulation results corresponding to $\theta = \delta_n$ are reported in Table 1.11.9. In
general, the results are fairly consistent with the theory. The empirical levels of the
tests are close to the nominal level. Apart from a few isolated cases, the rejection rates
are between 0.045 and 0.05. Overall, the results are very similar to those for the discrete
income distribution.
Simulation results corresponding to the other null configurations are reported in Table
1.11.10. All the results here are close to their discrete counterparts. Overall, the
simulation results support the theory for both kinds of data.
1.9 Conclusion