QÜESTIIÓ, Vol. 14, n° 1,2,3, pp. 11-25, 1990
AN INDEX OF DIVERSITY IN
STRATIFIED RANDOM SAMPLING BASED
ON THE HYPOENTROPY MEASURE
L. PARDO, D. MORALES
Universidad Complutense de Madrid
We consider a measure of the diversity of a population based on the
$\lambda$-measure of hypoentropy introduced by Ferreri (1980). Our purpose
is to study its asymptotic distribution in stratified sampling and its
application to testing hypotheses. A numerical example based on real
data is given.
Keywords: Asymptotic normality, Diversity, Hypoentropy measure,
Estimation, Testing hypothesis, Stratified sampling.
A.M.S. subject classification: 62B10, 94A15.
1. INTRODUCTION
When the observations from a population are classified according to several
categories, the uncertainty of the population may be quantified by means of
several measures in Information Theory. The diversity of the population is
intuitively intended as a measure of the average variability of classes in it, based
on the number of classes and their relative frequencies.
L. Pardo, D. Morales, Departamento de Estadística e I.O., Facultad de Matemáticas,
Universidad Complutense de Madrid, 28040 Madrid (SPAIN).
This work was partially supported by the Dirección General de Investigación Científica y
Técnica (DGICYT) under the contract PS89-0019.
Article received in November 1990.
Consider a finite population of N individuals which is classified according to
a classification process or factor X into M classes or species $x_1,\dots,x_M$. We
denote by $\mathcal{X}$ the set of all categories or classes
$$\mathcal{X} = \{x_1,\dots,x_M\}.$$
Let $\Delta_M = \{P = (p_1,\dots,p_M) : p_i \ge 0,\ \sum_{i=1}^{M} p_i = 1\}$ be the set of all complete
finite discrete probability distributions on the measurable space $(\mathcal{X}, \beta_{\mathcal{X}})$.
Rao (1982) established that a measure of diversity is a function
$$\phi : \Delta_M \to \mathbb{R}$$
satisfying the following conditions:
i) $\phi(P) \ge 0$ for all $P \in \Delta_M$, and $\phi(P) = 0$ iff P is degenerate.
ii) $\phi$ is a concave function of P in $\Delta_M$.
We shall refer to $\phi(P)$ as the diversity measure within a probability space
$(\mathcal{X}, \beta_{\mathcal{X}}, P)$. Condition i) is a natural one, since a measure of diversity should
be nonnegative and take the value zero when all individuals of a population
are identical, i.e., when the associated probability measure is concentrated at a
particular point of $\mathcal{X}$. Condition ii) is motivated by the consideration that
the diversity in a mixture of populations should not be smaller than the average
of the diversities within the individual populations. Consequently, the entropy
measures can be used as diversity indices.
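Ferreri (1980) introduced a generalization of Shannon's entropy called
hypoentropy; the expression of this measure of uncertainty for a discrete
probability distribution $P \in \Delta_M$ is given by
$$\phi_\lambda(P) = \left(1+\frac{1}{\lambda}\right)\log(1+\lambda) - \frac{1}{\lambda}\sum_{i=1}^{M}(1+\lambda p_i)\log(1+\lambda p_i), \qquad \lambda > 0,$$
where
$$\lim_{\lambda\to\infty}\phi_\lambda(P) = H(P) = -\sum_{i=1}^{M} p_i\log p_i.$$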
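As a minimal numerical sketch of the definition above, the following Python functions evaluate the hypoentropy of a probability vector and compare it with Shannon's entropy for a large value of $\lambda$. The function names and the example distribution are illustrative assumptions, not part of the original paper.

```python
import math

def hypoentropy(p, lam):
    """Ferreri's hypoentropy of the probability vector p, for lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    # log1p(x) computes log(1 + x) accurately, also when x is 0.
    total = sum((1.0 + lam * pi) * math.log1p(lam * pi) for pi in p)
    return (1.0 + 1.0 / lam) * math.log1p(lam) - total / lam

def shannon(p):
    """Shannon entropy (natural logarithm), with 0 * log 0 taken as 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

p = [0.5, 0.3, 0.2]                       # illustrative distribution
print(hypoentropy(p, 1.0))                # diversity for lam = 1
print(hypoentropy([1.0, 0.0, 0.0], 1.0))  # degenerate case: zero up to rounding
print(hypoentropy(p, 1e8), shannon(p))    # large lam: close to Shannon's entropy
```

The degenerate case illustrates condition i), and the last line illustrates the limit relation numerically rather than proving it.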
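Assume that a sample of n members is drawn by random sampling. If we
consider the estimate $\hat\phi_\lambda$, obtained by replacing the $p_i$'s by the observed proportions
$f_i$, $i=1,\dots,M$, then
$$n^{1/2}\left(\hat\phi_\lambda - \phi_\lambda(p_1,\dots,p_M)\right) \xrightarrow[n\uparrow\infty]{L} N(0,\sigma_\lambda^2)$$
where
$$\sigma_\lambda^2 = \sum_{i=1}^{M} p_i\log^2(1+\lambda p_i) - \left(\sum_{i=1}^{M} p_i\log(1+\lambda p_i)\right)^2,$$
which has been used for testing several hypotheses (ref. Morales et al., 1990).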
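The asymptotic variance under random sampling can be evaluated directly from the class probabilities. The sketch below assumes the variance has the form "second moment minus squared first moment" of $\log(1+\lambda p_i)$ under $P$; the function name, the example probabilities and the sample size are illustrative assumptions.

```python
import math

def sigma2_simple(p, lam):
    """Asymptotic variance of n^(1/2) * (estimate - hypoentropy) under random sampling."""
    logs = [math.log1p(lam * pi) for pi in p]
    first = sum(pi * l for pi, l in zip(p, logs))        # sum p_i log(1 + lam p_i)
    second = sum(pi * l * l for pi, l in zip(p, logs))   # sum p_i log^2(1 + lam p_i)
    return second - first * first

p = [0.5, 0.3, 0.2]   # illustrative class probabilities
lam = 1.0
n = 400               # illustrative sample size
var = sigma2_simple(p, lam)
print("asymptotic variance:", var)
print("approximate standard error of the estimate:", math.sqrt(var / n))
```

Note that the variance vanishes for the uniform distribution, because $\log(1+\lambda p_i)$ is then the same for every class.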
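Now we suppose that the population can be divided into r non-overlapping
subpopulations, called strata, as homogeneous as possible with respect to the
diversity associated with X. Let $N_j$ be the number of individuals in the jth
stratum (so that $\sum_{j=1}^{r} N_j = N$), and let $p_{ik}$ denote the probability that
a randomly selected individual belongs to the kth stratum and to the class
$x_i$ ($i=1,\dots,M$, $k=1,\dots,r$). Thus, $\sum_{i=1}^{M} p_{ik} = N_k/N$ and $\sum_{i=1}^{M}\sum_{j=1}^{r} p_{ij} = 1$.
Let $p_{i.}$ be the probability that a randomly selected individual in the whole
population will belong to the class $x_i$ ($p_{i.} = \sum_{k=1}^{r} p_{ik}$, $i=1,\dots,M$). Then the
hypoentropy population diversity associated with X is given by
$$\phi_\lambda(X) = \left(1+\frac{1}{\lambda}\right)\log(1+\lambda) - \frac{1}{\lambda}\sum_{i=1}^{M}(1+\lambda p_{i.})\log(1+\lambda p_{i.})$$
$$= \left(1+\frac{1}{\lambda}\right)\log(1+\lambda) - \frac{1}{\lambda}\sum_{i=1}^{M}\left(1+\lambda\sum_{k=1}^{r} p_{ik}\right)\log\left(1+\lambda\sum_{k=1}^{r} p_{ik}\right), \qquad \lambda > 0.$$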
Assume that a stratified sample of size n is drawn at random from the
population, independently in different strata. We hereafter suppose that the sample
is chosen by proportional allocation in each stratum. Assume that a sample of
size $n_k$ is drawn independently at random with replacement from the kth
stratum, where $n_k/n = N_k/N$. If $f_{ik}$ denotes the relative frequency of individuals
belonging to the class $x_i$ and to the kth stratum (and, hence, $\sum_{i=1}^{M} f_{ik} = n_k/n$),
and $f_{i.} = \sum_{k=1}^{r} f_{ik}$, then the sample diversity with respect to the classification
process or factor X could be quantified by means of the analogue estimate,
the hypoentropy sample diversity, $\hat\phi^s_\lambda = \hat\phi_\lambda(X)$, that is, $\phi_\lambda(X)$ with each $p_{i.}$
replaced by $f_{i.}$. Following the ideas in M.A. Gil (1989), we will study in this
paper the asymptotic behavior of the hypoentropy sample diversity $\hat\phi^s_\lambda$.
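The sample diversity can be computed from a table of counts by class and stratum. The following is a minimal sketch under the assumption that the estimate is the hypoentropy evaluated at the pooled class frequencies $f_{i.}$; the helper names, the rounding used for the allocation and the counts are hypothetical.

```python
import math

def hypoentropy(p, lam):
    """Hypoentropy of the probability vector p, for lam > 0."""
    total = sum((1.0 + lam * pi) * math.log1p(lam * pi) for pi in p)
    return (1.0 + 1.0 / lam) * math.log1p(lam) - total / lam

def proportional_allocation(stratum_sizes, n):
    """Stratum sample sizes n_k = n * N_k / N, rounded to integers (an assumption)."""
    total = sum(stratum_sizes)
    return [round(n * size / total) for size in stratum_sizes]

def sample_diversity(counts, lam):
    """counts[i][k] = number of sampled individuals in class i and stratum k."""
    n = sum(sum(row) for row in counts)
    pooled = [sum(row) / n for row in counts]   # f_i. = sum over k of f_ik
    return hypoentropy(pooled, lam)

# Hypothetical population with two strata of 600 and 400 individuals.
print(proportional_allocation([600, 400], 100))

# Hypothetical counts: rows are classes, columns are strata (60 and 40 draws).
counts = [[30, 10],
          [20, 20],
          [10, 10]]
print(sample_diversity(counts, lam=1.0))
```

Under proportional allocation the pooled frequencies need no reweighting, which is why the estimate depends on the counts only through the row totals.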
2. ASYMPTOTIC DISTRIBUTION OF $\hat\phi^s_\lambda$
In this section, we state a general result concerning the asymptotic behavior
of the hypoentropy sample diversity, $\hat\phi^s_\lambda$, in stratified random sampling with
proportional allocation, with replacement in each stratum and independence
among different strata.
The following theorem establishes the asymptotic behavior of $\hat\phi^s_\lambda$.
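Theorem 1
If we consider the estimate $\hat\phi^s_\lambda$, then
$$n^{1/2}\left(\hat\phi^s_\lambda - \phi_\lambda(X)\right) \xrightarrow[n\uparrow\infty]{L} N(0,\sigma^2_{\lambda,s})$$
where
$$\sigma^2_{\lambda,s} = \sum_{k=1}^{r}\frac{N_k}{N}\left[\sum_{i=1}^{M} t_{i.}^2\,\frac{N}{N_k}\,p_{ik} - \left(\sum_{i=1}^{M} t_{i.}\,\frac{N}{N_k}\,p_{ik}\right)^2\right]$$
and $t_{i.} = -\log(1+\lambda p_{i.}) - 1$.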
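Proof
Let us define
$$G^s_\lambda(f^*) = \phi^s_\lambda(f) \quad (= \hat\phi^s_\lambda),$$
where $f^* = (f_{11},\dots,f_{M1},\dots,f_{1r},\dots,f_{M-1,r})$ and
$f = (f_{11},\dots,f_{M1},\dots,f_{1r},\dots,f_{Mr})$. Now we consider the Taylor expansion of $G^s_\lambda(f^*)$
$$G^s_\lambda(f^*) = G^s_\lambda(p^*) + \sum_{i=1}^{M}\ \sum_{\substack{k=1 \\ (i,k)\neq(M,r)}}^{r} \frac{\partial G^s_\lambda(p^*)}{\partial p_{ik}}\,(f_{ik} - p_{ik}) + R_n$$
in a neighbourhood of $p^* = (p_{11},\dots,p_{M1},\dots,p_{1r},\dots,p_{M-1,r})$,
where
$G^s_\lambda(p^*) = \phi^s_\lambda(p)$ ($\phi^s_\lambda(p) = \phi_\lambda(X)$), $p = (p_{11},\dots,p_{M1},\dots,p_{1r},\dots,p_{Mr})$,
and $R_n$ is the Lagrange remainder.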
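As
$$G^s_\lambda(p^*) = \left(1+\frac{1}{\lambda}\right)\log(1+\lambda) - \frac{1}{\lambda}\sum_{i=1}^{M-1}\left(1+\lambda\sum_{k=1}^{r}p_{ik}\right)\log\left(1+\lambda\sum_{k=1}^{r}p_{ik}\right)$$
$$\qquad - \frac{1}{\lambda}\left(1+\lambda\left(1-\sum_{i=1}^{M-1}\sum_{k=1}^{r}p_{ik}\right)\right)\log\left(1+\lambda\left(1-\sum_{i=1}^{M-1}\sum_{k=1}^{r}p_{ik}\right)\right),$$
we have, for $i=1,\dots,M-1$ and $k=1,\dots,r$,
$$\frac{\partial G^s_\lambda(p^*)}{\partial p_{ik}} = -\left(\log(1+\lambda p_{i.})+1\right) + \left(\log(1+\lambda p_{M.})+1\right).$$
Then,
$$G^s_\lambda(f^*) = G^s_\lambda(p^*) + \sum_{i=1}^{M-1}\left(-\log(1+\lambda p_{i.})-1\right)(f_{i.}-p_{i.}) + \sum_{i=1}^{M-1}\left(\log(1+\lambda p_{M.})+1\right)(f_{i.}-p_{i.}) + R_n$$
$$= G^s_\lambda(p^*) + \sum_{i=1}^{M-1}\left(-\log(1+\lambda p_{i.})-1\right)(f_{i.}-p_{i.}) + \left(\log(1+\lambda p_{M.})+1\right)\left(1-f_{M.}-(1-p_{M.})\right) + R_n$$
$$= G^s_\lambda(p^*) + \sum_{i=1}^{M}\left(-\log(1+\lambda p_{i.})-1\right)(f_{i.}-p_{i.}) + R_n.$$
Therefore, as
$$G^s_\lambda(p^*) = \phi_\lambda(X) \quad\text{and}\quad G^s_\lambda(f^*) = \hat\phi^s_\lambda,$$
it follows that the random variables
$$n^{1/2}\left(\hat\phi^s_\lambda - \phi_\lambda(X)\right) \quad\text{and}\quad n^{1/2}\sum_{i=1}^{M}\left(-\log(1+\lambda p_{i.})-1\right)(f_{i.}-p_{i.})$$
converge in law to the same distribution, because $n^{1/2}R_n$ converges in probability to
zero. From now on, we denote $t_{i.} = -\log(1+\lambda p_{i.}) - 1$.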
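The random vectors
$$(n f_{1k},\dots,n f_{Mk}), \qquad k=1,\dots,r,$$
are independent and distributed as multinomial distributions with parameters
$\left(n_k;\ p_{1k}\frac{N}{N_k},\dots,p_{Mk}\frac{N}{N_k}\right)$, respectively.
Applying the $(M-1)$-dimensional Central Limit Theorem, we obtain
$$n_k^{1/2}\left(f_{1k}\frac{n}{n_k} - p_{1k}\frac{N}{N_k},\ \dots,\ f_{M-1,k}\frac{n}{n_k} - p_{M-1,k}\frac{N}{N_k}\right) \xrightarrow[n_k\uparrow\infty]{L} N(0,\Sigma(k)), \qquad k=1,\dots,r,$$
where
$$\Sigma(k) = \operatorname{diag}\left(p_{1k}\frac{N}{N_k},\dots,p_{M-1,k}\frac{N}{N_k}\right) - P(k)P(k)^t, \qquad k=1,\dots,r,$$
and
$$P(k)^t = \frac{N}{N_k}\,(p_{1k},\dots,p_{M-1,k}),$$
i.e., since $n/n_k = N/N_k$,
$$n_k^{1/2}\,\frac{N}{N_k}\left((f_{1k}-p_{1k}),\dots,(f_{M-1,k}-p_{M-1,k})\right) \xrightarrow[n_k\uparrow\infty]{L} N(0,\Sigma(k)), \qquad k=1,\dots,r.$$
As
$$n^{1/2} = n_k^{1/2}\left(\frac{N}{N_k}\right)^{1/2},$$
we have
$$n^{1/2}\left((f_{1k}-p_{1k}),\dots,(f_{M-1,k}-p_{M-1,k})\right) \xrightarrow[n_k\uparrow\infty]{L} N\left(0,\ \frac{N_k}{N}\,\Sigma(k)\right).$$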
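Therefore
$$n^{1/2}\sum_{i=1}^{M-1}(t_{i.}-t_{M.})(f_{ik}-p_{ik}) \xrightarrow[n\uparrow\infty]{L} N\left(0,\ \frac{N_k}{N}\,T^t\Sigma(k)T\right)$$
with
$$T^t = (t_{1.}-t_{M.},\ \dots,\ t_{M-1,.}-t_{M.}).$$
As $\sum_{i=1}^{M} f_{ik} = \sum_{i=1}^{M} p_{ik}$ under proportional allocation,
$$\sum_{i=1}^{M-1}(t_{i.}-t_{M.})(f_{ik}-p_{ik}) = \sum_{i=1}^{M} t_{i.}(f_{ik}-p_{ik}),$$
and we have
$$n^{1/2}\sum_{k=1}^{r}\sum_{i=1}^{M} t_{i.}(f_{ik}-p_{ik}) = n^{1/2}\sum_{i=1}^{M} t_{i.}(f_{i.}-p_{i.}).$$
Therefore, by the independence among strata,
$$\sigma^2_{\lambda,s} = \sum_{k=1}^{r}\frac{N_k}{N}\,T^t\Sigma(k)T = \sum_{k=1}^{r}\frac{N_k}{N}\left[\sum_{i=1}^{M} t_{i.}^2\,\frac{N}{N_k}\,p_{ik} - \left(\sum_{i=1}^{M} t_{i.}\,\frac{N}{N_k}\,p_{ik}\right)^2\right].$$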
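The stratified asymptotic variance can be computed from the joint probabilities $p_{ik}$. The sketch below assumes the variance is the sum over strata of $(N_k/N)$ times the within-stratum variance of $t_{i.} = -\log(1+\lambda p_{i.}) - 1$, with within-stratum class probabilities $p_{ik}N/N_k$; the function name and the example table are illustrative assumptions.

```python
import math

def sigma2_stratified(p, lam):
    """Asymptotic variance under stratified sampling with proportional allocation.

    p[i][k] is the joint probability of class i and stratum k.
    """
    n_classes, n_strata = len(p), len(p[0])
    marginals = [sum(row) for row in p]                      # p_i.
    t = [-math.log1p(lam * m) - 1.0 for m in marginals]      # t_i.
    variance = 0.0
    for k in range(n_strata):
        weight = sum(p[i][k] for i in range(n_classes))      # N_k / N
        q = [p[i][k] / weight for i in range(n_classes)]     # class probabilities within stratum k
        mean_t = sum(ti * qi for ti, qi in zip(t, q))
        mean_t2 = sum(ti * ti * qi for ti, qi in zip(t, q))
        variance += weight * (mean_t2 - mean_t ** 2)
    return variance

# Hypothetical joint probabilities: three classes (rows), two strata (columns).
p = [[0.30, 0.05],
     [0.20, 0.15],
     [0.10, 0.20]]
print(sigma2_stratified(p, lam=1.0))
```

With a single stratum the expression reduces to the variance of $\log(1+\lambda p_i)$ under $P$, since adding the constant 1 and changing the sign do not alter a variance.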
3. APPLICATIONS ON TESTING HYPOTHESIS
In stratified random sampling with proportional allocation, with replacement
in each stratum and independence among different strata, the hypoentropy
can be used for testing the following hypotheses:
i) $H_0: \phi_\lambda(X) = D_0$ against $H_1: \phi_\lambda(X) > D_0$. Under $H_0$, the statistic
$$Z = \frac{n^{1/2}\left(\hat\phi^s_\lambda - D_0\right)}{\hat\sigma_{\lambda,s}}$$
has approximately a standard normal distribution for sufficiently large n.
Clearly, we reject $H_0$ at level $\alpha$ if $Z > z_\alpha$, where $z_\alpha$ is such that, for a
standard normal variable, $P(Z > z_\alpha) = \alpha$. Similar arguments may be applied
in the remaining cases, i.e., $H_1: \phi_\lambda(X) < D_0$ and $H_1: \phi_\lambda(X) \neq D_0$.
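On the basis of this result we can obtain a method to test the null
hypothesis:
$$H_0: p_{1.} = p_{2.} = \dots = p_{M.} = 1/M.$$
This hypothesis is equivalent to testing
$$H_0: \phi_\lambda(X) = \phi_{\lambda,\mathrm{MAX}}$$
where
$$\phi_{\lambda,\mathrm{MAX}} = \left(1+\frac{1}{\lambda}\right)\log(1+\lambda) - \left(1+\frac{M}{\lambda}\right)\log\left(1+\frac{\lambda}{M}\right).$$
In this case we reject $H_0$ at the level $\alpha$ if
$$\hat\phi^s_\lambda < \phi_{\lambda,\mathrm{MAX}} - \frac{z_{\alpha/2}\,\hat\sigma_{\lambda,s}}{n^{1/2}} \quad\text{or}\quad \hat\phi^s_\lambda > \phi_{\lambda,\mathrm{MAX}} + \frac{z_{\alpha/2}\,\hat\sigma_{\lambda,s}}{n^{1/2}}.$$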
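Since in the non-null case
$$\frac{n^{1/2}\left[\hat\phi^s_\lambda - \phi_\lambda(P)\right]}{\sigma_{\lambda,s}(P)} \xrightarrow[n\uparrow\infty]{L} N(0,1),$$
the asymptotic power function at $P = (p_1,p_2,\dots,p_M)$ is given by
$$\beta_n(P) = 1 - \Psi\left(\frac{\phi_{\lambda,\mathrm{MAX}} - \phi_\lambda(P) + n^{-1/2}\,\sigma_{\lambda,s}\,z_{\alpha/2}}{n^{-1/2}\,\sigma_{\lambda,s}(P)}\right) + \Psi\left(\frac{\phi_{\lambda,\mathrm{MAX}} - \phi_\lambda(P) - n^{-1/2}\,\sigma_{\lambda,s}\,z_{\alpha/2}}{n^{-1/2}\,\sigma_{\lambda,s}(P)}\right)$$
where $\Psi(x)$ denotes $P(X \le x)$ when X is normally distributed with mean
0 and variance 1.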
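ii) $H_0: D_1 = D_2$ (diversities of two independent populations are equal)
against one-sided or two-sided alternatives. Now under $H_0$, the statistic
$$Z = \frac{(n_1 n_2)^{1/2}\left(\hat\phi^s_{\lambda,1} - \hat\phi^s_{\lambda,2}\right)}{\left(n_2\,\hat\sigma^2_{\lambda,s,1} + n_1\,\hat\sigma^2_{\lambda,s,2}\right)^{1/2}}$$
has approximately a standard normal distribution, where subscript i has
been used to denote population i and $n_i$ denotes the sample size in
population i ($i = 1,2$).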
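The tests in i) and ii) can be sketched in a few lines of Python. The sketch assumes the one-sample statistic is $n^{1/2}(\hat\phi^s_\lambda - D_0)/\hat\sigma_{\lambda,s}$ and the two-sample statistic is the difference of the two estimates divided by $(\hat\sigma_1^2/n_1 + \hat\sigma_2^2/n_2)^{1/2}$, the usual standardisation for two independent samples. All function names and numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()   # standard normal distribution (mean 0, variance 1)

def one_sample_z(phi_hat, d0, sigma_hat, n):
    """Statistic for H0: diversity = d0, from an estimate and its estimated sigma."""
    return math.sqrt(n) * (phi_hat - d0) / sigma_hat

def two_sample_z(phi1, phi2, sigma1, sigma2, n1, n2):
    """Statistic for H0: D1 = D2, for two independent populations."""
    return (phi1 - phi2) / math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

def reject(z, alpha=0.05, alternative="greater"):
    """Decision at level alpha, using the standard normal approximation."""
    if alternative == "greater":
        return z > STD_NORMAL.inv_cdf(1 - alpha)
    if alternative == "less":
        return z < -STD_NORMAL.inv_cdf(1 - alpha)
    if alternative == "two-sided":
        return abs(z) > STD_NORMAL.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

# Hypothetical inputs: estimate 0.60, D0 = 0.55, sigma estimate 0.2, n = 400.
z = one_sample_z(0.60, 0.55, 0.2, 400)
print(z, reject(z, alpha=0.05, alternative="greater"))

# Hypothetical comparison of two populations.
z2 = two_sample_z(0.60, 0.57, 0.20, 0.25, 400, 300)
print(z2, reject(z2, alpha=0.05, alternative="two-sided"))
```

The decision rule relies on the asymptotic normality of Theorem 1, so it should be read as an approximation for large samples.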
Remark
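From Theorem 1, an approximate $1-\alpha$ level confidence interval for $\phi_\lambda(X)$ is
given by
$$\left(\hat\phi^s_\lambda - \frac{\hat\sigma_{\lambda,s}\,z_{\alpha/2}}{n^{1/2}},\ \ \hat\phi^s_\lambda + \frac{\hat\sigma_{\lambda,s}\,z_{\alpha/2}}{n^{1/2}}\right);$$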
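furthermore, the minimum sample size guaranteeing a specified limit of error $\varepsilon$
with a small risk $\alpha$ is the smallest integer n whose interval half-width does not
exceed $\varepsilon$, that is,
$$n \ge \frac{\hat\sigma^2_{\lambda,s}\,z^2_{\alpha/2}}{\varepsilon^2}.$$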
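The interval and the sample size in the Remark can be sketched as follows. The sketch assumes the usual normal-approximation interval, estimate plus or minus $z_{\alpha/2}\hat\sigma_{\lambda,s}/n^{1/2}$, and takes the minimum sample size as the smallest n for which the half-width does not exceed the error limit. Function names and numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist

def confidence_interval(phi_hat, sigma_hat, n, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for the population diversity."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma_hat / math.sqrt(n)
    return phi_hat - half_width, phi_hat + half_width

def min_sample_size(sigma_hat, eps, alpha=0.05):
    """Smallest n with half-width z * sigma_hat / sqrt(n) not exceeding eps (assumed rule)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma_hat / eps) ** 2)

# Hypothetical inputs: estimate 0.60, sigma estimate 0.2, n = 400.
print(confidence_interval(0.60, 0.2, 400))
print(min_sample_size(0.2, eps=0.02))
```

In practice the sigma estimate would come from a pilot sample, since the required n depends on it.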
4. CONNECTIONS WITH THE RANDOM SAMPLING
Stratification provides a method of utilizing supplemental information to get
greater precision in our sample estimates. Auxiliary information may be used to
divide the population into groups, called strata, such that the elements within
each group are more alike than are the elements in the population as a whole.
Then a sample is selected from each stratum and the sample results from the
different strata are pooled in order to arrive at an estimate for the whole. If there
are large differences between the units in the different strata, the accuracy of
the overall estimates will be substantially increased, as strata will be represented
in their correct proportions, whereas in a random sample these proportions will
be subject to sampling errors. We are now going to formulate the comments
above for the asymptotic variances as well as for their analogue estimates.
The following theorem establishes that the asymptotic variance of the statistic
$n^{1/2}(\hat\phi^s_\lambda - \phi_\lambda(X))$ is no larger than the asymptotic variance of $n^{1/2}(\hat\phi_\lambda - \phi_\lambda(P))$.
Theorem 2
It is verified that
$$\sigma^2_{\lambda,s} \le \sigma^2_\lambda$$
with equality if and only if $r = 1$ or
$$\sum_{i=1}^{M}\log(1+\lambda p_{i.})\,\frac{N}{N_k}\,p_{ik}$$
does not depend on k ($k = 1,\dots,r$).
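The inequality can be checked numerically on a small example. The sketch below assumes the two variances have the forms used in this paper: the variance of $\log(1+\lambda p_{i.})$ under the marginal distribution for random sampling, and the weighted sum of within-stratum variances for stratified sampling. The function names and both probability tables are hypothetical.

```python
import math

def sigma2_simple(marginals, lam):
    """Asymptotic variance under random sampling, from the marginals p_i."""
    logs = [math.log1p(lam * m) for m in marginals]
    first = sum(m * l for m, l in zip(marginals, logs))
    second = sum(m * l * l for m, l in zip(marginals, logs))
    return second - first * first

def sigma2_stratified(p, lam):
    """Asymptotic variance under stratified sampling; p[i][k] are joint probabilities."""
    marginals = [sum(row) for row in p]
    logs = [math.log1p(lam * m) for m in marginals]
    variance = 0.0
    for k in range(len(p[0])):
        weight = sum(row[k] for row in p)          # N_k / N
        q = [row[k] / weight for row in p]         # class probabilities within stratum k
        mean = sum(l * qi for l, qi in zip(logs, q))
        mean_sq = sum(l * l * qi for l, qi in zip(logs, q))
        variance += weight * (mean_sq - mean ** 2)
    return variance

lam = 1.0

# Strata with different class compositions: stratification should help.
p_diff = [[0.30, 0.05], [0.20, 0.15], [0.10, 0.20]]
marg_diff = [sum(row) for row in p_diff]
print(sigma2_stratified(p_diff, lam), sigma2_simple(marg_diff, lam))

# Strata with identical class compositions: the two variances should agree.
p_same = [[0.30, 0.20], [0.18, 0.12], [0.12, 0.08]]
marg_same = [sum(row) for row in p_same]
print(sigma2_stratified(p_same, lam), sigma2_simple(marg_same, lam))
```

The second table is one instance of the equality condition, because the weighted sum of logarithms is then the same in every stratum.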
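Proof
Since
$$\sigma^2_{\lambda,s} = \sum_{k=1}^{r}\frac{N_k}{N}\left[\sum_{i=1}^{M} t_{i.}^2\,\frac{N}{N_k}\,p_{ik} - \left(\sum_{i=1}^{M} t_{i.}\,\frac{N}{N_k}\,p_{ik}\right)^2\right]$$
with $t_{i.} = -\log(1+\lambda p_{i.}) - 1$, we have
$$\sigma^2_{\lambda,s} = \frac{1}{N}\sum_{k=1}^{r} N_k\left(\sum_{i=1}^{M}\left(\log(1+\lambda p_{i.})+1\right)^2\frac{N}{N_k}\,p_{ik} - \left(\sum_{i=1}^{M} p_{ik}\,\frac{N}{N_k}\left(\log(1+\lambda p_{i.})+1\right)\right)^2\right)$$
$$= \sum_{i=1}^{M}\log^2(1+\lambda p_{i.})\,p_{i.} - \frac{1}{N}\sum_{k=1}^{r} N_k\left(\sum_{i=1}^{M}\frac{N}{N_k}\,p_{ik}\log(1+\lambda p_{i.})\right)^2$$
$$= \sum_{i=1}^{M}\log^2(1+\lambda p_{i.})\,p_{i.} - \sum_{k=1}^{r}\frac{N_k}{N}\left(\sum_{i=1}^{M}\frac{N}{N_k}\,p_{ik}\log(1+\lambda p_{i.})\right)^2.$$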
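Now we consider the random variable Z, taking the values
$$\sum_{i=1}^{M}\frac{N}{N_k}\,p_{ik}\log(1+\lambda p_{i.}), \qquad k=1,\dots,r,$$
with probabilities $N_k/N$, respectively. Let us consider the convex function
$$\varphi(x) = x^2;$$
then, by using Jensen's inequality, we have
$$\left(\sum_{i=1}^{M} p_{i.}\log(1+\lambda p_{i.})\right)^2 \le \frac{1}{N}\sum_{k=1}^{r} N_k\left(\sum_{i=1}^{M}\frac{N}{N_k}\,p_{ik}\log(1+\lambda p_{i.})\right)^2.$$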