QÜESTIIÓ, Vol. 14, n° 1,2,3, pp. 11-25, 1990

AN INDEX OF DIVERSITY IN STRATIFIED RANDOM SAMPLING BASED ON THE HYPOENTROPY MEASURE

L. PARDO, D. MORALES
Universidad Complutense de Madrid

We consider a measure of the diversity of a population based on the $\lambda$-measure of hypoentropy introduced by Ferreri (1980). Our purpose is to study its asymptotic distribution in stratified sampling and its application to hypothesis testing. A numerical example based on real data is given.

Keywords: Asymptotic normality, Diversity, Hypoentropy measure, Estimation, Testing hypothesis, Stratified sampling.

A.M.S. subject classification: 62B10, 94A15.

L. Pardo, D. Morales, Departamento de Estadística e I.O., Facultad de Matemáticas, Universidad Complutense de Madrid, 28040 Madrid (SPAIN). This work was partially supported by the Dirección General de Investigación Científica y Técnica (DGICYT) under the contract PS89-0019. Article received in November 1990.

1. INTRODUCTION

When the observations from a population are classified according to several categories, the uncertainty of the population may be quantified by means of several measures from Information Theory. The diversity of the population is intuitively understood as a measure of the average variability of the classes in it, based on the number of classes and their relative frequencies.

Consider a finite population of $N$ individuals which is classified according to a classification process or factor $X$ into $M$ classes or species $x_1,\ldots,x_M$. We denote by $\mathcal{X}$ the set of all categories or classes, $\mathcal{X} = \{x_1,\ldots,x_M\}$. Let

$$\Delta_M = \left\{P = (p_1,\ldots,p_M) \,/\, p_i \ge 0,\ \sum_{i=1}^{M} p_i = 1\right\}$$

be the set of all complete finite discrete probability distributions on the measurable space $(\mathcal{X}, \beta_{\mathcal{X}})$. Rao (1982) established that a measure of diversity is a function $\Phi: \Delta_M \to \mathbb{R}$ satisfying the following conditions:

i) $\Phi(P) \ge 0\ \ \forall P \in \Delta_M$, and $\Phi(P) = 0$ iff $P$ is degenerate.
ii) $\Phi$ is a concave function of $P$ in $\Delta_M$.

We shall refer to $\Phi(P)$ as the diversity measure within a probability space $(\mathcal{X}, \beta_{\mathcal{X}}, P)$.
Condition (i) is a natural one, since a measure of diversity should be nonnegative and take the value zero when all individuals of a population are identical, i.e., when the associated probability measure is concentrated at a particular point of $\mathcal{X}$. Condition (ii) is motivated by the consideration that the diversity in a mixture of populations should not be smaller than the average of the diversities within the individual populations. Consequently, the entropy measures can be used as diversity indices.

Ferreri (1980) introduced a generalization of Shannon's entropy called hypoentropy; the expression of this measure of uncertainty for a discrete probability distribution $P \in \Delta_M$ is given by

$$\phi_\lambda(P) = \left(1 + \frac{1}{\lambda}\right)\log(1+\lambda) - \frac{1}{\lambda}\sum_{i=1}^{M}(1+\lambda p_i)\log(1+\lambda p_i), \qquad \lambda > 0,$$

where

$$\lim_{\lambda\to\infty}\phi_\lambda(P) = H(P) = -\sum_{i=1}^{M} p_i \log p_i .$$

Assume that a sample of $n$ members is drawn by simple random sampling. If we consider the estimate $\hat\phi_\lambda$, obtained by replacing the $p_i$'s by the observed proportions $\hat p_i$, $i=1,\ldots,M$, then

$$n^{1/2}\left(\hat\phi_\lambda - \phi_\lambda(p_1,p_2,\ldots,p_M)\right) \xrightarrow[n\uparrow\infty]{} N(0,\sigma_\lambda^2),$$

where

$$\sigma_\lambda^2 = \sum_{i=1}^{M} p_i\log^2(1+\lambda p_i) - \left(\sum_{i=1}^{M} p_i\log(1+\lambda p_i)\right)^2,$$

which has been used for testing several hypotheses (ref. Morales et al., 1990).

Now we suppose that the population can be divided into $r$ non-overlapping subpopulations, called strata, as homogeneous as possible with respect to the diversity associated with $X$. Let $N_k$ be the number of individuals in the $k$th stratum (so that $\sum_{k=1}^{r} N_k = N$), and let $p_{ik}$ denote the probability that a randomly selected individual belongs to the $k$th stratum and to the class $x_i$ $(i=1,\ldots,M,\ k=1,\ldots,r)$. Thus,

$$\sum_{i=1}^{M} p_{ik} = N_k/N, \qquad \sum_{k=1}^{r}\sum_{i=1}^{M} p_{ik} = 1 .$$

Let $p_{i.}$ be the probability that a randomly selected individual in the whole population belongs to the class $x_i$ ($p_{i.} = \sum_{k=1}^{r} p_{ik}$, $i=1,\ldots,M$). Then the hypoentropy population diversity associated with $X$ is given by

$$\phi_\lambda(X) = \left(1+\frac{1}{\lambda}\right)\log(1+\lambda) - \frac{1}{\lambda}\sum_{i=1}^{M}(1+\lambda p_{i.})\log(1+\lambda p_{i.})$$

$$= \left(1+\frac{1}{\lambda}\right)\log(1+\lambda) - \frac{1}{\lambda}\sum_{i=1}^{M}\left(1+\lambda\sum_{k=1}^{r}p_{ik}\right)\log\left(1+\lambda\sum_{k=1}^{r}p_{ik}\right), \qquad \lambda>0 .$$

Assume that a stratified sample of size $n$ is drawn at random from the population, independently in the different strata.
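As a minimal sketch of the definitions above, the following Python functions evaluate the hypoentropy $\phi_\lambda(P)$ and the asymptotic variance $\sigma_\lambda^2$ under simple random sampling. The function names and the example distribution are illustrative assumptions and do not come from the paper.

```python
import math


def hypoentropy(p, lam):
    """Hypoentropy phi_lambda(P) of a probability vector p, for lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    head = (1.0 + 1.0 / lam) * math.log(1.0 + lam)
    tail = sum((1.0 + lam * pi) * math.log(1.0 + lam * pi) for pi in p) / lam
    return head - tail


def srs_variance(p, lam):
    """Asymptotic variance sigma_lambda^2 under simple random sampling."""
    logs = [math.log(1.0 + lam * pi) for pi in p]
    mean = sum(pi * li for pi, li in zip(p, logs))
    second = sum(pi * li * li for pi, li in zip(p, logs))
    return second - mean * mean


def shannon(p):
    """Shannon entropy H(P), the limit of the hypoentropy as lam grows."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # hypothetical class probabilities
    for lam in (1.0, 10.0, 1e6):
        print(lam, hypoentropy(p, lam), srs_variance(p, lam))
    print("Shannon:", shannon(p))
```

For a degenerate distribution the function returns zero (up to rounding), in agreement with condition (i), and for large `lam` the value approaches the Shannon entropy.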
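We hereafter suppose that the sample is chosen by proportional allocation in each stratum. Assume that a sample of size $n_k$ is drawn independently at random with replacement from the $k$th stratum, where $n_k/n = N_k/N$. If $f_{ik}$ denotes the relative frequency of individuals belonging to the class $x_i$ and to the $k$th stratum (and, hence, $\sum_{i=1}^{M} f_{ik} = n_k/n$), and $f_{i.} = \sum_{k=1}^{r} f_{ik}$, then the sample diversity with respect to the classification process or factor $X$ can be quantified by means of the analogue estimate, the hypoentropy sample diversity $\hat\phi_\lambda^s = \hat\phi_\lambda(X)$. Following the ideas in M.A. Gil (1989), we study in this paper the asymptotic behavior of the hypoentropy sample diversity $\hat\phi_\lambda^s$.

2. ASYMPTOTIC DISTRIBUTION OF $\hat\phi_\lambda^s$

In this section, we state a general result concerning the asymptotic behavior of the hypoentropy sample diversity, $\hat\phi_\lambda^s$, in stratified random sampling with proportional allocation, with replacement in each stratum and independence among different strata. The following theorem establishes the asymptotic behavior of $\hat\phi_\lambda^s$.

Theorem 1

If we consider the estimate $\hat\phi_\lambda^s$, then

$$n^{1/2}\left(\hat\phi_\lambda^s - \phi_\lambda(X)\right) \xrightarrow[n\uparrow\infty]{} N(0,\sigma_{\lambda,s}^2),$$

where

$$\sigma_{\lambda,s}^2 = \frac{1}{N}\sum_{k=1}^{r} N_k\left(\sum_{i=1}^{M}\frac{N}{N_k}\,p_{ik}\,t_{i.}^2 - \left(\sum_{i=1}^{M}\frac{N}{N_k}\,p_{ik}\,t_{i.}\right)^2\right)$$

and $t_{i.} = -\log(1+\lambda p_{i.}) - 1$.

Proof

Let us define $G_\lambda(f^*) = \hat\phi_\lambda^s(f)$ $(= \hat\phi_\lambda(X))$, where $f^* = (f_{11},\ldots,f_{M-1,1},\ldots,f_{1r},\ldots,f_{M-1,r})$ and $f = (f_{11},\ldots,f_{M1},\ldots,f_{1r},\ldots,f_{Mr})$. Now we consider the Taylor expansion of $G_\lambda(f^*)$,

$$G_\lambda(f^*) = G_\lambda(p^*) + \sum_{i=1}^{M-1}\sum_{k=1}^{r}\frac{\partial G_\lambda(p^*)}{\partial f_{ik}}(f_{ik}-p_{ik}) + R_n,$$

in a neighbourhood of $p^* = (p_{11},\ldots,p_{M-1,1},\ldots,p_{1r},\ldots,p_{M-1,r})$, where $G_\lambda(p^*) = \phi_\lambda(p)$ $(\phi_\lambda(p) = \phi_\lambda(X))$, $p = (p_{11},\ldots,p_{M1},\ldots,p_{1r},\ldots,p_{Mr})$, and $R_n$ is the Lagrange remainder. As

$$G_\lambda(f^*) = \left(1+\frac{1}{\lambda}\right)\log(1+\lambda) - \frac{1}{\lambda}\sum_{i=1}^{M-1}\left(1+\lambda\sum_{k=1}^{r}f_{ik}\right)\log\left(1+\lambda\sum_{k=1}^{r}f_{ik}\right)$$

$$\qquad - \frac{1}{\lambda}\left(1+\lambda\left(1-\sum_{i=1}^{M-1}\sum_{k=1}^{r}f_{ik}\right)\right)\log\left(1+\lambda\left(1-\sum_{i=1}^{M-1}\sum_{k=1}^{r}f_{ik}\right)\right),$$

we have

$$\frac{\partial G_\lambda(p^*)}{\partial f_{ik}} = -\left(\log(1+\lambda p_{i.})+1\right) + \left(\log(1+\lambda p_{M.})+1\right).$$

Then,

$$G_\lambda(f^*) = G_\lambda(p^*) + \sum_{i=1}^{M-1}\left(-\log(1+\lambda p_{i.})-1\right)(f_{i.}-p_{i.}) + \sum_{i=1}^{M-1}\left(\log(1+\lambda p_{M.})+1\right)(f_{i.}-p_{i.}) + R_n$$

$$= G_\lambda(p^*) + \sum_{i=1}^{M-1}\left(-\log(1+\lambda p_{i.})-1\right)(f_{i.}-p_{i.}) + \left(\log(1+\lambda p_{M.})+1\right)\left((1-f_{M.})-(1-p_{M.})\right) + R_n$$

$$= G_\lambda(p^*) + \sum_{i=1}^{M}\left(-\log(1+\lambda p_{i.})-1\right)(f_{i.}-p_{i.}) + R_n .$$

Therefore, as $G_\lambda(p^*) = \phi_\lambda(X)$ and $G_\lambda(f^*) = \hat\phi_\lambda^s$, it follows that the random variables

$$n^{1/2}\left(\hat\phi_\lambda^s - \phi_\lambda(X)\right) \quad\text{and}\quad n^{1/2}\sum_{i=1}^{M}\left(-\log(1+\lambda p_{i.})-1\right)(f_{i.}-p_{i.})$$

converge in law to the same distribution, because $n^{1/2}R_n$ converges in probability to zero. From now on, we denote $t_{i.} = -\log(1+\lambda p_{i.}) - 1$.

The random vectors $(nf_{1k},\ldots,nf_{Mk})$, $k=1,\ldots,r$, are independent and distributed as multinomial distributions with parameters $\left(n_k;\ p_{1k}N/N_k,\ldots,p_{Mk}N/N_k\right)$, respectively. Applying the $(M-1)$-dimensional Central Limit Theorem, we obtain

$$n_k^{1/2}\left(\frac{nf_{1k}}{n_k}-\frac{Np_{1k}}{N_k},\ \ldots,\ \frac{nf_{M-1,k}}{n_k}-\frac{Np_{M-1,k}}{N_k}\right) \xrightarrow[n_k\uparrow\infty]{} N(0,\Sigma(k)), \qquad k=1,\ldots,r,$$

where

$$\Sigma(k) = \frac{N}{N_k}\begin{pmatrix} p_{1k} & & 0\\ & \ddots & \\ 0 & & p_{M-1,k}\end{pmatrix} - P(k)P(k)', \qquad k=1,\ldots,r,$$

and

$$P(k)' = \frac{N}{N_k}\left(p_{1k},\ldots,p_{M-1,k}\right),$$

i.e.,

$$n_k^{1/2}\,\frac{N}{N_k}\left(f_{1k}-p_{1k},\ \ldots,\ f_{M-1,k}-p_{M-1,k}\right) \xrightarrow[n_k\uparrow\infty]{} N(0,\Sigma(k)), \qquad k=1,\ldots,r.$$

As $n_k^{1/2} = n^{1/2}\left(N_k/N\right)^{1/2}$, we have

$$n^{1/2}\left(f_{1k}-p_{1k},\ \ldots,\ f_{M-1,k}-p_{M-1,k}\right) \xrightarrow[n\uparrow\infty]{} N\left(0,\frac{N_k}{N}\Sigma(k)\right).$$

Therefore

$$n^{1/2}\sum_{i=1}^{M-1}(t_{i.}-t_{M.})(f_{ik}-p_{ik}) \xrightarrow[n\uparrow\infty]{} N\left(0,\frac{N_k}{N}\,T\,\Sigma(k)\,T'\right)$$

with $T = (t_{1.}-t_{M.},\ldots,t_{M-1.}-t_{M.})$. As $\sum_{i=1}^{M}(f_{ik}-p_{ik}) = n_k/n - N_k/N = 0$,

$$\sum_{i=1}^{M-1}(t_{i.}-t_{M.})(f_{ik}-p_{ik}) = \sum_{i=1}^{M}t_{i.}(f_{ik}-p_{ik}),$$

and, by the independence among strata, we have

$$n^{1/2}\sum_{k=1}^{r}\sum_{i=1}^{M}t_{i.}(f_{ik}-p_{ik}) = n^{1/2}\sum_{i=1}^{M}t_{i.}(f_{i.}-p_{i.}) \xrightarrow[n\uparrow\infty]{} N\left(0,\sum_{k=1}^{r}\frac{N_k}{N}\,T\,\Sigma(k)\,T'\right).$$

Therefore,

$$\sigma_{\lambda,s}^2 = \sum_{k=1}^{r}\frac{N_k}{N}\,T\,\Sigma(k)\,T' = \frac{1}{N}\sum_{k=1}^{r}N_k\left(\sum_{i=1}^{M}\frac{N}{N_k}\,p_{ik}\,t_{i.}^2 - \left(\sum_{i=1}^{M}\frac{N}{N_k}\,p_{ik}\,t_{i.}\right)^2\right).$$

3. APPLICATIONS ON TESTING HYPOTHESIS

In stratified random sampling with proportional allocation, with replacement in each stratum and independence among different strata, the hypoentropy can be used for testing the following hypotheses:

i) $H_0: \phi_\lambda(X) = D_0$ against $H_1: \phi_\lambda(X) > D_0$. Under $H_0$, the statistic

$$Z = \frac{n^{1/2}\left(\hat\phi_\lambda^s - D_0\right)}{\hat\sigma_{\lambda,s}},$$

with $\hat\sigma_{\lambda,s}$ the analogue estimate of $\sigma_{\lambda,s}$, has approximately a standard normal distribution for sufficiently large $n$. Clearly, we reject $H_0$ at level $\alpha$ if $Z > z_\alpha$, where $z_\alpha$ is such that $P(Z > z_\alpha) = \alpha$. Similar arguments may be applied in the remaining cases, i.e., $H_1: \phi_\lambda(X) < D_0$ and $H_1: \phi_\lambda(X) \ne D_0$.

On the basis of this result we can obtain a method to test the null hypothesis

$$H_0: p_{1.} = p_{2.} = \cdots = p_{M.} = 1/M .$$

Testing this hypothesis is equivalent to testing

$$H_0: \phi_\lambda(X) = \phi_\lambda^{MAX},$$

where

$$\phi_\lambda^{MAX} = \left(1+\frac{1}{\lambda}\right)\log(1+\lambda) - \left(1+\frac{M}{\lambda}\right)\log\left(1+\frac{\lambda}{M}\right).$$

In this case we reject $H_0$ at the level $\alpha$ if

$$\hat\phi_\lambda^s + \frac{z_{\alpha/2}\,\hat\sigma_{\lambda,s}}{n^{1/2}} < \phi_\lambda^{MAX} \quad\text{or}\quad \hat\phi_\lambda^s - \frac{z_{\alpha/2}\,\hat\sigma_{\lambda,s}}{n^{1/2}} > \phi_\lambda^{MAX} .$$

Since in the non-null case $n^{1/2}\left[\hat\phi_\lambda^s - \phi_\lambda(P)\right] \to N\left(0,\sigma_{\lambda,s}^2(P)\right)$ as $n\uparrow\infty$, the asymptotic power function in $P = (p_1, p_2,$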
$\ldots, p_M)$ is given by

$$\beta_n(P) = 1 - \Psi\left(z_{\alpha/2} + \frac{n^{1/2}\left(\phi_\lambda^{MAX}-\phi_\lambda(P)\right)}{\sigma_{\lambda,s}(P)}\right) + \Psi\left(-z_{\alpha/2} + \frac{n^{1/2}\left(\phi_\lambda^{MAX}-\phi_\lambda(P)\right)}{\sigma_{\lambda,s}(P)}\right),$$

where $\Psi(x)$ denotes $P(X < x)$ when $X$ is normally distributed with mean 0 and variance 1.

ii) $H_0: D_1 = D_2$ (the diversities of two independent populations are equal) against one-sided or two-sided alternatives. Under $H_0$, the statistic

$$Z = \frac{(n_1 n_2)^{1/2}\left(\hat\phi_{\lambda,1}^s - \hat\phi_{\lambda,2}^s\right)}{\left(n_2\,\hat\sigma_{\lambda,s,1}^2 + n_1\,\hat\sigma_{\lambda,s,2}^2\right)^{1/2}}$$

has approximately a standard normal distribution, where the subscript $i$ denotes population $i$ and $n_i$ denotes the sample size in population $i$ $(i=1,2)$.

Remark

From Theorem 1, an approximate $1-\alpha$ level confidence interval for $\phi_\lambda(X)$ is given by

$$\left(\hat\phi_\lambda^s - \frac{\hat\sigma_{\lambda,s}\,z_{\alpha/2}}{n^{1/2}},\ \ \hat\phi_\lambda^s + \frac{\hat\sigma_{\lambda,s}\,z_{\alpha/2}}{n^{1/2}}\right);$$

furthermore, the minimum sample size guaranteeing a specified limit of error $\epsilon$ with a small risk $\alpha$ is the smallest integer $n$ such that

$$n \ge \frac{\hat\sigma_{\lambda,s}^2\,z_{\alpha/2}^2}{\epsilon^2}.$$

4. CONNECTIONS WITH THE RANDOM SAMPLING

Stratification provides a method of using supplemental information to obtain greater precision in our sample estimates. Auxiliary information may be used to divide the population into groups, called strata, such that the elements within each group are more alike than are the elements in the population as a whole. Then a sample is selected from each stratum, and the sample results from the different strata are pooled in order to arrive at an estimate for the whole. If there are large differences between the units in the different strata, the accuracy of the overall estimates will be substantially increased, as the strata will be represented in their correct proportions, whereas in a random sample these proportions will be subject to sampling errors.

We now formulate the comments above for the asymptotic variances as well as for their analogue estimates. The following theorem establishes that the asymptotic variance of the statistic $n^{1/2}\left(\hat\phi_\lambda^s - \phi_\lambda(X)\right)$ is smaller than or equal to the asymptotic variance of $n^{1/2}\left(\hat\phi_\lambda - \phi_\lambda(P)\right)$.

Theorem 2

It is verified that

$$\sigma_{\lambda,s}^2 \le \sigma_\lambda^2,$$

with equality if and only if $r=1$ or

$$\frac{N}{N_k}\sum_{i=1}^{M}\log(1+\lambda p_{i.})\,p_{ik}$$

does not depend on $k$ $(k=1,\ldots,r)$.

Proof

Since

$$\sigma_{\lambda,s}^2 = \frac{1}{N}\sum_{k=1}^{r} N_k\left(\sum_{i=1}^{M}\frac{N}{N_k}\,p_{ik}\,t_{i.}^2 - \left(\sum_{i=1}^{M}\frac{N}{N_k}\,p_{ik}\,t_{i.}\right)^2\right)$$

with $t_{i.} = -\log(1+\lambda p_{i.}) - 1$, we have
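$$\sigma_{\lambda,s}^2 = \frac{1}{N}\sum_{k=1}^{r}N_k\left(\sum_{i=1}^{M}\frac{N}{N_k}\,p_{ik}\left(\log(1+\lambda p_{i.})+1\right)^2 - \left(\sum_{i=1}^{M}\frac{N}{N_k}\,p_{ik}\left(\log(1+\lambda p_{i.})+1\right)\right)^2\right)$$

$$= \sum_{i=1}^{M}p_{i.}\log^2(1+\lambda p_{i.}) - \frac{1}{N}\sum_{k=1}^{r}N_k\left(\sum_{i=1}^{M}\frac{N}{N_k}\,p_{ik}\log(1+\lambda p_{i.})\right)^2$$

$$= \sigma_\lambda^2 + \left(\sum_{i=1}^{M}p_{i.}\log(1+\lambda p_{i.})\right)^2 - \frac{1}{N}\sum_{k=1}^{r}N_k\left(\sum_{i=1}^{M}\frac{N}{N_k}\,p_{ik}\log(1+\lambda p_{i.})\right)^2 .$$

Now we consider the random variable $Z$ taking the values $\sum_{i=1}^{M}\frac{N}{N_k}\,p_{ik}\log(1+\lambda p_{i.})$ with probabilities $N_k/N$, $k=1,\ldots,r$, respectively. Let us consider the convex function $\varphi(x) = x^2$; then, by using Jensen's inequality, we have

$$\left(\sum_{i=1}^{M}p_{i.}\log(1+\lambda p_{i.})\right)^2 \le \frac{1}{N}\sum_{k=1}^{r}N_k\left(\sum_{i=1}^{M}\frac{N}{N_k}\,p_{ik}\log(1+\lambda p_{i.})\right)^2 .$$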
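The comparison between the two asymptotic variances can be checked numerically. The sketch below assumes the simplified form of $\sigma_{\lambda,s}^2$ in which the constant $+1$ terms cancel, i.e. $\sum_i p_{i.}\log^2(1+\lambda p_{i.}) - \sum_k (N/N_k)\left(\sum_i p_{ik}\log(1+\lambda p_{i.})\right)^2$; the function names and the joint probability table are hypothetical and chosen only for illustration.

```python
import math


def marginals(p):
    """Class probabilities p_i. from the joint table p[k][i] (stratum k, class i)."""
    return [sum(col) for col in zip(*p)]


def srs_variance(q, lam):
    """Asymptotic variance under simple random sampling for class probabilities q."""
    logs = [math.log(1.0 + lam * qi) for qi in q]
    mean = sum(qi * li for qi, li in zip(q, logs))
    return sum(qi * li * li for qi, li in zip(q, logs)) - mean * mean


def stratified_variance(p, lam):
    """Asymptotic variance under stratified sampling with proportional allocation."""
    q = marginals(p)
    logs = [math.log(1.0 + lam * qi) for qi in q]
    total = sum(qi * li * li for qi, li in zip(q, logs))
    between = 0.0
    for row in p:
        weight = sum(row)  # stratum weight N_k / N
        between += sum(pik * li for pik, li in zip(row, logs)) ** 2 / weight
    return total - between


if __name__ == "__main__":
    # Hypothetical joint probabilities: 2 strata (rows) by 3 classes (columns).
    joint = [[0.30, 0.05, 0.05],
             [0.05, 0.25, 0.30]]
    lam = 2.0
    print("stratified:", stratified_variance(joint, lam))
    print("simple random:", srs_variance(marginals(joint), lam))
```

When every stratum has the same conditional class distribution, or when there is a single stratum, the two functions return the same value up to rounding, which corresponds to the equality case of the theorem.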
