Acta Appl Math (2007) 97: 79–97
DOI 10.1007/s10440-007-9134-x

On the Distance Between Some πps Sampling Designs

Anders Lundqvist

Published online: 5 April 2007
© Springer Science + Business Media B.V. 2007

Abstract  Asymptotic distances between probability distributions appearing in πps sampling theory are studied. The distributions are Poisson, Conditional Poisson (CP), Sampford, Pareto, Adjusted CP and Adjusted Pareto sampling. We start with the Kullback-Leibler divergence and the Hellinger distance and derive a simpler distance measure using a Taylor expansion of order two. This measure is evaluated first theoretically and then numerically, using small populations. The numerical examples are also illustrated using a multidimensional scaling technique called principal coordinate analysis (PCO). It turns out that Adjusted CP, Sampford, and Adjusted Pareto are quite close to each other. Pareto is a bit further away from these, then comes CP and finally Poisson, which is rather far from all the others.

Keywords  Asymptotic distance · Conditional Poisson sampling · Hellinger distance · Inclusion probabilities · Kullback-Leibler divergence · Pareto sampling · Principal coordinate analysis · Sampford sampling · Target probabilities

Mathematics Subject Classification (2000)  Primary 62D05 · Secondary 62E15

A. Lundqvist (✉)
Institutionen för Matematik och Matematisk Statistik, Umeå University, 901 87 Umeå, Sweden
e-mail: anders.lundqvist@math.umu.se

1 Introduction

This paper deals with problems related to πps sampling, where each of the $N$ units in the population is sampled with a target inclusion probability $p_i$, $i = 1, \ldots, N$. Usually the $p_i$'s are defined so that $\sum_{i=1}^{N} p_i = n$, where $n$ is the (desired) sample size. If each unit is selected independently of every other unit with probability $p_i$, and all possible samples regardless of sample size are accepted, the sampling design is denoted Poisson sampling.

This sampling design can be modified in different ways to make sure that a sample of size $n$ is obtained. We mention the modified designs, along with some references where they are described in more detail. The methods are Conditional Poisson (CP) sampling [7, 8, 12], Sampford (S) sampling [16], [8, pp. 85–87], Pareto (Par) sampling [14, 15, 19], Adjusted Conditional Poisson (CPA) sampling [11, 18], and Adjusted Pareto (ParA) sampling ([1, 2], see Appendix 5 for details on the adjustment).

We also need to define the term achieved inclusion probability. It is, in this paper, denoted by $\pi_i$, $i = 1, \ldots, N$, and is the factual probability for unit $i$ to be included in the selected sample. In general $\pi_i \ne p_i$, which is the case for e.g. CP or Pareto sampling. For CPA and ParA sampling the target probabilities are modified to give the desired $\pi_i$, although in neither case do we have $\pi_i = p_i$ exactly.

We also mention the inclusion indicators: let $I_i$ be a random variable such that

$$I_i = \begin{cases} 1 & \text{if unit } i \text{ belongs to the sample}, \\ 0 & \text{otherwise}. \end{cases}$$

Then the vector $\mathbf{I} = (I_1, I_2, \ldots, I_N)$ defines the random sample.
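As a concrete illustration of Poisson sampling and the indicator vector $\mathbf{I}$, here is a minimal sketch in Python; the population values and the function name are our own, purely illustrative:

```python
import numpy as np

def poisson_sample(p, rng):
    """Poisson sampling: unit i is selected independently with
    probability p[i]; the realized sample size is random."""
    return (rng.random(len(p)) < p).astype(int)   # indicator vector I

rng = np.random.default_rng(1)
p = np.array([0.2, 0.5, 0.8, 0.5])   # target probabilities, sum = n = 2
I = poisson_sample(p, rng)
print(I, "sample size:", I.sum())    # the size varies from draw to draw
```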
The problem we focus on is the following: we want to compare the asymptotic distances between the probability distributions that arise when using the mentioned πps sampling designs. We present the distributions in Sect. 2.1. Throughout this paper, asymptotic means that $d = \sum_{i=1}^{N} p_i(1 - p_i) \to \infty$. This in turn implies that both $N$ and $n$ also should tend to infinity, and that $(N - n) \to \infty$.

Some different distance measures are used. We start with two well-known measures, the Kullback-Leibler divergence and the Hellinger distance. When asymptotics are considered we end up with a third measure, called the Chi-Square distance, regardless of which of the other two measures we start with. The Hellinger distance is a metric, but the other two distances are not.

We make the comparisons since different sampling methods can be advocated for different reasons, e.g. maximum entropy (CP, CPA), correct inclusion probabilities (Sampford, CPA, ParA) or simplicity of implementation (Pareto). We are interested in how far apart these designs really are from each other, from a probabilistic viewpoint. Although this is a theoretically oriented paper, the results may also provide the practitioner with some interesting information.

When performing πps sampling we are usually interested in applying the well-known Horvitz-Thompson (HT) estimator [17, p. 43]. The HT estimator of a population total $Y$ is

$$\hat{Y}_{HT} = \sum_{i \in s} \frac{y_i}{\pi_i},$$

where $s$ denotes the sample. Among other things it is desirable to have a fixed sample size and also independent inclusion indicators. The contribution from a random sample size to the total estimator variance can be quite substantial, see e.g. Brewer [3, pp. 251–253]. If the inclusion indicators are independent we get a stable variance estimator for the HT estimator. These two properties are not simultaneously realizable. We feel that the best compromise is to make the inclusion indicators as independent as possible while keeping the sample size fixed. In Poisson sampling the inclusion indicators are independent, but we have a random sample size. Because of the independent inclusion indicators we are interested in comparing Poisson sampling with the fixed size sampling schemes.

The outline of the paper is as follows: In Sect. 2 we give the theoretical foundations of the paper. Then, in Sect. 3 we calculate distances between chosen pairs of distributions. We illustrate our theoretical results with some numerical examples in Sect. 4. We also make a graphical study using Multidimensional Scaling (MDS), using the same examples as in Sect. 4. This is done in Sect. 5. The paper is concluded with some final remarks in Sect. 6. There are also five appendices, where we consider some of the derivations in more detail.

2 Probability Functions and Distance Measures

Here the probability functions and the distance measures are presented. From now on, all products and sums are taken from 1 to $N$ unless otherwise stated.

2.1 The Probability Functions

Now we write down the distributions to be compared. These are the distributions arising from Poisson, Conditional Poisson (CP), Adjusted Conditional Poisson (CPA), Sampford, Pareto (Par) and Adjusted Pareto (ParA) sampling. They all give fixed sample size, except Poisson. The vector $\mathbf{x} = [x_1, x_2, \ldots, x_N]$ in the probability functions (pfs) should be interpreted in the following way: $f(\mathbf{x}) = P(\mathbf{I} = \mathbf{x})$, where $\mathbf{I} = [I_1, I_2, \ldots, I_N]$. We make the notation more compact by introducing some special quantities below. The target probabilities $p_i$ are used in all quantities. We assume that $\sum p_i = n$. The quantities are

$$q_i = 1 - p_i, \quad r_i = \frac{p_i}{q_i}, \quad d = \sum p_k q_k \quad \text{and} \quad b_i = p_i q_i \big(p_i - \tfrac{1}{2}\big). \tag{1}$$

The pfs follow. We begin with the Poisson pf,

$$f_P(\mathbf{x}) = \prod p_i^{x_i} q_i^{1 - x_i} = C_P \prod r_i^{x_i}, \quad \text{where } C_P = \prod q_i. \tag{2}$$

This design gives $\pi_i = p_i$ but not a fixed sample size. Now consider Conditional Poisson,

$$f_{CP}(\mathbf{x}) = C_{CP} \prod r_i^{x_i}, \quad |\mathbf{x}| = n. \tag{3}$$

The sample size is fixed but $\pi_i \ne p_i$. Next we write down the Sampford pf,

$$f_S(\mathbf{x}) = C_S \prod r_i^{x_i} \cdot \sum q_k x_k, \quad |\mathbf{x}| = n. \tag{4}$$

This scheme yields both fixed sample size and $\pi_i = p_i$.
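The pfs (2)–(4) are easy to evaluate by brute-force enumeration for a tiny population. The sketch below is our own illustration, reusing the toy population from the sketch in Sect. 1; enumeration is exponential in $N$, so this is only for demonstration. It also confirms numerically that the Sampford design reproduces the target inclusion probabilities:

```python
from itertools import product
import numpy as np

p = np.array([0.2, 0.5, 0.8, 0.5])              # targets, sum = n = 2
q, r, n = 1 - p, p / (1 - p), 2
X = [np.array(x) for x in product([0, 1], repeat=len(p))]

# Poisson pf (2): independent selections, random sample size
f_P = {tuple(x): float(np.prod(p**x * q**(1 - x))) for x in X}

# CP pf (3): Poisson restricted to samples of size n, renormalized
w_CP = {tuple(x): float(np.prod(r**x)) for x in X if x.sum() == n}
C_CP = 1 / sum(w_CP.values())
f_CP = {s: C_CP * w for s, w in w_CP.items()}

# Sampford pf (4): extra factor sum(q_k * x_k)
w_S = {tuple(x): float(np.prod(r**x) * (q @ x)) for x in X if x.sum() == n}
C_S = 1 / sum(w_S.values())
f_S = {s: C_S * w for s, w in w_S.items()}

# achieved inclusion probabilities: sum f(x) over samples containing unit i
pi_S = np.array([sum(f * s[i] for s, f in f_S.items()) for i in range(len(p))])
print(np.round(pi_S, 10))          # equals p, as claimed for Sampford
```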
The adjusted CP pf comes next. It was derived with the aim of finding new CP target inclusion probabilities, call them $p_i'$, such that we obtain the desired inclusion probabilities. The pf arises from a very good approximation [11],

$$f_{CPA}(\mathbf{x}) = C_{CPA} \prod r_i^{x_i} \cdot \exp\Big(d^{-1} \sum q_k x_k\Big), \quad |\mathbf{x}| = n. \tag{5}$$

We get $\pi_i \approx p_i$. The precision of the approximation can be improved, but that involves iterations, meaning that we cannot write down an explicit pf. The initial approximation is sufficiently accurate in most cases. Pareto sampling is a simple design for selecting a sample, but more complicated probabilistically. The pf is

$$f_{Par}(\mathbf{x}) = C_{Par} \prod r_i^{x_i} \cdot \sum c_k x_k, \quad |\mathbf{x}| = n. \tag{6}$$

This design does not give $\pi_i = p_i$. An approximation for $c_k$ is presented below. The $c_k$ notation is chosen to follow the notation of Bondesson et al. [2]. As in the CP case, we can try to find new Pareto target probabilities to obtain desired inclusion probabilities. We get an explicit approximation yielding the pf, with $b_k$ of (1),

$$f_{ParA}(\mathbf{x}) = C_{ParA} \prod r_i^{x_i} \cdot \exp\Big({-d^{-2}} \sum b_k x_k\Big) \cdot \sum \big(q_k + d^{-1} b_k\big) x_k, \quad |\mathbf{x}| = n. \tag{7}$$

Now $\pi_i \approx p_i$, and the precision of the approximation can be improved, which also involves iterations, leaving us with a non-explicit pf. The initial approximation is usually even better than the CPA approximation.

The different normalizing constants make (2)–(7) true pfs. Approximate expressions for the constants are derived in Appendix 3. The $c_k$'s in the Pareto and Adjusted Pareto pfs are given by an exact integral expression in Bondesson et al. [2], who have shown that, for $d$ not small,

$$c_k \approx q_k \Big(1 + \frac{p_k(p_k - \frac{1}{2})}{d}\Big) = q_k + d^{-1} b_k.$$

In summary: for Poisson we get $\pi_i = p_i$ but the sample size is random; for CP and Par, $\pi_i \ne p_i$ but the sample size is fixed. CPA and ParA give $\pi_i$'s very close to the $p_i$'s and fixed sample size. The approximation for CPA is derived in Lundqvist and Bondesson [11]. The details for ParA can be found in Appendix 5.

2.2 Distance Measures

There are many possible measures one could consider for calculating the distance between two distributions, $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$. Two of the most widely used are the Kullback-Leibler (KL) divergence [9, 10] and the Hellinger distance (e.g. [6]). Let $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$ be discrete probability distributions. For simplicity we write $f_1$ and $f_2$ instead of $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$, respectively. We still use the interpretation $f(\mathbf{x}) = P(\mathbf{I} = \mathbf{x})$, where $\mathbf{I} = [I_1, I_2, \ldots, I_N]$. Then we have

$$D_{KL}(f_1, f_2) = \sum_{\mathbf{x}} f_1 \log\Big(\frac{f_1}{f_2}\Big) = E_{f_1} \log\Big(\frac{f_1}{f_2}\Big), \tag{8}$$

$$D_H^2(f_1, f_2) = 2 \sum_{\mathbf{x}} \big(\sqrt{f_1} - \sqrt{f_2}\big)^2. \tag{9}$$

We are interested in asymptotic distances, and we know that the considered distributions approach each other in the limit. Thus we use a Taylor expansion of both measures, hoping to get an expression which is easier to deal with analytically for our pfs. If we use a Taylor expansion of order two we get the same result for both the KL divergence and the squared Hellinger distance. We denote the common result by χ². The formula resembles that of a Chi-Square test statistic for a goodness-of-fit test, hence the notation. The χ² distance is given by

$$D_{\chi^2}^2(f_1, f_2) = \frac{1}{2} \sum_{\mathbf{x}} \frac{(f_2 - f_1)^2}{f_1}. \tag{10}$$

Details on the derivation are found in Appendix 1. A closer look at this expression yields

$$D_{\chi^2}^2(f_1, f_2) = \frac{1}{2} E_{f_1}\Big(\frac{f_2}{f_1} - 1\Big)^2 = \frac{1}{2}\,\mathrm{Var}_{f_1}\Big(\frac{f_2}{f_1}\Big),$$

since $E_{f_1}(f_2/f_1) = 1$. To obtain the Hellinger and χ² distances, we take the square root of (9) and (10), respectively. It may be noted that all the distance measures above are special cases of the more general Cressie-Read distance [5].
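A minimal sketch of the three measures (8)–(10) for pfs stored as dictionaries over samples, as in the enumeration sketch above; the function names are our own, and the support conditions in the docstrings are discussed in Remark 2 below:

```python
import numpy as np

def kl(f1, f2):
    """Kullback-Leibler divergence (8); requires supp(f1) within supp(f2)."""
    return sum(p1 * np.log(p1 / f2[s]) for s, p1 in f1.items() if p1 > 0)

def hellinger(f1, f2):
    """Hellinger distance, the square root of (9)."""
    keys = set(f1) | set(f2)
    return np.sqrt(2 * sum((np.sqrt(f1.get(s, 0.0)) - np.sqrt(f2.get(s, 0.0)))**2
                           for s in keys))

def chi2(f1, f2):
    """Chi-square distance, the square root of (10);
    requires supp(f2) within supp(f1)."""
    return np.sqrt(0.5 * sum((f2.get(s, 0.0) - p1)**2 / p1
                             for s, p1 in f1.items() if p1 > 0))
```

For two close distributions the squared χ² distance approximates both the KL divergence and the squared Hellinger distance, which is exactly the asymptotic equality referred to above.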
Remark 1  The Hellinger distance (9) is usually defined without the factor 2, and the χ² distance is usually defined without the factor ½, see e.g. Gibbs and Su [6]. In that paper the χ² distance is also defined without taking the square root, and with the order of the distributions interchanged. In e.g. Reiss [13, p. 98] the χ² distance is defined as in this paper, except for the factor 1/2. We use our modified definitions to achieve asymptotic equality, see Appendix 1. One may also note that the Hellinger distance is symmetric and satisfies the triangle inequality, whereas both the KL divergence and the χ² distance are non-symmetric. Thus the Hellinger distance is the only true metric among these three.

Remark 2  There are different demands on the support of the distributions for the different measures. The Hellinger distance is usable regardless of differences in support. For the KL divergence we must have $\mathrm{Support}(f_2) \supseteq \mathrm{Support}(f_1)$ for the ratio $f_1/f_2$ to be always finite. The χ² distance demands $\mathrm{Support}(f_1) \supseteq \mathrm{Support}(f_2)$, otherwise the ratio $f_2/f_1$ is not always finite.

Remark 3  Tillé [18, pp. 142–145] uses another distance measure, the Largest Possible Deviation (LPD), for comparing fixed size sampling designs with the same achieved inclusion probabilities. The LPD does not use the distributions directly; instead it uses covariance matrices. If we have distributions $f_1$ and $f_2$ with corresponding covariance matrices $\Sigma_1$ and $\Sigma_2$, the LPD is defined as

$$d_{LPD}(f_1, f_2) = \max_{\mathbf{u}} \frac{\mathbf{u}^T \Sigma_1 \mathbf{u}}{\mathbf{u}^T \Sigma_2 \mathbf{u}}.$$

Tillé states that the maximum above equals the largest eigenvalue of $\Sigma_2^{+} \Sigma_1$, where $\Sigma_2^{+}$ is the Moore-Penrose inverse of $\Sigma_2$. The exact distributions are not needed for calculating the LPD, but on the other hand the LPD can be zero even if the distributions are not identical. The LPD is non-symmetric and thus not a metric.

3 Distances

We use the χ² distance, i.e. the square root of (10), as distance measure. Thus we need to choose a reference distribution, $f_1$, since we take the variance with respect to $f_1$. We have chosen the CPA pf (5) as reference, for the following reasons: The CPA pf can be shown to minimize the KL divergence (8) with respect to Poisson within the class of distributions yielding desired inclusion probabilities and fixed sample size (which in the KL sense makes the inclusion indicators as independent as possible). Details on this are found in Appendix 2. Sampling designs with minimum KL divergence to some reference design are treated in e.g. Tillé [18, pp. 64–65]. The CPA pf also has a maximum entropy property, which many authors find desirable (see e.g. [8, p. 29], [3, p. 260]). Finally, the CPA design is derived to yield desired inclusion probabilities.

3.1 CP vs CPA

Here we derive the asymptotic distance between CP and CPA. Let $f_1$ be the CPA pf (5). Then we get

$$2 D_{\chi^2}^2(\mathrm{CPA}, \mathrm{CP}) = \mathrm{Var}_{CPA}\Big(\frac{f_{CP}}{f_{CPA}}\Big) = \Big(\frac{C_{CP}}{C_{CPA}}\Big)^2 \mathrm{Var}_{CPA}\Big(\exp\Big({-d^{-1}} \sum q_k x_k\Big)\Big).$$

Now, let $d_{HT} = \sum q_k x_k$. Then we make a Taylor expansion of $g(d_{HT}) = \exp(-d_{HT}/d)$ around $E_{CPA}\, d_{HT} \approx \sum q_k p_k = d$, keeping only the first non-zero term containing $d_{HT}$. The result is

$$g(d_{HT}) \approx \frac{1}{e} - \frac{1}{ed}(d_{HT} - d).$$

Further, in Appendix 3 we can see that

$$\frac{C_{CP}}{C_{CPA}} \approx e \approx 2.71828.$$

Combining, we have

$$2 D_{\chi^2}^2(\mathrm{CPA}, \mathrm{CP}) \approx \mathrm{Var}_{CPA}\Big(1 - \frac{d_{HT} - d}{d}\Big) = \frac{\mathrm{Var}_{CPA}(d_{HT})}{d^2}.$$

The expression on the right-hand side is the squared coefficient of variation of $d_{HT}$. The distance is obtained by taking the square root, which yields

$$D_{\chi^2}(\mathrm{CPA}, \mathrm{CP}) \approx \frac{CV_{CPA}(d_{HT})}{\sqrt{2}}.$$
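This relation can be checked by brute force, reusing the toy population and the helpers from the earlier sketches; with such a small $d$ the agreement is only rough, since the result is asymptotic:

```python
# build the CPA pf (5) by enumeration (reuses p, q, r, n, X, f_CP and chi2)
d = float(p @ q)
w_CPA = {tuple(x): float(np.prod(r**x) * np.exp((q @ x) / d))
         for x in X if x.sum() == n}
C_CPA = 1 / sum(w_CPA.values())
f_CPA = {s: C_CPA * w for s, w in w_CPA.items()}

d_HT = {s: float(q @ np.array(s)) for s in f_CPA}       # d_HT = sum q_k x_k
m = sum(f * d_HT[s] for s, f in f_CPA.items())          # E_CPA(d_HT), close to d
v = sum(f * (d_HT[s] - m)**2 for s, f in f_CPA.items()) # Var_CPA(d_HT)
print(chi2(f_CPA, f_CP), np.sqrt(v) / m / np.sqrt(2))   # exact vs CV/sqrt(2)
```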
3.2 Sampford vs CPA

Consider now the asymptotic distance between Sampford and CPA, with the CPA pf (5) again as reference. We have

$$2 D_{\chi^2}^2(\mathrm{CPA}, \mathrm{S}) = \mathrm{Var}_{CPA}\Big(\frac{f_S}{f_{CPA}}\Big) = \Big(\frac{C_S}{C_{CPA}}\Big)^2 \mathrm{Var}_{CPA}\Big(\sum q_k x_k \cdot \exp\Big({-d^{-1}} \sum q_k x_k\Big)\Big).$$

Let $d_{HT} = \sum q_k x_k$. Then we Taylor expand $g(d_{HT}) = d_{HT}\exp(-d_{HT}/d)$ around $E_{CPA}\, d_{HT} \approx d$, again keeping only the first non-zero term containing $d_{HT}$. The result is

$$g(d_{HT}) \approx \frac{d}{e} - \frac{1}{2de}(d_{HT} - d)^2.$$

From Appendix 3 we see that

$$\frac{C_S}{C_{CPA}} \approx \frac{e}{d}.$$

Putting all this together, we get

$$2 D_{\chi^2}^2(\mathrm{CPA}, \mathrm{S}) \approx \mathrm{Var}_{CPA}\Big(1 - \frac{(d_{HT} - d)^2}{2d^2}\Big) = \frac{1}{4d^4}\Big(E_{CPA}(d_{HT} - d)^4 - \big(E_{CPA}(d_{HT} - d)^2\big)^2\Big).$$

Now, $E_{CPA}(d_{HT} - d)^4$ is the fourth central moment of $d_{HT}$. Since we are considering asymptotics here, we can use a Normal approximation. This yields

$$E_{CPA}(d_{HT} - d)^4 \approx 3\big(E_{CPA}(d_{HT} - d)^2\big)^2,$$

implying that

$$2 D_{\chi^2}^2(\mathrm{CPA}, \mathrm{S}) \approx \frac{1}{2d^4}\big(\mathrm{Var}_{CPA}(d_{HT})\big)^2 = \frac{1}{2}\big(CV_{CPA}^2(d_{HT})\big)^2 = 2\big(D_{\chi^2}^2(\mathrm{CPA}, \mathrm{CP})\big)^2.$$

We see that $D_{\chi^2}(\mathrm{CPA}, \mathrm{S}) \approx (D_{\chi^2}(\mathrm{CPA}, \mathrm{CP}))^2$, so if $D_{\chi^2}(\mathrm{CPA}, \mathrm{CP})$ is less than one, $D_{\chi^2}(\mathrm{CPA}, \mathrm{S})$ should be considerably smaller than $D_{\chi^2}(\mathrm{CPA}, \mathrm{CP})$.

3.3 Pareto vs CPA

We turn to the Pareto design, and use the previously mentioned approximation $c_k \approx q_k + d^{-1}b_k$ from Bondesson et al. [2]. CPA is, as usual, the reference distribution:

$$2 D_{\chi^2}^2(\mathrm{CPA}, \mathrm{Par}) \approx \Big(\frac{C_{Par}}{C_{CPA}}\Big)^2 \mathrm{Var}_{CPA}\Big(\sum (q_k + d^{-1}b_k) x_k \cdot \exp\Big({-d^{-1}} \sum q_k x_k\Big)\Big).$$

We let $d_{HT} = \sum q_k x_k$ and $z = d^{-1}\sum b_k x_k$. Then we work with $g(d_{HT}, z) = (d_{HT} + z)\exp(-d_{HT}/d)$. We have

$$g(d_{HT}, z) = d_{HT}\exp(-d_{HT}/d) + z\exp(-d_{HT}/d) \approx d_{HT}\exp(-d_{HT}/d) + \frac{z}{e}.$$

We Taylor expand $d_{HT}\exp(-d_{HT}/d)$ around $E_{CPA}\, d_{HT} \approx d$. This yields

$$g(d_{HT}, z) \approx \frac{d}{e} + \frac{z}{e} - \frac{1}{2de}(d_{HT} - d)^2.$$

From Appendix 3 we have

$$\frac{C_{Par}}{C_{CPA}} \approx \frac{e}{d}.$$

Combining these expressions we get

$$2 D_{\chi^2}^2(\mathrm{CPA}, \mathrm{Par}) \approx \frac{1}{d^2}\mathrm{Var}_{CPA}\Big(z - \frac{(d_{HT} - d)^2}{2d}\Big) \approx 2 D_{\chi^2}^2(\mathrm{CPA}, \mathrm{S}) + d^{-4}\,\mathrm{Var}_{CPA}\Big(\sum b_k x_k\Big).$$

We have neglected $\mathrm{Cov}((d_{HT} - d)^2, z)$ since this should be very small compared to the variances. Here it is seen that $D_{\chi^2}(\mathrm{CPA}, \mathrm{Par})$ is greater than $D_{\chi^2}(\mathrm{CPA}, \mathrm{S})$, but it should be less than $D_{\chi^2}(\mathrm{CPA}, \mathrm{CP})$.

3.4 Adjusted Pareto vs CPA

Next we consider the Adjusted Pareto distribution, and again we use the approximation in Bondesson et al. [2]. The reference distribution is CPA,

$$2 D_{\chi^2}^2(\mathrm{CPA}, \mathrm{ParA}) \approx \Big(\frac{C_{ParA}}{C_{CPA}}\Big)^2 \mathrm{Var}_{CPA}\Big(\sum (q_k + d^{-1}b_k) x_k \cdot \exp\Big[{-d^{-1}} \sum (q_k + d^{-1}b_k) x_k\Big]\Big).$$

Now we let $z = \sum (q_k + d^{-1}b_k) x_k$. Then we Taylor expand $f(z) = z\exp(-z/d)$ around $d$, as before keeping only the first non-zero terms involving $z$. The result is

$$f(z) \approx \frac{d}{e} - \frac{1}{2de}(z - d)^2.$$

Appendix 3 yields

$$\frac{C_{ParA}}{C_{CPA}} \approx \frac{e}{d},$$

which further yields

$$2 D_{\chi^2}^2(\mathrm{CPA}, \mathrm{ParA}) \approx \frac{1}{4d^4}\,\mathrm{Var}_{CPA}\big((z - d)^2\big).$$

This expression is similar to the one encountered in Sect. 3.2, where the distance between S and CPA was investigated. The Normal approximation can be used here as well, giving us $\mathrm{Var}_{CPA}((z - d)^2) \approx 2(\mathrm{Var}_{CPA}(z))^2$. This gives us the final expression

$$D_{\chi^2}^2(\mathrm{CPA}, \mathrm{ParA}) \approx \frac{1}{4d^4}\Big(\mathrm{Var}_{CPA}\Big(\sum (q_k + d^{-1}b_k) x_k\Big)\Big)^2.$$

We note that this is almost the same expression as for the CPA–S distance, the difference being that we have $q_k + d^{-1}b_k$ instead of $q_k$ in the sum. As $d$ tends to infinity, the $d^{-1}b_k$ part vanishes. This means that these distances should be very similar asymptotically.

3.5 Poisson vs CPA

The χ² distance is infinite in this case, since the support for Poisson is larger than the support for CPA. However, one can show that both the Hellinger distance and the KL divergence for CPA–P are greater than those for CPA–CP. The proof for the Hellinger distance follows; the proof for the KL divergence is found in Appendix 4. We use the Poisson distribution with target and achieved inclusion probabilities $p_i$, CPA with inclusion probabilities $p_i$, and CP with target probabilities $p_i$:

$$D_H^2(\mathrm{CPA}, \mathrm{P}) = 2\sum_{\mathbf{x}}\big(\sqrt{f_{CPA}} - \sqrt{f_P}\big)^2 = 2\sum_{\mathbf{x}} f_{CPA} + 2\sum_{\mathbf{x}} f_P - 4\sum_{\mathbf{x}}\sqrt{f_{CPA} f_P} = 4\Big(1 - \sum_{\mathbf{x}}\sqrt{f_{CPA} f_P}\Big).$$

Analogously, we get

$$D_H^2(\mathrm{CPA}, \mathrm{CP}) = 4\Big(1 - \sum_{\mathbf{x}}\sqrt{f_{CPA} f_{CP}}\Big).$$

This means that $D_H^2(\mathrm{CPA}, \mathrm{P}) > D_H^2(\mathrm{CPA}, \mathrm{CP})$ iff

$$\sum_{\mathbf{x}} \sqrt{f_{CPA}}\big(\sqrt{f_{CP}} - \sqrt{f_P}\big) > 0.$$

The last sum above is effectively taken over $\mathbf{x}$ with $|\mathbf{x}| = n$, and for such $\mathbf{x}$ we have $f_{CP} > f_P$. Thus we have our desired result: $D_H^2(\mathrm{CPA}, \mathrm{P}) > D_H^2(\mathrm{CPA}, \mathrm{CP})$. Note that this result is still valid if we replace $f_{CPA}$ with any distribution $f$ corresponding to a design with fixed size $n$.
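Continuing the enumeration sketches, one can tabulate the exact χ² distances to CPA and compare with the ordering predicted in Sects. 3.1–3.4. We use the approximate $c_k \approx q_k + d^{-1}b_k$ for the Pareto pf, and for our toy population $d$ is far from the asymptotic regime, so the ordering is indicative only:

```python
# reuses p, q, r, n, d, X, f_CP, f_S, f_CPA and chi2 from earlier sketches
b = p * q * (p - 0.5)                                   # b_k from (1)

w_Par = {tuple(x): float(np.prod(r**x) * ((q + b / d) @ x))
         for x in X if x.sum() == n}
tot = sum(w_Par.values())
f_Par = {s: w / tot for s, w in w_Par.items()}

w_ParA = {tuple(x): float(np.prod(r**x) * np.exp(-(b @ x) / d**2)
                          * ((q + b / d) @ x))
          for x in X if x.sum() == n}
tot_A = sum(w_ParA.values())
f_ParA = {s: w / tot_A for s, w in w_ParA.items()}

for name, f2 in [("CP", f_CP), ("Par", f_Par), ("S", f_S), ("ParA", f_ParA)]:
    print(name, chi2(f_CPA, f2))    # expect roughly ParA ~ S < Par < CP
```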
Remark 4  In all our asymptotic considerations, the need for calculating covariances appears, due to the fact that

$$\mathrm{Var}\Big(\sum a_k x_k\Big) = \sum_i a_i^2\,\mathrm{Var}(x_i) + \sum_{i \ne j} a_i a_j\,\mathrm{Cov}(x_i, x_j),$$

where $\mathrm{Cov}(x_i, x_j) = \pi_{ij} - \pi_i \pi_j$. We have chosen to use an approximation derived for distributions with support $\{\mathbf{x};\, |\mathbf{x}| = n\}$ in Bondesson et al. [2], and there denoted by BTL1. The approximation is

$$\mathrm{Cov}(x_i, x_j) \approx -\frac{\pi_i(1 - \pi_i)\pi_j(1 - \pi_j)}{\delta}\Big(1 - \frac{(\pi_i - \tilde{\pi})(\pi_j - \tilde{\pi})}{\delta}\Big), \quad i \ne j, \tag{11}$$

where $\delta = \sum \pi_j(1 - \pi_j)$ and $\tilde{\pi} = \delta^{-1}\sum \pi_j^2(1 - \pi_j)$. Formula (11) was derived by minimizing the sum of the squared correlations between the inclusion indicators. This is a step towards making the inclusion indicators as independent as possible while keeping the sample size fixed. For small populations it is possible to calculate the $\pi_{ij}$'s (and thus also the covariances) directly, by writing down the pf for all possible samples and summing over the samples containing units $i$ and $j$. In the paper by Bondesson et al. [2] a formula for estimating the variance of the HT estimator with the BTL1 approximation (11) inserted is given. Further, Hájek [8] and Tillé [18] both include a wide range of other methods for obtaining accurate covariance approximations.
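For small populations the covariances in Remark 4 can be computed exactly and compared with (11). A sketch, reusing f_CPA from the earlier enumeration; the BTL1 correction factor below follows our reading of (11) and should be verified against Bondesson et al. [2]:

```python
import numpy as np

def pi_ij_exact(f):
    """Second-order inclusion probabilities by enumeration: sum f(x) over
    all samples containing both unit i and unit j (diagonal gives pi_i)."""
    N = len(next(iter(f)))
    P2 = np.zeros((N, N))
    for s, prob in f.items():
        v = np.array(s, dtype=float)
        P2 += prob * np.outer(v, v)
    return P2

P2 = pi_ij_exact(f_CPA)
pi = np.diag(P2)
cov_exact = P2 - np.outer(pi, pi)

# BTL1-style approximation (11), off-diagonal entries only
delta = float(np.sum(pi * (1 - pi)))
pi_t = float(np.sum(pi**2 * (1 - pi))) / delta
a = pi * (1 - pi)
cov_btl1 = -np.outer(a, a) / delta * (1 - np.outer(pi - pi_t, pi - pi_t) / delta)
```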
3.6 Summary

Using the formulas derived in Sects. 3.1–3.5, we can order the different distributions with respect to their distance to CPA. However, when comparing CPA–Sampford and CPA–ParA things are not obvious: if $\mathrm{Var}_{CPA}(\sum(q_k + d^{-1}b_k)x_k)$ is smaller than $\mathrm{Var}_{CPA}(\sum q_k x_k)$, then ParA is closer to CPA than Sampford is, and otherwise the order is reversed; asymptotically the two variances coincide.

4 Numerical Examples

We illustrate the distances numerically, using two small populations for which all quantities can be computed exactly: the TBM population of Traat et al. [19] (Example 1) and a Sampford–Hájek population (Example 2); both are considered again in Sect. 5.2. It should be kept in mind that the theoretical results are asymptotic, requiring $d \to \infty$, whereas the largest $d$ in our examples is $d = 2.09$. Table 3 gives the distances between CPA (as reference distribution) and the other designs for the Sampford–Hájek population.

Table 3  Sampford–Hájek population; distances between CPA (as reference distribution) and Poisson, CP, Sampford, Pareto, ParetoA

Distribution   √D_KL     D_H       D_χ²      D_χ² (asy, BTL1)   D_χ² (asy, exact)
P              1.45      1.39      –         –                  –
CP             0.0866    0.0872    0.0894    0.0877             0.0908
Pareto         –         0.0112    0.0111    0.0107             0.0108
Sampford       0.0072    0.00714   0.00720   0.00769            0.00824
ParetoA        –         0.00643   0.00638   0.00648            0.00699

5 Graphical Study

For Examples 1 and 2 we have also graphically studied the distances between the distributions (3)–(7). We use multidimensional scaling (MDS). There are many types of MDS; here we use principal coordinate analysis (PCO).

5.1 Review of PCO

In PCO the starting point is the dissimilarity matrix. This is a matrix containing the dissimilarity measures between different objects. The dissimilarity measure can be almost any measure. The dissimilarity matrix should however be symmetric with nonnegative entries. Every diagonal element must be zero, since it is the dissimilarity between an object and itself. The objective in PCO is to find a Euclidean coordinate representation for each object, such that the Euclidean distances between objects are the same as the dissimilarities between the objects. One hopes that two Euclidean coordinates give a good approximation of the true dissimilarities. In that case, one can plot these coordinates and thus get a "map" of how the objects are located with respect to each other. We give a brief overview, without proofs. For more details, see e.g. Cox and Cox [4, pp. 25–35].

Let $D = (d_{ij})$ be the $m \times m$ dissimilarity matrix and define a matrix $H = (h_{ij})$ such that $h_{ii} = 1 - 1/m$ and $h_{ij} = -1/m$ for $i \ne j$. Also define $A = (a_{ij})$, where $a_{ij} = -d_{ij}^2/2$. Let $B = HAH$. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$ denote the eigenvalues and $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_m$ the corresponding eigenvectors of the matrix $B$. If all the eigenvalues are nonnegative, we set $\mathbf{x}_i = \sqrt{\lambda_i}\,\mathbf{u}_i$ and form an $m \times m$ matrix $X = [\mathbf{x}_1; \mathbf{x}_2; \ldots; \mathbf{x}_m]$. We interpret the $i$-th row of $X$ as the Euclidean coordinates for object $i$. This is a complete representation, i.e. the Euclidean distances between the objects are exactly equal to the corresponding dissimilarity measures. If $\lambda_1$ and $\lambda_2$ are the only nonzero eigenvalues we only need two Euclidean coordinates for a complete representation. In practice we hope that $\lambda_1$ and $\lambda_2$ are much bigger than $\lambda_3, \ldots, \lambda_m$, meaning that the two-dimensional Euclidean representation is a very good approximation of the true dissimilarities.

Remark 5  If all the eigenvalues are nonnegative, at least one of them will be zero. This is due to the fact that for $m$ points in an $m$-dimensional space, there is always a hyperplane of dimension $m - 1$ that contains all the points.
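A compact implementation of the PCO recipe above (classical multidimensional scaling); the function name is our own:

```python
import numpy as np

def pco(D, k=2):
    """Principal coordinate analysis of an (m, m) symmetric dissimilarity
    matrix D with zero diagonal; returns k Euclidean coordinates per object."""
    m = D.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m        # centering matrix H
    B = H @ (-0.5 * D**2) @ H                  # B = HAH with a_ij = -d_ij^2/2
    lam, U = np.linalg.eigh(B)                 # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:k]            # keep the k largest
    return U[:, idx] * np.sqrt(np.maximum(lam[idx], 0.0))
```

Feeding it the matrix of pairwise Hellinger distances reproduces "maps" of the kind shown in Figs. 1 and 2 below.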
5.2 Examples Revisited

Our examples are considered again, this time calculating all the pairwise Hellinger distances and using them as the dissimilarities, i.e. $d_{ij}$ equals the Hellinger distance between designs $i$ and $j$. Exact calculations can be performed since we have small populations. The Hellinger distance is used since it satisfies the triangle inequality, a property shared with the Euclidean distance. The Poisson distribution is excluded here, since it turns out that the Hellinger distances between Poisson and the other distributions are very large compared to all other distances. This yields the result that a one-dimensional Euclidean representation is sufficient, where the Poisson distribution is at one point, and all the other distributions are clustered together at one other point.

Example 3 (TBM again)  We consider the TBM population (see [19]), where $N = 6$, $n = 3$, $p_1 = p_2 = p_3 = 1/3$, $p_4 = p_5 = p_6 = 2/3$. We get the Hellinger distances of Table 4. For reasons of transparency, only the upper part of the symmetric matrix is shown.

Table 4  TBM population; Hellinger distances

            Sampford   CP     CPA    Par      ParA
Sampford    0          –      –      0.0189   0.00389
CP                     0      –      –        0.117
CPA                           0      –        –
Par                                  0        0.0190
ParA                                          0

Performing the PCO analysis we get the vector of eigenvalues

$$\boldsymbol{\lambda} = [2.55 \cdot 10^{-2},\; 2.16 \cdot 10^{-4},\; 7.64 \cdot 10^{-8},\; 2.22 \cdot 10^{-9},\; 0].$$

We see that $\lambda_3$, $\lambda_4$ and $\lambda_5$ are negligible compared to $\lambda_1$ and $\lambda_2$. Hence a two-dimensional Euclidean representation is a good approximation, and we can draw a "map" showing the positions of the distributions. We see that the CP distribution stands out a bit from the others. This picture does not contradict our conjecture that Sampford and ParA are very close to each other. These two are also quite close to CPA, which also confirms our previous distance ordering.

Fig. 1  TBM population. Positions of the distributions

Example 4 (Sampford–Hájek)  In our last example we again make use of the Sampford–Hájek population from Example 2. The Hellinger distances are given in Table 5.

Table 5  Sampford–Hájek population; Hellinger distances

            Sampford   CP       CPA       Par       ParA
Sampford    0          0.0854   0.00714   0.00795   0.00117
CP                     0        0.0872    0.077     0.0850
CPA                             0         0.0112    0.00642
Par                                       0         0.00758
ParA                                                0

Going through the PCO analysis, the eigenvalues are

$$\boldsymbol{\lambda} = [1.42 \cdot 10^{-2},\; 7.63 \cdot 10^{-4},\; 5.36 \cdot 10^{-7},\; 8.40 \cdot 10^{-9},\; 0].$$

In this case, $\lambda_1$ is much larger than $\lambda_2, \ldots, \lambda_5$, telling us that we could probably use just a one-dimensional Euclidean representation. However, we consider the two-dimensional representation. The position "map" can be viewed in Fig. 2. Also in this case the CP distribution is located rather far from the others. Sampford and ParA are almost inseparable.

Fig. 2  Sampford–Hájek population. Positions of the distributions

The examples show a similar pattern, which is also the expected one. The distributions yielding correct $\pi_i$'s (at least approximately) are close to each other.

6 Final Remarks

We conclude that the CPA, Sampford and ParA distributions are quite close to each other, which was expected since they are all derived to give correct $\pi_i$'s. The CP distribution lies quite far away from these, while Pareto resides somewhere in between. Numerical examples and graphical investigations with MDS using small populations have been presented. The examples seem to confirm our conclusions. We have an interesting result for those who advocate Pareto sampling. Our proposed adjustment of Pareto sampling is not difficult to realize in practice, nor is it computationally demanding. We thus advocate that it should be used. One then obtains inclusion probabilities very close to the desired ones and a distribution very close to Sampford. For those who feel more comfortable in the Conditional Poisson setting, we have a result regarding the KL divergence between Poisson sampling and the fixed size schemes, stating that we obtain the minimal divergence (while obtaining desired inclusion probabilities) when using CPA. Finally, those who wish to use Sampford sampling may still find it useful to know that one can adjust both the Pareto and CP methods and get a distribution close to the Sampford distribution.

Acknowledgements  The author would like to thank Professor Lennart Bondesson for help and discussions during the work, and Göran Arnoldsson for comments on previous versions of the paper. I also thank the referee for valuable comments and suggestions.

Appendix 1 Derivation of the χ² Distance

Here we derive $D_{\chi^2}$, starting from $D_{KL}$ and $D_H^2$, respectively. Beginning with the Kullback-Leibler divergence, we have

$$D_{KL}(f_1, f_2) = \sum f_1 \log\Big(\frac{f_1}{f_2}\Big) = -\sum f_1 \log\Big(1 + \frac{f_2 - f_1}{f_1}\Big) \approx -\sum f_1\Big(\frac{f_2 - f_1}{f_1} - \frac{1}{2}\Big(\frac{f_2 - f_1}{f_1}\Big)^2\Big)$$
$$= 0 + \frac{1}{2}\sum \frac{(f_2 - f_1)^2}{f_1} = D_{\chi^2}^2(f_1, f_2),$$

since $\sum (f_2 - f_1) = 0$. Now consider the squared Hellinger distance:

$$D_H^2(f_1, f_2) = 2\sum \big(\sqrt{f_1} - \sqrt{f_2}\big)^2 = 2\sum f_1\Big(1 - \sqrt{1 + \frac{f_2 - f_1}{f_1}}\Big)^2$$
$$\approx 2\sum f_1\Big(\frac{1}{2} \cdot \frac{f_2 - f_1}{f_1}\Big)^2 = \frac{1}{2}\sum \frac{(f_2 - f_1)^2}{f_1} = D_{\chi^2}^2(f_1, f_2).$$

We see that Taylor expansions of order two on $D_{KL}$ and $D_H^2$ result in $D_{\chi^2}^2$.
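The second-order agreement derived in Appendix 1 can also be verified symbolically. A small sketch with sympy, writing $f_2 = f_1(1 + \varepsilon)$ and expanding the per-sample contributions (divided through by $f_1$):

```python
import sympy as sp

eps = sp.symbols('epsilon')

kl_term = sp.series(-sp.log(1 + eps), eps, 0, 3).removeO()               # -eps + eps**2/2
h2_term = sp.series(2 * (1 - sp.sqrt(1 + eps))**2, eps, 0, 3).removeO()  # eps**2/2
chi2_term = eps**2 / 2

# the -eps part of the KL term vanishes after summing over x, since sum(f2 - f1) = 0
print(sp.simplify(kl_term + eps - chi2_term))   # 0
print(sp.simplify(h2_term - chi2_term))         # 0
```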
Appendix 2 An Optimal Property of CPA

Here it is proven that the KL divergence with respect to Poisson, within the class of distributions with fixed sample size $n$ and given inclusion probabilities, is minimized by CPA. We consider the Poisson distribution with target and achieved inclusion probabilities $p_i$, and CPA with inclusion probabilities $p_i$. All sums and products are taken from 1 to $N$, unless otherwise indicated. We have

$$f_P(\mathbf{x}) = C_P \prod r_i^{x_i}, \quad \text{where } C_P = \prod q_i.$$

Let $f_n(\mathbf{x})$ be any design of fixed size $n$ with inclusion probabilities $p_i$. Then

$$D_{KL}(f_n, f_P) = \sum f_n \log\Big(\frac{f_n}{f_P}\Big) = \sum f_n \log f_n - \sum f_n \log f_P$$
$$= -\mathrm{Entropy}(f_n) - \log C_P - \sum_i p_i \log r_i.$$

The last two terms of the final expression above are constants which do not affect the minimization. We can now use the maximum entropy result of Hájek [8, pp. 28–31] to see that we must have $f_n(\mathbf{x}) \propto \prod (r_i')^{x_i}$, where we use $r_i' \propto r_i \exp(d^{-1} q_i)$ to get inclusion probabilities $p_i$ (see [11]). Thus we conclude that $f_n(\mathbf{x}) = f_{CPA}(\mathbf{x})$.

Appendix 3 Normalization Constants

In this appendix we derive approximate expressions for the normalization constants appearing in (2)–(7). We start by just writing down $C_P$: from (2), $C_P = \prod q_i$. Moving on to $C_{CPA}$, we start from

$$C_{CP}^{-1} = \sum_{|\mathbf{x}| = n} \prod r_i^{x_i},$$

which we multiply by 1:

$$C_{CP}^{-1} = \sum_{|\mathbf{x}| = n} \prod r_i^{x_i} \exp\Big(d^{-1}\sum q_k x_k\Big)\exp\Big({-d^{-1}}\sum q_k x_k\Big) = C_{CPA}^{-1}\, E_{CPA} \exp\Big({-d^{-1}}\sum q_k x_k\Big)$$
$$= C_{CPA}^{-1}\, e^{-1}\, E_{CPA} \exp\Big({-d^{-1}}\sum q_k (x_k - p_k)\Big).$$

In the last step above we have used the fact that $d = \sum q_k p_k$. Now, using the Gauss approximation $E(g(Z)) \approx g(E(Z))$ on $Z = -d^{-1}\sum q_k(x_k - p_k)$ and $g(Z) = e^Z$, together with the fact that $E_{CPA}(x_k) \approx p_k$, we get

$$E_{CPA} \exp\Big({-d^{-1}}\sum q_k(x_k - p_k)\Big) \approx \exp\Big({-d^{-1}} E_{CPA}\Big(\sum q_k(x_k - p_k)\Big)\Big) \approx 1.$$

Thus $C_{CPA} \approx e^{-1} C_{CP}$, i.e. $C_{CP}/C_{CPA} \approx e$. Next, we consider $C_S$:

$$C_S^{-1} = \sum_{|\mathbf{x}| = n} \prod r_i^{x_i} \sum q_k x_k = C_{CP}^{-1}\, E_{CP}\Big(\sum q_k x_k\Big) \approx C_{CP}^{-1}\, d.$$

This implies that $C_S \approx C_{CP}/d$. Proceeding with $C_{Par}$, we have

$$C_{Par}^{-1} = \sum_{|\mathbf{x}| = n} \prod r_i^{x_i} \sum c_k x_k.$$

From Bondesson et al. [2] we know that for large $d$,

$$c_k \approx q_k\Big(1 + \frac{p_k(p_k - \frac{1}{2})}{d}\Big) \approx q_k.$$

If $c_k = q_k$, it is the Sampford pf. Hence $C_{Par} \approx C_S \approx C_{CP}/d$. For $C_{ParA}$,

$$C_{ParA}^{-1} = \sum_{|\mathbf{x}| = n} \prod r_i^{x_i} \exp\Big({-d^{-2}}\sum b_k x_k\Big) \sum (q_k + d^{-1}b_k) x_k.$$

Here we just note that for large $d$,

$$\exp\Big({-d^{-2}}\sum b_k x_k\Big) \approx 1 - d^{-2}\sum b_k x_k \approx 1.$$

We are back to the original Pareto case, and have $C_{ParA} \approx C_S \approx C_{CP}/d$. In summary, we present our approximations of the constants in Table 6.

Table 6  Approximations of the normalization constants

Distribution   Constant
CPA            C_CP/e
S              C_CP/d
Par            C_CP/d
ParA           C_CP/d
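The approximations in Table 6 can be checked against the exact normalization constants from the earlier enumeration sketches; with $d = 0.82$ in the toy population the agreement is rough, as the approximations assume large $d$:

```python
# reuses C_CP, C_CPA, C_S, d and w_ParA from the earlier sketches
print(C_CP / C_CPA, np.e)                    # ratio should approach e
print(C_S * d / C_CP)                        # should approach 1
print(d / (C_CP * sum(w_ParA.values())))     # C_ParA * d / C_CP, should approach 1
```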
[2] have shown that, for d not small, gad ~pa(i+ 2A) D springer 97 On the Distance Between Some srps Sampling Designs For calculation of 2, we use the slightly rougher approximation cf © 1 — pp, giving us né =n-n-! > py (1 — py) =. Substitution of this into (13) yields I= pe_ ch __ pall = pe)(P2~ 4) aa # Hence we choose the target probabilities p{ such that >y’ pi =n and pel — pe)(Pe ae). 14 TF ar 7) “ Remark We can use this result directly to adjust the ranking variables used in Pareto sam- pling. The adjusted ranking variables are =u; il — Pi) (Pi— usa -U) on( 2 Pale dy pid =P) # ‘This means that sampling according to the adjusted target probabilities pj is as easy to perform in practice than ordinary Pareto sampling, References 1. Aires, N: Comparisons between conditional Poisson sampling and Pareto ps sampling designs. J. Sa Plan, Inference 88, 133-147 (2000) 2, Bondesson, L Tan, I, Lundqvist, A: Pareto sampling versus Sempford and conditional Poisson sam- pling. Scand, J. Stat. 38, 699720 (2006) 3, Brower, Ks Combined Survey Sampling Inference; Weighing Basu's Elephants. Armold Publishers, Lon- don (2002) 4. Cox, TR, Cox, M.A.A. Multcimensional Scaling. Chapman & Hall, London (1994) 5. Grete, NAC, Read, TRC: Mulnomial goodness-of-fit tests. J. Roy. Stat, Soc. B 46, 440-464 (1984) 6 Gibbs, AL, $0, FE: On choosing and bounding probability meties. Int. Stat Rev. 70, 419-435 (2002) 7. Hajek. J: Theory of rejective sempling. Ann. Math Stat 38, 1491-1523 (1964) 8. Hajek, J: Sampling from a Finite Population. Marcel Dekker, New York (1981) 8. 0. 1 Kallback, Information Theory and Statistics, Dover, New York (1968) XKallback, 8. Leiber, R.A: On information and sufciency. Ann. Math Stat 22, 79-86 (1951) Lundgyist, A, Bondesson, L: On sampling with desired incision probsbilities of first and second or der. Research Report in Mathematical Stastios No.3, Deparnent of Mathematics and Mathematical Statistics, Umed University (2005) 12, Rao, INK: On thee procedures of unequal probability sampling without replacement. J. Am. Stat Asse. 58, 202-215 (1963) 1, Reiss, R-D.: Approximate Distributions of Order Statistics. Springer, New York (1989) 14, Rossa, B: Asymptotic theory for oder sampling. J. Stat. Plan. Inference 62, 135-158 1997) 15, Rosém,B. On sampling with probability proportional to size. J Stat. Plan. Inference 62, 158-191 (1997) 16, Sampford, M.R.: On sampling without replacement with unequal probabilities of selection, Biomettika 54, 499-513 (1967) ° 17, Simdal, C-E., Swenssoa, B., Wistman, J: Model Assisted Survey Sampling. Springes, New York 992) 18, Tilé,¥: Sampling Algovthms. Springer, New York (2006) 19, rat L, Bondesson, L., Meise, K: Sampling design and sample selection through distribution theory. 4. Sta. lan inference 123, 395-413 (2004) D springer
References

1. Aires, N.: Comparisons between conditional Poisson sampling and Pareto πps sampling designs. J. Stat. Plan. Inference 88, 133–147 (2000)
2. Bondesson, L., Traat, I., Lundqvist, A.: Pareto sampling versus Sampford and conditional Poisson sampling. Scand. J. Stat. 33, 699–720 (2006)
3. Brewer, K.: Combined Survey Sampling Inference: Weighing Basu's Elephants. Arnold, London (2002)
4. Cox, T.F., Cox, M.A.A.: Multidimensional Scaling. Chapman & Hall, London (1994)
5. Cressie, N.A.C., Read, T.R.C.: Multinomial goodness-of-fit tests. J. R. Stat. Soc. B 46, 440–464 (1984)
6. Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. Int. Stat. Rev. 70, 419–435 (2002)
7. Hájek, J.: Asymptotic theory of rejective sampling with varying probabilities from a finite population. Ann. Math. Stat. 35, 1491–1523 (1964)
8. Hájek, J.: Sampling from a Finite Population. Marcel Dekker, New York (1981)
9. Kullback, S.: Information Theory and Statistics. Dover, New York (1968)
10. Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22, 79–86 (1951)
11. Lundqvist, A., Bondesson, L.: On sampling with desired inclusion probabilities of first and second order. Research Report in Mathematical Statistics No. 3, Department of Mathematics and Mathematical Statistics, Umeå University (2005)
12. Rao, J.N.K.: On three procedures of unequal probability sampling without replacement. J. Am. Stat. Assoc. 58, 202–215 (1963)
13. Reiss, R.-D.: Approximate Distributions of Order Statistics. Springer, New York (1989)
14. Rosén, B.: Asymptotic theory for order sampling. J. Stat. Plan. Inference 62, 135–158 (1997)
15. Rosén, B.: On sampling with probability proportional to size. J. Stat. Plan. Inference 62, 159–191 (1997)
16. Sampford, M.R.: On sampling without replacement with unequal probabilities of selection. Biometrika 54, 499–513 (1967)
17. Särndal, C.-E., Swensson, B., Wretman, J.: Model Assisted Survey Sampling. Springer, New York (1992)
18. Tillé, Y.: Sampling Algorithms. Springer, New York (2006)
19. Traat, I., Bondesson, L., Meister, K.: Sampling design and sample selection through distribution theory. J. Stat. Plan. Inference 123, 395–413 (2004)