Professional Documents
Culture Documents
114397
1
2 NGS EXPERIMENTS WITH POOLED SAMPLES
Abstract
Next generation sequencing (NGS) is about to revolutionize genetic analysis.
Currently NGS techniques are mainly used to sequence individual genomes. Due
to the high sequence coverage required, the costs for population scale analyses
are still too high to allow an extension to non-model organisms. Here, we show
that NGS of pools of individuals is often more effective in SNP discovery and
provides more accurate allele frequency estimates, even when taking sequencing
errors into account. We modify the population genetic estimators Tajima’s π and
Watterson’s θ to obtain unbiased estimates from NGS pooling data. Given the
same sequencing effort, the resulting estimators often show a better performance
than those obtained from individual sequencing. Although our analysis also shows
that NGS of pools of individuals will not be preferable under all circumstances, it
provides a cost effective approach to estimate allele frequencies on a genome-wide
scale.
NGS EXPERIMENTS WITH POOLED SAMPLES 3
Methods
Throughout, we will consider an individual sequencing project where k indi-
viduals are sequenced each with an expected coverage λ, by which we mean that
any given locus is sequenced λ times on average. For a comparable pooling exper-
iment that involves the same amount of sequencing effort, the expected coverage
will then be kλ, i.e. any particular locus will be read kλ times on average from
the pool consisting of n individuals. In practice, one might for instance sequence
each of the k individuals on a separate Illumina lane with coverage λ. With the
same sequencing effort, the pool could be sequenced on k lanes simultaneously,
leading to a total coverage of kλ.
For the convenience of the reader, we summarize our notation in Table 1.
SNP detection. A SNP is detected at a site if the site is polymorphic, i.e. if
at least two alleles A and a are found in the sequenced sample. We will consider
SNP detection both in the context of pooling experiments and for individual
sequencing. To assess the performance of these two competing scenarios, we will
look both at the power and at the probability of falsely calling a SNP due to
sequencing errors.
Generally speaking, an experimental design that provides high power while
keeping the probability of incorrectly detecting a SNP small, will be preferable.
When individuals are sequenced separately, the probability of sequencing errors
being interpreted as true SNPs can be reduced by a sufficiently high expected
coverage if the genotype of an individual is inferred by the majority of reads.
Note, that in the case of diploid individuals, the distinction between sequencing
errors and true SNPs is significantly more complicated. In pooling experiments, a
simple way to control the probability of falsely detecting SNPs both in the haploid
and in the diploid case is to require a certain minimum number of reads for the
minor allele in order to call a SNP. We extend work by Eberle and Kruglyak
(2000) on SNP detection, and derive both the power and error rates for pooling
experiments and for separate sequencing.
Separate Sequencing of Individuals. Let MA , (Ma ) denote the number of
times allele A (a) is sequenced. Given that exactly LA = l out of k individuals in
the sample have an allele of type A, the probability of detecting polymorphism is
equal to the probability of reading at least one of the A and one of the remaining
a alleles in the sample. Assuming that for each individual the number of reads at
a particular locus is Poisson distributed with parameter λ, the probability of not
covering the SNP locus for an individual is exp(−λ). This leads to the probability
qc (l, k, λ) := (1 − [exp(−λ)]l )(1 − [exp(−λ)]k−l )
for getting at least one “A” and one “a” read. Notice that for larger values of
λ, this probability is nearly one, except for l = 0 or l = k, where qc (l, k, λ) = 0.
Suppose now that our population size N is fairly large and that the relative
frequency of allele A is p in the population. Then, by conditioning on the number
NGS EXPERIMENTS WITH POOLED SAMPLES 5
In the independent case, an error is made by choosing one of the three incorrect
bases at random, each with probability ǫ/3. The probability of falsely detecting
a SNP is
(4)
k
X r r
X λ
qe(i) (k, λ, ǫ) = 1 − 1 − 3 (ǫ/3)i (1 − (ǫ/3))(r−i) exp(−λ) .
r≥1
i r!
i>r/2
The resulting error probabilities can be made very small by ensuring a coverage
λ that is large enough. Obviously a more sophisticated rule will be needed when
sequencing diploid individuals.
Pooling Experiment. We now assume that a pooled sample of size n is se-
quenced with the same expected total number kλ of reads per locus as for sep-
i
arate sequencing. Let F(P ) (b, γ) = bi=0 γi! exp(−γ) denote the probability that
P
a Poisson random variable with parameter γ is at most b. Given a frequency
LA = l of A alleles in the pool, we obtain the probability of reading at least one
6 NGS EXPERIMENTS WITH POOLED SAMPLES
of making at least b sequencing errors and getting at least one correct read. If
the expected number of reads λk is fairly large, the term 1 − F(P ) (0, λk(1 − ǫ))
is very close to one and can be omitted without changing the results much. With
independent sequencing errors, an upper bound for the probability of falsely
detecting a SNP is given by
p(i)
(9) e (k, λ, ǫ, b) = 3 1 − F(P ) (b − 1, λkǫ/3) .
Now given J, MA is binomial B(J,p) distributed and Var MJA |J = p(1 − p)/J.
This leads to
MA
E Var |J = E(1/J)p(1 − p).
J
Furthermore E[ MJA |J] = p and therefore Var E[ MJA |J] = 0. Together
reads of the same sequence are entering the normalizing constant as indepen-
dently sampled sequences, if the estimators are computed in a standard way for
pooled samples. Sequencing errors also lead to bias, and if a minimum minor
allele frequency is required in order to make sequencing errors rare, this needs to
be taken into account. For individual sequencing, the effect of omitting single-
tons has been studied by Knudsen and Miyamoto (2009) as well as Achaz (2008).
Based on the expected values of Tajima’s π and Watterson’s θ, we introduce
modified normalizing constants that make the resulting estimators unbiased un-
der neutrality. These bias corrected estimators will then be compared with those
obtained from individual sequencing. (See the RESULTS section.)
We first derive a bias correction for Tajima’s π and start by considering a
locus for which M reads are available. We do not consider sequencing errors for
the moment, and focus on the bias that is caused by possibly reading the same
sequence more than once. Let ∆ij denote the number of differences between the
sequences i and j at this locus that are selected randomly with replacement from
the pool of n individuals. Now for this locus
X M
Eθ̂π = E ∆ij /
i6=j
2
= E∆IJ
= E[∆IJ |I = J]P(I = J) + E[∆IJ |I 6= J]P(I 6= J)
= 0 + θP(I 6= J)
n−1
(11) = θ .
n
n
Therefore n−1 θ̂π will be unbiased, if we neglect sequencing errors. Since this
bias correction only depends on the size n of the pool and not on the coverage
by reads, a bias corrected version of Tajima’s π for the entire sequence can be
obtained by adding up individual values of θ̂π,l for all loci and then multiplying
n n
, leading to θ̂π∗ = n−1
P
by n−1 l θ̂π,l .
In order to also correct for sequencing errors, two approaches seem feasible. If
an unbiased estimate for the sequencing errors is available, such an estimate could
be used to correct θ̂π∗ . Analogous to Achaz (2008, equation (1)) for the standard
n
experimental setup, θ̂π∗ −2 n−1 µ̂err will be unbiased, if µ̂err is an unbiased estimate
of the number of reading errors per sequence. Introducing µ̂err will obviously add
to the variance of the resulting estimator and the overall performance will depend
on the accuracy of µ̂err . Another way to take into account sequencing errors is
to require a minimum minor allele frequency b for including a segregating site
in the analysis, and to ignore sequencing errors subsequently. The idea is that
sequencing errors will be rare if b is sufficiently large.
(b)
Again, we first consider a locus for which the coverage is equal to M. Let θ̂π
denote the version of Tajima’s π where the minor allele frequency is required to
NGS EXPERIMENTS WITH POOLED SAMPLES 9
(b)
be at least b. Notice that θ̂π = θ̂π for b = 1. With Km denoting the number of
(b)
sites where the derived allele A has frequency m, θ̂π may be written as
−1 M −b
(b) M X
θ̂π = Km m(M − m)
2 m=b
for a locus for which M reads are available (see section 1.4 in Durrett (2008)).
Let cn = 1/ n−1 −1
P
i=1 i , and let furthermore XM denote the number of A alleles
among the reads, and Yn the number of A alleles in the pool. Then
M
P (XM = m|Yn = r) = (r/n)m (1 − r/n)M −m
m
and under neutrality P (Yn = r) = r −1 /cn . With cn θ being the expected number
of segregating sites in the pool,
−1 M n−1
−b X
(b) M X
(12) E(θ̂π ) = cn θ m(M − m)P (XM = m|Yn = r)P (Yn = r)
2 m=b r=1
(b)∗
Dividing θ̂π by the total length of the considered sequence, an estimator for the
scaled mutation parameter per base results.
We now derive a bias correction for Watterson’s estimator, again first focusing
on a locus with coverage M. We consider a version of Watterson’s estimator that
requires a minimum minor allele frequency b. For b = 1 we use all segregating
sites, and versions that protect against sequencing errors can be obtained by
choosing b > 1. Let Sb denote the number of segregating sites found in the M
sequence reads from the pool for which the minor allele frequency is at least b.
Then
(b) Sb
(16) θ̂W := PM −1
i=1 1/i
Let F(B) (x, M, p) denote the probability that a binomial random variable X sat-
isfies P (X ≤ x) for M trials with success probability p. In particular for p = r/n,
x
X M r i r M −i
F(B) (x, M, r/n) = 1− .
i=0
i n n
PM −1 (b)
Recall furthermore that cM = i=1 i−1 . Then a bias corrected version of θ̂W
for b ≥ 1 is given as
(b)
(b)∗ θ̂W cM
(18) θ̂W = Pn−1 1
.
r=1 [F(B) (M − b, M, r/n) − F(B) (b − 1, M, r/n)] r
(b)∗
As with Tajima’s π, θ̂W can be easily adapted to work with longer sequences.
For this purpose, partition the sequence into L loci such that for each locus a
constant number of reads Ml is available, and obtain the bias corrected Watterson
(b)∗
estimate θ̂W,l separately for each locus l. Then
L
(b)∗ (b)∗
X
(19) θ̂W = θ̂W,l ,
l=1
NGS EXPERIMENTS WITH POOLED SAMPLES 11
(b)∗
provides an estimate of the overall scaled mutation parameter. Dividing θ̂W by
the total length of the considered sequence, an estimator for the scaled mutation
parameter per base results.
Results
SNP Detection.
For many biological applications SNP genotyping provides a cost effective ap-
proach, and SNP discovery is the first step required. We compared the efficiency
of SNP discovery using an approach in which each individual is sequenced sepa-
rately with a pooling approach. The panels of Figure 1 show that the comparative
efficiency of pooling depends both on the expected coverage and on the minimum
number of reads for allele calling used for error protection. While pooling experi-
ments provide a higher probability of SNP detection in most cases, it is expected
to be less efficient, if both the coverage is small and a a high minimum number
of reads is required. This is not entirely unexpected, since an increased number
of reads required for the inference of the minor allele reduces the probability of
detecting SNPs in a pooling experiment. The higher the expected coverage, the
more inefficient individual sequencing becomes. As long as not chosen too small,
the size of the pool seems to play a less important role. Figure 2 addresses the
problem of wrongly identifying a sequencing error as a SNP. Irrespective of the
assumed model of sequencing errors (see methods section for further details), a
high probability of sequencing errors makes SNP calling from pools highly unre-
liable. On the other hand, if sequencing error rates are reduced (e.g. by quality
filtering), a suitable lower bound on the minimum allele frequency for detecting
a SNP makes pooling very reliable for the identification of SNPs. Interestingly,
in some cases, we found pooling to result in fewer erroneous SNP calls than
individual sequencing.
Allele frequency inference.
In population genetics, the allele frequency spectrum is of central interest. Es-
timating the allele frequency spectrum of a population is subject to sampling
variation. In an individual based sequencing strategy, most of the sampling vari-
ation comes from the selection of individuals used for DNA sequencing. The
advantage of the pooling approach is that this sampling error can be dramati-
cally reduced by including a large number of individuals in the pool. On the
other hand, a second level of sampling error arises in the pooling approach from
the fact that not all chromosomes in the pool are sequenced and some chromo-
somes may be sequenced more than once. We start by discussing the situation
where individuals contribute equal amounts of probe material and refer to the
last paragraph of the section for the case when this assumption is violated.
In the methods section, we obtained expression (10) for the ratio of the vari-
ances of the estimated relative allele frequency both for a pooling experiment Rp
(pool size n), and a classical experiment with individual sequencing Rc . For a
12 NGS EXPERIMENTS WITH POOLED SAMPLES
large enough expected coverage λ and with k individuals sequenced, this equa-
tion can be approximated by the following quick rule of thumb: Pooling will lead
to a smaller variance for those experimental setups that satisfy 1/λ + nk < 1 or
equivalently n/(n − k) < λ. Thus a case where pooling provides a better estimate
of the allele frequency is when the pool contains more than twice the number
of separately sequenced individuals and the coverage λ per separately sequenced
individual is at least two. For larger pools smaller values of λ will be sufficient.
So far we compared the individual based and pooling strategy only for the same
number of sequenced reads. Alternatively, the superiority of the pooling approach
could be expressed by the reduction of sequencing costs. Figure 3 compares the
pooling approach to sequencing of individuals when both methods provide the
same accuracy for allele frequency estimates. Suppose that k individuals are
sequenced separately, each at an expected coverage λ. Then k ∗ indicates the
cost in single genome sequencing equivalents that results in the same accuracy
as sequencing k genomes individually. If, for instance, k = 20 and k ∗ = 10, then
pooling would give the same accuracy with half the sequencing effort, correspond-
ing to an individual sequencing project with 10 instead of 20 individuals. Figure 3
clearly indicates that larger pool sizes increase the advantage of sequencing pools.
A higher sequence coverage (λ) for sequencing of individuals further improves the
cost effectiveness of pooling.
In genome-wide association studies (GWA’s), the association between allele
frequencies and traits (diseases) is investigated. A possible approach is to test
whether alleles have different frequencies in two pools that differ with respect to
the trait of interest, (see Sham et al. (2002)). Since the ratio of variances (10)
does not depend on the allele frequencies in the sub-populations, the standard
deviation entering the test statistic will differ by the square root of (10) between
a pooling and a classical experimental setup. If the square root of (10) is 1/2
(say), the shift of the expected value of the test statistic under the alternative
will be twice as large in a pooling experiment: Overall pooling will be the more
powerful approach, whenever the variance ratio is smaller than one (see (10)). It
should be noted however that the variance of the pooling experiment will become
larger if individuals contribute unequal amounts of probe material. This issue
will be addressed in the last paragraph of this section.
Estimating population genetic parameters.
We now compare the estimation of the scaled mutation parameter using Watter-
son’s θ and Tajima’s π under our two experimental setups. For this purpose, we
simulated 100 samples under neutrality with mutation parameter θ = 10, using
the ms software (Hudson, 2002).
(./ms 500 100 -t 10 > ms.out)
where 500 is the number of sequences generated. For separate sequencing, we took
random sub-samples of size k = 10 from each sample, thus simulating separate
sequencing of 10 individuals each with an expected number λ of reads. With
NGS EXPERIMENTS WITH POOLED SAMPLES 13
pooling, we took samples of size n out of the 500 simulated sequences. From
this pool, reads were taken independently for each locus l by making a random
number of draws Ml with replacement. The quantities Ml were chosen according
to a Poisson distribution with expected value kλ.
For Tajima’s π, we used the bias correction (14) for individual loci and added
the estimates across loci using (15)). For Watterson’s θ, the bias has been cor-
rected using formula (18) for each locus.
Neglecting sequencing errors for the moment, it turns out that the pooling
approach with bias correction leads to more accurate estimates of θ and π, pro-
vided that the size of the pool is large enough. For small pools, multiple reads of
the same chromosome become more common, which affects the accuracy of the
estimates negatively. (Figure 5.)
We now investigate the pooling approach when including a protection against
sequencing errors by removing all segregating sites where the minor allele has
frequency x satisfies x = 1 or alternatively x ≤ 2. Again, the normalizing
constants have been adapted in order to avoid bias. Let b denote the minimum
required minor allele frequency.
Figure 5 shows the relative advantage of pooling conditional on different min-
imum minor allele frequencies. Pooling still leads to a decreased variance under
neutrality as long as the pool size is large enough. Not unexpectedly, the reduc-
(b)∗
tion in variance is now somewhat smaller for Watterson’s θ̂W . The increase in
(b)∗
the variance of Tajima’s θ̂π is much smaller, since frequency one minor alleles
receive a low weight in the calculation of π.
Unequal amounts of probe material. One obvious source of error in the
pooling approach is the heterogeneity in DNA amounts due to measurement
errors. In experiments that rely on PCR amplification, the heterogeneity can
be expected to be particularly strong.
Individuals for which a larger DNA amount has been included in the DNA
pool, will be over-represented, which potentially causes a change in allele fre-
quency estimates. This affects the bias and the variance also for our considered
population genetic summary statistics.
To investigate the sensitivity of population genetic estimates based on pool-
ing experiments, we simulated a scenario involving unequal amounts of probe
material. We set the expected amount of probe material to one, and allowed for
log-normally distributed multiplicative deviations from this expected value. More
specifically, the deviation factors have been chosen independently for each indi-
vidual contributing to the pool according to exp(Xi ), where Xi (1 ≤ i ≤ n) are
normal N(0, log(scale)) random variables. Thus the median amount of probe ma-
terial is always equal to one. If the deviation factor has a value of exp(Xi ) = 1.5,
this means that the respective individual will have a 50% higher chance of being
sequenced, than another with a factor of exp(Xi ) = 1. Similarly, a value of 0.8
means a 20% decreased chance of being read.
14 NGS EXPERIMENTS WITH POOLED SAMPLES
As our first scenario (scale = 2), slightly more than 30% of all individuals
differed at least twofold from the median. In other words, for a pool of size
n = 100, the most abundant individual contributed about sixteen times the probe
material of the least abundant individual. We also simulated a more extreme
scenario (scale = 8) where about 30% of the individuals differed at least eightfold
from the median. As further parameters we chose λ = 30, k = 10, n ∈ [5, 200].
As the amount of heterogeneity in the sample will usually be unknown, we
applied the same bias correction as for equal amounts of probe material. We
measured the deviation from the true θ by the mean squared error, as this ac-
counts for bias and variance.
Figure 6 displays the effect of heterogeneity in probe material on the bias and
the variance of Tajima’s π and Watterson’s θ. Although both bias and variance
change noticably for higher levels of heterogeneity, these effects cancel out to
a large extent. Thus the overall performance measured in terms of the mean
squared error
(20) MSE = Bias2 + Var
changes only marginally even for a large level of heterogeneity (scale = 8), see
Figure 7. This effect can be explained by shrinkage that leads to improved es-
timates of the mutation parameter θ by permitting for some bias (Futschik and
Gach, 2008). Interestingly, even for a high level of heterogeneity in probe mate-
rial (scale =8), the performance (measured in terms of the MSE) changes only
marginally.
Heterogeneity in probe material also affects the accuracy of the estimated allele
frequencies, as the variance of the estimator based on a pooled sample becomes
larger. However, this effect can be kept small, by choosing a pool of a large
enough size. This is illustrated in Figure 8, where it can be seen that pooling
leads for large enough pool sizes eventually to smaller variances even for a high
level of heterogeneity in probe material (scale =8).
Discussion
Over the past decades we have been witnessing a continuous turnover of molec-
ular markers used in genetical research. To a large extent this turnover has been
driven by the advances in molecular biology and technology. With the arrival of
the second generation sequencing technologies this race is about to come to an
end - rather than relying on a more or less representative fraction of the genome,
it has come into reach to have full genomic sequences available for multiple indi-
viduals.
With further technological advances, it is anticipated that it will become pos-
sible to sequence individual genomes at a cost that allows even small laboratories
to perform population analyses on a genome scale. Currently, this is not pos-
sible as the costs are still too high. In this study, we showed that sequencing
NGS EXPERIMENTS WITH POOLED SAMPLES 15
This work has been supported by a WWTF grant to AF and CS as well as FWF
Grants (P 19832-B11, L403-B11) awarded to CS. Special thanks to C. Kosiol, N.
de Maio, and R. Kofler for helpful comments on earlier versions of the manuscript.
We are grateful to A. Vasemägi and J. Wolf on general discussions about pooling
for NGS, and also thank the reviewers for helpful comments.
16 NGS EXPERIMENTS WITH POOLED SAMPLES
References
Achaz, G., 2008. Testing for neutrality in samples with sequencing errors. Genet-
ics 179, 1409–1424.
Durrett, R., 2008. Probabiliy models for DNA Sequence Evolution. Springer, New
York.
Eberle, M., Kruglyak, L., 2000. An analysis of strategies for discovery of single-
nucleotide polymorphisms. Genet. Epidemiol. 19, S29–S35.
Erlich, Y., Chang, K., Gordon, A., Ronen, R., Navon, O., Rooks, M., Han-
non, G. J., 2009. Dna sudoku-harnessing high-throughput sequencing for mul-
tiplexed specimen analysis. Genome Research 19, 1243–1253.
Futschik, A., Gach, F., 2008. On the inadmissibility of watterson’s estimate.
Theoretical Population Biology 73, 212–221.
Holt, K., Teo, Y., Li, H., Nair, S., Dougan, G., Wain, J., Parkhill, J., 2009.
Detecting snps and estimating allele frequencies in clonal bacterial populations
by sequencing pooled dna. Bioinformatics 25, 2074–2075.
Hudson, R. R., 2002. Generating samples under a wright-fisher neutral model of
genetic variation. Bioinformatics 18, 337–338.
Jiang, R., Tavaré, S., Marjoram, P., 2009. Population genetic inference from
resequencing data. Genetics 181, 187–197.
Knudsen, B., Miyamoto, M. M., 2009. Accurate and fast methods to estimate
the population mutation rate from error prone sequences. BMC Bioinformatics
10:247, doi:10.1186/1471–2105–10–247.
Lynch, M., 2008. Estimation of nucleotide diversity, disequilibrium coefficients,
and mutation rates from high-coverage genome-sequencing projects. Mol. Biol.
Evol. 25, 2421–2431.
Lynch, M., 2009. Estimation of allele frequencies from high-coverage genome-
sequencing projects. Genetics 182, 295–301.
Palmieri, N., Schlötterer, C., 2009. Mapping accuracy of short reads from mas-
sively parallel sequencing and the implications for quantitative expression pro-
filing. PloS one 4, e6323+.
Sham, P., Bader, J. S., Craig, I., M., O., M., O., 2002. Dna pooling: A tool for
large-scale association studies. Nature Rev. Genet. 3, 862–871.
Van Tassell, C. P. P., Smith, T. P. L. P., Matukumalli, L. K. K., Taylor, J.
F. F., Schnabel, R. D. D., Lawley, C. T. T., Haudenschild, C. D. D., Moore, S.
S. S., Warren, W. C. C., Sonstegard, T. S. S., 2008. Snp discovery and allele
frequency estimation by deep sequencing of reduced representation libraries.
Nat. Methods.
NGS EXPERIMENTS WITH POOLED SAMPLES 17
1.0
1.0
0.8
0.8
prob of SNP detection
0.6
0.4
0.4
0.2
0.2
0.0
0.0
0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5
1.0
0.8
0.8
prob of SNP detection
0.6
0.4
0.4
0.2
0.2
0.0
0.0
0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5
1.0
0.8
0.8
prob of SNP detection
0.6
0.4
0.4
0.2
0.2
0.0
0.0
0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5
0
log_10−prob. of false SNP calling
−2
−4
−4
−6
−6
−8
−8
−4.0 −3.5 −3.0 −2.5 −2.0 −1.5 −1.0 −4.0 −3.5 −3.0 −2.5 −2.0 −1.5 −1.0
0
log_10−prob. of false SNP calling
−2
−4
−4
−6
−6
−8
−8
−4.0 −3.5 −3.0 −2.5 −2.0 −1.5 −1.0 −4.0 −3.5 −3.0 −2.5 −2.0 −1.5 −1.0
0
log_10−prob. of false SNP calling
−2
−4
−4
−6
−6
−8
−8
−4.0 −3.5 −3.0 −2.5 −2.0 −1.5 −1.0 −4.0 −3.5 −3.0 −2.5 −2.0 −1.5 −1.0
40
40
o
30
30
corresponding k with pooling
20
o
20
o
o o
o o
o o
o o
o ++ o
o ++ o
++
10
o + o
10
o + o
o ++ xx o
oo ++ xxxx oo
++ +++
+
ooo ++++ xxxxxx oo
oo
+++
o +++ xxxxx oo +++++++ xxxxxx
o
o+++xxxx oo+ +xxxxxxxx
o+ oo+ ++
o+
o+ xxx o
o+++ +
xxxxxxxx
o+ xx
oo
x+
o
x+xxxx
x+ o
o+ o
o+
+ +
xx
o
o++
xx
x+
o
x+
o x+
o
x+
o o
++
xxo
o+
xxo
o+
+ +
xxxx
x+
o
+
0
0
5 10 15 20 25 30 35 40 5 10 15 20 25 30 35 40
k k
10
10
8
8
mean estimate
mean estimate
6
6
4
4
2
2
0
0
0 50 100 200 0 50 100 200
n n
variance ratio
0 50 100 150 200 0 50 100 150 200
n n
variance ratio
0 50 100 150 200 0 50 100 150 200
n n
variance ratio
n n
10
10
bias (solid), sd (dashed)
8
6
6
4
4
2
2
0
0
0 50 100 150 200 0 50 100 150 200
n n
50
30
30
0 10
0 10
n n
1.5
1.5
MSE ratio
MSE ratio
1.0
1.0
0.5
0.5
0.0
0.0
0 50 100 150 200 0 50 100 150 200
n n
1.5
MSE ratio
MSE ratio
1.0
1.0
0.5
0.5
0.0
0.0
n n
3.0
2.5
2.0
variances ratios
1.5
1.0
0.5
0.0