
Machine Learning II

Part I: An overview of Bayesian Nonparametrics


Predictive approach to BNP

Igor Prünster
Bocconi University

Exchangeability & Prediction


de Finetti’s representation theorem

A sequence of X–valued observations (X_n)_{n≥1} is exchangeable if and only if

    P[X_1 ∈ A_1, . . . , X_n ∈ A_n] = ∫_P ∏_{i=1}^n P(A_i) Q(dP)

for any n ≥ 1, where P is the space of probability measures on X.

Prediction within de Finetti's framework:

▶ Singular (or one–step) prediction

      P[X_{n+1} ∈ A | X^(n)] = ∫_P P(A) Q(dP | X^(n)),

  where the left–hand side is the predictive distribution and Q( · | X^(n)) is the
  posterior distribution; throughout we set X^(n) := (X_1, . . . , X_n).

▶ m–step prediction

      P[X_{n+1} ∈ A_1, . . . , X_{n+m} ∈ A_m | X^(n)] = ∫_P ∏_{i=1}^m P(A_i) Q(dP | X^(n)).


Two key properties of the Dirichlet process

Recall the following key properties of the Dirichlet process:


▶ For the number of distinct values (species, agents, clusters, components,
  etc.) K_n generated by a sample X^(n) we have

      P[K_n = k] = θ^k |s(n, k)| / (θ)_n,        E[K_n] = ∑_{i=1}^n θ/(θ + i − 1),

  with (θ)_n := θ(θ + 1) · · · (θ + n − 1) and |s(n, k)| the signless Stirling
  number of the first kind.
▶ The predictive distributions are of the form

      P[X_{n+1} ∈ · | X^(n)] = θ/(θ + n) · P*( · ) + n/(θ + n) · (1/n) ∑_{i=1}^{K_n} N_i δ_{X_i*}( · ),

  where θ/(θ + n) = P[X_{n+1} = “new” | X^(n)], n/(θ + n) = P[X_{n+1} = “old” | X^(n)],
  P* is the prior guess and (1/n) ∑_{i=1}^{K_n} N_i δ_{X_i*} is the empirical measure.
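To make the predictive rule concrete, here is a minimal Python sketch (not part of the original slides) that simulates a sample from a DP via the sequential predictive scheme above (the Blackwell–MacQueen urn) and compares the simulated number of distinct values with the exact formula for E[K_n]; the standard normal base measure P* and the value θ = 5 are arbitrary illustrative choices.

```python
import numpy as np

def dp_urn_sample(n, theta, base_sampler, rng):
    """Sample X_1,...,X_n sequentially from the DP predictive (Blackwell-MacQueen urn)."""
    values, counts = [], []                      # distinct values X_i* and frequencies N_i
    for m in range(n):                           # m = current sample size
        if rng.random() < theta / (theta + m):   # "new" with probability theta/(theta+m)
            values.append(base_sampler(rng))
            counts.append(1)
        else:                                    # "old" value X_j* with probability N_j/(theta+m)
            j = rng.choice(len(values), p=np.array(counts) / m)
            counts[j] += 1
    return values, counts

theta, n = 5.0, 1000
rng = np.random.default_rng(0)
k_sim = [len(dp_urn_sample(n, theta, lambda r: r.normal(), rng)[0]) for _ in range(100)]
e_kn = sum(theta / (theta + i) for i in range(n))   # E[K_n] = sum_{i=1}^n theta/(theta+i-1)
print(np.mean(k_sim), e_kn)                         # the two should be close
```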


m–step prediction for the Dirichlet process

Given a basic sample X^(n), prediction of various features of an additional
sample X^(m) := (X_{n+1}, . . . , X_{n+m}) is of interest. In particular,

    K_m^(n) := K_{n+m} − K_n

is the number of new species to be recorded in X^(m), given X^(n).


In the DP case [Favaro, P & Walker, 2011] one has

    P[K_m^(n) = j | X^(n)] = θ^j (θ)_n |s(m, j, n)| / (θ)_{n+m},        E[K_m^(n) | X^(n)] = ∑_{i=1}^m θ/(θ + n + i − 1),

with |s(n, k, r)| denoting the non–central version of the signless Stirling number.


=⇒ The only sample information affecting the distribution of K_m^(n) | X^(n) is the
sample size n; it depends neither on K_n nor on N_1, . . . , N_{K_n}!
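A one–line consequence of the formula above, worth making explicit in code (a sketch, with arbitrary numbers): the DP prediction of the number of new species uses only n, m and θ.

```python
def expected_new_species_dp(n, m, theta):
    """E[K_m^(n) | X^(n)] for the Dirichlet process: depends only on n, m and theta."""
    return sum(theta / (theta + n + i) for i in range(m))   # i = 0,...,m-1 plays the role of i-1

# Two samples of the same size n, however different their K_n, yield the same prediction.
print(expected_new_species_dp(n=100, m=50, theta=5.0))
```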


Since P[X_{n+1} = “new” | X^(n)] = θ/(θ + n) > 0 for any sample size n, clearly K_n will
diverge as the sample size n → ∞. But what is its growth rate?
▶ In the unconditional case, by Korwar and Hollander (1973), one has

      K_n / log n  →  θ   a.s.   (n → ∞).

▶ In the conditional case, given a fixed sample X^(n), one has

      K_m^(n) / log m | X^(n)  →  θ   a.s.   (m → ∞).

Remark: Ideally one would like a model with a flexible growth rate which
depends on the model parameters (e.g. n^σ with parameter σ ∈ (0, 1)). Instead,
the DP displays a logarithmic growth and such a behaviour is unaffected by the
sample X^(n) =⇒ highly restrictive and inappropriate in applications [e.g.
linguistics (Teh, 2006), certain bipartite graphs (Caron, 2012), network models
(Caron & Fox, 2017) and species sampling].
Can such a model really be considered nonparametric? What is responsible for
such a restrictive behaviour?

It all boils down to P[X_{n+1} = “new” | X^(n)] = θ/(θ + n)!
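A quick numerical check of the logarithmic rate (only the exact formula for E[K_n] from the earlier slide is used; θ = 5 is an arbitrary choice):

```python
import math

theta = 5.0
for n in (10**2, 10**3, 10**4, 10**5):
    e_kn = sum(theta / (theta + i) for i in range(n))   # E[K_n] = sum_{i=1}^n theta/(theta+i-1)
    print(n, round(e_kn / math.log(n), 3))              # the ratio slowly approaches theta
```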
Gibbs–type priors & the Pitman–Yor process

Probability of discovering a new species


As seen, a key quantity in the analysis of discrete nonparametric priors is the
probability of discovering a new species

P[Xn+1 = “new” | X (n) ]. (∗)

Fundamental Characterization:
Based on (∗), discrete P̃'s are classified into 3 categories [De Blasi et al., 2015]:
(a) P[Xn+1 = “new” | X (n) ] = f (n, model parameters)
⇐⇒ depends on n but not on Kn and Nn = (N1 , . . . , NKn )
⇐⇒ Dirichlet process;
(b) P[Xn+1 = “new” | X (n) ] = f (n, Kn , model parameters)
⇐⇒ depends on n and Kn but not on Nn = (N1 , . . . , NKn )
⇐⇒ Gibbs–type priors [Gnedin & Pitman, 2006];
(c) P[Xn+1 = “new” | X (n) ] = f (n, Kn , Nn , model parameters)
⇐⇒ depends on all information conveyed by the sample i.e. n, Kn and
Nn = (N1 , . . . , NKn )
⇐⇒ serious tractability issues.
=⇒ Let’s go for the intermediate case which exhibits a richer predictive
structure than the DP!

Gibbs–type priors

Q is a Gibbs–type prior of order σ ∈ (−∞, 1) if and only if it gives rise to
predictive distributions of the form

    P[X_{n+1} ∈ · | X^(n)]
      = V_{n+1,K_n+1}/V_{n,K_n} · P*( · )
        + (1 − V_{n+1,K_n+1}/V_{n,K_n}) · ∑_{i=1}^{K_n} (N_i − σ) δ_{X_i*}( · ) / (n − σK_n),

i.e. the prior guess P* weighted by P[X_{n+1} = “new” | X^(n)] plus a weighted
empirical measure weighted by P[X_{n+1} = “old” | X^(n)], where P* is diffuse and
{V_{n,j} : n ≥ 1, 1 ≤ j ≤ n} is a set of weights which satisfy the recursion

    V_{n,j} = (n − jσ) V_{n+1,j} + V_{n+1,j+1}.      (♢)

=⇒ A Gibbs–type prior is completely characterized by the choice of P*, of σ < 1 and
of the set of weights V_{n,j}.
=⇒ Crucially, P[X_{n+1} = “new” | X^(n)] now depends on both the sample size n
and the number of distinct values K_n in the sample.
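As an illustration (a sketch with the V's left abstract, not code from the slides), the one–step Gibbs–type predictive can be written generically once a routine computing V_{n,j} is supplied; the function below also checks the identity implied by the recursion (♢). The usage example plugs in the standard Dirichlet–process weights V_{n,j} = θ^j/(θ)_n, the σ = 0 case.

```python
from math import prod

def gibbs_predictive(n, k_n, counts, sigma, V):
    """One-step Gibbs-type predictive.

    V(n, j) is any routine returning the weight V_{n,j}; counts = (N_1,...,N_{K_n}), sum(counts) = n.
    Returns P[X_{n+1} = "new"] and the probabilities of tying each observed distinct value.
    """
    p_new = V(n + 1, k_n + 1) / V(n, k_n)
    # the recursion (diamond) is equivalent to: 1 - p_new = V_{n+1,K_n} (n - sigma*K_n) / V_{n,K_n}
    assert abs((1.0 - p_new) - V(n + 1, k_n) * (n - sigma * k_n) / V(n, k_n)) < 1e-10
    tie_probs = [(1.0 - p_new) * (N_i - sigma) / (n - sigma * k_n) for N_i in counts]
    return p_new, tie_probs

# usage with the Dirichlet-process weights V_{n,j} = theta^j / (theta)_n  (the sigma = 0 case)
theta = 2.0
V_dp = lambda n, j: theta ** j / prod(theta + i for i in range(n))
print(gibbs_predictive(n=4, k_n=2, counts=[3, 1], sigma=0.0, V=V_dp))   # P[new] = theta/(theta+4)
```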


The Pitman-Yor process & the NGG process


By defining, for σ ∈ [0, 1) and θ > −σ, or σ < 0 and θ = r|σ| with r ∈ N,

    V_{n,k} = ∏_{i=1}^{k−1} (θ + iσ) / (θ + 1)_{n−1},

one obtains the two parameter Poisson–Dirichlet process, aka Pitman–Yor (PY)
process [Pitman & Yor, 1997], which yields

    P[X_{n+1} ∈ · | X^(n)] = (θ + σK_n)/(θ + n) · P*( · )
                            + (n − σK_n)/(θ + n) · ∑_{i=1}^{K_n} (N_i − σ) δ_{X_i*}( · ) / (n − σK_n).

=⇒ if σ = 0, the PY process reduces to the Dirichlet process and (θ + σK_n)/(θ + n)
reduces to θ/(θ + n).
The normalized generalized gamma process (NGG) arises with

    V_{n,k} = (e^β σ^{k−1} / Γ(n)) ∑_{i=0}^{n−1} (n−1 choose i) (−1)^i β^{i/σ} Γ(k − i/σ; β),

where β > 0, σ ∈ (0, 1) and Γ(x; a) denotes the incomplete gamma function. If
σ = 1/2 it reduces to the normalized inverse Gaussian process (N–IG) [Lijoi,
Mena & P, 2005 & 2007b].
=⇒ In the following we restrict attention to the PY process but most results
carry over (with minor modifications) to general Gibbs–type priors
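A short self-contained check (a sketch; θ = 1, σ = 0.5 are arbitrary) that builds the PY weights, verifies the recursion (♢) numerically and recovers P[X_{n+1} = “new” | X^(n)] = (θ + σK_n)/(θ + n):

```python
from math import prod

def V_py(n, k, theta, sigma):
    """Pitman-Yor weights: prod_{i=1}^{k-1}(theta + i*sigma) / (theta + 1)_{n-1}."""
    num = prod(theta + i * sigma for i in range(1, k))
    den = prod(theta + 1 + j for j in range(n - 1))          # rising factorial (theta+1)_{n-1}
    return num / den

theta, sigma = 1.0, 0.5
# the recursion V_{n,j} = (n - j*sigma) V_{n+1,j} + V_{n+1,j+1} holds exactly
for n in range(2, 6):
    for j in range(1, n + 1):
        lhs = V_py(n, j, theta, sigma)
        rhs = (n - j * sigma) * V_py(n + 1, j, theta, sigma) + V_py(n + 1, j + 1, theta, sigma)
        assert abs(lhs - rhs) < 1e-12

# probability of a new value given n observations with K_n = k distinct ones
n, k = 10, 4
print(V_py(n + 1, k + 1, theta, sigma) / V_py(n, k, theta, sigma), (theta + sigma * k) / (theta + n))
```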

A closer look at the predictive structure


The Gibbs structure allows one to view the predictive distribution as the result of two steps:
(1) X_{n+1} is a new species with probability (θ + σK_n)/(θ + n), or an “old” one
    in {X_1*, . . . , X_{K_n}*} with probability (n − σK_n)/(θ + n).
    =⇒ depends on n and K_n but not on the frequencies N_n = (N_1, . . . , N_{K_n})
    =⇒ P[X_{n+1} = “new” | X^(n)] is monotonically increasing or decreasing in
    K_n according to whether σ > 0 or σ < 0, respectively.
(2) (i) Given X_{n+1} is new, it is sampled independently from P*.
    (ii) Given X_{n+1} is a tie, it coincides with X_i* with probability (N_i − σ)/(n − σK_n).
    =⇒ A reinforcement mechanism driven by σ takes place among the “old”
    values. For instance, if N_1 = 2 and N_2 = 1 then

        P[X_{n+1} = X_1* | X^(n)] / P[X_{n+1} = X_2* | X^(n)] = (2 − σ)/(1 − σ),

    which is < 2 if σ < 0, = 2 if σ = 0 and > 2 if σ > 0; e.g. it equals 1.5 for
    σ = −1, 2 for σ = 0 and 3 for σ = 0.5.

The number of clusters generated by the PY process


The (prior) distribution of the number of clusters [Pitman, 2006] is given by

    P(K_n = j) = ∏_{i=1}^{j−1} (θ + iσ) / ((θ + 1)_{n−1} σ^j) · C(n, j; σ),

with C(n, j; σ) = (1/j!) ∑_{i=0}^{j} (−1)^i (j choose i) (−iσ)_n a generalized factorial
coefficient.

In general, the dependence of the distribution of Kn on the prior parameters is:


▶ σ controls the “ flatness ” (or variability) of the (prior) distribution of Kn .
▶ θ controls the location of the (prior) distribution of K_n.

Recall that K_m^(n) is the number of new species to be recorded in the additional
sample of size m, given X^(n) featuring K_n = j distinct values. One has that
    P[K_m^(n) = k | X^(n)] = (θ + 1)_{n−1} ∏_{i=j}^{j+k−1} (θ + iσ) / ((θ + 1)_{n+m−1} σ^k) · C(m, k; σ, −n + jσ),

and also that the expected number of new species is

    E[K_m^(n) | X^(n)] = (j + θ/σ) [ (θ + n + σ)_m / (θ + n)_m − 1 ].
See Lijoi, Mena & P (2007a) and Favaro, Lijoi, Mena & P (2009).
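The following sketch is a direct, unoptimized transcription of the displayed formulas (the generalized factorial coefficients are computed via the alternating sum, which becomes numerically delicate for large n; the parameter values are arbitrary):

```python
from math import comb, prod

def rising(x, n):
    """Rising factorial (x)_n = x (x+1) ... (x+n-1)."""
    return prod(x + i for i in range(n))

def C_gen(n, j, sigma):
    """Generalized factorial coefficient C(n, j; sigma)."""
    s = sum((-1) ** i * comb(j, i) * rising(-i * sigma, n) for i in range(j + 1))
    return s / prod(range(1, j + 1))                 # divide by j!

def py_prior_Kn(n, j, theta, sigma):
    """P(K_n = j) under a PY(theta, sigma) prior."""
    num = prod(theta + i * sigma for i in range(1, j))
    return num / (rising(theta + 1, n - 1) * sigma ** j) * C_gen(n, j, sigma)

def py_expected_new(n, j, m, theta, sigma):
    """E[K_m^(n) | X^(n)] given K_n = j observed distinct species."""
    return (j + theta / sigma) * (rising(theta + n + sigma, m) / rising(theta + n, m) - 1.0)

theta, sigma, n = 1.0, 0.5, 20
print(sum(py_prior_Kn(n, j, theta, sigma) for j in range(1, n + 1)))   # ~1, up to rounding
print(py_expected_new(n=50, j=10, m=100, theta=theta, sigma=sigma))
```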

Prior distribution of the number of clusters as σ varies

[Figure: prior distributions of the number of clusters K_n corresponding to the PY
process with n = 50, θ = 1 and σ = 0.2, 0.3, . . . , 0.8.]


Asymptotics for the number of clusters in the PY case


▶ If σ < 0 and θ = r|σ|, then K_n → r a.s. as n → ∞. Also, conditionally on
  X^(n) featuring K_n = j distinct values, K_m^(n) | X^(n) → r − j a.s. as m → ∞.

▶ If σ ∈ (0, 1), one has

      K_n / n^σ  →  Y_{θ/σ}   a.s.   (n → ∞),

  where Y_q, with q ≥ 0, is a suitably generalized Mittag–Leffler random
  variable [Pitman, 2006].
▶ Conditional on X^(n) featuring K_n = j distinct values, as the size m of the
  additional sample diverges,

      K_m^(n) / m^σ | X^(n)  →  Z_{n,j}   a.s.   (m → ∞),

  where Z_{n,j} =_d B_{j+θ/σ, n/σ−j} · Y_{(θ+n)/σ}, with B_{a,b} a beta random variable
  independent of Y_q [Favaro, Lijoi, Mena & P, 2009].
=⇒ In the DP case K_n grows logarithmically: this restriction has been overcome
by allowing a richer predictive structure, and the growth rate now depends on the
model parameter σ.
Species sampling

Data structure in species sampling problems

▶ X^(n) = basic sample of draws from a population containing different
  species (plants, genes, animals, ...). Information:
  ⋄ sample size n and number of distinct species in the sample K_n;
  ⋄ a collection of frequencies N = (N_1, . . . , N_{K_n}) s.t. ∑_{i=1}^{K_n} N_i = n;
  ⋄ the labels (names) X_i* of the distinct species, for i = 1, . . . , K_n.

▶ The information provided by N can also be coded by M := (M_1, . . . , M_n) with

      M_i = number of species in the sample X^(n) having frequency i.

  Note that ∑_{i=1}^n M_i = K_n and ∑_{i=1}^n i M_i = n.

▶ Example: Consider a basic sample such that
  ⋄ n = 10 with j = 4 distinct species and frequencies (n_1, n_2, n_3, n_4) = (2, 5, 2, 1);
  ⋄ equivalently we can code this information as

        (m_1, m_2, . . . , m_10) = (1, 2, 0, 0, 1, . . . , 0),

    meaning that 1 species appears once, 2 appear twice and 1 appears five times.
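The re-coding from frequencies to counts-of-counts is used constantly in what follows; a throwaway helper (the function name is mine, not from the slides) reproduces the example:

```python
from collections import Counter

def counts_of_counts(freqs, n):
    """Map species frequencies (N_1,...,N_k) to M = (M_1,...,M_n), M_i = #species with frequency i."""
    c = Counter(freqs)
    return [c.get(i, 0) for i in range(1, n + 1)]

m = counts_of_counts([2, 5, 2, 1], n=10)
print(m)                                                  # [1, 2, 0, 0, 1, 0, 0, 0, 0, 0]
assert sum(m) == 4 and sum(i * mi for i, mi in enumerate(m, start=1)) == 10
```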


One–step Prediction
▶ Discovery probability estimation: Given a basic sample X (n) , estimate the
probability of discovering at the (n+1)–th sampling step either a new
species or an “ old ” species with frequency r .
▶ Turing estimator [Good, 1953; Mao & Lindsay, 2002]: the probability of
  discovering a new species at the (n+1)–th step is estimated by

      m_1 / n

  and that of a species with frequency r in X^(n) by

      (r + 1) m_{r+1} / n.
▶ Problem: m_{r+1} is used to estimate the probability of discovering a species
  with frequency r =⇒ counterintuitive! It should be based on m_r.
  E.g. if m_5 = 10 and m_6 = 0, the estimated probability of detecting a species
  with frequency 5 would be 0 =⇒ the problem is bypassed by the use of “smoothing
  functions” but, in I.J. Good’s words, it seems like an “adhockery”!
Origin of the problem? In a frequentist nonparametric setup there is no
natural quantity to use for estimating the probability of discovering a new
species and so m1 is used. Hence, the discovery probability of species with
frequency 1 uses m2 and so on.
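A literal transcription of the Turing estimator, applied to the toy data of the previous slide, makes the counterintuitive behaviour visible (the numbers are purely illustrative):

```python
def turing_new(m, n):
    """Turing estimate of the probability of a new species at step n+1: m_1 / n."""
    return m[0] / n

def turing_freq(m, n, r):
    """Turing estimate of the probability of a species with frequency r: (r+1) m_{r+1} / n."""
    return (r + 1) * m[r] / n if r < len(m) else 0.0      # m is 0-indexed, so m[r] = m_{r+1}

m, n = [1, 2, 0, 0, 1, 0, 0, 0, 0, 0], 10
print(turing_new(m, n))            # 0.1
print(turing_freq(m, n, r=4))      # uses m_5 = 1  ->  0.5
print(turing_freq(m, n, r=5))      # uses m_6 = 0  ->  0.0, even though a frequency-5 species exists
```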

BNP approach to discovery probab. [Lijoi, Mena & P, 2007a]

A key advantage of the Bayesian nonparametric approach is that the


predictive structure includes a positive probability of discovering a new
species, value, cluster, agent etc.
Assume the data (Xn )n≥1 are exchangeable with a PY de Finetti measure.
BNP analog of the Turing estimator: given a basic sample X^(n) featuring K_n = j
distinct species with m_1, . . . , m_n s.t. ∑_{i=1}^n i m_i = n:
▶ the probability of discovering a new species is

      P[X_{n+1} = “new” | X^(n)] = (θ + σj)/(θ + n);

▶ the probability of detecting a species with frequency r in X^(n) is

      P[X_{n+1} = “species with frequency r” | X^(n)] = m_r (r − σ)/(θ + n).

=⇒ Probability of sampling a species with frequency r depends, in agreement


with intuition, on mr and also on Kn = j.
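The BNP analogs, again as a minimal sketch on the same toy data (θ = 1, σ = 0.5 are arbitrary, not estimated values):

```python
def bnp_new(n, j, theta, sigma):
    """PY-based probability of a new species at step n+1: (theta + sigma*j) / (theta + n)."""
    return (theta + sigma * j) / (theta + n)

def bnp_freq(m, n, r, theta, sigma):
    """PY-based probability of a species with frequency r at step n+1: m_r (r - sigma) / (theta + n)."""
    return m[r - 1] * (r - sigma) / (theta + n)            # m is 0-indexed, so m[r-1] = m_r

m, n, j = [1, 2, 0, 0, 1, 0, 0, 0, 0, 0], 10, 4
theta, sigma = 1.0, 0.5
print(bnp_new(n, j, theta, sigma))                         # (1 + 0.5*4) / 11
print(bnp_freq(m, n, 5, theta, sigma))                     # based on m_5, as intuition suggests
# the "new" mass and the masses over r = 1,...,n sum to one
assert abs(bnp_new(n, j, theta, sigma)
           + sum(bnp_freq(m, n, r, theta, sigma) for r in range(1, n + 1)) - 1) < 1e-12
```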


More discovery probability estimation problems


Conditionally on a basic sample X (n) , estimate the probability of discovering at
the (n+m+1)–th step either a new species or an “ old ” species with frequency
r without observing the additional sample X (m) := (Xn+1 , . . . , Xn+m ).
Remark. From such an estimator one immediately obtains:
▶ the discovery probability for rare species i.e. the probability of discovering
a species which is either new or has frequency at most τ at the
(n+m+1)–th step =⇒ rare species estimation
▶ an optimal additional sample size: sampling is stopped once the
probability of sampling new or rare species is below a certain threshold
▶ the sample coverage, i.e. the proportion of species in the population
detected with a sample of size n + m.
Frequentist estimators:
▶ Good–Toulmin estimator [Good & Toulmin, 1956; Mao, 2004]: estimator
for the probability of discovering a new species at (n+m+1)–th step.
=⇒ unstable if the size of the additional unobserved sample m is larger
than n (estimated probability becomes either < 0 or > 1).
▶ No frequentist nonparametric estimator for the probability of discovering a
species with frequency r at (n+m+1)–th sampling step is available.

▶ BNP analog of the Good–Toulmin estimator [Favaro, Lijoi, Mena & P,
  2009]: estimator for the probability of discovering a new species at the
  (n+m+1)–th step, with K_n = k,

      P[X_{n+m+1} = “new” | X^(n)] = (θ + kσ)/(θ + n) · (θ + n + σ)_m/(θ + n + 1)_m.

▶ BNP estimator for the probability of discovering a species with frequency
  r at the (n+m+1)–th sampling step [Favaro, Lijoi and P, 2012]:

      P[X_{n+m+1} = “species with frequency r” | X^(n)]
        = ∑_{i=1}^{r} m_i (i − σ)_{r+1−i} (m choose r−i) (θ + n − i + σ)_{m−r+i} / (θ + n)_{m+1}
          + (1 − σ)_r (m choose r) (θ + kσ)(θ + n + σ)_{m−r} / (θ + n)_{m+1}.
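A literal implementation of these two estimators (the rising-factorial helper is repeated so the snippet stands alone; the demo values are arbitrary). For m = 0 both reduce to the one–step formulas of the earlier slide, which is a handy sanity check.

```python
from math import comb, prod

def rising(x, k):
    """Rising factorial (x)_k."""
    return prod(x + i for i in range(k))

def bnp_good_toulmin(n, k, m, theta, sigma):
    """P[X_{n+m+1} = 'new' | X^(n)] with K_n = k observed distinct species."""
    return (theta + k * sigma) / (theta + n) * rising(theta + n + sigma, m) / rising(theta + n + 1, m)

def bnp_freq_r(mvec, n, k, m, r, theta, sigma):
    """P[X_{n+m+1} = 'species with frequency r' | X^(n)]; mvec[i-1] = m_i."""
    den = rising(theta + n, m + 1)
    old = sum(mvec[i - 1] * rising(i - sigma, r + 1 - i) * comb(m, r - i)
              * rising(theta + n - i + sigma, m - r + i) for i in range(1, r + 1)) / den
    new = rising(1 - sigma, r) * comb(m, r) * (theta + k * sigma) * rising(theta + n + sigma, m - r) / den
    return old + new

mvec, n, k = [1, 2, 0, 0, 1, 0, 0, 0, 0, 0], 10, 4
theta, sigma = 1.0, 0.5
print(bnp_good_toulmin(n, k, m=20, theta=theta, sigma=sigma))
print(bnp_freq_r(mvec, n, k, m=20, r=3, theta=theta, sigma=sigma))
```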


Discovery probability at the (n + m + 1)–th sampling step.


[Figure: probability of discovering a new species (y–axis) against the size of the
additional sample (x–axis); PY, Good–Toulmin (GT) and DP estimators for the
anaerobic and aerobic libraries.]

EST data from Naegleria gruberi aerobic and anaerobic cDNA libraries with basic
sample n ≈ 950: Good–Toulmin (GT), DP process and PY process estimators of the
probability of discovering a new gene at the (n + m + 1)–th sampling step for
m = 1, . . . , 2000.

Expected number of new genes in an additional sample of size m

[Figure: expected number of new genes (y–axis) against the size of the additional
sample (x–axis); PY, Good–Toulmin (GT) and DP estimators for the aerobic and
anaerobic libraries.]

EST data from Naegleria gruberi aerobic and anaerobic cDNA libraries with basic
sample n ≈ 950: Good–Toulmin (GT), DP process and PY process estimators of the
number of new genes to be observed in an additional sample of size
m = 1, . . . , 2000.

Some remarks on BNP models for species sampling problems

▶ BNP estimators based on Gibbs–type priors are available for the aforementioned
  and other quantities of interest in species sampling problems.
▶ BNP models correspond to large probabilistic models in which all objects
  of potential interest are modeled jointly and coherently, thus leading to
  intuitive predictive structures
  =⇒ this avoids ad–hoc procedures and incoherencies sometimes connected
  with frequentist nonparametric procedures.
▶ Gibbs–type priors with σ > 0 (recall that they assume an infinite number
  of species) are ideally suited for populations with a large unknown number
  of species =⇒ the typical case in Genomics.
▶ In Ecology the “∞” assumption is often too strong =⇒ Gibbs–type priors with
  σ < 0 (surprising heuristic by–product: by combining Gibbs-type priors
  with σ > 0 and σ < 0 it is possible to identify situations in which frequentist
  estimators work).

Frequentist Consistency

Frequentist Posterior Consistency

“ What if ” or frequentist approach to consistency [Diaconis and Freedman,


1986]: What happens if the data are not exchangeable but i.i.d. from a “true”
P0 ? Does the posterior Q( · |X (n) ) accumulate around P0 as the sample size
increases?
Definition. Q is weakly consistent at P_0 if, for every weak neighbourhood A_ε of P_0,

    Q(A_ε | X^(n)) → 1   a.s.–P_0^∞   as n → ∞,

where P_0^∞ denotes the infinite product measure.
The typical proof strategy for discrete nonparametric priors consists in showing that
▶ E[P̃ | X^(n)] → P_0 a.s.–P_0^∞ as n → ∞ (the key step consists in showing that
  P[X_{n+1} = “new” | X^(n)] → 0 a.s.–P_0^∞);
▶ Var[P̃ | X^(n)] → 0 a.s.–P_0^∞ as n → ∞, by finding a suitable bound on the variance.


Consistency of the PY process


Denote by κ_n the number of distinct values generated by the “true” P_0 in a
sample of size n. We distinguish two cases of P_0:
▶ P_0 is discrete (with finitely or infinitely many support points) =⇒ κ_n/n → 0 as n → ∞.
  The predictive distribution given a sample generated by P_0 is then

      P[X_{n+1} ∈ · | X^(n)] = (θ + σκ_n)/(θ + n) · P*( · )
                              + (n − σκ_n)/(θ + n) · ∑_{i=1}^{κ_n} (N_i − σ)/(n − σκ_n) δ_{X_i*}( · )
                              → P_0( · ),

  since the first weight → 0, the second → 1 and (N_i − σ)/(n − σκ_n) ∼ N_i/n.

▶ P_0 is diffuse (i.e. P_0({x}) = 0 for all x ∈ X) =⇒ κ_n = n and N_i = 1 for all i. Then

      P[X_{n+1} ∈ · | X^(n)] = (θ + σn)/(θ + n) · P*( · )
                              + (n − σn)/(θ + n) · ∑_{i=1}^{n} (1 − σ)/(n − σn) δ_{X_i*}( · )
                              → σ P*( · ) + (1 − σ) P_0( · ),

  since the first weight → σ, the second → 1 − σ and (1 − σ)/(n − σn) = 1/n.
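A two–line numerical illustration of the diffuse case (θ = 1, σ = 0.5 arbitrary): since every observation is distinct, κ_n = n and the predictive mass on “new” tends to σ rather than to 0, which is exactly where the inconsistency comes from.

```python
theta, sigma = 1.0, 0.5
for n in (10, 100, 1000, 10000):
    # diffuse P_0: kappa_n = n, so P[new | X^(n)] = (theta + sigma*n) / (theta + n)
    print(n, (theta + sigma * n) / (theta + n))   # approaches sigma = 0.5, not 0
```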


Moreover, in both cases we have Var[P̃ | X^(n)] → 0 a.s.–P_0^∞ as n → ∞, and hence:
▶ The PY process is consistent for discrete data generating P0 (e.g. species
sampling problems);
▶ The PY process is inconsistent for diffuse data generating P0 .
Similar results hold for general Gibbs–type priors [De Blasi, Lijoi and P, 2013].
Should one worry about the inconsistency result in the case of diffuse P0 ? NO!
Discrete nonparametric priors are designed to model discrete distributions and
should not be used to model data from continuous distributions. In fact, as the
sample size n diverges:
▶ P0 generates (Xn )n≥1 containing no ties with probability 1;
▶ a discrete de Finetti measure generates (Xn )n≥1 containing no ties with
probability 0
=⇒ model and data generating mechanism are incompatible!
=⇒ There are no good or bad priors, but rather models which are
suitable/compatible/designed for a certain phenomenon and not for others!

Some References


• Antoniak (1974). Mixtures of Dirichlet processes with applications to Bayesian


nonparametric problems. Ann. Statist. 2, 1152–1174.
• De Blasi, Favaro, Lijoi, Mena, Prünster & Ruggiero (2015). Are Gibbs-type priors
the most natural generalization of the Dirichlet process? IEEE TPAMI 37, 212-229.
• De Blasi, Lijoi, & Prünster (2013). An asymptotic analysis of a class of discrete
nonparametric priors. Statist. Sinica 23, 1299–1322.
• Caron (2012). Bayesian nonparametric models for bipartite graphs. NIPS’2012.
• Caron & Fox (2017), Sparse graphs using exchangeable random measures. J. R.
Stat. Soc. B 79, 1295-1366.
• Diaconis & Freedman (1986). On the consistency of Bayes estimates. Ann. Statist.
14, 1-26.
• Favaro, Lijoi, Mena & Prünster (2009). Bayesian nonparametric inference for species
variety with a two parameter Poisson-Dirichlet process prior. J. Roy. Statist. Soc. Ser.
B 71, 993–1008.
• Favaro, Lijoi & Prünster (2012). A new estimator of the discovery probability.
Biometrics 68, 1188-96.
• Favaro, Prünster & Walker (2011). On a class of random probability measures with
general predictive structure. Scand. J. Statist. 38, 359–376.
• Ferguson (1973). A Bayesian analysis of some nonparametric problems. Ann.
Statist. 1, 209-30.
• Gnedin & Pitman (2006). Exchangeable Gibbs partitions and Stirling triangles. J.
Math. Sci. (N.Y.) 138, 5674-85.


• Good & Toulmin (1956). The number of new species, and the increase in
population coverage, when a sample is increased. Biometrika 43, 45-63.
• Good (1953). The population frequencies of species and the estimation of
population parameters. Biometrika 40, 237-64.
• Korwar & Hollander (1973). Contribution to the theory of Dirichlet processes.
Ann. Probab. 1, 705-11.
• Lijoi, Mena & Prünster (2007a). Bayesian nonparametric estimation of the
probability of discovering new species. Biometrika 94, 769–786.
• Lijoi, Mena & Prünster (2007b). Controlling the reinforcement in Bayesian
nonparametric mixture models. J. Roy. Statist. Soc. Ser. B 69, 715–740.
• Lo (1991). A characterization of the Dirichlet process. Statist. Probab. Lett. 12,
185–187.
• Mao (2004). Prediction of the conditional probability of discovering a new class. J.
Am. Statist. Assoc. 99, 1108-18.
• Mao & Lindsay (2002). A Poisson model for the coverage problem with a genomic
application. Biometrika 89, 669-81.
• Pitman & Yor (1997). The two-parameter Poisson-Dirichlet distribution derived
from a stable subordinator. Ann. Probab. 25, 855–900.
• Pitman (2006). Combinatorial Stochastic Processes. Lecture Notes in Math.,
vol.1875, Springer, Berlin.
• Regazzini (1978). Intorno ad alcune questioni relative alla definizione del premio
secondo la teoria della credibilità. Giorn. Istit. Ital. Attuari, 41, 77–89.
• Teh (2006). A Hierarchical Bayesian Language Model based on Pitman-Yor
Processes. Coling/ACL 2006, 985-92.

