Adaptive Multilevel Splitting for Rare Event Analysis

Published by Taylor & Francis. Downloaded by [Cérou, Frédéric] on 6 April 2007 (subscription number 772729327).
Frédéric Cérou
IRISA/INRIA, Campus de Beaulieu, Rennes, France
Arnaud Guyader
Equipe de Statistique, Université Haute Bretagne, Rennes, France
Abstract: The estimation of a rare event probability is a crucial issue in areas such
as reliability, telecommunications, and aircraft management. In complex systems,
an analytical study is out of the question and one has to use Monte Carlo methods.
When rare is really rare, meaning a probability less than 10⁻⁹, naive Monte
Carlo becomes unreasonable. A widespread technique is multilevel
splitting, but this method requires enough knowledge about the system to decide
beforehand where to place the levels. This, unfortunately, is not always possible.
In this article, we propose an adaptive algorithm to cope with this problem: the
estimation is asymptotically consistent, costs just a little more than classical
multilevel splitting, and has the same efficiency in terms of asymptotic variance.
In the one-dimensional case, we rigorously prove the a.s. convergence and the
asymptotic normality of our estimator, with the same variance as with other
algorithms that use fixed crossing levels. In our proofs we mainly use tools from
the theory of empirical processes, which seems to be quite new in the field of
rare events.
1. INTRODUCTION
Let (X_t)_{t≥0} be a strong Markov process with values in ℝ. Suppose that, denoting

T_0 = inf{t ≥ 0 : X_t = 0},
T_M = inf{t ≥ 0 : X_t = M},

we have

α = P(T_M < T_0) ≈ 0.
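To see why naive Monte Carlo is hopeless at this scale, note that the relative standard deviation of the plain frequency estimator of α over n independent runs is √((1 − α)/(nα)). A back-of-the-envelope sketch (the function name and the 10% accuracy target are our illustrative choices, not from the article):

```python
import math

def naive_mc_sample_size(alpha, rel_err):
    """Number of i.i.d. runs needed so that the plain frequency estimator
    of a probability alpha has relative standard deviation at most rel_err:
    sqrt((1 - alpha) / (n * alpha)) <= rel_err."""
    return math.ceil((1.0 - alpha) / (alpha * rel_err ** 2))

# For alpha = 1e-9 and a 10% relative error, about 1e11 runs are required.
n_needed = naive_mc_sample_size(1e-9, 0.10)
```

For α = 10⁻⁹ this gives roughly 10¹¹ simulations of the full system, which is exactly the regime where splitting methods become necessary.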
2. THE ALGORITHM

• Step 1: Simulate n i.i.d. trajectories (X_t^j)_{t≥0}, 1 ≤ j ≤ n, starting from x_0, and for each j denote

S^(1)_{n,j} = sup_{0≤t≤T_0^j} X_t^j,

and sort the sample (S^(1)_{n,1}, …, S^(1)_{n,n}) in increasing order:

S^(1)_{n,(1)} ≤ ⋯ ≤ S^(1)_{n,(n−k)} ≤ ⋯ ≤ S^(1)_{n,(n)}.

Set

q̂_1 = S^(1)_{n,(n−k)}.

• Step 2: Keep S^(1)_{n,(n−k+1)}, …, S^(1)_{n,(n)} unchanged, but denote them simply
S^(2)_{n,(n−k+1)}, …, S^(2)_{n,(n)}. Simulate n − k trajectories (X_t^j)_{t≥0} from the initial
point q̂_1. Wait until all these n − k trajectories have reached 0: for
the jth particle, this requires the time T_0^j, which is a.s. finite.
For each j ∈ {1, …, n − k}, denote

S^(2)_{n,j} = sup_{0≤t≤T_0^j} X_t^j,

and sort the sample (S^(2)_{n,1}, …, S^(2)_{n,n−k}, S^(2)_{n,(n−k+1)}, …, S^(2)_{n,(n)}) in increasing
order:

S^(2)_{n,(1)} ≤ ⋯ ≤ S^(2)_{n,(n−k)} ≤ ⋯ ≤ S^(2)_{n,(n)}.

Set

q̂_2 = S^(2)_{n,(n−k)}.
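The steps above iterate until the level q̂_l exceeds M: at each level the (n − k)th order statistic becomes the new level, the discarded particles are restarted from it, and each level contributes a factor p̂ = k/n to the estimate. A minimal runnable sketch, with a toy unit-negative-drift diffusion standing in for the generic process (the dynamics, parameter values, and function names are our illustrative choices, not the authors' code):

```python
import random

def sup_until_zero(x0, dt=0.01, rng=random):
    """Supremum of one Euler path of dX_t = -dt + dW_t, run until it hits 0.
    This toy negative-drift diffusion stands in for the generic process."""
    x = sup = x0
    while x > 0:
        x += -dt + rng.gauss(0.0, dt ** 0.5)
        sup = max(sup, x)
    return sup

def adaptive_splitting(x0, M, n, k, rng=random):
    """Adaptive multilevel splitting: the level q_hat is the (n-k)th order
    statistic of the current suprema; the n-k discarded particles are
    resimulated from q_hat; each level contributes a factor p_hat = k/n."""
    sups = sorted(sup_until_zero(x0, rng=rng) for _ in range(n))
    p_hat, prob = k / n, 1.0
    while sups[n - k - 1] < M:                 # q_hat = sups[n-k-1]
        q_hat = sups[n - k - 1]
        prob *= p_hat
        fresh = [sup_until_zero(q_hat, rng=rng) for _ in range(n - k)]
        sups = sorted(sups[n - k:] + fresh)    # keep the k largest, add fresh
    return prob * sum(1 for s in sups if s >= M) / n
```

For this toy dynamic the target probability is known in closed form, P(T_M < T_0) = (e^{2x_0} − 1)/(e^{2M} − 1), so the sketch can be sanity-checked on small instances.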
3. CONSISTENCY

Thanks to the strong Markov property of the process and the continuity
of its trajectories, at the end of step l, it is clear that, given q̂_{l−1}, the
random variables (S^(l)_{n,j})_{1≤j≤n} are i.i.d. according to the following law ℒ:

ℒ = ℒ( sup_{0≤t≤T_0} X_t^{x_0} | sup_{0≤t≤T_0} X_t^{x_0} ≥ q̂_{l−1} ),

with the convention that q̂_0 = x_0. We can write it a little more simply:

S^(l)_{n,j} ∼ ℒ( sup_{0≤t≤T_0} X_t^{q̂_{l−1}} | q̂_{l−1} ).

Let us also denote S = sup_{0≤t≤T_0} X_t^{x_0}, and F its distribution function.
We suppose that F is continuous. (Note that this property is not implied
by the continuity of trajectories.) We now define F, a deterministic
function of two variables (q_1, q_2), as follows:

F(q_1, q_2) = P(S ≤ q_2 | S > q_1).

Since F is continuous, we have the obvious identity

F(q_1, q_2) = (F(q_2) − F(q_1)) / (1 − F(q_1)).

Thus, each S^(l)_{n,j} has the distribution function F(q̂_{l−1}, ·). Note that at each
step, the algorithm chooses q̂_l so that F(q̂_{l−1}, q̂_l) is close to q = 1 − p.
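The identity above is easy to check numerically. Here is a small sketch with an exponential law standing in for the distribution of S (our choice for the illustration, since memorylessness then gives a closed form for the conditional CDF):

```python
import math

def F(x, lam=1.0):
    """CDF of an Exp(lam) law, our stand-in for the continuous CDF of S."""
    return 1.0 - math.exp(-lam * x)

def F_cond(q1, q2, lam=1.0):
    """F(q1, q2) = P(S <= q2 | S > q1) = (F(q2) - F(q1)) / (1 - F(q1))."""
    return (F(q2, lam) - F(q1, lam)) / (1.0 - F(q1, lam))
```

For the exponential law, memorylessness gives F_cond(q1, q2) == F(q2 − q1), which makes the identity directly verifiable.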
l l
Hypothesis (H). The strong Markov process (X_t)_{t≥0} starts from
x_0 > 0, with 0 as an attractive point. (X_t)_{t≥0} has time-continuous
trajectories, and the distribution function F of the random variable
S = sup_{0≤t≤T_0} X_t^{x_0} is continuous.

Theorem 1. Under Hypothesis (H),

α̂_n − α → 0  a.s.,  n → ∞.
Proof of the Theorem. First of all, we shall see that for all l:

F(q̂_{l−1}, q̂_l) → 1 − p  a.s.,  n → ∞.   (1)

Suppose first that N = log α / log p is not an integer. Then we have that a.s., for n large enough,

∏_{k=1}^{N+1} (1 − F(q̂_{k−1}, q̂_k)) < α < ∏_{k=1}^{N} (1 − F(q̂_{k−1}, q̂_k)),

that is,

1 − F(q̂_{N+1}) < α < 1 − F(q̂_N),

so that, a.s. for n large enough, the algorithm stops after N̂ = N
iterations.
Let us denote by F̂^(N)_n the empirical distribution function of
the S^(N)_{n,j}, 1 ≤ j ≤ n. Using Lemma 2, we have that a.s.

|F(q̂_N, M) − F̂^(N)_n(M)| ≤ ‖F(q̂_N, ·) − F̂^(N)_n‖_∞ → 0,  n → ∞,

which implies:

lim_{n→+∞} α̂_n = lim_{n→+∞} p^N (1 − F̂^(N)_n(M)) = p^N · α/p^N = α,

which gives the consistency of the estimator.
If N = log α / log p is an integer, then

∏_{k=1}^{N} (1 − F(q̂_{k−1}, q̂_k)) → p^N = α  a.s.,  n → ∞,

and the algorithm stops after either N or N + 1 iterations, according to
whether q̂_N ≥ M or q̂_N < M, so that

α̂_n = p^{N−1} (1 − F̂^(N)_n(M)) 1_{q̂_N ≥ M} + p^N (1 − F̂^(N+1)_n(M)) 1_{q̂_N < M}.

We also have, using Lemma 2 in the same way as we did in the first part
of the proof, the uniform convergence of F̂^(N)_n and F̂^(N+1)_n. Then we get

|α̂_n − α| ≤ |p^{N−1} (1 − F̂^(N)_n(M)) − α| · 1_{q̂_N ≥ M}
           + |p^N (1 − F̂^(N+1)_n(M)) − α| · 1_{q̂_N < M},

and both terms tend a.s. to 0: classical results say that empirical quantiles converge almost surely towards the
true quantiles (see for example [18], Lemma 21.2, p. 305).
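That quantile convergence is easy to observe numerically. A minimal sketch, with the Exp(1) law as our illustrative choice (its quantile function is −log(1 − q) in closed form):

```python
import math, random

def empirical_quantile(n, q, seed=3):
    """The floor(n*q)-th order statistic of n Exp(1) draws; by the classical
    quantile convergence result it approaches -log(1 - q) as n grows."""
    rng = random.Random(seed)
    xs = sorted(-math.log(1.0 - rng.random()) for _ in range(n))
    return xs[int(n * q) - 1]
```

For instance, empirical_quantile(200000, 0.8) is close to −log(0.2) ≈ 1.609.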
4. ASYMPTOTIC NORMALITY

The main result of this section is the asymptotic normality of the
estimator: √n (α̂_n − α) converges in distribution to 𝒩(0, σ²), with

σ² = α² ( N (1 − p)/p + (1 − r)/r ).
Then

V_n + W_n → V  in distribution,  n → ∞,

and

V_n W_n → 0  in probability,  n → ∞.
Lemma 5. Let l ≥ 1, q̂_l, q̂_{l+1}, F, and k_n be as before. For any test function
φ : ℝ → ℝ,

E[ φ(F(q̂_l, q̂_{l+1})) | q̂_l ] = E[ φ(U_{n,(n−k_n)}) ],

with (U_{n,j})_{1≤j≤n} a triangular array of identically distributed random
variables with uniform law on [0, 1] and row-wise independent.
Lemma 6. Let (r_n)_{n≥1} be random variables with values in [0, 1], and r ∈ (0, 1), such that

√n (r_n − r) → 𝒩(0, σ̃²)  in distribution,  n → ∞.

Let us consider next a triangular array (B_{n,j})_{1≤j≤n} of random variables, with
the nth row being, conditionally on r_n, made of n i.i.d. Bernoulli trials of parameter r_n
(i.e., for all 1 ≤ j ≤ n, P(B_{n,j} = 1 | r_n) = r_n = 1 − P(B_{n,j} = 0 | r_n)). Then we
have the following result:

√n ( (1/n) ∑_{j=1}^n B_{n,j} − r ) → 𝒩(0, s²)  in distribution,  n → ∞,

with s² = σ̃² + r(1 − r).
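The limit variance can be checked by simulation. A minimal sketch, with a Gaussian perturbation of the Bernoulli parameter as our stand-in for the assumption on r_n (all parameter values are our illustrative choices):

```python
import math, random

def lemma6_sample_variance(r=0.5, sigma_t=0.5, n=250, m=8000, seed=1):
    """Simulate sqrt(n) * (mean of n Bernoulli(r_n) - r) with
    r_n = r + N(0, sigma_t^2)/sqrt(n), m times, and return the sample
    variance, to be compared with s^2 = sigma_t^2 + r*(1 - r)."""
    rng = random.Random(seed)
    vals = []
    for _ in range(m):
        rn = min(1.0, max(0.0, r + rng.gauss(0.0, sigma_t) / math.sqrt(n)))
        mean_b = sum(1 for _ in range(n) if rng.random() < rn) / n
        vals.append(math.sqrt(n) * (mean_b - r))
    mu = sum(vals) / m
    return sum((v - mu) ** 2 for v in vals) / (m - 1)
```

With r = 0.5 and σ̃ = 0.5, the predicted value is s² = 0.25 + 0.25 = 0.5, which the sample variance approaches.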
Proof. In the sequel the notation o(·) refers to a.s. convergence. Let us
first consider the conditional characteristic function

φ_n(t) = E[ exp( it √n ( (1/n) ∑_{j=1}^n B_{n,j} − r ) ) | r_n ]
= e^{−itr√n} ( r_n e^{it/√n} + 1 − r_n )^n
= exp( −itr√n + n log( r_n e^{it/√n} + 1 − r_n ) )
= exp( −itr√n + n log( r_n ( 1 + it/√n − t²/(2n) + o(1/n) ) + 1 − r_n ) )
= exp( −itr√n + n log( 1 + it r_n/√n − r_n t²/(2n) + o(1/n) ) )
= exp( −itr√n + n ( it r_n/√n − r_n t²/(2n) − (1/2)( it r_n/√n − r_n t²/(2n) )² + o(1/n) ) )
= exp( it √n (r_n − r) − (1/2) r_n (1 − r_n) t² + o(1) ),

where

(1/2) r_n (1 − r_n) t² + o(1) → (1/2) r (1 − r) t²  a.s.,  n → ∞,

and, by assumption,

√n (r_n − r) → 𝒩(0, σ̃²)  in distribution,  n → ∞.

Taking expectations, dominated convergence then gives
E[φ_n(t)] → exp( −(1/2)(σ̃² + r(1 − r)) t² ), which is the characteristic
function of 𝒩(0, s²). □
We want to prove that the other terms in Equation (2) both converge in
distribution. For this we use the characteristic function:

φ_n(t) = E[ exp( it ( p √n ( ∏_{k=1}^{l} (1 − F(q̂_{k−1}, q̂_k)) − p^l )
                    + p^l √n ( 1 − F(q̂_l, q̂_{l+1}) − p ) ) ) ]
= E[ exp( it p √n ( ∏_{k=1}^{l} (1 − F(q̂_{k−1}, q̂_k)) − p^l ) )
     × E( exp( it p^l √n ( 1 − F(q̂_l, q̂_{l+1}) − p ) ) | q̂_1, …, q̂_l ) ].

Lemma 5 ensures that we can write the last term in another way:

E( exp( it p^l √n ( 1 − F(q̂_l, q̂_{l+1}) − p ) ) | q̂_l )
= E( exp( it p^l √n ( 1 − U_{n,(n−k_n)} − p ) ) ),

so that

φ_n(t) = E[ exp( it p √n ( ∏_{k=1}^{l} (1 − F(q̂_{k−1}, q̂_k)) − p^l ) ) ]
         × E[ exp( it p^l √n ( 1 − U_{n,(n−k_n)} − p ) ) ].

And now we just have to show that both terms have a Gaussian limit.
By induction hypothesis, we know that

p √n ( ∏_{k=1}^{l} (1 − F(q̂_{k−1}, q̂_k)) − p^l ) → 𝒩(0, p² σ_l²)  in distribution,  n → ∞,

while the central limit theorem for uniform order statistics gives

p^l √n ( 1 − U_{n,(n−k_n)} − p ) → 𝒩(0, p^{2l+1}(1 − p))  in distribution,  n → ∞.

Thus,

p √n ( ∏_{k=1}^{l} (1 − F(q̂_{k−1}, q̂_k)) − p^l ) + p^l √n ( 1 − F(q̂_l, q̂_{l+1}) − p )
→ 𝒩(0, p² σ_l² + p^{2l+1}(1 − p))  in distribution,  n → ∞,

with σ²_{l+1} = p² σ_l² + p^{2l+1}(1 − p). From this recursion, we deduce:

σ_N² = N p^{2N−1} (1 − p).
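The closed form can be verified against the recursion directly; a minimal sketch (we take the base case σ_1² = p(1 − p), which is consistent with the closed form at N = 1):

```python
def sigma2_recursive(p, N):
    """Iterate sigma_{l+1}^2 = p^2 * sigma_l^2 + p^(2l+1) * (1 - p),
    starting from sigma_1^2 = p * (1 - p)."""
    s2 = p * (1.0 - p)
    for l in range(1, N):
        s2 = p ** 2 * s2 + p ** (2 * l + 1) * (1.0 - p)
    return s2

def sigma2_closed(p, N):
    """Closed form sigma_N^2 = N * p^(2N - 1) * (1 - p)."""
    return N * p ** (2 * N - 1) * (1.0 - p)
```

The two functions agree for every p ∈ (0, 1) and N ≥ 1, up to floating-point rounding.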
Now we deal with the last step. Let N̂ be the (random) number of steps
of the algorithm. On the event {N̂ = N}, the estimator reads

α̂^(N) = p^N (1/n) ∑_{j=1}^n 1_{S^(N)_{n,j} ≥ M},

and, conditionally on q̂_N, the indicators 1_{S^(N)_{n,j} ≥ M} are Bernoulli trials
of parameter

r_n = P( S^(N)_{n,j} ≥ M | q̂_N ) = 1 − F(q̂_N, M) = α / (1 − F(q̂_N)).   (3)

Let r = α p^{−N}. Using Equation (3), we get:

√n (r_n − r) = √n α ( p^N − (1 − F(q̂_N)) ) / ( p^N (1 − F(q̂_N)) ).

We know from the proof of Theorem 1 that

1 / ( p^N (1 − F(q̂_N)) ) → 1/p^{2N}  a.s.,  n → ∞,   (5)

while the first part of this proof gives √n ( p^N − (1 − F(q̂_N)) ) → 𝒩(0, σ_N²)
in distribution. So we have that

√n (r_n − r) → 𝒩(0, σ_r²)  in distribution,  n → ∞,

with

σ_r² = N (1 − p) α² / p^{2N+1}.

Then we apply Lemma 6 to get that

√n ( α̂^(N) − α ) → 𝒩(0, σ²)  in distribution,  n → ∞,

with

σ² = α² ( N (1 − p)/p + (1 − r)/r ).
Then we come back to the true (random) number of steps N̂. Let us consider

α̃_n = α̂^(N) 1_{N̂ = N} + α 1_{N̂ ≠ N};

as we saw in the proof of Theorem 1, N̂ → N a.s., n → +∞. Now from all this
we conclude that √n (α̂_n − α) and √n (α̃_n − α) both converge to the same
limit in distribution.
If N = log α / log p is an integer, we combine the reasoning at the end of
the proof of Theorem 1 (i.e., the distinction of two cases) and, in each case,
apply the same arguments as above.
5. NUMERICAL EXAMPLE

For this example, the probability that the process started at x ∈ [a, b] exits
the interval at b is given by

P_x(X_H = b) = e^{−(b−x)} sinh(x − a) / sinh(b − a).

The time-discretization step has to be chosen
small enough to avoid clipping the process, which could introduce a bias
in the estimation. Examples of trajectories reaching the rare set are given
in Figure 5.
We will illustrate first the a.s. convergence. We ran our algorithm on
the above example with parameters a = 0, b = 12, μ = −1, x_0 = 1, such
that the rare event probability is α ≈ 2.412 × 10⁻¹⁰. Figure 6 gives the
relative error as a function of the number n of particles. For n = 20,000
we have an error as low as 5%.
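The reference value α can be computed directly from the exit-probability formula above; a minimal sketch (the function name is ours):

```python
import math

def exit_prob_at_b(x, a, b):
    """P_x(X_H = b) = exp(-(b - x)) * sinh(x - a) / sinh(b - a)
    for the unit-negative-drift example of this section."""
    return math.exp(-(b - x)) * math.sinh(x - a) / math.sinh(b - a)

# Parameters of the experiment: a = 0, b = 12, x0 = 1.
alpha = exit_prob_at_b(1.0, 0.0, 12.0)   # about 2.412e-10
```

This reproduces the reference value α ≈ 2.412 × 10⁻¹⁰ quoted above.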
Then we illustrate the asymptotic normality. As we need to run the
algorithm many times to estimate the law of the estimator, we chose
a setting where α is not very small, but about 0.1244. Estimating
the same probability 1,000 times gives the histogram of Figure 7.
This confirms the fact that the distribution of the estimated values is
close to Gaussian.
Figure 9. Theoretical curve and histogram of the numerical values for the exit time at b.
Finally, the total cost of the algorithm is in O(n log n) operations.
We are now interested in the precision of the estimator. We have
seen above that the asymptotic variance of α̂ is

σ² = α² ( N (1 − p)/p + (1 − r)/r ).
In real-life applications, the only parameter that is fixed a priori is the
small probability α to estimate; recall that the intermediate levels then
satisfy α = p_1 p_2 ⋯ p_N. So the question is: What is the optimal
choice for p? A straightforward study of the variance for p varying
between α and 1 proves that σ² decreases when p goes to one.
This result is intuitively clear and merely says that if we want a
precise estimate for α, we just have to put a lot of intermediate levels.
But, of course, the complexity of the algorithm is then increasing, since
the number of levels N = log α / log p tends to infinity as p goes to one.
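The monotonicity of the variance in p is easy to check numerically; a minimal sketch evaluating σ²/α² for a fixed α (the function name is ours):

```python
import math

def rel_variance(p, alpha):
    """sigma^2 / alpha^2 = N*(1 - p)/p + (1 - r)/r, where
    N = floor(log(alpha)/log(p)) and r = alpha / p^N lies in (p, 1]."""
    N = math.floor(math.log(alpha) / math.log(p))
    r = alpha / p ** N
    return N * (1.0 - p) / p + (1.0 - r) / r

# The normalized variance decreases as p increases towards 1:
v = [rel_variance(p, 1e-4) for p in (0.2, 0.5, 0.9)]
```

For α = 10⁻⁴ this gives roughly 22.2, 13.2, and 9.7 for p = 0.2, 0.5, and 0.9, illustrating the decrease.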
1. Unfortunately, he introduces some confusion in his paper by using the term
"importance sampling" for what generally is named "importance splitting."
2. In this case, most of the computing time is spent with highly correlated
trajectories.
It is clear that we get closer and closer to the classical splitting algorithm
with regular levels. Thus, the results obtained by preceding authors on the
classical splitting algorithm show that the best thing we can do in
our case is to keep the same couple (k, n) at each step.

For the sake of simplicity, let us suppose again that α = p^N, with
R = p^{−1} an integer. Then the variance s² of the estimator α̃ in classical
multilevel splitting is just the same³ as for our α̂:

s²/α² = N (1 − p)/p.
3. Note that the formula in [10] (p. 589) seems to be different, but this is only
due to the first step of their algorithm.
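The comparison with the adaptive variance is immediate: when α = p^N exactly, r = 1 and the extra term (1 − r)/r vanishes. A minimal sketch (function names are ours):

```python
def classical_rel_variance(p, N):
    """s^2 / alpha^2 = N * (1 - p) / p for classical multilevel splitting
    with fixed levels and alpha = p^N exactly."""
    return N * (1.0 - p) / p

def adaptive_rel_variance(p, N, r=1.0):
    """sigma^2 / alpha^2 = N * (1 - p)/p + (1 - r)/r; with r = 1 (alpha an
    exact power of p) the extra term vanishes."""
    return N * (1.0 - p) / p + (1.0 - r) / r
```

With r = 1 the two expressions coincide, which is the "same variance" claim made above.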
6.4. Generalization

The assumptions we used to prove the consistency and asymptotic
normality of α̂ are, mainly: the process is Markov, the distribution
function F of its supremum is continuous, and the process is
one-dimensional.
Even if, for now, we do not have theoretical results in more general
cases, the algorithm itself is quite versatile. In the following, we briefly
present two examples where the above assumptions are not satisfied, but
where first experiments give promising results.
Examples. For the quantity β(s) under study, one expects that

lim_{s→+∞} s^γ β(s)

exists and is finite for some γ > 0. But in dimension 2 the exact value of γ is
still unknown, and it is conjectured [14] that γ = 3/2. In Figure 15, we
present estimates of β(s) as a function of s. These were computed for
s ≤ 150 with n = 150,000 and k = 50,000. To estimate γ, we decided to
remove the beginning of the curve because we are not yet in the asymptotic regime there.
Keeping the values for 50 ≤ s ≤ 150, and fitting the logarithms with a
simple linear regression, we found γ ≈ 1.511, which is compatible with
the conjecture.
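The exponent fit described above can be sketched as follows, on synthetic data since the estimated values of β(s) are not reproduced here (the noise level and seed are our illustrative choices):

```python
import math, random

def fit_power_exponent(pairs):
    """Least-squares slope of log(beta) on log(s); if beta(s) ~ C * s^(-gamma),
    the slope estimates -gamma."""
    xs = [math.log(s) for s, _ in pairs]
    ys = [math.log(b) for _, b in pairs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Synthetic curve with gamma = 1.5 and mild multiplicative noise,
# on the same range 50 <= s <= 150 as in the text.
rng = random.Random(7)
data = [(s, s ** -1.5 * math.exp(rng.gauss(0.0, 0.02))) for s in range(50, 151)]
gamma_hat = -fit_power_exponent(data)
```

On such data the fitted exponent recovers the true value 1.5 to within a few percent, which is the kind of agreement reported for the estimate 1.511.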
7. CONCLUSION
REFERENCES