HAL Id: inria-00421619
https://hal.inria.fr/inria-00421619
Submitted on 2 Oct 2009
SMC with Adaptive Resampling: Large Sample Asymptotics
Elise Arnaud, François Le Gland
To cite this version:
Elise Arnaud, François Le Gland. SMC with Adaptive Resampling: Large Sample Asymptotics. SSP 2009 - 15th IEEE/SP Workshop on Statistical Signal Processing, Aug 2009, Cardiff, United Kingdom. pp. 481-484, doi:10.1109/SSP.2009.5278533. inria-00421619.
SMC WITH ADAPTIVE RESAMPLING: LARGE SAMPLE ASYMPTOTICS

Elise Arnaud and François Le Gland

Université Joseph Fourier, Laboratoire Jean Kuntzmann and INRIA Rhône-Alpes, France
INRIA Rennes and IRISA, France
ABSTRACT
A longstanding problem in sequential Monte Carlo (SMC) is to mathematically prove the popular belief that resampling does improve the performance of the estimation (this of course is not always true, and the real question is to clarify classes of problems where resampling helps). A more pragmatic answer to the problem is to use adaptive procedures that have been proposed on the basis of heuristic considerations, where resampling is performed only when it is felt necessary, i.e. when some criterion (effective number of particles, entropy of the sample, etc.) reaches some prescribed threshold. It still remains to mathematically prove the efficiency of such adaptive procedures. The contribution of this paper is to propose an approach, based on a representation in terms of multiplicative functionals (in which importance weights are treated as particles, roughly speaking), to obtain the asymptotic variance of adaptive resampling procedures as the sample size goes to infinity. It is then possible to see the impact of the threshold on the asymptotic variance, at least in the Gaussian case, where the resampling criterion has an explicit expression in the large sample asymptotics.
1. INTRODUCTION
Consider the unnormalized and normalized Feynman–Kac distributions defined on the set $E$ by
\[
\langle \gamma_n, \phi \rangle = \int_E \cdots \int_E \phi(x_n)\, \gamma_0(dx_0) \prod_{k=1}^n R_k(x_{k-1}, dx_k),
\]
and
\[
\langle \mu_n, \phi \rangle = \frac{\langle \gamma_n, \phi \rangle}{\langle \gamma_n, 1 \rangle},
\]
respectively, characterized by

• the nonnegative measure $\gamma_0(dx)$,

• and the nonnegative kernels $R_k(x, dx')$,
for any $k = 1, \cdots, n$. Associated with this integral representation is the recurrence relation $\gamma_k = \gamma_{k-1} R_k$, for any $k = 1, \cdots, n$. In full generality, it is always possible to decompose the nonnegative measure
\[
\gamma_0(dx) = W_0(x)\, p_0(dx), \tag{1}
\]
in terms of a nonnegative function and a normalized probability distribution, and to decompose the nonnegative kernel
\[
R_k(x, dx') = W_k(x, x')\, P_k(x, dx'), \tag{2}
\]
in terms of a nonnegative function and a normalized Markov kernel, for any $k = 1, \cdots, n$. Given the decompositions (1) and (2), introduce the nonnegative measure and the nonnegative kernels
\[
\bar{\gamma}_0(dx) = |W_0(x)|^2\, p_0(dx),
\]
and
\[
\bar{R}_k(x, dx') = |W_k(x, x')|^2\, P_k(x, dx'),
\]
respectively, for any $k = 1, \cdots, n$. With the decompositions (1) and (2) is associated the equivalent probabilistic representation
\[
\langle \gamma_n, \phi \rangle = \mathbb{E}\bigl[\, \phi(X_n) \prod_{k=0}^n W_k(X_{k-1}, X_k)\, \bigr], \tag{3}
\]
where $\{X_k,\ k = 0, 1, \cdots, n\}$ is a Markov chain taking values in $E$ and characterized by

• its initial probability distribution $p_0(dx)$,

• and its transition probabilities $P_k(x, dx')$,

for any $k = 1, \cdots, n$, and where $W_k(x, x')$ is a bounded nonnegative function for any $k = 0, 1, \cdots, n$, with the abuse of notation $W_0(x, x') = W_0(x')$ to be used throughout the paper for $k = 0$.
Starting from the probabilistic representation (3) for the unnormalized distribution, a first type of Monte Carlo approximation can be designed as
\[
\langle \gamma_n, \phi \rangle \approx \frac{1}{N} \sum_{i=1}^N \phi(\xi_n^i) \prod_{k=0}^n W_k(\xi_{k-1}^i, \xi_k^i),
\]
for any bounded measurable function $\phi$, where independently for any $i = 1, \cdots, N$
\[
(\xi_0^i, \cdots, \xi_n^i) \sim p_0(dx_0)\, P_1(x_0, dx_1) \cdots P_n(x_{n-1}, dx_n),
\]
i.e. $\xi_{0:n}^i = (\xi_0^i, \cdots, \xi_n^i)$ is distributed as a sample path of the Markov chain. This immediately results in a self-normalized approximation
\[
\mu_n \approx \mu_n^N = \frac{\gamma_n^N}{\langle \gamma_n^N, 1 \rangle} = \sum_{i=1}^N w_n^i\, \delta_{\xi_n^i},
\]
of the normalized distribution, where
\[
w_n^i \propto \prod_{k=0}^n W_k(\xi_{k-1}^i, \xi_k^i),
\]
in terms of a weighted empirical probability distribution characterized by sample positions $(\xi_n^1, \cdots, \xi_n^N)$ and sample weights $(w_n^1, \cdots, w_n^N)$, which can be computed recursively as follows
\[
\xi_k^i \sim P_k(\xi_{k-1}^i, dx') \quad \text{and} \quad w_k^i \propto w_{k-1}^i\, W_k(\xi_{k-1}^i, \xi_k^i),
\]
for any $i = 1, \cdots, N$. There is a well-known limitation with this first type of Monte Carlo approximation: after some iterations, a few sample paths carry a significant weight while the remaining sample paths have a negligible weight, which means that only a few sample paths are really contributing to the approximation. A possible remedy to this degeneracy problem is to use the weights to resample from the current weighted empirical probability distribution, so that samples with a high weight are replicated while samples with a low weight are discarded. Several different resampling strategies are available. Resampling introduces additional randomness in the sample.
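The weight recursion above is plain sequential importance sampling without resampling, and the degeneracy it describes is easy to reproduce. A minimal sketch follows; the scalar random-walk model, the Gaussian likelihood playing the role of $W_k$, and all parameter values are illustrative assumptions, not taken from the paper. Weights are accumulated in the log domain to avoid numerical underflow over many steps.

```python
import numpy as np

rng = np.random.default_rng(0)

def sis(y, N=1000, sigma_x=1.0, sigma_y=1.0):
    """Sequential importance sampling without resampling: propagate each path
    with the kernel P_k (a Gaussian random walk here) and multiply its weight
    by W_k (a Gaussian likelihood here); both are illustrative choices."""
    xi = rng.normal(0.0, 1.0, size=N)                  # xi_0^i ~ p_0
    logw = -0.5 * ((y[0] - xi) / sigma_y) ** 2         # log w_0^i, up to a constant
    for k in range(1, len(y)):
        xi = xi + rng.normal(0.0, sigma_x, size=N)     # xi_k^i ~ P_k(xi_{k-1}^i, .)
        logw += -0.5 * ((y[k] - xi) / sigma_y) ** 2    # w_k^i ∝ w_{k-1}^i W_k(...)
    w = np.exp(logw - logw.max())                      # stabilize before normalizing
    return xi, w / w.sum()

y = np.cumsum(rng.normal(size=20))                     # synthetic observations
xi, w = sis(y)
ess = 1.0 / np.sum(w ** 2)                             # typically ess << N: degeneracy
```

Running this for more time steps drives the effective sample size `ess` toward 1, which is exactly the degeneracy that motivates resampling.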
A more efficient implementation is to resample not at each iteration of the algorithm, but only at some time instants, and the question that naturally arises is when to resample, i.e. how to take the decision to resample. Two different approaches can be used to address this issue:
• evaluate the asymptotic variance, as the sample size $N$ goes to infinity, of the algorithm that resamples at some prescribed times $0 \leq t_1 < \cdots < t_p < n$, and optimize the expression of the asymptotic variance with respect to the number and the location of the resampling times,
• evaluate the asymptotic variance, as the sample size $N$ goes to infinity, of the adaptive algorithm [1] that resamples at times where some heuristic rule is met, e.g. if the effective sample size (or alternatively the entropy of the weighted sample) drops below a prescribed level, and study the impact of the prescribed level, or threshold.
The first approach relies on considering the Markov chain with values in path space, and has been presented at the workshop on adaptive Monte Carlo methods held in Fleurance in July 2007. The second approach relies on considering another Markov chain, with values in the (space, weight) product space, along the lines of [2].
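The heuristic rule used in the second approach can be sketched as a simple decision function. The effective-sample-size criterion with a threshold of $0.5\,N$ below is one common choice, not a prescription from the paper.

```python
import numpy as np

def should_resample(w, threshold=0.5):
    """Adaptive rule: resample when the effective sample size
    N_eff = 1 / sum(w_i^2) drops below threshold * N (w normalized)."""
    n_eff = 1.0 / np.sum(w ** 2)
    return n_eff < threshold * len(w)

w_uniform = np.full(100, 0.01)                        # N_eff = N: no resampling needed
w_degenerate = np.zeros(100); w_degenerate[0] = 1.0   # N_eff = 1: resample
```

With uniform weights the rule never fires; with a fully degenerate sample it always does, so the threshold interpolates between resampling never and resampling at every step.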
2. NONLINEAR ADAPTIVE MARKOV MODEL

On the product set $E \times [0, \infty)$, define the probability distribution
\[
\mu_0^e(dx_0, dv_0) = p_0(dx_0)\, \delta_{W_0(x_0)}(dv_0),
\]
and the nonnegative kernels
\[
R_k^e(\mu, x, v, dx', dv') = g_{k-1}^e(\mu, v)\, \underbrace{P_k(x, dx')\, \delta_{f_k^e(\mu, x, v, x')}(dv')}_{\textstyle P_k^e(\mu, x, v, dx', dv')},
\]
where
\[
f_k^e(\mu, x, v, x') =
\begin{cases}
v\, W_k(x, x'), & \text{if } \mu \notin D, \\
W_k(x, x'), & \text{if } \mu \in D,
\end{cases}
\]
and
\[
g_{k-1}^e(\mu, v) =
\begin{cases}
1, & \text{if } \mu \notin D, \\
v, & \text{if } \mu \in D,
\end{cases}
\]
by definition, so that the identity
\[
g_{k-1}^e(\mu, v)\, f_k^e(\mu, x, v, x') = v\, W_k(x, x'),
\]
holds for any $k = 1, \cdots, n$. Clearly
\[
R_k^e(\mu, x, v, dx', dv') = \mathbf{1}(\mu \notin D)\, \underbrace{P_k(x, dx')\, \delta_{v\, W_k(x, x')}(dv')}_{\textstyle P_k^{\mathrm{imp}}(x, v, dx', dv')} + \mathbf{1}(\mu \in D)\, \underbrace{v\, P_k(x, dx')\, \delta_{W_k(x, x')}(dv')}_{\textstyle R_k^{\mathrm{red}}(x, v, dx', dv')},
\]
and define
\[
g_{k-1}^{\mathrm{red}}(x, v) = v, \quad \text{and} \quad P_k^{\mathrm{red}}(x, v, dx', dv') = P_k(x, dx')\, \delta_{W_k(x, x')}(dv'),
\]
for any $k = 1, \cdots, n$. Let $e(v) = v$ for any $v \geq 0$, by definition.
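The two branch functions $f_k^e$ and $g_{k-1}^e$ and the identity $g_{k-1}^e\, f_k^e = v\, W_k$ can be checked mechanically: the carried weight $v$ is either accumulated into the new weight (no resampling) or paid as a potential and reset (resampling). The weight function `W` below is an arbitrary nonnegative illustrative choice.

```python
def f_e(mu_in_D, x, v, x_next, W):
    """Weight component f_k^e of the extended kernel: the carried weight v is
    reset after resampling (mu in D) and accumulated otherwise."""
    return W(x, x_next) if mu_in_D else v * W(x, x_next)

def g_e(mu_in_D, v):
    """Potential g_{k-1}^e: the carried weight v is paid as a potential
    exactly when resampling occurs."""
    return v if mu_in_D else 1.0

# The identity g_{k-1}^e(mu, v) * f_k^e(mu, x, v, x') = v * W_k(x, x')
# holds in both branches:
W = lambda x, xp: 0.5 * (x + xp) ** 2 + 1.0   # arbitrary nonnegative weight function
for in_D in (True, False):
    assert abs(g_e(in_D, 2.0) * f_e(in_D, 1.0, 2.0, 3.0, W) - 2.0 * W(1.0, 3.0)) < 1e-12
```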
Example 2.1. For instance
\[
D = \Bigl\{\, \mu \in \mathcal{P} : \frac{\langle \mu, 1 \otimes e^2 \rangle}{\langle \mu, 1 \otimes e \rangle^2} \geq c \,\Bigr\},
\]
where $c \geq 1$ is a threshold to be fixed.
Consider next the unnormalized and normalized Feynman–Kac distributions defined on the product set $E \times [0, \infty)$ by
\[
\langle \gamma_n^e, F \rangle = \int_E \int_0^\infty \cdots \int_E \int_0^\infty F(x_n, v_n)\, \mu_0^e(dx_0, dv_0) \prod_{k=1}^n R_k^e(\mu_{k-1}^e, x_{k-1}, v_{k-1}, dx_k, dv_k),
\]
and
\[
\langle \mu_n^e, F \rangle = \frac{\langle \gamma_n^e, F \rangle}{\langle \gamma_n^e, 1 \rangle},
\]
respectively. Associated with this integral representation is the recurrence relation
\[
\gamma_k^e = \gamma_{k-1}^e\, R_k^e(\mu_{k-1}^e) = \mu_{k-1}^e\, R_k^e(\mu_{k-1}^e)\, \langle \gamma_{k-1}^e, 1 \rangle = [\, \mathbf{1}(\mu_{k-1}^e \notin D)\, \mu_{k-1}^e P_k^{\mathrm{imp}} + \mathbf{1}(\mu_{k-1}^e \in D)\, (g_{k-1}^{\mathrm{red}}\, \mu_{k-1}^e)\, P_k^{\mathrm{red}}\, ]\, \langle \gamma_{k-1}^e, 1 \rangle,
\]
for the unnormalized distribution, hence
\[
\langle \gamma_k^e, 1 \rangle = [\, \mathbf{1}(\mu_{k-1}^e \notin D) + \mathbf{1}(\mu_{k-1}^e \in D)\, \langle \mu_{k-1}^e, g_{k-1}^{\mathrm{red}} \rangle\, ]\, \langle \gamma_{k-1}^e, 1 \rangle,
\]
for the normalizing constant, and
\[
\mu_k^e = \mathbf{1}(\mu_{k-1}^e \notin D)\, \mu_{k-1}^e P_k^{\mathrm{imp}} + \mathbf{1}(\mu_{k-1}^e \in D)\, (g_{k-1}^{\mathrm{red}} \cdot \mu_{k-1}^e)\, P_k^{\mathrm{red}},
\]
for the normalized distribution, for any $k = 1, \cdots, n$.
The Feynman–Kac distributions for the more general nonlinear Markov model defined on the product set $E \times [0, \infty)$ include as special cases the Feynman–Kac distributions for the original model and other unnormalized distributions defined on the set $E$, for appropriate choices of test functions. Indeed, introduce
\[
\Gamma_k = \langle \gamma_k^e, 1 \rangle, \quad \langle \gamma_k^{(1)}, \phi \rangle = \langle \gamma_k^e, \phi \otimes e \rangle, \quad \text{and} \quad \langle \gamma_k^{(2)}, \phi \rangle = \langle \gamma_k^e, \phi \otimes e^2 \rangle,
\]
and notice that the statistic that governs redistribution can also be expressed in terms of normalizing constants as follows
\[
c_k = \frac{\langle \mu_k^e, 1 \otimes e^2 \rangle}{\langle \mu_k^e, 1 \otimes e \rangle^2} = \frac{\langle \gamma_k^{(2)}, 1 \rangle\, \Gamma_k}{\langle \gamma_k, 1 \rangle^2}, \tag{4}
\]
for any $k = 0, 1, \cdots, n$.
Proposition 2.2. The two unnormalized distributions defined above satisfy the following recurrence relations
\[
\gamma_k^{(1)} = \gamma_{k-1}^{(1)}\, R_k,
\]
with initial condition $\gamma_0^{(1)} = \gamma_0$, which implies that $\gamma_k^{(1)} = \gamma_k$, and
\[
\gamma_k^{(2)} = \mathbf{1}(\mu_{k-1}^e \notin D)\, \gamma_{k-1}^{(2)}\, \bar{R}_k + \mathbf{1}(\mu_{k-1}^e \in D)\, \gamma_{k-1}\, \bar{R}_k,
\]
with initial condition $\gamma_0^{(2)} = \bar{\gamma}_0$, and the normalizing constant satisfies the following recurrence relation
\[
\Gamma_k = \mathbf{1}(\mu_{k-1}^e \notin D)\, \Gamma_{k-1} + \mathbf{1}(\mu_{k-1}^e \in D)\, \langle \gamma_{k-1}, 1 \rangle,
\]
with initial condition $\Gamma_0 = 1$.
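On a finite state space the extended distribution $\mu_k^e$ has finite support in $(x, v)$, so the recurrences above can be evaluated exactly rather than by simulation. The sketch below checks the identity $\gamma_k^{(1)} = \gamma_k$ on a hypothetical two-state model; all kernels, weight functions, and the threshold are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical two-state model: E = {0, 1}; all numbers are illustrative.
p0 = np.array([0.6, 0.4])                    # initial distribution p_0
W0 = np.array([1.5, 0.5])                    # initial weight function W_0
P = np.array([[0.7, 0.3], [0.2, 0.8]])       # Markov kernel P_k (constant in k)
W = np.array([[2.0, 0.5], [1.0, 1.5]])       # weight function W_k(x, x')
R = W * P                                    # R_k(x, dx') = W_k(x, x') P_k(x, dx')
c = 1.3                                      # resampling threshold of Example 2.1

# mu_k^e has finite support, stored as {(x, v): prob}; Gamma_k = <gamma_k^e, 1>.
mu_e = {(0, W0[0]): p0[0], (1, W0[1]): p0[1]}
Gamma = 1.0
gamma = p0 * W0                              # direct recursion gamma_k = gamma_{k-1} R_k

for k in range(4):
    m1 = sum(p * v for (x, v), p in mu_e.items())      # <mu, 1 ⊗ e>
    m2 = sum(p * v * v for (x, v), p in mu_e.items())  # <mu, 1 ⊗ e^2>
    in_D = m2 / m1 ** 2 >= c                           # resampling criterion
    new = {}
    for (x, v), p in mu_e.items():
        base = p * v / m1 if in_D else p               # reweight by g^red = v when in D
        for xp in (0, 1):
            vp = W[x, xp] if in_D else v * W[x, xp]    # f_k^e: reset or accumulate
            new[(xp, vp)] = new.get((xp, vp), 0.0) + base * P[x, xp]
    mu_e = new
    Gamma *= m1 if in_D else 1.0                       # <gamma_k^e, 1> recursion
    gamma = gamma @ R

# Proposition 2.2: <gamma_k^(1), phi> = Gamma_k <mu_k^e, phi ⊗ e> coincides with gamma_k.
gamma1 = np.zeros(2)
for (x, v), p in mu_e.items():
    gamma1[x] += Gamma * p * v
assert np.allclose(gamma1, gamma)
```

The same exact computation can be used to cross-check the recursion for $\langle \gamma_k^e, 1 \rangle$ against the criterion (4).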
3. PARTICLE APPROXIMATION

Introducing a particle approximation of the form
\[
\mu_{k-1}^e \approx \mu_{k-1}^{e,N} = \frac{1}{N} \sum_{i=1}^N \delta_{(\xi_{k-1}^i, v_{k-1}^i)},
\]
yields: if $\mu_{k-1}^{e,N} \notin D$, then
\[
\mu_{k-1}^{e,N} P_k^{\mathrm{imp}}(dx', dv') = \frac{1}{N} \sum_{i=1}^N \underbrace{P_k(\xi_{k-1}^i, dx')\, \delta_{v_{k-1}^i\, W_k(\xi_{k-1}^i, x')}(dv')}_{\textstyle m_k^i(dx', dv')},
\]
hence the approximation
\[
\mu_k^{e,N} = \frac{1}{N} \sum_{i=1}^N \delta_{(\xi_k^i, v_k^i)},
\]
where independently for any $i = 1, \cdots, N$ the random variable $(\xi_k^i, v_k^i)$ is distributed according to $m_k^i(dx', dv')$, or equivalently
\[
\xi_k^i \sim P_k(\xi_{k-1}^i, dx') \quad \text{and} \quad v_k^i = v_{k-1}^i\, W_k(\xi_{k-1}^i, \xi_k^i),
\]
and if $\mu_{k-1}^{e,N} \in D$, then
\[
g_{k-1}^{\mathrm{red}} \cdot \mu_{k-1}^{e,N} = \sum_{i=1}^N w_{k-1}^i\, \delta_{(\xi_{k-1}^i, v_{k-1}^i)},
\]
with $w_{k-1}^i \propto v_{k-1}^i$, and
\[
(g_{k-1}^{\mathrm{red}} \cdot \mu_{k-1}^{e,N})\, P_k^{\mathrm{red}}(dx', dv') = \underbrace{\sum_{i=1}^N w_{k-1}^i\, P_k(\xi_{k-1}^i, dx')\, \delta_{W_k(\xi_{k-1}^i, x')}(dv')}_{\textstyle \bar{m}_k(dx', dv')},
\]
hence the approximation
\[
\mu_k^{e,N} = \frac{1}{N} \sum_{i=1}^N \delta_{(\xi_k^i, v_k^i)},
\]
where independently for any $i = 1, \cdots, N$ the random variable $(\xi_k^i, v_k^i)$ is distributed according to $\bar{m}_k(dx', dv')$, or equivalently
\[
\xi_k^i \sim P_k(\xi_{k-1}^{\tau_k^i}, dx') \quad \text{and} \quad v_k^i = W_k(\xi_{k-1}^{\tau_k^i}, \xi_k^i),
\]
where the index $\tau_k^i \in \{1, \cdots, N\}$ is selected according to the weights $(w_{k-1}^1, \cdots, w_{k-1}^N)$, e.g. using multinomial sampling or some other sampling strategy from a discrete probability distribution.
Example 3.1. Notice that the effective sample size can be expressed in terms of the proposed particle approximation. Indeed
\[
\frac{\langle \mu_{k-1}^{e,N}, 1 \otimes e^2 \rangle}{\langle \mu_{k-1}^{e,N}, 1 \otimes e \rangle^2} = \frac{\displaystyle \frac{1}{N} \sum_{i=1}^N |v_{k-1}^i|^2}{\displaystyle \Bigl|\frac{1}{N} \sum_{i=1}^N v_{k-1}^i\Bigr|^2} = N \sum_{i=1}^N |w_{k-1}^i|^2,
\]
i.e.
\[
\frac{\langle \mu_{k-1}^{e,N}, 1 \otimes e^2 \rangle}{\langle \mu_{k-1}^{e,N}, 1 \otimes e \rangle^2} = \frac{N}{N^{\mathrm{eff}}} \geq 1,
\]
where $N^{\mathrm{eff}}$ is the effective sample size.
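The identity in Example 3.1 can be checked numerically on a random weight vector; the lognormal draw for the carried weights is an arbitrary illustrative choice.

```python
import numpy as np

# Check: <mu,1⊗e^2>/<mu,1⊗e>^2 = N * sum(w_i^2) = N / N_eff for the uniformly
# weighted extended sample with carried weights v_i.
rng = np.random.default_rng(2)
N = 1000
v = rng.lognormal(size=N)           # carried weights v_{k-1}^i > 0
lhs = np.mean(v ** 2) / np.mean(v) ** 2
w = v / v.sum()                     # normalized weights w_i ∝ v_i
n_eff = 1.0 / np.sum(w ** 2)
assert np.isclose(lhs, N / n_eff)
assert lhs >= 1.0                   # Cauchy-Schwarz: the criterion is always >= 1
```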
This particle approximation for the nonlinear adaptive Markov model coincides with the classical particle approximation [1] with adaptive resampling, and it satisfies a CLT that can be proved using standard techniques.
4. CENTRAL LIMIT THEOREM
Let $T = \{t_1, \cdots, t_p\} \subset \{0, 1, \cdots, n-1\}$ denote the set of redistribution times, i.e. $t \in T$ iff $\mu_t^e \in D$, and let $T_+ = T \cup \{n\} = \{t_1, \cdots, t_p, t_{p+1}\}$ with $t_{p+1} = n$ by convention.
Theorem 4.1. For the normalizing constant and for the normalized distribution, it holds
\[
\sqrt{N}\, \Bigl[\, \frac{\langle \gamma_n^N, 1 \rangle}{\langle \gamma_n, 1 \rangle} - 1\, \Bigr] \Longrightarrow \mathcal{N}(0, V_n),
\]
and
\[
\sqrt{N}\, \langle \mu_n^N - \mu_n, \phi \rangle \Longrightarrow \mathcal{N}(0, v_n(\phi)),
\]
in distribution as $N \uparrow \infty$, for any bounded measurable function $\phi$, with the following expressions for the asymptotic variances
\[
V_n = \sum_{t \in T_+} \Bigl[\, \frac{\langle \mu_t^{(2)}, |R_{t+1:n}\, 1|^2 \rangle}{\langle \mu_t, R_{t+1:n}\, 1 \rangle^2}\, c_t - 1\, \Bigr],
\]
and
\[
v_n(\phi) = \sum_{t \in T_+}
\]