An Introduction to Sequential Monte Carlo for Filtering and Smoothing
Olivier Capp´e
LTCI, TELECOM ParisTech & CNRS http://perso.telecom-paristech.fr/∼cappe/
Acknowlegdment: Eric Moulines (TELECOM ParisTech) & Tobias Ryd´en (Lund)
Bayesian Dynamic Models
1 Bayesian Dynamic Models
Hidden Markov Models and State-Space Models Extensions
2 The Filtering and Smoothing Recursions
3 Sequential Importance Sampling
4 Sequential Importance Sampling with Resampling
5 The Auxiliary Particle Filter
6 Smoothing
7 Mixture Kalman Filter
Bayesian Dynamic Models Hidden Markov Models and State-Space Models
Hidden Markov Model (HMM)
The Hidden State Process {Xk}k≥0 is a Markov chainwith initial probability density function (pdf)t0(x)and transition density function t(x, x0) such that*
p(x0:k) =t0(x0)
k−1
Y
l=0
t(xl, xl+1).
The Observed Process {Yk}k≥0 is such that the conditional joint density ofy0:k given x0:k has the conditional independence (product) form
p(y0:k|x0:k) =
k
Y
l=0
`(xl, yl).
*x0:kdenotes the collectionx0, . . . , xk.
Bayesian Dynamic Models Hidden Markov Models and State-Space Models
Graphical Representation of the Dependence Structure
The HMM can be represented pictorially by aBayesian network which depicts the conditional independence relations:
· · · -
-
- · · ·
?
?
Xk Xk+1
Yk Yk+1
Bayesian Dynamic Models Hidden Markov Models and State-Space Models
State-Space Form
Here the model is described in a functional form:
Xk+1 =a(Xk, Uk), Yk =b(Xk, Vk),
where{Uk}k≥0 and {Vk}k≥0 are mutually independent i.i.d.
sequences of random variables (also independent ofX0).
Remark
The termstate-space modeloften refers to the case where aandb are linear functions of their arguments (and{Uk},{Vk},X0 are jointly Gaussian).
Likewise, the termHMM is sometimes used (not in this talk!) more restrictively for the case where X is a finite set.
Bayesian Dynamic Models Hidden Markov Models and State-Space Models
HMM Examples
- finite state space
6
continuous state space
ergodicleft-right
speech recognition handwritting recognition source coding
ion channel modelling
trajectory models tracking, vision quantitative finance
Bayesian Dynamic Models Extensions
Beyond HMMs
For sequential Monte Carlo methods, the key point is thestructure of the conditionalp(x0:k|y0:k): the methods described in this talk directly apply in cases where the conditional may be factored as
p(x0:k|y0:k) =p(x0|y0)
k−1
Y
l=0
p(xl+1|xl, y0:l+1)
Example: Switching Autoregressive Model If the observation equation is replaced by
Yk =b(Xk)Yk−1+Vk thenp(x0:k|y0:k) =p(x0|y0)Qk−1
l=0 p(xl+1|xl, yl+1, yl)
Bayesian Dynamic Models Extensions
Hierarchical HMMs
· · · -
-
- · · ·
Ck Ck+1
R R
? ?
· · · -
-
- · · ·
Zk Zk+1
? ?
Yk Yk+1
Bayesian network for a hierarchical HMM: The state sequence{Xk}is composed of two chains{Ck}and{Zk}that evolve in parallel and the observationYkdepends on both component of the chain.
Bayesian Dynamic Models Extensions
In hierarchical HMMs, sequential Monte Carlo is applicable to the marginalized posteriorp(c0:k|Y0:k) (despite the fact that it doesn’t factorizes as previously requested due to the marginalization of Z0:k) when
p(zk|c0:k, y0:k) is computable Conditionally Gaussian Linear State-Space Model
Dynamic equation
Zk+1=A(Ck+1)Zk+R(Ck+1)Uk
Observation equation
Yk=B(Ck)Zk+S(Ck)Vk where{Ck} is a finite-valued Markov Chain.
The Filtering and Smoothing Recursions
1 Bayesian Dynamic Models
2 The Filtering and Smoothing Recursions Basic Recursions
Computational Filtering and Smoothing Approaches 3 Sequential Importance Sampling
4 Sequential Importance Sampling with Resampling
5 The Auxiliary Particle Filter
6 Smoothing
7 Mixture Kalman Filter
The Filtering and Smoothing Recursions Basic Recursions
Tasks of interest for HMMs
State Inference How to make probabilistic statements on the state sequence given the model and the observations?
Filtering πk|k(xk) =p(xk|Y0:k) Prediction πk+1|k(xk+1) =p(xk+1|Y0:k) Smoothing π0:k|k(x0:k) =p(x0:k|Y0:k)
(fixed-interval: πl|k for l= 0, . . . , k;
fixed-lag: πk|k+∆ fork= 0, . . .) Parameter Inference How to tune the model parameters based on
the observations?
The Filtering and Smoothing Recursions Basic Recursions
Recursive Structure of the Joint Smoothing Density
By Bayes’ rule π0:k+1|k+1(x0:k+1)
= (Lk+1(Y0:k+1))−1 t0(x0)
k
Y
l=0
t(xl, xl+1)
k+1
Y
l=0
`(xl, Yl)
=
Lk+1(Y0:k+1) Lk(Y0:k)
−1
π0:k|k(x0:k)t(xk, xk+1)`(xk+1, Yk+1), where the normalization constantsLk, i.e., thelikelihood of the observations, is usually not computable.
The Filtering and Smoothing Recursions Basic Recursions
The Joint Smoothing Recursion
π0:k+1|k+1(x0:k+1) =
Lk+1 Lk
−1
π0:k|k(x0:k)t(xk, xk+1)`(xk+1, Yk+1)
Themarginal recursionmay be decomposed in two steps:
Prediction
πk+1|k(xk+1) = Z
πk|k(xk)t(xk, xk+1)dxk Filtering
πk+1|k+1(xk+1) =
Lk+1 Lk
−1
πk+1|k(xk+1)`(xk+1, Yk+1)
The Filtering and Smoothing Recursions Computational Filtering and Smoothing Approaches
Exact Implementation of the Filtering and Smoothing Recursions
When X is finite (Baumet al., 1970) The associated
computational cost is|X|2 per time index (for the filtering part).
In linear Gaussian state-space models (Kalman & Bucy, 1961) The filtering and prediction recursion is implemented by theKalman filter (Lk+1/Lk is interpreted as the likelihood of the (k+ 1)-th innovation).
Suchfinite dimensional filtersexist only in very specific models (see, e.g., Runggaldier & Spizzichino, 2001)
The Filtering and Smoothing Recursions Computational Filtering and Smoothing Approaches
Approximate Implementations of the Filtering and Smoothing Recursions
EKF (Extended Kalman Filter) Linearization-based approach (for non-linear Gaussian state space models)
UKF (Unscented Kalman Filter, Julier & Uhlmann, 1997) Point-based approach
Variational Methods (e.g., Valpola & Karhunen, 2002)Based on parametric density approximation arguments.
Exact Suboptimal Filters In particular, Kalman filter viewed as minimum mean square error linearfiltering.
The Filtering and Smoothing Recursions Computational Filtering and Smoothing Approaches
Sequential Monte Carlo (SMC)
Sequential Monte Carlo (sometimes calledparticle filtering) is a method which uses pseudo-random simulations to produce successive populations of weighted “particles” Xk1:n and associated weights Wk1:n such that
n
X
i=1
Wkif(Xki)≈ Z
f(x)πk|k(x)dx , for all functions f of interest.
The SMC process is sequential in the sense that given Xk1:n, Wk1:n and the observationsY0:k+1,Xk+11:n andWk+11:n are conditionally independent of previous populations of particles.
SMC is based on importance sampling and resampling.
Sequential Importance Sampling
1 Bayesian Dynamic Models
2 The Filtering and Smoothing Recursions
3 Sequential Importance Sampling
Self-Normalized Importance Sampling Sequential Importance Sampling (SIS) Weight Degeneracy
4 Sequential Importance Sampling with Resampling
5 The Auxiliary Particle Filter
6 Smoothing
7 Mixture Kalman Filter
Sequential Importance Sampling Self-Normalized Importance Sampling
Self-Normalized Importance Sampling, or IS (Hammersley
& Handscomb, 1964)
IS is a weighted form of Monte Carlo approximation, in which expectations under a target pdfπ
π(f) = Eπ[f(X)]
are estimated as ˆ πqn(f) =
n
X
i=1
ωi Pn
j=1ωj
| {z }
Wi
f(Xi),
where
Xi ∼iid q, ωi = πq(Xi).
This form of IS (sometimes also called Bayesian IS) does not necessitate thatπ be properly normalized.
Sequential Importance Sampling Self-Normalized Importance Sampling
Performance of IS
Assuming thatEπ[πq(X)(1 +f2(X))]<∞,πˆqn(f)is consistent and asymptotically normal, withasymptotic variance given by
υq(f) = Eπ π
q(X)(f(X)−π(f))2
.
The asymptotic variance can be estimated from the IS sample by ˆ
υnq(f) =n
n
X
i=1
(Wi)2{f(Xi)−πˆnq(f)}2 , whereWi =ωi/Pn
j=1ωj are the normalized weights.
Sequential Importance Sampling Self-Normalized Importance Sampling
Elements of proof
ConsistencyIfπ¯=π/c, wherec=R
π(x)dx, is a pdf, Eq[ωif(Xi)] = Eq
π
q(X)f(X)
=cEπ¯[f(X)].
CLT
√n(ˆπqn(f)−π(f¯ )) =
√1 n
Pn
i=1ωi(f(Xi)−π(f¯ ))
1 n
Pn j=1ωj
andVar[ωi(f(Xi)−π(f))] =c2E¯π[π¯q(X)(f(X)−π(f))2].
Empirical estimate
ˆ υqn(f) =
n
X
i=1
Wi
¯ π q(Xi)
1 n
Pn j=1 π¯
q(Xj){f(Xi)−πˆqn(f)}2
Sequential Importance Sampling Self-Normalized Importance Sampling
Empirical diagnostic tools for IS
Effective sample size
ESSnq =
" n X
i=1
Wi2
#−1
1 1≤ESSnq ≤n
2 n/ESSnq is an estimate of Eπ[πq(X)]which is the maximal IS asymptotic variance for functions f such that
|f(x)−π(f)| ≤1(note: these functions have maximal Monte Carlo variance of 1 under π ).
3 n/ESSnq−1 is an estimator of theχ2 divergence Z (π(x)−q(x))2
q(x) dx
(n/ESSnq−1is also the square of the empirical coefficient of variation associated with the normalized weights).
Sequential Importance Sampling Self-Normalized Importance Sampling
Another Empirical diagnostic tools for IS
(Shannon) Entropy of the Normalized Weights
ENTnq =−
N
X
i=1
Wilog(Wi)
1 1≤exp(ENTnq)≤n
2 exp(ENTnq)/nis an estimate of exp[−K(π||q)], where K(π||q)is the Kullback-Leibler divergence between π and q.
Sequential Importance Sampling Self-Normalized Importance Sampling
Optimal IS Instrumental Density
When considering large classes of functionsf, IS generally appears to perform worse than Monte Carlo underπ (e.g., for functions such that|f(x)−π(f)| ≤1, the maximal Monte Carlo asymptotic variance is 1 whereas, the maximal IS asymptotic variance is Eπ[πq(X)] which is strictly larger than 1 by Jensen’s inequality, unlessq=π).
However, for a fixed functionf, the optimal IS density is notπ but
|f(x)−π(f)|π(x) R |f(x0)−π(f)|π(x0)dx0 as Cauchy-Schwarz inequality implies that
Eπ π
q(X)(f(X)−π(f))2
Eπhq π(X)i
| {z }
=1
≥E2π[|f(X)−π(f)|] .
Sequential Importance Sampling Self-Normalized Importance Sampling
Example (Bayesian Posterior) Consider a simple model in which
X0∼N(1,22), Y0|X0∼N(X02, σ2).
And apply IS to compute expectations under the posterior for Y0 = 2, using the prior as instrumental pdf.
Sequential Importance Sampling Self-Normalized Importance Sampling
−6 −4 −2 0 2 4 6 8
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4
Instrumental Target
σ= 2 ESSnq /n 0.66 υq(x7→x) 1.69 υπ(x7→x) 1.25
−6 −4 −2 0 2 4 6 8
0 0.2 0.4 0.6 0.8 1 1.2 1.4
Instrumental Target
σ = 0.5 ESSnq/n 0.25 υq(x7→x) 5.35 υπ(x7→x) 1.25
Sequential Importance Sampling Sequential Importance Sampling (SIS)
Back to the Filtering and Smoothing Problem
How to estimate expectations under the posterior π0:k|k(x0:k) =p(x0:k|Y0:k) in the model
p(x0:k) =t0(x0)
k−1
Y
l=0
t(xl, xl+1),
p(y0:k|x0:k) =
k
Y
l=0
`(xl, yl), using a sequential algorithm ?
Sequential Importance Sampling Sequential Importance Sampling (SIS)
Sequential Smoothing through IS, or SIS (Handschin & Mayne, 1969-1970)
1 Propose nindependent particle trajectories {X0:k+1i }1≤i≤n under a Markovian scheme such that
p(x0:k+1) =ρ0:k+1(x0:k+1) =q0(x0)
k
Y
l=1
ql(xl, xl+1).
2 Computeimportance weights sequentially:
ωk+1i = π0:k+1|k+1(X0:k+1i )
ρ0:k+1(X0:k+1i ) =ωik×t(Xki, Xk+1i )`(Xk+1i , Yk+1) qk(Xki, Xk+1i ) . Then,
n
X
i=1
ωik+1 Pn
j=1ωk+1j f(X0:k+1i ) is an estimate ofE [f(X0:k+1)|Y0:k+1].
Sequential Importance Sampling Sequential Importance Sampling (SIS)
FILT.
INSTR.
FILT. +1
One step of the SIS algorithm with just seven particles.
Sequential Importance Sampling Sequential Importance Sampling (SIS)
Choice of the Instrumental Kernel
The so-called “optimal” choice ofqk, consists in setting qk(x, x0) =qoptk (x, x0) = t(x, x0)`(x0, Yk+1)
R t(x, x00)`(x00, Yk+1)dx00
for which the normalized weights are predictable, i.e. Wk+1i depend onXki but not Xk+1i (hence, for this instrumental kernel the conditional variance cancels).
This is however usually not feasible and common choices include
1 the prior qk=t (and thenωki ∝`(Xki, Yk))
2 approximations (sometimes heuristic) to qkopt(moment matching, use of EKF or UKF, . . . ),
3 tuning parameters of qk so as to maximize theeffective sample sizeor entropy criterions
Sequential Importance Sampling Weight Degeneracy
Weight Degeneracy
Empirically,the SIS approach always fail when the time-horizonk is more than a few tens; the IS weights ωk1:n usually become very unbalanced with a few weights dominating all the other
To understand why it is the case, consider the(silly) model where (t(x, x0) =t(x0) =t0(x0), (Independent states)
`(x, y) =`(y), (Non-informative observations) and the instrumental kernel is such thatql(x, x0) =q0(x0) =q(x0)
Sequential Importance Sampling Weight Degeneracy
Weight Degeneracy (Contd.)
For a function of interestf that only depends on the last
coordinatexk of the trajectoryx0:k, the asymptotic variance of the SIS approximation toπk|k(f) = Eπk|k[f(X)]is given by
υk(f)= Z
· · · Z k
Y
l=0
t q(xl)
!2
f(xk)−πk|k(f)2 k
Y
l=0
q(xl)dx0. . . dxk
= Z
t
q(x)t(x)dx
| {z }
>1
kZ t
q(x) f(x)−πk|k(f)2
t(x)dx .
In practise, this situation can usually be detected by monitoring the effective sample sizeor entropycriterions, which become
abnormally small.
Sequential Importance Sampling Weight Degeneracy
Application to the Stochastic Volatility Model
This is a non-linearly observed state-space model used to represent log-returns in quantitative finance:
Xk+1 =φXk+σUk |φ|<1, Yk=βexp(Xk/2)Vk,
where
{Uk}k≥0 and{Vk}k≥0 are independent standard Gaussian white noise processes.
X0∼N(0, σ2/(1−φ2).
We consider trajectories of length 50 simulated under the model withφ= 0.98,σ = 0.17and β= 0.64 and apply SIS withqk =q andn= 1,000particles.
Sequential Importance Sampling Weight Degeneracy
Typical Evolution of ESS for a Single Trajectory
1 5 10 15 20 25 30 35 40 45 50
0 100 200 300 400 500 600 700 800 900 1000
ESS (500 indep. replications of the particles)
time index
Sequential Importance Sampling Weight Degeneracy
Evolution of ESS When Averaging Over Trajectories
1 5 10 15 20 25 30 35 40 45 50
0 100 200 300 400 500 600 700 800 900 1000
ESS (500 indep. simulations)
time index
Sequential Importance Sampling Weight Degeneracy
Typical Evolution of the Weight Distribution
−250 −20 −15 −10 −5 0
500 1000
−250 −20 −15 −10 −5 0
500 1000
−250 −20 −15 −10 −5 0
50 100
Importance Weights (base 10 logarithm)
Histograms of the base 10 logarithm of the normalized impor- tance weights after (from top to bottom) 1, 10 and 100 iterations for the stochastic volatility model(improved instrumental kernel based on moment matching).
Sequential Importance Sampling Weight Degeneracy
Filtering Results
−1.5 −2
−0.5 −1 0.5 0 1.5 1 2 0
5
10
15
20 0
0.02 0.04 0.06 0.08
State Time Index
Density
Waterfall representation of the sequence of estimated filtering pdfs (weighted kernel smooth) together with the actual state for 1,000 particles(improved instrumental kernel based on moment matching).
Sequential Importance Sampling with Resampling
1 Bayesian Dynamic Models
2 The Filtering and Smoothing Recursions
3 Sequential Importance Sampling
4 Sequential Importance Sampling with Resampling Sampling Importance Resampling
Sequential Importance Sampling with Resampling (SISR) Case Study
5 The Auxiliary Particle Filter
6 Smoothing
7 Mixture Kalman Filter
Sequential Importance Sampling with Resampling Sampling Importance Resampling
In IS, it is indeed possible toreset the weights to a constant value at the price of a, usually moderate, increase in variance.
Sampling Importance Resampling (Rubin, 1987)
Replace{X1:n, W1:n} by{X˜1: ˜N,W˜1: ˜N} such that the discrepancy between the resampled weights{W˜1: ˜N} is reduced and
PN˜
i=1W˜kiδX˜i is a good approximation toPn
i=1WiδXi.
In general the resampling is random and subject to the constraints
N˜ =n , W˜i= 1/N ,˜ E
hPN˜
i=11{X˜i=Xj}
X1:n, W1:n i
= ˜N Wj (1≤j ≤n).
The last condition is often referred to asunbiasedness or proper weighting.
Sequential Importance Sampling with Resampling Sampling Importance Resampling
Multinomial Resampling
1 Draw, conditionally independently given{X1:n, W1:n},n discrete random variables(J1, . . . , Jn) taking their values in the set{1, . . . , n}with probabilities(W1, . . . , Wn).
2 Set, fori= 1, . . . , n,X˜i =XJi andW˜i= 1/n.
Let Ci=Pn
j=11{X˜j =Xi} (i= 1, . . . , n) denote the number of times each particle is duplicated in the resampling process. The counts (C1, . . . , Cn) follow a multinomial distribution with parameters n, (W1, . . . , Wn), conditionally to {X1:n, W1:n} .
INSTRUMENTAL
TARGET
Resampled particles
Sequential Importance Sampling with Resampling Sampling Importance Resampling
Some Results on SIR
1 X˜i −→D π are n→ ∞ (some extensions of this result)
2 1 n
Pn
i=1f( ˜Xi)is an asymptotically normal estimator ofπ(f) (assumingEπ[πq(X)(1 +f2(X)) +f2(X)]<∞) with asymptotic variance given by
˜
υq(f) = Eπ π
q(X) (f(X)−π(f))2
| {z }
υq(f)
+Eπh
(f(X)−π(f))2i
| {z }
Varπ[f(X)]
Ifnis sufficiently large, the cost of resampling is very moderate in situation that are challenging for IS, i.e., when
υq(f)Varπ[f(X)].
Sequential Importance Sampling with Resampling Sampling Importance Resampling
Elements of Proof
For a bounded function f, E
h f( ˜Xi)
i
= E h
E
f( ˜Xi)
X1:n, W1:n i
= E
n
X
j=1
Wjf(Xj)
andPn
j=1Wjf(Xj)−→a.s. π(f) (|Pn
j=1Wjf(Xj)| ≤ kfk∞).
For the CLT, an ingredient of the proof is that nVar
"
1 n
n
X
i=1
f( ˜Xi)
#
=nVar
" n X
i=1
Wif(Xi)
#
| {z }
→υq(f)
+ E
" n X
i=1
Wif2(Xi)−
n
X
i=1
Wif(Xi)
!2
| {z }
−→Vara.s. π[f(X)]
#
Sequential Importance Sampling with Resampling Sampling Importance Resampling
There Exists Resampling Schemes with Reduced Variance
Residual Resampling Fori= 1, . . . , nset
Ci= nWi
+ ¯Ci ,
whereC¯1, . . . ,C¯n are distributed, conditionally toW1:n, according to the multinomial distributionMult(n−R,W¯1, . . . ,W¯n) with R=Pn
i=1bnWic and
W¯i= nWi− bnWic
n−R , i= 1, . . . , n .
This scheme is obviously unbiased and induces an additional variance which is always lower thanVarπ[f](the same is true for the bootstrap filter based on residual resampling, Douc &
Moulines, 2005).
Sequential Importance Sampling with Resampling Sequential Importance Sampling with Resampling (SISR)
The Simplest Functional Algorithm (Gordon et al., 1993)
Regular resampling is added to avoidweight degeneracy and to guarantee the long-term (k→ ∞) stability of the particle filter.
The Bootstrap filter
1 Given X˜k1:n, propose new positions Xk+1i independentlyunder the prior dynamic t( ˜Xki,·), for i= 1, . . . , n;
2 Compute the weightsωik+1=`(Xk+1i , Yk+1), for i= 1, . . . , n and normalize them (Wk+1i =ωik+1/Pn
j=1ωk+1j );
3 Resampleto obtain X˜k+11:n, e.g., by drawing independent indicesJk+1i such thatP Jk+1i =j
Wk+11:n
=Wk+1j and setting X˜k+1i =XJ
i k+1
k+1 (Multinomial Resampling).
To avoid unnecessary resampling,3 can be used only when the ESS statistics of the weightsWk+11:n falls below a threshold.
Sequential Importance Sampling with Resampling Sequential Importance Sampling with Resampling (SISR)
FILT.
FILT. +1
FILT. +2
FILT.
FILT. +1
FILT. +2
SIS (left) and SISR (right).
Sequential Importance Sampling with Resampling Case Study
AR(1) Model Observed in Pulsated Noise
We consider a simple Gaussian linear state-space model to allow for comparison with analytical computations:
Xk+1=φXk+σUk , Yk=Xk+ηkVk,
whereφ= 0.99,σ = 0.2, andηk is varied(periodically)between 0.1 and3.
The observation sequence is simulated from the model but we also consider the influence of an out-of-modeloutlying observationat time 460.
Sequential Importance Sampling with Resampling Case Study
0 50 100 150 200 250 300 350 400 450 500
−5 0 5 10
time
state obs.
filt.
smooth.
10 20 30 40 50 60 70 80 90 100
−6
−4
−2 0 2 4 6
time
state obs.
filt.
smooth.
400 410 420 430 440 450 460 470 480 490 500
−8
−6
−4
−2 0 2 4 6 8 10
time
state obs.
filt.
smooth.
Sequential Importance Sampling with Resampling Case Study
Sequential Importance Sampling (prior kernel, n = 100)
5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 0
0.5 1 1.5
filt. std.
time
5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
−2 0 2
filt. mean
5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 0
50 100
ESS
Sequential Importance Sampling with Resampling Case Study
Bootstrap Filter (prior kernel, n = 100, resamp. ESS < 10)
5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 0
0.5 1 1.5
filt. std.
time
5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
−2 0 2
filt. mean
5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 20
40 60 80 100
ESS
Sequential Importance Sampling with Resampling Case Study
Bootstrap Filter (prior kernel, n = 100, resamp. always)
5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
−2 0 2
filt. mean
10 20 30 40 50 60 70 80 90 100
20 40 60 80 100
5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 0
0.5 1 1.5
filt. std.
time
Sequential Importance Sampling with Resampling Case Study
Bootstrap Filter (prior kernel, n = 100, resamp. ESS < 10)
400 410 420 430 440 450 460 470 480 490 500
0 0.5 1
filt. std.
time
400 410 420 430 440 450 460 470 480 490 500
0 2 4
filt. mean
20 40 60 80 100
ESS
Sequential Importance Sampling with Resampling Case Study
Bootstrap Filter (n = 5, 000, resamp. ESS < 500)
400 410 420 430 440 450 460 470 480 490 500
0 0.5 1
filt. std.
time
400 410 420 430 440 450 460 470 480 490 500
0 2 4
filt. mean
1000 2000 3000 4000 5000
ESS
Sequential Importance Sampling with Resampling Case Study
Conclusions
Withresampling, SISR achieveslong-term stability.
The increase in variance due to resampling is very moderate, especially when resampling is applied only when needed.
The method is still sensitive to outliers, model
misspecification, etc., which may necessitate the use of more elaborate instrumental kernels.
The Auxiliary Particle Filter
1 Bayesian Dynamic Models
2 The Filtering and Smoothing Recursions
3 Sequential Importance Sampling
4 Sequential Importance Sampling with Resampling
5 The Auxiliary Particle Filter A New Interpretation of SISR The Auxiliary Trick
6 Smoothing
7 Mixture Kalman Filter
The Auxiliary Particle Filter A New Interpretation of SISR
Alternatives to SISR
The resampling step in the SISR algorithm can be seen as a method to sample approximately under the distribution obtained when plugging the current particle approximation into the filtering update.
This alternative way of thinking about resampling suggests several sequential Monte Carlo variants.
The Auxiliary Particle Filter A New Interpretation of SISR
The Filtering Recursion Revisited
Recall that(with more general notations) πk+1|k+1(f) =
R R f(x0)πk|k(dx)t(x, dx0)`(x0, Yk+1) R R πk|k(dx)t(x, dx0)`(x0, Yk+1) . Now, consider what happens when considering theempirical filtering distribution
ˆ πk|kn =
n
X
i=1
WkiδXi k
as an approximation toπk|k and plugging it into the previous relation.
The Auxiliary Particle Filter A New Interpretation of SISR
Filtering Target
One obtains a mixture target pdf πnk+1|k+1(x) =
Pn
i=1Wkit(Xki, x)`(x, Yk+1) Pn
i=1
R Wkit(Xki, x)`(x, Yk+1)dx
=
n
X
i=1
Wkiγk(Xki) Pn
j=1Wkjγk(Xkj)qkopt(Xki, x), where
γk(x) = Z
t(x, dx0)`(x0, Yk+1), qkopt(x, x0) = t(x, x0)`(x0, Yk+1)
γk(x) .
The Auxiliary Particle Filter A New Interpretation of SISR
Filtering Step
−100 −8 −6 −4 −2 0 2 4 6 8 10
0.1 0.2 0.3 0.4
−100 −8 −6 −4 −2 0 2 4 6 8 10
0.1 0.2 0.3 0.4
−100 −8 −6 −4 −2 0 2 4 6 8 10
0.1 0.2 0.3 0.4
Prediction
Filtering
likelihood predictive distribution
Approximation
−100 −8 −6 −4 −2 0 2 4 6 8 10
0.05 0.1 0.15 0.2
−100 −8 −6 −4 −2 0 2 4 6 8 10
0.1 0.2 0.3 0.4
−100 −8 −6 −4 −2 0 2 4 6 8 10
0.2 0.4 0.6 0.8
Prediction
Filtering
likelihood predictive distribution
The Auxiliary Particle Filter A New Interpretation of SISR
The previous reinterpretation of SISR shows that resampling and trajectory update can be integrated into a singleglobal IS step that targetsπk+1|k+1n .
But,
how to chose the global instrumental kernel qglobk ((Xk1:n, Wk1:n), x)?
as πk+1|k+1n is an n-mixture density, evaluation of the IS weights may necessitate of the order ofn2 computations.
RemarkThe global instrumental kernel that corresponds to the bootstrap filter(with resampling)is
qglobk ((Xk1:n, Wk1:n), x) =
n
X
i=1
Wkit(Xki, x).
The Auxiliary Particle Filter The Auxiliary Trick
The Auxiliary Trick
Assume that the global instrumental kernel is a mixture of the form qglobk ((Xk1:n, Wk1:n), x) =
n
X
i=1
τkiqk(Xki, x).
To avoid then2 update, one can use data augmentation, introducing the mixture component as anauxiliary variable:
qauxk ((Xk1:n, Wk1:n),(i, x)) =τkiqk(Xki, x) defines a pdf on{1, . . . , n} ×X, whose marginal is qk((Xk1:n, Wk1:n), x).
Likewise,
πk+1|k+1n,aux (i, x) = 1
cWkit(Xki, x)`(x, Yk+1), wherec=Pn
i=1
R Wkit(Xki, x)`(x, Yk+1)dx, is the auxiliary target with marginalπnk+1|k+1(x).
The Auxiliary Particle Filter The Auxiliary Trick
Auxiliary Sampling Algorithm (Pitt & Shephard, 1999)
Auxiliary Particle Filter
Repeatntimes independently, for i= 1, . . . , n,
1 Sample an index Ji in {1, . . . , n}with probabilities (τk1, . . . , τkn).
2 Sample a position Xk+1i fromqk(XkJi, x).
3 Compute the (unnormalized)auxiliary IS weight ωk+1i = WkJit(XkJi, Xk+1i )`(Xk+1i , Yk+1)
τkJiqk(XkJi, Xk+1i ) .
The Auxiliary Particle Filter The Auxiliary Trick
The Auxiliary View Provides New Degrees of Freedom
In the original auxiliary particle filter, Pitt & Shephard usedqik=t and proposed an heuristic rule for setting the weightsτki based on Xki andYk+1.
The auxiliary IS weight may also be rewritten in normalized form as ωik+1= WkJiγk(XkJi)qkopt(XkJi, Xk+1i )
τkJit(XkJi, Xk+1i ) .
showing that the optimal choice ofτki, in the sense of minimizing theconditional(toXk1:n, Wk1:n) variance ofPn
i=1Wk+1i f(Xk+1i ) for a functionf, is an instance of the Neyman allocation problem.
And thus the optimal choice is τki ∝Wkiγk(Xki)
q υi(f)
whereυi(f) is the asymptotic variance of IS for f, target
qoptk (Xki, x), and, instrumental pdft(Xki, x) (Olssonet al., 2007).
Smoothing
1 Bayesian Dynamic Models
2 The Filtering and Smoothing Recursions
3 Sequential Importance Sampling
4 Sequential Importance Sampling with Resampling
5 The Auxiliary Particle Filter
6 Smoothing
Using the Ancestry Tree Backward Reweighting
Smoothing for Sum Functionals 7 Mixture Kalman Filter
Smoothing Using the Ancestry Tree
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
Predictive densities and evolution of the particle ancestry tree
Smoothing Using the Ancestry Tree
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
Predictive densities and evolution of the particle ancestry tree
Smoothing Using the Ancestry Tree
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
Predictive densities and evolution of the particle ancestry tree
Smoothing Using the Ancestry Tree
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
Predictive densities and evolution of the particle ancestry tree
Smoothing Using the Ancestry Tree
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
Predictive densities and evolution of the particle ancestry tree
Smoothing Using the Ancestry Tree
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
Predictive densities and evolution of the particle ancestry tree
Smoothing Using the Ancestry Tree
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
Predictive densities and evolution of the particle ancestry tree
Smoothing Using the Ancestry Tree
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
Predictive densities and evolution of the particle ancestry tree
Smoothing Using the Ancestry Tree
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
Predictive densities and evolution of the particle ancestry tree
Smoothing Using the Ancestry Tree
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
Predictive densities and evolution of the particle ancestry tree
Smoothing Using the Ancestry Tree
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
5 10 15 20 25
0.4 0.6 0.8 1 1.2 1.4 1.6
time index
state
Predictive densities and evolution of the particle ancestry tree