An Introduction to Sequential Monte Carlo for Filtering and Smoothing

(1)

An Introduction to Sequential Monte Carlo for Filtering and Smoothing

Olivier Capp´e

LTCI, TELECOM ParisTech & CNRS http://perso.telecom-paristech.fr/^∼cappe/

Acknowlegdment: Eric Moulines (TELECOM ParisTech) & Tobias Ryd´en (Lund)

(2)

Bayesian Dynamic Models

1 Bayesian Dynamic Models

Hidden Markov Models and State-Space Models Extensions

2 The Filtering and Smoothing Recursions

3 Sequential Importance Sampling

4 Sequential Importance Sampling with Resampling

5 The Auxiliary Particle Filter

6 Smoothing

7 Mixture Kalman Filter

(3)

Bayesian Dynamic Models Hidden Markov Models and State-Space Models

Hidden Markov Model (HMM)

The Hidden State Process {X_k}_k≥0 is a Markov chainwith initial probability density function (pdf)t₀(x)and transition density function t(x, x⁰) such that^*

p(x_0:k) =t₀(x₀)

k−1

Y

l=0

t(x_l, x_l+1).

The Observed Process {Y_k}_k≥0 is such that the conditional joint density ofy_0:k given x_0:k has the conditional independence (product) form

p(y_0:k|x_0:k) =

k

Y

l=0

`(x_l, y_l).

*x0:kdenotes the collectionx0, . . . , xk.

(4)

Graphical Representation of the Dependence Structure

The HMM can be represented pictorially by aBayesian network which depicts the conditional independence relations:

· · · -

-

- · · ·

?

Xk X_k+1

Y_k Y_k+1

(5)

State-Space Form

Here the model is described in a functional form:

X_k+1 =a(X_k, U_k), Y_k =b(X_k, V_k),

where{U_k}_k≥0 and {V_k}_k≥0 are mutually independent i.i.d.

sequences of random variables (also independent ofX₀).

Remark

The termstate-space modeloften refers to the case where aandb are linear functions of their arguments (and{U_k},{V_k},X0 are jointly Gaussian).

Likewise, the termHMM is sometimes used (not in this talk!) more restrictively for the case where X is a finite set.

(6)

HMM Examples

- finite state space

6

continuous state space

ergodicleft-right

speech recognition handwritting recognition source coding

ion channel modelling

trajectory models tracking, vision quantitative finance

(7)

Bayesian Dynamic Models Extensions

Beyond HMMs

For sequential Monte Carlo methods, the key point is thestructure of the conditionalp(x0:k|y_0:k): the methods described in this talk directly apply in cases where the conditional may be factored as

p(x0:k|y_0:k) =p(x0|y₀)

k−1

Y

l=0

p(xl+1|x_l, y0:l+1)

Example: Switching Autoregressive Model If the observation equation is replaced by

Y_k =b(X_k)Yk−1+V_k thenp(x0:k|y_0:k) =p(x0|y₀)Qk−1

l=0 p(xl+1|x_l, yl+1, yl)

(8)

Hierarchical HMMs

· · · -

-

- · · ·

C_k Ck+1

R R

? ?

· · · -

-

- · · ·

Z_k Z_k+1

? ?

Y_k Y_k+1

Bayesian network for a hierarchical HMM: The state sequence{Xk}is composed of two chains{Ck}and{Zk}that evolve in parallel and the observationYkdepends on both component of the chain.

(9)

In hierarchical HMMs, sequential Monte Carlo is applicable to the marginalized posteriorp(c_0:k|Y_0:k) (despite the fact that it doesn’t factorizes as previously requested due to the marginalization of Z_0:k) when

p(zk|c_0:k, y0:k) is computable Conditionally Gaussian Linear State-Space Model

Dynamic equation

Zk+1=A(Ck+1)Zk+R(Ck+1)Uk

Observation equation

Y_k=B(C_k)Z_k+S(C_k)V_k where{C_k} is a finite-valued Markov Chain.

(10)

The Filtering and Smoothing Recursions

2 The Filtering and Smoothing Recursions Basic Recursions

Computational Filtering and Smoothing Approaches 3 Sequential Importance Sampling

6 Smoothing

(11)

The Filtering and Smoothing Recursions Basic Recursions

Tasks of interest for HMMs

State Inference How to make probabilistic statements on the state sequence given the model and the observations?

(fixed-interval: π_l|k for l= 0, . . . , k;

fixed-lag: πk|k+∆ fork= 0, . . .) Parameter Inference How to tune the model parameters based on

the observations?

(12)

Recursive Structure of the Joint Smoothing Density

By Bayes’ rule π_0:k+1|k+1(x_0:k+1)

= (Lk+1(Y0:k+1))⁻¹ t0(x0)

k

Y

l=0

t(xl, xl+1)

k+1

Y

l=0

`(xl, Yl)

=

L_k+1(Y_0:k+1) L_k(Y_0:k)

−1

π_0:k|k(x_0:k)t(x_k, x_k+1)`(x_k+1, Y_k+1), where the normalization constantsL_k, i.e., thelikelihood of the observations, is usually not computable.

(13)

The Joint Smoothing Recursion

π_0:k+1|k+1(x_0:k+1) =

L_k+1 L_k

−1

π_0:k|k(x_0:k)t(x_k, x_k+1)`(x_k+1, Y_k+1)

Themarginal recursionmay be decomposed in two steps:

Prediction

π_k+1|k(x_k+1) = Z

π_k|k(x_k)t(x_k, x_k+1)dx_k Filtering

π_k+1|k+1(x_k+1) =

L_k+1 L_k

−1

π_k+1|k(x_k+1)`(x_k+1, Y_k+1)

(14)

The Filtering and Smoothing Recursions Computational Filtering and Smoothing Approaches

Exact Implementation of the Filtering and Smoothing Recursions

When X is finite (Baumet al., 1970) The associated

computational cost is|X|² per time index (for the filtering part).

In linear Gaussian state-space models (Kalman & Bucy, 1961) The filtering and prediction recursion is implemented by theKalman filter (L_k+1/L_k is interpreted as the likelihood of the (k+ 1)-th innovation).

Suchfinite dimensional filtersexist only in very specific models (see, e.g., Runggaldier & Spizzichino, 2001)

(15)

Approximate Implementations of the Filtering and Smoothing Recursions

EKF (Extended Kalman Filter) Linearization-based approach (for non-linear Gaussian state space models)

UKF (Unscented Kalman Filter, Julier & Uhlmann, 1997) Point-based approach

Variational Methods (e.g., Valpola & Karhunen, 2002)Based on parametric density approximation arguments.

Exact Suboptimal Filters In particular, Kalman filter viewed as minimum mean square error linearfiltering.

(16)

Sequential Monte Carlo (SMC)

Sequential Monte Carlo (sometimes calledparticle filtering) is a method which uses pseudo-random simulations to produce successive populations of weighted “particles” X_k^1:n and associated weights W_k^1:n such that

n

X

i=1

W_kⁱf(X_kⁱ)≈ Z

f(x)π_k|k(x)dx , for all functions f of interest.

The SMC process is sequential in the sense that given X_k^1:n, W_k^1:n and the observationsY0:k+1,X_k+1^1:n andW_k+1^1:n are conditionally independent of previous populations of particles.

SMC is based on importance sampling and resampling.

(17)

Sequential Importance Sampling

Self-Normalized Importance Sampling Sequential Importance Sampling (SIS) Weight Degeneracy

6 Smoothing

(18)

Sequential Importance Sampling Self-Normalized Importance Sampling

Self-Normalized Importance Sampling, or IS (Hammersley

& Handscomb, 1964)

IS is a weighted form of Monte Carlo approximation, in which expectations under a target pdfπ

π(f) = E_π[f(X)]

are estimated as ˆ π_qⁿ(f) =

n

X

i=1

ωⁱ Pn

j=1ω^j

| {z }

Wⁱ

f(Xⁱ),

where

Xⁱ ∼iid q, ωⁱ = ^π_q(Xⁱ).

This form of IS (sometimes also called Bayesian IS) does not necessitate thatπ be properly normalized.

(19)

Performance of IS

Assuming thatE_π[^π_q(X)(1 +f²(X))]<∞,πˆ_qⁿ(f)is consistent and asymptotically normal, withasymptotic variance given by

υ_q(f) = E_π π

q(X)(f(X)−π(f))²

.

The asymptotic variance can be estimated from the IS sample by ˆ

υⁿ_q(f) =n

n

X

i=1

(Wⁱ)²{f(Xⁱ)−πˆⁿ_q(f)}² , whereWⁱ =ωⁱ/Pn

j=1ω^j are the normalized weights.

(20)

Elements of proof

ConsistencyIfπ¯=π/c, wherec=R

π(x)dx, is a pdf, Eq[ωⁱf(Xⁱ)] = Eq

π

q(X)f(X)

=cEπ¯[f(X)].

CLT

√n(ˆπ_qⁿ(f)−π(f¯ )) =

√1 n

Pn

i=1ωⁱ(f(Xⁱ)−π(f¯ ))

1 n

Pn j=1ω^j

andVar[ωⁱ(f(Xⁱ)−π(f))] =c²E_¯_π[^π^¯_q(X)(f(X)−π(f))²].

Empirical estimate

ˆ υ_qⁿ(f) =

n

X

i=1

Wⁱ

¯ π q(Xⁱ)

1 n

Pn j=1 π¯

q(X^j){f(Xⁱ)−πˆ_qⁿ(f)}²

(21)

Empirical diagnostic tools for IS

Effective sample size

ESSⁿ_q =

" _n X

i=1

Wⁱ2

#−1

1 1≤ESSⁿ_q ≤n

2 n/ESSⁿ_q is an estimate of E_π[^π_q(X)]which is the maximal IS asymptotic variance for functions f such that

|f(x)−π(f)| ≤1(note: these functions have maximal Monte Carlo variance of 1 under π ).

3 n/ESSⁿ_q−1 is an estimator of theχ² divergence Z (π(x)−q(x))²

q(x) dx

(n/ESSⁿ_q−1is also the square of the empirical coefficient of variation associated with the normalized weights).

(22)

Another Empirical diagnostic tools for IS

(Shannon) Entropy of the Normalized Weights

ENTⁿ_q =−

N

X

i=1

Wⁱlog(Wⁱ)

1 1≤exp(ENTⁿ_q)≤n

2 exp(ENTⁿ_q)/nis an estimate of exp[−K(π||q)], where K(π||q)is the Kullback-Leibler divergence between π and q.

(23)

Optimal IS Instrumental Density

When considering large classes of functionsf, IS generally appears to perform worse than Monte Carlo underπ (e.g., for functions such that|f(x)−π(f)| ≤1, the maximal Monte Carlo asymptotic variance is 1 whereas, the maximal IS asymptotic variance is Eπ[^π_q(X)] which is strictly larger than 1 by Jensen’s inequality, unlessq=π).

However, for a fixed functionf, the optimal IS density is notπ but

|f(x)−π(f)|π(x) R |f(x⁰)−π(f)|π(x⁰)dx⁰ as Cauchy-Schwarz inequality implies that

E_π π

q(X)(f(X)−π(f))²

E_πhq π(X)i

| {z }

=1

≥E²_π[|f(X)−π(f)|] .

(24)

Example (Bayesian Posterior) Consider a simple model in which

X0∼N(1,2²), Y₀|X₀∼N(X₀², σ²).

And apply IS to compute expectations under the posterior for Y₀ = 2, using the prior as instrumental pdf.

(25)

−6 −4 −2 0 2 4 6 8

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4

Instrumental Target

σ= 2 ESSⁿ_q /n 0.66 υq(x7→x) 1.69 υ_π(x7→x) 1.25

−6 −4 −2 0 2 4 6 8

0 0.2 0.4 0.6 0.8 1 1.2 1.4

Instrumental Target

σ = 0.5 ESSⁿ_q/n 0.25 υq(x7→x) 5.35 υ_π(x7→x) 1.25

(26)

Sequential Importance Sampling Sequential Importance Sampling (SIS)

Back to the Filtering and Smoothing Problem

How to estimate expectations under the posterior π_0:k|k(x_0:k) =p(x_0:k|Y_0:k) in the model

p(x_0:k) =t0(x0)

k−1

Y

l=0

t(x_l, x_l+1),

p(y0:k|x_0:k) =

k

Y

l=0

`(xl, yl), using a sequential algorithm ?

(27)

Sequential Smoothing through IS, or SIS (Handschin & Mayne, 1969-1970)

1 Propose nindependent particle trajectories {X_0:k+1ⁱ }^1≤i≤n under a Markovian scheme such that

p(x_0:k+1) =ρ_0:k+1(x_0:k+1) =q₀(x₀)

k

Y

l=1

q_l(x_l, x_l+1).

2 Computeimportance weights sequentially:

ω_k+1ⁱ = π_0:k+1|k+1(X_0:k+1ⁱ )

ρ_0:k+1(X_0:k+1ⁱ ) =ωⁱ_k×t(X_kⁱ, X_k+1ⁱ )`(X_k+1ⁱ , Y_k+1) q_k(X_kⁱ, X_k+1ⁱ ) . Then,

n

X

i=1

ωⁱ_k+1 Pn

j=1ω_k+1^j f(X_0:k+1ⁱ ) is an estimate ofE [f(X0:k+1)|Y0:k+1].

(28)

FILT.

INSTR.

FILT. +1

One step of the SIS algorithm with just seven particles.

(29)

Choice of the Instrumental Kernel

The so-called “optimal” choice ofq_k, consists in setting q_k(x, x⁰) =q^opt_k (x, x⁰) = t(x, x⁰)`(x⁰, Y_k+1)

R t(x, x⁰⁰)`(x⁰⁰, Y_k+1)dx⁰⁰

for which the normalized weights are predictable, i.e. W_k+1ⁱ depend onX_kⁱ but not X_k+1ⁱ (hence, for this instrumental kernel the conditional variance cancels).

This is however usually not feasible and common choices include

1 the prior q_k=t (and thenω_kⁱ ∝`(X_kⁱ, Y_k))

2 approximations (sometimes heuristic) to q_k^opt(moment matching, use of EKF or UKF, . . . ),

3 tuning parameters of qk so as to maximize theeffective sample sizeor entropy criterions

(30)

Sequential Importance Sampling Weight Degeneracy

Weight Degeneracy

Empirically,the SIS approach always fail when the time-horizonk is more than a few tens; the IS weights ω_k^1:n usually become very unbalanced with a few weights dominating all the other

To understand why it is the case, consider the(silly) model where (t(x, x⁰) =t(x⁰) =t₀(x⁰), (Independent states)

`(x, y) =`(y), (Non-informative observations) and the instrumental kernel is such thatql(x, x⁰) =q0(x⁰) =q(x⁰)

(31)

Weight Degeneracy (Contd.)

For a function of interestf that only depends on the last

coordinatex_k of the trajectoryx_0:k, the asymptotic variance of the SIS approximation toπk|k(f) = Eπk|k[f(X)]is given by

υ_k(f)= Z

· · · Z k

Y

l=0

t q(x_l)

!²

f(x_k)−π_k|k(f)2 k

Y

l=0

q(x_l)dx0. . . dx_k

= Z

t

q(x)t(x)dx

| {z }

>1

kZ t

q(x) f(x)−π_k|k(f)2

t(x)dx .

In practise, this situation can usually be detected by monitoring the effective sample sizeor entropycriterions, which become

abnormally small.

(32)

Application to the Stochastic Volatility Model

This is a non-linearly observed state-space model used to represent log-returns in quantitative finance:

X_k+1 =φX_k+σU_k |φ|<1, Y_k=βexp(X_k/2)V_k,

where

{U_k}_k≥0 and{V_k}_k≥0 are independent standard Gaussian white noise processes.

X0∼N(0, σ²/(1−φ²).

We consider trajectories of length 50 simulated under the model withφ= 0.98,σ = 0.17and β= 0.64 and apply SIS withqk =q andn= 1,000particles.

(33)

Typical Evolution of ESS for a Single Trajectory

1 5 10 15 20 25 30 35 40 45 50

0 100 200 300 400 500 600 700 800 900 1000

ESS (500 indep. replications of the particles)

time index

(34)

Evolution of ESS When Averaging Over Trajectories

1 5 10 15 20 25 30 35 40 45 50

0 100 200 300 400 500 600 700 800 900 1000

ESS (500 indep. simulations)

time index

(35)

Typical Evolution of the Weight Distribution

−250 −20 −15 −10 −5 0

500 1000

−250 −20 −15 −10 −5 0

500 1000

−250 −20 −15 −10 −5 0

50 100

Importance Weights (base 10 logarithm)

Histograms of the base 10 logarithm of the normalized importance weights after (from top to bottom) 1, 10 and 100 iterations for the stochastic volatility model(improved instrumental kernel based on moment matching).

(36)

Filtering Results

−1.5 −2

−0.5 −1 0.5 0 1.5 1 2 0

5

10

15

20 0

0.02 0.04 0.06 0.08

State Time Index

Density

Waterfall representation of the sequence of estimated filtering pdfs (weighted kernel smooth) together with the actual state for 1,000 particles(improved instrumental kernel based on moment matching).

(37)

Sequential Importance Sampling with Resampling

4 Sequential Importance Sampling with Resampling Sampling Importance Resampling

Sequential Importance Sampling with Resampling (SISR) Case Study

6 Smoothing

(38)

Sequential Importance Sampling with Resampling Sampling Importance Resampling

In IS, it is indeed possible toreset the weights to a constant value at the price of a, usually moderate, increase in variance.

Sampling Importance Resampling (Rubin, 1987)

Replace{X^1:n, W^1:n} by{X˜^{1: ˜}^N,W˜^{1: ˜}^N} such that the discrepancy between the resampled weights{W˜^{1: ˜}^N} is reduced and

PN^˜

i=1W˜_kⁱδ_X_˜i is a good approximation toPn

i=1Wⁱδ_Xi.

In general the resampling is random and subject to the constraints











N˜ =n , W˜ⁱ= 1/N ,˜ E

hPN^˜

i=11{X˜ⁱ=X^j}

X^1:n, W^1:n i

= ˜N W^j (1≤j ≤n).

The last condition is often referred to asunbiasedness or proper weighting.

(39)

Multinomial Resampling

1 Draw, conditionally independently given{X^1:n, W^1:n},n discrete random variables(J¹, . . . , Jⁿ) taking their values in the set{1, . . . , n}with probabilities(W¹, . . . , Wⁿ).

2 Set, fori= 1, . . . , n,X˜ⁱ =X^Jⁱ andW˜ⁱ= 1/n.

Let Cⁱ=Pn

j=11{X˜^j =Xⁱ} (i= 1, . . . , n) denote the number of times each particle is duplicated in the resampling process. The counts (C¹, . . . , Cⁿ) follow a multinomial distribution with parameters n, (W¹, . . . , Wⁿ), conditionally to {X^1:n, W^1:n} .

INSTRUMENTAL

TARGET

Resampled particles

(40)

Some Results on SIR

1 X˜ⁱ −→^D π are n→ ∞ (some extensions of this result)

2 1 n

Pn

i=1f( ˜Xⁱ)is an asymptotically normal estimator ofπ(f) (assumingE_π[^π_q(X)(1 +f²(X)) +f²(X)]<∞) with asymptotic variance given by

˜

υ_q(f) = E_π π

q(X) (f(X)−π(f))²

| {z }

υq(f)

+E_πh

(f(X)−π(f))²i

| {z }

Varπ[f(X)]

Ifnis sufficiently large, the cost of resampling is very moderate in situation that are challenging for IS, i.e., when

υ_q(f)Var_π[f(X)].

(41)

Elements of Proof

For a bounded function f, E

h f( ˜Xⁱ)

i

= E h

E

f( ˜Xⁱ)

X^1:n, W^1:n i

= E





n

X

j=1

W^jf(X^j)



 andP_n

j=1W^jf(X^j)−→^a.s. π(f) (|P_n

j=1W^jf(X^j)| ≤ kfk_∞).

For the CLT, an ingredient of the proof is that nVar

"

1 n

n

X

i=1

f( ˜Xⁱ)

#

=nVar

" _n X

i=1

Wⁱf(Xⁱ)

#

| {z }

→υq(f)

+ E

" _n X

i=1

Wⁱf²(Xⁱ)−

n

X

i=1

Wⁱf(Xⁱ)

!2

| {z }

−→Vara.s. π[f(X)]

#

(42)

There Exists Resampling Schemes with Reduced Variance

Residual Resampling Fori= 1, . . . , nset

Cⁱ= nWⁱ

+ ¯Cⁱ ,

whereC¯¹, . . . ,C¯ⁿ are distributed, conditionally toW^1:n, according to the multinomial distributionMult(n−R,W¯¹, . . . ,W¯ⁿ) with R=Pn

i=1bnWⁱc and

W¯ⁱ= nWⁱ− bnWⁱc

n−R , i= 1, . . . , n .

This scheme is obviously unbiased and induces an additional variance which is always lower thanVar_π[f](the same is true for the bootstrap filter based on residual resampling, Douc &

Moulines, 2005).

(43)

Sequential Importance Sampling with Resampling Sequential Importance Sampling with Resampling (SISR)

The Simplest Functional Algorithm (Gordon et al., 1993)

Regular resampling is added to avoidweight degeneracy and to guarantee the long-term (k→ ∞) stability of the particle filter.

The Bootstrap filter

1 Given X˜_k^1:n, propose new positions X_k+1ⁱ independentlyunder the prior dynamic t( ˜X_kⁱ,·), for i= 1, . . . , n;

2 Compute the weightsωⁱ_k+1=`(X_k+1ⁱ , Y_k+1), for i= 1, . . . , n and normalize them (W_k+1ⁱ =ωⁱ_k+1/P_n

j=1ω_k+1^j );

3 Resampleto obtain X˜_k+1^1:n, e.g., by drawing independent indicesJ_k+1ⁱ such thatP J_k+1ⁱ =j

W_k+1^1:n

=W_k+1^j and setting X˜_k+1ⁱ =X^J

i k+1

k+1 (Multinomial Resampling).

To avoid unnecessary resampling,3 can be used only when the ESS statistics of the weightsW_k+1^1:n falls below a threshold.

(44)

Sequential Importance Sampling with Resampling Sequential Importance Sampling with Resampling (SISR)

FILT.

FILT. +1

FILT. +2

FILT.

FILT. +1

FILT. +2

SIS (left) and SISR (right).

(45)

Sequential Importance Sampling with Resampling Case Study

AR(1) Model Observed in Pulsated Noise

We consider a simple Gaussian linear state-space model to allow for comparison with analytical computations:

X_k+1=φX_k+σU_k , Yk=Xk+ηkVk,

whereφ= 0.99,σ = 0.2, andηk is varied(periodically)between 0.1 and3.

The observation sequence is simulated from the model but we also consider the influence of an out-of-modeloutlying observationat time 460.

(46)

0 50 100 150 200 250 300 350 400 450 500

−5 0 5 10

time

state obs.

filt.

smooth.

10 20 30 40 50 60 70 80 90 100

−6

−4

−2 0 2 4 6

time

state obs.

filt.

smooth.

400 410 420 430 440 450 460 470 480 490 500

−8

−6

−4

−2 0 2 4 6 8 10

time

state obs.

filt.

smooth.

(47)

Sequential Importance Sampling (prior kernel, n = 100)

5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 0

0.5 1 1.5

filt. std.

time

5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100

−2 0 2

filt. mean

5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 0

50 100

ESS

(48)

Bootstrap Filter (prior kernel, n = 100, resamp. ESS < 10)

5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 0

0.5 1 1.5

filt. std.

time

5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100

−2 0 2

filt. mean

5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 20

40 60 80 100

ESS

(49)

Bootstrap Filter (prior kernel, n = 100, resamp. always)

5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100

−2 0 2

filt. mean

10 20 30 40 50 60 70 80 90 100

20 40 60 80 100

5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 0

0.5 1 1.5

filt. std.

time

(50)

Bootstrap Filter (prior kernel, n = 100, resamp. ESS < 10)

400 410 420 430 440 450 460 470 480 490 500

0 0.5 1

filt. std.

time

400 410 420 430 440 450 460 470 480 490 500

0 2 4

filt. mean

20 40 60 80 100

ESS

(51)

Bootstrap Filter (n = 5, 000, resamp. ESS < 500)

400 410 420 430 440 450 460 470 480 490 500

0 0.5 1

filt. std.

time

400 410 420 430 440 450 460 470 480 490 500

0 2 4

filt. mean

1000 2000 3000 4000 5000

ESS

(52)

Conclusions

Withresampling, SISR achieveslong-term stability.

The increase in variance due to resampling is very moderate, especially when resampling is applied only when needed.

The method is still sensitive to outliers, model

misspecification, etc., which may necessitate the use of more elaborate instrumental kernels.

(53)

The Auxiliary Particle Filter

5 The Auxiliary Particle Filter A New Interpretation of SISR The Auxiliary Trick

6 Smoothing

(54)

The Auxiliary Particle Filter A New Interpretation of SISR

Alternatives to SISR

The resampling step in the SISR algorithm can be seen as a method to sample approximately under the distribution obtained when plugging the current particle approximation into the filtering update.

This alternative way of thinking about resampling suggests several sequential Monte Carlo variants.

(55)

The Filtering Recursion Revisited

Recall that(with more general notations) πk+1|k+1(f) =

R R f(x⁰)π_k|k(dx)t(x, dx⁰)`(x⁰, Y_k+1) R R π_k|k(dx)t(x, dx⁰)`(x⁰, Y_k+1) . Now, consider what happens when considering theempirical filtering distribution

ˆ π_k|kⁿ =

n

X

i=1

W_kⁱδ_Xi k

as an approximation toπk|k and plugging it into the previous relation.

(56)

Filtering Target

One obtains a mixture target pdf πⁿ_k+1|k+1(x) =

Pn

i=1W_kⁱt(X_kⁱ, x)`(x, Y_k+1) Pn

i=1

R W_kⁱt(X_kⁱ, x)`(x, Y_k+1)dx

=

n

X

i=1

W_kⁱγ_k(X_kⁱ) Pn

j=1W_k^jγk(X_k^j)q_k^opt(X_kⁱ, x), where

γ_k(x) = Z

t(x, dx⁰)`(x⁰, Y_k+1), q_k^opt(x, x⁰) = t(x, x⁰)`(x⁰, Y_k+1)

γ_k(x) .

(57)

Filtering Step

−100 −8 −6 −4 −2 0 2 4 6 8 10

0.1 0.2 0.3 0.4

−100 −8 −6 −4 −2 0 2 4 6 8 10

0.1 0.2 0.3 0.4

−100 −8 −6 −4 −2 0 2 4 6 8 10

0.1 0.2 0.3 0.4

Prediction

Filtering

likelihood predictive distribution

Approximation

−100 −8 −6 −4 −2 0 2 4 6 8 10

0.05 0.1 0.15 0.2

−100 −8 −6 −4 −2 0 2 4 6 8 10

0.1 0.2 0.3 0.4

−100 −8 −6 −4 −2 0 2 4 6 8 10

0.2 0.4 0.6 0.8

Prediction

Filtering

likelihood predictive distribution

(58)

The previous reinterpretation of SISR shows that resampling and trajectory update can be integrated into a singleglobal IS step that targetsπ_k+1|k+1ⁿ .

But,

how to chose the global instrumental kernel q^glob_k ((X_k^1:n, W_k^1:n), x)?

as π_k+1|k+1ⁿ is an n-mixture density, evaluation of the IS weights may necessitate of the order ofn² computations.

RemarkThe global instrumental kernel that corresponds to the bootstrap filter(with resampling)is

q^glob_k ((X_k^1:n, W_k^1:n), x) =

n

X

i=1

W_kⁱt(X_kⁱ, x).

(59)

The Auxiliary Particle Filter The Auxiliary Trick

The Auxiliary Trick

Assume that the global instrumental kernel is a mixture of the form q^glob_k ((X_k^1:n, W_k^1:n), x) =

n

X

i=1

τ_kⁱqk(X_kⁱ, x).

To avoid then² update, one can use data augmentation, introducing the mixture component as anauxiliary variable:

q^aux_k ((X_k^1:n, W_k^1:n),(i, x)) =τ_kⁱq_k(X_kⁱ, x) defines a pdf on{1, . . . , n} ×X, whose marginal is q_k((X_k^1:n, W_k^1:n), x).

Likewise,

π_k+1|k+1^n,aux (i, x) = 1

cW_kⁱt(X_kⁱ, x)`(x, Yk+1), wherec=Pn

i=1

R W_kⁱt(X_kⁱ, x)`(x, Y_k+1)dx, is the auxiliary target with marginalπⁿ_k+1|k+1(x).

(60)

Auxiliary Sampling Algorithm (Pitt & Shephard, 1999)

Auxiliary Particle Filter

Repeatntimes independently, for i= 1, . . . , n,

1 Sample an index Jⁱ in {1, . . . , n}with probabilities (τ_k¹, . . . , τ_kⁿ).

2 Sample a position X_k+1ⁱ fromq_k(X_k^Jⁱ, x).

3 Compute the (unnormalized)auxiliary IS weight ω_k+1ⁱ = W_k^Jⁱt(X_k^Jⁱ, X_k+1ⁱ )`(X_k+1ⁱ , Y_k+1)

τ_k^Jⁱqk(X_k^Jⁱ, X_k+1ⁱ ) .

(61)

The Auxiliary View Provides New Degrees of Freedom

In the original auxiliary particle filter, Pitt & Shephard usedqⁱ_k=t and proposed an heuristic rule for setting the weightsτ_kⁱ based on X_kⁱ andYk+1.

The auxiliary IS weight may also be rewritten in normalized form as ωⁱ_k+1= W_k^Jⁱγ_k(X_k^Jⁱ)q_k^opt(X_k^Jⁱ, X_k+1ⁱ )

τ_k^Jⁱt(X_k^Jⁱ, X_k+1ⁱ ) .

showing that the optimal choice ofτ_kⁱ, in the sense of minimizing theconditional(toX_k^1:n, W_k^1:n) variance ofPn

i=1W_k+1ⁱ f(X_k+1ⁱ ) for a functionf, is an instance of the Neyman allocation problem.

And thus the optimal choice is τ_kⁱ ∝W_kⁱγ_k(X_kⁱ)

q υⁱ(f)

whereυⁱ(f) is the asymptotic variance of IS for f, target

q^opt_k (X_kⁱ, x), and, instrumental pdft(X_kⁱ, x) (Olssonet al., 2007).

(62)

Smoothing

6 Smoothing

Using the Ancestry Tree Backward Reweighting

Smoothing for Sum Functionals 7 Mixture Kalman Filter

(63)

Smoothing Using the Ancestry Tree

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

Predictive densities and evolution of the particle ancestry tree

(64)

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

(65)

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

(66)

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

(67)

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

(68)

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

(69)

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

(70)

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

(71)

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

(72)

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

(73)

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state