• Aucun résultat trouvé

An Introduction to Sequential Monte Carlo for Filtering and Smoothing

N/A
N/A
Protected

Academic year: 2022

Partager "An Introduction to Sequential Monte Carlo for Filtering and Smoothing"

Copied!
110
0
0

Texte intégral

(1)

An Introduction to Sequential Monte Carlo for Filtering and Smoothing

Olivier Capp´e

LTCI, TELECOM ParisTech & CNRS http://perso.telecom-paristech.fr/cappe/

Acknowlegdment: Eric Moulines (TELECOM ParisTech) & Tobias Ryd´en (Lund)

(2)

Bayesian Dynamic Models

1 Bayesian Dynamic Models

Hidden Markov Models and State-Space Models Extensions

2 The Filtering and Smoothing Recursions

3 Sequential Importance Sampling

4 Sequential Importance Sampling with Resampling

5 The Auxiliary Particle Filter

6 Smoothing

7 Mixture Kalman Filter

(3)

Bayesian Dynamic Models Hidden Markov Models and State-Space Models

Hidden Markov Model (HMM)

The Hidden State Process {Xk}k≥0 is a Markov chainwith initial probability density function (pdf)t0(x)and transition density function t(x, x0) such that*

p(x0:k) =t0(x0)

k−1

Y

l=0

t(xl, xl+1).

The Observed Process {Yk}k≥0 is such that the conditional joint density ofy0:k given x0:k has the conditional independence (product) form

p(y0:k|x0:k) =

k

Y

l=0

`(xl, yl).

*x0:kdenotes the collectionx0, . . . , xk.

(4)

Bayesian Dynamic Models Hidden Markov Models and State-Space Models

Graphical Representation of the Dependence Structure

The HMM can be represented pictorially by aBayesian network which depicts the conditional independence relations:

· · · -

-

- · · ·

?

?

Xk Xk+1

Yk Yk+1

(5)

Bayesian Dynamic Models Hidden Markov Models and State-Space Models

State-Space Form

Here the model is described in a functional form:

Xk+1 =a(Xk, Uk), Yk =b(Xk, Vk),

where{Uk}k≥0 and {Vk}k≥0 are mutually independent i.i.d.

sequences of random variables (also independent ofX0).

Remark

The termstate-space modeloften refers to the case where aandb are linear functions of their arguments (and{Uk},{Vk},X0 are jointly Gaussian).

Likewise, the termHMM is sometimes used (not in this talk!) more restrictively for the case where X is a finite set.

(6)

Bayesian Dynamic Models Hidden Markov Models and State-Space Models

HMM Examples

- finite state space

6

continuous state space

ergodicleft-right

speech recognition handwritting recognition source coding

ion channel modelling

trajectory models tracking, vision quantitative finance

(7)

Bayesian Dynamic Models Extensions

Beyond HMMs

For sequential Monte Carlo methods, the key point is thestructure of the conditionalp(x0:k|y0:k): the methods described in this talk directly apply in cases where the conditional may be factored as

p(x0:k|y0:k) =p(x0|y0)

k−1

Y

l=0

p(xl+1|xl, y0:l+1)

Example: Switching Autoregressive Model If the observation equation is replaced by

Yk =b(Xk)Yk−1+Vk thenp(x0:k|y0:k) =p(x0|y0)Qk−1

l=0 p(xl+1|xl, yl+1, yl)

(8)

Bayesian Dynamic Models Extensions

Hierarchical HMMs

· · · -

-

- · · ·

Ck Ck+1

R R

? ?

· · · -

-

- · · ·

Zk Zk+1

? ?

Yk Yk+1

Bayesian network for a hierarchical HMM: The state sequence{Xk}is composed of two chains{Ck}and{Zk}that evolve in parallel and the observationYkdepends on both component of the chain.

(9)

Bayesian Dynamic Models Extensions

In hierarchical HMMs, sequential Monte Carlo is applicable to the marginalized posteriorp(c0:k|Y0:k) (despite the fact that it doesn’t factorizes as previously requested due to the marginalization of Z0:k) when

p(zk|c0:k, y0:k) is computable Conditionally Gaussian Linear State-Space Model

Dynamic equation

Zk+1=A(Ck+1)Zk+R(Ck+1)Uk

Observation equation

Yk=B(Ck)Zk+S(Ck)Vk where{Ck} is a finite-valued Markov Chain.

(10)

The Filtering and Smoothing Recursions

1 Bayesian Dynamic Models

2 The Filtering and Smoothing Recursions Basic Recursions

Computational Filtering and Smoothing Approaches 3 Sequential Importance Sampling

4 Sequential Importance Sampling with Resampling

5 The Auxiliary Particle Filter

6 Smoothing

7 Mixture Kalman Filter

(11)

The Filtering and Smoothing Recursions Basic Recursions

Tasks of interest for HMMs

State Inference How to make probabilistic statements on the state sequence given the model and the observations?

Filtering πk|k(xk) =p(xk|Y0:k) Prediction πk+1|k(xk+1) =p(xk+1|Y0:k) Smoothing π0:k|k(x0:k) =p(x0:k|Y0:k)

(fixed-interval: πl|k for l= 0, . . . , k;

fixed-lag: πk|k+∆ fork= 0, . . .) Parameter Inference How to tune the model parameters based on

the observations?

(12)

The Filtering and Smoothing Recursions Basic Recursions

Recursive Structure of the Joint Smoothing Density

By Bayes’ rule π0:k+1|k+1(x0:k+1)

= (Lk+1(Y0:k+1))−1 t0(x0)

k

Y

l=0

t(xl, xl+1)

k+1

Y

l=0

`(xl, Yl)

=

Lk+1(Y0:k+1) Lk(Y0:k)

−1

π0:k|k(x0:k)t(xk, xk+1)`(xk+1, Yk+1), where the normalization constantsLk, i.e., thelikelihood of the observations, is usually not computable.

(13)

The Filtering and Smoothing Recursions Basic Recursions

The Joint Smoothing Recursion

π0:k+1|k+1(x0:k+1) =

Lk+1 Lk

−1

π0:k|k(x0:k)t(xk, xk+1)`(xk+1, Yk+1)

Themarginal recursionmay be decomposed in two steps:

Prediction

πk+1|k(xk+1) = Z

πk|k(xk)t(xk, xk+1)dxk Filtering

πk+1|k+1(xk+1) =

Lk+1 Lk

−1

πk+1|k(xk+1)`(xk+1, Yk+1)

(14)

The Filtering and Smoothing Recursions Computational Filtering and Smoothing Approaches

Exact Implementation of the Filtering and Smoothing Recursions

When X is finite (Baumet al., 1970) The associated

computational cost is|X|2 per time index (for the filtering part).

In linear Gaussian state-space models (Kalman & Bucy, 1961) The filtering and prediction recursion is implemented by theKalman filter (Lk+1/Lk is interpreted as the likelihood of the (k+ 1)-th innovation).

Suchfinite dimensional filtersexist only in very specific models (see, e.g., Runggaldier & Spizzichino, 2001)

(15)

The Filtering and Smoothing Recursions Computational Filtering and Smoothing Approaches

Approximate Implementations of the Filtering and Smoothing Recursions

EKF (Extended Kalman Filter) Linearization-based approach (for non-linear Gaussian state space models)

UKF (Unscented Kalman Filter, Julier & Uhlmann, 1997) Point-based approach

Variational Methods (e.g., Valpola & Karhunen, 2002)Based on parametric density approximation arguments.

Exact Suboptimal Filters In particular, Kalman filter viewed as minimum mean square error linearfiltering.

(16)

The Filtering and Smoothing Recursions Computational Filtering and Smoothing Approaches

Sequential Monte Carlo (SMC)

Sequential Monte Carlo (sometimes calledparticle filtering) is a method which uses pseudo-random simulations to produce successive populations of weighted “particles” Xk1:n and associated weights Wk1:n such that

n

X

i=1

Wkif(Xki)≈ Z

f(x)πk|k(x)dx , for all functions f of interest.

The SMC process is sequential in the sense that given Xk1:n, Wk1:n and the observationsY0:k+1,Xk+11:n andWk+11:n are conditionally independent of previous populations of particles.

SMC is based on importance sampling and resampling.

(17)

Sequential Importance Sampling

1 Bayesian Dynamic Models

2 The Filtering and Smoothing Recursions

3 Sequential Importance Sampling

Self-Normalized Importance Sampling Sequential Importance Sampling (SIS) Weight Degeneracy

4 Sequential Importance Sampling with Resampling

5 The Auxiliary Particle Filter

6 Smoothing

7 Mixture Kalman Filter

(18)

Sequential Importance Sampling Self-Normalized Importance Sampling

Self-Normalized Importance Sampling, or IS (Hammersley

& Handscomb, 1964)

IS is a weighted form of Monte Carlo approximation, in which expectations under a target pdfπ

π(f) = Eπ[f(X)]

are estimated as ˆ πqn(f) =

n

X

i=1

ωi Pn

j=1ωj

| {z }

Wi

f(Xi),

where

Xi ∼iid q, ωi = πq(Xi).

This form of IS (sometimes also called Bayesian IS) does not necessitate thatπ be properly normalized.

(19)

Sequential Importance Sampling Self-Normalized Importance Sampling

Performance of IS

Assuming thatEπ[πq(X)(1 +f2(X))]<∞,πˆqn(f)is consistent and asymptotically normal, withasymptotic variance given by

υq(f) = Eπ π

q(X)(f(X)−π(f))2

.

The asymptotic variance can be estimated from the IS sample by ˆ

υnq(f) =n

n

X

i=1

(Wi)2{f(Xi)−πˆnq(f)}2 , whereWii/Pn

j=1ωj are the normalized weights.

(20)

Sequential Importance Sampling Self-Normalized Importance Sampling

Elements of proof

ConsistencyIfπ¯=π/c, wherec=R

π(x)dx, is a pdf, Eqif(Xi)] = Eq

π

q(X)f(X)

=cEπ¯[f(X)].

CLT

√n(ˆπqn(f)−π(f¯ )) =

1 n

Pn

i=1ωi(f(Xi)−π(f¯ ))

1 n

Pn j=1ωj

andVar[ωi(f(Xi)−π(f))] =c2E¯π[π¯q(X)(f(X)−π(f))2].

Empirical estimate

ˆ υqn(f) =

n

X

i=1

Wi

¯ π q(Xi)

1 n

Pn j=1 π¯

q(Xj){f(Xi)−πˆqn(f)}2

(21)

Sequential Importance Sampling Self-Normalized Importance Sampling

Empirical diagnostic tools for IS

Effective sample size

ESSnq =

" n X

i=1

Wi2

#−1

1 1≤ESSnq ≤n

2 n/ESSnq is an estimate of Eπ[πq(X)]which is the maximal IS asymptotic variance for functions f such that

|f(x)−π(f)| ≤1(note: these functions have maximal Monte Carlo variance of 1 under π ).

3 n/ESSnq−1 is an estimator of theχ2 divergence Z (π(x)−q(x))2

q(x) dx

(n/ESSnq−1is also the square of the empirical coefficient of variation associated with the normalized weights).

(22)

Sequential Importance Sampling Self-Normalized Importance Sampling

Another Empirical diagnostic tools for IS

(Shannon) Entropy of the Normalized Weights

ENTnq =−

N

X

i=1

Wilog(Wi)

1 1≤exp(ENTnq)≤n

2 exp(ENTnq)/nis an estimate of exp[−K(π||q)], where K(π||q)is the Kullback-Leibler divergence between π and q.

(23)

Sequential Importance Sampling Self-Normalized Importance Sampling

Optimal IS Instrumental Density

When considering large classes of functionsf, IS generally appears to perform worse than Monte Carlo underπ (e.g., for functions such that|f(x)−π(f)| ≤1, the maximal Monte Carlo asymptotic variance is 1 whereas, the maximal IS asymptotic variance is Eπ[πq(X)] which is strictly larger than 1 by Jensen’s inequality, unlessq=π).

However, for a fixed functionf, the optimal IS density is notπ but

|f(x)−π(f)|π(x) R |f(x0)−π(f)|π(x0)dx0 as Cauchy-Schwarz inequality implies that

Eπ π

q(X)(f(X)−π(f))2

Eπhq π(X)i

| {z }

=1

≥E2π[|f(X)−π(f)|] .

(24)

Sequential Importance Sampling Self-Normalized Importance Sampling

Example (Bayesian Posterior) Consider a simple model in which

X0∼N(1,22), Y0|X0∼N(X02, σ2).

And apply IS to compute expectations under the posterior for Y0 = 2, using the prior as instrumental pdf.

(25)

Sequential Importance Sampling Self-Normalized Importance Sampling

−6 −4 −2 0 2 4 6 8

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4

Instrumental Target

σ= 2 ESSnq /n 0.66 υq(x7→x) 1.69 υπ(x7→x) 1.25

−6 −4 −2 0 2 4 6 8

0 0.2 0.4 0.6 0.8 1 1.2 1.4

Instrumental Target

σ = 0.5 ESSnq/n 0.25 υq(x7→x) 5.35 υπ(x7→x) 1.25

(26)

Sequential Importance Sampling Sequential Importance Sampling (SIS)

Back to the Filtering and Smoothing Problem

How to estimate expectations under the posterior π0:k|k(x0:k) =p(x0:k|Y0:k) in the model

p(x0:k) =t0(x0)

k−1

Y

l=0

t(xl, xl+1),

p(y0:k|x0:k) =

k

Y

l=0

`(xl, yl), using a sequential algorithm ?

(27)

Sequential Importance Sampling Sequential Importance Sampling (SIS)

Sequential Smoothing through IS, or SIS (Handschin & Mayne, 1969-1970)

1 Propose nindependent particle trajectories {X0:k+1i }1≤i≤n under a Markovian scheme such that

p(x0:k+1) =ρ0:k+1(x0:k+1) =q0(x0)

k

Y

l=1

ql(xl, xl+1).

2 Computeimportance weights sequentially:

ωk+1i = π0:k+1|k+1(X0:k+1i )

ρ0:k+1(X0:k+1i ) =ωik×t(Xki, Xk+1i )`(Xk+1i , Yk+1) qk(Xki, Xk+1i ) . Then,

n

X

i=1

ωik+1 Pn

j=1ωk+1j f(X0:k+1i ) is an estimate ofE [f(X0:k+1)|Y0:k+1].

(28)

Sequential Importance Sampling Sequential Importance Sampling (SIS)

FILT.

INSTR.

FILT. +1

One step of the SIS algorithm with just seven particles.

(29)

Sequential Importance Sampling Sequential Importance Sampling (SIS)

Choice of the Instrumental Kernel

The so-called “optimal” choice ofqk, consists in setting qk(x, x0) =qoptk (x, x0) = t(x, x0)`(x0, Yk+1)

R t(x, x00)`(x00, Yk+1)dx00

for which the normalized weights are predictable, i.e. Wk+1i depend onXki but not Xk+1i (hence, for this instrumental kernel the conditional variance cancels).

This is however usually not feasible and common choices include

1 the prior qk=t (and thenωki ∝`(Xki, Yk))

2 approximations (sometimes heuristic) to qkopt(moment matching, use of EKF or UKF, . . . ),

3 tuning parameters of qk so as to maximize theeffective sample sizeor entropy criterions

(30)

Sequential Importance Sampling Weight Degeneracy

Weight Degeneracy

Empirically,the SIS approach always fail when the time-horizonk is more than a few tens; the IS weights ωk1:n usually become very unbalanced with a few weights dominating all the other

To understand why it is the case, consider the(silly) model where (t(x, x0) =t(x0) =t0(x0), (Independent states)

`(x, y) =`(y), (Non-informative observations) and the instrumental kernel is such thatql(x, x0) =q0(x0) =q(x0)

(31)

Sequential Importance Sampling Weight Degeneracy

Weight Degeneracy (Contd.)

For a function of interestf that only depends on the last

coordinatexk of the trajectoryx0:k, the asymptotic variance of the SIS approximation toπk|k(f) = Eπk|k[f(X)]is given by

υk(f)= Z

· · · Z k

Y

l=0

t q(xl)

!2

f(xk)−πk|k(f)2 k

Y

l=0

q(xl)dx0. . . dxk

= Z

t

q(x)t(x)dx

| {z }

>1

kZ t

q(x) f(x)−πk|k(f)2

t(x)dx .

In practise, this situation can usually be detected by monitoring the effective sample sizeor entropycriterions, which become

abnormally small.

(32)

Sequential Importance Sampling Weight Degeneracy

Application to the Stochastic Volatility Model

This is a non-linearly observed state-space model used to represent log-returns in quantitative finance:

Xk+1 =φXk+σUk |φ|<1, Yk=βexp(Xk/2)Vk,

where

{Uk}k≥0 and{Vk}k≥0 are independent standard Gaussian white noise processes.

X0∼N(0, σ2/(1−φ2).

We consider trajectories of length 50 simulated under the model withφ= 0.98,σ = 0.17and β= 0.64 and apply SIS withqk =q andn= 1,000particles.

(33)

Sequential Importance Sampling Weight Degeneracy

Typical Evolution of ESS for a Single Trajectory

1 5 10 15 20 25 30 35 40 45 50

0 100 200 300 400 500 600 700 800 900 1000

ESS (500 indep. replications of the particles)

time index

(34)

Sequential Importance Sampling Weight Degeneracy

Evolution of ESS When Averaging Over Trajectories

1 5 10 15 20 25 30 35 40 45 50

0 100 200 300 400 500 600 700 800 900 1000

ESS (500 indep. simulations)

time index

(35)

Sequential Importance Sampling Weight Degeneracy

Typical Evolution of the Weight Distribution

−250 −20 −15 −10 −5 0

500 1000

−250 −20 −15 −10 −5 0

500 1000

−250 −20 −15 −10 −5 0

50 100

Importance Weights (base 10 logarithm)

Histograms of the base 10 logarithm of the normalized impor- tance weights after (from top to bottom) 1, 10 and 100 iterations for the stochastic volatility model(improved instrumental kernel based on moment matching).

(36)

Sequential Importance Sampling Weight Degeneracy

Filtering Results

−1.5 −2

−0.5 −1 0.5 0 1.5 1 2 0

5

10

15

20 0

0.02 0.04 0.06 0.08

State Time Index

Density

Waterfall representation of the sequence of estimated filtering pdfs (weighted kernel smooth) together with the actual state for 1,000 particles(improved instrumental kernel based on moment matching).

(37)

Sequential Importance Sampling with Resampling

1 Bayesian Dynamic Models

2 The Filtering and Smoothing Recursions

3 Sequential Importance Sampling

4 Sequential Importance Sampling with Resampling Sampling Importance Resampling

Sequential Importance Sampling with Resampling (SISR) Case Study

5 The Auxiliary Particle Filter

6 Smoothing

7 Mixture Kalman Filter

(38)

Sequential Importance Sampling with Resampling Sampling Importance Resampling

In IS, it is indeed possible toreset the weights to a constant value at the price of a, usually moderate, increase in variance.

Sampling Importance Resampling (Rubin, 1987)

Replace{X1:n, W1:n} by{X˜1: ˜N,W˜1: ˜N} such that the discrepancy between the resampled weights{W˜1: ˜N} is reduced and

PN˜

i=1kiδX˜i is a good approximation toPn

i=1WiδXi.

In general the resampling is random and subject to the constraints





N˜ =n , W˜i= 1/N ,˜ E

hPN˜

i=11{X˜i=Xj}

X1:n, W1:n i

= ˜N Wj (1≤j ≤n).

The last condition is often referred to asunbiasedness or proper weighting.

(39)

Sequential Importance Sampling with Resampling Sampling Importance Resampling

Multinomial Resampling

1 Draw, conditionally independently given{X1:n, W1:n},n discrete random variables(J1, . . . , Jn) taking their values in the set{1, . . . , n}with probabilities(W1, . . . , Wn).

2 Set, fori= 1, . . . , n,X˜i =XJi andW˜i= 1/n.

Let Ci=Pn

j=11{X˜j =Xi} (i= 1, . . . , n) denote the number of times each particle is duplicated in the resampling process. The counts (C1, . . . , Cn) follow a multinomial distribution with parameters n, (W1, . . . , Wn), conditionally to {X1:n, W1:n} .

INSTRUMENTAL

TARGET

Resampled particles

(40)

Sequential Importance Sampling with Resampling Sampling Importance Resampling

Some Results on SIR

1i −→D π are n→ ∞ (some extensions of this result)

2 1 n

Pn

i=1f( ˜Xi)is an asymptotically normal estimator ofπ(f) (assumingEπ[πq(X)(1 +f2(X)) +f2(X)]<∞) with asymptotic variance given by

˜

υq(f) = Eπ π

q(X) (f(X)−π(f))2

| {z }

υq(f)

+Eπh

(f(X)−π(f))2i

| {z }

Varπ[f(X)]

Ifnis sufficiently large, the cost of resampling is very moderate in situation that are challenging for IS, i.e., when

υq(f)Varπ[f(X)].

(41)

Sequential Importance Sampling with Resampling Sampling Importance Resampling

Elements of Proof

For a bounded function f, E

h f( ˜Xi)

i

= E h

E

f( ˜Xi)

X1:n, W1:n i

= E

n

X

j=1

Wjf(Xj)

 andPn

j=1Wjf(Xj)−→a.s. π(f) (|Pn

j=1Wjf(Xj)| ≤ kfk).

For the CLT, an ingredient of the proof is that nVar

"

1 n

n

X

i=1

f( ˜Xi)

#

=nVar

" n X

i=1

Wif(Xi)

#

| {z }

→υq(f)

+ E

" n X

i=1

Wif2(Xi)−

n

X

i=1

Wif(Xi)

!2

| {z }

−→Vara.s. π[f(X)]

#

(42)

Sequential Importance Sampling with Resampling Sampling Importance Resampling

There Exists Resampling Schemes with Reduced Variance

Residual Resampling Fori= 1, . . . , nset

Ci= nWi

+ ¯Ci ,

whereC¯1, . . . ,C¯n are distributed, conditionally toW1:n, according to the multinomial distributionMult(n−R,W¯1, . . . ,W¯n) with R=Pn

i=1bnWic and

i= nWi− bnWic

n−R , i= 1, . . . , n .

This scheme is obviously unbiased and induces an additional variance which is always lower thanVarπ[f](the same is true for the bootstrap filter based on residual resampling, Douc &

Moulines, 2005).

(43)

Sequential Importance Sampling with Resampling Sequential Importance Sampling with Resampling (SISR)

The Simplest Functional Algorithm (Gordon et al., 1993)

Regular resampling is added to avoidweight degeneracy and to guarantee the long-term (k→ ∞) stability of the particle filter.

The Bootstrap filter

1 Given X˜k1:n, propose new positions Xk+1i independentlyunder the prior dynamic t( ˜Xki,·), for i= 1, . . . , n;

2 Compute the weightsωik+1=`(Xk+1i , Yk+1), for i= 1, . . . , n and normalize them (Wk+1iik+1/Pn

j=1ωk+1j );

3 Resampleto obtain X˜k+11:n, e.g., by drawing independent indicesJk+1i such thatP Jk+1i =j

Wk+11:n

=Wk+1j and setting X˜k+1i =XJ

i k+1

k+1 (Multinomial Resampling).

To avoid unnecessary resampling,3 can be used only when the ESS statistics of the weightsWk+11:n falls below a threshold.

(44)

Sequential Importance Sampling with Resampling Sequential Importance Sampling with Resampling (SISR)

FILT.

FILT. +1

FILT. +2

FILT.

FILT. +1

FILT. +2

SIS (left) and SISR (right).

(45)

Sequential Importance Sampling with Resampling Case Study

AR(1) Model Observed in Pulsated Noise

We consider a simple Gaussian linear state-space model to allow for comparison with analytical computations:

Xk+1=φXk+σUk , Yk=XkkVk,

whereφ= 0.99,σ = 0.2, andηk is varied(periodically)between 0.1 and3.

The observation sequence is simulated from the model but we also consider the influence of an out-of-modeloutlying observationat time 460.

(46)

Sequential Importance Sampling with Resampling Case Study

0 50 100 150 200 250 300 350 400 450 500

−5 0 5 10

time

state obs.

filt.

smooth.

10 20 30 40 50 60 70 80 90 100

−6

−4

−2 0 2 4 6

time

state obs.

filt.

smooth.

400 410 420 430 440 450 460 470 480 490 500

−8

−6

−4

−2 0 2 4 6 8 10

time

state obs.

filt.

smooth.

(47)

Sequential Importance Sampling with Resampling Case Study

Sequential Importance Sampling (prior kernel, n = 100)

5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 0

0.5 1 1.5

filt. std.

time

5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100

−2 0 2

filt. mean

5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 0

50 100

ESS

(48)

Sequential Importance Sampling with Resampling Case Study

Bootstrap Filter (prior kernel, n = 100, resamp. ESS < 10)

5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 0

0.5 1 1.5

filt. std.

time

5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100

−2 0 2

filt. mean

5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 20

40 60 80 100

ESS

(49)

Sequential Importance Sampling with Resampling Case Study

Bootstrap Filter (prior kernel, n = 100, resamp. always)

5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100

−2 0 2

filt. mean

10 20 30 40 50 60 70 80 90 100

20 40 60 80 100

5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 0

0.5 1 1.5

filt. std.

time

(50)

Sequential Importance Sampling with Resampling Case Study

Bootstrap Filter (prior kernel, n = 100, resamp. ESS < 10)

400 410 420 430 440 450 460 470 480 490 500

0 0.5 1

filt. std.

time

400 410 420 430 440 450 460 470 480 490 500

0 2 4

filt. mean

20 40 60 80 100

ESS

(51)

Sequential Importance Sampling with Resampling Case Study

Bootstrap Filter (n = 5, 000, resamp. ESS < 500)

400 410 420 430 440 450 460 470 480 490 500

0 0.5 1

filt. std.

time

400 410 420 430 440 450 460 470 480 490 500

0 2 4

filt. mean

1000 2000 3000 4000 5000

ESS

(52)

Sequential Importance Sampling with Resampling Case Study

Conclusions

Withresampling, SISR achieveslong-term stability.

The increase in variance due to resampling is very moderate, especially when resampling is applied only when needed.

The method is still sensitive to outliers, model

misspecification, etc., which may necessitate the use of more elaborate instrumental kernels.

(53)

The Auxiliary Particle Filter

1 Bayesian Dynamic Models

2 The Filtering and Smoothing Recursions

3 Sequential Importance Sampling

4 Sequential Importance Sampling with Resampling

5 The Auxiliary Particle Filter A New Interpretation of SISR The Auxiliary Trick

6 Smoothing

7 Mixture Kalman Filter

(54)

The Auxiliary Particle Filter A New Interpretation of SISR

Alternatives to SISR

The resampling step in the SISR algorithm can be seen as a method to sample approximately under the distribution obtained when plugging the current particle approximation into the filtering update.

This alternative way of thinking about resampling suggests several sequential Monte Carlo variants.

(55)

The Auxiliary Particle Filter A New Interpretation of SISR

The Filtering Recursion Revisited

Recall that(with more general notations) πk+1|k+1(f) =

R R f(x0k|k(dx)t(x, dx0)`(x0, Yk+1) R R πk|k(dx)t(x, dx0)`(x0, Yk+1) . Now, consider what happens when considering theempirical filtering distribution

ˆ πk|kn =

n

X

i=1

WkiδXi k

as an approximation toπk|k and plugging it into the previous relation.

(56)

The Auxiliary Particle Filter A New Interpretation of SISR

Filtering Target

One obtains a mixture target pdf πnk+1|k+1(x) =

Pn

i=1Wkit(Xki, x)`(x, Yk+1) Pn

i=1

R Wkit(Xki, x)`(x, Yk+1)dx

=

n

X

i=1

Wkiγk(Xki) Pn

j=1Wkjγk(Xkj)qkopt(Xki, x), where

γk(x) = Z

t(x, dx0)`(x0, Yk+1), qkopt(x, x0) = t(x, x0)`(x0, Yk+1)

γk(x) .

(57)

The Auxiliary Particle Filter A New Interpretation of SISR

Filtering Step

−100 −8 −6 −4 −2 0 2 4 6 8 10

0.1 0.2 0.3 0.4

−100 −8 −6 −4 −2 0 2 4 6 8 10

0.1 0.2 0.3 0.4

−100 −8 −6 −4 −2 0 2 4 6 8 10

0.1 0.2 0.3 0.4

Prediction

Filtering

likelihood predictive distribution

Approximation

−100 −8 −6 −4 −2 0 2 4 6 8 10

0.05 0.1 0.15 0.2

−100 −8 −6 −4 −2 0 2 4 6 8 10

0.1 0.2 0.3 0.4

−100 −8 −6 −4 −2 0 2 4 6 8 10

0.2 0.4 0.6 0.8

Prediction

Filtering

likelihood predictive distribution

(58)

The Auxiliary Particle Filter A New Interpretation of SISR

The previous reinterpretation of SISR shows that resampling and trajectory update can be integrated into a singleglobal IS step that targetsπk+1|k+1n .

But,

how to chose the global instrumental kernel qglobk ((Xk1:n, Wk1:n), x)?

as πk+1|k+1n is an n-mixture density, evaluation of the IS weights may necessitate of the order ofn2 computations.

RemarkThe global instrumental kernel that corresponds to the bootstrap filter(with resampling)is

qglobk ((Xk1:n, Wk1:n), x) =

n

X

i=1

Wkit(Xki, x).

(59)

The Auxiliary Particle Filter The Auxiliary Trick

The Auxiliary Trick

Assume that the global instrumental kernel is a mixture of the form qglobk ((Xk1:n, Wk1:n), x) =

n

X

i=1

τkiqk(Xki, x).

To avoid then2 update, one can use data augmentation, introducing the mixture component as anauxiliary variable:

qauxk ((Xk1:n, Wk1:n),(i, x)) =τkiqk(Xki, x) defines a pdf on{1, . . . , n} ×X, whose marginal is qk((Xk1:n, Wk1:n), x).

Likewise,

πk+1|k+1n,aux (i, x) = 1

cWkit(Xki, x)`(x, Yk+1), wherec=Pn

i=1

R Wkit(Xki, x)`(x, Yk+1)dx, is the auxiliary target with marginalπnk+1|k+1(x).

(60)

The Auxiliary Particle Filter The Auxiliary Trick

Auxiliary Sampling Algorithm (Pitt & Shephard, 1999)

Auxiliary Particle Filter

Repeatntimes independently, for i= 1, . . . , n,

1 Sample an index Ji in {1, . . . , n}with probabilities (τk1, . . . , τkn).

2 Sample a position Xk+1i fromqk(XkJi, x).

3 Compute the (unnormalized)auxiliary IS weight ωk+1i = WkJit(XkJi, Xk+1i )`(Xk+1i , Yk+1)

τkJiqk(XkJi, Xk+1i ) .

(61)

The Auxiliary Particle Filter The Auxiliary Trick

The Auxiliary View Provides New Degrees of Freedom

In the original auxiliary particle filter, Pitt & Shephard usedqik=t and proposed an heuristic rule for setting the weightsτki based on Xki andYk+1.

The auxiliary IS weight may also be rewritten in normalized form as ωik+1= WkJiγk(XkJi)qkopt(XkJi, Xk+1i )

τkJit(XkJi, Xk+1i ) .

showing that the optimal choice ofτki, in the sense of minimizing theconditional(toXk1:n, Wk1:n) variance ofPn

i=1Wk+1i f(Xk+1i ) for a functionf, is an instance of the Neyman allocation problem.

And thus the optimal choice is τki ∝Wkiγk(Xki)

q υi(f)

whereυi(f) is the asymptotic variance of IS for f, target

qoptk (Xki, x), and, instrumental pdft(Xki, x) (Olssonet al., 2007).

(62)

Smoothing

1 Bayesian Dynamic Models

2 The Filtering and Smoothing Recursions

3 Sequential Importance Sampling

4 Sequential Importance Sampling with Resampling

5 The Auxiliary Particle Filter

6 Smoothing

Using the Ancestry Tree Backward Reweighting

Smoothing for Sum Functionals 7 Mixture Kalman Filter

(63)

Smoothing Using the Ancestry Tree

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

Predictive densities and evolution of the particle ancestry tree

(64)

Smoothing Using the Ancestry Tree

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

Predictive densities and evolution of the particle ancestry tree

(65)

Smoothing Using the Ancestry Tree

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

Predictive densities and evolution of the particle ancestry tree

(66)

Smoothing Using the Ancestry Tree

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

Predictive densities and evolution of the particle ancestry tree

(67)

Smoothing Using the Ancestry Tree

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

Predictive densities and evolution of the particle ancestry tree

(68)

Smoothing Using the Ancestry Tree

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

Predictive densities and evolution of the particle ancestry tree

(69)

Smoothing Using the Ancestry Tree

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

Predictive densities and evolution of the particle ancestry tree

(70)

Smoothing Using the Ancestry Tree

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

Predictive densities and evolution of the particle ancestry tree

(71)

Smoothing Using the Ancestry Tree

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

Predictive densities and evolution of the particle ancestry tree

(72)

Smoothing Using the Ancestry Tree

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

Predictive densities and evolution of the particle ancestry tree

(73)

Smoothing Using the Ancestry Tree

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

5 10 15 20 25

0.4 0.6 0.8 1 1.2 1.4 1.6

time index

state

Predictive densities and evolution of the particle ancestry tree

Références

Documents relatifs

Our techniques combine χ-square concentration inequalities and Laplace estimates with spectral and random matrices theory, and the non asymptotic stability theory of quadratic

Le contrôle des organistes sur l’attaque d’une note étant limité, peu d’études se sont intéressée au contrôle des caractéristiques des transitoires d’attaque des

Best rmse for ETKF over a wide range of inflation factors, rmse of ETKF-N without inflation and rmse of the alternate ETKF- N without inflation, for several time intervals

60 Keywords: Community trait means; Cultivation; Dry Grassland; Grazing; Intraspecific trait 61 variability; Leaf dry matter content; Natural recovery; Restoration; Specific

Nevertheless, the manuscript is rich in references to original literature, elaborates on interesting lines of thought and discussion and will hopefully trigger novel

Here, we generate three different datasets (true state, noisy observations and analog database) using the ex- act Lorenz-63 differential equations with the classical parameters ρ =

The method is then leveraged to derive two simple alternative unscented Kalman filters on Lie groups, for both cases of noisy partial measurements of the state, and full state

In this paper, building upon both the recent theory of Unscented Kalman Filtering on Lie Groups (UKF-LG) and more generally the theory of invariant Kalman filtering (IEKF),