Appendix: Control of the particle approximation


[Figure 5.11: graphical representation of the propagation maps estimated by the averaged Monte Carlo BOEM algorithm.]

We begin by recalling the expression of the intermediate quantity that we seek to approximate. The BOEM algorithm produces a sequence of estimates $\{\theta_n\}_{n \ge 0}$ using, for all $n \ge 0$, the expectation $\bar S^{\chi_n,T_n}_{\tau_{n+1}}(\theta_n, Y)$ defined by

$$\bar S^{\chi_n,T_n}_{\tau_{n+1}}(\theta_n, Y) = \frac{1}{\tau_{n+1}} \sum_{k=1}^{\tau_{n+1}} \mathbb{E}^{\chi_n,T_n}_{\theta_n}\left[ S(X_{T_n+k-1}, X_{T_n+k}, Y_{T_n+k}) \,\middle|\, Y_{T_n+1:T_n+\tau_{n+1}} \right],$$

where $\chi_n$ is a distribution on $(\mathbb{X}, \mathcal{X})$ and where $\mathbb{E}^{\chi_n,T_n}_{\theta_n}\left[\, \cdot \,\middle|\, Y_{T_n+1:T_n+\tau_{n+1}} \right]$ is defined by (1.2). $\bar S^{\chi_n,T_n}_{\tau_{n+1}}(\theta_n, Y)$ is the intermediate quantity of the BOEM algorithm. It corresponds to the intermediate quantity of the EM algorithm, computed with the observations $Y_{T_n+1:T_n+\tau_{n+1}}$ of block $n$, when the distributions governing the HMM are parameterized by $\theta_n$ and the initial state is distributed according to $\chi_n$.

This quantity can be computed explicitly only in specific situations: when the state space is finite, or when the model is linear and Gaussian. In more general cases, we propose in Chapter 6 to replace this quantity by a Monte Carlo approximation. We consider here the situation where $\bar S^{\chi_n,T_n}_{\tau_{n+1}}(\theta_n, Y)$ is approximated by the particle approximation of the FFBS algorithm computed with $N_{n+1}$ particles, denoted $\widetilde S_n$ (see Section 3.1.2). We make this choice for two reasons:

i) The computation of $\widetilde S_n$ with the FFBS algorithm can be carried out online (without storing observations), as explained in Section 2.3.

ii) Thanks to Chapter 7, we can control the $L_p$ error made on each block. This allows us to choose the number of particles per block so as to satisfy the sufficient conditions for convergence of the Monte Carlo BOEM algorithm (see assumption A6 of Chapter 6).

Online computation

For clarity, we recall here the mechanism that yields the FFBS approximation online within block $n$. We denote by $\eta_n$ the proposal distribution of the particles at time $T_n$, and by $\{\vartheta^n_t\}_{t \le \tau_{n+1}}$ and $\{p^n_t\}_{t \le \tau_{n+1}}$ the adjustment weights and the proposal kernels used in our FFBS algorithm (see Section 3.1.2 for details).

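1) Initialization.

For all $\ell \in \{1, \dots, N_{n+1}\}$, sample independently $\xi_0^{N_{n+1},\ell} \sim \eta_n$ and define

$$\omega_0^{N_{n+1},\ell} \overset{\mathrm{def}}{=} \frac{d\chi_n}{d\eta_n}\left(\xi_0^{N_{n+1},\ell}\right) g_{\theta_n}\left(\xi_0^{N_{n+1},\ell}, Y_{T_n}\right).$$

Set, for all $\ell \in \{1, \dots, N_{n+1}\}$, $\rho_0\left(\xi_0^{N_{n+1},\ell}\right) \overset{\mathrm{def}}{=} 0$.

2) Propagation: for $t \in \{0, \dots, \tau_{n+1} - 1\}$.

For all $\ell \in \{1, \dots, N_{n+1}\}$, sample $\left(I_{t+1}^{N_{n+1},\ell}, \xi_{t+1}^{N_{n+1},\ell}\right)$ from the instrumental distribution

$$\pi_{t+1|t+1}(i, dx) \propto \omega_t^{N_{n+1},i}\, \vartheta^n_{t+1}\left(\xi_t^{N_{n+1},i}\right) p^n_{t+1}\left(\xi_t^{N_{n+1},i}, dx\right).$$

Writing $I = I_{t+1}^{N_{n+1},\ell}$ for the selected ancestor index, set

$$\omega_{t+1}^{N_{n+1},\ell} \overset{\mathrm{def}}{=} \frac{m_{\theta_n}\left(\xi_t^{N_{n+1},I}, \xi_{t+1}^{N_{n+1},\ell}\right) g_{\theta_n}\left(\xi_{t+1}^{N_{n+1},\ell}, Y_{T_n+t+1}\right)}{\vartheta^n_{t+1}\left(\xi_t^{N_{n+1},I}\right) p^n_{t+1}\left(\xi_t^{N_{n+1},I}, \xi_{t+1}^{N_{n+1},\ell}\right)}.$$

Set, for all $\ell \in \{1, \dots, N_{n+1}\}$,

$$\rho_{t+1}\left(\xi_{t+1}^{N_{n+1},\ell}\right) \overset{\mathrm{def}}{=} \sum_{i=1}^{N_{n+1}} \left[ \frac{1}{t+1} S\left(\xi_t^{N_{n+1},i}, \xi_{t+1}^{N_{n+1},\ell}, Y_{T_n+t+1}\right) + \left(1 - \frac{1}{t+1}\right) \rho_t\left(\xi_t^{N_{n+1},i}\right) \right] \frac{\omega_t^{N_{n+1},i}\, m_{\theta_n}\left(\xi_t^{N_{n+1},i}, \xi_{t+1}^{N_{n+1},\ell}\right)}{\sum_{j=1}^{N_{n+1}} \omega_t^{N_{n+1},j}\, m_{\theta_n}\left(\xi_t^{N_{n+1},j}, \xi_{t+1}^{N_{n+1},\ell}\right)}.$$

3) Computation of the approximation.

Define, with the final weights normalized to sum to one,

$$\widetilde S_n \overset{\mathrm{def}}{=} \sum_{\ell=1}^{N_{n+1}} \frac{\omega_{\tau_{n+1}}^{N_{n+1},\ell}}{\sum_{j=1}^{N_{n+1}} \omega_{\tau_{n+1}}^{N_{n+1},j}}\, \rho_{\tau_{n+1}}\left(\xi_{\tau_{n+1}}^{N_{n+1},\ell}\right).$$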

The approximation given by $\widetilde S_n$ is exactly the one provided by the FFBS algorithm introduced in Chapter 3. This guarantees that, in the case at hand, the particle approximation given by the FFBS algorithm can be computed online. It therefore does not require running a path-space smoother followed by a backward pass through the data (from the last observation of the block to the first). Moreover, the error bounds established for the FFBS algorithm apply to our approximation $\widetilde S_n$.
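The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch under assumptions that are not in the text: a scalar AR(1) state with Gaussian observation noise, the bootstrap choice $\vartheta^n_t \equiv 1$ and $p^n_t = m_{\theta_n}$ (so that the weight reduces to $g_{\theta_n}$), and $\chi_n = \eta_n = \mathcal{N}(0,1)$. The names `online_ffbs`, `gauss_pdf`, `phi`, `sigma_u` and `sigma_v` are hypothetical.

```python
import math
import random

def gauss_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def online_ffbs(y, S, phi=0.9, sigma_u=0.5, sigma_v=1.0, n_particles=50, seed=0):
    """Forward-only FFBS estimate of the block-averaged statistic.

    y[0] plays the role of Y_{T_n}; y[1:] is the block of observations.
    Assumed toy model: X' = phi * X + sigma_u * U, Y = X + sigma_v * V.
    """
    rng = random.Random(seed)
    N, tau = n_particles, len(y) - 1
    # 1) Initialization: eta_n = chi_n = N(0, 1), hence d chi_n / d eta_n = 1.
    xi = [rng.gauss(0.0, 1.0) for _ in range(N)]
    w = [gauss_pdf(y[0], x, sigma_v) for x in xi]
    rho = [0.0] * N
    for t in range(tau):
        obs = y[t + 1]
        # 2) Propagation (bootstrap): ancestor drawn with probability
        #    proportional to w, then moved with the prior kernel m.
        ancestors = rng.choices(range(N), weights=w, k=N)
        new_xi = [phi * xi[i] + sigma_u * rng.gauss(0.0, 1.0) for i in ancestors]
        new_w = [gauss_pdf(obs, x, sigma_v) for x in new_xi]
        new_rho = []
        for x_new in new_xi:
            # Backward weights: proportional to w_i * m(xi_i, x_new).
            back = [w[i] * gauss_pdf(x_new, phi * xi[i], sigma_u) for i in range(N)]
            total = sum(back)
            acc = 0.0
            for i in range(N):
                term = S(xi[i], x_new, obs) / (t + 1) + (1.0 - 1.0 / (t + 1)) * rho[i]
                acc += term * back[i] / total
            new_rho.append(acc)
        xi, w, rho = new_xi, new_w, new_rho
    # 3) Self-normalized weighted average of the auxiliary statistics.
    return sum(wi * ri for wi, ri in zip(w, rho)) / sum(w)

y_block = [0.1, -0.3, 0.5, 0.2, -0.1, 0.4]
estimate = online_ffbs(y_block, lambda x, x_next, obs: x * x_next)
```

Each observation is used once and then discarded, and the cost per time step is quadratic in the number of particles because of the sum over $i$ in the update of $\rho_{t+1}$. A simple sanity check is that a statistic depending on the observation only, $S(x, x', y) = y$, returns the empirical mean of the block.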

Control of the approximation error

It remains to use the results of Chapter 7 to obtain a more precise control of the error, in which the observations appear in the bounds (in Chapter 7 we work conditionally on a fixed set of observations).

These bounds are obtained by following the same steps as in the proofs of Propositions 7.1 and 7.2 of Chapter 7. We give only the result here; all the proofs are detailed in [Le Corff et Fort, 2011a]. These proofs require the introduction of new quantities related to the observations, together with the assumptions attached to them. For all $y \in \mathbb{Y}$, define

$$\omega_+(y) = \sup_{\theta \in \Theta}\ \sup_{\substack{(x,x') \in \mathbb{X} \times \mathbb{X} \\ t \ge 0,\ n \ge 0}} \frac{m_\theta(x, x')\, g_\theta(x', y)}{\vartheta^n_t(x)\, p^n_t(x, x')}$$

and

$$b_-(y) \overset{\mathrm{def}}{=} \inf_{\theta \in \Theta} \ldots$$


To establish these bounds, we need

i) assumptions on the HMM and on the mechanism producing the particles (similar to those given in Chapter 7),

ii) assumptions on the observations (stationarity and moment conditions involving the functions $b_-$ and $\omega_+$).

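Under these assumptions, one can show that there exist $p > 2$ and a constant $C > 0$ such that, for all $n \ge 0$,

$$\left\| \widetilde S_n - \bar S^{\chi_n,T_n}_{\tau_{n+1}}(\theta_n, Y) \right\|_p \le C \left( \frac{1}{\tau_{n+1}^{1/2} N_{n+1}^{1/2}} + \frac{1}{N_{n+1}} \right).$$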

This gives an explicit control of the $L_p$ error made on each block as a function of the block size and of the number of particles used for the FFBS approximation. If, as in Chapter 6, we choose a number of observations per block of the form $\tau_n = \lfloor c n^a \rfloor$ with $c > 0$ and $a > 1$, then it suffices to choose $N_n$ of the form $N_n = \lfloor \tau_n^d \rfloor$ with $d \ge (a + 1)/(2a)$ to obtain assumption A6 of Chapter 6 and hence the convergence of the Monte Carlo BOEM algorithm.
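To make the choice of block sizes and particle numbers concrete, the following sketch tabulates $\tau_n$, $N_n$ and the right-hand side of the error bound with the unknown constant $C$ set to 1. Ignoring the integer parts, taking $N_n = \tau_n^d$ turns the bound into $C(\tau_n^{-(1+d)/2} + \tau_n^{-d})$, whose slower term is $\tau_n^{-d}$ whenever $d \le 1$. The function name `block_schedule`, the numerical values of `c` and `a`, and the guard forcing at least one observation and one particle are assumptions made for illustration.

```python
import math

def block_schedule(n_blocks, c, a, d=None):
    """Return rows (n, tau_n, N_n, bound) with tau_n = floor(c * n^a) and
    N_n = floor(tau_n^d); bound is 1/sqrt(tau_n * N_n) + 1/N_n (C = 1)."""
    if c <= 0 or a <= 1:
        raise ValueError("need c > 0 and a > 1")
    d_min = (a + 1) / (2 * a)      # smallest admissible exponent
    d = d_min if d is None else d
    if d < d_min:
        raise ValueError("need d >= (a + 1) / (2a)")
    rows = []
    for n in range(1, n_blocks + 1):
        # Guard (assumption): at least one observation and one particle.
        tau = max(1, math.floor(c * n ** a))
        n_part = max(1, math.floor(tau ** d))
        bound = 1.0 / math.sqrt(tau * n_part) + 1.0 / n_part
        rows.append((n, tau, n_part, bound))
    return rows

for n, tau, n_part, bound in block_schedule(5, c=1.0, a=2.0):
    print(f"block {n}: tau={tau:3d}  N={n_part:3d}  bound/C={bound:.3f}")
```

With $a = 2$ the minimal exponent is $d = 3/4$, so the number of particles grows more slowly than the block size while the per-block bound still decreases.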

Chapter 6

Online Expectation-Maximization-type algorithms for estimation in hidden Markov models (article)

The Expectation Maximization (EM) algorithm is a versatile tool for model parameter estimation in latent data models. When processing large data sets or data streams, however, EM becomes intractable since it requires the whole data set to be available at each iteration of the algorithm. In this contribution, a new generic online EM algorithm for model parameter inference in general hidden Markov models is proposed. This new algorithm updates the parameter estimate after a block of observations is processed (online). The convergence of this new algorithm is established, and the rate of convergence is studied, showing the impact of the block-size sequence. An averaging procedure is also proposed to improve the rate of convergence. Finally, practical illustrations are presented to highlight the performance of these algorithms in comparison to other online maximum likelihood procedures.

6.1 Introduction

A hidden Markov model (HMM) is a stochastic process $\{X_k, Y_k\}_{k \ge 0}$ in $\mathbb{X} \times \mathbb{Y}$, where the state sequence $\{X_k\}_{k \ge 0}$ is a Markov chain and where the observations $\{Y_k\}_{k \ge 0}$ are independent conditionally on $\{X_k\}_{k \ge 0}$. Moreover, the conditional distribution of $Y_k$ given the state sequence depends only on $X_k$. Since the sequence $\{X_k\}_{k \ge 0}$ is unobservable, any statistical inference task is carried out using the observations $\{Y_k\}_{k \ge 0}$. HMMs can be applied in a large variety of disciplines such as financial econometrics ([Mamon et Elliott, 2007]), biology ([Churchill, 1992]) or speech recognition ([Juang et Rabiner, 1991]).

The Expectation Maximization (EM) algorithm is an iterative algorithm used to perform maximum likelihood estimation in HMMs. The EM algorithm is generally simple to implement since it relies on complete data computations. Each iteration is decomposed into two steps: the E-step computes the conditional expectation of the complete data log-likelihood given the observations, and the M-step updates the parameter estimate based on this conditional expectation. In many situations of interest, the complete data likelihood belongs to the curved exponential family. In this case, the E-step boils down to the computation of the conditional expectation of the complete data sufficient statistic. Even in this case, except for simple models such as linear Gaussian models or HMMs with finite state-spaces, the E-step is intractable and has to be approximated, e.g. by Monte Carlo methods such as Markov Chain Monte Carlo methods or Sequential Monte Carlo methods (see [Carlin et al., 1992] or [Cappé et al., 2005, Doucet et al., 2001] and the references therein).

However, when processing large data sets or data streams, the EM algorithm might become impractical. Online variants of the EM algorithm were first proposed for independent and identically distributed (i.i.d.) observations, see [Cappé et Moulines, 2009]. When the complete data likelihood belongs to the curved exponential family, the E-step is replaced by a stochastic approximation step while the M-step remains unchanged. The convergence of this online variant of the EM algorithm for i.i.d. observations is addressed by [Cappé et Moulines, 2009]: the limit points are the stationary points of the Kullback-Leibler divergence between the marginal distribution of the observation and the model distribution.

An online version of the EM algorithm for HMMs when both the observations and the states take a finite number of values (resp. when the states take a finite number of values) was recently proposed by [Mongillo et Denève, 2008] (resp. by [Cappé, 2011a]). This algorithm has been extended to the case of general state-space models by replacing the deterministic computation of the smoothing probabilities with Sequential Monte Carlo approximations (see for example [Cappé, 2009, Del Moral et al., 2010a, Le Corff et al., 2011b]). No convergence results exist for these online EM algorithms for general state-space models (some insights on the asymptotic behavior are nevertheless given in [Cappé, 2011a]): the introduction of many approximations at different steps of the algorithms makes the analysis quite challenging.

In this contribution, a new online EM algorithm is proposed for HMMs with complete data likelihood belonging to the curved exponential family. This algorithm sticks closely to the principles of the original batch-mode EM algorithm. The M-step (and thus, the update of the parameter) occurs at some deterministic times $\{T_k\}_{k \ge 1}$, i.e. we propose to keep a fixed parameter estimate for blocks of observations of increasing size. More precisely, let $\{T_k\}_{k \ge 0}$ be an increasing sequence of integers ($T_0 = 0$). For each $k \ge 0$, the parameter's value is kept fixed while accumulating the information brought by the observations $\{Y_{T_k+1}, \dots, Y_{T_{k+1}}\}$. Then, the parameter is updated at the end of the block. This algorithm is an online algorithm since the sufficient statistics of the $k$-th block can be computed on the fly by updating an intermediate quantity when a new observation $Y_t$, $t \in \{T_k + 1, \dots, T_{k+1}\}$, becomes available. Such recursions are provided in recent works on online estimation in HMMs, see [Cappé, 2009, Cappé, 2011a, Del Moral et al., 2010a].

This new algorithm, called Block Online EM (BOEM), is derived in Section 6.2 together with an averaged version. Section 6.3 is devoted to practical applications: the BOEM algorithm is used to perform parameter inference in HMMs where the forward recursions mentioned above are available explicitly. In the case of finite state-space HMMs, the BOEM algorithm is compared to a gradient-type recursive maximum likelihood procedure and to the online EM algorithm of [Cappé, 2011a]. The convergence of the BOEM algorithm is addressed in Section 6.4. The BOEM algorithm is seen as a perturbation of a deterministic limiting EM algorithm, which is shown to converge to the stationary points of the limiting relative entropy (to which the true parameter belongs if the model is well specified). The perturbation is shown to vanish (in some sense) as the number of observations increases, thus implying that the BOEM algorithm inherits the asymptotic behavior of the limiting EM algorithm. Finally, in Section 6.5, we study the rate of convergence of the BOEM algorithm as a function of the block-size sequence. We prove that the averaged BOEM algorithm is rate-optimal when the block-size sequence grows polynomially. All the proofs are postponed to Section 6.6; supplementary proofs and comments are provided in Appendix A.
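The block structure described above can be illustrated on a toy finite state-space HMM, where the forward recursion for the block statistics is exact. The sketch below rests on assumptions that are not in the text: Gaussian emissions with unknown means and a known common standard deviation, a known transition matrix `Q`, a uniform initial distribution at the start of every block, and block sizes $\lfloor c k^a \rfloor$. The names `block_statistic`, `boem` and `simulate` are hypothetical.

```python
import math
import random

def gauss_pdf(y, mean, std):
    """Density of N(mean, std^2) evaluated at y."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def block_statistic(block, mu, Q, std, chi):
    """Forward-only smoothed averages over one block, parameter held fixed.

    Returns 2K numbers: averages of 1{X_t = i}, then of 1{X_t = i} * Y_t.
    """
    K = len(mu)
    phi = list(chi)                             # filtering distribution
    rho = [[0.0] * (2 * K) for _ in range(K)]   # one statistic vector per state
    for t, y in enumerate(block):               # observations used one at a time
        step = 1.0 / (t + 1)
        pred = [sum(phi[x] * Q[x][xp] for x in range(K)) for xp in range(K)]
        new_rho = []
        for xp in range(K):
            s = [1.0 if i == xp else 0.0 for i in range(K)]
            s += [y if i == xp else 0.0 for i in range(K)]
            # Average of the previous rho under the backward kernel.
            past = [sum(phi[x] * Q[x][xp] * rho[x][j] for x in range(K)) / pred[xp]
                    for j in range(2 * K)]
            new_rho.append([step * s[j] + (1.0 - step) * past[j]
                            for j in range(2 * K)])
        unnorm = [pred[xp] * gauss_pdf(y, mu[xp], std) for xp in range(K)]
        total = sum(unnorm)
        phi = [u / total for u in unnorm]       # filter update
        rho = new_rho
    return [sum(phi[x] * rho[x][j] for x in range(K)) for j in range(2 * K)]

def boem(ys, mu0, Q, std, c=10.0, a=1.2):
    """Block online EM: the means are updated only at the end of each block."""
    mu, K = list(mu0), len(mu0)
    chi = [1.0 / K] * K                         # assumed uniform block start
    start, k = 0, 1
    while True:
        tau = math.floor(c * k ** a)            # size of the k-th block
        if start + tau > len(ys):
            return mu
        # The slice is for brevity; the recursion never revisits a value.
        S = block_statistic(ys[start:start + tau], mu, Q, std, chi)
        mu = [S[K + i] / S[i] if S[i] > 0 else mu[i] for i in range(K)]  # M-step
        start, k = start + tau, k + 1

def simulate(n_obs, mu, Q, std, seed=0):
    """Draw observations from the toy HMM (used only for the illustration)."""
    rng = random.Random(seed)
    x, ys = 0, []
    for _ in range(n_obs):
        x = rng.choices(range(len(mu)), weights=Q[x])[0]
        ys.append(mu[x] + std * rng.gauss(0.0, 1.0))
    return ys

Q = [[0.9, 0.1], [0.1, 0.9]]
observations = simulate(5000, [-2.0, 2.0], Q, 1.0, seed=1)
print(boem(observations, [-0.5, 0.5], Q, 1.0))
```

Keeping the parameter fixed inside a block means the quantity accumulated by `block_statistic` is the usual EM intermediate quantity for that block, so the update at the end of the block has the same form as a batch M-step.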

Source document: Estimations pour les modèles de Markov cachés et approximations particulaires. Application à la cartographie et à la localisation simultanées (pages 78-86).