


Algorithm 12: SEM algorithm for SSMs [SEM($\mathcal{M}$)]

Initialization: choose an initial parameter $\hat{\theta}_0$ and a conditioning trajectory.

For $r \geq 1$,

(1) E-step: generate $N$ trajectories $\{x^{(i)}_{0:T,r}\}_{i=1:N}$ by running the CPF-BS algorithm (Algorithm 14) with the given conditioning sequence, the parameter value $\hat{\theta}_{r-1}$, the dynamical model $\mathcal{M}$ and the observations $y_{1:T}$, and deduce an empirical estimate $\hat{p}_r$ of the smoothing distribution $p(x_{0:T} \mid y_{1:T}; \hat{\theta}_{r-1})$.

(2) M-step: compute an estimate of $\theta$,
$$\hat{\theta}_r = \arg\max_{\theta \in \Theta} \mathbb{E}_{\hat{p}_r}\left[\ln p(X_{0:T}, y_{1:T}; \theta)\right].$$

end.
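In code, this alternation is a short loop. A minimal sketch in Python, assuming a hypothetical smoother `cpf_bs_smoother` returning the simulated trajectories and an updated conditioning sequence, and a hypothetical `m_step` maximizing the Monte Carlo estimate of the expected complete log-likelihood (neither is the thesis implementation):

```python
def sem(y, theta0, M, n_iter, N):
    """SEM(M) loop of Algorithm 12 with a known dynamical model M.

    `cpf_bs_smoother` and `m_step` are hypothetical stand-ins for the
    CPF-BS smoother (Algorithm 14) and the M-step maximization.
    """
    theta = theta0
    cond_traj = y.copy()  # initial conditioning trajectory
    for r in range(1, n_iter + 1):
        # E-step: simulate N smoothing trajectories under theta_{r-1}
        x_smooth, cond_traj = cpf_bs_smoother(y, M, theta, cond_traj, N)
        # M-step: maximize the empirical estimate of E[ln p(X_{0:T}, y_{1:T}; theta)]
        theta = m_step(x_smooth, y, M)
    return theta
```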

When the noises $(\eta_t, \epsilon_t)$ in the SSM (4.1) have Gaussian distributions with unknown covariances $(Q, R)$, the following closed-form expressions can be derived for the estimators of $\theta = (Q, R)$ in the M-step:

$$\hat{Q}_r = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T} \left[x^{(i)}_{t,r} - \mathcal{M}\big(x^{(i)}_{t-1,r}\big)\right]\left[x^{(i)}_{t,r} - \mathcal{M}\big(x^{(i)}_{t-1,r}\big)\right]^\top}{NT}, \quad (4.5)$$

$$\hat{R}_r = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T} \left[y_t - H\big(x^{(i)}_{t,r}\big)\right]\left[y_t - H\big(x^{(i)}_{t,r}\big)\right]^\top}{NT}. \quad (4.6)$$
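With Gaussian noises these updates are just averaged outer products of the residuals. A minimal NumPy sketch, assuming the smoothed trajectories are stored in an array `x_smooth` of shape $(N, T+1, d)$ and `M`, `H` are callables applying the dynamical and observation operators along the last axis (all names are assumptions, not the thesis code):

```python
import numpy as np

def m_step_gaussian(x_smooth, y, M, H):
    """Closed-form M-step updates (4.5)-(4.6) for Gaussian noises.

    x_smooth : (N, T+1, d) smoothed trajectories x^{(i)}_{0:T,r}
    y        : (T, p) observations y_{1:T}
    M, H     : callables applying the dynamical / observation operators
               along the last axis
    """
    N, T1, _ = x_smooth.shape
    T = T1 - 1
    # Model residuals x_t - M(x_{t-1}), over all particles i and times t
    res_q = x_smooth[:, 1:] - M(x_smooth[:, :-1])   # (N, T, d)
    Q_hat = np.einsum('nti,ntj->ij', res_q, res_q) / (N * T)
    # Observation residuals y_t - H(x_t)
    res_r = y[None, :, :] - H(x_smooth[:, 1:])      # (N, T, p)
    R_hat = np.einsum('nti,ntj->ij', res_r, res_r) / (N * T)
    return Q_hat, R_hat
```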

4.3 Non-parametric estimation in state-space models

In the previous section, it was assumed that the dynamical model $m$ is known or that an estimate $\hat{m}$ is available. The latter case may occur, for example, when the evolution model (4.1) is observed without observational error on some time intervals, so that observations of the process $\{X_t\}$ are available. When no parametric model is available for $m$, a non-parametric estimate of $m$ can be built. Here, we focus on Local Linear Regression (LLR), but other non-parametric estimation methods can easily be plugged into the methodology described in this section (see Chapter 1 [Section 1.2.2.1] for details). In [35], the authors discussed the practical implementation of LLR and presented several interesting case studies. The asymptotic theory of these estimators was described in [136] (see also [61] for time series). The idea of LLR is to locally approximate $m$ by its first-order Taylor expansion, $m(x') \approx m(x) + \nabla m(x)(x' - x)$, for any $x'$ in a neighborhood of $x$. In practice, the intercept $m(x)$ and the slope $\nabla m(x)$ are estimated by minimizing a weighted mean square error, with weights defined by a kernel. Here we choose to use the tricube kernel (Eq. 1.28), as in [35]. This kernel, used in many implementations of LLR, has compact support and is smooth at its boundary. One advantage of such a bounded kernel is that points outside the support require no computation. For the tricube kernel, the bandwidth $h$ is usually fixed to half the width of the kernel support.

An alternative is to perform LLR on the $k$-nearest neighborhood of $x$ (see [7, 91, 119, 162]). In this case, the support is defined as the smallest rectangular area containing the $k$ nearest neighbors. This leads to an adaptive bandwidth $h = h_x$, since the support depends on the location $x$.
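As a rough illustration (not the thesis code), a local linear fit at a query point with tricube weights on the $k$ nearest neighbors could look as follows; the names `X_learn`, `Y_learn` (pairs of consecutive states used as predictors and responses) and `k` are assumptions:

```python
import numpy as np

def llr_predict(x0, X_learn, Y_learn, k):
    """LLR estimate of m(x0) with tricube weights on the k nearest
    neighbours (adaptive bandwidth h_x).

    x0      : (d,) query point
    X_learn : (n, d) predictors, Y_learn : (n, d_y) responses
    """
    dist = np.linalg.norm(X_learn - x0, axis=1)   # distances to analogs
    idx = np.argsort(dist)[:k]                    # k nearest neighbours
    h = dist[idx].max() + 1e-12                   # adaptive bandwidth h_x
    u = dist[idx] / h
    w = (1.0 - u**3) ** 3                         # tricube kernel, support [0, 1]
    # Weighted least squares for the intercept m(x0) and slope grad m(x0)
    A = np.hstack([np.ones((k, 1)), X_learn[idx] - x0])
    sw = np.sqrt(w)[:, None]
    beta, *_ = np.linalg.lstsq(sw * A, sw * Y_learn[idx], rcond=None)
    return beta[0]                                # intercept = estimate of m(x0)
```

In the setting of this section, `X_learn` and `Y_learn` would hold the pairs $(\tilde{x}_{t-1}, \tilde{x}_t)$ extracted from the learning sequences.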

For the example presented in the general introduction of the thesis, the LLR estimate leads to the $\hat{m}$-curve in the left panel of Figure 1.8 when the time series $\{X_t\}$ is observed without observational error. As in parametric problems, ignoring the observational error causes inconsistent estimation of $m$, as illustrated in the right panel of Figure 1.8, where the $\hat{m}$-curve corresponds to the LLR estimate based on a sequence of the noisy process $\{Y_t\}$. We now propose to adapt the SEM algorithm (Algorithm 12) introduced in the previous section to better estimate $m$ in the case where $m$ is unknown and the only available data is an observed sequence $y_{1:T}$ of the process $\{Y_t\}$.

The key idea of the algorithm is to update a non-parametric estimate of $m$ at each iteration of the SEM algorithm, using the smoothing trajectories simulated in the E-step. This sequentially reduces the bias induced by the observation noise. More details are given in Algorithm 13.

There, $p(x_{0:T}, y_{1:T}; \theta, \hat{m}_{r-1})$ denotes the complete likelihood function where $\hat{m}_{r-1}$ is substituted for $m$. Algorithm 13 looks similar to Algorithm 12. The main difference between the two algorithms is that in Algorithm 13, at iteration $r$, $\mathcal{M}$ is an approximation of $m$ defined by a non-parametric estimate based on the current sequences $\{\tilde{x}^{(i)}_{0:T,r}\}_{i=1:N}$ of the state $\{X_t\}$.

This is illustrated in Figure 4.1 using the toy model (1.4). At the first iteration, an initial parameter value $\hat{\theta}_0$ is chosen and a non-parametric estimate $\hat{m}_0$ of $m$ is computed based on the observed sequence $y_{1:T}$. Then, in the E-step, a smoothing algorithm is run. This produces smoothed trajectories with less observation noise than the original sequence. In the M-step, the parameter value $\theta$ and the non-parametric estimate of $m$ are updated by fitting the SSM using the smoothed trajectories as possible trajectories of the true state. To simplify the illustration, the LLR estimate $\hat{m}_r$ of $m$ is learned from a single simulated trajectory, denoted $\tilde{x}_{0:T,r}$. As shown in Figure 4.1, the distribution of $\tilde{x}_{0:T,r}$ gets closer and closer to that of $X_{0:T}$ (see Figure 1.8) as $r$ increases, which reduces the bias in the non-parametric estimate of $m$.


Algorithm 13: SEM-like algorithm for non-parametric SSMs [npSEM]

Initialization: choose an initial parameter $\hat{\theta}_0$, set the first learning sequence $\tilde{x}_{1:T,0} = y_{1:T}$ and a conditioning trajectory, and compute the corresponding LLR estimate $\hat{m}_0$ on $\tilde{x}_{1:T,0}$.

For $r \geq 1$,

(1) E-step: generate $N$ trajectories $\{\tilde{x}^{(i)}_{0:T,r}\}_{i=1:N}$ by running the CPF-BS algorithm (Algorithm 14) with the given conditioning sequence, the parameter value $\hat{\theta}_{r-1}$, the dynamical model $\mathcal{M} = \hat{m}_{r-1}$ and the observations $y_{1:T}$, and deduce an empirical estimate $\hat{p}_r$ of the smoothing distribution $p(x_{0:T} \mid y_{1:T}; \hat{\theta}_{r-1})$.

(2) M-step:

i. compute an estimate of $\theta$,
$$\hat{\theta}_r = \arg\max_{\theta \in \Theta} \mathbb{E}_{\hat{p}_r}\left[\ln p(X_{0:T}, y_{1:T}; \theta, \hat{m}_{r-1})\right];$$
ii. compute an LLR estimate $\hat{m}_r$ of $m$ with $\{\tilde{x}^{(i)}_{0:T,r}\}_{i=1:N}$.

end.
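The whole loop differs from the SEM($\mathcal{M}$) sketch above only by the refit of $m$. Schematically, again with hypothetical helpers `cpf_bs_smoother`, `m_step` and `fit_llr` (an LLR fit such as the one sketched earlier, applied to consecutive state pairs):

```python
def npsem(y, theta0, n_iter, N):
    """SEM-like loop of Algorithm 13 (npSEM); a schematic sketch with
    hypothetical helpers, not the thesis implementation."""
    m_hat = fit_llr(y)            # m_hat_0 learned on x~_{1:T,0} = y_{1:T}
    theta = theta0
    cond_traj = y.copy()          # conditioning trajectory for CPF-BS
    for r in range(1, n_iter + 1):
        # E-step: CPF-BS smoothing with dynamical model M = m_hat_{r-1}
        x_smooth, cond_traj = cpf_bs_smoother(y, m_hat, theta, cond_traj, N)
        # M-step i: update theta (e.g. Q, R via (4.5)-(4.6)) with M = m_hat_{r-1}
        theta = m_step(x_smooth, y, m_hat)
        # M-step ii: refit the LLR estimate of m on the smoothed trajectories
        m_hat = fit_llr(x_smooth)
    return theta, m_hat
```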

The spirit of this algorithm is close to that of the iterative global/local estimation (IGLE) algorithm of [176] for estimating mixture models with mixing proportions depending on covariates.

Algorithm 13 is referred to as a SEM-like algorithm because in the M-step the parameter $\theta$ is estimated as in Algorithm 12. However, there is no guarantee that the E-step increases a likelihood function, and the M-step combines a likelihood maximization for $\theta$ with a "data update" for $m$ which cannot be written as the solution of an optimization problem.

At each iteration, the LLR estimate of $m$ is updated and a bandwidth has to be chosen for the kernel. This is done by cross-validation, minimizing a mean square error, with a different optimal bandwidth searched on a grid at each iteration. Note also that the time series $\{\tilde{x}^{(i)}_{0:T,r}\}_{i=1:N}$ are used both as a learning set to estimate $m$ at iteration $r$ and for the propagation step of the CPF-BS smoother in Algorithm 13. Numerical experiments showed that this leads to over-fitting. To reduce it, at each iteration $r$ and for each time $t$, $\hat{m}_r(\tilde{x}^{(i)}_{t-1,r})$ is estimated by LLR based on the subsample $\tilde{x}^{(i)}_{0:(t-\ell),(t+\ell):T,r}$, that is, with the subsequence $\tilde{x}^{(i)}_{(t-\ell+1):(t+\ell-1),r}$ removed from the learning sequence. The lag $\ell$ is chosen a priori.
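To make this leave-a-window-out subsampling concrete, a small sketch of the index construction (the lag `ell` is the a priori lag $\ell$ of the text; the function name is an assumption):

```python
import numpy as np

def loo_window_indices(t, T, ell):
    """Time indices kept when predicting at time t: 0..(t-ell) and
    (t+ell)..T, i.e. the window (t-ell+1)..(t+ell-1) around t is removed."""
    left = np.arange(0, max(t - ell + 1, 0))        # 0..t-ell (inclusive)
    right = np.arange(min(t + ell, T + 1), T + 1)   # t+ell..T (inclusive)
    return np.concatenate([left, right])
```

The learning subsample at time $t$ is then obtained as `x_tilde[loo_window_indices(t, T, ell)]`, which breaks the dependence between the point being predicted and its temporal neighbors in the learning set.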


[Figure 4.1 panel: (E-step) generate $\tilde{x}_{0:T,1}$ given $y_{1:T}$, $\hat{m}_0$ and $\hat{\theta}_0$]

Figure 4.1 – An illustration of Algorithm 13 (npSEM) on the sine model (1.4). At each iteration, the LLR estimate $(\hat{m}_r)_{r \geq 0}$ of the dynamical model $m$ is learned on the smoothed samples generated at the previous iteration ($\tilde{x}_{1:T,0} = y_{1:T}$ for the first iteration).
