Automatica
journal homepage: www.elsevier.com/locate/automatica
Brief paper
A novel suboptimal method for solving polynomial filtering problems✩

Xue Luo^a, Yang Jiao^b, Wen-Lin Chiou^c, Stephen S.-T. Yau^{d,1}

^a School of Mathematics and Systems Science, Beihang University, Beijing, 100191, PR China
^b Department of Mathematics, Statistics and Computer Science, University of Illinois at Chicago, Science and Engineering Offices (M/C 249), 851 S. Morgan Street, Chicago, IL 60607-7045, USA
^c Department of Mathematics, Fu-Jen University, New Taipei City, Taiwan, ROC
^d Department of Mathematical Sciences, Tsinghua University, Beijing, 100084, PR China
ARTICLE INFO

Article history:
Received 22 April 2014
Received in revised form 31 May 2015
Accepted 19 August 2015

Keywords:
Polynomial filtering problems
Suboptimal filter
Recursive algorithm
Estimation theory
ABSTRACT

In this paper we derive the stochastic differentials of the conditional central moments of nonlinear filtering problems, especially those of the polynomial filtering problem, and develop a novel suboptimal method by solving this evolution equation. The basic idea is to augment the state of the original nonlinear system with the conditional central moments of the original states, so that the augmented states form a so-called bilinear system after truncation. Our derivation makes it clear that for the linear filtering problem (i.e., $f$, $g$ and $h$ are all polynomials of degree at most one) the stochastic differentials of the conditional central moments form a closed system automatically, without truncation. This gives one reason why an optimal filter exists for linear problems. In general, however, the conditional central moments form an infinite-dimensional system. To reduce it to a closed form, we set all sufficiently high central moments to zero, as was done in the Carleman approach (Germani et al., 2007). Consequently, a novel suboptimal method is developed by dealing with the resulting bilinear system. A numerical simulation of the cubic sensor problem illustrates the accuracy and numerical stability of the method.
©2015 Elsevier Ltd. All rights reserved.
1. Introduction
The nonlinear filtering (NLF) problem has been studied extensively since the linear one was satisfactorily solved by Kalman in the 1960s. Until now, however, no universally optimal method has existed for the general nonlinear setting. The main goal of NLF is to obtain a "good" estimate of the conditional expectation, or perhaps even the conditional density function, of the state given the observation history. We refer the reader to the book by Jazwinski (1970) for an excellent introduction to NLF.
One general approach to NLF is the class of so-called global approaches; see the survey paper (Luo, 2014) for a more detailed discussion. All these methods try to solve, analytically or numerically,
✩ The material in this paper was not presented at any conference. This paper was recommended for publication in revised form by Associate Editor Michael V. Basin under the direction of Editor Ian R. Petersen.
E-mail addresses: xluo@buaa.edu.cn (X. Luo), jiaoyang0715@gmail.com (Y. Jiao), chiou@math.fju.edu.tw, kubulinchiou@gmail.com (W.-L. Chiou), yau@uic.edu (S.S.-T. Yau).
1 Tel.: +86 10 62787874; fax: +86 10 62798033.
for the conditional density function. One of the most recent global approaches is the Yau–Yau on- and off-line algorithm, first derived in Yau and Yau (2008) and further generalized to the nonlinear time-varying setting in Luo and Yau (2013). The Yau–Yau method works for all NLF problems in theory; however, some technical work remains to be done for problems with high-dimensional states, say, to overcome "the curse of dimensionality". Therefore, certain suboptimal methods still need to be developed.
Another possible way to solve general NLF problems stems from the local approaches, especially the Kalman filter and its derivatives. The basic idea is to augment the states of the original NLF problem in a certain way so that the augmented states satisfy a linear or so-called bilinear system (Carravetta, Germani, & Shuakayev, 2000). Generally speaking, one cannot obtain a closed system unless the problem is the Benes filter (Benes, 1981) or the Yau filter (Yau, 1994). A suboptimal filter is therefore derived by truncating the infinite-dimensional system in some way. In this direction, Basin (2003) is the first paper in which the conditional higher-order moments were employed for suboptimal polynomial filtering (PF). Later, a series of papers, e.g. (Basin, 2008; Basin, Shi, & Calderon-Alvarez, 2010) and the references therein, followed this
http://dx.doi.org/10.1016/j.automatica.2015.09.001
line. Based on the suboptimal approach introduced for bilinear systems in Carravetta et al. (2000), Germani, Manes, and Palumbo (2007) developed a Carleman approximation approach for NLF problems. In their paper, the higher moments are omitted to form a finite-dimensional system. However, as early as 1967, Kushner (1967) considered the moment sequences.
Even in the one-dimensional problem, the moment sequence has to satisfy the following inequalities:
$$m_2 > 0, \qquad m_4 > m_2^2, \qquad m_6 > m_4^2/m_2, \quad \ldots,$$
where $m_s$, $s = 1, 2, \ldots$, denotes the $s$th moment of some random variable. In particular, if the random variable is a zero-mean Gaussian, then all the higher moments can be computed explicitly:
$$m_s = \begin{cases} 0, & \text{all odd } s \ge 1,\\ (s-1)!!\; m_2^{s/2}, & \text{all even } s \ge 2, \end{cases}$$
where $(s-1)!! = 1 \cdot 3 \cdot 5 \cdots (s-1)$ if $s$ is even. It is easy to see that no matter how small $m_2$ is, the even moments grow without bound as $s \to \infty$. Therefore, it is inappropriate to set all the higher moments to zero, even in the Gaussian case.

In this paper, we propose a novel suboptimal method (NSM) for PF problems based on the evolution of the conditional central moments of the states. Instead of augmenting the states by their higher moments as in the Carleman approach, we derive the evolution of the higher central moments and omit the sufficiently high ones to obtain a finite-dimensional system. Another novelty of this paper is that we explain, from the viewpoint of the evolution of the conditional higher central moments, why an optimal filter can be derived only for linear/bilinear filtering problems: according to Theorem 2, the stochastic differentials of the conditional central moments form a closed system automatically, without truncation, when the linear/bilinear filtering problem is considered.
2. Filtering model and notations

The model we consider here is
$$\begin{cases} dx_t = f(x_t, t)\,dt + g(t)\,dv_t,\\ dy_t = h(x_t, t)\,dt + dw_t, \end{cases} \quad (2.1)$$
where $x_t$, $v_t$, $y_t$ and $w_t$ are $\mathbb{R}^n$-, $\mathbb{R}^p$-, $\mathbb{R}^m$- and $\mathbb{R}^m$-valued processes, respectively, and $f: \mathbb{R}^n \times \mathbb{R}_+ \to \mathbb{R}^n$, $g: \mathbb{R}_+ \to \mathbb{R}^{n \times p}$ and $h: \mathbb{R}^n \times \mathbb{R}_+ \to \mathbb{R}^m$ are polynomials with respect to $x$. Assume that $\{v_t, t > 0\}$ and $\{w_t, t > 0\}$ are Brownian motion processes with $\mathrm{Var}[dv_t] = Q(t)\,dt$ and $\mathrm{Var}[dw_t] = R(t)\,dt$, respectively. Moreover, $\{v_t, t > 0\}$, $\{w_t, t > 0\}$ and $x_0$ are independent, and $y_0 = 0$.

The conditional expectation of a process $x_t$ is denoted $\hat{x}_t := E[x_t \mid \mathcal{Y}_t]$ for short, where $\mathcal{Y}_t := \{y_s : 0 \le s \le t\}$ is the observation history. The a priori conditional expectation is denoted $\widehat{(\circ)}_- := E[\circ \mid \mathcal{Y}_{t^-}]$, where $\mathcal{Y}_{t^-} := \{y_s : 0 \le s \le t^-\}$.
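As a concrete scalar instance of model (2.1), the sketch below (our own illustration; the cubic sensor is the example the paper treats numerically, but the parameter values here are our choice) simulates $dx_t = dv_t$, $dy_t = x_t^3\,dt + dw_t$ with $Q = R = 1$ and $y_0 = 0$ using the Euler–Maruyama scheme:

```python
import numpy as np

# Vectorized Euler-Maruyama simulation of the cubic sensor, a scalar
# instance of (2.1): dx_t = dv_t, dy_t = x_t^3 dt + dw_t, Q = R = 1, y_0 = 0.
rng = np.random.default_rng(0)
n_paths, dt, T = 2000, 1e-3, 1.0
N = int(T / dt)
x = np.zeros((N + 1, n_paths))
y = np.zeros((N + 1, n_paths))          # y_0 = 0, as assumed in the model
for k in range(N):
    x[k + 1] = x[k] + np.sqrt(dt) * rng.standard_normal(n_paths)                   # dv_t
    y[k + 1] = y[k] + x[k] ** 3 * dt + np.sqrt(dt) * rng.standard_normal(n_paths)  # dw_t
print(x[-1].var())  # since x_T = v_T here, the variance should be close to T = 1
```

Such simulated pairs $(x_t, y_t)$ are what a suboptimal filter is later tested against.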
In this paper we use the Kronecker algebra for conciseness. A quick survey of the Kronecker product and its properties can be found in Carravetta, Germani, and Raimondi (1996).
For the readers' convenience, we include some simple facts of Kronecker algebra here. The Kronecker product $\otimes$ is defined for any two matrices $M_{r \times s}$ and $N_{p \times q}$ as
$$M \otimes N := \begin{bmatrix} m_{11}N & \cdots & m_{1s}N \\ \vdots & \ddots & \vdots \\ m_{r1}N & \cdots & m_{rs}N \end{bmatrix}.$$
Let $M^{[i]}$ denote the $i$th Kronecker power of the matrix $M$, defined as
$$M^{[0]} = 1; \qquad M^{[i]} = M \otimes M^{[i-1]} = M^{[i-1]} \otimes M.$$
The stack of the matrix $M_{r \times s} = [m_1, m_2, \ldots, m_s]$ is defined as
$$\mathrm{st}(M) = \begin{bmatrix} m_1^T & m_2^T & \cdots & m_s^T \end{bmatrix}^T,$$
where $m_i$ is the $i$th column of $M$. The inverse operation of the stack reduces a vector to a matrix of the proper size; that is, $M = \mathrm{st}^{-1}_{r \times s}(\mathrm{st}(M))$
. The following are properties of the Kronecker product and the stack operation:
$$(A + B) \otimes (C + D) = A \otimes C + A \otimes D + B \otimes C + B \otimes D, \quad (2.2)$$
$$(A \otimes B) \otimes C = A \otimes (B \otimes C), \quad (2.3)$$
$$(A \cdot C) \otimes (B \cdot D) = (A \otimes B) \cdot (C \otimes D), \quad (2.4)$$
$$u \otimes v = \mathrm{st}(v \cdot u^T), \quad (2.5)$$
where $A$, $B$, $C$ and $D$ are matrices of suitable sizes, and $u$, $v$ are vectors. Unlike the usual matrix product, the Kronecker product is not commutative. Given any two matrices $A \in \mathbb{R}^{r_a \times s_a}$ and $B \in \mathbb{R}^{r_b \times s_b}$,
$$B \otimes A = C^T_{r_a, r_b}\,(A \otimes B)\,C_{s_a, s_b},$$
where $C_{a,b}$ is an orthonormal commutation matrix in $\{0, 1\}^{ab \times ab}$ whose entry $(h, l)$ is given by
$$\{C_{a,b}\}_{h,l} = \begin{cases} 1, & \text{if } l = |h-1|_b\, a + \left[\dfrac{h-1}{b}\right] + 1,\\ 0, & \text{otherwise}, \end{cases}$$
where $[\cdot]$ and $|\cdot|_s$ denote the integer part and the $s$-modulo, respectively. The Kronecker power of a binomial, $(a + b)^{[i]}$, admits the expansion
$$(a + b)^{[i]} = \sum_{j=0}^{i} M_j^i \left(a^{[j]} \otimes b^{[i-j]}\right), \quad (2.6)$$
for any $a, b \in \mathbb{R}^n$, where the matrices $M_j^i \in \mathbb{R}^{n^i \times n^i}$ can be computed recursively as
$$M_h^h = M_0^h = I_{n^h}, \qquad M_j^h = \left(M_j^{h-1} \otimes I_n\right) + \left(M_{j-1}^{h-1} \otimes I_n\right)\left(I_{n^{j-1}} \otimes G_{h-j}\right),$$
and $\{G_l\}$ is a sequence of matrices satisfying
$$G_1 = C^T_{n,n}, \qquad G_l = (I_n \otimes G_{l-1}) \cdot (G_1 \otimes I_{n^{l-1}}).$$
See the detailed derivation in Carravetta et al. (1996).
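The definitions above are easy to check numerically. The sketch below (our own; helper names such as `commutation` and `M_coeffs` are ours) builds $C_{a,b}$ from the entry formula, verifies the commutation identity $B \otimes A = C^T_{r_a,r_b}(A \otimes B)C_{s_a,s_b}$, and verifies the binomial expansion (2.6) with the stated recursions for $M_j^i$ and $G_l$:

```python
import numpy as np

def commutation(a, b):
    """C_{a,b}: entry (h, l) = 1 iff l = |h-1|_b * a + [(h-1)/b] + 1 (1-based)."""
    C = np.zeros((a * b, a * b))
    for h in range(1, a * b + 1):
        C[h - 1, ((h - 1) % b) * a + (h - 1) // b] = 1.0
    return C

# Commutation identity: B ⊗ A = C^T_{ra,rb} (A ⊗ B) C_{sa,sb}.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))   # r_a = 2, s_a = 3
B = rng.standard_normal((4, 5))   # r_b = 4, s_b = 5
ok_comm = np.allclose(np.kron(B, A),
                      commutation(2, 4).T @ np.kron(A, B) @ commutation(3, 5))

def G_seq(n, L):
    """G_1 = C^T_{n,n}; G_l = (I_n ⊗ G_{l-1})(G_1 ⊗ I_{n^{l-1}}); G[0] is unused."""
    G = [None, commutation(n, n).T]
    for l in range(2, L + 1):
        G.append(np.kron(np.eye(n), G[l - 1]) @ np.kron(G[1], np.eye(n ** (l - 1))))
    return G

def M_coeffs(n, i):
    """The matrices M_j^i of (2.6); in the scalar case they reduce to binomials."""
    G = G_seq(n, max(i, 1))
    M = [[np.eye(1)]]
    for h in range(1, i + 1):
        row = []
        for j in range(h + 1):
            if j == 0 or j == h:
                row.append(np.eye(n ** h))
            else:
                row.append(np.kron(M[h - 1][j], np.eye(n))
                           + np.kron(M[h - 1][j - 1], np.eye(n))
                           @ np.kron(np.eye(n ** (j - 1)), G[h - j]))
        M.append(row)
    return M[i]

def kron_pow(v, i):
    out = np.ones(1)
    for _ in range(i):
        out = np.kron(out, v)
    return out

# Binomial expansion (2.6): (a + b)^{[i]} = sum_j M_j^i (a^{[j]} ⊗ b^{[i-j]}).
n, i = 2, 3
a, b = rng.standard_normal(n), rng.standard_normal(n)
Mi = M_coeffs(n, i)
ok_binom = np.allclose(kron_pow(a + b, i),
                       sum(Mi[j] @ np.kron(kron_pow(a, j), kron_pow(b, i - j))
                           for j in range(i + 1)))
print(ok_comm, ok_binom)  # True True
```

For $n = 1$ all the $C$ and $G$ matrices collapse to the scalar $1$ and $M_j^h$ obeys Pascal's rule, so $M_j^i$ is just the binomial coefficient $\binom{i}{j}$.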
In the derivation of our NSM for PF problems, the Itô formula for the computation of stochastic differentials is needed (see Liptser & Shiryayev, 1977). For any vector function $\psi(x_t): \mathbb{R}^n \to \mathbb{R}^r$, we have
$$d\psi = \left(\nabla_x \otimes \psi\right)\big|_{x_t}\,dx_t + \frac{1}{2}\left(\nabla_x^{[2]} \otimes \psi\right)\Big|_{x_t}\left(dx_t \otimes dx_t\right). \quad (2.7)$$
The differential operator $\nabla_x^{[i]} \otimes$ applied to $\psi$ is defined by
$$\nabla_x^{[0]} \otimes \psi = \psi, \qquad \nabla_x^{[i+1]} \otimes \psi = \nabla_x \otimes \left(\nabla_x^{[i]} \otimes \psi\right), \quad i \ge 0,$$
with $\nabla_x = \left[\frac{\partial}{\partial x_1}\ \frac{\partial}{\partial x_2}\ \cdots\ \frac{\partial}{\partial x_n}\right]$.

3. NSM for PF problems

3.1. Derivation of stochastic differentials for higher central moments

It is well known from Jazwinski (1970) that the conditional mean of the process $x_t$ satisfies
$$d\hat{x}_t = \hat{f}\,dt + \widehat{P^{[1]}h^T}R^{-1}\left(dy - \hat{h}\,dt\right) = \left[\hat{f} + \widehat{P^{[1]}h^T}R^{-1}\left(h - \hat{h}\right)\right]dt + \widehat{P^{[1]}h^T}R^{-1}\,dw_t, \quad (3.8)$$
where $P^{[\alpha]} := \left(x - \hat{x}_-\right)^{[\alpha]}$, for any $\alpha \ge 1$. Thus, it is not hard to get that
$$dP^{[1]} := d\left(x - \hat{x}_-\right) \overset{(2.1),(3.8)}{=} \left[f - \hat{f}_- - \widehat{P^{[1]}h^T}_-R^{-1}\left(h - \hat{h}_-\right)\right]dt + g\,dv_t - \widehat{P^{[1]}h^T}_-R^{-1}\,dw_t. \quad (3.9)$$

Theorem 1. For any $\alpha \ge 1$,
$$\begin{aligned}
dP^{[\alpha]} ={}& U_n^\alpha\left[\left(f - \hat{f}_- - \widehat{P^{[1]}h^T}_-R^{-1}\left(h - \hat{h}_-\right)\right) \otimes P^{[\alpha-1]}\right]dt\\
&+ \frac{1}{2}\,O_n^\alpha\left[\mathrm{st}\left(gQg^T + \widehat{P^{[1]}h^T}_-R^{-1}\,\widehat{h\,P^{[1]T}}_-\right) \otimes P^{[\alpha-2]}\right]dt\\
&+ U_n^\alpha\left(I_n \otimes P^{[\alpha-1]}\right)\left(g\,dv_t - \widehat{P^{[1]}h^T}_-R^{-1}\,dw_t\right), \quad (3.10)
\end{aligned}$$
with the convention that $P^{[\alpha]} = 0$ if $\alpha < 0$ and $P^{[0]} = 1$, where $I_n$ is the $n \times n$ identity matrix, and $U_n^\alpha$ and $O_n^\alpha$, for $\alpha > 1$, are computed recursively as
$$U_n^1 = I_n, \qquad U_n^\alpha = I_{n^\alpha} + C^T_{n^{\alpha-1},n}\left(U_n^{\alpha-1} \otimes I_n\right), \quad (3.11)$$
$$O_n^\alpha = U_n^\alpha\, C^T_{n^{\alpha-1},n}\left[\left(U_n^{\alpha-1}\,C^T_{n^{\alpha-2},n}\right) \otimes I_n\right] C^T_{n^2,\,n^{\alpha-2}}.$$
Proof. Taking $\psi$ in (2.7) to be a function of $P^{[1]}$ yields
$$dP^{[\alpha]} = \left(\nabla_{P^{[1]}} \otimes P^{[\alpha]}\right)dP^{[1]} + \frac{1}{2}\left(\nabla_{P^{[1]}}^{[2]} \otimes P^{[\alpha]}\right)\left(dP^{[1]} \otimes dP^{[1]}\right). \quad (3.12)$$
According to Lemma 1 of Germani et al. (2007), we have
$$\nabla_x \otimes x^{[\alpha]} = U_n^\alpha\left(I_n \otimes x^{[\alpha-1]}\right); \qquad \nabla_x^{[2]} \otimes x^{[\alpha]} = O_n^\alpha\left(I_{n^2} \otimes x^{[\alpha-2]}\right),$$
where $U_n^\alpha$ and $O_n^\alpha$ are given in (3.11); see the proof in the appendix of Germani et al. (2007). Eq. (3.10) follows immediately from (3.9), (3.12) and the fact that
$$dP^{[1]} \otimes dP^{[1]} = \mathrm{st}\left(dP^{[1]} \cdot \left(dP^{[1]}\right)^T\right) \overset{(3.9)}{=} \mathrm{st}\left(gQg^T + \widehat{P^{[1]}h^T}_-R^{-1}\,\widehat{h\,P^{[1]T}}_-\right)dt. \qquad \square$$
3.2. Stochastic differentials and NSM for PF problems
In the sequel, we shall apply Theorem 1 to the polynomial filtering problem (2.1) with $f$, $g$ and $h$ written in the Kronecker product notation:
$$f(x, t) = \sum_{l=0}^{\deg(f)} F_l\,x^{[l]}; \qquad g_{:,i}(x, t) = \sum_{l=0}^{\deg(g)} G_{i,l}\,x^{[l]}; \qquad h(x, t) = \sum_{l=0}^{\deg(h)} H_l\,x^{[l]}, \quad (3.13)$$
where $\deg(f)$, $\deg(g)$ and $\deg(h)$ are the degrees of the polynomials $f$, $g$ and $h$, respectively, and $F_l: \mathbb{R}_+ \to \mathbb{R}^{n \times n^l}$, $G_{i,l}: \mathbb{R}_+ \to \mathbb{R}^{n \times n^l}$ and $H_l: \mathbb{R}_+ \to \mathbb{R}^{m \times n^l}$. For conciseness of notation, we may omit the $t$-dependence of $F_l$, $G_{i,l}$ and $H_l$ in the sequel.

Theorem 2. Consider the system (2.1). Assume further that $Q = \mathrm{diag}(q_i)$, $i = 1, \ldots, p$, and $R = \mathrm{diag}(r_j)$, $j = 1, \ldots, m$. For $\alpha \ge 1$, the stochastic differentials of the central moments $P^{[\alpha]}$ are given by
$$\begin{aligned}
dP^{[\alpha]} ={}& U_n^\alpha\Bigg[\sum_{q=\alpha-1}^{\deg(f)+\alpha-1}\ \sum_{k=0}^{\deg(f)-q+\alpha-1}\left(F_{q+k-\alpha+1}M_k^{q+k-\alpha+1} \otimes I_{n^{\alpha-1}}\right)\left(\hat{x}_-^{[k]} \otimes I_{n^q}\right)P^{[q]}\\
&\qquad - \sum_{l=0}^{\deg(f)}\sum_{k=0}^{l}\left(F_lM_k^l \otimes I_{n^{\alpha-1}}\right)\left(\hat{x}_-^{[k]} \otimes \hat{P}_-^{[l-k]} \otimes I_{n^{\alpha-1}}\right)P^{[\alpha-1]}\Bigg]\,dt\\
&- U_n^\alpha\left(\widehat{P^{[1]}h^T}_-R^{-1} \otimes I_{n^{\alpha-1}}\right)\cdot\Bigg[\sum_{q=\alpha-1}^{\deg(h)+\alpha-1}\ \sum_{k=0}^{\deg(h)-q+\alpha-1}\left(H_{q+k-\alpha+1}M_k^{q+k-\alpha+1} \otimes I_{n^{\alpha-1}}\right)\left(\hat{x}_-^{[k]} \otimes I_{n^q}\right)P^{[q]}\\
&\qquad - \sum_{l=0}^{\deg(h)}\sum_{k=0}^{l}\left(H_lM_k^l \otimes I_{n^{\alpha-1}}\right)\left(\hat{x}_-^{[k]} \otimes \hat{P}_-^{[l-k]} \otimes I_{n^{\alpha-1}}\right)P^{[\alpha-1]}\Bigg]\,dt\\
&+ \frac{1}{2}\,O_n^\alpha\sum_{i=1}^{p}q_i\Bigg[\sum_{s=\alpha-2}^{\deg(g)+\alpha-2}\ \sum_{j=0}^{\deg(g)-s+\alpha-2}\ \sum_{k=0}^{j+s-\alpha+2} + \sum_{s=\alpha-2}^{\deg(g)+\alpha-1}\ \sum_{j=\deg(g)-s+\alpha-1}^{2\deg(g)-s+\alpha-2}\ \sum_{k=j+s-\alpha+2-\deg(g)}^{\deg(g)}\\
&\qquad + \sum_{s=\deg(g)+\alpha}^{2\deg(g)+\alpha-2}\ \sum_{j=0}^{2\deg(g)-s+\alpha-2}\ \sum_{k=j+s-\alpha+2-\deg(g)}^{\deg(g)}\Bigg]\left(\left(G_{i,j+s-\alpha+2-k} \otimes G_{i,k}\right)M_j^{j+s-\alpha+2} \otimes I_{n^{\alpha-2}}\right)\left(\hat{x}_-^{[j]} \otimes I_{n^s}\right)P^{[s]}\,dt\\
&+ \frac{1}{2}\,O_n^\alpha\sum_{i=1}^{m}r_i^{-1}\left(\left(\widehat{P^{[1]}h^T}_-\right)_{:,i}^{[2]} \otimes I_{n^{\alpha-2}}\right)P^{[\alpha-2]}\,dt\\
&+ U_n^\alpha\sum_{i=1}^{p}\Bigg[\sum_{q=\alpha-1}^{\deg(g)+\alpha-1}\ \sum_{j=0}^{\deg(g)-q+\alpha-1}\left(G_{i,q+j-\alpha+1}M_j^{q+j-\alpha+1} \otimes I_{n^{\alpha-1}}\right)\left(\hat{x}_-^{[j]} \otimes I_{n^q}\right)P^{[q]}\Bigg](dv_t)_i\\
&- U_n^\alpha\left(\widehat{P^{[1]}h^T}_-R^{-1}\,dw_t \otimes I_{n^{\alpha-1}}\right)P^{[\alpha-1]}, \quad (3.14)
\end{aligned}$$
with the convention that $P^{[\alpha]} = 0$ if $\alpha < 0$ and $P^{[0]} = 1$, where
$$\widehat{P^{[1]}h^T}_- = \left[\sum_{l=0}^{\deg(h)}\sum_{k=0}^{l}H_lM_k^l\left(\hat{x}_-^{[k]} \otimes \mathrm{st}^{-1}_{n^{l-k}\times n}\,\hat{P}_-^{[l-k+1]}\right)\right]^T,$$
and $(\circ)_{:,i}$ denotes the $i$th column of the matrix $(\circ)$.
Proof. Direct computation of the terms on the right-hand side of (3.8), with $f$, $g$ and $h$ the polynomials (3.13), yields
$$\hat{f}_- = \sum_{l=0}^{\deg(f)} F_l\,\widehat{x^{[l]}}_- \overset{(2.6)}{=} \sum_{l=0}^{\deg(f)}\sum_{k=0}^{l} F_lM_k^l\left(\hat{x}_-^{[k]} \otimes \hat{P}_-^{[l-k]}\right), \quad (3.15)$$
and similarly,
$$\hat{h}_- = \sum_{l=0}^{\deg(h)}\sum_{k=0}^{l} H_lM_k^l\left(\hat{x}_-^{[k]} \otimes \hat{P}_-^{[l-k]}\right). \quad (3.16)$$
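The expansion (3.15) is easy to test in the scalar case $n = 1$, where $M_k^l$ reduces to the binomial coefficient $\binom{l}{k}$ (a sanity check of ours; the coefficients $F_l$ and the sample standing in for the conditional law are arbitrary choices):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(42)
x = rng.normal(1.5, 0.8, size=100_000)   # samples standing in for the conditional law of x_t

F = [0.5, -1.0, 0.0, 2.0]                # f(x) = 0.5 - x + 2x^3, in the form (3.13)
mu = x.mean()                            # x-hat
central = [np.mean((x - mu) ** a) for a in range(len(F))]   # central moments P^[a]; P^[0] = 1

# Scalar (3.15): f-hat = sum_l F_l sum_{k<=l} C(l,k) mu^k P^[l-k]
fhat_315 = sum(F[l] * sum(comb(l, k) * mu ** k * central[l - k] for k in range(l + 1))
               for l in range(len(F)))
fhat_direct = np.mean(sum(F[l] * x ** l for l in range(len(F))))
print(np.isclose(fhat_315, fhat_direct))  # True: a binomial identity, exact up to rounding
```

The agreement is exact (up to floating-point rounding) because, per sample, $x^l = ((x-\mu)+\mu)^l$ is just the binomial theorem; this is precisely what (2.6) encodes in matrix form for $n > 1$.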