Journal of Computational and Applied Mathematics
journal homepage: www.elsevier.com/locate/cam
Solving large-scale continuous-time algebraic Riccati equations by doubling
Tiexiang Li (a), Eric King-wah Chu (b,∗), Wen-Wei Lin (c), Peter Chang-Yi Weng (b)
(a) Department of Mathematics, Southeast University, Nanjing, 211189, People’s Republic of China
(b) School of Mathematical Sciences, Building 28, Monash University, VIC 3800, Australia
(c) Department of Applied Mathematics, National Chiao Tung University, Hsinchu 300, Taiwan
Article info
Article history: Received 21 July 2011; received in revised form 22 May 2012
MSC: 15A24, 65F50, 93C05
Keywords: Continuous-time algebraic Riccati equation; Doubling algorithm; Krylov subspace; Large-scale problem
Abstract
We consider the solution of large-scale algebraic Riccati equations with numerically low-ranked solutions. For the discrete-time case, the structure-preserving doubling algorithm has been adapted, with the iterates for $A$ not explicitly computed but in the recursive form $A_k = A_{k-1}^2 - D_k^{(1)} S_k^{-1} [D_k^{(2)}]^\top$, with $D_k^{(1)}$ and $D_k^{(2)}$ being low-ranked and $S_k^{-1}$ being small in dimension. For the continuous-time case, the algebraic Riccati equation is first treated with the Cayley transform before doubling is applied. With $n$ being the dimension of the algebraic equations, the resulting algorithms are of an efficient $O(n)$ computational complexity per iteration, without the need for any inner iterations, and essentially converge quadratically. Some numerical results will be presented. For instance, Example 3 in Section 5.2, of dimension $n = 20\,209$ with 204 million variables in the solution $X$, was solved using MATLAB on a MacBook Pro within 45 s to a machine accuracy of $O(10^{-16})$.
©2012 Elsevier B.V. All rights reserved.
1. Large-scale algebraic Riccati equations
Let the system matrix $A$ be large and sparse, possibly with band structures. The discrete-time algebraic Riccati equation (DARE)
$$\mathcal{D}(X) \equiv -X + A^\top X (I + GX)^{-1} A + H = 0, \tag{1a}$$
and the continuous-time algebraic Riccati equation (CARE)
$$\mathcal{C}(X) \equiv A^\top X + XA - XGX + H = 0, \tag{1b}$$
with the low-ranked
$$G = BR^{-1}B^\top, \quad H = CT^{-1}C^\top, \tag{1c}$$
where $B \in \mathbb{R}^{n\times m}$, $C \in \mathbb{R}^{n\times l}$ and $m, l \ll n$, arise often in linear–quadratic optimal control problems [1,2]. The solution of CAREs and DAREs has been an extremely active area of research; see, e.g., [3,1,2]. The usual solution methods, such as the Schur vector method, symplectic SR methods, the matrix sign function, the matrix disk function or the doubling method, have not made (full) use of the sparsity and structure in $A$, $G$ and $H$. Requiring in general $O(n^3)$ flops and workspace of size $O(n^2)$, these methods are obviously inappropriate for the large-scale problems we are interested in here.

For control problems for parabolic PDEs and the balancing-based model order reduction of large linear systems, large-scale CAREs and DAREs have to be solved [4–9]. As stated in [10,11], ‘‘the basic observation on which all methods for solving such kinds of matrix equations are based, is that often the (numerical) rank of the solution is very small compared to its actual dimension and therefore it allows for a good approximation via low rank solution factors’’. Importantly, without solving the corresponding algebraic Riccati equations, alternative solutions to the optimal control problem require the deflating subspaces of the corresponding Hamiltonian matrices or (generalized) symplectic pencils, which are prohibitively expensive to compute.

∗ Corresponding author. Tel.: +61 3 99054480; fax: +61 3 99054403.
doi:10.1016/j.cam.2012.06.006
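For small dense test problems, the left-hand sides of (1a) and (1b) can be evaluated directly from their definitions. The following is a minimal NumPy sketch; the helper names `low_rank_product`, `dare_residual` and `care_residual` are hypothetical, and forming $G$ and $H$ densely from the factors in (1c) is only sensible when $n$ is small.

```python
import numpy as np


def low_rank_product(F, Minv):
    """Dense F M^{-1} F^T, i.e. G = B R^{-1} B^T or H = C T^{-1} C^T in (1c)."""
    return F @ Minv @ F.T


def dare_residual(X, A, G, H):
    """D(X) = -X + A^T X (I + G X)^{-1} A + H, the left-hand side of (1a)."""
    n = A.shape[0]
    return -X + A.T @ X @ np.linalg.solve(np.eye(n) + G @ X, A) + H


def care_residual(X, A, G, H):
    """C(X) = A^T X + X A - X G X + H, the left-hand side of (1b)."""
    return A.T @ X + X @ A - X @ G @ X + H
```

For example, with $n = 1$, $A = G = 1$ and $H = 3$, the CARE (1b) reduces to $2X - X^2 + 3 = 0$, whose stabilizing root is $X = 3$, so `care_residual` vanishes there.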
Benner, Fassbender and Saak have done much work on large-scale algebraic Riccati equations; see [10–13] and the references therein. They built their methods on (inexact) Newton’s methods with inner iterations for the associated Lyapunov and Stein equations. We shall adapt the structure-preserving doubling algorithm (SDA) [14–16], making use of the sparsity in $A$ and the low-ranked structures in $G$ and $H$. For other applications of the SDA, see [17].
2. Structure-preserving doubling algorithm for DAREs
We shall abbreviate the discussion for DAREs; please consult [18] for details.
The structure-preserving doubling algorithm (SDA) [15], assuming that $(I + GH)^{-1}$ exists, has the following form:
$$G \leftarrow G + A(I + GH)^{-1}GA^\top, \quad H \leftarrow H + A^\top H(I + GH)^{-1}A, \quad A \leftarrow A(I + GH)^{-1}A. \tag{2}$$
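For a small dense DARE, iteration (2) can be run exactly as written. The sketch below assumes the usual conditions under which $H_k \to X$ and $G_k \to Y$ (for instance, a stable $A$ with positive semidefinite $G$ and $H$); the function name `sda_dense`, the relative stopping rule on successive $H$ iterates and the default tolerance are our own choices, not part of the original method description.

```python
import numpy as np


def sda_dense(A, G, H, tol=1e-13, maxit=60):
    """Dense SDA (2) for the DARE (1a); returns the approximations (H_k, G_k) to (X, Y)."""
    n = A.shape[0]
    I = np.eye(n)
    for _ in range(maxit):
        # One linear solve gives both (I + GH)^{-1} A and (I + GH)^{-1} G.
        W = np.linalg.solve(I + G @ H, np.hstack([A, G]))
        WA, WG = W[:, :n], W[:, n:]
        G_new = G + A @ WG @ A.T          # G <- G + A (I+GH)^{-1} G A^T
        H_new = H + A.T @ H @ WA          # H <- H + A^T H (I+GH)^{-1} A
        A = A @ WA                        # A <- A (I+GH)^{-1} A  (uses the old A)
        done = np.linalg.norm(H_new - H) <= tol * np.linalg.norm(H_new)
        G, H = G_new, H_new
        if done:
            break
    return H, G
```

Every step costs $O(n^3)$ flops, which is exactly what the large-scale reorganization avoids.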
We shall apply the Sherman–Morrison–Woodbury formula (SMWF) to $(I + GH)^{-1}$ and make use of the low-ranked forms of $G$ and $H$ in (1c).

2.1. Large-scale SDA
At first glance, the iteration for $A$ in the SDA in (2) appears doomed, with $O(n^3)$ operations for the products of full matrices. However, with the low-rank forms in (1c), we shall organize the SDA into the form (for $k = 1, 2, \ldots$)
$$A_k = A_{k-1}^2 - D_k^{(1)} S_k^{-1} D_k^{(2)\top}, \quad G_k = B_kR_k^{-1}B_k^\top, \quad H_k = C_kT_k^{-1}C_k^\top. \tag{3}$$
The application of the SMWF to $(I_n + G_kH_k)^{-1}$ yields
$$A_{k+1} = A_k(I_n + G_kH_k)^{-1}A_k = A_k\left[I_n - G_kC_kT_k^{-1}(I_{l_k} + C_k^\top G_kC_kT_k^{-1})^{-1}C_k^\top\right]A_k = A_k\left[I_n - B_k(I_{m_k} + R_k^{-1}B_k^\top H_kB_k)^{-1}R_k^{-1}B_k^\top H_k\right]A_k,$$
where $C_k$ and $B_k$ have respectively $l_k$ and $m_k$ columns. It is more convenient to work with $S_k^{-1}$, $R_k^{-1}$ and $T_k^{-1}$ directly; we retain the inverse notation only for historical reasons, and no actual inversion is involved.
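Both SMWF forms can be applied to a block of vectors without ever forming $G_k$, $H_k$ or an $n\times n$ inverse; only an $l_k\times l_k$ or an $m_k\times m_k$ system is solved. A minimal NumPy sketch follows; the helper names `inv_apply_l` and `inv_apply_m` are hypothetical, and the symmetry of $H$ is assumed in the second form.

```python
import numpy as np


def inv_apply_l(B, Rinv, C, Tinv, V):
    """(I + G H)^{-1} V via the l x l form: V - G C T^{-1} (I_l + C^T G C T^{-1})^{-1} C^T V."""
    GC = B @ (Rinv @ (B.T @ C))                      # G C, an n x l block
    small = np.eye(C.shape[1]) + C.T @ GC @ Tinv     # I_l + C^T G C T^{-1}
    return V - GC @ (Tinv @ np.linalg.solve(small, C.T @ V))


def inv_apply_m(B, Rinv, C, Tinv, V):
    """(I + G H)^{-1} V via the m x m form: V - B (I_m + R^{-1} B^T H B)^{-1} R^{-1} B^T H V."""
    HB = C @ (Tinv @ (C.T @ B))                      # H B, an n x m block
    small = np.eye(B.shape[1]) + Rinv @ (B.T @ HB)   # I_m + R^{-1} B^T H B
    return V - B @ np.linalg.solve(small, Rinv @ (HB.T @ V))   # B^T H = (H B)^T
```

On a small example, both results agree with a dense `np.linalg.solve(I + G @ H, V)`.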
Consequently, with $C_k \in \mathbb{R}^{n\times l_k}$ and $B_k \in \mathbb{R}^{n\times m_k}$, we have
$$A_{k+1} = A_k^2 - D_{k+1}^{(1)} S_{k+1}^{-1} D_{k+1}^{(2)\top}, \tag{4}$$
with the update of ‘‘size’’ $l_k$ defined by
$$D_{k+1}^{(1)} = A_kG_kC_k, \quad D_{k+1}^{(2)} = A_k^\top C_k, \quad S_{k+1}^{-1} = T_k^{-1}(I_{l_k} + C_k^\top G_kC_kT_k^{-1})^{-1} \in \mathbb{R}^{l_k\times l_k}, \tag{5a}$$
or the update of ‘‘size’’ $m_k$ defined by
$$D_{k+1}^{(1)} = A_kB_k, \quad D_{k+1}^{(2)} = A_k^\top H_kB_k, \quad S_{k+1}^{-1} = (I_{m_k} + R_k^{-1}B_k^\top H_kB_k)^{-1}R_k^{-1} \in \mathbb{R}^{m_k\times m_k}, \tag{5b}$$
all involving $O(n^3)$ operations for a dense $A$. The operation counts will be reduced to $O(n)$ under the assumption that the maximum number of nonzero components in any row or column of $A$ is much less than $n$ (see Table 2 in Section 4.2). The trick is not to form $A_k$ explicitly. Note that we have to store all the $B_i$, $C_i$, $R_i^{-1}$ and $T_i^{-1}$ for $i = 0, 1, \ldots, k-1$ to facilitate the multiplication of low-ranked matrices by $A_k$ or $A_k^\top$.

We may choose between (5a) and (5b) based on the sizes $l_k$ and $m_k$. Apart from a small saving in the inversion of smaller matrices, this choice is irrelevant here, because the compression and truncation in the next section produce leaner $B_k$ and $C_k$. However, this choice may be important when $G$ or $H$ are not low-ranked.
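The recursive use of (4) can be sketched as follows: a product $A_kV$ is evaluated as $A_{k-1}(A_{k-1}V) - D_k^{(1)}S_k^{-1}(D_k^{(2)\top}V)$, so only $A$ itself and the small factors are ever touched. In this sketch (hypothetical name `apply_Ak`), `levels[k-1]` is assumed to hold the triple $(D_k^{(1)}, S_k^{-1}, D_k^{(2)})$, a storage layout of our choosing, and `A` may be any matrix object supporting `@` and `.T`, such as a SciPy sparse matrix. Each level doubles the number of products with $A$.

```python
import numpy as np


def apply_Ak(k, V, A, levels, transpose=False):
    """Return A_k V (or A_k^T V) using A_k = A_{k-1}^2 - D1 S D2^T recursively, as in (4).

    levels[k-1] = (D1, S, D2) = (D_k^(1), S_k^{-1}, D_k^(2)); A_0 = A is never squared explicitly.
    """
    if k == 0:
        return A.T @ V if transpose else A @ V
    D1, S, D2 = levels[k - 1]
    W = apply_Ak(k - 1, apply_Ak(k - 1, V, A, levels, transpose), A, levels, transpose)
    if transpose:                          # A_k^T = (A_{k-1}^T)^2 - D2 S^T D1^T
        return W - D2 @ (S.T @ (D1.T @ V))
    return W - D1 @ (S @ (D2.T @ V))
```

A quick check is to build $A_k$ densely from the same recurrence for a small $n$ and compare the two products.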
The induction proof of the general form of $A_k$ in (4)–(5b) can be completed by considering the initial case $k = 1$, which is trivial. For $B_k$, $C_k$ and $R_k$, applying the SMWF to $(I + G_kH_k)^{-1}$ in the SDA, we have
$$G_{k+1} = G_k + A_kG_kA_k^\top - A_kG_kC_kT_k^{-1}(I_{l_k} + C_k^\top G_kC_kT_k^{-1})^{-1}C_k^\top G_kA_k^\top$$
$$= G_k + A_kG_kA_k^\top - A_kB_k(I_{m_k} + R_k^{-1}B_k^\top H_kB_k)^{-1}R_k^{-1}B_k^\top H_kG_kA_k^\top, \tag{6}$$
and
$$H_{k+1} = H_k + A_k^\top H_kA_k - A_k^\top H_kG_kC_kT_k^{-1}(I_{l_k} + C_k^\top G_kC_kT_k^{-1})^{-1}C_k^\top A_k$$
$$= H_k + A_k^\top H_kA_k - A_k^\top H_kB_k(I_{m_k} + R_k^{-1}B_k^\top H_kB_k)^{-1}R_k^{-1}B_k^\top H_kA_k. \tag{7}$$
These imply that
$$B_{k+1} = [B_k, A_kB_k], \quad C_{k+1} = [C_k, A_k^\top C_k], \tag{8}$$
$$R_{k+1}^{-1} = R_k^{-1} \oplus \left[R_k^{-1} - R_k^{-1}B_k^\top C_kT_k^{-1}(I_{l_k} + C_k^\top G_kC_kT_k^{-1})^{-1}C_k^\top B_kR_k^{-1}\right] \tag{9a}$$
$$= R_k^{-1} \oplus \left[R_k^{-1} - (I_{m_k} + R_k^{-1}B_k^\top H_kB_k)^{-1}R_k^{-1}B_k^\top H_kB_kR_k^{-1}\right], \tag{9b}$$
$$T_{k+1}^{-1} = T_k^{-1} \oplus \left[T_k^{-1} - T_k^{-1}C_k^\top G_kC_kT_k^{-1}(I_{l_k} + C_k^\top G_kC_kT_k^{-1})^{-1}\right] \tag{10a}$$
$$= T_k^{-1} \oplus \left[T_k^{-1} - T_k^{-1}C_k^\top B_k(I_{m_k} + R_k^{-1}B_k^\top H_kB_k)^{-1}R_k^{-1}B_k^\top C_kT_k^{-1}\right], \tag{10b}$$
with the initial values
$$A_0 = A, \quad B_0 = B, \quad C_0 = C, \quad R_0 = R, \quad T_0 = T. \tag{11}$$
We have shown that the SDA can be organized into the form (3). The existence of $R_k^{-1}$, $T_k^{-1}$ and $(I_n + G_kH_k)^{-1}$ guarantees the existence of the other inverses in (9a)–(10b). Note that $R_k^{-1}$, $S_k^{-1}$ and $T_k^{-1}$ are symmetric for all $k$. Again, the choice in (9a)–(10b) may be relevant when $G$ or $H$ are not low-ranked.

For well-behaved DAREs [14,15], we have $H_k = C_kT_k^{-1}C_k^\top \to X$ and $G_k = B_kR_k^{-1}B_k^\top \to Y$ (the solution of the dual DARE) as $k \to \infty$. Note that $X$ and $Y$ have been observed to be numerically low-ranked. Under suitable assumptions [14,15], the convergence of the SDA implies $A_k = O(|\lambda|^{2^k}) \to 0$ for some $|\lambda| < 1$. Together with (8)–(10b), we see that $B_{k+1}$ and $C_{k+1}$ are obtained, respectively, by appending to $B_k$ and $C_k$ the diminishing components $A_kB_k$ and $A_k^\top C_k$. This explains the observed low numerical ranks of $X$ and $Y$.

2.2. Compression and truncation of $B_k$ and $C_k$
Now we shall consider an important aspect of the SDA for large-scale DAREs (SDA_ls): the growth of $B_k$ and $C_k$. As the SDA converges, increasingly smaller components are added to $B_k$ and $C_k$. As is apparent from (8), the sizes and ranks of these iterates can potentially grow exponentially. Let the computational complexity of the SDA_ls be $O(n) = \alpha n + O(1)$. If the convergence is slow relative to the growth in $B_k$ and $C_k$, the algorithm will fail, with $\alpha$ growing exponentially (see Table 2 in Section 4.2). In such cases, $X$ is no longer numerically low-ranked with respect to the given truncation tolerances ($\tau_1$ and $\tau_2$ in (12) and (13)). It will then be extremely challenging, by any method, to approximate $X$ to high accuracy in $O(n)$ computational complexity. One possibility is to accept lower-accuracy approximations to $X$ with a higher truncation tolerance, thus lowering the corresponding numerical rank of $X$.

To reduce the dimensions of $B_k$, $C_k$, $D_k^{(1)}$ and $D_k^{(2)}$, we shall compress their columns by orthogonalization. Consider the QR decompositions with column pivoting
$$B_k = Q_{1k}M_{1k} + \widetilde{Q}_{1k}\widetilde{M}_{1k}, \quad C_k = Q_{2k}M_{2k} + \widetilde{Q}_{2k}\widetilde{M}_{2k}, \quad \|\widetilde{M}_{1k}\| \le \tau_1, \quad \|\widetilde{M}_{2k}\| \le \tau_2,$$
where $\tau_i$ ($i = 1, 2$) are small tolerances controlling the compression and truncation process. Here $m_k$ and $l_k$ are respectively the numbers of columns in $B_k$ and $C_k$, bounded from above by $m_{\max}$ and $l_{\max}$, with
$$r_{1k} = \operatorname{rank} B_k \le m_k \le m_{\max} \ll n, \quad r_{2k} = \operatorname{rank} C_k \le l_k \le l_{\max} \ll n,$$
and, for $i = 1, 2$, the matrices $Q_{ik} \in \mathbb{R}^{n\times r_{ik}}$ have orthonormal columns, while $M_{1k} \in \mathbb{R}^{r_{1k}\times m_k}$ and $M_{2k} \in \mathbb{R}^{r_{2k}\times l_k}$ are full-ranked and upper triangular (up to the column permutation). We then have
$$B_kR_k^{-1}B_k^\top = Q_{1k}M_{1k}R_k^{-1}M_{1k}^\top Q_{1k}^\top + O(\tau_1), \tag{12}$$
$$C_kT_k^{-1}C_k^\top = Q_{2k}M_{2k}T_k^{-1}M_{2k}^\top Q_{2k}^\top + O(\tau_2), \tag{13}$$
and we replace $B_k$ and $R_k^{-1}$ (or $C_k$ and $T_k^{-1}$) respectively by the leaner $Q_{1k}$ and $M_{1k}R_k^{-1}M_{1k}^\top$ (or $Q_{2k}$ and $M_{2k}T_k^{-1}M_{2k}^\top$).
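A truncated QR factorization with column pivoting, as used in (12), can be sketched with a modified Gram–Schmidt loop that always picks the residual column of largest norm and stops once every remaining column norm is at most $\tau$. This is a simple stand-in of our own, not a LAPACK-quality rank-revealing QR; the name `compress` is hypothetical, and the returned pair plays the role of $Q_{1k}$ and $M_{1k}R_k^{-1}M_{1k}^\top$.

```python
import numpy as np


def compress(B, Rinv, tau):
    """Truncated pivoted QR: B ~ Q M with orthonormal Q; returns (Q, M R^{-1} M^T) as in (12)."""
    W = np.array(B, dtype=float)           # residual; B = Q M + W holds throughout
    n, m = W.shape
    Q = np.zeros((n, 0))
    M = np.zeros((0, m))
    while Q.shape[1] < min(n, m):
        norms = np.linalg.norm(W, axis=0)
        j = int(np.argmax(norms))          # pivot: residual column of largest norm
        if norms[j] <= tau:                # all remaining columns are O(tau): truncate
            break
        q = W[:, j] - Q @ (Q.T @ W[:, j])  # re-orthogonalize against the current Q
        q /= np.linalg.norm(q)
        row = q @ W                        # new row of M
        W -= np.outer(q, row)
        Q = np.hstack([Q, q[:, None]])
        M = np.vstack([M, row])
    return Q, M @ Rinv @ M.T
```

For a matrix of numerical rank 3 plus noise far below $\tau$, the sketch keeps three orthonormal columns and reproduces $BR^{-1}B^\top$ up to a small multiple of the noise level.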
We may skip compressing and truncating $D_k^{(1)}$ and $D_k^{(2)}$ once $B_k$ and $C_k$ have been compressed and truncated. As a result, we ignore the $O(\tau_i)$ terms and control the growth of $r_{ik}$ while sacrificing a hopefully negligible bit of accuracy.

Interestingly, we need only $R$, $T$ and $I + G_kH_k$ to be invertible (which implies the invertibility of $R_k$ and $T_k$ for all $k$), opening up the possibility of dealing with DAREs with indefinite $R$s and $T$s [19]. Eqs. (4) (used recursively but not explicitly), (5a) (or (5b)), (8), (9a) (or (9b)), (10a) (or (10b)), (12) and (13), together with the corresponding initial values in (11), constitute the SDA_ls.
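The factored updates can be checked against the dense SDA (2) on a small example. The sketch below (hypothetical name `sda_ls_step`) performs one step with the choices (5a), (8), (9a) and (10a). To keep the check simple, it takes a dense $A$ and forms $A_{k+1}$ explicitly, which the SDA_ls itself avoids, and it omits the compression (12)–(13).

```python
import numpy as np


def _blkdiag(P, Q):
    """Direct sum P (+) Q of two small matrices."""
    Z = np.zeros((P.shape[0], Q.shape[1]))
    return np.block([[P, Z], [Z.T, Q]])


def sda_ls_step(A, B, Rinv, C, Tinv):
    """One factored SDA step via (5a), (8), (9a), (10a); A is dense here only for checking."""
    CtB = C.T @ B
    M = CtB @ Rinv @ CtB.T                                   # C_k^T G_k C_k
    S = Tinv @ np.linalg.inv(np.eye(len(M)) + M @ Tinv)      # S_{k+1}^{-1}, (5a)
    AB, AtC = A @ B, A.T @ C
    R2 = Rinv - Rinv @ CtB.T @ S @ CtB @ Rinv                # new block of (9a)
    T2 = Tinv - Tinv @ M @ S                                 # new block of (10a)
    D1 = AB @ (Rinv @ CtB.T)                                 # D^(1) = A_k G_k C_k
    A_next = A @ A - D1 @ S @ AtC.T                          # (4); formed only for the check
    return A_next, np.hstack([B, AB]), _blkdiag(Rinv, R2), np.hstack([C, AtC]), _blkdiag(Tinv, T2)
```

The returned factors satisfy $B_{k+1}R_{k+1}^{-1}B_{k+1}^\top = G_{k+1}$ and $C_{k+1}T_{k+1}^{-1}C_{k+1}^\top = H_{k+1}$ from (2), up to rounding.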
2.3. SDA and Krylov subspaces
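There is an interesting relationship between the SDA_ls and Krylov subspaces. Define the Krylov subspaces
$$\mathcal{K}_k(A, B) \equiv \begin{cases} \operatorname{span}\{B\} & (k = 0), \\ \operatorname{span}\{B, AB, A^2B, \ldots, A^{2^k-1}B\} & (k > 0). \end{cases}$$
From (4) and (8), we can see that $B_0 = B \in \mathcal{K}_0(A, B)$, $B_1 = [B, AB] \in \mathcal{K}_1(A, B)$ and, for some low-ranked $F$,
$$B_2 = [B_1, A_1B_1] = \left[B, AB, (A^2 - ABF^\top)[B, AB]\right] \in \mathcal{K}_2(A, B).$$
(We have abused notation, with $V \in \mathcal{K}_k(A, B)$ meaning $\operatorname{span}\{V\} \subseteq \mathcal{K}_k(A, B)$.) Similarly, it is easy to show that
$$B_k \in \mathcal{K}_k(A, B), \quad C_k \in \mathcal{K}_k(A^\top, C).$$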
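The inclusion $B_2 \in \mathcal{K}_2(A, B)$ can be observed numerically: after two steps, the columns of $B_2$ should lie in the span of $B$, $AB$, $A^2B$ and $A^3B$ up to rounding error. The following small NumPy check is our own (hypothetical name `krylov_gap`), assuming $R = T = I$ for simplicity and no compression.

```python
import numpy as np


def krylov_gap(A, B, C):
    """Relative distance of B_2 (two SDA steps, via (2) and (8)) from K_2(A, B)."""
    n = A.shape[0]
    G, H = B @ B.T, C @ C.T                                # R = I, T = I
    A1 = A @ np.linalg.solve(np.eye(n) + G @ H, A)         # A_1 from (2), with A_0 = A
    B1 = np.hstack([B, A @ B])                             # (8)
    B2 = np.hstack([B1, A1 @ B1])                          # (8) again
    K2 = np.hstack([B, A @ B, A @ A @ B, A @ A @ A @ B])   # spans K_2(A, B)
    Q, _ = np.linalg.qr(K2)
    return np.linalg.norm(B2 - Q @ (Q.T @ B2)) / np.linalg.norm(B2)
```

For a random $A$ with one-column $B$ and $C$, the returned gap is at the level of rounding error, although $B_2$ was built from $A_1$ rather than from powers of $A$.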
In other words, the general SDA is closely related to approximating the solutions $X$ and $Y$ using Krylov subspaces, with additional components vanishing quadratically. However, for problems of small size $n$, $B_k$ and $C_k$ become full-ranked after a few iterations.

The Krylov subspaces $\mathcal{K}_k(A, B)$ play a vital part in the fast convergence of the SDA, which comes from two sources. Apart from the diminishing $A_k$ in the updating of $G$ and $H$ in (2), the approximation power of the corresponding Krylov subspaces also contributes, creating cancellations in $G_{k+1}$ and $H_{k+1}$ in (6) and (7). This phenomenon has been confirmed in some extreme examples, with some eigenvalue $\lambda$ of the symplectic matrix pencil associated with the DARE nearly on the unit circle [16]. In such cases, the SDA requires significantly fewer iterations than the number predicted purely from $\lambda$.

2.4. Errors of SDA_ls
The errors of the SDA_ls can be analysed by interpreting it as a Galerkin method, or directly from (2). With
$$\delta_k \equiv \max\{\|\delta G_k\|, \|\delta H_k\|, \|\delta A_k\|\},$$
where $\delta G_k$, $\delta H_k$ and $\delta A_k$ are respectively the truncation/round-off errors in $G_k$, $H_k$ and $A_k$, we can show that
$$\delta_{k+1} \le (1 + c_k)\delta_k + O(\delta_k^2), \tag{14}$$
with $c_k \to 0$ as $k \to \infty$. A more detailed discussion can be found in [18, Section 2.5]. Essentially, we limit the rank of the approximation to $X$, trading off the accuracy in $X$ against the efficiency of the SDA_ls. Assume that the compression and truncation in (12) and (13) create errors of $O(\tau_i)$ ($i = 1, 2$) in $G_k$ and $H_k$, respectively. It is easy to see from (14) that errors of the same magnitude will propagate through to $A_{k+1}$, $G_{k+1}$ and $H_{k+1}$. The fact that $A_k \to 0$ implies $c_k \to 0$ and contributes towards diminishing these errors.

From our numerical experience, the trade-off between the ranks of $G_k$ and $H_k$ and the accuracy of the approximate solutions to $X$ and $Y$ is the key to the success of our computation. If these ranks grow out of control, unnecessary and insignificant small additions to the iterates overwhelm the computation in terms of flop counts and memory requirements. Limiting the ranks will obviously reduce the accuracy of the approximate solution. We found that we do not have to experiment much with the tolerances for the compression/truncation and convergence while trying to achieve a balance between accuracy and the feasibility/efficiency of the SDA.

Table 1
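Krylov subspaces for the solution $X$ and the adjoint solution $Y$.
Equation | $X$ | $Y$
DARE, Stein equation | $\mathcal{K}_k(A^\top, C)$ | $\mathcal{K}_k(A, B)$
CARE, Lyapunov equation | $\mathcal{K}_k(A_\gamma^{-\top}, A_\gamma^{-\top}C)$ | $\mathcal{K}_k(A_\gamma^{-1}, A_\gamma^{-1}B)$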
3. CAREs
One possible approach for large-scale CAREs is to transform them to DAREs using Cayley transforms.
3.1. SDA after Cayley transform
From [14], the matrices $A$, $G$ and $H$ in the CARE (1b) are first treated with the Cayley transform:
$$A_0 = I + 2\gamma (A_\gamma + GA_\gamma^{-\top}H)^{-1}, \tag{15}$$
$$G_0 = 2\gamma A_\gamma^{-1}G(A_\gamma^\top + HA_\gamma^{-1}G)^{-1}, \tag{16}$$
$$H_0 = 2\gamma (A_\gamma^\top + HA_\gamma^{-1}G)^{-1}HA_\gamma^{-1}, \tag{17}$$
with $A_\gamma \equiv A - \gamma I$ and a suitable $\gamma > 0$ chosen to optimize the condition of various matrix inversions. A simple application of the SMWF implies
$$(A_\gamma + GA_\gamma^{-\top}H)^{-1} = A_\gamma^{-1} - A_\gamma^{-1}GA_\gamma^{-\top}C \cdot T^{-1}(I_l + C^\top A_\gamma^{-1}GA_\gamma^{-\top}CT^{-1})^{-1} \cdot C^\top A_\gamma^{-1} \tag{18a}$$
$$= A_\gamma^{-1} - A_\gamma^{-1}B \cdot (I_m + R^{-1}B^\top A_\gamma^{-\top}HA_\gamma^{-1}B)^{-1}R^{-1} \cdot B^\top A_\gamma^{-\top}HA_\gamma^{-1}. \tag{18b}$$
It is not hard to see, with the above initial $A_0$, $G_0$ and $H_0$, that the SDA_ls still works, again with exactly the same forms and updating formulae for $A_k$, $B_k$, $C_k$, $D_k^{(1)}$, $D_k^{(2)}$ and the inverses of $R_k$, $S_k$ and $T_k$. One relevant difference for CAREs is that $A_0 \ne A$ but satisfies, from (15), (18a) and (18b),
$$A_0 = I_n + 2\gamma A_\gamma^{-1} - D_0^{(1)}S_0^{-1}D_0^{(2)\top} \tag{19}$$
with
$$B_0 = A_\gamma^{-1}B, \quad C_0 = A_\gamma^{-\top}C. \tag{20}$$
The corresponding size-$l$ and size-$m$ perturbed updates have the forms, respectively,
$$D_0^{(2)} = C_0, \quad D_0^{(1)} = A_\gamma^{-1}GC_0, \quad S_0^{-1} = 2\gamma (I_l + T^{-1}C_0^\top GC_0)^{-1}T^{-1}; \tag{21a}$$
$$D_0^{(1)} = B_0, \quad D_0^{(2)} = A_\gamma^{-\top}HB_0, \quad S_0^{-1} = 2\gamma (I_m + R^{-1}B_0^\top HB_0)^{-1}R^{-1}. \tag{21b}$$
Note that all computations can be realized in $O(n)$ operations, assuming that the operations $A_\gamma^{-1}B$ and $A_\gamma^{-\top}C$ are achievable in $O(n)$ flops; see [20, Section 9.1] for a banded $A$. Similarly, we have
$$R_0^{-1} = 2\gamma\left[R^{-1} - R^{-1}B^\top C_0 \cdot (I_l + T^{-1}C_0^\top GC_0)^{-1}T^{-1} \cdot C_0^\top BR^{-1}\right] \tag{22a}$$
$$= 2\gamma\left[R^{-1} - R^{-1}B_0^\top HB_0(I_m + R^{-1}B_0^\top HB_0)^{-1}R^{-1}\right], \tag{22b}$$
and
$$T_0^{-1} = 2\gamma\left[T^{-1} - T^{-1}(I_l + C_0^\top GC_0T^{-1})^{-1}C_0^\top GC_0T^{-1}\right] \tag{23a}$$
$$= 2\gamma\left[T^{-1} - T^{-1}C^\top B_0 \cdot R^{-1}(I_m + B_0^\top HB_0R^{-1})^{-1} \cdot B_0^\top CT^{-1}\right]. \tag{23b}$$
For CAREs, we have
$$B_k \in \mathcal{K}_k(A_\gamma^{-1}, A_\gamma^{-1}B), \quad C_k \in \mathcal{K}_k(A_\gamma^{-\top}, A_\gamma^{-\top}C). \tag{24}$$
Note that the Krylov subspaces $\mathcal{K}_k(A^{\pm 1}, B)$ and $\mathcal{K}_k(A^{\pm\top}, C)$ have been used in the solution of CAREs and Lyapunov equations in [21–26]; they are quite different from the subspaces associated with the SDA here. This difference may explain the superiority of our methods. From (24) and [18,27], we can see clearly the appropriate choices of Krylov subspaces for DAREs and CAREs, as well as for the corresponding Stein and Lyapunov equations. A summary is contained in Table 1.

We summarize the algorithm below, with the particular choice of (4), (5a), (8), (9a), (10b), (12) and (13). We would like to emphasize that care has to be exercised in Algorithm 1 below, with the multiplications by $A_{k+1}$ and $A_{k+1}^\top$ carried out recursively using (4) and (5a) or (5b). Otherwise, the computations cannot be carried out in $O(n)$ complexity. Similar care has to be taken in the computation of residuals (used in Algorithm 1 below) or differences of iterates (as an alternative convergence control), as discussed in Section 4.2 later.

Algorithm 1 (SDA_ls)
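Input: $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$, $R^{-1} = R^{-\top} \in \mathbb{R}^{m\times m}$, $C \in \mathbb{R}^{n\times l}$, $T^{-1} = T^{-\top} \in \mathbb{R}^{l\times l}$, shift $\gamma > 0$, positive tolerances $\tau_1$, $\tau_2$ and $\epsilon$, and $m_{\max}$, $l_{\max}$;
Output: $B_\epsilon \in \mathbb{R}^{n\times m_\epsilon}$, $R_\epsilon^{-1} = R_\epsilon^{-\top} \in \mathbb{R}^{m_\epsilon\times m_\epsilon}$, $C_\epsilon \in \mathbb{R}^{n\times l_\epsilon}$ and $T_\epsilon^{-1} = T_\epsilon^{-\top} \in \mathbb{R}^{l_\epsilon\times l_\epsilon}$, with $C_\epsilon T_\epsilon^{-1}C_\epsilon^\top$ and $B_\epsilon R_\epsilon^{-1}B_\epsilon^\top$ approximating, respectively, the solutions $X$ and $Y$ to the large-scale CARE (1b) and its adjoint;
Compute $A_\gamma = A - \gamma I$;
Set $k = 0$, $r_0 = 2\epsilon$;
    $B_0 = A_\gamma^{-1}B$, $C_0 = A_\gamma^{-\top}C$;
    $R_0^{-1} = 2\gamma\left[R^{-1} - R^{-1}B^\top C_0 \cdot (I_l + T^{-1}C_0^\top GC_0)^{-1}T^{-1} \cdot C_0^\top BR^{-1}\right]$,
    $T_0^{-1} = 2\gamma\left[T^{-1} - T^{-1}C^\top B_0 \cdot R^{-1}(I_m + B_0^\top HB_0R^{-1})^{-1} \cdot B_0^\top CT^{-1}\right]$;
    $D_0^{(2)} = C_0$, $D_0^{(1)} = A_\gamma^{-1}GC_0$, $S_0^{-1} = 2\gamma (I_l + T^{-1}C_0^\top GC_0)^{-1}T^{-1}$,
    $A_0 = I_n + 2\gamma A_\gamma^{-1} - D_0^{(1)}S_0^{-1}D_0^{(2)\top}$;
Compute $h = \|H_0\| = \|C_0T_0^{-1}C_0^\top\|$;
Do until convergence:
    If the relative residual $\tilde{r}_k = |d_k/(h_k + m_k + h)| < \epsilon$,
        Set $B_\epsilon = B_k$, $R_\epsilon^{-1} = R_k^{-1}$, $C_\epsilon = C_k$ and $T_\epsilon^{-1} = T_k^{-1}$; Exit
    End If
    Compute $B_{k+1} = [B_k, A_kB_k]$, $C_{k+1} = [C_k, A_k^\top C_k]$;
        $R_{k+1}^{-1} = R_k^{-1} \oplus \left[R_k^{-1} - R_k^{-1}B_k^\top C_kT_k^{-1}(I_{l_k} + C_k^\top G_kC_kT_k^{-1})^{-1}C_k^\top B_kR_k^{-1}\right]$,
        $T_{k+1}^{-1} = T_k^{-1} \oplus \left[T_k^{-1} - T_k^{-1}C_k^\top B_k(I_{m_k} + R_k^{-1}B_k^\top H_kB_k)^{-1}R_k^{-1}B_k^\top C_kT_k^{-1}\right]$;
        with $A_{k+1} = A_k^2 - D_{k+1}^{(1)}S_{k+1}^{-1}D_{k+1}^{(2)\top}$, $D_{k+1}^{(1)} = A_kG_kC_k$, $D_{k+1}^{(2)} = A_k^\top C_k$, $S_{k+1}^{-1} = T_k^{-1}(I_{l_k} + C_k^\top G_kC_kT_k^{-1})^{-1}$;
    Compress $B_{k+1}$ and $C_{k+1}$, using the tolerances $\tau_1$ and $\tau_2$, and modify $R_{k+1}^{-1}$ and $T_{k+1}^{-1}$, as in (12) and (13);
    Compute $k \leftarrow k + 1$, $d_k = \|\mathcal{D}(H_k)\|$, $h_k = \|H_k\|$ and $m_k = \|M_k\|$, as in Section 4.2;
End Do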
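Algorithm 1 can be sketched in NumPy for small test problems. The sketch rests on several assumptions of ours, and the names `sda_ls_care`, `_compress` and `_blkdiag` are hypothetical:
- $A_\gamma$ is handled by dense `np.linalg.solve` instead of a sparse or banded factorization.
- $R_0^{-1}$ and $T_0^{-1}$ use (22a) and (23a).
- The loop uses (5a), (9a) and (10a) rather than (10b).
- The pivoted-QR compression (12)–(13) is replaced by an eigenvalue truncation of the small core matrix.
- Convergence is judged by the low-rank norm of $H_{k+1} - H_k$ (the difference-of-iterates alternative mentioned above) instead of $\tilde r_k$.

```python
import numpy as np


def _blkdiag(P, Q):
    """Direct sum P (+) Q of two small matrices."""
    Z = np.zeros((P.shape[0], Q.shape[1]))
    return np.block([[P, Z], [Z.T, Q]])


def _compress(W, N, tau):
    """Rewrite W N W^T as U diag(lam) U^T (U orthonormal), dropping |lam| <= tau*max|lam|.

    Eigen-truncation of the small core: a stand-in for the pivoted QR of (12)-(13).
    """
    U, Rw = np.linalg.qr(W)
    core = Rw @ N @ Rw.T
    lam, V = np.linalg.eigh(0.5 * (core + core.T))
    keep = np.abs(lam) > tau * np.abs(lam).max()
    return U @ V[:, keep], np.diag(lam[keep])


def sda_ls_care(A, B, Rinv, C, Tinv, gamma, tol=1e-12, tau=1e-12, maxit=20):
    """Low-rank SDA for A^T X + X A - X G X + H = 0, G = B R^{-1} B^T, H = C T^{-1} C^T.

    Returns (C_k, T_k^{-1}, B_k, R_k^{-1}) with X ~ C_k T_k^{-1} C_k^T, Y ~ B_k R_k^{-1} B_k^T.
    """
    n = A.shape[0]
    Ag = A - gamma * np.eye(n)                 # A_gamma; dense solves stand in for sparse LU
    B0 = np.linalg.solve(Ag, B)                # (20)
    C0 = np.linalg.solve(Ag.T, C)
    CtB = C0.T @ B
    M0 = CtB @ Rinv @ CtB.T                    # C_0^T G C_0
    S0 = 2 * gamma * np.linalg.inv(np.eye(len(M0)) + Tinv @ M0) @ Tinv   # S_0^{-1}, (21a)
    D01 = np.linalg.solve(Ag, B @ (Rinv @ CtB.T))                        # A_gamma^{-1} G C_0
    R0 = 2 * gamma * Rinv - Rinv @ CtB.T @ S0 @ CtB @ Rinv               # (22a)
    T0 = 2 * gamma * Tinv - S0 @ M0 @ Tinv                               # (23a)
    levels = []                                # levels[j] = (D1, S, D2) defining A_{j+1} via (4)

    def apply(k, V, trans=False):
        """A_k V (or A_k^T V) without forming A_k."""
        if k == 0:                             # A_0 from (19)
            if trans:
                return V + 2 * gamma * np.linalg.solve(Ag.T, V) - C0 @ (S0.T @ (D01.T @ V))
            return V + 2 * gamma * np.linalg.solve(Ag, V) - D01 @ (S0 @ (C0.T @ V))
        D1, S, D2 = levels[k - 1]
        W = apply(k - 1, apply(k - 1, V, trans), trans)
        return W - (D2 @ (S.T @ (D1.T @ V)) if trans else D1 @ (S @ (D2.T @ V)))

    Bk, Rk, Ck, Tk = B0, R0, C0, T0
    for k in range(maxit):
        CtB = Ck.T @ Bk
        M = CtB @ Rk @ CtB.T                   # C_k^T G_k C_k
        S = Tk @ np.linalg.inv(np.eye(len(M)) + M @ Tk)    # S_{k+1}^{-1}, (5a)
        AB, AtC = apply(k, Bk), apply(k, Ck, trans=True)
        Rnew = Rk - Rk @ CtB.T @ S @ CtB @ Rk  # (9a)
        Tnew = Tk - Tk @ M @ S                 # (10a)
        levels.append((AB @ (Rk @ CtB.T), S, AtC))         # D^(1) = A_k G_k C_k, D^(2) = A_k^T C_k
        _, Rw = np.linalg.qr(AtC)
        dH = np.linalg.norm(Rw @ Tnew @ Rw.T)  # ||H_{k+1} - H_k||_F in low-rank form
        Bk, Rk = _compress(np.hstack([Bk, AB]), _blkdiag(Rk, Rnew), tau)   # (8), (12)
        Ck, Tk = _compress(np.hstack([Ck, AtC]), _blkdiag(Tk, Tnew), tau)  # (8), (13)
        if dH <= tol * np.linalg.norm(Tk):     # ||H_{k+1}||_F = ||T_{k+1}^{-1}||_F (orthonormal C)
            break
    return Ck, Tk, Bk, Rk
```

On a small stable random example, $X \approx C_kT_k^{-1}C_k^\top$ can be checked against the residual of (1b) and the stability of $A - GX$. The recursion in `apply` performs $2^k$ solves with $A_\gamma$ at step $k$, which is acceptable here only because the iteration converges in a handful of steps.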
4. Computational issues
4.1. Residuals and convergence control
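Consider the difference of successive iterates
$$dG_k \equiv B_kR_k^{-1}B_k^\top - B_{k+1}R_{k+1}^{-1}B_{k+1}^\top = \widetilde{B}_{k+1}\widetilde{R}_{k+1}^{-1}\widetilde{B}_{k+1}^\top,$$
where $\widetilde{B}_{k+1} \equiv [B_k, B_{k+1}]$ and $\widetilde{R}_{k+1}^{-1} \equiv R_k^{-1} \oplus (-R_{k+1}^{-1})$. Similarly, with $dH_k \equiv C_kT_k^{-1}C_k^\top - C_{k+1}T_{k+1}^{-1}C_{k+1}^\top$, we have $dH_k = \widetilde{C}_{k+1}\widetilde{T}_{k+1}^{-1}\widetilde{C}_{k+1}^\top$ with $\widetilde{C}_{k+1}$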